The Incident Response Plan Nobody Actually Follows
By Domenic DiNatale
Somewhere in your organization's shared drive, there is a document. It's probably titled "Incident Response Plan" or something close to it. It has a version number. It has been reviewed, approved, and signed off by someone with authority. It describes exactly what to do when a security incident occurs.
When an actual incident happens, almost nobody will follow it.
This isn't a cynical take — it's an operational reality that plays out in breach after breach, post-incident review after post-incident review. The plan gets bypassed not because people are negligent, but because it was written to describe a process, not to enable one. The gap between documented procedure and the decisions that actually get made in a real incident is where manageable events turn into disasters.
Why Plans Fail Under Pressure
Incident response plans tend to fail in predictable ways. They describe the right steps at the wrong level of abstraction. They assume information that isn't available. They were written by a security team without input from the operational teams who'll actually be executing them. And crucially, they've almost never been practiced under conditions that resemble an actual incident.
The information problem is particularly acute. IR plans often specify decision points — "if the incident is confirmed as a breach, escalate to the CISO" — without accounting for the fact that in a real incident, you often don't know what you have. You have anomalies. You have alerts. You have logs that may or may not be complete, from systems that may or may not be reliable, interpreted by people who are operating under significant stress. The decision logic in the plan assumes clarity you won't have.
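One way to make decision logic tolerate that ambiguity is to key it on what responders can actually observe, rather than on a confirmed verdict. The sketch below is illustrative only: the `Evidence` fields, the thresholds, and the action names are all assumptions made for the example, not drawn from any particular plan or standard.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    MONITOR = "monitor"
    INVESTIGATE = "investigate"
    ESCALATE = "escalate"


@dataclass
class Evidence:
    # All fields are hypothetical; a real plan would define its own signals.
    corroborating_alerts: int   # independent alerts pointing at the same asset
    logs_complete: bool         # do we trust the log record for this asset?
    asset_is_critical: bool     # would compromise here have major impact?


def next_action(evidence: Evidence) -> Action:
    """Choose an action without requiring a confirmed breach."""
    # An unreliable record on a critical asset is itself a reason to escalate:
    # missing evidence is not reassurance when the logs cannot be trusted.
    if evidence.asset_is_critical and not evidence.logs_complete:
        return Action.ESCALATE
    if evidence.corroborating_alerts >= 2:
        return Action.ESCALATE if evidence.asset_is_critical else Action.INVESTIGATE
    if evidence.corroborating_alerts == 1:
        return Action.INVESTIGATE
    return Action.MONITOR
```

The point of the design is that "we don't know yet" is a first-class input. A rule that only fires on "confirmed breach" never fires during the hours when escalation would have mattered most.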
The coordination problem is almost as bad. Real incidents involve simultaneous demands from legal, communications, executive leadership, technical responders, and potentially regulators or law enforcement. The plan typically describes a linear sequence. The reality is a simultaneous multi-front operation where someone needs to make hard calls about what to prioritize while incomplete information is flowing in from every direction.
And the people problem: the individuals named in the plan may not be available. Contact information may be out of date. The person who knows where the encryption keys are stored may be on vacation. The plan was written assuming a best-case operational context that rarely exists during an actual incident.
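The people problem is one of the few that can be checked mechanically ahead of time. As a rough sketch, assuming a roster where each entry records when the contact details were last verified and whether the person is currently available (the `Contact` fields and the 90-day threshold are hypothetical choices for illustration):

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Contact:
    name: str
    role: str
    last_verified: date   # when someone last confirmed these details work
    available: bool       # False if on leave, off shift, and so on


def audit_roster(contacts, today, max_age_days=90):
    """Return (stale contact names, roles with no usable contact)."""
    cutoff = today - timedelta(days=max_age_days)
    stale = [c.name for c in contacts if c.last_verified < cutoff]

    roles = {c.role for c in contacts}
    # A role is covered only if someone is both available and recently verified.
    covered = {
        c.role for c in contacts
        if c.available and c.last_verified >= cutoff
    }
    uncovered = sorted(roles - covered)
    return stale, uncovered
```

Run on a schedule, a check like this turns "the key custodian is on vacation" from a 2:00 AM discovery into a routine finding with time to arrange a backup.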
Practice Under Realistic Conditions
The difference between organizations that handle incidents well and those that don't is almost always the same: the good ones practiced under realistic conditions before the real thing happened.
Tabletop exercises — the kind where a facilitator walks a team through a scenario and everyone discusses what they'd do — have value. They create familiarity, surface assumptions, and generate conversation. But they don't produce operational readiness. They produce familiarity with the exercise. Real incidents don't wait for everyone to think carefully before responding. They arrive at 2:00 AM on a Friday, when half the team is unavailable, when the first indicators are ambiguous, and when every escalation has real business consequences.
Red team exercises and realistic simulations — where a team actually executes a response, in real systems or high-fidelity simulations, under time pressure and incomplete information — are a different category of preparation. They reveal the coordination failures, the information gaps, the tool-familiarity problems, and the decision bottlenecks that tabletops miss. They're also expensive and organizationally disruptive, which is why most organizations don't do them seriously.
The organizations that do them seriously discover, consistently, that their first real exercise is a disaster — and that the disaster is the point. You want to find out that your detection tooling doesn't give you the information you need during a drill, not during an actual breach. You want to discover that your escalation chain has three single points of failure during a simulation, not while trying to contain a ransomware deployment.
Architecture and IR Are Not Separate
There's a connection between architectural decisions and incident response effectiveness that doesn't get enough attention. The quality of your incident response is partially determined before any incident occurs, by how your systems are built.
If your network is flat, incident response requires quickly figuring out the scope of potential compromise across the entire environment — an inventory problem that can take days. If your network is segmented, responders start with a much narrower scope and can make isolation decisions faster. If your logs are centralized and retained with integrity guarantees, responders have a reliable record to work from. If logging is inconsistent or incomplete, responders are trying to reconstruct events from fragments.
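The scoping difference can be made concrete by treating permitted connections as a graph and asking what is reachable from a compromised host. This is a deliberately simplified sketch: the host names and allowed connections are invented, and it models only network paths, not stolen credentials or other ways an attacker might move.

```python
from collections import deque


def reachable_hosts(allowed, start):
    """Breadth-first search over permitted connections from a compromised host."""
    seen = {start}
    queue = deque([start])
    while queue:
        host = queue.popleft()
        for neighbor in allowed.get(host, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


hosts = ["web", "app", "db", "hr", "build"]

# Flat network: every host may connect to every other host.
flat = {h: [o for o in hosts if o != h] for h in hosts}

# Segmented network: only the connections the application needs.
segmented = {"web": ["app"], "app": ["db"], "db": [], "hr": [], "build": []}

flat_scope = reachable_hosts(flat, "web")            # all five hosts
segmented_scope = reachable_hosts(segmented, "web")  # web, app, db only
```

In the flat case a compromise of `web` puts every host in scope. In the segmented case responders can start with three hosts and treat `hr` and `build` as lower priority until evidence says otherwise.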
The forensic artifacts that incident responders depend on — logs, network flows, endpoint telemetry, identity records — are generated or not generated based on decisions made when systems were designed. Those decisions are made long before any incident occurs. Organizations that build systems without thinking about incident response eventually find themselves trying to respond to incidents in systems that weren't designed to be understood under adversarial conditions.
What Good IR Actually Looks Like
Effective incident response is characterized by things you can observe before any incident happens. Clear ownership: specific people, reachable at any hour, who are the authority on specific decisions during an incident. Pre-authorized playbooks: for the most common incident types, decision trees that can execute without requiring escalation at every step. Regular, realistic practice: exercises that stress the actual coordination and information problems, not just the process.
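A pre-authorized playbook can be as simple as a table that marks, for each step, whether a responder may act immediately or must escalate first. The following is a minimal sketch; the incident types, the steps, and which of them are pre-authorized are hypothetical and would come from your own risk decisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    description: str
    pre_authorized: bool   # True: responder may act without asking anyone


# Hypothetical playbook entries, for illustration only.
PLAYBOOKS = {
    "phishing_credential_theft": [
        Step("Revoke the affected user's sessions and reset credentials", True),
        Step("Quarantine the reported message from all mailboxes", True),
        Step("Notify regulators", False),
    ],
    "ransomware_on_endpoint": [
        Step("Isolate the endpoint from the network", True),
        Step("Shut down the shared file server", False),
    ],
}


def split_steps(incident_type):
    """Return (steps to run now, steps needing escalation) for an incident type."""
    steps = PLAYBOOKS.get(incident_type)
    if steps is None:
        # No playbook: nothing is pre-authorized, so everything escalates.
        fallback = Step(f"No playbook for {incident_type!r}; escalate to incident owner", False)
        return [], [fallback]
    run_now = [s for s in steps if s.pre_authorized]
    escalate = [s for s in steps if not s.pre_authorized]
    return run_now, escalate
```

The value is less in the code than in the argument it forces beforehand: someone with authority has to decide, calmly and in advance, which actions a responder can take alone.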
Good IR also means having already made the architectural decisions that make response effective: centralized logging, network segmentation that allows isolation without cascading business impact, identity systems that allow rapid credential revocation, and infrastructure that can be rebuilt from a known-good state rather than requiring forensic analysis of potentially compromised systems.
The incident response plan matters. But it matters significantly less than the operational readiness behind it, and both matter less than the architecture it's operating on.
A plan that nobody follows isn't a documentation problem. It's a symptom.
This post is part of a series on security as an architectural problem. Read the full series on the Intellitech blog.