Incident Management is the first stop in most people’s ITSM journey. So why does it go so wrong, particularly during a Major Incident?

I recently read an article on a failed Major Incident Response. A ‘very stable’ system fell over for the first time in years, long after the people who implemented it had hung up their cables.

Guess what happened?

  • MI Bridge chaos
  • Every SME talking at the same time
  • Mini solutions appearing with no coordination
  • Documentation? What documentation?

So here’s your cheat sheet.

DO:

  • Get the right people (not everyone)
  • Have a single leader
  • Document everything as you go, even if it’s only rough notes
  • Focus on restoration first
  • Keep communications clear, brief and relevant

DON’T:

  • Start finger-pointing
  • Chase the root cause during the fire
  • Let non-essential management hijack the call
  • Forget stakeholder communications
  • Throw everything at it without a plan
  • Try multiple fixes at once; you won’t know which one actually worked
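
If your bridge notes live in a script or chat bot rather than on paper, three of the points above (a single leader, documenting as you go, and one fix at a time) can be sketched in a few lines of Python. This is a minimal illustration only: the `IncidentLog` class and its method names are hypothetical and not taken from any ITSM tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class IncidentLog:
    """Hypothetical running log for one Major Incident bridge."""
    commander: str                                  # the single leader
    entries: List[str] = field(default_factory=list)
    active_fix: Optional[str] = None                # at most one fix in flight

    def note(self, author: str, text: str) -> None:
        # Rough notes are fine; the UTC timestamp is what makes them useful later.
        stamp = datetime.now(timezone.utc).strftime("%H:%M:%SZ")
        self.entries.append(f"{stamp} [{author}] {text}")

    def start_fix(self, author: str, description: str) -> None:
        # Refuse a second fix while one is still being tried.
        if self.active_fix is not None:
            raise RuntimeError(f"Fix already in progress: {self.active_fix}")
        self.active_fix = description
        self.note(author, f"START FIX: {description}")

    def end_fix(self, author: str, restored: bool) -> None:
        # Record the outcome before anyone moves on to the next attempt.
        if self.active_fix is None:
            raise RuntimeError("No fix in progress")
        outcome = "service restored" if restored else "no effect"
        self.note(author, f"END FIX: {self.active_fix} ({outcome})")
        self.active_fix = None
```

Because `start_fix` raises while another fix is still open, the log itself shows which change restored service, and root cause analysis can wait until after the fire.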

When you are weathering a storm, have a single Captain steering the ship.