The worst IT outage so far: lessons from the CrowdStrike incident

July’s global IT outage was among the worst ever seen… so far. Microsoft estimates that as many as 8.5 million Windows devices were affected [1], just from a single code error. The next disruption could be much worse – so what can we learn?

The issues began with a single error in an update made to CrowdStrike’s Falcon antivirus software. Sent out before it was fully tested, the update contained one small mistake that caused computers to enter a “blue screen of death” (BSOD) state. It took the security company 79 minutes to resolve the error and push out a fix, but by then the 8.5 million computers that had downloaded the broken update were unable to download the corrected one.

Banks and stock exchanges around the world lost access to their systems. Over 10,000 flights were cancelled globally [2]. In the UK, GPs and pharmacies were unable to access the NHS’s EMIS booking and records system, affecting consultations and preventing pharmacies from accessing prescriptions. The economic impact of the outage is hard to assess, with some estimates as high as tens of billions of dollars [3].

Compounding the immediate effects of the outage, cybercriminals were quick to take advantage of the confusion. Many organisations reported receiving phishing emails and calls from threat actors posing as CrowdStrike support staff. Meanwhile, some opportunists began selling scripts that they claimed could be run to resolve the issue; in reality, these would not fix the issue and were more likely to hide malware.

At the time of writing – one week on from the incident – most of the impacted computers are back online. But with fewer than 1% of Windows devices affected in total [4], this could have been much worse – and next time we might not be so lucky. The next such error – or cyberattack – could have wide-ranging, long-lasting impacts. So how can we protect ourselves?

  • Backups and BCDR are vital. Having a robust Business Continuity and Disaster Recovery protocol in place is critical. Regular scheduled backups of servers, users and document libraries help you recover your data quickly.
  • Network redundancy matters. Having a secondary failover line into the premises, using a different route and provider, can keep you up and running if your primary network provider fails.
  • Protect against phishing. Cybercriminals are quick to take advantage of global events to lend legitimacy to their attacks or provide a new vector into target organisations. Effective email protection software helps filter out these attacks.
  • Reduce human error. The Information Commissioner’s Office lists human error as the number one cause of data breaches. User awareness training reduces the risk of cyberattacks compounding the impacts of an IT outage.
  • Keep up to date. While the CrowdStrike outage was caused by a faulty patch, the next incident could be caused by unpatched vulnerabilities – especially if cybercriminals can use these as a vector into your systems. Ensuring software and hardware are regularly updated can protect against such breaches.
  • Stay alert. Proactive monitoring alerts us when your systems encounter issues, in many cases allowing us to resolve problems before they can spread.
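
As an illustration of how the "regular scheduled backups" and "stay alert" points can work together, here is a minimal Python sketch of a backup freshness check. It is not taken from any particular backup product: the system names, the 24-hour threshold and the find_stale_backups helper are all assumptions made for the example, and a real check would read timestamps from your own backup tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumed policy for illustration: every system should have a successful
# backup no older than 24 hours. Use a threshold that matches your BCDR plan.
MAX_BACKUP_AGE = timedelta(hours=24)


def find_stale_backups(last_backups, now, max_age=MAX_BACKUP_AGE):
    """Return the names of systems whose latest backup is missing or too old.

    last_backups maps a system name to the datetime of its most recent
    successful backup, or None if no backup has ever succeeded.
    """
    stale = []
    for name, finished_at in sorted(last_backups.items()):
        if finished_at is None or now - finished_at > max_age:
            stale.append(name)
    return stale


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical inventory; in practice this would come from your backup tool.
    inventory = {
        "file-server": now - timedelta(hours=3),
        "mail-archive": now - timedelta(days=2),
        "finance-db": None,
    }
    for name in find_stale_backups(inventory, now):
        print(f"ALERT: backup for {name} is missing or older than {MAX_BACKUP_AGE}")
```

Run on a schedule, a check like this flags a failed or skipped backup within a day, rather than leaving it to be discovered during a recovery.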

This incident has demonstrated the fragility of global IT systems. By learning from this outage and implementing these protective measures, we can better prepare for and withstand the next IT crisis. The resilience of our digital infrastructure depends on our ability to anticipate and adapt to these challenges, ensuring continuity and security in the digital world.

  1. Helping our customers through the CrowdStrike outage – The Official Microsoft Blog
  2. World’s largest airlines cancelled 10,000 flights in three days due to IT outage – Business Traveller
  3. CrowdStrike-Microsoft Outage To Cost $44M Per Fortune 500 Company: Report – CRN (crn.com)
  4. Helping our customers through the CrowdStrike outage – The Official Microsoft Blog