Here’s how to make unexpected incidents feel less Y2K and more Keep Calm and Carry On!
One rainy Friday afternoon in Victoria, some of our IT team were out of the office at a social event… No sooner had we started mingling and chatting than the first twitterings of something amiss arrived… Have you heard? The Internet has gone down… Planes can’t take off… No one can use their cards…
When you first heard about the widespread July 19 global IT outage, your mind may have immediately jumped to cyberattack, especially in light of the growing number of alerts about state-based threat actors issued recently. Yet cyber authorities including the ASD’s ACSC were quick to reassure the public that the incident was not related to malicious activity. Fortunately, or unfortunately depending on how you look at it, this incident seemed to be the result of the largest cyber risk factor of all – human error! Yep, someone, somewhere stuffed up!
What Happened and Could It Have Been Avoided?
As we all know now, the outage was triggered by a software update to global cybersecurity vendor CrowdStrike’s Falcon sensor, a tool designed to enhance security on Windows devices. Unfortunately, a defect in the update led to widespread system crashes, affecting sectors from travel to banking. Not gonna lie…it did all feel a little bit Y2K… 20+ years too late!
While CrowdStrike quickly identified and isolated the issue, post-incident reviews have pointed to a fundamental code error that somehow made its way unchecked through the vendor’s testing and update release process. CrowdStrike’s post-incident review updates point to clear failures in change management and pre-deployment testing, and set out the steps the vendor will now take to prevent a recurrence.
Tip! If you’d like to sink your teeth into an engaging technical unpacking of the CrowdStrike incident, take a few minutes to watch the Dave’s Garage summary.
Crisis Averted or Opportunity for Improvement?
Here at Maxsum, we don’t use CrowdStrike, so we didn’t experience any impact from the broader outage – nor did our Managed IT Services clients. If, however, you were affected by the incident and require ongoing recovery assistance, please contact us at Maxsum.
It’s certainly not lost on us as a Managed IT Services provider, though, that whether we’re talking cyber incident, human error, process failure or service outage, the key is always to prepare for the worst and expect the unexpected!
Immune or otherwise to the recent CrowdStrike incident, you absolutely should take this as an opportunity to improve your preparedness for the next “unexpected” incident to come your way. Here’s how.
1. Dust Off Your BCDR Plans
The recent outage is an opportunity to review and strengthen your Business Continuity and Disaster Recovery (BCDR) plans. This means talking tactics, testing them, and telling your team about them!
- Identify your business-critical operations and functions – Do you have work-arounds (offline tools or manual processes) that let you maintain business operations during a disruption?
- Know what your core IT and security systems are and do – Understand your security incident management plan and the steps to recover systems and data, including back-ups.
- Get better acquainted with your suppliers – Know who they are, what systems they use, how they integrate with yours and what security or business continuity provisions are in your agreement with them.
- Understand where to source the facts – official incident communications and trusted status updates.
- Nominate who is responsible for making key decisions – and communicating with your stakeholders and service providers.
- Keep calm and carry on – Decide how you will proactively keep up to date with the changing threat landscape, and how you will seek and incorporate recommendations from your IT department or service provider.
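If you like to keep your plans somewhere more testable than a Word document, the first and fifth points above can be sketched as a tiny register. This is only an illustration: the `CriticalFunction` fields, the `bcdr_gaps` helper and the sample entries are all made up for the example, not part of any BCDR standard or tool.

```python
from typing import NamedTuple, Optional


class CriticalFunction(NamedTuple):
    # Hypothetical record for one business-critical function.
    name: str
    workaround: Optional[str]      # offline tool or manual process, if any
    decision_owner: Optional[str]  # who makes the call during an incident


def bcdr_gaps(functions):
    """Map each function name to the list of things its plan is missing."""
    gaps = {}
    for f in functions:
        missing = []
        if not f.workaround:
            missing.append("workaround")
        if not f.decision_owner:
            missing.append("decision owner")
        if missing:
            gaps[f.name] = missing
    return gaps


# Illustrative entries only.
functions = [
    CriticalFunction("Payroll", "Manual bank file upload", "Finance Manager"),
    CriticalFunction("Customer bookings", None, "Operations Lead"),
    CriticalFunction("Email", None, None),
]

for name, missing in bcdr_gaps(functions).items():
    print(f"{name}: missing {', '.join(missing)}")
```

Anything that shows up in the output is a conversation to have before the next outage, not during it.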
2. Scope Out Your Supply Chain
If you were watching the recent incident unfold and actually didn’t know if you use the affected product or not…then you have some supply chain management work to do!
It’s no coincidence that the recent ISO27001:2022 security standard update focusses heavily on supply chain management – understanding who is in your supply chain, what technologies they use, their approach to change management, how they handle updates, their testing protocols, and their communication strategies.
- Ask your suppliers to provide you with supply chain assurance information.
- Review your service contracts for clarification on their business continuity provisions and commitments to you.
- Make an informed decision, armed with this information, on whether a supplier’s approach meets your risk criteria, and if you’re good to play on, knowing that you’re less likely to be caught off guard by issues originating from your suppliers.
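To make the "do we even use the affected product?" question answerable in seconds, a supplier register can be as simple as the sketch below. It assumes you have already gathered each supplier's technology list and checked their contract; the `Supplier` class, the `exposed_suppliers` helper and the example companies are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Supplier:
    # Hypothetical register entry for one supplier.
    name: str
    products: set = field(default_factory=set)  # technologies the supplier relies on
    has_continuity_clause: bool = False         # does the contract cover business continuity?


def exposed_suppliers(register, product):
    """Return the sorted names of suppliers that list the given product (case-insensitive)."""
    wanted = product.lower()
    return sorted(
        s.name for s in register
        if wanted in {p.lower() for p in s.products}
    )


# Illustrative entries only.
register = [
    Supplier("Example Payroll Co", {"CrowdStrike Falcon", "Microsoft 365"}, True),
    Supplier("Example Freight Ltd", {"Microsoft 365"}, False),
]

print(exposed_suppliers(register, "crowdstrike falcon"))
```

The same register can be filtered on `has_continuity_clause` to find the contracts that need a second look.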
3. Manage Change Better in Your Supply Chain
Change management failures are shaping up to be a key causal factor in the July 19 global outage. But reviewing a supplier’s change management process, especially a technology provider’s, can be very technical, which is precisely why many organisations overlook it… and also precisely why they shouldn’t!
The devil is in the detail as they say. If in doubt, your Managed IT Services provider is a great place to start, especially if they are ISO27001-certified, as they will know exactly what to look for. Here are some key activities to get you on the right track.
- Conduct a thorough analysis: Examine their performance metrics, conduct gap analyses, and identify bottlenecks or inefficiencies.
- Understand their change management framework: Do they have one? What is their approach to planning, implementing, and evaluating changes?
- Evaluate testing protocols: Inquire about the testing protocols your suppliers use before rolling out changes. Robust pre-deployment testing can prevent issues like the recent CrowdStrike outage.
- Assess collaboration and communication strategies: Ensure your suppliers have clear communication strategies to keep you informed about upcoming changes and potential impacts and have the mechanisms in place to collaborate proactively with you.
- Review historical performance: Review case studies or performance reports to see how they managed previous updates, changes and any issues that arose.
- Perform regular audits: Audit your supply chain yourself regularly to ensure compliance with your standards and expectations.
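One way to keep these reviews consistent from supplier to supplier is to record the answers as a simple checklist. The sketch below is a rough illustration only: the criteria names paraphrase the activities above, and the equal weighting and the `assess` helper are assumptions, not anything prescribed by ISO27001.

```python
# Illustrative criteria paraphrasing the activities listed above.
CRITERIA = [
    "documented change management framework",
    "pre-deployment testing protocols",
    "change communication strategy",
    "historical performance reviewed",
    "audited regularly",
]


def assess(answers):
    """Return (share of criteria met, list of unmet criteria).

    `answers` maps a criterion name to True/False; anything not answered
    is treated as a gap.
    """
    gaps = [c for c in CRITERIA if not answers.get(c, False)]
    score = (len(CRITERIA) - len(gaps)) / len(CRITERIA)
    return score, gaps


# Hypothetical answers for one supplier.
answers = {
    "documented change management framework": True,
    "pre-deployment testing protocols": False,
    "change communication strategy": True,
    "historical performance reviewed": True,
}

score, gaps = assess(answers)
print(f"Criteria met: {score:.0%}")
for gap in gaps:
    print(f"Follow up: {gap}")
```

Treating an unanswered question as a gap is a deliberate choice here: if a supplier can't or won't tell you how they test changes, that is worth knowing too.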