Understanding the CrowdStrike/Microsoft IT Outage

Job Angula

Last week, the IT world was shaken by a significant outage involving two major players: CrowdStrike and Microsoft.

The incident underscored the critical vulnerabilities within even the most robust IT infrastructure and the far-reaching consequences such outages can have across various industries.

THE CAUSE

CrowdStrike is a cybersecurity technology company that provides endpoint security solutions to thousands of the biggest companies in the world.

Endpoint security, also known as endpoint protection, is a cybersecurity approach that focuses on protecting individual devices (endpoints) from malicious activity.

Think of endpoint security as a digital shield for your devices.

The outage was traced back to a routine software update.

Computers running Microsoft Windows, which power the systems of numerous businesses globally, crashed after receiving a faulty update to CrowdStrike’s Falcon security software.

Specifically, a defective content update to the Falcon sensor caused affected Windows machines to crash and fail to restart normally, leading to widespread service disruptions.

While both companies quickly acknowledged the issue and worked diligently to restore services, the event highlighted the fragility of IT ecosystems where multiple high-dependency systems interact.

To grasp this, consider that the ATM at your local bank, the point-of-sale system at your local supermarket, your doctor’s medical records system, and the system at your local municipality or government department are all powered by servers in the background.

If those servers go down, the system becomes unavailable.

THE IMPACT ON INDUSTRIES

The outage had a profound impact across various sectors reliant on cloud computing and cybersecurity services.

Financial institutions, healthcare providers, and retail giants were among the hardest hit.

For example, banks experienced significant delays in transaction processing, affecting millions of customers.

Healthcare providers faced interruptions in accessing patient data, which could have led to critical delays in medical services.

Retail companies relying on cloud-based point-of-sale systems saw disruptions in their operations, leading to potential revenue losses during peak business hours.

In the Netherlands, for example, July is peak summer travel time, and thousands of travellers were stranded or delayed at Schiphol airport.

Quantifying the financial impact is challenging, but estimates suggest that the outage may have cost affected industries billions in lost revenue and operational disruptions.

The incident also led to a temporary dip in stock prices for both CrowdStrike and Microsoft, reflecting the market’s sensitivity to such vulnerabilities.

ROBUST RISK MANAGEMENT

This incident underscores the necessity of robust redundancy, change and patch management procedures to mitigate the risks associated with software updates and system integrations.

Effective change management involves meticulous planning, impact analysis, and comprehensive testing before deploying updates. It’s crucial for organisations to:

  • Conduct Thorough Testing: All updates should be rigorously tested in a controlled environment that simulates the production set-up as closely as possible. This helps identify potential conflicts and issues before they affect the live environment.
  • Implement Rollback Procedures: In case of failure, there should be a well-defined rollback plan to restore systems to their previous state swiftly.
  • Build in Redundancy: Building redundancy into IT systems means having multiple instances of critical components. If one fails, another can take over, minimising service interruptions. For example, using multiple data centres across different geographical locations can ensure that if one data centre goes down, others can handle the load, maintaining service availability.
  • Enhance Communication Channels: Coordination between vendors and clients is vital. Early communication about upcoming changes and potential risks can help clients prepare and mitigate impacts.
  • Continuously Monitor and Improve: Post-deployment monitoring can quickly identify issues, and continuous improvement processes can refine and enhance patch management practices over time.
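
For readers who want to see how testing, rollback and monitoring fit together, the practices above can be sketched in a few lines of Python. This is a simplified illustration, not a description of how CrowdStrike or Microsoft deploy updates. The names Host, staged_rollout and simulated_health_check are hypothetical, and the health check simply pretends that version "1.1" crashes every machine it touches.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Host:
    """A hypothetical managed machine; 'version' is the update it currently runs."""
    name: str
    version: str
    history: List[str] = field(default_factory=list)

    def apply(self, new_version: str) -> None:
        # Remember the previous version so the change can be undone.
        self.history.append(self.version)
        self.version = new_version

    def rollback(self) -> None:
        # Restore the most recent previous version, if there is one.
        if self.history:
            self.version = self.history.pop()


def staged_rollout(
    rings: List[List[Host]],
    new_version: str,
    is_healthy: Callable[[Host], bool],
) -> bool:
    """Deploy one ring at a time; on the first failed health check,
    roll back every host updated so far and stop."""
    updated: List[Host] = []
    for ring in rings:
        for host in ring:
            host.apply(new_version)
            updated.append(host)
        # Post-deployment monitoring: check the ring before moving on.
        if not all(is_healthy(host) for host in ring):
            for host in updated:
                host.rollback()
            return False
    return True


def simulated_health_check(host: Host) -> bool:
    # Assumption for this demo only: version "1.1" is the faulty update.
    return host.version != "1.1"


# The test ring receives the update first; production follows only if it passes.
test_ring = [Host("test-01", "1.0")]
production = [Host("atm-01", "1.0"), Host("pos-01", "1.0")]
ok = staged_rollout([test_ring, production], "1.1", simulated_health_check)
```

In this run the faulty update is caught in the test ring and rolled back, so the production machines are never touched. The design choice worth noting is the order of the rings: the smaller and less critical the first ring, the less damage a bad update can do before it is stopped.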

CRITICAL REMINDER

The CrowdStrike/Microsoft outage serves as a critical reminder of the interconnected nature of modern IT systems and the cascading effects that can arise from a single point of failure.

By adopting stringent change and patch management practices, organisations can better shield themselves from the operational and financial repercussions of similar incidents in the future.

The incident also calls for enhanced collaboration between service providers and clients to ensure seamless and secure updates, reinforcing the resilience of IT infrastructure globally.

– Job Angula is an IT risk professional and co-founder of Accelerate Advisory Services (Pty) Ltd.
