
As Questions Continue About the CrowdStrike Snafu, Microsoft and Others Revisit Resiliency – IT Connection


Amy Larsen DeCarlo – Principal Analyst, Security and Data Center Services

Summary Bullets:
• A flaw in an update to CrowdStrike’s Falcon threat intelligence and incident response platform brought down millions of Windows systems earlier this month, disrupting operations around the world.

• The event, which took days to recover from, threw into sharp relief the need for greater operational resiliency, better quality control, and stronger protections for systems and data.

Earlier this month, an undetected error in a CrowdStrike Rapid Response content update, combined with a bug in the content validator meant to catch such errors, led to the corrupt update being released into production. The update knocked 8.5 million Windows systems offline and interrupted operations around the world. The fix was manual and kludgy. Thousands of flights were canceled, medical procedures were postponed, and operations across industries stalled, in some cases for days. The incident is expected to cost organizations billions of dollars once the fallout from the disruption is tallied.

The event raised serious questions about both vendor quality control and customers’ overreliance on automation for IT updates. On the former, CrowdStrike published an initial incident report identifying the pair of issues that drove the proverbial IT train off the tracks, causing mass system shutdowns across the globe. Along with profuse apologies from its CEO, the company promised a full post-incident disclosure once it completes its investigation.

Microsoft offered hundreds of engineers to support customer system restoration efforts. The company said it is collaborating with other cloud providers, including Amazon Web Services and Google Cloud Platform, to understand the full effect of the incident. The expectation is that gaining a thorough understanding of what happened during this event will help everyone better prepare for future issues.

In a blog post, John Cable, vice president of program management for Windows servicing and delivery, wrote that the company needs to make development changes to support greater systems resilience. Cable said the company is looking to reduce kernel-level access for software applications to better steel Windows operating systems against malicious code and corrupted software.

Enterprises that were impacted need to revisit their business continuity plans. Everyone involved, from the vendors and service providers to the end customers, has a lot to learn. There is an open dialogue now that hopefully will lead to better organizational resilience in the future.


