
Cloud infrastructure drift: The good, the bad, and the ugly


February 6, 2019


Infrastructure misconfiguration is the leading cause of data breaches in the cloud, and a big reason misconfiguration happens is infrastructure configuration drift, or change that occurs in a cloud environment post-provisioning. If you’re responsible for the security and compliance of cloud environments, you probably spend a lot of time focused on analyzing infrastructure drift events and remediating them.

It’s easy to think of all drift as being bad or undesirable. And make no mistake, some of it is really bad. Ugly even! But some drift is good and desired, and understanding the differences between the good, the bad, and the ugly (and how to recognize them) can save you and your team a lot of frustration and wasted time.

Good Drift: Why We Use The Cloud

Cloud infrastructure is elastic and scalable. We want our cloud environments to change dynamically to meet the second-by-second needs of our applications—without human intervention. Gone are the days of capacity planning and pre-provisioning in the datacenter.

There are many desired cloud infrastructure change events that we want to occur post-provisioning. Resources like AWS Auto Scaling groups and DynamoDB tables respond to usage and throughput, scaling infrastructure dynamically to meet new demand. Applications may create new resources such as SQS queues, SNS topics, or S3 buckets in the course of their work. Application Load Balancers connected to Amazon ECS or AWS Fargate services take actions that change infrastructure configurations.

Any approach to cloud security and compliance needs to take into account such “good” change, especially if you’re looking to automatically remediate drift events. You don’t want your security tool fighting an automation battle with your application. Of course, you’ll need to ensure such “good” change is actually good, but that’s a topic for another post.
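One practical way to recognize service-driven change is to look at who made it. Below is a minimal sketch (Python with boto3) of that idea: it queries CloudTrail for recent events touching a resource and treats changes invoked by an AWS service principal as candidates for "good," automation-driven drift. The resource name, time window, and the invokedBy heuristic are assumptions for illustration, not a complete triage policy.

```python
# Hedged triage sketch: separate service-initiated changes ("good" drift
# candidates) from human-initiated ones using CloudTrail event history.
import json
from datetime import datetime, timedelta

import boto3

cloudtrail = boto3.client("cloudtrail")


def classify_recent_changes(resource_name: str, hours: int = 24):
    """Return (service_events, human_events) for events touching a resource."""
    end = datetime.utcnow()
    start = end - timedelta(hours=hours)
    resp = cloudtrail.lookup_events(
        LookupAttributes=[
            {"AttributeKey": "ResourceName", "AttributeValue": resource_name}
        ],
        StartTime=start,
        EndTime=end,
    )
    service_events, human_events = [], []
    for event in resp["Events"]:
        detail = json.loads(event["CloudTrailEvent"])
        identity = detail.get("userIdentity", {})
        # Calls invoked by an AWS service (e.g. autoscaling.amazonaws.com)
        # are likely automation-driven; everything else deserves a human look.
        if identity.get("invokedBy", "").endswith(".amazonaws.com"):
            service_events.append(detail)
        else:
            human_events.append(detail)
    return service_events, human_events
```

A real system would also page through results and validate that the automation-driven change stays inside an approved envelope, but even this rough split keeps security tooling from fighting your application’s own scaling behavior.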

Bad Drift: Our App is Down!

Configuration drift has bedeviled ops and infrastructure teams for a really long time, causing application downtime and deployment failures. It occurs when a production environment changes in some way and Ops is unaware of it. Perhaps a Security Group Rule gets deleted, or an IAM Policy is removed. A simple fat finger in the AWS Console can take down an entire application. A fire drill to identify the cause ensues. The postmortem is uncomfortable.
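One way to catch this kind of drift before the fire drill is to periodically diff the live configuration against what you provisioned. Here is a minimal sketch (Python with boto3), assuming you saved a JSON baseline of a security group’s ingress rules at provisioning time; the group ID and baseline path are placeholders.

```python
# Drift-check sketch: compare a security group's current ingress rules
# against a baseline captured at provisioning time.
import json

import boto3

ec2 = boto3.client("ec2")


def ingress_drift(group_id: str, baseline_path: str) -> dict:
    """Return ingress rules added or removed relative to the baseline."""
    with open(baseline_path) as f:
        baseline = json.load(f)  # list of IpPermissions saved at provision time

    group = ec2.describe_security_groups(GroupIds=[group_id])["SecurityGroups"][0]
    current = group["IpPermissions"]

    # Simplified comparison: serialize each rule so rules can be diffed as sets.
    norm = lambda rules: {json.dumps(r, sort_keys=True) for r in rules}
    base_set, curr_set = norm(baseline), norm(current)

    return {
        "added": [json.loads(r) for r in curr_set - base_set],
        "removed": [json.loads(r) for r in base_set - curr_set],
    }
```

In practice the baseline would come from your infrastructure-as-code state rather than a local file, and you would run the check on a schedule or in response to change events.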

If you’re lucky, you’re simply dealing with a failed deployment and not a serious downtime event. Regardless, this kind of bad drift eats up engineering resources. And where the applications are mission-critical, bad drift can result in lost revenue or eroded customer trust.

This is where effective approaches to cloud security bring real benefits to app and ops teams. They don’t like it when someone moves their cheese, and for good reason. Addressing bad drift events not only helps keep your environment secure and in line with compliance policy, it also helps prevent unplanned downtime. Everyone benefits when operating from a single source of truth as to what’s running in cloud environments.

Ugly Drift: Data Breach!

The last category of cloud infrastructure drift is the kind that leaves critical data exposed to an exploit or leak. These are the events that land organizations in the news. And under the cloud shared responsibility model, they’re the fault of the cloud customer, not the cloud provider.

The most common “ugly” drift event is a critical object storage resource set to public access, most often an AWS S3 bucket (if only due to the massive popularity of the service). S3 buckets are private by default, but it’s not uncommon for cloud users to inadvertently change them to public, potentially exposing critical or private data. Numerous headlines speak to the danger of cloud breaches due to S3 misconfiguration.
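If you want a quick way to spot this class of drift, you can scan your buckets’ ACLs and policy status. The following is a hedged sketch (Python with boto3), a simplified check rather than a full exposure audit, and it assumes the credentials in use can read every bucket’s ACL and policy status.

```python
# Detection sketch: flag buckets whose ACL grants access to AllUsers or
# AuthenticatedUsers, or whose bucket policy is public.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
PUBLIC_GROUPS = (
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
)


def bucket_looks_public(bucket: str) -> bool:
    acl = s3.get_bucket_acl(Bucket=bucket)
    if any(g.get("Grantee", {}).get("URI") in PUBLIC_GROUPS for g in acl["Grants"]):
        return True
    try:
        return s3.get_bucket_policy_status(Bucket=bucket)["PolicyStatus"]["IsPublic"]
    except ClientError as err:
        if err.response["Error"]["Code"] == "NoSuchBucketPolicy":
            return False  # no bucket policy attached at all
        raise


public = [b["Name"] for b in s3.list_buckets()["Buckets"] if bucket_looks_public(b["Name"])]
print("Potentially public buckets:", public)
```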

Looking beyond S3, configuration drift for Security Group rules, VPC configurations (including subnets and network ACLs), IAM policies, and database access policies must also be detected and remediated quickly to avoid landing your organization in the headlines, not to mention hefty compliance fines and a loss of customer trust.

Developing a Game Plan

Addressing and preventing cloud misconfiguration should be a top priority for any enterprise cloud security or operations team, and getting a handle on drift is key to accomplishing this. Understanding how to categorize drift events will help you focus limited resources on the drift events that matter and decide how best to fix them (i.e., manual vs. automated remediation).

  • Good Drift: Ignore it! But at some point you’ll want to apply security validation to the range of configuration changes an application is allowed to make. We’ll cover this topic in a later post.

  • Bad Drift: Monitor for it! Make sure you’re continuously monitoring for drift in this category and are prepared for fast remediation. For specific resources that tend to drift frequently, such as Security Groups, consider deploying an automated remediation solution to protect against downtime events (see the sketch after this list).

  • Ugly Drift: Prevent it! To avoid suffering a data breach due to infrastructure drift, a solution that automatically detects and remediates drift events for critical resources is mandatory. You can’t afford to leave your critical data exposed for hours, days, or longer.
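For the bad and ugly categories, automated remediation can be as simple as reverting anything that isn’t in an approved baseline. As a hedged sketch (Python with boto3, continuing the hypothetical security-group baseline idea from earlier; the group ID and approved rules are placeholders), not a production-ready remediator:

```python
# Auto-remediation sketch: revoke ingress rules that are not in the
# approved baseline for a security group. Guard-rail this heavily
# (dry-run mode, allow-lists, alerting) before pointing it at production.
import json

import boto3

ec2 = boto3.client("ec2")


def revoke_unapproved_ingress(group_id: str, approved_rules: list) -> None:
    group = ec2.describe_security_groups(GroupIds=[group_id])["SecurityGroups"][0]
    approved = {json.dumps(r, sort_keys=True) for r in approved_rules}
    unapproved = [
        r for r in group["IpPermissions"]
        if json.dumps(r, sort_keys=True) not in approved
    ]
    if unapproved:
        # Removes only the offending rules; approved rules are untouched.
        ec2.revoke_security_group_ingress(GroupId=group_id, IpPermissions=unapproved)
```

The same pattern applies to the S3 case: detect the drift, then restore the known-good configuration automatically for the small set of resources where exposure is never acceptable.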
