Infrastructure drift and drift detection explained
Expectations do not always line up with reality. If you’ve started using infrastructure as code (IaC) to manage your infrastructure, you’re already on your way to making your cloud provisioning processes more secure. But there’s a second piece to the infrastructure lifecycle — how do you know what resources are not yet managed by IaC in your cloud? And of the managed resources, do they remain the same in the cloud as when you defined them in code?
Changes to cloud workloads happen all the time. As the number of workloads running in the cloud grows, so does the number of people and authenticated services interacting with infrastructure across several cloud environments. As IaC becomes more widely adopted and IaC codebases grow larger, it becomes increasingly difficult to track changes or ensure that manual configuration changes are accounted for. That’s why drift detection is important: it secures what is automated after it’s been deployed and is living in the cloud.
In this blog, we’ll cover what infrastructure drift is, the causes of drift, and tips for managing drift — whether you’re a solo developer or a large organization.
What is infrastructure drift?
Infrastructure drift, or what we refer to as “drift” here, is when the real-time state of infrastructure doesn’t match what’s defined in your IaC configuration. This difference between what’s defined in code and what exists in the cloud can happen for many reasons.
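To make the definition concrete, here is a minimal Python sketch that compares the attributes declared in code with the attributes observed in the cloud. It is an illustration under simple assumptions, not how any particular tool works: the function name detect_drift and the attribute names (acl, versioning, logging) are hypothetical, and both states are assumed to be already loaded into flat dictionaries.

```python
from typing import Any


def detect_drift(declared: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return every attribute whose live value differs from the declared one.

    Each entry maps the attribute name to (declared_value, actual_value).
    None stands in for an attribute that is absent on one side.
    """
    drift = {}
    for key in sorted(declared.keys() | actual.keys()):
        want = declared.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical example: a bucket declared private was made public in the console,
# and a logging attribute exists in the cloud that the code never mentions.
declared_state = {"acl": "private", "versioning": True}
actual_state = {"acl": "public-read", "versioning": True, "logging": False}

for attribute, (want, have) in detect_drift(declared_state, actual_state).items():
    print(f"{attribute}: declared={want!r} actual={have!r}")
```

An empty result means the resource matches its definition; anything else is drift that someone needs to either revert in the cloud or codify in IaC.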
A benefit of using IaC, or of having higher IaC coverage of cloud resources, is that it minimizes instances of drift. Defining intended configurations and security best practices ahead of deployment reduces the need for configuration changes to be made later in your cloud console. Still, some changes are inevitable due to emergencies or human error.
Drift can be caused by human input, poor configuration, applications making unwanted changes, and more. Two of the most common causes of drift are linked to process or workflow issues: manual changes in a cloud console that are never transposed into code, and changes applied to one environment but not propagated to others.
Common causes of infrastructure drift:
- Manual changes: Someone goes to the console and manually creates or modifies resources outside of Terraform, CloudFormation, or other IaC tools
- Authenticated applications: Microservices not behaving as they’re supposed to
- Out-of-sync IaC environments: Hidden or unseen changes from environment to environment
What is drift detection?
Drift detection is the continuous process of detecting drift in your cloud infrastructure managed by IaC, especially deviations from your IaC that pose a security risk to your organization. Ideally, a drift detection tool reports its results in a developer-friendly manner (e.g. a Terraform resource reported directly with proper formatting) to help developers quickly understand the issue and fix it in their deployed infrastructure.
You can think of it as the second phase in IaC security. You’re finding misconfiguration issues and enforcing security guardrails in IaC during development and build pipelines, but also after things are pushed to production.
“Every drift event causes uncertainty, a resolution time, and a potential security issue.” - DevOps interviewee
Drift detection is important as you’re only as secure as what is actually deployed and running in your cloud environments. There’s often a false sense of security when you’ve automated management of your infrastructure with IaC, but things change! And that’s why we need drift detection. It’s important to keep up with what’s automated and to build security through your entire infrastructure lifecycle, from the moment an IaC configuration is written to after it’s deployed in the cloud.
What happens if drift goes unmanaged?
Intentionally or not, a developer can do a whole lot of damage with just IAM keys and an SDK. Being able to catch bad decisions quickly and reverse the situation back to a healthy and secure state is crucial.
Some bad outcomes of infrastructure configuration drift include…
- Data breaches: Drift can leave critical data exposed.
- Application downtime: Drift can cause applications to crash.
- Deployment failures: Drift can cause your deployment to fail.
In each of these cases, increasing IaC coverage (making sure a greater percentage of infrastructure is managed by IaC) can help minimize drift and remediate issues faster than if the resources were not managed by IaC. Compared with infrastructure deployments automated by IaC, manually configured (or unmanaged) resources take more time to set up and are prone to errors. With IaC, you can standardize the setup of infrastructure, reducing the possibility of setup errors or deleted dependencies (such as a missing security group rule or IAM policy).
In the case of data breaches, if all resources are managed by IaC, you could standardize security controls and prevent or mitigate issues such as an S3 bucket becoming publicly available. In an instance of downtime, you could more efficiently track your infrastructure and redeploy the last healthy version before the disaster happened.
Drift detection vs. drift management
Drift management is a more holistic approach to reducing the risk of drift and enabling quick remediation. It combines detecting drift in your managed resources with detecting unmanaged resources in your cloud environments so that they can be brought under control.
In an ideal world, security and development teams could have 100% IaC coverage of their cloud resources, and a workflow would look something like this: an unmanaged resource is detected, imported as code, then tested and brought up to a healthy and secure state according to your organization’s defined IaC security best practices and compliance policies.
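The first step of that workflow, finding unmanaged resources and measuring IaC coverage, can be sketched as a set comparison. This is an illustrative Python sketch only: the function name iac_coverage and the resource identifiers are hypothetical, and it assumes you already have one list of resource IDs tracked by IaC and one list discovered in the cloud.

```python
def iac_coverage(managed_ids: set[str], cloud_ids: set[str]) -> tuple[set[str], float]:
    """Return (unmanaged resources, coverage percentage) for one cloud environment.

    managed_ids: resource identifiers tracked by IaC.
    cloud_ids:   resource identifiers actually found in the cloud.
    """
    unmanaged = cloud_ids - managed_ids
    if not cloud_ids:
        # Nothing deployed, so nothing is left uncovered.
        return unmanaged, 100.0
    covered = len(cloud_ids & managed_ids)
    return unmanaged, 100.0 * covered / len(cloud_ids)


# Hypothetical identifiers, for illustration only.
managed = {"aws_s3_bucket.logs", "aws_iam_role.app"}
in_cloud = {
    "aws_s3_bucket.logs",
    "aws_iam_role.app",
    "aws_security_group.tmp",
    "aws_s3_bucket.scratch",
}

unmanaged, coverage = iac_coverage(managed, in_cloud)
print(f"IaC coverage: {coverage:.0f}%")
for resource in sorted(unmanaged):
    print(f"unmanaged: {resource}")
```

Each unmanaged identifier is a candidate to import as code and then test against your security policies.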
Keeping infrastructure secure & in sync with code
A comprehensive recipe for IaC security looks like this:
- Increase your IaC coverage of cloud resources across cloud environments.
- Adopt an IaC security tool to scan your configurations during development and build pipelines to catch any misconfig issues early and pass through security reviews.
- Leverage your IaC (Terraform or AWS CloudFormation) to detect synchronized infrastructure.
- Employ an open-source drift detection tool (driftctl) to catch drift issues in production and report drift results to your developers, in developer terms.
- Take action on driftctl findings by having developers write the missing code and import the resources as-is into Terraform.
- Close the feedback loop by using your IaC security tool (or snyk iac test) to secure those newly created Terraform configurations.
- Rinse and repeat until happy with the coverage, per region if needed.
- Finally, build as many recurring jobs as needed for alerting (for example, an hourly check for any IAM change and a daily check for less critical cloud services).
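The last step, recurring checks at different frequencies, can be sketched as a small scheduling helper. This Python sketch is an assumption-laden illustration: the service names, the intervals, and the function due_checks are hypothetical, and it only decides which checks are due. Running the actual drift scan and wiring this into cron or a CI scheduler is left out.

```python
from datetime import datetime, timedelta

# Hypothetical intervals: critical services are checked more often.
CHECK_INTERVALS = {
    "iam": timedelta(hours=1),  # hourly check for any IAM change
    "s3": timedelta(days=1),    # daily check for a less critical service
}


def due_checks(last_run: dict[str, datetime], now: datetime) -> list[str]:
    """Return the services whose drift check is due at time `now`.

    A service that has never been checked is always due.
    """
    due = []
    for service, interval in CHECK_INTERVALS.items():
        previous = last_run.get(service)
        if previous is None or now - previous >= interval:
            due.append(service)
    return sorted(due)


now = datetime(2024, 1, 2, 12, 0)
last_run = {
    "iam": datetime(2024, 1, 2, 11, 30),  # 30 minutes ago, not due yet
    "s3": datetime(2024, 1, 1, 11, 0),    # 25 hours ago, due
}
print(due_checks(last_run, now))
```

Keeping the intervals in one table makes it easy to tighten the schedule for a service once it proves to be a frequent source of drift.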
Considerations when mitigating drift
There are more than a few drift detection and drift management tools out there these days. There are some things to consider when you’re choosing your tool.
When making a decision, ask what level of access you’re giving the tool (e.g. full access, read-only access, or a least-privileged policy). Certain tools like Terraform require fully authenticated access, while others require only read-only access. Driftctl (mentioned above) operates on least-privileged access, the minimum needed to detect drift.
Drift management with Snyk IaC
If you’re interested in bringing unmanaged resources under IaC control as well as detecting drift of your managed resources, Snyk is doing just this. Drift management in Snyk IaC helps you secure infrastructure faster by reporting issues and fixes directly to developers, in developer-friendly terms. By building a faster feedback loop between cloud security and development teams, developers will be empowered to own their Terraform from code to cloud and secure infrastructure configurations post-deployment. The second part of this is surfacing unmanaged resources across cloud environments, so you can bring them under IaC control and reduce the risk of drift from the start.