How to Detect and Prevent Configuration Drift

Steps to successfully detect and prevent configuration drift


How to Detect Configuration Drift Across Your Organization

Regardless of how your organization approaches infrastructure — automated, manual, on-prem, cloud-based, or a combination — small, daily changes are inevitable. Your systems were built to be used and modified to fit both external and internal needs, meaning that they will change over time.

When multiple engineers and teams modify this infrastructure ad hoc, without following established change processes, these micro-changes can pile up quickly and create inconsistencies between your system's current configuration and the baseline it should match. This is how configuration drift happens: changes made outside the proper process gradually pull the infrastructure away from its intended state and cause problems over time.

What is configuration drift?

Configuration drift occurs when changes to a company’s infrastructure are not documented or performed correctly, creating the potential for the change to negatively affect the structural integrity of the infrastructure. The changes that cause drift aren’t always inherently bad, but because they cause the system to drift further away from the baseline of what it should be, these changes can end up causing security, compliance, or performance issues if left unaddressed.

While configuration drift happens to all types of infrastructure setups, companies with infrastructure as code (IaC) environments face several unique challenges in remediating and proactively preventing it. Some causes are mitigated by default, because IaC provides safeguards such as version control that make it less likely an undocumented change will be deployed. But the fast pace of automated cloud provisioning can still cause other issues. In DevOps, CI/CD, and other highly automated environments, drift can accumulate especially quickly simply because changes happen so often.

Ultimately, drift occurs in IaC when the infrastructure's current state doesn't match the coded IaC configuration. Engineers end up working from an outdated picture of the infrastructure while a different version runs on the live system.
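
To make the definition concrete, here is a minimal Python sketch that compares a resource's desired attributes (what the IaC declares) with its actual attributes (what is running) and reports every mismatch. The function name, attribute names, and values are hypothetical; real drift tools obtain the actual state from the cloud provider's APIs.

```python
from __future__ import annotations

from typing import Any


def diff_config(desired: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {attribute: (desired_value, actual_value)} for every mismatch.

    An attribute present on only one side is reported with None on the other.
    """
    drift: dict[str, tuple[Any, Any]] = {}
    for key in sorted(desired.keys() | actual.keys()):
        want, have = desired.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical example: someone resized the instance and added a tag by hand.
desired = {"instance_type": "t3.small", "monitoring": True}
actual = {"instance_type": "t3.large", "monitoring": True, "tags": {"owner": "ops"}}
print(diff_config(desired, actual))
# {'instance_type': ('t3.small', 't3.large'), 'tags': (None, {'owner': 'ops'})}
```

An empty result means the resource matches its baseline. Anything else is drift that should either be reverted or written back into the IaC.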

What are the causes of configuration drift?

Configuration drift can happen due to a variety of changes made to the structure of your system. A few of the most common causes of drift include:

Manual system changes. When someone manually creates or modifies resources outside of your established IaC system, such as Terraform or CloudFormation, those changes are not reflected in the coded configuration.
Authenticated applications. Applications and microservices that hold credentials to perform automated actions, such as running a script or writing to a bucket, can change resources outside of IaC, particularly when a bug causes them to behave other than intended. These automation errors can lead to drift.
Hidden or unseen changes in IaC. Unknown changes cause IaC environments to fall out of sync, leaving a gap between the configuration engineers are working with and what the system is actually doing in production.
Patches and upgrades. To release new patches or upgrades, engineers have to make a number of big or small changes to the system. Each of these alterations has the potential to cause drift.

What happens when drift goes unmanaged?

Configuration drift can cause serious problems for teams, especially those that use IaC. Once the live infrastructure diverges from the coded configuration, engineers are making decisions based on a version of the system that no longer exists. The larger this gap between the engineers' knowledge of the system and the actual system grows, the more the system's security posture, performance, and compliance suffer.

Security risks

When left unaddressed, drift can cause serious security problems and expose your systems to attackers. Ad hoc, undocumented changes can open unknown backdoors, and in many cases the real risk comes from not knowing about a change at all, which leaves the resulting vulnerability unaddressed. For example, the 2020 Twilio breach was due to configuration drift in an S3 bucket. An engineer applied a fix to an earlier problem but in the process caused the bucket's configuration to become insecure. Because nothing detected the change and rolled the bucket back to its original secure state, the drift went unnoticed for years and was eventually exploited by attackers to skim users' personal data.

Performance issues

Configuration drift causes more than security vulnerabilities; it can also lead to performance issues and downtime. When the current state of your infrastructure doesn't match the IaC code, the discrepancy can lead to over-provisioned workloads as well as unoptimized resources and processes. These small inconsistencies add up and can degrade performance across the system.

Compliance failures

IaC should serve as living documentation of what your current system looks like. When configuration drift makes the coded IaC inaccurate, changes made outside of it escape the security policies you enforce against that code. Say an engineer opens a port without reflecting the change in the IaC. Even if the change is harmless, the system's current state now drifts away from the state defined in IaC and the security policies built on it, and an auditor could flag this as a compliance failure.
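
The open-port scenario can be sketched as a set comparison between the ingress ports the IaC declares and the ports actually open on the live system. This is a minimal, hypothetical Python illustration: `check_ingress` and `PortDrift` are made-up names, and in practice the live port list would come from your cloud provider's API or a network scan.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PortDrift:
    opened_outside_iac: frozenset[int]  # open on the live system, absent from IaC
    closed_outside_iac: frozenset[int]  # declared in IaC, not open on the live system

    @property
    def compliant(self) -> bool:
        # Compliant only when live state and IaC agree exactly.
        return not self.opened_outside_iac and not self.closed_outside_iac


def check_ingress(declared_ports: Iterable[int], live_ports: Iterable[int]) -> PortDrift:
    """Compare the ingress ports declared in IaC with those actually open."""
    declared, live = frozenset(declared_ports), frozenset(live_ports)
    return PortDrift(opened_outside_iac=live - declared, closed_outside_iac=declared - live)


# Hypothetical example: IaC allows only HTTPS, but an engineer opened SSH by hand.
report = check_ingress(declared_ports={443}, live_ports={22, 443})
print(report.compliant)                   # False
print(sorted(report.opened_outside_iac))  # [22]
```

Checking both directions matters: a port closed by hand is also drift, even if it looks like a harmless tightening.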

How to detect configuration drift

Configuration drift management practices can help your team catch and remediate drift before it causes larger problems. There are several tools for reviewing your infrastructure, detecting drift, and suggesting actionable next steps for remediating it.

Unmanaged versus managed resources

As your team reviews configuration drift detection tools, it’s important to note that some solutions specialize in dealing with managed resources, while others are intended to tackle unmanaged ones. To detect drift within managed resources, such as Terraform resources, your team needs to use a solution that can regularly scan your identified assets for signs of drift.

To deal with unmanaged resources, on the other hand, your team needs to use a tool that can inventory any assets that are not currently documented and, as a result, not regularly checked for drift. After identifying unmanaged resources, you need to take steps to put them under IaC control.
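
The difference between managed and unmanaged resources can be illustrated by comparing two inventories: the resources recorded in IaC state and the resources that actually exist in the cloud. The sketch below is simplified and hypothetical. The resource IDs and attributes are made-up dictionaries, and it sorts each resource into one of four buckets similar to those drift tools report.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class DriftReport:
    in_sync: list[str] = field(default_factory=list)    # managed, matches the cloud
    drifted: list[str] = field(default_factory=list)    # managed, attributes differ
    missing: list[str] = field(default_factory=list)    # in IaC state, gone from the cloud
    unmanaged: list[str] = field(default_factory=list)  # in the cloud, not in IaC state


def classify(state: dict[str, dict[str, Any]], cloud: dict[str, dict[str, Any]]) -> DriftReport:
    """Sort every resource ID seen in either inventory into exactly one bucket."""
    report = DriftReport()
    for rid in sorted(state.keys() | cloud.keys()):
        if rid not in cloud:
            report.missing.append(rid)
        elif rid not in state:
            report.unmanaged.append(rid)
        elif state[rid] != cloud[rid]:
            report.drifted.append(rid)
        else:
            report.in_sync.append(rid)
    return report


# Hypothetical inventories keyed by made-up resource IDs.
state = {
    "logs-bucket": {"acl": "private"},
    "i-web": {"instance_type": "t3.small"},
    "i-old": {"instance_type": "t3.micro"},
}
cloud = {
    "logs-bucket": {"acl": "public-read"},
    "i-web": {"instance_type": "t3.small"},
    "temp-user": {"type": "iam_user"},
}
print(classify(state, cloud))
# DriftReport(in_sync=['i-web'], drifted=['logs-bucket'], missing=['i-old'], unmanaged=['temp-user'])
```

The `unmanaged` bucket is the list to bring under IaC control. The `drifted` and `missing` buckets are what a scan of managed resources surfaces.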

Configuration drift management tools

Terraform plan. This is a built-in Terraform command rather than a separate tool, and it can be run in any Terraform working directory. It locates drift in your managed resources, reports the changes made outside of Terraform, and shows a plan for bringing the infrastructure back in line with the configuration. However, it can only capture changes to managed resources, not unmanaged ones.
Snyk IaC. Our configuration drift management tool aims to make the drift detection process fast and simple for teams to implement. It can bring unmanaged resources under Terraform control to increase IaC coverage of your clouds. Our solution focuses on creating a partnership between development and security teams through user-friendly reporting.
CloudQuery. This open source cloud asset inventory tool can quickly enumerate cloud resources, detect unmanaged resources, and scan multiple state files. While helpful for visualizing your cloud assets, it does have a few downsides when it comes to remediating drift. CloudQuery can be inaccurate when searching for drift in S3 buckets, has limited support for backend state file storage, requires an SQL database, and does not support all Terraform resources.
Driftctl. This open source tool, maintained by Snyk, can detect, track, and alert on drift. It is versatile, since it can detect unmanaged resources and scan multiple state files, but it also leaves a few gaps. For instance, it doesn't support all Terraform resources, and some users have run into API throttling errors and long wait times when scanning in deep mode.
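
For managed resources, the Terraform plan check described above can be automated, for example in a scheduled job. The Python sketch below relies on the `-detailed-exitcode` flag, which makes `terraform plan` exit with 0 when there are no changes, 1 on error, and 2 when changes are present. `check_drift` is a hypothetical helper name, and the `run` parameter exists only so the function can be tested without a real Terraform project. One caveat: a plain plan also reports configuration changes that simply haven't been applied yet, so a non-empty plan indicates drift only when run against configuration that has already been applied.

```python
from __future__ import annotations

import subprocess
from typing import Callable

# Exit codes documented for `terraform plan -detailed-exitcode`.
NO_CHANGES, PLAN_ERROR, CHANGES_PRESENT = 0, 1, 2


def check_drift(workdir: str, run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Return True if `terraform plan` reports a non-empty diff, False if it is empty.

    Raises RuntimeError if the plan itself fails (for example, invalid
    configuration or missing credentials).
    """
    cmd = [
        "terraform", "plan",
        "-detailed-exitcode",  # encode the result in the exit code
        "-input=false",        # never prompt for input; suitable for automation
        "-no-color",           # plain output for logs
    ]
    result = run(cmd, cwd=workdir, capture_output=True, text=True)
    if result.returncode == NO_CHANGES:
        return False
    if result.returncode == CHANGES_PRESENT:
        return True
    raise RuntimeError(f"terraform plan failed (exit {result.returncode}): {result.stderr}")
```

Calling `check_drift("path/to/project")` against an initialized working directory returns a boolean that a job can use to alert or open a ticket. As the list notes, this only covers managed resources; unmanaged ones need an inventory tool.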

Configuration drift will inevitably happen as systems evolve and change over time. But if your organization focuses on identifying and correcting drift as soon as it happens, it is far less likely to cause issues further down the pipeline, saving time and resources in the process. We created Snyk IaC to help organizations manage and prevent drift. If you'd like to see it in action, reach out to us for a demo today.
