Improving coverage of cloud resources to reduce infrastructure drift
Stephane Jourdan
March 23, 2022
Deprecation notice: Drift detection of managed resources
Drift detection of managed resources, including snyk iac describe --only-managed and snyk iac describe --drift, has been deprecated. The end-of-life date for drift detection of managed resources is September 30, 2023.
As developers, we need maximum visibility of what’s actually running in our cloud environments, in order to keep them secure. Infrastructure as code (IaC) helps developers automate their cloud infrastructures, so what’s deployed to the cloud is under control and can easily be audited. But achieving and maintaining 100% IaC coverage of your infrastructure has many challenges.
We’re only as secure as what is actually deployed and running in our cloud environments, and more often than not, a lot of manual actions are still done on a regular basis, by us, other teams, or some authenticated services. Those changes are hidden from IaC and auditing, bringing issues like misconfigurations and security concerns. That’s when drift management becomes important: we want reports of resources that are not yet under IaC control, or that have changed for some reason.
In this article, we will show how Snyk IaC helps developers to discover cloud resources that are not under infrastructure as code (IaC) control (unmanaged resources), or that have drifted from their expected state (managed resources).
Set up the environment
Snyk IaC helps you list the resources it finds as Terraform resources, so you can easily know which part of the cloud service is targeted by the discovery. For example, a single Amazon API Gateway v2 service is made of at least 12 Terraform resources. With discovery information provided by Snyk, you'll be able to promptly decide whether to revert the modification, import a new resource, or simply delete that new change.
To follow along, you can use the Terraform file below to create three AWS resources that we will use in this walkthrough: an IAM user named "user1" with a random suffix, an access key, and an attached policy for read-only access.
At the time of writing, we used Terraform v1.1.7 with the AWS provider v3.74.2.
Reuse the following HCL configuration:
main.tf
resource "random_string" "prefix" {
  length  = 6
  upper   = false
  special = false
}

resource "aws_iam_user" "user1" {
  name = "user1-${random_string.prefix.result}"

  tags = {
    Name   = "user1-${random_string.prefix.result}"
    manual = "true"
  }
}

resource "aws_iam_access_key" "user1" {
  user = aws_iam_user.user1.name
}

resource "aws_iam_user_policy_attachment" "user1" {
  user       = aws_iam_user.user1.name
  policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
}
Apply that Terraform configuration:
$ terraform init
[...]
$ terraform apply
[...]
Confirm you have a terraform.tfstate at the root of the directory:
$ ls -al terraform.tfstate
-rw-r--r-- 1 sjourdan staff 5049 Mar 16 18:31 terraform.tfstate
Also confirm that the IAM user is successfully created on AWS.
Getting started with a clean slate
Let's start by listing all the cloud resources that are not under Terraform control:
$ snyk iac describe --only-unmanaged
You'll likely end up with a huge list of resources that are not under Terraform control. It's already great information, but not very actionable for our case. Snyk IaC has a built-in way to ignore resources in bulk, by adding all found resources to the .snyk policy file.
Let's ignore all those existing unmanaged resources, so we can work more precisely in a controlled environment with just the three resources we created above:
$ snyk iac describe --only-unmanaged --json | snyk iac update-exclude-policy
Scan again to confirm that your environment is now ignoring the discovered drifts (you'll have plenty of time later to schedule importing them).
$ snyk iac describe --only-unmanaged

Scanned states (1)
Found 3 resource(s)
 - 100% coverage
Congrats! Your infrastructure is fully in sync.
We're now ready to start from a clean state.
Let's drift with IAM!
We will now create three types of drift to simulate real-life situations:
A modification on the existing IAM user (that we will want to revert)
A manual attachment of a new IAM policy (that we will want to remove)
A new IAM user (that we will want to import)
To do so, navigate to the AWS Console for IAM.
Modify the existing IAM user by adding a tag
On the IAM users page, click on "user1"
Click on the Tags tab
Click on the Edit Tags button
Add a new key ("environment") and a new value ("production")
Click Save
Attach a powerful policy to the existing IAM user
On the IAM users page, click on "user1"
Click on the Permissions tab
Click on the Add permissions button
Click on Attach existing policies directly
Select Administrator Access
Click Next: Review
Validate by clicking on Add permissions
Create another IAM user manually
On the IAM users page, click on the Add Users button
Enter "user2" in the User name: field
Select Access key
Click on the Next: Permissions button
Don’t set any permissions or tags
Click on Create user (we don't care about the displayed credentials, so you can discard them).
We're now ready to tackle those three types of manual changes using Snyk IaC drift detection.
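For readers who prefer a terminal to the console, the three manual changes above could be sketched with the AWS CLI. This is only a sketch under assumptions: the AWS CLI is installed and authenticated with IAM write permissions, create_drift is a hypothetical helper name, and its argument is the suffixed user name that Terraform generated for "user1".

```shell
# Sketch: reproduce the three manual drifts with the AWS CLI instead of the console.
# create_drift is a hypothetical helper; pass the real suffixed name of "user1".
create_drift() {
  local user1="$1"   # Terraform-managed user name, including its random suffix

  # 1. Add a tag to the managed user (a change on a managed resource).
  aws iam tag-user --user-name "$user1" --tags Key=environment,Value=production

  # 2. Attach AdministratorAccess directly to the managed user.
  aws iam attach-user-policy --user-name "$user1" \
    --policy-arn arn:aws:iam::aws:policy/AdministratorAccess

  # 3. Create a second user and an access key outside Terraform.
  #    create-access-key prints the new credentials; discard them as in the console flow.
  aws iam create-user --user-name user2
  aws iam create-access-key --user-name user2
}
```

It would be called as create_drift followed by the full name of the managed user, as shown on the IAM users page.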
Managed and unmanaged infrastructure drift
Let's now find out how those changes are detected by Snyk IaC, and start with resources that are simply not managed at all by Terraform.
$ snyk iac describe --only-unmanaged

Scanned states (1)
Found resources not covered by IaC:
  aws_iam_access_key:
    - AKIASBXWQ3AYQETE6OFR
        User: user2
  aws_iam_policy_attachment:
    - user1-84i30k-arn:aws:iam::aws:policy/AdministratorAccess
  aws_iam_user:
    - user2
Found 6 resource(s)
 - 50% coverage
 - 3 resource(s) managed by Terraform
 - 3 resource(s) not managed by Terraform
 - 0 resource(s) found in a Terraform state but missing on the cloud provider
This scan reported, in Terraform resource terms:
The manually created IAM "user2", with its IAM access key
The IAM policy manually attached to the Terraform-managed "user1" IAM user
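The coverage figure in the summary is consistent with the share of discovered resources that are managed by Terraform (3 out of 6 here). A minimal sketch of that arithmetic follows, assuming coverage is simply managed divided by total; coverage is a hypothetical helper, and it uses integer division because the article does not show how the tool rounds.

```shell
# Sketch: coverage as the percentage of discovered resources managed by Terraform.
# Assumption: coverage = managed / total, as the scan summaries in this article suggest.
coverage() {
  local managed="$1" total="$2"
  if [ "$total" -eq 0 ]; then
    echo "coverage: total must be greater than 0" >&2
    return 1
  fi
  # Integer division: the result is rounded down to a whole percent.
  echo $(( 100 * managed / total ))
}
```

With the figures from the scan above, coverage 3 6 prints 50.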
Let's now check for changes only on resources managed by Terraform that are found in the various Terraform states:
$ snyk iac describe --only-managed
Scanned states (1)
Found changed resources:
  From tfstate://terraform.tfstate
    - user1-84i30k (aws_iam_user.user1):
        + tags.environment: <nil> => "production"
Found 5 resource(s)
 - 100% coverage
 - 5 resource(s) managed by Terraform
     - 1/5 resource(s) out of sync with Terraform state
 - 0 resource(s) found in a Terraform state but missing on the cloud provider
This scan reported a very different output, and took significantly longer (36s versus 9s for the "unmanaged" scan mode).
Using this output we learn that the IAM user named "user1-84i30k", which we can find in the HCL (as a resource) under the name "user1", has a tag named "environment" set to "production".
Action plan
The Snyk drift detection tool helped us discover four unexpected differences between our expectations and reality. For the sake of this article, let's say the team decides the following:
"user2" IAM user is used in production and should be imported in Terraform.
"user2" IAM Access Key should be rotated for security reasons.
"user1" should under no circumstances be an Administrator.
"user1" new tag is needed by some requirement and should be imported in Terraform.
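What | Resource type | Name | Drift type | Action |
---|---|---|---|---|
An IAM user | aws_iam_user | user2 | Unmanaged | IMPORT |
An IAM access key | aws_iam_access_key | AKIASBXWQ3AYQETE6OFR | Unmanaged | ROTATE |
An attached IAM policy | aws_iam_policy_attachment | user1-84i30k-arn:aws:iam::aws:policy/AdministratorAccess | Unmanaged | DELETE |
A tag on an IAM user | aws_iam_user | user1-84i30k | Managed | IMPORT |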
Deployment pipelines are not remediation
We have a great Terraform deployment pipeline in place, and the next time terraform apply is triggered, we might expect things to go back to normal.
In this case, what will Terraform do? Here is what a deployment job reports:
$ terraform apply
Terraform will perform the following actions:

  # aws_iam_user.user1 will be updated in-place
  ~ resource "aws_iam_user" "user1" {
        id   = "user1-84i30k"
        name = "user1-84i30k"
      ~ tags = {
          - "environment" = "production" -> null
            # (1 unchanged element hidden)
        }
[...]

Plan: 0 to add, 1 to change, 0 to destroy.
Terraform was never meant to discover manually created or attached resources, and will simply revert the modified ones to the original state (which is not what we want in this situation).
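What | Resource type | Name | Drift type | Action |
---|---|---|---|---|
An IAM user | aws_iam_user | user2 | Unmanaged | NONE |
An IAM access key | aws_iam_access_key | AKIASBXWQ3AYQETE6OFR | Unmanaged | NONE |
An attached IAM policy | aws_iam_policy_attachment | user1-84i30k-arn:aws:iam::aws:policy/AdministratorAccess | Unmanaged | NONE |
A tag on an IAM user | aws_iam_user | user1-84i30k | Managed | REVERT |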
In none of these cases do we get the help we expect:
The manually created IAM user and its access key are not reported (unhelpful)
The Administrator policy manually attached to a managed user is not reported (unhelpful)
The important tag manually added to a managed user will be reverted (harmful)
A different type of tool is needed for this type of detection and work.
Improving our coverage
We start our journey at 50% coverage for the unmanaged resources:
$ snyk iac describe --only-unmanaged

Scanned states (1)
Found resources not covered by IaC:
  aws_iam_access_key:
    - AKIASBXWQ3AYQETE6OFR
        User: user2
  aws_iam_policy_attachment:
    - user1-84i30k-arn:aws:iam::aws:policy/AdministratorAccess
  aws_iam_user:
    - user2
Found 6 resource(s)
 - 50% coverage
 - 3 resource(s) managed by Terraform
 - 3 resource(s) not managed by Terraform
 - 0 resource(s) found in a Terraform state but missing on the cloud provider
Let's improve this based on the team plan.
Delete the IAM policy for 'user1'
Let's start with the most urgent and easiest: removing the "Administrator" policy for the managed IAM "user1":
Go to IAM > Users > "user1"
Click on Permissions > delete "AdministratorAccess"
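The same removal could be sketched with the AWS CLI. This assumes an installed and authenticated AWS CLI; detach_admin_policy is a hypothetical helper name, and its argument is the full suffixed name of the managed user.

```shell
# Sketch: detach the manually attached AdministratorAccess policy from the managed user.
# detach_admin_policy is a hypothetical helper; pass the real suffixed user name.
detach_admin_policy() {
  local user1="$1"

  aws iam detach-user-policy --user-name "$user1" \
    --policy-arn arn:aws:iam::aws:policy/AdministratorAccess

  # List what is still attached, to check that only the Terraform-managed
  # ReadOnlyAccess policy remains.
  aws iam list-attached-user-policies --user-name "$user1"
}
```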
Then scan again:
$ snyk iac describe --only-unmanaged
Scanned states (1)
Found resources not covered by IaC:
  aws_iam_access_key:
    - AKIASBXWQ3AYQETE6OFR
        User: user2
  aws_iam_user:
    - user2
Found 5 resource(s)
 - 60% coverage
 - 3 resource(s) managed by Terraform
 - 2 resource(s) not managed by Terraform
We're now covering 60% of our AWS resources, up from 50%.
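What | Resource type | Name | Drift type | Action | Status |
---|---|---|---|---|---|
An IAM user | aws_iam_user | user2 | Unmanaged | IMPORT | |
An IAM access key | aws_iam_access_key | AKIASBXWQ3AYQETE6OFR | Unmanaged | ROTATE | |
An attached IAM policy | aws_iam_policy_attachment | user1-84i30k-arn:aws:iam::aws:policy/AdministratorAccess | Unmanaged | DELETE | * |
A tag on an IAM user | aws_iam_user | user1-84i30k | Managed | ADD | |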
Let's continue.
Unblock the Terraform deployment pipeline
The pipeline is currently blocked by this manual change to the tags for aws_iam_user.user1. If any deployment happens, the tags will be reverted to what's in the HCL. So what’s the solution? Use the Snyk IaC drift output to adapt our Terraform configuration.
The information we have is the following:
Found changed resources:
  From tfstate://terraform.tfstate
    - user1-84i30k (aws_iam_user.user1):
        + tags.environment: <nil> => "production"
We know from this output that:
We're looking for a resource of type aws_iam_user named "user1"
That resource is found in terraform.tfstate (very handy when you have dozens or hundreds of states)
There's a new tag key named environment with a value of "production"
Let's update our IAM user resource by simply adding environment = "production", so our resource now looks like this:
resource "aws_iam_user" "user1" {
  name = "user1-${random_string.prefix.result}"

  tags = {
    Name        = "user1-${random_string.prefix.result}"
    manual      = "true"
    environment = "production"
  }
}
We can now safely unblock our Terraform deployment pipeline:
$ terraform apply
No changes. Your infrastructure matches the configuration.
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
We have fixed our "managed" drifts for now:
$ snyk iac describe --only-managed
Scanned states (1)
Found 3 resource(s)
 - 100% coverage
Congrats! Your infrastructure is fully in sync.
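What | Resource type | Name | Drift type | Action | Status |
---|---|---|---|---|---|
An IAM user | aws_iam_user | user2 | Unmanaged | IMPORT | |
An IAM access key | aws_iam_access_key | AKIASBXWQ3AYQETE6OFR | Unmanaged | ROTATE | |
An attached IAM policy | aws_iam_policy_attachment | user1-84i30k-arn:aws:iam::aws:policy/AdministratorAccess | Unmanaged | DELETE | * |
A tag on an IAM user | aws_iam_user | user1-84i30k | Managed | ADD | * |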
Import and rotate IAM user2
Let's now handle the "user2" case. We want to:
Import it into Terraform
Rotate the key
Let's start by importing the IAM user into Terraform, and here's one simple way to do it.
Start by collecting the information from Snyk IaC:
Resource type | Name |
---|---|
aws_iam_user | user2 |
How do we import an aws_iam_user resource? According to the official Terraform documentation: IAM Users can be imported using the name, e.g., $ terraform import aws_iam_user.lb loadbalancer.
We can also read that the only required argument is the name. So let's add this basic structure to our HCL file:
resource "aws_iam_user" "user2" {
  name = "user2" # required
}
Let's now import this user into Terraform:
$ terraform import aws_iam_user.user2 user2
aws_iam_user.user2: Importing from ID "user2"...
aws_iam_user.user2: Import prepared!
  Prepared aws_iam_user for import
aws_iam_user.user2: Refreshing state... [id=user2]

Import successful!
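After an import, it can be useful to check whether the HCL matches the freshly imported state before the next pipeline run. The walkthrough itself does not run this step; the following is a minimal sketch, assuming terraform is on the PATH and the command runs from the directory that contains main.tf. plan_status is a hypothetical helper name. It relies on the -detailed-exitcode flag of terraform plan, which exits with 0 when there is no diff, 1 on error, and 2 when changes are pending.

```shell
# Sketch: summarize whether the configuration and the state are in sync.
# plan_status is a hypothetical helper built on `terraform plan -detailed-exitcode`.
plan_status() {
  local rc=0
  # Plan output is discarded; only the exit code is used here.
  terraform plan -detailed-exitcode -input=false >/dev/null || rc=$?
  case "$rc" in
    0) echo "in-sync" ;;          # plan succeeded, empty diff
    2) echo "changes-pending" ;;  # plan succeeded, non-empty diff
    *) echo "plan-error" ;;       # plan failed
  esac
}
```

If it prints changes-pending, running terraform plan directly shows which arguments still differ between the HCL and the imported resource.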
How did our coverage evolve? Let's find out:
$ snyk iac describe --only-unmanaged
Scanned states (1)
Found resources not covered by IaC:
  aws_iam_access_key:
    - AKIASBXWQ3AYQETE6OFR
        User: user2
Found 5 resource(s)
 - 80% coverage
 - 4 resource(s) managed by Terraform
 - 1 resource(s) not managed by Terraform
We're now at 80% coverage (from 60%) and only one resource is left.
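What | Resource type | Name | Drift type | Action | Status |
---|---|---|---|---|---|
An IAM user | aws_iam_user | user2 | Unmanaged | IMPORT | * |
An IAM access key | aws_iam_access_key | AKIASBXWQ3AYQETE6OFR | Unmanaged | ROTATE | |
An attached IAM policy | aws_iam_policy_attachment | user1-84i30k-arn:aws:iam::aws:policy/AdministratorAccess | Unmanaged | DELETE | * |
A tag on an IAM user | aws_iam_user | user1-84i30k | Managed | ADD | * |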
Rotate the key
Let's tackle the access key now. We want to rotate the key and bring it under Terraform control. We'll first declare a new key in the HCL so that Terraform creates it (so we can hand it to the relevant team, for example), and then delete the old one from AWS.
Terraform documentation for aws_iam_access_key is very straightforward, so we can simply create one resource that takes the user2 name as argument:
resource "aws_iam_access_key" "user2" {
  user = aws_iam_user.user2.name
}
As the deployment pipeline was previously unblocked, we can safely apply this using Terraform to create a new key:
$ terraform apply
[...]
Terraform will perform the following actions:

  # aws_iam_access_key.user2 will be created
  + resource "aws_iam_access_key" "user2" {
      + create_date          = (known after apply)
      + encrypted_secret     = (known after apply)
      + id                   = (known after apply)
      + key_fingerprint      = (known after apply)
      + secret               = (sensitive value)
      + ses_smtp_password_v4 = (sensitive value)
      + status               = "Active"
      + user                 = "user2"
    }

Plan: 1 to add, 0 to change, 0 to destroy.

aws_iam_access_key.user2: Creating...
aws_iam_access_key.user2: Creation complete after 1s [id=AKIASBXWQ3AY4KPUNIHZ]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
We still have the old key to remove. From the Snyk IaC output, we know that its access key ID is AKIASBXWQ3AYQETE6OFR.
The simplest way to remove this key is to:
Go to IAM > Users > user2 > Security Credentials
Remove the key AKIASBXWQ3AYQETE6OFR, as reported by Snyk IaC, by deactivating it and then deleting it.
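The deactivate-then-delete sequence could also be sketched with the AWS CLI. This assumes an installed and authenticated AWS CLI; retire_access_key is a hypothetical helper name that takes the user name and the access key ID reported by the scan.

```shell
# Sketch: retire an old access key by deactivating it first, then deleting it.
# retire_access_key is a hypothetical helper: retire_access_key USER KEY_ID
retire_access_key() {
  local user="$1" key_id="$2"

  # Deactivate first, so anything still using the key fails before it is gone for good.
  # The delete only runs if the deactivation succeeded.
  aws iam update-access-key --user-name "$user" --access-key-id "$key_id" --status Inactive &&
    aws iam delete-access-key --user-name "$user" --access-key-id "$key_id"
}
```

In this walkthrough it would be called as retire_access_key user2 AKIASBXWQ3AYQETE6OFR. In a real rotation, leaving some time between the two steps gives consumers of the old key a chance to surface.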
What does our coverage look like now?
$ snyk iac describe --only-unmanaged
Scanned states (1)
Found 5 resource(s)
 - 100% coverage
Congrats! Your infrastructure is fully in sync.
Congratulations! Everything is back under control, thanks to Snyk IaC drift detection!
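What | Resource type | Name | Drift type | Action | Status |
---|---|---|---|---|---|
An IAM user | aws_iam_user | user2 | Unmanaged | IMPORT | * |
An IAM access key | aws_iam_access_key | AKIASBXWQ3AYQETE6OFR | Unmanaged | ROTATE | * |
An attached IAM policy | aws_iam_policy_attachment | user1-84i30k-arn:aws:iam::aws:policy/AdministratorAccess | Unmanaged | DELETE | * |
A tag on an IAM user | aws_iam_user | user1-84i30k | Managed | ADD | * |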
Wrapping Up
In this article, we showed how Snyk IaC drift detection helps discover manually created AWS resources and reports them in Terraform terms, with the information developers need to import those resources into their Terraform HCL code. We also saw that automatically reverting changes is not always the desired outcome, and that a lightweight drift detection alerting system is needed alongside the deployment pipeline.
We firmly believe all infrastructure should be in code, so engineers can have security feedback and visibility into issues as soon as possible.
That’s why Snyk IaC helps teams quickly bring the resources actually running in their AWS account back into Terraform code, increasing IaC coverage and reducing security issues. Snyk IaC drives faster fixes by closing the feedback loop between cloud security and engineering teams and by reporting actionable fixes directly to engineers, in engineer-friendly terms.