Four steps for hardening Amazon EKS security
Kamil Potrec
July 21, 2021
In the first part of this blog series, we explored deploying Amazon EKS with Terraform, and looked at how to secure the initial RBAC implementation along with securing the Instance Metadata Service. In this second post, we’ll look at more best practices to harden Amazon EKS security, including the importance of dedicated continuous delivery IAM roles, multi-account architecture for Amazon EKS cluster isolation, and how to encrypt your secrets in the control plane. And finally, we will show how to incorporate static analysis tooling in your CD pipelines to catch these issues before they reach production environments.
Keep track of who deployed the cluster
The Amazon EKS service integrates with the Identity and Access Management (IAM) service to authenticate users to the cluster. Authorization decisions are performed by the native Kubernetes RBAC model, whose configuration is mapped to IAM identities via the aws-auth ConfigMap. However, there is a special mapping that is not visible in any configuration file or setting: the IAM entity used to create the cluster is automatically mapped to the system:masters built-in Kubernetes group. This is well documented in the user documentation, but it still has significant implications for the auditability of cluster permissions.
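Because this creator mapping never appears in aws-auth, a useful compensating control is to make every other privileged binding explicit in code. As a hypothetical sketch, assuming the terraform-aws-modules/eks module and a dedicated deployment role named ci-eks-test (both placeholders), the map_roles input records the binding in version control:

# Hypothetical sketch: record privileged aws-auth mappings in code.
# The module source and the ci-eks-test role name are assumptions.
module "eks" {
  source = "terraform-aws-modules/eks/aws"

  # ... cluster configuration ...

  map_roles = [
    {
      rolearn  = "arn:aws:iam::123456789012:role/ci-eks-test"
      username = "ci-eks-test"
      groups   = ["system:masters"]
    },
  ]
}

With the mapping in source control, a reviewer can at least see which role is expected to hold system:masters, even though the implicit creator mapping itself cannot be expressed here.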
To showcase the impact of this configuration, we will delete all Roles and ClusterRoles from our demo cluster.
dev@pwnbox:~$ kubectl delete clusterroles --all
clusterrole.rbac.authorization.k8s.io "admin" deleted
clusterrole.rbac.authorization.k8s.io "aws-node" deleted
clusterrole.rbac.authorization.k8s.io "cluster-admin" deleted
clusterrole.rbac.authorization.k8s.io "edit" deleted
clusterrole.rbac.authorization.k8s.io "eks:addon-manager" deleted
clusterrole.rbac.authorization.k8s.io "eks:fargate-manager" deleted
clusterrole.rbac.authorization.k8s.io "eks:node-bootstrapper" deleted
clusterrole.rbac.authorization.k8s.io "eks:node-manager" deleted
clusterrole.rbac.authorization.k8s.io "eks:podsecuritypolicy:privileged" deleted
<OMITTED>
dev@pwnbox:~$ kubectl delete roles --all-namespaces --all
role.rbac.authorization.k8s.io "system:controller:bootstrap-signer" deleted
role.rbac.authorization.k8s.io "eks:addon-manager" deleted
role.rbac.authorization.k8s.io "eks:certificate-controller" deleted
role.rbac.authorization.k8s.io "eks:fargate-manager" deleted
role.rbac.authorization.k8s.io "eks:node-manager" deleted
<OMITTED>
dev@pwnbox:~$ kubectl auth can-i --list
Resources                                       Non-Resource URLs   Resource Names   Verbs
*.*                                             []                  []               [*]
                                                [*]                 []               [*]
selfsubjectaccessreviews.authorization.k8s.io   []                  []               [create]
selfsubjectrulesreviews.authorization.k8s.io    []                  []               [create]
                                                [/api/*]            []               [get]
                                                [/api]              []               [get]
                                                [/apis/*]           []               [get]
                                                [/apis]             []               [get]
                                                [/healthz]          []               [get]
                                                [/healthz]          []               [get]
                                                [/livez]            []               [get]
                                                [/livez]            []               [get]
                                                [/openapi/*]        []               [get]
                                                [/openapi]          []               [get]
                                                [/readyz]           []               [get]
                                                [/readyz]           []               [get]
                                                [/version/]         []               [get]
                                                [/version/]         []               [get]
                                                [/version]          []               [get]
                                                [/version]          []               [get]
dev@pwnbox:~$ kubectl get configmap --all-namespaces
NAMESPACE     NAME                                 DATA   AGE
kube-system   coredns                              1      7h59m
kube-system   cp-vpc-resource-controller           0      7h58m
kube-system   eks-certificates-controller          0      7h59m
kube-system   extension-apiserver-authentication   6      7h59m
kube-system   kube-proxy                           1      7h59m
kube-system   kube-proxy-config                    1      7h59m
Amazon EKS uses webhook token authentication to integrate with IAM. We can see this configuration in the API server arguments.
FLAG: --authentication-token-webhook-cache-ttl="7m0s"
FLAG: --authentication-token-webhook-config-file="/etc/kubernetes/authenticator/apiserver-webhook-kubeconfig.yaml"
FLAG: --authentication-token-webhook-version="v1beta1"
AWS IAM has a concept of unique identifiers, which can be used in specific services to avoid name-reuse misconfigurations. It is unclear from the Amazon EKS documentation whether the system:masters permission is bound to the unique identifier of the entity or to its friendly name. To validate whether deleting the original entity revokes all permissions from the cluster, we deleted the temporary role and recreated it.
dev@pwnbox:~$ aws iam get-role --role-name ci-eks-test
{
    "Role": {
        "Path": "/",
        "RoleName": "ci-eks-test",
        "RoleId": "AROAYREY3WYOOSHLPO3W6",
        "Arn": "arn:aws:iam::123456789012:role/ci-eks-test",
        "CreateDate": "2021-06-20T21:09:01+00:00",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": "arn:aws:iam::123456789012:user/ci"
                    },
                    "Action": "sts:AssumeRole",
                    "Condition": {}
                }
            ]
        },
        "MaxSessionDuration": 3600,
        "RoleLastUsed": {}
    }
}
dev@pwnbox:~$ kubectl auth can-i --list
Resources                                       Non-Resource URLs   Resource Names     Verbs
*.*                                             []                  []                 [*]
                                                [*]                 []                 [*]
selfsubjectaccessreviews.authorization.k8s.io   []                  []                 [create]
selfsubjectrulesreviews.authorization.k8s.io    []                  []                 [create]
                                                [/api/*]            []                 [get]
                                                [/api]              []                 [get]
                                                [/apis/*]           []                 [get]
                                                [/apis]             []                 [get]
                                                [/healthz]          []                 [get]
                                                [/healthz]          []                 [get]
                                                [/livez]            []                 [get]
                                                [/livez]            []                 [get]
                                                [/openapi/*]        []                 [get]
                                                [/openapi]          []                 [get]
                                                [/readyz]           []                 [get]
                                                [/readyz]           []                 [get]
                                                [/version/]         []                 [get]
                                                [/version/]         []                 [get]
                                                [/version]          []                 [get]
                                                [/version]          []                 [get]
podsecuritypolicies.policy                      []                  [eks.privileged]   [use]

dev@pwnbox:~$ aws iam get-role --role-name ci-eks-test
{
    "Role": {
        "Path": "/",
        "RoleName": "ci-eks-test",
        "RoleId": "AROAYREY3WYOAK6QMGSUT",
        "Arn": "arn:aws:iam::123456789012:role/ci-eks-test",
        "CreateDate": "2021-06-20T21:14:28+00:00",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "AWS": "arn:aws:iam::123456789012:user/ci"
                    },
                    "Action": "sts:AssumeRole",
                    "Condition": {}
                }
            ]
        },
        "MaxSessionDuration": 3600,
        "RoleLastUsed": {}
    }
}
dev@pwnbox:~$ kubectl auth can-i --list
Resources                                       Non-Resource URLs   Resource Names     Verbs
*.*                                             []                  []                 [*]
                                                [*]                 []                 [*]
selfsubjectaccessreviews.authorization.k8s.io   []                  []                 [create]
selfsubjectrulesreviews.authorization.k8s.io    []                  []                 [create]
                                                [/api/*]            []                 [get]
                                                [/api]              []                 [get]
                                                [/apis/*]           []                 [get]
                                                [/apis]             []                 [get]
                                                [/healthz]          []                 [get]
                                                [/healthz]          []                 [get]
                                                [/livez]            []                 [get]
                                                [/livez]            []                 [get]
                                                [/openapi/*]        []                 [get]
                                                [/openapi]          []                 [get]
                                                [/readyz]           []                 [get]
                                                [/readyz]           []                 [get]
                                                [/version/]         []                 [get]
                                                [/version/]         []                 [get]
                                                [/version]          []                 [get]
                                                [/version]          []                 [get]
podsecuritypolicies.policy                      []                  [eks.privileged]   [use]
As you can see, we have a completely new role, as identified by the different RoleId. However, the Amazon EKS authentication service uses the friendly name of the role and therefore grants us the same system:masters permissions.
The most effective way to mitigate this issue is to use a centralized deployment system with a dedicated deployment role, and to avoid the use of personal credentials at all costs. Tools such as aws-vault can help you securely manage switching between different roles.
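For illustration, the deployment role can itself be managed in Terraform, so that the identity bound to system:masters is a long-lived, auditable role rather than a personal user. This is a minimal sketch based on the ci-eks-test role shown above; the names and the CI principal ARN are placeholders:

# Minimal sketch of a dedicated deployment role. The role name and
# the CI principal ARN are hypothetical placeholders.
resource "aws_iam_role" "eks_deployer" {
  name                 = "ci-eks-test"
  max_session_duration = 3600

  # Only the CI identity may assume this role.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = { AWS = "arn:aws:iam::123456789012:user/ci" }
        Action    = "sts:AssumeRole"
      }
    ]
  })
}

Since Amazon EKS matches the friendly name rather than the RoleId, protect this role definition with strict change controls, because anyone who can recreate a role with the same name inherits its cluster access.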
Next, let's have a look at how nodes authenticate to the control plane.
Use multi-account separation for clusters
Amazon EKS requires that all nodes have an instance profile attached with the following two policies: AmazonEKSWorkerNodePolicy and AmazonEC2ContainerRegistryReadOnly.
dev@pwnbox:$ aws iam get-policy-version --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy --version-id v1
{
    "PolicyVersion": {
        "Document": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": [
                        "ec2:DescribeInstances",
                        "ec2:DescribeRouteTables",
                        "ec2:DescribeSecurityGroups",
                        "ec2:DescribeSubnets",
                        "ec2:DescribeVolumes",
                        "ec2:DescribeVolumesModifications",
                        "ec2:DescribeVpcs",
                        "eks:DescribeCluster"
                    ],
                    "Resource": "*",
                    "Effect": "Allow"
                }
            ]
        },
        "VersionId": "v1",
        "IsDefaultVersion": true,
        "CreateDate": "2018-05-27T21:09:01+00:00"
    }
}

dev@pwnbox:$ aws iam get-policy-version --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly --version-id v3
{
    "PolicyVersion": {
        "Document": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Action": [
                        "ecr:GetAuthorizationToken",
                        "ecr:BatchCheckLayerAvailability",
                        "ecr:GetDownloadUrlForLayer",
                        "ecr:GetRepositoryPolicy",
                        "ecr:DescribeRepositories",
                        "ecr:ListImages",
                        "ecr:DescribeImages",
                        "ecr:BatchGetImage",
                        "ecr:GetLifecyclePolicy",
                        "ecr:GetLifecyclePolicyPreview",
                        "ecr:ListTagsForResource",
                        "ecr:DescribeImageScanFindings"
                    ],
                    "Resource": "*"
                }
            ]
        },
        "VersionId": "v3",
        "IsDefaultVersion": true,
        "CreateDate": "2019-12-10T20:56:32+00:00"
    }
}
By default, these policies grant privileges across all resources within the AWS account. This means a compromised pod on this cluster would be able to enumerate information about any resource, VPC, subnet, and image in the account. Because these managed policies cannot be scoped down to a single cluster, running each cluster in its own AWS account is the most reliable way to limit that blast radius.
The Amazon EKS node actually requires access to one more managed policy, named AmazonEKS_CNI_Policy. This policy allows the node to allocate IP addresses and perform other networking-related functions.
dev@pwnbox:$ aws iam get-policy-version --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy --version-id v4
{
    "PolicyVersion": {
        "Document": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Action": [
                        "ec2:AssignPrivateIpAddresses",
                        "ec2:AttachNetworkInterface",
                        "ec2:CreateNetworkInterface",
                        "ec2:DeleteNetworkInterface",
                        "ec2:DescribeInstances",
                        "ec2:DescribeTags",
                        "ec2:DescribeNetworkInterfaces",
                        "ec2:DescribeInstanceTypes",
                        "ec2:DetachNetworkInterface",
                        "ec2:ModifyNetworkInterfaceAttribute",
                        "ec2:UnassignPrivateIpAddresses"
                    ],
                    "Resource": "*"
                },
                {
                    "Effect": "Allow",
                    "Action": [
                        "ec2:CreateTags"
                    ],
                    "Resource": [
                        "arn:aws:ec2:*:*:network-interface/*"
                    ]
                }
            ]
        },
        "VersionId": "v4",
        "IsDefaultVersion": true,
        "CreateDate": "2020-04-20T20:52:01+00:00"
    }
}
This policy allows the caller to modify the networking configuration of any instance in the account. Amazon actually recommends attaching this policy directly to the aws-node Kubernetes service account instead. By default, the Terraform module attaches the policy to the node IAM instance profile; we can change that behavior by setting the attach_worker_cni_policy attribute to false, as in the sketch below.
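As a sketch of what that might look like with the terraform-aws-modules/eks module (the input names reflect the module version used in this series and should be treated as assumptions), disabling the node-level attachment pairs naturally with enabling IRSA, so the permissions can move to the aws-node service account:

# Sketch: stop attaching AmazonEKS_CNI_Policy to the node instance
# profile and enable IRSA so the CNI permissions can instead be
# granted to the aws-node service account.
module "eks" {
  source = "terraform-aws-modules/eks/aws"

  # ... cluster configuration ...

  attach_worker_cni_policy = false
  enable_irsa              = true
}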
Encrypt your secrets
One of the control plane components managed by the Amazon EKS service is the etcd key-value store, which the Kubernetes API server uses to store all objects and configuration. By default, Kubernetes does not encrypt the data held in this store. AWS is responsible for the security of the control plane components; however, we can yet again apply a defence-in-depth strategy to further enhance the security of our data.
The AWS KMS service is one of the encryption providers that can supply envelope encryption for Secret objects stored in our cluster. By setting the `cluster_encryption_config` option, we can specify a KMS key that will be used to encrypt the intermediate data encryption keys, which in turn encrypt the individual objects. This Amazon EKS blog post from AWS provides a great overview of the underlying process. From this point onward, anyone with access to the etcd cluster will not be able to read our secrets without also having access to our KMS key. The whole process is fully transparent to the end user.
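As a sketch, assuming the terraform-aws-modules/eks module, the encryption configuration might look like this:

# Sketch: envelope encryption of Kubernetes Secrets with a
# customer-managed KMS key, assuming the terraform-aws-modules/eks
# module's cluster_encryption_config input.
resource "aws_kms_key" "eks_secrets" {
  description         = "Envelope encryption key for EKS secrets"
  enable_key_rotation = true
}

module "eks" {
  source = "terraform-aws-modules/eks/aws"

  # ... cluster configuration ...

  cluster_encryption_config = [
    {
      provider_key_arn = aws_kms_key.eks_secrets.arn
      resources        = ["secrets"]
    },
  ]
}

The encryption_config block visible in the Terraform plan output later in this post corresponds to this setting.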
Detect issues in the deployment pipeline
Infrastructure as code allows us to declaratively describe the desired state of the Amazon EKS cluster. With that, we can statically discover some of these issues before anything is deployed.
In Terraform, we can generate a plan of the configuration that will be deployed.
dev@pwnbox:$ terraform plan

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create
 <= read (data resources)

Terraform will perform the following actions:

  # data.aws_eks_cluster.cluster will be read during apply
  # (config refers to values not yet known)
 <= data "aws_eks_cluster" "cluster" {
      + arn                       = (known after apply)
      + certificate_authority     = (known after apply)
      + created_at                = (known after apply)
      + enabled_cluster_log_types = (known after apply)
      + endpoint                  = (known after apply)
      + id                        = (known after apply)
      + identity                  = (known after apply)
      + kubernetes_network_config = (known after apply)
      + name                      = (known after apply)
      + platform_version          = (known after apply)
      + role_arn                  = (known after apply)
      + status                    = (known after apply)
      + tags                      = (known after apply)
      + version                   = (known after apply)
      + vpc_config                = (known after apply)
    }

<OMITTED>

  # module.vpc.aws_vpc.this[0] will be created
  + resource "aws_vpc" "this" {
      + arn                              = (known after apply)
      + assign_generated_ipv6_cidr_block = false
      + cidr_block                       = "10.0.0.0/16"
      + default_network_acl_id           = (known after apply)
      + default_route_table_id           = (known after apply)
      + default_security_group_id        = (known after apply)
      + dhcp_options_id                  = (known after apply)
      + enable_classiclink               = (known after apply)
      + enable_classiclink_dns_support   = (known after apply)
      + enable_dns_hostnames             = false
      + enable_dns_support               = true
      + id                               = (known after apply)
      + instance_tenancy                 = "default"
      + ipv6_association_id              = (known after apply)
      + ipv6_cidr_block                  = (known after apply)
      + main_route_table_id              = (known after apply)
      + owner_id                         = (known after apply)
      + tags                             = {
          + "Name" = "eks-threat-modelling-ddddaaaa"
        }
      + tags_all                         = {
          + "Name" = "eks-threat-modelling-ddddaaaa"
        }
    }

Plan: 44 to add, 0 to change, 0 to destroy.

Changes to Outputs:
  + cluster_endpoint          = (known after apply)
  + cluster_name              = (known after apply)
  + cluster_security_group_id = (known after apply)
  + config_map_aws_auth       = []
  + kubectl_config            = (known after apply)
  + region                    = "eu-west-1"
This plan can be saved to a file (for example, with terraform plan -out=tfplan) and converted into JSON format, which allows us to perform static analysis on all of the attributes.
dev@pwnbox:$ terraform show -json tfplan > tfplan.json
dev@pwnbox:$ jq '.resource_changes[] | select((.type == "aws_eks_cluster") and .mode == "managed").change.after' tfplan.json
{
  "enabled_cluster_log_types": [
    "audit",
    "authenticator"
  ],
  "encryption_config": [
    {
      "provider": [
        {}
      ],
      "resources": [
        "secrets"
      ]
    }
  ],
  "kubernetes_network_config": [
    {}
  ],
  "name": "eks-threat-modelling-ddddaaaa",
  "tags": null,
  "timeouts": {
    "create": "30m",
    "delete": "15m",
    "update": null
  },
  "version": "1.19",
  "vpc_config": [
    {
      "endpoint_private_access": true,
      "endpoint_public_access": false,
      "public_access_cidrs": [
        "0.0.0.0/0"
      ]
    }
  ]
}
Once we understand the structure of the Terraform plan, we can use Open Policy Agent and its declarative language, Rego, to write simple tests.
package play

deny[msg] {
    input.vpc_config[0].endpoint_public_access == true
    msg := sprintf("The public endpoint is enabled on the cluster", [])
}
You can play with this simple example in an online Rego Playground.
Writing these rules can become cumbersome: it requires analyzing Terraform output structures, parsing the files, maintaining a rule library, and reporting results in a meaningful way. This is where products such as Snyk IaC can help. Here is an example of the security scan performed on our default deployment.
dev@pwnbox:$ snyk iac test tfplan.json

Testing tfplan.json...

Infrastructure as code issues:
  ✗ EKS cluster allows public access [High Severity] [SNYK-CC-TF-94] in EKS
    introduced by aws_eks_cluster[this] > vpc_config

  ✗ Non-Encrypted root block device [Medium Severity] [SNYK-CC-TF-53] in EC2
    introduced by aws_launch_configuration[workers] > root_block_device > encrypted

  ✗ Public IPs are automatically mapped to instances [Low Severity] [SNYK-CC-AWS-427] in VPC
    introduced by aws_subnet[public_0] > map_public_ip_on_launch

  ✗ EKS control plane logging insufficient [Low Severity] [SNYK-CC-TF-131] in EKS
    introduced by aws_eks_cluster[this] > enabled_cluster_log_types

  ✗ Public IPs are automatically mapped to instances [Low Severity] [SNYK-CC-AWS-427] in VPC
    introduced by aws_subnet[public_2] > map_public_ip_on_launch

  ✗ Public IPs are automatically mapped to instances [Low Severity] [SNYK-CC-AWS-427] in VPC
    introduced by aws_subnet[public_1] > map_public_ip_on_launch

Organization: p0tr3c
Type: Terraform
Target file: tfplan.json
Project name: hardened
Open source: no
Project path: tfplan.json

Tested tfplan.json for known issues, found 6 issues

dev@pwnbox:/$
By using Snyk’s IaC scanning, you can detect issues with your Terraform configuration before deploying to production, ensuring your production environments remain secure. Sign up for a free account at https://app.snyk.io! Additionally, you can try Snyk's free code checker tool for a quick sense of the security of your code.