Best practices for container isolation
Maryann Agofure
August 29, 2022
Containers are a standardized software packaging format that provides a predictable, replicable way to run applications. Container isolation is one of the primary benefits of containerized applications. Using containers enables us to isolate our software from its environment, increasing consistency and reliability across our development and staging environments.
You’re probably familiar with — or are using — Docker containers. Docker containers achieve isolation by leveraging Linux features like control groups (commonly abbreviated as cgroups), secure computing mode (seccomp) filters, and kernel namespaces. There are other containers, too, including sandbox containers — like gVisor — and virtualized containers — like AWS Firecracker.
Although containers are isolated by design, we still need to implement best practices to ensure we effectively isolate our containers to keep them secure. This article explores the implications of container isolation and outlines best practices for security and methodology for Linux, sandbox, and virtualized containers.
What is container isolation, and how does it work?
As its name implies, container isolation involves isolating a containerized application’s runtime environment from the host operating system and other processes running on the host.
This isolation takes several forms, including file system isolation, network isolation, system call isolation, and isolation of resource usages — such as CPU and memory.
The technical details of how container isolation works depend on the container we use. As we’ll see, there are several to choose from.
Multiple approaches to container isolation
As we previously mentioned, this article explores container isolation as it applies to three types of containers: Linux containers like Docker, sandboxed containers like gVisor, and lightweight KVM-based virtual machines like AWS Firecracker.
Each container type approaches isolation differently and isolates different parts of the system, so each has its own best practices to follow.
Let’s explore the best practices for container isolation with each container type.
Linux containers
Linux containers like Docker use cgroups, seccomp filters, and kernel namespaces to isolate containers. Cgroups make it possible to place resource usage limits on a group of processes. For example, cgroups can limit or prioritize usage of resources such as disk I/O, memory, network traffic, and CPU time, and can even restrict a group to individual CPUs in a multi-core system. Seccomp allows us to filter the system calls a process makes. This filtering works alongside kernel namespaces, which give the processes within a container an isolated view of system resources such as process IDs, mount points, and network interfaces.
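To make the namespace idea concrete, here is a minimal Python sketch that compares namespace membership. It relies on the Linux convention that each entry under /proc/<pid>/ns is a symbolic link whose target looks like "pid:[4026531836]"; two processes are in the same namespace when those identifiers match. The helper names (parse_ns_link, read_namespaces, shared_namespaces) are made up for this illustration, and reading another process's entries may require elevated privileges.

```python
from __future__ import annotations

import os
import re

# Target format of a /proc/<pid>/ns/<name> symlink, e.g. "pid:[4026531836]".
_NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target: str) -> tuple[str, int]:
    """Split a namespace link target into (namespace type, identifier)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match["kind"], int(match["inode"])


def read_namespaces(pid: int | str = "self") -> dict[str, int]:
    """Map each entry in /proc/<pid>/ns to its namespace identifier (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    return {
        name: parse_ns_link(os.readlink(os.path.join(ns_dir, name)))[1]
        for name in os.listdir(ns_dir)
    }


def shared_namespaces(a: dict[str, int], b: dict[str, int]) -> set[str]:
    """Return the namespace entries that two processes have in common."""
    return {name for name in a.keys() & b.keys() if a[name] == b[name]}


if __name__ == "__main__" and os.path.isdir("/proc/self/ns"):
    # Print the namespaces of the current process.
    for name, inode in sorted(read_namespaces().items()):
        print(f"{name:20} {inode}")
```

Running this once on the host and once inside a container should show different identifiers for the namespaces the container runtime has unshared, which is what gives the container its isolated view.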
This approach is the most straightforward of the three, but it also means the container runtime has to set up the right namespaces, cgroups, and seccomp filter for every container it creates.
Docker’s approach is relatively simple, making it easy to use. However, this simplicity means its isolation isn’t as strong as alternatives like gVisor. Because containers share the host kernel, Docker has to let some system calls through to it, so an application can still reach the host through any call, and any argument values, that the seccomp filter permits.
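As an illustration of what a seccomp filter looks like in practice, the sketch below builds an allowlist profile in the JSON layout Docker accepts through `docker run --security-opt seccomp=<file>`. The function name and the tiny syscall list are assumptions made for the example; a real workload needs many more system calls than this, so treat the output as a shape to study, not a profile to deploy.

```python
import json


def build_seccomp_profile(allowed_syscalls):
    """Return a deny-by-default seccomp profile that allows only the given calls."""
    return {
        # Any system call not listed below fails with an error.
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [
            {
                "names": sorted(set(allowed_syscalls)),
                "action": "SCMP_ACT_ALLOW",
            }
        ],
    }


if __name__ == "__main__":
    # Illustrative list only: far too short for a real container to start.
    profile = build_seccomp_profile(["read", "write", "exit_group", "read"])
    print(json.dumps(profile, indent=2))
```

Every name added to the allowlist widens the path from the container to the host kernel, which is why a shorter list means a smaller attack surface.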
Fortunately, there are a few best practices we can follow to make the most of container isolation under Docker:
Use a dedicated user for each container by specifying a specific user account in your Dockerfile. Doing so ensures that one container can’t access or modify another container’s resources (for example, files and directories). Use the least-privileged users possible for containers when creating new Docker images and runc configurations.
Limit what containers can do by dropping the Linux capabilities they don’t need. Aim for the principle of least privilege: give a container only what it needs to accomplish its tasks. Following this principle limits what an application inside a container can see or do outside its isolated environment.
Block unneeded network devices from being configured for each container. This limits what a container can see and do on the host’s networking infrastructure.
Use cgroup restrictions to limit resources available to each container, such as CPU shares, memory pages, block I/O bandwidth, and more.
Configure limits such as the number of PIDs, the maximum stack size, and the maximum number of threads a container can spawn. These limits help prevent one container from exhausting the process IDs or memory that other containers depend on (for example, through a fork bomb), or from destabilizing the host.
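The practices above map onto `docker run` options. The following Python sketch assembles such a command line; the function name, the default values, the UID 10001, and the image name example/app:1.0 are placeholders chosen for illustration, not recommendations from this article.

```python
import shlex


def hardened_run_args(image, *, user="10001:10001", cap_add=(),
                      network="none", memory="256m", cpus="0.5",
                      pids_limit=100, stack_bytes=8 * 1024 * 1024):
    """Build a `docker run` argument list that applies the practices above."""
    args = [
        "docker", "run", "--rm",
        "--user", user,        # dedicated, non-root user
        "--cap-drop", "ALL",   # start with no Linux capabilities
    ]
    for cap in cap_add:        # add back only what the workload needs
        args += ["--cap-add", cap]
    args += [
        "--network", network,             # "none" leaves only loopback
        "--memory", memory,               # cgroup memory limit
        "--cpus", cpus,                   # cgroup CPU limit
        "--pids-limit", str(pids_limit),  # cap on processes and threads
        "--ulimit", f"stack={stack_bytes}:{stack_bytes}",  # max stack size
        image,
    ]
    return args


if __name__ == "__main__":
    # Hypothetical image; print the command instead of running it.
    print(shlex.join(hardened_run_args("example/app:1.0",
                                       cap_add=["NET_BIND_SERVICE"])))
```

Starting from `--cap-drop ALL` and `--network none` and then adding back only what a workload requires keeps the defaults restrictive, in line with the principle of least privilege.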
When combined, these practices reduce the attack surface of our containers by limiting the number of things an attacker can exploit. Of course, the best security defense is never to run untrusted code in our containers. However, this means we need to know exactly what we’re running in every container, which is rarely realistic for busy development and DevOps teams trying to hit deadlines.
If we know we need to run an untrusted container, the next best practice we should consider is using a container runtime with a stronger security model.
Sandboxed containers
Sandboxed containers offer the same isolation mechanisms as regular Linux containers and add extra layers of protection.
gVisor, a widely used sandboxed container runtime, implements a custom userspace mini-kernel that sits between containerized applications and the host’s kernel. It intercepts the container’s system calls and handles them itself, making only a small, filtered set of calls to the host kernel. gVisor also implements a custom TCP/IP stack to enforce greater control over the way containerized workloads interact with the network. Finally, gVisor implements a filesystem proxy that sits between the container and the host’s filesystem. These layered checks mean that gVisor can reduce a container’s attack surface by enforcing firm container boundaries while maintaining compatibility with most applications. And since gVisor is written in Go, a memory-safe language, it’s far less likely to suffer from buffer overflows and other memory-safety bugs than software written in C, like the Linux kernel.
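For Docker users, switching a container to gVisor is mostly a matter of selecting its runtime, runsc. The sketch below shows the two pieces involved, assuming runsc is already installed: a runtime entry for Docker's daemon.json and a `docker run --runtime=runsc` command. The install path /usr/local/bin/runsc and the image name are assumptions for the example.

```python
import json


def daemon_runtime_config(runsc_path="/usr/local/bin/runsc"):
    """Return the daemon.json fragment that registers runsc with Docker."""
    return {"runtimes": {"runsc": {"path": runsc_path}}}


def run_with_runtime(image, runtime="runsc"):
    """Build a `docker run` command that uses the named runtime."""
    return ["docker", "run", "--rm", f"--runtime={runtime}", image]


if __name__ == "__main__":
    print(json.dumps(daemon_runtime_config(), indent=2))
    # Hypothetical image name.
    print(" ".join(run_with_runtime("example/app:1.0")))
```

Because the runtime is chosen per container, trusted workloads can stay on the default runtime while untrusted ones run under gVisor on the same host.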
gVisor’s approach does have some drawbacks, however. One is that it can be challenging to debug applications running inside gVisor, as we won’t be able to use most tools designed for the Linux kernel. Another drawback is that because gVisor doesn’t use a standard Linux kernel, we may need to reimplement features to support some workloads running inside gVisor.
While gVisor and other sandboxed containers are a step up from regular Linux containers, we might find that upgrading to a container model with even stronger isolation is the most appropriate best practice.
Lightweight virtual machines
In contrast to Linux containers and sandboxed containers, lightweight virtual machine (VM)-based containers like AWS Firecracker take an entirely different approach. They use a hypervisor such as KVM, driven by a minimal virtual machine monitor, to create a lightweight VM (typically called a microVM) for each container. Each microVM runs its own guest kernel, so the interface a workload can use to attack the host is small and easy to control.
This technique makes it very difficult for an attacker to gain access to privileged information on the host, but also means fewer ways for a container to interact with the host. For example, containers running in microVMs can’t directly share files with the host.
MicroVMs have a smaller attack surface than containers running on traditional VMs because the virtual machine monitor only needs to emulate a small subset of the hardware devices that a general-purpose VM exposes.
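The small device model shows up directly in how a microVM is described. As a rough sketch, the Python below builds a configuration in the layout Firecracker reads from its `--config-file` option: a guest kernel, one root drive, and a CPU and memory allocation, with no network interface at all. The function name, the file paths, and the sizes are placeholders for illustration; check the Firecracker documentation for the fields your version supports.

```python
import json


def microvm_config(kernel_path, rootfs_path, *, vcpus=1, mem_mib=128):
    """Return a minimal Firecracker-style microVM configuration."""
    return {
        "boot-source": {
            "kernel_image_path": kernel_path,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        },
        "drives": [
            {
                "drive_id": "rootfs",
                "path_on_host": rootfs_path,
                "is_root_device": True,
                "is_read_only": False,
            }
        ],
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
    }


if __name__ == "__main__":
    # Hypothetical guest kernel and root filesystem images.
    print(json.dumps(microvm_config("vmlinux.bin", "rootfs.ext4"), indent=2))
```

Everything the guest can touch is listed explicitly, so anything omitted from the configuration simply does not exist from the workload's point of view.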
Overall, microVMs are the best choice if we need very strong isolation between containers — like when running untrusted code from multiple tenants on the same server.
Approaching container isolation
Regardless of which container type we use, there’s no one-size-fits-all solution for container isolation. So, we need to balance the following factors for the unique requirements of each project.
Performance. Linux containers offer the best performance because they have the fewest layers of indirection between the container and the host OS. Sandboxed containers are measurably slower due to the additional layers of intermediaries between the container and the host OS. MicroVMs have the largest performance hit since they have to provide a virtualized hardware interface to each container.
Security. As we increase container isolation capabilities, our containers become more secure. As we’ve seen, sandboxed containers are more secure than Linux containers, and microVM-based containers are more secure than either one.
Complexity and development time. Linux containers are relatively simple and represent the majority of containers running in production. Excellent tooling and documentation are plentiful, cutting development time and reducing the complexity of containerized app deployments. Sandboxed containers and microVMs are far less common, so fewer tools and platforms support them. This increases development time and complexity because development and DevOps teams must do more work manually instead of relying on tooling.
Ultimately, we’ll need to trade off performance, security, and complexity against one another. Prioritizing performance, for example, might mean we’ll have to give up some security. On the other hand, prioritizing security might mean giving up performance and dealing with increased development time.
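One way to make this trade-off explicit is to write the decision down as code. The sketch below encodes only the guidance given in this article: stay with Linux containers for trusted code, move to a sandboxed runtime for untrusted code, and use microVMs when untrusted code from multiple tenants shares a server. The names are invented for the example, and a real policy would also weigh performance budgets and tooling support.

```python
from enum import Enum


class Isolation(Enum):
    LINUX_CONTAINER = "Linux container"
    SANDBOXED = "sandboxed container"
    MICROVM = "microVM"


def choose_isolation(untrusted_code: bool, multi_tenant: bool) -> Isolation:
    """Pick the lightest isolation level that fits the stated risk."""
    if untrusted_code and multi_tenant:
        return Isolation.MICROVM
    if untrusted_code:
        return Isolation.SANDBOXED
    return Isolation.LINUX_CONTAINER


if __name__ == "__main__":
    for untrusted in (False, True):
        for shared in (False, True):
            level = choose_isolation(untrusted, shared)
            print(f"untrusted={untrusted} multi_tenant={shared} -> {level.value}")
```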
Container security
As developers and DevOps practitioners, we need to know how our containers are isolated from our host machines to understand the limitations of our container isolation. While containers are great at providing rapid development cycles, it’s important to employ best practices to keep our containers — and applications — secure.
All development and DevOps teams should employ a defense-in-depth strategy, and container isolation is just one piece of the puzzle. You can improve your systems’ security by using a code scanner to find vulnerabilities in your code and a container scanner to keep the contents of your containers secure. After all, container isolation should be one line of defense, not your only line of defense.