
Container image formats under the hood

Written by:
Agata Krajewska

November 18, 2020


Over the last few years, following Docker's release, containers have increasingly become the standard mechanism for software delivery.

We see a growing number of container-based solutions, and while innovation in the space is welcome, it also creates a need for shared standards around image format and runtime.

Because of the rapid growth of the Docker project, Docker images became a de facto standard for many purposes. Even so, there is widespread interest in a single, open container specification, which is:

  • not bound to higher-level constructs such as a particular client or orchestration stack,

  • not tightly associated with any particular commercial vendor or project,

  • portable across a wide variety of operating systems, hardware, CPU architectures, public clouds, etc.

OCI (Open Container Initiative) is a Linux Foundation project to design open standards for operating-system-level virtualization, most importantly Linux containers.

Docker has donated to the OCI both a draft specification for the base format and runtime, and the code for a reference implementation of that specification.

Various open-source build tools support the OCI image format now, including:

  • BuildKit: an optimized rewrite of Docker's build engine

  • Podman: an alternative implementation of Docker's command-line tool

  • Buildah: a command-line alternative to writing Dockerfiles

Snyk supports all of the common container image formats, and because of that, we can integrate with a wide range of tools from across the ecosystem. In this blog post, we'll look at some of these, and show how Snyk works with them!

Docker archive

First, let's take a look at the Docker archive format. This is part of the deprecated v1 of the Docker image specification, and you can find the full specification here. When you run docker save, this is the default format that the Docker client outputs, and it is kept for backward compatibility with other tools. Container registries use a newer format to store images.

If you're running Docker locally and using the Snyk CLI or any of our container integrations, we will use your local Docker instance to pull & save the image to the filesystem and start the analysis.

Let's explore what that archive looks like, and what in there is interesting for Snyk!

I'll be using the ubuntu:bionic image in this post. First, let's save the archive into your local filesystem:

docker save --output ubuntu-docker.tar ubuntu:bionic

The bionic distribution has three image layers, which we'll be looking closer at.

After we've unpacked the tar archive, we can inspect what's inside:

[ubuntu-docker] ll                                                                                                               
total 24
drwxr-xr-x  5 agatakrajewska  staff   160B 25 Sep 23:33 46076a325f0de3f745254638b8b0f0de343685b34e7ca6ec5cd0b6b7930eb7fa
drwxr-xr-x  5 agatakrajewska  staff   160B 25 Sep 23:33 468327b5cd7ce539db695bd0ef05dae8a4ff77b02870a8e823ed74dedad4bd55
-rw-r--r--  1 agatakrajewska  staff   3.3K 25 Sep 23:33 56def654ec22f857f480cdcc640c474e2f84d4be2e549a9d16eaba3f397596e9.json
drwxr-xr-x  5 agatakrajewska  staff   160B 25 Sep 23:33 8bf067b107a6f7444876e33c6ed85652355f679ac98ebab97ab3ebad63f0dff3
-rw-r--r--  1 agatakrajewska  staff   356B  1 Jan  1970 manifest.json
-rw-r--r--  1 agatakrajewska  staff    89B  1 Jan  1970 repositories

Let's have a look at manifest.json:

[ubuntu-docker] cat manifest.json | jq                                                                                           
[
  {
    "Config": "56def654ec22f857f480cdcc640c474e2f84d4be2e549a9d16eaba3f397596e9.json",
    "RepoTags": [
      "ubuntu:bionic"
    ],
    "Layers": [
      "8bf067b107a6f7444876e33c6ed85652355f679ac98ebab97ab3ebad63f0dff3/layer.tar",
      "468327b5cd7ce539db695bd0ef05dae8a4ff77b02870a8e823ed74dedad4bd55/layer.tar",
      "46076a325f0de3f745254638b8b0f0de343685b34e7ca6ec5cd0b6b7930eb7fa/layer.tar"
    ]
  }
]

The Config property points us at the container config file, where we can find some super useful info, like the architecture, the runtime configuration, the root filesystem layers, etc. The Layers property, probably the most interesting part from Snyk's point of view, points us at the actual image layers: each entry is a layer.tar file stored in its own directory in the archive. If we change into one of these directories, we can see what it contains:
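
If you'd rather follow those pointers programmatically than by hand, here's a minimal Python sketch that reads manifest.json straight out of the tar and resolves the config file it references. The function names are my own for illustration, and this is not how the Snyk CLI is implemented; it only assumes the archive layout shown above.

```python
import json
import tarfile


def read_archive_manifest(archive: tarfile.TarFile) -> list[dict]:
    """Return the parsed manifest.json from a `docker save` archive."""
    member = archive.extractfile("manifest.json")
    if member is None:
        raise ValueError("manifest.json is not a regular file")
    return json.load(member)


def describe_images(archive: tarfile.TarFile) -> list[dict]:
    """Collect tags, config path, platform and layer paths for each image."""
    images = []
    for entry in read_archive_manifest(archive):
        # The Config property is a path inside the same archive.
        config = json.load(archive.extractfile(entry["Config"]))
        images.append({
            "tags": entry.get("RepoTags") or [],
            "config_path": entry["Config"],
            "platform": f'{config.get("os")}/{config.get("architecture")}',
            "layers": entry["Layers"],
        })
    return images
```

To try it on the archive saved earlier, open it with tarfile.open("ubuntu-docker.tar") and pass the result to describe_images; each returned dictionary lists the layer.tar paths in the same order as the manifest.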

[8bf067b107a6f7444876e33c6ed85652355f679ac98ebab97ab3ebad63f0dff3] ll                                                            
total 128160
-rw-r--r--  1 agatakrajewska  staff     3B 25 Sep 23:33 VERSION
-rw-r--r--  1 agatakrajewska  staff   401B 25 Sep 23:33 json
-rw-r--r--  1 agatakrajewska  staff    63M 25 Sep 23:33 layer.tar

Let's dig a little deeper and unpack layer.tar:

[layer] ll                                                                                                                       
total 0
drwxr-xr-x  87 agatakrajewska  staff   2.7K 21 Sep 18:17 bin
drwxr-xr-x   2 agatakrajewska  staff    64B 24 Apr  2018 boot
drwxr-xr-x   2 agatakrajewska  staff    64B 21 Sep 18:17 dev
drwxr-xr-x  68 agatakrajewska  staff   2.1K 21 Sep 18:17 etc
drwxr-xr-x   2 agatakrajewska  staff    64B 24 Apr  2018 home
drwxr-xr-x   8 agatakrajewska  staff   256B 23 May  2017 lib
drwxr-xr-x   3 agatakrajewska  staff    96B 21 Sep 18:16 lib64
drwxr-xr-x   2 agatakrajewska  staff    64B 21 Sep 18:14 media
drwxr-xr-x   2 agatakrajewska  staff    64B 21 Sep 18:14 mnt
drwxr-xr-x   2 agatakrajewska  staff    64B 21 Sep 18:14 opt
drwxr-xr-x   2 agatakrajewska  staff    64B 24 Apr  2018 proc
drwx------   4 agatakrajewska  staff   128B 21 Sep 18:17 root
drwxr-xr-x   5 agatakrajewska  staff   160B 21 Sep 18:14 run
drwxr-xr-x  68 agatakrajewska  staff   2.1K 21 Sep 18:17 sbin
drwxr-xr-x   2 agatakrajewska  staff    64B 21 Sep 18:14 srv
drwxr-xr-x   2 agatakrajewska  staff    64B 24 Apr  2018 sys
drwxrwxrwt   2 agatakrajewska  staff    64B 21 Sep 18:17 tmp
drwxr-xr-x  10 agatakrajewska  staff   320B 21 Sep 18:14 usr
drwxr-xr-x  13 agatakrajewska  staff   416B 21 Sep 18:17 var

As you can see above, layer.tar is just a filesystem changeset for the image layer, containing the dependencies and other binaries that make up the image content. After we extract & analyze the contents of those layers, we can show you the list of vulnerable paths.
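
To make "analyzing a layer" a bit more concrete, here's a rough Python sketch that looks for the dpkg database in a layer tar and lists the installed packages with their versions. This is purely an illustration and an assumption on my part, not Snyk's actual analysis: it relies on Debian and Ubuntu keeping their package database at var/lib/dpkg/status, and the function names are made up for this example. Because each layer is a changeset, only layers that touch the package database will contain that file.

```python
import tarfile


def parse_dpkg_status(text: str) -> dict[str, str]:
    """Map package name -> version for packages marked as installed."""
    packages = {}
    for paragraph in text.split("\n\n"):
        fields = {}
        for line in paragraph.splitlines():
            # Continuation lines start with whitespace; skip them.
            if line[:1].isspace() or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        # A Status line looks like "install ok installed".
        installed = fields.get("Status", "").split()[-1:] == ["installed"]
        if installed and "Package" in fields and "Version" in fields:
            packages[fields["Package"]] = fields["Version"]
    return packages


def list_deb_packages(layer: tarfile.TarFile) -> dict[str, str]:
    """Return installed packages found in a layer, or {} if it has no dpkg db."""
    for member in layer:
        # Entry names may or may not carry a leading "./".
        if member.isfile() and member.name.removeprefix("./") == "var/lib/dpkg/status":
            raw = layer.extractfile(member).read()
            return parse_dpkg_status(raw.decode("utf-8", errors="replace"))
    return {}
```

A real scanner also has to merge results across layers and handle other package managers, which this sketch deliberately leaves out.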

Now let's test the saved Docker archive with Snyk. We can test a local archive by specifying the docker-archive: prefix:

snyk container test docker-archive:ubuntu-docker.tar

Or we can scan the image from the local Docker instance:

snyk container test ubuntu:bionic

Both will produce exactly the same output:

Testing ubuntu:bionic...

✗ Low severity vulnerability found in tar
  Description: Loop with Unreachable Exit Condition ('Infinite Loop')
  Info: <https://snyk.io/vuln/SNYK-UBUNTU1804-TAR-312298>
  Introduced through: tar@1.29b-2ubuntu0.1, pam/libpam-runtime@1.1.8-3.6ubuntu2.18.04.2
  From: tar@1.29b-2ubuntu0.1
  From: pam/libpam-runtime@1.1.8-3.6ubuntu2.18.04.2 > debconf@1.5.66ubuntu1 > perl/perl-base@5.26.1-6ubuntu0.3 > dpkg@1.19.0.5ubuntu2.3 > tar@1.29b-2ubuntu0.1

Low severity vulnerability found in tar
  Description: NULL Pointer Dereference
  Info: <https://snyk.io/vuln/SNYK-UBUNTU1804-TAR-559435>
  Introduced through: tar@1.29b-2ubuntu0.1, pam/libpam-runtime@1.1.8-3.6ubuntu2.18.04.2
  From: tar@1.29b-2ubuntu0.1
  From: pam/libpam-runtime@1.1.8-3.6ubuntu2.18.04.2 > debconf@1.5.66ubuntu1 > perl/perl-base@5.26.1-6ubuntu0.3 > dpkg@1.19.0.5ubuntu2.3 > tar@1.29b-2ubuntu0.1

# vuln list continues in here
...

Organization:      example-org
Package manager:   deb
Project name:      docker-image|ubuntu
Docker image:      ubuntu:bionic
Licenses:          enabled

Tested 90 dependencies for known issues, found 31 issues.

Pro tip: use `--file` option to get base image remediation advice.
Example: $ snyk test --docker ubuntu:bionic --file=path/to/Dockerfile

OCI image

In the last few months, we've also added OCI archive scanning to the list of features in the Snyk CLI.

Let's have a look at the spec now and see what we can find in the archive! I'm going to use ubuntu:bionic for this section as well.

If you'd like to inspect an OCI archive tarball yourself, you can run the following command:

skopeo copy --override-os linux docker://ubuntu:bionic oci-archive:ubuntu.tar

I'm using Skopeo here to save the image. It is a CLI utility that makes performing various operations on images super easy.

All we have to do to save an ubuntu image in the OCI format is add the oci-archive: prefix to our tar output.

Notice how I'm also using the --override-os flag. That's because I am on macOS and the official Docker Hub ubuntu images are only available for Linux. We will also talk about multi-architecture images below.

Once we unpack the tar archive, we can inspect the contents of it:

[ubuntu] ll                                                                                                                      
total 16
drwxr-xr-x  3 agatakrajewska  staff    96B 28 Oct 11:18 blobs
-rw-r--r--  1 agatakrajewska  staff   186B 28 Oct 11:18 index.json
-rw-r--r--  1 agatakrajewska  staff    31B 28 Oct 11:18 oci-layout

We've got an interesting index.json at the root. Let's inspect its contents:

[ubuntu] cat index.json | jq                                                                                                     
{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:afa93a8ce255ca452ca8c88f4b5c821a466cf0a3e0148a31d0d97dfdb91d9aef",
      "size": 658
    }
  ]
}

This is our (optional) higher-level manifest, the image index. It points us to specific image manifests and can describe a set of images spanning a variety of architectures and operating systems. If you're keen to learn more, here's the full spec.

Let's now inspect the image-specific manifest that index.json pointed us at. It lives in the blobs/sha256 directory, in a file named after the digest:
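
The mapping from a digest in index.json to a file on disk is simple enough to sketch in a few lines of Python. The helper names below are hypothetical, and the code assumes the archive has already been unpacked into a directory laid out like the listing above, with blobs stored under blobs/<algorithm>/<hex digest>.

```python
import json
from pathlib import Path


def blob_path(layout_root: Path, digest: str) -> Path:
    """Turn a digest such as "sha256:afa9..." into its path under blobs/."""
    algorithm, _, encoded = digest.partition(":")
    if not algorithm or not encoded:
        raise ValueError(f"not a valid digest: {digest!r}")
    return layout_root / "blobs" / algorithm / encoded


def manifest_paths(layout_root: Path) -> list[Path]:
    """List the blob paths of every manifest referenced by index.json."""
    index = json.loads((layout_root / "index.json").read_text())
    return [blob_path(layout_root, entry["digest"]) for entry in index["manifests"]]
```

Calling manifest_paths(Path("ubuntu")) on the unpacked archive from this post would return a single path ending in blobs/sha256/afa93a8c..., which is the manifest file we look at next.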

[sha256] cat afa93a8ce255ca452ca8c88f4b5c821a466cf0a3e0148a31d0d97dfdb91d9aef | jq                                               
{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:33a51d09088285451e7a7525d4bd64fc15563264afe5a91ef84a8b3042018899",
    "size": 2426
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:171857c49d0f5e2ebf623e6cb36a8bcad585ed0c2aa99c87a055df034c1e5848",
      "size": 26701612
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:419640447d267f068d2f84a093cb13a56ce77e130877f5b8bdb4294f4a90a84f",
      "size": 852
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:61e52f862619ab016d3bcfbd78e5c7aaaa1989b4c295e6dbcacddd2d7b93e1f5",
      "size": 162
    }
  ]
}

We're starting to see some interesting contents, like the container config, where we'll again find lots of useful information about the image's architecture, root filesystem layers, configuration, etc. We also have the layers info available, pointing us to the layers that make up our ubuntu:bionic image.

As we can see below, the first layer's content is exactly the same as in the Docker archive above.
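
Since every config and layer entry carries both a digest and a size, we can check that a blob on disk really is what the manifest says it is. Here's a small Python sketch of that check. The function name is my own, and it assumes an unpacked OCI layout directory where blobs are stored under blobs/<algorithm>/<hex digest>.

```python
import hashlib
from pathlib import Path


def verify_blob(layout_root: Path, descriptor: dict) -> bool:
    """Check a blob's size and hash against its descriptor from the manifest."""
    algorithm, _, expected = descriptor["digest"].partition(":")
    hasher = hashlib.new(algorithm)
    size = 0
    blob = layout_root / "blobs" / algorithm / expected
    with blob.open("rb") as handle:
        # Read in 1 MiB chunks so large layers don't need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == descriptor["size"] and hasher.hexdigest() == expected
```

Passing each entry of the manifest's layers array (and its config object) to verify_blob is a quick way to confirm an archive was not truncated or modified after it was saved.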

[171857c49d0f5e2ebf623e6cb36a8bcad585ed0c2aa99c87a055df034c1e5848] ll                                                            
total 0
drwxr-xr-x  87 agatakrajewska  staff   2.7K 21 Sep 18:17 bin
drwxr-xr-x   2 agatakrajewska  staff    64B 24 Apr  2018 boot
drwxr-xr-x   2 agatakrajewska  staff    64B 21 Sep 18:17 dev
drwxr-xr-x  68 agatakrajewska  staff   2.1K 21 Sep 18:17 etc
drwxr-xr-x   2 agatakrajewska  staff    64B 24 Apr  2018 home
drwxr-xr-x   8 agatakrajewska  staff   256B 23 May  2017 lib
drwxr-xr-x   3 agatakrajewska  staff    96B 21 Sep 18:16 lib64
drwxr-xr-x   2 agatakrajewska  staff    64B 21 Sep 18:14 media
drwxr-xr-x   2 agatakrajewska  staff    64B 21 Sep 18:14 mnt
drwxr-xr-x   2 agatakrajewska  staff    64B 21 Sep 18:14 opt
drwxr-xr-x   2 agatakrajewska  staff    64B 24 Apr  2018 proc
drwx------   4 agatakrajewska  staff   128B 21 Sep 18:17 root
drwxr-xr-x   5 agatakrajewska  staff   160B 21 Sep 18:14 run
drwxr-xr-x  68 agatakrajewska  staff   2.1K 21 Sep 18:17 sbin
drwxr-xr-x   2 agatakrajewska  staff    64B 21 Sep 18:14 srv
drwxr-xr-x   2 agatakrajewska  staff    64B 24 Apr  2018 sys
drwxrwxrwt   2 agatakrajewska  staff    64B 21 Sep 18:17 tmp
drwxr-xr-x  10 agatakrajewska  staff   320B 21 Sep 18:14 usr
drwxr-xr-x  13 agatakrajewska  staff   416B 21 Sep 18:17 var

You can scan your local OCI archive by running:

snyk container test oci-archive:ubuntu.tar

Various platforms & architectures

When we were inspecting the Docker archive and OCI image manifests, we noticed that the container config they point to carries architecture information. Let's go back and have another look at the container config under the ubuntu-docker directory, where we unpacked our Docker archive:

[ubuntu-docker] cat 56def654ec22f857f480cdcc640c474e2f84d4be2e549a9d16eaba3f397596e9.json | jq
{
  ...
  "architecture": "amd64",
  "os": "linux",
  ...
}

Among other pieces of information, we can see the architecture & os, which in my case are amd64 & linux. But what if our image is arm64-based, or we are on a Windows platform?

We're in luck: Snyk also supports other platforms, which you select by passing the --platform flag to the Snyk CLI.

We've talked about the docker save archive format. Now let's have a look at the format in which images are stored in Docker Hub, the v2 manifest schema, which is very interesting under the hood. What registries use is the second version of the v2 schema, and it was created to support two primary goals. According to the Docker docs:

The first is to allow multi-architecture images, through a “fat manifest” which references image manifests for platform-specific versions of an image. The second is to move the Docker engine towards content-addressable images, by supporting an image model where the image’s configuration can be hashed to generate an ID for the image.

Using our local Docker, we can first inspect the "fat manifest" for the image we're interested in, to check which platform variants are available for us. As explained above, that means that a single repository can house multiple images for different architectures.

docker manifest inspect ubuntu:bionic

As a result we will see an array of manifests:

{
   ...
  "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 943,
         "digest": "sha256:45c6f8f1b2fe15adaa72305616d69a6cd641169bc8b16886756919e7c01fa48b",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 943,
         "digest": "sha256:e80b8affb2361dc632c1fa8fcbf6b6514f750eb6ef99b7e7f825a55f849bfd89",
         "platform": {
            "architecture": "arm",
            "os": "linux",
            "variant": "v7"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 943,
         "digest": "sha256:01a2038b20d165ab7df81934f9849bdfbc59bd6f6322c5d11e341504f66ec266",
         "platform": {
            "architecture": "arm64",
            "os": "linux",
            "variant": "v8"
         }
      },
...
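
Picking the right entry out of such a manifest list for a string like linux/arm64 can be sketched in Python as follows. This is a simplified illustration with a made-up function name, not the resolution logic Docker or Snyk actually use; real clients apply extra fallback rules for variants that I'm leaving out here.

```python
def select_manifest(manifest_list: dict, platform: str) -> dict:
    """Return the manifest entry matching an "os/arch[/variant]" string."""
    parts = platform.split("/")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected os/arch[/variant], got {platform!r}")
    os_name, arch = parts[0], parts[1]
    variant = parts[2] if len(parts) == 3 else None
    for entry in manifest_list["manifests"]:
        candidate = entry.get("platform", {})
        if candidate.get("os") != os_name or candidate.get("architecture") != arch:
            continue
        # Only compare the variant when the caller asked for one.
        if variant is None or candidate.get("variant") == variant:
            return entry
    raise LookupError(f"no manifest for platform {platform!r}")
```

With the output above loaded into a dictionary, select_manifest(data, "linux/arm64") returns the third entry, whose digest identifies the arm64 image manifest.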

Now that we've established that the ubuntu image is published for several platforms, let's see how to scan it for the arm64 architecture.

Make sure experimental features are enabled in Docker on your host, then run:

snyk container test --platform=linux/arm64 ubuntu:bionic

Let's have a look at the output:

Testing ubuntu:bionic...

✗ Low severity vulnerability found in tar
  Description: Loop with Unreachable Exit Condition ('Infinite Loop')
  Info: <https://snyk.io/vuln/SNYK-UBUNTU1804-TAR-312298>
  Introduced through: tar@1.29b-2ubuntu0.1, pam/libpam-runtime@1.1.8-3.6ubuntu2.18.04.2
  From: tar@1.29b-2ubuntu0.1
  From: pam/libpam-runtime@1.1.8-3.6ubuntu2.18.04.2 > debconf@1.5.66ubuntu1 > perl/perl-base@5.26.1-6ubuntu0.3 > dpkg@1.19.0.5ubuntu2.3 > tar@1.29b-2ubuntu0.1

Low severity vulnerability found in tar
  Description: NULL Pointer Dereference
  Info: <https://snyk.io/vuln/SNYK-UBUNTU1804-TAR-559435>
  Introduced through: tar@1.29b-2ubuntu0.1, pam/libpam-runtime@1.1.8-3.6ubuntu2.18.04.2
  From: tar@1.29b-2ubuntu0.1
  From: pam/libpam-runtime@1.1.8-3.6ubuntu2.18.04.2 > debconf@1.5.66ubuntu1 > perl/perl-base@5.26.1-6ubuntu0.3 > dpkg@1.19.0.5ubuntu2.3 > tar@1.29b-2ubuntu0.1

# vuln list continues in here
...

Organization:      example-org
Package manager:   deb
Project name:      docker-image|ubuntu:bionic
Docker image:      ubuntu:bionic
Platform:          linux/arm64
Licenses:          enabled

Tested 90 dependencies for known issues, found 31 issues.

Pro tip: use `--file` option to get base image remediation advice.
Example: $ snyk test --docker ubuntu:bionic --file=path/to/Dockerfile

We can see all the vulnerable paths discovered, and also 'Platform' information in the output of our scan.

In this blog post, we've explored the two main container image formats, Docker and OCI, and shown how Snyk can work with both of them, even across multiple architectures.

To wrap up, keep up with security best practices for building optimal Docker images for Node.js and Java applications:

  1. 10 Docker Security Best Practices - details the security practices you should follow when building Docker base images and when pulling them, and introduces the reader to Docker Content Trust.

  2. Are you a Java developer? You’ll find this resource valuable: Docker for Java developers: 5 things you need to know not to fail your security

  3. 10 best practices to containerize Node.js web applications with Docker - If you’re a Node.js developer you are going to love this step by step walkthrough, showing you how to build secure and performant Docker base images for your Node.js applications.

All this functionality is available in Snyk for free — sign up for an account.
