Debugging Docker image steps with BuildKit

Dockerfiles are kind of weird to debug. They're fundamentally stateful (every RUN depends on the previous commands).

For complex things, it's not always easy to figure out the exact state of the image at a given point in the Dockerfile (most often for me, I need to figure out where certain files were installed).


With pre-BuildKit docker, this was pretty easy. Docker would always create intermediate images for every step of the Dockerfile and print out the short-SHAs of those images.

Let's use this simple Dockerfile as an example and see what happens when we build it:

# Dockerfile
FROM ubuntu:20.04

RUN apt-get update && apt-get install -y curl
RUN pip3 install numpy

By disabling BuildKit, we get the old behavior:

$ DOCKER_BUILDKIT=0 docker build .

Sending build context to Docker daemon  2.048kB
Step 1/3 : FROM ubuntu:20.04
 ---> bdbe84df0b98
Step 2/3 : RUN apt-get update && apt-get install -y curl
 ---> Running in 7caa3a90fa96
Get:1 focal InRelease [265 kB]
Get:2 focal-updates InRelease [114 kB]
... <clip> ...
Removing intermediate container 7caa3a90fa96
 ---> c47ed9aec5f6
Step 3/3 : RUN pip3 install numpy
 ---> Running in facf18bce582
/bin/sh: 1: pip3: not found
The command '/bin/sh -c pip3 install numpy' returned a non-zero code: 127

Notable, the ---> c47ed9aec5f6 line indicates that a temporary image was exported with the SHA c47ed9aec5f6. The way this works under the hood is by creating a container based on the previous layers image, running the command in the container, and then exporting that container as a new image (à la the docker export command) — that's why we also see the Removing intermediate container 7caa3a90fa96 line in the output.

In this example, step 3 fails, but we can spin up the image as it was after step 2 with a simple docker run c47ed9aec5f6 command:

$ docker run -it --rm c47ed9aec5f6 bash

root@7dcf5ca5ac35:/# which pip3
root@7dcf5ca5ac35:/# echo $?

In this example, it's pretty obvious that the issue that pip3 isn't installed, but more complex situations might involve more rooting around.

With BuildKit

BuildKit doesn't export intermediate layers, so we can't use the same trick as above. There's a slightly more involved trick we can use, though, using multi-stage builds1.

# Dockerfile (updated!)
FROM ubuntu:20.04 AS tmp

RUN apt-get update && apt-get install -y curl

FROM tmp
RUN pip3 install numpy

Now we've named our first build stage tmp. We add the FROM tmp line just before the RUN command that isn't working.

Now we can build just the tmp stage:

$ docker build . --target tmp

[+] Building 15.1s (6/6) FINISHED
 => [internal] load build definition from Dockerfile                                                     0.0s
 => => transferring dockerfile: 172B                                                                     0.0s
 => [internal] load .dockerignore                                                                        0.0s
 => => transferring context: 2B                                                                          0.0s
 => [internal] load metadata for                                          0.0s
 => [tmp 1/2] FROM                                                        0.0s
 => [tmp 2/2] RUN apt-get update && apt-get install -y curl                                             14.9s
 => exporting to image                                                                                   0.1s
 => => exporting layers                                                                                  0.1s
 => => writing image sha256:2954b5e4cda897c1767728ad00de75dab8b4950251fb6fdc8eb1d30468d4301b             0.0s

and we can run with:

# Copy the bit after the `sha256:` in the output above as the image to run
$ docker run -it --rm 2954b5e4cda897c1767728ad00de75dab8b4950251fb6fdc8eb1d30468d4301b

root@fd3a4bf0046f:/# which pip3
root@fd3a4bf0046f:/# echo $?

A little more work, but not toooooo bad. Just remember to remove the AS tmp and FROM tmp lines when you're done debugging (they won't hurt anything, but you probably don't want to commit them).

Why not just disable BuildKit?

Many answers on StackOverflow just suggest disabling BuildKit with DOCKER_BUILDKIT=0 and then using the old behavior. This works fine.

The reason you might not want to do that is mostly just that BuildKit is faster and doesn't share any cache with non-BuildKit builds. So if you're trying to debug the last stage of a build, and you set DOCKER_BUILDKIT=0, it'll have to rebuild the entire image from scratch.

Do whatever works for you (and hopefully debugging BuildKit issues will get a little easier in the future...). If you know of a better way, feel free to let me know. 😎


  1. Note that a step in a Dockerfile corresponds to a single instruction (e.g., a RUN or ADD command). A stage is a grouping of steps starting from a FROM commandand ending with the nextFROM` command (or the end of the Dockerfile). Most Dockerfiles only have one stage, but some have more (e.g., to build a binary in one stage and then copy it into a smaller stage that doesn't have all the build tools installed).