Caching Docker images in Azure DevOps Pipelines
TL;DR: Go to the bottom of the post to see the full pipeline template.
The problem
In the Data Science team at DFDS, we package our models as Docker containers and use Azure DevOps Pipelines for our CI/CD, building and deploying those images.
For most projects we build the Docker image at two points:
1. In the pull request: to make sure the Docker image can be built, and sometimes also to run some tests in the new container.
2. After merging to main: to build the final image that will be deployed to production.
Step 1 usually happens more than once, as issues with a PR will often require multiple iterations of reviews and fixes. For this reason, it is important that the build time is as short as possible. Long feedback loops are not good for productivity.
So the solution is to cache the Docker images between builds. Azure Pipelines even has a Cache task that claims to help with caching Docker builds, but the commands listed on that documentation page have never worked for me.
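For context, the documented approach pairs the Cache task with `docker save` and `docker load`. A rough sketch of that idea (my paraphrase, not the exact snippet from the docs; the image name and cache path are placeholders) looks like this:

```yaml
# Rough sketch of the documented save/load approach (placeholders, not the docs' exact snippet).
steps:
  - task: Cache@2
    displayName: Cache saved Docker image
    inputs:
      key: 'docker | "$(Agent.OS)" | Dockerfile'
      path: $(Pipeline.Workspace)/docker_cache
      cacheHitVar: DOCKER_CACHE_RESTORED

  - script: docker load -i $(Pipeline.Workspace)/docker_cache/image.tar
    displayName: Restore cached image
    condition: eq(variables.DOCKER_CACHE_RESTORED, 'true')

  - script: |
      docker build -t myimage .
      mkdir -p $(Pipeline.Workspace)/docker_cache
      docker save -o $(Pipeline.Workspace)/docker_cache/image.tar myimage
    displayName: Build and save Docker image
```

This is the shape of the approach that never gave me working layer caching; the buildx approach below avoids it entirely.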
The solution
My brilliant friend Morten Hels came up with a solution that works. I’m taking the liberty of writing it down here, but he is the one who deserves the credit.
Instead of using `docker save` and `docker load` to (attempt to) make cached Docker layers available, we use `docker buildx` to build the image from, and save to, a local cache.
The commands to run are:
```bash
docker buildx create --name builder --driver docker-container --use

docker buildx build \
  --cache-from=type=local,src=docker_cache \
  --cache-to=type=local,dest=docker_cache,mode=max \
  --file Dockerfile \
  --output=type=docker,name=myimage \
  .
```
1. `docker buildx create --name builder --driver docker-container --use`: Create a new builder and use it. This is needed to make the `--cache-from` and `--cache-to` options available. I'm using the `docker-container` driver, but other drivers are available; this one is just the easiest to set up, both locally and in a pipeline.
2. `--cache-from`: Use the local cache as a source for the build. This makes the build reuse the cached layers if they are available.
3. `--cache-to`: Save the layers that were used in the build to the local cache. This makes the layers available for the next build.
4. `--output=type=docker,name=myimage`: Set the output to be a regular Docker image. This is needed to make the image available for the next step in the pipeline, e.g. pushing it to a registry (see the sketch after this list).
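For instance, a later step (or a local shell) could tag and push the image that buildx just loaded into the Docker daemon. This is only a sketch: the registry host is a placeholder, and authentication (e.g. `docker login`) is assumed to have happened already.

```bash
# Hypothetical follow-up step: push the image built above.
# "myregistry.azurecr.io" is a placeholder registry; replace with your own.
docker tag myimage myregistry.azurecr.io/myimage:latest
docker push myregistry.azurecr.io/myimage:latest
```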
The pipeline template
Here is a complete pipeline template that you can use in your own pipelines.
```yaml
parameters:
  - name: docker_image_name
    type: string
    displayName: 'The name of the Docker image to build. Example: klaur-testing.'
  - name: additional_docker_build_args
    type: string
    default: ''
    displayName: 'Additional arguments to pass to the docker build command. Example: --build-arg SOME_ARG=some_value.'
  - name: dockerfile_path
    type: string
    default: 'Dockerfile'
    displayName: 'The path to the Dockerfile to use. Example: Dockerfile.'
  - name: docker_build_context
    type: string
    default: '.'
    displayName: 'The path to the directory to use as the build context. Example: .'
steps:
  - task: Cache@2
    displayName: Cache Docker layers
    inputs:
      key: '"docker" | "$(Agent.OS)" | "${{ parameters.docker_image_name }}" | ${{ parameters.dockerfile_path }}'
      restoreKeys: |
        "docker" | "$(Agent.OS)" | "${{ parameters.docker_image_name }}"
      path: $(Pipeline.Workspace)/docker_cache
  - script: |
      docker buildx create --name builder --driver docker-container --use
      docker buildx build \
        --cache-from=type=local,src=$(Pipeline.Workspace)/docker_cache \
        --cache-to=type=local,dest=$(Pipeline.Workspace)/docker_cache,mode=max \
        --file ${{ parameters.dockerfile_path }} \
        --output=type=docker,name=${{ parameters.docker_image_name }} \
        ${{ parameters.additional_docker_build_args }} ${{ parameters.docker_build_context }}
    displayName: Build Docker image
    env:
      DOCKER_BUILDKIT: 1
```
If the above YAML is saved in a `templates.yaml` file, you can use it in your pipeline like this:
```yaml
jobs:
  - job: BuildDockerImage
    steps:
      - template: templates.yaml
        parameters:
          docker_image_name: 'my-image'
          additional_docker_build_args: '--build-arg SOME_ARG=some_value'
          dockerfile_path: 'Dockerfile'
          docker_build_context: '.'
```
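To cover both build points from the start of the post, the consuming pipeline can also declare CI and PR triggers. A sketch follows; the branch names are placeholders, and note that for Azure Repos Git the PR build is configured through branch policies rather than the `pr` key.

```yaml
# Example triggers: build after merges to main (CI) and for PRs targeting main.
# Combine with the jobs section shown above; branch names are placeholders.
trigger:
  branches:
    include:
      - main

pr:
  branches:
    include:
      - main
```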
References
- Morten Hels - Great data scientist moonlighting as an excellent data engineer.
- Stack Overflow post that Morten claims got him on the right track.
- Docker documentation on `docker buildx`.