TL;DR: Go to the bottom of the post to see the full Pipeline template.
In the Data Science team at DFDS, we use Azure DevOps Pipelines to build and deploy our models: each model is packaged as a Docker container, and Azure Pipelines handles our CI/CD.
For most projects, we build the Docker image at two points:
- In the pull request: to make sure the image can be built, and sometimes also to run tests in the new container.
- After merging to main: to build the final image that will be deployed to production.
The first build usually happens more than once, as issues raised in a PR often require multiple iterations of review and fixes. For this reason, it is important to keep the build time as short as possible; long feedback loops are bad for productivity.
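For reference, these two build triggers can be expressed in an Azure Pipelines file roughly as follows (a minimal sketch; the branch name, job name, and plain `docker build` step are assumptions for illustration):

```yaml
# Run the pipeline for pull requests targeting main ...
pr:
  branches:
    include:
      - main

# ... and again after the merge to main.
trigger:
  branches:
    include:
      - main

jobs:
  - job: BuildDockerImage
    steps:
      - script: docker build --file Dockerfile --tag myimage .
        displayName: Build Docker image
```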
So the solution is to cache the Docker layers between builds. Azure Pipelines even has a Cache task that claims to support caching Docker builds, but the commands listed on that documentation page have never worked for me.
My brilliant friend Morten Hels came up with a solution that works. I’m taking the liberty of writing it down here, but he is the one who deserves the credit.
Instead of using `docker save` and `docker load` to (attempt to) make cached Docker layers available, we use `docker buildx` to build the image from, and save it to, a local cache.

The command to run is:
```shell
# (1) Create a new builder and use it
docker buildx create --name builder --driver docker-container --use

# (2) Read cached layers from the local cache,
# (3) write the layers used in this build back to it,
# (4) and output a regular Docker image
docker buildx build \
  --cache-from=type=local,src=docker_cache \
  --cache-to=type=local,dest=docker_cache,mode=max \
  --file Dockerfile \
  --output=type=docker,name=myimage \
  .
```
1. Create a new builder, and use it. This is needed to make the `--cache-to` option available. I'm using the `docker-container` driver, but other drivers are available; this one is just the easiest to set up, both locally and in a pipeline.
2. Use the local cache as a source for the build, so cached layers are reused when available.
3. Save the layers that were used in the build to the local cache, making them available for the next build.
4. Set the output to be a regular Docker image. This is needed to make the image available to the next step in the pipeline, e.g. pushing it to a registry.
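The caching pays off most when the Dockerfile puts expensive, rarely-changing steps in early layers. A hypothetical example (the Python base image and `requirements.txt` are assumptions, not part of the pipeline above):

```dockerfile
FROM python:3.11-slim
WORKDIR /app

# Expensive layer: re-runs only when requirements.txt changes,
# so repeated builds restore it from the buildx cache.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Cheap layer: changes with every code edit, but only this layer rebuilds.
COPY . .
CMD ["python", "main.py"]
```

With this layout, a second `docker buildx build` with `--cache-from` pointing at the populated cache should report the dependency-install layer as CACHED.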
The pipeline template
Here is a complete pipeline template that you can use in your own pipelines.
```yaml
parameters:
  - name: docker_image_name
    type: string
    displayName: 'The name of the Docker image to build. Example: klaur-testing.'
  - name: additional_docker_build_args
    type: string
    default: ''
    displayName: 'Additional arguments to pass to the docker build command. Example: --build-arg SOME_ARG=some_value.'
  - name: dockerfile_path
    type: string
    default: 'Dockerfile'
    displayName: 'The path to the Dockerfile to use. Example: Dockerfile.'
  - name: docker_build_context
    type: string
    default: '.'
    displayName: 'The path to the directory to use as the build context. Example: .'

steps:
  - task: Cache@2
    displayName: Cache Docker layers
    inputs:
      key: '"docker" | "$(Agent.OS)" | "${{ parameters.docker_image_name }}" | ${{ parameters.dockerfile_path }}'
      restoreKeys: |
        "docker" | "$(Agent.OS)" | "${{ parameters.docker_image_name }}"
      path: $(Pipeline.Workspace)/docker_cache

  - script: |
      docker buildx create --name builder --driver docker-container --use
      docker buildx build \
        --cache-from=type=local,src=$(Pipeline.Workspace)/docker_cache \
        --cache-to=type=local,dest=$(Pipeline.Workspace)/docker_cache,mode=max \
        --file ${{ parameters.dockerfile_path }} \
        --output=type=docker,name=${{ parameters.docker_image_name }} \
        ${{ parameters.additional_docker_build_args }} \
        ${{ parameters.docker_build_context }}
    displayName: Build Docker image
    env:
      DOCKER_BUILDKIT: 1
```
If the above YAML is saved in a `templates.yaml` file, you can use it in your pipeline like this:
```yaml
jobs:
  - job: BuildDockerImage
    steps:
      - template: templates.yaml
        parameters:
          docker_image_name: 'my-image'
          additional_docker_build_args: '--build-arg SOME_ARG=some_value'
          dockerfile_path: 'Dockerfile'
          docker_build_context: '.'
```