Developing with Docker
Update: there's now a companion repo to serve as a demonstration of the below.
You'd think that this topic would have been done to death, but given that every job I've started in the past 10+ years has used Docker differently (if at all) and with varying degrees of success, I feel like we need some sense of consensus around how to Do This Right™. And, as someone with an ego the size of a small moon, I'm here to decide what's right... on my own blog anyway.
The argument here is that the use of Docker and various tooling shouldn't be unique to any particular project, that this sort of thing should be so standard it's both common and boring to even think about. My experience tells me that we're not there yet though, so this is just me making the case for what I think constitutes a good setup.
Thesis
The main argument I want to make is that if you're using Docker, you don't write software anymore, but rather you build immutable images. Images that are developed locally, tested in CI, and deployed to production, but importantly, the image at each of these stages is exactly the same: reproducible at every turn. There's little value in developing and testing one thing if what you roll out to production is something else entirely. Say it with me:
Test what you develop. Deploy what you test.
There are a lot of advantages to this model, strongest of which is probably simplicity. It greatly reduces the number of tricks & hacks you need to make your various environments work, thereby reducing surprises (and extra code!) at all stages. It does however require a shift in how you might be used to building things. That's ok though. We'll get through this together ;-)
The 12 Factor Model
Maybe you've heard about it and maybe you haven't, but the 12-factor app is a design pattern that's uniquely suited to Docker-based systems. The gist is that your application code and behaviour shouldn't change based on its environment. At its simplest, stuff like this:
if ENVIRONMENT == "prod":
    do_something_only_production_does()
...shouldn't happen. How exactly do you test that? How do you even demo it locally? You're introducing unpredictable behaviour into your application that can only be revealed in production.
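The 12-factor way is to push anything that differs between environments into configuration and keep the code identical everywhere. As a minimal sketch (the setting names here are just illustrative, though they happen to match the compose file further down):

import os

# The URL changes between environments; the code that reads it does not.
DB_URL = os.environ["DB_URL"]
ELASTICSEARCH_URL = os.environ["ELASTICSEARCH_URL"]

# Behaviour is switched by explicit configuration, never by asking
# "which environment am I in?"
DEBUG = os.environ.get("DEBUG", "False") == "True"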
Often external services like ElasticSearch or Kafka are a culprit here. We think to ourselves: "This is development. We don't need an industrial grade service to handle the little stuff I'm doing on my laptop." ...and so we stub out these services with dummy functions or bypass them altogether with hacks like the above.
The problem of course is that in doing so you've cemented a blind spot into your application. Your software behaves differently in one environment than it does in another, so when it breaks in production, you have no reliable way to debug it, let alone catch it before it reaches production in the first place.
The work-around for this is to reproduce your production environment locally as best you can — a feat that wasn't really possible pre-Docker. Maybe you could stand up a database with your application, but a queue, Kafka, or a built-in-house-by-another-team-API-server? Without containers, that's a fool's errand.
But we have containers, so let's use them! When you're developing locally, you can use Docker Compose to spin up everything you use in production. Sure, you won't have 64 instances of the webserver and 128 workers, but one of everything relevant is likely enough, and if you ever want to track down some of the ugly race conditions and async issues, you can always increment the replicas for a single service to 2 or 3.
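Scaling a single service, for example, is one flag away. Here I'm assuming a service named worker, like the one defined in the compose file below:

$ docker compose up -d --scale worker=3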
So what does that look like? Well here's a sample compose.yaml file for a common pattern you see in Djangoland:
name: myproject

# Note that we're hard-coding credentials here with some very insecure values.
# That's ok, because these values are set to something sensible in production.
# What's important is that (a) the way your application behaves is consistent
# (looks for credentials and uses them to do something), and (b) the values are
# all known, so standing this baby up requires no special knowledge (more on
# this in the next section).
x-base: &base
  build:
    context: .
  environment:
    DEBUG: 'True'
    ALLOWED_HOSTS: '*'
    SECRET_KEY: secret
    DB_URL: postgres://postgres:postgres@database/postgres
    CACHE_URL: redis://redis:6379/0
    QUEUE_URL: redis://redis:6379/1
    ELASTICSEARCH_URL: http://elasticsearch:9200
    ELASTICSEARCH_USER: elastic
    ELASTICSEARCH_PASS: elastic
    BULK_UPLOAD_BUCKET: mybucket
    BOTO3_ENDPOINT: http://localstack:4566/
    AWS_ACCESS_KEY_ID: XXX
    AWS_SECRET_ACCESS_KEY: XXX
  working_dir: /app/src
  volumes:
    - .:/app
  restart: on-failure
  depends_on:
    - redis
    - database
    - the_other_database
    - elasticsearch
    - localstack

services:

  # The database. Our project is using PostgreSQL 16 in production, so
  # that's what we use here.
  database:
    image: postgres:16-alpine
    restart: always
    environment:
      POSTGRES_PASSWORD: postgres

  # There's another database running an older version that we pinky-swear
  # we're going to update soon, but For Right Now™ it's doing its own thing
  # with behaviour that differs from our primary db.
  the_other_database:
    image: postgres:11-alpine
    restart: always
    environment:
      POSTGRES_PASSWORD: postgres

  # We cache things, but we also might use this to queue stuff.
  redis:
    image: library/redis:7.4-alpine

  # Apparently this project needs some industrial strength search magic.
  elasticsearch:
    image: elasticsearch:8.15.1
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms256m -Xmx1g"
      - xpack.security.enabled=false
      - ELASTIC_PASSWORD=elastic

  # Production is on AWS, so this is how we simulate S3
  localstack:
    image: localstack/localstack:1.2
    environment:
      SERVICES: s3:5002
      LOCALSTACK_HOSTNAME: 127.0.0.1
      BULK_UPLOAD_BUCKET: mybucket
    stop_grace_period: 1s

  # Our webserver
  web:
    <<: *base
    ports:
      - '${PORT:-8000}:8000'
    command: /app/src/manage.py runserver '0.0.0.0:8000'

  # No point in having a queue if you don't have workers for it. Note that we
  # can set ${WORKERS} in our environment to spin up multiple workers, but it
  # defaults to 1. This is handy when you want to test out how async stuff
  # might handle a large workload.
  worker:
    <<: *base
    deploy:
      replicas: ${WORKERS:-1}
    depends_on:
      - redis
      - database
    command: /app/src/manage.py rqworker default
It looks daunting, but what you see here is a declaratively-defined, canonical way for you to stand up A Very Complicated Project with everything you need to make your application work just as it does in production (whether using all of these tools together is a good idea or not is a whole other conversation). Critically, now you can do this on anyone's laptop, and even in CI (we'll get to that).
The big win here is that you don't have to be thinking things like "In production, my app will do this, but in development that feature's not enabled, so this other thing will happen". No: if your app uses ElasticSearch, then your local development experience can now account for all the quirks of interacting with that system. More importantly, your tests can be written to interact with them too.
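Even something as simple as this is surprisingly valuable — a pytest sketch (assuming requests is available in the image) proving your code is talking to the same Elasticsearch the application uses, not a mock:

import os
import requests

def test_elasticsearch_is_really_there():
    # Hits the same container the application talks to -- no stubs, no mocks.
    response = requests.get(os.environ["ELASTICSEARCH_URL"])
    assert response.status_code == 200
    assert "cluster_name" in response.json()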
It's important to remember though that this isn't religion. Big services like Oracle or Kafka for example can be beasts to stand up locally and may only have very tangential relevance to your project. In such cases it may make more sense to take on the risk of stubbing-out the service for development and tests, but these really should be the exception.
Developer Tooling
This is where I tend to run into the most pushback on this pattern, but it's also the part that can greatly reduce headaches. Are you ready? Your immutable image includes everything you need for development: linters, tests, and debugging modules. I will sometimes even include a few useful system tools like netcat or ping, as well as a fancy prompt.
None of these things are necessary for production. They are, at best, image bloat, adding anywhere from 100 to 200 MB of useless code to your image that's never used in the wild. Why then would we want to include it?
Part of it goes back to the develop → test → deploy paradigm. If these tools exist in one stage, they must exist in all of them unless you're willing to deploy something you didn't test. Mostly though, this pattern saves you a pile of pain around getting everyone developing on your project with as few problems as possible.
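In Dockerfile terms, all this really means is that the image installs the development dependencies alongside the production ones. A rough sketch, where the file names and paths are just assumptions about how the project above might be laid out:

FROM python:3.12-alpine

WORKDIR /app

# Production and development dependencies both live in the image, so the
# linters and test runner are available wherever the image runs.
COPY requirements.txt dev-requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt -r dev-requirements.txt

# The helper scripts referenced throughout this post, plus the code itself.
COPY scripts/ /scripts/
COPY . /app/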
One of the first questions I always ask on a new job is "How do you run the tests?". This is because everyone does it differently and many companies bake assumptions into the process that depend on things like your OS or Python version, etc. As a Python developer, I see instruction manuals like this all too often:
- Install homebrew
- brew install <some system dependencies>
- Install Pyenv
- Switch your Python version to 3.10
- Set up a virtualenv and pip install -r dev-requirements.txt
- Run our special script that assumes you're running everything on a Mac. It'll set some environment variables, create some folders, and make a bunch of assumptions about the location of various files and their permissions on your system.
If you're on Windows or Linux of course, you're typically met with a shrug, but even if you are on a Mac, this is a rather demanding ask: make a bunch of changes to your entire system with homebrew (good luck if you've got multiple projects with different version requirements), and more importantly, you're not testing the image. You're deploying a Debian image with dependencies of various versions that are the result of apt install <package name>. These system dependencies (and the system's behaviour when interacting with our Python dependencies) are unique to the environment, and by developing outside of that controlled environment, you're inviting surprises.
Here's a fun example of this that I ran into on a recent job. The company was mostly comprised of Mac-based developers, but there was a small group of Linux nerds as well of which I was a part. The project we were working on required an antiquated version of a system-level XML parsing library, so in order to even build the project's Python dependencies, you needed to have version 1.x of this library installed on your system. This was fine for the Macs, since there was apparently no way to upgrade this library anyway, but for some of the Linux machines (like my beloved Arch) we had to pin the version of this library to the old value, lest our package manager upgrade it and break our project.
It doesn't stop there though. We also had to install a bunch of spy/management-ware on our machines which (as you might have guessed by now) complained at you if you were running an old version of any package. So every time I had to rebuild the Python dependencies, I had to downgrade this XML library, install the dependencies, then re-upgrade it so I could actually use my computer without it barking at me about updates. Good times.
It's just so needlessly complicated. Imagine if those complicated instructions were as simple as:
docker compose up
The Python version is guaranteed to be the same as in production. The OS and system dependencies are also guaranteed to be the same. We don't care what OS you're running or what version of Python you might have locally, and we're not asking you to change anything on your system to make this project go. You just say: "start the project in its own environment".
It's liberating how simple this is. It's even more exciting when you consider that you can now run unit tests for multiple languages (your NodeJS-based front-end will have its own set of special behaviour constrained to its container) and you can run integration tests between the various services -- no demands or expectations made on the host computer whatsoever.
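Running the front end's tests, for example, becomes just another one-liner. Assuming a hypothetical frontend service defined in the same compose file:

$ docker compose run frontend npm test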
The same goes for the linters. What version of ruff did you install on your laptop? How 'bout mypy? Are they up to date with the project requirements? What if running the linters was as easy as:
$ docker compose exec web /scripts/run-linters
How much less hassle would you and every new developer on your project have? How many hours of senior developers debugging junior setups could be reclaimed? This alone is a goal worth working toward: a one-line way to ensure Shit Just Works™.
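And there's nothing magical about that script either. A sketch of what /scripts/run-linters might contain, assuming ruff and mypy as above:

#!/bin/sh

# Stop at the first failure so the job fails loudly.
set -e

ruff check /app/src
mypy /app/src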
The CI
The nice thing about building an accurate, consistent, and reproducible environment for local development is that you can reuse that environment in the CI. I've seen this far too many times: the dev team has put in all the effort to dockerise their project and may even use Docker both in development and production, but for some reason they run the tests in whatever broken Ubuntu container their favourite CI provider offers. The whole CI process looks something like this:
- Check out the code into an Ubuntu container
- Build a virtualenv
- Use apt to install one or two things
- Install the Python dependencies into the virtualenv with something like pip or uv
- Stand up a Docker container of Postgres (What version? Probably "latest".)
- Run the tests
- Build the Debian or Alpine-based Docker image
- Push it to the repo for deployment
This is madness. You just tested all of your code in an entirely different Linux distro using different versions of system dependencies and ran your tests against only one of your external dependencies (the database), and there's no guarantee that it was even the right version. Then you throw all that out and build an entirely separate Docker image with different... everything and you deploy that.
To top this all off, it's extra work! Someone had to sit down and write those instructions above, accounting for the differences between development and production and trying to make a best-effort middle road between the two. Someone has to keep an eye on CI environment versions, and worry about keeping dependencies in sync between the Docker container and what's installed in CI.
Why do this to yourself when you've already gone through the effort of building a working Compose environment? Just use that! You can build your image, then test that your image behaves as it should, and then push that same image out for deployment.
Here's what that looks like with GitLab's CI system:
# We use "Docker in Docker", since GitLab CI drops you into a Dockerised
# environment to begin with.
services:
  - docker:dind

stages:
  - build_and_test

build_and_test:
  services:
    - docker:dind
  variables:
    DOCKER_HOST: tcp://docker:2375
    DOCKER_DRIVER: overlay2
  stage: build_and_test
  image: docker
  # We log into the Docker repo to pull down the latest image that was
  # successfully built. That way when we build our image here, we can
  # take advantage of layer caching which reduces our build time to
  # seconds.
  before_script:
    - docker login --username $CI_REGISTRY_USER --password $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    - docker build --cache-from $CI_REGISTRY_IMAGE:latest --build-arg BUILDKIT_INLINE_CACHE=1 --tag $CI_REGISTRY_IMAGE:latest .
  # This is a simple project, so we're running the linters followed by the
  # tests in the same job. We could split this up, but I'm trying to keep
  # it easy to follow.
  script:
    - docker compose run web sh -c "/scripts/run-linters"
    - docker compose run web sh -c "/scripts/run-tests"
    - docker push $CI_REGISTRY_IMAGE:latest
Note that we're running exactly the same stuff here as we did in development. That's by design. It should be easy and obvious to a developer just what's running in CI and how they might reproduce a problem locally, but more than anything else, they should be able to know:
"If the tests pass locally, they'll pass in CI"
There's something deeply satisfying about knowing this with a high degree of confidence. It's also a lot cheaper for a company that pays a SaaS CI provider.
Why You Should Care
The big advantage to this model is simplicity.
- No special documentation for getting your development environment working.
- No special documentation for running the tests or linting.
- No weird hacks to run the tests in CI.
- It works everywhere, on any system, without hassle.
- It doesn't depend on (and therefore potentially interfere with) the host system.
This is a massive net reduction in code and documentation as well as in confusion and headaches for the entire development team. How does the project work? The same way, regardless of where it's being used. You reduce the chances that you'll miss something in production or CI because development is as close to production as is technically possible and that saves you a mountain of pain as well.
Mostly though, it'll make me happy. Just once I'd like to sit down at a project that doesn't require me to globally install a bunch of tools and libraries, or open some network ports, or use homebrew, or Pyenv, or any other virtualisation-before-we-had-containers nonsense. I Have A Dream my friends, and that dream is that one day I'll sit down at a new project and the instructions will be this simple:
To stand up the project:
$ docker compose up
To run the tests:
$ docker compose exec web /scripts/run-tests
Or just use our alias:
$ make tests
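(That alias doesn't have to be anything clever either — a Makefile sketch, assuming the paths above:)

.PHONY: tests
tests:
	docker compose exec web /scripts/run-tests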
Until that day comes though, I'm going to keep pointing people to this blog post.