Docker Basics #6: .dockerignore and the Build Context — Using the Cache Well


The final post of the Docker Basics series. We’ve containerized a small app, run it, kept its data, and pushed it to a registry. This post drills into one problem — why images get fat and why builds get slow — and the two tools that fix both: .dockerignore and the layer cache.

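The build context: what Docker does before the build starts

When you run docker build -t myapp ., the trailing dot is the build context: the directory whose files the build is allowed to use. Before the build starts, Docker packages that directory and ships it to the daemon. The legacy builder sends everything in it; BuildKit sends only the files the Dockerfile’s instructions reference, but a COPY . . references all of them.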

Build context flow
host                              docker daemon
┌──────────────────┐              ┌─────────────────┐
│  ./hello-docker  │              │                 │
│  ├ app.py        │  tar archive │                 │
│  ├ requirements  │ ───────────▶ │   build         │
│  ├ Dockerfile    │              │                 │
│  ├ ...           │              │                 │
│  └ .git/         │              └─────────────────┘
└──────────────────┘

That’s the line you see in build output:

One line in the build output
=> [internal] load build context           500ms
=> => transferring context: 152MB

The bigger that 152MB:

  • The slower the build is to start.
  • The more memory and disk the Docker daemon needs to hold it.
  • The more work goes into change detection (cache key calculation).
  • And most importantly: files in there can be baked into the image by an instruction like COPY . . (which copies the whole context).

Two things to keep separate:

  1. What ships as context — every file the daemon receives.
  2. What ends up in the image — only what you explicitly bring in via COPY / ADD.

.dockerignore trims the first of those.

.git, node_modules, .venv: what to exclude

The numbers are often surprising when you look at a small project’s directory:

Look at the context size
du -sh .
# 152M    .

du -sh .git node_modules .venv build dist *.log
# 80M     .git
# 60M     node_modules     # will be reinstalled inside the container
# 8M      .venv            # same
# 4M      build
# 1M      dist
# 200K    *.log

Almost none of that is needed inside the container. Dependencies will be reinstalled by RUN pip install / RUN npm ci inside, so there’s zero reason to send your host’s node_modules or .venv. .git is usually excluded too (if the build needs git info, pass it as --build-arg GIT_SHA=... — cleaner).
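To see which top-level entries dominate the context before writing any ignore rules, a short script can do the same job as du. This is a minimal sketch using only the standard library; the function name top_level_sizes is hypothetical, and it measures raw file sizes on disk, which only approximates what Docker actually transfers.

```python
import os


def top_level_sizes(root):
    """Return (name, bytes) for each top-level entry in root, largest first."""
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            total = 0
            # os.walk does not follow symlinked directories by default.
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    if not os.path.islink(path):
                        total += os.path.getsize(path)
            sizes[entry.name] = total
        elif entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)
```

Calling top_level_sizes(".") in the project root returns the entries sorted by size, so the first few names are the best candidates for .dockerignore.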

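Writing .dockerignore

The syntax looks like .gitignore, with one important difference: patterns are matched relative to the context root, so a bare name or *.ext only matches at the top level unless you prefix it with **/. Create a .dockerignore in the root of the build context (usually next to your Dockerfile) and list patterns to exclude from the build context.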

.dockerignore (Python/Node common base)
# Patterns match from the context root; prefix with **/ to match at any depth.

# version control
.git
.gitignore

# env / secrets
**/.env
**/.env.*
**/*.key
**/*.pem

# build artifacts
dist/
build/
out/
**/*.egg-info/

# dependencies (reinstalled inside the container)
**/node_modules/
.venv/
**/__pycache__/
**/*.pyc

# editor / OS
.vscode/
.idea/
**/.DS_Store
**/Thumbs.db

# logs
**/*.log
logs/

# test / cache
.pytest_cache/
.mypy_cache/
.ruff_cache/
coverage/
.coverage
htmlcov/

# Docker tooling
Dockerfile.dev
docker-compose*.yml
.dockerignore

Dockerfiles and .dockerignore are part of the context too. The builder still reads them when they are listed here, but they can no longer be copied into the image, so it’s cleaner to add them to the ignore list.

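Pattern syntax in one table

Pattern            Meaning
node_modules       A file or directory named node_modules at the context root only
node_modules/      In practice the same as node_modules: the trailing slash is stripped when the pattern is normalized
**/node_modules    A file or directory named node_modules at any depth
*.log              .log files at the context root only
**/*.log           .log files at any depth, including the root
dist/**            Everything inside dist
!important.log     An exception to the rules above: include this file (the last matching pattern wins, so put it after the rule it overrides)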

The ! exception is powerful but easy to misuse. Stick to plain ignores as much as you can.
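The matching rules are easier to trust once you can poke at them. The following is a simplified sketch of root-relative matching, ** wildcards, and ! exceptions; it is not Docker’s implementation, the names to_regex and is_ignored are hypothetical, and it ignores edge cases such as character classes and escaping.

```python
import re


def to_regex(pattern):
    """Translate a simplified .dockerignore pattern into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")         # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")      # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_ignored(path, patterns):
    """Return True if path (relative to the context root) is excluded."""
    parts = path.split("/")
    # A rule that matches a parent directory also covers everything under it.
    candidates = ["/".join(parts[:n]) for n in range(1, len(parts) + 1)]
    ignored = False
    for raw in patterns:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        negate = line.startswith("!")
        if negate:
            line = line[1:]
        regex = to_regex(line.strip("/"))
        if any(regex.fullmatch(c) for c in candidates):
            ignored = not negate     # the last matching rule wins
    return ignored


rules = ["*.log", "**/*.pyc", "node_modules", "!important.log"]
print(is_ignored("app.log", rules))               # True
print(is_ignored("logs/app.log", rules))          # False: *.log is root-only
print(is_ignored("src/pkg/mod.pyc", rules))       # True
print(is_ignored("web/node_modules/x.js", rules)) # False: not at the root
print(is_ignored("important.log", rules))         # False: re-included by !
```

The second and fourth results are the ones that surprise people coming from .gitignore: without a **/ prefix, nested files are still shipped.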

Verifying the effect

Before / after, the build output shows the size drop directly:

Before
=> [internal] load build context           1.2s
=> => transferring context: 152MB

# After adding .dockerignore
=> [internal] load build context           120ms
=> => transferring context: 240kB

The difference is tangible. CI pulling the context onto a build machine each time makes the savings even more obvious.

Layer cache: making builds fast

The second pillar of Docker builds is the cache. For each instruction in your Dockerfile, Docker asks:

Is the input to this instruction (the previous layer + the instruction itself + any files it references) the same as in the last build? → If yes, reuse the cache; if no, rebuild from this instruction down.

This model decides build speed. Once one instruction misses the cache, every instruction after it misses too, because each layer is built on top of the one before it.

Where the cache breaks

Wrong order — cache breaks easily
FROM python:3.14-slim
WORKDIR /app

# Code and the dependency manifest land together
COPY . .
# Any code change above busts this layer
RUN pip install -r requirements.txt

This Dockerfile reinstalls dependencies for every single code change. COPY . . produces different output, so the cache key for the next RUN changes. Big projects? Builds get minutes longer every time.

Right order: separate dependencies and code

Recommended order
FROM python:3.14-slim
WORKDIR /app

# 1) Copy only the dependency manifest first
COPY requirements.txt .
# 2) Install — cache reused as long as requirements.txt is unchanged
RUN pip install --no-cache-dir -r requirements.txt

# 3) Then copy the code
COPY . .

CMD ["python", "app.py"]

Now if you only change app.py:

  • COPY requirements.txt . → cache hit
  • RUN pip install ... → cache hit (skip the install entirely)
  • COPY . . → re-runs
  • The whole build finishes in seconds.

Things that don’t change go up; things that change often go down. That’s the first heuristic for writing Dockerfiles. Order: base image (rarely changes) → system deps → language deps → code (changes often).
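The cache rule and the two orderings above can be modelled in a few lines. This is a toy sketch, not how BuildKit computes its keys: each step’s key is a hash of the parent key, the instruction text, and the contents of the files it reads. The helper names and the file contents (including the flask dependency) are made up for illustration.

```python
import hashlib


def layer_keys(steps, files):
    """Return one cache key per step.

    steps: list of (instruction, [context paths the instruction reads])
    files: dict mapping path -> file content as bytes
    """
    keys = []
    parent = "base-image"
    for instruction, inputs in steps:
        h = hashlib.sha256()
        h.update(parent.encode())        # previous layer
        h.update(instruction.encode())   # the instruction itself
        for path in sorted(inputs):      # any files it references
            h.update(path.encode())
            h.update(files[path])
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(steps, before, after):
    """List the instructions whose cache key differs between two builds."""
    old = layer_keys(steps, before)
    new = layer_keys(steps, after)
    return [instr for (instr, _), o, n in zip(steps, old, new) if o != n]


EVERYTHING = ["app.py", "requirements.txt"]

wrong_order = [
    ("COPY . .", EVERYTHING),
    ("RUN pip install -r requirements.txt", []),
]
right_order = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install --no-cache-dir -r requirements.txt", []),
    ("COPY . .", EVERYTHING),
]

# Only app.py changes between the two builds.
before = {"app.py": b"print('v1')", "requirements.txt": b"flask\n"}
after = {"app.py": b"print('v2')", "requirements.txt": b"flask\n"}

print(rebuilt_steps(wrong_order, before, after))
print(rebuilt_steps(right_order, before, after))
```

With the wrong order both steps are rebuilt, including the install; with the right order only the final COPY . . is.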

Same pattern for Node.js

Caching Node deps
FROM node:20-slim
WORKDIR /app

COPY package.json package-lock.json ./
RUN npm ci

COPY . .
CMD ["node", "server.js"]

Copy package.json / package-lock.json first → install with npm ci → then code. Same idea.

A side effect of the cache: image size

Layers affect more than build speed; they also shape image size. Once a layer is committed, everything written during it stays in the image, even if a later layer deletes it.

Common mistake — temp files baked into a layer
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

These three lines become three layers. The package index from apt-get update is baked into the first; deleting it later in another layer doesn’t actually shrink the image.

Combine into one line and you get one layer:

Recommended pattern
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

Combine with &&, clean up the cache directory at the end — all in one layer. Image size differs noticeably:

              Split     Combined
Image size    ~180MB    ~85MB

You’ll see this pattern in nearly every official base image’s Dockerfile.
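Why a later delete does not help can be shown with a small model of layers as additive diffs. This is a sketch under simplifying assumptions: the sizes are invented for illustration (they are not measurements), the function names are hypothetical, and real layers are compressed tar archives with whiteout markers rather than dictionaries.

```python
def image_size(layers):
    """Bytes stored in the image: every layer's additions count, forever."""
    return sum(sum(layer["adds"].values()) for layer in layers)


def visible_size(layers):
    """Bytes visible in a running container after applying all layers."""
    filesystem = {}
    for layer in layers:
        filesystem.update(layer["adds"])
        for path in layer["deletes"]:
            filesystem.pop(path, None)   # hidden, but still stored below
    return sum(filesystem.values())


MB = 1024 * 1024

# Three RUN lines: the index is added in layer 1 and deleted in layer 3.
split = [
    {"adds": {"/var/lib/apt/lists": 40 * MB}, "deletes": []},
    {"adds": {"/usr/bin/curl": 5 * MB}, "deletes": []},
    {"adds": {}, "deletes": ["/var/lib/apt/lists"]},
]

# One RUN line: the index is gone before the layer is committed.
combined = [
    {"adds": {"/usr/bin/curl": 5 * MB}, "deletes": []},
]

print(image_size(split) // MB, visible_size(split) // MB)
print(image_size(combined) // MB, visible_size(combined) // MB)
```

Both variants show the same files at runtime, but the split version still stores the package index in its first layer.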

--no-cache: force a fresh build

When the cache is hanging on to something stale:

Build ignoring cache
docker build --no-cache -t myapp .

To pull base images fresh too:

Refresh bases as well
docker build --pull -t myapp .
docker build --pull --no-cache -t myapp .   # both

A cached apt-get update layer keeps serving an old package index, which can leave you missing security patches. Scheduled CI builds therefore often force --no-cache or --pull.

Wrapping the series: one container’s full cycle

What you’ve built up over Basics, at a glance:

The full Dockerfile
FROM python:3.14-slim

ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1

WORKDIR /app

# System deps (one layer with cleanup)
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

# Language deps (rarely change — go up)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Code (changes often — goes down)
COPY app.py .

EXPOSE 8000
CMD ["python", "app.py"]

.dockerignore
.git
**/.env
**/.env.*
**/node_modules/
.venv/
**/__pycache__/
**/*.pyc
**/*.log
.pytest_cache/
.mypy_cache/
.ruff_cache/
**/.DS_Store
Dockerfile.dev
docker-compose*.yml

Build → run → push
# 1) Build (the command CI runs most)
docker build -t ghcr.io/curtis/myapp:1.0.0 -t ghcr.io/curtis/myapp:latest .

# 2) Local sanity check
docker run --rm -p 8000:8000 ghcr.io/curtis/myapp:1.0.0

# 3) Production run: detached mode + persistent data + restart policy
docker run -d --name myapp \
  --restart unless-stopped \
  --network mynet \
  -p 127.0.0.1:8000:8000 \
  -v myapp-data:/app/data \
  -e DB_HOST=pg \
  ghcr.io/curtis/myapp:1.0.0

# 4) Pull on another machine
docker pull ghcr.io/curtis/myapp:1.0.0

Once this flow is in your hands — define → build → run → ship — that’s the destination of Docker Basics.

What’s next: Docker Intermediate

Basics covered making and running a single container well. The next series steps further into multiple containers + deeper builds. What it covers:

  • Multi-stage builds — separate build deps from runtime deps and slim the image
  • Docker Compose — define web + db + cache in one file and start them together
  • healthcheck, depends_on, profiles — Compose’s operational features
  • Build cache deep dive — BuildKit, mount cache — share pip install / npm ci caches across builds
  • Logging and debugging — viewing logs from many containers in one place
  • Environment variables and secrets — handling secrets without baking them into the image

It builds on the single-container habits you’ve now formed, and steps closer to a production environment. Everything from this series — layers, cache, volumes, networks — stacks underneath it.

Wrap-up

The 6 Basics posts in one line each:

  • #1 — Container vs. VM, the bones of the Docker ecosystem
  • #2 — Your first Dockerfile with FROM / RUN / COPY / CMD
  • #3 — Day-to-day commands: build / run / ps / logs / exec / stop / rm
  • #4 — bind mount / named volume, user-defined bridge networks
  • #5 — Docker Hub / GHCR, tag / push / pull, digests
  • #6 — .dockerignore, build context, the order that keeps the layer cache alive

The Docker track has four series. Next up is Docker Intermediate — Compose and multi-stage builds.
