Python Testing #7: Running Tests in CI — People Forget, Machines Don't


No matter how good your tests are, one last question remains: who runs them? Tests that depend on someone remembering to run them eventually stop being run. You skip them once on a busy day, skip them again for an urgent hotfix, and at some point they sit broken and abandoned. In this final post of the series we’ll take test execution out of human memory and hand it to a machine — that is, CI integration.

  • #1 Getting started with pytest
  • #2 Fixtures
  • #3 parametrize and markers
  • #4 mock and monkeypatch
  • #5 Testing the outside world
  • #6 Test design and coverage
  • #7 Running tests in CI ← this post

A basic GitHub Actions workflow

Add a single .github/workflows/test.yml file to your repository, and your tests run automatically on every push to main and on every pull request.

.github/workflows/test.yml
name: tests

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
        with:
          enable-cache: true
      - run: uv sync --all-extras
      - run: uv run pytest

Four steps, that’s all: check out the code, install uv, sync dependencies, run pytest. The single line enable-cache: true stores uv’s package cache in GitHub’s cache storage, so from the second run on, dependency installation typically finishes in seconds. Now every PR shows a green check or a red X. If you also enable branch protection in the repository settings and mark this check as required, a PR with failing tests can’t be merged at all. At this point, “running the tests” is fully decoupled from human willpower.

A Python version matrix

If you maintain a library or need to support multiple versions, run the same tests once per version.

version matrix
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.11", "3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
        with:
          enable-cache: true
          python-version: ${{ matrix.python-version }}
      - run: uv sync --all-extras
      - run: uv run pytest

One parallel job spawns for each version listed in strategy.matrix. Catching code that breaks only on 3.13 is hard locally, but the matrix checks all three versions every single time. If you run a single service of your own, one version matching production is enough.

Attaching a coverage report to PRs

Let’s hook the coverage measurement we built in #6 into CI. Generate an XML report and upload it to a service like Codecov, and every PR gets a comment showing the coverage delta.

coverage upload
      - run: uv run pytest --cov=app --cov-report=xml
      - uses: codecov/codecov-action@v5
        with:
          token: ${{ secrets.CODECOV_TOKEN }}

When a comment like “this PR drops coverage by 2%” appears automatically, the author sees it before any reviewer has to point it out. The value isn’t the number itself but that the direction of change is visible. As we established in #6, coverage is a signal, not a target — and CI puts that signal where everyone can see it.

Earlier feedback with pre-commit

CI only gives feedback after a push. Anything that can be caught at commit time is better caught there. Wire ruff into a pre-commit hook, and formatting and lint issues are flagged before the commit is even created.

.pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.8.4
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format

Install it with uv tool install pre-commit, run pre-commit install once in the repository, and ruff runs automatically on every commit from then on. One caution: don’t put your full pytest suite in pre-commit. When commits take tens of seconds, people start bypassing with --no-verify, and once bypassing becomes a habit the whole hook is neutralized. The division of labor that lasts: commit hooks for checks that finish in 1–2 seconds, the full test suite for CI.

Separating slow tests

As tests grow, CI time grows — and slow CI wrecks your development rhythm. The markers from #3 earn their keep here.

marking a slow test
import pytest

@pytest.mark.slow
def test_full_data_pipeline():
    ...
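
pytest warns about marks it does not know, and with --strict-markers it refuses to run them. If the slow marker is not already registered in your project configuration, one way to register it is the pytest_configure hook in conftest.py. This is a sketch under that assumption; the description string is illustrative.

```python
# conftest.py
def pytest_configure(config):
    """Register the custom 'slow' marker so pytest recognizes it."""
    # addinivalue_line appends one entry to the 'markers' ini option,
    # which has the same effect as listing the marker in the config file.
    config.addinivalue_line(
        "markers",
        "slow: long-running test (deselect with -m 'not slow')",
    )
```

Once registered, the marker appears in the output of uv run pytest --markers, which is a quick way to confirm the hook was picked up.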

Split the runs: only the fast tests locally, everything in CI.

split execution
# local: fast tests only
uv run pytest -m "not slow"

# CI: everything
uv run pytest

For even heavier tests — say, integration tests that call real external APIs — another option is a separate workflow on a schedule trigger that runs once a day instead of on every PR. The key is separating the fast feedback loop from deep verification.

Re-running only the failed tests

While fixing tests locally, there’s no need to run the whole suite every time.

--lf
# last failed: only the tests that failed last run
uv run pytest --lf

pytest records the previous run’s results in the .pytest_cache directory, and --lf reads that record to run only the tests that failed. --ff, which runs the failures first and the rest after, uses the same cache. Being able to iterate on just the three broken tests out of hundreds makes the fix loop dramatically shorter. Once everything is fixed, finish with one full run.

Once tests pass in CI, the next step is deployment. Chaining a Docker build and deploy stage onto the same workflow is covered in Modern Python in Practice #6 — read on there if you want tests and deployment in a single pipeline.

Wrapping up the series

A one-line look back at each of the seven posts.

Tests are, in the end, a safety net for your future self. Three months from now you won’t remember the intent behind today’s code. When you break something while changing it then, the test you wrote today will tell you in red letters. CI is the mechanism that guarantees the net is always spread open. People forget; machines don’t. May the testing habits built across this series underpin every line of Python you write from here on.
