Docker in Practice #6 Cloud Deployment — Fly.io / Railway / ECS — Wrapping the Track

The last post of the Docker track. In #1–3 we built container images, in #4–5 we built and pushed them in CI. This post is about putting them on actual production infrastructure.

After comparing the three options, I’ll briefly walk through the Fly.io and Railway flows, and link ECS to the AWS in Practice track. At the end, a recap of all 24 Docker posts.

The three-way fork #

Start with one table.

| | Fly.io | Railway | AWS ECS Fargate |
|---|---|---|---|
| Where | Edge (30+ regions globally) | US/EU regions | All AWS regions |
| Infra model | Firecracker VM (per Machine) | Container (Nomad) | Container (managed) |
| Deploy unit | App + Machine | Service | Task + Service |
| Multi-region | Built in (anycast) | Partial | Needs separate design |
| DB / Redis | Built in (Postgres, Upstash) | Built in (Postgres, Redis) | RDS / ElastiCache |
| Pricing | Usage-based (per minute) | Usage-based (monthly) | Hour/memory based |
| Learning curve | Low | Lowest | High |
| Lock-in | Low | Low (just docker images) | High (AWS ecosystem) |

A one-line decision rule:

  • Need to ship fast and stay portable → Railway or Fly.io. As long as you have a docker image, moving between them is easy.
  • Already running on AWS → ECS. Naturally pairs with other AWS services (RDS, S3, IAM).
  • Global users / low latency → Fly.io. Edge is the default.

This post walks through the first two; ECS is linked over to AWS in Practice #1 ECS Deployment.

Fly.io — the fly launch flow #

Fly.io takes a docker image and runs it on Firecracker VMs (= Machines). One App contains multiple Machines, each Machine running one container.

1. Install the CLI and log in.

Fly CLI
brew install flyctl
fly auth login

2. Start with fly launch.

fly launch looks at the directory and auto-generates an appropriate fly.toml. If a Dockerfile exists, it uses that.

Create the app
cd my-fastapi-app
fly launch
# - choose app name
# - choose region (recommends nearest)
# - asks if you want Postgres → Yes creates one and auto-injects DATABASE_URL

The generated fly.toml:

fly.toml
app = "my-fastapi-app"
primary_region = "nrt"   # Tokyo

[build]
  # Uses Dockerfile (auto-detected)

[env]
  PYTHONUNBUFFERED = "1"

[http_service]
  internal_port = 8000
  force_https = true
  auto_stop_machines = "stop"      # auto-stops when no traffic (saves cost)
  auto_start_machines = true        # auto-starts on first request
  min_machines_running = 0

  [[http_service.checks]]
    interval = "30s"
    timeout = "5s"
    grace_period = "10s"
    method = "GET"
    path = "/healthz"

[[vm]]
  cpu_kind = "shared"
  cpus = 1
  memory_mb = 512

auto_stop_machines = "stop" is the interesting part. With no traffic the Machine stops automatically, and the first incoming request starts it again (min_machines_running = 0 is what lets the app scale all the way down to zero). Cold start is around 0.5–2 seconds, which suits side projects or environments with bursty traffic.

3. Inject secrets.

Secrets like DATABASE_URL go in via fly secrets. Never pin them in fly.toml.

Secrets
fly secrets set DJANGO_SECRET_KEY=$(openssl rand -hex 32)
fly secrets set DATABASE_URL="postgres://..."
fly secrets list

fly secrets set automatically redeploys the app with the new env vars applied. Secrets enter at runtime, not build time, so they never get baked into the image (the same rule from #5).
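On the app side, "secrets enter at runtime" usually means reading them from the process environment at startup and refusing to boot if one is missing. Here is a minimal sketch of that idea. The Settings class, load_settings function, and MissingSecretError are names I made up for illustration; only the DATABASE_URL and DJANGO_SECRET_KEY variable names come from the commands above.

```python
import os
from dataclasses import dataclass


class MissingSecretError(RuntimeError):
    """Raised at startup when a required secret is not in the environment."""


# repr=False keeps secret values out of logs if the object is ever printed.
@dataclass(frozen=True, repr=False)
class Settings:
    database_url: str
    secret_key: str


def load_settings(env=None) -> Settings:
    """Read secrets from the environment and fail fast if any are missing."""
    env = os.environ if env is None else env
    # Hypothetical mapping: settings field -> env var injected by the platform.
    required = {"database_url": "DATABASE_URL", "secret_key": "DJANGO_SECRET_KEY"}
    missing = [name for name in required.values() if not env.get(name)]
    if missing:
        raise MissingSecretError(
            "missing required secrets: " + ", ".join(sorted(missing))
        )
    return Settings(**{field: env[name] for field, name in required.items()})
```

Calling load_settings() once at startup turns a forgotten secret into a failed deploy (the new instance never becomes healthy) instead of a runtime error on the first request that needs it.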

4. Deploy.

Deploy
fly deploy
# Or use an already-pushed image
fly deploy --image ghcr.io/me/app:sha-a1b2c3d

Fly takes the image, spins up a new Machine, waits for healthcheck to pass, then routes traffic. Rolling strategy by default — zero-downtime.

5. Logs / status / shell.

Operations
fly status
fly logs
fly ssh console        # shell into a container
fly machine restart
fly scale count 3      # scale to 3 Machines
fly scale memory 1024  # change memory

Auto-deploying from CI is simple:

.github/workflows/deploy-fly.yml
name: Deploy to Fly.io

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: superfly/flyctl-actions/setup-flyctl@master
      - run: flyctl deploy --remote-only
        env:
          FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}

--remote-only builds on Fly’s builder — no GHA runner build, so the workflow is shorter. But there’s a usage limit on Fly’s builder. If you need a bigger cache, do GHA build → registry push → flyctl deploy --image ... like in #4.

Railway — railway up #

Railway’s strength is the fastest start. The UI is clean, and docker + env vars + Postgres all bundle into one screen.

1. Install the CLI and log in.

Railway CLI
brew install railwayapp/railway/railway
railway login

2. Create and connect a project.

Creating a project in the web console asks you to connect a GitHub repo. Once connected, every push to main auto-builds/deploys. If a Dockerfile exists, it’s preferred (otherwise Nixpacks auto-detects).

CLI-only:

CLI deploy
cd my-app
railway init      # new project
railway up        # build and deploy current directory

3. Secrets and service connections.

Adding a Postgres service in the console automatically injects env vars like DATABASE_URL into other services. Same via CLI.

Env vars
# Note: recent CLI versions use the flag form instead: railway variables --set "KEY=value"
railway variables set DJANGO_SECRET_KEY=$(openssl rand -hex 32)
railway variables                # see current variables
railway run -- python manage.py migrate    # run a command with env vars injected

4. Healthcheck and zero-downtime.

Set healthcheck in railway.json (or railway.toml).

railway.json
{
  "$schema": "https://railway.com/railway.schema.json",
  "build": {
    "builder": "DOCKERFILE",
    "dockerfilePath": "Dockerfile"
  },
  "deploy": {
    "healthcheckPath": "/healthz",
    "healthcheckTimeout": 30,
    "restartPolicyType": "ON_FAILURE",
    "numReplicas": 2
  }
}

Railway also defaults to rolling deploys that route traffic after healthcheck passes. With numReplicas: 2, you get zero-downtime.
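The /healthz path in these configs has to be served by the app itself. As a rough sketch of what such an endpoint does, here is a standard-library version: it returns 200 only when a dependency check passes and 503 otherwise. The names health_response, make_handler, and check_db are hypothetical, and in a FastAPI or Django app this would be an ordinary route rather than a raw handler.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable, Tuple


def health_response(check_db: Callable[[], None]) -> Tuple[int, bytes]:
    """Return (status, body): 200 only if the dependency check succeeds."""
    try:
        check_db()  # e.g. run "SELECT 1"; must raise on failure
    except Exception as exc:
        body = {"status": "unavailable", "error": type(exc).__name__}
        return 503, json.dumps(body).encode()
    return 200, json.dumps({"status": "ok"}).encode()


def make_handler(check_db: Callable[[], None]):
    """Build a request handler class that serves GET /healthz."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthz":
                self.send_error(404)
                return
            status, body = health_response(check_db)
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep probe traffic out of the access log

    return HealthHandler
```

To try it locally, pass the handler to http.server.HTTPServer, for example HTTPServer(("0.0.0.0", 8000), make_handler(check_db)).serve_forever(). The point of the 503 branch is that the platform keeps traffic on the old instances until the new one can actually reach its database.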

Fly vs Railway, briefly.

Railway really is close to “just run my docker image.” Multi-region, edge, anycast — those are Fly’s strengths. For a simple full-stack app, Railway is the fastest option to get up.

ECS Fargate — briefly #

ECS was covered in depth in the AWS track, so just the outline here.

Core concepts:

  • Task Definition — defines which image, with what resources, how to run (JSON).
  • Task — one instance produced from a Task Definition (= a bundle of containers).
  • Service — manager that maintains N Tasks. Pairs with an ALB to distribute traffic.
  • Cluster — the bowl that holds the resources above.

Deploy flow in three steps:

  1. Push image to ECR (CI runs aws ecr get-login-password, pipes the result into docker login, then runs docker push).
  2. Update the image field of the Task Definition with the new SHA tag (revision +1).
  3. Service rolling-deploys the new revision — new Task spins up, healthcheck passes, old Task terminates.
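Step 2 is less magic than it sounds: a Task Definition is JSON, and "updating the image" means swapping one string inside containerDefinitions before registering the result as a new revision. A minimal sketch of that transformation follows. The function name render_task_definition is my own, and the sample values in the usage note are placeholders.

```python
import copy


def render_task_definition(task_def: dict, container_name: str, image: str) -> dict:
    """Return a copy of task_def with one container's image replaced.

    The input is left untouched, so the previous revision's JSON can still
    be used for a rollback.
    """
    rendered = copy.deepcopy(task_def)
    for container in rendered.get("containerDefinitions", []):
        if container.get("name") == container_name:
            container["image"] = image
            return rendered
    raise ValueError(f"no container named {container_name!r} in task definition")
```

For example, render_task_definition(task_def, "web", "REGISTRY/REPO:sha-a1b2c3d") returns a new dict whose web container points at the new tag while every other field stays the same.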

A simple workflow example:

.github/workflows/deploy-ecs.yml — skeleton
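# Skeleton only: assumes the job grants `permissions: id-token: write` (needed for
# OIDC role assumption) and that REPO is defined in a workflow-level env: block.
- uses: aws-actions/configure-aws-credentials@v4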
  with:
    role-to-assume: arn:aws:iam::ACCOUNT:role/github-actions
    aws-region: ap-northeast-2

- uses: aws-actions/amazon-ecr-login@v2
  id: login-ecr

- name: Build, tag, push
  run: |
    docker build -t $ECR_REGISTRY/$REPO:sha-${{ github.sha }} .
    docker push $ECR_REGISTRY/$REPO:sha-${{ github.sha }}
  env:
    ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}

- name: Update task definition
  id: task-def
  uses: aws-actions/amazon-ecs-render-task-definition@v1
  with:
    task-definition: task-def.json
    container-name: web
    image: ${{ steps.login-ecr.outputs.registry }}/${{ env.REPO }}:sha-${{ github.sha }}

- name: Deploy
  uses: aws-actions/amazon-ecs-deploy-task-definition@v1
  with:
    task-definition: ${{ steps.task-def.outputs.task-definition }}
    service: my-service
    cluster: my-cluster
    wait-for-service-stability: true

Detailed IAM setup, ALB / Target Group / healthcheck, RDS integration, cost optimization — those are covered in the 6 posts starting with AWS in Practice #1.

Common principles of zero-downtime #

The principles of zero-downtime deployment are the same across platforms.

Where rolling deployment happens
initial:     [v1]  [v1]  [v1]   ← LB
v2 push →   [v1]  [v1]  [v1]
            [v2]                ← spin up new instance
            ┌── healthcheck ──┐
            │  /healthz OK?   │
            └─────────────────┘
            [v1]  [v1]  [v2]   ← register on LB, remove one old instance
            [v1]  [v2]  [v2]
            [v2]  [v2]  [v2]   ← done
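The diagram can be read as a small loop: health-check a new instance, swap it in for one old instance, repeat. Here is a toy simulation of that loop, not any platform's real scheduler. The function name rolling_deploy and the is_healthy callback are hypothetical stand-ins for the platform's health probe.

```python
from typing import Callable, List


def rolling_deploy(
    instances: List[str],
    new_version: str,
    is_healthy: Callable[[str], bool],
) -> List[List[str]]:
    """Replace instances one at a time; return the LB pool after each step.

    Raises RuntimeError if a new instance fails its health check, leaving
    the remaining old instances in service.
    """
    pool = list(instances)
    history = [list(pool)]
    # Walk from the right so the snapshots match the diagram.
    for i in reversed(range(len(pool))):
        if pool[i] == new_version:
            continue
        if not is_healthy(new_version):
            raise RuntimeError(
                f"{new_version} failed health check; rollout stopped at {pool}"
            )
        pool[i] = new_version  # register the new instance, remove one old one
        history.append(list(pool))
    return history
```

The pool size never changes between snapshots, which is the whole trick: the load balancer always has the same number of healthy targets, and a failed check stops the rollout before any more old instances are removed.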

What each step needs:

  • /healthz endpoint — confirms the app is actually ready to receive. To be accurate, it should also check the DB connection. Returning 200 too early means traffic comes in before warm-up and the first request breaks.
  • Graceful shutdown — when terminating the old instance, receive SIGTERM, finish in-flight requests, then exit. This depends on PID 1 receiving signals correctly (covered in #2’s exec "$@" and Advanced #6).
  • Enough instances — you can’t get zero-downtime with just one. At least 2 to roll.
  • DB migration compatibility — while the new code is running, the old code is also running. Always make migrations backward-compatible (e.g., add columns nullable, drop columns in two stages).
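The migration rule is the easiest of the four to get wrong, so here is a small runnable illustration. It uses an in-memory SQLite database as a stand-in for Postgres, and the users table, display_name column, and function names are all invented for the example. The idea: add the new column as nullable, so old code that has never heard of it can keep inserting rows while the rollout is in progress.

```python
import sqlite3


def old_code_insert(conn: sqlite3.Connection, email: str) -> None:
    # v1 code: knows nothing about the new column.
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))


def new_code_insert(conn: sqlite3.Connection, email: str, display_name: str) -> None:
    # v2 code: writes the new column.
    conn.execute(
        "INSERT INTO users (email, display_name) VALUES (?, ?)",
        (email, display_name),
    )


def migrate_add_nullable_column(conn: sqlite3.Connection) -> None:
    # Backward-compatible step: the column is nullable, so v1 inserts still work.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")

old_code_insert(conn, "a@example.com")         # before the migration
migrate_add_nullable_column(conn)              # migration runs first
old_code_insert(conn, "b@example.com")         # old instances, mid-rollout
new_code_insert(conn, "c@example.com", "Cee")  # new instances

rows = conn.execute(
    "SELECT email, display_name FROM users ORDER BY id"
).fetchall()
```

Rows written by the old code simply carry NULL in display_name. Dropping a column follows the same logic in reverse: first deploy code that no longer reads it, then remove the column in a later migration.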

Secret management — per platform #

Runtime secrets (DATABASE_URL, API keys) never go in the image — put them in the platform’s secret management.

| Platform | Where secrets go |
|---|---|
| Fly.io | fly secrets set (encrypted at rest, injected as env vars into the container) |
| Railway | Variables in the console, or railway variables set |
| ECS | Secrets Manager / Parameter Store + the secrets section of the Task Definition |
| Kubernetes | Secret resource or ExternalSecrets / SOPS |

Common rules:

  • Don’t pin them at build time (runtime env vars, not --build-arg).
  • Never commit them to the repo (for example, as a .env.production file).
  • If CI itself needs to handle secrets, use GHA’s secrets.* or OIDC to issue temporary cloud credentials.
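The first rule is easy to check mechanically. As a rough sketch, the function below scans Dockerfile text for ARG or ENV instructions whose variable name looks like a secret. The name patterns and the function find_build_time_secrets are my own assumptions, and this is a heuristic, not a real secret scanner: it only inspects the first variable on each ARG/ENV line and knows nothing about values.

```python
import re

# Hypothetical list of name fragments that suggest a secret.
SECRET_NAME = re.compile(r"(SECRET|PASSWORD|TOKEN|API_KEY|DATABASE_URL)", re.IGNORECASE)
# Matches "ARG NAME" or "ENV NAME" at the start of a line and captures NAME.
INSTRUCTION = re.compile(r"^\s*(ARG|ENV)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def find_build_time_secrets(dockerfile_text: str):
    """Return (line_number, variable_name) for ARG/ENV names that look like secrets."""
    findings = []
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        match = INSTRUCTION.match(line)
        if match and SECRET_NAME.search(match.group(2)):
            findings.append((lineno, match.group(2)))
    return findings
```

Running something like this in CI and failing the build on a non-empty result catches the most common mistake (ARG DATABASE_URL) before the image is ever pushed.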

A common layout — backend + frontend + DB #

The output of the Docker track in one diagram:

Production layout — common shape
              users
        ┌───────────────┐
        │   CDN/LB      │   (Cloudflare / ALB / Fly anycast)
        └───┬───────┬───┘
            │       │
       ┌────▼─┐  ┌──▼────────┐
       │  Web │  │   API     │   (Next.js / FastAPI, Django)
       │ (#3) │  │  (#1, #2) │   ← deployed as containers
       └──────┘  └─────┬─────┘
                ┌──────▼───┐
                │   DB     │   (RDS / Fly Postgres / Railway PG)
                │ Postgres │
                └──────────┘

What the Docker track covered at each layer:

  • Web container (#3): standalone / static export.
  • API container (#1): uv, multi-stage, non-root.
  • DB: production should be a managed service (RDS / Fly Postgres / Railway PG). The #2 compose pattern is for local/dev.
  • CI build/push: #4, #5.
  • Deploy: this post.

Recap of the 24-post Docker track #

The 6 Basics posts laid down where containers fit, the 6 Intermediate posts covered multi-stage / compose / env vars, the 6 Advanced posts dealt with BuildKit / security / resources / PID 1, and these 6 In Practice posts covered FastAPI / Django / Next.js / CI / tags / deployment — one cycle is closed.

Recall the first sentence of Basics #1 — containers showed up to solve “works on my machine.” After 24 posts, the answer to that same problem is — one image runs the same way everywhere, CI builds and pushes it, and the cloud takes it and runs it. The wrinkles inside (PID 1, healthcheck, multi-arch, secrets, tags) are the actual operational details.

Where to go next:

  • Kubernetes: running dozens of services instead of one container. The abstractions ECS/Fly/Railway hide are exposed in K8s. Typical triggers are a big org, multiple teams, or self-hosting.
  • Service Mesh (Istio, Linkerd): where mTLS, observability, and policy are layered on top of inter-container traffic.
  • Container-native CI/CD: GitOps-style flows with tools like Tekton and ArgoCD.

This track ends at the level of solo-dev to small-team operations; the items above grow into separate tracks.

Summary #

  • The first fork in cloud deployment is Fly.io vs Railway vs ECS. Fast to ship → Railway, edge → Fly, on AWS → ECS.
  • Common everywhere: take the image → spin up new instances until healthcheck passes → LB shifts traffic → terminate old instances. Rolling deployment.
  • The 4 requirements for zero-downtime: /healthz endpoint / graceful shutdown / instances ≥ 2 / backward-compatible migrations.
  • Secrets at runtime only. In the platform’s secret manager. Never bake them into the image.
  • The image: in production manifests should be a SHA tag (#5). Rollback is a one-line change.
  • The 24-post Docker track closes here. Going to a bigger org / self-hosting, Kubernetes is next.

The Docker track wraps up here. In other tracks — Modern Python / Django / Go / TypeScript / React / Angular / AWS — Docker always showed up in the final post. Now that Docker has been covered end-to-end in this track, you can solve every track’s deployment flow with the same tool.
