AWS Advanced #2: ECR — Image Registry

#1 ECS and Fargate covered container operations. One piece is missing: where do the images that ECS / Fargate pull actually live? External registries like Docker Hub work, but inside AWS the standard choice is Amazon ECR (Elastic Container Registry).

This post covers ECR’s place — private vs public, IAM auth, push / pull, security (scanning, tag immutability), and ops (lifecycle policies, multi-arch) — all in one go.

What an image registry is for #

Recall the Docker flow:

image lifecycle
docker build → local image
docker push → registry (remote)
docker pull (somewhere else) → bring it down, docker run

The registry sits in the middle. It’s what decides who can pull which image, from where, at which version.

The options #

Registry | Notes
Docker Hub | Most famous. Free, but with pull limits; public by default
GHCR (GitHub Container Registry) | Linked to GitHub accounts. Generous private free tier
Amazon ECR Private | IAM auth inside AWS. Plays naturally with ECS / Lambda / EKS
Amazon ECR Public | For OSS distribution. Anyone can pull anonymously
GCR / Azure ACR | Other clouds

If your ECS / Lambda / EKS runs on AWS, ECR is the standard:

  • IAM auth — no separate password to manage
  • VPC Endpoints to skip the internet on pull (NAT cost savings)
  • Same region → fast pulls
  • Image scanning (vulnerability analysis) integrated

Private vs Public #

Two flavors.

Private (most cases) #

Only your account’s users / roles can access it. Almost all production work uses Private.

  • Region-scoped (each image is pinned to a region)
  • IAM policies for access control
  • Cost: GB stored + Data Transfer

Public (OSS distribution / learning) #

Anyone in the world can pull anonymously. Listed on the AWS-run Public Gallery.

  • Always hosted in us-east-1 (global)
  • Push requires IAM auth, pull is anonymous
  • Cost: GB stored + Data Transfer (push side)

This post assumes Private.

Creating a Repository #

The unit in ECR is a Repository. One repo holds many tags (versions) of the same app.

create a repo
aws ecr create-repository \
  --repository-name myapp \
  --region ap-northeast-2 \
  --image-scanning-configuration scanOnPush=true \
  --encryption-configuration encryptionType=AES256

On success the URI is:

123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp

That’s the address for every push / pull. Format:

ECR URI shape
<account-id>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
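
As a minimal sketch of the URI shape above, a small shell helper can assemble the full image reference from its four parts. The function name ecr_uri is hypothetical (not an AWS tool); the account ID, region, repo, and tag are the example values used in this post.

```shell
# Build a full ECR image reference from its parts.
# Usage: ecr_uri ACCOUNT_ID REGION REPO TAG
ecr_uri() {
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "$4"
}

# Example with the values used in this post
IMAGE=$(ecr_uri 123456789012 ap-northeast-2 myapp v1)
echo "$IMAGE"
```

Keeping the pieces in separate variables in CI makes it harder to mistype the region or account when the same script is reused across environments.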

Options #

Option | Meaning
image-scanning-configuration scanOnPush=true | Auto vulnerability scan on push
image-tag-mutability IMMUTABLE | Forbid overwriting the same tag (recommended for prod)
encryption-configuration encryptionType=KMS | Encrypt with a customer-managed KMS key

You can do the same in the console GUI.

Auth — aws ecr get-login-password #

Unlike Docker Hub, ECR auth is via AWS IAM. There’s no separate password — instead you get a temporary token and docker login with it.

ECR login (12-hour validity)
aws ecr get-login-password --region ap-northeast-2 \
  | docker login --username AWS --password-stdin \
    123456789012.dkr.ecr.ap-northeast-2.amazonaws.com

The token is valid for 12 hours. In CI, re-acquire it at the start of every job.

Permissions you need #

policy on the user / role doing the push
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload",
        "ecr:PutImage",
        "ecr:BatchGetImage"
      ],
      "Resource": "arn:aws:ecr:ap-northeast-2:123456789012:repository/myapp"
    }
  ]
}

GetAuthorizationToken requires * (the only one that does); the rest you scope to the specific repo (Basics #6 least privilege).

Pull-only permissions #

For ECS Tasks (Execution Role) you only need pull. The AWS-managed policy AmazonECSTaskExecutionRolePolicy includes ECR pull permissions.
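
If you would rather scope pull access to one repo than rely on the managed policy, a pull-only policy could look roughly like the sketch below. The helper name pull_only_policy is hypothetical; the three repo-level actions are the ones a pull needs, and note that ecr:GetDownloadUrlForLayer is required for pulling even though the push policy above does not list it.

```shell
# Print a pull-only IAM policy scoped to a single repository.
# Usage: pull_only_policy REPO_ARN
pull_only_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ecr:GetAuthorizationToken"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Resource": "$1"
    }
  ]
}
EOF
}

# Example: the myapp repo from this post
POLICY=$(pull_only_policy "arn:aws:ecr:ap-northeast-2:123456789012:repository/myapp")
echo "$POLICY"
```

Redirect the output to a file if you want to attach it with the IAM CLI or paste it into the console.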

Push / Pull #

Push #

first push
# build
docker build -t myapp:v1 .

# tag (with the ECR URI)
docker tag myapp:v1 \
  123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:v1

# login (see above)
aws ecr get-login-password --region ap-northeast-2 | docker login ...

# push
docker push \
  123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:v1

If the image is 100 MB, the first push uploads 100 MB. Later pushes are layer-aware: layers already in the repo are skipped and only changed layers go up, usually a few MB.

Pull #

pull
docker pull 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:v1

ECS / Lambda do this automatically. You rarely pull manually, but it’s useful for debugging.

Tagging strategies #

How you name versions of the same image inside the repo. Common patterns.

1) Semver #

myapp:1.4.2
myapp:1.4
myapp:1
myapp:latest

Natural for libraries / tools you publish externally. In production, latest is dangerous — it’s unclear which build “latest” actually refers to.

2) Git SHA #

myapp:abc1234        ← short sha
myapp:abc1234567...  ← full sha

1:1 with the commit your CI built. Most recommended for prod — you can immediately trace which commit is in production.
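
A minimal sketch of deriving that tag in CI, assuming a 7-character short SHA. The helper short_sha is hypothetical, and the SHA below is made up so the example runs without a git checkout.

```shell
# Shorten a full commit SHA to the 7-character form used as an image tag.
short_sha() {
  printf '%s\n' "$1" | cut -c1-7
}

# In CI you would typically feed it the current commit:
#   TAG=$(short_sha "$(git rev-parse HEAD)")
# A fixed, made-up SHA keeps this example self-contained.
TAG=$(short_sha "abc1234567890abcdef1234567890abcdef12345")
echo "myapp:$TAG"
```

Whatever length you pick, use the same one everywhere so a tag seen in production can be matched directly against the commit log.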

3) Environment + sequence #

myapp:prod-2025-04-01.001
myapp:staging-2025-04-01.005

Useful where releases are counted per day.

4) Multi-tag #

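The recommended ops pattern: one immutable tag per build, plus an alias for whatever is live now.

build once, two tags
REPO=123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp
docker tag myapp:abc1234 "$REPO:abc1234"        # immutable (forever the same)
docker tag myapp:abc1234 "$REPO:prod-current"   # mutable (points at current prod)

Make the ECR repo itself IMMUTABLE (a pushed tag can’t be overwritten). A moving alias such as prod-current would be rejected on its second push to such a repo, so track "current prod" in a separate tool (your deployment system) rather than as a tag in that repo.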

Image scanning #

ECR auto-scans pushed images for vulnerabilities. Enable with scanOnPush=true (set above).

Two flavors #

Type | What | Cost
Basic Scanning | One-shot scan against a CVE database (originally the open-source Clair project; newer registries default to AWS’s own scanner) | Free
Enhanced Scanning | Inspector integration. OS layer + language libraries (npm, pip, etc.). Continuous monitoring (alerts when new CVEs land after push) | Paid, billed through Amazon Inspector (per image scanned)

Look at Enhanced for production workloads. Basic is enough to start.

Reading results #

scan findings
aws ecr describe-image-scan-findings \
  --repository-name myapp \
  --image-id imageTag=v1

In the console: repo → image → “Vulnerabilities” tab shows CRITICAL / HIGH / MEDIUM / LOW counts at a glance.

Block at the build stage #

Block deploys when CRITICAL is non-zero — in your CI job:

CI gate
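# wait until the scan has finished; otherwise the counts may be missing
aws ecr wait image-scan-complete \
  --repository-name myapp --image-id imageTag="$SHA"

CRITICAL=$(aws ecr describe-image-scan-findings \
  --repository-name myapp --image-id imageTag="$SHA" \
  --query 'imageScanFindings.findingSeverityCounts.CRITICAL' \
  --output text)

# the query prints "None" when there are no CRITICAL findings
if [ "$CRITICAL" != "None" ] && [ "$CRITICAL" -gt 0 ]; then
  echo "🚨 CRITICAL CVE found. Blocking deploy."
  exit 1
fi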

Lifecycle policies — auto-cleanup #

Images pile up; storage costs follow. Use a lifecycle policy to auto-clean.

lifecycle.json
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Delete untagged images after 7 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 7
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "Keep only the latest 30 (delete the rest)",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 30
      },
      "action": { "type": "expire" }
    }
  ]
}

apply
aws ecr put-lifecycle-policy \
  --repository-name myapp \
  --lifecycle-policy-text file://lifecycle.json

Common patterns:

  • Delete untagged after 7–14 days (leftovers from failed builds)
  • Delete pr- prefixed tags after 30 days (PR preview images)
  • Keep release- prefixed forever

After a year in production, an unmanaged repo accumulates GBs of stored images and the cost that goes with them. Setting a lifecycle policy when you create the repo is part of operational hygiene.
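
The pr- pattern from the list above can be sketched as one extra rule. The helper prefix_expiry_rule is hypothetical; the fields it prints (tagStatus, tagPrefixList, countType, countUnit, countNumber) are the standard lifecycle-policy selection fields. If you slot this into lifecycle.json, my understanding is that the rule with tagStatus "any" has to keep the highest rulePriority number, so renumber accordingly.

```shell
# Print one lifecycle rule that expires images whose tags start with PREFIX
# once they are older than DAYS days.
# Usage: prefix_expiry_rule PRIORITY PREFIX DAYS
prefix_expiry_rule() {
  cat <<EOF
{
  "rulePriority": $1,
  "description": "Expire $2* images after $3 days",
  "selection": {
    "tagStatus": "tagged",
    "tagPrefixList": ["$2"],
    "countType": "sinceImagePushed",
    "countUnit": "days",
    "countNumber": $3
  },
  "action": { "type": "expire" }
}
EOF
}

# Example: PR preview images expire after 30 days
RULE=$(prefix_expiry_rule 2 pr- 30)
echo "$RULE"
```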

Multi-architecture images #

A linux/arm64 image built on Apple Silicon won’t boot on amd64 production. Two paths.

1) buildx multi-platform push #

buildx multi-arch
docker buildx create --use
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:v1 \
  --push .

A single ECR tag (v1) holds a manifest list with both architectures. Pull side automatically picks the matching arch.
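
To make the "picks the matching arch" step concrete, here is a rough sketch of the mapping from a host's machine architecture to the platform string used in a manifest list. The helper docker_platform is hypothetical and only covers the two architectures discussed here; the actual selection is done by the container runtime, not by a script like this.

```shell
# Map a machine architecture (as printed by `uname -m`) to the Docker
# platform string that entries in a manifest list are keyed by.
docker_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *)             echo "unknown"; return 1 ;;
  esac
}

# Which variant would this machine pull?
docker_platform "$(uname -m)" || true

# To list the platforms actually present in a pushed tag:
#   docker buildx imagetools inspect <image-uri>
```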

2) Standardize on Fargate ARM #

In the Fargate Task Definition from #1, set runtimePlatform.cpuArchitecture to ARM64 and you only need ARM images. Bonus: Fargate on ARM is priced about 20% lower.

Task Definition (ARM)
{
  "runtimePlatform": {
    "cpuArchitecture": "ARM64",
    "operatingSystemFamily": "LINUX"
  }
}

For a new small- to mid-traffic project, going ARM from day one is the right call.

VPC Endpoint — pull without NAT #

ECS Tasks in private subnets pulling from ECR → by default through NAT Gateway → per-GB charge.

ECR supports VPC Endpoints to skip NAT:

three ECR VPC Endpoints
# api calls
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxx \
  --service-name com.amazonaws.ap-northeast-2.ecr.api \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-aaa subnet-bbb

# Docker registry API (docker push / pull requests)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxx \
  --service-name com.amazonaws.ap-northeast-2.ecr.dkr \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-aaa subnet-bbb

# image layers live in S3 — add an S3 endpoint
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxx \
  --service-name com.amazonaws.ap-northeast-2.s3 \
  --vpc-endpoint-type Gateway \
  --route-table-ids rtb-xxx

The three (api, dkr, s3) form a set. Cuts NAT Gateway costs significantly — almost mandatory at scale.
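
Since the three service names differ only by region, a small helper can generate them for whichever region a VPC lives in. This is a sketch; ecr_endpoint_services is a hypothetical name, and the output is just the service-name strings used in the commands above.

```shell
# Print the three VPC endpoint service names ECR pulls depend on.
# Usage: ecr_endpoint_services REGION
ecr_endpoint_services() {
  for svc in ecr.api ecr.dkr s3; do
    echo "com.amazonaws.$1.$svc"
  done
}

ecr_endpoint_services ap-northeast-2

# To see which endpoints a VPC already has:
#   aws ec2 describe-vpc-endpoints --filters Name=vpc-id,Values=vpc-xxx
```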

Cross-account access #

Want to pull from the prod account’s ECR repo into a dev account? Use a Repository Policy to allow it. The pulling role in the dev account still needs ecr:GetAuthorizationToken in its own IAM policy; that action is registry-wide and is not something a repository policy grants.

repo-policy.json
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "AllowDevAccountPull",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::222222222222:root"
      },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ]
    }
  ]
}

apply
aws ecr set-repository-policy \
  --repository-name myapp \
  --policy-text file://repo-policy.json

Principle: production images live once in the prod ECR; other accounts pull. Don’t rebuild the same image into per-environment ECRs.

Cost #

Item | Price (Seoul region)
Storage | $0.10 per GB-month
Data Transfer Out (internet) | $0.126 per GB (1 GB free)
Data Transfer Out (same region) | Free
Enhanced Scanning | Billed through Amazon Inspector, per image scanned

ECS pulls in the same region → free. Only internet egress (CI / external tools) pulls cost. With a lifecycle policy your storage stays small and the bill is essentially nothing.
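
A quick back-of-the-envelope check of the storage line, as a sketch. It assumes the $0.10 per GB-month figure from the table; the helper name and the two repo sizes (3 GB with a lifecycle policy, 200 GB without) are made-up illustrations, not measurements.

```shell
# Estimate monthly ECR storage cost in USD.
# Usage: monthly_storage_cost GB [PRICE_PER_GB]   (price defaults to 0.10)
monthly_storage_cost() {
  awk -v gb="$1" -v price="${2:-0.10}" 'BEGIN { printf "%.2f\n", gb * price }'
}

monthly_storage_cost 3     # repo kept small by a lifecycle policy
monthly_storage_cost 200   # repo left unmanaged
```

Pass a second argument to try the rate for a different region.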

Common pitfalls #

1) denied: User: ... is not authorized to perform: ecr:... #

Permission missing. Check both:

  • ECR actions are in the user / role policy
  • The Repository Policy isn’t blocking (usually empty — empty means IAM policy is enough)

2) manifest unknown or repository ... not found #

99% of the time it’s the wrong region or account ID in the URI. Double-check the ap-northeast-2 part and the 123456789012 part.
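
One way to do that double-check is to pull the account and region back out of the URI and compare them with what your credentials resolve to. This is a sketch; the two helper names are hypothetical, and they rely on the dot-separated URI shape shown earlier.

```shell
# Extract the account ID and region from an ECR image URI.
# The host part is <account>.dkr.ecr.<region>.amazonaws.com, so the
# account is dot-field 1 and the region is dot-field 4.
ecr_uri_account() { printf '%s\n' "$1" | cut -d. -f1; }
ecr_uri_region()  { printf '%s\n' "$1" | cut -d. -f4; }

URI="123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:v1"
echo "account: $(ecr_uri_account "$URI")"
echo "region:  $(ecr_uri_region "$URI")"

# Compare against what your credentials actually resolve to:
#   aws sts get-caller-identity --query Account --output text
#   aws configure get region
```

For cross-account pulls the account ID is expected to differ from your own, so in that case only the region comparison applies.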

3) Pushing the same tag against an IMMUTABLE repo #

Pushing the same tag twice gets denied. This is intentional — recommended for prod. Work around it by tagging with the commit SHA in your CI job.

4) Forgot multi-architecture #

Build on Mac (ARM) → push → exec format error on x86_64 Fargate. Use buildx multi-arch, or standardize the Task Definition on ARM.

5) NAT Gateway cost blow-up #

ECR pulls through NAT add up by GB. Add the three VPC Endpoints (api / dkr / s3).

6) Image accumulation #

Without a lifecycle policy, a year of production gives you thousands of images and GB of cost. Add a lifecycle policy when you create the repo.

Wrap-up #

Here is what this post covered:

  • Where ECR fits — AWS’s image registry. Plays naturally with ECS / Lambda / EKS via IAM
  • Private vs Public — production is Private. Public is for OSS distribution
  • Repository = one app’s image collection. URI shape <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
  • Auth: aws ecr get-login-password → docker login. 12-hour token
  • Permissions — split push (full) from pull (AmazonECSTaskExecutionRolePolicy)
  • Tagging strategies — Semver / Git SHA / env+sequence. Production goes Git SHA + IMMUTABLE
  • Image scanning — Basic (free) / Enhanced (Inspector, continuous). CI gate on CRITICAL
  • Lifecycle policy — rules like “untagged for 7 days, keep latest N” auto-clean
  • Multi-architecture — buildx for amd64 + arm64. Or standardize on Fargate ARM (20% cheaper)
  • VPC Endpoint (api / dkr / s3) — skip NAT Gateway. Almost mandatory in production
  • Cross-account — Repository Policy for prod ↔ dev pull
  • Pitfalls — permission, URI typo, IMMUTABLE conflict, missing multi-arch, NAT cost, image sprawl

Up next — Lambda #

ECS / ECR are the model where a container is always running. Next we look at the opposite — the function wakes only when called, the serverless side.

In #3 Lambda Basics we cover where Lambda fits (vs ECS / EC2), runtime / handler / event model, cold start, concurrency, and logging — AWS’s first serverless piece, all in one go.
