Chapter 16

ECR — the Image Registry

Where you store the container images that ECS and Lambda will pull. We cover the private / public difference in Amazon ECR, IAM authentication, docker push / pull, image scanning, tag strategy, lifecycle policies, multi-architecture (linux/amd64 + arm64), VPC Endpoints, and cross-account access.

In Chapter 15 ECS and Fargate we got the big picture of running containers. One piece is still missing: where do the images that ECS / Fargate pull actually live? An external registry like Docker Hub works, but inside AWS the standard is Amazon ECR (Elastic Container Registry).

This chapter covers ECR’s structure in one pass: the private / public difference, IAM authentication, image push / pull, security (scanning, tag immutability), and operations (lifecycle policies, multi-architecture). The push / pull flow and lifecycle policy set up here come back repeatedly in Part 4’s Chapter 22 ECS Fargate deployment skeleton and Chapter 24 CI/CD pipeline.

The role of an image registry

Recall the Docker image flow.

image lifecycle

docker build → local image
       ↓
docker push → registry (remote)
       ↓
docker pull (another machine) → pull that image and docker run

The registry is that midpoint. It decides who can pull which version of an image, and from where.

Comparing the options

Registry	Use
Docker Hub	The most famous. Free but has a pull limit; public by default
GHCR (GitHub Container Registry)	Linked to a GitHub account. Large private free tier
Amazon ECR Private	Authenticated by IAM inside AWS. Wires naturally into ECS / Lambda / EKS
Amazon ECR Public	For OSS distribution. Anyone can do an anonymous pull
GCR / Azure ACR	For other clouds

If your ECS / Lambda / EKS is inside AWS, ECR is the standard.

You authenticate with IAM — no separate password management.
You pull via a VPC Endpoint without going over the internet (NAT cost savings).
Pulls are fast because it’s the same region.
Image scanning (automatic vulnerability analysis) is integrated.

Private vs Public

ECR comes in two kinds.

Private (most cases)

By default, only users / roles in your account can access it. Almost all company / production workloads live here.

Regional (each repository belongs to one region, and the region is part of its URI)
Access control via IAM policy
Cost: GB storage + Data Transfer

Public (OSS distribution / learning material)

Anyone in the world can do an anonymous pull. It’s exposed on AWS’s Public Gallery.

Always hosted in us-east-1 (global)
Push requires IAM authentication; Pull requires none
Cost: GB storage + Data Transfer (on the push side)

This chapter proceeds on a Private basis.

Creating a Repository

ECR’s unit is the Repository. Inside one repo you store several versions (tags) of the same app.

create a repo

aws ecr create-repository \
  --repository-name myapp \
  --region ap-northeast-2 \
  --image-tag-mutability IMMUTABLE \
  --image-scanning-configuration scanOnPush=true \
  --encryption-configuration encryptionType=AES256

On success, you get a URI.

123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp

This is the address for all push / pull. Its shape is as follows.

The shape of an ECR URI

<accountID>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
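Because every push / pull command repeats this address, CI scripts usually assemble it once from its parts. A minimal sketch, assuming bash; `ecr_image_uri` is a hypothetical helper, not an AWS CLI command:

```shell
# Hypothetical helper (not an AWS CLI command): assemble an ECR image URI
# in the <accountID>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag> shape.
ecr_image_uri() {
  local account="$1" region="$2" repo="$3" tag="$4"
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$account" "$region" "$repo" "$tag"
}

# Example: prints 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:v1
ecr_image_uri 123456789012 ap-northeast-2 myapp v1
```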

Options summary

Option	Meaning
`image-scanning-configuration scanOnPush=true`	Automatic vulnerability scan on push
`image-tag-mutability IMMUTABLE`	Forbid overwriting the same tag — recommended for operations
`encryption-configuration encryptionType=KMS`	Encrypt with KMS: the AWS-managed aws/ecr key by default, or a customer-managed key passed via kmsKey

You can create the same thing in the console via the GUI.

Authentication: `aws ecr get-login-password`

Unlike Docker Hub, ECR authenticates with AWS IAM, so there’s no separate registry password. Instead, you fetch a temporary token and pass it to docker login.

ECR login (valid for 12 hours)

aws ecr get-login-password --region ap-northeast-2 \
  | docker login --username AWS --password-stdin \
    123456789012.dkr.ecr.ap-northeast-2.amazonaws.com

The token is valid for 12 hours. In CI, it’s natural to fetch a fresh one each time (once at the start of the CI job).
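When a CI job is handed only the full image URI, the registry host and region can be read straight out of it. A minimal sketch, assuming bash and the URI shape shown earlier; `ecr_login_for` is a hypothetical helper that runs the same two commands as above. Its body runs in a subshell so `set -o pipefail` stays local, and pipefail makes the login fail when `get-login-password` fails instead of reporting only `docker login`’s status.

```shell
# Hypothetical CI helper: log in to the ECR registry that hosts a given image URI.
# Assumes the <accountID>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag> shape.
ecr_login_for() (
  set -o pipefail                                   # a failed token fetch fails the login
  uri="$1"
  host="${uri%%/*}"                                 # drop "/<repo>:<tag>"
  region="$(printf '%s\n' "$host" | cut -d. -f4)"   # 4th dot-separated field is the region
  aws ecr get-login-password --region "$region" \
    | docker login --username AWS --password-stdin "$host"
)

# Usage:
#   ecr_login_for 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:v1
```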

Permissions needed

policy for the user / role that pushes

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload",
        "ecr:PutImage",
        "ecr:BatchGetImage"
      ],
      "Resource": "arn:aws:ecr:ap-northeast-2:123456789012:repository/myapp"
    }
  ]
}

Only GetAuthorizationToken uses the * resource, because the token isn’t tied to a specific repository; the rest are scoped to the one repo (least privilege, from Chapter 6 Security basics).

Pull-only permission

Some roles, such as an ECS task’s execution role, only need to pull. The AWS-managed policy AmazonECSTaskExecutionRolePolicy already includes the ECR pull permissions.
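For a role that only pulls but doesn’t use that managed policy (a CI runner that only runs tests, say), a pull-only identity policy is the push policy above minus the upload actions, plus ecr:GetDownloadUrlForLayer. A sketch, assuming bash; `ecr_pull_policy` is a hypothetical helper that prints the policy for a given repository ARN. For a role in another account, the repository policy must allow the pull as well.

```shell
# Hypothetical helper: print a pull-only IAM identity policy for one ECR repo.
ecr_pull_policy() {
  local repo_arn="$1"
  # Unquoted EOF so $repo_arn expands; "*" is not globbed inside a heredoc.
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ecr:GetAuthorizationToken"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Resource": "$repo_arn"
    }
  ]
}
EOF
}
```

Usage: `ecr_pull_policy arn:aws:ecr:ap-northeast-2:123456789012:repository/myapp > pull-policy.json`, then attach it with `aws iam put-role-policy --role-name <role> --policy-name ecr-pull --policy-document file://pull-policy.json` (the role and policy names are placeholders).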

Push / Pull

Push

first push

# build
docker build -t myapp:v1 .

# tag (to the ECR URI)
docker tag myapp:v1 \
  123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:v1

# login (see above)
aws ecr get-login-password --region ap-northeast-2 | docker login ...

# push
docker push \
  123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:v1

If the image is 100MB, the first push uploads all 100MB. After that, layer caching means only changed layers are uploaded; when only your application code changed, that’s usually a few MB.

Pull

pull

docker pull 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:v1

ECS / Lambda pull automatically, so you’ll rarely pull by hand, but it’s useful when debugging.

Tag strategy

These are the rules for how to name the several versions of the same image inside an ECR repo. Common patterns follow.

1) Semver

myapp:1.4.2
myapp:1.4
myapp:1
myapp:latest

Natural for software distributed to others, such as a library or tool. In operations, latest is dangerous: it’s ambiguous which build it pointed to at any given moment.

2) Git SHA

myapp:abc1234        ← short sha
myapp:abc1234567...  ← full sha

Maps 1:1 to the commit built in CI. The most recommended way for operations — you can instantly trace which commit is in production.
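In CI the SHA usually comes from `git rev-parse HEAD`. A minimal sketch, assuming bash and a 7-character short SHA (a common convention; `git rev-parse --short` may print more characters in large repos). `sha_tag` is a hypothetical helper that rejects anything that isn’t a hex SHA, so a stray `latest` can’t slip through as a tag.

```shell
# Hypothetical helper: turn a git commit SHA into a short, immutable image tag.
sha_tag() {
  local sha="$1"
  # accept 7 to 40 lowercase hex characters (short or full git SHA)
  if [[ ! "$sha" =~ ^[0-9a-f]{7,40}$ ]]; then
    echo "not a git SHA: $sha" >&2
    return 1
  fi
  printf '%s\n' "${sha:0:7}"
}

# In CI, for example:
#   TAG="$(sha_tag "$(git rev-parse HEAD)")"
```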

3) Environment + sequence

myapp:prod-2025-04-01.001
myapp:staging-2025-04-01.005

Used to count releases by date.
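The sequence part is zero-padded to three digits so tags from the same day sort in order. A minimal sketch, assuming bash; `release_tag` is a hypothetical helper, and where the sequence number comes from (for example a CI build counter, shown here as a hypothetical `BUILD_NUMBER`) is up to your pipeline.

```shell
# Hypothetical helper: build an <env>-<YYYY-MM-DD>.<NNN> tag.
release_tag() {
  local env="$1" day="$2" seq="$3"
  # 10# forces base 10, so a padded input like 008 isn't read as octal
  printf '%s-%s.%03d\n' "$env" "$day" "$((10#$seq))"
}

# e.g. release_tag prod "$(date -u +%F)" "$BUILD_NUMBER"
```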

4) Multi-tag

The recommended operational pattern is immutable + alias.

build the image once → two tags

docker build -t myapp:abc1234 .                  # immutable: the commit SHA, never reused
docker tag myapp:abc1234 myapp:prod-current      # mutable alias: points to current production

An IMMUTABLE ECR repo (a pushed tag can’t be overwritten) will reject re-pushing a moving alias like prod-current. So keep the repo IMMUTABLE, push the SHA tag, and if you need an alias, let a separate tool (the deployment system) track which SHA is current.

Image scanning

ECR automatically scans pushed images for vulnerabilities. That’s the scanOnPush=true option (set above).

Two kinds

Kind	What	Cost
Basic Scanning	A one-shot scan based on an open source CVE DB (CoreOS Clair)	Free
Enhanced Scanning	Inspector integration. OS layer + language libraries (npm, pip, etc.). Continuous monitoring (alerts when a new CVE is found even after a push)	Per image scanned, including rescans (billed through Amazon Inspector)

For operational workloads, consider Enhanced. Basic is enough to start.

Viewing results

scan results

aws ecr describe-image-scan-findings \
  --repository-name myapp \
  --image-id imageTag=v1

In the console, you see the CRITICAL / HIGH / MEDIUM / LOW counts at a glance under repo → image → the “Vulnerabilities” tab.

Blocking at the build stage

To block the deployment when there’s a CRITICAL finding, put a gate like the following in the CI job.

CI gate

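# scanOnPush runs asynchronously: wait until the scan for this tag has finished
aws ecr wait image-scan-complete \
  --repository-name myapp --image-id imageTag="$SHA"

CRITICAL=$(aws ecr describe-image-scan-findings \
  --repository-name myapp --image-id imageTag="$SHA" \
  --query 'imageScanFindings.findingSeverityCounts.CRITICAL' \
  --output text)

# the CLI prints "None" when there are no CRITICAL findings
if [ "$CRITICAL" != "None" ] && [ "$CRITICAL" -gt 0 ]; then
  echo "🚨 CRITICAL CVE found. Stopping deployment."
  exit 1
fi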

You plug this gate straight into the build stage in Chapter 24 CI/CD pipeline.
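Once you gate on more than one severity (CRITICAL and HIGH, say), the "None" check is easy to forget in one of the branches. A minimal sketch of pulling the comparison into a function, assuming bash; `should_block` is a hypothetical helper with an optional allowed count (default 0).

```shell
# Hypothetical helper: succeed (exit 0) when a severity count should block the deploy.
should_block() {
  local count="$1" allowed="${2:-0}"
  # describe-image-scan-findings --output text prints "None" when that
  # severity has no findings; treat that (and empty) as zero
  if [ -z "$count" ] || [ "$count" = "None" ]; then
    return 1
  fi
  [ "$count" -gt "$allowed" ]
}
```

In the gate this becomes `if should_block "$CRITICAL" || should_block "$HIGH" 5; then exit 1; fi`, where `$HIGH` is queried the same way as `$CRITICAL` with `findingSeverityCounts.HIGH`.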

Lifecycle policy: automatic cleanup

As images pile up, ECR cost follows. Clean them up automatically with a lifecycle policy.

lifecycle.json

{
  "rules": [
    {
      "rulePriority": 1,
      "description": "delete untagged images after 7 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 7
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "keep only the latest 30 (delete the rest)",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 30
      },
      "action": { "type": "expire" }
    }
  ]
}

apply

aws ecr put-lifecycle-policy \
  --repository-name myapp \
  --lifecycle-policy-text file://lifecycle.json

Common patterns are as follows.

Delete untagged after 7 ~ 14 days (the remnants of failed builds)
Delete those with the tag prefix pr- after 30 days (PR preview images)
Keep those with the tag prefix release- forever
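The first two patterns map directly onto rules; the third needs no rule at all, because images that no rule selects are never expired. A sketch, assuming bash; `lifecycle_patterns_json` is a hypothetical helper that prints such a policy with the 14- and 30-day values listed above. Note there is no `"tagStatus": "any"` rule: a catch-all like the keep-30 rule earlier would also count release- images and eventually expire them. The trade-off is that other tagged images (SHA tags, for instance) are kept indefinitely too.

```shell
# Hypothetical helper: print a lifecycle policy for the three patterns above.
lifecycle_patterns_json() {
  cat <<'EOF'
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "delete untagged images after 14 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 14
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "delete PR preview images (pr-*) after 30 days",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["pr-"],
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 30
      },
      "action": { "type": "expire" }
    }
  ]
}
EOF
}
```

Write it out with `lifecycle_patterns_json > lifecycle.json` and apply it with the same put-lifecycle-policy command as above.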

A production repo that accumulates a year’s worth of images keeps paying for every GB of them. Setting up the lifecycle policy from the start is core operational hygiene.

Multi-architecture images

If you push an image built on an Apple Silicon Mac (arm64) straight to production (usually amd64), it won’t boot. There are two paths.

1) Multi-platform push with buildx

buildx multi-arch

docker buildx create --use
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:v1 \
  --push .

The result is a manifest list (image index) holding both architectures under one ECR tag (v1). On pull, each host automatically selects the image matching its own architecture.
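If you build for a single architecture instead, a quick pre-push check catches the Mac-to-amd64 mistake described above. A minimal sketch, assuming bash; both functions are hypothetical helpers, and the `uname -m` values covered (x86_64, aarch64, arm64) are just the common Linux and macOS ones.

```shell
# Hypothetical helper: map a `uname -m` value to a Docker platform string.
docker_platform_for() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *) echo "unknown arch: $1" >&2; return 1 ;;
  esac
}

# Hypothetical pre-push check: fail if this machine's platform differs from the
# platform you deploy to (e.g. linux/amd64 for x86 Fargate).
check_single_arch_push() {
  local target="$1" host
  host="$(docker_platform_for "$(uname -m)")" || return 1
  if [ "$host" != "$target" ]; then
    echo "building on $host but deploying to $target: build with --platform $target" >&2
    return 1
  fi
}
```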

2) Standardize on Fargate ARM

As covered in Chapter 15 ECS and Fargate, if you set the task definition’s runtimePlatform.cpuArchitecture to ARM64, you only need to push an arm64 image. As a bonus, the Fargate ARM (Graviton) unit price is about 20% lower.

Task Definition (ARM)

{
  "runtimePlatform": {
    "cpuArchitecture": "ARM64",
    "operatingSystemFamily": "LINUX"
  }
}

For a new small / medium-traffic project, we recommend ARM from the start.

VPC Endpoint: pull without NAT

When an ECS Task in a private subnet pulls from ECR, by default it goes via the NAT Gateway and is charged per GB.

ECR supports a VPC Endpoint to bypass the NAT.

two ECR VPC Endpoints

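# sg-xxx must allow inbound HTTPS (443) from the tasks
# private DNS lets the usual ECR host names resolve to the endpoints

# for ECR API calls (GetAuthorizationToken, etc.)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxx \
  --service-name com.amazonaws.ap-northeast-2.ecr.api \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-aaa subnet-bbb \
  --security-group-ids sg-xxx \
  --private-dns-enabled

# for Docker registry calls (docker pull / push)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxx \
  --service-name com.amazonaws.ap-northeast-2.ecr.dkr \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-aaa subnet-bbb \
  --security-group-ids sg-xxx \
  --private-dns-enabled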

# image layers live in S3, so add the S3 endpoint too
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxx \
  --service-name com.amazonaws.ap-northeast-2.s3 \
  --vpc-endpoint-type Gateway \
  --route-table-ids rtb-xxx

The three (api, dkr, s3) form one set. Interface endpoints carry their own hourly and per-GB charges, but with heavy pull traffic they cost far less than sending the same bytes through a NAT Gateway, so in busy production environments they’re close to mandatory. VPC and endpoints are covered in more depth in Chapter 28 VPC in depth.

Cross-account access

If you want to pull the prod account’s ECR repo from the dev account, allow it with a Repository Policy.

repo-policy.json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDevAccountPull",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::222222222222:root"
      },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ]
    }
  ]
}

aws ecr set-repository-policy \
  --repository-name myapp \
  --policy-text file://repo-policy.json

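Cross-account access needs both sides to allow it: the repository policy above, plus an IAM policy in the dev account that grants its role ecr:GetAuthorizationToken and the pull actions on this repo’s ARN. The principle: put the production image in the production account’s ECR once and have other accounts pull it, rather than building the same image again into each environment’s account. Multi-account governance is covered in Chapter 29 Security governance.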

Cost

Item	Price (Seoul region)
Storage	$0.10 / GB / month
Data Transfer Out (internet)	$0.126 / GB (1GB free)
Data Transfer Out (same region)	Free
Enhanced Scanning	Per image scanned, including rescans (billed through Amazon Inspector)

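Pulls by ECS in the same region are free on the ECR side (NAT Gateway processing, if the pull goes through one, is billed separately). Only pulls that leave the region or go out to the internet (CI runners, external tools) cost money. As long as a lifecycle policy keeps storage in check, the ECR bill stays close to zero.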
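To put a number on storage, it is roughly image count × average image size × $0.10 / GB / month from the table above. A rough sketch, assuming bash and awk; `ecr_storage_cost` is a hypothetical helper that treats every image as separately stored, so base layers shared between images can make the real bill lower.

```shell
# Hypothetical estimate: monthly ECR storage cost in USD.
#   $1 = number of images, $2 = average image size in GB,
#   $3 = price per GB-month (defaults to the Seoul figure above, 0.10)
ecr_storage_cost() {
  local images="$1" avg_gb="$2" price="${3:-0.10}"
  awk -v n="$images" -v gb="$avg_gb" -v p="$price" \
    'BEGIN { printf "%.2f\n", n * gb * p }'
}

# 30 images averaging 0.2 GB: prints 0.60
ecr_storage_cost 30 0.2
```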

Pitfalls you’ll often hit

1) `denied: ... is not authorized to perform: ecr:...`

A missing permission. Check both.

Whether the user / role policy includes the ECR action
Whether the Repository Policy is blocking it (usually empty — if empty, the IAM policy alone is enough)

2) `manifest unknown` or `repository ... not found`

Most often it’s a typo in the region or account ID: recheck ap-northeast-2 and 123456789012 in the URI. manifest unknown can also simply mean that tag was never pushed to this repo.

3) Pushing the same tag to an `IMMUTABLE` repo

If you try to push the same tag twice, it’s rejected. This is intended behavior and recommended for operations. Work around it by tagging with the commit SHA in the CI job.

4) Forgetting multi-architecture

Build on a Mac (ARM) → push → exec format error on x86_64 Fargate. Do multi-arch with buildx or standardize the task definition on ARM.

5) NAT Gateway cost explosion

If ECR pulls go through the NAT, they’re charged per GB. Add the three VPC Endpoints (api / dkr / s3).

6) Image accumulation

Operating for a year without a lifecycle policy creates thousands of images that cost by the GB. Set the lifecycle up when you create the first repo.

Exercises

Pick one of the four patterns in §“Tag strategy” for your app’s image tags, and write in one sentence why you don’t use latest in operations. In Chapter 24 CI/CD pipeline, this decision becomes the basis when CI tags with the commit SHA.
When an ECS Task in a private subnet pulls from ECR, explain in one paragraph the cost difference between the path going through the NAT Gateway and the path going through the VPC Endpoints (api / dkr / s3), basing it on §“VPC Endpoint” and §“Cost” (tie it together with the NAT cost pitfall of Chapter 15 ECS and Fargate).
If you apply the lifecycle rules “delete untagged images after 7 days + keep the latest 30,” estimate roughly how many images remain in the repo after deploying once a day for a year, basing it on §“Lifecycle policy.”

In short: ECR is a container image registry inside AWS, wired to ECS / Lambda / EKS via IAM, and operations use a private repository. Authentication is aws ecr get-login-password, which gets a 12-hour token, and the recommended operational tag is Git SHA + IMMUTABLE. Image scanning on push lets you block CRITICAL findings in CI, and a lifecycle policy cleans images up automatically. Multi-architecture is solved with buildx or Fargate ARM, and VPC Endpoints (api, dkr, s3) avoid the NAT cost.

Next chapter

ECS and ECR assume containers that are always running. The next chapter, Chapter 17 Lambda basics, covers the opposite side: serverless, where a function wakes up only when a request arrives. It covers the first steps of AWS serverless, namely how Lambda works, the runtime / handler / event model, cold starts, concurrency, and logging.