AWS Advanced #2: ECR — Image Registry
#1 ECS and Fargate covered running containers. One piece is still missing: where do the images that ECS / Fargate pull actually live? External registries like Docker Hub work, but inside AWS the standard is Amazon ECR (Elastic Container Registry).
This post covers where ECR fits, private vs public, IAM auth, push / pull, security (scanning, tag immutability), and operations (lifecycle policies, multi-arch), all in one go.
What an image registry is for #
Recall the Docker flow:
```
docker build → local image
   ↓
docker push  → registry (remote)
   ↓
docker pull (somewhere else) → bring it down, docker run
```
The registry sits in the middle. It decides who can pull which image, from where, and at which version.
The options #
| Registry | Notes |
|---|---|
| Docker Hub | Most famous. Free, but with pull limits, public-by-default |
| GHCR (GitHub Container Registry) | Linked to GitHub accounts. Generous private free tier |
| Amazon ECR Private | IAM auth inside AWS. Plays naturally with ECS / Lambda / EKS |
| Amazon ECR Public | For OSS distribution. Anyone can pull anonymously |
| GCR / Azure ACR | Other clouds |
If your ECS / Lambda / EKS runs on AWS, ECR is the standard:
- IAM auth — no separate password to manage
- VPC Endpoints to skip the internet on pull (NAT cost savings)
- Same region → fast pulls
- Image scanning (vulnerability analysis) integrated
Private vs Public #
Two flavors.
Private (most cases) #
Only your account’s users / roles can access. Almost all production work uses Private.
- Region-scoped (each image is pinned to a region)
- IAM policies for access control
- Cost: GB stored + Data Transfer
Public (OSS distribution / learning) #
Anyone in the world can pull anonymously. Listed on the AWS-run Public Gallery.
- Always hosted in us-east-1 (global)
- Push requires IAM auth; pull is anonymous
- Cost: GB stored + Data Transfer (push side)
This post assumes Private.
Creating a Repository #
The unit in ECR is a Repository. One repo holds many tags (versions) of the same app.
```shell
aws ecr create-repository \
  --repository-name myapp \
  --region ap-northeast-2 \
  --image-scanning-configuration scanOnPush=true \
  --encryption-configuration encryptionType=AES256
```
On success the repository URI is:
```
123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp
```
That's the address for every push / pull. Format:
```
<account-id>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
```
Options #
| Option | Meaning |
|---|---|
| --image-scanning-configuration scanOnPush=true | Auto vulnerability scan on push |
| --image-tag-mutability IMMUTABLE | Forbid overwriting an existing tag. Recommended for prod |
| --encryption-configuration encryptionType=KMS | Encrypt with AWS KMS (the AWS-managed aws/ecr key by default; add kmsKey=<key-arn> for a customer-managed key) |
You can do the same in the console GUI.
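In scripts it helps to assemble these addresses from their parts instead of pasting the full URI everywhere. A minimal sketch that follows the format above; the function names ecr_registry and ecr_image are hypothetical, not part of any AWS tooling.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helpers: build ECR addresses following
# <account-id>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
ecr_registry() {
  local account="$1" region="$2"
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$account" "$region"
}

ecr_image() {
  local account="$1" region="$2" repo="$3" tag="${4:-}"
  local ref
  ref="$(ecr_registry "$account" "$region")/${repo}"
  # Append the tag only when one was given
  if [ -n "$tag" ]; then
    ref="${ref}:${tag}"
  fi
  printf '%s\n' "$ref"
}

# Usage: docker login needs the registry host, push / pull need the full ref
# ... | docker login --username AWS --password-stdin "$(ecr_registry 123456789012 ap-northeast-2)"
# docker push "$(ecr_image 123456789012 ap-northeast-2 myapp v1)"
```

Keeping account and region as variables makes it harder to hit the wrong-region / wrong-account URI mistake.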
Auth — aws ecr get-login-password #
Unlike Docker Hub, ECR authenticates through AWS IAM. There's no separate password: you fetch a temporary token and pass it to docker login.
```shell
aws ecr get-login-password --region ap-northeast-2 \
  | docker login --username AWS --password-stdin \
    123456789012.dkr.ecr.ap-northeast-2.amazonaws.com
```
The token is valid for 12 hours. In CI, re-acquire it at the start of every job.
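Under the hood, get-login-password is a convenience wrapper: the lower-level aws ecr get-authorization-token call returns a base64 string that decodes to AWS:<password>, plus an expiresAt timestamp. A minimal sketch of extracting the password from such a raw token, useful when a tool hands you the token rather than the password. The function name is hypothetical, and the test uses a fake token, not a real credential.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: turn a raw ECR authorization token into the docker password.
# The token is base64("AWS:<password>"), so decode it and drop the "AWS:" prefix.
ecr_password_from_token() {
  local decoded
  decoded="$(printf '%s' "$1" | base64 -d)"
  case "$decoded" in
    AWS:*) printf '%s\n' "${decoded#AWS:}" ;;
    *) echo "unexpected token format" >&2; return 1 ;;
  esac
}

# Real usage (needs AWS credentials), shown commented out:
# TOKEN=$(aws ecr get-authorization-token --region ap-northeast-2 \
#   --query 'authorizationData[0].authorizationToken' --output text)
# ecr_password_from_token "$TOKEN" | docker login --username AWS --password-stdin \
#   123456789012.dkr.ecr.ap-northeast-2.amazonaws.com
```

In everyday use get-login-password is simpler; the raw call is mainly worth knowing because its expiresAt field tells you exactly when the 12-hour window closes.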
Permissions you need #
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetAuthorizationToken"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload",
        "ecr:PutImage",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Resource": "arn:aws:ecr:ap-northeast-2:123456789012:repository/myapp"
    }
  ]
}
```
GetAuthorizationToken requires "Resource": "*" because it isn't tied to a repository (the only action in this policy like that); scope everything else to the specific repo (Basics #6, least privilege). The second statement covers both push (the upload actions and PutImage) and pull (GetDownloadUrlForLayer, BatchGetImage).
Pull-only permissions #
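ECS tasks only need pull, and that permission belongs on the task execution role. The AWS-managed policy AmazonECSTaskExecutionRolePolicy already includes the ECR pull actions (on all repositories, via "Resource": "*") along with CloudWatch Logs writes.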
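AmazonECSTaskExecutionRolePolicy grants ECR pull on "Resource": "*", i.e. every repository in the account. When you want something tighter (for example a CI runner that only pulls one image), a scoped pull-only policy looks like the sketch below. The shell function just prints the JSON via a heredoc; the function name, role name, policy name, and file name are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a pull-only IAM policy scoped to one repository ARN.
pull_only_policy() {
  local repo_arn="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ecr:GetAuthorizationToken",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Resource": "${repo_arn}"
    }
  ]
}
EOF
}

# Usage (needs IAM permissions; ci-runner / ecr-pull-myapp are placeholder names):
# pull_only_policy "arn:aws:ecr:ap-northeast-2:123456789012:repository/myapp" > pull-policy.json
# aws iam put-role-policy --role-name ci-runner --policy-name ecr-pull-myapp \
#   --policy-document file://pull-policy.json
```

It is the push policy minus the upload actions: no InitiateLayerUpload, UploadLayerPart, CompleteLayerUpload, or PutImage.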
Push / Pull #
Push #
```shell
# build
docker build -t myapp:v1 .
# tag (with the ECR URI)
docker tag myapp:v1 \
  123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:v1
# login (see above)
aws ecr get-login-password --region ap-northeast-2 | docker login ...
# push
docker push \
  123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:v1
```
If the image is 100 MB, the first push uploads 100 MB. Later pushes skip layers the registry already has, so only the changed layers go up, usually a few MB.
Pull #
```shell
docker pull 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:v1
```
ECS / Lambda do this automatically. You rarely pull by hand, but it's useful for debugging.
Tagging strategies #
How you name versions of the same image inside the repo. Common patterns.
1) Semver #
```
myapp:1.4.2
myapp:1.4
myapp:1
myapp:latest
```
Natural for libraries / tools you publish externally. In production, latest is dangerous: it's unclear which build "latest" actually refers to.
2) Git SHA #
```
myapp:abc1234          ← short sha
myapp:abc1234567...    ← full sha
```
Maps 1:1 to the commit your CI built. Most recommended for prod: you can immediately trace which commit is running in production.
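In CI the SHA tag is usually derived from the commit being built. A sketch, assuming GitHub Actions (which exposes the full commit SHA as GITHUB_SHA) with a plain git fallback elsewhere; the function name is hypothetical. It always cuts the SHA to 7 characters so the tag length stays fixed, whereas git rev-parse --short may return more characters when 7 would be ambiguous.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: derive the image tag from the commit being built.
# Prefers GITHUB_SHA (set by GitHub Actions), otherwise asks git directly.
image_tag_from_commit() {
  local sha="${GITHUB_SHA:-}"
  if [ -z "$sha" ]; then
    sha="$(git rev-parse HEAD)"
  fi
  # Keep the short 7-character form used in the tags above
  printf '%s\n' "${sha:0:7}"
}

# Usage: docker build -t "myapp:$(image_tag_from_commit)" .
```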
3) Environment + sequence #
```
myapp:prod-2025-04-01.001
myapp:staging-2025-04-01.005
```
Suits teams that count releases per day.
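Generating these tags by hand invites formatting drift (001 vs 1). A minimal sketch of a formatter, assuming your pipeline already knows the day's sequence number (a CI build counter, or a counter you store yourself); the function name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: build an "<env>-<YYYY-MM-DD>.<NNN>" tag.
release_tag() {
  local env="$1" day="$2" seq="$3"
  # 10# forces base 10, so an input like "008" isn't rejected as invalid octal
  printf '%s-%s.%03d\n' "$env" "$day" "$((10#$seq))"
}

# Usage with today's date: release_tag prod "$(date +%F)" 1
```

The zero padding keeps tags sorting correctly as plain strings, up to 999 releases a day.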
4) Multi-tag #
The recommended ops pattern: immutable + alias.
```shell
docker build -t myapp:abc1234 .              # immutable (forever the same)
docker tag myapp:abc1234 myapp:prod-current  # mutable (points at current prod)
```
Make the ECR repo itself IMMUTABLE (a pushed tag can't be overwritten). That also means a moving alias like prod-current can't be re-pushed into the same repo, so keep the SHA tags in ECR and let a separate tool (your deployment system) track which SHA is current.
Image scanning #
ECR auto-scans pushed images for vulnerabilities. Enable with scanOnPush=true (set above).
Two flavors #
| Type | What | Cost |
|---|---|---|
| Basic Scanning | One-shot scan against a public CVE database (originally Clair-based; AWS has since moved basic scanning to its own native engine) | Free |
| Enhanced Scanning | Amazon Inspector integration. OS packages + language libraries (npm, pip, etc.). Continuous monitoring (alerts when new CVEs land after the push) | Billed through Inspector (per image scanned, plus rescans) |
Look at Enhanced for production workloads. Basic is enough to start.
Reading results #
```shell
aws ecr describe-image-scan-findings \
  --repository-name myapp \
  --image-id imageTag=v1
```
In the console: repo → image → "Vulnerabilities" tab shows CRITICAL / HIGH / MEDIUM / LOW counts at a glance.
Block at the build stage #
Block deploys when CRITICAL is non-zero — in your CI job:
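```shell
# With basic scanning the results may not be ready right after the push; wait first
aws ecr wait image-scan-complete \
  --repository-name myapp --image-id imageTag="$SHA"

CRITICAL=$(aws ecr describe-image-scan-findings \
  --repository-name myapp --image-id imageTag="$SHA" \
  --query 'imageScanFindings.findingSeverityCounts.CRITICAL' \
  --output text)

# The CLI prints "None" when there are no CRITICAL findings
if [ "$CRITICAL" != "None" ] && [ "$CRITICAL" -gt 0 ]; then
  echo "🚨 CRITICAL CVE found. Blocking deploy."
  exit 1
fi
```
Lifecycle policies — auto-cleanup #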
Images pile up; storage costs follow. Use a lifecycle policy to auto-clean.
```json
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Delete untagged images after 7 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 7
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "Keep only the latest 30 (delete the rest)",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 30
      },
      "action": { "type": "expire" }
    }
  ]
}
```
```shell
aws ecr put-lifecycle-policy \
  --repository-name myapp \
  --lifecycle-policy-text file://lifecycle.json
```
Common patterns:
- Delete untagged after 7–14 days (leftovers from failed builds)
- Delete pr-prefixed tags after 30 days (PR preview images)
- Keep release-prefixed tags forever
After a year in production, an unmanaged repo accumulates GBs of stored images and the cost that goes with them. Setting a lifecycle policy when you create the repo is part of operational hygiene.
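ECR lifecycle rules can only expire images; there is no "keep" action. So "keep release- forever" simply means no rule ever matches those tags. A sketch of a policy for the patterns in the list (14 days for untagged, 30 days for pr-), printed by a hypothetical shell function. It deliberately has no tagStatus: any rule, because such a rule would also match release- tags. Before applying a new policy, aws ecr start-lifecycle-policy-preview shows which images it would expire.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a lifecycle policy for the common patterns.
#   rule 1: untagged images expire 14 days after push
#   rule 2: "pr-" tags expire 30 days after push
#   release-* tags match no rule, so they are never expired
lifecycle_policy() {
  cat <<'EOF'
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Delete untagged images after 14 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 14
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "Delete PR preview images after 30 days",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["pr-"],
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 30
      },
      "action": { "type": "expire" }
    }
  ]
}
EOF
}

# Usage (needs AWS credentials): preview first, then apply
# lifecycle_policy > lifecycle.json
# aws ecr start-lifecycle-policy-preview --repository-name myapp \
#   --lifecycle-policy-text file://lifecycle.json
# aws ecr put-lifecycle-policy --repository-name myapp \
#   --lifecycle-policy-text file://lifecycle.json
```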
Multi-architecture images #
A linux/arm64 image built on Apple Silicon won’t boot on amd64 production. Two paths.
1) buildx multi-platform push #
```shell
docker buildx create --use
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myapp:v1 \
  --push .
```
A single ECR tag (v1) holds a manifest list covering both architectures, and the pulling side automatically picks the matching one.
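The Apple Silicon mismatch can also be caught before an image ever reaches ECR: compare the machine you build on with the platform you deploy to, and pass --platform explicitly. A sketch, assuming uname -m reports x86_64 / amd64 or aarch64 / arm64 (as Linux and macOS do); both function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map a `uname -m` value to a Docker platform string.
docker_platform_for() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *) echo "unsupported machine: $1" >&2; return 1 ;;
  esac
}

# Warn when the build host differs from the deploy target, then echo the target.
check_build_platform() {
  local target="${1:-linux/amd64}" host
  host="$(docker_platform_for "$(uname -m)")"
  if [ "$host" != "$target" ]; then
    echo "host is $host but target is $target: building with --platform $target" >&2
  fi
  echo "$target"
}

# Usage: docker build --platform "$(check_build_platform linux/amd64)" -t myapp:v1 .
```

Cross-building this way relies on emulation, so it is slower than a native build; it only guarantees the image runs where it's deployed.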
2) Standardize on Fargate ARM #
Set runtimePlatform.cpuArchitecture to ARM64 in your Task Definition (the Fargate setting from #1) and you only need ARM images. Bonus: Fargate on ARM (Graviton) is priced about 20% below x86.
```json
{
  "runtimePlatform": {
    "cpuArchitecture": "ARM64",
    "operatingSystemFamily": "LINUX"
  }
}
```
For a new small- to mid-traffic project, going ARM from day one is the right call.
VPC Endpoint — pull without NAT #
ECS Tasks in private subnets pulling from ECR → by default through NAT Gateway → per-GB charge.
ECR supports VPC Endpoints to skip NAT:
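```shell
# ECR API calls (e.g. GetAuthorizationToken)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxx \
  --service-name com.amazonaws.ap-northeast-2.ecr.api \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-aaa subnet-bbb \
  --security-group-ids sg-xxx   # must allow inbound HTTPS (443) from your tasks
# Docker Registry API (docker push / pull)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxx \
  --service-name com.amazonaws.ap-northeast-2.ecr.dkr \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-aaa subnet-bbb \
  --security-group-ids sg-xxx
# image layers live in S3, so add an S3 Gateway endpoint
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxx \
  --service-name com.amazonaws.ap-northeast-2.s3 \
  --vpc-endpoint-type Gateway \
  --route-table-ids rtb-xxx
```
The three endpoints (api, dkr, s3) work as a set: leave one out and that part of the pull still goes through NAT (or fails if there is no NAT). Interface endpoints carry their own hourly and per-GB charges, but for steady pull traffic they usually cost far less than NAT Gateway data processing, which makes them almost mandatory at scale.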
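Creating the three endpoints by hand is repetitive, so a loop helps. A sketch that defaults to a dry run (it only prints the aws commands) until you set DRY_RUN=0; the function name and all IDs are placeholders, and the security group is assumed to already allow inbound HTTPS from your tasks.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: create the ecr.api and ecr.dkr (Interface) and s3 (Gateway)
# endpoints for one VPC. Prints the commands instead of running them unless DRY_RUN=0.
create_ecr_endpoints() {
  local region="$1" vpc="$2" subnets="$3" sg="$4" rtb="$5"
  local run="echo"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    run=""
  fi
  local svc
  for svc in ecr.api ecr.dkr; do
    # $subnets is left unquoted on purpose so "subnet-a subnet-b" becomes two IDs
    $run aws ec2 create-vpc-endpoint --vpc-id "$vpc" \
      --service-name "com.amazonaws.${region}.${svc}" \
      --vpc-endpoint-type Interface --subnet-ids $subnets \
      --security-group-ids "$sg"
  done
  # Layers are served from S3, reached through a Gateway endpoint on the route table
  $run aws ec2 create-vpc-endpoint --vpc-id "$vpc" \
    --service-name "com.amazonaws.${region}.s3" \
    --vpc-endpoint-type Gateway --route-table-ids "$rtb"
}

# Usage: review the printed commands, then run again with DRY_RUN=0
# create_ecr_endpoints ap-northeast-2 vpc-xxx "subnet-aaa subnet-bbb" sg-xxx rtb-xxx
```

Printing by default makes it safe to paste into a shell and check the service names for your region before anything is created.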
Cross-account access #
Want to pull from the prod account’s ECR repo into a dev account? Use a Repository Policy to allow it.
```json
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "AllowDevAccountPull",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::222222222222:root"
      },
      "Action": [
        "ecr:GetAuthorizationToken",
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ]
    }
  ]
}
```
```shell
aws ecr set-repository-policy \
  --repository-name myapp \
  --policy-text file://repo-policy.json
```
The repository policy only covers the repo side: the dev account's own IAM policies must still allow ecr:GetAuthorizationToken and the pull actions for the roles that pull.

Principle: production images live once in the prod ECR; other accounts pull. Don't rebuild the same image into per-environment ECRs.
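When several accounts (dev, staging, ...) need to pull, generating the repository policy beats hand-editing the principal list. A sketch with a hypothetical function name; it grants pull only and leaves out ecr:GetAuthorizationToken, an account-level action that each pulling account has to grant in its own IAM policy anyway.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a repository policy letting the given account IDs pull.
repo_pull_policy() {
  if [ "$#" -eq 0 ]; then
    echo "usage: repo_pull_policy <account-id>..." >&2
    return 1
  fi
  local principals="" acct
  for acct in "$@"; do
    # Join the account root ARNs into a comma-separated JSON list
    principals="${principals:+$principals, }\"arn:aws:iam::${acct}:root\""
  done
  cat <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "AllowCrossAccountPull",
      "Effect": "Allow",
      "Principal": { "AWS": [${principals}] },
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ]
    }
  ]
}
EOF
}

# Usage (run with prod-account credentials):
# repo_pull_policy 222222222222 333333333333 > repo-policy.json
# aws ecr set-repository-policy --repository-name myapp \
#   --policy-text file://repo-policy.json
```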
Cost #
| Item | Price (Seoul region) |
|---|---|
| Storage | $0.10 per GB-month |
| Data Transfer Out (internet) | $0.126 per GB (first 100 GB / month free, aggregated across AWS services) |
| Data Transfer Out (same region) | Free |
| Enhanced Scanning | Billed through Amazon Inspector (per image scanned, plus rescans) |
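ECS pulls within the same region are free; transfer is billed only when something pulls over the internet (CI runners or tools outside AWS). If in-region pulls go through a NAT Gateway, NAT data processing charges still apply, which is what the VPC Endpoints above avoid. With a lifecycle policy keeping storage small, the ECR bill stays very low.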
Common pitfalls #
1) denied: User: ... is not authorized to perform: ecr:... #
Permission missing. Check both:
- ECR actions are in the user / role policy
- The Repository Policy isn’t blocking (usually empty — empty means IAM policy is enough)
2) manifest unknown or repository ... not found #
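99% of the time it's a wrong region or account ID in the URI. Double-check the ap-northeast-2 part and the 123456789012 part. If those are right, confirm the tag exists: it may never have been pushed, or a lifecycle policy may have expired it.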
3) Pushing the same tag against an IMMUTABLE repo #
Pushing an existing tag again is rejected. That's intentional and exactly what you want in prod. In CI, tag each build with its commit SHA so every push gets a new tag.
4) Forgot multi-architecture #
Build on Mac (ARM) → push → exec format error on x86_64 Fargate. Use buildx multi-arch, or standardize the Task Definition on ARM.
5) NAT Gateway cost blow-up #
ECR pulls through NAT add up by GB. Add the three VPC Endpoints (api / dkr / s3).
6) Image accumulation #
Without a lifecycle policy, a year of production leaves you with thousands of images and GBs of storage cost. Add a lifecycle policy when you create the repo.
Wrap-up #
Here is what this post covered:
- Where ECR fits: AWS's image registry, working naturally with ECS / Lambda / EKS via IAM
- Private vs Public: production is Private; Public is for OSS distribution
- Repository: one app's image collection. URI shape <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
- Auth: aws ecr get-login-password → docker login, 12-hour token
- Permissions: split push (full) from pull (AmazonECSTaskExecutionRolePolicy)
- Tagging strategies: Semver / Git SHA / env+sequence. Production goes Git SHA + IMMUTABLE
- Image scanning: Basic (free) / Enhanced (Inspector, continuous), with a CI gate on CRITICAL
- Lifecycle policy: rules like "untagged for 7 days, keep latest N" clean up automatically
- Multi-architecture: buildx for amd64 + arm64, or standardize on Fargate ARM (about 20% cheaper)
- VPC Endpoint (api / dkr / s3): skip the NAT Gateway; almost mandatory in production
- Cross-account: Repository Policy for prod ↔ dev pull
- Pitfalls: permissions, URI typos, IMMUTABLE conflicts, missing multi-arch, NAT cost, image sprawl
Up next — Lambda #
ECS / ECR follow a model where a container is always running. Next we look at the opposite, the serverless side: a function that wakes up only when it's called.
In #3 Lambda Basics we cover where Lambda fits (vs ECS / EC2), runtime / handler / event model, cold start, concurrency, and logging — AWS’s first serverless piece, all in one go.