CI/CD Pipeline
The myshop-api built through Chapter 23 still relies heavily on humans when a new version comes in. This chapter automates that process. With OIDC trust, GitHub Actions pushes a container image to AWS ECR without static keys, auto-commits the Helm values in the manifest repo, and ArgoCD, covered in Chapter 20, detects that change and syncs it to the cluster. We also cover PR approval gates, the dev / prod split, Argo Rollouts canary deployment, and image tag immutability.
After the database integration in Chapter 23, myshop-api is a complete service with EKS, RDS, Secrets, and the connection pool all in place, but shipping a new version still depends on people. Someone builds and pushes the container, someone changes the image tag in the manifest, and someone runs helm upgrade. This chapter turns that flow into code. GitHub Actions pushes the image to ECR through OIDC trust, with no static keys, and then auto-commits the Helm values in the manifest repo; ArgoCD, covered in Chapter 20 GitOps, detects that commit and syncs it to the cluster.
The goal of this chapter is a state where one code push deploys to dev automatically and one git tag queues up a prod deployment. We also cover the PR approval gate that is standard in production, and the automatic promotion / rollback of canary releases.
The two-repo model: separating code and manifests
The most common pattern in GitOps is the separation of two repos. The model touched on in Chapter 20 GitOps §“One repo vs two repos” is shown here as a full production pipeline.
| Repo | Role |
|---|---|
| myshop-api (application repo) | Source code, Dockerfile, GitHub Actions workflow |
| myshop-manifests (manifest repo) | Helm values, ArgoCD Application manifests, per-environment config |
There are three benefits to this separation.
- Separation of permissions — the reviewers for code changes and for infrastructure / deployment changes can differ.
- Clarity of changes — looking at the git log, “which version was up in prod at this point” is clear.
- ArgoCD only needs to watch one place — watch just the manifest repo and the desired state of every environment is captured.
The flow of a code push can be summarized as follows.

[developer push] -> [GitHub Actions: build / test / ECR push]
  -> [auto-commit the image tag in the manifest repo]
  -> [ArgoCD detects the change]
  -> [deploy the new version to the cluster]

We unpack each stage one section at a time.

GitHub Actions: dynamic AWS credentials via OIDC
The old way to call AWS APIs from GitHub Actions was to store an IAM user's access key / secret key in GitHub Secrets. The problem is obvious: the keys are static, so rotation is hard, and a leaked key stays valid until someone revokes it.
The current standard is OIDC trust. GitHub Actions issues a JWT, and AWS verifies that token against the registered OIDC provider and returns temporary credentials. The structure is the same as the IRSA of Chapter 16 RBAC / ServiceAccount in depth: the ServiceAccount's projected token is simply replaced by the GitHub Actions JWT.

Registering the OIDC provider (Terraform)
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"]
}

resource "aws_iam_role" "github_actions_ecr_push" {
  name = "github-actions-myshop-api-ecr-push"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = aws_iam_openid_connect_provider.github.arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          # main-branch pushes, plus the v* tags that the workflow also builds
          "token.actions.githubusercontent.com:sub" = [
            "repo:myshop/myshop-api:ref:refs/heads/main",
            "repo:myshop/myshop-api:ref:refs/tags/v*",
          ]
        }
      }
    }]
  })
}

# aws_iam_policy.ecr_push (the ECR push permissions) is assumed to be defined elsewhere.
resource "aws_iam_role_policy_attachment" "ecr_push" {
  role       = aws_iam_role.github_actions_ecr_push.name
  policy_arn = aws_iam_policy.ecr_push.arn
}

The sub in Condition is the key. Only a workflow triggered from the main branch or a v* tag of the myshop/myshop-api repo can assume this Role; other repos, other branches, and forks are all rejected. The tag pattern is needed because the workflow in the next section also runs on v* tags, and those runs carry a sub ending in refs/tags/<tag> rather than refs/heads/main. Where the IRSA trust policy of Chapter 16 isolated by namespace + ServiceAccount name, GitHub Actions isolates by repo + ref.
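
The matching that the sub condition performs can be sketched in plain shell. This is only an approximation: IAM's StringLike treats * and ? as wildcards, and shell case patterns behave the same way for those two characters. The helper name sub_matches is hypothetical and is not part of any AWS or GitHub tooling.

```shell
# Hypothetical helper that approximates IAM StringLike with shell glob matching.
# $1 = pattern from the trust policy, $2 = sub claim carried by the token
sub_matches() {
  case $2 in
    $1) return 0 ;;   # unquoted $1 is interpreted as a glob pattern
    *)  return 1 ;;
  esac
}

main_only='repo:myshop/myshop-api:ref:refs/heads/main'

sub_matches "$main_only" 'repo:myshop/myshop-api:ref:refs/heads/main'  && echo "main push: allowed"
sub_matches "$main_only" 'repo:myshop/myshop-api:ref:refs/tags/v1.5.0' || echo "tag push: rejected"
sub_matches "$main_only" 'repo:attacker/myshop-api:ref:refs/heads/main' || echo "other repo: rejected"
```

The second call shows that a token issued for a tag push does not satisfy a main-branch pattern, so every ref that should be able to push images needs its own pattern in the trust policy.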
Workflow: build and push
name: Build and push

on:
  push:
    branches: [main]
    tags: ['v*']

permissions:
  id-token: write   # needed to issue the OIDC token
  contents: read

env:
  AWS_REGION: ap-northeast-2
  ECR_REPOSITORY: myshop-api

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set image tag
        id: meta
        run: |
          if [[ "$GITHUB_REF" == refs/tags/v* ]]; then
            # v1.5.0 -> 1.5.0, destined for values-prod.yaml
            echo "tag=${GITHUB_REF#refs/tags/v}" >> "$GITHUB_OUTPUT"
            echo "env=prod" >> "$GITHUB_OUTPUT"
          else
            echo "tag=main-$(git rev-parse --short HEAD)" >> "$GITHUB_OUTPUT"
            echo "env=dev" >> "$GITHUB_OUTPUT"
          fi

      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/github-actions-myshop-api-ecr-push
          aws-region: ${{ env.AWS_REGION }}

      - name: Login to ECR
        uses: aws-actions/amazon-ecr-login@v2

      # The type=gha cache export needs a Buildx builder, so create one first
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/${{ env.ECR_REPOSITORY }}:${{ steps.meta.outputs.tag }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Update manifest repo
        env:
          GH_TOKEN: ${{ secrets.MANIFESTS_REPO_TOKEN }}
          TAG: ${{ steps.meta.outputs.tag }}
          TARGET_ENV: ${{ steps.meta.outputs.env }}
        run: |
          gh api repos/myshop/myshop-manifests/dispatches \
            -f event_type=update-image \
            -f 'client_payload[app]=myshop-api' \
            -f "client_payload[tag]=$TAG" \
            -f "client_payload[env]=$TARGET_ENV"

Three steps are worth pointing out.
- Configure AWS credentials (OIDC): calls AssumeRoleWithWebIdentity on the IAM Role created above. This one step obtains temporary credentials with no static keys.
- Build and push: builds the image with Docker Buildx and pushes it to ECR. Layer caching goes through the GitHub Actions cache (type=gha).
- Update manifest repo: triggers a workflow in the manifest repo with a repository_dispatch event. That workflow auto-commits the Helm values.
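
The tag logic in the Set image tag step can be tried locally before it runs in CI. The sketch below lifts it into a hypothetical function, image_tag, which takes the ref and the short SHA as arguments instead of reading GITHUB_REF and calling git.

```shell
# Hypothetical helper mirroring the "Set image tag" step.
# $1 = full git ref (what CI exposes as GITHUB_REF), $2 = short commit SHA
image_tag() {
  case $1 in
    refs/tags/v*) printf '%s\n' "${1#refs/tags/v}" ;;  # strip the refs/tags/v prefix
    *)            printf 'main-%s\n' "$2" ;;           # any other ref: main-<sha>
  esac
}

image_tag refs/tags/v1.5.0 abc1234   # prints 1.5.0
image_tag refs/heads/main  abc1234   # prints main-abc1234
```

Keeping the rule in one small function makes it easy to see that a tag build and a branch build can never produce the same image tag.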
Auto-commit in the manifest repo
In the manifest repo we add a workflow that receives the dispatch above and updates the values file.

name: Update image tag

on:
  repository_dispatch:
    types: [update-image]

permissions:
  contents: write   # lets the job push the commit with GITHUB_TOKEN

jobs:
  update:
    runs-on: ubuntu-latest
    env:
      # Pass payload fields through env instead of interpolating them into the script
      APP: ${{ github.event.client_payload.app }}
      TAG: ${{ github.event.client_payload.tag }}
      TARGET_ENV: ${{ github.event.client_payload.env }}
    steps:
      - uses: actions/checkout@v4

      - name: Update values
        run: |
          yq -i '.image.tag = strenv(TAG)' "charts/$APP/values-$TARGET_ENV.yaml"

      - name: Commit and push
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git add charts/
          git commit -m "chore: bump $APP to $TAG ($TARGET_ENV)"
          git push

When this commit lands on the manifest repo's main branch, ArgoCD, which watches that branch, syncs it to the cluster. The two files values-dev.yaml / values-prod.yaml from Chapter 22 are the targets of this chapter's auto-commit.
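
Because the dispatch payload ends up in a file path and a commit message, it may be worth validating it before the values file is touched. The following is a minimal sketch of such a guard. The function name valid_payload and the allowed character sets are assumptions made for illustration; the workflow above does not define them.

```shell
# Hypothetical guard for the repository_dispatch payload.
# $1 = app name, $2 = image tag, $3 = target environment
valid_payload() {
  # App name: lowercase letters, digits, hyphens only, so the path stays inside charts/
  case $1 in
    ''|*[!a-z0-9-]*) return 1 ;;
  esac
  # Tag: only characters that appear in the tags this pipeline produces
  case $2 in
    ''|*[!A-Za-z0-9._-]*) return 1 ;;
  esac
  # Environment: only the two values files that exist
  case $3 in
    dev|prod) return 0 ;;
    *)        return 1 ;;
  esac
}

valid_payload myshop-api 1.5.0 dev     && echo "accepted"
valid_payload ../secrets 1.5.0 dev     || echo "rejected: bad app name"
valid_payload myshop-api 1.5.0 staging || echo "rejected: unknown environment"
```

A step that calls this with the three payload values and fails the job on a non-zero return keeps malformed input from ever reaching yq or git.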
ArgoCD: the watcher of the manifest repo
We use the ArgoCD model covered in Chapter 20 GitOps as is. One Application manifest handles the deployment of myshop-api to one environment.

Installing ArgoCD
helm repo add argo https://argoproj.github.io/argo-helm
helm install argocd argo/argo-cd \
  -n argocd --create-namespace \
  --values argocd-values.yaml

# argocd-values.yaml
server:
  ingress:
    enabled: true
    ingressClassName: alb
    annotations:
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
      alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:...
    hosts:
      - argocd.myshop.example.com
configs:
  cm:
    timeout.reconciliation: 30s

The ArgoCD UI is exposed at argocd.myshop.example.com. The AWS Load Balancer Controller set up in Chapter 22 resolves this Ingress to an ALB as well. In production it is standard to put the UI behind SSO (GitHub, Google), and the role-based thinking from Chapter 14 RBAC / NetworkPolicy / ResourceQuota carries over to the ArgoCD UI's permission model.
The myshop-api Application
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myshop-api-prod
  namespace: argocd
spec:
  project: myshop
  source:
    repoURL: https://github.com/myshop/myshop-manifests.git
    targetRevision: main
    path: charts/myshop-api
    helm:
      valueFiles:
        - values.yaml
        - values-prod.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: myshop
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - ServerSideApply=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        maxDuration: 3m

Three of these options involve tradeoffs.
- automated: changes in git are applied to the cluster without anyone pressing Sync. This mode suits dev; the next section turns it off for prod.
- selfHeal: true: even if someone modifies an object directly with kubectl edit, it is reverted to the manifest in git.
- prune: true: objects that disappear from git are deleted from the cluster too.

ArgoCD maps the Helm pre-install / pre-upgrade hook used for the Chapter 23 migration Job to its own PreSync hook. ArgoCD therefore runs the migration Job before applying the new manifest and moves on only if that Job succeeds, so the migration flow fits into GitOps without extra work.
dev vs prod: splitting automatic sync
A frequently used pattern is to turn off automatic sync for prod and trigger the sync manually.

syncPolicy:
  syncOptions:
    - CreateNamespace=true
    - ServerSideApply=true
  # no automated block -> manual sync mode

The deployment flow then branches as follows.

[dev]
git push -> GitHub Actions build -> ECR push
  -> manifest repo commit (values-dev.yaml)
  -> ArgoCD auto-sync -> deploy to the dev cluster

[prod]
git tag v1.5.0 -> GitHub Actions build -> ECR push
  -> manifest repo commit (values-prod.yaml)
  -> a human clicks "Sync" in the ArgoCD UI
  -> deploy to the prod cluster

The human gate is the safeguard for prod deployments. The manifest itself is reviewed via a git PR, and the actual rollout is confirmed once more by an operator. This double gate is covered in the change-management procedure of Chapter 26 Operations checklist.
The standard for bundling Applications: App of Apps
In this pattern, instead of applying Application manifests to ArgoCD by hand, one root Application watches the other Applications.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default   # an Application must name a project; default is assumed here
  source:
    repoURL: https://github.com/myshop/myshop-manifests.git
    targetRevision: main
    path: argocd/applications
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true

When you add a new Application manifest to the argocd/applications/ directory, it is registered in ArgoCD automatically, and that Application then syncs its own manifests. The operation of the cluster itself becomes part of GitOps. This is where the model touched on in Chapter 20 §“App of Apps” becomes the standard setup for multi-environment operations.
Image Updater: moving image tag updates to ArgoCD
In the flow above, GitHub Actions commits to the manifest repo to update the image tag. ArgoCD Image Updater is an option that moves this step to the ArgoCD side.

metadata:
  annotations:
    argocd-image-updater.argoproj.io/image-list: api=123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myshop-api
    argocd-image-updater.argoproj.io/api.update-strategy: semver
    argocd-image-updater.argoproj.io/write-back-method: git
    argocd-image-updater.argoproj.io/write-back-target: helmvalues:./charts/myshop-api/values-prod.yaml

ArgoCD Image Updater polls ECR on an interval and, when it finds a new tag, auto-commits to the manifest repo. The commit step in GitHub Actions becomes unnecessary, but because detection depends on the polling interval (2 minutes by default, and configurable), a new image is not picked up immediately. If you want git to show the order of the code push and the manifest commit clearly, the GitHub Actions commit model is more intuitive. This book's standard path is the GitHub Actions commit, and we mention Image Updater only as an option for multi-cluster environments.
Canary and blue-green: Argo Rollouts
The RollingUpdate strategy of the standard Deployment is the simplest zero-downtime deployment model. Building on that model, covered in Chapter 4 Deployment / ReplicaSet, Argo Rollouts provides more sophisticated patterns: canary, blue-green, and promotion after automatic analysis.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myshop-api
  namespace: myshop
spec:
  replicas: 10
  strategy:
    canary:
      canaryService: myshop-api-canary
      stableService: myshop-api-stable
      trafficRouting:
        alb:
          ingress: myshop-api
          servicePort: 80
      steps:
        - setWeight: 5
        - pause: { duration: 5m }
        - analysis:
            templates:
              - templateName: success-rate
        - setWeight: 25
        - pause: { duration: 10m }
        - setWeight: 50
        - pause: { duration: 10m }
        - setWeight: 100
  selector:
    matchLabels:
      app.kubernetes.io/name: myshop-api
  template:
    spec:
      containers:
        - name: api
          image: 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myshop-api:1.5.0
          # ... (same spec as the Deployment)

The new version takes over gradually: 5 % of traffic for 5 minutes → automatic analysis (a Prometheus metric query) → on a pass, 25 % → 50 % → 100 %. If the analysis stage detects a failure, the rollout is aborted and traffic returns to the stable version.
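
To get a feel for how long a full promotion takes, the pause durations in the steps can be added up. The sketch below does that in shell; the function name canary_timeline is hypothetical, the weight:pause pairs are copied by hand from the Rollout, and the time spent inside the analysis step is not counted, so the result is a lower bound.

```shell
# Hypothetical helper: prints when each traffic weight is reached.
# Each item is "weight:pause_minutes", copied from the Rollout steps.
canary_timeline() {
  elapsed=0
  for step in 5:5 25:10 50:10 100:0; do
    weight=${step%%:*}   # text before the colon
    pause=${step##*:}    # text after the colon
    printf 't+%sm weight=%s%%\n' "$elapsed" "$weight"
    elapsed=$((elapsed + pause))
  done
}

canary_timeline
```

With these steps a healthy release needs at least 25 minutes of pauses before it serves all traffic, which is worth knowing when planning a prod deployment window.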
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
  namespace: myshop   # must live in the same namespace as the Rollout
spec:
  metrics:
    - name: success-rate
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090
          query: |
            sum(rate(http_requests_total{app="myshop-api",status=~"2.."}[5m]))
            / sum(rate(http_requests_total{app="myshop-api"}[5m]))
      successCondition: result[0] >= 0.99
      failureLimit: 1

The Prometheus metrics covered in Chapter 25 Monitoring · alerts feed directly into the canary's automatic promote / rollback decision at this stage. Argo Rollouts shows its real value when combined with the metric stack of Chapter 19 Observability: the metric becomes input data for automation rather than a dashboard that a human watches.
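
The success condition is a ratio test, and it can help to check the arithmetic with concrete request counts. The sketch below mirrors result[0] >= 0.99 using integer math. The function name analysis_passes is hypothetical, and treating a window with no traffic as a failure is an assumption of this sketch, not documented Argo Rollouts behavior.

```shell
# Hypothetical check mirroring "result[0] >= 0.99" with integer arithmetic.
# $1 = number of 2xx responses, $2 = number of all responses in the window
analysis_passes() {
  [ "$2" -gt 0 ] || return 1                 # no traffic: fail (assumption)
  [ $(( $1 * 100 )) -ge $(( $2 * 99 )) ]     # same as $1 / $2 >= 0.99
}

analysis_passes 995 1000 && echo "99.5%: promote"
analysis_passes 980 1000 || echo "98.0%: abort and roll back"
```

At 5 % canary weight the request counts are small, so a handful of errors can move the ratio below the threshold; that is one reason to keep the query window and the pause duration aligned.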
The standard for the PR flow: environments + required reviewers
We also lay out the production-standard gate in GitHub Actions.

jobs:
  build-prod:
    if: startsWith(github.ref, 'refs/tags/v')
    runs-on: ubuntu-latest
    environment:
      name: production
      url: https://api.myshop.example.com
    steps:
      - ...

If you create the production environment in the repository's Settings and specify Required reviewers, a job that targets that environment will not start until a reviewer approves it. This is the standard pattern that keeps a single tag from starting a prod deployment on its own. Note that a job with an environment gets an OIDC sub claim of the form repo:<org>/<repo>:environment:<name> instead of the ref-based form, so the IAM trust policy has to allow that value as well. Combined with the manual Sync in the ArgoCD UI, this creates a double gate: one at the build stage and one at the deployment stage.
Checks for the first cycle
These are the items to check once GitHub Actions push → ECR → manifest commit → ArgoCD sync has completed one full cycle.

aws ecr describe-images \
  --repository-name myshop-api \
  --region ap-northeast-2 \
  --query 'imageDetails[*].[imageTags,imagePushedAt]' \
  --output table

argocd app get myshop-api-prod
argocd app sync myshop-api-prod      # manual sync (for prod)
argocd app history myshop-api-prod

kubectl get deployment myshop-api -n myshop \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

If the three groups of commands all point to the new tag, the deployment is working normally. The ArgoCD UI shows the same information visually, and drift between the manifest and the cluster is visible at a glance. If ArgoCD is stuck at OutOfSync, refer to the GitOps debugging section of Chapter 27 kubectl debugging patterns. The three most common causes are a format error in the values file, insufficient permission to pull the ECR image (ImagePullBackOff), and ArgoCD's access to the manifest repo.
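
The commands above print full image references, so comparing them by eye is error-prone. A small helper can extract the tag and compare it with the expected one. This is a minimal sketch: tag_of and deployed_matches are hypothetical names, and the code assumes tag-based references (name:tag), not digests.

```shell
# Hypothetical helpers; they assume tag-based references (repo/name:tag), not digests.
tag_of() {
  printf '%s\n' "${1##*:}"   # keep only the text after the last colon
}

# $1 = expected tag, $2 = image reference reported by the cluster
deployed_matches() {
  [ "$(tag_of "$2")" = "$1" ]
}

image='123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myshop-api:1.5.0'
if deployed_matches 1.5.0 "$image"; then
  echo "cluster runs the expected tag"
else
  echo "tag mismatch: $(tag_of "$image")"
fi
```

In practice the image variable would be filled from the kubectl jsonpath command, and the expected tag from the last commit in the manifest repo.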
One trap: the mutability of container image tags
The production standard is to keep image tags immutable. If the same tag can point to a different image, ArgoCD's drift detection loses its meaning, because the manifest in git no longer identifies what actually runs. The following setup is essential.
- Enable immutable tags on the ECR repository: set image_tag_mutability = "IMMUTABLE" with Terraform.
- Never use the latest tag in prod: always use a git SHA or semver.
- Image tag = git commit hash or git tag: which commit is running in which environment is visible at a glance.
If this setup is missing, the accident of “the tag that worked until yesterday is a different image today” happens. That is the point where the source of truth of GitOps breaks. The principle touched on in Chapter 20 GitOps §“For git to be the single source” takes concrete form in this chapter's ECR setup.
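
The naming rules in the list can also be checked mechanically before a tag reaches the manifest repo. The sketch below accepts the two tag shapes this chapter's pipeline produces and rejects everything else, including latest. The function name tag_is_pinned is hypothetical, and the check only covers naming: immutability itself still has to be enforced by the ECR setting.

```shell
# Hypothetical policy check: accept semver (1.5.0) or main-<short sha>, reject anything else.
tag_is_pinned() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]+\.[0-9]+\.[0-9]+|main-[0-9a-f]{7,40})$'
}

for t in 1.5.0 main-abc1234 latest; do
  if tag_is_pinned "$t"; then
    echo "$t: ok"
  else
    echo "$t: rejected"
  fi
done
```

Run as a step in the manifest repo's workflow, a check like this stops a floating tag from being committed in the first place.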
Exercises
- Apply this chapter's GitHub OIDC Terraform manifest and set up ECR push without static keys from one repo in your own GitHub organization. Switch the pattern of Condition.sub between the two values repo:org/repo:ref:refs/heads/main and repo:org/repo:environment:production and compare which workflow runs each policy allows. In one paragraph, explain how environment-based isolation differs from branch-based isolation.
- Make the two ArgoCD Application manifests for dev and prod, and split them so dev uses automated.prune + selfHeal and prod uses manual sync. Measure how many seconds it takes for selfHeal to revert to the git value when you arbitrarily change the Deployment's replicas with kubectl edit in dev. Explain in one paragraph why the same behavior is dangerous in prod, in the operational context of Chapter 26.
- Apply the Argo Rollouts canary manifest and auto-promote myshop-api's new version 5 % → 25 % → 100 %. Deliberately deploy a version that returns 5xx and observe the analysis stage detecting the failure and rolling back automatically. Note how the Prometheus query that feeds this automatic analysis connects to the alert rules of Chapter 25.
In short: the CI/CD standard for a production cluster is a GitOps pipeline in which GitHub Actions OIDC, ECR, the manifest repo auto-commit, and the ArgoCD watch work as one flow. The two-repo separation solves permissions, change tracking, and ArgoCD's single watch target at once. dev runs with automated sync + selfHeal, while prod runs with manual sync plus the GitHub environment approval, which together form the double gate. The Argo Rollouts canary uses Prometheus metrics as automation input and moves promote / rollback from a human's hands into code. If image tag immutability is missing, GitOps's source of truth breaks.
Next chapter
At this point myshop-api has settled into a pattern where one code push auto-deploys to dev, and one git tag queues up a prod deployment. But there’s still no layer that looks into all those behaviors.
In the next chapter we fill that empty space. In Chapter 25 Monitoring · alerts we cover the observability stack of a production cluster composed of Prometheus + Grafana + Alertmanager + CloudWatch, and the core alert rule set. The metric · log · trace model of Chapter 19 Observability leads into a full AWS-coupled operational setup.