Deploying a Fullstack App on EKS
The Part 6 capstone, and the book's final chapter. It deploys the React Next.js (App Router + RSC + Server Actions) app and the Modern Python FastAPI (SQLAlchemy 2.x + Pydantic v2) app together on one EKS cluster under the same TODO domain. Across 13 PRs, it walks through cluster setup with Terraform + Karpenter + IRSA + ALB Controller + ExternalDNS + cert-manager, DB integration with RDS + External Secrets + RDS IAM auth, per-environment deployment with Helm + ArgoCD ApplicationSet, observability with Prometheus + Grafana + Loki + OpenTelemetry, autoscaling with HPA + Karpenter, k6 load testing + OpenCost cost estimation, and the operations cycles of Chapters 26 and 30. This capstone shows how the tools from Chapters 1 ~ 30 fit together inside one system.
This is the book’s last chapter. Rather than an imaginary company, the capstone takes the outputs of two other books in this series as its input: the Next.js TODO app from Part 6 of React and the FastAPI TODO backend from Part 4 of Modern Python, both running under the same domain. We deploy them together on one EKS cluster and revisit the full Kubernetes track inside one system.
The goals of this chapter are these.
- Next.js is up at https://todo.example.com and FastAPI at https://api.todo.example.com
- RDS PostgreSQL is combined with backups · Multi-AZ · External Secrets
- the GitHub push → ECR → ArgoCD ApplicationSet auto-sync flow runs end to end
- the Prometheus + Grafana + Loki + OpenTelemetry observability stack observes both apps in the same way
- HPA + Karpenter respond automatically to traffic fluctuations
- the operational-cost hypothesis of roughly $80 ~ $120 a month is verified with OpenCost
It proceeds in 13 PRs. Each PR becomes the input for the next, and the change volume stays deliberately small so every step remains reviewable.
The target architecture #
[Browser]
    |
    | HTTPS (Route 53 + ACM)
    v
  [ALB] -- AWS Load Balancer Controller
    |
    |-- /      -> [Next.js Pod x N]  (SSR + RSC + Server Actions)
    `-- /api/* -> [FastAPI Pod x M]  (REST + Pydantic v2)
                        |
                        | PgBouncer
                        v
                  [RDS PostgreSQL] (Multi-AZ)
                        ^
                        |
                  [External Secrets] <- [AWS Secrets Manager]
                        ^
                        | IRSA
                  [ServiceAccount]

This picture is the final form reached by the 13 PRs in this chapter. Each arrow in the picture is a problem solved in one or more chapters of this book, and this chapter is where those pieces are bound into one system.
PR #1 — Domain and architecture decision #
The first PR is a single ADR (Architecture Decision Record) with no code.
# ADR-0001: The K8s deployment architecture of the fullstack todo system
## Context
The todo system of Next.js (App Router + RSC) + FastAPI + PostgreSQL
must be deployed to a production environment.
## Options
1. ECS Fargate (managed containers)
2. EKS (Kubernetes)
3. Lambda + RDS (serverless)
## Decision
Adopt EKS.
## Rationale
- the two apps (Next.js + FastAPI) have different lifecycles and need isolation
- the autoscaling model of HPA · Karpenter fits the traffic pattern
- GitOps (ArgoCD) matches the operational standard model
- comprehensive validation of the tools from Chapters 1 ~ 30 of this book
## Consequences
- a cost hypothesis of $80 ~ $120 a month, to be verified with the Chapter 28 model
- application of the regular operations calendar cycle (Chapter 26)
- comparison with the ECS Fargate chapter of the AWS book

Because the same capstone in the AWS book takes the ECS Fargate route, comparing the two books makes the operational difference between “Kubernetes vs managed containers” clear. This chapter starts the operational cycle after choosing Kubernetes.
PR #2 — A fresh EKS cluster setup #
The Terraform configuration from Chapter 21, EKS Cluster Setup is the input. In this capstone we keep one deliberate difference: we introduce Karpenter from the start.
module "vpc" {
  source = "terraform-aws-modules/vpc/aws"
  # ... as in Chapter 21
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "todo-${var.env}"
  cluster_version = "1.32"
  enable_irsa     = true

  cluster_addons = {
    coredns    = { most_recent = true }
    kube-proxy = { most_recent = true }
    vpc-cni    = { most_recent = true }
    aws-ebs-csi-driver = {
      most_recent              = true
      service_account_role_arn = module.ebs_csi_irsa.iam_role_arn
    }
  }

  # keep only the minimum number of ON_DEMAND nodes; Karpenter handles the rest on demand
  eks_managed_node_groups = {
    system = {
      desired_size   = 2
      min_size       = 2
      max_size       = 3
      instance_types = ["t3.medium"]
      capacity_type  = "ON_DEMAND"
      labels         = { role = "system" }
      taints = [{
        key    = "system"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }
  }
}

module "karpenter" {
  source                 = "terraform-aws-modules/eks/aws//modules/karpenter"
  cluster_name           = module.eks.cluster_name
  irsa_oidc_provider_arn = module.eks.oidc_provider_arn
}

The system node group hosts only system components like Karpenter, CoreDNS, and the monitoring stack. Application workloads (Next.js / FastAPI) go to the nodes Karpenter brings up. That pattern combines the Karpenter model from Chapter 13, Autoscaling with Chapter 28, Cost Optimization §“Karpenter — the decision tree against Cluster Autoscaler.”
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["t", "m", "c"]
        - key: karpenter.k8s.aws/instance-cpu
          operator: In
          values: ["2", "4", "8"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
    budgets:
      - nodes: "10%"
        duration: 10m
        schedule: "0 9 * * mon-fri"

disruption.budgets is the blast-radius control from Chapter 30, Upgrade Strategy: while the budget is active, Karpenter voluntarily disrupts at most 10 % of the pool's nodes at a time. As written, the schedule opens that window at 09:00 on weekdays (Karpenter evaluates budget schedules in UTC) and duration keeps it open for 10 minutes, so lengthen duration if the cap should cover the whole business day.
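To make the 10 % budget concrete, here is a minimal Python sketch of the arithmetic. It assumes the rounding rule described in the Karpenter documentation (percentages round up, then nodes already being disrupted are subtracted); the function name and the node counts are illustrative only.

```python
def allowed_disruptions(total_nodes: int, budget: str, disrupting: int = 0) -> int:
    """Nodes that may still be disrupted under one budget (illustrative model)."""
    if budget.endswith("%"):
        percent = int(budget[:-1])
        # Integer ceiling of total_nodes * percent / 100 (assumed: percentages round up).
        cap = (total_nodes * percent + 99) // 100
    else:
        cap = int(budget)
    return max(cap - disrupting, 0)


for nodes in (3, 10, 25):
    print(nodes, "nodes ->", allowed_disruptions(nodes, "10%"), "may be disrupted")
```

Under this model even a three-node pool may lose one node at a time, which is why the budget limits the blast radius but never blocks consolidation entirely.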
A bundle of auxiliary components #
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system --set clusterName=todo-prod --set serviceAccount.create=false

helm install external-dns external-dns/external-dns \
  -n external-dns --create-namespace \
  --set provider=aws --set "domainFilters[0]=todo.example.com"

helm install cert-manager jetstack/cert-manager \
  -n cert-manager --create-namespace --set installCRDs=true

This is exactly the setup described in Chapter 22, The App Deployment Skeleton §“cert-manager and external-dns.”
PR #3 — The Namespace / RBAC / NetworkPolicy skeleton #
Before bringing up workloads, we establish the isolation skeleton.
---
apiVersion: v1
kind: Namespace
metadata:
  name: todo-frontend
  labels:
    team: web
    env: prod
    role: frontend
---
apiVersion: v1
kind: Namespace
metadata:
  name: todo-backend
  labels:
    team: backend
    env: prod
    role: backend
---
apiVersion: v1
kind: Namespace
metadata:
  name: todo-data
  labels:
    team: backend
    env: prod
    role: data

The split into three namespaces (frontend / backend / data) is the isolation unit of this capstone. The standard labels (team / env / role) from Chapter 7, Namespace and Labels are used as the grouping key in Chapter 25, Monitoring · Alerts and the cost-allocation key in Chapter 28.
NetworkPolicy — full isolation #
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: todo-backend-ingress
  namespace: todo-backend
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: todo-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              role: frontend
        - namespaceSelector:
            matchLabels:
              role: backend   # allow other workloads of the same backend too
      ports:
        - port: 8000

The NetworkPolicy model of Chapter 14, RBAC / NetworkPolicy / ResourceQuota carries over into real isolation. It’s a forced flow where frontend can’t go to RDS directly and must pass through backend.
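The rule above can be read as a small predicate: traffic reaches a todo-api Pod only if the source namespace carries an allowed role label and the destination port is 8000. The Python sketch below models just that decision; it is a reading aid under that assumption, not how a CNI plugin evaluates policies.

```python
ALLOWED_ROLES = {"frontend", "backend"}  # from the two namespaceSelector entries
ALLOWED_PORTS = {8000}                   # from the ports list


def ingress_allowed(source_namespace_labels: dict, port: int) -> bool:
    """Return True if the modeled policy would admit this connection."""
    role_ok = source_namespace_labels.get("role") in ALLOWED_ROLES
    return role_ok and port in ALLOWED_PORTS


print(ingress_allowed({"role": "frontend"}, 8000))  # frontend -> todo-api
print(ingress_allowed({"role": "data"}, 8000))      # data namespace is not listed
print(ingress_allowed({"role": "frontend"}, 5432))  # wrong port
```

Both conditions must hold, because the from and ports entries sit in the same ingress rule.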
ResourceQuota — a per-team limit #
apiVersion: v1
kind: ResourceQuota
metadata:
  name: todo-backend-quota
  namespace: todo-backend
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    persistentvolumeclaims: "5"

The ResourceQuota of Chapter 14 is the first protection line for cost isolation in a multi-team environment.
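A quick way to sanity-check a quota is to ask how many Pods of a given size fit inside it. The sketch below uses the quota values above and the todo-api requests and limits that appear later in this chapter (100m / 128Mi requested, 500m / 256Mi limit); it ignores every other workload in the namespace, which is a simplifying assumption.

```python
# Quota from the manifest above, in millicores and MiB.
QUOTA = {
    "requests.cpu": 10_000,
    "requests.memory": 20 * 1024,
    "limits.cpu": 20_000,
    "limits.memory": 40 * 1024,
}

# Per-Pod values of the todo-api Deployment (PR #5), same units.
POD = {
    "requests.cpu": 100,
    "requests.memory": 128,
    "limits.cpu": 500,
    "limits.memory": 256,
}


def max_pods(quota: dict, pod: dict) -> int:
    """Largest Pod count that keeps every quota dimension within bounds."""
    return min(quota[key] // pod[key] for key in quota)


def bottleneck(quota: dict, pod: dict) -> str:
    """Quota dimension that runs out first."""
    return min(quota, key=lambda key: quota[key] // pod[key])


print(max_pods(QUOTA, POD), "pods fit; first limit hit:", bottleneck(QUOTA, POD))
```

With these numbers limits.cpu is the binding dimension at 40 Pods, so the HPA ceiling of 20 replicas set in PR #11 stays inside the quota.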
PR #4 — PostgreSQL RDS + External Secrets #
The Terraform configuration from Chapter 23, DB Integration carries over almost unchanged. The difference is that we keep Aurora Serverless v2 as an option in dev.
module "rds" {
  source  = "terraform-aws-modules/rds/aws"
  version = "~> 6.0"

  identifier           = "todo-${var.env}"
  engine               = "postgres"
  engine_version       = "16.3"
  major_engine_version = "16"
  instance_class       = var.env == "prod" ? "db.t4g.small" : "db.t4g.micro"
  allocated_storage    = 20

  manage_master_user_password  = true
  multi_az                     = var.env == "prod"
  backup_retention_period      = var.env == "prod" ? 30 : 7
  performance_insights_enabled = true
  deletion_protection          = var.env == "prod"
}

To keep the cost hypothesis small, we set the instance class to db.t4g.small, a smaller option than Chapter 23’s db.m6g.large. The todo domain’s load is small, so it’s plenty.
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: todo-api-db
  namespace: todo-backend
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: todo-api-db
    template:
      data:
        DATABASE_URL: "postgresql://{{ .username }}:{{ .password }}@pgbouncer.todo-backend.svc:5432/todo?sslmode=disable"
  data:
    - secretKey: username
      remoteRef:
        key: rds!cluster-todo-prod
        property: username
    - secretKey: password
      remoteRef:
        key: rds!cluster-todo-prod
        property: password

This is the Chapter 23 manifest unchanged. The RDS IAM auth of Chapter 29, Secret Operations §“Zero passwords” stays optional in this capstone: todo’s traffic is small, so the PgBouncer + password model is enough.
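One detail worth checking when a password is spliced into a connection URL: characters such as @, :, / or # change how the URL is parsed unless they are percent-encoded. The Python sketch below shows the effect with the standard library. The helper name, the user name, and the sample password are made up for illustration, and whether your generated passwords contain such characters depends on your RDS settings.

```python
from urllib.parse import quote, unquote, urlsplit


def database_url(username: str, password: str,
                 host: str = "pgbouncer.todo-backend.svc",
                 port: int = 5432, database: str = "todo") -> str:
    """Build a PostgreSQL URL with percent-encoded credentials."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"postgresql://{user}:{secret}@{host}:{port}/{database}?sslmode=disable"


url = database_url("todo_admin", "p@ss:w/rd#1")  # hypothetical credentials
parts = urlsplit(url)

print(url)
print(parts.hostname, parts.port, unquote(parts.password))
```

Parsing the encoded URL recovers the original host, port, and password, which is the property the application's database driver relies on.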
PR #5 — Deploying the FastAPI backend #
The FastAPI todo backend from the Part 4 capstone of Modern Python is the input. (modern-python keeps the “modern” prefix to distinguish it from the older Python course.) Containerization is outside the scope of this book, so we show only the core of the Dockerfile.
FROM python:3.13-slim AS builder
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN pip install uv && uv sync --frozen --no-dev
FROM python:3.13-slim AS runtime
WORKDIR /app
COPY --from=builder /app/.venv /app/.venv
COPY src/ src/
ENV PATH="/app/.venv/bin:$PATH"
EXPOSE 8000
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
Deployment #
apiVersion: apps/v1
kind: Deployment
metadata:
  name: todo-api
  namespace: todo-backend
  labels:
    app.kubernetes.io/name: todo-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: todo-api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app.kubernetes.io/name: todo-api
    spec:
      serviceAccountName: todo-api
      containers:
        - name: api
          image: 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/todo-api:1.0.0
          ports:
            - containerPort: 8000
              name: http
          envFrom:
            - secretRef:
                name: todo-api-db
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health/live
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 10"]
      terminationGracePeriodSeconds: 60

It is the standard manifest from Chapter 22, The App Deployment Skeleton combined with the graceful shutdown pattern (preStop + terminationGracePeriodSeconds) from Chapter 30.
ServiceAccount + IRSA #
apiVersion: v1
kind: ServiceAccount
metadata:
  name: todo-api
  namespace: todo-backend
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/todo-prod-api
automountServiceAccountToken: false   # the security pattern from Chapter 16

The IRSA pattern from Chapter 16, RBAC / ServiceAccount in Depth and the security pattern from Chapter 29, Secret Operations §“automountServiceAccountToken: false” are combined in one manifest.
PR #6 — Deploying the Next.js front #
The Next.js TODO app from the Part 6 capstone of React is the input. The App Router + RSC + Server Actions model behaves inside K8s as follows.
[Browser]
|
| HTTPS
v
[ALB]
|
v
[Next.js Pod] -- Node.js server (next start)
|
| fetch on RSC rendering
v
[todo-api Service] -- ClusterIP, points at FastAPI
|
v
[todo-api Pod]

Server Actions run inside the Next.js Pod. When they need an external API call, they call the todo-api Service in the same cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: todo-web
  namespace: todo-frontend
  labels:
    app.kubernetes.io/name: todo-web
spec:
  replicas: 2
  # a Deployment needs a selector that matches the Pod template labels
  selector:
    matchLabels:
      app.kubernetes.io/name: todo-web
  template:
    metadata:
      labels:
        app.kubernetes.io/name: todo-web
    spec:
      serviceAccountName: todo-web
      containers:
        - name: web
          image: 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/todo-web:1.0.0
          ports:
            - containerPort: 3000
              name: http
          env:
            - name: TODO_API_URL
              value: "http://todo-api.todo-backend.svc.cluster.local:80"
            - name: NODE_ENV
              value: "production"
          resources:
            requests:
              cpu: 200m
              memory: 256Mi   # the memory hypothesis of SSR + RSC
            limits:
              cpu: 1
              memory: 512Mi

The memory hypothesis of the Next.js Pod follows the sizing model of Chapter 11, Resource Requests and Limits. Because the per-request memory footprint of SSR + RSC accumulates to a degree, a request of 256 Mi is a conservative starting point. We converge on the appropriate value a month later with the VPA recommendation of Chapter 28, Cost Optimization.
PR #7 — Ingress + ALB #
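apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: todo-web
  namespace: todo-frontend
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    # the HTTP listener is what ssl-redirect redirects from
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80},{"HTTPS":443}]'
    alb.ingress.kubernetes.io/ssl-redirect: '443'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:...
    alb.ingress.kubernetes.io/group.name: todo
    external-dns.alpha.kubernetes.io/hostname: todo.example.com
spec:
  rules:
    - host: todo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: todo-web
                port:
                  number: 80
---
# an Ingress backend must live in the Ingress's own namespace,
# so the API host gets a second Ingress in todo-backend
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: todo-api
  namespace: todo-backend
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80},{"HTTPS":443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:...
    alb.ingress.kubernetes.io/group.name: todo
    external-dns.alpha.kubernetes.io/hostname: api.todo.example.com
spec:
  rules:
    - host: api.todo.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: todo-api
                port:
                  number: 80

An Ingress can only reference Services in its own namespace, so each host gets its own Ingress next to its Service. alb.ingress.kubernetes.io/group.name: todo is decisive: it merges both Ingresses into the same single ALB. The cost-savings pattern pointed at in Chapter 28, Cost Optimization §“The ALB’s LCU” applies directly in this section.
external-dns auto-registers the A records of the two hosts in Route 53. One ACM certificate is enough as long as it lists both todo.example.com and the wildcard *.todo.example.com, because the wildcard alone does not match the apex name. It’s the shape of the Ingress manifest of Chapter 22 extended into a multi-host pattern.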
PR #8 — Binding with Helm charts #
We bind the manifests written so far into two Helm charts.
charts/
├── todo-web/
│   ├── Chart.yaml
│   ├── values.yaml
│   ├── values-dev.yaml
│   ├── values-prod.yaml
│   └── templates/
│       ├── deployment.yaml
│       ├── service.yaml
│       ├── hpa.yaml
│       └── pdb.yaml
├── todo-api/
│   ├── Chart.yaml
│   ├── values.yaml
│   ├── values-dev.yaml
│   ├── values-prod.yaml
│   └── templates/
│       ├── deployment.yaml
│       ├── service.yaml
│       ├── serviceaccount.yaml
│       ├── externalsecret.yaml
│       ├── hpa.yaml
│       ├── pdb.yaml
│       └── servicemonitor.yaml
└── todo-infra/
    ├── Chart.yaml
    └── templates/
        ├── namespaces.yaml
        ├── networkpolicy.yaml
        ├── resourcequota.yaml
        └── ingress.yaml

The split into three charts is the key.
- todo-infra: namespaces · NetworkPolicy · ResourceQuota · Ingress. The infra the two apps share.
- todo-api: all of backend’s manifests.
- todo-web: all of frontend’s manifests.
This is how the pattern of Chapter 22, The App Deployment Skeleton §“Binding with Helm charts” splits in a real multi-app environment. There’s also the option of binding the charts via Chart.yaml’s dependencies, but for simplicity this capstone keeps a flat structure and integrates with ArgoCD ApplicationSet.
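Each app chart carries a values.yaml plus one file per environment. When Helm receives several values files, later files override earlier ones: maps are merged key by key, while scalars and lists are replaced. The Python sketch below imitates that merge for two small dictionaries. The keys shown (replicaCount, resources) are hypothetical, since the chapter does not list the actual values.

```python
def merge_values(base: dict, override: dict) -> dict:
    """Deep-merge two values maps; entries in `override` win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)  # merge nested maps
        else:
            merged[key] = value  # scalars and lists are replaced outright
    return merged


# Hypothetical contents of values.yaml and values-prod.yaml.
values = {"replicaCount": 2,
          "resources": {"requests": {"cpu": "100m", "memory": "128Mi"}}}
values_prod = {"replicaCount": 3,
               "resources": {"requests": {"memory": "256Mi"}}}

print(merge_values(values, values_prod))
```

The prod file only has to state what differs from the shared defaults, which keeps the per-environment files short and easy to review.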
PR #9 — GitOps: ArgoCD ApplicationSet #
The model of Chapter 20, GitOps + Chapter 24, The CI / CD Pipeline is organized into one ApplicationSet manifest.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: todo
  namespace: argocd
spec:
  goTemplate: true   # required for the {{.app}} Go-template syntax used below
  generators:
    - matrix:
        generators:
          - list:
              elements:
                - app: todo-infra
                - app: todo-api
                - app: todo-web
          - list:
              elements:
                - env: dev
                  cluster: https://kubernetes.default.svc
                - env: prod
                  cluster: https://kubernetes.default.svc
  template:
    metadata:
      name: '{{.app}}-{{.env}}'
    spec:
      project: todo
      source:
        repoURL: https://github.com/myorg/todo-manifests.git
        targetRevision: main
        path: charts/{{.app}}
        helm:
          valueFiles:
            - values.yaml
            - values-{{.env}}.yaml
      destination:
        server: '{{.cluster}}'
        # renders as todo-<app>; adjust if your namespaces are named differently (e.g. todo-backend)
        namespace: todo-{{.app}}
      syncPolicy:
        automated:
          prune: true
          selfHeal: true

The matrix generator auto-generates 3 apps × 2 environments = 6 Applications from one manifest. The operational standard is to auto-sync dev and to split prod into a separate ApplicationSet in manual-sync mode, but this capstone keeps both automated for simplicity.
The GitHub Actions OIDC + ECR push + manifest repo auto-commit cycle of Chapter 24 is the input of this manifest — one code push auto-syncs both the dev / prod environments.
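The 3 × 2 expansion is a plain Cartesian product. The Python sketch below reproduces it for the app and environment lists in the manifest, rendering only the fields that depend on the two parameters. It illustrates the combination logic and is not Argo CD's implementation.

```python
from itertools import product

APPS = ["todo-infra", "todo-api", "todo-web"]
ENVS = ["dev", "prod"]


def render_applications(apps: list, envs: list) -> list:
    """One entry per (app, env) pair, mirroring the ApplicationSet template."""
    return [
        {
            "name": f"{app}-{env}",
            "path": f"charts/{app}",
            "valueFiles": ["values.yaml", f"values-{env}.yaml"],
        }
        for app, env in product(apps, envs)
    ]


applications = render_applications(APPS, ENVS)
for application in applications:
    print(application["name"], application["path"], application["valueFiles"])
```

Adding a fourth chart or a staging environment means one more list element, not another hand-written Application.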
PR #10 — Observability #
The kube-prometheus-stack from Chapter 19, Observability + Chapter 25, Monitoring · Alerts carries over unchanged. The difference is that we add an OpenTelemetry Collector to tie the traces of the two apps together.
apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: monitoring
spec:
  mode: daemonset
  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    exporters:
      prometheus:
        endpoint: 0.0.0.0:8889
      otlp/tempo:
        endpoint: tempo.monitoring.svc:4317
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp/tempo]
        metrics:
          receivers: [otlp]
          exporters: [prometheus]

When Next.js’s OpenTelemetry SDK and FastAPI’s OTel instrumentation send traces to the same endpoint, the full path of one request crossing the two apps is visible in Tempo. A single trace view shows which FastAPI handler a fetch call made during RSC rendering passed through on its way to RDS.
ServiceMonitor + PrometheusRule #
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: todo-api
  namespace: todo-backend
  labels:
    release: prometheus
spec:
  groups:
    - name: todo-api.golden-signals
      rules:
        - alert: TodoApiHighErrorRate
          expr: |
            sum(rate(http_requests_total{app="todo-api",status=~"5.."}[5m]))
            / sum(rate(http_requests_total{app="todo-api"}[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
        # ... latency, traffic, saturation are the same

It’s the manifest of Chapter 25 unchanged, and the same rule applies to todo-web too. The alert severity routing reuses the Alertmanager manifest from Chapter 25 unchanged.
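The rule has two parts: a ratio compared with 0.05, and a for: 5m clause that keeps the alert pending until the condition has held continuously for five minutes. The Python sketch below models that behaviour on a list of evaluation results. The one-minute evaluation spacing and the sample ratios are assumptions made for the example.

```python
def is_firing(samples: list, threshold: float = 0.05, hold_seconds: int = 300) -> bool:
    """samples: (timestamp_seconds, error_ratio) pairs in evaluation order."""
    pending_since = None
    firing = False
    for timestamp, ratio in samples:
        if ratio > threshold:
            if pending_since is None:
                pending_since = timestamp  # condition just became true
            firing = timestamp - pending_since >= hold_seconds
        else:
            pending_since = None  # any dip resets the pending timer
            firing = False
    return firing


sustained = [(t, 0.08) for t in range(0, 301, 60)]
with_dip = [(t, 0.01 if t == 180 else 0.08) for t in range(0, 301, 60)]

print(is_firing(sustained))  # condition held for the full five minutes
print(is_firing(with_dip))   # one healthy evaluation restarted the timer
```

The for clause trades detection speed for fewer pages on short spikes, which suits a severity: critical route.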
PR #11 — Autoscaling #
The HPA of Chapter 13, Autoscaling and the Karpenter NodePool of Chapter 28 combine to make two stages of automatic response.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: todo-api
  namespace: todo-backend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: todo-api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 30
    scaleDown:
      stabilizationWindowSeconds: 300

traffic increase
|
v
HPA: in 30 seconds, todo-api Pod 2 -> 5 -> 10 -> 20
|
| the node runs short on resources
v
Karpenter: in 30 seconds ~ 1 minute, provisions a new node (spot first)
|
v
the Pod that was Pending gets scheduled on the new node

The shape of these two stages running together is the goal of K8s autoscaling. We measure that shape with a load test in the next PR.
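The 2 -> 5 -> 10 -> 20 progression follows from the HPA's core formula, desired = ceil(current × currentUtilization / targetUtilization), clamped to minReplicas and maxReplicas and skipped when the ratio is within the default 10 % tolerance. The Python sketch below applies that formula with the manifest's target of 60 %. The utilization readings are invented for the example, and stabilization windows and scaling policies are deliberately left out.

```python
import math


def desired_replicas(current: int, current_utilization: float,
                     target_utilization: float = 60,
                     min_replicas: int = 2, max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """Replica count the HPA formula asks for after one evaluation."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current  # close enough to the target: no change
    wanted = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, wanted))


replicas = 2
for utilization in (150, 120, 120, 120):  # hypothetical CPU readings in percent
    replicas = desired_replicas(replicas, utilization)
    print(replicas)
```

The last step shows the clamp at work: the formula asks for 40 replicas, but maxReplicas holds the Deployment at 20, and from there only more node capacity or a higher ceiling helps.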
PR #12 — Load testing and cost estimation #
import http from "k6/http";
import { check } from "k6";

export const options = {
  stages: [
    { duration: "2m", target: 50 },
    { duration: "5m", target: 200 },
    { duration: "2m", target: 500 },
    { duration: "5m", target: 500 },
    { duration: "2m", target: 0 },
  ],
};

export default function () {
  const res = http.get("https://todo.example.com/api/todos");
  check(res, {
    "status is 200": (r) => r.status === 200,
    "duration < 500ms": (r) => r.timings.duration < 500,
  });
}

k6 run k6/script.js

What to measure.
- HPA scale-up response time — in how many seconds Pods increase between traffic 50 → 200
- Karpenter’s node-add time — the time a Pod stayed Pending
- P95 latency — how latency changes at the load peak (500 VUs)
- 5xx ratio — whether the error rate during load exceeds Chapter 25’s threshold (5 %)
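To line measurements up with the load profile, it helps to know roughly how many virtual users are active at a given moment. k6 ramps linearly between stage targets, so the Python sketch below interpolates the stages from the script. It assumes the ramp starts from zero VUs and ignores that k6 works with whole VUs, so treat the numbers as approximations.

```python
# (duration in seconds, target VUs), copied from the k6 options above.
STAGES = [(120, 50), (300, 200), (120, 500), (300, 500), (120, 0)]


def total_seconds(stages: list = STAGES) -> int:
    """Length of the whole test run."""
    return sum(duration for duration, _ in stages)


def vus_at(t: float, stages: list = STAGES, start_vus: float = 0) -> float:
    """Approximate number of VUs at second t, by linear interpolation."""
    previous_target = start_vus
    elapsed = 0
    for duration, target in stages:
        if t <= elapsed + duration:
            progress = (t - elapsed) / duration
            return previous_target + (target - previous_target) * progress
        elapsed += duration
        previous_target = target
    return previous_target  # after the final stage


print(total_seconds() // 60, "minutes in total")
for second in (60, 270, 600):
    print(second, "s ->", vus_at(second), "VUs")
```

Marking the timestamps where the curve crosses 50 and 200 VUs on the Grafana dashboard makes the HPA and Karpenter response times easy to read off.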
Cost verification #
helm install opencost opencost/opencost \
  -n opencost --create-namespace

From OpenCost’s output we verify the monthly cost hypothesis.
| Item | Estimate (monthly) |
|---|---|
| EKS control plane | $73 |
| nodes (system t3.medium × 2 ON_DEMAND) | $60 |
| nodes (application spot, average 1.5 units) | $20 |
| RDS db.t4g.small Multi-AZ | $30 |
| ALB (1 unit, LCU) | $20 |
| NAT Gateway + data transfer | $35 |
| ECR / Route 53 / other | $10 |
| total | about $248 |
We compare each item of Chapter 28 §“A checklist for reviewing the bill” with these actual measurements. Whether prod’s target landed within this book’s standard guide ($200 ~ $300) is the validation metric.
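The arithmetic behind the table is easy to keep next to the OpenCost export. The Python sketch below sums the estimates from the table, checks the total against the $200 ~ $300 guide, and prints each item's share. The dictionary keys are shortened labels chosen for this example.

```python
# Monthly estimates in USD, copied from the table above.
ESTIMATE = {
    "EKS control plane": 73,
    "system nodes (t3.medium x 2)": 60,
    "application nodes (spot)": 20,
    "RDS db.t4g.small Multi-AZ": 30,
    "ALB": 20,
    "NAT Gateway + data transfer": 35,
    "ECR / Route 53 / other": 10,
}

GUIDE_LOW, GUIDE_HIGH = 200, 300  # this book's standard guide for prod

total = sum(ESTIMATE.values())
within_guide = GUIDE_LOW <= total <= GUIDE_HIGH

print(f"total: ${total} (within guide: {within_guide})")
for item, cost in sorted(ESTIMATE.items(), key=lambda pair: pair[1], reverse=True):
    print(f"{item:32s} ${cost:>3}  {cost / total:5.1%}")
```

Replacing the dictionary values with measured figures from OpenCost turns the same script into the comparison the checklist asks for.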
In a learning environment, the following adjustments can cut it to $40 ~ $80 a month.
- prod’s Multi-AZ RDS to dev’s single AZ
- one ALB (already shared)
- the system node group on spot too
- the NAT Gateway to a Single NAT
PR #13 — Applying the operations checklist #
The last PR applies the regular calendar of Chapter 26, The Operations Checklist and the upgrade checklist of Chapter 30, Upgrade Strategy to this system.
# The todo system operations calendar
## Daily
- check the 5 panels of the todo Grafana dashboard
- review the active alerts in Alertmanager
## Weekly
- ECR Trivy scan results (both todo-api, todo-web)
- review new security patches
## Monthly
- the top 1, 2, 3 costs by team / by workload in OpenCost
- the unreflected workloads of VPA recommendation
- check the OutOfSync state of ArgoCD
## Quarterly
- EKS minor upgrade (the 13 steps of Chapter 30)
- right-sizing signals from RDS Performance Insights
- RBAC audit
- recovery drill (PITR simulation)
- kube-bench CIS checkup
## Semi-annually
- external security audit
- DR simulation (Velero restore)
## Annually
- cluster architecture review
- manifest modernization

This calendar going into git is the last PR of this capstone. Putting not only the code but also the operational procedures into git as the single source of truth is the essential goal of GitOps.
Retrospective — how the 30 chapters were bound together #
We organize how this book’s chapters meshed inside one system across the 13 PRs.
| Chapter of this book | Role in this capstone |
|---|---|
| Chapters 1 ~ 3 | the ability to read a manifest line by line |
| Chapter 4, Deployment | the RollingUpdate strategy of todo-api / todo-web |
| Chapter 5, Service | the cluster DNS connection of todo-api ↔ todo-web |
| Chapter 6, ConfigMap · Secret | the standard for environment-variable injection |
| Chapter 7, Namespace and Labels | the frontend / backend / data split |
| Chapter 9, PV / PVC / StorageClass | the EBS CSI Driver (no direct PV used — RDS) |
| Chapter 10, Ingress | one ALB + group.name for two hosts |
| Chapter 11, Resource Requests and Limits | the starting points of Next.js 256 Mi · FastAPI 128 Mi |
| Chapter 12, Health Checks | the 3 probes + graceful shutdown |
| Chapter 13, Autoscaling | the two-stage automatic response of HPA + Karpenter |
| Chapter 14, RBAC / NetworkPolicy / Quota | namespace isolation + per-team limits |
| Chapter 15, CNI in Depth | VPC CNI assigns IP directly to Pods (background) |
| Chapter 16, IRSA | todo-api’s AWS credentials |
| Chapter 17, Admission Controller | Kyverno policy (optional) |
| Chapter 18, CRD and Operator | ESO, Karpenter, ALB Controller, OTel |
| Chapter 19, Observability | the trace of OpenTelemetry + Tempo |
| Chapter 20, GitOps | one ArgoCD ApplicationSet manifest |
| Chapter 21, EKS Setup | the starting point of Terraform |
| Chapter 22, The App Deployment Skeleton | the standard 9-bundle of todo-api / todo-web |
| Chapter 23, DB Integration | RDS + ESO + PgBouncer |
| Chapter 24, The CI / CD Pipeline | GitHub Actions OIDC → ECR → ApplicationSet |
| Chapter 25, Monitoring · Alerts | PrometheusRule + Alertmanager routing |
| Chapter 26, The Operations Checklist | daily / weekly / monthly / quarterly / semi-annually / annually |
| Chapter 27, kubectl Debugging | the standard 5-minute flow on an incident |
| Chapter 28, Cost Optimization | OpenCost + Karpenter spot + ALB sharing |
| Chapter 29, Secret Operations | ESO + automountServiceAccountToken |
| Chapter 30, Upgrade Strategy | preStop · PDB · Karpenter disruption budgets |
This table is the one-line summary of this capstone — the shape of the 30 chapters each taking a role in one system is the goal of the K8s track.
Comparison with the AWS book #
The Part 6 capstone of the AWS book (forthcoming) takes up the same todo system on the ECS Fargate route. Comparative learning of the two books makes the operational difference of implementing the same domain on two platforms clearly visible.
| Aspect | This book (EKS) | AWS book (ECS Fargate) |
|---|---|---|
| starting cost | $200 ~ $300 a month | $80 ~ $150 a month |
| operational surface | K8s’s richness + learning curve | AWS console + fewer objects |
| automation tools | Karpenter, HPA, ArgoCD | Service Auto Scaling, CodePipeline |
| observability | Prometheus + Grafana | CloudWatch Container Insights |
| multi-cloud possibility | possible (K8s standard) | AWS-locked |
| the team’s learning cost | high | low |
For a small team working in a single domain, ECS Fargate is more efficient; if you need multi-domain support, GitOps, rich workload patterns, and a multi-cloud option, EKS is suitable. This capstone’s decision (EKS) is the result of learning value plus a comprehensive validation of this book’s 30 chapters.
Cleanup — deleting the cluster #
The cost-side standard is to clean up a learning cluster immediately after the capstone ends.
# 1. delete the ArgoCD Application (clean up the workloads)
kubectl delete applicationset todo -n argocd
# 2. release RDS deletion_protection then terraform destroy
# (for prod, deletion_protection is on, so set it to false via a terraform variable then apply)
# 3. confirm the automatic cleanup of ALB / Route 53
# external-dns auto-deletes the hostname's A records
# 4. terraform destroy
terraform destroy
# 5. delete the ECR repositories (image remnants)
aws ecr delete-repository --repository-name todo-api --force
aws ecr delete-repository --repository-name todo-web --force

This order is the standard for safe cleanup: if you don’t clean up from the Application first, Terraform gets blocked on the ALB dependency and destroy fails.
Exercises #
- Actually apply this capstone’s 13 PRs to your own GitHub organization, and organize the last load test’s results together with OpenCost’s cost output into one page. Map where the gap between the expected cost hypothesis (about $248) and the actual measurement arose (especially NAT data transfer · ALB LCU · spot ratio) onto Chapter 28, Cost Optimization §“A checklist for reviewing the bill.”
- Branch this capstone’s ApplicationSet manifest and modify it so dev’s and prod’s sync policies behave differently (dev as automated + selfHeal, prod as manual sync). Deliberately apply a broken value to dev’s manifest (e.g., a nonexistent image tag) and, in one paragraph, compare how selfHeal protects and how prod’s manual sync works as a human gate.
- After following the same todo system ECS Fargate capstone of the AWS book, compare the operational tradeoffs of the two implementations against your own scenario in one table. Organize into one page the decision tree of which platform is suitable at which point, tailored to your domain (traffic pattern · team size · tolerance for cloud lock-in).
In one line: The Part 6 capstone deploys modern-react’s Next.js and modern-python’s FastAPI together on the same EKS cluster across 13 PRs. It starts with Terraform + Karpenter + IRSA + ALB Controller + ExternalDNS + cert-manager, then adds the namespace split of frontend / backend / data + NetworkPolicy + ResourceQuota, the DB stack of RDS + External Secrets + PgBouncer, three Helm charts (infra + api + web), ArgoCD ApplicationSet generating 6 Applications from one manifest, OpenTelemetry tracing both apps together, HPA + Karpenter handling traffic changes, k6 + OpenCost verifying the monthly cost hypothesis of about $248, and the last PR putting the daily / weekly / monthly / quarterly / semi-annually / annually operations calendar in git. The goal of the K8s track is to make the 30 tools of Chapters 1 ~ 30 each play a clear role in one system. For a small team working in a single domain, AWS’s ECS Fargate may be more efficient. If you need multi-domain support, GitOps, and a multi-cloud option, EKS is suitable.
The end of the book — next steps #
With this capstone, the picture of how this book’s 30 chapters mesh inside one system is complete. But this book is not the destination of K8s; it’s a starting point. Here are the topics you can move to as the next track.
- Service Mesh — Istio · Linkerd. mTLS · fine-grained traffic routing · observability mesh.
- MLOps on K8s — Kubeflow · KServe · Argo Workflows. A dedicated stack for ML model training · deployment · serving.
- Multi-cluster — patterns that go beyond the limits of a single cluster. Cluster federation · multi-region · the multi-cluster mode of ArgoCD ApplicationSet.
- eBPF in depth — beyond Cilium. The next generation of security / observability / networking.
- eks-anywhere / on-prem K8s — the challenge of operating clusters outside a managed offering.
These topics are the domain of separate books; this book’s 30 chapters bring you to their starting line.
Finally, Appendix A — From docker-compose to k8s closes the book as a migration guide for the entry-level reader. For a reader who has followed this whole book it’s an appendix, but for a reader who came as far as Docker / docker-compose and opened this book for the first time, it’s a starting point.