Contents
23 Chapter

DB Integration — RDS · External Secrets

The myshop-api we exposed externally in Chapter 22 is an empty shell with no data store. This chapter fills that space. We stand up RDS PostgreSQL with Terraform, keep the master password in AWS Secrets Manager, auto-sync that secret into a Kubernetes Secret with the External Secrets Operator, grant permissions without static credentials via IRSA, add a connection pool with PgBouncer, and automate schema migrations with a Helm hook-based Job pattern.

In Chapter 22 App deployment skeleton myshop-api got its external HTTPS entry point, but inside that container there is still nowhere for data to go. It is an empty shell that only returns 200 on /health/ready. This chapter fills that space. We cover the flow of standing up RDS PostgreSQL with Terraform, keeping the master password in AWS Secrets Manager, auto-syncing that secret into a Kubernetes Secret with the External Secrets Operator, granting permissions without static credentials via IRSA, adding a connection pool with PgBouncer, and automating schema migrations with a Job.

The goal of this chapter is a state where myshop-api communicates normally with RDS PostgreSQL, password rotation is automated, and the threat of the connection pool limit is controlled. In the next Chapter 24 CI / CD we automate the path by which a new version comes in.

Why managed RDS #

There’s also a path of standing up a PostgreSQL StatefulSet inside K8s. The StatefulSet model of Chapter 8 StatefulSet · DaemonSet · Job is the starting point of self-hosting. But the standard for production environments is managed RDS. The burdens of backup, Multi-AZ failover, patching, monitoring, and major version upgrade all fall out to AWS’s responsibility, and we can focus only on operating the cluster’s stateless workloads.

ItemK8s self-hosting (StatefulSet)Managed RDS
Data reliabilityOperate PV backup · recovery yourselfAutomatic backup · PITR · Multi-AZ
Major version upgradeDowntime · migration yourselfAutomatic via the RDS console or Terraform
HAStatefulSet + external operational toolingOne line for the Multi-AZ option
CostNode resources + operator headcount timeRDS instance + storage
Operational burdenLargeSmall

This book’s pattern is the separation of keeping only stateless workloads on Kubernetes and stateful systems on managed services. It is an EKS best practice and the starting point of nearly every production cluster.

RDS — standing up PostgreSQL with Terraform #

Terraform module #

terraform/modules/myshop-rds/main.tf
module "rds" {
  source  = "terraform-aws-modules/rds/aws"
  version = "~> 6.0"

  identifier = "myshop-${var.env}"

  engine               = "postgres"
  engine_version       = "16.3"
  family               = "postgres16"
  major_engine_version = "16"
  instance_class       = var.env == "prod" ? "db.m6g.large" : "db.t4g.medium"

  allocated_storage     = 50
  max_allocated_storage = 500
  storage_type          = "gp3"
  storage_encrypted     = true

  db_name  = "myshop"
  username = "myshop_admin"
  port     = 5432

  manage_master_user_password   = true
  master_user_secret_kms_key_id = aws_kms_key.rds.arn

  multi_az               = var.env == "prod"
  db_subnet_group_name   = var.db_subnet_group_name
  vpc_security_group_ids = [aws_security_group.rds.id]

  backup_retention_period = var.env == "prod" ? 30 : 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "Mon:04:00-Mon:05:00"

  performance_insights_enabled = true
  monitoring_interval          = 60
  monitoring_role_arn          = aws_iam_role.rds_monitoring.arn

  enabled_cloudwatch_logs_exports = ["postgresql"]

  deletion_protection = var.env == "prod"
  skip_final_snapshot = var.env != "prod"
}

The production standards are gathered inside one manifest. We point out the key options.

  • manage_master_user_password = true — RDS creates the master password itself and stores it in Secrets Manager. It’s the pattern that results in a state where no human has ever seen the password.
  • multi_az — prod is Multi-AZ for failover capability, dev is single AZ to save cost. It binds with the VPC multi-AZ decision of Chapter 21.
  • storage_encrypted — KMS encryption. A production standard.
  • performance_insights_enabled — PostgreSQL query performance analysis. Almost no impact on RDS’s own cost.
  • enabled_cloudwatch_logs_exports — exports PostgreSQL’s slow query / error logs to CloudWatch. It becomes an alert source for Chapter 25 Monitoring · alerts.
  • deletion_protection — always turn it on for prod. It’s the last line of protection against a terraform destroy accident.

Security group — only EKS nodes have access #

terraform/modules/myshop-rds/sg.tf
resource "aws_security_group" "rds" {
  name_prefix = "myshop-${var.env}-rds-"
  vpc_id      = var.vpc_id
}

resource "aws_security_group_rule" "rds_from_eks" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.rds.id
  source_security_group_id = var.eks_node_security_group_id
  description              = "Allow from EKS worker nodes"
}

Port 5432 is accepted only from the EKS nodes’ security group. Nowhere else can access RDS directly — a production standard. When a human needs to temporarily go into RDS directly to look at SQL, it’s done through a bastion or SSM Session Manager, which we’ll cover in Chapter 26 Operations checklist.

The master password — keep it in Secrets Manager #

If you turn on manage_master_user_password = true, RDS automatically creates the password and stores it in Secrets Manager in the following format.

The shape of the RDS secret stored in Secrets Manager
{
  "username": "myshop_admin",
  "password": "<RDS-generated random>",
  "engine": "postgres",
  "host": "myshop-prod.abcdef.ap-northeast-2.rds.amazonaws.com",
  "port": 5432,
  "dbname": "myshop"
}

How a Pod inside K8s reads this secret is the next step. The path of writing a static password directly into a K8s Secret overlaps exactly with the traps pointed out in Chapter 6 ConfigMap · Secret §“The limits of secret management” — plaintext goes into a manifest, a secret encoded only as base64 goes into git, or you have to update the manifest yourself on password rotation. The tool that solves all three traps at once is the External Secrets Operator.

External Secrets Operator — syncing K8s Secrets with cloud secrets #

This is the stage where the Operator pattern of Chapter 18 CRD and Operator leads into a full production tool. It’s a controller that auto-syncs secrets in AWS Secrets Manager (or Parameter Store, Vault, GCP Secret Manager, etc.) into K8s Secrets.

Install #

Install with Helm
helm repo add external-secrets https://charts.external-secrets.io
helm install external-secrets external-secrets/external-secrets \
  -n external-secrets --create-namespace \
  --set installCRDs=true

After installation, two new CRDs are registered in the cluster — ClusterSecretStore and ExternalSecret. It’s the CRD pattern covered in Chapter 18. When you declare the source and mapping of a secret in a manifest, the Operator looks at that declaration and creates and updates the actual K8s Secret object.

Secrets Manager access permission via IRSA #

We attach, via IRSA, an IAM Role granting read permission on Secrets Manager to the External Secrets Operator’s ServiceAccount. The model from Chapter 16 RBAC / ServiceAccount in depth applies directly.

terraform/modules/external-secrets/iam.tf
data "aws_iam_policy_document" "secrets_read" {
  statement {
    actions = [
      "secretsmanager:GetSecretValue",
      "secretsmanager:DescribeSecret",
    ]
    resources = [
      "arn:aws:secretsmanager:${var.region}:${var.account_id}:secret:rds!cluster-myshop-${var.env}/*",
      "arn:aws:secretsmanager:${var.region}:${var.account_id}:secret:myshop/${var.env}/*",
    ]
  }
}

resource "aws_iam_policy" "secrets_read" {
  name   = "myshop-${var.env}-external-secrets-read"
  policy = data.aws_iam_policy_document.secrets_read.json
}

module "external_secrets_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.0"

  role_name = "myshop-${var.env}-external-secrets"

  oidc_providers = {
    main = {
      provider_arn               = var.oidc_provider_arn
      namespace_service_accounts = ["external-secrets:external-secrets"]
    }
  }

  role_policy_arns = {
    main = aws_iam_policy.secrets_read.arn
  }
}

The ARN pattern in resources is the key — it grants access only to myshop’s secrets and blocks reading other teams’ secrets. It’s the least-privilege principle, and the second axis of the security model covered together with the trust policy in Chapter 16.

ClusterSecretStore — defining the secret source #

cluster-secret-store.yaml
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: ap-northeast-2
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets
            namespace: external-secrets

This object declares at the cluster level that “secrets are fetched from AWS Secrets Manager.” auth.jwt.serviceAccountRef is that ServiceAccount with IRSA attached, and the External Secrets Operator calls STS’s AssumeRoleWithWebIdentity with that ServiceAccount’s projected token to receive Secrets Manager permission. It’s the point where the flow of Chapter 16 §“How IRSA works” leads into full secret synchronization.

ExternalSecret — fetching a secret via manifest #

myshop-api/templates/externalsecret.yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: myshop-api-db
  namespace: myshop
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: myshop-api-db
    creationPolicy: Owner
    template:
      data:
        DATABASE_URL: "postgresql://{{ .username }}:{{ .password }}@{{ .host }}:{{ .port }}/{{ .dbname }}?sslmode=require"
  data:
    - secretKey: username
      remoteRef:
        key: rds!cluster-myshop-prod
        property: username
    - secretKey: password
      remoteRef:
        key: rds!cluster-myshop-prod
        property: password
    - secretKey: host
      remoteRef:
        key: rds!cluster-myshop-prod
        property: host
    - secretKey: port
      remoteRef:
        key: rds!cluster-myshop-prod
        property: port
    - secretKey: dbname
      remoteRef:
        key: rds!cluster-myshop-prod
        property: dbname

Organizing what happens with this one manifest gives the following.

  1. The External Secrets Operator fetches the rds!cluster-myshop-prod secret from Secrets Manager every hour.
  2. It pulls the 5 fields of that secret (username, password, host, port, dbname).
  3. It assembles those values into the Connection String format with template.data.DATABASE_URL.
  4. It creates a K8s Secret named myshop-api-db and stores it inside with a single key, DATABASE_URL.

The myshop-api Pod is done once it receives this Secret injected as environment variables via envFrom.

deployment.yaml — Secret injection
envFrom:
  - configMapRef:
      name: myshop-api
  - secretRef:
      name: myshop-api-db   # auto-synced by External Secrets

The DATABASE_URL that was left as a placeholder in Chapter 22 gets filled with the real value at this point. When RDS’s password is rotated, within an hour the K8s Secret is also auto-updated with the new value.

The need to restart Pods #

Even when a K8s Secret is updated, the environment variables inside the Pod are not updated automatically. Values injected via envFrom are fixed only at Pod start time. After a password rotation you have to restart the Pod for the new password to take effect.

Force-restart Pods after the Secret is updated
kubectl rollout restart deployment/myshop-api -n myshop

External Secrets has an integration with a separate component called Reloader, so it can automatically trigger a rollout restart when a Secret changes. It frequently comes together as part of the standard setup of a production cluster. It’s the shape where the rolling update of Chapter 4 Deployment / ReplicaSet is reused for secret updates too.

Connection pool — why, and PgBouncer #

If 5 myshop-api Pods are up and the application inside each Pod holds 50 PostgreSQL connections in its own pool, the cluster as a whole occupies 250 RDS connections. Each RDS instance class has a max_connections limit — db.t4g.medium defaults to about 100, and even db.m6g.large defaults to about 800. In an environment where Pods scale up with the HPA of Chapter 13 Autoscaling, this limit is threatened quickly.

A connection pooler comes into this empty space. The most standard one is PgBouncer.

The role of PgBouncer #

Composition
[5 myshop-api Pods] --> [2 PgBouncer] --> [RDS PostgreSQL]
   50 conn each                            20 backend conn

myshop-api connects to PgBouncer, and PgBouncer multiplexes those connections onto a small number of backend connections. In transaction pooling mode, one PostgreSQL connection handles many clients’ short transactions in turn, so utilization efficiency is very high.

Manifest #

pgbouncer-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgbouncer
  namespace: myshop
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: pgbouncer
  template:
    metadata:
      labels:
        app.kubernetes.io/name: pgbouncer
    spec:
      containers:
        - name: pgbouncer
          image: edoburu/pgbouncer:1.22.1
          ports:
            - containerPort: 6432
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: myshop-api-db
                  key: DATABASE_URL
            - name: POOL_MODE
              value: transaction
            - name: MAX_CLIENT_CONN
              value: "1000"
            - name: DEFAULT_POOL_SIZE
              value: "20"
            - name: SERVER_RESET_QUERY
              value: "DISCARD ALL"
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 200m
              memory: 128Mi
---
apiVersion: v1
kind: Service
metadata:
  name: pgbouncer
  namespace: myshop
spec:
  selector:
    app.kubernetes.io/name: pgbouncer
  ports:
    - port: 5432
      targetPort: 6432

myshop-api’s DATABASE_URL now points not at RDS directly but at pgbouncer.myshop.svc.cluster.local:5432, overridden per environment. Thanks to the ClusterIP + DNS model of Chapter 5 Service, even if RDS’s actual host changes, myshop-api’s environment variables stay the same.

values-prod.yaml — DATABASE_URL to PgBouncer
config:
  DATABASE_URL: "postgresql://myshop_admin:$(DB_PASSWORD)@pgbouncer.myshop.svc.cluster.local:5432/myshop?sslmode=disable"

The trap of transaction pooling #

In PgBouncer’s transaction pooling mode, you cannot safely use PostgreSQL’s prepared statements, advisory locks, or session variables. When one transaction ends, the backend connection moves to another client, so session-level state isn’t maintained. If your ORM automatically uses prepared statements (some SQLAlchemy settings, etc.), you have to turn the option off or switch to session pooling mode.

As an alternative, RDS Proxy is a managed option that does the same role. AWS operates it for you and its integration with IAM authentication is deep, but it adds cost and the transaction pooling trap is the same. We revisit the cost tradeoff of PgBouncer vs RDS Proxy in Chapter 28 Cost optimization.

Schema migration — the Job pattern #

Moving a database schema to a new version is handled in K8s with a Job.

Job manifest #

migration-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: myshop-api-migrate-1.4.2
  namespace: myshop
spec:
  backoffLimit: 3
  ttlSecondsAfterFinished: 86400
  template:
    spec:
      serviceAccountName: myshop-api
      restartPolicy: OnFailure
      containers:
        - name: migrate
          image: 123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/myshop-api:1.4.2
          command: ["alembic", "upgrade", "head"]
          envFrom:
            - secretRef:
                name: myshop-api-db

The Job pattern of Chapter 8 StatefulSet · DaemonSet · Job leads into a production cluster’s migration. It uses the same image as myshop-api and only changes the command to a migration tool (alembic, flyway, golang-migrate, etc.). ttlSecondsAfterFinished: 86400 auto-deletes the Job object after 24 hours — a standard option that keeps finished Jobs from piling up in etcd.

Coupling with deployment — Helm hook #

A new version’s Pods should come up only after the migration finishes. The pattern that forces the order is the Helm hook.

migration-job.yaml — Helm hook annotation
metadata:
  annotations:
    "helm.sh/hook": pre-upgrade,pre-install
    "helm.sh/hook-weight": "0"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded

A Job with this annotation attached runs before the new Deployment on helm upgrade, and only on success does it move to the next stage. A migration failure leads naturally to a Pod deployment failure, preventing the accident of new code coming up on top of a broken schema. How this Helm hook corresponds to a PreSync hook in the ArgoCD Sync flow of Chapter 24 CI / CD follows naturally.

Comparison with initContainer #

There’s also a pattern of putting the migration into the Pod’s initContainer. But if there are 5 myshop-api Pods, the migration is attempted 5 times. Some migration tools (alembic, flyway) prevent duplicates with advisory locks, but from a K8s point of view, migration once, as a Job is cleaner. It’s a pattern that keeps responsibility in one place, and in Chapter 27 kubectl debugging patterns too the operational benefit shows up when the diagnostic path for a migration incident narrows to the log of a single Job.

IAM authentication — the path of eliminating the password itself #

The most advanced pattern is RDS’s IAM authentication. It eliminates the password entirely and accesses RDS directly with the IRSA token.

Python code — connecting to RDS with an IAM token
import boto3
import psycopg2

rds_client = boto3.client('rds')
token = rds_client.generate_db_auth_token(
    DBHostname='myshop-prod.abcdef.ap-northeast-2.rds.amazonaws.com',
    Port=5432,
    DBUsername='myshop_app',
    Region='ap-northeast-2',
)

conn = psycopg2.connect(
    host='myshop-prod.abcdef.ap-northeast-2.rds.amazonaws.com',
    port=5432,
    user='myshop_app',
    password=token,  # an IAM token, not a password
    dbname='myshop',
    sslmode='require',
)

generate_db_auth_token creates a 15-minute token with IAM credentials. That token goes in the position of the PostgreSQL password. There’s no need to worry about password rotation, and every access is recorded in CloudTrail.

There are downsides too, though.

  • You have to get a new token every 15 minutes, so coupling with a connection pool is tricky.
  • On the PostgreSQL side, you need a user for IAM authentication (the rds_iam group) and grant management.
  • It’s very hard to use together with PgBouncer transaction pooling.

The traditional password + Secrets Manager + External Secrets model is the goal with the best balance of operational burden and security, and IAM authentication is the option you consider adding in environments with even stricter security. In this book we set the former as the standard path and present IAM authentication as an option only.

Checks after the first connection #

These are the commands to check at the point when both the migration Job and the myshop-api deployment have finished.

Was the Secret created?
kubectl get secret myshop-api-db -n myshop -o jsonpath='{.data.DATABASE_URL}' | base64 -d
Test the DB connection from inside the Pod
kubectl exec -it deployment/myshop-api -n myshop -- \
  psql "$DATABASE_URL" -c "SELECT version();"
PgBouncer stats
kubectl exec -it deployment/pgbouncer -n myshop -- \
  psql -p 6432 pgbouncer -c "SHOW POOLS;"
External Secrets sync status
kubectl get externalsecret myshop-api-db -n myshop
kubectl describe externalsecret myshop-api-db -n myshop

These four steps confirm whether secret synchronization · DB connection · connection pool are all working normally. When PostgreSQL’s version() responds, it’s the moment myshop-api has arrived at a state where it can receive real data. If DATABASE_URL is empty, the ExternalSecret’s synchronization has failed, and the Events of describe show the cause — a typo in the IRSA Role’s ARN, a mismatch in the Secrets Manager secret key name, and a mismatch in the SA namespace of the OIDC trust policy are the three most common.

Exercises #

  1. Apply this chapter’s RDS Terraform module to stand up one PostgreSQL instance in the dev environment. Check the name and format of the Secrets Manager secret created by manage_master_user_password = true, match that secret’s ARN to the resources pattern of the External Secrets IRSA Role, and then apply the ExternalSecret manifest. Measure the time until the K8s Secret is automatically created, and organize the Events of kubectl describe externalsecret in one paragraph.
  2. Switch PgBouncer’s POOL_MODE between the two values transaction and session and compare how myshop-api behaves. Check whether your ORM (SQLAlchemy / Prisma / GORM, etc.) automatically uses prepared statements, and if there’s a case that breaks under transaction pooling, reproduce that trap in your own code. Note which metric you’d use in Chapter 25 Monitoring · alerts to detect the limit you found.
  3. After setting up the migration Job as a Helm hook, deliberately make a failing migration once (e.g., ALTER a column that doesn’t exist). Organize in one paragraph how helm upgrade stops, what state the new version’s Pods are left in, and how this failure is visualized in the ArgoCD Sync flow of Chapter 24 CI / CD.

In one line: the standard for a production workload’s DB integration is the flow where the six tools managed RDS + Secrets Manager + External Secrets + IRSA + PgBouncer + a Helm hook migration Job work together. RDS’s master password is stored in Secrets Manager via manage_master_user_password without passing through human eyes, the External Secrets Operator auto-syncs that secret into a K8s Secret, IRSA grants permissions without static credentials, and PgBouncer prevents the connection pool limit threat. IAM authentication is an option for even stricter environments.

Next chapter #

At this point myshop-api is a complete service with an external entry point · internal workload · DB connection, but the path by which a new version comes in is bound to a human’s hand (helm upgrade). In the next chapter we automate that empty space.

In Chapter 24 CI / CD pipeline we cover the GitOps pipeline where GitHub Actions builds the container and pushes it to ECR, and ArgoCD detects the manifest change in git and auto-syncs it to the cluster. It’s the stage where the model of Chapter 20 GitOps leads into a full EKS pipeline.

X