AWS Intermediate #4: RDS — managed DB, backups, parameter groups

Tuesday, April 21, 2026

If #3 S3 was the object layer, now we move to the relational DB layer. AWS’s managed relational DB service is RDS (Relational Database Service). From a single console you can launch and operate PostgreSQL / MySQL / MariaDB / Oracle / SQL Server / Aurora.

In this post we line up the RDS managed model → automated backups and PITR → Multi-AZ and Read Replica → parameter / option groups → upgrades.

DB on EC2 vs RDS #

Everyone hesitates when first moving to the cloud. “Should I spin up an EC2 and install PostgreSQL myself, or go with RDS?”

Item	DB on EC2	RDS
Install / setup	DIY	Console click
Patches / minor upgrades	DIY	Click (or auto)
Backup	DIY (`pg_dump`, cron)	Auto + PITR
Multi-AZ failover	DIY (Patroni, etc.)	Toggle option
Read Replica	DIY (replication setup)	Console click
Monitoring	DIY (`pg_stat_*`)	CloudWatch + Performance Insights
Cost	Instance only	Instance + license + managed premium
Freedom	Full control (OS, extensions, kernel)	Limited (e.g., no real superuser)

For production, RDS is the answer 99% of the time. DB-on-EC2 is for special cases — when an extension isn’t supported on RDS, or you need OS-level tuning.

Engine choice #

Engines RDS supports:

RDS engines

PostgreSQL  ── First pick for new projects. JSONB / rich extensions
MySQL       ── Most common choice. Compatibility-driven
MariaDB     ── MySQL fork. Almost identical to MySQL
Oracle      ── Enterprise with expensive license
SQL Server  ── Microsoft ecosystem
Aurora      ── AWS's own engine. PostgreSQL / MySQL compatible

Where Aurora sits #

Aurora is AWS’s cloud-native DB. Wire-compatible with PostgreSQL / MySQL, so you can move with almost no code changes.

Item	Aurora	RDS PostgreSQL/MySQL
Storage	Distributed (auto 6 copies)	EBS
Max size	128 TB auto-scale	64 TB
Read Replica	Up to 15 (shared storage, lag typically < 100 ms)	5 (async)
Failover time	< 30 sec	1–2 min
Cost	~20% more than RDS	Standard
New features	Serverless v2, Global Database	RDS basics

If scale / availability matter most, Aurora. If cost / simplicity matter, RDS PostgreSQL.

Aurora Serverless v2 is Aurora with usage-based auto-scaling: capacity grows and shrinks in fine-grained ACU (Aurora Capacity Unit) steps, which suits workloads with uneven traffic. It largely fixes the cold-start weakness of v1.
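A minimal sketch of what a Serverless v2 setup looks like from the CLI, assuming a configured AWS CLI. The cluster name, security group, ACU range (0.5 to 8), and reuse of `my-db-subnet-group` from the DB Subnet Group section are illustrative placeholders; the engine version is left at the default.

```shell
# Sketch: Aurora PostgreSQL cluster with Serverless v2 capacity.
# Assumes a configured AWS CLI; all identifiers are placeholders.
create_aurora_serverless_v2() {
  local cluster="$1" sg="$2" min_acu="${3:-0.5}" max_acu="${4:-8}"
  # The capacity range lives on the cluster, in ACUs
  aws rds create-db-cluster \
    --db-cluster-identifier "$cluster" \
    --engine aurora-postgresql \
    --master-username postgres \
    --manage-master-user-password \
    --db-subnet-group-name my-db-subnet-group \
    --vpc-security-group-ids "$sg" \
    --serverless-v2-scaling-configuration "MinCapacity=${min_acu},MaxCapacity=${max_acu}" \
    >/dev/null || return 1
  # A cluster needs at least one instance; db.serverless makes it Serverless v2
  aws rds create-db-instance \
    --db-instance-identifier "${cluster}-writer" \
    --db-cluster-identifier "$cluster" \
    --engine aurora-postgresql \
    --db-instance-class db.serverless
}
# Usage: create_aurora_serverless_v2 my-aurora sg-0abc...
```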

Launching an RDS instance #

Create RDS PostgreSQL

aws rds create-db-instance \
  --db-instance-identifier my-postgres \
  --db-instance-class db.t3.micro \
  --engine postgres \
  --engine-version 16.4 \
  --master-username postgres \
  --master-user-password "very-strong-password" \
  --allocated-storage 20 \
  --storage-type gp3 \
  --vpc-security-group-ids sg-0abc... \
  --db-subnet-group-name my-db-subnet-group \
  --backup-retention-period 7 \
  --multi-az \
  --no-publicly-accessible

Common options:

Option	Description
`db-instance-class`	Instance type. `db.t3` (small), `db.m5` (general), `db.r5` (memory)
`engine` / `engine-version`	Engine and version
`allocated-storage` / `storage-type`	Disk size in GiB and volume type (`gp3` is the usual general-purpose choice)
`multi-az`	Standby auto-placed in another AZ
`publicly-accessible`	Public IP. `false` in production
`backup-retention-period`	Auto-backup retention days (0–35)
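`create-db-instance` returns immediately while provisioning continues for several minutes. A small sketch, assuming a configured AWS CLI, that blocks on the built-in `db-instance-available` waiter and then prints the endpoint hostname your app connects to:

```shell
# Sketch: wait until the new instance is ready, then print its endpoint.
# Assumes a configured AWS CLI.
rds_endpoint() {
  local id="$1"
  # Built-in waiter: polls until the instance status is "available"
  aws rds wait db-instance-available --db-instance-identifier "$id" || return 1
  # Hostname for the app's connection string (PostgreSQL port is usually 5432)
  aws rds describe-db-instances \
    --db-instance-identifier "$id" \
    --query 'DBInstances[0].Endpoint.Address' \
    --output text
}
# Usage: DB_HOST="$(rds_endpoint my-postgres)"
```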

DB Subnet Group #

RDS needs to know up front which subnets it may place the DB in, including the Multi-AZ standby. That’s the DB Subnet Group: usually two or more private subnets, and they must cover at least two AZs.

Create a DB Subnet Group

aws rds create-db-subnet-group \
  --db-subnet-group-name my-db-subnet-group \
  --db-subnet-group-description "DB private subnets" \
  --subnet-ids subnet-0a1... subnet-0b2... subnet-0c3...
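Because the group must span at least two AZs, it can help to check the subnets before creating it. A sketch assuming a configured AWS CLI; the function name is hypothetical:

```shell
# Sketch: count the distinct AZs covered by a list of subnet IDs.
# Assumes a configured AWS CLI.
count_subnet_azs() {
  # Text output is tab-separated; split to one AZ per line,
  # drop empty lines, de-duplicate, and count
  aws ec2 describe-subnets \
    --subnet-ids "$@" \
    --query 'Subnets[].AvailabilityZone' \
    --output text |
    tr -s '[:space:]' '\n' |
    grep -v '^$' |
    sort -u |
    wc -l |
    tr -d ' '
}
# Usage: [ "$(count_subnet_azs subnet-0a1... subnet-0b2...)" -ge 2 ] || echo "need 2+ AZs"
```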

The DB sits in a private subnet (#1 VPC) and is never directly exposed to the internet. Inbound traffic is allowed only from the app servers’ security group, via an SG-to-SG rule.
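The SG-to-SG rule itself might look like this (a sketch; the SG IDs are placeholders and port 5432 assumes PostgreSQL):

```shell
# Sketch: allow the DB port into the DB security group only from the
# app servers' security group. Assumes a configured AWS CLI.
allow_app_to_db() {
  local db_sg="$1" app_sg="$2" port="${3:-5432}"
  # --source-group references another SG instead of a CIDR range
  aws ec2 authorize-security-group-ingress \
    --group-id "$db_sg" \
    --protocol tcp \
    --port "$port" \
    --source-group "$app_sg"
}
# Usage: allow_app_to_db sg-0db... sg-0app...
```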

Automated backups — the core value of managed #

The real value of RDS lives in backups.

Automated Backup #

If backup-retention-period > 0, automated backups are on.

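Daily storage snapshot during the backup window (the first is full, later ones incremental)
Transaction logs uploaded every 5 minutes
Kept for the retention period (1–35 days)
Deleted along with the DB by default (keep them with `--no-delete-automated-backups`; SkipFinalSnapshot=false creates a separate final manual snapshot)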
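Retention and the backup window can be set on an existing instance. A sketch assuming a configured AWS CLI; the 7-day default and the 18:00-18:30 UTC window (03:00 in Seoul) are examples. Per the AWS docs, switching retention between 0 and a non-zero value causes a brief outage, so do that in a quiet period.

```shell
# Sketch: set automated-backup retention and pin the backup window.
# Window format is hh24:mi-hh24:mi in UTC, at least 30 minutes long.
# Assumes a configured AWS CLI.
set_backup_policy() {
  local id="$1" days="${2:-7}" window="${3:-18:00-18:30}"
  # 0 would disable automated backups (and PITR), so require 1-35 here
  if [ "$days" -lt 1 ] || [ "$days" -gt 35 ]; then
    echo "retention must be 1-35 days, got $days" >&2
    return 1
  fi
  aws rds modify-db-instance \
    --db-instance-identifier "$id" \
    --backup-retention-period "$days" \
    --preferred-backup-window "$window" \
    --apply-immediately
}
# Usage: set_backup_policy my-postgres 14
```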

Point-in-Time Recovery (PITR) #

With automated backups on, RDS can restore to any second within the retention window. Because transaction logs are uploaded every 5 minutes, the latest restorable time usually trails the present by about 5 minutes.

Restore to a point 3 hours ago

aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier my-postgres \
  --target-db-instance-identifier my-postgres-restored \
  --restore-time 2026-04-21T08:30:00Z

Restore creates a new instance — the original stays intact. “I need exactly the state at 03:27 this morning” becomes entirely doable.
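Typing the ISO timestamp by hand is error-prone. A sketch that computes "N hours ago" in UTC and feeds it to the same restore command; it assumes GNU `date` (on macOS, `gdate` from coreutils) and a configured AWS CLI, and the helper names are hypothetical. If you simply want the newest possible state, `--use-latest-restorable-time` can replace `--restore-time`.

```shell
# Sketch: PITR to "N hours ago" without hand-typing the timestamp.
# Assumes GNU date and a configured AWS CLI.

# Prints an ISO 8601 UTC timestamp N hours before now.
# The optional second argument overrides "now" (epoch seconds).
restore_time_hours_ago() {
  local hours="$1" now="${2:-$(date -u +%s)}"
  date -u -d "@$(( now - hours * 3600 ))" +%Y-%m-%dT%H:%M:%SZ
}

# Restores SRC into a new instance TARGET at that point in time.
restore_hours_ago() {
  local src="$1" target="$2" hours="$3"
  aws rds restore-db-instance-to-point-in-time \
    --source-db-instance-identifier "$src" \
    --target-db-instance-identifier "$target" \
    --restore-time "$(restore_time_hours_ago "$hours")"
}
# Usage: restore_hours_ago my-postgres my-postgres-restored 3
```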

Manual Snapshot #

Backups you take explicitly, separate from the automated ones. They survive DB deletion and have no retention limit.

Manual snapshot

aws rds create-db-snapshot \
  --db-instance-identifier my-postgres \
  --db-snapshot-identifier my-postgres-2026-04-21-prerelease

Operational uses:

Snapshot right before a major upgrade
Snapshot right before a big migration
Final snapshot when deleting
Copy across regions / accounts (for DR)
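For the pre-upgrade and pre-migration cases, a snapshot is only useful once it is `available`. A sketch assuming a configured AWS CLI; the date-stamped naming scheme is just an example:

```shell
# Sketch: take a date-stamped manual snapshot and wait until it is usable.
# Assumes a configured AWS CLI. Prints the snapshot identifier on success.
snapshot_and_wait() {
  local id="$1" tag="${2:-manual}"
  local snap="${id}-$(date -u +%Y-%m-%d)-${tag}"
  aws rds create-db-snapshot \
    --db-instance-identifier "$id" \
    --db-snapshot-identifier "$snap" >/dev/null || return 1
  # Waiter polls until the snapshot status is "available"
  aws rds wait db-snapshot-available --db-snapshot-identifier "$snap" || return 1
  echo "$snap"
}
# Usage: snapshot_and_wait my-postgres prerelease
```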

Multi-AZ — high availability #

With --multi-az, RDS auto-replicates a standby into another AZ.

Shape of Multi-AZ

   ┌──────────────────────────────────┐
   │           VPC                    │
   │                                  │
   │    AZ a              AZ b        │
   │  ┌─────────┐      ┌─────────┐    │
   │  │ Primary │◀════▶│ Standby │    │
   │  └─────────┘ sync └─────────┘    │
   │       ▲      repl                │
   │       │ DNS endpoint             │
   │       │ (auto failover)          │
   └───────┼──────────────────────────┘
           │
       App servers

Synchronous replication — standby has every committed transaction
Auto failover on outage — within 30 sec to 2 min, standby becomes primary and the DNS endpoint repoints
Reads not load-balanced — standby is not used for reads (different from Aurora)
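Failover is worth rehearsing before you need it. A sketch for a staging Multi-AZ instance, assuming a configured AWS CLI: reboot with `--force-failover`, wait, then print the AZ the primary now runs in (it should have moved). Expect connections to drop briefly.

```shell
# Sketch: force a Multi-AZ failover and report the primary's new AZ.
# Only valid on Multi-AZ instances. Assumes a configured AWS CLI.
rehearse_failover() {
  local id="$1"
  aws rds reboot-db-instance \
    --db-instance-identifier "$id" \
    --force-failover >/dev/null || return 1
  aws rds wait db-instance-available --db-instance-identifier "$id" || return 1
  # AvailabilityZone is the primary's AZ; SecondaryAvailabilityZone is the standby's
  aws rds describe-db-instances \
    --db-instance-identifier "$id" \
    --query 'DBInstances[0].AvailabilityZone' \
    --output text
}
# Usage: rehearse_failover my-postgres-staging
```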

Cost of Multi-AZ #

The cost of duplication is 2x instance / storage cost. Single AZ for learning / side projects, Multi-AZ for production.

Multi-AZ Cluster (option) #

The newer Multi-AZ DB Cluster for PostgreSQL / MySQL has readable standbys and failover under 35 seconds. But uses 3 AZs (3-instance cost).

Read Replica — read distribution #

A Read Replica is an asynchronously replicated read-only copy. Distributes read load on read-heavy workloads.

Create a Read Replica

aws rds create-db-instance-read-replica \
  --db-instance-identifier my-postgres-read-1 \
  --source-db-instance-identifier my-postgres \
  --availability-zone ap-northeast-2c

Properties:

Async replication — slight lag (usually ms–seconds)
Cross-region possible — global read distribution / DR
Up to 5 (Aurora has 15)
Can be promoted to a standalone instance
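Lag is the number to watch on a replica. A sketch that reads the CloudWatch `ReplicaLag` metric (seconds) for the last 10 minutes and compares the peak to a threshold; it assumes a configured AWS CLI and GNU `date`, and the 5-second default is arbitrary.

```shell
# Sketch: check a replica's recent peak lag against a threshold in seconds.
# Returns 0 if lag <= max, 1 if above, 2 if there are no datapoints.
# Assumes a configured AWS CLI and GNU date.
replica_lag_ok() {
  local id="$1" max="${2:-5}" lag
  lag="$(aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name ReplicaLag \
    --dimensions Name=DBInstanceIdentifier,Value="$id" \
    --start-time "$(date -u -d '10 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 60 \
    --statistics Maximum \
    --query 'max(Datapoints[].Maximum)' \
    --output text)" || return 2
  if [ -z "$lag" ] || [ "$lag" = "None" ]; then
    echo "no ReplicaLag datapoints for $id" >&2
    return 2
  fi
  echo "$id replica lag: ${lag}s"
  # awk handles the floating-point comparison that [ ] cannot
  awk -v lag="$lag" -v max="$max" 'BEGIN { exit !(lag <= max) }'
}
# Usage: replica_lag_ok my-postgres-read-1 5
```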

Where Read Replica fits #

Use	Fit
Read traffic distribution	⭐⭐⭐
Analytics / reporting	⭐⭐⭐
Backup / DR	⭐⭐ (snapshots are safer)
Auto failover	❌ — Read Replicas don’t auto-promote

If read traffic isn’t huge, Multi-AZ Cluster is simpler than Read Replica.

Parameter group and option group #

DB engine settings (max_connections, shared_buffers, etc.) are managed in RDS via parameter groups.

Parameter Group #

Create a custom parameter group

aws rds create-db-parameter-group \
  --db-parameter-group-name my-postgres-16-params \
  --db-parameter-group-family postgres16 \
  --description "Custom params for my workload"

aws rds modify-db-parameter-group \
  --db-parameter-group-name my-postgres-16-params \
  --parameters \
    "ParameterName=max_connections,ParameterValue=200,ApplyMethod=pending-reboot" \
    "ParameterName=log_statement,ParameterValue=ddl,ApplyMethod=immediate"

Types:

Static — applies after DB reboot (max_connections, …)
Dynamic — applies immediately (log_statement, …)
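Per the AWS docs, a parameter group newly associated with an instance only fully takes effect after a reboot. A sketch, assuming a configured AWS CLI, that attaches the group, polls `ParameterApplyStatus`, and reboots only when it reads `pending-reboot`. `POLL_SECONDS` and the 30-try limit are arbitrary choices.

```shell
# Sketch: attach a custom parameter group; reboot only if RDS says so.
# Assumes a configured AWS CLI. POLL_SECONDS sets the poll interval.
apply_param_group() {
  local id="$1" group="$2" status="" i
  aws rds modify-db-instance \
    --db-instance-identifier "$id" \
    --db-parameter-group-name "$group" \
    --apply-immediately >/dev/null || return 1
  # Status for this group: "applying", then "pending-reboot" or "in-sync"
  for ((i = 0; i < 30; i++)); do
    status="$(aws rds describe-db-instances \
      --db-instance-identifier "$id" \
      --query "DBInstances[0].DBParameterGroups[?DBParameterGroupName=='${group}'].ParameterApplyStatus | [0]" \
      --output text)" || return 1
    case "$status" in
      applying|None) sleep "${POLL_SECONDS:-10}" ;;
      *) break ;;
    esac
  done
  if [ "$status" = "pending-reboot" ]; then
    aws rds reboot-db-instance --db-instance-identifier "$id" >/dev/null || return 1
    echo "rebooted $id"
  else
    echo "$id: $status"
  fi
}
# Usage: apply_param_group my-postgres my-postgres-16-params
```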

Common parameters:

Parameter	PostgreSQL	MySQL
Max connections	`max_connections`	`max_connections`
Query logging	`log_min_duration_statement`	`slow_query_log`
Memory	`shared_buffers`, `work_mem`	`innodb_buffer_pool_size`
Timezone	`timezone`	`time_zone`

Option Group #

The group for enabling engine-specific extras (e.g., SSIS for SQL Server, OEM for Oracle). Hardly used for PostgreSQL / MySQL.

Upgrades — operational work #

RDS treats engine version upgrades in two tiers.

Minor Upgrade — safe #

Like 16.3 → 16.4. Usually security patches + small improvements. With auto minor version upgrade on, RDS applies them during the weekly maintenance window.

Enable auto minor upgrades

aws rds modify-db-instance \
  --db-instance-identifier my-postgres \
  --auto-minor-version-upgrade \
  --apply-immediately

Expect roughly 30 sec to 5 min of downtime. Multi-AZ does not shorten it here: for a Multi-AZ DB instance, RDS upgrades the primary and the standby at the same time.
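Since auto minor upgrades land in the maintenance window, it is worth pinning it. A sketch assuming a configured AWS CLI; the window is in UTC (`sun:19:00` UTC is 04:00 Monday in Seoul) and must not overlap the backup window. The regex only checks the format, not the 30-minute minimum.

```shell
# Sketch: set the weekly maintenance window (ddd:hh24:mi-ddd:hh24:mi, UTC).
# Assumes a configured AWS CLI.
set_maintenance_window() {
  local id="$1" window="${2:-sun:19:00-sun:19:30}"
  local re='^(mon|tue|wed|thu|fri|sat|sun):[0-2][0-9]:[0-5][0-9]-(mon|tue|wed|thu|fri|sat|sun):[0-2][0-9]:[0-5][0-9]$'
  if ! [[ "$window" =~ $re ]]; then
    echo "bad window format: $window" >&2
    return 1
  fi
  aws rds modify-db-instance \
    --db-instance-identifier "$id" \
    --preferred-maintenance-window "$window"
}
# Usage: set_maintenance_window my-postgres sun:19:00-sun:19:30
```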

Major Upgrade — careful #

Like PostgreSQL 16 → 17. Things can break. Procedure:

Take a manual snapshot (for rollback)
Try the same version migration in a test environment
Upgrade Read Replicas first (when possible)
Schedule downtime outside business hours
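Run aws rds modify-db-instance --db-instance-identifier my-postgres --engine-version 17.0 --allow-major-version-upgrade --apply-immediately (also pass --db-parameter-group-name with a postgres17-family group if you use a custom one)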
Monitor the upgrade
On issues, restore a new instance from the snapshot

Before a major upgrade, audit compatibility issues like PostgreSQL deprecated SQL / MySQL strict mode changes.
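It also helps to confirm which major versions RDS will actually accept as a target before picking one. A sketch, assuming a configured AWS CLI, that filters `ValidUpgradeTarget` down to major upgrades:

```shell
# Sketch: list valid major-version upgrade targets, one per line.
# Assumes a configured AWS CLI.
major_upgrade_targets() {
  local engine="$1" version="$2"
  aws rds describe-db-engine-versions \
    --engine "$engine" \
    --engine-version "$version" \
    --query 'DBEngineVersions[0].ValidUpgradeTarget[?IsMajorVersionUpgrade==`true`].EngineVersion' \
    --output text |
    tr '\t' '\n'
}
# Usage: major_upgrade_targets postgres 16.4
```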

Blue/Green Deployment #

RDS’s Blue/Green Deployment is a newer approach to reducing downtime for major upgrades and large changes. A replica (green) is built in the background, and only the final cutover is brief.

Create a Blue/Green deployment

aws rds create-blue-green-deployment \
  --blue-green-deployment-name my-postgres-bg \
  --source arn:aws:rds:ap-northeast-2:123456789012:db:my-postgres \
  --target-engine-version 17.0
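Green takes a while to build, and the cutover is a separate call. A sketch, assuming a configured AWS CLI, that polls the deployment until its status is `AVAILABLE` and then switches over. `bgd-...` stands for the identifier `create-blue-green-deployment` returns; the poll interval, try count, and 300-second timeout are arbitrary.

```shell
# Sketch: wait for the green environment, then switch over.
# Assumes a configured AWS CLI. POLL_SECONDS sets the poll interval.
switchover_when_ready() {
  local bgd_id="$1" tries="${2:-60}" status="" i
  for ((i = 0; i < tries; i++)); do
    status="$(aws rds describe-blue-green-deployments \
      --blue-green-deployment-identifier "$bgd_id" \
      --query 'BlueGreenDeployments[0].Status' \
      --output text)" || return 1
    case "$status" in
      AVAILABLE) break ;;
      PROVISIONING) sleep "${POLL_SECONDS:-30}" ;;
      *) echo "unexpected status: $status" >&2; return 1 ;;
    esac
  done
  [ "$status" = "AVAILABLE" ] || { echo "timed out waiting for green" >&2; return 1; }
  # Per AWS docs, a switchover that exceeds the timeout (seconds) is rolled back
  aws rds switchover-blue-green-deployment \
    --blue-green-deployment-identifier "$bgd_id" \
    --switchover-timeout 300
}
# Usage: switchover_when_ready bgd-...
```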

Performance Insights — the performance tool #

RDS’s performance monitoring tool. It shows which SQL statements consume the most time, visualized on a graph.

Where Performance Insights sits

Time axis ──▶
DB Load ▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮
        │ ── SELECT ... FROM users WHERE ...
        │ ── UPDATE products SET ...
        │ ── lock:relation

7 days free, anything more costs extra
Slow queries / locks / wait analysis
N+1 query patterns (see Django Advanced #3, query optimization) show up clearly on the graph
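Turning it on is one modify call. A sketch assuming a configured AWS CLI; the retention check follows the values the CLI documents (7, 731, or a multiple of 31 up to 713 days), and 7 is the free tier.

```shell
# Sketch: enable Performance Insights with a validated retention period.
# Assumes a configured AWS CLI.
valid_pi_retention() {
  local d="$1"
  [ "$d" -eq 7 ] && return 0
  [ "$d" -eq 731 ] && return 0
  # 1 to 23 months, each counted as 31 days
  [ "$d" -ge 31 ] && [ "$d" -le 713 ] && [ $((d % 31)) -eq 0 ]
}

enable_performance_insights() {
  local id="$1" days="${2:-7}"
  if ! valid_pi_retention "$days"; then
    echo "unsupported retention: $days days" >&2
    return 1
  fi
  aws rds modify-db-instance \
    --db-instance-identifier "$id" \
    --enable-performance-insights \
    --performance-insights-retention-period "$days" \
    --apply-immediately
}
# Usage: enable_performance_insights my-postgres
```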

RDS Proxy — connection pool #

When Lambda or containers connect to RDS, the overhead of a full TCP / TLS handshake on every invocation is costly. RDS Proxy is a managed connection pool that eliminates this.

Where it helps:

Lambda + RDS — new connection per invocation → pool via Proxy
Container auto-scaling — connections explode as instances multiply
Faster recovery on failover: the proxy keeps client connections open and routes them to the new primary

Cost is per vCPU-hour — overkill for small workloads.

Common pitfalls #

1) Public RDS #

publicly-accessible=true and SG 0.0.0.0/0 → brute force in days. Production: always private subnet + only the app SG.

2) `master-user-password` in git #

Plain password in scripts / Terraform → leaked. Use Secrets Manager (Advanced #6).
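One way to keep the password out of scripts entirely is to let RDS generate it and store it in Secrets Manager with `--manage-master-user-password`. A sketch mirroring the earlier `create-db-instance` example, assuming a configured AWS CLI; the SG and subnet group names are placeholders.

```shell
# Sketch: same instance as before, but RDS owns the master password.
# Assumes a configured AWS CLI; identifiers are placeholders.
create_db_managed_password() {
  local id="$1" sg="$2"
  aws rds create-db-instance \
    --db-instance-identifier "$id" \
    --db-instance-class db.t3.micro \
    --engine postgres \
    --engine-version 16.4 \
    --master-username postgres \
    --manage-master-user-password \
    --allocated-storage 20 \
    --storage-type gp3 \
    --vpc-security-group-ids "$sg" \
    --db-subnet-group-name my-db-subnet-group \
    --backup-retention-period 7 \
    --multi-az \
    --no-publicly-accessible
}
# The secret's ARN then appears in DBInstances[0].MasterUserSecret.SecretArn
```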

3) Multi-AZ off in production #

Cost-cut and turned Multi-AZ off → 1–2 hour DB outage during AZ failure. Production: turn it on.

4) backup-retention 0 #

Cost-cut and disabled automated backups → PITR is off too. Recovery impossible after an incident. Recommend at least 7 days.
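Pitfalls 1, 3 and 4 are easy to check in bulk. A sketch, assuming a configured AWS CLI, that prints one line per finding for every instance in the current region. Aurora instances may report Multi-AZ as off at the instance level, so expect noise there.

```shell
# Sketch: flag public instances, Multi-AZ off, and backups disabled.
# Assumes a configured AWS CLI; scans the current region only.
audit_rds() {
  local id public multi_az retention
  aws rds describe-db-instances \
    --query 'DBInstances[].[DBInstanceIdentifier,PubliclyAccessible,MultiAZ,BackupRetentionPeriod]' \
    --output text |
  while read -r id public multi_az retention; do
    case "$public" in [Tt]rue) echo "$id: publicly accessible" ;; esac
    case "$multi_az" in [Ff]alse) echo "$id: Multi-AZ off" ;; esac
    if [ "$retention" = "0" ]; then echo "$id: automated backups disabled"; fi
  done
}
# Usage: audit_rds
```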

5) Deleting without final snapshot #

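Deleting with --skip-final-snapshot for speed → permanent data loss. In automation such as terraform destroy, make sure a final snapshot is always taken (on the aws_db_instance resource: skip_final_snapshot = false plus a final_snapshot_identifier).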
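The careful CLI path might look like this: a sketch, assuming a configured AWS CLI, that lifts deletion protection, then deletes with a named final snapshot and keeps automated backups until their retention runs out. The naming scheme is an example.

```shell
# Sketch: delete an instance without losing data.
# Assumes a configured AWS CLI.
safe_delete_db() {
  local id="$1"
  # Deletion protection blocks delete-db-instance until it is turned off
  aws rds modify-db-instance \
    --db-instance-identifier "$id" \
    --no-deletion-protection \
    --apply-immediately >/dev/null || return 1
  # A named final snapshot, plus retained automated backups
  aws rds delete-db-instance \
    --db-instance-identifier "$id" \
    --final-db-snapshot-identifier "${id}-final-$(date -u +%Y%m%d%H%M)" \
    --no-delete-automated-backups
}
# Usage: safe_delete_db my-postgres
```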

6) Storage Auto-Scaling off #

Disk fills up at 3am → writes fail. Turn on Storage Auto-Scaling by setting max-allocated-storage, the upper limit RDS may grow the disk to.

Enable Storage Auto-Scaling

aws rds modify-db-instance \
  --db-instance-identifier my-postgres \
  --max-allocated-storage 200

7) Read Replica as a failover #

Read Replicas don’t auto-failover. They need manual promote. Auto failover is Multi-AZ.
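Promotion is manual and one-way. A sketch, assuming a configured AWS CLI, that refuses to run unless the instance really is a replica. The promoted instance keeps its own endpoint, so the app must be repointed, which is exactly why this is not a failover mechanism.

```shell
# Sketch: promote a read replica to a standalone instance, with a guard.
# Assumes a configured AWS CLI.
promote_replica() {
  local id="$1" src
  src="$(aws rds describe-db-instances \
    --db-instance-identifier "$id" \
    --query 'DBInstances[0].ReadReplicaSourceDBInstanceIdentifier' \
    --output text)" || return 1
  if [ -z "$src" ] || [ "$src" = "None" ]; then
    echo "$id is not a read replica" >&2
    return 1
  fi
  aws rds promote-read-replica --db-instance-identifier "$id" >/dev/null || return 1
  echo "promoted $id (was replicating from $src)"
}
# Usage: promote_replica my-postgres-read-1
```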

8) Connection leak #

App doesn’t close connections, fills max_connections → new requests rejected. Check PgBouncer / RDS Proxy or the app pool config.
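On PostgreSQL, a leak usually shows up as sessions stuck in `idle in transaction`. A sketch that counts client sessions by state from `pg_stat_activity` (the `backend_type` column exists from PostgreSQL 10); it assumes `psql` is installed and a hypothetical `DATABASE_URL` variable, and the threshold of 10 is arbitrary.

```shell
# Sketch: spot likely connection leaks on a PostgreSQL RDS instance.
# Assumes psql is installed and DATABASE_URL (hypothetical) is set.
connection_states() {
  if [ -z "${DATABASE_URL:-}" ]; then
    echo "set DATABASE_URL first" >&2
    return 1
  fi
  # -X: skip ~/.psqlrc, -A: unaligned, -t: rows only; output is "state|count"
  psql "$DATABASE_URL" -X -A -t -c \
    "SELECT coalesce(state, 'unknown'), count(*)
       FROM pg_stat_activity
      WHERE backend_type = 'client backend'
      GROUP BY 1
      ORDER BY 2 DESC"
}

# Returns 0 if "idle in transaction" sessions <= max, 1 if above, 2 on error.
idle_in_tx_ok() {
  local max="${1:-10}" out n
  out="$(connection_states)" || return 2
  n="$(printf '%s\n' "$out" | awk -F'|' '$1 == "idle in transaction" { print $2 }')"
  echo "idle in transaction: ${n:-0}"
  [ "${n:-0}" -le "$max" ]
}
# Usage: idle_in_tx_ok 10 || echo "possible connection leak"
```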

Wrap-up #

What we took home this time:

RDS = AWS’s managed relational DB. PostgreSQL / MySQL / Aurora are the common picks
Aurora = AWS’s own engine. Distributed storage, faster failover, more RRs
Place in private subnets via DB Subnet Group. publicly-accessible=false is the production default
Automated backup + PITR = restore to any second in the retention window (the latest restorable point is usually about 5 min behind now)
Manual Snapshot = explicit, survives DB deletion
Multi-AZ = sync replication + auto failover, but standby is unreadable
Read Replica = async copy, for read distribution / analytics. No auto failover
Manage engine settings in parameter groups. Static / Dynamic difference
Minor upgrade = can be auto. Major upgrade = use Blue/Green
Reinforce with Performance Insights + RDS Proxy
Pitfalls — public, password, Multi-AZ off, backup 0, missing final snapshot, storage auto-scale, RR-as-failover, connection leak

Next — Route 53 #

The DB piece is set. Now to the place where users first meet our system — DNS.

In #5 Route 53 — domains and DNS we’ll line up domain registration / Hosted Zones / record types and Aliases / routing policies (Failover / Latency / Geolocation).
