AWS Intermediate #2: EC2 Operations — security group, key pair, SSM
In #1 EC2 and VPC Basics we sketched how to launch a single EC2 instance. This post is about running it: how to design security rules, how to connect, and what to bake in so you can launch the same instance many times.
About 80% of the work of operating EC2 comes down to three things.
- Security Groups: control who can come in
- Connecting: once SSH + key pair, today SSM Session Manager
- AMIs: bake the skeleton so you can recreate instances fast
Thread these three together and the day-to-day of EC2 operations becomes simple.
What a Security Group looks like
A Security Group (SG) is a stateful firewall attached to an instance (an ENI, more precisely). One instance can carry multiple SGs, and one SG can be shared by many instances.
Inbound vs Outbound
SG rules go in two directions:

| | Inbound | Outbound |
|---|---|---|
| Controls | Incoming traffic | Outgoing traffic |
| Default | All blocked | All allowed |
| How often you touch it | Often | Almost never |
Remember the defaults. Inbound is blocked by default, outbound is allowed by default. That’s why 99% of SG work is adding inbound rules.
Shape of a rule

```
Protocol  Port  Source             Description
TCP       80    0.0.0.0/0          HTTP from anywhere
TCP       443   0.0.0.0/0          HTTPS from anywhere
TCP       22    198.51.100.10/32   SSH from my home IP
```

Each rule is a (protocol, port, source) tuple. The source field can hold one of two things:
- CIDR block: `0.0.0.0/0` (any IP), `10.0.0.0/16` (inside the VPC), `198.51.100.10/32` (single IP)
- Another SG’s ID: `sg-0abc...` ← this is the truly powerful one
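What a CIDR source checks is whether the sender’s address, masked to the prefix length, equals the network address. Here is a small sketch of that test in shell arithmetic; `ip_to_int` and `ip_in_cidr` are hypothetical helpers for illustration (IPv4 only, no input validation), not part of any AWS tooling.

```shell
# Hypothetical helpers for illustration (IPv4 only, no input validation).

# ip_to_int <a.b.c.d>: print the address as a 32-bit integer
ip_to_int() {
  local IFS=.
  set -- $1                      # split "a.b.c.d" into $1..$4
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# ip_in_cidr <ip> <cidr>: exit status 0 if <ip> falls inside <cidr>
ip_in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/} mask
  # Keep the top <bits> bits; for /0 every bit is shifted out, so mask=0 matches any address
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

For example, `ip_in_cidr 10.0.3.4 10.0.0.0/16` succeeds, while `ip_in_cidr 198.51.100.11 198.51.100.10/32` fails because a /32 matches exactly one address.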
SG referencing SG
The core operational pattern is SGs pointing at other SGs.

```
ALB SG (sg-alb)
  Inbound: TCP 443 from 0.0.0.0/0

App SG (sg-app)
  Inbound: TCP 8080 from sg-alb   ← not an IP, the SG itself
```

Even when the ALB’s IPs change (and ALB nodes really do change IPs as they scale or get replaced), the SG reference keeps up automatically, because it matches traffic from any network interface that has sg-alb attached. Rule maintenance for operational infrastructure becomes much simpler.
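With the AWS CLI, an SG-referencing rule is added by passing `--source-group` where you would otherwise pass `--cidr`. A minimal sketch wrapped in a hypothetical `allow_from_sg` function; the group IDs are placeholders for your own.

```shell
# Hypothetical wrapper: allow TCP <port> into <target-sg> only from members of <source-sg>.
# Usage: allow_from_sg <target-sg-id> <source-sg-id> <port>
allow_from_sg() {
  local target_sg=$1 source_sg=$2 port=$3
  aws ec2 authorize-security-group-ingress \
    --group-id "$target_sg" \
    --protocol tcp \
    --port "$port" \
    --source-group "$source_sg"   # an SG ID, not a CIDR
}
```

For the App SG above, that would be `allow_from_sg "$APP_SG" "$ALB_SG" 8080`, where `APP_SG` and `ALB_SG` hold your real group IDs.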
Common SG patterns

```
ALB SG (sg-alb)
  in:  443  ← 0.0.0.0/0
  out: all

App SG (sg-app)
  in:  8080 ← sg-alb            ← only the ALB can come in
       22   ← sg-bastion        ← SSH from the bastion (legacy approach)
  out: all

DB SG (sg-db)
  in:  5432 ← sg-app            ← only the app servers reach the DB
  out: all (or closed)

Bastion SG (sg-bastion)
  in:  22   ← 198.51.100.10/32  ← only your own IP
  out: all
```

The point is that rules flow “SG → SG”, not by IP.
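To check whether an existing group actually follows this, `describe-security-group-rules` shows each inbound rule’s source as either a CIDR or a referenced SG. A sketch; `list_ingress_sources` is a hypothetical name and the SG ID is yours to pass in.

```shell
# Hypothetical helper: list inbound rules of one SG with their source.
# Columns: protocol, from-port, to-port, CIDR source, referenced SG source
list_ingress_sources() {
  local sg_id=$1
  aws ec2 describe-security-group-rules \
    --filters "Name=group-id,Values=$sg_id" \
    --query 'SecurityGroupRules[?IsEgress==`false`].[IpProtocol,FromPort,ToPort,CidrIpv4,ReferencedGroupInfo.GroupId]' \
    --output table
}
```

Rules with a value in the CIDR column are the candidates for switching to an SG reference.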
When to lock down outbound
Outbound is allowed by default, but to limit data exfiltration if an instance is compromised, you can narrow outbound too. Apply it first to production DBs and internal systems.

```
App SG outbound:
  TCP 5432 → sg-db       ← DB only
  TCP 443  → 0.0.0.0/0   ← external API calls
  TCP 53   → 0.0.0.0/0   ← DNS
  UDP 53   → 0.0.0.0/0   ← DNS
```

NACL: the other layer
The VPC’s second firewall is the NACL (Network Access Control List). It works at the subnet level.
| | Security Group | NACL |
|---|---|---|
| Applied at | Instance (ENI) | Subnet |
| Stateful | ✅ | ❌ (responses also need explicit allow) |
| Rule kinds | Allow only | Allow + Deny |
| Order of evaluation | All rules evaluated together | By rule number, lowest first; first match wins |
| Day to day | Touched daily | Hardly touched |
You don’t use NACLs much. The default NACL allows all, and SGs are granular enough. NACLs come up when:
- Blocking a specific IP range (when Deny is needed — SGs don’t have Deny)
- Temporary block during attack
- Compliance demands subnet-level explicit blocks
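NACL’s stateless trap
Because NACLs are stateless, you have to allow response traffic explicitly.

```
Inbound   Allow  TCP  1024-65535  0.0.0.0/0   ← ephemeral port responses
Outbound  Allow  TCP  80          0.0.0.0/0
```

1024-65535 covers the ephemeral ports that clients pick for their side of a connection (Linux defaults to 32768-60999, for example). Here the instance calls out on port 80 and the reply comes back to its ephemeral port; miss the inbound rule and the responses are dropped. SGs are stateful so this is automatic; with NACLs it’s explicit.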
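A toy evaluator makes both NACL properties visible: rules are checked in ascending rule-number order with the first match winning, and nothing remembers the outbound request, so the reply is just another inbound packet that needs its own allow. This is a hypothetical illustration (`nacl_eval` is not an AWS tool) that matches only protocol and port; source CIDRs are left out.

```shell
# Hypothetical toy evaluator (not an AWS tool). One rule per line:
#   <rule-number> <allow|deny> <proto> <from-port> <to-port>
# Source CIDRs are left out; every rule is treated as applying to 0.0.0.0/0.
# Usage: nacl_eval "<rules>" <proto> <port>   -> prints allow or deny
nacl_eval() {
  local rules=$1 proto=$2 port=$3 result
  result=$(printf '%s\n' "$rules" | sort -n | while read -r num action rproto from to; do
    [ -n "$num" ] || continue                          # skip blank lines
    if [ "$rproto" = "$proto" ] && [ "$port" -ge "$from" ] && [ "$port" -le "$to" ]; then
      echo "$action"                                   # lowest-numbered match wins
      break
    fi
  done)
  echo "${result:-deny}"                               # no match: the final "*" rule denies
}
```

The reply to an outbound HTTP call lands on the instance’s ephemeral port, say 49152: `nacl_eval "100 allow tcp 1024 65535" tcp 49152` prints allow, while a rule set that only allows 443 inbound prints deny. Adding `90 deny tcp 0 65535` in front of the allow also prints deny, because rule 90 is checked first.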
Where key pairs sit, and where they break
EC2 SSH access has used key pairs since the beginning.

```shell
# Create the key pair and save the private key (AWS shows it only this once)
aws ec2 create-key-pair --key-name my-key --query 'KeyMaterial' --output text > my-key.pem
chmod 400 my-key.pem

# Specify the key when launching
aws ec2 run-instances --key-name my-key ...

# Connect
ssh -i my-key.pem ec2-user@<public-ip>
```

When the instance boots, cloud-init writes the public key into the default user’s ~/.ssh/authorized_keys, which is why SSH works.
Limits of key pairs
The key-pair model breaks at scale:
- Lost key: AWS keeps only the public key, so a lost private key can’t be recovered. Recreate the instance, or mount its EBS volume on another instance and add a new key manually
- Sharing risk: teammates need a copy, and once it leaks it can’t be revoked without replacing the key on every instance
- Hard to audit: knowing who connected when needs separate logging
- Port 22 exposed to the internet: more attack surface
- No MFA: having the key is enough
EC2 Instance Connect
EC2 Instance Connect pushes a temporary SSH public key to the instance (valid for 60 seconds), so no long-lived key has to be shared. You still need port 22 open in the SG. The “Connect” button in the console uses this.
SSM Session Manager: keyless connect
Session Manager, part of SSM (AWS Systems Manager), is the new standard for EC2 access. You get a shell on the instance without opening port 22 and without a key.

```
[my computer] ──HTTPS──▶ [SSM Endpoint] ◀──HTTPS── [SSM Agent inside EC2]
                               │
                               ▼
                      IAM permission check
```

The SSM Agent on the instance opens an outbound HTTPS connection to the SSM endpoints, and your session (from the console or CLI) flows back through that channel. Because the instance initiates the connection, the SG needs no inbound rule at all, port 22 included.
Session Manager setup
- SSM Agent installed in the AMI: Amazon Linux 2023 and recent Ubuntu AMIs include it by default
- The instance’s IAM role has the `AmazonSSMManagedInstanceCore` policy attached
- Outbound internet access, or VPC endpoints for `ssm`, `ssmmessages`, and `ec2messages` (so instances in private subnets can use SSM too)
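The IAM part of that checklist can be scripted. A sketch, assuming the instance already runs with an instance profile whose role name you know; `ssm_enable_and_check` is a hypothetical helper. It attaches the AWS-managed `AmazonSSMManagedInstanceCore` policy, then asks SSM whether the instance has registered, which may take a few minutes after the role changes.

```shell
# Hypothetical helper. Args: <role-name> <instance-id>
ssm_enable_and_check() {
  local role=$1 instance_id=$2
  aws iam attach-role-policy \
    --role-name "$role" \
    --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore || return 1
  # Prints "Online" once the agent has registered the instance ("None" before that)
  aws ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=$instance_id" \
    --query 'InstanceInformationList[0].PingStatus' \
    --output text
}
```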
```shell
aws ssm start-session --target i-0abc1234def567890

# Port forwarding works too: local port 8080 → port 80 on the instance
aws ssm start-session --target i-0abc... \
  --document-name AWS-StartPortForwardingSession \
  --parameters '{"portNumber":["80"],"localPortNumber":["8080"]}'
```

key pair vs Session Manager
| | key pair (SSH) | Session Manager |
|---|---|---|
| Port 22 | Must open | Don’t need to |
| Key management | DIY | None |
| Authentication | SSH key | IAM (MFA possible) |
| Audit log | Separate | CloudTrail API events + optional session logs to S3 / CloudWatch Logs |
| Private subnet | Bastion needed | Direct via VPC Endpoint |
| Port forwarding | `ssh -L` | `start-session` |
Session Manager is almost always the right answer in production. For IAM details see Basics #2, for security details see Basics #6.
Don’t confuse this with CloudShell. Basics #5 CloudShell is a browser terminal inside the AWS console (where you run `aws` CLI commands with your IAM credentials). Session Manager is a shell inside an EC2 instance.
EC2 metadata service (IMDS)
The way an EC2 reads its own info (instance ID, region, IAM role credentials, …) is the IMDS (Instance Metadata Service).

```shell
# IMDSv2: PUT for a session token (TTL up to 21600 s = 6 h), then GET with it
TOKEN=$(curl -s -X PUT http://169.254.169.254/latest/api/token \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id
```

The link-local address 169.254.169.254 is only reachable from the instance itself. The IAM role’s temporary credentials come from here too, which is why the `aws` CLI inside an EC2 just works.
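The same token flow shows where those role credentials come from. A sketch meant to run on an instance with an IAM role attached; `imds_get` and `show_role_credentials` are hypothetical helpers, while the `meta-data/iam/security-credentials/` paths are the standard IMDS paths. The output contains live temporary credentials, so don’t log it.

```shell
# Hypothetical helper: GET an IMDS path (relative to /latest/) using an IMDSv2 token.
imds_get() {
  local token
  token=$(curl -sf -X PUT http://169.254.169.254/latest/api/token \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600") || return 1
  curl -sf -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/$1"
}

# Prints the attached role's temporary credentials as JSON
# (AccessKeyId, SecretAccessKey, Token, Expiration, ...). Don't log this output.
show_role_credentials() {
  local role
  role=$(imds_get meta-data/iam/security-credentials/) || return 1   # lists the role name
  imds_get "meta-data/iam/security-credentials/$role"
}
```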
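IMDSv1 vs IMDSv2
The old way was a tokenless GET (IMDSv1). After SSRF incidents in which attackers made vulnerable applications fetch IAM role credentials from IMDS, IMDSv2 arrived: first a PUT to obtain a session token, then GETs that present it in a header. Many SSRF bugs only let the attacker trigger a plain GET, so they can’t complete that handshake. New instances should be IMDSv2-only.

```shell
aws ec2 modify-instance-metadata-options \
  --instance-id i-0abc... \
  --http-tokens required \
  --http-endpoint enabled
```

Building an AMI: baking the skeleton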
To launch the same setup fast and many times, you have two paths.
- Build an AMI — snapshot a current instance into a new AMI
- User data + IaC — start from a blank AMI and run setup scripts at boot
Building an AMI
Right-click the instance in the console → “Create image”, or:

```shell
aws ec2 create-image \
  --instance-id i-0abc... \
  --name "my-app-2026-04-19" \
  --description "Node 20 + nginx + my-app v1.2.3" \
  --no-reboot   # optional: skip the default reboot (file system consistency is then not guaranteed)
```

The resulting AMI is:
- EBS snapshots of the instance’s volumes, plus launch metadata
- New instances launched from it start with the same disk state
- Regional: use `copy-image` to make it available in other regions
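Because an AMI is regional, a common next step is to wait until the new image is available and copy it to another region. A sketch; `bake_and_copy` and its arguments are hypothetical, and this version lets `create-image` do its default reboot (no `--no-reboot`).

```shell
# Hypothetical helper. Args: <instance-id> <image-name> <source-region> <dest-region>
bake_and_copy() {
  local instance_id=$1 name=$2 src_region=$3 dest_region=$4 ami_id
  ami_id=$(aws ec2 create-image --region "$src_region" \
    --instance-id "$instance_id" \
    --name "$name" \
    --query ImageId --output text) || return 1
  # Block until the new AMI is usable (the waiter polls the image state)
  aws ec2 wait image-available --region "$src_region" --image-ids "$ami_id" || return 1
  # copy-image runs in the destination region and pulls from the source region
  aws ec2 copy-image \
    --region "$dest_region" \
    --source-region "$src_region" \
    --source-image-id "$ami_id" \
    --name "$name" \
    --query ImageId --output text
}
```

It prints the AMI ID in the destination region, which differs from the source AMI ID.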
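User data: boot script
Instead of an AMI, launch from a stock OS image and use a boot script for setup. It’s more flexible than an AMI, and changes are easier to track.

```shell
#!/bin/bash
# Runs as root on first boot; output goes to /var/log/cloud-init-output.log
set -euxo pipefail          # stop on the first error and log each command

yum update -y
yum install -y nginx
systemctl enable --now nginx

# Pull app code (the instance's IAM role needs read access to this bucket)
aws s3 cp s3://my-bucket/app.tar.gz /tmp/
mkdir -p /opt/myapp         # tar -C fails if the target directory doesn't exist
tar -xzf /tmp/app.tar.gz -C /opt/myapp
```

By default, user data runs once, on the first boot. Logs are at /var/log/cloud-init-output.log.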
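Passing the script at launch is a single `run-instances` flag. A sketch, assuming the script above is saved locally as `userdata.sh` and an instance profile with read access to the bucket already exists; `launch_with_userdata`, `t3.micro`, and the argument order are illustrative choices. The CLI base64-encodes the file contents for you.

```shell
# Hypothetical helper. Args: <ami-id> <instance-profile-name> <sg-id> <subnet-id>
launch_with_userdata() {
  aws ec2 run-instances \
    --image-id "$1" \
    --instance-type t3.micro \
    --iam-instance-profile "Name=$2" \
    --security-group-ids "$3" \
    --subnet-id "$4" \
    --user-data file://userdata.sh \
    --query 'Instances[0].InstanceId' --output text   # prints the new instance ID
}
```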
Golden AMI vs User data

| | Golden AMI | User data |
|---|---|---|
| Boot speed | Fast | Slow (script execution time) |
| Change management | Rebuild AMI | Edit script |
| Reproducibility | Very high | External deps (yum repo, S3) can drift |
| Best for | Fast ASG scaling / stable setups | Dev / quick changes |
In production, both are used together — the golden AMI bakes OS / dependencies, and user data slots in the app version.
Auto Scaling Group: automatic recovery
If an instance dies, the ASG (Auto Scaling Group) launches a replacement and registers it with the ALB’s target group.

```
Launch Template (instance template: AMI, type, SG, key, user data)
        │
        ▼
   ┌─────────┐
   │   ASG   │   desired=3  min=2  max=10
   └─────────┘
        │
        ├─── EC2 (AZ a)   ← fails health check → terminate + relaunch
        ├─── EC2 (AZ b)
        └─── EC2 (AZ b)
```

Just the basics:
- Launch Template: defines what kind of EC2 to launch (AMI, type, SG, IAM, user data)
- Desired / Min / Max: the number to maintain, the minimum, and the maximum
- Health check: by EC2 status checks (`EC2`) or by the ALB target group (`ELB`)

We’ll stop at the basics of ASG here. If you run containers, Advanced #1 ECS / Fargate is a smoother alternative: ECS takes over the replace-and-scale job that an ASG does for instances.
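The three basics map onto `create-auto-scaling-group` flags. A sketch assuming a Launch Template named `my-app-lt` already exists; `create_app_asg`, the group and template names, the sizes, and the 300-second grace period are illustrative, and you pass in your own subnet IDs (comma-separated, ideally in different AZs) and target group ARN. `--health-check-type ELB` makes the ASG also replace instances that fail the ALB target group’s health check, not only EC2 status checks.

```shell
# Hypothetical helper. Args: <subnet-a,subnet-b> <target-group-arn>
create_app_asg() {
  aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name my-app-asg \
    --launch-template 'LaunchTemplateName=my-app-lt,Version=$Latest' \
    --min-size 2 --max-size 10 --desired-capacity 2 \
    --vpc-zone-identifier "$1" \
    --target-group-arns "$2" \
    --health-check-type ELB \
    --health-check-grace-period 300   # seconds to wait before health-checking a new instance
}
```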
Common pitfalls
1) “Why can’t the ALB reach my EC2?”
Checklist (top to bottom):
1. The ALB SG’s outbound allows traffic to the EC2’s port (the default allow-all does)
2. The EC2 SG’s inbound has the ALB SG as a source
3. The OS-level firewall on the EC2 (`firewalld`, `ufw`) also allows the port
4. The ALB target group’s health check path returns 200 (or whatever success codes you configured)
5. The EC2 is listening on that port (`ss -tlnp`)

It’s usually item 1 or 2. If the SG entry is written by IP, switching to SG-by-SG is the operational answer.
2) “I lost the key but I need to get inside”
- Session Manager is on → `aws ssm start-session`
- Not on → stop the instance → detach its root EBS volume → attach and mount it on another EC2 → edit `~/.ssh/authorized_keys` on the mounted volume → reattach
- Or snapshot the EBS volume and launch a new instance from it with a new key
3) Outbound left wide → data exfiltration
If outbound is fully open when an EC2 is compromised, the attacker can send data to any IP. For DB servers / internal systems, narrow outbound is the best practice.
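Narrowing an App SG to the outbound rules listed under “When to lock down outbound” could look like this with the CLI. A sketch; `lock_down_egress` is a hypothetical name and the SG IDs are inputs. It adds the narrow rules before revoking the default allow-all egress rule, so the instance is never left without the egress it needs.

```shell
# Hypothetical helper. Args: <app-sg-id> <db-sg-id>
lock_down_egress() {
  local app_sg=$1 db_sg=$2
  # 1) Add the narrow rules first: DB (by SG reference), HTTPS, DNS over TCP and UDP
  aws ec2 authorize-security-group-egress --group-id "$app_sg" --ip-permissions \
    "IpProtocol=tcp,FromPort=5432,ToPort=5432,UserIdGroupPairs=[{GroupId=$db_sg}]" \
    'IpProtocol=tcp,FromPort=443,ToPort=443,IpRanges=[{CidrIp=0.0.0.0/0}]' \
    'IpProtocol=tcp,FromPort=53,ToPort=53,IpRanges=[{CidrIp=0.0.0.0/0}]' \
    'IpProtocol=udp,FromPort=53,ToPort=53,IpRanges=[{CidrIp=0.0.0.0/0}]' || return 1
  # 2) Then remove the default allow-all egress rule (protocol -1 = all traffic)
  aws ec2 revoke-security-group-egress --group-id "$app_sg" --ip-permissions \
    'IpProtocol=-1,IpRanges=[{CidrIp=0.0.0.0/0}]'
}
```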
4) NACL deny that breaks responses
You forget that NACLs are stateless and allow only the outbound request, so the inbound responses get blocked. In almost every case: leave NACLs at the default and work only with SGs.
5) IMDSv1 still on
An instance built from an old AMI or setup still accepts IMDSv1, which leaves an SSRF surface. Apply `--http-tokens required` to all instances.
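To apply that across a region, here is a sketch (`enforce_imdsv2` is a hypothetical name) that finds running or stopped instances still reporting `HttpTokens=optional` and switches them to required. Before running it, confirm nothing on those instances still calls IMDS without a token (for example an old SDK or agent), because those calls start failing once tokens are required.

```shell
# Hypothetical helper: require IMDSv2 on every instance in the current region that still allows v1.
enforce_imdsv2() {
  local ids id
  ids=$(aws ec2 describe-instances \
    --filters Name=instance-state-name,Values=running,stopped \
    --query "Reservations[].Instances[?MetadataOptions.HttpTokens=='optional'].InstanceId" \
    --output text) || return 1
  for id in $ids; do                     # text output is whitespace-separated
    echo "enforcing IMDSv2 on $id"
    aws ec2 modify-instance-metadata-options \
      --instance-id "$id" \
      --http-tokens required \
      --http-endpoint enabled || return 1
  done
}
```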
6) AMI too big, slow boot
You snapshot a long-running instance straight into an AMI and it grows to 5GB+, so boot time goes up. Before creating the AMI:
- Clean logs, caches, and temp files (`yum clean all`, etc.)
- Run `cloud-init clean` (so init runs again on the next boot)
- Empty swap and the journal
Wrap-up
What we took home this time:
- SG = per-instance stateful firewall. SG → SG is more powerful than IP
- Defaults are inbound blocked + outbound allowed. Narrowing outbound is data-exfil defense
- NACLs are per-subnet, stateless. Hardly touched. Stateless means ephemeral ports need explicit allow
- key pair is the old standard. Lost keys / sharing / port 22 exposure are its limits
- SSM Session Manager is the new standard. No port 22, no key, IAM auth, audit logs automatic
- Force IMDSv2 — SSRF defense
- Use AMI to harden the skeleton + user data for boot setup, often both
- ASG for automatic recovery. Launch Template + desired/min/max + health check
- Pitfalls — 5-step ALB→EC2 check, key-less recovery, outbound all, NACL stateless, IMDSv1, oversized AMI
Next: S3
The EC2 piece is set. Now we move to the most common thing EC2 reaches into — object storage.
In #3 S3 — static hosting, presigned URL we’ll line up the shape of a bucket, policies and Public Access Block, static site hosting, presigned URLs, and other daily patterns.