AWS Intermediate #2: EC2 Operations — security group, key pair, SSM

Sunday, April 19, 2026

In #1 EC2 and VPC Basics we drew the picture of launching a single EC2 instance. This post is about operating that instance: how to design security rules, how to connect, and what to bake into an image if you want to launch the same instance many times.

About 80% of the work of operating EC2 lives in three things.

Security Groups to control who can come in
Connecting: once SSH + key pair, today SSM Session Manager
AMIs to bake a base image so you can recreate instances fast

Thread these three together and the day-to-day of EC2 operations becomes simple.

What a Security Group looks like #

A Security Group (SG) is a stateful firewall attached to an instance (an ENI, more precisely). One instance can carry multiple SGs, and one SG can be shared by many instances.

Inbound vs Outbound #

SG rules go in two directions:

	Inbound	Outbound
Controls	Incoming traffic	Outgoing traffic
Default	All blocked	All allowed
How often you touch it	Often	Almost never

Remember the defaults. Inbound is blocked by default, outbound is allowed by default. That’s why 99% of SG work is adding inbound rules.

Shape of a rule #

Example inbound rules for a web server SG

Protocol  Port      Source              Description
TCP       80        0.0.0.0/0           HTTP from anywhere
TCP       443       0.0.0.0/0           HTTPS from anywhere
TCP       22        198.51.100.10/32    SSH from my home IP

Each rule is a (protocol, port, source) tuple. The source field can hold one of two things:

CIDR block — 0.0.0.0/0 (any IP), 10.0.0.0/16 (inside the VPC), 198.51.100.10/32 (single IP)
Another SG’s ID — sg-0abc... ← this is the truly powerful one

SG referencing SG #

The core operational pattern is SGs pointing at other SGs.

ALB → app EC2 pattern

ALB SG  (sg-alb)
  Inbound:  TCP 443 from 0.0.0.0/0

App SG  (sg-app)
  Inbound:  TCP 8080 from sg-alb     ← not an IP, the SG itself

Even when the ALB’s IPs change (and ALB nodes really do change IPs dynamically), the SG reference keeps matching automatically, so rule maintenance becomes much simpler.
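As a rough mental model of why an SG reference survives IP changes, here is a minimal Python sketch. It is a toy, not how AWS implements security groups: `Rule`, `Packet`, and `allows` are hypothetical names, and the private IPs are made up for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    protocol: str
    port: int
    source: str  # either a CIDR block or a security group ID ("sg-...")


@dataclass(frozen=True)
class Packet:
    protocol: str
    port: int
    src_ip: str
    src_groups: frozenset  # SG IDs attached to the sender's ENI


def allows(rules, packet):
    """Return True if any inbound rule matches (SG rules are allow-only)."""
    for rule in rules:
        if (rule.protocol, rule.port) != (packet.protocol, packet.port):
            continue
        if rule.source.startswith("sg-"):
            # SG reference: match on the sender's group, whatever its IP is.
            if rule.source in packet.src_groups:
                return True
        elif ip_address(packet.src_ip) in ip_network(rule.source):
            return True
    return False


app_sg = [Rule("tcp", 8080, "sg-alb")]

# The ALB node's IP changes, but it still carries sg-alb, so both match.
before = Packet("tcp", 8080, "10.0.1.25", frozenset({"sg-alb"}))
after = Packet("tcp", 8080, "10.0.2.77", frozenset({"sg-alb"}))
stranger = Packet("tcp", 8080, "10.0.1.25", frozenset({"sg-other"}))

print(allows(app_sg, before), allows(app_sg, after), allows(app_sg, stranger))
# True True False
```

A rule written against 10.0.1.25/32 would have stopped matching after the IP change, which is the maintenance burden the SG reference removes.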

Common SG patterns #

3-tier web app SG design

ALB SG (sg-alb)
  in:  443 ← 0.0.0.0/0
  out: all

App SG (sg-app)
  in:  8080 ← sg-alb              ← only ALB can come in
       22   ← sg-bastion          ← SSH from Bastion (legacy approach)
  out: all

DB SG  (sg-db)
  in:  5432 ← sg-app              ← only the app servers reach the DB
  out: all (or closed)

Bastion SG (sg-bastion)
  in:  22 ← 198.51.100.10/32      ← only your own IP
  out: all

The point is that rules point from SG to SG rather than at IP addresses.

When to lock down outbound #

The default is to allow all outbound, but to prevent data exfiltration if the instance gets compromised, there’s a pattern of narrowing outbound too. Apply it first to production DBs / internal systems.

Example narrowed outbound

App SG outbound:
  TCP 5432 → sg-db                 ← DB only
  TCP 443  → 0.0.0.0/0             ← external API calls
  TCP 53   → 0.0.0.0/0             ← DNS
  UDP 53   → 0.0.0.0/0             ← DNS

NACL — the other layer #

The VPC’s second firewall is the NACL (Network Access Control List). It works at the subnet level.

	Security Group	NACL
Applied at	Instance (ENI)	Subnet
Stateful	✅	❌ (responses also need explicit allow)
Rule kinds	Allow only	Allow + Deny
Order of evaluation	All rules evaluated together	By rule number (lowest first, first match wins)
Day to day	Touched daily	Hardly touched

You don’t use NACLs much. The default NACL allows all, and SGs are granular enough. NACLs come up when:

Blocking a specific IP range (when Deny is needed — SGs don’t have Deny)
Temporary block during attack
Compliance demands subnet-level explicit blocks

NACL’s stateless trap #

Because NACLs are stateless, you have to allow response traffic explicitly.

Example NACL rules — to receive responses to outbound TCP 80

Inbound  Allow  TCP  1024-65535  0.0.0.0/0   ← ephemeral port responses
Outbound Allow  TCP  80          0.0.0.0/0

1024-65535 is the ephemeral port range. Miss it and responses won’t come back. SGs are stateful so this is automatic; with NACLs it’s explicit.
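To make the stateless behavior concrete, here is a small Python sketch of ordered, first-match evaluation. It is a simplified model under stated assumptions: CIDR matching is left out, and the function name, the rule tuple layout, and the rule numbers are all hypothetical.

```python
def nacl_decision(rules, direction, protocol, port):
    """Evaluate rules in ascending rule-number order; the first match wins.

    Each rule is (number, direction, action, protocol, (low_port, high_port)).
    If nothing matches, the implicit final rule denies.
    """
    for _num, r_dir, action, r_proto, (low, high) in sorted(rules):
        if r_dir == direction and r_proto == protocol and low <= port <= high:
            return action
    return "deny"


outbound_only = [(100, "outbound", "allow", "tcp", (80, 80))]
with_ephemeral = outbound_only + [(100, "inbound", "allow", "tcp", (1024, 65535))]

# The request leaves on port 80; the response comes back to an ephemeral port.
response_port = 49152  # hypothetical port picked by the client OS

print(nacl_decision(outbound_only, "outbound", "tcp", 80))             # allow
print(nacl_decision(outbound_only, "inbound", "tcp", response_port))   # deny
print(nacl_decision(with_ephemeral, "inbound", "tcp", response_port))  # allow
```

The model keeps no connection state at all, so the response is judged purely against the inbound rules. That is the difference from an SG.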

Where key pairs sit, and where they break #

EC2 SSH access has used key pairs since the beginning.

Create a key pair + SSH connect

# Create the key pair
aws ec2 create-key-pair --key-name my-key --query 'KeyMaterial' --output text > my-key.pem
chmod 400 my-key.pem

# Specify the key when launching
aws ec2 run-instances --key-name my-key ...

# Connect
ssh -i my-key.pem ec2-user@<public-ip>

When the EC2 first boots, the public half of the key pair is automatically added to the default user’s ~/.ssh/authorized_keys, so SSH with the private key works.

Limits of key pairs #

The key-pair model breaks at scale:

Lost key: AWS keeps only the public key, so a lost private key can’t be downloaded again. Recreate the instance, or mount the EBS volume elsewhere and add a new key manually
Sharing risk: you have to hand it to teammates, and once leaked it can’t be recalled
Audit is hard: knowing who connected when needs separate logging
Port 22 exposed to the internet: attack surface
No MFA: having the key is enough

EC2 Instance Connect #

The console mints a temporary one-time SSH key. You still need port 22 open in the SG. The “Connect” button in the console uses this.

SSM Session Manager — keyless connect #

The Session Manager in SSM (AWS Systems Manager) is the new standard for EC2 access. You get a shell into the EC2 without opening port 22, without a key.

Session Manager flow

[my computer] ──HTTPS──▶ [SSM Endpoint] ◀──HTTPS──[SSM Agent inside EC2]
                          │
                          ▼
                   IAM permission check

The SSM Agent inside the EC2 makes an outbound connection via the AWS API, and your console shell flows back through that channel. Because the direction is reversed (the EC2 makes outbound), no inbound port 22 is needed in the SG.

Session Manager setup #

SSM Agent installed in the AMI — Amazon Linux 2023 / latest Ubuntu include it by default
The EC2’s IAM Role has the AmazonSSMManagedInstanceCore policy
Outbound internet or a VPC Endpoint (so EC2s in private subnets can use SSM too)
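The IAM Role item above boils down to two pieces: a trust policy that lets the EC2 service assume the role, and the managed policy attached to it. A sketch of both as Python data follows. The role name `my-ssm-role` is a hypothetical placeholder, and the code only builds the documents, it does not call AWS.

```python
import json

# Trust policy: lets the EC2 service assume the role on the instance's behalf.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS-managed policy that grants the SSM Agent its core permissions.
SSM_CORE_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"

ROLE_NAME = "my-ssm-role"  # hypothetical name


def trust_policy_json():
    """Serialize the trust policy for use as an assume-role policy document."""
    return json.dumps(TRUST_POLICY, indent=2)


print(trust_policy_json())
```

The JSON is what `aws iam create-role --assume-role-policy-document` expects, and the ARN is what `aws iam attach-role-policy --policy-arn` takes. The role still has to be wrapped in an instance profile and attached to the instance.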

Connect from CLI instead of the console

aws ssm start-session --target i-0abc1234def567890

# Port forwarding works too
aws ssm start-session --target i-0abc... \
  --document-name AWS-StartPortForwardingSession \
  --parameters '{"portNumber":["80"],"localPortNumber":["8080"]}'

key pair vs Session Manager #

	key pair (SSH)	Session Manager
Port 22	Must open	Don’t need to
Key management	DIY	None
Authentication	SSH key	IAM (MFA possible)
Audit log	Separate	CloudTrail records session starts; session logs go to S3 / CloudWatch Logs once enabled
Private subnet	Bastion needed	Direct via VPC Endpoint
Port forwarding	`ssh -L`	`start-session`

Session Manager is almost always the right answer in production. For IAM details see Basics #2, for security details see Basics #6.

Don’t confuse this with CloudShell. Basics #5 CloudShell is a browser terminal inside the AWS console (where you run aws cli with your IAM credentials). Session Manager is a shell inside an EC2 instance.

EC2 metadata service (IMDS) #

The way an EC2 reads its own info (instance ID, region, IAM role credentials, …) is the IMDS (Instance Metadata Service).

IMDSv2 — get token, then read metadata

TOKEN=$(curl -X PUT http://169.254.169.254/latest/api/token \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

curl -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id

The link-local address 169.254.169.254 only responds inside an EC2. The IAM Role’s temporary credentials come from here too — that’s why aws cli inside an EC2 just works.
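The same two-step exchange can be sketched in Python with only the standard library. The function names are hypothetical, and `read_metadata` can only succeed when run on an EC2 instance, so treat this as an illustration of the request shapes rather than a tested client.

```python
import urllib.request

IMDS = "http://169.254.169.254/latest"


def token_request(ttl_seconds=21600):
    """Build the PUT request that asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def metadata_request(path, token):
    """Build the GET request for a metadata path, carrying the token."""
    return urllib.request.Request(
        f"{IMDS}/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def read_metadata(path, timeout=2):
    """Fetch one metadata value; this only works from inside an EC2 instance."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(metadata_request(path, token), timeout=timeout) as resp:
        return resp.read().decode()
```

On an instance, `read_metadata("instance-id")` would return the same value as the curl example. Anywhere else the call fails because nothing answers at the link-local address.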

IMDSv1 vs IMDSv2 #

The old way was a tokenless GET (IMDSv1). After repeated SSRF incidents in which attackers scraped IAM role credentials through it, IMDSv2 arrived: PUT for a session token, then GET with that token. New instances should be IMDSv2-only.

Force IMDSv2

aws ec2 modify-instance-metadata-options \
  --instance-id i-0abc... \
  --http-tokens required \
  --http-endpoint enabled

Building an AMI: bake the base image #

To launch the same setup fast and many times, you have two paths.

Build an AMI: snapshot an already-configured instance into a new AMI
User data + IaC: start from a stock AMI and run setup scripts at boot

Building an AMI #

Right-click the instance in the console → “Create image”, or:

Build an AMI

aws ec2 create-image \
  --instance-id i-0abc... \
  --name "my-app-2026-04-19" \
  --description "Node 20 + nginx + my-app v1.2.3" \
  --no-reboot     # optional — without reboot (disk consistency may be slightly weaker)

The resulting AMI is:

An EBS snapshot of the instance + metadata
New instances launched from it start with the same disk state
Per region — use copy-image for other regions

User data — boot script #

Instead of building your own AMI, launch from a stock OS image and use a boot script for setup. More flexible than an AMI, and easier to track changes.

Example user data — Amazon Linux 2023

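#!/bin/bash
yum update -y
yum install -y nginx
systemctl enable --now nginx

# Pull app code (the instance role needs read access to the bucket)
mkdir -p /opt/myapp   # tar -C fails if the target directory is missing
aws s3 cp s3://my-bucket/app.tar.gz /tmp/
tar -xzf /tmp/app.tar.gz -C /opt/myapp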

By default, user data runs once, on the first boot. Its output is logged to /var/log/cloud-init-output.log.

Golden AMI vs User data #

	Golden AMI	User data
Boot speed	Fast	Slow (script execution time)
Change management	Rebuild AMI	Edit script
Reproducibility	Very high	External deps (yum repo, S3) can drift
Best for	Fast ASG scaling / stable	Dev / quick changes

In production, both are used together — the golden AMI bakes OS / dependencies, and user data slots in the app version.
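One way to picture that split is a user data template where only the app version varies. This is a sketch under assumptions: the bucket name, the `app-<version>.tar.gz` object naming, and the function name are hypothetical, and nginx is assumed to be baked into the golden AMI already.

```python
from string import Template

# Only the app-specific steps remain; OS packages live in the golden AMI.
USER_DATA = Template("""#!/bin/bash
set -euo pipefail
mkdir -p /opt/myapp
aws s3 cp "s3://$bucket/app-$version.tar.gz" /tmp/app.tar.gz
tar -xzf /tmp/app.tar.gz -C /opt/myapp
systemctl restart nginx
""")


def render_user_data(bucket, version):
    """Return the boot script for one app version."""
    return USER_DATA.substitute(bucket=bucket, version=version)


print(render_user_data("my-bucket", "1.2.3"))
```

Rolling out a new version then means changing one string in the launch configuration, not rebuilding the AMI.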

Auto Scaling Group — automatic recovery #

When an instance dies, the ASG (Auto Scaling Group) is what launches a replacement and re-attaches it to the ALB.

Where ASG sits

Launch Template (instance template: AMI, type, SG, key, user data)
        │
        ▼
   ┌─────────┐
   │   ASG   │  desired=3  min=2  max=10
   └─────────┘
        │
        ├─── EC2 (AZ a)  ← failed health check → terminate + relaunch
        ├─── EC2 (AZ b)
        └─── EC2 (AZ b)

Just the basics:

Launch Template — defines what kind of EC2 to launch (AMI, type, SG, IAM, user data)
Desired / Min / Max — number to maintain, minimum, maximum
Health Check — by EC2 itself (EC2) or by ALB target group (ELB)
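How these three settings interact can be sketched as a single reconciliation pass. This is a toy model, not the actual ASG algorithm: the function name and instance IDs are hypothetical, and real ASGs also handle cooldowns, AZ balancing, and scaling policies.

```python
def reconcile(instances, desired, minimum, maximum):
    """Return (kept_ids, number_to_launch) for one reconciliation pass.

    `instances` maps instance ID -> True (healthy) / False (unhealthy).
    """
    target = max(minimum, min(desired, maximum))  # clamp desired into [min, max]
    healthy = [iid for iid, ok in instances.items() if ok]
    kept = healthy[:target]          # surplus healthy instances are dropped
    to_launch = target - len(kept)   # replacements for dead or missing ones
    return kept, to_launch


fleet = {"i-a": True, "i-b": False, "i-c": True}  # hypothetical IDs
print(reconcile(fleet, desired=3, minimum=2, maximum=10))
# (['i-a', 'i-c'], 1)
```

The unhealthy instance is not repaired. It is dropped and one fresh instance is launched from the Launch Template, which is why the template has to describe a fully working instance.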

If you run containers, Advanced #1 ECS / Fargate is a smoother alternative to managing an ASG yourself: ECS absorbs the ASG’s role for containers.

Common pitfalls #

1) “Why can’t the ALB reach my EC2?” #

Checklist (top to bottom):

ALB SG outbound allows the EC2’s port (it has to line up with the EC2 SG inbound rule)
EC2 SG inbound has the ALB SG as a source
The OS-level firewall on the EC2 (firewalld, ufw) also allows the port
The ALB target group’s health check path returns 200
The EC2 is listening on that port (ss -tlnp)

It’s usually the first or second item. If the SG rule is written by IP, switching it to an SG reference is the operational fix.
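For a quick test from another machine in the VPC, a plain TCP connect narrows things down. The helper below is a hypothetical sketch using only the standard library. A False result does not say which layer blocked the connection, only that something between the SG, the OS firewall, and the listener did.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, unreachable hosts
        return False
```

Run from a host that carries the ALB’s SG (or one the app SG allows), `port_is_open("10.0.1.25", 8080)` with the instance’s private IP (made up here) separates "nothing can connect" from "only the health check fails".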

2) “I lost the key but I need to get inside” #

Session Manager is on → aws ssm start-session
Not on → stop the instance → detach EBS → mount on another EC2 → edit ~/.ssh/authorized_keys → reattach
Or snapshot the EBS and launch a new instance with a new key

3) Outbound left wide → data exfiltration #

If outbound is fully open when an EC2 is compromised, the attacker can send data to any IP. For DB servers / internal systems, narrow outbound is the best practice.

4) NACL deny that breaks responses #

You forgot NACL is stateless and only allowed outbound → inbound responses get blocked. Almost always: leave NACLs at default and only touch SGs.

5) IMDSv1 still on #

An old AMI / setup is running IMDSv1 → SSRF surface. Apply --http-tokens required to all instances.

6) AMI too big, slow boot #

You snapshot a long-running instance straight into an AMI and it grows to 5GB+. Boot time goes up. Before the AMI:

Clean logs / cache / temp files (yum clean all, etc.)
cloud-init clean (so init runs again on next boot)
Empty swap / journal

Wrap-up #

What we took home this time:

SG = per-instance stateful firewall. SG → SG is more powerful than IP
Defaults are inbound blocked + outbound allowed. Narrowing outbound is data-exfil defense
NACLs are per-subnet, stateless. Hardly touched. Stateless means ephemeral ports need explicit allow
key pair is the old standard. Lost keys / sharing / port 22 exposure are its limits
SSM Session Manager is the new standard. No port 22, no key, IAM auth, audit logs automatic
Force IMDSv2 — SSRF defense
Use an AMI to bake the base image + user data for boot setup, often both
ASG for automatic recovery. Launch Template + desired/min/max + health check
Pitfalls — 5-step ALB→EC2 check, key-less recovery, outbound all, NACL stateless, IMDSv1, oversized AMI

Next — S3 #

The EC2 piece is set. Now we move to the most common thing EC2 reaches into — object storage.

In #3 S3 — static hosting, presigned URL we’ll line up the shape of a bucket, policies and Public Access Block, static site hosting, presigned URLs, and other daily patterns.