AWS Certified CloudOps Engineer - Associate (SOA-C03) #10 Domain 4-1 Networking — VPC Operations and Connectivity Troubleshooting

With #9 we finished the deployment and automation domain. The fourth domain is networking and content delivery (18%). In operations, networking questions almost always start with “it won’t connect.” So the goal of this post is not to memorize VPC components but to lock in the operational procedure: where to start and in what order to trace a connectivity failure.

VPC Components Quick Summary

| Component | Role |
|---|---|
| Subnet | A range of IP addresses within a VPC, confined to a single AZ. Either public or private |
| Route Table | Decides where traffic goes |
| Internet Gateway (IGW) | VPC ↔ internet, bidirectional |
| NAT Gateway | Outbound internet access for private subnets (connections initiated from the internet are not allowed) |
| Security Group | Firewall at the instance (ENI) level |
| NACL | Firewall at the subnet level |

The difference between a public and a private subnet comes down to just one thing: whether the route table has a path to the IGW (0.0.0.0/0 → igw). Even if an instance has a public IP, without that route it can’t reach the internet.
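To make the “one route decides it” rule concrete, here is a minimal Python sketch. It assumes route tables are plain dicts shaped like entries of the EC2 `DescribeRouteTables` response (`Routes`, `DestinationCidrBlock`, `GatewayId`, `NatGatewayId`, `State`); the function names and example IDs are hypothetical, and only the IPv4 default route is checked.

```python
from typing import Any

# Assumption: route tables are plain dicts shaped like one entry of the EC2
# DescribeRouteTables response ("Routes" list with DestinationCidrBlock,
# GatewayId / NatGatewayId, and State). IPv4 default route only in this sketch.
RouteTable = dict[str, Any]


def has_igw_default_route(route_table: RouteTable) -> bool:
    """True if 0.0.0.0/0 points at an internet gateway and the route is active."""
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        # IGW IDs start with "igw-"; a detached IGW leaves the route in "blackhole".
        if route.get("GatewayId", "").startswith("igw-") and route.get("State") == "active":
            return True
    return False


def classify_subnet(route_table: RouteTable) -> str:
    # "Public" is decided by this route alone, not by public IPs on instances.
    return "public" if has_igw_default_route(route_table) else "private"


def can_reach_internet_directly(route_table: RouteTable, has_public_ip: bool) -> bool:
    # Both are required: a public IP without the IGW route goes nowhere.
    return has_public_ip and has_igw_default_route(route_table)


# Made-up example tables.
public_rt = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0example", "State": "active"},
]}
private_rt = {"Routes": [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0example", "State": "active"},
]}

print(classify_subnet(public_rt))                     # public
print(classify_subnet(private_rt))                    # private
print(can_reach_internet_directly(private_rt, True))  # False
```

A default route to a NAT Gateway does not make a subnet public, and a `blackhole` route (for example after the IGW is detached) does not count either.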

Security Group vs NACL

Half of all connectivity troubleshooting comes down to these two, so you need to know the differences precisely.

| Item | Security Group | NACL |
|---|---|---|
| Scope | Instance (ENI) | Subnet |
| State | Stateful (return traffic automatically allowed) | Stateless (responses also need rules) |
| Rules | Allow only | Both allow and deny |
| Evaluation | All rules evaluated together | In rule-number order, lowest first; the first match wins |

The most important difference is statefulness. A security group is stateful, so if you allow inbound traffic, the outbound response is automatically allowed. A NACL is stateless, so you must open both inbound and outbound, including the ephemeral ports (1024-65535) that clients use to receive responses. The classic scenario “the security group rules are correct but the response is blocked by the NACL” hinges on exactly these ephemeral ports.
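The statefulness difference can be modeled in a few lines. This is a simplified sketch, assuming only destination ports matter (protocol, CIDR ranges, and security group outbound rules are ignored); `NaclRule`, `nacl_allows`, `sg_allows`, and `https_request_succeeds` are hypothetical names, not an AWS API. It shows why an outbound NACL rule that merely mirrors inbound port 443 breaks the response, while the security group needs nothing extra.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int     # evaluated lowest first; the first match wins
    port_from: int
    port_to: int
    allow: bool     # NACLs can allow or deny


def nacl_allows(rules: list[NaclRule], port: int) -> bool:
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False  # the catch-all "*" rule denies anything unmatched


def sg_allows(inbound_ports: set[int], port: int) -> bool:
    # Security groups are allow-only and all rules are combined; order is irrelevant.
    return port in inbound_ports


def https_request_succeeds(sg_inbound: set[int], nacl_in: list[NaclRule],
                           nacl_out: list[NaclRule], client_port: int) -> bool:
    server_port = 443
    # Request: client ephemeral port -> server 443. NACL and SG must both allow it.
    if not (nacl_allows(nacl_in, server_port) and sg_allows(sg_inbound, server_port)):
        return False
    # Response: server 443 -> client's ephemeral port. The stateful SG lets it out
    # automatically; the stateless NACL needs an explicit outbound rule for it.
    return nacl_allows(nacl_out, client_port)


sg_inbound = {443}
nacl_in = [NaclRule(100, 443, 443, True)]
nacl_out_mirror = [NaclRule(100, 443, 443, True)]        # mirrors inbound: wrong
nacl_out_ephemeral = [NaclRule(100, 1024, 65535, True)]  # opens ephemeral ports

print(https_request_succeeds(sg_inbound, nacl_in, nacl_out_mirror, client_port=50432))     # False
print(https_request_succeeds(sg_inbound, nacl_in, nacl_out_ephemeral, client_port=50432))  # True
```

Because NACL rules are evaluated by rule number, a low-numbered deny (say rule 100 denying port 22) still wins over a broader allow at rule 200.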

External Connectivity for Private Subnets

A private instance has two paths to communicate with the outside.

| Purpose | Means |
|---|---|
| Outbound to the internet (patches, external APIs) | NAT Gateway |
| Private connection to AWS services (S3, DynamoDB, SSM, etc.) | VPC endpoint |

VPC Endpoints

| Type | Targets | Behavior |
|---|---|---|
| Gateway endpoint | S3, DynamoDB | Adds a route to the route table (free) |
| Interface endpoint (PrivateLink) | Most services (SSM, ECR, etc.) | Provides a private IP in your subnet via an ENI (paid) |

The value of an endpoint is that traffic doesn’t traverse the internet. It addresses both cost (lower NAT Gateway data processing charges) and security. The answer to “a private instance accesses S3; reduce NAT cost” is an S3 Gateway endpoint, and the answer to “SSM from #8 doesn’t work from a private subnet” is an Interface endpoint.
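If you manage endpoints from code, the two endpoint types take different parameters. The sketch below builds keyword arguments for boto3's `ec2.create_vpc_endpoint` (`VpcEndpointType`, `VpcId`, `ServiceName`, `RouteTableIds`, `SubnetIds`, `SecurityGroupIds`, `PrivateDnsEnabled`). The helper names and all IDs are hypothetical placeholders, and the SSM endpoint list (`ssm`, `ssmmessages`, `ec2messages`) follows what AWS documentation has listed; newer SSM Agent versions may not need all three, so check the current docs before relying on it.

```python
from typing import Any


def endpoint_requests(region: str, vpc_id: str, route_table_ids: list[str],
                      subnet_ids: list[str], sg_ids: list[str]) -> list[dict[str, Any]]:
    """Keyword arguments for EC2 CreateVpcEndpoint: S3 gateway + SSM interface endpoints."""
    requests: list[dict[str, Any]] = [{
        # Gateway endpoint (free): adds an S3 prefix-list route to these route tables.
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": route_table_ids,
    }]
    # Interface endpoints (paid): an ENI with a private IP in each listed subnet.
    # Assumption: these three are the endpoints needed for Systems Manager here.
    for service in ("ssm", "ssmmessages", "ec2messages"):
        requests.append({
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{service}",
            "SubnetIds": subnet_ids,
            # The endpoint's SG must allow inbound HTTPS (443) from the instances.
            "SecurityGroupIds": sg_ids,
            # Makes the normal service hostnames resolve to the endpoint ENIs
            # (requires DNS support and DNS hostnames enabled on the VPC).
            "PrivateDnsEnabled": True,
        })
    return requests


def create_endpoints(region: str, requests: list[dict[str, Any]]) -> list[str]:
    """Create the endpoints; needs boto3 and AWS credentials."""
    import boto3  # imported here so endpoint_requests() works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    return [ec2.create_vpc_endpoint(**req)["VpcEndpoint"]["VpcEndpointId"] for req in requests]


reqs = endpoint_requests("ap-northeast-1", "vpc-0example", ["rtb-0example"],
                         ["subnet-0example"], ["sg-0example"])
for req in reqs:
    print(req["VpcEndpointType"], req["ServiceName"])
```

With boto3 installed and credentials configured, `create_endpoints("ap-northeast-1", reqs)` would create them. Note that the Gateway request carries route table IDs while the Interface requests carry subnets and security groups, which mirrors the route-vs-ENI distinction in the table.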

VPC-to-VPC Connectivity

| Means | When |
|---|---|
| VPC Peering | Connects two VPCs 1:1. Not transitive |
| Transit Gateway | Connects many VPCs and on-premises networks through a hub. Large scale |
| Site-to-Site VPN | On-premises ↔ VPC (encrypted, over the internet) |
| Direct Connect | On-premises ↔ AWS over a dedicated line (low latency, stable) |

The key trap is that peering is not transitive. Even if A-B and B-C are peered, A cannot communicate with C. The answer to “the mesh connectivity is getting complex as VPCs grow” is Transit Gateway.
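The non-transitivity trap, and why meshes get out of hand, is easy to see with a small model. This is a sketch under simplifying assumptions: peering connections are unordered pairs, and the Transit Gateway is modeled as a single hub whose route table propagates every attachment's routes (with each VPC's route table pointing at it). The function names are hypothetical.

```python
from itertools import combinations


def peering_reachable(peerings: set[frozenset[str]], a: str, b: str) -> bool:
    # Non-transitive: only a direct peering connection between a and b counts.
    return frozenset((a, b)) in peerings


def tgw_reachable(attachments: set[str], a: str, b: str) -> bool:
    # Simplified hub model: assumes one TGW route table with route propagation
    # and VPC route tables that point at the TGW.
    return a in attachments and b in attachments


def full_mesh(vpcs: list[str]) -> set[frozenset[str]]:
    # Every pair needs its own peering connection: n * (n - 1) / 2 of them.
    return {frozenset(pair) for pair in combinations(vpcs, 2)}


peerings = {frozenset(("A", "B")), frozenset(("B", "C"))}
print(peering_reachable(peerings, "A", "B"))     # True
print(peering_reachable(peerings, "A", "C"))     # False: B does not relay
print(tgw_reachable({"A", "B", "C"}, "A", "C"))  # True

vpcs = [f"vpc-{i}" for i in range(10)]
print(len(full_mesh(vpcs)), "peerings vs", len(vpcs), "TGW attachments")  # 45 peerings vs 10 TGW attachments
```

The quadratic growth of the full mesh against the linear number of hub attachments is the practical reason Transit Gateway is the answer once VPC counts climb.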

Connectivity Troubleshooting Order

When you hit “I can’t connect to the instance,” the order you check in operations is generally as follows.

  1. Route table: Is there a route to the destination (IGW, NAT, endpoint)?
  2. Security group: Are the relevant port and source allowed inbound?
  3. NACL: Are both inbound and outbound (including ephemeral ports) allowed?
  4. Public IP / DNS: For access from the internet, verify the public IP and DNS resolution
  5. OS firewall: iptables or similar inside the instance
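The checklist can be written as an ordered sequence of checks that stops at the first failure. This is a hypothetical sketch: `ConnectivityFacts` stands in for answers you would gather by hand or from the console, and nothing here queries AWS.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ConnectivityFacts:
    # Hypothetical, already-gathered answers to each checklist question.
    route_to_destination: bool
    sg_allows_inbound: bool
    nacl_allows_inbound: bool
    nacl_allows_outbound_ephemeral: bool
    public_access: bool          # is the client coming from the internet?
    public_ip_and_dns_ok: bool
    os_firewall_allows: bool


# Checked in this order; the first failure is where to focus.
CHECKS: list[tuple[str, Callable[[ConnectivityFacts], bool]]] = [
    ("route table", lambda f: f.route_to_destination),
    ("security group", lambda f: f.sg_allows_inbound),
    ("NACL", lambda f: f.nacl_allows_inbound and f.nacl_allows_outbound_ephemeral),
    ("public IP / DNS", lambda f: not f.public_access or f.public_ip_and_dns_ok),
    ("OS firewall", lambda f: f.os_firewall_allows),
]


def first_failing_layer(facts: ConnectivityFacts) -> Optional[str]:
    for layer, passes in CHECKS:
        if not passes(facts):
            return layer
    return None  # every layer checks out


facts = ConnectivityFacts(
    route_to_destination=True,
    sg_allows_inbound=True,
    nacl_allows_inbound=True,
    nacl_allows_outbound_ephemeral=False,
    public_access=True,
    public_ip_and_dns_ok=True,
    os_firewall_allows=True,
)
print(first_failing_layer(facts))  # NACL
```

Stopping at the first failure matches the reasoning behind the order: there is no point inspecting an OS firewall if the packets never leave the route table.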

The diagnostic tools are also worth remembering.

  • VPC Flow Logs: Record IP traffic at the ENI, subnet, or VPC level. Each record shows ACCEPT or REJECT, giving you a clue about where traffic was blocked
  • Reachability Analyzer: Given a source and destination, it statically analyzes the configuration (no packets are sent) and reports which component along the path is blocking it
  • Network Access Analyzer: Checks for unintended network paths (exposure)

The answer to “traffic is rejected but I don’t know whether it’s the security group or the NACL” is VPC Flow Logs (check REJECT), and “validate in advance whether two resources can connect” is Reachability Analyzer.
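To illustrate how Flow Logs separate the two, here is a sketch that parses records in the default (version 2) format and applies a heuristic: if a request was ACCEPTed but the matching response was REJECTed, the stateful security group is unlikely to be the cause, so the stateless NACL is the likely culprit. The sample records (account ID, ENI ID, addresses) are made up, and `parse` and `diagnose` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Optional

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = ("version", "account_id", "interface_id", "srcaddr", "dstaddr", "srcport",
          "dstport", "protocol", "packets", "bytes", "start", "end", "action", "log_status")


@dataclass(frozen=True)
class FlowRecord:
    srcaddr: str
    dstaddr: str
    srcport: int
    dstport: int
    action: str  # "ACCEPT" or "REJECT"


def parse(line: str) -> Optional[FlowRecord]:
    values = line.split()
    if len(values) != len(FIELDS):
        return None
    rec = dict(zip(FIELDS, values))
    if rec["log_status"] != "OK":  # NODATA / SKIPDATA records carry "-" placeholders
        return None
    return FlowRecord(rec["srcaddr"], rec["dstaddr"], int(rec["srcport"]),
                      int(rec["dstport"]), rec["action"])


def diagnose(lines: list[str]) -> list[tuple[str, FlowRecord]]:
    records = [r for r in map(parse, lines) if r is not None]
    accepted = {(r.srcaddr, r.dstaddr, r.srcport, r.dstport)
                for r in records if r.action == "ACCEPT"}
    findings = []
    for r in records:
        if r.action != "REJECT":
            continue
        if (r.dstaddr, r.srcaddr, r.dstport, r.srcport) in accepted:
            # The request got in, so the stateful SG would have allowed the reply:
            # the stateless NACL's outbound rules are the likely culprit.
            findings.append(("response rejected: likely NACL outbound/ephemeral ports", r))
        else:
            findings.append(("request rejected: SG or NACL inbound", r))
    return findings


sample = [
    "2 123456789012 eni-0example 198.51.100.7 10.0.1.5 50432 443 6 5 400 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0example 10.0.1.5 198.51.100.7 443 50432 6 5 900 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0example 203.0.113.9 10.0.1.5 51515 22 6 1 40 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0example - - - - - - - 1700000000 1700000060 - NODATA",
]
for verdict, rec in diagnose(sample):
    print(f"{rec.srcaddr}:{rec.srcport} -> {rec.dstaddr}:{rec.dstport}  {verdict}")
```

A plain inbound REJECT with no earlier ACCEPT cannot be attributed to either layer from the log alone, which is where Reachability Analyzer helps.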

Exam Question Patterns

  • Instance has a public IP but no internet access → check the IGW route in the route table
  • Security group is open but the response is blocked → NACL ephemeral ports (stateless)
  • Access S3 from a private subnet and reduce NAT cost → S3 Gateway endpoint
  • Private connection to SSM or ECR from a private subnet → Interface endpoint
  • Many VPCs, connectivity is complex → Transit Gateway (peering is non-transitive)
  • Don’t know where traffic is blocked → VPC Flow Logs (REJECT)
  • Pre-validate whether a connection is possible → Reachability Analyzer

Common Pitfalls

1) Treating security groups and NACLs as the same thing

An SG is instance-level, stateful, and allow-only; a NACL is subnet-level, stateless, and supports both allow and deny. A NACL must also open the response (ephemeral) ports.

2) Thinking peering is transitive

Even with A-B and B-C peering, A-C requires its own direct peering connection or a Transit Gateway.

3) Mistaking NAT Gateway for handling inbound

NAT is outbound only. Connections initiated from the internet cannot reach a private subnet through NAT.

4) Assuming SSM works from a private subnet without an endpoint

Without an internet path (IGW or NAT), a private instance needs Interface endpoints to reach SSM or ECR.

Summary

What we covered in this post:

  • The difference between public and private is whether the route table has an IGW route
  • Security group (instance-level, stateful, allow only) vs NACL (subnet-level, stateless, allow/deny). A NACL also needs ephemeral ports open
  • Private external connectivity is NAT (outbound internet) and VPC endpoints (private connection to AWS services). Gateway (S3, DynamoDB, free) vs Interface (PrivateLink, paid)
  • VPC-to-VPC is peering (non-transitive) or Transit Gateway (hub); on-premises is VPN or Direct Connect
  • Troubleshooting goes route → SG → NACL → public IP → OS, in that order. Diagnose with Flow Logs and Reachability Analyzer

Next: Domain 4-2 Route 53 and CloudFront

Now that we’ve locked in connectivity inside the VPC, next up is DNS and content delivery.

In #11 Domain 4-2 Networking: Route 53, CloudFront, and Delivery Operations, I’ll cover Route 53 records and routing policies, health checks, CloudFront caching and origin configuration, TLS certificates (ACM), and content delivery troubleshooting.
