AWS Advanced #5: EventBridge / SQS / SNS
If #3 Lambda and #4 API Gateway were the synchronous side, this is the asynchronous / event-driven side. Callers don’t wait for a result; components are loosely connected via messages.
AWS has three tools for this — SNS / SQS / EventBridge. Their roles look similar but each emphasizes different things. This post draws the lines between them and covers the operational topics — fan-out, DLQ, idempotency.
The three at a glance #
| | SNS | SQS | EventBridge |
|---|---|---|---|
| Model | Pub/Sub (push) | Queue (pull) | Event Bus (push, with routing rules) |
| Receivers | 0–N subscribers in parallel | One per message | 0–N (per rule match) |
| Retention | Push immediately, lost if no receiver | Up to 14 days in queue | Push immediately on rule match |
| Where it fits | One event → many receivers | Work queue, backpressure | Event-based routing (multiple sources) |
| Integrations | Lambda, SQS, HTTP, Email, SMS | Lambda, ECS, EC2 (poll) | Lambda, SQS, SNS, ECS, Step Functions, … 25+ |
The most common pattern in one diagram:
producer ─→ SNS ──┬─→ SQS ─→ Lambda
(one event)       └─→ Lambda Worker (long processing)
                  (push immediately)
producer ─→ EventBridge ──┬─→ Lambda (rule A match)
(varied sources)          ├─→ Step Functions (rule B)
                          └─→ SQS (rule C)
SNS — Pub/Sub #
Publish to a topic and all subscribers get it immediately.
Model #
publish → Topic ──┬─→ subscriber 1 (Lambda)
                  ├─→ subscriber 2 (SQS)
                  ├─→ subscriber 3 (Email)
                  ├─→ subscriber 4 (HTTP endpoint)
                  └─→ subscriber 5 (SMS)
Use this when one event happens and many components need to know about it at the same time.
Create + publish #
TOPIC_ARN=$(aws sns create-topic --name user-events \
--query TopicArn --output text)
# Lambda subscriber
aws sns subscribe \
--topic-arn $TOPIC_ARN \
--protocol lambda \
--notification-endpoint arn:aws:lambda:ap-northeast-2:123456789012:function:on-user-event
# Email subscriber (active after confirmation email)
aws sns subscribe \
--topic-arn $TOPIC_ARN \
--protocol email \
--notification-endpoint admin@example.com
# Publish
aws sns publish \
--topic-arn $TOPIC_ARN \
--message '{"userId":42,"action":"signup"}'
Standard vs FIFO #
- Standard: nearly unlimited throughput, no ordering, possible duplicates
- FIFO: in-group ordering, exactly-once. Throughput-limited (3,000 msg/s with batching)
A FIFO topic name needs the .fifo suffix, and MessageGroupId is required on every publish.
Message filtering #
Publish to the same topic, but deliver only to the subscribers whose filter matches:
aws sns set-subscription-attributes \
--subscription-arn $SUB_ARN \
--attribute-name FilterPolicy \
--attribute-value '{"event": ["signup", "purchase"]}'
SNS delivers to this subscription only when the message's event attribute (set under MessageAttributes at publish time) is one of the listed values.
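The publish command shown earlier sets no message attributes, so a subscription with this filter would not receive it. A minimal sketch of the idea, assuming exact-match string filtering only (real filter policies support more operators, which are ignored here). `publish_kwargs` and `matches_filter` are hypothetical helper names, and the topic ARN is assembled from the example account and region for illustration; the dict shape follows what boto3's `sns.publish` accepts.

```python
import json


def publish_kwargs(topic_arn, payload, event_name):
    """Build keyword arguments for an SNS publish call.

    The attribute name "event" matches the FilterPolicy key used above.
    """
    return {
        "TopicArn": topic_arn,
        "Message": json.dumps(payload),
        "MessageAttributes": {
            "event": {"DataType": "String", "StringValue": event_name},
        },
    }


def matches_filter(filter_policy, message_attributes):
    """Simplified model of attribute filtering (exact string match only)."""
    for key, allowed in filter_policy.items():
        attr = message_attributes.get(key)
        if attr is None or attr.get("StringValue") not in allowed:
            return False
    return True


TOPIC = "arn:aws:sns:ap-northeast-2:123456789012:user-events"  # example ARN
policy = {"event": ["signup", "purchase"]}
signup = publish_kwargs(TOPIC, {"userId": 42, "action": "signup"}, "signup")
logout = publish_kwargs(TOPIC, {"userId": 42, "action": "logout"}, "logout")

print(matches_filter(policy, signup["MessageAttributes"]))  # True
print(matches_filter(policy, logout["MessageAttributes"]))  # False
```

With boto3 installed, `boto3.client("sns").publish(**signup)` would send the first message with the attribute the filter looks for.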
Where it fits #
- One event → many handlers (email, notification, analytics, audit log)
- Email / SMS / Mobile Push notifications
- Entry to SQS fan-out
SQS — Queue #
Producer drops messages on a queue; one consumer polls and processes.
Model #
producer ─→ Queue ─→ consumer (Lambda / ECS Task)
                        ↑
                   pull (polling)
Pull-based model: **backpressure** is built in. When consumers are slow the queue simply grows, and the producer is unaffected in the meantime.
Create + send / receive #
QUEUE_URL=$(aws sqs create-queue --queue-name jobs \
--query QueueUrl --output text)
# Send
aws sqs send-message \
--queue-url $QUEUE_URL \
--message-body '{"job":"resize","key":"photo.jpg"}'
# Receive
aws sqs receive-message \
--queue-url $QUEUE_URL \
--max-number-of-messages 10 \
--wait-time-seconds 20 # long polling
# Delete (after processing)
aws sqs delete-message \
--queue-url $QUEUE_URL \
--receipt-handle <handle>
Auto-wire to Lambda #
Attach SQS as a Lambda trigger and the polling / deletion is handled by Lambda’s infrastructure.
aws lambda create-event-source-mapping \
--function-name process-jobs \
--event-source-arn arn:aws:sqs:ap-northeast-2:123456789012:jobs \
--batch-size 10 \
--maximum-batching-window-in-seconds 5
import json
def handler(event, context):
    for record in event["Records"]:
        body = json.loads(record["body"])
        process_job(body)  # application-specific; not defined here
    # Normal return → Lambda deletes the batch from the queue
    # Exception → the whole batch stays and reappears after the visibility timeout
Visibility Timeout — the most important piece #
When a consumer receives a message, it’s invisible to others for N seconds. Process + delete must finish in that window. Otherwise it surfaces back and another consumer can receive it → duplicate processing risk.
| Average processing time | Recommended Visibility Timeout |
|---|---|
| 1 sec | 30 sec (margin) |
| 30 sec | 60–90 sec |
| Variable / could be long | Use change-message-visibility to extend during processing |
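For the variable-length case, one option is a heartbeat that keeps extending the timeout while the work runs. The following is a sketch under assumptions, not a definitive implementation: `process_with_heartbeat` is a hypothetical helper, and `sqs` is any object exposing `change_message_visibility` and `delete_message` with boto3's SQS client keyword arguments.

```python
import threading


def process_with_heartbeat(sqs, queue_url, receipt_handle, work,
                           visibility_timeout=30, heartbeat_every=10.0):
    """Run work() while a background thread keeps the message invisible.

    The message is deleted only if work() returns normally.
    """
    stop = threading.Event()

    def heartbeat():
        # Event.wait returns False on timeout and True once stop is set
        while not stop.wait(heartbeat_every):
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=receipt_handle,
                VisibilityTimeout=visibility_timeout,
            )

    thread = threading.Thread(target=heartbeat, daemon=True)
    thread.start()
    try:
        result = work()
    finally:
        stop.set()
        thread.join()
    # Reached only when work() did not raise
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)
    return result
```

Keep `heartbeat_every` comfortably below `visibility_timeout` so an extension always lands before the message resurfaces. If `work()` raises, the message is left alone and returns to the queue once the last extension expires.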
Standard vs FIFO #
Same as SNS.
- Standard: nearly unlimited throughput, no ordering, at-least-once (rare duplicates)
- FIFO: in-group ordering, exactly-once (MessageDeduplicationId)
Dead Letter Queue (DLQ) #
Move failed messages to a separate queue. Operational must.
# create the DLQ
DLQ_URL=$(aws sqs create-queue --queue-name jobs-dlq \
--query QueueUrl --output text)
DLQ_ARN=$(aws sqs get-queue-attributes \
--queue-url $DLQ_URL --attribute-names QueueArn \
--query 'Attributes.QueueArn' --output text)
# attach redrive policy on main queue
aws sqs set-queue-attributes \
--queue-url $QUEUE_URL \
--attributes '{
"RedrivePolicy": "{\"deadLetterTargetArn\":\"'"$DLQ_ARN"'\",\"maxReceiveCount\":\"3\"}"
}'
After 3 failed receives (maxReceiveCount) the message moves to the DLQ. Humans then analyze / fix / re-queue.
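The shell quoting above is easy to get wrong because RedrivePolicy is a JSON string nested inside the attributes JSON. A small sketch of building the same value in Python; `redrive_attributes` is a hypothetical helper, and the DLQ ARN is the example one for the jobs-dlq queue.

```python
import json


def redrive_attributes(dlq_arn, max_receive_count=3):
    """Return the Attributes dict for a set-queue-attributes call.

    RedrivePolicy must itself be a JSON string, hence the nested json.dumps.
    """
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),
    }
    return {"RedrivePolicy": json.dumps(policy)}


attrs = redrive_attributes("arn:aws:sqs:ap-northeast-2:123456789012:jobs-dlq")
print(attrs["RedrivePolicy"])
```

With boto3, the result can be passed as `sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=attrs)`.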
Where it fits #
- Work queues (image processing, mail send, external API calls)
- Absorbing traffic spikes
- Decoupling components (producer doesn’t feel consumer’s speed)
EventBridge — Event Bus #
EventBridge routes events from many sources to many targets by rules.
Model #
event source (S3, EC2 state change, external SaaS, your code...)
        │
        ▼
Event Bus (default or custom)
        │
        ├─ Rule A (pattern: source=aws.s3, detail-type=Object Created)
        │     → Lambda
        ├─ Rule B (pattern: source=myapp, detail-type=user.signup)
        │     → SQS + Step Functions (both targets)
        └─ Rule C (cron: daily 03:00)
              → Lambda (batch job)
Two kinds of events #
1) AWS service events — published by AWS automatically.
- EC2 state change (running → stopped)
- S3 Object Created (once EventBridge notifications are enabled on the bucket)
- CodePipeline stage completion
- …
2) Custom events — your code publishes.
aws events put-events --entries '[{
"Source": "myapp.users",
"DetailType": "user.signup",
"Detail": "{\"userId\":42,\"plan\":\"pro\"}",
"EventBusName": "default"
}]'
Create a rule #
aws events put-rule \
--name on-user-signup \
--event-pattern '{
"source": ["myapp.users"],
"detail-type": ["user.signup"],
"detail": {"plan": ["pro"]}
}'
# add target
aws events put-targets \
--rule on-user-signup \
--targets 'Id=1,Arn=arn:aws:lambda:ap-northeast-2:123456789012:function:welcome-pro'
# scheduled rule (cron)
aws events put-rule \
--name nightly-cleanup \
--schedule-expression 'cron(0 18 * * ? *)' # UTC 18:00 = KST 03:00
aws events put-targets \
--rule nightly-cleanup \
--targets 'Id=1,Arn=arn:aws:lambda:ap-northeast-2:123456789012:function:cleanup'
A cleaner scheduling tool, EventBridge Scheduler, exists separately (up to 200,000 schedules / cron or rate / one-shot or recurring). Prefer Scheduler in new projects.
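Rule cron expressions are evaluated in UTC, which is why the nightly-cleanup rule uses hour 18 for 03:00 KST. A minimal sketch of that conversion, assuming a fixed-offset time zone with no daylight saving; `daily_cron_utc` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

KST = timezone(timedelta(hours=9))  # Korea Standard Time, fixed offset


def daily_cron_utc(local_hour, local_minute=0, tz=KST):
    """Return a daily cron expression in UTC for the given local time."""
    # The date is arbitrary; only the converted hour and minute are used
    local = datetime(2024, 1, 1, local_hour, local_minute, tzinfo=tz)
    utc = local.astimezone(timezone.utc)
    return f"cron({utc.minute} {utc.hour} * * ? *)"


print(daily_cron_utc(3))  # cron(0 18 * * ? *)
```

For zones with daylight saving a fixed offset drifts twice a year, so a schedule that carries its own time zone is the safer choice there.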
Schema Registry / Pipes #
For larger event ecosystems:
- Schema Registry: register / auto-discover event schemas. Type-safe clients via codegen
- EventBridge Pipes: SQS / Kinesis / DynamoDB Streams → (optional enrichment / filter) → target
Where it fits #
- React to AWS service events (S3 upload → process)
- Domain event routing (user.signup → 5 handlers)
- Scheduling (cron): Lambda + EventBridge Scheduler is standard
- SaaS integration (Datadog, PagerDuty, Stripe — managed integrations)
Fan-out pattern #
One of the most common ops patterns. One event → many async handlers.
producer ─→ SNS Topic ──┬─→ SQS Queue A ─→ Lambda A
                        ├─→ SQS Queue B ─→ Lambda B
                        └─→ SQS Queue C ─→ Lambda C
Why SNS + SQS? You could put Lambda directly on SNS, but inserting SQS in between gives you:
- Independent backpressure per handler (B slowing doesn’t affect A)
- Independent retry / DLQ per handler
- Messages survive in SQS for up to 14 days even if Lambda is briefly down
EventBridge has the same shape — one event matches multiple rules and lands at multiple targets.
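To make "one event matches multiple rules" concrete, here is a simplified matcher. It is a sketch under assumptions: only exact-value lists and nested objects are handled (real event patterns support more operators), the second rule and the target labels are hypothetical, and `matches` / `route` are illustrative names.

```python
def matches(pattern, event):
    """Simplified pattern match: dicts descend, lists hold allowed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


RULES = {
    "on-user-signup": {
        "pattern": {"source": ["myapp.users"],
                    "detail-type": ["user.signup"],
                    "detail": {"plan": ["pro"]}},
        "targets": ["lambda:welcome-pro"],
    },
    "audit-all-user-events": {  # hypothetical second rule
        "pattern": {"source": ["myapp.users"]},
        "targets": ["sqs:audit-queue"],
    },
}


def route(event):
    """Return every target whose rule pattern matches the event."""
    return [target
            for rule in RULES.values() if matches(rule["pattern"], event)
            for target in rule["targets"]]


event = {"source": "myapp.users", "detail-type": "user.signup",
         "detail": {"userId": 42, "plan": "pro"}}
print(route(event))  # ['lambda:welcome-pro', 'sqs:audit-queue']
```

An event with `"plan": "free"` would match only the broader rule, so each target receives exactly the events its own rule selects.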
Idempotency #
Core for async systems. The same message processed twice must produce the same result.
Why duplicates happen:
- SQS Standard is at-least-once — occasional duplicates
- Failure to finish inside Visibility Timeout → redeliver
- Lambda auto-retry (async invocation)
- Network / client retries
Pattern 1: naturally idempotent operations #
# idempotent (same outcome no matter how many times)
user.set_status("active")
s3.put_object(Bucket=b, Key=k, Body=data)
# not idempotent
user.add_credit(100)
queue.send(...)
Aim for naturally idempotent design.
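An increment such as `add_credit` can often be made idempotent by tying it to an operation ID, for example the message ID. A toy in-memory sketch of that idea; the `Account` class and `op_id` parameter are illustrative, and a real system would persist the balance and the applied IDs together in one transaction.

```python
class Account:
    """Toy in-memory account used only to illustrate the idea."""

    def __init__(self):
        self.credit = 0
        self.applied = set()  # IDs of operations already applied

    def add_credit(self, amount, op_id):
        """Apply the credit once per op_id; repeats are no-ops."""
        if op_id in self.applied:
            return False
        self.credit += amount
        self.applied.add(op_id)
        return True


acct = Account()
acct.add_credit(100, op_id="msg-001")
acct.add_credit(100, op_id="msg-001")  # redelivered message
print(acct.credit)  # 100
```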
Pattern 2: ID + DB marker #
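import json
import redis
r = redis.Redis()  # connection details omitted
def handler(event, context):
    for record in event["Records"]:  # SQS delivers messages in batches
        msg = json.loads(record["body"])
        msg_id = msg["id"]
        # already processed?
        if r.get(f"processed:{msg_id}"):
            continue  # skip
        process(msg)  # application-specific; not defined here
        r.setex(f"processed:{msg_id}", 86400, "1")  # keep 1 day
Lambda Powertools’ Idempotency utility wraps this nicely with a DynamoDB backend.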
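A check followed by a later write leaves a gap: two consumers holding the same message can both see "not processed" and both run. One way to close it is an atomic set-if-absent claim taken before processing. The sketch below uses an in-memory `ClaimStore` as a stand-in for such a store (with redis-py the analogous call would be `set(key, "1", nx=True, ex=86400)`); the class, `handle_once`, and the missing TTL are simplifications for illustration.

```python
import threading


class ClaimStore:
    """In-memory stand-in for an atomic set-if-absent store."""

    def __init__(self):
        self._lock = threading.Lock()
        self._keys = set()

    def claim(self, key):
        """Return True only for the first caller that claims the key."""
        with self._lock:
            if key in self._keys:
                return False
            self._keys.add(key)
            return True

    def release(self, key):
        with self._lock:
            self._keys.discard(key)


def handle_once(store, msg, process):
    key = f"processed:{msg['id']}"
    if not store.claim(key):
        return "skipped"
    try:
        process(msg)
    except Exception:
        store.release(key)  # let a later redelivery retry
        raise
    return "processed"


store, calls = ClaimStore(), []
threads = [threading.Thread(target=handle_once,
                            args=(store, {"id": "m1"}, calls.append))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # 1
```

The trade-off is that a duplicate arriving while the first attempt is still running is skipped, so the failure path has to release the claim, as done here.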
Pattern 3: FIFO Queue deduplication #
MessageDeduplicationId (or content-based dedup): a message sent with the same ID within the 5-minute deduplication interval is accepted but not delivered again.
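Content-based deduplication derives the ID from a SHA-256 hash of the message body. The following toy model illustrates the window behaviour; `content_dedup_id` and `DedupWindow` are hypothetical names, time is passed in explicitly, and the real service's bookkeeping is certainly more involved.

```python
import hashlib


def content_dedup_id(body):
    """SHA-256 hex digest of the body, used here as the dedup ID."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class DedupWindow:
    """Toy model of a 5-minute deduplication interval."""

    def __init__(self, interval_seconds=300):
        self.interval = interval_seconds
        self._seen = {}  # dedup ID -> time the message was first accepted

    def accept(self, dedup_id, now):
        first = self._seen.get(dedup_id)
        if first is not None and now - first < self.interval:
            return False  # duplicate inside the window
        self._seen[dedup_id] = now
        return True


window = DedupWindow()
body = '{"job":"resize","key":"photo.jpg"}'
print(window.accept(content_dedup_id(body), now=0))    # True
print(window.accept(content_dedup_id(body), now=120))  # False
print(window.accept(content_dedup_id(body), now=301))  # True
```

The last line shows the limit of this pattern: a retry that arrives after the window has passed is treated as a new message, so handlers with side effects still benefit from their own idempotency check.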
When to use which #
One event, many receivers #
→ SNS Fan-out (+SQS) or EventBridge
EventBridge for richer rules / filters / AWS service integration. Pure fan-out → SNS+SQS.
Work queue #
→ SQS
React to AWS service events #
→ EventBridge (S3, EC2, etc. publish automatically)
Scheduling #
→ EventBridge Scheduler (or EventBridge Rule)
Simple message between components (1:1) #
→ SQS
Email / SMS / Mobile Push #
→ SNS
External SaaS integration #
→ EventBridge (managed integrations) or EventBridge Pipes
Cost — short #
| | Price |
|---|---|
| SQS | $0.40 / 1M requests (Standard), $0.50 (FIFO) |
| SNS | $0.50 / 1M publishes (Email / SMS / Mobile separately) |
| EventBridge | $1.00 / 1M publishes (custom events). AWS service events free |
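10M SQS requests / month = $4. One message usually costs several requests (send, receive, delete), so budget a few times that per message. Small workloads are negligible.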
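The arithmetic can be sketched as follows, assuming the Standard price from the table, no batching, no free tier, and three requests per message (send, receive, delete). `sqs_monthly_cost` is a hypothetical helper, and the numbers are estimates rather than a bill.

```python
def sqs_monthly_cost(messages, requests_per_message=3, price_per_million=0.40):
    """Estimate monthly SQS request cost in USD, ignoring any free tier."""
    requests = messages * requests_per_message
    return round(requests / 1_000_000 * price_per_million, 2)


print(sqs_monthly_cost(10_000_000, requests_per_message=1))  # 4.0
print(sqs_monthly_cost(10_000_000))                          # 12.0
```

Batching sends, receives, and deletes (up to 10 messages per request) pulls the per-message figure back down.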
Common pitfalls #
1) Duplicate processing #
A non-idempotent operation (charge / credit) on SQS Standard can be processed twice. One charge becomes two. Always enforce idempotency or use FIFO.
2) Visibility Timeout miss #
Processing takes 60s but timeout is 30s → in-flight message resurfaces, another consumer takes it → processed twice. Set Visibility Timeout to processing time + margin.
3) No DLQ #
Without a DLQ, a failing message keeps cycling through the queue until its retention period expires (up to 14 days), and CloudWatch alarms keep firing the whole time. Add a DLQ to every queue and set an alarm on it. When the DLQ has messages, a human needs to investigate.
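One way to express "alarm when the DLQ is non-empty" is a CloudWatch alarm on the queue's visible-message count. The sketch below only builds the parameters; `dlq_alarm_params` is a hypothetical helper, and the alarm name, period, and ops-alerts topic ARN are illustrative choices.

```python
def dlq_alarm_params(dlq_name, sns_topic_arn):
    """Keyword arguments for an alarm that fires when the DLQ has messages."""
    return {
        "AlarmName": f"{dlq_name}-not-empty",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": dlq_name}],
        "Statistic": "Maximum",
        "Period": 300,              # seconds per evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


params = dlq_alarm_params(
    "jobs-dlq",
    "arn:aws:sns:ap-northeast-2:123456789012:ops-alerts",  # hypothetical topic
)
print(params["AlarmName"])  # jobs-dlq-not-empty
```

With boto3, `boto3.client("cloudwatch").put_metric_alarm(**params)` would create the alarm; in an IaC setup the same fields map onto the alarm resource.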
4) Lambda’s SQS polling looks expensive #
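Lambda keeps polling even a quiet queue, and those empty receives are billed as SQS requests. Lambda's event source mapping already uses long polling. For your own consumers, set WaitTimeSeconds=20 explicitly, because the queue default is 0 (short polling).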
5) Too many EventBridge rules #
A rule per domain event → console chaos. Prefer per-domain custom event buses (not the default bus), managed by IaC.
6) The role of SNS Email / SMS #
SNS Email is plain (text only, hard to change From) and prone to spam classification. For transactional mail (signup etc.), Amazon SES is the answer. SNS Email is fine only for ops alerts.
7) Notifying the client about async results #
API → SQS → Lambda processing → how does the client get the result? Two patterns:
- Client polls (GET by job ID)
- WebSocket (API Gateway WebSocket API) push
This is something you must decide up front when designing async systems.
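The polling option can be sketched with an in-memory job table. Everything here is illustrative: `JOBS` stands in for a real store (for example a database table), and `submit_job`, `complete_job`, and `get_job` are hypothetical handler names.

```python
import uuid

JOBS = {}  # stand-in for a persistent job table


def submit_job(payload):
    """API side: record the job and return an ID the client can poll."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "queued", "result": None}
    # A real handler would also send {"jobId": job_id, **payload} to SQS here
    return job_id


def complete_job(job_id, result):
    """Worker side: store the result once processing finishes."""
    JOBS[job_id] = {"status": "done", "result": result}


def get_job(job_id):
    """GET by job ID: what the client polls."""
    return JOBS.get(job_id, {"status": "not_found", "result": None})


job_id = submit_job({"job": "resize", "key": "photo.jpg"})
print(get_job(job_id)["status"])  # queued
complete_job(job_id, {"url": "resized/photo.jpg"})
print(get_job(job_id)["status"])  # done
```

The WebSocket variant keeps the same job table but has the worker push the result to the client's connection instead of waiting for the next poll.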
Wrap-up #
Here is what this post covered:
- The three — SNS (pub/sub), SQS (queue), EventBridge (rule routing)
- SNS — publish to topic, instant push to many subscribers. Email / SMS / Mobile Push integration
- SQS — queue + polling. Visibility Timeout is the most important setting
- DLQ — a must on every queue. Move messages after maxReceiveCount failures
- Standard vs FIFO — Standard is at-least-once / unlimited throughput. FIFO is ordered + dedup + throttled
- EventBridge — AWS service events + custom events. Rule patterns for routing. Scheduler for cron
- Fan-out pattern — SNS → many SQS → respective Lambdas. Backpressure / retry / DLQ independent per receiver
- Idempotency — required in at-least-once. Naturally idempotent / DB markers / FIFO dedup
- Choice guide — one event N receivers (SNS / EventBridge), work queue (SQS), AWS event reactions (EventBridge), scheduling (EventBridge Scheduler), email (SES, not SNS)
- Cost — all very cheap. Negligible for small workloads
- Pitfalls — duplicate processing, Visibility Timeout miss, missing DLQ, polling cost (long polling), rule sprawl, role of SNS Email, async result delivery
Up next — secrets and configuration #
Next is a serious operational topic — where do passwords and config values live.
#6 Secrets Manager / Parameter Store covers the differences between the two, automatic rotation, fetching from code, IaC integration, and cost — AWS’s two secret-management tools, all in one piece.