Chapter 19

EventBridge / SQS / SNS

AWS's messaging infrastructure, all in one place. We cover the difference between the three tools; SNS topics, SQS queues, and EventBridge buses and rules; the fan-out pattern; FIFO vs Standard; DLQs and idempotency; Visibility Timeout; and how they tie into Lambda / ECS.

If Chapter 17 Lambda basics and Chapter 18 API Gateway + Lambda were the realm of synchronous invocation, this chapter covers the asynchronous, event-driven side: the caller doesn’t wait for the result, and components are loosely coupled through messages.

AWS has three tools in this space — SNS / SQS / EventBridge. They look similar, but each has a different emphasis. In this chapter we draw the line between them precisely and bring together operational concerns like the fan-out pattern, DLQs, and idempotency. The queues and fan-out we set up here interlock with the configuration management of the next Chapter 20 Secrets Manager / Parameter Store and the workflows of Chapter 21 Step Functions intro.

The difference between the three #

               | SNS                                  | SQS                             | EventBridge
Model          | Pub/Sub (push)                       | Queue (pull)                    | Event Bus (push, routing rules)
Receiving side | 0 ~ N subscribers at once            | Only 1 (per message)            | 0 ~ N (per rule match)
Retention      | Immediate push, gone if not received | Held in the queue up to 14 days | Immediate push on rule match
When it fits   | One event → multiple receivers       | Work queue, backpressure        | Event-based routing (multiple sources)
Integration    | Lambda, SQS, HTTP, Email, SMS        | Lambda, ECS, EC2 (polling)      | Lambda, SQS, SNS, ECS, Step Functions, … 25+

Let’s look at the most common patterns at a glance.

the flow of the 3 configurations
   producer ─→ SNS ──┬─→ SQS ─→ Lambda worker (long processing)
   (one event)       └─→ Lambda (immediate push)

   producer ─→ EventBridge ──┬─→ Lambda (rule A match)
   (various sources)         ├─→ Step Functions (rule B)
                             └─→ SQS (rule C)

SNS — Pub/Sub #

Publish to a Topic and it’s delivered immediately to all subscribers.

Model #

SNS Topic
publish → Topic ──┬─→ subscriber 1 (Lambda)
                  ├─→ subscriber 2 (SQS)
                  ├─→ subscriber 3 (Email)
                  ├─→ subscriber 4 (HTTP endpoint)
                  └─→ subscriber 5 (SMS)

Use it when one event occurs and several components need to know about it at once.

Create + Publish #

SNS
TOPIC_ARN=$(aws sns create-topic --name user-events \
  --query TopicArn --output text)

# Lambda subscription (the function's resource policy must also allow sns.amazonaws.com to invoke it)
aws sns subscribe \
  --topic-arn $TOPIC_ARN \
  --protocol lambda \
  --notification-endpoint arn:aws:lambda:ap-northeast-2:123456789012:function:on-user-event

# Email subscription (activated after a confirmation email)
aws sns subscribe \
  --topic-arn $TOPIC_ARN \
  --protocol email \
  --notification-endpoint admin@example.com

# Publish
aws sns publish \
  --topic-arn $TOPIC_ARN \
  --message '{"userId":42,"action":"signup"}'

Standard vs FIFO #

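  • Standard: near-unlimited throughput, no ordering guarantee, duplicates possible
  • FIFO: ordering guaranteed within a message group, duplicates suppressed within the deduplication window. Throughput is limited (3,000 msg/s with batching)

A FIFO topic's name must end in .fifo, and every publish must carry a MessageGroupId. FIFO topics cannot deliver to email, SMS, or HTTP(S) endpoints; subscribe SQS queues instead.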

Message filtering #

Even when published to the same Topic, you can have a subscriber receive only some of them.

filter policy
aws sns set-subscription-attributes \
  --subscription-arn $SUB_ARN \
  --attribute-name FilterPolicy \
  --attribute-value '{"event": ["signup", "purchase"]}'

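By default the policy is matched against the MessageAttributes supplied at publish time, so a message is delivered only when its event attribute is signup or purchase. A message published without that attribute (for example with only --message set) does not reach a filtered subscriber. To filter on the message body instead, set the subscription's FilterPolicyScope to MessageBody.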
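
To make the matching rule concrete, here is a minimal sketch of how an exact-match filter policy can be evaluated. It is an illustration, not SNS's implementation: matches_filter_policy is a hypothetical helper, attributes are simplified to a plain dict of strings, and the prefix, numeric, and anything-but operators are left out.

```python
def matches_filter_policy(policy: dict, attributes: dict) -> bool:
    """Return True when every policy key has an attribute whose value is listed.

    Simplified: handles only exact string matches, not the other
    operators that SNS filter policies support.
    """
    for key, allowed_values in policy.items():
        # A missing attribute yields None, which is never in the allowed list.
        if attributes.get(key) not in allowed_values:
            return False
    return True


policy = {"event": ["signup", "purchase"]}

print(matches_filter_policy(policy, {"event": "signup"}))   # True
print(matches_filter_policy(policy, {"event": "logout"}))   # False
print(matches_filter_policy(policy, {}))                    # False
```

The last call shows why a message published without attributes is silently skipped by a filtered subscription.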

When it fits #

  • One event → multiple processings (email, notification, analytics, audit log)
  • Email / SMS / Mobile Push notifications
  • The entrance to an SQS fan-out

SQS — Queue #

Put a message in a Queue and a single consumer pulls it via polling and processes it.

Model #

SQS Queue
producer ─→ Queue ─→ consumer (Lambda / ECS Task)
                    pull (polling)

Because the consumer pulls at its own pace, backpressure comes naturally. If the consumer is slow, messages simply pile up in the Queue, and the producer is unaffected.

Create + Send / Receive #

SQS
QUEUE_URL=$(aws sqs create-queue --queue-name jobs \
  --query QueueUrl --output text)

# Send
aws sqs send-message \
  --queue-url $QUEUE_URL \
  --message-body '{"job":"resize","key":"photo.jpg"}'

# Receive
aws sqs receive-message \
  --queue-url $QUEUE_URL \
  --max-number-of-messages 10 \
  --wait-time-seconds 20  # long polling

# Delete (after processing is complete)
aws sqs delete-message \
  --queue-url $QUEUE_URL \
  --receipt-handle "$RECEIPT_HANDLE"  # the ReceiptHandle value from the receive-message output

Automatic connection with Lambda #

Attach SQS to Lambda as a trigger, and the Lambda infrastructure does the polling / deletion automatically.

SQS → Lambda
aws lambda create-event-source-mapping \
  --function-name process-jobs \
  --event-source-arn arn:aws:sqs:ap-northeast-2:123456789012:jobs \
  --batch-size 10 \
  --maximum-batching-window-in-seconds 5

Lambda handler
import json

def handler(event, context):
    for record in event["Records"]:
        body = json.loads(record["body"])
        process_job(body)  # your own processing function
    # normal exit → Lambda deletes the whole batch automatically
    # exception → the whole batch stays in the queue and is retried after the visibility timeout
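
Because an exception sends the whole batch back to the queue, one bad message can cause the other nine to be reprocessed. Lambda's partial batch response avoids this. The sketch below assumes the event source mapping was created with --function-response-types ReportBatchItemFailures; process_job and the sample event are hypothetical stand-ins.

```python
import json


def process_job(body: dict) -> None:
    """Hypothetical job processor; raises on bad input to simulate a failure."""
    if "job" not in body:
        raise ValueError("missing 'job' field")


def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process_job(json.loads(record["body"]))
        except Exception:
            # Report only this message as failed; the rest of the batch is deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


sample_event = {
    "Records": [
        {"messageId": "m-1", "body": '{"job": "resize", "key": "photo.jpg"}'},
        {"messageId": "m-2", "body": '{"key": "broken.jpg"}'},
    ]
}
print(handler(sample_event, None))
# {'batchItemFailures': [{'itemIdentifier': 'm-2'}]}
```

Only m-2 returns to the queue and counts toward the DLQ threshold; m-1 is deleted as usual.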

Visibility Timeout — the most important setting #

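When a consumer receives a message, it becomes invisible to other consumers for N seconds (the Visibility Timeout; the default is 30 seconds). Processing and deletion must finish within this time. Otherwise the message re-surfaces in the queue and another consumer receives it → risk of duplicate processing. When Lambda is the consumer, AWS recommends a Visibility Timeout of at least six times the function timeout.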

Setting                              | Recommendation
Average processing time 1 sec        | Visibility Timeout 30 sec (slack)
Average processing time 30 sec       | 60 ~ 90 sec
Processing that can become very long | Extend it during processing with change-message-visibility
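
The re-surfacing behavior is easier to see in a tiny simulation. The sketch below is an in-memory stand-in, not SQS itself: TinyQueue is a hypothetical class, and time is passed in explicitly as a number of seconds so the example runs instantly.

```python
class TinyQueue:
    """In-memory stand-in for a queue with a visibility timeout (not SQS itself)."""

    def __init__(self, visibility_timeout: float):
        self.visibility_timeout = visibility_timeout
        self.visible_at = {}  # message -> time at which it becomes visible again

    def send(self, message: str, now: float) -> None:
        self.visible_at[message] = now

    def receive(self, now: float):
        for message, visible_at in self.visible_at.items():
            if visible_at <= now:
                # Hide the message from other consumers for the timeout period.
                self.visible_at[message] = now + self.visibility_timeout
                return message
        return None

    def delete(self, message: str) -> None:
        self.visible_at.pop(message, None)


queue = TinyQueue(visibility_timeout=30)
queue.send("job-1", now=0)

first = queue.receive(now=0)     # consumer A takes the message
hidden = queue.receive(now=10)   # still invisible: consumer B gets nothing
second = queue.receive(now=60)   # A never deleted it, so it re-surfaced

print(first, hidden, second)     # job-1 None job-1
```

Had consumer A called delete before the 30 seconds ran out, the last receive would have returned None.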

Standard vs FIFO #

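The trade-off is the same as for SNS.

  • Standard: near-unlimited throughput, no ordering guarantee, at-least-once delivery (occasional duplicates)
  • FIFO: ordering within a message group, exactly-once processing via deduplication (MessageDeduplicationId); the queue name must end in .fifo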

Dead Letter Queue (DLQ) #

A DLQ collects the messages that repeatedly fail processing in a separate queue. Treat it as mandatory in production.

DLQ setup
# create the DLQ
DLQ_URL=$(aws sqs create-queue --queue-name jobs-dlq \
  --query QueueUrl --output text)
DLQ_ARN=$(aws sqs get-queue-attributes \
  --queue-url $DLQ_URL --attribute-names QueueArn \
  --query 'Attributes.QueueArn' --output text)

# redrive policy on the main queue
aws sqs set-queue-attributes \
  --queue-url $QUEUE_URL \
  --attributes '{
    "RedrivePolicy": "{\"deadLetterTargetArn\":\"'"$DLQ_ARN"'\",\"maxReceiveCount\":\"3\"}"
  }'

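Once a message has been received 3 times (maxReceiveCount) without being deleted, SQS moves it to the DLQ instead of delivering it again. A person then analyzes the DLQ’s messages, fixes the cause, and sends them back to the main queue (redrive).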
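
The nested quoting in the CLI call above is easy to get wrong, because RedrivePolicy is a JSON string inside a JSON document. As a minimal sketch, the same attribute can be built in Python with json.dumps. The redrive_attributes helper is hypothetical, and the ARN is the example account used throughout this chapter.

```python
import json


def redrive_attributes(dlq_arn: str, max_receive_count: int = 3) -> dict:
    """Build the Attributes value for a main queue's redrive policy."""
    policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": str(max_receive_count),  # sent as a string, as in the CLI example
    }
    # RedrivePolicy itself must be a JSON-encoded string, hence the inner dumps.
    return {"RedrivePolicy": json.dumps(policy)}


attrs = redrive_attributes("arn:aws:sqs:ap-northeast-2:123456789012:jobs-dlq")
print(json.dumps(attrs, indent=2))
```

The resulting dict has the shape boto3's set_queue_attributes expects for its Attributes parameter, so no manual escaping is needed.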

When it fits #

  • Work queues (image processing, mail sending, external API calls)
  • Absorbing traffic spikes
  • Component separation (the producer isn’t affected by the consumer’s speed)

EventBridge — Event Bus #

EventBridge routes events from many sources to many targets according to rules.

Model #

EventBridge
event source (S3, EC2 state change, external SaaS, my code ...)
    ↓
Event Bus (default or custom)
    ├─ Rule A (pattern: source=aws.s3, detail-type=Object Created)
    │     → Lambda
    ├─ Rule B (pattern: source=myapp, detail-type=user.signup)
    │     → SQS + Step Functions (both targets)
    └─ Rule C (cron: daily at 03:00)
          → Lambda (batch job)

Two kinds of events #

1) AWS service events — AWS publishes them automatically.

  • EC2 state change (running → stopped)
  • S3 Object Created (the bucket must have EventBridge notifications enabled)
  • CodePipeline stage completion

2) Custom events — your code publishes them.

publish a custom event
aws events put-events --entries '[{
  "Source": "myapp.users",
  "DetailType": "user.signup",
  "Detail": "{\"userId\":42,\"plan\":\"pro\"}",
  "EventBusName": "default"
}]'

Creating a Rule #

Rule (event-pattern based)
aws events put-rule \
  --name on-user-signup \
  --event-pattern '{
    "source": ["myapp.users"],
    "detail-type": ["user.signup"],
    "detail": {"plan": ["pro"]}
  }'

# add a target (the Lambda also needs a resource policy allowing events.amazonaws.com to invoke it)
aws events put-targets \
  --rule on-user-signup \
  --targets 'Id=1,Arn=arn:aws:lambda:ap-northeast-2:123456789012:function:welcome-pro'
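
To see which events the pattern above lets through, here is a simplified matcher. It is a sketch under stated assumptions, not EventBridge's engine: matches_pattern is a hypothetical helper that supports only exact values in lists and nested objects, and the sample events are hand-built. Note that in a delivered event, detail is an object even though put-events takes Detail as a JSON string.

```python
def matches_pattern(pattern: dict, event: dict) -> bool:
    """Simplified event-pattern check: nested dicts recurse, lists mean 'any of'."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches_pattern(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


pattern = {
    "source": ["myapp.users"],
    "detail-type": ["user.signup"],
    "detail": {"plan": ["pro"]},
}
pro_signup = {
    "source": "myapp.users",
    "detail-type": "user.signup",
    "detail": {"userId": 42, "plan": "pro"},
}
free_signup = {**pro_signup, "detail": {"userId": 43, "plan": "free"}}

print(matches_pattern(pattern, pro_signup))   # True
print(matches_pattern(pattern, free_signup))  # False
```

Fields that the pattern does not mention (userId here) are ignored, which is why a rule keeps matching when producers add new fields to an event.
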

Rule (cron / rate based: scheduler)
aws events put-rule \
  --name nightly-cleanup \
  --schedule-expression 'cron(0 18 * * ? *)'   # UTC 18:00 = KST 03:00

aws events put-targets \
  --rule nightly-cleanup \
  --targets 'Id=1,Arn=arn:aws:lambda:ap-northeast-2:123456789012:function:cleanup'

There’s a cleaner tool for scheduling, EventBridge Scheduler (up to 200,000 / cron or rate / one-time or recurring). For new projects, prefer Scheduler.

Schema Registry / Pipes #

Used in a large-scale event ecosystem.

  • Schema Registry: registers / auto-discovers the schemas of events. Use a code generator to make a type-safe client.
  • EventBridge Pipes: connects SQS / Kinesis / DynamoDB Streams → (optional enrichment / filter) → a target.

When it fits #

  • Reacting to AWS service events (S3 upload → processing)
  • Domain event routing (user.signup → 5 processings)
  • Scheduling (cron) — Lambda + EventBridge Scheduler is the standard
  • SaaS integration (managed integrations like Datadog, PagerDuty, Stripe)

The fan-out pattern #

One of the most common operational patterns. One event → multiple asynchronous processings.

SNS + SQS Fan-out
producer ─→ SNS Topic ──┬─→ SQS Queue A ─→ Lambda A
                        ├─→ SQS Queue B ─→ Lambda B
                        └─→ SQS Queue C ─→ Lambda C

Why SNS + SQS? You could attach Lambda directly to SNS, but inserting an SQS in between gives the following benefits.

  • Each processing’s backpressure is independent (even if B is slow, A is unaffected).
  • Each processing’s retry / DLQ is independent.
  • Even if a Lambda dies for a while, the message is held in SQS for up to 14 days.

EventBridge is the same pattern — when one event matches several rules, it’s delivered to several targets at once.
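
One practical detail of SNS + SQS fan-out: unless raw message delivery is enabled on the subscription, the SQS message body is an SNS envelope, and the published message sits inside its Message field as a string. The sketch below shows the double decoding a consumer needs in that case. extract_payload is a hypothetical helper, and the sample record is hand-built with only the fields used here.

```python
import json


def extract_payload(record: dict) -> dict:
    """Unwrap an SQS record that was delivered through an SNS subscription."""
    envelope = json.loads(record["body"])    # the SNS notification envelope
    return json.loads(envelope["Message"])   # the string the producer published


# Hand-built sample record; a real envelope carries more fields (TopicArn, etc.).
published = {"userId": 42, "action": "signup"}
record = {
    "messageId": "m-1",
    "body": json.dumps({"Type": "Notification", "Message": json.dumps(published)}),
}
print(extract_payload(record))  # {'userId': 42, 'action': 'signup'}
```

With raw message delivery turned on, the body is the published message itself and a single json.loads is enough.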

The importance of idempotency #

The heart of an asynchronous system. Even if you receive the same message twice, the result must be the same.

Why you receive it twice is as follows.

  • SQS Standard is at-least-once, so it occasionally duplicates.
  • If you can’t finish processing within the Visibility Timeout, it’s redelivered.
  • Lambda automatically retries failed asynchronous invocations (twice by default).
  • There are network / client retries.

Pattern 1: naturally idempotent operations #

# idempotent (same result no matter how many times)
user.set_status("active")
s3.put_object(Bucket=b, Key=k, Body=data)

# non-idempotent
user.add_credit(100)
queue.send(...)

Design for natural idempotency where possible.
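
The difference shows up as soon as the same message is applied twice. The following sketch uses a hypothetical in-memory User class to contrast the two kinds of operation.

```python
class User:
    """Hypothetical in-memory user record used only for this illustration."""

    def __init__(self):
        self.status = "pending"
        self.credit = 0

    def set_status(self, status: str) -> None:
        self.status = status      # assigning the same value twice changes nothing

    def add_credit(self, amount: int) -> None:
        self.credit += amount     # every call changes the result


user = User()
for _ in range(2):                # simulate the same message arriving twice
    user.set_status("active")
    user.add_credit(100)

print(user.status, user.credit)   # active 200
```

The status ends up correct, while the credit is doubled; the second kind of operation is what Patterns 2 and 3 are for.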

Pattern 2: ID + DB marker #

prevent duplicate processing by ID
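import json
import redis

r = redis.Redis()  # connection settings omitted

def handler(event, context):
    for record in event["Records"]:          # SQS delivers a batch of records
        msg = json.loads(record["body"])
        key = f"processed:{msg['id']}"

        # claim the ID atomically (SET NX); fails if it was already claimed
        if not r.set(key, "1", nx=True, ex=86400):  # keep for 1 day
            continue  # already processed → skip

        try:
            process(msg)
        except Exception:
            r.delete(key)  # release the claim so a retry can process it
            raise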

Lambda Powertools’s Idempotency handles this pattern cleanly with a DynamoDB backend.

Pattern 3: FIFO Queue deduplication #

Set a MessageDeduplicationId or enable content-based deduplication: within the 5-minute deduplication window, a message with the same ID is accepted but not delivered again.
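
The window semantics can be imitated in a few lines. This sketch is an in-memory imitation, not the SQS implementation: FifoDeduplicator is a hypothetical class, time is passed in as seconds, and content-based deduplication is modeled as a SHA-256 hash of the body.

```python
import hashlib
from typing import Optional

DEDUP_WINDOW_SECONDS = 300  # the 5-minute deduplication window


class FifoDeduplicator:
    """In-memory imitation of FIFO deduplication (not the SQS implementation)."""

    def __init__(self):
        self.seen = {}  # deduplication ID -> time it was first accepted

    def accept(self, body: str, now: float, dedup_id: Optional[str] = None) -> bool:
        # Content-based deduplication derives the ID from a SHA-256 of the body.
        key = dedup_id or hashlib.sha256(body.encode("utf-8")).hexdigest()
        first_seen = self.seen.get(key)
        if first_seen is not None and now - first_seen < DEDUP_WINDOW_SECONDS:
            return False  # duplicate inside the window: not delivered again
        self.seen[key] = now
        return True


dedup = FifoDeduplicator()
body = '{"orderId": 1001}'
print(dedup.accept(body, now=0))     # True
print(dedup.accept(body, now=120))   # False
print(dedup.accept(body, now=400))   # True
```

The third call shows the limit of this pattern: a retry that arrives after the window has passed is treated as a new message, so slow retries still need Pattern 1 or 2.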

Which tool when #

Case                                     | Tool
One event, multiple receivers            | SNS Fan-out (+ insert SQS) or EventBridge
Work queue                               | SQS
Reacting to AWS service events           | EventBridge (S3, EC2, etc. auto-publish)
Scheduling                               | EventBridge Scheduler (or EventBridge Rule)
Simple messages between components (1:1) | SQS
Email / SMS / Mobile Push                | SNS
External SaaS integration                | EventBridge (managed integrations) or EventBridge Pipes

EventBridge offers rich rules / filters / AWS service integrations. For just simple fan-out, it’s SNS + SQS.

Cost #

Service     | Price
SQS         | $0.40 / 1M requests (Standard), $0.50 (FIFO)
SNS         | $0.50 / 1M publishes (Email / SMS / Mobile separate)
EventBridge | $1.00 / 1M publishes (custom events), AWS’s own events free

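Sending, receiving, and deleting are each a request, so 1M SQS Standard messages/month come to about 3M requests, or roughly $1.20 before the free tier. A small workload is effectively negligible.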
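
The estimate is simple arithmetic on the per-request price. In this sketch, the three-requests-per-message figure (send, receive, delete) is an assumption, and batching, empty receives, and the free tier are ignored.

```python
SQS_STANDARD_PRICE_PER_MILLION = 0.40  # USD per 1M requests, from the price table


def sqs_monthly_cost(messages: int, requests_per_message: int = 3) -> float:
    """Estimate monthly SQS Standard cost, ignoring the free tier and batching."""
    requests = messages * requests_per_message  # send + receive + delete
    return requests / 1_000_000 * SQS_STANDARD_PRICE_PER_MILLION


print(f"${sqs_monthly_cost(1_000_000):.2f}")    # $1.20
print(f"${sqs_monthly_cost(10_000_000):.2f}")   # $12.00
```

Batching up to 10 messages per request lowers the real figure, while frequent empty receives raise it.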

Pitfalls you’ll often hit #

1) Duplicate message processing #

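If you run an operation that is not naturally idempotent (payment, credit addition, etc.) on SQS Standard, one payment can become two. Always apply an idempotency pattern. FIFO deduplication helps against duplicate sends, but it does not cover redelivery after a Visibility Timeout expires.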

2) Visibility Timeout miss #

If processing takes 60 seconds but the Visibility Timeout is 30 seconds, the message being processed re-surfaces in the queue and another consumer receives it → processed twice. Set the Visibility Timeout to the processing time + slack.

3) No DLQ #

Without a DLQ, a failing message keeps cycling through the queue until its retention period runs out, and all you get is noisy CloudWatch alarms. Put a DLQ on every queue, and put an alarm on the DLQ itself (a message arriving in the DLQ = a person needs to look).

4) Lambda’s SQS polling looks expensive #

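Polling an empty queue at high frequency incurs SQS request cost, because every empty receive is still a billed request. A Lambda event source mapping uses long polling on its own; for consumers you write yourself, check that long polling (WaitTimeSeconds=20, or the queue's ReceiveMessageWaitTimeSeconds attribute) is in effect.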

5) Too many EventBridge rules #

If rules grow with every domain event, the console becomes chaos. Separate a custom Event Bus (not the default) per domain and manage it with IaC.

6) The use of SNS Email / SMS #

SNS Email is plain in format (text only, hard to change the From) and easily classified as Spam. For transactional mail (signup, etc.), Amazon SES is the answer. Use SNS Email only for operator alerts and the like.

7) How to notify the client of an asynchronous processing result #

After API → SQS → Lambda processes it, how do you notify the client of the result? There are two patterns.

  • The client polls (a GET request with the job ID)
  • Push via WebSocket (API Gateway WebSocket API)

This is a design decision you have to make up front in an asynchronous system.
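
The polling variant comes down to three operations on a job record. The sketch below uses an in-memory dict as a stand-in for a database table; submit_job, complete_job, and get_job are hypothetical names, and the actual enqueue to SQS is omitted.

```python
import uuid

jobs = {}  # job ID -> job record; stands in for a database table


def submit_job(payload: dict) -> str:
    """API side: record the job, enqueue it (omitted), and return its ID at once."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued", "result": None, "payload": payload}
    return job_id


def complete_job(job_id: str, result: dict) -> None:
    """Worker side: store the outcome once processing finishes."""
    jobs[job_id].update(status="done", result=result)


def get_job(job_id: str) -> dict:
    """API side: what a GET with the job ID would return to a polling client."""
    job = jobs[job_id]
    return {"status": job["status"], "result": job["result"]}


job_id = submit_job({"job": "resize", "key": "photo.jpg"})
print(get_job(job_id))   # {'status': 'queued', 'result': None}
complete_job(job_id, {"thumbnail": "photo-thumb.jpg"})
print(get_job(job_id))   # {'status': 'done', 'result': {'thumbnail': 'photo-thumb.jpg'}}
```

The WebSocket variant keeps the same job record and replaces the client's repeated GET with a push from the worker when the status changes.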

Exercises #

  1. Suppose your system has a requirement that “when one user signs up, send a welcome email, an analytics record, and an admin notification at the same time.” Based on §“Which tool when,” describe in one paragraph which combination of SNS, SQS, and EventBridge you’d use (and note where it diverges from the synchronous invocation of Chapter 18 API Gateway + Lambda).
  2. Suppose you do payment processing with SQS Standard + Lambda. Pick which of the three patterns in §“The importance of idempotency” you’d apply, and explain how to set the Visibility Timeout to prevent the duplicate processing of §“Pitfalls you’ll often hit,” tying it together with the concurrency of Chapter 17 Lambda basics.
  3. Write in one sentence why every work queue needs a DLQ, basing it on §“Dead Letter Queue” and §“Pitfalls you’ll often hit,” and write what an operator should do when messages pile up in the DLQ.

In short: The asynchronous messaging tools split into SNS (pub/sub push), SQS (queue pull), and EventBridge (routing rules). SQS carries backpressure, Visibility Timeout is its most important setting, and every queue needs a DLQ. To send one event to several places, use an SNS fan-out into multiple SQS queues and then into each Lambda. EventBridge and Scheduler handle AWS service events and time-based scheduling. Because this is an at-least-once environment, you must guarantee idempotency with natural idempotency, DB markers, or FIFO dedup. For transactional mail, SES is the right choice, not SNS.

Next chapter #

The next Chapter 20 Secrets Manager / Parameter Store covers a serious axis of operations — where to store passwords and configuration values. It puts together AWS’s two secret-management tools: the difference between the two, automatic rotation, fetching from code, ECS / Lambda integration, IaC connection, and a cost comparison.
