Chapter 10

S3 — static hosting, presigned URLs

AWS's oldest object storage, S3. The shape of a bucket and the global uniqueness of its name, policies and Public Access Block, static site hosting, presigned URLs, and the patterns for lowering cost with storage classes.

If EC2 in Chapters 8 and 9 is the compute domain, then S3 (Simple Storage Service) is AWS’s object storage domain. Launched in 2006 as one of AWS’s first generally available services, it is among the oldest and still one of the most used.

S3 is effectively an infinite-capacity object store that behaves like a global file system (though directories are fake). Three things define it: 11 9’s (99.999999999%) of durability, a price of about $0.023 per GB per month in the Standard class, and its role as the data hub for every other AWS service.

In this chapter we start from S3’s structure and lay out, in one flow, policies and security, static hosting, presigned URLs, and storage classes. The static hosting covered here is completed with HTTPS and an Edge cache in Chapter 14 CloudFront, and the S3 PUT trigger carries into the event processing of Chapter 17 Lambda Basics.

Buckets and objects #

S3 has only two core things to remember.

A Bucket — a container that holds objects. Created per account / region.
An Object — the actual file. Identified by a key.

The shape of S3

my-bucket/                       ← bucket (name is globally unique)
  images/
    profile/2026/avatar-001.jpg  ← object (key = the full path)
    profile/2026/avatar-002.jpg
  videos/
    intro.mp4
  index.html

Directories actually don’t exist. The / in the picture above is part of the key. images/profile/2026/avatar-001.jpg is the full key of one object. The console just shows it folder-like, based on /.
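To make the "folders are just key prefixes" point concrete, here is a minimal sketch in plain Python. It does not call S3; the helper name list_level is hypothetical and only imitates how one folder level can be derived from flat keys by splitting on /.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Group flat keys into the 'folders' and 'files' visible at one level."""
    folders, files = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter is shown as a folder.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            files.append(key)
    return sorted(folders), sorted(files)


keys = [
    "images/profile/2026/avatar-001.jpg",
    "images/profile/2026/avatar-002.jpg",
    "videos/intro.mp4",
    "index.html",
]
print(list_level(keys))
print(list_level(keys, prefix="images/profile/2026/"))
```

Nothing in the key list is a directory object; the folder view is computed entirely from the strings.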

The global uniqueness of a bucket name #

A bucket name must be unique across all AWS accounts worldwide. A plain name like my-bucket has already been taken by someone.

Safe bucket names

my-company-dev-uploads-2026
acme-prod-static-ap-northeast-2

The rules are as follows.

3 ~ 63 characters, using lowercase letters, digits, hyphens (-), and dots (.). It must start and end with a letter or digit.
A dot (.) is allowed but causes problems with SSL certificate wildcards, so usually only - is used.
It can’t be in IP address form, and it can’t start with xn-- (Punycode).
Uppercase and underscores are not allowed.
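The naming rules above can be checked before calling AWS at all. The following is a rough sketch, not an official validator: the function name is hypothetical, and it additionally assumes two rules from the AWS naming guidelines that are not listed here (names start and end with a letter or digit, and never contain two adjacent dots).

```python
import re

# 3 to 63 characters: lowercase letters, digits, hyphens, dots;
# first and last character must be a letter or digit.
_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IPV4 = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the basic bucket naming rules."""
    if not _NAME.fullmatch(name):
        return False
    if _IPV4.fullmatch(name) or name.startswith("xn--"):
        return False
    return ".." not in name


for candidate in ["my-company-dev-uploads-2026", "My_Bucket", "192.168.0.1"]:
    print(candidate, is_valid_bucket_name(candidate))
```

Passing this check only means the name is well-formed; whether it is still free can only be learned by trying to create the bucket.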

Putting the environment / purpose / region / company name in the name makes billing and search easier.

A bucket is per-region #

A bucket name is globally unique, but the data lives in one region. If you create it in ap-northeast-2 (Seoul), the object data is inside the Seoul data center. The console shows which region it’s in. This distinction of global name / regional data is in the same vein as the global services vs regional services discussion in Chapter 1 Intro to AWS.

Cross-region replication is set up explicitly with S3 Replication (CRR, Cross Region Replication).

The core attributes of an object #

One object has the following.

Attribute	Role
Key	The object’s full path. Unique within the bucket
Body	The actual data (up to 5TB)
Content-Type	How the browser should handle it (image/jpeg, application/json)
Metadata	User-defined headers (`x-amz-meta-*`)
ACL	Object-level permissions (rarely used now, replaced by bucket policy)
Storage Class	The storage class (Standard, IA, Glacier, etc.)
Version ID	The version identifier, if versioning is on
ETag	A content hash (mostly MD5)

Uploading via console / CLI / SDK #

Upload / download with the aws cli

# a single file
aws s3 cp ./image.jpg s3://my-bucket/images/profile/avatar.jpg

# sync an entire folder
aws s3 sync ./public s3://my-bucket --delete

# specify Content-Type
aws s3 cp ./index.html s3://my-bucket/ --content-type "text/html; charset=utf-8"

# download
aws s3 cp s3://my-bucket/data.json ./

Python (boto3)

import boto3

s3 = boto3.client("s3")
s3.upload_file("image.jpg", "my-bucket", "images/avatar.jpg")
s3.download_file("my-bucket", "data.json", "data.json")
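Since Content-Type decides how a browser treats the object, it can help to set it on SDK uploads too. The sketch below is one possible approach: the helper names are hypothetical, the type is guessed from the file extension with the standard mimetypes module, and the client is passed in so the helper works with any boto3 S3 client (upload_file accepts ExtraArgs with a ContentType entry).

```python
import mimetypes


def extra_args_for(path: str) -> dict:
    """Guess a Content-Type from the file extension, with a generic fallback."""
    content_type, _ = mimetypes.guess_type(path)
    return {"ContentType": content_type or "application/octet-stream"}


def upload_with_type(s3_client, path: str, bucket: str, key: str) -> None:
    """Upload `path` and set its Content-Type; `s3_client` is a boto3 S3 client."""
    s3_client.upload_file(path, bucket, key, ExtraArgs=extra_args_for(path))


print(extra_args_for("index.html"))
print(extra_args_for("photo.jpg"))
```

With boto3 installed, usage would look like upload_with_type(boto3.client("s3"), "index.html", "my-bucket", "index.html").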

The four components of security #

S3’s security operates as overlapping layers. The list below runs from the broadest guardrail to the narrowest grant: a block at a higher layer cannot be undone by an Allow further down.

The evaluation order of S3 permissions

1. Public Access Block      ← most powerful. A block decision sits above everything
2. SCP (Organizations)      ← an account-level guard
3. IAM Policy               ← per user / role
4. Bucket Policy            ← per bucket
5. Object ACL               ← per object (old way, rarely used)

Public Access Block — first of all #

Public Access Block (PAB) is a safeguard that prevents a bucket from being made public by mistake. There are four options.

Option	Meaning
BlockPublicAcls	Prevent new ACLs from becoming public
IgnorePublicAcls	Ignore existing public ACLs
BlockPublicPolicy	Prevent new bucket policies from becoming public
RestrictPublicBuckets	Even if the bucket has a public policy, only AWS service principals and authorized users in the bucket owner’s account can access it

Since April 2023, all four are on by default for every new bucket, and turning them on at the account level as well is standard practice. Only buckets that are deliberately public, such as static hosting buckets, get them explicitly disabled.

Turning on account-level PAB

aws s3control put-public-access-block \
  --account-id 123456789012 \
  --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

Bucket Policy — a JSON policy #

A Bucket Policy is a JSON policy attached directly to a bucket. It defines who (Principal) can do what (Action) where (Resource).

Example bucket policy that receives ALB logs

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "logdelivery.elasticloadbalancing.amazonaws.com"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-alb-logs/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-acl": "bucket-owner-full-control"
        }
      }
    }
  ]
}

Allow read to the app IAM Role only

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/MyAppRole"
      },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ]
    }
  ]
}

IAM Policy #

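A policy attached to an IAM user / role. It combines with the Bucket Policy to produce the effect. Within a single account, an Allow in either policy is enough as long as nothing explicitly denies; in the cross-account case, both must allow for the request to pass.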

The app IAM Role policy

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::my-bucket/uploads/*"
  }]
}

Detailed IAM setup is covered in Chapter 2 IAM.
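How an IAM policy and a Bucket Policy combine can be sketched as a tiny decision function. This is a deliberately simplified model, not the full AWS evaluation logic: it ignores PAB, SCPs, permission boundaries, and ACLs, and the function name and the three-valued effect ("Allow", "Deny", or None for no matching statement) are assumptions made for illustration.

```python
def is_allowed(identity_effect, bucket_effect, same_account):
    """Combine the IAM policy result and the bucket policy result.

    Each effect is "Allow", "Deny", or None (no statement matched).
    """
    # An explicit Deny on either side always wins.
    if "Deny" in (identity_effect, bucket_effect):
        return False
    if same_account:
        # Same account: one Allow on either side is enough.
        return "Allow" in (identity_effect, bucket_effect)
    # Cross-account: both the caller's IAM policy and the bucket policy must allow.
    return identity_effect == "Allow" and bucket_effect == "Allow"


print(is_allowed("Allow", None, same_account=True))
print(is_allowed("Allow", None, same_account=False))
print(is_allowed("Allow", "Deny", same_account=True))
```

The second call shows the typical cross-account surprise: the caller's role allows the action, but the request still fails until the bucket policy allows it too.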

Static site hosting #

S3 can just host static HTML / CSS / JS. It’s the simplest static site hosting method.

Enabling bucket static hosting

aws s3 website s3://my-static-site/ \
  --index-document index.html \
  --error-document 404.html

After this command, the bucket responds at the following URL.

S3 website endpoint

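http://my-static-site.s3-website.ap-northeast-2.amazonaws.com

Note that the separator after s3-website depends on the region: some older regions use a dash (s3-website-us-east-1), while ap-northeast-2 uses a dot.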

Allowing public access #

Under the defaults, PAB blocks it. Static hosting is intentionally public, so do the following.

Disable the two policy-related items, BlockPublicPolicy and RestrictPublicBuckets, in the bucket’s PAB (the account-level PAB must not block them either).
Allow GetObject to everyone with a Bucket Policy.

A public read policy for static hosting

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-static-site/*"
  }]
}
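The two steps can also be scripted. The sketch below is one way to do it with a boto3 S3 client, under the assumption that the two policy-related PAB settings to relax are BlockPublicPolicy and RestrictPublicBuckets while the ACL-related ones stay on; the helper names are hypothetical, and the client is passed in rather than created here.

```python
import json


def public_read_policy(bucket: str) -> str:
    """Build the public-read bucket policy as a JSON string."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    })


def open_bucket_for_hosting(s3_client, bucket: str) -> None:
    """Relax the policy-related PAB settings, then attach the public-read policy."""
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,         # ACLs stay locked down
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": False,      # allow a public bucket policy
            "RestrictPublicBuckets": False,  # allow anonymous reads through it
        },
    )
    s3_client.put_bucket_policy(Bucket=bucket, Policy=public_read_policy(bucket))
```

The order matters: while BlockPublicPolicy is still on, S3 rejects the attempt to attach a public policy.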

The limits of S3 static hosting #

S3 alone can’t do the following.

HTTPS — the S3 website endpoint is HTTP.
Custom domain + SSL certificate — not directly.
Edge cache — no fast worldwide responses.

So in operations you almost always use the S3 + CloudFront pattern (covered in Chapter 14 CloudFront). At that point it becomes a pattern where you turn PAB back on and allow access to CloudFront only via OAC.

Presigned URL — temporary permission #

A Presigned URL grants temporary permission: anyone who holds the URL can download or upload one specific object for N minutes. It’s a pattern of briefly delegating your permission to a user who doesn’t have it.

The most common use cases are as follows.

Uploading a user profile image — the client PUTs directly to S3.
Downloading a payment receipt — a 5-minute link.
Private video streaming — a 1-hour token.

Creating a presigned PUT URL (boto3)

import boto3

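s3 = boto3.client("s3")
filename = "avatar.jpg"  # example value; in practice this comes from the upload request
url = s3.generate_presigned_url(
    "put_object",
    Params={
        "Bucket": "my-bucket",
        "Key": f"uploads/user-123/{filename}",
        "ContentType": "image/jpeg",
    },
    ExpiresIn=600,  # 10 minutes
)
# the client makes a PUT request to this URL, sending the same Content-Type header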

Using a presigned URL with curl

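# the Content-Type must match the one that was signed, or S3 rejects the request
curl -X PUT -H "Content-Type: image/jpeg" --upload-file ./photo.jpg "<presigned-url>"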

The security of presigned URLs #

The URL carries a signature made with the signer’s credentials. Anyone with just that URL can perform that one request with the signer’s permissions.
It automatically becomes invalid once the expiry time passes.
Use HTTPS only. If it leaks over HTTP, it’s exposed.
A PUT URL can pin the Content-Type; size limits such as content-length-range need a presigned POST.
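A short expiry is easier to keep if the helper that issues URLs enforces it. The sketch below wraps boto3's generate_presigned_url for downloads; the function name and the one-hour ceiling are assumptions (the ceiling follows this chapter's own guidance, it is not an S3 limit), and the client is passed in by the caller.

```python
MAX_EXPIRY_SECONDS = 3600  # house rule assumed here: never longer than 1 hour


def presign_download(s3_client, bucket: str, key: str, minutes: int = 5) -> str:
    """Return a presigned GET URL that expires after `minutes`."""
    seconds = minutes * 60
    if not 0 < seconds <= MAX_EXPIRY_SECONDS:
        raise ValueError(f"expiry must be between 1 and 60 minutes, got {minutes}")
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=seconds,
    )
```

A receipt download would then be presign_download(s3, "my-bucket", "receipts/123.pdf"), and a request for a 24-hour link fails loudly instead of quietly creating long-lived access.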

POST form vs PUT URL #

There are two upload methods.

PUT URL — simple. Pass metadata via headers, and fix a single ContentType.
POST form (presigned post) — complex. Safer with multiple conditions (content-length-range, starts-with, etc.).

For large-scale or important uploads, the POST form is recommended. For simple cases, a PUT URL is enough.
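For comparison with the PUT example, here is a minimal sketch of the POST form using boto3's generate_presigned_post. The helper name, the key layout, and the 5 MB limit are assumptions for illustration; the conditions shown (content-length-range and starts-with) are the ones named above.

```python
def presign_image_post(s3_client, bucket: str, user_id: str,
                       max_bytes: int = 5 * 1024 * 1024) -> dict:
    """Return {'url': ..., 'fields': ...} for a browser form upload."""
    return s3_client.generate_presigned_post(
        Bucket=bucket,
        Key=f"uploads/{user_id}/avatar.jpg",
        # Fields are sent with the form; each one needs a matching condition.
        Fields={"Content-Type": "image/jpeg"},
        Conditions=[
            ["starts-with", "$Content-Type", "image/"],  # images only
            ["content-length-range", 1, max_bytes],      # reject empty or huge files
        ],
        ExpiresIn=600,  # 10 minutes
    )
```

The client then sends a multipart/form-data POST to the returned url, including every entry of fields plus the file itself as the last form field. S3 rejects the upload if any condition fails, which is what a PUT URL cannot express for size.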

Versioning and lifecycle #

Versioning — object history #

When you turn on Versioning for a bucket, the previous version is automatically preserved when you PUT to the same key multiple times.

Turning on Versioning

aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Enabled

After it’s on, the following holds.

Even a Delete doesn’t actually delete. Only a Delete Marker is added.
A previous version can be restored with --version-id.
Storage cost sums up all versions (a trap).

Lifecycle — automatic cleanup / transition #

A rule that automatically moves old objects to a cheaper class or deletes them.

Example lifecycle rule

{
  "Rules": [{
    "ID": "ArchiveOldLogs",
    "Status": "Enabled",
    "Filter": { "Prefix": "logs/" },
    "Transitions": [
      { "Days": 30,  "StorageClass": "STANDARD_IA" },
      { "Days": 90,  "StorageClass": "GLACIER" }
    ],
    "Expiration": { "Days": 365 }
  }]
}

In operations a lifecycle is nearly essential. Without one, the bill becomes scary a few months later.
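When versioning is on, the rule usually also needs to expire noncurrent versions. The sketch below builds such a configuration as a Python dict and applies it with boto3's put_bucket_lifecycle_configuration; the helper names and the 30-day noncurrent window are assumptions, and the day values mirror the example rule.

```python
def build_log_lifecycle(prefix="logs/", ia_days=30, glacier_days=90,
                        expire_days=365, noncurrent_days=30) -> dict:
    """Build a lifecycle configuration that tiers, expires, and prunes old versions."""
    if not ia_days < glacier_days < expire_days:
        raise ValueError("days must increase: IA < Glacier < expiration")
    return {
        "Rules": [{
            "ID": "ArchiveOldLogs",
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},
            "Transitions": [
                {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                {"Days": glacier_days, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": expire_days},
            # Only matters when versioning is enabled: delete old versions too.
            "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
        }]
    }


def apply_lifecycle(s3_client, bucket: str, config: dict) -> None:
    """Replace the bucket's lifecycle configuration; `s3_client` is a boto3 S3 client."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

Note that this call replaces the whole lifecycle configuration of the bucket, so every rule that should survive has to be in the dict.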

Storage classes — the cost item #

By placing the same data in a different class based on how often and how fast you retrieve it, you save a great deal on cost.

Class	Price (GB/month)	Access frequency	Retrieval time	Role
Standard	$0.023	Daily	Instant	Default. Hot data
Standard-IA	$0.0125	Occasional	Instant	Backups, analytics data
One Zone-IA	$0.01	Occasional, recreatable	Instant	One AZ only — low importance
Intelligent-Tiering	Auto	Pattern unknown	Instant	Access frequency varies
Glacier Instant Retrieval	$0.004	Once a quarter	Instant	Archive + occasionally needed instantly
Glacier Flexible Retrieval	$0.0036	1~2 times a year	Minutes~hours	General archive
Glacier Deep Archive	$0.00099	Almost never	Within 12 hours	Long-term compliance

The numbers are approximate list prices and vary by region (these match us-east-1; ap-northeast-2 is slightly higher). For details, see the official price list.

A class decision guide #

Decision tree

This data — do you look at it daily / weekly?
├── YES → Standard
└── NO →
    Occasionally (once a month or less)?
    ├── YES → Standard-IA  (One Zone-IA if recreatable)
    └── NO →
        Pattern predictable?
        ├── YES → Glacier family
        └── NO  → Intelligent-Tiering
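The same tree can be written as a small function, which is handy when tagging datasets in a script. This is only a restatement of the tree above; the function and parameter names are hypothetical, and the returned labels are the class names used in this chapter rather than API constants.

```python
def choose_storage_class(daily_or_weekly: bool, occasional: bool,
                         recreatable: bool = False,
                         predictable: bool = True) -> str:
    """Walk the decision tree and return the suggested class label."""
    if daily_or_weekly:
        return "Standard"
    if occasional:
        return "One Zone-IA" if recreatable else "Standard-IA"
    return "Glacier family" if predictable else "Intelligent-Tiering"


print(choose_storage_class(daily_or_weekly=False, occasional=True))
print(choose_storage_class(daily_or_weekly=False, occasional=False, predictable=False))
```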

Trap — class transition cost #

Each transition like Standard to IA costs a small amount, about $0.01 per 1,000 objects (more for the Glacier classes). If you have 100 million objects, that is about $1,000 for a single move, so this cost adds up. Don’t move them often with a lifecycle. Decide by looking at object size and frequency.
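A quick back-of-the-envelope calculation shows when a transition pays for itself. The sketch below assumes a rate of about $0.01 per 1,000 transition requests into Standard-IA and the Standard / Standard-IA storage prices from the table; all rates and function names are illustrative assumptions, and minimum billable object sizes and minimum storage durations are ignored.

```python
def transition_cost(objects: int, usd_per_1000: float = 0.01) -> float:
    """One-time cost of moving `objects` objects to another class."""
    return objects / 1000 * usd_per_1000


def months_to_break_even(avg_object_mb: float, usd_per_1000: float = 0.01,
                         saving_per_gb_month: float = 0.023 - 0.0125) -> float:
    """Months of cheaper storage needed to repay one object's transition fee."""
    cost_per_object = usd_per_1000 / 1000
    saving_per_object_month = avg_object_mb / 1024 * saving_per_gb_month
    return cost_per_object / saving_per_object_month


print(f"100M objects: ${transition_cost(100_000_000):,.0f}")
print(f"10 KB objects: {months_to_break_even(10 / 1024):.1f} months")
print(f"10 MB objects: {months_to_break_even(10):.3f} months")
```

Under these assumptions large objects repay the fee almost immediately, while very small objects take months, which is why object size belongs in the decision.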

S3’s consistency #

Before December 2020, only PUTs of new objects were read-after-write consistent; overwrites and deletes were eventually consistent. Since December 2020, strong consistency is guaranteed in all regions.

GET immediately after PUT is possible.
LIST after DELETE is reflected immediately.

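That said, bucket-level configuration changes are still eventually consistent. For example, AWS recommends waiting about 15 minutes after enabling versioning for the first time before relying on it for writes.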

The role of S3 and other services #

Service often used together	Pattern
CloudFront	S3 + Edge cache + your domain (Chapter 14)
Lambda	Image conversion / indexing via an S3 PUT trigger (Chapter 17)
Athena	SQL over CSV / Parquet / JSON in S3
Glue	Data catalog / ETL over S3
CloudTrail / VPC Flow Logs / ALB Logs	All stored in S3

Common pitfalls #

A bucket unintentionally public — Misconfigured public S3 buckets are a recurring cause of the data leaks that make the news. Start new buckets with all four PAB options on, and explicitly disable only in intended cases like static hosting.
A cost bomb — It’s triple billing: storage per GB + number of requests + data transfer. In particular, traffic going out to the internet (Egress) costs about $0.09 per GB or more depending on the region, so for a popular static site this is large. Bundle it with CloudFront to cut Egress and accelerate with the Edge cache (Chapter 14).
Millions of small files — There’s a GET / PUT cost for every single object, so a pattern of millions of small files is surprisingly costly. The answer is to bundle them (tar.gz, Parquet) for storage or move to a different store.
A year with no Lifecycle — If logs or temp files just sit in Standard, the bill explodes a few months later. Set up the lifecycle on the first day you create the bucket.
Turning on Versioning and forgetting — If you turn on Versioning with no lifecycle, storage cost grows without bound. If you turned it on, clean up old noncurrent versions with a lifecycle.
Presigned URL expiry too long — A 24-hour presigned URL is effectively permanent access. Usually set it to 5 ~ 15 minutes, 1 hour at the longest.
s3:* wildcard IAM — An Action: "s3:*" policy is dangerous. Spell out at least GetObject / PutObject / ListBucket.
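The triple billing in the cost-bomb item can be estimated with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and every default rate is an assumption (roughly us-east-1 list prices: $0.023 per GB-month, $0.0004 per 1,000 GETs, $0.005 per 1,000 PUTs, $0.09 per GB of egress), so check the official price list for real numbers.

```python
def monthly_bill(storage_gb: float, get_requests: int, put_requests: int,
                 egress_gb: float, storage_rate: float = 0.023,
                 get_per_1000: float = 0.0004, put_per_1000: float = 0.005,
                 egress_rate: float = 0.09) -> dict:
    """Estimate the three S3 cost components in USD per month."""
    bill = {
        "storage": storage_gb * storage_rate,
        "requests": (get_requests / 1000 * get_per_1000
                     + put_requests / 1000 * put_per_1000),
        "egress": egress_gb * egress_rate,
    }
    bill["total"] = sum(bill.values())
    return bill


# A small static site: 100 GB stored, 1M GETs, 100k PUTs, 500 GB served.
for item, usd in monthly_bill(100, 1_000_000, 100_000, 500).items():
    print(f"{item:>8}: ${usd:.2f}")
```

In this example egress dominates the bill, which is the situation where putting CloudFront in front of the bucket helps most.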

Exercises #

Of the five layers in §“The evaluation order of S3 permissions”, write down which layer you have to disable, and how, to deliberately make a static-hosting bucket public. Then, in one sentence, contrast why the OAC pattern of Chapter 14 CloudFront turns the same bucket’s PAB back on.
Looking at the code that sets ExpiresIn to 600 seconds for a presigned PUT URL, write down what problems arise when the expiry is too long versus too short, based on §“The security of presigned URLs”.
Assume one log bucket and write a Lifecycle rule yourself that moves to Standard-IA after 30 days, to Glacier after 90 days, and deletes after 365 days. Note which item in §“A cost bomb” this rule reduces, connecting it to Chapter 27 Cost Optimization.

In short: S3 is infinite object storage with only two concepts — buckets (globally unique names) and objects (keys) — and directories are fake. For security, PAB is at the top, and new buckets turn on all four PAB options. Static hosting has no HTTPS or Edge, so bundle it with CloudFront; a presigned URL is a 5~15-minute temporary permission delegation; and Versioning must always be paired with a Lifecycle.

Next chapter #

We’ve got the object domain in hand. Next, Chapter 11 RDS moves on to relational DBs. We’ll lay out RDS’s managed model, automated backups and PITR, Multi-AZ, parameter / option groups, and how to handle minor vs major upgrades.