AWS Intermediate #3: S3 — static hosting and presigned URLs

If EC2 (#1 and #2) is the compute layer, S3 (Simple Storage Service) is AWS’s object storage layer. Launched in 2006, it is one of AWS’s oldest services and still one of the most used.

S3 is essentially “an infinitely large global file system (where directories are fake)”. Three things define it: eleven nines (99.999999999%) of durability, storage at roughly $0.023 per GB per month, and its role as the data hub for most other AWS services.

This post threads S3’s shape → policies and security → static hosting → presigned URLs → storage classes.

Buckets and objects #

S3 has only two things:

  • Bucket — container that holds objects. Per account / per region
  • Object — the actual file. Identified by key
Shape of S3
my-bucket/                       ← bucket (globally unique name)
  images/
    profile/2026/avatar-001.jpg  ← object (key = full path)
    profile/2026/avatar-002.jpg
  videos/
    intro.mp4
  index.html

There are no directories, really. The / in the picture is part of the key. images/profile/2026/avatar-001.jpg is the full key of one object. The console just renders it like folders by splitting on /.
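
To make the "fake directories" idea concrete, here is a minimal pure-Python sketch of what a delimiter-based listing does. It mimics the way list_objects_v2 with Delimiter="/" rolls keys up into CommonPrefixes; the helper name list_level is made up for illustration and no AWS call is involved.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Split keys under `prefix` into (folders, objects), like a delimiter listing."""
    folders, objects = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the first delimiter is rolled up into one "folder".
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(folders), sorted(objects)


keys = [
    "images/profile/2026/avatar-001.jpg",
    "images/profile/2026/avatar-002.jpg",
    "videos/intro.mp4",
    "index.html",
]
print(list_level(keys))                     # top level of the bucket
print(list_level(keys, "images/profile/"))  # one "folder" down
```

The folders exist only in the return value: the input is a flat list of keys, which is all the bucket actually stores.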

Global uniqueness of bucket names #

The bucket name has to be unique across every AWS account in the world. Plain names like my-bucket were taken long ago.

Safe bucket names
my-company-dev-uploads-2026
acme-prod-static-ap-northeast-2

Rules:

  • 3–63 chars, lowercase / digits / - / .
  • Dots (.) are allowed, but a dotted name does not match the wildcard TLS certificate on virtual-hosted-style HTTPS endpoints, so stick to - in practice
  • No IP-address-like name, no xn-- start (Punycode)
  • No uppercase, no underscore
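
The rules above can be checked before calling AWS at all. This is a rough sketch, not an exhaustive validator: it covers the rules listed here plus the requirement that a name start and end with a letter or digit, and the function name bucket_name_problems is hypothetical.

```python
import re

# Lowercase letters, digits, hyphens, dots; must start and end with a letter or digit.
_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
_IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def bucket_name_problems(name):
    """Return a list of rule violations; an empty list means these checks passed."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("length must be 3-63")
    if not _NAME_RE.match(name):
        problems.append("only lowercase letters, digits, '-' and '.'; start/end alphanumeric")
    if ".." in name:
        problems.append("adjacent dots")
    if _IP_RE.match(name):
        problems.append("looks like an IP address")
    if name.startswith("xn--"):
        problems.append("starts with xn--")
    return problems


print(bucket_name_problems("my-company-dev-uploads-2026"))
print(bucket_name_problems("My_Bucket"))
```

Passing these checks says nothing about availability; only the create call can tell you whether the name is already taken.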

Encoding environment / purpose / region / company in the name makes cost reports and searching easier later.

Buckets are regional #

Bucket names are global, but the data lives in one region. Create in ap-northeast-2 (Seoul) and the data sits in Seoul. The console shows you which region.

Cross-region replication is configured explicitly via S3 Replication (CRR, Cross-Region Replication).

Core attributes of an object #

Each object carries:

Attribute | Description
--- | ---
Key | The object’s full path. Unique within the bucket
Body | The actual data (up to 5TB)
Content-Type | How the browser interprets it (image/jpeg, application/json)
Metadata | User-defined headers (x-amz-meta-*)
ACL | Per-object permission (rarely used today, replaced by bucket policy)
Storage Class | Storage class (Standard, IA, Glacier, etc.)
Version ID | Version identifier if versioning is on
ETag | Content hash (the MD5 of the body for single-part uploads without SSE-KMS; not a plain MD5 for multipart uploads)
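
As a small illustration of the ETag row, the sketch below computes the value S3 would be expected to report for a single-part upload that is not encrypted with SSE-KMS or SSE-C. The function name is hypothetical, and for multipart uploads the comparison does not hold.

```python
import hashlib


def expected_single_part_etag(data: bytes) -> str:
    """Quoted MD5 hex digest, the ETag form for a simple single-part upload."""
    return '"' + hashlib.md5(data).hexdigest() + '"'


print(expected_single_part_etag(b"hello"))
```

Comparing this against the ETag returned by head_object is a cheap integrity check for small files uploaded in one request.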

Upload via console / CLI / SDK #

Upload / download with aws cli
# Single file
aws s3 cp ./image.jpg s3://my-bucket/images/profile/avatar.jpg

# Sync a whole folder
aws s3 sync ./public s3://my-bucket --delete

# Specify Content-Type
aws s3 cp ./index.html s3://my-bucket/ --content-type "text/html; charset=utf-8"

# Download
aws s3 cp s3://my-bucket/data.json ./
Python (boto3)
import boto3

s3 = boto3.client("s3")
s3.upload_file("image.jpg", "my-bucket", "images/avatar.jpg")
s3.download_file("my-bucket", "data.json", "data.json")
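
The CLI example above sets Content-Type by hand. With boto3 the equivalent is the ExtraArgs parameter of upload_file. Here is a minimal sketch that guesses the type from the file extension; the helper name upload_extra_args is an assumption for illustration, and the upload itself is left as a comment because it needs credentials.

```python
import mimetypes


def upload_extra_args(path):
    """Build ExtraArgs for upload_file with a Content-Type guessed from the extension."""
    content_type, _encoding = mimetypes.guess_type(path)
    return {"ContentType": content_type or "application/octet-stream"}


print(upload_extra_args("index.html"))

# Usage (needs boto3 and AWS credentials):
#   s3.upload_file("index.html", "my-bucket", "index.html",
#                  ExtraArgs=upload_extra_args("index.html"))
```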

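The five layers of security #

S3 security stacks five layers, listed here from the broadest guard to the narrowest. The order is a rule of thumb rather than a strict precedence: an explicit Deny at any layer always wins.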

S3 permission evaluation order
1. Public Access Block      ← strongest. Block decisions trump everything
2. SCP (Organizations)      ← account-level guard
3. IAM Policy               ← per user / role
4. Bucket Policy            ← per bucket
5. Object ACL               ← per object (legacy approach, rarely used)
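
For the IAM policy and bucket policy layers, the way matching statements combine can be sketched in a few lines. This is a deliberately simplified model of a same-account request: it ignores PAB, SCPs, permission boundaries, and ACLs, and the function name evaluate is made up for illustration.

```python
def evaluate(statement_effects):
    """Combine the effects of every statement that matches a request.

    statement_effects: iterable of "Allow" / "Deny" strings taken from the
    matching statements in the IAM policy and the bucket policy.
    """
    effects = list(statement_effects)
    if "Deny" in effects:
        return "Deny"   # an explicit Deny always wins
    if "Allow" in effects:
        return "Allow"  # within one account, one Allow is enough
    return "Deny"       # nothing matched: implicit deny


print(evaluate(["Allow"]))
print(evaluate(["Allow", "Deny"]))
print(evaluate([]))
```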

Public Access Block — first line #

Public Access Block (PAB) is the safety net to keep buckets from accidentally going public. Four options:

Option | Meaning
--- | ---
BlockPublicAcls | New public ACLs are rejected
IgnorePublicAcls | Existing public ACLs are ignored
BlockPublicPolicy | New public bucket policies are rejected
RestrictPublicBuckets | A bucket with a public policy is reachable only by AWS service principals and authorized users in the owning account

Today every new bucket is created with all four turned on at the bucket level. Account-level PAB is a separate setting that you enable yourself. Only buckets that are meant to be public (e.g., static hosting) explicitly relax these.

Turn on account-level PAB
aws s3control put-public-access-block \
  --account-id 123456789012 \
  --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
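
To audit a single bucket rather than the whole account, the response of get_public_access_block can be checked for flags that are off. A minimal sketch; the helper name disabled_pab_flags is hypothetical and the AWS call is shown only as a comment.

```python
PAB_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def disabled_pab_flags(config):
    """Return the PAB flags that are missing or false in a configuration dict."""
    return [flag for flag in PAB_FLAGS if not config.get(flag, False)]


print(disabled_pab_flags({"BlockPublicAcls": True, "IgnorePublicAcls": True}))

# Usage (needs boto3 and AWS credentials):
#   resp = boto3.client("s3").get_public_access_block(Bucket="my-bucket")
#   print(disabled_pab_flags(resp["PublicAccessBlockConfiguration"]))
```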

Bucket Policy — JSON policy #

A bucket policy is a JSON policy attached directly to a bucket. It says who (Principal) can do what (Action) where (Resource).

Bucket policy that receives ALB logs
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "logdelivery.elasticloadbalancing.amazonaws.com"
      },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-alb-logs/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-acl": "bucket-owner-full-control"
        }
      }
    }
  ]
}
Allow only the app IAM Role to read
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/MyAppRole"
      },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ]
    }
  ]
}
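
When the same policy is needed for several buckets, it can be generated instead of hand-edited. A sketch under the assumption that you apply it with boto3's put_bucket_policy; the builder name read_only_policy is made up, and the bucket and role are the placeholder values from the example above.

```python
import json


def read_only_policy(bucket, role_arn):
    """Build the read-only bucket policy shown above for one IAM role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": role_arn},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            # ListBucket applies to the bucket ARN, GetObject to the object ARNs.
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        }],
    }


policy_json = json.dumps(
    read_only_policy("my-bucket", "arn:aws:iam::123456789012:role/MyAppRole")
)
print(policy_json)

# Usage (needs boto3 and AWS credentials):
#   boto3.client("s3").put_bucket_policy(Bucket="my-bucket", Policy=policy_json)
```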

IAM Policy #

The policy attached to IAM users / roles. It is evaluated together with the bucket policy: within a single account an Allow in either one is enough, while cross-account access needs an Allow in both.

App IAM Role policy
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::my-bucket/uploads/*"
  }]
}

For IAM details, see Basics #2.

Static site hosting #

S3 can host static HTML / CSS / JS as is. The simplest way to host a static site.

Enable bucket static hosting
aws s3 website s3://my-static-site/ \
  --index-document index.html \
  --error-document 404.html

After this, the bucket responds at:

S3 website endpoint
http://my-static-site.s3-website.ap-northeast-2.amazonaws.com
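
The same website configuration can also be applied from Python. A minimal sketch, assuming boto3's put_bucket_website call; the helper name website_configuration is made up for illustration and the call itself is left as a comment.

```python
def website_configuration(index="index.html", error="404.html"):
    """Build the WebsiteConfiguration dict for static hosting."""
    return {
        "IndexDocument": {"Suffix": index},
        "ErrorDocument": {"Key": error},
    }


print(website_configuration())

# Usage (needs boto3 and AWS credentials):
#   boto3.client("s3").put_bucket_website(
#       Bucket="my-static-site",
#       WebsiteConfiguration=website_configuration(),
#   )
```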

Allowing public access #

PAB blocks it by default. Static hosting is intentionally public, so:

  1. Disable BlockPublicPolicy and RestrictPublicBuckets, the two policy-related items in the bucket’s PAB
  2. Allow GetObject for everyone via a bucket policy:
public-read policy for static hosting
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-static-site/*"
  }]
}

Limits of S3 static hosting #

S3 alone can’t do:

  • HTTPS (the S3 website endpoint is HTTP)
  • Custom domain + SSL certificate directly
  • Edge cache (fast worldwide responses)

That’s why production almost always uses the S3 + CloudFront pattern — covered in #7 CloudFront. At that point you turn PAB back on and let only CloudFront in via OAC.

Presigned URL — temporary permission #

A presigned URL lets you say “anyone can download / upload this object for the next N minutes.” It is a pattern for temporarily delegating permission to users who have none.

The most common use cases:

  • User profile image upload — the client PUTs to S3 directly
  • Receipt download — a 5-minute link
  • Private video streaming — 1-hour token
Generate a presigned PUT URL (boto3)
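import boto3

s3 = boto3.client("s3")
filename = "avatar-001.jpg"  # placeholder; take this from your own upload request
url = s3.generate_presigned_url(
    "put_object",
    Params={
        "Bucket": "my-bucket",
        "Key": f"uploads/user-123/{filename}",
        "ContentType": "image/jpeg",
    },
    ExpiresIn=600,  # 10 minutes
)
# The client PUTs to this URL and must send the same Content-Type header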
Use presigned URL with curl
curl -X PUT -H "Content-Type: image/jpeg" --upload-file ./photo.jpg "<presigned-url>"
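
On the client side the same upload can be done from Python with only the standard library. A sketch, assuming the URL was signed with ContentType image/jpeg as above; the helper name build_presigned_put is hypothetical, and the network call is shown as a comment because it performs a real upload.

```python
import urllib.request


def build_presigned_put(url, body, content_type="image/jpeg"):
    """Prepare a PUT request for a presigned URL.

    The Content-Type must match the ContentType that was signed,
    otherwise S3 rejects the request with a signature mismatch.
    """
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": content_type},
    )


# Usage (performs a real upload):
#   with open("photo.jpg", "rb") as f:
#       req = build_presigned_put(presigned_url, f.read())
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status)
```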

Security of presigned URLs #

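  • The URL carries a signature rather than the credentials themselves, but it works as a bearer token: anyone who has the URL can use it
  • It stops working at the expiry time, or earlier if the credentials that signed it (for example a role session) expire first
  • Use HTTPS only. Over HTTP the URL can be read in transit
  • A PUT URL can pin ContentType; size limits such as content-length-range need a presigned POST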

POST form vs PUT URL #

Two upload modes:

  • PUT URL — simple. Metadata via headers, one fixed ContentType
  • POST form (presigned post) — complex. Multiple conditions (content-length-range, starts-with, …) for stronger safety

Uploads that need a size cap or stricter validation should use the POST form. Simple cases use a PUT URL.
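
A presigned POST with a size cap might look like the sketch below. It relies on boto3's generate_presigned_post; the wrapper name make_upload_form, the 10-minute expiry, and the 1-byte minimum are assumptions for illustration. The client is passed in so the function can be exercised without credentials.

```python
def make_upload_form(s3_client, bucket, key, max_bytes, content_type="image/jpeg"):
    """Ask S3 for a presigned POST that pins the content type and caps the size."""
    return s3_client.generate_presigned_post(
        Bucket=bucket,
        Key=key,
        # A field that has a condition must also be sent as a form field.
        Fields={"Content-Type": content_type},
        Conditions=[
            {"Content-Type": content_type},
            ["content-length-range", 1, max_bytes],
        ],
        ExpiresIn=600,  # 10 minutes
    )


# Usage (needs boto3 and AWS credentials):
#   form = make_upload_form(boto3.client("s3"), "my-bucket",
#                           "uploads/user-123/avatar.jpg", 5 * 1024 * 1024)
#   form["url"] and form["fields"] go into the browser's multipart/form-data POST.
```

Unlike the PUT URL, S3 itself rejects a body larger than max_bytes, so the limit does not depend on the client behaving.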

Versioning and lifecycle #

Versioning — object history #

Turning on versioning preserves earlier versions automatically when you PUT the same key again.

Enable versioning
aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Enabled

After enabling:

  • Even Delete doesn’t really delete — only adds a Delete Marker
  • Recover earlier versions with --version-id
  • Storage cost is the sum across all versions ← trap
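
Undoing a delete in a versioned bucket means removing the delete marker. The sketch below finds the marker in a list_object_versions response; the helper name current_delete_marker is hypothetical and the AWS calls are shown as comments.

```python
def current_delete_marker(versions_response, key):
    """Return the VersionId of the delete marker hiding `key`, or None."""
    for marker in versions_response.get("DeleteMarkers", []):
        if marker["Key"] == key and marker["IsLatest"]:
            return marker["VersionId"]
    return None


# Usage (needs boto3 and AWS credentials):
#   s3 = boto3.client("s3")
#   resp = s3.list_object_versions(Bucket="my-bucket", Prefix="data.json")
#   marker_id = current_delete_marker(resp, "data.json")
#   if marker_id:
#       # Deleting the marker itself makes the previous version current again.
#       s3.delete_object(Bucket="my-bucket", Key="data.json", VersionId=marker_id)
```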

Lifecycle — auto cleanup / transition #

Rules to automatically move old objects to cheaper classes or delete them.

Example lifecycle rule
{
  "Rules": [{
    "ID": "ArchiveOldLogs",
    "Status": "Enabled",
    "Filter": { "Prefix": "logs/" },
    "Transitions": [
      { "Days": 30,  "StorageClass": "STANDARD_IA" },
      { "Days": 90,  "StorageClass": "GLACIER" }
    ],
    "Expiration": { "Days": 365 }
  }]
}

In production, lifecycle is essentially required — without it the bill becomes scary in 6 months.

Storage classes — the cost lever #

The same data can live in different classes based on how often / how fast you read it, and the savings are big.

Class | $/GB/mo | Access frequency | Retrieval | Use
--- | --- | --- | --- | ---
Standard | $0.023 | Daily | Instant | Default. Hot data
Standard-IA | $0.0125 | Sometimes | Instant | Backups, analysis
One Zone-IA | $0.01 | Sometimes, recreatable | Instant | One AZ, low criticality
Intelligent-Tiering | Auto | Pattern unknown | Instant | When access frequency is uneven
Glacier Instant Retrieval | $0.004 | Quarterly | Instant | Archive + need instant sometimes
Glacier Flexible Retrieval | $0.0036 | 1–2x per year | Minutes to hours | General archive
Glacier Deep Archive | $0.00099 | Almost never | 12 hours | Long-term compliance

Numbers are approximate, ap-northeast-2. See the official pricing page for details.
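
To see what the per-GB differences mean in practice, here is a small storage-only calculation using the approximate figures from the table. The dictionary keys are the storage class names used by the S3 API; the function name and the 10 TB example size are assumptions, and request, retrieval, and transfer charges are left out on purpose.

```python
# Approximate $/GB-month figures copied from the table above.
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "ONEZONE_IA": 0.01,
    "GLACIER_IR": 0.004,
    "GLACIER": 0.0036,
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_storage_cost(gigabytes, storage_class):
    """Storage-only cost in USD; ignores request, retrieval, and transfer charges."""
    return gigabytes * PRICE_PER_GB_MONTH[storage_class]


for storage_class in PRICE_PER_GB_MONTH:
    cost = monthly_storage_cost(10_000, storage_class)  # 10 TB
    print(f"{storage_class:13s} ${cost:8.2f}")
```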

Class decision guide #

Decision tree
Daily / weekly access?
├── YES → Standard
└── NO →
    Sometimes (≤1x/month)?
    ├── YES → Standard-IA  (One Zone-IA if recreatable)
    └── NO →
        Pattern predictable?
        ├── YES → Glacier family
        └── NO  → Intelligent-Tiering

Trap — class transition cost #

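Each lifecycle transition such as Standard → IA is billed as a request, roughly $0.01 per 1,000 objects (more for the Glacier classes). With 100M objects that is about $1,000 for a single transition, so don’t bounce objects around with lifecycle rules. Decide based on object size / access frequency: Standard-IA, for example, bills every object as at least 128 KB and for at least 30 days.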

S3 consistency #

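Before December 2020, only PUTs of new objects were read-after-write consistent; overwrites, DELETEs, and LIST were eventually consistent. Since December 2020, every region has strong read-after-write consistency:

  • GET right after PUT works
  • LIST reflects DELETE immediately

This covers overwrites, deletes, and object metadata as well. The exception is bucket-level configuration, which can take time to propagate: after enabling versioning for the first time, for example, AWS recommends waiting about 15 minutes before writing.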

S3 with other services #

Common companion | Pattern
--- | ---
CloudFront | S3 + edge cache + custom domain (#7)
Lambda | S3 PUT trigger for image processing / indexing (Advanced #3)
Athena | SQL on CSV / Parquet / JSON in S3
Glue | S3 data catalog / ETL
CloudTrail / VPC Flow Logs / ALB Logs | All stored in S3

Common pitfalls #

1) Bucket inadvertently public #

S3 misconfiguration is behind many of the data leaks that make the news. New buckets start with all 4 PAB flags on; only intentionally public buckets such as static hosting should explicitly relax them.

2) Cost bomb #

  • Three separate charges stack up: per-GB storage + request count + data transfer
  • Egress to the internet is ~$0.09/GB — for a popular static site that dominates
  • Pair with CloudFront to cut egress + accelerate via edge cache (#7)

3) Millions of small files #

Each object incurs GET / PUT cost; millions of tiny files is surprisingly expensive. Bundle (tar.gz, Parquet) or move to DynamoDB.
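
Bundling can be done in memory before the upload. A minimal sketch using the standard library's tarfile module; the function name bundle_to_targz and the example key are made up for illustration.

```python
import io
import tarfile


def bundle_to_targz(files):
    """Pack {name: bytes} into one in-memory .tar.gz so it uploads as a single object."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


blob = bundle_to_targz({"a.json": b"{}", "b.json": b"[1]"})
print(len(blob), "bytes in one object instead of two")

# Usage (needs boto3 and AWS credentials):
#   s3.put_object(Bucket="my-bucket", Key="events/2026-01-01.tar.gz", Body=blob)
```

The trade-off is that a reader has to download the whole bundle to get one member, so this suits data that is read in batches.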

4) No lifecycle for a year #

Logs / temp files staying on Standard — bill spikes after 6 months. Set up lifecycle on day one.

5) Versioning on, forgotten #

Versioning + no lifecycle = storage cost grows forever. If versioning is on, set lifecycle to clean up non-current versions.
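
A rule for that cleanup could look like the following sketch. The rule ID, the 30-day default, and the function name are assumptions; note that put_bucket_lifecycle_configuration replaces the bucket's whole lifecycle configuration, so existing rules need to be included in the same call.

```python
def noncurrent_cleanup_rule(days=30, prefix=""):
    """Lifecycle rule that deletes old versions and leftover delete markers."""
    return {
        "ID": "ExpireNoncurrentVersions",
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        # Delete a version once it has been non-current for `days` days.
        "NoncurrentVersionExpiration": {"NoncurrentDays": days},
        # Remove delete markers that no longer hide any version.
        "Expiration": {"ExpiredObjectDeleteMarker": True},
    }


print(noncurrent_cleanup_rule())

# Usage (needs boto3 and AWS credentials):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket",
#       LifecycleConfiguration={"Rules": [noncurrent_cleanup_rule()]},
#   )
```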

6) Presigned URL expiry too long #

A 24-hour presigned URL is practically permanent. Use 5–15 minutes as the norm and an hour at most.

7) s3:* wildcard in IAM #

Action: "s3:*" is dangerous. List explicit actions like GetObject / PutObject / ListBucket.
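
A quick way to catch this in review is to scan policy documents for wildcard actions. The sketch below only looks for the exact strings "*" and "s3:*" in Allow statements; partial wildcards such as s3:Get* are not flagged, and the function name wildcard_actions is hypothetical.

```python
def wildcard_actions(policy):
    """Return (sid, action) pairs for Allow statements whose action is '*' or 's3:*'."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            if action in ("*", "s3:*"):
                findings.append((statement.get("Sid"), action))
    return findings


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "TooBroad", "Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {"Sid": "Scoped", "Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "*"},
    ],
}
print(wildcard_actions(policy))
```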

Wrap-up #

What we took home this time:

  • S3 = infinite object storage. Bucket (globally unique name) + Object (key) are the only two things
  • Directories are fake — just part of the key
  • Permission layers: PAB → SCP → IAM Policy → Bucket Policy → ACL, with an explicit Deny winning at any layer
  • New buckets start with all 4 PAB flags on
  • Static hosting = bucket + website endpoint + public read policy. HTTPS / edge come from #7 CloudFront
  • Presigned URL = temporary delegation. 5–15 min, HTTPS only, pin ContentType
  • Versioning + lifecycle are a pair. Versioning without lifecycle = bill grows
  • Storage classes — Standard / IA / One Zone-IA / Intelligent-Tiering / 3 Glacier flavors
  • Pitfalls — public leaks, egress cost, small files, missing lifecycle, versioning cost, expiry too long, wildcard IAM

Next — RDS #

The object piece is set. Now to relational databases.

In #4 RDS — managed DB, backups, parameter groups we’ll line up the managed model, automated backups and PITR, Multi-AZ, parameter / option groups, and how to handle minor vs major upgrades.
