Django Advanced #3: Query optimization — N+1, select_related, EXPLAIN, indexes

Tuesday, April 28, 2026

9 min read

Building on the queries we sharpened in Intermediate #2 ORM advanced, we now layer proper optimization. In production, 99% of performance problems come from ORM usage patterns. Not CPU, not memory — the number and shape of queries.

Today’s toolbox:

N+1 diagnosis and fixes (select_related, prefetch_related, Prefetch)
Field selection (only, defer, values, values_list)
Measurement (debug-toolbar, connection.queries, .explain())
Indexes (Meta.indexes, db_index, multi-column)
Bulk operations (bulk_create, bulk_update, update_or_create, iterator)
Concurrency (select_for_update)
Escape hatch (raw SQL)

N+1 — the most common trap #

Reproducing it #

models.py

class Author(models.Model):
    name = models.CharField(max_length=100)

class Post(models.Model):
    title = models.CharField(max_length=200)
    author = models.ForeignKey(Author, on_delete=models.CASCADE, related_name="posts")

🚫 N+1 happens

posts = Post.objects.all()[:50]
for post in posts:
    print(post.title, post.author.name)   # ← extra query each time

Query trace:

SELECT * FROM post LIMIT 50;                      -- 1 query
SELECT * FROM author WHERE id = ?;                -- 50 queries
-- 51 total

A 50-post page fires 51 queries. With 1000 posts, 1001 queries. That’s N+1.

`select_related` — forward FK / OneToOne #

Solved in a single JOIN.

✅ select_related

posts = Post.objects.select_related("author").all()[:50]
for post in posts:
    print(post.title, post.author.name)

Query:

SELECT post.*, author.*
FROM post LEFT OUTER JOIN author ON post.author_id = author.id
LIMIT 50;
-- 1 query

You can chain it.

Multi-step

Comment.objects.select_related("post__author__profile").all()

Even comment.post.author.profile is resolved in one query. But more JOINs widen the row, so be careful past 3-4 hops.

select_related works for:

ForeignKey (the side that owns the FK)
OneToOneField (both sides)

It doesn’t work for:

Reverse ForeignKey (author.posts)
ManyToMany

Those two are prefetch_related’s territory.

`prefetch_related` — reverse FK / M2M #

Separate queries + Python-side join.

✅ prefetch_related

authors = Author.objects.prefetch_related("posts").all()[:20]
for author in authors:
    print(author.name, [p.title for p in author.posts.all()])

Query trace:

SELECT * FROM author LIMIT 20;                                    -- 1 query
SELECT * FROM post WHERE author_id IN (1, 2, ..., 20);            -- 1 query
-- 2 total

Django gathers IDs from the first query and fires the second with an IN clause, then maps in Python. The reverse collection is filled with no N+1.

	select_related	prefetch_related
Mechanism	SQL JOIN	Separate query + Python matching
Query count	+0	+1 (per relation)
Fits	Forward FK, 1:1	Reverse FK, M2M
Result width	Wider rows	Same

Chaining works too.

Mixing both

Comment.objects.select_related("author").prefetch_related("post__tags")

`Prefetch` object — customize #

When you want extra conditions or ordering on the prefetched query.

Prefetch

from django.db.models import Prefetch

authors = Author.objects.prefetch_related(
    Prefetch(
        "posts",
        queryset=Post.objects.filter(published=True).order_by("-created_at"),
        to_attr="recent_posts",   # access via author.recent_posts
    )
).all()

for author in authors:
    for post in author.recent_posts:
        ...

to_attr lets you fill a separate attribute, decoupled from the original manager. author.posts.all() still returns all posts; author.recent_posts returns the filtered ones.

Field selection — `only`, `defer`, `values`, `values_list` #

If a model has 30 columns and you only use 5, you’re wasting 70% of the data.

`only` — load only the listed fields immediately #

only

posts = Post.objects.only("id", "title").all()
for p in posts:
    print(p.title)        # OK; accessing a non-loaded field triggers an extra query
    print(p.body)         # ← extra query (deferred field)

Big effect when you have a heavy text column like body that’s not used on a list page.

`defer` — defer only the listed fields #

defer — opposite of only

posts = Post.objects.defer("body", "raw_html").all()

`values`, `values_list` — skip ORM object construction #

Skip even the cost of instantiating ORM objects, returning dicts/tuples.

values / values_list

# list of dicts
Post.objects.values("id", "title")
# [{"id": 1, "title": "..."}, ...]

# list of tuples
Post.objects.values_list("id", "title")
# [(1, "..."), ...]

# single column flattened
ids = list(Post.objects.values_list("id", flat=True))
# [1, 2, 3, ...]

Decisive when you only need IDs, when serializing straight to JSON, or for CSV export.

Practical — pulling IDs for an IN clause

recent_ids = Post.objects.filter(
    created_at__gte=cutoff
).values_list("id", flat=True)

Comment.objects.filter(post_id__in=recent_ids).delete()

Django automatically pushes recent_ids down as a subquery and merges into a single SQL.

Measurement #

No optimization without measurement. Django comes with the right tooling.

`django-debug-toolbar` #

The standard for development environments.

Install

pip install django-debug-toolbar

settings.py

INSTALLED_APPS = [..., "debug_toolbar"]
MIDDLEWARE = [..., "debug_toolbar.middleware.DebugToolbarMiddleware"]
INTERNAL_IPS = ["127.0.0.1"]

urls.py

from django.conf import settings
from django.urls import include, path

urlpatterns = [...]

if settings.DEBUG:
    urlpatterns += [path("__debug__/", include("debug_toolbar.urls"))]

The right-hand panel shows query count, each query, EXPLAIN, and timings. If the query count scales with page item count, suspect N+1.

`connection.queries` #

When you want to inspect from inside the code.

Only when DEBUG=True

from django.db import connection, reset_queries

reset_queries()
posts = list(Post.objects.select_related("author")[:50])
for p in posts:
    _ = p.author.name
print(len(connection.queries), "queries")
for q in connection.queries:
    print(q["time"], q["sql"][:100])

It’s only populated when DEBUG=True or CONN_MAX_AGE is active. In tests, assertNumQueries is more standard.

Lock it in with a test

from django.test import TestCase

class PostListQueryCountTest(TestCase):
    def test_no_n_plus_one(self):
        for i in range(10):
            author = Author.objects.create(name=f"A{i}")
            Post.objects.create(title=f"P{i}", author=author)

        with self.assertNumQueries(2):   # author + posts
            list(self.client.get("/posts/").context["posts"])

A regression test on the query count is the most effective protection against N+1 reappearing.

`.explain()` #

See the actual execution plan the DB will use.

explain

qs = Post.objects.filter(published=True, author__name__startswith="K")
print(qs.explain())

PostgreSQL output (excerpt):

Hash Join  (cost=15.50..120.30 rows=200 width=64)
  Hash Cond: (post.author_id = author.id)
  ->  Seq Scan on post  (cost=0.00..80.00 rows=2000 width=40)
        Filter: published
  ->  Hash  (cost=12.00..12.00 rows=200 width=24)
        ->  Seq Scan on author  (cost=0.00..12.00 rows=200 width=24)
              Filter: ((name)::text ~~ 'K%'::text)

Seeing Seq Scan on a big table is a signal of a missing index. After adding an index, re-explain to confirm Index Scan.

PostgreSQL-only options:

ANALYZE too

qs.explain(analyze=True, verbose=True, buffers=True)

Actually runs the query to show row counts, time, and IO. Only meaningful on data similar to production.

Indexes #

`db_index=True` — single column #

Per column

class Post(models.Model):
    slug = models.SlugField(max_length=200, unique=True)   # unique → auto index
    status = models.CharField(max_length=20, db_index=True)
    created_at = models.DateTimeField(db_index=True)

unique=True, primary_key=True, and ForeignKey get indexes automatically.

`Meta.indexes` — multi-column, function, partial indexes #

Meta.indexes

from django.db.models import Index, F, Q

class Post(models.Model):
    status = models.CharField(max_length=20)
    created_at = models.DateTimeField()
    author = models.ForeignKey("Author", on_delete=models.CASCADE)
    title = models.CharField(max_length=200)

    class Meta:
        indexes = [
            # multi-column (status, created_at) — order matters
            Index(fields=["status", "-created_at"], name="post_status_recent_idx"),

            # function index (PostgreSQL)
            Index(F("title").lower(), name="post_title_lower_idx"),

            # partial index (conditional)
            Index(
                fields=["created_at"],
                condition=Q(status="published"),
                name="post_published_recent_idx",
            ),
        ]

Rules:

For multi-column indexes, order columns to match the WHERE clause for best effect
Adding indexes increases write cost — don’t slap an index on every column
A partial index indexes only frequently queried subsets — small and fast

Where indexes don’t help #

Wrapping a column in a function disables the regular index.

🚫 No index can help

WHERE LOWER(title) = 'hello'

Fix: create a function index (Index(F("title").lower(), ...)), or normalize on the data side (store lowercase).

LIKE 'hello%' can use the index, but LIKE '%hello' can’t (leading wildcard). For full text, use PostgreSQL’s SearchVector / GIN index.

Bulk operations #

`bulk_create` — INSERT in one shot #

🚫 1000 INSERTs

for data in items:
    Post.objects.create(**data)

✅ 1 INSERT

posts = [Post(**data) for data in items]
Post.objects.bulk_create(posts, batch_size=500)

Hundreds of times faster. Caveats:

Auto PK fill is only guaranteed on PostgreSQL (other DBs may leave None)
pre_save / post_save signals are not fired (#5)
M2M needs separate handling

`bulk_update` — UPDATE in one shot #

bulk_update

posts = list(Post.objects.filter(status="draft"))
for p in posts:
    p.status = "published"
Post.objects.bulk_update(posts, ["status"], batch_size=500)

You list the fields to update. Signals from save() don’t fire.

`update_or_create` — idempotent upsert #

upsert pattern

DailyStat.objects.update_or_create(
    date=target,
    defaults={"order_count": n, "total_amount": total},
)

The same one we saw in #2’s stats command. Safe to re-run.

PostgreSQL’s true INSERT … ON CONFLICT is supported via bulk_create(..., update_conflicts=True) since 5.x:

bulk_create's upsert

Post.objects.bulk_create(
    posts,
    update_conflicts=True,
    update_fields=["title", "body"],
    unique_fields=["slug"],
)

`iterator` — save memory #

🚫 Million rows in memory

for post in Post.objects.all():
    process(post)

✅ chunked

for post in Post.objects.iterator(chunk_size=2000):
    process(post)

iterator doesn’t cache results and streams via a DB cursor. The standard for iterating big tables.

Concurrency — `select_for_update` #

The scenario where multiple workers update the same row at once — balance deductions, stock decrements, and so on.

🚫 race condition

def withdraw(account_id, amount):
    account = Account.objects.get(pk=account_id)
    if account.balance < amount:
        raise ValueError("insufficient")
    account.balance -= amount
    account.save()

If two requests arrive simultaneously, both pass the balance check at get time, both decrement — and the balance goes negative.

✅ select_for_update + atomic

from django.db import transaction

def withdraw(account_id, amount):
    with transaction.atomic():
        account = Account.objects.select_for_update().get(pk=account_id)
        if account.balance < amount:
            raise ValueError("insufficient")
        account.balance -= amount
        account.save()

A row-level lock is held until the transaction ends. The second request waits until the first commits/rolls back.

`nowait`, `skip_locked` #

Skip to another row instead of waiting

Job.objects.select_for_update(skip_locked=True).filter(status="pending").first()

Common in job queue patterns — skip rows another worker has locked and grab the next job.

Fail fast instead of waiting

account = Account.objects.select_for_update(nowait=True).get(pk=id)

Avoiding deadlocks #

A deadlock is when two transactions wait on each other’s locks. Avoidance rules:

Always acquire locks in the same order (e.g., always smaller ID first)
Keep transactions short — don’t put external API calls inside a transaction
Lock only the rows you need — keep select_for_update().filter(...) narrow

Raw SQL — escape hatch #

For cases where expressing something through the ORM is too complex or you need DB-specific features.

Manager.raw

posts = Post.objects.raw(
    "SELECT * FROM post WHERE to_tsvector('simple', title) @@ to_tsquery('simple', %s)",
    ["python | django"],
)
for p in posts:
    print(p.title)

raw maps results back into ORM objects, so model methods and properties stay alive.

A lower layer:

connection.cursor

from django.db import connection

with connection.cursor() as cur:
    cur.execute(
        "UPDATE post SET status = %s WHERE created_at < %s RETURNING id",
        ["archived", cutoff],
    )
    archived_ids = [row[0] for row in cur.fetchall()]

Decoupled from the ORM. Always use parameter binding (%s) to prevent SQL injection — never build SQL with Python f-strings.

Common pitfalls #

1) `len(qs)` vs `qs.count()` #

🚫 fetch all just to count

n = len(Post.objects.all())

✅ COUNT(*)

n = Post.objects.count()

len loads the entire result into memory and counts. If you only need the count, use count().

That said, if the queryset is already cached, len works without an extra query — it depends on the situation.

2) `exists()` vs `count() > 0` #

🚫 count everything

if Post.objects.filter(slug=s).count() > 0: ...

✅ check one row

if Post.objects.filter(slug=s).exists(): ...

exists ends with LIMIT 1.

3) NULL check with `.first()` #

✅ clean

post = Post.objects.filter(slug=s).first()
if post is None:
    return Http404

Shorter than get + try/except DoesNotExist, and intent is clearer.

4) M2M queries #

Without prefetch_related, running for tag in post.tags.all() over N posts gives you N+1.

5) `count()` is not cached #

Even within the same transaction, each count() call is a fresh SQL. If you need the same count multiple times, store it in a variable.

Wrap-up #

What you covered this time:

N+1 — forward use select_related (JOIN); reverse/M2M use prefetch_related (separate query)
Prefetch(queryset=..., to_attr=...) to customize the prefetched result
only, defer for column selection; values/values_list(flat=True) to avoid ORM instantiation
Measurement: debug-toolbar, assertNumQueries, connection.queries, .explain()
Indexes: db_index, Meta.indexes (multi-column / function / partial); column order matters in multi-column
Bulk: bulk_create, bulk_update, update_or_create, iterator(chunk_size=...)
Concurrency: select_for_update (+ atomic), skip_locked (job queue), avoid deadlocks
Raw escape hatch: Manager.raw, connection.cursor — parameter binding required
Pitfalls: len vs count, count > 0 vs exists, M2M, count not cached

In the next post (#4 Caching) we layer caching on top. Per-view, template fragment, low-level, Redis backend, invalidation strategies, stampede — another axis for cutting queries.

N+1 — the most common trap #

Reproducing it #

select_related — forward FK / OneToOne #

prefetch_related — reverse FK / M2M #

Prefetch object — customize #

Field selection — only, defer, values, values_list #

only — load only the listed fields immediately #

defer — defer only the listed fields #

values, values_list — skip ORM object construction #

Measurement #

django-debug-toolbar #

connection.queries #

.explain() #

Indexes #

db_index=True — single column #

Meta.indexes — multi-column, function, partial indexes #

Where indexes don’t help #

Bulk operations #

bulk_create — INSERT in one shot #

bulk_update — UPDATE in one shot #

update_or_create — idempotent upsert #

iterator — save memory #

Concurrency — select_for_update #

nowait, skip_locked #

Avoiding deadlocks #

Raw SQL — escape hatch #

Common pitfalls #

1) len(qs) vs qs.count() #

2) exists() vs count() > 0 #

3) NULL check with .first() #

4) M2M queries #

5) count() is not cached #

Wrap-up #

`select_related` — forward FK / OneToOne #

`prefetch_related` — reverse FK / M2M #

`Prefetch` object — customize #

Field selection — `only`, `defer`, `values`, `values_list` #

`only` — load only the listed fields immediately #

`defer` — defer only the listed fields #

`values`, `values_list` — skip ORM object construction #

`django-debug-toolbar` #

`connection.queries` #

`.explain()` #

`db_index=True` — single column #

`Meta.indexes` — multi-column, function, partial indexes #

`bulk_create` — INSERT in one shot #

`bulk_update` — UPDATE in one shot #

`update_or_create` — idempotent upsert #

`iterator` — save memory #

Concurrency — `select_for_update` #

`nowait`, `skip_locked` #

1) `len(qs)` vs `qs.count()` #

2) `exists()` vs `count() > 0` #

3) NULL check with `.first()` #

5) `count()` is not cached #