Django Advanced #3: Query optimization — N+1, select_related, EXPLAIN, indexes
Building on the queries we sharpened in Intermediate #2 ORM advanced, we now layer proper optimization. In production, 99% of performance problems come from ORM usage patterns. Not CPU, not memory — the number and shape of queries.
Today’s toolbox:
- N+1 diagnosis and fixes (
select_related,prefetch_related,Prefetch) - Field selection (
only,defer,values,values_list) - Measurement (debug-toolbar,
connection.queries,.explain()) - Indexes (
Meta.indexes,db_index, multi-column) - Bulk operations (
bulk_create,bulk_update,update_or_create,iterator) - Concurrency (
select_for_update) - Escape hatch (raw SQL)
N+1 — the most common trap #
Reproducing it #
class Author(models.Model):
name = models.CharField(max_length=100)
class Post(models.Model):
title = models.CharField(max_length=200)
author = models.ForeignKey(Author, on_delete=models.CASCADE, related_name="posts")posts = Post.objects.all()[:50]
for post in posts:
print(post.title, post.author.name) # ← extra query each timeQuery trace:
SELECT * FROM post LIMIT 50; -- 1 query
SELECT * FROM author WHERE id = ?; -- 50 queries
-- 51 totalA 50-post page fires 51 queries. With 1000 posts, 1001 queries. That’s N+1.
select_related — forward FK / OneToOne
#
Solved in a single JOIN.
posts = Post.objects.select_related("author").all()[:50]
for post in posts:
print(post.title, post.author.name)Query:
SELECT post.*, author.*
FROM post LEFT OUTER JOIN author ON post.author_id = author.id
LIMIT 50;
-- 1 queryYou can chain it.
Comment.objects.select_related("post__author__profile").all()Even comment.post.author.profile is resolved in one query. But more JOINs widen the row, so be careful past 3-4 hops.
select_related works for:
- ForeignKey (the side that owns the FK)
- OneToOneField (both sides)
It doesn’t work for:
- Reverse ForeignKey (
author.posts) - ManyToMany
Those two are prefetch_related’s territory.
prefetch_related — reverse FK / M2M
#
Separate queries + Python-side join.
authors = Author.objects.prefetch_related("posts").all()[:20]
for author in authors:
print(author.name, [p.title for p in author.posts.all()])Query trace:
SELECT * FROM author LIMIT 20; -- 1 query
SELECT * FROM post WHERE author_id IN (1, 2, ..., 20); -- 1 query
-- 2 totalDjango gathers IDs from the first query and fires the second with an IN clause, then maps in Python. The reverse collection is filled with no N+1.
| select_related | prefetch_related | |
|---|---|---|
| Mechanism | SQL JOIN | Separate query + Python matching |
| Query count | +0 | +1 (per relation) |
| Fits | Forward FK, 1:1 | Reverse FK, M2M |
| Result width | Wider rows | Same |
Chaining works too.
Comment.objects.select_related("author").prefetch_related("post__tags")Prefetch object — customize
#
When you want extra conditions or ordering on the prefetched query.
from django.db.models import Prefetch
authors = Author.objects.prefetch_related(
Prefetch(
"posts",
queryset=Post.objects.filter(published=True).order_by("-created_at"),
to_attr="recent_posts", # access via author.recent_posts
)
).all()
for author in authors:
for post in author.recent_posts:
...to_attr lets you fill a separate attribute, decoupled from the original manager. author.posts.all() still returns all posts; author.recent_posts returns the filtered ones.
Field selection — only, defer, values, values_list
#
If a model has 30 columns and you only use 5, you’re wasting 70% of the data.
only — load only the listed fields immediately
#
posts = Post.objects.only("id", "title").all()
for p in posts:
print(p.title) # OK; accessing a non-loaded field triggers an extra query
print(p.body) # ← extra query (deferred field)Big effect when you have a heavy text column like body that’s not used on a list page.
defer — defer only the listed fields
#
posts = Post.objects.defer("body", "raw_html").all()values, values_list — skip ORM object construction
#
Skip even the cost of instantiating ORM objects, returning dicts/tuples.
# list of dicts
Post.objects.values("id", "title")
# [{"id": 1, "title": "..."}, ...]
# list of tuples
Post.objects.values_list("id", "title")
# [(1, "..."), ...]
# single column flattened
ids = list(Post.objects.values_list("id", flat=True))
# [1, 2, 3, ...]Decisive when you only need IDs, when serializing straight to JSON, or for CSV export.
recent_ids = Post.objects.filter(
created_at__gte=cutoff
).values_list("id", flat=True)
Comment.objects.filter(post_id__in=recent_ids).delete()Django automatically pushes recent_ids down as a subquery and merges into a single SQL.
Measurement #
No optimization without measurement. Django comes with the right tooling.
django-debug-toolbar
#
The standard for development environments.
pip install django-debug-toolbarINSTALLED_APPS = [..., "debug_toolbar"]
MIDDLEWARE = [..., "debug_toolbar.middleware.DebugToolbarMiddleware"]
INTERNAL_IPS = ["127.0.0.1"]from django.conf import settings
from django.urls import include, path
urlpatterns = [...]
if settings.DEBUG:
urlpatterns += [path("__debug__/", include("debug_toolbar.urls"))]The right-hand panel shows query count, each query, EXPLAIN, and timings. If the query count scales with page item count, suspect N+1.
connection.queries
#
When you want to inspect from inside the code.
from django.db import connection, reset_queries
reset_queries()
posts = list(Post.objects.select_related("author")[:50])
for p in posts:
_ = p.author.name
print(len(connection.queries), "queries")
for q in connection.queries:
print(q["time"], q["sql"][:100])It’s only populated when DEBUG=True or CONN_MAX_AGE is active. In tests, assertNumQueries is more standard.
from django.test import TestCase
class PostListQueryCountTest(TestCase):
def test_no_n_plus_one(self):
for i in range(10):
author = Author.objects.create(name=f"A{i}")
Post.objects.create(title=f"P{i}", author=author)
with self.assertNumQueries(2): # author + posts
list(self.client.get("/posts/").context["posts"])A regression test on the query count is the most effective protection against N+1 reappearing.
.explain()
#
See the actual execution plan the DB will use.
qs = Post.objects.filter(published=True, author__name__startswith="K")
print(qs.explain())PostgreSQL output (excerpt):
Hash Join (cost=15.50..120.30 rows=200 width=64)
Hash Cond: (post.author_id = author.id)
-> Seq Scan on post (cost=0.00..80.00 rows=2000 width=40)
Filter: published
-> Hash (cost=12.00..12.00 rows=200 width=24)
-> Seq Scan on author (cost=0.00..12.00 rows=200 width=24)
Filter: ((name)::text ~~ 'K%'::text)Seeing Seq Scan on a big table is a signal of a missing index. After adding an index, re-explain to confirm Index Scan.
PostgreSQL-only options:
qs.explain(analyze=True, verbose=True, buffers=True)Actually runs the query to show row counts, time, and IO. Only meaningful on data similar to production.
Indexes #
db_index=True — single column
#
class Post(models.Model):
slug = models.SlugField(max_length=200, unique=True) # unique → auto index
status = models.CharField(max_length=20, db_index=True)
created_at = models.DateTimeField(db_index=True)unique=True, primary_key=True, and ForeignKey get indexes automatically.
Meta.indexes — multi-column, function, partial indexes
#
from django.db.models import Index, F, Q
class Post(models.Model):
status = models.CharField(max_length=20)
created_at = models.DateTimeField()
author = models.ForeignKey("Author", on_delete=models.CASCADE)
title = models.CharField(max_length=200)
class Meta:
indexes = [
# multi-column (status, created_at) — order matters
Index(fields=["status", "-created_at"], name="post_status_recent_idx"),
# function index (PostgreSQL)
Index(F("title").lower(), name="post_title_lower_idx"),
# partial index (conditional)
Index(
fields=["created_at"],
condition=Q(status="published"),
name="post_published_recent_idx",
),
]Rules:
- For multi-column indexes, order columns to match the WHERE clause for best effect
- Adding indexes increases write cost — don’t slap an index on every column
- A partial index indexes only frequently queried subsets — small and fast
Where indexes don’t help #
Wrapping a column in a function disables the regular index.
WHERE LOWER(title) = 'hello'Fix: create a function index (Index(F("title").lower(), ...)), or normalize on the data side (store lowercase).
LIKE 'hello%' can use the index, but LIKE '%hello' can’t (leading wildcard). For full text, use PostgreSQL’s SearchVector / GIN index.
Bulk operations #
bulk_create — INSERT in one shot
#
for data in items:
Post.objects.create(**data)posts = [Post(**data) for data in items]
Post.objects.bulk_create(posts, batch_size=500)Hundreds of times faster. Caveats:
- Auto PK fill is only guaranteed on PostgreSQL (other DBs may leave None)
pre_save/post_savesignals are not fired (#5)- M2M needs separate handling
bulk_update — UPDATE in one shot
#
posts = list(Post.objects.filter(status="draft"))
for p in posts:
p.status = "published"
Post.objects.bulk_update(posts, ["status"], batch_size=500)You list the fields to update. Signals from save() don’t fire.
update_or_create — idempotent upsert
#
DailyStat.objects.update_or_create(
date=target,
defaults={"order_count": n, "total_amount": total},
)The same one we saw in #2’s stats command. Safe to re-run.
PostgreSQL’s true INSERT … ON CONFLICT is supported via bulk_create(..., update_conflicts=True) since 5.x:
Post.objects.bulk_create(
posts,
update_conflicts=True,
update_fields=["title", "body"],
unique_fields=["slug"],
)iterator — save memory
#
for post in Post.objects.all():
process(post)for post in Post.objects.iterator(chunk_size=2000):
process(post)iterator doesn’t cache results and streams via a DB cursor. The standard for iterating big tables.
Concurrency — select_for_update
#
The scenario where multiple workers update the same row at once — balance deductions, stock decrements, and so on.
def withdraw(account_id, amount):
account = Account.objects.get(pk=account_id)
if account.balance < amount:
raise ValueError("insufficient")
account.balance -= amount
account.save()If two requests arrive simultaneously, both pass the balance check at get time, both decrement — and the balance goes negative.
from django.db import transaction
def withdraw(account_id, amount):
with transaction.atomic():
account = Account.objects.select_for_update().get(pk=account_id)
if account.balance < amount:
raise ValueError("insufficient")
account.balance -= amount
account.save()A row-level lock is held until the transaction ends. The second request waits until the first commits/rolls back.
nowait, skip_locked
#
Job.objects.select_for_update(skip_locked=True).filter(status="pending").first()Common in job queue patterns — skip rows another worker has locked and grab the next job.
account = Account.objects.select_for_update(nowait=True).get(pk=id)Avoiding deadlocks #
A deadlock is when two transactions wait on each other’s locks. Avoidance rules:
- Always acquire locks in the same order (e.g., always smaller ID first)
- Keep transactions short — don’t put external API calls inside a transaction
- Lock only the rows you need — keep
select_for_update().filter(...)narrow
Raw SQL — escape hatch #
For cases where expressing something through the ORM is too complex or you need DB-specific features.
posts = Post.objects.raw(
"SELECT * FROM post WHERE to_tsvector('simple', title) @@ to_tsquery('simple', %s)",
["python | django"],
)
for p in posts:
print(p.title)raw maps results back into ORM objects, so model methods and properties stay alive.
A lower layer:
from django.db import connection
with connection.cursor() as cur:
cur.execute(
"UPDATE post SET status = %s WHERE created_at < %s RETURNING id",
["archived", cutoff],
)
archived_ids = [row[0] for row in cur.fetchall()]Decoupled from the ORM. Always use parameter binding (%s) to prevent SQL injection — never build SQL with Python f-strings.
Common pitfalls #
1) len(qs) vs qs.count()
#
n = len(Post.objects.all())n = Post.objects.count()len loads the entire result into memory and counts. If you only need the count, use count().
That said, if the queryset is already cached, len works without an extra query — it depends on the situation.
2) exists() vs count() > 0
#
if Post.objects.filter(slug=s).count() > 0: ...if Post.objects.filter(slug=s).exists(): ...exists ends with LIMIT 1.
3) NULL check with .first()
#
post = Post.objects.filter(slug=s).first()
if post is None:
return Http404Shorter than get + try/except DoesNotExist, and intent is clearer.
4) M2M queries #
Without prefetch_related, running for tag in post.tags.all() over N posts gives you N+1.
5) count() is not cached
#
Even within the same transaction, each count() call is a fresh SQL. If you need the same count multiple times, store it in a variable.
Wrap-up #
What you covered this time:
- N+1 — forward use
select_related(JOIN); reverse/M2M useprefetch_related(separate query) Prefetch(queryset=..., to_attr=...)to customize the prefetched resultonly,deferfor column selection;values/values_list(flat=True)to avoid ORM instantiation- Measurement: debug-toolbar,
assertNumQueries,connection.queries,.explain() - Indexes:
db_index,Meta.indexes(multi-column / function / partial); column order matters in multi-column - Bulk:
bulk_create,bulk_update,update_or_create,iterator(chunk_size=...) - Concurrency:
select_for_update(+ atomic),skip_locked(job queue), avoid deadlocks - Raw escape hatch:
Manager.raw,connection.cursor— parameter binding required - Pitfalls:
lenvscount,count > 0vsexists, M2M, count not cached
In the next post (#4 Caching) we layer caching on top. Per-view, template fragment, low-level, Redis backend, invalidation strategies, stampede — another axis for cutting queries.