Django Intermediate #2: ORM Advanced — annotate, aggregate, F/Q, prefetch_related
In Basics #3 we got our first introduction to the ORM — tools like Post.objects.filter, get, create. This post stacks the proper tools on top of that. These are patterns used almost daily in real work, and this single post will change half the ORM code you write.
annotate/aggregate— aggregationFexpression — DB-side computationQobject — complex conditions (OR/AND/NOT)select_related/prefetch_related— solving N+1only/defer/values— choosing fields to fetch- raw SQL escape hatch — the last resort
These are also the tools that go inside get_queryset from the previous post #1 CBV.
annotate vs aggregate — two faces of aggregation
#
Both use aggregate functions, but the shape of the result differs.
annotate | aggregate | |
|---|---|---|
| Result | QuerySet (new field on each row) | dict (aggregate over all) |
| Granularity | Per-row computation | Whole-set computation |
| Example | “comment count for each post” | “total posts, average rating” |
aggregate — aggregate over all
#
from django.db.models import Count, Avg, Sum, Max, Min
Post.objects.aggregate(Count("id"))
# {'id__count': 1234}
Post.objects.aggregate(
total=Count("id"),
avg_views=Avg("views"),
max_views=Max("views"),
)
# {'total': 1234, 'avg_views': 152.3, 'max_views': 9876}Key names are auto-generated (<field>__<func> form), or you can name them via keyword arguments.
annotate — per-row computation
#
from django.db.models import Count
posts = Post.objects.annotate(comment_count=Count("comments"))
for p in posts:
print(p.title, p.comment_count)comments is the reverse manager name from Comment model’s ForeignKey(Post, related_name="comments"). annotate automatically generates SQL GROUP BY + an aggregate function to compute the comment count for each post.
SELECT post.*, COUNT(comment.id) AS comment_count
FROM post
LEFT OUTER JOIN comment ON comment.post_id = post.id
GROUP BY post.id;annotate + filter combination #
Post.objects.annotate(
comment_count=Count("comments")
).filter(comment_count__gte=5)You can use the virtual field created by annotate directly as a filter condition. It translates to SQL HAVING.
filter + annotate ordering trap #
# We wanted to count only published comments...
Post.objects.filter(comments__published=True).annotate(
cnt=Count("comments")
)
# After the JOIN, cnt may be inflated (duplicate rows)from django.db.models import Q
Post.objects.annotate(
cnt=Count("comments", filter=Q(comments__published=True))
)Count("comments", filter=Q(...)) is the standard form for applying a condition at aggregation time.
F expression — DB-side computation
#
Fetching a value into Python, computing, and saving it back creates race conditions.
post = Post.objects.get(pk=1)
post.views += 1
post.save()
# If two requests read and write at the same time, one update is lostUse F to compute on the DB side.
from django.db.models import F
Post.objects.filter(pk=1).update(views=F("views") + 1)
# UPDATE post SET views = views + 1 WHERE id = 1;The DB handles it atomically, so there’s no concurrency issue. Other uses:
Product.objects.update(price=F("price") * Decimal("1.1"))
# items where stock is less than the order quantity
Order.objects.filter(quantity__gt=F("product__stock"))
# update two fields at once
Account.objects.filter(pk=1).update(
balance=F("balance") - 100,
debit_count=F("debit_count") + 1,
)Q object — complex conditions
#
All keyword arguments to filter(...) are AND-combined. For OR / NOT / complex conditions you need the Q object.
from django.db.models import Q
Post.objects.filter(
Q(title__icontains="django") | Q(body__icontains="django")
)# AND
Post.objects.filter(Q(published=True) & Q(views__gte=100))
# NOT
Post.objects.filter(~Q(category__name="notice"))
# combined
Post.objects.filter(
Q(published=True) & (Q(featured=True) | Q(views__gte=1000))
)Building dynamic conditions #
Q shines when search conditions are dynamic.
def search_posts(query, only_published=True, category=None):
q = Q()
if query:
q &= Q(title__icontains=query) | Q(body__icontains=query)
if only_published:
q &= Q(published=True)
if category:
q &= Q(category=category)
return Post.objects.filter(q)A pattern that starts from an empty Q() and accumulates conditions. It fits well inside a CBV’s get_queryset.
N+1 problem — the most common performance trap #
comments = Comment.objects.all()
for c in comments:
print(c.post.title) # one extra SELECT per iterationQueries: 1 initially (all comments) + N (one post lookup per comment). 1000 comments → 1001 queries. This is the N+1 problem.
select_related — JOIN at once (forward FK)
#
comments = Comment.objects.select_related("post").all()
for c in comments:
print(c.post.title) # 0 extra queriesGenerated SQL:
SELECT comment.*, post.* FROM comment
LEFT OUTER JOIN post ON post.id = comment.post_id;One query with a JOIN. Usable on OneToOne, ForeignKey (forward).
prefetch_related — separate query at once (reverse, M2M)
#
Reverse ForeignKey and ManyToMany relations cannot be fetched in a single JOIN without exploding the row count. prefetch_related solves this with a separate query and Python-side mapping.
posts = Post.objects.all()
for p in posts:
print(len(p.comments.all())) # comment lookup per iterationposts = Post.objects.prefetch_related("comments").all()
for p in posts:
print(len(p.comments.all())) # 0 extra queriesGenerated queries: SELECT * FROM post + SELECT * FROM comment WHERE post_id IN (...) — always 2 queries. Even with 1000 posts, still 2.
Combining the two #
posts = (Post.objects
.select_related("author", "category") # FK
.prefetch_related("comments", "tags") # reverse / M2M
.all())Prefetch — customize the prefetch query itself
#
from django.db.models import Prefetch
posts = Post.objects.prefetch_related(
Prefetch(
"comments",
queryset=Comment.objects.filter(published=True).order_by("-created_at"),
to_attr="published_comments",
)
)
for p in posts:
for c in p.published_comments:
print(c.body)A Prefetch object lets you filter/order the prefetched query itself. to_attr puts it in a separate attribute, avoiding conflicts with the original manager.
Detailed N+1 diagnosis methods (django-debug-toolbar, EXPLAIN, query-count assertions) are covered in Advanced #3 Query optimization.
only / defer — choosing fields to fetch
#
For large text fields (body, etc.) you may not want to fetch them on a list page.
Post.objects.only("id", "title", "slug").all()
# light query without the body
Post.objects.defer("body").all()
# defer is the opposite — exclude the named fieldsCaution: on an only-restricted object, accessing a field that wasn’t named triggers an extra query automatically. It’s easy to accidentally introduce N+1.
values / values_list — receive as dict / tuple
#
When you want dict or tuple instead of model instances.
list(Post.objects.values("id", "title")[:3])
# [{'id': 1, 'title': '...'}, {'id': 2, 'title': '...'}, ...]
list(Post.objects.values_list("id", "title")[:3])
# [(1, '...'), (2, '...'), ...]
list(Post.objects.values_list("title", flat=True)[:3])
# ['...', '...', '...'] # flat=True gives a list of single valuesUseful when building a light dict for an API response, or when collecting only ids for another query. No model instances are created, so it’s also lighter on memory.
Subqueries — Subquery, OuterRef, Exists
#
For complex conditions, subqueries are the answer.
from django.db.models import Subquery, OuterRef
latest_comment = Comment.objects.filter(
post=OuterRef("pk")
).order_by("-created_at")
posts = Post.objects.annotate(
latest_comment_at=Subquery(latest_comment.values("created_at")[:1])
)from django.db.models import Exists, OuterRef
has_comment = Comment.objects.filter(post=OuterRef("pk"))
Post.objects.annotate(
has_comment=Exists(has_comment)
).filter(has_comment=True)OuterRef("pk") is the marker that references a column from the outer query.
Last resort — raw SQL #
When ORM expression gets too complex, there’s a raw SQL escape hatch.
posts = Post.objects.raw(
"SELECT * FROM blog_post WHERE views > %s",
[100],
)
for p in posts:
print(p.title)from django.db import connection
with connection.cursor() as cur:
cur.execute(
"SELECT category_id, COUNT(*) FROM blog_post GROUP BY category_id"
)
rows = cur.fetchall()Principle: if you can express it in the ORM, use the ORM. raw is for genuinely complex or performance-critical cases. Parameters must use placeholders (%s) — no string formatting (to prevent SQL injection).
Transactions — transaction.atomic
#
Use this when several writes must be treated as one atomic unit.
from django.db import transaction
with transaction.atomic():
order = Order.objects.create(user=user, total=total)
for item in items:
OrderItem.objects.create(order=order, product=item.product, qty=item.qty)
Product.objects.filter(pk=item.product_id).update(
stock=F("stock") - item.qty
)If an exception occurs inside the block, all writes are rolled back. The decorator form is also possible — @transaction.atomic wraps a whole function/method as a single transaction. Deeper patterns like transaction post-processing (on_commit) are in Advanced #5.
The lazy nature of QuerySet #
A QuerySet does not send SQL until it is actually evaluated.
qs = Post.objects.filter(published=True) # no SQL yet
qs = qs.order_by("-created_at") # still none
qs = qs.annotate(cnt=Count("comments")) # still none
list(qs) # SQL runs here
for p in qs: pass # or here
print(qs[0]) # or hereWhile chaining, you’re just assembling the query. Evaluation happens on iteration, slicing, len, bool, list, etc. Understanding this lets you avoid unnecessary queries.
Summary #
What we covered in this post:
aggregate(whole-set aggregation → dict),annotate(per-row aggregation → QuerySet)Count(..., filter=Q(...))— conditional aggregationFexpression — DB-side computation, concurrency-safeQobject — OR/AND/NOT, dynamic condition assembly- N+1 problem and the fix:
select_related(forward FK, JOIN),prefetch_related(reverse/M2M, separate query) Prefetch(..., queryset=..., to_attr=...)to customize prefetch itselfonly/defer— field selection (beware extra queries when accessing unselected fields)values/values_list(flat=True)— receive as dict/tupleSubquery,OuterRef,Exists— subqueries- raw SQL is the last resort, parameters via placeholders
transaction.atomicto make multiple writes one unit- QuerySet is lazy — understand evaluation timing
In the next post (#3 Signals and Middleware), we cover two things that happen outside the model/view flow — Signals (events) and Middleware (request/response pipeline). Both are powerful but full of pitfalls.