Django Intermediate #2: ORM Advanced — annotate, aggregate, F/Q, prefetch_related

8 min read

In Basics #3 we got our first introduction to the ORM — tools like Post.objects.filter, get, create. This post stacks the proper tools on top of that. These are patterns used almost daily in real work, and this single post will change half the ORM code you write.

  • annotate / aggregate — aggregation
  • F expression — DB-side computation
  • Q object — complex conditions (OR/AND/NOT)
  • select_related / prefetch_related — solving N+1
  • only / defer / values — choosing fields to fetch
  • raw SQL escape hatch — the last resort

These are also the tools that go inside get_queryset from the previous post #1 CBV.

annotate vs aggregate — two faces of aggregation #

Both use aggregate functions, but the shape of the result differs.

annotateaggregate
ResultQuerySet (new field on each row)dict (aggregate over all)
GranularityPer-row computationWhole-set computation
Example“comment count for each post”“total posts, average rating”

aggregate — aggregate over all #

aggregate
from django.db.models import Count, Avg, Sum, Max, Min

Post.objects.aggregate(Count("id"))
# {'id__count': 1234}

Post.objects.aggregate(
    total=Count("id"),
    avg_views=Avg("views"),
    max_views=Max("views"),
)
# {'total': 1234, 'avg_views': 152.3, 'max_views': 9876}

Key names are auto-generated (<field>__<func> form), or you can name them via keyword arguments.

annotate — per-row computation #

annotate — comment count for each post
from django.db.models import Count

posts = Post.objects.annotate(comment_count=Count("comments"))
for p in posts:
    print(p.title, p.comment_count)

comments is the reverse manager name from Comment model’s ForeignKey(Post, related_name="comments"). annotate automatically generates SQL GROUP BY + an aggregate function to compute the comment count for each post.

generated SQL (simplified)
SELECT post.*, COUNT(comment.id) AS comment_count
FROM post
LEFT OUTER JOIN comment ON comment.post_id = post.id
GROUP BY post.id;

annotate + filter combination #

posts with 5 or more comments
Post.objects.annotate(
    comment_count=Count("comments")
).filter(comment_count__gte=5)

You can use the virtual field created by annotate directly as a filter condition. It translates to SQL HAVING.

filter + annotate ordering trap #

🚫 result that doesn't match the intent
# We wanted to count only published comments...
Post.objects.filter(comments__published=True).annotate(
    cnt=Count("comments")
)
# After the JOIN, cnt may be inflated (duplicate rows)
✅ with Q
from django.db.models import Q

Post.objects.annotate(
    cnt=Count("comments", filter=Q(comments__published=True))
)

Count("comments", filter=Q(...)) is the standard form for applying a condition at aggregation time.

F expression — DB-side computation #

Fetching a value into Python, computing, and saving it back creates race conditions.

🚫 race condition possible
post = Post.objects.get(pk=1)
post.views += 1
post.save()
# If two requests read and write at the same time, one update is lost

Use F to compute on the DB side.

✅ F expression
from django.db.models import F

Post.objects.filter(pk=1).update(views=F("views") + 1)
# UPDATE post SET views = views + 1 WHERE id = 1;

The DB handles it atomically, so there’s no concurrency issue. Other uses:

raise prices by 10%, compare two fields
Product.objects.update(price=F("price") * Decimal("1.1"))

# items where stock is less than the order quantity
Order.objects.filter(quantity__gt=F("product__stock"))

# update two fields at once
Account.objects.filter(pk=1).update(
    balance=F("balance") - 100,
    debit_count=F("debit_count") + 1,
)

Q object — complex conditions #

All keyword arguments to filter(...) are AND-combined. For OR / NOT / complex conditions you need the Q object.

OR — keyword in title or body
from django.db.models import Q

Post.objects.filter(
    Q(title__icontains="django") | Q(body__icontains="django")
)
AND, NOT
# AND
Post.objects.filter(Q(published=True) & Q(views__gte=100))

# NOT
Post.objects.filter(~Q(category__name="notice"))

# combined
Post.objects.filter(
    Q(published=True) & (Q(featured=True) | Q(views__gte=1000))
)

Building dynamic conditions #

Q shines when search conditions are dynamic.

search-form handling
def search_posts(query, only_published=True, category=None):
    q = Q()
    if query:
        q &= Q(title__icontains=query) | Q(body__icontains=query)
    if only_published:
        q &= Q(published=True)
    if category:
        q &= Q(category=category)
    return Post.objects.filter(q)

A pattern that starts from an empty Q() and accumulates conditions. It fits well inside a CBV’s get_queryset.

N+1 problem — the most common performance trap #

🚫 N+1 — a separate query per comment to look up the post
comments = Comment.objects.all()
for c in comments:
    print(c.post.title)        # one extra SELECT per iteration

Queries: 1 initially (all comments) + N (one post lookup per comment). 1000 comments → 1001 queries. This is the N+1 problem.

select_related — JOIN at once (forward FK) #

✅ select_related
comments = Comment.objects.select_related("post").all()
for c in comments:
    print(c.post.title)        # 0 extra queries

Generated SQL:

SELECT comment.*, post.* FROM comment
LEFT OUTER JOIN post ON post.id = comment.post_id;

One query with a JOIN. Usable on OneToOne, ForeignKey (forward).

prefetch_related — separate query at once (reverse, M2M) #

Reverse ForeignKey and ManyToMany relations cannot be fetched in a single JOIN without exploding the row count. prefetch_related solves this with a separate query and Python-side mapping.

🚫 N+1 — comment list per post
posts = Post.objects.all()
for p in posts:
    print(len(p.comments.all()))   # comment lookup per iteration
✅ prefetch_related
posts = Post.objects.prefetch_related("comments").all()
for p in posts:
    print(len(p.comments.all()))   # 0 extra queries

Generated queries: SELECT * FROM post + SELECT * FROM comment WHERE post_id IN (...)always 2 queries. Even with 1000 posts, still 2.

Combining the two #

both together
posts = (Post.objects
         .select_related("author", "category")     # FK
         .prefetch_related("comments", "tags")     # reverse / M2M
         .all())

Prefetch — customize the prefetch query itself #

prefetch only published comments
from django.db.models import Prefetch

posts = Post.objects.prefetch_related(
    Prefetch(
        "comments",
        queryset=Comment.objects.filter(published=True).order_by("-created_at"),
        to_attr="published_comments",
    )
)
for p in posts:
    for c in p.published_comments:
        print(c.body)

A Prefetch object lets you filter/order the prefetched query itself. to_attr puts it in a separate attribute, avoiding conflicts with the original manager.

Detailed N+1 diagnosis methods (django-debug-toolbar, EXPLAIN, query-count assertions) are covered in Advanced #3 Query optimization.

only / defer — choosing fields to fetch #

For large text fields (body, etc.) you may not want to fetch them on a list page.

only — only the named fields
Post.objects.only("id", "title", "slug").all()
# light query without the body

Post.objects.defer("body").all()
# defer is the opposite — exclude the named fields

Caution: on an only-restricted object, accessing a field that wasn’t named triggers an extra query automatically. It’s easy to accidentally introduce N+1.

values / values_list — receive as dict / tuple #

When you want dict or tuple instead of model instances.

values
list(Post.objects.values("id", "title")[:3])
# [{'id': 1, 'title': '...'}, {'id': 2, 'title': '...'}, ...]

list(Post.objects.values_list("id", "title")[:3])
# [(1, '...'), (2, '...'), ...]

list(Post.objects.values_list("title", flat=True)[:3])
# ['...', '...', '...']    # flat=True gives a list of single values

Useful when building a light dict for an API response, or when collecting only ids for another query. No model instances are created, so it’s also lighter on memory.

Subqueries — Subquery, OuterRef, Exists #

For complex conditions, subqueries are the answer.

latest comment date for each post
from django.db.models import Subquery, OuterRef

latest_comment = Comment.objects.filter(
    post=OuterRef("pk")
).order_by("-created_at")

posts = Post.objects.annotate(
    latest_comment_at=Subquery(latest_comment.values("created_at")[:1])
)
only posts that have comments
from django.db.models import Exists, OuterRef

has_comment = Comment.objects.filter(post=OuterRef("pk"))

Post.objects.annotate(
    has_comment=Exists(has_comment)
).filter(has_comment=True)

OuterRef("pk") is the marker that references a column from the outer query.

Last resort — raw SQL #

When ORM expression gets too complex, there’s a raw SQL escape hatch.

raw — results as model instances
posts = Post.objects.raw(
    "SELECT * FROM blog_post WHERE views > %s",
    [100],
)
for p in posts:
    print(p.title)
connection.cursor() — full freedom
from django.db import connection

with connection.cursor() as cur:
    cur.execute(
        "SELECT category_id, COUNT(*) FROM blog_post GROUP BY category_id"
    )
    rows = cur.fetchall()

Principle: if you can express it in the ORM, use the ORM. raw is for genuinely complex or performance-critical cases. Parameters must use placeholders (%s) — no string formatting (to prevent SQL injection).

Transactions — transaction.atomic #

Use this when several writes must be treated as one atomic unit.

atomic block
from django.db import transaction

with transaction.atomic():
    order = Order.objects.create(user=user, total=total)
    for item in items:
        OrderItem.objects.create(order=order, product=item.product, qty=item.qty)
        Product.objects.filter(pk=item.product_id).update(
            stock=F("stock") - item.qty
        )

If an exception occurs inside the block, all writes are rolled back. The decorator form is also possible — @transaction.atomic wraps a whole function/method as a single transaction. Deeper patterns like transaction post-processing (on_commit) are in Advanced #5.

The lazy nature of QuerySet #

A QuerySet does not send SQL until it is actually evaluated.

evaluation timing
qs = Post.objects.filter(published=True)   # no SQL yet
qs = qs.order_by("-created_at")            # still none
qs = qs.annotate(cnt=Count("comments"))    # still none

list(qs)                                    # SQL runs here
for p in qs: pass                           # or here
print(qs[0])                                # or here

While chaining, you’re just assembling the query. Evaluation happens on iteration, slicing, len, bool, list, etc. Understanding this lets you avoid unnecessary queries.

Summary #

What we covered in this post:

  • aggregate (whole-set aggregation → dict), annotate (per-row aggregation → QuerySet)
  • Count(..., filter=Q(...)) — conditional aggregation
  • F expression — DB-side computation, concurrency-safe
  • Q object — OR/AND/NOT, dynamic condition assembly
  • N+1 problem and the fix: select_related (forward FK, JOIN), prefetch_related (reverse/M2M, separate query)
  • Prefetch(..., queryset=..., to_attr=...) to customize prefetch itself
  • only / defer — field selection (beware extra queries when accessing unselected fields)
  • values / values_list(flat=True) — receive as dict/tuple
  • Subquery, OuterRef, Exists — subqueries
  • raw SQL is the last resort, parameters via placeholders
  • transaction.atomic to make multiple writes one unit
  • QuerySet is lazy — understand evaluation timing

In the next post (#3 Signals and Middleware), we cover two things that happen outside the model/view flow — Signals (events) and Middleware (request/response pipeline). Both are powerful but full of pitfalls.

X