Chapter 4

Collections and comprehensions

The four collections — list/tuple/dict/set — and the comprehensions and generator expressions that build new collections in one line.

Time to look at comprehensions in earnest, having previewed them at the end of Chapter 3 Control flow. Before that, we cover the use cases of Python’s four core collections: list, tuple, dict, and set.

The two axes of this chapter are (1) knowing the differences between the four collections precisely, and (2) getting comfortable with one-line comprehensions. Both show up every day throughout the book. Chapter 11 Iterables, generators, yield from covers in depth how the comprehensions and generator expressions in this chapter sit on the same iterator protocol.

Comparison in one table #

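|       | Mutable | Ordered | Duplicates  | Key-value | Notation  |
|-------|---------|---------|-------------|-----------|-----------|
| list  | O       | O       | allowed     | X         | [1, 2, 3] |
| tuple | X       | O       | allowed     | X         | (1, 2, 3) |
| set   | O       | X       | not allowed | X         | {1, 2, 3} |
| dict  | O       | O¹      | keys unique | O         | {"a": 1}  |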

¹ Since Python 3.7, dict guarantees insertion order.

list — the most-used collection #

list basics
nums: list[int] = [1, 2, 3]

nums.append(4)         # [1, 2, 3, 4]
nums.insert(0, 0)      # [0, 1, 2, 3, 4]
nums.remove(2)         # [0, 1, 3, 4]
last = nums.pop()      # returns 4, list is [0, 1, 3]

print(len(nums))       # 3
print(nums[0])         # 0
print(3 in nums)       # True

Slicing — [start:stop:step] #

The real value of list is in slicing.

slicing
items = [10, 20, 30, 40, 50]

items[1:3]    # [20, 30]    1 inclusive, 3 exclusive
items[:2]     # [10, 20]    from start, less than 2
items[3:]     # [40, 50]    from 3 to the end
items[-2:]    # [40, 50]    last 2
items[::2]    # [10, 30, 50]  step 2
items[::-1]   # [50, 40, 30, 20, 10]   reverse

The pattern of step = -1 to reverse a list shows up often. It is shorter than JavaScript’s arr.toReversed() and, like it, returns a new reversed copy.

Slicing returns a new list. The original is unchanged.
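A minimal sketch of what "new list" means in practice; the variable names `head` and `copy` are illustrative only. Changing the slice does not touch the original, and `items[:]` is a common way to take a shallow copy of the whole list.

```python
items = [10, 20, 30, 40, 50]

head = items[:2]      # a new list holding the first two elements
head.append(99)       # mutating the slice...

print(head)           # [10, 20, 99]
print(items)          # [10, 20, 30, 40, 50]  ...leaves the original alone

# items[:] copies the whole list (a shallow copy)
copy = items[:]
print(copy == items)  # True   same contents
print(copy is items)  # False  a different list object
```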

+ and * — concatenation and repetition #

concat and repeat
[1, 2] + [3, 4]   # [1, 2, 3, 4]
[0] * 5           # [0, 0, 0, 0, 0]

[0] * 5 is the shortest way to build a list filled with zeros. But * copies references, not objects, so multiplying a list that holds a mutable object makes every slot point to that same object.

pitfall — multiplying a mutable object
matrix = [[]] * 3
matrix[0].append(1)
print(matrix)
# [[1], [1], [1]]   ← all point to the same list!

For these cases you must use a comprehension (covered below).
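The sharing can be checked directly with the `is` operator, which compares object identity. The sketch below (names `shared` and `zeros` are illustrative) also shows why `[0] * 5` is safe: assigning to a slot rebinds it to a different int and does not alter a shared object.

```python
shared = [[]] * 3
print(shared[0] is shared[1])   # True: one inner list, referenced three times

# Immutable elements are safe: updating a slot rebinds it to a new object
zeros = [0] * 3
zeros[0] += 1
print(zeros)                    # [1, 0, 0]
```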

tuple — a fixed-shape grouping #

Like list, but immutable. Because of that, the use cases differ.

tuple basics
point: tuple[float, float] = (1.0, 2.0)
person: tuple[str, int] = ("curtis", 30)

# tuples are usually used by unpacking
name, age = person
print(name, age)   # curtis 30

# A single-element tuple requires a comma — parentheses alone aren't enough
single = (42,)     # tuple
not_tuple = (42)   # just an integer

When tuple fits #

  • Returning multiple values together: return name, age (actually a tuple)
  • Dictionary keys: lists can’t be keys but tuples can (they’re immutable)
  • Fixed-shape data like coordinates / dates — clearer intent than list
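The first two cases above can be sketched as follows. `min_max` and `grid` are hypothetical names made up for this illustration.

```python
def min_max(values: list[int]) -> tuple[int, int]:
    """Return the smallest and largest value together (hypothetical helper)."""
    return min(values), max(values)   # the comma builds a tuple

low, high = min_max([3, 1, 4, 1, 5])
print(low, high)      # 1 5

# A tuple of hashable elements is itself hashable, so it can be a dict key
grid: dict[tuple[int, int], str] = {(0, 0): "origin", (1, 2): "treasure"}
print(grid[(1, 2)])   # treasure

# A list in the same position is rejected with a TypeError
try:
    bad = {[0, 0]: "origin"}
except TypeError:
    print("a list cannot be a dict key")
```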

A more explicit tuple — NamedTuple #

Position-based tuples get confusing over time (was person[0] the name?). Naming the tuple keeps it clean.

NamedTuple
from typing import NamedTuple

class Person(NamedTuple):
    name: str
    age: int

p = Person("curtis", 30)
print(p.name, p.age)   # curtis 30

# can also be unpacked like a tuple
name, age = p

The more generally used @dataclass (vs. NamedTuple) is covered in earnest in Chapter 8 dataclass and __slots__. When you need mutable data, methods, or validation, dataclass is the more natural fit.

dict — key-value mapping #

dict basics
user: dict[str, int] = {"id": 1, "age": 30}

print(user["id"])              # 1
print(user.get("nope"))        # None  (missing key → safe)
print(user.get("nope", -1))    # -1    (default)

user["name"] = "curtis"        # add/update
del user["age"]                # delete
print("name" in user)          # True

user["missing-key"] raises a KeyError exception. To fetch safely, use .get().
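A short sketch of the two access styles side by side, assuming a `user` dict that has no "age" key. Bracket access suits cases where a missing key is a bug you want to hear about; .get() suits cases where absence is expected.

```python
user = {"id": 1, "name": "curtis"}

# Bracket access: a missing key raises KeyError
try:
    age = user["age"]
except KeyError as exc:
    print("missing key:", exc)   # missing key: 'age'

# .get(): a missing key returns the default instead
age = user.get("age", 0)
print(age)             # 0

# .get() only reads; it does not insert the key
print("age" in user)   # False
```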

Iteration #

dict iteration
for key in user:                # keys only
    print(key)

for key, value in user.items(): # both
    print(key, value)

for value in user.values():     # values only
    print(value)

.items() gets used almost every day.

Merging — the | operator (3.9+) #

merging dicts
defaults = {"a": 1, "b": 2}
overrides = {"b": 20, "c": 30}

merged = defaults | overrides
print(merged)    # {'a': 1, 'b': 20, 'c': 30}

With |, the right side wins — for duplicate keys, the later value survives. Same meaning as JavaScript’s {...defaults, ...overrides}.
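To see "the right side wins" from both directions, the sketch below swaps the operands. It also uses `|=`, the in-place form introduced together with `|` in Python 3.9; `settings` is an illustrative name.

```python
defaults = {"a": 1, "b": 2}
overrides = {"b": 20, "c": 30}

print(defaults | overrides)   # {'a': 1, 'b': 20, 'c': 30}
print(overrides | defaults)   # {'b': 2, 'c': 30, 'a': 1}

# | builds a new dict; both operands are untouched
print(defaults)               # {'a': 1, 'b': 2}

# |= updates the left-hand dict in place
settings = dict(defaults)
settings |= overrides
print(settings)               # {'a': 1, 'b': 20, 'c': 30}
```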

set — a group without duplicates #

set basics
unique: set[int] = {1, 2, 3, 2, 1}
print(unique)   # {1, 2, 3}

unique.add(4)      # {1, 2, 3, 4}
unique.discard(2)  # {1, 3, 4}
print(3 in unique) # True

# empty set is set() — {} is an empty dict
empty = set()

Set operations #

set operations
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

a | b   # union          {1, 2, 3, 4, 5, 6}
a & b   # intersection   {3, 4}
a - b   # difference     {1, 2}
a ^ b   # symmetric diff {1, 2, 5, 6}

The shortest way to de-duplicate a list: list(set(items)). But a set is unordered, so the original order may be lost. To de-duplicate while preserving order:

order-preserving de-dup
items = [1, 2, 1, 3, 2, 4]
unique = list(dict.fromkeys(items))
print(unique)   # [1, 2, 3, 4]

dict.fromkeys() relies on two dict properties at once: keys are unique, and insertion order is preserved.
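For comparison, here is roughly the same de-duplication written out by hand, with a helper set (`seen`, an illustrative name) that remembers which values have already appeared. Both versions require the elements to be hashable.

```python
items = [1, 2, 1, 3, 2, 4]

# Repeated keys collapse; each key keeps the position of its first appearance
print(dict.fromkeys(items))   # {1: None, 2: None, 3: None, 4: None}

# The same idea, spelled out with a loop
seen: set[int] = set()
unique: list[int] = []
for item in items:
    if item not in seen:
        seen.add(item)
        unique.append(item)

print(unique)                                 # [1, 2, 3, 4]
print(unique == list(dict.fromkeys(items)))   # True
```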

Comprehensions — build a collection in one line #

A syntax that combines a for loop and a condition into one line. One of the most-used patterns in Python.

List comprehension #

basic form
# [expression for variable in iterable]

squares = [x ** 2 for x in range(5)]
# [0, 1, 4, 9, 16]

Written out as a for loop:

same code, expanded
squares = []
for x in range(5):
    squares.append(x ** 2)

Three lines shrink to one. It also reads faster, since the intent is visible at a glance.

Condition — filter #

if clause
evens = [x for x in range(10) if x % 2 == 0]
# [0, 2, 4, 6, 8]

The if is a filter. Only elements that match the condition enter.

Transform + filter together #

both at once
# pick evens and square them
result = [x ** 2 for x in range(10) if x % 2 == 0]
# [0, 4, 16, 36, 64]

if-else expression (mind the position) #

if-else goes in the expression position. Its position differs from the filter if.

conditional expression
labels = ["even" if x % 2 == 0 else "odd" for x in range(5)]
# ['even', 'odd', 'even', 'odd', 'even']

The order is [expression if cond else other-expression for x in iter]. It’s a confusing part, so read it slowly.
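Putting the two forms next to each other may help. In this sketch (all names are illustrative), the `if` after the `for` decides whether an element is kept, while the `if-else` before the `for` decides what value it becomes. The last line uses both at once.

```python
nums = range(-3, 4)   # -3, -2, -1, 0, 1, 2, 3

# Filter if (after the for): decides WHETHER an element is kept
positives = [x for x in nums if x > 0]
print(positives)   # [1, 2, 3]

# if-else (before the for): decides WHAT each element becomes
clamped = [x if x > 0 else 0 for x in nums]
print(clamped)     # [0, 0, 0, 0, 1, 2, 3]

# Both together: the filter drops 0, the if-else labels the rest
signs = ["pos" if x > 0 else "neg" for x in nums if x != 0]
print(signs)       # ['neg', 'neg', 'neg', 'pos', 'pos', 'pos']
```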

Nesting — building 2D #

nested comprehension — 2D matrix
matrix = [[0 for _ in range(3)] for _ in range(3)]
# [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

The comprehension solves the [[]] * 3 pitfall seen earlier. Every iteration creates a new list, so the rows are independent.
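A side-by-side sketch of the shared-reference pitfall from the list section and the comprehension fix; `shared` and `independent` are illustrative names. The inner `[0] * 3` is fine in both cases because ints are immutable. Only the outer repetition matters.

```python
shared = [[0] * 3] * 3                      # outer * repeats one row object
independent = [[0] * 3 for _ in range(3)]   # a fresh row per iteration

shared[0][0] = 1
independent[0][0] = 1

print(shared)        # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
print(independent)   # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
```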

Dict comprehension #

dict comprehension
square_map = {x: x ** 2 for x in range(5)}
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

# key-value swap
user = {"id": 1, "name": "curtis"}
swapped = {v: k for k, v in user.items()}
# {1: "id", "curtis": "name"}

Set comprehension #

set comprehension
unique_lengths = {len(w) for w in ["a", "bb", "cc", "ddd"]}
# {1, 2, 3}

Pitfall — order of nested for #

nested for
pairs = [(x, y) for x in [1, 2] for y in ['a', 'b']]
# [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]

When there are multiple fors, the leftmost is the outer loop. Expanded:

same thing, expanded
pairs = []
for x in [1, 2]:
    for y in ['a', 'b']:
        pairs.append((x, y))
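The same left-to-right rule explains a common use of two fors: flattening a list of lists. This is a small sketch with a made-up `matrix`. The outer loop over rows must come first, because the inner loop reads `row`.

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Read the for clauses left to right, exactly as you would nest the loops
flat = [value for row in matrix for value in row]
print(flat)   # [1, 2, 3, 4, 5, 6]

# Swapping the clauses does not work: `row` would be read before its loop binds it
# flat = [value for value in row for row in matrix]
```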

Generator expressions — saving memory #

Using parentheses instead of brackets turns the comprehension into a generator.

generator expression
gen = (x ** 2 for x in range(1_000_000))
print(gen)   # <generator object ...>

A list comprehension creates every element immediately. With a million elements, you take a million elements’ worth of memory. A generator produces the next value only on request, so its memory cost stays small and constant no matter how many values it yields.
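One rough way to see the difference is `sys.getsizeof`. Treat the numbers as indicative only: they vary by Python version, and `getsizeof` measures just the container, not the int objects the list refers to. The names `squares_list` and `squares_gen` are illustrative.

```python
import sys

n = 100_000
squares_list = [x ** 2 for x in range(n)]
squares_gen = (x ** 2 for x in range(n))

# The list's size grows with n; the generator's size does not depend on n
print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_gen))

# Nothing has been computed yet; next() asks for one value at a time
print(next(squares_gen))   # 0
print(next(squares_gen))   # 1
print(next(squares_gen))   # 4
```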

sum — a generator is enough
total = sum(x ** 2 for x in range(1_000_000))
# When a generator expression is the only argument to a function, its own () can be omitted

Passing to functions like sum, max, any, all that only need a single pass is the proper place for a generator.
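With any and all there is a second benefit besides memory: they stop as soon as the answer is known, and a generator lets them skip the remaining work. The sketch below records every call in a list to make this visible; `is_negative` and `checked` are hypothetical names.

```python
checked: list[int] = []

def is_negative(x: int) -> bool:
    """Hypothetical predicate that records each value it is asked about."""
    checked.append(x)
    return x < 0

values = [3, -1, 4, -5, 9]

# Generator: any() stops at the first True, so only 3 and -1 are checked
found = any(is_negative(v) for v in values)
print(found, checked)         # True [3, -1]

# List comprehension: all five checks run before any() sees anything
checked.clear()
found_eager = any([is_negative(v) for v in values])
print(found_eager, checked)   # True [3, -1, 4, -5, 9]
```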

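|              | List comprehension                | Generator expression |
|--------------|-----------------------------------|----------------------|
| Notation     | [ ... ]                           | ( ... )              |
| Memory       | Creates every element immediately | Only when needed     |
| Reuse        | Iterable many times               | Once only            |
| Index access | O result[3]                       | X                    |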
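The "Reuse" and "Index access" rows are the ones that tend to surprise. A minimal sketch with illustrative names: a second pass over a generator silently yields nothing.

```python
squares_list = [x ** 2 for x in range(4)]
squares_gen = (x ** 2 for x in range(4))

print(sum(squares_list))   # 14
print(sum(squares_list))   # 14  a list can be traversed again

print(sum(squares_gen))    # 14
print(sum(squares_gen))    # 0   the generator is already exhausted

print(squares_list[3])     # 9
# squares_gen[3] would raise TypeError: generators do not support indexing
```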

Deeper generator usage — yield, yield from, async generators — is covered in Chapter 11 Iterables, generators, yield from.

When to use comprehensions, when to expand them? #

Comprehensions aren’t always best. When the logic grows complex, expanding reads better.

this one — expand it
# Possible in one line, but hard to read
result = [transform(x) for x in items if validate(x) and is_active(x) and not is_deleted(x)]

# Expanded — easier to debug and read
result = []
for x in items:
    if not validate(x):
        continue
    if not is_active(x) or is_deleted(x):
        continue
    result.append(transform(x))

Rule of thumb: one expression per line, at most one if — that’s where comprehensions shine. Beyond that, expanding is usually better.
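There is also a middle ground, offered here as one possible approach and not a rule: move the compound condition into a named function so the comprehension goes back to a single if. In the sketch below, `should_keep` and the dict fields are hypothetical stand-ins for the validate / is_active / is_deleted checks above.

```python
def should_keep(item: dict) -> bool:
    """Hypothetical predicate bundling the three checks under one name."""
    return item["valid"] and item["active"] and not item["deleted"]

items = [
    {"name": "a", "valid": True, "active": True, "deleted": False},
    {"name": "b", "valid": True, "active": False, "deleted": False},
    {"name": "c", "valid": True, "active": True, "deleted": True},
]

# One expression, one if: the details live in should_keep
result = [item["name"].upper() for item in items if should_keep(item)]
print(result)   # ['A']
```

The named predicate can also be tested on its own, which recovers some of the debugging convenience of the expanded loop.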

Exercises #

  1. users: list[dict[str, int | str]] is given as [{"name": "a", "age": 30}, {"name": "b", "age": 17}, {"name": "c", "age": 22}]. Write a one-line comprehension expression that returns the names of adults (19+) only as a list.
  2. From words = ["apple", "banana", "cherry", "date"], use a dict comprehension to build a word → length mapping like {"apple": 5, "banana": 6, "cherry": 6, "date": 4}.
  3. You need to compute the sum of squares of evens in range(1, 100_000_001). Do it (1) once with a list comprehension sum([x**2 for x in range(...) if x % 2 == 0]) and (2) once with a generator expression sum(x**2 for x in range(...) if x % 2 == 0). Feel the memory / time difference yourself (memory monitoring is revisited in Chapter 21).

In one line: Choosing between the four collections turns on four axes — mutability / order / duplicates / key-value. list slicing, dict.get and .items(), and set operations cover 90% of daily work. A one-line comprehension is usually shorter and faster, but when conditions / transforms get complex, expanding is better. For memory-heavy single-pass iteration, use a generator expression (...).

Next chapter #

In the next chapter, Chapter 5 Functions — argument patterns, we cover the various argument patterns in function definitions: positional, keyword, default, *args, **kwargs, positional-only, and keyword-only. Combined with the comprehensions of this chapter, they let you write short functional-style data transformation code.
