Collections and comprehensions
The four collections — list/tuple/dict/set — and the comprehensions and generator expressions that build new collections in one line.
Time to look at comprehensions in earnest, having previewed them at the end of Chapter 3 Control flow. Before that, we cover the use cases of Python’s four core collections — list, tuple, dict, and set.
The two axes of this chapter are (1) knowing the differences between the four collections precisely, and (2) getting comfortable with one-line comprehensions. Both show up every day throughout the book. Chapter 11 Iterables, generators, yield from covers in depth how the comprehensions and generator expressions in this chapter sit on the same iterator protocol.
Comparison in one table
| | Mutable | Ordered | Duplicates | Key-value | Notation |
|---|---|---|---|---|---|
| list | O | O | allowed | X | [1, 2, 3] |
| tuple | X | O | allowed | X | (1, 2, 3) |
| set | O | X | not allowed | X | {1, 2, 3} |
| dict | O | O¹ | keys unique | O | {"a": 1} |
¹ Since Python 3.7+, dict guarantees insertion order.
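The rows of the table can be checked directly in an interpreter. The following is a minimal sketch; the variable names are illustrative and not from the chapter.

```python
# Mutability: a list can change in place, a tuple cannot.
nums = [1, 2, 3]
nums[0] = 99
point = (1, 2, 3)
try:
    point[0] = 99  # tuples do not support item assignment
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

# Duplicates: a list keeps them, a set drops them.
with_dupes = [1, 1, 2]
as_set = set(with_dupes)

# Order: a dict remembers the order in which keys were inserted.
d = {"b": 1, "a": 2}
d["c"] = 3
key_order = list(d)

print(nums)              # [99, 2, 3]
print(tuple_is_mutable)  # False
print(len(as_set))       # 2
print(key_order)         # ['b', 'a', 'c']
```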
list — the most-used collection
nums: list[int] = [1, 2, 3]
nums.append(4) # [1, 2, 3, 4]
nums.insert(0, 0) # [0, 1, 2, 3, 4]
nums.remove(2) # [0, 1, 3, 4]
last = nums.pop() # returns 4, list is [0, 1, 3]
print(len(nums)) # 3
print(nums[0]) # 0
print(3 in nums) # True
Slicing — [start:stop:step]
The real value of list is in slicing.
items = [10, 20, 30, 40, 50]
items[1:3] # [20, 30] 1 inclusive, 3 exclusive
items[:2] # [10, 20] from start, less than 2
items[3:] # [40, 50] from 3 to the end
items[-2:] # [40, 50] last 2
items[::2] # [10, 30, 50] step 2
items[::-1] # [50, 40, 30, 20, 10] reverse
Reversing a list with step = -1 shows up often. It is shorter to write than JavaScript’s arr.toReversed().
Slicing returns a new list. The original is unchanged.
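A small sketch of what "new list" means in practice. It also contrasts reading a slice with assigning to one, which does edit the original in place. The names head and copy_all are illustrative.

```python
items = [10, 20, 30, 40, 50]

head = items[:2]     # new list holding the first two elements
head.append(99)      # changes head only, items is untouched

copy_all = items[:]  # full-range slice: a shallow copy of the whole list

# Slice assignment is different: it edits the original list in place.
items[1:3] = [0]

print(items)     # [10, 0, 40, 50]
print(head)      # [10, 20, 99]
print(copy_all)  # [10, 20, 30, 40, 50]
```

The copy is shallow: if the elements were themselves lists, the original and the slice would still share those inner objects.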
+ and * — concatenation and repetition
[1, 2] + [3, 4] # [1, 2, 3, 4]
[0] * 5 # [0, 0, 0, 0, 0]
[0] * 5 is the shortest way to build a list filled with zeros. But multiplying a list that holds mutable objects copies only the references, so every slot points to the same object.
matrix = [[]] * 3
matrix[0].append(1)
print(matrix)
# [[1], [1], [1]] ← all point to the same list!
For these cases, use a comprehension (covered below).
tuple — a fixed-shape grouping
Like list, but immutable. Because of that, the use cases differ.
point: tuple[float, float] = (1.0, 2.0)
person: tuple[str, int] = ("curtis", 30)
# tuples are usually used by unpacking
name, age = person
print(name, age) # curtis 30
# A single-element tuple requires a comma — parentheses alone aren't enough
single = (42,) # tuple
not_tuple = (42) # just an integer
When tuple fits
- Returning multiple values together: return name, age (actually a tuple)
- Dictionary keys: lists can’t be keys, but tuples can, because they are immutable (every element must be hashable too)
- Fixed-shape data like coordinates / dates: clearer intent than list
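The first two use cases in the list above can be sketched as follows. The function min_max and the grid dictionary are hypothetical examples, not part of the chapter.

```python
def min_max(values: list[int]) -> tuple[int, int]:
    """Return the smallest and largest value as one tuple."""
    return min(values), max(values)  # the comma builds the tuple

lo, hi = min_max([3, 1, 4, 1, 5])

# Tuples of hashable values work as dict keys.
grid: dict[tuple[int, int], str] = {(0, 0): "origin", (1, 2): "treasure"}

# Lists do not: building the dict below raises TypeError (unhashable type).
try:
    bad = {[0, 0]: "origin"}
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(lo, hi)        # 1 5
print(grid[(1, 2)])  # treasure
print(list_key_ok)   # False
```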
A more explicit tuple — NamedTuple
Position-based tuples get confusing over time (was person[0] the name?). Naming the tuple keeps it clean.
from typing import NamedTuple
class Person(NamedTuple):
    name: str
    age: int
p = Person("curtis", 30)
print(p.name, p.age) # curtis 30
# can also be unpacked like a tuple
name, age = p
The more generally used @dataclass (vs. NamedTuple) is covered in earnest in Chapter 8 dataclass and __slots__. When you need mutable data, methods, or validation, dataclass is the more natural fit.
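To see why mutable data pushes you toward dataclass, here is a short sketch of what a NamedTuple does and does not allow. It reuses the Person class from above; _replace and _asdict are the standard named-tuple helper methods.

```python
from typing import NamedTuple

class Person(NamedTuple):
    name: str
    age: int

p = Person("curtis", 30)

try:
    p.age = 31  # fields are read-only, like any tuple element
    mutated = True
except AttributeError:
    mutated = False

older = p._replace(age=31)  # returns a new Person, p is untouched
as_dict = p._asdict()       # plain dict with the same fields

print(mutated)  # False
print(p.age)    # 30
print(older)    # Person(name='curtis', age=31)
print(as_dict)  # {'name': 'curtis', 'age': 30}
```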
dict — key-value mapping
user: dict[str, int | str] = {"id": 1, "age": 30}
print(user["id"]) # 1
print(user.get("nope")) # None (missing key → safe)
print(user.get("nope", -1)) # -1 (default)
user["name"] = "curtis" # add/update
del user["age"] # delete
print("name" in user) # True
user["missing-key"] raises a KeyError exception. To fetch safely, use .get().
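The options for a key that may be missing, side by side. This is a sketch; .setdefault() is not covered in the text above and is included only as a related standard dict method.

```python
user = {"id": 1, "name": "curtis"}

# 1. Indexing: raises KeyError, so a missing key cannot go unnoticed.
try:
    age = user["age"]
except KeyError:
    age = None

# 2. .get(): returns a default and leaves the dict unchanged.
city = user.get("city", "unknown")

# 3. .setdefault(): returns the value, inserting the default if absent.
role = user.setdefault("role", "guest")

print(age, city, role)  # None unknown guest
print("city" in user)   # False
print("role" in user)   # True
```

Indexing is the better choice when a missing key is a bug you want to surface; .get() suits keys that are genuinely optional.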
Iteration
for key in user: # keys only
    print(key)
for key, value in user.items(): # both
    print(key, value)
for value in user.values(): # values only
    print(value)
.items() gets used almost every day.
Merging — the | operator (3.9+)
defaults = {"a": 1, "b": 2}
overrides = {"b": 20, "c": 30}
merged = defaults | overrides
print(merged) # {'a': 1, 'b': 20, 'c': 30}
With |, the right side wins: for duplicate keys, the later value survives. This has the same meaning as JavaScript’s {...defaults, ...overrides}.
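A sketch of the merge operator next to its close relatives: dict unpacking, which gives the same result on interpreters older than 3.9, and the in-place |= form. The names unpacked, config, and flipped are illustrative.

```python
defaults = {"a": 1, "b": 2}
overrides = {"b": 20, "c": 30}

merged = defaults | overrides         # new dict, inputs untouched (3.9+)
unpacked = {**defaults, **overrides}  # same result via unpacking

config = dict(defaults)  # copy first so defaults stays intact
config |= overrides      # in-place merge (3.9+)

flipped = overrides | defaults        # operand order decides the winner

print(merged == unpacked == config)  # True
print(flipped["b"])                  # 2
print(defaults)                      # {'a': 1, 'b': 2}
```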
set — a group without duplicates
unique: set[int] = {1, 2, 3, 2, 1}
print(unique) # {1, 2, 3}
unique.add(4) # {1, 2, 3, 4}
unique.discard(2) # {1, 3, 4}
print(3 in unique) # True
# empty set is set() — {} is an empty dict
empty = set()
Set operations
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
a | b # union {1, 2, 3, 4, 5, 6}
a & b # intersection {3, 4}
a - b # difference {1, 2}
a ^ b # symmetric diff {1, 2, 5, 6}
The shortest way to de-duplicate a list is list(set(items)), but a set does not keep the original order. To de-duplicate while preserving order:
items = [1, 2, 1, 3, 2, 4]
unique = list(dict.fromkeys(items))
print(unique) # [1, 2, 3, 4]
dict.fromkeys() relies on two dict properties at once: insertion order is preserved and keys are unique.
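dict.fromkeys() compares the items themselves. When "duplicate" should mean something looser, such as ignoring case, the same idea can be written out with a set of already-seen keys. A minimal sketch; unique_by is a hypothetical helper, not a standard library function.

```python
from typing import Callable, Hashable, Iterable, TypeVar

T = TypeVar("T")

def unique_by(items: Iterable[T], key: Callable[[T], Hashable]) -> list[T]:
    """Keep the first item for each distinct key, preserving input order."""
    seen: set[Hashable] = set()
    result: list[T] = []
    for item in items:
        k = key(item)
        if k not in seen:  # set membership test is fast on average
            seen.add(k)
            result.append(item)
    return result

words = ["Apple", "apple", "Banana", "APPLE", "banana"]
print(unique_by(words, key=str.lower))  # ['Apple', 'Banana']
```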
Comprehensions — build a collection in one line
A comprehension combines a for loop and an optional condition into a single expression. It is one of the most-used patterns in Python.
List comprehension
# [expression for variable in iterable]
squares = [x ** 2 for x in range(5)]
# [0, 1, 4, 9, 16]
Written out as a for loop:
squares = []
for x in range(5):
    squares.append(x ** 2)
Three lines shrink to one. It also reads faster, because the intent is visible at a glance.
Condition — filter
evens = [x for x in range(10) if x % 2 == 0]
# [0, 2, 4, 6, 8]
The if is a filter. Only elements that satisfy the condition make it into the result.
Transform + filter together
# pick evens and square them
result = [x ** 2 for x in range(10) if x % 2 == 0]
# [0, 4, 16, 36, 64]
if-else expression (mind the position)
An if-else goes in the expression position, before the for. A filter if goes after it.
labels = ["even" if x % 2 == 0 else "odd" for x in range(5)]
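# ['even', 'odd', 'even', 'odd', 'even']
The order is [expression if cond else other-expression for x in iter]. Unlike the filter, this form never drops elements; it only chooses which value each element produces. It is easy to mix up, so read it slowly.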
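Putting the two positions side by side may help. In this sketch the trailing if changes how many elements come out, while the leading if-else changes what each element becomes. The names are illustrative.

```python
nums = range(-3, 4)  # -3, -2, -1, 0, 1, 2, 3

# Filter only: the trailing if drops elements, so the result is shorter.
positives = [n for n in nums if n > 0]

# Conditional expression only: every element stays, the value changes.
clamped = [n if n > 0 else 0 for n in nums]

# Both at once: the trailing if selects, the leading if-else labels.
labels = ["even" if n % 2 == 0 else "odd" for n in nums if n > 0]

print(positives)  # [1, 2, 3]
print(clamped)    # [0, 0, 0, 0, 1, 2, 3]
print(labels)     # ['odd', 'even', 'odd']
```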
Nesting — building 2D
matrix = [[0 for _ in range(3)] for _ in range(3)]
# [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
The comprehension solves the [[]] * 3 pitfall seen earlier. Every iteration creates a new list, so the rows are independent.
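The independence can be verified with the is operator. The sketch below builds the same 3×3 grid both ways and writes to one cell; shared and independent are illustrative names.

```python
shared = [[0] * 3] * 3                      # one row object, repeated 3 times
independent = [[0] * 3 for _ in range(3)]   # a new row on every iteration

shared[0][0] = 1
independent[0][0] = 1

print(shared)       # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
print(independent)  # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
print(shared[0] is shared[1])            # True
print(independent[0] is independent[1])  # False
```

The inner [0] * 3 is safe in both cases because integers are immutable; only the outer repetition of a mutable row causes the sharing.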
Dict comprehension
square_map = {x: x ** 2 for x in range(5)}
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
# key-value swap
user = {"id": 1, "name": "curtis"}
swapped = {v: k for k, v in user.items()}
# {1: "id", "curtis": "name"}
Set comprehension
unique_lengths = {len(w) for w in ["a", "bb", "cc", "ddd"]}
# {1, 2, 3}
Pitfall — order of nested for
pairs = [(x, y) for x in [1, 2] for y in ['a', 'b']]
# [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
When there are multiple fors, the leftmost is the outer loop. Expanded:
pairs = []
for x in [1, 2]:
    for y in ['a', 'b']:
        pairs.append((x, y))
Generator expressions — saving memory
Using parentheses instead of brackets turns the comprehension into a generator.
gen = (x ** 2 for x in range(1_000_000))
print(gen) # <generator object ...>
A list comprehension creates every element immediately. With a million elements, you pay for a million elements’ worth of memory. A generator produces the next value only on request, so its memory cost stays small and constant regardless of length.
total = sum(x ** 2 for x in range(1_000_000))
# When the generator expression is the only argument, the extra () can be omitted
Generators are the right fit for functions like sum, max, any, and all, which need only a single pass.
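With any and all there is a second benefit beyond memory: they stop as soon as the answer is known, and a generator lets them skip the remaining work. A sketch, using a hypothetical predicate is_negative that records each value it inspects:

```python
checked: list[int] = []

def is_negative(n: int) -> bool:
    """Hypothetical predicate that logs every value it is asked about."""
    checked.append(n)
    return n < 0

values = [3, -1, 4, 1, 5]

# Generator: any() stops at the first True, later values are never tested.
found_lazy = any(is_negative(n) for n in values)
seen_lazy = list(checked)

# List comprehension: every value is tested before any() even starts.
checked.clear()
found_eager = any([is_negative(n) for n in values])
seen_eager = list(checked)

print(found_lazy, seen_lazy)    # True [3, -1]
print(found_eager, seen_eager)  # True [3, -1, 4, 1, 5]
```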
| | List comprehension | Generator expression |
|---|---|---|
| Notation | [ ... ] | ( ... ) |
| Memory | Creates every element immediately | Only when needed |
| Reuse | Iterable many times | Once only |
| Index access | O (result[3]) | X |
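The Memory and Reuse rows can be observed with a short experiment. This is a rough sketch: sys.getsizeof reports only the size of the container object itself (as measured by CPython), not the elements it refers to, so treat the comparison as indicative.

```python
import sys

squares_list = [x ** 2 for x in range(1000)]
squares_gen = (x ** 2 for x in range(1000))

# The list object grows with its length; the generator object does not.
list_is_bigger = sys.getsizeof(squares_list) > sys.getsizeof(squares_gen)

first_pass = sum(squares_gen)   # consumes the generator
second_pass = sum(squares_gen)  # nothing left, so the sum is 0

print(list_is_bigger)                        # True
print(first_pass, second_pass)               # 332833500 0
print(sum(squares_list), sum(squares_list))  # 332833500 332833500
```

If the values are needed more than once, either build a list or create a fresh generator expression for each pass.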
Deeper generator usage — yield, yield from, async generators — is covered in Chapter 11 Iterables, generators, yield from.
When to use comprehensions, when to expand them?
Comprehensions aren’t always best. When the logic grows complex, expanding reads better.
# Possible in one line, but hard to read
result = [transform(x) for x in items if validate(x) and is_active(x) and not is_deleted(x)]
# Expanded — easier to debug and read
result = []
for x in items:
    if not validate(x):
        continue
    if not is_active(x) or is_deleted(x):
        continue
    result.append(transform(x))
Rule of thumb: comprehensions shine with a single expression and at most one if. Beyond that, expanding is usually better.
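One possible middle path, offered as an option rather than as the chapter's recommendation: move the compound condition into a named helper so the comprehension is back to a single if. Everything below is a hypothetical stand-in. The chapter does not define validate, is_active, is_deleted, or transform, so the sketch assumes records are plain dicts and invents simple bodies for them, plus a helper called should_keep.

```python
# Hypothetical implementations, invented only to make the sketch runnable.
def validate(x: dict) -> bool:
    return "name" in x

def is_active(x: dict) -> bool:
    return x.get("active", False)

def is_deleted(x: dict) -> bool:
    return x.get("deleted", False)

def transform(x: dict) -> str:
    return x["name"].upper()

def should_keep(x: dict) -> bool:
    """Bundle the three checks under one descriptive name."""
    return validate(x) and is_active(x) and not is_deleted(x)

items = [
    {"name": "a", "active": True},
    {"name": "b", "active": True, "deleted": True},
    {"active": True},                # fails validate: no name
    {"name": "d", "active": False},  # not active
]

result = [transform(x) for x in items if should_keep(x)]
print(result)  # ['A']
```

The helper can be unit-tested on its own, which recovers some of the debuggability of the expanded loop.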
Exercises
- users: list[dict[str, int | str]] is given as [{"name": "a", "age": 30}, {"name": "b", "age": 17}, {"name": "c", "age": 22}]. Write a one-line comprehension expression that returns the names of adults (19+) only as a list.
- From words = ["apple", "banana", "cherry", "date"], use a dict comprehension to build a word → length mapping like {"apple": 5, "banana": 6, "cherry": 6, "date": 4}.
- You need to compute the sum of squares of evens in range(1, 100_000_001). Do it (1) once with a list comprehension, sum([x**2 for x in range(...) if x % 2 == 0]), and (2) once with a generator expression, sum(x**2 for x in range(...) if x % 2 == 0). Feel the memory / time difference yourself (memory monitoring is revisited in Chapter 21).
In one line: Choosing between the four collections turns on four axes: mutability / order / duplicates / key-value.
list slicing, dict.get and .items(), and set operations cover 90% of daily work. A one-line comprehension is usually shorter and faster, but when conditions / transforms get complex, expanding is better. For memory-heavy single-pass iteration, use a generator expression (...).
Next chapter
In the next chapter, Chapter 5 Functions — argument patterns, we cover the argument patterns available in function definitions: positional, keyword, default, *args, **kwargs, positional-only, and keyword-only. Combined with the comprehensions of this chapter, they let you write short functional-style data transformation code.