Modern Python Basics #4: Collections and comprehensions
It’s time to look at comprehensions — which we briefly saw at the end of #3 Control flow — in earnest. Before that, let’s lay out Python’s four core collections — list, tuple, dict, set.
Their roles in one table #
| | Mutable | Ordered | Duplicates | Key-value | Notation |
|---|---|---|---|---|---|
| list | yes | yes | allowed | no | [1, 2, 3] |
| tuple | no | yes | allowed | no | (1, 2, 3) |
| set | yes | no | disallowed | no | {1, 2, 3} |
| dict | yes | yes¹ | unique keys | yes | {"a": 1} |
¹ From Python 3.7+, dict guarantees insertion order.
list — the workhorse
#
nums: list[int] = [1, 2, 3]
nums.append(4) # [1, 2, 3, 4]
nums.insert(0, 0) # [0, 1, 2, 3, 4]
nums.remove(2) # [0, 1, 3, 4]
last = nums.pop() # returns 4; list becomes [0, 1, 3]
print(len(nums)) # 3
print(nums[0]) # 0
print(3 in nums) # True
Slicing: [start:stop:step]
#
The list’s real power is in slicing.
items = [10, 20, 30, 40, 50]
items[1:3] # [20, 30] 1 inclusive, 3 exclusive
items[:2] # [10, 20] from start, less than 2
items[3:] # [40, 50] from 3 to end
items[-2:] # [40, 50] last 2
items[::2] # [10, 30, 50] step 2
items[::-1] # [50, 40, 30, 20, 10] reverse
Reversing with a step of -1 is a common idiom, and it is shorter to write than JavaScript’s arr.toReversed().
Slicing returns a new list. The original is unchanged.
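A quick way to check the copy behavior described above is to mutate a slice and compare it with the original. This is a minimal sketch; the names head and copy are illustrative only.

```python
items = [10, 20, 30, 40, 50]

head = items[:2]        # new list holding the first two elements
head.append(99)         # mutating the slice...
print(head)             # [10, 20, 99]
print(items)            # [10, 20, 30, 40, 50]  ...leaves the original alone

copy = items[:]         # a full slice is a shallow copy of the list
print(copy == items)    # True: same contents
print(copy is items)    # False: a different list object
```

The copy is shallow: the new list is independent, but any mutable elements inside it are still shared with the original.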
+ and * — concatenation and repetition
#
[1, 2] + [3, 4] # [1, 2, 3, 4]
[0] * 5 # [0, 0, 0, 0, 0]
[0] * 5 is the shortest way to make a list of zeros. Beware: multiplying a list of mutable objects copies the references, not the objects, so every slot points to the same object.
matrix = [[]] * 3
matrix[0].append(1)
print(matrix)
# [[1], [1], [1]] ← all reference the same list!
For this case, use a comprehension (covered below).
tuple — fixed-shape bundles
#
A tuple is almost the same as a list, but it is immutable and plays a different role.
point: tuple[float, float] = (1.0, 2.0)
person: tuple[str, int] = ("커티스", 30)
# tuples are usually unpacked
name, age = person
print(name, age) # 커티스 30
# A single-element tuple needs a comma — parens alone aren't enough
single = (42,) # tuple
not_tuple = (42) # just an integer
Where tuples fit #
- Returning multiple values: return name, age (it’s actually a tuple)
- Dict keys: lists can’t be keys, but tuples can (they are immutable, and hashable as long as their elements are)
- Fixed-shape data like coordinates or dates: clearer intent than a list
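The first two uses in the list above can be sketched as follows. The function min_max and the grid dict are hypothetical names chosen for illustration, not something from this series.

```python
def min_max(values: list[int]) -> tuple[int, int]:
    """Return the smallest and largest value as a (min, max) tuple."""
    return min(values), max(values)   # the comma is what builds the tuple


low, high = min_max([3, 1, 4, 1, 5])  # unpack at the call site
print(low, high)                      # 1 5

# Tuples of hashable values work as dict keys.
grid: dict[tuple[int, int], str] = {(0, 0): "origin", (1, 0): "east"}
print(grid[(1, 0)])                   # east

try:
    {[0, 0]: "origin"}                # a list key is rejected at runtime
except TypeError as exc:
    print("TypeError:", exc)
```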
A more explicit tuple — NamedTuple
#
Position-based tuples become confusing over time (was person[0] the name?). Naming the fields keeps it clean.
from typing import NamedTuple
class Person(NamedTuple):
    name: str
    age: int
p = Person("커티스", 30)
print(p.name, p.age) # 커티스 30
# unpack like a tuple
name, age = p
dict: key-value mapping
#
user: dict[str, int] = {"id": 1, "age": 30}
print(user["id"]) # 1
print(user.get("nope")) # None (missing key → safe)
print(user.get("nope", -1)) # -1 (default)
user["name"] = "curtis" # add/update
del user["age"] # remove
print("name" in user) # True
Indexing with a missing key, as in user["missing"], raises KeyError. For a safe lookup, use .get().
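The difference between the two lookup styles can be seen side by side in a small sketch. The key "missing" is just a placeholder for any key that is not in the dict.

```python
user: dict[str, int] = {"id": 1, "age": 30}

try:
    user["missing"]                  # square brackets raise on a missing key
except KeyError as exc:
    print("KeyError:", exc)          # KeyError: 'missing'

print(user.get("missing"))           # None
print(user.get("missing", 0))        # 0
print("missing" in user)             # False: .get() did not add the key
```

Note that .get() only reads. It never inserts the default into the dict.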
Iteration #
for key in user: # keys only
    print(key)
for key, value in user.items(): # both
    print(key, value)
for value in user.values(): # values only
    print(value)
.items() is the one you will use almost daily.
Merging — the | operator (3.9+)
#
defaults = {"a": 1, "b": 2}
overrides = {"b": 20, "c": 30}
merged = defaults | overrides
print(merged) # {'a': 1, 'b': 20, 'c': 30}
The | operator favors the right side: for the same key, the right operand wins. Same idea as JavaScript’s {...defaults, ...overrides}.
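As a small sketch of how the merge behaves, the following compares | with the older unpacking spelling and with the in-place |= variant. The name settings is illustrative; both operators assume Python 3.9 or later.

```python
defaults = {"a": 1, "b": 2}
overrides = {"b": 20, "c": 30}

merged = defaults | overrides          # builds a new dict
print(defaults)                        # {'a': 1, 'b': 2}  (inputs untouched)

same = {**defaults, **overrides}       # pre-3.9 spelling of the same merge
print(merged == same)                  # True

settings = dict(defaults)              # copy first so defaults stays intact
settings |= overrides                  # |= updates the left side in place
print(settings)                        # {'a': 1, 'b': 20, 'c': 30}
```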
set — duplicate-free collection
#
unique: set[int] = {1, 2, 3, 2, 1}
print(unique) # {1, 2, 3}
unique.add(4) # {1, 2, 3, 4}
unique.discard(2) # {1, 3, 4}
print(3 in unique) # True
# Empty set is set() — {} is an empty dict
empty = set()
Set operations #
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}
a | b # union {1, 2, 3, 4, 5, 6}
a & b # intersection {3, 4}
a - b # difference {1, 2}
a ^ b # symmetric difference {1, 2, 5, 6}
The shortest way to deduplicate a list is list(set(items)). Watch out: a set does not preserve the original order. To dedupe while preserving order:
items = [1, 2, 1, 3, 2, 4]
unique = list(dict.fromkeys(items))
print(unique) # [1, 2, 3, 4]
dict.fromkeys() exploits two dict properties at once: preserved insertion order and key uniqueness.
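With strings the difference between the two approaches is easier to see than with small integers. A minimal sketch follows, using a made-up words list. Both approaches assume the elements are hashable.

```python
words = ["pear", "apple", "pear", "fig", "apple"]

as_set = set(words)                     # duplicates gone, order not guaranteed
ordered = list(dict.fromkeys(words))    # duplicates gone, first-seen order kept

print(sorted(as_set))                   # ['apple', 'fig', 'pear']
print(ordered)                          # ['pear', 'apple', 'fig']
print(dict.fromkeys(words))             # {'pear': None, 'apple': None, 'fig': None}
```

The intermediate dict maps every key to None. Only the keys matter here, and list() keeps them in insertion order.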
Comprehensions — build collections in one line #
A comprehension is syntax that fuses a for loop and an optional condition into a single expression. It is one of the most-used patterns in Python.
List comprehension #
# [expression for variable in iterable]
squares = [x ** 2 for x in range(5)]
# [0, 1, 4, 9, 16]
Expanded as a for loop:
squares = []
for x in range(5):
    squares.append(x ** 2)
Three lines compress to one. It’s faster to read too: the intent is visible at a glance.
Condition — filter #
evens = [x for x in range(10) if x % 2 == 0]
# [0, 2, 4, 6, 8]
The trailing if is a filter. Only matching elements get in.
Transform + filter together #
# square only the evens
result = [x ** 2 for x in range(10) if x % 2 == 0]
# [0, 4, 16, 36, 64]
if-else expression (mind the position)
#
if-else goes in the expression position, before the for. That is a different position from the filter if, which comes after it.
labels = ["even" if x % 2 == 0 else "odd" for x in range(5)]
# ['even', 'odd', 'even', 'odd', 'even']
The order is [expr if cond else other_expr for x in iter]. The two positions are easy to confuse, so read it carefully.
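Putting the two positions next to each other may help. The sketch below uses an arbitrary range and made-up labels: the filter slot drops elements, the expression slot keeps all of them and only chooses the value.

```python
nums = range(-3, 4)   # -3, -2, -1, 0, 1, 2, 3

# Filter slot (after the for): drops elements entirely.
positives = [n for n in nums if n > 0]
print(positives)      # [1, 2, 3]

# Expression slot (before the for): keeps every element, picks a value.
clamped = [n if n > 0 else 0 for n in nums]
print(clamped)        # [0, 0, 0, 0, 1, 2, 3]

# Both at once: filter first, then choose the value for what remains.
labels = ["big" if n > 1 else "small" for n in nums if n > 0]
print(labels)         # ['small', 'big', 'big']
```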
Nested — making 2D #
matrix = [[0 for _ in range(3)] for _ in range(3)]
# [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
This solves the [[]] * 3 pitfall from earlier. Each inner list is freshly created and independent.
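The independence claim can be verified directly with the is operator. This is a minimal sketch; shared and fresh are illustrative names.

```python
shared = [[0] * 3] * 3                               # one row, referenced three times
fresh = [[0 for _ in range(3)] for _ in range(3)]    # three separate rows

shared[0][0] = 9
fresh[0][0] = 9

print(shared)                   # [[9, 0, 0], [9, 0, 0], [9, 0, 0]]
print(fresh)                    # [[9, 0, 0], [0, 0, 0], [0, 0, 0]]
print(shared[0] is shared[1])   # True: the same list object
print(fresh[0] is fresh[1])     # False: independent lists
```

The inner [0] * 3 is safe in both cases because integers are immutable. Only the outer multiplication shares a mutable row.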
Dict comprehension #
square_map = {x: x ** 2 for x in range(5)}
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
# Swap key/value
user = {"id": 1, "name": "curtis"}
swapped = {v: k for k, v in user.items()}
# {1: "id", "curtis": "name"}
Set comprehension #
unique_lengths = {len(w) for w in ["a", "bb", "cc", "ddd"]}
# {1, 2, 3}
Pitfall: the order of nested for #
pairs = [(x, y) for x in [1, 2] for y in ['a', 'b']]
# [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
With multiple fors, the leftmost is the outer loop. Expanded:
pairs = []
for x in [1, 2]:
    for y in ['a', 'b']:
        pairs.append((x, y))
Generator expressions: saving memory #
Replace square brackets with parentheses and the comprehension becomes a generator instead.
gen = (x ** 2 for x in range(1_000_000))
print(gen) # <generator object ...>
A list comprehension builds every element immediately, so a million elements means a million slots in memory. A generator produces the next value only when asked, so it uses almost no memory.
total = sum(x ** 2 for x in range(1_000_000))
# When the generator expression is the only argument, the extra () can be omitted
Generators fit when passed to a function that consumes its input in a single pass, like sum, max, any, or all.
| | List comprehension | Generator expression |
|---|---|---|
| Notation | [ ... ] | ( ... ) |
| Memory | Every element built immediately | Values produced only on demand |
| Reuse | Iterate many times | Iterate once only |
| Index access | yes, e.g. result[3] | no |
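The rows of the table can be checked with a short experiment. This is a sketch with a deliberately small range so it runs instantly. sys.getsizeof reports only the size of the container object itself, so treat the comparison as a rough indication.

```python
import sys

squares_list = [x ** 2 for x in range(1_000)]   # all 1,000 values exist now
squares_gen = (x ** 2 for x in range(1_000))    # nothing computed yet

# Memory: the list holds 1,000 slots, the generator holds only its state.
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))   # True

# Reuse: the second pass over the generator finds nothing left.
print(sum(squares_gen))    # 332833500
print(sum(squares_gen))    # 0
print(sum(squares_list))   # 332833500

# Index access: works on the list, not on the generator.
print(squares_list[3])     # 9
try:
    squares_gen[3]
except TypeError as exc:
    print("TypeError:", exc)
```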
When to use a comprehension, when to expand #
Comprehensions aren’t always best. As logic gets complex, expanding reads better.
# Possible in one line, but hard to read
result = [transform(x) for x in items if validate(x) and is_active(x) and not is_deleted(x)]
# Expanded — easy to debug, easy to read
result = []
for x in items:
    if not validate(x):
        continue
    if not is_active(x) or is_deleted(x):
        continue
    result.append(transform(x))
Rule of thumb: comprehensions shine with one expression and at most one if. Past that, expanding is usually better.
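One possible middle ground, which is an assumption of this sketch rather than something the post prescribes, is to fold the conditions into a named helper so the comprehension gets back to one expression and one if. The Item class and the is_usable function are hypothetical stand-ins for the validate, is_active, and is_deleted checks above.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical record type, used only for this illustration."""
    name: str
    valid: bool = True
    active: bool = True
    deleted: bool = False


def is_usable(item: Item) -> bool:
    """Bundle the three checks behind one descriptive name."""
    return item.valid and item.active and not item.deleted


items = [
    Item("a"),
    Item("b", active=False),
    Item("c", deleted=True),
    Item("d"),
]

# One expression, one if: the comprehension stays readable.
result = [item.name.upper() for item in items if is_usable(item)]
print(result)   # ['A', 'D']
```

The helper can also be tested on its own, which recovers some of the debuggability of the expanded loop.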
Wrap-up #
What this post covered:
- list (mutable, ordered, duplicates): slicing [start:stop:step] is powerful
- tuple (immutable, fixed-shape): multi-return, dict keys, NamedTuple
- dict (key-value, ordered): .get(), .items(), merge with |
- set (no duplicates, no order): set operations | & - ^
- List / dict / set comprehensions: [expr for x in iter if cond]
- if-else goes in the expression slot; if goes in the filter slot
- Generator expression (x for x in iter): memory-efficient, single iteration
- When too complex, expanding the comprehension is better
In the next post (#5 Functions and argument patterns) we cover the various argument patterns in function definitions — positional-only, keyword-only, *args / **kwargs, and more.