Modern Python Basics #4: Collections and comprehensions

It’s time to look in earnest at comprehensions, which we briefly saw at the end of #3 Control flow. Before that, let’s lay out Python’s four core collections: list, tuple, dict, set.

Their roles in one table #

Type     Mutable   Ordered   Duplicates     Key-value   Notation
list     yes       yes       allowed        no          [1, 2, 3]
tuple    no        yes       allowed        no          (1, 2, 3)
set      yes       no        disallowed     no          {1, 2, 3}
dict     yes       yes ¹     unique keys    yes         {"a": 1}

¹ From Python 3.7+, dict guarantees insertion order.

list — the workhorse #

list basics
nums: list[int] = [1, 2, 3]

nums.append(4)         # [1, 2, 3, 4]
nums.insert(0, 0)      # [0, 1, 2, 3, 4]
nums.remove(2)         # [0, 1, 3, 4]
last = nums.pop()      # returns 4; list becomes [0, 1, 3]

print(len(nums))       # 3
print(nums[0])         # 0
print(3 in nums)       # True

Slicing — [start:stop:step] #

The list’s real power is in slicing.

Slicing
items = [10, 20, 30, 40, 50]

items[1:3]    # [20, 30]    1 inclusive, 3 exclusive
items[:2]     # [10, 20]    from start, less than 2
items[3:]     # [40, 50]    from 3 to end
items[-2:]    # [40, 50]    last 2
items[::2]    # [10, 30, 50]  step 2
items[::-1]   # [50, 40, 30, 20, 10]   reverse

Reversing with a step of -1 is a common idiom. It is shorter to write than JavaScript’s arr.toReversed().

Slicing returns a new list. The original is unchanged.
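
To make the copy behavior concrete, here is a minimal sketch (the variable names are illustrative, not from the original examples). It also shows that the copy is shallow: the new list is a separate object, but any nested objects are still shared.

```python
items = [10, 20, 30, 40, 50]

first_two = items[:2]      # a new list holding the first two elements
first_two.append(99)       # changes only the copy

print(first_two)           # [10, 20, 99]
print(items)               # [10, 20, 30, 40, 50]  (original untouched)

# items[:] is a common way to copy a whole list.
# The copy is shallow: nested lists are shared, not duplicated.
grid = [[1, 2], [3, 4]]
grid_copy = grid[:]
grid_copy[0].append(0)     # mutates the inner list that both lists reference

print(grid)                # [[1, 2, 0], [3, 4]]
print(grid_copy is grid)   # False  (the outer lists are distinct objects)
```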

+ and * — concatenation and repetition #

Concatenate and repeat
[1, 2] + [3, 4]   # [1, 2, 3, 4]
[0] * 5           # [0, 0, 0, 0, 0]

[0] * 5 is the shortest way to make a list of zeros. Beware: multiplying a list that contains a mutable object gives you several references to that same object, not copies.

Pitfall — multiplying mutable objects
matrix = [[]] * 3
matrix[0].append(1)
print(matrix)
# [[1], [1], [1]]   ← all reference the same list!

For this case, use a comprehension (covered below).

tuple — fixed-shape bundles #

Almost the same as a list, but immutable, and it plays a different role.

tuple basics
point: tuple[float, float] = (1.0, 2.0)
person: tuple[str, int] = ("커티스", 30)

# tuples are usually unpacked
name, age = person
print(name, age)   # 커티스 30

# A single-element tuple needs a comma — parens alone aren't enough
single = (42,)     # tuple
not_tuple = (42)   # just an integer

Where tuples fit #

  • Returning multiple values: return name, age (it’s actually a tuple)
  • Dict keys: a list can’t be a key, but a tuple can, because it is immutable (and hashable as long as its elements are)
  • Fixed-shape data like coordinates/dates — clearer intent than a list
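
The first two uses above can be sketched as follows. The function name min_max and the visits dict are made up for illustration.

```python
def min_max(values: list[int]) -> tuple[int, int]:
    """Return the smallest and largest value as a 2-tuple."""
    return min(values), max(values)   # the comma is what builds the tuple

lowest, highest = min_max([3, 1, 4, 1, 5])
print(lowest, highest)    # 1 5

# Tuples are hashable (as long as their elements are), so they can be dict keys.
visits = {(0, 0): 1, (2, 3): 5}
print(visits[(2, 3)])     # 5

# A list in the same position is rejected with a TypeError.
list_key_rejected = False
try:
    {[0, 0]: 1}
except TypeError:
    list_key_rejected = True
print(list_key_rejected)  # True
```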

A more explicit tuple — NamedTuple #

Position-based tuples become confusing over time (was person[0] the name?). Naming the fields keeps the code clean.

NamedTuple
from typing import NamedTuple

class Person(NamedTuple):
    name: str
    age: int

p = Person("커티스", 30)
print(p.name, p.age)   # 커티스 30

# unpack like a tuple
name, age = p
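
A NamedTuple is still a real tuple underneath. The sketch below repeats the Person class so it runs on its own, and shows index access, immutability, and two helper methods that NamedTuple classes provide (_replace and _asdict).

```python
from typing import NamedTuple

class Person(NamedTuple):
    name: str
    age: int

p = Person("커티스", 30)

print(p[0] == p.name)        # True  (index access still works)
print(isinstance(p, tuple))  # True

# Fields cannot be reassigned; _replace returns a modified copy instead.
older = p._replace(age=31)
print(older.age, p.age)      # 31 30

print(p._asdict())           # {'name': '커티스', 'age': 30}
```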

dict — key-value mapping #

dict basics
user: dict[str, int | str] = {"id": 1, "age": 30}  # int | str: a name is added below (3.10+ syntax)

print(user["id"])              # 1
print(user.get("nope"))        # None  (missing key → safe)
print(user.get("nope", -1))    # -1    (default)

user["name"] = "curtis"        # add/update
del user["age"]                # remove
print("name" in user)          # True

user["nope"] raises a KeyError because the key doesn’t exist. To read a key that may be missing, use .get().
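
Side by side, the two ways of reading a possibly-missing key look like this. A minimal sketch; the "age" key is just an example of something absent.

```python
user = {"id": 1, "name": "curtis"}

# Square brackets: raises KeyError when the key is absent.
try:
    age = user["age"]
except KeyError:
    age = None
print(age)                  # None

# .get(): returns the default instead of raising.
print(user.get("age", 0))   # 0

# .get() never inserts anything; the dict is unchanged.
print("age" in user)        # False
```

Square brackets are the better choice when a missing key is a bug you want to hear about. .get() fits when absence is a normal case.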

Iteration #

Iterating a dict
for key in user:                # keys only
    print(key)

for key, value in user.items(): # both
    print(key, value)

for value in user.values():     # values only
    print(value)

.items() is used almost daily.

Merging — the | operator (3.9+) #

Merging dicts
defaults = {"a": 1, "b": 2}
overrides = {"b": 20, "c": 30}

merged = defaults | overrides
print(merged)    # {'a': 1, 'b': 20, 'c': 30}

| favors the right side — for the same key, the right one wins. Same idea as JavaScript’s {...defaults, ...overrides}.
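
A few more details of the merge, as a sketch: | builds a new dict, swapping the operands flips the winner, and |= is the in-place form (also 3.9+). The settings name is made up for illustration.

```python
defaults = {"a": 1, "b": 2}
overrides = {"b": 20, "c": 30}

merged = defaults | overrides      # new dict; operands are not modified
print(defaults)                    # {'a': 1, 'b': 2}

# Swapping the operands flips which side wins for the shared key "b".
print(overrides | defaults)        # {'b': 2, 'c': 30, 'a': 1}

# |= updates the left-hand dict in place.
settings = dict(defaults)
settings |= overrides
print(settings)                    # {'a': 1, 'b': 20, 'c': 30}

# On versions before 3.9, unpacking gives the same result as |.
print({**defaults, **overrides} == merged)   # True
```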

set — duplicate-free collection #

set basics
unique: set[int] = {1, 2, 3, 2, 1}
print(unique)   # {1, 2, 3}

unique.add(4)      # {1, 2, 3, 4}
unique.discard(2)  # {1, 3, 4}
print(3 in unique) # True

# Empty set is set() — {} is an empty dict
empty = set()

Set operations #

Set operations
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

a | b   # union  {1, 2, 3, 4, 5, 6}
a & b   # intersection  {3, 4}
a - b   # difference  {1, 2}
a ^ b   # symmetric difference  {1, 2, 5, 6}
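
Here is how those operators might look in an everyday task, comparing two groups of permission names. The permission strings are invented for the example.

```python
required = {"read", "write", "deploy"}
granted = {"read", "write", "comment"}

missing = required - granted   # needed but not granted
extra = granted - required     # granted but not needed
shared = required & granted    # present in both

# sorted() gives a stable order for display; sets themselves are unordered.
print(sorted(missing))   # ['deploy']
print(sorted(extra))     # ['comment']
print(sorted(shared))    # ['read', 'write']

# Subset check: is everything required also granted?
print(required <= granted)   # False
```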

The shortest way to deduplicate a list is list(set(items)). Watch out: a set does not preserve order, so the result may come back in a different order. To dedupe while preserving order:

Order-preserving dedupe
items = [1, 2, 1, 3, 2, 4]
unique = list(dict.fromkeys(items))
print(unique)   # [1, 2, 3, 4]

dict.fromkeys() relies on two dict properties at once: keys are unique, and insertion order is preserved.
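
Looking at the intermediate dict makes the trick easier to see. A small sketch with strings instead of numbers:

```python
items = ["b", "a", "b", "c", "a"]

keyed = dict.fromkeys(items)   # every item becomes a key; values default to None
print(keyed)                   # {'b': None, 'a': None, 'c': None}

print(list(keyed))             # ['b', 'a', 'c']  (first-seen order kept)

# The set-based version has the same elements, but its order is not guaranteed.
print(sorted(set(items)) == sorted(keyed))   # True
```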

Comprehensions — build collections in one line #

Syntax that fuses a for loop and a condition into a single line. One of the most-used patterns in Python.

List comprehension #

Basic shape
# [expression for variable in iterable]

squares = [x ** 2 for x in range(5)]
# [0, 1, 4, 9, 16]

Expanded as a for loop:

Same code, expanded
squares = []
for x in range(5):
    squares.append(x ** 2)

Three lines compress to one. It’s faster to read too, because the intent is visible at a glance.

Condition — filter #

if clause
evens = [x for x in range(10) if x % 2 == 0]
# [0, 2, 4, 6, 8]

if is a filter. Only matching elements get in.

Transform + filter together #

Both at once
# square only the evens
result = [x ** 2 for x in range(10) if x % 2 == 0]
# [0, 4, 16, 36, 64]

if-else expression (mind the position) #

if-else goes in the expression position. Different position from the filter if.

Conditional expression
labels = ["even" if x % 2 == 0 else "odd" for x in range(5)]
# ['even', 'odd', 'even', 'odd', 'even']

The order is [expr if cond else other_expr for x in iter]. A confusing area — read it carefully.
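
Both kinds of if can appear in the same comprehension, which is where the positions matter most. A minimal sketch, split across lines only for readability:

```python
# Conditional expression (before `for`) chooses the value;
# filter `if` (after the iterable) chooses which items are kept.
labels = [
    "even" if x % 2 == 0 else "odd"   # value for each kept x
    for x in range(10)
    if x > 5                          # keep only 6, 7, 8, 9
]
print(labels)   # ['even', 'odd', 'even', 'odd']
```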

Nested — making 2D #

Nested comprehension — 2D matrix
matrix = [[0 for _ in range(3)] for _ in range(3)]
# [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

This avoids the [[]] * 3 pitfall from earlier. Each inner list is freshly created and independent.
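
The difference is easy to check directly. In this sketch, the names shared and independent are illustrative. The inner [0] * 3 is safe because integers are immutable.

```python
shared = [[0] * 3] * 3                      # three references to ONE row
independent = [[0] * 3 for _ in range(3)]   # three separate rows

shared[0][0] = 1
independent[0][0] = 1

print(shared)        # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]
print(independent)   # [[1, 0, 0], [0, 0, 0], [0, 0, 0]]

print(shared[0] is shared[1])             # True
print(independent[0] is independent[1])   # False
```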

Dict comprehension #

dict comprehension
square_map = {x: x ** 2 for x in range(5)}
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

# Swap key/value
user = {"id": 1, "name": "curtis"}
swapped = {v: k for k, v in user.items()}
# {1: "id", "curtis": "name"}
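
One thing to keep in mind when swapping: keys must be unique, so repeated values collapse into a single entry and the last one wins. A small sketch with made-up data:

```python
scores = {"alice": 90, "bob": 85, "carol": 90}

by_score = {score: name for name, score in scores.items()}
print(by_score)        # {90: 'carol', 85: 'bob'}
print(len(by_score))   # 2  (one entry was overwritten)
```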

Set comprehension #

set comprehension
unique_lengths = {len(w) for w in ["a", "bb", "cc", "ddd"]}
# {1, 2, 3}

Pitfall — the order of nested for #

Nested for
pairs = [(x, y) for x in [1, 2] for y in ['a', 'b']]
# [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]

With multiple fors, the leftmost is the outer loop. Expanded:

Same behavior expanded
pairs = []
for x in [1, 2]:
    for y in ['a', 'b']:
        pairs.append((x, y))
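
The same left-to-right rule explains the common flattening idiom, which can look backwards at first. A minimal sketch:

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# Outer loop first (rows), inner loop second (values in each row).
flat = [value for row in matrix for value in row]
print(flat)   # [1, 2, 3, 4, 5, 6]

# A filter can follow the loops; this one skips pairs where x == y.
pairs = [(x, y) for x in range(3) for y in range(3) if x != y]
print(pairs)
# [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
```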

Generator expressions — saving memory #

Replace square brackets with parentheses and the comprehension becomes a generator instead.

Generator expression
gen = (x ** 2 for x in range(1_000_000))
print(gen)   # <generator object ...>

A list comprehension builds every element immediately. A million elements means a million slots in memory. A generator produces the next value only when asked, so it uses almost no memory.
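
A rough way to see the difference is sys.getsizeof. It is only a sketch: exact byte counts vary by Python version and platform, and for a list it measures the container itself rather than the elements it points to.

```python
import sys

n = 100_000
as_list = [x ** 2 for x in range(n)]
as_gen = (x ** 2 for x in range(n))

# Only the comparison is meaningful here, not the exact numbers.
print(sys.getsizeof(as_list))   # grows with n
print(sys.getsizeof(as_gen))    # small, and does not depend on n
print(sys.getsizeof(as_gen) < sys.getsizeof(as_list))   # True

# Values are produced one at a time, on request.
print(next(as_gen), next(as_gen), next(as_gen))   # 0 1 4
```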

Sum — a generator is enough
total = sum(x ** 2 for x in range(1_000_000))
# When a generator expression is the only argument, its own () can be omitted

Generators fit well when passed to a function that iterates only once, such as sum, max, any, or all.

               List comprehension                Generator expression
Notation       [ ... ]                           ( ... )
Memory         Every element built immediately   Built only on demand
Reuse          Can be iterated many times        Can be iterated only once
Index access   yes: result[3]                    no
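
The "iterate once" and "no index access" rows are the ones that tend to surprise people. A short sketch of both:

```python
gen = (x * 2 for x in range(3))

print(list(gen))   # [0, 2, 4]
print(list(gen))   # []  (already exhausted; no error, just empty)

# Generators do not support indexing.
fresh = (x * 2 for x in range(3))
try:
    fresh[0]
except TypeError:
    print("generators cannot be indexed")

# If you need to iterate more than once, build a list instead.
values = [x * 2 for x in range(3)]
print(sum(values), max(values))   # 6 4
```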

When to use a comprehension, when to expand #

Comprehensions aren’t always best. As logic gets complex, expanding reads better.

Expand this one
# Possible in one line, but hard to read
result = [transform(x) for x in items if validate(x) and is_active(x) and not is_deleted(x)]

# Expanded — easy to debug, easy to read
result = []
for x in items:
    if not validate(x):
        continue
    if not is_active(x) or is_deleted(x):
        continue
    result.append(transform(x))

Rule of thumb: comprehensions shine with a single expression and at most one if. Past that, expanding is usually better.
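
A possible middle ground, offered as a sketch and not as the only right answer: move the compound condition into a small named function, so the comprehension is back to one expression and one if. The Item record and should_keep helper are hypothetical, invented for this example.

```python
from typing import NamedTuple

class Item(NamedTuple):   # hypothetical record, for illustration only
    name: str
    valid: bool
    active: bool
    deleted: bool

def should_keep(item: Item) -> bool:
    """Bundle the three checks behind one descriptive name."""
    return item.valid and item.active and not item.deleted

items = [
    Item("a", valid=True, active=True, deleted=False),
    Item("b", valid=True, active=False, deleted=False),
    Item("c", valid=True, active=True, deleted=True),
]

result = [item.name.upper() for item in items if should_keep(item)]
print(result)   # ['A']
```

The helper can also be tested on its own, which the inline condition cannot.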

Wrap-up #

What this post covered:

  • list (mutable, ordered, duplicates) — slicing [start:stop:step] is powerful
  • tuple (immutable, fixed-shape) — multi-return, dict keys, NamedTuple
  • dict (key-value, ordered) — .get(), .items(), merge with |
  • set (no duplicates, no order) — set operations | & - ^
  • List / dict / set comprehensions — [expr for x in iter if cond]
  • if-else goes in the expression slot; if goes in the filter slot
  • Generator expression (x for x in iter) — memory-efficient, single iteration
  • When too complex, expanding the comprehension is better

In the next post (#5 Functions and argument patterns) we cover the various argument patterns in function definitions — positional-only, keyword-only, *args / **kwargs, and more.
