Modern Python Intermediate #4: Iterables/generators/yield from
Remember the generator expression (x for x in iter) we briefly saw at the end of Basics #4? This is that post. Starting from how for works, we cover user-defined iterables, generator functions, and yield from.
What a for ... in loop really is: the iterable protocol
```python
for x in [1, 2, 3]:
    print(x)
```

Here is what this loop does internally, unrolled by hand:
```python
items = [1, 2, 3]
it = iter(items)          # 1) iterable → iterator
while True:
    try:
        x = next(it)      # 2) request the next value
    except StopIteration:
        break             # 3) stop when exhausted
    print(x)
```

Two key steps:

- iter(obj): get an iterator from an iterable (calls `__iter__`)
- next(it): request the next value (calls `__next__`); raises StopIteration when there are no values left
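To check that the unrolled loop really matches for, here is a small sketch. It wraps the manual protocol in a helper and compares the result with a plain loop over a few built-in iterables. The name manual_for is hypothetical; it is not a built-in.

```python
def manual_for(iterable):
    """Collect items the way a for loop does: iter() once, then next() until StopIteration."""
    collected = []
    it = iter(iterable)
    while True:
        try:
            x = next(it)
        except StopIteration:
            break
        collected.append(x)
    return collected


for sample in ([1, 2, 3], "abc", range(3), {"k": 1}):
    # Each sample should yield exactly what an ordinary for loop yields
    assert manual_for(sample) == [x for x in sample]

print(manual_for("hi"))  # ['h', 'i']
```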
Iterable vs iterator
The terminology is confusing — let’s sort it out.
| | Definition | Methods | Examples |
|---|---|---|---|
| Iterable | anything you can pass to iter() | `__iter__` | list, dict, str, range, file objects, generators |
| Iterator | an object that produces “the next value” | `__next__` (and `__iter__`) | the result of iter([1, 2, 3]), generators |
Every iterator is also iterable (its `__iter__` returns itself). The reverse isn’t true: a list is iterable but is not an iterator, so next(my_list) raises TypeError.
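The distinction is easy to see at the REPL. The sketch below shows three things: an iterator returning itself from iter(), a list refusing next(), and an iterator running dry after one pass. The exact TypeError message is printed rather than quoted, since its wording is a CPython detail.

```python
numbers = [1, 2, 3]
it = iter(numbers)

print(iter(it) is it)      # True: an iterator's __iter__ returns itself
print(next(it), next(it))  # 1 2

try:
    next(numbers)          # a list is iterable, but it is not an iterator
except TypeError as exc:
    print("TypeError:", exc)

print(list(it))            # [3]: only what's left
print(list(it))            # []: the iterator is now exhausted
```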
User-defined iterables: using a class
```python
class MyRange:
    def __init__(self, start: int, stop: int):
        self.start = start
        self.stop = stop

    def __iter__(self):
        # Each call hands out a brand-new iterator
        return MyRangeIterator(self.start, self.stop)


class MyRangeIterator:
    def __init__(self, current: int, stop: int):
        self.current = current
        self.stop = stop

    def __iter__(self):
        return self  # an iterator is its own iterable

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


for x in MyRange(0, 3):
    print(x)
# 0, 1, 2
```

Two classes, separating the iterable from the iterator. The iterable can be iterated any number of times, because every iter() call creates a fresh iterator. Each iterator, however, is exhausted after one full pass.
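To see why the split matters, consider a hypothetical anti-example that puts both roles in one class. OneShotRange is made up for illustration. Because `__iter__` returns self, every loop shares the same position, so the object can only be traversed once. MyRange avoids this by handing out a new iterator on every pass.

```python
class OneShotRange:
    """Hypothetical anti-example: iterable and iterator merged into one object."""

    def __init__(self, start: int, stop: int):
        self.current = start
        self.stop = stop

    def __iter__(self):
        return self  # hands out the same object, with the same position, every time

    def __next__(self) -> int:
        if self.current >= self.stop:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


r = OneShotRange(0, 3)
print(list(r))  # [0, 1, 2]
print(list(r))  # []: the shared state is already used up
```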
```python
r = MyRange(0, 3)
list(r)  # [0, 1, 2]
list(r)  # [0, 1, 2] ← a new iterator each time
```

Generator functions: the same job in a single function
You can replace both classes above with a single function containing yield.
```python
def my_range(start: int, stop: int):
    current = start
    while current < stop:
        yield current
        current += 1


for x in my_range(0, 3):
    print(x)
# 0, 1, 2
```

If a function body contains even one yield, calling the function does not run the body. Instead, it returns a generator object, and that object does the same job as the two classes above.
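The two approaches also combine well. A common idiom, sketched below as a variant of the MyRange class above, is to write `__iter__` itself as a generator function. The object stays reusable like the two-class version, but the separate iterator class disappears, because each iter() call starts a fresh generator.

```python
from collections.abc import Iterator


class MyRange:
    """Variant of MyRange whose __iter__ is a generator function."""

    def __init__(self, start: int, stop: int):
        self.start = start
        self.stop = stop

    def __iter__(self) -> Iterator[int]:
        # Every call runs this body afresh, so every loop gets its own generator
        current = self.start
        while current < self.stop:
            yield current
            current += 1


r = MyRange(0, 3)
print(list(r))  # [0, 1, 2]
print(list(r))  # [0, 1, 2]: a fresh generator each time
```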
How yield works
This is the most confusing part.
```python
def gen():
    print("step 1")
    yield 1
    print("step 2")
    yield 2
    print("step 3")


g = gen()
# The function body has not run yet!
```

Calling g = gen() on its own does not execute the function body. The first next(g) starts it.
```python
print(next(g))
# step 1
# 1
print(next(g))
# step 2
# 2
print(next(g))
# step 3
# then StopIteration is raised: the body finished without reaching another yield
```

The function pauses at each yield, and the next next() call resumes from exactly that point. That is the essence of a generator: its execution is interleaved with the caller's.
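You can watch these pauses directly. The standard library's inspect.getgeneratorstate reports where a generator is in its lifecycle. The sketch below also uses next(g, default), which returns the default instead of raising StopIteration once the generator is exhausted.

```python
import inspect


def gen():
    yield 1
    yield 2


g = gen()
print(inspect.getgeneratorstate(g))  # GEN_CREATED: the body hasn't started
print(next(g))                       # 1
print(inspect.getgeneratorstate(g))  # GEN_SUSPENDED: paused at a yield
print(next(g))                       # 2
print(next(g, "no more values"))     # no more values: the default replaces StopIteration
print(inspect.getgeneratorstate(g))  # GEN_CLOSED
```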
How is this different from a generator expression?
It behaves the same as the (x for x in iterable) expression from Basics #4; the function form is simply more expressive.
```python
# A one-liner: the expression form fits
squares = (x ** 2 for x in range(10))


# More complex logic: the function form fits
def squares_evens_only():
    for x in range(10):
        if x % 2 != 0:
            continue
        yield x ** 2
```

The value of laziness: memory and speed
The biggest advantage of a generator is not materializing every value at once.
```python
# list comprehension: builds all 1,000,000 values up front and keeps them in memory
squares_list = [x ** 2 for x in range(1_000_000)]
# generator expression: produces values on demand, so it uses almost no memory
squares_gen = (x ** 2 for x in range(1_000_000))
total = sum(squares_gen)  # consumed in a single pass
```

Even infinite sequences
```python
from itertools import islice


def counter(start: int = 0):
    n = start
    while True:
        yield n
        n += 1


# Take only the first 5
first_five = list(islice(counter(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```

That is impossible with a list. A generator produces only as many values as are requested, so an infinite sequence is no problem.
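islice stops after a fixed count. Two other ways to stop consuming an infinite generator are sketched below: stopping on a condition with itertools.takewhile, and grabbing the first match with next(). In both cases only as many values are produced as are needed.

```python
from itertools import takewhile


def counter(start: int = 0):
    n = start
    while True:
        yield n
        n += 1


# Stop on a condition instead of a count
print(list(takewhile(lambda n: n < 4, counter())))  # [0, 1, 2, 3]

# First value whose square exceeds 50: next() pulls values only until it finds one
print(next(n for n in counter() if n * n > 50))  # 8
```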
Pipelines: chaining generators
A data-processing pipeline where each stage is a generator is both memory-efficient and easy to reason about.
```python
def read_lines(path: str):
    with open(path) as f:
        for line in f:
            yield line.rstrip()


def filter_errors(lines):
    for line in lines:
        if "ERROR" in line:
            yield line


def parse_timestamp(lines):
    for line in lines:
        ts, _, msg = line.partition(" ")
        yield (ts, msg)


# Compose the stages
errors = parse_timestamp(filter_errors(read_lines("app.log")))
for ts, msg in errors:
    print(ts, msg)
```

Each stage handles one line at a time, so even a 100 GB log file is never loaded into memory all at once.
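To try the pipeline without an app.log on disk, here is a variant sketch. In it, read_lines takes an already-open file-like object instead of a path; that change is an assumption made for the demo. An io.StringIO holding a few made-up log lines stands in for the file, and the other two stages are unchanged.

```python
import io


def read_lines(f):
    # Variant for the demo: accepts any file-like object instead of a path
    for line in f:
        yield line.rstrip()


def filter_errors(lines):
    for line in lines:
        if "ERROR" in line:
            yield line


def parse_timestamp(lines):
    for line in lines:
        ts, _, msg = line.partition(" ")
        yield (ts, msg)


# Made-up sample log, for illustration only
fake_log = io.StringIO(
    "2024-01-01T00:00:00 INFO started\n"
    "2024-01-01T00:00:05 ERROR disk full\n"
    "2024-01-01T00:00:09 ERROR retry failed\n"
)

errors = parse_timestamp(filter_errors(read_lines(fake_log)))
print(list(errors))
# [('2024-01-01T00:00:05', 'ERROR disk full'), ('2024-01-01T00:00:09', 'ERROR retry failed')]
```

Nothing is read from fake_log until list() starts pulling values through the chain.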
yield from — generator delegation
Use it when you want to pass every value from another iterable straight through to your caller.
```python
def chain_two(a, b):
    for x in a:
        yield x
    for y in b:
        yield y
```

With yield from:

```python
def chain_two(a, b):
    yield from a
    yield from b
```

Same result, but yield from is shorter, and it brings two additional benefits:
- Calls to send() and throw() are forwarded to the inner generator automatically (covered below)
- You can capture the inner generator’s return value (the value of its return statement)
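A minimal sketch of the second benefit. Whatever the inner generator returns becomes the value of the yield from expression in the outer one. The names inner and outer are just for illustration.

```python
def inner():
    yield 1
    yield 2
    return "inner finished"  # becomes the value of the yield from expression


def outer():
    result = yield from inner()  # forwards 1 and 2, then receives the return value
    yield f"got: {result}"


print(list(outer()))  # [1, 2, 'got: inner finished']
```

A plain for x in inner(): yield x loop would forward 1 and 2 but silently drop the return value.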
Natural for trees and recursive traversal
```python
def flatten(items):
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


result = list(flatten([1, [2, [3, [4]], 5]]))
print(result)  # [1, 2, 3, 4, 5]
```

The single line yield from flatten(item) expresses the recursion naturally.
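The same pattern covers actual trees. Below is a sketch of an in-order traversal over a small binary tree; the Node dataclass is a hypothetical structure defined just for this example. Each recursive call is delegated with yield from, so values flow straight up to the caller without building intermediate lists.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical binary-tree node for this example."""
    value: int
    left: Node | None = None
    right: Node | None = None


def in_order(node: Node | None):
    if node is None:
        return  # an empty subtree yields nothing
    yield from in_order(node.left)
    yield node.value
    yield from in_order(node.right)


tree = Node(2, Node(1), Node(4, Node(3), Node(5)))
print(list(in_order(tree)))  # [1, 2, 3, 4, 5]
```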
send, throw, close — coroutine features
A generator can also receive values. Use yield as an expression and assign its result to a variable; the caller can then push values in via send().
```python
def echo():
    while True:
        received = yield  # the value passed to send() shows up here
        print(f"received: {received}")


g = echo()
next(g)          # advance to the first yield (priming)
g.send("hello")  # received: hello
g.send("world")  # received: world
```

This mechanism underlies async (#7) and cooperative multitasking. In everyday code you rarely call send directly; it is enough to know the mechanism exists.
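```python
# Each of these acts on a generator that is paused at a yield:
g.throw(ValueError("injected exception"))  # raises the exception at the paused yield
g.close()  # raises GeneratorExit at the paused yield so the generator shuts down
```

If the generator doesn't catch the thrown exception (echo() doesn't), it propagates back out of throw() to the caller. Pass an exception instance; the older throw(type, value) form is deprecated as of Python 3.12.

close() is especially useful for generators that hold resources: put the cleanup in a try/finally, and it runs when the generator is closed.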
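```python
def read_lines(path):
    f = open(path)
    try:
        for line in f:
            yield line
    finally:
        f.close()  # runs on normal exhaustion and when close() is called
```

Even if you don't fully consume this generator (for example, you break out of the loop early), the file still gets closed. When the generator object is garbage-collected, Python calls its close() method, which runs the finally block; in CPython this happens as soon as the last reference to the generator disappears.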
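Here is one sketch that exercises all three methods on a single generator. The worker function and the events list are hypothetical, used only to record what happens. throw() raises the exception at the paused yield, where the generator can catch it and carry on. close() raises GeneratorExit there, which skips the except ValueError handler but still runs the finally block.

```python
events = []


def worker():
    try:
        while True:
            try:
                item = yield  # send() delivers values here
                events.append(f"processed {item}")
            except ValueError as exc:
                # throw() raises the exception right here, at the paused yield
                events.append(f"recovered from {exc}")
    finally:
        # Runs on close(), including the implicit close when the generator is collected
        events.append("cleanup")


w = worker()
next(w)  # prime: run up to the first yield
w.send("a")
w.throw(ValueError("bad input"))
w.send("b")
w.close()
print(events)
# ['processed a', 'recovered from bad input', 'processed b', 'cleanup']
```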
itertools: a gem of the standard library
itertools has tools used heavily in data pipelines.
```python
from itertools import (
    count, cycle, repeat,                        # infinite
    islice,                                      # slicing
    chain,                                       # concatenation
    groupby,                                     # grouping
    accumulate,                                  # running totals
    combinations, permutations, product,         # combinatorics
    starmap, filterfalse, dropwhile, takewhile,  # transform / filter
)

# First N
list(islice(count(), 5))        # [0, 1, 2, 3, 4]

# Chain multiple iterables
list(chain([1, 2], [3, 4]))     # [1, 2, 3, 4]

# Running totals
list(accumulate([1, 2, 3, 4]))  # [1, 3, 6, 10]

# Grouping: groupby only merges adjacent items with equal keys, so sort by the key first
data = [("a", 1), ("a", 2), ("b", 3)]
for key, group in groupby(data, key=lambda x: x[0]):
    print(key, list(group))
# a [('a', 1), ('a', 2)]
# b [('b', 3)]
```

If you work with data pipelines regularly, reading through this module once will pay off for a long time.
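A quick tour of several of the tools imported above but not yet demonstrated, plus the groupby pitfall with unsorted input. All of these are standard itertools functions. The inputs are toy values chosen so the outputs are easy to check by hand.

```python
from itertools import (
    combinations, dropwhile, filterfalse, groupby, product, starmap, takewhile,
)

print(list(takewhile(lambda x: x < 3, [1, 2, 3, 1])))  # [1, 2]
print(list(dropwhile(lambda x: x < 3, [1, 2, 3, 1])))  # [3, 1]
print(list(filterfalse(lambda x: x % 2, range(6))))    # [0, 2, 4]
print(list(starmap(pow, [(2, 3), (3, 2)])))            # [8, 9]
print(list(combinations("abc", 2)))     # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(list(product([0, 1], repeat=2)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]

# groupby on unsorted input: the key "a" shows up as two separate groups
print([key for key, _ in groupby("aabaa")])          # ['a', 'b', 'a']
print([key for key, _ in groupby(sorted("aabaa"))])  # ['a', 'b']
```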
The standard library collections follow the same protocol
The ABCs in collections.abc are formalizations of this protocol.
```python
from collections.abc import Iterable, Iterator


def consume(items: Iterable[int]) -> int:
    total = 0
    for x in items:
        total += x
    return total


# list, tuple, set, generator, range, ... all pass
consume([1, 2, 3])
consume(range(10))
consume(x for x in [1, 2, 3])
```

Typing a parameter as Iterable[T] is the broadest choice, which usually makes it the safest. There is no need to narrow it to list[T]: the function works the same whether the caller passes a list, a set, a generator, or any other iterable, as long as it iterates over the input only once.
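The ABCs can also be used for runtime checks. The "iterate only once" caveat matters because a generator is itself an iterator and can be consumed only once. The sketch below shows both; total_and_count is a hypothetical function that quietly iterates its argument twice. Functions that need more than one pass should ask for something like collections.abc.Collection or Sequence instead, or call list(items) once up front.

```python
from collections.abc import Iterable, Iterator

print(isinstance([1, 2], Iterable))            # True
print(isinstance([1, 2], Iterator))            # False
print(isinstance(iter([1, 2]), Iterator))      # True
print(isinstance((x for x in [1]), Iterator))  # True: generators are iterators


def total_and_count(items: Iterable[int]) -> tuple[int, int]:
    # Pitfall: this iterates over items twice
    return sum(items), sum(1 for _ in items)


print(total_and_count([1, 2, 3]))             # (6, 3)
print(total_and_count(x for x in [1, 2, 3]))  # (6, 0): the second pass sees an exhausted generator
```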
@contextmanager is actually a generator
With generators covered, we can now see how @contextmanager from #3 actually works.
```python
import os
from contextlib import contextmanager


@contextmanager
def chdir(path):
    old = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(old)
```

Because the body contains a yield, this is a generator function. @contextmanager wraps the generator in an object whose `__enter__` calls next() once, running the body up to the yield. Its `__exit__` either calls next() again to run the cleanup, or, if the with block raised, calls throw() to re-raise the exception at the yield. Context managers written this way are built directly on top of generators.
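To make that concrete, here is a stripped-down sketch of the mechanism. simple_contextmanager and _GeneratorContextManager are hypothetical names, and this is not contextlib's actual code. The real implementation handles more edge cases, such as StopIteration raised inside the with block. The sketch only covers the two paths described above: next() on a clean exit, and throw() when the block raised.

```python
import functools


class _GeneratorContextManager:
    """Hypothetical, simplified stand-in for the object @contextmanager builds."""

    def __init__(self, gen):
        self._gen = gen

    def __enter__(self):
        # Run the generator up to its yield; the yielded value is what "as" binds
        return next(self._gen)

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Clean exit: resume after the yield so the cleanup code runs
            try:
                next(self._gen)
            except StopIteration:
                return False
            raise RuntimeError("generator didn't stop")
        try:
            # The with block raised: re-raise the exception at the paused yield
            self._gen.throw(exc)
        except StopIteration:
            return True  # the generator swallowed the exception, so suppress it
        except BaseException as raised:
            if raised is exc:
                return False  # the same exception came back out: let it propagate
            raise
        raise RuntimeError("generator didn't stop after throw()")


def simple_contextmanager(func):
    @functools.wraps(func)
    def helper(*args, **kwargs):
        return _GeneratorContextManager(func(*args, **kwargs))
    return helper


log = []


@simple_contextmanager
def tag(name):
    log.append(f"<{name}>")
    try:
        yield name
    finally:
        log.append(f"</{name}>")


with tag("b") as t:
    log.append(t)

print(log)  # ['<b>', 'b', '</b>']
```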
Wrap-up
What this post covered:
- for = sugar for iter() + next() + StopIteration
- Iterable (`__iter__`) ⊃ Iterator (`__iter__` + `__next__`)
- Generator function: even a single yield makes a call return a generator object
- Execution pauses at each yield and resumes on the next next()
- The value of laziness: memory, infinite sequences, pipelines
- yield from: delegates to another iterable; natural for recursion
- send/throw/close: coroutine mechanisms; close + try/finally for resource cleanup
- itertools: the standard toolkit
- Type function arguments as broadly as Iterable[T]
- @contextmanager is built on generators
In the next post (#5 Decorator patterns) we cover the full range of decorator patterns. Decorators are the tool for wrapping functions and classes, and @contextmanager and @dataclass are both examples of them.