Modern Python Intermediate #4: Iterables/generators/yield from


Remember the generator expression (x for x in iter) we briefly saw at the end of Basics #4? This is that post. Starting from how for works, we cover user-defined iterables, generator functions, and yield from.

What for in really is — the iterable protocol #

What we usually write
for x in [1, 2, 3]:
    print(x)

Unfolding what this single line does internally:

The actual flow
items = [1, 2, 3]
it = iter(items)        # 1) iterable → iterator
while True:
    try:
        x = next(it)    # 2) request the next value
    except StopIteration:
        break           # 3) stop when done
    print(x)

Two key steps:

  1. iter(obj) — get an iterator from an iterable (calls __iter__)
  2. next(it) — request the next value (calls __next__); raises StopIteration when finished

Iterable vs iterator #

The terminology is confusing — let’s sort it out.

          | Definition                               | Methods                 | Examples
Iterable  | anything you can pass to iter()          | __iter__                | list, dict, str, range, files, generators
Iterator  | something that produces “the next value” | __next__ (and __iter__) | result of iter([1,2,3]), generators

Every iterator is iterable (its __iter__ returns itself). The reverse isn’t true: a list is iterable but isn’t an iterator, so next(my_list) raises TypeError.
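
The distinction can be checked directly in a few lines. This is a minimal sketch using only built-ins; the variable names are chosen just for illustration.

```python
numbers = [1, 2, 3]

# A list is iterable but not an iterator, so next() on it fails.
try:
    next(numbers)
except TypeError as exc:
    list_error = str(exc)

it = iter(numbers)              # ask the iterable for an iterator
same_object = iter(it) is it    # an iterator's __iter__ returns itself

first = next(it)                # 1
rest = list(it)                 # [2, 3], consumes whatever is left
after = list(it)                # [], the iterator is now exhausted
fresh = list(numbers)           # [1, 2, 3], the list itself is untouched
```

The list hands out a new iterator on every iter() call, which is why the last line still sees all three items.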

User-defined iterables — using a class #

Building Range yourself
class MyRange:
    def __init__(self, start: int, stop: int):
        self.start = start
        self.stop = stop

    def __iter__(self):
        return MyRangeIterator(self.start, self.stop)

class MyRangeIterator:
    def __init__(self, current: int, stop: int):
        self.current = current
        self.stop = stop

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        value = self.current
        self.current += 1
        return value

for x in MyRange(0, 3):
    print(x)
# 0, 1, 2

Two classes — separating iterable from iterator. The iterable can be iterated multiple times; the iterator is exhausted after one full pass.

Why it can iterate many times
r = MyRange(0, 3)
list(r)    # [0, 1, 2]
list(r)    # [0, 1, 2]  ← a new iterator each time

Generator functions — the same job in a single function #

You can replace both classes above with a single function containing yield.

Generator function
def my_range(start: int, stop: int):
    current = start
    while current < stop:
        yield current
        current += 1

for x in my_range(0, 3):
    print(x)
# 0, 1, 2

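If a function body contains even one yield, calling the function returns a generator object instead of running the body and returning a value. The generator object does the job of MyRangeIterator above: it is an iterator, so each call to my_range() gives you one single-pass sequence.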
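
One way to keep the re-iterable behavior of the MyRange class while still getting the brevity of yield is to write __iter__ itself as a generator function. A minimal sketch; the class name LazyRange is made up for this example.

```python
class LazyRange:
    """Re-iterable range: every __iter__ call returns a fresh generator."""

    def __init__(self, start: int, stop: int):
        self.start = start
        self.stop = stop

    def __iter__(self):
        # Because this method contains yield, calling it returns a generator.
        current = self.start
        while current < self.stop:
            yield current
            current += 1


r = LazyRange(0, 3)
first_pass = list(r)      # [0, 1, 2]
second_pass = list(r)     # [0, 1, 2], a new generator each time

# A bare generator object, by contrast, is a one-shot iterator.
gen = (x for x in range(3))
gen_first = list(gen)     # [0, 1, 2]
gen_second = list(gen)    # []
```

The class costs a few more lines than a plain generator function, but callers can loop over the same object as often as they like.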

How yield works #

This is the most confusing part.

Creating a generator
def gen():
    print("step 1")
    yield 1
    print("step 2")
    yield 2
    print("step 3")

g = gen()
# Function body has not yet executed!

g = gen() alone doesn’t execute the function body. The first next(g) starts it.

Requesting values
print(next(g))
# step 1
# 1

print(next(g))
# step 2
# 2

print(next(g))
# step 3
# StopIteration ← when there are no more yields

The function pauses at each yield. The next next() call resumes from that point. The essence of generators is interleaving the function’s execution with the caller.
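
The pause-and-resume behavior can be observed from the outside with inspect.getgeneratorstate from the standard library. A small sketch; the two-value generator here is a trimmed stand-in for the one above.

```python
import inspect

def gen():
    yield 1
    yield 2

g = gen()
states = [inspect.getgeneratorstate(g)]       # GEN_CREATED: body has not started

next(g)
states.append(inspect.getgeneratorstate(g))   # GEN_SUSPENDED: paused at a yield

next(g)
try:
    next(g)                                   # no yields left
except StopIteration:
    pass
states.append(inspect.getgeneratorstate(g))   # GEN_CLOSED: the body has finished
```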

How is this different from a generator expression? #

The same behavior as (x for x in iter) from Basics #4, but the function form is more expressive.

Generator expression vs function
# A one-liner — expression fits
squares = (x ** 2 for x in range(10))

# Complex logic — function fits
def squares_evens_only():
    for x in range(10):
        if x % 2 != 0:
            continue
        yield x ** 2

The value of laziness — memory and speed #

The biggest advantage of a generator is not materializing every value at once.

Processing a million
# list comprehension — builds 1 million immediately, full memory
squares_list = [x ** 2 for x in range(1_000_000)]

# generator — builds on demand, almost no memory
squares_gen = (x ** 2 for x in range(1_000_000))

total = sum(squares_gen)   # done in a single pass
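
A rough way to see the difference is sys.getsizeof. This is only a sketch: it measures the container object itself (not the integers it refers to), and the exact byte counts depend on the interpreter, so none are shown here.

```python
import sys

n = 100_000
squares_list = [x ** 2 for x in range(n)]
squares_gen = (x ** 2 for x in range(n))

list_bytes = sys.getsizeof(squares_list)   # grows with n
gen_bytes = sys.getsizeof(squares_gen)     # small, and independent of n

total = sum(squares_gen)                   # values are produced one at a time
```

The trade-off is that the generator can be consumed only once; if you need the values again, you either rebuild it or fall back to a list.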

Even infinite sequences #

Infinite sequence
def counter(start: int = 0):
    n = start
    while True:
        yield n
        n += 1

# Take only the first 5
from itertools import islice
first_five = list(islice(counter(), 5))
print(first_five)   # [0, 1, 2, 3, 4]

A list has to hold every element at once, so an infinite one is impossible. A generator produces only as many values as requested, so infinity is fine.

Pipelines — chaining generators #

A data-processing pipeline where each stage is a generator is both memory-efficient and easy to reason about.

Pipeline
def read_lines(path: str):
    with open(path) as f:
        for line in f:
            yield line.rstrip()

def filter_errors(lines):
    for line in lines:
        if "ERROR" in line:
            yield line

def parse_timestamp(lines):
    for line in lines:
        ts, _, msg = line.partition(" ")
        yield (ts, msg)

# Compose
errors = parse_timestamp(filter_errors(read_lines("app.log")))
for ts, msg in errors:
    print(ts, msg)

Each stage handles one line at a time. Even a 100GB file isn’t loaded into memory.
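
To make the one-line-at-a-time behavior visible, here is a small sketch that swaps the file for an in-memory list and records every read. The source function, the trace list, and the sample log lines are made up for illustration.

```python
trace = []   # records each line as the first stage hands it on

def source(lines):
    for line in lines:
        trace.append(f"read: {line}")
        yield line

def filter_errors(lines):
    for line in lines:
        if "ERROR" in line:
            yield line

sample = [
    "10:00 INFO start",
    "10:01 ERROR disk full",
    "10:02 ERROR timeout",
]

pipeline = filter_errors(source(sample))
trace_before = list(trace)        # [] because building the pipeline runs nothing

first = next(pipeline)            # pulls lines through only until the first match
trace_after_first = list(trace)   # two reads so far; the third line is untouched

remaining = list(pipeline)        # now the rest is pulled through
```

Asking for one result reads exactly two lines, which is the same reason a very large file never has to sit in memory as a whole.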

yield from — generator delegation #

Use yield from when you want to forward every value from another iterable directly.

🚫 Manually expanded
def chain_two(a, b):
    for x in a:
        yield x
    for y in b:
        yield y
✅ yield from
def chain_two(a, b):
    yield from a
    yield from b

Same result, but yield from is shorter — and gives two additional benefits:

  1. send/throw delegate automatically (covered below)
  2. You can capture the inner generator’s return value
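
The second benefit can be sketched as follows. The names inner and outer are illustrative; the point is that a return inside a generator becomes the value of the yield from expression in the delegating generator.

```python
def inner():
    yield 1
    yield 2
    return "inner done"          # carried out of the generator via StopIteration

def outer():
    result = yield from inner()  # forwards 1 and 2, then receives the return value
    yield f"got: {result}"

values = list(outer())           # [1, 2, 'got: inner done']
```

With a plain for loop over inner() the return value would be silently dropped, because for discards the StopIteration that carries it.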

Natural for trees / recursive traversal #

Flattening a tree recursively
def flatten(items):
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

result = list(flatten([1, [2, [3, [4]], 5]]))
print(result)   # [1, 2, 3, 4, 5]

A single line, yield from flatten(item), expresses the recursion naturally.
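
The same idea carries over to real tree structures. Below is a sketch of an in-order traversal of a binary tree; the Node dataclass and the in_order function are hypothetical names introduced only for this example.

```python
from collections.abc import Iterator
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def in_order(node: Optional[Node]) -> Iterator[int]:
    if node is None:
        return                          # empty subtree: yield nothing
    yield from in_order(node.left)      # everything smaller first
    yield node.value
    yield from in_order(node.right)     # then everything larger

tree = Node(2, Node(1), Node(4, Node(3), Node(5)))
ordered = list(in_order(tree))          # [1, 2, 3, 4, 5]
```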

send, throw, close — coroutine features #

A generator can also receive values. Assign the result of yield to a variable, and the caller can push values in via send.

send
def echo():
    while True:
        received = yield
        print(f"received: {received}")

g = echo()
next(g)            # advance to the first yield (priming)
g.send("hello")    # received: hello
g.send("world")    # received: world

This mechanism is what async (#7) and cooperative multitasking are built on. In regular code you rarely use send directly. Just be aware the mechanism exists.
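
For a slightly more useful illustration of send, here is a sketch of a running-average coroutine that both receives a value and hands one back on each step. The function name running_average is made up for this example.

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average    # hand back the current average, wait for the next value
        total += value
        count += 1
        average = total / count

avg = running_average()
primed = next(avg)       # run to the first yield; it yields None
a1 = avg.send(10)        # 10.0
a2 = avg.send(20)        # 15.0
a3 = avg.send(30)        # 20.0
```

Each send resumes the generator with the sent value as the result of the yield expression, and returns whatever the generator yields next.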

throw and close
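# Two separate operations on a generator that is paused at a yield:
g.throw(ValueError("injected exception"))   # raises ValueError at the paused yield; it propagates back to the caller unless the generator catches it
g.close()                                   # raises GeneratorExit at the paused yield to terminate the generator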

close() is useful: for generators that hold resources, placing cleanup in a try/finally ensures it runs when the generator is closed.

Resource cleanup
def read_lines(path):
    f = open(path)
    try:
        for line in f:
            yield line
    finally:
        f.close()

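Even if you don’t fully consume this generator (e.g., you break out of the loop), Python calls close() on it when the generator object is garbage-collected, which runs the finally block and closes the file. In CPython that usually happens as soon as the last reference disappears, but the timing isn’t guaranteed; when you need deterministic cleanup, call close() yourself or wrap the generator in contextlib.closing.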
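
The cleanup path can be sketched without a real file. The resource generator and the events list below are stand-ins invented for this example; contextlib.closing is the standard-library helper that calls close() when the with block ends.

```python
from contextlib import closing

events = []

def resource():
    events.append("open")
    try:
        yield 1
        yield 2
    finally:
        events.append("cleanup")    # runs on normal exhaustion and on close()

g = resource()
first = next(g)                     # runs to the first yield
g.close()                           # raises GeneratorExit at that yield, so finally runs
events_after_close = list(events)   # ['open', 'cleanup']

# Deterministic cleanup even when the loop is abandoned early.
with closing(resource()) as g2:
    for item in g2:
        break
events_after_with = list(events)    # ['open', 'cleanup', 'open', 'cleanup']
```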

itertools — gem of the standard library #

itertools has tools used heavily in data pipelines.

Common itertools
from itertools import (
    count, cycle, repeat,                # infinite
    islice,                               # slicing
    chain,                                # concatenation
    groupby,                              # grouping
    accumulate,                           # running totals
    combinations, permutations, product,  # combinatorics
    starmap, filterfalse, dropwhile, takewhile,  # transform/filter
)

# First N
list(islice(count(), 5))             # [0, 1, 2, 3, 4]

# Chain multiple iterables
list(chain([1, 2], [3, 4]))          # [1, 2, 3, 4]

# Running totals
list(accumulate([1, 2, 3, 4]))       # [1, 3, 6, 10]

# Grouping (groupby only merges adjacent items, so sort by the key first)
data = [("a", 1), ("a", 2), ("b", 3)]
for key, group in groupby(data, key=lambda x: x[0]):
    print(key, list(group))
# a [('a', 1), ('a', 2)]
# b [('b', 3)]

If you work with data pipelines regularly, reading through this module once pays dividends forever.
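
One itertools detail that is easy to trip over: groupby only merges items that sit next to each other. A short sketch of what happens with unsorted input; the sample data and the helper name first are illustrative.

```python
from itertools import groupby

data = [("a", 1), ("b", 3), ("a", 2)]    # the two "a" items are not adjacent

def first(pair):
    return pair[0]

unsorted_groups = [(k, list(g)) for k, g in groupby(data, key=first)]
# [('a', [('a', 1)]), ('b', [('b', 3)]), ('a', [('a', 2)])]

sorted_groups = [
    (k, list(g)) for k, g in groupby(sorted(data, key=first), key=first)
]
# [('a', [('a', 1), ('a', 2)]), ('b', [('b', 3)])]
```

Sorting by the same key before grouping gives one group per key, at the cost of materializing the data for the sort.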

The standard library collections follow the same protocol #

The ABCs in collections.abc are formalizations of this protocol.

collections.abc — interfaces
from collections.abc import Iterable, Iterator

def consume(items: Iterable[int]) -> int:
    total = 0
    for x in items:
        total += x
    return total

# list, tuple, set, generator, range, ... all pass
consume([1, 2, 3])
consume(range(10))
consume(x for x in [1, 2, 3])

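Typing function parameters as Iterable[T] is the broadest choice when the function only loops over the argument once. There is no need to narrow to list[T]: it works the same whether the caller passes a generator, a set, or any other iterable. If the function needs len(), indexing, or more than one pass, a narrower type such as Sequence[T] is the safer contract.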
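
An Iterable[T] parameter only promises one pass, which matters as soon as a function loops twice. A sketch of the pitfall and one possible fix; the function names mean and broken_mean are made up for this example.

```python
from collections.abc import Iterable

def broken_mean(items: Iterable[float]) -> float:
    total = sum(items)                  # first pass consumes a generator
    count = sum(1 for _ in items)       # second pass then sees nothing
    return total / count                # ZeroDivisionError for generators

def mean(items: Iterable[float]) -> float:
    values = list(items)                # materialize once, then reuse freely
    return sum(values) / len(values)

from_list = broken_mean([1, 2, 3])              # 2.0, lists can be re-iterated
from_gen = mean(x for x in [1, 2, 3])           # 2.0, works for one-shot input too

try:
    broken_mean(x for x in [1, 2, 3])
except ZeroDivisionError:
    generator_failed = True
```

Materializing gives up laziness, so it fits small inputs; for large streams, computing both totals in a single loop keeps the function one-pass.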

@contextmanager is actually a generator #

With generators covered, we can now see how @contextmanager from #3 actually works.

@contextmanager again
import os
from contextlib import contextmanager

@contextmanager
def chdir(path):
    old = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(old)

Because it contains a yield, this is a generator function. @contextmanager takes that generator and wraps it in an object whose __enter__ calls the first next, and __exit__ calls the second next or throw. Context managers are built directly on top of generators.
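
The wrapping described above can be sketched in a few lines. This is a deliberately simplified illustration, not contextlib's actual implementation, which handles more edge cases; SimpleContextManager, tracked, and log are names invented for this example.

```python
class SimpleContextManager:
    """Simplified illustration of wrapping a generator as a context manager."""

    def __init__(self, gen):
        self.gen = gen

    def __enter__(self):
        return next(self.gen)            # run up to the yield; its value goes to `as`

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            try:
                next(self.gen)           # resume after the yield so cleanup runs
            except StopIteration:
                return False
            raise RuntimeError("generator didn't stop")
        try:
            self.gen.throw(exc)          # re-raise the body's exception at the yield
        except StopIteration:
            return True                  # the generator handled the exception
        except BaseException as raised:
            if raised is exc:
                return False             # let the with statement re-raise it
            raise
        raise RuntimeError("generator didn't stop after throw()")


log = []

def tracked(name):
    log.append(f"enter {name}")
    try:
        yield name
    finally:
        log.append(f"exit {name}")

with SimpleContextManager(tracked("demo")) as value:
    log.append(f"body {value}")
# log is now ['enter demo', 'body demo', 'exit demo']
```

The generator's code before the yield plays the role of __enter__, and everything after it (here the finally block) plays the role of __exit__.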

Wrap-up #

What this post covered:

  • for = sugar for iter() + next() + StopIteration
  • Iterable (__iter__) ⊃ Iterator (__iter__ + __next__)
  • Generator function — even a single yield makes a call return a generator object
  • Pause at each yield, resume on the next next
  • The value of laziness — memory / infinite sequences / pipelines
  • yield from — delegates to another iterable; natural for recursion
  • send/throw/close — coroutine mechanisms; close + try/finally for resource cleanup
  • itertools standard toolkit
  • Type function arguments as broadly as Iterable[T]
  • @contextmanager is built on generators

In the next post (#5 Decorator patterns) we cover every decorator pattern: the tool for wrapping functions and classes. @contextmanager and @dataclass are decorators too.
