Chapter 8

dataclass and __slots__

Every option of @dataclass for short, safe data classes — frozen, kw_only, field() — plus __slots__ for memory savings, all in one place.

This is the first chapter of Part 2, Structuring Code. By the end of Part 1’s seven chapters we had functions, collections, exceptions, and modules in hand, but we had not yet seen a concise way to express “objects with a fixed shape.” This chapter uses @dataclass and __slots__ to solve that problem.

The dataclass from this chapter is the one you’ll meet most often in the rest of the book. It pairs with Protocol in Chapter 9 (typing in earnest), and the Pydantic models used with FastAPI in Part 4 build on the same idea of declaring fields through annotations (Chapter 24, Pydantic v2 in depth, compares the two head-on).

What problem do data classes solve? #

Anyone who’s written a class like this knows what’s annoying right away.

🚫 hand-written class — long and repetitive
class User:
    def __init__(self, id: int, name: str, age: int):
        self.id = id
        self.name = name
        self.age = age

    def __repr__(self) -> str:
        return f"User(id={self.id!r}, name={self.name!r}, age={self.age!r})"

    def __eq__(self, other) -> bool:
        if not isinstance(other, User):
            return NotImplemented
        return (self.id, self.name, self.age) == (other.id, other.name, other.age)

Three fields require hand-writing __init__, __repr__, and __eq__. Adding one field means editing all three places.

@dataclass solves this.

✅ @dataclass — same job
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str
    age: int

That’s all. __init__, __repr__, and __eq__ are auto-generated.

It works
u = User(id=1, name="Curtis", age=30)
print(u)
# User(id=1, name='Curtis', age=30)

print(u == User(id=1, name="Curtis", age=30))   # True

Type hints (id: int, etc.) are themselves the field definitions. Declare the data shape in one place; behaviors come automatically.
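As a small sketch of what "the annotations are the field definitions" means, the example below inspects the generated fields with dataclasses.fields(). It also illustrates a point the text does not spell out: the annotations declare the fields but are not checked at runtime. The "not-an-int" value is purely illustrative.

```python
from dataclasses import dataclass, fields

@dataclass
class User:
    id: int
    name: str
    age: int

# fields() returns one Field object per annotated attribute, in declaration order.
field_names = [f.name for f in fields(User)]
print(field_names)   # ['id', 'name', 'age']

# The annotations define the fields but are not enforced at runtime:
# passing a str where int is declared raises no error.
u = User(id="not-an-int", name="Curtis", age=30)
print(u.id)          # not-an-int
```

If you need the types enforced, that is a job for a type checker or a validation library, not for @dataclass itself.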

@dataclass options — the common ones #

With options
from dataclasses import dataclass

@dataclass(frozen=True, kw_only=True, slots=True)
class User:
    id: int
    name: str
    age: int = 0

What each option means:

| Option | Default | Meaning |
| --- | --- | --- |
| frozen | False | True: immutable; fields can’t be reassigned after creation |
| kw_only | False | True: every field is keyword-only (3.10+) |
| slots | False | True: auto-add __slots__ (3.10+) |
| eq | True | auto-generate __eq__ |
| order | False | True: auto-generate <, <=, >, >= |
| repr | True | auto-generate __repr__ |
| init | True | auto-generate __init__ |

frozen=True — immutable objects #

frozen
@dataclass(frozen=True)
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
p.x = 3.0    # ✗ FrozenInstanceError

Why immutability is good:

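  • Usable as a dict key or set element: with the default eq=True, frozen=True also generates __hash__ (the field values themselves must be hashable)
  • Blocks unintended mutations: nobody can reassign a field after the data has been passed along
  • Safer to share between threads, since fields can’t be reassigned (a mutable field value such as a list can still be changed in place)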

A great fit for “this data doesn’t change after creation” in domain models.
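The sketch below shows the hashability benefit in practice, plus one way to "change" a frozen object: dataclasses.replace() builds a modified copy and leaves the original untouched. The labels dict is only an illustration.

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)

# Equal frozen instances hash the same, so they work as dict keys.
labels = {p: "start"}
print(labels[Point(1.0, 2.0)])   # start

# "Changing" a frozen object means building a modified copy.
moved = replace(p, x=3.0)
print(moved)                     # Point(x=3.0, y=2.0)
print(p)                         # Point(x=1.0, y=2.0)

try:
    p.x = 3.0
except FrozenInstanceError as exc:
    print(type(exc).__name__)    # FrozenInstanceError
```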

kw_only=True — no positional args #

kw_only
@dataclass(kw_only=True)
class User:
    id: int
    name: str
    age: int = 0

u = User(id=1, name="Curtis")          # OK
u = User(1, "Curtis")                  # ✗ TypeError

This has the same effect as the keyword-only arguments from Chapter 5 (function argument patterns). Calls that pass many positional values, such as User(1, "Curtis", 30, True, "admin"), are hard to read; kw_only=True rejects them. It is a sensible default for new data classes (Python 3.10+).

slots=True — memory and speed #

slots
@dataclass(slots=True)
class Point:
    x: float
    y: float

We cover this in detail later in the chapter. Short version: “makes instances lighter and faster”.

Comparable — order=True #

order
@dataclass(order=True)
class Score:
    value: int
    name: str

scores = [Score(80, "B"), Score(95, "A"), Score(70, "C")]
scores.sort()    # works automatically
print(scores)    # [Score(value=70, name='C'), Score(value=80, name='B'), Score(value=95, name='A')]

<, <=, >, >= compare field-by-field like a tuple. If the first field ties, it falls through to the next. Useful where order matters (scores, times, coordinates).
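To see the tie-breaking rule at work, here is a short sketch with two scores that share the same value. The note field is a hypothetical addition; it uses field(compare=False) to show that such a field is ignored by both equality and ordering.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Score:
    value: int
    name: str
    note: str = field(default="", compare=False)  # ignored by ==, <, >, ...

scores = [Score(80, "B"), Score(80, "A", note="retake"), Score(70, "C")]
scores.sort()

# value ties at 80, so name decides the order between "A" and "B".
print([(s.value, s.name) for s in scores])
# [(70, 'C'), (80, 'A'), (80, 'B')]

print(Score(80, "A", note="x") == Score(80, "A", note="y"))  # True
```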

field() — fine-grained per-field config #

When the default isn’t a simple value, or when you need finer options, use field().

Pitfall — don’t put a mutable default directly #

🚫 Error
@dataclass
class User:
    name: str
    tags: list[str] = []   # ✗ ValueError
# mutable default <class 'list'> for field tags is not allowed

The mutable-default pitfall from Chapter 5. dataclass kindly catches it. Use default_factory.

✅ default_factory
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    tags: list[str] = field(default_factory=list)

Each instance gets a fresh empty list.
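A quick check of that claim, as a sketch: two instances created without passing tags do not share a list.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    tags: list[str] = field(default_factory=list)

a = User("Ann")
b = User("Bob")
a.tags.append("admin")

print(a.tags)            # ['admin']
print(b.tags)            # []
print(a.tags is b.tags)  # False
```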

Other field() options #

field options
from dataclasses import dataclass, field

@dataclass
class User:
    id: int
    # 1. default
    role: str = "member"
    # 2. mutable default
    tags: list[str] = field(default_factory=list)
    # 3. exclude from repr/eq
    password: str = field(repr=False, compare=False, default="")
    # 4. exclude from init — populate later
    created_at: float = field(init=False, default=0.0)
    # 5. metadata
    score: int = field(default=0, metadata={"max": 100})

repr=False is common for fields like passwords that shouldn’t appear in logs (Chapter 31, logging and observability, revisits patterns for keeping PII and secrets out of logs). compare=False keeps a field out of equality: in the class above, two users that differ only in password still compare equal.
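The effect of these options can be observed directly. The sketch below uses a trimmed-down version of the class (only id, password, and score) and reads the metadata back through dataclasses.fields(); the password values are placeholders.

```python
from dataclasses import dataclass, field, fields

@dataclass
class User:
    id: int
    password: str = field(default="", repr=False, compare=False)
    score: int = field(default=0, metadata={"max": 100})

a = User(1, password="hunter2")
b = User(1, password="different")

print(a)         # User(id=1, score=0)   <- password is hidden
print(a == b)    # True                  <- password is not compared

# metadata is ignored by dataclass itself; it is there for your own code to read.
score_field = next(f for f in fields(User) if f.name == "score")
print(score_field.metadata["max"])   # 100
```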

__post_init__ — post-creation hook #

Because __init__ is auto-generated, there is no obvious place to add your own logic to it. Define __post_init__ instead: the generated __init__ calls it after assigning the fields.

__post_init__
from dataclasses import dataclass, field

@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)

    def __post_init__(self):
        self.area = self.width * self.height

r = Rectangle(3.0, 4.0)
print(r.area)   # 12.0

A common pattern: exclude a field from the constructor with init=False and compute it in __post_init__.
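Two related uses of __post_init__ are sketched below under one assumption: that you want the class frozen as well. Validation can simply raise from __post_init__, but a frozen class blocks normal assignment even there, so the derived field is set with object.__setattr__. The error message text is illustrative.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)

    def __post_init__(self):
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")
        # frozen=True blocks `self.area = ...`, so bypass it for the derived field.
        object.__setattr__(self, "area", self.width * self.height)

r = Rectangle(3.0, 4.0)
print(r.area)    # 12.0

try:
    Rectangle(-1.0, 4.0)
except ValueError as exc:
    print(exc)   # width and height must be positive
```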

Where dataclass doesn’t fit #

It isn’t a panacea. Look elsewhere for these:

| Situation | Better tool |
| --- | --- |
| Strong validation (email format, length limits) | Pydantic |
| Frequent JSON conversion (serialize/deserialize) | Pydantic, attrs, msgspec |
| Inheritance + lots of behavior | regular class |
| Named tuple is enough | NamedTuple |
| A dict is fine | TypedDict |

For API input validation, Pydantic is the far better fit. Chapter 24, Pydantic v2 in depth, covers it in earnest. Treat dataclass as the tool for “internal data models.”

__slots__ — memory and speed #

Now to the real story behind slots=True.

Regular instances — using __dict__ #

Python objects store attributes in a dict by default.

dict-based
class Point:
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

p = Point(1.0, 2.0)
print(p.__dict__)
# {'x': 1.0, 'y': 2.0}

p.z = 3.0    # can freely add attributes
print(p.__dict__)
# {'x': 1.0, 'y': 2.0, 'z': 3.0}

Pro: very flexible. Con: every instance carries its own dict, and that per-instance overhead adds up when you create millions of objects.

__slots__ — only predeclared attributes #

__slots__
class Point:
    __slots__ = ("x", "y")

    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

p = Point(1.0, 2.0)
p.z = 3.0    # ✗ AttributeError: 'Point' object has no attribute 'z'

Defining __slots__:

  • No __dict__ is created — memory drops
  • Can’t add attributes — only declared ones
  • Slightly faster attribute access — direct slot access instead of dict lookup

In numbers, roughly 40–50% less memory per instance and 10–25% faster attribute access are typical, but treat these as ballpark figures: they vary with the number of attributes and the interpreter version. Precise measurement, with the right tools, is covered in Chapter 21 (performance: cProfile, py-spy, memory profiling).
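For a rough feel before Chapter 21, the sketch below uses the standard-library tracemalloc module to compare traced allocations for a dict-based class and a slotted one. The class names, the helper bytes_per_instance, and the instance count are all hypothetical, and the printed numbers will differ between Python versions, so no expected output is shown.

```python
import tracemalloc

class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlottedPoint:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

def bytes_per_instance(cls, n=50_000):
    """Rough average of traced bytes per instance (includes the list slot)."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    objs = [cls(1.0, 2.0) for _ in range(n)]   # keep instances alive while measuring
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return (after - before) / n

plain = bytes_per_instance(PlainPoint)
slotted = bytes_per_instance(SlottedPoint)
print(f"plain:   {plain:.0f} bytes per instance")
print(f"slotted: {slotted:.0f} bytes per instance")
```

This is a coarse measurement: it counts everything allocated while the list is built, which is enough to see the direction of the difference but not to quote exact percentages.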

dataclass(slots=True) is the easiest path #

Writing __slots__ directly means listing field names twice (typing declaration + __slots__). dataclass(slots=True) handles it automatically.

Auto slots
from dataclasses import dataclass

@dataclass(slots=True)
class Point:
    x: float
    y: float

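Behind the scenes, the decorator generates the __slots__ declaration from the field names, so the result behaves like the hand-written version. (It does so by building and returning a new class, because __slots__ has to be present when a class is created.) It costs one keyword argument, which makes it an easy default once you know the caveats in the next section.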

Things to watch when using __slots__ #

It isn’t a silver bullet.

1) Multiple inheritance restrictions #

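Inheriting from two base classes that both define non-empty __slots__ fails with a TypeError (an instance layout conflict). Single inheritance is fine, and so are mixins that declare an empty __slots__ = ().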
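The restriction can be reproduced in a few lines. In this sketch the class names A, B, Mixin, and Ok are hypothetical; the first combination is rejected when the class is created, while a mixin with empty __slots__ combines without trouble.

```python
class A:
    __slots__ = ("a",)

class B:
    __slots__ = ("b",)

try:
    class AB(A, B):          # both bases have non-empty __slots__
        __slots__ = ()
except TypeError as exc:
    print(type(exc).__name__)    # TypeError

class Mixin:
    __slots__ = ()           # empty slots: adds behavior, no instance layout

    def describe(self):
        return type(self).__name__

class Ok(A, Mixin):
    __slots__ = ("c",)

print(Ok().describe())       # Ok
```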

2) Weak references — weakref doesn’t work #

A class with __slots__ gets no __weakref__ slot unless you declare one, so weakref.ref() on its instances fails. If you need weak references:

weakref support
class Node:
    __slots__ = ("data", "__weakref__")

dataclass(slots=True, weakref_slot=True) is also available (3.11+).
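The difference is easy to demonstrate with hand-written slots. In this sketch, Plain is a hypothetical class without the "__weakref__" entry, shown next to the Node class from above.

```python
import weakref

class Plain:
    __slots__ = ("data",)                  # no "__weakref__" slot

class Node:
    __slots__ = ("data", "__weakref__")    # weak references allowed

try:
    weakref.ref(Plain())
except TypeError:
    print("TypeError")                     # cannot create weak reference

node = Node()
node.data = 42
ref = weakref.ref(node)
print(ref() is node)                       # True
```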

3) Beware class-variable conflicts #

🚫 Conflict
class Bad:
    __slots__ = ("x",)
    x = 0    # ✗ ValueError — same-named class variable and slot

4) Can’t add dynamic attributes #

Patterns that attach temporary attributes (plugins, mocking) break. Usually fine, but consider this when building libraries — users may try.

When to turn slots on? #

| Situation | slots |
| --- | --- |
| Data models with tens of thousands to millions of instances (coordinates, graph nodes) | ✅ definitely |
| Strong immutability: block arbitrary attribute additions | ✅ |
| Regular domain objects, not many instances | ⭕ no harm in turning it on (just turn it on) |
| Metaprogramming / dynamic attributes / heavy multiple inheritance | ❌ off, or use carefully |

When in doubt, defaulting to dataclass(slots=True) is a typical modern Python choice.

Exercises #

  1. Define Address(country: str, city: str, postal_code: str) with @dataclass(frozen=True, kw_only=True, slots=True). Verify that calling Address("KR", "Seoul", "06000") raises TypeError (kw_only), that two instances built with the same values are ==, that hash(addr) works (frozen → hashable), and that addr.city = "Busan" raises FrozenInstanceError.
  2. Build a Cart class with @dataclass. The items: list[str] field needs field(default_factory=list) because it’s a mutable default. Exclude total: float from init with field(init=False, default=0.0) and fill it in __post_init__ as the number of items × 1000.
  3. Create 10,000 @dataclass instances and 10,000 @dataclass(slots=True) instances, then compare per-instance memory using sys.getsizeof(instance) + sys.getsizeof(instance.__dict__) (wrap the slots side in try / except since it has no __dict__). Precise memory measurement tools come back in Chapter 21 performance.

In one line: one line of @dataclass auto-generates __init__/__repr__/__eq__. Common options are frozen (immutable, hashable), kw_only (call readability), slots (memory), order (sorting). Use field(default_factory=...) for mutable defaults; __post_init__ for post-creation processing. If you need validation / serialization, use Pydantic instead of dataclass.

Next chapter #

In Chapter 9 typing in earnest — Generic, Protocol, TypedDict, Literal we cover the powerful tools of the type system. When this chapter’s dataclass meets Chapter 9’s Protocol, the core pattern of “structural typing” comes together.
