Modern Python Intermediate #1: dataclass and __slots__
If you’ve finished the Modern Python Basics series, it’s time to step up. The intermediate series is seven posts that take the tools we only touched on in basics and explore them seriously.
- #1 dataclass and __slots__ ← this post
- #2 typing in earnest: Generic, Protocol, TypedDict, Literal
- #3 Context managers (with, contextlib)
- #4 Iterables/generators/yield from
- #5 Decorator patterns
- #6 Pattern matching in depth
- #7 Async intro (asyncio)
The first topic is a tool for writing data-holding classes with less boilerplate — @dataclass, plus the __slots__ option that saves memory.
What problem do data classes solve?
Anyone who’s written a class like this knows what’s annoying right away.

    class User:
        def __init__(self, id: int, name: str, age: int):
            self.id = id
            self.name = name
            self.age = age

        def __repr__(self) -> str:
            return f"User(id={self.id!r}, name={self.name!r}, age={self.age!r})"

        def __eq__(self, other) -> bool:
            if not isinstance(other, User):
                return NotImplemented
            return (self.id, self.name, self.age) == (other.id, other.name, other.age)

Three fields require hand-writing __init__, __repr__, and __eq__. Adding one field means editing all three places.
@dataclass solves this.

    from dataclasses import dataclass

    @dataclass
    class User:
        id: int
        name: str
        age: int

That’s all. __init__, __repr__, and __eq__ are auto-generated.

    u = User(id=1, name="커티스", age=30)
    print(u)
    # User(id=1, name='커티스', age=30)
    print(u == User(id=1, name="커티스", age=30))  # True

Type hints (id: int, etc.) are themselves the field definitions. Declare the data shape in one place; behaviors come automatically.
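To see that the annotations really are the field definitions, here is a minimal sketch using the standard `dataclasses.fields()` helper. The variable names `field_names` and `field_types` are just illustrative.

```python
from dataclasses import dataclass, fields


@dataclass
class User:
    id: int
    name: str
    age: int


# fields() returns one Field object per annotated attribute,
# in the order the attributes were declared.
field_names = [f.name for f in fields(User)]
field_types = {f.name: f.type for f in fields(User)}

print(field_names)  # ['id', 'name', 'age']
```

Nothing beyond the three annotated lines was needed for the class to know its own shape.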
@dataclass options — the common ones

    from dataclasses import dataclass

    @dataclass(frozen=True, kw_only=True, slots=True)
    class User:
        id: int
        name: str
        age: int = 0

What each option means:

| Option | Default | Meaning |
|---|---|---|
| frozen | False | True: immutable, fields can’t change after creation |
| kw_only | False | True: every field is keyword-only (3.10+) |
| slots | False | True: auto-add __slots__ (3.10+) |
| eq | True | auto-generate __eq__ |
| order | False | True: auto-generate <, >, etc. |
| repr | True | auto-generate __repr__ |
| init | True | auto-generate __init__ |
frozen=True — immutable objects

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Point:
        x: float
        y: float

    p = Point(1.0, 2.0)
    p.x = 3.0  # ✗ FrozenInstanceError

Why immutability is good:
- Usable as a dict key or set element (becomes hashable automatically)
- Blocks unintended mutations — prevents bugs from data being modified after it has been passed around
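- Safe to share across threads: no thread can reassign a field, so there are no races on the object’s own state (mutable values stored in a field, such as a list, can still be mutated)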
A great fit for domain model objects that shouldn’t change after creation.
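The points above can be sketched as follows. This is a small illustration, not the only way to work with frozen objects. `replace()` is the standard-library helper for building a modified copy, and the names `labels`, `lookup`, and `moved` are made up for the example.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Point:
    x: float
    y: float


origin = Point(0.0, 0.0)

# With frozen=True (and the default eq=True) a __hash__ is generated,
# so instances work as dict keys and set elements.
labels = {origin: "origin", Point(1.0, 0.0): "unit-x"}
lookup = labels[Point(0.0, 0.0)]  # an equal point finds the same entry

# Assignment is rejected...
try:
    origin.x = 5.0
    mutated = True
except FrozenInstanceError:
    mutated = False

# ...so "changing" a value means building a new object.
moved = replace(origin, x=5.0)
```

The original `origin` is untouched; `moved` is a separate object with the new coordinate.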
kw_only=True — no positional args

    from dataclasses import dataclass

    @dataclass(kw_only=True)
    class User:
        id: int
        name: str
        age: int = 0

    u = User(id=1, name="커티스")  # OK
    u = User(1, "커티스")          # ✗ TypeError

This has the same effect as the keyword-only arguments from Basics #5. Calls with many fields like User(1, "커티스", 30, True, "admin") are hard to read; kw_only=True blocks them. It is a sensible default for new data classes.
slots=True — memory and speed

    from dataclasses import dataclass

    @dataclass(slots=True)
    class Point:
        x: float
        y: float

We cover this in detail later in the post. One-line summary: “makes instances lighter and faster”.
Comparable — order=True

    from dataclasses import dataclass

    @dataclass(order=True)
    class Score:
        value: int
        name: str

    scores = [Score(80, "B"), Score(95, "A"), Score(70, "C")]
    scores.sort()  # works automatically
    print(scores)
    # [Score(value=70, name='C'), Score(value=80, name='B'), Score(value=95, name='A')]

<, <=, >, >= compare field-by-field like a tuple. If the first field ties, it falls through to the next. Useful where order matters (scores, times, coordinates).
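A short sketch of the tie-breaking behavior described above. The sample names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Score:
    value: int
    name: str


# Two entries tie on value, so the comparison falls through to name.
scores = [Score(80, "Zed"), Score(95, "Amy"), Score(80, "Bob")]
ranked = sorted(scores)

print([(s.value, s.name) for s in ranked])
# [(80, 'Bob'), (80, 'Zed'), (95, 'Amy')]

best = max(scores)  # max() and min() use the generated comparisons too
```

Because field order drives the comparison, put the field you want to sort by first.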
field() — fine-grained per-field config
When the default isn’t a simple value, or when you need finer options, use field().
Pitfall: don’t put a mutable default directly

    from dataclasses import dataclass

    @dataclass
    class User:
        name: str
        tags: list[str] = []  # ✗ ValueError
        # mutable default <class 'list'> for field tags is not allowed

This is the mutable-default pitfall from Basics #5. dataclass catches it when the class is defined. Use default_factory instead.

    from dataclasses import dataclass, field

    @dataclass
    class User:
        name: str
        tags: list[str] = field(default_factory=list)

Each instance gets a fresh empty list.
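To check that the lists really are independent, here is a minimal sketch. The `settings` field and its lambda factory are hypothetical additions, included to show that any zero-argument callable works as a factory.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    tags: list[str] = field(default_factory=list)
    # Hypothetical extra field: a factory can be any zero-argument callable.
    settings: dict[str, str] = field(default_factory=lambda: {"theme": "dark"})


alice = User("alice")
bob = User("bob")

alice.tags.append("admin")         # only alice's list changes
alice.settings["theme"] = "light"  # only alice's dict changes
```

The factory is called once per instance, inside the generated `__init__`.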
Other field() options

    from dataclasses import dataclass, field

    @dataclass
    class User:
        id: int
        # 1. default
        role: str = "member"
        # 2. mutable default
        tags: list[str] = field(default_factory=list)
        # 3. exclude from repr/eq
        password: str = field(repr=False, compare=False, default="")
        # 4. exclude from init, populate later
        created_at: float = field(init=False, default=0.0)
        # 5. metadata
        score: int = field(default=0, metadata={"max": 100})

repr=False is common for fields like passwords that shouldn’t appear in logs. compare=False keeps a field out of equality: here, two users that differ only in password still count as equal.
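A trimmed-down sketch of how those options behave at runtime. It keeps only three of the fields above; the sample values and the `limits` helper are assumptions made for the example.

```python
from dataclasses import dataclass, field, fields


@dataclass
class User:
    id: int
    password: str = field(default="", repr=False, compare=False)
    score: int = field(default=0, metadata={"max": 100})


a = User(id=1, password="hunter2", score=40)
b = User(id=1, password="different", score=40)

shown = repr(a)  # password is left out of the repr
same = a == b    # password is ignored by __eq__

# metadata is a read-only mapping attached to the Field object. dataclasses
# never interprets it, so enforcing "max" is up to your own code.
limits = {f.name: f.metadata["max"] for f in fields(User) if "max" in f.metadata}

print(shown)  # User(id=1, score=40)
```

Note that metadata is purely descriptive: nothing stops `score` from being set above 100 unless you check it yourself.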
__post_init__ — post-creation hook
Since __init__ is auto-generated, you normally don’t write one yourself. When you need extra processing right after the object is created, define __post_init__, which the generated __init__ calls as its last step.

    from dataclasses import dataclass, field

    @dataclass
    class Rectangle:
        width: float
        height: float
        area: float = field(init=False)

        def __post_init__(self):
            self.area = self.width * self.height

    r = Rectangle(3.0, 4.0)
    print(r.area)  # 12.0

A common pattern: exclude a field from the constructor with init=False and compute it in __post_init__.
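__post_init__ is also a natural place for lightweight checks. The following is a sketch under the assumption that non-positive sides should be rejected; the error message is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)

    def __post_init__(self) -> None:
        # Runs right after the generated __init__ has assigned width and height.
        if self.width <= 0 or self.height <= 0:
            raise ValueError("width and height must be positive")
        self.area = self.width * self.height


ok = Rectangle(3.0, 4.0)

try:
    Rectangle(-1.0, 4.0)
    rejected = False
except ValueError:
    rejected = True
```

Keep in mind that `area` is computed once. If `width` is reassigned later it will not be updated, so a property may be a better fit when fields can change.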
Where dataclass doesn’t fit
It isn’t a panacea. Look elsewhere for these:
| Case | Better tool |
|---|---|
| Strong validation (email format, length limits) | Pydantic |
| Frequent JSON conversion (serialize/deserialize) | Pydantic, attrs, msgspec |
| Inheritance + lots of behavior | regular class |
| Named tuple is enough | NamedTuple |
| A dict is fine | TypedDict |
Especially for API input validation, Pydantic (briefly seen in Basics #2) is a much better fit. Treat dataclass as the tool for “internal data models.”
__slots__ — memory and speed
Now to the real story behind slots=True.
Regular instances — using __dict__
Python objects store attributes in a dict by default.

    class Point:
        def __init__(self, x: float, y: float):
            self.x = x
            self.y = y

    p = Point(1.0, 2.0)
    print(p.__dict__)
    # {'x': 1.0, 'y': 2.0}
    p.z = 3.0  # can freely add attributes
    print(p.__dict__)
    # {'x': 1.0, 'y': 2.0, 'z': 3.0}

Pro: very flexible. Con: every instance carries the overhead of its own attribute dict, which adds up to a lot of memory when you create millions of objects.
__slots__ — only predeclared attributes

    class Point:
        __slots__ = ("x", "y")

        def __init__(self, x: float, y: float):
            self.x = x
            self.y = y

    p = Point(1.0, 2.0)
    p.z = 3.0  # ✗ AttributeError: 'Point' object has no attribute 'z'

Defining __slots__:
- No __dict__ is created, so memory drops
- Can’t add attributes: only declared ones are allowed
- Slightly faster attribute access: direct slot access instead of dict lookup
In numbers, 40–50% memory savings per instance and 10–25% faster attribute access are typical, but the actual gain varies with object size and interpreter version, so measure on your own workload before relying on a figure.
dataclass(slots=True) is the easiest path
Writing __slots__ directly means listing field names twice: once in the type annotations and once in __slots__. dataclass(slots=True) handles both automatically.

    from dataclasses import dataclass

    @dataclass(slots=True)
    class Point:
        x: float
        y: float

Behind the scenes, it generates the same __slots__ declaration you would otherwise write by hand. It costs one keyword argument, so there is little reason not to use it.
Things to watch when using __slots__
It isn’t a silver bullet.
1) Multiple inheritance restrictions
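Inheriting from two base classes that each define a non-empty __slots__ fails with a TypeError (instance layout conflict). Single inheritance is fine, and so are mixins that declare an empty __slots__ = ().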
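A small sketch of the restriction and of the usual workaround. All class names here are hypothetical.

```python
class HasX:
    __slots__ = ("x",)


class HasY:
    __slots__ = ("y",)


try:
    # Both bases carry their own slot layout, so CPython cannot merge them.
    class Both(HasX, HasY):
        __slots__ = ()
    conflict = False
except TypeError:
    conflict = True


class Mixin:
    __slots__ = ()  # behavior only, no instance storage

    def describe(self) -> str:
        return f"{type(self).__name__} instance"


class Tagged(Mixin, HasX):
    __slots__ = ("tag",)


t = Tagged()
t.x, t.tag = 1, "a"
```

Keeping storage in a single base and leaving mixins with an empty `__slots__` avoids the conflict.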
2) Weak references — weakref doesn’t work
A class with __slots__ doesn’t support weak references by default. If you need them, add "__weakref__" to the slots:

    class Node:
        __slots__ = ("data", "__weakref__")

dataclass(slots=True, weakref_slot=True) is also available (3.11+).
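The difference can be seen with the standard `weakref` module. This sketch uses hand-written `__slots__` so it does not depend on 3.11; `PlainNode` and `WeakNode` are hypothetical names.

```python
import weakref


class PlainNode:
    __slots__ = ("data",)

    def __init__(self, data):
        self.data = data


class WeakNode:
    __slots__ = ("data", "__weakref__")

    def __init__(self, data):
        self.data = data


try:
    weakref.ref(PlainNode(1))
    plain_ok = True
except TypeError:
    # No __weakref__ slot, so a weak reference cannot be created.
    plain_ok = False

node = WeakNode(2)
ref = weakref.ref(node)  # works because of the "__weakref__" slot
alive = ref() is node
```

This matters mostly for caches and observer patterns that hold objects weakly.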
3) Beware class-variable conflicts

    class Bad:
        __slots__ = ("x",)
        x = 0  # ✗ ValueError: same-named class variable and slot

4) Can’t add dynamic attributes
Patterns that attach temporary attributes (plugins, mocking) break. Usually fine, but worth considering when building libraries — users may try to do this.
When to turn slots on?
| Situation | slots |
|---|---|
| Data models with tens of thousands to millions of instances (coordinates, graph nodes) | ✅ definitely |
| Strong immutability — block arbitrary attribute additions | ✅ |
| Regular domain objects, not many instances | ⭕ no harm in turning it on |
| Metaprogramming / dynamic attributes / heavy multiple inheritance | ❌ off, or use carefully |
When in doubt, defaulting to dataclass(slots=True) is the typical modern Python answer.
Wrap-up
The tools this post covered:
- @dataclass auto-generates __init__/__repr__/__eq__
- Options: frozen (immutable, hashable), kw_only (no positional), order (sorting), slots (memory)
- Use field(default_factory=list) for mutable defaults
- Per-field control with field(repr=False, compare=False, init=False)
- Post-creation hook with __post_init__
- __slots__: removes per-instance dict overhead, saving memory and speeding up attribute access
- dataclass(slots=True) is the shortest way to use slots
- For strong validation/JSON serialization, use Pydantic, not dataclass
In the next post (#2 typing in earnest) we cover the more powerful tools of the type system: Generic, Protocol, TypedDict, Literal. It is the next step from the type hints we set up in basics.