Pydantic v2 in depth — validation, serialization, custom validators
A dedicated deep dive on Pydantic, the core of FastAPI. v2's performance and API changes, the right places to use model_validator/field_validator, serialization control, and JSON Schema generation.
In Chapter 23 Routing, Pydantic models, dependency injection we covered the basics of Pydantic. This chapter goes a level deeper — the exact lifecycle of validation / serialization, patterns for custom validators / serializers, JSON Schema integration, and the pitfalls people most often miss.
The patterns in this chapter come back in Chapter 25 Connecting a DB for ORM-object ↔ Pydantic conversion, in Chapter 29 Capstone — finishing the TODO API for domain-schema design, and in Chapter 31 Logging and observability for PII-masking serialization. Working through this chapter first makes the back half of the book easier to read.
v1 → v2 — why migrate
Pydantic v2 (released in 2023) keeps the same package name as v1, but the API changed enough that it is effectively used in a different style. The core differences:
| Area | v1 | v2 |
|---|---|---|
| Core | Pure Python | Rust (pydantic-core) |
| Performance | 1× | 5–50× faster |
| Validation methods | @validator | @field_validator + @model_validator |
| Serialization | .dict(), .json() | .model_dump(), .model_dump_json() |
| Configuration | class Config: | model_config = ConfigDict(...) |
| Generic | Limited | PEP 695 friendly |
| union branching | First match | discriminated union or smart mode |
There’s an automated v1 → v2 conversion tool (bump-pydantic). That said, the custom-validator parts are safer to port by hand.
The old v1 API keeps working in v2 for a while with deprecation warnings. New code should use the v2 API exclusively.
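A minimal sketch of what "keeps working with deprecation warnings" looks like in practice, assuming a Pydantic 2.x install. The `Account` model is a made-up name for illustration; the comments show the v1 spelling next to each v2 one.

```python
import warnings

from pydantic import BaseModel, ConfigDict, field_validator


class Account(BaseModel):  # hypothetical model, for illustration only
    model_config = ConfigDict(extra="forbid")  # v1: class Config: extra = "forbid"
    name: str

    @field_validator("name")  # v1: @validator("name")
    @classmethod
    def strip_name(cls, v: str) -> str:
        return v.strip()


acct = Account(name="  curtis ")
new_style = acct.model_dump()  # v2 spelling

# Record warnings so the deprecation is visible instead of silently filtered.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = acct.dict()  # v1 spelling, deprecated in v2
```

Both calls return the same dict; the only difference is the warning, which is a good signal to grep for when porting.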
BaseModel vs dataclass vs TypedDict — when to pick what
This book has three tools for expressing the shape of data.
| Tool | Location | Good for |
|---|---|---|
| @dataclass | Chapter 8 | Internal domain models, lightweight, almost no validation |
| TypedDict | Chapter 9 | Declaring the shape of external dicts (JSON responses, etc.), plain dict at runtime |
| pydantic.BaseModel | This chapter | Validating external input, serialization / deserialization, FastAPI integration |
The picks:
- Need input validation? → BaseModel
- JSON conversion happens often? → BaseModel
- Pure internal data, no conversion / validation? → dataclass
- Only declaring the shape of an external dict, zero runtime cost? → TypedDict
The input / output of FastAPI routes is almost always BaseModel. ORM models are SQLAlchemy Mapped[T] (Chapter 25), and when sending a response you wrap them once more in BaseModel (from_attributes=True).
Validation lifecycle — one picture
What happens in the single line User.model_validate({"name": "curtis", "age": 30}):
Input dict
│
▼
1. Run @model_validator(mode="before")
├─ Can refine the input (raw-dict stage)
▼
2. Run @field_validator(mode="before") per field
├─ Sees the raw value, before any type conversion
▼
3. Type conversion + Field() constraint validation per field
├─ ge / le / min_length / pattern, etc.
▼
4. Run @field_validator(mode="after") per field
▼
5. Build the BaseModel instance
▼
6. Run @model_validator(mode="after")
├─ Cross-field validation
▼
Completed instance

Steps 2 to 4 run field by field, in the order the fields are declared. With this flow in your head, “where does my validator belong” becomes a natural choice.
@field_validator — single field
from pydantic import BaseModel, field_validator

class TodoCreate(BaseModel):
    title: str
    tags: list[str] = []

    @field_validator("title")
    @classmethod
    def title_trim_and_check(cls, v: str) -> str:
        v = v.strip()
        if not v:
            raise ValueError("title cannot be whitespace only")
        if "<" in v or ">" in v:
            raise ValueError("HTML tags are not allowed")
        return v

    @field_validator("tags")
    @classmethod
    def tags_lowercase(cls, v: list[str]) -> list[str]:
        return [t.lower() for t in v]

Rules:
- Stack @classmethod directly under @field_validator. v2 treats field validators as class methods, and writing the decorator explicitly keeps type checkers happy.
- The return value is the transformed value. For simple validation, return v.
- Raising ValueError or AssertionError is reported as a validation failure. Unlike v1, a TypeError raised inside a validator is not converted in v2 and propagates as-is.
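To see how a raised ValueError surfaces to the caller, here is a minimal sketch that catches the resulting ValidationError and inspects it. `TitleOnly` is a hypothetical, trimmed-down stand-in for TodoCreate.

```python
from pydantic import BaseModel, ValidationError, field_validator


class TitleOnly(BaseModel):  # hypothetical, trimmed-down version of TodoCreate
    title: str

    @field_validator("title")
    @classmethod
    def not_blank(cls, v: str) -> str:
        v = v.strip()
        if not v:
            raise ValueError("title cannot be whitespace only")
        return v  # the returned value is what ends up on the model


try:
    TitleOnly(title="   ")
except ValidationError as exc:
    errors = exc.errors()  # list of dicts: loc, type, msg, input, ...
else:
    errors = []

ok = TitleOnly(title="  buy milk ")
```

Each entry in `errors` carries the field location and the original message, which is the same structure FastAPI turns into a 422 response body.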
mode="before" — before type conversion
from datetime import datetime

class Event(BaseModel):
    timestamp: int

    @field_validator("timestamp", mode="before")
    @classmethod
    def parse_iso(cls, v):
        if isinstance(v, str):
            return int(datetime.fromisoformat(v).timestamp())
        return v

mode="before" is called before type conversion. You can step in and convert a string input like "2026-05-17T12:00:00" to int before it ever gets typed.
The default mode="after" runs after type conversion — v: int is guaranteed.
@model_validator — whole model
For validating relationships across fields.

from datetime import datetime
from typing import Self
from pydantic import BaseModel, model_validator

class DateRange(BaseModel):
    start: datetime
    end: datetime

    @model_validator(mode="after")
    def check_order(self) -> Self:
        if self.start > self.end:
            raise ValueError("start is later than end")
        return self

mode="after" runs after every field is filled, receives self, and returns it. Unlike @field_validator, @model_validator has no default mode, so mode must always be passed explicitly. The Self return type (Chapter 20 Advanced typing) tells the type checker that the validator hands back the same model instance.
mode="before" — refine raw input
class FlexibleInput(BaseModel):
    name: str
    age: int

    @model_validator(mode="before")
    @classmethod
    def normalize(cls, data):
        if isinstance(data, str):
            # Also work when handed a string
            name, age = data.split(",")
            return {"name": name.strip(), "age": int(age)}
        return data

A pattern for taking varied inputs from outside and normalizing them into a standard dict. Useful for user input or legacy-system integration.
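A usage sketch for the normalizing validator, under the assumption that the legacy format is a single "name, age" string. This variant leaves the age as a string and lets the normal type-conversion step turn it into an int.

```python
from typing import Any

from pydantic import BaseModel, ValidationError, model_validator


class FlexibleInput(BaseModel):
    name: str
    age: int

    @model_validator(mode="before")
    @classmethod
    def normalize(cls, data: Any) -> Any:
        if isinstance(data, str):
            # Unpacking raises ValueError when there is not exactly one comma;
            # Pydantic reports that as a validation error.
            name, age = data.split(",")
            return {"name": name.strip(), "age": age.strip()}
        return data


from_text = FlexibleInput.model_validate("curtis, 30")
from_dict = FlexibleInput.model_validate({"name": "curtis", "age": 30})

try:
    FlexibleInput.model_validate("no comma here")
except ValidationError:
    rejected = True
else:
    rejected = False
```

Both input shapes end up as the same model, and a malformed string is rejected through the usual ValidationError path instead of leaking a raw exception.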
field vs model validator — picking
| Job | Which one |
|---|---|
| Validate format / value of one field | @field_validator |
| Transform one field (strip, lowercase) | @field_validator |
| Validate relationship between two fields (start ≤ end) | @model_validator(mode="after") |
| Transform the shape of the input itself | @model_validator(mode="before") |
Serialization — model_dump / model_dump_json
The opposite direction of validation: BaseModel instance → dict / JSON.

user = User(name="curtis", age=30, password="secret")
user.model_dump()
# {"name": "curtis", "age": 30, "password": "secret"}
user.model_dump_json()
# '{"name":"curtis","age":30,"password":"secret"}'  (compact, no spaces)

exclude / include
user.model_dump(exclude={"password"})
# {"name": "curtis", "age": 30}
user.model_dump(include={"name"})
# {"name": "curtis"}

# Nested
order.model_dump(exclude={"items": {"__all__": {"price"}}})
# Drop only price from each element of items

exclude_unset / exclude_defaults / exclude_none
class TodoUpdate(BaseModel):
    title: str | None = None
    done: bool | None = None

upd = TodoUpdate(title="new title")
upd.model_dump()                       # {"title": "new title", "done": None}
upd.model_dump(exclude_unset=True)     # {"title": "new title"} ← drop fields not explicitly set
upd.model_dump(exclude_none=True)      # {"title": "new title"} ← drop None values
upd.model_dump(exclude_defaults=True)  # {"title": "new title"} ← drop defaults

This is why Chapter 23’s PATCH pattern uses exclude_unset=True: only the fields the client actually sent are updated.
@field_serializer — custom serialization
Control the serialization shape of a specific field.

from pydantic import BaseModel, field_serializer
from datetime import datetime

class Event(BaseModel):
    name: str
    occurred_at: datetime

    @field_serializer("occurred_at")
    def serialize_dt(self, dt: datetime) -> str:
        return dt.strftime("%Y-%m-%d %H:%M:%S KST")

Emit a custom string instead of the default ISO format.
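By default a field serializer applies to both model_dump() and model_dump_json(). If the custom string is only wanted in JSON output, the `when_used="json"` option restricts it. A minimal sketch, with an arbitrary sample timestamp and the time-zone label left out:

```python
from datetime import datetime

from pydantic import BaseModel, field_serializer


class Event(BaseModel):
    name: str
    occurred_at: datetime

    # Only applied when serializing to JSON (model_dump_json or mode="json").
    @field_serializer("occurred_at", when_used="json")
    def serialize_dt(self, dt: datetime) -> str:
        return dt.strftime("%Y-%m-%d %H:%M:%S")


ev = Event(name="deploy", occurred_at=datetime(2026, 5, 17, 12, 0, 0))

as_python = ev.model_dump()                # keeps the datetime object
as_json_dict = ev.model_dump(mode="json")  # JSON-compatible dict
as_json_text = ev.model_dump_json()        # JSON string
```

Keeping the real datetime in Python mode is handy when the dict is passed on to other Python code rather than sent over the wire.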
@model_serializer — serialize the whole model
from pydantic import BaseModel, model_serializer

class Coordinates(BaseModel):
    lat: float
    lng: float

    @model_serializer
    def serialize(self) -> str:
        return f"{self.lat},{self.lng}"

c = Coordinates(lat=37.5, lng=127.0)
c.model_dump()  # "37.5,127.0"

Convert the whole model to something other than a dict. Useful when an external API demands a peculiar format.
PII masking — an operational pattern
Sensitive information like passwords and card numbers must be protected in both logs and responses.
from pydantic import BaseModel, field_serializer, SecretStr

class User(BaseModel):
    email: str
    password: SecretStr  # automatically shown as '**********'
    card_number: str

    @field_serializer("card_number")
    def mask_card(self, v: str) -> str:
        return f"****-****-****-{v[-4:]}"

SecretStr is a built-in masking type: str(), repr(), and JSON output show '**********', and .get_secret_value() is the only way to get the raw value. Note that model_dump() in Python mode returns the SecretStr object itself (its repr is masked), not a plain string. Chapter 31 Logging and observability covers this pattern together with production logging.
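A quick check of what is and is not masked, using made-up sample values. The point of the sketch is the last observation: a field serializer changes serialized output only, not the attribute stored on the instance.

```python
from pydantic import BaseModel, SecretStr, field_serializer


class User(BaseModel):
    email: str
    password: SecretStr
    card_number: str

    @field_serializer("card_number")
    def mask_card(self, v: str) -> str:
        return f"****-****-****-{v[-4:]}"


user = User(
    email="a@example.com",
    password="hunter2",  # sample value
    card_number="1234-5678-9012-3456",  # sample value
)

dumped = user.model_dump()        # password stays a SecretStr object
as_json = user.model_dump_json()  # password becomes "**********"
raw_password = user.password.get_secret_value()
```

The serialized forms are safe to log, but `repr(user)` still contains the full card number, because `card_number` is an ordinary str on the instance. For values that must never appear in a repr, SecretStr is the sturdier option.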
Field() — everything about field metadata
The Field() options briefly seen in Chapter 23, gathered in one place.

from datetime import datetime, timezone
from pydantic import BaseModel, Field
from typing import Annotated

class Product(BaseModel):
    # Validation constraints
    price: int = Field(ge=0, le=1_000_000)
    name: str = Field(min_length=1, max_length=200)
    sku: str = Field(pattern=r"^[A-Z]{3}-\d{4}$")
    tags: list[str] = Field(min_length=1, max_length=10)

    # Defaults
    stock: int = Field(default=0)
    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))

    # Alias (different name on input)
    internal_id: int = Field(alias="id")

    # OpenAPI documentation
    description: str = Field(
        default="",
        description="Product description (markdown allowed)",
        examples=["One box of delicious apples"],
    )

    # deprecated
    legacy_code: str | None = Field(default=None, deprecated=True)

    # Exclude: the field is validated and stored, but left out of model_dump() / model_dump_json()
    internal_note: str = Field(default="", exclude=True)

Annotated pattern — pulling metadata out
from typing import Annotated

Price = Annotated[int, Field(ge=0, le=1_000_000)]
SKU = Annotated[str, Field(pattern=r"^[A-Z]{3}-\d{4}$")]

class Product(BaseModel):
    price: Price
    sku: SKU

Useful for reusing the same constraint across many models. Same pattern as Annotated in Chapter 20 Advanced typing.
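To make the reuse concrete, the sketch below shares the two aliases between `Product` and a second, hypothetical `OrderLine` model, then collects the error types Pydantic reports. `error_types` is a small helper written for this example only.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError

Price = Annotated[int, Field(ge=0, le=1_000_000)]
SKU = Annotated[str, Field(pattern=r"^[A-Z]{3}-\d{4}$")]


class Product(BaseModel):
    price: Price
    sku: SKU


class OrderLine(BaseModel):  # hypothetical second model sharing the constraints
    sku: SKU
    unit_price: Price
    quantity: Annotated[int, Field(ge=1)]


def error_types(model: type[BaseModel], data: dict) -> list[str]:
    """Return the error type codes for invalid data, or [] when it validates."""
    try:
        model.model_validate(data)
    except ValidationError as exc:
        return [e["type"] for e in exc.errors()]
    return []


product_errors = error_types(Product, {"price": -1, "sku": "abc-1"})
line_errors = error_types(
    OrderLine, {"sku": "ABC-1234", "unit_price": 500, "quantity": 2}
)
```

Changing the price ceiling or the SKU format now happens in one place, and every model that uses the alias picks it up.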
ConfigDict — model-level configuration
v1’s class Config: becomes model_config = ConfigDict(...) in v2.

from pydantic import BaseModel, ConfigDict, Field

class User(BaseModel):
    model_config = ConfigDict(
        # 1. Read attributes from ORM objects (SQLAlchemy, etc.)
        from_attributes=True,
        # 2. Make every field strict (turn off auto type conversion)
        strict=True,
        # 3. Immutable
        frozen=True,
        # 4. How to handle undefined fields
        extra="forbid",    # extra field → error (safe)
        # extra="allow",   # allowed (loose)
        # extra="ignore",  # ignored (default)
        # 5. Accept both alias and original name
        populate_by_name=True,
        # 6. Auto-strip string inputs
        str_strip_whitespace=True,
    )

    id: int = Field(alias="user_id")
    name: str

strict=True — turn off auto conversion
By default, Pydantic converts loosely. Send "30" to an int field and it becomes 30. strict=True blocks that and only accepts the exact type.

class Loose(BaseModel):
    age: int

class Strict(BaseModel):
    model_config = ConfigDict(strict=True)
    age: int

Loose(age="30")   # OK → age=30
Strict(age="30")  # ✗ ValidationError

For production settings where you want strict control of input formats.
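Strictness does not have to be all-or-nothing. As a sketch of two narrower options, the code below turns it on for a single call with `model_validate(..., strict=True)` and for a single field with `Field(strict=True)`. `MixedStrictness` and the `accepts` helper are hypothetical names.

```python
from pydantic import BaseModel, Field, ValidationError


class Loose(BaseModel):
    age: int


class MixedStrictness(BaseModel):  # hypothetical: strict on one field only
    age: int = Field(strict=True)
    nickname: str


def accepts(model: type[BaseModel], data: dict, **kwargs) -> bool:
    """True when the data validates, False on ValidationError."""
    try:
        model.model_validate(data, **kwargs)
    except ValidationError:
        return False
    return True


lax_ok = accepts(Loose, {"age": "30"})
per_call_strict = accepts(Loose, {"age": "30"}, strict=True)
per_field_strict = accepts(MixedStrictness, {"age": "30", "nickname": "c"})
exact_types = accepts(MixedStrictness, {"age": 30, "nickname": "c"})
```

Per-call strictness is useful when the same model serves both a lenient internal path and a strict external one.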
extra="forbid" — block unknown fields
class User(BaseModel):
    model_config = ConfigDict(extra="forbid")
    name: str

User(name="curtis", admin=True)
# ✗ ValidationError: Extra inputs are not permitted

Useful for API input validation when you want to catch “typos / unintended fields” early. The default is "ignore": unknown fields are silently dropped.
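The three `extra` settings side by side, fed the same payload. The class names are made up for the comparison.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class IgnoreExtra(BaseModel):  # default behaviour: extra="ignore"
    name: str


class AllowExtra(BaseModel):
    model_config = ConfigDict(extra="allow")
    name: str


class ForbidExtra(BaseModel):
    model_config = ConfigDict(extra="forbid")
    name: str


payload = {"name": "curtis", "admin": True}

ignored = IgnoreExtra.model_validate(payload)
allowed = AllowExtra.model_validate(payload)

try:
    ForbidExtra.model_validate(payload)
except ValidationError as exc:
    forbid_errors = exc.errors()
else:
    forbid_errors = []
```

With "allow", the unknown key is kept in `model_extra` and shows up again in `model_dump()`, which is worth knowing before echoing such a model back to a client.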
RootModel — model the collection itself
from pydantic import RootModel

class TagList(RootModel[list[str]]):
    pass

t = TagList.model_validate(["python", "fastapi"])
t.root               # ["python", "fastapi"]
t.model_dump_json()  # '["python","fastapi"]'

For JSON inputs whose root is a dict / list. A FastAPI route that takes list[str] directly handles this automatically, but if you want to attach validation logic / methods, use RootModel.
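As a sketch of "attaching methods", the variant below makes the wrapper behave like a collection and adds one domain helper. The `has` method is a hypothetical example, not part of Pydantic.

```python
from pydantic import RootModel


class TagList(RootModel[list[str]]):
    def __iter__(self):
        # Iterate over the tags rather than over the model's fields.
        return iter(self.root)

    def __len__(self) -> int:
        return len(self.root)

    def has(self, tag: str) -> bool:
        """Case-insensitive membership test (hypothetical helper)."""
        return tag.lower() in (t.lower() for t in self.root)


tags = TagList.model_validate_json('["Python", "fastapi"]')
```

Callers can now write `for tag in tags` or `len(tags)` without reaching into `.root`, while serialization still produces a bare JSON array.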
Generic models — reusable responses

from pydantic import BaseModel
from typing import Generic, TypeVar

T = TypeVar("T")

class Paginated(BaseModel, Generic[T]):
    items: list[T]
    total: int
    page: int

class TodoOut(BaseModel):
    id: int
    title: str

# router is a FastAPI APIRouter defined elsewhere
@router.get("/todos", response_model=Paginated[TodoOut])
def list_todos(): ...

This is Generic from Chapter 9 Typing in earnest combined with Pydantic. Python 3.12+ syntax (class Paginated[T](BaseModel):) also works.
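Outside of a route, a parametrized model validates like any other. The sketch below uses sample data and shows that the type argument is enforced for each list element.

```python
from typing import Generic, TypeVar

from pydantic import BaseModel, ValidationError

T = TypeVar("T")


class Paginated(BaseModel, Generic[T]):
    items: list[T]
    total: int
    page: int


class TodoOut(BaseModel):
    id: int
    title: str


page = Paginated[TodoOut].model_validate(
    {"items": [{"id": 1, "title": "write docs"}], "total": 1, "page": 1}
)

try:
    Paginated[int].model_validate({"items": ["x"], "total": 1, "page": 1})
except ValidationError as exc:
    bad_item_loc = exc.errors()[0]["loc"]  # path to the offending element
else:
    bad_item_loc = None
```

Each parametrization such as `Paginated[TodoOut]` is its own model class, so it gets its own entry in the generated schema.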
Discriminated union — precise branching

When models of different shapes share one union, identify which model it is by a single key.

from pydantic import BaseModel, Field
from typing import Literal, Annotated

class ClickEvent(BaseModel):
    type: Literal["click"]
    x: int
    y: int

class KeyEvent(BaseModel):
    type: Literal["key"]
    code: str

Event = Annotated[ClickEvent | KeyEvent, Field(discriminator="type")]

class Payload(BaseModel):
    event: Event

Payload.model_validate({"event": {"type": "click", "x": 10, "y": 20}})
# event branches automatically to ClickEvent

With discriminator="type", Pydantic picks the exact model looking only at that key. Faster, and the JSON Schema is accurate too.

This is the discriminated-union pattern from Chapter 9 Typing in earnest, implemented inside Pydantic. Combine it with match-case from Chapter 13 Pattern matching in depth and you get input → validation → branching in one flow.
JSON Schema generation — OpenAPI integration

Pydantic models can auto-generate a JSON Schema.

class TodoCreate(BaseModel):
    title: str = Field(min_length=1, max_length=200)
    done: bool = False

print(TodoCreate.model_json_schema())
# Abridged; the real output also includes "title" and "type" entries:
# {
#   "properties": {
#     "title": {"type": "string", "minLength": 1, "maxLength": 200},
#     "done": {"type": "boolean", "default": False}
#   },
#   "required": ["title"]
# }

FastAPI runs every route’s input / output models through this schema generation and bakes the result into the OpenAPI spec. The “Schema” section of Swagger UI and client-codegen tools (openapi-generator, etc.) all consume this output.
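For a look at the unabridged result, the sketch below generates the schema and pretty-prints it with the standard json module. Only keys that are certain to be present are commented on.

```python
import json

from pydantic import BaseModel, Field


class TodoCreate(BaseModel):
    title: str = Field(min_length=1, max_length=200)
    done: bool = False


schema = TodoCreate.model_json_schema()  # a plain, JSON-compatible dict

# Field() constraints map onto JSON Schema keywords.
title_schema = schema["properties"]["title"]

# Fields without a default end up in "required".
required = schema["required"]

pretty = json.dumps(schema, indent=2)
print(pretty)
```

Because the result is an ordinary dict, it can be written to a file or diffed in CI to catch unintended API changes.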
Examples and Field(examples=...)
class User(BaseModel):
    email: str = Field(examples=["alice@example.com"])
    age: int = Field(examples=[30, 42])

    model_config = ConfigDict(
        json_schema_extra={
            "examples": [
                {"email": "alice@example.com", "age": 30},
                {"email": "bob@example.com", "age": 42},
            ]
        }
    )

Pre-filled examples appear in Swagger UI’s “Try it out”, which improves the experience.
Common pitfalls

1) Mutable defaults

class A(BaseModel):
    items: list[str] = []  # looks risky, but is safe: Pydantic handles it

Unlike dataclass, Pydantic automatically copies mutable defaults. In v2 you can write [] directly and it’s safe. Even so, default_factory makes the intent clearer.

class A(BaseModel):
    items: list[str] = Field(default_factory=list)

2) Overriding __init__
class User(BaseModel):
    name: str
    name_lower: str

    def __init__(self, **data):
        super().__init__(**data)
        self.name_lower = self.name.lower()

Overriding __init__ works against the validation lifecycle. Here name_lower is declared as a required field, so super().__init__(**data) already fails with a missing-field error before the assignment ever runs. Solve it with @model_validator or a computed field.

from pydantic import BaseModel, computed_field

class User(BaseModel):
    name: str

    @computed_field
    @property
    def name_lower(self) -> str:
        return self.name.lower()

computed_field creates a dynamic field that’s included in response serialization.
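A short check of how the computed field behaves, using a sample name. It is not an input field: it is derived on access and added to the serialized output.

```python
from pydantic import BaseModel, computed_field


class User(BaseModel):
    name: str

    @computed_field
    @property
    def name_lower(self) -> str:
        return self.name.lower()


user = User(name="Curtis")
dumped = user.model_dump()  # includes the computed value

# The property is evaluated on access, so it follows later changes to name.
user.name = "ALICE"
after_rename = user.name_lower

# Computed fields also appear in the serialization-mode JSON Schema.
out_schema = User.model_json_schema(mode="serialization")
```

Since the value is recomputed each time, it cannot drift out of sync with `name` the way a manually assigned attribute could.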
3) Forward reference and self-referencing

class Node(BaseModel):
    name: str
    children: list["Node"] = []

# Pydantic v2 normally resolves a self-reference like this once the class is defined
# If the referenced model is only defined later, call .model_rebuild() after it exists

Tree structures and similar patterns come up often. If a forward reference doesn’t resolve, a single Node.model_rebuild() rebuilds it.
4) union without a discriminator

class Payload(BaseModel):
    item: ItemA | ItemB | ItemC
    # Pydantic has to try the candidate models: slow + ambiguous

Without a discriminator, Pydantic validates the input against the candidate models and, in the default smart mode, keeps the one it considers the best match. With similar-shaped inputs the chosen model can be surprising, and the trying itself is costly. A discriminator removes the guesswork:

Item = Annotated[ItemA | ItemB | ItemC, Field(discriminator="kind")]

class Payload(BaseModel):
    item: Item

SQLAlchemy model conversion — from_attributes
Used in Chapter 25 Connecting a DB when converting an ORM object to a response model.

class TodoOut(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: int
    title: str
    done: bool

# SQLAlchemy model
todo_orm = await db.get(Todo, 1)

# Convert
todo_out = TodoOut.model_validate(todo_orm)

from_attributes=True makes Pydantic read data via attribute access on the object rather than dict keys (todo_orm.title, etc.). FastAPI’s response_model=TodoOut uses this same mechanism under the hood.
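The mechanism can be tried without a database. In this sketch a plain Python class, `TodoRow`, stands in for the SQLAlchemy-mapped Todo; it is a hypothetical substitute, and `TodoOutDictOnly` exists only to show the behaviour without the flag.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class TodoRow:
    """Hypothetical stand-in for an ORM object: data lives in attributes."""

    def __init__(self, id: int, title: str, done: bool) -> None:
        self.id = id
        self.title = title
        self.done = done
        self.owner_note = "internal"  # not declared on the output model


class TodoOut(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: int
    title: str
    done: bool


class TodoOutDictOnly(BaseModel):  # same fields, no from_attributes
    id: int
    title: str
    done: bool


row = TodoRow(id=1, title="write docs", done=False)
todo_out = TodoOut.model_validate(row)

try:
    TodoOutDictOnly.model_validate(row)
except ValidationError:
    rejected_without_flag = True
else:
    rejected_without_flag = False

# The flag can also be passed for a single call.
per_call = TodoOutDictOnly.model_validate(row, from_attributes=True)
```

Only the declared fields are read, so attributes such as `owner_note` never reach the response, which is a large part of why the output model is worth having.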
Example that carries into the next chapter
Every pattern in this chapter shows up together in the schema design of Chapter 29 Capstone — finishing the TODO API.
from pydantic import BaseModel, ConfigDict, Field, field_validator, computed_field
from typing import Annotated, Literal
from datetime import datetime

Priority = Annotated[int, Field(ge=1, le=5)]

class TodoBase(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True)
    title: str = Field(min_length=1, max_length=200)
    description: str = ""
    priority: Priority = 3
    tags: list[str] = Field(default_factory=list)

    @field_validator("tags")
    @classmethod
    def lowercase_tags(cls, v: list[str]) -> list[str]:
        return list({t.lower() for t in v})  # dedupe + lowercase (order is not preserved)

class TodoCreate(TodoBase):
    pass

class TodoUpdate(BaseModel):
    title: str | None = None
    done: bool | None = None
    priority: Priority | None = None

class TodoOut(TodoBase):
    model_config = ConfigDict(from_attributes=True)
    id: int
    done: bool
    created_at: datetime
    updated_at: datetime

    @computed_field
    @property
    def is_overdue(self) -> bool:
        # Placeholder: in reality you'd look at the due_date field
        return False

This becomes the starting schema in Chapter 29.
Exercises

- On the TodoCreate model, bundle three checks into a single @field_validator("title"): (1) strip, (2) raise ValueError if empty, (3) forbid HTML tags (<, >). Add a @model_validator(mode="after") that enforces “if priority > 4, the title must contain ‘urgent’”.
- Make a User(email, password) model that accepts password as SecretStr, and confirm that model_dump() output automatically masks the password. Add a card_number: str field with @field_serializer that only shows the last four digits.
- With the discriminated-union pattern, make three models ClickEvent / KeyEvent / ScrollEvent and define Event as Annotated[..., Field(discriminator="type")]. Confirm that JSON input branches to the right model by looking only at the type key, and that the oneOf in model_json_schema() output is correct.
In one line: v2 uses a Rust core for 5–50× speed, and its API differs from v1: @field_validator / @model_validator, model_dump, ConfigDict. The validation lifecycle is the 6 steps mode='before' → type conversion → mode='after'. Field-level is @field_validator, model-level is @model_validator. Serialization is model_dump(exclude=..., exclude_unset=...) + @field_serializer / @model_serializer. SecretStr auto-masks PII. ConfigDict(strict=True, extra="forbid", from_attributes=True) are the production-grade options. Discriminated union for fast, precise branching. ORM ↔ Pydantic is from_attributes=True.
Next chapter
Next, Chapter 25 Connecting a DB — SQLAlchemy 2.x + Alembic combines this chapter’s Pydantic patterns (from_attributes=True, etc.) with ORM objects. The schema design in this chapter becomes the starting point of Chapter 29 Capstone — finishing the TODO API once more.