Chapter 24

Pydantic v2 in depth — validation, serialization, custom validators

A dedicated deep dive on Pydantic, the core of FastAPI. v2's performance and API changes, the right places to use model_validator/field_validator, serialization control, and JSON Schema generation.

In Chapter 23 Routing, Pydantic models, dependency injection we covered the basics of Pydantic. This chapter goes a level deeper — the exact lifecycle of validation / serialization, patterns for custom validators / serializers, JSON Schema integration, and the pitfalls people most often miss.

The patterns in this chapter come back in Chapter 25 Connecting a DB for ORM-object ↔ Pydantic conversion, in Chapter 29 Capstone — finishing the TODO API for domain-schema design, and in Chapter 31 Logging and observability for PII-masking serialization. Working through this chapter first makes the back half of the book easier to read.

v1 → v2 — why migrate #

Pydantic v2 (released in 2023) keeps the same name as v1, but in practice it is used quite differently. The core differences:

| Area | v1 | v2 |
|---|---|---|
| Core | Pure Python | Rust (pydantic-core) |
| Performance | (baseline) | 5–50× faster |
| Validation methods | @validator | @field_validator + @model_validator |
| Serialization | .dict(), .json() | .model_dump(), .model_dump_json() |
| Configuration | class Config: | model_config = ConfigDict(...) |
| Generic | Limited | PEP 695 friendly |
| union branching | First match | discriminated union or smart mode |

There’s an automated v1 → v2 conversion tool (bump-pydantic). That said, the custom-validator parts are safer to port by hand.

The old v1 API keeps working in v2 for a while with deprecation warnings. New code should use the v2 API exclusively.

BaseModel vs dataclass vs TypedDict — when to pick what #

This book has three tools for expressing the shape of data.

| Tool | Location | Good for |
|---|---|---|
| @dataclass | Chapter 8 | Internal domain models, lightweight, almost no validation |
| TypedDict | Chapter 9 | Declaring the shape of external dicts (JSON responses, etc.), plain dict at runtime |
| pydantic.BaseModel | This chapter | Validating external input, serialization / deserialization, FastAPI integration |

The picks:

  • Need input validation? → BaseModel
  • JSON conversion happens often? → BaseModel
  • Pure internal data, no conversion / validation? → dataclass
  • Only declaring the shape of an external dict, zero runtime cost? → TypedDict

The input / output of FastAPI routes is almost always BaseModel. ORM models are SQLAlchemy Mapped[T] (Chapter 25), and when sending a response you wrap them once more in BaseModel (from_attributes=True).

Validation lifecycle — one picture #

What happens in the single line User.model_validate({"name": "curtis", "age": 30}):

Input dict
1. Run @model_validator(mode="before")
   ├─ Can refine the input (raw-dict stage)
2. Run @field_validator(mode="before") per field
   ├─ Sees the raw value, before any conversion
3. Type conversion + Field() constraint validation per field
   ├─ ge / le / min_length / pattern, etc.
4. Run @field_validator(mode="after") per field
5. Build the BaseModel instance
6. Run @model_validator(mode="after")
   ├─ Cross-field validation
Completed instance

With this flow in your head, “where does my validator belong” becomes a natural choice.
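
A small experiment makes the order observable. The sketch below is illustrative only: the `calls` list and the validator names are invented for the demonstration, and it assumes Pydantic v2 is installed. Each validator records when it runs and what value it sees.

```python
from pydantic import BaseModel, Field, field_validator, model_validator

calls: list[str] = []  # hypothetical log, only for this demonstration


class User(BaseModel):
    name: str
    age: int = Field(ge=0)

    @model_validator(mode="before")
    @classmethod
    def model_before(cls, data):
        calls.append("model before")
        return data

    @field_validator("age", mode="before")
    @classmethod
    def age_before(cls, v):
        calls.append(f"age before: {v!r}")  # still the raw input
        return v

    @field_validator("age")
    @classmethod
    def age_after(cls, v: int) -> int:
        calls.append(f"age after: {v!r}")  # already converted and range-checked
        return v

    @model_validator(mode="after")
    def model_after(self):
        calls.append("model after")
        return self


user = User.model_validate({"name": "curtis", "age": "30"})
print(calls)
# ['model before', "age before: '30'", 'age after: 30', 'model after']
```

The before-validator sees the string '30', while the after-validator sees the integer 30. That difference is usually what decides which mode a validator needs.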

@field_validator — single field #

@field_validator
from pydantic import BaseModel, field_validator

class TodoCreate(BaseModel):
    title: str
    tags: list[str] = []

    @field_validator("title")
    @classmethod
    def title_trim_and_check(cls, v: str) -> str:
        v = v.strip()
        if not v:
            raise ValueError("title cannot be whitespace only")
        if "<" in v or ">" in v:
            raise ValueError("HTML tags are not allowed")
        return v

    @field_validator("tags")
    @classmethod
    def tags_lowercase(cls, v: list[str]) -> list[str]:
        return [t.lower() for t in v]

Rules:

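  • Field validators are class methods: write @classmethod explicitly under @field_validator (Pydantic recommends it so type checkers understand the signature)
  • The return value is the transformed value; for simple validation, return v
  • Raising ValueError or AssertionError is caught as a validation failure; in v2 a TypeError is no longer converted and propagates as-is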

mode="before" — before type conversion #

mode='before'
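from datetime import datetime

class Event(BaseModel):
    timestamp: int

    @field_validator("timestamp", mode="before")
    @classmethod
    def parse_iso(cls, v):
        # ISO-8601 string → Unix seconds; any other input falls through to int validation
        # (a naive datetime is interpreted in the local timezone)
        if isinstance(v, str):
            return int(datetime.fromisoformat(v).timestamp())
        return v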

mode="before" is called before type conversion. You can step in and convert a string input like "2026-05-17T12:00:00" to int before it ever gets typed.

The default mode="after" runs after type conversion — v: int is guaranteed.

@model_validator — whole model #

For validating relationships across fields.

@model_validator
from datetime import datetime
from pydantic import BaseModel, model_validator
from typing import Self  # Python 3.11+

class DateRange(BaseModel):
    start: datetime
    end: datetime

    @model_validator(mode="after")
    def check_order(self) -> Self:
        if self.start > self.end:
            raise ValueError("start is later than end")
        return self

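@model_validator has no default mode, so mode must always be passed explicitly. mode="after" runs after every field is filled, receives self, and returns it. The Self return annotation lets a type checker confirm that the validator returns the model instance (Self from Chapter 20 Advanced typing).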

mode="before" — refine raw input #

Refining input
class FlexibleInput(BaseModel):
    name: str
    age: int

    @model_validator(mode="before")
    @classmethod
    def normalize(cls, data):
        if isinstance(data, str):
            # Also work when handed a string
            name, age = data.split(",")
            return {"name": name.strip(), "age": int(age)}
        return data

A pattern for taking varied inputs from outside and normalizing them into a standard dict. Useful for user input or legacy-system integration.
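
A short usage sketch of the same idea, assuming Pydantic v2. The `accepts` helper is hypothetical and exists only to show which inputs pass.

```python
from pydantic import BaseModel, ValidationError, model_validator


class FlexibleInput(BaseModel):
    name: str
    age: int

    @model_validator(mode="before")
    @classmethod
    def normalize(cls, data):
        if isinstance(data, str):
            # "name,age" → dict; a malformed string raises ValueError here,
            # which Pydantic reports as a validation error
            name, age = data.split(",")
            return {"name": name.strip(), "age": int(age)}
        return data


def accepts(raw) -> bool:
    """True if raw validates as FlexibleInput."""
    try:
        FlexibleInput.model_validate(raw)
    except ValidationError:
        return False
    return True


from_text = FlexibleInput.model_validate("curtis, 30")
from_dict = FlexibleInput.model_validate({"name": "curtis", "age": 30})
print(from_text == from_dict)   # True
print(accepts("curtis"))        # False: no comma, so unpacking fails
```

Both input shapes end in the same validated instance, so the rest of the code never needs to know which one arrived.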

field vs model validator — picking #

| Job | Which one |
|---|---|
| Validate format / value of one field | @field_validator |
| Transform one field (strip, lowercase) | @field_validator |
| Validate relationship between two fields (start ≤ end) | @model_validator(mode="after") |
| Transform the shape of the input itself | @model_validator(mode="before") |

Serialization — model_dump / model_dump_json #

The opposite direction of validation: BaseModel instance → dict / JSON.

Basic serialization
user = User(name="curtis", age=30, password="secret")

user.model_dump()
# {"name": "curtis", "age": 30, "password": "secret"}

user.model_dump_json()
# '{"name":"curtis","age":30,"password":"secret"}'   (compact, no spaces)

exclude / include #

Field selection
user.model_dump(exclude={"password"})
# {"name": "curtis", "age": 30}

user.model_dump(include={"name"})
# {"name": "curtis"}

# Nested
order.model_dump(exclude={"items": {"__all__": {"price"}}})
# Drop only price from each element of items

exclude_unset / exclude_defaults / exclude_none #

Conditional exclusion
class TodoUpdate(BaseModel):
    title: str | None = None
    done: bool | None = None

upd = TodoUpdate(title="new title")

upd.model_dump()                       # {"title": "new title", "done": None}
upd.model_dump(exclude_unset=True)      # {"title": "new title"}  ← drop fields not explicitly set
upd.model_dump(exclude_none=True)       # {"title": "new title"}  ← drop None values
upd.model_dump(exclude_defaults=True)   # {"title": "new title"}  ← drop defaults

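This is why Chapter 23’s PATCH pattern uses exclude_unset=True: it updates only the fields the client actually sent. exclude_none would also drop a field the client explicitly set to null, so it cannot express “clear this value”.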

@field_serializer — custom serialization #

Control the serialization shape of a specific field.

datetime → custom format
from pydantic import BaseModel, field_serializer
from datetime import datetime

class Event(BaseModel):
    name: str
    occurred_at: datetime

    @field_serializer("occurred_at")
    def serialize_dt(self, dt: datetime) -> str:
        return dt.strftime("%Y-%m-%d %H:%M:%S KST")

Emit a custom string instead of the default ISO format.

@model_serializer — serialize the whole model #

@model_serializer
from pydantic import BaseModel, model_serializer

class Coordinates(BaseModel):
    lat: float
    lng: float

    @model_serializer
    def serialize(self) -> str:
        return f"{self.lat},{self.lng}"

c = Coordinates(lat=37.5, lng=127.0)
c.model_dump()    # "37.5,127.0"

Convert the whole model to something other than a dict. Useful when an external API demands a peculiar format.

PII masking — an operational pattern #

Sensitive information like passwords and card numbers must be protected in both logs and responses.

Masking via @field_serializer
from pydantic import BaseModel, field_serializer, SecretStr

class User(BaseModel):
    email: str
    password: SecretStr        # shown as '**********' in repr and JSON output
    card_number: str

    @field_serializer("card_number")
    def mask_card(self, v: str) -> str:
        return f"****-****-****-{v[-4:]}"

SecretStr is a built-in masking type: repr() and model_dump_json() show '**********', model_dump() keeps the SecretStr object (which itself prints as '**********'), and .get_secret_value() is the only way to get the raw value. Chapter 31 Logging and observability covers this pattern together with production logging.
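
The following sketch, assuming Pydantic v2 and made-up sample values, shows where masking applies and where it does not.

```python
import json

from pydantic import BaseModel, SecretStr, field_serializer


class User(BaseModel):
    email: str
    password: SecretStr
    card_number: str

    @field_serializer("card_number")
    def mask_card(self, v: str) -> str:
        return f"****-****-****-{v[-4:]}"


user = User(
    email="alice@example.com",
    password="hunter2",
    card_number="1234-5678-9012-3456",
)

python_dump = user.model_dump()               # password stays a SecretStr object
json_dump = json.loads(user.model_dump_json())

print(json_dump["password"])                  # **********
print(python_dump["card_number"])             # ****-****-****-3456
print(user.password.get_secret_value())       # hunter2
```

One caveat: a field serializer only affects dump output. `repr(user)` still contains the full card number, so a field like this may also warrant `Field(repr=False)` if model instances can end up in logs.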

Field() — everything about field metadata #

The Field() options briefly seen in Chapter 23, gathered in one place.

All Field options (the ones you use often)
from datetime import datetime, timezone
from pydantic import BaseModel, Field
from typing import Annotated

class Product(BaseModel):
    # Validation constraints
    price: int = Field(ge=0, le=1_000_000)
    name: str = Field(min_length=1, max_length=200)
    sku: str = Field(pattern=r"^[A-Z]{3}-\d{4}$")
    tags: list[str] = Field(min_length=1, max_length=10)

    # Defaults
    stock: int = Field(default=0)
    created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))

    # Alias (different name on input)
    internal_id: int = Field(alias="id")

    # OpenAPI documentation
    description: str = Field(
        default="",
        description="Product description (markdown allowed)",
        examples=["One box of delicious apples"],
    )

    # Deprecation marker (Field(deprecated=...) needs Pydantic 2.7 or newer)
    legacy_code: str | None = Field(default=None, deprecated=True)

    # Always left out of model_dump() / model_dump_json() output
    internal_note: str = Field(default="", exclude=True)
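
A trimmed-down version of the model shows how alias, exclude, and the constraints behave at runtime. This is a sketch assuming Pydantic v2; the sample values are made up.

```python
from pydantic import BaseModel, Field, ValidationError


class Product(BaseModel):
    price: int = Field(ge=0, le=1_000_000)
    sku: str = Field(pattern=r"^[A-Z]{3}-\d{4}$")
    internal_id: int = Field(alias="id")
    internal_note: str = Field(default="", exclude=True)


product = Product.model_validate(
    {"price": 1200, "sku": "APL-0001", "id": 7, "internal_note": "supplier changed"}
)

print(product.model_dump())
# {'price': 1200, 'sku': 'APL-0001', 'internal_id': 7}
print(product.model_dump(by_alias=True))
# {'price': 1200, 'sku': 'APL-0001', 'id': 7}
print(product.internal_note)  # still available on the instance

try:
    Product.model_validate({"price": -1, "sku": "apl-1", "id": 7})
    bad_fields = set()
except ValidationError as exc:
    # Every failing field is reported in one pass, not just the first
    bad_fields = {e["loc"] for e in exc.errors()}
print(bad_fields)
```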

Annotated pattern — pulling metadata out #

Move it to Annotated
from typing import Annotated
from pydantic import BaseModel, Field

Price = Annotated[int, Field(ge=0, le=1_000_000)]
SKU = Annotated[str, Field(pattern=r"^[A-Z]{3}-\d{4}$")]

class Product(BaseModel):
    price: Price
    sku: SKU

Useful for reusing the same constraint across many models. Same pattern as Annotated in Chapter 20 Advanced typing.

ConfigDict — model-level configuration #

v1’s class Config: becomes model_config = ConfigDict(...) in v2.

Options you reach for often
from pydantic import BaseModel, ConfigDict, Field

class User(BaseModel):
    model_config = ConfigDict(
        # 1. Read attributes from ORM objects (SQLAlchemy, etc.)
        from_attributes=True,

        # 2. Make every field strict (turn off auto type conversion)
        strict=True,

        # 3. Immutable
        frozen=True,

        # 4. How to handle undefined fields
        extra="forbid",          # extra field → error (safe)
        # extra="allow",          # allowed (loose)
        # extra="ignore",         # ignored (default)

        # 5. Accept both alias and original name
        populate_by_name=True,

        # 6. Auto-strip string inputs
        str_strip_whitespace=True,
    )

    id: int = Field(alias="user_id")
    name: str

strict=True — turn off auto conversion #

By default, Pydantic converts loosely. Send "30" to an int field and it becomes 30. strict=True blocks that and only accepts the exact type.

strict difference
class Loose(BaseModel):
    age: int

class Strict(BaseModel):
    model_config = ConfigDict(strict=True)
    age: int

Loose(age="30")     # OK → age=30
Strict(age="30")    # ✗ ValidationError

Suited to production APIs where you want to control input formats exactly.

extra="forbid" — block unknown fields #

forbid
class User(BaseModel):
    model_config = ConfigDict(extra="forbid")
    name: str

User(name="curtis", admin=True)
# ✗ ValidationError: Extra inputs are not permitted

Useful for API input validation when you want to catch “typos / unintended fields” early. The default is "ignore" — unknown fields are silently dropped.
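
The three settings side by side, as a minimal sketch assuming Pydantic v2. The class names are invented for the comparison.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class Ignore(BaseModel):
    model_config = ConfigDict(extra="ignore")  # the default
    name: str


class Allow(BaseModel):
    model_config = ConfigDict(extra="allow")
    name: str


class Forbid(BaseModel):
    model_config = ConfigDict(extra="forbid")
    name: str


data = {"name": "curtis", "admin": True}

ignored = Ignore.model_validate(data)
allowed = Allow.model_validate(data)
print(ignored.model_dump())  # {'name': 'curtis'}
print(allowed.model_dump())  # {'name': 'curtis', 'admin': True}
print(allowed.model_extra)   # {'admin': True}

try:
    Forbid.model_validate(data)
    forbid_error = None
except ValidationError as exc:
    forbid_error = exc.errors()[0]
print(forbid_error["type"], forbid_error["loc"])  # extra_forbidden ('admin',)
```

With "allow", the unknown keys are kept unvalidated in model_extra and are included when dumping, which is worth keeping in mind before returning such a model in a response.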

RootModel — model the collection itself #

RootModel
from pydantic import RootModel

class TagList(RootModel[list[str]]):
    pass

t = TagList.model_validate(["python", "fastapi"])
t.root              # ["python", "fastapi"]
t.model_dump_json()  # '["python","fastapi"]'

For JSON inputs whose root is a dict / list. A FastAPI route that takes list[str] directly handles this automatically, but if you want to attach validation logic / methods, use RootModel.
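
As a sketch of "attach validation logic / methods" (assuming Pydantic v2; the uniqueness rule and the `is_valid` helper are made up for illustration), a RootModel can carry a model validator and behave like a small collection:

```python
from pydantic import RootModel, ValidationError, model_validator


class TagList(RootModel[list[str]]):
    @model_validator(mode="after")
    def no_duplicates(self):
        # Hypothetical rule: every tag may appear only once
        if len(set(self.root)) != len(self.root):
            raise ValueError("tags must be unique")
        return self

    # Convenience methods so callers can treat the model like a list
    def __iter__(self):
        return iter(self.root)

    def __len__(self) -> int:
        return len(self.root)


def is_valid(raw) -> bool:
    """True if raw validates as a TagList."""
    try:
        TagList.model_validate(raw)
    except ValidationError:
        return False
    return True


tags = TagList.model_validate(["python", "fastapi"])
print(len(tags), list(tags))            # 2 ['python', 'fastapi']
print(is_valid(["python", "python"]))   # False
```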

Generic models — reusable responses #

Generic response
from fastapi import APIRouter
from pydantic import BaseModel
from typing import Generic, TypeVar

router = APIRouter()
T = TypeVar("T")

class Paginated(BaseModel, Generic[T]):
    items: list[T]
    total: int
    page: int

class TodoOut(BaseModel):
    id: int
    title: str

@router.get("/todos", response_model=Paginated[TodoOut])
def list_todos(): ...

Generic from Chapter 9 Typing in earnest + Pydantic. Python 3.12+ syntax (class Paginated[T](BaseModel):) also works.
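
Outside FastAPI the parametrized model can be used directly. A minimal sketch, assuming Pydantic v2 and made-up sample data:

```python
from typing import Generic, TypeVar

from pydantic import BaseModel, ValidationError

T = TypeVar("T")


class Paginated(BaseModel, Generic[T]):
    items: list[T]
    total: int
    page: int


class TodoOut(BaseModel):
    id: int
    title: str


page = Paginated[TodoOut].model_validate(
    {"items": [{"id": 1, "title": "write"}], "total": 1, "page": 1}
)
print(type(page.items[0]).__name__)  # TodoOut

try:
    Paginated[TodoOut].model_validate(
        {"items": [{"id": "x"}], "total": 1, "page": 1}
    )
    error_locs = set()
except ValidationError as exc:
    # Errors point into the nested item: field, list index, inner field
    error_locs = {e["loc"] for e in exc.errors()}
print(error_locs)
```

Each element of items is validated as a TodoOut, so one generic wrapper serves every list endpoint without losing per-item validation.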

Discriminated union — precise branching #

When models of different shapes share one union, identify which model it is by a single key.

discriminated union
from pydantic import BaseModel, Field
from typing import Literal, Annotated

class ClickEvent(BaseModel):
    type: Literal["click"]
    x: int
    y: int

class KeyEvent(BaseModel):
    type: Literal["key"]
    code: str

Event = Annotated[ClickEvent | KeyEvent, Field(discriminator="type")]

class Payload(BaseModel):
    event: Event

Payload.model_validate({"event": {"type": "click", "x": 10, "y": 20}})
# event branches automatically to ClickEvent

With discriminator="type", Pydantic picks the exact model looking only at that key. Faster, and the JSON Schema is accurate too.

This is the discriminated-union pattern from Chapter 9 Typing in earnest, expressed in Pydantic. Combine it with match-case from Chapter 13 Pattern matching in depth and you get input → validation → branching in one flow.

JSON Schema generation — OpenAPI integration #

Pydantic models can auto-generate a JSON Schema.

JSON Schema
class TodoCreate(BaseModel):
    title: str = Field(min_length=1, max_length=200)
    done: bool = False

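print(TodoCreate.model_json_schema())
# Shown as JSON and abridged: Pydantic also adds a display "title" label to each property
# {
#   "title": "TodoCreate",
#   "type": "object",
#   "properties": {
#     "title": {"type": "string", "minLength": 1, "maxLength": 200},
#     "done": {"type": "boolean", "default": false}
#   },
#   "required": ["title"]
# }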

FastAPI runs every route’s input / output models through this method and bakes the result into the OpenAPI spec. The “Schema” section of Swagger UI, client-codegen tools (openapi-generator, etc.) — all of them consume this output.

Examples and Field(examples=...) #

Examples in docs
class User(BaseModel):
    email: str = Field(examples=["alice@example.com"])
    age: int = Field(examples=[30, 42])

    model_config = ConfigDict(
        json_schema_extra={
            "examples": [
                {"email": "alice@example.com", "age": 30},
                {"email": "bob@example.com", "age": 42},
            ]
        }
    )

Pre-filled examples appear in Swagger UI’s “Try it out”, which improves the experience.

Common pitfalls #

1) Mutable defaults #

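⚠ Looks like a shared list, but is not
class A(BaseModel):
    items: list[str] = []   # safe in Pydantic: each instance gets its own copy

Unlike a plain class attribute, which every instance would share, and unlike @dataclass, which rejects a list default with a ValueError, Pydantic copies a mutable default for each new instance. In v2 you can write [] directly and it’s safe. Even so, default_factory makes the intent clearer.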

✅ Explicit
class A(BaseModel):
    items: list[str] = Field(default_factory=list)
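
A quick check that neither spelling shares state between instances (a sketch assuming Pydantic v2; the class names are invented for the comparison):

```python
from pydantic import BaseModel, Field


class WithLiteral(BaseModel):
    items: list[str] = []


class WithFactory(BaseModel):
    items: list[str] = Field(default_factory=list)


a, b = WithLiteral(), WithLiteral()
a.items.append("x")
print(a.items, b.items)  # ['x'] []

c, d = WithFactory(), WithFactory()
c.items.append("y")
print(c.items, d.items)  # ['y'] []
```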

2) Overriding __init__ #

🚫
class User(BaseModel):
    name: str
    name_lower: str

    def __init__(self, **data):
        super().__init__(**data)
        self.name_lower = self.name.lower()

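The assignment in __init__ sits outside the validation lifecycle. Here name_lower is declared as a required field, so super().__init__(**data) raises a missing-field ValidationError before the assignment is ever reached, and a value assigned this way would not be validated anyway unless validate_assignment=True is set. Solve it with @model_validator or a computed field.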

✅ computed field
from pydantic import BaseModel, computed_field

class User(BaseModel):
    name: str

    @computed_field
    @property
    def name_lower(self) -> str:
        return self.name.lower()

computed_field creates a dynamic field that’s included in response serialization.
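
In use, the computed value appears in dumps and in the serialization schema without being accepted as input. A minimal sketch, assuming Pydantic v2:

```python
from pydantic import BaseModel, computed_field


class User(BaseModel):
    name: str

    @computed_field
    @property
    def name_lower(self) -> str:
        return self.name.lower()


user = User.model_validate({"name": "Curtis"})
print(user.model_dump())  # {'name': 'Curtis', 'name_lower': 'curtis'}

# The computed field is part of the output schema only
output_schema = User.model_json_schema(mode="serialization")
input_schema = User.model_json_schema(mode="validation")
print("name_lower" in output_schema["properties"])  # True
print("name_lower" in input_schema["properties"])   # False
```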

3) Forward reference and self-referencing #

Self reference
class Node(BaseModel):
    name: str
    children: list["Node"] = []

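# Pydantic v2 resolves a reference to the model's own name automatically
# Call .model_rebuild() only when a referenced type is defined later and fails to resolve

Tree structures and similar patterns come up often. If a forward reference to a class defined later doesn’t resolve, a single Node.model_rebuild() once that class exists rebuilds the schema.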
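
A nested dict validates into a tree of Node instances in one call. The sketch below assumes Pydantic v2; `count_nodes` is a hypothetical helper that just walks the result.

```python
from pydantic import BaseModel


class Node(BaseModel):
    name: str
    children: list["Node"] = []


def count_nodes(node: Node) -> int:
    """Count this node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.children)


tree = Node.model_validate(
    {
        "name": "root",
        "children": [
            {"name": "a", "children": [{"name": "a1"}]},
            {"name": "b"},
        ],
    }
)

print(count_nodes(tree))                      # 4
print(type(tree.children[0].children[0]))     # the Node class, not a dict
```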

4) union without a discriminator #

🚫 Slow union
class Payload(BaseModel):
    item: ItemA | ItemB | ItemC
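# Smart mode validates the input against every member: slow + ambiguous

Without a discriminator, v2’s default smart mode validates the input against each of the three models and keeps the best match (left_to_right mode stops at the first success instead). With similar-shaped inputs, the wrong model can still win, and validating against every member is costly.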

✅ Specify discriminator
class Payload(BaseModel):
    item: ItemA | ItemB | ItemC = Field(discriminator="kind")  # each ItemX declares kind: Literal["..."]

SQLAlchemy model conversion — from_attributes #

When converting an ORM object to a response model in Chapter 25 Connecting a DB.

ORM → Pydantic
class TodoOut(BaseModel):
    model_config = ConfigDict(from_attributes=True)

    id: int
    title: str
    done: bool

# SQLAlchemy model instance (inside an async function; db is the session from Chapter 25)
todo_orm = await db.get(Todo, 1)

# Convert
todo_out = TodoOut.model_validate(todo_orm)

from_attributes=True makes Pydantic read data via attribute access on the object rather than dict keys (todo_orm.title, etc.). FastAPI’s response_model=TodoOut uses this same mechanism under the hood.
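
The mechanism can be tried without a database. In this sketch (assuming Pydantic v2), the `TodoRow` dataclass is a hypothetical stand-in for a SQLAlchemy model, and `TodoOutDictOnly` exists only to show what happens without the option.

```python
from dataclasses import dataclass

from pydantic import BaseModel, ConfigDict, ValidationError


@dataclass
class TodoRow:  # hypothetical stand-in for an ORM object
    id: int
    title: str
    done: bool


class TodoOut(BaseModel):
    model_config = ConfigDict(from_attributes=True)

    id: int
    title: str
    done: bool


class TodoOutDictOnly(BaseModel):  # same fields, option left off
    id: int
    title: str
    done: bool


row = TodoRow(id=1, title="write chapter", done=False)

todo_out = TodoOut.model_validate(row)  # reads row.id, row.title, row.done
print(todo_out.model_dump())  # {'id': 1, 'title': 'write chapter', 'done': False}

try:
    TodoOutDictOnly.model_validate(row)
    rejected = False
except ValidationError:
    rejected = True  # a plain object is not accepted without from_attributes
print(rejected)

# The option can also be passed for a single call
one_off = TodoOutDictOnly.model_validate(row, from_attributes=True)
```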

Example that carries into the capstone #

Every pattern in this chapter shows up together in the schema design of Chapter 29 Capstone — finishing the TODO API.

Chapter 29 preview — all patterns combined
from pydantic import BaseModel, ConfigDict, Field, field_validator, computed_field
from typing import Annotated, Literal
from datetime import datetime

Priority = Annotated[int, Field(ge=1, le=5)]

class TodoBase(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True)

    title: str = Field(min_length=1, max_length=200)
    description: str = ""
    priority: Priority = 3
    tags: list[str] = Field(default_factory=list)

    @field_validator("tags")
    @classmethod
    def lowercase_tags(cls, v: list[str]) -> list[str]:
        return list(dict.fromkeys(t.lower() for t in v))    # lowercase + dedupe, keeping first-seen order

class TodoCreate(TodoBase):
    pass

class TodoUpdate(BaseModel):
    title: str | None = None
    done: bool | None = None
    priority: Priority | None = None

class TodoOut(TodoBase):
    model_config = ConfigDict(from_attributes=True)

    id: int
    done: bool
    created_at: datetime
    updated_at: datetime

    @computed_field
    @property
    def is_overdue(self) -> bool:
        # Placeholder — in reality you'd look at the due_date field
        return False

This becomes the starting schema in Chapter 29.

Exercises #

  1. On the TodoCreate model, bundle three checks into a single @field_validator("title"): (1) strip, (2) raise ValueError if empty, (3) forbid HTML tags (<, >). Add a @model_validator(mode="after") that enforces “if priority > 4, the title must contain ‘urgent’”.
  2. Make a User(email, password) model that accepts password as SecretStr, and confirm that model_dump() output automatically masks the password. Add a card_number: str field with @field_serializer that only shows the last four digits.
  3. With the discriminated-union pattern, make three models ClickEvent / KeyEvent / ScrollEvent and define Event as Annotated[..., Field(discriminator="type")]. Confirm that JSON input branches to the right model by looking only at the type key, and that the oneOf in model_json_schema() output is correct.

In one line: v2 uses a Rust core for 5–50× speed, and its API differs from v1: @field_validator / @model_validator, model_dump, ConfigDict. The validation lifecycle is 6 steps: mode="before" → type conversion → mode="after". Field-level is @field_validator, model-level is @model_validator. Serialization is model_dump(exclude=..., exclude_unset=...) + @field_serializer / @model_serializer. SecretStr auto-masks secrets. ConfigDict(strict=True, extra="forbid", from_attributes=True) are the production-grade options. Discriminated union for fast, precise branching. ORM ↔ Pydantic is from_attributes=True.

Next chapter #

Next, Chapter 25 Connecting a DB — SQLAlchemy 2.x + Alembic combines this chapter’s Pydantic patterns (from_attributes=True, etc.) with ORM objects. The schema design in this chapter becomes the starting point of Chapter 29 Capstone — finishing the TODO API once more.
