Structured Output

Learn to create advanced Pydantic models for complex extractions.

Field Types

Required Fields

from pydantic import BaseModel, Field

class Model(BaseModel):
    # Required without default
    title: str = Field(description="Document title")

    # Required with validation
    score: int = Field(ge=1, le=10, description="Score from 1 to 10")
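
As a minimal sketch of how these constraints behave in Pydantic v2, the snippet below validates one good record and two bad ones. The `Review` model and the `error_fields` helper are hypothetical names introduced only for this illustration.

```python
from pydantic import BaseModel, Field, ValidationError

class Review(BaseModel):
    # Both fields are required because neither declares a default
    title: str = Field(description="Document title")
    score: int = Field(ge=1, le=10, description="Score from 1 to 10")

def error_fields(data: dict) -> list[str]:
    """Return the names of the fields that failed validation (hypothetical helper)."""
    try:
        Review(**data)
    except ValidationError as exc:
        return sorted(str(err["loc"][0]) for err in exc.errors())
    return []

ok = Review(title="Quarterly report", score=7)

# A missing required field and an out-of-range score are both rejected
missing_title = error_fields({"score": 7})
out_of_range = error_fields({"title": "Quarterly report", "score": 11})
```

Collecting the failing field names this way can help spot which constraints the extracted data trips over most often.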

Optional Fields

from typing import Optional

class Model(BaseModel):
    # Optional - can be None
    notes: Optional[str] = Field(
        default=None,
        description="Additional notes, if any"
    )
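
A short sketch of what the default buys you, assuming Pydantic v2: the field may be left out entirely or set to null explicitly, and both cases end up as `None`. The `Memo` model is a hypothetical example.

```python
from typing import Optional

from pydantic import BaseModel, Field

class Memo(BaseModel):
    title: str
    notes: Optional[str] = Field(
        default=None,
        description="Additional notes, if any"
    )

# The field is omitted: the default (None) is used
without_notes = Memo(title="Hearing minutes")

# The field is present but null, as in a JSON payload with "notes": null
explicit_null = Memo.model_validate({"title": "Hearing minutes", "notes": None})

# The field is present with a value
with_notes = Memo(title="Hearing minutes", notes="Signed copy missing")
```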

Fixed Value Fields (Literal)

from typing import Literal

class Model(BaseModel):
    # Only accepts these values
    priority: Literal['low', 'medium', 'high', 'critical']

    # Another closed set of allowed values
    status: Literal['pending', 'in_progress', 'completed', 'cancelled']

When to use Literal

Use Literal whenever the possible values are known and finite. Doing so:

Forces the LLM to choose from valid options
Avoids unwanted variations (e.g., "High" vs "high" vs "HIGH")
Makes subsequent analysis easier
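
The points above can be checked with a small sketch, assuming Pydantic v2. Literal matching is exact, so a differently cased value is rejected rather than silently stored. The `Ticket` model is a hypothetical example.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

class Ticket(BaseModel):
    priority: Literal['low', 'medium', 'high', 'critical']

# An exact match is accepted
accepted = Ticket(priority="high")

# A case variation is not one of the allowed values, so validation fails
try:
    Ticket(priority="High")
    rejected = False
except ValidationError:
    rejected = True
```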

Lists

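from typing import List

from pydantic import BaseModel, Field

class Item(BaseModel):
    # Minimal placeholder; replace with the fields you need to extract
    name: str

class Model(BaseModel):
    # List of strings
    tags: List[str] = Field(description="Relevant tags")

    # List of objects
    items: List[Item] = Field(description="List of extracted items")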
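
For illustration, here is a sketch of how a list of objects is validated in Pydantic v2: each dictionary in the list is converted into an instance of the element model. The `Item` fields and the `Order` model are assumptions made for this example.

```python
from typing import List

from pydantic import BaseModel, Field

class Item(BaseModel):
    # Hypothetical fields, chosen only for the example
    name: str
    quantity: int

class Order(BaseModel):
    tags: List[str] = Field(description="Relevant tags")
    items: List[Item] = Field(description="List of extracted items")

order = Order.model_validate({
    "tags": ["urgent", "wholesale"],
    "items": [
        {"name": "paper", "quantity": 3},
        {"name": "toner", "quantity": 1},
    ],
})

# Each element of order.items is now an Item instance, not a plain dict
total_quantity = sum(item.quantity for item in order.items)
```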

Nested Models

For complex structures, use nested models:

from pydantic import BaseModel, Field
from typing import List, Optional, Literal

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: Optional[str] = None

class Contact(BaseModel):
    name: str
    email: Optional[str] = None
    phone: Optional[str] = None
    address: Optional[Address] = None

class Company(BaseModel):
    name: str
    tax_id: Optional[str] = None
    contacts: List[Contact] = Field(description="List of company contacts")
    sector: Literal['technology', 'health', 'finance', 'retail', 'other']
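
As a rough sketch of nested validation in Pydantic v2, a single nested dictionary is turned into a tree of model instances, and missing optional branches stay `None`. The models below are trimmed copies of the ones above, and the sample data is invented for the example.

```python
from typing import List, Optional

from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: Optional[str] = None

class Contact(BaseModel):
    name: str
    email: Optional[str] = None
    address: Optional[Address] = None

class Company(BaseModel):
    name: str
    contacts: List[Contact]

company = Company.model_validate({
    "name": "Acme",
    "contacts": [
        {
            "name": "Ana",
            "address": {"street": "1 Main St", "city": "Austin", "state": "TX"},
        },
        {"name": "Bo", "email": "bo@example.com"},
    ],
})

# Nested dicts become nested model instances
city = company.contacts[0].address.city

# model_dump() converts the whole tree back into plain dicts and lists
dumped = company.model_dump()
```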

Real Example: Legal Analysis

from pydantic import BaseModel, Field
from typing import List, Optional, Literal

class Party(BaseModel):
    """Party involved in the case."""
    name: str = Field(description="Full name of the party")
    type: Literal['plaintiff', 'defendant', 'third_party'] = Field(description="Party type")
    tax_id: Optional[str] = Field(default=None, description="Tax ID")

class Claim(BaseModel):
    """Claim made in the case."""
    description: str = Field(description="Claim description")
    amount: Optional[float] = Field(default=None, description="Amount in USD")
    granted: Optional[bool] = Field(default=None, description="Whether it was granted")

class CourtDecision(BaseModel):
    """Complete analysis of a court decision."""

    # Identification
    case_number: str = Field(description="Case number")
    court: str = Field(description="Court (e.g., Supreme Court, District Court)")
    decision_date: str = Field(description="Decision date (YYYY-MM-DD)")

    # Parties
    parties: List[Party] = Field(description="Parties involved")

    # Merit
    decision_type: Literal['judgment', 'ruling', 'order', 'interlocutory']
    outcome: Literal['granted', 'denied', 'partially_granted', 'dismissed']

    # Claims
    claims: List[Claim] = Field(description="Claims analyzed")

    # Summary
    summary: str = Field(description="Decision summary in up to 100 words")
    legal_grounds: List[str] = Field(description="Main legal grounds")

PROMPT = """
Analyze the court decision below and extract all relevant information.
Be precise with dates, amounts, and names.
If information is not available, use null.
"""

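# Assumes dataframeit is already imported and df_decisions is a DataFrame
# whose 'text' column holds the full text of each decision
result = dataframeit(df_decisions, CourtDecision, PROMPT, text_column='text')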

Custom Validations

from pydantic import BaseModel, Field, field_validator

class Document(BaseModel):
    ssn: str = Field(description="SSN in format XXX-XX-XXXX")
    email: str = Field(description="Valid email")

    @field_validator('ssn')
    @classmethod
    def validate_ssn(cls, v: str) -> str:
        # Count only the digits, ignoring dashes and spaces
        numbers = ''.join(filter(str.isdigit, v))
        if len(numbers) != 9:
            raise ValueError('SSN must have 9 digits')
        # The original string is returned unchanged
        return v

Be careful with validations

Overly strict validators reject output that is otherwise usable and can cause frequent validation errors. Use them sparingly.
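
One way to keep a validator tolerant, sketched here under the assumption of Pydantic v2, is to normalize the value instead of demanding an exact input format. The `normalize_ssn` validator below is a hypothetical variant, not the approach this guide prescribes: it accepts any separators, keeps the 9-digit check, and returns a canonical string.

```python
import re

from pydantic import BaseModel, Field, ValidationError, field_validator

class Document(BaseModel):
    ssn: str = Field(description="SSN in format XXX-XX-XXXX")

    # mode="before" runs on the raw input, before the str type check
    @field_validator("ssn", mode="before")
    @classmethod
    def normalize_ssn(cls, v):
        # Keep only the digits, whatever separators were used
        digits = re.sub(r"\D", "", str(v))
        if len(digits) != 9:
            raise ValueError("SSN must have 9 digits")
        # Return the canonical XXX-XX-XXXX form
        return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"

spaced = Document(ssn="123 45 6789")
bare = Document(ssn="123456789")

try:
    Document(ssn="12345")
    too_short_rejected = False
except ValidationError:
    too_short_rejected = True
```

Normalizing trades strictness for fewer rejected rows. Whether that is acceptable depends on how much the downstream analysis relies on the exact input format.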

Best Practices

Use clear descriptions: The LLM uses descriptions to understand what to extract
Prefer Literal over str: When values are known, use Literal
Use Optional for uncertain fields: If information may not exist, mark as Optional
Break down complex models: Use nested models for better organization
Test with examples: Validate your model with real texts before processing everything
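
To review descriptions and allowed values before a full run, one option is to inspect the JSON schema that Pydantic v2 generates for a model. This is a sketch of that inspection only. It makes no claim about how any particular tool passes the schema to the LLM, and the `Invoice` model is a hypothetical example.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field

class Invoice(BaseModel):
    vendor: str = Field(description="Name of the issuing company")
    status: Literal['paid', 'unpaid']
    notes: Optional[str] = Field(default=None, description="Additional notes, if any")

schema = Invoice.model_json_schema()

# Field descriptions are carried into the schema
vendor_description = schema["properties"]["vendor"]["description"]

# Literal values appear as an enum of allowed options
status_options = schema["properties"]["status"]["enum"]

# Only fields without a default are listed as required
required_fields = schema["required"]
```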