Introduction to Pydantic

21 minutes read

Pydantic is a Python library for data validation and data modeling. It uses Python's type hints to verify that data matches with the specified types and structures.

While Python's dynamic typing offers flexibility, it can result in runtime errors when data types don't match. Pydantic solves this by enforcing strict type checks at runtime, ensuring data integrity before processing or storage.

Since Pydantic is not part of the standard library, we'll need to install it by typing

pip install pydantic

in the command line.

To use Pydantic in our code, we can import the library:

import pydantic

However, we'll only import the specific components we need to keep our environment clean.

What is Pydantic's BaseModel?

Pydantic's BaseModel functions as a data validator for Python objects. It handles everything from simple validations—like checking if a variable is an integer—to complex ones, such as verifying data types in deeply nested dictionaries. With minimal code, it validates virtually any data scenario, making our code more robust, readable, easier to debug and making it easy to catch errors.

Let's create a HyperSkill user model with two fields: a name (using string type), a subscription status (using boolean type), and active no of months (using int type):

First, import BaseModel:

from pydantic import BaseModel

Then define the HyperSkillUser class that inherits from BaseModel with its fields and types:

class HyperSkillUser(BaseModel):
    name: str
    active_subscription: bool
    active_months: int

Now, create a user with a name and subscription status.

user = HyperSkillUser(name="John", active_subscription=True, active_months=3)

Another way of creating user is as below.

data = {"name": "John", "active_subscription": True, "active_months": 3}
user = HyperSkillUser(**data)
print(user)

# name='John' active_subscription=True active_months=3

Data Validation

To test the validation features of BaseModel, create another user with incorrect datatype fields other than what we defined for it. Add the exception handling to checkout the error in a clean format.

try:
    user = HyperSkillUser(name="John", active_subscription="True", active_months="three")
    print(user.name)
except Exception as e:
    print(e)

# Output
1 validation error for HyperSkillUser
active_months
Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='three', input_type=str]
For further information visit <https://errors.pydantic.dev/2.10/v/int_parsing>

Pydantic automatically validates the input data. The active_subscription is handled with coercion, but whereas active_months has incorrect datatype which completely fails in the validation.

Type conversion is the general process of changing a value from one data type to another, while type coercion is a specific type of conversion that happens implicitly (automatically) during operations or comparisons.

Serialization

Serialization is the process of converting Python objects into a format suitable for storage or transmission, such as JSON (str) or dictionaries (dict). Pydantic simplifies this process using its BaseModel. It ensures that the serialized data is validated and structured correctly, making it ideal for applications like APIs, data pipelines, or configuration management.

The model_dump_jsongives the JSON format of the data as below:

user = HyperSkillUser(name="John", active_subscription=True, active_months=3)
print(type(user.model_dump_json()), user.model_dump_json())

# <class 'str'> {"name":"John","active_subscription":true,"active_months":3}

The model_dump gives the dict format of the data as below:

print(user.model_dump(), type(user.model_dump())) 

# {'name': 'John', 'active_subscription': True, 'active_months': 3} <class 'dict'>

What is Field in Pydantic?

Field() is a function in Pydantic, a tool to customize, set rules and limits, and a lot of other things within the BaseModel. It also allows us to define default values, validation constraints, add metadata, aliases, and other field-specific configurations. This makes our models more expressive and adaptable to various data requirements.

Set Default Value:

The Field() function can be used for Setting Assigning static or dynamic default values like below:

from pydantic import Field

class HyperSkillUser(BaseModel):
    name: str
    active_subscription: bool = Field(default=True)
    active_months: int
		
person = HyperSkillUser(name="John", active_months=3)
print(person)

# name='John' active_subscription=True active_months=3

We can ignore assigning a value to the default data field as Field()takes care of this.

Validation Constraints:

We can set range constraints and, length of the data field. Let’s set a minimum value of active_months to 1 month, the maximum to 12 months, and minimum length of nameto 8, and the maximum length to 20 (as per HyperSkill subscription) like below:

class HyperSkillUser(BaseModel):
    name: str = Field(min_length=8, max_length=20)
    active_subscription: bool = Field(default=True)
    active_months: int = Field(gt=0, lt=13)

NOTE: Constraints Representations

ge : greater or equal to
le: lesser or equal to
gt: greater than
lt: less than

person = HyperSkillUser(name="Elon Musk", active_months=12)
print(person)

# name='Elon Musk' active_subscription=True active_months=12

person = HyperSkillUser(name="Elon Musk", active_months=13)
print(person)

# pydantic_core._pydantic_core.ValidationError: 1 validation error 
# for HyperSkillUser active_months Input should be less 
# than 13 [type=less_than, input_value=13, input_type=int]

Field Validation

In Pydantic, there are two main mechanisms for validation: Field Validators and Model Validators. Field validators allow for custom validation logic specific to each field.

It’s good so far, but what if we want some more customized validation constraints, like not allowing spaces in the name?

@field_validator decorator in Pydantic is one such field validator that is used to validate specific fields in a model. It allows defining custom validation logic for one or more fields. Let’s implement the spacing constraint in our code.

from pydantic import BaseModel, ValidationError, field_validator, Field

class HyperSkillUser(BaseModel):
    name: str = Field(min_length=8, max_length=20)
    active_subscription: bool = Field(default=True)
    active_months: int = Field(gt=0, lt=13)

    # Validator for the 'name' field

    @field_validator('name')
    @classmethod
    def name_must_contain_space(cls, value: str) -> str:
        if ' ' in value:
            raise ValueError("Name must not contain a space")
        return value.title()

# Example usage
try:
    user = HyperSkillUser(name="John Doe", active_subscription=True, active_months=12)
    print(user)
except ValidationError as e:
    print(e)

@field_validator('name') ****indicates that the following method will be used to validate the field named name .

The method is decorated with @classmethod, which receives the class (cls) as its first argument instead of an instance. cls Represents the class HyperSkillUser itself and value being the field name which is to be validated.

Function name_must_contain_space takes the value as input and performs the check of space in it.

#Output:
1 validation error for HyperSkillUser
name
Value error, Name must not contain a space [type=value_error, input_value='John Doe', input_type=str]
For further information visit <https://errors.pydantic.dev/2.10/v/value_error>

Model Validation

Model Validators perform validation at the model level, enabling checks that involve multiple fields or the entire model's data. There are 3 modes in Model Validation before, after, wrap .

Model validation allows for cross-field validation and more complex logic.

Key features include:

@model_validator Decorator: This decorator is used to create validation methods that can access all fields of a model.
Cross-field Validation: It enables validation based on the relationships between different fields.

After validators are instance methods that run post-validation, serving as post-initialization hooks to perform additional actions. To use this, make the mode of the @model_validator to 'after'. Note that they should always return the validated instance.

Let’s make the logic of our example more real by making sure the active_subscription users do not have active_months as zero, and no active user should have active_months more than zero as they don’t have a subscription at all.

from pydantic import BaseModel, ValidationError, model_validator, Field
from typing import Self, Any

class HyperSkillUser(BaseModel):
    name: str = Field(min_length=8, max_length=20)
    active_subscription: bool = Field(default=True)
    active_months: int = Field(gt=0, lt=13)

    @model_validator(mode='after')
    def validate_subscription_consistency(self) -> Self:
        """Ensure active_months aligns with subscription status"""
        if self.active_subscription and self.active_months == 0:
            raise ValueError("Active subscription requires months > 0")
        if not self.active_subscription and self.active_months > 0:
            raise ValueError("Inactive subscription must have 0 active months")
        return self
        
 try:
    user = HyperSkillUser(name="Alice Smith", active_subscription=False, active_months=5)
    print(f"Valid user: {user}")
except ValidationError as e:
    print(e)

#Output:
1 validation error for HyperSkillUser
  Value error, Inactive subscription must have 0 active months [type=value_error, input_value={'name': 'Alice Smith', '...lse, 'active_months': 5}, input_type=dict]
    For further information visit <https://errors.pydantic.dev/2.10/v/value_error>

Before validators are executed prior to the instantiation of a model. They offer greater flexibility compared to after validators; however, they must handle raw input, which could potentially be any arbitrary object.

The before validators receive raw input data, so let’s make the valid data enter the model also by setting the mode to 'before' and the @classmethod finds its place back as the @field_validator.

@model_validator(mode='before')
@classmethod
def validate_input(cls, data: Any) -> Any:
    if not isinstance(data, dict):
        raise ValueError("Data should be dictionary format.")
    return data

#Output
Valid user: name='Alice Smith' active_subscription=False active_months=5

Wrap validators are the flexible ones that can be executed before or after based on the data fields. When we have multiple validations, we can implement the logic to execute only some of these, depending on the data.

@model_validator(mode='wrap')
@classmethod
def validate_logic(cls, validated_data: Any):
    *"""
    Validate subscription logic after processing raw input and field-level validation.    
    """*
    if validated_data['active_subscription'] and validated_data['active_months'] == 0:
        raise ValueError("Active subscription requires at least 1 active month")
    if not validated_data['active_subscription'] and validated_data['active_months'] > 0:
       raise ValueError("Inactive subscription cannot have active months")
    return validated_data

#Output
1 validation error for HyperSkillUser
Value error, Inactive subscription cannot have active months [type=value_error, input_value={'name': 'Alice Smith', '...lse, 'active_months': 5}, input_type=dict]
For further information visit <https://errors.pydantic.dev/2.10/v/value_error>

Refer to the documentation for more complex logic implementations and control flow adjustment of wrap validators.

Example: Usage in OpenAI API

When working with OpenAI's API (e.g., chat.completions.create), you can use Pydantic to validate input parameters (like model, temperature, etc.) before making API calls. This pre-validation ensures data correctness and helps build reliable systems. Pydantic also provides custom validators for more granular control.

Key Parameters in OpenAI API Calls

model: Specifies the model to use (e.g., "gpt-3.5-turbo", "gpt-4").
temperature: Controls randomness in the output (range: 0–2).
messages: A list of messages forming the conversation.
Custom Parameters: Additional parameters like max_tokens, top_p, etc.

Pydantic can validate these parameters before API calls to ensure they meet the correct types and constraints. You can also use it to validate API responses against your expected schema.

Validating OpenAI API Call Parameters:

from pydantic import BaseModel, Field, ValidationError

class OpenAIRequest(BaseModel):
    model: str = Field(..., description="The model to use, e.g., 'gpt-3.5-turbo'") 
    temperature: float = Field(default=0.7, ge=0, le=2, description="Controls randomness(0-2)")
    max_tokens: int = Field(default=256, ge=1, le=4096, description="Maximum number of tokens")
    prompt: str = Field(..., min_length=1, description="The input prompt for the model")

request_data = {
    "model": "gpt-3.5-turbo",
    "temperature": 0.8,
    "max_tokens": 150,
    "prompt": "What is the capital of France?"
}

NOTE: Field()is used to add metadata like description. Required fields are indicated with ....

try:
    validated_request = OpenAIRequest(**request_data)
    print(validated_request)
except ValidationError as e:
    print("Validation Error:", e)

Field Validations:

model: Required field with a description.
temperature: Must be between 0 and 2 (ge=0, le=2).
max_tokens: Must be between 1 and 4096.
prompt: Must be a non-empty string with a minimum length of 1.

If any parameter fails validation (e.g., temperature=3), Pydantic raises a detailed ValidationError.

Conclusion

We've covered data validations in Pydantic, including setting constraints using built-in methods and custom functions for field validations. Additionally, we've explored model validations and support modes, with practical implementation for the OpenAI API use case.

4 learners liked this piece of theory. 0 didn't like it. What about you?

Report a typo