Effortless declarative data validation with Pydantic: smarter, cleaner, and more reliable

Validating incoming data is critical when building modern applications. Whether you're dealing with configuration files, API inputs, or manually entered data, ensuring data integrity upfront can prevent costly errors later. This is where Pydantic comes in: a library that uses Python's type hints to provide a declarative, intuitive, and efficient approach to data validation.

Why Declarative Data Validation?

Declarative programming focuses on describing what you expect rather than how to achieve it. In data validation, this means defining the structure, types, and constraints upfront while the library enforces these rules automatically.

Adopting a declarative style offers several advantages:

Readability: The intent is clear and concise.
Less Boilerplate: Replaces repetitive, hand-written validation logic.
Consistency: Standardizes validation across your application.
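To make the contrast concrete, here is a minimal sketch that compares a hand-written (imperative) check with an equivalent declarative Pydantic model. The `Sensor` record, its fields, and the helper function are hypothetical and exist only for this illustration.

```python
from pydantic import BaseModel, Field, ValidationError


# Imperative style: every check is written out by hand
def validate_sensor_manually(data: dict) -> dict:
    name = data.get('name')
    port = data.get('port')
    if not isinstance(name, str):
        raise ValueError('name must be a string')
    if not isinstance(port, int) or not 1 <= port <= 65535:
        raise ValueError('port must be an integer between 1 and 65535')
    return {'name': name, 'port': port}


# Declarative style: describe the expected shape and let Pydantic enforce it
class Sensor(BaseModel):
    name: str
    port: int = Field(ge=1, le=65535)


print(validate_sensor_manually({'name': 'probe', 'port': 8080}))
print(Sensor(name='probe', port=8080))

try:
    Sensor(name='probe', port=70000)
except ValidationError as exc:
    print(exc.errors()[0]['type'])  # less_than_equal
```

Both versions accept the same valid record, but the model states the rules in one place, and its errors come back as structured data instead of a single exception message from the first failing check.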

With these advantages in mind, let’s see how Pydantic makes declarative validation simpler.

Introduction to Pydantic

Pydantic uses Python’s type annotations to define data validation rules. Its central concept is the data model: a Python class (a subclass of BaseModel) that defines the structure and constraints of your data, and to which custom validators can be attached when needed.
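As a minimal sketch of such a model, the hypothetical `User` class below relies on type annotations for most of its validation and attaches one custom rule with `field_validator`. The field names and the e-mail rule are assumptions chosen only for illustration.

```python
from pydantic import BaseModel, ValidationError, field_validator


class User(BaseModel):
    # Type annotations alone declare the expected structure
    name: str
    age: int
    email: str

    # A custom rule attached to one field; by default it runs after the str check
    @field_validator('email')
    @classmethod
    def normalize_email(cls, value: str) -> str:
        value = value.strip().lower()
        if '@' not in value:
            raise ValueError('email must contain "@"')
        return value


user = User(name='Ada', age='36', email='  Ada@Example.COM ')
print(user)  # name='Ada' age=36 email='ada@example.com'

try:
    User(name='Ada', age=36, email='not-an-email')
except ValidationError as exc:
    print(exc.error_count())  # 1
```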

Fig.1: Key features of the Pydantic library

Fig. 1 summarizes the key features of the Pydantic library. Now, let’s see it in action with a real-world example.

Basic Pydantic Usage Example

Imagine we are building a resource management system that collects and processes data from edge computing devices. Since data from various sources may be incomplete or inconsistent, we need a reliable way to validate it.

Consider this example of device information provided as a JSON object:

{
    "model": "POWER EDGE",
    "serial_no": "ED-2025-455 ",
    "dimensions": ["20", " 30", "25.5"],
    "active": "yes"
}

This data has several inconsistencies: a trailing space in serial_no, numbers encoded as strings (one of them with a leading space), a boolean expressed as "yes", and missing fields (id and device_color). Let’s see how Pydantic handles it:

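from pydantic import BaseModel

# The JSON document shown above, as received from a device
raw_json_string = '''
{
    "model": "POWER EDGE",
    "serial_no": "ED-2025-455 ",
    "dimensions": ["20", " 30", "25.5"],
    "active": "yes"
}
'''


class EdgeComputingDevice(BaseModel):
    id: int | None = None            # optional, defaults to None
    model: str
    serial_no: str
    dimensions: list[int | float]    # each item may be an int or a float
    device_color: str = 'black'      # default used when the field is missing
    active: bool


# Parse and validate the JSON string in a single step
validated_device = EdgeComputingDevice.model_validate_json(raw_json_string)
print(repr(validated_device))
print(f'Model: {validated_device.model}')
print(f'Dimensions: {validated_device.dimensions}')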

Output:

EdgeComputingDevice(id=None, model='POWER EDGE', serial_no='ED-2025-455 ', dimensions=[20, 30, 25.5], device_color='black', active=True)
Model: POWER EDGE
Dimensions: [20, 30, 25.5]

What’s happening here?

Type-based Validation: Field annotations such as int, str, and bool drive validation.
Support for Complex Types: dimensions accepts a list and validates every item in it.
Automatic Conversion: Numeric strings are converted to int or float, and "yes" is interpreted as True.
Default Values: Missing fields (id, device_color) are automatically populated in the validated output.
Union Types: The | operator supports multiple acceptable types (e.g., int | float).
JSON Validation: model_validate_json parses and validates a JSON string in one step.

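While this example works well, note that serial_no still carries its trailing space: by default, Pydantic converts types but does not clean up string content. What if we need stricter validation and some data cleanup? Let’s refine the model.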

Adding Constraints and Data Transformations

Input data can be validated more precisely by enforcing constraints on fields. It is also often useful to transform data during validation. For instance, removing leading and trailing whitespace first makes string length constraints meaningful. Here’s an enhanced example with additional validation rules:

import json
from typing import Annotated

from annotated_types import Ge, Le, MaxLen, MinLen
from pydantic import AfterValidator, BaseModel

# A numeric dimension that must lie between 10 and 1000 (inclusive)
DimensionSize = Annotated[int | float, Ge(10), Le(1000)]


class EdgeComputingDevice(BaseModel):
    id: int | None = None
    # Annotated metadata is applied in order: lower-case, strip, then check length
    model: Annotated[str, AfterValidator(str.lower),
                     AfterValidator(str.strip), MaxLen(30)]
    # Exactly three dimensions, each within the DimensionSize bounds
    dimensions: Annotated[list[DimensionSize], MinLen(3), MaxLen(3)]
    serial_no: Annotated[str, AfterValidator(str.strip)]
    active: bool
    device_color: Annotated[str, AfterValidator(str.lower),
                            MaxLen(30)] = 'black'


# raw_json_string holds the same JSON document as in the previous example
validated_device = EdgeComputingDevice(**json.loads(raw_json_string))
print(validated_device.model_dump())

Output:

{'id': None, 'model': 'power edge', 'dimensions': [20, 30, 25.5], 'serial_no': 'ED-2025-455', 'active': True, 'device_color': 'black'}

Key improvements:

Enhanced Validation: Constraints such as Ge, Le, MinLen, and MaxLen restrict numeric ranges and lengths.
Field Transformations: AfterValidator applies string trimming and case conversion.
Field Validation Pipeline: Annotated metadata is applied in order, so transformations and constraints run in sequence (e.g., stripping happens before the length check).
Serialization: model_dump() converts the model to a dictionary.

By enforcing these rules, we eliminate many potential data inconsistencies before they can cause issues.

Informative Validation Errors

An important aspect of data validation is knowing why a particular piece of data failed validation. Validation errors can be presented to users or checked by system administrators. Consider the following invalid input. It is missing a mandatory field, contains the wrong value types, and some values do not meet the constraints:

{
    "serial_no": "ED-2024-103",
    "dimensions": ["not int", "3000", "25.5", "50"],
    "active": "not active",
    "color": 10
}

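When this input is validated against the model from the previous section, Pydantic raises a ValidationError with the following human-readable report. Note that the unknown color key is not listed: by default, Pydantic ignores fields that the model does not declare.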

pydantic_core._pydantic_core.ValidationError: 6 validation errors for EdgeComputingDevice
model
  Field required [type=missing, input_value={'serial_no': 'ED-2024-103', ...10}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
dimensions
  List should have at most 3 items after validation, not 4 [type=too_long, input_value=['not int', '3000', '25.5', '50'], input_type=list]
dimensions.0.int
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='not int', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/int_parsing
dimensions.0.float
  Input should be a valid number, unable to parse string as a number [type=float_parsing, input_value='not int', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/float_parsing
dimensions.1
  Input should be less than or equal to 1000 [type=less_than_equal, input_value='3000', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/less_than_equal
active
  Input should be a valid boolean, unable to interpret input [type=bool_parsing, input_value='not active', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/bool_parsing

Features:

All Errors Reported: Pydantic reports every error, not just the first one encountered.
Detailed Info: Each entry includes the error location, a description, the invalid value, and a link to further documentation.
Programmatic Access: ValidationError.errors() returns structured error details (location, type, message, input) for easier programmatic handling.

Handle Hierarchical Data

In our example, the edge device input may also include an optional network field whose value is a nested structure. Its data is loosely formatted: the network interfaces arrive as a single comma-separated string, and the IP address list contains a duplicate entry.

{
    "id": 2025139,
    "model": " POWER EDGE",
    "serial_no": "ED-2025-139",
    "device_color": "WHITE",
    "dimensions": ["20", "30", 25.5],
    "active": "TRUE",
    "network": {
        "interfaces": " eth0, eth1 ",
        "macs": ["11:22:33:44:55:66", "AA-BB-CC-DD-EE-FF"],
        "ip-addresses": ["192.168.0.1", "192.168.0.1", "192.168.16.1"]
    }
}

To handle such hierarchical data, Pydantic models can be organized hierarchically as well: one model uses another model as the type of a field. This way, even multi-level data can be normalized and validated according to the rules defined at each level.

The following models handle the input shown above:

from typing import Annotated, Any

from annotated_types import Ge, Le, MaxLen, MinLen
from pydantic import (AfterValidator, BaseModel, BeforeValidator,
                      ConfigDict, Field)


DimensionSize = Annotated[int | float, Ge(10), Le(1000)]


def split(value: Any) -> list[str] | Any:
    # Turn a comma-separated string into a list; leave other inputs untouched
    if not isinstance(value, str):
        return value
    return [part for part in value.split(',') if part.strip() != '']


class DeviceNetwork(BaseModel):
    # Strip whitespace from, and cap the length of, every string field
    model_config = ConfigDict(str_strip_whitespace=True, str_max_length=30)

    mac_addresses: Annotated[list[str], Field(alias='macs')] = []
    interfaces: Annotated[list[str], BeforeValidator(split)] = []
    # Input lists become a set, which drops duplicate addresses
    ip_addresses: Annotated[set[str], Field(alias='ip-addresses')] = set()


class EdgeComputingDevice(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True, str_max_length=30)

    id: int | None = None
    model: Annotated[str, AfterValidator(str.lower)]
    dimensions: Annotated[list[DimensionSize], MinLen(3), MaxLen(3)]
    serial_no: str
    active: bool
    device_color: Annotated[str, AfterValidator(str.lower)] = 'black'
    # Nested model: the "network" object is validated by DeviceNetwork
    network: DeviceNetwork = DeviceNetwork()


# raw_json_string holds the JSON document shown above
validated_device = EdgeComputingDevice.model_validate_json(raw_json_string)
print(repr(validated_device))

Output:

EdgeComputingDevice(id=2025139, model='power edge', dimensions=[20, 30, 25.5], serial_no='ED-2025-139', active=True, device_color='white', network=DeviceNetwork(mac_addresses=['11:22:33:44:55:66', 'AA-BB-CC-DD-EE-FF'], interfaces=['eth0', 'eth1'], ip_addresses={'192.168.16.1', '192.168.0.1'}))

Pydantic features used:

Nested Models: One model can serve as a field type within another, simplifying validation of hierarchical data.
Model-wide Configuration: Setting str_strip_whitespace and str_max_length in model_config applies whitespace stripping and the 30-character limit to every string field, replacing repeated AfterValidator(str.strip) and MaxLen(30) annotations.
Field Aliases: Input field names can differ from model attribute names, which is useful for keys that are not valid Python identifiers (e.g., ip-addresses).
Field Parsing Customization: BeforeValidator runs a function on the raw input before type validation, here splitting interface names out of a single string.
Mutable Defaults: Pydantic copies mutable default values (e.g., lists) for each instance, avoiding the shared-state pitfall of mutable defaults.
Container Type Conversion: For fields like ip_addresses, input lists are automatically converted to sets, which eliminates duplicates.
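The sketch below checks three of these behaviors on a stripped-down, hypothetical `Network` model: the alias is used when reading input (and, on request, when dumping), a list becomes a de-duplicated set, and each instance gets its own copy of a mutable default.

```python
from typing import Annotated

from pydantic import BaseModel, Field


class Network(BaseModel):
    # "ip-addresses" is not a valid Python identifier, hence the alias
    ip_addresses: Annotated[set[str], Field(alias='ip-addresses')] = set()
    interfaces: list[str] = []


net = Network.model_validate(
    {'ip-addresses': ['192.168.0.1', '192.168.0.1', '192.168.16.1']}
)
print(sorted(net.ip_addresses))               # ['192.168.0.1', '192.168.16.1']
print(sorted(net.model_dump(by_alias=True)))  # ['interfaces', 'ip-addresses']

# Mutable defaults are copied, so instances never share the same list
first, second = Network(), Network()
first.interfaces.append('eth0')
print(second.interfaces)  # []
```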

Add Cross-Field Rules and Reuse Types

The validation models presented in the previous section are still not precise enough to catch all invalid data. Malformed IP and MAC addresses or invalid color names don’t cause errors. Additionally, the models don't check that each network interface has a corresponding MAC address.

Let's fix this in the validation models using another set of Pydantic features:

from ipaddress import IPv4Address
from typing import Annotated, Any, Self  # Self requires Python 3.11+

from annotated_types import Ge, Le, MaxLen, MinLen, Predicate
from pydantic import (BaseModel, BeforeValidator, ConfigDict, Field,
                      model_validator)
# Provided by the separate pydantic-extra-types package
from pydantic_extra_types.color import Color
from pydantic_extra_types.mac_address import MacAddress


def split(value: Any) -> list[str] | Any:
    # Turn a comma-separated string into a list; leave other inputs untouched
    if not isinstance(value, str):
        return value
    return [part for part in value.split(',') if part.strip() != '']


# Reusable annotated types
DimensionSize = Annotated[int | float, Ge(10), Le(1000)]
Interface = Annotated[str, Field(pattern=r'^eth\d+$')]
UniqueMembers = Predicate(
    lambda collection: len(set(collection)) == len(collection)
)


class DeviceNetwork(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True, str_max_length=30)

    mac_addresses: Annotated[list[MacAddress], UniqueMembers,
                             Field(alias='macs')] = []
    interfaces: Annotated[list[Interface], BeforeValidator(split),
                          UniqueMembers] = []
    ip_addresses: Annotated[set[IPv4Address],
                            Field(alias='ip-addresses')] = set()

    # Cross-field rule: runs after all fields have been validated
    @model_validator(mode='after')
    def _check_if_each_interface_has_mac_address(self) -> Self:
        if len(self.mac_addresses) != len(self.interfaces):
            message = (
                'Number of macs must be the same as the number '
                f'of interfaces in {repr(self)}.'
            )
            raise ValueError(message)
        return self


class EdgeComputingDevice(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True, str_max_length=30)

    id: int | None = None
    model: str
    dimensions: Annotated[list[DimensionSize], MinLen(3), MaxLen(3)]
    serial_no: Annotated[str, Field(pattern=r'^ED-\d{4}-\d{3}$')]
    active: bool
    device_color: Color = Color('black')
    network: DeviceNetwork | None = DeviceNetwork()


# raw_json_string holds the JSON document from the previous section
validated_device = EdgeComputingDevice.model_validate_json(raw_json_string)
print(repr(validated_device))

Output:

EdgeComputingDevice(id=2025139, model='POWER EDGE', dimensions=[20, 30, 25.5], serial_no='ED-2025-139', active=True, device_color=Color('white', rgb=(255, 255, 255)), network=DeviceNetwork(mac_addresses=['11:22:33:44:55:66', 'aa:bb:cc:dd:ee:ff'], interfaces=['eth0', 'eth1'], ip_addresses={IPv4Address('192.168.0.1'), IPv4Address('192.168.16.1')}))

Pydantic features:

Model Validator: A method decorated with @model_validator(mode='after') runs once all fields are validated and can enforce rules that span several fields.
Reuse Types: Leverage standard library types (e.g., IPv4Address, datetime) and types from the pydantic-extra-types package (e.g., MacAddress, Color).
Regex Pattern: Field(pattern=...) validates string fields against a regular expression.
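To try the cross-field rule and the pattern constraint without the pydantic-extra-types dependency, here is a reduced sketch that keeps plain `str` items. The `Network` class, the `is_valid` helper, and the sample values are illustrative assumptions; the validator logic mirrors the one in DeviceNetwork.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError, model_validator

# Reusable constrained type, as with the Interface alias above
Interface = Annotated[str, Field(pattern=r'^eth\d+$')]


class Network(BaseModel):
    macs: list[str] = []
    interfaces: list[Interface] = []

    @model_validator(mode='after')
    def check_one_mac_per_interface(self) -> 'Network':
        # Runs after field validation, so both lists are already checked
        if len(self.macs) != len(self.interfaces):
            raise ValueError('number of macs must match number of interfaces')
        return self


def is_valid(data: dict) -> bool:
    try:
        Network.model_validate(data)
    except ValidationError:
        return False
    return True


print(is_valid({'macs': ['11:22:33:44:55:66'], 'interfaces': ['eth0']}))   # True
print(is_valid({'macs': [], 'interfaces': ['eth0']}))                      # False: count mismatch
print(is_valid({'macs': ['11:22:33:44:55:66'], 'interfaces': ['wlan0']}))  # False: pattern
```

A ValueError raised inside the model validator surfaces as an ordinary ValidationError, so callers handle cross-field failures exactly like per-field ones.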

Conclusion

By leveraging type hints and constraints, Pydantic enables developers to write less code, reduce errors, and focus on features rather than debugging validation issues. Its declarative style, robust error reporting, and flexibility make it an invaluable tool for modern applications.

If you haven’t already, give Pydantic a try and see how it can simplify and enhance your next project!
