Effortless Declarative data validation with Pydantic: smarter, cleaner, and more reliable

Validating incoming data is critical when building modern applications. Whether you're dealing with configuration files, API inputs, or manually entered data, ensuring data integrity upfront can prevent costly errors later. This is where Pydantic comes in, a powerful library that uses Python's type hints to provide a declarative, intuitive, and efficient approach to data handling.

Why Declarative Data Validation?

Declarative programming focuses on describing what you expect rather than how to achieve it. In data validation, this means defining the structure, types, and constraints upfront while the library enforces these rules automatically.

Adopting a declarative style offers several advantages:

  • Readability: The intent is clear and concise.
  • Less Boilerplate: Reduces repetitive manually written validation logic.
  • Consistency: Standardizes validation across your application.
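
To make the contrast concrete, here is a minimal sketch that validates the same input twice: once with hand-written checks and once with a Pydantic model. The Sensor model, its fields, and the validate_sensor_by_hand helper are hypothetical and exist only for this illustration.

```python
from pydantic import BaseModel, ValidationError


def validate_sensor_by_hand(data: dict) -> dict:
    """Imperative style: every check is spelled out step by step."""
    if 'name' not in data:
        raise ValueError('name is required')
    if not isinstance(data['name'], str):
        raise ValueError('name must be a string')
    try:
        channels = int(data.get('channels', 1))
    except (TypeError, ValueError):
        raise ValueError('channels must be an integer') from None
    return {'name': data['name'], 'channels': channels}


class Sensor(BaseModel):
    """Declarative style: the annotations are the validation rules."""
    name: str
    channels: int = 1


# Both approaches accept the same input and convert '4' to the integer 4.
by_hand = validate_sensor_by_hand({'name': 'thermo', 'channels': '4'})
declared = Sensor.model_validate({'name': 'thermo', 'channels': '4'})
print(by_hand)
print(declared.model_dump())

# The model reports every problem at once; the hand-written function
# would stop at the first failing check.
try:
    Sensor.model_validate({'channels': 'four'})
    error_types = []
except ValidationError as exc:
    error_types = [error['type'] for error in exc.errors()]
print(error_types)
```

The hand-written function needs a new branch for every rule, while the model only needs a new annotation.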

With these advantages in mind, let’s see how Pydantic makes declarative validation simpler.

Introduction to Pydantic

Pydantic uses Python’s type annotations to define data validation rules. It introduces data models, which are essentially Python classes that define the structure and constraints of your data while allowing custom validators to be attached when needed.

Fig.1: Key features of the Pydantic library

A number of key features of the Pydantic library are listed above. Now, let’s see it in action with a real-world example.

Basic Pydantic Usage Example

Imagine we are building a resource management system that collects and processes data from edge computing devices. Given that data from various sources may be incomplete or inconsistent, we need a good way to validate it.

Consider this example of device information provided as a JSON object:

{
    "model": "POWER EDGE",
    "serial_no": "ED-2025-455 ",
    "dimensions": ["20", " 30", "25.5"],
    "active": "yes"
}

This dataset includes some formatting inconsistencies: extra spaces, a mix of different data types within a single container, and some fields that are missing (e.g., id and device_color). Let’s see how Pydantic handles it:

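from pydantic import BaseModel

# The JSON document shown above, as a string.
raw_json_string = '''
{
    "model": "POWER EDGE",
    "serial_no": "ED-2025-455 ",
    "dimensions": ["20", " 30", "25.5"],
    "active": "yes"
}
'''


class EdgeComputingDevice(BaseModel):
    id: int | None = None          # optional, defaults to None
    model: str
    serial_no: str
    dimensions: list[int | float]
    device_color: str = 'black'    # optional, defaults to 'black'
    active: bool


# model_validate_json parses the JSON string and validates it in one step.
validated_device = EdgeComputingDevice.model_validate_json(raw_json_string)
print(repr(validated_device))
print(f'Model: {validated_device.model}')
print(f'Dimensions: {validated_device.dimensions}')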

Output:

EdgeComputingDevice(id=None, model='POWER EDGE', serial_no='ED-2025-455 ', dimensions=[20, 30, 25.5], device_color='black', active=True)
Model: POWER EDGE
Dimensions: [20, 30, 25.5]

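What’s happening here?

  • Type Coercion: The strings in dimensions are converted to numbers, and "yes" is converted to True.
  • Default Values: The missing id and device_color fields are filled with their defaults (None and 'black').
  • Untouched Strings: serial_no keeps its trailing space, because nothing in the model asks for it to be removed.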

While this example works well, what if we need even stricter validation? Let’s refine it.

Adding Constraints and Data Transformations

Input data can be validated more precisely by enforcing constraints on fields. It is also often useful to transform data during validation. For instance, removing leading and trailing whitespace makes string length constraints meaningful. Here’s an enhanced example with additional validation rules:

import json

from pydantic import AfterValidator, BaseModel
from annotated_types import Ge, Le, MaxLen, MinLen
from typing import Annotated


# Each dimension must be a number between 10 and 1000.
DimensionSize = Annotated[int | float, Ge(10), Le(1000)]


class EdgeComputingDevice(BaseModel):
    id: int | None = None
    model: Annotated[
        str, AfterValidator(str.lower), AfterValidator(str.strip), MaxLen(30)
    ]
    # Exactly three dimensions are required.
    dimensions: Annotated[list[DimensionSize], MinLen(3), MaxLen(3)]
    serial_no: Annotated[str, AfterValidator(str.strip)]
    active: bool
    device_color: Annotated[
        str, AfterValidator(str.lower), MaxLen(30)
    ] = 'black'


# raw_json_string holds the JSON document shown earlier.
validated_device = EdgeComputingDevice(**json.loads(raw_json_string))
print(validated_device.model_dump())

Output:

{'id': None, 'model': 'power edge', 'dimensions': [20, 30, 25.5], 'serial_no': 'ED-2025-455', 'active': True, 'device_color': 'black'}

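Key improvements:

  • Normalized Strings: model and device_color are lowercased, and model and serial_no are stripped of leading and trailing whitespace.
  • Length Limits: model and device_color may be at most 30 characters long.
  • Value Ranges: Each dimension must be between 10 and 1000.
  • Fixed List Size: dimensions must contain exactly three values.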

By enforcing these rules, we eliminate many potential data inconsistencies before they can cause issues.

Informative Validation Errors

An important aspect of data validation is knowing why a particular piece of data failed validation. Validation errors can be presented to users or checked by system administrators. Consider the following invalid input. It is missing a mandatory field, contains values of the wrong type, and includes values that violate the constraints:

{
    "serial_no": "ED-2024-103",
    "dimensions": ["not int", "3000", "25.5", "50"],
    "active": "not active",
    "color": 10
}

For the invalid data above, Pydantic raises a ValidationError with the following human-readable information:

pydantic_core._pydantic_core.ValidationError: 6 validation errors for EdgeComputingDevice
model
  Field required [type=missing, input_value={'serial_no': 'ED-2024-103', ...10}, input_type=dict]        
    For further information visit  
    https://errors.pydantic.dev/2.10/v/missing
dimensions
  List should have at most 3 items after validation, not 4 [type=too_long, input_value=['not int', '3000', '25.5', '50'], input_type=list]
dimensions.0.int
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='not int', input_type=str]
    For further information visit 
    https://errors.pydantic.dev/2.10/v/int_parsing 
dimensions.0.float
  Input should be a valid number, unable to parse string as a number [type=float_parsing, input_value='not int', input_type=str]
    For further information visit 
    https://errors.pydantic.dev/2.10/v/float_parsing
dimensions.1
  Input should be less than or equal to 1000 [type=less_than_equal, input_value=' 3000', input_type=str]
    For further information visit 
    https://errors.pydantic.dev/2.10/v/less_than_equal 
active
  Input should be a valid boolean, unable to interpret input [type=bool_parsing, input_value='not active', input_type=str]
    For further information visit 
    https://errors.pydantic.dev/2.10/v/bool_parsing 

Features:

  • All Is Reported: Reports all errors, not just the first one encountered.
  • Detailed Info: Provides detailed explanations, including error location, description, invalid value, and helpful links.
  • Process Errors: Offers structured error details for easier programmatic handling.
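
As a minimal sketch of the programmatic side, the example below catches a ValidationError and reduces it to pairs of field path and error type using the errors() method. The Reading model and the summarize_errors helper are hypothetical and only serve the illustration.

```python
from typing import Annotated

from annotated_types import Ge, Le
from pydantic import BaseModel, ValidationError


class Reading(BaseModel):  # hypothetical model, for illustration only
    sensor: str
    value: Annotated[float, Ge(0), Le(100)]
    active: bool


def summarize_errors(exc: ValidationError) -> list[tuple[str, str]]:
    """Turn a ValidationError into (field path, error type) pairs."""
    return [
        ('.'.join(str(part) for part in error['loc']), error['type'])
        for error in exc.errors()
    ]


try:
    # 'sensor' is missing, 'value' is out of range, 'active' is not a boolean.
    Reading.model_validate({'value': 250, 'active': 'maybe'})
    summary = []
except ValidationError as exc:
    summary = summarize_errors(exc)

for location, error_type in summary:
    print(f'{location}: {error_type}')
```

Each entry returned by errors() is a dictionary, so the same data can be logged, serialized, or mapped to user-facing messages.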

Handle Hierarchical Data

In our example, the edge device input may include an optional network field whose value is a nested structure. This data is loosely formatted: the network interfaces arrive as a single comma-separated string, and the IP address list contains repeated values.

{
    "id": 2025139,
    "model": " POWER EDGE",
    "serial_no": "ED-2025-139",
    "device_color": "WHITE",
    "dimensions": ["20", "30", 25.5],
    "active": "TRUE",
    "network": {
        "interfaces": " eth0, eth1 ",
        "macs": ["11:22:33:44:55:66", "AA-BB-CC-DD-EE-FF"],
        "ip-addresses": ["192.168.0.1", "192.168.0.1", "192.168.16.1"]
    }
}

To handle such hierarchical data, Pydantic models should also be organized hierarchically, with one model using other models as types in field annotations. This way, even multi-level data hierarchies can be normalized and validated against the declared rules.

These are validation models to handle the inputs shown above:

from annotated_types import Ge, Le, MaxLen, MinLen
from pydantic import (AfterValidator, BaseModel, BeforeValidator,
                      ConfigDict, Field)
from typing import Annotated, Any


DimensionSize = Annotated[int | float, Ge(10), Le(1000)]


def split(value: Any) -> list[str] | Any:
    # Turn a comma-separated string into a list; pass other inputs through.
    if not isinstance(value, str):
        return value
    return [part for part in value.split(',') if part != '']


class DeviceNetwork(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True, str_max_length=30)

    mac_addresses: Annotated[list[str], Field(alias='macs')] = []
    interfaces: Annotated[list[str], BeforeValidator(split)] = []
    # set() rather than {}: an empty pair of braces is a dict, not a set.
    ip_addresses: Annotated[set[str], Field(alias='ip-addresses')] = set()


class EdgeComputingDevice(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True, str_max_length=30)

    id: int | None = None
    model: Annotated[str, AfterValidator(str.lower)]
    dimensions: Annotated[list[DimensionSize], MinLen(3), MaxLen(3)]
    serial_no: str
    active: bool
    device_color: Annotated[str, AfterValidator(str.lower)] = 'black'
    network: DeviceNetwork = DeviceNetwork()


# raw_json_string holds the JSON document shown above.
validated_device = EdgeComputingDevice.model_validate_json(raw_json_string)
print(repr(validated_device))

Output:

EdgeComputingDevice(id=2025139, model='power edge', dimensions=[20, 30, 25.5], serial_no='ED-2025-139', active=True, device_color='white', network=DeviceNetwork(mac_addresses=['11:22:33:44:55:66', 'AA-BB-CC-DD-EE-FF'], interfaces=['eth0', 'eth1'], ip_addresses={'192.168.16.1', '192.168.0.1'}))

Pydantic features used:

  • Nested Models: Enables the use of one model as a field type within another, simplifying validation of hierarchical data.
  • Default Model Behavior: Declaring the model_config attribute eliminates repetitive use of AfterValidator(str.strip) and MaxLen(30) for specific fields.
  • Field Aliases: Allows input field names to differ from model attributes, which is useful when an input name is not a valid Python identifier (e.g., ip-addresses).
  • Field Parsing Customization: Functions such as BeforeValidator enable specialized input parsing, like splitting interface names from a single string.
  • Mutable Defaults: Pydantic safely deep-copies default field values, avoiding issues with mutable defaults (e.g., lists).
  • Container Type Conversion: For fields like ip_addresses, input lists are automatically converted to sets to eliminate duplicates.
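
Three of these features (aliases, mutable defaults, and container conversion) can be observed in isolation. The sketch below uses a hypothetical, simplified Network model rather than the full DeviceNetwork shown earlier.

```python
from typing import Annotated

from pydantic import BaseModel, Field


class Network(BaseModel):  # hypothetical, simplified model
    mac_addresses: Annotated[list[str], Field(alias='macs')] = []
    ip_addresses: Annotated[set[str], Field(alias='ip-addresses')] = set()


# Input uses the alias; the duplicate address collapses into a set.
first = Network.model_validate({'ip-addresses': ['10.0.0.1', '10.0.0.1']})
second = Network()

# Mutating the default list on one instance does not affect the other,
# because each instance receives its own copy of the default.
first.mac_addresses.append('11:22:33:44:55:66')

print(first.ip_addresses)
print(second.mac_addresses)

# by_alias=True writes the original input names back out.
print(first.model_dump(by_alias=True))
```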

Add Cross-Field Rules and Reuse Types

The validation models presented in the previous section are still not precise enough to catch all invalid data. Malformed IP and MAC addresses or invalid color names don’t cause errors. Additionally, each network interface should have a corresponding MAC address, which is not checked.

Let's fix this in the validation models using another set of Pydantic features:

from annotated_types import Ge, Le, MaxLen, MinLen, Predicate
from ipaddress import IPv4Address
from pydantic import (BaseModel, BeforeValidator, ConfigDict, Field,
                      model_validator)
from typing import Annotated, Any, Self  # Self requires Python 3.11+
# pydantic-extra-types is a separate package from pydantic itself.
from pydantic_extra_types.mac_address import MacAddress
from pydantic_extra_types.color import Color


def split(value: Any) -> list[str] | Any:
    # Turn a comma-separated string into a list; pass other inputs through.
    if not isinstance(value, str):
        return value
    return [part for part in value.split(',') if part != '']


DimensionSize = Annotated[int | float, Ge(10), Le(1000)]
Interface = Annotated[str, Field(pattern=r'^eth\d+$')]
# Passes only when the collection contains no repeated members.
UniqueMembers = Predicate(
    lambda collection: len(set(collection)) == len(collection)
)


class DeviceNetwork(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True, str_max_length=30)

    mac_addresses: Annotated[list[MacAddress], UniqueMembers,
                             Field(alias='macs')] = []
    interfaces: Annotated[list[Interface], BeforeValidator(split),
                          UniqueMembers] = []
    # set() rather than {}: an empty pair of braces is a dict, not a set.
    ip_addresses: Annotated[set[IPv4Address],
                            Field(alias='ip-addresses')] = set()

    # Runs after all fields have been validated individually.
    @model_validator(mode='after')
    def _check_if_each_interface_has_mac_address(self) -> Self:
        if len(self.mac_addresses) != len(self.interfaces):
            message = (
                'Number of macs must be the same as the number '
                f'of interfaces in {self!r}.'
            )
            raise ValueError(message)
        return self


class EdgeComputingDevice(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True, str_max_length=30)

    id: int | None = None
    model: str
    dimensions: Annotated[list[DimensionSize], MinLen(3), MaxLen(3)]
    serial_no: Annotated[str, Field(pattern=r'^ED-\d{4}-\d{3}$')]
    active: bool
    device_color: Color = Color('black')
    network: DeviceNetwork | None = DeviceNetwork()


# raw_json_string holds the JSON document from the previous section.
validated_device = EdgeComputingDevice.model_validate_json(raw_json_string)
print(repr(validated_device))

Output:

EdgeComputingDevice(id=2025139, model='POWER EDGE', dimensions=[20, 30, 25.5], serial_no='ED-2025-139', active=True, device_color=Color('white', rgb=(255, 255, 255)), network=DeviceNetwork(mac_addresses=['11:22:33:44:55:66', 'aa:bb:cc:dd:ee:ff'], interfaces=['eth0', 'eth1'], ip_addresses={IPv4Address('192.168.0.1'), IPv4Address('192.168.16.1')}))

Pydantic features:

  • Model Validator: Cross-field validation is done by a decorated method having access to all model fields.
  • Reuse Types: Leverage standard library types (e.g., IPv4Address, datetime) and Pydantic extra types (e.g., MacAddress, Color).
  • Regex Pattern: Supports fine-tuned string field validation through regular expressions.
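
To see how the pattern check and the cross-field check interact, here is a reduced sketch that depends only on Pydantic itself. The Ports model and the error_types helper are hypothetical stand-ins for DeviceNetwork, with plain strings in place of MacAddress.

```python
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError, model_validator

Interface = Annotated[str, Field(pattern=r'^eth\d+$')]


class Ports(BaseModel):  # hypothetical, simplified stand-in for DeviceNetwork
    interfaces: list[Interface] = []
    macs: list[str] = []

    @model_validator(mode='after')
    def _same_number_of_macs_and_interfaces(self) -> 'Ports':
        if len(self.macs) != len(self.interfaces):
            raise ValueError('each interface needs exactly one MAC address')
        return self


def error_types(data: dict) -> list[str]:
    """Return the error types raised for the input, or [] if it is valid."""
    try:
        Ports.model_validate(data)
    except ValidationError as exc:
        return [error['type'] for error in exc.errors()]
    return []


one_mac = ['11:22:33:44:55:66']
print(error_types({'interfaces': ['eth0'], 'macs': one_mac}))
print(error_types({'interfaces': ['wlan0'], 'macs': one_mac}))
print(error_types({'interfaces': ['eth0', 'eth1'], 'macs': one_mac}))
```

The 'after' validator runs only once every field has passed its own checks, so the second input reports the pattern mismatch alone, while the third input reaches the cross-field rule and fails there.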

Conclusion

By leveraging type hints and constraints, Pydantic enables developers to write less code, reduce errors, and focus on features rather than debugging validation issues. Its declarative style, robust error reporting, and flexibility make it an invaluable tool for modern applications.

If you haven’t already, give Pydantic a try and see how it can simplify and enhance your next project!

Damian Parniewicz

Senior Software Engineer

Damian Parniewicz is a Senior Software Engineer with nearly 20 years of experience. He specializes in networking technologies. His previous roles included network systems design, implementation and deployment, IT and network infrastructure provisioning, as well as problem solving for network protocols,...
