Validating incoming data is critical when building modern applications. Whether you're dealing with configuration files, API inputs, or manually entered data, ensuring data integrity upfront can prevent costly errors later. This is where Pydantic comes in, a powerful library that uses Python's type hints to provide a declarative, intuitive, and efficient approach to data handling.
Why Declarative Data Validation?
Declarative programming focuses on describing what you expect rather than how to achieve it. In data validation, this means defining the structure, types, and constraints upfront while the library enforces these rules automatically.
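To make the distinction concrete, here is a minimal sketch that uses only the standard library. The names validate_imperatively, DEVICE_RULES, and validate_declaratively are hypothetical and exist purely for illustration; the small generic checker stands in for the work a validation library would do for you.

```python
from typing import Any


def validate_imperatively(data: dict[str, Any]) -> list[str]:
    """Imperative style: every check is spelled out step by step."""
    errors = []
    if "model" not in data:
        errors.append("model: field required")
    elif not isinstance(data["model"], str):
        errors.append("model: expected str")
    if "id" not in data:
        errors.append("id: field required")
    elif not isinstance(data["id"], int):
        errors.append("id: expected int")
    return errors


# Declarative style: state WHAT is expected; a generic engine decides HOW.
DEVICE_RULES: dict[str, type] = {"model": str, "id": int}


def validate_declaratively(data: dict[str, Any], rules: dict[str, type]) -> list[str]:
    """Generic checker that enforces whatever the rules mapping describes."""
    errors = []
    for field, expected in rules.items():
        if field not in data:
            errors.append(f"{field}: field required")
        elif not isinstance(data[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors


sample = {"model": "POWER EDGE", "id": "seven"}
imperative_errors = validate_imperatively(sample)
declarative_errors = validate_declaratively(sample, DEVICE_RULES)
print(imperative_errors)
print(declarative_errors)
```

Both functions report the same problem, but adding a new field to the declarative version means adding one entry to the rules mapping rather than another block of conditionals.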
Adopting a declarative style offers several advantages:
- Readability: The intent is clear and concise.
- Less Boilerplate: Reduces repetitive manually written validation logic.
- Consistency: Standardizes validation across your application.
With these advantages in mind, let’s see how Pydantic makes declarative validation simpler.
Introduction to Pydantic
Pydantic uses Python’s type annotations to define data validation rules. It introduces data models, which are essentially Python classes that define the structure and constraints of your data while allowing custom validators to be attached when needed.

A number of key features of the Pydantic library are listed above. Now, let’s see it in action with a real-world example.
Basic Pydantic Usage Example
Imagine we are building a resource management system that collects and processes data from edge computing devices. Given that data from various sources may be incomplete or inconsistent, we need a good way to validate it.
Consider this example of device information provided as a JSON object:
{
    "model": "POWER EDGE",
    "serial_no": "ED-2025-455 ",
    "dimensions": ["20", " 30", "25.5"],
    "active": "yes"
}
This dataset includes some formatting inconsistencies: extra spaces, numeric values encoded as strings, a mix of integer and floating-point values within a single container, and some missing fields (e.g., id and device_color). Let’s see how Pydantic handles it:
from pydantic import BaseModel


class EdgeComputingDevice(BaseModel):
    id: int | None = None
    model: str
    serial_no: str
    dimensions: list[int | float]
    device_color: str = 'black'
    active: bool


# raw_json_string holds the JSON document shown above.
validated_device = EdgeComputingDevice.model_validate_json(raw_json_string)
print(repr(validated_device))
print(f'Model: {validated_device.model}')
print(f'Dimensions: {validated_device.dimensions}')
Output:
EdgeComputingDevice(id=None, model='POWER EDGE', serial_no='ED-2025-455 ', dimensions=[20, 30, 25.5], device_color='black', active=True)
Model: POWER EDGE
Dimensions: [20, 30, 25.5]
What’s happening here?
- Type-based Validation: Field annotations such as int, str, and bool drive validation.
- Support for Complex Types: dimensions accepts sequential data.
- Automatic Conversion: Numeric strings are converted to integers or floats, and "yes" is interpreted as True.
- Default Values: Missing fields (id, device_color) are automatically populated in the validated output.
- Union Types: The | operator supports multiple acceptable types (e.g., int | float).
- JSON Validation: Models can parse and validate JSON directly.
While this example works well, what if we need even stricter validation? Let’s refine it.
Adding Constraints and Data Transformations
Input data can be validated more precisely by enforcing constraints on fields. It is also often useful to transform data during validation. For instance, removing leading and trailing whitespace ensures that string length constraints are meaningful. Here’s an enhanced example with additional validation rules:
import json
from typing import Annotated

from annotated_types import Ge, Le, MaxLen, MinLen
from pydantic import AfterValidator, BaseModel

# Each dimension must lie between 10 and 1000 (inclusive).
DimensionSize = Annotated[int | float, Ge(10), Le(1000)]


class EdgeComputingDevice(BaseModel):
    id: int | None = None
    # Normalize case and whitespace; MaxLen caps the length at 30.
    model: Annotated[str, AfterValidator(str.lower),
                     AfterValidator(str.strip), MaxLen(30)]
    # Exactly three dimensions are expected.
    dimensions: Annotated[list[DimensionSize], MinLen(3), MaxLen(3)]
    serial_no: Annotated[str, AfterValidator(str.strip)]
    active: bool
    device_color: Annotated[str, AfterValidator(str.lower),
                            MaxLen(30)] = 'black'


# raw_json_string holds the same JSON document as in the previous example.
validated_device = EdgeComputingDevice(**json.loads(raw_json_string))
print(validated_device.model_dump())
Output:
{'id': None, 'model': 'power edge', 'dimensions': [20, 30, 25.5], 'serial_no': 'ED-2025-455', 'active': True, 'device_color': 'black'}
Key improvements:
- Enhanced Validation: Enforced constraints like MinLen and MaxLen.
- Field Transformations: Used AfterValidator for string trimming and case conversion.
- Field Validation Pipeline: Constraints and transformations are applied in sequence.
- Serialization: Models are easily converted to dictionaries.
By enforcing these rules, we eliminate many potential data inconsistencies before they can cause issues.
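Because the steps run in sequence, their order can change the outcome. The following sketch illustrates this with two hypothetical models, StripThenCheck and CheckThenStrip, and a made-up four-character limit (at_most_four_chars) chosen only to make the effect visible. It relies on Pydantic running AfterValidator entries from left to right.

```python
from typing import Annotated

from pydantic import AfterValidator, BaseModel, ValidationError


def strip_whitespace(value: str) -> str:
    return value.strip()


def at_most_four_chars(value: str) -> str:
    # Hypothetical length rule used only to make the ordering visible.
    if len(value) > 4:
        raise ValueError("must be at most 4 characters")
    return value


class StripThenCheck(BaseModel):
    # Whitespace is removed first, so the length check sees the clean value.
    code: Annotated[str, AfterValidator(strip_whitespace),
                    AfterValidator(at_most_four_chars)]


class CheckThenStrip(BaseModel):
    # The length check runs on the padded value and rejects it.
    code: Annotated[str, AfterValidator(at_most_four_chars),
                    AfterValidator(strip_whitespace)]


padded = "  eth0  "
accepted = StripThenCheck(code=padded).code
print(repr(accepted))

rejected = False
try:
    CheckThenStrip(code=padded)
except ValidationError:
    rejected = True
print(rejected)
```

Placing transformations before the constraints that depend on them is usually the safer ordering.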
Informative Validation Errors
An important aspect of data validation is knowing why a particular piece of data failed validation. Validation errors can be presented to users or checked by system administrators. Consider the following invalid input. It is missing a mandatory field, contains the wrong value types, and some values do not meet the constraints:
{
    "serial_no": "ED-2024-103",
    "dimensions": ["not int", "3000", "25.5", "50"],
    "active": "not active",
    "color": 10
}
For the invalid data above, Pydantic raises a ValidationError with the following human-readable message:
pydantic_core._pydantic_core.ValidationError: 6 validation errors for EdgeComputingDevice
model
  Field required [type=missing, input_value={'serial_no': 'ED-2024-103', ...10}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
dimensions
  List should have at most 3 items after validation, not 4 [type=too_long, input_value=['not int', '3000', '25.5', '50'], input_type=list]
dimensions.0.int
  Input should be a valid integer, unable to parse string as an integer [type=int_parsing, input_value='not int', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/int_parsing
dimensions.0.float
  Input should be a valid number, unable to parse string as a number [type=float_parsing, input_value='not int', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/float_parsing
dimensions.1
  Input should be less than or equal to 1000 [type=less_than_equal, input_value=' 3000', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/less_than_equal
active
  Input should be a valid boolean, unable to interpret input [type=bool_parsing, input_value='not active', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/bool_parsing
Features:
- Complete Reporting: All errors are reported, not just the first one encountered.
- Detailed Info: Each error includes its location, a description, the invalid value, and a link to further documentation.
- Programmatic Access: Structured error details are available for easier handling in code.
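As a minimal sketch of programmatic handling, the example below catches a ValidationError and reads its structured details through the errors() and error_count() methods. The Device model is a hypothetical, trimmed-down stand-in for the full model, and the dotted-path summary is just one possible way to present the result.

```python
from pydantic import BaseModel, ValidationError


class Device(BaseModel):  # hypothetical, trimmed-down model
    model: str
    active: bool


try:
    Device.model_validate({"active": "not active"})
except ValidationError as exc:
    error_count = exc.error_count()
    # errors() returns a list of dicts with keys such as
    # 'type', 'loc', 'msg', and 'input'.
    summary = {
        ".".join(str(part) for part in err["loc"]): err["type"]
        for err in exc.errors()
    }

print(error_count)
print(summary)
```

A mapping like this could be returned in an API response or written to a log, instead of parsing the human-readable message.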
Handle Hierarchical Data
In our example, the edge device input may include an optional network field whose value is a nested structure. This data is loosely structured: the network interfaces arrive as a single comma-separated string, and the IP address list contains repeated values.
{
    "id": 2025139,
    "model": " POWER EDGE",
    "serial_no": "ED-2025-139",
    "device_color": "WHITE",
    "dimensions": ["20", "30", 25.5],
    "active": "TRUE",
    "network": {
        "interfaces": " eth0, eth1 ",
        "macs": ["11:22:33:44:55:66", "AA-BB-CC-DD-EE-FF"],
        "ip-addresses": ["192.168.0.1", "192.168.0.1", "192.168.16.1"]
    }
}
To handle such hierarchical data, Pydantic models should also be organized hierarchically, with one model using other models as types in field annotations. This way, even multi-level data hierarchies can be normalized and validated according to the declared rules.
The following validation models handle the input shown above:
from typing import Annotated, Any

from annotated_types import Ge, Le, MaxLen, MinLen
from pydantic import (AfterValidator, BaseModel, BeforeValidator,
                      ConfigDict, Field)

DimensionSize = Annotated[int | float, Ge(10), Le(1000)]


def split(value: Any) -> list[str] | Any:
    # Turn a comma-separated string into a list; pass other inputs through.
    if not isinstance(value, str):
        return value
    return [part for part in value.split(',') if part != '']


class DeviceNetwork(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True, str_max_length=30)

    mac_addresses: Annotated[list[str], Field(alias='macs')] = []
    interfaces: Annotated[list[str], BeforeValidator(split)] = []
    # set() rather than {}: an empty pair of braces is a dict, not a set.
    ip_addresses: Annotated[set[str], Field(alias='ip-addresses')] = set()


class EdgeComputingDevice(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True, str_max_length=30)

    id: int | None = None
    model: Annotated[str, AfterValidator(str.lower)]
    dimensions: Annotated[list[DimensionSize], MinLen(3), MaxLen(3)]
    serial_no: str
    active: bool
    device_color: Annotated[str, AfterValidator(str.lower)] = 'black'
    network: DeviceNetwork = DeviceNetwork()
Output:
EdgeComputingDevice(id=2025139, model='power edge', dimensions=[20, 30, 25.5], serial_no='ED-2025-139', active=True, device_color='white', network=DeviceNetwork(mac_addresses=['11:22:33:44:55:66', 'AA-BB-CC-DD-EE-FF'], interfaces=['eth0', 'eth1'], ip_addresses={'192.168.16.1', '192.168.0.1'}))
Pydantic features used:
- Nested Models: Enables the use of one model as a field type within another, simplifying validation of hierarchical data.
- Default Model Behavior: Declaring the model_config attribute eliminates repetitive use of AfterValidator(str.strip) and MaxLen(30) for specific fields.
- Field Aliases: Allows input field names to differ from model attributes, which is useful when input names are not valid Python identifiers (e.g., ip-addresses).
- Field Parsing Customization: Validators such as BeforeValidator enable specialized input parsing, like splitting interface names from a single string.
- Mutable Defaults: Pydantic safely deep-copies default field values, avoiding issues with mutable defaults (e.g., lists).
- Container Type Conversion: For fields like ip_addresses, input lists are automatically converted to sets to eliminate duplicates.
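Several of these features can be observed in isolation. The sketch below uses a reduced DeviceNetwork model and a made-up payload (the payload values are assumptions chosen for illustration) to show alias handling, custom parsing, list-to-set conversion, and per-instance copies of mutable defaults.

```python
from typing import Annotated, Any

from pydantic import BaseModel, BeforeValidator, ConfigDict, Field


def split(value: Any) -> Any:
    # Turn a comma-separated string into a list; pass other inputs through.
    if not isinstance(value, str):
        return value
    return [part for part in value.split(",") if part != ""]


class DeviceNetwork(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True)

    mac_addresses: Annotated[list[str], Field(alias="macs")] = []
    interfaces: Annotated[list[str], BeforeValidator(split)] = []
    ip_addresses: Annotated[set[str], Field(alias="ip-addresses")] = set()


network = DeviceNetwork.model_validate({
    "interfaces": " eth0, eth1 ",
    "macs": ["11:22:33:44:55:66"],
    "ip-addresses": ["192.168.0.1", "192.168.0.1", "192.168.16.1"],
})
print(network.interfaces)            # split, then stripped item by item
print(sorted(network.ip_addresses))  # duplicates removed by the set

# Each instance receives its own copy of a mutable default.
first, second = DeviceNetwork(), DeviceNetwork()
first.mac_addresses.append("aa:bb:cc:dd:ee:ff")
print(second.mac_addresses)
```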
Add Cross-Field Rules and Reuse Types
The validation models presented in the previous section are still not precise enough to catch all invalid data. Malformed IP and MAC addresses or invalid color names don’t cause errors. Additionally, each network interface should have a corresponding MAC address, which is not checked.
Let's fix this in the validation models using another set of Pydantic features:
from ipaddress import IPv4Address
from typing import Annotated, Any, Self  # Self requires Python 3.11+

from annotated_types import Ge, Le, MaxLen, MinLen, Predicate
from pydantic import (AfterValidator, BaseModel, BeforeValidator,
                      ConfigDict, Field, model_validator)
from pydantic_extra_types.color import Color
from pydantic_extra_types.mac_address import MacAddress


def split(value: Any) -> list[str] | Any:
    # Turn a comma-separated string into a list; pass other inputs through.
    if not isinstance(value, str):
        return value
    return [part for part in value.split(',') if part != '']


DimensionSize = Annotated[int | float, Ge(10), Le(1000)]
Interface = Annotated[str, Field(pattern=r'^eth\d+$')]
# Reusable constraint: the collection must not contain duplicates.
UniqueMembers = Predicate(
    lambda collection: len(set(collection)) == len(collection)
)


class DeviceNetwork(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True, str_max_length=30)

    mac_addresses: Annotated[list[MacAddress], UniqueMembers,
                             Field(alias='macs')] = []
    interfaces: Annotated[list[Interface], BeforeValidator(split),
                          UniqueMembers] = []
    # set() rather than {}: an empty pair of braces is a dict, not a set.
    ip_addresses: Annotated[set[IPv4Address],
                            Field(alias='ip-addresses')] = set()

    @model_validator(mode='after')
    def _check_if_each_interface_has_mac_address(self) -> Self:
        if len(self.mac_addresses) != len(self.interfaces):
            message = (
                'Number of macs must be the same as the number '
                f'of interfaces in {repr(self)}.'
            )
            raise ValueError(message)
        return self


class EdgeComputingDevice(BaseModel):
    model_config = ConfigDict(str_strip_whitespace=True, str_max_length=30)

    id: int | None = None
    model: str
    dimensions: Annotated[list[DimensionSize], MinLen(3), MaxLen(3)]
    serial_no: Annotated[str, Field(pattern=r'^ED-\d{4}-\d{3}$')]
    active: bool
    device_color: Color = Color('black')
    network: DeviceNetwork | None = DeviceNetwork()
Output:
EdgeComputingDevice(id=2025139, model='POWER EDGE', dimensions=[20, 30, 25.5], serial_no='ED-2025-139', active=True, device_color=Color('white', rgb=(255, 255, 255)), network=DeviceNetwork(mac_addresses=['11:22:33:44:55:66', 'aa:bb:cc:dd:ee:ff'], interfaces=['eth0', 'eth1'], ip_addresses={IPv4Address('192.168.0.1'), IPv4Address('192.168.16.1')}))
Pydantic features:
- Model Validator: Cross-field validation is done by a decorated method having access to all model fields.
- Reuse Types: Leverage standard library types (e.g., IPv4Address, datetime) and Pydantic extra types (e.g., MacAddress, Color).
- Regex Pattern: Supports fine-tuned string field validation through regular expressions.
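To see these rules reject bad input, consider the following sketch. It uses a hypothetical, trimmed-down Network model that depends only on Pydantic and the standard library (MAC addresses are kept as plain strings here, an assumption made to avoid the extra-types package), and a small helper, error_types, that collects the error type of each reported problem.

```python
from ipaddress import IPv4Address
from typing import Annotated

from pydantic import BaseModel, Field, ValidationError, model_validator

Interface = Annotated[str, Field(pattern=r"^eth\d+$")]


class Network(BaseModel):  # hypothetical, trimmed-down model
    interfaces: list[Interface] = []
    mac_addresses: list[str] = []
    ip_addresses: set[IPv4Address] = set()

    @model_validator(mode="after")
    def check_interface_mac_counts(self) -> "Network":
        # Cross-field rule: one MAC address per interface.
        if len(self.mac_addresses) != len(self.interfaces):
            raise ValueError("number of macs must match number of interfaces")
        return self


def error_types(payload: dict) -> list[str]:
    """Return the error type of every validation problem in the payload."""
    try:
        Network.model_validate(payload)
    except ValidationError as exc:
        return [err["type"] for err in exc.errors()]
    return []


mac = "11:22:33:44:55:66"
ok = error_types({"interfaces": ["eth0"], "mac_addresses": [mac],
                  "ip_addresses": ["192.168.0.1"]})
count_mismatch = error_types({"interfaces": ["eth0", "eth1"],
                              "mac_addresses": [mac]})
bad_name = error_types({"interfaces": ["wlan0"], "mac_addresses": [mac]})
bad_ip = error_types({"ip_addresses": ["999.1.1.1"]})

print(ok, count_mismatch, bad_name, bad_ip)
```

The after-mode model validator only runs once the individual fields have validated successfully, which is why the malformed interface name is reported on its own.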
Conclusion
By leveraging type hints and constraints, Pydantic enables developers to write less code, reduce errors, and focus on features rather than debugging validation issues. Its declarative style, robust error reporting, and flexibility make it an invaluable tool for modern applications.
If you haven’t already, give Pydantic a try and see how it can simplify and enhance your next project!