Parse, Don’t Pray: The Case for Data Validation

Indrajeet Patil

Parse, Don't Pray - Two hands clasped together symbolizing the partnership between Pydantic and Zod for data validation

What You’ll Learn

  • Why data validation libraries are essential for robust applications
  • Why validation is needed on both client and server (not just one)
  • How Pydantic (Python) and Zod (TypeScript) improve code quality
  • Benefits across readability, runtime behavior, typing, testing, and security

🎯 Goal

Create a validated type system that works at runtime, ensuring data actually matches your assumptions.


Examples assume a Python-TypeScript full-stack, but the principles apply to other stacks.

Python examples use from pydantic import * for brevity; prefer explicit imports in production.

Introduction

Why data validation matters

Dynamic vs Static Typing

Type systems differ in when they catch type errors.

Statically Typed (e.g., Java, C++, Rust)

  • Types checked at compile time
  • Type errors caught before runtime
  • More verbose type declarations
  • Compiler guarantees type safety
fn greet(name: String) -> String {
    format!("Hello, {}!", name)
}

// Won't compile: type mismatch
greet(42);  // Error at compile time

Dynamically Typed (e.g., Python, JavaScript)

  • Types checked at runtime
  • Type errors appear during execution
  • More flexible, less verbose
  • No compile-time type guarantees
def greet(name: str) -> str:
    return f"Hello, {name.upper()}!"

# Works with correct type
greet("alice")  # "Hello, ALICE!"
# Wrong type fails at runtime
greet(42)       # Runtime error: 'int' object has no attribute 'upper'

Why validation matters more for dynamic languages:
Without compile-time checks, runtime validation ensures data integrity.

What is Data Validation?

Ensuring data meets specific criteria before use.

❌ Without validation libraries

Manual checks scattered throughout code:

  • Type, range, and format checks written by hand
  • Logic spreads across codebase
  • Difficult to maintain and extend

βœ… With validation libraries

Schemas automatically validate, parse, and type data:

  • Centralized, self-documenting rules
  • Automatic type coercion
  • Detailed error messages
  • Easier to maintain and evolve

Meet the Libraries

Pydantic logo

Pydantic (Python)

Runtime data validation using Python type annotations

Zod logo

Zod (TypeScript)

TypeScript-first schema validation with type inference

Benefits of Data Validation Libraries

Creating robust, maintainable applications

Readability

Self-documenting schemas that reduce cognitive load

Self-Documenting Schemas

from pydantic import *
from typing import Annotated, Literal

UserName = Annotated[str, Field(min_length=3, max_length=20)]

class UserProfile(BaseModel):
    username: UserName
    email: EmailStr
    age: int = Field(ge=18, le=120)
    role: Literal["admin", "user", "guest"]
    is_active: bool = True
import { z } from "zod";

const UserName = z.string().min(3).max(20);

const UserProfileSchema = z.object({
  username: UserName,
  email: z.string().email(),
  age: z.number().int().min(18).max(120),
  role: z.enum(["admin", "user", "guest"]),
  isActive: z.boolean().default(true),
});

type UserProfile = z.infer<typeof UserProfileSchema>;

The schema is the documentation – no separate docs needed!

Schema definition, validation logic, and type information live in one place – a single source of truth.

Declarative Syntax

Without Pydantic

# Imperative: Lots of manual checks
def validate_user(data):
    if not isinstance(data.get('email'), str):
        raise ValueError('Email must be string')
    if '@' not in data['email']:
        raise ValueError('Invalid email')
    if not isinstance(data.get('age'), int):
        raise ValueError('Age must be integer')
    if data['age'] < 18 or data['age'] > 120:
        raise ValueError('Age must be 18-120')
    # ... and so on

With Pydantic

from pydantic import *

class User(BaseModel):
    email: EmailStr
    age: int = Field(ge=18, le=120)

Without Zod

// Imperative: Lots of manual checks
function validateUser(data: any) {
  if (typeof data.email !== 'string') {
    throw new Error('Email must be string');
  }
  if (!data.email.includes('@')) {
    throw new Error('Invalid email');
  }
  if (typeof data.age !== 'number') {
    throw new Error('Age must be number');
  }
  if (data.age < 18 || data.age > 120) {
    throw new Error('Age must be 18-120');
  }
  // ... and so on
}

With Zod

import { z } from "zod";

const UserSchema = z.object({
  email: z.string().email(),
  age: z.number().int().min(18).max(120),
});

type User = z.infer<typeof UserSchema>;

Describes what the data should look like, not how to validate it.

Specific Types Over Generic

❌ Generic: str

class User(BaseModel):
    email: str
  • Accepts any string
  • No validation
  • Unclear intent

⚠️ Better: EmailStr

class User(BaseModel):
    email: EmailStr
  • Validates email format
  • Self-documenting
  • Still generic

βœ… Best: AcmeEmailStr

from pydantic import *
from typing import Annotated

def must_be_acme(value: str) -> str:
    if not value.endswith('@acme.com'):
        raise ValueError('Must be @acme.com')
    return value

AcmeEmailStr = Annotated[
    EmailStr,
    AfterValidator(must_be_acme),
]

class User(BaseModel):
    email: AcmeEmailStr
  • Domain-specific
  • Business rule enforced
  • Crystal clear intent

❌ Generic: string

const UserSchema = z.object({
  email: z.string(),
});
  • Accepts any string
  • No validation
  • Unclear intent

⚠️ Better: .email()

const UserSchema = z.object({
  email: z.string().email(),
});
  • Validates email format
  • Self-documenting
  • Still generic

βœ… Best: Custom refinement

const acmeEmail = z
  .string()
  .email()
  .refine(
    (email) => email.endsWith('@acme.com'),
    { message: 'Must be @acme.com email' }
  );

const UserSchema = z.object({
  email: acmeEmail,
});
  • Domain-specific
  • Business rule enforced
  • Crystal clear intent

str β†’ EmailStr β†’ AcmeEmailStr = Increasing specificity = Better readability

Runtime Behavior

Catch errors early before they propagate

Automatic Validation

Schema Definition

from pydantic import *

class ProductPrice(BaseModel):
    product_id: str
    price: PositiveFloat
    currency: Currency  # from pydantic_extra_types

Validation Examples

# βœ… Valid data passes through
ProductPrice(
    product_id="PROD-123",
    price=29.99,
    currency="USD"
)

# 🚫 Invalid data blocked
ProductPrice(
    product_id="PROD-123",
    price=-10,     # Negative!
    currency="US"  # Invalid format!
)

Schema Definition

import { z } from "zod";

const ProductPriceSchema = z.object({
  productId: z.string(),
  price: z.number().positive(),
  currency: z.string().regex(/^[A-Z]{3}$/),
});

Validation Examples

// βœ… Valid data passes through
ProductPriceSchema.parse({
  productId: "PROD-123",
  price: 29.99,
  currency: "USD",
});

// 🚫 Invalid data blocked
ProductPriceSchema.parse({
  productId: "PROD-123",
  price: -10,     // Negative!
  currency: "US", // Invalid format!
});

Data validated as it enters the system, catching errors early.

Data Coercion & Parsing

Schema Definition

from pydantic import *
from datetime import datetime

class Event(BaseModel):
    event_id: int        # Coerces string to int
    timestamp: datetime  # Parses ISO 8601 strings
    is_public: bool      # Coerces to bool
    attendees: list[str] # Coerces tuple to list

Coercion Examples

event = Event(
    event_id="42",                    # β†’ int 42
    timestamp="2024-01-15T10:30:00",  # β†’ datetime
    is_public="yes",                  # β†’ True
    attendees=("Alice", "Bob")        # β†’ list
)

Schema Definition

import { z } from "zod";

const EventSchema = z.object({
  eventId: z.coerce.number(),   // Coerce to number
  timestamp: z.coerce.date(),   // Parse to Date
  isPublic: z.boolean(),
  attendees: z.array(z.string()),
});

Coercion Examples

const event = EventSchema.parse({
  eventId: "42",                     // β†’ number 42
  timestamp: "2024-01-15T10:30:00",  // β†’ Date
  isPublic: true,
  attendees: ["Alice", "Bob"],
});

Handles common API/JSON data format conversions automatically!

Detailed Error Messages

Schema Definition

from pydantic import *

class SignupForm(BaseModel):
    username: str = Field(min_length=3)
    email: EmailStr

Error Messages

try:
    user = SignupForm(
        username="ab",        # Too short
        email="not-an-email", # Invalid format
    )
except ValidationError as e:
    print(e.json(indent=2))
    # Output:
    # [{
    #   "loc": ["username"],
    #   "msg": "at least 3 characters",
    #  },
    #  {
    #   "loc": ["email"],
    #   "msg": "not a valid email",
    # }]

Schema Definition

import { z } from "zod";

const SignupFormSchema = z.object({
  username: z.string().min(3),
  email: z.string().email(),
});

Error Messages

const result = SignupFormSchema.safeParse({
  username: "ab",            // Too short
  email: "not-an-email",     // Invalid format
});

if (!result.success) {
  console.log(result.error.format());
  // Output:
  // {
  //   "username": {
  //     "_errors": ["at least 3 character(s)"]
  //   },
  //   "email": { "_errors": ["Invalid email"] }
  // }
}

Specific, actionable feedback about what’s wrong and where.

Typing

Type inference that bridges static and runtime worlds

Type Inference

Schema Definition

from pydantic import *

class Address(BaseModel):
    street: str
    city: str
    zipcode: str
    country: str = "USA"

class Person(BaseModel):
    name: str
    age: int
    address: Address
    phone: str | None = None

Type Inference

# Type checkers understand types automatically
def get_person_city(person: Person) -> str:
    return person.address.city

# No separate type annotations needed
person = Person(
    name="Alice",
    age=30,
    address=Address(
        street="123 Main St",
        city="Boston",
        zipcode="02101"
    )
)

city: str = get_person_city(person)

Schema Definition

import { z } from "zod";

const AddressSchema = z.object({
  street: z.string(),
  city: z.string(),
  zipcode: z.string(),
  country: z.string().default("USA"),
});

const PersonSchema = z.object({
  name: z.string(),
  age: z.number(),
  address: AddressSchema,
  phone: z.string().optional(),
});

Type Inference

// Types automatically inferred from schema
type Person = z.infer<typeof PersonSchema>;

function getPersonCity(person: Person): string {
  return person.address.city;
}

const person = PersonSchema.parse({
  name: "Alice",
  age: 30,
  address: {
    street: "123 Main St",
    city: "Boston",
    zipcode: "02101"
  }
});

const city: string = getPersonCity(person);

Schema is both validator and type definition!

End-to-End Type Safety

Schema Definition

from pydantic import *

class ApiResponse(BaseModel):
    status: str = Field(pattern=r'^(success|error)$')
    data: list[dict]
    message: str

Type-Safe Usage

def fetch_data_from_api() -> ApiResponse:
    raw_response = requests.get(
        "https://api.example.com/data"
    ).json()
    # Validates runtime data
    return ApiResponse(**raw_response)

def process_response(response: ApiResponse) -> None:
    # Type-safe throughout
    if response.status == "success":
        for item in response.data:
            process_item(item)

response = fetch_data_from_api()
process_response(response)

Schema Definition

import { z } from "zod";

const ApiResponseSchema = z.object({
  status: z.enum(["success", "error"]),
  data: z.array(z.record(z.unknown())),
  message: z.string(),
});

type ApiResponse = z.infer<typeof ApiResponseSchema>;

Type-Safe Usage

async function fetchDataFromApi(): Promise<ApiResponse> {
  const rawResponse = await fetch(
    "https://api.example.com/data"
  );
  const json = await rawResponse.json();
  // Validates runtime data
  return ApiResponseSchema.parse(json);
}

function processResponse(response: ApiResponse): void {
  // Type-safe throughout
  if (response.status === "success") {
    for (const item of response.data) {
      processItem(item);
    }
  }
}

External data is validated before entering your type-safe code!

Testing

Reduce test burden with built-in validation

Reduced Test Burden

Without Pydantic

# Need tests for every validation rule
def test_signup_validation():
    with pytest.raises(ValueError):
        create_user(email="invalid")
    with pytest.raises(ValueError):
        create_user(age=17)
    with pytest.raises(ValueError):
        create_user(age=121)
    # ... more validation tests

With Pydantic

from pydantic import *

class User(BaseModel):
    email: EmailStr
    age: int = Field(ge=18, le=120)

# Only test business logic, not validation
def test_user_permissions():
    admin = User(email="a@test.com", age=30)
    assert admin.has_permission("delete_users")

# Validation is covered by Pydantic itself.

Without Zod

// Need tests for every validation rule
describe("User validation", () => {
  it("rejects invalid email", () => {
    expect(() => createUser({ email: "invalid" }))
      .toThrow();
  });
  it("rejects young age", () => {
    expect(() => createUser({ age: 17 }))
      .toThrow();
  });
  // ... more validation tests
});

With Zod

import { z } from "zod";

const UserSchema = z.object({
  email: z.string().email(),
  age: z.number().min(18).max(120),
});

type User = z.infer<typeof UserSchema>;

// Only test business logic, not validation
describe("User permissions", () => {
  it("admin has delete permission", () => {
    const admin = UserSchema.parse({
      email: "a@test.com", age: 30,
    });
    expect(admin.hasPermission("delete_users")).toBe(true);
  });
});

// Validation is covered by Zod itself.

Validation testing is delegated to the well-tested library!

Easy Mock Data Generation

from pydantic import *
from hypothesis import given
from hypothesis.strategies import builds

class Product(BaseModel):
    name: str = Field(min_length=1, max_length=100)
    price: PositiveFloat = Field(le=10000)

# Hypothesis auto-generates valid instances from the schema
@given(builds(Product))
def test_product_discount(product):
    discounted = apply_discount(product, 0.1)
    assert discounted.price < product.price
import { z } from "zod";
import { faker } from "@faker-js/faker";

const ProductSchema = z.object({
  name: z.string().min(1).max(100),
  price: z.number().positive().max(10000),
});

type Product = z.infer<typeof ProductSchema>;

// Faker generates valid test data matching the schema
function generateProduct(): Product {
  return ProductSchema.parse({
    name: faker.commerce.productName(),
    price: faker.number.float({ min: 0.01, max: 10000 }),
  });
}

Generate hundreds of valid test cases from your schema!

Security

Prevent vulnerabilities through validation

Input Sanitization

Schema Definition

from pydantic import *

class UserUpdate(BaseModel):
    # Rejects unknown fields
    model_config = ConfigDict(extra='forbid')
    email: str

Mass Assignment Prevention

# βœ… Safe request
UserUpdate(email="user@example.com")

# 🚫 Malicious fields blocked
UserUpdate(
    email="hacker@evil.com",
    is_admin=True,   # Attack!
    balance=999999,  # Attack!
)

Schema Definition

import { z } from "zod";

const UserUpdateSchema = z.strictObject({  // Rejects unknown fields
  email: z.string().email(),
});

Mass Assignment Prevention

// βœ… Safe request
UserUpdateSchema.parse({ email: "user@example.com" });

// 🚫 Malicious fields blocked
UserUpdateSchema.parse({
  email: "hacker@evil.com",
  isAdmin: true,    // Attack!
  balance: 999999,  // Attack!
});

Attackers can’t inject fields to escalate privileges!

Type Confusion Prevention

Schema Definition

from pydantic import *

class PaymentRequest(BaseModel):
    user_id: str
    amount: PositiveFloat
    currency: Currency  # from pydantic_extra_types

Type Confusion Prevention

# βœ… Valid payment
PaymentRequest(
    user_id="USR-123",
    amount=10.0,
    currency="USD"
)

# 🚫 String instead of float blocked
PaymentRequest(
    user_id="USR-123",
    amount="0.01 OR 1=1",  # Attack!
    currency="USD"
)

# 🚫 Array instead of string blocked
PaymentRequest(
    user_id=["USR-123", "ADMIN"],  # Attack!
    amount=10.0,
    currency="USD"
)

Schema Definition

import { z } from "zod";

const PaymentRequestSchema = z.object({
  userId: z.string(),
  amount: z.number().positive(),
  currency: z.string().regex(/^[A-Z]{3}$/),
});

Type Confusion Prevention

// βœ… Valid payment
PaymentRequestSchema.parse({
  userId: "USR-123",
  amount: 10.0,
  currency: "USD",
});

// 🚫 String instead of number blocked
PaymentRequestSchema.parse({
  userId: "USR-123",
  amount: "0.01 OR 1=1",  // Attack!
  currency: "USD",
});

// 🚫 Array instead of string blocked
PaymentRequestSchema.parse({
  userId: ["USR-123", "ADMIN"],  // Attack!
  amount: 10.0,
  currency: "USD",
});

Use strict schemas at trust boundaries; coerce only when intentional.

Size & Range Limits

Schema Definition

from pydantic import *
from typing import Annotated

class CommentSubmission(BaseModel):
    post_id: str
    author: str = Field(min_length=1, max_length=50)
    content: str = Field(min_length=1, max_length=5000)
    tags: list[Annotated[str, Field(max_length=30)]] = Field(max_length=10)
    rating: int = Field(ge=1, le=5)

DoS Attack Prevention

# 🚫 Massive content blocked
CommentSubmission(
    post_id="POST-123",
    author="attacker",
    content="X" * 1_000_000,  # 1MB!
    tags=["spam"] * 1000,     # 1000 tags!
    rating=999                # Out of range!
)
# Errors:
# - content: max 5000 characters
# - tags: max 10 items
# - rating: max 5

Schema Definition

import { z } from "zod";

const CommentSubmissionSchema = z.object({
  postId: z.string(),
  author: z.string().min(1).max(50),
  content: z.string().min(1).max(5000),
  tags: z.array(z.string().max(30)).max(10),
  rating: z.number().int().min(1).max(5),
});

DoS Attack Prevention

// 🚫 Massive content blocked
CommentSubmissionSchema.parse({
  postId: "POST-123",
  author: "attacker",
  content: "X".repeat(1_000_000),  // 1MB!
  tags: Array(1000).fill("spam"),  // 1000 tags!
  rating: 999,                     // Out of range!
});
// Errors:
// - content: max 5000 character(s)
// - tags: max 10 element(s)
// - rating: max 5

Limit input sizes to prevent memory/storage exhaustion attacks!

Full-Stack Validation

Client + Server validation for UX and security

Why Both Sides Matter

❌ Client-Only Validation

const UserSchema = z.object({
  role: z.enum(["user", "guest"]),
});
# Attacker bypasses frontend entirely
curl -X POST /api/users \
  -d '{"role": "admin"}'

Frontend never ran – attacker sends "admin" directly. Without server validation, this is privilege escalation.

Client validation is for UX (fast feedback), not security.

⚠️ Server-Only Validation

async function submit(data: any) {
  await fetch("/api/users", {
    method: "POST",
    body: JSON.stringify(data),
  });
}

No client-side checks means every mistake requires a server round trip:

  1. Submit β†’ β€œEmail invalid” β†’ fix β†’ resubmit
  2. β€œAge too low” β†’ fix β†’ resubmit
  3. Finally succeeds after 3 round trips

Server validation is mandatory for security, but without client validation, users get a frustrating trial-and-error experience.

Additional Benefits

More advantages of using validation libraries

API Design & Documentation

One schema drives validation, types, and documentation.

flowchart LR
    S["πŸ”· <b>Schema</b><br/><i>Pydantic / Zod</i>"]
    S --> V["βœ… <b>Request &<br/>Response Validation</b>"]
    S --> T["πŸ”’ <b>Type Safety</b><br/><i>IDE autocomplete</i>"]
    S --> D["πŸ“„ <b>OpenAPI / JSON Schema</b>"]
    D --> UI["🌐 <b>Interactive Docs</b><br/><i>/docs &bull; Swagger UI</i>"]
    D --> CG["βš™οΈ <b>Client Code<br/>Generation</b>"]

    style S fill:#e3f2fd,stroke:#1565c0,color:#0d47a1
    style V fill:#e8f5e9,stroke:#2e7d32,color:#1b5e20
    style T fill:#e8f5e9,stroke:#2e7d32,color:#1b5e20
    style D fill:#fff3e0,stroke:#e65100,color:#bf360c
    style UI fill:#fce4ec,stroke:#c62828,color:#b71c1c
    style CG fill:#fce4ec,stroke:#c62828,color:#b71c1c

API docs generated from schemas, always accurate!

Immutable Configs: Validate Once, Done

from pydantic import *

class AppConfig(BaseModel):
    model_config = ConfigDict(frozen=True)  # Immutable after creation

    api_key: str
    max_retries: int = Field(ge=1, le=10)

config.max_retries = 5  # 🚫 ValidationError: frozen instance
import { z } from "zod";

const AppConfigSchema = z.object({
  apiKey: z.string(),
  maxRetries: z.number().int().min(1).max(10),
}).readonly();  // Immutable after parsing

config.maxRetries = 5;  // 🚫 Compile error: readonly property

Validate once at startup, use confidently everywhere without re-checking.

Summary

Creating a validated type system that works at runtime

Key Takeaways

Data validation libraries like Pydantic and Zod transform how we build applications:

Readability: Self-documenting schemas that serve as single source of truth

Runtime Safety: Catch errors at system boundaries before they propagate

Type Safety: Bridge the gap between static types and runtime data

Testing: Reduce test burden and generate mock data easily

Security: Prevent injection, mass assignment, and DoS attacks

Parse, don’t pray! Validate data at boundaries and trust it throughout your application.

Thank You

And Happy Parsing! 😊



Check out my other slide decks on software development best practices