How to Add Typed Tool-Calling to PydanticAI Workflows with Typedef

Typedef Team

Typed tool-calling enables AI agents to execute functions whose parameters are explicitly typed, checked when the tool is defined, and validated at call time. This guide shows how to build type-safe tools using Typedef's Fenic framework and expose them through the Model Context Protocol (MCP) for integration with PydanticAI and other agent frameworks.

Prerequisites

Install Fenic:

bash
pip install fenic

Set up environment variables for your model providers:

bash
export OPENAI_API_KEY="your-key"
export ANTHROPIC_API_KEY="your-key"

Understanding Fenic's Tool Architecture

Fenic provides two approaches for building typed tools:

Declarative tools: Define tools as parameterized DataFrames with explicit type declarations. These tools are stored in Fenic's catalog and automatically generate type-safe interfaces.

System tools: Custom Python functions that return DataFrames. These provide full programmatic control while maintaining type safety.

Both approaches integrate seamlessly with MCP servers, making them accessible to AI agents through standardized protocols.

Setting Up Your Fenic Session

Configure a session with your model providers:

python
from fenic.api.session import Session, SessionConfig
from fenic.api.session.config import (
    SemanticConfig,
    OpenAILanguageModel,
    AnthropicLanguageModel
)

config = SessionConfig(
    app_name="pydanticai_tools",
    semantic=SemanticConfig(
        language_models={
            "gpt4": OpenAILanguageModel(
                model_name="gpt-4.1-nano",
                rpm=100,
                tpm=100
            ),
            "claude": AnthropicLanguageModel(
                model_name="claude-3-5-haiku-latest",
                rpm=100,
                input_tpm=100,
                output_tpm=100
            )
        },
        default_language_model="gpt4"
    )
)

session = Session.get_or_create(config)

Creating Typed Tools with tool_param

The tool_param function creates typed placeholders that enforce parameter validation at runtime. Each parameter requires a name and explicit data type.

Basic Tool with Single Parameter

python
import fenic.api.functions as fc
from fenic.core.types import StringType, IntegerType
from fenic.core.mcp.types import ToolParam

# Load your data
df = session.read.csv("users.csv")

# Create parameterized query
user_search = df.filter(
    fc.col("name").contains(fc.tool_param("search_term", StringType))
)

# Register in catalog
session.catalog.create_tool(
    tool_name="search_users",
    tool_description="Search users by name substring",
    tool_query=user_search,
    result_limit=50,
    tool_params=[
        ToolParam(
            name="search_term",
            description="Substring to search for in user names"
        )
    ]
)

Tool with Multiple Parameters and Optional Values

python
from fenic.core.types import StringType, IntegerType

# Create filters with optional parameters
min_age_filter = fc.coalesce(
    fc.col("age") >= fc.tool_param("min_age", IntegerType),
    fc.lit(True)
)

max_age_filter = fc.coalesce(
    fc.col("age") <= fc.tool_param("max_age", IntegerType),
    fc.lit(True)
)

status_filter = fc.coalesce(
    fc.col("status") == fc.tool_param("status", StringType),
    fc.lit(True)
)

# Combine filters
filtered_users = df.filter(min_age_filter & max_age_filter & status_filter)

session.catalog.create_tool(
    tool_name="filter_users",
    tool_description="Filter users by age range and status",
    tool_query=filtered_users,
    result_limit=100,
    tool_params=[
        ToolParam(
            name="min_age",
            description="Minimum age threshold",
            has_default=True,
            default_value=None
        ),
        ToolParam(
            name="max_age",
            description="Maximum age threshold",
            has_default=True,
            default_value=None
        ),
        ToolParam(
            name="status",
            description="User status to filter by",
            allowed_values=["active", "inactive", "pending"],
            has_default=True,
            default_value=None
        )
    ]
)
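
For illustration, this is roughly the JSON-RPC payload an MCP client sends when an agent invokes the tool registered above. The argument values are made up, and any parameter the agent omits falls back to its declared default:

python
# Illustrative tools/call request for the filter_users tool (values are hypothetical)
tool_call = {
    "method": "tools/call",
    "params": {
        "name": "filter_users",
        "arguments": {"min_age": 25, "max_age": 35, "status": "active"}
    }
}
# A value outside allowed_values (e.g. status="archived") should be rejected
# by parameter validation before the query ever runs.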

Building Tools with Pydantic Models

Fenic's extract function works directly with Pydantic models for structured data extraction:

python
from pydantic import BaseModel, Field
from typing import List

class ContactInfo(BaseModel):
    email: str = Field(description="Email address")
    phone: str = Field(description="Phone number")
    location: str = Field(description="City and country")

class UserProfile(BaseModel):
    name: str = Field(description="Full name")
    role: str = Field(description="Job title or role")
    contact: ContactInfo = Field(description="Contact information")
    skills: List[str] = Field(description="List of technical skills")

# Extract structured data from unstructured text
df_with_profiles = df.select(
    fc.col("user_id"),
    fc.semantic.extract(
        fc.col("bio_text"),
        response_format=UserProfile
    ).alias("profile")
)

# Create tool for profile extraction
session.catalog.create_tool(
    tool_name="extract_profiles",
    tool_description="Extract structured user profiles from bio text",
    tool_query=df_with_profiles,
    result_limit=20,
    tool_params=[]
)

Supported Pydantic Types

Fenic supports these field types in Pydantic models:

  • Primitive types: str, int, float, bool
  • Optional fields: Optional[T]
  • Lists: List[T]
  • Literals: Literal["value1", "value2"]
  • Nested Pydantic models

Unsupported types include unions, custom classes, and circular references.
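
As a quick reference, here is a hypothetical model that exercises the supported field types and can be passed as response_format to fc.semantic.extract, just like UserProfile above:

python
from typing import List, Literal, Optional
from pydantic import BaseModel, Field

class Address(BaseModel):
    city: str = Field(description="City name")
    country: str = Field(description="Country name")

class Candidate(BaseModel):
    name: str = Field(description="Full name")
    years_experience: int = Field(description="Years of professional experience")
    open_to_remote: bool = Field(description="Willing to work remotely")
    seniority: Literal["junior", "mid", "senior"] = Field(description="Seniority level")
    languages: List[str] = Field(description="Programming languages mentioned")
    address: Optional[Address] = Field(default=None, description="Location, if stated")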

Creating Custom SystemTools

For complex logic requiring Python functions, use SystemTools:

python
from fenic.core.mcp.types import SystemTool
from fenic.api.dataframe import DataFrame

async def analyze_user_behavior(
    user_id: int,
    date_from: str,
    date_to: str
) -> DataFrame:
    """
    Analyze user behavior within a date range.

    Args:
        user_id: The user ID to analyze
        date_from: Start date (YYYY-MM-DD)
        date_to: End date (YYYY-MM-DD)
    """
    # Load activity data
    df = session.table("user_activities")

    # Filter by user and date range
    filtered = df.filter(
        (fc.col("user_id") == user_id) &
        (fc.col("date") >= date_from) &
        (fc.col("date") <= date_to)
    )

    # Aggregate metrics
    return filtered.group_by("activity_type").agg(
        fc.count("*").alias("count"),
        fc.avg("duration_seconds").alias("avg_duration")
    )

# Register as SystemTool
user_behavior_tool = SystemTool(analyze_user_behavior)

Automatic System Tools

Fenic automatically generates standard tools for data operations:

python
from fenic.api.mcp.tools import SystemToolConfig

# Prepare your data
df = session.read.parquet("products.parquet")
df.write.save_as_table("products", mode="overwrite")
session.catalog.set_table_description(
    "products",
    "Product catalog with pricing, categories, and inventory"
)

# Configure automatic tools
system_tools = SystemToolConfig(
    table_names=["products"],
    tool_namespace="products",
    max_result_rows=100
)

This configuration automatically generates the following tools:

  • Schema: List columns and types
  • Profile: Column statistics and distributions
  • Read: Paginated data access with filters
  • Search Summary: Regex search across text columns
  • Search Content: Detailed search within specific columns
  • Analyze: Raw SQL execution for complex queries

Setting Up an MCP Server

Create an MCP server to expose your tools:

python
from fenic.api.mcp.server import (
    create_mcp_server,
    run_mcp_server_sync
)

# Retrieve catalog tools
catalog_tools = session.catalog.list_tools()

# Create server with both catalog and system tools
server = create_mcp_server(
    session=session,
    server_name="UserDataTools",
    user_defined_tools=catalog_tools,
    system_tools=system_tools,
    concurrency_limit=10
)

# Run server
run_mcp_server_sync(
    server,
    transport="http",
    stateless_http=True,
    port=8000,
    host="127.0.0.1",
    path="/mcp"
)

Production Deployment with ASGI

For production environments, deploy as an ASGI application:

python
from fenic.api.mcp.server import run_mcp_server_asgi

app = run_mcp_server_asgi(
    server,
    stateless_http=True,
    path="/mcp"
)

# Deploy with uvicorn:
# uvicorn myapp:app --host 0.0.0.0 --port 8000
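
If you prefer to launch the server from Python rather than the uvicorn CLI, a minimal sketch (assuming uvicorn is installed):

python
import uvicorn

# `app` is the ASGI application returned by run_mcp_server_asgi above
if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)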

CLI Deployment

Use the fenic-serve command for quick deployment:

bash
# Run with all catalog tools
fenic-serve --transport http --port 8000

# Run specific tools
fenic-serve --tools search_users filter_users --port 8000

# Use stdio transport for direct integration
fenic-serve --transport stdio

Integrating with PydanticAI

Configure PydanticAI to use your MCP server. The MCP protocol provides standardized tool definitions that PydanticAI can consume directly:

python
# Class and argument names below follow recent pydantic-ai releases; the MCP
# client API has changed across versions (older releases used MCPServerHTTP
# and an mcp_servers argument), so check the docs for your installed version.
from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStreamableHTTP

# Point the agent at the Fenic MCP server started above
fenic_tools = MCPServerStreamableHTTP(url="http://127.0.0.1:8000/mcp")

agent = Agent(
    "openai:gpt-4",
    system_prompt="You are a user data assistant.",
    toolsets=[fenic_tools]
)

# PydanticAI discovers the Fenic tools and validates call arguments
# (run this inside an async function)
result = await agent.run(
    "Find all active users between ages 25 and 35"
)

The agent now has type-safe access to all your Fenic tools with automatic parameter validation.
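
To carry type safety through to the agent's final answer, you can also give the agent a structured output model. A minimal sketch, reusing fenic_tools from the previous snippet and assuming a recent pydantic-ai release where Agent accepts output_type and exposes the parsed value as result.output (older releases used result_type and result.data):

python
from typing import List
from pydantic import BaseModel
from pydantic_ai import Agent

class UserSummary(BaseModel):
    user_ids: List[int]
    headline: str

typed_agent = Agent(
    "openai:gpt-4",
    system_prompt="You are a user data assistant.",
    toolsets=[fenic_tools],
    output_type=UserSummary
)

# result.output is a validated UserSummary instance, not free-form text
# result = await typed_agent.run("Summarize the active users aged 25 to 35")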

Advanced Patterns

Semantic Operations in Tools

Combine data transformations with AI inference:

python
from fenic.core.types import StringType

# Sentiment analysis tool
df_with_sentiment = df.select(
    fc.col("review_id"),
    fc.col("review_text"),
    fc.semantic.map(
        "Classify the sentiment as positive, negative, or neutral: {{ text }}",
        text=fc.col("review_text"),
        model_alias="gpt4"
    ).alias("sentiment")
).filter(
    fc.col("sentiment") == fc.tool_param("target_sentiment", StringType)
)

session.catalog.create_tool(
    tool_name="filter_by_sentiment",
    tool_description="Find reviews with specific sentiment",
    tool_query=df_with_sentiment,
    result_limit=50,
    tool_params=[
        ToolParam(
            name="target_sentiment",
            description="Target sentiment to filter by",
            allowed_values=["positive", "negative", "neutral"]
        )
    ]
)

Async UDFs for External API Calls

Implement concurrent I/O operations within DataFrames:

python
import aiohttp
from fenic.api.functions import async_udf
from fenic.core.types import StructType, StructField, IntegerType, BooleanType

@async_udf(
    return_type=StructType([
        StructField("score", IntegerType),
        StructField("verified", BooleanType)
    ]),
    max_concurrency=20,
    timeout_seconds=5,
    num_retries=2
)
async def enrich_with_api(user_id: int) -> dict:
    async with aiohttp.ClientSession() as session:
        async with session.get(
            f"https://api.example.com/users/{user_id}/score"
        ) as resp:
            data = await resp.json()
            return {
                "score": data["score"],
                "verified": data["verified"]
            }

# Use in DataFrame
enriched = df.select(
    fc.col("user_id"),
    enrich_with_api(fc.col("user_id")).alias("api_data")
)

Chaining Multiple Tools

Build complex workflows by composing tools:

python
from fenic.core.types import StringType

# First tool: filter users
filtered = session.table("users").filter(
    fc.col("status") == fc.tool_param("status", StringType)
)

session.catalog.create_tool(
    tool_name="get_users_by_status",
    tool_description="Get users filtered by status",
    tool_query=filtered,
    result_limit=100,
    tool_params=[
        ToolParam(name="status", description="User status")
    ]
)

from fenic.core.types import IntegerType, ArrayType

# Second tool: aggregate activities for filtered users
activities = session.table("activities")
user_ids_param = fc.tool_param("user_ids", ArrayType(IntegerType))

aggregated = activities.filter(
    fc.col("user_id").isin(user_ids_param)
).group_by("user_id").agg(
    fc.count("*").alias("activity_count"),
    fc.sum("points").alias("total_points")
)

session.catalog.create_tool(
    tool_name="aggregate_user_activities",
    tool_description="Calculate activity metrics for specific users",
    tool_query=aggregated,
    result_limit=100,
    tool_params=[
        ToolParam(
            name="user_ids",
            description="List of user IDs to analyze"
        )
    ]
)

Best Practices

Type declaration: Always specify explicit types for tool_param. This lets Fenic type-check the query plan when the tool is defined and prevents type errors when agents call it.

Description quality: Write clear parameter descriptions. AI agents use these to understand when and how to call tools.

Result limits: Set appropriate result_limit values to prevent performance issues. Consider pagination for large datasets.

Error handling: Use coalesce for optional parameters with sensible defaults:

python
fc.coalesce(
    fc.col("field") == fc.tool_param("value", StringType),
    fc.lit(True)
)

Namespace tools: Use tool_namespace in SystemToolConfig to avoid naming conflicts when running multiple MCP servers.

Monitor performance: Access query metrics through the built-in metrics table:

python
metrics = session.table("fenic_system.query_metrics")
metrics.select(
    "tool_name",
    "latency_ms",
    "total_lm_cost"
).order_by("latency_ms").show()

Validate early: Provider keys are validated during session creation. This prevents runtime failures from misconfigured credentials.

Use profiles for model variants: Configure model profiles for different performance requirements:

python
OpenAILanguageModel(
    model_name="o4-mini",
    rpm=100,
    tpm=100,
    profiles={
        "fast": OpenAILanguageModel.Profile(reasoning_effort="low"),
        "thorough": OpenAILanguageModel.Profile(reasoning_effort="high")
    },
    default_profile="fast"
)

Summary

Type-safe tool-calling transforms AI agents from probabilistic systems into reliable software components. Fenic provides the infrastructure to build, test, and deploy these tools at scale while maintaining the simplicity of DataFrame operations.
