πŸ€– AI Explained

Tool Schema Design

The schema is not documentation: it is the instruction the model reads to decide whether to call your tool and what to pass. A bad schema causes wrong tool selections, invalid arguments, and hallucinated parameter values. This module covers what separates a production schema from a prototype one.

Layer 1: Surface

A tool schema has three parts the model directly uses:

  1. Name: how the model refers to the tool in a call
  2. Description: why and when to use the tool (this is the primary signal for tool selection)
  3. Parameters: what arguments to pass and in what shape

A model choosing between five tools with vague names and one-word descriptions will pick incorrectly. The same model choosing between five tools with specific names and precise descriptions will pick correctly almost every time. The schema is the prompt for tool selection.

A bad schema:

{
  "name": "get_data",
  "description": "Gets data",
  "parameters": { "type": "object", "properties": { "id": { "type": "string" } } }
}

A good schema:

{
  "name": "get_customer_order",
  "description": "Retrieves a single customer order by order ID. Returns order status, line items, shipping address, and total. Use this when the user asks about a specific order. Do NOT use for order lists or search β€” use list_customer_orders for that.",
  "parameters": {
    "type": "object",
    "properties": {
      "order_id": {
        "type": "string",
        "description": "The order ID in format ORD-XXXXXXXX (e.g. ORD-00012345)"
      }
    },
    "required": ["order_id"]
  }
}

Layer 2: Guided

Name conventions

Follow a verb_noun pattern with enough specificity to distinguish from similar tools:

| Bad | Better | Why |
| --- | --- | --- |
| get_data | get_customer_order | Specific to the entity and operation |
| search | search_knowledge_base | Distinguishes from search_web or search_tickets |
| update | update_ticket_status | Unambiguous about what is being updated |
| send | send_slack_message | Tool registry may contain send_email too |

Names should be lowercase with underscores. Avoid abbreviations: retrieve_kb_doc is harder to reason about than retrieve_knowledge_base_document.
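
A sketch of enforcing the convention mechanically at registration time, so a bad name never reaches the model. The regex and helper name are illustrative, not a standard:

```python
import re

# Lowercase snake_case with at least two words (verb_noun or longer).
# This is an illustrative convention check, not part of any tool-calling spec.
NAME_PATTERN = re.compile(r"^[a-z]+(_[a-z]+)+$")

def is_valid_tool_name(name: str) -> bool:
    """True if the name follows the lowercase verb_noun convention."""
    return bool(NAME_PATTERN.match(name))

is_valid_tool_name("get_customer_order")  # True
is_valid_tool_name("getData")             # False: camelCase
is_valid_tool_name("search")              # False: single word, ambiguous
```

Running this check when tools are registered turns a style guideline into a hard gate.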

Description anatomy

A complete description answers four questions:

  1. What does this tool do?
  2. What does it return?
  3. When should the model use it?
  4. When should the model not use it?

# Example: describing a tool that queries order history
description = """
Retrieves the order history for a customer. Returns a list of orders
with status, date, total, and order ID for each.

Use when the user asks about their past orders, recent purchases,
or wants to find a specific order by date or amount.

Do NOT use to get details of a single order β€” use get_customer_order for that.
Do NOT use if the user is asking about a return or refund β€” use get_return_status.
"""

The β€œdo NOT use” clauses are especially important when you have multiple similar tools. Without them, the model picks the first plausible match.

Parameter design

Use enums instead of free strings:

// Bad β€” model might pass "Pending", "PENDING", "in progress", "waiting"
{ "status": { "type": "string", "description": "The status to filter by" } }

// Good β€” model knows exactly what values are valid
{
  "status": {
    "type": "string",
    "enum": ["pending", "processing", "shipped", "delivered", "cancelled"],
    "description": "Filter orders by status"
  }
}
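
The enum pays off at validation time: values outside the list are rejected before they reach your API. A minimal sketch using the jsonschema library (the schema fragment is illustrative):

```python
import jsonschema

# Illustrative fragment carrying the enum from above
STATUS_SCHEMA = {
    "type": "object",
    "properties": {
        "status": {
            "type": "string",
            "enum": ["pending", "processing", "shipped", "delivered", "cancelled"],
        }
    },
}

jsonschema.validate({"status": "shipped"}, STATUS_SCHEMA)  # passes silently

try:
    # Wrong casing: exactly the drift a free string would let through
    jsonschema.validate({"status": "Pending"}, STATUS_SCHEMA)
except jsonschema.ValidationError:
    print("rejected: 'Pending' is not one of the allowed values")
```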

Required vs optional parameters:

{
  "type": "object",
  "properties": {
    "customer_id": {
      "type": "string",
      "description": "Customer ID (required)"
    },
    "limit": {
      "type": "integer",
      "description": "Maximum results to return. Defaults to 10, max 50.",
      "default": 10
    },
    "since_date": {
      "type": "string",
      "description": "ISO 8601 date string (e.g. 2026-01-15). Only return orders on or after this date."
    }
  },
  "required": ["customer_id"]
}
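
One caveat with "default": in JSON Schema it is an annotation, and most validators do not fill it in for you. A sketch of applying defaults yourself before calling the implementation (schema and helper are illustrative):

```python
# Illustrative trimmed-down copy of the schema above
LIST_ORDERS_SCHEMA = {
    "type": "object",
    "properties": {
        "customer_id": {"type": "string"},
        "limit": {"type": "integer", "default": 10},
        "since_date": {"type": "string"},
    },
    "required": ["customer_id"],
}

def apply_defaults(arguments: dict, schema: dict) -> dict:
    """Return a copy of arguments with schema defaults filled in,
    since validators treat "default" as documentation, not behaviour."""
    out = dict(arguments)
    for name, spec in schema["properties"].items():
        if name not in out and "default" in spec:
            out[name] = spec["default"]
    return out

apply_defaults({"customer_id": "C-001"}, LIST_ORDERS_SCHEMA)
# {'customer_id': 'C-001', 'limit': 10}
```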

Keep the shape flat: nested objects are harder for the model to populate correctly. If you need nested data, describe the structure clearly:

// Harder for the model β€” nested object with unclear semantics
{
  "filter": {
    "type": "object",
    "properties": {
      "date": { "type": "object", "properties": { "from": ..., "to": ... } }
    }
  }
}

// Easier β€” flat, explicit parameters
{
  "date_from": { "type": "string", "description": "Start of date range, ISO 8601" },
  "date_to":   { "type": "string", "description": "End of date range, ISO 8601" }
}

Schema validation at runtime

Define schemas once and validate both the model’s output and your implementation against them:

import jsonschema

ORDER_TOOL_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string", "pattern": "^ORD-[0-9]{8}$"}
    },
    "required": ["order_id"],
    "additionalProperties": False,
}

def call_get_customer_order(arguments: dict) -> dict:
    # Validate before executing β€” catch model errors before they hit your API
    try:
        jsonschema.validate(arguments, ORDER_TOOL_SCHEMA)
    except jsonschema.ValidationError as e:
        return {"error": f"Invalid arguments: {e.message}"}
    return fetch_order(arguments["order_id"])

Returning a structured error lets the model self-correct in the next turn rather than crashing your application.
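
A sketch of the self-correction loop this enables. The model turn is stubbed out with a repair callback here, and all names are illustrative; in a real agent the structured error goes back to the model, which produces repaired arguments on its next turn:

```python
def run_tool_with_retry(call_tool, arguments, repair, max_attempts=2):
    """Call the tool; on a structured error, ask `repair` for new arguments."""
    result = call_tool(arguments)
    for _ in range(max_attempts - 1):
        if "error" not in result:
            break
        # Feed the structured error back so the caller can fix the arguments
        arguments = repair(result["error"], arguments)
        result = call_tool(arguments)
    return result

# Stand-ins for a real tool and a real model turn:
def fake_tool(args):
    if str(args.get("order_id", "")).startswith("ORD-"):
        return {"status": "shipped"}
    return {"error": "Invalid arguments: order_id must match ORD-XXXXXXXX"}

def repair_args(error, args):
    return {"order_id": "ORD-00012345"}  # pretend the model fixed the format

run_tool_with_retry(fake_tool, {"order_id": "12345"}, repair_args)
# {'status': 'shipped'}
```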

Generating schemas from code

Rather than maintaining JSON Schema by hand, derive it from your function signatures:

from pydantic import BaseModel, Field
from typing import Literal

class GetCustomerOrderArgs(BaseModel):
    order_id: str = Field(description="Order ID in format ORD-XXXXXXXX")

class ListCustomerOrdersArgs(BaseModel):
    customer_id: str = Field(description="Customer ID")
    status: Literal["pending", "processing", "shipped", "delivered", "cancelled"] | None = Field(
        default=None,
        description="Filter by status. Omit to return all orders."
    )
    limit: int = Field(default=10, ge=1, le=50, description="Max results (1–50)")

# Generate schema from model
schema = ListCustomerOrdersArgs.model_json_schema()
# schema is now a valid JSON Schema dict β€” use directly as tool parameter schema

Pydantic is not required: any library that emits JSON Schema works. The benefit: schema and implementation stay in sync because they share the same type definition.


Layer 3: Deep Dive

Schema versioning and backwards compatibility

Tools are long-lived. When you need to change a schema:

| Change type | Safe? | How |
| --- | --- | --- |
| Add optional parameter | βœ“ Safe | New parameter with default, not in required |
| Add new tool | βœ“ Safe | Old tool still works; deprecate gradually |
| Change parameter description | βœ“ Safe | Cosmetic; no breaking change |
| Remove required parameter | βœ“ Safe | Make it optional first; remove after clients updated |
| Add required parameter | βœ— Breaking | Existing callers won't supply it; use a new tool name instead |
| Rename parameter | βœ— Breaking | Add new name as alias; deprecate old name |
| Change parameter type | βœ— Breaking | New tool name + migration period |

For breaking changes, deploy both the old and new tool simultaneously. Point old clients at the old schema; new clients use the new one. Remove the old version once no traffic remains.
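
A registry sketch of that dual-deployment pattern. The tool names, fields, and the lexical version comparison are all simplifications for illustration:

```python
# Hypothetical registry running old and new tool versions side by side
TOOL_REGISTRY = {
    "get_customer_order": {        # old schema, kept during migration
        "deprecated": True,
        "replacement": "get_customer_order_v2",
    },
    "get_customer_order_v2": {     # new schema with the breaking change
        "deprecated": False,
        "replacement": None,
    },
}

def tools_for_client(client_version: str) -> list[str]:
    """Old clients keep seeing everything; new clients only get the
    non-deprecated tools. Lexical comparison works for these toy versions."""
    if client_version < "2.0":
        return list(TOOL_REGISTRY)
    return [name for name, t in TOOL_REGISTRY.items() if not t["deprecated"]]

tools_for_client("1.5")  # both versions
tools_for_client("2.1")  # ['get_customer_order_v2']
```

Once traffic to the old name drops to zero, delete its registry entry.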

Testing schemas

Schemas should be tested as part of CI:

import pytest
from pydantic import ValidationError

def test_valid_arguments_pass():
    args = ListCustomerOrdersArgs(customer_id="C-001", status="shipped", limit=5)
    assert args.limit == 5

def test_invalid_status_rejected():
    with pytest.raises(ValidationError):
        ListCustomerOrdersArgs(customer_id="C-001", status="unknown_status")

def test_limit_bounds_enforced():
    with pytest.raises(ValidationError):
        ListCustomerOrdersArgs(customer_id="C-001", limit=100)  # max is 50

def test_schema_round_trips_through_json():
    import json
    schema = ListCustomerOrdersArgs.model_json_schema()
    serialised = json.dumps(schema)
    restored = json.loads(serialised)
    assert restored["properties"]["limit"]["maximum"] == 50

Number of tools

Models handle a limited number of tools well. Beyond roughly 20–30 tools, selection accuracy degrades: the model is picking from a long menu without being able to read it all carefully. Strategies for large tool sets:

  • Tool routing: first call decides which category of tool to use, then a second call picks the specific tool from a smaller set
  • Dynamic tool loading: only surface tools relevant to the current conversation context
  • Tool namespacing: prefix tool names by domain (orders_get, orders_list, returns_get) so grouping is explicit even in a flat list
  • Semantic tool search: embed tool descriptions; at runtime, retrieve the top-k most relevant tools for the current query and pass only those to the model
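
The last strategy can be sketched end to end. Plain word overlap stands in for embedding similarity here, and the tool set is hypothetical; a real system would embed the descriptions and rank by vector similarity:

```python
# Hypothetical tool registry: name -> description
TOOLS = {
    "get_customer_order": "retrieve a single customer order by order id",
    "list_customer_orders": "list a customer's past orders and purchases",
    "get_return_status": "check the status of a return or refund",
}

def top_k_tools(query: str, k: int = 2) -> list[str]:
    """Rank tools by word overlap with the query β€” a crude stand-in
    for embedding similarity β€” and return the top k names."""
    query_words = set(query.lower().split())
    ranked = sorted(
        TOOLS,
        key=lambda name: len(query_words & set(TOOLS[name].split())),
        reverse=True,
    )
    return ranked[:k]

top_k_tools("where is my refund")[0]  # 'get_return_status'
```

Only the retrieved subset is passed to the model, keeping the menu short no matter how large the full registry grows.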


Tool Schema Design: Check your understanding

Q1

A tool is defined with name: 'get_data', description: 'Gets data', and a single string parameter called 'id'. Users report the model frequently calls this tool when it should be calling a different tool, and passes wrong values for the id parameter. What is the root cause?

Q2

You have two similar tools: search_orders and search_tickets. The model frequently calls the wrong one. What addition to the tool descriptions most directly fixes this?

Q3

Your Pydantic model generates a JSON Schema for a tool's parameters. Six months later, the tool API adds a new required field. What is the safest schema update strategy?

Q4

Why should tool schemas be validated at runtime, before the tool implementation runs?

Q5

You have a tool with a deeply nested parameter object: filter.date.from and filter.date.to. The model frequently omits nested fields or passes the wrong structure. What redesign directly addresses this?