Layer 1: Surface
LLMs return free-form text. Your application usually needs structured data.
The naive approach, parsing the text yourself, breaks as soon as the model formats its output slightly differently. The robust approach is to constrain what the model is allowed to return: either ask for a specific schema, or use tool use (also called function calling), where the model fills in typed function arguments rather than writing free text.
These are two points on the same spectrum:
| Pattern | What it does | Best for |
|---|---|---|
| Prompted JSON | System prompt instructs model to return JSON; you parse and validate | Simple extraction, low stakes |
| Tool use (also called function calling) | Model fills typed function arguments; add strict: true for API-level schema guarantees | Extraction, triggering actions, agents |
The key shift: instead of hoping the model formats its response correctly, you define the shape of valid output. With strict: true on a tool definition, the API enforces that shape: guaranteed types, required fields present, no unexpected properties.
Production Gotcha
Tool definitions count against your context window. An application with 30 registered tools spends tokens on those definitions on every single request, even when most tools are irrelevant. Define only the tools needed for the current task, or select a relevant subset dynamically based on user intent before sending the request.
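A minimal sketch of that dynamic selection, using a keyword heuristic. The tool names and keyword lists here are hypothetical; production systems often use embedding similarity or a cheap intent classifier instead.

```python
# Illustrative sketch: pick a tool subset by keyword match before the request.
# TOOL_KEYWORDS maps each registered tool name to trigger phrases.
TOOL_KEYWORDS = {
    "get_weather": ["weather", "temperature", "rain", "forecast"],
    "create_ticket": ["ticket", "issue", "bug", "report"],
    "search_docs": ["docs", "documentation", "how do i"],
}

def select_tools(user_message: str, max_tools: int = 5) -> list[str]:
    """Return the names of tools whose keywords appear in the message."""
    text = user_message.lower()
    selected = [
        name for name, keywords in TOOL_KEYWORDS.items()
        if any(kw in text for kw in keywords)
    ]
    return selected[:max_tools]

select_tools("What's the weather like in Oslo?")  # → ["get_weather"]
```

Only the selected definitions are then included in the `tools` parameter of the request, keeping per-request token overhead proportional to the task rather than to the full tool registry.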
Layer 2: Guided
Prompted JSON (simplest, least reliable)
Works for low-stakes extraction when you control the prompt carefully:
# --- pseudocode ---
import json

def extract_contact(text: str) -> dict:
    response = llm.chat(
        model="balanced",
        system=(
            "Extract contact information and return valid JSON only. "
            "No markdown, no explanation — raw JSON.\n\n"
            'Schema: {"name": string, "email": string or null, "phone": string or null}'
        ),
        messages=[{"role": "user", "content": text}],
        max_tokens=256,
    )
    return json.loads(response.text)
# In practice — Anthropic SDK
import anthropic
import json

client = anthropic.Anthropic()

def extract_contact(text: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        system=(
            "Extract contact information and return valid JSON only. "
            "No markdown, no explanation — raw JSON.\n\n"
            'Schema: {"name": string, "email": string or null, "phone": string or null}'
        ),
        messages=[{"role": "user", "content": text}],
    )
    return json.loads(response.content[0].text)

# OpenAI: response.choices[0].message.content | Gemini: response.text
Fragile: the model may wrap the JSON in a markdown code block, add a comment, or omit a field. Use this only when you have a reliable fallback for parse failures.
Tool use / function calling (structured output)
Tool use asks the model to invoke a typed function instead of writing text. The model returns structured arguments: no markdown stripping, no json.loads. Add strict: true to the tool definition to get API-level guarantees: correct types, all required fields present, no unexpected properties.
The examples below use Anthropic’s SDK. The concept is identical across providers, but the response shape differs: see the provider comparison table at the end of this layer.
import anthropic

client = anthropic.Anthropic()

# strict: True — the API guarantees types and required fields
tools = [
    {
        "name": "extract_contact",
        "description": "Extract contact information from text.",
        "strict": True,  # API-level schema enforcement
        "input_schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "Full name"},
                "email": {"type": "string", "description": "Email address"},
                "phone": {"type": "string", "description": "Phone number including country code"},
            },
            "required": ["name"],
            "additionalProperties": False,  # required by strict mode
        },
    }
]

def extract_contact(text: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        tools=tools,
        tool_choice={"type": "tool", "name": "extract_contact"},  # force this tool
        messages=[{"role": "user", "content": text}],
    )
    # The model's response will be a tool_use block, not text
    tool_use = next(b for b in response.content if b.type == "tool_use")
    return tool_use.input  # already a dict — no json.loads needed
Without strict: True, the model generally conforms to the schema but the API does not guarantee it: you may receive wrong types or missing required fields, and should validate before use.
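A minimal sketch of that validation for the contact schema above, using only the stdlib. A JSON Schema validator library would cover more cases; the function name and error format here are illustrative.

```python
# Minimal pre-use validation for non-strict tool output.
def validate_contact(data: dict) -> list[str]:
    """Return a list of problems; an empty list means the input is usable."""
    problems = []
    if not isinstance(data.get("name"), str):  # required field
        problems.append("name: missing or not a string")
    for field in ("email", "phone"):           # optional fields
        if field in data and not isinstance(data[field], str):
            problems.append(f"{field}: present but not a string")
    return problems

validate_contact({"name": "Ada Lovelace", "email": "ada@example.com"})  # → []
```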
tool_choice: {"type": "tool", "name": "extract_contact"} forces the model to call that specific tool. Without it, the model may choose to respond in text.
Executing real actions with tool use
Tool use becomes powerful when the tools actually do things. The model decides when and how to call a tool; your code executes it and returns the result:
import anthropic
import json

client = anthropic.Anthropic()

# Tool definitions
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "country": {"type": "string", "description": "ISO 3166-1 alpha-2 country code"},
            },
            "required": ["city"],
        },
    },
    {
        "name": "get_forecast",
        "description": "Get a 5-day weather forecast for a city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "days": {"type": "integer", "minimum": 1, "maximum": 5},
            },
            "required": ["city"],
        },
    },
]

def run_tool(name: str, inputs: dict) -> str:
    """Execute the tool and return a result string."""
    if name == "get_weather":
        # Replace with a real weather API call
        return json.dumps({"city": inputs["city"], "temp_c": 18, "condition": "partly cloudy"})
    if name == "get_forecast":
        return json.dumps({"city": inputs["city"], "forecast": ["sunny", "cloudy", "rain", "sunny", "sunny"]})
    return json.dumps({"error": f"unknown tool: {name}"})

def chat_with_tools(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason == "end_turn":
            # Model responded in text — we're done
            return next(b.text for b in response.content if b.type == "text")
        if response.stop_reason == "tool_use":
            # Collect all tool calls in this turn
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = run_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })
            # Add the assistant's tool-call turn and the results to history
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
            # Loop — let the model continue with the tool results
        else:
            # Any other stop_reason (e.g. max_tokens) — bail out rather than
            # re-sending the same request forever
            raise RuntimeError(f"unexpected stop_reason: {response.stop_reason}")

chat_with_tools("What's the weather in Tokyo, and should I pack an umbrella for the next 5 days?")
This is the core loop behind most AI agents: model decides what to call → your code executes it → results go back to the model → repeat until end_turn.
Before vs After
String parsing: brittle:
# BAD: Fragile, breaks on any formatting variation
raw = llm_response.text
# What if the model says "The price is $42.50" vs "Price: 42.50" vs "42.5 USD"?
price = float(raw.split("$")[1].split()[0]) # crashes constantly in production
Tool use: robust:
# GOOD: Model fills a typed field; your code gets a float, always
tool_use = next(b for b in response.content if b.type == "tool_use")
price = tool_use.input["price"] # typed, validated, no parsing
Common mistakes
- Not specifying tool_choice: When you need a specific tool called, set tool_choice explicitly. Without it, the model may answer in plain text.
- Over-broad tool descriptions: Vague descriptions like “do anything with files” confuse the model. Write descriptions as if documenting a public API: what it does, what it doesn’t do, and when to use it.
- Not handling end_turn in tool loops: If you build an agentic loop, always check stop_reason. A missing end_turn check creates an infinite loop.
- Ignoring required vs optional fields: Fields not in required may be omitted. Code that accesses them without a default will crash.
- Registering all tools for every request: See the production gotcha in Layer 1.
Provider comparison: function calling API shapes
The tool / function calling concept is the same across providers, but the request and response fields differ:
| | Anthropic | OpenAI |
|---|---|---|
| Request field | tools | tools |
| Force a specific tool | tool_choice: {"type": "tool", "name": "..."} | tool_choice: {"type": "function", "function": {"name": "..."}} |
| Model chose a tool | stop_reason == "tool_use" | finish_reason == "tool_calls" |
| Tool call in response | block.type == "tool_use" (in response.content) | choice.message.tool_calls[n] |
| Tool call ID | block.id | tool_call.id |
| Tool arguments | block.input (dict) | tool_call.function.arguments (JSON string: parse it) |
| Result message role | "user" | "tool" |
| Result format | {"type": "tool_result", "tool_use_id": id, "content": "..."} | {"role": "tool", "tool_call_id": id, "content": "..."} |
The tool definition schema itself (name, description, parameters / input_schema using JSON Schema) is nearly identical: both providers follow the same JSON Schema vocabulary. The main structural difference is parameters (OpenAI) vs input_schema (Anthropic) as the key name.
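A sketch of that structural rename, converting the Anthropic-style definition used earlier into the OpenAI function-calling shape. The helper name is illustrative and the conversion covers only the fields in the table, not every provider-specific option.

```python
# Illustrative: Anthropic tool definition → OpenAI function-calling shape.
def to_openai_tool(tool: dict) -> dict:
    """Nest the definition under a function wrapper; input_schema → parameters."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["input_schema"],  # same JSON Schema body
        },
    }

anthropic_tool = {
    "name": "extract_contact",
    "description": "Extract contact information from text.",
    "input_schema": {
        "type": "object",
        "properties": {"name": {"type": "string"}},
        "required": ["name"],
    },
}
openai_tool = to_openai_tool(anthropic_tool)
```

On the response side, remember the table's last asymmetry: OpenAI returns tool arguments as a JSON string (tool_call.function.arguments), so a json.loads is still needed there, while Anthropic's block.input is already a dict.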
Layer 3: Deep Dive
Schema design
Tool input schemas follow JSON Schema. A few design principles that matter in practice:
Use enums to constrain free-text fields:
{
"status": {
"type": "string",
"enum": ["open", "in_progress", "resolved", "closed"],
"description": "Current ticket status"
}
}
Without an enum, the model may invent valid-sounding but unexpected values ("pending", "active"). Enums collapse the output space to exactly the values your downstream code handles.
Be specific in descriptions:
{
"date": {
"type": "string",
"description": "Date in ISO 8601 format (YYYY-MM-DD). Use today's date if the user says 'today'."
}
}
The model reads descriptions at inference time. Precise descriptions reduce ambiguity and reduce the need to re-prompt when the model guesses wrong.
Mark genuinely optional fields as not required:
If a field is truly optional, omit it from required. The model will include it when the information is present and omit it when it isn’t, which is more reliable than having it guess a null/empty value.
Multi-tool calls in a single turn
The model can call multiple tools in a single response when the tasks are independent. This is more efficient than a serial loop:
# The model may return multiple tool_use blocks in one response
# when it determines the calls are parallelisable
for block in response.content:
    if block.type == "tool_use":
        # Execute concurrently if your implementation supports it
        result = run_tool(block.name, block.input)
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": result,
        })
Treat each tool_use block independently; collect all results before returning them to the model as a single user turn.
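One way to sketch the concurrent variant with a thread pool. The dict blocks here are stand-ins for the SDK's tool_use content blocks, and run_tool is the dispatcher from the Layer 2 example; the helper name is illustrative.

```python
# Sketch: run independent tool calls from one turn concurrently.
from concurrent.futures import ThreadPoolExecutor

def run_tools_concurrently(tool_blocks: list[dict], run_tool) -> list[dict]:
    """Execute every tool call in parallel; results keep the original order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(lambda b: run_tool(b["name"], b["input"]), tool_blocks))
    return [
        {"type": "tool_result", "tool_use_id": b["id"], "content": r}
        for b, r in zip(tool_blocks, results)
    ]
```

Because map preserves input order, each tool_result still lines up with its tool_use_id even when the calls finish out of order.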
Error handling in tool loops
Tool execution can fail. Return errors to the model in the result: don’t throw exceptions that break the loop. The model can often recover:
def run_tool_safe(name: str, inputs: dict) -> str:
    try:
        return run_tool(name, inputs)
    except Exception as e:
        # Return the error as a tool result — the model can decide what to do
        return json.dumps({"error": str(e)})
Also implement a maximum turn limit. A misbehaving tool or ambiguous task can produce a loop where the model keeps calling tools without reaching end_turn. A limit of 10–20 turns covers the vast majority of legitimate workflows.
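A sketch of such a cap. Here `step` stands in for one iteration of the Layer 2 loop (one model call plus tool execution) and returns the final answer string, or None to keep looping; both names are illustrative, not SDK APIs.

```python
# Sketch: a hard cap on agent turns to contain runaway loops.
MAX_TURNS = 15

def run_with_turn_limit(step, max_turns: int = MAX_TURNS) -> str:
    for _ in range(max_turns):
        answer = step()
        if answer is not None:
            return answer
    raise RuntimeError(f"agent did not finish within {max_turns} turns")
```

In the Layer 2 loop, the equivalent change is replacing `while True:` with `for _ in range(MAX_TURNS):` and raising after the loop.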
Tool use vs fine-tuning for structured output
For applications that need consistent structured output, tool use with a schema is almost always preferable to fine-tuning:
| | Tool use | Fine-tuning |
|---|---|---|
| Schema changes | Update the tool definition | Re-train and re-deploy |
| New fields | Add to schema | Collect examples, re-train |
| Validation | API-enforced | Must validate yourself |
| Cost | Standard inference | Training + higher per-token cost |
Fine-tuning for structured output makes sense only when the output has highly domain-specific patterns (e.g. a proprietary format) that are hard to express in a JSON schema or prompt.
The tool use ↔ agent relationship
Everything in the agents track builds on this loop:
user message
↓
model decides: respond or call tool?
↓ (tool)
your code executes the tool
↓
result returned to model
↓
model decides: respond or call another tool?
↓ (respond)
final answer to user
The sophistication of an agent is largely a function of the tools it has access to and how well those tools are defined. A well-designed tool interface is more valuable than a more capable model given a poorly defined one.
Further reading
- Tool use, Anthropic documentation [Anthropic]. Reference for tool definitions, tool_choice, multi-tool calls, and the full request/response lifecycle using Anthropic’s API.
- Function calling, OpenAI documentation [OpenAI]. The equivalent OpenAI reference; useful for understanding the tool_calls response shape and role: "tool" result format.
- JSON Schema specification. The schema language used for tool input definitions across all providers. Understanding type, enum, required, and $ref covers 95% of tool authoring.
- ReAct: Synergizing Reasoning and Acting in Language Models; Yao et al., 2022. The paper that formalised the reason-then-act loop that underlies most tool-use agent patterns.