🤖 AI Explained
6 min read

Agent Failure Modes

Agents fail in ways that are qualitatively different from single API calls: errors compound, loops consume unbounded resources, and failures can be invisible until they cause damage. This module catalogues the failure modes and the structural mitigations for each.

Layer 1: Surface

Agent failures are harder to detect than API failures. A 500 error is obvious. An agent that completes with a confident wrong answer because step 2 retrieved the wrong data and every subsequent step built on it: that is invisible without deliberate instrumentation.

The five failure modes:

| Failure mode | What happens | Why it is dangerous |
| --- | --- | --- |
| Infinite loop | Agent calls tools repeatedly without reaching a conclusion | Unbounded cost; context window fills; session never returns |
| Hallucinated tool call | Model invents tool arguments that pass schema but are factually wrong | Wrong action taken with no error signal |
| Compounding errors | Wrong output from step N feeds step N+1 | Error amplifies across the task; hard to trace at the end |
| Context overflow | Accumulated history exceeds context window | Model loses access to early context; behaviour degrades silently |
| Stuck state | Agent is waiting for something that will never arrive | Session hangs; resource held indefinitely |
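Most of the structural mitigations that follow reduce to enforcing explicit budgets. As a sketch, the limits could be gathered into one config object (the field names and defaults here are illustrative, not prescribed):

```python
from dataclasses import dataclass

@dataclass
class AgentBudgets:
    """Hard limits an agent run must not exceed (illustrative defaults)."""
    max_steps: int = 10                     # caps runaway loops
    max_context_tokens: int = 100_000       # triggers compression before overflow
    max_inactivity_seconds: float = 300.0   # stuck-state timeout
    max_identical_calls: int = 2            # repeated-signature threshold
```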

Layer 2: Guided

Infinite loop detection

from collections import Counter
import hashlib

class LoopDetector:
    def __init__(self, window: int = 5, threshold: int = 2):
        self.window = window
        self.threshold = threshold
        self._history: list[str] = []

    def record(self, tool_name: str, arguments: dict) -> bool:
        """Record a tool call and return True if a loop is detected."""
        # Normalise key order so reordered-but-identical arguments match
        sig = hashlib.md5(
            f"{tool_name}:{sorted(arguments.items())}".encode()
        ).hexdigest()
        self._history.append(sig)

        recent = self._history[-self.window:]
        counts = Counter(recent)
        if any(c >= self.threshold for c in counts.values()):
            return True
        return False

def run_agent_with_loop_detection(goal: str, tools: list[dict]) -> str:
    messages = [{"role": "user", "content": goal}]
    detector = LoopDetector()

    for step in range(10):
        response = llm.chat(model="balanced", messages=messages, tools=tools)
        if response.stop_reason == "end_turn":
            return response.text

        messages.append({"role": "assistant", "content": response.content})

        for tc in response.tool_calls:
            if detector.record(tc.name, tc.arguments):
                # Inject a corrective message instead of continuing blindly
                messages.append({"role": "user", "content": [{
                    "type": "tool_result",
                    "tool_use_id": tc.id,
                    "content": (
                        "You have called this tool with the same arguments multiple times. "
                        "The results are not giving you what you need. "
                        "Try a different approach or conclude with what you have."
                    )
                }]})
                continue

            result = execute_tool(tc.name, tc.arguments)
            messages.append({"role": "user", "content": [{
                "type": "tool_result",
                "tool_use_id": tc.id,
                "content": result,
            }]})

    return "Task incomplete: loop or step limit reached."

Hallucinated tool arguments

The model passes structurally valid arguments that are factually invented: an order ID that doesn't exist, a date in the wrong format, a search query that constructs a URL the model made up.

def execute_with_verification(tool_name: str, arguments: dict) -> str:
    # 1. Schema validation (catches structural errors)
    validate_schema(tool_name, arguments)

    # 2. Semantic validation (catches hallucinated values)
    validators = SEMANTIC_VALIDATORS.get(tool_name, {})
    for param, validator in validators.items():
        if param in arguments:
            valid, reason = validator(arguments[param])
            if not valid:
                return (
                    f"Error: argument '{param}' failed validation: {reason}. "
                    f"Verify the correct value and try again."
                )

    return TOOL_REGISTRY[tool_name](**arguments)

import re

# Example semantic validators
SEMANTIC_VALIDATORS = {
    "get_customer_order": {
        "order_id": lambda v: (
            (True, "") if re.match(r"^ORD-\d{8}$", v)
            else (False, f"expected format ORD-XXXXXXXX, got {v!r}")
        )
    },
    "lookup_user": {
        "email": lambda v: (
            (True, "") if "@" in v and "." in v.split("@")[-1]
            else (False, f"{v!r} does not look like a valid email")
        )
    },
}

Return validation errors as tool results so the model can self-correct in the next step.
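For instance, a validation failure might be packaged like this (the message shape mirrors the loop example above; the `tool_use_id` plumbing is assumed):

```python
def validation_error_as_tool_result(tool_use_id: str, error: str) -> dict:
    """Wrap a validation error so the model sees it as an ordinary tool result."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": error,
        }],
    }
```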

Compounding error detection

Add checkpoint verification between high-stakes steps:

def verify_intermediate_result(
    step_description: str,
    result: str,
    expected_properties: list[str],
) -> tuple[bool, str]:
    """Ask a fast model to verify that an intermediate result meets basic criteria."""
    check = llm.chat(
        model="fast",
        messages=[{
            "role": "user",
            "content": f"""Verify this result from: "{step_description}"

Result:
{result[:1000]}

Does the result satisfy ALL of these:
{chr(10).join(f'- {p}' for p in expected_properties)}

Answer YES or NO, then briefly explain."""
        }]
    )
    passed = check.text.strip().upper().startswith("YES")
    return passed, check.text

# Use in multi-step tasks before irreversible actions
def execute_plan_with_checkpoints(plan: list[dict], tools: list[dict]) -> str:
    results = {}
    for step in plan:
        results[step["id"]] = execute_step(step, tools, context=results)

        if step.get("checkpoint_before_next"):
            passed, reason = verify_intermediate_result(
                step["description"],
                results[step["id"]],
                step["checkpoint_criteria"],
            )
            if not passed:
                raise StepVerificationError(
                    f"Step {step['id']} failed verification: {reason}"
                )

    return synthesise(results)

Context overflow management

def estimate_tokens(messages: list[dict]) -> int:
    """Rough estimate: 1 token ≈ 4 characters."""
    total_chars = sum(
        len(str(m.get("content", ""))) for m in messages
    )
    return total_chars // 4

def compress_context(messages: list[dict], target_tokens: int) -> list[dict]:
    """Summarise the middle of the conversation to stay within budget."""
    if estimate_tokens(messages) <= target_tokens:
        return messages

    # Always keep: system prompt (index 0) and recent N turns
    keep_recent = 6
    if len(messages) <= keep_recent + 1:
        return messages

    core = messages[1:-keep_recent]
    tail = messages[-keep_recent:]

    summary = llm.chat(
        model="fast",
        messages=[{
            "role": "user",
            "content": (
                "Summarise the following agent steps and findings concisely. "
                "Preserve key facts, decisions, and tool results:\n\n"
                + format_messages(core)
            )
        }]
    ).text

    return [
        messages[0],  # system prompt
        {"role": "user", "content": f"[Compressed prior steps]: {summary}"},
        *tail,
    ]

Stuck state detection

import hashlib
import time

class StuckStateDetector:
    def __init__(self, inactivity_threshold: float = 300.0):  # 5 minutes
        self.threshold = inactivity_threshold
        self.last_progress_at = time.monotonic()
        self._last_state_hash = ""

    def record_progress(self, state_snapshot: str):
        current_hash = hashlib.md5(state_snapshot.encode()).hexdigest()
        if current_hash != self._last_state_hash:
            self.last_progress_at = time.monotonic()
            self._last_state_hash = current_hash

    def is_stuck(self) -> bool:
        return (time.monotonic() - self.last_progress_at) > self.threshold

When stuck state is detected: cancel the task, release locks, return a partial result with a clear explanation of where the agent stopped.


Layer 3: Deep Dive

Failure taxonomy and recovery strategies

| Failure | Detection | Recovery |
| --- | --- | --- |
| Infinite loop | Repeated call signatures in sliding window | Inject corrective message; escalate if it persists |
| Hallucinated args | Schema + semantic validation | Return structured error; model self-corrects |
| Compounding error | Checkpoint verification between steps | Replan from last good checkpoint |
| Context overflow | Token count monitoring | Compress middle context; prune old tool results |
| Stuck state | Inactivity timeout on state hash | Cancel task; release resources; return partial |
| Wrong tool selected | Post-call validation of result relevance | Retry with corrective context; escalate |

Cascading failure in multi-agent systems

In a multi-agent system, one agent's failure can cascade:

Orchestrator → Research agent (fails silently, returns partial data)
            → Writer agent (builds on partial data, produces confident wrong output)
            → Review agent (reviews plausible-looking output, approves)
            → User receives wrong answer with high confidence

Containment strategies:

  • Validate at every handoff point (not just at final output)
  • Use explicit confidence scores on intermediate results
  • Route below-threshold confidence results to a human gate before proceeding

Idempotency in failure recovery

When an agent step fails and is retried, it must not duplicate side effects:

class IdempotentStepExecutor:
    def __init__(self, result_cache):
        self.cache = result_cache

    def execute(self, step_id: str, fn, *args, **kwargs) -> str:
        cached = self.cache.get(step_id)
        if cached:
            return cached  # Return previous result β€” do not re-execute

        result = fn(*args, **kwargs)
        self.cache.set(step_id, result)
        return result

Step IDs should be deterministic from the task context: the same logical step in a retried task must reuse the same ID to hit the cache.
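One way to derive such IDs is to hash the stable parts of the step (which fields to hash is a design choice; these three are an assumption):

```python
import hashlib

def step_id(task_id: str, step_index: int, step_description: str) -> str:
    """Derive a stable step ID so a retried task hits the idempotency cache."""
    raw = f"{task_id}:{step_index}:{step_description}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]
```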

Further reading

  • Failure Modes in LLM Agents; empirical study of agent failures across benchmark tasks; useful taxonomy that matches the structure of this module.
  • Risks from Learned Optimization; foundational analysis of how optimising systems fail in unexpected ways; background reading for understanding why agentic failure is structurally different.

Agent Failure Modes: Check your understanding

Q1

An agent calls search_knowledge_base(query='refund policy') five times in a row with no variation. Each call returns the same empty result. The loop continues until the context window fills. What two mitigations would have prevented this?

Q2

An agent calls delete_record(record_id='REC-99999'). The record ID looks syntactically valid but doesn't exist. Schema validation passes. The deletion API returns a 404. Which validation layer catches this, and which doesn't?

Q3

Step 2 of a 5-step task returns incorrect pricing data. The agent uses this to complete steps 3, 4, and 5 with high confidence. The final output contains wrong prices, cited correctly from the wrong intermediate result. Which failure mode is this?

Q4

An agent accumulates 40 tool results and reasoning traces over a long session. You notice the model begins ignoring instructions from the system prompt that it was following earlier in the session. What is the most likely cause?

Q5

A step in an agent task is retried after a crash. To prevent duplicate side effects, the retry must not re-execute actions that already completed. What mechanism provides this guarantee?