🤖 AI Explained
how agents work / 10 min read

Skills — For Experienced Developers

Just-in-time retrieval-augmented prompting — how skill files work, how to structure them, and how they compare to RAG and fine-tuning.

What Skills Actually Are

Skills are not an Anthropic API feature. You won’t find them in the API docs. They’re a prompt engineering pattern implemented at the host level — a way to inject expert knowledge into the model’s context right before a specific task.

The mechanism is dead simple:

  1. Host app detects what type of task the model is about to do
  2. Host reads a relevant markdown file (e.g., SKILL.md)
  3. File contents are injected into the system prompt or conversation
  4. Model generates output with that expert context loaded

No embeddings. No vector store. No retrieval model. Just a file read and string concatenation.
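The four steps above can be sketched in a few lines. This is a hypothetical host-side helper (the function name and signature are mine, not Anthropic's implementation), but it captures the whole mechanism:

```python
from pathlib import Path

def load_skill(skill_path: str, system_prompt: str) -> str:
    """Read a skill file and append it to the system prompt, if present."""
    path = Path(skill_path)
    if not path.is_file():
        return system_prompt  # no skill file -> prompt is unchanged
    skill_content = path.read_text(encoding="utf-8")
    # The entire "injection" step: plain string concatenation.
    return system_prompt + "\n\n" + skill_content

# prompt = load_skill("./docx/SKILL.md", base_prompt)
```

That really is all of it: one file read, one concatenation, no retrieval machinery.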


The Discovery and Loading Pipeline

User asks: "Create a Word doc summarizing these test results"

Host (Claude Code):
  1. Identifies task type: "document generation" → "docx"
  2. Checks known paths:
     - ./SKILL.md                    (project root)
     - ./docx/SKILL.md              (task-specific)
     - ~/.claude/SKILL.md           (user-level)
     - ./.claude/SKILL.md           (project config dir)
  3. Reads matching file(s)
  4. Injects into context:
     system_prompt += "\n\n" + skill_content

LLM now sees 2,000 words of python-docx best practices
  → generates better code than it would from training data alone

The key insight: this is deterministic. Given the same task type and the same file on disk, the same skill loads every time. No probabilistic retrieval, no relevance scoring — just path matching.


Skills vs Fine-tuning vs RAG

Three ways to give a model domain-specific knowledge. Each has a sweet spot:

Fine-tuning

Permanently bakes knowledge into the model weights through additional training.

Pros: Always available, no context cost, fast inference
Cons: Expensive ($$$), slow iteration (retrain to update),
      hard to scope (affects all outputs), can degrade
      general capabilities (catastrophic forgetting)
When: You need fundamental behavioral changes across ALL tasks

RAG (Retrieval-Augmented Generation)

Semantic search over a document corpus → relevant chunks injected into context.

Pros: Scales to millions of documents, dynamic, no retraining
Cons: Requires infra (embedding model, vector DB, retrieval pipeline),
      retrieval quality is probabilistic, chunking artifacts
When: Large corpus (docs, codebases, knowledge bases) where the
      right context varies per query

Skills

Deterministic file load → exact content injected into context.

Pros: Zero infrastructure, version-controlled, predictable,
      instant iteration (edit file, reload), scoped to task type
Cons: Doesn't scale to large corpora, consumes context budget,
      manual curation required
When: Domain-specific instructions, team conventions, library
      best practices — small, curated, expert knowledge

The decision matrix:

Knowledge type              Size     Changes often?  Use
Team coding standards       Small    Rarely          Skills
API documentation           Large    Quarterly       RAG
Brand voice guidelines      Small    Monthly         Skills
Customer support history    Massive  Daily           RAG
Framework best practices    Medium   Per version     Skills
Fundamental model behavior  N/A      Rarely          Fine-tune

Structuring a Good Skill File

A skill file is just markdown, but structure matters. Here’s what works:

# Python FastAPI SKILL

## When to apply
When generating or modifying FastAPI route handlers, middleware, or
dependency injection code.

## Conventions
- Always use async route handlers (never sync)
- Use Pydantic v2 model_validator, not v1 validator
- Dependency injection via Annotated[Type, Depends(fn)]
- Return type annotations on all handlers

## Error handling
- Use HTTPException for client errors (4xx)
- Use custom exception handlers for domain errors
- Never return raw 500s — always structured error response

## Testing patterns
- Use httpx.AsyncClient, not TestClient
- Fixtures for DB sessions, not global state
- Parametrize with @pytest.mark.parametrize for edge cases

## Anti-patterns to avoid
- Do NOT use @app.on_event("startup") — use lifespan instead
- Do NOT import Settings at module level — inject via Depends
- Do NOT use synchronous DB drivers with async handlers

What makes this effective:

  • Scoped — says exactly when it applies
  • Prescriptive — concrete rules, not vague guidelines
  • Includes anti-patterns — tells the model what NOT to do (models tend to reproduce common mistakes from their training data)
  • Concise — ~200 words, ~250 tokens — minimal context budget cost

Context Budget Management

Skills compete with everything else for the context window:

Context window (200K tokens)
├── System prompt         ~2,000 tokens
├── Skill files           ~2,500 tokens (loaded skill)
├── Conversation history  ~10,000 tokens (ongoing chat)
├── Tool results          ~50,000 tokens (file reads, command output)
└── Available for output  ~135,500 tokens remaining

This looks fine for a single skill. But consider:

  • Loading 3 skills simultaneously: +7,500 tokens
  • A long conversation with file reads: history + results can hit 100K+
  • Now your skill content is competing for space the model needs to reason

Practical guidelines:

  • Keep individual skills under 500 words (~650 tokens)
  • If a skill exceeds 1,000 words, split it into task-specific sub-skills
  • Monitor agent output quality degradation as conversations lengthen — skill bloat is a common cause
  • Prefer “what NOT to do” lists over exhaustive “how to do everything” guides — anti-patterns are more token-efficient
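The word-count guidelines above are easy to enforce mechanically. Here is a hypothetical lint script you could run in CI; the 500- and 1,000-word thresholds come from the guidelines, while the ~1.3 tokens-per-word ratio is a rough rule of thumb, not a real tokenizer:

```python
from pathlib import Path

WARN_WORDS = 500    # "keep individual skills under 500 words"
SPLIT_WORDS = 1000  # "if a skill exceeds 1,000 words, split it"

def lint_skill(path: Path) -> str:
    """Flag skill files that are eating too much context budget."""
    words = len(path.read_text(encoding="utf-8").split())
    est_tokens = int(words * 1.3)  # crude estimate, good enough for a lint
    if words > SPLIT_WORDS:
        return f"SPLIT: {path} is {words} words (~{est_tokens} tokens)"
    if words > WARN_WORDS:
        return f"WARN: {path} is {words} words (~{est_tokens} tokens)"
    return f"OK: {path} ({words} words, ~{est_tokens} tokens)"

# for p in Path(".").rglob("SKILL.md"):
#     print(lint_skill(p))
```

Running this on every PR keeps skill bloat from creeping in silently as files accumulate edits.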

When Skills Are the Wrong Tool

  • Large document corpus → Use RAG. Skills don’t scale past a few thousand words.
  • Dynamic data (live metrics, API responses) → Use MCP resources. Skills are static files.
  • Per-user customization → Skills are repo-wide. For per-user behavior, use system prompt configuration.
  • Complex reasoning patterns → If you’re writing a 5,000-word skill to teach the model a complex workflow, consider whether a tool (structured input/output) would be more reliable.

Key Takeaways

  • Skills are just-in-time file injection — deterministic, zero-infrastructure, version-controlled
  • They fill the gap between fine-tuning (too heavy) and RAG (too complex) for small, curated expert knowledge
  • Context budget is the main constraint — keep skills concise, split when they grow
  • Structure matters: scope the skill, be prescriptive, include anti-patterns
  • Treat skill files like config: PR review, CODEOWNERS, test that they produce the expected output
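"Treat skill files like config" can be made concrete with a repo test. A hypothetical pytest-style check (the required-section names follow the structure recommended above; adjust to your own template):

```python
from pathlib import Path

# Sections every skill in this repo must contain -- an assumed convention.
REQUIRED_SECTIONS = ["## When to apply", "## Anti-patterns"]

def check_skill(path: Path) -> list[str]:
    """Return a list of problems found in one skill file."""
    text = path.read_text(encoding="utf-8")
    problems = []
    for section in REQUIRED_SECTIONS:
        if section not in text:
            problems.append(f"{path}: missing '{section}' section")
    if len(text.split()) > 500:
        problems.append(f"{path}: over the 500-word budget")
    return problems

def test_all_skills():
    for path in Path(".").rglob("SKILL.md"):
        assert check_skill(path) == [], check_skill(path)
```

With this in the test suite, a malformed or bloated skill file fails CI the same way a broken config would.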