What Skills Actually Are
Skills are not an Anthropic API feature. You won’t find them in the API docs. They’re a prompt engineering pattern implemented at the host level — a way to inject expert knowledge into the model’s context right before a specific task.
The mechanism is dead simple:
- Host app detects what type of task the model is about to do
- Host reads a relevant markdown file (e.g., SKILL.md)
- File contents are injected into the system prompt or conversation
- Model generates output with that expert context loaded
No embeddings. No vector store. No retrieval model. Just a file read and string concatenation.
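The mechanism really is that small. A minimal sketch (the function name and the fallback behavior are illustrative assumptions, not Claude Code's actual implementation):

```python
from pathlib import Path

def inject_skill(system_prompt: str, skill_path: str) -> str:
    """Append a skill file's contents to the system prompt, if the file exists."""
    path = Path(skill_path)
    if not path.is_file():
        return system_prompt  # no skill on disk: prompt is unchanged
    return system_prompt + "\n\n" + path.read_text(encoding="utf-8")
```

That is the entire "infrastructure": one file read and one string concatenation.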
The Discovery and Loading Pipeline
    User asks: "Create a Word doc summarizing these test results"

    Host (Claude Code):
      1. Identifies task type: "document generation" → "docx"
      2. Checks known paths:
           - ./SKILL.md          (project root)
           - ./docx/SKILL.md     (task-specific)
           - ~/.claude/SKILL.md  (user-level)
           - ./.claude/SKILL.md  (project config dir)
      3. Reads matching file(s)
      4. Injects into context:
           system_prompt += "\n\n" + skill_content

    LLM now sees 2,000 words of python-docx best practices
    → generates better code than it would from training data alone
The key insight: this is deterministic. Given the same task type and the same file on disk, the same skill loads every time. No probabilistic retrieval, no relevance scoring — just path matching.
Skills vs Fine-tuning vs RAG
Three ways to give a model domain-specific knowledge. Each has a sweet spot:
Fine-tuning
Permanently bakes knowledge into the model weights through additional training.
- Pros: always available, no context cost, fast inference
- Cons: expensive ($$$), slow iteration (retrain to update), hard to scope (affects all outputs), can degrade general capabilities (catastrophic forgetting)
- When: you need fundamental behavioral changes across ALL tasks
RAG (Retrieval-Augmented Generation)
Semantic search over a document corpus → relevant chunks injected into context.
- Pros: scales to millions of documents, dynamic, no retraining
- Cons: requires infra (embedding model, vector DB, retrieval pipeline), retrieval quality is probabilistic, chunking artifacts
- When: large corpus (docs, codebases, knowledge bases) where the right context varies per query
Skills
Deterministic file load → exact content injected into context.
- Pros: zero infrastructure, version-controlled, predictable, instant iteration (edit file, reload), scoped to task type
- Cons: doesn't scale to large corpora, consumes context budget, manual curation required
- When: domain-specific instructions, team conventions, library best practices — small, curated, expert knowledge
The decision matrix:
| Knowledge type | Size | Changes often? | Use |
|---|---|---|---|
| Team coding standards | Small | Rarely | Skills |
| API documentation | Large | Quarterly | RAG |
| Brand voice guidelines | Small | Monthly | Skills |
| Customer support history | Massive | Daily | RAG |
| Framework best practices | Medium | Per version | Skills |
| Fundamental model behavior | N/A | Rarely | Fine-tune |
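The matrix reduces to a small heuristic. A sketch, where the 5,000-word threshold is an illustrative assumption (drawn from the "few thousand words" scaling limit discussed below), not a hard rule:

```python
def pick_mechanism(corpus_words: int, needs_behavior_change: bool) -> str:
    """Rough encoding of the decision matrix above."""
    if needs_behavior_change:
        return "fine-tune"  # fundamental changes across ALL tasks
    if corpus_words > 5_000:
        return "RAG"        # too big for deterministic file injection
    return "skills"         # small, curated, expert knowledge
```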
Structuring a Good Skill File
A skill file is just markdown, but structure matters. Here’s what works:
    # Python FastAPI SKILL

    ## When to apply
    When generating or modifying FastAPI route handlers, middleware, or
    dependency injection code.

    ## Conventions
    - Always use async route handlers (never sync)
    - Use Pydantic v2 model_validator, not v1 validator
    - Dependency injection via Annotated[Type, Depends(fn)]
    - Return type annotations on all handlers

    ## Error handling
    - Use HTTPException for client errors (4xx)
    - Use custom exception handlers for domain errors
    - Never return raw 500s — always structured error response

    ## Testing patterns
    - Use httpx.AsyncClient, not TestClient
    - Fixtures for DB sessions, not global state
    - Parametrize with @pytest.mark.parametrize for edge cases

    ## Anti-patterns to avoid
    - Do NOT use @app.on_event("startup") — use lifespan instead
    - Do NOT import Settings at module level — inject via Depends
    - Do NOT use synchronous DB drivers with async handlers
What makes this effective:
- Scoped — says exactly when it applies
- Prescriptive — concrete rules, not vague guidelines
- Includes anti-patterns — tells the model what NOT to do (models hallucinate common mistakes)
- Concise — under 200 words (~250 tokens), so the context budget cost is minimal
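Keeping skills within budget is easier if you can measure them. A quick word-count check; the 1.3 tokens-per-word ratio is a common rule of thumb for English text, not an exact tokenizer count:

```python
from pathlib import Path

def estimate_skill_tokens(path: str) -> int:
    """Approximate the token cost of a skill file from its word count."""
    words = Path(path).read_text(encoding="utf-8").split()
    return round(len(words) * 1.3)
```

For a real budget, run the file through the tokenizer of the model you target; this estimate is only for catching bloat at review time.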
Context Budget Management
Skills compete with everything else for the context window:
    Context window (200K tokens)
    ├── System prompt           ~2,000 tokens
    ├── Skill files             ~2,500 tokens  (loaded skill)
    ├── Conversation history   ~10,000 tokens  (ongoing chat)
    ├── Tool results           ~50,000 tokens  (file reads, command output)
    └── Available for output  ~135,500 tokens remaining
This looks fine for a single skill. But consider:
- Loading 3 skills simultaneously: +7,500 tokens
- A long conversation with file reads: history + results can hit 100K+
- Now your skill content is competing for space the model needs to reason
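The arithmetic above can be written out directly; the default figures are the illustrative numbers from the breakdown, not measured values:

```python
CONTEXT_WINDOW = 200_000  # tokens

def remaining_budget(system=2_000, skills=2_500, history=10_000, tools=50_000):
    """Tokens left for the model's reasoning and output."""
    return CONTEXT_WINDOW - (system + skills + history + tools)
```

Loading three skills (`skills=7_500`) cuts the remainder by 5,000 tokens before the conversation even starts; a long session with heavy tool output erodes it much faster.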
Practical guidelines:
- Keep individual skills under 500 words (~650 tokens)
- If a skill exceeds 1,000 words, split it into task-specific sub-skills
- Monitor agent output quality degradation as conversations lengthen — skill bloat is a common cause
- Prefer “what NOT to do” lists over exhaustive “how to do everything” guides — anti-patterns are more token-efficient
When Skills Are the Wrong Tool
- Large document corpus → Use RAG. Skills don’t scale past a few thousand words.
- Dynamic data (live metrics, API responses) → Use MCP resources. Skills are static files.
- Per-user customization → Skills are repo-wide. For per-user behavior, use system prompt configuration.
- Complex reasoning patterns → If you’re writing a 5,000-word skill to teach the model a complex workflow, consider whether a tool (structured input/output) would be more reliable.
Key Takeaways
- Skills are just-in-time file injection — deterministic, zero-infrastructure, version-controlled
- They fill the gap between fine-tuning (too heavy) and RAG (too complex) for small, curated expert knowledge
- Context budget is the main constraint — keep skills concise, split when they grow
- Structure matters: scope the skill, be prescriptive, include anti-patterns
- Treat skill files like config: PR review, CODEOWNERS, test that they produce the expected output
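One sketch of what "test them like config" can mean in practice. The required sections and the 500-word limit are assumptions taken from the structure and guidelines earlier in this piece, not a standard:

```python
from pathlib import Path

REQUIRED_SECTIONS = ("## When to apply", "## Anti-patterns to avoid")

def check_skill_file(path: Path) -> list[str]:
    """Return a list of problems; an empty list means the skill passes review."""
    text = path.read_text(encoding="utf-8")
    problems = [f"missing '{s}'" for s in REQUIRED_SECTIONS if s not in text]
    if len(text.split()) > 500:
        problems.append("over the 500-word budget")
    return problems
```

Wired into CI as a test that asserts `check_skill_file(p) == []` for every SKILL.md in the repo, skill files get the same review gate as any other config.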