Skill Wiki v0.1.0


The bulk-loading problem

The pattern

The default way to teach a frontier-model agent something is to write a markdown file (a "skill"), drop it in .claude/skills/ or an equivalent directory, and have it injected into the system prompt whenever the agent detects relevance. Anthropic's SKILL.md format is the canonical example. A typical skill is 100–500 lines of prose, YAML frontmatter, and worked examples.

This works well at small scale. With 5 skills, the cost is negligible. With 50 skills it's noticeable. With 500 it breaks the agent.

Three measurable failures

Failure 1 — token cost grows with skill count, not task complexity

Every turn loads every potentially-relevant skill. With 60 skills averaging 200 lines each, you are spending ~30k tokens of context on knowledge the agent might not need. Most turns use one or two skills' worth of content, but pay for all 60.

# A real measurement (frontend-design corpus, v3 benchmark)

Bulk-loaded skills:     60 SKILL.md × ~200 lines          = ~30 KB context
Skill Wiki index:       980 atoms × ~30-tok summary line  = ~3 KB context
Atom bodies on demand:                                    = ~600–2,000 tok per turn

Saving:                 ~10×  index reduction
                        ~6×   typical-turn token cost
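Reading the bulk figure as the ~30k tokens quoted above, the arithmetic behind the two ratios can be sketched as follows. The index and per-turn body figures are the measured values restated; treat all of them as illustrative, not normative:

```python
# Figures from the v3 benchmark above, in tokens; index/body costs are the
# measured values, restated here only to show where the ratios come from.
BULK_TOKENS = 30_000   # 60 SKILL.md files, bulk-loaded every turn
INDEX_TOKENS = 3_000   # one summary line per atom, always visible
BODY_TOKENS = 2_000    # upper end of on-demand atom bodies per turn

index_reduction = BULK_TOKENS / INDEX_TOKENS     # ~10x smaller always-on context
typical_turn = INDEX_TOKENS + BODY_TOKENS        # 5,000 tok on a heavy turn
turn_saving = BULK_TOKENS / typical_turn         # ~6x cheaper typical turn

print(index_reduction, turn_saving)
```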

Failure 2 — context pollution beats context budget

The token-budget framing misses the more serious failure: the wrong tokens in the prompt actively mislead the model. A skill on "OWASP input validation" loaded into a prompt about email template copy doesn't just waste tokens — it nudges the model toward security-flavoured responses that don't fit. We measured a quality-score delta of −13 between bulk-loaded and on-demand conditions on the same 20-task benchmark.

A 200-line markdown blob has no boundary. If you want to use just three facts and one rule from a skill, you have to load the whole thing. Coarse-grained knowledge plus eager loading is the failure mode.
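To make the boundary problem concrete, here is a toy cost comparison. The per-skill and per-atom token counts are assumptions for illustration, not measurements:

```python
# Hypothetical sizes: a 200-line SKILL.md at ~10 tok/line; atoms at ~80 tok each.
SKILL_TOKENS = 2_000   # the whole blob: you load all of it or none of it
ATOM_TOKENS = 80       # one fact or one rule, loadable individually

eager_cost = SKILL_TOKENS            # coarse grain + eager loading
lazy_cost = (3 + 1) * ATOM_TOKENS    # three facts + one rule, nothing else

print(eager_cost, lazy_cost)
```

The absolute numbers don't matter; the shape does: eager cost scales with the size of the skill, lazy cost with the size of what the task actually uses.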

Failure 3 — skills can't reference each other meaningfully

The relationship between two skills, in the SKILL.md format, is at best a hyperlink to "see also." There is no machine-checkable way to say "this rule requires that term," "this pattern contradicts that anti-pattern," or "this example validates that fact." So the validator can't catch contradictions, the retriever can't follow dependencies, and the chunker has no kind-aware notion of what to keep vs. drop.

The skill ends up like JavaScript: a typeless blob that happens to do the right thing in the cases the author tested.
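A minimal sketch of what typed edges buy you: when relationships are data rather than prose, a validator can at least catch dangling references, which a "see also" hyperlink never can. All atom ids and edge verbs below are hypothetical:

```python
# Hypothetical atoms; each edge is a (verb, target-atom-id) pair.
atoms = {
    "term/input-boundary": {"kind": "term", "edges": []},
    "rule/validate-input": {"kind": "rule",
                            "edges": [("requires", "term/input-boundary")]},
    "pattern/allowlist": {"kind": "pattern",
                          "edges": [("contradicts", "anti-pattern/denylist")]},
}

def dangling_edges(atoms):
    """Edges whose target atom does not exist: machine-checkable, unlike prose."""
    return [(src, verb, dst)
            for src, atom in atoms.items()
            for verb, dst in atom["edges"]
            if dst not in atoms]

print(dangling_edges(atoms))
# [('pattern/allowlist', 'contradicts', 'anti-pattern/denylist')]
```

The same edge data is what lets a retriever walk `requires` chains and a chunker decide, by kind, what to keep.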

What the alternative needs

The alternative needs three properties, taken together:

  1. Atomic granularity. The unit of knowledge must be small enough that it can be loaded individually. Not "a skill" — one fact, one rule, one pattern, one method.
  2. Typed edges. The relationships between atoms must be expressed as typed verbs (requires, validates-with, contradicts) so a validator can reason over them and a retriever can walk them.
  3. Lazy projection. The agent must see which atoms exist (cheap) without loading their content (expensive). Loading happens by id, on demand, after retrieval picks targets.
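Taken together, the three properties suggest a store shaped roughly like the sketch below. The class and method names are illustrative, not Skill Wiki's actual API:

```python
class AtomStore:
    """Index stays cheap and always visible; bodies load by id, on demand."""

    def __init__(self, atoms):
        self._atoms = atoms   # id -> {"summary": ..., "body": ...}
        self.loads = 0        # how many full bodies we actually paid for

    def index(self):
        # Lazy projection: one summary line per atom, no bodies.
        return {aid: a["summary"] for aid, a in self._atoms.items()}

    def get(self, atom_id):
        # Loading happens by id, after retrieval picks targets.
        self.loads += 1
        return self._atoms[atom_id]["body"]

store = AtomStore({
    "fact/grid-default": {"summary": "grid is the default layout", "body": "..."},
    "rule/no-denylist": {"summary": "prefer allowlists", "body": "..."},
})
assert len(store.index()) == 2 and store.loads == 0  # seeing the index is free
store.get("rule/no-denylist")                        # pay for one body only
assert store.loads == 1
```

Atomic granularity makes `get` meaningful, typed edges (omitted here) tell retrieval which ids to pass to it, and the `index`/`get` split is the lazy projection.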