Skill Wiki v0.1.0


Roadmap

This roadmap covers the system repo only; corpus repos maintain their own roadmaps. See the CHANGELOG for what has already shipped.

Status legend:

  • Shipped — in v0.1.0
  • Underway — being worked on now
  • Planned — committed, not started
  • Considering — open question, may or may not happen
  • Out of scope — explicitly not pursuing

v0.1.0 — protocol baseline [shipped 2026-05-09]

The protocol baseline: implementation separated from any domain corpus.

  • Parser + L1 structural checker
  • L3 cross-atom graph checker
  • Runtime atom loader + projection resolver
  • HTTP registry with publish / install
  • prime CLI with 10 verbs
  • Generic MCP server (prime_query over any compiled corpus)
  • 3 example corpora demonstrating cross-domain applicability
  • Protocol spec at spec/PRIME-PROTOCOL-v1.md
  • Apache-2.0 license + NOTICE attributions

v0.2 — lifecycle and AST [planned · Q3 2026]

The two pieces of the v1 spec that the v0.1 implementation didn't honor.

Lifecycle enforcement

The atom DSL allows version: "1.2.0" and a status field — active | deprecated | experimental. Today the parser accepts these but the compiler does nothing with them: deprecated atoms appear in retrieval without warnings.

Plan: when a deprecated atom is selected by prime_query, the response includes warnings: [{ kind: "deprecated", atom_id, message, replacement }]. The runtime API surfaces this; the CLI prints a yellow warning line. Optional follow-up: prime check --strict-deprecation fails the build when the corpus contains active references to deprecated atoms.
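A minimal sketch of how that `warnings` array could be built from query results. The payload shape comes from the plan above; the `status` and `replaced_by` atom fields are assumptions, not spec:

```python
def attach_deprecation_warnings(selected_atoms):
    """Build the planned `warnings` array for a prime_query response.

    `selected_atoms` is a list of atom dicts. The `status` and
    `replaced_by` field names are illustrative assumptions.
    """
    return [
        {
            "kind": "deprecated",
            "atom_id": atom["id"],
            "message": f"atom {atom['id']} is deprecated",
            "replacement": atom.get("replaced_by"),  # None if no successor
        }
        for atom in selected_atoms
        if atom.get("status") == "deprecated"
    ]
```

The CLI would render each entry as one yellow warning line; `--strict-deprecation` would turn a non-empty array into a build failure.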

Structured type AST

Atoms can declare types using a small expression language: function signatures, unions, ranges. Today the parser preserves these as opaque strings — they round-trip but the compiler can't reason over them. Plan: build a TypeExpr AST and use it to enforce that, for example, a fact atom's confidence: field parses as a value in 0.0-1.0.
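A sketch of the range corner of that AST, showing the intended compiler check on the 0.0-1.0 example. The node name `RangeExpr` and the grammar subset are assumptions; the real TypeExpr AST would also cover signatures and unions:

```python
import re
from dataclasses import dataclass

@dataclass
class RangeExpr:
    """One TypeExpr node: an inclusive numeric range like 0.0-1.0."""
    lo: float
    hi: float

    def accepts(self, value: float) -> bool:
        return self.lo <= value <= self.hi

def parse_type_expr(src: str) -> RangeExpr:
    """Parse only the numeric-range subset of the type language.
    (Grammar and node names are illustrative, not spec.)"""
    m = re.fullmatch(r"(-?\d+(?:\.\d+)?)-(-?\d+(?:\.\d+)?)", src)
    if not m:
        raise ValueError(f"unsupported type expression: {src!r}")
    return RangeExpr(lo=float(m.group(1)), hi=float(m.group(2)))
```

With this in place, the compiler can reject a fact atom whose confidence: is 1.3 instead of silently round-tripping the string.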

Prime self-evolution [planned · 2026 → 2027]

A corpus that only grows by hand-written PRs goes stale. Real usage is the best signal for what's missing — which atoms get queried with no hit, which intents return low-confidence results, which projections get rewritten by the agent before use. v0.1 throws all that signal away. v0.2+ keeps it, opt-in, and feeds it back into the corpus.

Telemetry ingest API

A small, opt-in HTTP endpoint the runtime can POST to on every prime_query: intent, kinds asked, ids returned, projection level, hit / miss, latency. No content, no PII. Off by default; enabled per-corpus via domain.yaml.
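A sketch of the record the runtime could POST per query, under the constraints above (no content, no PII). All field names are assumptions; the point is that only hashes, ids, and numbers leave the runtime:

```python
import time
import uuid

def telemetry_event(intent_hash, kinds, returned_ids, projection,
                    hit, latency_ms):
    """Build one opt-in telemetry record for a prime_query call.

    Field names are illustrative. Note the raw intent text never
    appears here: only its hash.
    """
    return {
        "event_id": str(uuid.uuid4()),
        "ts": int(time.time()),
        "intent_hash": intent_hash,      # hash only, never raw text
        "kinds": sorted(kinds),          # kinds the query asked for
        "returned_ids": list(returned_ids),
        "projection": projection,        # "summary" | "core" | "full"
        "hit": hit,
        "latency_ms": latency_ms,
    }
```

The zero-hit case (`hit=False`, empty `returned_ids`) is exactly the signal the proposal bot below consumes.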

Atom-proposal PR-bot from usage gaps

A scheduled job reads the telemetry stream, finds repeated zero-hit queries, and uses an LLM extractor (DSPy-style program) to propose new atoms. Output: a draft PR against the corpus repo with a stub .prime file the maintainers can edit and merge.

Edge inference reflective pass

After authors write atoms, a TextGrad-style pass proposes likely edges (requires, contradicts, validates-with) by reflecting on each pair. Maintainers approve / reject; nothing auto-merges.

Atom-diff viewer in marketplace UI

When a corpus version bumps, show the diff at the atom level: which atoms changed, which edges moved, which projections re-rendered. Helps consumers audit upgrades.

DSPy-style auto-tuning of projection priors

The chunker's projection prior (which fields to keep at summary vs core vs full) is hand-coded per kind today. A DSPy-style program could tune those priors per corpus, optimising for downstream task accuracy.

Moonshot: schema evolution. Atom kinds are fixed by spec today. A corpus accumulating telemetry could propose its own kinds — à la AutoSchemaKG — and bubble them up as v2 spec candidates.

References: DSPy · TextGrad

Prime evaluation [planned · 2026 Q4]

A protocol is only as useful as it is measurable. Today, the only evaluation is "does the agent cite the right atoms" — checked by hand on a 20-task benchmark. v0.2 makes this a first-class verb so corpus authors can prove a change improves something.

prime eval CLI verb

A corpus-scoped harness that wraps Inspect AI under the hood. Reads an eval/ directory of task definitions, runs them against a configured agent, scores against expected atom citations and domain-specific scorers.

MCP-Bench adapter

An adapter that exposes a Skill Wiki corpus to MCP-Bench so corpora can be benchmarked head-to-head against other MCP servers on the same task suite.

Domain-specific scorer plugins

The harness ships with kind-aware scorers; corpora can register more. For prime-corpus-frontend: axe-core pass-rate, Lighthouse score, visual-regression delta. For a security corpus: OWASP-rule pass-rate.

Three-arm A/B harness

prime eval --arms prime,skill,raw runs the same task three times — once with the corpus mounted via Skill Wiki, once with bulk-loaded SKILL.md, once with no skill — and reports the deltas. Lets corpus authors prove the protocol pays its keep.

Citation-precision metric

Of the atoms prime_query returned, how many appeared in the agent's final output? A high-precision corpus is one whose retrieval is well-calibrated; a low-precision corpus is over-fetching or under-using.
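The metric itself is a one-liner; a sketch, with the helper name assumed:

```python
def citation_precision(returned_ids, cited_ids):
    """Fraction of atoms returned by prime_query that the agent
    actually cited in its final output. Returns 0.0 for an empty
    retrieval. (Function name is an assumption.)"""
    returned = set(returned_ids)
    if not returned:
        return 0.0
    return len(returned & set(cited_ids)) / len(returned)
```

A score near 1.0 means retrieval is well-calibrated; a low score means the corpus is over-fetching, the agent is under-using it, or both.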

Moonshot: a public Prime leaderboard. Corpora register; the harness runs a fixed task suite weekly; results are published with version pinning. Same spirit as HumanEval for code.

References: MCP-Bench · Inspect AI

Prime optimization [planned · 2027]

The v0.1 retrieval path is naïve: load _index.xml, rank, fetch projections. That's fine at 1k atoms. At 10k it's wasteful; at 100k it stops fitting in context at all. v0.3+ tightens the loop without changing the protocol surface.

Per-intent edge-graph pruning at query time

Walk the edge graph from the seed atoms outward up to max_depth, but prune branches whose verb mix doesn't match the intent (e.g., for an "implementation" intent, drop tradeoff and provocation edges). Smaller candidate set, same recall on the relevant kinds.
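A sketch of that pruned walk as a depth-bounded BFS. The intent-to-verbs mapping and the edge representation are assumptions, not the runtime's actual data model:

```python
from collections import deque

# Which edge verbs are worth following per intent class.
# (Mapping is illustrative; the real table would be corpus-tunable.)
INTENT_VERBS = {
    "implementation": {"requires", "validates-with"},
    "design": {"requires", "tradeoff", "contradicts"},
}

def pruned_walk(seeds, edges, intent, max_depth):
    """BFS outward from seed atoms up to max_depth, following only
    edges whose verb matches the intent's verb set.

    `edges` maps atom_id -> list of (verb, target_id) pairs.
    Returns the candidate set of atom ids.
    """
    allowed = INTENT_VERBS.get(intent, set())
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        atom, depth = frontier.popleft()
        if depth == max_depth:
            continue  # don't expand past the depth budget
        for verb, target in edges.get(atom, []):
            if verb in allowed and target not in seen:
                seen.add(target)
                frontier.append((target, depth + 1))
    return seen
```

For an "implementation" intent this drops entire tradeoff/provocation subtrees before ranking ever sees them.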

LLMLingua-style projection compressor (--compress flag)

A compressor applied to core and full projections at serve time, selectable per query. Trades a small amount of fidelity for ~2× tokens saved.

Atom-result cache with content-addressed keys

Keyed by (intent_hash, kinds, max_atoms). Same query inside a session = zero retrieval cost. Invalidates on corpus recompile.
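A sketch of that cache, assuming the key tuple above plus a corpus compile hash for invalidation (class and method names are illustrative):

```python
import hashlib
import json

class AtomResultCache:
    """Session-scoped result cache keyed by
    (intent_hash, kinds, max_atoms), cleared wholesale when the
    corpus compile hash changes. (Names are assumptions.)"""

    def __init__(self, corpus_hash):
        self.corpus_hash = corpus_hash
        self._store = {}

    def _key(self, intent_hash, kinds, max_atoms):
        # Content-addressed: the corpus hash is part of the key input,
        # and kinds are sorted so order doesn't split cache entries.
        raw = json.dumps(
            [self.corpus_hash, intent_hash, sorted(kinds), max_atoms])
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, intent_hash, kinds, max_atoms):
        return self._store.get(self._key(intent_hash, kinds, max_atoms))

    def put(self, intent_hash, kinds, max_atoms, result):
        self._store[self._key(intent_hash, kinds, max_atoms)] = result

    def on_recompile(self, new_corpus_hash):
        if new_corpus_hash != self.corpus_hash:
            self.corpus_hash = new_corpus_hash
            self._store.clear()
```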

Multi-Prime composition budget allocation

When multiple corpora are mounted (per the v0.3 multi-corpus MCP), the runtime allocates a token budget across them based on per-corpus intent score, instead of fixed per-corpus quotas.
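A sketch of a proportional split, with flooring leftovers handed to the top-scoring corpus. The allocation policy and names are assumptions; the real allocator might also enforce per-corpus floors:

```python
def allocate_budget(total_tokens, corpus_scores):
    """Split a token budget across mounted corpora in proportion to
    per-corpus intent score. `corpus_scores` maps corpus name -> score.
    (Sketch; policy and names are assumptions.)"""
    total_score = sum(corpus_scores.values())
    if total_score == 0:
        # no signal: fall back to an even split
        even = total_tokens // len(corpus_scores)
        return {name: even for name in corpus_scores}
    alloc = {name: int(total_tokens * s / total_score)
             for name, s in corpus_scores.items()}
    # flooring can leave a few tokens unassigned; give them to the
    # highest-scoring corpus
    leftover = total_tokens - sum(alloc.values())
    if leftover:
        best = max(corpus_scores, key=corpus_scores.get)
        alloc[best] += leftover
    return alloc
```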

Compile-time projection profiles per intent class

Compile a corpus N times, once per intent class (e.g., "design", "implementation", "review"), producing per-class core projections that emphasise different fields. Runtime picks the profile based on intent.

Moonshot: KVzip-style key-value memory per atom. Cache the decoder KV state for each core projection at compile time; on retrieval, splice it in instead of re-encoding. Removes the per-turn re-encoding cost entirely.

References: LLMLingua · KVzip

Top 5 v0.2 ships

If we ship nothing else in v0.2, these five carry the release:

  1. prime eval CLI — the harness that makes every other claim measurable.
  2. Telemetry ingest + atom-proposal bot — closes the corpus-staleness loop.
  3. Atom-diff viewer — required for users to trust corpus version bumps.
  4. Per-intent edge-graph pruning — the first retrieval optimisation that pays for itself on day one.
  5. Citation-precision metric — the single number that tells a corpus author whether their atoms are pulling their weight.

v0.3 — formal domain plugin protocol [planned · Q4 2026]

Today, domain: is a metadata tag on each atom. The runtime treats frontend-design, security, recipes identically. The spec already imagines multiple corpora coexisting under one server, with domain-aware routing — v0.3 lands the plugin interface, multi-corpus MCP, and an optional auto-disambiguation pass when an intent is ambiguous between corpora.

v0.4 — better registry [planned · 2027]

The v0.1 registry is barebones HTTP: PUT to publish, GET to install. No semver resolution, no signing, no audit log. v0.4 adds semver-aware install with prime.lock, cryptographic signing, and an audit log. A web UI is considered out of scope unless someone actually self-hosts the public registry.

How priorities are set

Three questions for any addition:

  1. Does it unblock a corpus team that exists today?
  2. Does it generalise across corpora? (Protocol-level features must.)
  3. Does it preserve the "tiny system, expressive corpus" balance? The system repo aims to stay under ~15k LoC.

The fastest path for a feature: prototype in a corpus repo, prove it generalises by porting to a second corpus, then propose the protocol change.