Roadmap
Covers the system repo only. Corpus repos maintain their own roadmaps. See the CHANGELOG for what's already shipped.
Status legend:
- Shipped — in v0.1.0
- Underway — being worked on now
- Planned — committed, not started
- Considering — open question, may or may not happen
- Out of scope — explicitly not pursuing
v0.1.0 — protocol baseline [shipped 2026-05-09]
The protocol baseline: implementation separated from any domain corpus.
- Parser + L1 structural checker
- L3 cross-atom graph checker
- Runtime atom loader + projection resolver
- HTTP registry with publish / install
- prime CLI with 10 verbs
- Generic MCP server (prime_query over any compiled corpus)
- 3 example corpora demonstrating cross-domain applicability
- Protocol spec at spec/PRIME-PROTOCOL-v1.md
- Apache-2.0 license + NOTICE attributions
v0.2 — lifecycle and AST [planned · Q3 2026]
The two pieces of v1 spec the v0.1 implementation didn't honor.
Lifecycle enforcement
The atom DSL allows version: "1.2.0" and a status field —
active | deprecated | experimental.
Today the parser accepts these but the compiler does nothing with them:
deprecated atoms appear in retrieval without warnings.
Plan: when a deprecated atom is selected by
prime_query, the response includes
warnings: [{ kind: "deprecated", atom_id, message, replacement }].
The runtime API surfaces this; the CLI prints a yellow warning line.
Optional follow-up: prime check --strict-deprecation fails the
build when the corpus contains active references to deprecated
atoms.
Structured type AST
Atoms can declare types using a small expression language: function
signatures, unions, ranges. Today the parser preserves these as opaque
strings — they round-trip but the compiler can't reason over them.
Plan: build a TypeExpr AST and use it to enforce, for
example, that a fact atom's confidence field parses as a
value in 0.0-1.0.
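As a sketch of what that enforcement could look like, here is a tiny parser for just the range form of the type expression language. The grammar and function names are assumptions; the real AST would also cover function signatures and unions.

```python
import re

# Matches a numeric range like "0.0-1.0" (negative bounds allowed).
_RANGE = re.compile(r"^(-?\d+(?:\.\d+)?)-(-?\d+(?:\.\d+)?)$")

def parse_range(expr: str) -> tuple[float, float]:
    """Parse one range-typed expression into (lo, hi) bounds."""
    m = _RANGE.match(expr)
    if not m:
        raise ValueError(f"not a range expression: {expr!r}")
    lo, hi = float(m.group(1)), float(m.group(2))
    if lo > hi:
        raise ValueError(f"empty range: {expr!r}")
    return lo, hi

def check_range(expr: str, value: float) -> bool:
    """True if value satisfies the declared range type."""
    lo, hi = parse_range(expr)
    return lo <= value <= hi
```

With an AST node like this, the compiler can reject a fact atom whose confidence falls outside its declared range instead of round-tripping the type as an opaque string.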
Prime self-evolution [planned · 2026 → 2027]
A corpus that only grows by hand-written PRs goes stale. Real usage is the best signal for what's missing — which atoms get queried with no hit, which intents return low-confidence results, which projections get rewritten by the agent before use. v0.1 throws all that signal away. v0.2+ keeps it, opt-in, and feeds it back into the corpus.
Telemetry ingest API
A small, opt-in HTTP endpoint the runtime can POST to on every
prime_query: intent, kinds asked, ids returned, projection
level, hit / miss, latency. No content, no PII. Off by default; enabled
per-corpus via domain.yaml.
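A sketch of what one telemetry record might serialise to, using illustrative field names (the text above only commits to the categories: intent, kinds, ids, projection level, hit/miss, latency — no content, no PII):

```python
import json
import time

def telemetry_record(intent: str, kinds: list[str], returned_ids: list[str],
                     projection: str, hit: bool, latency_ms: float) -> str:
    """Serialize one opt-in telemetry event for a prime_query call.
    Query shape and outcome only; atom content is never included."""
    return json.dumps({
        "ts": int(time.time()),
        "intent": intent,
        "kinds": kinds,
        "returned_ids": returned_ids,
        "projection": projection,
        "hit": hit,
        "latency_ms": latency_ms,
    })
```

The runtime would POST this body to the configured endpoint after each query when the corpus's domain.yaml opts in.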
Atom-proposal PR-bot from usage gaps
A scheduled job reads the telemetry stream, finds repeated zero-hit
queries, and uses an LLM extractor (DSPy-style program) to propose new
atoms. Output: a draft PR against the corpus repo with a stub
.prime file the maintainers can edit and merge.
Edge inference reflective pass
After authors write atoms, a TextGrad-style pass proposes likely edges
(requires, contradicts,
validates-with) by reflecting on each pair. Maintainers
approve / reject; nothing auto-merges.
Atom-diff viewer in marketplace UI
When a corpus version bumps, show the diff at the atom level: which atoms changed, which edges moved, which projections re-rendered. Helps consumers audit upgrades.
DSPy-style auto-tuning of projection priors
The chunker's projection prior (which fields to keep at summary
vs core vs full) is hand-coded per kind today.
A DSPy-style program could tune those priors per corpus, optimising for
downstream task accuracy.
Moonshot: schema evolution. Atom kinds are fixed by spec today. A corpus accumulating telemetry could propose its own kinds — à la AutoSchemaKG — and bubble them up as v2 spec candidates.
Prime evaluation [planned · 2026 Q4]
A protocol is only as useful as it is measurable. Today, the only evaluation is "does the agent cite the right atoms" — checked by hand on a 20-task benchmark. v0.2 makes this a first-class verb so corpus authors can prove a change improves something.
prime eval CLI verb
A corpus-scoped harness that wraps
Inspect AI
under the hood. Reads an eval/ directory of task definitions,
runs them against a configured agent, scores against expected atom
citations and domain-specific scorers.
MCP-Bench adapter
An adapter that exposes a Skill Wiki corpus to MCP-Bench so corpora can be benchmarked head-to-head against other MCP servers on the same task suite.
Domain-specific scorer plugins
The harness ships with kind-aware scorers; corpora can register more.
For prime-corpus-frontend: axe-core pass-rate, Lighthouse
score, visual-regression delta. For a security corpus: OWASP-rule
pass-rate.
Three-arm A/B harness
prime eval --arms prime,skill,raw runs the same task three
times — once with the corpus mounted via Skill Wiki, once with bulk-loaded
SKILL.md, once with no skill — and reports the deltas. Lets corpus
authors prove the protocol pays its keep.
Citation-precision metric
Of the atoms prime_query returned, how many appeared in the
agent's final output? A high-precision corpus is one whose retrieval is
well-calibrated; a low-precision corpus is over-fetching or under-using.
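The metric itself is simple to compute, assuming the returned atom ids and the set of ids the agent cited are both recoverable from the trace:

```python
def citation_precision(returned: list[str], cited: set[str]) -> float:
    """Fraction of atoms returned by prime_query that appear in the
    agent's final output. 1.0 means perfectly calibrated retrieval;
    low values indicate over-fetching or under-use."""
    if not returned:
        return 0.0
    hits = sum(1 for atom_id in returned if atom_id in cited)
    return hits / len(returned)
```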
Moonshot: a public Prime leaderboard. Corpora register; the harness runs a fixed task suite weekly; results are published with version pinning. Same spirit as HumanEval for code.
References: MCP-Bench · Inspect AI
Prime optimization [planned · 2027]
The v0.1 retrieval path is naïve: load _index.xml, rank,
fetch projections. That's fine at 1k atoms. At 10k it's wasteful; at
100k it stops fitting in context at all. v0.3+ tightens the loop without
changing the protocol surface.
Per-intent edge-graph pruning at query time
Walk the edge graph from the seed atoms outward up to max_depth,
but prune branches whose verb mix doesn't match the intent
(e.g., for an "implementation" intent, drop tradeoff and
provocation edges). Smaller candidate set, same recall on
the relevant kinds.
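The walk described above amounts to a verb-filtered BFS. A minimal sketch, with edges as (src, verb, dst) triples; the per-intent verb sets here are illustrative, not from the spec:

```python
from collections import deque

# Illustrative intent -> allowed-verb mapping.
ALLOWED_VERBS = {
    "implementation": {"requires", "validates-with"},
    "design": {"requires", "tradeoff", "contradicts"},
}

def pruned_walk(edges, seeds, intent, max_depth):
    """BFS outward from the seed atoms, skipping edges whose verb
    doesn't match the intent's allowed set."""
    allowed = ALLOWED_VERBS.get(intent, set())
    out = {}
    for src, verb, dst in edges:
        out.setdefault(src, []).append((verb, dst))
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for verb, dst in out.get(node, []):
            if verb in allowed and dst not in seen:
                seen.add(dst)
                frontier.append((dst, depth + 1))
    return seen
```

For an "implementation" intent, tradeoff and provocation branches are never expanded, so the candidate set stays small without dropping the kinds that intent actually needs.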
LLMLingua-style projection compressor (--compress flag)
A compressor applied to core and full
projections at serve time, selectable per query. Trades a small amount of
fidelity for ~2× tokens saved.
Atom-result cache with content-addressed keys
Keyed by (intent_hash, kinds, max_atoms). Same query inside
a session = zero retrieval cost. Invalidates on corpus recompile.
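A minimal sketch of such a cache, folding the corpus content hash into the key so a recompile invalidates every entry automatically (class and field names are assumptions):

```python
import hashlib
import json

class AtomResultCache:
    """Session cache keyed by (intent_hash, kinds, max_atoms) plus the
    corpus content hash; a recompiled corpus gets a fresh key space."""

    def __init__(self, corpus_hash: str):
        self.corpus_hash = corpus_hash
        self._store: dict[str, list[str]] = {}

    def _key(self, intent: str, kinds: list[str], max_atoms: int) -> str:
        intent_hash = hashlib.sha256(intent.encode()).hexdigest()[:16]
        raw = json.dumps([self.corpus_hash, intent_hash,
                          sorted(kinds), max_atoms])
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, intent, kinds, max_atoms):
        return self._store.get(self._key(intent, kinds, max_atoms))

    def put(self, intent, kinds, max_atoms, atom_ids):
        self._store[self._key(intent, kinds, max_atoms)] = atom_ids
```

Sorting the kinds before hashing makes the key order-insensitive, so ["pattern", "fact"] and ["fact", "pattern"] hit the same entry.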
Multi-Prime composition budget allocation
When multiple corpora are mounted (per the v0.3 multi-corpus MCP), the runtime allocates a token budget across them based on per-corpus intent score, instead of fixed per-corpus quotas.
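Proportional allocation can be sketched in a few lines; how the per-corpus intent score is produced is assumed to live elsewhere:

```python
def allocate_budget(total_tokens: int,
                    intent_scores: dict[str, float]) -> dict[str, int]:
    """Split a token budget across mounted corpora in proportion to
    their intent scores, falling back to an even split when nothing
    scores (e.g., a fully ambiguous intent)."""
    total_score = sum(intent_scores.values())
    if total_score == 0:
        even = total_tokens // max(len(intent_scores), 1)
        return {corpus: even for corpus in intent_scores}
    return {corpus: int(total_tokens * score / total_score)
            for corpus, score in intent_scores.items()}
```

A query that scores 3:1 between a frontend and a security corpus would hand the frontend corpus three quarters of the budget instead of a fixed half.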
Compile-time projection profiles per intent class
Compile a corpus N times, once per intent class (e.g., "design",
"implementation", "review"), producing per-class core
projections that emphasise different fields. Runtime picks the profile
based on intent.
Moonshot: KVzip-style
key-value memory per atom. Cache the decoder KV state for each
core projection at compile time; on retrieval, splice it in
instead of re-encoding. Removes the per-turn re-encoding cost entirely.
Top 5 v0.2 ships
If we ship nothing else in v0.2, these five carry the release:
- prime eval CLI — the harness that makes every other claim measurable.
- Telemetry ingest + atom-proposal bot — closes the corpus-staleness loop.
- Atom-diff viewer — required for users to trust corpus version bumps.
- Per-intent edge-graph pruning — the first retrieval optimisation that pays for itself on day one.
- Citation-precision metric — the single number that tells a corpus author whether their atoms are pulling their weight.
v0.3 — formal domain plugin protocol [planned · Q4 2026]
Today, domain: is a metadata tag on each atom. The runtime
treats frontend-design, security,
recipes identically. The spec already imagines multiple
corpora coexisting under one server, with domain-aware routing — v0.3
lands the plugin interface, multi-corpus MCP, and an optional
auto-disambiguation pass when an intent is ambiguous between corpora.
v0.4 — better registry [planned · 2027]
The v0.1 registry is barebones HTTP: PUT to publish, GET to install. No
semver resolution, no signing, no audit log. v0.4 adds semver-aware
install with prime.lock, cryptographic signing, and an audit
log. A web UI is considered out of scope unless someone actually
self-hosts the public registry.
How priorities are set
Three questions for any addition:
- Does it unblock a corpus team that exists today?
- Does it generalise across corpora? (Protocol-level features must.)
- Does it preserve the "tiny system, expressive corpus" balance? The system repo aims to stay under ~15k LoC.
The fastest path for a feature: prototype in a corpus repo, prove it generalises by porting to a second corpus, then propose the protocol change.