
`_index.xml` format

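`_index.xml` is the most important artifact at runtime, because it is the first thing an agent sees when it starts. Every Prime must publish one, and the format has been frozen since v1.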

Why XML

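When the consumer is an LLM, clear tag boundaries matter more than visual tidiness. A model has a harder time with JSON's `{` / `}` noise than with explicit `<atom>...</atom>` delimiters. Anthropic's official documentation explicitly recommends using XML tags to structure prompts, so the index follows the same convention.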

Schema

```xml
<?xml version="1.0" encoding="UTF-8"?>
<prime_index version="1.0" total="898" total_tokens="422867">

  <!-- One <cluster> per domain (optional grouping). -->
  <cluster name="accessibility" density="0.42">

    <!-- One <atom> per atom in the Prime. -->
    <atom id="@community/rule-keyboard-accessible"
          kind="rule"
          tokens="312"
          q="4.0">

      <!-- The summary projection (~30 tokens). Inline. -->
      Every interactive control must be operable from the keyboard…

      <!-- All edges out. Self-closing. -->
      <edge type="related"        target="@community/anti-pattern-..."/>
      <edge type="validates-with" target="@w3c/source-wcag-2.1.1"/>
      <edge type="contradicts"    target="@community/anti-pattern-..."/>
    </atom>

    <atom id="..." kind="..." tokens="..." q="...">
      ...
    </atom>

  </cluster>

  <!-- More clusters... -->

</prime_index>
```
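As a rough illustration of how a consumer might read this structure, the sketch below parses a trimmed copy of the schema with Python's standard `xml.etree.ElementTree` into plain dataclasses. `Atom`, `Edge`, and `parse_index` are hypothetical names, not part of the protocol. Putting atoms that sit directly under `<prime_index>` into an ungrouped bucket is an assumption based on clusters being optional.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Trimmed sample following the schema above; the "..." target stays elided.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<prime_index version="1.0" total="1" total_tokens="312">
  <cluster name="accessibility" density="0.42">
    <atom id="@community/rule-keyboard-accessible" kind="rule" tokens="312" q="4.0">
      <!-- The summary projection. Inline. -->
      Every interactive control must be operable from the keyboard.
      <edge type="related"        target="@community/anti-pattern-..."/>
      <edge type="validates-with" target="@w3c/source-wcag-2.1.1"/>
    </atom>
  </cluster>
</prime_index>"""


@dataclass
class Edge:  # hypothetical container, not a protocol type
    type: str
    target: str


@dataclass
class Atom:  # hypothetical container, not a protocol type
    id: str
    kind: str
    tokens: int
    q: float | None
    summary: str
    edges: list[Edge] = field(default_factory=list)


def _to_atom(el: ET.Element) -> Atom:
    q = el.get("q")
    return Atom(
        id=el.get("id"),
        kind=el.get("kind"),
        tokens=int(el.get("tokens")),
        q=float(q) if q is not None else None,  # q is only recommended
        # itertext() skips comments; collapse the indentation whitespace
        summary=" ".join("".join(el.itertext()).split()),
        edges=[Edge(e.get("type"), e.get("target")) for e in el.findall("edge")],
    )


def parse_index(xml_text: str) -> dict[str | None, list[Atom]]:
    """Map cluster name -> atoms. Key None holds atoms outside any cluster
    (an assumption, since clustering is optional)."""
    root = ET.fromstring(xml_text)
    if root.tag != "prime_index":
        raise ValueError(f"unexpected root element <{root.tag}>")
    index: dict[str | None, list[Atom]] = {}
    for cluster in root.findall("cluster"):
        index[cluster.get("name")] = [_to_atom(a) for a in cluster.findall("atom")]
    loose = root.findall("atom")
    if loose:
        index[None] = [_to_atom(a) for a in loose]
    return index


index = parse_index(SAMPLE)
atom = index["accessibility"][0]
print(atom.id, atom.tokens, atom.q)  # @community/rule-keyboard-accessible 312 4.0
print(atom.summary)                  # Every interactive control must be operable from the keyboard.
print([e.type for e in atom.edges])  # ['related', 'validates-with']
```

Keeping the summary as plain text inside `<atom>` means a reader has to gather the element's text nodes rather than read a dedicated child element. `itertext()` handles that and also skips comments.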

Required and recommended attributes

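| Element | Attribute | Required | Description |
|---|---|---|---|
| `prime_index` | `version` | Yes | Protocol version; currently `1.0`. |
| `prime_index` | `total` | Yes | Total number of atoms. |
| `prime_index` | `total_tokens` | Recommended | Sum of token counts across all `full` projections. |
| `cluster` | `name` | Yes | A domain label, or any grouping name you choose. |
| `cluster` | `density` | Recommended | Edge density of this cluster, measured as edges per atom. |
| `atom` | `id` | Yes | Fully qualified id of the form `@scope/kind-name`. |
| `atom` | `kind` | Yes | One of the 28 kinds. |
| `atom` | `tokens` | Yes | Token count of the `full` projection. |
| `atom` | `q` | Recommended | Quality score (0.0–5.0). |
| `edge` | `type` | Yes | One of the 14 verbs. |
| `edge` | `target` | Yes | An atom id; may point into another Prime. |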
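The table translates fairly directly into a checker. The sketch below is one possible reading of it. Missing required attributes are reported as errors and missing recommended ones as warnings. It also checks `version`, the `total` count, and the range of `q`. It does not check `kind` or the edge `type` against the 28 kinds and 14 verbs, because those lists are not reproduced on this page. `validate` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Transcribed from the attribute table above.
REQUIRED = {
    "prime_index": ("version", "total"),
    "cluster": ("name",),
    "atom": ("id", "kind", "tokens"),
    "edge": ("type", "target"),
}
RECOMMENDED = {
    "prime_index": ("total_tokens",),
    "cluster": ("density",),
    "atom": ("q",),
}


def validate(xml_text):
    """Return (errors, warnings) for an _index.xml document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    errors, warnings = [], []
    if root.tag != "prime_index":
        return [f"root is <{root.tag}>, expected <prime_index>"], warnings

    for el in root.iter():
        if not isinstance(el.tag, str):  # skip comments/PIs if a parser keeps them
            continue
        where = f"<atom id={el.get('id')!r}>" if el.tag == "atom" else f"<{el.tag}>"
        for attr in REQUIRED.get(el.tag, ()):
            if el.get(attr) is None:
                errors.append(f"{where}: missing required attribute '{attr}'")
        for attr in RECOMMENDED.get(el.tag, ()):
            if el.get(attr) is None:
                warnings.append(f"{where}: missing recommended attribute '{attr}'")

    version = root.get("version")
    if version is not None and version != "1.0":
        errors.append(f"unsupported version {version!r}")

    atoms = root.findall(".//atom")
    total = root.get("total")
    if total is not None and total != str(len(atoms)):
        errors.append(f"total={total} but found {len(atoms)} atoms")

    for a in atoms:
        q = a.get("q")
        if q is None:
            continue
        try:
            ok = 0.0 <= float(q) <= 5.0
        except ValueError:
            ok = False
        if not ok:
            errors.append(f"atom {a.get('id')!r}: q={q} is not a number in 0.0-5.0")
    return errors, warnings


bad = ('<prime_index version="1.0" total="2"><cluster name="a">'
       '<atom id="@s/rule-x" kind="rule" q="7"/></cluster></prime_index>')
errs, warns = validate(bad)
for line in errs + warns:
    print(line)
```

Splitting the results into errors and warnings follows the table's own split between "Yes" and "Recommended". A build could fail on errors and only log warnings.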

Size budget

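Each `<atom>` entry, including its inline summary and edges, should stay under 50 tokens. A 100-atom corpus compiles to roughly 5 KB, and a 1,000-atom corpus should come in at around 50 KB. For reference, the frontend-design Prime (898 atoms) compiles to a 52 KB `_index.xml`.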
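One way to keep an eye on this budget during a build is to serialize each `<atom>` and estimate its size. The sketch assumes a crude heuristic of about four characters per token (`estimate_tokens` is hypothetical). For real numbers, substitute the tokenizer of the model that actually reads the index.

```python
import math
import xml.etree.ElementTree as ET

ATOM_BUDGET = 50  # tokens per <atom> entry, per the size budget above


def estimate_tokens(text):
    # HYPOTHETICAL heuristic: roughly 4 characters per token.
    return math.ceil(len(text) / 4)


def check_budget(xml_bytes, budget=ATOM_BUDGET):
    """Return (file size in KB, [(atom_id, estimated_tokens), ...] over budget)."""
    root = ET.fromstring(xml_bytes)
    over = []
    for atom in root.iter("atom"):
        # tostring() also emits the element's tail, so strip it off
        entry = ET.tostring(atom, encoding="unicode").strip()
        entry = " ".join(entry.split())  # ignore indentation whitespace
        n = estimate_tokens(entry)
        if n > budget:
            over.append((atom.get("id"), n))
    return len(xml_bytes) / 1024, over


sample = ('<prime_index version="1.0" total="2">'
          '<atom id="@s/rule-short" kind="rule" tokens="5">Short.</atom>'
          '<atom id="@s/rule-long" kind="rule" tokens="500">' + "word " * 100 + '</atom>'
          '</prime_index>').encode("utf-8")
size_kb, over = check_budget(sample)
print(f"{size_kb:.2f} KB, over budget: {over}")
```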

How agents use it

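The MCP server emits the index lazily. The runtime starts on the first `prime_query` call, reads the index file once, and from then on returns hits from an in-memory structure. The XML on disk is the source of truth; the in-memory copy is only a transient cache.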
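The lazy pattern described above can be sketched as follows. `LazyIndex`, its `query` method, and the substring matching are hypothetical stand-ins, not the MCP server's actual code. The sketch shows only three things: construction reads nothing, the first query parses the file once, and later queries are answered from memory.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


class LazyIndex:
    """Hypothetical sketch: parse _index.xml on first use, then serve from memory."""

    def __init__(self, path):
        self.path = path
        self._atoms = None  # transient in-memory cache; the file stays authoritative
        self.loads = 0      # number of disk reads, for illustration

    def _load(self):
        root = ET.parse(self.path).getroot()
        self._atoms = {
            a.get("id"): " ".join("".join(a.itertext()).split())
            for a in root.iter("atom")
        }
        self.loads += 1

    def query(self, term):
        if self._atoms is None:  # first call: "start the runtime"
            self._load()
        term = term.lower()
        # Hypothetical matching rule: case-insensitive substring on id or summary.
        return [aid for aid, summary in self._atoms.items()
                if term in aid.lower() or term in summary.lower()]

    def invalidate(self):
        """Drop the cache so the next query re-reads the file on disk."""
        self._atoms = None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "_index.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write('<prime_index version="1.0" total="2">'
                '<atom id="@s/rule-keyboard" kind="rule" tokens="9">Operable by keyboard.</atom>'
                '<atom id="@s/rule-contrast" kind="rule" tokens="9">Text needs contrast.</atom>'
                '</prime_index>')
    idx = LazyIndex(path)
    loads_before = idx.loads               # 0: nothing read yet
    hits_keyboard = idx.query("keyboard")  # triggers the single parse
    hits_contrast = idx.query("CONTRAST")  # served from memory
    print(loads_before, idx.loads, hits_keyboard, hits_contrast)
```

Because the cache is disposable, `invalidate()` is all it takes to pick up a rebuilt `_index.xml`. The file on disk never has to be kept in sync with memory.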

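In MCP tool responses, the agent usually receives a filtered view rather than the full 50 KB. The complete XML is the protocol surface; what the agent actually sees on each turn is determined by the query.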
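One way to picture such a filtered view is a tool response that carries only the `<atom>` elements matching the query, not the whole document. The sketch below assumes that shape, a simple substring match, and a hypothetical `limit` parameter. This page does not specify the real server's ranking or response format, and the atom ids in the sample are made up.

```python
import xml.etree.ElementTree as ET


def filtered_view(xml_text, term, limit=5):
    """HYPOTHETICAL sketch: serialize only the <atom> elements whose id or
    summary contains `term`, up to `limit` of them."""
    root = ET.fromstring(xml_text)
    term = term.lower()
    parts = []
    for atom in root.iter("atom"):
        summary = " ".join("".join(atom.itertext()).split())
        if term in atom.get("id", "").lower() or term in summary.lower():
            atom.tail = None  # drop inter-element whitespace from the output
            parts.append(ET.tostring(atom, encoding="unicode"))
            if len(parts) == limit:
                break
    return "\n".join(parts)


INDEX = ('<prime_index version="1.0" total="2">'
         '<cluster name="accessibility">'
         '<atom id="@s/rule-keyboard" kind="rule" tokens="9">Operable by keyboard.'
         '<edge type="related" target="@s/rule-focus"/></atom>'
         '<atom id="@s/rule-contrast" kind="rule" tokens="9">Text needs contrast.</atom>'
         '</cluster></prime_index>')
view = filtered_view(INDEX, "keyboard")
print(view)
```

Each returned `<atom>` keeps its summary and outgoing edges. That lets the agent follow a `target` with a further query instead of loading the whole index.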
