本页目录

Chunker

Chunker 把解析过的 AtomDeclaration 切成三份 Markdown 投影。切法按 kind 走 —— 每种 kind 都有自己的规矩，决定 summary、core、full 各装什么。

函数签名

// packages/compiler/src/chunker.ts
export interface ChunkLevels {
  /** ~30 tok: description + tags + 1-line claim */
  summary: string;
  /** ~150 tok: + core body fields */
  core: string;
  /** ~380 tok: + sources + examples + relations + notes */
  full: string;
}

export function chunk(atom: AtomDeclaration): ChunkLevels {
  const splitter = SPLITTERS[atom.kind] ?? defaultSplitter;
  return splitter(atom);
}

每种 kind 的 splitter

每种 kind 配一个 splitter 函数。chunker 按 atom.kind 派发。新 kind（见自定义 kind）通过 SPLITTERS 这张表注册自己的 splitter。

const SPLITTERS: Record<AtomKind, Splitter> = {
  rule:         ruleSplitter,
  pattern:      patternSplitter,
  fact:         factSplitter,
  method:       methodSplitter,
  // ...
};

function ruleSplitter(atom: AtomDeclaration): ChunkLevels {
  return {
    summary: `# ${atom.name}

${atom.fields.find('claim')?.value}`,

    core: `# ${atom.name}

${atom.fields.find('claim')?.value}

**Applies to:** ${atom.fields.find('applies-to')?.value.join(', ')}
**Severity:** ${atom.fields.find('severity')?.value}`,

    full: `# ${atom.name}

${atom.fields.find('claim')?.value}

**Applies to:** ${atom.fields.find('applies-to')?.value.join(', ')}
**Severity:** ${atom.fields.find('severity')?.value}

## Remediation
${atom.fields.find('remediation')?.value ?? '(none specified)'}

## Validates with
${edgesByVerb(atom, 'validates-with').map(formatRef).join('\n- ')}
`,
  };
}

Token 计数

Token 目标是粗估值。编译期 chunker 用一个快速的字符级启发式（约 3.5 字符一个 token），不调 LLM 的 tokenizer。_index.xml 里 <atom> 上的 tokens 属性才是发射后 full 投影的实际长度。

从一层到下一层丢什么

从 → 到	丢掉
full → core	sources、counter-example、例外情形、软性 verb 关联
core → summary	次要字段（applies-to、severity、parameters）

原则只有一条：原子在每一层都得有用。一份只剩名字的 summary 是错的 —— agent 要的是 claim。一份只有名字加 claim 的 full 同样是错的 —— agent 要的是 body。

输出文件

compiled/
└── @scope/
    └── rule-keyboard-accessible/
        ├── atom.yaml
        ├── graph.yaml
        └── chunks/
            ├── summary.md     ←  ~30 tok
            ├── core.md        ←  ~150 tok
            └── full.md        ←  ~380 tok