Chunker
The chunker turns a parsed AtomDeclaration into three Markdown
projections. Splits are kind-aware — every kind has its own rule for what
fits in summary, core, and full.
Signature
// packages/compiler/src/chunker.ts
export interface ChunkLevels {
  /** ~30 tok: description + tags + 1-line claim */
  summary: string;
  /** ~150 tok: + core body fields */
  core: string;
  /** ~380 tok: + sources + examples + relations + notes */
  full: string;
}

export function chunk(atom: AtomDeclaration): ChunkLevels {
  const splitter = SPLITTERS[atom.kind] ?? defaultSplitter;
  return splitter(atom);
}
Per-kind splitters
Each kind has its own splitter function, and the chunker dispatches on
atom.kind. Kinds without an entry fall back to defaultSplitter. New kinds
(see Custom kinds) register a splitter by adding an entry to the SPLITTERS map.
const SPLITTERS: Record<AtomKind, Splitter> = {
  rule: ruleSplitter,
  pattern: patternSplitter,
  fact: factSplitter,
  method: methodSplitter,
  // ...
};
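Registering a custom kind might look like the sketch below. It is a minimal illustration, not the compiler's actual code: the `Atom` type is a simplified stand-in for AtomDeclaration, the `checklist` kind and its output are hypothetical, and the map is typed as `Record<string, Splitter>` here on the assumption that custom kinds are plain strings.

```typescript
// Simplified stand-ins for the compiler's types (assumptions for this sketch).
interface ChunkLevels { summary: string; core: string; full: string; }
interface Atom { kind: string; name: string; claim: string; }
type Splitter = (atom: Atom) => ChunkLevels;

// Fallback used when no splitter is registered for a kind.
const defaultSplitter: Splitter = (atom) => {
  const text = `# ${atom.name}\n${atom.claim}`;
  return { summary: text, core: text, full: text };
};

const SPLITTERS: Record<string, Splitter> = {};

// Hypothetical custom kind: each level repeats the previous one and adds to it.
SPLITTERS['checklist'] = (atom) => {
  const summary = `# ${atom.name}\n${atom.claim}`;
  const core = `${summary}\n**Kind:** checklist`;
  const full = `${core}\n## Items\n(none specified)`;
  return { summary, core, full };
};

// Same dispatch shape as chunk() in the signature section.
function chunk(atom: Atom): ChunkLevels {
  const splitter = SPLITTERS[atom.kind] ?? defaultSplitter;
  return splitter(atom);
}
```

An atom whose kind has no entry in the map still yields all three levels through the fallback.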
function ruleSplitter(atom: AtomDeclaration): ChunkLevels {
  // Template bodies stay flush left: leading whitespace inside a
  // template literal would end up in the emitted Markdown.
  return {
    summary: `# ${atom.name}
${atom.fields.find('claim')?.value}`,
    core: `# ${atom.name}
${atom.fields.find('claim')?.value}
**Applies to:** ${atom.fields.find('applies-to')?.value.join(', ')}
**Severity:** ${atom.fields.find('severity')?.value}`,
    full: `# ${atom.name}
${atom.fields.find('claim')?.value}
**Applies to:** ${atom.fields.find('applies-to')?.value.join(', ')}
**Severity:** ${atom.fields.find('severity')?.value}
## Remediation
${atom.fields.find('remediation')?.value ?? '(none specified)'}
## Validates with
${edgesByVerb(atom, 'validates-with').map((ref) => `- ${formatRef(ref)}`).join('\n')}
`,
  };
}
Token counting
The token targets are approximate: at compile time the chunker uses a fast
char-based heuristic (~3.5 chars/token), not an LLM tokenizer. The
tokens attribute on <atom> in _index.xml
records the actual length of the full projection, measured after emit.
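A char-based estimate of this kind can be sketched in a few lines. The function names, the rounding (ceil), and the 25% slack below are assumptions for illustration; only the ~3.5 chars/token ratio and the per-level targets come from this page.

```typescript
// Ratio stated on this page; everything else in this block is hypothetical.
const CHARS_PER_TOKEN = 3.5;

// Estimate a token count from character length alone (no tokenizer involved).
export function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Approximate per-level targets from the ChunkLevels comments.
const TARGETS = { summary: 30, core: 150, full: 380 } as const;

// Hypothetical budget check: flags a chunk that exceeds its target by more
// than the given slack factor (1.25 = 25% over).
export function overBudget(
  level: keyof typeof TARGETS,
  text: string,
  slack = 1.25,
): boolean {
  return estimateTokens(text) > TARGETS[level] * slack;
}
```

Because the estimate depends only on length, it is cheap to run on every atom, at the cost of drifting from real tokenizer counts for code-heavy or non-English text.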
What gets dropped per level
| From → to | Dropped |
|---|---|
| full → core | sources, counter-examples, exception cases, related-via-soft verbs |
| core → summary | secondary fields (applies-to, severity, parameters) |
The principle: an atom should be useful at every level. A summary that's just the atom's name is wrong — the agent needs the claim. A full that's just the atom's name + claim is also wrong — the agent needs the body.
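The principle above can be turned into a simple sanity check. The sketch below is hypothetical and not part of the chunker: `checkLevels` and its messages are invented for illustration, and it assumes the `# name` heading format used by ruleSplitter.

```typescript
interface ChunkLevels { summary: string; core: string; full: string; }

// Returns a list of problems; an empty list means every level is useful.
export function checkLevels(
  name: string,
  claim: string,
  levels: ChunkLevels,
): string[] {
  const problems: string[] = [];

  // A summary that is only the name gives the agent nothing to act on.
  if (!levels.summary.includes(claim)) {
    problems.push('summary is missing the claim');
  }

  // Strip the heading and the claim; whatever remains is the body.
  const body = levels.full.replace(`# ${name}`, '').replace(claim, '').trim();
  if (body.length === 0) {
    problems.push('full has no body beyond name and claim');
  }

  return problems;
}
```

A check like this could run in tests for each splitter, so that a new kind cannot ship with an empty summary or a full projection that adds nothing.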
Output files
compiled/
└── @scope/
    └── rule-keyboard-accessible/
        ├── atom.yaml
        ├── graph.yaml
        └── chunks/
            ├── summary.md   ← ~30 tok
            ├── core.md      ← ~150 tok
            └── full.md      ← ~380 tok
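The mapping from a ChunkLevels value to this layout can be sketched as a pure function that plans the files without touching the disk. `planChunkFiles`, `PlannedFile`, and the idea that the atom id doubles as the directory path are assumptions for illustration; the emitter's real API is not shown on this page.

```typescript
interface ChunkLevels { summary: string; core: string; full: string; }

// One file the emitter would write (hypothetical shape).
interface PlannedFile { path: string; content: string; }

const LEVELS = ['summary', 'core', 'full'] as const;

// Maps each level to chunks/<level>.md under the atom's directory.
// Assumes an atom id such as "@scope/rule-keyboard-accessible".
export function planChunkFiles(
  atomId: string,
  levels: ChunkLevels,
  outRoot = 'compiled',
): PlannedFile[] {
  return LEVELS.map((level) => ({
    path: `${outRoot}/${atomId}/chunks/${level}.md`,
    content: levels[level],
  }));
}
```

Keeping the plan separate from the write step makes the layout easy to test; the caller can then create the directories and write each planned file.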