Chunker

The chunker turns a parsed AtomDeclaration into three Markdown projections: summary, core, and full. Splits are kind-aware: every kind has its own rule for what fits at each level.

Signature

// packages/compiler/src/chunker.ts
export interface ChunkLevels {
  /** ~30 tok: description + tags + 1-line claim */
  summary: string;
  /** ~150 tok: + core body fields */
  core: string;
  /** ~380 tok: + sources + examples + relations + notes */
  full: string;
}

export function chunk(atom: AtomDeclaration): ChunkLevels {
  const splitter = SPLITTERS[atom.kind] ?? defaultSplitter;
  return splitter(atom);
}

Per-kind splitters

Each kind has its own splitter function, and the chunker dispatches on atom.kind. A kind with no entry in the SPLITTERS map falls back to defaultSplitter. New kinds (see Custom kinds) register a splitter via the SPLITTERS map.

const SPLITTERS: Record<AtomKind, Splitter> = {
  rule:         ruleSplitter,
  pattern:      patternSplitter,
  fact:         factSplitter,
  method:       methodSplitter,
  // ...
};

function ruleSplitter(atom: AtomDeclaration): ChunkLevels {
  return {
    summary: `# ${atom.name}

${atom.fields.find('claim')?.value}`,

    core: `# ${atom.name}

${atom.fields.find('claim')?.value}

**Applies to:** ${atom.fields.find('applies-to')?.value.join(', ')}
**Severity:** ${atom.fields.find('severity')?.value}`,

    full: `# ${atom.name}

${atom.fields.find('claim')?.value}

**Applies to:** ${atom.fields.find('applies-to')?.value.join(', ')}
**Severity:** ${atom.fields.find('severity')?.value}

## Remediation
${atom.fields.find('remediation')?.value ?? '(none specified)'}

## Validates with
${edgesByVerb(atom, 'validates-with').map((ref) => `- ${formatRef(ref)}`).join('\n')}
`,
  };
}
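As a rough sketch of how a custom kind might register a splitter and how the fallback behaves, the example below uses simplified stand-in types. The `Atom` and `Field` shapes, the `field` helper, the `checklist` kind, and its `steps` and `notes` fields are all hypothetical and exist only for illustration; the real AtomDeclaration and SPLITTERS typing may differ.

```typescript
// Hypothetical minimal shapes; the real AtomDeclaration is richer.
interface Field { name: string; value: string }
interface Atom { kind: string; name: string; fields: Field[] }
interface ChunkLevels { summary: string; core: string; full: string }
type Splitter = (atom: Atom) => ChunkLevels;

// Look up a field value by name, returning a fallback instead of "undefined".
function field(atom: Atom, name: string, fallback = '(none specified)'): string {
  return atom.fields.find((f) => f.name === name)?.value ?? fallback;
}

// Used when no kind-specific splitter is registered (assumed behavior).
const defaultSplitter: Splitter = (atom) => {
  const text = `# ${atom.name}\n\n${field(atom, 'claim')}`;
  return { summary: text, core: text, full: text };
};

// Partial record: a lookup for an unregistered kind yields undefined.
const SPLITTERS: Partial<Record<string, Splitter>> = {};

// Register a splitter for the hypothetical "checklist" kind.
SPLITTERS['checklist'] = (atom) => {
  const summary = `# ${atom.name}\n\n${field(atom, 'claim')}`;
  const core = `${summary}\n\n**Steps:** ${field(atom, 'steps')}`;
  const full = `${core}\n\n## Notes\n${field(atom, 'notes')}`;
  return { summary, core, full };
};

function chunk(atom: Atom): ChunkLevels {
  const splitter = SPLITTERS[atom.kind] ?? defaultSplitter;
  return splitter(atom);
}
```

Building each level on top of the previous one keeps the projections nested, so core always contains the summary and full always contains core.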

Token counting

The token targets are approximate: at compile time the chunker uses a fast character-based heuristic (~3.5 chars/token), not an LLM tokenizer. The tokens attribute on <atom> in _index.xml reports the length of the full projection as actually emitted, not the target.
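A character-based estimate of this kind can be sketched in a few lines. The function name `estimateTokens` and the choice to round up are assumptions for illustration, not the chunker's confirmed implementation; only the ~3.5 chars/token ratio comes from the text above.

```typescript
// Ratio taken from the ~3.5 chars/token figure; everything else is assumed.
const CHARS_PER_TOKEN = 3.5;

// Estimate a token count from the character count.
// Note: string.length counts UTF-16 code units, not Unicode code points.
// Rounding up means any non-empty text counts as at least one token.
function estimateTokens(text: string): number {
  if (text.length === 0) return 0;
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}
```

Under this sketch a 525-character core projection would be estimated at 150 tokens, matching the core target.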

What gets dropped per level

From → to	Dropped
full → core	sources, counter-examples, exception cases, related-via-soft verbs
core → summary	secondary fields (applies-to, severity, parameters)

The principle: an atom should be useful at every level. A summary that is just the atom's name is wrong, because the agent needs the claim. A full that is just the atom's name plus claim is also wrong, because the agent needs the body.
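One way to picture the drop rules is as a map from each field to the first level that includes it. The sketch below is illustrative only: the `FIRST_LEVEL` table and `fieldsAt` helper are hypothetical, and the field assignments are a guess at how a rule atom might be laid out rather than the chunker's actual configuration.

```typescript
type Level = 'summary' | 'core' | 'full';

// Levels from smallest to largest projection.
const LEVEL_ORDER: Level[] = ['summary', 'core', 'full'];

// Hypothetical: the first level at which each field of a rule atom appears.
const FIRST_LEVEL: Record<string, Level> = {
  claim: 'summary',
  'applies-to': 'core',
  severity: 'core',
  remediation: 'full',
  sources: 'full',
};

// Return the fields visible at a level: everything introduced at or below it.
function fieldsAt(level: Level): string[] {
  const max = LEVEL_ORDER.indexOf(level);
  return Object.entries(FIRST_LEVEL)
    .filter(([, first]) => LEVEL_ORDER.indexOf(first) <= max)
    .map(([name]) => name);
}
```

With this table the claim survives at every level, so no projection degenerates to a bare name.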

Output files

compiled/
└── @scope/
    └── rule-keyboard-accessible/
        ├── atom.yaml
        ├── graph.yaml
        └── chunks/
            ├── summary.md     ←  ~30 tok
            ├── core.md        ←  ~150 tok
            └── full.md        ←  ~380 tok
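The layout above can be derived mechanically from an output directory, a scope, and an atom id. The following is a minimal sketch under that assumption; `chunkPaths` and its parameters are hypothetical names, and it only builds path strings without touching the file system.

```typescript
type Level = 'summary' | 'core' | 'full';

// Build the path of each chunk file for one atom.
// outDir, scope, and atomId are illustrative parameters, not a confirmed API.
function chunkPaths(outDir: string, scope: string, atomId: string): Record<Level, string> {
  const base = `${outDir}/${scope}/${atomId}/chunks`;
  return {
    summary: `${base}/summary.md`,
    core: `${base}/core.md`,
    full: `${base}/full.md`,
  };
}

// Example call mirroring the tree above.
const paths = chunkPaths('compiled', '@scope', 'rule-keyboard-accessible');
```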