
Token Strategy

How jaan.to manages token efficiency across skills, sessions, and invocations.


What Is It?

Token strategy is jaan.to's system-wide approach to minimizing context window usage while preserving full skill capabilities. Every skill loaded into a Claude Code session consumes tokens from a finite context window. Without active management, skill definitions, descriptions, and reference material would quickly exhaust the available budget — degrading performance or silently dropping skills.

jaan.to addresses this at four levels: CLAUDE.md (always loaded), session level (what gets loaded), invocation level (how skills run), and skill level (how SKILL.md files are structured). CI enforcement sits on top of these four levels to prevent regression.


Key Points

  • Description Budget — All skill descriptions share a 15,000-character budget in the system prompt. Exceeding it causes skills to be silently dropped. Each skill costs ~109 chars XML overhead plus description length.
  • Reference Extraction — Large skills split into a compact SKILL.md (execution instructions) and a reference file (templates, tables, patterns). The AI loads the reference file on demand via inline pointers.
  • Frontmatter Flags — Two flags control when and how skills load: disable-model-invocation removes internal skills from auto-suggestions (~280 tokens/session saved), and context: fork runs heavy skills in isolated subagents (30-48K tokens saved per invocation).

How It Works

Layer 0: CLAUDE.md (Always Loaded)

The plugin's root CLAUDE.md loads in every session where the plugin is active. It contains behavioral rules, trust boundaries, file locations, and the Skill-First Decision Tree. All content here is "always-on" cost.

| Constraint | Value |
|---|---|
| Target size | ≤ 130 lines |
| Hard cap | ≤ 150 lines |
| Current size | ~119 lines |

Why most content must stay in CLAUDE.md: Claude Code's path-scoped .claude/rules/ files do not ship with plugins — they are project-local. Skills are invoked from arbitrary directories, so even project-level scoped rules (e.g., paths: ["jaan-to/**"]) would not load when a user invokes /pm-prd-write from /src/app/. Therefore, all universal behavioral rules (Two-Phase Workflow, Trust boundaries, Skill-First Decision Tree) must remain in CLAUDE.md.

Tightening strategy: Consolidate redundant wording and compress prose while preserving all behavioral semantics. No content removal — only reformulation.

Layer 1: Session-Level (System Prompt)

When a Claude Code session starts, every skill's description field from its YAML frontmatter is injected into the system prompt. Together, these descriptions draw on a single shared description budget:

| Constraint | Value |
|---|---|
| Total budget | 15,000 characters |
| Per-skill overhead | ~109 chars XML |
| Max description | 120 chars |
| Validation | scripts/validate-skills.sh |

Skills with disable-model-invocation: true are excluded from auto-suggestions, saving ~280 tokens per session. These are internal skills (like detect-* infrastructure) that users don't invoke directly.
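The budget arithmetic above can be sketched in a few lines. This is a minimal illustration, not the logic of scripts/validate-skills.sh: the constants come from the table, while the function name and the sample descriptions are hypothetical.

```python
# Sketch of the description-budget arithmetic. Constants are taken from the
# table above; budget_used() and the sample descriptions are hypothetical.
XML_OVERHEAD_CHARS = 109    # approximate per-skill XML wrapper cost
TOTAL_BUDGET_CHARS = 15_000


def budget_used(descriptions: list[str]) -> int:
    """Estimate the characters that the given descriptions consume."""
    return sum(XML_OVERHEAD_CHARS + len(d) for d in descriptions)


def budget_remaining(descriptions: list[str]) -> int:
    """Return the characters left; a negative value means over budget."""
    return TOTAL_BUDGET_CHARS - budget_used(descriptions)


if __name__ == "__main__":
    sample = ["Write a PRD from an initiative brief", "Generate test cases"]
    print(f"{budget_used(sample)} / {TOTAL_BUDGET_CHARS} chars used")
    print(f"{budget_remaining(sample)} chars remaining")
```

Because every skill pays the fixed overhead regardless of description length, adding a skill is never free, even with a very short description.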

Layer 2: Invocation-Level (Skill Execution)

When a skill runs, its full SKILL.md is loaded into context. Two mechanisms control this cost:

Fork isolation (context: fork): Heavy analysis skills (like detect-dev, detect-design) run in an isolated subagent. The parent conversation never sees the full skill definition — only the bounded output. This saves 30-48K tokens per invocation.

Reference file loading: Skills with inline pointers load reference material only when needed. The AI reads the pointer and fetches the specific section from the reference file, rather than having the entire reference pre-loaded.
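To make the two frontmatter flags concrete, the sketch below reads them from a SKILL.md header and reports how the skill would load. It assumes a flat `key: value` frontmatter between `---` fences; the parser, the function names, and the sample skill are all hypothetical and do not reflect how Claude Code itself parses frontmatter.

```python
# Hypothetical sketch: read loading flags from flat SKILL.md frontmatter.
SAMPLE = """---
name: example-skill
description: Hypothetical heavy analysis skill
context: fork
disable-model-invocation: true
---
# Body
"""


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def loading_mode(fields: dict[str, str]) -> tuple[bool, bool]:
    """Return (auto_invocable, runs_in_fork) for one skill."""
    auto = fields.get("disable-model-invocation", "false").lower() != "true"
    fork = fields.get("context", "") == "fork"
    return auto, fork


if __name__ == "__main__":
    auto, fork = loading_mode(parse_frontmatter(SAMPLE))
    print(f"auto-invocable: {auto}, runs in fork: {fork}")
```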

Layer 3: Skill-Level (SKILL.md Structure)

Each SKILL.md has a line target based on complexity:

| Complexity | Target (lines) | Max (lines) |
|---|---|---|
| Simple (single-phase) | 150-300 | 400 |
| Standard (two-phase) | 300-500 | 500 |
| Complex (multi-stack) | 400-500 | 600 |
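As a rough illustration of how the tiers in the table above could be applied, the following sketch classifies a SKILL.md by line count. The tier keys, the function name, and the status labels are hypothetical; only the numbers come from the table.

```python
# Hypothetical tier check. Numbers mirror the table above:
# (target low, target high, hard max) in lines.
TIERS = {
    "simple": (150, 300, 400),
    "standard": (300, 500, 500),
    "complex": (400, 500, 600),
}


def classify(line_count: int, complexity: str) -> str:
    """Describe where a SKILL.md sits relative to its tier."""
    low, high, hard_max = TIERS[complexity]
    if line_count > hard_max:
        return "over max"
    if line_count > high:
        return "over target"
    if line_count < low:
        return "below target range"
    return "on target"


if __name__ == "__main__":
    for lines, tier in [(280, "simple"), (350, "simple"), (620, "complex")]:
        print(f"{lines} lines ({tier}): {classify(lines, tier)}")
```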

When a SKILL.md exceeds ~500 lines, reference extraction splits it:

Stays in SKILL.md (needed every invocation):

  • Phase structure and step headings
  • User interaction flows (AskUserQuestion)
  • Compact detection tables (< 10 rows)
  • Quality checklists and Definition of Done

Moves to reference file (loaded on demand):

  • Code template blocks
  • Multi-stack comparison tables
  • CWE/OWASP mapping tables
  • Configuration file examples
  • Anti-pattern lists (> 10 items)
  • Directory layout trees (> 10 lines)

Reference files live at docs/extending/{skill-name}-reference.md and are linked via inline pointers:

> **Reference**: See `${CLAUDE_PLUGIN_ROOT}/docs/extending/{skill-name}-reference.md`
> section "{Section Name}" for {description}.
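The on-demand lookup that such a pointer triggers can be approximated as "find the named heading and return only its body". The sketch below assumes reference files use Markdown `#` headings for section names, which the source does not spell out; the function name and sample text are hypothetical, and `#` lines inside code fences would be misread as headings.

```python
# Hypothetical sketch: pull one named section out of a reference file.
def extract_section(markdown: str, section: str) -> str:
    """Return the body under the first heading titled `section`."""
    collected: list[str] = []
    level = 0  # heading depth of the matched section; 0 means not found yet
    for line in markdown.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#"):
            depth = len(stripped) - len(stripped.lstrip("#"))
            title = stripped[depth:].strip()
            if level and depth <= level:
                break  # a heading at the same or a higher level ends the section
            if not level and title == section:
                level = depth
                continue
        if level:
            collected.append(line)
    return "\n".join(collected).strip()


if __name__ == "__main__":
    reference = "# Ref\n\n## Templates\nA\nB\n\n## Tables\nC\n"
    print(extract_section(reference, "Templates"))
```

Only the requested section reaches the context window, which is the point of keeping bulky templates and tables out of SKILL.md.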

Layer 4: CI Enforcement

Automated gates prevent token budget regression:

| Gate | Target | Enforcement |
|---|---|---|
| SKILL.md hard cap | ≤ 600 lines | validate-skills.sh (fail) |
| Reference coverage | >500 lines → must have reference file | validate-skills.sh (warn) |
| Auto-invocable count | ≤ 35 skills | validate-skills.sh (warn) |
| CLAUDE.md size | ≤ 150 lines | release-check.yml (fail) |
| Description budget | ≤ 15,000 chars | validate-skills.sh (fail) |
| Hook stdout cap | ≤ 1,200 chars (~300 tokens) | release-check.yml (fail) |
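The simple threshold gates above can be modeled as data plus one comparison loop. This is a sketch under stated assumptions, not the contents of validate-skills.sh or release-check.yml: the metric names and the `evaluate` function are hypothetical, and the conditional reference-coverage gate is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str      # hypothetical metric key
    limit: int     # maximum allowed value, from the table above
    severity: str  # "fail" or "warn"


GATES = [
    Gate("skill_md_lines", 600, "fail"),
    Gate("auto_invocable_count", 35, "warn"),
    Gate("claude_md_lines", 150, "fail"),
    Gate("description_chars", 15_000, "fail"),
    Gate("hook_stdout_chars", 1_200, "fail"),
]


def evaluate(metrics: dict[str, int]) -> dict[str, list[str]]:
    """Group gate violations by severity; metrics not supplied are skipped."""
    result: dict[str, list[str]] = {"fail": [], "warn": []}
    for gate in GATES:
        value = metrics.get(gate.name)
        if value is not None and value > gate.limit:
            result[gate.severity].append(f"{gate.name}: {value} > {gate.limit}")
    return result


if __name__ == "__main__":
    report = evaluate({"claude_md_lines": 119, "auto_invocable_count": 36})
    print(report)
```

Keeping warnings separate from failures mirrors the table: a warning signals shrinking headroom, while a failure blocks the change.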

Examples

v5.0.0 Optimization Results

Body-trimmed 8 large skills with reference extraction. Extracted language settings and pre-execution boilerplate from 31 skills. Added disable-model-invocation to 7 internal skills and context: fork to 6 detect skills.

Savings: ~2,000 tokens/session permanently, ~7K-48K tokens per skill invocation.

v6.0.0 Spec-to-Ship Skills

5 new skills created at 3,351 total lines, then token-optimized to 2,507 lines (~25% reduction) via reference extraction:

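| Skill | Before (lines) | After (lines) | Saved |
|---|---|---|---|
| dev-project-assemble | 726 | 489 | 33% |
| backend-service-implement | 731 | 521 | 29% |
| qa-test-generate | 735 | 556 | 24% |
| sec-audit-remediate | 518 | 452 | 13% |
| devops-infra-scaffold | 641 | 489 | 24% |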
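The per-skill percentages and the ~25% overall figure follow directly from the line counts. A short check, using only the numbers in the table above (the variable and function names are illustrative):

```python
# (before, after) line counts copied from the table above.
LINE_COUNTS = {
    "dev-project-assemble": (726, 489),
    "backend-service-implement": (731, 521),
    "qa-test-generate": (735, 556),
    "sec-audit-remediate": (518, 452),
    "devops-infra-scaffold": (641, 489),
}


def percent_saved(before: int, after: int) -> int:
    """Return the reduction as a whole-number percentage."""
    return round(100 * (before - after) / before)


total_before = sum(before for before, _ in LINE_COUNTS.values())
total_after = sum(after for _, after in LINE_COUNTS.values())

if __name__ == "__main__":
    for name, (before, after) in LINE_COUNTS.items():
        print(f"{name}: {percent_saved(before, after)}% saved")
    print(f"total: {total_before} -> {total_after} lines "
          f"({percent_saved(total_before, total_after)}% saved)")
```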

v7.0.0 Token Optimization (Research #75)

Aggressive but quality-safe optimization based on Research #75. Applied extraction safety checklist to distinguish safe-to-extract content (lookup tables, templates, scoring rubrics) from unsafe content (decision tables coupled to procedures, entity extraction algorithms).

| Phase | Optimization | Baseline savings | Per-invocation savings |
|---|---|---|---|
| 1 | Reference extraction (16 skills) | | ~2,000-8,000 tokens/invocation |
| 1 | Shared detect reference (5 skills) | | ~1,500 tokens/invocation |
| 1B | Prose tightening (safe patterns, 15 skills) | | ~150-225 tokens/invocation |
| 2 | CLAUDE.md tightening (~18 lines) | ~50 tokens/session | |
| 3 | bootstrap.sh compact mode | ~100-200 tokens/session | |
| 4 | 5 skills → manual-only | ~200 tokens/session | |
| Total | | ~250-450 tokens/session | ~3,650-9,725 tokens/invocation |

Safe prose tightening rules (verified via deep analysis of 4 skills):

  • Pattern 1: Kill preambles that only restate headings (safe)
  • Pattern 5 (selective): Abbreviate informational placeholders only (safe for semantic IDs, unsafe for function params)
  • Patterns 2-4 rejected: telegraphic instructions lose ordering constraints, compressed boolean lists lose mutual-exclusivity signaling, trimmed "Show user" blocks lose behavioral gates

Representative skills after extraction (lines extracted → current SKILL.md size):

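| Skill | Lines Extracted | Current Size (lines) |
|---|---|---|
| pm-research-about | 230 | 547 |
| roadmap-update | 175 | 465 |
| jaan-issue-report | 149 | 598 |
| backend-data-model | 134 | 464 |
| qa-test-cases | 124 | 478 |
| ux-flowchart-generate | 114 | 482 |
| detect-design | 100 | 497 |
| detect-ux | 97 | 498 |
| detect-writing | 93 | 532 |
| ux-microcopy-write | 82 | 553 |
| + 12 more skills | | |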

23 reference files created at docs/extending/*-reference.md, including detect-shared-reference.md shared across 5 detect skills. Total: 44 files changed, 2,211 insertions, 1,858 deletions (net reduction ~938 lines).

Post-v7 budget state (updated after Agent Skills compatibility, v8):

| Metric | Value | Headroom |
|---|---|---|
| Description budget | ~10,609 / 15,000 chars | 29% remaining (~16 more skills) |
| Auto-invocable skills | 33 / 35 cap | 2 more before cap |
| CLAUDE.md | 119 / 150 lines | 31 lines free |
| Largest SKILL.md | ~507 lines / 600 cap | 93 lines before cap |
| Total skill lines | ~21,000 across 56 skills | Median ~440 lines |

UI workflow addition (v7.6): 3 new auto-invocable skills (frontend-story-generate, frontend-visual-verify, frontend-component-fix) added ~327 chars to description budget and 3 auto-invocable slots. A shared reference file (docs/extending/frontend-ui-workflow-reference.md, ~200 lines) avoids duplicating CSF3 patterns, visual scoring rubric, and MCP degradation logic across the 3 skills.

Note: Description budget increased from 8,409 to 10,282 chars due to Agent Skills enrichment (adding "Use when" trigger phrases to all 44 descriptions). 13 overlong SKILL.md files were refactored below 500 lines via reference extraction.


Cumulative Impact

Token optimization across three major versions:

| Version | Session Savings | Per-Invocation Savings | Method |
|---|---|---|---|
| v5.0.0 | ~2,000 tokens | ~7K-48K tokens | Fork isolation (6 detect), body trimming (8 skills), disable-model-invocation (7 skills) |
| v6.0.0 | | ~25% body reduction | Reference extraction at creation time (5 new skills) |
| v7.0.0 | +~350 tokens | +~2K-8K tokens | Aggressive extraction (22 skills), CLAUDE.md tightening, bootstrap compact mode |
| Cumulative | ~2,400 tokens/session | Up to ~56K per invocation | 49 skills, 26 reference files, 6 CI gates |

Practical effect: A typical skill invocation loads ~450-500 lines of execution instructions instead of the ~600-700 lines that would exist without extraction. This saves roughly 500-2,000 tokens per call. For skills using context: fork (6 detect skills), the parent session never sees these tokens — the full 30-48K cost is isolated to a disposable subagent. Combined, a session that invokes 3 skills and 1 detect analysis saves approximately 5,000-52,000 tokens versus an unoptimized plugin of equivalent capability.


Future Skill Compliance

All new skills must follow token strategy from creation:

  1. Description: ≤ 120 chars, no colons
  2. Body size tiers: Simple 150-300 (max 400), Standard 300-500 (max 500), Complex 400-500 (max 600)
  3. Reference extraction trigger: If >500 lines during authoring, extract lookup/template content using the extraction safety checklist
  4. Prose rules: Kill preambles (don't restate headings), abbreviate informational placeholders (not function params)
  5. Frontmatter checklist: Consider disable-model-invocation for narrow-domain skills, context: fork for >30K token skills
  6. CI validation: Run scripts/validate-skills.sh after adding any skill
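The first rule in the list above is easy to check before running the full validator. The sketch below is a hypothetical pre-check, not a substitute for scripts/validate-skills.sh; the function name and messages are illustrative.

```python
# Hypothetical pre-check for rule 1: description ≤ 120 chars, no colons.
MAX_DESCRIPTION_CHARS = 120


def description_problems(description: str) -> list[str]:
    """Return a list of rule violations; an empty list means compliant."""
    problems: list[str] = []
    if len(description) > MAX_DESCRIPTION_CHARS:
        problems.append(
            f"too long: {len(description)} > {MAX_DESCRIPTION_CHARS} chars"
        )
    if ":" in description:
        problems.append("contains a colon")
    return problems


if __name__ == "__main__":
    for text in ["Generate test cases from a PRD", "Use when: writing tests"]:
        print(repr(text), "->", description_problems(text) or "ok")
```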

Reference: See docs/extending/create-skill.md for the enforced template.