Token Strategy
How jaan.to manages token efficiency across skills, sessions, and invocations.
What Is It?
Token strategy is jaan.to's system-wide approach to minimizing context window usage while preserving full skill capabilities. Every skill loaded into a Claude Code session consumes tokens from a finite context window. Without active management, skill definitions, descriptions, and reference material would quickly exhaust the available budget — degrading performance or silently dropping skills.
jaan.to addresses this at four levels: CLAUDE.md (always loaded), session-level (what gets loaded), invocation-level (how skills run), skill-level (how SKILL.md files are structured), plus CI enforcement to prevent regression.
Key Points
- Description Budget — All skill descriptions share a 15,000-character budget in the system prompt. Exceeding it causes skills to be silently dropped. Each skill costs ~109 chars XML overhead plus description length.
- Reference Extraction — Large skills split into a compact SKILL.md (execution instructions) and a reference file (templates, tables, patterns). The AI loads the reference file on demand via inline pointers.
- Frontmatter Flags — Two flags control when and how skills load:
disable-model-invocationremoves internal skills from auto-suggestions (~280 tokens/session saved), andcontext: forkruns heavy skills in isolated subagents (30-48K tokens saved per invocation).
How It Works
Layer 0: CLAUDE.md (Always Loaded)
The plugin's root CLAUDE.md loads in every session where the plugin is active. It contains behavioral rules, trust boundaries, file locations, and the Skill-First Decision Tree. All content here is "always-on" cost.
| Constraint | Value |
|---|---|
| Target size | ≤ 130 lines |
| Hard cap | ≤ 150 lines |
| Current size | ~119 lines |
Why most content must stay in CLAUDE.md: Claude Code's path-scoped .claude/rules/ files do not ship with plugins — they are project-local. Skills are invoked from arbitrary directories, so even project-level scoped rules (e.g., paths: ["jaan-to/**"]) would not load when a user invokes /pm-prd-write from /src/app/. Therefore, all universal behavioral rules (Two-Phase Workflow, Trust boundaries, Skill-First Decision Tree) must remain in CLAUDE.md.
Tightening strategy: Consolidate redundant wording and compress prose while preserving all behavioral semantics. No content removal — only reformulation.
Layer 1: Session-Level (System Prompt)
When a Claude Code session starts, every skill's description field from its YAML frontmatter is injected into the system prompt. This is the description budget:
| Constraint | Value |
|---|---|
| Total budget | 15,000 characters |
| Per-skill overhead | ~109 chars XML |
| Max description | 120 chars |
| Validation | scripts/validate-skills.sh |
Skills with disable-model-invocation: true are excluded from auto-suggestions, saving ~280 tokens per session. These are internal skills (like detect-* infrastructure) that users don't invoke directly.
Layer 2: Invocation-Level (Skill Execution)
When a skill runs, its full SKILL.md is loaded into context. Two mechanisms control this cost:
Fork isolation (context: fork): Heavy analysis skills (like detect-dev, detect-design) run in an isolated subagent. The parent conversation never sees the full skill definition — only the bounded output. This saves 30-48K tokens per invocation.
Reference file loading: Skills with inline pointers load reference material only when needed. The AI reads the pointer and fetches the specific section from the reference file, rather than having the entire reference pre-loaded.
Layer 3: Skill-Level (SKILL.md Structure)
Each SKILL.md has a line target based on complexity:
| Complexity | Target | Max |
|---|---|---|
| Simple (single-phase) | 150-300 | 400 |
| Standard (two-phase) | 300-500 | 500 |
| Complex (multi-stack) | 400-500 | 600 |
When a SKILL.md exceeds ~500 lines, reference extraction splits it:
Stays in SKILL.md (needed every invocation):
- Phase structure and step headings
- User interaction flows (AskUserQuestion)
- Compact detection tables (< 10 rows)
- Quality checklists and Definition of Done
Moves to reference file (loaded on demand):
- Code template blocks
- Multi-stack comparison tables
- CWE/OWASP mapping tables
- Configuration file examples
- Anti-pattern lists (> 10 items)
- Directory layout trees (> 10 lines)
Reference files live at docs/extending/{skill-name}-reference.md and are linked via inline pointers:
> **Reference**: See `${CLAUDE_PLUGIN_ROOT}/docs/extending/{skill-name}-reference.md`
> section "{Section Name}" for {description}.
Layer 4: CI Enforcement
Automated gates prevent token budget regression:
| Gate | Target | Enforcement |
|---|---|---|
| SKILL.md hard cap | ≤ 600 lines | validate-skills.sh — fail |
| Reference coverage | >500 lines → must have reference file | validate-skills.sh — warn |
| Auto-invocable count | ≤ 35 skills | validate-skills.sh — warn |
| CLAUDE.md size | ≤ 150 lines | release-check.yml — fail |
| Description budget | ≤ 15,000 chars | validate-skills.sh — fail |
| Hook stdout cap | ≤ 1,200 chars (~300 tokens) | release-check.yml — fail |
Examples
v5.0.0 Optimization Results
Body-trimmed 8 large skills with reference extraction. Extracted language settings and pre-execution boilerplate from 31 skills. Added disable-model-invocation to 7 internal skills and context: fork to 6 detect skills.
Savings: ~2,000 tokens/session permanently, ~7K-48K tokens per skill invocation.
v6.0.0 Spec-to-Ship Skills
5 new skills created at 3,351 total lines, then token-optimized to 2,507 lines (~25% reduction) via reference extraction:
| Skill | Before | After | Saved |
|---|---|---|---|
| dev-project-assemble | 726 | 489 | 33% |
| backend-service-implement | 731 | 521 | 29% |
| qa-test-generate | 735 | 556 | 24% |
| sec-audit-remediate | 518 | 452 | 13% |
| devops-infra-scaffold | 641 | 489 | 24% |
v7.0.0 Token Optimization (Research #75)
Aggressive but quality-safe optimization based on Research #75. Applied extraction safety checklist to distinguish safe-to-extract content (lookup tables, templates, scoring rubrics) from unsafe content (decision tables coupled to procedures, entity extraction algorithms).
| Phase | Optimization | Baseline savings | Per-invocation savings |
|---|---|---|---|
| 1 | Reference extraction (16 skills) | — | ~2,000-8,000 tokens/invocation |
| 1 | Shared detect reference (5 skills) | — | ~1,500 tokens/invocation |
| 1B | Prose tightening (safe patterns, 15 skills) | — | ~150-225 tokens/invocation |
| 2 | CLAUDE.md tightening (~18 lines) | ~50 tokens/session | — |
| 3 | bootstrap.sh compact mode | ~100-200 tokens/session | — |
| 4 | 5 skills → manual-only | ~200 tokens/session | — |
| Total | ~250-450 tokens/session | ~3,650-9,725 tokens/invocation |
Safe prose tightening rules (verified via deep analysis of 4 skills):
- Pattern 1: Kill preambles that only restate headings (safe)
- Pattern 5 (selective): Abbreviate informational placeholders only (safe for semantic IDs, unsafe for function params)
- Patterns 2-4 rejected: telegraphic instructions lose ordering constraints, compressed boolean lists lose mutual-exclusivity signaling, trimmed "Show user" blocks lose behavioral gates
Representative skills after extraction (lines extracted → current SKILL.md size):
| Skill | Lines Extracted | Current Size |
|---|---|---|
| pm-research-about | 230 | 547 |
| roadmap-update | 175 | 465 |
| jaan-issue-report | 149 | 598 |
| backend-data-model | 134 | 464 |
| qa-test-cases | 124 | 478 |
| ux-flowchart-generate | 114 | 482 |
| detect-design | 100 | 497 |
| detect-ux | 97 | 498 |
| detect-writing | 93 | 532 |
| ux-microcopy-write | 82 | 553 |
| + 12 more skills | — | — |
23 reference files created at docs/extending/*-reference.md, including detect-shared-reference.md shared across 5 detect skills. Total: 44 files changed, 2,211 insertions, 1,858 deletions (net reduction ~938 lines).
Post-v7 budget state (updated after Agent Skills compatibility, v8):
| Metric | Value | Headroom |
|---|---|---|
| Description budget | ~10,609 / 15,000 chars | 29% remaining (~16 more skills) |
| Auto-invocable skills | 33 / 35 cap | 2 more before cap |
| CLAUDE.md | 119 / 150 lines | 31 lines free |
| Largest SKILL.md | ~507 lines / 600 cap | 93 lines before cap |
| Total skill lines | ~21,000 across 56 skills | Median ~440 lines |
UI workflow addition (v7.6): 3 new auto-invocable skills (frontend-story-generate, frontend-visual-verify, frontend-component-fix) added ~327 chars to description budget and 3 auto-invocable slots. A shared reference file (docs/extending/frontend-ui-workflow-reference.md, ~200 lines) avoids duplicating CSF3 patterns, visual scoring rubric, and MCP degradation logic across the 3 skills.
Note: Description budget increased from 8,409 to 10,282 chars due to Agent Skills enrichment (adding "Use when" trigger phrases to all 44 descriptions). 13 overlong SKILL.md files were refactored below 500 lines via reference extraction.
Cumulative Impact
Token optimization across three major versions:
| Version | Session Savings | Per-Invocation Savings | Method |
|---|---|---|---|
| v5.0.0 | ~2,000 tokens | ~7K-48K tokens | Fork isolation (6 detect), body trimming (8 skills), disable-model-invocation (7 skills) |
| v6.0.0 | — | ~25% body reduction | Reference extraction at creation time (5 new skills) |
| v7.0.0 | +~350 tokens | +~2K-8K tokens | Aggressive extraction (22 skills), CLAUDE.md tightening, bootstrap compact mode |
| Cumulative | ~2,400 tokens/session | Up to ~56K per invocation | 49 skills, 26 reference files, 6 CI gates |
Practical effect: A typical skill invocation loads ~450-500 lines of execution instructions instead of the ~600-700 lines that would exist without extraction. This saves roughly 500-2,000 tokens per call. For skills using context: fork (6 detect skills), the parent session never sees these tokens — the full 30-48K cost is isolated to a disposable subagent. Combined, a session that invokes 3 skills and 1 detect analysis saves approximately 5,000-52,000 tokens versus an unoptimized plugin of equivalent capability.
Future Skill Compliance
All new skills must follow token strategy from creation:
- Description: ≤ 120 chars, no colons
- Body size tiers: Simple 150-300 (max 400), Standard 300-500 (max 500), Complex 400-500 (max 600)
- Reference extraction trigger: If >500 lines during authoring, extract lookup/template content using the extraction safety checklist
- Prose rules: Kill preambles (don't restate headings), abbreviate informational placeholders (not function params)
- Frontmatter checklist: Consider
disable-model-invocationfor narrow-domain skills,context: forkfor >30K token skills - CI validation: Run
scripts/validate-skills.shafter adding any skill
Reference: docs/extending/create-skill.md for the enforced template.
Related
- Token Optimization Strategy (Builder Reference) — Implementation details for skill authors
- Roadmap — Version history with optimization milestones
- Research #18: Token Optimization — Original research
- Research #62: Claude Code Token Optimization — Deep research
- Research #75: Aggressive Token Optimization — v7.0.0 research basis
- Extraction Safety Checklist — What to extract vs. keep inline