Token Optimization Mastery for Claude Code

Summary of: deepresearch/dev-workflow/token-optimization.md

Key Points

MCP tool overhead is massive: 55,000-134,000 tokens before conversation starts; use defer_loading: true for 85% reduction
Subagent isolation is key: Subagents consume 10,000+ tokens internally but return only 500-1,000 to main context
Model selection transforms costs: Haiku costs 3x less than Sonnet; use for 80% of routine operations
Prompt caching delivers 90% savings: Cache reads cost 0.1x base price; structure static content first
File inclusion is dangerous: @filename loads entire contents; target specific files, not directories
Bash output is unbounded: Always limit output (e.g., git log -5 not git log)
Extended thinking levels: "think" (5-10K), "think hard" (20-50K), "think harder" (50-100K), "ultrathink" (100-128K)
Session management: Use /compact before 95% capacity, /clear between unrelated tasks

Compounding effect is multiplicative - Caching (90%) + model routing (3x) + isolation (37%) + deferral (85%) compounds
MCP tools are hidden cost - Most teams don't realize tool definitions consume tokens before any work begins
Typical cost reduction: From $6/day average to $2-3/day with proper optimization

Model	Input/MTok	Output/MTok	Use Case
Haiku 4.5	$1	$5	Quick fixes, searches
Sonnet 4.5	$3	$15	Standard development
Opus 4.5	$5	$25	Complex architecture