codeburn/src/types.ts
ozymandiashh 1a080a006f feat(optimize): MCP tool coverage detector with cache-aware costing
Adds a per-tool optimizer finding for MCP servers whose schema is loaded
on every turn but rarely invoked. Builds on the existing server-level
`detectUnusedMcp` (zero invocations) by reporting partial-use cases:
"loaded 54 tools, called 0" or "loaded 26 tools, called 2 (8% coverage)".

Inventory comes from Claude Code's JSONL `attachment.deferred_tools_delta`
entries: `addedNames` lists the exact tools available at that turn,
including every fully-qualified `mcp__<server>__<tool>` name. We union
across all delta entries in a session (not just the first) because tool
availability can change mid-session when the user reloads MCP config or
a subagent inherits a different tool set. Names that don't match the
`mcp__<server>__<tool>` shape with both segments non-empty are rejected
at extraction so downstream `split('__')` consumers can't be poisoned.
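
The extraction rule above can be sketched roughly as follows. This is an illustration, not the actual `extractMcpInventory`: the attachment shape (a `type` of `deferred_tools_delta` next to an `addedNames` array) and the names `EntrySketch`, `MCP_NAME` and `extractMcpInventorySketch` are assumptions made for the example.

```typescript
// Hypothetical entry shape: only the field this sketch reads is typed.
type EntrySketch = { attachment?: unknown; [key: string]: unknown }

// `mcp__<server>__<tool>` with both segments non-empty.
const MCP_NAME = /^mcp__(.+?)__(.+)$/

function extractMcpInventorySketch(entries: EntrySketch[]): string[] {
  const names = new Set<string>()
  for (const entry of entries) {
    const att = entry.attachment as
      | { type?: unknown; addedNames?: unknown }
      | null
      | undefined
    // Tolerate missing or malformed attachments instead of throwing.
    if (!att || typeof att !== 'object') continue
    if (att.type !== 'deferred_tools_delta' || !Array.isArray(att.addedNames)) continue
    for (const name of att.addedNames) {
      // Built-in tools and malformed MCP names are dropped here.
      if (typeof name === 'string' && MCP_NAME.test(name)) names.add(name)
    }
  }
  // Union across all delta entries, returned sorted and unique.
  return Array.from(names).sort()
}
```

Rejecting names such as `mcp____tool` or `mcp__server__` at this point means later code can split on `__` without guarding against empty segments.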

Token-savings estimates are cache-aware. MCP tool schemas live in the
cached prefix of the system prompt: a session pays the full input price
on each cache-creation turn (the cache is rebuilt after roughly 5 minutes
of inactivity) and the cache-read discount on subsequent turns. Each call's
contribution is capped at its observed `cacheCreationInputTokens` /
`cacheReadInputTokens` so we never claim more MCP overhead than the
call's own cache buckets could contain.
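
A minimal sketch of the per-call cap, under stated assumptions: `effectiveSchemaTokens`, `CallCacheUsage` and the `readWeight` parameter are hypothetical names, and the default discount of 0.1 is a placeholder rather than a quoted price. Only the two token fields come from `TokenUsage`.

```typescript
// Subset of TokenUsage that the cap needs.
type CallCacheUsage = {
  cacheCreationInputTokens: number
  cacheReadInputTokens: number
}

// Cache writes count at the full input rate; cache reads are weighted by an
// assumed discount factor (`readWeight`), which is a placeholder value.
function effectiveSchemaTokens(
  calls: CallCacheUsage[],
  schemaTokens: number,
  readWeight = 0.1,
): number {
  let effective = 0
  for (const call of calls) {
    // Never claim more than the call's own cache buckets contain.
    const written = Math.min(schemaTokens, call.cacheCreationInputTokens)
    const read = Math.min(schemaTokens, call.cacheReadInputTokens)
    effective += written + read * readWeight
  }
  return effective
}
```

A call whose cache-read bucket holds fewer tokens than the estimated schema size contributes only what the bucket holds.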

When multiple servers are flagged, costing happens in a single combined
pass: the per-call cap applies to the total unused-schema budget across
all flagged servers, not per server. Two flagged servers cannot both
independently claim the same call's cache bucket, which would otherwise
overstate `tokensSaved` and misclassify findings as high impact.
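
To illustrate why the combined pass matters, the sketch below compares a naive per-server cap with a single cap on the summed budget. It looks only at the cache-read bucket for brevity, and `perServerCapped`, `combinedCapped` and `CacheBuckets` are hypothetical names, not the signatures in `src/optimize.ts`.

```typescript
type CacheBuckets = {
  cacheCreationInputTokens: number
  cacheReadInputTokens: number
}

// Naive variant: every flagged server caps against the same bucket
// independently, so the sum can exceed what the call actually read.
function perServerCapped(call: CacheBuckets, serverSchemaTokens: number[]): number {
  return serverSchemaTokens.reduce(
    (sum, tokens) => sum + Math.min(tokens, call.cacheReadInputTokens),
    0,
  )
}

// Combined variant: sum the unused-schema budget first, then cap once.
function combinedCapped(call: CacheBuckets, serverSchemaTokens: number[]): number {
  const totalBudget = serverSchemaTokens.reduce((sum, tokens) => sum + tokens, 0)
  return Math.min(totalBudget, call.cacheReadInputTokens)
}
```

With budgets of 6000 and 4000 tokens against a call that read 5000 cached tokens, the naive variant claims 9000 while the combined variant claims 5000.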

A session counts toward `loadedSessions` (and toward the cost estimate)
only if its observed inventory included the server. Pure invocation-only
sessions, where the server appears in `mcpBreakdown` or `call.mcpTools`
without any matching `deferred_tools_delta`, do not satisfy the
`>= 2 sessions` threshold on their own. The same invariant applies in
`estimateMcpSchemaCost` so the two passes agree.
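
The session-counting invariant could look something like this. `SessionSketch` keeps only the two `SessionSummary` fields involved, and `countLoadedSessions` is a hypothetical helper, not the function used in `aggregateMcpCoverage`.

```typescript
// Subset of SessionSummary relevant to the invariant.
type SessionSketch = {
  mcpInventory?: string[]
  mcpBreakdown: Record<string, { calls: number }>
}

// A session counts only if its observed inventory lists at least one tool
// of the server; invocations recorded in mcpBreakdown alone do not count.
function countLoadedSessions(sessions: SessionSketch[], server: string): number {
  return sessions.filter(session =>
    (session.mcpInventory ?? []).some(name => name.split('__')[1] === server),
  ).length
}
```

Because the names in `mcpInventory` were validated at extraction, `split('__')[1]` is assumed here to be the server segment.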

Coverage is computed against the inventory only: invocations of names
not present in any observed inventory (older config, hallucinated tool,
typo) do not inflate `toolsInvoked` and cannot drive `unusedCount`
negative. `toolsInvoked` is derived as `inventory.size - unusedTools.length`
to keep both numbers consistent.
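
A small sketch of the coverage arithmetic described above. The function and type names are hypothetical; only the derivation `toolsInvoked = inventory.size - unusedTools.length` is taken from the text.

```typescript
type CoverageSketch = {
  toolsAvailable: number
  toolsInvoked: number
  unusedTools: string[]
  coverage: number
}

function computeCoverageSketch(
  inventory: Set<string>,
  invokedNames: string[],
): CoverageSketch {
  const invoked = new Set(invokedNames)
  // Only inventory members are considered, so foreign invocations are ignored.
  const unusedTools = Array.from(inventory)
    .filter(name => !invoked.has(name))
    .sort()
  // Derived from the inventory, so it can never exceed inventory.size.
  const toolsInvoked = inventory.size - unusedTools.length
  const coverage = inventory.size === 0 ? 0 : toolsInvoked / inventory.size
  return { toolsAvailable: inventory.size, toolsInvoked, unusedTools, coverage }
}
```

Invoking a tool that is absent from the inventory leaves both `toolsInvoked` and `unusedTools` unchanged.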

`detectUnusedMcp` and the new detector are explicitly disjoint:
`detectUnusedMcp` skips servers that the coverage detector will report,
not every server that happens to be in any inventory, so a small
inventoried-but-uninvoked server below the coverage thresholds still
gets flagged as "configured but never called."

Thresholds for the coverage finding:
- > 10 tools available (small servers are noise)
- < 20% coverage
- >= 2 sessions with observed inventory
- High impact when total effective tokens >= 200_000 or >= 3 servers flagged
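
The gates listed above, written out as predicates. The numeric limits are the ones from the list; `CoverageStats`, `passesCoverageGates` and `isHighImpact` are hypothetical names used only for this sketch.

```typescript
type CoverageStats = {
  toolsAvailable: number
  toolsInvoked: number
  loadedSessions: number
}

// A server is reported only when all three gates hold.
function passesCoverageGates(stats: CoverageStats): boolean {
  const coverage = stats.toolsInvoked / stats.toolsAvailable
  return (
    stats.toolsAvailable > 10 && // small servers are noise
    coverage < 0.2 && // under 20% of loaded tools were called
    stats.loadedSessions >= 2 // inventory observed in at least 2 sessions
  )
}

// High impact is decided across all flagged servers together.
function isHighImpact(totalEffectiveTokens: number, flaggedServers: number): boolean {
  return totalEffectiveTokens >= 200_000 || flaggedServers >= 3
}
```

For example, a server with 2 of 26 tools called (about 8% coverage) passes the gates once its inventory has been seen in two sessions, while a server with exactly 10 tools never does.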

Smoke-tested on a real account: 7 servers flagged across 93 sessions
(`office-word-mcp` 0/54, `notebooklm-mcp` 0/38, `office-ppt-mcp` 0/37,
`excel-mcp-server` 0/25, `github-mcp-server` 2/26, `peekaboo` 3/22, plus
`claude_ai_Asana`). Combined-cap costing keeps `tokensSaved` honest.

Changes:
- src/types.ts: optional `mcpInventory: string[]` on `SessionSummary`.
  Provider-agnostic field; currently populated only by the Claude parser.
- src/parser.ts: `extractMcpInventory` walks all entries, validates
  fully-qualified names, returns sorted unique list. `buildSessionSummary`
  passes it through; field is omitted when empty so JSON exports stay
  clean.
- src/optimize.ts: `aggregateMcpCoverage`, `estimateMcpSchemaCost`
  (single- and multi-server signatures), `detectMcpToolCoverage`. Wired
  into `scanAndDetect`. `detectUnusedMcp` updated to be disjoint from the
  new detector.
- tests/mcp-coverage.test.ts: 23 cases covering aggregation, costing,
  combined-cap behaviour, threshold gates, invocation-only-session
  filtering, foreign-tool invocations, cache rebuild events, write+read
  on the same call, multi-server pluralisation.
- tests/parser-mcp-inventory.test.ts: 12 cases for the JSONL extractor
  including malformed name rejection and tolerant attachment parsing.
- CHANGELOG.md: entry under Unreleased / Added (CLI).

Closes #2
2026-05-05 04:13:04 +03:00

export type TokenUsage = {
  inputTokens: number
  outputTokens: number
  cacheCreationInputTokens: number
  cacheReadInputTokens: number
  cachedInputTokens: number
  reasoningTokens: number
  webSearchRequests: number
}

export type ToolUseBlock = {
  type: 'tool_use'
  id: string
  name: string
  input: Record<string, unknown>
}

export type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'thinking'; thinking: string }
  | ToolUseBlock
  | { type: string; [key: string]: unknown }

export type ApiUsage = {
  input_tokens: number
  output_tokens: number
  cache_creation_input_tokens?: number
  cache_read_input_tokens?: number
  server_tool_use?: {
    web_search_requests?: number
    web_fetch_requests?: number
  }
  speed?: 'standard' | 'fast'
}

export type AssistantMessageContent = {
  model: string
  id?: string
  type: 'message'
  role: 'assistant'
  content: ContentBlock[]
  usage: ApiUsage
  stop_reason?: string
}

export type JournalEntry = {
  type: string
  uuid?: string
  parentUuid?: string | null
  timestamp?: string
  sessionId?: string
  cwd?: string
  version?: string
  gitBranch?: string
  promptId?: string
  message?: AssistantMessageContent | { role: 'user'; content: string | ContentBlock[] }
  isSidechain?: boolean
  [key: string]: unknown
}

export type ParsedTurn = {
  userMessage: string
  assistantCalls: ParsedApiCall[]
  timestamp: string
  sessionId: string
}

export type ParsedApiCall = {
  provider: string
  model: string
  usage: TokenUsage
  costUSD: number
  tools: string[]
  mcpTools: string[]
  skills: string[]
  hasAgentSpawn: boolean
  hasPlanMode: boolean
  speed: 'standard' | 'fast'
  timestamp: string
  bashCommands: string[]
  deduplicationKey: string
}

export type TaskCategory =
  | 'coding'
  | 'debugging'
  | 'feature'
  | 'refactoring'
  | 'testing'
  | 'exploration'
  | 'planning'
  | 'delegation'
  | 'git'
  | 'build/deploy'
  | 'conversation'
  | 'brainstorming'
  | 'general'

export type ClassifiedTurn = ParsedTurn & {
  category: TaskCategory
  subCategory?: string
  retries: number
  hasEdits: boolean
}

export type SessionSummary = {
  sessionId: string
  project: string
  firstTimestamp: string
  lastTimestamp: string
  totalCostUSD: number
  totalInputTokens: number
  totalOutputTokens: number
  totalCacheReadTokens: number
  totalCacheWriteTokens: number
  apiCalls: number
  turns: ClassifiedTurn[]
  modelBreakdown: Record<string, { calls: number; costUSD: number; tokens: TokenUsage }>
  toolBreakdown: Record<string, { calls: number }>
  mcpBreakdown: Record<string, { calls: number }>
  bashBreakdown: Record<string, { calls: number }>
  categoryBreakdown: Record<TaskCategory, { turns: number; costUSD: number; retries: number; editTurns: number; oneShotTurns: number }>
  skillBreakdown: Record<string, { turns: number; costUSD: number; editTurns: number; oneShotTurns: number }>
  // Observed MCP tools available in this session, captured from
  // `attachment.deferred_tools_delta.addedNames` entries. Union across all
  // turns. Each name is a fully-qualified `mcp__<server>__<tool>` identifier.
  // Built-in tools (Bash, Edit, etc.) are filtered out. Provider-agnostic field;
  // currently populated only by the Claude parser.
  mcpInventory?: string[]
}

export type ProjectSummary = {
  project: string
  projectPath: string
  sessions: SessionSummary[]
  totalCostUSD: number
  totalApiCalls: number
}

export type DateRange = {
  start: Date
  end: Date
}

export const CATEGORY_LABELS: Record<TaskCategory, string> = {
  coding: 'Coding',
  debugging: 'Debugging',
  feature: 'Feature Dev',
  refactoring: 'Refactoring',
  testing: 'Testing',
  exploration: 'Exploration',
  planning: 'Planning',
  delegation: 'Delegation',
  git: 'Git Ops',
  'build/deploy': 'Build/Deploy',
  conversation: 'Conversation',
  brainstorming: 'Brainstorming',
  general: 'General',
}