feat(optimize): MCP tool coverage detector with cache-aware costing

Adds a per-tool optimizer finding for MCP servers whose tool schemas are
loaded on every turn but whose tools are rarely invoked. Builds on the
existing server-level `detectUnusedMcp` (zero invocations) by reporting
coverage at tool granularity: "loaded 54 tools, called 0" or "loaded 26
tools, called 2 (8% coverage)".

Inventory comes from Claude Code's JSONL `attachment` entries of type
`deferred_tools_delta`: `addedNames` lists the exact tools available at
that turn, including every fully-qualified `mcp__<server>__<tool>` name. We union
across all delta entries in a session (not just the first) because tool
availability can change mid-session when the user reloads MCP config or
a subagent inherits a different tool set. Names that don't match the
`mcp__<server>__<tool>` shape with both segments non-empty are rejected
at extraction so downstream `split('__')` consumers can't be poisoned.
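The extraction rules can be sketched in isolation. This is a minimal standalone version, assuming a simplified hypothetical `Entry` type in place of the parser's `JournalEntry`. `extractInventory` is an illustrative name, not the committed `extractMcpInventory`.

```typescript
// Hypothetical, simplified stand-in for the parser's JournalEntry type.
type Entry = { attachment?: unknown }

// Accept only `mcp__<server>__<tool>` with non-empty server and tool segments.
function isMcpToolName(name: string): boolean {
  if (!name.startsWith('mcp__')) return false
  const rest = name.slice('mcp__'.length)
  const sep = rest.indexOf('__')
  if (sep <= 0) return false // no separator, or empty server segment
  return sep + 2 < rest.length // the tool segment must be non-empty
}

// Union the MCP names from every deferred_tools_delta entry, not just the first.
function extractInventory(entries: Entry[]): string[] {
  const seen = new Set<string>()
  for (const entry of entries) {
    const att = entry.attachment
    if (!att || typeof att !== 'object') continue
    const a = att as { type?: unknown; addedNames?: unknown }
    if (a.type !== 'deferred_tools_delta' || !Array.isArray(a.addedNames)) continue
    for (const name of a.addedNames) {
      // Built-ins (e.g. `Bash`) and malformed MCP names are dropped here.
      if (typeof name === 'string' && isMcpToolName(name)) seen.add(name)
    }
  }
  return Array.from(seen).sort()
}

const example = extractInventory([
  { attachment: { type: 'deferred_tools_delta', addedNames: ['Bash', 'mcp__github__create_issue', 'mcp__github__'] } },
  { attachment: { type: 'deferred_tools_delta', addedNames: ['mcp__notion__search', 'mcp__github__create_issue'] } },
  { attachment: { type: 'something_else' } },
])
console.log(example.join(', ')) // mcp__github__create_issue, mcp__notion__search
```

The second delta adds `mcp__notion__search`, which a first-entry-only reader would miss, and the trailing-empty `mcp__github__` is rejected.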

Token-savings estimates are cache-aware. MCP tool schemas live in the
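cached prefix of the system prompt: a session pays at least the full input
price on each cache-creation turn (the cache expires after ~5 minutes of
inactivity, so long sessions can rebuild it several times) and the
cache-read discount on subsequent turns. Anthropic bills cache writes
above base input (1.25x for the default 5-minute TTL); the estimate counts
them at 1x, so it errs low. Each call's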
contribution is capped at its observed `cacheCreationInputTokens` /
`cacheReadInputTokens` so we never claim more MCP overhead than the
call's own cache buckets could contain.
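Here is a minimal sketch of the per-call accounting described above. It assumes a simplified hypothetical `CallUsage` shape and a fixed unused-schema size for the session, whereas the real code reads the two buckets from `call.usage`. Writes are weighted at 1x and reads at the cache-read discount, and each bucket is capped at what the call reported.

```typescript
// Hypothetical per-call usage record holding the two cache buckets.
type CallUsage = { cacheCreationInputTokens: number; cacheReadInputTokens: number }

// Cache reads are billed at roughly 10% of base input.
const CACHE_READ_DISCOUNT = 0.1

// Attribute up to `schemaTokens` of unused schema to each bucket of each call.
// Every call with cache-creation tokens counts as a (re)write, not just the first.
function schemaOverhead(schemaTokens: number, calls: CallUsage[]) {
  let write = 0
  let read = 0
  for (const u of calls) {
    write += Math.min(schemaTokens, u.cacheCreationInputTokens)
    read += Math.min(schemaTokens, u.cacheReadInputTokens)
  }
  return { write, read, effective: write + read * CACHE_READ_DISCOUNT }
}

// 10k tokens of unused schema: an initial write, two cached turns, then a
// call that wrote only 4k fresh tokens while reading the other 26k.
const session: CallUsage[] = [
  { cacheCreationInputTokens: 30_000, cacheReadInputTokens: 0 },
  { cacheCreationInputTokens: 0, cacheReadInputTokens: 30_000 },
  { cacheCreationInputTokens: 0, cacheReadInputTokens: 30_000 },
  { cacheCreationInputTokens: 4_000, cacheReadInputTokens: 26_000 },
]
console.log(schemaOverhead(10_000, session)) // { write: 14000, read: 30000, effective: 17000 }
```

The last call shows the cap at work. Only 4k of its write bucket is attributed, because the call did not write more than that.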

When multiple servers are flagged, costing happens in a single combined
pass: the per-call cap applies to the total unused-schema budget across
all flagged servers, not per server. Two flagged servers cannot both
independently claim the same call's cache bucket, which would otherwise
overstate `tokensSaved` and misclassify findings as high impact.
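The double-counting that the combined pass avoids shows up clearly with two flagged servers and one call. In this hypothetical sketch, both servers' unused schemas sit in the same 10,000-token cache-read bucket. The function names are illustrative and do not come from the codebase.

```typescript
// Per-server capping: each server independently claims up to its own schema
// size from the same bucket, so the claims can sum past the bucket itself.
function perServerClaim(unusedSchemaTokens: number[], bucket: number): number {
  return unusedSchemaTokens.reduce((sum, s) => sum + Math.min(s, bucket), 0)
}

// Combined capping: add up the flagged servers' schema first, then cap once.
function combinedClaim(unusedSchemaTokens: number[], bucket: number): number {
  const total = unusedSchemaTokens.reduce((sum, s) => sum + s, 0)
  return Math.min(total, bucket)
}

const bucket = 10_000 // tokens the call reported reading from cache
const servers = [8_000, 6_000] // unused-schema tokens for two flagged servers
console.log(perServerClaim(servers, bucket)) // 14000, more than the bucket holds
console.log(combinedClaim(servers, bucket)) // 10000
```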

A session counts toward `loadedSessions` (and toward the cost estimate)
only if its observed inventory included the server. Pure invocation-only
sessions, where the server appears in `mcpBreakdown` or `call.mcpTools`
without any matching `deferred_tools_delta`, do not count toward the
`>= 2 sessions` threshold. The same invariant applies in
`estimateMcpSchemaCost` so the two passes agree.
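Here is a minimal sketch of the `loadedSessions` rule. It assumes a simplified hypothetical `Session` shape in which `mcpInventory` holds fully-qualified names and `invokedServers` stands in for `mcpBreakdown` / `call.mcpTools`.

```typescript
// Hypothetical, simplified session record.
type Session = { mcpInventory?: string[]; invokedServers: string[] }

// A session counts as "loaded" only if its observed inventory names the server.
// Invocations alone (invokedServers) are deliberately ignored here.
function countLoadedSessions(sessions: Session[], server: string): number {
  return sessions.filter(s =>
    (s.mcpInventory ?? []).some(fqn => fqn.split('__')[1] === server),
  ).length
}

const sessions: Session[] = [
  { mcpInventory: ['mcp__github__create_issue'], invokedServers: ['github'] },
  { invokedServers: ['github'] }, // invocation-only: no observed inventory
  { mcpInventory: ['mcp__github__list_prs'], invokedServers: [] },
]
console.log(countLoadedSessions(sessions, 'github')) // 2
```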

Coverage is computed against the inventory only: invocations of names
not present in any observed inventory (older config, hallucinated tool,
typo) do not inflate `toolsInvoked` and cannot drive the unused count
negative. `toolsInvoked` is derived as `inventory.size - unusedTools.length`
to keep both numbers consistent.
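Coverage against the inventory can be sketched as a pure function. This is an illustrative stand-in for the per-server step of `aggregateMcpCoverage`. It uses a hypothetical name, `coverageFor`, and takes plain string arrays as input.

```typescript
// Compute coverage for one server against its observed inventory only.
function coverageFor(inventory: string[], invoked: string[]) {
  const inv = new Set(inventory)
  const called = new Set(invoked)
  // Invocations of names outside the inventory never reach the numerator.
  const unusedTools = Array.from(inv).filter(t => !called.has(t)).sort()
  const toolsInvoked = inv.size - unusedTools.length
  return {
    toolsAvailable: inv.size,
    toolsInvoked,
    unusedTools,
    coverageRatio: inv.size === 0 ? 0 : toolsInvoked / inv.size,
  }
}

const inventory = ['mcp__gh__a', 'mcp__gh__b', 'mcp__gh__c', 'mcp__gh__d']
// One real tool called twice, plus a name left over from an older config.
const result = coverageFor(inventory, ['mcp__gh__a', 'mcp__gh__a', 'mcp__gh__old'])
console.log(result.toolsInvoked, result.coverageRatio) // 1 0.25
```

Deriving `toolsInvoked` from the unused list means a one-tool inventory with three distinct invoked names still reports `1/1`, never `3/1` or a negative unused count.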

`detectUnusedMcp` and the new detector are explicitly disjoint:
`detectUnusedMcp` skips servers that the coverage detector will report,
not every server that happens to be in any inventory, so a small
inventoried-but-uninvoked server below the coverage thresholds still
gets flagged as "configured but never called."

Thresholds for the coverage finding:
- > 10 tools available (small servers are noise)
- < 20% coverage
- >= 2 sessions with observed inventory
- High impact when total effective tokens >= 200_000 or >= 3 servers flagged
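The gates and impact rule above, together with the disjointness rule for `detectUnusedMcp`, can be sketched as follows. The sketch assumes a trimmed-down hypothetical `Coverage` record, inlines the thresholds listed above, and uses illustrative function and server names. The committed code expresses the same checks inside `detectMcpToolCoverage` and `detectUnusedMcp`.

```typescript
// Hypothetical trimmed-down coverage record.
type Coverage = { server: string; toolsAvailable: number; loadedSessions: number; coverageRatio: number }

const MIN_TOOLS = 10 // strictly more than this many tools
const MIN_SESSIONS = 2 // sessions with observed inventory
const LOW_COVERAGE = 0.2 // strictly below 20%
const HIGH_IMPACT_TOKENS = 200_000
const HIGH_IMPACT_SERVERS = 3

function clearsCoverageGates(c: Coverage): boolean {
  return c.toolsAvailable > MIN_TOOLS && c.loadedSessions >= MIN_SESSIONS && c.coverageRatio < LOW_COVERAGE
}

function impactFor(effectiveTokens: number, flaggedCount: number): 'high' | 'medium' {
  return effectiveTokens >= HIGH_IMPACT_TOKENS || flaggedCount >= HIGH_IMPACT_SERVERS ? 'high' : 'medium'
}

// detectUnusedMcp skips only servers the coverage detector will report, so a
// small never-called server still gets a "configured but never called" finding.
function unusedMcpCandidates(neverCalled: string[], coverage: Coverage[]): string[] {
  const reported = new Set(coverage.filter(clearsCoverageGates).map(c => c.server))
  return neverCalled.filter(s => !reported.has(s))
}

const coverage: Coverage[] = [
  { server: 'big-mcp', toolsAvailable: 54, loadedSessions: 5, coverageRatio: 0 },
  { server: 'tiny-mcp', toolsAvailable: 4, loadedSessions: 5, coverageRatio: 0 },
]
console.log(coverage.filter(clearsCoverageGates).map(c => c.server).join(', ')) // big-mcp
console.log(unusedMcpCandidates(['big-mcp', 'tiny-mcp'], coverage).join(', ')) // tiny-mcp
console.log(impactFor(50_000, 1), impactFor(250_000, 1), impactFor(0, 3)) // medium high high
```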

Smoke-tested on a real account: 7 servers flagged across 93 sessions
(`office-word-mcp` 0/54, `notebooklm-mcp` 0/38, `office-ppt-mcp` 0/37,
`excel-mcp-server` 0/25, `github-mcp-server` 2/26, `peekaboo` 3/22, plus
`claude_ai_Asana`). Combined-cap costing keeps `tokensSaved` honest.

Changes:
- src/types.ts: optional `mcpInventory: string[]` on `SessionSummary`.
  Provider-agnostic field; currently populated only by the Claude parser.
- src/parser.ts: `extractMcpInventory` walks all entries, validates
  fully-qualified names, returns sorted unique list. `buildSessionSummary`
  passes it through; field is omitted when empty so JSON exports stay
  clean.
- src/optimize.ts: `aggregateMcpCoverage`, `estimateMcpSchemaCost`
  (single- and multi-server signatures), `detectMcpToolCoverage`. Wired
  into `scanAndDetect`. `detectUnusedMcp` updated to stay disjoint from
  the new detector.
- tests/mcp-coverage.test.ts: 23 cases covering aggregation, costing,
  combined-cap behaviour, threshold gates, invocation-only-session
  filtering, foreign-tool invocations, cache rebuild events, write+read
  on the same call, multi-server pluralisation.
- tests/parser-mcp-inventory.test.ts: 12 cases for the JSONL extractor
  including malformed name rejection and tolerant attachment parsing.
- CHANGELOG.md: entry under Unreleased / Added (CLI).

Closes #2
ozymandiashh 2026-05-05 04:13:04 +03:00
parent 18335a1f9d
commit 1a080a006f
6 changed files with 970 additions and 1 deletion


@@ -53,6 +53,18 @@ const LOW_RATIO_MEDIUM_THRESHOLD = 3
const MIN_API_CALLS_FOR_CACHE = 10
const CACHE_EXCESS_HIGH_THRESHOLD = 15000
const UNUSED_MCP_HIGH_THRESHOLD = 3
// MCP tool coverage detector thresholds. A server only earns a finding when
// every condition holds: the inventory is large enough to matter, real-world
// usage is poor, and we observed it in enough sessions to trust the signal.
const MCP_COVERAGE_MIN_TOOLS = 10
const MCP_COVERAGE_MIN_SESSIONS = 2
const MCP_COVERAGE_LOW_THRESHOLD = 0.20
const MCP_COVERAGE_HIGH_IMPACT_TOKENS = 200_000
// Anthropic prices cached input reads at roughly 10% of fresh input. We use
// this to keep "ongoing" overhead estimates honest: most MCP schema tokens
// live in the cached prefix and are charged at the discount rate on every
// turn that reads the cache rather than (re)writing it.
const CACHE_READ_DISCOUNT = 0.10
const GHOST_AGENTS_HIGH_THRESHOLD = 5
const GHOST_AGENTS_MEDIUM_THRESHOLD = 2
const GHOST_SKILLS_HIGH_THRESHOLD = 10
@@ -477,6 +489,298 @@ export function detectDuplicateReads(calls: ToolCall[], dateRange?: DateRange):
}
}
/**
* Per-server breakdown of MCP tool inventory vs invocations, computed from the
* `mcpInventory` field captured by the Claude parser.
*
* Each session that loaded a server contributes its observed tool list to
* the union for that server. Invocation totals come from the existing
* per-server `mcpBreakdown` counts; per-tool invocations come from each
* call's `mcpTools` list.
*/
export type McpServerCoverage = {
server: string
toolsAvailable: number
toolsInvoked: number
unusedTools: string[]
invocations: number
loadedSessions: number
coverageRatio: number
}
/**
* Aggregate MCP inventory and invocations across the projects in scope.
*
* Returns one entry per `mcp__<server>__*` namespace observed in any
* session's `mcpInventory`. Counts of invocations come from
* `session.mcpBreakdown` (per-server call totals already maintained by the
* parser).
*/
export function aggregateMcpCoverage(projects: ProjectSummary[]): McpServerCoverage[] {
type ServerAcc = {
inventory: Set<string>
invokedTools: Set<string>
invocations: number
loadedSessions: number
}
const servers = new Map<string, ServerAcc>()
function getOrInit(server: string): ServerAcc {
let acc = servers.get(server)
if (!acc) {
acc = { inventory: new Set(), invokedTools: new Set(), invocations: 0, loadedSessions: 0 }
servers.set(server, acc)
}
return acc
}
for (const project of projects) {
for (const session of project.sessions) {
// Only sessions with an observed inventory count toward `loadedSessions`.
// Pure invocation-only sessions (server seen via `call.mcpTools` or
// `session.mcpBreakdown` without any matching `deferred_tools_delta`)
// could otherwise satisfy the `MCP_COVERAGE_MIN_SESSIONS` threshold
// without giving us evidence that the schema was actually loaded.
const inventoriedServers = new Set<string>()
const sessionInvoked = new Map<string, Set<string>>()
// Inventory: union of tools observed available in this session.
for (const fqn of session.mcpInventory ?? []) {
const parts = fqn.split('__')
if (parts.length < 3 || parts[0] !== 'mcp') continue
const server = parts[1]
if (!server) continue
const tool = parts.slice(2).join('__')
if (!tool) continue
const acc = getOrInit(server)
acc.inventory.add(fqn)
inventoriedServers.add(server)
}
// Invoked tools: walk turns to collect per-tool invocations. We can't
// get this from session.mcpBreakdown alone because that's keyed by
// server, not tool.
for (const turn of session.turns) {
for (const call of turn.assistantCalls) {
for (const fqn of call.mcpTools) {
const parts = fqn.split('__')
if (parts.length < 3 || parts[0] !== 'mcp') continue
const server = parts[1]
if (!server) continue
let invoked = sessionInvoked.get(server)
if (!invoked) {
invoked = new Set()
sessionInvoked.set(server, invoked)
}
invoked.add(fqn)
}
}
}
// Invocation totals: trust mcpBreakdown, which the parser already
// aggregates per server, turn by turn. Per-tool detail comes from the
// `call.mcpTools` walk above.
for (const [server, data] of Object.entries(session.mcpBreakdown)) {
const acc = getOrInit(server)
acc.invocations += data.calls
}
for (const [server, invoked] of sessionInvoked) {
const acc = getOrInit(server)
for (const fqn of invoked) acc.invokedTools.add(fqn)
}
for (const server of inventoriedServers) {
getOrInit(server).loadedSessions += 1
}
}
}
const result: McpServerCoverage[] = []
for (const [server, acc] of servers) {
if (acc.inventory.size === 0) continue
// Coverage is only meaningful against tools we actually observed in the
// inventory: invocations of tools never inventoried (older config, typo,
// etc.) would otherwise inflate the numerator and could even drive
// the unused count negative.
const invokedInInventory = new Set<string>()
for (const fqn of acc.invokedTools) {
if (acc.inventory.has(fqn)) invokedInInventory.add(fqn)
}
const unusedTools = Array.from(acc.inventory).filter(t => !invokedInInventory.has(t)).sort()
const toolsInvoked = acc.inventory.size - unusedTools.length
result.push({
server,
toolsAvailable: acc.inventory.size,
toolsInvoked,
unusedTools,
invocations: acc.invocations,
loadedSessions: acc.loadedSessions,
coverageRatio: acc.inventory.size === 0 ? 0 : toolsInvoked / acc.inventory.size,
})
}
result.sort((a, b) => b.toolsAvailable - a.toolsAvailable)
return result
}
/**
* Cache-aware token cost estimate for the unused-tool overhead of one or
* more servers, summed across all sessions that loaded any of them.
*
* Returns three buckets:
* - `cacheWriteTokens`: schema tokens counted at full input price (each
* cache-creation event in a session that loaded one of the servers).
* - `cacheReadTokens`: schema tokens carried at the cache-read discount on
* subsequent turns (ongoing overhead).
* - `effectiveInputTokens`: equivalent fresh-input tokens, weighted by
* cache pricing. Used to estimate dollar cost downstream by multiplying
* by the project's input rate.
*
* We cap each call's contribution at the observed cache-creation /
* cache-read totals for that call: it is not meaningful to claim more MCP
* overhead than the call's own cache bucket could possibly contain. The
* cap is applied once across the combined unused-schema budget for all
* flagged servers, not per server, so two flagged servers cannot both
* independently claim the same call's cache bucket.
*
* Anthropic caches expire after roughly 5 minutes of inactivity, so a long
* session can rebuild the cache multiple times. Every call that reports
* `cacheCreationInputTokens > 0` is treated as another rebuild, not just
* the very first one.
*
* "Loaded" is defined exclusively by observed inventory: a session that
* invoked a server without ever emitting a `deferred_tools_delta` for it
* does not count, matching the invariant `aggregateMcpCoverage` uses for
* `loadedSessions`.
*/
export function estimateMcpSchemaCost(
unusedToolCounts: Record<string, number> | number,
projects: ProjectSummary[],
serverOrServers: string | string[],
): { cacheWriteTokens: number; cacheReadTokens: number; effectiveInputTokens: number } {
// Backward-compatible single-server signature used by tests.
const servers = Array.isArray(serverOrServers) ? serverOrServers : [serverOrServers]
const counts: Record<string, number> = typeof unusedToolCounts === 'number'
? { [serverOrServers as string]: unusedToolCounts }
: unusedToolCounts
const totalUnusedSchemaTokens = servers.reduce(
(s, srv) => s + (counts[srv] ?? 0) * TOKENS_PER_MCP_TOOL,
0,
)
if (totalUnusedSchemaTokens === 0) {
return { cacheWriteTokens: 0, cacheReadTokens: 0, effectiveInputTokens: 0 }
}
const serverSet = new Set(servers)
let cacheWriteTokens = 0
let cacheReadTokens = 0
for (const project of projects) {
for (const session of project.sessions) {
// A session counts only if its observed inventory included at least
// one of the flagged servers: the same invariant `aggregateMcpCoverage`
// uses for `loadedSessions`. Only the flagged servers this session
// actually loaded contribute to its per-call budget, so a session that
// loaded one flagged server is not charged for the others' schemas.
const loadedServers = new Set<string>()
for (const fqn of session.mcpInventory ?? []) {
const seg = fqn.split('__')[1]
if (seg && serverSet.has(seg)) loadedServers.add(seg)
}
let sessionBudget = 0
for (const srv of loadedServers) sessionBudget += (counts[srv] ?? 0) * TOKENS_PER_MCP_TOOL
if (sessionBudget === 0) continue
for (const turn of session.turns) {
for (const call of turn.assistantCalls) {
// Both buckets can be non-zero on the same call (cache rebuild
// alongside a partial read), so account for them independently.
// The cap is applied once to the combined budget so multiple
// flagged servers cannot all claim the same call.
if (call.usage.cacheCreationInputTokens > 0) {
cacheWriteTokens += Math.min(sessionBudget, call.usage.cacheCreationInputTokens)
}
if (call.usage.cacheReadInputTokens > 0) {
cacheReadTokens += Math.min(sessionBudget, call.usage.cacheReadInputTokens)
}
}
}
}
}
const effectiveInputTokens = cacheWriteTokens + cacheReadTokens * CACHE_READ_DISCOUNT
return { cacheWriteTokens, cacheReadTokens, effectiveInputTokens }
}
/**
* Find MCP servers whose tool inventory is largely unused. Complements the
* server-level `detectUnusedMcp`, which only flags servers with literally
* zero invocations; `detectUnusedMcp` skips any server this detector
* reports, so the two findings never overlap.
*
* A server is flagged when all of the following hold:
* - it exposed more than `MCP_COVERAGE_MIN_TOOLS` tools,
* - we saw it loaded in at least `MCP_COVERAGE_MIN_SESSIONS` sessions,
* - the coverage ratio is below `MCP_COVERAGE_LOW_THRESHOLD`.
*
* Token-savings estimates use the cache-aware accounting from
* `estimateMcpSchemaCost` so we don't mistake cached-prefix carry-over for
* fresh-input billing.
*/
export function detectMcpToolCoverage(
projects: ProjectSummary[],
): WasteFinding | null {
const coverage = aggregateMcpCoverage(projects)
if (coverage.length === 0) return null
const flagged = coverage.filter(c =>
c.toolsAvailable > MCP_COVERAGE_MIN_TOOLS
&& c.loadedSessions >= MCP_COVERAGE_MIN_SESSIONS
&& c.coverageRatio < MCP_COVERAGE_LOW_THRESHOLD,
)
if (flagged.length === 0) return null
flagged.sort((a, b) => (b.toolsAvailable - b.toolsInvoked) - (a.toolsAvailable - a.toolsInvoked))
const lines: string[] = []
const removeCommands: string[] = []
const unusedCountsByServer: Record<string, number> = {}
const flaggedServers: string[] = []
for (const c of flagged) {
unusedCountsByServer[c.server] = c.toolsAvailable - c.toolsInvoked
flaggedServers.push(c.server)
const pct = Math.round(c.coverageRatio * 100)
lines.push(
`${c.server}: ${c.toolsInvoked}/${c.toolsAvailable} tools used (${pct}% coverage) across ${c.loadedSessions} session${c.loadedSessions === 1 ? '' : 's'}`,
)
removeCommands.push(`claude mcp remove ${c.server}`)
}
// Single combined cost pass: caps each call's contribution at the
// total unused-schema budget across all flagged servers, so two
// flagged servers cannot independently claim the same call's cache
// bucket and overstate `tokensSaved`.
const cost = estimateMcpSchemaCost(unusedCountsByServer, projects, flaggedServers)
const tokensSaved = Math.round(cost.effectiveInputTokens)
const impact: Impact = tokensSaved >= MCP_COVERAGE_HIGH_IMPACT_TOKENS
|| flagged.length >= UNUSED_MCP_HIGH_THRESHOLD
? 'high'
: 'medium'
return {
title: `${flagged.length} MCP server${flagged.length === 1 ? '' : 's'} with low tool coverage`,
explanation:
`Schema for unused tools is loaded into the system prompt every session and ` +
`carried in the cached prefix on every turn. ` +
`${lines.join('; ')}.`,
impact,
tokensSaved,
fix: {
type: 'command',
label: flagged.length === 1
? 'Remove the underused server, or trim its tools in your MCP config:'
: 'Remove underused servers, or trim their tools in your MCP config:',
text: removeCommands.join('\n'),
},
}
}
export function detectUnusedMcp(
calls: ToolCall[],
projects: ProjectSummary[],
@@ -497,10 +801,27 @@ export function detectUnusedMcp(
}
}
// Servers that the new coverage detector will flag fall under its
// jurisdiction (per-tool granularity, cache-aware costing) and we
// suppress them here to avoid double-flagging. Importantly, we suppress
// only the servers that actually clear the coverage detector's
// thresholds — a small, inventoried-but-uninvoked server that the
// coverage detector skips would otherwise become a blind spot.
const coverageReportedServers = new Set(
aggregateMcpCoverage(projects)
.filter(c =>
c.toolsAvailable > MCP_COVERAGE_MIN_TOOLS
&& c.loadedSessions >= MCP_COVERAGE_MIN_SESSIONS
&& c.coverageRatio < MCP_COVERAGE_LOW_THRESHOLD,
)
.map(c => c.server),
)
const now = Date.now()
const unused: string[] = []
for (const entry of configured.values()) {
if (calledServers.has(entry.normalized)) continue
if (coverageReportedServers.has(entry.normalized)) continue
if (entry.mtime > 0 && now - entry.mtime < MCP_NEW_CONFIG_GRACE_MS) continue
unused.push(entry.original)
}
@@ -973,6 +1294,7 @@ export async function scanAndDetect(
() => detectJunkReads(toolCalls, dateRange),
() => detectDuplicateReads(toolCalls, dateRange),
() => detectUnusedMcp(toolCalls, projects, projectCwds),
() => detectMcpToolCoverage(projects),
() => detectBloatedClaudeMd(projectCwds),
() => detectBashBloat(),
]


@@ -203,10 +203,54 @@ function groupIntoTurns(entries: JournalEntry[], seenMsgIds: Set<string>): Parse
return turns
}
// Fully-qualified MCP tool name shape: `mcp__<server>__<tool>`. Both server
// and tool segments must be non-empty. Names like `mcp__server` (no tool
// segment) or `mcp__server__` (trailing empty tool) would silently pollute
// the inventory and break downstream `split('__')` consumers, so they're
// rejected here.
function isMcpToolName(name: string): boolean {
if (!name.startsWith('mcp__')) return false
const rest = name.slice(5) // strip `mcp__`
const sep = rest.indexOf('__')
if (sep <= 0) return false // missing or empty server
if (sep >= rest.length - 2) return false // missing or empty tool
return true
}
/**
* Extract MCP tool inventory observed across a session's JSONL entries.
*
* Claude Code emits `attachment.type === "deferred_tools_delta"` entries whose
* `addedNames` array lists every tool currently available at that turn (built-in
* tools plus all `mcp__<server>__<tool>` names exposed by configured MCP
* servers). Tool inventory can change mid-session if the user reloads MCP
* config, so we union every occurrence rather than trusting only the first.
*
* Built-in tools and malformed names are filtered out: only names that pass
* `isMcpToolName` survive.
*/
export function extractMcpInventory(entries: JournalEntry[]): string[] {
const inventory = new Set<string>()
for (const entry of entries) {
const att = entry['attachment']
if (!att || typeof att !== 'object') continue
const a = att as { type?: unknown; addedNames?: unknown }
if (a.type !== 'deferred_tools_delta') continue
if (!Array.isArray(a.addedNames)) continue
for (const name of a.addedNames) {
if (typeof name !== 'string') continue
if (!isMcpToolName(name)) continue
inventory.add(name)
}
}
if (inventory.size === 0) return []
return Array.from(inventory).sort()
}
function buildSessionSummary(
sessionId: string,
project: string,
turns: ClassifiedTurn[],
mcpInventory?: string[],
): SessionSummary {
const modelBreakdown: SessionSummary['modelBreakdown'] = Object.create(null)
const toolBreakdown: SessionSummary['toolBreakdown'] = Object.create(null)
@@ -311,6 +355,7 @@ function buildSessionSummary(
bashBreakdown,
categoryBreakdown,
skillBreakdown,
...(mcpInventory && mcpInventory.length > 0 ? { mcpInventory } : {}),
}
}
@@ -362,7 +407,14 @@ async function parseSessionFile(
}
const classified = turns.map(classifyTurn)
// Inventory is extracted from the full entry stream, not just the
// turns we kept after date filtering: tool availability is set up
// once at the start of a session (with possible mid-session reloads),
// and we want to reflect what was loaded even if the user only ran
// turns inside a narrow date window.
const mcpInventory = extractMcpInventory(entries)
return buildSessionSummary(sessionId, project, classified, mcpInventory)
}
async function collectJsonlFiles(dirPath: string): Promise<string[]> {


@@ -121,6 +121,12 @@ export type SessionSummary = {
bashBreakdown: Record<string, { calls: number }>
categoryBreakdown: Record<TaskCategory, { turns: number; costUSD: number; retries: number; editTurns: number; oneShotTurns: number }>
skillBreakdown: Record<string, { turns: number; costUSD: number; editTurns: number; oneShotTurns: number }>
// Observed MCP tools available in this session, captured from the
// `addedNames` of `deferred_tools_delta` attachment entries. Union across
// all turns. Each name is a fully-qualified `mcp__<server>__<tool>` identifier.
// Built-in tools (Bash, Edit, etc.) are filtered out. Provider-agnostic field;
// currently populated only by the Claude parser.
mcpInventory?: string[]
}
export type ProjectSummary = {