codeburn

mirror of https://github.com/AgentSeal/codeburn.git synced 2026-05-17 12:20:43 +00:00

Author	SHA1	Message	Date
ozymandiashh	1a080a006f	feat(optimize): MCP tool coverage detector with cache-aware costing Adds a per-tool optimizer finding for MCP servers whose schema is loaded on every turn but rarely invoked. Builds on the existing server-level `detectUnusedMcp` (zero invocations) by reporting partial-use cases: "loaded 54 tools, called 0" or "loaded 26 tools, called 2 (8% coverage)". Inventory comes from Claude Code's JSONL `attachment.deferred_tools_delta` entries: `addedNames` lists the exact tools available at that turn, including every fully-qualified `mcp__<server>__<tool>` name. We union across all delta entries in a session (not just the first) because tool availability can change mid-session when the user reloads MCP config or a subagent inherits a different tool set. Names that don't match the `mcp__<server>__<tool>` shape with both segments non-empty are rejected at extraction so downstream `split('__')` consumers can't be poisoned. Token-savings estimates are cache-aware. MCP tool schemas live in the cached prefix of the system prompt: a session pays the full input price on each cache-creation turn (rebuilds happen every ~5 minutes of inactivity) and the cache-read discount on subsequent turns. Each call's contribution is capped at its observed `cacheCreationInputTokens` / `cacheReadInputTokens` so we never claim more MCP overhead than the call's own cache buckets could contain. When multiple servers are flagged, costing happens in a single combined pass: the per-call cap applies to the total unused-schema budget across all flagged servers, not per server. Two flagged servers cannot both independently claim the same call's cache bucket, which would otherwise overstate `tokensSaved` and misclassify findings as high impact. A session counts toward `loadedSessions` (and toward the cost estimate) only if its observed inventory included the server. 
Pure invocation-only sessions, where the server appears in `mcpBreakdown` or `call.mcpTools` without any matching `deferred_tools_delta`, do not satisfy the `>= 2 sessions` threshold on their own. The same invariant applies in `estimateMcpSchemaCost` so the two passes agree. Coverage is computed against the inventory only: invocations of names not present in any observed inventory (older config, hallucinated tool, typo) do not inflate `toolsInvoked` and cannot drive `unusedCount` negative. `toolsInvoked` is derived as `inventory.size - unusedTools.length` to keep both numbers consistent. `detectUnusedMcp` and the new detector are explicitly disjoint: `detectUnusedMcp` skips servers that the coverage detector will report, not every server that happens to be in any inventory, so a small inventoried-but-uninvoked server below the coverage thresholds still gets flagged as "configured but never called." Thresholds for the coverage finding: - > 10 tools available (small servers are noise) - < 20% coverage - >= 2 sessions with observed inventory - High impact when total effective tokens >= 200_000 or >= 3 servers flagged Smoke-tested on a real account: 7 servers flagged across 93 sessions (`office-word-mcp` 0/54, `notebooklm-mcp` 0/38, `office-ppt-mcp` 0/37, `excel-mcp-server` 0/25, `github-mcp-server` 2/26, `peekaboo` 3/22, plus `claude_ai_Asana`). Combined-cap costing keeps `tokensSaved` honest. Changes: - src/types.ts: optional `mcpInventory: string[]` on `SessionSummary`. Provider-agnostic field; currently populated only by the Claude parser. - src/parser.ts: `extractMcpInventory` walks all entries, validates fully-qualified names, returns sorted unique list. `buildSessionSummary` passes it through; field is omitted when empty so JSON exports stay clean. - src/optimize.ts: `aggregateMcpCoverage`, `estimateMcpSchemaCost` (single- and multi-server signatures), `detectMcpToolCoverage`. Wired into `scanAndDetect`. 
`detectUnusedMcp` updated to be disjoint from the new detector. - tests/mcp-coverage.test.ts: 23 cases covering aggregation, costing, combined-cap behaviour, threshold gates, invocation-only-session filtering, foreign-tool invocations, cache rebuild events, write+read on the same call, multi-server pluralisation. - tests/parser-mcp-inventory.test.ts: 12 cases for the JSONL extractor including malformed name rejection and tolerant attachment parsing. - CHANGELOG.md: entry under Unreleased / Added (CLI). Closes #2	2026-05-05 04:13:04 +03:00
voidborne-d	c16b21ec50	fix(classifier): surface skill name as subCategory for general turns (#203) Turns whose only assistant tool is `Skill` collapse to category `general` because `classifyByToolPattern` returns `'general'` and `refineByKeywords` only operates on `coding`/`exploration`. In environments that lean on Claude Code skills, the per-activity dashboard column flattens: every `/init`, `/review`, `/security-review`, `/claude-api`, plus user-defined skills, all land in `general` with no signal about which workflow ran. Implements Option A from the issue: - `ParsedApiCall.skills: string[]` populated in the Anthropic-path parser via a new `extractSkillNames` helper that reads `input.skill || input.name` from each `Skill` ToolUseBlock (mirrors `detectGhostSkills` extraction at optimize.ts:765 so the two stay in sync). - `ClassifiedTurn.subCategory?: string` set to the first skill name when the resolved category is `general` AND any skill identifier was extracted. Top-level category stays `general`; existing aggregations, exports, and category-keyed code paths unchanged. - `SessionSummary.skillBreakdown: Record<string, {turns,costUSD,editTurns, oneShotTurns}>` populated in the same per-turn loop that builds `categoryBreakdown`. Provider sessions (Codex/Cursor/etc.) keep `skills: []` because they don't expose the Skill tool surface today. - Dashboard `ActivityBreakdown` renders top-N skill sub-rows beneath the `general` row when present (indented `/skill-name`, dimmed). Other categories render exactly as before; if no skills were invoked, the panel is byte-identical to current output. Existing 419 tests still pass.
New `tests/classifier.test.ts` adds 8 cases: single skill via `input.skill`, single via `input.name`, first-wins for multi-skill turns, aggregation across multiple assistant calls in one turn, no-name fallback (`subCategory` stays undefined), `Skill+Edit` promoting to `coding` and dropping subCategory, non-Skill general turns, and a legacy ParsedApiCall shape with `skills` field absent (forward-compat). Pre-fix verification by stashing the source change reproduces 4/8 failures with the exact "expected 'init', received undefined" diff; restoring → 8/8 pass. Closes #203. 🤖 AI assistance disclosure: assistant-scaffolded by Claude (Opus 4.7); author of record reviewed every line, ran the full vitest suite locally (`npm test` → 32 files / 427 tests pass), `npx tsc --noEmit` clean, and `npm run build` produces a clean ESM bundle.	2026-05-04 06:26:45 +08:00
AgentSeal	391a235d1d	feat: multi-provider support (Codex + provider plugin system) Add Codex (OpenAI) as a second provider alongside Claude Code. Provider plugin architecture makes adding future providers (Pi, OpenCode, Amp) a single-file addition. - Provider interface: types, session discovery, stateful JSONL parsing - Codex parser: token_count dedup, tool normalization, model resolution - TUI: press p to cycle All/Claude/Codex with 1-min cache for instant switching - CLI: --provider flag on report, today, month, status, export commands - Pricing: Codex model fallbacks, fixed fuzzy matching for gpt-5.4-mini - Menubar: per-provider cost breakdown when multiple providers detected - 27 tests (10 new: Codex parser, provider registry, tool/model mapping)	2026-04-14 04:32:09 -07:00
Rafael Calleja	6d8c8643a0	feat: extract bash commands and add bashBreakdown to session summary	2026-04-14 10:24:38 +02:00
AgentSeal	d20281514c	feat: one-shot success rate per activity category Detects edit/test/fix retry cycles (Edit -> Bash -> Edit) within each turn. Shows 1-shot percentage in the By Activity panel for categories that involve code edits. Updated screenshot and README. Fixes #4	2026-04-14 01:14:34 -07:00
AgentSeal	00afed6930	v0.1.0 - initial release Interactive TUI dashboard for Claude Code token observability. 13-category task classifier, per-project/model/tool breakdowns, gradient bar charts, SwiftBar menu bar widget, CSV/JSON export.	2026-04-13 15:10:27 -07:00
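
The inventory extraction and coverage thresholds described in commit 1a080a006f can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `JsonlEntry` shape (an `attachment` object with `type: "deferred_tools_delta"`), and the helpers `parseMcpName`, `extractInventory`, and `serverCoverage` are hypothetical names chosen for the example. Only the `addedNames` field, the `mcp__<server>__<tool>` name shape, and the three thresholds (> 10 tools, < 20% coverage, >= 2 sessions) come from the commit message.

```typescript
// Assumed JSONL entry shape; the real nesting in Claude Code logs may differ.
type JsonlEntry = { attachment?: { type?: string; addedNames?: unknown } };

// Accept only `mcp__<server>__<tool>` with both segments non-empty.
function parseMcpName(name: string): { server: string; tool: string } | null {
  if (!name.startsWith("mcp__")) return null;
  const rest = name.slice("mcp__".length);
  const sep = rest.indexOf("__");
  if (sep <= 0) return null; // missing separator or empty server segment
  const tool = rest.slice(sep + 2);
  if (tool.length === 0) return null; // empty tool segment
  return { server: rest.slice(0, sep), tool };
}

// Union valid names across every delta entry, returned sorted and unique.
function extractInventory(entries: JsonlEntry[]): string[] {
  const names = new Set<string>();
  for (const entry of entries) {
    const att = entry.attachment;
    if (!att || att.type !== "deferred_tools_delta") continue;
    const added = att.addedNames;
    if (!Array.isArray(added)) continue; // tolerate malformed attachments
    for (const name of added) {
      if (typeof name === "string" && parseMcpName(name) !== null) names.add(name);
    }
  }
  return [...names].sort();
}

type Coverage = {
  toolsAvailable: number;
  toolsInvoked: number;
  coverage: number; // fraction in [0, 1]
  flagged: boolean;
};

// Coverage is measured against the inventory only, so invocations of names
// outside the inventory cannot inflate toolsInvoked.
function serverCoverage(
  inventory: string[],
  invoked: string[],
  server: string,
  loadedSessions: number,
): Coverage {
  const available = inventory.filter((n) => parseMcpName(n)?.server === server);
  const invokedSet = new Set(invoked);
  const unused = available.filter((n) => !invokedSet.has(n));
  const toolsInvoked = available.length - unused.length;
  const coverage = available.length === 0 ? 0 : toolsInvoked / available.length;
  // Thresholds quoted from the commit message.
  const flagged = available.length > 10 && coverage < 0.2 && loadedSessions >= 2;
  return { toolsAvailable: available.length, toolsInvoked, coverage, flagged };
}
```

Deriving `toolsInvoked` from the unused list, rather than counting invocations directly, keeps the two numbers consistent even when a session calls a tool name that never appeared in any observed inventory.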

6 commits
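
The skill sub-category rule from commit c16b21ec50 can be illustrated with a short sketch. The function bodies below are assumptions written for this example, not the repository's implementation: `ToolUseBlock` is a simplified hypothetical shape, and `subCategoryFor` is a hypothetical helper name. The `input.skill || input.name` lookup, the first-skill-wins rule, and the restriction to the `general` category are taken from the commit message.

```typescript
// Simplified, assumed shape of an assistant tool-use block.
type ToolUseBlock = {
  type: "tool_use";
  name: string;
  input?: Record<string, unknown>;
};

// Collect skill identifiers from `Skill` tool calls, in call order.
function extractSkillNames(blocks: ToolUseBlock[]): string[] {
  const names: string[] = [];
  for (const block of blocks) {
    if (block.name !== "Skill") continue;
    const raw = block.input?.skill || block.input?.name;
    if (typeof raw === "string" && raw.length > 0) names.push(raw);
  }
  return names;
}

// The first skill name becomes the sub-category, but only for `general` turns.
// `skills` may be absent on legacy records, so it is treated as optional.
function subCategoryFor(category: string, skills?: string[]): string | undefined {
  if (category !== "general") return undefined;
  return skills?.[0];
}
```

Under these assumptions, a turn that mixes `Skill` with `Edit` resolves to `coding` upstream and therefore gets no sub-category, which matches the `Skill+Edit` test case listed in the commit.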