codeburn

mirror of https://github.com/AgentSeal/codeburn.git synced 2026-05-17 03:56:45 +00:00

Author	SHA1	Message	Date
ozymandiashh	1a080a006f	feat(optimize): MCP tool coverage detector with cache-aware costing Adds a per-tool optimizer finding for MCP servers whose schema is loaded on every turn but rarely invoked. Builds on the existing server-level `detectUnusedMcp` (zero invocations) by reporting partial-use cases: "loaded 54 tools, called 0" or "loaded 26 tools, called 2 (8% coverage)". Inventory comes from Claude Code's JSONL `attachment.deferred_tools_delta` entries: `addedNames` lists the exact tools available at that turn, including every fully-qualified `mcp__<server>__<tool>` name. We union across all delta entries in a session (not just the first) because tool availability can change mid-session when the user reloads MCP config or a subagent inherits a different tool set. Names that don't match the `mcp__<server>__<tool>` shape with both segments non-empty are rejected at extraction so downstream `split('__')` consumers can't be poisoned. Token-savings estimates are cache-aware. MCP tool schemas live in the cached prefix of the system prompt: a session pays the full input price on each cache-creation turn (rebuilds happen every ~5 minutes of inactivity) and the cache-read discount on subsequent turns. Each call's contribution is capped at its observed `cacheCreationInputTokens` / `cacheReadInputTokens` so we never claim more MCP overhead than the call's own cache buckets could contain. When multiple servers are flagged, costing happens in a single combined pass: the per-call cap applies to the total unused-schema budget across all flagged servers, not per server. Two flagged servers cannot both independently claim the same call's cache bucket, which would otherwise overstate `tokensSaved` and misclassify findings as high impact. A session counts toward `loadedSessions` (and toward the cost estimate) only if its observed inventory included the server. 
Pure invocation-only sessions, where the server appears in `mcpBreakdown` or `call.mcpTools` without any matching `deferred_tools_delta`, do not satisfy the `>= 2 sessions` threshold on their own. The same invariant applies in `estimateMcpSchemaCost` so the two passes agree. Coverage is computed against the inventory only: invocations of names not present in any observed inventory (older config, hallucinated tool, typo) do not inflate `toolsInvoked` and cannot drive `unusedCount` negative. `toolsInvoked` is derived as `inventory.size - unusedTools.length` to keep both numbers consistent. `detectUnusedMcp` and the new detector are explicitly disjoint: `detectUnusedMcp` skips servers that the coverage detector will report, not every server that happens to be in any inventory, so a small inventoried-but-uninvoked server below the coverage thresholds still gets flagged as "configured but never called." Thresholds for the coverage finding: - > 10 tools available (small servers are noise) - < 20% coverage - >= 2 sessions with observed inventory - High impact when total effective tokens >= 200_000 or >= 3 servers flagged Smoke-tested on a real account: 7 servers flagged across 93 sessions (`office-word-mcp` 0/54, `notebooklm-mcp` 0/38, `office-ppt-mcp` 0/37, `excel-mcp-server` 0/25, `github-mcp-server` 2/26, `peekaboo` 3/22, plus `claude_ai_Asana`). Combined-cap costing keeps `tokensSaved` honest. Changes: - src/types.ts: optional `mcpInventory: string[]` on `SessionSummary`. Provider-agnostic field; currently populated only by the Claude parser. - src/parser.ts: `extractMcpInventory` walks all entries, validates fully-qualified names, returns sorted unique list. `buildSessionSummary` passes it through; field is omitted when empty so JSON exports stay clean. - src/optimize.ts: `aggregateMcpCoverage`, `estimateMcpSchemaCost` (single- and multi-server signatures), `detectMcpToolCoverage`. Wired into `scanAndDetect`. 
`detectUnusedMcp` updated to stay disjoint from the new detector. - tests/mcp-coverage.test.ts: 23 cases covering aggregation, costing, combined-cap behaviour, threshold gates, invocation-only-session filtering, foreign-tool invocations, cache rebuild events, write+read on the same call, multi-server pluralisation. - tests/parser-mcp-inventory.test.ts: 12 cases for the JSONL extractor including malformed name rejection and tolerant attachment parsing. - CHANGELOG.md: entry under Unreleased / Added (CLI). Closes #2	2026-05-05 04:13:04 +03:00
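The inventory and coverage rules in this commit can be sketched roughly as follows. Those rules are: strict `mcp__<server>__<tool>` names, a union across every delta, coverage counted against the inventory only, and the finding thresholds. This is a hypothetical illustration, not the project's code. The input shapes are assumptions: one `addedNames` array per `deferred_tools_delta` entry, and one list of invoked names per session. The name check here also demands exactly three `__`-separated parts, and cache-aware costing is left out.

```typescript
// Hypothetical sketch of the MCP inventory and coverage rules described above.
// Assumed inputs: one `addedNames` array per deferred_tools_delta entry, and one
// list of invoked tool names per session. Cache-aware costing is not modeled.

// Accept only `mcp__<server>__<tool>` with non-empty segments (this sketch also
// rejects names containing extra `__` separators).
function parseMcpName(name: string): { server: string; tool: string } | null {
  const parts = name.split("__");
  if (parts.length !== 3 || parts[0] !== "mcp" || !parts[1] || !parts[2]) return null;
  return { server: parts[1], tool: parts[2] };
}

// Union every delta in a session, not just the first, then sort for stable output.
function extractMcpInventory(deltas: string[][]): string[] {
  const names = new Set<string>();
  for (const added of deltas) {
    for (const name of added) {
      if (parseMcpName(name)) names.add(name);
    }
  }
  return [...names].sort();
}

interface ServerCoverage {
  server: string;
  loaded: number; // distinct tools seen in any inventory
  invoked: number; // inventory tools that were actually called
  sessions: number; // sessions whose inventory included this server
}

function coverageByServer(sessions: { inventory: string[]; invoked: string[] }[]): ServerCoverage[] {
  const inventory = new Map<string, Set<string>>();
  const sessionCount = new Map<string, number>();
  const called = new Set<string>();
  for (const s of sessions) {
    const serversHere = new Set<string>();
    for (const name of s.inventory) {
      const parsed = parseMcpName(name);
      if (!parsed) continue;
      let tools = inventory.get(parsed.server);
      if (!tools) {
        tools = new Set<string>();
        inventory.set(parsed.server, tools);
      }
      tools.add(name);
      serversHere.add(parsed.server);
    }
    // Only an observed inventory counts a session; invocations alone do not.
    for (const server of serversHere) sessionCount.set(server, (sessionCount.get(server) ?? 0) + 1);
    for (const name of s.invoked) called.add(name);
  }
  return [...inventory].map(([server, tools]) => {
    // Coverage is measured against the inventory, so unknown invoked names are ignored.
    const unused = [...tools].filter((t) => !called.has(t));
    return { server, loaded: tools.size, invoked: tools.size - unused.length, sessions: sessionCount.get(server) ?? 0 };
  });
}

// Thresholds from the commit: > 10 tools, < 20% coverage, >= 2 sessions with inventory.
function isFlagged(c: ServerCoverage): boolean {
  return c.loaded > 10 && c.invoked / c.loaded < 0.2 && c.sessions >= 2;
}
```

Deriving `invoked` as `loaded - unused` means a call to a name that no inventory lists (a stale config entry or a typo) cannot raise coverage or push the unused count negative.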
voidborne-d	c16b21ec50	fix(classifier): surface skill name as subCategory for general turns (#203) Turns whose only assistant tool is `Skill` collapse to category `general` because `classifyByToolPattern` returns `'general'` and `refineByKeywords` only operates on `coding`/`exploration`. In environments that lean on Claude Code skills, the per-activity dashboard column flattens: every `/init`, `/review`, `/security-review`, `/claude-api`, plus user-defined skills, all land in `general` with no signal about which workflow ran. Implements Option A from the issue: - `ParsedApiCall.skills: string[]` populated in the Anthropic-path parser via a new `extractSkillNames` helper that reads `input.skill || input.name` from each `Skill` ToolUseBlock (mirrors `detectGhostSkills` extraction at optimize.ts:765 so the two stay in sync). - `ClassifiedTurn.subCategory?: string` set to the first skill name when the resolved category is `general` AND any skill identifier was extracted. Top-level category stays `general`, so existing aggregations, exports, and category-keyed code paths are unchanged. - `SessionSummary.skillBreakdown: Record<string, {turns, costUSD, editTurns, oneShotTurns}>` populated in the same per-turn loop that builds `categoryBreakdown`. Provider sessions (Codex/Cursor/etc.) keep `skills: []` because they don't expose the Skill tool surface today. - Dashboard `ActivityBreakdown` renders top-N skill sub-rows beneath the `general` row when present (indented `/skill-name`, dimmed). Other categories render exactly as before; if no skills were invoked, the panel is byte-identical to current output. Existing 419 tests still pass. 
New `tests/classifier.test.ts` adds 8 cases: single skill via `input.skill`, single via `input.name`, first-wins for multi-skill turns, aggregation across multiple assistant calls in one turn, no-name fallback (`subCategory` stays undefined), `Skill+Edit` promoting to `coding` and dropping subCategory, non-Skill general turns, and a legacy ParsedApiCall shape with `skills` field absent (forward-compat). Pre-fix verification by stashing the source change reproduces 4/8 failures with the exact "expected 'init', received undefined" diff; restoring → 8/8 pass. Closes #203. 🤖 AI assistance disclosure: assistant-scaffolded by Claude (Opus 4.7); author of record reviewed every line, ran the full vitest suite locally (`npm test` → 32 files / 427 tests pass), `npx tsc --noEmit` clean, and `npm run build` produces a clean ESM bundle.	2026-05-04 06:26:45 +08:00
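A minimal sketch of the skill extraction and the `subCategory` rule follows. It assumes simplified block and call shapes; the real `ParsedApiCall` and `ClassifiedTurn` types carry many more fields, and `skillSubCategory` is a hypothetical name.

```typescript
// Hypothetical sketch; block and call shapes are simplified assumptions.
interface ToolUseBlock {
  type: string;
  name: string;
  input?: { skill?: unknown; name?: unknown };
}

// Collect skill identifiers from Skill tool_use blocks, preferring input.skill over input.name.
function extractSkillNames(blocks: ToolUseBlock[]): string[] {
  const skills: string[] = [];
  for (const block of blocks) {
    if (block.type !== "tool_use" || block.name !== "Skill") continue;
    const id = block.input?.skill || block.input?.name;
    if (typeof id === "string" && id.length > 0) skills.push(id);
  }
  return skills;
}

// Only `general` turns get a subCategory: the first skill across the turn's calls.
// `skills` may be absent on legacy call shapes, so it is treated as optional.
function skillSubCategory(category: string, calls: { skills?: string[] }[]): string | undefined {
  if (category !== "general") return undefined;
  for (const call of calls) {
    const first = call.skills?.[0];
    if (first) return first;
  }
  return undefined;
}
```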
iamtoruk	800c106250	Fix streaming dedup: keep last occurrence of each message.id within session files Claude Code writes the same message.id multiple times during streaming. The first write has partial tokens (often 1) and no tool_use blocks. The last write has authoritative token counts and all tool_use/MCP blocks. Old behavior kept the first occurrence (keep-first), silently dropping real output tokens (a 6.3% undercount) and all MCP tool calls. New behavior keeps the last occurrence's content but preserves the first occurrence's timestamp for correct date bucketing. Validated against 21,390 real session files: 40.5% had duplicate IDs, and output tokens were understated by up to 78% per session.	2026-05-02 22:30:17 -07:00
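The keep-last, first-timestamp rule might look like the sketch below. It assumes a flattened entry shape with hypothetical field names (`messageId`, `outputTokens`); the real parser works on full JSONL entries.

```typescript
// Hypothetical sketch; the Entry shape is a simplified assumption.
interface Entry {
  messageId: string;
  timestamp: string;
  outputTokens: number;
}

// Keep the last write for each message.id, but carry over the first write's timestamp.
function dedupeStreamed(entries: Entry[]): Entry[] {
  const byId = new Map<string, Entry>();
  for (const entry of entries) {
    const prev = byId.get(entry.messageId);
    // Map.set on an existing key keeps its original position, so output order
    // follows each message's first appearance.
    byId.set(entry.messageId, prev ? { ...entry, timestamp: prev.timestamp } : entry);
  }
  return [...byId.values()];
}
```

Keeping the first timestamp leaves a streamed message on the day it started, even if its final write lands after midnight.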
Resham Joshi	8c845253c2	Add Antigravity IDE provider Fetch token usage from Antigravity's local language server via RPC. Falls back to cached results when the IDE is closed.	2026-05-02 08:58:23 -07:00
iamtoruk	8ab9ea916b	Add per-file result cache for Codex provider Fixes #183. Users with large Codex session directories (45 GB, 10K+ files) experienced CPU pegging because every 30-second refresh re-parsed all session files from scratch. Three optimizations: 1. readFirstLine now reads 16 KB via fs.open() instead of loading the entire file through readSessionFile. Cuts discovery I/O from ~45 GB to ~160 MB for 10K files. 2. Per-file result cache (codex-results.json) with mtime+size fingerprinting. Parsed results are cached on first run; subsequent runs return cached data instantly for unchanged files. 3. Cache-accelerated discovery skips header validation for cached files, pulling the project name directly from the cache manifest. Cache safety: fingerprint captured before read (no TOCTOU), atomic write via temp+fsync+rename, 0o600 permissions, Object.hasOwn for prototype pollution defense, eviction of deleted files on flush, try/finally ensures flush even on parse errors.	2026-04-30 16:43:41 -07:00
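The following sketches the 16 KB head read and the mtime+size fingerprint check. It assumes an in-memory `Map` as the cache. The persisted `codex-results.json` format, the atomic write, and eviction are not shown, and `parseWithCache` and `CacheEntry` are hypothetical names.

```typescript
import { open, stat } from "node:fs/promises";

// Hypothetical sketch; the real cache persists to codex-results.json with an
// atomic temp+fsync+rename write, which is not shown here.
const HEAD_BYTES = 16 * 1024;

// Read only the first 16 KB and return the text before the first newline.
async function readFirstLine(path: string): Promise<string | null> {
  const handle = await open(path, "r");
  try {
    const buf = Buffer.alloc(HEAD_BYTES);
    const { bytesRead } = await handle.read(buf, 0, HEAD_BYTES, 0);
    const text = buf.subarray(0, bytesRead).toString("utf8");
    const nl = text.indexOf("\n");
    if (nl !== -1) return text.slice(0, nl);
    // No newline: either the whole file fit, or the first line exceeds 16 KB.
    return bytesRead < HEAD_BYTES && text.length > 0 ? text : null;
  } finally {
    await handle.close();
  }
}

interface Fingerprint {
  mtimeMs: number;
  size: number;
}
interface CacheEntry extends Fingerprint {
  result: unknown;
}

function isFresh(entry: CacheEntry | undefined, fp: Fingerprint): entry is CacheEntry {
  return entry !== undefined && entry.mtimeMs === fp.mtimeMs && entry.size === fp.size;
}

async function parseWithCache(
  path: string,
  cache: Map<string, CacheEntry>,
  parse: (path: string) => Promise<unknown>,
): Promise<unknown> {
  // Fingerprint before reading (as the commit describes) to avoid a TOCTOU gap.
  const { mtimeMs, size } = await stat(path);
  const hit = cache.get(path);
  if (isFresh(hit, { mtimeMs, size })) return hit.result;
  const result = await parse(path);
  cache.set(path, { mtimeMs, size, result });
  return result;
}
```

Because the fingerprint is taken before parsing, a file that changes mid-parse is stored under its old fingerprint. The next run then sees a mismatch and re-parses it.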
Łukasz Majcher	5e49f17e64	fix: switch scanJsonlFile and parseSessionFile to readSessionLines to prevent OOM readViaStream (used for files ≥8 MB) reconstructs the full file as a single string via chunks.join('\n'), giving the same peak allocation as readFile. Callers then call content.split('\n'), creating a second copy. With FILE_READ_CONCURRENCY=16 and files up to 128 MB this can exhaust the V8 heap (~6 GB theoretical peak). readSessionLines already exists as a proper async generator that yields one line at a time. Switch both hot-path callers to iterate it directly so the full file string is never held in memory. Adds two tests: a spy test confirming readSessionLines is called (not readSessionFile), and a 500-entry correctness test. Fixes #131	2026-04-22 10:11:13 +00:00
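One way to write a line-at-a-time generator like `readSessionLines` is to run Node's `readline` over a read stream. This is a sketch, not necessarily how the project implements it, and `countEntries` is a hypothetical consumer.

```typescript
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";

// Hypothetical sketch: yield one line at a time so the file is never joined into one string.
async function* readSessionLines(path: string): AsyncGenerator<string> {
  const input = createReadStream(path, { encoding: "utf8" });
  const rl = createInterface({ input, crlfDelay: Infinity }); // treat \r\n as one break
  try {
    for await (const line of rl) yield line;
  } finally {
    // Also runs if the consumer stops early, releasing the file handle.
    rl.close();
    input.destroy();
  }
}

// Example consumer: count parseable JSONL entries, skipping blank or malformed lines.
async function countEntries(path: string): Promise<number> {
  let n = 0;
  for await (const line of readSessionLines(path)) {
    if (!line.trim()) continue;
    try {
      JSON.parse(line);
      n++;
    } catch {
      // malformed line: skip it
    }
  }
  return n;
}
```

Peak memory is then roughly bounded by the stream's chunk size plus the longest line, rather than by two full copies of the file.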
iamtoruk	b491a1f590	fix: bucket turns by assistant timestamp, filter at turn level A turn that straddles midnight (user typed at 23:58, assistant responded at 00:30) was bucketed and filtered inconsistently across call sites. parseSessionFile filtered entries by timestamp, producing orphan assistant calls that groupIntoTurns pushed as turns with empty timestamp. Some downstream code counted those (buildPeriodData summing project totals) and other code dropped them (renderStatusBar's empty-timestamp skip). The menubar showed today = $32 while the terminal status showed today = $27 for the same dataset; each was internally consistent but used a different turn-bucket rule. Fix both: parseSessionFile now builds all turns first, then filters each turn by its first assistant call timestamp (the moment cost was incurred). renderStatusBar buckets the same way. day-aggregator.ts already bucketed on assistant time, so it is now consistent too. Net effect: a turn is counted in the day the API call actually ran in.	2026-04-21 04:40:44 -07:00
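Turn-level filtering can be sketched as follows. The sketch assumes turns are already grouped and each turn's assistant calls are in chronological order. The `Turn` and `Call` shapes are hypothetical, and the sketch uses explicit UTC bounds rather than local days.

```typescript
// Hypothetical sketch; shapes are simplified assumptions.
interface Call {
  timestamp: string;
  costUSD: number;
}
interface Turn {
  userTimestamp: string;
  calls: Call[]; // assistant calls, assumed in chronological order
}

// A turn's bucket time is its first assistant call, when cost was actually incurred.
function turnTime(turn: Turn): number | null {
  const first = turn.calls[0];
  return first ? new Date(first.timestamp).getTime() : null;
}

// Group first, then filter whole turns, so a turn straddling midnight is never split.
function turnsInRange(turns: Turn[], start: Date, end: Date): Turn[] {
  return turns.filter((t) => {
    const at = turnTime(t);
    return at !== null && at >= start.getTime() && at < end.getTime();
  });
}
```

Turns with no assistant call are excluded in this sketch, since they incurred no cost.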
Ninym	5932a273a1	chore(ci): add semgrep guard against prototype pollution regressions in provider hot paths (#78) * chore(ci): add semgrep rule no-bracket-assign-on-literal-object-map * chore(ci): add workflow running semgrep bracket-assign guard on push/PR * fix(parser): use Object.create(null) for categoryBreakdown map * chore(ci): expand semgrep rule to cover ||, ??=, and if-guard variants * chore(ci): limit push trigger to main and add semgrep --strict * chore(ci): use jq to enforce finding count (--error unreliable in semgrep 1.x)	2026-04-18 15:10:24 -07:00
Resham Joshi	495a254338	feat(mac): native Swift menubar app + one-command install Introduces mac/ with a native SwiftUI menubar app that replaces the previous SwiftBar plugin entirely. Install via `npx codeburn menubar`, which downloads the .app from GitHub Releases, strips Gatekeeper quarantine, and drops it into ~/Applications. Highlights - mac/ SwiftUI app: agent tabs, Today/7/30/Month/All period switcher, Trend/Forecast/Pulse/Stats/Plan insights, activity + model breakdowns, optimize findings, CSV/JSON export, Star-on-GitHub banner, live 60s refresh, instant currency switching with offline FX cache. - Security: CodeburnCLI argv-based spawn (no shell interpretation), SafeFile symlink guards + O_NOFOLLOW writes, FX rate clamping to [0.0001, 1_000_000], keychain filtered to account == "default", removed byte-window credential log, in-flight refresh guard, POSIX flock on config.json writes, TerminalLauncher validates argv before AppleScript interpolation. - Performance: shared static NumberFormatter (thousands of allocations per popover redraw eliminated), concurrent pipe drain with 20 MB cap + 60s timeout in DataClient, Observation-tracked reactive UI, 5-min payload cache keyed on (period, provider). - CLI: new `codeburn menubar` subcommand that downloads + installs + launches the .app (no clone, no build). New `status --format menubar-json` payload builder. `export` rewritten to produce a folder of one-table-per-file CSVs with a `.codeburn-export` marker so arbitrary -o paths cannot be silently deleted. - Removed: src/menubar.ts (SwiftBar plugin generator), install-menubar / uninstall-menubar subcommands, `status --format menubar` directive output, tests/menubar.test.ts, tests/security/menubar-injection.test.ts. - Release: .github/workflows/release-menubar.yml builds universal binary, assembles .app, ad-hoc signs, zips, uploads on mac-v* tag push. Runs on the free macos-latest runner. 
Tests - 230 TypeScript tests pass - 10 Swift CapacityEstimator tests pass - TypeScript typecheck clean - Swift release build clean	2026-04-17 16:55:56 -07:00
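The marker-file guard on `export -o` might look like the sketch below. The function names, the error message, and the choice to throw (rather than, say, prompt) are assumptions. Only the `.codeburn-export` marker name and the one-table-per-file layout come from the commit.

```typescript
import { existsSync, mkdirSync, readdirSync, rmSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical sketch of the export folder guard.
const MARKER = ".codeburn-export";

// Only a folder carrying the marker from a previous export may be cleared.
function prepareExportDir(dir: string): void {
  if (existsSync(dir)) {
    const isPrevExport = existsSync(join(dir, MARKER));
    if (!isPrevExport && readdirSync(dir).length > 0) {
      throw new Error(`Refusing to overwrite ${dir}: not a previous export folder`);
    }
    if (isPrevExport) rmSync(dir, { recursive: true, force: true });
  }
  mkdirSync(dir, { recursive: true });
  writeFileSync(join(dir, MARKER), "");
}

// Quote a CSV cell when it contains a comma, quote, or line break.
function csvCell(v: string): string {
  return /[",\r\n]/.test(v) ? `"${v.replace(/"/g, '""')}"` : v;
}

// Write one CSV file per table.
function writeTables(dir: string, tables: Record<string, string[][]>): void {
  prepareExportDir(dir);
  for (const [name, rows] of Object.entries(tables)) {
    const csv = rows.map((r) => r.map(csvCell).join(",")).join("\n") + "\n";
    writeFileSync(join(dir, `${name}.csv`), csv);
  }
}
```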
Ninym	ee738a1b26	fix(parser): use bounded readSessionFile helper Replaces the unbounded readFile in parseSessionFile with the 128 MB-capped helper from src/fs-utils. Addresses MEDIUM-1 for the Claude provider hot path. Verbose-mode stderr output replaces the previous silent catch, closing LOW-1 as a side effect.	2026-04-17 08:32:19 +02:00
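A bounded read along these lines checks the file size before reading. `readBoundedFile` and its stderr messages are hypothetical. The real helper in `src/fs-utils` may have a different signature.

```typescript
import { readFile, stat } from "node:fs/promises";

// Hypothetical sketch of a size-capped reader; only the 128 MB figure is from the commit.
const MAX_SESSION_BYTES = 128 * 1024 * 1024;

// Return the contents, or null when the file is missing, unreadable, or over the cap.
async function readBoundedFile(
  path: string,
  maxBytes = MAX_SESSION_BYTES,
  verbose = false,
): Promise<string | null> {
  try {
    const { size } = await stat(path);
    if (size > maxBytes) {
      if (verbose) process.stderr.write(`skipping ${path}: ${size} bytes exceeds cap\n`);
      return null;
    }
    return await readFile(path, "utf8");
  } catch (err) {
    // Verbose mode reports the failure instead of swallowing it silently.
    if (verbose) process.stderr.write(`cannot read ${path}: ${String(err)}\n`);
    return null;
  }
}
```

The size check happens before the read, so a file that grows in between can still exceed the cap. A stricter helper would also limit the bytes actually read.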
Ninym	5b810161e7	fix(parser): block prototype pollution via Object.create(null) Initialize the four breakdown maps (model, tool, mcp, bash) with a null prototype so attacker-controlled keys named __proto__ create own properties on the map instead of mutating Object.prototype. Closes the HIGH-1 finding from the 2026-04-16 external security audit.	2026-04-17 08:32:18 +02:00
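The null prototype matters because of how plain objects treat `__proto__`. With a plain `{}`, `map["__proto__"]` returns `Object.prototype`, so a `map[key].calls++` pattern would write `calls` onto every object. The safe version can be sketched as follows (`countByKey` is a hypothetical helper):

```typescript
// Hypothetical sketch of a breakdown counter keyed by untrusted names.
type Counter = { calls: number };

function countByKey(keys: string[]): Record<string, Counter> {
  // No prototype: "__proto__" has no inherited accessor here, so it is an ordinary own key.
  const map: Record<string, Counter> = Object.create(null);
  for (const key of keys) {
    const entry = map[key] ?? (map[key] = { calls: 0 });
    entry.calls++;
  }
  return map;
}
```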
Travis Haley	67c504a60a	feat: add --project and --exclude filters for project-level filtering Adds two new repeatable flags to all commands (report, today, month, status, export): - --project <name>: include only projects matching name (substring, case-insensitive) - --exclude <name>: exclude projects matching name (substring, case-insensitive) Both flags can be specified multiple times to match multiple projects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 15:54:37 -06:00
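The matching rules can be sketched as a single filter, assuming the repeated flags have already been collected into arrays. `filterProjects` is hypothetical. Letting `--exclude` win when both flags match is also an assumption, since the commit does not say.

```typescript
// Hypothetical sketch: substring, case-insensitive include/exclude filtering.
function filterProjects<T extends { project: string }>(items: T[], include: string[], exclude: string[]): T[] {
  const inc = include.map((s) => s.toLowerCase());
  const exc = exclude.map((s) => s.toLowerCase());
  return items.filter(({ project }) => {
    const name = project.toLowerCase();
    // With no --project flags, every project passes the include step.
    if (inc.length > 0 && !inc.some((s) => name.includes(s))) return false;
    // Assumption: an --exclude match always removes the project.
    return !exc.some((s) => name.includes(s));
  });
}
```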
AgentSeal	2d114d9393	feat: add OpenCode provider Reads session data from OpenCode's SQLite databases at ~/.local/share/opencode/. Reuses the existing better-sqlite3 adapter (same as Cursor), lazy-loaded so users without OpenCode see no difference. Adds bashCommands to the provider interface so shell command breakdowns work across all providers. 31 tests, schema validation, diagnostic stderr on failures. Also fixes a pre-existing tsc error in currency.ts.	2026-04-15 14:24:37 -07:00
AgentSeal	94762ca1f4	fix: address review findings before merge - getProvider() now async, eliminates race condition with cursor loading - cursor:edit pseudo-tool prevents inflating Claude's Edit count in --provider all - Tightened SCRIPT_PATTERNS to avoid false positives (run requires file context) - Removed duplicated LANG_NAMES from cursor.ts (dashboard handles display) - Test no longer assumes cursor always loads (CI-safe) - Removed unnecessary type assertion and setTimeout yield	2026-04-15 05:31:51 -07:00
AgentSeal	51c56d0726	fix: include agent/subagent sessions, fix Codex cache hit and cost calculation - Remove agent-*.jsonl exclusion filter that was dropping ~46% of API calls - Scan subagents/ directories for subagent session files - Normalize Codex token semantics: OpenAI includes cached tokens inside input_tokens, subtract them to match Anthropic's separate reporting - Fixes cost double-counting and 100% cache hit display for Codex users	2026-04-14 10:18:14 -07:00
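The normalization amounts to moving the cached share out of `input_tokens`. The field names below (`input_tokens`, `cached_input_tokens`) are assumed to match Codex's usage payload, and the hit-rate formula is one plausible definition, not necessarily the project's.

```typescript
// Hypothetical sketch; payload field names are assumptions.
interface CodexUsage {
  input_tokens: number; // includes cached tokens (OpenAI semantics)
  cached_input_tokens: number;
  output_tokens: number;
}
interface NormalizedUsage {
  inputTokens: number; // uncached input only (Anthropic-style)
  cacheReadInputTokens: number;
  outputTokens: number;
}

function normalizeCodexUsage(u: CodexUsage): NormalizedUsage {
  // Clamp so a malformed payload cannot produce negative input.
  const cached = Math.min(u.cached_input_tokens, u.input_tokens);
  return { inputTokens: u.input_tokens - cached, cacheReadInputTokens: cached, outputTokens: u.output_tokens };
}

function cacheHitRate(u: NormalizedUsage): number {
  const total = u.inputTokens + u.cacheReadInputTokens;
  return total === 0 ? 0 : u.cacheReadInputTokens / total;
}
```

Without the subtraction, cached tokens would be priced once as fresh input and again as cache reads.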
AgentSeal	391a235d1d	feat: multi-provider support (Codex + provider plugin system) Add Codex (OpenAI) as a second provider alongside Claude Code. Provider plugin architecture makes adding future providers (Pi, OpenCode, Amp) a single-file addition. - Provider interface: types, session discovery, stateful JSONL parsing - Codex parser: token_count dedup, tool normalization, model resolution - TUI: press p to cycle All/Claude/Codex with 1-min cache for instant switching - CLI: --provider flag on report, today, month, status, export commands - Pricing: Codex model fallbacks, fixed fuzzy matching for gpt-5.4-mini - Menubar: per-provider cost breakdown when multiple providers detected - 27 tests (10 new: Codex parser, provider registry, tool/model mapping)	2026-04-14 04:32:09 -07:00
AgentSeal	cb5853c460	Merge pull request #6 from rafaelcalleja/feat/bash-breakdown-panel feat: add Shell Commands breakdown panel	2026-04-14 10:39:23 +02:00
AgentSeal	3964478e61	fix: handle unreadable session files gracefully readFile in parseSessionFile had no error handling. If a .jsonl file is missing, has bad permissions, or gets deleted between readdir and readFile, the whole process crashes with ENOENT. Now returns null and skips the file. Fixes #9	2026-04-14 01:31:31 -07:00
Rafael Calleja	a5696362f2	refactor: share BASH_TOOLS from classifier, remove comments - Export BASH_TOOLS from classifier.ts instead of duplicating in bash-utils.ts - Remove isBashTool helper (use BASH_TOOLS.has() directly) - Strip unnecessary comments per codebase conventions	2026-04-14 10:24:38 +02:00
Rafael Calleja	6d8c8643a0	feat: extract bash commands and add bashBreakdown to session summary	2026-04-14 10:24:38 +02:00
AgentSeal	d20281514c	feat: one-shot success rate per activity category Detects edit/test/fix retry cycles (Edit -> Bash -> Edit) within each turn. Shows 1-shot percentage in the By Activity panel for categories that involve code edits. Updated screenshot and README. Fixes #4	2026-04-14 01:14:34 -07:00
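One-shot detection can be sketched as a scan over a turn's tool sequence. The tool sets below are assumptions (the real classifier exports its own `BASH_TOOLS`), and `hasRetryCycle` and `oneShotRate` are hypothetical names.

```typescript
// Hypothetical sketch; tool sets are illustrative assumptions.
const EDIT_LIKE = new Set(["Edit", "Write", "MultiEdit"]);
const SHELL_LIKE = new Set(["Bash"]);

// True when an edit follows a shell call that itself followed an earlier edit (Edit -> Bash -> Edit).
function hasRetryCycle(tools: string[]): boolean {
  let sawEdit = false;
  let sawShellAfterEdit = false;
  for (const tool of tools) {
    if (EDIT_LIKE.has(tool)) {
      if (sawShellAfterEdit) return true;
      sawEdit = true;
    } else if (SHELL_LIKE.has(tool) && sawEdit) {
      sawShellAfterEdit = true;
    }
  }
  return false;
}

// Share of edit-bearing turns that finished without a retry cycle; null when none edited.
function oneShotRate(turns: string[][]): number | null {
  const editTurns = turns.filter((t) => t.some((tool) => EDIT_LIKE.has(tool)));
  if (editTurns.length === 0) return null;
  return editTurns.filter((t) => !hasRetryCycle(t)).length / editTurns.length;
}
```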
AgentSeal	74744f07bb	fix: stop tool-result entries from splitting turns and inflating Conversation Tool results in JSONL are type:"user" entries with no text content. groupIntoTurns was flushing on every type:"user" entry, creating phantom turns that got classified as Conversation. Now only flush when the user entry contains actual text. Fixes #7	2026-04-14 00:57:43 -07:00
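A sketch of the flush rule follows. It assumes Claude Code's JSONL shape: a user entry's `message.content` is either a string or an array of blocks, and tool results arrive as `tool_result` blocks. The types here are simplified assumptions.

```typescript
// Hypothetical sketch; entry shapes are simplified assumptions.
type Block = { type: string; text?: string };
interface JsonlEntry {
  type: "user" | "assistant";
  message: { content: string | Block[] };
}

// True when a user entry carries typed text rather than only tool_result blocks.
function hasUserText(entry: JsonlEntry): boolean {
  const c = entry.message.content;
  if (typeof c === "string") return c.trim().length > 0;
  return c.some((b) => b.type === "text" && (b.text ?? "").trim().length > 0);
}

function groupIntoTurns(entries: JsonlEntry[]): JsonlEntry[][] {
  const turns: JsonlEntry[][] = [];
  let current: JsonlEntry[] = [];
  for (const entry of entries) {
    // Only a user entry with real text starts a new turn; tool results stay in the current one.
    if (entry.type === "user" && hasUserText(entry) && current.length > 0) {
      turns.push(current);
      current = [];
    }
    current.push(entry);
  }
  if (current.length > 0) turns.push(current);
  return turns;
}
```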
AgentSeal	0da57d1172	add Claude Desktop (code tab) session support Scans ~/Library/Application Support/Claude/local-agent-mode-sessions/ for Desktop sessions in addition to ~/.claude/projects/. Same JSONL format, just nested deeper. Cross-platform paths for macOS/Windows/Linux.	2026-04-13 17:58:19 -07:00
AgentSeal	f6cc68a7d4	support CLAUDE_CONFIG_DIR environment variable Respects CLAUDE_CONFIG_DIR if set, falls back to ~/.claude. Closes #3.	2026-04-13 17:52:27 -07:00
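The lookup can be sketched as follows. Treating an empty or whitespace-only `CLAUDE_CONFIG_DIR` as unset is an assumption. `claudeProjectsDir` is also a hypothetical helper, resolving the `projects` folder that sessions live under.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch: prefer CLAUDE_CONFIG_DIR when set, else ~/.claude.
function claudeConfigDir(env: NodeJS.ProcessEnv = process.env): string {
  const fromEnv = env.CLAUDE_CONFIG_DIR?.trim(); // assumption: blank counts as unset
  return fromEnv ? fromEnv : join(homedir(), ".claude");
}

// Session transcripts are read from the projects folder under the config dir.
function claudeProjectsDir(env: NodeJS.ProcessEnv = process.env): string {
  return join(claudeConfigDir(env), "projects");
}
```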
AgentSeal	00afed6930	v0.1.0 - initial release Interactive TUI dashboard for Claude Code token observability. 13-category task classifier, per-project/model/tool breakdowns, gradient bar charts, SwiftBar menu bar widget, CSV/JSON export.	2026-04-13 15:10:27 -07:00

25 commits