25 commits

Author SHA1 Message Date
ozymandiashh
1a080a006f feat(optimize): MCP tool coverage detector with cache-aware costing
Adds a per-tool optimizer finding for MCP servers whose schema is loaded
on every turn but rarely invoked. Builds on the existing server-level
`detectUnusedMcp` (zero invocations) by reporting partial-use cases:
"loaded 54 tools, called 0" or "loaded 26 tools, called 2 (8% coverage)".

Inventory comes from Claude Code's JSONL `attachment.deferred_tools_delta`
entries: `addedNames` lists the exact tools available at that turn,
including every fully-qualified `mcp__<server>__<tool>` name. We union
across all delta entries in a session (not just the first) because tool
availability can change mid-session when the user reloads MCP config or
a subagent inherits a different tool set. Names that don't match the
`mcp__<server>__<tool>` shape with both segments non-empty are rejected
at extraction so downstream `split('__')` consumers can't be poisoned.
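
The name check and cross-delta union described above can be sketched as follows. This is a minimal illustration, not the code in src/parser.ts: `parseMcpToolName` and `collectInventory` are hypothetical names, and the sketch assumes the first `__` after the `mcp__` prefix separates server from tool.

```typescript
interface McpToolName {
  server: string
  tool: string
}

// Hypothetical helper: accept only `mcp__<server>__<tool>` with both segments non-empty.
function parseMcpToolName(name: string): McpToolName | null {
  const prefix = 'mcp__'
  if (name.slice(0, prefix.length) !== prefix) return null
  const rest = name.slice(prefix.length)
  // Assumption: the first `__` after the prefix is the server/tool separator.
  const sep = rest.indexOf('__')
  if (sep <= 0) return null // no separator, or empty server segment
  const tool = rest.slice(sep + 2)
  if (tool.length === 0) return null // empty tool segment
  return { server: rest.slice(0, sep), tool }
}

// Union the `addedNames` of every delta entry in a session, dropping malformed names.
function collectInventory(deltas: string[][]): string[] {
  const seen = new Set<string>()
  for (const addedNames of deltas) {
    for (const name of addedNames) {
      if (parseMcpToolName(name) !== null) seen.add(name)
    }
  }
  return Array.from(seen).sort()
}
```

Rejecting malformed names at this stage means later code can split on `__` without re-validating.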

Token-savings estimates are cache-aware. MCP tool schemas live in the
cached prefix of the system prompt: a session pays the full input price
on each cache-creation turn (rebuilds happen every ~5 minutes of
inactivity) and the cache-read discount on subsequent turns. Each call's
contribution is capped at its observed `cacheCreationInputTokens` /
`cacheReadInputTokens` so we never claim more MCP overhead than the
call's own cache buckets could contain.

When multiple servers are flagged, costing happens in a single combined
pass: the per-call cap applies to the total unused-schema budget across
all flagged servers, not per server. Two flagged servers cannot both
independently claim the same call's cache bucket, which would otherwise
overstate `tokensSaved` and misclassify findings as high impact.
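
A rough sketch of why the combined pass matters, under simplifying assumptions: the function names, the per-server token array, and applying the same budget to both cache buckets are illustrative choices, not the actual `estimateMcpSchemaCost` signature.

```typescript
// Observed cache buckets for one API call.
interface CallCacheUsage {
  cacheCreationInputTokens: number
  cacheReadInputTokens: number
}

interface CappedOverhead {
  cacheWriteTokens: number
  cacheReadTokens: number
}

// Combined pass: sum the unused-schema tokens of all flagged servers first,
// then cap that single budget against each call's own cache buckets.
function estimateCombined(calls: CallCacheUsage[], unusedTokensPerServer: number[]): CappedOverhead {
  const budget = unusedTokensPerServer.reduce((sum, tokens) => sum + tokens, 0)
  let cacheWriteTokens = 0
  let cacheReadTokens = 0
  for (const call of calls) {
    cacheWriteTokens += Math.min(budget, call.cacheCreationInputTokens)
    cacheReadTokens += Math.min(budget, call.cacheReadInputTokens)
  }
  return { cacheWriteTokens, cacheReadTokens }
}

// Per-server pass, shown only for contrast: each server is capped on its own,
// so together they can claim more than the call's bucket actually holds.
function estimatePerServer(calls: CallCacheUsage[], unusedTokensPerServer: number[]): CappedOverhead {
  let cacheWriteTokens = 0
  let cacheReadTokens = 0
  for (const tokens of unusedTokensPerServer) {
    const one = estimateCombined(calls, [tokens])
    cacheWriteTokens += one.cacheWriteTokens
    cacheReadTokens += one.cacheReadTokens
  }
  return { cacheWriteTokens, cacheReadTokens }
}
```

With two servers of 3,000 unused schema tokens each and one call whose cache-read bucket holds 4,000 tokens, the combined pass attributes 4,000 tokens while the per-server pass would attribute 6,000.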

A session counts toward `loadedSessions` (and toward the cost estimate)
only if its observed inventory included the server. Pure invocation-only
sessions, where the server appears in `mcpBreakdown` or `call.mcpTools`
without any matching `deferred_tools_delta`, do not satisfy the
`>= 2 sessions` threshold on their own. The same invariant applies in
`estimateMcpSchemaCost` so the two passes agree.

Coverage is computed against the inventory only: invocations of names
not present in any observed inventory (older config, hallucinated tool,
typo) do not inflate `toolsInvoked` and cannot drive `unusedCount`
negative. `toolsInvoked` is derived as `inventory.size - unusedTools.length`
to keep both numbers consistent.
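
In sketch form, coverage against the inventory might look like the following. `computeCoverage` and its result shape are hypothetical; only the derivation of `toolsInvoked` from the inventory size is taken from the description above.

```typescript
interface CoverageResult {
  toolsAvailable: number
  toolsInvoked: number
  unusedTools: string[]
  coverage: number // fraction in [0, 1]
}

function computeCoverage(inventory: Set<string>, invokedNames: string[]): CoverageResult {
  // Only invocations of inventoried tools count; foreign names are ignored.
  const invokedInInventory = new Set<string>()
  for (const name of invokedNames) {
    if (inventory.has(name)) invokedInInventory.add(name)
  }
  const unusedTools = Array.from(inventory)
    .filter(name => !invokedInInventory.has(name))
    .sort()
  // Derived rather than counted separately, so the two numbers cannot disagree.
  const toolsInvoked = inventory.size - unusedTools.length
  const coverage = inventory.size === 0 ? 0 : toolsInvoked / inventory.size
  return { toolsAvailable: inventory.size, toolsInvoked, unusedTools, coverage }
}
```

Because `unusedTools` is a subset of the inventory, `toolsInvoked` can never go negative, even when a session invokes names the inventory has never seen.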

`detectUnusedMcp` and the new detector are explicitly disjoint:
`detectUnusedMcp` skips servers that the coverage detector will report,
not every server that happens to be in any inventory, so a small
inventoried-but-uninvoked server below the coverage thresholds still
gets flagged as "configured but never called."

Thresholds for the coverage finding:
- > 10 tools available (small servers are noise)
- < 20% coverage
- >= 2 sessions with observed inventory
- High impact when total effective tokens >= 200_000 or >= 3 servers flagged
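
The gates above can be expressed as two small predicates. This is a sketch with hypothetical names (`isCoverageFinding`, `isHighImpact`, `ServerCoverage`); the constants mirror the listed thresholds.

```typescript
interface ServerCoverage {
  toolsAvailable: number
  toolsInvoked: number
  loadedSessions: number // sessions with an observed inventory for this server
}

const MIN_TOOLS_EXCLUSIVE = 10
const MAX_COVERAGE_EXCLUSIVE = 0.2
const MIN_LOADED_SESSIONS = 2
const HIGH_IMPACT_TOKENS = 200_000
const HIGH_IMPACT_SERVERS = 3

function isCoverageFinding(server: ServerCoverage): boolean {
  if (server.toolsAvailable <= MIN_TOOLS_EXCLUSIVE) return false
  if (server.loadedSessions < MIN_LOADED_SESSIONS) return false
  return server.toolsInvoked / server.toolsAvailable < MAX_COVERAGE_EXCLUSIVE
}

function isHighImpact(totalEffectiveTokens: number, flaggedServers: number): boolean {
  return totalEffectiveTokens >= HIGH_IMPACT_TOKENS || flaggedServers >= HIGH_IMPACT_SERVERS
}
```

For example, a server with 26 tools and 2 invoked (about 8% coverage) across 2 sessions passes all three gates, while a server with exactly 10 tools does not.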

Smoke-tested on a real account: 7 servers flagged across 93 sessions
(`office-word-mcp` 0/54, `notebooklm-mcp` 0/38, `office-ppt-mcp` 0/37,
`excel-mcp-server` 0/25, `github-mcp-server` 2/26, `peekaboo` 3/22, plus
`claude_ai_Asana`). Combined-cap costing keeps `tokensSaved` honest.

Changes:
- src/types.ts: optional `mcpInventory: string[]` on `SessionSummary`.
  Provider-agnostic field; currently populated only by the Claude parser.
- src/parser.ts: `extractMcpInventory` walks all entries, validates
  fully-qualified names, returns sorted unique list. `buildSessionSummary`
  passes it through; field is omitted when empty so JSON exports stay
  clean.
- src/optimize.ts: `aggregateMcpCoverage`, `estimateMcpSchemaCost`
  (single- and multi-server signatures), `detectMcpToolCoverage`. Wired
  into `scanAndDetect`. `detectUnusedMcp` updated to be disjoint from the
  new detector.
- tests/mcp-coverage.test.ts: 23 cases covering aggregation, costing,
  combined-cap behaviour, threshold gates, invocation-only-session
  filtering, foreign-tool invocations, cache rebuild events, write+read
  on the same call, multi-server pluralisation.
- tests/parser-mcp-inventory.test.ts: 12 cases for the JSONL extractor
  including malformed name rejection and tolerant attachment parsing.
- CHANGELOG.md: entry under Unreleased / Added (CLI).

Closes #2
2026-05-05 04:13:04 +03:00
voidborne-d
c16b21ec50 fix(classifier): surface skill name as subCategory for general turns (#203)
Turns whose only assistant tool is `Skill` collapse to category `general`
because `classifyByToolPattern` returns `'general'` and `refineByKeywords`
only operates on `coding`/`exploration`. In environments that lean on Claude
Code skills, the per-activity dashboard column flattens — every `/init`,
`/review`, `/security-review`, `/claude-api`, plus user-defined skills, all
land in `general` with no signal about which workflow ran.

Implements Option A from the issue:

- `ParsedApiCall.skills: string[]` populated in the Anthropic-path parser
  via a new `extractSkillNames` helper that reads `input.skill || input.name`
  from each `Skill` ToolUseBlock (mirrors `detectGhostSkills` extraction at
  optimize.ts:765 so the two stay in sync).
- `ClassifiedTurn.subCategory?: string` set to the first skill name when the
  resolved category is `general` AND any skill identifier was extracted.
  Top-level category stays `general` — existing aggregations, exports, and
  category-keyed code paths unchanged.
- `SessionSummary.skillBreakdown: Record<string, {turns,costUSD,editTurns,
  oneShotTurns}>` populated in the same per-turn loop that builds
  `categoryBreakdown`. Provider sessions (Codex/Cursor/etc.) keep `skills:
  []` — they don't expose the Skill tool surface today.
- Dashboard `ActivityBreakdown` renders top-N skill sub-rows beneath the
  `general` row when present (indented `/skill-name`, dimmed). Other
  categories render exactly as before; if no skills were invoked, the panel
  is byte-identical to current output.
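
A simplified stand-in for the extraction and sub-category rule described above. `skillNamesFrom` and `subCategoryFor` are hypothetical names, and the block shape is reduced to the two fields the description mentions.

```typescript
interface ToolUseBlock {
  name: string
  input: { skill?: string; name?: string }
}

// Collect skill identifiers from `Skill` tool-use blocks, preferring `input.skill`.
function skillNamesFrom(blocks: ToolUseBlock[]): string[] {
  const names: string[] = []
  for (const block of blocks) {
    if (block.name !== 'Skill') continue
    const skill = block.input.skill || block.input.name
    if (skill) names.push(skill)
  }
  return names
}

// The first skill wins, and only when the resolved category is `general`.
function subCategoryFor(category: string, skills: string[]): string | undefined {
  return category === 'general' && skills.length > 0 ? skills[0] : undefined
}
```

A turn that also edits files resolves to `coding`, so `subCategoryFor('coding', ['init'])` stays `undefined` and the top-level category is never affected.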

Existing 419 tests still pass. New `tests/classifier.test.ts` adds 8 cases:
single skill via `input.skill`, single via `input.name`, first-wins for
multi-skill turns, aggregation across multiple assistant calls in one turn,
no-name fallback (`subCategory` stays undefined), `Skill+Edit` promoting to
`coding` and dropping subCategory, non-Skill general turns, and a legacy
ParsedApiCall shape with `skills` field absent (forward-compat). Pre-fix
verification by stashing the source change reproduces 4/8 failures with the
exact "expected 'init', received undefined" diff; restoring → 8/8 pass.

Closes #203.

🤖 AI assistance disclosure: assistant-scaffolded by Claude (Opus 4.7);
author of record reviewed every line, ran the full vitest suite locally
(`npm test` → 32 files / 427 tests pass), `npx tsc --noEmit` clean, and
`npm run build` produces a clean ESM bundle.
2026-05-04 06:26:45 +08:00
iamtoruk
800c106250 Fix streaming dedup: keep last occurrence of each message.id within session files
Claude Code writes the same message.id multiple times during streaming.
The first write has partial tokens (often 1) and no tool_use blocks.
The last write has authoritative token counts and all tool_use/MCP blocks.

Old behavior kept the first occurrence (keep-first), silently dropping
real output tokens (a 6.3% undercount) and all MCP tool calls.

New behavior keeps the last occurrence's content but preserves the first
occurrence's timestamp for correct date bucketing.
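
The keep-last rule can be sketched like this. The `StreamedEntry` shape and the function name are illustrative and do not reflect the parser's real types.

```typescript
interface StreamedEntry {
  messageId: string
  timestamp: string
  outputTokens: number
}

// Keep the last write for each message id, but carry over the first write's timestamp.
function dedupeKeepLast(entries: StreamedEntry[]): StreamedEntry[] {
  const byId = new Map<string, StreamedEntry>()
  for (const entry of entries) {
    const earlier = byId.get(entry.messageId)
    // `earlier.timestamp` is always the first occurrence's, since it is re-applied on every overwrite.
    byId.set(entry.messageId, earlier ? { ...entry, timestamp: earlier.timestamp } : entry)
  }
  // A Map keeps first-insertion order, so output order follows first appearance.
  return Array.from(byId.values())
}
```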

Validated against 21,390 real session files: 40.5% had duplicate IDs,
output tokens were understated by up to 78% per session.
2026-05-02 22:30:17 -07:00
Resham Joshi
8c845253c2 Add Antigravity IDE provider
Fetch token usage from Antigravity's local language server via RPC.
Falls back to cached results when the IDE is closed.
2026-05-02 08:58:23 -07:00
iamtoruk
8ab9ea916b Add per-file result cache for Codex provider
Fixes #183. Users with large Codex session directories (45 GB, 10K+
files) experienced CPU pegging because every 30-second refresh re-parsed
all session files from scratch.

Three optimizations:

1. readFirstLine now reads 16 KB via fs.open() instead of loading the
   entire file through readSessionFile. Cuts discovery I/O from ~45 GB
   to ~160 MB for 10K files.

2. Per-file result cache (codex-results.json) with mtime+size
   fingerprinting. Parsed results are cached on first run; subsequent
   runs return cached data instantly for unchanged files.

3. Cache-accelerated discovery skips header validation for cached files,
   pulling the project name directly from the cache manifest.

Cache safety: fingerprint captured before read (no TOCTOU), atomic
write via temp+fsync+rename, 0o600 permissions, Object.hasOwn for
prototype pollution defense, eviction of deleted files on flush,
try/finally ensures flush even on parse errors.
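
An in-memory sketch of the mtime+size fingerprint check and the eviction step. It leaves out the on-disk `codex-results.json` format and the atomic write entirely; `FileResultCache` and its methods are hypothetical names.

```typescript
interface Fingerprint {
  mtimeMs: number
  size: number
}

interface CachedResult<T> {
  fingerprint: Fingerprint
  result: T
}

class FileResultCache<T> {
  private entries = new Map<string, CachedResult<T>>()

  // Return the cached result only when the file's current fingerprint still matches.
  get(path: string, current: Fingerprint): T | undefined {
    const hit = this.entries.get(path)
    if (hit === undefined) return undefined
    const unchanged =
      hit.fingerprint.mtimeMs === current.mtimeMs && hit.fingerprint.size === current.size
    return unchanged ? hit.result : undefined
  }

  // The caller is expected to capture the fingerprint before reading the file.
  set(path: string, fingerprint: Fingerprint, result: T): void {
    this.entries.set(path, { fingerprint, result })
  }

  // Drop entries for files that no longer exist; returns how many were evicted.
  evictMissing(existingPaths: Set<string>): number {
    let evicted = 0
    for (const path of Array.from(this.entries.keys())) {
      if (!existingPaths.has(path)) {
        this.entries.delete(path)
        evicted += 1
      }
    }
    return evicted
  }
}
```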
2026-04-30 16:43:41 -07:00
Łukasz Majcher
5e49f17e64 fix: switch scanJsonlFile and parseSessionFile to readSessionLines to prevent OOM
readViaStream (used for files ≥8 MB) reconstructs the full file as a
single string via chunks.join('\n'), giving the same peak allocation as
readFile. Callers then call content.split('\n'), creating a second copy.
With FILE_READ_CONCURRENCY=16 and files up to 128 MB this can exhaust
the V8 heap (~6 GB theoretical peak).

readSessionLines already exists as a proper async generator that yields
one line at a time. Switch both hot-path callers to iterate it directly
so the full file string is never held in memory.
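
To illustrate the line-at-a-time idea, here is a synchronous simplification over in-memory chunks. The real `readSessionLines` is an async generator reading from disk; `iterateLines` is a hypothetical name, and only the carry-buffer technique is the point.

```typescript
// Yield complete lines as chunks arrive, holding only the unfinished tail in memory.
function* iterateLines(chunks: Iterable<string>): Generator<string> {
  let carry = ''
  for (const chunk of chunks) {
    carry += chunk
    let newline = carry.indexOf('\n')
    while (newline !== -1) {
      yield carry.slice(0, newline)
      carry = carry.slice(newline + 1)
      newline = carry.indexOf('\n')
    }
  }
  // A final line without a trailing newline is still emitted.
  if (carry.length > 0) yield carry
}
```

Peak memory is bounded by the longest single line plus one chunk, rather than by the whole file and its `split('\n')` copy.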

Adds two tests: a spy test confirming readSessionLines is called (not
readSessionFile), and a 500-entry correctness test.

Fixes #131
2026-04-22 10:11:13 +00:00
iamtoruk
b491a1f590 fix: bucket turns by assistant timestamp, filter at turn level
A turn that straddles midnight (user typed at 23:58, assistant responded
at 00:30) was bucketed and filtered inconsistently across call sites.
parseSessionFile filtered entries by timestamp, producing orphan assistant
calls that groupIntoTurns pushed as turns with empty timestamp. Some
downstream code counted those (buildPeriodData summing project totals)
and other code dropped them (renderStatusBar's empty-timestamp skip).

The menubar showed today = $32 while the terminal status showed today = $27
for the same dataset; each was internally consistent but used a different
turn-bucket rule.

Fix both: parseSessionFile now builds all turns first, then filters each
turn by its first assistant call timestamp (the moment cost was incurred).
renderStatusBar buckets the same way. day-aggregator.ts already bucketed
on assistant time, so it is now consistent too.

Net effect: a turn is counted in the day the API call actually ran in.
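
A minimal sketch of the turn-level rule, assuming ISO 8601 timestamps and chronologically ordered calls. `Turn`, `turnTimestamp`, and `filterTurns` are illustrative names, and dropping turns with no assistant call is an assumption of this sketch.

```typescript
interface AssistantCall {
  timestamp: string // ISO 8601
  costUSD: number
}

interface Turn {
  userTimestamp: string
  assistantCalls: AssistantCall[]
}

// A turn is dated by its first assistant call, the moment cost was incurred.
function turnTimestamp(turn: Turn): string | null {
  return turn.assistantCalls.length > 0 ? turn.assistantCalls[0].timestamp : null
}

// Filter whole turns against a half-open [startMs, endMs) range.
function filterTurns(turns: Turn[], startMs: number, endMs: number): Turn[] {
  return turns.filter(turn => {
    const timestamp = turnTimestamp(turn)
    if (timestamp === null) return false
    const ms = Date.parse(timestamp)
    return ms >= startMs && ms < endMs
  })
}
```

A turn typed at 23:58 and answered at 00:30 therefore lands entirely in the second day, with no orphaned assistant calls left behind in the first.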
2026-04-21 04:40:44 -07:00
Ninym
5932a273a1 chore(ci): add semgrep guard against prototype pollution regressions in provider hot paths (#78)
* chore(ci): add semgrep rule no-bracket-assign-on-literal-object-map

* chore(ci): add workflow running semgrep bracket-assign guard on push/PR

* fix(parser): use Object.create(null) for categoryBreakdown map

* chore(ci): expand semgrep rule to cover ||, ??=, and if-guard variants

* chore(ci): limit push trigger to main and add semgrep --strict

* chore(ci): use jq to enforce finding count (--error unreliable in semgrep 1.x)
2026-04-18 15:10:24 -07:00
Resham Joshi
495a254338 feat(mac): native Swift menubar app + one-command install
Introduces mac/ with a native SwiftUI menubar app that replaces the
previous SwiftBar plugin entirely. Install via `npx codeburn menubar`,
which downloads the .app from GitHub Releases, strips Gatekeeper
quarantine, and drops it into ~/Applications.

Highlights

- mac/ SwiftUI app: agent tabs, Today/7/30/Month/All period switcher,
  Trend/Forecast/Pulse/Stats/Plan insights, activity + model
  breakdowns, optimize findings, CSV/JSON export, Star-on-GitHub
  banner, live 60s refresh, instant currency switching with offline FX
  cache.
- Security: CodeburnCLI argv-based spawn (no shell interpretation),
  SafeFile symlink guards + O_NOFOLLOW writes, FX rate clamping to
  [0.0001, 1_000_000], keychain filtered to account == "default",
  removed byte-window credential log, in-flight refresh guard, POSIX
  flock on config.json writes, TerminalLauncher validates argv before
  AppleScript interpolation.
- Performance: shared static NumberFormatter (thousands of allocations
  per popover redraw eliminated), concurrent pipe drain with 20 MB cap
  + 60s timeout in DataClient, Observation-tracked reactive UI, 5-min
  payload cache keyed on (period, provider).
- CLI: new `codeburn menubar` subcommand that downloads + installs +
  launches the .app (no clone, no build). New `status --format
  menubar-json` payload builder. `export` rewritten to produce a
  folder of one-table-per-file CSVs with a `.codeburn-export` marker
  so arbitrary -o paths cannot be silently deleted.
- Removed: src/menubar.ts (SwiftBar plugin generator),
  install-menubar / uninstall-menubar subcommands, `status --format
  menubar` directive output, tests/menubar.test.ts,
  tests/security/menubar-injection.test.ts.
- Release: .github/workflows/release-menubar.yml builds universal
  binary, assembles .app, ad-hoc signs, zips, uploads on mac-v* tag
  push. Runs on the free macos-latest runner.

Tests

- 230 TypeScript tests pass
- 10 Swift CapacityEstimator tests pass
- TypeScript typecheck clean
- Swift release build clean
2026-04-17 16:55:56 -07:00
Ninym
ee738a1b26 fix(parser): use bounded readSessionFile helper
Replaces the unbounded readFile in parseSessionFile with the 128 MB-capped
helper from src/fs-utils. Addresses MEDIUM-1 for the Claude provider
hot path.

Verbose-mode stderr output replaces the previous silent catch,
closing LOW-1 as a side effect.
2026-04-17 08:32:19 +02:00
Ninym
5b810161e7 fix(parser): block prototype pollution via Object.create(null)
Initialize the four breakdown maps (model, tool, mcp, bash) with null
prototype so attacker-controlled keys named __proto__ create own
properties on the map instead of mutating Object.prototype.
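
The effect can be shown with a small stand-in for one of the breakdown maps. `buildToolBreakdown` and `ToolStats` are hypothetical and not the parser's actual code.

```typescript
interface ToolStats {
  calls: number
}

function buildToolBreakdown(toolNames: string[]): Record<string, ToolStats> {
  // Null prototype: a lookup of `__proto__` finds nothing inherited,
  // so the key becomes an ordinary own property of the map.
  const breakdown: Record<string, ToolStats> = Object.create(null)
  for (const name of toolNames) {
    let stats = breakdown[name]
    if (stats === undefined) {
      stats = { calls: 0 }
      breakdown[name] = stats
    }
    stats.calls += 1
  }
  return breakdown
}
```

With a plain `{}` map, `breakdown['__proto__']` would resolve to `Object.prototype`, and the increment would write a `calls` property onto every object in the process.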

Closes the HIGH-1 finding from the 2026-04-16 external security audit.
2026-04-17 08:32:18 +02:00
Travis Haley
67c504a60a feat: add --project and --exclude filters for project-level filtering
Adds two new repeatable flags to all commands (report, today, month, status, export):
- --project <name>: include only projects matching name (substring, case-insensitive)
- --exclude <name>: exclude projects matching name (substring, case-insensitive)

Both flags can be specified multiple times to match multiple projects.
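
One way to read the matching rule, as a sketch: `matchesProjectFilters` is a hypothetical name, and treating an empty include list as "match everything" is an assumption.

```typescript
function matchesProjectFilters(project: string, include: string[], exclude: string[]): boolean {
  const name = project.toLowerCase()
  // Substring match, case-insensitive.
  const hits = (pattern: string): boolean => name.indexOf(pattern.toLowerCase()) !== -1
  // Assumption: with no --project flags, every project is a candidate.
  if (include.length > 0 && !include.some(hits)) return false
  // Any --exclude match removes the project, even if it was included.
  return !exclude.some(hits)
}
```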

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:54:37 -06:00
AgentSeal
2d114d9393 feat: add OpenCode provider
Reads session data from OpenCode's SQLite databases at
~/.local/share/opencode/. Reuses the existing better-sqlite3
adapter (same as Cursor), lazy-loaded so users without OpenCode
see no difference. Adds bashCommands to the provider interface
so shell command breakdowns work across all providers.

31 tests, schema validation, diagnostic stderr on failures.
Also fixes a pre-existing tsc error in currency.ts.
2026-04-15 14:24:37 -07:00
AgentSeal
94762ca1f4 fix: address review findings before merge
- getProvider() now async, eliminates race condition with cursor loading
- cursor:edit pseudo-tool prevents inflating Claude's Edit count in --provider all
- Tightened SCRIPT_PATTERNS to avoid false positives (run requires file context)
- Removed duplicated LANG_NAMES from cursor.ts (dashboard handles display)
- Test no longer assumes cursor always loads (CI-safe)
- Removed unnecessary type assertion and setTimeout yield
2026-04-15 05:31:51 -07:00
AgentSeal
51c56d0726 fix: include agent/subagent sessions, fix Codex cache hit and cost calculation
- Remove agent-*.jsonl exclusion filter that was dropping ~46% of API calls
- Scan subagents/ directories for subagent session files
- Normalize Codex token semantics: OpenAI includes cached tokens inside
  input_tokens, subtract them to match Anthropic's separate reporting
- Fixes cost double-counting and 100% cache hit display for Codex users
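
The token normalization can be sketched as below. The `cached_input_tokens` field name and the normalized shape are assumptions made for illustration; only the subtraction itself comes from the description above.

```typescript
// Assumed shape of an OpenAI-style usage record, where cached tokens are part of input_tokens.
interface CodexUsage {
  input_tokens: number
  cached_input_tokens: number
  output_tokens: number
}

// Anthropic-style reporting keeps uncached input and cache reads separate.
interface NormalizedUsage {
  inputTokens: number
  cacheReadInputTokens: number
  outputTokens: number
}

function normalizeCodexUsage(usage: CodexUsage): NormalizedUsage {
  // Clamp defensively so a malformed record cannot produce negative input.
  const cached = Math.min(usage.cached_input_tokens, usage.input_tokens)
  return {
    inputTokens: usage.input_tokens - cached,
    cacheReadInputTokens: cached,
    outputTokens: usage.output_tokens,
  }
}
```

Without the subtraction, 1,000 input tokens of which 800 were cached would be billed as 1,000 fresh tokens plus 800 cache reads.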
2026-04-14 10:18:14 -07:00
AgentSeal
391a235d1d feat: multi-provider support (Codex + provider plugin system)
Add Codex (OpenAI) as a second provider alongside Claude Code. Provider
plugin architecture makes adding future providers (Pi, OpenCode, Amp) a
single-file addition.

- Provider interface: types, session discovery, stateful JSONL parsing
- Codex parser: token_count dedup, tool normalization, model resolution
- TUI: press p to cycle All/Claude/Codex with 1-min cache for instant switching
- CLI: --provider flag on report, today, month, status, export commands
- Pricing: Codex model fallbacks, fixed fuzzy matching for gpt-5.4-mini
- Menubar: per-provider cost breakdown when multiple providers detected
- 27 tests (10 new: Codex parser, provider registry, tool/model mapping)
2026-04-14 04:32:09 -07:00
AgentSeal
cb5853c460 Merge pull request #6 from rafaelcalleja/feat/bash-breakdown-panel
feat: add Shell Commands breakdown panel
2026-04-14 10:39:23 +02:00
AgentSeal
3964478e61 fix: handle unreadable session files gracefully
readFile in parseSessionFile had no error handling. If a .jsonl file
is missing, has bad permissions, or gets deleted between readdir and
readFile, the whole process crashes with ENOENT. Now returns null
and skips the file.

Fixes #9
2026-04-14 01:31:31 -07:00
Rafael Calleja
a5696362f2 refactor: share BASH_TOOLS from classifier, remove comments
- Export BASH_TOOLS from classifier.ts instead of duplicating in bash-utils.ts
- Remove isBashTool helper (use BASH_TOOLS.has() directly)
- Strip unnecessary comments per codebase conventions
2026-04-14 10:24:38 +02:00
Rafael Calleja
6d8c8643a0 feat: extract bash commands and add bashBreakdown to session summary
2026-04-14 10:24:38 +02:00
AgentSeal
d20281514c feat: one-shot success rate per activity category
Detects edit/test/fix retry cycles (Edit -> Bash -> Edit) within each
turn. Shows 1-shot percentage in the By Activity panel for categories
that involve code edits. Updated screenshot and README.
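
A rough sketch of the retry-cycle check over a turn's tool sequence. `hasRetryCycle` and `isOneShot` are hypothetical names, and allowing other tools between the Edit, Bash, and Edit steps is an assumption.

```typescript
// True when the sequence contains Edit, later Bash, later another Edit.
function hasRetryCycle(tools: string[]): boolean {
  let sawEdit = false
  let sawBashAfterEdit = false
  for (const tool of tools) {
    if (tool === 'Edit') {
      if (sawBashAfterEdit) return true
      sawEdit = true
    } else if (tool === 'Bash' && sawEdit) {
      sawBashAfterEdit = true
    }
  }
  return false
}

// A turn is one-shot when it edits code and never needs a second attempt.
function isOneShot(tools: string[]): boolean {
  return tools.indexOf('Edit') !== -1 && !hasRetryCycle(tools)
}
```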

Fixes #4
2026-04-14 01:14:34 -07:00
AgentSeal
74744f07bb fix: stop tool-result entries from splitting turns and inflating Conversation
Tool results in JSONL are type:"user" entries with no text content.
groupIntoTurns was flushing on every type:"user" entry, creating
phantom turns that got classified as Conversation. Now only flush
when the user entry contains actual text.
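
The flush rule reduces to one condition, sketched here with a simplified entry shape. `groupEntries` and `LogEntry` are illustrative names, and the real `groupIntoTurns` works on richer JSONL entries.

```typescript
interface LogEntry {
  type: 'user' | 'assistant'
  text?: string // absent or empty for tool-result entries
}

function groupEntries(entries: LogEntry[]): LogEntry[][] {
  const turns: LogEntry[][] = []
  let current: LogEntry[] = []
  for (const entry of entries) {
    // Only a user entry with actual text starts a new turn.
    const startsNewTurn = entry.type === 'user' && (entry.text ?? '').trim().length > 0
    if (startsNewTurn && current.length > 0) {
      turns.push(current)
      current = []
    }
    current.push(entry)
  }
  if (current.length > 0) turns.push(current)
  return turns
}
```

Tool results stay attached to the turn that triggered them instead of becoming phantom turns of their own.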

Fixes #7
2026-04-14 00:57:43 -07:00
AgentSeal
0da57d1172 add Claude Desktop (code tab) session support
Scans ~/Library/Application Support/Claude/local-agent-mode-sessions/
for Desktop sessions in addition to ~/.claude/projects/. Same JSONL
format, just nested deeper. Cross-platform paths for macOS/Windows/Linux.
2026-04-13 17:58:19 -07:00
AgentSeal
f6cc68a7d4 support CLAUDE_CONFIG_DIR environment variable
Respects CLAUDE_CONFIG_DIR if set, falls back to ~/.claude.
Closes #3.
2026-04-13 17:52:27 -07:00
AgentSeal
00afed6930 v0.1.0 - initial release
Interactive TUI dashboard for Claude Code token observability.
13-category task classifier, per-project/model/tool breakdowns,
gradient bar charts, SwiftBar menu bar widget, CSV/JSON export.
2026-04-13 15:10:27 -07:00