Commit graph

119 commits

Author SHA1 Message Date
Resham Joshi
151d24fb26
Drop Z suffix from day-aggregator test timestamps for timezone stability (#322)
Some checks failed
CI / semgrep (push) Has been cancelled
Timestamps with Z are interpreted as UTC, causing date bucketing tests
to fail in non-UTC timezones (e.g. UTC+12 shifts Apr 9 10:00Z to Apr 8).
Local timestamps without Z are interpreted in the runtime timezone,
matching how the aggregator actually buckets dates.

Based on #112 by @lfl1337, extended to cover all affected timestamps.
2026-05-11 22:25:32 -07:00
Resham Joshi
3b71650f24
Fix mangled project paths in dashboard (#320)
* Fix mangled project paths in By Project and Top Sessions panels

shortProject() decoded Claude Code slugs by splitting on '-', which
broke directory names containing dashes ('foo-bar' became 'foo/bar').
Switch the dashboard to consume ProjectSummary.projectPath (the
canonical cwd already extracted by parser.ts) and rewrite shortProject
to operate on a real absolute path.

* shortProject: cache homedir, normalize Windows backslashes, fix stale test helper

---------

Co-authored-by: Abdallah Meghraoui <abdallah.meghraoui@outlook.com>
2026-05-11 22:02:38 -07:00
AgentSeal
a1b5e4bd00
Fix OpenCode MCP usage reporting (#318)
* Fix OpenCode MCP usage reporting

* Move OpenCode MCP changelog entry to Unreleased section

---------

Co-authored-by: ozymandiashh <234437643+ozymandiashh@users.noreply.github.com>
Co-authored-by: iamtoruk <hello@agentseal.org>
2026-05-11 21:30:27 -07:00
AgentSeal
c85beeaeae
Fix Claude 1-hour cache write pricing (#317)
Co-authored-by: ozymandiashh <234437643+ozymandiashh@users.noreply.github.com>
Co-authored-by: iamtoruk <hello@agentseal.org>
2026-05-11 21:23:04 -07:00
AgentSeal
03e22ecb80
Add IBM Bob provider with workspace extraction (#316)
Some checks are pending
CI / semgrep (push) Waiting to run
* Add IBM Bob provider

* Add workspace extraction for Cline-family providers

Extract project name from workspace directory in api_conversation_history.json
so sessions show actual folder names instead of the provider display name.
Thread projectPath through ParsedProviderCall to avoid unsanitizePath mangling
hyphenated folder names.

---------

Co-authored-by: ozymandiashh <234437643+ozymandiashh@users.noreply.github.com>
Co-authored-by: iamtoruk <hello@agentseal.org>
2026-05-11 20:54:13 -07:00
AgentSeal
2301577e03
Merge pull request #310 from getagentseal/fix/menubar-wake-recovery
Fix menubar wake recovery and release asset selection
2026-05-11 10:58:33 -07:00
iamtoruk
1149ab6e43 Fix menubar wake recovery and release asset selection 2026-05-11 10:57:02 -07:00
Resham Joshi
02f4635cec
Fix node:sqlite V8 crash on invalid UTF-8 in text columns (#272)
node:sqlite calls v8::String::NewFromUtf8 with kAbort on TEXT columns.
Cursor chat blobs often contain truncated multi-byte chars from streaming
boundaries, which triggers a V8 CHECK abort (not a JS exception).

Select all text-content columns as CAST(col AS BLOB) so node:sqlite
returns Uint8Array instead. Decode in JS with TextDecoder fatal:false
which replaces bad bytes with U+FFFD. Covers all three SQLite providers
(Cursor, Goose, OpenCode).

Removes the version blocklist (MIN_NODE_22_PATCH) and lowers engines
requirement from >=22.20 to >=22 since the BLOB cast approach works
on all Node 22.x versions.

Closes #264
Closes #250
2026-05-10 17:05:08 -07:00
Resham Joshi
d142bd97ef
daily-cache: discard pre-v5 caches (fixes menubar providers regression) (#297)
PR #296 (Cursor per-project breakdown) bumped DAILY_CACHE_VERSION
from 4 to 5 but left MIN_SUPPORTED_VERSION at 2. The migration
path (isMigratableCache + migrateDays) only fills in missing
default fields; it does NOT recompute the providers / categories
/ models rollups from session data, because raw sessions are not
retained in the cache. So a v4 cache migrated to v5 carried
forward its old per-day provider totals (single 'cursor' bucket)
for the full retention window.

Effect on users post-#296: the macOS menubar's
`current.providers.cursor` would show the orphan-bucket subtotal
instead of the full Cursor cost for any historical day whose
daily entry was computed before #296 landed. Live-test on my
machine showed cursor=$3.78 against a migrated v4 cache vs
cursor=$4.08 (correct) after the daily cache was discarded — the
$0.30 gap was the workspace projects whose costs were no longer
aggregated under the 'cursor' label by the new code.

Fix: raise MIN_SUPPORTED_VERSION to 5 so any cache with
version < DAILY_CACHE_VERSION is renamed to `.bak` and the cache
is recomputed from scratch on next run. The recompute is the same
operation that backfills the cache for a new user, so the cost is
a one-time cold-path hit (~3s on the test machine).

Test for the migration case updated to assert the new
discard-and-bak behavior. Full suite: 46 files / 654 tests pass.
2026-05-10 16:05:59 -07:00
Resham Joshi
810b214476
Cursor: per-project breakdown by workspace (closes per-project half of #196) (#296)
Cursor's chat history showed as a single row labeled 'cursor' in
the dashboard because the global state.vscdb has no workspace
field on individual bubbles. The fix joins through Cursor's
per-workspace storage:

1. Walk ~/Library/Application Support/Cursor/User/workspaceStorage/*
2. For each hash dir, read workspace.json -> folder URI
3. Open that dir's state.vscdb, read
   ItemTable['composer.composerData'] -> allComposers list
4. Build Map<composerId, folder URI>
5. emit one SessionSource per workspace plus a catch-all 'cursor'
   source for composers that did not register against any
   workspace (multi-root workspaces, no-folder-open windows,
   deleted workspaces with surviving global rows)

The parser decodes source.path's #cursor-ws= tag, filters the
parsed bubbles to the composerIds that belong to this workspace,
and yields only those. The orphan-tag source negates the filter so
it captures every composer not in any workspace.

In passing, fix a real bug in the old code: parseBubbles set
`sessionId: row.conversation_id ?? 'unknown'`, but the JSON
`conversationId` field is empty in current Cursor builds, so every
call shipped with `sessionId: 'unknown'`. We now derive the
composer id from the row key (`bubbleId:<composerId>:<bubbleUuid>`)
which is what the workspace map joins on. The old behavior masked
the bug because every call went into a single 'cursor' project
anyway; with per-workspace bucketing the bug becomes load-bearing.
Cache version bumped 2 -> 3 to invalidate caches that still record
'unknown' as the session id.

Live-tested against my real 1.9 GB Cursor DB: the single 'cursor'
row with 1904 calls / $4.08 now breaks into 5 workspaces plus an
orphan bucket, totals reconcile exactly. 8 fixture-based tests
cover multi-workspace routing, orphan filtering, legacy bare DB
path backwards compat, multi-root workspace skip, vscode-remote
URI slugification, and total reconciliation across all sources.

Full suite: 46 files, 653 tests passing.
2026-05-10 15:35:57 -07:00
Resham Joshi
cdf7169a89
Cursor model aliases: cover every variant so non-Auto sessions price (#159) (#290)
Cursor emits model names in a `claude-<dot-version>-<tier>` shape
(`claude-4.6-sonnet`, `claude-4.5-opus`, `claude-4.5-opus-high-thinking`,
etc.) plus its own `composer-1` house model. None of these match
the canonical LiteLLM pricing keys (`claude-sonnet-4-6`,
`claude-opus-4-5`).

The alias map in `src/models.ts` filled some of these in v0.9.4
but missed:

- plain no-suffix forms: `claude-4.5-opus`, `claude-4.5-sonnet`,
  `claude-4.6-opus`
- haiku tier: `claude-4.5-haiku`, `claude-4.6-haiku`
- forward-looking: `claude-4.7-opus`
- Cursor's house model: `composer-1`

The dashboard rendered $0 for sessions that used any unaliased
model — visible in the screenshots posted in #159 even after the
v0.9.4 fix that added the `-thinking` variants.

This PR fills the gaps and adds 16 regression tests under
`Cursor model variants resolve to pricing` that assert every
model name in `src/providers/cursor.ts:modelDisplayNames` plus
the additional plain forms resolves to a non-null pricing entry
with `inputCostPerToken > 0` and `outputCostPerToken > 0`. So a
future LiteLLM snapshot bump or a typo in the alias map will fail
the test before users see $0.

Direct hits in the snapshot (no alias needed): `gpt-5`, `gpt-5.2`,
`grok-code-fast-1`, `gemini-3-pro` (already aliased). These are
covered in the test suite as well so a snapshot that drops them
would also be caught.

Tests: 45 files, 617 passing locally (16 new). Closes #159.
2026-05-10 03:27:44 -07:00
Resham Joshi
7a878f4d19
Classifier: feature verb wins over debug keyword (part of #196) (#289)
Some checks are pending
CI / semgrep (push) Waiting to run
Messages like "add error handling", "create an issue tracker", or
"implement the 404 page" were landing in the Debugging bucket
because the classifier checked DEBUG_KEYWORDS (which matches
`error`, `issue`, `404`) before FEATURE_KEYWORDS in both
`refineByKeywords` (tool-bearing turns) and `classifyConversation`
(chat-only turns). The position of the matched word in the
sentence is a much stronger intent signal than the order of the
checks in code, so we now pick whichever pattern matches earliest.

The new helper `firstMatchingCategory` runs each candidate regex
once with `RegExp.exec` and keeps the match with the lowest
`index`. Ties (rare in practice — same start position) break by
the order the candidates were listed, which is `refactoring >
feature > debugging` for coding turns. That ordering preserves
existing behavior for plain bug reports (e.g. "login is broken,
traceback below") while flipping mislabeled feature work to its
correct category.

8 regression tests in `tests/classifier.test.ts` cover the
mislabel cases from #196 plus tie-break / chat-only cases. Full
suite: 45 files / 609 tests, all green.

Closes the activity-misattribution half of #196. The Cursor
provider attribution half (single 'cursor' project for all
sessions) is addressed in a separate PR.
2026-05-09 22:48:11 -07:00
Resham Joshi
b72e51e538
Support CLAUDE_CONFIG_DIRS for scanning multiple Claude data dirs (#208) (#288)
Adds an OS-delimited list env var so a user with more than one
Claude account or profile can scan all of them in a single run.
Sessions across every configured dir merge into one ProjectSummary
per project, matching the option-1 design agreed on the issue
thread (no per-account splitting in the data model or the UI).

Format: `CLAUDE_CONFIG_DIRS=~/.claude-work:~/.claude-personal`
on POSIX, `;`-separated on Windows. Precedence is
CLAUDE_CONFIG_DIRS > CLAUDE_CONFIG_DIR > ~/.claude. Empty entries
in the list are skipped, duplicates are deduped on resolved path,
and a missing or unreadable dir does not abort the scan of the
others. If the user explicitly set CLAUDE_CONFIG_DIRS but every
listed entry is unreadable, a one-line stderr hint identifies the
attempted paths and the platform's expected delimiter, so a
Windows user typing the POSIX `:` does not get a silent zero-row
result. `~` is now also expanded in CLAUDE_CONFIG_DIR for
consistency.

Implementation is intentionally narrow: only `claude.ts` changes,
plus a small parser-cache key update so a stale cache from one
config does not bleed into a run with a different config (matters
for the macOS menubar and GNOME extension which run as long-lived
processes). The merge happens for free in
`src/parser.ts:scanProjectDirs`, which keys ProjectSummary entries
by canonical cwd (or the sanitized slug as a fallback). Two
SessionSource entries with the same `project` field land under the
same key and combine their sessions, regardless of which dir they
came from. No new fields on SessionSource / SessionSummary /
ProjectSummary, and no UI changes.

Tests: 12 fixture-based cases covering the unset path (default
~/.claude), single-dir override via CLAUDE_CONFIG_DIR, multi-dir
override via CLAUDE_CONFIG_DIRS, ~ expansion, dedup of repeated
entries, leading/trailing/doubled delimiters, missing dir
tolerated, file-not-directory entry tolerated, empty
CLAUDE_CONFIG_DIRS falls back to single-dir env, and two
parser-level integration tests asserting (a) two sessions from
two dirs sharing one cwd produce one ProjectSummary with combined
totals and no `account`/`accountPath` fields anywhere, and (b)
two sessions sharing a slug but with different canonical cwds
still merge by slug at the project-rollup layer (option 1
behavior pinned so a future refactor cannot quietly swap to
cwd-aware merging without an explicit opt-in).

Supersedes the alternative implementation in #227, which builds
per-account attribution (option 2) instead.
2026-05-09 22:04:45 -07:00
Resham Joshi
d1eb13fb91
Expose per-day one-shot data in daily JSON output (#279) (#280)
* Expose per-day one-shot data in daily JSON output

Closes #279.

Adds turns, editTurns, oneShotTurns, oneShotRate to each entry of the
`daily[]` array in `codeburn report --format json` output. The data was
already computed internally for activity-level rollups; this just buckets
it by date so consumers building daily-resolution efficiency dashboards
(streak tracking, heatmaps, rolling-window charts) don't have to re-derive
the rate from period-level activities.

Counting matches parser.ts categoryBreakdown semantics:
- every turn counts toward `turns`
- turns with hasEdits=true count toward `editTurns`
- edit turns with retries=0 count toward `oneShotTurns`
- oneShotRate is null (not 0) when editTurns=0 — a chat-only day's rate
  is undefined, and reading it as 0% would be misleading

Real consumer named in the issue: a 10-developer internal usage tracker
that scores days by cache hit + cost/call + (now) one-shot rate.

* Strengthen daily/activities reconciliation + CHANGELOG entry

- Fall back to turn.assistantCalls[0]?.timestamp when turn.timestamp is
  missing so daily aggregate doesn't drop turns that activities[] keeps.
  Previously sum(daily[].editTurns) could be < sum(activities[].editTurns)
  for sessions starting with assistant entries before any user line.
- Add Unreleased CHANGELOG entry for the daily one-shot fields.
2026-05-09 21:01:05 -07:00
Resham Joshi
4c29f6b880
Add Crush provider plus per-provider icon column in README (#286)
Closes #278.

Adds Charmbracelet Crush as a lazy-loaded provider:
- src/providers/crush.ts: walks ~/.local/share/crush/projects.json
  (XDG_DATA_HOME and CRUSH_GLOBAL_DATA aware), opens each project's
  crush.db read-only, queries root sessions where parent_session_id
  IS NULL. Emits one ParsedProviderCall per session with real
  prompt_tokens, completion_tokens, cost (dollars), and the
  dominant model resolved from messages.model.
- src/providers/index.ts: register crush alongside cursor, goose,
  opencode, antigravity, cursor-agent in the lazy import path.
- tests/providers/crush.test.ts: 10 fixture-based tests covering
  discovery, parsing, missing-registry, malformed JSON, missing db,
  child session exclusion, dominant model selection, dedup, and
  array-shaped legacy registry.

Schema source: charmbracelet/crush@v0.66.1
internal/db/migrations/20250424200609_initial.sql, verified by
spawning a research agent against upstream. The schema *comments*
in that migration claim millisecond timestamps but every actual
INSERT/UPDATE uses strftime('%s', 'now') which returns Unix
seconds; the parser treats values as seconds. Tokscale's
parser (junhoyeo/tokscale#346) gets this wrong and is off by
1000x, plus its parser misses the prompt_tokens/completion_tokens
columns that exist in Crush's schema. Our integration uses both,
so Crush sessions get real per-model attribution.

Menubar:
- mac/Sources/CodeBurnMenubar/AppStore.swift: add .crush case to
  ProviderFilter and its cliArg switch.
- mac/Sources/CodeBurnMenubar/Views/AgentTabStrip.swift: add
  Crush color to the per-tab color extension. The visibleFilters
  computed property already filters by detected providers, so the
  Crush tab appears automatically when a user has Crush data.

README:
- Replace the provider table with an icon-led layout. Icons live
  under assets/providers/<name>.<ext>. 14 icons sourced from
  junhoyeo/tokscale (MIT) under nominative fair use, 4 sourced
  separately: codex (OpenAI org avatar), cursor-agent (reuses the
  Cursor icon), kiro (kiro.dev favicon, ico->png via sips), omp
  (can1357/oh-my-pi icon.svg, MIT). Attribution line added.
- Add Crush row.

Docs:
- docs/providers/crush.md: full per-provider doc with verified
  schema excerpt, the seconds-vs-milliseconds quirk, and a
  "when fixing a bug here" checklist.
- docs/architecture.md: provider count 17 -> 18, test count
  41 -> 42, and crush in the lazy list.
- docs/providers/README.md: add Crush row to the lazy index.
- CONTRIBUTING.md: bump test count to 568 (was 558).

All 568 tests pass locally; swift build clean.
2026-05-09 20:47:56 -07:00
Resham Joshi
b4ed98cfa4
Add codeburn models per-model + per-task breakdown command (#287)
A single dense table of every (provider, model) you have used in the
selected period, sorted by cost. Inspired by tokscale's per-model
output and ccusage's responsive cli-table3 layout, ported to plain
Node with no new runtime dependency.

Default view: one row per (provider, model) with a Top Task cell
showing the dominant task category and its cost share, e.g.
`Coding (42%)`.

`--by-task` explodes each model into one row per task type, with
provider/model cells blanked on subsequent rows of the same group
and a horizontal divider between groups so the sections read as
distinct units.

Output formats: table (Unicode box-drawn, default), markdown
(GitHub-flavored, copy-paste friendly), json, csv.

Filters: --period (today/week/30days/month/all, default 30days),
--from/--to, --provider, --task, --top, --min-cost, --no-totals.

The table renderer auto-sizes every column to its content (no fixed
widths leaving trailing whitespace) and drops cache columns as a
pair when the terminal is narrow, then input/output, then top-task,
in that order. Provider, model, total, and cost stay regardless.
Visible-width math uses strip-ansi (already a dependency) so styled
cells pad correctly. Cyan headers, yellow totals, dim provider name.

The aggregator walks every parsed turn and attributes each
assistant call to its (provider, model, task) bucket, computing
real input / output / cache_write / cache_read tokens and cost.
Output tokens include reasoning. Cached input tokens are folded
into cache_read so the column matches what users intuitively expect.

19 fixture-based tests cover aggregation correctness, byTask
grouping, taskFilter, topN/minCost filters, reasoning-as-output,
all four renderers (table/markdown/json/csv), narrow-terminal
column dropping, CSV/markdown escaping, totals row toggle, and
visible-width math under styled cells.
2026-05-09 20:45:21 -07:00
Resham Joshi
46e43a0ec3
Label optimize suggestions by destination (#281)
Some checks are pending
CI / semgrep (push) Waiting to run
Closes #277.

Every paste-style fix now declares an explicit `destination` so users can
tell at a glance whether a suggestion belongs in CLAUDE.md as a permanent
rule, in a one-time session opener, in the current chat as an ask, or in
a shell config file. Previously the prompts had no labeled home and users
were dropping one-time session openers into CLAUDE.md as permanent rules.

Type changes:
- New `PasteDestination` union: `claude-md` / `session-opener` / `prompt`
  / `shell-config`
- `WasteAction.paste` gains `destination?: PasteDestination`

Renderer changes:
- CLI `optimize` command (renderOptimize → renderFinding) prints a
  section header above each fix block:
    -- Suggested CLAUDE.md addition (permanent rule) ───
    -- One-time session opener (do NOT add to CLAUDE.md) ───
    -- Ask Claude in the current session ───
    -- Add to your shell config ───
    -- Run this command ───
- Interactive dashboard (FindingAction in dashboard.tsx) gets the same
  treatment so the in-popover findings list reads identically.

Existing fixes retagged appropriately. Two existing prompts that lacked
destination context altogether ("Set a delivery checkpoint at the start
of the next expensive thread", "Start the next expensive thread with a
fresh-context constraint") now read as one-time session openers with a
clear "do not add to CLAUDE.md" hint — the exact failure mode the
reporter described.

Tests:
- Existing `detectJunkReads` test extended to assert the destination tag.
- New regression block walks every detector that emits a paste-style fix
  and asserts each one declares a destination — future detectors that
  ship without one get caught here.
2026-05-08 23:30:53 -07:00
Resham Joshi
fb8f25fb97
Reject invalid --format and --period values instead of silently falling back (#258)
getDateRange() silently fell back to week on unknown periods, and no command
validated --format. A typo like --period mounth or --format yaml produced
wrong output with exit 0. Now all 6 format-accepting commands and all
period-accepting paths reject unknown values with a clear message and exit 1.
Also fixes the status description (said today+week+month, only shows
today+month).
2026-05-06 23:03:41 -07:00
Resham Joshi
daa673449c
Menubar and CLI hardening from multi-agent audit (#257)
Some checks are pending
CI / semgrep (push) Waiting to run
Two passes of validators across CLI accuracy, dashboard UX, menubar Swift,
performance, security, and end-to-end smoke tests on real session data.

Data-correctness fixes:

- parseLocalDate rejects month/day overflow. JS Date silently rolled
  Feb 31 to Mar 3, so --from 2026-02-31 --to 2026-03-15 quietly dropped
  sessions on Feb 28 - Mar 2. Now throws "Invalid date" with a clear
  reason. Leap-day case covered (2024-02-29 valid, 2025-02-29 rejected).

- CSV/JSON exports use the active currency's natural decimal places. The
  previous round2 helper produced ¥412.37 in CSV while the dashboard
  rendered ¥412 — finance teams comparing the two surfaces saw a
  discrepancy. New roundForActiveCurrency consults Intl.NumberFormat for
  the right precision (0 for JPY/KRW/CLP, 2 for USD/EUR, etc).

- Copilot toolRequests is Array.isArray-guarded in both modern and legacy
  event branches. Previously a corrupt session with toolRequests=null or
  a string aborted the whole file's parse loop and silently dropped every
  legitimate call after it.

- Codex token_count dedup uses a null sentinel for prevCumulativeTotal so
  the first event is never confused with a duplicate. Sessions that emit
  only last_token_usage (no total_token_usage) report cumulativeTotal=0
  on every event; with the previous 0-initialized prev, the first event
  matched the dedup guard and was dropped.

- LiteLLM pricing values are clamped to [0, 1] per token via safePerTokenRate.
  Defense in depth against a tampered upstream JSON shipping negative or
  absurdly large per-token costs that would otherwise propagate into all
  cost totals.

Performance:

- Cursor SQLite parse no longer pegs at minutes on multi-GB DBs. Two
  changes: per-conversation user-message buffer uses an index pointer
  instead of Array.shift() (which was O(n) per call); and a real ROWID
  cutoff via subquery limits the scan to the most recent 250k bubbles
  with a stderr warning so power users get a partial report rather than
  a stalled CLI.

- Spawned codeburn CLI subprocesses are terminated when the calling Task
  is cancelled. Without this, rapid period/provider tab clicks in the
  menubar cancelled the Task but left the subprocess running to
  completion, piling up zombie processes.

UX:

- Dashboard period switch flips to loading and clears projects
  synchronously before reloadData runs, eliminating the frame where the
  new period label rendered over the old period's projects.

- Optimize findings tab paginates 3-at-a-time with j/k scroll. With 4
  new detectors plus 7 originals, 8-10 findings * 6 lines was scrolling
  the StatusBar off the alt buffer top.

- Custom --from/--to ranges hide the period tab strip and disable the
  1-5 / arrow keys so a stray period press no longer abandons the user's
  explicit range. A "Custom range: X to Y" banner replaces the tab strip.

- OpenCode storage-format warning is per-table-set, rate-limited to once
  per process, and points the user at OpenCode's migration step or the
  issue tracker. The previous all-or-nothing check fired the generic
  "format not recognized" string for any schema mismatch.

Menubar / OAuth:

- Both Claude and Codex bootstrap (Reconnect button) now honour the
  usageBlockedUntil 429 backoff that refreshIfBootstrapped respects.
  Spamming Reconnect during sustained rate-limit windows previously
  hammered the upstream endpoint on every click.

- Codex Retry-After HTTP header is parsed (delta-seconds plus IMF-fixdate
  fallback) so we don't over-back-off when ChatGPT tells us a shorter
  window than our 5-minute floor.

- Both credential cache files are written via SafeFile.write
  (O_CREAT | O_EXCL | O_NOFOLLOW with explicit 0600) so there is no race
  window where the temp file briefly exists at default umask, and a
  symlink at the destination cannot redirect the write. Reads now route
  through SafeFile.read with a 64 KiB cap, closing the symlink-follow gap
  on Data(contentsOf:).

CI signal:

- TypeScript strict typecheck (tsc --noEmit) is now zero errors. The
  six errors in src/providers/copilot.ts came from a discriminated-union
  catch-all branch whose `data: Record<string, unknown>` shape TS picked
  over the specific event branches when narrowing on `type`. Removed the
  catch-all; runtime falls through unknown event types via the existing
  if/else chain.

Tests added: 16 new (now 555 total)
- date-range-filter: month/day/year overflow rejection, leap-day correctness
- currency-rounding: convertCost no-rounding contract, roundForActiveCurrency
  for USD/JPY/KRW/EUR
- providers/copilot: malformed toolRequests does not abort the parse
- providers/cursor-bubble-dedup: re-parse after token mutation does not
  double-count, single parse yields one call per bubble
- providers/codex: first event with cumulativeTotal=0 not dropped,
  consecutive zero-cumulative duplicates still deduped
2026-05-06 22:15:11 -07:00
Resham Joshi
afd0ee7011
Validator hardenings on the bug-hunt batch (#254)
* Five correctness fixes from multi-agent bug hunt

A multi-agent audit of the codeburn correctness surface found five
real bugs each producing visibly wrong numbers or risking data loss.
All five fixes were validated by parallel review agents and exercised
end-to-end against real session data on this machine.

- src/cli.ts: --refresh <seconds> was using bare parseInt as the
  commander callback. Commander invokes the callback as
  parseInt(value, previous), so previous becomes the radix:
  --refresh 30 was being parsed as parseInt('30', 30) = 90, and
  --refresh 60 became NaN. Replaced with parseInteger (already
  defined at line 48 with radix locked to 10) at all three sites.

- src/providers/cursor.ts: parseAgentKv was timestamping every
  agentKv call as new Date().toISOString() because the Cursor
  SQLite schema has no per-message timestamp. Result: every
  Cursor agent call regardless of when it happened landed in
  today's date bucket. Now uses statSync(dbPath).mtimeMs as a
  bounded ceiling so calls land at the actual last-write time of
  the Cursor database, not today. Verified locally: a 1904-call
  Cursor history with March 22 mtime now correctly bucket into
  all-time only and shows 0 calls for today/week/30days.

- src/providers/codex.ts: prev token counters were only updated
  inside the cumulative-fallback branch, so a session emitting N
  events with last_token_usage followed by one cumulative-only
  event computed the next delta against prev=0 and double-counted
  the entire cumulative window. Cost could be inflated 10-100x
  for any mixed-format Codex session. Now prev advances to the
  current cumulative state regardless of which branch ran.

- src/providers/gemini.ts: totalOutput accumulated output+thoughts
  while totalThoughts was tracked separately. The result was
  outputTokens = output+thoughts AND reasoningTokens = thoughts;
  any consumer summing the two double-counted thoughts. Now
  totalOutput holds just output, reasoningTokens holds thoughts,
  and the cost calc folds thoughts into the output count to keep
  pricing correct (Google bills thoughts at the output rate;
  calculateCost has no reasoning parameter).

- src/export.ts: exportJson had no safety check before writeFile,
  so codeburn export -f json -o ~/important.json would silently
  clobber the user's file. CSV path had a marker-file guard; JSON
  did not. Now refuses to overwrite a file unless its first 4KB
  contain the codeburn schema marker. Uses a streaming partial
  read so a large existing file does not OOM Node's ~512MB
  string limit. Refuses directories outright.

Skipped intentionally: cursor-auto/copilot-auto/cline-auto/
qwen-auto are aliased to claude-sonnet-4-5. The audit flagged
this as wrong pricing for non-Anthropic auto-routed turns, but
Cursor's "auto" mode does not expose the actual model and any
alternative estimate is equally arbitrary. README already
documents this as a Sonnet-based estimate.

vitest run: 38 files, 529 tests pass.

* Five more correctness fixes from the bug-hunt round

This commit closes out the remaining critical-tier findings from the
multi-agent audit, with one item documented as a known limitation.

- src/providers/cursor.ts: bubble dedup key included mutable
  inputTokens/outputTokens. Cursor mutates token counts on the row in
  place when streaming completes, so re-parsing the same DB produced
  a fresh dedup key per bubble and silently double-counted. Switched
  to the SQLite row key (`bubbleId:<unique>`) which is stable per
  bubble. Adjusted BubbleRow type and BUBBLE_QUERY_BASE to expose
  `key as bubble_key`.

- src/providers/pi.ts: usage fields were destructured non-optionally,
  but real Pi/OMP session files sometimes omit individual fields.
  `calculateCost(model, undefined, ...)` returned NaN, and that NaN
  propagated into every aggregate cost total. Coerce each field to
  0 with `?? 0`.

- src/models.ts: getShortModelName and the getModelCosts startsWith
  fallback both walked the dictionary in insertion order. A model id
  like `gpt-5-mini` could resolve to the entry for `gpt-5` (matched
  by startsWith first) and silently get GPT-5's display name and
  pricing tier. Iterate longest keys first so more-specific prefixes
  win. Tightened the cost fallback's match condition from
  `startsWith(key) || startsWith(key + '-')` to require either an
  exact match or a `key + '-'` continuation, removing accidental
  matches like `gpt-50` against `gpt-5`.

- src/models.ts: calculateCost returned 0 silently for any model
  missing from the pricing snapshot. New Anthropic / OpenAI models
  shipped between snapshot refreshes look free until the user
  notices. Now warns once per unknown model name per process to
  stderr. Skips the warning for the `<synthetic>` placeholder so
  the noise floor stays low.

- src/yield.ts: revert detection was broken on the canonical case.
  Two problems: (1) `subject.toLowerCase().includes('revert')`
  matched any commit whose subject mentioned the word ("Add revert
  button" was misclassified). (2) The window logic only counted
  reverts within the original session's 1-hour boundary, but real
  `git revert` commits land in later sessions, so original sessions
  always looked productive. Now: getRevertedShas runs once with
  `--grep=^This reverts commit` and parses bodies to build a Set of
  SHAs that were the target of a revert anywhere in history.
  CommitInfo.wasReverted is set when this commit's SHA appears in
  that set. categorizeSession then flags a session as reverted when
  its in-main commits were later reverted, regardless of when the
  revert itself happened.

- src/providers/droid.ts: SKIPPED with comment. Droid records token
  usage only at session level. The current behavior splits evenly
  across emitted assistant calls and prices all of them at
  settings.model (the latest model). For sessions where the user
  switched models mid-stream, costs are approximate. Added an
  inline comment documenting this; a real fix requires per-message
  model data that isn't in the Droid JSONL schema.

Verified end-to-end on this machine:
- vitest run: 38 files, 529 tests pass
- `codeburn report --format json` produces valid JSON
- `codeburn yield -p week` runs without crashing, finds 0 reverts
  in the user's recent git history (plausible — fix changed the
  detection from "subject contains revert" to "this commit's SHA
  appears in a later 'This reverts commit ...' body")
- Stderr now warns for unknown model ids: `openai/gpt-5.3`,
  `qwen3.6:35b-a3b-bf16`, `big-pickle`. These previously priced
  silently at $0.

* Four high-severity fixes from the bug-hunt round

- src/currency.ts: getExchangeRate wrapped fetchRate and cacheRate in
  one try/catch. If fetchRate succeeded but cacheRate threw (disk
  full, ENOSPC, no permissions on the cache dir), the catch block
  swallowed the error and returned 1. Every cost rendered after that
  point became USD-equivalent silently. Now the fetch and the cache
  write live in separate paths: a successful fetch returns the rate
  even if the persist fails, and the cache-write error is dropped to
  a fire-and-forget so transient disk problems do not corrupt the
  user's currency display.

- src/cursor-cache.ts: writeFile was non-atomic. Two concurrent
  codeburn invocations writing to cursor-results.json could
  interleave bytes mid-write, leaving a truncated file that
  parsed-error on next read and forced a full SQLite re-scan every
  run. Switched to the temp-file + rename pattern with a randomized
  temp name so each writer gets its own staging file and the rename
  is atomic on POSIX. Crash mid-write also leaves only a leftover
  temp file, which gets unlinked in the catch path; the destination
  is never half-written.

- mac/.../CodeBurnApp.swift refresh loop on sleep: the loop's
  Task.sleep keeps a wakeup pending across system sleep, so on wake
  the natural tick fires the same instant the wake observers do.
  Combined with didWakeNotification, screensDidWakeNotification, and
  the launchd com.codeburn.refresh distributed notification, that
  produced 2-3 concurrent CLI spawns within ms of every wake. Now:
  willSleepNotification cancels the loop task; didWakeNotification
  restarts it. The loop also reads lastRefreshTime and skips its
  natural tick if a wake/manual/distributed-notification refresh ran
  within the last 5 seconds, coalescing the two sources of refresh
  into one CLI spawn per wake event.

- mac/.../CodeBurnApp.swift observeStore: the read closure had an
  implicit strong self capture (it accessed store.* without a
  capture annotation), pinning self for the lifetime of any
  unfired observation. Added [weak self] and a guard to make the
  capture explicit. withObservationTracking is one-shot per call,
  so there is at most one active subscription at a time; the
  earlier audit's claim of an unbounded leak overstated the issue,
  but tightening the capture pattern is still cleaner.

Verified:
- vitest run: 38 files, 529 tests pass
- swift build -c release --arch arm64 --arch x86_64: clean, no
  diagnostics, no MainActor warnings
- mac/Scripts/package-app.sh dev produces a valid universal bundle
- Menubar launches and runs without crash

* Eleven medium-severity fixes from the bug-hunt round

- src/format.ts formatTokens: guard against Infinity, NaN, and
  negative input. Previously a corrupt aggregate could leak into
  the UI as the literal strings "NaN" or "Infinity". Negatives now
  render as "0" rather than "-500" with no scaling.

- src/cli-date.ts parseDateRangeFlags: the missing-from default
  was new Date(0), which opened a 55-year scan from 1970 epoch
  whenever the user passed only --to. Default now anchors at 6
  months back from now, matching the dashboard's all-time period.
  Test updated to assert the new bounded window.

- src/cli-date.ts toPeriod: previously fell back silently to "week"
  for any unknown input, so a typo like `-p mounth` produced a
  quiet 7-day report while the user thought they were viewing the
  month. Now exits with a clear stderr error and exit code 1.
  Test updated to assert the loud-failure behavior.

- src/optimize.ts urgencyScore: rebalanced weights so a high-impact
  finding with zero observed tokens cannot outrank a medium-impact
  finding with millions of tokens. Old 0.7/0.3 split made high+0
  (0.70) beat medium+1B (0.65). New 0.5/0.5 split makes medium+1B
  (0.75) beat high+0 (0.50). Token normalization lifted to 5M so
  the ramp covers a realistic spend range.

- src/models.ts calculateCost: clamp negative or non-finite token
  inputs to 0 before pricing. A corrupt JSONL emitting a negative
  count would otherwise produce a negative cost that silently
  subtracted from real spend in aggregates.

- src/currency.ts convertCost: stop rounding during aggregation.
  For zero-fraction currencies (JPY, KRW, CLP) this clamped every
  per-session cost to a whole unit before sum, so a project of
  1000 sessions averaging ¥0.4 each aggregated to ¥0 instead of
  ¥400. formatCost still rounds at the display boundary.

- src/config.ts saveConfig: the temp file path was a fixed
  `${configPath}.tmp` suffix. Two simultaneous saveConfig calls
  (overlapping menubar and CLI runs) raced on the same staging
  file and could leave one writer reading partial bytes from the
  other. Randomized the temp suffix per call.

- src/providers/antigravity.ts flushCache: the early return on
  `!cacheDirty` short-circuited eviction when liveCascadeIds was
  supplied but no cascade had been added or updated this run. As
  a result, deleted .pb files persisted in the cache forever once
  the user stopped writing to it. Eviction now runs whenever
  liveCascadeIds is provided, marks the cache dirty if anything
  was removed, and only then short-circuits if there is nothing
  to write.

- src/daily-cache.ts addNewDays: cap retention at 2 years. The
  days array previously merged forever, growing the cache file by
  hundreds of bytes per day until JSON parse on every CLI
  invocation became measurable. The 6-month UI period plus the
  365-day BACKFILL_DAYS bootstrap both fit comfortably inside the
  cap, with headroom for a future longer window.

- src/dashboard.tsx useInput: period number keys (1-5) and arrow
  keys triggered a reload while the compare view was mounted. The
  parent's data state changed underneath the user with no visual
  affordance back to the dashboard. Now those keys are gated on
  view !== 'compare', and `b` / Esc inside compare returns to the
  dashboard.

- mac/.../HeatmapSection.swift formatters: prettyDate, buildTrend
  Bars, computeTrendStats, computeForecast, and computeAllStats
  each allocated a fresh DateFormatter (and Calendar) on every
  call. SwiftUI re-evaluates these views many times per second
  during hover scrubbing on the trend chart, so the allocations
  were a measurable hot spot. Lifted the yyyy-MM-dd / "EEE MMM d"
  / "MMM d" formatters and the gregorian Calendar to fileprivate
  cached singletons.

Two findings from the same bucket were not addressed here:
- UpdateChecker SHA-256 / codesign verification is already
  performed by src/menubar-installer.ts (verifyChecksum at line
  85). The Swift side just kicks off `codeburn menubar --force`
  which runs that path. The audit's claim of missing verification
  was a misread.
- NSDistributedNotificationCenter sender validation: the
  `com.codeburn.refresh` listener accepts from any sender, but
  forceRefresh has a 5-second rate-limit gate so the abuse
  ceiling is one CLI spawn per 5 seconds. Mitigations (Mach IPC,
  per-launch shared secret) are disproportionate to the impact.

vitest run: 38 files, 529 tests pass.
swift build -c release: clean, no warnings.

* Validator hardenings on the bug-hunt batch

Hoist the per-call sort in getModelCosts and getShortModelName to module
scope so model lookups on the hot path stop reallocating sorted key arrays.

Sanitize the unknown-model stderr warning by stripping C0/C1 controls
and capping length, so a hostile or corrupt JSONL cannot inject terminal
escape sequences via the model field.

Skip the daily-cache prune when newestDate fails to parse. The previous
code produced a NaN cutoff and silently dropped every cached day on the
next merge.

Adds tests locking down the stable resolution of common model names
(gpt-5-mini vs gpt-5, claude-haiku-4-5 vs claude-3-5-haiku, etc.) and
the prune NaN guard.
2026-05-06 19:50:40 -07:00
Resham Joshi
75d4701bd8
feat(optimize): flag low-worth expensive sessions
Some checks are pending
CI / semgrep (push) Waiting to run
Adds a low-worth detector to codeburn optimize that flags expensive sessions with weak delivery signals (no edits, repeated retries, or no one-shot edits) when no git/gh delivery command is observed. Priority order is low-worth → context-bloat → outliers; each later detector excludes sessions named by an earlier one so the same session is never listed in three findings. Detection:  floor,  for no-edit, 3+ retries, regex matches git commit/push and gh pr create/merge but excludes commit-tree/commit-graph and dry-run. Three impact tiers consistent with #246. Token-savings uses full session tokens for no-edit sessions and the retry fraction for edit-with-retry sessions. Supersedes #241 with review fixes. Original implementation by @ozymandiashh.
2026-05-06 00:35:41 -07:00
Resham Joshi
f92d57d24a
feat(optimize): detect context-heavy sessions
Adds a context-bloat finding to codeburn optimize that flags sessions where effective input/cache tokens (cache-discounted via existing pricing constants) are large and disproportionate to output. Suggests starting fresh with a tightened context. Sessions flagged here are excluded from the cost-outlier finding to avoid double-listing. Growth-from-previous-session callouts are suppressed when the predecessor is more than 7 days back. Three impact tiers (low/medium/high). Supersedes #242 with review fixes from real-data probe. Original implementation by @ozymandiashh.
2026-05-06 00:11:12 -07:00
Resham Joshi
6151cf6d73
fix(parser): use Claude cwd for Windows project paths
Reads the canonical cwd already stored inside Claude session JSONL files and uses it as the project path, then groups sessions by a normalized path key (case + slash insensitive) so Windows projects no longer split into 3+ rows on case/slash variants. Falls back to the legacy slug-derived path when cwd is missing. Closes #217. Supersedes #228 with a fix that preserves the canonical cwd even when mixed with slug-only sessions in the same directory. Original implementation by @ozymandiashh.
2026-05-05 23:53:31 -07:00
Resham Joshi
be6068b244
feat(report): add per-model efficiency metrics
Adds per-model efficiency metrics (edit turns, one-shot rate, retries/edit, cost/edit) to the TUI By Model panel, JSON report output, and CSV export. Closes item 4 of #12. Supersedes #226 with review fixes (units rename, min-sample guard in TUI, tighter <synthetic> filter, multi-model attribution test). Original implementation by @ozymandiashh.
2026-05-05 23:36:59 -07:00
ozymandiashh
fc4c4f0091
feat(export): support custom date ranges 2026-05-05 23:18:48 -07:00
iamtoruk
38d21643bd Merge origin/main into feat/session-outlier-detection 2026-05-04 20:21:26 -07:00
iamtoruk
5120ec696c Merge origin/main into feat/mcp-tool-coverage 2026-05-04 20:13:15 -07:00
iamtoruk
735f41bc6c Fix cache-write pricing and shell-quote server names in fix commands
- Use 1.25x multiplier for cache-write tokens to match Anthropic's
  actual pricing (was incorrectly using 1x)
- Shell-quote server names in `claude mcp remove` fix text to prevent
  issues with unusual server names
2026-05-04 20:11:50 -07:00
ozymandiashh
d18ba3d2fe feat(optimize): detect session cost outliers 2026-05-05 05:25:49 +03:00
ozymandiashh
9a258a8a99 fix(date-range): avoid all-period month overflow 2026-05-05 05:05:13 +03:00
ozymandiashh
1a080a006f feat(optimize): MCP tool coverage detector with cache-aware costing
Adds a per-tool optimizer finding for MCP servers whose schema is loaded
on every turn but rarely invoked. Builds on the existing server-level
`detectUnusedMcp` (zero invocations) by reporting partial-use cases:
"loaded 54 tools, called 0" or "loaded 26 tools, called 2 (8% coverage)".

Inventory comes from Claude Code's JSONL `attachment.deferred_tools_delta`
entries: `addedNames` lists the exact tools available at that turn,
including every fully-qualified `mcp__<server>__<tool>` name. We union
across all delta entries in a session (not just the first) because tool
availability can change mid-session when the user reloads MCP config or
a subagent inherits a different tool set. Names that don't match the
`mcp__<server>__<tool>` shape with both segments non-empty are rejected
at extraction so downstream `split('__')` consumers can't be poisoned.

Token-savings estimates are cache-aware. MCP tool schemas live in the
cached prefix of the system prompt: a session pays the full input price
on each cache-creation turn (rebuilds happen every ~5 minutes of
inactivity) and the cache-read discount on subsequent turns. Each call's
contribution is capped at its observed `cacheCreationInputTokens` /
`cacheReadInputTokens` so we never claim more MCP overhead than the
call's own cache buckets could contain.

When multiple servers are flagged, costing happens in a single combined
pass: the per-call cap applies to the total unused-schema budget across
all flagged servers, not per server. Two flagged servers cannot both
independently claim the same call's cache bucket, which would otherwise
overstate `tokensSaved` and misclassify findings as high impact.

A session counts toward `loadedSessions` (and toward the cost estimate)
only if its observed inventory included the server. Pure invocation-only
sessions, where the server appears in `mcpBreakdown` or `call.mcpTools`
without any matching `deferred_tools_delta`, do not satisfy the
`>= 2 sessions` threshold on their own. The same invariant applies in
`estimateMcpSchemaCost` so the two passes agree.

Coverage is computed against the inventory only: invocations of names
not present in any observed inventory (older config, hallucinated tool,
typo) do not inflate `toolsInvoked` and cannot drive `unusedCount`
negative. `toolsInvoked` is derived as `inventory.size - unusedTools.length`
to keep both numbers consistent.

`detectUnusedMcp` and the new detector are explicitly disjoint:
`detectUnusedMcp` skips servers that the coverage detector will report,
not every server that happens to be in any inventory, so a small
inventoried-but-uninvoked server below the coverage thresholds still
gets flagged as "configured but never called."

Thresholds for the coverage finding:
- > 10 tools available (small servers are noise)
- < 20% coverage
- >= 2 sessions with observed inventory
- High impact when total effective tokens >= 200_000 or >= 3 servers flagged

Smoke-tested on a real account: 7 servers flagged across 93 sessions
(`office-word-mcp` 0/54, `notebooklm-mcp` 0/38, `office-ppt-mcp` 0/37,
`excel-mcp-server` 0/25, `github-mcp-server` 2/26, `peekaboo` 3/22, plus
`claude_ai_Asana`). Combined-cap costing keeps `tokensSaved` honest.

Changes:
- src/types.ts: optional `mcpInventory: string[]` on `SessionSummary`.
  Provider-agnostic field; currently populated only by the Claude parser.
- src/parser.ts: `extractMcpInventory` walks all entries, validates
  fully-qualified names, returns sorted unique list. `buildSessionSummary`
  passes it through; field is omitted when empty so JSON exports stay
  clean.
- src/optimize.ts: `aggregateMcpCoverage`, `estimateMcpSchemaCost`
  (single- and multi-server signatures), `detectMcpToolCoverage`. Wired
  into `scanAndDetect`. `detectUnusedMcp` updated to disjoint with the
  new detector.
- tests/mcp-coverage.test.ts: 23 cases covering aggregation, costing,
  combined-cap behaviour, threshold gates, invocation-only-session
  filtering, foreign-tool invocations, cache rebuild events, write+read
  on the same call, multi-server pluralisation.
- tests/parser-mcp-inventory.test.ts: 12 cases for the JSONL extractor
  including malformed name rejection and tolerant attachment parsing.
- CHANGELOG.md: entry under Unreleased / Added (CLI).

Closes #2
2026-05-05 04:13:04 +03:00
ozymandiashh
3dc3e32715 fix(date-range): unify 'all' period semantics between CLI and dashboard
`getDateRange` was duplicated across `src/cli.ts` and `src/dashboard.tsx`
with conflicting semantics for `'all'`. The CLI intentionally bounded
`'all'` to the last 6 months (justified inline: keeps Codex/Cursor parses
responsive on sparse multi-year history). The dashboard returned
`new Date(0)` instead, so the same `--period all` flag silently meant
two different windows depending on which entry point you hit.

`Period`, `PERIODS`, `PERIOD_LABELS`, and `toPeriod` were duplicated as
well, and `cli-date.ts` already existed for date helpers
(`parseDateRangeFlags`) so the consolidation lives there.

Both call sites now go through a single `getDateRange(period: string)`
in `cli-date.ts` that returns `{ range, label }`. The dashboard wraps it
as `getPeriodRange(period: Period)` to keep the strict `Period` type at
the React boundary while letting the CLI continue to accept extras like
`'yesterday'`.

`PERIOD_LABELS.all` becomes `'6 Months'` (short, for the dashboard tab
strip; the previous `'All Time'` was misleading and the long-form
`'Last 6 months'` from `getDateRange().label` already drives CLI output).

Changes:
- src/cli-date.ts: add `Period`, `PERIODS`, `PERIOD_LABELS`, `toPeriod`,
  `getDateRange`. Pull the existing 6-month rationale into a named
  `ALL_TIME_MONTHS` constant.
- src/cli.ts: drop the local copies and import from cli-date.
- src/dashboard.tsx: drop the local copies, route through
  `getPeriodRange`, alias the shared `getDateRange` import to
  `getDateRangeShared` to avoid shadowing the wrapper.
- tests/cli-date.test.ts: 13 cases covering `'all'` regression guard
  (must never silently fall back to `Date(0)`), CLI/dashboard agreement,
  end-of-month clamping tolerance, `'yesterday'` support, and
  unknown-input fallback.
- README.md, CHANGELOG.md: surface the bound and point heavy users at
  `--from`/`--to` for unbounded windows.

The CLI flag `--period all` continues to be accepted; only the dashboard
window changes to match what the CLI was already doing. No public API
or schema change.

Refs #93
2026-05-05 03:53:46 +03:00
voidborne-d
c16b21ec50 fix(classifier): surface skill name as subCategory for general turns (#203)
Turns whose only assistant tool is `Skill` collapse to category `general`
because `classifyByToolPattern` returns `'general'` and `refineByKeywords`
only operates on `coding`/`exploration`. In environments that lean on Claude
Code skills, the per-activity dashboard column flattens — every `/init`,
`/review`, `/security-review`, `/claude-api`, plus user-defined skills, all
land in `general` with no signal about which workflow ran.

Implements Option A from the issue:

- `ParsedApiCall.skills: string[]` populated in the Anthropic-path parser
  via a new `extractSkillNames` helper that reads `input.skill || input.name`
  from each `Skill` ToolUseBlock (mirrors `detectGhostSkills` extraction at
  optimize.ts:765 so the two stay in sync).
- `ClassifiedTurn.subCategory?: string` set to the first skill name when the
  resolved category is `general` AND any skill identifier was extracted.
  Top-level category stays `general` — existing aggregations, exports, and
  category-keyed code paths unchanged.
- `SessionSummary.skillBreakdown: Record<string, {turns,costUSD,editTurns,
  oneShotTurns}>` populated in the same per-turn loop that builds
  `categoryBreakdown`. Provider sessions (Codex/Cursor/etc.) keep `skills:
  []` — they don't expose the Skill tool surface today.
- Dashboard `ActivityBreakdown` renders top-N skill sub-rows beneath the
  `general` row when present (indented `/skill-name`, dimmed). Other
  categories render exactly as before; if no skills were invoked, the panel
  is byte-identical to current output.

Existing 419 tests still pass. New `tests/classifier.test.ts` adds 8 cases:
single skill via `input.skill`, single via `input.name`, first-wins for
multi-skill turns, aggregation across multiple assistant calls in one turn,
no-name fallback (`subCategory` stays undefined), `Skill+Edit` promoting to
`coding` and dropping subCategory, non-Skill general turns, and a legacy
ParsedApiCall shape with `skills` field absent (forward-compat). Pre-fix
verification by stashing the source change reproduces 4/8 failures with the
exact "expected 'init', received undefined" diff; restoring → 8/8 pass.

Closes #203.

🤖 AI assistance disclosure: assistant-scaffolded by Claude (Opus 4.7);
author of record reviewed every line, ran the full vitest suite locally
(`npm test` → 32 files / 427 tests pass), `npx tsc --noEmit` clean, and
`npm run build` produces a clean ESM bundle.
2026-05-04 06:26:45 +08:00
Nihal Jain
791f2b077d
Add gpt-5.5 model display name for Codex 2026-05-02 08:57:44 -07:00
ozymandiashh
ff8b20a79e review: drop streamError flag, add multi-chunk and torn-write tests
- Stop tracking a separate streamError flag. createReadStream's default
  64 KiB highWaterMark means the stream may already be reading chunk 2
  when we break out of the loop after yielding the first line; if that
  later chunk errors, the flag could reject an otherwise-valid line.
  readline's async iterator already re-throws stream errors on Node 16+,
  which the existing catch handles.
- Test: 120 KB session_meta line forces multi-chunk line assembly.
- Test: truncated mid-write first line is rejected, not parsed as half
  an object.
2026-05-02 02:34:41 +03:00
ozymandiashh
98bbe5b678 review: cap first-line read size and add edge-case tests
- Cap createReadStream at 1 MiB so a malformed file with no newline
  cannot make readline buffer indefinitely (real session_meta lines
  are 22-27 KB).
- Capture stream errors explicitly; readline's async iterator does
  not always re-throw underlying stream errors per Node docs.
- Test: assert project is extracted from the >16 KB session_meta to
  prove the line was actually parsed, not just discovered.
- Test: session_meta line with no trailing newline is still accepted.
- Test: empty rollout file is silently skipped.
2026-05-02 02:30:17 +03:00
ozymandiashh
945da9f0ba fix(codex): read full first line for session validation
`readFirstLine` allocated a fixed 16 KB buffer, but Codex CLI 0.128+
embeds the entire base_instructions / system prompt in the
`session_meta` line, pushing it past 20 KB. When the buffer doesn't
catch a newline, `isValidCodexSession` rejects the session, so every
recent Codex session is silently excluded from totals.

Switch to a streaming readline read so the first line is captured
regardless of length, and add a regression test that creates a
40 KB session_meta payload.

Locally, this changes my 30-day Codex total from €267 (only ~half
of sessions parsed) to €878 (all sessions parsed).
2026-05-02 02:17:53 +03:00
AgentSeal
888f592bd2 Make daily cache durable: hydrate from all commands, migrate instead of nuke
- Extract ensureCacheHydrated() from menubar-json path into daily-cache.ts
- Call it from every command that parses sessions (report, status, today,
  month, export, optimize, compare, yield) so CLI-only users also persist
  historical data that survives source file deletion
- Replace strict version equality check with fill-defaults migration for
  cache versions 2-4, preserving history across schema changes
- Back up old cache to .bak before discarding on unmigrateable versions
- Fix Copilot auto bucket display names in menubar (Copilot (Anthropic),
  Copilot (OpenAI))
- Fix Roo Code / KiloCode provider key matching in menubar tab strip
2026-04-28 22:41:01 +02:00
Resham Joshi
fbb2c4e69c
Merge pull request #171 from ksp2000/feature/copilot-auto-model-buckets
refactor(copilot): use auto model buckets for transcript inference
2026-04-28 12:17:50 -07:00
Dunccan de Weerdt
26ebe75aa1 Add Droid CLI provider
Discovers and parses sessions from ~/.factory/sessions/, reading JSONL
message logs and companion settings.json files for token usage tracking.

- Discovers sessions by scanning per-cwd subdirectories
- Skips internal .factory housekeeping sessions
- Extracts tools, bash commands, and user messages from JSONL
- Distributes session-level cumulative token counts across calls
- Normalizes Droid model wrappers before existing pricing lookup
- Derives clean project names from cwd paths
- Adds menubar provider filtering for Droid
2026-04-28 20:16:45 +02:00
AgentSeal
d043795855 Add Qwen provider and replace hardcoded pricing with LiteLLM snapshot
- Add Qwen CLI provider (discovers sessions from ~/.qwen/projects/)
- Replace FALLBACK_PRICING (40 hand-maintained entries) with auto-generated
  LiteLLM snapshot (3595 models including Azure, OpenRouter pricing)
- Build script fetches and bundles LiteLLM data before tsup
- Provider-prefixed lookups (azure/, openrouter/) resolve to correct pricing
- Add display names for all GPT-5.x model variants
- Add Qwen to menubar provider filter and tab strip
2026-04-28 19:49:14 +02:00
Resham Joshi
ec2de6a642
Add OpenClaw, Roo Code, and KiloCode providers (#175)
- OpenClaw: JSONL parser with multi-path discovery, tool extraction
  (toolCall + tool_use block types), model tracking via model_change
  and custom model-snapshot events
- Roo Code + KiloCode: shared Cline-family parser extracts model from
  <model> tags in api_conversation_history.json, strips provider
  prefixes from model names
- Add cline-auto and openclaw-auto aliases and display names
- Add menubar provider filters and tab colors for all three
- Show cached data instantly instead of blocking on CLI refresh
2026-04-28 09:24:14 -07:00
saipraneeth.konda
74c1c4b4c1 refactor(copilot): use auto model buckets for transcript inference 2026-04-28 19:32:03 +05:30
Resham Joshi
6d15ea43a5
Add Gemini CLI provider for session tracking (#168)
Parse ~/.gemini/tmp/<project>/chats/session-*.json files from Gemini
CLI 0.38+. Uses real token counts (input, output, cached, thoughts)
embedded in each message instead of character estimation. Correctly
separates cached tokens from fresh input to avoid double-charging.

- Pricing for gemini-3.1-pro-preview, gemini-3-flash-preview,
  gemini-2.5-pro, gemini-2.5-flash from official Google API rates
- Tool name normalization (ReadFile->Read, SearchText->Grep, etc.)
- Menubar tab with Google Blue color (#4485F4)

Closes #166
2026-04-27 19:48:25 -07:00
Resham Joshi
f7f64a01ab
Add new providers, fix menubar tabs, accent color picker (#167)
* Add Kiro provider and transparent auto-model naming

- Add Kiro IDE provider: parses .chat JSON files, estimates tokens,
  normalizes dot-versioned model IDs for cost lookup
- Show "Cursor (auto)", "Copilot (auto)", "Kiro (auto)" in CLI
  dashboard instead of pretending to know which model was used
- Route auto model names through BUILTIN_ALIASES for cost estimation

* Fix menubar tabs: add missing providers, show period-scoped costs

- Add Kiro, OMP to ProviderFilter enum so installed providers appear as tabs
- Merge Cursor + Cursor Agent into single Cursor tab
- Tab costs now reflect the selected period (7d/30d/month/all) instead
  of always showing today
- Tab visibility still uses today's provider list so tabs don't
  disappear when switching to periods with no data

* Add accent color picker to menubar with Apple system presets

- 9 presets using Apple's exact macOS dark-mode accent colors
  (Ember, Blue, Purple, Pink, Red, Orange, Yellow, Green, Graphite)
- Color picker in header, persisted via UserDefaults
- "Burn" text stays fixed ember regardless of accent
- ThemeState is MainActor-isolated for thread safety
- Picker state lifted to AppStore so it survives .id() tree rebuild
- Accessibility labels on all color swatches
- Renamed brandAccentDark/brandEmberDeep/brandEmberGlow to match
  their actual light/deep/glow semantics

* Fix review findings: case-sensitive cost lookup, Kiro timestamp guard, cache versioning

- Normalize provider dictionary keys to lowercase in tab cost lookup
  so "Cursor Agent" (title-case from CLI) matches providerKeys
- Guard against missing/invalid/epoch startTime in Kiro parser to
  prevent RangeError crash or 1970-01-01 ghost entries
- Bump DAILY_CACHE_VERSION to 4 so upgraded users get a clean
  recompute with the new auto-model naming (cursor-auto vs default)
- Add version field to cursor-results.json cache to invalidate stale
  entries that still use the old 'default' model name
2026-04-27 19:46:30 -07:00
Resham Joshi
5d1b335c0a
Fix Copilot provider to read VS Code workspace transcripts (#165)
The Copilot provider only looked in ~/.copilot/session-state/ which is
from an older CLI tool. VS Code Copilot agent stores transcripts in
~/Library/Application Support/Code/User/workspaceStorage/*/GitHub.copilot-chat/transcripts/.

The new transcript format has no outputTokens or model_change events,
so tokens are estimated from content length and the model is inferred
from tool call ID prefixes. Both legacy and VS Code paths are now
scanned in parallel.

Fixes #161
2026-04-27 19:44:35 -07:00
AgentSeal
410fad9495 Fix Cursor provider reporting $0 for v3 bubble format and NULL createdAt rows
Cursor v3 stores zero token counts in bubbles, causing parseBubbles to
return empty results. The query also dropped rows with NULL createdAt
via the SQL comparison, hiding data from older Cursor versions too.

Changes:
- Remove inputTokens > 0 SQL filter, estimate tokens from text length
  when token counts are zero (same 4 chars/token ratio as agentKv)
- Include NULL createdAt rows with OR IS NULL, fall back to current
  timestamp when createdAt is missing
- Parse agentKv entries with plain string content instead of skipping
  them (not all content is a JSON array)
- Always parse both bubbles and agentKv instead of agentKv-only fallback
- Discover subagent transcripts in subagents/ subdirectories
- Fix timezone-dependent test in day-aggregator

Fixes #159, #163
2026-04-28 00:35:51 +02:00
Łukasz Majcher
5e49f17e64 fix: switch scanJsonlFile and parseSessionFile to readSessionLines to prevent OOM
readViaStream (used for files ≥8 MB) reconstructs the full file as a
single string via chunks.join('\n'), giving the same peak allocation as
readFile. Callers then call content.split('\n'), creating a second copy.
With FILE_READ_CONCURRENCY=16 and files up to 128 MB this can exhaust
the V8 heap (~6 GB theoretical peak).

readSessionLines already exists as a proper async generator that yields
one line at a time. Switch both hot-path callers to iterate it directly
so the full file string is never held in memory.

Adds two tests: a spy test confirming readSessionLines is called (not
readSessionFile), and a 500-entry correctness test.

Fixes #131
2026-04-22 10:11:13 +00:00
iamtoruk
4f1138290e Merge main into feat/omp-support-model-aliases
Second merge of main since the PR was opened. Main moved 30+ commits
(0.8.5 bump, plan tracking feature, MiniMax pricing, menubar
prefetchAll walk-back, aicrowd cache rewrite revert) so the branch
needed another reconciliation before merging to main.

Two new conflicts resolved. Took main's text in both cases per the
policy of favoring main when the feature work is neutral:

  README.md             Kept main's Node 20+ / better-sqlite3
                        Requirements wording and main's shorter src/
                        tree listing. Added OMP to the Requirements
                        line.

  src/providers/pi.ts   Main dropped the discovery-cache snapshot and
                        the rich source-metadata fields as part of the
                        aicrowd revert. Took main's simpler structure
                        and only kept the providerName parameter so
                        OMP sources still report the correct provider
                        in the session source and dedup key.

Earlier fixups carried forward from the prior merge commit:
  - Object.hasOwn guards in resolveAlias against prototype-pollution
    via a model literally named '__proto__'.
  - source.provider in the dedup key prefix so OMP rows no longer
    stamp 'pi:'.
  - Combined pi.js imports in providers/index.ts.
  - Trailing newline on pi.ts.
  - Unknown-model fallback in cursor-agent.ts from yesterday's PR #117
    fixup (preserved via main).

353 tests pass (count dropped from 378 because main deleted the
parse-progress / parser-cache / provider-colors / source-cache test
files alongside the cache-rewrite revert).

Feature work by @cgrossde.
2026-04-21 11:51:20 -07:00
iamtoruk
81b5cda173 feat: add MiniMax-M2.7 and MiniMax-M2.7-highspeed model pricing
Adds FALLBACK_PRICING entries plus display names so MiniMax sessions
show up with the right cost and readable labels when users route MiniMax
through providers like OpenCode. Pricing verified against the live
MiniMax paygo page:

  MiniMax-M2.7           input $0.3/M  output $1.2/M  cache-read $0.06/M  cache-write $0.375/M
  MiniMax-M2.7-highspeed input $0.6/M  output $2.4/M  cache-read $0.06/M  cache-write $0.375/M
2026-04-21 05:50:52 -07:00