Commit graph

7 commits

Author SHA1 Message Date
Resham Joshi
810b214476
Cursor: per-project breakdown by workspace (closes per-project half of #196) (#296)
Cursor's chat history showed as a single row labeled 'cursor' in
the dashboard because the global state.vscdb has no workspace
field on individual bubbles. The fix joins through Cursor's
per-workspace storage:

1. Walk ~/Library/Application Support/Cursor/User/workspaceStorage/*
2. For each hash dir, read workspace.json -> folder URI
3. Open that dir's state.vscdb, read
   ItemTable['composer.composerData'] -> allComposers list
4. Build Map<composerId, folder URI>
5. emit one SessionSource per workspace plus a catch-all 'cursor'
   source for composers that did not register against any
   workspace (multi-root workspaces, no-folder-open windows,
   deleted workspaces with surviving global rows)

The parser decodes source.path's #cursor-ws= tag, filters the
parsed bubbles to the composerIds that belong to this workspace,
and yields only those. The orphan-tag source negates the filter so
it captures every composer not in any workspace.

In passing, fix a real bug in the old code: parseBubbles set
`sessionId: row.conversation_id ?? 'unknown'`, but the JSON
`conversationId` field is empty in current Cursor builds, so every
call shipped with `sessionId: 'unknown'`. We now derive the
composer id from the row key (`bubbleId:<composerId>:<bubbleUuid>`)
which is what the workspace map joins on. The old behavior masked
the bug because every call went into a single 'cursor' project
anyway; with per-workspace bucketing the bug becomes load-bearing.
Cache version bumped 2 -> 3 to invalidate caches that still record
'unknown' as the session id.

Live-tested against my real 1.9 GB Cursor DB: the single 'cursor'
row with 1904 calls / $4.08 now breaks into 5 workspaces plus an
orphan bucket, totals reconcile exactly. 8 fixture-based tests
cover multi-workspace routing, orphan filtering, legacy bare DB
path backwards compat, multi-root workspace skip, vscode-remote
URI slugification, and total reconciliation across all sources.

Full suite: 46 files, 653 tests passing.
2026-05-10 15:35:57 -07:00
Resham Joshi
afd0ee7011
Validator hardenings on the bug-hunt batch (#254)
* Five correctness fixes from multi-agent bug hunt

A multi-agent audit of the codeburn correctness surface found five
real bugs each producing visibly wrong numbers or risking data loss.
All five fixes were validated by parallel review agents and exercised
end-to-end against real session data on this machine.

- src/cli.ts: --refresh <seconds> was using bare parseInt as the
  commander callback. Commander invokes the callback as
  parseInt(value, previous), so previous becomes the radix:
  --refresh 30 was being parsed as parseInt('30', 30) = 90, and
  --refresh 60 became NaN. Replaced with parseInteger (already
  defined at line 48 with radix locked to 10) at all three sites.

- src/providers/cursor.ts: parseAgentKv was timestamping every
  agentKv call as new Date().toISOString() because the Cursor
  SQLite schema has no per-message timestamp. Result: every
  Cursor agent call regardless of when it happened landed in
  today's date bucket. Now uses statSync(dbPath).mtimeMs as a
  bounded ceiling so calls land at the actual last-write time of
  the Cursor database, not today. Verified locally: a 1904-call
  Cursor history with March 22 mtime now correctly bucket into
  all-time only and shows 0 calls for today/week/30days.

- src/providers/codex.ts: prev token counters were only updated
  inside the cumulative-fallback branch, so a session emitting N
  events with last_token_usage followed by one cumulative-only
  event computed the next delta against prev=0 and double-counted
  the entire cumulative window. Cost could be inflated 10-100x
  for any mixed-format Codex session. Now prev advances to the
  current cumulative state regardless of which branch ran.

- src/providers/gemini.ts: totalOutput accumulated output+thoughts
  while totalThoughts was tracked separately. The result was
  outputTokens = output+thoughts AND reasoningTokens = thoughts;
  any consumer summing the two double-counted thoughts. Now
  totalOutput holds just output, reasoningTokens holds thoughts,
  and the cost calc folds thoughts into the output count to keep
  pricing correct (Google bills thoughts at the output rate;
  calculateCost has no reasoning parameter).

- src/export.ts: exportJson had no safety check before writeFile,
  so codeburn export -f json -o ~/important.json would silently
  clobber the user's file. CSV path had a marker-file guard; JSON
  did not. Now refuses to overwrite a file unless its first 4KB
  contain the codeburn schema marker. Uses a streaming partial
  read so a large existing file does not OOM Node's ~512MB
  string limit. Refuses directories outright.

Skipped intentionally: cursor-auto/copilot-auto/cline-auto/
qwen-auto are aliased to claude-sonnet-4-5. The audit flagged
this as wrong pricing for non-Anthropic auto-routed turns, but
Cursor's "auto" mode does not expose the actual model and any
alternative estimate is equally arbitrary. README already
documents this as a Sonnet-based estimate.

vitest run: 38 files, 529 tests pass.

* Five more correctness fixes from the bug-hunt round

This commit closes out the remaining critical-tier findings from the
multi-agent audit, with one item documented as a known limitation.

- src/providers/cursor.ts: bubble dedup key included mutable
  inputTokens/outputTokens. Cursor mutates token counts on the row in
  place when streaming completes, so re-parsing the same DB produced
  a fresh dedup key per bubble and silently double-counted. Switched
  to the SQLite row key (`bubbleId:<unique>`) which is stable per
  bubble. Adjusted BubbleRow type and BUBBLE_QUERY_BASE to expose
  `key as bubble_key`.

- src/providers/pi.ts: usage fields were destructured non-optionally,
  but real Pi/OMP session files sometimes omit individual fields.
  `calculateCost(model, undefined, ...)` returned NaN, and that NaN
  propagated into every aggregate cost total. Coerce each field to
  0 with `?? 0`.

- src/models.ts: getShortModelName and the getModelCosts startsWith
  fallback both walked the dictionary in insertion order. A model id
  like `gpt-5-mini` could resolve to the entry for `gpt-5` (matched
  by startsWith first) and silently get GPT-5's display name and
  pricing tier. Iterate longest keys first so more-specific prefixes
  win. Tightened the cost fallback's match condition from
  `startsWith(key) || startsWith(key + '-')` to require either an
  exact match or a `key + '-'` continuation, removing accidental
  matches like `gpt-50` against `gpt-5`.

- src/models.ts: calculateCost returned 0 silently for any model
  missing from the pricing snapshot. New Anthropic / OpenAI models
  shipped between snapshot refreshes look free until the user
  notices. Now warns once per unknown model name per process to
  stderr. Skips the warning for the `<synthetic>` placeholder so
  the noise floor stays low.

- src/yield.ts: revert detection was broken on the canonical case.
  Two problems: (1) `subject.toLowerCase().includes('revert')`
  matched any commit whose subject mentioned the word ("Add revert
  button" was misclassified). (2) The window logic only counted
  reverts within the original session's 1-hour boundary, but real
  `git revert` commits land in later sessions, so original sessions
  always looked productive. Now: getRevertedShas runs once with
  `--grep=^This reverts commit` and parses bodies to build a Set of
  SHAs that were the target of a revert anywhere in history.
  CommitInfo.wasReverted is set when this commit's SHA appears in
  that set. categorizeSession then flags a session as reverted when
  its in-main commits were later reverted, regardless of when the
  revert itself happened.

- src/providers/droid.ts: SKIPPED with comment. Droid records token
  usage only at session level. The current behavior splits evenly
  across emitted assistant calls and prices all of them at
  settings.model (the latest model). For sessions where the user
  switched models mid-stream, costs are approximate. Added an
  inline comment documenting this; a real fix requires per-message
  model data that isn't in the Droid JSONL schema.

Verified end-to-end on this machine:
- vitest run: 38 files, 529 tests pass
- `codeburn report --format json` produces valid JSON
- `codeburn yield -p week` runs without crashing, finds 0 reverts
  in the user's recent git history (plausible — fix changed the
  detection from "subject contains revert" to "this commit's SHA
  appears in a later 'This reverts commit ...' body")
- Stderr now warns for unknown model ids: `openai/gpt-5.3`,
  `qwen3.6:35b-a3b-bf16`, `big-pickle`. These previously priced
  silently at $0.

* Four high-severity fixes from the bug-hunt round

- src/currency.ts: getExchangeRate wrapped fetchRate and cacheRate in
  one try/catch. If fetchRate succeeded but cacheRate threw (disk
  full, ENOSPC, no permissions on the cache dir), the catch block
  swallowed the error and returned 1. Every cost rendered after that
  point became USD-equivalent silently. Now the fetch and the cache
  write live in separate paths: a successful fetch returns the rate
  even if the persist fails, and the cache-write error is dropped to
  a fire-and-forget so transient disk problems do not corrupt the
  user's currency display.

- src/cursor-cache.ts: writeFile was non-atomic. Two concurrent
  codeburn invocations writing to cursor-results.json could
  interleave bytes mid-write, leaving a truncated file that
  parsed-error on next read and forced a full SQLite re-scan every
  run. Switched to the temp-file + rename pattern with a randomized
  temp name so each writer gets its own staging file and the rename
  is atomic on POSIX. Crash mid-write also leaves only a leftover
  temp file, which gets unlinked in the catch path; the destination
  is never half-written.

- mac/.../CodeBurnApp.swift refresh loop on sleep: the loop's
  Task.sleep keeps a wakeup pending across system sleep, so on wake
  the natural tick fires the same instant the wake observers do.
  Combined with didWakeNotification, screensDidWakeNotification, and
  the launchd com.codeburn.refresh distributed notification, that
  produced 2-3 concurrent CLI spawns within ms of every wake. Now:
  willSleepNotification cancels the loop task; didWakeNotification
  restarts it. The loop also reads lastRefreshTime and skips its
  natural tick if a wake/manual/distributed-notification refresh ran
  within the last 5 seconds, coalescing the two sources of refresh
  into one CLI spawn per wake event.

- mac/.../CodeBurnApp.swift observeStore: the read closure had an
  implicit strong self capture (it accessed store.* without a
  capture annotation), pinning self for the lifetime of any
  unfired observation. Added [weak self] and a guard to make the
  capture explicit. withObservationTracking is one-shot per call,
  so there is at most one active subscription at a time; the
  earlier audit's claim of an unbounded leak overstated the issue,
  but tightening the capture pattern is still cleaner.

Verified:
- vitest run: 38 files, 529 tests pass
- swift build -c release --arch arm64 --arch x86_64: clean, no
  diagnostics, no MainActor warnings
- mac/Scripts/package-app.sh dev produces a valid universal bundle
- Menubar launches and runs without crash

* Eleven medium-severity fixes from the bug-hunt round

- src/format.ts formatTokens: guard against Infinity, NaN, and
  negative input. Previously a corrupt aggregate could leak into
  the UI as the literal strings "NaN" or "Infinity". Negatives now
  render as "0" rather than "-500" with no scaling.

- src/cli-date.ts parseDateRangeFlags: the missing-from default
  was new Date(0), which opened a 55-year scan from 1970 epoch
  whenever the user passed only --to. Default now anchors at 6
  months back from now, matching the dashboard's all-time period.
  Test updated to assert the new bounded window.

- src/cli-date.ts toPeriod: previously fell back silently to "week"
  for any unknown input, so a typo like `-p mounth` produced a
  quiet 7-day report while the user thought they were viewing the
  month. Now exits with a clear stderr error and exit code 1.
  Test updated to assert the loud-failure behavior.

- src/optimize.ts urgencyScore: rebalanced weights so a high-impact
  finding with zero observed tokens cannot outrank a medium-impact
  finding with millions of tokens. Old 0.7/0.3 split made high+0
  (0.70) beat medium+1B (0.65). New 0.5/0.5 split makes medium+1B
  (0.75) beat high+0 (0.50). Token normalization lifted to 5M so
  the ramp covers a realistic spend range.

- src/models.ts calculateCost: clamp negative or non-finite token
  inputs to 0 before pricing. A corrupt JSONL emitting a negative
  count would otherwise produce a negative cost that silently
  subtracted from real spend in aggregates.

- src/currency.ts convertCost: stop rounding during aggregation.
  For zero-fraction currencies (JPY, KRW, CLP) this clamped every
  per-session cost to a whole unit before sum, so a project of
  1000 sessions averaging ¥0.4 each aggregated to ¥0 instead of
  ¥400. formatCost still rounds at the display boundary.

- src/config.ts saveConfig: the temp file path was a fixed
  `${configPath}.tmp` suffix. Two simultaneous saveConfig calls
  (overlapping menubar and CLI runs) raced on the same staging
  file and could leave one writer reading partial bytes from the
  other. Randomized the temp suffix per call.

- src/providers/antigravity.ts flushCache: the early return on
  `!cacheDirty` short-circuited eviction when liveCascadeIds was
  supplied but no cascade had been added or updated this run. As
  a result, deleted .pb files persisted in the cache forever once
  the user stopped writing to it. Eviction now runs whenever
  liveCascadeIds is provided, marks the cache dirty if anything
  was removed, and only then short-circuits if there is nothing
  to write.

- src/daily-cache.ts addNewDays: cap retention at 2 years. The
  days array previously merged forever, growing the cache file by
  hundreds of bytes per day until JSON parse on every CLI
  invocation became measurable. The 6-month UI period plus the
  365-day BACKFILL_DAYS bootstrap both fit comfortably inside the
  cap, with headroom for a future longer window.

- src/dashboard.tsx useInput: period number keys (1-5) and arrow
  keys triggered a reload while the compare view was mounted. The
  parent's data state changed underneath the user with no visual
  affordance back to the dashboard. Now those keys are gated on
  view !== 'compare', and `b` / Esc inside compare returns to the
  dashboard.

- mac/.../HeatmapSection.swift formatters: prettyDate, buildTrend
  Bars, computeTrendStats, computeForecast, and computeAllStats
  each allocated a fresh DateFormatter (and Calendar) on every
  call. SwiftUI re-evaluates these views many times per second
  during hover scrubbing on the trend chart, so the allocations
  were a measurable hot spot. Lifted the yyyy-MM-dd / "EEE MMM d"
  / "MMM d" formatters and the gregorian Calendar to fileprivate
  cached singletons.

Two findings from the same bucket were not addressed here:
- UpdateChecker SHA-256 / codesign verification is already
  performed by src/menubar-installer.ts (verifyChecksum at line
  85). The Swift side just kicks off `codeburn menubar --force`
  which runs that path. The audit's claim of missing verification
  was a misread.
- NSDistributedNotificationCenter sender validation: the
  `com.codeburn.refresh` listener accepts from any sender, but
  forceRefresh has a 5-second rate-limit gate so the abuse
  ceiling is one CLI spawn per 5 seconds. Mitigations (Mach IPC,
  per-launch shared secret) are disproportionate to the impact.

vitest run: 38 files, 529 tests pass.
swift build -c release: clean, no warnings.

* Validator hardenings on the bug-hunt batch

Hoist the per-call sort in getModelCosts and getShortModelName to module
scope so model lookups on the hot path stop reallocating sorted key arrays.

Sanitize the unknown-model stderr warning by stripping C0/C1 controls
and capping length, so a hostile or corrupt JSONL cannot inject terminal
escape sequences via the model field.

Skip the daily-cache prune when newestDate fails to parse. The previous
code produced a NaN cutoff and silently dropped every cached day on the
next merge.

Adds tests locking down the stable resolution of common model names
(gpt-5-mini vs gpt-5, claude-haiku-4-5 vs claude-3-5-haiku, etc.) and
the prune NaN guard.
2026-05-06 19:50:40 -07:00
Resham Joshi
f7f64a01ab
Add new providers, fix menubar tabs, accent color picker (#167)
* Add Kiro provider and transparent auto-model naming

- Add Kiro IDE provider: parses .chat JSON files, estimates tokens,
  normalizes dot-versioned model IDs for cost lookup
- Show "Cursor (auto)", "Copilot (auto)", "Kiro (auto)" in CLI
  dashboard instead of pretending to know which model was used
- Route auto model names through BUILTIN_ALIASES for cost estimation

* Fix menubar tabs: add missing providers, show period-scoped costs

- Add Kiro, OMP to ProviderFilter enum so installed providers appear as tabs
- Merge Cursor + Cursor Agent into single Cursor tab
- Tab costs now reflect the selected period (7d/30d/month/all) instead
  of always showing today
- Tab visibility still uses today's provider list so tabs don't
  disappear when switching to periods with no data

* Add accent color picker to menubar with Apple system presets

- 9 presets using Apple's exact macOS dark-mode accent colors
  (Ember, Blue, Purple, Pink, Red, Orange, Yellow, Green, Graphite)
- Color picker in header, persisted via UserDefaults
- "Burn" text stays fixed ember regardless of accent
- ThemeState is MainActor-isolated for thread safety
- Picker state lifted to AppStore so it survives .id() tree rebuild
- Accessibility labels on all color swatches
- Renamed brandAccentDark/brandEmberDeep/brandEmberGlow to match
  their actual light/deep/glow semantics

* Fix review findings: case-sensitive cost lookup, Kiro timestamp guard, cache versioning

- Normalize provider dictionary keys to lowercase in tab cost lookup
  so "Cursor Agent" (title-case from CLI) matches providerKeys
- Guard against missing/invalid/epoch startTime in Kiro parser to
  prevent RangeError crash or 1970-01-01 ghost entries
- Bump DAILY_CACHE_VERSION to 4 so upgraded users get a clean
  recompute with the new auto-model naming (cursor-auto vs default)
- Add version field to cursor-results.json cache to invalidate stale
  entries that still use the old 'default' model name
2026-04-27 19:46:30 -07:00
AgentSeal
2afab5f71a fix: final review cleanup
- Remove unused vi import from cursor test
- Move LANG_DISPLAY_NAMES to module scope (was re-created per render)
- Remove redundant 'script' regex (scrip?t already covers it)
- Unexport getDbFingerprint (internal to cache module)
- Move beforeEach inside describe block (only cursor tests need it)
2026-04-15 05:35:11 -07:00
AgentSeal
3fabc105d8 perf: file-based result cache for Cursor DB
First run parses the 21GB DB (slow, ~40-80s). Writes parsed
results to ~/.cache/codeburn/cursor-results.json. Subsequent
runs check DB mtime+size -- if unchanged, load from cache
(instant). Cache auto-invalidates when Cursor modifies the DB.
2026-04-15 05:11:30 -07:00
AgentSeal
c7195c4a17 fix: handle ISO string timestamps from Cursor database
Cursor stores createdAt as ISO strings ('2026-02-23T06:00:51.123Z'),
not numeric timestamps. Was comparing string against number, so all
rows were filtered out. Now uses ISO string comparison throughout.
2026-04-15 04:17:56 -07:00
AgentSeal
70931b7269 feat: add Cursor IDE provider with SQLite adapter
Reads token usage from Cursor's local state.vscdb database.
Supports per-request input/output tokens, model tracking,
and incremental caching for large databases.

- better-sqlite3 as optionalDependency (lazy-loaded, no impact on Claude/Codex)
- Parameterized SQL queries, read-only mode, per-row error handling
- Schema detection with clear error on format changes
- Cache layer with timestamp watermark for incremental reads
- Provider colors and [p] key cycling in dashboard
- 39 tests passing, zero regressions
2026-04-15 03:44:43 -07:00