shortProject() decoded Claude Code slugs by splitting on '-', which
broke directory names containing dashes ('foo-bar' became 'foo/bar').
Switch the dashboard to consume ProjectSummary.projectPath (the
canonical cwd already extracted by parser.ts) and rewrite shortProject
to operate on a real absolute path.
The Project Structure tree was duplicating information that
docs/architecture.md already covers in better detail (and updates
faster). Removing it from the README keeps the marketing-facing
README scoped to "what is this and how do I install it" and
points contributors at the proper map.
In its place, a short Sponsor section pointing at
https://github.com/sponsors/iamtoruk so users who find the tool
useful know where to support development.
Cursor emits model names in a `claude-<dot-version>-<tier>` shape
(`claude-4.6-sonnet`, `claude-4.5-opus`, `claude-4.5-opus-high-thinking`,
etc.) plus its own `composer-1` house model. None of these match
the canonical LiteLLM pricing keys (`claude-sonnet-4-6`,
`claude-opus-4-5`).
The alias map in `src/models.ts` filled some of these in v0.9.4
but missed:
- plain no-suffix forms: `claude-4.5-opus`, `claude-4.5-sonnet`,
`claude-4.6-opus`
- haiku tier: `claude-4.5-haiku`, `claude-4.6-haiku`
- forward-looking: `claude-4.7-opus`
- Cursor's house model: `composer-1`
The dashboard rendered $0 for sessions that used any unaliased
model — visible in the screenshots posted in #159 even after the
v0.9.4 fix that added the `-thinking` variants.
This PR fills the gaps and adds 16 regression tests under
`Cursor model variants resolve to pricing` that assert every
model name in `src/providers/cursor.ts:modelDisplayNames` plus
the additional plain forms resolves to a non-null pricing entry
with `inputCostPerToken > 0` and `outputCostPerToken > 0`. So a
future LiteLLM snapshot bump or a typo in the alias map will fail
the test before users see $0.
Direct hits in the snapshot (no alias needed): `gpt-5`, `gpt-5.2`,
`grok-code-fast-1`, `gemini-3-pro` (already aliased). These are
covered in the test suite as well so a snapshot that drops them
would also be caught.
Tests: 45 files, 617 passing locally (16 new). Closes#159.
Messages like "add error handling", "create an issue tracker", or
"implement the 404 page" were landing in the Debugging bucket
because the classifier checked DEBUG_KEYWORDS (which matches
`error`, `issue`, `404`) before FEATURE_KEYWORDS in both
`refineByKeywords` (tool-bearing turns) and `classifyConversation`
(chat-only turns). The position of the matched word in the
sentence is a much stronger intent signal than the order of the
checks in code, so we now pick whichever pattern matches earliest.
The new helper `firstMatchingCategory` runs each candidate regex
once with `RegExp.exec` and keeps the match with the lowest
`index`. Ties (rare in practice — same start position) break by
the order the candidates were listed, which is `refactoring >
feature > debugging` for coding turns. That ordering preserves
existing behavior for plain bug reports (e.g. "login is broken,
traceback below") while flipping mislabeled feature work to its
correct category.
8 regression tests in `tests/classifier.test.ts` cover the
mislabel cases from #196 plus tie-break / chat-only cases. Full
suite: 45 files / 609 tests, all green.
Closes the activity-misattribution half of #196. The Cursor
provider attribution half (single 'cursor' project for all
sessions) is addressed in a separate PR.
Adds an OS-delimited list env var so a user with more than one
Claude account or profile can scan all of them in a single run.
Sessions across every configured dir merge into one ProjectSummary
per project, matching the option-1 design agreed on the issue
thread (no per-account splitting in the data model or the UI).
Format: `CLAUDE_CONFIG_DIRS=~/.claude-work:~/.claude-personal`
on POSIX, `;`-separated on Windows. Precedence is
CLAUDE_CONFIG_DIRS > CLAUDE_CONFIG_DIR > ~/.claude. Empty entries
in the list are skipped, duplicates are deduped on resolved path,
and a missing or unreadable dir does not abort the scan of the
others. If the user explicitly set CLAUDE_CONFIG_DIRS but every
listed entry is unreadable, a one-line stderr hint identifies the
attempted paths and the platform's expected delimiter, so a
Windows user typing the POSIX `:` does not get a silent zero-row
result. `~` is now also expanded in CLAUDE_CONFIG_DIR for
consistency.
Implementation is intentionally narrow: only `claude.ts` changes,
plus a small parser-cache key update so a stale cache from one
config does not bleed into a run with a different config (matters
for the macOS menubar and GNOME extension which run as long-lived
processes). The merge happens for free in
`src/parser.ts:scanProjectDirs`, which keys ProjectSummary entries
by canonical cwd (or the sanitized slug as a fallback). Two
SessionSource entries with the same `project` field land under the
same key and combine their sessions, regardless of which dir they
came from. No new fields on SessionSource / SessionSummary /
ProjectSummary, and no UI changes.
Tests: 12 fixture-based cases covering the unset path (default
~/.claude), single-dir override via CLAUDE_CONFIG_DIR, multi-dir
override via CLAUDE_CONFIG_DIRS, ~ expansion, dedup of repeated
entries, leading/trailing/doubled delimiters, missing dir
tolerated, file-not-directory entry tolerated, empty
CLAUDE_CONFIG_DIRS falls back to single-dir env, and two
parser-level integration tests asserting (a) two sessions from
two dirs sharing one cwd produce one ProjectSummary with combined
totals and no `account`/`accountPath` fields anywhere, and (b)
two sessions sharing a slug but with different canonical cwds
still merge by slug at the project-rollup layer (option 1
behavior pinned so a future refactor cannot quietly swap to
cwd-aware merging without an explicit opt-in).
Supersedes the alternative implementation in #227, which builds
per-account attribution (option 2) instead.
* Expose per-day one-shot data in daily JSON output
Closes#279.
Adds turns, editTurns, oneShotTurns, oneShotRate to each entry of the
`daily[]` array in `codeburn report --format json` output. The data was
already computed internally for activity-level rollups; this just buckets
it by date so consumers building daily-resolution efficiency dashboards
(streak tracking, heatmaps, rolling-window charts) don't have to re-derive
the rate from period-level activities.
Counting matches parser.ts categoryBreakdown semantics:
- every turn counts toward `turns`
- turns with hasEdits=true count toward `editTurns`
- edit turns with retries=0 count toward `oneShotTurns`
- oneShotRate is null (not 0) when editTurns=0 — a chat-only day's rate
is undefined, and reading it as 0% would be misleading
Real consumer named in the issue: a 10-developer internal usage tracker
that scores days by cache hit + cost/call + (now) one-shot rate.
* Strengthen daily/activities reconciliation + CHANGELOG entry
- Fall back to turn.assistantCalls[0]?.timestamp when turn.timestamp is
missing so daily aggregate doesn't drop turns that activities[] keeps.
Previously sum(daily[].editTurns) could be < sum(activities[].editTurns)
for sessions starting with assistant entries before any user line.
- Add Unreleased CHANGELOG entry for the daily one-shot fields.
Closes#278.
Adds Charmbracelet Crush as a lazy-loaded provider:
- src/providers/crush.ts: walks ~/.local/share/crush/projects.json
(XDG_DATA_HOME and CRUSH_GLOBAL_DATA aware), opens each project's
crush.db read-only, queries root sessions where parent_session_id
IS NULL. Emits one ParsedProviderCall per session with real
prompt_tokens, completion_tokens, cost (dollars), and the
dominant model resolved from messages.model.
- src/providers/index.ts: register crush alongside cursor, goose,
opencode, antigravity, cursor-agent in the lazy import path.
- tests/providers/crush.test.ts: 10 fixture-based tests covering
discovery, parsing, missing-registry, malformed JSON, missing db,
child session exclusion, dominant model selection, dedup, and
array-shaped legacy registry.
Schema source: charmbracelet/crush@v0.66.1
internal/db/migrations/20250424200609_initial.sql, verified by
spawning a research agent against upstream. The schema *comments*
in that migration claim millisecond timestamps but every actual
INSERT/UPDATE uses strftime('%s', 'now') which returns Unix
seconds; the parser treats values as seconds. Tokscale's
parser (junhoyeo/tokscale#346) gets this wrong and is off by
1000x, plus its parser misses the prompt_tokens/completion_tokens
columns that exist in Crush's schema. Our integration uses both,
so Crush sessions get real per-model attribution.
Menubar:
- mac/Sources/CodeBurnMenubar/AppStore.swift: add .crush case to
ProviderFilter and its cliArg switch.
- mac/Sources/CodeBurnMenubar/Views/AgentTabStrip.swift: add
Crush color to the per-tab color extension. The visibleFilters
computed property already filters by detected providers, so the
Crush tab appears automatically when a user has Crush data.
README:
- Replace the provider table with an icon-led layout. Icons live
under assets/providers/<name>.<ext>. 14 icons sourced from
junhoyeo/tokscale (MIT) under nominative fair use, 4 sourced
separately: codex (OpenAI org avatar), cursor-agent (reuses the
Cursor icon), kiro (kiro.dev favicon, ico->png via sips), omp
(can1357/oh-my-pi icon.svg, MIT). Attribution line added.
- Add Crush row.
Docs:
- docs/providers/crush.md: full per-provider doc with verified
schema excerpt, the seconds-vs-milliseconds quirk, and a
"when fixing a bug here" checklist.
- docs/architecture.md: provider count 17 -> 18, test count
41 -> 42, and crush in the lazy list.
- docs/providers/README.md: add Crush row to the lazy index.
- CONTRIBUTING.md: bump test count to 568 (was 558).
All 568 tests pass locally; swift build clean.
The provider table now points each row at its docs/providers/<name>.md
file (added in #284) instead of repeating a one-line path that the
per-provider doc covers in full. The Data Location column is dropped;
the new Doc column links to the markdown that owns the path, storage
format, dedup key, and known quirks.
The trailing sentence is updated to reference the per-provider docs as
the source of truth for data locations.
A single dense table of every (provider, model) you have used in the
selected period, sorted by cost. Inspired by tokscale's per-model
output and ccusage's responsive cli-table3 layout, ported to plain
Node with no new runtime dependency.
Default view: one row per (provider, model) with a Top Task cell
showing the dominant task category and its cost share, e.g.
`Coding (42%)`.
`--by-task` explodes each model into one row per task type, with
provider/model cells blanked on subsequent rows of the same group
and a horizontal divider between groups so the sections read as
distinct units.
Output formats: table (Unicode box-drawn, default), markdown
(GitHub-flavored, copy-paste friendly), json, csv.
Filters: --period (today/week/30days/month/all, default 30days),
--from/--to, --provider, --task, --top, --min-cost, --no-totals.
The table renderer auto-sizes every column to its content (no fixed
widths leaving trailing whitespace) and drops cache columns as a
pair when the terminal is narrow, then input/output, then top-task,
in that order. Provider, model, total, and cost stay regardless.
Visible-width math uses strip-ansi (already a dependency) so styled
cells pad correctly. Cyan headers, yellow totals, dim provider name.
The aggregator walks every parsed turn and attributes each
assistant call to its (provider, model, task) bucket, computing
real input / output / cache_write / cache_read tokens and cost.
Output tokens include reasoning. Cached input tokens are folded
into cache_read so the column matches what users intuitively expect.
19 fixture-based tests cover aggregation correctness, byTask
grouping, taskFilter, topN/minCost filters, reasoning-as-output,
all four renderers (table/markdown/json/csv), narrow-terminal
column dropping, CSV/markdown escaping, totals row toggle, and
visible-width math under styled cells.
Document the contributor onboarding path:
- CONTRIBUTING.md: setup, npm scripts, coding conventions, PR process,
the block-claude-coauthor enforcement, and the five providers without
test coverage today (claude, gemini, goose, qwen, antigravity).
- docs/architecture.md: 12-command CLI surface, parser pipeline, three
cache layers, 14 optimize detectors, and the mac / gnome / build
layouts with cited line numbers.
- docs/providers/: one file per provider (17 providers plus the shared
vscode-cline-parser helper). Each covers data path, storage format,
caching, dedup key, quirks, and a "when fixing a bug here" checklist.
Also fix two pre-existing documentation issues surfaced while writing
the new docs:
- RELEASING.md claimed GitHub Actions auto-publishes the CLI when a
v* tag is pushed. There is no such workflow; CLI publishing is
manual via npm publish. Updated the CLI section to reflect reality
and kept the menubar (mac-v* tag) automation accurate.
- .gitignore had CLAUDE.md unanchored, which on case-insensitive
filesystems also matched docs/providers/claude.md. Anchored to
/CLAUDE.md so the root-level memory file stays ignored without
affecting subdirectory docs.
All cited file paths, line numbers, function names, and test counts
were verified against current code (41 test files, 558 tests passing).
Replace blocking availableData drain with non-blocking POSIX read
that respects Task cancellation. Handle EINTR from child SIGCHLD,
close pipe fds after drain to prevent deadlock on oversized output,
and escalate SIGTERM to SIGKILL after 0.5s grace period.
Add 60-second loading watchdog as safety net that auto-clears stuck
state on each refresh loop tick.
Fixes#282
Closes#277.
Every paste-style fix now declares an explicit `destination` so users can
tell at a glance whether a suggestion belongs in CLAUDE.md as a permanent
rule, in a one-time session opener, in the current chat as an ask, or in
a shell config file. Previously the prompts had no labeled home and users
were dropping one-time session openers into CLAUDE.md as permanent rules.
Type changes:
- New `PasteDestination` union: `claude-md` / `session-opener` / `prompt`
/ `shell-config`
- `WasteAction.paste` gains `destination?: PasteDestination`
Renderer changes:
- CLI `optimize` command (renderOptimize → renderFinding) prints a
section header above each fix block:
-- Suggested CLAUDE.md addition (permanent rule) ───
-- One-time session opener (do NOT add to CLAUDE.md) ───
-- Ask Claude in the current session ───
-- Add to your shell config ───
-- Run this command ───
- Interactive dashboard (FindingAction in dashboard.tsx) gets the same
treatment so the in-popover findings list reads identically.
Existing fixes retagged appropriately. Two existing prompts that lacked
destination context altogether ("Set a delivery checkpoint at the start
of the next expensive thread", "Start the next expensive thread with a
fresh-context constraint") now read as one-time session openers with a
clear "do not add to CLAUDE.md" hint — the exact failure mode the
reporter described.
Tests:
- Existing `detectJunkReads` test extended to assert the destination tag.
- New regression block walks every detector that emits a paste-style fix
and asserts each one declares a destination — future detectors that
ship without one get caught here.
* Quiet routine pricing warnings + menubar recovery from stuck-loading
CLI:
- Default `codeburn` invocation no longer prints "no pricing data for model"
warnings on every run. Greeting a fresh user with three lines of stderr
before the dashboard even draws looked like the tool was broken on first
launch. The warning now requires --verbose, and the suppressed pricing
miss still results in $0 cost (correct for unmapped models).
- Local-model heuristic skips the warning entirely for Ollama tags
(`qwen3.6:35b-a3b-bf16`), GGUF/quantized fingerprints, and similar names
that will never have public pricing. The "update codeburn" hint was
actively misleading there.
- When the warning does fire (with --verbose), it points users at
`codeburn model-alias <model> <known-model>` as the actual escape hatch
alongside the package update suggestion.
Menubar:
- Replace perpetual "Loading…" spinner with a FetchErrorOverlay when the
per-key fetch fails and the cache is empty. User sees the error and a
Retry button instead of an infinite hang.
- Add diagnostic breadcrumbs (NSLog, invisible to normal users — Console.app
/ `log stream --process CodeBurnMenubar` only) for the four states that
produce a stuck loading overlay:
- subprocess timeout after 45s
- fetch result dropped due to Task cancellation (rapid tab switch)
- fetch result dropped due to mid-fetch calendar rollover
- retry attempt where the last successful fetch is >2 min stale
- Track lastSuccessByKey separately from cache freshness so the staleness
diagnostic survives day-rollover cache wipes.
* Stop flashing the compare-view loading screen on background refresh
When the 30s CLI tick updated `projects` while the user was reading the
model comparison results, the projects-watching effect always fired
setLoadTrigger, which flipped phase to 'loading' and re-ran the slow
scanSelfCorrections walk over every provider's session directory. The
user lost their scroll position and saw a loading flash mid-read.
Recompute the comparison rows in place when:
- the user is already on the results phase, AND
- both picked models still exist in the new aggregate.
Skip the corrections rescan on these in-place refreshes — corrections
drift slowly enough that holding the previous value until the user
re-enters compare is acceptable, and the rescan is the slow part of the
load. Initial selection and post-selection load still run the full
pipeline.
killRunningApp sent SIGTERM but returned immediately. The subsequent
open call raced against the dying process, failing with error -600
on --force reinstalls. Poll up to 5 seconds for exit.
Sleep: track forceRefreshTask and cancel it on willSleep alongside
refreshLoopTask. Reset loadingCount on wake so orphaned fetches from
before sleep cannot leave the loading bar stuck.
Tabs: replace Button wrapper with onTapGesture on AgentTab so the
NSPopover hover tooltip does not eat the first click. Add
clickDismissed guard to prevent the popover from re-showing while
the mouse is still over the tab after a tap.
Tab size: only render the quota bar slot when quota data exists
(connected), not for every provider that supports quota. Disconnected
Claude/Codex tabs are now the same height as other tabs.
Drop .userInitiated from the process activity so macOS can enter
deep sleep while the menubar is running. The wake observer already
re-syncs on resume.
Run the all-provider and per-provider refreshes in parallel in the
loop tick so tab strip costs and the detail view update together
instead of the detail lagging behind by one CLI call.
Reported in #264 as a V8 CHECK abort with `Check failed: (location_) != nullptr`
inside `node::sqlite::StatementSync::ColumnToValue`. The crash happens when
SQLite returns a TEXT column whose bytes V8's String::NewFromUtf8 rejects
(invalid UTF-8 — common for Cursor's stored chat text where multi-byte chars
are truncated at streaming boundaries). Node 22.x prior to 22.20 does not
check the resulting MaybeLocal<String> for empty before dereferencing,
aborting the whole process with a trace trap.
A try/catch in JS can't recover — the abort runs in the C++ extension before
the V8 exception handler. So we refuse to load node:sqlite at all when we
detect a buggy Node version, surface a clear "upgrade Node" diagnostic, and
let the rest of the CLI run with the file-based providers (Claude, Codex,
Copilot, Gemini, etc.) instead of taking the whole tool down.
- engines.node bumped to >=22.20 so npm warns at install time
- src/sqlite.ts: checkBuggyNodeVersion() detects Node 22.x < 22.20 and routes
through the existing isSqliteAvailable() / loadError diagnostic path
getDateRange() silently fell back to week on unknown periods, and no command
validated --format. A typo like --period mounth or --format yaml produced
wrong output with exit 0. Now all 6 format-accepting commands and all
period-accepting paths reject unknown values with a clear message and exit 1.
Also fixes the status description (said today+week+month, only shows
today+month).
Two passes of validators across CLI accuracy, dashboard UX, menubar Swift,
performance, security, and end-to-end smoke tests on real session data.
Data-correctness fixes:
- parseLocalDate rejects month/day overflow. JS Date silently rolled
Feb 31 to Mar 3, so --from 2026-02-31 --to 2026-03-15 quietly dropped
sessions on Feb 28 - Mar 2. Now throws "Invalid date" with a clear
reason. Leap-day case covered (2024-02-29 valid, 2025-02-29 rejected).
- CSV/JSON exports use the active currency's natural decimal places. The
previous round2 helper produced ¥412.37 in CSV while the dashboard
rendered ¥412 — finance teams comparing the two surfaces saw a
discrepancy. New roundForActiveCurrency consults Intl.NumberFormat for
the right precision (0 for JPY/KRW/CLP, 2 for USD/EUR, etc).
- Copilot toolRequests is Array.isArray-guarded in both modern and legacy
event branches. Previously a corrupt session with toolRequests=null or
a string aborted the whole file's parse loop and silently dropped every
legitimate call after it.
- Codex token_count dedup uses a null sentinel for prevCumulativeTotal so
the first event is never confused with a duplicate. Sessions that emit
only last_token_usage (no total_token_usage) report cumulativeTotal=0
on every event; with the previous 0-initialized prev, the first event
matched the dedup guard and was dropped.
- LiteLLM pricing values are clamped to [0, 1] per token via safePerTokenRate.
Defense in depth against a tampered upstream JSON shipping negative or
absurdly large per-token costs that would otherwise propagate into all
cost totals.
Performance:
- Cursor SQLite parse no longer pegs at minutes on multi-GB DBs. Two
changes: per-conversation user-message buffer uses an index pointer
instead of Array.shift() (which was O(n) per call); and a real ROWID
cutoff via subquery limits the scan to the most recent 250k bubbles
with a stderr warning so power users get a partial report rather than
a stalled CLI.
- Spawned codeburn CLI subprocesses are terminated when the calling Task
is cancelled. Without this, rapid period/provider tab clicks in the
menubar cancelled the Task but left the subprocess running to
completion, piling up zombie processes.
UX:
- Dashboard period switch flips to loading and clears projects
synchronously before reloadData runs, eliminating the frame where the
new period label rendered over the old period's projects.
- Optimize findings tab paginates 3-at-a-time with j/k scroll. With 4
new detectors plus 7 originals, 8-10 findings * 6 lines was scrolling
the StatusBar off the alt buffer top.
- Custom --from/--to ranges hide the period tab strip and disable the
1-5 / arrow keys so a stray period press no longer abandons the user's
explicit range. A "Custom range: X to Y" banner replaces the tab strip.
- OpenCode storage-format warning is per-table-set, rate-limited to once
per process, and points the user at OpenCode's migration step or the
issue tracker. The previous all-or-nothing check fired the generic
"format not recognized" string for any schema mismatch.
Menubar / OAuth:
- Both Claude and Codex bootstrap (Reconnect button) now honour the
usageBlockedUntil 429 backoff that refreshIfBootstrapped respects.
Spamming Reconnect during sustained rate-limit windows previously
hammered the upstream endpoint on every click.
- Codex Retry-After HTTP header is parsed (delta-seconds plus IMF-fixdate
fallback) so we don't over-back-off when ChatGPT tells us a shorter
window than our 5-minute floor.
- Both credential cache files are written via SafeFile.write
(O_CREAT | O_EXCL | O_NOFOLLOW with explicit 0600) so there is no race
window where the temp file briefly exists at default umask, and a
symlink at the destination cannot redirect the write. Reads now route
through SafeFile.read with a 64 KiB cap, closing the symlink-follow gap
on Data(contentsOf:).
CI signal:
- TypeScript strict typecheck (tsc --noEmit) is now zero errors. The
six errors in src/providers/copilot.ts came from a discriminated-union
catch-all branch whose `data: Record<string, unknown>` shape TS picked
over the specific event branches when narrowing on `type`. Removed the
catch-all; runtime falls through unknown event types via the existing
if/else chain.
Tests added: 16 new (now 555 total)
- date-range-filter: month/day/year overflow rejection, leap-day correctness
- currency-rounding: convertCost no-rounding contract, roundForActiveCurrency
for USD/JPY/KRW/EUR
- providers/copilot: malformed toolRequests does not abort the parse
- providers/cursor-bubble-dedup: re-parse after token mutation does not
double-count, single parse yields one call per bubble
- providers/codex: first event with cumulativeTotal=0 not dropped,
consecutive zero-cumulative duplicates still deduped
* Gate Claude OAuth refresh attempts on terminal failures
Anthropic returns invalid_grant (HTTP 400) when the user's refresh token has
been revoked or rotated, typically after they re-ran claude login on another
device. The previous code rethrew the raw error every refresh cycle, leaving
the Plan UI stuck on a Swift error string and pummeling Anthropic's token
endpoint forever.
The new SubscriptionRefreshGate captures a fingerprint of
~/.claude/.credentials.json on terminal failure and stops trying until that
fingerprint changes (the user re-logs-in). Transient 5xx/network failures
get exponential backoff capped at 6 hours.
Two new SubscriptionError cases let the UI distinguish "user must reconnect"
from "Anthropic is flaky right now" and show a clean reconnect CTA instead
of raw HTTP guts.
* Inline live-quota progress bar inside each AgentTab chip
When a provider exposes a live quota source, the AgentTab chip grows by ~3pt
to host a thin weekly-utilization bar directly under the label. Hovering the
chip reveals a popover with all four Anthropic windows (5-hour, weekly, weekly
Opus, weekly Sonnet) plus reset countdowns. Click still switches the tab as
before.
Today only Claude has a quota source (the existing /api/oauth/usage path);
other providers' chips render unchanged. The QuotaSummary abstraction lets
us bolt on Cursor/Copilot/Codex meters in follow-up commits.
Subscription is now refreshed eagerly on the periodic loop so the bar lights
up without forcing the user to open a deep view first. The previous
SubscriptionRefreshGate keeps a dead refresh token from spamming Anthropic.
Adds two new SubscriptionLoadState cases (terminalFailure, transientFailure)
so the deep Plan view shows a "reconnect" message instead of a raw Swift
error string when the user's claude login expired.
* Replace SubscriptionClient with credential-store + service architecture
The previous SubscriptionClient never persisted refreshed access tokens, so
every 30s tick read the expired token from Keychain, refreshed it (1 call),
fetched usage with the new token (2nd call), and threw the new token away —
3 API calls per cycle, which burned through Anthropic's per-account rate
budget and produced the 429s and `invalid_grant` loops users were seeing.
The replacement mirrors CodexBar's proven pattern:
- ClaudeCredentialStore owns the credential lifecycle. Bootstrap is strictly
user-initiated (Connect button in the Plan tab); the menubar does not touch
Claude's keychain at startup. After bootstrap, refreshed tokens — including
rotated refresh tokens — are persisted to a local cache file under
~/Library/Application Support/CodeBurn (mode 0600). Using a file instead of
our own keychain item means rebuild signature changes don't trigger a
startup keychain prompt; the only prompt the user ever sees is the one for
Claude Code-credentials on Connect.
- ClaudeUsageFetcher (folded into the service) is a pure /api/oauth/usage
call with one allowed 401-recovery roundtrip. 429s record an explicit
backoff window honouring Retry-After.
- ClaudeSubscriptionService orchestrates bootstrap / refresh / disconnect,
applies the 429 backoff, and surfaces terminal vs transient failures so
the UI can show the right CTA.
- Reading Claude's keychain now tries the entry keyed by NSUserName() first
and falls back to the unscoped query, so users who re-ran /login and ended
up with two Claude Code-credentials items pick up the fresh one. This was
the actual cause of "I logged in but the menubar still shows stale data".
User-facing additions:
- A proper Settings window (right-click → Settings…) with General / Claude /
About tabs. Provider quota cadence is configurable (Manual / 1m / 2m / 5m /
15m). New providers plug in as additional tabs.
- Plan tab: notBootstrapped → "Connect Claude subscription" CTA;
terminalFailure → "Reconnect Claude" with the correct /login instruction
for Claude Code 2.1; transientFailure preserves the last loaded view with
a retrying badge.
- AgentTab quota bar slot is always reserved so chip height doesn't jitter
when the user connects for the first time. Hover popover has 250ms enter
/ 150ms exit debounce so swiping across chips doesn't pop a popover for
every chip touched.
- Disconnect requires confirmation, clears capacityEstimates and the
subscription snapshot store so a reconnect under a different account
doesn't surface "Based on last cycle" projections from the old account.
Validator findings applied: cadence anchor only updates on successful
refresh (not every attempt), refresh-token rotation persists in memory
before keychain write so a write failure doesn't lock the user out, server
error bodies are sanitized (token redaction + 240-char cap) before they
reach the UI or NSLog, and Refresh Now refreshes both the menubar payload
and quota.
* Add Codex live quota + multi-provider warning, with validator fixes
CodexCredentialStore reads ~/.codex/auth.json (ChatGPT-mode only) on
user-initiated Connect, caches under Application Support like Claude.
CodexSubscriptionService hits chatgpt.com/backend-api/wham/usage with
the bearer token + ChatGPT-Account-Id header, parses primary/secondary
windows, additional per-model rate limits (e.g. GPT-5.3-Codex-Spark),
and credits balance with a Double-or-String fallback.
Plan-tier enum captures the full ChatGPT plan list including prolite,
free_workspace, education, quorum, k12, plus an unknown(String) case
that preserves the raw plan name when OpenAI ships a tier we haven't
mapped yet.
Multi-provider warning system:
- Menubar flame tints from neutral to yellow (70%) → orange (90%) →
red (100%) based on the worst-affected connected provider's worst
window. Uses NSImage.SymbolConfiguration palette colors.
- Popover header gains a warning row when any provider is at 70%+.
"Claude 79% of quota used", "Claude 79% · Codex 92%", or
"Claude over limit (105%)" when severity hits .danger.
- Hover popover gains a plan-name badge in the top-right corner so
users know which subscription is feeding the bar.
- Codex chip surfaces the credits balance and any non-zero per-model
additional rate limits as footer rows.
Validator fixes applied in the same commit:
- Provider-specific reconnect / disconnected copy in QuotaDetailPopover
(was hardcoded to Claude).
- Generation-token guard on refreshSubscriptionReportingSuccess and
refreshCodexReportingSuccess so a Disconnect during an in-flight
fetch can't resume after the await and re-populate the cleared state.
- Codex codexQuotaSummary promotes secondary to primary when only one
window is returned, so free / guest tiers don't render an empty bar.
- Memory-cache TTL is now actually consulted in currentRecord (the
isFresh check was dead code, leaving cached records valid forever).
- sanitizeForUI now redacts OpenAI sk-* keys, JWT tokens, and Bearer
headers in addition to Claude sk-ant-*.
- Removed diagnostic NSLog that wrote raw chatgpt.com response bodies
to the unified log.
- Codex Connect / Reconnect copy in Settings explains the auth.json
prerequisite and the API-key vs ChatGPT-mode distinction.
- Disconnect dialogs now state explicitly that the auth.json /
credentials keychain entry is left untouched.
- Plan badge in the popover gets line-limit + truncation + max-width
so a long unknown plan name can't overflow the row.
- Renamed shadowing `let max` to `let worst` in aggregateQuotaStatus.
* Add Codex Plan tab + size plan badge to content
The Plan tab is now visible when the Codex chip is selected, mirroring
the Claude tab's deep view. CodexPlanInsight renders the user's plan
tier ("Pro Lite", "Plus", etc.), the primary and secondary rate-limit
windows with reset countdowns, and any non-zero per-model additional
limits (e.g. GPT-5.3-Codex-Spark) so power users see them.
The "On pace at reset" projection that Claude's Plan view shows is not
included here — that math feeds from local Claude per-message spend
extrapolated against API quota windows, and our local Codex spend is
not a 1:1 signal for the ChatGPT-subscription rate windows reported by
wham/usage. Wiring a Codex extrapolator is a follow-up.
Drop the maxWidth=90 frame on the plan badge in the hover popover. It
was stretching short labels like "Pro Lite" to fill the full 90pt slot;
fixedSize makes the badge hug the text. Plan names are bounded short
strings, so truncation is a non-issue in practice.
* Five correctness fixes from multi-agent bug hunt
A multi-agent audit of the codeburn correctness surface found five
real bugs each producing visibly wrong numbers or risking data loss.
All five fixes were validated by parallel review agents and exercised
end-to-end against real session data on this machine.
- src/cli.ts: --refresh <seconds> was using bare parseInt as the
commander callback. Commander invokes the callback as
parseInt(value, previous), so previous becomes the radix:
--refresh 30 was being parsed as parseInt('30', 30) = 90, and
--refresh 60 became NaN. Replaced with parseInteger (already
defined at line 48 with radix locked to 10) at all three sites.
- src/providers/cursor.ts: parseAgentKv was timestamping every
agentKv call as new Date().toISOString() because the Cursor
SQLite schema has no per-message timestamp. Result: every
Cursor agent call regardless of when it happened landed in
today's date bucket. Now uses statSync(dbPath).mtimeMs as a
bounded ceiling so calls land at the actual last-write time of
the Cursor database, not today. Verified locally: a 1904-call
Cursor history with March 22 mtime now correctly bucket into
all-time only and shows 0 calls for today/week/30days.
- src/providers/codex.ts: prev token counters were only updated
inside the cumulative-fallback branch, so a session emitting N
events with last_token_usage followed by one cumulative-only
event computed the next delta against prev=0 and double-counted
the entire cumulative window. Cost could be inflated 10-100x
for any mixed-format Codex session. Now prev advances to the
current cumulative state regardless of which branch ran.
- src/providers/gemini.ts: totalOutput accumulated output+thoughts
while totalThoughts was tracked separately. The result was
outputTokens = output+thoughts AND reasoningTokens = thoughts;
any consumer summing the two double-counted thoughts. Now
totalOutput holds just output, reasoningTokens holds thoughts,
and the cost calc folds thoughts into the output count to keep
pricing correct (Google bills thoughts at the output rate;
calculateCost has no reasoning parameter).
- src/export.ts: exportJson had no safety check before writeFile,
so codeburn export -f json -o ~/important.json would silently
clobber the user's file. CSV path had a marker-file guard; JSON
did not. Now refuses to overwrite a file unless its first 4KB
contain the codeburn schema marker. Uses a streaming partial
read so a large existing file does not OOM Node's ~512MB
string limit. Refuses directories outright.
Skipped intentionally: cursor-auto/copilot-auto/cline-auto/
qwen-auto are aliased to claude-sonnet-4-5. The audit flagged
this as wrong pricing for non-Anthropic auto-routed turns, but
Cursor's "auto" mode does not expose the actual model and any
alternative estimate is equally arbitrary. README already
documents this as a Sonnet-based estimate.
vitest run: 38 files, 529 tests pass.
* Five more correctness fixes from the bug-hunt round
This commit closes out the remaining critical-tier findings from the
multi-agent audit, with one item documented as a known limitation.
- src/providers/cursor.ts: bubble dedup key included mutable
inputTokens/outputTokens. Cursor mutates token counts on the row in
place when streaming completes, so re-parsing the same DB produced
a fresh dedup key per bubble and silently double-counted. Switched
to the SQLite row key (`bubbleId:<unique>`) which is stable per
bubble. Adjusted BubbleRow type and BUBBLE_QUERY_BASE to expose
`key as bubble_key`.
- src/providers/pi.ts: usage fields were destructured non-optionally,
but real Pi/OMP session files sometimes omit individual fields.
`calculateCost(model, undefined, ...)` returned NaN, and that NaN
propagated into every aggregate cost total. Coerce each field to
0 with `?? 0`.
- src/models.ts: getShortModelName and the getModelCosts startsWith
fallback both walked the dictionary in insertion order. A model id
like `gpt-5-mini` could resolve to the entry for `gpt-5` (matched
by startsWith first) and silently get GPT-5's display name and
pricing tier. Iterate longest keys first so more-specific prefixes
win. Tightened the cost fallback's match condition from
`startsWith(key) || startsWith(key + '-')` to require either an
exact match or a `key + '-'` continuation, removing accidental
matches like `gpt-50` against `gpt-5`.
- src/models.ts: calculateCost returned 0 silently for any model
missing from the pricing snapshot. New Anthropic / OpenAI models
shipped between snapshot refreshes look free until the user
notices. Now warns once per unknown model name per process to
stderr. Skips the warning for the `<synthetic>` placeholder so
the noise floor stays low.
- src/yield.ts: revert detection was broken on the canonical case.
Two problems: (1) `subject.toLowerCase().includes('revert')`
matched any commit whose subject mentioned the word ("Add revert
button" was misclassified). (2) The window logic only counted
reverts within the original session's 1-hour boundary, but real
`git revert` commits land in later sessions, so original sessions
always looked productive. Now: getRevertedShas runs once with
`--grep=^This reverts commit` and parses bodies to build a Set of
SHAs that were the target of a revert anywhere in history.
CommitInfo.wasReverted is set when this commit's SHA appears in
that set. categorizeSession then flags a session as reverted when
its in-main commits were later reverted, regardless of when the
revert itself happened.
- src/providers/droid.ts: SKIPPED with comment. Droid records token
usage only at session level. The current behavior splits evenly
across emitted assistant calls and prices all of them at
settings.model (the latest model). For sessions where the user
switched models mid-stream, costs are approximate. Added an
inline comment documenting this; a real fix requires per-message
model data that isn't in the Droid JSONL schema.
Verified end-to-end on this machine:
- vitest run: 38 files, 529 tests pass
- `codeburn report --format json` produces valid JSON
- `codeburn yield -p week` runs without crashing, finds 0 reverts
in the user's recent git history (plausible — fix changed the
detection from "subject contains revert" to "this commit's SHA
appears in a later 'This reverts commit ...' body")
- Stderr now warns for unknown model ids: `openai/gpt-5.3`,
`qwen3.6:35b-a3b-bf16`, `big-pickle`. These previously priced
silently at $0.
* Four high-severity fixes from the bug-hunt round
- src/currency.ts: getExchangeRate wrapped fetchRate and cacheRate in
one try/catch. If fetchRate succeeded but cacheRate threw (disk
full, ENOSPC, no permissions on the cache dir), the catch block
swallowed the error and returned 1. Every cost rendered after that
point became USD-equivalent silently. Now the fetch and the cache
write live in separate paths: a successful fetch returns the rate
even if the persist fails, and the cache-write error is dropped to
a fire-and-forget so transient disk problems do not corrupt the
user's currency display.
- src/cursor-cache.ts: writeFile was non-atomic. Two concurrent
codeburn invocations writing to cursor-results.json could
interleave bytes mid-write, leaving a truncated file that
parsed-error on next read and forced a full SQLite re-scan every
run. Switched to the temp-file + rename pattern with a randomized
temp name so each writer gets its own staging file and the rename
is atomic on POSIX. Crash mid-write also leaves only a leftover
temp file, which gets unlinked in the catch path; the destination
is never half-written.
- mac/.../CodeBurnApp.swift refresh loop on sleep: the loop's
Task.sleep keeps a wakeup pending across system sleep, so on wake
the natural tick fires the same instant the wake observers do.
Combined with didWakeNotification, screensDidWakeNotification, and
the launchd com.codeburn.refresh distributed notification, that
produced 2-3 concurrent CLI spawns within ms of every wake. Now:
willSleepNotification cancels the loop task; didWakeNotification
restarts it. The loop also reads lastRefreshTime and skips its
natural tick if a wake/manual/distributed-notification refresh ran
within the last 5 seconds, coalescing the two sources of refresh
into one CLI spawn per wake event.
- mac/.../CodeBurnApp.swift observeStore: the read closure had an
implicit strong self capture (it accessed store.* without a
capture annotation), pinning self for the lifetime of any
unfired observation. Added [weak self] and a guard to make the
capture explicit. withObservationTracking is one-shot per call,
so there is at most one active subscription at a time; the
earlier audit's claim of an unbounded leak overstated the issue,
but tightening the capture pattern is still cleaner.
Verified:
- vitest run: 38 files, 529 tests pass
- swift build -c release --arch arm64 --arch x86_64: clean, no
diagnostics, no MainActor warnings
- mac/Scripts/package-app.sh dev produces a valid universal bundle
- Menubar launches and runs without crash
* Eleven medium-severity fixes from the bug-hunt round
- src/format.ts formatTokens: guard against Infinity, NaN, and
negative input. Previously a corrupt aggregate could leak into
the UI as the literal strings "NaN" or "Infinity". Negatives now
render as "0" rather than "-500" with no scaling.
- src/cli-date.ts parseDateRangeFlags: the missing-from default
was new Date(0), which opened a 55-year scan from 1970 epoch
whenever the user passed only --to. Default now anchors at 6
months back from now, matching the dashboard's all-time period.
Test updated to assert the new bounded window.
- src/cli-date.ts toPeriod: previously fell back silently to "week"
for any unknown input, so a typo like `-p mounth` produced a
quiet 7-day report while the user thought they were viewing the
month. Now exits with a clear stderr error and exit code 1.
Test updated to assert the loud-failure behavior.
- src/optimize.ts urgencyScore: rebalanced weights so a high-impact
finding with zero observed tokens cannot outrank a medium-impact
finding with millions of tokens. Old 0.7/0.3 split made high+0
(0.70) beat medium+1B (0.65). New 0.5/0.5 split makes medium+1B
(0.75) beat high+0 (0.50). Token normalization lifted to 5M so
the ramp covers a realistic spend range.
- src/models.ts calculateCost: clamp negative or non-finite token
inputs to 0 before pricing. A corrupt JSONL emitting a negative
count would otherwise produce a negative cost that silently
subtracted from real spend in aggregates.
- src/currency.ts convertCost: stop rounding during aggregation.
For zero-fraction currencies (JPY, KRW, CLP) this clamped every
per-session cost to a whole unit before sum, so a project of
1000 sessions averaging ¥0.4 each aggregated to ¥0 instead of
¥400. formatCost still rounds at the display boundary.
- src/config.ts saveConfig: the temp file path was a fixed
`${configPath}.tmp` suffix. Two simultaneous saveConfig calls
(overlapping menubar and CLI runs) raced on the same staging
file and could leave one writer reading partial bytes from the
other. Randomized the temp suffix per call.
- src/providers/antigravity.ts flushCache: the early return on
`!cacheDirty` short-circuited eviction when liveCascadeIds was
supplied but no cascade had been added or updated this run. As
a result, deleted .pb files persisted in the cache forever once
the user stopped writing to it. Eviction now runs whenever
liveCascadeIds is provided, marks the cache dirty if anything
was removed, and only then short-circuits if there is nothing
to write.
- src/daily-cache.ts addNewDays: cap retention at 2 years. The
days array previously merged forever, growing the cache file by
hundreds of bytes per day until JSON parse on every CLI
invocation became measurable. The 6-month UI period plus the
365-day BACKFILL_DAYS bootstrap both fit comfortably inside the
cap, with headroom for a future longer window.
- src/dashboard.tsx useInput: period number keys (1-5) and arrow
keys triggered a reload while the compare view was mounted. The
parent's data state changed underneath the user with no visual
affordance back to the dashboard. Now those keys are gated on
view !== 'compare', and `b` / Esc inside compare returns to the
dashboard.
- mac/.../HeatmapSection.swift formatters: prettyDate, buildTrend
Bars, computeTrendStats, computeForecast, and computeAllStats
each allocated a fresh DateFormatter (and Calendar) on every
call. SwiftUI re-evaluates these views many times per second
during hover scrubbing on the trend chart, so the allocations
were a measurable hot spot. Lifted the yyyy-MM-dd / "EEE MMM d"
/ "MMM d" formatters and the gregorian Calendar to fileprivate
cached singletons.
Two findings from the same bucket were not addressed here:
- UpdateChecker SHA-256 / codesign verification is already
performed by src/menubar-installer.ts (verifyChecksum at line
85). The Swift side just kicks off `codeburn menubar --force`
which runs that path. The audit's claim of missing verification
was a misread.
- NSDistributedNotificationCenter sender validation: the
`com.codeburn.refresh` listener accepts from any sender, but
forceRefresh has a 5-second rate-limit gate so the abuse
ceiling is one CLI spawn per 5 seconds. Mitigations (Mach IPC,
per-launch shared secret) are disproportionate to the impact.
vitest run: 38 files, 529 tests pass.
swift build -c release: clean, no warnings.
* Validator hardenings on the bug-hunt batch
Hoist the per-call sort in getModelCosts and getShortModelName to module
scope so model lookups on the hot path stop reallocating sorted key arrays.
Sanitize the unknown-model stderr warning by stripping C0/C1 controls
and capping length, so a hostile or corrupt JSONL cannot inject terminal
escape sequences via the model field.
Skip the daily-cache prune when newestDate fails to parse. The previous
code produced a NaN cutoff and silently dropped every cached day on the
next merge.
Adds tests locking down the stable resolution of common model names
(gpt-5-mini vs gpt-5, claude-haiku-4-5 vs claude-3-5-haiku, etc.) and
the prune NaN guard.
Five interleaving menubar regressions traced back to the cache-wipe and
showLoading additions in 18c3c8f, surfaced by adversarial multi-agent
review against the v0.9.6 baseline.
- forceRefresh no longer calls store.invalidateCache(). Wiping the
whole cache on every wake or manual refresh emptied todayPayload,
flipped showAgentTabs to false, and made cache[key] == nil for all
keys, which forced the full-popover loading overlay over already
rendered data. The day-rollover guard inside refresh() still wipes
the cache when the calendar date changes, so the legitimate part of
18c3c8f is preserved.
- Overlay condition is now !store.hasCachedData. Without this, the
popover briefly rendered $0.00 placeholders before the overlay slid
in on a cold key, and reflashed the overlay on every manual refresh
even when fresh data was on screen.
- refreshStatusButton skips while popover is anchored. Rewriting the
button's attributedTitle changes its intrinsic width, which makes
macOS reflow the status item and detaches the anchored popover to
the screen's top-left default position. popoverDidClose runs the
refresh once so the menubar title catches up immediately on
dismiss.
- showAgentTabs is sticky via hasAnyProvidersInCache. Prevents the
one-frame flicker where the tab strip vanished while the new key's
payload had not yet arrived.
- observeStore tracks store.currency. Without this, switching
currency did not propagate to refreshStatusButton until the next
30s payload tick, leaving the menubar showing the old currency
symbol and rate.
- Day-rollover race in refresh and refreshQuietly: capture cacheDate
at fetch start, drop the write if the calendar date changed during
the await. Prevents an in-flight fetch from yesterday polluting
today's freshly cleared cache.
- Manual refresh button passes showLoading: true again. Safe now that
the overlay is gated on cache state instead of isLoading; the
refresh button icon swaps to the spinner glyph for visible feedback,
while the popover body keeps the existing data and updates when the
fetch lands.
Adds a low-worth detector to codeburn optimize that flags expensive sessions with weak delivery signals (no edits, repeated retries, or no one-shot edits) when no git/gh delivery command is observed. Priority order is low-worth → context-bloat → outliers; each later detector excludes sessions named by an earlier one so the same session is never listed in three findings. Detection: floor, for no-edit, 3+ retries, regex matches git commit/push and gh pr create/merge but excludes commit-tree/commit-graph and dry-run. Three impact tiers consistent with #246. Token-savings uses full session tokens for no-edit sessions and the retry fraction for edit-with-retry sessions. Supersedes #241 with review fixes. Original implementation by @ozymandiashh.
Adds a context-bloat finding to codeburn optimize that flags sessions where effective input/cache tokens (cache-discounted via existing pricing constants) are large and disproportionate to output. Suggests starting fresh with a tightened context. Sessions flagged here are excluded from the cost-outlier finding to avoid double-listing. Growth-from-previous-session callouts are suppressed when the predecessor is more than 7 days back. Three impact tiers (low/medium/high). Supersedes #242 with review fixes from real-data probe. Original implementation by @ozymandiashh.
Reads the canonical cwd already stored inside Claude session JSONL files and uses it as the project path, then groups sessions by a normalized path key (case + slash insensitive) so Windows projects no longer split into 3+ rows on case/slash variants. Falls back to the legacy slug-derived path when cwd is missing. Closes#217. Supersedes #228 with a fix that preserves the canonical cwd even when mixed with slug-only sessions in the same directory. Original implementation by @ozymandiashh.
Adds per-model efficiency metrics (edit turns, one-shot rate, retries/edit, cost/edit) to the TUI By Model panel, JSON report output, and CSV export. Closes item 4 of #12. Supersedes #226 with review fixes (units rename, min-sample guard in TUI, tighter <synthetic> filter, multi-model attribution test). Original implementation by @ozymandiashh.
terminationHandler only reset isUpdating on non-zero exit, assuming
the app would be killed and relaunched on success. If pkill fails
silently the old process survives with isUpdating stuck true. Now
always resets on termination and clears the update badge on success.
Cache now tracks the calendar date and clears on day rollover so
overnight sleep no longer shows yesterday's numbers. Wake-from-sleep
invalidates the entire cache before fetching. Manual refresh and wake
explicitly request loading feedback so the spinner is visible even
when stale data exists.
copilot-openai-auto maps to gpt-5.3-codex pricing, which is wrong
for users on Anthropic models. The original copilot-auto fallback
is provider-agnostic and correct when no model can be inferred.
The menubar ran --optimize on every 30-second CLI invocation. As
sessions accumulated throughout the day, optimize got heavier until
it exceeded the 45-second timeout. When the fetch failed with no
cached data, the loading overlay had no escape hatch and stayed
forever.
- Never pass includeOptimize from the menubar (background loop,
forceRefresh, tab/period switches, manual refresh button)
- On fetch failure with empty cache, retry without optimize as
fallback so the spinner always clears
- refreshQuietly also skips optimize
- Use 1.25x multiplier for cache-write tokens to match Anthropic's
actual pricing (was incorrectly using 1x)
- Shell-quote server names in `claude mcp remove` fix text to prevent
issues with unusual server names
PR #221 unified the period logic but missed the TUI hotkey bar,
GNOME indicator popup, and macOS menubar app. All surfaces now
consistently show '6 Months' instead of 'All' or 'all time'.