Commit graph

475 commits

Author SHA1 Message Date
Resham Joshi
fb8f25fb97
Reject invalid --format and --period values instead of silently falling back (#258)
getDateRange() silently fell back to week on unknown periods, and no command
validated --format. A typo like --period mounth or --format yaml produced
wrong output with exit 0. Now all 6 format-accepting commands and all
period-accepting paths reject unknown values with a clear message and exit 1.
Also fixes the status description (said today+week+month, only shows
today+month).
2026-05-06 23:03:41 -07:00
Resham Joshi
daa673449c
Menubar and CLI hardening from multi-agent audit (#257)
Some checks are pending
CI / semgrep (push) Waiting to run
Two passes of validators across CLI accuracy, dashboard UX, menubar Swift,
performance, security, and end-to-end smoke tests on real session data.

Data-correctness fixes:

- parseLocalDate rejects month/day overflow. JS Date silently rolled
  Feb 31 to Mar 3, so --from 2026-02-31 --to 2026-03-15 quietly dropped
  sessions on Feb 28 - Mar 2. Now throws "Invalid date" with a clear
  reason. Leap-day case covered (2024-02-29 valid, 2025-02-29 rejected).

- CSV/JSON exports use the active currency's natural decimal places. The
  previous round2 helper produced ¥412.37 in CSV while the dashboard
  rendered ¥412 — finance teams comparing the two surfaces saw a
  discrepancy. New roundForActiveCurrency consults Intl.NumberFormat for
  the right precision (0 for JPY/KRW/CLP, 2 for USD/EUR, etc).

- Copilot toolRequests is Array.isArray-guarded in both modern and legacy
  event branches. Previously a corrupt session with toolRequests=null or
  a string aborted the whole file's parse loop and silently dropped every
  legitimate call after it.

- Codex token_count dedup uses a null sentinel for prevCumulativeTotal so
  the first event is never confused with a duplicate. Sessions that emit
  only last_token_usage (no total_token_usage) report cumulativeTotal=0
  on every event; with the previous 0-initialized prev, the first event
  matched the dedup guard and was dropped.

- LiteLLM pricing values are clamped to [0, 1] per token via safePerTokenRate.
  Defense in depth against a tampered upstream JSON shipping negative or
  absurdly large per-token costs that would otherwise propagate into all
  cost totals.

Performance:

- Cursor SQLite parse no longer pegs at minutes on multi-GB DBs. Two
  changes: per-conversation user-message buffer uses an index pointer
  instead of Array.shift() (which was O(n) per call); and a real ROWID
  cutoff via subquery limits the scan to the most recent 250k bubbles
  with a stderr warning so power users get a partial report rather than
  a stalled CLI.

- Spawned codeburn CLI subprocesses are terminated when the calling Task
  is cancelled. Without this, rapid period/provider tab clicks in the
  menubar cancelled the Task but left the subprocess running to
  completion, piling up zombie processes.

UX:

- Dashboard period switch flips to loading and clears projects
  synchronously before reloadData runs, eliminating the frame where the
  new period label rendered over the old period's projects.

- Optimize findings tab paginates 3-at-a-time with j/k scroll. With 4
  new detectors plus 7 originals, 8-10 findings * 6 lines was scrolling
  the StatusBar off the alt buffer top.

- Custom --from/--to ranges hide the period tab strip and disable the
  1-5 / arrow keys so a stray period press no longer abandons the user's
  explicit range. A "Custom range: X to Y" banner replaces the tab strip.

- OpenCode storage-format warning is per-table-set, rate-limited to once
  per process, and points the user at OpenCode's migration step or the
  issue tracker. The previous all-or-nothing check fired the generic
  "format not recognized" string for any schema mismatch.

Menubar / OAuth:

- Both Claude and Codex bootstrap (Reconnect button) now honour the
  usageBlockedUntil 429 backoff that refreshIfBootstrapped respects.
  Spamming Reconnect during sustained rate-limit windows previously
  hammered the upstream endpoint on every click.

- Codex Retry-After HTTP header is parsed (delta-seconds plus IMF-fixdate
  fallback) so we don't over-back-off when ChatGPT tells us a shorter
  window than our 5-minute floor.

- Both credential cache files are written via SafeFile.write
  (O_CREAT | O_EXCL | O_NOFOLLOW with explicit 0600) so there is no race
  window where the temp file briefly exists at default umask, and a
  symlink at the destination cannot redirect the write. Reads now route
  through SafeFile.read with a 64 KiB cap, closing the symlink-follow gap
  on Data(contentsOf:).

CI signal:

- TypeScript strict typecheck (tsc --noEmit) is now zero errors. The
  six errors in src/providers/copilot.ts came from a discriminated-union
  catch-all branch whose `data: Record<string, unknown>` shape TS picked
  over the specific event branches when narrowing on `type`. Removed the
  catch-all; runtime falls through unknown event types via the existing
  if/else chain.

Tests added: 16 new (now 555 total)
- date-range-filter: month/day/year overflow rejection, leap-day correctness
- currency-rounding: convertCost no-rounding contract, roundForActiveCurrency
  for USD/JPY/KRW/EUR
- providers/copilot: malformed toolRequests does not abort the parse
- providers/cursor-bubble-dedup: re-parse after token mutation does not
  double-count, single parse yields one call per bubble
- providers/codex: first event with cumulativeTotal=0 not dropped,
  consecutive zero-cumulative duplicates still deduped
2026-05-06 22:15:11 -07:00
Resham Joshi
efac2bfa15
Live quota bar inside AgentTab + Claude OAuth refresh gate (#255)
Some checks are pending
CI / semgrep (push) Waiting to run
* Gate Claude OAuth refresh attempts on terminal failures

Anthropic returns invalid_grant (HTTP 400) when the user's refresh token has
been revoked or rotated, typically after they re-ran claude login on another
device. The previous code rethrew the raw error every refresh cycle, leaving
the Plan UI stuck on a Swift error string and pummeling Anthropic's token
endpoint forever.

The new SubscriptionRefreshGate captures a fingerprint of
~/.claude/.credentials.json on terminal failure and stops trying until that
fingerprint changes (the user re-logs-in). Transient 5xx/network failures
get exponential backoff capped at 6 hours.

Two new SubscriptionError cases let the UI distinguish "user must reconnect"
from "Anthropic is flaky right now" and show a clean reconnect CTA instead
of raw HTTP guts.

* Inline live-quota progress bar inside each AgentTab chip

When a provider exposes a live quota source, the AgentTab chip grows by ~3pt
to host a thin weekly-utilization bar directly under the label. Hovering the
chip reveals a popover with all four Anthropic windows (5-hour, weekly, weekly
Opus, weekly Sonnet) plus reset countdowns. Click still switches the tab as
before.

Today only Claude has a quota source (the existing /api/oauth/usage path);
other providers' chips render unchanged. The QuotaSummary abstraction lets
us bolt on Cursor/Copilot/Codex meters in follow-up commits.

Subscription is now refreshed eagerly on the periodic loop so the bar lights
up without forcing the user to open a deep view first. The previous
SubscriptionRefreshGate keeps a dead refresh token from spamming Anthropic.

Adds two new SubscriptionLoadState cases (terminalFailure, transientFailure)
so the deep Plan view shows a "reconnect" message instead of a raw Swift
error string when the user's claude login expired.

* Replace SubscriptionClient with credential-store + service architecture

The previous SubscriptionClient never persisted refreshed access tokens, so
every 30s tick read the expired token from Keychain, refreshed it (1 call),
fetched usage with the new token (2nd call), and threw the new token away —
3 API calls per cycle, which burned through Anthropic's per-account rate
budget and produced the 429s and `invalid_grant` loops users were seeing.

The replacement mirrors CodexBar's proven pattern:

- ClaudeCredentialStore owns the credential lifecycle. Bootstrap is strictly
  user-initiated (Connect button in the Plan tab); the menubar does not touch
  Claude's keychain at startup. After bootstrap, refreshed tokens — including
  rotated refresh tokens — are persisted to a local cache file under
  ~/Library/Application Support/CodeBurn (mode 0600). Using a file instead of
  our own keychain item means rebuild signature changes don't trigger a
  startup keychain prompt; the only prompt the user ever sees is the one for
  Claude Code-credentials on Connect.

- ClaudeUsageFetcher (folded into the service) is a pure /api/oauth/usage
  call with one allowed 401-recovery roundtrip. 429s record an explicit
  backoff window honouring Retry-After.

- ClaudeSubscriptionService orchestrates bootstrap / refresh / disconnect,
  applies the 429 backoff, and surfaces terminal vs transient failures so
  the UI can show the right CTA.

- Reading Claude's keychain now tries the entry keyed by NSUserName() first
  and falls back to the unscoped query, so users who re-ran /login and ended
  up with two Claude Code-credentials items pick up the fresh one. This was
  the actual cause of "I logged in but the menubar still shows stale data".

User-facing additions:

- A proper Settings window (right-click → Settings…) with General / Claude /
  About tabs. Provider quota cadence is configurable (Manual / 1m / 2m / 5m /
  15m). New providers plug in as additional tabs.

- Plan tab: notBootstrapped → "Connect Claude subscription" CTA;
  terminalFailure → "Reconnect Claude" with the correct /login instruction
  for Claude Code 2.1; transientFailure preserves the last loaded view with
  a retrying badge.

- AgentTab quota bar slot is always reserved so chip height doesn't jitter
  when the user connects for the first time. Hover popover has 250ms enter
  / 150ms exit debounce so swiping across chips doesn't pop a popover for
  every chip touched.

- Disconnect requires confirmation, clears capacityEstimates and the
  subscription snapshot store so a reconnect under a different account
  doesn't surface "Based on last cycle" projections from the old account.

Validator findings applied: cadence anchor only updates on successful
refresh (not every attempt), refresh-token rotation persists in memory
before keychain write so a write failure doesn't lock the user out, server
error bodies are sanitized (token redaction + 240-char cap) before they
reach the UI or NSLog, and Refresh Now refreshes both the menubar payload
and quota.

* Add Codex live quota + multi-provider warning, with validator fixes

CodexCredentialStore reads ~/.codex/auth.json (ChatGPT-mode only) on
user-initiated Connect, caches under Application Support like Claude.
CodexSubscriptionService hits chatgpt.com/backend-api/wham/usage with
the bearer token + ChatGPT-Account-Id header, parses primary/secondary
windows, additional per-model rate limits (e.g. GPT-5.3-Codex-Spark),
and credits balance with a Double-or-String fallback.

Plan-tier enum captures the full ChatGPT plan list including prolite,
free_workspace, education, quorum, k12, plus an unknown(String) case
that preserves the raw plan name when OpenAI ships a tier we haven't
mapped yet.

Multi-provider warning system:
- Menubar flame tints from neutral to yellow (70%) → orange (90%) →
  red (100%) based on the worst-affected connected provider's worst
  window. Uses NSImage.SymbolConfiguration palette colors.
- Popover header gains a warning row when any provider is at 70%+.
  "Claude 79% of quota used", "Claude 79% · Codex 92%", or
  "Claude over limit (105%)" when severity hits .danger.
- Hover popover gains a plan-name badge in the top-right corner so
  users know which subscription is feeding the bar.
- Codex chip surfaces the credits balance and any non-zero per-model
  additional rate limits as footer rows.

Validator fixes applied in the same commit:

- Provider-specific reconnect / disconnected copy in QuotaDetailPopover
  (was hardcoded to Claude).
- Generation-token guard on refreshSubscriptionReportingSuccess and
  refreshCodexReportingSuccess so a Disconnect during an in-flight
  fetch can't resume after the await and re-populate the cleared state.
- Codex codexQuotaSummary promotes secondary to primary when only one
  window is returned, so free / guest tiers don't render an empty bar.
- Memory-cache TTL is now actually consulted in currentRecord (the
  isFresh check was dead code, leaving cached records valid forever).
- sanitizeForUI now redacts OpenAI sk-* keys, JWT tokens, and Bearer
  headers in addition to Claude sk-ant-*.
- Removed diagnostic NSLog that wrote raw chatgpt.com response bodies
  to the unified log.
- Codex Connect / Reconnect copy in Settings explains the auth.json
  prerequisite and the API-key vs ChatGPT-mode distinction.
- Disconnect dialogs now state explicitly that the auth.json /
  credentials keychain entry is left untouched.
- Plan badge in the popover gets line-limit + truncation + max-width
  so a long unknown plan name can't overflow the row.
- Renamed shadowing `let max` to `let worst` in aggregateQuotaStatus.

* Add Codex Plan tab + size plan badge to content

The Plan tab is now visible when the Codex chip is selected, mirroring
the Claude tab's deep view. CodexPlanInsight renders the user's plan
tier ("Pro Lite", "Plus", etc.), the primary and secondary rate-limit
windows with reset countdowns, and any non-zero per-model additional
limits (e.g. GPT-5.3-Codex-Spark) so power users see them.

The "On pace at reset" projection that Claude's Plan view shows is not
included here — that math feeds from local Claude per-message spend
extrapolated against API quota windows, and our local Codex spend is
not a 1:1 signal for the ChatGPT-subscription rate windows reported by
wham/usage. Wiring a Codex extrapolator is a follow-up.

Drop the maxWidth=90 frame on the plan badge in the hover popover. It
was stretching short labels like "Pro Lite" to fill the full 90pt slot;
fixedSize makes the badge hug the text. Plan names are bounded short
strings, so truncation is a non-issue in practice.
2026-05-06 19:57:17 -07:00
Resham Joshi
afd0ee7011
Validator hardenings on the bug-hunt batch (#254)
* Five correctness fixes from multi-agent bug hunt

A multi-agent audit of the codeburn correctness surface found five
real bugs each producing visibly wrong numbers or risking data loss.
All five fixes were validated by parallel review agents and exercised
end-to-end against real session data on this machine.

- src/cli.ts: --refresh <seconds> was using bare parseInt as the
  commander callback. Commander invokes the callback as
  parseInt(value, previous), so previous becomes the radix:
  --refresh 30 was being parsed as parseInt('30', 30) = 90, and
  --refresh 60 became NaN. Replaced with parseInteger (already
  defined at line 48 with radix locked to 10) at all three sites.

- src/providers/cursor.ts: parseAgentKv was timestamping every
  agentKv call as new Date().toISOString() because the Cursor
  SQLite schema has no per-message timestamp. Result: every
  Cursor agent call regardless of when it happened landed in
  today's date bucket. Now uses statSync(dbPath).mtimeMs as a
  bounded ceiling so calls land at the actual last-write time of
  the Cursor database, not today. Verified locally: a 1904-call
  Cursor history with March 22 mtime now correctly bucket into
  all-time only and shows 0 calls for today/week/30days.

- src/providers/codex.ts: prev token counters were only updated
  inside the cumulative-fallback branch, so a session emitting N
  events with last_token_usage followed by one cumulative-only
  event computed the next delta against prev=0 and double-counted
  the entire cumulative window. Cost could be inflated 10-100x
  for any mixed-format Codex session. Now prev advances to the
  current cumulative state regardless of which branch ran.

- src/providers/gemini.ts: totalOutput accumulated output+thoughts
  while totalThoughts was tracked separately. The result was
  outputTokens = output+thoughts AND reasoningTokens = thoughts;
  any consumer summing the two double-counted thoughts. Now
  totalOutput holds just output, reasoningTokens holds thoughts,
  and the cost calc folds thoughts into the output count to keep
  pricing correct (Google bills thoughts at the output rate;
  calculateCost has no reasoning parameter).

- src/export.ts: exportJson had no safety check before writeFile,
  so codeburn export -f json -o ~/important.json would silently
  clobber the user's file. CSV path had a marker-file guard; JSON
  did not. Now refuses to overwrite a file unless its first 4KB
  contain the codeburn schema marker. Uses a streaming partial
  read so a large existing file does not OOM Node's ~512MB
  string limit. Refuses directories outright.

Skipped intentionally: cursor-auto/copilot-auto/cline-auto/
qwen-auto are aliased to claude-sonnet-4-5. The audit flagged
this as wrong pricing for non-Anthropic auto-routed turns, but
Cursor's "auto" mode does not expose the actual model and any
alternative estimate is equally arbitrary. README already
documents this as a Sonnet-based estimate.

vitest run: 38 files, 529 tests pass.

* Five more correctness fixes from the bug-hunt round

This commit closes out the remaining critical-tier findings from the
multi-agent audit, with one item documented as a known limitation.

- src/providers/cursor.ts: bubble dedup key included mutable
  inputTokens/outputTokens. Cursor mutates token counts on the row in
  place when streaming completes, so re-parsing the same DB produced
  a fresh dedup key per bubble and silently double-counted. Switched
  to the SQLite row key (`bubbleId:<unique>`) which is stable per
  bubble. Adjusted BubbleRow type and BUBBLE_QUERY_BASE to expose
  `key as bubble_key`.

- src/providers/pi.ts: usage fields were destructured non-optionally,
  but real Pi/OMP session files sometimes omit individual fields.
  `calculateCost(model, undefined, ...)` returned NaN, and that NaN
  propagated into every aggregate cost total. Coerce each field to
  0 with `?? 0`.

- src/models.ts: getShortModelName and the getModelCosts startsWith
  fallback both walked the dictionary in insertion order. A model id
  like `gpt-5-mini` could resolve to the entry for `gpt-5` (matched
  by startsWith first) and silently get GPT-5's display name and
  pricing tier. Iterate longest keys first so more-specific prefixes
  win. Tightened the cost fallback's match condition from
  `startsWith(key) || startsWith(key + '-')` to require either an
  exact match or a `key + '-'` continuation, removing accidental
  matches like `gpt-50` against `gpt-5`.

- src/models.ts: calculateCost returned 0 silently for any model
  missing from the pricing snapshot. New Anthropic / OpenAI models
  shipped between snapshot refreshes look free until the user
  notices. Now warns once per unknown model name per process to
  stderr. Skips the warning for the `<synthetic>` placeholder so
  the noise floor stays low.

- src/yield.ts: revert detection was broken on the canonical case.
  Two problems: (1) `subject.toLowerCase().includes('revert')`
  matched any commit whose subject mentioned the word ("Add revert
  button" was misclassified). (2) The window logic only counted
  reverts within the original session's 1-hour boundary, but real
  `git revert` commits land in later sessions, so original sessions
  always looked productive. Now: getRevertedShas runs once with
  `--grep=^This reverts commit` and parses bodies to build a Set of
  SHAs that were the target of a revert anywhere in history.
  CommitInfo.wasReverted is set when this commit's SHA appears in
  that set. categorizeSession then flags a session as reverted when
  its in-main commits were later reverted, regardless of when the
  revert itself happened.

- src/providers/droid.ts: SKIPPED with comment. Droid records token
  usage only at session level. The current behavior splits evenly
  across emitted assistant calls and prices all of them at
  settings.model (the latest model). For sessions where the user
  switched models mid-stream, costs are approximate. Added an
  inline comment documenting this; a real fix requires per-message
  model data that isn't in the Droid JSONL schema.

Verified end-to-end on this machine:
- vitest run: 38 files, 529 tests pass
- `codeburn report --format json` produces valid JSON
- `codeburn yield -p week` runs without crashing, finds 0 reverts
  in the user's recent git history (plausible — fix changed the
  detection from "subject contains revert" to "this commit's SHA
  appears in a later 'This reverts commit ...' body")
- Stderr now warns for unknown model ids: `openai/gpt-5.3`,
  `qwen3.6:35b-a3b-bf16`, `big-pickle`. These previously priced
  silently at $0.

* Four high-severity fixes from the bug-hunt round

- src/currency.ts: getExchangeRate wrapped fetchRate and cacheRate in
  one try/catch. If fetchRate succeeded but cacheRate threw (disk
  full, ENOSPC, no permissions on the cache dir), the catch block
  swallowed the error and returned 1. Every cost rendered after that
  point became USD-equivalent silently. Now the fetch and the cache
  write live in separate paths: a successful fetch returns the rate
  even if the persist fails, and the cache-write error is dropped to
  a fire-and-forget so transient disk problems do not corrupt the
  user's currency display.

- src/cursor-cache.ts: writeFile was non-atomic. Two concurrent
  codeburn invocations writing to cursor-results.json could
  interleave bytes mid-write, leaving a truncated file that
  parsed-error on next read and forced a full SQLite re-scan every
  run. Switched to the temp-file + rename pattern with a randomized
  temp name so each writer gets its own staging file and the rename
  is atomic on POSIX. Crash mid-write also leaves only a leftover
  temp file, which gets unlinked in the catch path; the destination
  is never half-written.

- mac/.../CodeBurnApp.swift refresh loop on sleep: the loop's
  Task.sleep keeps a wakeup pending across system sleep, so on wake
  the natural tick fires the same instant the wake observers do.
  Combined with didWakeNotification, screensDidWakeNotification, and
  the launchd com.codeburn.refresh distributed notification, that
  produced 2-3 concurrent CLI spawns within ms of every wake. Now:
  willSleepNotification cancels the loop task; didWakeNotification
  restarts it. The loop also reads lastRefreshTime and skips its
  natural tick if a wake/manual/distributed-notification refresh ran
  within the last 5 seconds, coalescing the two sources of refresh
  into one CLI spawn per wake event.

- mac/.../CodeBurnApp.swift observeStore: the read closure had an
  implicit strong self capture (it accessed store.* without a
  capture annotation), pinning self for the lifetime of any
  unfired observation. Added [weak self] and a guard to make the
  capture explicit. withObservationTracking is one-shot per call,
  so there is at most one active subscription at a time; the
  earlier audit's claim of an unbounded leak overstated the issue,
  but tightening the capture pattern is still cleaner.

Verified:
- vitest run: 38 files, 529 tests pass
- swift build -c release --arch arm64 --arch x86_64: clean, no
  diagnostics, no MainActor warnings
- mac/Scripts/package-app.sh dev produces a valid universal bundle
- Menubar launches and runs without crash

* Eleven medium-severity fixes from the bug-hunt round

- src/format.ts formatTokens: guard against Infinity, NaN, and
  negative input. Previously a corrupt aggregate could leak into
  the UI as the literal strings "NaN" or "Infinity". Negatives now
  render as "0" rather than "-500" with no scaling.

- src/cli-date.ts parseDateRangeFlags: the missing-from default
  was new Date(0), which opened a 55-year scan from 1970 epoch
  whenever the user passed only --to. Default now anchors at 6
  months back from now, matching the dashboard's all-time period.
  Test updated to assert the new bounded window.

- src/cli-date.ts toPeriod: previously fell back silently to "week"
  for any unknown input, so a typo like `-p mounth` produced a
  quiet 7-day report while the user thought they were viewing the
  month. Now exits with a clear stderr error and exit code 1.
  Test updated to assert the loud-failure behavior.

- src/optimize.ts urgencyScore: rebalanced weights so a high-impact
  finding with zero observed tokens cannot outrank a medium-impact
  finding with millions of tokens. Old 0.7/0.3 split made high+0
  (0.70) beat medium+1B (0.65). New 0.5/0.5 split makes medium+1B
  (0.75) beat high+0 (0.50). Token normalization lifted to 5M so
  the ramp covers a realistic spend range.

- src/models.ts calculateCost: clamp negative or non-finite token
  inputs to 0 before pricing. A corrupt JSONL emitting a negative
  count would otherwise produce a negative cost that silently
  subtracted from real spend in aggregates.

- src/currency.ts convertCost: stop rounding during aggregation.
  For zero-fraction currencies (JPY, KRW, CLP) this clamped every
  per-session cost to a whole unit before sum, so a project of
  1000 sessions averaging ¥0.4 each aggregated to ¥0 instead of
  ¥400. formatCost still rounds at the display boundary.

- src/config.ts saveConfig: the temp file path was a fixed
  `${configPath}.tmp` suffix. Two simultaneous saveConfig calls
  (overlapping menubar and CLI runs) raced on the same staging
  file and could leave one writer reading partial bytes from the
  other. Randomized the temp suffix per call.

- src/providers/antigravity.ts flushCache: the early return on
  `!cacheDirty` short-circuited eviction when liveCascadeIds was
  supplied but no cascade had been added or updated this run. As
  a result, deleted .pb files persisted in the cache forever once
  the user stopped writing to it. Eviction now runs whenever
  liveCascadeIds is provided, marks the cache dirty if anything
  was removed, and only then short-circuits if there is nothing
  to write.

- src/daily-cache.ts addNewDays: cap retention at 2 years. The
  days array previously merged forever, growing the cache file by
  hundreds of bytes per day until JSON parse on every CLI
  invocation became measurable. The 6-month UI period plus the
  365-day BACKFILL_DAYS bootstrap both fit comfortably inside the
  cap, with headroom for a future longer window.

- src/dashboard.tsx useInput: period number keys (1-5) and arrow
  keys triggered a reload while the compare view was mounted. The
  parent's data state changed underneath the user with no visual
  affordance back to the dashboard. Now those keys are gated on
  view !== 'compare', and `b` / Esc inside compare returns to the
  dashboard.

- mac/.../HeatmapSection.swift formatters: prettyDate, buildTrend
  Bars, computeTrendStats, computeForecast, and computeAllStats
  each allocated a fresh DateFormatter (and Calendar) on every
  call. SwiftUI re-evaluates these views many times per second
  during hover scrubbing on the trend chart, so the allocations
  were a measurable hot spot. Lifted the yyyy-MM-dd / "EEE MMM d"
  / "MMM d" formatters and the gregorian Calendar to fileprivate
  cached singletons.

Two findings from the same bucket were not addressed here:
- UpdateChecker SHA-256 / codesign verification is already
  performed by src/menubar-installer.ts (verifyChecksum at line
  85). The Swift side just kicks off `codeburn menubar --force`
  which runs that path. The audit's claim of missing verification
  was a misread.
- NSDistributedNotificationCenter sender validation: the
  `com.codeburn.refresh` listener accepts from any sender, but
  forceRefresh has a 5-second rate-limit gate so the abuse
  ceiling is one CLI spawn per 5 seconds. Mitigations (Mach IPC,
  per-launch shared secret) are disproportionate to the impact.

vitest run: 38 files, 529 tests pass.
swift build -c release: clean, no warnings.

* Validator hardenings on the bug-hunt batch

Hoist the per-call sort in getModelCosts and getShortModelName to module
scope so model lookups on the hot path stop reallocating sorted key arrays.

Sanitize the unknown-model stderr warning by stripping C0/C1 controls
and capping length, so a hostile or corrupt JSONL cannot inject terminal
escape sequences via the model field.

Skip the daily-cache prune when newestDate fails to parse. The previous
code produced a NaN cutoff and silently dropped every cached day on the
next merge.

Adds tests locking down the stable resolution of common model names
(gpt-5-mini vs gpt-5, claude-haiku-4-5 vs claude-3-5-haiku, etc.) and
the prune NaN guard.
2026-05-06 19:50:40 -07:00
iamtoruk
2817ebff47 chore: refresh litellm pricing snapshot
Some checks are pending
CI / semgrep (push) Waiting to run
2026-05-06 10:59:42 -07:00
iamtoruk
6e704271c9 chore: sync lockfile to package.json 0.9.6 2026-05-06 10:59:42 -07:00
iamtoruk
3c8ce32bf3 Fix popover anchor, tab strip flicker, and stale-data refresh
Five interleaving menubar regressions traced back to the cache-wipe and
showLoading additions in 18c3c8f, surfaced by adversarial multi-agent
review against the v0.9.6 baseline.

- forceRefresh no longer calls store.invalidateCache(). Wiping the
  whole cache on every wake or manual refresh emptied todayPayload,
  flipped showAgentTabs to false, and made cache[key] == nil for all
  keys, which forced the full-popover loading overlay over already
  rendered data. The day-rollover guard inside refresh() still wipes
  the cache when the calendar date changes, so the legitimate part of
  18c3c8f is preserved.

- Overlay condition is now !store.hasCachedData. Without this, the
  popover briefly rendered $0.00 placeholders before the overlay slid
  in on a cold key, and reflashed the overlay on every manual refresh
  even when fresh data was on screen.

- refreshStatusButton skips while popover is anchored. Rewriting the
  button's attributedTitle changes its intrinsic width, which makes
  macOS reflow the status item and detaches the anchored popover to
  the screen's top-left default position. popoverDidClose runs the
  refresh once so the menubar title catches up immediately on
  dismiss.

- showAgentTabs is sticky via hasAnyProvidersInCache. Prevents the
  one-frame flicker where the tab strip vanished while the new key's
  payload had not yet arrived.

- observeStore tracks store.currency. Without this, switching
  currency did not propagate to refreshStatusButton until the next
  30s payload tick, leaving the menubar showing the old currency
  symbol and rate.

- Day-rollover race in refresh and refreshQuietly: capture cacheDate
  at fetch start, drop the write if the calendar date changed during
  the await. Prevents an in-flight fetch from yesterday polluting
  today's freshly cleared cache.

- Manual refresh button passes showLoading: true again. Safe now that
  the overlay is gated on cache state instead of isLoading; the
  refresh button icon swaps to the spinner glyph for visible feedback,
  while the popover body keeps the existing data and updates when the
  fetch lands.
2026-05-06 10:59:42 -07:00
Resham Joshi
75d4701bd8
feat(optimize): flag low-worth expensive sessions
Some checks are pending
CI / semgrep (push) Waiting to run
Adds a low-worth detector to codeburn optimize that flags expensive sessions with weak delivery signals (no edits, repeated retries, or no one-shot edits) when no git/gh delivery command is observed. Priority order is low-worth → context-bloat → outliers; each later detector excludes sessions named by an earlier one so the same session is never listed in three findings. Detection:  floor,  for no-edit, 3+ retries, regex matches git commit/push and gh pr create/merge but excludes commit-tree/commit-graph and dry-run. Three impact tiers consistent with #246. Token-savings uses full session tokens for no-edit sessions and the retry fraction for edit-with-retry sessions. Supersedes #241 with review fixes. Original implementation by @ozymandiashh.
2026-05-06 00:35:41 -07:00
Resham Joshi
f92d57d24a
feat(optimize): detect context-heavy sessions
Adds a context-bloat finding to codeburn optimize that flags sessions where effective input/cache tokens (cache-discounted via existing pricing constants) are large and disproportionate to output. Suggests starting fresh with a tightened context. Sessions flagged here are excluded from the cost-outlier finding to avoid double-listing. Growth-from-previous-session callouts are suppressed when the predecessor is more than 7 days back. Three impact tiers (low/medium/high). Supersedes #242 with review fixes from real-data probe. Original implementation by @ozymandiashh.
2026-05-06 00:11:12 -07:00
Resham Joshi
6151cf6d73
fix(parser): use Claude cwd for Windows project paths
Reads the canonical cwd already stored inside Claude session JSONL files and uses it as the project path, then groups sessions by a normalized path key (case + slash insensitive) so Windows projects no longer split into 3+ rows on case/slash variants. Falls back to the legacy slug-derived path when cwd is missing. Closes #217. Supersedes #228 with a fix that preserves the canonical cwd even when mixed with slug-only sessions in the same directory. Original implementation by @ozymandiashh.
2026-05-05 23:53:31 -07:00
Resham Joshi
be6068b244
feat(report): add per-model efficiency metrics
Adds per-model efficiency metrics (edit turns, one-shot rate, retries/edit, cost/edit) to the TUI By Model panel, JSON report output, and CSV export. Closes item 4 of #12. Supersedes #226 with review fixes (units rename, min-sample guard in TUI, tighter <synthetic> filter, multi-model attribution test). Original implementation by @ozymandiashh.
2026-05-05 23:36:59 -07:00
ozymandiashh
fc4c4f0091
feat(export): support custom date ranges 2026-05-05 23:18:48 -07:00
iamtoruk
869474b3b4 Fix update button stuck spinning after successful install
Some checks are pending
CI / semgrep (push) Waiting to run
terminationHandler only reset isUpdating on non-zero exit, assuming
the app would be killed and relaunched on success. If pkill fails
silently the old process survives with isUpdating stuck true. Now
always resets on termination and clears the update badge on success.
2026-05-05 11:38:54 -07:00
iamtoruk
18c3c8f908 Fix stale menubar data after sleep and silent refresh button
Cache now tracks the calendar date and clears on day rollover so
overnight sleep no longer shows yesterday's numbers. Wake-from-sleep
invalidates the entire cache before fetching. Manual refresh and wake
explicitly request loading feedback so the spinner is visible even
when stale data exists.
2026-05-05 11:35:38 -07:00
Resham Joshi
474c71a77b
Merge pull request #220 from NihalJain/fix_copilot_auto
fix: update model aliases and enhance event parsing for Copilot provider
2026-05-05 11:24:04 -07:00
iamtoruk
58b14c7561 Keep copilot-auto as provider-agnostic fallback
copilot-openai-auto maps to gpt-5.3-codex pricing, which is wrong
for users on Anthropic models. The original copilot-auto fallback
is provider-agnostic and correct when no model can be inferred.
2026-05-05 11:23:52 -07:00
iamtoruk
719432aaf4 Add session outlier detector and menubar spinner fix to changelog
Some checks are pending
CI / semgrep (push) Waiting to run
2026-05-04 23:12:46 -07:00
iamtoruk
c706cd2de2 Strip optimize from menubar, fix stuck loading spinner
The menubar ran --optimize on every 30-second CLI invocation. As
sessions accumulated throughout the day, optimize got heavier until
it exceeded the 45-second timeout. When the fetch failed with no
cached data, the loading overlay had no escape hatch and stayed
forever.

- Never pass includeOptimize from the menubar (background loop,
  forceRefresh, tab/period switches, manual refresh button)
- On fetch failure with empty cache, retry without optimize as
  fallback so the spinner always clears
- refreshQuietly also skips optimize
2026-05-04 23:11:42 -07:00
Resham Joshi
e7560052ed
Merge pull request #224 from ozymandiashh/feat/session-outlier-detection
feat(optimize): detect session cost outliers
2026-05-04 20:21:33 -07:00
iamtoruk
38d21643bd Merge origin/main into feat/session-outlier-detection 2026-05-04 20:21:26 -07:00
Resham Joshi
4ac8e8de34
Merge pull request #223 from ozymandiashh/feat/mcp-tool-coverage
feat(optimize): MCP tool coverage detector with cache-aware costing
2026-05-04 20:13:27 -07:00
iamtoruk
5120ec696c Merge origin/main into feat/mcp-tool-coverage 2026-05-04 20:13:15 -07:00
iamtoruk
735f41bc6c Fix cache-write pricing and shell-quote server names in fix commands
- Use 1.25x multiplier for cache-write tokens to match Anthropic's
  actual pricing (was incorrectly using 1x)
- Shell-quote server names in `claude mcp remove` fix text to prevent
  issues with unusual server names
2026-05-04 20:11:50 -07:00
iamtoruk
bfa5fe7fa0 fix(labels): update remaining 'all' period labels to '6 Months'
PR #221 unified the period logic but missed the TUI hotkey bar,
GNOME indicator popup, and macOS menubar app. All surfaces now
consistently show '6 Months' instead of 'All' or 'all time'.
2026-05-04 19:46:20 -07:00
Resham Joshi
32dfa8e165
Merge pull request #221 from ozymandiashh/fix/unify-all-period-semantics
fix(date-range): unify 'all' period semantics between CLI and dashboard
2026-05-04 19:43:28 -07:00
ozymandiashh
d18ba3d2fe feat(optimize): detect session cost outliers 2026-05-05 05:25:49 +03:00
ozymandiashh
e46b20b927 fix(optimize): reuse mcp coverage and type schema estimator 2026-05-05 05:11:00 +03:00
ozymandiashh
9a258a8a99 fix(date-range): avoid all-period month overflow 2026-05-05 05:05:13 +03:00
ozymandiashh
1a080a006f feat(optimize): MCP tool coverage detector with cache-aware costing
Adds a per-tool optimizer finding for MCP servers whose schema is loaded
on every turn but rarely invoked. Builds on the existing server-level
`detectUnusedMcp` (zero invocations) by reporting partial-use cases:
"loaded 54 tools, called 0" or "loaded 26 tools, called 2 (8% coverage)".

Inventory comes from Claude Code's JSONL `attachment.deferred_tools_delta`
entries: `addedNames` lists the exact tools available at that turn,
including every fully-qualified `mcp__<server>__<tool>` name. We union
across all delta entries in a session (not just the first) because tool
availability can change mid-session when the user reloads MCP config or
a subagent inherits a different tool set. Names that don't match the
`mcp__<server>__<tool>` shape with both segments non-empty are rejected
at extraction so downstream `split('__')` consumers can't be poisoned.

Token-savings estimates are cache-aware. MCP tool schemas live in the
cached prefix of the system prompt: a session pays the full input price
on each cache-creation turn (rebuilds happen every ~5 minutes of
inactivity) and the cache-read discount on subsequent turns. Each call's
contribution is capped at its observed `cacheCreationInputTokens` /
`cacheReadInputTokens` so we never claim more MCP overhead than the
call's own cache buckets could contain.

When multiple servers are flagged, costing happens in a single combined
pass: the per-call cap applies to the total unused-schema budget across
all flagged servers, not per server. Two flagged servers cannot both
independently claim the same call's cache bucket, which would otherwise
overstate `tokensSaved` and misclassify findings as high impact.

A session counts toward `loadedSessions` (and toward the cost estimate)
only if its observed inventory included the server. Pure invocation-only
sessions, where the server appears in `mcpBreakdown` or `call.mcpTools`
without any matching `deferred_tools_delta`, do not satisfy the
`>= 2 sessions` threshold on their own. The same invariant applies in
`estimateMcpSchemaCost` so the two passes agree.

Coverage is computed against the inventory only: invocations of names
not present in any observed inventory (older config, hallucinated tool,
typo) do not inflate `toolsInvoked` and cannot drive `unusedCount`
negative. `toolsInvoked` is derived as `inventory.size - unusedTools.length`
to keep both numbers consistent.

`detectUnusedMcp` and the new detector are explicitly disjoint:
`detectUnusedMcp` skips servers that the coverage detector will report,
not every server that happens to be in any inventory, so a small
inventoried-but-uninvoked server below the coverage thresholds still
gets flagged as "configured but never called."

Thresholds for the coverage finding:
- > 10 tools available (small servers are noise)
- < 20% coverage
- >= 2 sessions with observed inventory
- High impact when total effective tokens >= 200_000 or >= 3 servers flagged

Smoke-tested on a real account: 7 servers flagged across 93 sessions
(`office-word-mcp` 0/54, `notebooklm-mcp` 0/38, `office-ppt-mcp` 0/37,
`excel-mcp-server` 0/25, `github-mcp-server` 2/26, `peekaboo` 3/22, plus
`claude_ai_Asana`). Combined-cap costing keeps `tokensSaved` honest.

Changes:
- src/types.ts: optional `mcpInventory: string[]` on `SessionSummary`.
  Provider-agnostic field; currently populated only by the Claude parser.
- src/parser.ts: `extractMcpInventory` walks all entries, validates
  fully-qualified names, returns sorted unique list. `buildSessionSummary`
  passes it through; field is omitted when empty so JSON exports stay
  clean.
- src/optimize.ts: `aggregateMcpCoverage`, `estimateMcpSchemaCost`
  (single- and multi-server signatures), `detectMcpToolCoverage`. Wired
  into `scanAndDetect`. `detectUnusedMcp` updated to disjoint with the
  new detector.
- tests/mcp-coverage.test.ts: 23 cases covering aggregation, costing,
  combined-cap behaviour, threshold gates, invocation-only-session
  filtering, foreign-tool invocations, cache rebuild events, write+read
  on the same call, multi-server pluralisation.
- tests/parser-mcp-inventory.test.ts: 12 cases for the JSONL extractor
  including malformed name rejection and tolerant attachment parsing.
- CHANGELOG.md: entry under Unreleased / Added (CLI).

Closes #2
2026-05-05 04:13:04 +03:00
Resham Joshi
f5cbfe28bb
Overhaul GNOME Shell extension with full-featured UI (#222)
Some checks are pending
CI / semgrep (push) Waiting to run
* Rewrite GNOME extension UI with branded popover matching macOS menubar

Combines PR #212's modular architecture (DataClient, GSettings, Libadwaita
prefs) with the custom St widget UI from feat/tauri-menubar-win-linux.

Adds: branded header, horizontal agent tabs, hero typography, period/insight
pills, 19-day token histogram, 6 content views (Activity, Trend, Forecast,
Pulse, Stats, Plan), currency switcher with FX conversion, findings CTA,
budget alerts, theme detection, payload caching with TTL.

* Add Main.panel.addToStatusArea call to extension entry point

* Align activity/model rows as table with separators, gear icon for prefs

* Add table column headers, oneshot placeholder, currency picker dropdown

* Enhance GNOME extension with scrollable UI, dark mode, charts, and performance fixes

- Add vertical scroll for popup content and horizontal scroll for 6+ provider tabs
- Add token histogram chart with hover tooltips showing date, in/out tokens, cost
- Add skeleton loading animation with stale-while-revalidate caching (5min TTL)
- Add dark/light theme support with force-dark-mode setting
- Add exact costs toggle for full decimal values
- Add right-aligned columns for cost, turns, oneshot, and model data
- Remove unsupported St CSS properties (text-align, letter-spacing)
- Fix post-destroy crash: guard async callbacks, abort Soup session on teardown
- Fix dataClient double-resolve race with settled guard
- Expand PATH resolution for volta, bun, cargo, asdf, fnm, pnpm
- Reduce CLI timeout from 45s to 15s and cache augmented PATH
- Remove unused imports (Pango, Main) and dead constants
- Show 10 top activities, remove Plan pill
2026-05-04 18:05:59 -07:00
ozymandiashh
3dc3e32715 fix(date-range): unify 'all' period semantics between CLI and dashboard
`getDateRange` was duplicated across `src/cli.ts` and `src/dashboard.tsx`
with conflicting semantics for `'all'`. The CLI intentionally bounded
`'all'` to the last 6 months (justified inline: keeps Codex/Cursor parses
responsive on sparse multi-year history). The dashboard returned
`new Date(0)` instead, so the same `--period all` flag silently meant
two different windows depending on which entry point you hit.

`Period`, `PERIODS`, `PERIOD_LABELS`, and `toPeriod` were duplicated as
well, and `cli-date.ts` already existed for date helpers
(`parseDateRangeFlags`) so the consolidation lives there.

Both call sites now go through a single `getDateRange(period: string)`
in `cli-date.ts` that returns `{ range, label }`. The dashboard wraps it
as `getPeriodRange(period: Period)` to keep the strict `Period` type at
the React boundary while letting the CLI continue to accept extras like
`'yesterday'`.

`PERIOD_LABELS.all` becomes `'6 Months'` (short, for the dashboard tab
strip; the previous `'All Time'` was misleading and the long-form
`'Last 6 months'` from `getDateRange().label` already drives CLI output).

Changes:
- src/cli-date.ts: add `Period`, `PERIODS`, `PERIOD_LABELS`, `toPeriod`,
  `getDateRange`. Pull the existing 6-month rationale into a named
  `ALL_TIME_MONTHS` constant.
- src/cli.ts: drop the local copies and import from cli-date.
- src/dashboard.tsx: drop the local copies, route through
  `getPeriodRange`, alias the shared `getDateRange` import to
  `getDateRangeShared` to avoid shadowing the wrapper.
- tests/cli-date.test.ts: 13 cases covering `'all'` regression guard
  (must never silently fall back to `Date(0)`), CLI/dashboard agreement,
  end-of-month clamping tolerance, `'yesterday'` support, and
  unknown-input fallback.
- README.md, CHANGELOG.md: surface the bound and point heavy users at
  `--from`/`--to` for unbounded windows.

The CLI flag `--period all` continues to be accepted; only the dashboard
window changes to match what the CLI was already doing. No public API
or schema change.

Refs #93
2026-05-05 03:53:46 +03:00
Nihal Jain
90772a3267 fix: update model aliases and enhance event parsing for Copilot provider
- Fixes #219
2026-05-05 04:21:40 +05:30
Resham Joshi
18335a1f9d
Merge pull request #212 from thameem-abbas/feat/gnome-extension
Some checks are pending
CI / semgrep (push) Waiting to run
Merging GNOME Shell extension as foundation - will enhance UI in follow-up
2026-05-04 10:50:36 -07:00
iamtoruk
6fb05c6813 Fix command injection in yield via execFileSync
Replace execSync with execFileSync and argument arrays so shell
metacharacters in git branch names cannot be interpreted as commands.
Add SAFE_REF_PATTERN validation as defense in depth for branch names
from git symbolic-ref.

Addresses #214.
2026-05-04 10:13:40 -07:00
iamtoruk
15334fac67 Add SHA-256 checksum verification to menubar installer
The installer now downloads and verifies a .sha256 companion file
before extracting and launching the menubar app. Build script and
CI workflow generate the checksum alongside the zip. Adds SECURITY.md
with reporting instructions.

Addresses #215.
2026-05-04 10:08:58 -07:00
thameem-abbas
87e45f43df fix: compact mode toggle updates instantly without restart
Toggle label visibility instead of rebuilding panel children.
Label always added to panel, just hidden when compact=true.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 09:46:21 -04:00
thameem-abbas
30b3ad0503 feat: add GNOME Shell extension for Linux panel indicator
GNOME 45+ extension that shows live token costs in the top bar panel
with a dropdown for provider breakdown, top activities/models, cache
stats, and budget alerts. Polls `codeburn status --format menubar-json`
every 30s — same data contract as the macOS menubar app.

Includes GSettings preferences (refresh interval, compact mode, budget
threshold, per-provider enable/disable toggles) with Libadwaita UI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 09:44:35 -04:00
Resham Joshi
cf8c2aa493
Merge pull request #206 from voidborne-d/fix/skill-subcategory-203
Some checks are pending
CI / semgrep (push) Waiting to run
fix(classifier): surface skill name as subCategory for general turns (#203)
2026-05-03 16:42:47 -07:00
Resham Joshi
ac8081bb08
Merge pull request #207 from ozymandiashh/fix/codex-stream-large-sessions
Stream-parse Codex session files to fix oversize-cap drops on heavy users
2026-05-03 16:35:30 -07:00
ozymandiashh
61be92a834 Stream-parse Codex session files to handle 250+ MB rollouts
Heavy Codex users hit MAX_SESSION_FILE_BYTES (128 MB) on long-running
sessions. The file is read in full via readSessionFile and then split on
'\n', so even bumping the cap eventually runs into V8's 512 MB string
limit (split doubles the high-water mark).

readSessionLines is a streaming generator that already exists in
fs-utils for exactly this case but only readFirstLine was using it.
Switch the Codex provider to consume it and let the cap apply only when
streaming would still be unreasonable.

Changes:
- src/fs-utils.ts: introduce MAX_STREAM_SESSION_FILE_BYTES (2 GB) and
  apply it in readSessionLines instead of the full-read cap. Keep
  MAX_SESSION_FILE_BYTES for readSessionFile / readSessionFileSync
  consumers that materialize the whole file.
- src/providers/codex.ts: replace `readSessionFile -> split('\n')` with
  `for await (... of readSessionLines)`. Add sawAnyLine guard so a
  failed/empty stream skips cache write, preserving the previous
  early-return behavior.

Empirical impact on a real account with one 247 MB rollout: 7-day totals
went from 4,536 calls / €358.69 / 20.1M input tokens to 6,111 calls /
€550.67 / 37.3M input tokens. The previously-skipped session is now
included; no other behavior changes.

Refs #204
2026-05-04 02:15:04 +03:00
voidborne-d
c16b21ec50 fix(classifier): surface skill name as subCategory for general turns (#203)
Turns whose only assistant tool is `Skill` collapse to category `general`
because `classifyByToolPattern` returns `'general'` and `refineByKeywords`
only operates on `coding`/`exploration`. In environments that lean on Claude
Code skills, the per-activity dashboard column flattens — every `/init`,
`/review`, `/security-review`, `/claude-api`, plus user-defined skills, all
land in `general` with no signal about which workflow ran.

Implements Option A from the issue:

- `ParsedApiCall.skills: string[]` populated in the Anthropic-path parser
  via a new `extractSkillNames` helper that reads `input.skill || input.name`
  from each `Skill` ToolUseBlock (mirrors `detectGhostSkills` extraction at
  optimize.ts:765 so the two stay in sync).
- `ClassifiedTurn.subCategory?: string` set to the first skill name when the
  resolved category is `general` AND any skill identifier was extracted.
  Top-level category stays `general` — existing aggregations, exports, and
  category-keyed code paths unchanged.
- `SessionSummary.skillBreakdown: Record<string, {turns,costUSD,editTurns,
  oneShotTurns}>` populated in the same per-turn loop that builds
  `categoryBreakdown`. Provider sessions (Codex/Cursor/etc.) keep `skills:
  []` — they don't expose the Skill tool surface today.
- Dashboard `ActivityBreakdown` renders top-N skill sub-rows beneath the
  `general` row when present (indented `/skill-name`, dimmed). Other
  categories render exactly as before; if no skills were invoked, the panel
  is byte-identical to current output.

Existing 419 tests still pass. New `tests/classifier.test.ts` adds 8 cases:
single skill via `input.skill`, single via `input.name`, first-wins for
multi-skill turns, aggregation across multiple assistant calls in one turn,
no-name fallback (`subCategory` stays undefined), `Skill+Edit` promoting to
`coding` and dropping subCategory, non-Skill general turns, and a legacy
ParsedApiCall shape with `skills` field absent (forward-compat). Pre-fix
verification by stashing the source change reproduces 4/8 failures with the
exact "expected 'init', received undefined" diff; restoring → 8/8 pass.

Closes #203.

🤖 AI assistance disclosure: assistant-scaffolded by Claude (Opus 4.7);
author of record reviewed every line, ran the full vitest suite locally
(`npm test` → 32 files / 427 tests pass), `npx tsc --noEmit` clean, and
`npm run build` produces a clean ESM bundle.
2026-05-04 06:26:45 +08:00
iamtoruk
7501aee130 Add Goose and Antigravity providers to README, bump provider count to 18 2026-05-03 12:14:35 -07:00
iamtoruk
6dbd25ce55 Bump version to 0.9.6 2026-05-03 12:12:56 -07:00
iamtoruk
988398abfc Fix $0.0000 display for near-zero costs
Closes #205
2026-05-03 12:11:10 -07:00
iamtoruk
6702d55345 Fix menubar provider view showing $0.00 after idle and refresh race condition
CLI timeout increased from 20s to 45s to handle cold file-cache latency on
provider-specific queries. Loading overlay now appears when the all-provider
payload confirms a provider has spend but its dedicated data hasn't loaded yet.
Manual refresh (force: true) bypasses the in-flight guard so users can always
re-fetch. Tab strip prefers the provider-specific payload cost when available
so it stays in sync with the hero section.
2026-05-03 12:00:03 -07:00
AgentSeal
8cf68e7a16 Fix Antigravity dedup collision and add Codex ChatGPT Plus token estimation (#204)
- Antigravity: use loop index as fallback when responseId is empty to prevent
  all entries in a cascade sharing the same dedup key; bump CACHE_VERSION to
  force re-parse of stale cached data
- Codex: estimate tokens from message text when info is null (ChatGPT Plus/Pro
  subscription sessions), feeding through calculateCost so subscription users
  see API-equivalent spend; add costIsEstimated flag to ParsedProviderCall
- Update LiteLLM pricing snapshot
2026-05-03 19:51:13 +02:00
Resham Joshi
95585febf4
Merge pull request #201 from getagentseal/fix/streaming-dedup
Some checks are pending
CI / semgrep (push) Waiting to run
Fix streaming dedup: keep last message.id occurrence for accurate tokens and tools
2026-05-02 22:30:48 -07:00
iamtoruk
800c106250 Fix streaming dedup: keep last occurrence of each message.id within session files
Claude Code writes the same message.id multiple times during streaming.
The first write has partial tokens (often 1) and no tool_use blocks.
The last write has authoritative token counts and all tool_use/MCP blocks.

Old behavior kept the first occurrence (keep-first), silently dropping
real output tokens (+6.3% undercount) and all MCP tool calls.

New behavior keeps the last occurrence's content but preserves the first
occurrence's timestamp for correct date bucketing.

Validated against 21,390 real session files: 40.5% had duplicate IDs,
output tokens were understated by up to 78% per session.
2026-05-02 22:30:17 -07:00
iamtoruk
341aa46f78 Strip ANSI escapes from bash commands across all providers
Use strip-ansi (already in dep tree via Ink) in extractBashCommands
to prevent terminal escape codes from leaking into dashboard bash
breakdown keys. Route goose, gemini, qwen, and openclaw through
extractBashCommands instead of inline split, which also gives them
multi-command extraction (matching claude/codex/droid behavior).
2026-05-02 20:59:24 -07:00
iamtoruk
292265bf47 Add deno dx as a run method 2026-05-02 20:31:27 -07:00