codeburn

mirror of https://github.com/AgentSeal/codeburn.git synced 2026-05-17 03:56:45 +00:00

Author	SHA1	Message	Date
Resham Joshi	daa673449c	Menubar and CLI hardening from multi-agent audit (#257) Two passes of validators across CLI accuracy, dashboard UX, menubar Swift, performance, security, and end-to-end smoke tests on real session data. Data-correctness fixes: - parseLocalDate rejects month/day overflow. JS Date silently rolled Feb 31 to Mar 3, so --from 2026-02-31 --to 2026-03-15 quietly dropped sessions on Feb 28 - Mar 2. Now throws "Invalid date" with a clear reason. Leap-day case covered (2024-02-29 valid, 2025-02-29 rejected). - CSV/JSON exports use the active currency's natural decimal places. The previous round2 helper produced ¥412.37 in CSV while the dashboard rendered ¥412 — finance teams comparing the two surfaces saw a discrepancy. New roundForActiveCurrency consults Intl.NumberFormat for the right precision (0 for JPY/KRW/CLP, 2 for USD/EUR, etc). - Copilot toolRequests is Array.isArray-guarded in both modern and legacy event branches. Previously a corrupt session with toolRequests=null or a string aborted the whole file's parse loop and silently dropped every legitimate call after it. - Codex token_count dedup uses a null sentinel for prevCumulativeTotal so the first event is never confused with a duplicate. Sessions that emit only last_token_usage (no total_token_usage) report cumulativeTotal=0 on every event; with the previous 0-initialized prev, the first event matched the dedup guard and was dropped. - LiteLLM pricing values are clamped to [0, 1] per token via safePerTokenRate. Defense in depth against a tampered upstream JSON shipping negative or absurdly large per-token costs that would otherwise propagate into all cost totals. Performance: - Cursor SQLite parse no longer pegs at minutes on multi-GB DBs. 
Two changes: per-conversation user-message buffer uses an index pointer instead of Array.shift() (which was O(n) per call); and a real ROWID cutoff via subquery limits the scan to the most recent 250k bubbles with a stderr warning so power users get a partial report rather than a stalled CLI. - Spawned codeburn CLI subprocesses are terminated when the calling Task is cancelled. Without this, rapid period/provider tab clicks in the menubar cancelled the Task but left the subprocess running to completion, piling up zombie processes. UX: - Dashboard period switch flips to loading and clears projects synchronously before reloadData runs, eliminating the frame where the new period label rendered over the old period's projects. - Optimize findings tab paginates 3-at-a-time with j/k scroll. With 4 new detectors plus 7 originals, 8-10 findings * 6 lines was scrolling the StatusBar off the alt buffer top. - Custom --from/--to ranges hide the period tab strip and disable the 1-5 / arrow keys so a stray period press no longer abandons the user's explicit range. A "Custom range: X to Y" banner replaces the tab strip. - OpenCode storage-format warning is per-table-set, rate-limited to once per process, and points the user at OpenCode's migration step or the issue tracker. The previous all-or-nothing check fired the generic "format not recognized" string for any schema mismatch. Menubar / OAuth: - Both Claude and Codex bootstrap (Reconnect button) now honour the usageBlockedUntil 429 backoff that refreshIfBootstrapped respects. Spamming Reconnect during sustained rate-limit windows previously hammered the upstream endpoint on every click. - Codex Retry-After HTTP header is parsed (delta-seconds plus IMF-fixdate fallback) so we don't over-back-off when ChatGPT tells us a shorter window than our 5-minute floor. 
- Both credential cache files are written via SafeFile.write (O_CREAT | O_EXCL | O_NOFOLLOW with explicit 0600) so there is no race window where the temp file briefly exists at default umask, and a symlink at the destination cannot redirect the write. Reads now route through SafeFile.read with a 64 KiB cap, closing the symlink-follow gap on Data(contentsOf:). CI signal: - TypeScript strict typecheck (tsc --noEmit) is now zero errors. The six errors in src/providers/copilot.ts came from a discriminated-union catch-all branch whose `data: Record<string, unknown>` shape TS picked over the specific event branches when narrowing on `type`. Removed the catch-all; runtime falls through unknown event types via the existing if/else chain. Tests added: 16 new (now 555 total) - date-range-filter: month/day/year overflow rejection, leap-day correctness - currency-rounding: convertCost no-rounding contract, roundForActiveCurrency for USD/JPY/KRW/EUR - providers/copilot: malformed toolRequests does not abort the parse - providers/cursor-bubble-dedup: re-parse after token mutation does not double-count, single parse yields one call per bubble - providers/codex: first event with cumulativeTotal=0 not dropped, consecutive zero-cumulative duplicates still deduped	2026-05-06 22:15:11 -07:00
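The month/day overflow rejection described in the commit above can be illustrated with a round-trip check. This is a minimal sketch, not the project's code: `parseLocalDateStrict` is a hypothetical stand-in for `parseLocalDate`, and the error wording is assumed.

```typescript
// Hypothetical stand-in for the parseLocalDate fix; the real signature and
// error text in codeburn may differ.
function parseLocalDateStrict(input: string): Date {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(input);
  if (!match) {
    throw new Error(`Invalid date "${input}": expected YYYY-MM-DD`);
  }
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);

  // new Date(2026, 1, 31) silently rolls forward into March, so build the
  // date and then confirm every component survived the round trip.
  const date = new Date(year, month - 1, day);
  if (
    date.getFullYear() !== year ||
    date.getMonth() !== month - 1 ||
    date.getDate() !== day
  ) {
    throw new Error(`Invalid date "${input}": day or month out of range`);
  }
  return date;
}
```

Under this check `2024-02-29` parses, while `2025-02-29` and `2026-02-31` throw instead of rolling over into March.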
Resham Joshi	afd0ee7011	Validator hardenings on the bug-hunt batch (#254) * Five correctness fixes from multi-agent bug hunt A multi-agent audit of the codeburn correctness surface found five real bugs each producing visibly wrong numbers or risking data loss. All five fixes were validated by parallel review agents and exercised end-to-end against real session data on this machine. - src/cli.ts: --refresh <seconds> was using bare parseInt as the commander callback. Commander invokes the callback as parseInt(value, previous), so previous becomes the radix: --refresh 30 was being parsed as parseInt('30', 30) = 90, and --refresh 60 became NaN. Replaced with parseInteger (already defined at line 48 with radix locked to 10) at all three sites. - src/providers/cursor.ts: parseAgentKv was timestamping every agentKv call as new Date().toISOString() because the Cursor SQLite schema has no per-message timestamp. Result: every Cursor agent call regardless of when it happened landed in today's date bucket. Now uses statSync(dbPath).mtimeMs as a bounded ceiling so calls land at the actual last-write time of the Cursor database, not today. Verified locally: a 1904-call Cursor history with March 22 mtime now correctly bucket into all-time only and shows 0 calls for today/week/30days. - src/providers/codex.ts: prev token counters were only updated inside the cumulative-fallback branch, so a session emitting N events with last_token_usage followed by one cumulative-only event computed the next delta against prev=0 and double-counted the entire cumulative window. Cost could be inflated 10-100x for any mixed-format Codex session. Now prev advances to the current cumulative state regardless of which branch ran. - src/providers/gemini.ts: totalOutput accumulated output+thoughts while totalThoughts was tracked separately. The result was outputTokens = output+thoughts AND reasoningTokens = thoughts; any consumer summing the two double-counted thoughts. 
Now totalOutput holds just output, reasoningTokens holds thoughts, and the cost calc folds thoughts into the output count to keep pricing correct (Google bills thoughts at the output rate; calculateCost has no reasoning parameter). - src/export.ts: exportJson had no safety check before writeFile, so codeburn export -f json -o ~/important.json would silently clobber the user's file. CSV path had a marker-file guard; JSON did not. Now refuses to overwrite a file unless its first 4KB contain the codeburn schema marker. Uses a streaming partial read so a large existing file does not OOM Node's ~512MB string limit. Refuses directories outright. Skipped intentionally: cursor-auto/copilot-auto/cline-auto/qwen-auto are aliased to claude-sonnet-4-5. The audit flagged this as wrong pricing for non-Anthropic auto-routed turns, but Cursor's "auto" mode does not expose the actual model and any alternative estimate is equally arbitrary. README already documents this as a Sonnet-based estimate. vitest run: 38 files, 529 tests pass. * Five more correctness fixes from the bug-hunt round This commit closes out the remaining critical-tier findings from the multi-agent audit, with one item documented as a known limitation. - src/providers/cursor.ts: bubble dedup key included mutable inputTokens/outputTokens. Cursor mutates token counts on the row in place when streaming completes, so re-parsing the same DB produced a fresh dedup key per bubble and silently double-counted. Switched to the SQLite row key (`bubbleId:<unique>`) which is stable per bubble. Adjusted BubbleRow type and BUBBLE_QUERY_BASE to expose `key as bubble_key`. - src/providers/pi.ts: usage fields were destructured non-optionally, but real Pi/OMP session files sometimes omit individual fields. `calculateCost(model, undefined, ...)` returned NaN, and that NaN propagated into every aggregate cost total. Coerce each field to 0 with `?? 0`. 
- src/models.ts: getShortModelName and the getModelCosts startsWith fallback both walked the dictionary in insertion order. A model id like `gpt-5-mini` could resolve to the entry for `gpt-5` (matched by startsWith first) and silently get GPT-5's display name and pricing tier. Iterate longest keys first so more-specific prefixes win. Tightened the cost fallback's match condition from `startsWith(key) || startsWith(key + '-')` to require either an exact match or a `key + '-'` continuation, removing accidental matches like `gpt-50` against `gpt-5`. - src/models.ts: calculateCost returned 0 silently for any model missing from the pricing snapshot. New Anthropic / OpenAI models shipped between snapshot refreshes look free until the user notices. Now warns once per unknown model name per process to stderr. Skips the warning for the `<synthetic>` placeholder so the noise floor stays low. - src/yield.ts: revert detection was broken on the canonical case. Two problems: (1) `subject.toLowerCase().includes('revert')` matched any commit whose subject mentioned the word ("Add revert button" was misclassified). (2) The window logic only counted reverts within the original session's 1-hour boundary, but real `git revert` commits land in later sessions, so original sessions always looked productive. Now: getRevertedShas runs once with `--grep=^This reverts commit` and parses bodies to build a Set of SHAs that were the target of a revert anywhere in history. CommitInfo.wasReverted is set when this commit's SHA appears in that set. categorizeSession then flags a session as reverted when its in-main commits were later reverted, regardless of when the revert itself happened. - src/providers/droid.ts: SKIPPED with comment. Droid records token usage only at session level. The current behavior splits evenly across emitted assistant calls and prices all of them at settings.model (the latest model). For sessions where the user switched models mid-stream, costs are approximate. 
Added an inline comment documenting this; a real fix requires per-message model data that isn't in the Droid JSONL schema. Verified end-to-end on this machine: - vitest run: 38 files, 529 tests pass - `codeburn report --format json` produces valid JSON - `codeburn yield -p week` runs without crashing, finds 0 reverts in the user's recent git history (plausible — fix changed the detection from "subject contains revert" to "this commit's SHA appears in a later 'This reverts commit ...' body") - Stderr now warns for unknown model ids: `openai/gpt-5.3`, `qwen3.6:35b-a3b-bf16`, `big-pickle`. These previously priced silently at $0. * Four high-severity fixes from the bug-hunt round - src/currency.ts: getExchangeRate wrapped fetchRate and cacheRate in one try/catch. If fetchRate succeeded but cacheRate threw (disk full, ENOSPC, no permissions on the cache dir), the catch block swallowed the error and returned 1. Every cost rendered after that point became USD-equivalent silently. Now the fetch and the cache write live in separate paths: a successful fetch returns the rate even if the persist fails, and the cache-write error is dropped to a fire-and-forget so transient disk problems do not corrupt the user's currency display. - src/cursor-cache.ts: writeFile was non-atomic. Two concurrent codeburn invocations writing to cursor-results.json could interleave bytes mid-write, leaving a truncated file that parsed-error on next read and forced a full SQLite re-scan every run. Switched to the temp-file + rename pattern with a randomized temp name so each writer gets its own staging file and the rename is atomic on POSIX. Crash mid-write also leaves only a leftover temp file, which gets unlinked in the catch path; the destination is never half-written. - mac/.../CodeBurnApp.swift refresh loop on sleep: the loop's Task.sleep keeps a wakeup pending across system sleep, so on wake the natural tick fires the same instant the wake observers do. 
Combined with didWakeNotification, screensDidWakeNotification, and the launchd com.codeburn.refresh distributed notification, that produced 2-3 concurrent CLI spawns within ms of every wake. Now: willSleepNotification cancels the loop task; didWakeNotification restarts it. The loop also reads lastRefreshTime and skips its natural tick if a wake/manual/distributed-notification refresh ran within the last 5 seconds, coalescing the two sources of refresh into one CLI spawn per wake event. - mac/.../CodeBurnApp.swift observeStore: the read closure had an implicit strong self capture (it accessed store.* without a capture annotation), pinning self for the lifetime of any unfired observation. Added [weak self] and a guard to make the capture explicit. withObservationTracking is one-shot per call, so there is at most one active subscription at a time; the earlier audit's claim of an unbounded leak overstated the issue, but tightening the capture pattern is still cleaner. Verified: - vitest run: 38 files, 529 tests pass - swift build -c release --arch arm64 --arch x86_64: clean, no diagnostics, no MainActor warnings - mac/Scripts/package-app.sh dev produces a valid universal bundle - Menubar launches and runs without crash * Eleven medium-severity fixes from the bug-hunt round - src/format.ts formatTokens: guard against Infinity, NaN, and negative input. Previously a corrupt aggregate could leak into the UI as the literal strings "NaN" or "Infinity". Negatives now render as "0" rather than "-500" with no scaling. - src/cli-date.ts parseDateRangeFlags: the missing-from default was new Date(0), which opened a 55-year scan from 1970 epoch whenever the user passed only --to. Default now anchors at 6 months back from now, matching the dashboard's all-time period. Test updated to assert the new bounded window. 
- src/cli-date.ts toPeriod: previously fell back silently to "week" for any unknown input, so a typo like `-p mounth` produced a quiet 7-day report while the user thought they were viewing the month. Now exits with a clear stderr error and exit code 1. Test updated to assert the loud-failure behavior. - src/optimize.ts urgencyScore: rebalanced weights so a high-impact finding with zero observed tokens cannot outrank a medium-impact finding with millions of tokens. Old 0.7/0.3 split made high+0 (0.70) beat medium+1B (0.65). New 0.5/0.5 split makes medium+1B (0.75) beat high+0 (0.50). Token normalization lifted to 5M so the ramp covers a realistic spend range. - src/models.ts calculateCost: clamp negative or non-finite token inputs to 0 before pricing. A corrupt JSONL emitting a negative count would otherwise produce a negative cost that silently subtracted from real spend in aggregates. - src/currency.ts convertCost: stop rounding during aggregation. For zero-fraction currencies (JPY, KRW, CLP) this clamped every per-session cost to a whole unit before sum, so a project of 1000 sessions averaging ¥0.4 each aggregated to ¥0 instead of ¥400. formatCost still rounds at the display boundary. - src/config.ts saveConfig: the temp file path was a fixed `${configPath}.tmp` suffix. Two simultaneous saveConfig calls (overlapping menubar and CLI runs) raced on the same staging file and could leave one writer reading partial bytes from the other. Randomized the temp suffix per call. - src/providers/antigravity.ts flushCache: the early return on `!cacheDirty` short-circuited eviction when liveCascadeIds was supplied but no cascade had been added or updated this run. As a result, deleted .pb files persisted in the cache forever once the user stopped writing to it. Eviction now runs whenever liveCascadeIds is provided, marks the cache dirty if anything was removed, and only then short-circuits if there is nothing to write. - src/daily-cache.ts addNewDays: cap retention at 2 years. 
The days array previously merged forever, growing the cache file by hundreds of bytes per day until JSON parse on every CLI invocation became measurable. The 6-month UI period plus the 365-day BACKFILL_DAYS bootstrap both fit comfortably inside the cap, with headroom for a future longer window. - src/dashboard.tsx useInput: period number keys (1-5) and arrow keys triggered a reload while the compare view was mounted. The parent's data state changed underneath the user with no visual affordance back to the dashboard. Now those keys are gated on view !== 'compare', and `b` / Esc inside compare returns to the dashboard. - mac/.../HeatmapSection.swift formatters: prettyDate, buildTrendBars, computeTrendStats, computeForecast, and computeAllStats each allocated a fresh DateFormatter (and Calendar) on every call. SwiftUI re-evaluates these views many times per second during hover scrubbing on the trend chart, so the allocations were a measurable hot spot. Lifted the yyyy-MM-dd / "EEE MMM d" / "MMM d" formatters and the gregorian Calendar to fileprivate cached singletons. Two findings from the same bucket were not addressed here: - UpdateChecker SHA-256 / codesign verification is already performed by src/menubar-installer.ts (verifyChecksum at line 85). The Swift side just kicks off `codeburn menubar --force` which runs that path. The audit's claim of missing verification was a misread. - NSDistributedNotificationCenter sender validation: the `com.codeburn.refresh` listener accepts from any sender, but forceRefresh has a 5-second rate-limit gate so the abuse ceiling is one CLI spawn per 5 seconds. Mitigations (Mach IPC, per-launch shared secret) are disproportionate to the impact. vitest run: 38 files, 529 tests pass. swift build -c release: clean, no warnings. * Validator hardenings on the bug-hunt batch Hoist the per-call sort in getModelCosts and getShortModelName to module scope so model lookups on the hot path stop reallocating sorted key arrays. 
Sanitize the unknown-model stderr warning by stripping C0/C1 controls and capping length, so a hostile or corrupt JSONL cannot inject terminal escape sequences via the model field. Skip the daily-cache prune when newestDate fails to parse. The previous code produced a NaN cutoff and silently dropped every cached day on the next merge. Adds tests locking down the stable resolution of common model names (gpt-5-mini vs gpt-5, claude-haiku-4-5 vs claude-3-5-haiku, etc.) and the prune NaN guard.	2026-05-06 19:50:40 -07:00
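The `--refresh` radix bug in the commit above is easy to reproduce in isolation. The sketch below assumes only the `(value, previous)` callback shape described in the message; `applyOption` is a hypothetical stand-in for the option parser's call, and this `parseInteger` is an illustrative equivalent of the helper the commit names, not a copy of it. Which `previous` value each real call site received is not stated in the log.

```typescript
// Shape of an option-parsing callback: receives the raw string and the
// previous (or default) value, returns the parsed value.
type OptionParser<T> = (value: string, previous: T) => T;

// Hypothetical stand-in for how an option parser invokes the callback.
function applyOption<T>(parser: OptionParser<T>, value: string, previous: T): T {
  return parser(value, previous);
}

// Illustrative equivalent of parseInteger: ignores `previous` and pins the
// radix to 10 so the second argument can never be read as a base.
const parseInteger: OptionParser<number> = (value) => Number.parseInt(value, 10);

// Passing the global parseInt directly lets `previous` leak in as the radix:
//   applyOption(parseInt, "30", 30)  -> 90   ("30" read in base 30)
//   applyOption(parseInt, "60", 60)  -> NaN  (radix must be between 2 and 36)
//   applyOption(parseInteger, "30", 30) -> 30
```

Wrapping the parser keeps the call site unchanged while making the second argument harmless.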
Resham Joshi	75d4701bd8	feat(optimize): flag low-worth expensive sessions Adds a low-worth detector to codeburn optimize that flags expensive sessions with weak delivery signals (no edits, repeated retries, or no one-shot edits) when no git/gh delivery command is observed. Priority order is low-worth → context-bloat → outliers; each later detector excludes sessions named by an earlier one so the same session is never listed in three findings. Detection: floor, for no-edit, 3+ retries, regex matches git commit/push and gh pr create/merge but excludes commit-tree/commit-graph and dry-run. Three impact tiers consistent with #246. Token-savings uses full session tokens for no-edit sessions and the retry fraction for edit-with-retry sessions. Supersedes #241 with review fixes. Original implementation by @ozymandiashh.	2026-05-06 00:35:41 -07:00
Resham Joshi	f92d57d24a	feat(optimize): detect context-heavy sessions Adds a context-bloat finding to codeburn optimize that flags sessions where effective input/cache tokens (cache-discounted via existing pricing constants) are large and disproportionate to output. Suggests starting fresh with a tightened context. Sessions flagged here are excluded from the cost-outlier finding to avoid double-listing. Growth-from-previous-session callouts are suppressed when the predecessor is more than 7 days back. Three impact tiers (low/medium/high). Supersedes #242 with review fixes from real-data probe. Original implementation by @ozymandiashh.	2026-05-06 00:11:12 -07:00
Resham Joshi	6151cf6d73	fix(parser): use Claude cwd for Windows project paths Reads the canonical cwd already stored inside Claude session JSONL files and uses it as the project path, then groups sessions by a normalized path key (case + slash insensitive) so Windows projects no longer split into 3+ rows on case/slash variants. Falls back to the legacy slug-derived path when cwd is missing. Closes #217. Supersedes #228 with a fix that preserves the canonical cwd even when mixed with slug-only sessions in the same directory. Original implementation by @ozymandiashh.	2026-05-05 23:53:31 -07:00
Resham Joshi	be6068b244	feat(report): add per-model efficiency metrics Adds per-model efficiency metrics (edit turns, one-shot rate, retries/edit, cost/edit) to the TUI By Model panel, JSON report output, and CSV export. Closes item 4 of #12. Supersedes #226 with review fixes (units rename, min-sample guard in TUI, tighter <synthetic> filter, multi-model attribution test). Original implementation by @ozymandiashh.	2026-05-05 23:36:59 -07:00
ozymandiashh	fc4c4f0091	feat(export): support custom date ranges	2026-05-05 23:18:48 -07:00
iamtoruk	38d21643bd	Merge origin/main into feat/session-outlier-detection	2026-05-04 20:21:26 -07:00
iamtoruk	5120ec696c	Merge origin/main into feat/mcp-tool-coverage	2026-05-04 20:13:15 -07:00
iamtoruk	735f41bc6c	Fix cache-write pricing and shell-quote server names in fix commands - Use 1.25x multiplier for cache-write tokens to match Anthropic's actual pricing (was incorrectly using 1x) - Shell-quote server names in `claude mcp remove` fix text to prevent issues with unusual server names	2026-05-04 20:11:50 -07:00
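For the shell-quoting half of the commit above, one common approach is POSIX single-quote escaping. The log does not say which quoting routine codeburn uses, so both `shellQuote` and `mcpRemoveCommand` below are hypothetical names for a sketch of the general technique.

```typescript
// POSIX single-quote escaping: wrap the value in '...' and rewrite each
// embedded single quote as '\'' (close quote, escaped quote, reopen quote).
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Hypothetical helper that builds the suggested fix text for a server name.
function mcpRemoveCommand(serverName: string): string {
  return `claude mcp remove ${shellQuote(serverName)}`;
}
```

With this, a server named `my server` yields `claude mcp remove 'my server'`, so spaces and shell metacharacters in the name are passed through as a single literal argument.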
ozymandiashh	d18ba3d2fe	feat(optimize): detect session cost outliers	2026-05-05 05:25:49 +03:00
ozymandiashh	9a258a8a99	fix(date-range): avoid all-period month overflow	2026-05-05 05:05:13 +03:00
ozymandiashh	1a080a006f	feat(optimize): MCP tool coverage detector with cache-aware costing Adds a per-tool optimizer finding for MCP servers whose schema is loaded on every turn but rarely invoked. Builds on the existing server-level `detectUnusedMcp` (zero invocations) by reporting partial-use cases: "loaded 54 tools, called 0" or "loaded 26 tools, called 2 (8% coverage)". Inventory comes from Claude Code's JSONL `attachment.deferred_tools_delta` entries: `addedNames` lists the exact tools available at that turn, including every fully-qualified `mcp__<server>__<tool>` name. We union across all delta entries in a session (not just the first) because tool availability can change mid-session when the user reloads MCP config or a subagent inherits a different tool set. Names that don't match the `mcp__<server>__<tool>` shape with both segments non-empty are rejected at extraction so downstream `split('__')` consumers can't be poisoned. Token-savings estimates are cache-aware. MCP tool schemas live in the cached prefix of the system prompt: a session pays the full input price on each cache-creation turn (rebuilds happen every ~5 minutes of inactivity) and the cache-read discount on subsequent turns. Each call's contribution is capped at its observed `cacheCreationInputTokens` / `cacheReadInputTokens` so we never claim more MCP overhead than the call's own cache buckets could contain. When multiple servers are flagged, costing happens in a single combined pass: the per-call cap applies to the total unused-schema budget across all flagged servers, not per server. Two flagged servers cannot both independently claim the same call's cache bucket, which would otherwise overstate `tokensSaved` and misclassify findings as high impact. A session counts toward `loadedSessions` (and toward the cost estimate) only if its observed inventory included the server. 
Pure invocation-only sessions, where the server appears in `mcpBreakdown` or `call.mcpTools` without any matching `deferred_tools_delta`, do not satisfy the `>= 2 sessions` threshold on their own. The same invariant applies in `estimateMcpSchemaCost` so the two passes agree. Coverage is computed against the inventory only: invocations of names not present in any observed inventory (older config, hallucinated tool, typo) do not inflate `toolsInvoked` and cannot drive `unusedCount` negative. `toolsInvoked` is derived as `inventory.size - unusedTools.length` to keep both numbers consistent. `detectUnusedMcp` and the new detector are explicitly disjoint: `detectUnusedMcp` skips servers that the coverage detector will report, not every server that happens to be in any inventory, so a small inventoried-but-uninvoked server below the coverage thresholds still gets flagged as "configured but never called." Thresholds for the coverage finding: - > 10 tools available (small servers are noise) - < 20% coverage - >= 2 sessions with observed inventory - High impact when total effective tokens >= 200_000 or >= 3 servers flagged Smoke-tested on a real account: 7 servers flagged across 93 sessions (`office-word-mcp` 0/54, `notebooklm-mcp` 0/38, `office-ppt-mcp` 0/37, `excel-mcp-server` 0/25, `github-mcp-server` 2/26, `peekaboo` 3/22, plus `claude_ai_Asana`). Combined-cap costing keeps `tokensSaved` honest. Changes: - src/types.ts: optional `mcpInventory: string[]` on `SessionSummary`. Provider-agnostic field; currently populated only by the Claude parser. - src/parser.ts: `extractMcpInventory` walks all entries, validates fully-qualified names, returns sorted unique list. `buildSessionSummary` passes it through; field is omitted when empty so JSON exports stay clean. - src/optimize.ts: `aggregateMcpCoverage`, `estimateMcpSchemaCost` (single- and multi-server signatures), `detectMcpToolCoverage`. Wired into `scanAndDetect`. 
`detectUnusedMcp` updated to disjoint with the new detector. - tests/mcp-coverage.test.ts: 23 cases covering aggregation, costing, combined-cap behaviour, threshold gates, invocation-only-session filtering, foreign-tool invocations, cache rebuild events, write+read on the same call, multi-server pluralisation. - tests/parser-mcp-inventory.test.ts: 12 cases for the JSONL extractor including malformed name rejection and tolerant attachment parsing. - CHANGELOG.md: entry under Unreleased / Added (CLI). Closes #2	2026-05-05 04:13:04 +03:00
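The coverage thresholds listed in the commit above (more than 10 tools, under 20% coverage, at least 2 sessions with observed inventory) can be sketched as a pure function. The `ServerCoverage` shape and `coverageFinding` name are assumptions for illustration; only the thresholds and the `inventory.size - unusedTools.length` derivation come from the message.

```typescript
// Hypothetical input shape; field names are illustrative, not codeburn's types.
interface ServerCoverage {
  inventory: Set<string>; // tools observed as available in some session
  invoked: Set<string>;   // tools actually called (may include foreign names)
  loadedSessions: number; // sessions whose observed inventory included the server
}

function coverageFinding(c: ServerCoverage) {
  // Only inventoried tools count, so foreign invocations cannot inflate
  // toolsInvoked or push the unused count negative.
  const unusedTools = Array.from(c.inventory).filter((tool) => !c.invoked.has(tool));
  const toolsInvoked = c.inventory.size - unusedTools.length;
  const coverage = c.inventory.size === 0 ? 0 : toolsInvoked / c.inventory.size;
  const flagged = c.inventory.size > 10 && coverage < 0.2 && c.loadedSessions >= 2;
  return { toolsInvoked, unusedCount: unusedTools.length, coverage, flagged };
}
```

A server with 26 inventoried tools, 2 of them called, across 2 or more sessions comes out at roughly 8% coverage and is flagged; an 8-tool server is never flagged by this rule regardless of usage.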
ozymandiashh	3dc3e32715	fix(date-range): unify 'all' period semantics between CLI and dashboard `getDateRange` was duplicated across `src/cli.ts` and `src/dashboard.tsx` with conflicting semantics for `'all'`. The CLI intentionally bounded `'all'` to the last 6 months (justified inline: keeps Codex/Cursor parses responsive on sparse multi-year history). The dashboard returned `new Date(0)` instead, so the same `--period all` flag silently meant two different windows depending on which entry point you hit. `Period`, `PERIODS`, `PERIOD_LABELS`, and `toPeriod` were duplicated as well, and `cli-date.ts` already existed for date helpers (`parseDateRangeFlags`) so the consolidation lives there. Both call sites now go through a single `getDateRange(period: string)` in `cli-date.ts` that returns `{ range, label }`. The dashboard wraps it as `getPeriodRange(period: Period)` to keep the strict `Period` type at the React boundary while letting the CLI continue to accept extras like `'yesterday'`. `PERIOD_LABELS.all` becomes `'6 Months'` (short, for the dashboard tab strip; the previous `'All Time'` was misleading and the long-form `'Last 6 months'` from `getDateRange().label` already drives CLI output). Changes: - src/cli-date.ts: add `Period`, `PERIODS`, `PERIOD_LABELS`, `toPeriod`, `getDateRange`. Pull the existing 6-month rationale into a named `ALL_TIME_MONTHS` constant. - src/cli.ts: drop the local copies and import from cli-date. - src/dashboard.tsx: drop the local copies, route through `getPeriodRange`, alias the shared `getDateRange` import to `getDateRangeShared` to avoid shadowing the wrapper. - tests/cli-date.test.ts: 13 cases covering `'all'` regression guard (must never silently fall back to `Date(0)`), CLI/dashboard agreement, end-of-month clamping tolerance, `'yesterday'` support, and unknown-input fallback. - README.md, CHANGELOG.md: surface the bound and point heavy users at `--from`/`--to` for unbounded windows. 
The CLI flag `--period all` continues to be accepted; only the dashboard window changes to match what the CLI was already doing. No public API or schema change. Refs #93	2026-05-05 03:53:46 +03:00
voidborne-d	c16b21ec50	fix(classifier): surface skill name as subCategory for general turns (#203) Turns whose only assistant tool is `Skill` collapse to category `general` because `classifyByToolPattern` returns `'general'` and `refineByKeywords` only operates on `coding`/`exploration`. In environments that lean on Claude Code skills, the per-activity dashboard column flattens — every `/init`, `/review`, `/security-review`, `/claude-api`, plus user-defined skills, all land in `general` with no signal about which workflow ran. Implements Option A from the issue: - `ParsedApiCall.skills: string[]` populated in the Anthropic-path parser via a new `extractSkillNames` helper that reads `input.skill || input.name` from each `Skill` ToolUseBlock (mirrors `detectGhostSkills` extraction at optimize.ts:765 so the two stay in sync). - `ClassifiedTurn.subCategory?: string` set to the first skill name when the resolved category is `general` AND any skill identifier was extracted. Top-level category stays `general` — existing aggregations, exports, and category-keyed code paths unchanged. - `SessionSummary.skillBreakdown: Record<string, {turns,costUSD,editTurns, oneShotTurns}>` populated in the same per-turn loop that builds `categoryBreakdown`. Provider sessions (Codex/Cursor/etc.) keep `skills: []` — they don't expose the Skill tool surface today. - Dashboard `ActivityBreakdown` renders top-N skill sub-rows beneath the `general` row when present (indented `/skill-name`, dimmed). Other categories render exactly as before; if no skills were invoked, the panel is byte-identical to current output. Existing 419 tests still pass. 
New `tests/classifier.test.ts` adds 8 cases: single skill via `input.skill`, single via `input.name`, first-wins for multi-skill turns, aggregation across multiple assistant calls in one turn, no-name fallback (`subCategory` stays undefined), `Skill+Edit` promoting to `coding` and dropping subCategory, non-Skill general turns, and a legacy ParsedApiCall shape with `skills` field absent (forward-compat). Pre-fix verification by stashing the source change reproduces 4/8 failures with the exact "expected 'init', received undefined" diff; restoring → 8/8 pass. Closes #203. 🤖 AI assistance disclosure: assistant-scaffolded by Claude (Opus 4.7); author of record reviewed every line, ran the full vitest suite locally (`npm test` → 32 files / 427 tests pass), `npx tsc --noEmit` clean, and `npm run build` produces a clean ESM bundle.	2026-05-04 06:26:45 +08:00
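The skill-name extraction and first-wins `subCategory` rule from the commit above might look roughly like this. The `ToolUseBlock` shape and the `subCategoryFor` helper are assumptions made for the sketch; the `input.skill || input.name` lookup and the `general`-only rule are taken from the message.

```typescript
// Assumed minimal shape of a tool-use block; the real type has more fields.
interface ToolUseBlock {
  name: string;
  input: Record<string, unknown>;
}

// Collects skill identifiers from Skill tool calls, preferring input.skill
// and falling back to input.name. Blocks without a usable name are skipped.
function extractSkillNames(blocks: ToolUseBlock[]): string[] {
  const names: string[] = [];
  for (const block of blocks) {
    if (block.name !== "Skill") continue;
    const candidate = block.input["skill"] || block.input["name"];
    if (typeof candidate === "string" && candidate.length > 0) {
      names.push(candidate);
    }
  }
  return names;
}

// Hypothetical helper: the first skill wins, and only when the resolved
// category is still `general`.
function subCategoryFor(category: string, skills: string[]): string | undefined {
  return category === "general" && skills.length > 0 ? skills[0] : undefined;
}
```

A turn that mixes `Skill` with an edit tool resolves to `coding`, so `subCategoryFor` returns `undefined` for it, matching the `Skill+Edit` test case described above.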
Nihal Jain	791f2b077d	Add gpt-5.5 model display name for Codex	2026-05-02 08:57:44 -07:00
ozymandiashh	ff8b20a79e	review: drop streamError flag, add multi-chunk and torn-write tests - Stop tracking a separate streamError flag. createReadStream's default 64 KiB highWaterMark means the stream may already be reading chunk 2 when we break out of the loop after yielding the first line; if that later chunk errors, the flag could reject an otherwise-valid line. readline's async iterator already re-throws stream errors on Node 16+, which the existing catch handles. - Test: 120 KB session_meta line forces multi-chunk line assembly. - Test: truncated mid-write first line is rejected, not parsed as half an object.	2026-05-02 02:34:41 +03:00
ozymandiashh	98bbe5b678	review: cap first-line read size and add edge-case tests - Cap createReadStream at 1 MiB so a malformed file with no newline cannot make readline buffer indefinitely (real session_meta lines are 22-27 KB). - Capture stream errors explicitly; readline's async iterator does not always re-throw underlying stream errors per Node docs. - Test: assert project is extracted from the >16 KB session_meta to prove the line was actually parsed, not just discovered. - Test: session_meta line with no trailing newline is still accepted. - Test: empty rollout file is silently skipped.	2026-05-02 02:30:17 +03:00
ozymandiashh	945da9f0ba	fix(codex): read full first line for session validation `readFirstLine` allocated a fixed 16 KB buffer, but Codex CLI 0.128+ embeds the entire base_instructions / system prompt in the `session_meta` line, pushing it past 20 KB. When the buffer doesn't catch a newline, `isValidCodexSession` rejects the session, so every recent Codex session is silently excluded from totals. Switch to a streaming readline read so the first line is captured regardless of length, and add a regression test that creates a 40 KB session_meta payload. Locally, this changes my 30-day Codex total from €267 (only ~half of sessions parsed) to €878 (all sessions parsed).	2026-05-02 02:17:53 +03:00
AgentSeal	888f592bd2	Make daily cache durable: hydrate from all commands, migrate instead of nuke - Extract ensureCacheHydrated() from menubar-json path into daily-cache.ts - Call it from every command that parses sessions (report, status, today, month, export, optimize, compare, yield) so CLI-only users also persist historical data that survives source file deletion - Replace strict version equality check with fill-defaults migration for cache versions 2-4, preserving history across schema changes - Back up old cache to .bak before discarding on unmigrateable versions - Fix Copilot auto bucket display names in menubar (Copilot (Anthropic), Copilot (OpenAI)) - Fix Roo Code / KiloCode provider key matching in menubar tab strip	2026-04-28 22:41:01 +02:00
Resham Joshi	fbb2c4e69c	Merge pull request #171 from ksp2000/feature/copilot-auto-model-buckets refactor(copilot): use auto model buckets for transcript inference	2026-04-28 12:17:50 -07:00
Dunccan de Weerdt	26ebe75aa1	Add Droid CLI provider Discovers and parses sessions from ~/.factory/sessions/, reading JSONL message logs and companion settings.json files for token usage tracking. - Discovers sessions by scanning per-cwd subdirectories - Skips internal .factory housekeeping sessions - Extracts tools, bash commands, and user messages from JSONL - Distributes session-level cumulative token counts across calls - Normalizes Droid model wrappers before existing pricing lookup - Derives clean project names from cwd paths - Adds menubar provider filtering for Droid	2026-04-28 20:16:45 +02:00
AgentSeal	d043795855	Add Qwen provider and replace hardcoded pricing with LiteLLM snapshot - Add Qwen CLI provider (discovers sessions from ~/.qwen/projects/) - Replace FALLBACK_PRICING (40 hand-maintained entries) with auto-generated LiteLLM snapshot (3595 models including Azure, OpenRouter pricing) - Build script fetches and bundles LiteLLM data before tsup - Provider-prefixed lookups (azure/, openrouter/) resolve to correct pricing - Add display names for all GPT-5.x model variants - Add Qwen to menubar provider filter and tab strip	2026-04-28 19:49:14 +02:00
Resham Joshi	ec2de6a642	Add OpenClaw, Roo Code, and KiloCode providers (#175) - OpenClaw: JSONL parser with multi-path discovery, tool extraction (toolCall + tool_use block types), model tracking via model_change and custom model-snapshot events - Roo Code + KiloCode: shared Cline-family parser extracts model from <model> tags in api_conversation_history.json, strips provider prefixes from model names - Add cline-auto and openclaw-auto aliases and display names - Add menubar provider filters and tab colors for all three - Show cached data instantly instead of blocking on CLI refresh	2026-04-28 09:24:14 -07:00
saipraneeth.konda	74c1c4b4c1	refactor(copilot): use auto model buckets for transcript inference	2026-04-28 19:32:03 +05:30
Resham Joshi	6d15ea43a5	Add Gemini CLI provider for session tracking (#168) Parse ~/.gemini/tmp/<project>/chats/session-*.json files from Gemini CLI 0.38+. Uses real token counts (input, output, cached, thoughts) embedded in each message instead of character estimation. Correctly separates cached tokens from fresh input to avoid double-charging. - Pricing for gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-2.5-pro, gemini-2.5-flash from official Google API rates - Tool name normalization (ReadFile->Read, SearchText->Grep, etc.) - Menubar tab with Google Blue color (#4485F4) Closes #166	2026-04-27 19:48:25 -07:00
Resham Joshi	f7f64a01ab	Add new providers, fix menubar tabs, accent color picker (#167) * Add Kiro provider and transparent auto-model naming - Add Kiro IDE provider: parses .chat JSON files, estimates tokens, normalizes dot-versioned model IDs for cost lookup - Show "Cursor (auto)", "Copilot (auto)", "Kiro (auto)" in CLI dashboard instead of pretending to know which model was used - Route auto model names through BUILTIN_ALIASES for cost estimation * Fix menubar tabs: add missing providers, show period-scoped costs - Add Kiro, OMP to ProviderFilter enum so installed providers appear as tabs - Merge Cursor + Cursor Agent into single Cursor tab - Tab costs now reflect the selected period (7d/30d/month/all) instead of always showing today - Tab visibility still uses today's provider list so tabs don't disappear when switching to periods with no data * Add accent color picker to menubar with Apple system presets - 9 presets using Apple's exact macOS dark-mode accent colors (Ember, Blue, Purple, Pink, Red, Orange, Yellow, Green, Graphite) - Color picker in header, persisted via UserDefaults - "Burn" text stays fixed ember regardless of accent - ThemeState is MainActor-isolated for thread safety - Picker state lifted to AppStore so it survives .id() tree rebuild - Accessibility labels on all color swatches - Renamed brandAccentDark/brandEmberDeep/brandEmberGlow to match their actual light/deep/glow semantics * Fix review findings: case-sensitive cost lookup, Kiro timestamp guard, cache versioning - Normalize provider dictionary keys to lowercase in tab cost lookup so "Cursor Agent" (title-case from CLI) matches providerKeys - Guard against missing/invalid/epoch startTime in Kiro parser to prevent RangeError crash or 1970-01-01 ghost entries - Bump DAILY_CACHE_VERSION to 4 so upgraded users get a clean recompute with the new auto-model naming (cursor-auto vs default) - Add version field to cursor-results.json cache to invalidate stale entries that still use the old 'default' model name	2026-04-27 19:46:30 -07:00
Resham Joshi	5d1b335c0a	Fix Copilot provider to read VS Code workspace transcripts (#165) The Copilot provider only looked in ~/.copilot/session-state/ which is from an older CLI tool. VS Code Copilot agent stores transcripts in ~/Library/Application Support/Code/User/workspaceStorage/*/GitHub.copilot-chat/transcripts/. The new transcript format has no outputTokens or model_change events, so tokens are estimated from content length and the model is inferred from tool call ID prefixes. Both legacy and VS Code paths are now scanned in parallel. Fixes #161	2026-04-27 19:44:35 -07:00
AgentSeal	410fad9495	Fix Cursor provider reporting $0 for v3 bubble format and NULL createdAt rows Cursor v3 stores zero token counts in bubbles, causing parseBubbles to return empty results. The query also dropped rows with NULL createdAt via the SQL comparison, hiding data from older Cursor versions too. Changes: - Remove inputTokens > 0 SQL filter, estimate tokens from text length when token counts are zero (same 4 chars/token ratio as agentKv) - Include NULL createdAt rows with OR IS NULL, fall back to current timestamp when createdAt is missing - Parse agentKv entries with plain string content instead of skipping them (not all content is a JSON array) - Always parse both bubbles and agentKv instead of agentKv-only fallback - Discover subagent transcripts in subagents/ subdirectories - Fix timezone-dependent test in day-aggregator Fixes #159, #163	2026-04-28 00:35:51 +02:00
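The zero-count and NULL-timestamp fallbacks from commit 410fad9495 might be sketched as follows. The 4 chars/token ratio and the "fall back to current timestamp" rule are from the commit message; the function names, the rounding direction (`Math.ceil`), and the numeric timestamp type are assumptions for illustration only.

```typescript
// Ratio stated in the commit message (same as the agentKv path).
const CHARS_PER_TOKEN = 4;

// Estimate a token count from text length. Rounding up is an assumption.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Prefer the recorded count; estimate only when it is zero or missing,
// which is what Cursor v3 bubbles reportedly contain.
function resolveTokens(recorded: number | null | undefined, text: string): number {
  return recorded && recorded > 0 ? recorded : estimateTokens(text);
}

// Rows with NULL createdAt are kept and stamped with the current time.
// `now` is injectable so the behaviour can be tested deterministically.
function resolveTimestamp(createdAt: number | null | undefined, now: number = Date.now()): number {
  return createdAt ?? now;
}
```

Stamping missing timestamps with the current time keeps the row in the totals, at the cost of attributing old usage to the day the report was run.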
Łukasz Majcher	5e49f17e64	fix: switch scanJsonlFile and parseSessionFile to readSessionLines to prevent OOM readViaStream (used for files ≥8 MB) reconstructs the full file as a single string via chunks.join('\n'), giving the same peak allocation as readFile. Callers then call content.split('\n'), creating a second copy. With FILE_READ_CONCURRENCY=16 and files up to 128 MB this can exhaust the V8 heap (~6 GB theoretical peak). readSessionLines already exists as a proper async generator that yields one line at a time. Switch both hot-path callers to iterate it directly so the full file string is never held in memory. Adds two tests: a spy test confirming readSessionLines is called (not readSessionFile), and a 500-entry correctness test. Fixes #131	2026-04-22 10:11:13 +00:00
iamtoruk	4f1138290e	Merge main into feat/omp-support-model-aliases Second merge of main since the PR was opened. Main moved 30+ commits (0.8.5 bump, plan tracking feature, MiniMax pricing, menubar prefetchAll walk-back, aicrowd cache rewrite revert) so the branch needed another reconciliation before merging to main. Two new conflicts resolved. Took main's text in both cases per the policy of favoring main when the feature work is neutral: README.md Kept main's Node 20+ / better-sqlite3 Requirements wording and main's shorter src/ tree listing. Added OMP to the Requirements line. src/providers/pi.ts Main dropped the discovery-cache snapshot and the rich source-metadata fields as part of the aicrowd revert. Took main's simpler structure and only kept the providerName parameter so OMP sources still report the correct provider in the session source and dedup key. Earlier fixups carried forward from the prior merge commit: - Object.hasOwn guards in resolveAlias against prototype-pollution via a model literally named '__proto__'. - source.provider in the dedup key prefix so OMP rows no longer stamp 'pi:'. - Combined pi.js imports in providers/index.ts. - Trailing newline on pi.ts. - Unknown-model fallback in cursor-agent.ts from yesterday's PR #117 fixup (preserved via main). 353 tests pass (count dropped from 378 because main deleted the parse-progress / parser-cache / provider-colors / source-cache test files alongside the cache-rewrite revert). Feature work by @cgrossde.	2026-04-21 11:51:20 -07:00
iamtoruk	81b5cda173	feat: add MiniMax-M2.7 and MiniMax-M2.7-highspeed model pricing Adds FALLBACK_PRICING entries plus display names so MiniMax sessions show up with the right cost and readable labels when users route MiniMax through providers like OpenCode. Pricing verified against the live MiniMax paygo page: MiniMax-M2.7 input $0.3/M output $1.2/M cache-read $0.06/M cache-write $0.375/M MiniMax-M2.7-highspeed input $0.6/M output $2.4/M cache-read $0.06/M cache-write $0.375/M	2026-04-21 05:50:52 -07:00
iamtoruk	5a4e14f0f6	docs: remove stray Unreleased heading, note removed --no-cache flag Also adds a regression test for the midnight-straddle bucketing invariant that was flagged by the pre-push review: if someone reverts the assistant- timestamp bucketing back to user-timestamp, this test will catch it.	2026-04-21 04:54:50 -07:00
iamtoruk	68e9c63088	fix(cursor-agent): drop unused SessionSource fields reintroduced by revert cursor-agent was authored on top of the Sharada cache rewrite and referenced fingerprintPath, cacheStrategy, progressLabel, and parserVersion. With the persistent source cache reverted, these fields no longer exist on SessionSource. Strip the references; cursor-agent continues to work on the v0.8.1 discover + parse path like every other provider.	2026-04-21 04:23:20 -07:00
iamtoruk	0725fe2fbb	fix(cursor-agent): preserve raw model name for unknown Cursor models The fallback path in modelDisplayName returned "Auto (Sonnet est.) (est.)" for any model not listed in modelDisplayNames, double-tagging the est. suffix and hiding the real model ID. New Cursor model IDs now surface as their raw name with a single (est.) suffix until the display map is updated. Adds a regression test.	2026-04-21 04:21:06 -07:00
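A minimal sketch of the fallback behaviour described in commit 0725fe2fbb, assuming a simple lookup map. `modelDisplayName` and `modelDisplayNames` are named in the commit; the map contents and the suffix-guard logic here are hypothetical.

```typescript
// Hypothetical display map; the real entries are not shown in the log.
const modelDisplayNames = new Map<string, string>([
  ['example-known-model', 'Example Known Model'],
]);

const EST_SUFFIX = ' (est.)';

// Known models get their display name; unknown models keep their raw ID.
// Either way the label carries exactly one "(est.)" suffix.
function modelDisplayName(modelId: string): string {
  const base = modelDisplayNames.get(modelId) ?? modelId;
  return base.endsWith(EST_SUFFIX) ? base : base + EST_SUFFIX;
}
```

Checking for the suffix before appending it is what prevents the double-tagged "(est.) (est.)" label the commit reports.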
Matt Van Horn	620ca32219	feat(cursor-agent): add provider for cursor-agent CLI sessions Discovers transcripts at ~/.cursor/projects//agent-transcripts/.txt and joins against ~/.cursor/ai-tracking/ai-code-tracking.db for model attribution. Token counts are estimated from transcript character length since the attribution DB does not carry them; the model label surfaces the estimation with an (est.) suffix on every row. Deduplication keys prefix cursor-agent: to stay disjoint from the existing cursor: prefix so the two providers do not cross-dedupe on shared conversationId namespaces. Tests cover: empty ~/.cursor/projects/, single transcript, multiple projects, missing ai-code-tracking.db, unrecognized transcript format skip, non-UUID filename fallback, and sqlite metadata join. Closes #55	2026-04-21 04:21:01 -07:00
Trevin Chow	3f7470d29b	feat(plan): subscription plan tracking with usage progress bar Adds `codeburn plan set <id>` to configure a subscription plan (Claude Pro, Claude Max, Cursor Pro, or custom). When set, the Overview panel renders an API-equivalent progress bar against subscription price with a projected month-end cost. Closes the loudest demand signal on the repo: issue #11 ("Subscription vs API Use") from two independent voices, plus the routing-decision use case raised in #12. - src/config.ts: extends CodeburnConfig with Plan, adds readPlan/savePlan/clearPlan - src/plans.ts: presets (claude-pro $20, claude-max $200, cursor-pro $20) - src/plan-usage.ts: getPlanUsage, resetDay-aware period math (1-28), median-of-7-day-trailing projection - src/cli.ts: `codeburn plan [show|set|reset]` subcommand, plan wired into JSON outputs for report/today/month/status (only when active) - src/dashboard.tsx: Plan row in Overview, color-coded (green under 80%, orange near, red over), with days-until-reset - README.md: Plans section with honest framing (API-equivalent vs subscription price, not token allowance) - tests/plan-usage.test.ts, tests/plans.test.ts, tests/cli-plan.test.ts: period math, presets, CLI round-trip Resets respect resetDay across month boundaries. Uses median daily spend (not mean) so one huge day doesn't distort the month-end projection. Fixes #11	2026-04-21 04:20:50 -07:00
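The median-based projection mentioned in commit 3f7470d29b can be illustrated with the sketch below. The commit only states that the projection uses the median of the trailing 7 days of spend; the function names, the input shape, and the "spent so far plus median times remaining days" formula are assumptions, not the actual `getPlanUsage` implementation.

```typescript
// Median of a list of numbers; returns 0 for an empty list.
function median(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical projection: spend so far plus the median of the last
// seven daily totals multiplied by the days left in the period.
function projectPeriodEnd(spentSoFar: number, dailySpend: number[], daysRemaining: number): number {
  const trailing = dailySpend.slice(-7);
  return spentSoFar + median(trailing) * daysRemaining;
}
```

With daily spend `[1, 2, 3, 4, 5, 6, 100]`, the median is 4 while the mean is about 17.3, so a single outlier day shifts a mean-based projection far more than a median-based one.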
iamtoruk	8e39a89fe0	fix: pricing accuracy, stream leak, CSV injection hardening - Remove bidirectional fuzzy match in getModelCosts that could return wrong pricing when a short canonical name prefix-matched a longer key - Use explicit undefined check in parseLiteLLMEntry so free models with zero cost are not silently dropped from the LiteLLM pricing database - Destroy read stream in finally block of readSessionLines to prevent file descriptor leaks when the generator is abandoned early - Extend CSV injection escaping to cover tab and carriage-return prefixes - Add optional chaining fallback for empty periods in exportCsv/exportJson - Add regression tests for all fixes (models, export, fs-utils)	2026-04-21 04:20:46 -07:00
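For the CSV injection hardening in commit 8e39a89fe0, one common approach is sketched here. The commit confirms only that tab and carriage-return prefixes are now covered; the single-quote prefix (a widely used mitigation) and the `escapeCsvCell` name are assumptions and may differ from what `exportCsv` actually does.

```typescript
// Leading characters that spreadsheet apps may interpret as a formula.
// Tab and carriage return are the two prefixes the commit adds.
const FORMULA_PREFIXES = ['=', '+', '-', '@', '\t', '\r'];

// Hypothetical cell escaper: neutralise formula prefixes with a leading
// apostrophe, then apply standard CSV quoting where needed.
function escapeCsvCell(value: string): string {
  const guarded = FORMULA_PREFIXES.some((p) => value.startsWith(p)) ? `'${value}` : value;
  // Quote cells containing a comma, quote, CR, or LF; double any quotes.
  return /[",\r\n]/.test(guarded) ? `"${guarded.replace(/"/g, '""')}"` : guarded;
}
```

A side effect of this style of guard is that legitimate values starting with `-`, such as negative numbers passed as strings, are also prefixed, so numeric columns are usually better written without going through the text escaper.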
iamtoruk	c2ab80d6e2	Merge main into feat/omp-support-model-aliases Brings the PR branch up to the current main so the OMP provider and the model-alias command can land cleanly. Resolves six merge conflicts and applies a handful of small fixups alongside the resolution so the feature matches the conventions set by the cursor-agent merge earlier today. Conflict resolutions: README.md Combine cursor-agent and OMP rows in provider list, Requirements, and data-location table; take main's Node 22+ and node:sqlite text. src/cli.ts Keep both new commands: model-alias and plan. src/config.ts Add modelAliases alongside plan on the config type. src/providers/index.ts Keep the cursor-agent lazy-loader from main and add omp to coreProviders. Fold the two pi-module imports into one statement. src/providers/pi.ts Keep the discovery-cache snapshot path from main and the providerName parameterization from the PR. Propagate providerName through saveDiscoveryCache, loadDiscoveryCache, the parserVersion tag, and the dedup key prefix so OMP sources no longer stamp 'pi:' inside their cache entries or dedup keys. tests/models.test.ts Keep main's pricing-and-short-name tests and add the PR's alias tests alongside, sharing a single loadPricing setup and an afterEach alias reset. Fixups in the same commit: src/models.ts Replace ?? chain in resolveAlias with Object.hasOwn checks. The previous form returned Object.prototype for a model named '__proto__' and broke downstream canonical.startsWith calls. Caught by the existing prototype-pollution test suite. src/providers/pi.ts Use source.provider in the dedup key prefix and add a trailing newline to the file. tests/providers/omp.test.ts Expect 'omp:' in the dedup key for OMP sources, matching the fix above. Feature work by @cgrossde.	2026-04-21 03:16:28 -07:00
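The `resolveAlias` fixup in commit c2ab80d6e2 guards against a model literally named `__proto__`. The sketch below shows the idea under stated assumptions: the alias tables, their contents, and the user-over-builtin precedence are hypothetical, and the local `hasOwn` helper uses `Object.prototype.hasOwnProperty.call` as a stand-in for the `Object.hasOwn` call the commit names, so the example also compiles on older targets.

```typescript
// Hypothetical alias tables; real entries are not shown in the log.
const BUILTIN_ALIASES: Record<string, string> = {
  'cursor-auto': 'hypothetical-canonical-model',
};
const userAliases: Record<string, string> = {};

// Own-property check, equivalent in effect to Object.hasOwn(obj, key).
function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// A plain `userAliases[model] ?? BUILTIN_ALIASES[model] ?? model` chain
// returns Object.prototype for '__proto__'. Checking own properties
// first guarantees the result is always a string.
function resolveAlias(model: string): string {
  if (hasOwn(userAliases, model)) return userAliases[model];
  if (hasOwn(BUILTIN_ALIASES, model)) return BUILTIN_ALIASES[model];
  return model;
}
```

Because the result is always a string, downstream calls such as `canonical.startsWith(...)` no longer fail on inherited properties.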
iamtoruk	ed5512144a	fix(cursor-agent): preserve raw model name for unknown Cursor models The fallback path in modelDisplayName returned "Auto (Sonnet est.) (est.)" for any model not listed in modelDisplayNames, double-tagging the est. suffix and hiding the real model ID. New Cursor model IDs now surface as their raw name with a single (est.) suffix until the display map is updated. Adds a regression test.	2026-04-20 19:20:15 -07:00
Matt Van Horn	554036d2a7	feat(cursor-agent): add provider for cursor-agent CLI sessions Discovers transcripts at ~/.cursor/projects//agent-transcripts/.txt and joins against ~/.cursor/ai-tracking/ai-code-tracking.db for model attribution. Token counts are estimated from transcript character length since the attribution DB does not carry them; the model label surfaces the estimation with an (est.) suffix on every row. Deduplication keys prefix cursor-agent: to stay disjoint from the existing cursor: prefix so the two providers do not cross-dedupe on shared conversationId namespaces. Tests cover: empty ~/.cursor/projects/, single transcript, multiple projects, missing ai-code-tracking.db, unrecognized transcript format skip, non-UUID filename fallback, and sqlite metadata join. Closes #55	2026-04-20 17:49:45 -07:00
Sharada Mohanty	7594fa0254	feat: optimize parse caching across providers	2026-04-21 00:07:07 +02:00
Sharada Mohanty	563f9c4f1b	refactor: share provider presentation metadata	2026-04-21 00:04:29 +02:00
Sharada Mohanty	140e50b702	test: stabilize local-date aggregation	2026-04-21 00:03:49 +02:00
Sharada Mohanty	ff442c71f2	perf: cache provider discovery metadata	2026-04-21 00:03:49 +02:00
Sharada Mohanty	2a9daec0ea	feat: add cache rebuild flag and progress	2026-04-21 00:03:49 +02:00
Sharada Mohanty	1b8e0f8289	fix: harden Claude append cache refresh	2026-04-21 00:01:46 +02:00
Sharada Mohanty	ad5366472a	feat: cache Claude sources by session file	2026-04-21 00:01:46 +02:00
Sharada Mohanty	862be251e5	refactor: move providers onto shared cache metadata	2026-04-21 00:01:46 +02:00
Sharada Mohanty	303a9256c5	feat: reuse cached parsed sources	2026-04-21 00:01:46 +02:00


101 commits