Validator hardenings on the bug-hunt batch (#254)

* Five correctness fixes from multi-agent bug hunt

A multi-agent audit of the codeburn correctness surface found five real bugs, each producing visibly wrong numbers or risking data loss. All five fixes were validated by parallel review agents and exercised end-to-end against real session data on this machine.

- src/cli.ts: --refresh <seconds> was using bare parseInt as the commander callback. Commander invokes the callback as parseInt(value, previous), so previous becomes the radix: --refresh 30 was being parsed as parseInt('30', 30) = 90, and --refresh 60 as parseInt('60', 30) = 180. Replaced with parseInteger (already defined at line 48 with radix locked to 10) at all three sites.
- src/providers/cursor.ts: parseAgentKv was timestamping every agentKv call as new Date().toISOString() because the Cursor SQLite schema has no per-message timestamp. Result: every Cursor agent call, regardless of when it happened, landed in today's date bucket. Now uses statSync(dbPath).mtimeMs as a bounded ceiling so calls land at the actual last-write time of the Cursor database, not today. Verified locally: a 1904-call Cursor history with March 22 mtime now correctly buckets into all-time only and shows 0 calls for today/week/30days.
- src/providers/codex.ts: prev token counters were only updated inside the cumulative-fallback branch, so a session emitting N events with last_token_usage followed by one cumulative-only event computed the next delta against prev=0 and double-counted the entire cumulative window. Cost could be inflated 10-100x for any mixed-format Codex session. Now prev advances to the current cumulative state regardless of which branch ran.
- src/providers/gemini.ts: totalOutput accumulated output+thoughts while totalThoughts was tracked separately. The result was outputTokens = output+thoughts AND reasoningTokens = thoughts; any consumer summing the two double-counted thoughts. Now totalOutput holds just output, reasoningTokens holds thoughts, and the cost calc folds thoughts into the output count to keep pricing correct (Google bills thoughts at the output rate; calculateCost has no reasoning parameter).
- src/export.ts: exportJson had no safety check before writeFile, so codeburn export -f json -o ~/important.json would silently clobber the user's file. CSV path had a marker-file guard; JSON did not. Now refuses to overwrite a file unless its first 4KB contain the codeburn schema marker. Uses a streaming partial read so a large existing file does not OOM Node's ~512MB string limit. Refuses directories outright.

Skipped intentionally: cursor-auto/copilot-auto/cline-auto/qwen-auto are aliased to claude-sonnet-4-5. The audit flagged this as wrong pricing for non-Anthropic auto-routed turns, but Cursor's "auto" mode does not expose the actual model and any alternative estimate is equally arbitrary. README already documents this as a Sonnet-based estimate.

vitest run: 38 files, 529 tests pass.

* Five more correctness fixes from the bug-hunt round

This commit closes out the remaining critical-tier findings from the multi-agent audit, with one item documented as a known limitation.

- src/providers/cursor.ts: bubble dedup key included mutable inputTokens/outputTokens. Cursor mutates token counts on the row in place when streaming completes, so re-parsing the same DB produced a fresh dedup key per bubble and silently double-counted. Switched to the SQLite row key (`bubbleId:<unique>`) which is stable per bubble. Adjusted BubbleRow type and BUBBLE_QUERY_BASE to expose `key as bubble_key`.
- src/providers/pi.ts: usage fields were destructured non-optionally, but real Pi/OMP session files sometimes omit individual fields. `calculateCost(model, undefined, ...)` returned NaN, and that NaN propagated into every aggregate cost total. Coerce each field to 0 with `?? 0`.
- src/models.ts: getShortModelName and the getModelCosts startsWith fallback both walked the dictionary in insertion order. A model id like `gpt-5-mini` could resolve to the entry for `gpt-5` (matched by startsWith first) and silently get GPT-5's display name and pricing tier. Iterate longest keys first so more-specific prefixes win. Tightened the cost fallback's match condition from `startsWith(key) || startsWith(key + '-')` to require either an exact match or a `key + '-'` continuation, removing accidental matches like `gpt-50` against `gpt-5`.
- src/models.ts: calculateCost returned 0 silently for any model missing from the pricing snapshot. New Anthropic / OpenAI models shipped between snapshot refreshes look free until the user notices. Now warns once per unknown model name per process to stderr. Skips the warning for the `<synthetic>` placeholder so the noise floor stays low.
- src/yield.ts: revert detection was broken on the canonical case. Two problems: (1) `subject.toLowerCase().includes('revert')` matched any commit whose subject mentioned the word ("Add revert button" was misclassified). (2) The window logic only counted reverts within the original session's 1-hour boundary, but real `git revert` commits land in later sessions, so original sessions always looked productive. Now: getRevertedShas runs once with `--grep=^This reverts commit` and parses bodies to build a Set of SHAs that were the target of a revert anywhere in history. CommitInfo.wasReverted is set when this commit's SHA appears in that set. categorizeSession then flags a session as reverted when its in-main commits were later reverted, regardless of when the revert itself happened.
- src/providers/droid.ts: SKIPPED with comment. Droid records token usage only at session level. The current behavior splits evenly across emitted assistant calls and prices all of them at settings.model (the latest model). For sessions where the user switched models mid-stream, costs are approximate. Added an inline comment documenting this; a real fix requires per-message model data that isn't in the Droid JSONL schema.

Verified end-to-end on this machine:
- vitest run: 38 files, 529 tests pass
- `codeburn report --format json` produces valid JSON
- `codeburn yield -p week` runs without crashing, finds 0 reverts in the user's recent git history (plausible: the fix changed the detection from "subject contains revert" to "this commit's SHA appears in a later 'This reverts commit ...' body")
- Stderr now warns for unknown model ids: `openai/gpt-5.3`, `qwen3.6:35b-a3b-bf16`, `big-pickle`. These previously priced silently at $0.

* Four high-severity fixes from the bug-hunt round

- src/currency.ts: getExchangeRate wrapped fetchRate and cacheRate in one try/catch. If fetchRate succeeded but cacheRate threw (disk full, ENOSPC, no permissions on the cache dir), the catch block swallowed the error and returned 1. Every cost rendered after that point became USD-equivalent silently. Now the fetch and the cache write live in separate paths: a successful fetch returns the rate even if the persist fails, and the cache write is fire-and-forget with its error dropped, so transient disk problems do not corrupt the user's currency display.
- src/cursor-cache.ts: writeFile was non-atomic. Two concurrent codeburn invocations writing to cursor-results.json could interleave bytes mid-write, leaving a truncated file that failed to parse on next read and forced a full SQLite re-scan every run. Switched to the temp-file + rename pattern with a randomized temp name so each writer gets its own staging file and the rename is atomic on POSIX. Crash mid-write also leaves only a leftover temp file, which gets unlinked in the catch path; the destination is never half-written.
- mac/.../CodeBurnApp.swift refresh loop on sleep: the loop's Task.sleep keeps a wakeup pending across system sleep, so on wake the natural tick fires the same instant the wake observers do. Combined with didWakeNotification, screensDidWakeNotification, and the launchd com.codeburn.refresh distributed notification, that produced 2-3 concurrent CLI spawns within ms of every wake. Now: willSleepNotification cancels the loop task; didWakeNotification restarts it. The loop also reads lastRefreshTime and skips its natural tick if a wake/manual/distributed-notification refresh ran within the last 5 seconds, coalescing the two sources of refresh into one CLI spawn per wake event.
- mac/.../CodeBurnApp.swift observeStore: the read closure had an implicit strong self capture (it accessed store.* without a capture annotation), pinning self for the lifetime of any unfired observation. Added [weak self] and a guard to make the capture explicit. withObservationTracking is one-shot per call, so there is at most one active subscription at a time; the earlier audit's claim of an unbounded leak overstated the issue, but tightening the capture pattern is still cleaner.

Verified:
- vitest run: 38 files, 529 tests pass
- swift build -c release --arch arm64 --arch x86_64: clean, no diagnostics, no MainActor warnings
- mac/Scripts/package-app.sh dev produces a valid universal bundle
- Menubar launches and runs without crash

* Eleven medium-severity fixes from the bug-hunt round

- src/format.ts formatTokens: guard against Infinity, NaN, and negative input. Previously a corrupt aggregate could leak into the UI as the literal strings "NaN" or "Infinity". Negatives now render as "0" rather than "-500" with no scaling.
- src/cli-date.ts parseDateRangeFlags: the missing-from default was new Date(0), which opened a 55-year scan from 1970 epoch whenever the user passed only --to. Default now anchors at 6 months back from now, matching the dashboard's all-time period. Test updated to assert the new bounded window.
- src/cli-date.ts toPeriod: previously fell back silently to "week" for any unknown input, so a typo like `-p mounth` produced a quiet 7-day report while the user thought they were viewing the month. Now exits with a clear stderr error and exit code 1. Test updated to assert the loud-failure behavior.
- src/optimize.ts urgencyScore: rebalanced weights so a high-impact finding with zero observed tokens cannot outrank a medium-impact finding with millions of tokens. Old 0.7/0.3 split made high+0 (0.70) beat medium+1B (0.65). New 0.5/0.5 split makes medium+1B (0.75) beat high+0 (0.50). Token normalization lifted to 5M so the ramp covers a realistic spend range.
- src/models.ts calculateCost: clamp negative or non-finite token inputs to 0 before pricing. A corrupt JSONL emitting a negative count would otherwise produce a negative cost that silently subtracted from real spend in aggregates.
- src/currency.ts convertCost: stop rounding during aggregation. For zero-fraction currencies (JPY, KRW, CLP) this clamped every per-session cost to a whole unit before sum, so a project of 1000 sessions averaging ¥0.4 each aggregated to ¥0 instead of ¥400. formatCost still rounds at the display boundary.
- src/config.ts saveConfig: the temp file path was a fixed `${configPath}.tmp` suffix. Two simultaneous saveConfig calls (overlapping menubar and CLI runs) raced on the same staging file and could leave one writer reading partial bytes from the other. Randomized the temp suffix per call.
- src/providers/antigravity.ts flushCache: the early return on `!cacheDirty` short-circuited eviction when liveCascadeIds was supplied but no cascade had been added or updated this run. As a result, deleted .pb files persisted in the cache forever once the user stopped writing to it. Eviction now runs whenever liveCascadeIds is provided, marks the cache dirty if anything was removed, and only then short-circuits if there is nothing to write.
- src/daily-cache.ts addNewDays: cap retention at 2 years. The days array previously merged forever, growing the cache file by hundreds of bytes per day until JSON parse on every CLI invocation became measurable. The 6-month UI period plus the 365-day BACKFILL_DAYS bootstrap both fit comfortably inside the cap, with headroom for a future longer window.
- src/dashboard.tsx useInput: period number keys (1-5) and arrow keys triggered a reload while the compare view was mounted. The parent's data state changed underneath the user with no visual affordance back to the dashboard. Now those keys are gated on view !== 'compare', and `b` / Esc inside compare returns to the dashboard.
- mac/.../HeatmapSection.swift formatters: prettyDate, buildTrendBars, computeTrendStats, computeForecast, and computeAllStats each allocated a fresh DateFormatter (and Calendar) on every call. SwiftUI re-evaluates these views many times per second during hover scrubbing on the trend chart, so the allocations were a measurable hot spot. Lifted the yyyy-MM-dd / "EEE MMM d" / "MMM d" formatters and the gregorian Calendar to fileprivate cached singletons.

Two findings from the same bucket were not addressed here:
- UpdateChecker SHA-256 / codesign verification is already performed by src/menubar-installer.ts (verifyChecksum at line 85). The Swift side just kicks off `codeburn menubar --force` which runs that path. The audit's claim of missing verification was a misread.
- NSDistributedNotificationCenter sender validation: the `com.codeburn.refresh` listener accepts from any sender, but forceRefresh has a 5-second rate-limit gate so the abuse ceiling is one CLI spawn per 5 seconds. Mitigations (Mach IPC, per-launch shared secret) are disproportionate to the impact.

vitest run: 38 files, 529 tests pass. swift build -c release: clean, no warnings.

* Validator hardenings on the bug-hunt batch

Hoist the per-call sort in getModelCosts and getShortModelName to module scope so model lookups on the hot path stop reallocating sorted key arrays.

Sanitize the unknown-model stderr warning by stripping C0/C1 controls and capping length, so a hostile or corrupt JSONL cannot inject terminal escape sequences via the model field.

Skip the daily-cache prune when newestDate fails to parse. The previous code produced a NaN cutoff and silently dropped every cached day on the next merge.

Adds tests locking down the stable resolution of common model names (gpt-5-mini vs gpt-5, claude-haiku-4-5 vs claude-3-5-haiku, etc.) and the prune NaN guard.
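
The --refresh radix bug from the first batch is easy to reproduce in isolation. The sketch below is illustrative only: `applyOption` is a hypothetical stand-in for how commander hands `(value, previous)` to an option parser, and `parseIntegerSketch` is an assumed shape for the radix-10 helper, since the real parseInteger in src/cli.ts is not part of this diff. NaN only appears when the previous value falls outside the valid radix range of 2 to 36.

```typescript
// Hypothetical stand-in for the way commander invokes an option parser:
// it passes the raw string plus the previous (or default) value.
type OptionParser = (value: string, previous: number) => number

function applyOption(parser: OptionParser, raw: string, defaultValue: number): number {
  return parser(raw, defaultValue)
}

// Assumed shape of a radix-safe parser: the second argument is accepted
// but ignored, and the radix is pinned to 10.
function parseIntegerSketch(value: string, _previous?: number): number {
  return Number.parseInt(value, 10)
}

// Bare parseInt treats the default (30) as the radix.
const buggy30 = applyOption(parseInt, '30', 30) // 3 * 30 + 0 = 90
const buggy60 = applyOption(parseInt, '60', 30) // 6 * 30 + 0 = 180

// With the radix pinned, the value round-trips as the user typed it.
const fixed30 = applyOption(parseIntegerSketch, '30', 30) // 30
const fixed60 = applyOption(parseIntegerSketch, '60', 30) // 60

// A radix outside 2..36 yields NaN, for example a hypothetical default of 60.
const nanCase = applyOption(parseInt, '30', 60)

console.log({ buggy30, buggy60, fixed30, fixed60, nanCase })
```

Ignoring the second argument, rather than wrapping parseInt in an arrow function at each call site, keeps the three `.option(...)` declarations identical apart from the parser name.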
2026-05-19 07:43:09 +00:00 · 2026-05-06 19:50:40 -07:00 · 2026-05-06 19:50:40 -07:00 · afd0ee7011
commit afd0ee7011
parent 2817ebff47
24 changed files with 621 additions and 181 deletions
--- a/src/cli-date.ts
+++ b/src/cli-date.ts
@@ -29,12 +29,17 @@ export const PERIOD_LABELS: Record<Period, string> = {
  all: '6 Months',
 }

+const VALID_PERIODS: ReadonlyArray<Period> = ['today', 'week', '30days', 'month', 'all']
+
 export function toPeriod(s: string): Period {
-  if (s === 'today') return 'today'
-  if (s === 'month') return 'month'
-  if (s === '30days') return '30days'
-  if (s === 'all') return 'all'
-  return 'week'
+  if ((VALID_PERIODS as readonly string[]).includes(s)) return s as Period
+  // Fail loudly instead of silently coercing to 'week'. Previously a typo
+  // like `-p mounth` produced a quiet 7-day report and the user thought
+  // they were viewing the month.
+  process.stderr.write(
+    `codeburn: unknown period "${s}". Valid values: ${VALID_PERIODS.join(', ')}.\n`
+  )
+  process.exit(1)
 }

 function parseLocalDate(s: string): Date {
@@ -49,7 +54,14 @@ export function parseDateRangeFlags(from: string | undefined, to: string | undef
  if (from === undefined && to === undefined) return null

  const now = new Date()
-  const start = from !== undefined ? parseLocalDate(from) : new Date(0)
+  // When --from is omitted, default to 6 months back (the same window the
+  // dashboard's "all" period uses) instead of epoch. Previously a bare
+  // `--to 2026-01-01` opened a 55-year scan from 1970 which is rarely what
+  // the user meant and is expensive on machines with many session files.
+  const ALL_TIME_FALLBACK_MS = 6 * 31 * 24 * 60 * 60 * 1000
+  const start = from !== undefined
+    ? parseLocalDate(from)
+    : new Date(now.getTime() - ALL_TIME_FALLBACK_MS)

  const endDate = to !== undefined ? parseLocalDate(to) : new Date(now.getFullYear(), now.getMonth(), now.getDate())
  const end = new Date(
--- a/src/cli.ts
+++ b/src/cli.ts
@@ -271,7 +271,7 @@ program
  .option('--format <format>', 'Output format: tui, json', 'tui')
  .option('--project <name>', 'Show only projects matching name (repeatable)', collect, [])
  .option('--exclude <name>', 'Exclude projects matching name (repeatable)', collect, [])
-  .option('--refresh <seconds>', 'Auto-refresh interval in seconds (0 to disable)', parseInt, 30)
+  .option('--refresh <seconds>', 'Auto-refresh interval in seconds (0 to disable)', parseInteger, 30)
  .action(async (opts) => {
    let customRange: DateRange | null = null
    try {
@@ -515,7 +515,7 @@ program
  .option('--format <format>', 'Output format: tui, json', 'tui')
  .option('--project <name>', 'Show only projects matching name (repeatable)', collect, [])
  .option('--exclude <name>', 'Exclude projects matching name (repeatable)', collect, [])
-  .option('--refresh <seconds>', 'Auto-refresh interval in seconds (0 to disable)', parseInt, 30)
+  .option('--refresh <seconds>', 'Auto-refresh interval in seconds (0 to disable)', parseInteger, 30)
  .action(async (opts) => {
    if (opts.format === 'json') {
      await runJsonReport('today', opts.provider, opts.project, opts.exclude)
@@ -532,7 +532,7 @@ program
  .option('--format <format>', 'Output format: tui, json', 'tui')
  .option('--project <name>', 'Show only projects matching name (repeatable)', collect, [])
  .option('--exclude <name>', 'Exclude projects matching name (repeatable)', collect, [])
-  .option('--refresh <seconds>', 'Auto-refresh interval in seconds (0 to disable)', parseInt, 30)
+  .option('--refresh <seconds>', 'Auto-refresh interval in seconds (0 to disable)', parseInteger, 30)
  .action(async (opts) => {
    if (opts.format === 'json') {
      await runJsonReport('month', opts.provider, opts.project, opts.exclude)
--- a/src/config.ts
+++ b/src/config.ts
@@ -1,6 +1,7 @@
 import { readFile, writeFile, mkdir, rename } from 'fs/promises'
 import { join } from 'path'
 import { homedir } from 'os'
+import { randomBytes } from 'crypto'

 export type PlanId = 'claude-pro' | 'claude-max' | 'claude-max-5x' | 'cursor-pro' | 'custom' | 'none'
 export type PlanProvider = 'claude' | 'codex' | 'cursor' | 'all'
@@ -42,7 +43,11 @@ export async function readConfig(): Promise<CodeburnConfig> {
 export async function saveConfig(config: CodeburnConfig): Promise<void> {
  await mkdir(getConfigDir(), { recursive: true })
  const configPath = getConfigPath()
-  const tmpPath = `${configPath}.tmp`
+  // Randomize the temp path so two simultaneous saveConfig calls (from
+  // overlapping menubar + CLI runs, for example) do not race on the same
+  // staging file. The previous fixed `.tmp` suffix could leave one
+  // process reading partial bytes the other was mid-writing.
+  const tmpPath = `${configPath}.${randomBytes(8).toString('hex')}.tmp`
  await writeFile(tmpPath, JSON.stringify(config, null, 2) + '\n', 'utf-8')
  await rename(tmpPath, configPath)
 }
--- a/src/currency.ts
+++ b/src/currency.ts
@@ -98,13 +98,19 @@ async function getExchangeRate(code: string): Promise<number> {
  const cached = await loadCachedRate(code)
  if (cached) return cached

+  let rate: number
  try {
-    const rate = await fetchRate(code)
-    await cacheRate(code, rate)
-    return rate
+    rate = await fetchRate(code)
  } catch {
    return 1
  }
+  // Persist the rate, but never let a cache-write failure (disk full, no
+  // permissions, etc.) cause us to return the USD-equivalent fallback.
+  // The original code wrapped fetch + cacheRate in one try/catch, so a
+  // disk-full at write time would discard a perfectly good rate and silently
+  // make every cost render as if the user had selected USD.
+  cacheRate(code, rate).catch(() => {})
+  return rate
 }

 export async function loadCurrency(): Promise<void> {
@@ -137,9 +143,13 @@ export function getCostColumnHeader(): string {
 }

 export function convertCost(costUSD: number): number {
-  const digits = getFractionDigits(active.code)
-  const factor = 10 ** digits
-  return Math.round(costUSD * active.rate * factor) / factor
+  // Return the unrounded converted cost. Rounding here meant zero-fraction
+  // currencies (JPY, KRW, CLP) clamped every per-session cost to the nearest
+  // whole unit before aggregation; a project with 1000 sessions averaging
+  // ¥0.4 each would aggregate to ¥0 instead of ¥400 because each row was
+  // rounded independently. formatCost (and the export rowsToCsv path) round
+  // at the display boundary instead.
+  return costUSD * active.rate
 }

 export function formatCost(costUSD: number): string {
--- a/src/cursor-cache.ts
+++ b/src/cursor-cache.ts
@@ -1,6 +1,7 @@
-import { readFile, writeFile, mkdir, stat } from 'fs/promises'
+import { readFile, writeFile, mkdir, rename, stat, unlink } from 'fs/promises'
 import { join } from 'path'
 import { homedir } from 'os'
+import { randomBytes } from 'crypto'

 import type { ParsedProviderCall } from './providers/types.js'

@@ -50,18 +51,30 @@ export async function readCachedResults(dbPath: string): Promise<ParsedProviderC
 }

 export async function writeCachedResults(dbPath: string, calls: ParsedProviderCall[]): Promise<void> {
-  try {
-    const fp = await getDbFingerprint(dbPath)
-    if (!fp) return
+  const fp = await getDbFingerprint(dbPath)
+  if (!fp) return

-    const dir = getCacheDir()
-    await mkdir(dir, { recursive: true })
-    const cache: ResultCache = {
-      version: CURSOR_CACHE_VERSION,
-      dbMtimeMs: fp.mtimeMs,
-      dbSizeBytes: fp.size,
-      calls,
-    }
-    await writeFile(getCachePath(), JSON.stringify(cache), 'utf-8')
-  } catch {}
+  const dir = getCacheDir()
+  await mkdir(dir, { recursive: true }).catch(() => {})
+  const cache: ResultCache = {
+    version: CURSOR_CACHE_VERSION,
+    dbMtimeMs: fp.mtimeMs,
+    dbSizeBytes: fp.size,
+    calls,
+  }
+
+  // Atomic write: stage to a randomized temp file in the same directory,
+  // then rename onto the final path. rename() is atomic on POSIX, so a
+  // crash mid-write never leaves a half-written cache, and concurrent
+  // CLI invocations using their own random temp names cannot interleave
+  // bytes in the destination file (they only race on the final rename,
+  // last-writer-wins, both with valid content).
+  const target = getCachePath()
+  const tempPath = `${target}.${randomBytes(8).toString('hex')}.tmp`
+  try {
+    await writeFile(tempPath, JSON.stringify(cache), 'utf-8')
+    await rename(tempPath, target)
+  } catch {
+    await unlink(tempPath).catch(() => {})
+  }
 }
--- a/src/daily-cache.ts
+++ b/src/daily-cache.ts
@@ -133,10 +133,24 @@ export function addNewDays(cache: DailyCache, incoming: DailyEntry[], newestDate
    byDate.set(day.date, day)
  }
  const merged = Array.from(byDate.values()).sort((a, b) => a.date.localeCompare(b.date))
+  // Prune entries older than the BACKFILL window so the cache file does not
+  // grow unbounded over years of daily use. The "all time" / 6-month period
+  // and the BACKFILL_DAYS bootstrap both fit comfortably inside this cap.
+  // Anchor the cap on the newestDate boundary so a stale or stuck clock
+  // can't accidentally evict everything. Skip the prune entirely if
+  // newestDate is malformed — an invalid Date would produce a NaN cutoff
+  // and `d.date >= "Invalid Date"` would silently drop every entry.
+  const cutoffDate = new Date(`${newestDate}T00:00:00Z`)
+  let pruned = merged
+  if (!isNaN(cutoffDate.getTime())) {
+    cutoffDate.setUTCDate(cutoffDate.getUTCDate() - DAILY_CACHE_RETENTION_DAYS)
+    const cutoff = toDateString(cutoffDate)
+    pruned = merged.filter(d => d.date >= cutoff)
+  }
  const nextLast = cache.lastComputedDate && cache.lastComputedDate > newestDate
    ? cache.lastComputedDate
    : newestDate
-  return { version: DAILY_CACHE_VERSION, lastComputedDate: nextLast, days: merged }
+  return { version: DAILY_CACHE_VERSION, lastComputedDate: nextLast, days: pruned }
 }

 export function getDaysInRange(cache: DailyCache, start: string, end: string): DailyEntry[] {
@@ -153,6 +167,10 @@ export function withDailyCacheLock<T>(fn: () => Promise<T>): Promise<T> {

 export const MS_PER_DAY = 24 * 60 * 60 * 1000
 export const BACKFILL_DAYS = 365
+// Keep 2 years of history so the longest UI-exposed period (6 months
+// today, with headroom for future longer windows) always reads from
+// cache while old entries get pruned.
+export const DAILY_CACHE_RETENTION_DAYS = 730

 export function toDateString(date: Date): string {
  return `${date.getFullYear()}-${String(date.getMonth() + 1).padStart(2, '0')}-${String(date.getDate()).padStart(2, '0')}`
--- a/src/dashboard.tsx
+++ b/src/dashboard.tsx
@@ -760,12 +760,18 @@ function InteractiveDashboard({ initialProjects, initialPeriod, initialProvider,
    if (input === 'o' && findingCount > 0 && view === 'dashboard' && optimizeAvailable) { setView('optimize'); return }
    if ((input === 'b' || key.escape) && view === 'optimize') { setView('dashboard'); return }
    if (input === 'c' && compareAvailable && view === 'dashboard') { setView('compare'); return }
+    if ((input === 'b' || key.escape) && view === 'compare') { setView('dashboard'); return }
    if (input === 'p' && multipleProviders && view !== 'compare') {
      const opts = ['all', ...detectedProviders]; const next = opts[(opts.indexOf(activeProvider) + 1) % opts.length]
      setActiveProvider(next); setView('dashboard')
      if (debounceRef.current) clearTimeout(debounceRef.current)
      reloadData(period, next); return
    }
+    // Period switches reload the underlying data. Disable them while the
+    // compare view is mounted; the compare view re-aggregates from
+    // `projects` and would visibly change underneath the user without any
+    // affordance back to the dashboard. Press `b` or Esc to return first.
+    if (view === 'compare') return
    const idx = PERIODS.indexOf(period)
    if (key.leftArrow) switchPeriod(PERIODS[(idx - 1 + PERIODS.length) % PERIODS.length]!)
    else if (key.rightArrow || key.tab) switchPeriod(PERIODS[(idx + 1) % PERIODS.length]!)
--- a/src/export.ts
+++ b/src/export.ts
@@ -1,4 +1,4 @@
-import { writeFile, mkdir, readdir, stat, rm } from 'fs/promises'
+import { writeFile, mkdir, readdir, open, stat, rm } from 'fs/promises'
 import { dirname, join, resolve } from 'path'

 import { CATEGORY_LABELS, type ProjectSummary, type TaskCategory } from './types.js'
@@ -357,6 +357,33 @@ export async function exportJson(periods: PeriodExport[], outputPath: string): P
  }

  const target = resolve(outputPath.toLowerCase().endsWith('.json') ? outputPath : `${outputPath}.json`)
+  // Refuse to overwrite an existing file that wasn't produced by codeburn
+  // export. CSV path has the same guard via the .codeburn-export marker; JSON
+  // was missing it, so a stray `-o ~/important.json` would silently clobber.
+  const existing = await stat(target).catch(() => null)
+  if (existing?.isFile()) {
+    // Read just the first 4KB to look for the schema marker. The schema key
+    // is the first field in the JSON object so a partial read is enough;
+    // loading the whole file (potentially gigabytes) into memory could OOM
+    // on Node's ~512MB string limit.
+    const fh = await open(target, 'r')
+    try {
+      const buf = Buffer.alloc(4096)
+      const { bytesRead } = await fh.read(buf, 0, buf.length, 0)
+      const head = buf.toString('utf-8', 0, bytesRead)
+      if (!head.includes('"schema": "codeburn.export.v')) {
+        throw new Error(
+          `Refusing to overwrite ${target}: file does not look like a codeburn export. ` +
+          `Delete it manually or pick a different -o path.`
+        )
+      }
+    } finally {
+      await fh.close()
+    }
+  }
+  if (existing?.isDirectory()) {
+    throw new Error(`Refusing to overwrite directory at ${target}. Pass a file path instead.`)
+  }
  await mkdir(dirname(target), { recursive: true })
  await writeFile(target, JSON.stringify(data, null, 2), 'utf-8')
  return target
--- a/src/format.ts
+++ b/src/format.ts
@@ -8,9 +8,13 @@ import { formatCost } from './currency.js'
 export { formatCost }

 export function formatTokens(n: number): string {
+  // Guard against Infinity / NaN / negatives that would otherwise leak into
+  // the UI as "Infinity" or "NaN" strings when an upstream calculation glitches.
+  if (!Number.isFinite(n)) return '?'
+  if (n < 0) return '0'
  if (n >= 1_000_000) return `${(n / 1_000_000).toFixed(1)}M`
  if (n >= 1_000) return `${(n / 1_000).toFixed(1)}K`
-  return n.toString()
+  return Math.round(n).toString()
 }

 /// Returns YYYY-MM-DD for the given date in the process-local timezone. Cheaper than shelling
--- a/src/models.ts
+++ b/src/models.ts
@@ -48,6 +48,14 @@ function loadSnapshot(): Map<string, ModelCosts> {
 }

 let pricingCache: Map<string, ModelCosts> = loadSnapshot()
+let sortedPricingKeys: string[] | null = null
+
+function getSortedPricingKeys(): string[] {
+  if (sortedPricingKeys === null) {
+    sortedPricingKeys = Array.from(pricingCache.keys()).sort((a, b) => b.length - a.length)
+  }
+  return sortedPricingKeys
+}

 function getCacheDir(): string {
  return join(homedir(), '.cache', 'codeburn')
@@ -110,11 +118,13 @@ export async function loadPricing(): Promise<void> {
  const cached = await loadCachedPricing()
  if (cached) {
    pricingCache = cached
+    sortedPricingKeys = null
    return
  }

  try {
    pricingCache = await fetchAndCachePricing()
+    sortedPricingKeys = null
  } catch {
    // snapshot already loaded at init; nothing more to do
  }
@@ -192,13 +202,23 @@ export function getModelCosts(model: string): ModelCosts | null {
  const canonical = resolveAlias(getCanonicalName(model))
  if (pricingCache.has(canonical)) return pricingCache.get(canonical)!

-  for (const [key, costs] of pricingCache) {
-    if (canonical.startsWith(key + '-') || canonical.startsWith(key)) return costs
+  // Iterate keys longest-first so a model id like `gpt-5-mini` matches the
+  // `gpt-5-mini` entry rather than collapsing to the shorter `gpt-5` entry
+  // due to dictionary insertion order.
+  for (const key of getSortedPricingKeys()) {
+    if (canonical.startsWith(key + '-') || canonical === key) {
+      return pricingCache.get(key)!
+    }
  }

  return null
 }

+// Warn at most once per unknown model name per process. Without this, a model
+// missing from the pricing snapshot would silently price at $0 for every
+// session that used it, hiding real spend until the user noticed.
+const warnedUnknownModels = new Set<string>()
+
 export function calculateCost(
  model: string,
  inputTokens: number,
@@ -209,16 +229,39 @@ export function calculateCost(
  speed: 'standard' | 'fast' = 'standard',
 ): number {
  const costs = getModelCosts(model)
-  if (!costs) return 0
+  if (!costs) {
+    // Skip the synthetic placeholder and the auto-router pseudo-models that
+    // intentionally have no direct pricing entry; calculateCost callers
+    // resolve those through aliasing first, so an unknown here is genuinely
+    // an unmapped real model.
+    if (model && model !== '<synthetic>' && !warnedUnknownModels.has(model)) {
+      warnedUnknownModels.add(model)
+      // Strip control characters and cap length: model names come from JSONL
+      // payloads written by external tools, so a hostile or corrupt file
+      // could embed terminal escape sequences here.
+      const safeName = model.replace(/[\x00-\x1F\x7F-\x9F]/g, '?').slice(0, 200)
+      process.stderr.write(
+        `codeburn: no pricing data for model "${safeName}" — costs for this model will show $0. ` +
+        `Update with: npx codeburn@latest, or report at https://github.com/getagentseal/codeburn/issues.\n`
+      )
+    }
+    return 0
+  }

  const multiplier = speed === 'fast' ? costs.fastMultiplier : 1

+  // Clamp negative inputs to 0. A corrupt JSONL that emits a negative token
+  // count would otherwise produce a negative cost that silently subtracts
+  // from real spend in aggregate totals. NaN and Infinity are handled here
+  // too: the helper below maps any non-finite operand to 0.
+  const safe = (n: number) => (Number.isFinite(n) && n > 0 ? n : 0)
+
  return multiplier * (
-    inputTokens * costs.inputCostPerToken +
-    outputTokens * costs.outputCostPerToken +
-    cacheCreationTokens * costs.cacheWriteCostPerToken +
-    cacheReadTokens * costs.cacheReadCostPerToken +
-    webSearchRequests * costs.webSearchCostPerRequest
+    safe(inputTokens) * costs.inputCostPerToken +
+    safe(outputTokens) * costs.outputCostPerToken +
+    safe(cacheCreationTokens) * costs.cacheWriteCostPerToken +
+    safe(cacheReadTokens) * costs.cacheReadCostPerToken +
+    safe(webSearchRequests) * costs.webSearchCostPerRequest
  )
 }

@ -234,59 +277,67 @@ const autoModelNames: Record<string, string> = {
  'qwen-auto': 'Qwen (auto)',
 }

+const SHORT_NAMES: Record<string, string> = {
+  'claude-opus-4-7': 'Opus 4.7',
+  'claude-opus-4-6': 'Opus 4.6',
+  'claude-opus-4-5': 'Opus 4.5',
+  'claude-opus-4-1': 'Opus 4.1',
+  'claude-opus-4': 'Opus 4',
+  'claude-sonnet-4-6': 'Sonnet 4.6',
+  'claude-sonnet-4-5': 'Sonnet 4.5',
+  'claude-sonnet-4': 'Sonnet 4',
+  'claude-3-7-sonnet': 'Sonnet 3.7',
+  'claude-3-5-sonnet': 'Sonnet 3.5',
+  'claude-haiku-4-5': 'Haiku 4.5',
+  'claude-3-5-haiku': 'Haiku 3.5',
+  'gpt-4o-mini': 'GPT-4o Mini',
+  'gpt-4o': 'GPT-4o',
+  'gpt-4.1-nano': 'GPT-4.1 Nano',
+  'gpt-4.1-mini': 'GPT-4.1 Mini',
+  'gpt-4.1': 'GPT-4.1',
+  'codex-auto-review': 'Codex Auto Review',
+  'gpt-5.5-pro': 'GPT-5.5 Pro',
+  'gpt-5.5': 'GPT-5.5',
+  'gpt-5.4-pro': 'GPT-5.4 Pro',
+  'gpt-5.4-nano': 'GPT-5.4 Nano',
+  'gpt-5.4-mini': 'GPT-5.4 Mini',
+  'gpt-5.4': 'GPT-5.4',
+  'gpt-5.3-codex': 'GPT-5.3 Codex',
+  'gpt-5.3': 'GPT-5.3',
+  'gpt-5.2-pro': 'GPT-5.2 Pro',
+  'gpt-5.2-low': 'GPT-5.2 Low',
+  'gpt-5.2': 'GPT-5.2',
+  'gpt-5.1-codex-mini': 'GPT-5.1 Codex Mini',
+  'gpt-5.1-codex': 'GPT-5.1 Codex',
+  'gpt-5.1': 'GPT-5.1',
+  'gpt-5-pro': 'GPT-5 Pro',
+  'gpt-5-nano': 'GPT-5 Nano',
+  'gpt-5-mini': 'GPT-5 Mini',
+  'gpt-5': 'GPT-5',
+  'gemini-3.1-pro-preview': 'Gemini 3.1 Pro',
+  'gemini-3-flash-preview': 'Gemini 3 Flash',
+  'gemini-2.5-pro': 'Gemini 2.5 Pro',
+  'gemini-2.5-flash': 'Gemini 2.5 Flash',
+  'deepseek-coder-max': 'DeepSeek Coder Max',
+  'deepseek-coder': 'DeepSeek Coder',
+  'deepseek-r1': 'DeepSeek R1',
+  'o4-mini': 'o4-mini',
+  'o3': 'o3',
+  'MiniMax-M2.7-highspeed': 'MiniMax M2.7 Highspeed',
+  'MiniMax-M2.7': 'MiniMax M2.7',
+}
+
+// Sorted longest-first so more-specific prefixes match before shorter ones.
+// Without this, `gpt-5-mini` could resolve to "GPT-5" (the entry for `gpt-5`)
+// if `gpt-5` happened to be iterated before `gpt-5-mini`, hiding a distinct
+// model behind the wrong display name.
+const SORTED_SHORT_NAMES: [string, string][] = Object.entries(SHORT_NAMES)
+  .sort((a, b) => b[0].length - a[0].length)
+
 export function getShortModelName(model: string): string {
  if (autoModelNames[model]) return autoModelNames[model]
  const canonical = resolveAlias(getCanonicalName(model))
-  const shortNames: Record<string, string> = {
-    'claude-opus-4-7': 'Opus 4.7',
-    'claude-opus-4-6': 'Opus 4.6',
-    'claude-opus-4-5': 'Opus 4.5',
-    'claude-opus-4-1': 'Opus 4.1',
-    'claude-opus-4': 'Opus 4',
-    'claude-sonnet-4-6': 'Sonnet 4.6',
-    'claude-sonnet-4-5': 'Sonnet 4.5',
-    'claude-sonnet-4': 'Sonnet 4',
-    'claude-3-7-sonnet': 'Sonnet 3.7',
-    'claude-3-5-sonnet': 'Sonnet 3.5',
-    'claude-haiku-4-5': 'Haiku 4.5',
-    'claude-3-5-haiku': 'Haiku 3.5',
-    'gpt-4o-mini': 'GPT-4o Mini',
-    'gpt-4o': 'GPT-4o',
-    'gpt-4.1-nano': 'GPT-4.1 Nano',
-    'gpt-4.1-mini': 'GPT-4.1 Mini',
-    'gpt-4.1': 'GPT-4.1',
-    'codex-auto-review': 'Codex Auto Review',
-    'gpt-5.5-pro': 'GPT-5.5 Pro',
-    'gpt-5.5': 'GPT-5.5',
-    'gpt-5.4-pro': 'GPT-5.4 Pro',
-    'gpt-5.4-nano': 'GPT-5.4 Nano',
-    'gpt-5.4-mini': 'GPT-5.4 Mini',
-    'gpt-5.4': 'GPT-5.4',
-    'gpt-5.3-codex': 'GPT-5.3 Codex',
-    'gpt-5.3': 'GPT-5.3',
-    'gpt-5.2-pro': 'GPT-5.2 Pro',
-    'gpt-5.2-low': 'GPT-5.2 Low',
-    'gpt-5.2': 'GPT-5.2',
-    'gpt-5.1-codex-mini': 'GPT-5.1 Codex Mini',
-    'gpt-5.1-codex': 'GPT-5.1 Codex',
-    'gpt-5.1': 'GPT-5.1',
-    'gpt-5-pro': 'GPT-5 Pro',
-    'gpt-5-nano': 'GPT-5 Nano',
-    'gpt-5-mini': 'GPT-5 Mini',
-    'gpt-5': 'GPT-5',
-    'gemini-3.1-pro-preview': 'Gemini 3.1 Pro',
-    'gemini-3-flash-preview': 'Gemini 3 Flash',
-    'gemini-2.5-pro': 'Gemini 2.5 Pro',
-    'gemini-2.5-flash': 'Gemini 2.5 Flash',
-    'deepseek-coder-max': 'DeepSeek Coder Max',
-    'deepseek-coder': 'DeepSeek Coder',
-    'deepseek-r1': 'DeepSeek R1',
-    'o4-mini': 'o4-mini',
-    'o3': 'o3',
-    'MiniMax-M2.7-highspeed': 'MiniMax M2.7 Highspeed',
-    'MiniMax-M2.7': 'MiniMax M2.7',
-  }
-  for (const [key, name] of Object.entries(shortNames)) {
+  for (const [key, name] of SORTED_SHORT_NAMES) {
    if (canonical.startsWith(key)) return name
  }
  return canonical
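Reviewer note: the longest-first ordering used for both the pricing keys and the short names can be sketched in isolation. The two-entry table and the `shortName` helper below are hypothetical stand-ins for illustration, not the project's code.

```typescript
// Hypothetical two-entry table; the real SHORT_NAMES map is much larger.
const NAMES: Record<string, string> = {
  'gpt-5': 'GPT-5',
  'gpt-5-mini': 'GPT-5 Mini',
}

// Sort once, longest key first, so the most specific prefix is tried first.
const SORTED: [string, string][] = Object.entries(NAMES)
  .sort((a, b) => b[0].length - a[0].length)

function shortName(model: string): string {
  for (const [key, label] of SORTED) {
    if (model.startsWith(key)) return label
  }
  // No prefix matched: fall back to the raw model id.
  return model
}
```

With this ordering a dated id such as `gpt-5-mini-2025-08-07` resolves to the `gpt-5-mini` entry no matter where `gpt-5` sits in the object literal.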
--- a/src/optimize.ts
+++ b/src/optimize.ts
@ -111,9 +111,15 @@ const GRADE_A_MIN = 90
 const GRADE_B_MIN = 75
 const GRADE_C_MIN = 55
 const GRADE_D_MIN = 30
-const URGENCY_IMPACT_WEIGHT = 0.7
-const URGENCY_TOKEN_WEIGHT = 0.3
-const URGENCY_TOKEN_NORMALIZE = 500_000
+// Rebalanced so a high-impact finding with almost no observed tokens (e.g.
+// detectGhostAgents firing on five files but tokensSaved=400) cannot
+// outrank a medium-impact finding with many millions of tokens.
+// Old: 0.7/0.3 → high+0 = 0.70, medium+1B = 0.65 (high+0 won).
+// New: 0.5/0.5 → high+0 = 0.50, medium+1B = 0.75 (medium+1B wins).
+// Token normalize lifted to 5M so the rank scales over a realistic range.
+const URGENCY_IMPACT_WEIGHT = 0.5
+const URGENCY_TOKEN_WEIGHT = 0.5
+const URGENCY_TOKEN_NORMALIZE = 5_000_000

 // ============================================================================
 // File system constants
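Reviewer note: the Old/New figures in the comment can be reproduced with a small sketch. The scoring shape (a weighted sum of an impact score and a token ratio capped at 1) and the impact values (high = 1.0, medium = 0.5) are assumptions inferred from those figures; the actual ranking function is not shown in this hunk.

```typescript
// Assumed scoring shape, inferred from the comment's numbers.
type Weights = { impact: number; token: number; normalize: number }

function urgency(impact: number, tokens: number, w: Weights): number {
  // Token contribution saturates once tokens reach the normalize constant.
  const tokenScore = Math.min(1, tokens / w.normalize)
  return w.impact * impact + w.token * tokenScore
}

const OLD: Weights = { impact: 0.7, token: 0.3, normalize: 500_000 }
const NEW: Weights = { impact: 0.5, token: 0.5, normalize: 5_000_000 }

// Assumed impact scores: high = 1.0, medium = 0.5.
const oldHighZero = urgency(1.0, 0, OLD)
const oldMediumBillion = urgency(0.5, 1_000_000_000, OLD)
const newHighZero = urgency(1.0, 0, NEW)
const newMediumBillion = urgency(0.5, 1_000_000_000, NEW)
```

Under these assumptions the old weights rank high+0 above medium+1B, and the new weights reverse that order.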
--- a/src/providers/antigravity.ts
+++ b/src/providers/antigravity.ts
@ -87,13 +87,22 @@ async function loadCache(): Promise<AntigravityCache> {
 }

 async function flushCache(liveCascadeIds?: Set<string>): Promise<void> {
-  if (!memCache || !cacheDirty) return
-  try {
-    if (liveCascadeIds) {
-      for (const id of Object.keys(memCache.cascades)) {
-        if (!liveCascadeIds.has(id)) delete memCache.cascades[id]
+  if (!memCache) return
+  // If the caller supplied liveCascadeIds, we must run the eviction step
+  // even when no cascade was added or updated this run; otherwise deleted
+  // .pb files would persist in the cache forever once it stops getting
+  // dirty writes. Mark the cache dirty when an eviction happens so the
+  // file write below proceeds.
+  if (liveCascadeIds) {
+    for (const id of Object.keys(memCache.cascades)) {
+      if (!liveCascadeIds.has(id)) {
+        delete memCache.cascades[id]
+        cacheDirty = true
      }
    }
+  }
+  if (!cacheDirty) return
+  try {

    const dir = getCacheDir()
    await mkdir(dir, { recursive: true })
--- a/src/providers/codex.ts
+++ b/src/providers/codex.ts
@ -338,14 +338,19 @@ function createParser(source: SessionSource, seenKeys: Set<string>): SessionPars
            reasoningTokens = (total.reasoning_output_tokens ?? 0) - prevReasoning
          }

-          if (!last) {
-            const total = info.total_token_usage
-            if (total) {
-              prevInput = total.input_tokens ?? 0
-              prevCached = total.cached_input_tokens ?? 0
-              prevOutput = total.output_tokens ?? 0
-              prevReasoning = total.reasoning_output_tokens ?? 0
-            }
+          // Always advance the prev counters to track the cumulative state.
+          // Previously prev was only updated on the fallback branch, so a
+          // session with mixed last_token_usage / no-last events would
+          // compute the next fallback delta against a stale prev=0 baseline,
+          // double-counting the entire cumulative window. The prev value
+          // must mirror what cumulative reports regardless of whether this
+          // event used `last` or fell back to deltas.
+          const total = info.total_token_usage
+          if (total) {
+            prevInput = total.input_tokens ?? 0
+            prevCached = total.cached_input_tokens ?? 0
+            prevOutput = total.output_tokens ?? 0
+            prevReasoning = total.reasoning_output_tokens ?? 0
          }

          const totalTokens = inputTokens + cachedInputTokens + outputTokens + reasoningTokens
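Reviewer note: a reduced model of the baseline fix, tracking output tokens only. The `TokenEvent` shape and `outputPerEvent` helper are hypothetical simplifications of the parser, meant only to show why the baseline has to advance on every event.

```typescript
// Hypothetical event shape: `last` is per-event usage, `total` is cumulative.
type Usage = { output_tokens?: number }
type TokenEvent = { last?: Usage; total?: Usage }

function outputPerEvent(events: TokenEvent[]): number[] {
  let prevOutput = 0
  const perEvent: number[] = []
  for (const ev of events) {
    // Prefer the per-event figure; otherwise fall back to a cumulative delta.
    const tokens = ev.last
      ? (ev.last.output_tokens ?? 0)
      : (ev.total?.output_tokens ?? 0) - prevOutput
    perEvent.push(tokens)
    // Advance the baseline on every event, whichever branch ran.
    if (ev.total) prevOutput = ev.total.output_tokens ?? 0
  }
  return perEvent
}

// Two events carrying `last`, then one cumulative-only event.
const mixed: TokenEvent[] = [
  { last: { output_tokens: 10 }, total: { output_tokens: 10 } },
  { last: { output_tokens: 5 }, total: { output_tokens: 15 } },
  { total: { output_tokens: 20 } },
]
const perEventOutput = outputPerEvent(mixed)
```

If the baseline only advanced on the fallback branch, the third event would be measured against 0 and report the whole cumulative 20 instead of the 5 tokens it added.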
--- a/src/providers/cursor.ts
+++ b/src/providers/cursor.ts
@ -1,4 +1,4 @@
-import { existsSync } from 'fs'
+import { existsSync, statSync } from 'fs'
 import { join } from 'path'
 import { homedir } from 'os'

@ -27,6 +27,7 @@ const modelDisplayNames: Record<string, string> = {
 }

 type BubbleRow = {
+  bubble_key: string
  input_tokens: number | null
  output_tokens: number | null
  model: string | null
@ -100,6 +101,7 @@ function modelForDisplay(raw: string | null): string {

 const BUBBLE_QUERY_BASE = `
  SELECT
+    key as bubble_key,
    json_extract(value, '$.tokenCount.inputTokens') as input_tokens,
    json_extract(value, '$.tokenCount.outputTokens') as output_tokens,
    json_extract(value, '$.modelInfo.modelName') as model,
@ -204,7 +206,12 @@ function parseBubbles(db: SqliteDatabase, seenKeys: Set<string>): { calls: Parse

      const createdAt = row.created_at ?? ''
      const conversationId = row.conversation_id ?? 'unknown'
-      const dedupKey = `cursor:${conversationId}:${createdAt}:${inputTokens}:${outputTokens}`
+      // Use the SQLite row key (bubbleId:<unique>) as the dedup key.
+      // Cursor mutates token counts on the row in place when streaming
+      // completes — including tokens in the dedup key (the previous
+      // implementation) caused the same bubble to be counted twice once
+      // its tokens stabilized.
+      const dedupKey = `cursor:bubble:${row.bubble_key}`

      if (seenKeys.has(dedupKey)) continue
      seenKeys.add(dedupKey)
@ -273,9 +280,21 @@ function extractTextLength(content: AgentKvContent[]): number {
  return total
 }

-function parseAgentKv(db: SqliteDatabase, seenKeys: Set<string>): { calls: ParsedProviderCall[] } {
+function parseAgentKv(db: SqliteDatabase, seenKeys: Set<string>, dbPath: string): { calls: ParsedProviderCall[] } {
  const results: ParsedProviderCall[] = []

+  // Cursor's agentKv schema does not record per-message timestamps. Use the
+  // SQLite file's mtime as a bounded "last write" timestamp for all calls;
+  // it's at least honest (no future time, no always-now). Users running
+  // codeburn against an idle Cursor install will see agentKv calls land at
+  // the actual last activity time rather than today's date.
+  let agentKvTimestamp: string
+  try {
+    agentKvTimestamp = new Date(statSync(dbPath).mtimeMs).toISOString()
+  } catch {
+    agentKvTimestamp = new Date().toISOString()
+  }
+
  let rows: AgentKvRow[]
  try {
    rows = db.query<AgentKvRow>(AGENTKV_QUERY)
@ -362,7 +381,7 @@ function parseAgentKv(db: SqliteDatabase, seenKeys: Set<string>): { calls: Parse
      costUSD,
      tools: [],
      bashCommands: [],
-      timestamp: new Date().toISOString(),
+      timestamp: agentKvTimestamp,
      speed: 'standard',
      deduplicationKey: dedupKey,
      userMessage: session.userText,
@ -406,7 +425,7 @@ function createParser(source: SessionSource, seenKeys: Set<string>): SessionPars
        }

        const { calls: bubbleCalls } = parseBubbles(db, seenKeys)
-        const { calls: agentKvCalls } = parseAgentKv(db, seenKeys)
+        const { calls: agentKvCalls } = parseAgentKv(db, seenKeys, source.path)
        const calls = [...bubbleCalls, ...agentKvCalls]

        await writeCachedResults(source.path, calls)
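Reviewer note: the dedup-key change in parseBubbles can be illustrated with a toy model. The `Bubble` shape and both key functions are hypothetical; `key` stands in for the SQLite row key exposed as `bubble_key`.

```typescript
// Hypothetical row shape; `key` stands in for the SQLite row key.
type Bubble = { key: string; inputTokens: number; outputTokens: number }

// Key built from mutable token counts (the shape of the old approach).
const tokenKey = (b: Bubble): string => `cursor:${b.inputTokens}:${b.outputTokens}`
// Key built from the stable row key (the shape of the new approach).
const rowKey = (b: Bubble): string => `cursor:bubble:${b.key}`

// Count how many distinct entries survive dedup across repeated parses.
function countUnique(parses: Bubble[], keyOf: (b: Bubble) => string): number {
  const seen = new Set<string>()
  for (const b of parses) seen.add(keyOf(b))
  return seen.size
}

// The same bubble, read once mid-stream and once after streaming completed.
const midStream: Bubble = { key: 'bubbleId:abc', inputTokens: 100, outputTokens: 0 }
const finished: Bubble = { key: 'bubbleId:abc', inputTokens: 100, outputTokens: 250 }
```

Keying on token counts treats the two reads as different calls, while keying on the row identifier collapses them into one.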
--- a/src/providers/droid.ts
+++ b/src/providers/droid.ts
@ -206,7 +206,12 @@ function createParser(

      if (assistantCalls.length === 0) return

-      // Distribute session-level token usage across calls
+      // KNOWN LIMITATION: Droid records token usage only at session level
+      // (settings.tokenUsage), not per-message. We split evenly across the
+      // emitted assistant calls and price all of them at settings.model
+      // (the latest model the session used). For sessions where the user
+      // switched models mid-stream, costs are approximate — we have no
+      // ground-truth breakdown to attribute tokens per model.
      const totalTokens = settings.tokenUsage
      if (!totalTokens) return

--- a/src/providers/gemini.ts
+++ b/src/providers/gemini.ts
@ -84,7 +84,7 @@ function parseSession(data: GeminiSession, seenKeys: Set<string>): ParsedProvide
  for (const msg of geminiMessages) {
    const t = msg.tokens!
    totalInput += t.input ?? 0
-    totalOutput += (t.output ?? 0) + (t.thoughts ?? 0)
+    totalOutput += t.output ?? 0
    totalCached += t.cached ?? 0
    totalThoughts += t.thoughts ?? 0
    if (msg.model && !model) model = msg.model
@ -119,7 +119,10 @@ function parseSession(data: GeminiSession, seenKeys: Set<string>): ParsedProvide
  const tsDate = new Date(data.startTime)
  if (isNaN(tsDate.getTime()) || tsDate.getTime() < 1_000_000_000_000) return results

-  const costUSD = calculateCost(model, freshInput, totalOutput, 0, totalCached, 0)
+  // Gemini bills thoughts at the output token rate; calculateCost does not
+  // accept a reasoning parameter, so fold thoughts into the output count for
+  // pricing while keeping outputTokens / reasoningTokens reported separately.
+  const costUSD = calculateCost(model, freshInput, totalOutput + totalThoughts, 0, totalCached, 0)

  results.push({
    provider: 'gemini',
--- a/src/providers/pi.ts
+++ b/src/providers/pi.ts
@ -149,7 +149,14 @@ function createParser(source: SessionSource, seenKeys: Set<string>): SessionPars

        if (msg.role !== 'assistant' || !msg.usage) continue

-        const { input, output, cacheRead, cacheWrite } = msg.usage
+        // Coerce undefined/null token fields to 0. Pi/OMP session files
+        // sometimes omit individual usage fields; the destructure used to
+        // pass undefined into calculateCost which then returned NaN, and
+        // that NaN propagated into every aggregate cost total.
+        const input = msg.usage.input ?? 0
+        const output = msg.usage.output ?? 0
+        const cacheRead = msg.usage.cacheRead ?? 0
+        const cacheWrite = msg.usage.cacheWrite ?? 0
        if (input === 0 && output === 0) continue

        const model = msg.model ?? 'gpt-5'
--- a/src/yield.ts
+++ b/src/yield.ts
@ -50,8 +50,35 @@ function getMainBranch(cwd: string): string {
 type CommitInfo = {
  sha: string
  timestamp: Date
-  isRevert: boolean
  inMain: boolean
+  /** Set when a LATER commit's body says "This reverts commit <sha>" — i.e. the work in this commit was reverted out of main. */
+  wasReverted: boolean
+}
+
+/**
+ * Find SHAs that were the target of a `git revert` ANYWHERE in the repo's
+ * history (not just the time window). The standard `git revert` body
+ * format is "This reverts commit <SHA>." which we grep out.
+ *
+ * The previous implementation flagged a commit as `isRevert` based on the
+ * substring "revert" appearing in its OWN subject. Two bugs there:
+ * 1. Subjects like "Add revert button" matched.
+ * 2. The session that PERFORMED the revert was tagged "reverted", not the
+ *    session whose work was being reverted — so the original session always
+ *    looked productive even after its work was thrown away.
+ */
+function getRevertedShas(cwd: string): Set<string> {
+  const bodies = runGit(
+    ['log', '--all', '--grep=^This reverts commit', '--format=%B%x1e'],
+    cwd,
+  ) ?? ''
+  const set = new Set<string>()
+  const re = /This reverts commit ([0-9a-f]{7,40})/g
+  let m: RegExpExecArray | null
+  while ((m = re.exec(bodies)) !== null) {
+    set.add(m[1].toLowerCase())
+  }
+  return set
 }

 function getCommitsInRange(cwd: string, since: Date, until: Date, mainBranch: string): CommitInfo[] {
@ -68,14 +95,21 @@ function getCommitsInRange(cwd: string, since: Date, until: Date, mainBranch: st
  const mainCommits = new Set(
    (runGit(['log', mainBranch, '--format=%H'], cwd) ?? '').split('\n').filter(Boolean)
  )
+  const revertedShas = getRevertedShas(cwd)

  return log.split('\n').filter(Boolean).map(line => {
-    const [sha, timestamp, subject] = line.split('|')
+    const [sha] = line.split('|')
+    const timestamp = line.split('|')[1] ?? ''
    return {
      sha,
      timestamp: new Date(timestamp),
-      isRevert: subject.toLowerCase().includes('revert'),
      inMain: mainCommits.has(sha),
+      // wasReverted: true when ANY later commit's body says
+      // "This reverts commit <sha>". Compare against the full SHA AND its
+      // 7-char short prefix to be safe; hand-edited revert messages
+      // sometimes record the short form.
+      wasReverted: revertedShas.has(sha.toLowerCase()) ||
+                   revertedShas.has(sha.toLowerCase().slice(0, 7)),
    }
  })
 }
@ -101,7 +135,10 @@ function categorizeSession(
  }

  const inMainCount = relevantCommits.filter(c => c.inMain).length
-  const revertedCount = relevantCommits.filter(c => c.isRevert && c.inMain).length
+  // A session is "reverted" when at least half of its in-main commits were
+  // later reverted out (revert detected via "This reverts commit <sha>"
+  // anywhere in history, not only within the session's time window).
+  const revertedCount = relevantCommits.filter(c => c.inMain && c.wasReverted).length

  if (revertedCount > 0 && revertedCount >= inMainCount / 2) {
    return { category: 'reverted', commitCount: relevantCommits.length }
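Reviewer note: the SHA extraction in getRevertedShas can be exercised without a repository. The sketch below is a hypothetical helper that takes the concatenated commit bodies as a string; the sample bodies are invented for illustration.

```typescript
// Hypothetical helper: pull reverted SHAs out of concatenated commit bodies.
function extractRevertedShas(bodies: string): Set<string> {
  const shas = new Set<string>()
  const re = /This reverts commit ([0-9a-f]{7,40})/g
  let m: RegExpExecArray | null
  while ((m = re.exec(bodies)) !== null) {
    shas.add(m[1].toLowerCase())
  }
  return shas
}

// Invented sample: one real revert body and one subject that merely
// mentions the word "revert". Bodies are separated by \x1e, matching
// the %x1e separator in the log format.
const sampleBodies =
  'Revert "Add cache"\n\nThis reverts commit 0123456789abcdef0123456789abcdef01234567.\n\x1e' +
  'Add revert button\n\x1e'
const reverted = extractRevertedShas(sampleBodies)
```

Only the commit named in the revert body is flagged; a subject such as "Add revert button" contributes nothing, which is the first of the two bugs described in the doc comment.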