Commit graph

24 commits

Author SHA1 Message Date
iamtoruk
2fb078bdfb Fix V8 OOM crash on 30-day period with Buffer-based line reader and large-line parser
Three-layer fix for V8 heap exhaustion when parsing heavy session data:

1. Buffer-based readSessionLines (fs-utils.ts): Replace readline with raw
   Buffer streaming using Buffer.indexOf(0x0a). Eliminates ConsString trees
   that caused OOM when regex-flattening 100MB+ lines. Two-state machine
   (ACCUMULATING/SCANNING) skips old lines at ~2KB cost instead of 200MB.

2. Large-line streaming parser (parser.ts): Hand-written JSON scanner for
   lines >32KB extracts only cost/token/tool fields without JSON.parse,
   avoiding full object graph allocation. Dual string/Buffer paths.

3. Dashboard memory management (dashboard.tsx): Disable auto-refresh for
   heavy periods (30d/month/all), clear old dataset before reload via
   nextTick to allow GC, prevent overlapping reloads with mutex, lazy
   optimize scanning on keypress instead of useEffect.
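
The newline scan in item 1 can be sketched roughly as follows. This is an illustration only, not the project's `readSessionLines`: the `LineSplitter` class and `joinParts` helper are hypothetical names, and plain `Uint8Array` is used in place of Node's `Buffer` (which shares the same `indexOf` and `subarray` methods).

```typescript
// Hypothetical helper, not the project's readSessionLines: it splits a
// stream of byte chunks into lines by scanning for the newline byte 0x0a,
// so no per-line string is built until a caller asks for one.
class LineSplitter {
  // Pieces of the current line that arrived in earlier chunks.
  private pending: Uint8Array[] = [];

  // Feed one chunk; returns every line completed by this chunk.
  push(chunk: Uint8Array): Uint8Array[] {
    const lines: Uint8Array[] = [];
    let start = 0;
    let nl = chunk.indexOf(0x0a, start);
    while (nl !== -1) {
      this.pending.push(chunk.subarray(start, nl));
      lines.push(joinParts(this.pending));
      this.pending = [];
      start = nl + 1;
      nl = chunk.indexOf(0x0a, start);
    }
    // Keep the unterminated tail for the next chunk.
    if (start < chunk.length) this.pending.push(chunk.subarray(start));
    return lines;
  }

  // Call once at end of input to get a final line with no trailing newline.
  flush(): Uint8Array | null {
    if (this.pending.length === 0) return null;
    const last = joinParts(this.pending);
    this.pending = [];
    return last;
  }
}

// Concatenates the collected pieces into one contiguous byte array.
function joinParts(parts: Uint8Array[]): Uint8Array {
  let total = 0;
  for (const p of parts) total += p.length;
  const out = new Uint8Array(total);
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}
```

A caller that wants to skip old lines cheaply could inspect only the first bytes of each returned array before deciding whether to decode it.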

Also fixes three race conditions in dashboard reload deduplication:
- Early return after nextTick bypassing finally block (permanent mutex lock)
- A->B->A period switching dropping final reload (stale pending)
- Stale pendingReloadRef not cleared when in-flight matches request
2026-05-15 23:15:26 -07:00
Resham Joshi
46e43a0ec3
Label optimize suggestions by destination (#281)
Closes #277.

Every paste-style fix now declares an explicit `destination` so users can
tell at a glance whether a suggestion belongs in CLAUDE.md as a permanent
rule, in a one-time session opener, in the current chat as an ask, or in
a shell config file. Previously the prompts had no labeled home and users
were dropping one-time session openers into CLAUDE.md as permanent rules.

Type changes:
- New `PasteDestination` union: `claude-md` / `session-opener` / `prompt`
  / `shell-config`
- `WasteAction.paste` gains `destination?: PasteDestination`

Renderer changes:
- CLI `optimize` command (renderOptimize → renderFinding) prints a
  section header above each fix block:
    -- Suggested CLAUDE.md addition (permanent rule) ───
    -- One-time session opener (do NOT add to CLAUDE.md) ───
    -- Ask Claude in the current session ───
    -- Add to your shell config ───
    -- Run this command ───
- Interactive dashboard (FindingAction in dashboard.tsx) gets the same
  treatment so the in-popover findings list reads identically.

Existing fixes retagged appropriately. Two existing prompts that lacked
destination context altogether ("Set a delivery checkpoint at the start
of the next expensive thread", "Start the next expensive thread with a
fresh-context constraint") now read as one-time session openers with a
clear "do not add to CLAUDE.md" hint — the exact failure mode the
reporter described.

Tests:
- Existing `detectJunkReads` test extended to assert the destination tag.
- New regression block walks every detector that emits a paste-style fix
  and asserts each one declares a destination — future detectors that
  ship without one get caught here.
2026-05-08 23:30:53 -07:00
Resham Joshi
afd0ee7011
Validator hardenings on the bug-hunt batch (#254)
* Five correctness fixes from multi-agent bug hunt

A multi-agent audit of the codeburn correctness surface found five
real bugs each producing visibly wrong numbers or risking data loss.
All five fixes were validated by parallel review agents and exercised
end-to-end against real session data on this machine.

- src/cli.ts: --refresh <seconds> was using bare parseInt as the
  commander callback. Commander invokes the callback as
  parseInt(value, previous), so previous becomes the radix:
  --refresh 30 was being parsed as parseInt('30', 30) = 90, and
  --refresh 60 became NaN. Replaced with parseInteger (already
  defined at line 48 with radix locked to 10) at all three sites.

- src/providers/cursor.ts: parseAgentKv was timestamping every
  agentKv call as new Date().toISOString() because the Cursor
  SQLite schema has no per-message timestamp. Result: every
  Cursor agent call regardless of when it happened landed in
  today's date bucket. Now uses statSync(dbPath).mtimeMs as a
  bounded ceiling so calls land at the actual last-write time of
  the Cursor database, not today. Verified locally: a 1904-call
  Cursor history with a March 22 mtime now correctly buckets into
  all-time only and shows 0 calls for today/week/30days.

- src/providers/codex.ts: prev token counters were only updated
  inside the cumulative-fallback branch, so a session emitting N
  events with last_token_usage followed by one cumulative-only
  event computed the next delta against prev=0 and double-counted
  the entire cumulative window. Cost could be inflated 10-100x
  for any mixed-format Codex session. Now prev advances to the
  current cumulative state regardless of which branch ran.

- src/providers/gemini.ts: totalOutput accumulated output+thoughts
  while totalThoughts was tracked separately. The result was
  outputTokens = output+thoughts AND reasoningTokens = thoughts;
  any consumer summing the two double-counted thoughts. Now
  totalOutput holds just output, reasoningTokens holds thoughts,
  and the cost calc folds thoughts into the output count to keep
  pricing correct (Google bills thoughts at the output rate;
  calculateCost has no reasoning parameter).

- src/export.ts: exportJson had no safety check before writeFile,
  so codeburn export -f json -o ~/important.json would silently
  clobber the user's file. CSV path had a marker-file guard; JSON
  did not. Now refuses to overwrite a file unless its first 4KB
  contain the codeburn schema marker. Uses a streaming partial
  read so a large existing file does not OOM Node's ~512MB
  string limit. Refuses directories outright.
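
The `--refresh` radix bug from the first bullet can be reproduced in isolation. The `OptionParser` type below is a simplified stand-in for a Commander-style callback, and `parseInteger` here is an assumed shape for the project's helper, not a copy of it.

```typescript
// Simplified shape of a Commander-style option callback: (value, previous).
type OptionParser = (value: string, previous: number) => number;

// Buggy: parseInt's second parameter is the radix, so `previous` becomes the radix.
const buggyParser: OptionParser = parseInt;

// Stand-in for the project's parseInteger: ignores `previous`, radix fixed at 10.
const parseInteger: OptionParser = (value) => parseInt(value, 10);

const viaBuggy30 = buggyParser("30", 30); // 3 * 30 + 0 = 90
const viaBuggy60 = buggyParser("60", 60); // a radix above 36 is invalid, so NaN
const viaFixed30 = parseInteger("30", 30); // 30
const viaFixed60 = parseInteger("60", 60); // 60
```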

Skipped intentionally: cursor-auto/copilot-auto/cline-auto/
qwen-auto are aliased to claude-sonnet-4-5. The audit flagged
this as wrong pricing for non-Anthropic auto-routed turns, but
Cursor's "auto" mode does not expose the actual model and any
alternative estimate is equally arbitrary. README already
documents this as a Sonnet-based estimate.

vitest run: 38 files, 529 tests pass.

* Five more correctness fixes from the bug-hunt round

This commit closes out the remaining critical-tier findings from the
multi-agent audit, with one item documented as a known limitation.

- src/providers/cursor.ts: bubble dedup key included mutable
  inputTokens/outputTokens. Cursor mutates token counts on the row in
  place when streaming completes, so re-parsing the same DB produced
  a fresh dedup key per bubble and silently double-counted. Switched
  to the SQLite row key (`bubbleId:<unique>`) which is stable per
  bubble. Adjusted BubbleRow type and BUBBLE_QUERY_BASE to expose
  `key as bubble_key`.

- src/providers/pi.ts: usage fields were destructured non-optionally,
  but real Pi/OMP session files sometimes omit individual fields.
  `calculateCost(model, undefined, ...)` returned NaN, and that NaN
  propagated into every aggregate cost total. Coerce each field to
  0 with `?? 0`.

- src/models.ts: getShortModelName and the getModelCosts startsWith
  fallback both walked the dictionary in insertion order. A model id
  like `gpt-5-mini` could resolve to the entry for `gpt-5` (matched
  by startsWith first) and silently get GPT-5's display name and
  pricing tier. Iterate longest keys first so more-specific prefixes
  win. Tightened the cost fallback's match condition from
  `startsWith(key) || startsWith(key + '-')` to require either an
  exact match or a `key + '-'` continuation, removing accidental
  matches like `gpt-50` against `gpt-5`.

- src/models.ts: calculateCost returned 0 silently for any model
  missing from the pricing snapshot. New Anthropic / OpenAI models
  shipped between snapshot refreshes look free until the user
  notices. Now warns once per unknown model name per process to
  stderr. Skips the warning for the `<synthetic>` placeholder so
  the noise floor stays low.

- src/yield.ts: revert detection was broken on the canonical case.
  Two problems: (1) `subject.toLowerCase().includes('revert')`
  matched any commit whose subject mentioned the word ("Add revert
  button" was misclassified). (2) The window logic only counted
  reverts within the original session's 1-hour boundary, but real
  `git revert` commits land in later sessions, so original sessions
  always looked productive. Now: getRevertedShas runs once with
  `--grep=^This reverts commit` and parses bodies to build a Set of
  SHAs that were the target of a revert anywhere in history.
  CommitInfo.wasReverted is set when this commit's SHA appears in
  that set. categorizeSession then flags a session as reverted when
  its in-main commits were later reverted, regardless of when the
  revert itself happened.

- src/providers/droid.ts: SKIPPED with comment. Droid records token
  usage only at session level. The current behavior splits evenly
  across emitted assistant calls and prices all of them at
  settings.model (the latest model). For sessions where the user
  switched models mid-stream, costs are approximate. Added an
  inline comment documenting this; a real fix requires per-message
  model data that isn't in the Droid JSONL schema.
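
The longest-key-first lookup described for src/models.ts can be sketched as below. The table contents and the `resolveModelKey` name are hypothetical; only the matching rule (exact match or `key + '-'` continuation, longest keys tried first) comes from the description above.

```typescript
// Hypothetical table; the real keys and values live in the project's models module.
const DISPLAY_NAMES: Record<string, string> = {
  "gpt-5": "GPT-5",
  "gpt-5-mini": "GPT-5 Mini",
};

// Longest keys first, so "gpt-5-mini" is tried before "gpt-5".
const KEYS_LONGEST_FIRST = Object.keys(DISPLAY_NAMES).sort(
  (a, b) => b.length - a.length,
);

function resolveModelKey(model: string): string | undefined {
  for (const key of KEYS_LONGEST_FIRST) {
    // Exact match, or the key followed by a '-' continuation.
    if (model === key || model.indexOf(key + "-") === 0) return key;
  }
  return undefined;
}
```

With this rule, `gpt-5-mini-2025-08-07` resolves to `gpt-5-mini`, while `gpt-50` no longer matches `gpt-5` at all.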

Verified end-to-end on this machine:
- vitest run: 38 files, 529 tests pass
- `codeburn report --format json` produces valid JSON
- `codeburn yield -p week` runs without crashing, finds 0 reverts
  in the user's recent git history (plausible — fix changed the
  detection from "subject contains revert" to "this commit's SHA
  appears in a later 'This reverts commit ...' body")
- Stderr now warns for unknown model ids: `openai/gpt-5.3`,
  `qwen3.6:35b-a3b-bf16`, `big-pickle`. These previously priced
  silently at $0.

* Four high-severity fixes from the bug-hunt round

- src/currency.ts: getExchangeRate wrapped fetchRate and cacheRate in
  one try/catch. If fetchRate succeeded but cacheRate threw (disk
  full, ENOSPC, no permissions on the cache dir), the catch block
  swallowed the error and returned 1. Every cost rendered after that
  point became USD-equivalent silently. Now the fetch and the cache
  write live in separate paths: a successful fetch returns the rate
  even if the persist fails, and the cache write is fire-and-forget
  with its errors dropped, so transient disk problems do not corrupt
  the user's currency display.

- src/cursor-cache.ts: writeFile was non-atomic. Two concurrent
  codeburn invocations writing to cursor-results.json could
  interleave bytes mid-write, leaving a truncated file that
  failed to parse on the next read and forced a full SQLite re-scan every
  run. Switched to the temp-file + rename pattern with a randomized
  temp name so each writer gets its own staging file and the rename
  is atomic on POSIX. Crash mid-write also leaves only a leftover
  temp file, which gets unlinked in the catch path; the destination
  is never half-written.

- mac/.../CodeBurnApp.swift refresh loop on sleep: the loop's
  Task.sleep keeps a wakeup pending across system sleep, so on wake
  the natural tick fires the same instant the wake observers do.
  Combined with didWakeNotification, screensDidWakeNotification, and
  the launchd com.codeburn.refresh distributed notification, that
  produced 2-3 concurrent CLI spawns within ms of every wake. Now:
  willSleepNotification cancels the loop task; didWakeNotification
  restarts it. The loop also reads lastRefreshTime and skips its
  natural tick if a wake/manual/distributed-notification refresh ran
  within the last 5 seconds, coalescing the two sources of refresh
  into one CLI spawn per wake event.

- mac/.../CodeBurnApp.swift observeStore: the read closure had an
  implicit strong self capture (it accessed store.* without a
  capture annotation), pinning self for the lifetime of any
  unfired observation. Added [weak self] and a guard to make the
  capture explicit. withObservationTracking is one-shot per call,
  so there is at most one active subscription at a time; the
  earlier audit's claim of an unbounded leak overstated the issue,
  but tightening the capture pattern is still cleaner.
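
The split between fetch and cache write in src/currency.ts could look roughly like this. The function name `getRate` and the injected `fetchRate` / `cacheRate` parameters are assumptions made to keep the sketch self-contained; the fallback to 1 on a failed fetch is also an assumption.

```typescript
// Both dependencies are injected so the sketch stays self-contained;
// the real fetchRate and cacheRate signatures may differ.
async function getRate(
  fetchRate: () => Promise<number>,
  cacheRate: (rate: number) => Promise<void>,
): Promise<number> {
  let rate: number;
  try {
    rate = await fetchRate();
  } catch {
    // Assumption: fall back to 1 (no conversion) only when the fetch itself fails.
    return 1;
  }
  // Fire-and-forget: a failed cache write must not discard a good rate.
  // Wrapping in a promise chain also catches a synchronous throw from cacheRate.
  Promise.resolve()
    .then(() => cacheRate(rate))
    .catch(() => undefined);
  return rate;
}
```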

Verified:
- vitest run: 38 files, 529 tests pass
- swift build -c release --arch arm64 --arch x86_64: clean, no
  diagnostics, no MainActor warnings
- mac/Scripts/package-app.sh dev produces a valid universal bundle
- Menubar launches and runs without crash

* Eleven medium-severity fixes from the bug-hunt round

- src/format.ts formatTokens: guard against Infinity, NaN, and
  negative input. Previously a corrupt aggregate could leak into
  the UI as the literal strings "NaN" or "Infinity". Negatives now
  render as "0" rather than "-500" with no scaling.

- src/cli-date.ts parseDateRangeFlags: the missing-from default
  was new Date(0), which opened a 55-year scan from 1970 epoch
  whenever the user passed only --to. Default now anchors at 6
  months back from now, matching the dashboard's all-time period.
  Test updated to assert the new bounded window.

- src/cli-date.ts toPeriod: previously fell back silently to "week"
  for any unknown input, so a typo like `-p mounth` produced a
  quiet 7-day report while the user thought they were viewing the
  month. Now exits with a clear stderr error and exit code 1.
  Test updated to assert the loud-failure behavior.

- src/optimize.ts urgencyScore: rebalanced weights so a high-impact
  finding with zero observed tokens cannot outrank a medium-impact
  finding with millions of tokens. Old 0.7/0.3 split made high+0
  (0.70) beat medium+1B (0.65). New 0.5/0.5 split makes medium+1B
  (0.75) beat high+0 (0.50). Token normalization lifted to 5M so
  the ramp covers a realistic spend range.

- src/models.ts calculateCost: clamp negative or non-finite token
  inputs to 0 before pricing. A corrupt JSONL emitting a negative
  count would otherwise produce a negative cost that silently
  subtracted from real spend in aggregates.

- src/currency.ts convertCost: stop rounding during aggregation.
  For zero-fraction currencies (JPY, KRW, CLP) this clamped every
  per-session cost to a whole unit before sum, so a project of
  1000 sessions averaging ¥0.4 each aggregated to ¥0 instead of
  ¥400. formatCost still rounds at the display boundary.

- src/config.ts saveConfig: the temp file path was a fixed
  `${configPath}.tmp` suffix. Two simultaneous saveConfig calls
  (overlapping menubar and CLI runs) raced on the same staging
  file and could leave one writer reading partial bytes from the
  other. Randomized the temp suffix per call.

- src/providers/antigravity.ts flushCache: the early return on
  `!cacheDirty` short-circuited eviction when liveCascadeIds was
  supplied but no cascade had been added or updated this run. As
  a result, deleted .pb files persisted in the cache forever once
  the user stopped writing to it. Eviction now runs whenever
  liveCascadeIds is provided, marks the cache dirty if anything
  was removed, and only then short-circuits if there is nothing
  to write.

- src/daily-cache.ts addNewDays: cap retention at 2 years. The
  days array previously merged forever, growing the cache file by
  hundreds of bytes per day until JSON parse on every CLI
  invocation became measurable. The 6-month UI period plus the
  365-day BACKFILL_DAYS bootstrap both fit comfortably inside the
  cap, with headroom for a future longer window.

- src/dashboard.tsx useInput: period number keys (1-5) and arrow
  keys triggered a reload while the compare view was mounted. The
  parent's data state changed underneath the user with no visual
  affordance back to the dashboard. Now those keys are gated on
  view !== 'compare', and `b` / Esc inside compare returns to the
  dashboard.

- mac/.../HeatmapSection.swift formatters: prettyDate,
  buildTrendBars, computeTrendStats, computeForecast, and computeAllStats
  each allocated a fresh DateFormatter (and Calendar) on every
  call. SwiftUI re-evaluates these views many times per second
  during hover scrubbing on the trend chart, so the allocations
  were a measurable hot spot. Lifted the yyyy-MM-dd / "EEE MMM d"
  / "MMM d" formatters and the gregorian Calendar to fileprivate
  cached singletons.
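
The urgencyScore arithmetic quoted above can be checked with a small sketch. The impact weights (high = 1.0, medium = 0.5) are assumptions chosen because they reproduce the quoted figures; the function signature is hypothetical.

```typescript
type Impact = "high" | "medium";

// Assumed impact weights; they reproduce the numbers quoted in the message.
const IMPACT_WEIGHT: Record<Impact, number> = { high: 1.0, medium: 0.5 };

// Token count at which the token component saturates (5M per the message).
const TOKEN_CAP = 5_000_000;

// impactShare is the weight given to impact; the remainder goes to tokens.
function urgencyScore(impact: Impact, tokens: number, impactShare: number): number {
  const normalizedTokens = Math.min(Math.max(tokens, 0) / TOKEN_CAP, 1);
  return impactShare * IMPACT_WEIGHT[impact] + (1 - impactShare) * normalizedTokens;
}

const ONE_BILLION = 1_000_000_000;
const oldHighZero = urgencyScore("high", 0, 0.7); // about 0.70
const oldMediumBillion = urgencyScore("medium", ONE_BILLION, 0.7); // about 0.65
const newHighZero = urgencyScore("high", 0, 0.5); // 0.50
const newMediumBillion = urgencyScore("medium", ONE_BILLION, 0.5); // 0.75
```

Under the old split the zero-token high-impact finding ranks first; under the new split the ordering flips.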

Two findings from the same bucket were not addressed here:
- UpdateChecker SHA-256 / codesign verification is already
  performed by src/menubar-installer.ts (verifyChecksum at line
  85). The Swift side just kicks off `codeburn menubar --force`
  which runs that path. The audit's claim of missing verification
  was a misread.
- NSDistributedNotificationCenter sender validation: the
  `com.codeburn.refresh` listener accepts from any sender, but
  forceRefresh has a 5-second rate-limit gate so the abuse
  ceiling is one CLI spawn per 5 seconds. Mitigations (Mach IPC,
  per-launch shared secret) are disproportionate to the impact.

vitest run: 38 files, 529 tests pass.
swift build -c release: clean, no warnings.

* Validator hardenings on the bug-hunt batch

Hoist the per-call sort in getModelCosts and getShortModelName to module
scope so model lookups on the hot path stop reallocating sorted key arrays.

Sanitize the unknown-model stderr warning by stripping C0/C1 controls
and capping length, so a hostile or corrupt JSONL cannot inject terminal
escape sequences via the model field.

Skip the daily-cache prune when newestDate fails to parse. The previous
code produced a NaN cutoff and silently dropped every cached day on the
next merge.

Adds tests locking down the stable resolution of common model names
(gpt-5-mini vs gpt-5, claude-haiku-4-5 vs claude-3-5-haiku, etc.) and
the prune NaN guard.
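
A minimal sketch of the warning sanitization, assuming a hypothetical length cap and warning wording (neither is taken from the project):

```typescript
const MAX_MODEL_NAME_LENGTH = 64; // hypothetical cap
const warnedModels = new Set<string>();

function sanitizeModelName(raw: string): string {
  // Drop C0 controls (U+0000-U+001F), DEL (U+007F), and C1 controls
  // (U+0080-U+009F), then cap the length.
  return raw
    .replace(/[\u0000-\u001f\u007f-\u009f]/g, "")
    .slice(0, MAX_MODEL_NAME_LENGTH);
}

// Returns the warning text the first time a model is seen, otherwise null.
// The wording is illustrative only.
function unknownModelWarning(raw: string): string | null {
  const name = sanitizeModelName(raw);
  if (warnedModels.has(name)) return null;
  warnedModels.add(name);
  return 'warning: no pricing data for model "' + name + '"';
}
```

Removing the ESC byte (U+001B) is what defuses an embedded terminal escape sequence; the remaining printable characters are harmless on their own.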
2026-05-06 19:50:40 -07:00
Resham Joshi
75d4701bd8
feat(optimize): flag low-worth expensive sessions
Adds a low-worth detector to codeburn optimize that flags expensive sessions with weak delivery signals (no edits, repeated retries, or no one-shot edits) when no git/gh delivery command is observed. Priority order is low-worth → context-bloat → outliers; each later detector excludes sessions named by an earlier one so the same session is never listed in three findings. Detection:  floor,  for no-edit, 3+ retries, regex matches git commit/push and gh pr create/merge but excludes commit-tree/commit-graph and dry-run. Three impact tiers consistent with #246. Token-savings uses full session tokens for no-edit sessions and the retry fraction for edit-with-retry sessions. Supersedes #241 with review fixes. Original implementation by @ozymandiashh.
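
One way to express the delivery-command match described above, as a sketch only; the actual regex in the detector is not shown in this message, and `isDeliveryCommand` is a hypothetical name.

```typescript
// Matches `git commit`, `git push`, `gh pr create`, `gh pr merge`.
// The negative lookahead rejects continuations such as commit-tree or commit-graph.
const DELIVERY_PATTERN =
  /\b(?:git\s+(?:commit|push)|gh\s+pr\s+(?:create|merge))(?![\w-])/;

function isDeliveryCommand(command: string): boolean {
  // Dry runs deliver nothing, so they are excluded up front.
  if (/--dry-run\b/.test(command)) return false;
  return DELIVERY_PATTERN.test(command);
}
```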
2026-05-06 00:35:41 -07:00
Resham Joshi
f92d57d24a
feat(optimize): detect context-heavy sessions
Adds a context-bloat finding to codeburn optimize that flags sessions where effective input/cache tokens (cache-discounted via existing pricing constants) are large and disproportionate to output. Suggests starting fresh with a tightened context. Sessions flagged here are excluded from the cost-outlier finding to avoid double-listing. Growth-from-previous-session callouts are suppressed when the predecessor is more than 7 days back. Three impact tiers (low/medium/high). Supersedes #242 with review fixes from real-data probe. Original implementation by @ozymandiashh.
2026-05-06 00:11:12 -07:00
iamtoruk
38d21643bd Merge origin/main into feat/session-outlier-detection 2026-05-04 20:21:26 -07:00
iamtoruk
735f41bc6c Fix cache-write pricing and shell-quote server names in fix commands
- Use 1.25x multiplier for cache-write tokens to match Anthropic's
  actual pricing (was incorrectly using 1x)
- Shell-quote server names in `claude mcp remove` fix text to prevent
  issues with unusual server names
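
Shell-quoting a server name for the `claude mcp remove` fix text might look like this. The helper names are hypothetical, and the quoting follows the usual POSIX single-quote convention rather than the project's exact implementation.

```typescript
function shellQuote(arg: string): string {
  // Names made only of safe characters pass through unquoted.
  if (/^[A-Za-z0-9_.\/:@-]+$/.test(arg)) return arg;
  // Otherwise wrap in single quotes; each embedded single quote is written
  // as: close quote, escaped quote, reopen quote.
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

function removeCommand(serverName: string): string {
  return "claude mcp remove " + shellQuote(serverName);
}
```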
2026-05-04 20:11:50 -07:00
ozymandiashh
d18ba3d2fe feat(optimize): detect session cost outliers 2026-05-05 05:25:49 +03:00
ozymandiashh
e46b20b927 fix(optimize): reuse mcp coverage and type schema estimator 2026-05-05 05:11:00 +03:00
ozymandiashh
1a080a006f feat(optimize): MCP tool coverage detector with cache-aware costing
Adds a per-tool optimizer finding for MCP servers whose schema is loaded
on every turn but rarely invoked. Builds on the existing server-level
`detectUnusedMcp` (zero invocations) by reporting partial-use cases:
"loaded 54 tools, called 0" or "loaded 26 tools, called 2 (8% coverage)".

Inventory comes from Claude Code's JSONL `attachment.deferred_tools_delta`
entries: `addedNames` lists the exact tools available at that turn,
including every fully-qualified `mcp__<server>__<tool>` name. We union
across all delta entries in a session (not just the first) because tool
availability can change mid-session when the user reloads MCP config or
a subagent inherits a different tool set. Names that don't match the
`mcp__<server>__<tool>` shape with both segments non-empty are rejected
at extraction so downstream `split('__')` consumers can't be poisoned.
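
The name validation described above could be sketched like this; `parseMcpToolName` and the `McpToolName` type are hypothetical names, not the extractor's actual code.

```typescript
interface McpToolName {
  server: string;
  tool: string;
}

// Accepts only `mcp__<server>__<tool>` with both segments non-empty.
function parseMcpToolName(name: string): McpToolName | null {
  const prefix = "mcp__";
  if (name.indexOf(prefix) !== 0) return null;
  const rest = name.slice(prefix.length);
  const sep = rest.indexOf("__");
  if (sep <= 0) return null; // no separator, or an empty server segment
  const tool = rest.slice(sep + 2);
  if (tool.length === 0) return null; // empty tool segment
  return { server: rest.slice(0, sep), tool };
}
```

Single underscores inside a segment are left alone, so a server such as `claude_ai_Asana` parses as one segment.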

Token-savings estimates are cache-aware. MCP tool schemas live in the
cached prefix of the system prompt: a session pays the full input price
on each cache-creation turn (rebuilds happen every ~5 minutes of
inactivity) and the cache-read discount on subsequent turns. Each call's
contribution is capped at its observed `cacheCreationInputTokens` /
`cacheReadInputTokens` so we never claim more MCP overhead than the
call's own cache buckets could contain.

When multiple servers are flagged, costing happens in a single combined
pass: the per-call cap applies to the total unused-schema budget across
all flagged servers, not per server. Two flagged servers cannot both
independently claim the same call's cache bucket, which would otherwise
overstate `tokensSaved` and misclassify findings as high impact.

A session counts toward `loadedSessions` (and toward the cost estimate)
only if its observed inventory included the server. Pure invocation-only
sessions, where the server appears in `mcpBreakdown` or `call.mcpTools`
without any matching `deferred_tools_delta`, do not satisfy the
`>= 2 sessions` threshold on their own. The same invariant applies in
`estimateMcpSchemaCost` so the two passes agree.

Coverage is computed against the inventory only: invocations of names
not present in any observed inventory (older config, hallucinated tool,
typo) do not inflate `toolsInvoked` and cannot drive `unusedCount`
negative. `toolsInvoked` is derived as `inventory.size - unusedTools.length`
to keep both numbers consistent.

`detectUnusedMcp` and the new detector are explicitly disjoint:
`detectUnusedMcp` skips servers that the coverage detector will report,
not every server that happens to be in any inventory, so a small
inventoried-but-uninvoked server below the coverage thresholds still
gets flagged as "configured but never called."

Thresholds for the coverage finding:
- > 10 tools available (small servers are noise)
- < 20% coverage
- >= 2 sessions with observed inventory
- High impact when total effective tokens >= 200_000 or >= 3 servers flagged
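
The thresholds above translate into a small predicate. The field and function names here are hypothetical; only the numeric gates come from the list.

```typescript
interface ServerCoverage {
  toolsAvailable: number; // tools seen in the observed inventory
  toolsInvoked: number; // inventory tools that were actually called
  sessionsWithInventory: number;
}

const MIN_TOOLS_EXCLUSIVE = 10;
const MAX_COVERAGE_EXCLUSIVE = 0.2;
const MIN_SESSIONS = 2;
const HIGH_IMPACT_TOKENS = 200_000;
const HIGH_IMPACT_SERVERS = 3;

function shouldFlagCoverage(c: ServerCoverage): boolean {
  if (c.toolsAvailable <= MIN_TOOLS_EXCLUSIVE) return false; // small servers are noise
  if (c.sessionsWithInventory < MIN_SESSIONS) return false;
  return c.toolsInvoked / c.toolsAvailable < MAX_COVERAGE_EXCLUSIVE;
}

function isHighImpact(totalEffectiveTokens: number, flaggedServers: number): boolean {
  return totalEffectiveTokens >= HIGH_IMPACT_TOKENS || flaggedServers >= HIGH_IMPACT_SERVERS;
}
```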

Smoke-tested on a real account: 7 servers flagged across 93 sessions
(`office-word-mcp` 0/54, `notebooklm-mcp` 0/38, `office-ppt-mcp` 0/37,
`excel-mcp-server` 0/25, `github-mcp-server` 2/26, `peekaboo` 3/22, plus
`claude_ai_Asana`). Combined-cap costing keeps `tokensSaved` honest.

Changes:
- src/types.ts: optional `mcpInventory: string[]` on `SessionSummary`.
  Provider-agnostic field; currently populated only by the Claude parser.
- src/parser.ts: `extractMcpInventory` walks all entries, validates
  fully-qualified names, returns sorted unique list. `buildSessionSummary`
  passes it through; field is omitted when empty so JSON exports stay
  clean.
- src/optimize.ts: `aggregateMcpCoverage`, `estimateMcpSchemaCost`
  (single- and multi-server signatures), `detectMcpToolCoverage`. Wired
  into `scanAndDetect`. `detectUnusedMcp` updated to be disjoint from
  the new detector.
- tests/mcp-coverage.test.ts: 23 cases covering aggregation, costing,
  combined-cap behaviour, threshold gates, invocation-only-session
  filtering, foreign-tool invocations, cache rebuild events, write+read
  on the same call, multi-server pluralisation.
- tests/parser-mcp-inventory.test.ts: 12 cases for the JSONL extractor
  including malformed name rejection and tolerant attachment parsing.
- CHANGELOG.md: entry under Unreleased / Added (CLI).

Closes #2
2026-05-05 04:13:04 +03:00
Łukasz Majcher
5e49f17e64 fix: switch scanJsonlFile and parseSessionFile to readSessionLines to prevent OOM
readViaStream (used for files ≥8 MB) reconstructs the full file as a
single string via chunks.join('\n'), giving the same peak allocation as
readFile. Callers then call content.split('\n'), creating a second copy.
With FILE_READ_CONCURRENCY=16 and files up to 128 MB this can exhaust
the V8 heap (~6 GB theoretical peak).

readSessionLines already exists as a proper async generator that yields
one line at a time. Switch both hot-path callers to iterate it directly
so the full file string is never held in memory.

Adds two tests: a spy test confirming readSessionLines is called (not
readSessionFile), and a 500-entry correctness test.

Fixes #131
2026-04-22 10:11:13 +00:00
AgentSeal
77257bcb89
Merge pull request #68 from lfl1337/fix/remove-claudeignore-references
docs(optimize): remove references to .claudeignore (#61)
2026-04-17 14:20:50 +02:00
Ninym
216782391a fix(optimize): use bounded read helpers
All four read paths in the optimizer (async session scan + three sync
config/import/profile scans) now pass through the 128 MB-capped
helpers. JSON.parse in readJsonFile stays wrapped in try/catch.
MEDIUM-1 coverage for the optimize module.
2026-04-17 08:32:20 +02:00
Ninym
bd71377fdd docs(optimize): remove references to non-existent .claudeignore
Claude Code does not document or implement a .claudeignore feature.
The junk-reads detector's fix is now a CLAUDE.md instruction asking
Claude to avoid generated/dependency directories. The separate
detectMissingClaudeignore finding and its tests are removed; checking
for the presence of a non-existent file has no signal.

Closes #61.
2026-04-17 08:32:07 +02:00
AgentSeal
f958756861 chore(optimize): name preview-count literals and drop punctuation dashes
Rename slice(0, N) and length - N literals used in waste-finding
preview strings to GHOST_NAMES_PREVIEW, GHOST_CLEANUP_COMMANDS_LIMIT,
TOP_ITEMS_PREVIEW, MISSING_IGNORE_PATHS_PREVIEW, JUNK_DIRS_IGNORE_PREVIEW.
Drop the punctuation double-hyphen from the config-health header.
2026-04-16 16:20:53 -07:00
AgentSeal
3bd3c0d7b4 chore(optimize): extract trend period magic number to named constant 2026-04-16 16:09:19 -07:00
AgentSeal
b9b2c7a900 chore: remove dead ContextBudgetPanel and unused dateRange parameter
- ContextBudgetPanel in dashboard.tsx was defined but never rendered
  after the per-project overhead column replaced it in Phase 1.
- detectLowReadEditRatio accepted a dateRange parameter that was
  never used inside the function (trend is computed via recent-call
  flags on ToolCall).

Verified with tsc --noUnusedLocals --noUnusedParameters: 0 errors.
160 tests pass.
2026-04-16 09:35:46 -07:00
AgentSeal
b9dffc16ec chore(optimize): remove dead versions plumbing + name remaining magic numbers
- ScanData.versions and ScanFileResult.versions were collected but never
  read in scanAndDetect. Per-call version lives on ApiCallMeta.version
  which is what detectCacheBloat actually uses. Dropped the unused
  aggregation path end-to-end.
- Extract DEFAULT_CACHE_BASELINE_TOKENS, CACHE_BASELINE_QUANTILE,
  CACHE_VERSION_MIN_SAMPLES, CACHE_VERSION_DIFF_THRESHOLD as named
  module-scope constants. Honors the "no magic numbers" project rule
  across the full optimize engine.
2026-04-16 07:09:20 -07:00
AgentSeal
707d2faff1 feat(optimize): UX polish
- Rename "ctx" column to "overhead" with wider padding for legibility
  in the By Project panel.
- Name magic numbers for the project column widths.
- Empty-state now includes a one-line description of what optimize
  does, so first-time users with no findings still understand the
  feature.
2026-04-16 06:59:07 -07:00
AgentSeal
a9ca2a1134 feat(optimize): fix tracking via recent vs baseline split
Solves the problem where users who fixed an issue continued to see
the finding for the remainder of the period. Findings now show
visible progress or disappear entirely.

Mechanism (no state file, no new I/O):
- ToolCall and ApiCallMeta gain a `recent` boolean, set when the
  entry's timestamp falls inside a rolling 48-hour window.
- Each session-based detector counts recent vs total occurrences.
- computeTrend classifies each finding:
    active    -- recent rate matches baseline
    improving -- recent rate under half of baseline (green arrow)
    resolved  -- zero recent waste AND confirmed recent activity
- Resolved findings are suppressed. Improving findings render with
  a green "improving down-arrow" badge next to the impact label.
- When no recent activity exists, findings default to active so a
  user who simply paused is not told everything is fixed.
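
The classification rules above can be sketched as a pure function. `classifyTrend` and its parameters are hypothetical; how the real computeTrend derives its rates is not described here, so they are taken as inputs.

```typescript
type Trend = "active" | "improving" | "resolved";

// recentRate and baselineRate are waste occurrences per unit of activity;
// how they are measured is left to the caller (assumption).
function classifyTrend(
  recentRate: number,
  baselineRate: number,
  hasRecentActivity: boolean,
): Trend {
  // No recent activity: a paused user should not be told everything is fixed.
  if (!hasRecentActivity) return "active";
  if (recentRate === 0) return "resolved";
  if (recentRate < baselineRate / 2) return "improving";
  return "active";
}
```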

Applies to the four session-based detectors: junk reads, duplicate
reads, low read:edit ratio, cache bloat. The filesystem detectors
(missing .claudeignore, bloated CLAUDE.md, unused MCP, ghost agents
/ skills / commands, bash limit) already self-heal on next run.

5 new tests cover computeTrend edge cases. 126 tests pass.
2026-04-16 06:53:08 -07:00
AgentSeal
7e4edf66b6 perf(optimize): mtime pre-filter, parallel reads, result cache
- runWithConcurrency helper runs file reads with configurable parallelism
  (default 16) instead of sequential await.
- isFileStaleForRange skips files whose mtime is older than the date range
  start, avoiding unnecessary reads for narrow periods.
- Result-level cache keyed on (dateRange, project fingerprint) with
  60s TTL. Warm dashboard 'o' press now hits cache instead of rescanning.
- Early-return when projects array is empty so empty-state path does not
  trigger filesystem walk.

Measured CLI cold scan on 12K files, 1833 sessions week:
  before: 12-17s
  after: 6-7s
Dashboard warm cache hit: <50ms.
2026-04-16 06:35:39 -07:00
AgentSeal
be45045fd8 refactor(optimize): correctness, constants, and real tests
Phase 1 hardening pass.

Bug fixes:
- Move cwd/version collection inside date-range filter. 7d and 30d
  now produce different findings for filesystem detectors.
- detectGhostSkills threshold aligned with peer detectors.
- detectUnusedMcp gets 24-hour grace period via config file mtime so
  newly added servers are not flagged as unused.
- detectCacheBloat replaces hardcoded 50K baseline with user-derived
  p25 of cache writes. Flags only when median exceeds 1.4x baseline.
- detectBashBloat scans user shell profiles instead of the auditor's
  process.env.
- @-import pattern requires ./ ../ or / to avoid matching email
  addresses or npm scopes.
- Command usage pattern requires leading whitespace/start-of-line
  before /cmd so path references like /tmp are not counted as usage.
- AVG_TOKENS_PER_READ lowered from 1500 to 600 and
  CLAUDEMD_TOKENS_PER_LINE lowered from 25 to 13 for realistic
  prose/config sizing.

Code quality:
- Every magic number extracted to named module-scope constants.
- Dead code removed (IMPACT_ORDER, unused stat import).
- Shared loadMcpConfigs helper deduplicates config walking.
- Shared shortHomePath, isReadTool, inRange helpers.
- All detectors and computeHealth exported for real tests.
- Ghost detectors run in parallel via Promise.all.
- Cost rate defaults to 0 when unknown so UI can suppress instead of
  showing fabricated numbers.

Tests:
- Replaced 17 fake tests that re-implemented detector logic with
  26 tests importing and exercising the real exports.
- Cover threshold boundaries, impact scaling, edge cases.
- 121 tests pass.

UX header: "Setup" renamed to "Health", issue count shown inline.
CLAUDE.md: adds rule against "steal/copy/rip-off" wording in
public-facing text.
2026-04-16 06:30:15 -07:00
AgentSeal
b88f2cd730 feat: ghost detectors, health grade, @-import expansion
Expanded the optimize engine with new detectors and scoring:

1. Health score + letter grade (A-F) in optimize header. Weighted
   per-impact with caps. Gives users an instant "is my setup healthy"
   read that doubles as a shareable number.

2. Urgency score replaces impact-enum sort. Weighted 0.7 * impact
   + 0.3 * normalized tokens. Produces better-ranked findings.

3. Three new ghost detectors:
   - Ghost agents: files in ~/.claude/agents/ never invoked via
     Agent/Task tool
   - Ghost skills: SKILL.md directories never triggered
   - Ghost slash-commands: ~/.claude/commands/ files never referenced
     in user messages

4. @-import chain expansion for CLAUDE.md. Recursively follows
   @path/to/file imports (max depth 5) so bloat detection counts
   transitive load, not just the base file. Fixes undercounting for
   users with modular CLAUDE.md setups.
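
The depth-limited import expansion in item 4 can be illustrated against an in-memory file map. Everything here is a simplification: `FileMap` stands in for the filesystem, import targets are used as map keys without path resolution, and the loose `@(\S+)` pattern is illustrative only.

```typescript
const MAX_IMPORT_DEPTH = 5;

// In-memory stand-in for the filesystem: path -> file content.
type FileMap = Record<string, string>;

// Counts lines in `path` plus every file it imports, following
// @path/to/file references up to MAX_IMPORT_DEPTH levels deep.
function countTransitiveLines(
  files: FileMap,
  path: string,
  depth: number = 0,
  seen: string[] = [],
): number {
  const content = files[path];
  if (content === undefined) return 0; // missing import target
  if (depth > MAX_IMPORT_DEPTH) return 0; // too deep
  if (seen.indexOf(path) !== -1) return 0; // already counted, also breaks cycles
  seen.push(path);

  let total = content.split("\n").length;
  const importPattern = /@(\S+)/g;
  let match: RegExpExecArray | null = importPattern.exec(content);
  while (match !== null) {
    total += countTransitiveLines(files, match[1] ?? "", depth + 1, seen);
    match = importPattern.exec(content);
  }
  return total;
}
```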

9 new tests covering health scoring and import expansion.
2026-04-16 05:09:01 -07:00
AgentSeal
710316053e feat: add optimize command, in-TUI optimize view, and per-project context budget
Optimize engine detects 8 waste patterns from Claude Code session data:
- Junk directory reads (node_modules, .git, dist, etc.)
- Duplicate file reads per session
- Unused MCP servers (configured but never called)
- Missing .claudeignore in projects with junk dirs
- Bloated CLAUDE.md files (>200 lines)
- Uncapped BASH_MAX_OUTPUT_LENGTH
- Low Read:Edit ratio (edit-without-reading, per #42796)
- High cache_creation overhead (per #46917)

Each finding includes impact rating, token/cost savings estimate, and
exact fix (paste, command, or file content).

Dashboard integration:
- o key switches to in-TUI optimize view, b key goes back
- Background scan on load, o button only when findings exist
- Per-project Context Budget column in By Project panel showing
  estimated per-call overhead (system + MCP tools + skills + CLAUDE.md)

CLI: codeburn optimize [-p period] [--provider]
2026-04-16 04:04:37 -07:00