Cache now tracks the calendar date and clears on day rollover so
overnight sleep no longer shows yesterday's numbers. Wake-from-sleep
invalidates the entire cache before fetching. Manual refresh and wake
explicitly request loading feedback so the spinner is visible even
when stale data exists.
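The day-rollover check can be sketched as follows (a minimal illustration, not the app's actual API; `CachedStats`, `localDateKey`, and `isStale` are hypothetical names):

```typescript
// Hypothetical sketch: invalidate cached stats when the local calendar
// date changes, so overnight sleep never shows yesterday's numbers.
interface CachedStats {
  dateKey: string; // local calendar date the cache was written, e.g. "2024-05-01"
  payload: unknown;
}

// Key on the LOCAL date; toISOString() would use UTC and roll over at the
// wrong hour for most users.
function localDateKey(d: Date): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`;
}

function isStale(cache: CachedStats, now: Date = new Date()): boolean {
  return cache.dateKey !== localDateKey(now);
}
```

A caller would check `isStale` before serving the cache and clear it on a hit, the same way wake-from-sleep invalidates unconditionally.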
copilot-openai-auto maps to gpt-5.3-codex pricing, which is wrong
for users on Anthropic models. The original copilot-auto fallback
is provider-agnostic and correct when no model can be inferred.
The menubar ran --optimize on every 30-second CLI invocation. As
sessions accumulated throughout the day, optimize got heavier until
it exceeded the 45-second timeout. When the fetch failed with no
cached data, the loading overlay had no escape hatch and stayed
forever.
- Never pass includeOptimize from the menubar (background loop,
forceRefresh, tab/period switches, manual refresh button)
- On fetch failure with empty cache, retry without optimize as
fallback so the spinner always clears
- refreshQuietly also skips optimize
- Use 1.25x multiplier for cache-write tokens to match Anthropic's
actual pricing (was incorrectly using 1x)
- Shell-quote server names in the suggested `claude mcp remove` fix text
so unusual server names cannot break the command
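The 1.25x cache-write adjustment can be sketched as below (hypothetical `costUSD` helper; the per-token rates used in testing are placeholders, not the real pricing table):

```typescript
// Hypothetical sketch of Anthropic-style token pricing: cache writes cost
// 1.25x the base input rate, cache reads are billed at a discounted
// per-token rate supplied by the pricing table.
const CACHE_WRITE_MULTIPLIER = 1.25;

function costUSD(
  inputTokens: number,
  cacheWriteTokens: number,
  cacheReadTokens: number,
  inputPricePerToken: number,
  cacheReadPricePerToken: number,
): number {
  return (
    inputTokens * inputPricePerToken +
    cacheWriteTokens * inputPricePerToken * CACHE_WRITE_MULTIPLIER +
    cacheReadTokens * cacheReadPricePerToken
  );
}
```

The previous 1x behavior corresponds to dropping `CACHE_WRITE_MULTIPLIER`, which undercounts every cache-creation turn by 25% of its write bucket.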
PR #221 unified the period logic but missed the TUI hotkey bar,
GNOME indicator popup, and macOS menubar app. All surfaces now
consistently show '6 Months' instead of 'All' or 'all time'.
Adds a per-tool optimizer finding for MCP servers whose schema is loaded
on every turn but rarely invoked. Builds on the existing server-level
`detectUnusedMcp` (zero invocations) by reporting partial-use cases:
"loaded 54 tools, called 0" or "loaded 26 tools, called 2 (8% coverage)".
Inventory comes from Claude Code's JSONL `attachment.deferred_tools_delta`
entries: `addedNames` lists the exact tools available at that turn,
including every fully-qualified `mcp__<server>__<tool>` name. We union
across all delta entries in a session (not just the first) because tool
availability can change mid-session when the user reloads MCP config or
a subagent inherits a different tool set. Names that don't match the
`mcp__<server>__<tool>` shape with both segments non-empty are rejected
at extraction so downstream `split('__')` consumers can't be poisoned.
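The extraction-time validation can be sketched like this (illustrative helper, not the parser's actual code; it splits on the first `__` separator after the prefix):

```typescript
// Hypothetical sketch: accept only fully-qualified MCP tool names of the
// shape mcp__<server>__<tool> with both segments non-empty, so downstream
// split('__') consumers can't be poisoned by malformed names.
function parseMcpToolName(
  name: string,
): { server: string; tool: string } | null {
  if (!name.startsWith("mcp__")) return null;
  const rest = name.slice("mcp__".length);
  const sep = rest.indexOf("__"); // first separator wins
  if (sep <= 0) return null; // server segment must be non-empty
  const server = rest.slice(0, sep);
  const tool = rest.slice(sep + 2);
  if (tool.length === 0) return null; // tool segment must be non-empty
  return { server, tool };
}
```

Anything that returns `null` here is rejected at extraction rather than carried into the inventory.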
Token-savings estimates are cache-aware. MCP tool schemas live in the
cached prefix of the system prompt: a session pays the full input price
on each cache-creation turn (rebuilds happen every ~5 minutes of
inactivity) and the cache-read discount on subsequent turns. Each call's
contribution is capped at its observed `cacheCreationInputTokens` /
`cacheReadInputTokens` so we never claim more MCP overhead than the
call's own cache buckets could contain.
When multiple servers are flagged, costing happens in a single combined
pass: the per-call cap applies to the total unused-schema budget across
all flagged servers, not per server. Two flagged servers cannot both
independently claim the same call's cache bucket, which would otherwise
overstate `tokensSaved` and misclassify findings as high impact.
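A simplified sketch of the combined per-call cap (illustrative names; the real code also distinguishes cache-write from cache-read pricing when converting tokens to dollars):

```typescript
// Hypothetical sketch: cap each call's claimed unused-schema tokens at the
// call's own cache buckets, using ONE combined budget across all flagged
// servers so two servers can't both claim the same bucket.
interface Call {
  cacheCreationInputTokens: number;
  cacheReadInputTokens: number;
}

function estimateUnusedSchemaTokens(
  calls: Call[],
  unusedSchemaTokensPerServer: number[], // estimated schema size per flagged server
): number {
  // Single combined budget per call, not one budget per server.
  const perCallBudget = unusedSchemaTokensPerServer.reduce((a, b) => a + b, 0);
  let total = 0;
  for (const call of calls) {
    const bucket = call.cacheCreationInputTokens + call.cacheReadInputTokens;
    total += Math.min(perCallBudget, bucket);
  }
  return total;
}
```

Running the per-server estimates independently and summing them would instead let a 100-token bucket be claimed twice.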
A session counts toward `loadedSessions` (and toward the cost estimate)
only if its observed inventory included the server. Pure invocation-only
sessions, where the server appears in `mcpBreakdown` or `call.mcpTools`
without any matching `deferred_tools_delta`, do not satisfy the
`>= 2 sessions` threshold on their own. The same invariant applies in
`estimateMcpSchemaCost` so the two passes agree.
Coverage is computed against the inventory only: invocations of names
not present in any observed inventory (older config, hallucinated tool,
typo) do not inflate `toolsInvoked` and cannot drive `unusedCount`
negative. `toolsInvoked` is derived as `inventory.size - unusedTools.length`
to keep both numbers consistent.
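The coverage invariant above can be sketched as (hypothetical helper mirroring the description, not the actual source):

```typescript
// Hypothetical sketch: compute coverage strictly against the observed
// inventory. Invocations of names outside the inventory (older config,
// typo, hallucinated tool) are ignored, so unusedCount can't go negative.
function computeCoverage(inventory: Set<string>, invoked: Set<string>) {
  const unusedTools = [...inventory].filter((t) => !invoked.has(t));
  // Derived from the same two quantities so the pair stays consistent.
  const toolsInvoked = inventory.size - unusedTools.length;
  const coverage = inventory.size === 0 ? 0 : toolsInvoked / inventory.size;
  return { toolsInvoked, unusedTools, coverage };
}
```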
`detectUnusedMcp` and the new detector are explicitly disjoint:
`detectUnusedMcp` skips servers that the coverage detector will report,
not every server that happens to be in any inventory, so a small
inventoried-but-uninvoked server below the coverage thresholds still
gets flagged as "configured but never called."
Thresholds for the coverage finding:
- > 10 tools available (small servers are noise)
- < 20% coverage
- >= 2 sessions with observed inventory
- High impact when total effective tokens >= 200_000 or >= 3 servers flagged
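The gates above can be sketched as two small predicates (names are illustrative; the thresholds are the ones listed):

```typescript
// Hypothetical sketch of the coverage-finding gate.
interface ServerStats {
  toolCount: number;       // tools available in observed inventory
  coverage: number;        // toolsInvoked / toolCount, in 0..1
  loadedSessions: number;  // sessions whose inventory included the server
}

function shouldFlag(s: ServerStats): boolean {
  return s.toolCount > 10 && s.coverage < 0.2 && s.loadedSessions >= 2;
}

function isHighImpact(totalEffectiveTokens: number, flaggedCount: number): boolean {
  return totalEffectiveTokens >= 200_000 || flaggedCount >= 3;
}
```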
Smoke-tested on a real account: 7 servers flagged across 93 sessions
(`office-word-mcp` 0/54, `notebooklm-mcp` 0/38, `office-ppt-mcp` 0/37,
`excel-mcp-server` 0/25, `github-mcp-server` 2/26, `peekaboo` 3/22, plus
`claude_ai_Asana`). Combined-cap costing keeps `tokensSaved` honest.
Changes:
- src/types.ts: optional `mcpInventory: string[]` on `SessionSummary`.
Provider-agnostic field; currently populated only by the Claude parser.
- src/parser.ts: `extractMcpInventory` walks all entries, validates
fully-qualified names, returns sorted unique list. `buildSessionSummary`
passes it through; field is omitted when empty so JSON exports stay
clean.
- src/optimize.ts: `aggregateMcpCoverage`, `estimateMcpSchemaCost`
(single- and multi-server signatures), `detectMcpToolCoverage`. Wired
into `scanAndDetect`. `detectUnusedMcp` updated to disjoint with the
new detector.
- tests/mcp-coverage.test.ts: 23 cases covering aggregation, costing,
combined-cap behaviour, threshold gates, invocation-only-session
filtering, foreign-tool invocations, cache rebuild events, write+read
on the same call, multi-server pluralisation.
- tests/parser-mcp-inventory.test.ts: 12 cases for the JSONL extractor
including malformed name rejection and tolerant attachment parsing.
- CHANGELOG.md: entry under Unreleased / Added (CLI).
Closes #2
`getDateRange` was duplicated across `src/cli.ts` and `src/dashboard.tsx`
with conflicting semantics for `'all'`. The CLI intentionally bounded
`'all'` to the last 6 months (justified inline: keeps Codex/Cursor parses
responsive on sparse multi-year history). The dashboard returned
`new Date(0)` instead, so the same `--period all` flag silently meant
two different windows depending on which entry point you hit.
`Period`, `PERIODS`, `PERIOD_LABELS`, and `toPeriod` were duplicated as
well, and `cli-date.ts` already existed for date helpers
(`parseDateRangeFlags`) so the consolidation lives there.
Both call sites now go through a single `getDateRange(period: string)`
in `cli-date.ts` that returns `{ range, label }`. The dashboard wraps it
as `getPeriodRange(period: Period)` to keep the strict `Period` type at
the React boundary while letting the CLI continue to accept extras like
`'yesterday'`.
`PERIOD_LABELS.all` becomes `'6 Months'` (short, for the dashboard tab
strip; the previous `'All Time'` was misleading and the long-form
`'Last 6 months'` from `getDateRange().label` already drives CLI output).
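A condensed sketch of the consolidated helper (the real version handles the full period set; names beyond those described above are illustrative):

```typescript
// Hypothetical sketch: 'all' is deliberately bounded to the last 6 months
// rather than new Date(0), matching what the CLI already did.
const ALL_TIME_MONTHS = 6;

function getDateRange(period: string, now: Date = new Date()) {
  if (period === "today") {
    const start = new Date(now.getFullYear(), now.getMonth(), now.getDate());
    return { range: { start, end: now }, label: "Today" };
  }
  // 'all' -- and, in this sketch, any unknown input -- falls back to the
  // bounded window, never to new Date(0).
  const start = new Date(now);
  start.setMonth(start.getMonth() - ALL_TIME_MONTHS);
  return { range: { start, end: now }, label: `Last ${ALL_TIME_MONTHS} months` };
}
```

Both the CLI and the dashboard's `getPeriodRange` wrapper would route through this single function, so `--period all` means one window everywhere.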
Changes:
- src/cli-date.ts: add `Period`, `PERIODS`, `PERIOD_LABELS`, `toPeriod`,
`getDateRange`. Pull the existing 6-month rationale into a named
`ALL_TIME_MONTHS` constant.
- src/cli.ts: drop the local copies and import from cli-date.
- src/dashboard.tsx: drop the local copies, route through
`getPeriodRange`, alias the shared `getDateRange` import to
`getDateRangeShared` to avoid shadowing the wrapper.
- tests/cli-date.test.ts: 13 cases covering `'all'` regression guard
(must never silently fall back to `Date(0)`), CLI/dashboard agreement,
end-of-month clamping tolerance, `'yesterday'` support, and
unknown-input fallback.
- README.md, CHANGELOG.md: surface the bound and point heavy users at
`--from`/`--to` for unbounded windows.
The CLI flag `--period all` continues to be accepted; only the dashboard
window changes to match what the CLI was already doing. No public API
or schema change.
Refs #93
Replace execSync with execFileSync and argument arrays so shell
metacharacters in git branch names cannot be interpreted as commands.
Add SAFE_REF_PATTERN validation as defense in depth for branch names
from git symbolic-ref.
Addresses #214.
The installer now downloads and verifies a .sha256 companion file
before extracting and launching the menubar app. Build script and
CI workflow generate the checksum alongside the zip. Adds SECURITY.md
with reporting instructions.
Addresses #215.
Toggle label visibility instead of rebuilding panel children: the label
is always added to the panel and simply hidden when compact=true.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GNOME 45+ extension that shows live token costs in the top bar panel
with a dropdown for provider breakdown, top activities/models, cache
stats, and budget alerts. Polls `codeburn status --format menubar-json`
every 30s — same data contract as the macOS menubar app.
Includes GSettings preferences (refresh interval, compact mode, budget
threshold, per-provider enable/disable toggles) with Libadwaita UI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Heavy Codex users hit MAX_SESSION_FILE_BYTES (128 MB) on long-running
sessions. The file is read in full via readSessionFile and then split on
'\n', so even bumping the cap eventually runs into V8's 512 MB string
limit (split doubles the high-water mark).
readSessionLines is a streaming generator that already exists in
fs-utils for exactly this case but only readFirstLine was using it.
Switch the Codex provider to consume it and let the cap apply only when
streaming would still be unreasonable.
Changes:
- src/fs-utils.ts: introduce MAX_STREAM_SESSION_FILE_BYTES (2 GB) and
apply it in readSessionLines instead of the full-read cap. Keep
MAX_SESSION_FILE_BYTES for readSessionFile / readSessionFileSync
consumers that materialize the whole file.
- src/providers/codex.ts: replace `readSessionFile -> split('\n')` with
`for await (... of readSessionLines)`. Add sawAnyLine guard so a
failed/empty stream skips cache write, preserving the previous
early-return behavior.
Empirical impact on a real account with one 247 MB rollout: 7-day totals
went from 4,536 calls / €358.69 / 20.1M input tokens to 6,111 calls /
€550.67 / 37.3M input tokens. The previously-skipped session is now
included; no other behavior changes.
Refs #204
Turns whose only assistant tool is `Skill` collapse to category `general`
because `classifyByToolPattern` returns `'general'` and `refineByKeywords`
only operates on `coding`/`exploration`. In environments that lean on Claude
Code skills, the per-activity dashboard column flattens — every `/init`,
`/review`, `/security-review`, `/claude-api`, plus user-defined skills, all
land in `general` with no signal about which workflow ran.
Implements Option A from the issue:
- `ParsedApiCall.skills: string[]` populated in the Anthropic-path parser
via a new `extractSkillNames` helper that reads `input.skill || input.name`
from each `Skill` ToolUseBlock (mirrors `detectGhostSkills` extraction at
optimize.ts:765 so the two stay in sync).
- `ClassifiedTurn.subCategory?: string` set to the first skill name when the
resolved category is `general` AND any skill identifier was extracted.
Top-level category stays `general` — existing aggregations, exports, and
category-keyed code paths unchanged.
- `SessionSummary.skillBreakdown: Record<string, {turns,costUSD,editTurns,
oneShotTurns}>` populated in the same per-turn loop that builds
`categoryBreakdown`. Provider sessions (Codex/Cursor/etc.) keep `skills:
[]` — they don't expose the Skill tool surface today.
- Dashboard `ActivityBreakdown` renders top-N skill sub-rows beneath the
`general` row when present (indented `/skill-name`, dimmed). Other
categories render exactly as before; if no skills were invoked, the panel
is byte-identical to current output.
Existing 419 tests still pass. New `tests/classifier.test.ts` adds 8 cases:
single skill via `input.skill`, single via `input.name`, first-wins for
multi-skill turns, aggregation across multiple assistant calls in one turn,
no-name fallback (`subCategory` stays undefined), `Skill+Edit` promoting to
`coding` and dropping subCategory, non-Skill general turns, and a legacy
ParsedApiCall shape with `skills` field absent (forward-compat). Pre-fix
verification by stashing the source change reproduces 4/8 failures with the
exact "expected 'init', received undefined" diff; restoring → 8/8 pass.
Closes #203.
🤖 AI assistance disclosure: assistant-scaffolded by Claude (Opus 4.7);
author of record reviewed every line, ran the full vitest suite locally
(`npm test` → 32 files / 427 tests pass), `npx tsc --noEmit` clean, and
`npm run build` produces a clean ESM bundle.
CLI timeout increased from 20s to 45s to handle cold file-cache latency on
provider-specific queries. Loading overlay now appears when the all-provider
payload confirms a provider has spend but its dedicated data hasn't loaded yet.
Manual refresh (force: true) bypasses the in-flight guard so users can always
re-fetch. Tab strip prefers the provider-specific payload cost when available
so it stays in sync with the hero section.
- Antigravity: use loop index as fallback when responseId is empty to prevent
all entries in a cascade sharing the same dedup key; bump CACHE_VERSION to
force re-parse of stale cached data
- Codex: estimate tokens from message text when info is null (ChatGPT Plus/Pro
subscription sessions), feeding through calculateCost so subscription users
see API-equivalent spend; add costIsEstimated flag to ParsedProviderCall
- Update LiteLLM pricing snapshot
Claude Code writes the same message.id multiple times during streaming.
The first write has partial tokens (often 1) and no tool_use blocks.
The last write has authoritative token counts and all tool_use/MCP blocks.
Old behavior kept the first occurrence (keep-first), silently dropping
real output tokens (+6.3% undercount) and all MCP tool calls.
New behavior keeps the last occurrence's content but preserves the first
occurrence's timestamp for correct date bucketing.
Validated against 21,390 real session files: 40.5% had duplicate IDs,
output tokens were understated by up to 78% per session.
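The keep-last-content, keep-first-timestamp rule can be sketched as (illustrative `Message` shape, not the parser's real types):

```typescript
// Hypothetical sketch: Claude Code streams the same message.id multiple
// times. Keep the LAST occurrence's content and token counts (they are
// authoritative) but the FIRST occurrence's timestamp for date bucketing.
interface Message {
  id: string;
  timestamp: string;
  outputTokens: number;
}

function dedupeKeepLast(messages: Message[]): Message[] {
  const byId = new Map<string, Message>();
  for (const msg of messages) {
    const first = byId.get(msg.id);
    // Later writes overwrite everything except the original timestamp.
    byId.set(msg.id, first ? { ...msg, timestamp: first.timestamp } : msg);
  }
  return [...byId.values()];
}
```

The old keep-first rule is the inverse: it preserved the partial write (often 1 output token, no tool_use blocks) and dropped the authoritative one.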
Use strip-ansi (already in dep tree via Ink) in extractBashCommands
to prevent terminal escape codes from leaking into dashboard bash
breakdown keys. Route goose, gemini, qwen, and openclaw through
extractBashCommands instead of inline split, which also gives them
multi-command extraction (matching claude/codex/droid behavior).
The "vs last month" line in the forecast section used a hardcoded $
instead of the user's selected currency symbol and rate. Use
asCompactCurrency() which handles both.
Closes #197
Strip Ink v7 DEC mode 2026 synchronized output markers (BSU/ESU) on
Windows. ConPTY does not implement this protocol and buffers
indefinitely, causing the dashboard to hang with no output. The patch
intercepts standalone BSU/ESU writes on stdout while preserving full
interactivity (keyboard, live refresh, cursor management).
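The filter's core predicate can be sketched as (a simplification of the stdout interception, assuming the drop applies only to writes made up entirely of the two markers):

```typescript
// Hypothetical sketch: DEC private mode 2026 begin/end synchronized-update
// markers, which Windows ConPTY buffers indefinitely.
const BSU = "\x1b[?2026h";
const ESU = "\x1b[?2026l";

// Drop only writes that consist solely of sync markers (possibly
// repeated); frames that embed any other bytes pass through untouched,
// preserving keyboard handling, live refresh, and cursor management.
function shouldDropWrite(chunk: string): boolean {
  const stripped = chunk.split(BSU).join("").split(ESU).join("");
  return chunk.length > 0 && stripped.length === 0;
}
```

In the real patch a wrapper around `process.stdout.write` would consult a predicate like this before forwarding the chunk.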
Fix ExperimentalWarning timing: the process.emit patch was restored
synchronously in the finally block, but Node defers the warning via
process.nextTick. Delay restore by one tick so the patch is still
active when the warning fires.
Closes #195
Set accessory activation policy in willFinishLaunching before the
focus chain forms. Debounce observation tracking to coalesce rapid
property changes into a single status bar refresh.
Goose: read token usage from ~/.local/share/goose/sessions/sessions.db.
Lazy-loaded, zero overhead for non-Goose users.
Codex: use sessionId instead of file path in dedup key so forked
sessions sharing the same session_id don't double-count tokens.
- Attach a no-op `stream.on('error', () => {})` so a late read-ahead
error that races with a successful first-line yield can never escape
as an unhandled 'error' event. Defense in depth: empirically the
destroy() in finally already swallows it on Node 18+, but the listener
removes any version-dependent surprise.
- Tighten the comment to say "up to FIRST_LINE_READ_CAP" instead of
"regardless of length"; the cap is real and worth being precise about.
- Stop tracking a separate streamError flag. createReadStream's default
64 KiB highWaterMark means the stream may already be reading chunk 2
when we break out of the loop after yielding the first line; if that
later chunk errors, the flag could reject an otherwise-valid line.
readline's async iterator already re-throws stream errors on Node 16+,
which the existing catch handles.
- Test: 120 KB session_meta line forces multi-chunk line assembly.
- Test: truncated mid-write first line is rejected, not parsed as half
an object.
- Cap createReadStream at 1 MiB so a malformed file with no newline
cannot make readline buffer indefinitely (real session_meta lines
are 22-27 KB).
- Capture stream errors explicitly; readline's async iterator does
not always re-throw underlying stream errors per Node docs.
- Test: assert project is extracted from the >16 KB session_meta to
prove the line was actually parsed, not just discovered.
- Test: session_meta line with no trailing newline is still accepted.
- Test: empty rollout file is silently skipped.