codeburn/tests/providers/cursor-bubble-dedup.test.ts
Resham Joshi daa673449c
Menubar and CLI hardening from multi-agent audit (#257)
Two passes of validators across CLI accuracy, dashboard UX, menubar Swift,
performance, security, and end-to-end smoke tests on real session data.

Data-correctness fixes:

- parseLocalDate rejects month/day overflow. JS Date silently rolled
  Feb 31 to Mar 3, so --from 2026-02-31 --to 2026-03-15 quietly dropped
  sessions on Feb 28 - Mar 2. Now throws "Invalid date" with a clear
  reason. Leap-day case covered (2024-02-29 valid, 2025-02-29 rejected).
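
  A minimal sketch of the overflow check, assuming a `YYYY-MM-DD` string input (the real parseLocalDate signature and error text may differ): construct the Date, then reject if any component rolled.

  ```typescript
  // Hypothetical sketch — JS Date silently rolls overflow (Feb 31 -> Mar 3),
  // so we compare the constructed components against the parsed ones.
  function parseLocalDate(s: string): Date {
    const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(s)
    if (!m) throw new Error(`Invalid date: ${s} (expected YYYY-MM-DD)`)
    const [year, month, day] = [Number(m[1]), Number(m[2]), Number(m[3])]
    const d = new Date(year, month - 1, day)
    if (d.getFullYear() !== year || d.getMonth() !== month - 1 || d.getDate() !== day) {
      throw new Error(`Invalid date: ${s} (month/day overflow)`)
    }
    return d
  }
  ```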

- CSV/JSON exports use the active currency's natural decimal places. The
  previous round2 helper produced ¥412.37 in CSV while the dashboard
  rendered ¥412 — finance teams comparing the two surfaces saw a
  discrepancy. New roundForActiveCurrency consults Intl.NumberFormat for
  the right precision (0 for JPY/KRW/CLP, 2 for USD/EUR, etc.).
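
  A sketch of the Intl-based lookup described above (the real helper may cache the formatter and take the currency from app state rather than a parameter):

  ```typescript
  // Intl.NumberFormat knows each ISO 4217 currency's minor-unit count:
  // 0 for JPY/KRW/CLP, 2 for USD/EUR, 3 for e.g. BHD.
  function roundForActiveCurrency(amount: number, currency: string): number {
    const digits = new Intl.NumberFormat('en-US', {
      style: 'currency',
      currency,
    }).resolvedOptions().maximumFractionDigits ?? 2
    const factor = 10 ** digits
    return Math.round(amount * factor) / factor
  }
  ```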

- Copilot toolRequests is Array.isArray-guarded in both modern and legacy
  event branches. Previously a corrupt session with toolRequests=null or
  a string aborted the whole file's parse loop and silently dropped every
  legitimate call after it.
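
  The shape of the guard, illustratively (the event type here is hypothetical): anything that is not an array is treated as an empty list instead of letting a later iteration throw and abort the session's parse loop.

  ```typescript
  type EventLike = { toolRequests?: unknown }

  // Corrupt sessions ship toolRequests as null or a string; coerce to [].
  function toolRequestsOf(event: EventLike): unknown[] {
    return Array.isArray(event.toolRequests) ? event.toolRequests : []
  }
  ```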

- Codex token_count dedup uses a null sentinel for prevCumulativeTotal so
  the first event is never confused with a duplicate. Sessions that emit
  only last_token_usage (no total_token_usage) report cumulativeTotal=0
  on every event; with the previous 0-initialized prev, the first event
  matched the dedup guard and was dropped.
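
  The sentinel pattern, reduced to a sketch (field name from the commit text; everything else illustrative):

  ```typescript
  // null, not 0, so the first event can never match the dedup guard even
  // when the session legitimately reports cumulativeTotal=0 throughout.
  function dedupeCumulative(events: { cumulativeTotal: number }[]): number[] {
    const kept: number[] = []
    let prevCumulativeTotal: number | null = null
    for (const e of events) {
      if (prevCumulativeTotal !== null && e.cumulativeTotal === prevCumulativeTotal) {
        continue // duplicate cumulative reading
      }
      kept.push(e.cumulativeTotal)
      prevCumulativeTotal = e.cumulativeTotal
    }
    return kept
  }
  ```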

- LiteLLM pricing values are clamped to [0, 1] per token via safePerTokenRate.
  Defense in depth against a tampered upstream JSON shipping negative or
  absurdly large per-token costs that would otherwise propagate into all
  cost totals.
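
  A sketch of the clamp (the real helper may also log the rejected value): a sane per-token rate is tiny (on the order of 1e-5 USD), so anything outside [0, 1] per token is treated as corrupt input.

  ```typescript
  function safePerTokenRate(raw: unknown): number {
    // Non-numbers, NaN, and Infinity collapse to 0; then clamp to [0, 1].
    const n = typeof raw === 'number' && Number.isFinite(raw) ? raw : 0
    return Math.min(1, Math.max(0, n))
  }
  ```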

Performance:

- Cursor SQLite parse no longer stalls for minutes on multi-GB DBs. Two
  changes: the per-conversation user-message buffer uses an index pointer
  instead of Array.shift() (which was O(n) per call), and a real ROWID
  cutoff via subquery limits the scan to the most recent 250k bubbles,
  with a stderr warning so power users get a partial report rather than
  a stalled CLI.
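
  Why Array.shift() was the hot spot, in a toy sketch (not the parser's actual code): shift moves every remaining element down one slot, so draining an n-element buffer that way is O(n²), while an index pointer makes the same drain O(n).

  ```typescript
  function drainWithPointer<T>(buffer: T[], consume: (item: T) => void): void {
    let head = 0 // advances instead of mutating the array on each pop
    while (head < buffer.length) {
      consume(buffer[head])
      head++
    }
    buffer.length = 0 // release references once fully drained
  }
  ```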

- Spawned codeburn CLI subprocesses are terminated when the calling Task
  is cancelled. Without this, rapid period/provider tab clicks in the
  menubar cancelled the Task but left the subprocess running to
  completion, piling up zombie processes.

UX:

- Dashboard period switch flips to loading and clears projects
  synchronously before reloadData runs, eliminating the frame where the
  new period label rendered over the old period's projects.

- Optimize findings tab paginates 3-at-a-time with j/k scroll. With 4
  new detectors plus 7 originals, 8-10 findings × 6 lines each scrolled
  the StatusBar off the top of the alt buffer.

- Custom --from/--to ranges hide the period tab strip and disable the
  1-5 / arrow keys so a stray period press no longer abandons the user's
  explicit range. A "Custom range: X to Y" banner replaces the tab strip.

- OpenCode storage-format warning is per-table-set, rate-limited to once
  per process, and points the user at OpenCode's migration step or the
  issue tracker. The previous all-or-nothing check fired the generic
  "format not recognized" string for any schema mismatch.

Menubar / OAuth:

- Both Claude and Codex bootstrap (Reconnect button) now honour the
  usageBlockedUntil 429 backoff that refreshIfBootstrapped respects.
  Spamming Reconnect during sustained rate-limit windows previously
  hammered the upstream endpoint on every click.

- Codex Retry-After HTTP header is parsed (delta-seconds plus IMF-fixdate
  fallback) so we don't over-back-off when ChatGPT tells us a shorter
  window than our 5-minute floor.
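
  The header grammar (RFC 9110) allows either delta-seconds or an IMF-fixdate; a sketch of the two-branch parse, with illustrative names (the menubar's actual implementation is Swift):

  ```typescript
  // Returns the wait in milliseconds, or null if the header is unparseable
  // (in which case the caller keeps its default backoff floor).
  function retryAfterMs(header: string, now: number = Date.now()): number | null {
    const trimmed = header.trim()
    if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000 // delta-seconds
    const date = Date.parse(trimmed) // IMF-fixdate fallback
    if (Number.isNaN(date)) return null
    return Math.max(0, date - now)
  }
  ```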

- Both credential cache files are written via SafeFile.write
  (O_CREAT | O_EXCL | O_NOFOLLOW with explicit 0600) so there is no race
  window where the temp file briefly exists at default umask, and a
  symlink at the destination cannot redirect the write. Reads now route
  through SafeFile.read with a 64 KiB cap, closing the symlink-follow gap
  on Data(contentsOf:).

CI signal:

- TypeScript strict typecheck (tsc --noEmit) is now zero errors. The
  six errors in src/providers/copilot.ts came from a discriminated-union
  catch-all branch whose `data: Record<string, unknown>` shape TS picked
  over the specific event branches when narrowing on `type`. Removed the
  catch-all; runtime falls through unknown event types via the existing
  if/else chain.
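
  A reduced reproduction of the narrowing problem (member names here are hypothetical, not copilot.ts's actual types):

  ```typescript
  type ToolEvent = { type: 'tool'; data: { toolName: string } }
  type TextEvent = { type: 'text'; data: { text: string } }
  // The removed member looked roughly like this catch-all; with it in the
  // union, comparing `type` against 'tool' could not narrow `data` past
  // Record<string, unknown>, because `type: string` also covers 'tool':
  //   type OtherEvent = { type: string; data: Record<string, unknown> }
  type CopilotEvent = ToolEvent | TextEvent

  function describeEvent(event: CopilotEvent): string {
    // With the catch-all gone, these comparisons narrow cleanly.
    if (event.type === 'tool') return `tool:${event.data.toolName}`
    if (event.type === 'text') return `text:${event.data.text}`
    return 'unknown' // unknown event types fall through at runtime
  }
  ```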

Tests added: 16 new (now 555 total)
- date-range-filter: month/day/year overflow rejection, leap-day correctness
- currency-rounding: convertCost no-rounding contract, roundForActiveCurrency
  for USD/JPY/KRW/EUR
- providers/copilot: malformed toolRequests does not abort the parse
- providers/cursor-bubble-dedup: re-parse after token mutation does not
  double-count, single parse yields one call per bubble
- providers/codex: first event with cumulativeTotal=0 not dropped,
  consecutive zero-cumulative duplicates still deduped
2026-05-06 22:15:11 -07:00


import { describe, it, expect, beforeEach, afterEach } from 'vitest'
import { mkdtemp, rm, writeFile } from 'fs/promises'
import { tmpdir } from 'os'
import { join } from 'path'
import { isSqliteAvailable, openDatabase } from '../../src/sqlite.js'
import { getAllProviders } from '../../src/providers/index.js'
import type { Provider, ParsedProviderCall } from '../../src/providers/types.js'

/// Pinned regression for the v3 bubble-dedup fix. The previous (v2) code used
/// the bubble row's mutable token counts as part of the deduplication key, so
/// the same bubble was counted twice once Cursor wrote the streaming-complete
/// final token totals on top of the streaming-in-progress row. v3 switched to
/// the SQLite primary `key` column (which is the stable bubbleId:<id>:<id>
/// path) so re-parsing the same DB after token updates produces zero new
/// calls. This test:
/// 1. Builds a tmp SQLite DB with the cursorDiskKV schema and one bubble row
///    with low token counts (the streaming-in-progress shape).
/// 2. Parses it through the cursor provider. Asserts one call.
/// 3. Mutates the row in place to higher token counts (the streaming-complete
///    shape) without changing the SQLite key.
/// 4. Re-parses with the SAME seenKeys set. Asserts zero new calls.
/// If a future refactor brings back token-count-based dedup, the second parse
/// will produce a duplicate call and this test will fail.

const skipReason = isSqliteAvailable()
  ? null
  : 'node:sqlite not available — needs Node 22+; skipping'

let tmpDir: string

beforeEach(async () => {
  tmpDir = await mkdtemp(join(tmpdir(), 'cursor-dedup-'))
})

afterEach(async () => {
  await rm(tmpDir, { recursive: true, force: true })
})

function buildBubbleValue(opts: {
  conversationId: string
  text: string
  inputTokens: number
  outputTokens: number
  type: 1 | 2
  createdAt?: string
}): string {
  return JSON.stringify({
    type: opts.type,
    conversationId: opts.conversationId,
    text: opts.text,
    tokenCount: {
      inputTokens: opts.inputTokens,
      outputTokens: opts.outputTokens,
    },
    createdAt: opts.createdAt ?? new Date().toISOString(),
    modelId: 'gpt-5',
    capabilityType: 'composer',
  })
}

async function createCursorTestDb(): Promise<string> {
  // Cursor uses a non-extension state DB filename (state.vscdb in the real app);
  // any path works for openDatabase as long as we set up the schema and the
  // directory layout the parser expects. The parser only checks the DB
  // contents — discovery is bypassed because we hand it the path directly.
  const dbPath = join(tmpDir, 'state.vscdb')
  await writeFile(dbPath, '')
  // Use the underlying node:sqlite to create the schema.
  // We need cursorDiskKV with key + value columns.
  const Module = await import('node:module')
  const requireForSqlite = Module.createRequire(import.meta.url)
  const { DatabaseSync } = requireForSqlite('node:sqlite') as {
    DatabaseSync: new (path: string) => {
      exec(sql: string): void
      prepare(sql: string): { run(...p: unknown[]): unknown }
      close(): void
    }
  }
  const db = new DatabaseSync(dbPath)
  db.exec('CREATE TABLE cursorDiskKV (key TEXT PRIMARY KEY, value TEXT)')
  // Single assistant bubble (type=2). The parser yields one ParsedProviderCall
  // per bubbleId:% row, so a multi-row fixture would muddy the dedup count;
  // we keep the test surface minimal — one bubble through one parse, then
  // the same bubble again after token mutation.
  const bubbleKey = 'bubbleId:abc-123:bubble-xyz'
  db.prepare('INSERT INTO cursorDiskKV (key, value) VALUES (?, ?)').run(
    bubbleKey,
    buildBubbleValue({
      conversationId: 'abc-123',
      text: 'def hello(): pass',
      inputTokens: 100,
      outputTokens: 20,
      type: 2,
    })
  )
  db.close()
  return dbPath
}

async function updateAssistantBubbleTokens(
  dbPath: string,
  inputTokens: number,
  outputTokens: number
): Promise<void> {
  const Module = await import('node:module')
  const requireForSqlite = Module.createRequire(import.meta.url)
  const { DatabaseSync } = requireForSqlite('node:sqlite') as {
    DatabaseSync: new (path: string) => {
      prepare(sql: string): { run(...p: unknown[]): unknown }
      close(): void
    }
  }
  const db = new DatabaseSync(dbPath)
  db.prepare('UPDATE cursorDiskKV SET value = ? WHERE key = ?').run(
    buildBubbleValue({
      conversationId: 'abc-123',
      text: 'def hello(): pass',
      inputTokens,
      outputTokens,
      type: 2,
    }),
    'bubbleId:abc-123:bubble-xyz'
  )
  db.close()
}

async function getCursorProvider(): Promise<Provider> {
  const all = await getAllProviders()
  const p = all.find(p => p.name === 'cursor')
  if (!p) throw new Error('cursor provider not registered')
  return p
}

describe.skipIf(skipReason !== null)('cursor bubble dedup (regression for v3 fix)', () => {
  it('does not double-count when bubble token counts mutate between parses', async () => {
    const dbPath = await createCursorTestDb()
    const provider = await getCursorProvider()

    // First parse: streaming-in-progress shape.
    const seenKeys = new Set<string>()
    const source = { path: dbPath, project: 'test-project', provider: 'cursor' }
    const firstRunCalls: ParsedProviderCall[] = []
    for await (const call of provider.createSessionParser(source, seenKeys).parse()) {
      firstRunCalls.push(call)
    }
    expect(firstRunCalls.length).toBe(1)

    // Cursor mutates the same bubble row to its final token totals when the
    // stream completes. Simulate by updating in place. The SQLite primary
    // key stays the same.
    await updateAssistantBubbleTokens(dbPath, 250, 80)

    // Second parse with the SAME seenKeys: must yield zero new calls. If the
    // dedup key were derived from token counts (the v2 bug), this would
    // produce a duplicate.
    const secondRunCalls: ParsedProviderCall[] = []
    for await (const call of provider.createSessionParser(source, seenKeys).parse()) {
      secondRunCalls.push(call)
    }
    expect(secondRunCalls.length).toBe(0)
  })

  it('does not yield the same bubble twice within a single parser run', async () => {
    const dbPath = await createCursorTestDb()
    const provider = await getCursorProvider()
    const seenKeys = new Set<string>()
    const source = { path: dbPath, project: 'test-project', provider: 'cursor' }
    const calls: ParsedProviderCall[] = []
    for await (const call of provider.createSessionParser(source, seenKeys).parse()) {
      calls.push(call)
    }
    // One bubble in the DB → one call. (The user message row at type=1 is
    // not surfaced as a separate ParsedProviderCall; it's threaded into the
    // assistant call's userMessage field.)
    expect(calls.length).toBe(1)
  })
})