Commit graph

5 commits

Author SHA1 Message Date
iamtoruk
bd43b15342 feat(compare): model comparison with planning rate fix
5-section compare view: Performance (one-shot, retry, self-correction),
Efficiency (cost/call, cost/edit, output/call, cache hit), Category
Head-to-Head bar charts, Working Style, and Context.

Planning rate now detects TaskCreate/TaskUpdate/TodoWrite instead of
only EnterPlanMode (which was never used, showing 0% for all models).
Validated against raw JSONL with zero false positives.

Responsive side-by-side layout at 90+ cols. Self-correction scanner
with compact file skipping and model+timestamp dedup. 274 tests.
2026-04-19 08:34:49 -07:00
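
The planning-rate fix described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the `Turn` shape, the `planningRate` name, and the choice to measure the rate as the share of turns that call at least one planning tool are all assumptions. Only the tool names come from the commit message, and keeping `EnterPlanMode` in the set is one reading of "instead of only".

```typescript
// Hypothetical record shape: one assistant turn and the tools it invoked.
interface Turn {
  model: string;
  toolNames: string[];
}

// Tool names taken from the commit message. EnterPlanMode is kept on the
// assumption that the fix widened the set rather than replacing it.
const PLANNING_TOOLS = new Set<string>([
  "TaskCreate",
  "TaskUpdate",
  "TodoWrite",
  "EnterPlanMode",
]);

// Returns, per model, the fraction of turns that used a planning tool.
// The per-turn denominator is an assumption made for this sketch.
function planningRate(turns: Turn[]): Map<string, number> {
  const totals = new Map<string, { planned: number; all: number }>();
  for (const turn of turns) {
    const entry = totals.get(turn.model) ?? { planned: 0, all: 0 };
    entry.all += 1;
    if (turn.toolNames.some((name) => PLANNING_TOOLS.has(name))) {
      entry.planned += 1;
    }
    totals.set(turn.model, entry);
  }

  const rates = new Map<string, number>();
  totals.forEach((entry, model) => {
    // entry.all is at least 1 here, so the division is safe.
    rates.set(model, entry.planned / entry.all);
  });
  return rates;
}
```

With a set that contains only `EnterPlanMode`, a model that never calls that tool scores 0 on every turn, which matches the 0% symptom the commit message reports.
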
iamtoruk
fb24eea186 fix(compare): refine self-correction patterns, skip compact files, deduplicate
Remove high-false-positive patterns ("I'm sorry", "I should have", "sorry for").
Add precise patterns ("you're right I", "that was incorrect", "let me correct").
Skip compact JSONL files that replay compressed context.
Deduplicate by model+timestamp to prevent double-counting.
Fix test timestamps to work with deduplication.
2026-04-19 07:14:02 -07:00
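
As a rough illustration of the pattern matching and deduplication described above, the sketch below counts self-corrections per model while ignoring repeated model+timestamp pairs. The `Message` shape, the `countSelfCorrections` name, and the regular-expression form of the patterns (including the optional comma) are assumptions. Compact-file skipping is left out because the commit message does not say how those files are identified.

```typescript
// Hypothetical shape of one assistant message extracted from a JSONL line.
interface Message {
  model: string;
  timestamp: string;
  text: string;
}

// Phrases listed in the commit message as the added patterns; expressing
// them as case-insensitive regexes is an assumption of this sketch.
const SELF_CORRECTION_PATTERNS: RegExp[] = [
  /you're right,? I\b/i,
  /that was incorrect/i,
  /let me correct/i,
];

// Counts messages per model that match any pattern, processing each
// model+timestamp pair only once so replayed messages are not double-counted.
function countSelfCorrections(messages: Message[]): Map<string, number> {
  const seen = new Set<string>();
  const counts = new Map<string, number>();
  for (const msg of messages) {
    const key = `${msg.model}|${msg.timestamp}`;
    if (seen.has(key)) {
      continue; // duplicate of a message already processed
    }
    seen.add(key);
    if (SELF_CORRECTION_PATTERNS.some((pattern) => pattern.test(msg.text))) {
      counts.set(msg.model, (counts.get(msg.model) ?? 0) + 1);
    }
  }
  return counts;
}
```

Because the key combines model and timestamp, two test messages that share both would collapse into one, which may be why the commit also had to adjust test timestamps.
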
iamtoruk
3cb9a7a7bc feat(compare): add self-correction JSONL scanner
Adds scanSelfCorrections() which reads raw .jsonl session files (including subagent dirs) and counts per-model self-correction patterns for use in the model comparison metrics.
2026-04-19 05:25:31 -07:00
iamtoruk
ac9afffed5 feat(compare): add computeComparison with normalized metrics
2026-04-19 05:22:34 -07:00
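
One plausible reading of "normalized metrics" is per-unit ratios such as cost per call. The sketch below shows that idea only. The `RawTotals` and `NormalizedMetrics` shapes, the `normalize` helper, and the cache-hit formula are hypothetical and are not taken from `computeComparison` itself.

```typescript
// Hypothetical raw totals for one model; field names are illustrative.
interface RawTotals {
  calls: number;
  edits: number;
  costUsd: number;
  outputTokens: number;
  inputTokens: number;
  cacheReadTokens: number;
}

// Hypothetical per-unit metrics derived from the totals above.
interface NormalizedMetrics {
  costPerCall: number;
  costPerEdit: number;
  outputPerCall: number;
  cacheHitRate: number;
}

// Divides, returning 0 when the denominator is not positive so that models
// with no calls or edits do not produce NaN or Infinity.
function ratio(numerator: number, denominator: number): number {
  return denominator > 0 ? numerator / denominator : 0;
}

function normalize(totals: RawTotals): NormalizedMetrics {
  return {
    costPerCall: ratio(totals.costUsd, totals.calls),
    costPerEdit: ratio(totals.costUsd, totals.edits),
    outputPerCall: ratio(totals.outputTokens, totals.calls),
    // Assumed definition: cached reads as a share of all prompt-side tokens.
    cacheHitRate: ratio(
      totals.cacheReadTokens,
      totals.cacheReadTokens + totals.inputTokens,
    ),
  };
}
```

Dividing by usage lets two models with very different call volumes be compared on the same scale.
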
iamtoruk
9d119bfe40 feat(compare): add ModelStats type and aggregateModelStats
2026-04-19 05:20:37 -07:00