feat(arena): add comparison summary for agent results (#3394)

Adds a summary view that runs after Arena agents finish, so users can
compare model outputs without opening each agent's conversation first.

Summary surface:
- Agent status overview
- Files changed in common vs. unique to one agent
- Per-agent approach summary generated through that agent's own provider
- Token / runtime / line-change / file-count metrics
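As a rough illustration of the "in common vs. unique to one agent" split, here is a minimal sketch that partitions each agent's changed-file list. The function name and input shape are hypothetical. It also assumes "in common" means "touched by at least two agents", which the commit does not spell out.

```typescript
// Hypothetical input: agent name -> list of changed file paths.
type ChangedFiles = Record<string, string[]>;

interface FilePartition {
  common: string[]; // files touched by two or more agents (assumed meaning)
  uniqueByAgent: Record<string, string[]>; // files touched by exactly one agent
}

function partitionChangedFiles(changes: ChangedFiles): FilePartition {
  // Map each file to the agents that changed it (duplicates within one agent ignored).
  const owners = new Map<string, string[]>();
  for (const [agent, files] of Object.entries(changes)) {
    for (const file of new Set(files)) {
      const list = owners.get(file) ?? [];
      list.push(agent);
      owners.set(file, list);
    }
  }

  const common: string[] = [];
  const uniqueByAgent: Record<string, string[]> = {};
  // Every agent gets an entry, even if it changed nothing unique.
  for (const agent of Object.keys(changes)) uniqueByAgent[agent] = [];

  owners.forEach((agents, file) => {
    if (agents.length > 1) common.push(file);
    else uniqueByAgent[agents[0]].push(file);
  });

  // Sort for a stable, readable display order.
  common.sort();
  for (const list of Object.values(uniqueByAgent)) list.sort();
  return { common, uniqueByAgent };
}
```

For example, if agent `a` changed `x.ts` and `y.ts` and agent `b` changed `y.ts` and `z.ts`, then `y.ts` lands in `common`, while `x.ts` and `z.ts` are listed under their respective agents.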

Selection dialog now supports:
- p — toggle preview for the highlighted agent
- d — toggle detailed diff
- Enter — select winner
- x — discard all results
- Esc — cancel
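The key bindings above can be modelled as a small state reducer. This is a sketch only: the key names, the `DialogState` shape, and the choice to treat preview and diff as independent toggles are assumptions, not the dialog's actual implementation. Moving the highlight between agents is left out.

```typescript
// Hypothetical normalized key names.
type DialogKey = "p" | "d" | "enter" | "x" | "escape";

type Outcome =
  | { kind: "pending" }
  | { kind: "winner"; agent: string }
  | { kind: "discardAll" }
  | { kind: "cancelled" };

interface DialogState {
  agents: string[]; // successful agents shown in the dialog
  highlighted: number; // index into agents
  showPreview: boolean; // toggled by `p`
  showDiff: boolean; // toggled by `d`
  outcome: Outcome;
}

function onKey(state: DialogState, key: DialogKey): DialogState {
  switch (key) {
    case "p":
      return { ...state, showPreview: !state.showPreview };
    case "d":
      return { ...state, showDiff: !state.showDiff };
    case "enter": {
      // Select the highlighted agent as the winner; ignore if nothing is highlighted.
      const agent = state.agents[state.highlighted];
      return agent === undefined ? state : { ...state, outcome: { kind: "winner", agent } };
    }
    case "x":
      return { ...state, outcome: { kind: "discardAll" } };
    case "escape":
      return { ...state, outcome: { kind: "cancelled" } };
  }
}
```

A reducer like this keeps the toggles side-effect free. Only a final `outcome` of `winner` or `discardAll` would need to touch worktrees.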

Approach summary generation:
- Each agent's summary is generated through that agent's own content
  generator, keeping mixed-provider Arena sessions within their
  respective auth boundaries
- 20s timeout + AbortController per agent, bounded prompt inputs
  (finalText 2K, transcript 6K, diff 6K)
- Falls back to a deterministic "Changed N files ..." summary when no
  per-agent generator is available or on error
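A minimal sketch of the per-agent timeout, input bounding, and fallback described above follows. The `SummaryGenerator` signature and `AgentRun` fields are hypothetical stand-ins for each agent's own content generator. It treats the 2K/6K bounds as 2,000/6,000 characters, although the commit does not say whether they are characters or tokens. The fallback string is a placeholder, because the commit only shows the prefix "Changed N files ...".

```typescript
// Hypothetical generator signature: one per agent, bound to that agent's provider.
type SummaryGenerator = (prompt: string, signal: AbortSignal) => Promise<string>;

interface AgentRun {
  name: string;
  finalText: string;
  transcript: string;
  diff: string;
  filesChanged: number;
  generator?: SummaryGenerator; // absent -> deterministic fallback
}

// Assumed to be character counts.
const LIMITS = { finalText: 2000, transcript: 6000, diff: 6000 };
const SUMMARY_TIMEOUT_MS = 20_000;

// Truncate one prompt input so the overall prompt stays bounded.
function clip(text: string, max: number): string {
  return text.length <= max ? text : `${text.slice(0, max)}\n[truncated]`;
}

// Placeholder wording for the deterministic fallback.
function fallbackSummary(run: AgentRun): string {
  return `Changed ${run.filesChanged} files`;
}

// Reject once the signal aborts, even if the generator ignores the signal.
function untilAborted<T>(work: Promise<T>, signal: AbortSignal): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("summary timed out"));
      return;
    }
    signal.addEventListener("abort", () => reject(new Error("summary timed out")), { once: true });
    work.then(resolve, reject);
  });
}

async function summarizeAgent(run: AgentRun, timeoutMs = SUMMARY_TIMEOUT_MS): Promise<string> {
  if (!run.generator) return fallbackSummary(run);
  const controller = new AbortController(); // one controller per agent
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const prompt = [
      `Final answer:\n${clip(run.finalText, LIMITS.finalText)}`,
      `Transcript:\n${clip(run.transcript, LIMITS.transcript)}`,
      `Diff:\n${clip(run.diff, LIMITS.diff)}`,
    ].join("\n\n");
    const text = await untilAborted(run.generator(prompt, controller.signal), controller.signal);
    return text.trim() || fallbackSummary(run); // empty output also falls back
  } catch {
    return fallbackSummary(run); // timeout or provider error
  } finally {
    clearTimeout(timer);
  }
}

// Each agent's timer runs independently, so one slow provider cannot delay the others.
function summarizeAll(runs: AgentRun[], timeoutMs?: number): Promise<string[]> {
  return Promise.all(runs.map((run) => summarizeAgent(run, timeoutMs)));
}
```

Because each summary call goes through `run.generator`, an agent's transcript and diff are only ever sent to that agent's own provider. This mirrors the auth-boundary point above.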

Diff summary now handles binary, rename-only, and mode-only diffs;
the previous heuristic required textual +/- hunks and would have
dropped those.
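To show why a hunk-only heuristic misses some changes, here is a minimal sketch of a `git diff` parser. It starts a file entry at every `diff --git` header, whether or not any `@@` hunk follows. The `FileChange` shape and `parseDiff` name are hypothetical. The header lines it recognizes (`rename from`/`rename to`, `old mode`/`new mode`, `Binary files ... differ`, `GIT binary patch`) are standard git output. Paths containing `" b/"` are not disambiguated.

```typescript
interface FileChange {
  path: string;
  added: number;
  removed: number;
  binary: boolean;
  renamedFrom?: string;
  modeChanged: boolean;
}

function parseDiff(diff: string): FileChange[] {
  const files: FileChange[] = [];
  let cur: FileChange | undefined;
  let inHunk = false;

  for (const line of diff.split("\n")) {
    // Every file section starts with this header, even binary/rename/mode-only ones.
    const header = /^diff --git a\/(.+) b\/(.+)$/.exec(line);
    if (header) {
      cur = { path: header[2], added: 0, removed: 0, binary: false, modeChanged: false };
      files.push(cur);
      inHunk = false;
      continue;
    }
    if (!cur) continue;
    if (line.startsWith("@@")) {
      inHunk = true;
      continue;
    }
    if (inHunk) {
      // Inside a hunk, "---"/"+++" are content lines, so count them too.
      if (line.startsWith("+")) cur.added++;
      else if (line.startsWith("-")) cur.removed++;
      continue;
    }
    // Extended header lines (before the first hunk).
    if (line.startsWith("rename from ")) cur.renamedFrom = line.slice("rename from ".length);
    else if (line.startsWith("rename to ")) cur.path = line.slice("rename to ".length);
    else if (line.startsWith("old mode ") || line.startsWith("new mode ")) cur.modeChanged = true;
    else if (line.startsWith("Binary files ") || line === "GIT binary patch") cur.binary = true;
  }
  return files;
}
```

Under this approach, `files.length` and the summed `added`/`removed` counts could feed the file-count and line-change metrics. A rename-only or mode-only entry would contribute to the file count with zero line changes.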

Resolves #2559
Reid 2026-04-22 05:31:19 +08:00 committed by GitHub
parent 8a0489625b
commit d1c8dff4d2
GPG key ID: B5690EEEBB952194
15 changed files with 1378 additions and 143 deletions


@@ -90,8 +90,9 @@ When all agents complete, the Arena enters the result comparison phase. You'll s
- **Status summary**: Which agents succeeded, failed, or were cancelled
- **Execution metrics**: Duration, rounds of reasoning, token usage, and tool call counts for each agent
+- **Arena comparison summary**: Files changed in common vs. by one agent only, line-change counts, token efficiency, and a high-level approach summary generated from each agent's diff, metrics, and conversation history
-A selection dialog presents the successful agents. Choose one to apply its changes to your main workspace, or discard all results.
+A selection dialog presents the successful agents. Choose one to apply its changes to your main workspace, or discard all results. Press `p` to toggle a quick preview for the highlighted agent, or `d` to toggle that agent's detailed diff before selecting a winner.
### What happens when you select a winner
@@ -99,7 +100,7 @@ A selection dialog presents the successful agents. Choose one to apply its chang
2. The diff is applied to your main working directory
3. All worktrees and temporary branches are cleaned up automatically
-If you want to inspect results before deciding, each agent's full conversation history is available via the tab bar while the selection dialog is active.
+If you want to inspect the complete reasoning path before deciding, each agent's full conversation history is still available via the tab bar while the selection dialog is active.
## Configuration