mirror of
https://github.com/QwenLM/qwen-code.git
synced 2026-04-28 11:41:04 +00:00
feat(arena): add comparison summary for agent results (#3394)
Some checks are pending
Qwen Code CI / Lint (push) Waiting to run
Qwen Code CI / Test (push) Blocked by required conditions
Qwen Code CI / Test-1 (push) Blocked by required conditions
Qwen Code CI / Test-2 (push) Blocked by required conditions
Qwen Code CI / Test-3 (push) Blocked by required conditions
Qwen Code CI / Test-4 (push) Blocked by required conditions
Qwen Code CI / Test-5 (push) Blocked by required conditions
Qwen Code CI / Test-6 (push) Blocked by required conditions
Qwen Code CI / Test-7 (push) Blocked by required conditions
Qwen Code CI / Test-8 (push) Blocked by required conditions
Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions
Qwen Code CI / CodeQL (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run
E2E Tests / E2E Test - macOS (push) Waiting to run
Some checks are pending
Qwen Code CI / Lint (push) Waiting to run
Qwen Code CI / Test (push) Blocked by required conditions
Qwen Code CI / Test-1 (push) Blocked by required conditions
Qwen Code CI / Test-2 (push) Blocked by required conditions
Qwen Code CI / Test-3 (push) Blocked by required conditions
Qwen Code CI / Test-4 (push) Blocked by required conditions
Qwen Code CI / Test-5 (push) Blocked by required conditions
Qwen Code CI / Test-6 (push) Blocked by required conditions
Qwen Code CI / Test-7 (push) Blocked by required conditions
Qwen Code CI / Test-8 (push) Blocked by required conditions
Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions
Qwen Code CI / CodeQL (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run
E2E Tests / E2E Test - macOS (push) Waiting to run
Adds a summary view that runs after Arena agents finish, so users can compare model outputs without opening each agent's conversation first. Summary surface: - Agent status overview - Files changed in common vs. unique to one agent - Per-agent approach summary generated through that agent's own provider - Token / runtime / line-change / file-count metrics Selection dialog now supports: - p — toggle preview for the highlighted agent - d — toggle detailed diff - Enter — select winner - x — discard all results - Esc — cancel Approach summary generation: - Each agent's summary is generated through that agent's own content generator, keeping mixed-provider Arena sessions within their respective auth boundaries - 20s timeout + AbortController per agent, bounded prompt inputs (finalText 2K, transcript 6K, diff 6K) - Falls back to a deterministic "Changed N files ..." summary when no per-agent generator is available or on error Diff summary now handles binary, rename-only, and mode-only diffs; the previous heuristic required textual +/- hunks and would have dropped those. Resolves #2559
This commit is contained in:
parent
8a0489625b
commit
d1c8dff4d2
15 changed files with 1378 additions and 143 deletions
|
|
@ -90,8 +90,9 @@ When all agents complete, the Arena enters the result comparison phase. You'll s
|
|||
|
||||
- **Status summary**: Which agents succeeded, failed, or were cancelled
|
||||
- **Execution metrics**: Duration, rounds of reasoning, token usage, and tool call counts for each agent
|
||||
- **Arena comparison summary**: Files changed in common vs. by one agent only, line-change counts, token efficiency, and a high-level approach summary generated from each agent's diff, metrics, and conversation history
|
||||
|
||||
A selection dialog presents the successful agents. Choose one to apply its changes to your main workspace, or discard all results.
|
||||
A selection dialog presents the successful agents. Choose one to apply its changes to your main workspace, or discard all results. Press `p` to toggle a quick preview for the highlighted agent, or `d` to toggle that agent's detailed diff before selecting a winner.
|
||||
|
||||
### What happens when you select a winner
|
||||
|
||||
|
|
@ -99,7 +100,7 @@ A selection dialog presents the successful agents. Choose one to apply its chang
|
|||
2. The diff is applied to your main working directory
|
||||
3. All worktrees and temporary branches are cleaned up automatically
|
||||
|
||||
If you want to inspect results before deciding, each agent's full conversation history is available via the tab bar while the selection dialog is active.
|
||||
If you want to inspect the complete reasoning path before deciding, each agent's full conversation history is still available via the tab bar while the selection dialog is active.
|
||||
|
||||
## Configuration
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue