mirror of
https://github.com/QwenLM/qwen-code.git
synced 2026-05-05 23:42:03 +00:00
feat(review): expand review pipeline + qwen review CLI subcommands (#3754)
Some checks are pending
Qwen Code CI / Lint (push) Waiting to run
Qwen Code CI / Test (push) Blocked by required conditions
Qwen Code CI / Test-1 (push) Blocked by required conditions
Qwen Code CI / Test-2 (push) Blocked by required conditions
Qwen Code CI / Test-3 (push) Blocked by required conditions
Qwen Code CI / Test-4 (push) Blocked by required conditions
Qwen Code CI / Test-5 (push) Blocked by required conditions
Qwen Code CI / Test-6 (push) Blocked by required conditions
Qwen Code CI / Test-7 (push) Blocked by required conditions
Qwen Code CI / Test-8 (push) Blocked by required conditions
Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions
Qwen Code CI / CodeQL (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run
E2E Tests / E2E Test - macOS (push) Waiting to run
Some checks are pending
Qwen Code CI / Lint (push) Waiting to run
Qwen Code CI / Test (push) Blocked by required conditions
Qwen Code CI / Test-1 (push) Blocked by required conditions
Qwen Code CI / Test-2 (push) Blocked by required conditions
Qwen Code CI / Test-3 (push) Blocked by required conditions
Qwen Code CI / Test-4 (push) Blocked by required conditions
Qwen Code CI / Test-5 (push) Blocked by required conditions
Qwen Code CI / Test-6 (push) Blocked by required conditions
Qwen Code CI / Test-7 (push) Blocked by required conditions
Qwen Code CI / Test-8 (push) Blocked by required conditions
Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions
Qwen Code CI / CodeQL (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run
E2E Tests / E2E Test - macOS (push) Waiting to run
* feat(review): expand review pipeline + add `qwen review` CLI subcommands
Review skill (SKILL.md) changes:
- Step 4: 5 → 9 parallel agents (split Correctness/Security, add Test
Coverage, 3 undirected personas: attacker / 3am-oncall / maintainer)
- Step 5: verification "uncertain → reject" → "uncertain → low-confidence"
(terminal-only "Needs Human Review" bucket; never posted as PR comments)
- Step 6: single reverse audit → iterative (terminate on no-new-findings,
hard cap 3 rounds)
- Step 9: self-PR detection (downgrade APPROVE/REQUEST_CHANGES → COMMENT
when GitHub forbids self-review with HTTP 422); CI status check
(downgrade APPROVE → COMMENT on red/pending CI); existing-Qwen-comment
classification with priority order Stale > Resolved > Overlap > NoConflict
(only Overlap blocks for confirmation)
`qwen review` CLI subcommands (packages/cli/src/commands/review/):
- fetch-pr — clean stale + fetch PR ref + create worktree + metadata
- pr-context — emit Markdown context file with security preamble +
already-discussed dedup section
- load-rules — read review rules from base branch (4 source files)
- deterministic— run tsc, eslint, ruff, cargo-clippy, go-vet, golangci-lint
on changed files; filtered + structured findings JSON
(TypeScript/JavaScript, Python, Rust, Go)
- presubmit — self-PR + CI status + existing-comment classification in
a single JSON report
- cleanup — worktree + branch ref + per-target temp files (idempotent)
Cross-platform: execFileSync (no shell), path.join, CRLF normalization,
which/where for tool detection. Replaces bash-style inline commands in
SKILL.md; works identically on macOS/Linux/Windows.
Path consistency: SKILL.md temp files moved from /tmp/qwen-review-* to
.qwen/tmp/qwen-review-* — matches what os.tmpdir() resolves to across
platforms (macOS returns /var/folders/... not /tmp).
DESIGN.md gains five "Why ..." sections explaining each design decision;
docs/users/features/code-review.md synced for user-visible changes.
* feat(review): expose full reply chains in pr-context output
`qwen review pr-context` now renders each replied-to inline-comment thread
as the original reviewer comment + chronological reply chain, instead of
only listing the root-comment snippet. This lets review agents see at a
glance whether a topic has been addressed (e.g. a "Fixed in <commit>"
reply closes the thread) and avoids re-reporting already-resolved
concerns without forcing the LLM driver to manually summarise each reply
chain in agent prompts.
- Walk `in_reply_to_id` chain to group replies under their root comment
- Sort replies chronologically (by id, monotonic on GitHub)
- Render thread block: root snippet as a quote + bulleted reply list
- Sort threads by `(path, line)` for deterministic output
- SKILL.md note updated to point agents at the new chain format
* feat(review): include review-level summaries in pr-context output
`qwen review pr-context` now also fetches `gh api repos/{owner}/{repo}/pulls/{n}/reviews`
and renders a "Review summaries" section listing each reviewer's
overall body (the comment they typed alongside an APPROVED /
CHANGES_REQUESTED / COMMENTED submission). Closes a real gap found
during the PR #3684 review:
> "@wenshao [CHANGES_REQUESTED]: The previously identified exported
> type rename issue no longer maps to the current PR diff, so this
> review only includes the remaining high-confidence blocker."
Without this section, the LLM driver's review agents would have missed
that integration note from the prior reviewer.
- New `RawReview` type + extra `ghApi` call
- Filter: skip empty bodies + the canonical "No issues found. LGTM!"
template the qwen-review pipeline auto-emits — those carry no
agent-actionable content beyond the review state itself
- Sort meaningful reviews by `submitted_at` for chronological output
- Stdout summary now reports `M/N review summaries` (M = kept after
filter)
Smoke-tested on PR #3684: 30 inline, 3 issue, 1/30 review summaries
correctly surfaces the @wenshao CHANGES_REQUESTED body and filters the
29 LGTM templates.
* fix(review): paginate gh API calls to capture comments past page 1
`gh api <path>` defaults to per_page=30. Busy PRs cross that limit on
inline comments, issue comments, and reviews — the latest entries (the
ones most likely to contain new reviewer feedback or in-flight reply
chains) end up on page 2+ and were silently truncated.
Concrete bug found while re-reviewing PR #3684:
Before: `30 inline, 3 issue comments, 1/30 review summaries`
After: `97 inline, 3 issue comments, 6/67 review summaries`
5 additional reviewer-level summaries surfaced — including the
@wenshao 2026-04-30 "Multi-agent re-review (Phase C)" body with the
explicit verification notes that this PR's pipeline is supposed to
chain forward into the next review.
Changes:
- `lib/gh.ts`: new `ghApiAll(path)` helper using `gh api --paginate`,
which walks every `next` link and concatenates each page's array.
- `pr-context.ts`: 3 fetches (inline / issue / reviews) → `ghApiAll`.
- `presubmit.ts`: PR comments fetch → `ghApiAll` too (existing-comment
classification was equally susceptible to dropping page 2+ overlap
candidates).
`check-runs` and `commits/<sha>/status` calls retain `ghApi` — those
return objects (with embedded arrays) and rarely cross 30 entries.
---------
Co-authored-by: wenshao <wenshao@U-K7F6PQY3-2157.local>
This commit is contained in:
parent
431a87c384
commit
35fe97e0f6
14 changed files with 2423 additions and 138 deletions
|
|
@ -29,14 +29,16 @@ The `/review` command runs a multi-stage pipeline:
|
|||
Step 1: Determine scope (local diff / PR worktree / file)
|
||||
Step 2: Load project review rules
|
||||
Step 3: Run deterministic analysis (linter, typecheck) [zero LLM cost]
|
||||
Step 4: 5 parallel review agents [5 LLM calls]
|
||||
|-- Agent 1: Correctness & Security
|
||||
|-- Agent 2: Code Quality
|
||||
|-- Agent 3: Performance & Efficiency
|
||||
|-- Agent 4: Undirected Audit
|
||||
'-- Agent 5: Build & Test (runs shell commands)
|
||||
Step 4: 9 parallel review agents [9 LLM calls]
|
||||
|-- Agent 1: Correctness
|
||||
|-- Agent 2: Security
|
||||
|-- Agent 3: Code Quality
|
||||
|-- Agent 4: Performance & Efficiency
|
||||
|-- Agent 5: Test Coverage
|
||||
|-- Agent 6: Undirected Audit (3 personas: 6a/6b/6c)
|
||||
'-- Agent 7: Build & Test (runs shell commands)
|
||||
Step 5: Deduplicate --> Batch verify --> Aggregate [1 LLM call]
|
||||
Step 6: Reverse audit (find coverage gaps) [1 LLM call]
|
||||
Step 6: Iterative reverse audit (1-3 rounds, gap finding) [1-3 LLM calls]
|
||||
Step 7: Present findings + verdict
|
||||
Step 8: Autofix (user-confirmed, optional)
|
||||
Step 9: Post PR inline comments (if requested)
|
||||
|
|
@ -46,15 +48,17 @@ Step 11: Clean up (remove worktree + temp files)
|
|||
|
||||
### Review Agents
|
||||
|
||||
| Agent | Focus |
|
||||
| --------------------------------- | ------------------------------------------------------------------ |
|
||||
| Agent 1: Correctness & Security | Logic errors, null handling, race conditions, injection, XSS, SSRF |
|
||||
| Agent 2: Code Quality | Style consistency, naming, duplication, dead code |
|
||||
| Agent 3: Performance & Efficiency | N+1 queries, memory leaks, unnecessary re-renders, bundle size |
|
||||
| Agent 4: Undirected Audit | Business logic, boundary interactions, hidden coupling |
|
||||
| Agent 5: Build & Test | Runs build and test commands, reports failures |
|
||||
| Agent | Focus |
|
||||
| --------------------------------- | ------------------------------------------------------------------------------------------- |
|
||||
| Agent 1: Correctness | Logic errors, edge cases, null handling, race conditions, type safety |
|
||||
| Agent 2: Security | Injection, XSS, SSRF, auth bypass, sensitive data exposure |
|
||||
| Agent 3: Code Quality | Style consistency, naming, duplication, dead code |
|
||||
| Agent 4: Performance & Efficiency | N+1 queries, memory leaks, unnecessary re-renders, bundle size |
|
||||
| Agent 5: Test Coverage | Untested code paths in the diff, missing branch coverage, weak assertions |
|
||||
| Agent 6: Undirected Audit | 3 parallel personas (attacker / 3am-oncall / maintainer) — catches cross-dimensional issues |
|
||||
| Agent 7: Build & Test | Runs build and test commands, reports failures |
|
||||
|
||||
All agents run in parallel. Findings from Agents 1-4 are verified in a **single batch verification pass** (one agent reviews all findings at once, keeping LLM calls fixed). After verification, a **reverse audit agent** re-reads the entire diff with knowledge of all confirmed findings to catch issues that every other agent missed. Reverse audit findings skip the verification step (the agent already has full context) and are included directly as high-confidence results.
|
||||
All agents run in parallel (Agent 6 launches 3 persona variants concurrently, totaling 9 parallel tasks for same-repo reviews). Findings from Agents 1-6 are verified in a **single batch verification pass** (one agent reviews all findings at once, keeping verification cost fixed regardless of finding count). After verification, **iterative reverse audit** runs 1-3 rounds of gap-finding — each round receives the cumulative finding list from prior rounds, so successive rounds focus on whatever's left undiscovered. The loop stops as soon as a round returns "No issues found", or after 3 rounds (hard cap). Reverse audit findings skip verification (the agent already has full context) and are included as high-confidence results.
|
||||
|
||||
## Deterministic Analysis
|
||||
|
||||
|
|
@ -127,8 +131,8 @@ This runs in **lightweight mode** — no worktree, no linter, no build/test, no
|
|||
|
||||
| Capability | Same-repo | Cross-repo |
|
||||
| ------------------------------------------------ | --------- | ----------------------------- |
|
||||
| LLM review (Agents 1-4 + verify + reverse audit) | ✅ | ✅ |
|
||||
| Agent 5: Build & test | ✅ | ❌ (no local codebase) |
|
||||
| LLM review (Agents 1-6 + verify + iterative reverse audit) | ✅ | ✅ |
|
||||
| Agent 7: Build & test | ✅ | ❌ (no local codebase) |
|
||||
| Deterministic analysis (linter/typecheck) | ✅ | ❌ |
|
||||
| Cross-file impact analysis | ✅ | ❌ |
|
||||
| Autofix | ✅ | ❌ |
|
||||
|
|
@ -157,6 +161,12 @@ Or, after running `/review 123`, type `post comments` to publish findings withou
|
|||
- Nice to have findings (including linter warnings)
|
||||
- Low-confidence findings
|
||||
|
||||
**Self-authored PRs:** GitHub does not allow you to submit `APPROVE` or `REQUEST_CHANGES` reviews on your own pull request — both fail with HTTP 422. When `/review` detects that the PR author matches the current authenticated user, it automatically downgrades the API event to `COMMENT` regardless of verdict, so the submission still succeeds. The terminal still shows the honest verdict ("Approve" / "Request changes" / "Comment") — only the GitHub-side review event is neutralized. The actual findings still appear as inline comments on specific lines, so substantive feedback is unchanged.
|
||||
|
||||
**Re-reviewing a PR with prior Qwen Code comments:** when `/review` runs on a PR that already has previous Qwen Code review comments, it classifies them before posting new ones. Only **same-line overlap** (an existing comment on the same `(path, line)` as a new finding) prompts you to confirm — that's the case where you'd see a visual duplicate on the same code line. Comments from older commits, replied-to comments (treated as resolved), and comments that simply don't overlap with any new finding are silently skipped, with a terminal log line so you know what was filtered.
|
||||
|
||||
**CI / build status check before APPROVE:** if the verdict is "Approve", `/review` queries the PR's check-runs and commit statuses before submitting. If any check has failed (or all checks are still pending), the API event is automatically downgraded from `APPROVE` to `COMMENT`, with the review body explaining why. Rationale: the LLM review reads code statically and cannot see runtime test failures; approving while CI is red would be misleading. The inline findings are still posted unchanged. If you want to approve anyway (e.g., a known-flaky CI failure), submit the GitHub approval manually after verifying.
|
||||
|
||||
## Follow-up Actions
|
||||
|
||||
After the review, context-aware tips appear as ghost text. Press Tab to accept:
|
||||
|
|
@ -179,7 +189,7 @@ You can customize review criteria per project. `/review` reads rules from these
|
|||
3. `AGENTS.md` — `## Code Review` section
|
||||
4. `QWEN.md` — `## Code Review` section
|
||||
|
||||
Rules are injected into the LLM review agents (1-4) as additional criteria. For PR reviews, rules are read from the **base branch** to prevent a malicious PR from injecting bypass rules.
|
||||
Rules are injected into the LLM review agents (1-6) as additional criteria. For PR reviews, rules are read from the **base branch** to prevent a malicious PR from injecting bypass rules.
|
||||
|
||||
Example `.qwen/review-rules.md`:
|
||||
|
||||
|
|
@ -246,15 +256,17 @@ For large diffs (>10 modified symbols), analysis prioritizes functions with sign
|
|||
|
||||
## Token Efficiency
|
||||
|
||||
The review pipeline uses a fixed number of LLM calls regardless of how many findings are produced:
|
||||
The review pipeline uses a bounded number of LLM calls regardless of how many findings are produced:
|
||||
|
||||
| Stage | LLM calls | Notes |
|
||||
| ------------------------------- | ---------- | --------------------------------------------------- |
|
||||
| Deterministic analysis (Step 3) | 0 | Shell commands only |
|
||||
| Review agents (Step 4) | 5 (or 4) | Run in parallel; Agent 5 skipped in cross-repo mode |
|
||||
| Batch verification (Step 5) | 1 | Single agent verifies all findings at once |
|
||||
| Reverse audit (Step 6) | 1 | Finds coverage gaps; findings skip verification |
|
||||
| **Total** | **7 or 6** | Same-repo: 7; cross-repo: 6 (no Agent 5) |
|
||||
| Stage | LLM calls | Notes |
|
||||
| -------------------------------- | ----------------- | ---------------------------------------------------- |
|
||||
| Deterministic analysis (Step 3) | 0 | Shell commands only |
|
||||
| Review agents (Step 4) | 9 (or 8) | Run in parallel; Agent 7 skipped in cross-repo mode |
|
||||
| Batch verification (Step 5) | 1 | Single agent verifies all findings at once |
|
||||
| Iterative reverse audit (Step 6) | 1-3 | Loops until "No issues found" or 3-round cap |
|
||||
| **Total** | **11-13 (10-12)** | Same-repo: 11-13; cross-repo: 10-12 (no Agent 7) |
|
||||
|
||||
Most PRs converge to the lower end of the range (1 reverse audit round); the cap prevents runaway cost on pathological cases.
|
||||
|
||||
## What's NOT Flagged
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue