vrr/qwen-code

Fork 0

mirror of https://github.com/QwenLM/qwen-code.git synced 2026-05-05 15:31:27 +00:00

Shaojin Wen 35fe97e0f6

Qwen Code CI / Lint (push) Waiting to run

Details

Qwen Code CI / Test (push) Blocked by required conditions

Details

Qwen Code CI / Test-1 (push) Blocked by required conditions

Details

Qwen Code CI / Test-2 (push) Blocked by required conditions

Details

Qwen Code CI / Test-3 (push) Blocked by required conditions

Details

Qwen Code CI / Test-4 (push) Blocked by required conditions

Details

Qwen Code CI / Test-5 (push) Blocked by required conditions

Details

Qwen Code CI / Test-6 (push) Blocked by required conditions

Details

Qwen Code CI / Test-7 (push) Blocked by required conditions

Details

Qwen Code CI / Test-8 (push) Blocked by required conditions

Details

Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions

Details

Qwen Code CI / CodeQL (push) Waiting to run

Details

E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run

Details

E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run

Details

E2E Tests / E2E Test - macOS (push) Waiting to run

Details

feat(review): expand review pipeline + qwen review CLI subcommands (#3754 )

* feat(review): expand review pipeline + add `qwen review` CLI subcommands

Review skill (SKILL.md) changes:
- Step 4: 5 → 9 parallel agents (split Correctness/Security, add Test
  Coverage, 3 undirected personas: attacker / 3am-oncall / maintainer)
- Step 5: verification "uncertain → reject" → "uncertain → low-confidence"
  (terminal-only "Needs Human Review" bucket; never posted as PR comments)
- Step 6: single reverse audit → iterative (terminate on no-new-findings,
  hard cap 3 rounds)
- Step 9: self-PR detection (downgrade APPROVE/REQUEST_CHANGES → COMMENT
  when GitHub forbids self-review with HTTP 422); CI status check
  (downgrade APPROVE → COMMENT on red/pending CI); existing-Qwen-comment
  classification with priority order Stale > Resolved > Overlap > NoConflict
  (only Overlap blocks for confirmation)

`qwen review` CLI subcommands (packages/cli/src/commands/review/):
- fetch-pr     — clean stale + fetch PR ref + create worktree + metadata
- pr-context   — emit Markdown context file with security preamble +
                 already-discussed dedup section
- load-rules   — read review rules from base branch (4 source files)
- deterministic— run tsc, eslint, ruff, cargo-clippy, go-vet, golangci-lint
                 on changed files; filtered + structured findings JSON
                 (TypeScript/JavaScript, Python, Rust, Go)
- presubmit    — self-PR + CI status + existing-comment classification in
                 a single JSON report
- cleanup      — worktree + branch ref + per-target temp files (idempotent)

Cross-platform: execFileSync (no shell), path.join, CRLF normalization,
which/where for tool detection. Replaces bash-style inline commands in
SKILL.md; works identically on macOS/Linux/Windows.

Path consistency: SKILL.md temp files moved from /tmp/qwen-review-* to
.qwen/tmp/qwen-review-* — matches what os.tmpdir() resolves to across
platforms (macOS returns /var/folders/... not /tmp).

DESIGN.md gains five "Why ..." sections explaining each design decision;
docs/users/features/code-review.md synced for user-visible changes.

* feat(review): expose full reply chains in pr-context output

`qwen review pr-context` now renders each replied-to inline-comment thread
as the original reviewer comment + chronological reply chain, instead of
only listing the root-comment snippet. This lets review agents see at a
glance whether a topic has been addressed (e.g. a "Fixed in <commit>"
reply closes the thread) and avoids re-reporting already-resolved
concerns without forcing the LLM driver to manually summarise each reply
chain in agent prompts.

- Walk `in_reply_to_id` chain to group replies under their root comment
- Sort replies chronologically (by id, monotonic on GitHub)
- Render thread block: root snippet as a quote + bulleted reply list
- Sort threads by `(path, line)` for deterministic output
- SKILL.md note updated to point agents at the new chain format

* feat(review): include review-level summaries in pr-context output

`qwen review pr-context` now also fetches `gh api repos/{owner}/{repo}/pulls/{n}/reviews`
and renders a "Review summaries" section listing each reviewer's
overall body (the comment they typed alongside an APPROVED /
CHANGES_REQUESTED / COMMENTED submission). Closes a real gap found
during the PR #3684 review:

> "@wenshao [CHANGES_REQUESTED]: The previously identified exported
> type rename issue no longer maps to the current PR diff, so this
> review only includes the remaining high-confidence blocker."

Without this section, the LLM driver's review agents would have missed
that integration note from the prior reviewer.

- New `RawReview` type + extra `ghApi` call
- Filter: skip empty bodies + the canonical "No issues found. LGTM!"
  template the qwen-review pipeline auto-emits — those carry no
  agent-actionable content beyond the review state itself
- Sort meaningful reviews by `submitted_at` for chronological output
- Stdout summary now reports `M/N review summaries` (M = kept after
  filter)

Smoke-tested on PR #3684: 30 inline, 3 issue, 1/30 review summaries
correctly surfaces the @wenshao CHANGES_REQUESTED body and filters the
29 LGTM templates.

* fix(review): paginate gh API calls to capture comments past page 1

`gh api <path>` defaults to per_page=30. Busy PRs cross that limit on
inline comments, issue comments, and reviews — the latest entries (the
ones most likely to contain new reviewer feedback or in-flight reply
chains) end up on page 2+ and were silently truncated.

Concrete bug found while re-reviewing PR #3684:
  Before: `30 inline, 3 issue comments, 1/30 review summaries`
  After:  `97 inline, 3 issue comments, 6/67 review summaries`

5 additional reviewer-level summaries surfaced — including the
@wenshao 2026-04-30 "Multi-agent re-review (Phase C)" body with the
explicit verification notes that this PR's pipeline is supposed to
chain forward into the next review.

Changes:
- `lib/gh.ts`: new `ghApiAll(path)` helper using `gh api --paginate`,
  which walks every `next` link and concatenates each page's array.
- `pr-context.ts`: 3 fetches (inline / issue / reviews) → `ghApiAll`.
- `presubmit.ts`: PR comments fetch → `ghApiAll` too (existing-comment
  classification was equally susceptible to dropping page 2+ overlap
  candidates).

`check-runs` and `commits/<sha>/status` calls retain `ghApi` — those
return objects (with embedded arrays) and rarely cross 30 entries.

---------

Co-authored-by: wenshao <wenshao@U-K7F6PQY3-2157.local>

2026-05-01 18:30:35 +08:00

16 KiB

Raw Blame History

Code Review

Review code changes for correctness, security, performance, and code quality using /review.

Quick Start

# Review local uncommitted changes
/review

# Review a pull request (by number or URL)
/review 123
/review https://github.com/org/repo/pull/123

# Review and post inline comments on the PR
/review 123 --comment

# Review a specific file
/review src/utils/auth.ts

If there are no uncommitted changes, /review will let you know and stop — no agents are launched.

How It Works

The /review command runs a multi-stage pipeline:

Step 1:  Determine scope (local diff / PR worktree / file)
Step 2:  Load project review rules
Step 3:  Run deterministic analysis (linter, typecheck)    [zero LLM cost]
Step 4:  9 parallel review agents                          [9 LLM calls]
           |-- Agent 1: Correctness
           |-- Agent 2: Security
           |-- Agent 3: Code Quality
           |-- Agent 4: Performance & Efficiency
           |-- Agent 5: Test Coverage
           |-- Agent 6: Undirected Audit (3 personas: 6a/6b/6c)
           '-- Agent 7: Build & Test (runs shell commands)
Step 5:  Deduplicate --> Batch verify --> Aggregate         [1 LLM call]
Step 6:  Iterative reverse audit (1-3 rounds, gap finding) [1-3 LLM calls]
Step 7:  Present findings + verdict
Step 8:  Autofix (user-confirmed, optional)
Step 9:  Post PR inline comments (if requested)
Step 10: Save report + incremental cache
Step 11: Clean up (remove worktree + temp files)

Review Agents

Agent	Focus
Agent 1: Correctness	Logic errors, edge cases, null handling, race conditions, type safety
Agent 2: Security	Injection, XSS, SSRF, auth bypass, sensitive data exposure
Agent 3: Code Quality	Style consistency, naming, duplication, dead code
Agent 4: Performance & Efficiency	N+1 queries, memory leaks, unnecessary re-renders, bundle size
Agent 5: Test Coverage	Untested code paths in the diff, missing branch coverage, weak assertions
Agent 6: Undirected Audit	3 parallel personas (attacker / 3am-oncall / maintainer) — catches cross-dimensional issues
Agent 7: Build & Test	Runs build and test commands, reports failures

All agents run in parallel (Agent 6 launches 3 persona variants concurrently, totaling 9 parallel tasks for same-repo reviews). Findings from Agents 1-6 are verified in a single batch verification pass (one agent reviews all findings at once, keeping verification cost fixed regardless of finding count). After verification, iterative reverse audit runs 1-3 rounds of gap-finding — each round receives the cumulative finding list from prior rounds, so successive rounds focus on whatever's left undiscovered. The loop stops as soon as a round returns "No issues found", or after 3 rounds (hard cap). Reverse audit findings skip verification (the agent already has full context) and are included as high-confidence results.

Deterministic Analysis

Before the LLM agents run, /review automatically runs your project's existing linters and type checkers:

Language	Tools detected
TypeScript/JavaScript	`tsc --noEmit`, `npm run lint`, `eslint`
Python	`ruff`, `mypy`, `flake8`
Rust	`cargo clippy`
Go	`go vet`, `golangci-lint`
Java	`mvn compile`, `checkstyle`, `spotbugs`, `pmd`
C/C++	`clang-tidy` (if `compile_commands.json` available)
Other	Auto-discovered from CI config (`.github/workflows/*.yml`, etc.)

For projects that don't match standard patterns (e.g., OpenJDK), /review reads CI configuration files to discover what lint/check commands the project uses. No user configuration needed.

Deterministic findings are tagged with [linter] or [typecheck] and skip LLM verification — they are ground truth.

Errors → Critical severity
Warnings → Nice to have (terminal only, not posted as PR comments)

If a tool is not installed or times out, it is skipped with an informational note.

Severity Levels

Severity	Meaning	Posted as PR comment?
Critical	Must fix before merging (bugs, security, data loss, build failures)	Yes (high-confidence only)
Suggestion	Recommended improvement	Yes (high-confidence only)
Nice to have	Optional optimization	No (terminal only)

Low-confidence findings appear in a separate "Needs Human Review" section in the terminal and are never posted as PR comments.

Autofix

After presenting findings, /review offers to auto-apply fixes for Critical and Suggestion findings that have clear solutions:

Found 3 issues with auto-fixable suggestions. Apply auto-fixes? (y/n)

Fixes are applied using the edit tool (targeted replacements, not full-file rewrites)
Per-file linter checks run after fixes to verify they don't introduce new issues
For PR reviews, fixes are committed and pushed from the worktree automatically — your working tree stays clean
Nice to have and low-confidence findings are never auto-fixed
PR review submission always uses the pre-fix verdict (e.g., "Request changes") since the remote PR hasn't been updated until the autofix push completes

Worktree Isolation

When reviewing a PR, /review creates a temporary git worktree (.qwen/tmp/review-pr-<number>) instead of switching your current branch. This means:

Your working tree, staged changes, and current branch are never touched
Dependencies are installed in the worktree (npm ci, etc.) so linting and build/test work
Build and test commands run in isolation without polluting your local build cache
If anything goes wrong, your environment is unaffected — just delete the worktree
The worktree is automatically cleaned up after the review completes
If a review is interrupted (Ctrl+C, crash), the next /review of the same PR automatically cleans up the stale worktree before starting fresh
Review reports and cache are saved to the main project directory (not the worktree)

Cross-repo PR Review

You can review PRs from other repositories by passing the full URL:

/review https://github.com/other-org/other-repo/pull/456

This runs in lightweight mode — no worktree, no linter, no build/test, no autofix. The review is based on the diff text only (fetched via GitHub API). PR comments can still be posted if you have write access.

Capability	Same-repo	Cross-repo
LLM review (Agents 1-6 + verify + iterative reverse audit)	✅	✅
Agent 7: Build & test	✅	❌ (no local codebase)
Deterministic analysis (linter/typecheck)	✅	❌
Cross-file impact analysis	✅	❌
Autofix	✅	❌
PR inline comments	✅	✅ (if you have write access)
Incremental review cache	✅	❌

PR Inline Comments

Use --comment to post findings directly on the PR:

/review 123 --comment

Or, after running /review 123, type post comments to publish findings without re-running the review.

What gets posted:

High-confidence Critical and Suggestion findings as inline comments on specific lines
For Approve/Request changes verdicts: a review summary with the verdict
For Comment verdict with all inline comments posted: no separate summary (inline comments are sufficient)
Model attribution footer on each comment (e.g., — qwen3-coder via Qwen Code /review)

What stays terminal-only:

Nice to have findings (including linter warnings)
Low-confidence findings

Self-authored PRs: GitHub does not allow you to submit APPROVE or REQUEST_CHANGES reviews on your own pull request — both fail with HTTP 422. When /review detects that the PR author matches the current authenticated user, it automatically downgrades the API event to COMMENT regardless of verdict, so the submission still succeeds. The terminal still shows the honest verdict ("Approve" / "Request changes" / "Comment") — only the GitHub-side review event is neutralized. The actual findings still appear as inline comments on specific lines, so substantive feedback is unchanged.

Re-reviewing a PR with prior Qwen Code comments: when /review runs on a PR that already has previous Qwen Code review comments, it classifies them before posting new ones. Only same-line overlap (an existing comment on the same (path, line) as a new finding) prompts you to confirm — that's the case where you'd see a visual duplicate on the same code line. Comments from older commits, replied-to comments (treated as resolved), and comments that simply don't overlap with any new finding are silently skipped, with a terminal log line so you know what was filtered.

CI / build status check before APPROVE: if the verdict is "Approve", /review queries the PR's check-runs and commit statuses before submitting. If any check has failed (or all checks are still pending), the API event is automatically downgraded from APPROVE to COMMENT, with the review body explaining why. Rationale: the LLM review reads code statically and cannot see runtime test failures; approving while CI is red would be misleading. The inline findings are still posted unchanged. If you want to approve anyway (e.g., a known-flaky CI failure), submit the GitHub approval manually after verifying.

Follow-up Actions

After the review, context-aware tips appear as ghost text. Press Tab to accept:

State after review	Tip	What happens
Local review with unfixed findings	`fix these issues`	LLM interactively fixes each finding
PR review with findings	`post comments`	Posts PR inline comments (no re-review)
PR review, zero findings	`post comments`	Approves the PR on GitHub (LGTM)
Local review, all clear	`commit`	Commits your changes

Note: fix these issues is only available for local reviews. For PR reviews, use Autofix (Step 8) — the worktree is cleaned up after the review, so post-review interactive fixing is not possible.

Project Review Rules

You can customize review criteria per project. /review reads rules from these files (in order):

.qwen/review-rules.md (Qwen Code native)
.github/copilot-instructions.md (preferred) or copilot-instructions.md (fallback — only one is loaded, not both)
AGENTS.md — ## Code Review section
QWEN.md — ## Code Review section

Rules are injected into the LLM review agents (1-6) as additional criteria. For PR reviews, rules are read from the base branch to prevent a malicious PR from injecting bypass rules.

Example .qwen/review-rules.md:

# Review Rules

- All API endpoints must validate authentication
- Database queries must use parameterized statements
- React components must not use inline styles
- Error messages must not expose internal paths

Incremental Review

When reviewing a PR that was previously reviewed, /review only examines changes since the last review:

# First review — full review, cache created
/review 123

# PR updated with new commits — only new changes reviewed
/review 123

Cross-model review

If you switch models (via /model) and re-review the same PR, /review detects the model change and runs a full review instead of skipping:

# Review with model A
/review 123

# Switch model
/model

# Review again — full review with model B (not skipped)
/review 123
# → "Previous review used qwen3-coder. Running full review with gpt-4o for a second opinion."

Cache is stored in .qwen/review-cache/ and tracks both the commit SHA and model ID. Make sure this directory is in your .gitignore (a broader rule like .qwen/* also works). If the cached commit was rebased away, it falls back to a full review.

Review Reports

For same-repo reviews, results are saved as a Markdown file in your project's .qwen/reviews/ directory (cross-repo lightweight reviews skip report persistence):

.qwen/reviews/2026-04-06-143022-pr-123.md
.qwen/reviews/2026-04-06-150510-local.md

Reports include: timestamp, diff stats, deterministic analysis results, all findings with verification status, and the verdict.

Cross-file Impact Analysis

When code changes modify exported functions, classes, or interfaces, the review agents automatically search for all callers and check compatibility:

Parameter count/type changes
Return type changes
Removed or renamed public methods
Breaking API changes

For large diffs (>10 modified symbols), analysis prioritizes functions with signature changes.

Token Efficiency

The review pipeline uses a bounded number of LLM calls regardless of how many findings are produced:

Stage	LLM calls	Notes
Deterministic analysis (Step 3)	0	Shell commands only
Review agents (Step 4)	9 (or 8)	Run in parallel; Agent 7 skipped in cross-repo mode
Batch verification (Step 5)	1	Single agent verifies all findings at once
Iterative reverse audit (Step 6)	1-3	Loops until "No issues found" or 3-round cap
Total	11-13 (10-12)	Same-repo: 11-13; cross-repo: 10-12 (no Agent 7)

Most PRs converge to the lower end of the range (1 reverse audit round); the cap prevents runaway cost on pathological cases.

What's NOT Flagged

The review intentionally excludes:

Pre-existing issues in unchanged code (focus on the diff only)
Style/formatting/naming that matches your codebase conventions
Issues a linter or type checker would catch (handled by deterministic analysis)
Subjective "consider doing X" suggestions without a real problem
Minor refactoring that doesn't fix a bug or risk
Missing documentation unless the logic is genuinely confusing
Issues already discussed in existing PR comments (avoids duplicating human feedback)

Design Philosophy

Silence is better than noise. Every comment should be worth the reader's time.

If unsure whether something is a problem → don't report it
Linter/typecheck issues are handled by tools, not LLM guesses
Same pattern across N files → aggregated into one finding
PR comments are high-confidence only
Style/formatting issues matching codebase conventions are excluded

16 KiB Raw Blame History

Code Review

Quick Start

How It Works

Review Agents

Deterministic Analysis

Severity Levels

Autofix

Worktree Isolation

Cross-repo PR Review

PR Inline Comments

Follow-up Actions

Project Review Rules

Incremental Review

Cross-model review

Review Reports

Cross-file Impact Analysis

Token Efficiency

What's NOT Flagged

Design Philosophy

16 KiB

Raw Blame History