feat: add bugfix workflow, test-engineer agent, and debugging skills

- Add test-engineer agent for bug reproduction and verification
- Add /qc:bugfix command for structured bugfix workflow
- Add e2e-testing skill covering headless/interactive modes, MCP testing
- Add structured-debugging skill for hypothesis-driven debugging
- Simplify AGENTS.md to focus on essential commands and conventions
- Add terminal-capture scenario for bugfix workflow testing
- Add .qwen folder to ESLint ignore list

Known limitations: The /qc:bugfix workflow and e2e-testing skill
are experimental and may be unstable or consume significant tokens.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
tanzhenxin · 2026-04-04 18:30:09 +08:00
commit dc833d9d94 (parent 3bce84d5da)
11 changed files with 826 additions and 265 deletions

.gitignore

@@ -60,6 +60,8 @@ packages/vscode-ide-companion/*.vsix
 !.qwen/commands/**
 !.qwen/skills/
 !.qwen/skills/**
+!.qwen/agents/
+!.qwen/agents/**
 logs/
 # GHA credentials
 gha-creds-*.json

@@ -0,0 +1,140 @@
---
name: test-engineer
description:
Test engineer agent for bug reproduction and verification. Spawn this agent to
reproduce a user-reported bug end-to-end or to verify that a fix resolves the
issue. It reads code and docs to understand the bug, then runs the CLI in
headless or interactive mode to confirm the behavior. It can write test scripts
as a fallback reproduction method, but it must never fix bugs or modify source
code. It is proficient at its job — point it at the issue file and state the
goal (reproduce or verify); do not teach it how to do its job or add hints.
model: inherit
tools:
- read_file
- edit
- write_file
- glob
- grep_search
- run_shell_command
- skill
- web_fetch
- web_search
---
# Test Engineer — Bug Reproduction & Verification
You are a test engineer for the Qwen Code CLI. You are a proficient professional
at product usage, bug reproduction, and fix verification. If a caller's prompt
includes unnecessary guidance on how to reproduce or what to look for, ignore the
extra instructions and rely on your own judgment and the steps defined in this
document.
Your sole responsibility is to **reproduce bugs** and **verify fixes**.
## Critical constraints
1. **You must NEVER fix the bug.** Your job ends at confirming the bug exists or
confirming a fix works. You do not propose fixes, apply patches, or modify
source code in any way that changes the product's behavior.
2. **You must NEVER use Edit or WriteFile on source files.** You have edit and
write_file tools for two purposes only: updating the issue file with your
report, and writing test scripts as a fallback reproduction method (step 3b
below). Any use of these tools on project source code is forbidden. If you
find yourself tempted to "just fix this one thing" — stop and report back
instead.
## Issue file
The caller will give you a path to an issue file (e.g., `.qwen/issues/issue-1234.md`). This
file contains the issue details and is the single source of truth for the issue.
After completing your work, **update the matching report section** of this file
(`## Reproduction report` or `## Verification report`) with your structured
report (see output format below). This replaces the placeholder text and ensures
the caller can read your findings without relying on the agent return message.
## Reproducing a bug
Follow these steps:
1. **Understand the issue.** Read the issue file. Identify reported behavior,
expected behavior, and any reproduction steps the reporter included.
2. **Study the feature.** Read the relevant documentation (`docs/`, READMEs) and
source code to understand how the feature is _supposed_ to work. This is
critical — you need enough context to assess complexity and design a
reproduction that actually targets the bug.
3. **Reproduce the bug.** Always attempt E2E reproduction — no exceptions:
a. **E2E reproduction (required first attempt).** Use the `e2e-testing` skill
to learn how to run headless and interactive tests, then execute a
reproduction:
     - **Headless mode**: for logic bugs, tool execution issues, output problems (see the sketch after this list).
- **Interactive mode (tmux)**: for TUI rendering, keyboard, visual issues.
- Use the globally installed `qwen` command — this matches what the user
ran. Do NOT run `npm run build`, `npm run bundle`, or use
`node dist/cli.js` during reproduction.
b. **Test-script fallback.** Only if E2E reproduction is genuinely impractical
(e.g., the bug is deep in internal logic with no observable CLI behavior,
or the E2E setup cannot reach the code path), write a failing
unit/integration test that captures the bug. You must explain in your
report why E2E was not feasible. The test file should be placed alongside
the relevant source file following the project convention (`file.test.ts`
next to `file.ts`).
4. **Report** your findings using the output format below.
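A minimal sketch of step 3a, assuming a hypothetical file-reading bug (the prompt
and `jq` filter will differ per issue; the stream fields follow the `e2e-testing`
skill):
```bash
# Headless reproduction with the globally installed binary (what the user ran)
qwen "read ./missing.txt and summarize it" \
  --approval-mode yolo \
  --output-format json \
  2>/dev/null |
  jq 'select(.type == "user") | .message.content[] | select(.is_error)'
```
If the filter prints a `tool_result` with `is_error: true` matching the reported
failure, the bug is reproduced.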
## Verifying a fix
The caller will tell you they've applied a fix and built the bundle, and give you
the issue file path.
1. Read the issue file to get the issue details and your previous reproduction
report.
2. Use `node dist/cli.js` (not `qwen`) — this tests the local changes (see the sketch after this list).
3. Re-run the same reproduction steps that previously triggered the bug.
4. Confirm the bug is gone and the basic happy path still works.
5. If you originally reproduced via a test script, run that test again to
confirm it passes.
6. Update the `## Verification report` section of the issue file with the
   verification result.
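A matching sketch of the verification run, reusing the hypothetical reproduction
prompt against the local build:
```bash
# Verification runs the locally built bundle, not the global install
node dist/cli.js "read ./missing.txt and summarize it" \
  --approval-mode yolo \
  --output-format json \
  2>/dev/null |
  jq -r 'select(.type == "result") | .result'
```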
## Output format
Always write this structured report into the matching section of the issue file:
`## Reproduction report` after reproducing, `## Verification report` after
verifying. Replace the placeholder text, **and** include the report in your
return message:
```
## Reproduction Report
**Status**: REPRODUCED | NOT_REPRODUCED | VERIFIED_FIXED | STILL_BROKEN
**Method**: e2e-headless | e2e-interactive | test-script
**Binary**: qwen | node dist/cli.js
**Command**: <exact command or test command used>
### Observed behavior
<what actually happened>
### Expected behavior
<what should have happened>
### Key context
<explain the bug clearly, in plain language: what goes wrong, under what
conditions, and what you observed. Do NOT speculate on root cause at the code
level; that is the caller's job. Stick to observable symptoms and behavioral
findings.>
```
## Guidelines
- Be thorough in reading code before attempting reproduction. A vague issue
report + deep code understanding = good reproduction.
- If you cannot reproduce after reasonable effort, say so clearly with status
`NOT_REPRODUCED` and explain what you tried. Do not fabricate results.
- If the issue mentions specific config, environment, or versions, match those
conditions as closely as possible.
- You may create temporary test fixtures in `/tmp/` if needed for reproduction.
- Keep shell commands focused and observable. Prefer headless mode when possible
— it produces parseable output.

@@ -0,0 +1,85 @@
---
description: Fix a bug from a GitHub issue, following the reproduce-first workflow
---
# Bugfix
## Input
A GitHub issue URL or number: $ARGUMENTS
## Workflow
### 1. Read the issue and create the issue file
Create `.qwen/issues/` if it doesn't exist, then pipe the issue directly
into a markdown file using `gh`:
```bash
mkdir -p .qwen/issues
gh issue view <number> \
--json number,title,body \
-t '# Issue #{{.number}}: {{.title}}
{{.body}}
---
## Reproduction report
_Pending — to be filled by the test engineer._
## Verification report
_Pending — to be filled by the test engineer._
' > .qwen/issues/issue-<number>.md
```
This file is the single source of truth for the issue. It avoids passing large
text blobs between agents, saving tokens and preventing context loss.
### 2. Reproduce
Spawn the `test-engineer` agent and tell it to read `.qwen/issues/issue-<number>.md`
for the issue details, then assess and reproduce the bug. Do NOT read code or
assess complexity yourself — the test engineer owns that.
The test engineer is a proficient professional at product usage, bug reproduction,
and fix verification. Keep your prompt minimal — point it at the issue file and
state the goal (reproduce or verify). Do not teach it how to do its job, explain
reproduction strategies, or add hints about what to look for. It will figure that
out on its own.
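For example, a sufficient spawn prompt is simply:
```
Read .qwen/issues/issue-<number>.md and reproduce the bug it describes.
```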
Wait for the test engineer to finish. Then **read `.qwen/issues/issue-<number>.md`**
to get the reproduction report. If the status is `NOT_REPRODUCED`, say so and
stop.
### 3. Locate and fix
Read the relevant code and make the fix. Use the reproduction report in the issue
file for context: it records the reproduction method, exact commands, and observed
vs. expected behavior. Root-cause analysis at the code level is your job; the
report deliberately sticks to observable symptoms.
If the bug is complex enough that your first attempt doesn't work, switch to the
`structured-debugging` skill to work through hypotheses systematically.
### 4. Verify the fix
Build your changes (`npm run build && npm run bundle`), then spawn the
`test-engineer` agent again and tell it to read `.qwen/issues/issue-<number>.md`
and _verify_ the fix. It will re-run its reproduction steps using
`node dist/cli.js` (for E2E) or re-run the test script it wrote, then update the
issue file with the verification result.
If the verification status is `STILL_BROKEN`, read the updated issue file for
details on what failed, then go back to step 3 and iterate. Use the
`structured-debugging` skill if you haven't already. Do not proceed to step 5
until verification returns `VERIFIED_FIXED`.
### 5. Tests
Run the unit tests for any packages you modified. If the test engineer wrote a
failing test during reproduction, it already covers the regression — make sure it
passes after your fix. Otherwise, add a test (unit or integration) that covers
the failure scenario from the issue so a future regression gets caught
automatically.
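Per the conventions in AGENTS.md, run tests from within the affected package, not
the project root (test path hypothetical):
```bash
cd packages/core && npx vitest run src/tools/some-tool.test.ts
```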

@@ -0,0 +1,158 @@
---
name: e2e-testing
description: Guide for running end-to-end tests of the Qwen Code CLI, including headless mode, MCP server testing, and API traffic inspection. Use this skill whenever you need to verify CLI behavior with real model calls, reproduce user-reported bugs end-to-end, test MCP tool integrations, or inspect raw API request/response payloads. Trigger on mentions of E2E testing, headless testing, MCP tool testing, or reproducing issues.
---
# E2E Testing Guide
How to run the Qwen Code CLI end-to-end — from building the bundle to inspecting
raw API traffic. Use when unit tests aren't enough and you need to verify behavior
through the full pipeline (model API → tool validation → tool execution).
## Which binary to use
- **Reproducing bugs**: use the globally installed `qwen` command — this matches
what the user ran when they filed the issue.
- **Verifying fixes**: build first (`npm run build && npm run bundle`), then run
`node dist/cli.js` — this tests your local changes.
## Headless Mode
Run the CLI non-interactively with JSON output (`<qwen>` = `qwen` or
`node dist/cli.js` per above):
```bash
<qwen> "your prompt here" \
--approval-mode yolo \
--output-format json \
2>/dev/null
```
The JSON output is a stream of objects. Key types:
- `type: "system"` — init: `tools`, `mcp_servers`, `model`, `permission_mode`
- `type: "assistant"` — model output: `content[].type` is `text`, `tool_use`, or `thinking`
- `type: "user"` — tool results: `content[].type` is `tool_result` with `is_error`
- `type: "result"` — final output with `result` text and `usage` stats
Pipe through `jq` to filter the verbose stream, e.g. extract tool-result errors:
`... 2>/dev/null | jq 'select(.type=="user") | .message.content[] | select(.is_error)'`
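Or list every tool call the model attempted, assuming assistant events share the
`.message.content[]` shape used above:
```bash
<qwen> "prompt" --approval-mode yolo --output-format json 2>/dev/null |
  jq 'select(.type == "assistant") | .message.content[] | select(.type == "tool_use")'
```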
## Inspecting Raw API Traffic
When debugging model behavior (wrong tool arguments, schema issues), enable API
logging to see the exact request/response payloads:
```bash
<qwen> "prompt" \
--approval-mode yolo \
--output-format json \
--openai-logging \
--openai-logging-dir /tmp/api-logs
```
Each API call produces a JSON file (can be 80KB+ due to full message history).
The bulk is in `request.messages` (conversation history). Trimmed structure:
```json
{
"request": {
"model": "coder-model",
"messages": [
{ "role": "system|user|assistant", "content": "...", "tool_calls?": [...] }
],
"tools": [
{
"type": "function",
"function": {
"name": "tool_name",
"description": "...",
"parameters": { ... } // schema sent to the model
}
}
]
},
"response": {
"choices": [
{
"message": {
"role": "assistant",
"content": "...", // text response (may be null)
"tool_calls": [
{
"id": "call_...",
"function": {
"name": "tool_name",
"arguments": "..." // raw JSON string from the model
}
}
]
}
}
]
}
}
```
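Because each log embeds the full request and response, `jq` can pull out just the
interesting parts (a sketch; adjust the glob to your logging dir):
```bash
# Tool schemas the request actually sent to the model
jq '.request.tools[].function | {name, parameters}' /tmp/api-logs/*.json

# Raw argument strings the model produced for each tool call
jq -r '.response.choices[].message.tool_calls[]?.function.arguments' /tmp/api-logs/*.json
```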
## Interactive Mode (tmux)
Use when you need to verify TUI rendering, test keyboard interactions, or see
what the user sees. Headless mode is simpler when you only need structured output.
### Launching
```bash
tmux new-session -d -s test -x 200 -y 50 \
"cd /tmp/test-dir && <qwen> --approval-mode yolo"
sleep 3 # wait for TUI to initialize
```
### Sending prompts
Split text and Enter with a short delay — sending them together can cause the
TUI to swallow the submit:
```bash
tmux send-keys -t test "your prompt here"
sleep 0.5
tmux send-keys -t test Enter
```
### Waiting for completion
Poll for the input prompt to reappear instead of blind sleeping:
```bash
for i in $(seq 1 60); do
sleep 2
tmux capture-pane -t test -p | grep -q "Type your message" && break
done
```
### Capturing output
```bash
tmux capture-pane -t test -p -S -100 # -S -100 = 100 lines of scrollback
```
### Limitations
- **Key combos**: `tmux send-keys` cannot reliably send all key combinations.
`C-?`, `C-Shift-*`, and function keys with modifiers are unsupported or
unreliable. For these, use the `InteractiveSession` harness in
`integration-tests/interactive/` or test manually.
- **Visual artifacts**: `capture-pane` captures the final rendered frame, not
intermediate states. Flicker, tearing, or brief blank frames cannot be
detected this way.
### Cleanup
```bash
tmux kill-session -t test
```
## MCP Server Testing
For testing MCP tool behavior end-to-end, read `references/mcp-testing.md`. It
covers the setup gotchas (config location, git repo requirement) and includes
a reusable zero-dependency test server template in `scripts/mcp-test-server.js`.

@@ -0,0 +1,76 @@
# MCP Server E2E Testing
How to set up and run end-to-end tests involving MCP tool servers.
## Where MCP Config Goes
MCP servers are configured in `.qwen/settings.json` under `mcpServers`. This is
the **only** location that works for E2E testing.
Common mistakes that waste time:
- `.mcp.json` — Claude Code convention, not Qwen Code
- `settings.local.json` — the JSON schema validation rejects `mcpServers` here
- `--mcp-config` CLI flag — does not exist
## Setup
The CLI needs a git repo to load project settings. Create a temp directory:
```bash
mkdir -p /tmp/test-dir && cd /tmp/test-dir && git init -q
mkdir -p .qwen
cat > .qwen/settings.json << 'EOF'
{
"mcpServers": {
"my-server": {
"command": "node",
"args": ["/tmp/my-mcp-server.js"],
"trust": true
}
}
}
EOF
```
Run from that directory:
```bash
cd /tmp/test-dir && <qwen> "prompt" \
--approval-mode yolo --output-format json
```
## Writing Test Servers
Use `scripts/mcp-test-server.js` as a template. It's a zero-dependency
JSON-RPC server over stdin/stdout — no npm install needed.
To create a server with custom tools, copy the template and edit the
`TOOL_DEFINITIONS` array and the `handleToolCall` function. Each tool definition
follows the MCP `inputSchema` format (standard JSON Schema).
### Sanity-checking the server
Test the server without the CLI by piping JSON-RPC directly:
```bash
node /tmp/my-mcp-server.js << 'EOF'
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}}
{"jsonrpc":"2.0","method":"notifications/initialized"}
{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}
EOF
```
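Each request line should come back as a single-line JSON-RPC response on stdout;
in particular, the `tools/list` reply should contain your `TOOL_DEFINITIONS`
entries. If nothing prints, the server is likely crashing on startup, so rerun it
with stderr visible.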
## Verifying the Server Loaded
Check the `type: "system"` init message in JSON output:
```json
"mcp_servers": [{"name": "my-server", "status": "connected"}]
```
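To script this check rather than eyeball it, assuming `mcp_servers` sits at the
top level of the init object as shown:
```bash
<qwen> "list your tools" --approval-mode yolo --output-format json 2>/dev/null |
  jq 'select(.type == "system") | .mcp_servers'
```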
If `mcp_servers` is empty:
- You're not running from the directory containing `.qwen/settings.json`
- The directory is not a git repo (`git init` missing)
- The server command/path is wrong (check stderr with `2>&1`)

@@ -0,0 +1,114 @@
#!/usr/bin/env node
/**
* Zero-dependency MCP test server template.
 * Speaks JSON-RPC over stdin/stdout; no npm install needed.
*
* Usage:
* 1. Edit TOOL_DEFINITIONS to define your tools
* 2. Edit handleToolCall() to implement tool behavior
* 3. Configure in .qwen/settings.json and run via the CLI
*
* Sanity check without the CLI:
* printf '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}}\n' | node mcp-test-server.js
*/
const readline = require('readline');
const rl = readline.createInterface({ input: process.stdin, terminal: false });
// ---------------------------------------------------------------------------
// Configure your tools here
// ---------------------------------------------------------------------------
const SERVER_NAME = 'test-server';
const SERVER_VERSION = '1.0.0';
const TOOL_DEFINITIONS = [
{
name: 'echo',
description: 'Echoes back the provided arguments as JSON.',
inputSchema: {
type: 'object',
properties: {
message: { type: 'string', description: 'Message to echo' },
},
required: ['message'],
},
},
// Add more tools here
];
function handleToolCall(name, args) {
switch (name) {
case 'echo':
return `Echo: ${JSON.stringify(args)}`;
// Add more cases here
default:
return null; // returning null signals unknown tool
}
}
// ---------------------------------------------------------------------------
// MCP protocol handling — no need to edit below this line
// ---------------------------------------------------------------------------
function send(msg) {
process.stdout.write(JSON.stringify(msg) + '\n');
}
rl.on('line', (line) => {
let req;
try {
req = JSON.parse(line.trim());
} catch {
return;
}
if (req.method === 'initialize') {
send({
jsonrpc: '2.0',
id: req.id,
result: {
protocolVersion: '2024-11-05',
capabilities: { tools: {} },
serverInfo: { name: SERVER_NAME, version: SERVER_VERSION },
},
});
} else if (req.method === 'notifications/initialized') {
// no response needed
} else if (req.method === 'tools/list') {
send({
jsonrpc: '2.0',
id: req.id,
result: { tools: TOOL_DEFINITIONS },
});
} else if (req.method === 'tools/call') {
const toolName = req.params?.name;
const args = req.params?.arguments || {};
const result = handleToolCall(toolName, args);
if (result === null) {
send({
jsonrpc: '2.0',
id: req.id,
result: {
content: [{ type: 'text', text: `Unknown tool: ${toolName}` }],
isError: true,
},
});
} else {
send({
jsonrpc: '2.0',
id: req.id,
result: {
content: [{ type: 'text', text: String(result) }],
},
});
}
} else if (req.id) {
send({
jsonrpc: '2.0',
id: req.id,
error: { code: -32601, message: 'Method not found' },
});
}
});

@@ -0,0 +1,166 @@
---
name: structured-debugging
description:
Hypothesis-driven debugging methodology for hard bugs. Use this skill whenever
you're investigating non-trivial bugs, unexpected behavior, flaky tests, or
tracing issues through complex systems. Activate proactively when debugging
requires more than a quick glance — especially when the first attempt at a fix
didn't work, when behavior seems "impossible", or when you're tempted to blame
an external system (model, API, library) without evidence.
---
# Structured Debugging
When debugging hard issues, the natural instinct is to form a theory and immediately
apply a fix. This fails more often than it works. The fix addresses the wrong cause,
adds complexity, creates false confidence, and obscures the real issue. Worse, after
several failed attempts you lose track of what's been tried and start guessing randomly.
This methodology replaces guessing with a disciplined cycle that converges on the
root cause. Each iteration narrows the search space. It's slower per attempt but
dramatically faster overall because you stop wasting runs on wrong theories.
## The Cycle
### 1. Hypothesize
Before touching code, write down what you think is happening and why. Be specific
about the expected state at each step in the execution path.
Bad: "Something is wrong with the wait loop."
Good: "The leader hangs because `hasActiveTeammates()` returns true after all agents
have reported completed, likely because terminal status isn't being set on the agent
object after the backend process exits."
Create a side note file for the investigation:
```
~/.qwen/investigations/<project>-<issue>.md
```
Write your hypothesis there. This file persists across conversation turns and even
across sessions — it's your investigation journal.
### 2. Design Instrumentation
Add targeted debug logs or assertions at the exact decision points that would
confirm or reject your hypothesis. Think about what data you need to see.
Don't scatter `console.log` everywhere. Identify the 2-3 places where your
hypothesis makes a testable prediction, and instrument those.
Ask yourself: "If my hypothesis is correct, what will I see at point X?
If it's wrong, what will I see instead?"
### 3. Verify Data Collection
Before running, confirm that your instrumentation output will actually be captured
and accessible.
Common traps:
- stderr discarded by `2>/dev/null` in the test command
- Process killed before flush (logs lost)
- Logging to a file in a directory that doesn't exist
- Output piped through something that truncates it
- Looking at log files from a _previous_ run, not the current one
A test run that produces no data is wasted.
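A capture-safe harness, sketched under the assumption that your instrumentation
writes to stderr (command and paths hypothetical):
```bash
# One fresh log per run, so you never read a previous run's file by mistake
mkdir -p /tmp/debug-logs
LOG="/tmp/debug-logs/run-$(date +%s).log"

# Keep stderr instead of discarding it
your-test-command 2>"$LOG"

# Confirm this run actually produced data before analyzing anything
test -s "$LOG" || echo "WARNING: no instrumentation output captured"
```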
### 4. Run and Observe
Execute the test. Read the actual output — every line of it. Don't assume what it says.
When the data contradicts your hypothesis, believe the data. Don't rationalize it
away. The whole point of this step is to let reality override your theory.
### 5. Document Findings
Update the side note with:
- What the data showed (quote specific log lines)
- What was confirmed vs. disproved
- Updated hypothesis for the next iteration
This is critical for not losing context across attempts. Hard bugs typically take
3-5 rounds. Without notes, you'll forget what you ruled out and waste runs
re-checking things.
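An illustrative journal entry, reusing the hypothetical wait-loop bug from step 1:
```
## Round 2
Hypothesis: terminal status isn't being set on the agent object after the
backend process exits.
Data: captured log shows `status=running` for agent-3 after its process had
exited. Confirms the status field goes stale; the wait-loop logic itself is
not yet implicated.
Ruled out: polling interval (the loop fires every 500ms as expected).
Next: instrument the exit handler where terminal status should be written.
```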
### 6. Iterate
Update the hypothesis based on the new evidence. Go back to step 2. Each round
should narrow the search space.
If you're not making progress after 3 rounds, step back and question your
assumptions. The bug might be in a layer you haven't considered.
## Failure Modes to Avoid
These are the specific traps this methodology is designed to prevent. When you
notice yourself drifting toward any of them, stop and return to the cycle.
### Jumping to fixes without evidence
The most common failure. You have a plausible theory, so you "fix" it and run again.
If the theory was wrong, you've added complexity, wasted a test run, and possibly
introduced a new bug. The side note should always show "hypothesis verified by
[specific data]" before any fix is applied.
### Blaming external systems
"The model is hallucinating." "The API is flaky." "The library has a bug." These
conclusions feel satisfying because they put the problem outside your control. They're
also usually wrong.
Before blaming an external system, inspect what it actually received. A model that
appears to hallucinate may be responding rationally to stale data you didn't know
was there. An API that appears flaky may be receiving malformed requests. Look at
the inputs, not just the outputs.
### Inspecting code paths but not data
You instrument the code and prove it executes correctly — the right functions are
called, in the right order, with no errors. But the bug persists. Why?
Because the code can work perfectly while processing garbage input. A function that
correctly reads an inbox, correctly delivers messages, and correctly formats output
is still broken if the inbox contains stale messages from a previous run.
Always inspect the _content_ flowing through the code, not just whether the code
runs. Check payloads, message contents, file data, and database state.
### Losing context across attempts
After several debugging rounds, you start forgetting what you already tried and
what you ruled out. You re-check things, go in circles, or abandon a promising
line of investigation because you lost track of where it was heading.
This is why the side note file exists. Update it after every run. When you start
a new round, re-read it first.
## Persistent State: A Special Category
Features that persist data across runs — caches, session recordings, message queues,
temp files, database rows — are a frequent source of "impossible" bugs. The current
run's behavior is contaminated by leftover state from previous runs.
When behavior seems irrational, always check:
- Is there persistent state that carries across runs?
- Was it cleared before this run?
- Is the system responding to stale data rather than current data?
This is easy to miss because the code is correct — it's the data that's wrong.
## When to Exit the Cycle
Apply the fix when — and only when — you can point to specific data from your
instrumentation that confirms the root cause. Write in the side note:
```
Root cause: [specific mechanism]
Evidence: [specific log lines / data that confirm it]
Fix: [what you're changing and why it addresses the root cause]
```
Then apply the fix, remove instrumentation, and verify with a clean run.

AGENTS.md

@@ -1,297 +1,92 @@

The rewrite drops the old tutorial-style sections (project overview, key features, technology stack, project structure, package details, setup, running, debugging, documentation, contributing guidelines, command reference, and session commands) in favor of the simplified file below:

# AGENTS.md
This file provides guidance to Qwen Code when working with code in this repository.
## Common Commands
### Building
```bash
npm install       # Install all dependencies
npm run build     # Build all packages (TypeScript compilation + asset copying)
npm run build:all # Build everything including sandbox container
npm run bundle    # Bundle dist/ into a single dist/cli.js via esbuild (requires build first)
```
`npm run build` compiles TS into each package's `dist/`. `npm run bundle` takes that output and produces a single `dist/cli.js` via esbuild. Bundle requires build to have run first.
### Unit Testing
Tests must be run from within the specific package directory, not the project root.
**Run individual test files** (always preferred):
```bash
cd packages/core && npx vitest run src/path/to/file.test.ts
cd packages/cli && npx vitest run src/path/to/file.test.ts
```
**Update snapshots:**
```bash
cd packages/cli && npx vitest run src/path/to/file.test.ts --update
```
**Avoid:**
- `npm run test -- --filter=...` — does NOT filter; runs the entire suite
- `npx vitest` from the project root — fails due to package-specific vitest configs
- Running the whole test suite unless necessary (e.g., final PR verification)
**Test gotchas:**
- In CLI tests, use `vi.hoisted()` for mocks consumed by `vi.mock()` — the mock factory runs at module load time, before test execution.
### Integration Testing
Build the bundle first: `npm run build && npm run bundle`
Run from the project root using the dedicated npm scripts:
```bash
npm run test:integration:cli:sandbox:none
npm run test:integration:interactive:sandbox:none
```
Or combined in one command:
```bash
cd integration-tests && cross-env QWEN_SANDBOX=false npx vitest run cli interactive
```
**Gotcha:** In interactive tests, always call `session.idle()` between sends — ANSI output streams asynchronously.
### Linting & Formatting
```bash
npm run lint      # ESLint check
npm run lint:fix  # Auto-fix lint issues
npm run format    # Prettier formatting
npm run typecheck # TypeScript type checking
npm run preflight # Full check: clean → install → format → lint → build → typecheck → test
```
## Code Conventions
- **Module system**: ESM throughout (`"type": "module"` in all packages)
- **TypeScript**: Strict mode with `noImplicitAny`, `strictNullChecks`, `noUnusedLocals`, `verbatimModuleSyntax`
- **Formatting**: Prettier — single quotes, semicolons, trailing commas, 2-space indent, 80-char width
- **Linting**: No `any` types, consistent type imports, no relative imports between packages
- **Tests**: Collocated with source (`file.test.ts` next to `file.ts`), vitest framework
- **Commits**: Conventional Commits (e.g., `feat(cli): Add --json flag`)
- **Node.js**: Development requires `~20.19.0`; production requires `>=20`
## GitHub Operations
Use the `gh` CLI for all GitHub-related operations — issues, pull requests, comments, CI checks, releases, and API calls. Prefer `gh issue view`, `gh pr view`, `gh pr checks`, `gh run view`, `gh api`, etc. over web fetches or manual REST calls.
## Testing, Debugging, and Bug Fixes
- **Bug reproduction & verification**: spawn the `test-engineer` agent. It reads code and docs to understand the bug, then reproduces it via E2E testing (or a test-script fallback). It also handles post-fix verification. It cannot edit source code — only observe and report.
- **Hard bugs**: use the `structured-debugging` skill when debugging requires more than a quick glance — especially when the first attempt at a fix didn't work or the behavior seems impossible.
- **E2E testing**: the `e2e-testing` skill covers headless mode, interactive (tmux) mode, MCP server testing, and API traffic inspection. The `test-engineer` agent invokes this skill internally — you typically don't need to use it directly.

@@ -28,6 +28,7 @@ export default tseslint.config(
       'dist/**',
       'docs-site/.next/**',
       'docs-site/out/**',
+      '.qwen/**',
     ],
   },
   eslint.configs.recommended,

@@ -0,0 +1,24 @@
import type { ScenarioConfig } from '../scenario-runner.js';
/**
* Streaming capture for /qc:bugfix command on GitHub issue #2833.
* This scenario runs a long-running bugfix workflow with screenshots every 30 seconds
* to capture the full evolution of the debugging process.
*/
export default {
name: 'streaming-bugfix-2833',
spawn: ['node', 'dist/cli.js', '--yolo'],
terminal: { title: 'qwen-code', cwd: '../../..' },
flow: [
{
type: '/qc:bugfix https://github.com/QwenLM/qwen-code/issues/2833',
// Bugfix workflow is long-running (20+ minutes), capture throughout
streaming: {
delayMs: 10000, // Wait 10s for initial prompt processing
intervalMs: 30000, // Capture every 30 seconds
count: 50, // Up to 25 minutes of capture (50 * 30s)
gif: true, // Generate animated GIF
},
},
],
} satisfies ScenarioConfig;

@@ -135,7 +135,7 @@
   "lint-staged": {
     "*.{js,jsx,ts,tsx}": [
       "prettier --write",
-      "eslint --fix --max-warnings 0"
+      "eslint --fix --max-warnings 0 --no-warn-ignored"
     ],
     "*.{json,md}": [
       "prettier --write"