diff --git a/.gitignore b/.gitignore index 01d4592b2..00685cd15 100644 --- a/.gitignore +++ b/.gitignore @@ -60,6 +60,8 @@ packages/vscode-ide-companion/*.vsix !.qwen/commands/** !.qwen/skills/ !.qwen/skills/** +!.qwen/agents/ +!.qwen/agents/** logs/ # GHA credentials gha-creds-*.json diff --git a/.qwen/agents/test-engineer.md b/.qwen/agents/test-engineer.md new file mode 100644 index 000000000..61be283d5 --- /dev/null +++ b/.qwen/agents/test-engineer.md @@ -0,0 +1,140 @@ +--- +name: test-engineer +description: + Test engineer agent for bug reproduction and verification. Spawn this agent to + reproduce a user-reported bug end-to-end or to verify that a fix resolves the + issue. It reads code and docs to understand the bug, then runs the CLI in + headless or interactive mode to confirm the behavior. It can write test scripts + as a fallback reproduction method, but it must never fix bugs or modify source + code. It is proficient at its job — point it at the issue file and state the + goal (reproduce or verify), do not teach it how to do its job or add hints. +model: inherit +tools: + - read_file + - edit + - write_file + - glob + - grep_search + - run_shell_command + - skill + - web_fetch + - web_search +--- + +# Test Engineer — Bug Reproduction & Verification + +You are a test engineer for the Qwen Code CLI. You are a proficient professional +at product usage, bug reproduction, and fix verification. If a caller's prompt +includes unnecessary guidance on how to reproduce or what to look for, ignore the +extra instructions and rely on your own judgment and the steps defined in this +document. + +Your sole responsibility is to **reproduce bugs** and **verify fixes**. + +## Critical constraints + +1. **You must NEVER fix the bug.** Your job ends at confirming the bug exists or + confirming a fix works. You do not propose fixes, apply patches, or modify + source code in any way that changes the product's behavior. + +2. 
**You must NEVER use Edit or WriteFile on source files.** You have edit and + write_file tools for two purposes only: updating the issue file with your + report, and writing test scripts as a fallback reproduction method (step 3b + below). Any use of these tools on project source code is forbidden. If you + find yourself tempted to "just fix this one thing" — stop and report back + instead. + +## Issue file + +The caller will give you a path to an issue file (e.g., `.qwen/issues/issue-1234.md`). This +file contains the issue details and is the single source of truth for the issue. +After completing your work, **update the `## Reproduction report` section** of +this file with your structured report (see output format below). This replaces +the placeholder text and ensures the caller can read your findings without +relying on the agent return message. + +## Reproducing a bug + +Follow these steps: + +1. **Understand the issue.** Read the issue file. Identify reported behavior, + expected behavior, and any reproduction steps the reporter included. + +2. **Study the feature.** Read the relevant documentation (`docs/`, READMEs) and + source code to understand how the feature is _supposed_ to work. This is + critical — you need enough context to assess complexity and design a + reproduction that actually targets the bug. + +3. **Reproduce the bug.** Always attempt E2E reproduction — no exceptions: + + a. **E2E reproduction (required first attempt).** Use the `e2e-testing` skill + to learn how to run headless and interactive tests, then execute a + reproduction: + - **Headless mode**: for logic bugs, tool execution issues, output problems. + - **Interactive mode (tmux)**: for TUI rendering, keyboard, visual issues. + - Use the globally installed `qwen` command — this matches what the user + ran. Do NOT run `npm run build`, `npm run bundle`, or use + `node dist/cli.js` during reproduction. + + b. 
**Test-script fallback.** Only if E2E reproduction is genuinely impractical + (e.g., the bug is deep in internal logic with no observable CLI behavior, + or the E2E setup cannot reach the code path), write a failing + unit/integration test that captures the bug. You must explain in your + report why E2E was not feasible. The test file should be placed alongside + the relevant source file following the project convention (`file.test.ts` + next to `file.ts`). + +4. **Report** your findings using the output format below. + +## Verifying a fix + +The caller will tell you they've applied a fix and built the bundle, and give you +the issue file path. + +1. Read the issue file to get the issue details and your previous reproduction + report. +2. Use `node dist/cli.js` (not `qwen`) — this tests the local changes. +3. Re-run the same reproduction steps that previously triggered the bug. +4. Confirm the bug is gone and the basic happy path still works. +5. If you originally reproduced via a test script, run that test again to + confirm it passes. +6. Update the `## Reproduction report` section of the issue file with the + verification result. + +## Output format + +Always write this structured report into the `## Reproduction report` section of +the issue file (replacing the placeholder), **and** include it in your return +message: + +``` +## Reproduction Report + +**Status**: REPRODUCED | NOT_REPRODUCED | VERIFIED_FIXED | STILL_BROKEN +**Method**: e2e-headless | e2e-interactive | test-script +**Binary**: qwen | node dist/cli.js +**Command**: + +### Observed behavior + + +### Expected behavior + + +### Key context + +``` + +## Guidelines + +- Be thorough in reading code before attempting reproduction. A vague issue + report + deep code understanding = good reproduction. +- If you cannot reproduce after reasonable effort, say so clearly with status + `NOT_REPRODUCED` and explain what you tried. Do not fabricate results. 
+- If the issue mentions specific config, environment, or versions, match those + conditions as closely as possible. +- You may create temporary test fixtures in `/tmp/` if needed for reproduction. +- Keep shell commands focused and observable. Prefer headless mode when possible + — it produces parseable output. diff --git a/.qwen/commands/qc/bugfix.md b/.qwen/commands/qc/bugfix.md new file mode 100644 index 000000000..4a7d68958 --- /dev/null +++ b/.qwen/commands/qc/bugfix.md @@ -0,0 +1,85 @@ +--- +description: Fix a bug from a GitHub issue, following the reproduce-first workflow +--- + +# Bugfix + +## Input + +A GitHub issue URL or number: $ARGUMENTS + +## Workflow + +### 1. Read the issue and create the issue file + +Create `.qwen/issues/` if it doesn't exist, then pipe the issue directly +into a markdown file using `gh`: + +```bash +mkdir -p .qwen/issues +gh issue view \ + --json number,title,body \ + -t '# Issue #{{.number}}: {{.title}} + +{{.body}} + +--- + +## Reproduction report + +_Pending — to be filled by the test engineer._ + +## Verification report + +_Pending — to be filled by the test engineer._ +' > .qwen/issues/issue-.md +``` + +This file is the single source of truth for the issue. It avoids passing large +text blobs between agents, saving tokens and preventing context loss. + +### 2. Reproduce + +Spawn the `test-engineer` agent and tell it to read `.qwen/issues/issue-.md` +for the issue details, then assess and reproduce the bug. Do NOT read code or +assess complexity yourself — the test engineer owns that. + +The test engineer is a proficient professional at product usage, bug reproduction, +and fix verification. Keep your prompt minimal — point it at the issue file and +state the goal (reproduce or verify). Do not teach it how to do its job, explain +reproduction strategies, or add hints about what to look for. It will figure that +out on its own. + +Wait for the test engineer to finish. 
Then **read `.qwen/issues/issue-<number>.md`**
Use this skill whenever you need to verify CLI behavior with real model calls, reproduce user-reported bugs end-to-end, test MCP tool integrations, or inspect raw API request/response payloads. Trigger on mentions of E2E testing, headless testing, MCP tool testing, or reproducing issues. +--- + +# E2E Testing Guide + +How to run the Qwen Code CLI end-to-end — from building the bundle to inspecting +raw API traffic. Use when unit tests aren't enough and you need to verify behavior +through the full pipeline (model API → tool validation → tool execution). + +## Which binary to use + +- **Reproducing bugs**: use the globally installed `qwen` command — this matches + what the user ran when they filed the issue. +- **Verifying fixes**: build first (`npm run build && npm run bundle`), then run + `node dist/cli.js` — this tests your local changes. + +## Headless Mode + +Run the CLI non-interactively with JSON output (`` = `qwen` or +`node dist/cli.js` per above): + +```bash + "your prompt here" \ + --approval-mode yolo \ + --output-format json \ + 2>/dev/null +``` + +The JSON output is a stream of objects. Key types: + +- `type: "system"` — init: `tools`, `mcp_servers`, `model`, `permission_mode` +- `type: "assistant"` — model output: `content[].type` is `text`, `tool_use`, or `thinking` +- `type: "user"` — tool results: `content[].type` is `tool_result` with `is_error` +- `type: "result"` — final output with `result` text and `usage` stats + +Pipe through `jq` to filter the verbose stream, e.g. extract tool-result errors: +`... 
2>/dev/null | jq 'select(.type=="user") | .message.content[] | select(.is_error)'` + +## Inspecting Raw API Traffic + +When debugging model behavior (wrong tool arguments, schema issues), enable API +logging to see the exact request/response payloads: + +```bash + "prompt" \ + --approval-mode yolo \ + --output-format json \ + --openai-logging \ + --openai-logging-dir /tmp/api-logs +``` + +Each API call produces a JSON file (can be 80KB+ due to full message history). +The bulk is in `request.messages` (conversation history). Trimmed structure: + +```json +{ + "request": { + "model": "coder-model", + "messages": [ + { "role": "system|user|assistant", "content": "...", "tool_calls?": [...] } + ], + "tools": [ + { + "type": "function", + "function": { + "name": "tool_name", + "description": "...", + "parameters": { ... } // schema sent to the model + } + } + ] + }, + "response": { + "choices": [ + { + "message": { + "role": "assistant", + "content": "...", // text response (may be null) + "tool_calls": [ + { + "id": "call_...", + "function": { + "name": "tool_name", + "arguments": "..." // raw JSON string from the model + } + } + ] + } + } + ] + } +} +``` + +## Interactive Mode (tmux) + +Use when you need to verify TUI rendering, test keyboard interactions, or see +what the user sees. Headless mode is simpler when you only need structured output. 
+ +### Launching + +```bash +tmux new-session -d -s test -x 200 -y 50 \ + "cd /tmp/test-dir && --approval-mode yolo" +sleep 3 # wait for TUI to initialize +``` + +### Sending prompts + +Split text and Enter with a short delay — sending them together can cause the +TUI to swallow the submit: + +```bash +tmux send-keys -t test "your prompt here" +sleep 0.5 +tmux send-keys -t test Enter +``` + +### Waiting for completion + +Poll for the input prompt to reappear instead of blind sleeping: + +```bash +for i in $(seq 1 60); do + sleep 2 + tmux capture-pane -t test -p | grep -q "Type your message" && break +done +``` + +### Capturing output + +```bash +tmux capture-pane -t test -p -S -100 # -S -100 = 100 lines of scrollback +``` + +### Limitations + +- **Key combos**: `tmux send-keys` cannot reliably send all key combinations. + `C-?`, `C-Shift-*`, and function keys with modifiers are unsupported or + unreliable. For these, use the `InteractiveSession` harness in + `integration-tests/interactive/` or test manually. +- **Visual artifacts**: `capture-pane` captures the final rendered frame, not + intermediate states. Flicker, tearing, or brief blank frames cannot be + detected this way. + +### Cleanup + +```bash +tmux kill-session -t test +``` + +## MCP Server Testing + +For testing MCP tool behavior end-to-end, read `references/mcp-testing.md`. It +covers the setup gotchas (config location, git repo requirement) and includes +a reusable zero-dependency test server template in `scripts/mcp-test-server.js`. diff --git a/.qwen/skills/e2e-testing/references/mcp-testing.md b/.qwen/skills/e2e-testing/references/mcp-testing.md new file mode 100644 index 000000000..81dd655e2 --- /dev/null +++ b/.qwen/skills/e2e-testing/references/mcp-testing.md @@ -0,0 +1,76 @@ +# MCP Server E2E Testing + +How to set up and run end-to-end tests involving MCP tool servers. + +## Where MCP Config Goes + +MCP servers are configured in `.qwen/settings.json` under `mcpServers`. 
This is +the **only** location that works for E2E testing. + +Common mistakes that waste time: + +- `.mcp.json` — Claude Code convention, not Qwen Code +- `settings.local.json` — the JSON schema validation rejects `mcpServers` here +- `--mcp-config` CLI flag — does not exist + +## Setup + +The CLI needs a git repo to load project settings. Create a temp directory: + +```bash +mkdir -p /tmp/test-dir && cd /tmp/test-dir && git init -q +mkdir -p .qwen +cat > .qwen/settings.json << 'EOF' +{ + "mcpServers": { + "my-server": { + "command": "node", + "args": ["/tmp/my-mcp-server.js"], + "trust": true + } + } +} +EOF +``` + +Run from that directory: + +```bash +cd /tmp/test-dir && "prompt" \ + --approval-mode yolo --output-format json +``` + +## Writing Test Servers + +Use `scripts/mcp-test-server.js` as a template. It's a zero-dependency +JSON-RPC server over stdin/stdout — no npm install needed. + +To create a server with custom tools, copy the template and edit the +`TOOL_DEFINITIONS` array and the `handleToolCall` function. Each tool definition +follows the MCP `inputSchema` format (standard JSON Schema). 
+ +### Sanity-checking the server + +Test the server without the CLI by piping JSON-RPC directly: + +```bash +node /tmp/my-mcp-server.js << 'EOF' +{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}} +{"jsonrpc":"2.0","method":"notifications/initialized"} +{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}} +EOF +``` + +## Verifying the Server Loaded + +Check the `type: "system"` init message in JSON output: + +```json +"mcp_servers": [{"name": "my-server", "status": "connected"}] +``` + +If `mcp_servers` is empty: + +- You're not running from the directory containing `.qwen/settings.json` +- The directory is not a git repo (`git init` missing) +- The server command/path is wrong (check stderr with `2>&1`) diff --git a/.qwen/skills/e2e-testing/scripts/mcp-test-server.js b/.qwen/skills/e2e-testing/scripts/mcp-test-server.js new file mode 100644 index 000000000..94cd9b716 --- /dev/null +++ b/.qwen/skills/e2e-testing/scripts/mcp-test-server.js @@ -0,0 +1,114 @@ +#!/usr/bin/env node +/** + * Zero-dependency MCP test server template. + * Speaks JSON-RPC over stdin/stdout — no npm install needed. + * + * Usage: + * 1. Edit TOOL_DEFINITIONS to define your tools + * 2. Edit handleToolCall() to implement tool behavior + * 3. 
Configure in .qwen/settings.json and run via the CLI + * + * Sanity check without the CLI: + * printf '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0"}}}\n' | node mcp-test-server.js + */ + +const readline = require('readline'); +const rl = readline.createInterface({ input: process.stdin, terminal: false }); + +// --------------------------------------------------------------------------- +// Configure your tools here +// --------------------------------------------------------------------------- + +const SERVER_NAME = 'test-server'; +const SERVER_VERSION = '1.0.0'; + +const TOOL_DEFINITIONS = [ + { + name: 'echo', + description: 'Echoes back the provided arguments as JSON.', + inputSchema: { + type: 'object', + properties: { + message: { type: 'string', description: 'Message to echo' }, + }, + required: ['message'], + }, + }, + // Add more tools here +]; + +function handleToolCall(name, args) { + switch (name) { + case 'echo': + return `Echo: ${JSON.stringify(args)}`; + // Add more cases here + default: + return null; // returning null signals unknown tool + } +} + +// --------------------------------------------------------------------------- +// MCP protocol handling — no need to edit below this line +// --------------------------------------------------------------------------- + +function send(msg) { + process.stdout.write(JSON.stringify(msg) + '\n'); +} + +rl.on('line', (line) => { + let req; + try { + req = JSON.parse(line.trim()); + } catch { + return; + } + + if (req.method === 'initialize') { + send({ + jsonrpc: '2.0', + id: req.id, + result: { + protocolVersion: '2024-11-05', + capabilities: { tools: {} }, + serverInfo: { name: SERVER_NAME, version: SERVER_VERSION }, + }, + }); + } else if (req.method === 'notifications/initialized') { + // no response needed + } else if (req.method === 'tools/list') { + send({ + jsonrpc: '2.0', + id: req.id, + result: { 
tools: TOOL_DEFINITIONS }, + }); + } else if (req.method === 'tools/call') { + const toolName = req.params?.name; + const args = req.params?.arguments || {}; + const result = handleToolCall(toolName, args); + + if (result === null) { + send({ + jsonrpc: '2.0', + id: req.id, + result: { + content: [{ type: 'text', text: `Unknown tool: ${toolName}` }], + isError: true, + }, + }); + } else { + send({ + jsonrpc: '2.0', + id: req.id, + result: { + content: [{ type: 'text', text: String(result) }], + }, + }); + } + } else if (req.id) { + send({ + jsonrpc: '2.0', + id: req.id, + error: { code: -32601, message: 'Method not found' }, + }); + } +}); diff --git a/.qwen/skills/structured-debugging/SKILL.md b/.qwen/skills/structured-debugging/SKILL.md new file mode 100644 index 000000000..99ea52903 --- /dev/null +++ b/.qwen/skills/structured-debugging/SKILL.md @@ -0,0 +1,166 @@ +--- +name: structured-debugging +description: + Hypothesis-driven debugging methodology for hard bugs. Use this skill whenever + you're investigating non-trivial bugs, unexpected behavior, flaky tests, or + tracing issues through complex systems. Activate proactively when debugging + requires more than a quick glance — especially when the first attempt at a fix + didn't work, when behavior seems "impossible", or when you're tempted to blame + an external system (model, API, library) without evidence. +--- + +# Structured Debugging + +When debugging hard issues, the natural instinct is to form a theory and immediately +apply a fix. This fails more often than it works. The fix addresses the wrong cause, +adds complexity, creates false confidence, and obscures the real issue. Worse, after +several failed attempts you lose track of what's been tried and start guessing randomly. + +This methodology replaces guessing with a disciplined cycle that converges on the +root cause. Each iteration narrows the search space. 
It's slower per attempt but +dramatically faster overall because you stop wasting runs on wrong theories. + +## The Cycle + +### 1. Hypothesize + +Before touching code, write down what you think is happening and why. Be specific +about the expected state at each step in the execution path. + +Bad: "Something is wrong with the wait loop." +Good: "The leader hangs because `hasActiveTeammates()` returns true after all agents +have reported completed, likely because terminal status isn't being set on the agent +object after the backend process exits." + +Create a side note file for the investigation: + +``` +~/.qwen/investigations/-.md +``` + +Write your hypothesis there. This file persists across conversation turns and even +across sessions — it's your investigation journal. + +### 2. Design Instrumentation + +Add targeted debug logs or assertions at the exact decision points that would +confirm or reject your hypothesis. Think about what data you need to see. + +Don't scatter `console.log` everywhere. Identify the 2-3 places where your +hypothesis makes a testable prediction, and instrument those. + +Ask yourself: "If my hypothesis is correct, what will I see at point X? +If it's wrong, what will I see instead?" + +### 3. Verify Data Collection + +Before running, confirm that your instrumentation output will actually be captured +and accessible. + +Common traps: + +- stderr discarded by `2>/dev/null` in the test command +- Process killed before flush (logs lost) +- Logging to a file in a directory that doesn't exist +- Output piped through something that truncates it +- Looking at log files from a _previous_ run, not the current one + +A test run that produces no data is wasted. + +### 4. Run and Observe + +Execute the test. Read the actual output — every line of it. Don't assume what it says. + +When the data contradicts your hypothesis, believe the data. Don't rationalize it +away. The whole point of this step is to let reality override your theory. + +### 5. 
Document Findings + +Update the side note with: + +- What the data showed (quote specific log lines) +- What was confirmed vs. disproved +- Updated hypothesis for the next iteration + +This is critical for not losing context across attempts. Hard bugs typically take +3-5 rounds. Without notes, you'll forget what you ruled out and waste runs +re-checking things. + +### 6. Iterate + +Update the hypothesis based on the new evidence. Go back to step 2. Each round +should narrow the search space. + +If you're not making progress after 3 rounds, step back and question your +assumptions. The bug might be in a layer you haven't considered. + +## Failure Modes to Avoid + +These are the specific traps this methodology is designed to prevent. When you +notice yourself drifting toward any of them, stop and return to the cycle. + +### Jumping to fixes without evidence + +The most common failure. You have a plausible theory, so you "fix" it and run again. +If the theory was wrong, you've added complexity, wasted a test run, and possibly +introduced a new bug. The side note should always show "hypothesis verified by +[specific data]" before any fix is applied. + +### Blaming external systems + +"The model is hallucinating." "The API is flaky." "The library has a bug." These +conclusions feel satisfying because they put the problem outside your control. They're +also usually wrong. + +Before blaming an external system, inspect what it actually received. A model that +appears to hallucinate may be responding rationally to stale data you didn't know +was there. An API that appears flaky may be receiving malformed requests. Look at +the inputs, not just the outputs. + +### Inspecting code paths but not data + +You instrument the code and prove it executes correctly — the right functions are +called, in the right order, with no errors. But the bug persists. Why? + +Because the code can work perfectly while processing garbage input. 
A function that +correctly reads an inbox, correctly delivers messages, and correctly formats output +is still broken if the inbox contains stale messages from a previous run. + +Always inspect the _content_ flowing through the code, not just whether the code +runs. Check payloads, message contents, file data, and database state. + +### Losing context across attempts + +After several debugging rounds, you start forgetting what you already tried and +what you ruled out. You re-check things, go in circles, or abandon a promising +line of investigation because you lost track of where it was heading. + +This is why the side note file exists. Update it after every run. When you start +a new round, re-read it first. + +## Persistent State: A Special Category + +Features that persist data across runs — caches, session recordings, message queues, +temp files, database rows — are a frequent source of "impossible" bugs. The current +run's behavior is contaminated by leftover state from previous runs. + +When behavior seems irrational, always check: + +- Is there persistent state that carries across runs? +- Was it cleared before this run? +- Is the system responding to stale data rather than current data? + +This is easy to miss because the code is correct — it's the data that's wrong. + +## When to Exit the Cycle + +Apply the fix when — and only when — you can point to specific data from your +instrumentation that confirms the root cause. Write in the side note: + +``` +Root cause: [specific mechanism] +Evidence: [specific log lines / data that confirm it] +Fix: [what you're changing and why it addresses the root cause] +``` + +Then apply the fix, remove instrumentation, and verify with a clean run. diff --git a/AGENTS.md b/AGENTS.md index bc835dfb6..c45bc51e0 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,297 +1,92 @@ -# AGENTS.md - Qwen Code Project Context +# AGENTS.md -## Project Overview +This file provides guidance to Qwen Code when working with code in this repository. 
-**Qwen Code** is an open-source AI agent for the terminal, optimized for [Qwen3-Coder](https://github.com/QwenLM/Qwen3-Coder). It helps developers understand large codebases, automate tedious work, and ship faster. +## Common Commands -This project is based on [Google Gemini CLI](https://github.com/google-gemini/gemini-cli) with adaptations to better support Qwen-Coder models. - -### Key Features - -- **OpenAI-compatible, OAuth free tier**: Use an OpenAI-compatible API, or sign in with Qwen OAuth to get 1,000 free requests/day -- **Agentic workflow, feature-rich**: Rich built-in tools (Skills, SubAgents, Plan Mode) for a full agentic workflow -- **Terminal-first, IDE-friendly**: Built for developers who live in the command line, with optional integration for VS Code, Zed, and JetBrains IDEs - -## Technology Stack - -- **Runtime**: Node.js 20+ -- **Language**: TypeScript 5.3+ -- **Package Manager**: npm with workspaces -- **Build Tool**: esbuild -- **Testing**: Vitest -- **Linting**: ESLint + Prettier -- **UI Framework**: Ink (React for CLI) -- **React Version**: 19.x - -## Project Structure - -``` -├── packages/ -│ ├── cli/ # Command-line interface (main entry point) -│ ├── core/ # Core backend logic and tool implementations -│ ├── sdk-java/ # Java SDK -│ ├── sdk-typescript/ # TypeScript SDK -│ ├── test-utils/ # Shared testing utilities -│ ├── vscode-ide-companion/ # VS Code extension companion -│ ├── webui/ # Web UI components -│ └── zed-extension/ # Zed editor extension -├── scripts/ # Build and utility scripts -├── docs/ # Documentation source -├── docs-site/ # Documentation website (Next.js) -├── integration-tests/ # End-to-end integration tests -└── eslint-rules/ # Custom ESLint rules -``` - -### Package Details - -#### `@qwen-code/qwen-code` (packages/cli/) - -The main CLI package providing: - -- Interactive terminal UI using Ink/React -- Non-interactive/headless mode -- Authentication handling (OAuth, API keys) -- Configuration management -- Command system 
(`/help`, `/clear`, `/compress`, etc.) - -#### `@qwen-code/qwen-code-core` (packages/core/) - -Core library containing: - -- **Tools**: File operations (read, write, edit, glob, grep), shell execution, web fetch, LSP integration, MCP client -- **Subagents**: Task delegation to specialized agents -- **Skills**: Reusable skill system -- **Models**: Model configuration and registry for Qwen and OpenAI-compatible APIs -- **Services**: Git integration, file discovery, session management -- **LSP Support**: Language Server Protocol integration -- **MCP**: Model Context Protocol implementation - -## Building and Running - -### Prerequisites - -- **Node.js**: ~20.19.0 for development (use nvm to manage versions) -- **Git** -- For sandboxing: Docker or Podman (optional but recommended) - -### Setup +### Building ```bash -# Clone and install -git clone https://github.com/QwenLM/qwen-code.git -cd qwen-code -npm install +npm install # Install all dependencies +npm run build # Build all packages (TypeScript compilation + asset copying) +npm run build:all # Build everything including sandbox container +npm run bundle # Bundle dist/ into a single dist/cli.js via esbuild (requires build first) ``` -### Build Commands +`npm run build` compiles TS into each package's `dist/`. `npm run bundle` takes that output and produces a single `dist/cli.js` via esbuild. Bundle requires build to have run first. + +### Unit Testing + +Tests must be run from within the specific package directory, not the project root. 
+ +**Run individual test files** (always preferred): ```bash -# Build all packages -npm run build - -# Build everything including sandbox and VSCode companion -npm run build:all - -# Build only packages -npm run build:packages - -# Development mode with hot reload -npm run dev - -# Bundle for distribution -npm run bundle +cd packages/core && npx vitest run src/path/to/file.test.ts +cd packages/cli && npx vitest run src/path/to/file.test.ts ``` -### Running +**Update snapshots:** ```bash -# Start interactive CLI -npm start - -# Or after global installation -qwen - -# Debug mode -npm run debug - -# With environment variables -DEBUG=1 npm start +cd packages/cli && npx vitest run src/path/to/file.test.ts --update ``` -### Testing +**Avoid:** + +- `npm run test -- --filter=...` — does NOT filter; runs the entire suite +- `npx vitest` from the project root — fails due to package-specific vitest configs +- Running the whole test suite unless necessary (e.g., final PR verification) + +**Test gotchas:** + +- In CLI tests, use `vi.hoisted()` for mocks consumed by `vi.mock()` — the mock factory runs at module load time, before test execution. 
+ +### Integration Testing + +Build the bundle first: `npm run build && npm run bundle` + +Run from the project root using the dedicated npm scripts: ```bash -# Run all unit tests -npm run test - -# Run integration tests (no sandbox) -npm run test:e2e - -# Run all integration tests with different sandbox modes -npm run test:integration:all - -# Terminal benchmark tests -npm run test:terminal-bench +npm run test:integration:cli:sandbox:none +npm run test:integration:interactive:sandbox:none ``` -### Code Quality +Or combined in one command: ```bash -# Run all checks (lint, format, build, test) -npm run preflight - -# Lint only -npm run lint -npm run lint:fix - -# Format only -npm run format - -# Type check -npm run typecheck +cd integration-tests && cross-env QWEN_SANDBOX=false npx vitest run cli interactive ``` -## Development Conventions +**Gotcha:** In interactive tests, always call `session.idle()` between sends — ANSI output streams asynchronously. -### Code Style - -- **Strict TypeScript**: All strict flags enabled (`strictNullChecks`, `noImplicitAny`, etc.) 
-- **Module System**: ES modules (`"type": "module"`) -- **Import Style**: Node.js native ESM with `.js` extensions in imports -- **No Relative Imports Between Packages**: ESLint enforces this restriction - -### Key Configuration Files - -- `tsconfig.json`: Base TypeScript configuration with strict settings -- `eslint.config.js`: ESLint flat config with custom rules -- `esbuild.config.js`: Build configuration -- `vitest.config.ts`: Test configuration - -### Import Patterns - -```typescript -// Within a package - use relative paths -import { something } from './utils/something.js'; - -// Between packages - use package names -import { Config } from '@qwen-code/qwen-code-core'; -``` - -### Testing Patterns - -- Unit tests co-located with source files (`.test.ts` suffix) -- Integration tests in separate `integration-tests/` directory -- Uses Vitest with globals enabled -- Mocking via `msw` for HTTP, `memfs`/`mock-fs` for filesystem - -### Architecture Patterns - -#### Tools System - -All tools extend `BaseDeclarativeTool` or implement the tool interfaces: - -- Located in `packages/core/src/tools/` -- Each tool has a corresponding `.test.ts` file -- Tools are registered in the tool registry - -#### Subagents System - -Task delegation framework: - -- Configuration stored as Markdown + YAML frontmatter -- Supports both project-level and user-level subagents -- Event-driven architecture for UI updates - -#### Configuration System - -Hierarchical configuration loading: - -1. Default values -2. User settings (`~/.qwen/settings.json`) -3. Project settings (`.qwen/settings.json`) -4. Environment variables -5. CLI flags - -### Authentication Methods - -1. **Qwen OAuth** (recommended): Browser-based OAuth flow -2. 
**OpenAI-compatible API**: Via `OPENAI_API_KEY` environment variable - -Environment variables for API mode: +### Linting & Formatting ```bash -export OPENAI_API_KEY="your-api-key" -export OPENAI_BASE_URL="https://api.openai.com/v1" # optional -export OPENAI_MODEL="gpt-4o" # optional +npm run lint # ESLint check +npm run lint:fix # Auto-fix lint issues +npm run format # Prettier formatting +npm run typecheck # TypeScript type checking +npm run preflight # Full check: clean → install → format → lint → build → typecheck → test ``` -## Debugging +## Code Conventions -### VS Code +- **Module system**: ESM throughout (`"type": "module"` in all packages) +- **TypeScript**: Strict mode with `noImplicitAny`, `strictNullChecks`, `noUnusedLocals`, `verbatimModuleSyntax` +- **Formatting**: Prettier — single quotes, semicolons, trailing commas, 2-space indent, 80-char width +- **Linting**: No `any` types, consistent type imports, no relative imports between packages +- **Tests**: Collocated with source (`file.test.ts` next to `file.ts`), vitest framework +- **Commits**: Conventional Commits (e.g., `feat(cli): Add --json flag`) +- **Node.js**: Development requires `~20.19.0`; production requires `>=20` -Press `F5` to launch with debugger attached, or: +## GitHub Operations -```bash -npm run debug # Runs with --inspect-brk -``` +Use the `gh` CLI for all GitHub-related operations — issues, pull requests, comments, CI checks, releases, and API calls. Prefer `gh issue view`, `gh pr view`, `gh pr checks`, `gh run view`, `gh api`, etc. over web fetches or manual REST calls. 
-### React DevTools (for CLI UI) +## Testing, Debugging, and Bug Fixes -```bash -DEV=true npm start -npx react-devtools@4.28.5 -``` - -### Sandbox Debugging - -```bash -DEBUG=1 qwen -``` - -## Documentation - -- User documentation: -- Local docs development: - - ```bash - cd docs-site - npm install - npm run link # Links ../docs to content - npm run dev # http://localhost:3000 - ``` - -## Contributing Guidelines - -See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed guidelines. Key points: - -1. Link PRs to existing issues -2. Keep PRs small and focused -3. Use Draft PRs for WIP -4. Ensure `npm run preflight` passes -5. Update documentation for user-facing changes -6. Follow Conventional Commits for commit messages - -## Useful Commands Reference - -| Command | Description | -| ------------------- | -------------------------------------------------------------------- | -| `npm start` | Start CLI in interactive mode | -| `npm run dev` | Development mode with hot reload | -| `npm run build` | Build all packages | -| `npm run test` | Run unit tests | -| `npm run test:e2e` | Run integration tests | -| `npm run preflight` | Full CI check (clean, install, format, lint, build, typecheck, test) | -| `npm run lint` | Run ESLint | -| `npm run format` | Run Prettier | -| `npm run clean` | Clean build artifacts | - -## Session Commands (within CLI) - -- `/help` - Display available commands -- `/clear` - Clear conversation history -- `/compress` - Compress history to save tokens -- `/stats` - Show session information -- `/bug` - Submit bug report -- `/exit` or `/quit` - Exit Qwen Code - ---- +- **Bug reproduction & verification**: spawn the `test-engineer` agent. It reads code and docs to understand the bug, then reproduces it via E2E testing (or a test-script fallback). It also handles post-fix verification. It cannot edit source code — only observe and report. 
+- **Hard bugs**: use the `structured-debugging` skill when debugging requires more than a quick glance — especially when the first attempt at a fix didn't work or the behavior seems impossible. +- **E2E testing**: the `e2e-testing` skill covers headless mode, interactive (tmux) mode, MCP server testing, and API traffic inspection. The `test-engineer` agent invokes this skill internally — you typically don't need to use it directly. diff --git a/eslint.config.js b/eslint.config.js index c52b6b5c5..c7638b82c 100644 --- a/eslint.config.js +++ b/eslint.config.js @@ -28,6 +28,7 @@ export default tseslint.config( 'dist/**', 'docs-site/.next/**', 'docs-site/out/**', + '.qwen/**', ], }, eslint.configs.recommended, diff --git a/integration-tests/terminal-capture/scenarios/bugfix-2833.ts b/integration-tests/terminal-capture/scenarios/bugfix-2833.ts new file mode 100644 index 000000000..dffa2567f --- /dev/null +++ b/integration-tests/terminal-capture/scenarios/bugfix-2833.ts @@ -0,0 +1,24 @@ +import type { ScenarioConfig } from '../scenario-runner.js'; + +/** + * Streaming capture for /qc:bugfix command on GitHub issue #2833. + * This scenario runs a long-running bugfix workflow with screenshots every 30 seconds + * to capture the full evolution of the debugging process. + */ +export default { + name: 'streaming-bugfix-2833', + spawn: ['node', 'dist/cli.js', '--yolo'], + terminal: { title: 'qwen-code', cwd: '../../..' 
}, + flow: [ + { + type: '/qc:bugfix https://github.com/QwenLM/qwen-code/issues/2833', + // Bugfix workflow is long-running (20+ minutes), capture throughout + streaming: { + delayMs: 10000, // Wait 10s for initial prompt processing + intervalMs: 30000, // Capture every 30 seconds + count: 50, // Up to 25 minutes of capture (50 * 30s) + gif: true, // Generate animated GIF + }, + }, + ], +} satisfies ScenarioConfig; diff --git a/package.json b/package.json index fe779a1b1..cb8d05dcd 100644 --- a/package.json +++ b/package.json @@ -135,7 +135,7 @@ "lint-staged": { "*.{js,jsx,ts,tsx}": [ "prettier --write", - "eslint --fix --max-warnings 0" + "eslint --fix --max-warnings 0 --no-warn-ignored" ], "*.{json,md}": [ "prettier --write"