feat: add bugfix workflow, test-engineer agent, and debugging skills

- Add test-engineer agent for bug reproduction and verification
- Add /qc:bugfix command for structured bugfix workflow
- Add e2e-testing skill covering headless/interactive modes, MCP testing
- Add structured-debugging skill for hypothesis-driven debugging
- Simplify AGENTS.md to focus on essential commands and conventions
- Add terminal-capture scenario for bugfix workflow testing
- Add .qwen folder to ESLint ignore list

Known limitations: The /qc:bugfix workflow and e2e-testing skill are experimental
and may be unstable or consume significant tokens.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Parent: 3bce84d5da · Commit: dc833d9d94
11 changed files with 826 additions and 265 deletions
.qwen/skills/structured-debugging/SKILL.md (new file, 166 lines)

@@ -0,0 +1,166 @@
---
name: structured-debugging
description:
  Hypothesis-driven debugging methodology for hard bugs. Use this skill whenever
  you're investigating non-trivial bugs, unexpected behavior, flaky tests, or
  tracing issues through complex systems. Activate proactively when debugging
  requires more than a quick glance — especially when the first attempt at a fix
  didn't work, when behavior seems "impossible", or when you're tempted to blame
  an external system (model, API, library) without evidence.
---

# Structured Debugging

When debugging hard issues, the natural instinct is to form a theory and immediately
apply a fix. This fails more often than it works. The fix addresses the wrong cause,
adds complexity, creates false confidence, and obscures the real issue. Worse, after
several failed attempts you lose track of what's been tried and start guessing randomly.

This methodology replaces guessing with a disciplined cycle that converges on the
root cause. Each iteration narrows the search space. It's slower per attempt but
dramatically faster overall because you stop wasting runs on wrong theories.

## The Cycle

### 1. Hypothesize

Before touching code, write down what you think is happening and why. Be specific
about the expected state at each step in the execution path.

Bad: "Something is wrong with the wait loop."

Good: "The leader hangs because `hasActiveTeammates()` returns true after all agents
have reported completed, likely because terminal status isn't being set on the agent
object after the backend process exits."

Create a side note file for the investigation:

```
~/.qwen/investigations/<project>-<issue>.md
```

Write your hypothesis there. This file persists across conversation turns and even
across sessions — it's your investigation journal.
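
A first entry only needs a title, a status line, and the current hypothesis. As an
illustrative skeleton (the headings are a suggestion, not a required format),
continuing the wait-loop example above:

```
# Investigation: leader-hang
Status: round 1

## Hypothesis
Leader hangs because hasActiveTeammates() stays true after all agents report
completion; terminal status may never be set after the backend process exits.
```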

### 2. Design Instrumentation

Add targeted debug logs or assertions at the exact decision points that would
confirm or reject your hypothesis. Think about what data you need to see.

Don't scatter `console.log` everywhere. Identify the 2-3 places where your
hypothesis makes a testable prediction, and instrument those.

Ask yourself: "If my hypothesis is correct, what will I see at point X?
If it's wrong, what will I see instead?"
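
As a sketch of what this looks like in practice, here is the wait-loop hypothesis
instrumented at its single decision point. The `Agent` type, `TERMINAL_STATUSES`,
and the body of `hasActiveTeammates` are hypothetical stand-ins, not the project's
actual code:

```ts
// Hypothetical stand-ins for the real types in the wait-loop example.
interface Agent {
  id: string;
  status: string;
}
const TERMINAL_STATUSES = new Set(['completed', 'failed', 'cancelled']);

function hasActiveTeammates(agents: Agent[]): boolean {
  const active = agents.filter((a) => !TERMINAL_STATUSES.has(a.status));
  // Testable prediction: if the hypothesis is right, this logs a non-empty
  // `active` list even after every agent has reported completion.
  console.error('[debug] hasActiveTeammates', {
    activeCount: active.length,
    statuses: agents.map((a) => `${a.id}:${a.status}`),
  });
  return active.length > 0;
}
```

One log line at the decision point answers both questions: what you expect to see
if you're right, and what you'd see instead if you're wrong.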

### 3. Verify Data Collection

Before running, confirm that your instrumentation output will actually be captured
and accessible.

Common traps:

- stderr discarded by `2>/dev/null` in the test command
- Process killed before flush (logs lost)
- Logging to a file in a directory that doesn't exist
- Output piped through something that truncates it
- Looking at log files from a _previous_ run, not the current one

A test run that produces no data is wasted.
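
Several of these traps can be sidestepped at once by logging straight to a known
file with synchronous appends. A minimal sketch (the log path is illustrative):

```ts
import * as fs from 'node:fs';
import * as path from 'node:path';
import * as os from 'node:os';

// Illustrative location; keeping it next to the investigation journal works well.
const LOG_FILE = path.join(os.homedir(), '.qwen', 'investigations', 'debug.log');

function debugLog(message: string): void {
  // Create the directory first: logging into a missing directory is one of
  // the traps above.
  fs.mkdirSync(path.dirname(LOG_FILE), { recursive: true });
  // Synchronous append: survives the process being killed before flush, and
  // can't be discarded by `2>/dev/null` in the test command.
  fs.appendFileSync(LOG_FILE, `${new Date().toISOString()} ${message}\n`);
}
```

Timestamping every line also makes it obvious when you're reading output from a
previous run rather than the current one.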

### 4. Run and Observe

Execute the test. Read the actual output — every line of it. Don't assume what it says.

When the data contradicts your hypothesis, believe the data. Don't rationalize it
away. The whole point of this step is to let reality override your theory.

### 5. Document Findings

Update the side note with:

- What the data showed (quote specific log lines)
- What was confirmed vs. disproved
- Updated hypothesis for the next iteration

This is critical for not losing context across attempts. Hard bugs typically take
3-5 rounds. Without notes, you'll forget what you ruled out and waste runs
re-checking things.
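
An illustrative per-round entry, continuing the wait-loop example (the format is
a suggestion only):

```
## Round 2
Data: [debug] hasActiveTeammates { activeCount: 1 } logged after every agent
      reported completion
Confirmed: one agent is stuck in a non-terminal status
Disproved: the wait loop misreading statuses that were actually terminal
Next hypothesis: the exit handler never sets terminal status on the agent object
```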

### 6. Iterate

Update the hypothesis based on the new evidence. Go back to step 2. Each round
should narrow the search space.

If you're not making progress after 3 rounds, step back and question your
assumptions. The bug might be in a layer you haven't considered.

## Failure Modes to Avoid

These are the specific traps this methodology is designed to prevent. When you
notice yourself drifting toward any of them, stop and return to the cycle.

### Jumping to fixes without evidence

The most common failure. You have a plausible theory, so you "fix" it and run again.
If the theory was wrong, you've added complexity, wasted a test run, and possibly
introduced a new bug. The side note should always show "hypothesis verified by
[specific data]" before any fix is applied.

### Blaming external systems

"The model is hallucinating." "The API is flaky." "The library has a bug." These
conclusions feel satisfying because they put the problem outside your control. They're
also usually wrong.

Before blaming an external system, inspect what it actually received. A model that
appears to hallucinate may be responding rationally to stale data you didn't know
was there. An API that appears flaky may be receiving malformed requests. Look at
the inputs, not just the outputs.
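
A concrete way to apply this: capture the exact serialized request at the call
boundary before concluding the other side is broken. A sketch, with a hypothetical
endpoint and payload shape:

```ts
// Sketch: log the outbound request verbatim at the boundary. The endpoint
// and payload shape are hypothetical.
async function callModel(payload: unknown): Promise<unknown> {
  const body = JSON.stringify(payload);
  // Log exactly what the external system receives; a "hallucinating" model
  // may be answering stale or malformed input faithfully.
  console.error('[debug] outbound payload', body);
  const res = await fetch('https://api.example.com/v1/chat', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body,
  });
  console.error('[debug] response status', res.status);
  return res.json();
}
```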

### Inspecting code paths but not data

You instrument the code and prove it executes correctly — the right functions are
called, in the right order, with no errors. But the bug persists. Why?

Because the code can work perfectly while processing garbage input. A function that
correctly reads an inbox, correctly delivers messages, and correctly formats output
is still broken if the inbox contains stale messages from a previous run.

Always inspect the _content_ flowing through the code, not just whether the code
runs. Check payloads, message contents, file data, and database state.
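
For the inbox example above, that means logging what the messages are, not just
how many there are. A sketch with hypothetical types:

```ts
// Hypothetical message shape for the inbox example.
interface Message {
  id: string;
  sentAt: string; // ISO timestamp
  body: string;
}

function deliverInbox(messages: Message[]): void {
  // Log the content, not just the count: stale messages from a previous run
  // are only visible if you can see ids, timestamps, and a preview of each.
  console.error(
    '[debug] inbox contents',
    messages.map((m) => ({
      id: m.id,
      sentAt: m.sentAt,
      preview: m.body.slice(0, 80),
    })),
  );
  // ... actual delivery logic ...
}
```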

### Losing context across attempts

After several debugging rounds, you start forgetting what you already tried and
what you ruled out. You re-check things, go in circles, or abandon a promising
line of investigation because you lost track of where it was heading.

This is why the side note file exists. Update it after every run. When you start
a new round, re-read it first.

## Persistent State: A Special Category

Features that persist data across runs — caches, session recordings, message queues,
temp files, database rows — are a frequent source of "impossible" bugs. The current
run's behavior is contaminated by leftover state from previous runs.

When behavior seems irrational, always check:

- Is there persistent state that carries across runs?
- Was it cleared before this run?
- Is the system responding to stale data rather than current data?

This is easy to miss because the code is correct — it's the data that's wrong.
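
One cheap defense is to reset every persistent location to a known-clean state
before each test run. A sketch; the paths stand in for whatever your feature
actually persists:

```ts
import * as fs from 'node:fs';

// Hypothetical persistent locations; substitute whatever your feature writes.
const STATE_DIRS = ['/tmp/qwen-test/inbox', '/tmp/qwen-test/cache'];

for (const dir of STATE_DIRS) {
  // Remove leftovers from previous runs, then recreate empty.
  fs.rmSync(dir, { recursive: true, force: true });
  fs.mkdirSync(dir, { recursive: true });
}
```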

## When to Exit the Cycle

Apply the fix when — and only when — you can point to specific data from your
instrumentation that confirms the root cause. Write in the side note:

```
Root cause: [specific mechanism]
Evidence: [specific log lines / data that confirm it]
Fix: [what you're changing and why it addresses the root cause]
```

Then apply the fix, remove instrumentation, and verify with a clean run.