---
name: structured-debugging
description: Hypothesis-driven debugging methodology for hard bugs. Use this
  skill whenever you're investigating non-trivial bugs, unexpected behavior,
  flaky tests, or tracing issues through complex systems. Activate proactively
  when debugging requires more than a quick glance — especially when the first
  attempt at a fix didn't work, when behavior seems "impossible", or when you're
  tempted to blame an external system (model, API, library) without evidence.
---

# Structured Debugging

When debugging hard issues, the natural instinct is to form a theory and
immediately apply a fix. This fails more often than it works. The fix addresses
the wrong cause, adds complexity, creates false confidence, and obscures the
real issue. Worse, after several failed attempts you lose track of what's been
tried and start guessing randomly.

This methodology replaces guessing with a disciplined cycle that converges on
the root cause. Each iteration narrows the search space. It's slower per attempt
but dramatically faster overall because you stop wasting runs on wrong theories.

## The Cycle

### 1. Hypothesize

Before touching code, write down what you think is happening and why. Be
specific about the expected state at each step in the execution path.

Bad: "Something is wrong with the wait loop." Good: "The leader hangs because
`hasActiveTeammates()` returns true after all agents have reported completed,
likely because terminal status isn't being set on the agent object after the
backend process exits."

For bugs you expect to take more than one round, create a side note file for the
investigation in whichever location the project uses for such notes.

Write your hypothesis there. This file persists across conversation turns and
even across sessions — it's your investigation journal.
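
A first entry might look like this (a sketch based on the example above; the
format is free-form):

```
Hypothesis 1: the leader hangs because `hasActiveTeammates()` stays true
after all agents report completed.
Prediction: agent objects still show status "running" after the backend
process exits.
Ruled out so far: (nothing yet)
```
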
### 2. Design Instrumentation

Add targeted debug logs or assertions at the exact decision points that would
confirm or reject your hypothesis. Think about what data you need to see.

Don't scatter `console.log` everywhere. Identify the 2-3 places where your
hypothesis makes a testable prediction, and instrument those.

Prefer logging _values_ (return codes, payload contents, stream types, message
bodies, env state) over _presence checks_ ("was this function called?", "was
this branch taken?"). Code-path traces tell you what ran; data traces tell you
what it ran on. Most non-trivial bugs are correct code processing wrong data.
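
For the wait-loop hypothesis from step 1, instrumentation might look like this
(a TypeScript sketch; `Agent` and `hasActiveTeammates()` are the hypothetical
names from that example, not a real API):

```typescript
interface Agent {
  id: string;
  status: 'running' | 'completed' | 'failed';
}

function hasActiveTeammates(agents: Agent[]): boolean {
  // Log the values the decision is based on, not just that we reached here.
  // If the hypothesis holds, this dump will show status "running" for agents
  // whose backend process has already exited.
  console.error(
    '[dbg] teammate statuses:',
    JSON.stringify(agents.map((a) => ({ id: a.id, status: a.status }))),
  );
  return agents.some((a) => a.status === 'running');
}
```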

Ask yourself: "If my hypothesis is correct, what will I see at point X? If it's
wrong, what will I see instead?"

### 3. Verify Data Collection

Before running, confirm that your instrumentation output will actually be
captured and accessible.

Common traps:

- stderr discarded by `2>/dev/null` in the test command
- Process killed before flush (logs lost)
- Logging to a file in a directory that doesn't exist
- Output piped through something that truncates it
- Looking at log files from a _previous_ run, not the current one

A test run that produces no data is wasted.
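
A sketch that guards against the first three traps (the log directory and file
name are illustrative, not a project convention):

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Hypothetical log location; point this wherever the project keeps debug output.
const LOG_FILE = path.join('/tmp/debug-session', 'run.log');

// Create the directory up front so logging never silently fails.
fs.mkdirSync(path.dirname(LOG_FILE), { recursive: true });

function dbg(label: string, value: unknown): void {
  // Synchronous append survives a process killed before async writes flush,
  // and a dedicated file can't be discarded by `2>/dev/null` on stderr.
  fs.appendFileSync(LOG_FILE, `[dbg] ${label}: ${JSON.stringify(value)}\n`);
}

// Stamp each run so you never read a previous run's log by mistake.
dbg('run-started', { pid: process.pid, time: new Date().toISOString() });
```
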
### 4. Run and Observe

Execute the test. Read the actual output — every line of it. Don't assume what
it says.

When the data contradicts your hypothesis, believe the data. Don't rationalize
it away. The whole point of this step is to let reality override your theory.

### 5. Document Findings

Update the side note with:

- What the data showed (quote specific log lines)
- What was confirmed vs. disproved
- Updated hypothesis for the next iteration

This is critical for not losing context across attempts. Hard bugs typically
take 3-5 rounds. Without notes, you'll forget what you ruled out and waste runs
re-checking things.
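
One possible shape for a round's entry, continuing the hypothetical example
(not a required format):

```
Round 2
Data: log line `[dbg] teammate statuses: [{"id":"b3","status":"running"}]`
      printed after agent b3's backend had exited.
Confirmed: terminal status is not set when the backend process exits.
Disproved: the wait loop itself reads statuses correctly.
Next hypothesis: the exit handler never fires for backgrounded processes.
```
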
### 6. Iterate

Update the hypothesis based on the new evidence. Go back to step 2. Each round
should narrow the search space.

If you're not making progress after 3 rounds, step back and question your
assumptions. The bug might be in a layer you haven't considered.

## Failure Modes to Avoid

These are the specific traps this methodology is designed to prevent. When you
notice yourself drifting toward any of them, stop and return to the cycle.

### Jumping to fixes without evidence

The most common failure. You have a plausible theory, so you "fix" it and run
again. If the theory was wrong, you've added complexity, wasted a test run, and
possibly introduced a new bug. The side note should always show "hypothesis
verified by [specific data]" before any fix is applied.

### Blaming external systems

"The model is hallucinating." "The API is flaky." "The library has a bug." These
conclusions feel satisfying because they put the problem outside your control.
They're also usually wrong.

Before blaming an external system, inspect what it actually received. A model
that appears to hallucinate may be responding rationally to stale data you
didn't know was there. An API that appears flaky may be receiving malformed
requests. Look at the inputs, not just the outputs.
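
One way to do that is to log the exact payload at the boundary, just before it
leaves your code (a sketch; the wrapper function is illustrative, not any
specific client's API):

```typescript
// Dump the request body verbatim before blaming the API. Stale or malformed
// fields show up here, where they are still your code's responsibility.
async function postJson(url: string, body: unknown): Promise<unknown> {
  const payload = JSON.stringify(body);
  console.error('[dbg] outgoing request:', url, payload);
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: payload,
  });
  console.error('[dbg] response status:', res.status);
  return res.json();
}
```
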
### Inspecting code paths but not data

You instrument the code and prove it executes correctly — the right functions
are called, in the right order, with no errors. But the bug persists. Why?

Because the code can work perfectly while processing garbage input. A function
that correctly reads an inbox, correctly delivers messages, and correctly
formats output is still broken if the inbox contains stale messages from a
previous run.

Always inspect the _content_ flowing through the code, not just whether the code
runs. Check payloads, message contents, file data, and database state.
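
The difference in practice, as a TypeScript sketch around the hypothetical
inbox example above:

```typescript
interface Message {
  id: string;
  sentAt: string; // ISO timestamp
  body: string;
}

function deliver(inbox: Message[]): void {
  // Presence check: proves the code ran, but says nothing about the bug.
  console.error('[dbg] deliver() called with', inbox.length, 'messages');

  // Data check: dump the actual contents. A stale message from a previous
  // run stands out immediately because its timestamp predates this run.
  console.error('[dbg] inbox contents:', JSON.stringify(inbox, null, 2));

  for (const msg of inbox) {
    console.log(`delivering ${msg.id}: ${msg.body}`);
  }
}
```
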
### Reframing the user's report instead of investigating it

When the user reports a symptom your own run doesn't reproduce, the
contradiction _is_ the evidence — the two environments differ in some way you
haven't identified yet. The wrong move is to reframe their report ("they must be
on a stale SHA", "they must be confused about what they saw", "must be a flake")
so that your run becomes the ground truth. Once you do that, every later piece
of evidence gets bent to defend the reframing, and the actual bug stays hidden.

The right move: catalogue what differs between their environment and yours (TTY
vs pipe, terminal emulator, shell, locale, env vars, prior state, build
artifacts) before forming any hypothesis. For ambiguous symptoms ("no output",
"it's slow", "it's wrong") ask one disambiguating question first — e.g., "does
it hang or exit cleanly?" That prunes the hypothesis space before any test run.
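
Several of those differences can be captured mechanically. A sketch of an
environment snapshot to run in both environments and diff (the chosen fields
are illustrative):

```typescript
// Print the environment facts that most often explain "works for me" bugs.
const snapshot = {
  stdoutIsTTY: process.stdout.isTTY ?? false, // TTY vs pipe changes buffering
  platform: process.platform,
  nodeVersion: process.version,
  shell: process.env.SHELL ?? '(unset)',
  term: process.env.TERM ?? '(unset)',
  locale: process.env.LANG ?? '(unset)',
  cwd: process.cwd(),
};
console.error('[dbg] env snapshot:', JSON.stringify(snapshot, null, 2));
```
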
### Losing context across attempts

After several debugging rounds, you start forgetting what you already tried and
what you ruled out. You re-check things, go in circles, or abandon a promising
line of investigation because you lost track of where it was heading.

This is why the side note file exists. Update it after every run. When you start
a new round, re-read it first.

## Persistent State: A Special Category

Features that persist data across runs — caches, session recordings, message
queues, temp files, and database rows — often cause "impossible" bugs. The
current run's behavior is contaminated by leftover state from previous runs.

When behavior seems irrational, always check:

- Is there persistent state that carries across runs?
- Was it cleared before this run?
- Is the system responding to stale data rather than current data?

This is easy to miss because the code is correct — it's the data that's wrong.
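
A pre-run guard in the same spirit (a sketch; the state directory is a
hypothetical stand-in for whatever your feature persists):

```typescript
import * as fs from 'node:fs';

// Hypothetical location of persisted state (cache, inbox, session files).
const STATE_DIR = '/tmp/agent-state';

// Before the run: record what is already there. If this is non-empty and you
// expected a fresh start, you may have found your "impossible" bug.
if (fs.existsSync(STATE_DIR)) {
  console.error('[dbg] pre-run leftover state:', fs.readdirSync(STATE_DIR));
  // Clear it so this run observes only its own data.
  fs.rmSync(STATE_DIR, { recursive: true, force: true });
}
```
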
## When to Exit the Cycle

Apply the fix only when you can point to specific data from your
instrumentation that confirms the root cause. Write in the side note:

```
Root cause: [specific mechanism]
Evidence: [specific log lines / data that confirm it]
Fix: [what you're changing and why it addresses the root cause]
```

Then apply the fix, remove instrumentation, and verify with a clean run.

## Worked examples

- `examples/headless-bg-agent-empty-stdout.md` — pipe-captured runs all passed;
  the user's TTY printed nothing. The contradiction _was_ the bug. Illustrates
  _reproduction contradiction is data_ and _instrument data, not code paths_.