replaceAll('YOUR_MODEL_ID', modelId) was also rewriting the instruction
line "The variable YOUR_MODEL_ID is declared..." into nonsensical text.
Removed the literal reference from the instruction line.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Some models (e.g., glm-5.1) ignore the {{model}} template in code
blocks and write their own footer without the model name. Fix:
1. BundledSkillLoader prepends YOUR_MODEL_ID="glm-5.1" as a top-level
declaration at the start of the skill body — impossible to miss
2. SKILL.md references YOUR_MODEL_ID in footer instructions
3. Empty model → empty string (no "unknown" — prefer omission)
4. YOUR_MODEL_ID declaration only prepended when model is available
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
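A sketch of the loader behavior points 1 and 4 describe (function name and skill-body shape assumed):

```typescript
// Prepend a top-level YOUR_MODEL_ID declaration, but only when a model
// id is actually available; an empty model yields the body unchanged
// (omission, never "unknown").
function prependModelDeclaration(skillBody: string, modelId: string): string {
  if (!modelId) return skillBody;
  return `YOUR_MODEL_ID="${modelId}"\n\n${skillBody}`;
}
```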
* fix(core): prevent followup suggestion input/output from appearing in tool call UI
The follow-up suggestion generation was leaking into the conversation UI
through three channels:
1. The forked query included tools in its generation config, allowing the
model to produce function calls during suggestion generation. Fixed by
setting `tools: []` in runForkedQuery's per-request config (kept in
createForkedChat for speculation which needs tools).
2. logApiResponse and logApiError recorded suggestion API events to the
chatRecordingService, causing them to appear in session JSONL files
and the WebUI. Fixed by adding isInternalPromptId() guard that skips
chatRecordingService for 'prompt_suggestion' and 'forked_query' IDs.
uiTelemetryService.addEvent() is preserved so /stats still tracks
suggestion token usage.
3. LoggingContentGenerator logged suggestion requests/responses to the
OpenAI logger and telemetry pipeline. Fixed by skipping logApiRequest,
buildOpenAIRequestForLogging, and logOpenAIInteraction for internal
prompt IDs. _logApiResponse is preserved (for /stats) but its
chatRecordingService path is filtered by fix #2.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
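The per-request override in fix #1 can be sketched as follows (interface simplified and hypothetical):

```typescript
interface GenerateConfig {
  tools?: unknown[];
  temperature?: number;
}

// runForkedQuery keeps the full base config (speculation callers in
// createForkedChat still need tools) but strips tools for the forked
// suggestion request itself, so the model cannot emit function calls.
function buildForkedRequestConfig(base: GenerateConfig): GenerateConfig {
  return { ...base, tools: [] };
}
```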
* refactor: deduplicate isInternalPromptId into shared export from loggers.ts
Address review feedback: extract isInternalPromptId() to a single
exported function in telemetry/loggers.ts and import it in
LoggingContentGenerator, eliminating the duplicate private method.
Also update loggingContentGenerator.test.ts mock to use importOriginal
so the real isInternalPromptId is available during tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: extract isInternalPromptId to shared utils, add tests
Address maintainer review feedback:
1. Move isInternalPromptId() to packages/core/src/utils/internalPromptIds.ts
using a ReadonlySet for the ID registry. Adding new internal prompt IDs
only requires changing one file. loggers.ts re-exports for compatibility,
loggingContentGenerator.ts imports directly from utils.
2. Extract `tools: []` magic value to a frozen NO_TOOLS constant in
forkedQuery.ts.
3. Add unit tests for isInternalPromptId: prompt_suggestion → true,
forked_query → true, user_query → false, empty string → false.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
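The shared registry can be sketched like this; the two ids are the ones this commit registers (a later commit adds 'speculation'), and the function is exported in the real module:

```typescript
// Single source of truth: adding a new internal prompt ID only
// requires extending this set.
const INTERNAL_PROMPT_IDS: ReadonlySet<string> = new Set([
  'prompt_suggestion',
  'forked_query',
]);

function isInternalPromptId(promptId: string): boolean {
  return INTERNAL_PROMPT_IDS.has(promptId);
}
```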
* fix: address Copilot review — docs, stream optimization, tests
1. Update forkedQuery.ts module docs to reflect that runForkedQuery
overrides tools: [] at the per-request level while createForkedChat
retains the full generationConfig for speculation callers.
2. Propagate isInternal into loggingStreamWrapper to skip response
collection and consolidation for internal prompts, avoiding
unnecessary CPU/memory overhead.
3. Add logApiResponse chatRecordingService filter tests: verify
prompt_suggestion/forked_query skip recording while normal IDs
still record.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: deep-freeze NO_TOOLS, add internal prompt guard tests
Address Copilot review round 3:
1. Deep-freeze NO_TOOLS.tools array to prevent shared mutable state
across forked query calls.
2. Add LoggingContentGenerator tests verifying that internal prompt IDs
(prompt_suggestion, forked_query) skip logApiRequest and OpenAI
interaction logging while preserving logApiResponse.
3. Add logApiError chatRecordingService filter tests matching the
existing logApiResponse coverage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
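A minimal runtime-immutability sketch of point 1 (the exact TypeScript typing went through further revisions in later commits):

```typescript
// Freeze both the wrapper object and the nested array: a shallow
// Object.freeze on the wrapper alone would still let one forked query
// push into the shared tools array and leak state into the next call.
const NO_TOOLS = Object.freeze({
  tools: Object.freeze([] as unknown[]),
});
```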
* docs: reconcile createForkedChat JSDoc with module header
Clarify that createForkedChat retains the full generationConfig
(including tools) for speculation callers, while runForkedQuery
strips tools at the per-request level via NO_TOOLS.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: build errors and Copilot round 4 feedback
1. Fix NO_TOOLS type: Object.freeze produces readonly array incompatible
with ToolUnion[]. Use Readonly<Pick<>> instead; spread in requestConfig
already creates a fresh mutable copy per call.
2. Fix test missing required 'model' field in ContentGeneratorConfig.
3. Track firstResponseId/firstModelVersion in loggingStreamWrapper so
_logApiResponse/_logApiError have accurate values even when full
response collection is skipped for internal prompts.
4. Strengthen OpenAI logger test assertion: assert OpenAILogger was
constructed (not guarded by if), then assert logInteraction was
not called.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove dead Object.keys check, add streaming internal prompt test
1. Simplify runForkedQuery: requestConfig always has tools:[] from
NO_TOOLS spread, so the Object.keys().length > 0 ternary is dead
code. Pass requestConfig directly.
2. Add generateContentStream test for internal prompt IDs to match
the existing generateContent coverage, ensuring the streaming
wrapper also skips logApiRequest and OpenAI interaction logging.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: prevent Enter accept from re-inserting suggestion into buffer
When accepting a followup suggestion via Enter, accept() queued
buffer.insert(suggestion) in a microtask that executed after
handleSubmitAndClear had already cleared the buffer, leaving the
suggestion text stuck in the input.
Add skipOnAccept option to accept() so the Enter path bypasses the
onAccept callback. Also add runForkedQuery unit tests verifying
tools: [] is passed in per-request config.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
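A sketch of the race and the fix (API shapes hypothetical): accept() deferred the buffer write into a microtask, so on the Enter path it landed after the synchronous clear; skipOnAccept lets that path bypass the deferred insert entirely.

```typescript
type Buffer = { text: string };

function accept(
  buffer: Buffer,
  suggestion: string,
  opts: { skipOnAccept?: boolean } = {},
): void {
  // Enter path: the submit handler owns the text; do not queue an
  // insert that would run after the buffer has been cleared.
  if (opts.skipOnAccept) return;
  // Tab path: place the suggestion into the input on the next tick.
  queueMicrotask(() => {
    buffer.text = suggestion;
  });
}
```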
* fix(core): add speculation to internal IDs, fix logToolCall filtering, improve suggestion prompt
- Add 'speculation' to INTERNAL_PROMPT_IDS so speculation API traffic
and tool calls are hidden from chat recordings and tool call UI
- Add isInternalPromptId check to logToolCall() for consistency with
logApiError/logApiResponse
- Improve SUGGESTION_PROMPT: prioritize assistant's last few lines and
extract actionable text from explicit tips (e.g. "Tip: type X")
- Fix garbled unicode in prompt text
- Update design docs and user docs to reflect changes
- Add test coverage for all new behavior
* fix(core): deep-freeze NO_TOOLS, add speculation to loggingContentGenerator tests
- Object.freeze NO_TOOLS and its tools array to prevent runtime mutation
- Add 'speculation' to loggingContentGenerator internal prompt ID tests
for consistency with loggers.test.ts and internalPromptIds.ts
* fix(core): fix NO_TOOLS Object.freeze type error
Use `as const` with type assertion to satisfy TypeScript while keeping
runtime immutability via Object.freeze.
* refactor(core): remove unused isInternalPromptId re-export from loggers.ts
All consumers import directly from utils/internalPromptIds.js.
The re-export was dead code with no importers.
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ui): prevent useEffect from running every render in InputPrompt
getDirectories() returns a new array reference on each call, so the
useEffect dependency comparison detects a change on every render. Move the call
inside the effect body and use stable dependencies [config, dirs] so
the effect only re-runs when they actually change.
* fix(ui): use serialized dep key for directory change detection
Move from [config, dirs] deps (both stable refs that miss external
changes) to a dirKey string (join of current directories). This
preserves the perf fix (no new array ref in deps) while still
detecting directory additions/removals from /add-dir etc.
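The dep-key technique used in this fix boils down to comparing content instead of references (join character assumed):

```typescript
// Two renders produce two distinct arrays with identical contents.
const prevDirs = ['/repo', '/docs'];
const nextDirs = ['/repo', '/docs'];

// Reference equality (what a dependency array uses) fails every render:
const sameRef = prevDirs === nextDirs; // false

// A serialized key is a stable primitive, so it only changes when the
// directory list actually changes (e.g. via /add-dir):
const sameKey = prevDirs.join('|') === nextDirs.join('|'); // true
```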
* refactor(cli): remove unused dirs state from InputPrompt
The dirs parameter passed to useCommandCompletion() was never read
inside that hook, making the dirs state and sync effect in InputPrompt
dead code. Remove the parameter, the state, the effect, and all test
call-site args.
The previous version bump commit (bb4376c) only updated the root
package.json but did not run `npm run release:version` to propagate
the version and sandboxImageUri to all workspace packages.
This caused Docker sandbox integration tests to fail in CI with
"manifest unknown" because build_sandbox.js built image 0.14.1
(from packages/cli/package.json) while sandboxConfig.ts expected
image 0.14.2 (from root package.json).
Fixes: https://github.com/QwenLM/qwen-code/actions/runs/24135197272/job/70424966323
- Add qwen3.6-plus to both China and Global/Intl regions as the first
model in the Coding Plan template (1M context, enable_thinking)
- Set qwen3.6-plus as the new default MAINLINE_CODER_MODEL
- Add image+video input modality support for qwen3.6-plus
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* feat(core): adaptive output token escalation (8K default + 64K retry)
99% of model responses are under 5K tokens, but we previously reserved
32K for every request, over-reserving GPU slot capacity by ~4x.
Now the default output limit is 8K. When a response hits this cap
(stop_reason=max_tokens), it automatically retries once at 64K — only
the ~1% of requests that actually need more tokens pay the cost.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
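The escalation loop can be sketched as follows (limits and stop reason from the commit; the generate callback is a stand-in for the real client):

```typescript
interface ModelResponse {
  text: string;
  stopReason: 'stop' | 'max_tokens';
}
type Generate = (maxOutputTokens: number) => ModelResponse;

const DEFAULT_MAX_OUTPUT_TOKENS = 8_192;
const ESCALATED_MAX_OUTPUT_TOKENS = 65_536;

function generateWithEscalation(generate: Generate): ModelResponse {
  const first = generate(DEFAULT_MAX_OUTPUT_TOKENS);
  if (first.stopReason !== 'max_tokens') return first;
  // Only the ~1% of requests that hit the 8K cap pay for a single
  // retry at the 64K ceiling.
  return generate(ESCALATED_MAX_OUTPUT_TOKENS);
}
```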
* docs: add design doc and user doc for adaptive output token escalation
- Add design doc covering problem, architecture, token limit
determination, escalation mechanism, and design decisions
- Document QWEN_CODE_MAX_OUTPUT_TOKENS env var in settings.md
- Add max_tokens adaptive behavior explanation in model config section
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
"Yes, and manually approve edits" was restoring getPrePlanMode() which
could be YOLO, contradicting the label. Now hardcodes DEFAULT to match
the "manually approve" semantics.
Align with observed provider prompt-cache TTL (~5 min). Add
`context.gapThresholdMinutes` setting so users can tune the threshold
for providers with different cache TTLs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
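The gap check itself is simple; a sketch under assumed surrounding shape (setting name from the commit, function name hypothetical):

```typescript
// Returns true when the idle gap since the last request exceeds the
// threshold, i.e. the provider's prompt cache has likely expired.
function exceedsGapThreshold(
  lastRequestMs: number,
  nowMs: number,
  gapThresholdMinutes: number = 5, // aligned with ~5 min provider cache TTL
): boolean {
  return nowMs - lastRequestMs > gapThresholdMinutes * 60_000;
}
```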
Rename the subcommand to accurately reflect its behavior (it exits plan
mode and restores the previous approval mode; it does not trigger execution).
Update source, tests, i18n keys (6 locales), and docs.
LLM was putting all findings in the review body (creating a summary
comment) instead of the comments array (inline comments). Added
prominent warning: "Findings go in comments array, NOT in body."
Also: "Do NOT use COMMENT when there are Critical findings."
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the two-phase posting (individual gh api comments + separate
gh pr review verdict) with a single Create Review API call that bundles
inline comments + verdict together — same approach as Copilot Code Review.
Benefits:
- No summary comment needed (inline comments ARE the review)
- No "two-phase posting" complexity
- No "STOP for Comment verdict" rules
- No duplicate/orphaned reviews
- One API call instead of N+1
- Verdict (approve/request_changes/comment) correctly attached
Eliminates ~40 lines of complex posting rules, replacing them with
~30 lines of straightforward JSON construction.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
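The bundled payload targets GitHub's Create Review endpoint (POST /repos/{owner}/{repo}/pulls/{number}/reviews); a sketch with a hypothetical finding shape:

```typescript
interface Finding {
  path: string;
  line: number;
  body: string;
}

// One payload carries the inline comments and the verdict together, so
// there is no separate summary comment and no second posting phase.
function buildReviewPayload(
  findings: Finding[],
  verdict: 'APPROVE' | 'REQUEST_CHANGES' | 'COMMENT',
  summary: string,
) {
  return {
    body: summary,
    event: verdict,
    comments: findings.map(({ path, line, body }) => ({ path, line, body })),
  };
}
```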
Three issues found from real /review output on PR #2921:
1. Critical findings but verdict submitted as --comment instead of
--request-changes. Added explicit: "Do NOT use --comment when
verdict is Request changes — this loses the blocking status."
2. Nice to have findings appeared in PR summary. Added: "Do NOT
include Nice to have findings" to all summary rules.
3. Clarified that failed-inline summary should only contain
Critical/Suggestion, never Nice to have.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two issues found from real review (PR #2826):
1. Multiple /review runs on same PR create duplicate comments. Now
Step 9 checks for existing "via Qwen Code /review" comments
before posting and warns the user about potential duplicates.
2. Comments posted without line numbers appear as orphaned PR
comments. Now enforced: every inline comment MUST reference a
specific line in the diff. Findings that can't be mapped to
diff lines go in the summary instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaced 5 numbered rules + example with example-first format.
LLMs pattern-match from examples better than parsing rules.
Rules condensed to 2 sentences after the example.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Inline comments now use ```suggestion blocks when the fix is a direct
line replacement. PR authors can accept fixes with one click instead
of manually copying code. Falls back to regular code blocks when the
fix spans multiple locations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
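The formatting choice can be sketched as below (helper name hypothetical); the fence is built with repeat() so the sketch itself stays inside one markdown code block:

```typescript
const FENCE = '`'.repeat(3);

// Emit a one-click suggestion block for a direct line replacement,
// or fall back to a plain code block when the fix spans multiple
// locations and cannot be applied as a single suggestion.
function formatFix(fix: string, isDirectReplacement: boolean): string {
  const tag = isDirectReplacement ? 'suggestion' : '';
  return `${FENCE}${tag}\n${fix}\n${FENCE}`;
}
```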
Logic errors causing incorrect behavior (wrong return values, skipped
code paths) were being classified as Suggestion instead of Critical.
Added explicit examples: "if code does something wrong, it's Critical
— not Suggestion."
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three issues found in real review output:
1. Summary repeated findings already posted as inline comments
2. "Review Stats" (agent count, raw/confirmed) is internal noise
3. Summary was too verbose
Fix: partial-failure summary must contain ONLY the failed findings.
Distinguish terminal output (stats OK) from PR comments (no stats).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GitHub renders #1, #2 as links to issues/PRs with those numbers.
Review summaries using "#1 (logic error)" link to the wrong target.
Added guideline: use (1), [1], or descriptive references instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Review comments, findings, and summaries must use the same language
as the PR (title/description/code comments). English PR → English
review. Chinese PR → Chinese review. No language switching.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Most Qwen OAuth users don't have a fast model configured for this
feature, so it fires a wasted API request on every turn with no
visible benefit. Default to off; users can opt in via settings.
LLM was writing detailed analysis in the review summary body despite
"minimal body" instruction. Strengthened to "one-line body only, do
NOT include analysis/findings/explanations" with concrete examples.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
User doc and PR description now include the "PR review, zero findings
→ post comments → approve PR" row in the follow-up actions table.
Also fixed PR description: "Step 4" → "Step 9" for post comments.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When PR review finds no issues and --comment was not specified, suggest
"post comments" so the user can formally approve the PR on GitHub.
Without this, the LGTM only appears in terminal — no approval status
on the PR.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The iLink Bot API requires these headers for session authentication.
Without them, getupdates returns errcode -14 (session timeout).
The protocol version (2.0.0) is tracked independently from our channel
version, matching the current official plugin's API compatibility level.
Closes #2908
Same class of Windows CI timing flake — the backspace keypress
doesn't propagate through the paste/keypress pipeline fast enough
on slow runners, so replaceRangeByOffset is never called (0 calls).
Loading cached findings from a Markdown report is fragile (unstructured
prose, LLM might misparse). Instead, when --comment is specified on an
unchanged PR, simply run the full review. The user explicitly wants
comments posted — spending 7 LLM calls is acceptable.
Removed reportPath from cache schema (no longer needed).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When --comment is used on an unchanged PR, Step 9 needs prior findings
to post. Cache now stores reportPath pointing to the saved report from
Step 10, allowing findings to be loaded without re-running the review.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LLM was ignoring the {{model}} template and writing its own footer
("— Qwen Code /review" instead of "— glm-5.1 via Qwen Code /review").
Added explicit warning: footer must appear EXACTLY as shown, do NOT
shorten or rephrase.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test has a stale closure race condition: the 50ms wait between
pressing '2' and Enter may not be enough for React/Ink to re-render
and re-subscribe the useKeypress callback with the updated
selectedIndex, causing it to read the default value (0) instead of
the expected value (1) on slow CI runners (Windows + Node 20).