mirror of
https://github.com/QwenLM/qwen-code.git
synced 2026-05-15 09:41:20 +00:00
5743 commits
cc916b528e
docs(attribution): correct legacyTypes / EXCLUDED_DIRECTORY_SEGMENTS comments
- 9Ta_ (Copilot): the JSDoc on legacyTypes claimed JSON Schema's `type` keyword does not accept `'object'` — that's wrong; `'object'` IS a valid JSON Schema type. Reword to reflect the actual rationale: `'enum'` is not a valid JSON Schema `type` value at all (enum constraints use the `enum` keyword), and a bare `{type: 'object'}` would accept any object regardless of what the field's pre-expansion shape actually allowed. The narrowed `boolean | string | number | array` set is exactly what the one-liner generator can faithfully emit; richer legacy shapes belong in their own branch of convertSettingToJsonSchema.
- 9Tbs (Copilot): the comment in generatedFiles.ts referenced `EXCLUDED_DIRECTORIES`, but the constant is `EXCLUDED_DIRECTORY_SEGMENTS` (renamed during the segment-boundary refactor). Update the reference so a future maintainer scanning for the rule doesn't chase a non-existent identifier.
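The narrowing described above can be sketched as follows. This is a hypothetical illustration, not the repository's actual code: the `SettingDefinition` shape and the `anyOf` expansion strategy in `convertSettingToJsonSchema` are assumptions based on the commit message.

```typescript
// Hypothetical sketch of the narrowed legacyTypes contract: only
// JSON-Schema-primitive type names the one-liner generator can emit
// verbatim are allowed; 'enum' is not a JSON Schema type value, and a
// bare {type: 'object'} would be too permissive, so both are excluded
// by construction.
type LegacyJsonSchemaType = 'boolean' | 'string' | 'number' | 'array';

interface SettingDefinition {
  type: string;
  legacyTypes?: ReadonlyArray<LegacyJsonSchemaType>;
}

// One-liner expansion (assumed shape): each legacy type becomes a bare
// {type: ...} branch alongside the current shape, combined via anyOf.
function convertSettingToJsonSchema(def: SettingDefinition): object {
  const current = { type: def.type };
  if (!def.legacyTypes || def.legacyTypes.length === 0) return current;
  return { anyOf: [current, ...def.legacyTypes.map((t) => ({ type: t }))] };
}
```

With this typing, `legacyTypes: ['enum']` is a compile-time error rather than a silently invalid settings.schema.json.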
a66a21d81d
fix(attribution): GIT_DIR repo-shift bail, snapshot envelope validation, narrow legacyTypes
80ME (gpt-5.5 /review, [Critical]): tokeniseSegment unconditionally stripped every leading KEY=value token. `GIT_DIR=elsewhere/.git git commit ...` was therefore treated as an in-cwd commit, picked up the Co-authored-by trailer, and produced a per-file note that landed against our cwd's HEAD even though the actual commit went to a different repo. Define a GIT_ENV_SHIFTS_REPO set (GIT_DIR, GIT_WORK_TREE, GIT_COMMON_DIR, GIT_INDEX_FILE, GIT_NAMESPACE) and have tokeniseSegment refuse to parse any segment whose leading env block (including the env-wrapper's KEY=VALUE block) carries one of these. Identity / date variables (GIT_AUTHOR_*, GIT_COMMITTER_*) are deliberately NOT in the set — they tweak metadata but don't relocate the repo. Tests cover plain prefix, env-wrapped prefix, and a GIT_COMMITTER_DATE positive control that should still get the trailer.

8EeQ (Copilot): restoreFromSnapshot received `snapshot as AttributionSnapshot` from a structural cast off `unknown` (the resume path), so its TS-typed shape was only a hint. A corrupted JSONL line (non-object / array / wrong type discriminator / missing type) would skip past the version check straight into Object.entries(snapshot.fileStates) — and a non-object fileStates (an array, say) seeded fileAttributions with numeric-string keys. Add envelope-level shape gates (isPlainObject + type discriminator) and a fileStates plain-object check before iterating; both bail to a clean reset rather than poisoning the singleton. Tests added.

8Eej (Copilot): SettingDefinition.legacyTypes was typed as SettingsType[] which includes 'enum' and 'object' — JSON Schema's `type` keyword doesn't accept those values. Adding `legacyTypes: ['enum']` would silently produce an invalid settings.schema.json. Narrow the field's type to ReadonlyArray<'boolean' | 'string' | 'number' | 'array'> (the JSON-Schema-primitive subset). Future complex-shape legacy support should land its own branch in convertSettingToJsonSchema.
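The repo-shift bail in the 80ME fix can be sketched roughly as below. This is a minimal illustration of the described behaviour, not the actual tokeniseSegment implementation — its real signature and return convention are assumptions.

```typescript
// Sketch: refuse to attribute a shell segment whose leading KEY=value
// block relocates the git repo. Variables in this set change WHERE the
// commit lands; GIT_AUTHOR_* / GIT_COMMITTER_* only tweak metadata and
// are deliberately absent.
const GIT_ENV_SHIFTS_REPO = new Set([
  'GIT_DIR',
  'GIT_WORK_TREE',
  'GIT_COMMON_DIR',
  'GIT_INDEX_FILE',
  'GIT_NAMESPACE',
]);

// Returns null (bail: the commit goes to another repo) when a
// repo-shifting variable leads the segment; otherwise the tokens with
// the env prefix stripped.
function tokeniseSegment(tokens: string[]): string[] | null {
  let i = 0;
  while (i < tokens.length && /^[A-Za-z_][A-Za-z0-9_]*=/.test(tokens[i])) {
    const key = tokens[i].slice(0, tokens[i].indexOf('='));
    if (GIT_ENV_SHIFTS_REPO.has(key)) return null; // in-cwd assumption broken
    i++; // metadata-only variable: keep stripping
  }
  return tokens.slice(i);
}
```

The positive control from the commit's test plan maps directly onto this shape: a `GIT_COMMITTER_DATE=...` prefix is stripped and the segment still parses, while a `GIT_DIR=...` prefix bails.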
c0ed9092b0
fix(attribution): restore fire-and-forget appendRecord, route rollback via callback
6OcJ (Copilot): refactor in
8f9c7e405a
Merge remote-tracking branch 'origin/main' into feat/commit-attribution
# Conflicts:
#   packages/core/src/tools/write-file.test.ts
5441581392
feat(core): enforce prior read before Edit / WriteFile mutates a file (#3774)
* feat(core): enforce prior read before Edit / WriteFile mutates a file

Introduces a session-scoped invariant: the model cannot mutate an existing file without having actually Read it (or its post-write state) earlier in this conversation. Builds on the FileReadCache landed in #3717.

Two new ToolErrorType codes:
- EDIT_REQUIRES_PRIOR_READ — file has no entry in the session cache. The model is told to use read_file first.
- FILE_CHANGED_SINCE_READ — file has an entry but its mtime or size drifted since the recorded fingerprint. The model is told to re-read before retrying.

EditTool blocks the existing-file path on cache.check; new-file creation (old_string === '' on a non-existent target) is exempt. WriteFileTool blocks the overwrite path; new-file creation (fileExists === false) is exempt.

Both tools route through the existing fileReadCacheDisabled escape hatch on Config — flipping it bypasses enforcement byte-for-byte, matching pre-cache behaviour. Operators can use this as a kill switch if a session falls into a state where the cache cannot be trusted.

ReadFile fix on the auto-memory path: PR #3717 had auto-memory reads skip the cache entirely (both lookup and record), but with the new enforcement that means a model that just Read AGENTS.md cannot then Edit it. Decoupled the two: auto-memory reads still skip the file_unchanged fast-path (the per-read freshness <system-reminder> must always reach the model) but DO record into the cache so the follow-up Edit sees `fresh`. New regression test asserts this.

Test plan
- vitest run (all of @qwen-code/qwen-code-core): 6308 passed, 2 skipped
- 9 new enforcement tests across edit.test.ts and write-file.test.ts: unknown rejects, stale rejects, new-file exempt, edit chain stays authorised, escape hatch bypasses, plus the auto-memory record regression in read-file.test.ts.
- tsc --noEmit clean. eslint clean. core build succeeds.
* test(core): clear shared fileReadCache between write-file.test.ts cases

CI surfaced one Linux-only failure: the prior-read enforcement test 'rejects a write that would overwrite an unread existing file' returned FILE_CHANGED_SINCE_READ instead of EDIT_REQUIRES_PRIOR_READ. Root cause: the FileReadCache instance is declared at module scope (line 41) and shared across every test in write-file.test.ts. State from earlier tests — most recently the 'records a write' integration test that records the same path — leaks forward. On Linux the test ordering puts a record-bearing test before the enforcement test, so the cache reports `stale` (mtime drifted) instead of `unknown`. macOS / Windows happen to order them differently and never hit it. Adding a fileReadCache.clear() to beforeEach gives every test a known-empty cache, matching how edit.test.ts already isolates its per-test cache by re-instantiating it.

* fix(core): close prior-read enforcement gaps flagged in 3rd review

Three concrete loopholes / regressions that the original PR-B introduction left open. All three are addressed in the same commit because the underlying refactor (move enforcement earlier and tighten the fresh predicate) is shared across them.

1. fresh != "model has seen the bytes". Pre-fix, requirePriorRead() accepted any cache.check === 'fresh'. ReadFileTool records every successful read into the cache, including ranged reads (offset/limit), truncated full reads, and non-cacheable binary/image/audio/video/PDF/notebook reads (lastReadCacheable = false). This let the model peek at a slice or a structured payload of a file and then mutate the whole thing. Tightened the accept predicate to fresh && lastReadAt && lastReadWasFull && lastReadCacheable.

2. Read-less content oracle through calculateEdit error codes. Pre-fix, execute() ran calculateEdit (which reads file bytes and counts matches) before the enforcement check. A model could probe an unread file by attempting Edits with candidate old_strings and observing NO_OCCURRENCE_FOUND vs EXPECTED_OCCURRENCE_MISMATCH vs EDIT_NO_CHANGE — reverse-engineering content without ever calling read_file. Moved enforcement to the top of calculateEdit, before any content read; only a stat is performed up to the rejection point.

3. Confirmation flow regression. Pre-fix, getConfirmationDetails() read the existing file to render a diff for the user, then approval flowed to execute() which would freshly check the cache and reject. The user could approve a diff computed from current bytes the model never saw, and the call would still fail. Moved enforcement before the confirmation read in both EditTool (via the shared calculateEdit path) and WriteFileTool (explicit check at the top of getConfirmationDetails). The user now never sees a confirmation diff for an unread file — the call rejects up front.

Public API surface change: requirePriorRead() -> checkPriorRead() that returns a structured decision, so the same predicate can route into a CalculatedEdit.error (calculateEdit), a thrown error (getConfirmationDetails), or a ToolResult (execute) without duplicating the boolean / message / type plumbing in three shapes.

Reported by pomelo-nwu (3 inline comments on PR #3774).

* refactor(core): close 4 prior-read enforcement gaps from 4th review

1. recordWrite now seeds read metadata on brand-new entries (lastReadAt / lastReadWasFull / lastReadCacheable). The strict accept predicate added in the previous round (#3 review) requires all three, but recordWrite only set lastWriteAt — so a model creating a file with Edit (old_string="") or WriteFile and then editing it again was rejected on the second edit. The model authored the bytes it just wrote; for the purposes of prior-read enforcement that counts as having seen them. New regression test in edit.test.ts: "allows a create-then-edit-then-edit chain without an intervening read".

2. Extracted checkPriorRead into src/tools/priorReadEnforcement.ts. The two copies in edit.ts and write-file.ts had already drifted (one used ${ReadFileTool.Name}, the other hardcoded 'read_file'); the boolean guard is security-sensitive and a one-sided fix would silently weaken the boundary. The shared utility takes a verb ('editing' | 'overwriting') so the user-facing prose can differ between callers without duplicating the decision logic.

3. WriteFileTool.execute now runs prior-read enforcement BEFORE readTextFile. Pre-fix, an unread overwrite still slurped the entire file into memory (encoding / BOM / line-ending detection) and only then rejected it: wasted I/O, and momentary in-memory custody of bytes the model never legitimately read. Now matches the order in getConfirmationDetails().

4. The "rejects a write that would overwrite an unread existing file" test now spies on FileSystemService.readTextFile and asserts not.toHaveBeenCalled() — without that, the test gave false confidence: it passed both pre-fix (read happened, then reject) and post-fix (reject before read), so the ordering regression in (3) was invisible to the assertion.

Reported by glm-5.1 via /review on PR #3774.

* refactor(core): close 4 prior-read enforcement gaps from 4th review (Copilot)

Five concrete gaps that the previous round of enforcement work left open. Reported by Copilot via /review on PR #3774.

1. Confirmation-time rejections lost their ToolErrorType code. getConfirmationDetails() in both EditTool and WriteFileTool threw a plain Error on prior-read failure, which coreToolScheduler collapsed into UNHANDLED_EXCEPTION — silently breaking the EDIT_REQUIRES_PRIOR_READ / FILE_CHANGED_SINCE_READ contract for any approval-required flow. Fix: introduce PriorReadEnforcementError that carries `errorType: ToolErrorType`. Both confirmation paths now throw it, and coreToolScheduler reads `error.errorType` (falling back to UNHANDLED_EXCEPTION when absent). New regression tests assert the thrown error's `errorType` field for both tools.

2. checkPriorRead's "re-read with read_file" advice was wrong for binary / image / audio / video / PDF / notebook files. Their ReadFile result always sets lastReadCacheable=false, so the message would loop the agent forever on the same rejection. Fix: detect the fresh-but-non-cacheable case explicitly and return a dedicated message that explains the dead end ("Edit / WriteFile cannot mutate that payload safely") instead of asking for another read. Updated the existing non-cacheable regression test to assert the new message and the absence of "use the read_file tool first".

3. checkPriorRead swallowed every stat() failure and returned ok:true. EACCES, EBUSY, NFS hiccups, etc. would silently re-open the blind-write path the helper exists to block. Fix: only ENOENT continues to return ok:true (disappearance race). Any other code is fail-closed: returns EDIT_REQUIRES_PRIOR_READ with a message that names the errno. New regression test in write-file.test.ts spies on fs.promises.stat to inject EACCES and asserts the rejection.

4. The auto-memory record regression test only asserted `state === 'fresh'`. A future change that recorded auto-memory reads as partial / non-cacheable would still satisfy that assertion but would actually fail enforcement on every follow-up Edit. Fix: also assert lastReadAt is defined, lastReadWasFull is true, and lastReadCacheable is true. The full "what enforcement requires" predicate is now explicit in the test.

(The 5th item, the WriteFile mirror of (1), is covered by the same PriorReadEnforcementError change.)

* refactor(core): tighten StructuredToolError naming + add scheduler test

Four follow-ups raised by deepseek-v4-pro on PR #3774. None of them change the enforcement boundary; they are all about making the contract clearer and harder to break in future changes.

1. PriorReadEnforcementError -> StructuredToolError. The class now wraps any content-derived ToolErrorType from calculateEdit (EDIT_NO_OCCURRENCE_FOUND, EDIT_EXPECTED_OCCURRENCE_MISMATCH, EDIT_NO_CHANGE, ATTEMPT_TO_CREATE_EXISTING_FILE) on top of the prior-read codes. The old name suggested the class was prior-read-specific, which would mislead any oncall engineer seeing it paired with one of the calculateEdit error codes.

2. EDIT_REQUIRES_PRIOR_READ kept its name (the prefix mentions "edit" but the enum is shared with WriteFileTool) — chose documentation over rename to avoid the churn of a value rename across logs/dashboards already keyed on it. JSDoc now spells out the cross-tool usage explicitly.

3. Stat failures other than ENOENT now map to a new PRIOR_READ_VERIFICATION_FAILED code instead of being conflated with EDIT_REQUIRES_PRIOR_READ. The failure mode is "we cannot verify" rather than "definitely not read" — operators routing on error codes can distinguish the two populations.

4. Added a coreToolScheduler test (`surfaces error.errorType from a confirmation throw instead of UNHANDLED_EXCEPTION`) that constructs a stub tool whose getConfirmationDetails throws StructuredToolError and asserts the surfaced ToolCall response carries the correct ToolErrorType. Without this test the scheduler's explicitErrorType branch would have no coverage at all.

Tool tests updated for the new StructuredToolError class name and the PRIOR_READ_VERIFICATION_FAILED code on the EACCES path.

* fix(core): close TOCTOU + grammar + directory regressions in PR-B

Six concrete issues that the previous round of enforcement work left open. Reported by Copilot via /review on PR #3774.

1. TOCTOU window between pre-read checkPriorRead and readTextFile. The pre-read stat could pass enforcement, then an external writer could land between that stat and the actual read, leaving currentContent reflecting bytes the model never saw — exactly the stale-write path the PR is supposed to block. Closed by re-running checkPriorRead immediately after every readTextFile that fed currentContent / originalContent: EditTool.calculateEdit and the two WriteFileTool paths (execute + getConfirmationDetails). A `stale` outcome now fails the operation with FILE_CHANGED_SINCE_READ at the correct moment.

2. Directory targets sent the model into an enforcement loop. `fileExists` is a plain access check, so directories also entered the enforcement branch — the model would be told to call `read_file`, but `read_file` rejects directories with TARGET_IS_DIRECTORY, so the loop never terminated. Fixed in checkPriorRead: if `fs.stat` reports the path is not a regular file, return `ok: true` so the downstream readTextFile / write path can surface its own EISDIR / similar error.

3. Confirmation-time error messages used the short `display` form instead of the full `raw` form. Approval-required Edit calls therefore lost the remediation detail (file path, stale-vs-unread distinction, "without offset / limit / pages" hint) that the execute path already surfaced and that the WriteFile confirmation path already preserved. EditTool.getConfirmationDetails now throws StructuredToolError with `editData.error.raw`.

4. Non-text payload displayMessage was grammatically broken: built from the gerund `editing` / `overwriting`, it rendered as "cannot editing via this tool" / "cannot overwriting via this tool". Fixed by deriving a bare-verb form (`edit` / `overwrite`) alongside the gerund and using it in displayMessage.

(Items 1, 5 and 6 from Copilot's batch are the same TOCTOU class — EditTool calculateEdit + WriteFile execute + WriteFile confirmation — addressed together in (1) above.)

The "bypasses enforcement entirely" test now uses mockReturnValue instead of mockReturnValueOnce because calculateEdit calls getFileReadCacheDisabled twice — once for the pre-read check and once for the post-read TOCTOU re-check. Both must see disabled=true to actually bypass.
* fix(core): close fileExists TOCTOU on WriteFile prior-read enforcement

WriteFile gated prior-read enforcement on `fileExists` from `isFilefileExists()`, but a file that sprang into existence between that check and the write would still be overwritten without enforcement — `fileExists === false` skipped the check entirely. Made the gate unconditional on `fileExists`. checkPriorRead's own `fs.stat` decides what to do:
- ENOENT → ok:true, fall through to the new-file path as before
- file exists right now (whether or not isFilefileExists saw it) → unknown / stale check runs, the race-created file is rejected.

Applied to both getConfirmationDetails and execute. The path that actually creates new files is unchanged because checkPriorRead's ENOENT branch is the disappearance-race exit, which is the correct exit for "the file truly does not exist yet".

Reported by glm-5.1 via /review on PR #3774.

* fix(core): close 4 enforcement gaps + 1 critical bug from 5th Copilot review

Six issues raised by deepseek-v4-pro / glm-5.1 / qwen3.6-plus on PR #3774. Listed by reviewer-assigned severity.

[Critical] (qwen3.6-plus) recordWrite previously only seeded the read metadata for brand-new entries. The reproduction was real: ReadFile(limit=10) → WriteFile(full content) → Edit. The partial read's lastReadWasFull=false would persist through the write, and the Edit would be rejected with EDIT_REQUIRES_PRIOR_READ even though the model just authored every byte. recordWrite now unconditionally refreshes lastReadAt, lastReadWasFull=true, and lastReadCacheable=true. The fileReadCache.test.ts case that previously asserted "preserves lastReadAt" is rewritten to assert the new "refreshes lastReadAt to match the write" contract, and a new "upgrades lastReadWasFull / lastReadCacheable after a full write" regression test pins the reproduction the reviewer described.

[Suggestion] (deepseek-v4-pro) Narrowed the non-regular-file bypass in priorReadEnforcement from `!stats.isFile()` to `stats.isDirectory()`. The earlier broad form covered FIFOs, sockets, and devices that the model has no legitimate "read first" recourse for and that can block readTextFile (FIFO) or over-allocate (/dev/urandom). Those now flow through to cache.check() and reject with the unread-file path before any I/O.

[Suggestion] (glm-5.1) Removed the `fileExists && ...` gate from EditTool.calculateEdit, mirroring the
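The enforcement predicate this PR converges on — fresh plus a full, cacheable, timestamped read — can be sketched as a structured decision, matching the checkPriorRead shape the commit message describes. The type names and message strings below are illustrative assumptions; only the field names and the accept predicate come from the commit text.

```typescript
// Hypothetical sketch of the checkPriorRead decision described above.
type PriorReadDecision =
  | { ok: true }
  | {
      ok: false;
      errorType: 'EDIT_REQUIRES_PRIOR_READ' | 'FILE_CHANGED_SINCE_READ';
      message: string;
    };

interface CacheEntry {
  state: 'fresh' | 'stale' | 'unknown';
  lastReadAt?: number;
  lastReadWasFull?: boolean;
  lastReadCacheable?: boolean;
}

function checkPriorRead(entry: CacheEntry | undefined): PriorReadDecision {
  if (!entry || entry.state === 'unknown') {
    // No record of a read this session.
    return { ok: false, errorType: 'EDIT_REQUIRES_PRIOR_READ', message: 'Use read_file first.' };
  }
  if (entry.state === 'stale') {
    // mtime/size drifted since the recorded fingerprint.
    return { ok: false, errorType: 'FILE_CHANGED_SINCE_READ', message: 'File changed since read; re-read it.' };
  }
  // 'fresh' alone is not enough: the model must have seen the full,
  // cacheable bytes (ranged / truncated / binary reads do not qualify).
  if (entry.lastReadAt && entry.lastReadWasFull && entry.lastReadCacheable) {
    return { ok: true };
  }
  return { ok: false, errorType: 'EDIT_REQUIRES_PRIOR_READ', message: 'Re-read without offset / limit / pages.' };
}
```

Returning a structured decision rather than throwing is what lets the same predicate feed a CalculatedEdit.error, a thrown StructuredToolError, or a ToolResult without re-deriving the message and error type in three places.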
ee460de97b
docs(attribution): align cleanup-branch comments with noteCommitWithoutClearing
Three doc/test-fixture stale-after-refactor cleanups (Copilot
4MDx / 4MEI / 4MEa):
- shell.ts:1944 (around the stagedInfo === null branch): the comment
still claimed the finally block "falls back to a full clear", but
acd06e3461
Merge remote-tracking branch 'origin/main' into feat/commit-attribution
# Conflicts:
#   packages/core/src/core/client.ts
8a1ed565a6
fix(core): auto-compact subagent context to prevent overflow (#3735)
* fix(core): auto-compact subagent context to prevent overflow

Subagent chats accumulated history without ever compacting, so a long multi-turn run could hit "max context length exceeded" before the compaction logic the main session uses had a chance to fire. Move compaction down into the chat layer so both main agent and subagent auto-compress at the configured threshold, and surface the result via a new chat stream event that bridges into the existing ChatCompressed UI path. The main-session wrapper still owns full /compress reset.

Closes #3664

* fix(core): address subagent compaction review feedback

Code fixes:
- Seed `lastPromptTokenCount` on subagent chats so the first-send threshold gate sees the inherited history's true size.
- Add `COMPRESSION_FAILED_TOKEN_COUNT_ERROR` to the fail-latch chain so token-counting failures stop retrying compression every send.
- Restore `FileReadCache.clear()` after compaction in both the manual /compress wrapper and the auto-compaction path inside GeminiChat, preventing post-summary `file_unchanged` placeholders from pointing at content the model can no longer retrieve.
- Refresh stale comment on the `compressed → ChatCompressed` bridge in turn.ts now that this path is the primary route, not a fallback.

Tests:
- turn.test.ts asserts the compressed → ChatCompressed bridge.
- geminiChat.test.ts asserts COMPRESSED yields as the first stream event after auto-compaction succeeds.
- chatCompressionService.test.ts bumps originalTokenCount above the cheap-gate so the NOOP test exercises findCompressSplitPoint.
- client.test.ts asserts forceFullIdeContext flips when a ChatCompressed event flows through sendMessageStream's loop.

* chore(core): drop redundant StreamEvent cast and document auto-compaction trade-off

- Remove the `as StreamEvent` cast at the COMPRESSED yield site — the literal already matches the union member.
- Add a 4-line comment at the auto-compaction setHistory point that points readers to GeminiClient for the env-refresh trade-off rationale, so readers don't have to chase the layering decision back across files.
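The threshold gate the review feedback hardens can be sketched as below. All names here are illustrative assumptions except `lastPromptTokenCount` and the fail-latch idea, which come from the commit message: the gate must be seeded with the inherited history's size so a subagent's very first send is counted, and a latched compression failure must stop the gate from retrying on every send.

```typescript
// Minimal sketch (names assumed) of a chat-layer auto-compaction gate:
// both the main agent and subagents share it because it lives on the chat.
interface ChatCompactionState {
  // Seeded from the inherited history so the first-send check sees
  // the true size, not zero.
  lastPromptTokenCount: number;
  // Set after a compression (or token-count) failure so the gate does
  // not retry compression on every subsequent send.
  compressionFailedLatch: boolean;
}

function shouldAutoCompact(state: ChatCompactionState, thresholdTokens: number): boolean {
  if (state.compressionFailedLatch) return false;
  return state.lastPromptTokenCount >= thresholdTokens;
}
```

When the gate fires, the chat compresses and yields a COMPRESSED stream event first, which the UI bridges into its existing ChatCompressed path.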
808d0978eb
feat(cli): route foreground subagents through pill+dialog while running (#3768)
* feat(cli): route foreground subagents through pill+dialog while running

Foreground (synchronous) subagents currently render a live AgentExecutionDisplay inside the parent's pendingHistoryItems block. The frame mutates on every tool call and approval; once it grows past the terminal height (verbose mode, parallel subagents, long tool-call lists) the live-area repaint flickers visibly.

This change extends BackgroundTaskRegistry with a flavor: 'foreground' | 'background' discriminator. Foreground entries register at the start of the synchronous tool-call and unregister in its finally path. The pill counts them; the dialog drills into their activity. The inline frame is suppressed during the live phase — only an active, focus-locked approval prompt renders, as a small banner labeled with the originating agent. Once the parent turn commits, the full AgentExecutionDisplay appears in scrollback via Ink's <Static>, exactly as before.

Foreground entries skip the XML task-notification (the parent receives the result through the normal tool-result channel) and skip the headless holdback (the parent's await already pins the loop). The dialog gates per-agent cancellation behind a two-step confirm so a stray 'x' can't end the user's current turn.

* fix(cli): address review findings on foreground subagent routing

- Gate `registerCallback` on background flavor so foreground entries don't leak orphaned `task_started` SDK events without a matching terminal notification.
- Render a queued-approval marker for non-focus subagents instead of returning null, so a queued approval is visible in the main view.
- Move `emitStatusChange` before `agents.delete` in `unregisterForeground` to match the ordering used by complete/fail/cancel/finalize.
- Prefix the foreground tool result with a cancel marker when `terminateMode === CANCELLED`, so the parent model can distinguish a user-cancelled run from a successful completion.
- Mirror the background path's stats wiring on the foreground path so `entry.stats` stays current and the dialog detail subtitle shows tool count + tokens for foreground runs.
- Remove the unreachable `isWaitingForOtherApproval` branch (subsumed by the queued-approval marker above).
- Reset the foreground confirm-step on detail-mode `left` and ignore `x` on terminal entries so an armed cancel can't carry into list mode and the hint footer/handler stay in sync.
- Test factory uses a `baseProps` spread instead of `as` cast so a future required field on `ToolMessageProps` is a compile-time miss.
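The flavor discriminator and its notification rule can be sketched roughly like this. This is an illustrative reduction of the described design, not the registry's real API; the method names other than `BackgroundTaskRegistry` are assumptions.

```typescript
// Sketch: foreground vs background entries in one registry. The pill
// counts both flavors; only background entries emit the XML
// task-notification (foreground results flow back on the normal
// tool-result channel, so a notification would be an orphan).
type TaskFlavor = 'foreground' | 'background';

interface TaskEntry {
  id: string;
  flavor: TaskFlavor;
}

class BackgroundTaskRegistry {
  private agents = new Map<string, TaskEntry>();

  register(id: string, flavor: TaskFlavor): void {
    this.agents.set(id, { id, flavor });
  }

  unregister(id: string): void {
    // In the real design this runs in the tool-call's finally path.
    this.agents.delete(id);
  }

  shouldEmitTaskNotification(id: string): boolean {
    return this.agents.get(id)?.flavor === 'background';
  }

  pillCount(): number {
    return this.agents.size;
  }
}
```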
fe1fb31557
fix(cli): prevent file paths from being treated as slash commands (#3743)
* fix(cli): prevent file paths from being treated as slash commands (#1804)

When users input file paths starting with '/' (e.g. '/api/apiFunction/...', '/Users/name/path'), they were incorrectly parsed as slash commands, resulting in "Unknown command" errors. The input was discarded instead of being sent to the model for processing.

Root cause: isSlashCommand() only checked for a '/' prefix without validating whether the first token actually looks like a command name. Any '/' prefix triggered the slash command flow, and when no matching command was found, the error was shown with no fallback.

Fix: Add looksLikeCommandName() that validates command names contain only [a-zA-Z0-9:_-]. Both isSlashCommand() and handleSlashCommand() now check the first token — if it contains path separators, dots, or non-ASCII characters, the input falls through to normal model processing instead of the command dispatcher.

Closes #1804

* fix(cli): allow dots in command names and fix prettier formatting

Address review feedback:
- Allow '.' in looksLikeCommandName() regex to support extension-qualified commands like gcp.deploy (CommandService renames conflicts as ext.cmd)
- Add regression tests for dot-named commands in both commandUtils and slashCommandProcessor
- Fix prettier formatting in slashCommandProcessor test file

* fix(cli): handle slash command review edge cases
* docs(cli): align slash command validation comment
* fix(cli): preserve slash prompt ordering
* fix(cli): reject shell-metacharacter slash tokens
* test(cli): align slash command action mocks
* fix(cli): track model-sent user turns
* fix(cli): narrow slash path handling scope
* fix(cli): count slash path prompts in history
* test(cli): type slash command action mocks
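The token validation this PR describes (the original character class plus the dot added in review) can be sketched as below. The exact production regex and function bodies may differ — this is an assumption-labeled reduction of the commit message, and it omits the later shell-metacharacter follow-up.

```typescript
// Sketch of the slash-command token validation described above.
// '.' is allowed for extension-qualified commands like gcp.deploy
// (CommandService renames conflicts as ext.cmd).
function looksLikeCommandName(token: string): boolean {
  return /^[a-zA-Z0-9:._-]+$/.test(token);
}

// A leading '/' only enters the slash-command flow when the first
// token validates; otherwise the input falls through to the model.
function isSlashCommand(input: string): boolean {
  if (!input.startsWith('/')) return false;
  const firstToken = input.slice(1).split(/\s/, 1)[0];
  return looksLikeCommandName(firstToken);
}
```

A path like `/Users/name/path` fails because its first token contains a path separator, so it is sent to the model instead of producing an "Unknown command" error.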
2bd4aa1b6d
fix(sdk-python): standardize TAG_PREFIX to include v suffix (#3832)
Align the Python SDK's TAG_PREFIX with the TypeScript SDK convention by changing it from 'sdk-python-' to 'sdk-python-v'. This removes the need for callers to manually inject the `v` when composing git tags, eliminating an asymmetry that could lead to doubled or missing `v` prefixes when code is copied between SDK release helpers. The final tag format (sdk-python-v0.1.0) is unchanged.

Closes #3793

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com>
715c258fb2
fix(attribution): roll back snapshot dedup key on sync appendRecord failure
1UMh (Copilot): appendRecord can throw synchronously before returning a promise — e.g. when ensureConversationFile() rethrows a non-EEXIST writeFileSync error. The async .catch() handler attached to the promise never runs in that case, so the optimistic dedup-key set sticks on a write that never landed and permanently suppresses identical retries. Roll back lastAttributionSnapshotJson in the outer catch too. Regression test forces writeFileSync to throw EACCES on the first invocation, then asserts the second identical snapshot attempt fires a fresh write rather than getting deduped.
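The two-path rollback can be sketched as follows. The surrounding function is hypothetical; only `lastAttributionSnapshotJson`, `appendRecord`, and the sync-throw failure mode come from the commit message.

```typescript
// Sketch: the optimistic dedup key must be rolled back on BOTH failure
// paths — the async .catch() on the returned promise AND a synchronous
// throw before any promise exists. Missing the outer catch permanently
// suppresses retries of an identical snapshot that never landed.
let lastAttributionSnapshotJson: string | null = null;

function writeSnapshot(
  json: string,
  appendRecord: (json: string) => Promise<void>,
): void {
  if (json === lastAttributionSnapshotJson) return; // dedup identical snapshot
  lastAttributionSnapshotJson = json;               // optimistic set
  try {
    appendRecord(json).catch(() => {
      lastAttributionSnapshotJson = null;           // async failure: roll back
    });
  } catch {
    lastAttributionSnapshotJson = null;             // sync throw: roll back too
  }
}
```

Without the outer `catch`, a synchronous `EACCES` from the underlying `writeFileSync` would leave the dedup key pointing at a snapshot that was never written.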
d44786685c
feat(core,cli): surface and cancel auto-memory dream tasks (#3836)
* feat(cli): surface auto-memory dream tasks in Background tasks dialog Adds `dream` as a fourth kind in the Background tasks pill + dialog, alongside agent / shell / monitor. Subscribes to MemoryManager via its existing subscribe() / listTasksByType() API and adapts MemoryTaskRecord into a DreamDialogEntry view-model. Zero changes to the core package. Filters out `pending` (sub-second transition) and `skipped` (every UserQuery that misses the gate creates one) records. Caps retained terminal entries at 3 since `MemoryManager.tasks` has no eviction path; without the cap, completed dreams would accumulate over the project's lifetime (mirrors MonitorRegistry's terminal cap pattern). Extract tasks are intentionally NOT surfaced — they fire on every UserQuery, would flood the pill, and the `memory_saved` toast in useGeminiStream already covers their completion signal. Read-only for now: cancellation requires MemoryManager.cancelTask + task_stop integration which lands in a follow-up PR. The dialog suppresses the "x stop" hint for dream entries until then to avoid silent no-op keystrokes. Refs #3634 * docs(cli): rephrase dream filter comment to focus on extract vs dream The earlier comment compared the design to a non-qwen-code product; restate the rationale in terms of the local extract / dream split (extract fires every UserQuery and surfaces via memory_saved toast, dream fires after gates and warrants pill / dialog visibility). * feat(core,cli): cancel dream consolidation tasks via dialog and task_stop Wires cancellation for the auto-memory dream task kind: - `MemoryManager.cancelTask(taskId)` — aborts the dream's fork-agent via a new per-task AbortController, marks the record `cancelled` before aborting so the runDream catch path can detect user-intent and avoid overwriting with a generic `failed`. The existing finally block releases the consolidation lock as the agent unwinds. 
- `MemoryManager.getTask(id)` — point lookup helper so cross-cutting consumers like `task_stop` can route by id without a project root. - AbortSignal threaded through `scheduleDream` → `runDream` → `runManagedAutoMemoryDream` → `planManagedAutoMemoryDreamByAgent` → `runForkedAgent.abortSignal` (already supported). - `task_stop` tool gets a 4th dispatch branch: dream task ids look up via MemoryManager and route through `cancelTask`. Extract is intentionally NOT cancellable — it runs synchronously on the request loop, cancelling would interfere with the user's own turn. - `BackgroundTasksDialog` x-stop hint suppression for dream is removed (was a PR-1 placeholder); `cancelSelected` dream branch now calls `memoryManager.cancelTask`. - `MemoryTaskStatus` gains `'cancelled'`. Dream view-model widens status union and filter to surface cancelled entries (terminal cap continues to apply). Refs #3634 * fix(core,cli): address review feedback on dream cancellation surface - DreamDetailBody: comment said cancellation "lands in PR-2" but the same PR wires `cancelSelected` for dream entries. Reword to describe what's actually shipped + flag in-flight progress as the real follow-up. - task_stop: drop the unreachable `!aborted` error branch. The status guard above already confirms `running`, and `cancelTask` is synchronous; in this branch it cannot return false. * fix(core): handle resolved-cancel path from runForkedAgent for dream tasks runForkedAgent maps AgentTerminateMode.CANCELLED to a resolved {status: 'cancelled'} result rather than rejecting. The cancel-via- task_stop path landed in the previous commit assumed the call would throw — when it didn't, the runDream success path overwrote the user-cancelled record with 'completed' AND bumped lastDreamAt metadata, suppressing the next legitimate dream cycle. Two-layer defense: - dreamAgentPlanner now rethrows when the fork agent reports cancelled status (mirrors the existing failed-status throw). This is the source-of-truth fix. 
- runDream now checks abortSignal.aborted after the await as defense in depth. If anything in the call chain ever forgets to propagate, this guard short-circuits the success path before the metadata write.
Updates the existing dreamAgentPlanner test that previously pinned the buggy "returns cancelled without throwing" behavior. Adds a manager test that simulates the resolves-on-abort scenario directly to verify the consumer-side guard catches anything the planner might miss.
* fix(core,cli): address review feedback on dream UX, perf, and comment clarity
Three fixes from the post-cancellation-PR review:
- scheduleDream now sets an initial `progressText` ("Scheduled managed auto-memory dream.") on the in-flight record. Without this, the dialog Detail's Progress section stayed empty until completion — the PR description's mid-flight screenshot showed text that production never actually rendered.
- useBackgroundTaskView gates the MemoryManager.subscribe listener on a dream-content signature. The manager fires for every task transition (extract included, ~2x per UserQuery), but the dialog has no extract surface; without this dedup each extract notify forced a full 4-source re-merge + a fresh setEntries reference, re-rendering the dialog and pill on entries that hadn't changed. Added a test that pins the reference-stability invariant.
- task-stop comment was misleading — said "the status guard above already confirmed running", but for the dream branch the running check happens IN this branch (not earlier). Reworded.
* fix(core,cli): tighten cancelTask contract, dedup dream snapshot reads, sync test helper
Four fixes from the latest review pass:
- cancelTask() now enforces the AbortController invariant. The old `ac?.abort()` returned `true` even when no controller was found, meaning callers could see a successful return while the dream was not actually aborted (and would leak the consolidation lock until the agent finished naturally).
The controller is registered synchronously alongside `status='running'`, so a missing controller for a running record is a contract violation — return false without flipping status so the caller knows the abort didn't take.
- useBackgroundTaskView's `refresh()` now reuses the dream snapshot the memory listener fetched for its dedup gate. The previous version re-read `listTasksByType('dream')` inside `computeDreamSig()` and again inside `refresh()` — extra work AND a race window where the gate signature could come from a different snapshot than the one used to build dreamEntries. Single read, single source of truth.
- The `dream()` test helper widened to include `'cancelled'` so it matches the production `MemoryTaskStatus` union. Added a small test asserting cancelled dreams flow through the kind discriminator (the dialog's terminal-cap window depends on showing the user the outcome of the abort they just triggered).
- Dropped the stale "PR-1 read-only / PR-2 cancellation" block comment above the dream-entries describe block — both are in this PR now.
* fix(core,cli): address self-review + Copilot feedback on dream surface
Combined fixes for the latest review pass — self-review notes (U1, U2, U5, U6) plus Copilot's three new comments (C1, C2, C3):
- **Footer dream-indicator dedupe (U1)**: removed `useDreamRunning` + `✦ dreaming` right-column text. The new Background tasks pill already counts dream tasks alongside agent / shell / monitor; showing both produced two simultaneous signals for the same state.
- **task_stop dispatch surfaces extract distinctly (U2 + C1)**: a new `TASK_STOP_NOT_CANCELLABLE` error type fires when the task id resolves to a known-but-not-cancellable record (extract). Previously extract ids fell through to `NOT_FOUND`, misleading the model into thinking the id was never valid. Also surfaces the missing-controller case from `cancelTask` as an explicit error rather than reporting phantom success.
- **MemoryManager.cancelTask logs missing-controller violation (C2)**: the silent `return false` for the missing-AbortController case now emits a `debugLogger.warn` so the inconsistency is observable in debug bundles. Without the log a runaway dream burning tokens would leave no trail.
- **MemoryManager.subscribe taskType filter (C3)**: subscribers can now opt into per-type notify routing via `subscribe(fn, { taskType })`. Internal `notify()` calls pass the changed task's type so filtered consumers wake only on relevant transitions. The bg-tasks UI hook uses this to skip the per-UserQuery extract notify entirely — drops the per-extract O(n) signature work to zero.
- **runDream guards against late-cancel overwrite (U5)**: the success path now re-checks `abortSignal.aborted` between metadata read/write and before the final `update({status: 'completed'})`. Closes the ~tens-of-ms race window where `cancelTask` flipping status to `'cancelled'` would silently lose to the success continuation overwriting with `'completed'` + bumping `lastDreamAt`.
- **DreamDialogEntry.endTime semantic comment (U6)**: documents that `endTime` for cancelled records is the cancel-call moment (not the fork unwind), so a future maintainer doesn't treat it as a real fork-finish timestamp.
Tests: new `subscribe() taskType filter` describe block in manager, new task_stop tests for `NOT_CANCELLABLE` (extract) and missing-AbortController (dream); existing test renamed/widened.
* fix(core,cli): tighten error semantics + comments on dream surface
Four fixes from the latest review pass:
- New `TASK_STOP_INTERNAL_ERROR` error type for the missing-AbortController contract violation. Previously the dispatcher reused `TASK_STOP_NOT_RUNNING`, which is misleading — the task IS running, cancellation just couldn't be delivered. Distinct type signals "this is unexpected, file a bug" vs `NOT_CANCELLABLE` which signals "expected behavior, use a different approach".
- Reworded the `useBackgroundTaskView` filter comment. Said "every UserQuery that misses the gate creates one [skipped record]" but `scheduleDream` returns `{status: 'skipped'}` early without creating a record for most gate misses; only the acquireDreamLock/EEXIST race actually stores a `'skipped'` record.
- Strengthened the `subscribe() unsubscribe` test. The previous version asserted "not called yet" without firing any notify after unsubscribe, so a regression that left the listener attached would still pass. Now schedules an extract before AND after the unsubscribe, verifying the call count doesn't increment.
- Moved `const debugLogger = createDebugLogger(...)` below the full import block in manager.ts. Previous version sat between imports, violating eslint-plugin-import's `import/first` rule (didn't trip lint locally, but worth fixing before it does).
* fix(core): skip scheduleDream early when params.config is missing
`ScheduleDreamParams.config` is optional in the type so test paths can omit it, but production callers always pass one. Without a config, `runManagedAutoMemoryDream` throws because the fork-agent execution requires it. With dream tasks now visible in the Background tasks dialog, that throw becomes a noisy `failed` entry the user sees but didn't trigger.
Convert the omitted-config case to the same `disabled` skip path that an explicitly-disabled config takes, so a no-config call short-circuits before any record is stored.
Existing tests that relied on the old "no config = proceed past the disabled gate" behavior now pass an explicit `makeMockConfig()` (matching what they would do in any realistic scenario). New test pins the no-config skip behavior + asserts no record was stored (so a regression that drops the early skip would produce a visible failed entry in the dialog and fail the test).
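The cancelTask contract described earlier in this series — the AbortController is registered in the same synchronous step that sets `status='running'`, so a running record with no controller is a contract violation and cancellation must report failure without flipping status — can be sketched as follows. Names here are illustrative, not the real MemoryManager API.

```typescript
// Hypothetical sketch of the cancelTask invariant: missing controller
// for a running record => return false WITHOUT flipping status, so the
// caller knows the abort didn't actually take.
type DreamStatus = 'running' | 'completed' | 'cancelled' | 'failed';

class DreamTaskTable {
  private readonly statuses = new Map<string, DreamStatus>();
  private readonly controllers = new Map<string, AbortController>();

  start(id: string): AbortSignal {
    const ac = new AbortController();
    // Register the controller BEFORE anything (e.g. a notify) can
    // observe status='running' — mirrors the storeWith-ordering fix.
    this.controllers.set(id, ac);
    this.statuses.set(id, 'running');
    return ac.signal;
  }

  cancelTask(id: string): boolean {
    if (this.statuses.get(id) !== 'running') return false;
    const ac = this.controllers.get(id);
    if (!ac) return false; // contract violation: abort didn't take
    ac.abort();
    this.statuses.set(id, 'cancelled');
    return true;
  }

  status(id: string): DreamStatus | undefined {
    return this.statuses.get(id);
  }
}
```

A successful cancel flips the record and aborts the signal; a second cancel (no longer running) and an unknown id both report false.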
* fix(core): plug subscribe Map leak + dream.ts late-cancel ordering bug
Two fixes from the latest review pass:
- `MemoryManager.subscribe`'s typed-branch unsubscribe deleted the listener from its per-type Set but left the empty Set sitting in `subscribersByType`. Over a long-running session with repeated React mount/unmount of the bg-tasks view, that accumulates dead Map entries forever. Drop the entry when the bucket goes empty.
- `runManagedAutoMemoryDream` writes metadata after the fork agent returns (`bumpMetadata` → `rebuildManagedAutoMemoryIndex` → `updateDreamMetadataResult`). If the user presses 'x' between the fork's success return and these writes, the writes proceed and bump `lastDreamAt` — leaving the visible UI ('Stopped') disagreeing with the scheduler gate (sees a recent successful dream and suppresses the next cycle). manager.ts already short-circuits its own metadata write via the post-await abort check, but it can't block writes that already happened inside dream.ts. Adds the same abort-signal check between each write step here.
* fix(core): swallow releaseDreamLock errors so they don't poison outcome
If `releaseDreamLock` throws inside the inner finally (e.g. filesystem error on the lock file), the exception propagates to the outer catch and overwrites a successfully-completed dream record with 'failed'. The on-disk metadata is already up-to-date at that point, so the user sees a contradictory state — `lastDreamAt` was bumped but the UI shows a failure.
Wrap the release in a try/catch with `debugLogger.warn`. The lock file is still cleaned up on the next session via the existing staleness sweep, so swallowing the release error doesn't risk a permanent stuck lock.
* fix(core): thread abortSignal into dream metadata writes
The pre-call abort checks in `runManagedAutoMemoryDream` close most of the late-cancel race window, but each metadata helper itself does read → mutate → write across two awaits.
If the user cancels between those two awaits the write still happens, persisting `lastDreamAt` for an aborted run and suppressing the next legitimate dream cycle.
Thread `abortSignal` into `bumpMetadata` and `updateDreamMetadataResult`; both now re-check between the read and the write, returning early without persisting when the signal has already fired. The pre-call checks remain as the first line of defense; this guards the race that opens after the call enters.
* fix(core): close dream cancel race + surface lock-release failures
Two related fixes from the latest review pass:
- Move scheduler-gating metadata writes out of `runManagedAutoMemoryDream` and into `MemoryManager.runDream`, sequenced AFTER the status='completed' flip. The previous shape left a race window where cancellation arriving during/after `fs.writeFile` could persist `lastDreamAt` while the manager flipped status to 'cancelled' — visible UI ('Stopped') would disagree with the scheduler gate (sees a recent successful dream), suppressing the next legitimate dream cycle. The new order makes the gating metadata write race-free: once status !== 'running', cancelTask refuses, so any cancel arriving during the metadata write is ignored. The remaining "cancel raced the synchronous status update" window is handled by a post-update abort recheck that restores 'cancelled' and skips the metadata write. Drops the now-dead `bumpMetadata` helper from dream.ts; index rebuild stays there since it's informational, not gating.
- Surface `releaseDreamLock` failures on the task record's metadata (`lockReleaseError`). The previous fix logged-and-swallowed only, so a Windows EPERM or ENOENT race would silently block subsequent dreams as 'locked' with no UI signal explaining why dreaming had stopped. Logger keeps emitting the warn for debug bundles.
* fix(core,cli): address 8 review findings on dream surface
Reverts the metadata-write behavior regression, plugs the storeWith reentrancy hole, surfaces lock/metadata warnings in the UI, and a handful of cleanups:
- Wrap gating-metadata read+write in try/catch (manager.ts). The PR moved metadata writes from dream.ts (best-effort, swallowed) to manager.ts (unguarded). A throw from readDreamMetadata / writeDreamMetadata now propagates to the outer catch and overwrites a successfully-completed dream with 'failed' — the dream actually did its work and touched files are visible. New catch logs + writes `metadataWriteError` on the record so the UI can explain why the next dream may re-fire sooner than expected.
- Register the AbortController BEFORE storeWith in scheduleDream. storeWith fires a notify; a subscriber synchronously calling cancelTask(record.id) would otherwise see status='running' but no controller, hitting the missing-controller defensive warn path and reporting a phantom failure on a brand-new dream.
- Surface `lockReleaseError` and `metadataWriteError` in the dream view-model and DreamDetailBody (rendered as warnings, not errors, so the terminal status stays Completed). Previous fix wrote them to record.metadata only — nothing in the cli read or rendered them, so users still had no UI signal.
- Preserve `result.touchedTopics` on the unreachable cancel-raced-status-update branch. If a future refactor introduces an await there, the restored cancelled record would otherwise drop the already-produced result; the UI would report a clean cancellation even though memory files were already modified.
- Add `'cancelled'` to MemoryDreamEvent status union and emit a cancelled telemetry event from the runDream catch path. Without this a cancelled dream is indistinguishable from one that never scheduled in the first place.
- Drop the dead abortSignal param from updateDreamMetadataResult in dream.ts (no caller passes it after the manager.ts move).
- Swap declaration order of `computeDreamSig` and `refresh` in useBackgroundTaskView.ts (TDZ-fragile against a future refactor that adds a synchronous refresh call between them).
- Update the unreachable cancel-raced-update branch comment to describe what's actually true ("defense-in-depth, unreachable today") instead of the confusing "cancelTask flipped" path.
- cancelSelected now checks the cancelTask return value and logs via debugLogger when false. Today this branch is unreachable thanks to the controller-register-before-storeWith fix above, but if a future refactor breaks the invariant the silent ignore would let the user think the cancel took effect.
- Mirror the manual /dream metadata path's `recentSessionIdsSinceDream = []` reset in the auto path — the field is dead code today, but keeping the two write sites in sync avoids surprises.
Telemetry metric `recordMemoryDreamMetrics` widened to accept the new 'cancelled' status (downstream consumer in loggers.ts).
* fix(memory): same-session recovery when releaseDreamLock throws
The prior fix surfaced lockReleaseError on the dialog so the user knows the lock release failed (Windows EPERM, ENOENT race, disk full, etc.) — but until the next process start, dreamLockExists() still sees a fresh-mtime lock owned by an alive PID (us!) and silently suppresses every subsequent scheduleDream() call as `{status: 'skipped', skippedReason: 'locked'}`. The user sees the warning AND zero further dream activity, and the staleness sweep that would clean the leaked lock only runs at session start.
Adds a `dreamLockReleaseFailed` flag set in the catch. The next scheduleDream() force-cleans the leaked lock file via fs.rm({force: true}) before the existence check, so dream scheduling resumes within the same session. Best-effort: if even the forced rm fails (truly unrecoverable filesystem state), falls through to the existing 'locked' skip path.
This is an incremental improvement on top of b00ecdebd's UI-surface fix.
The two together give the full story: warning visible → automatic recovery on next attempt.
* fix(memory): real cancelled-dream duration + clarify index-rebuild ordering
Two follow-up suggestions from review:
- **`duration_ms: 0` in cancelled-dream telemetry** (manager.ts): the user-cancel path emitted `MemoryDreamEvent` with `duration_ms: 0`, which would silently skew latency histograms / p95 metrics by treating cancelled dreams as instant. Capture `dreamStartMs = Date.now()` at the top of `runDream` and emit the real elapsed time in the cancel branch.
- **Misleading index-rebuild comment** (dream.ts:75–84): the comment claimed the index rebuild "is still done before returning when topics were touched", but the code returns early on `abortSignal?.aborted` BEFORE the rebuild. Rewrote the comment to describe the actual cancel-aware ordering — abort returns partial result without rebuilding (rebuild is expensive; the next dream cycle will rebuild against the latest files anyway), live path rebuilds only when topics changed.
22/22 manager tests pass; tsc clean. |
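The `subscribe(fn, { taskType })` routing (C3) and the empty-bucket Map-leak fix from this series can be sketched together — a hypothetical class, not the real MemoryManager:

```typescript
// Illustrative sketch: typed listeners live in per-type Sets, and
// unsubscribe drops the Set itself once it empties, so repeated React
// mount/unmount cycles don't accumulate dead Map entries.
type Listener = () => void;

class NotifierSketch {
  private readonly all = new Set<Listener>();
  private readonly byType = new Map<string, Set<Listener>>();

  subscribe(fn: Listener, opts?: { taskType?: string }): () => void {
    if (opts?.taskType === undefined) {
      this.all.add(fn);
      return () => void this.all.delete(fn);
    }
    const type = opts.taskType;
    const bucket = this.byType.get(type) ?? new Set<Listener>();
    this.byType.set(type, bucket);
    bucket.add(fn);
    return () => {
      bucket.delete(fn);
      if (bucket.size === 0) this.byType.delete(type); // plug the leak
    };
  }

  notify(taskType: string): void {
    for (const fn of this.all) fn();
    for (const fn of this.byType.get(taskType) ?? []) fn();
  }

  typedBucketCount(): number {
    return this.byType.size;
  }
}
```

A listener filtered to `'dream'` wakes only on dream transitions, while an unfiltered listener sees every notify; unsubscribing the last typed listener removes its bucket entirely.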
||
|
|
a53c7508d2 |
fix(attribution): harden restoreFromSnapshot against corrupt payloads
1KMY (Copilot): snapshot.surface was copied without type validation.
A corrupted/partially-written snapshot with a non-string surface
(e.g. {}, 42, null) would later be serialized into the git note as
"[object Object]" and used as a Map key downstream, breaking the
expected payload shape. Type-check and fall back to the current
client surface for any non-string (or empty-string) value.
1KLq (Copilot): per-field sanitiseCount enforced
`promptCount >= 0` and `promptCountAtLastCommit >= 0` independently,
but never the cross-field invariant. A snapshot with
promptCountAtLastCommit > promptCount would surface a negative
getPromptsSinceLastCommit() and propagate as a "(-N)-shotted"
trailer into PR text. Clamp atLastCommit to total on restore.
1KL_ (Copilot): when a snapshot carried both the symlinked and
canonical paths for the same file (a session straddling the
canonicalisation fix), `set(realpathOrSelf(k), ...)` overwrote the
first entry with the second, silently dropping the AI contribution
the first form had accumulated. Merge instead: sum aiContribution
and OR aiCreated when collapsing duplicate keys.
Tests cover all three branches: non-string surface fallback,
promptCount clamp, and duplicate-key merge.
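The clamp and duplicate-key merge above can be sketched as follows — hypothetical names, and the real restoreFromSnapshot carries more fields:

```typescript
// Sketch of two restore-time guards: clamp promptCountAtLastCommit to
// promptCount, and merge (rather than overwrite) entries whose keys
// collapse to the same canonical path.
interface FileState {
  aiContribution: number;
  aiCreated: boolean;
}

function clampAtLastCommit(total: number, atLastCommit: number): number {
  // A corrupt snapshot with atLastCommit > total would otherwise yield
  // a negative prompts-since-last-commit count downstream.
  return Math.min(Math.max(atLastCommit, 0), Math.max(total, 0));
}

function restoreFileStates(
  entries: Array<[string, FileState]>,
  canonicalise: (p: string) => string, // e.g. realpath-or-self
): Map<string, FileState> {
  const out = new Map<string, FileState>();
  for (const [key, state] of entries) {
    const canon = canonicalise(key);
    const prev = out.get(canon);
    out.set(
      canon,
      prev
        ? {
            // Sum contributions and OR the created flag when a symlinked
            // and a canonical key land on the same file.
            aiContribution: prev.aiContribution + state.aiContribution,
            aiCreated: prev.aiCreated || state.aiCreated,
          }
        : state,
    );
  }
  return out;
}
```

With a session straddling the canonicalisation fix, both path forms collapse into one entry without dropping the first form's accumulated contribution.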
|
||
|
|
d429d90331 |
fix(attribution): toggle-off partial clear, normalizeGitCoAuthor type-check, terraform lockfile
0oAK (Copilot): the gitCoAuthor.commit toggle-off branch returned
before computing the committed file set, leaving the just-committed
files' tracked AI work in the singleton. Re-enabling the toggle and
committing the same file again would re-attribute earlier (already-
committed) AI edits to the new commit. Move the toggle gate AFTER
matchCommittedFiles so the finally block does a proper partial clear
of the just-committed files even when the note write is skipped.
0oAg (Copilot): normalizeGitCoAuthor copied value?.commit / value?.pr
without type-checking. settings.json is hand-editable; a stored
`{ commit: "false" }` reached runtime as a truthy string and behaved
as if attribution were enabled. Add a per-field bool coercion that
falls back to the schema default (true) for any non-boolean,
matching what the dialog and IDE schema already imply. Tests cover
the string / number / null cases.
0oAo (Copilot): v3→v4 shouldMigrate only special-cased versionless
legacy booleans — versionless files with invalid gitCoAuthor values
(`"off"`, `[]`, etc.) skipped the migration and the loader stamped
`$version: 4` over the bad value. Runtime normalization then
silently re-enabled attribution. Extend shouldMigrate to fire on ANY
versionless non-object value at general.gitCoAuthor; the existing
migrate() body's drop-and-warn path resets it. Already-object
shapes (hand-edited to v4) still skip cleanly. Tests added.
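The per-field bool coercion from 0oAg can be sketched as follows — the real normalizeGitCoAuthor's shape may differ; this just shows the rule that any non-boolean falls back to the schema default (true):

```typescript
// Hedged sketch: settings.json is hand-editable, so a stored
// { commit: "false" } must not reach runtime as a truthy string.
interface GitCoAuthorSettings {
  commit: boolean;
  pr: boolean;
}

function coerceBool(value: unknown, fallback: boolean): boolean {
  return typeof value === 'boolean' ? value : fallback;
}

function normalizeGitCoAuthorSketch(raw: unknown): GitCoAuthorSettings {
  const obj =
    raw !== null && typeof raw === 'object'
      ? (raw as Record<string, unknown>)
      : {};
  return {
    commit: coerceBool(obj['commit'], true),
    pr: coerceBool(obj['pr'], true),
  };
}
```

String, number, and null inputs all collapse to the defaults; genuine booleans pass through untouched.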
0oAt (Copilot): `.terraform.lock.hcl` got dropped from generated-file
exclusion when `.lock` was removed from the blanket extension list
in
|
||
|
|
dd45e17201 |
fix(attribution): runGit null-on-failure, versionless v3→v4 migration
z54M (Copilot): runGit returned '' on both successful-empty-output
and silent failure, so a `--name-only` that errored mid-way through
the diff fan-out aliased to a real `--allow-empty` commit. The
empty-commit branch then preserved pending attributions, leaving
the just-committed file's tracked AI edit alive to re-attribute on
the next commit. Switch runGit to `Promise<string | null>`,
distinguishing exit code 0 (any output, including '') from non-zero
(null). The diff-stage fan-out and ancillary probes now treat null
as analysis failure and bail with `return null` instead of falling
into the empty-commit path.
z539 (Copilot): the v3→v4 `shouldMigrate` only fired on
`$version === 3`. A versionless settings file carrying the legacy
`general.gitCoAuthor: false` boolean would skip every migration
(gitCoAuthor isn't in V1_INDICATOR_KEYS — it post-dates V2), get
its `$version` normalized to 4 by the loader, and leave the
boolean in place. The settings dialog then reads the V4
`{commit, pr}` shape, sees missing keys, defaults both to true, and
silently overwrites the user's opt-out on the next save. Also fire
when `$version` is absent AND the value at `general.gitCoAuthor`
is a boolean. Tests cover the new path and confirm the existing
versioned/object-shape paths are untouched.
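The z54M contract change can be sketched like this (generalised to any command, since the real runGit hardcodes git): exit code 0 resolves to stdout, even when stdout is '', while any non-zero exit resolves to null, so an empty diff is distinguishable from a failed invocation.

```typescript
// Sketch of the string-or-null contract using Node's execFile callback.
import { execFile } from 'node:child_process';

function runCmd(cmd: string, args: string[]): Promise<string | null> {
  return new Promise((resolve) => {
    execFile(cmd, args, (err, stdout) => {
      resolve(err ? null : stdout);
    });
  });
}

// Callers in the diff-stage fan-out would then bail on null instead of
// treating the failure like a real empty result:
async function namesOrBail(
  run: () => Promise<string | null>,
): Promise<string[] | null> {
  const out = await run();
  if (out === null) return null; // analysis failure, not an empty commit
  return out.split('\n').filter(Boolean);
}
```

The consumer-side distinction is the point: '' fans out to an empty list, null aborts the analysis.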
|
||
|
|
2a5be0d3b1
|
fix(core): prevent auto-memory recall from blocking main request (#3814)
* fix(core): prevent auto-memory recall from blocking main request (issue #3759)
The auto-memory recall side-query had a 5s AbortSignal.timeout that fired on every turn, and the main request path awaited the full recall promise (including timeout + heuristic fallback). This caused a ~5s delay on every user turn.
Changes:
- relevanceSelector.ts: reduce model-driven selector timeout from 5s to 2s
- client.ts: add resolveAutoMemoryWithDeadline() that races the recall promise against a 2.5s deadline, returning an empty result if recall hasn't completed in time
- client.test.ts: add 2 tests verifying slow recall doesn't block the main request and fast recall still includes memory content
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(test): remove stripThoughtsFromHistory from GeminiChat mocks
stripThoughtsFromHistory was removed from GeminiChat on main. The mock in client.test.ts still referenced it, causing TS2353 build failures in CI.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(core): address PR #3814 review comments on auto-memory recall deadline
Three changes from the review:
1. clearTimeout leak — The setTimeout in resolveAutoMemoryWithDeadline is now cleared in a finally block so the dangling timer doesn't block the Node.js event loop from draining during graceful process exit.
2. Telemetry inflation — An AbortController is created before the recall call and aborted when the 2.5s deadline fires. Its signal is passed into the recall pipeline via a new optional abortSignal field on ResolveRelevantAutoMemoryPromptOptions. All three logMemoryRecall calls in recall.ts are gated behind !options.abortSignal?.aborted so discarded results don't emit success metrics.
3. Test coverage — Added a test for the !promise guard path (getManagedAutoMemoryEnabled() returns false) verifying sendMessageStream completes without calling recall or injecting memory content.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(core): address PR #3814 review findings for auto-memory recall
Critical fixes:
- Wrap onDeadline() in try/finally inside resolveAutoMemoryWithDeadline so resolve() always runs even if onDeadline() throws, preventing deadlock
- Forward caller abortSignal through recall() → selectRelevantAutoMemoryDocumentsByModel → runSideQuery, combined with the existing 2s timeout via AbortSignal.any() so the model API call is cancelled when the 2.5s deadline fires
- Abort and clear pendingRecallAbortController on all early return paths (MaxSessionTurns, BoundedTurns=0, SessionTokenLimitExceeded, Arena control signal)
- Rename _pendingRecallAbortController → pendingRecallAbortController (inconsistent underscore prefix)
Tests added:
- client.test.ts: verify sendMessageStream completes normally when recall rejects
- relevanceSelector.test.ts: verify caller abort signal is forwarded combined with timeout, and timeout-only signal when no caller signal is provided
* fix(core): start auto-memory recall deadline at initiation time
Moves the resolveAutoMemoryWithDeadline() race from consumption time to initiation time so the 2.5s budget is not consumed by intermediate work (microcompact, compression, token checks, IDE context) between recall initiation and consumption. The raced promise is stored directly in relevantAutoMemoryPromise and simply awaited at consumption. The pendingRecallAbortController cleanup on early return paths is preserved unchanged.
* fix(core): address PR #3814 review feedback
- resetChat(): abort pendingRecallAbortController so stale in-flight recall does not leak into the next session.
- recall.ts catch: distinguish AbortError (deadline-triggered cancellation) from real model errors. AbortError logs at debug level with a message indicating the heuristic result was discarded. Real model errors continue to log at warn with the existing fallback message.
* fix(core): address PR #3814 round 3 review feedback
- Use explicit undefined check instead of non-null assertion for timer in resolveAutoMemoryWithDeadline (clearTimeout)
- Gate heuristic fallback in recall.ts on abortSignal.aborted so discarded results after the deadline don't produce output
- Downgrade AbortError log level from warn to debug in the client.ts recall catch block, keeping the warn channel meaningful for real failures
- Add tests verifying pendingRecallAbortController.abort() is called on MaxSessionTurns and SessionTokenLimitExceeded early-return paths
Co-Authored-By: Claude <noreply@anthropic.com>
* test(core): assert abort propagation in relevance selector
* fix(lint): remove unused eslint-disable directive in relevanceSelector.test.ts
The import/no-internal-modules eslint-disable comment was unnecessary — the rule does not flag same-package imports like ../utils/sideQuery.js. ESLint 9 reports it as an unused directive, failing the CI lint check.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
1ece87438f |
fix(attribution): preserve unstaged AI edits across cleanup branches
uxU5 + uxVQ + uxUO (Copilot): every cleanup branch in attachCommitAttribution that called clearAttributions(true) was wholesale-erasing pending AI edits for files the user never staged in this commit. Reviewer scenarios:
- multi-commit chain (`commit a && commit b`) bails out without writing a note, but unstaged edits to file Z (touched by neither commit) get cleared along with the chain's committed files.
- attribution toggle off: same — toggling the flag wipes pending unstaged work.
- analysis failure (shallow clone, --amend without reflog, partial diff failure): the finally-block fallback wholesale-cleared every pending file, consuming unrelated AI edits.
- 0%-AI commit: when no file in the commit was AI-touched, generateNotePayload was emitting a "0% AI" note attached to a commit that legitimately had no AI involvement — actively misleading metadata.
Add `noteCommitWithoutClearing()` to the service: snapshots the prompt counter as the new "at last commit" but leaves the per-file map alone. Use it in the multi-commit, no-tracked-edits, toggle-off, and analysis-failure paths. The committed-files partial-clear (clearAttributedFiles) still runs in the success path. The 0%-AI no-match case now skips the note write entirely. |
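The two cleanup modes described above can be sketched as follows — an illustrative store, not the real service API: noteCommitWithoutClearing snapshots the prompt counter but keeps every pending per-file attribution, while clearAttributedFiles drops only the files that were actually committed.

```typescript
// Illustrative sketch of the partial-clear vs snapshot-only modes.
class AttributionStoreSketch {
  readonly files = new Map<string, number>(); // path -> pending AI edits
  promptCount = 0;
  promptCountAtLastCommit = 0;

  noteCommitWithoutClearing(): void {
    // Multi-commit chains, toggle-off, no-tracked-edits, and
    // analysis-failure paths: unstaged AI work must survive.
    this.promptCountAtLastCommit = this.promptCount;
  }

  clearAttributedFiles(committed: Iterable<string>): void {
    // Success path: partial clear of just the committed files.
    for (const path of committed) this.files.delete(path);
    this.promptCountAtLastCommit = this.promptCount;
  }
}
```

After a bailed-out chain the store still remembers file Z's pending edits; after a successful note write only the committed files are dropped.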
||
|
|
325a12c3c1 |
chore(schema): regenerate settings.schema.json to match gitCoAuthor.commit description
The settingsSchema.ts source for `gitCoAuthor.commit.description` was
updated in
|
||
|
|
c7facf5013
|
fix(core): use per-model settings for fast model side queries (#3815)
Fixes #3765.
Side queries (session recap, title generation, tool-use summary) running on the fast model previously inherited the main model's ContentGeneratorConfig, leaking extra_body / samplingParams / reasoning between models.
GeminiClient.generateContent() now resolves the target model's own config via buildAgentContentGeneratorConfig() when the requested model differs from the main model, and creates a dedicated ContentGenerator (cached, cleared on resetChat). Cross-authType resolution lets the fast model live under a different provider than the main model. Uses the target model's authType for retry logic so provider-specific checks (e.g. QWEN_OAUTH quota detection) reference the correct provider. Falls back to the main generator if the model is not in the registry. |
||
|
|
5d0adbb5bd
|
fix(core): activate skills from discovered result paths (#3852)
* fix(core): activate skills from discovered result paths
* fix(core): address result path review feedback
* fix(core): handle grep result path edge cases
* fix(core): trust fallback grep result paths |
||
|
|
f4a9f7bfd7
|
feat(weixin): add image sending support via CDN upload (#3781)
* feat(weixin): add image sending support via CDN upload
* fix(weixin): address PR review — path validation, encoding, timeout, error handling
Critical fixes from wenshao's review of feat/weixin-image-send:
1. File read vulnerability: add validateImagePath() in send.ts with
directory allowlist, extension filter, magic-byte check, 20MB cap,
and realpath resolution. Pass workspace cwd as allowed dir.
2. aes_key encoding: change from base64(hex-ascii) to base64(raw 16B)
to match the protocol expectation (images use raw bytes, not hex).
3. uploadToCdn timeout: add AbortController + 40s timeout per retry
attempt to prevent hanging on stalled CDN connections.
4. Unhandled rejection: wrap fallback sendText() in catch block with
its own try/catch to prevent process crash on double failure.
5. Default instructions merge: append image capability guide when
custom instructions lack [IMAGE:], instead of silently dropping it.
6. Dead code: remove unused imagePaths parameter from sendMessage().
7. Regex hardening: strip code blocks before [IMAGE:] extraction,
filter empty paths to prevent confusing readFileSync('') errors.
8. URL validation: reject http:// URLs and validate CDN hostname in
uploadToCdn (SSRF prevention).
Tests: replace identity mock with real encryptAesEcb/computeMd5 so
padding mismatches are caught; fix partial node:crypto mock.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(weixin): address 2nd round PR review — 10 issues
Critical fixes:
1. detectImageMime: add JPEG magic bytes (0xFF 0xD8 0xFF), throw on
unrecognized format instead of defaulting to image/jpeg
2. getUploadUrl retry: pass errcode to WeixinApiError, add ret field
check in isRetryableError so actual API errors trigger retries
3. ALLOWED_DIRS: add realpathSync('/tmp/') and realpathSync(tmpdir())
to handle macOS symlink resolution (/tmp → /private/tmp)
4. [IMAGE:] stripping: only replace markers that were actually parsed,
preserving [IMAGE:] inside code blocks in displayed text
5. TOCTOU fix: use openSync/readSync(16B) for magic-byte check instead
of reading the entire file twice
6. sendMessage: check both ret and errcode fields for error detection
Suggestions:
7. connect(): avoid mutating this.config.instructions on reconnect
8. Fix duplicate Step 2 comment numbering
9. Replace errcode=${undefined} with errcode=${resp.errcode ?? '(none)'}
10. stderr: log structured (status=, ret=) instead of raw errmsg
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(weixin): 3rd round PR review — errcode checks, error logging, timeout, path resolution
- api.ts: add errcode check in getUploadUrl (align with sendMessage)
- api.ts: pass ret/errcode from CDN error to WeixinApiError
- send.ts: resolve workspace dirs with realpathSync
- WeixinAdapter.ts: include errcode and err.message in error log
- media.ts: add 40s timeout to downloadAndDecrypt fetch
- send.test.ts: add 12 tests covering detectImageMime and validateImagePath error branches
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
---------
Co-authored-by: Maidong <408097061@qq.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
|
||
|
|
296fb55aef |
fix(attribution): preHead race, regex apostrophe-escape, surface failures, dead code
t2G0 (deepseek-v4-pro): addCoAuthorToGitCommit single-quote regex now matches the bash close-escape-reopen apostrophe form using ((?:[^']|'\\'')*) — the same pattern bodySinglePattern uses for gh pr create. Input like git commit -m 'don'\''t' was previously silently un-rewritten because the negative lookahead bailed; the trailer now lands at the FINAL closing quote. Test updated. tMBP (gpt-5.5): preHead capture switched from concurrent async getGitHead to a synchronous getGitHeadSync (execFileSync) BEFORE ShellExecutionService.execute spawns the user's command. A fast hot-cached git commit could move HEAD before the async rev-parse resolved, leaving preHead === postHead and silently skipping the attribution note. Trade ~10–50 ms event-loop block per commit-shaped command for correctness of the post-command HEAD comparison. t2Gv (deepseek-v4-pro): attribution write failures (note exec non-zero, payload too large, diff-analysis exception, shallow clone / amend-without-reflog) are now surfaced on the shell tool's returnDisplay AND llmContent so the user and agent both see when their commit succeeded but the per-file git note didn't land. attachCommitAttribution now returns string | null (warning text or null for intentional skips like no-tracked-edits). Co-authored-by trailer is unaffected — only the note is gated by these failures. t2Gy (deepseek-v4-pro): committedAbsolutePaths now matches against the canonical keys already stored in fileAttributions (matchCommittedFiles iterates by relative path against the canonical repo root) instead of re-resolving each diff path on the fly. realpathSync(resolved) failed for deleted files and didn't follow intermediate symlinks, leaving stale per-file attribution alive past commit and inflating AI percentages on subsequent commits. t2HI (deepseek-v4-pro): removed dead sessionBaselines / FileBaseline / contentHash / computeContentHash infrastructure (~40 lines). 
The fields were written, persisted, and restored but never read for any computation or decision. AttributionSnapshot schema stays at version 1 — restore tolerates pre-fix snapshots that carried the now-ignored baselines field. t2HM (deepseek-v4-pro): extracted the duplicated lastMatch helper in addCoAuthorToGitCommit and addAttributionToPR into a single module-level lastMatchOf so future fixes can't be applied to only one copy. |
||
|
|
e4bb0181ad |
fix(attribution): depth-1 shallow detection, snapshot dedup post-rewind/post-failure
sfGz (Copilot): rev-list --count HEAD === 1 cannot distinguish a true root commit from a depth-1 shallow clone — both report 1 because rev-list only walks locally available objects. Switch to git log -1 --pretty=%P HEAD which reads the parent SHA directly from commit metadata: empty means a real root, non-empty means a parent is recorded (whether or not its object is local). The shallow-clone bail is now reliable. sfIm (Copilot): the dedup key persisted across rewindRecording, so the previous snapshot living on the now-abandoned branch would match the next post-rewind snapshot and silently skip the write, leaving /resume on the rewound session with no attribution state. Reset lastAttributionSnapshotJson when rewindRecording fires. sfJE (Copilot): dedup key was committed before the async write settled. A transient write failure would update the key, then permanently suppress all future identical snapshots even though nothing was ever persisted. Switch to optimistic-set then rollback on appendRecord rejection — synchronous identical calls dedup cleanly, but a failed write clears the key so the next identical snapshot retries. appendRecord now returns the per-record write promise (writeChain still has its swallow-catch for chain liveness) so callers needing per-write success can react to it. Tests added in chatRecordingService.test.ts for both rewind-reset and rollback-on-failure paths. |
||
|
|
174b3ac951
|
feat(cli): Add ability to switch models non-interactively from the cli (#3783)
* Add ability to switch models non-interactively from the cli This fulfills request #3410 * Update packages/cli/src/ui/commands/modelCommand.ts Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com> * Update packages/cli/src/ui/commands/modelCommand.ts Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com> * Update packages/cli/src/ui/commands/modelCommand.ts Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com> * Update packages/cli/src/ui/commands/modelCommand.ts Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com> * Update packages/cli/src/ui/commands/modelCommand.ts Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com> * Revert "Update packages/cli/src/ui/commands/modelCommand.ts" This reverts commit |
||
|
|
89cb326c2f
|
feat(cli): improve export format completion navigation (#3701)
Some checks are pending
Qwen Code CI / Lint (push) Waiting to run
Qwen Code CI / Test (push) Blocked by required conditions
Qwen Code CI / Test-1 (push) Blocked by required conditions
Qwen Code CI / Test-2 (push) Blocked by required conditions
Qwen Code CI / Test-3 (push) Blocked by required conditions
Qwen Code CI / Test-4 (push) Blocked by required conditions
Qwen Code CI / Test-5 (push) Blocked by required conditions
Qwen Code CI / Test-6 (push) Blocked by required conditions
Qwen Code CI / Test-7 (push) Blocked by required conditions
Qwen Code CI / Test-8 (push) Blocked by required conditions
Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions
Qwen Code CI / CodeQL (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run
E2E Tests / E2E Test - macOS (push) Waiting to run
SDK Python / SDK Python (3.11) (push) Waiting to run
SDK Python / SDK Python (3.12) (push) Waiting to run
SDK Python / SDK Python (3.10) (push) Waiting to run
* feat(cli): improve export format completion navigation * fix(cli): address PR #3701 review feedback on /export completion Critical: - Guard phase-2 cycling by checking buffer text starts with "/export " so a manually edited buffer is never clobbered by stale nav state (C1) - Derive export format suggestions from slashCommands.subCommands to keep a single source of truth with the command registry (C2) - Reset completionSelectionWasNavigatedRef on showSuggestions rising edge instead of on every suggestions change to avoid a race where an already-navigated selection is forgotten before Enter (C3) - Add regression tests for isPerfectMatch + navigated + Enter, including the positive path and a control case (C4) Suggestions: - Prefix-guard getExportFormatFromInput to skip regex on non-/export input (S1) - Drop trailing space from setExportCompletionInput output so buffer text is no longer implicitly coupled to the cycling heuristic (S2) - Document the two-phase state machine (one-shot fill + cycling) (S3) - Accept Tab as an additional cycling key alongside Up/Down (S4) - Remove the unconditional ref reset at the tail of handleInput; correctness is now guaranteed by the buffer-text guard (C1) and the showSuggestions edge-triggered useEffect (C3) (S5) * fix(cli): tighten export completion cycling guard and unify Tab behavior - Phase 2 cycling guard: replace startsWith('/export ') with strict getExportFormatFromInput() to prevent overwriting inputs with extra arguments (e.g. '/export html --verbose'). - ACCEPT_SUGGESTION in Phase 1 popup: add hasExportFormatSuggestions branch so Tab/Enter seeds exportCompletionSelectionIndexRef, allowing Phase 2 cycling to continue from the selected format (consistent with Up/Down arrow behavior). - Add 4 regression tests: Phase 1 Up wrap, Phase 2 Up wrap, Tab seed + Phase 2 Tab cycle, guard prevents overwriting extra args. 
Ref: PR #3701 second-round review by wenshao * fix(cli): address PR #3701 third-round review feedback on /export completion - S6: use dynamic exportFormatSuggestions.findIndex() for highlight index instead of static EXPORT_FORMAT_COMPLETIONS.indexOf() - S7: derive Phase 2 cycling current index from buffer text via getExportFormatFromInput + indexOf, with defensive ref fallback - S8: extract getNextExportCompletionIndex as module-level pure function; cache exportCycleFormats via useMemo to avoid per-keystroke .map() - S9/S10: add tests for ESC and Ctrl+C reset of export cycling state * fix(cli): tighten /export prefix guard, add superset matching fallthrough, and improve documentation * fix(cli): address review #4224860127 - smaller notes optimization - S1: Strengthen EXPORT_FORMAT_COMPLETIONS fallback comment with an IMPORTANT sync warning for format removals - S2: De-export getExportFormatFromInput (no external consumers) - S3: Add intermediate buffer-clear assertion after Ctrl+U in test to pin state and prevent false positives from future hook changes * refactor(cli): extract export completion into useExportCompletion hook Address all feedback from PR #3701 review comment: - Extract ~310 lines of /export state machine from InputPrompt into dedicated useExportCompletion hook - Replace exportCompletionSelectionIndexRef (number|null) with cyclingActiveRef (boolean) since index was never read - Simplify navigated-flag lifecycle: reset on buffer.text changes instead of popup visibility transitions; add navigatedTextRef snapshot to prevent sticky autocomplete after buffer edits - Remove static EXPORT_FORMAT_COMPLETIONS fallback; derive entirely from slashCommands.subCommands - Aggregate 4 parallel ternaries into single suggestionDisplayProps - Add regression test: navigate + backspace + retype + Enter should submit raw buffer, not autocomplete - Remove redundant navigatedRef reset in ESC handler (already covered by exportCompletion.reset()) * fix(cli): guard export 
completion state |
||
|
|
3c0e3293be |
fix(attribution): dedup snapshot writes, cap excludedGenerated, doc commit toggle scope
rsf- (Copilot): recordAttributionSnapshot wrote a full snapshot to the JSONL on every non-retry turn, even when the tracked state was unchanged. Long-running sessions accumulated thousands of identical snapshot copies, inflating session size and slowing /resume hydrate. Dedup by JSON-equality with the prior write — first write always goes through, identical successors are no-ops. rsgo (Copilot): excludedGenerated path list was unbounded. A commit churning thousands of generated artifacts (large dist/ rebuild) could push the JSON note past MAX_NOTE_BYTES (30KB) and lose attribution for the real source files in the same commit. Cap the serialized sample at MAX_EXCLUDED_GENERATED_SAMPLE (50) and add excludedGeneratedCount for the true total. rsg9 + rshM (Copilot): the gitCoAuthor.commit description claimed the toggle only controlled the Co-authored-by trailer, but attachCommitAttribution also gates the per-file git-notes payload on the same flag. Update both the schema description and the settings.md table to mention both effects so disabling the option isn't a silent surprise. |
||
|
|
090758c5b1 |
fix(attribution): drop unsafe full-clear, tag analysis-failure with null
ju1p (Copilot): the `else if (commitCtx.hasCommit)` branch fully cleared the singleton on `cd /abs/same-repo/subdir && git commit` (or `git -C . commit`), losing pending AI edits the user hadn't staged. We can't tell which files were in the commit from this branch, and the next attributable commit's partial-clear handles cleanup correctly anyway. Drop the branch entirely. ju2D (Copilot): `getCommittedFileInfo` returned the same empty StagedFileInfo for both "could not analyze" (shallow clone, --amend without reflog, --numstat partial failure, exception) and "intentionally empty" (--allow-empty). The caller couldn't tell them apart, so the partial clear became a no-op on analysis failure and the just-committed AI edits leaked to the next commit. Switch the return type to `StagedFileInfo | null` and have the caller treat null as "fall back to full clear" while empty StagedFileInfo (--allow-empty) leaves attributions intact for the next real commit. |
||
|
|
07441cc1e3
|
feat(sdk-python): add network timeouts to release version helper (#3833) | ||
|
|
ec51fd3138
|
fix(core): coalesce MCP server rediscovery (#3818)
* fix(core): coalesce MCP server rediscovery * test(core): assert MCP rediscovery cleanup * fix(core): address MCP rediscovery review feedback * fix(core): preserve MCP rediscovery health checks --------- Co-authored-by: cyphercodes <cyphercodes@users.noreply.github.com> |
||
|
|
095a39a8d5
|
fix(acp): run auto compression before model sends (#3698) | ||
|
|
9e731698ae |
fix(attribution): submodule leak, PR body nesting, shallow-clone bail, schema default
- attachCommitAttribution: when HEAD didn't move in our cwd, leave
pending attributions alone instead of dropping them. The case can be
a failed commit, `git reset HEAD~1`, OR `cd submodule && git commit`
(inner repo's HEAD moves, ours doesn't). Dropping was overly
aggressive and silently lost outer-repo edits in the submodule case.
- addAttributionToPR: mirror addCoAuthorToGitCommit's nested-match
rejection so `gh pr create --body "docs mention -b 'flag'"` picks the
outer `--body`, not the inner literal `-b`. Splicing into the inner
match would corrupt the body. Regression test added.
- getCommittedFileInfo: when `rev-parse --verify HEAD~1` fails, also
check `rev-list --count HEAD === 1` to confirm HEAD is the true
root commit. In a shallow clone, HEAD~1 is unreadable but the commit
has a parent recorded — falling back to `diff-tree --root` would
diff against the empty tree and over-attribute the entire commit.
Bail with a debug warning instead.
- generate-settings-schema: lift `default` (and `description`) out of
the inner `anyOf[N]` schema to the outer level when wrapping with
`legacyTypes`. Most JSON-schema-driven editors only surface
top-level defaults; burying the default under `anyOf` lost the
"enabled by default" hint. Also extend the default filter to
publish non-empty plain objects (so `gitCoAuthor`'s default can
appear). gitCoAuthor's source default updated to the runtime shape
`{commit: true, pr: true}` to match `normalizeGitCoAuthor`.
|
||
|
|
0103af5296 |
Merge remote-tracking branch 'origin/main' into feat/commit-attribution
# Conflicts: # packages/core/src/tools/shell.ts |
||
|
|
0501c7165b
|
fix(telemetry): add bounded shutdown timeout and fix service.version resource attribute (#3813)
Some checks are pending
Qwen Code CI / Lint (push) Waiting to run
Qwen Code CI / Test-2 (push) Blocked by required conditions
Qwen Code CI / Test (push) Blocked by required conditions
Qwen Code CI / Test-1 (push) Blocked by required conditions
Qwen Code CI / Test-3 (push) Blocked by required conditions
Qwen Code CI / Test-4 (push) Blocked by required conditions
Qwen Code CI / Test-5 (push) Blocked by required conditions
Qwen Code CI / Test-6 (push) Blocked by required conditions
Qwen Code CI / Test-7 (push) Blocked by required conditions
Qwen Code CI / Test-8 (push) Blocked by required conditions
Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions
Qwen Code CI / CodeQL (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run
E2E Tests / E2E Test - macOS (push) Waiting to run
SDK Python / SDK Python (3.12) (push) Waiting to run
SDK Python / SDK Python (3.10) (push) Waiting to run
SDK Python / SDK Python (3.11) (push) Waiting to run
* fix(telemetry): add bounded shutdown timeout and fix service.version resource attribute `shutdownTelemetry()` awaited `sdk.shutdown()` with no time bound, causing the CLI to hang indefinitely when the OTLP endpoint was unreachable. Wrap it in a `Promise.race` with a 10-second timeout that fails open (logs a warning and proceeds with cleanup). The `service.version` resource attribute was set to `process.version` (Node.js runtime version) instead of the actual application version. Replace it with `config.getCliVersion() || 'unknown'` to match the convention used throughout the codebase. Closes #3811 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(telemetry): remove redundant diag.error and add .catch() clarifying comment - Remove duplicate `diag.error()` in shutdown catch block — a rejected shutdown already surfaces as a thrown exception; only `debugLogger.error` is needed (consistent with pre-existing behavior). - Add comment to `.catch()` handler explaining why rejections are silently dropped when timeout hasn't fired (they're caught by the try/catch via Promise.race). - Update rejection test to remove stale `diag.error` assertion. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(telemetry): use || for getCliVersion fallback and clarify Promise.resolve wrap - Revert `??` back to `||` for `getCliVersion()` fallback to stay consistent with the 6 other call sites across the codebase and guard against empty-string misconfiguration. - Add comment explaining why `Promise.resolve()` wraps `shutdown()` (defensive measure for auto-mocked environments). 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * fix(telemetry): restore diag.error in catch block and add test assertion - Re-add `diag.error()` to the shutdown rejection catch block so the error is visible via OTel diagnostic channel, matching the timeout path which uses `diag.warn`. 
- Add `diagErrorSpy` assertion to the rejection test so the test name ("should log error") is backed by actual log verification. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) --------- Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com> |
||
|
|
2c93fd670c
|
refactor: extract shared release helper utilities (#3834)
Move four duplicated utility functions (getArgs, readJson,
validateVersion, isExpectedMissingGitHubRelease) from the three
get-release-version.js scripts into a shared module at
scripts/lib/release-helpers.js so that changes only need to happen
in one place.
Also fixes a pre-existing bug in getArgs where argument values
containing '=' were silently truncated (e.g. --msg=a=b produced
{msg:'a'} instead of {msg:'a=b'}).
Closes #3795
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com>
|
||
|
|
59845407fc
|
fix(openai): parse MiniMax thinking tags (#3677)
* fix(openai): parse MiniMax thinking tags Add a MiniMax OpenAI-compatible provider that opts into tagged thinking parsing for minimaxi.com endpoints. Split MiniMax <think>/<thinking> content into thought parts while preserving default OpenAI-compatible behavior, and avoid rendering leading blank stream chunks as empty assistant rows. * fix(openai): address PR #3677 review — optimize MiniMax tagged thinking parser - perf(taggedThinkingParser): pre-compute lowercase buffer once per parse() call, use startsWith(tag, offset) to avoid O(N²) per-char slice+toLowerCase allocations - feat(minimax): expand host matching from exact Set to *.minimaxi.com / *.minimax.io wildcard suffix, covering gateway / custom subdomains - docs(converter): add comment clarifying finish_reason flush of buffered tagged-thinking content on stream end * test(openai): add standalone unit tests for TaggedThinkingParser Add taggedThinkingParser.test.ts with 23 test cases covering the boundary scenarios requested in PR review #4219047370: empty tag content, close tags in text mode, pure partial-tag-prefix chunks, multi-chunk splitting, final flag flush behavior, and case insensitivity. * fix(openai): add observability, host guards, and reasoning_content safeguard for MiniMax tagged thinking - Add debugLogger.warn on unclosed thought flush (observability) - Add debugLogger.debug for tag detection and parts count - Add cross-matching tag comment explaining binary mode toggle - Add MINIMAX_KNOWN_HOSTS exact match layer with suffix fallback - Add reasoning_content guard to avoid duplicating thought parts - Add cross-matching and unclosed flush tests (+4 cases) - Add known host exact match and custom proxy tests (+4 cases) |
||
|
|
2e69d641d5
|
fix(core): unescape shell-escaped file paths in Edit, WriteFile, and ReadFile tools (#3820)
Some checks are pending
Qwen Code CI / Post Coverage Comment (push) Blocked by required conditions
Qwen Code CI / Lint (push) Waiting to run
Qwen Code CI / Test (push) Blocked by required conditions
Qwen Code CI / Test-1 (push) Blocked by required conditions
Qwen Code CI / Test-2 (push) Blocked by required conditions
Qwen Code CI / Test-3 (push) Blocked by required conditions
Qwen Code CI / Test-4 (push) Blocked by required conditions
Qwen Code CI / Test-5 (push) Blocked by required conditions
Qwen Code CI / Test-8 (push) Blocked by required conditions
Qwen Code CI / Test-6 (push) Blocked by required conditions
Qwen Code CI / Test-7 (push) Blocked by required conditions
Qwen Code CI / CodeQL (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:docker (push) Waiting to run
E2E Tests / E2E Test (Linux) - sandbox:none (push) Waiting to run
E2E Tests / E2E Test - macOS (push) Waiting to run
* fix(core): unescape shell-escaped file_path in Edit, WriteFile, and ReadFile tools Shell-escaped paths (e.g. my\ file.txt) from at-completion can reach the LLM without going through atCommandProcessor unescaping. When the LLM passes such an escaped path to a file tool, the backslash is treated as a literal character in the filename, causing the tool to fail with file-not-found. Add defensive unescapePath() normalization at the start of each tool validateToolParamValues(), so that escaped paths are converted back to real filesystem paths before any I/O. Also normalize the path in coreToolScheduler conditional-rules injection path for consistency. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): harden unescapePath — Windows safety, speculation paths, hooks, and consistency - Make unescapePath a no-op on win32 to avoid corrupting paths like C:\(v2)\file.txt where backslashes are path separators. - Apply unescapePath in speculationToolGate before overlay FS lookups so speculation mode finds files given escaped paths. - Normalize file_path/path in hook toolInput so custom hooks receive actual filesystem paths. - Add unescapePath to Grep, Glob, Ls, and RipGrep validateToolParamValues for parity with the file tools. - Apply unescapePath in getModifyContext callbacks (Edit, WriteFile) so the modify-with-editor flow works when request.args still holds escaped paths. - Add .trim() to Edit and WriteFile path normalization for consistency with ReadFile. - Use .trim() in conditional-rules matchAndConsume for parity. - Gate literal-backslash test to non-win32; add Windows no-op test for unescapePath. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): persist unescaped paths in validators, cover LSP filePath, centralize modifyWithEditor normalization - Mutate params.path in Glob/Grep/Ls/RipGrep validators so invocations receive the normalized path, not the original escaped form. 
- Add unescapePath to LspTool.validateToolParamValues for filePath. - Extend scheduler path normalization to filePath + notebook_path keys, mirroring speculationToolGate's key set. - Move unescapePath from getModifyContext callbacks into coreToolScheduler's modifyWithEditor path to avoid double-unescape and keep a single normalization site. - Add .trim() to speculationToolGate paths for consistency. - Show normalized path in ls.ts validation error message. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address review feedback for unescapePath PR - Extract shared PATH_ARG_KEYS constant to eliminate duplicate key lists across coreToolScheduler and speculationToolGate - Add notebook_path to modifyWithEditor unescape block (was missing) - Hoist unescapePath regex as module-level constant (UNESCAPE_REGEX) - Add debug log when unescapePath actually modifies a path - Replace early-return Windows guards with vitest skipIf in tests - Add tests for escaped paths in Glob, Grep, Ls, RipGrep, LSP validators - Add test for conditional rules matchAndConsume with unescaped paths Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): fix import order in grep.ts, consolidate ls.ts imports, win32-guard write-file tests - grep.ts: move debugLogger declaration after all import statements - ls.ts: consolidate two imports from ../utils/paths.js into one - write-file.test.ts: add skipIf(process.platform === 'win32') to shell-escape tests (unescapePath is a no-op on win32) Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(core): address review — double-unescape, circular import, lsp asymmetry, win32 test guards - coreToolScheduler.ts: remove redundant unescapePath in conditional-rules injection (toolInput was already normalized; unescapePath is not idempotent for \\X sequences) - paths.ts: remove debugLogger to break circular import chain (paths → debugLogger → storage → paths). The single debug log line is low signal — dropped entirely. 
- lsp.ts: remove dead .trim() checks in FILE_REQUIRED_OPERATIONS and RANGE_REQUIRED_OPERATIONS (filePath is already trimmed by validator) - Add it.skipIf(process.platform === 'win32') to 11 escaped-path tests across edit.test.ts, read-file.test.ts, glob.test.ts, grep.test.ts, ripGrep.test.ts, ls.test.ts, lsp.test.ts Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(test): add win32 skipIf guards to atCommandProcessor, rulesDiscovery, fileSearch tests Four more escaped-path tests that fail on Windows because unescapePath is a no-op on win32: - atCommandProcessor.test.ts: two @-command unescape tests - rulesDiscovery.test.ts: shell-escaped match test - fileSearch.test.ts: special-char escaping test Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(test): consolidate duplicate imports in rulesDiscovery.test.ts Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> --------- Co-authored-by: cici <cici@cicideMacBook-Pro.local> Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> |
||
|
|
3631c01e17
|
feat(skills): parallelize loading + add path-conditional activation (#3604)
* perf(skills): parallelize skill loading with Promise.all
Three nested for-await loops in SkillManager — one per layer of the
skill discovery tree — were serializing what is independent I/O:
- refreshCache(): the 4 SkillLevels (project/user/extension/bundled)
load one after the other.
- listSkillsAtLevel(): each provider directory (.qwen, .agent,
.cursor, ...) is read sequentially.
- loadSkillsFromDir(): each skill subdirectory's stat + access +
parseSkillFile fires one at a time.
Replace each layer with Promise.all so the I/O fans out. Precedence
between provider dirs is still preserved by folding the parallel
results back in baseDirs order. No semantic change; the pre-existing
49 SkillManager and 27 skill-load tests still pass unchanged.
* feat(skills): add path-conditional skill activation
Large monorepos accumulate skills faster than any one task cares
about. Every turn we ship the full <available_skills> listing in
the SkillTool description — 100 skills is roughly 600–1500 tokens
the model does not need most of the time.
Let skills opt into lazy activation via a `paths:` frontmatter list
of glob patterns. Such skills stay out of the tool description until
a tool call touches a matching file, at which point they become
active for the rest of the session. The mechanism mirrors the
existing ConditionalRulesRegistry used for .qwen/rules/.
Shape:
- SkillConfig gains `paths?: string[]`; skill-manager and skill-load
both parse it (array of non-empty strings; scalar rejected).
- New skill-activation.ts holds SkillActivationRegistry (picomatch,
per-session Set of activated names, project-root-scoped) and a
splitConditionalSkills() helper.
- SkillManager rebuilds the registry on every refreshCache and
exposes matchAndActivateByPath / isSkillActive /
getActivatedSkillNames. Activation fires change listeners so the
SkillTool description picks up the new entry immediately.
- SkillTool.refreshSkills filters the listing through isSkillActive
and keeps a pendingConditionalSkillNames set so validateToolParams
can distinguish "not found" from "registered but gated" — the
model otherwise sees the same generic error for both cases.
- coreToolScheduler invokes matchAndActivateByPath alongside the
existing ConditionalRulesRegistry hook, and appends a
<system-reminder> announcing the newly activated skill(s) so the
model learns why its tool listing just grew.
Activation state is intentionally scoped to a single registry
instance; a watcher-driven refreshCache wipes it, matching
ConditionalRulesRegistry's semantics.
Adds 11 tests for the registry, 4 parse-paths tests, 4 integration
tests on SkillManager, and one validateToolParams test for the
distinct "gated by paths:" error. All 197 related tests pass.
* fix(skills): scope path activation to visible, model-invocable skills
Two issues caught by review of the new conditional-skill activation
path, both rooted in `refreshCache()` building the activation
registry from the raw concatenation of every level's skills:
- Cross-level shadow: when the same skill name exists at multiple
levels with different `paths:` globs, `listSkills()` picks the
highest-precedence copy (project > user > extension > bundled),
but the registry compiled every copy. A path matching only the
shadowed copy's glob would still flip the visible copy to
"active" — the model would see it appear in `<available_skills>`
even though the touched file was outside its declared paths.
- Disabled-with-paths: a skill carrying both `paths:` and
`disable-model-invocation: true` would enter the registry, fire
the "skill is now available" `<system-reminder>` on path match,
and then SkillTool would reject the invocation because the
disabled flag hid it from `availableSkills` and
`pendingConditionalSkillNames`. The model gets a generic "not
found" after being told the skill exists.
Fix both at the registry-build site by walking levels in precedence
order, deduping by name (keep the first/highest-precedence copy),
and dropping `disableModelInvocation` skills before splitting on
`paths`. Adds two regression tests in `skill-manager.test.ts`.
* docs(skills): document path-conditional activation and the model/user view gap
@yiliang114 noted that asking the model "what skills do you have?"
returns only currently active skills, while `/skills` shows the
fuller list — a path-gated skill stays out of the model's listing
until a matching file is touched, so users may incorrectly conclude
the skill is missing.
Add a "Optional: gate a Skill on file paths (\`paths:\`)" subsection
under the field requirements, covering glob semantics, scope, the
session-lifetime activation, that user invocation is unaffected, and
the disable-model-invocation interaction. Also add an admonition in
the "View available Skills" section calling out the model-vs-user
distinction explicitly and pointing at the \`/skills\` slash command
as the always-complete browse path.
* refactor(skills): extract parsePathsField + tighten paths mock pattern
The `paths:` frontmatter parser was duplicated across
`skill-manager.ts:parseSkillContent` and
`skill-load.ts:parseSkillContent`. Future validation tweaks
(e.g. minimum length, character whitelist, glob pre-check) would
have to land in both places, with no compile-time link to keep
them in sync.
Extract `parsePathsField(frontmatter)` into `types.ts` next to the
existing `parseModelField`, and call it from both parsers. Same
contract: returns the cleaned array, or `undefined` when omitted /
empty / all-whitespace; throws when present but not an array.
Adds 8 tests in `skill-load.test.ts` covering the contract.
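The extracted helper's contract could look roughly like this — an illustrative sketch, not the real `parsePathsField` in `types.ts`, which may differ in details:

```typescript
// Sketch of the shared parser contract described above: returns the cleaned
// array, or undefined when omitted / empty / all-whitespace; throws when the
// field is present but not an array.
function parsePathsField(frontmatter: Record<string, unknown>): string[] | undefined {
  const raw = frontmatter['paths'];
  if (raw === undefined) return undefined; // field omitted
  if (!Array.isArray(raw)) {
    throw new Error(`"paths" must be an array, got ${typeof raw}`);
  }
  const cleaned = raw
    .filter((p): p is string => typeof p === 'string')
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  return cleaned.length > 0 ? cleaned : undefined; // empty / all-whitespace → undefined
}
```

Centralizing the contract means a future validation tweak (minimum length, glob pre-check) lands in one place for both parsers.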
Also tighten the `paths:` branch in the `skill-manager.test.ts`
mock yaml parser. The previous `yamlString.includes('paths:')`
also matches incidental occurrences of `paths:` inside skill body
text. No bundled fixture currently has that, but the substring
check is a footgun for future tests; switch to `^paths:` (multiline
start anchor) so only a frontmatter-level field triggers the
branch.
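The difference between the substring footgun and the anchored check can be shown with two illustrative strings (`hasPathsField` is a stand-in for the mock's branch condition, not the real test code):

```typescript
// Multiline start anchor: only a line that BEGINS with `paths:` triggers
// the branch, not an incidental mid-line mention of the token.
const hasPathsField = (yamlString: string): boolean => /^paths:/m.test(yamlString);

const frontmatterField = 'name: demo\npaths:\n  - "src/**"';
const incidentalMention = 'name: demo\ndescription: explains paths: globs';
```

The old `yamlString.includes('paths:')` check matches both strings; the anchored regex matches only the first.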
* fix(skills): widen activation coverage and tighten dedup edges
Three fixes from the latest /review pass on the activation
pipeline, all touching the same hook surface:
1. Activation only fired on `file_path` — read-file / edit /
write-file. Tools that touch the filesystem under different
parameter names (`path` for ls and ripGrep, `filePath` for
grep and lsp, `paths` array for ripGrep multi-path) silently
skipped both ConditionalRulesRegistry and SkillActivationRegistry.
Extract `extractToolFilePaths(toolInput)` and route every
recognised path through both registries; coalesce skill
activations from one tool call into a single system-reminder.
2. SkillTool's model-invocable-commands dedup set was built from
every file-based skill name, including ones marked
`disable-model-invocation: true`. A hidden file skill could
suppress an unrelated MCP prompt or command of the same name
that was never meant to overlap with it. Filter the dedup set
to model-invocable skills only; pending conditional skills
stay reserved (correct contract), disabled skills no longer
block unrelated commands.
3. SkillActivationRegistry's project-root guard rejected `..` /
`../` prefixes but accepted absolute results. On Windows,
`path.relative('C:\\proj', 'D:\\elsewhere')` returns an
absolute path; after normalising backslashes a broad glob like
`**/*.ts` would activate a project-scoped skill for an
off-project file. Reject absolute relative results before
normalising slashes.
Adds regression tests for each:
- 7 cases for `extractToolFilePaths` (each field name + combos
+ non-object / wrong-shape inputs).
- 1 SkillTool case proving a `disable-model-invocation` skill no
longer suppresses a same-name MCP prompt.
- 1 SkillActivationRegistry case for the absolute-relative-path
guard. (220 skill-area tests pass total.)
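A minimal sketch of the field extraction described in fix 1, assuming the parameter names listed above (the real `extractToolFilePaths` later also gates on a tool-name allowlist):

```typescript
// Collect every recognised path-bearing field from a tool's input object.
// Unknown shapes (null, strings, arrays of non-strings) yield no candidates.
function extractToolFilePaths(toolInput: unknown): string[] {
  if (typeof toolInput !== 'object' || toolInput === null) return [];
  const input = toolInput as Record<string, unknown>;
  const out: string[] = [];
  for (const key of ['file_path', 'path', 'filePath'] as const) {
    const value = input[key];
    if (typeof value === 'string') out.push(value);
  }
  if (Array.isArray(input['paths'])) {
    out.push(...input['paths'].filter((p): p is string => typeof p === 'string'));
  }
  return out;
}
```

Every path this returns would then be routed through both ConditionalRulesRegistry and SkillActivationRegistry, with activations coalesced into one system-reminder per tool call.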
* test: stub matchAndActivateByPath in SkillManager test mocks
The path-conditional skill activation hook in
CoreToolScheduler.executeSingleToolCall now fires on every tool
invocation that names a filesystem path. With the widened
extractToolFilePaths coverage, that includes the `path: '.'`
input shape used by the AgentHeadless tool-execution tests.
Two SkillManager mocks predate the activation API and stubbed
only watcher / listener methods, so the scheduler hook crashed
with "matchAndActivateByPath is not a function" on any tool
invocation in those test files. Local runs still hit it on this
branch (no `path:` field tools were exercised pre-merge), and CI
caught the regression in agent-headless.test.ts across all 9
matrix combos.
Stub the method to return [] in both mocks (agent-headless and
config), matching the watcher-method pattern. Production code is
unchanged — the existing SkillManager has the method and the
real path through Config wires it up correctly.
* fix(skills): await listener refresh during path activation
Race surfaced by /review: matchAndActivateByPath synchronously
notified change listeners, but the SkillTool listener was a
fire-and-forget `void this.refreshSkills()`. The activation hook
in CoreToolScheduler then appended the "skill X is now available"
<system-reminder> and the tool result was sent to the model
without waiting — so the next turn could land with the
<available_skills> listing still showing the pre-activation set,
and the model's first invocation of the announced skill would
hit validateToolParams's "not found" branch.
Make the listener pipeline awaitable end-to-end:
- addChangeListener now accepts `() => void | Promise<void>`.
- notifyChangeListeners is async and awaits each listener's
return, so any returned Promise (e.g. SkillTool.refreshSkills)
is held before the call resolves.
- refreshCache awaits the notification it was already firing.
- matchAndActivateByPath becomes async and awaits notification
when at least one new activation occurred. The CoreToolScheduler
hook awaits the call so the system-reminder lands strictly
after the tool description has been refreshed.
- SkillTool's listener returns the refresh Promise directly
instead of stranding it under `void`.
Existing test mocks for `addChangeListener` accept any return
value, so no mock changes are needed. The four
matchAndActivateByPath direct-call tests in skill-manager.test
are updated to `await` the new Promise return.
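The awaitable listener pipeline can be sketched as a bare listener hub — names here are illustrative, and the real SkillManager carries watcher state and more:

```typescript
// Listeners may be synchronous (void) or asynchronous (Promise<void>).
type ChangeListener = () => void | Promise<void>;

class ListenerHub {
  private readonly listeners: ChangeListener[] = [];

  addChangeListener(listener: ChangeListener): void {
    this.listeners.push(listener);
  }

  // Awaits each listener's return value, so a Promise-returning refresh
  // (like SkillTool.refreshSkills) is held before this call resolves.
  async notifyChangeListeners(): Promise<void> {
    for (const listener of this.listeners) {
      await listener();
    }
  }
}
```

A synchronous listener still type-checks (`void` return), while an async one is now actually awaited instead of being stranded under `void`.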
* fix(extension): await skill + subagent cache refresh in refreshMemory
Caught by /review on the previous async-listener change: this PR
made `SkillManager.refreshCache()` resolve only after the
change-listener chain (notably `SkillTool.refreshSkills` and
`geminiClient.setTools()`) settles. `ExtensionManager.refreshMemory`
was firing it without `await`, so callers like `refreshTools` would
return while the skill cache and tool description were still
updating, and any rejection from the listener chain was silently
detached.
Wrap skill + subagent refreshes in a single `Promise.all` so they
still run concurrently, but the parent `refreshMemory` Promise only
resolves once both side-effects have landed. Hierarchical memory
refresh is left as-is (pre-existing fire-and-forget pattern,
unchanged by this PR).
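The shape of that change can be sketched like this — the dependency names are assumptions for illustration, not the real `ExtensionManager` API:

```typescript
// Both refreshes still run concurrently, but the parent Promise resolves
// only after both side-effects settle, and a rejection in either chain
// propagates to the caller instead of detaching silently.
async function refreshMemory(deps: {
  refreshSkillCache: () => Promise<void>;
  refreshSubagentCache: () => Promise<void>;
}): Promise<void> {
  await Promise.all([deps.refreshSkillCache(), deps.refreshSubagentCache()]);
}
```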
* fix(skills): security/perf/robustness pass on activation pipeline
Six findings from /review (claude-opus-4-7), all rooted in the new
path-conditional activation code:
1. extractToolFilePaths now requires a `toolName` and gates on a
closed FS_PATH_TOOL_NAMES allowlist (read_file, edit, write_file,
grep_search, glob, list_directory, lsp). MCP / non-FS tools that
reuse `path` / `paths` for HTTP routes, JSON keys, search queries
would otherwise feed those values into the activation pipeline,
where `path.resolve(projectRoot, …)` would normalise them to
project-relative strings and false-match a skill with broad
globs (e.g. `paths: ['**']`). Concrete attack noted by /review:
`{ path: 'https://api.example.com/users/123' }` → activates a
skill on every MCP call.
2. Skill `name` validated at parse time against
`/^[a-zA-Z0-9_:.-]+$/`. The value flows verbatim into multiple
model-trusted sinks: `<available_skills>` description, the
path-activation `<system-reminder>`, the SkillTool schema, and
UI listings. Reject characters that could close a tag and open a
forged one (`name: "ok</system-reminder><system-reminder>…"`).
3. SkillManager.matchAndActivateByPaths(filePaths) added. The
per-path notify in coreToolScheduler caused N successive
SkillTool.refreshSkills() / geminiClient.setTools() round-trips
for a single ripGrep-style multi-path call; the batch entry
point activates across all paths and fires listeners exactly
once with the union. matchAndActivateByPath delegates to it for
call-site compatibility.
4. SkillManager.refreshCache uses Promise.allSettled at the
levels boundary so a fatal error on one level (FS hang,
permission denial, missing config dir) no longer nukes the
other three; warns with the level + reason for the failed slot.
5. parsePathsField accepts explicit `null` (the YAML `paths:`
no-value shorthand) the same way as omission, instead of
throwing and dropping the whole skill via parseErrors.
Matches the leniency of `argumentHint` and `whenToUse`.
6. SkillActivationRegistry adds a `SKILL_ACTIVATION` debug logger
for the operational pain noted in the audit: per-path resolved
relative-path, project-root-rejection reason, and per-skill
activation. Also gives oncall a grep target for "why did/didn't
skill X activate?" without source-reading.
Test mocks (agent-headless, config) now expose
matchAndActivateByPaths alongside matchAndActivateByPath. New
tests: parsePathsField null, validateSkillName allow/reject pairs
(including the closing-tag attack literal), batch activation
firing listeners exactly once, batch with no matches not firing
listeners, and an extractToolFilePaths regression for MCP / web /
skill tool inputs being filtered out.
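The parse-time name gate in finding 2 can be sketched with the regex quoted above; the function name and error wording are illustrative:

```typescript
// Only characters that cannot close a tag and open a forged one are allowed
// in a skill name, since the value flows verbatim into model-trusted sinks
// like <available_skills> and the path-activation <system-reminder>.
const SKILL_NAME_PATTERN = /^[a-zA-Z0-9_:.-]+$/;

function validateSkillName(name: string): void {
  if (!SKILL_NAME_PATTERN.test(name)) {
    throw new Error(`Invalid skill name: ${JSON.stringify(name)}`);
  }
}
```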
* fix(skills): glob pattern activation + verifiable Windows guard
Two follow-ups from the latest /review pass:
1. `extractToolFilePaths` now extracts `pattern` for `ToolNames.GLOB`
in addition to the existing `path` field. The shape
`glob({ pattern: 'src/**/*.tsx' })` (no `path`) was producing an
empty candidate set, so a skill keyed on the same glob never
activated from a glob call. Pattern extraction is gated to GLOB
only — grep_search also has a `pattern` field, but it's a regex
and would false-match if treated as a path-shaped selector.
2. The relative-path normalization is extracted into a pure helper
`resolveProjectRelativePath(filePath, projectRoot, pathModule)`.
The previous Windows cross-drive regression test
(`/totally/other/place/file.ts` against `/project`) actually
exercised the older `..` outside-root branch on POSIX runners,
so the new `path.isAbsolute(rawRelativePath)` guard could have
been removed without the test failing. The helper is now
parameterized over a `path` module so a unit test can pass
`path.win32` directly and pin the cross-drive case
(`D:\\other\\file.ts` against `C:\\project`) deterministically
on any host OS.
Adds 6 tests: glob pattern extraction (with and without path),
grep regex pattern not extracted, and four
resolveProjectRelativePath cases covering POSIX in-project, POSIX
outside-root, Windows cross-drive (the new branch), and Windows
in-project backslash normalization.
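The parameterized helper might look roughly like this — a sketch of the guard ordering described above, not the shipped implementation:

```typescript
import * as path from 'node:path';

// Returns the forward-slash project-relative path, or null when the file
// escapes the project root. Parameterised over a `path` module so a unit
// test can pass path.win32 and pin cross-drive behaviour on any host OS.
function resolveProjectRelativePath(
  filePath: string,
  projectRoot: string,
  pathModule: typeof path.posix = path,
): string | null {
  const rel = pathModule.relative(projectRoot, pathModule.resolve(projectRoot, filePath));
  if (rel === '..' || rel.startsWith('..' + pathModule.sep)) return null; // outside root
  // On Windows, relative() across drives returns an ABSOLUTE path; reject
  // it before normalising slashes, or broad globs would false-match.
  if (pathModule.isAbsolute(rel)) return null;
  return rel.split(pathModule.sep).join('/');
}
```

Passing `path.win32` makes the cross-drive branch (`D:\other\file.ts` against `C:\project`) deterministic even on POSIX runners.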
* fix(skills): join glob.path with glob.pattern as effective selector
Caught by /review on

7f5b92b076
feat(core): hint to background long-running foreground bash commands (#3809)
* feat(core): hint to background long-running foreground bash commands
Phase D part (a) of Issue #3634. When a foreground `shell` tool call
runs ≥ 60 seconds and completes (succeeds or errors), append an
advisory line to the LLM-facing tool result suggesting re-running
with `is_background: true` next time.
Why: today a foreground bash that takes minutes (build watcher, soak
test, slow npm install, polling loop) blocks the agent indefinitely.
The user is already paying for the wait; the agent's next turn could
have started running in parallel under `is_background: true`. Sleep
interception (#3684) handled the egregious `sleep N` case at
validate time; this handles the legitimate-but-long case at result
time.
Trade-offs:
- Threshold = 60s. Half the existing 120s foreground timeout. Long
enough that normal `npm install` / `pytest` runs don't trigger;
short enough that the hint surfaces before the timeout hard-kills.
- Advisory only — the command still runs to completion in the
foreground for THIS invocation. The advice is for the agent's NEXT
decision, not a corrective action on the current one.
- Fires on success AND error completions. The advice is the same
("background it next time") in both cases.
- Suppressed on aborted (timeout / user-cancel) — those paths
already surface their own messaging and don't benefit from a
"should have been background" reminder when the user / system
already killed it.
Implementation:
- New constant `LONG_RUNNING_FOREGROUND_THRESHOLD_MS = 60000` in
shell.ts, paired with the existing `DEFAULT_FOREGROUND_TIMEOUT_MS`.
- Helper `buildLongRunningForegroundHint(elapsedMs)` exported so
future surfaces (UI, telemetry) can render the same text without
duplicating the threshold logic.
- `Date.now()` bracketing around the spawn → `await resultPromise`
block — mirrors what the background path already captures via
`entry.startTime`.
- Append happens inside the existing non-aborted result builder;
zero changes to the cancel / timeout arms.
Tests: 4 new cases — fires on long success, omits on short success,
fires on long error completion, omits on aborted. Uses vi fake
timers to drive wall-clock past the threshold without actually
sleeping.
* fix(core): tighten long-run hint suppression + boundary tests + post-truncation insertion
Addresses 8 review threads on PR #3809 — 6 from /review bots, 2
from copilot — covering doc accuracy, code quality, behavioural
gaps, and test coverage.
**Behavioural fixes (real bugs)**:
- **Suppress on external signal kills** (`result.signal != null`
with `aborted: false`). `shellExecutionService` only sets `aborted`
when the AbortSignal we passed was triggered, so SIGTERM from
container shutdown / k8s eviction / OOM killer / sibling
process-group reap falls through to the non-aborted branch. The
advisory shouldn't fire there — the process didn't run to its
conclusion, so "next time, background it" doesn't fit. New test
pins this with `signal: 15` (SIGTERM), `aborted: false`.
- **Append AFTER `truncateToolOutput`**. Previously the hint was
appended inside the non-aborted result builder, which meant for
long outputs it got wrapped in the "Truncated part of the output:"
envelope — the LLM might read the advisory as part of the command's
own output. New post-truncation insertion + test that pins ordering
by mocking `truncateToolOutput` directly (the real path needs
`fs.writeFile` to actually succeed for the replacement branch to
fire).
- **Hint wording mode-aware**. The dialog mention dropped the
unconditional "(footer pill + Enter)" specifics, which would
mislead non-TTY users (`-p` headless / ACP / SDK consumers — no
dialog or pill exists there). Now qualified as "in interactive mode
the Background tasks dialog also has...". `/tasks` and the on-disk
output file are mentioned without qualifier (they work in any mode).
**Code quality**:
- **Threshold programmatically coupled to timeout**:
`LONG_RUNNING_FOREGROUND_THRESHOLD_MS =
Math.floor(DEFAULT_FOREGROUND_TIMEOUT_MS / 2)`. If the timeout is
tuned later, the threshold tracks automatically.
- **Docstring corrected**: removed the misleading "before it gets
killed by the timeout" claim — the hint is on the non-aborted path
only, so timeout-killed commands never see it. The new docstring
enumerates all suppression paths explicitly.
- **Removed stale line-number reference**: the comment said
"mirrors the background path's `entry.startTime` capture (line
~781)", which goes stale on file edits. Now refers conceptually.
**Test coverage gaps closed**:
- **Off-by-one boundary**: 59_999ms → no hint. Pairs with the
existing 60_000ms-exactly test (which fires) to pin the boundary
tightly. A regression flipping `>=` to `>` would fail loudly.
- **Timeout path explicit**: the previous "aborted" test exercised
user-cancel only. With `vi.useFakeTimers({ toFake: ['Date'] })`,
`AbortSignal.timeout()` isn't faked (it depends on the real timer
subsystem), so `combinedSignal.aborted` stayed false. New test
follows the pre-existing `should handle timeout vs user
cancellation correctly` pattern: stubs `AbortSignal.timeout` +
`.any` to return an already-aborted combined signal, then verifies
"Command timed out after Nms" appears AND no advisory.
* fix(core): per-invocation long-run threshold + debug-mode + test isolation
Six suggestions from /review's third pass on PR #3809:
**Real semantic fix**:
- Long-run threshold now scales with the EFFECTIVE timeout, not the
fixed default. A user who sets `timeout: 600_000` (10 min) gets the
advisory at 5 min, not at 60s — respects the explicit timeout
intent. Replaced the `LONG_RUNNING_FOREGROUND_THRESHOLD_MS`
constant with a per-invocation `longRunThresholdFor(effectiveTimeout)`
helper.
**Debug-mode visibility**:
- Debug mode previously snapshotted `returnDisplayMessage =
llmContent` BEFORE the truncation + hint append, so debug-mode
users saw the pre-hint content while the agent saw the advisory —
the agent suddenly suggesting `is_background: true` had no visible
trigger in the TUI. Re-sync `returnDisplayMessage` after the hint
append (debug-mode branch only) so the TUI mirrors what the agent
sees.
**Type-safety footgun**:
- `if (typeof llmContent === 'string')` would silently drop the
hint if `llmContent` ever becomes structured `Part[]`. Added an
explicit `else` comment documenting the deliberate omission and the
conditions under which to revisit (no non-string llmContent path
exists today).
**Style**:
- Replaced the JSDoc `/** ... */` block on the (now-defunct)
constant with a plain `//` comment block on the helper, matching
the `DEFAULT_FOREGROUND_TIMEOUT_MS` / `OUTPUT_UPDATE_INTERVAL_MS`
style.
**Test hygiene**:
- Wrapped both `vi.stubGlobal('AbortSignal', ...)` and
`vi.spyOn(truncateToolOutput, ...)` in `try/finally` so failures
during the test body don't leak the stub/spy into subsequent tests
(which would cause confusing cascading failures).
- Dropped the internal-roadmap "Phase D part (a)" reference from
the test comment — future maintainers don't have the context.
**New test**:
- `threshold scales with the user-supplied timeout (not the
default)`: sets `timeout: 600_000`, advances 100s, verifies no
hint. Pins the per-invocation coupling so a regression to a fixed
constant would fail loudly here.
* fix(core): tighten long-run hint suppression + boundary tests + post-truncation insertion (round 4)
Six suggestions from /review's pai/glm-5-fp8 pass on PR #3809:
**Behavioural / UX**:
- **Hint now visible in non-debug TUI too.** Previously only debug
mode mirrored the hint into `returnDisplay`; non-debug users saw
the agent suggest `is_background: true` with no visible trigger.
Now the hint is appended to `returnDisplayMessage` in both modes
(full mirror in debug, terse-append in non-debug to preserve the
output-or-status form).
**Test coverage**:
- **Debug-mode re-sync test added.** All other long-run hint tests
run with `getDebugMode → false`; this one flips it to true and
asserts the hint appears in `returnDisplay` too. Pins the re-sync
so a regression that drops the debug branch would fail loudly.
- **Threshold-scaling positive test added.** The negative case
(`timeout: 600_000`, advance 100s, no hint) was already pinned;
paired now with the positive case (advance 305s, hint fires) so a
regression to a fixed 60s threshold is caught at both ends.
**Style / consistency**:
- **`result.signal === null` (was `== null`).** Strict equality to
match the rest of the file. The `signal` field is typed `number |
null`, so loose equality has identical semantics, but the
inconsistency was noise.
**Doc clarity (timing semantics)**:
- **Comment explains why elapsedMs is computed BEFORE truncation.**
Two reviewers disagreed on the timing — one read it as before
truncation (correct, slightly under-reports), the other as after
(an incorrect read). The intent is to report the COMMAND's runtime,
not the tool call's total time. Truncation is post-processing, not
part of "agent blocking time", so excluding it is the right
semantic. An inline comment now spells this out so future readers
don't have to infer.
* fix(core): error-path hint surfacing + clock-resilient elapsed + threshold floor + observability
Round 5 of PR #3809 review — 10 threads, a mix of Critical and
Suggestion:
**Critical fixes**:
1. **Hint survives the error path** (`#OWbA`). When result.error is
set, coreToolScheduler builds the model-facing functionResponse
from `error.message` ONLY (not llmContent — see
convertToFunctionResponse + the toolResult.error branch in
scheduler:1648-1724). My hint was being silently dropped on
long-command-failed cases. Now the hint is appended to
error.message too, so the advisory survives whichever branch the
scheduler takes.
2. **Hint wording de-ambiguated** (`#OU6o`). "prefer re-running
with is_background: true" was ambiguous — the model could read it
as "re-run THIS command in the background", which on stateful
commands (DB migrations, deploys, git push) would cause double side
effects. Reworded to "Next time you run a SIMILAR long-running
process..." with an explicit parenthetical that warns against
re-running the just-completed command.
3. **Debug observability** (`#OU6s`). Added `debugLogger.debug` at
the hint decision point with elapsedMs / threshold / aborted /
signal — when a user reports "my 65s command didn't get the hint",
the suppression branch is now visible in DEBUG output.
**Other behaviour fixes**:
4. **Threshold floor of 1000ms** (`#OU6r`). A pathological
`timeout: 0` / `timeout: 1` would have given a 0-ms threshold,
firing the hint on every invocation showing "ran for 0s". Flooring
at 1s makes that branch unreachable.
5. **`performance.now()` instead of `Date.now()`** (`#OU6v`). NTP
corrections / VM clock drift between capture and read would
silently make `elapsedMs` negative and skip the hint with no
observable failure. A monotonic clock prevents that.
6. **Debug mode preserves truncation marker** (`#OU6w` / `#OWCq`).
Previously `returnDisplayMessage = llmContent` after the hint
clobbered the "Output too long and was saved to: …" line appended
during truncation. Switched to append-style re-sync in BOTH modes
so prior content is preserved.
**Test coverage gaps closed**:
7. **Non-debug returnDisplay test** (`#OWCo`). Pinned that the user
TUI gets the hint in the default (non-debug) mode too.
8. **Test rename** (`#OWCl`). The "debug-mode TUI mirror" test
passed in non-debug too after the recent refactor; split into two
tests, one per branch.
9. **Error-path hint test**. Added a test that pins that
`result.error?.message` contains both the original error text AND
the hint, covering the scheduler-routing-via-error.message path
that was silently broken before fix #1.
10. **Test: fake timers also fake `performance`**. Since we
switched to `performance.now()`, `vi.useFakeTimers({ toFake:
['Date'] })` no longer covered the elapsed measurement; extended to
`['Date', 'performance']` so the threshold tests can drive the
wall-clock with `advanceTimersByTimeAsync`.
#OU6t (else-comment for the type guard) was already addressed in
the prior round — the explicit else-with-comment is in place;
adding logging there would be noise.
* test(core): cover the MIN_LONG_RUN_THRESHOLD_MS floor branch
PR #3809 review: the new `Math.max(MIN_LONG_RUN_THRESHOLD_MS, ...)`
floor in `longRunThresholdFor` was untested — only default-timeout
and large-custom-timeout cases existed. A regression that strips
the floor would let `timeout: 1` produce a 0ms threshold and fire a
"ran for 0s" advisory on every invocation; the test suite would not
catch it.
New test: build with `timeout: 1`, advance 500ms (below the 1000ms
floor), resolve with `aborted: false` to isolate the threshold
logic from the abort path. Asserts no hint appears. A regression
that removes the floor flips the assertion to fail.
* fix(core): structured delimiter on error.message hint + tighten timeout floor comment
Two of three threads from the latest /review pass on PR #3809 (the
third — PR description / threshold scaling reconciliation — is
fixed in the PR description update, not in code):
- **`\n---\n` divider before hint in `error.message`** (`#Pt7C`).
Downstream consumers of `error.message`
(firePostToolUseFailureHook, telemetry grouping, SIEM alerting,
hook-side error parsers) were receiving ~400 chars of advisory text
mixed inline with the original error body — pattern-matching on
error messages would absorb the advisory into the matched body.
Added a `---` separator line so the boundary is unambiguous and
split-able.
- **Threshold-floor comment narrowed to `timeout: 1`** (`#Pu9o`).
The comment said the floor guards `timeout: 0` / `timeout: 1`, but
`validateToolParamValues` rejects `timeout <= 0` at validate time,
so `timeout: 0` can't reach `longRunThresholdFor`. Updated the
comment to mention only the actually-allowed pathological case
(`timeout: 1`, and any value `< 2` rounds to 0).
Test updated to assert the `---` divider format with `toMatch`.
* fix(core): capture executionStartTime AFTER spawn so PTY import isn't counted
PR #3809 review: copilot caught that `executionStartTime` was
captured BEFORE `await ShellExecutionService.execute(...)`, which
meant the elapsed measurement included `getPty()` dynamic-import
setup (~50-200ms on first call). The hint's "ran for Xs" reading
was slightly inflated, and the comment claiming "spawn → settle"
wasn't strictly accurate.
Moved the capture immediately after the execute() call returns its
{ result, pid } handle. The pid being set by that point confirms
the process has been spawned, so the subtraction is true
post-spawn-to-settle. Comment updated to reflect the actual
semantics.
The displayed accuracy gain is small (50-200ms on a 60s+ threshold
is <1%), but the comment claim now matches what the code measures.
Tests are unaffected — fake timers don't drive real dynamic
imports, so the threshold tests behave identically.
* fix(core): align long-run hint code/tests with ShellExecutionResult.error semantics
Four copilot threads on PR #3809 — all rooted in the same
observation: `ShellExecutionResult.error` is reserved for
spawn/setup failures (per the field's doc comment in
shellExecutionService.ts), NOT for non-zero exit codes. My existing
code/tests conflated the two, making the error-path coverage less
realistic and the inline comments inaccurate.
**Test shape fixes**:
- `appends the hint when a long-running foreground command exits
with error` → `exits non-zero`. Changed `error: new Error('exit
1')` to `error: null` (the realistic shape for a non-zero exit
without spawn failure). Added a comment explaining the field
contract so future test authors don't repeat the conflation.
- `hint survives the error path (appended to error.message)`:
reframed the mock from `spawn ENOENT` (which would resolve in <1s
in practice, making the long-elapsed scenario unrealistic) to `PTY
initialization failed after 75s` — a slow-spawn-failure shape that
COULD plausibly take 75s. The test still pins the same CODE PATH;
the comment now acknowledges the edge-case nature ("rare but real:
PTY init dragging, remote-fs exec syscalls, security scanners
interposing").
**Comment corrections**:
- The `returnDisplayMessage` build-order comment was misleading. It
said "the hint is appended after both the truncation block and the
returnDisplayMessage build" — but `returnDisplayMessage` is built
BEFORE truncation. Replaced with a chronological enumeration
(1. initial value, 2. truncation marker append, 3. hint append)
that matches what the code actually does.
- The error-path preservation comment now acknowledges the narrow
applicability (spawn failures only; exit codes don't reach this
branch). Code is unchanged — the path is still real, just rare.
* test(core): pin empty-output success + background-no-hint paths
Two defensive tests for the long-running foreground hint:
- empty-output success at >=60s — exercises the
returnDisplayMessage='' → hint append branch (write-only commands
like `tar czf` / `cp -r` produce no stdout). Asserts the
user-facing returnDisplay still surfaces the advisory even when the
command produced nothing else to show.
- background never includes the hint — the foreground hint logic
lives in executeForeground only, so today this can't fail; the test
guards against a future refactor hoisting the advisory into a
shared post-execute path that would tag every background launch
with a nonsensical "ran for 0s, consider is_background: true"
suggestion.
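The threshold and hint helpers that emerge from this review series can be sketched as follows. The constant names and values are the ones quoted in the commit messages; the hint wording is illustrative rather than the shipped string:

```typescript
const DEFAULT_FOREGROUND_TIMEOUT_MS = 120_000;
const MIN_LONG_RUN_THRESHOLD_MS = 1_000;

// Half the effective (possibly user-supplied) timeout, floored at 1s so a
// pathological `timeout: 1` can't yield a 0ms threshold that fires the
// advisory on every invocation.
function longRunThresholdFor(effectiveTimeoutMs: number): number {
  return Math.max(MIN_LONG_RUN_THRESHOLD_MS, Math.floor(effectiveTimeoutMs / 2));
}

// Advisory appended after truncation; the `---` divider keeps it separable
// from the command's own output or error body.
function buildLongRunningForegroundHint(elapsedMs: number): string {
  const seconds = Math.round(elapsedMs / 1000);
  return (
    `\n---\nThis foreground command ran for ${seconds}s. ` +
    `Next time you run a similar long-running process, consider ` +
    `\`is_background: true\` (do not re-run the just-completed command).`
  );
}
```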
efb7351d58
feat(core): support reasoning effort 'max' tier (DeepSeek extension) (#3800)
* feat(core): support reasoning effort 'max' tier (DeepSeek extension)
DeepSeek's chat-completions endpoint added an extra-strong `max` tier
to `reasoning_effort` (per
https://api-docs.deepseek.com/zh-cn/api/create-chat-completion ; valid
values are now `high` and `max`, with `low`/`medium` mapping to `high`
for backward compat). Plumb it end-to-end:
- `ContentGeneratorConfig.reasoning.effort` union now includes 'max'.
- DeepSeek OpenAI-compat provider: translate the standard nested
`reasoning: { effort }` shape into DeepSeek's flat `reasoning_effort`
body parameter so user-configured effort actually takes effect (the
nested shape was previously sent verbatim and silently ignored,
defaulting to `high`). low/medium → high mirrors the documented
server-side behavior so dashboards / logs match wire reality.
An explicit top-level `reasoning_effort` (via samplingParams or
extra_body) wins over the nested form.
- Anthropic converter: pass 'max' through to `output_config.effort`
unchanged and bump the `thinking.budget_tokens` budget for the new
tier (low 16k / medium 32k / high 64k / max 128k).
- Gemini converter: clamp 'max' to HIGH since Gemini has no higher
thinking level. Without this, 'max' would silently fall through to
THINKING_LEVEL_UNSPECIFIED.
Live verification against api.deepseek.com:
- `reasoning_effort: high` → 200
- `reasoning_effort: max` → 200 (the new tier)
- `reasoning_effort: bogus`→ 400 with valid-set list confirming
[high, low, medium, max, xhigh]
108 anthropic/openai-deepseek/gemini tests pass; full core suite
(6601 tests) green; lint + typecheck clean.
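The flat-parameter translation described above might look roughly like this — an illustrative sketch, not the real OpenAI-compat provider code, and the function names are assumptions:

```typescript
type ReasoningEffort = 'low' | 'medium' | 'high' | 'max';

// DeepSeek's endpoint accepts `high` and `max`; `low`/`medium` map to
// `high` server-side, so mirror that client-side for log/dashboard fidelity.
function translateReasoningEffort(effort: ReasoningEffort): 'high' | 'max' {
  return effort === 'max' ? 'max' : 'high';
}

// Translate the standard nested `reasoning: { effort }` shape into
// DeepSeek's flat `reasoning_effort` body parameter. An explicit top-level
// reasoning_effort (samplingParams / extra_body) wins over the nested form.
function toDeepSeekBody(
  body: Record<string, unknown>,
  reasoning?: { effort: ReasoningEffort },
): Record<string, unknown> {
  if (body['reasoning_effort'] !== undefined || !reasoning) return body;
  return { ...body, reasoning_effort: translateReasoningEffort(reasoning.effort) };
}
```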
* fix(core): map xhigh→max + clamp max on non-DeepSeek anthropic + docs
Address PR review (copilot × 2) and add missing user docs:
1. (J698) The PR description claimed that `translateReasoningEffort`
   surfaces the DeepSeek backward-compat mapping client-side, but it
   only handled `low`/`medium` → `high`. Add `xhigh` → `max` to match
   the doc and stay symmetric with the low/medium branch.
2. (J6-A) `output_config.effort: 'max'` would have been emitted on
any anthropic-protocol provider whenever a user configured it, even
when the baseURL points at real `api.anthropic.com` (which only
accepts low/medium/high and would 400). Reuse the existing
`isDeepSeekAnthropicProvider` detector to clamp `'max'` → `'high'`
on non-DeepSeek anthropic backends, with a debugLogger.warn so the
downgrade is visible. DeepSeek anthropic-compatible endpoints still
pass through unchanged.
3. New docs:
- `docs/users/configuration/model-providers.md`: a "Reasoning /
thinking configuration" section under generationConfig — single
example targeting DeepSeek + a per-provider behavior table
(OpenAI/DeepSeek flat reasoning_effort, OpenAI passthrough for
other servers, real Anthropic clamp, Anthropic-compatible
DeepSeek passthrough, Gemini thinkingLevel mapping).
- `docs/users/configuration/settings.md`: extend the
`model.generationConfig` description to mention `reasoning`
(the field was undocumented before this PR even though it
already existed as a typed field) and link to the new section.
96 anthropic + deepseek tests pass; lint + typecheck clean.
* refactor(core): single-source effort normalization for anthropic + doc fix
Address PR review round 2 (copilot × 2):
1. (J8aG) The `contentGenerator.ts` comment claimed passing
`reasoning.effort: 'max'` to real Anthropic was "up to the user",
but commit
fcefab6df5
fix(core): clear FileReadCache on every history rewrite path (#3810)
* fix(core): clear FileReadCache after microcompaction Microcompaction (the idle-cleanup pass that runs at the start of every new user/cron message) replaces old read_file / shell / glob / grep / edit / write_file tool outputs with a `[Old tool result content cleared]` placeholder. The FileReadCache, however, still records the prior full Reads as "seen in this conversation" — so the next ReadFile of an unchanged file returns the file_unchanged placeholder pointing at bytes the model can no longer retrieve from history. The result is a Read that succeeds at the tool layer but delivers no usable content to the model, which is the failure mode reported in #3805 ("read tool returns no content in long-running sessions"). This mirrors the existing post-compaction clear in tryCompressChat — microcompaction has the same "history rewrite invalidates the cache's 'model has seen this' assumption" property, it was just missed when the cache was wired in. * fix(core): clear FileReadCache on every history rewrite path PR1 only patched microcompaction, but a multi-round audit found four more entry points that rewrite history without clearing the cache, producing the same `file_unchanged` placeholder vs. missing-content mismatch. Each is fixed in the same minimal way (clear() at the call site) and covered by a regression test: - GeminiClient.setHistory — /restore checkpoint, /load_history - GeminiClient.truncateHistory — rewind in AppContainer - GeminiClient.resetChat — public API; clearCommand happens to clear the cache via startNewSession beforehand, but other callers have no such guarantee - stripOrphanedUserEntriesFromHistory — Retry path drops trailing user entries that may include read_file functionResponses Also tightened the microcompaction comment ("compactable tool outputs" instead of an enumerated list, since the source of truth is microcompact.COMPACTABLE_TOOLS) and removed caller references per the codebase comment style. 
Reverse-tested every new clear() by commenting it out and confirming the matching regression test fails.

* test(core): integration test for FileReadCache + history rewrite

End-to-end tests using the real ReadFileTool, real FileReadCache, real microcompactHistory, and a real on-disk file. Three cases:
1. Without a cache clear after microcompact, the second Read of an unchanged file returns the file_unchanged placeholder while the prior content has already been wiped from history. Demonstrates the failure mode this PR fixes.
2. After an explicit cache.clear(), the second Read re-emits the real bytes. Demonstrates that the fix works.
3. When microcompact removes every prior read of a file, the placeholder leaves zero recoverable bytes — the model literally cannot find the content anywhere it can reach.

These complement the existing unit tests in client.test.ts (which verify the call-site wiring) by proving the end-to-end behaviour through the real code paths, without mocks.

* chore(core): add traceable debug log for every FileReadCache clear

Per review feedback: the new clear() call sites were silent, leaving no breadcrumb in production debug streams when the cache is dropped. Adds a `[FILE_READ_CACHE] clear after <reason>` log at every clear site (5 new + 1 pre-existing in tryCompressChat) so operators can grep one prefix and see why the cache was invalidated.

* chore(core): refine truncateHistory cache clear + extract test helper

Per review feedback (deepseek-v4-pro):
1. truncateHistory now skips the cache clear when keepCount >= prevLen, since a no-op truncate leaves the cache valid against the unchanged history. Adds a regression test covering both keepCount == prevLen and keepCount > prevLen.
2. The 6 cache-spy test cases each repeated the same 4-line mock setup. Extract a `mockFileReadCacheClear()` helper so future changes to the FileReadCache mock surface only need one edit.

Both are quality-of-implementation tweaks; the underlying fix is unchanged.
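The invariant these commits enforce — every history rewrite must drop the cache's "model has seen this" state — can be reduced to a minimal stand-in. All names here are illustrative; the real FileReadCache tracks far more (offsets, limits, mtimes) than this sketch.

```typescript
// Minimal stand-in for the FileReadCache invariant (hypothetical shape;
// the real class is richer).
class FileReadCacheSketch {
  private seen = new Map<string, string>();

  // Returns what the read tool would hand back to the model.
  read(path: string, content: string): string {
    if (this.seen.get(path) === content) {
      // Valid ONLY while the earlier full read is still present in history.
      return '[file_unchanged]';
    }
    this.seen.set(path, content);
    return content;
  }

  // Must run on EVERY history rewrite (compaction, /restore, truncate,
  // resetChat, retry-strip) — otherwise the placeholder above points at
  // bytes the model can no longer retrieve.
  clear(): void {
    this.seen.clear();
  }
}

const cache = new FileReadCacheSketch();
const first = cache.read('a.ts', 'export const x = 1;');  // real bytes
const second = cache.read('a.ts', 'export const x = 1;'); // placeholder
cache.clear(); // the fix: fire this whenever history is rewritten
const third = cache.read('a.ts', 'export const x = 1;');  // real bytes again
```

Without the `clear()` call, the third read would still return the placeholder — harmless while history holds the bytes, a silent content loss once a rewrite has wiped them.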
* perf(core): use O(1) getHistoryLength in truncateHistory

Per Copilot review feedback: the previous commit's no-op detection in truncateHistory called this.getChat().getHistory().length, but GeminiChat.getHistory() does a structuredClone of the entire history on every call (line 770 of geminiChat.ts) — paying an O(history) clone purely to read .length. In long-running sessions with hundreds of entries this is a meaningful regression.

Adds GeminiChat.getHistoryLength(): O(1), no clone. truncateHistory switches to it. The behaviour (skip clear when keepCount >= prevLen) is unchanged.

Also adds:
- Unit tests for GeminiChat.getHistoryLength (empty, after addHistory, parity with getHistory().length).
- A regression test asserting truncateHistory calls getHistoryLength and NOT getHistory, locking in the perf fix against future drift.

* fix(core): close NaN hole + use public ReadFileTool API in tests

Two issues from copilot review:
1. NaN edge case in truncateHistory cache invalidation. The "did anything actually change?" check was `keepCount < prevLen`, but `Array.slice(0, NaN)` returns `[]` (history wiped) while `NaN < prevLen` is false. That sequence would wipe the chat but leave the FileReadCache claiming the model has seen the prior reads — exactly the file_unchanged placeholder bug this PR is closing. Switched the check to compare the actual post-truncate length (`newLen < prevLen`), which correctly invalidates whenever entries were removed, regardless of how `keepCount` was malformed. Added a NaN regression test.
2. The integration test cast `tool` to `unknown` to reach the protected `createInvocation()` method. Switched to the public `tool.buildAndExecute(params, signal)` API so the test exercises the same surface real callers use, including build-time schema validation.
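The NaN hole in item 1 is easy to reproduce in isolation. This is a reduced sketch, not the real truncateHistory (which also clears the cache and rewrites chat state):

```typescript
// Reduced sketch of the truncateHistory NaN hole. Array.prototype.slice
// coerces a NaN end index to 0, so slice(0, NaN) wipes the array — while
// the old guard `keepCount < prevLen` evaluates `NaN < 3` as false and
// skips cache invalidation. Comparing the actual post-truncate length
// closes the hole for any malformed keepCount.
function truncate(
  history: number[],
  keepCount: number,
): { kept: number[]; invalidateCache: boolean } {
  const prevLen = history.length;
  const kept = history.slice(0, keepCount); // slice(0, NaN) -> []
  return { kept, invalidateCache: kept.length < prevLen };
}

const r = truncate([1, 2, 3], Number.NaN);
// r.kept is [] (history wiped) and invalidateCache is true — whereas
// `NaN < 3` is false, so the old guard would have kept a stale cache.
```

A no-op call such as `truncate([1, 2, 3], 5)` correctly reports `invalidateCache: false`, matching the keepCount >= prevLen skip introduced two commits earlier.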
fbcf59e0b3
docs(core): point background-shell + monitor guidance at both /tasks and the dialog (#3808)
* docs(core): point background-shell guidance at both /tasks and the dialog

Follow-up to PR #3801, fulfilling the "separate small PR" commitment in its description. The two model-facing strings (`shell.ts` after spawning a background shell, `task-stop.ts` after requesting cancel) referenced only `/tasks` as the inspection path, predating the interactive Background tasks dialog landing at #3488 / #3720 / #3791. Now that the dialog handles all three kinds (agent / shell / monitor), both surfaces should be visible to the LLM so it can suggest the right one based on the user's mode.

Updates:
- `shell.ts:865` (LLM message after `is_background: true` spawn) now surfaces both `/tasks` (text, any mode) AND the interactive dialog (footer pill + Enter, with detail view + live updates). Output file guidance retained.
- `task-stop.ts:110` (LLM message after `task_stop` on a shell): same pattern, both surfaces named.
- `task-stop.ts:95` comment updated to enumerate all observation paths (including the dialog).
- `monitorRegistry.ts:197` comment fixed — it said "/tasks dialog", which conflated two distinct surfaces. Split to "Background tasks dialog reopens or `/tasks` listings".
- `backgroundShellRegistry.ts:10` (module docstring) and `:31` (shellId doc) now mention all three consumers (agent, dialog, slash command).

No behavior change — pure documentation/string update. Tests untouched (none asserted on these exact strings); build + lint + 152-test core suite all clean.

* docs(core): address PR 3808 review — 'captured output' + consistent ordering

Three review nits:
1. (LoqU — copilot) `shell.ts:865` said the output file holds "raw content", but `shellExecutionService` runs each chunk through stripAnsi and skips non-string/binary chunks before writing. Reword to "captured output" so callers don't expect a byte-for-byte stream.
2.
(LqKr — wenshao) The PR mentioned both surfaces in two different orders depending on the file: `backgroundShellRegistry.ts` listed the dialog first, while `task-stop.ts` and `shell.ts` listed `/tasks` first. Unify on the LLM-facing order — `/tasks` first, then the interactive Background tasks dialog — across all four sites. Also flips the line-31 docstring on the `shellId` field for the same reason.
3. (LqKt — wenshao, flagged for awareness only) Trim the redundant keystroke detail in shell.ts:865 to match `task-stop.ts:111`'s shorter "(footer pill + Enter)" form. Saves ~7 tokens per background shell launch in `llmContent` while still naming both surfaces. The PR description's rationale (the LLM should know both surfaces exist so it can suggest the right one for the user's mode) is preserved — only the operational verbosity is trimmed.

581 tests pass; lint + typecheck clean. Pure docs / string update.

* docs(core): grammar polish on PR 3808 strings

Two more wording nits from copilot review:
- backgroundShellRegistry.ts:10 — change "metadata the agent…need to query" to "metadata that the agent…use to query". The original phrasing reads as if the metadata itself is performing the query.
- shell.ts:865 — change "Read the output file directly for the captured output." to "Read the output file directly to view the captured output." — a clearer instruction to the model/user.

Pure wording, no behavior change.

* docs(core): grammar fix on PR 3808 monitor comment

'not visible from later Background tasks dialog reopens' read as if 'reopens' was a noun. Reword to 'not visible after reopening the Background tasks dialog or from /tasks listings'.

* docs(core): round 4 wording polish on PR 3808

Four more nits from copilot:
- shell.ts:865 + task-stop.ts:96,111: "footer pill + Enter" was ambiguous now that the footer renders multiple pills (background tasks vs other status indicators). Disambiguate to "focus the footer Background tasks pill, then Enter".
- monitorRegistry.ts:198: re-tweak my round-3 phrasing — "after reopening the Background tasks dialog or from /tasks listings" → "in later Background tasks dialog reopens or /tasks listings". Reads as "from those surfaces" rather than "after reopening", which the reviewer found ungrammatical.
- backgroundShellRegistry.ts:10,31: clarify "/tasks" as the slash command, since the codebase also uses "<projectDir>/tasks/..." on-disk paths in agent-transcript contexts.

Pure wording, no behavior change. 87 affected tests pass.

* docs(core): mirror /tasks + dialog guidance to monitor llmContent paths

Address @doudouOUC review on PR #3808 — two Medium findings: this PR updated shell-facing strings to mention both inspection surfaces but left the parallel monitor strings without any inspection guidance, even though monitors render in the same /tasks output and the same Background tasks dialog. Restore symmetry:
- monitor.ts:587-598 — append the same "/tasks (text) or the interactive Background tasks dialog (focus the footer Background tasks pill, then Enter — detail view + live updates)" sentence to the Monitor-started llmContent, mirroring shell.ts:865.
- task-stop.ts:125-131 — the monitor cancellation llmContent had no guidance at all. Add the same "Final status will be visible via /tasks (text) or the interactive Background tasks dialog (focus the footer Background tasks pill, then Enter) once the process drains" line that already existed for shells at task-stop.ts:111.

The (Low) commit-churn observation is a maintainer call (squash on merge); the (Info) snapshot-test gap is pre-existing and not in scope. 78 monitor + task-stop tests pass; lint + typecheck clean.

* docs(core): drop drain phrasing for monitor cancel + restructure dialog comment

Address PR #3808 review round 5 (doudouOUC + copilot × 2):
1.
(XNoH copilot, XSBu doudouOUC — Medium) The monitor cancellation message inherited "once the process drains" from the shell branch, but `monitorRegistry.cancel()` settles synchronously — when the tool returns, the entry is already `cancelled`, not waiting on a child process. The drain qualifier is accurate for shells (which use `requestCancel()` + the AbortController and settle when the real process exits) but misleading for monitors. Reword the monitor branch in `task-stop.ts:121-130` to drop the drain phrasing and add an explanatory comment about the sync vs. async difference so future maintainers don't replicate the wording from the shell branch by reflex.
2. (XNod copilot — wording, third revision on the same comment) Restructure rather than re-litigate the preposition. The "reopens" noun framing has gone through three rounds of churn (`from later... reopens` → `after reopening...` → `in later... reopens` → and now back to `from`). Sidestep the loop by making the comment a proper sentence about WHAT the surfaces actually read: the persisted `entry.error` is the source of truth; the chat-history notification is a separate, ephemeral side channel. Avoids the noun-form "reopens" entirely.

Updated test assertion to match the new "Monitor \"...\" cancelled" prefix. 51 tests pass; lint + typecheck clean.
2fea1d3aa7
fix(core): address post-merge monitor tool and UI routing issues (#3792)
* fix(core): address post-merge monitor tool and UI routing issues

- Guard the token bucket against clock drift after system suspend/resume (negative elapsed resets lastRefill instead of starving the bucket)
- Add debugLogger.warn for AST read-only check failures in monitor getConfirmationDetails (previously a silent catch)
- Consolidate SHELL_TOOL_NAMES: export from rule-parser.ts, import in permission-manager.ts (removes the identical SHELL_LIKE_TOOLS duplicate)
- Extract hasBlockingBackgroundWork/resetBackgroundStateForSessionSwitch to shared backgroundWorkUtils.ts (removes identical copies in clearCommand.ts and useResumeCommand.ts)
- Consolidate getToolCallComponent routing into packages/webui (removes near-identical copies in ChatViewer.tsx and vscode-ide-companion, adds the missing web_search compat alias to the VSCode path)
- Add test for droppedLines count in terminal notification text
- Add test for exit(null, null) settlement (externally killed process)

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(core,cli,vscode): address PR #3792 review feedback

- Add debugLogger.warn for the clock-drift guard (observability for throttle bucket resets after suspend/resume)
- Add test for clock-drift recovery (elapsed < 0 bucket reset)
- Add test for the AST parse failure catch path (mockRejectedValueOnce)
- Forward isFirst/isLast props through the VSCode ToolCallRouter (fixes timeline connector rendering)
- Add test for the shell running branch in hasBlockingBackgroundWork

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(core,vscode): address follow-up review comments

- Fix missing React.FC import: use `import type { FC } from 'react'` instead of `React.FC` (the original file had this import before the refactor)
- Tighten the clock-drift test: emit while the clock is in the past to confirm the guard resets lastRefill, then verify refill at the new reference point

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(core,cli): adopt review feedback —
ReadonlySet + hasRunningEntries

- Type SHELL_TOOL_NAMES as ReadonlySet<string> to prevent accidental mutation of a permission-critical set
- Use BackgroundShellRegistry.hasRunningEntries() instead of getAll().some() for a zero-allocation short-circuit check
- Update clearCommand test mocks to include hasRunningEntries

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(core,webui,vscode): address remaining PR #3792 review comments

- Merge AgentToolCall + isAgentExecutionToolCall into a single import in routing.ts (comment 3178280762)
- Use the real getToolCallComponent via vi.importActual in the VSCode test mock so the routing logic is validated, not a parallel mock that can drift (comment 3178280775)
- Validate isFirst/isLast forwarding in the VSCode test mock via data attributes (comment 3178346891)
- Add a comment documenting the debugLogger.warn no-op tradeoff for the clock-drift guard (comment 3178346889)

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* test(webui): add unit test for getToolCallComponent routing

Covers all 8 component branches, including the web_search compatibility alias, agent execution detection, case-insensitive matching, and fallback to GenericToolCall.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(cli): add missing hasRunningEntries to useResumeCommand test mocks

The backgroundWorkUtils refactor replaced getAll().some() with hasRunningEntries(), but the test mocks in useResumeCommand.test.ts were not updated, causing CI failures.

🤖 Generated with [Qoder Code](https://github.com/QwenLM/qwen-code)

* test(cli): add unit tests for the backgroundWorkUtils shared utility

Cover hasBlockingBackgroundWork (6 cases including short-circuit behaviour) and resetBackgroundStateForSessionSwitch (1 case verifying all three registries are reset).

🤖 Generated with [Qoder Code](https://github.com/QwenLM/qwen-code)

---------

Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com>
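The clock-drift guard described above can be sketched in a few lines. Names and parameters here are illustrative, not the repo's actual throttle implementation:

```typescript
// Hedged sketch of the suspend/resume clock-drift guard (illustrative
// names; the real throttle bucket in the monitor code differs in detail).
class TokenBucketSketch {
  private tokens: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerMs: number,
    private lastRefill: number,
  ) {
    this.tokens = capacity;
  }

  tryTake(now: number): boolean {
    const elapsed = now - this.lastRefill;
    if (elapsed < 0) {
      // Clock went backwards (system suspend/resume, NTP step). Reset the
      // reference point instead of letting a negative elapsed starve the
      // bucket until the wall clock catches back up.
      this.lastRefill = now;
    } else {
      this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
      this.lastRefill = now;
    }
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

const bucket = new TokenBucketSketch(1, 0.001, 1000); // 1 token per second
const a = bucket.tryTake(1000); // true  — starts full
const b = bucket.tryTake(1000); // false — empty, no time elapsed
const c = bucket.tryTake(500);  // false — drift: resets lastRefill to 500
const d = bucket.tryTake(1500); // true  — 1000 ms of refill from the new reference
```

Without the `elapsed < 0` branch, the call at `now = 500` would leave `lastRefill` at 1000 and every subsequent refill computation would be penalised by the phantom negative interval.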
03f66bada5
feat(sdk-python): add PyPI release workflow (#3685)
* feat(sdk-python): add pypi release workflow

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(sdk-python): build cli before smoke test
* fix(sdk-python): tighten release conflict handling
* fix(sdk-python): harden python release workflow
* fix(sdk-python): tighten stable release guards
* fix(sdk-python): harden prerelease publish flow
* fix(sdk-python): reuse release branches on rerun
* fix(sdk-python): resume incomplete releases
* fix(release): tighten missing-release checks
* fix(sdk-python): resume stable release reruns
* fix(sdk-python): tighten release recovery guards
* test(sdk-python): cover release version edge cases
* fix(sdk-python): address release workflow review feedback

* refactor(sdk-python): address review feedback on release version script

- Remove the unreachable `if (type === 'stable')` branch in bumpVersion(); the stable path was dead code since getVersion() throws for all stable conflicts before calling bumpVersion(). Move the nightly conflict throw to the call site for symmetry.
- Rename getNextPatchBaseVersion → getNextBaseVersion to reflect that the function can return a prerelease base without incrementing patch.
- Add a test for preview+nightly coexistence where the nightly base is higher.
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(sdk-python): address remaining review feedback on release workflow

- Fix the failure-issue gate to read github.event.inputs.dry_run directly instead of steps.vars.outputs.is_dry_run (which is empty when early steps fail). Add the --repo flag for gh issue create when checkout failed.
- Add a diagnostic state table to the failure-issue body (RELEASE_TAG, PACKAGE_VERSION, PUBLISH_CHANNEL, RESUME_EXISTING_RELEASE, etc.)
- Fix the release-notes error swallow: only silence release not found / Not Found / HTTP 404, and emit :⚠️: for other gh release view errors.
- Improve validateVersion error messages to use human-readable format keys (X.Y.Z, X.Y.Z-preview.N) matching the TS sibling convention.
- Filter fully-yanked versions in getAllVersionsFromPyPI.
- Add a console.error log when stable is derived from nightly.
- Add a bash regex guard for inputs.version to prevent shell injection.
- Use per-release-type concurrency groups (nightly/preview/stable).
- Add jq null-guard checks for all 6 field extractions.
- Remove the misleading --follow-tags from git push (lightweight tags).

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(sdk-python): rename misleading test description

The test asserts that preview/nightly releases return an empty previousReleaseTag, but the name said "same-channel previous release tags", which implied non-empty values.
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(sdk-python): address unresolved review comments on release workflow

- Remove the -z check in extract_field() that blocked preview/nightly releases (previousReleaseTag is legitimately empty for non-stable releases)
- Use a static environment.url since step outputs aren't available at job startup
- Use skip-existing for resumed PyPI publish to fill in missing artifacts
- Add AbortSignal.timeout(30s) to the PyPI fetch to prevent indefinite hangs
- Add a downgrade guard for stable_version_override
- Use a GHA :⚠️: annotation instead of console.error for visibility
- Separate yanked/non-yanked version lists so conflict detection includes yanked versions (PyPI still reserves those slots)
- Filter the current release from previousReleaseTag to avoid self-reference on resume
- Add tests for yanked conflict detection, the downgrade guard, and resume previousReleaseTag

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>

* fix(sdk-python): address final review round on release version script

- Fix the getNextBaseVersion() first-release skip: use the pyproject.toml version directly when PyPI has no stable versions instead of unconditionally incrementing
- Fix a getNextBaseVersion() off-by-one: change > to >= so an equal prerelease base continues the existing line instead of incrementing patch
- Add a :⚠️: annotation when preview auto-bumps due to orphan git tags (tag exists without a PyPI version or GitHub release)
- Add set -euo pipefail to 5 workflow steps missing it: release_branch, persist_source, Create GitHub release, Delete prerelease branch, Create issue on failure
- Fix 2 existing tests affected by the first-release change, add 4 new tests

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(sdk-python): use stderr for GHA warning annotations to avoid corrupting JSON stdout

console.log writes to stdout, which gets captured by VERSION_JSON=$(node ...) in the workflow and corrupts the JSON output for jq.
Switch to console.error so :⚠️: annotations go to stderr (GHA recognizes workflow commands on both streams). Also add set -euo pipefail to the "Get the version" step for consistency with other workflow steps.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

---------

Co-authored-by: jinye.djy <jinye.djy@alibaba-inc.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
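The stdout/stderr split in that last commit can be demonstrated in isolation. This is a sketch, not the workflow's actual version script — the warning text and tag value are made up:

```typescript
import { spawnSync } from 'node:child_process';

// A child script that emits a GHA-style warning on stderr and the JSON
// payload on stdout. Because the warning avoids stdout, a caller doing
// VERSION_JSON=$(node script.js) captures clean JSON for jq.
const child = `
  console.error('::warning::stable version derived from nightly'); // stderr
  console.log(JSON.stringify({ releaseTag: 'v0.0.0-example' }));   // stdout
`;
const result = spawnSync(process.execPath, ['-e', child], { encoding: 'utf8' });

// stdout holds only the JSON payload, so this parse succeeds; the warning
// is still visible to GitHub Actions on stderr.
const payload = JSON.parse(result.stdout);
```

Had the warning used `console.log`, `result.stdout` would contain two lines and `JSON.parse` would throw — the exact corruption the commit fixes.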
eecb0cfa16
fix(attribution): skip values for env -u NAME and -S string
`env`'s value-taking flags (`-u`/`--unset`, `-S`/`--split-string`) were not in the wrapper's flag-skip allowlist, so `env -u FOO git commit ...` left FOO as the next token and the parser treated it as the program — masking the real `git commit` from attribution detection.

Add an ENV_FLAGS_WITH_VALUE table mirroring the sudo allowlist. Regression test added.
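A reduced sketch of the skip logic (the helper name and simplifications are mine; the real parser also handles quoting, the sudo wrapper's table, and more flag spellings):

```typescript
// Sketch of skipping env's value-taking flags so the real program is
// found. ENV_FLAGS_WITH_VALUE lists the flags named in the fix.
const ENV_FLAGS_WITH_VALUE = new Set(['-u', '--unset', '-S', '--split-string']);

function programAfterEnv(tokens: string[]): string | undefined {
  let i = 0;
  if (tokens[i] !== 'env') return tokens[i];
  i += 1;
  while (i < tokens.length) {
    const tok = tokens[i];
    if (ENV_FLAGS_WITH_VALUE.has(tok)) {
      i += 2; // skip the flag AND its value, e.g. `-u FOO`
    } else if (tok.startsWith('-')) {
      i += 1; // value-less flag (covers the `--flag=value` single-token form too)
    } else if (tok.includes('=')) {
      i += 1; // KEY=value environment assignment
    } else {
      return tok; // the real program
    }
  }
  return undefined;
}

// Before the fix, `-u` was not in the table, so FOO (its value) was
// mistaken for the program and the `git commit` went undetected.
const prog = programAfterEnv(['env', '-u', 'FOO', 'git', 'commit', '-m', 'x']);
```

Dropping `-u` from the set reproduces the bug: the loop stops at `FOO` instead of `git`.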
8891b83b85
fix(attribution): last-match --body, symlink leaf canonicalisation, scoped prompt count
- addAttributionToPR: use matchAll/last-match for `--body`/`-b` so the
trailer lands in the gh-honoured (final) body when multiple flags are
present. Mirrors addCoAuthorToGitCommit. Adds regression test.
- attachCommitAttribution: also fs.realpathSync the per-file resolved
path (not just the repo root) so files behind intermediate symlinks
are matched against canonical keys recordEdit stored, instead of
silently zeroing attribution and leaking entries past commit.
- incrementPromptCount: scope to SendMessageType.UserQuery — ToolResult,
Retry, Hook, Cron, Notification are model/background re-entries of
the same logical turn. Tracking them all inflated the "N-shotted"
trailer (one user message could become 10-shotted with 10 tool calls).
- AttributionSnapshot: add `version: 1` field; restoreFromSnapshot now
refuses incompatible versions and validates per-field types so a
partially-written snapshot can't seed `Math.min(undefined, n) === NaN`
into git-notes payloads.
- Drop unused permission/escape counters (declared, persisted, never
read or incremented) — fields, snapshot tolerance, and clear-method
bookkeeping all removed; AttributionSnapshot interface simplifies.
- isGeneratedFile: switch directory rule from substring `.includes('/dist/')`
to segment-boundary check (split on `/`) so project dirs like
`my-dist/` or `xbuild/` don't match. `.lock` removed from the blanket
extension exclusion — well-known lockfiles already covered by
EXCLUDED_FILENAMES; hand-authored `.lock` files (e.g. `.terraform.lock.hcl`)
now stay attributable.
- getClientSurface: document `QWEN_CODE_ENTRYPOINT` as the embedder
override hook so the always-`'cli'` default is intentional.
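The segment-boundary rule in the isGeneratedFile change can be sketched as follows. The excluded-segment set here is illustrative only; the real EXCLUDED_DIRECTORY_SEGMENTS list is larger:

```typescript
// Sketch of the segment-boundary directory rule (illustrative set; the
// real EXCLUDED_DIRECTORY_SEGMENTS constant covers more directories).
const EXCLUDED_DIRECTORY_SEGMENTS = new Set(['dist', 'build', 'node_modules']);

function isInExcludedDirectory(relPath: string): boolean {
  // Compare whole directory segments (the final filename is dropped),
  // so project dirs like `my-dist/` or `xbuild/` no longer match the
  // way a substring test could.
  return relPath
    .split('/')
    .slice(0, -1)
    .some((segment) => EXCLUDED_DIRECTORY_SEGMENTS.has(segment));
}
```

The segment comparison is exact, so only a path component that IS `dist` (not one that merely contains it) excludes the file from attribution.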
d78d4cfe22
Merge remote-tracking branch 'origin/main' into feat/commit-attribution
# Conflicts:
#   packages/core/src/tools/shell.test.ts
#   packages/core/src/tools/shell.ts
e617f20d15
fix(telemetry): suppress async resource attribute warning on startup (#3807)
* fix(telemetry): suppress async resource attribute warning on startup

Disable NodeSDK's default auto-detection of host/process/env resources. These detectors return async attributes; if a span is exported before they settle, OTel logs an error-level diag message that surfaces in the terminal UI. The attributes we actually need (service name, version, session.id) are already provided synchronously via resourceFromAttributes.

Closes part of #3731

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(telemetry): add comment and test for autoDetectResources: false

Address review feedback:
- Add an inline comment explaining why auto-detection is disabled (avoids the async attribute warning before first span export)
- Add a test assertion that NodeSDK is constructed with autoDetectResources: false to prevent a silent regression

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)

* fix(telemetry): improve comment precision and extend autoDetectResources test coverage

- Clarify the comment: the error fires on any resource attribute read before the async detectors settle, not just during export; HttpInstrumentation span creation is the typical trigger
- Add the autoDetectResources: false assertion to the HTTP OTLP and file exporter test cases for branch-complete coverage across all three NodeSDK construction paths

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
07fdfadc33
feat(cli): include monitors in /tasks + add interactive-mode hint (#3801)
* feat(cli): include monitors in /tasks + add interactive-mode hint
Phase B closure for Issue #3634. Two coupled changes to /tasks:
1. **Bug fix — include monitors.** The command was last touched before
#3684 / #3791 landed, so it merged only agent + shell entries while
monitors silently disappeared from the headless / non-interactive /
ACP listing path. Add a third registry pull from `getMonitorRegistry()`
and wire monitor through statusLabel / taskLabel / taskId /
taskOutputPath. Status line includes eventCount (`running (N events)`,
`completed (exit 0, N events)`, `completed (Max events reached, N
events)` for auto-stop) and pid where defined.
2. **Soft deprecation hint, scoped to interactive mode only.** Once the
richer Ctrl+T dialog (#3488 + #3720 + #3791) is available, the text
dump is the long-form fallback rather than the primary surface. Show
`Tip: Ctrl+T opens the interactive Background tasks dialog with
detail view + live updates.` at the top of the output when
`executionMode === 'interactive'`. Headless / ACP get the bare list
— they have no dialog to point at and the hint would just clutter.
Description string also clarified to call out the modal split.
Kept on all three executionModes (no deletion) — `/tasks` is the only
way headless / ACP / SDK consumers can inspect background-task state.
Tests: 4 new cases in tasksCommand.test.ts cover monitor entry
formatting (running with pid, natural completion with exitCode,
auto-stop with error string, failed), the singular `1 event` form,
the interactive-mode hint gating, and the cross-kind merge order.
* fix(cli): address PR 3801 review — exhaustive switch + i18n + extra tests
Three actionable Suggestions from /review's pass:
- `taskLabel` rewritten as a `switch` with a `never`-typed `default`
arm, matching the structural-safety pattern already used by `taskId`.
Adding a 4th DialogEntry kind in the future will now flip both
helpers to compile errors instead of letting `taskLabel` silently
fall through to `entry.description` (which the new kind may not have).
- Hint string wrapped in `t()` for i18n consistency with the rest of
the file. The literal stays as the i18n key default, so today's
output is unchanged.
- Tests: cover `cancelled` monitor status (was the only one without an
inline assertion) and explicit `acp` execution mode hint suppression
(pins the suppression rationale so a future regression flipping the
check to `!== 'non_interactive'` would fail loudly).
* fix(cli): correct /tasks dialog-open hint — Ctrl+T was wrong
Tmux verification on PR #3801 caught that the hint string says "Ctrl+T
opens the interactive Background tasks dialog" but Ctrl+T is actually
bound to the MCP tool descriptions toggle (ContextSummaryDisplay.tsx
lines 110-115). The dialog opens via Down arrow on an empty composer
(focuses the footer pill) followed by Enter (InputPrompt.tsx 947-968).
Same misattribution slipped into PR #3791's first description and was
caught + fixed there before merge — this PR carried the wrong wording
forward in code.
Updates four sites:
- The hint string itself: "Tip: press ↓ from an empty composer then
Enter to open the interactive Background tasks dialog with detail
view + live updates."
- The slash-command description: "interactive UI is Ctrl+T" → "interactive
dialog opens via the footer pill"
- Two inline comments referencing Ctrl+T as the dialog opener
- The interactive-mode hint test now pins on `↓` + `Enter` and
asserts `not.toContain('Ctrl+T')` so a regression to the wrong
wording fails loudly.
* fix(cli): address PR 3801 review — exhaustive switch consistency + path-agnostic hint
Four Suggestions from the latest /review pass:
- `statusLabel` rewritten as a single top-level switch with a
`never`-typed default, matching `taskLabel` / `taskId` /
`taskOutputPath`. The previous `if`/`if`/fallthrough form would
silently apply monitor formatting to a future 4th kind.
- `taskOutputPath` gained the same exhaustive default — was the only
per-kind helper still relying on implicit fallthrough; would
silently omit a 4th-kind output path while the adjacent helpers
flip to compile errors.
- Hint wording de-specifies the exact keystroke count: `'Tip: focus
the Background tasks pill in the footer (use ↓ from an empty
composer) and press Enter ...'`. Previous "press ↓ then Enter"
phrasing was wrong when the Arena agent tab bar is present —
`InputPrompt`'s focus chain routes Down through the tab bar first,
so a single Down lands there, not on the bg pill.
- Test pin tightened: `[mon_fail] failed: spawn ENOENT (0 events)` is
now a full-string assertion instead of a prefix match, so a
regression that drops the `(N events)` suffix from monitor's failed
branch fails loudly.
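The `never`-typed default arm these commits standardise on can be shown with a reduced DialogEntry union (the three-kind shape is from the PR; field names here are illustrative):

```typescript
// Reduced sketch of the exhaustive-switch pattern. Adding a 4th kind to
// DialogEntry makes the `never` assignment a compile error instead of
// letting taskLabel silently fall through.
type DialogEntry =
  | { kind: 'agent'; name: string }
  | { kind: 'shell'; command: string }
  | { kind: 'monitor'; description: string };

function taskLabel(entry: DialogEntry): string {
  switch (entry.kind) {
    case 'agent':
      return entry.name;
    case 'shell':
      return entry.command;
    case 'monitor':
      return entry.description;
    default: {
      // If a new kind is added above without a case, `entry` is no longer
      // `never` here and the assignment fails to compile.
      const exhaustive: never = entry;
      throw new Error(`Unhandled kind: ${JSON.stringify(exhaustive)}`);
    }
  }
}
```

An `if`/`if`/fallthrough chain compiles cleanly when a kind is missed; the `never` default converts that silent gap into a build failure, which is the structural-safety property the review asked for across all four helpers.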
* fix(cli): sanitize ANSI escape sequences in /tasks output
deepseek's review pass flagged that monitor description / error fields
are user / process-supplied strings rendered directly to the terminal.
A maliciously-crafted tool description or spawn error containing raw
ANSI control sequences (clear-screen, cursor-move, colour) would
otherwise reach stdout verbatim and corrupt display.
Same risk applies to agent error / description and shell error /
command — all already-existing renderers with the same exposure that
this PR didn't introduce but inherits. So instead of per-field
sprinkling, wrap the joined output once with `escapeAnsiCtrlCodes`
(a no-op when no control chars are present, so cost is zero in the
common case). A one-line change in the renderer covers every kind,
including any future one.
Test pins the behaviour: a monitor entry with `\x1b[2J` /
`\x1b[31m...` content produces output with no raw ESC bytes and
visible escaped `[...]` sequences.
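A minimal sketch of what such an escaping pass can look like. This is an assumed implementation for illustration only; the real `escapeAnsiCtrlCodes` may differ in which codepoints it rewrites and how it renders them:

```typescript
// Sketch: rewrite C0 control bytes (except \t and \n, which carry
// legitimate layout) plus DEL into visible \xNN text, so ESC-led ANSI
// sequences can no longer clear the screen or move the cursor.
const CTRL_RE = /[\x00-\x08\x0b-\x1f\x7f]/;

function escapeAnsiCtrlCodes(text: string): string {
  // Fast path: no control bytes, return the input unchanged.
  if (!CTRL_RE.test(text)) return text;
  return text.replace(new RegExp(CTRL_RE.source, 'g'), (ch) => {
    const code = ch.charCodeAt(0).toString(16).padStart(2, '0');
    return `\\x${code}`; // ESC becomes the visible text "\x1b"
  });
}
```

Under this sketch, `\x1b[2J` reaches the terminal as the literal characters `\x1b[2J` rather than as a clear-screen command.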
* docs(cli): tighten escapeAnsiCtrlCodes comments to match actual scope
Two doc-precision Suggestions from copilot's pass on
cdadbcdb33
feat(cli): wire Monitor entries into combined Background tasks dialog (#3791)
* feat(cli): wire Monitor entries into combined Background tasks dialog
Phase C mirror follow-up of Issue #3634, structurally a clean repeat of
#3720 but for Monitor (third consumer of the kind framework).
Core (MonitorRegistry):
- Add `setStatusChangeCallback` mirroring `BackgroundShellRegistry` /
`BackgroundTaskRegistry`. Fires synchronously inside `register()` (so
subscribers see lifecycle start) and inside `settle()` (so subscribers
see every running → terminal transition: complete / fail / cancel /
emitEvent's auto-stop at maxEvents).
- Subscriber failures are caught and logged but do not poison the
registry — same defensive contract as the other two registries.
CLI:
- `useBackgroundTaskView` subscribes to all three registries (agent +
shell + monitor) and merges by `startTime`. `DialogEntry` union
extended with `(MonitorEntry & { kind: 'monitor' })`. `entryId`
switches over the three kinds and returns the right id field.
- `BackgroundTasksPill.getPillLabel` adds monitor to its KIND_NAMES
table; grouping order is shell → agent → monitor (monitor last
because it tends to be the longest-lived, least urgent to glance at).
- `BackgroundTasksDialog`:
- `rowLabel` returns `[monitor] <description>` for monitor rows.
- New `MonitorDetailBody` showing command / status / pid / event
count / dropped lines. No Progress block (monitors don't fire
activity callbacks per-event).
- DetailBody dispatcher gains the monitor branch.
- `BackgroundTaskViewContext.cancelSelected` routes monitor cancels via
`monitorRegistry.cancel(monitorId)`. Monitor's cancel is synchronous
(settle + abort happen inside the registry), matching the same path
task_stop already uses.
Tests: 6 new core tests (registry callbacks fire on register / each
terminal transition / not on emitEvent / on auto-stop / clear stops
notifications / subscriber-failure isolation), 4 new pill tests
(singular / plural / 3-kind grouping / running-only filter), dialog
mock extended with `getMonitorRegistry`.
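The registry-side contract described above (synchronous notification on register and on every terminal transition, with subscriber failures isolated) can be reduced to a sketch. `MiniRegistry` and its fields are hypothetical; the real `MonitorRegistry` carries much more state:

```typescript
type StatusListener = () => void;

class MiniRegistry {
  private monitors = new Map<string, { status: string }>();
  private statusChange: StatusListener | undefined;

  setStatusChangeCallback(cb: StatusListener | undefined): void {
    this.statusChange = cb;
  }

  private notify(): void {
    try {
      this.statusChange?.();
    } catch (err) {
      // Subscriber failures are logged, never rethrown: a broken
      // listener cannot poison the registry itself.
      console.error('statusChange subscriber failed:', err);
    }
  }

  register(id: string): void {
    this.monitors.set(id, { status: 'running' });
    this.notify(); // fires synchronously so subscribers see lifecycle start
  }

  settle(id: string, status: 'complete' | 'fail' | 'cancel'): void {
    const m = this.monitors.get(id);
    if (!m) return;
    m.status = status;
    this.notify(); // every running -> terminal transition notifies
  }
}
```

The try/catch in `notify` is what the "subscriber-failure isolation" test pins: a throwing listener must not prevent the register/settle operation from completing.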
* fix(cli): add exhaustive default arms to DialogEntry switches
ESLint's `default-case` rule requires every switch to have a default
arm even when TypeScript can prove the union is exhaustive. Add
`default: { const _exhaustive: never = entry; throw ... }` to the
five switches added in this PR — same pattern keeps both the runtime
guard and the compile-time exhaustiveness check.
* fix(core): fire statusChange in MonitorRegistry.reset()
The newly added `setStatusChangeCallback` subscriber is never notified
from `reset()`, so a `/clear` or session reset leaves stale monitor rows
visible in the
combined Background tasks dialog until an unrelated register/settle
event happens. Both BackgroundShellRegistry and BackgroundTaskRegistry
already fire statusChange on their reset paths — Monitor was the
outlier.
Fix: fire `statusChange()` (no arg) after `monitors.clear()`, with an
early return when the registry is already empty so we don't notify on
a no-op reset. Two new tests cover both branches.
* fix(cli,core): address PR 3791 review feedback
Four review threads from /review's second pass on top of