qwen-code/docs/design
jinye 8dfbdaa5d4
feat(telemetry): unify span creation paths for hierarchical trace tree (#4126)
* feat(telemetry): unify span creation paths for hierarchical trace tree (#3731 P3 Phase 1)

Replace disconnected withSpan/startSpanWithContext calls in runtime with
session-tracing typed helpers so LLM and tool spans become children of
the interaction span instead of siblings under the session root.

- Add toolContext ALS with runInToolSpanContext() for concurrent-safe
  tool span scoping (uses AsyncLocalStorage.run, not enterWith)
- Wire startLLMRequestSpan/endLLMRequestSpan in loggingContentGenerator
  for both streaming and non-streaming paths
- Wire startToolSpan/endToolSpan + startToolExecutionSpan/endToolExecutionSpan
  in coreToolScheduler with proper try/finally lifecycle
- Remove redundant withSpan('client.generateContent') wrapper from client.ts
- Fix endToolSpan to not override pre-set status when metadata is omitted
- Change startToolExecutionSpan to read parent from toolContext ALS
- Update tests for new span creation APIs and remove dead test infrastructure

* fix(telemetry): address CI build errors in session-tracing tests

- Remove unused _toolSpan variable (TS6133)
- Use bracket notation for index signature property access (TS4111)

* fix(telemetry): update coreToolScheduler and loggingContentGenerator test mocks

- coreToolScheduler.test.ts: mock startToolSpan/endToolSpan/runInToolSpanContext
  instead of withSpan; update cancellation tests for restored safeSetStatus call
- loggingContentGenerator.test.ts: fix attribute keys in mock, add try/catch in
  endLLMRequestSpan mock to match production best-effort behavior

* fix(telemetry): address review feedback from wenshao

- Add debugLogger.warn in catch blocks of endLLMRequestSpan/endToolSpan/
  endToolExecutionSpan instead of silent swallowing
- Add JSDoc on endToolSpan documenting intentional no-metadata-no-status
  contract with setToolSpanFailure/setToolSpanCancelled
- Add warning in startToolExecutionSpan when called outside
  runInToolSpanContext (no active toolContext)
- Sanitize error message in endToolExecutionSpan: use constant
  TOOL_SPAN_STATUS_TOOL_EXCEPTION instead of raw error message

* fix(telemetry): use partial mock for telemetry/index.js in coreToolScheduler tests

The full mock shadowed all re-exports (logToolCall, etc.) causing 49 test
failures. Use importActual to preserve other exports, only override span
functions.

* fix(telemetry): getLastToolSpan must skip tool.execution sub-spans

startToolExecutionSpan mock also pushes to toolSpanRecords, so at(-1)
returns the execution sub-span instead of the tool span. Use findLast
to filter by name.

* fix(telemetry): address second round review feedback

- Remove redundant safeSetStatus(span, OK) on success path — endToolSpan
  in finally already sets OK via metadata
- Add llm_request.stream attribute (true/false) to distinguish streaming
  vs non-streaming LLM requests in trace backends

* fix(telemetry): endToolSpan mock writes to record directly

Bypass span.setStatus() in mock to avoid potential interference from
vitest module resolution. Write to statusCalls/ended directly on the
ToolSpanRecord.

* fix(telemetry): mock session-tracing.js directly instead of telemetry/index.js

Mocking the barrel re-export (telemetry/index.js) with importActual was
unreliable — vitest's module resolution could bind production code to the
real endToolSpan before the mock override took effect. Mock the source
module (session-tracing.js) directly to guarantee interception.

* fix(telemetry): fix endToolSpan status on success — toolCalls is empty in finally

Root cause: checkAndNotifyCompletion clears this.toolCalls before the
finally block in executeSingleToolCall runs, so the tc lookup always
returns undefined.

Fix: set OK status explicitly in _executeToolCallBody's success path
via safeSetStatus(span, OK), and call endToolSpan() without metadata
in finally (just ends the span, preserves pre-set status from any path).

* fix(telemetry): address Codex review — activate OTel context, end span on failure

- Wrap non-stream generateContent API call + logging in context.with(spanContext)
  so nested OTel spans (HTTP instrumentation, log-bridge spans) parent to
  qwen-code.llm_request instead of session root (matches streaming path).
- runInToolSpanContext now also activates OTel context via otelContext.with,
  not just the custom toolContext ALS. Hooks/HTTP/IO during tool execution
  now correctly parent to qwen-code.tool span.
- Split end*Span helpers: span.end() runs in its own try/catch so a throwing
  setAttributes/setStatus can't leak unended spans.

* fix(telemetry): address Codex review v2 — session-root fallback + execution span timing

- start{LLMRequest,Tool,ToolExecution}Span now fall back to getSessionContext()
  when no parent context, instead of otelContext.active(). Side-query LLM calls
  (auto-title, recap) now stay in the session trace instead of starting a new
  detached trace.
- Move startToolExecutionSpan() to BEFORE invocation.execute(), matching
  claude-code. Previously the synchronous setup inside execute (shell command
  preprocessing, child_process.spawn) ran outside the execution span.

* fix(telemetry): address Codex review v3 — sync throw, idle timeout race, test coverage

- coreToolScheduler.executeSingleToolCall: move try-block to wrap
  invocation.execute() so synchronous throws (e.g. shell setup failure)
  flow into the same catch path as async rejections. Previously a sync
  throw would leak the execution span and skip failure hooks.

- loggingStreamWrapper: track spanEndedByTimeout flag so a stream that
  resumes after the 5-min idle timeout does not run the final
  endLLMRequestSpan (which would no-op anyway, but the flag also stops
  resetSpanTimeout from queuing further timer callbacks).

- coreToolScheduler.test: add execution sub-span assertions for success,
  ToolResult.error, thrown invocation exceptions, and pre-hook denial.

- loggingContentGenerator.test: capture setAttribute calls into the mock
  span attributes record; assert llm_request.stream is false for non-stream
  and true for stream paths.

* fix(telemetry): address Codex review v4 — consistency + test coverage gaps

- endLLMRequestSpan now uses spanCtx.span for mutations (matches
  endToolSpan/endToolExecutionSpan pattern). Same object, but consistent
  lookup pattern prevents future drift.

- Mocks capture endLLMRequestSpan and endToolSpan/endToolExecutionSpan
  metadata so tests can assert token counts, durationMs, success, error
  are forwarded correctly. Add assertions on:
    * Non-stream LLM: inputTokens, outputTokens, success on response path
    * Non-stream LLM: success: false + sanitized error on rejection
    * Stream LLM: final lastUsageMetadata reaches endLLMRequestSpan
    * Tool execution sub-span: success: true on happy path
    * Tool execution sub-span: success: false on ToolResult.error
    * Tool execution sub-span: success: false + sanitized error on throw

- Add OTel error resilience tests: when setAttributes or setStatus throws,
  span.end() must still run and the span must be removed from activeSpans.
  Covered for endLLMRequestSpan, endToolSpan, endToolExecutionSpan.

* fix(telemetry): address Codex review v5 — abort distinction + API symmetry

- session-tracing.ts SpanContext.type: comment 'tool.blocked_on_user' |
  'hook' as Phase 2 forward-declarations (no helpers wired yet).

- endToolExecutionSpan: align no-metadata-no-status behavior with
  endToolSpan. Currently no caller omits metadata, but the asymmetric
  default (OK vs preserve-pre-set) was a maintenance trap.

- loggingContentGenerator generateContent (non-stream) catch block:
  call endLLMRequestSpan BEFORE the logging block, mirroring the
  streaming path. Defense-in-depth against logging-side rejections.

- loggingContentGenerator: restore abort-specific span status message.
  All three LLM error paths (non-stream catch, stream eager-error catch,
  stream loggingStreamWrapper finally) now use
  API_CALL_ABORTED_SPAN_STATUS_MESSAGE when req.config.abortSignal.aborted,
  matching the original withSpan('client.generateContent') behavior.
  Trace backends can now distinguish cancellations from real failures.

- coreToolScheduler _executeToolCallBody catch: distinguish abort vs
  exception in execSpan error message. New constant
  TOOL_SPAN_STATUS_TOOL_CANCELLED prevents operators filtering exec
  spans for errors from seeing cancellation false positives.

- New test asserting exec span uses cancelled-by-user message when the
  invocation throws after abort.

* fix(telemetry): always write 'success' attribute on tool spans

E2E review found qwen-code.tool spans never carry the `success` boolean
attribute (the helper only writes it when metadata is passed, and the
finally block calls endToolSpan(toolSpan) without metadata). This breaks
the most common observability query — filtering tool failures with
`success = false` — because tool spans don't have that field at all.

Fix: setToolSpanFailure / setToolSpanCancelled now also call
span.setAttribute('success', false); the success path in
_executeToolCallBody adds span.setAttribute('success', true) after
safeSetStatus(span, OK). Mirrors the unconditional `success` attribute
on llm_request spans, so backends can use one query for both span types.

Add 4 scheduler-level tests asserting the success attribute on:
- success path
- ToolResult.error path
- thrown invocation path
- cancellation path
2026-05-16 22:29:55 +08:00
..
adaptive-output-token-escalation fix(core): recover from truncated tool calls via multi-turn continuation (#3313) 2026-04-21 17:04:24 +08:00
auth refactor(cli): provider-first auth registry with unified install pipeline (#3864) 2026-05-08 12:19:28 +08:00
auto-memory feat(memory): managed auto-memory and auto-dream system (#3087) 2026-04-16 20:05:45 +08:00
channels docs(channels): consolidate design docs into single file 2026-04-02 11:17:37 +08:00
compact-mode feat: optimize compact mode UX — shortcuts, settings sync, and safety (#3100) 2026-04-16 09:29:24 +08:00
compaction-image-stripping feat(core): strip inline media before chat compaction summary (#4101) 2026-05-14 10:20:11 +08:00
customize-banner-area feat(cli): customize banner area (logo, title, hide) (#3710) 2026-05-07 10:17:53 +08:00
fork-subagent feat(core): implement fork subagent for context sharing (#2936) 2026-04-14 14:27:38 +08:00
prompt-suggestion fix(followup): prevent tool call UI leak and Enter accept buffer race (#2872) 2026-04-09 00:07:03 +08:00
session-recap fix(cli): rework session recap rendering and add blur threshold setting (#3482) 2026-04-21 14:39:13 +08:00
session-title feat(session): auto-title sessions via fast model, add /rename --auto (#3540) 2026-04-23 20:37:05 +08:00
skill-nudge feat(memory): add autoSkill background project skill extraction (#3673) 2026-05-09 14:25:02 +08:00
slash-command feat(cli): improve slash command discovery (#3736) 2026-05-09 14:25:44 +08:00
tool-use-summary fix(cli): add API Key option to qwen auth interactive menu (#3624) 2026-04-27 22:01:47 +08:00
custom-api-key-auth-wizard-prd.md docs(auth): add custom API key wizard PRD (#3583) 2026-05-13 14:04:41 +08:00
markdown-syntax-extension.md feat(cli): expand TUI markdown rendering (#3680) 2026-05-07 16:24:13 +08:00
openrouter-auth-and-models.md refactor(cli): remove legacy qwen auth CLI subcommand, redirect to /auth TUI dialog (#3959) 2026-05-11 16:44:09 +08:00
workflow-tracing-gaps.md feat(telemetry): unify span creation paths for hierarchical trace tree (#4126) 2026-05-16 22:29:55 +08:00
worktree.md feat(tools): add generic worktree support — EnterWorktree/ExitWorktree + Agent isolation (#4073) 2026-05-14 18:00:30 +08:00