mirror of
https://github.com/QwenLM/qwen-code.git
synced 2026-04-28 11:41:04 +00:00
* fix(core): prevent followup suggestion input/output from appearing in tool call UI

  The follow-up suggestion generation was leaking into the conversation UI through three channels:

  1. The forked query included tools in its generation config, allowing the model to produce function calls during suggestion generation. Fixed by setting `tools: []` in runForkedQuery's per-request config (kept in createForkedChat for speculation, which needs tools).
  2. logApiResponse and logApiError recorded suggestion API events to the chatRecordingService, causing them to appear in session JSONL files and the WebUI. Fixed by adding an isInternalPromptId() guard that skips chatRecordingService for 'prompt_suggestion' and 'forked_query' IDs. uiTelemetryService.addEvent() is preserved so /stats still tracks suggestion token usage.
  3. LoggingContentGenerator logged suggestion requests/responses to the OpenAI logger and telemetry pipeline. Fixed by skipping logApiRequest, buildOpenAIRequestForLogging, and logOpenAIInteraction for internal prompt IDs. _logApiResponse is preserved (for /stats), but its chatRecordingService path is filtered by fix #2.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: deduplicate isInternalPromptId into shared export from loggers.ts

  Address review feedback: extract isInternalPromptId() to a single exported function in telemetry/loggers.ts and import it in LoggingContentGenerator, eliminating the duplicate private method. Also update loggingContentGenerator.test.ts mock to use importOriginal so the real isInternalPromptId is available during tests.

* refactor: extract isInternalPromptId to shared utils, add tests

  Address maintainer review feedback:

  1. Move isInternalPromptId() to packages/core/src/utils/internalPromptIds.ts using a ReadonlySet for the ID registry. Adding new internal prompt IDs only requires changing one file. loggers.ts re-exports for compatibility; loggingContentGenerator.ts imports directly from utils.
  2. Extract the `tools: []` magic value to a frozen NO_TOOLS constant in forkedQuery.ts.
  3. Add unit tests for isInternalPromptId: prompt_suggestion → true, forked_query → true, user_query → false, empty string → false.

* fix: address Copilot review — docs, stream optimization, tests

  1. Update forkedQuery.ts module docs to reflect that runForkedQuery overrides tools: [] at the per-request level while createForkedChat retains the full generationConfig for speculation callers.
  2. Propagate isInternal into loggingStreamWrapper to skip response collection and consolidation for internal prompts, avoiding unnecessary CPU/memory overhead.
  3. Add logApiResponse chatRecordingService filter tests: verify prompt_suggestion/forked_query skip recording while normal IDs still record.

* fix: deep-freeze NO_TOOLS, add internal prompt guard tests

  Address Copilot review round 3:

  1. Deep-freeze the NO_TOOLS.tools array to prevent shared mutable state across forked query calls.
  2. Add LoggingContentGenerator tests verifying that internal prompt IDs (prompt_suggestion, forked_query) skip logApiRequest and OpenAI interaction logging while preserving logApiResponse.
  3. Add logApiError chatRecordingService filter tests matching the existing logApiResponse coverage.

* docs: reconcile createForkedChat JSDoc with module header

  Clarify that createForkedChat retains the full generationConfig (including tools) for speculation callers, while runForkedQuery strips tools at the per-request level via NO_TOOLS.

* fix: build errors and Copilot round 4 feedback

  1. Fix the NO_TOOLS type: Object.freeze produces a readonly array incompatible with ToolUnion[]. Use Readonly<Pick<>> instead; the spread in requestConfig already creates a fresh mutable copy per call.
  2. Fix a test missing the required 'model' field in ContentGeneratorConfig.
  3. Track firstResponseId/firstModelVersion in loggingStreamWrapper so _logApiResponse/_logApiError have accurate values even when full response collection is skipped for internal prompts.
  4. Strengthen the OpenAI logger test assertion: assert OpenAILogger was constructed (not guarded by if), then assert logInteraction was not called.

* fix: remove dead Object.keys check, add streaming internal prompt test

  1. Simplify runForkedQuery: requestConfig always has tools: [] from the NO_TOOLS spread, so the Object.keys().length > 0 ternary is dead code. Pass requestConfig directly.
  2. Add a generateContentStream test for internal prompt IDs to match the existing generateContent coverage, ensuring the streaming wrapper also skips logApiRequest and OpenAI interaction logging.

* fix: prevent Enter accept from re-inserting suggestion into buffer

  When accepting a followup suggestion via Enter, accept() queued buffer.insert(suggestion) in a microtask that executed after handleSubmitAndClear had already cleared the buffer, leaving the suggestion text stuck in the input. Add a skipOnAccept option to accept() so the Enter path bypasses the onAccept callback. Also add runForkedQuery unit tests verifying that tools: [] is passed in the per-request config.

* fix(core): add speculation to internal IDs, fix logToolCall filtering, improve suggestion prompt

  - Add 'speculation' to INTERNAL_PROMPT_IDS so speculation API traffic and tool calls are hidden from chat recordings and the tool call UI
  - Add an isInternalPromptId check to logToolCall() for consistency with logApiError/logApiResponse
  - Improve SUGGESTION_PROMPT: prioritize the assistant's last few lines and extract actionable text from explicit tips (e.g. "Tip: type X")
  - Fix garbled unicode in prompt text
  - Update design docs and user docs to reflect the changes
  - Add test coverage for all new behavior

* fix(core): deep-freeze NO_TOOLS, add speculation to loggingContentGenerator tests

  - Object.freeze NO_TOOLS and its tools array to prevent runtime mutation
  - Add 'speculation' to the loggingContentGenerator internal prompt ID tests for consistency with loggers.test.ts and internalPromptIds.ts

* fix(core): fix NO_TOOLS Object.freeze type error

  Use `as const` with a type assertion to satisfy TypeScript while keeping runtime immutability via Object.freeze.

* refactor(core): remove unused isInternalPromptId re-export from loggers.ts

  All consumers import directly from utils/internalPromptIds.js. The re-export was dead code with no importers.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
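The internal-prompt-ID guard described in these commits can be sketched as follows. This is a hedged reconstruction from the commit messages, not the actual contents of packages/core/src/utils/internalPromptIds.ts; the registry entries match the IDs the commits name ('prompt_suggestion', 'forked_query', 'speculation'), but the exact shape is an assumption.

```typescript
// Sketch of the shared internal-prompt-ID registry (assumed shape).
// A ReadonlySet means adding a new internal ID only touches this one file,
// as the "extract isInternalPromptId to shared utils" commit describes.
const INTERNAL_PROMPT_IDS: ReadonlySet<string> = new Set([
  'prompt_suggestion', // follow-up suggestion generation
  'forked_query',      // cache-aware secondary queries (this file)
  'speculation',       // speculative tool-call loop
]);

/** True for prompt IDs whose API traffic should stay out of recordings/UI. */
function isInternalPromptId(promptId: string): boolean {
  return INTERNAL_PROMPT_IDS.has(promptId);
}
```

Callers like logApiResponse would then early-return from the chatRecordingService path when `isInternalPromptId(event.prompt_id)` is true, while still forwarding to uiTelemetryService so /stats keeps counting tokens.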
265 lines
8.5 KiB
TypeScript
/**
 * @license
 * Copyright 2025 Qwen Team
 * SPDX-License-Identifier: Apache-2.0
 *
 * Forked Query Infrastructure
 *
 * Enables cache-aware secondary LLM calls that share the main conversation's
 * prompt prefix (systemInstruction + history) for cache hits.
 *
 * DashScope already enables cache_control via X-DashScope-CacheControl header.
 * By constructing the forked GeminiChat with identical generationConfig and
 * history prefix, the fork automatically benefits from prefix caching.
 *
 * Note: `runForkedQuery` overrides `tools: []` at the per-request level so the
 * model cannot produce function calls. `createForkedChat` retains the full
 * generationConfig (including tools) for callers like speculation that need them.
 */

import type {
  Content,
  GenerateContentConfig,
  GenerateContentResponseUsageMetadata,
} from '@google/genai';
import { GeminiChat, StreamEventType } from '../core/geminiChat.js';
import type { Config } from '../config/config.js';

/** Per-request config that strips tools so the model never produces function calls. */
const NO_TOOLS = Object.freeze({ tools: [] as const }) as Pick<
  GenerateContentConfig,
  'tools'
>;

/**
 * Snapshot of the main conversation's cache-critical parameters.
 * Captured after each successful main turn so forked queries share the same prefix.
 */
export interface CacheSafeParams {
  /** Full generation config including systemInstruction and tools */
  generationConfig: GenerateContentConfig;
  /** Curated conversation history (deep clone) */
  history: Content[];
  /** Model identifier */
  model: string;
  /** Version number — increments when systemInstruction or tools change */
  version: number;
}

/**
 * Result from a forked query.
 */
export interface ForkedQueryResult {
  /** Extracted text response, or null if no text */
  text: string | null;
  /** Parsed JSON result if schema was provided */
  jsonResult?: Record<string, unknown>;
  /** Token usage metrics */
  usage: {
    inputTokens: number;
    outputTokens: number;
    cacheHitTokens: number;
  };
}

// ---------------------------------------------------------------------------
// Global cache params slot
// ---------------------------------------------------------------------------

let currentCacheSafeParams: CacheSafeParams | null = null;
let currentVersion = 0;

/**
 * Save cache-safe params after a successful main conversation turn.
 * Called from GeminiClient.sendMessageStream() on successful completion.
 */
export function saveCacheSafeParams(
  generationConfig: GenerateContentConfig,
  history: Content[],
  model: string,
): void {
  // Detect if systemInstruction or tools changed
  const prevConfig = currentCacheSafeParams?.generationConfig;
  const sysChanged =
    !prevConfig ||
    JSON.stringify(prevConfig.systemInstruction) !==
      JSON.stringify(generationConfig.systemInstruction);
  const toolsChanged =
    !prevConfig ||
    JSON.stringify(prevConfig.tools) !== JSON.stringify(generationConfig.tools);

  if (sysChanged || toolsChanged) {
    currentVersion++;
  }

  currentCacheSafeParams = {
    generationConfig: structuredClone(generationConfig),
    history, // caller passes structuredClone'd curated history (from getHistory(true))
    model,
    version: currentVersion,
  };
}

/**
 * Get the current cache-safe params, or null if not yet captured.
 */
export function getCacheSafeParams(): CacheSafeParams | null {
  return currentCacheSafeParams
    ? structuredClone(currentCacheSafeParams)
    : null;
}

/**
 * Clear cache-safe params (e.g., on session reset).
 */
export function clearCacheSafeParams(): void {
  currentCacheSafeParams = null;
}

// ---------------------------------------------------------------------------
// Forked chat creation
// ---------------------------------------------------------------------------

/**
 * Create an isolated GeminiChat that shares the main conversation's
 * generationConfig (including systemInstruction, tools, and history).
 *
 * The full config is retained so that callers like `runSpeculativeLoop`
 * can execute tool calls during speculation. For pure-text callers like
 * `runForkedQuery`, tools are stripped at the per-request level via
 * `NO_TOOLS` — see {@link runForkedQuery}.
 *
 * The fork does NOT have chatRecordingService or telemetryService to avoid
 * polluting the main session's recordings and token counts.
 */
export function createForkedChat(
  config: Config,
  params: CacheSafeParams,
): GeminiChat {
  // Limit history to avoid excessive cost
  const maxHistoryEntries = 40;
  const history =
    params.history.length > maxHistoryEntries
      ? params.history.slice(-maxHistoryEntries)
      : params.history;

  // params.generationConfig and params.history are already deep-cloned snapshots
  // from saveCacheSafeParams (which clones generationConfig) and getHistory(true)
  // (which structuredClones the history). Slice creates a new array but shares
  // Content references — GeminiChat only reads history, never mutates entries,
  // so sharing is safe and avoids a redundant deep clone.
  return new GeminiChat(
    config,
    {
      ...params.generationConfig,
      // Disable thinking for forked queries — suggestions/speculation don't need
      // reasoning tokens and it wastes cost + latency on the fast model path.
      // This doesn't affect cache prefix (system + tools + history).
      thinkingConfig: { includeThoughts: false },
    },
    [...history], // shallow copy — entries are read-only
    undefined, // no chatRecordingService
    undefined, // no telemetryService
  );
}

// ---------------------------------------------------------------------------
// Forked query execution
// ---------------------------------------------------------------------------

function extractUsage(
  metadata?: GenerateContentResponseUsageMetadata,
): ForkedQueryResult['usage'] {
  return {
    inputTokens: metadata?.promptTokenCount ?? 0,
    outputTokens: metadata?.candidatesTokenCount ?? 0,
    cacheHitTokens: metadata?.cachedContentTokenCount ?? 0,
  };
}

/**
 * Run a forked query using a GeminiChat that shares the main conversation's
 * cache prefix. This is a single-turn, tool-free request (no function calls).
 *
 * @param config - App config
 * @param userMessage - The user message to send (e.g., SUGGESTION_PROMPT)
 * @param options - Optional configuration
 * @returns Query result with text, optional JSON, and usage metrics
 */
export async function runForkedQuery(
  config: Config,
  userMessage: string,
  options?: {
    abortSignal?: AbortSignal;
    /** JSON schema for structured output */
    jsonSchema?: Record<string, unknown>;
    /** Override model (e.g., for speculation with a cheaper model) */
    model?: string;
  },
): Promise<ForkedQueryResult> {
  const params = getCacheSafeParams();
  if (!params) {
    throw new Error('CacheSafeParams not available');
  }

  const model = options?.model ?? params.model;
  const chat = createForkedChat(config, params);

  // Build per-request config overrides.
  // NO_TOOLS prevents the model from producing function calls — forked
  // queries are pure text completion and must not appear in tool-call UI.
  const requestConfig: GenerateContentConfig = { ...NO_TOOLS };
  if (options?.abortSignal) {
    requestConfig.abortSignal = options.abortSignal;
  }
  if (options?.jsonSchema) {
    requestConfig.responseMimeType = 'application/json';
    requestConfig.responseJsonSchema = options.jsonSchema;
  }

  const stream = await chat.sendMessageStream(
    model,
    {
      message: [{ text: userMessage }],
      config: requestConfig,
    },
    'forked_query',
  );

  // Collect the full response
  let fullText = '';
  let usage: ForkedQueryResult['usage'] = {
    inputTokens: 0,
    outputTokens: 0,
    cacheHitTokens: 0,
  };

  for await (const event of stream) {
    if (event.type !== StreamEventType.CHUNK) continue;
    const response = event.value;
    // Extract text from candidates
    const text = response.candidates?.[0]?.content?.parts
      ?.map((p) => p.text ?? '')
      .join('');
    if (text) {
      fullText += text;
    }
    if (response.usageMetadata) {
      usage = extractUsage(response.usageMetadata);
    }
  }

  const trimmed = fullText.trim() || null;

  // Parse JSON if schema was provided
  let jsonResult: Record<string, unknown> | undefined;
  if (options?.jsonSchema && trimmed) {
    try {
      jsonResult = JSON.parse(trimmed) as Record<string, unknown>;
    } catch {
      // Model returned non-JSON despite schema constraint — treat as text
    }
  }

  return { text: trimmed, jsonResult, usage };
}
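The commit history above spends several rounds on how NO_TOOLS should be frozen and typed. As a standalone illustration (the `RequestConfig` type here is a stand-in, not the real `GenerateContentConfig`), this sketch shows why spreading the frozen constant is safe: each call produces a fresh mutable config object, while the empty tools array stays shared and immutable.

```typescript
// Stand-in for GenerateContentConfig; only the fields this demo needs.
interface RequestConfig {
  tools: readonly unknown[];
  responseMimeType?: string;
}

// Deep-frozen shared constant, mirroring the NO_TOOLS pattern in forkedQuery.ts.
const NO_TOOLS: Readonly<Pick<RequestConfig, 'tools'>> = Object.freeze({
  tools: Object.freeze([]),
});

// Hypothetical per-request builder: the spread copies the frozen `tools`
// reference into a brand-new mutable object, so later property writes are safe.
function buildRequestConfig(withJson: boolean): RequestConfig {
  const cfg: RequestConfig = { ...NO_TOOLS };
  if (withJson) {
    cfg.responseMimeType = 'application/json';
  }
  return cfg;
}

const a = buildRequestConfig(true);
const b = buildRequestConfig(false);
console.assert(a.tools === b.tools); // one shared, frozen empty array
console.assert(!Object.isFrozen(a)); // but each config object is freshly mutable
```

This is why the "remove dead Object.keys check" commit could pass requestConfig directly: every call already starts from a fresh object containing `tools: []`.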