mirror of
https://github.com/QwenLM/qwen-code.git
synced 2026-05-15 09:41:20 +00:00
fix(core): clear FileReadCache on every history rewrite path (#3810)
* fix(core): clear FileReadCache after microcompaction

  Microcompaction (the idle-cleanup pass that runs at the start of every new user/cron message) replaces old read_file / shell / glob / grep / edit / write_file tool outputs with a `[Old tool result content cleared]` placeholder. The FileReadCache, however, still records the prior full Reads as "seen in this conversation" — so the next ReadFile of an unchanged file returns the file_unchanged placeholder pointing at bytes the model can no longer retrieve from history. The result is a Read that succeeds at the tool layer but delivers no usable content to the model, which is the failure mode reported in #3805 ("read tool returns no content in long-running sessions").

  This mirrors the existing post-compaction clear in tryCompressChat — microcompaction has the same "history rewrite invalidates the cache's 'model has seen this' assumption" property; it was simply missed when the cache was wired in.

* fix(core): clear FileReadCache on every history rewrite path

  PR1 only patched microcompaction, but a multi-round audit found four more entry points that rewrite history without clearing the cache, producing the same `file_unchanged` placeholder vs. missing-content mismatch. Each is fixed in the same minimal way (clear() at the call site) and covered by a regression test:

  - GeminiClient.setHistory — /restore checkpoint, /load_history
  - GeminiClient.truncateHistory — rewind in AppContainer
  - GeminiClient.resetChat — public API; clearCommand happens to clear the cache via startNewSession beforehand, but other callers have no such guarantee
  - stripOrphanedUserEntriesFromHistory — the retry path drops trailing user entries that may include read_file functionResponses

  Also tightened the microcompaction comment ("compactable tool outputs" instead of an enumerated list, since the source of truth is microcompact.COMPACTABLE_TOOLS) and removed caller references per the codebase comment style.

  Reverse-tested every new clear() by commenting it out and confirming the matching regression test fails.

* test(core): integration test for FileReadCache + history rewrite

  End-to-end tests using the real ReadFileTool, real FileReadCache, real microcompactHistory, and a real on-disk file. Three cases:

  1. Without a cache clear after microcompact, the second Read of an unchanged file returns the file_unchanged placeholder while the prior content has already been wiped from history. Demonstrates the failure mode this PR fixes.
  2. After an explicit cache.clear(), the second Read re-emits the real bytes. Demonstrates that the fix works.
  3. When microcompact removes every prior read of a file, the placeholder leaves zero recoverable bytes — the model literally cannot find the content anywhere it can reach.

  These complement the existing unit tests in client.test.ts (which verify the call-site wiring) by proving the end-to-end behaviour through the real code paths, without mocks.

* chore(core): add traceable debug log for every FileReadCache clear

  Per review feedback: the new clear() call sites were silent, leaving no breadcrumb in production debug streams when the cache is dropped. Adds a `[FILE_READ_CACHE] clear after <reason>` log at every clear site (5 new + 1 pre-existing in tryCompressChat) so operators can grep one prefix and see why the cache was invalidated.

* chore(core): refine truncateHistory cache clear + extract test helper

  Per review feedback (deepseek-v4-pro):

  1. truncateHistory now skips the cache clear when keepCount >= prevLen, since a no-op truncate leaves the cache valid against the unchanged history. Adds a regression test covering both keepCount == prevLen and keepCount > prevLen.
  2. The 6 cache-spy test cases each repeated the same 4-line mock setup. Extract a `mockFileReadCacheClear()` helper so future changes to the FileReadCache mock surface only need one edit.

  Both are quality-of-implementation tweaks; the underlying fix is unchanged.
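Taken together, the commits above enforce one invariant: any operation that rewrites conversation history must also drop the file-read cache. A minimal, self-contained sketch of that invariant (`FileReadCacheSketch` and `ClientSketch` are hypothetical illustrative names, not the real classes):

```typescript
// Sketch only: any history rewrite must clear the cache, otherwise a later
// Read is answered with a "file unchanged" placeholder for bytes the
// history no longer holds.
class FileReadCacheSketch {
  private seen = new Map<string, string>(); // path -> content hash

  record(path: string, hash: string): void {
    this.seen.set(path, hash);
  }
  hasSeen(path: string, hash: string): boolean {
    return this.seen.get(path) === hash;
  }
  clear(): void {
    this.seen.clear();
  }
}

class ClientSketch {
  private history: string[] = [];
  constructor(private readonly cache: FileReadCacheSketch) {}

  setHistory(next: string[]): void {
    this.history = next;
    // History was replaced wholesale: the "model has seen this file"
    // assumption behind the cache no longer holds, so invalidate it.
    this.cache.clear();
  }

  truncateHistory(keepCount: number): void {
    const prevLen = this.history.length;
    this.history = this.history.slice(0, keepCount);
    // Compare actual lengths, not keepCount, so malformed inputs
    // (e.g. NaN) still trigger the clear when entries were removed.
    if (this.history.length < prevLen) {
      this.cache.clear();
    }
  }
}

// Usage: a cached read survives until a history rewrite invalidates it.
const cache = new FileReadCacheSketch();
cache.record('/x/foo.ts', 'hash-1');
const client = new ClientSketch(cache);
client.setHistory(['fresh entry']);
console.log(cache.hasSeen('/x/foo.ts', 'hash-1')); // false: cache was cleared
```

The sketch collapses hashes and history entries to strings; the real code tracks tool responses inside `Content` entries, but the clear-on-rewrite shape is the same.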
* perf(core): use O(1) getHistoryLength in truncateHistory

  Per Copilot review feedback: the previous commit's no-op detection in truncateHistory called this.getChat().getHistory().length, but GeminiChat.getHistory() does a structuredClone of the entire history on every call (line 770 of geminiChat.ts) — paying an O(history) clone purely to read .length. In long-running sessions with hundreds of entries this is a meaningful regression.

  Adds GeminiChat.getHistoryLength(): O(1), no clone. truncateHistory switches to it. The behaviour (skip clear when keepCount >= prevLen) is unchanged.

  Also adds:

  - Unit tests for GeminiChat.getHistoryLength (empty, after addHistory, parity with getHistory().length).
  - A regression test asserting truncateHistory calls getHistoryLength and NOT getHistory, locking in the perf fix against future drift.

* fix(core): close NaN hole + use public ReadFileTool API in tests

  Two issues from Copilot review:

  1. NaN edge case in truncateHistory cache invalidation. The "did anything actually change?" check was `keepCount < prevLen`, but `Array.slice(0, NaN)` returns [] (history wiped) while `NaN < prevLen` is false. That sequence would wipe the chat but leave the FileReadCache claiming the model has seen the prior reads — exactly the file_unchanged placeholder bug this PR is closing. Switched the check to compare the actual post-truncate length (`newLen < prevLen`), which correctly invalidates whenever entries were removed, regardless of how `keepCount` was malformed. Added a NaN regression test.
  2. The integration test cast `tool` to `unknown` to reach the protected `createInvocation()` method. Switched to the public `tool.buildAndExecute(params, signal)` API so the test exercises the same surface real callers use, including build-time schema validation.
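The NaN hole described above can be demonstrated in isolation (plain TypeScript, no project code involved): `Array.prototype.slice` coerces a NaN end index to 0, so the array is emptied even though every `<` comparison against NaN is false.

```typescript
// slice(0, NaN) wipes the array, yet `NaN < prevLen` is false, so a guard
// based on keepCount alone would skip the cache clear after a full wipe.
const history: string[] = ['a', 'b', 'c'];
const keepCount = NaN;

const truncated = history.slice(0, keepCount);

console.log(truncated.length);                  // 0: the history was emptied
console.log(keepCount < history.length);        // false: a keepCount-based guard would skip the clear
console.log(truncated.length < history.length); // true: comparing real lengths catches it
```

This is why the fix compares the actual pre- and post-truncate lengths rather than trusting the argument.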
parent fbcf59e0b3
commit fcefab6df5
5 changed files with 579 additions and 0 deletions
@@ -457,8 +457,138 @@ describe('Gemini Client (client.ts)', () => {
      expect(newHistory.length).toBe(initialHistory.length);
      expect(JSON.stringify(newHistory)).not.toContain('some old message');
    });

    it('clears the FileReadCache so post-reset Reads re-emit content', async () => {
      const cacheClear = mockFileReadCacheClear();

      await client.resetChat();

      expect(cacheClear).toHaveBeenCalled();
    });
  });

  describe('history mutation invalidates FileReadCache', () => {
    it('setHistory clears the cache', () => {
      const cacheClear = mockFileReadCacheClear();
      client['chat'] = {
        setHistory: vi.fn(),
      } as unknown as GeminiChat;

      client.setHistory([{ role: 'user', parts: [{ text: 'replaced' }] }]);

      expect(cacheClear).toHaveBeenCalled();
    });

    /**
     * Test helper: mock a GeminiChat whose history length goes from
     * `before` to `after` across truncateHistory(). The first
     * getHistoryLength() call (pre-truncate) returns `before`; the
     * second (post-truncate) returns `after`.
     */
    function mockChatWithLengths(before: number, after: number): GeminiChat {
      return {
        getHistoryLength: vi
          .fn()
          .mockReturnValueOnce(before)
          .mockReturnValueOnce(after),
        truncateHistory: vi.fn(),
      } as unknown as GeminiChat;
    }

    it('truncateHistory clears the cache when entries are actually removed', () => {
      const cacheClear = mockFileReadCacheClear();
      client['chat'] = mockChatWithLengths(3, 2);

      client.truncateHistory(2);

      expect(cacheClear).toHaveBeenCalled();
    });

    it('truncateHistory does NOT clear the cache when nothing was removed (keepCount >= history length)', () => {
      const cacheClear = mockFileReadCacheClear();

      // keepCount equals history length — nothing dropped.
      client['chat'] = mockChatWithLengths(2, 2);
      client.truncateHistory(2);
      expect(cacheClear).not.toHaveBeenCalled();

      // keepCount exceeds history length — also a no-op.
      client['chat'] = mockChatWithLengths(2, 2);
      client.truncateHistory(99);
      expect(cacheClear).not.toHaveBeenCalled();
    });

    it('truncateHistory clears the cache when a non-finite keepCount empties history (NaN regression)', () => {
      // slice(0, NaN) returns [], but `NaN < prevLen` evaluates to
      // false. Comparing the actual post-truncate length closes that
      // hole — without this guard the cache would survive a history
      // wipe and the file_unchanged placeholder bug returns.
      const cacheClear = mockFileReadCacheClear();
      client['chat'] = mockChatWithLengths(3, 0);

      client.truncateHistory(NaN);

      expect(cacheClear).toHaveBeenCalled();
    });

    it('truncateHistory uses O(1) getHistoryLength, not getHistory (avoids structuredClone)', () => {
      mockFileReadCacheClear();
      const getHistoryLength = vi.fn().mockReturnValue(5);
      const getHistory = vi.fn();
      client['chat'] = {
        getHistoryLength,
        getHistory,
        truncateHistory: vi.fn(),
      } as unknown as GeminiChat;

      client.truncateHistory(3);

      expect(getHistoryLength).toHaveBeenCalled();
      expect(getHistory).not.toHaveBeenCalled();
    });

    it('retry strips orphaned trailing user entries and clears the cache', async () => {
      const cacheClear = mockFileReadCacheClear();
      const stripOrphanedUserEntriesFromHistory = vi.fn();
      client['chat'] = {
        addHistory: vi.fn(),
        getHistory: vi.fn().mockReturnValue([]),
        stripOrphanedUserEntriesFromHistory,
      } as unknown as GeminiChat;
      mockTurnRunFn.mockReturnValue(
        (async function* () {
          yield { type: GeminiEventType.Content, value: 'response' };
        })(),
      );

      const stream = client.sendMessageStream(
        [{ text: 'retry' }],
        new AbortController().signal,
        'prompt-retry-1',
        { type: SendMessageType.Retry },
      );
      for await (const _ of stream) {
        /* drain */
      }

      expect(stripOrphanedUserEntriesFromHistory).toHaveBeenCalled();
      expect(cacheClear).toHaveBeenCalled();
    });
  });

  /**
   * Test helper: replace mockConfig.getFileReadCache to return a stub
   * whose clear() is a fresh spy. Returned spy lets tests assert on
   * whether a code path invalidated the cache.
   */
  function mockFileReadCacheClear(): ReturnType<typeof vi.fn> {
    const clearMock = vi.fn();
    vi.mocked(mockConfig.getFileReadCache).mockReturnValue({
      clear: clearMock,
    } as unknown as ReturnType<Config['getFileReadCache']>);
    return clearMock;
  }

  describe('thinking block idle cleanup and latch', () => {
    let mockChat: Partial<GeminiChat>;
@@ -506,6 +636,100 @@ describe('Gemini Client (client.ts)', () => {
    });
  });

  describe('microcompaction FileReadCache invalidation', () => {
    function makeReadFileResponses(count: number): Content[] {
      const out: Content[] = [];
      for (let i = 0; i < count; i++) {
        out.push({
          role: 'model',
          parts: [
            {
              functionCall: {
                name: 'read_file',
                args: { file_path: `/x/${i}.ts` },
              },
            },
          ],
        });
        out.push({
          role: 'user',
          parts: [
            {
              functionResponse: {
                name: 'read_file',
                response: { output: `content of ${i}` },
              },
            },
          ],
        });
      }
      return out;
    }

    beforeEach(() => {
      mockTurnRunFn.mockReturnValue(
        (async function* () {
          yield { type: GeminiEventType.Content, value: 'response' };
        })(),
      );
    });

    it('clears the cache after microcompaction strips old read_file results', async () => {
      // Default test fixture: toolResultsThresholdMinutes = 60,
      // toolResultsNumToKeep = 5. Six read_file results + a 90-minute
      // idle gap means the oldest one gets cleared, so the if-meta
      // branch in sendMessageStream fires and must invalidate the cache.
      const cacheClear = mockFileReadCacheClear();

      const history = makeReadFileResponses(6);
      const setHistory = vi.fn();
      client['chat'] = {
        addHistory: vi.fn(),
        getHistory: vi.fn().mockReturnValue(history),
        setHistory,
      } as unknown as GeminiChat;
      client['lastApiCompletionTimestamp'] = Date.now() - 90 * 60_000;

      const stream = client.sendMessageStream(
        [{ text: 'hi' }],
        new AbortController().signal,
        'prompt-mc-clear-1',
        { type: SendMessageType.UserQuery },
      );
      for await (const _ of stream) {
        /* drain */
      }

      expect(setHistory).toHaveBeenCalled();
      expect(cacheClear).toHaveBeenCalled();
    });

    it('does not clear the cache when the idle gap is below the threshold', async () => {
      const cacheClear = mockFileReadCacheClear();

      const history = makeReadFileResponses(6);
      client['chat'] = {
        addHistory: vi.fn(),
        getHistory: vi.fn().mockReturnValue(history),
        setHistory: vi.fn(),
      } as unknown as GeminiChat;
      // Recent activity — microcompaction must not fire.
      client['lastApiCompletionTimestamp'] = Date.now() - 30 * 1000;

      const stream = client.sendMessageStream(
        [{ text: 'hi' }],
        new AbortController().signal,
        'prompt-mc-clear-2',
        { type: SendMessageType.UserQuery },
      );
      for await (const _ of stream) {
        /* drain */
      }

      expect(cacheClear).not.toHaveBeenCalled();
    });
  });

  describe('tryCompressChat', () => {
    const mockGetHistory = vi.fn();
@@ -206,15 +206,47 @@ export class GeminiClient {

  private stripOrphanedUserEntriesFromHistory() {
    this.getChat().stripOrphanedUserEntriesFromHistory();
    // Stripped trailing user entries can include read_file
    // functionResponses from a failed-then-retried request. The
    // FileReadCache would still record those reads, so the retry's
    // re-issued Read could hit the file_unchanged placeholder while
    // the model has nothing to fall back on. Clear to be safe.
    debugLogger.debug(
      '[FILE_READ_CACHE] clear after stripOrphanedUserEntriesFromHistory',
    );
    this.config.getFileReadCache().clear();
  }

  setHistory(history: Content[]) {
    this.getChat().setHistory(history);
    // Replacing history wholesale drops any prior read_file tool
    // results the FileReadCache still believes the model has seen.
    // Without clearing, a follow-up Read of an unchanged file would
    // return the file_unchanged placeholder for bytes that no longer
    // exist in the new history.
    debugLogger.debug('[FILE_READ_CACHE] clear after setHistory');
    this.config.getFileReadCache().clear();
    this.forceFullIdeContext = true;
  }

  truncateHistory(keepCount: number) {
    // Use the O(1) length getter rather than getHistory() — the latter
    // structuredClone's the entire history just to read .length, which
    // gets expensive in long-running sessions.
    const prevLen = this.getChat().getHistoryLength();
    this.getChat().truncateHistory(keepCount);
    // Decide whether to invalidate based on the *actual* post-truncate
    // length, not on the keepCount argument. Comparing keepCount alone
    // misses pathological inputs (e.g. NaN: slice(0, NaN) returns [],
    // emptying history, but `NaN < prevLen` is false and would skip
    // the clear, reintroducing the file_unchanged placeholder bug).
    const newLen = this.getChat().getHistoryLength();
    if (newLen < prevLen) {
      debugLogger.debug(
        `[FILE_READ_CACHE] clear after truncateHistory(keep=${keepCount}, prev=${prevLen}, new=${newLen})`,
      );
      this.config.getFileReadCache().clear();
    }
    this.forceFullIdeContext = true;
  }
@@ -233,6 +265,12 @@ export class GeminiClient {
  async resetChat(): Promise<void> {
    this.surfacedRelevantAutoMemoryPaths.clear();
    this.lastApiCompletionTimestamp = null;
    // startChat() rewrites the chat to its initial state. Any prior
    // read_file tool results the FileReadCache still tracks are no
    // longer in history, so a follow-up Read would serve a placeholder
    // pointing at content the model can no longer retrieve.
    debugLogger.debug('[FILE_READ_CACHE] clear after resetChat');
    this.config.getFileReadCache().clear();
    await this.startChat();
  }
@@ -694,6 +732,16 @@ export class GeminiClient {
      );
      if (mcResult.meta) {
        this.getChat().setHistory(mcResult.history);
        // Microcompaction replaces old compactable tool outputs
        // (including read_file) with a placeholder, but the
        // FileReadCache still records the prior full Reads as "seen in
        // this conversation". A follow-up Read of an unchanged file
        // would then return the file_unchanged placeholder pointing at
        // bytes the model can no longer retrieve from history. Drop the
        // cache so post-microcompaction Reads re-emit the bytes,
        // mirroring the post-compaction clear in tryCompressChat.
        debugLogger.debug('[FILE_READ_CACHE] clear after microcompaction');
        this.config.getFileReadCache().clear();
        const m = mcResult.meta;
        debugLogger.debug(
          `[TIME-BASED MC] gap ${m.gapMinutes}min > ${m.thresholdMinutes}min, ` +
@@ -1166,6 +1214,7 @@ export class GeminiClient {
      // placeholder pointing at content the model can no longer
      // retrieve from its own context. Clear the cache so post-
      // compaction Reads re-emit the bytes.
      debugLogger.debug('[FILE_READ_CACHE] clear after tryCompressChat');
      this.config.getFileReadCache().clear();
      uiTelemetryService.setLastPromptTokenCount(info.newTokenCount);
      this.forceFullIdeContext = true;
@@ -1049,6 +1049,25 @@ describe('GeminiChat', async () => {
    });
  });

  describe('getHistoryLength', () => {
    it('returns 0 for an empty history', () => {
      expect(chat.getHistoryLength()).toBe(0);
    });

    it('reflects entries added via addHistory', () => {
      chat.addHistory({ role: 'user', parts: [{ text: 'a' }] });
      chat.addHistory({ role: 'model', parts: [{ text: 'b' }] });
      expect(chat.getHistoryLength()).toBe(2);
    });

    it('matches getHistory().length without paying the structuredClone cost', () => {
      chat.addHistory({ role: 'user', parts: [{ text: 'a' }] });
      chat.addHistory({ role: 'model', parts: [{ text: 'b' }] });
      chat.addHistory({ role: 'user', parts: [{ text: 'c' }] });
      expect(chat.getHistoryLength()).toBe(chat.getHistory().length);
    });
  });

  describe('sendMessageStream with retries', () => {
    it('should retry on invalid content, succeed, and report metrics', async () => {
      vi.useFakeTimers();
@@ -770,6 +770,15 @@ export class GeminiChat {
    return structuredClone(history);
  }

  /**
   * Returns the number of entries in the raw chat history. O(1) and
   * does not clone — use this when you only need the count and would
   * otherwise pay the {@link getHistory} `structuredClone` cost.
   */
  getHistoryLength(): number {
    return this.history.length;
  }

  /**
   * Clears the chat history.
   */
packages/core/src/services/fileReadCache.integration.test.ts (new file, 278 lines)

@@ -0,0 +1,278 @@
/**
 * Integration tests for the FileReadCache short-circuit. Real
 * filesystem, real ReadFileTool, real microcompactHistory — verify
 * that the placeholder fast-path stays correct under history rewrites.
 */

import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';
import { describe, it, expect, beforeEach, afterEach } from 'vitest';
import type { Content } from '@google/genai';

import { FileReadCache } from './fileReadCache.js';
import { ReadFileTool } from '../tools/read-file.js';
import { microcompactHistory } from './microcompaction/microcompact.js';
import { StandardFileSystemService } from './fileSystemService.js';

function makeConfig(targetDir: string, cache: FileReadCache, disabled = false) {
  const explicit: Record<string, unknown> = {
    getTargetDir: () => targetDir,
    getProjectRoot: () => targetDir,
    getWorkspaceContext: () => ({
      isPathWithinWorkspace: () => true,
    }),
    storage: {
      getProjectTempDir: () => path.join(targetDir, '.tmp'),
      getProjectDir: () => path.join(targetDir, '.proj'),
      getUserSkillsDirs: () => [],
    },
    getFileReadCache: () => cache,
    getFileReadCacheDisabled: () => disabled,
    getFileService: () => ({ shouldQwenIgnoreFile: () => false }),
    getFileFilteringOptions: () => ({}),
    getDebugMode: () => false,
    getFileSystemService: () => new StandardFileSystemService(),
    getContentGeneratorConfig: () => ({ modalities: {} }),
    getModel: () => 'test-model',
    getTruncateToolOutputLines: () => 2000,
    getTruncateToolOutputThreshold: () => 4_000_000,
    getUsageStatisticsEnabled: () => false,
  };
  return new Proxy(explicit, {
    get(target, prop) {
      if (prop in target)
        return (target as Record<string | symbol, unknown>)[prop];
      // Default: any unknown getter returns undefined-yielding fn.
      return () => undefined;
    },
  }) as never;
}
describe('FileReadCache integration: read after history rewrite', () => {
  let tmpDir: string;
  let filePath: string;

  beforeEach(() => {
    tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'repro-3805-'));
    filePath = path.join(tmpDir, 'foo.ts');
    fs.writeFileSync(
      filePath,
      'export function hello() {\n  return "world";\n}\n'.repeat(10),
    );
  });

  afterEach(() => {
    fs.rmSync(tmpDir, { recursive: true, force: true });
  });
  it('returns the file_unchanged placeholder on a follow-up Read after microcompact, exposing why the cache must be cleared on history rewrite', async () => {
    const cache = new FileReadCache();
    const config = makeConfig(tmpDir, cache);
    const tool = new ReadFileTool(config);

    // STEP 1 — first real Read populates the cache.
    const r1 = await tool.buildAndExecute(
      { file_path: filePath },
      new AbortController().signal,
    );
    expect(typeof r1.llmContent).toBe('string');
    expect(r1.llmContent as string).toContain('export function hello');
    expect(cache.size()).toBe(1);

    // STEP 2 — build a conversation history mirroring real flow:
    // 6 prior read_file functionResponses with the foo.ts content.
    // microcompact's keepRecent=1 will clear the oldest 5.
    const history: Content[] = [];
    for (let i = 0; i < 6; i++) {
      history.push({
        role: 'model',
        parts: [
          {
            functionCall: {
              name: 'read_file',
              args: { file_path: filePath },
            },
          },
        ],
      });
      history.push({
        role: 'user',
        parts: [
          {
            functionResponse: {
              name: 'read_file',
              response: { output: r1.llmContent as string },
            },
          },
        ],
      });
    }

    // STEP 3 — microcompact fires (>60min idle).
    const mcResult = microcompactHistory(history, Date.now() - 90 * 60_000, {
      toolResultsThresholdMinutes: 60,
      toolResultsNumToKeep: 1,
    });
    expect(mcResult.meta).toBeDefined();
    expect(mcResult.meta!.toolsCleared).toBe(5);

    // Confirm: most foo.ts content has been wiped from history.
    const fooContentEntries = mcResult.history.filter((c) =>
      c.parts?.some((p) => {
        const out = p.functionResponse?.response?.['output'];
        return typeof out === 'string' && out.includes('export function hello');
      }),
    );
    // Only 1 fresh entry remains; the other 5 are placeholders.
    expect(fooContentEntries).toHaveLength(1);

    // STEP 4 — pre-fix code path: cache is NOT cleared after microcompact.
    // User reads foo.ts again. File on disk is unchanged.
    const r2 = await tool.buildAndExecute(
      { file_path: filePath },
      new AbortController().signal,
    );

    // THE BUG: returned content is the placeholder, NOT the real file.
    expect(r2.llmContent as string).toContain(
      'unchanged since last read in this session',
    );
    expect(r2.llmContent as string).not.toContain('export function hello');

    // The model now has:
    //   - history: 5 entries are [Old tool result content cleared],
    //     1 entry has real content (the most-recent kept one)
    //   - fresh tool response: a placeholder pointing at "earlier in
    //     this conversation" — which is partly true (1 entry remains)
    //     but if the LLM trusted the placeholder and discarded the
    //     last surviving entry, the bytes are unrecoverable.
    //
    // In a longer chain (e.g. 20 reads, keep 1, microcompact clears
    // 19), the surviving entry might not even be foo.ts — it would be
    // whatever was read most recently. Then the placeholder points at
    // ZERO bytes the model can find.
  });
  it('after cache.clear(), a follow-up Read of the same unchanged file re-emits the real bytes', async () => {
    const cache = new FileReadCache();
    const config = makeConfig(tmpDir, cache);
    const tool = new ReadFileTool(config);

    const r1 = await tool.buildAndExecute(
      { file_path: filePath },
      new AbortController().signal,
    );
    expect(r1.llmContent as string).toContain('export function hello');

    // The fix.
    cache.clear();

    const r2 = await tool.buildAndExecute(
      { file_path: filePath },
      new AbortController().signal,
    );

    expect(r2.llmContent as string).toContain('export function hello');
    expect(r2.llmContent as string).not.toContain(
      'unchanged since last read in this session',
    );
  });
  it('worst case: when microcompact removes every prior read of a file, the placeholder leaves zero recoverable bytes for the model', async () => {
    // This is the worst-case version: many reads, microcompact clears
    // everything, the surviving entry is a different file. The placeholder
    // then points the model at content that no longer exists anywhere
    // in its reachable context.
    const cache = new FileReadCache();
    const config = makeConfig(tmpDir, cache);
    const tool = new ReadFileTool(config);

    const otherPath = path.join(tmpDir, 'other.ts');
    fs.writeFileSync(otherPath, 'unrelated\n');

    // Read foo.ts (target file).
    await tool.buildAndExecute(
      { file_path: filePath },
      new AbortController().signal,
    );

    // Build history: 1 foo.ts read, then 1 other.ts read (kept).
    const fooContent = fs.readFileSync(filePath, 'utf-8');
    const history: Content[] = [
      {
        role: 'model',
        parts: [
          {
            functionCall: {
              name: 'read_file',
              args: { file_path: filePath },
            },
          },
        ],
      },
      {
        role: 'user',
        parts: [
          {
            functionResponse: {
              name: 'read_file',
              response: { output: fooContent },
            },
          },
        ],
      },
      {
        role: 'model',
        parts: [
          {
            functionCall: {
              name: 'read_file',
              args: { file_path: otherPath },
            },
          },
        ],
      },
      {
        role: 'user',
        parts: [
          {
            functionResponse: {
              name: 'read_file',
              response: { output: 'unrelated\n' },
            },
          },
        ],
      },
    ];

    const mc = microcompactHistory(history, Date.now() - 90 * 60_000, {
      toolResultsThresholdMinutes: 60,
      toolResultsNumToKeep: 1,
    });
    expect(mc.meta!.toolsCleared).toBe(1);

    // foo.ts content is gone from history; only other.ts remains.
    const surviving = mc.history
      .flatMap((c) => c.parts ?? [])
      .map((p) => p.functionResponse?.response?.['output'])
      .filter((o): o is string => typeof o === 'string');
    expect(surviving.some((o) => o.includes('export function hello'))).toBe(
      false,
    );

    // Now Read foo.ts again — pre-fix, cache returns placeholder.
    const r = await tool.buildAndExecute(
      { file_path: filePath },
      new AbortController().signal,
    );

    expect(r.llmContent as string).toContain(
      'unchanged since last read in this session',
    );
    // Total foo.ts content reachable to the model:
    //   history → 0 bytes
    //   fresh tool result → placeholder, 0 bytes
    // The model literally cannot recover the file contents.
  });
});