fix(serve/fs): close 3 review-round-4 critical findings (#4250)

DeepSeek's fourth review pass surfaced three more critical bugs;
all are fixed in-PR.

1. TOCTOU symlink substitution in readText/readBytes/edit
   (workspaceFileSystem.ts) — Critical. fsp.stat(p) and the
   subsequent lowFs.readTextFile(p) (or fsp.readFile / fsp.readFile
   for edit) are two separate syscalls. An attacker who can write
   into the workspace can swap p from a regular file to a symlink
   pointing outside between the two calls; pre-stat sees the
   original, the read follows the swap.

   Fix: assertInodeStableAfterRead(p, preIno) — after each read,
   re-lstat p; reject with symlink_escape if the path is now a
   symlink (post.isSymbolicLink()) or its inode changed (preIno
   !== post.ino, with a 0-ino fallback for procfs / virtual
   mounts that don't report meaningful inodes). Catches the
   swap-and-leave attack and the swap-and-keep-swapped attack.
   Residual race (attacker swaps back AFTER our read but BEFORE
   our lstat) is much smaller than the original window and
   outside PR 18's threat model; fd-based reading via
   fsp.open + fileHandle.read would close it entirely but
   requires a new variant of lowFs that takes a FileHandle —
   tracked as a follow-up.

2. UTF-8 truncation corruption in readText (workspaceFileSystem.ts)
   — Critical. buf.subarray(0, sizeOutcome.bytesToRead).toString
   ('utf-8') silently emits U+FFFD when bytesToRead falls
   mid-codepoint (CJK, emoji). A downstream consumer parsing
   JSON or source code over the truncated content would see
   broken trailing bytes; meta.truncated would be true but the
   prefix is corrupt. A subsequent edit() with the corrupted
   string as oldText would also fail to match the on-disk content.

   Fix: safeUtf8Truncate(buf, maxBytes) walks back from maxBytes
   through any continuation bytes (0b10xxxxxx), then verifies
   the leading byte's full sequence fits within the cap; drops
   the leading byte if it doesn't. The result is always a clean
   prefix at a valid codepoint boundary. Test pins '中文测试' (12
   bytes / 3 bytes per char) truncated at 7 bytes -> '中文' (no
   U+FFFD).

3. glob opts.cwd bypasses workspace boundary
   (workspaceFileSystem.ts) — Critical. opts.cwd was used
   directly as the glob root with no validation against
   boundWorkspace. ResolvedPath is a brand cast and a stale
   or forged value lets a glob('**/*', { cwd: '/etc' })
   enumerate files outside the workspace. The pattern-side
   absolute / UNC checks added in a81ada43f only constrain
   the *pattern*; cwd is the actual hazard.

   Fix: at the entry point of glob(), path.resolve cwd and
   isWithinRoot-check against boundWorkspace. Throws
   path_outside_workspace if cwd is outside, even when the
   pattern itself is harmlessly relative. Test pins the case
   with cwd: scratch (outside workspace).

3 new tests:
- readText with mid-operation symlink swap -> symlink_escape
- safeUtf8Truncate keeps CJK codepoints intact at 7-byte cap
- glob with opts.cwd outside workspace -> path_outside_workspace

429/429 serve tests pass; typecheck + eslint clean.

Remaining tracked follow-ups (Post-PR-18 hardening, in #4175):
- list/glob brand-cast contract (PR 19)
- runQwenServe → fsFactory injection contract test (PR 19)
- edit() write-side TOCTOU + atomic-via-temp + expectedHash (PR 20)
- glob audit pathHash (independent audit.ts commit)
- glob maxResults streaming (independent hardening)
- glob pattern preflight refactor to reuse hasSuspiciousPathPattern (cosmetic)

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
This commit is contained in:
doudouOUC 2026-05-18 01:23:37 +08:00
parent 911cb8e5d1
commit 546834267f
2 changed files with 176 additions and 4 deletions

View file

@ -14,6 +14,7 @@ import {
FS_ACCESS_EVENT_TYPE,
FS_DENIED_EVENT_TYPE,
createWorkspaceFileSystemFactory,
type ResolvedPath,
type WorkspaceFileSystem,
type WorkspaceFileSystemFactory,
} from './index.js';
@ -439,6 +440,60 @@ describe('WorkspaceFileSystem - trust gate', () => {
});
});
describe('WorkspaceFileSystem - TOCTOU + UTF-8 + cwd hardening', () => {
let h: Harness;
beforeEach(async () => {
h = await makeHarness();
});
afterEach(async () => teardown(h));
it('readText detects post-stat symlink swap and rejects with symlink_escape', async () => {
// Simulate the swap: write a regular file, resolve, then
// replace it with a symlink to outside the workspace AFTER the
// boundary's pre-stat. We approximate by performing the swap
// *before* the call but after `resolve`; since the pre-stat
// and post-lstat happen back-to-back in the actual call, the
// post-lstat catches the symlink state.
const target = path.join(h.workspace, 'victim.txt');
await fsp.writeFile(target, 'plain');
const r = await h.fs.resolve('victim.txt', 'read');
// Replace the regular file with a symlink to an outside path.
const outside = path.join(h.scratch, 'sensitive.txt');
await fsp.writeFile(outside, 'sensitive');
await fsp.unlink(target);
await fsp.symlink(outside, target, 'file');
const err = await h.fs.readText(r).catch((e: unknown) => e);
expect(isFsError(err)).toBe(true);
expect((err as { kind: string }).kind).toBe('symlink_escape');
});
it('safeUtf8Truncate keeps multi-byte codepoints intact at the boundary', async () => {
// 4-char Chinese string, each char 3 bytes UTF-8 = 12 bytes.
// A naive slice at 7 bytes would split the 3rd char.
const src = '中文测试';
const target = path.join(h.workspace, 'cjk.txt');
await fsp.writeFile(target, src, 'utf-8');
const r = await h.fs.resolve('cjk.txt', 'read');
const out = await h.fs.readText(r, { maxBytes: 7 });
expect(out.meta.truncated).toBe(true);
// Result must be a valid prefix (no U+FFFD); 7 bytes / 3 bytes
// per char → 2 complete chars.
expect(out.content).toBe('中文');
expect(out.content).not.toMatch(/<2F>/);
});
it('glob rejects opts.cwd that lies outside boundWorkspace', async () => {
// Forge a `cwd` brand cast pointing outside the workspace; the
// entry-point validation should refuse before `globAsync` runs.
const outsideCwd = h.scratch as unknown as ResolvedPath;
const err = await h.fs
.glob('**/*', { cwd: outsideCwd })
.catch((e: unknown) => e);
expect(isFsError(err)).toBe(true);
expect((err as { kind: string }).kind).toBe('path_outside_workspace');
});
});
describe('WorkspaceFileSystem - audit always emits on body errors', () => {
let h: Harness;
beforeEach(async () => {

View file

@ -19,6 +19,7 @@ import { glob as globAsync } from 'glob';
import {
StandardFileSystemService,
loadIgnoreRules,
isWithinRoot,
type Ignore,
type WriteTextFileOptions,
} from '@qwen-code/qwen-code-core';
@ -327,11 +328,17 @@ class WorkspaceFileSystemImpl implements WorkspaceFileSystem {
};
let truncatedContent = result.content;
if (sizeOutcome.truncated) {
// Use `safeUtf8Truncate` instead of `subarray(0,n).toString('utf-8')`
// so the slice never splits a multi-byte codepoint (CJK,
// emoji). The plain `subarray + toString` approach silently
// emits U+FFFD at the boundary and breaks downstream JSON /
// source-code parsing of the truncated prefix.
const buf = Buffer.from(result.content, 'utf-8');
if (buf.length > sizeOutcome.bytesToRead) {
truncatedContent = buf
.subarray(0, sizeOutcome.bytesToRead)
.toString('utf-8');
truncatedContent = safeUtf8Truncate(
buf,
sizeOutcome.bytesToRead,
).toString('utf-8');
}
meta.truncated = true;
}
@ -348,6 +355,12 @@ class WorkspaceFileSystemImpl implements WorkspaceFileSystem {
meta.truncated = true;
}
if (ignoreVerdict.ignored) meta.matchedIgnore = ignoreVerdict.category;
// Post-read TOCTOU check: confirm the path's inode hasn't
// changed and it isn't now a symlink. Catches the
// swap-during-read attack where the file is replaced
// mid-operation with a symlink pointing outside the
// workspace.
await assertInodeStableAfterRead(p as string, st.ino);
this.deps.audit.recordAccess(this.deps.ctx, {
intent: 'read',
absolute: p,
@ -372,6 +385,8 @@ class WorkspaceFileSystemImpl implements WorkspaceFileSystem {
const st = await fsp.stat(p as string);
enforceReadBytesSize(st.size, opts.maxBytes);
const buf = await fsp.readFile(p as string);
// Post-read TOCTOU guard — same shape as `readText`.
await assertInodeStableAfterRead(p as string, st.ino);
this.deps.audit.recordAccess(this.deps.ctx, {
intent: 'read',
absolute: p,
@ -454,9 +469,25 @@ class WorkspaceFileSystemImpl implements WorkspaceFileSystem {
{ hint: 'pass a relative pattern such as "src/**/*.ts"' },
);
}
// `opts.cwd` is typed `ResolvedPath` but a brand cast in
// calling code can produce a path that's never been verified
// against `boundWorkspace` (or was verified at a stale
// moment). Re-validate at the entry point so a glob with
// `cwd: '/etc'` cannot enumerate files outside the workspace
// even when the *pattern* is harmlessly relative. The
// pattern-side checks above only constrain the pattern
// shape; without this check `cwd` is the actual hazard.
const cwd = (opts.cwd as string | undefined) ?? this.deps.boundWorkspace;
const cwdAbs = path.resolve(cwd);
if (!isWithinRoot(cwdAbs, this.deps.boundWorkspace)) {
throw new FsError(
'path_outside_workspace',
`glob cwd is outside workspace: ${cwd}`,
{ hint: 'opts.cwd must be a path obtained from fs.resolve()' },
);
}
const matches = await globAsync(pattern, {
cwd,
cwd: cwdAbs,
nodir: false,
absolute: true,
dot: true,
@ -642,6 +673,13 @@ class WorkspaceFileSystemImpl implements WorkspaceFileSystem {
});
}
const current = await fsp.readFile(p as string, 'utf-8');
// Post-read TOCTOU guard — catches the swap-during-read
// attack where `p` is replaced with a symlink between
// `fsp.stat` above and `fsp.readFile` here. The full
// read-modify-write race window (between this check and
// `lowFs.writeTextFile` below) is the deferred PR 20
// follow-up that adds atomic-via-temp + `expectedHash`.
await assertInodeStableAfterRead(p as string, st.ino);
// Single replacement to preserve atomic write-once semantics.
// Multi-occurrence handling lives in PR 20's edit endpoint
// where the route can decide policy; the boundary stays
@ -709,6 +747,85 @@ class WorkspaceFileSystemImpl implements WorkspaceFileSystem {
}
}
/**
* Truncate a UTF-8 buffer to at most `maxBytes` bytes WITHOUT
* splitting a multi-byte codepoint. `Buffer.subarray(0, n).toString('utf-8')`
* silently emits U+FFFD replacement chars when `n` falls in the
* middle of a 2-4-byte sequence (CJK, emoji); a downstream consumer
* parsing JSON / source code over the truncated content sees corrupted
* trailing bytes. We back off `n` to the last valid codepoint
* boundary so the truncated string is always a clean prefix of the
* original.
*
* Algorithm:
* 1. If the buffer fits, return as-is.
* 2. Walk back from `maxBytes` while the previous byte is a UTF-8
* continuation byte (`0b10xxxxxx`).
* 3. The byte at the new boundary is now either ASCII (`<0x80`) or
* a leading byte. If it's a leading byte, check whether the full
* multi-byte sequence fits within `maxBytes`. If not, drop the
* leading byte too the sequence is incomplete.
*/
function safeUtf8Truncate(buf: Buffer, maxBytes: number): Buffer {
if (buf.length <= maxBytes) return buf;
let end = maxBytes;
while (end > 0 && (buf[end] & 0xc0) === 0x80) end--;
if (end > 0) {
const lead = buf[end - 1];
let seqLen = 1;
if ((lead & 0x80) === 0) seqLen = 1;
else if ((lead & 0xe0) === 0xc0) seqLen = 2;
else if ((lead & 0xf0) === 0xe0) seqLen = 3;
else if ((lead & 0xf8) === 0xf0) seqLen = 4;
if (seqLen > 1 && end - 1 + seqLen > maxBytes) {
end = end - 1;
}
}
return buf.subarray(0, end);
}
/**
* Post-read TOCTOU guard. After reading the file at `p`, re-`lstat`
* to confirm the inode hasn't changed and the path isn't now a
* symlink. Catches the swap-then-leave attack where a regular
* file is replaced with a symlink to outside the workspace
* BETWEEN the boundary's pre-stat and the actual read the
* pre-stat saw the original (small, regular) file but the read
* followed the swap to wherever the attacker pointed. There's a
* residual race where the attacker swaps back after our read but
* before this check; that window is much smaller than the swap-
* and-leave attack and outside PR 18's threat model. The proper
* fix is fd-based reading (`fsp.open` + `fileHandle.read`) so the
* fd binds to the inode at open time; that's a follow-up since it
* requires a new variant of `lowFs.readTextFile` that takes a
* FileHandle instead of a path.
*/
async function assertInodeStableAfterRead(
p: string,
preIno: bigint | number,
): Promise<void> {
const post = await fsp.lstat(p);
if (post.isSymbolicLink()) {
throw new FsError(
'symlink_escape',
`path was replaced with a symlink during read: ${p}`,
{ hint: 'TOCTOU swap detected via post-read lstat' },
);
}
// ino can be 0 on virtual filesystems (procfs etc.) — only compare
// when both sides report a meaningful value.
const preNum = typeof preIno === 'bigint' ? preIno : BigInt(preIno as number);
const postNum =
typeof post.ino === 'bigint' ? post.ino : BigInt(post.ino as number);
if (preNum !== 0n && postNum !== 0n && preNum !== postNum) {
throw new FsError(
'symlink_escape',
`path inode changed during read: ${p}`,
{ hint: 'TOCTOU swap detected via inode comparison' },
);
}
}
function kindFromStats(st: {
isFile(): boolean;
isDirectory(): boolean;