diff --git a/docs/developers/tools/file-system.md b/docs/developers/tools/file-system.md index 1d781eeaf..118f5e0b6 100644 --- a/docs/developers/tools/file-system.md +++ b/docs/developers/tools/file-system.md @@ -165,4 +165,63 @@ grep_search(pattern="function", glob="*.js", limit=10) - On failure: An error message explaining the reason (e.g., `Failed to edit, 0 occurrences found...`, `Failed to edit because the text matches multiple locations...`). - **Confirmation:** Yes. Shows a diff of the proposed changes and asks for user approval before writing to the file. +## File encoding and platform-specific behavior + +### Encoding detection and preservation + +When reading files, Qwen Code detects the file's encoding using a multi-step strategy: + +1. **UTF-8** — tried first (most modern tooling outputs UTF-8) +2. **chardet** — statistical detection for non-UTF-8 content +3. **System encoding** — falls back to the OS code page (Windows `chcp` / Unix `LANG`) + +Both `write_file` and `edit` preserve the original encoding and BOM (byte order mark) of existing files. If a file was read as GBK with a UTF-8 BOM, it will be written back the same way. + +### Configuring default encoding for new files + +The `defaultFileEncoding` setting controls encoding for **newly created** files (not edits to existing files): + +| Value | Behavior | +| ----------- | --------------------------------------------------------------------------- | +| _(not set)_ | UTF-8 without BOM, with automatic platform-specific adjustments (see below) | +| `utf-8` | UTF-8 without BOM, no automatic adjustments | +| `utf-8-bom` | UTF-8 with BOM for all new files | + +Set it in `.qwen/settings.json` or `~/.qwen/settings.json`: + +```json +{ + "general": { + "defaultFileEncoding": "utf-8-bom" + } +} +``` + +### Windows: CRLF for batch files + +On Windows, `.bat` and `.cmd` files are automatically written with CRLF (`\r\n`) line endings. This is required because `cmd.exe` uses CRLF as its line delimiter — LF-only endings can break multi-line `if`/`else`, `goto` labels, and `for` loops. This applies regardless of encoding settings and only on Windows. + +### Windows: UTF-8 BOM for PowerShell scripts + +On Windows with a **non-UTF-8 system code page** (e.g. GBK/cp936, Big5/cp950, Shift_JIS/cp932), newly created `.ps1` files are automatically written with a UTF-8 BOM. This is necessary because Windows PowerShell 5.1 (the version built into Windows 10/11) reads BOM-less scripts using the system's ANSI code page. Without a BOM, any non-ASCII characters in the script will be misinterpreted. + +This automatic BOM only applies when: + +- The platform is Windows +- The system code page is not UTF-8 (not code page 65001) +- The file is a new `.ps1` file (existing files keep their original encoding) +- The user has **not** explicitly set `defaultFileEncoding` in settings + +PowerShell 7+ (pwsh) defaults to UTF-8 and handles BOM transparently, so the BOM is harmless there. + +If you explicitly set `defaultFileEncoding` to `"utf-8"`, the automatic BOM is disabled — this is an intentional escape hatch for repositories or tooling that reject BOMs. + +### Summary + +| File type | Platform | Automatic behavior | +| -------------- | ----------------------------- | --------------------------- | +| `.bat`, `.cmd` | Windows | CRLF line endings | +| `.ps1` | Windows (non-UTF-8 code page) | UTF-8 BOM on new files | +| All others | All | UTF-8 without BOM (default) | + These file system tools provide a foundation for Qwen Code to understand and interact with your local project context.