docs(tools): document file encoding and platform-specific behavior

Add documentation for encoding detection, default encoding settings,
CRLF handling for batch files, and UTF-8 BOM for PowerShell scripts on
Windows.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
This commit is contained in:
tanzhenxin 2026-03-16 23:03:50 +08:00
parent 17939baa66
commit c3f5dd353d

View file

@ -165,4 +165,63 @@ grep_search(pattern="function", glob="*.js", limit=10)
- On failure: An error message explaining the reason (e.g., `Failed to edit, 0 occurrences found...`, `Failed to edit because the text matches multiple locations...`).
- **Confirmation:** Yes. Shows a diff of the proposed changes and asks for user approval before writing to the file.
## File encoding and platform-specific behavior
### Encoding detection and preservation
When reading files, Qwen Code detects the file's encoding using a multi-step strategy:
1. **UTF-8** — tried first (most modern tooling outputs UTF-8)
2. **chardet** — statistical detection for non-UTF-8 content
3. **System encoding** — falls back to the OS code page (Windows `chcp` / Unix `LANG`)
Both `write_file` and `edit` preserve the original encoding and BOM (byte order mark) of existing files. If a file was read as GBK with a UTF-8 BOM, it will be written back the same way.
### Configuring default encoding for new files
The `defaultFileEncoding` setting controls encoding for **newly created** files (not edits to existing files):
| Value | Behavior |
| ----------- | --------------------------------------------------------------------------- |
| _(not set)_ | UTF-8 without BOM, with automatic platform-specific adjustments (see below) |
| `utf-8` | UTF-8 without BOM, no automatic adjustments |
| `utf-8-bom` | UTF-8 with BOM for all new files |
Set it in `.qwen/settings.json` or `~/.qwen/settings.json`:
```json
{
"general": {
"defaultFileEncoding": "utf-8-bom"
}
}
```
### Windows: CRLF for batch files
On Windows, `.bat` and `.cmd` files are automatically written with CRLF (`\r\n`) line endings. This is required because `cmd.exe` uses CRLF as its line delimiter — LF-only endings can break multi-line `if`/`else`, `goto` labels, and `for` loops. This applies regardless of encoding settings and only on Windows.
### Windows: UTF-8 BOM for PowerShell scripts
On Windows with a **non-UTF-8 system code page** (e.g. GBK/cp936, Big5/cp950, Shift_JIS/cp932), newly created `.ps1` files are automatically written with a UTF-8 BOM. This is necessary because Windows PowerShell 5.1 (the version built into Windows 10/11) reads BOM-less scripts using the system's ANSI code page. Without a BOM, any non-ASCII characters in the script will be misinterpreted.
This automatic BOM only applies when:
- The platform is Windows
- The system code page is not UTF-8 (not code page 65001)
- The file is a new `.ps1` file (existing files keep their original encoding)
- The user has **not** explicitly set `defaultFileEncoding` in settings
PowerShell 7+ (pwsh) defaults to UTF-8 and handles BOM transparently, so the BOM is harmless there.
If you explicitly set `defaultFileEncoding` to `"utf-8"`, the automatic BOM is disabled — this is an intentional escape hatch for repositories or tooling that reject BOMs.
### Summary
| File type | Platform | Automatic behavior |
| -------------- | ----------------------------- | --------------------------- |
| `.bat`, `.cmd` | Windows | CRLF line endings |
| `.ps1` | Windows (non-UTF-8 code page) | UTF-8 BOM on new files |
| All others | All | UTF-8 without BOM (default) |
These file system tools provide a foundation for Qwen Code to understand and interact with your local project context.