Commit graph

2 commits

Author SHA1 Message Date
Ichimura Tomoo
8e291ec404
encoding: Add "reopen with encoding" (#46553)
# Add "Reopen with Encoding" feature (Local/Single user)

## Summary

This PR adds a "Reopen with Encoding" feature to allow users to manually
specify an encoding and reload the active buffer.

This feature allows users to explicitly specify the encoding and reload
the file to resolve garbled text caused by incorrect detection.

## Changes

1.  Added encoding picker logic to `encoding_selector`

- Implemented a modal UI accessible via the command palette, shortcuts,
or by clicking the encoding status in the status bar.
- Allows users to select from a list of supported encodings (Shift JIS,
EUC-JP, UTF-16LE, etc.).

2.  Updated Buffer logic (crates/language)

- Added a `force_encoding_on_next_reload` flag to the Buffer struct.
- Updated the `reload` method to check this flag and apply the following
logic:
- **Non-Unicode (e.g., Shift JIS):** Bypasses heuristics (like BOM
checks) to force the specified encoding.
- **Unicode (e.g., UTF-8):** Performs standard BOM detection. This
ensures that the BOM is correctly handled/consumed when switching back
to UTF-8.

3.  UI / Keymap

- Made the encoding status in the status bar (ActiveBufferEncoding)
clickable.
- Added default keybindings:
  - macOS: cmd-k n
  - Linux/Windows: ctrl-k n
  - Windows: ctrl-k n

## Limitations & Scope

To ensure stability and keep the PR focused, the following scenarios are
intentionally out of scope:

1. **Collaboration and Remote Connections**

- Encoding changes are disabled when collaboration (is_shared) or SSH
remote connections (is_via_remote_server) are active.
- **Reason:** Synchronizing encoding state changes between host/guest or
handling remote reloads involves complex synchronization logic. This PR
focuses on local files only.

`Remote Connection (SSH/WSL)`

|Via status bar|Via shortcut/command|
|:---:|:---:|
|<img width="767" height="136" alt="remote_tooltip"
src="https://github.com/user-attachments/assets/6c7cb293-2486-4f6d-a3ff-2086d939398e"
width="400" />|<img width="742" height="219" alt="remote_shortcut"
src="https://github.com/user-attachments/assets/5448f199-2066-4baf-b349-a983ab2fa77a"
width="400" />|

`Collaboration Session `

|Via status bar|Via shortcut/command|
|:---:|:---:|
|<img width="734" height="86" alt="collab_tooltip"
src="https://github.com/user-attachments/assets/37de99a9-dd33-4c78-98bf-20654d41fdd0"
/>|<img width="720" height="182" alt="collab_pop"
src="https://github.com/user-attachments/assets/91d03ea7-f029-442a-8236-55234576f7ed"
/>|

2. Dirty State

- The feature is disabled if the buffer has unsaved changes to prevent
data loss during reload.

|Via status bar|Via shortcut/command|
|:---:|:---:|
|<img width="545" height="103" alt="local_dirty_tooltip"
src="https://github.com/user-attachments/assets/d9ae658e-52b3-4ecd-9873-d0ec8bd51b5d"
/>|<img width="707" height="178" alt="local_dirty_pop"
src="https://github.com/user-attachments/assets/d170ea1e-9fcb-42e7-aa3e-0555b4a19d86"
/>|

3. Files detected as Binary

Files that worktree detects as "binary" (e.g., UTF-16 files without BOM
containing non-ASCII characters) are not opened in the editor, so this
feature cannot be triggered.
**Future Work**: Fixing this would require modifying crates/worktree
heuristics or exposing a "Force Open as Text" action for InvalidItemView
to trigger. Given the scope and impact, this is deferred to a future PR.

## Test Plan

I verified the feature and BOM handling using the following scenarios:

### Preparation

Used the following test files:

-
[**test_utf8.txt**](https://github.com/user-attachments/files/24548803/test_utf8.txt):
English-only text file. No BOM.
-
[**test_utf8_bom.txt**](https://github.com/user-attachments/files/24548822/test_utf8_bom.txt):
English-only text file. With BOM.
-
[**test_utf8_jp_bom.txt**](https://github.com/user-attachments/files/24548825/test_utf8_jp_bom.txt):
UTF-8 with BOM file containing Japanese characters.
-
[**test_shiftjis_jp.txt**](https://github.com/user-attachments/files/24548827/test_shiftjis_jp.txt):
Shift-JIS file containing Japanese characters (content designed to
trigger misdetection, e.g., using only half-width katakana).

Used an external editor (VS Code or Notepad) for verification.

### Case 1: English-only file behavior

1.  Open an English-only UTF-8 file (test_utf8.txt).
2.  Reopen as Shift JIS.
3.  **Result:**

- Text appearance remains unchanged (since ASCII is compatible).
- Status bar updates to "Shift JIS".

### Case 2: Fixing Mojibake

1. Open a Shift-JIS file (test_shiftjis_jp.txt) that causes detection
failure.
    ※Confirm it opens with mojibake
2.  Select Shift JIS from the status bar selector.
3.  **Result:**

- Mojibake is resolved, and Japanese text is displayed correctly.
- Status bar updates to "Shift JIS".

### Case 3: Unicode file with BOM behavior

1.  Open an English-only UTF-8 with BOM file (test_utf8_bom.txt).
2.  Reopen as `Shift JIS`.
3.  **Result:**

- The BOM bytes are displayed as mojibake at the beginning of the file.
- The rest of the English text is displayed normally (ASCII
compatibility).
- Status bar updates to "Shift JIS".

### Case 4: Non-Unicode file with BOM behavior

1. Open a UTF-8 with BOM file containing Japanese
(test_utf8_jp_bom.txt).
2.  Reopen as Shift JIS.
3.  **Result:**

- The BOM bytes at the start are displayed as mojibake.
- The Japanese text body is displayed as mojibake (UTF-8 bytes
interpreted as Shift JIS).
- Status bar updates to "Shift JIS" (no BOM indicator).

### Case 5: Revert to Unicode

1.  From the state in Case 4 (Shift JIS with mojibake), reopen as UTF-8.
2.  **Result:**

- The BOM mojibake at the start disappears (consumed).
- The text returns to normal.
- Status bar updates to "UTF-8 (BOM)".

### Case 6: External BOM removal (State sync)

1.  Open a UTF-8 with BOM file in Zed (test_utf8_bom.txt).
2. Open the same file in an external editor and save it as UTF-8 (No
BOM).
3.  Refocus Zed.
4.  **Result:**

- Text appearance remains unchanged.
- The (BOM) indicator disappears from the status bar.
- Saving in Zed and checking externally confirms the BOM is gone.

### Case 7: External BOM addition

1. From the state in Case 6 (UTF-8 No BOM), save as UTF-8 with BOM in
the external editor.
2.  Refocus Zed.
3.  **Result:**

- The (BOM) indicator appears in the status bar.
- Saving in Zed and checking externally confirms the BOM is present.

### Case 8: External Encoding Change (Auto-detect sync)

1.  Open an English-only UTF-8 file in Zed (`test_utf8.txt`).
    * *Status bar shows: "UTF-8".*
2. Open the same file in an external editor and save it as **UTF-16LE
with BOM**.
3.  Refocus Zed.
4.  **Result:**
    * The text remains readable (no mojibake).
* **Status bar automatically updates to "UTF-16LE (BOM)".** (Verifies
that `buffer.encoding` is correctly updated during reload).

Release Notes:

- Added "Reopen with Encoding" feature (currently supported for local
files).

---------

Co-authored-by: Conrad Irwin <conrad.irwin@gmail.com>
2026-01-27 05:27:26 +00:00
Ichimura Tomoo
6b90aaa1bd
status_bar: Add encoding indicator (#45476)
## Context / Related PRs This PR is the third part of the encoding
support improvements, following:
- #44819: Introduced initial legacy encoding support (Shift-JIS, etc.).
- #45243: Fixed UTF-16 saving behavior and improved binary detection.

## Summary
This PR implements a status bar item that displays the character
encoding of the active buffer (e.g., `UTF-8`, `Shift_JIS`). It provides
visibility into the file's encoding and indicates the presence of a Byte
Order Mark (BOM).

## Features
- **Encoding Indicator**: Displays the encoding name in the status bar.
- **BOM Support**: Appends `(BOM)` to the encoding name if a BOM is
detected (e.g., `UTF-8 (BOM)`).
- **Configuration**: The active_encoding_button setting in status_bar
accepts "enabled", "disabled", or "non_utf8". The default is "non_utf8",
which displays the indicator for all encodings except standard UTF-8
(without BOM).
- **Settings UI**: Provides a dropdown menu in the Settings UI to
control this behavior.
- **Documentation**: Updated `configuring-zed.md` and
`visual-customization.md`.

## Implementation Details
- Created `ActiveBufferEncoding` component in
`crates/encoding_selector`.
- The click handler for the button is currently a **no-op**.
Implementing the functionality to reopen files with a specific encoding
has potential implications for real-time collaboration (e.g., syncing
buffer interpretation across peers). Therefore, this PR focuses strictly
on the visualization and configuration aspects to keep the scope simple
and focused.
- Updated schema and default settings to include
`active_encoding_button`.

## Screenshots

<img width="487" height="104" alt="image"
src="https://github.com/user-attachments/assets/041f096d-ac69-4bad-ac53-20cdcb41f733"
/>
<img width="454" height="99" alt="image"
src="https://github.com/user-attachments/assets/ed76daa2-2733-484f-bb1f-4688357c035a"
/>


## Configuration
To hide the button, add the following to `settings.json`:
```json
"status_bar": {
  "active_encoding_button": "disabled"
}
```

- **enabled**: Always show the encoding.
- **disabled**: Never show the encoding.
- **non_utf8**: Shows for non-UTF-8 encodings and UTF-8 with BOM. Only
hides for standard UTF-8 (Default).

<img width="1347" height="415" alt="image"
src="https://github.com/user-attachments/assets/7f4f4938-3320-4d21-852c-53ee886d9a44"
/>

## Heuristic Limitations:
The underlying detection logic (implemented in #44819 and #45243)
prioritizes UTF-8 opening performance and does not guarantee perfect
detection for all encodings. We consider this margin of error
acceptable, similar to the behavior seen in VS Code. A future "Reopen
with Encoding" feature would serve as the primary fallback for any
misdetections.

Release Notes:

- Added a status bar item to display the active file's character encoding (e.g. `UTF-16`). This shows for non-utf8 files by default and can be configured with `{"status_bar":{"active_encoding_button":"disabled|enabled|non_utf8"}}`
2026-01-07 07:24:46 +00:00