Closes #50785

Opening a .wav file (PCM 16-bit) caused Zed to freeze because the binary
detection heuristic in `analyze_byte_content` misidentified it as
UTF-16LE text. The heuristic determines UTF-16 encoding solely by
checking whether null bytes are skewed toward even or odd positions. PCM
16-bit audio with small sample values produces bytes like `[sample,
0x00]`, creating an alternating null pattern at odd positions that is
indistinguishable from BOM-less UTF-16LE by position alone.
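
To make the ambiguity concrete, here is a minimal sketch of a position-only check. The function names and the exact skew rule are assumptions made for illustration, not the actual logic in `analyze_byte_content`. It shows that ASCII text encoded as UTF-16LE and quiet 16-bit PCM samples yield the same null-byte layout.

```rust
// Illustrative sketch only: the names and the skew rule below are assumptions,
// not Zed's actual `analyze_byte_content` implementation.

/// Encode text as BOM-less UTF-16LE bytes.
fn utf16le_bytes(text: &str) -> Vec<u8> {
    text.encode_utf16().flat_map(|unit| unit.to_le_bytes()).collect()
}

/// Encode 16-bit PCM samples as little-endian bytes, as in a WAV data chunk.
fn pcm16le_bytes(samples: &[i16]) -> Vec<u8> {
    samples.iter().flat_map(|sample| sample.to_le_bytes()).collect()
}

/// Position-only check: true when null bytes sit overwhelmingly at odd offsets.
fn nulls_skewed_to_odd(bytes: &[u8]) -> bool {
    let (mut even_nulls, mut odd_nulls) = (0usize, 0usize);
    for (index, &byte) in bytes.iter().enumerate() {
        if byte == 0 {
            if index % 2 == 0 {
                even_nulls += 1;
            } else {
                odd_nulls += 1;
            }
        }
    }
    // Assumed rule: at least a quarter of all bytes are odd-offset nulls,
    // and even-offset nulls are at most a tenth as common.
    !bytes.is_empty() && odd_nulls * 4 >= bytes.len() && even_nulls * 10 <= odd_nulls
}
```

Under these assumptions, both `utf16le_bytes("Hello, UTF-16 world!")` and `pcm16le_bytes` applied to the samples `1..=64` pass the check, which is why null position alone cannot separate the two.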
### Why not just add more binary headers?
The initial approach (32d8bd7009) was to add audio format signatures
(RIFF, OGG, FLAC, MP3) to the known binary headers. While this solved the
reported `.wav` case, any binary format containing small 16-bit values
(audio, images, or arbitrary data) would still be misclassified. Adding
headers is an endless game that cannot cover unknown or uncommon formats.
### Changes
* Adds `is_plausible_utf16_text` as a secondary validation: when the
null byte skew suggests UTF-16, decode the bytes and count code units
that fall in C0/C1 control character ranges (U+0000–U+001F,
U+007F–U+009F, excluding common whitespace) or form unpaired surrogates.
Real UTF-16 text has near-zero such characters. I've set the threshold
at 2%. Note that this value is empirically derived, not based on any
formal standard.
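
The validation described above could look roughly like the sketch below. This is not the code in this PR: the name `plausible_utf16le_sketch`, the little-endian-only decoding, the whitespace set (tab, LF, CR), and the inclusive 2% comparison are all assumptions chosen for illustration. It counts decoded items rather than raw code units, so a valid surrogate pair counts once.

```rust
// Hypothetical sketch of the secondary validation; names and details are
// assumptions, not the `is_plausible_utf16_text` added in this PR.

/// Encode text as BOM-less UTF-16LE bytes.
fn utf16le_bytes(text: &str) -> Vec<u8> {
    text.encode_utf16().flat_map(|unit| unit.to_le_bytes()).collect()
}

/// Encode 16-bit PCM samples as little-endian bytes.
fn pcm16le_bytes(samples: &[i16]) -> Vec<u8> {
    samples.iter().flat_map(|sample| sample.to_le_bytes()).collect()
}

/// Returns true when the bytes decode to UTF-16LE with at most 2% of items
/// being control characters (other than tab, LF, CR) or unpaired surrogates.
fn plausible_utf16le_sketch(bytes: &[u8]) -> bool {
    // Build 16-bit code units from little-endian byte pairs; a trailing odd
    // byte is ignored.
    let units = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));

    let mut total = 0usize;
    let mut suspicious = 0usize;
    for decoded in std::char::decode_utf16(units) {
        total += 1;
        match decoded {
            // `decode_utf16` yields an error for each unpaired surrogate.
            Err(_) => suspicious += 1,
            Ok(ch) => {
                let common_whitespace = matches!(ch, '\t' | '\n' | '\r');
                // `char::is_control` covers U+0000-U+001F and U+007F-U+009F.
                if ch.is_control() && !common_whitespace {
                    suspicious += 1;
                }
            }
        }
    }

    // Assumed threshold: suspicious items may make up at most 2% of the total.
    total > 0 && suspicious * 100 <= total * 2
}
```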
**Before fix**
<img width="1147" height="807" alt="Screenshot 2026-03-06 at 9.00.07 PM"
src="https://github.com/user-attachments/assets/2e6e47f9-f5e7-4cab-9d41-cc3dd20f9142"
/>
**After fix**
<img width="1280" height="783" alt="Screenshot 2026-03-06 at 1.17.43 AM"
src="https://github.com/user-attachments/assets/3fecea75-f061-4757-9972-220a34380d67"
/>
Before you mark this PR as ready for review, make sure that you have:
- [X] Added solid test coverage and/or screenshots from doing manual
testing
- [ ] Done a self-review taking into account security and performance
aspects
- [ ] Aligned any UI changes with the [UI
checklist](https://github.com/zed-industries/zed/blob/main/CONTRIBUTING.md#uiux-checklist)
Release Notes:
- Fixed binary files (e.g. WAV) being misdetected as UTF-16 text,
causing Zed to freeze.