Split the legacy core speech stack into two built-in, independently toggleable plugins: `_kokoro_tts` for TTS and `_whisper_stt` for STT.
This refactor keeps dependency installation and bootstrap concerns in Docker/bootstrap/preload, while moving speech-specific tooling, APIs, prompts, UI, and runtime behavior into the plugins. Core now exposes engine-agnostic `tts-service` and `stt-service` brokers, with browser-native TTS preserved as the fallback when Kokoro is disabled.
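A minimal sketch of the broker-with-fallback idea, for reviewers who want a mental model. Everything here is hypothetical: `createTtsBroker`, `register`, `unregister`, and the `speak(text)` provider shape are illustrative names, not the actual `tts-service` API in this change.

```javascript
// Hypothetical sketch only: the real `tts-service` broker may look quite different.
function createTtsBroker(fallbackProvider) {
  // provider id -> { speak(text) }; a Map keeps insertion order
  const providers = new Map();

  // Prefer the most recently registered plugin provider, else the fallback.
  function activeProvider() {
    const registered = [...providers.values()];
    return registered.length > 0
      ? registered[registered.length - 1]
      : fallbackProvider;
  }

  return {
    register(id, provider) {
      providers.set(id, provider);
    },
    unregister(id) {
      providers.delete(id);
    },
    speak(text) {
      return activeProvider().speak(text);
    },
  };
}

// Stand-in providers that just tag the text with the engine name (assumed shapes).
const browserTts = { speak: (text) => `browser:${text}` };
const kokoroTts = { speak: (text) => `kokoro:${text}` };

const tts = createTtsBroker(browserTts);
const beforeEnable = tts.speak("hi"); // no plugin registered yet
tts.register("_kokoro_tts", kokoroTts); // plugin enabled
const whileEnabled = tts.speak("hi");
tts.unregister("_kokoro_tts"); // plugin disabled again
const afterDisable = tts.speak("hi");
```

In this sketch, callers only ever talk to the broker, so enabling or disabling a plugin at runtime changes the engine without touching call sites.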
Included in this change:
- add built-in `_kokoro_tts` plugin with plugin-owned synth API, config, status UI, and provider registration
- add built-in `_whisper_stt` plugin with plugin-owned transcribe API, mic runtime, device UI, prompt injection, and provider registration
- remove legacy core speech APIs/helpers/settings/UI and delete unused `webui/js/speech_browser.js`
- replace the old hardcoded speech settings section with a generic voice surface backed by plugin extensions
- update preload/docs/tests to match the new plugin-owned speech architecture
Behavioral intent:
- both plugins are built-in but not `always_enabled`
- users can now hot-switch TTS and STT independently
- browser TTS remains available when `_kokoro_tts` is off
- Whisper mic UI only appears when `_whisper_stt` is enabled
- Introduced new extension points in the chat components: `chat-input`, `chat-top`, and `chat-bar`.
- Added extension points for sidebar components: `sidebar-start`, `sidebar-end`, and others.
- Added extension points to the modal structure.
- Updated documentation in README.md to reflect current sidebar, input, chat, welcome, and modal surfaces.
- Added tests covering the web UI extension surfaces.
When you select LIST mode, `applyModeSteps("list", showUtils)` is called:
- Groups expand because the check is `mode !== "collapsed"`, and `"list" !== "collapsed"` is true ✓
- Steps stay collapsed because the check is `mode === "expanded"`, and `"list" === "expanded"` is false ✗
- During streaming, steps don't auto-expand because that condition checks `detailMode === "current"`, which is false in list mode.
This is why we only need one line for this new mode to work.
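The checks described above can be sketched as a small standalone function. This is a hypothetical reconstruction for illustration: `resolveModeState` and `shouldAutoExpandWhileStreaming` are made-up names, and only the three comparisons are taken from the description; the real `applyModeSteps` also receives `showUtils` and updates the DOM.

```javascript
// Hypothetical helper: mirrors only the comparisons quoted in the description.
function resolveModeState(mode) {
  return {
    // Groups are expanded in every mode except "collapsed".
    groupsExpanded: mode !== "collapsed",
    // Steps are expanded only in "expanded" mode.
    stepsExpanded: mode === "expanded",
  };
}

// Streaming auto-expand is gated on the "current" detail mode only.
function shouldAutoExpandWhileStreaming(detailMode) {
  return detailMode === "current";
}

const listState = resolveModeState("list");
const collapsedState = resolveModeState("collapsed");
const expandedState = resolveModeState("expanded");
const listAutoExpands = shouldAutoExpandWhileStreaming("list");
```

Because `"list"` already falls through the existing comparisons with the desired result (groups open, steps closed, no auto-expand), no extra branching is needed for it.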