vrr/agent-zero

mirror of https://github.com/agent0ai/agent-zero.git synced 2026-05-22 19:47:15 +00:00


Alessandro 675afa8dee 2026-05-21 05:41:59 +02:00
Refactor speech stack into built-in Kokoro TTS and Whisper STT plugins

Split the legacy core speech stack into two built-in, independently toggleable plugins: `_kokoro_tts` for TTS and `_whisper_stt` for STT. This refactor keeps dependency installation and bootstrap concerns in Docker/bootstrap/preload, while moving speech-specific tooling, APIs, prompts, UI, and runtime behavior into the plugins. Core now exposes engine-agnostic `tts-service` and `stt-service` brokers, with browser-native TTS preserved as the fallback when Kokoro is disabled.

Included in this change:
- add built-in `_kokoro_tts` plugin with plugin-owned synth API, config, status UI, and provider registration
- add built-in `_whisper_stt` plugin with plugin-owned transcribe API, mic runtime, device UI, prompt injection, and provider registration
- remove legacy core speech APIs/helpers/settings/UI and delete unused `webui/js/speech_browser.js`
- replace the old hardcoded speech settings section with a generic voice surface backed by plugin extensions
- update preload/docs/tests to match the new plugin-owned speech architecture

Behavioral intent:
- both plugins are built-in but not `always_enabled`
- users can now hot-switch TTS and STT independently
- browser TTS remains available when `_kokoro_tts` is off
- Whisper mic UI only appears when `_whisper_stt` is enabled
api/
extensions/webui/
helpers/
webui/
default_config.yaml
hooks.py
plugin.yaml
README.md

README.md

Whisper STT

Built-in speech-to-text plugin backed by Whisper.

Responsibilities

- Registers Whisper as the active STT provider when the plugin is enabled.
- Owns the microphone runtime, device selector UI, message delivery mode, and plugin APIs.
- Leaves dependency installation and model bootstrap to the Docker/bootstrap path rather than handling them itself.
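The provider-registration responsibility can be pictured as a small broker that core code talks to, with the plugin registering itself when enabled and removing itself when disabled. This is a hypothetical sketch: `SttService`, `SttProvider`, `register`, `unregister`, and `available` are illustrative names, not the real `stt-service` API, and the assumption that the UI checks `available()` to decide whether to show the mic is ours.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


# Hypothetical provider record: a name plus a function that turns audio into text.
@dataclass
class SttProvider:
    name: str
    transcribe: Callable[[bytes], str]


# Hypothetical engine-agnostic broker; core code depends only on this, not on Whisper.
@dataclass
class SttService:
    _providers: Dict[str, SttProvider] = field(default_factory=dict)
    _active: Optional[str] = None

    def register(self, provider: SttProvider) -> None:
        # Called when a plugin such as _whisper_stt is enabled; it becomes the active provider.
        self._providers[provider.name] = provider
        self._active = provider.name

    def unregister(self, name: str) -> None:
        # Called on hot-switch off; fall back to any other registered provider, or none.
        self._providers.pop(name, None)
        if self._active == name:
            self._active = next(iter(self._providers), None)

    def available(self) -> bool:
        # A UI could use this to decide whether to render the microphone control.
        return self._active is not None

    def transcribe(self, audio: bytes) -> str:
        if self._active is None:
            raise RuntimeError("no STT provider registered")
        return self._providers[self._active].transcribe(audio)


service = SttService()
service.register(SttProvider("whisper", lambda audio: f"{len(audio)} bytes"))
print(service.available(), service.transcribe(b"abc"))  # True 3 bytes
service.unregister("whisper")
print(service.available())  # False
```

Keeping the broker free of engine details is what lets a provider be swapped or disabled at runtime without touching the callers.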

Config

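- `model_size`: Whisper model name
- `language`: language hint, or `auto` to let Whisper detect the language
- `message_mode`: `send` submits final transcriptions immediately; `draft` leaves them in the composer for review
- `silence_threshold`: audio-level threshold, checked in the frontend, that microphone input must exceed before recording starts
- `silence_duration`: how long input must stay below the threshold before the recorder enters the waiting state
- `waiting_timeout`: how long the recorder stays in the waiting state before the captured audio is dispatched for transcription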
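The three timing settings fit together as a small state machine driven by the microphone level. The sketch below is not the plugin's frontend code: it assumes levels normalized to 0..1 and durations in milliseconds, and `SilenceDetector`, `feed`, and the `listening`/`recording` state names are hypothetical. Only the config keys and the waiting state come from this README.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SilenceDetector:
    silence_threshold: float  # assumed 0..1 level that counts as speech
    silence_duration: float   # assumed ms of silence before entering "waiting"
    waiting_timeout: float    # assumed ms spent in "waiting" before dispatch
    state: str = "listening"
    _silence_start: Optional[float] = None
    _waiting_start: Optional[float] = None

    def feed(self, t_ms: float, level: float) -> bool:
        """Process one level sample; return True when audio should be sent for transcription."""
        speaking = level > self.silence_threshold
        if self.state == "listening":
            # Nothing is recorded until input crosses the threshold.
            if speaking:
                self.state = "recording"
            return False
        if speaking:
            # Speech during "recording" or "waiting" resets both silence timers.
            self.state = "recording"
            self._silence_start = None
            self._waiting_start = None
            return False
        if self.state == "recording":
            # Start (or continue) timing the silence; switch to "waiting" once it lasts long enough.
            if self._silence_start is None:
                self._silence_start = t_ms
            elif t_ms - self._silence_start >= self.silence_duration:
                self.state = "waiting"
                self._waiting_start = t_ms
            return False
        # Still silent in "waiting": dispatch once the timeout elapses, then listen again.
        if t_ms - self._waiting_start >= self.waiting_timeout:
            self.state = "listening"
            self._silence_start = None
            self._waiting_start = None
            return True
        return False


det = SilenceDetector(silence_threshold=0.1, silence_duration=300, waiting_timeout=500)
levels = [0.5, 0.5] + [0.0] * 9  # samples 100 ms apart
events = [det.feed(i * 100, lvl) for i, lvl in enumerate(levels)]
print(events.index(True) * 100)  # 1000
```

With these illustrative values, silence begins at 200 ms, the detector enters the waiting state at 500 ms, and dispatch fires at 1000 ms. Any sample above the threshold during the wait returns it to recording, so a short pause mid-sentence does not cut the utterance.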

API

POST /api/plugins/_whisper_stt/transcribe
POST /api/plugins/_whisper_stt/status
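A client call to the transcribe endpoint might be built as follows. The request body shape (base64-encoded audio under an `audio` key), the placeholder base URL, and the helper name `build_transcribe_request` are assumptions for illustration only; the plugin's `api` handlers define the real contract, and any authentication headers the server requires are not shown.

```python
import base64
import json
import urllib.request


def build_transcribe_request(base_url: str, audio: bytes) -> urllib.request.Request:
    # Assumption: the endpoint accepts JSON with base64 audio under "audio".
    payload = {"audio": base64.b64encode(audio).decode("ascii")}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/plugins/_whisper_stt/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder base URL; point it at your own instance.
req = build_transcribe_request("http://localhost:8080", b"\x00\x01")
print(req.get_method(), req.full_url)
# Sending requires a running instance (and whatever auth it enforces):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

The status endpoint could be called the same way with a different path; its request and response fields are not documented here.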