fix(google-meet): harden observe mode speech health (#73256)

* fix(google-meet): harden observe mode speech health
* fix(google-meet): address observe speech review
* docs(google-meet): clarify observe mode guarantees
2026-04-28 06:31:11 +00:00 · 2026-04-28 06:21:10 +01:00 · 2026-04-28 06:21:10 +01:00 · 25851e3cae
commit 25851e3cae
parent 2633b14914
10 changed files with 398 additions and 154 deletions
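The `test_speech` contract this commit documents has three parts: reject `mode: "transcribe"`, always run realtime, and set `speechOutputVerified` only when realtime output bytes grow during the current call. A small TypeScript sketch can illustrate that contract. It is not OpenClaw's implementation. `SpeechSession`, `outputBytes()`, `speak()`, `resolveTestSpeechMode`, `verifySpeechOutput`, and the timeout and poll defaults are hypothetical assumptions. Only the result fields `speechOutputVerified` and `speechOutputTimedOut` come from the docs.

```typescript
// Hypothetical types for illustration; OpenClaw's real session API may differ.
type MeetMode = "realtime" | "transcribe";

interface SpeechSession {
  // Assumed: cumulative realtime audio bytes written to the Chrome audio bridge.
  outputBytes(): number;
  // Assumed: ask the realtime provider to say a phrase.
  speak(message: string): Promise<void>;
}

interface SpeechCheckResult {
  speechOutputVerified: boolean;
  speechOutputTimedOut: boolean;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// test_speech always runs realtime; an explicit observe-only request is an error.
function resolveTestSpeechMode(requested?: MeetMode): "realtime" {
  if (requested === "transcribe") {
    throw new Error("test_speech needs realtime mode; transcribe sessions cannot emit speech");
  }
  return "realtime";
}

// Counts speech as verified only if output bytes grow after this call starts.
async function verifySpeechOutput(
  session: SpeechSession,
  message: string,
  timeoutMs = 5000, // hypothetical default
  pollMs = 100, // hypothetical default
): Promise<SpeechCheckResult> {
  // Baseline is taken before speaking, so audio from earlier calls on a
  // reused session cannot count as a fresh success.
  const baseline = session.outputBytes();
  const deadline = Date.now() + timeoutMs;
  await session.speak(message);
  for (;;) {
    if (session.outputBytes() > baseline) {
      return { speechOutputVerified: true, speechOutputTimedOut: false };
    }
    if (Date.now() >= deadline) {
      // The provider may have accepted the utterance, but no new bytes reached the bridge.
      return { speechOutputVerified: false, speechOutputTimedOut: true };
    }
    await sleep(pollMs);
  }
}
```

Snapshotting the counter before speaking is the design choice that matters here. A session that already carries output bytes from an earlier call only passes if the count rises again. A timeout with no growth maps to `speechOutputVerified: false` with `speechOutputTimedOut: true`, which is the case the troubleshooting text attributes to output not reaching the Chrome audio bridge.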
--- a/docs/plugins/google-meet.md
+++ b/docs/plugins/google-meet.md
@@ -74,12 +74,21 @@ Check setup:
 openclaw googlemeet setup
 ```

-The setup output is meant to be agent-readable. It reports Chrome profile,
-audio bridge, node pinning, delayed realtime intro, and, when Twilio delegation
-is configured, whether the `voice-call` plugin and Twilio credentials are ready.
-Treat any `ok: false` check as a blocker before asking an agent to join.
-Use `openclaw googlemeet setup --json` for scripts or machine-readable output.
-Use `--transport chrome`, `--transport chrome-node`, or `--transport twilio`
+The setup output is meant to be agent-readable and mode-aware. It reports Chrome
+profile, node pinning, and, for realtime Chrome joins, the BlackHole/SoX audio
+bridge and delayed realtime intro checks. For observe-only joins, check the same
+transport with `--mode transcribe`; that mode skips realtime audio prerequisites
+because it does not listen through or speak through the bridge:
+
+```bash
+openclaw googlemeet setup --transport chrome-node --mode transcribe
+```
+
+When Twilio delegation is configured, setup also reports whether the
+`voice-call` plugin and Twilio credentials are ready. Treat any `ok: false`
+check as a blocker for the checked transport and mode before asking an agent to
+join. Use `openclaw googlemeet setup --json` for scripts or machine-readable
+output. Use `--transport chrome`, `--transport chrome-node`, or `--transport twilio`
 to preflight a specific transport before an agent tries it.

 Join a meeting:
@@ -144,8 +153,12 @@ then share the returned `meetingUri`.
 ```

 For an observe-only/browser-control join, set `"mode": "transcribe"`. That does
-not start the duplex realtime model bridge, so it will not talk back into the
-meeting.
+not start the duplex realtime model bridge, does not require BlackHole or SoX,
+and will not talk back into the meeting. Chrome joins in this mode also avoid
+OpenClaw's microphone/camera permission grant and avoid the Meet **Use
+microphone** path. If Meet shows an audio-choice interstitial, automation tries
+the no-microphone path and otherwise reports a manual action instead of opening
+the local microphone.

 During realtime sessions, `google_meet` status includes browser and audio bridge
 health such as `inCall`, `manualActionRequired`, `providerConnected`,
@@ -155,10 +168,10 @@ appears, browser automation handles it when it can. Login, host admission, and
 browser/OS permission prompts are reported as manual action with a reason and
 message for the agent to relay.

-Local Chrome joins through the signed-in OpenClaw browser profile. In Meet, pick
-`BlackHole 2ch` for the microphone/speaker path used by OpenClaw. For clean
-duplex audio, use separate virtual devices or a Loopback-style graph; a single
-BlackHole device is enough for a first smoke test but can echo.
+Local Chrome joins through the signed-in OpenClaw browser profile. Realtime mode
+requires `BlackHole 2ch` for the microphone/speaker path used by OpenClaw. For
+clean duplex audio, use separate virtual devices or a Loopback-style graph; a
+single BlackHole device is enough for a first smoke test but can echo.

 ### Local gateway + Parallels Chrome

@@ -286,13 +299,13 @@ phrase, and prints session health:
 openclaw googlemeet test-speech https://meet.google.com/abc-defg-hij
 ```

-During join, OpenClaw browser automation fills the guest name, clicks Join/Ask
-to join, and accepts Meet's first-run "Use microphone" choice when that prompt
-appears. During browser-only meeting creation, it can also continue past the
-same prompt without microphone if Meet does not expose the use-microphone button.
-If the browser profile is not signed in, Meet is waiting for host
-admission, Chrome needs microphone/camera permission, or Meet is stuck on a
-prompt automation could not resolve, the join/test-speech result reports
+During realtime join, OpenClaw browser automation fills the guest name, clicks
+Join/Ask to join, and accepts Meet's first-run "Use microphone" choice when that
+prompt appears. During observe-only join or browser-only meeting creation, it
+continues past the same prompt without microphone when that choice is available.
+If the browser profile is not signed in, Meet is waiting for host admission,
+Chrome needs microphone/camera permission for a realtime join, or Meet is stuck
+on a prompt automation could not resolve, the join/test-speech result reports
 `manualActionRequired: true` with `manualActionReason` and
 `manualActionMessage`. Agents should stop retrying the join, report that exact
 message plus the current `browserUrl`/`browserTitle`, and retry only after the
@@ -979,7 +992,12 @@ Use `action: "status"` to list active sessions or inspect a session ID. Use
 `action: "speak"` with `sessionId` and `message` to make the realtime agent
 speak immediately. Use `action: "test_speech"` to create or reuse the session,
 trigger a known phrase, and return `inCall` health when the Chrome host can
-report it. Use `action: "leave"` to mark a session ended.
+report it. `test_speech` always forces `mode: "realtime"` and fails if asked to
+run in `mode: "transcribe"` because observe-only sessions intentionally cannot
+emit speech. Its `speechOutputVerified` result is based on realtime audio output
+bytes increasing during this test call, so a reused session with older audio
+does not count as a fresh successful speech check. Use `action: "leave"` to mark
+a session ended.

 `status` includes Chrome health when available:

@@ -1224,7 +1242,12 @@ openclaw googlemeet doctor
 ```

 Use `mode: "realtime"` for listen/talk-back. `mode: "transcribe"` intentionally
-does not start the duplex realtime voice bridge.
+does not start the duplex realtime voice bridge. `googlemeet test-speech`
+always checks the realtime path and reports whether bridge output bytes were
+observed for that invocation. If `speechOutputVerified` is false and
+`speechOutputTimedOut` is true, the realtime provider may have accepted the
+utterance but OpenClaw did not see new output bytes reach the Chrome audio
+bridge.

 Also verify:

@@ -1317,7 +1340,7 @@ call still needs a participant path. This plugin keeps that boundary visible:
 Chrome handles browser participation and local audio routing; Twilio handles
 phone dial-in participation.

-Chrome realtime mode needs either:
+Chrome realtime mode needs `BlackHole 2ch` plus either:

 - `chrome.audioInputCommand` plus `chrome.audioOutputCommand`: OpenClaw owns the
  realtime model bridge and pipes audio in `chrome.audioFormat` between those