mirror of
https://github.com/openclaw/openclaw.git
synced 2026-04-28 06:31:11 +00:00
fix(google-meet): harden observe mode speech health (#73256)
* fix(google-meet): harden observe mode speech health * fix(google-meet): address observe speech review * docs(google-meet): clarify observe mode guarantees
This commit is contained in:
parent
2633b14914
commit
25851e3cae
10 changed files with 398 additions and 154 deletions
|
|
@ -74,12 +74,21 @@ Check setup:
|
|||
openclaw googlemeet setup
|
||||
```
|
||||
|
||||
The setup output is meant to be agent-readable. It reports Chrome profile,
|
||||
audio bridge, node pinning, delayed realtime intro, and, when Twilio delegation
|
||||
is configured, whether the `voice-call` plugin and Twilio credentials are ready.
|
||||
Treat any `ok: false` check as a blocker before asking an agent to join.
|
||||
Use `openclaw googlemeet setup --json` for scripts or machine-readable output.
|
||||
Use `--transport chrome`, `--transport chrome-node`, or `--transport twilio`
|
||||
The setup output is meant to be agent-readable and mode-aware. It reports Chrome
|
||||
profile, node pinning, and, for realtime Chrome joins, the BlackHole/SoX audio
|
||||
bridge and delayed realtime intro checks. For observe-only joins, check the same
|
||||
transport with `--mode transcribe`; that mode skips realtime audio prerequisites
|
||||
because it does not listen through or speak through the bridge:
|
||||
|
||||
```bash
|
||||
openclaw googlemeet setup --transport chrome-node --mode transcribe
|
||||
```
|
||||
|
||||
When Twilio delegation is configured, setup also reports whether the
|
||||
`voice-call` plugin and Twilio credentials are ready. Treat any `ok: false`
|
||||
check as a blocker for the checked transport and mode before asking an agent to
|
||||
join. Use `openclaw googlemeet setup --json` for scripts or machine-readable
|
||||
output. Use `--transport chrome`, `--transport chrome-node`, or `--transport twilio`
|
||||
to preflight a specific transport before an agent tries it.
|
||||
|
||||
Join a meeting:
|
||||
|
|
@ -144,8 +153,12 @@ then share the returned `meetingUri`.
|
|||
```
|
||||
|
||||
For an observe-only/browser-control join, set `"mode": "transcribe"`. That does
|
||||
not start the duplex realtime model bridge, so it will not talk back into the
|
||||
meeting.
|
||||
not start the duplex realtime model bridge, does not require BlackHole or SoX,
|
||||
and will not talk back into the meeting. Chrome joins in this mode also avoid
|
||||
OpenClaw's microphone/camera permission grant and avoid the Meet **Use
|
||||
microphone** path. If Meet shows an audio-choice interstitial, automation tries
|
||||
the no-microphone path and otherwise reports a manual action instead of opening
|
||||
the local microphone.
|
||||
|
||||
During realtime sessions, `google_meet` status includes browser and audio bridge
|
||||
health such as `inCall`, `manualActionRequired`, `providerConnected`,
|
||||
|
|
@ -155,10 +168,10 @@ appears, browser automation handles it when it can. Login, host admission, and
|
|||
browser/OS permission prompts are reported as manual action with a reason and
|
||||
message for the agent to relay.
|
||||
|
||||
Local Chrome joins through the signed-in OpenClaw browser profile. In Meet, pick
|
||||
`BlackHole 2ch` for the microphone/speaker path used by OpenClaw. For clean
|
||||
duplex audio, use separate virtual devices or a Loopback-style graph; a single
|
||||
BlackHole device is enough for a first smoke test but can echo.
|
||||
Local Chrome joins through the signed-in OpenClaw browser profile. Realtime mode
|
||||
requires `BlackHole 2ch` for the microphone/speaker path used by OpenClaw. For
|
||||
clean duplex audio, use separate virtual devices or a Loopback-style graph; a
|
||||
single BlackHole device is enough for a first smoke test but can echo.
|
||||
|
||||
### Local gateway + Parallels Chrome
|
||||
|
||||
|
|
@ -286,13 +299,13 @@ phrase, and prints session health:
|
|||
openclaw googlemeet test-speech https://meet.google.com/abc-defg-hij
|
||||
```
|
||||
|
||||
During join, OpenClaw browser automation fills the guest name, clicks Join/Ask
|
||||
to join, and accepts Meet's first-run "Use microphone" choice when that prompt
|
||||
appears. During browser-only meeting creation, it can also continue past the
|
||||
same prompt without microphone if Meet does not expose the use-microphone button.
|
||||
If the browser profile is not signed in, Meet is waiting for host
|
||||
admission, Chrome needs microphone/camera permission, or Meet is stuck on a
|
||||
prompt automation could not resolve, the join/test-speech result reports
|
||||
During realtime join, OpenClaw browser automation fills the guest name, clicks
|
||||
Join/Ask to join, and accepts Meet's first-run "Use microphone" choice when that
|
||||
prompt appears. During observe-only join or browser-only meeting creation, it
|
||||
continues past the same prompt without microphone when that choice is available.
|
||||
If the browser profile is not signed in, Meet is waiting for host admission,
|
||||
Chrome needs microphone/camera permission for a realtime join, or Meet is stuck
|
||||
on a prompt automation could not resolve, the join/test-speech result reports
|
||||
`manualActionRequired: true` with `manualActionReason` and
|
||||
`manualActionMessage`. Agents should stop retrying the join, report that exact
|
||||
message plus the current `browserUrl`/`browserTitle`, and retry only after the
|
||||
|
|
@ -979,7 +992,12 @@ Use `action: "status"` to list active sessions or inspect a session ID. Use
|
|||
`action: "speak"` with `sessionId` and `message` to make the realtime agent
|
||||
speak immediately. Use `action: "test_speech"` to create or reuse the session,
|
||||
trigger a known phrase, and return `inCall` health when the Chrome host can
|
||||
report it. Use `action: "leave"` to mark a session ended.
|
||||
report it. `test_speech` always forces `mode: "realtime"` and fails if asked to
|
||||
run in `mode: "transcribe"` because observe-only sessions intentionally cannot
|
||||
emit speech. Its `speechOutputVerified` result is based on realtime audio output
|
||||
bytes increasing during this test call, so a reused session with older audio
|
||||
does not count as a fresh successful speech check. Use `action: "leave"` to mark
|
||||
a session ended.
|
||||
|
||||
`status` includes Chrome health when available:
|
||||
|
||||
|
|
@ -1224,7 +1242,12 @@ openclaw googlemeet doctor
|
|||
```
|
||||
|
||||
Use `mode: "realtime"` for listen/talk-back. `mode: "transcribe"` intentionally
|
||||
does not start the duplex realtime voice bridge.
|
||||
does not start the duplex realtime voice bridge. `googlemeet test-speech`
|
||||
always checks the realtime path and reports whether bridge output bytes were
|
||||
observed for that invocation. If `speechOutputVerified` is false and
|
||||
`speechOutputTimedOut` is true, the realtime provider may have accepted the
|
||||
utterance but OpenClaw did not see new output bytes reach the Chrome audio
|
||||
bridge.
|
||||
|
||||
Also verify:
|
||||
|
||||
|
|
@ -1317,7 +1340,7 @@ call still needs a participant path. This plugin keeps that boundary visible:
|
|||
Chrome handles browser participation and local audio routing; Twilio handles
|
||||
phone dial-in participation.
|
||||
|
||||
Chrome realtime mode needs either:
|
||||
Chrome realtime mode needs `BlackHole 2ch` plus either:
|
||||
|
||||
- `chrome.audioInputCommand` plus `chrome.audioOutputCommand`: OpenClaw owns the
|
||||
realtime model bridge and pipes audio in `chrome.audioFormat` between those
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue