Route no-path browser screenshots through an in-process ephemeral image registry that vision_load consumes into the existing data-url model boundary. Stop materializing host-browser artifacts into tmp/browser/host-screenshots, keep explicit path screenshots durable, and make browser log metadata point at the active chat/task context while preserving browser-context detail.
browser
Rendered browser automation for pages that need interaction, JavaScript, forms, downloads, screenshots, or visual inspection.
Prefer search_engine or document_query for plain text research. The tool must not open a Browser surface automatically. Use the tool headlessly unless the user opens the Browser surface or asks for the optional visible WebUI viewer.
The browser may run in Docker container mode or A0 CLI host-browser mode depending on settings. Container-mode paths resolve inside Agent Zero; host-mode paths resolve on the connected A0 CLI host.
When the user asks for "my browser", "host browser", "local browser", local Chrome, or opening a URL in their host browser, use this browser tool. Do not substitute computer_use_remote, code_execution_remote, or host shell launchers such as xdg-open, sensible-browser, or Python webbrowser.open. If host-browser setup fails and mentions remote debugging, stop and tell the user to open chrome://inspect/#remote-debugging, enable "Allow remote debugging for this browser instance", run /browser host on, and retry.
For complex browser workflows, load skill browser-automation. For fragile forms, load skill browser-form-workflows.
Actions: open, list, state, set_active, navigate, back, forward, reload, content, detail, screenshot, click, hover, double_click, right_click, drag, type, submit, type_submit, scroll, evaluate, key_chord, mouse, wheel, keyboard, clipboard, set_viewport, select_option, set_checked, upload_file, multi, close, close_all.
Common args: action, browser_id, url, ref, target_ref, text, selector, selectors, script, modifiers, keys, key, include_content, focus_popup, event_type, x, y, to_x, to_y, delta_x, delta_y, button, quality, full_page, path, paths, value, values, checked, width, height, calls.
Workflow:
- open creates a tab and returns id/state.
- content returns markdown with refs like [link 3], [button 6], [input text 8].
- Interactions use refs from the latest content capture.
- For same-page controls that are easier to identify structurally, click, type, submit, type_submit, scroll, select_option, set_checked, and upload_file may use selector instead of ref; the tool resolves the selector through content first.
- click with x/y and no ref is treated as a coordinate mouse click.
- type with text and no ref types into the currently focused element.
- key_chord accepts either ["Control", "A"] or "CTRL+A".
- navigate reuses an existing browser_id and is preferred for serial browsing.
- Screenshots are explicit only; the browser does not automatically load screenshots. Call vision_load with the returned vision_load.tool_args.paths value before reasoning visually. When no path is requested, browser screenshots are ephemeral refs rather than saved files.
- Keep the tab set small; close pages after extracting what you need.
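The screenshot step can be sketched as two consecutive calls. This is an illustrative sketch, not a confirmed payload: the angle-bracket values are placeholders for whatever the earlier calls actually returned, and the list shape of paths is an assumption based on the plural name. First, request a screenshot without a path so the result is an ephemeral ref:

```json
{
  "tool_name": "browser",
  "tool_args": {
    "action": "screenshot",
    "browser_id": "<id returned by open>"
  }
}
```

Then pass the returned vision_load.tool_args.paths value on unchanged:

```json
{
  "tool_name": "vision_load",
  "tool_args": {
    "paths": ["<value returned in vision_load.tool_args.paths>"]
  }
}
```

Copy the returned value verbatim rather than constructing a path by hand, since a no-path screenshot is not written to a file.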
multi is only a browser action: use tool_name: "browser" with tool_args.action: "multi". Never use tool_name: "multi".
Example:
{
  "tool_name": "browser",
  "tool_args": {
    "action": "open",
    "url": "https://example.com"
  }
}
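A follow-up to the open call might look like the sketch below, which captures the page and then clicks a control by selector instead of ref. The browser_id placeholder stands for the id returned by open, and the selector string is a hypothetical CSS-style value; the exact selector syntax the tool accepts is an assumption here.

```json
{
  "tool_name": "browser",
  "tool_args": {
    "action": "content",
    "browser_id": "<id returned by open>"
  }
}
```

```json
{
  "tool_name": "browser",
  "tool_args": {
    "action": "click",
    "browser_id": "<id returned by open>",
    "selector": "button[type=submit]"
  }
}
```

When the control already appears in the latest content capture, passing its ref is the more direct option.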