Route host/local browser requests through the Browser tool instead of desktop or shell fallbacks. Add remote-debugging setup guidance to Browser runtime errors and document the exact Chrome inspect setting in prompts, skills, and Web UI copy.
Teach the Browser content helper to ignore global/delegated framework event bindings so snapshots surface the actual actionable controls instead of broad wrapper elements. Add an accessible name to the Browser address bar for more reliable capture output.
Allow agents to use selector-based reference actions, coordinate click fallbacks, focused-field typing, and string key chords such as CTRL+A across the browser tool, container runtime, and host connector runtime. Cover the behavior with browser regression and host connector tests.
Rename high-impact skills to task-oriented names and move plugin-owned skills into their owning plugin folders.\n\nAlign renamed skill frontmatter with the official SKILL.md standard by keeping trigger language in name/description metadata, replacing the old create-skill wizard with build-skill, and updating browser, A0 connector, computer-use, CLI setup, and scheduler skill references.\n\nTighten the recurring cross-provider guidance gaps surfaced by the evidence sweeps: memory requests now avoid promptinclude-file routing, scheduler prompts distinguish cron schedules from planned ISO dates, document questions prefer document_query, skills_tool search/read_file usage is clearer, normal notifications set info/priority 10, and local/host text editors preserve patch intent.\n\nUpdate regression tests for the renamed skills, plugin ownership, prompt budget reality, and standard frontmatter shape.
Standardize multi-action tools around tool_args.action while keeping parser compatibility for older tool/args, tool_name:action, and method-shaped requests. This keeps new prompts clean without breaking agents that learned the previous dialect.
Move A0 connector remote execution/file tools into stable standard prompts, make remote targeting independent of the active chat context, and skill-gate beta computer-use remote so it no longer weighs down the always-on tool list.
Align text editor, scheduler, skills, office artifact, memory, notify, and browser prompts/tools around the canonical action contract. Add scheduler update/timezone handling, skills_tool read_file, text editor patch coverage, and fixes for memory_forget, behaviour_adjustment, and code execution progress warnings.
Reduce default prompt pressure by compacting browser and scheduler prompts into skill-backed manifests, shortening skill catalog descriptions, and pruning noisy framework knowledge. Remove obsolete connector prompt stubs and root tool-call knowledge examples.
Tests: conda run -n a0 pytest tests/test_a0_connector_prompt_gating.py tests/test_tool_action_contracts.py tests/test_task_scheduler_timezone.py tests/test_text_editor_context_patch.py tests/test_tool_request_normalization.py tests/test_office_document_store.py::test_odf_is_advertised_and_docx_remains_explicit_compatibility tests/test_office_document_store.py::test_document_artifact_accepts_method_alias_for_ods_create tests/test_skills_runtime.py tests/test_default_prompt_budget.py::test_a0_small_profile_removed_and_prompt_text_generic -q
Integrate host-browser routing into the existing Browser tool. Store connector host-browser metadata, add pending browser op resolution, select connector runtimes from Browser settings, enforce host-content privacy policy, support automatic host preparation, and document the A0 CLI host-browser flow.
Introduce the shared surfaces frontend service and stylesheet so Browser and Desktop can register docked or floating live UI without special cases in modals.js. Update Browser and right-canvas integration to preserve active viewers across canvas/modal switches and avoid creating blank tabs unless explicitly requested.
Adds the explicit browser:screenshot action that writes JPEG/PNG files for vision_load, extends agent-callable Browser input actions, and documents the explicit vision workflow.
Adds the browser-forms on-demand skill and regression coverage for dispatch, runtime screenshot files, ref point resolution, upload path normalization, prompt discoverability, and label-wrapped form controls surfaced by the chat-driven E2E.
- Auto-register tabs opened by site (window.open, target=_blank,
ctrl-click) via context.on("page",...) with registry lock and
closing-state guard.
- Modifier-key click via Playwright trusted input: keyboard.down/up
around mouse.click for coord-based path; locator.click(modifiers=...)
selector fallback for off-screen / hidden elements. Chrome focus
rule: ctrl/meta-click keeps focus on origin tab; override via
focus_popup arg.
- key_chord action: presses keys in order, releases in reverse;
guarantees release on exception. Supports Ctrl+A/C/V style chords.
- mouse modifiers click-only (raises ValueError for non-click events).
- list(include_content=true) bulk read across all tabs in parallel
via asyncio.gather (was sequential).
- multi action: batched sub-calls. Different browser_id groups run
concurrently; same browser_id sequentially. Returns array of
{ok, result|error} matching input order. Lets the agent fan out
reads or coordinated mutations across tabs in one tool call.
- Cross-tab work no longer steals viewer focus.
last_interacted_browser_id promotes only on open / set_active /
same-tab work / Chrome popup rule. WebUI auto-open allowlist
tightened to open|navigate|set_active so background actions don't
drag the viewer.
- New set_active action for explicit focus switch.
- JS helper bumps VERSION to force re-injection on cached pages;
exports boundingBoxFor returning {x,y,w,h,selector} for the
trusted-input modifier-click paths.
Backwards-compatible: every new arg is optional with safe defaults.
No removed actions; existing call shapes preserved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce the new built-in Browser plugin for Agent Zero, replacing the legacy
browser-use-based browser agent with a direct Playwright-powered browser tool,
live WebUI viewer, browser session controls, status APIs, configuration, and
extension-management support.
Add browser-specific modal behavior so the browser can run as a floating,
resizable, no-backdrop window, including modal focus, toggle, and idempotent
open helpers for richer WebUI surfaces.
Remove the old `_browser_agent` core plugin and the `browser-use` dependency,
then clean up stale browser-model wiring and references across agent code,
model configuration docs, setup guides, troubleshooting docs, skills, and
Agent Zero knowledge.
Update regression and WebUI extension-surface coverage for the new browser
architecture and modal behavior.
The legacy browser-use implementation has been extracted from core so it can
continue separately as a community plugin published through the A0 Plugin Index for any user or professional that were relying on it for workflow.