browser: replace browser-use agent with native browser

Introduce the new built-in Browser plugin for Agent Zero, replacing the legacy
browser-use-based browser agent with a direct Playwright-powered browser tool,
live WebUI viewer, browser session controls, status APIs, configuration, and
extension-management support.

Add browser-specific modal behavior so the browser can run as a floating,
resizable, no-backdrop window, including modal focus, toggle, and idempotent
open helpers for richer WebUI surfaces.

Remove the old `_browser_agent` core plugin and the `browser-use` dependency,
then clean up stale browser-model wiring and references across agent code,
model configuration docs, setup guides, troubleshooting docs, skills, and
Agent Zero knowledge.

Update regression and WebUI extension-surface coverage for the new browser
architecture and modal behavior.

The legacy browser-use implementation has been extracted from core so it can
continue separately as a community plugin, published through the A0 Plugin
Index for any users or professionals who were relying on it in their workflows.
Alessandro 2026-04-24 15:43:52 +02:00
parent 603fc2064b
commit 983d431a5e
65 changed files with 6936 additions and 1926 deletions

View file

@ -99,12 +99,12 @@ A detailed setup guide for Windows, macOS, and Linux can be found in the Agent Z
![Multi-agent](docs/res/usage/multi-agent.png)
### Browser Agent
### Browser
- Browser automation is provided by the built-in `_browser_agent` plugin.
- It uses the effective Main Model resolved by `_model_config`; there is no separate browser model slot.
- Browser vision follows the Main Model's vision setting.
- Playwright Chromium: **Docker** images ship the headless shell preinstalled. **Local development** installs it on first Browser Agent use via `ensure_playwright_binary()` in `plugins/_browser_agent/helpers/playwright.py` (into `tmp/playwright`); you can pre-install manually (see [Development Setup](docs/setup/dev-setup.md)) to skip the wait.
- Browser automation is provided by the built-in `_browser` plugin and the direct `browser` tool.
- The tool uses Playwright operations controlled by the main agent, with typed page refs such as `[link 3]` and `[button 6]`.
- The plugin includes a visible WebUI browser viewer for open sessions.
- Playwright Chromium: **Docker** images ship the headless shell preinstalled. **Local development** installs it on first browser use via `ensure_playwright_binary()` in `plugins/_browser/helpers/playwright.py` (into `tmp/playwright`); you can pre-install manually (see [Development Setup](docs/setup/dev-setup.md)) to skip the wait.
4. **Completely Customizable and Extensible**

View file

@ -740,10 +740,6 @@ class Agent:
    def get_utility_model(self):
        return None

    @extension.extensible
    def get_browser_model(self):
        return None

    @extension.extensible
    def get_embedding_model(self):
        return None
@ -1044,4 +1040,4 @@ class Agent:
message=message,
loop_data=loop_data,
**kwargs,
)
)

View file

@ -216,6 +216,24 @@ Outcome:
- Nested modals don't “flatten” into each other.
- The backdrop always darkens the page behind the active modal without hiding lower modals incorrectly.
### Floating no-backdrop modals
Use `.modal-floating` on the outer `.modal` when a modal should behave like a floating utility panel instead of a blocking dialog. This is for special live surfaces such as the browser panel where the user should keep seeing and interacting with the chat or dashboard behind the panel.
Working contract:
- `.modal-floating` suppresses the shared `.modal-backdrop` for that modal.
- `.modal-floating` makes the full-screen `.modal` shell pointer-transparent.
- `.modal-floating .modal-inner` remains pointer-active, so the floating panel itself still receives clicks, keyboard focus, drag handlers, resize handles, and form input.
- Floating modal sizing, dragging, and resizing are still component-owned unless promoted to shared modal CSS later. The modal system only provides the backdrop and pointer-event behavior.
Good to know:
- A floating modal does not close by clicking the page behind it, because those clicks pass through to the app. Keep an obvious close button in the modal header.
- If a floating modal opens another normal modal, the normal modal can still use the backdrop; stacking remains governed by the shared z-index logic.
- Use `.modal-no-backdrop` only when a component needs backdrop suppression without click-through floating behavior. Prefer `.modal-floating` for utility panels.
- Do not use `.modal-floating` for destructive confirmations, settings forms, auth, import/export, or workflows that require the user to finish or dismiss the dialog before interacting with the rest of the app.
---
## Writing a modal component (conventions)

View file

@ -114,4 +114,4 @@ Community-tested and reliable MCP servers:
- **VSCode MCP** - IDE workflows
> [!TIP]
> For browser automation tasks, the built-in Browser Agent plugin covers the default workflow. MCP-based browser tools are still useful when you need a different browser stack, remote browser control, or an alternative to the built-in Playwright Chromium (preinstalled in Docker; on demand via `ensure_playwright_binary()` in local dev).
> For browser automation tasks, the built-in `_browser` plugin and direct `browser` tool cover the default workflow. MCP-based browser tools are still useful when you need a different browser stack, remote browser control, or an alternative to the built-in Playwright Chromium (preinstalled in Docker; on demand via `ensure_playwright_binary()` in local dev).

View file

@ -221,7 +221,7 @@ SMTP_PASSWORD=email_pwd_here
### Subagent Configuration
Projects can enable or disable specific subagents. This is configured via the UI and stored in `.a0proj/agents.json`. The Browser Agent is not a subagent; it is a built-in plugin.
Projects can enable or disable specific subagents. This is configured via the UI and stored in `.a0proj/agents.json`. The browser tool is not a subagent; it is a built-in plugin.
### Project LLM Configuration

View file

@ -26,8 +26,8 @@ Refer to the [Choosing your LLMs](../setup/installation.md#installing-and-using-
**7. How can I make Agent Zero retain memory between sessions?**
Use **Settings → Backup & Restore** and avoid mapping the entire `/a0` directory. See [How to update Agent Zero](../setup/installation.md#how-to-update-agent-zero).
**8. My browser agent fails or says Playwright is missing. What now?**
The built-in Browser Agent is a plugin that uses the Main Model from `_model_config`. **Docker:** the Chromium headless shell is shipped preinstalled (typically under `/a0/tmp/playwright`). **Local development:** if the binary is missing, `ensure_playwright_binary()` in `plugins/_browser_agent/helpers/playwright.py` runs `playwright install chromium --only-shell` into `tmp/playwright` on first Browser Agent use (you may see UI notifications). To install ahead of time, run `PLAYWRIGHT_BROWSERS_PATH=tmp/playwright playwright install chromium --only-shell` after `pip install -r requirements.txt`. If you prefer an external browser stack, use MCP alternatives such as Browser OS, Chrome DevTools, or Playwright MCP. See [MCP Setup](mcp-setup.md).
**8. My browser tool fails or says Playwright is missing. What now?**
The built-in browser is provided by the `_browser` plugin and the direct `browser` tool. **Docker:** the Chromium headless shell is shipped preinstalled (typically under `/a0/tmp/playwright`). **Local development:** if the binary is missing, `ensure_playwright_binary()` in `plugins/_browser/helpers/playwright.py` runs `playwright install chromium --only-shell` into `tmp/playwright` on first browser use (you may see UI notifications). To install ahead of time, run `PLAYWRIGHT_BROWSERS_PATH=tmp/playwright playwright install chromium --only-shell` after `pip install -r requirements.txt`. If you prefer an external browser stack, use MCP alternatives such as Browser OS, Chrome DevTools, or Playwright MCP. See [MCP Setup](mcp-setup.md).
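
The manual pre-install command from the answer above can also be driven from Python. This is a minimal sketch, not part of the commit: `build_playwright_install` is a hypothetical helper name, and it only assembles the same command line and `PLAYWRIGHT_BROWSERS_PATH` value that the FAQ quotes.

```python
import os
from pathlib import Path


def build_playwright_install(cache_dir: str = "tmp/playwright") -> tuple[list[str], dict[str, str]]:
    """Return the argv and environment for pre-installing the Chromium headless shell.

    Hypothetical helper: it mirrors the documented shell command and does not
    call into the plugin's own ensure_playwright_binary().
    """
    env = dict(os.environ)
    # Playwright reads this variable to decide where browser binaries are stored.
    env["PLAYWRIGHT_BROWSERS_PATH"] = str(Path(cache_dir))
    argv = ["playwright", "install", "chromium", "--only-shell"]
    return argv, env


# To actually run the install (requires the playwright CLI on PATH):
#   import subprocess
#   argv, env = build_playwright_install()
#   subprocess.run(argv, env=env, check=True)
```

Keeping the command construction separate from its execution makes it easy to log or inspect what would run before any download starts.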
**9. My secrets disappeared after a backup restore.**
Secrets are stored in `/a0/usr/secrets.env` and are not always included in backup archives. Copy them manually.
@ -36,7 +36,7 @@ Secrets are stored in `/a0/usr/secrets.env` and are not always included in backu
- Join the Agent Zero [Skool](https://www.skool.com/agent-zero) or [Discord](https://discord.gg/B8KZKNsPpj) community.
**11. How do I adjust API rate limits?**
Use the model rate limit fields in Settings (Main Model and Utility Model sections) to set request/input/output limits. The Browser Agent inherits the Main Model limits. These map to the model config limits (for example `limit_requests`, `limit_input`, `limit_output`).
Use the model rate limit fields in Settings (Main Model and Utility Model sections) to set request/input/output limits. These map to the model config limits (for example `limit_requests`, `limit_input`, `limit_output`).
**12. My `code_execution_tool` doesn't work, what's wrong?**
- Ensure Docker is installed and running.

View file

@ -126,8 +126,8 @@ Agent Zero's power comes from its ability to use [tools](../developer/architectu
- **Understand Tools:** Agent Zero includes default tools like knowledge (powered by SearXNG), code execution, and communication. Understand the capabilities of these tools and how to invoke them.
### Browser Agent Status & MCP Alternatives
The built-in Browser Agent is provided by the `_browser_agent` plugin. It uses the effective Main Model from `_model_config`, including per-chat overrides and the Main Model vision flag. Playwright Chromium is preinstalled in **Docker**; in **local development** it is installed on demand when needed via `ensure_playwright_binary()` (see [Development Setup](../setup/dev-setup.md) to pre-install).
### Browser Tool Status & MCP Alternatives
The built-in browser is provided by the `_browser` plugin and direct `browser` tool. It uses Playwright operations controlled by the main agent, exposes typed page refs for links, buttons, images, and inputs, and includes a WebUI viewer for open browser sessions. Playwright Chromium is preinstalled in **Docker**; in **local development** it is installed on demand when needed via `ensure_playwright_binary()` (see [Development Setup](../setup/dev-setup.md) to pre-install).
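
The typed page refs mentioned above appear in the README hunk as `[link 3]` and `[button 6]`. As an illustration only, the sketch below pulls such refs out of a piece of text. The `PageRef` type and `find_page_refs` function are hypothetical names, and the pattern assumes every ref is a lowercase word followed by a number, which the diff does not confirm for images and inputs.

```python
import re
from typing import NamedTuple


class PageRef(NamedTuple):
    """Hypothetical container for one typed page ref."""

    kind: str
    index: int


# Assumed shape: "[<lowercase kind> <number>]", e.g. "[link 3]" or "[button 6]".
_REF_PATTERN = re.compile(r"\[([a-z]+) (\d+)\]")


def find_page_refs(text: str) -> list[PageRef]:
    """Return every typed ref found in the text, in order of appearance."""
    return [PageRef(kind, int(index)) for kind, index in _REF_PATTERN.findall(text)]


refs = find_page_refs("Open [link 3], then press [button 6].")
# refs holds PageRef("link", 3) and PageRef("button", 6)
```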
If you need a different browser stack or want external browser tooling, MCP-based browser tools are still a strong option:

View file

@ -69,7 +69,7 @@ Now when you select one of the python files in the project, you should see prope
pip install -r requirements.txt
PLAYWRIGHT_BROWSERS_PATH=tmp/playwright playwright install chromium --only-shell
```
The first command installs Python dependencies. The second installs the Chromium headless shell into `tmp/playwright` ahead of time (same path in Docker: `/a0/tmp/playwright`). If you skip the second command, **local development** still downloads the shell on first Browser Agent use through `ensure_playwright_binary()` in `plugins/_browser_agent/helpers/playwright.py`. Pre-installing avoids that wait. **Docker** images ship the shell preinstalled; runtime install is for local dev when the binary is missing.
The first command installs Python dependencies. The second installs the Chromium headless shell into `tmp/playwright` ahead of time (same path in Docker: `/a0/tmp/playwright`). If you skip the second command, **local development** still downloads the shell on first browser use through `ensure_playwright_binary()` in `plugins/_browser/helpers/playwright.py`. Pre-installing avoids that wait. **Docker** images ship the shell preinstalled; runtime install is for local dev when the binary is missing.
Errors in the code editor caused by missing packages should now be gone. If not, try reloading the window.

View file

@ -405,7 +405,7 @@ The Settings page is the control center for selecting the Large Language Models
| LLM Role | Description |
| --- | --- |
| `chat_llm` | This is the primary LLM used for conversations, agent reasoning, tool use, and the built-in browser agent. Vision support controls browser vision and image understanding. |
| `chat_llm` | This is the primary LLM used for conversations, agent reasoning, and tool use. Vision support controls image understanding. |
| `utility_llm` | This LLM handles internal tasks like summarizing messages, managing memory, and processing internal prompts. Using a smaller, less expensive model here can improve efficiency. |
| `embedding_llm` | The embedding model shipped with A0 runs on CPU and is responsible for generating embeddings used for memory retrieval and knowledge base lookups. Changing the `embedding_llm` will re-index all of A0's memory. |
@ -416,7 +416,7 @@ The Settings page is the control center for selecting the Large Language Models
3. Click "Save" to apply the changes.
> [!NOTE]
> The Browser Agent does not have a separate model slot. It uses the effective Main Model resolved by `_model_config`, including per-chat overrides and the Main Model vision flag.
> The built-in browser does not have a separate model slot. The main agent decides when to call the direct `browser` tool.
### Important Considerations

View file

@ -76,7 +76,7 @@ An external REST API is available for programmatic task submission. Agent-to-Age
- **No persistent state between chats** unless explicitly memorized or saved to files.
- **Context window**: long conversations are summarized automatically, which can lose detail.
- **Memory recall is approximate**: similarity search may miss relevant memories or surface irrelevant ones.
- **No GUI interaction** outside the browser agent (which is separate from the main agent).
- **No GUI interaction** outside built-in browser tooling or configured computer-use integrations.
- **Container boundary**: the agent cannot affect systems outside the Docker container unless network access or volume mounts are configured.
- **Model capability ceiling**: tool usage quality and reasoning depth are bounded by the underlying LLM. Small models may struggle with complex multi-step tool use.
- **No real-time data** beyond web search. The agent's own knowledge cutoff is the underlying model's training cutoff.

View file

@ -6,11 +6,11 @@ Agent Zero uses three configurable LLM roles:
| Role | Purpose |
|------|---------|
| `chat_llm` | Primary model for all agent reasoning, tool use, and the Browser Agent |
| `chat_llm` | Primary model for all agent reasoning and tool use |
| `utility_llm` | Secondary model for internal framework tasks: memory summarization, query generation, history compression, memory recall filtering |
| `embedding_llm` | Produces vector embeddings for memory and knowledge indexing |
The utility model handles high-volume, lower-stakes operations and can be a cheaper/faster model than the chat model. The Browser Agent uses the effective chat model resolved by `_model_config`, including per-chat overrides and the chat model vision flag. Changing the embedding model invalidates the existing vector index - the entire knowledge base is re-indexed automatically.
The utility model handles high-volume, lower-stakes operations and can be a cheaper/faster model than the chat model. Browser automation is exposed as the direct `browser` tool; the main agent decides when to call it. Changing the embedding model invalidates the existing vector index - the entire knowledge base is re-indexed automatically.
## Model Providers

View file

@ -45,7 +45,7 @@ from sentence_transformers import SentenceTransformer
from pydantic import ConfigDict
# disable extra logging, must be done repeatedly, otherwise browser-use will turn it back on for some reason
# keep provider logging quiet in normal operation
def turn_off_logging():
    os.environ["LITELLM_LOG"] = "ERROR"  # only errors
    litellm.suppress_debug_info = True

View file

@ -0,0 +1,27 @@
from helpers.api import ApiHandler, Request
from plugins._browser.helpers.extension_manager import (
    get_extensions_root,
    install_chrome_web_store_extension,
    list_browser_extensions,
)


class Extensions(ApiHandler):
    async def process(self, input: dict, request: Request) -> dict:
        action = input.get("action", "list")

        if action == "list":
            return {
                "ok": True,
                "root": str(get_extensions_root()),
                "extensions": list_browser_extensions(),
            }

        if action == "install_web_store":
            try:
                result = install_chrome_web_store_extension(str(input.get("url", "")))
            except ValueError as exc:
                return {"ok": False, "error": str(exc)}
            return result

        return {"ok": False, "error": f"Unknown action: {action}"}
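
The handler above dispatches on an `action` field and turns a `ValueError` into an `{"ok": False, ...}` response. The sketch below reproduces that control flow in isolation so the three outcomes can be exercised without the plugin. Both helper functions are hypothetical stand-ins: their return values and the error message are assumptions, not the real `extension_manager` behavior.

```python
import asyncio
from typing import Any


def list_browser_extensions() -> list[dict[str, Any]]:
    # Hypothetical stand-in for the plugin helper; pretends nothing is installed.
    return []


def install_chrome_web_store_extension(url: str) -> dict[str, Any]:
    # Hypothetical stand-in; the real helper's validation rules are not shown in this diff.
    if not url:
        raise ValueError("A Chrome Web Store URL is required")
    return {"ok": True, "url": url}


async def process(payload: dict[str, Any]) -> dict[str, Any]:
    """Dispatch on payload["action"], defaulting to "list"."""
    action = payload.get("action", "list")
    if action == "list":
        return {"ok": True, "extensions": list_browser_extensions()}
    if action == "install_web_store":
        try:
            return install_chrome_web_store_extension(str(payload.get("url", "")))
        except ValueError as exc:
            # Validation problems become a structured error instead of an exception.
            return {"ok": False, "error": str(exc)}
    return {"ok": False, "error": f"Unknown action: {action}"}


print(asyncio.run(process({"action": "install_web_store"})))
```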

View file

@ -0,0 +1,32 @@
from helpers.api import ApiHandler, Request
from plugins._browser.helpers.config import build_browser_launch_config, get_browser_config
from plugins._browser.helpers.playwright import get_playwright_binary, get_playwright_cache_dir
from plugins._browser.helpers.runtime import known_context_ids


class Status(ApiHandler):
    async def process(self, input: dict, request: Request) -> dict:
        browser_config = get_browser_config()
        launch_config = build_browser_launch_config(browser_config)
        runtime_binary = get_playwright_binary(
            full_browser=launch_config["requires_full_browser"]
        )
        shell_binary = get_playwright_binary(full_browser=False)
        chromium_binary = get_playwright_binary(full_browser=True)
        return {
            "plugin": "_browser",
            "playwright": {
                "cache_dir": get_playwright_cache_dir(),
                "binary_found": bool(runtime_binary),
                "binary_path": str(runtime_binary) if runtime_binary else "",
                "headless_shell_binary_path": str(shell_binary) if shell_binary else "",
                "chromium_binary_path": str(chromium_binary) if chromium_binary else "",
                "launch_mode": launch_config["browser_mode"],
            },
            "extensions": {
                **launch_config["extensions"],
                "launch_mode": launch_config["browser_mode"],
                "requires_full_browser": launch_config["requires_full_browser"],
            },
            "contexts": known_context_ids(),
        }

View file

@ -0,0 +1,241 @@
from __future__ import annotations

import asyncio
from typing import Any, ClassVar

from agent import AgentContext
from helpers.ws import WsHandler
from helpers.ws_manager import WsResult
from plugins._browser.helpers.runtime import get_runtime


class WsBrowser(WsHandler):
    _streams: ClassVar[dict[tuple[str, str], asyncio.Task[None]]] = {}

    async def on_disconnect(self, sid: str) -> None:
        for key in [key for key in self._streams if key[0] == sid]:
            task = self._streams.pop(key)
            task.cancel()

    async def process(
        self,
        event: str,
        data: dict[str, Any],
        sid: str,
    ) -> dict[str, Any] | WsResult | None:
        if not event.startswith("browser_"):
            return None

        if event == "browser_viewer_subscribe":
            return await self._subscribe(data, sid)
        if event == "browser_viewer_unsubscribe":
            return self._unsubscribe(data, sid)
        if event == "browser_viewer_command":
            return await self._command(data, sid)
        if event == "browser_viewer_input":
            return await self._input(data, sid)

        return WsResult.error(
            code="UNKNOWN_BROWSER_EVENT",
            message=f"Unknown browser event: {event}",
            correlation_id=data.get("correlationId"),
        )

    async def _subscribe(self, data: dict[str, Any], sid: str) -> dict[str, Any] | WsResult:
        context_id = self._context_id(data)
        if not context_id:
            return self._error("MISSING_CONTEXT", "context_id is required", data)
        if not AgentContext.get(context_id):
            return self._error("CONTEXT_NOT_FOUND", f"Context '{context_id}' was not found", data)

        runtime = await get_runtime(context_id)
        listing = await runtime.call("list")
        browsers = listing.get("browsers") or []
        if not browsers:
            opened = await runtime.call("open", "about:blank")
            listing = await runtime.call("list")
            browsers = listing.get("browsers") or []
            if opened.get("id"):
                listing["last_interacted_browser_id"] = opened.get("id")
        active_id = data.get("browser_id") or listing.get("last_interacted_browser_id")
        if not active_id and browsers:
            active_id = browsers[0].get("id")

        stream_key = (sid, context_id)
        existing = self._streams.pop(stream_key, None)
        if existing:
            existing.cancel()
        self._streams[stream_key] = asyncio.create_task(
            self._stream_frames(sid, context_id, active_id)
        )
        return {
            "context_id": context_id,
            "active_browser_id": active_id,
            "browsers": browsers,
        }

    def _unsubscribe(self, data: dict[str, Any], sid: str) -> dict[str, Any] | WsResult:
        context_id = self._context_id(data)
        if not context_id:
            return self._error("MISSING_CONTEXT", "context_id is required", data)
        task = self._streams.pop((sid, context_id), None)
        if task:
            task.cancel()
        return {"context_id": context_id, "unsubscribed": True}

    async def _command(self, data: dict[str, Any], sid: str) -> dict[str, Any] | WsResult:
        context_id = self._context_id(data)
        if not context_id:
            return self._error("MISSING_CONTEXT", "context_id is required", data)

        runtime = await get_runtime(context_id)
        command = str(data.get("command") or "").strip().lower().replace("-", "_")
        browser_id = data.get("browser_id")
        try:
            if command == "open":
                result = await runtime.call("open", data.get("url") or "about:blank")
            elif command == "navigate":
                result = await runtime.call("navigate", browser_id, data.get("url") or "")
            elif command == "back":
                result = await runtime.call("back", browser_id)
            elif command == "forward":
                result = await runtime.call("forward", browser_id)
            elif command == "reload":
                result = await runtime.call("reload", browser_id)
            elif command == "close":
                result = await runtime.call("close_browser", browser_id)
            elif command == "list":
                result = await runtime.call("list")
            else:
                return self._error("UNKNOWN_COMMAND", f"Unknown browser command: {command}", data)
        except Exception as exc:
            return self._error("COMMAND_FAILED", str(exc), data)

        listing = await runtime.call("list")
        last_interacted_browser_id = listing.get("last_interacted_browser_id")
        await self.emit_to(
            sid,
            "browser_viewer_state",
            {
                "context_id": context_id,
                "result": result,
                "browsers": listing.get("browsers") or [],
                "last_interacted_browser_id": last_interacted_browser_id,
            },
            correlation_id=data.get("correlationId"),
        )
        return {
            "result": result,
            "browsers": listing.get("browsers") or [],
            "last_interacted_browser_id": last_interacted_browser_id,
        }

    async def _input(self, data: dict[str, Any], sid: str) -> dict[str, Any] | WsResult:
        context_id = self._context_id(data)
        if not context_id:
            return self._error("MISSING_CONTEXT", "context_id is required", data)

        runtime = await get_runtime(context_id, create=False)
        if not runtime:
            return self._error("NO_BROWSER_RUNTIME", "No browser runtime exists for this context", data)

        input_type = str(data.get("input_type") or "").strip().lower()
        browser_id = data.get("browser_id")
        try:
            if input_type == "mouse":
                result = await runtime.call(
                    "mouse",
                    browser_id,
                    data.get("event_type") or "click",
                    float(data.get("x") or 0),
                    float(data.get("y") or 0),
                    data.get("button") or "left",
                )
            elif input_type == "keyboard":
                result = await runtime.call(
                    "keyboard",
                    browser_id,
                    key=str(data.get("key") or ""),
                    text=str(data.get("text") or ""),
                )
            elif input_type == "viewport":
                result = await runtime.call(
                    "set_viewport",
                    browser_id,
                    int(data.get("width") or 0),
                    int(data.get("height") or 0),
                )
            elif input_type == "wheel":
                result = await runtime.call(
                    "wheel",
                    browser_id,
                    float(data.get("x") or 0),
                    float(data.get("y") or 0),
                    float(data.get("delta_x") or 0),
                    float(data.get("delta_y") or 0),
                )
            else:
                return self._error("UNKNOWN_INPUT", f"Unknown browser input: {input_type}", data)
        except Exception as exc:
            return self._error("INPUT_FAILED", str(exc), data)
        return {"state": result}

    async def _stream_frames(
        self,
        sid: str,
        context_id: str,
        browser_id: int | str | None,
    ) -> None:
        while True:
            try:
                runtime = await get_runtime(context_id, create=False)
                if runtime:
                    listing = await runtime.call("list")
                    browsers = listing.get("browsers") or []
                    browser_ids = {str(browser.get("id")) for browser in browsers}
                    requested_id = str(browser_id or "") if browser_id else ""
                    active_id = (
                        browser_id
                        if requested_id and requested_id in browser_ids
                        else listing.get("last_interacted_browser_id")
                    )
                    if active_id and str(active_id) not in browser_ids:
                        active_id = None
                    if not active_id and browsers:
                        active_id = browsers[0].get("id")
                    if active_id:
                        frame = await runtime.call("screenshot", active_id)
                        frame["context_id"] = context_id
                        frame["browsers"] = browsers
                        await self.emit_to(sid, "browser_viewer_frame", frame)
                    else:
                        await self.emit_to(
                            sid,
                            "browser_viewer_frame",
                            {
                                "context_id": context_id,
                                "browser_id": None,
                                "browsers": browsers,
                                "image": "",
                                "mime": "",
                                "state": None,
                            },
                        )
                await asyncio.sleep(0.75)
            except asyncio.CancelledError:
                raise
            except Exception:
                await asyncio.sleep(1.5)

    @staticmethod
    def _context_id(data: dict[str, Any]) -> str:
        return str(data.get("context_id") or data.get("context") or "").strip()

    @staticmethod
    def _error(code: str, message: str, data: dict[str, Any]) -> WsResult:
        return WsResult.error(
            code=code,
            message=message,
            correlation_id=data.get("correlationId"),
        )
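
The frame loop in `_stream_frames` decides which browser to screenshot using a small fallback chain: the requested id if it still exists, otherwise the last interacted browser, otherwise the first open browser. The sketch below lifts that chain into a pure function so it can be read and tested on its own. `pick_active_browser` is a hypothetical name and is not part of the commit; the listing shape follows the keys used in the handler.

```python
from typing import Any


def pick_active_browser(requested_id: Any, listing: dict[str, Any]) -> Any:
    """Return the id of the browser to stream, or None when nothing is open."""
    browsers = listing.get("browsers") or []
    # Compare ids as strings so 1 and "1" are treated as the same browser.
    browser_ids = {str(browser.get("id")) for browser in browsers}
    requested = str(requested_id) if requested_id else ""

    if requested and requested in browser_ids:
        active_id = requested_id
    else:
        active_id = listing.get("last_interacted_browser_id")

    # A stale "last interacted" id must not be used once that browser is closed.
    if active_id and str(active_id) not in browser_ids:
        active_id = None
    if not active_id and browsers:
        active_id = browsers[0].get("id")
    return active_id


listing = {"browsers": [{"id": 1}, {"id": 2}], "last_interacted_browser_id": 2}
print(pick_active_browser(1, listing))  # the requested browser still exists
print(pick_active_browser(9, listing))  # falls back to the last interacted browser
```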

File diff suppressed because it is too large

View file

@ -0,0 +1,10 @@
# Load unpacked Chromium extension directories into the Browser tool.
# Paths must be readable from the Agent Zero runtime itself.
extensions_enabled: false
# One unpacked extension directory per item.
extension_paths: []
# Optional _model_config preset used by Browser-owned model helpers.
# Empty uses the effective Main Model.
model_preset: ""
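
The `extension_paths` setting above takes one unpacked extension directory per item. The sketch below shows one way such a value might be cleaned up before launch, modeled on the `_normalize_extension_paths` helper that appears later in this commit. It is a standalone illustration, and the sample paths are made up.

```python
from pathlib import Path
from typing import Any


def normalize_extension_paths(value: Any) -> list[str]:
    """Accept a newline-separated string or a sequence and return unique, trimmed paths."""
    if isinstance(value, str):
        # Treat Windows and old Mac line endings like plain newlines.
        candidates = value.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    elif isinstance(value, (list, tuple, set)):
        candidates = list(value)
    else:
        candidates = []

    normalized_paths: list[str] = []
    seen: set[str] = set()
    for entry in candidates:
        raw_path = str(entry or "").strip()
        if not raw_path:
            continue  # skip blank lines and empty items
        normalized = str(Path(raw_path).expanduser())
        if normalized in seen:
            continue  # keep the first occurrence only
        seen.add(normalized)
        normalized_paths.append(normalized)
    return normalized_paths


# Illustrative input: a duplicate entry, stray whitespace, and a blank line.
print(normalize_extension_paths("/opt/ext/a\r\n\n /opt/ext/b \n/opt/ext/a"))
```

Order is preserved, so the first listed directory stays first after duplicates are dropped.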

View file

@ -0,0 +1,10 @@
from helpers.extension import Extension
from plugins._browser.helpers.runtime import close_runtime_sync


class CleanupBrowserRuntimeOnRemove(Extension):
    def execute(self, data: dict = {}, **kwargs):
        args = data.get("args", ())
        context_id = args[0] if isinstance(args, tuple) and args else ""
        if context_id:
            close_runtime_sync(str(context_id), delete_profile=True)

View file

@ -0,0 +1,11 @@
from helpers.extension import Extension
from plugins._browser.helpers.runtime import close_runtime_sync


class CleanupBrowserRuntimeOnReset(Extension):
    def execute(self, data: dict = {}, **kwargs):
        args = data.get("args", ())
        context = args[0] if isinstance(args, tuple) and args else None
        context_id = getattr(context, "id", "")
        if context_id:
            close_runtime_sync(context_id, delete_profile=True)

View file

@ -0,0 +1,59 @@
from __future__ import annotations

from typing import Any

from agent import LoopData
from helpers.extension import Extension
from plugins._browser.helpers.runtime import get_runtime


class BrowserContextPrompt(Extension):
    async def execute(
        self,
        system_prompt: list[str] = [],
        loop_data: LoopData = LoopData(),
        **kwargs: Any,
    ):
        if not self.agent:
            return

        runtime = await get_runtime(self.agent.context.id, create=False)
        if not runtime:
            return

        try:
            listing = await runtime.call("list")
        except Exception:
            return

        browsers = listing.get("browsers") or []
        if not browsers:
            return

        rows = ["browser id|url|title"]
        for browser in browsers:
            rows.append(
                f"{browser.get('id')}|{browser.get('currentUrl', '')}|{browser.get('title', '')}"
            )

        section = ["currently open web browsers", "\n".join(rows)]
        last_id = listing.get("last_interacted_browser_id")
        if last_id:
            try:
                state = await runtime.call("state", last_id)
                content = await runtime.call("content", last_id, None)
                document = content.get("document") if isinstance(content, dict) else ""
                if document:
                    section.extend(
                        [
                            "",
                            "last interacted web browser",
                            f"browser id|url|title\n{state.get('id')}|{state.get('currentUrl', '')}|{state.get('title', '')}",
                            "page content↓",
                            str(document),
                        ]
                    )
            except Exception:
                pass

        system_prompt.append("\n".join(section))
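
The extension above injects a pipe-delimited table of open browsers into the system prompt. To make the resulting text concrete, the sketch below builds the same table from sample data. `format_browser_rows` is a hypothetical helper and the sample browser entry is invented; the keys `id`, `currentUrl`, and `title` are the ones the extension reads.

```python
from typing import Any


def format_browser_rows(browsers: list[dict[str, Any]]) -> str:
    """Render a header line plus one 'id|url|title' line per open browser."""
    rows = ["browser id|url|title"]
    for browser in browsers:
        rows.append(
            f"{browser.get('id')}|{browser.get('currentUrl', '')}|{browser.get('title', '')}"
        )
    return "\n".join(rows)


sample = [{"id": 1, "currentUrl": "https://example.com", "title": "Example Domain"}]
print(format_browser_rows(sample))
```

One thing this format leaves open is a page title that itself contains `|`, which would make a row ambiguous to split; whether that matters depends on how the model is expected to read the table.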

View file

@ -0,0 +1,24 @@
from __future__ import annotations

from typing import Any

from helpers.extension import Extension
from plugins._browser.api.ws_browser import WsBrowser


class BrowserWebuiWsDisconnect(Extension):
    async def execute(
        self,
        instance: Any = None,
        sid: str = "",
        **kwargs: Any,
    ) -> None:
        if instance is None:
            return

        handler = WsBrowser(
            instance.socketio,
            instance.lock,
            manager=instance.manager,
            namespace=instance.namespace,
        )
        await handler.on_disconnect(sid)
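
The disconnect hook relies on `WsBrowser` keeping its frame-stream tasks in a registry keyed by `(sid, context_id)`, so that every stream owned by a socket can be cancelled when that socket goes away. The sketch below shows that lifecycle with plain `asyncio` and no WebSocket layer. The function names and the placeholder `_stream` loop are assumptions made for illustration.

```python
import asyncio

# Registry of running stream tasks, keyed by (socket id, context id).
streams: dict[tuple[str, str], asyncio.Task] = {}


async def _stream(sid: str, context_id: str) -> None:
    # Placeholder loop standing in for screenshot capture and emit.
    while True:
        await asyncio.sleep(0.01)


def subscribe(sid: str, context_id: str) -> None:
    """Start a stream, replacing any earlier one for the same key."""
    key = (sid, context_id)
    existing = streams.pop(key, None)
    if existing:
        existing.cancel()
    streams[key] = asyncio.create_task(_stream(sid, context_id))


def disconnect(sid: str) -> list[asyncio.Task]:
    """Cancel and forget every stream that belongs to one socket."""
    cancelled = []
    # Copy the matching keys first so the dict is not mutated while iterating.
    for key in [key for key in streams if key[0] == sid]:
        task = streams.pop(key)
        task.cancel()
        cancelled.append(task)
    return cancelled


async def demo() -> tuple[int, list[tuple[str, str]]]:
    subscribe("sid-a", "ctx-1")
    subscribe("sid-a", "ctx-2")
    subscribe("sid-b", "ctx-1")
    cancelled = disconnect("sid-a")
    # Wait for the cancelled tasks to finish unwinding.
    await asyncio.gather(*cancelled, return_exceptions=True)
    remaining = sorted(streams)
    await asyncio.gather(*disconnect("sid-b"), return_exceptions=True)
    return len(cancelled), remaining


print(asyncio.run(demo()))
```

Keying by both ids lets one socket watch several contexts, while the disconnect path only needs the socket id to find everything it owns.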

View file

@ -0,0 +1,47 @@
from __future__ import annotations

from typing import Any

from helpers.extension import Extension
from helpers.ws_manager import WsResult
from plugins._browser.api.ws_browser import WsBrowser


class BrowserWebuiWsEvents(Extension):
    async def execute(
        self,
        instance: Any = None,
        sid: str = "",
        event_type: str = "",
        data: dict[str, Any] | None = None,
        response_data: dict[str, Any] | None = None,
        **kwargs: Any,
    ) -> None:
        if not event_type.startswith("browser_") or instance is None or response_data is None:
            return

        handler = WsBrowser(
            instance.socketio,
            instance.lock,
            manager=instance.manager,
            namespace=instance.namespace,
        )
        result = await handler.process(event_type, data or {}, sid)
        if result is None:
            return

        if isinstance(result, WsResult):
            payload = result.as_result(
                handler_id=handler.identifier,
                fallback_correlation_id=(data or {}).get("correlationId"),
            )
            if payload.get("ok"):
                response_data.update(payload.get("data") or {})
            else:
                response_data["browser_error"] = payload.get("error") or {
                    "code": "BROWSER_ERROR",
                    "error": "Browser request failed",
                }
            return

        response_data.update(result)
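
The extension above folds a handler result into the shared `response_data` dict: successful payloads are merged in, while failures are stored under a `browser_error` key with a generic fallback. The sketch below isolates that merge step using plain dicts. The payload shape with `ok`, `data`, and `error` keys is inferred from how the extension reads the output of `as_result`, and `merge_browser_result` is a hypothetical name.

```python
from typing import Any


def merge_browser_result(response_data: dict[str, Any], payload: dict[str, Any]) -> None:
    """Merge a result payload into response_data in place."""
    if payload.get("ok"):
        response_data.update(payload.get("data") or {})
    else:
        # Fall back to a generic error when the payload carries no details.
        response_data["browser_error"] = payload.get("error") or {
            "code": "BROWSER_ERROR",
            "error": "Browser request failed",
        }


ok_response: dict[str, Any] = {}
merge_browser_result(ok_response, {"ok": True, "data": {"context_id": "ctx-1"}})

failed_response: dict[str, Any] = {}
merge_browser_result(failed_response, {"ok": False})
print(ok_response, failed_response)
```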

View file

@ -0,0 +1,17 @@
<button
  type="button"
  class="text-button browser-chat-action"
  title="Show or hide Browser"
  aria-label="Show or hide Browser"
  data-bs-placement="top"
  data-bs-trigger="hover"
  @click="window.toggleModal ? window.toggleModal('/plugins/_browser/webui/main.html') : window.openModal('/plugins/_browser/webui/main.html')"
>
  <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" width="14" height="14" aria-hidden="true">
    <rect x="3" y="4" width="18" height="16" rx="2"></rect>
    <path d="M3 8h18"></path>
    <path d="M7 6h.01"></path>
    <path d="M10 6h.01"></path>
  </svg>
  <p>Browser</p>
</button>

View file

@ -0,0 +1,76 @@
import {
  createActionButton,
  copyToClipboard,
} from "/components/messages/action-buttons/simple-action-buttons.js";
import { store as stepDetailStore } from "/components/modals/process-step-detail/step-detail-store.js";
import { store as speechStore } from "/components/chat/speech/speech-store.js";
import {
  buildDetailPayload,
  cleanStepTitle,
  drawProcessStep,
} from "/js/messages.js";

const BROWSER_MODAL = "/plugins/_browser/webui/main.html";

export default async function registerBrowserToolHandler(extData) {
  if (extData?.tool_name === "browser") {
    extData.handler = drawBrowserTool;
  }
}

function drawBrowserTool({
  id,
  type,
  heading,
  content,
  kvps,
  timestamp,
  agentno = 0,
  ...additional
}) {
  const title = cleanStepTitle(heading);
  const displayKvps = { ...kvps };
  const headerLabels = [
    kvps?._tool_name && { label: kvps._tool_name, class: "tool-name-badge" },
  ].filter(Boolean);
  const contentText = String(content ?? "");
  const browserButton = createActionButton(
    "visibility",
    "Browser",
    () => {
      if (window.ensureModalOpen) {
        void window.ensureModalOpen(BROWSER_MODAL);
        return;
      }
      void window.openModal?.(BROWSER_MODAL);
    },
  );
  browserButton.setAttribute("title", "Open Browser");
  browserButton.setAttribute("aria-label", "Open Browser");
  browserButton.setAttribute("data-bs-placement", "top");
  browserButton.setAttribute("data-bs-trigger", "hover");

  const actionButtons = [browserButton];
  if (contentText.trim()) {
    actionButtons.push(
      createActionButton("detail", "", () =>
        stepDetailStore.showStepDetail(
          buildDetailPayload(arguments[0], { headerLabels }),
        ),
      ),
      createActionButton("speak", "", () => speechStore.speak(contentText)),
      createActionButton("copy", "", () => copyToClipboard(contentText)),
    );
  }

  return drawProcessStep({
    id,
    title,
    code: "WWW",
    classes: undefined,
    kvps: displayKvps,
    content,
    actionButtons: actionButtons.filter(Boolean),
    log: arguments[0],
  });
}

View file

@ -0,0 +1 @@
# Built-in direct browser helpers.

View file

@ -0,0 +1,272 @@
from __future__ import annotations

from pathlib import Path
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    from agent import Agent

PLUGIN_NAME = "_browser"
MODEL_PRESET_KEY = "model_preset"
BASE_BROWSER_ARGS = [
    "--no-sandbox",
    "--disable-dev-shm-usage",
    "--disable-gpu",
]


def _normalize_extension_paths(value: Any) -> list[str]:
    if isinstance(value, str):
        candidates = value.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    elif isinstance(value, (list, tuple, set)):
        candidates = list(value)
    else:
        candidates = []
    normalized_paths: list[str] = []
    seen: set[str] = set()
    for entry in candidates:
        raw_path = str(entry or "").strip()
        if not raw_path:
            continue
        normalized = str(Path(raw_path).expanduser())
        if normalized in seen:
            continue
        seen.add(normalized)
        normalized_paths.append(normalized)
    return normalized_paths


def _normalize_model_preset(value: Any) -> str:
    return str(value or "").strip()


def normalize_browser_config(settings: dict[str, Any] | None) -> dict[str, Any]:
    raw = settings if isinstance(settings, dict) else {}
    return {
        "extensions_enabled": bool(raw.get("extensions_enabled", False)),
        "extension_paths": _normalize_extension_paths(raw.get("extension_paths", [])),
        MODEL_PRESET_KEY: _normalize_model_preset(raw.get(MODEL_PRESET_KEY, "")),
    }


def browser_runtime_config(settings: dict[str, Any] | None) -> dict[str, Any]:
    config = normalize_browser_config(settings)
    return {
        "extensions_enabled": config["extensions_enabled"],
        "extension_paths": config["extension_paths"],
    }
def get_browser_config(agent: "Agent | None" = None) -> dict[str, Any]:
from helpers import plugins
return normalize_browser_config(plugins.get_plugin_config(PLUGIN_NAME, agent=agent) or {})
def get_browser_model_preset_name(
agent: "Agent | None" = None,
settings: dict[str, Any] | None = None,
) -> str:
config = (
normalize_browser_config(settings)
if settings is not None
else get_browser_config(agent=agent)
)
return str(config.get(MODEL_PRESET_KEY, "") or "").strip()
def get_browser_model_preset_options(
    agent: "Agent | None" = None,
    settings: dict[str, Any] | None = None,
) -> list[dict[str, Any]]:
    from plugins._model_config.helpers import model_config

    selected_name = get_browser_model_preset_name(agent=agent, settings=settings)
    options: list[dict[str, Any]] = []
    found_selected = False
    for preset in model_config.get_presets():
        # Skip malformed entries before calling .get() on them.
        if not isinstance(preset, dict):
            continue
        name = str(preset.get("name", "") or "").strip()
        if not name:
            continue
        if name == selected_name:
            found_selected = True
        chat_cfg = preset.get("chat", {})
        if not isinstance(chat_cfg, dict):
            chat_cfg = {}
        provider = str(chat_cfg.get("provider", "") or "").strip()
        model_name = str(chat_cfg.get("name", "") or "").strip()
        summary = " / ".join(part for part in (provider, model_name) if part)
        options.append(
            {
                "name": name,
                "label": name,
                "missing": False,
                "summary": summary,
            }
        )
    # Keep a configured but unknown preset visible so the UI can flag it.
    if selected_name and not found_selected:
        options.append(
            {
                "name": selected_name,
                "label": f"{selected_name} (missing)",
                "missing": True,
                "summary": "",
            }
        )
    return options
def resolve_browser_model_selection(
agent: "Agent | None" = None,
settings: dict[str, Any] | None = None,
) -> dict[str, Any]:
from plugins._model_config.helpers import model_config
preset_name = get_browser_model_preset_name(agent=agent, settings=settings)
if preset_name:
preset = model_config.get_preset_by_name(preset_name)
if isinstance(preset, dict):
chat_cfg = preset.get("chat", {})
if isinstance(chat_cfg, dict) and (
str(chat_cfg.get("provider", "") or "").strip()
or str(chat_cfg.get("name", "") or "").strip()
):
return {
"config": chat_cfg,
"source_kind": "preset",
"source_label": f"Preset '{preset_name}' via _model_config",
"selected_preset_name": preset_name,
"preset_status": "active",
"warning": "",
}
return {
"config": model_config.get_chat_model_config(agent),
"source_kind": "main",
"source_label": "Main Model via _model_config",
"selected_preset_name": preset_name,
"preset_status": "invalid",
"warning": (
f"Configured browser preset '{preset_name}' does not define a chat model. "
"Falling back to the Main Model."
),
}
return {
"config": model_config.get_chat_model_config(agent),
"source_kind": "main",
"source_label": "Main Model via _model_config",
"selected_preset_name": preset_name,
"preset_status": "missing",
"warning": (
f"Configured browser preset '{preset_name}' was not found. "
"Falling back to the Main Model."
),
}
return {
"config": model_config.get_chat_model_config(agent),
"source_kind": "main",
"source_label": "Main Model via _model_config",
"selected_preset_name": "",
"preset_status": "none",
"warning": "",
}
def resolve_browser_model(agent: "Agent", settings: dict[str, Any] | None = None):
selection = resolve_browser_model_selection(agent=agent, settings=settings)
if selection["source_kind"] == "main":
return agent.get_chat_model()
import models
from plugins._model_config.helpers import model_config
model_config_object = model_config.build_model_config(
selection["config"],
models.ModelType.CHAT,
)
return models.get_chat_model(
model_config_object.provider,
model_config_object.name,
model_config=model_config_object,
**model_config_object.build_kwargs(),
)
def describe_browser_extensions(settings: dict[str, Any] | None) -> dict[str, Any]:
config = normalize_browser_config(settings)
path_details: list[dict[str, Any]] = []
for extension_path in config["extension_paths"]:
path = Path(extension_path)
exists = path.exists()
is_dir = path.is_dir() if exists else False
path_details.append(
{
"path": extension_path,
"exists": exists,
"is_dir": is_dir,
"loadable": exists and is_dir,
}
)
active_paths = [item["path"] for item in path_details if item["loadable"]]
invalid_paths = [item["path"] for item in path_details if not item["loadable"]]
active = bool(config["extensions_enabled"] and active_paths)
warnings: list[str] = []
if config["extensions_enabled"] and not config["extension_paths"]:
warnings.append(
"Extensions are enabled, but no unpacked extension directories are configured."
)
elif config["extensions_enabled"] and not active_paths:
warnings.append(
"Extensions are enabled, but none of the configured extension directories are readable unpacked folders."
)
elif invalid_paths:
warnings.append(
"Some configured extension directories are missing or not directories, so they will be skipped."
)
return {
"enabled": bool(config["extensions_enabled"]),
"active": active,
"configured_paths": config["extension_paths"],
"active_paths": active_paths,
"invalid_paths": invalid_paths,
"path_details": path_details,
"active_path_count": len(active_paths),
"warnings": warnings,
}
def build_browser_launch_config(settings: dict[str, Any] | None) -> dict[str, Any]:
    extensions = describe_browser_extensions(settings)
    args = list(BASE_BROWSER_ARGS)
    channel: str | None = None
    browser_mode = "headless_shell"
    if extensions["active"]:
        joined_paths = ",".join(extensions["active_paths"])
        args.extend(
            [
                f"--disable-extensions-except={joined_paths}",
                f"--load-extension={joined_paths}",
            ]
        )
        channel = "chromium"
        browser_mode = "chromium_extensions"
    else:
        args.insert(0, "--headless=new")
    return {
        "args": args,
        "browser_mode": browser_mode,
        "channel": channel,
        "extensions": extensions,
        "requires_full_browser": bool(extensions["active"]),
    }
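
The launch-mode decision above can be sketched in isolation. This is a condensed, hypothetical stand-in for `build_browser_launch_config()` (the name `sketch_launch_args` is not part of the plugin), assuming only that a configured path counts as loadable when it is an existing directory:

```python
import os
import tempfile
from pathlib import Path

BASE_BROWSER_ARGS = ["--no-sandbox", "--disable-dev-shm-usage", "--disable-gpu"]


def sketch_launch_args(extensions_enabled: bool, extension_paths: list) -> dict:
    """Hypothetical, condensed version of the launch-mode decision."""
    # Only existing directories count as loadable unpacked extensions.
    active = [p for p in extension_paths if Path(p).is_dir()]
    args = list(BASE_BROWSER_ARGS)
    if extensions_enabled and active:
        joined = ",".join(active)
        args += [
            f"--disable-extensions-except={joined}",
            f"--load-extension={joined}",
        ]
        return {"args": args, "browser_mode": "chromium_extensions", "channel": "chromium"}
    # No loadable extension: stay on the lighter headless shell.
    args.insert(0, "--headless=new")
    return {"args": args, "browser_mode": "headless_shell", "channel": None}


with tempfile.TemporaryDirectory() as ext_dir:
    missing = os.path.join(ext_dir, "missing")
    with_ext = sketch_launch_args(True, [ext_dir, missing])
    without_ext = sketch_launch_args(True, [missing])
```

In this sketch, enabling extensions without any loadable directory falls back to the headless shell, which mirrors the `extensions["active"]` check in the function above.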


@ -0,0 +1,177 @@
from __future__ import annotations
import json
import re
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path
from typing import Any
from helpers import files, plugins
from plugins._browser.helpers.config import PLUGIN_NAME, get_browser_config
EXTENSION_ID_RE = re.compile(r"^[a-p]{32}$")
WEB_STORE_ID_RE = re.compile(r"(?<![a-p])([a-p]{32})(?![a-p])")
WEB_STORE_DOWNLOAD_URL = (
"https://clients2.google.com/service/update2/crx"
"?response=redirect"
"&prodversion=120.0.0.0"
"&acceptformat=crx2,crx3"
"&x=id%3D{extension_id}%26installsource%3Dondemand%26uc"
)
def get_extensions_root() -> Path:
root = Path(files.get_abs_path("usr/browser-extensions"))
root.mkdir(parents=True, exist_ok=True)
return root
def parse_chrome_web_store_extension_id(value: str) -> str:
source = str(value or "").strip()
if EXTENSION_ID_RE.fullmatch(source):
return source
match = WEB_STORE_ID_RE.search(source)
if match:
return match.group(1)
raise ValueError("Enter a Chrome Web Store URL or a 32-character extension id.")
def list_browser_extensions() -> list[dict[str, Any]]:
root = get_extensions_root()
config = get_browser_config()
enabled_paths = {str(Path(path).expanduser()) for path in config["extension_paths"]}
entries: list[dict[str, Any]] = []
for manifest_path in sorted(root.glob("**/manifest.json")):
extension_dir = manifest_path.parent
try:
manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
except Exception:
manifest = {}
extension_path = str(extension_dir)
entries.append(
{
"name": manifest.get("name") or extension_dir.name,
"version": manifest.get("version") or "",
"path": extension_path,
"enabled": extension_path in enabled_paths,
}
)
return entries
def install_chrome_web_store_extension(source: str) -> dict[str, Any]:
extension_id = parse_chrome_web_store_extension_id(source)
target = get_extensions_root() / "chrome-web-store" / extension_id
with tempfile.TemporaryDirectory(prefix="a0-browser-ext-") as tmp:
archive_path = Path(tmp) / f"{extension_id}.crx"
_download_crx(extension_id, archive_path)
payload_path = Path(tmp) / f"{extension_id}.zip"
payload_path.write_bytes(_crx_zip_payload(archive_path.read_bytes()))
extracted_path = Path(tmp) / "extracted"
_safe_extract_zip(payload_path, extracted_path)
if not (extracted_path / "manifest.json").is_file():
raise ValueError("Downloaded extension did not contain a manifest.json file.")
if target.exists():
shutil.rmtree(target)
target.parent.mkdir(parents=True, exist_ok=True)
shutil.copytree(extracted_path, target)
config = _enable_extension_path(target)
manifest = _read_manifest(target)
return {
"ok": True,
"id": extension_id,
"name": manifest.get("name") or extension_id,
"version": manifest.get("version") or "",
"path": str(target),
"extensions_enabled": config["extensions_enabled"],
"extension_paths": config["extension_paths"],
}
def _download_crx(extension_id: str, archive_path: Path) -> None:
url = WEB_STORE_DOWNLOAD_URL.format(extension_id=extension_id)
request = urllib.request.Request(
url,
headers={
"User-Agent": (
"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
},
)
with urllib.request.urlopen(request, timeout=30) as response:
data = response.read()
if not data:
raise ValueError("Chrome Web Store returned an empty extension package.")
archive_path.write_bytes(data)
def _crx_zip_payload(data: bytes) -> bytes:
    if data.startswith(b"PK"):
        return data
    if data[:4] != b"Cr24":
        raise ValueError("Downloaded package is not a CRX or ZIP archive.")
    version = int.from_bytes(data[4:8], "little")
    if version == 2:
        public_key_len = int.from_bytes(data[8:12], "little")
        signature_len = int.from_bytes(data[12:16], "little")
        offset = 16 + public_key_len + signature_len
    elif version == 3:
        header_len = int.from_bytes(data[8:12], "little")
        offset = 12 + header_len
    else:
        raise ValueError(f"Unsupported CRX version: {version}.")
    payload = data[offset:]
    if not payload.startswith(b"PK"):
        raise ValueError("CRX payload did not contain a ZIP archive.")
    return payload


def _safe_extract_zip(archive_path: Path, target_dir: Path) -> None:
    target_dir.mkdir(parents=True, exist_ok=True)
    root = target_dir.resolve()
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.infolist():
            destination = (target_dir / member.filename).resolve()
            if not destination.is_relative_to(root):
                raise ValueError("Extension archive contains an unsafe path.")
            if member.is_dir():
                destination.mkdir(parents=True, exist_ok=True)
                continue
            destination.parent.mkdir(parents=True, exist_ok=True)
            with archive.open(member) as source, destination.open("wb") as output:
                shutil.copyfileobj(source, output)
def _enable_extension_path(extension_path: Path) -> dict[str, Any]:
config = get_browser_config()
path = str(extension_path)
paths = list(config["extension_paths"])
if path not in paths:
paths.append(path)
config["extensions_enabled"] = True
config["extension_paths"] = paths
plugins.save_plugin_config(PLUGIN_NAME, "", "", config)
return config
def _read_manifest(extension_path: Path) -> dict[str, Any]:
manifest_path = extension_path / "manifest.json"
try:
return json.loads(manifest_path.read_text(encoding="utf-8"))
except Exception:
return {}
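
To illustrate the header offsets used by `_crx_zip_payload()`, here is a minimal sketch that wraps an in-memory ZIP in synthetic CRX2 and CRX3 containers and strips them again. The helper names and the placeholder header bytes are assumptions for illustration only; a real CRX carries a signed header, which this sketch does not attempt to reproduce or verify.

```python
import io
import zipfile


def build_zip(files: dict) -> bytes:
    """Create a small ZIP archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    return buffer.getvalue()


def strip_crx_header(data: bytes) -> bytes:
    """Same offsets as above: CRX2 = 16 + key + signature, CRX3 = 12 + header."""
    if data.startswith(b"PK"):
        return data
    if data[:4] != b"Cr24":
        raise ValueError("not a CRX or ZIP archive")
    version = int.from_bytes(data[4:8], "little")
    if version == 2:
        key_len = int.from_bytes(data[8:12], "little")
        sig_len = int.from_bytes(data[12:16], "little")
        offset = 16 + key_len + sig_len
    elif version == 3:
        header_len = int.from_bytes(data[8:12], "little")
        offset = 12 + header_len
    else:
        raise ValueError(f"unsupported CRX version: {version}")
    return data[offset:]


def u32(value: int) -> bytes:
    return value.to_bytes(4, "little")


zip_bytes = build_zip({"manifest.json": '{"name": "demo", "version": "1.0"}'})
fake_header = b"\x00" * 10  # placeholder bytes, not a real signed header
crx3 = b"Cr24" + u32(3) + u32(len(fake_header)) + fake_header + zip_bytes
crx2 = b"Cr24" + u32(2) + u32(4) + u32(6) + b"K" * 4 + b"S" * 6 + zip_bytes

names = zipfile.ZipFile(io.BytesIO(strip_crx_header(crx3))).namelist()
```

Both containers reduce to the same ZIP bytes, which is why the installer can hand the payload straight to the ZIP extraction step.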


@ -0,0 +1,57 @@
import os
import subprocess
from pathlib import Path

from helpers import files

HEADLESS_SHELL_PATTERNS = (
    "chromium_headless_shell-*/chrome-*/headless_shell",
    "chromium_headless_shell-*/chrome-*/headless_shell.exe",
)
FULL_CHROMIUM_PATTERNS = (
    "chromium-*/chrome-linux/chrome",
    "chromium-*/chrome-win/chrome.exe",
)


def get_playwright_cache_dir() -> str:
    return files.get_abs_path("tmp/playwright")


def configure_playwright_env() -> str:
    cache_dir = get_playwright_cache_dir()
    os.environ["PLAYWRIGHT_BROWSERS_PATH"] = cache_dir
    return cache_dir


def get_playwright_binary(*, full_browser: bool = False) -> Path | None:
    cache_dir = Path(get_playwright_cache_dir())
    patterns = (
        FULL_CHROMIUM_PATTERNS
        if full_browser
        else (HEADLESS_SHELL_PATTERNS + FULL_CHROMIUM_PATTERNS)
    )
    for pattern in patterns:
        binary = next(cache_dir.glob(pattern), None)
        if binary and binary.exists():
            return binary
    return None


def ensure_playwright_binary(*, full_browser: bool = False) -> Path:
    binary = get_playwright_binary(full_browser=full_browser)
    if binary:
        return binary
    cache_dir = configure_playwright_env()
    env = os.environ.copy()
    env["PLAYWRIGHT_BROWSERS_PATH"] = cache_dir
    install_command = ["playwright", "install", "chromium"]
    if not full_browser:
        install_command.append("--only-shell")
    subprocess.check_call(
        install_command,
        env=env,
    )
    binary = get_playwright_binary(full_browser=full_browser)
    if not binary:
        raise RuntimeError("Playwright Chromium binary not found after installation")
    return binary
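
The glob-based lookup can be exercised against a throwaway cache directory. The sketch below is a simplified stand-in for `get_playwright_binary()`: `find_binary` and the revision folder name `chromium_headless_shell-1234` are made up for illustration, and only one pattern per browser kind is kept.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

HEADLESS_SHELL_PATTERNS = ("chromium_headless_shell-*/chrome-*/headless_shell",)
FULL_CHROMIUM_PATTERNS = ("chromium-*/chrome-linux/chrome",)


def find_binary(cache_dir: Path, *, full_browser: bool = False) -> Path | None:
    """Return the first cached binary matching the requested browser kind."""
    patterns = (
        FULL_CHROMIUM_PATTERNS
        if full_browser
        else HEADLESS_SHELL_PATTERNS + FULL_CHROMIUM_PATTERNS
    )
    for pattern in patterns:
        binary = next(cache_dir.glob(pattern), None)
        if binary is not None:
            return binary
    return None


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    # Fake a headless-shell install; the revision number is arbitrary.
    shell = cache / "chromium_headless_shell-1234" / "chrome-linux" / "headless_shell"
    shell.parent.mkdir(parents=True)
    shell.touch()
    found_default = find_binary(cache)
    found_full = find_binary(cache, full_browser=True)
```

With only the headless shell present, the default lookup succeeds while a `full_browser=True` lookup returns `None`, which is the case where the module above goes on to run the full Chromium install.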


@ -0,0 +1,623 @@
from __future__ import annotations
import atexit
import asyncio
import base64
import re
import shutil
import threading
from dataclasses import dataclass
from pathlib import Path
from typing import Any
from urllib.parse import urlsplit, urlunsplit
from helpers import files
from helpers.defer import DeferredTask
from helpers.print_style import PrintStyle
from plugins._browser.helpers.config import build_browser_launch_config, get_browser_config
from plugins._browser.helpers.playwright import configure_playwright_env, ensure_playwright_binary
PLUGIN_DIR = Path(__file__).resolve().parents[1]
CONTENT_HELPER_PATH = PLUGIN_DIR / "assets" / "browser-page-content.js"
RUNTIME_DATA_KEY = "_browser_runtime"
DEFAULT_VIEWPORT = {"width": 1024, "height": 768}
_SPECIAL_SCHEME_RE = re.compile(r"^(?:about|blob|data|file|mailto|tel):", re.I)
_URL_SCHEME_RE = re.compile(r"^[a-z][a-z\d+\-.]*://", re.I)
_LOCAL_HOST_RE = re.compile(
r"^(?:localhost|\[[0-9a-f:.]+\]|(?:\d{1,3}\.){3}\d{1,3})(?::\d+)?$",
re.I,
)
_TYPED_HOST_RE = re.compile(
r"^(?:localhost|\[[0-9a-f:.]+\]|(?:\d{1,3}\.){3}\d{1,3}|"
r"(?:[a-z\d](?:[a-z\d-]{0,61}[a-z\d])?\.)+[a-z\d-]{2,63})(?::\d+)?$",
re.I,
)
_SAFE_CONTEXT_RE = re.compile(r"[^a-zA-Z0-9_.-]+")
def normalize_url(value: str) -> str:
    raw = str(value or "").strip()
    if not raw:
        raise ValueError("Browser navigation requires a non-empty URL.")

    def with_trailing_path(url: str) -> str:
        parts = urlsplit(url)
        if parts.scheme in {"http", "https"} and not parts.path:
            return urlunsplit((parts.scheme, parts.netloc, "/", parts.query, parts.fragment))
        return urlunsplit(parts)

    try:
        # Pass maxsplit by keyword; the positional form is deprecated since Python 3.13.
        host = re.split(r"[/?#]", raw, maxsplit=1)[0] or ""
        if (
            not _URL_SCHEME_RE.match(raw)
            and not _SPECIAL_SCHEME_RE.match(raw)
            and not raw.startswith(("/", "?", "#", "."))
            and not re.search(r"\s", raw)
            and _TYPED_HOST_RE.match(host)
        ):
            # Bare hosts: local addresses default to http, everything else to https.
            protocol = "http://" if _LOCAL_HOST_RE.match(host) else "https://"
            return with_trailing_path(protocol + raw)
        parts = urlsplit(raw)
        if parts.scheme:
            return with_trailing_path(raw)
    except Exception:
        pass
    return with_trailing_path("https://" + raw)
def _safe_context_id(context_id: str) -> str:
return _SAFE_CONTEXT_RE.sub("_", str(context_id or "default")).strip("._") or "default"
@dataclass
class BrowserPage:
id: int
page: Any
class BrowserRuntime:
def __init__(self, context_id: str):
self.context_id = str(context_id)
self._core = _BrowserRuntimeCore(self.context_id)
self._worker = DeferredTask(thread_name=f"BrowserRuntime-{self.context_id}")
self._closed = False
async def call(self, method: str, *args: Any, **kwargs: Any) -> Any:
if self._closed and method != "close":
raise RuntimeError("Browser runtime is closed.")
async def runner():
fn = getattr(self._core, method)
return await fn(*args, **kwargs)
return await self._worker.execute_inside(runner)
async def close(self, delete_profile: bool = False) -> None:
if self._closed:
return
try:
await self.call("close", delete_profile=delete_profile)
finally:
self._closed = True
self._worker.kill(terminate_thread=True)
class _BrowserRuntimeCore:
def __init__(self, context_id: str):
self.context_id = context_id
self.safe_context_id = _safe_context_id(context_id)
self.playwright = None
self.context = None
self.pages: dict[int, BrowserPage] = {}
self.next_browser_id = 1
self.last_interacted_browser_id: int | None = None
self._content_helper_source: str | None = None
@property
def profile_dir(self) -> Path:
return Path(files.get_abs_path("tmp/browser/sessions", self.safe_context_id))
@property
def downloads_dir(self) -> Path:
return Path(files.get_abs_path("usr/downloads/browser"))
async def ensure_started(self) -> None:
if self.context:
return
from playwright.async_api import async_playwright
self.profile_dir.mkdir(parents=True, exist_ok=True)
self.downloads_dir.mkdir(parents=True, exist_ok=True)
browser_config = get_browser_config()
launch_config = build_browser_launch_config(browser_config)
configure_playwright_env()
browser_binary = ensure_playwright_binary(
full_browser=launch_config["requires_full_browser"]
)
self.playwright = await async_playwright().start()
launch_kwargs: dict[str, Any] = {
"user_data_dir": str(self.profile_dir),
"headless": True,
"accept_downloads": True,
"downloads_path": str(self.downloads_dir),
"viewport": DEFAULT_VIEWPORT,
"screen": DEFAULT_VIEWPORT,
"no_viewport": False,
"args": launch_config["args"],
}
if launch_config["channel"]:
launch_kwargs["channel"] = launch_config["channel"]
else:
launch_kwargs["executable_path"] = str(browser_binary)
self.context = await self.playwright.chromium.launch_persistent_context(
**launch_kwargs
)
self.context.set_default_timeout(30000)
self.context.set_default_navigation_timeout(30000)
await self.context.add_init_script(self._shadow_dom_script())
await self.context.add_init_script(path=str(CONTENT_HELPER_PATH))
for page in list(self.context.pages):
if page.url == "about:blank":
try:
await page.close()
except Exception:
pass
continue
self._register_page(page)
async def open(self, url: str = "about:blank") -> dict[str, Any]:
await self.ensure_started()
page = await self.context.new_page()
browser_page = self._register_page(page)
self.last_interacted_browser_id = browser_page.id
if url and url != "about:blank":
await self._goto(page, normalize_url(url))
else:
await self._settle(page)
return {"id": browser_page.id, "state": await self._state(browser_page.id)}
async def list(self) -> dict[str, Any]:
await self.ensure_started()
return {
"browsers": [await self._state(browser_id) for browser_id in sorted(self.pages)],
"last_interacted_browser_id": self.last_interacted_browser_id,
}
async def state(self, browser_id: int | str | None = None) -> dict[str, Any]:
await self.ensure_started()
return await self._state(self._resolve_browser_id(browser_id))
async def navigate(self, browser_id: int | str | None, url: str) -> dict[str, Any]:
await self.ensure_started()
resolved_id = self._resolve_browser_id(browser_id)
page = self._page(resolved_id)
await self._goto(page, normalize_url(url))
self.last_interacted_browser_id = resolved_id
return await self._state(resolved_id)
async def back(self, browser_id: int | str | None = None) -> dict[str, Any]:
await self.ensure_started()
resolved_id = self._resolve_browser_id(browser_id)
page = self._page(resolved_id)
await page.go_back(wait_until="domcontentloaded", timeout=10000)
await self._settle(page)
self.last_interacted_browser_id = resolved_id
return await self._state(resolved_id)
async def forward(self, browser_id: int | str | None = None) -> dict[str, Any]:
await self.ensure_started()
resolved_id = self._resolve_browser_id(browser_id)
page = self._page(resolved_id)
await page.go_forward(wait_until="domcontentloaded", timeout=10000)
await self._settle(page)
self.last_interacted_browser_id = resolved_id
return await self._state(resolved_id)
async def reload(self, browser_id: int | str | None = None) -> dict[str, Any]:
await self.ensure_started()
resolved_id = self._resolve_browser_id(browser_id)
page = self._page(resolved_id)
await page.reload(wait_until="domcontentloaded", timeout=15000)
await self._settle(page)
self.last_interacted_browser_id = resolved_id
return await self._state(resolved_id)
async def content(
self,
browser_id: int | str | None = None,
payload: dict[str, Any] | None = None,
) -> dict[str, Any]:
await self.ensure_started()
resolved_id = self._resolve_browser_id(browser_id)
page = self._page(resolved_id)
await self._ensure_content_helper(page)
result = await page.evaluate(
"(payload) => globalThis.__spaceBrowserPageContent__.capture(payload || null)",
payload or None,
)
self.last_interacted_browser_id = resolved_id
return result or {}
async def detail(self, browser_id: int | str | None, reference_id: int | str) -> dict[str, Any]:
await self.ensure_started()
resolved_id = self._resolve_browser_id(browser_id)
page = self._page(resolved_id)
await self._ensure_content_helper(page)
result = await page.evaluate(
"(ref) => globalThis.__spaceBrowserPageContent__.detail(ref)",
reference_id,
)
self.last_interacted_browser_id = resolved_id
return result or {}
async def evaluate(self, browser_id: int | str | None, script: str) -> dict[str, Any]:
await self.ensure_started()
resolved_id = self._resolve_browser_id(browser_id)
page = self._page(resolved_id)
result = await page.evaluate(str(script or "undefined"))
self.last_interacted_browser_id = resolved_id
return {"result": result, "state": await self._state(resolved_id)}
async def click(self, browser_id: int | str | None, reference_id: int | str) -> dict[str, Any]:
return await self._reference_action("click", browser_id, reference_id)
async def submit(self, browser_id: int | str | None, reference_id: int | str) -> dict[str, Any]:
return await self._reference_action("submit", browser_id, reference_id)
async def scroll(self, browser_id: int | str | None, reference_id: int | str) -> dict[str, Any]:
return await self._reference_action("scroll", browser_id, reference_id)
async def type(
self,
browser_id: int | str | None,
reference_id: int | str,
text: str,
) -> dict[str, Any]:
return await self._reference_action("type", browser_id, reference_id, text)
async def type_submit(
self,
browser_id: int | str | None,
reference_id: int | str,
text: str,
) -> dict[str, Any]:
return await self._reference_action("typeSubmit", browser_id, reference_id, text)
async def close_browser(self, browser_id: int | str | None = None) -> dict[str, Any]:
await self.ensure_started()
resolved_id = self._resolve_browser_id(browser_id)
page = self._page(resolved_id)
await page.close()
self.pages.pop(resolved_id, None)
if self.last_interacted_browser_id == resolved_id:
self.last_interacted_browser_id = next(iter(sorted(self.pages)), None)
return await self.list()
async def close_all_browsers(self) -> dict[str, Any]:
await self.ensure_started()
for browser_id in list(self.pages):
try:
await self.pages[browser_id].page.close()
except Exception:
pass
self.pages.clear()
self.last_interacted_browser_id = None
return {"browsers": [], "last_interacted_browser_id": None}
async def screenshot(
self,
browser_id: int | str | None = None,
*,
quality: int = 70,
) -> dict[str, Any]:
await self.ensure_started()
resolved_id = self._resolve_browser_id(browser_id)
page = self._page(resolved_id)
image = await page.screenshot(type="jpeg", quality=max(20, min(95, int(quality))))
return {
"browser_id": resolved_id,
"mime": "image/jpeg",
"image": base64.b64encode(image).decode("ascii"),
"state": await self._state(resolved_id),
}
async def set_viewport(
self,
browser_id: int | str | None,
width: int,
height: int,
) -> dict[str, Any]:
await self.ensure_started()
resolved_id = self._resolve_browser_id(browser_id)
page = self._page(resolved_id)
viewport = {
"width": max(320, min(4096, int(width or DEFAULT_VIEWPORT["width"]))),
"height": max(200, min(4096, int(height or DEFAULT_VIEWPORT["height"]))),
}
await page.set_viewport_size(viewport)
self.last_interacted_browser_id = resolved_id
return {"state": await self._state(resolved_id), "viewport": viewport}
async def mouse(
self,
browser_id: int | str | None,
event_type: str,
x: float,
y: float,
button: str = "left",
) -> dict[str, Any]:
await self.ensure_started()
resolved_id = self._resolve_browser_id(browser_id)
page = self._page(resolved_id)
event_type = str(event_type or "click").lower()
if event_type == "move":
await page.mouse.move(float(x), float(y))
elif event_type == "down":
await page.mouse.down(button=button)
elif event_type == "up":
await page.mouse.up(button=button)
else:
await page.mouse.click(float(x), float(y), button=button)
await self._settle(page, short=True)
self.last_interacted_browser_id = resolved_id
return await self._state(resolved_id)
async def wheel(
self,
browser_id: int | str | None,
x: float,
y: float,
delta_x: float = 0,
delta_y: float = 0,
) -> dict[str, Any]:
await self.ensure_started()
resolved_id = self._resolve_browser_id(browser_id)
page = self._page(resolved_id)
await page.mouse.move(float(x), float(y))
await page.mouse.wheel(float(delta_x), float(delta_y))
await self._settle(page, short=True)
self.last_interacted_browser_id = resolved_id
return await self._state(resolved_id)
async def keyboard(
self,
browser_id: int | str | None,
*,
key: str = "",
text: str = "",
) -> dict[str, Any]:
await self.ensure_started()
resolved_id = self._resolve_browser_id(browser_id)
page = self._page(resolved_id)
if text:
await page.keyboard.type(str(text))
elif key:
await page.keyboard.press(str(key))
await self._settle(page, short=True)
self.last_interacted_browser_id = resolved_id
return await self._state(resolved_id)
async def close(self, delete_profile: bool = False) -> None:
for browser_id in list(self.pages):
try:
await self.pages[browser_id].page.close()
except Exception:
pass
self.pages.clear()
if self.context:
try:
await self.context.close()
except Exception as exc:
PrintStyle.warning(f"Browser context close failed: {exc}")
self.context = None
if self.playwright:
try:
await self.playwright.stop()
except Exception as exc:
PrintStyle.warning(f"Playwright stop failed: {exc}")
self.playwright = None
self.last_interacted_browser_id = None
if delete_profile:
shutil.rmtree(self.profile_dir, ignore_errors=True)
async def _reference_action(
self,
helper_method: str,
browser_id: int | str | None,
reference_id: int | str,
text: str | None = None,
) -> dict[str, Any]:
resolved_id = self._resolve_browser_id(browser_id)
page = self._page(resolved_id)
await self._ensure_content_helper(page)
if text is None:
action = await page.evaluate(
"(args) => globalThis.__spaceBrowserPageContent__[args.method](args.ref)",
{"method": helper_method, "ref": reference_id},
)
else:
action = await page.evaluate(
"(args) => globalThis.__spaceBrowserPageContent__[args.method](args.ref, args.text)",
{"method": helper_method, "ref": reference_id, "text": text},
)
await self._settle(page, short=False)
self.last_interacted_browser_id = resolved_id
return {"action": action or {}, "state": await self._state(resolved_id)}
async def _goto(self, page: Any, url: str) -> None:
from playwright.async_api import TimeoutError as PlaywrightTimeoutError
try:
await page.goto(url, wait_until="domcontentloaded", timeout=30000)
except PlaywrightTimeoutError:
PrintStyle.warning(f"Browser navigation timed out after DOM handoff: {url}")
await self._settle(page)
async def _settle(self, page: Any, short: bool = False) -> None:
from playwright.async_api import TimeoutError as PlaywrightTimeoutError
try:
await page.wait_for_load_state(
"domcontentloaded",
timeout=1000 if short else 5000,
)
except PlaywrightTimeoutError:
pass
await asyncio.sleep(0.1 if short else 0.35)
async def _state(self, browser_id: int) -> dict[str, Any]:
browser_page = self.pages.get(int(browser_id))
if not browser_page:
raise KeyError(f"Browser {browser_id} is not open.")
page = browser_page.page
try:
title = await page.title()
except Exception:
title = ""
try:
history_length = await page.evaluate("() => globalThis.history?.length || 0")
except Exception:
history_length = 0
return {
"id": browser_page.id,
"currentUrl": page.url,
"title": title,
"canGoBack": bool(history_length and int(history_length) > 1),
"canGoForward": False,
"loading": False,
}
def _register_page(self, page: Any) -> BrowserPage:
existing = self._browser_id_for_page(page)
if existing is not None:
return self.pages[existing]
browser_id = self.next_browser_id
self.next_browser_id += 1
browser_page = BrowserPage(id=browser_id, page=page)
self.pages[browser_id] = browser_page
def on_close() -> None:
self.pages.pop(browser_id, None)
page.on("close", on_close)
return browser_page
def _browser_id_for_page(self, page: Any) -> int | None:
for browser_id, browser_page in self.pages.items():
if browser_page.page == page:
return browser_id
return None
def _resolve_browser_id(self, browser_id: int | str | None = None) -> int:
if browser_id is None or str(browser_id).strip() == "":
if self.last_interacted_browser_id in self.pages:
return int(self.last_interacted_browser_id)
if self.pages:
return sorted(self.pages)[0]
raise KeyError("No browser is open. Use action=open first.")
value = str(browser_id).strip()
if value.startswith("browser-"):
value = value.split("-", 1)[1]
resolved = int(value)
if resolved not in self.pages:
raise KeyError(f"Browser {resolved} is not open.")
return resolved
def _page(self, browser_id: int) -> Any:
        return self.pages[int(browser_id)].page

    async def _ensure_content_helper(self, page: Any) -> None:
        has_helper = await page.evaluate(
            "() => Boolean(globalThis.__spaceBrowserPageContent__?.capture)"
        )
        if has_helper:
            return
        if self._content_helper_source is None:
            self._content_helper_source = CONTENT_HELPER_PATH.read_text(encoding="utf-8")
        await page.evaluate(self._content_helper_source)

    @staticmethod
    def _shadow_dom_script() -> str:
        return """
        (() => {
          const original = Element.prototype.attachShadow;
          if (original && !original.__a0BrowserOpenShadowPatch) {
            const patched = function attachShadow(options) {
              return original.call(this, { ...(options || {}), mode: "open" });
            };
            patched.__a0BrowserOpenShadowPatch = true;
            Element.prototype.attachShadow = patched;
          }
        })();
        """


_runtimes: dict[str, BrowserRuntime] = {}
_runtime_lock = threading.RLock()


async def get_runtime(context_id: str, *, create: bool = True) -> BrowserRuntime | None:
    context_id = str(context_id or "").strip()
    if not context_id:
        raise ValueError("context_id is required")
    with _runtime_lock:
        runtime = _runtimes.get(context_id)
        if runtime is None and create:
            runtime = BrowserRuntime(context_id)
            _runtimes[context_id] = runtime
        return runtime


async def close_runtime(context_id: str, *, delete_profile: bool = True) -> None:
    context_id = str(context_id or "").strip()
    if not context_id:
        return
    with _runtime_lock:
        runtime = _runtimes.pop(context_id, None)
    if runtime:
        await runtime.close(delete_profile=delete_profile)


def close_runtime_sync(context_id: str, *, delete_profile: bool = True) -> None:
    task = DeferredTask(thread_name="BrowserCleanup")
    task.start_task(close_runtime, context_id, delete_profile=delete_profile)
    try:
        task.result_sync(timeout=30)
    finally:
        task.kill(terminate_thread=True)


async def close_all_runtimes(*, delete_profiles: bool = False) -> None:
    with _runtime_lock:
        runtimes = list(_runtimes.values())
        _runtimes.clear()
    for runtime in runtimes:
        try:
            await runtime.close(delete_profile=delete_profiles)
        except Exception as exc:
            PrintStyle.warning(f"Browser runtime cleanup failed: {exc}")


def close_all_runtimes_sync() -> None:
    task = DeferredTask(thread_name="BrowserCleanupAll")
    task.start_task(close_all_runtimes, delete_profiles=False)
    try:
        task.result_sync(timeout=30)
    finally:
        task.kill(terminate_thread=True)


def known_context_ids() -> list[str]:
    with _runtime_lock:
        return sorted(_runtimes)


atexit.register(close_all_runtimes_sync)
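
The registry above keeps one runtime per chat context and guards the dictionary with a re-entrant lock. The sketch below isolates that pattern so it can be exercised without Playwright. `FakeRuntime` and `demo` are hypothetical stand-ins introduced only for illustration; the real `BrowserRuntime` is defined earlier in this file and is not reproduced here.

```python
import asyncio
import threading
from typing import Optional


class FakeRuntime:
    """Hypothetical stand-in for BrowserRuntime; it only records close() calls."""

    def __init__(self, context_id: str) -> None:
        self.context_id = context_id
        self.closed = False

    async def close(self, delete_profile: bool = True) -> None:
        self.closed = True


_runtimes: dict[str, FakeRuntime] = {}
_runtime_lock = threading.RLock()


async def get_runtime(context_id: str, *, create: bool = True) -> Optional[FakeRuntime]:
    context_id = str(context_id or "").strip()
    if not context_id:
        raise ValueError("context_id is required")
    with _runtime_lock:
        runtime = _runtimes.get(context_id)
        if runtime is None and create:
            runtime = FakeRuntime(context_id)
            _runtimes[context_id] = runtime
        return runtime


async def close_runtime(context_id: str) -> None:
    # Pop under the lock, then await close() outside it so the lock is
    # never held across a suspension point.
    with _runtime_lock:
        runtime = _runtimes.pop(str(context_id or "").strip(), None)
    if runtime:
        await runtime.close()


async def demo() -> tuple[bool, bool, bool]:
    first = await get_runtime(" chat-1 ")   # whitespace is stripped from the key
    second = await get_runtime("chat-1")    # same context id, same runtime
    await close_runtime("chat-1")
    missing = await get_runtime("chat-1", create=False)
    return first is second, first.closed, missing is None
```

Calling `asyncio.run(demo())` exercises lookup, reuse, and teardown in one pass. Passing `create=False` is how a caller can probe for an existing runtime without starting a browser as a side effect.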

plugins/_browser/hooks.py Normal file

@ -0,0 +1,47 @@
from __future__ import annotations

from helpers import files, plugins, yaml as yaml_helper
from plugins._browser.helpers.config import (
    PLUGIN_NAME,
    browser_runtime_config,
    normalize_browser_config,
)
from plugins._browser.helpers.runtime import close_all_runtimes_sync


def _load_saved_browser_config(project_name: str = "", agent_profile: str = "") -> dict:
    entries = plugins.find_plugin_assets(
        plugins.CONFIG_FILE_NAME,
        plugin_name=PLUGIN_NAME,
        project_name=project_name,
        agent_profile=agent_profile,
        only_first=True,
    )
    path = entries[0].get("path", "") if entries else ""
    if path and files.exists(path):
        return files.read_file_json(path) or {}

    plugin_dir = plugins.find_plugin_dir(PLUGIN_NAME)
    default_path = (
        files.get_abs_path(plugin_dir, plugins.CONFIG_DEFAULT_FILE_NAME)
        if plugin_dir
        else ""
    )
    if default_path and files.exists(default_path):
        return yaml_helper.loads(files.read_file(default_path)) or {}
    return {}


def get_plugin_config(default=None, **kwargs):
    return normalize_browser_config(default)


def save_plugin_config(settings=None, project_name="", agent_profile="", **kwargs):
    normalized = normalize_browser_config(settings)
    current = normalize_browser_config(
        _load_saved_browser_config(project_name=project_name, agent_profile=agent_profile)
    )
    if browser_runtime_config(normalized) != browser_runtime_config(current):
        close_all_runtimes_sync()
    return normalized
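
The save hook above closes running browsers only when the launch-relevant part of the configuration changes. `browser_runtime_config` is not shown in this diff, so the sketch below is an assumption about its role: it treats `extensions_enabled` and `extension_paths` as the only keys that affect how the browser is launched. `RUNTIME_KEYS`, `runtime_view`, and `needs_restart` are hypothetical names.

```python
from typing import Any, Mapping

# Assumption: only these keys change how the browser process is launched.
RUNTIME_KEYS = ("extensions_enabled", "extension_paths")


def runtime_view(config: Mapping[str, Any]) -> dict[str, Any]:
    """Project a config onto the keys that would require a relaunch."""
    view = {key: config.get(key) for key in RUNTIME_KEYS}
    view["extensions_enabled"] = bool(view["extensions_enabled"])
    view["extension_paths"] = list(view["extension_paths"] or [])
    return view


def needs_restart(saved: Mapping[str, Any], incoming: Mapping[str, Any]) -> bool:
    """True when the launch-relevant projections of the two configs differ."""
    return runtime_view(saved) != runtime_view(incoming)
```

Comparing projections rather than whole configs means that, under this assumption, a change to an unrelated field such as `model_preset` would not interrupt open browser sessions.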


@ -0,0 +1,9 @@
name: _browser
title: Browser
description: Built-in direct Playwright browser tool and WebUI viewer.
version: 1.0.0
always_enabled: false
settings_sections:
- external
per_project_config: false
per_agent_config: false


@ -0,0 +1,48 @@
### browser
direct Playwright browser control with visible WebUI viewer
use for web browsing, page inspection, forms, downloads, and browser-only tasks
state stays open per chat context
refs come from content as typed markers: [link 3], [button 6], [image 1], [input text 8]
actions: open list state navigate back forward reload content detail click type submit type_submit scroll evaluate close close_all
common args: action browser_id url ref text selector selectors script
workflow:
- open creates a new browser and returns id/state
- content returns readable page markdown with typed refs
- detail inspects one ref, including link/image/input/button metadata
- click/type/type_submit/submit/scroll use refs from latest content capture and return {action,state}
- navigate/back/forward/reload return fresh state
- list shows open browsers
examples:
~~~json
{
    "tool_name": "browser",
    "tool_args": {
        "action": "open",
        "url": "https://example.com"
    }
}
~~~
~~~json
{
    "tool_name": "browser",
    "tool_args": {
        "action": "content",
        "browser_id": 1
    }
}
~~~
~~~json
{
    "tool_name": "browser",
    "tool_args": {
        "action": "click",
        "browser_id": 1,
        "ref": 3
    }
}
~~~
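
The typed markers described above can be turned back into tool calls mechanically. The following sketch assumes the marker grammar is exactly `[<lowercase words> <integer>]`, which is inferred from the four sample markers in this prompt and not from the capture helper itself. `Ref`, `find_refs`, and `click_call` are hypothetical helpers.

```python
import re
from typing import NamedTuple


class Ref(NamedTuple):
    kind: str
    number: int


# Assumed grammar: "[<lowercase words> <integer>]", e.g. "[input text 8]".
REF_PATTERN = re.compile(r"\[([a-z]+(?: [a-z]+)*) (\d+)\]")


def find_refs(content: str) -> list[Ref]:
    """Collect typed refs from a content capture, in document order."""
    return [Ref(kind, int(number)) for kind, number in REF_PATTERN.findall(content)]


def click_call(browser_id: int, ref: Ref) -> dict:
    """Build the tool payload for clicking a ref, shaped like the examples above."""
    return {
        "tool_name": "browser",
        "tool_args": {"action": "click", "browser_id": browser_id, "ref": ref.number},
    }
```

Only the number is sent back as `ref`. The kind is useful for choosing an action, for example `type` for an input and `click` for a link or button.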


@ -0,0 +1,107 @@
from __future__ import annotations

import json
from typing import Any

from helpers.tool import Response, Tool
from plugins._browser.helpers.runtime import get_runtime


class Browser(Tool):
    async def execute(
        self,
        action: str = "",
        browser_id: int | str | None = None,
        url: str = "",
        ref: int | str | None = None,
        text: str = "",
        selector: str = "",
        selectors: list[str] | None = None,
        script: str = "",
        **kwargs: Any,
    ) -> Response:
        action = str(action or self.method or "state").strip().lower().replace("-", "_")
        runtime = await get_runtime(self.agent.context.id)

        try:
            if action == "open":
                result = await runtime.call("open", url or "about:blank")
            elif action == "list":
                result = await runtime.call("list")
            elif action == "state":
                result = await runtime.call("state", browser_id)
            elif action == "navigate":
                result = await runtime.call("navigate", browser_id, url)
            elif action == "back":
                result = await runtime.call("back", browser_id)
            elif action == "forward":
                result = await runtime.call("forward", browser_id)
            elif action == "reload":
                result = await runtime.call("reload", browser_id)
            elif action == "content":
                payload = self._selector_payload(selector, selectors)
                result = await runtime.call("content", browser_id, payload)
            elif action == "detail":
                result = await runtime.call("detail", browser_id, self._require_ref(ref))
            elif action == "click":
                result = await runtime.call("click", browser_id, self._require_ref(ref))
            elif action == "type":
                result = await runtime.call("type", browser_id, self._require_ref(ref), text)
            elif action == "submit":
                result = await runtime.call("submit", browser_id, self._require_ref(ref))
            elif action in {"type_submit", "typesubmit"}:
                result = await runtime.call(
                    "type_submit",
                    browser_id,
                    self._require_ref(ref),
                    text,
                )
            elif action == "scroll":
                result = await runtime.call("scroll", browser_id, self._require_ref(ref))
            elif action == "evaluate":
                result = await runtime.call("evaluate", browser_id, script)
            elif action == "close":
                result = await runtime.call("close_browser", browser_id)
            elif action == "close_all":
                result = await runtime.call("close_all_browsers")
            else:
                return Response(
                    message=f"Unknown browser action: {action}",
                    break_loop=False,
                )
        except Exception as exc:
            return Response(message=f"Browser {action} failed: {exc}", break_loop=False)

        return Response(message=self._format_result(action, result), break_loop=False)

    def get_log_object(self):
        return self.agent.context.log.log(
            type="tool",
            heading=f"icon://captive_portal {self.agent.agent_name}: Using browser",
            content="",
            kvps=self.args,
            _tool_name=self.name,
        )

    @staticmethod
    def _require_ref(ref: int | str | None) -> int | str:
        if ref is None or str(ref).strip() == "":
            raise ValueError("ref is required for this browser action")
        return ref

    @staticmethod
    def _selector_payload(selector: str = "", selectors: list[str] | None = None) -> dict | None:
        if selectors:
            return {"selectors": selectors}
        if selector:
            return {"selector": selector}
        return None

    @staticmethod
    def _format_result(action: str, result: Any) -> str:
        if action == "content" and isinstance(result, dict):
            if set(result.keys()) == {"document"}:
                return str(result.get("document") or "")
            return json.dumps(result, indent=2, ensure_ascii=False)
        return json.dumps(result, indent=2, ensure_ascii=False, default=str)
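
The argument handling in the tool above can be checked in isolation. This sketch lifts the three pure helpers out of the class as free functions (`normalize_action`, `selector_payload`, `format_result` are illustrative names, not part of the plugin) so their behavior is testable without an agent or a running browser.

```python
import json
from typing import Any, Optional


def normalize_action(action: str = "", method: str = "") -> str:
    """Fall back to the tool method, then to "state"; accept "type-submit" spellings."""
    return str(action or method or "state").strip().lower().replace("-", "_")


def selector_payload(selector: str = "", selectors: Optional[list[str]] = None) -> Optional[dict]:
    """A non-empty selectors list wins over a single selector; neither means whole page."""
    if selectors:
        return {"selectors": selectors}
    if selector:
        return {"selector": selector}
    return None


def format_result(action: str, result: Any) -> str:
    """Return bare markdown for a plain content capture, JSON for everything else."""
    if action == "content" and isinstance(result, dict):
        if set(result) == {"document"}:
            return str(result.get("document") or "")
        return json.dumps(result, indent=2, ensure_ascii=False)
    return json.dumps(result, indent=2, ensure_ascii=False, default=str)
```

Returning the bare document for a plain content capture keeps page markdown free of JSON escaping, which matters because that text is what the agent reads to find its refs.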


@ -0,0 +1,155 @@
import { createStore } from "/js/AlpineStore.js";
import { fetchApi } from "/js/api.js";

const MODEL_CONFIG_API = "/plugins/_model_config/model_presets";

function normalizePathList(value) {
  const source = Array.isArray(value)
    ? value
    : String(value || "").split(/\r?\n/);
  const seen = new Set();
  const paths = [];
  for (const item of source) {
    const path = String(item || "").trim();
    if (!path || seen.has(path)) continue;
    seen.add(path);
    paths.push(path);
  }
  return paths;
}

function ensureConfig(config) {
  if (!config || typeof config !== "object") return null;
  if (typeof config.extensions_enabled !== "boolean") {
    config.extensions_enabled = Boolean(config.extensions_enabled);
  }
  config.extension_paths = normalizePathList(config.extension_paths);
  config.model_preset = String(config.model_preset || "").trim();
  delete config.model;
  return config;
}

export const store = createStore("browserConfig", {
config: null,
extensionPathsText: "",
presets: [],
presetsLoading: false,
presetsError: "",
_presetsLoaded: false,
async init(config) {
this.bindConfig(config);
await this.loadPresets();
},
cleanup() {
this.config = null;
this.extensionPathsText = "";
this.presetsError = "";
},
bindConfig(config) {
const safeConfig = ensureConfig(config);
if (!safeConfig) return;
if (this.config === safeConfig) return;
this.config = safeConfig;
this.extensionPathsText = safeConfig.extension_paths.join("\n");
},
setExtensionPathsText(value) {
this.extensionPathsText = String(value || "");
this.syncExtensionPaths();
},
syncExtensionPaths() {
const safeConfig = ensureConfig(this.config);
if (!safeConfig) return;
safeConfig.extension_paths = normalizePathList(this.extensionPathsText);
},
hasPaths() {
return this.pathCount() > 0;
},
pathCount() {
return normalizePathList(this.extensionPathsText).length;
},
pathCountLabel() {
const count = this.pathCount();
if (!count) return "No extension paths configured";
return `${count} path${count === 1 ? "" : "s"} configured`;
},
extensionModeReady() {
const safeConfig = ensureConfig(this.config);
return Boolean(safeConfig?.extensions_enabled && this.pathCount());
},
async loadPresets() {
if (this._presetsLoaded || this.presetsLoading) return;
this.presetsLoading = true;
this.presetsError = "";
try {
const response = await fetchApi(MODEL_CONFIG_API, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ action: "get" }),
});
const data = await response.json().catch(() => ({}));
this.presets = Array.isArray(data?.presets)
? data.presets.filter((preset) => String(preset?.name || "").trim())
: [];
this._presetsLoaded = true;
} catch (error) {
this.presets = [];
this.presetsError = error instanceof Error ? error.message : String(error);
} finally {
this.presetsLoading = false;
}
},
selectedPreset() {
const selected = String(this.config?.model_preset || "").trim();
if (!selected) return null;
return this.presets.find((preset) => preset?.name === selected) || null;
},
presetOptions() {
const selected = String(this.config?.model_preset || "").trim();
const options = this.presets.map((preset) => ({
...preset,
label: preset.name,
missing: false,
}));
if (selected && this._presetsLoaded && !options.some((preset) => preset.name === selected)) {
options.push({
name: selected,
label: `${selected} (missing)`,
missing: true,
});
}
return options;
},
selectedPresetSummary() {
const selected = String(this.config?.model_preset || "").trim();
if (!selected) return "Using the effective Main Model.";
const preset = this.selectedPreset();
if (!preset) return `Preset "${selected}" is not available. Browser will fall back to the Main Model.`;
const chat = preset.chat || {};
const parts = [chat.provider, chat.name].filter((item) => String(item || "").trim());
return parts.length ? parts.join(" / ") : "This preset has no Main Model; Browser will fall back to the Main Model.";
},
selectedPresetMissing() {
const selected = String(this.config?.model_preset || "").trim();
return Boolean(selected && this._presetsLoaded && !this.selectedPreset());
},
openPresets() {
void globalThis.openModal?.("/plugins/_model_config/webui/main.html");
},
});
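
The path handling in this store (split on newlines, trim, drop blanks, de-duplicate while preserving order) is easy to mirror outside the browser. Below is a Python analogue of `normalizePathList` and `pathCountLabel`. Whether the server-side `normalize_browser_config` applies the same rule is an assumption, since that helper is not part of this diff; `normalize_path_list` and `path_count_label` are hypothetical names.

```python
import re
from typing import Iterable, Union


def normalize_path_list(value: Union[str, Iterable[str], None]) -> list[str]:
    """Split text on newlines (or take a list), trim, drop blanks and duplicates."""
    if isinstance(value, str) or value is None:
        source = re.split(r"\r?\n", value or "")
    else:
        source = list(value)
    seen: set[str] = set()
    paths: list[str] = []
    for item in source:
        path = str(item or "").strip()
        if not path or path in seen:
            continue
        seen.add(path)
        paths.append(path)
    return paths


def path_count_label(paths: list[str]) -> str:
    """Human-readable summary matching the wording used by the settings pill."""
    count = len(paths)
    if not count:
        return "No extension paths configured"
    return f"{count} path{'' if count == 1 else 's'} configured"
```

First-seen order is preserved on purpose, so the textarea content and the saved list stay in the order the user typed them.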


@ -0,0 +1,596 @@
import { createStore } from "/js/AlpineStore.js";
import { callJsonApi } from "/js/api.js";
import { getNamespacedClient } from "/js/websocket.js";
import { store as chatInputStore } from "/components/chat/input/input-store.js";
import { store as fileBrowserStore } from "/components/modals/file-browser/file-browser-store.js";
import { store as pluginSettingsStore } from "/components/plugins/plugin-settings-store.js";

const websocket = getNamespacedClient("/ws");
websocket.addHandlers(["ws_webui"]);

const EXTENSIONS_ROOT_FALLBACK = "/a0/usr/browser-extensions";

function firstOk(response) {
  const result = response?.results?.find((item) => item?.ok);
  if (result) {
    const data = result.data || {};
    if (data.browser_error) {
      throw new Error(data.browser_error.error || data.browser_error.code || "Browser request failed");
    }
    return data;
  }
  const error = response?.results?.find((item) => !item?.ok)?.error;
  if (error) throw new Error(error.error || error.code || "Browser request failed");
  return {};
}

const model = {
loading: true,
error: "",
status: null,
contextId: "",
browsers: [],
activeBrowserId: null,
address: "",
frameSrc: "",
frameState: null,
connected: false,
addressFocused: false,
_frameOff: null,
_stateOff: null,
_lastFrameAt: 0,
_floatingCleanup: null,
_stageElement: null,
_stageResizeObserver: null,
_viewportSyncTimer: null,
_lastViewportKey: "",
extensionMenuOpen: false,
extensionInstallUrl: "",
extensionActionLoading: false,
extensionActionMessage: "",
extensionActionError: "",
extensionsRoot: "",
extensionsList: [],
async refreshStatus() {
this.status = await callJsonApi("/plugins/_browser/status", {});
},
async refreshExtensionsList() {
const response = await callJsonApi("/plugins/_browser/extensions", { action: "list" });
if (response?.ok) {
this.extensionsRoot = response.root || EXTENSIONS_ROOT_FALLBACK;
this.extensionsList = Array.isArray(response.extensions) ? response.extensions : [];
}
},
toggleExtensionsMenu() {
this.extensionMenuOpen = !this.extensionMenuOpen;
if (this.extensionMenuOpen) {
this.extensionActionMessage = "";
this.extensionActionError = "";
void this.refreshExtensionsList();
}
},
closeExtensionsMenu() {
this.extensionMenuOpen = false;
},
resolveContextId() {
const urlContext = new URLSearchParams(globalThis.location?.search || "").get("ctxid");
const selectedChat = globalThis.Alpine?.store?.("chats")?.selected;
return globalThis.getContext?.() || urlContext || selectedChat || "";
},
async openExtensionsSettings() {
if (!pluginSettingsStore?.openConfig) {
this.error = "Browser settings are unavailable.";
return;
}
try {
this.closeExtensionsMenu();
await pluginSettingsStore.openConfig("_browser");
await this.refreshAfterSettingsClose();
} catch (error) {
this.error = error instanceof Error ? error.message : String(error);
}
},
async refreshAfterSettingsClose() {
this.loading = true;
this.error = "";
try {
await this.refreshStatus();
await this.refreshExtensionsList();
this.connected = false;
this.browsers = [];
this.setActiveBrowserId(null);
this.address = "";
this.frameState = null;
this.frameSrc = "";
if (this.contextId) {
await this.connectViewer();
}
} finally {
this.loading = false;
}
},
async openExtensionsFolder() {
this.closeExtensionsMenu();
try {
if (!this.extensionsRoot) {
await this.refreshExtensionsList();
}
void fileBrowserStore.open(this.extensionsRoot || EXTENSIONS_ROOT_FALLBACK);
} catch (error) {
this.extensionActionError = error instanceof Error ? error.message : String(error);
}
},
createExtensionWithAgent() {
this._prefillAgentPrompt(
[
"Use the a0-browser-ext skill to create a new Chrome extension for Agent Zero's Browser.",
"Start by asking me for the extension name, purpose, target websites, and required permissions.",
`Create it under ${this.extensionsRoot || EXTENSIONS_ROOT_FALLBACK}/<extension-slug> and keep permissions minimal.`,
].join("\n")
);
},
askAgentInstallExtension() {
const url = String(this.extensionInstallUrl || "").trim();
this._prefillAgentPrompt(
[
"Use the a0-browser-ext skill to install and review a Chrome Web Store extension for Agent Zero's Browser.",
url ? `Chrome Web Store URL or id: ${url}` : "Ask me for the Chrome Web Store URL or extension id first.",
"Explain the permissions and any sandbox risk before enabling it.",
].join("\n")
);
},
async installExtensionFromUrl() {
const url = String(this.extensionInstallUrl || "").trim();
this.extensionActionMessage = "";
this.extensionActionError = "";
if (!url) {
this.extensionActionError = "Paste a Chrome Web Store URL or extension id first.";
return;
}
this.extensionActionLoading = true;
try {
const response = await callJsonApi("/plugins/_browser/extensions", {
action: "install_web_store",
url,
});
if (!response?.ok) {
throw new Error(response?.error || "Install failed.");
}
this.extensionInstallUrl = "";
this.extensionActionMessage = `Installed ${response.name || response.id}. Browser sessions restart when extension settings change.`;
await this.refreshStatus();
await this.refreshExtensionsList();
} catch (error) {
this.extensionActionError = error instanceof Error ? error.message : String(error);
} finally {
this.extensionActionLoading = false;
}
},
_prefillAgentPrompt(prompt) {
chatInputStore.message = prompt;
chatInputStore.adjustTextareaHeight?.();
chatInputStore.focus?.();
this.closeExtensionsMenu();
},
async onOpen(element = null) {
this.loading = true;
this.error = "";
this.setupFloatingModal(element);
this.contextId = this.resolveContextId();
try {
await this.refreshStatus();
await this.connectViewer();
} catch (error) {
this.error = error instanceof Error ? error.message : String(error);
} finally {
this.loading = false;
}
},
async connectViewer() {
if (!this.contextId) {
this.connected = false;
this.error = "No active chat context is selected.";
return;
}
this.error = "";
await this._bindSocketEvents();
const response = await websocket.request(
"browser_viewer_subscribe",
{
context_id: this.contextId,
browser_id: this.activeBrowserId,
},
{ timeoutMs: 10000 },
);
const data = firstOk(response);
this.browsers = data.browsers || [];
this.setActiveBrowserId(data.active_browser_id || this.activeBrowserId || null);
this.connected = true;
this.queueViewportSync(true);
},
async _bindSocketEvents() {
if (!this._frameOff) {
const frameHandler = ({ data }) => {
if (data?.context_id !== this.contextId) return;
this.browsers = data.browsers || this.browsers;
this.setActiveBrowserId(data.browser_id || data.state?.id || this.activeBrowserId);
this.frameState = data.state || null;
if (!this.addressFocused && data.state?.currentUrl) {
this.address = data.state.currentUrl;
}
this.frameSrc = data.image ? `data:${data.mime || "image/jpeg"};base64,${data.image}` : "";
if (!data.image && !data.state) {
this.setActiveBrowserId(null);
this.frameState = null;
this.frameSrc = "";
}
this._lastFrameAt = Date.now();
};
await websocket.on("browser_viewer_frame", frameHandler);
this._frameOff = () => websocket.off("browser_viewer_frame", frameHandler);
}
if (!this._stateOff) {
const stateHandler = ({ data }) => {
if (data?.context_id !== this.contextId) return;
this.browsers = data.browsers || [];
this.setActiveBrowserId(data.last_interacted_browser_id || this.firstBrowserId());
this.queueViewportSync(true);
};
await websocket.on("browser_viewer_state", stateHandler);
this._stateOff = () => websocket.off("browser_viewer_state", stateHandler);
}
},
async command(command, extra = {}) {
this.error = "";
const previousActiveBrowserId = this.activeBrowserId;
try {
const response = await websocket.request(
"browser_viewer_command",
{
context_id: this.contextId,
browser_id: this.activeBrowserId,
command,
...extra,
},
{ timeoutMs: 20000 },
);
const data = firstOk(response);
this.browsers = data.browsers || this.browsers;
const result = data.result || {};
this.setActiveBrowserId(
result.id
|| result.state?.id
|| result.last_interacted_browser_id
|| data.last_interacted_browser_id
|| this.firstBrowserId()
);
if (!this.activeBrowserId) {
this.frameState = null;
this.frameSrc = "";
}
if (result.state?.currentUrl || result.currentUrl) {
this.address = result.state?.currentUrl || result.currentUrl;
}
const activeChanged = this.activeBrowserId && this.activeBrowserId !== previousActiveBrowserId;
if ((command === "open" || command === "close" || activeChanged) && this.contextId && this.activeBrowserId) {
await this.connectViewer();
}
this.queueViewportSync(true);
} catch (error) {
this.error = error instanceof Error ? error.message : String(error);
}
},
async go() {
const url = String(this.address || "").trim();
if (!url) return;
this.addressFocused = false;
globalThis.document?.activeElement?.blur?.();
if (this.activeBrowserId) {
await this.command("navigate", { url });
} else {
await this.command("open", { url });
}
},
onAddressFocus() {
this.addressFocused = true;
},
onAddressBlur() {
this.addressFocused = false;
if (this.frameState?.currentUrl && !String(this.address || "").trim()) {
this.address = this.frameState.currentUrl;
}
},
async selectBrowser(id) {
if (String(id || "").trim() === "") {
await this.command("open", { url: "about:blank" });
return;
}
this.setActiveBrowserId(id);
if (this.contextId) {
await this.connectViewer();
}
},
firstBrowserId() {
const first = Array.isArray(this.browsers) ? this.browsers[0] : null;
return first?.id || null;
},
setActiveBrowserId(id) {
const previous = this.activeBrowserId;
const numeric = Number(id) || null;
const exists = !numeric || !Array.isArray(this.browsers) || this.browsers.some((browser) => Number(browser.id) === numeric);
this.activeBrowserId = exists ? numeric : null;
if (this.activeBrowserId !== previous) {
this._lastViewportKey = "";
}
},
pointerCoordinatesFor(event, element = null) {
const target = element || event?.currentTarget;
if (!target) return null;
const rect = target.getBoundingClientRect();
const naturalWidth = target.naturalWidth || rect.width;
const naturalHeight = target.naturalHeight || rect.height;
return {
x: ((event.clientX - rect.left) / Math.max(1, rect.width)) * naturalWidth,
y: ((event.clientY - rect.top) / Math.max(1, rect.height)) * naturalHeight,
};
},
currentViewportSize() {
const stage = this._stageElement;
if (!stage) return null;
const width = Math.floor(stage.clientWidth || 0);
const height = Math.floor(stage.clientHeight || 0);
if (width < 80 || height < 80) return null;
return {
width: Math.max(320, width),
height: Math.max(200, height),
};
},
queueViewportSync(force = false) {
if (this._viewportSyncTimer) {
globalThis.clearTimeout(this._viewportSyncTimer);
}
this._viewportSyncTimer = globalThis.setTimeout(() => {
this._viewportSyncTimer = null;
void this.syncViewport(force);
}, force ? 0 : 80);
},
async syncViewport(force = false) {
if (!this.contextId || !this.activeBrowserId) return;
const viewport = this.currentViewportSize();
if (!viewport) return;
const key = `${this.activeBrowserId}:${viewport.width}x${viewport.height}`;
if (!force && this._lastViewportKey === key) return;
try {
await websocket.emit("browser_viewer_input", {
context_id: this.contextId,
browser_id: this.activeBrowserId,
input_type: "viewport",
width: viewport.width,
height: viewport.height,
});
this._lastViewportKey = key;
} catch (error) {
this._lastViewportKey = "";
console.warn("Browser viewport sync failed", error);
}
},
async sendMouse(eventType, event) {
if (!this.activeBrowserId || !event?.currentTarget) return;
const pointer = this.pointerCoordinatesFor(event);
if (!pointer) return;
await websocket.emit("browser_viewer_input", {
context_id: this.contextId,
browser_id: this.activeBrowserId,
input_type: "mouse",
event_type: eventType,
x: pointer.x,
y: pointer.y,
button: "left",
});
},
async sendWheel(event) {
if (!this.activeBrowserId || !event) return;
const image = event.currentTarget?.querySelector?.(".browser-frame") || event.target?.closest?.(".browser-frame");
const pointer = this.pointerCoordinatesFor(event, image);
if (!pointer) return;
await websocket.emit("browser_viewer_input", {
context_id: this.contextId,
browser_id: this.activeBrowserId,
input_type: "wheel",
x: pointer.x,
y: pointer.y,
delta_x: Number(event.deltaX || 0),
delta_y: Number(event.deltaY || 0),
});
},
async sendKey(event) {
if (!this.activeBrowserId) return;
if (event.ctrlKey || event.metaKey || event.altKey) return;
const editable = ["INPUT", "TEXTAREA", "SELECT"].includes(event.target?.tagName);
if (editable) return;
event.preventDefault();
const printable = event.key && event.key.length === 1;
await websocket.emit("browser_viewer_input", {
context_id: this.contextId,
browser_id: this.activeBrowserId,
input_type: "keyboard",
key: printable ? "" : event.key,
text: printable ? event.key : "",
});
},
async cleanup() {
if (this.contextId) {
try {
await websocket.emit("browser_viewer_unsubscribe", { context_id: this.contextId });
} catch {}
}
this._frameOff?.();
this._stateOff?.();
this._frameOff = null;
this._stateOff = null;
this._floatingCleanup?.();
this._floatingCleanup = null;
this._stageResizeObserver?.disconnect?.();
this._stageResizeObserver = null;
this._stageElement = null;
if (this._viewportSyncTimer) {
globalThis.clearTimeout(this._viewportSyncTimer);
this._viewportSyncTimer = null;
}
this._lastViewportKey = "";
this.extensionMenuOpen = false;
this.extensionActionLoading = false;
this.connected = false;
},
setupFloatingModal(element = null) {
this._floatingCleanup?.();
const root = element || globalThis.document?.querySelector(".browser-panel");
const modal = root?.closest?.(".modal");
const inner = modal?.querySelector?.(".modal-inner");
const body = modal?.querySelector?.(".modal-bd");
const header = modal?.querySelector?.(".modal-header");
const stage = root?.querySelector?.(".browser-stage");
if (!modal || !inner || !header) return;
modal.classList.add("modal-floating");
inner.classList.add("browser-modal");
body?.classList?.add("browser-modal-body");
this._stageElement = stage || null;
const rect = inner.getBoundingClientRect();
inner.style.left = `${Math.max(8, rect.left)}px`;
inner.style.top = `${Math.max(8, rect.top)}px`;
inner.style.transform = "none";
let drag = null;
let resizeObserver = null;
const viewportGap = 8;
const clampPosition = (left, top) => {
const bounds = inner.getBoundingClientRect();
const maxLeft = Math.max(viewportGap, globalThis.innerWidth - bounds.width - viewportGap);
const maxTop = Math.max(viewportGap, globalThis.innerHeight - bounds.height - viewportGap);
return {
left: Math.min(Math.max(viewportGap, left), maxLeft),
top: Math.min(Math.max(viewportGap, top), maxTop),
};
};
const clampGeometry = () => {
const bounds = inner.getBoundingClientRect();
const left = Math.max(viewportGap, bounds.left);
const top = Math.max(viewportGap, bounds.top);
const maxWidth = Math.max(320, globalThis.innerWidth - viewportGap * 2);
const maxHeight = Math.max(300, globalThis.innerHeight - viewportGap * 2);
if (bounds.width > maxWidth) {
inner.style.width = `${maxWidth}px`;
}
if (bounds.height > maxHeight) {
inner.style.height = `${maxHeight}px`;
}
const next = clampPosition(left, top);
inner.style.left = `${next.left}px`;
inner.style.top = `${next.top}px`;
inner.style.maxWidth = `${Math.max(320, globalThis.innerWidth - next.left - viewportGap)}px`;
inner.style.maxHeight = `${Math.max(300, globalThis.innerHeight - next.top - viewportGap)}px`;
this.queueViewportSync();
};
clampGeometry();
globalThis.addEventListener("resize", clampGeometry);
if (globalThis.ResizeObserver) {
resizeObserver = new ResizeObserver(clampGeometry);
resizeObserver.observe(inner);
if (stage) {
this._stageResizeObserver?.disconnect?.();
this._stageResizeObserver = new ResizeObserver(() => this.queueViewportSync());
this._stageResizeObserver.observe(stage);
}
}
globalThis.requestAnimationFrame(() => this.queueViewportSync(true));
const onPointerMove = (event) => {
if (!drag) return;
const next = clampPosition(
drag.left + event.clientX - drag.x,
drag.top + event.clientY - drag.y,
);
inner.style.left = `${next.left}px`;
inner.style.top = `${next.top}px`;
clampGeometry();
};
const onPointerUp = () => {
drag = null;
globalThis.removeEventListener("pointermove", onPointerMove);
globalThis.removeEventListener("pointerup", onPointerUp);
try {
header.releasePointerCapture?.(header.__browserPanelPointerId || 0);
} catch {}
};
const onPointerDown = (event) => {
if (event.button !== 0) return;
if (event.target?.closest?.("button, input, select, textarea, a")) return;
const current = inner.getBoundingClientRect();
drag = {
x: event.clientX,
y: event.clientY,
left: current.left,
top: current.top,
};
header.__browserPanelPointerId = event.pointerId;
header.setPointerCapture?.(event.pointerId);
globalThis.addEventListener("pointermove", onPointerMove);
globalThis.addEventListener("pointerup", onPointerUp);
event.preventDefault();
};
header.addEventListener("pointerdown", onPointerDown);
this._floatingCleanup = () => {
header.removeEventListener("pointerdown", onPointerDown);
globalThis.removeEventListener("pointermove", onPointerMove);
globalThis.removeEventListener("pointerup", onPointerUp);
globalThis.removeEventListener("resize", clampGeometry);
resizeObserver?.disconnect?.();
this._stageResizeObserver?.disconnect?.();
this._stageResizeObserver = null;
};
},
get activeTitle() {
return this.frameState?.title || "Browser";
},
get activeUrl() {
return this.frameState?.currentUrl || this.address || "about:blank";
},
};
export const store = createStore("browserPage", model);
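
Two pieces of arithmetic in this store are worth isolating: `pointerCoordinatesFor` scales a pointer position from the displayed frame to the frame's natural pixel size, and `currentViewportSize` ignores a stage smaller than 80 px on either side and otherwise enforces a 320x200 minimum. The Python sketch below reproduces both calculations for checking by hand; `pointer_coordinates` and `viewport_size` are hypothetical names, and plain numbers stand in for DOM rectangles.

```python
import math
from typing import Optional


def pointer_coordinates(
    client_x: float,
    client_y: float,
    rect_left: float,
    rect_top: float,
    rect_width: float,
    rect_height: float,
    natural_width: float = 0,
    natural_height: float = 0,
) -> tuple[float, float]:
    """Map a pointer position on the displayed frame to natural image pixels."""
    # Fall back to the displayed size when the natural size is unknown (0).
    natural_width = natural_width or rect_width
    natural_height = natural_height or rect_height
    x = ((client_x - rect_left) / max(1, rect_width)) * natural_width
    y = ((client_y - rect_top) / max(1, rect_height)) * natural_height
    return x, y


def viewport_size(stage_width: float, stage_height: float) -> Optional[tuple[int, int]]:
    """Return the viewport to request, or None while the stage is too small."""
    width = math.floor(stage_width or 0)
    height = math.floor(stage_height or 0)
    if width < 80 or height < 80:
        return None
    return max(320, width), max(200, height)
```

Scaling by the natural size is what lets a click land on the right element when the streamed frame is displayed at a different size than the page viewport it was captured from.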


@ -0,0 +1,225 @@
<html>
<head>
<title>Browser Settings</title>
<script type="module">
import { store } from "/plugins/_browser/webui/browser-config-store.js";
</script>
</head>
<body>
<div x-data>
<template x-if="$store.browserConfig && config">
<div
class="browser-config-sections"
x-init="$store.browserConfig.init(config)"
x-effect="$store.browserConfig.bindConfig(config)"
x-destroy="$store.browserConfig.cleanup()"
>
<div class="browser-config-card">
<div class="section-title">Browser Model Preset</div>
<div class="section-description">
Choose an optional Model Configuration preset for Browser-owned model helpers. Leave it
on default to follow the effective Main Model.
</div>
<div class="field">
<div class="field-label">
<div class="field-title">Preset</div>
<div class="field-description" x-text="$store.browserConfig.selectedPresetSummary()"></div>
</div>
<div class="field-control">
<select x-model="config.model_preset" :disabled="$store.browserConfig.presetsLoading">
<option value="">Default Main Model</option>
<template x-for="preset in $store.browserConfig.presetOptions()" :key="preset.name">
<option :value="preset.name" x-text="preset.label"></option>
</template>
</select>
</div>
</div>
<div class="browser-config-note" x-show="$store.browserConfig.presetsLoading">
<span class="material-symbols-outlined spinning">progress_activity</span>
<span>Loading model presets...</span>
</div>
<div class="browser-config-warning" x-show="$store.browserConfig.selectedPresetMissing()">
<span class="material-symbols-outlined">warning</span>
<span>The saved preset is missing. Browser will use the effective Main Model until you choose another preset.</span>
</div>
<div class="browser-config-note" x-show="$store.browserConfig.presetsError">
<span class="material-symbols-outlined">error</span>
<span x-text="$store.browserConfig.presetsError"></span>
</div>
<div class="browser-config-actions">
<button type="button" class="btn btn-field" @click="$store.browserConfig.openPresets()">
<span class="material-symbols-outlined">tune</span>
<span>Edit Presets</span>
</button>
</div>
</div>
<div class="browser-config-card">
<div class="section-title">Chrome Extensions</div>
<div class="section-description">
Load unpacked Chromium extensions into the Browser tool. When extensions are active,
Browser switches from Playwright's lightweight headless shell to bundled Chromium so
the extensions can actually load.
</div>
<div class="browser-config-warning">
<span class="material-symbols-outlined">warning</span>
<span>
Browser extensions run inside the Docker browser sandbox, but malicious or buggy
extensions can still damage that sandboxed environment. Install only extensions you
trust and keep permissions as small as possible.
</span>
</div>
<div class="field">
<div class="field-label">
<div class="field-title">Enable extensions</div>
<div class="field-description">
Turn this on only when you have unpacked extension folders ready. Saving changes
restarts active Browser sessions so the new launch mode applies immediately.
</div>
</div>
<div class="field-control">
<label class="toggle">
<input type="checkbox" x-model="config.extensions_enabled" />
<span class="toggler"></span>
</label>
</div>
</div>
<div class="field">
<div class="field-label">
<div class="field-title">Extension directories</div>
<div class="field-description">
One unpacked extension directory per line. Use paths that are visible inside the
runtime environment itself, especially when Agent Zero is running in Docker.
</div>
</div>
<div class="field-control">
<textarea
:value="$store.browserConfig.extensionPathsText"
@input="$store.browserConfig.setExtensionPathsText($event.target.value)"
rows="6"
placeholder="/a0/usr/browser-extensions/my-extension"
></textarea>
</div>
</div>
<div class="browser-config-note">
<span class="material-symbols-outlined">info</span>
<span>
This first version supports unpacked extension folders only. Chrome Web Store installs
and <code>.crx</code> files are out of scope for now.
</span>
</div>
<div class="browser-config-note">
<span class="material-symbols-outlined">deployed_code</span>
<span>
Playwright currently requires a persistent Chromium context for extension loading, so
Browser stays in its faster headless-shell mode until valid extension folders are both
configured and enabled.
</span>
</div>
<div class="browser-config-pill-row">
<span class="browser-config-pill" x-text="$store.browserConfig.pathCountLabel()"></span>
<span class="browser-config-pill tone-active" x-show="$store.browserConfig.extensionModeReady()">
Extension mode ready
</span>
</div>
</div>
</div>
</template>
</div>
<style>
.browser-config-sections {
display: flex;
flex-direction: column;
gap: 16px;
}
.browser-config-card {
display: flex;
flex-direction: column;
gap: 14px;
padding: 16px;
border: 1px solid var(--color-border);
border-radius: 8px;
}
.browser-config-card textarea {
min-height: 132px;
resize: vertical;
font-family: var(--font-family-monospace, monospace);
}
.browser-config-actions {
display: flex;
flex-wrap: wrap;
gap: 8px;
}
.browser-config-note {
display: flex;
align-items: flex-start;
gap: 8px;
padding: 10px 12px;
border-radius: 8px;
background: color-mix(in srgb, var(--color-panel) 82%, transparent);
color: var(--color-text-secondary);
font-size: var(--font-size-small);
}
.browser-config-warning {
display: flex;
align-items: flex-start;
gap: 9px;
padding: 11px 12px;
border: 1px solid color-mix(in srgb, #d97706 44%, var(--color-border));
border-radius: 8px;
background: color-mix(in srgb, #d97706 14%, var(--color-background));
color: color-mix(in srgb, var(--color-text) 86%, #92400e);
font-size: var(--font-size-small);
line-height: 1.4;
}
.browser-config-warning .material-symbols-outlined {
color: #b45309;
font-size: 20px;
}
.browser-config-pill-row {
display: flex;
flex-wrap: wrap;
gap: 8px;
}
.browser-config-pill {
display: inline-flex;
align-items: center;
gap: 6px;
min-height: 28px;
padding: 0 10px;
border-radius: 999px;
border: 1px solid color-mix(in srgb, var(--color-border) 70%, transparent);
background: color-mix(in srgb, var(--color-panel) 88%, transparent);
font-size: 0.78rem;
color: var(--color-text-secondary);
}
.browser-config-pill.tone-active {
color: #1b5e20;
border-color: rgba(27, 94, 32, 0.18);
background: rgba(46, 125, 50, 0.12);
}
</style>
</body>
</html>
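
Note on the extension settings above: the panel asks for one unpacked extension directory per line and says Browser only switches to bundled Chromium once valid folders are configured and enabled. The plugin's own parsing and launch code is not part of this file, so the following is only a minimal sketch of how that could look. Every function name here is hypothetical. The `manifest.json` check reflects how Chromium recognizes an unpacked extension, and the `--load-extension` / `--disable-extensions-except` flags plus `launch_persistent_context(..., channel="chromium")` follow Playwright's documented extension setup, assuming a recent Playwright release.

```python
from pathlib import Path


def parse_extension_paths(text: str) -> list[str]:
    """Split textarea content into unique, non-empty paths, preserving order."""
    seen: set[str] = set()
    paths: list[str] = []
    for raw in text.splitlines():
        path = raw.strip()
        if not path or path in seen:
            continue
        seen.add(path)
        paths.append(path)
    return paths


def valid_extension_dirs(paths: list[str]) -> list[str]:
    """Keep only directories that contain a manifest.json (an unpacked extension)."""
    return [p for p in paths if (Path(p) / "manifest.json").is_file()]


def extension_launch_args(paths: list[str]) -> list[str]:
    """Chromium flags that load the given unpacked extensions and no others."""
    if not paths:
        return []
    # Both flags take a comma-separated list, so paths containing commas
    # are not supported by this sketch.
    joined = ",".join(paths)
    return [
        f"--disable-extensions-except={joined}",
        f"--load-extension={joined}",
    ]


async def launch_with_extensions(playwright, user_data_dir: str, paths: list[str]):
    """Launch bundled Chromium in a persistent context with extensions loaded.

    `playwright` is the object yielded by `async_playwright()`; it is passed in
    so this module does not need Playwright installed just to be imported.
    """
    return await playwright.chromium.launch_persistent_context(
        user_data_dir,
        channel="chromium",
        args=extension_launch_args(paths),
    )
```

With this shape, "Extension mode ready" would correspond to the toggle being on and `valid_extension_dirs(parse_extension_paths(text))` being non-empty. When the list is empty, no persistent context is needed and the lighter headless shell can stay in use.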

@ -0,0 +1,556 @@
<html class="browser-modal">
<head>
<title>Browser</title>
<script type="module">
import { store } from "/plugins/_browser/webui/browser-store.js";
</script>
</head>
<body class="browser-modal-body">
<div x-data>
<template x-if="$store.browserPage">
<div
class="browser-panel"
x-create="$store.browserPage.onOpen($el)"
x-destroy="$store.browserPage.cleanup()"
@keydown.window="$store.browserPage.sendKey($event)"
>
<div class="browser-meta">
<div class="browser-meta-top">
<div class="browser-titleline">
<span
class="browser-live-dot"
:class="{ active: $store.browserPage.connected && $store.browserPage.frameSrc }"
></span>
<span class="browser-title">Browser</span>
<span class="browser-id" x-show="$store.browserPage.activeBrowserId" x-text="'#' + $store.browserPage.activeBrowserId"></span>
</div>
<div class="browser-session-controls">
<select class="browser-select" x-model="$store.browserPage.activeBrowserId" @change="$store.browserPage.selectBrowser($event.target.value)">
<option value="">New Browser</option>
<template x-for="browser in $store.browserPage.browsers" :key="browser.id">
<option :value="browser.id" x-text="'#' + browser.id + ' ' + (browser.title || browser.currentUrl || 'about:blank')"></option>
</template>
</select>
<div
class="browser-extension-menu"
@click.outside="$store.browserPage.closeExtensionsMenu()"
@keydown.escape.window="$store.browserPage.closeExtensionsMenu()"
>
<button
type="button"
class="btn btn-icon-action browser-extensions"
title="Browser extensions"
aria-label="Browser extensions"
@click.stop="$store.browserPage.toggleExtensionsMenu()"
:aria-expanded="$store.browserPage.extensionMenuOpen.toString()"
:class="{ 'is-active': $store.browserPage.status?.extensions?.active }"
>
<span class="material-symbols-outlined">extension</span>
</button>
<div
class="browser-extension-dropdown"
x-show="$store.browserPage.extensionMenuOpen"
x-transition
style="display: none;"
>
<div class="browser-extension-warning">
<span class="material-symbols-outlined">warning</span>
<span>
Extensions run inside the Docker browser sandbox, but malicious or buggy extensions can still damage that environment. Review what you install.
</span>
</div>
<button type="button" class="dropdown-item" @click="$store.browserPage.createExtensionWithAgent()">
<span class="material-symbols-outlined">add_circle</span>
<span>+ Create New with A0</span>
</button>
<div class="browser-extension-url">
<label for="browser-extension-url">Chrome Web Store URL</label>
<input
id="browser-extension-url"
type="url"
x-model="$store.browserPage.extensionInstallUrl"
@keydown.enter.prevent="$store.browserPage.installExtensionFromUrl()"
placeholder="https://chromewebstore.google.com/detail/..."
/>
<div class="browser-extension-url-actions">
<button
type="button"
class="btn btn-ok"
@click="$store.browserPage.installExtensionFromUrl()"
:disabled="$store.browserPage.extensionActionLoading"
>
<span class="material-symbols-outlined" x-text="$store.browserPage.extensionActionLoading ? 'progress_activity' : 'download'"></span>
<span>Install URL</span>
</button>
<button type="button" class="btn btn-field" @click="$store.browserPage.askAgentInstallExtension()">
<span class="material-symbols-outlined">psychology_alt</span>
<span>Ask A0</span>
</button>
</div>
</div>
<button type="button" class="dropdown-item" @click="$store.browserPage.openExtensionsFolder()">
<span class="material-symbols-outlined">folder_open</span>
<span>My Browser Extensions</span>
</button>
<button type="button" class="dropdown-item" @click="$store.browserPage.openExtensionsSettings()">
<span class="material-symbols-outlined">tune</span>
<span>Browser Extension Settings</span>
</button>
<div class="browser-extension-message" x-show="$store.browserPage.extensionActionMessage" x-text="$store.browserPage.extensionActionMessage"></div>
<div class="browser-extension-error" x-show="$store.browserPage.extensionActionError" x-text="$store.browserPage.extensionActionError"></div>
</div>
</div>
<button class="btn btn-icon-action browser-close" title="Close Browser" @click="$confirmClick($event, () => $store.browserPage.command('close'))" :disabled="!$store.browserPage.activeBrowserId">
<span class="material-symbols-outlined">close</span>
</button>
</div>
</div>
</div>
<div class="browser-toolbar">
<div class="browser-navigation">
<button class="btn btn-icon-action" title="Back" @click="$store.browserPage.command('back')" :disabled="!$store.browserPage.activeBrowserId">
<span class="material-symbols-outlined">arrow_back</span>
</button>
<button class="btn btn-icon-action" title="Forward" @click="$store.browserPage.command('forward')" :disabled="!$store.browserPage.activeBrowserId">
<span class="material-symbols-outlined">arrow_forward</span>
</button>
<button class="btn btn-icon-action" title="Reload" @click="$store.browserPage.command('reload')" :disabled="!$store.browserPage.activeBrowserId">
<span class="material-symbols-outlined">refresh</span>
</button>
</div>
<form class="browser-address-form" @submit.prevent="$store.browserPage.go()">
<span class="material-symbols-outlined browser-address-icon">language</span>
<input
class="browser-address"
x-model="$store.browserPage.address"
@focus="$store.browserPage.onAddressFocus()"
@blur="$store.browserPage.onAddressBlur()"
placeholder="https://example.com"
autocomplete="off"
/>
</form>
</div>
<div class="browser-status" x-show="$store.browserPage.loading">
<span class="material-symbols-outlined spinning">progress_activity</span>
<span>Connecting browser...</span>
</div>
<div class="browser-error" x-show="$store.browserPage.error" x-text="$store.browserPage.error"></div>
<div
class="browser-stage"
tabindex="0"
@click="$el.focus()"
@wheel.prevent="$store.browserPage.sendWheel($event)"
>
<template x-if="$store.browserPage.frameSrc">
<img
class="browser-frame"
:src="$store.browserPage.frameSrc"
@click="$store.browserPage.sendMouse('click', $event)"
@mousemove.throttle.250ms="$store.browserPage.sendMouse('move', $event)"
draggable="false"
/>
</template>
<template x-if="!$store.browserPage.frameSrc && !$store.browserPage.loading">
<div class="browser-empty">
<span class="material-symbols-outlined">captive_portal</span>
<button class="btn btn-field" @click="$store.browserPage.command('open', { url: 'about:blank' })">Open Browser</button>
</div>
</template>
</div>
</div>
</template>
</div>
<style>
.modal-inner.browser-modal {
box-sizing: border-box;
container-type: inline-size;
width: min(78vw, 1120px);
height: min(88vh, 900px);
min-width: min(320px, calc(100vw - 16px));
min-height: min(480px, calc(100vh - 16px));
max-width: calc(100vw - 16px);
max-height: calc(100vh - 16px);
resize: both;
border: 1px solid color-mix(in srgb, var(--color-border) 75%, transparent);
border-radius: 7px;
box-shadow: 0 18px 48px rgba(0, 0, 0, 0.32);
background: color-mix(in srgb, var(--color-background) 94%, #000 6%);
}
.modal.modal-floating {
pointer-events: none;
}
.modal.modal-floating .modal-inner {
pointer-events: auto;
}
.modal-inner.browser-modal .modal-header {
min-height: 34px;
padding: 0.35rem 0.75rem 0.35rem 1rem;
cursor: move;
user-select: none;
background: color-mix(in srgb, var(--color-background) 92%, #000 8%);
border-bottom: 1px solid color-mix(in srgb, var(--color-border) 70%, transparent);
}
.modal-inner.browser-modal .modal-title {
font-size: 0.95rem;
letter-spacing: 0;
}
.modal-inner.browser-modal .modal-close {
font-size: 1.35rem;
line-height: 1;
}
.modal-inner.browser-modal .modal-scroll {
flex: 1 1 auto;
min-height: 0;
overflow: hidden;
padding: 0;
}
.modal-inner.browser-modal .modal-bd.browser-modal-body {
box-sizing: border-box;
display: flex;
flex-direction: column;
height: 100%;
padding: 0;
min-height: 0;
}
.modal-inner.browser-modal .modal-bd.browser-modal-body > div[x-data] {
display: flex;
flex: 1 1 auto;
min-height: 0;
}
.browser-panel {
box-sizing: border-box;
display: flex;
flex: 1 1 auto;
flex-direction: column;
gap: 0;
height: 100%;
min-height: 0;
}
.browser-toolbar {
display: grid;
grid-template-columns: auto minmax(0, 1fr);
grid-template-areas: "nav address";
gap: 6px;
align-items: center;
padding: 7px 8px;
border-bottom: 1px solid color-mix(in srgb, var(--color-border) 70%, transparent);
background: color-mix(in srgb, var(--color-panel) 90%, transparent);
}
.browser-navigation {
grid-area: nav;
display: flex;
gap: 4px;
}
.browser-address-form {
grid-area: address;
min-width: 0;
position: relative;
margin: 0;
}
.browser-address-icon {
position: absolute;
left: 10px;
top: 50%;
transform: translateY(-50%);
font-size: 18px;
opacity: 0.58;
pointer-events: none;
}
.browser-address,
.browser-select {
width: 100%;
min-height: 32px;
padding: 5px 9px;
border-radius: 6px;
border: 1px solid color-mix(in srgb, var(--color-border) 72%, transparent);
background: var(--color-input);
color: var(--color-text);
font: inherit;
}
.browser-address {
padding-left: 34px;
}
.browser-select {
min-width: 0;
}
.browser-meta {
display: grid;
grid-template-columns: minmax(0, 1fr);
gap: 6px;
padding: 6px 10px;
border-bottom: 1px solid color-mix(in srgb, var(--color-border) 65%, transparent);
background: color-mix(in srgb, var(--color-panel) 82%, transparent);
}
.browser-meta-top {
display: grid;
grid-template-columns: minmax(0, 1fr) auto;
gap: 10px;
align-items: center;
}
.browser-titleline {
display: flex;
align-items: center;
gap: 8px;
min-width: 0;
}
.browser-session-controls {
display: flex;
align-items: center;
gap: 6px;
min-width: 0;
}
.browser-session-controls .browser-select {
width: min(320px, 52cqw);
}
.browser-session-controls .browser-extensions.is-active {
color: #2e7d32;
}
.browser-extension-menu {
position: relative;
display: flex;
flex: 0 0 auto;
}
.browser-extension-dropdown {
position: absolute;
top: calc(100% + 6px);
right: 0;
z-index: 40;
display: flex;
flex-direction: column;
gap: 7px;
width: min(360px, calc(100vw - 24px));
padding: 10px;
border: 1px solid color-mix(in srgb, var(--color-border) 78%, transparent);
border-radius: 7px;
background: var(--color-background);
box-shadow: 0 16px 38px rgba(0, 0, 0, 0.28);
}
.browser-extension-dropdown .dropdown-item {
display: flex;
align-items: center;
gap: 8px;
width: 100%;
min-height: 34px;
padding: 7px 9px;
border: 0;
border-radius: 6px;
background: transparent;
color: var(--color-text);
font-weight: 600;
text-align: left;
cursor: pointer;
}
.browser-extension-dropdown .dropdown-item:hover {
background: color-mix(in srgb, var(--color-panel) 82%, transparent);
}
.browser-extension-warning,
.browser-extension-message,
.browser-extension-error {
display: flex;
align-items: flex-start;
gap: 8px;
padding: 9px 10px;
border-radius: 7px;
font-size: 0.8rem;
line-height: 1.35;
}
.browser-extension-warning {
border: 1px solid color-mix(in srgb, #d97706 42%, var(--color-border));
background: color-mix(in srgb, #d97706 14%, var(--color-background));
color: color-mix(in srgb, var(--color-text) 86%, #92400e);
}
.browser-extension-warning .material-symbols-outlined {
color: #b45309;
font-size: 19px;
}
.browser-extension-url {
display: flex;
flex-direction: column;
gap: 7px;
padding: 8px;
border: 1px solid color-mix(in srgb, var(--color-border) 58%, transparent);
border-radius: 7px;
background: var(--color-panel);
}
.browser-extension-url label {
font-size: 0.76rem;
color: var(--color-text-secondary);
}
.browser-extension-url input {
min-width: 0;
min-height: 32px;
padding: 6px 8px;
border: 1px solid color-mix(in srgb, var(--color-border) 72%, transparent);
border-radius: 6px;
background: var(--color-input);
color: var(--color-text);
}
.browser-extension-url-actions {
display: flex;
flex-wrap: wrap;
gap: 7px;
}
.browser-extension-url-actions .btn {
display: inline-flex;
align-items: center;
gap: 6px;
min-height: 30px;
}
.browser-extension-message {
background: color-mix(in srgb, #15803d 12%, var(--color-background));
color: color-mix(in srgb, var(--color-text) 88%, #166534);
}
.browser-extension-error {
background: color-mix(in srgb, #be123c 12%, var(--color-background));
color: #9f1239;
}
.browser-live-dot {
width: 8px;
height: 8px;
border-radius: 50%;
background: #777;
flex: 0 0 auto;
}
.browser-live-dot.active {
background: #2e7d32;
box-shadow: 0 0 0 4px rgba(46, 125, 50, 0.13);
}
.browser-title {
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
font-size: 0.9rem;
font-weight: 650;
}
.browser-id {
font-size: 0.78rem;
opacity: 0.68;
}
.browser-stage {
flex: 1 1 auto;
display: flex;
flex-direction: column;
min-height: 0;
overflow: auto;
background: #fff;
outline: none;
}
.browser-frame {
flex: 0 0 auto;
display: block;
width: 100%;
height: auto;
user-select: none;
background: #fff;
}
.browser-status,
.browser-error,
.browser-empty {
display: flex;
align-items: center;
gap: 8px;
min-height: 42px;
font-size: 0.88rem;
}
.browser-status,
.browser-error {
padding: 0 12px;
}
.browser-error {
color: #9f1239;
}
.browser-empty {
display: grid;
flex: 1 1 auto;
width: 100%;
min-height: 0;
justify-items: center;
align-content: center;
text-align: center;
padding: 24px;
color: var(--color-text);
background: var(--color-background);
}
@container (max-width: 460px) {
.browser-meta-top {
grid-template-columns: minmax(0, 1fr);
}
.browser-session-controls {
width: 100%;
}
.browser-session-controls .browser-select {
flex: 1 1 auto;
width: auto;
}
.browser-extension-dropdown {
right: 0;
left: auto;
width: min(296px, calc(100vw - 72px));
}
.browser-address,
.browser-select {
min-height: 34px;
}
}
</style>
</body>
</html>
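
Note on the viewer above: the live frame is an `<img>` rendered at `width: 100%; height: auto`, while `sendMouse('click', $event)` forwards pointer events to the real page. The store's implementation is not shown in this file, so the following is only a sketch of the arithmetic such a handler presumably needs: scaling a position on the displayed image back to pixels in the original frame. The function name and parameters are hypothetical.

```python
def to_page_coords(offset_x, offset_y, shown_w, shown_h, frame_w, frame_h):
    """Map a pointer position on the scaled <img> to pixels in the original frame.

    offset_x / offset_y: pointer position relative to the displayed image.
    shown_w / shown_h:   size of the image as laid out in the modal.
    frame_w / frame_h:   size of the frame as captured from the browser.
    """
    if shown_w <= 0 or shown_h <= 0:
        raise ValueError("displayed image has no size yet")
    x = offset_x * frame_w / shown_w
    y = offset_y * frame_h / shown_h
    # Clamp so positions on the far edge never fall outside the frame.
    x = min(max(x, 0.0), frame_w - 1)
    y = min(max(y, 0.0), frame_h - 1)
    return round(x), round(y)
```

Because the modal is resizable, the displayed size has to be read at event time rather than cached when the modal opens.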

@ -1,35 +0,0 @@
from helpers.api import ApiHandler, Request, Response
from plugins._browser_agent.helpers.model_preset import (
    get_browser_model_preset_name,
    save_browser_model_preset_name,
)
from plugins._model_config.helpers import model_config


class ModelPreset(ApiHandler):
    async def process(self, input: dict, request: Request) -> dict | Response:
        action = str(input.get("action", "get") or "get").strip().lower()

        if action == "get":
            return {
                "ok": True,
                "preset_name": get_browser_model_preset_name(),
            }

        if action not in {"set", "clear"}:
            return Response(status=400, response=f"Unknown action: {action}")

        preset_name = ""
        if action == "set":
            preset_name = str(input.get("preset_name", "") or "").strip()
            if not preset_name:
                return Response(status=400, response="Missing preset_name")
            if not model_config.get_preset_by_name(preset_name):
                return Response(status=404, response=f"Preset '{preset_name}' not found")

        save_browser_model_preset_name(preset_name)
        return {
            "ok": True,
            "preset_name": preset_name,
        }

@ -1,54 +0,0 @@
import importlib.metadata

from helpers.api import ApiHandler, Request, Response
from plugins._browser_agent.helpers.model_preset import (
    get_browser_model_preset_options,
    resolve_browser_model_selection,
)
from plugins._browser_agent.helpers.playwright import (
    get_playwright_binary,
    get_playwright_cache_dir,
)


class Status(ApiHandler):
    async def process(self, input: dict, request: Request) -> dict | Response:
        selection = resolve_browser_model_selection()
        cfg = selection["config"]
        binary = get_playwright_binary()
        browser_use_ok = False
        browser_use_error = ""
        browser_use_version = ""
        try:
            import browser_use  # noqa: F401

            browser_use_ok = True
            browser_use_version = importlib.metadata.version("browser-use")
        except Exception as e:
            browser_use_error = str(e)

        return {
            "plugin": "_browser_agent",
            "model_source": selection["source_label"],
            "model_source_kind": selection["source_kind"],
            "selected_preset_name": selection["selected_preset_name"],
            "preset_status": selection["preset_status"],
            "preset_warning": selection["warning"],
            "available_presets": get_browser_model_preset_options(),
            "model": {
                "provider": cfg.get("provider", ""),
                "name": cfg.get("name", ""),
                "vision": bool(cfg.get("vision", False)),
            },
            "playwright": {
                "cache_dir": get_playwright_cache_dir(),
                "binary_found": bool(binary),
                "binary_path": str(binary) if binary else "",
            },
            "browser_use": {
                "import_ok": browser_use_ok,
                "version": browser_use_version,
                "error": browser_use_error,
            },
        }

@ -1,246 +0,0 @@
// open all shadow doms
(function () {
  const originalAttachShadow = Element.prototype.attachShadow;
  Element.prototype.attachShadow = function attachShadow(options) {
    return originalAttachShadow.call(this, { ...options, mode: "open" });
  };
})();
// // Create a global bridge for iframe communication
// (function() {
// let elementCounter = 0;
// const ignoredTags = [
// "style",
// "script",
// "meta",
// "link",
// "svg",
// "noscript",
// "path",
// ];
// function isElementVisible(element) {
// // Return true for non-element nodes
// if (element.nodeType !== Node.ELEMENT_NODE) {
// return true;
// }
// const computedStyle = window.getComputedStyle(element);
// // Check if element is hidden via CSS
// if (
// computedStyle.display === "none" ||
// computedStyle.visibility === "hidden" ||
// computedStyle.opacity === "0"
// ) {
// return false;
// }
// // Check for hidden input type
// if (element.tagName === "INPUT" && element.type === "hidden") {
// return false;
// }
// // Check for hidden attribute
// if (
// element.hasAttribute("hidden") ||
// element.getAttribute("aria-hidden") === "true"
// ) {
// return false;
// }
// return true;
// }
// function convertAttribute(tag, attr) {
// let out = {
// name: attr.name,
// value: attr.value,
// };
// if (["srcset"].includes(out.name)) return null;
// if (out.name.startsWith("data-") && out.name != "data-A0UID" && out.name != "data-a0-frame-id") return null;
// if (tag === "img" && out.value.startsWith("data:")) out.value = "data...";
// return out;
// }
// // This function will be available in all frames
// window.__A0_extractFrameContent = function() {
// // Get the current frame's DOM content
// const extractContent = (node) => {
// if (!node) return "";
// let content = "";
// const tagName = node.tagName ? node.tagName.toLowerCase() : "";
// // Skip ignored tags
// if (tagName && ignoredTags.includes(tagName)) {
// return "";
// }
// if (node.nodeType === Node.ELEMENT_NODE) {
// // Add unique ID to the actual DOM element
// if (tagName) {
// const uid = elementCounter++;
// node.setAttribute("data-A0UID", uid);
// }
// content += `<${tagName}`;
// // Add invisible attribute if element is not visible
// if (!isElementVisible(node)) {
// content += " invisible";
// }
// // Add attributes with conversion
// for (let attr of node.attributes) {
// const out = convertAttribute(tagName, attr);
// if (out) content += ` ${out.name}="${out.value}"`;
// }
// if (tagName) {
// content += ` selector="${node.getAttribute("data-A0UID")}"`;
// }
// content += ">";
// // Handle shadow DOM
// if (node.shadowRoot) {
// content += "<!-- Shadow DOM Start -->";
// for (let shadowChild of node.shadowRoot.childNodes) {
// content += extractContent(shadowChild);
// }
// content += "<!-- Shadow DOM End -->";
// }
// // Handle child nodes
// for (let child of node.childNodes) {
// content += extractContent(child);
// }
// content += `</${tagName}>`;
// } else if (node.nodeType === Node.TEXT_NODE) {
// content += node.textContent;
// } else if (node.nodeType === Node.COMMENT_NODE) {
// content += `<!--${node.textContent}-->`;
// }
// return content;
// };
// return extractContent(document.documentElement);
// };
// // Setup message listener in each frame
// window.addEventListener('message', function(event) {
// if (event.data === 'A0_REQUEST_CONTENT') {
// // Extract content and send it back to parent
// const content = window.__A0_extractFrameContent();
// // Use '*' as targetOrigin since we're in a controlled environment
// window.parent.postMessage({
// type: 'A0_FRAME_CONTENT',
// content: content,
// frameId: window.frameElement?.getAttribute('data-a0-frame-id')
// }, '*');
// }
// });
// // Function to extract content from all frames
// window.__A0_extractAllFramesContent = async function(rootNode = document) {
// let content = "";
// // Extract content from current document
// content += window.__A0_extractFrameContent();
// // Find all iframes
// const iframes = rootNode.getElementsByTagName('iframe');
// // Create a map to store frame contents
// const frameContents = new Map();
// // Setup promise for each iframe
// const framePromises = Array.from(iframes).map((iframe) => {
// return new Promise((resolve) => {
// const frameId = 'frame_' + Math.random().toString(36).substr(2, 9);
// iframe.setAttribute('data-a0-frame-id', frameId);
// // Setup one-time message listener for this specific frame
// const listener = function(event) {
// if (event.data?.type === 'A0_FRAME_CONTENT' &&
// event.data?.frameId === frameId) {
// frameContents.set(frameId, event.data.content);
// window.removeEventListener('message', listener);
// resolve();
// }
// };
// window.addEventListener('message', listener);
// // Request content from frame
// iframe.contentWindow.postMessage('A0_REQUEST_CONTENT', '*');
// // Timeout after 2 seconds
// setTimeout(resolve, 2000);
// });
// });
// // Wait for all frames to respond or timeout
// await Promise.all(framePromises);
// // Add frame contents in order
// for (let iframe of iframes) {
// const frameId = iframe.getAttribute('data-a0-frame-id');
// const frameContent = frameContents.get(frameId);
// if (frameContent) {
// content += `<!-- IFrame ${iframe.src || 'unnamed'} Content Start -->`;
// content += frameContent;
// content += `<!-- IFrame Content End -->`;
// }
// }
// return content;
// };
// })();
// // override iframe creation to inject our script into them
// (function() {
// // Store the original createElement to use for iframe creation
// const originalCreateElement = document.createElement;
// // Override createElement to catch iframe creation
// document.createElement = function(tagName, options) {
// const element = originalCreateElement.call(document, tagName, options);
// if (tagName.toLowerCase() === 'iframe') {
// // Override the src setter
// const originalSrcSetter = Object.getOwnPropertyDescriptor(HTMLIFrameElement.prototype, 'src').set;
// Object.defineProperty(element, 'src', {
// set: function(value) {
// // Call original setter
// originalSrcSetter.call(this, value);
// // Wait for load and inject our script
// this.addEventListener('load', () => {
// try {
// // Try to inject our script into the iframe
// const iframeDoc = this.contentWindow.document;
// const script = iframeDoc.createElement('script');
// script.textContent = `
// // Make iframe accessible
// document.domain = document.domain;
// // Disable security policies if possible
// if (window.SecurityPolicyViolationEvent) {
// window.SecurityPolicyViolationEvent = undefined;
// }
// `;
// iframeDoc.head.appendChild(script);
// } catch(e) {
// console.warn('Could not inject into iframe:', e);
// }
// }, { once: true });
// }
// });
// }
// return element;
// };
// })();

@ -1,7 +0,0 @@
from helpers.extension import Extension
from plugins._browser_agent.helpers.browser_llm import build_browser_model_for_agent


class BrowserModelProvider(Extension):
    def execute(self, data: dict = {}, **kwargs):
        if self.agent:
            data["result"] = build_browser_model_for_agent(self.agent)

@ -1,54 +0,0 @@
import {
  createActionButton,
  copyToClipboard,
} from "/components/messages/action-buttons/simple-action-buttons.js";
import { store as stepDetailStore } from "/components/modals/process-step-detail/step-detail-store.js";
import { store as speechStore } from "/components/chat/speech/speech-store.js";
import {
  buildDetailPayload,
  cleanStepTitle,
  drawProcessStep,
} from "/js/messages.js";

export default async function registerBrowserAgentHandler(extData) {
  if (extData?.type === "browser") {
    extData.handler = drawMessageBrowserAgent;
  }
}

function drawMessageBrowserAgent({
  id,
  type,
  heading,
  content,
  kvps,
  timestamp,
  agentno = 0,
  ...additional
}) {
  const title = cleanStepTitle(heading);
  const displayKvps = { ...kvps };
  const answerText = String(kvps?.answer ?? "");
  const actionButtons = answerText.trim()
    ? [
        createActionButton("detail", "", () =>
          stepDetailStore.showStepDetail(
            buildDetailPayload(arguments[0], { headerLabels: [] }),
          ),
        ),
        createActionButton("speak", "", () => speechStore.speak(answerText)),
        createActionButton("copy", "", () => copyToClipboard(answerText)),
      ].filter(Boolean)
    : [];

  return drawProcessStep({
    id,
    title,
    code: "WWW",
    classes: undefined,
    kvps: displayKvps,
    content,
    actionButtons,
    log: arguments[0],
  });
}

@ -1,15 +0,0 @@
import { drawMessageToolSimple } from "/js/messages.js";

/**
 * Registers the browser_agent tool message handler to set the custom badge.
 * @param {object} extData
 */
export default async function registerBrowserToolHandler(extData) {
  if (extData?.tool_name === "browser_agent") {
    extData.handler = drawBrowserTool;
  }
}

function drawBrowserTool(args) {
  return drawMessageToolSimple({ ...args, code: "WWW" });
}

@ -1 +0,0 @@
# Built-in browser agent helpers.

@ -1,162 +0,0 @@
from typing import Any, List, Optional

import litellm
from litellm import acompletion
from langchain_core.callbacks.manager import CallbackManagerForLLMRun
from langchain_core.messages import BaseMessage

import models
from browser_use.llm import ChatGoogle, ChatOpenRouter
from plugins._browser_agent.helpers import browser_use_monkeypatch
from plugins._browser_agent.helpers import model_preset
from plugins._browser_agent.helpers import browser_use_openrouter_compat
from plugins._browser_agent.helpers import browser_use_output_sanitize

_BROWSER_USE_PATCHED = False


def apply_browser_use_patches() -> None:
    global _BROWSER_USE_PATCHED
    if _BROWSER_USE_PATCHED:
        return
    browser_use_monkeypatch.apply()
    litellm.modify_params = True
    _BROWSER_USE_PATCHED = True


class AsyncAIChatReplacement:
    class _Completions:
        def __init__(self, wrapper):
            self._wrapper = wrapper

        async def create(self, *args, **kwargs):
            return await self._wrapper._acall(*args, **kwargs)

    class _Chat:
        def __init__(self, wrapper):
            self.completions = AsyncAIChatReplacement._Completions(wrapper)

    def __init__(self, wrapper, *args, **kwargs):
        self._wrapper = wrapper
        self.chat = AsyncAIChatReplacement._Chat(wrapper)


class BrowserCompatibleChatWrapper(ChatOpenRouter):
    """
    A wrapper for browser agent that can filter/sanitize messages
    before sending them to the LLM.
    """

    def __init__(self, *args, **kwargs):
        apply_browser_use_patches()
        models.turn_off_logging()
        self._wrapper = models.LiteLLMChatWrapper(*args, **kwargs)
        self.model = self._wrapper.model_name
        self.kwargs = self._wrapper.kwargs

    @property
    def model_name(self) -> str:
        return self._wrapper.model_name

    @property
    def provider(self) -> str:
        return self._wrapper.provider

    def get_client(self, *args, **kwargs):  # type: ignore
        return AsyncAIChatReplacement(self, *args, **kwargs)

    async def _acall(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ):
        models.apply_rate_limiter_sync(self._wrapper.a0_model_conf, str(messages))

        try:
            model = kwargs.pop("model", None)
            effective_model = model or self._wrapper.model_name
            kwrgs = {**self._wrapper.kwargs, **kwargs}
            request_messages = messages

            # hack from browser-use to fix json schema for gemini (additionalProperties, $defs, $ref)
            if "response_format" in kwrgs and "json_schema" in kwrgs["response_format"] and effective_model and effective_model.startswith("gemini/"):
                kwrgs["response_format"]["json_schema"] = ChatGoogle("")._fix_gemini_schema(kwrgs["response_format"]["json_schema"])

            if browser_use_openrouter_compat.should_use_openrouter_prompt_schema_fallback(
                provider=self.provider,
                model_name=effective_model,
                kwargs=kwrgs,
            ):
                fallback_request = browser_use_openrouter_compat.build_json_object_fallback_request(
                    messages=messages,
                    kwargs=kwrgs,
                )
                if fallback_request is not None:
                    request_messages, kwrgs = fallback_request

            resp = await acompletion(
                model=self._wrapper.model_name,
                messages=request_messages,
                stop=stop,
                **kwrgs,
            )

            # Gemini: strip triple backticks and conform schema
            try:
                msg = resp.choices[0].message  # type: ignore
                if self.provider == "gemini" and isinstance(getattr(msg, "content", None), str):
                    cleaned = browser_use_monkeypatch.gemini_clean_and_conform(msg.content)  # type: ignore
                    if cleaned:
                        msg.content = cleaned
            except Exception:
                pass

        except Exception as e:
            raise e

        # Structured output: normalize keys/models reject (e.g. "" on action dicts) and repair partial JSON
        try:
            rf = kwrgs.get("response_format") or {}
            if "json_schema" in rf or "json_object" in rf:
                msg_obj = resp.choices[0].message
                raw_content = getattr(msg_obj, "content", None)
                fixed = browser_use_output_sanitize.sanitize_llm_message_content_for_browser_use(raw_content)  # type: ignore[arg-type]
                if fixed is not None:
                    msg_obj.content = fixed
        except Exception:
            pass

        return resp


def build_browser_model_from_config(
    model_config: models.ModelConfig,
) -> BrowserCompatibleChatWrapper:
    apply_browser_use_patches()
    original_provider = model_config.provider.lower()
    provider_name, kwargs = models._merge_provider_defaults(  # type: ignore[attr-defined]
        "chat", original_provider, model_config.build_kwargs()
    )
    return models._get_litellm_chat(  # type: ignore[attr-defined]
        BrowserCompatibleChatWrapper,
        model_config.name,
        provider_name,
        model_config,
        **kwargs,
    )


def build_browser_model_for_agent(agent=None) -> BrowserCompatibleChatWrapper:
    """Build and return the browser-use adapter using chat model config."""
    from plugins._model_config.helpers.model_config import (
        build_model_config,
    )
    import models

    selection = model_preset.resolve_browser_model_selection(agent)
    cfg = selection["config"]
    mc = build_model_config(cfg, models.ModelType.CHAT)
    return build_browser_model_from_config(mc)

@ -1,4 +0,0 @@
from helpers import dotenv
dotenv.save_dotenv_value("ANONYMIZED_TELEMETRY", "false")
import browser_use
import browser_use.utils

@ -1,166 +0,0 @@
from typing import Any
from browser_use.llm import ChatGoogle
from helpers import dirty_json
from plugins._browser_agent.helpers import browser_use_output_sanitize
# ------------------------------------------------------------------------------
# Gemini Helper for Output Conformance
# ------------------------------------------------------------------------------
# This function sanitizes and conforms the JSON output from Gemini to match
# the specific schema expectations of the browser-use library. It handles
# markdown fences, aliases actions (like 'complete_task' to 'done'), and
# intelligently constructs a valid 'data' object for the final action.
def gemini_clean_and_conform(text: str):
obj = None
try:
# dirty_json parser is robust enough to handle markdown fences
obj = dirty_json.parse(text)
except Exception:
return None # return None if parsing fails
if not isinstance(obj, dict):
return None
obj = browser_use_output_sanitize.normalize_parsed_browser_use_output(obj)
# Conform actions to browser-use expectations
if isinstance(obj.get("action"), list):
normalized_actions = []
for item in obj["action"]:
if not isinstance(item, dict):
continue # Skip non-dict items
action_key, action_value = next(iter(item.items()), (None, None))
if not action_key:
continue
# Alias 'complete_task' to 'done' to handle inconsistencies
if action_key == "complete_task":
action_key = "done"
# Create a mutable copy of the value
v = (action_value or {}).copy()
if action_key in ("scroll_down", "scroll_up", "scroll"):
is_down = action_key != "scroll_up"
v.setdefault("down", is_down)
v.setdefault("num_pages", 1.0)
normalized_actions.append({"scroll": v})
elif action_key == "go_to_url":
v.setdefault("new_tab", False)
normalized_actions.append({action_key: v})
elif action_key == "done":
# If `data` is missing, construct it from other keys
if "data" not in v:
# Pop fields from the top-level `done` object
response_text = v.pop("response", None)
summary_text = v.pop("page_summary", None)
title_text = v.pop("title", "Task Completed")
final_response = response_text or "Task completed successfully." # browser-use expects string
final_summary = summary_text or "No page summary available." # browser-use expects string
v["data"] = {
"title": title_text,
"response": final_response,
"page_summary": final_summary,
}
v.setdefault("success", True)
normalized_actions.append({action_key: v})
else:
normalized_actions.append(item)
obj["action"] = normalized_actions
return dirty_json.stringify(obj)
# ------------------------------------------------------------------------------
# Monkey-patch for browser-use Gemini schema issue
# ------------------------------------------------------------------------------
# The original _fix_gemini_schema in browser_use.llm.google.chat.ChatGoogle
# removes the 'title' property but fails to remove it from the 'required' list,
# causing a validation error with the Gemini API. This patch corrects that behavior.
def _patched_fix_gemini_schema(self, schema: dict[str, Any]) -> dict[str, Any]:
"""
Convert a Pydantic model to a Gemini-compatible schema.
This function removes unsupported properties like 'additionalProperties' and resolves
$ref references that Gemini doesn't support.
"""
# Handle $defs and $ref resolution
if '$defs' in schema:
defs = schema.pop('$defs')
def resolve_refs(obj: Any) -> Any:
if isinstance(obj, dict):
if '$ref' in obj:
ref = obj.pop('$ref')
ref_name = ref.split('/')[-1]
if ref_name in defs:
# Replace the reference with the actual definition
resolved = defs[ref_name].copy()
# Merge any additional properties from the reference
for key, value in obj.items():
if key != '$ref':
resolved[key] = value
return resolve_refs(resolved)
return obj
else:
# Recursively process all dictionary values
return {k: resolve_refs(v) for k, v in obj.items()}
elif isinstance(obj, list):
return [resolve_refs(item) for item in obj]
return obj
schema = resolve_refs(schema)
# Remove unsupported properties
def clean_schema(obj: Any) -> Any:
if isinstance(obj, dict):
# Remove unsupported properties
cleaned = {}
for key, value in obj.items():
if key not in ['additionalProperties', 'title', 'default']:
cleaned_value = clean_schema(value)
# Handle empty object properties - Gemini doesn't allow empty OBJECT types
if (
key == 'properties'
and isinstance(cleaned_value, dict)
and len(cleaned_value) == 0
and isinstance(obj.get('type', ''), str)
and obj.get('type', '').upper() == 'OBJECT'
):
# Convert empty object to have at least one property
cleaned['properties'] = {'_placeholder': {'type': 'string'}}
else:
cleaned[key] = cleaned_value
# If this is an object type with empty properties, add a placeholder
if (
isinstance(cleaned.get('type', ''), str)
and cleaned.get('type', '').upper() == 'OBJECT'
and 'properties' in cleaned
and isinstance(cleaned['properties'], dict)
and len(cleaned['properties']) == 0
):
cleaned['properties'] = {'_placeholder': {'type': 'string'}}
# PATCH: Also remove 'title' from the required list if it exists
if 'required' in cleaned and isinstance(cleaned.get('required'), list):
cleaned['required'] = [p for p in cleaned['required'] if p != 'title']
return cleaned
elif isinstance(obj, list):
return [clean_schema(item) for item in obj]
return obj
return clean_schema(schema)
def apply():
"""Applies the monkey-patch to ChatGoogle."""
ChatGoogle._fix_gemini_schema = _patched_fix_gemini_schema
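
The effect of the `required`-list patch can be shown in isolation. What follows is a minimal standalone sketch, not the removed module itself: `clean_schema_sketch`, `UNSUPPORTED_KEYS`, and the sample schema are hypothetical names chosen for illustration, and the `$ref` resolution and empty-object placeholder handling are left out for brevity.

```python
from typing import Any

# Keys assumed to be rejected by the Gemini schema validator (taken from the patch above).
UNSUPPORTED_KEYS = {"additionalProperties", "title", "default"}


def clean_schema_sketch(obj: Any) -> Any:
    """Drop unsupported keys and keep `required` consistent with what was dropped."""
    if isinstance(obj, dict):
        cleaned = {
            key: clean_schema_sketch(value)
            for key, value in obj.items()
            if key not in UNSUPPORTED_KEYS
        }
        # The patched step: a removed 'title' property must not stay listed as required.
        if isinstance(cleaned.get("required"), list):
            cleaned["required"] = [p for p in cleaned["required"] if p != "title"]
        return cleaned
    if isinstance(obj, list):
        return [clean_schema_sketch(item) for item in obj]
    return obj


# Hypothetical schema resembling a Pydantic model with a 'title' field.
sample_schema = {
    "type": "object",
    "title": "DoneResult",
    "properties": {
        "title": {"type": "string"},
        "response": {"type": "string", "default": ""},
    },
    "required": ["title", "response"],
    "additionalProperties": False,
}
cleaned_schema = clean_schema_sketch(sample_schema)
```

Without the filter on `required`, the result would still name `title` as required even though the property itself was removed, which is the mismatch the patch comment describes.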

@ -1,93 +0,0 @@
from __future__ import annotations
import copy
import json
from typing import Any
def is_openrouter_request(provider: str | None, model_name: str | None) -> bool:
provider_name = (provider or "").lower()
model = (model_name or "").lower()
return provider_name == "openrouter" or model.startswith("openrouter/")
def has_json_schema_response_format(kwargs: dict[str, Any]) -> bool:
response_format = kwargs.get("response_format")
return isinstance(response_format, dict) and (
response_format.get("type") == "json_schema" or "json_schema" in response_format
)
def should_use_openrouter_prompt_schema_fallback(
provider: str | None, model_name: str | None, kwargs: dict[str, Any]
) -> bool:
"""
OpenRouter sometimes routes browser-use structured output through providers
that reject large compiled grammars. Avoid the hard error entirely by
downgrading to `json_object` before the first request.
"""
return is_openrouter_request(provider, model_name) and has_json_schema_response_format(kwargs)
def relax_strict_tool_schemas(tools: Any) -> Any:
"""
Disable strict tool grammar on fallback while keeping tool definitions intact.
"""
if not isinstance(tools, list):
return tools
relaxed = copy.deepcopy(tools)
for tool in relaxed:
if not isinstance(tool, dict):
continue
function_spec = tool.get("function")
if isinstance(function_spec, dict) and function_spec.get("strict") is True:
function_spec["strict"] = False
return relaxed
def _schema_hint_text(response_format: dict[str, Any]) -> str | None:
schema_payload = response_format.get("json_schema")
if not isinstance(schema_payload, dict):
return None
compact_schema = json.dumps(
schema_payload,
ensure_ascii=False,
separators=(",", ":"),
)
return (
"Return only a single JSON object with no markdown fences, prose, or extra text. "
"Follow this schema exactly: "
f"{compact_schema}"
)
def prepend_schema_hint_to_messages(
messages: list[Any], response_format: dict[str, Any]
) -> list[Any]:
hint = _schema_hint_text(response_format)
if not hint:
return list(messages)
return [{"role": "system", "content": hint}, *list(messages)]
def build_json_object_fallback_request(
messages: list[Any],
kwargs: dict[str, Any],
) -> tuple[list[Any], dict[str, Any]] | None:
"""
Replace strict json_schema with json_object and move schema guidance into the prompt.
This keeps browser-use's local validation path while avoiding provider-side
grammar compilation limits on OpenRouter.
"""
response_format = kwargs.get("response_format")
if not isinstance(response_format, dict):
return None
updated_kwargs = copy.deepcopy(kwargs)
updated_kwargs["response_format"] = {"type": "json_object"}
if "tools" in updated_kwargs:
updated_kwargs["tools"] = relax_strict_tool_schemas(updated_kwargs["tools"])
updated_messages = prepend_schema_hint_to_messages(messages, response_format)
return updated_messages, updated_kwargs
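
The downgrade from `json_schema` to `json_object` can be tried without any provider call. The snippet below is a condensed sketch under stated assumptions: `downgrade_to_json_object` is a hypothetical stand-in that folds the helpers above into one function, the hint wording is shortened, tool relaxation is omitted, and the request payload is invented for illustration.

```python
import copy
import json
from typing import Any


def downgrade_to_json_object(
    messages: list[dict[str, Any]], kwargs: dict[str, Any]
) -> tuple[list[dict[str, Any]], dict[str, Any]]:
    """Swap a strict json_schema response format for json_object plus a prompt hint."""
    response_format = kwargs.get("response_format")
    if not isinstance(response_format, dict):
        return list(messages), kwargs

    # Deep-copy so the caller's kwargs stay untouched.
    updated = copy.deepcopy(kwargs)
    updated["response_format"] = {"type": "json_object"}

    schema = response_format.get("json_schema")
    if not isinstance(schema, dict):
        return list(messages), updated

    hint = (
        "Return only a single JSON object. Follow this schema exactly: "
        + json.dumps(schema, ensure_ascii=False, separators=(",", ":"))
    )
    return [{"role": "system", "content": hint}, *messages], updated


# Hypothetical request, shaped like an OpenAI-style structured-output call.
request_kwargs = {
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "Out", "schema": {"type": "object"}},
    }
}
new_messages, new_kwargs = downgrade_to_json_object(
    [{"role": "user", "content": "Open example.com"}], request_kwargs
)
```

The schema travels in the prompt instead of the provider-side grammar, so validation of the reply still has to happen locally after the response arrives.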

@ -1,79 +0,0 @@
"""
Utilities to normalize LLM replies before browser-use parses them into AgentOutput.
Some models (e.g. via OpenRouter) emit extra JSON keys such as "" : "", which
Pydantic rejects as extra_forbidden on strict action union members.
"""
from __future__ import annotations
from typing import Any
from helpers import dirty_json
def deep_strip_empty_string_keys(obj: Any) -> Any:
"""
Recursively remove dict entries whose key is the empty string.
Browser-use action objects must be discriminated unions with a single
action key; spurious "" keys break validation for every union variant.
"""
if isinstance(obj, dict):
return {
k: deep_strip_empty_string_keys(v)
for k, v in obj.items()
if k != ""
}
if isinstance(obj, list):
return [deep_strip_empty_string_keys(item) for item in obj]
return obj
def normalize_parsed_browser_use_output(obj: dict) -> dict:
"""Apply all normalizations safe for a parsed AgentOutput-shaped dict."""
out = deep_strip_empty_string_keys(obj)
if not isinstance(out, dict):
return obj
return out
def parse_and_sanitize_llm_json(text: str) -> str | None:
"""
Parse message content and return JSON text safe for AgentOutput parsing.
Returns None if the string is not a JSON object.
"""
try:
obj = dirty_json.parse(text)
except Exception:
return None
if not isinstance(obj, dict):
return None
return dirty_json.stringify(normalize_parsed_browser_use_output(obj))
def sanitize_llm_message_content_for_browser_use(content: str | None) -> str | None:
"""
Best-effort sanitization of assistant message content for browser-use; returns the sanitized string.
- If content parses as a dict: strip bad keys and re-serialize.
- If content is non-JSON or trailing garbage: try dirty_json parse; if dict, sanitize.
- Otherwise return the original string.
"""
if content is None:
return None
stripped = content.strip()
if not stripped:
return content
sanitized = parse_and_sanitize_llm_json(stripped)
if sanitized is not None:
return sanitized
if not stripped.startswith("{"):
try:
obj = dirty_json.parse(stripped)
except Exception:
return content
if isinstance(obj, dict):
return dirty_json.stringify(normalize_parsed_browser_use_output(obj))
return content
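
A short, self-contained illustration of the empty-key stripping follows. It is a sketch only: `strip_empty_keys` mirrors `deep_strip_empty_string_keys`, the standard `json` module stands in for the project's `dirty_json` helper (an assumption made so the snippet runs on its own), and the payload is invented.

```python
import json
from typing import Any


def strip_empty_keys(obj: Any) -> Any:
    """Recursively drop dict entries whose key is the empty string."""
    if isinstance(obj, dict):
        return {k: strip_empty_keys(v) for k, v in obj.items() if k != ""}
    if isinstance(obj, list):
        return [strip_empty_keys(item) for item in obj]
    return obj


# Hypothetical model reply with spurious "" keys at several nesting levels.
raw_reply = (
    '{"action": [{"go_to_url": {"url": "https://example.com", "": ""}, "": ""}], "": ""}'
)
cleaned_reply = strip_empty_keys(json.loads(raw_reply))
```

After stripping, each action dict is back to a single action key, which is the shape the discriminated union validation expects.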

@ -1,122 +0,0 @@
from __future__ import annotations
from typing import Any
from helpers import plugins as plugin_helpers
from plugins._model_config.helpers import model_config
MODEL_PRESET_KEY = "model_preset"
def get_browser_model_preset_name(agent=None) -> str:
config = plugin_helpers.get_plugin_config("_browser_agent", agent=agent) or {}
return str(config.get(MODEL_PRESET_KEY, "") or "").strip()
def get_browser_model_preset_options(agent=None) -> list[dict[str, Any]]:
selected_name = get_browser_model_preset_name(agent)
options: list[dict[str, Any]] = []
found_selected = False
for preset in model_config.get_presets():
name = str(preset.get("name", "") or "").strip()
if not name:
continue
if name == selected_name:
found_selected = True
chat_cfg = preset.get("chat", {}) if isinstance(preset, dict) else {}
if not isinstance(chat_cfg, dict):
chat_cfg = {}
provider = str(chat_cfg.get("provider", "") or "").strip()
model_name = str(chat_cfg.get("name", "") or "").strip()
summary = " / ".join(part for part in (provider, model_name) if part)
options.append(
{
"name": name,
"label": name,
"missing": False,
"summary": summary,
}
)
if selected_name and not found_selected:
options.append(
{
"name": selected_name,
"label": f"{selected_name} (missing)",
"missing": True,
"summary": "",
}
)
return options
def resolve_browser_model_selection(agent=None) -> dict[str, Any]:
preset_name = get_browser_model_preset_name(agent)
if preset_name:
preset = model_config.get_preset_by_name(preset_name)
if isinstance(preset, dict):
chat_cfg = preset.get("chat", {})
if isinstance(chat_cfg, dict) and (
str(chat_cfg.get("provider", "") or "").strip()
or str(chat_cfg.get("name", "") or "").strip()
):
return {
"config": chat_cfg,
"source_kind": "preset",
"source_label": f"Preset '{preset_name}' via _model_config",
"selected_preset_name": preset_name,
"preset_status": "active",
"warning": "",
}
return {
"config": model_config.get_chat_model_config(agent),
"source_kind": "main",
"source_label": "Main Model via _model_config",
"selected_preset_name": preset_name,
"preset_status": "invalid",
"warning": (
f"Configured browser preset '{preset_name}' does not define a chat model. "
"Falling back to the Main Model."
),
}
return {
"config": model_config.get_chat_model_config(agent),
"source_kind": "main",
"source_label": "Main Model via _model_config",
"selected_preset_name": preset_name,
"preset_status": "missing",
"warning": (
f"Configured browser preset '{preset_name}' was not found. "
"Falling back to the Main Model."
),
}
return {
"config": model_config.get_chat_model_config(agent),
"source_kind": "main",
"source_label": "Main Model via _model_config",
"selected_preset_name": "",
"preset_status": "none",
"warning": "",
}
def save_browser_model_preset_name(preset_name: str) -> None:
normalized = str(preset_name or "").strip()
config = plugin_helpers.get_plugin_config("_browser_agent") or {}
if normalized:
config[MODEL_PRESET_KEY] = normalized
else:
config.pop(MODEL_PRESET_KEY, None)
plugin_helpers.save_plugin_config(
"_browser_agent",
project_name="",
agent_profile="",
settings=config,
)
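
The four outcomes of the preset resolution (`none`, `active`, `invalid`, `missing`) can be exercised with plain dictionaries. This is a simplified sketch, assuming presets are available as a name-to-dict mapping; `resolve_selection_sketch` and the sample provider and model names are hypothetical, and the labels and warning texts of the original are omitted.

```python
from __future__ import annotations

from typing import Any


def resolve_selection_sketch(
    preset_name: str,
    presets: dict[str, dict[str, Any]],
    main_cfg: dict[str, Any],
) -> dict[str, Any]:
    """Pick the preset's chat config if usable, otherwise fall back to the main config."""
    if not preset_name:
        return {"config": main_cfg, "source_kind": "main", "preset_status": "none"}

    preset = presets.get(preset_name)
    if not isinstance(preset, dict):
        return {"config": main_cfg, "source_kind": "main", "preset_status": "missing"}

    chat_cfg = preset.get("chat", {})
    if isinstance(chat_cfg, dict) and (chat_cfg.get("provider") or chat_cfg.get("name")):
        return {"config": chat_cfg, "source_kind": "preset", "preset_status": "active"}

    return {"config": main_cfg, "source_kind": "main", "preset_status": "invalid"}


# Hypothetical configuration values for illustration only.
main_model = {"provider": "openai", "name": "main-model"}
sample_presets = {
    "fast": {"chat": {"provider": "openrouter", "name": "small-model"}},
    "empty": {"chat": {}},
}
statuses = {
    name: resolve_selection_sketch(name, sample_presets, main_model)["preset_status"]
    for name in ("", "fast", "empty", "ghost")
}
```

Every non-`active` branch returns the main configuration, so a stale or incomplete preset degrades to the Main Model rather than failing the browser task.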

@ -1,38 +0,0 @@
import os
import sys
from pathlib import Path
import subprocess
from helpers import files
# This helper ensures that the Playwright headless shell is installed in tmp/playwright.
# It should work for both Docker and local installations.
def get_playwright_binary():
pw_cache = Path(get_playwright_cache_dir())
for pattern in (
"chromium_headless_shell-*/chrome-*/headless_shell",
"chromium_headless_shell-*/chrome-*/headless_shell.exe",
):
binary = next(pw_cache.glob(pattern), None)
if binary:
return binary
return None
def get_playwright_cache_dir():
return files.get_abs_path("tmp/playwright")
def ensure_playwright_binary():
bin = get_playwright_binary()
if not bin:
cache = get_playwright_cache_dir()
env = os.environ.copy()
env["PLAYWRIGHT_BROWSERS_PATH"] = cache
subprocess.check_call(
["playwright", "install", "chromium", "--only-shell"],
env=env
)
bin = get_playwright_binary()
if not bin:
raise Exception("Playwright binary not found after installation")
return bin
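
The glob-based lookup can be checked against a throwaway directory without installing anything. The sketch below assumes only the two patterns shown above; `find_headless_shell`, the revision number `1234`, and the `chrome-linux` folder name are hypothetical and chosen to match the patterns, not taken from a real Playwright install.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

PATTERNS = (
    "chromium_headless_shell-*/chrome-*/headless_shell",
    "chromium_headless_shell-*/chrome-*/headless_shell.exe",
)


def find_headless_shell(cache_dir: Path) -> Path | None:
    """Return the first headless shell binary found under cache_dir, if any."""
    for pattern in PATTERNS:
        binary = next(cache_dir.glob(pattern), None)
        if binary:
            return binary
    return None


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    # Nothing installed yet: the lookup reports a miss.
    result_before = find_headless_shell(cache)

    # Simulate an installed shell with an empty placeholder file.
    shell = cache / "chromium_headless_shell-1234" / "chrome-linux" / "headless_shell"
    shell.parent.mkdir(parents=True)
    shell.touch()

    found = find_headless_shell(cache)
    found_name = found.name if found else None
```

A miss on the first lookup is what triggers the install step in `ensure_playwright_binary()`; the second lookup after installation is the guard that raises if the binary still cannot be found.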

@ -1,8 +0,0 @@
name: _browser_agent
title: Browser Agent
description: Built-in browser-use automation tool.
version: 1.0.0
always_enabled: false
settings_sections: []
per_project_config: false
per_agent_config: false

@ -1,7 +0,0 @@
### browser_agent
subordinate browser worker for web tasks
args: `message`, `reset`
- give clear task-oriented instructions, credentials, and a stop condition
- `reset=true` starts a new browser session; `false` continues the current one
- when continuing, refer to open pages instead of restarting
downloads go to `/a0/tmp/downloads`

@ -1,22 +0,0 @@
# Operation instruction
Keep your task solution as simple and straightforward as possible
Follow instructions as closely as possible
When told to go to a website, open the website. If there are no other instructions, stop there
Do not interact with the website unless told to
Always accept all cookies if prompted on the website, NEVER go to browser cookie settings
If asked specific questions about a website, be as precise and close to the actual page content as possible
If you are waiting for instructions, end the task and mark it as done
## Task Completion
When you have completed the assigned task OR are waiting for further instructions:
1. Use the "Complete task" action to mark the task as complete
2. Provide the required parameters: title, response, and page_summary
3. Do NOT continue taking actions after calling "Complete task"
## Important Notes
- Always call "Complete task" when your objective is achieved
- In page_summary respond with one paragraph of main content plus an overview of page elements
- Use the response field to answer the user's task or to ask additional questions
- If you navigate to a website and no further actions are requested, call "Complete task" immediately
- If you complete any requested interaction (clicking, typing, etc.), call "Complete task"
- Never leave a task running indefinitely - always conclude with "Complete task"

@ -1,440 +0,0 @@
import asyncio
import time
from typing import Optional, cast
from agent import Agent, InterventionException
from pathlib import Path
from helpers.tool import Tool, Response
from helpers import files, defer, persist_chat, strings
from plugins._browser_agent.helpers.browser_use import browser_use # type: ignore[attr-defined]
from helpers.print_style import PrintStyle
from plugins._browser_agent.helpers.playwright import ensure_playwright_binary
from helpers.secrets import get_secrets_manager
from extensions.python.message_loop_start._10_iteration_no import get_iter_no
from pydantic import BaseModel
import uuid
from helpers.dirty_json import DirtyJson
PLUGIN_DIR = Path(__file__).resolve().parents[1]
class State:
@staticmethod
async def create(agent: Agent):
state = State(agent)
return state
def __init__(self, agent: Agent):
self.agent = agent
self.browser_session: Optional[browser_use.BrowserSession] = None
self.task: Optional[defer.DeferredTask] = None
self.use_agent: Optional[browser_use.Agent] = None
self.secrets_dict: Optional[dict[str, str]] = None
self.iter_no = 0
def __del__(self):
self.kill_task()
files.delete_dir(self.get_user_data_dir()) # cleanup user data dir
def get_user_data_dir(self):
return str(
Path.home()
/ ".config"
/ "browseruse"
/ "profiles"
/ f"agent_{self.agent.context.id}"
)
def _get_browser_http_headers(self):
# ignored for now
return {}
def _get_browser_vision(self):
from plugins._model_config.helpers.model_config import get_chat_model_config
cfg = get_chat_model_config(self.agent)
return cfg.get("vision", False)
async def _initialize(self):
if self.browser_session:
return
# for some reason we need to provide exact path to headless shell, otherwise it looks for headed browser
pw_binary = ensure_playwright_binary()
self.browser_session = browser_use.BrowserSession(
browser_profile=browser_use.BrowserProfile(
headless=True,
disable_security=True,
chromium_sandbox=False,
accept_downloads=True,
downloads_path=files.get_abs_path("usr/downloads"),
allowed_domains=["*", "http://*", "https://*"],
executable_path=pw_binary,
keep_alive=True,
minimum_wait_page_load_time=1.0,
wait_for_network_idle_page_load_time=2.0,
maximum_wait_page_load_time=10.0,
window_size={"width": 1024, "height": 2048},
screen={"width": 1024, "height": 2048},
viewport={"width": 1024, "height": 2048},
no_viewport=False,
args=["--headless=new", "--no-sandbox"],
# Use a unique user data directory to avoid conflicts
user_data_dir=self.get_user_data_dir(),
extra_http_headers=self._get_browser_http_headers(),
)
)
await self.browser_session.start() if self.browser_session else None
# self.override_hooks()
# --------------------------------------------------------------------------
# Patch to enforce vertical viewport size
# --------------------------------------------------------------------------
# Browser-use auto-configuration overrides viewport settings, causing wrong
# aspect ratio. We fix this by directly setting viewport size after startup.
# --------------------------------------------------------------------------
if self.browser_session:
try:
page = await self.browser_session.get_current_page()
if page:
await page.set_viewport_size({"width": 1024, "height": 2048})
except Exception as e:
PrintStyle().warning(f"Could not force set viewport size: {e}")
# --------------------------------------------------------------------------
# Add init script to the browser session
if self.browser_session and self.browser_session.browser_context:
js_override = str(PLUGIN_DIR / "assets" / "init_override.js")
await self.browser_session.browser_context.add_init_script(path=js_override) if self.browser_session else None
def start_task(self, task: str):
if self.task and self.task.is_alive():
self.kill_task()
self.task = defer.DeferredTask(
thread_name="BrowserAgent" + self.agent.context.id
)
if self.agent.context.task:
self.agent.context.task.add_child_task(self.task, terminate_thread=True)
self.task.start_task(self._run_task, task) if self.task else None
return self.task
def kill_task(self):
if self.task:
self.task.kill(terminate_thread=True)
self.task = None
if self.browser_session:
try:
import asyncio
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(self.browser_session.close()) if self.browser_session else None
loop.close()
except Exception as e:
PrintStyle().error(f"Error closing browser session: {e}")
finally:
self.browser_session = None
self.use_agent = None
self.iter_no = 0
async def _run_task(self, task: str):
await self._initialize()
class DoneResult(BaseModel):
title: str
response: str
page_summary: str
# Initialize controller
controller = browser_use.Controller(output_model=DoneResult)
# Register custom completion action with proper ActionResult fields
@controller.registry.action("Complete task", param_model=DoneResult)
async def complete_task(params: DoneResult):
result = browser_use.ActionResult(
is_done=True, success=True, extracted_content=params.model_dump_json()
)
return result
model = self.agent.get_browser_model()
try:
secrets_manager = get_secrets_manager(self.agent.context)
secrets_dict = secrets_manager.load_secrets()
self.use_agent = browser_use.Agent(
task=task,
browser_session=self.browser_session,
llm=model,
use_vision=self._get_browser_vision(),
extend_system_message=self.agent.read_prompt(
"prompts/browser_agent.system.md"
),
controller=controller,
enable_memory=False, # Disable memory to avoid state conflicts
llm_timeout=3000, # TODO rem
sensitive_data=cast(dict[str, str | dict[str, str]] | None, secrets_dict or {}), # Pass secrets
)
except Exception as e:
raise Exception(
f"Browser agent initialization failed. This might be due to model compatibility issues. Error: {e}"
) from e
self.iter_no = get_iter_no(self.agent)
async def hook(agent: browser_use.Agent):
await self.agent.wait_if_paused()
if self.iter_no != get_iter_no(self.agent):
raise InterventionException("Task cancelled")
# try:
result = None
if self.use_agent:
result = await self.use_agent.run(
max_steps=50, on_step_start=hook, on_step_end=hook
)
return result
async def get_page(self):
if self.use_agent and self.browser_session:
try:
return await self.use_agent.browser_session.get_current_page() if self.use_agent.browser_session else None
except Exception:
# Browser session might be closed or invalid
return None
return None
async def get_selector_map(self):
"""Get the selector map for the current page state."""
if self.use_agent and self.use_agent.browser_session:
# Refresh the cached page state before reading the selector map
await self.use_agent.browser_session.get_state_summary(
cache_clickable_elements_hashes=True
)
return await self.use_agent.browser_session.get_selector_map()
return {}
class BrowserAgent(Tool):
async def execute(self, message="", reset="", **kwargs):
self.guid = self.agent.context.generate_id() # short random id
reset = str(reset).lower().strip() == "true"
await self.prepare_state(reset=reset)
# Mask any secrets passed from A0 using the browser-use placeholder format
message = get_secrets_manager(self.agent.context).mask_values(message, placeholder="<secret>{key}</secret>")
task = self.state.start_task(message) if self.state else None
# wait for browser agent to finish and update progress with timeout
timeout_seconds = 300 # 5 minute timeout
start_time = time.time()
fail_counter = 0
while not task.is_ready() if task else False:
# Check for timeout to prevent infinite waiting
if time.time() - start_time > timeout_seconds:
PrintStyle().warning(
self._mask(f"Browser agent task timeout after {timeout_seconds} seconds, forcing completion")
)
break
await self.agent.handle_intervention()
await asyncio.sleep(1)
try:
if task and task.is_ready(): # otherwise get_update hangs
break
try:
update = await asyncio.wait_for(self.get_update(), timeout=10)
fail_counter = 0 # reset on success
except asyncio.TimeoutError:
fail_counter += 1
PrintStyle().warning(
self._mask(f"browser_agent.get_update timed out ({fail_counter}/3)")
)
if fail_counter >= 3:
PrintStyle().warning(
self._mask("3 consecutive browser_agent.get_update timeouts, breaking loop")
)
break
continue
update_log = update.get("log", get_use_agent_log(None))
self.update_progress("\n".join(update_log))
screenshot = update.get("screenshot", None)
if screenshot:
self.log.update(screenshot=screenshot)
except Exception as e:
PrintStyle().error(self._mask(f"Error getting update: {str(e)}"))
if task and not task.is_ready():
PrintStyle().warning(self._mask("browser_agent.get_update timed out, killing the task"))
self.state.kill_task() if self.state else None
return Response(
message=self._mask("Browser agent task timed out, no output provided."),
break_loop=False,
)
# final progress update
if self.state and self.state.use_agent:
log_final = get_use_agent_log(self.state.use_agent)
self.update_progress("\n".join(log_final))
# collect result with error handling
try:
result = await task.result() if task else None
except Exception as e:
PrintStyle().error(self._mask(f"Error getting browser agent task result: {str(e)}"))
# Return a timeout response if task.result() fails
answer_text = self._mask(f"Browser agent task failed to return result: {str(e)}")
self.log.update(answer=answer_text)
return Response(message=answer_text, break_loop=False)
# finally:
# # Stop any further browser access after task completion
# # self.state.kill_task()
# pass
# Check if task completed successfully
if result and result.is_done():
answer = result.final_result()
try:
if answer and isinstance(answer, str) and answer.strip():
answer_data = DirtyJson.parse_string(answer)
answer_text = strings.dict_to_text(answer_data) # type: ignore
else:
answer_text = (
str(answer) if answer else "Task completed successfully"
)
except Exception as e:
answer_text = (
str(answer)
if answer
else f"Task completed with parse error: {str(e)}"
)
else:
# Task hit max_steps without calling done()
urls = result.urls() if result else []
current_url = urls[-1] if urls else "unknown"
answer_text = (
f"Task reached step limit without completion. Last page: {current_url}. "
f"The browser agent may need clearer instructions on when to finish."
)
# Mask answer for logs and response
answer_text = self._mask(answer_text)
# update the log (without screenshot path here, user can click)
self.log.update(answer=answer_text)
# add screenshot to the answer if we have it
if (
self.log.kvps
and "screenshot" in self.log.kvps
and self.log.kvps["screenshot"]
):
path = self.log.kvps["screenshot"].split("//", 1)[-1].split("&", 1)[0]
answer_text += f"\n\nScreenshot: {path}"
# respond (with screenshot path)
return Response(message=answer_text, break_loop=False)
def get_log_object(self):
return self.agent.context.log.log(
type="browser",
heading=f"icon://captive_portal {self.agent.agent_name}: Calling Browser Agent",
content="",
kvps=self.args,
)
async def get_update(self):
await self.prepare_state()
result = {}
agent = self.agent
ua = self.state.use_agent if self.state else None
page = await self.state.get_page() if self.state else None
if ua and page:
try:
async def _get_update():
# await agent.wait_if_paused() # no need here
# Build short activity log
result["log"] = get_use_agent_log(ua)
path = files.get_abs_path(
persist_chat.get_chat_folder_path(agent.context.id),
"browser",
"screenshots",
f"{self.guid}.png",
)
files.make_dirs(path)
await page.screenshot(path=path, full_page=False, timeout=3000)
result["screenshot"] = f"img://{path}&t={str(time.time())}"
if self.state and self.state.task and not self.state.task.is_ready():
await self.state.task.execute_inside(_get_update)
except Exception:
pass
return result
async def prepare_state(self, reset=False):
self.state = self.agent.get_data("_browser_agent_state")
if reset and self.state:
self.state.kill_task()
if not self.state or reset:
self.state = await State.create(self.agent)
self.agent.set_data("_browser_agent_state", self.state)
def update_progress(self, text):
text = self._mask(text)
short = text.split("\n")[-1]
if len(short) > 50:
short = short[:50] + "..."
progress = f"Browser: {short}"
self.log.update(progress=text)
self.agent.context.log.set_progress(progress)
def _mask(self, text: str) -> str:
try:
return get_secrets_manager(self.agent.context).mask_values(text or "")
except Exception:
return text or ""
# def __del__(self):
# if self.state:
# self.state.kill_task()
def get_use_agent_log(use_agent: browser_use.Agent | None):
result = ["🚦 Starting task"]
if use_agent:
action_results = use_agent.history.action_results() or []
short_log = []
for item in action_results:
# final results
if item.is_done:
if item.success:
short_log.append("✅ Done")
else:
short_log.append(
f"❌ Error: {item.error or item.extracted_content or 'Unknown error'}"
)
# progress messages
else:
text = item.extracted_content
if text:
first_line = text.split("\n", 1)[0][:200]
short_log.append(first_line)
result.extend(short_log)
return result

@ -1,51 +0,0 @@
import { createStore } from "/js/AlpineStore.js";
import { callJsonApi } from "/js/api.js";
const STATUS_API = "/plugins/_browser_agent/status";
const MODEL_PRESET_API = "/plugins/_browser_agent/model_preset";
const model = {
loading: true,
savingPreset: false,
error: "",
status: null,
async refreshStatus() {
this.status = await callJsonApi(STATUS_API, {});
},
async savePreset(presetName) {
this.savingPreset = true;
try {
await callJsonApi(MODEL_PRESET_API, {
action: presetName ? "set" : "clear",
preset_name: presetName || "",
});
this.error = "";
await this.refreshStatus();
} catch (error) {
this.error = error instanceof Error ? error.message : String(error);
await this.refreshStatus();
} finally {
this.savingPreset = false;
}
},
async onOpen() {
this.loading = true;
this.error = "";
try {
await this.refreshStatus();
} catch (error) {
this.status = null;
this.error = error instanceof Error ? error.message : String(error);
} finally {
this.loading = false;
}
},
cleanup() {},
};
export const store = createStore("browserAgentPage", model);

@ -1,232 +0,0 @@
<html>
<head>
<title>Browser Agent</title>
<script type="module">
import { store } from "/plugins/_browser_agent/webui/browser-agent-store.js";
</script>
</head>
<body>
<div x-data>
<template x-if="$store.browserAgentPage">
<div
x-create="$store.browserAgentPage.onOpen()"
x-destroy="$store.browserAgentPage.cleanup()"
class="browser-agent-page"
>
<div class="section-description">
Built-in browser automation plugin backed by `browser-use` and Playwright.
Model selection stays in `_model_config`; the browser agent can follow the effective Main Model or use one saved preset just for browser tasks.
</div>
<div class="browser-agent-card" x-show="$store.browserAgentPage.loading">
<div class="status-row">
<span class="material-symbols-outlined spinning">progress_activity</span>
<span>Loading browser status...</span>
</div>
</div>
<div class="browser-agent-card error" x-show="!$store.browserAgentPage.loading && $store.browserAgentPage.error">
<div class="field-title">Status check failed</div>
<div class="field-description" x-text="$store.browserAgentPage.error"></div>
</div>
<template x-if="!$store.browserAgentPage.loading && $store.browserAgentPage.status">
<div class="browser-agent-grid">
<div class="browser-agent-card">
<div class="field-title">Model Source</div>
<div class="field-description" x-text="$store.browserAgentPage.status.model_source"></div>
<div class="field-description" x-show="$store.browserAgentPage.status.preset_warning" x-text="$store.browserAgentPage.status.preset_warning"></div>
</div>
<div class="browser-agent-card">
<div class="field-title">Resolved Browser Model</div>
<div class="status-row">
<span class="status-key">Provider</span>
<span class="status-value" x-text="$store.browserAgentPage.status.model.provider || 'Not configured'"></span>
</div>
<div class="status-row">
<span class="status-key">Model</span>
<span class="status-value" x-text="$store.browserAgentPage.status.model.name || 'Not configured'"></span>
</div>
<div class="status-row">
<span class="status-key">Vision</span>
<span class="status-badge" :class="$store.browserAgentPage.status.model.vision ? 'ok' : 'warn'" x-text="$store.browserAgentPage.status.model.vision ? 'Enabled' : 'Disabled'"></span>
</div>
</div>
<div class="browser-agent-card">
<div class="field-title">Browser Model Preset</div>
<div class="field-description">
Pick an optional `_model_config` preset for browser-only runs. Leave it empty to keep using the effective Main Model.
</div>
<label class="browser-agent-select-label" for="browser-agent-preset-select">Preset</label>
<select
id="browser-agent-preset-select"
class="browser-agent-select"
:disabled="$store.browserAgentPage.savingPreset"
x-model="$store.browserAgentPage.status.selected_preset_name"
@change="$store.browserAgentPage.savePreset($store.browserAgentPage.status.selected_preset_name)"
>
<option value="">Use Main Model</option>
<template x-for="preset in $store.browserAgentPage.status.available_presets" :key="preset.name">
<option :value="preset.name" x-text="preset.label"></option>
</template>
</select>
<div class="field-description" x-show="$store.browserAgentPage.savingPreset">Saving browser preset...</div>
</div>
<div class="browser-agent-card">
<div class="field-title">Playwright Runtime</div>
<div class="status-row">
<span class="status-key">Binary</span>
<span class="status-badge" :class="$store.browserAgentPage.status.playwright.binary_found ? 'ok' : 'fail'" x-text="$store.browserAgentPage.status.playwright.binary_found ? 'Found' : 'Missing'"></span>
</div>
<div class="status-row">
<span class="status-key">Cache</span>
<span class="status-value mono" x-text="$store.browserAgentPage.status.playwright.cache_dir"></span>
</div>
<div class="status-row" x-show="$store.browserAgentPage.status.playwright.binary_path">
<span class="status-key">Path</span>
<span class="status-value mono" x-text="$store.browserAgentPage.status.playwright.binary_path"></span>
</div>
<div class="field-description" x-show="!$store.browserAgentPage.status.playwright.binary_found">
Docker images ship the Playwright Chromium shell preinstalled. In local development, the first run installs it on demand via <span class="mono">ensure_playwright_binary()</span> if missing.
</div>
</div>
<div class="browser-agent-card">
<div class="field-title">browser-use</div>
<div class="status-row">
<span class="status-key">Import</span>
<span class="status-badge" :class="$store.browserAgentPage.status.browser_use.import_ok ? 'ok' : 'fail'" x-text="$store.browserAgentPage.status.browser_use.import_ok ? 'Ready' : 'Error'"></span>
</div>
<div class="status-row" x-show="$store.browserAgentPage.status.browser_use.version">
<span class="status-key">Version</span>
<span class="status-value" x-text="$store.browserAgentPage.status.browser_use.version"></span>
</div>
<div class="field-description mono" x-show="$store.browserAgentPage.status.browser_use.error" x-text="$store.browserAgentPage.status.browser_use.error"></div>
</div>
</div>
</template>
<div class="browser-agent-actions">
<button class="btn btn-field" @click="openModal('/plugins/_model_config/webui/main.html')">
Open Presets
</button>
<button class="btn btn-field" @click="openModal('/plugins/_model_config/webui/api-keys.html')">
Open API Keys
</button>
</div>
</div>
</template>
</div>
<style>
.browser-agent-page {
display: flex;
flex-direction: column;
gap: 14px;
}
.browser-agent-grid {
display: grid;
gap: 12px;
grid-template-columns: repeat(auto-fit, minmax(260px, 1fr));
}
.browser-agent-card {
display: flex;
flex-direction: column;
gap: 10px;
padding: 14px;
background: var(--color-input);
border: 1px solid var(--color-border);
border-radius: 10px;
}
.browser-agent-card.error {
border-color: rgba(214, 40, 40, 0.35);
}
.browser-agent-actions {
display: flex;
gap: 8px;
flex-wrap: wrap;
}
.browser-agent-select-label {
font-size: 0.78rem;
opacity: 0.75;
}
.browser-agent-select {
width: 100%;
min-height: 36px;
padding: 8px 10px;
border-radius: 8px;
border: 1px solid var(--color-border);
background: var(--color-bg);
color: var(--color-text);
}
.browser-agent-select:disabled {
opacity: 0.7;
cursor: wait;
}
.status-row {
display: flex;
align-items: flex-start;
justify-content: space-between;
gap: 12px;
font-size: 0.84rem;
}
.status-key {
opacity: 0.7;
min-width: 64px;
}
.status-value {
text-align: right;
word-break: break-word;
}
.status-badge {
padding: 2px 8px;
border-radius: 999px;
font-size: 0.76rem;
font-weight: 600;
border: 1px solid transparent;
}
.status-badge.ok {
color: #1b5e20;
background: rgba(46, 125, 50, 0.14);
border-color: rgba(46, 125, 50, 0.24);
}
.status-badge.warn {
color: #8a6100;
background: rgba(191, 144, 0, 0.14);
border-color: rgba(191, 144, 0, 0.24);
}
.status-badge.fail {
color: #9f1239;
background: rgba(190, 24, 93, 0.12);
border-color: rgba(190, 24, 93, 0.24);
}
.mono {
font-family: var(--font-mono);
font-size: 0.78rem;
}
option {
background: var(--color-input);
color: var(--color-text);
}
</style>
</body>
</html>

Binary file not shown.


View file

@ -23,7 +23,6 @@ This plugin centralizes model selection and model-related settings for the appli
- Allows a chat context to store a temporary override or preset reference in context data.
- **Model object construction**
- Builds `ModelConfig` objects and the runtime chat, utility, and embedding wrappers used elsewhere in the app.
- Note: Browser model wiring now lives in the `_browser_agent` plugin.
- **API key validation**
- Reports configured providers that still require API keys.

View file

@ -1,6 +1,5 @@
a2wsgi==1.10.8
ansio==0.0.1
browser-use==0.5.11
docker==7.1.0
duckduckgo-search==6.1.12
faiss-cpu==1.11.0

View file

@ -0,0 +1,106 @@
---
name: a0-browser-ext
description: Create, inspect, install, and safely maintain Chrome extensions for Agent Zero's built-in Browser plugin.
tags: ["agent-zero", "browser", "chrome-extension", "playwright", "manifest-v3"]
---
# Agent Zero Browser Extensions
Use this skill when the user wants to create a new Browser extension, modify an existing extension, or install a Chrome Web Store extension for Agent Zero's direct `_browser` plugin.
## Operating Model
- Agent Zero loads Browser extensions from unpacked directories.
- Create user-owned extensions under `/a0/usr/browser-extensions/<extension-slug>/`.
- Browser extension paths must be visible inside the Docker runtime. Prefer `/a0/usr/browser-extensions/...` paths over host-only paths.
- The Browser puzzle menu can open "My Browser Extensions", seed a "+ Create New with A0" request, and install Chrome Web Store URLs.
- Chrome Web Store installs are converted into unpacked extension folders before Browser can load them.
- Extension setting changes restart active Browser runtimes so Playwright can relaunch Chromium with the extension arguments.
## Safety First
Browser extensions run inside the Docker browser sandbox, but malicious or buggy extensions can still damage that sandboxed environment, corrupt browser profiles, exfiltrate page data visible to the Browser, or make browsing unreliable.
Before creating or installing an extension:
- State the requested behavior in one sentence.
- List the minimum permissions and host permissions needed.
- Avoid `<all_urls>` unless the user explicitly needs broad page access.
- Avoid remote code, eval-style execution, hidden credential collection, and broad network access.
- Do not store secrets in extension files.
- Prefer content scripts for page-local behavior and service workers for coordination.
- Tell the user when an extension can read or modify page content.
## Create New Extension
1. Ask for the extension name, user-visible purpose, target websites, and whether it needs a popup, content script, background service worker, options page, or side panel.
2. Choose a lowercase slug such as `reader-highlighter`.
3. Create `/a0/usr/browser-extensions/<slug>/manifest.json`.
4. Add only the files the extension actually needs.
5. Validate JSON syntax and confirm `manifest_version` is `3`.
6. Keep generated code small, readable, and easy for the user to audit.
7. After creating the folder, tell the user to open Browser's puzzle menu, use "Browser Extension Settings", enable extensions, and include the new folder path if it is not already enabled.
Minimal Manifest V3 starter:
```json
{
"manifest_version": 3,
"name": "Agent Zero Example Extension",
"version": "0.1.0",
"description": "Small, auditable Browser extension created with Agent Zero.",
"permissions": [],
"host_permissions": [],
"action": {
"default_title": "A0 Extension"
}
}
```
Content script starter:
```json
{
"manifest_version": 3,
"name": "Agent Zero Page Helper",
"version": "0.1.0",
"description": "Adds a small page helper for specific sites.",
"permissions": [],
"host_permissions": ["https://example.com/*"],
"content_scripts": [
{
"matches": ["https://example.com/*"],
"js": ["content.js"],
"run_at": "document_idle"
}
]
}
```
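
The validation step in the creation checklist (JSON parses, `manifest_version` is `3`, no accidental `<all_urls>`) can be sketched as a small pre-flight check. This is only an illustration: `check_manifest` is a hypothetical helper name, not part of the Browser plugin, and the set of checks is an assumption based on the rules in this skill.

```python
import json


def check_manifest(text: str) -> list[str]:
    """Return a list of problems found in a manifest.json string.

    Hypothetical helper for illustration; an empty list means no problem
    was detected by these few checks.
    """
    try:
        manifest = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(manifest, dict):
        return ["manifest must be a JSON object"]

    problems = []
    if manifest.get("manifest_version") != 3:
        problems.append("manifest_version must be 3")
    for key in ("name", "version"):
        value = manifest.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty {key}")

    # Gather match patterns from host_permissions and content_scripts[].matches.
    patterns = list(manifest.get("host_permissions") or [])
    for script in manifest.get("content_scripts") or []:
        if isinstance(script, dict):
            patterns.extend(script.get("matches") or [])
    if "<all_urls>" in patterns:
        problems.append("uses <all_urls>; confirm broad access is needed")
    return problems
```

Running it against the file contents before enabling the folder gives the user a short, auditable list of issues rather than a silent failure at Browser launch.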
## Install From Chrome Web Store
If the user gives a Chrome Web Store URL or extension id:
1. Confirm they understand the sandbox warning.
2. Extract the 32-character extension id from the URL.
3. Prefer the Browser puzzle menu's URL installer for direct installs.
4. If installing manually, download the CRX from Chrome's update service, extract the ZIP payload safely, and place it under `/a0/usr/browser-extensions/chrome-web-store/<extension-id>/`.
5. Inspect `manifest.json` and summarize name, version, permissions, host permissions, and suspicious capabilities.
6. Enable only after the user accepts the risk.
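
For the manual path in step 4, the following is a rough sketch of pulling the ZIP payload out of a CRX3 file and unpacking it with a path-traversal check. It assumes the CRX3 layout of a `Cr24` magic, a little-endian version, a little-endian header length, the header, and then the ZIP data. The names `crx3_zip_payload` and `safe_extract` are hypothetical and are not the plugin's own helpers.

```python
import io
import struct
import zipfile
from pathlib import Path


def crx3_zip_payload(data: bytes) -> bytes:
    """Return the ZIP bytes embedded in a CRX3 container (illustrative sketch)."""
    if len(data) < 12 or data[:4] != b"Cr24":
        raise ValueError("not a CRX file")
    # Two little-endian uint32 values follow the magic: version, header length.
    version, header_len = struct.unpack("<II", data[4:12])
    if version != 3:
        raise ValueError(f"unsupported CRX version: {version}")
    payload = data[12 + header_len:]
    if not payload.startswith(b"PK"):
        raise ValueError("ZIP payload not found")
    return payload


def safe_extract(payload: bytes, target: Path) -> list[str]:
    """Extract the ZIP into target, rejecting entries that escape it."""
    target = target.resolve()
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        for name in archive.namelist():
            destination = (target / name).resolve()
            if destination != target and target not in destination.parents:
                raise ValueError(f"unsafe path in archive: {name}")
        archive.extractall(target)
        return archive.namelist()
```

After extraction, `manifest.json` in the target folder is the file to inspect in step 5.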
Common URL shapes:
```text
https://chromewebstore.google.com/detail/name/<extension-id>
https://chrome.google.com/webstore/detail/name/<extension-id>
<extension-id>
```
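
A minimal sketch of extracting the id from the shapes above. It assumes the id is the last path segment and consists of 32 characters in the range `a` to `p`. `extract_extension_id` is a hypothetical name used only for illustration; the Browser plugin ships its own parser, which may behave differently.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumption: extension ids are exactly 32 lowercase letters from a to p.
_EXTENSION_ID = re.compile(r"[a-p]{32}")
_STORE_HOSTS = {"chromewebstore.google.com", "chrome.google.com"}


def extract_extension_id(value: str) -> Optional[str]:
    """Return the extension id from a bare id or a Web Store URL, else None."""
    text = value.strip()
    if _EXTENSION_ID.fullmatch(text):
        return text
    parsed = urlparse(text)
    if parsed.hostname not in _STORE_HOSTS:
        return None
    # urlparse keeps the query string (e.g. ?hl=en) out of parsed.path.
    segments = [segment for segment in parsed.path.split("/") if segment]
    if segments and _EXTENSION_ID.fullmatch(segments[-1]):
        return segments[-1]
    return None
```

Returning `None` for anything unrecognized keeps the caller in control of the error message shown to the user.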
## Review Checklist
- `manifest.json` parses cleanly.
- Every permission has a reason.
- Host matches are specific.
- No credential scraping, hidden data upload, or remote executable code.
- UI text is concise and tells the truth.
- The extension can be removed by deleting its folder from `/a0/usr/browser-extensions/` and removing the path from Browser settings.

View file

@ -706,7 +706,7 @@ The framework ships with these core plugins in `/a0/plugins/`:
| `_memory` | Persistent vector memory system |
| `_text_editor` | File read/write/patch with line numbers |
| `_model_config` | LLM model selection and configuration |
| `_browser_agent` | Browser automation and web interaction |
| `_browser` | Direct browser automation and WebUI viewing |
| `_infection_check` | Prompt injection safety checks |
| `_error_retry` | Retry on critical exceptions |
| `_email_integration` | Email communication via IMAP/SMTP |

View file

@ -1,74 +1,404 @@
import asyncio
import importlib
import json
import sys
import threading
from pathlib import Path
from types import SimpleNamespace
import pytest
PROJECT_ROOT = Path(__file__).resolve().parents[1]
if str(PROJECT_ROOT) not in sys.path:
sys.path.insert(0, str(PROJECT_ROOT))
import plugins._browser_agent.helpers.browser_use_monkeypatch as browser_use_monkeypatch
import plugins._browser_agent.tools.browser_agent as browser_agent_module
from plugins._browser.helpers.config import (
build_browser_launch_config,
get_browser_model_preset_options,
normalize_browser_config,
resolve_browser_model_selection,
)
from plugins._browser.helpers.extension_manager import (
_crx_zip_payload,
parse_chrome_web_store_extension_id,
)
from plugins._browser.helpers.runtime import normalize_url
import plugins._browser.hooks as browser_hooks_module
import plugins._browser.tools.browser as browser_tool_module
import plugins._browser.api.ws_browser as ws_browser_module
def test_gemini_clean_and_conform_normalizes_known_single_action_shapes():
raw = (
'{"action":['
'{"complete_task":{"title":"T","response":"R","page_summary":"S"}}'
']}'
def test_browser_url_normalization_matches_address_bar_hosts():
assert normalize_url("localhost:3000") == "http://localhost:3000/"
assert normalize_url("127.0.0.1:8000/path") == "http://127.0.0.1:8000/path"
assert normalize_url("novinky.cz") == "https://novinky.cz/"
assert normalize_url("https://example.com") == "https://example.com/"
assert normalize_url("about:blank") == "about:blank"
def test_browser_config_normalizes_extension_paths(tmp_path):
extension_dir = tmp_path / "extension"
extension_dir.mkdir()
config = normalize_browser_config(
{
"extensions_enabled": 1,
"extension_paths": [str(extension_dir), "", " ", str(extension_dir)],
}
)
cleaned = browser_use_monkeypatch.gemini_clean_and_conform(raw)
assert config == {
"extensions_enabled": True,
"extension_paths": [str(extension_dir)],
"model_preset": "",
}
assert cleaned is not None
parsed = json.loads(cleaned)
assert parsed["action"] == [
def test_browser_config_normalizes_model_preset():
assert normalize_browser_config({"model_preset": " Research "})["model_preset"] == "Research"
assert "model" not in normalize_browser_config({"model": "main"})
def test_browser_model_selection_uses_presets(monkeypatch):
import plugins._browser.helpers.config as browser_config_module
from plugins._model_config.helpers import model_config
monkeypatch.setattr(
browser_config_module,
"get_browser_config",
lambda agent=None: {"model_preset": "Research", "extensions_enabled": False, "extension_paths": []},
)
monkeypatch.setattr(
model_config,
"get_preset_by_name",
lambda name: {
"name": "Research",
"chat": {"provider": "openrouter", "name": "example/model"},
} if name == "Research" else None,
)
selection = resolve_browser_model_selection(SimpleNamespace())
assert selection["source_kind"] == "preset"
assert selection["config"] == {"provider": "openrouter", "name": "example/model"}
def test_browser_model_selection_falls_back_to_main_for_missing_preset(monkeypatch):
from plugins._model_config.helpers import model_config
monkeypatch.setattr(model_config, "get_preset_by_name", lambda name: None)
monkeypatch.setattr(
model_config,
"get_chat_model_config",
lambda agent=None: {"provider": "openrouter", "name": "main/model"},
)
selection = resolve_browser_model_selection(SimpleNamespace(), {"model_preset": "Missing"})
assert selection["source_kind"] == "main"
assert selection["preset_status"] == "missing"
assert selection["config"] == {"provider": "openrouter", "name": "main/model"}
def test_browser_model_preset_options_include_missing_selected(monkeypatch):
from plugins._model_config.helpers import model_config
monkeypatch.setattr(
model_config,
"get_presets",
lambda: [{"name": "Balance", "chat": {"provider": "openrouter", "name": "model"}}],
)
options = get_browser_model_preset_options(settings={"model_preset": "Deleted"})
assert options[-1]["name"] == "Deleted"
assert options[-1]["missing"] is True
def test_browser_launch_config_switches_to_chromium_for_extensions(tmp_path):
extension_dir = tmp_path / "extension"
extension_dir.mkdir()
launch = build_browser_launch_config(
{
"done": {
"success": True,
"data": {
"title": "T",
"response": "R",
"page_summary": "S",
},
}
"extensions_enabled": True,
"extension_paths": [str(extension_dir)],
}
)
assert launch["browser_mode"] == "chromium_extensions"
assert launch["channel"] == "chromium"
assert launch["requires_full_browser"] is True
assert launch["extensions"]["active"] is True
assert any(arg.startswith("--load-extension=") for arg in launch["args"])
assert "--headless=new" not in launch["args"]
def test_browser_extension_manager_parses_web_store_urls():
extension_id = "a" * 32
assert parse_chrome_web_store_extension_id(extension_id) == extension_id
assert (
parse_chrome_web_store_extension_id(
f"https://chromewebstore.google.com/detail/example/{extension_id}"
)
== extension_id
)
assert (
parse_chrome_web_store_extension_id(
f"https://chrome.google.com/webstore/detail/example/{extension_id}?hl=en"
)
== extension_id
)
def test_browser_extension_manager_extracts_crx3_zip_payload():
payload = b"PK\x03\x04zip-payload"
header = b"metadata"
crx = b"Cr24" + (3).to_bytes(4, "little") + len(header).to_bytes(4, "little") + header + payload
assert _crx_zip_payload(crx) == payload
def test_browser_extension_menu_exposes_agent_and_url_paths():
html = (PROJECT_ROOT / "plugins" / "_browser" / "webui" / "main.html").read_text(
encoding="utf-8"
)
skill = PROJECT_ROOT / "skills" / "a0-browser-ext" / "SKILL.md"
assert "+ Create New with A0" in html
assert "Chrome Web Store URL" in html
assert "My Browser Extensions" in html
assert "malicious or buggy extensions" in html
assert skill.exists()
def test_browser_save_plugin_config_restarts_runtimes_on_change(monkeypatch, tmp_path):
extension_dir = tmp_path / "extension"
extension_dir.mkdir()
restarted = []
monkeypatch.setattr(
browser_hooks_module,
"_load_saved_browser_config",
lambda project_name="", agent_profile="": {
"extensions_enabled": False,
"extension_paths": [],
},
]
)
monkeypatch.setattr(
browser_hooks_module,
"close_all_runtimes_sync",
lambda: restarted.append(True),
)
result = browser_hooks_module.save_plugin_config(
{
"extensions_enabled": True,
"extension_paths": [str(extension_dir)],
},
project_name="",
agent_profile="",
)
assert result["extensions_enabled"] is True
assert result["extension_paths"] == [str(extension_dir)]
assert result["model_preset"] == ""
assert restarted == [True]
class DummyBrowserSession:
def __init__(self) -> None:
self.kill_called = False
self.close_called = False
def test_browser_save_plugin_config_does_not_restart_runtimes_for_preset_only(monkeypatch):
restarted = []
async def kill(self) -> None:
self.kill_called = True
monkeypatch.setattr(
browser_hooks_module,
"_load_saved_browser_config",
lambda project_name="", agent_profile="": {
"extensions_enabled": False,
"extension_paths": [],
"model_preset": "",
},
)
monkeypatch.setattr(
browser_hooks_module,
"close_all_runtimes_sync",
lambda: restarted.append(True),
)
async def close(self) -> None:
self.close_called = True
result = browser_hooks_module.save_plugin_config(
{
"extensions_enabled": False,
"extension_paths": [],
"model_preset": "Research",
},
project_name="",
agent_profile="",
)
assert result["model_preset"] == "Research"
assert restarted == []
class DummyAgent:
def __init__(self) -> None:
self.context = SimpleNamespace(id="ctx", task=None)
@pytest.mark.asyncio
async def test_browser_tool_dispatches_direct_actions(monkeypatch):
calls = []
class FakeRuntime:
async def call(self, method, *args):
calls.append((method, args))
if method == "content":
return {"document": "[link 1] Example"}
return {"ok": True, "method": method, "args": args}
async def fake_get_runtime(context_id, create=True):
assert context_id == "ctx"
return FakeRuntime()
monkeypatch.setattr(browser_tool_module, "get_runtime", fake_get_runtime)
agent = SimpleNamespace(context=SimpleNamespace(id="ctx"))
tool = browser_tool_module.Browser(
agent=agent,
name="browser",
method=None,
args={},
message="",
loop_data=None,
)
response = await tool.execute(action="content", browser_id=1)
assert response.message == "[link 1] Example"
assert calls == [("content", (1, None))]
def test_browser_session_teardown_prefers_kill_for_keep_alive_sessions():
state = browser_agent_module.State(DummyAgent())
session = DummyBrowserSession()
state.browser_session = session
@pytest.mark.asyncio
async def test_browser_viewer_subscribe_unregisters_stream(monkeypatch):
class FakeRuntime:
def __init__(self) -> None:
self.opened = False
state.kill_task()
async def call(self, method, *args):
if method == "list":
if self.opened:
return {
"browsers": [{"id": 1, "currentUrl": "about:blank", "title": ""}],
"last_interacted_browser_id": 1,
}
return {"browsers": [], "last_interacted_browser_id": None}
if method == "open":
self.opened = True
return {"id": 1, "state": {"id": 1, "currentUrl": "about:blank"}}
raise AssertionError(method)
assert session.kill_called is True
assert session.close_called is False
async def fake_get_runtime(context_id, create=True):
assert context_id == "ctx"
return FakeRuntime()
monkeypatch.setattr(ws_browser_module, "get_runtime", fake_get_runtime)
monkeypatch.setattr(
ws_browser_module.AgentContext,
"get",
staticmethod(lambda context_id: SimpleNamespace(id=context_id)),
)
handler = ws_browser_module.WsBrowser(
SimpleNamespace(),
threading.RLock(),
manager=None,
)
result = await handler.process(
"browser_viewer_subscribe",
{"context_id": "ctx", "correlationId": "c1"},
"sid-1",
)
assert result["context_id"] == "ctx"
assert ("sid-1", "ctx") in ws_browser_module.WsBrowser._streams
await handler.on_disconnect("sid-1")
assert ("sid-1", "ctx") not in ws_browser_module.WsBrowser._streams
def test_browser_cleanup_extensions_follow_new_extensible_path_layout():
extension = importlib.import_module("helpers.extension")
@pytest.mark.asyncio
async def test_browser_viewer_viewport_input_dispatches_resize(monkeypatch):
calls = []
class FakeRuntime:
async def call(self, method, *args, **kwargs):
calls.append((method, args, kwargs))
return {"ok": True, "method": method, "args": args}
async def fake_get_runtime(context_id, create=True):
assert context_id == "ctx"
assert create is False
return FakeRuntime()
monkeypatch.setattr(ws_browser_module, "get_runtime", fake_get_runtime)
handler = ws_browser_module.WsBrowser(
SimpleNamespace(),
threading.RLock(),
manager=None,
)
result = await handler.process(
"browser_viewer_input",
{
"context_id": "ctx",
"browser_id": 7,
"input_type": "viewport",
"width": 1280,
"height": 720,
},
"sid-1",
)
assert result == {"state": {"ok": True, "method": "set_viewport", "args": (7, 1280, 720)}}
assert calls == [("set_viewport", (7, 1280, 720), {})]
@pytest.mark.asyncio
async def test_browser_viewer_wheel_input_dispatches_scroll(monkeypatch):
calls = []
class FakeRuntime:
async def call(self, method, *args, **kwargs):
calls.append((method, args, kwargs))
return {"ok": True, "method": method, "args": args}
async def fake_get_runtime(context_id, create=True):
assert context_id == "ctx"
assert create is False
return FakeRuntime()
monkeypatch.setattr(ws_browser_module, "get_runtime", fake_get_runtime)
handler = ws_browser_module.WsBrowser(
SimpleNamespace(),
threading.RLock(),
manager=None,
)
result = await handler.process(
"browser_viewer_input",
{
"context_id": "ctx",
"browser_id": 3,
"input_type": "wheel",
"x": 320,
"y": 480,
"delta_x": 0,
"delta_y": 640,
},
"sid-1",
)
assert result == {"state": {"ok": True, "method": "wheel", "args": (3, 320.0, 480.0, 0.0, 640.0)}}
assert calls == [("wheel", (3, 320.0, 480.0, 0.0, 640.0), {})]
def test_browser_cleanup_extensions_follow_extensible_path_layout():
extension = __import__("helpers.extension", fromlist=["_get_extension_classes"])
remove_classes = extension._get_extension_classes( # type: ignore[attr-defined]
"_functions/agent/AgentContext/remove/start"
)
@ -76,5 +406,12 @@ def test_browser_cleanup_extensions_follow_new_extensible_path_layout():
"_functions/agent/AgentContext/reset/start"
)
assert any(cls.__name__ == "CleanupBrowserStateOnRemove" for cls in remove_classes)
assert any(cls.__name__ == "CleanupBrowserStateOnReset" for cls in reset_classes)
assert any(cls.__name__ == "CleanupBrowserRuntimeOnRemove" for cls in remove_classes)
assert any(cls.__name__ == "CleanupBrowserRuntimeOnReset" for cls in reset_classes)
def test_legacy_browser_dependency_is_removed():
assert not (PROJECT_ROOT / "plugins" / ("_browser" + "_agent")).exists()
assert ("browser" + "-use") not in (PROJECT_ROOT / "requirements.txt").read_text(
encoding="utf-8"
)

View file

@ -10,7 +10,7 @@ from typing import Iterator
import pytest
from flask import Flask
PROJECT_ROOT = Path(__file__).resolve().parents[2]
PROJECT_ROOT = Path(__file__).resolve().parents[1]
if str(PROJECT_ROOT) not in sys.path:
sys.path.insert(0, str(PROJECT_ROOT))
@ -75,6 +75,19 @@ def _temporary_probe_plugin(surface: str) -> Iterator[tuple[str, str]]:
dir=plugins_root,
) as temp_plugin_dir:
plugin_id = Path(temp_plugin_dir).name
(Path(temp_plugin_dir) / "plugin.yaml").write_text(
(
f"name: {plugin_id}\n"
f"title: {plugin_id}\n"
"description: Temporary WebUI surface probe.\n"
"version: 0.0.0\n"
"always_enabled: false\n"
),
encoding="utf-8",
)
from helpers import cache
cache.clear("*(plugins)*")
probe_file = (
Path(temp_plugin_dir)
/ "extensions"
@ -91,7 +104,10 @@ def _temporary_probe_plugin(surface: str) -> Iterator[tuple[str, str]]:
),
encoding="utf-8",
)
yield plugin_id, probe_file.name
try:
yield plugin_id, probe_file.name
finally:
cache.clear("*(plugins)*")
@pytest.mark.asyncio
@ -117,8 +133,13 @@ async def test_webui_surface_extension_point_end_to_end(
f"{plugin_id}/extensions/webui/{surface}/{probe_file_name}"
)
assert any(
extension.get("plugin_id") == plugin_id
and str(extension.get("path", "")).replace("\\", "/").endswith(expected_suffix)
extension_paths = [
str(
extension.get("path", "")
if isinstance(extension, dict)
else extension
).replace("\\", "/")
for extension in extensions
)
]
assert any(path.endswith(expected_suffix) for path in extension_paths)

View file

@ -5,6 +5,20 @@ import { callJsExtensions } from "/js/extensions.js";
// Modal functionality
const modalStack = [];
function findModalIndexByPath(modalPath) {
return modalStack.findIndex((modal) => modal.path === modalPath);
}
function focusModal(modalPath) {
const modalIndex = findModalIndexByPath(modalPath);
if (modalIndex === -1) return false;
if (modalIndex === modalStack.length - 1) return true;
const [modal] = modalStack.splice(modalIndex, 1);
modalStack.push(modal);
updateModalZIndexes();
return true;
}
function getModalScrollElement(modal) {
return modal?.element?.querySelector(".modal-scroll");
}
@ -38,6 +52,15 @@ backdrop.style.display = "none";
backdrop.style.backdropFilter = "blur(5px)";
document.body.appendChild(backdrop);
function modalSuppressesBackdrop(modal) {
const path = String(modal?.path || "");
return path === "/plugins/_browser/webui/main.html"
|| path === "plugins/_browser/webui/main.html"
|| modal?.element?.classList?.contains("modal-floating")
|| modal?.element?.classList?.contains("modal-no-backdrop")
|| modal?.inner?.classList?.contains("modal-no-backdrop");
}
// Function to update z-index for all modals and backdrop
function updateModalZIndexes() {
// Base z-index for modals
@ -51,20 +74,26 @@ function updateModalZIndexes() {
modal.element.style.zIndex = baseZIndex + index * 20;
});
// Always show backdrop
backdrop.style.display = "block";
const backdropModalStack = modalStack.filter((modal) => !modalSuppressesBackdrop(modal));
if (modalStack.length > 1) {
// For multiple modals, position backdrop between the top two
const topModalIndex = modalStack.length - 1;
const previousModalZIndex = baseZIndex + (topModalIndex - 1) * 20;
backdrop.style.zIndex = previousModalZIndex + 10;
} else if (modalStack.length === 1) {
// For single modal, position backdrop below it
backdrop.style.zIndex = baseZIndex - 1;
} else {
// No modals, hide backdrop
if (backdropModalStack.length === 0) {
backdrop.style.display = "none";
return;
}
backdrop.style.display = "block";
backdrop.style.backdropFilter = "blur(5px)";
backdrop.style.backgroundColor = "";
if (backdropModalStack.length === modalStack.length && modalStack.length > 1) {
const topModalIndex = modalStack.length - 1;
backdrop.style.zIndex = baseZIndex + (topModalIndex - 1) * 20 + 10;
} else {
const topBackdropModal = backdropModalStack[backdropModalStack.length - 1];
const topBackdropModalIndex = modalStack.indexOf(topBackdropModal);
backdrop.style.zIndex = topBackdropModalIndex > 0
? baseZIndex + (topBackdropModalIndex - 1) * 20 + 10
: baseZIndex - 1;
}
}
@ -213,6 +242,26 @@ export async function openModal(modalPath, beforeClose = null) {
});
}
export function isModalOpen(modalPath) {
return findModalIndexByPath(modalPath) !== -1;
}
export async function ensureModalOpen(modalPath, beforeClose = null) {
if (focusModal(modalPath)) return null;
return openModal(modalPath, beforeClose);
}
export async function toggleModal(modalPath, beforeClose = null) {
if (!isModalOpen(modalPath)) {
return openModal(modalPath, beforeClose);
}
while (isModalOpen(modalPath)) {
const closed = await closeModal(modalPath);
if (closed === false) return false;
}
return true;
}
// Function to close modal
export async function closeModal(modalPath = null) {
if (modalStack.length === 0) return;
@ -369,3 +418,6 @@ document.addEventListener("keydown", (e) => {
globalThis.openModal = openModal;
globalThis.closeModal = closeModal;
globalThis.scrollModal = scrollModal;
globalThis.isModalOpen = isModalOpen;
globalThis.ensureModalOpen = ensureModalOpen;
globalThis.toggleModal = toggleModal;