diff --git a/README.md b/README.md index d9ecdedae..f75aa3ac8 100644 --- a/README.md +++ b/README.md @@ -99,12 +99,12 @@ A detailed setup guide for Windows, macOS, and Linux can be found in the Agent Z ![Multi-agent](docs/res/usage/multi-agent.png) -### Browser Agent +### Browser -- Browser automation is provided by the built-in `_browser_agent` plugin. -- It uses the effective Main Model resolved by `_model_config`; there is no separate browser model slot. -- Browser vision follows the Main Model's vision setting. -- Playwright Chromium: **Docker** images ship the headless shell preinstalled. **Local development** installs it on first Browser Agent use via `ensure_playwright_binary()` in `plugins/_browser_agent/helpers/playwright.py` (into `tmp/playwright`); you can pre-install manually (see [Development Setup](docs/setup/dev-setup.md)) to skip the wait. +- Browser automation is provided by the built-in `_browser` plugin and the direct `browser` tool. +- The tool uses Playwright operations controlled by the main agent, with typed page refs such as `[link 3]` and `[button 6]`. +- The plugin includes a visible WebUI browser viewer for open sessions. +- Playwright Chromium: **Docker** images ship the headless shell preinstalled. **Local development** installs it on first browser use via `ensure_playwright_binary()` in `plugins/_browser/helpers/playwright.py` (into `tmp/playwright`); you can pre-install manually (see [Development Setup](docs/setup/dev-setup.md)) to skip the wait. 4. 
**Completely Customizable and Extensible** diff --git a/agent.py b/agent.py index e7a73e858..ee3146db0 100644 --- a/agent.py +++ b/agent.py @@ -740,10 +740,6 @@ class Agent: def get_utility_model(self): return None - @extension.extensible - def get_browser_model(self): - return None - @extension.extensible def get_embedding_model(self): return None @@ -1044,4 +1040,4 @@ class Agent: message=message, loop_data=loop_data, **kwargs, - ) \ No newline at end of file + ) diff --git a/docs/agents/AGENTS.modals.md b/docs/agents/AGENTS.modals.md index 3b75f84bd..31509ee9c 100644 --- a/docs/agents/AGENTS.modals.md +++ b/docs/agents/AGENTS.modals.md @@ -216,6 +216,24 @@ Outcome: - Nested modals don’t “flatten” into each other. - The backdrop always darkens the page behind the active modal without hiding lower modals incorrectly. +### Floating no-backdrop modals + +Use `.modal-floating` on the outer `.modal` when a modal should behave like a floating utility panel instead of a blocking dialog. This is for special live surfaces such as the browser panel where the user should keep seeing and interacting with the chat or dashboard behind the panel. + +Working contract: + +- `.modal-floating` suppresses the shared `.modal-backdrop` for that modal. +- `.modal-floating` makes the full-screen `.modal` shell pointer-transparent. +- `.modal-floating .modal-inner` remains pointer-active, so the floating panel itself still receives clicks, keyboard focus, drag handlers, resize handles, and form input. +- Floating modal sizing, dragging, and resizing are still component-owned unless promoted to shared modal CSS later. The modal system only provides the backdrop and pointer-event behavior. + +Good to know: + +- A floating modal does not close by clicking the page behind it, because those clicks pass through to the app. Keep an obvious close button in the modal header. 
+- If a floating modal opens another normal modal, the normal modal can still use the backdrop; stacking remains governed by the shared z-index logic. +- Use `.modal-no-backdrop` only when a component needs backdrop suppression without click-through floating behavior. Prefer `.modal-floating` for utility panels. +- Do not use `.modal-floating` for destructive confirmations, settings forms, auth, import/export, or workflows that require the user to finish or dismiss the dialog before interacting with the rest of the app. + --- ## Writing a modal component (conventions) diff --git a/docs/guides/mcp-setup.md b/docs/guides/mcp-setup.md index 3bd3ddd3c..c08c3720e 100644 --- a/docs/guides/mcp-setup.md +++ b/docs/guides/mcp-setup.md @@ -114,4 +114,4 @@ Community-tested and reliable MCP servers: - **VSCode MCP** - IDE workflows > [!TIP] -> For browser automation tasks, the built-in Browser Agent plugin covers the default workflow. MCP-based browser tools are still useful when you need a different browser stack, remote browser control, or an alternative to the built-in Playwright Chromium (preinstalled in Docker; on demand via `ensure_playwright_binary()` in local dev). +> For browser automation tasks, the built-in `_browser` plugin and direct `browser` tool cover the default workflow. MCP-based browser tools are still useful when you need a different browser stack, remote browser control, or an alternative to the built-in Playwright Chromium (preinstalled in Docker; on demand via `ensure_playwright_binary()` in local dev). diff --git a/docs/guides/projects.md b/docs/guides/projects.md index 06d0e7281..681d61f04 100644 --- a/docs/guides/projects.md +++ b/docs/guides/projects.md @@ -221,7 +221,7 @@ SMTP_PASSWORD=email_pwd_here ### Subagent Configuration -Projects can enable or disable specific subagents. This is configured via the UI and stored in `.a0proj/agents.json`. The Browser Agent is not a subagent; it is a built-in plugin. 
+Projects can enable or disable specific subagents. This is configured via the UI and stored in `.a0proj/agents.json`. The browser tool is not a subagent; it is provided by the built-in `_browser` plugin. ### Project LLM Configuration diff --git a/docs/guides/troubleshooting.md b/docs/guides/troubleshooting.md index 802844892..dc42254e5 100644 --- a/docs/guides/troubleshooting.md +++ b/docs/guides/troubleshooting.md @@ -26,8 +26,8 @@ Refer to the [Choosing your LLMs](../setup/installation.md#installing-and-using- **7. How can I make Agent Zero retain memory between sessions?** Use **Settings → Backup & Restore** and avoid mapping the entire `/a0` directory. See [How to update Agent Zero](../setup/installation.md#how-to-update-agent-zero). -**8. My browser agent fails or says Playwright is missing. What now?** -The built-in Browser Agent is a plugin that uses the Main Model from `_model_config`. **Docker:** the Chromium headless shell is shipped preinstalled (typically under `/a0/tmp/playwright`). **Local development:** if the binary is missing, `ensure_playwright_binary()` in `plugins/_browser_agent/helpers/playwright.py` runs `playwright install chromium --only-shell` into `tmp/playwright` on first Browser Agent use (you may see UI notifications). To install ahead of time, run `PLAYWRIGHT_BROWSERS_PATH=tmp/playwright playwright install chromium --only-shell` after `pip install -r requirements.txt`. If you prefer an external browser stack, use MCP alternatives such as Browser OS, Chrome DevTools, or Playwright MCP. See [MCP Setup](mcp-setup.md). +**8. My browser tool fails or says Playwright is missing. What now?** +The built-in browser is provided by the `_browser` plugin and the direct `browser` tool. **Docker:** the Chromium headless shell is shipped preinstalled (typically under `/a0/tmp/playwright`). 
**Local development:** if the binary is missing, `ensure_playwright_binary()` in `plugins/_browser/helpers/playwright.py` runs `playwright install chromium --only-shell` into `tmp/playwright` on first browser use (you may see UI notifications). To install ahead of time, run `PLAYWRIGHT_BROWSERS_PATH=tmp/playwright playwright install chromium --only-shell` after `pip install -r requirements.txt`. If you prefer an external browser stack, use MCP alternatives such as Browser OS, Chrome DevTools, or Playwright MCP. See [MCP Setup](mcp-setup.md). **9. My secrets disappeared after a backup restore.** Secrets are stored in `/a0/usr/secrets.env` and are not always included in backup archives. Copy them manually. @@ -36,7 +36,7 @@ Secrets are stored in `/a0/usr/secrets.env` and are not always included in backu - Join the Agent Zero [Skool](https://www.skool.com/agent-zero) or [Discord](https://discord.gg/B8KZKNsPpj) community. **11. How do I adjust API rate limits?** -Use the model rate limit fields in Settings (Main Model and Utility Model sections) to set request/input/output limits. The Browser Agent inherits the Main Model limits. These map to the model config limits (for example `limit_requests`, `limit_input`, `limit_output`). +Use the model rate limit fields in Settings (Main Model and Utility Model sections) to set request/input/output limits. These map to the model config limits (for example `limit_requests`, `limit_input`, `limit_output`). **12. My `code_execution_tool` doesn't work, what's wrong?** - Ensure Docker is installed and running. diff --git a/docs/guides/usage.md b/docs/guides/usage.md index c77e672f1..b98198394 100644 --- a/docs/guides/usage.md +++ b/docs/guides/usage.md @@ -126,8 +126,8 @@ Agent Zero's power comes from its ability to use [tools](../developer/architectu - **Understand Tools:** Agent Zero includes default tools like knowledge (powered by SearXNG), code execution, and communication. 
Understand the capabilities of these tools and how to invoke them. -### Browser Agent Status & MCP Alternatives -The built-in Browser Agent is provided by the `_browser_agent` plugin. It uses the effective Main Model from `_model_config`, including per-chat overrides and the Main Model vision flag. Playwright Chromium is preinstalled in **Docker**; in **local development** it is installed on demand when needed via `ensure_playwright_binary()` (see [Development Setup](../setup/dev-setup.md) to pre-install). +### Browser Tool Status & MCP Alternatives +The built-in browser is provided by the `_browser` plugin and direct `browser` tool. It uses Playwright operations controlled by the main agent, exposes typed page refs for links, buttons, images, and inputs, and includes a WebUI viewer for open browser sessions. Playwright Chromium is preinstalled in **Docker**; in **local development** it is installed on demand when needed via `ensure_playwright_binary()` (see [Development Setup](../setup/dev-setup.md) to pre-install). If you need a different browser stack or want external browser tooling, MCP-based browser tools are still a strong option: diff --git a/docs/setup/dev-setup.md b/docs/setup/dev-setup.md index 647b3decd..025296310 100644 --- a/docs/setup/dev-setup.md +++ b/docs/setup/dev-setup.md @@ -69,7 +69,7 @@ Now when you select one of the python files in the project, you should see prope pip install -r requirements.txt PLAYWRIGHT_BROWSERS_PATH=tmp/playwright playwright install chromium --only-shell ``` -The first command installs Python dependencies. The second installs the Chromium headless shell into `tmp/playwright` ahead of time (same path in Docker: `/a0/tmp/playwright`). If you skip the second command, **local development** still downloads the shell on first Browser Agent use through `ensure_playwright_binary()` in `plugins/_browser_agent/helpers/playwright.py`. Pre-installing avoids that wait. 
**Docker** images ship the shell preinstalled; runtime install is for local dev when the binary is missing. +The first command installs Python dependencies. The second installs the Chromium headless shell into `tmp/playwright` ahead of time (same path in Docker: `/a0/tmp/playwright`). If you skip the second command, **local development** still downloads the shell on first browser use through `ensure_playwright_binary()` in `plugins/_browser/helpers/playwright.py`. Pre-installing avoids that wait. **Docker** images ship the shell preinstalled; runtime install is for local dev when the binary is missing. Errors in the code editor caused by missing packages should now be gone. If not, try reloading the window. diff --git a/docs/setup/installation.md b/docs/setup/installation.md index 1e6d029a8..727dd5a07 100644 --- a/docs/setup/installation.md +++ b/docs/setup/installation.md @@ -405,7 +405,7 @@ The Settings page is the control center for selecting the Large Language Models | LLM Role | Description | | --- | --- | -| `chat_llm` | This is the primary LLM used for conversations, agent reasoning, tool use, and the built-in browser agent. Vision support controls browser vision and image understanding. | +| `chat_llm` | This is the primary LLM used for conversations, agent reasoning, and tool use. Vision support controls image understanding. | | `utility_llm` | This LLM handles internal tasks like summarizing messages, managing memory, and processing internal prompts. Using a smaller, less expensive model here can improve efficiency. | | `embedding_llm` | The embedding model shipped with A0 runs on CPU and is responsible for generating embeddings used for memory retrieval and knowledge base lookups. Changing the `embedding_llm` will re-index all of A0's memory. | @@ -416,7 +416,7 @@ The Settings page is the control center for selecting the Large Language Models 3. Click "Save" to apply the changes. > [!NOTE] -> The Browser Agent does not have a separate model slot. 
It uses the effective Main Model resolved by `_model_config`, including per-chat overrides and the Main Model vision flag. +> The built-in browser does not have a separate model slot. The main agent decides when to call the direct `browser` tool. ### Important Considerations diff --git a/knowledge/main/about/capabilities.md b/knowledge/main/about/capabilities.md index e16ab3809..d8365c8a5 100644 --- a/knowledge/main/about/capabilities.md +++ b/knowledge/main/about/capabilities.md @@ -76,7 +76,7 @@ An external REST API is available for programmatic task submission. Agent-to-Age - **No persistent state between chats** unless explicitly memorized or saved to files. - **Context window**: long conversations are summarized automatically, which can lose detail. - **Memory recall is approximate**: similarity search may miss relevant memories or surface irrelevant ones. -- **No GUI interaction** outside the browser agent (which is separate from the main agent). +- **No GUI interaction** outside built-in browser tooling or configured computer-use integrations. - **Container boundary**: the agent cannot affect systems outside the Docker container unless network access or volume mounts are configured. - **Model capability ceiling**: tool usage quality and reasoning depth are bounded by the underlying LLM. Small models may struggle with complex multi-step tool use. - **No real-time data** beyond web search. The agent's own knowledge cutoff is the underlying model's training cutoff. 
diff --git a/knowledge/main/about/configuration.md b/knowledge/main/about/configuration.md index 57a49205a..ff009aeca 100644 --- a/knowledge/main/about/configuration.md +++ b/knowledge/main/about/configuration.md @@ -6,11 +6,11 @@ Agent Zero uses three configurable LLM roles: | Role | Purpose | |------|---------| -| `chat_llm` | Primary model for all agent reasoning, tool use, and the Browser Agent | +| `chat_llm` | Primary model for all agent reasoning and tool use | | `utility_llm` | Secondary model for internal framework tasks: memory summarization, query generation, history compression, memory recall filtering | | `embedding_llm` | Produces vector embeddings for memory and knowledge indexing | -The utility model handles high-volume, lower-stakes operations and can be a cheaper/faster model than the chat model. The Browser Agent uses the effective chat model resolved by `_model_config`, including per-chat overrides and the chat model vision flag. Changing the embedding model invalidates the existing vector index - the entire knowledge base is re-indexed automatically. +The utility model handles high-volume, lower-stakes operations and can be a cheaper/faster model than the chat model. Browser automation is exposed as the direct `browser` tool; the main agent decides when to call it. Changing the embedding model invalidates the existing vector index - the entire knowledge base is re-indexed automatically. 
## Model Providers diff --git a/models.py b/models.py index 824f5cf70..80600e855 100644 --- a/models.py +++ b/models.py @@ -45,7 +45,7 @@ from sentence_transformers import SentenceTransformer from pydantic import ConfigDict -# disable extra logging, must be done repeatedly, otherwise browser-use will turn it back on for some reason +# keep provider logging quiet in normal operation def turn_off_logging(): os.environ["LITELLM_LOG"] = "ERROR" # only errors litellm.suppress_debug_info = True diff --git a/plugins/_browser/api/extensions.py b/plugins/_browser/api/extensions.py new file mode 100644 index 000000000..b2cf8306c --- /dev/null +++ b/plugins/_browser/api/extensions.py @@ -0,0 +1,27 @@ +from helpers.api import ApiHandler, Request +from plugins._browser.helpers.extension_manager import ( + get_extensions_root, + install_chrome_web_store_extension, + list_browser_extensions, +) + + +class Extensions(ApiHandler): + async def process(self, input: dict, request: Request) -> dict: + action = input.get("action", "list") + + if action == "list": + return { + "ok": True, + "root": str(get_extensions_root()), + "extensions": list_browser_extensions(), + } + + if action == "install_web_store": + try: + result = install_chrome_web_store_extension(str(input.get("url", ""))) + except ValueError as exc: + return {"ok": False, "error": str(exc)} + return result + + return {"ok": False, "error": f"Unknown action: {action}"} diff --git a/plugins/_browser/api/status.py b/plugins/_browser/api/status.py new file mode 100644 index 000000000..922a314d1 --- /dev/null +++ b/plugins/_browser/api/status.py @@ -0,0 +1,32 @@ +from helpers.api import ApiHandler, Request +from plugins._browser.helpers.config import build_browser_launch_config, get_browser_config +from plugins._browser.helpers.playwright import get_playwright_binary, get_playwright_cache_dir +from plugins._browser.helpers.runtime import known_context_ids + + +class Status(ApiHandler): + async def process(self, input: dict, 
request: Request) -> dict: + browser_config = get_browser_config() + launch_config = build_browser_launch_config(browser_config) + runtime_binary = get_playwright_binary( + full_browser=launch_config["requires_full_browser"] + ) + shell_binary = get_playwright_binary(full_browser=False) + chromium_binary = get_playwright_binary(full_browser=True) + return { + "plugin": "_browser", + "playwright": { + "cache_dir": get_playwright_cache_dir(), + "binary_found": bool(runtime_binary), + "binary_path": str(runtime_binary) if runtime_binary else "", + "headless_shell_binary_path": str(shell_binary) if shell_binary else "", + "chromium_binary_path": str(chromium_binary) if chromium_binary else "", + "launch_mode": launch_config["browser_mode"], + }, + "extensions": { + **launch_config["extensions"], + "launch_mode": launch_config["browser_mode"], + "requires_full_browser": launch_config["requires_full_browser"], + }, + "contexts": known_context_ids(), + } diff --git a/plugins/_browser/api/ws_browser.py b/plugins/_browser/api/ws_browser.py new file mode 100644 index 000000000..2dc7002a5 --- /dev/null +++ b/plugins/_browser/api/ws_browser.py @@ -0,0 +1,241 @@ +from __future__ import annotations + +import asyncio +from typing import Any, ClassVar + +from agent import AgentContext +from helpers.ws import WsHandler +from helpers.ws_manager import WsResult +from plugins._browser.helpers.runtime import get_runtime + + +class WsBrowser(WsHandler): + _streams: ClassVar[dict[tuple[str, str], asyncio.Task[None]]] = {} + + async def on_disconnect(self, sid: str) -> None: + for key in [key for key in self._streams if key[0] == sid]: + task = self._streams.pop(key) + task.cancel() + + async def process( + self, + event: str, + data: dict[str, Any], + sid: str, + ) -> dict[str, Any] | WsResult | None: + if not event.startswith("browser_"): + return None + + if event == "browser_viewer_subscribe": + return await self._subscribe(data, sid) + if event == "browser_viewer_unsubscribe": + 
return self._unsubscribe(data, sid) + if event == "browser_viewer_command": + return await self._command(data, sid) + if event == "browser_viewer_input": + return await self._input(data, sid) + + return WsResult.error( + code="UNKNOWN_BROWSER_EVENT", + message=f"Unknown browser event: {event}", + correlation_id=data.get("correlationId"), + ) + + async def _subscribe(self, data: dict[str, Any], sid: str) -> dict[str, Any] | WsResult: + context_id = self._context_id(data) + if not context_id: + return self._error("MISSING_CONTEXT", "context_id is required", data) + if not AgentContext.get(context_id): + return self._error("CONTEXT_NOT_FOUND", f"Context '{context_id}' was not found", data) + + runtime = await get_runtime(context_id) + listing = await runtime.call("list") + browsers = listing.get("browsers") or [] + if not browsers: + opened = await runtime.call("open", "about:blank") + listing = await runtime.call("list") + browsers = listing.get("browsers") or [] + if opened.get("id"): + listing["last_interacted_browser_id"] = opened.get("id") + active_id = data.get("browser_id") or listing.get("last_interacted_browser_id") + if not active_id and browsers: + active_id = browsers[0].get("id") + + stream_key = (sid, context_id) + existing = self._streams.pop(stream_key, None) + if existing: + existing.cancel() + self._streams[stream_key] = asyncio.create_task( + self._stream_frames(sid, context_id, active_id) + ) + + return { + "context_id": context_id, + "active_browser_id": active_id, + "browsers": browsers, + } + + def _unsubscribe(self, data: dict[str, Any], sid: str) -> dict[str, Any] | WsResult: + context_id = self._context_id(data) + if not context_id: + return self._error("MISSING_CONTEXT", "context_id is required", data) + task = self._streams.pop((sid, context_id), None) + if task: + task.cancel() + return {"context_id": context_id, "unsubscribed": True} + + async def _command(self, data: dict[str, Any], sid: str) -> dict[str, Any] | WsResult: + context_id = 
self._context_id(data) + if not context_id: + return self._error("MISSING_CONTEXT", "context_id is required", data) + runtime = await get_runtime(context_id) + command = str(data.get("command") or "").strip().lower().replace("-", "_") + browser_id = data.get("browser_id") + + try: + if command == "open": + result = await runtime.call("open", data.get("url") or "about:blank") + elif command == "navigate": + result = await runtime.call("navigate", browser_id, data.get("url") or "") + elif command == "back": + result = await runtime.call("back", browser_id) + elif command == "forward": + result = await runtime.call("forward", browser_id) + elif command == "reload": + result = await runtime.call("reload", browser_id) + elif command == "close": + result = await runtime.call("close_browser", browser_id) + elif command == "list": + result = await runtime.call("list") + else: + return self._error("UNKNOWN_COMMAND", f"Unknown browser command: {command}", data) + except Exception as exc: + return self._error("COMMAND_FAILED", str(exc), data) + + listing = await runtime.call("list") + last_interacted_browser_id = listing.get("last_interacted_browser_id") + await self.emit_to( + sid, + "browser_viewer_state", + { + "context_id": context_id, + "result": result, + "browsers": listing.get("browsers") or [], + "last_interacted_browser_id": last_interacted_browser_id, + }, + correlation_id=data.get("correlationId"), + ) + return { + "result": result, + "browsers": listing.get("browsers") or [], + "last_interacted_browser_id": last_interacted_browser_id, + } + + async def _input(self, data: dict[str, Any], sid: str) -> dict[str, Any] | WsResult: + context_id = self._context_id(data) + if not context_id: + return self._error("MISSING_CONTEXT", "context_id is required", data) + runtime = await get_runtime(context_id, create=False) + if not runtime: + return self._error("NO_BROWSER_RUNTIME", "No browser runtime exists for this context", data) + + input_type = str(data.get("input_type") 
or "").strip().lower() + browser_id = data.get("browser_id") + try: + if input_type == "mouse": + result = await runtime.call( + "mouse", + browser_id, + data.get("event_type") or "click", + float(data.get("x") or 0), + float(data.get("y") or 0), + data.get("button") or "left", + ) + elif input_type == "keyboard": + result = await runtime.call( + "keyboard", + browser_id, + key=str(data.get("key") or ""), + text=str(data.get("text") or ""), + ) + elif input_type == "viewport": + result = await runtime.call( + "set_viewport", + browser_id, + int(data.get("width") or 0), + int(data.get("height") or 0), + ) + elif input_type == "wheel": + result = await runtime.call( + "wheel", + browser_id, + float(data.get("x") or 0), + float(data.get("y") or 0), + float(data.get("delta_x") or 0), + float(data.get("delta_y") or 0), + ) + else: + return self._error("UNKNOWN_INPUT", f"Unknown browser input: {input_type}", data) + except Exception as exc: + return self._error("INPUT_FAILED", str(exc), data) + + return {"state": result} + + async def _stream_frames( + self, + sid: str, + context_id: str, + browser_id: int | str | None, + ) -> None: + while True: + try: + runtime = await get_runtime(context_id, create=False) + if runtime: + listing = await runtime.call("list") + browsers = listing.get("browsers") or [] + browser_ids = {str(browser.get("id")) for browser in browsers} + requested_id = str(browser_id or "") if browser_id else "" + active_id = ( + browser_id + if requested_id and requested_id in browser_ids + else listing.get("last_interacted_browser_id") + ) + if active_id and str(active_id) not in browser_ids: + active_id = None + if not active_id and browsers: + active_id = browsers[0].get("id") + if active_id: + frame = await runtime.call("screenshot", active_id) + frame["context_id"] = context_id + frame["browsers"] = browsers + await self.emit_to(sid, "browser_viewer_frame", frame) + else: + await self.emit_to( + sid, + "browser_viewer_frame", + { + "context_id": 
context_id, + "browser_id": None, + "browsers": browsers, + "image": "", + "mime": "", + "state": None, + }, + ) + await asyncio.sleep(0.75) + except asyncio.CancelledError: + raise + except Exception: + await asyncio.sleep(1.5) + + @staticmethod + def _context_id(data: dict[str, Any]) -> str: + return str(data.get("context_id") or data.get("context") or "").strip() + + @staticmethod + def _error(code: str, message: str, data: dict[str, Any]) -> WsResult: + return WsResult.error( + code=code, + message=message, + correlation_id=data.get("correlationId"), + ) diff --git a/plugins/_browser/assets/browser-page-content.js b/plugins/_browser/assets/browser-page-content.js new file mode 100644 index 000000000..fec76e633 --- /dev/null +++ b/plugins/_browser/assets/browser-page-content.js @@ -0,0 +1,2891 @@ +(() => { + const GLOBAL_KEY = "__spaceBrowserPageContent__"; + const DOM_HELPER_KEY = "__spaceBrowserDomHelper__"; + const VERSION = "6"; + const BLOCK_TAGS = new Set([ + "ADDRESS", + "ARTICLE", + "ASIDE", + "BLOCKQUOTE", + "BODY", + "DETAILS", + "DIV", + "DL", + "FIELDSET", + "FIGCAPTION", + "FIGURE", + "FOOTER", + "FORM", + "H1", + "H2", + "H3", + "H4", + "H5", + "H6", + "HEADER", + "HR", + "HTML", + "LI", + "MAIN", + "NAV", + "OL", + "P", + "PRE", + "SECTION", + "TABLE", + "TBODY", + "TD", + "TFOOT", + "TH", + "THEAD", + "TR", + "UL" + ]); + const SKIP_TAGS = new Set([ + "HEAD", + "LINK", + "META", + "NOSCRIPT", + "SCRIPT", + "STYLE", + "TEMPLATE" + ]); + const INTERACTIVE_ROLES = new Set([ + "button", + "checkbox", + "combobox", + "link", + "menuitem", + "menuitemcheckbox", + "menuitemradio", + "option", + "radio", + "searchbox", + "slider", + "spinbutton", + "switch", + "tab", + "textbox" + ]); + const INTERACTIVE_EVENT_NAMES = new Set([ + "auxclick", + "change", + "click", + "contextmenu", + "dblclick", + "input", + "keydown", + "keypress", + "keyup", + "mousedown", + "mouseup", + "pointerdown", + "pointerup", + "submit", + "touchend", + "touchstart" + ]); + 
const INTERACTIVE_EVENT_PROPERTIES = [...INTERACTIVE_EVENT_NAMES] + .map((eventName) => `on${eventName}`); + + if (globalThis[GLOBAL_KEY]?.version === VERSION) { + return; + } + + const state = { + backend: "live", + captureId: 0, + capturedAt: 0, + captureOptions: { + includeLabelQuotes: false, + includeLinkUrls: false, + includeSemanticTags: true, + includeStateTags: true, + includeListIndentation: true, + includeListMarkers: false + }, + entries: new Map() + }; + + function isElementNode(value) { + return Boolean(value && value.nodeType === 1); + } + + function isTextNode(value) { + return Boolean(value && value.nodeType === 3); + } + + function normalizeText(value) { + return String(value ?? "") + .replace(/\s+/gu, " ") + .trim(); + } + + function looksLikeSerializedHtmlText(value) { + const normalizedValue = normalizeText(value); + if (!normalizedValue || !normalizedValue.includes("<") || !normalizedValue.includes(">")) { + return false; + } + + if (//iu.test(normalizedValue)) { + return true; + } + + const tagMatches = normalizedValue.match(/<\/?[a-z][^>]*>/giu) || []; + return tagMatches.length >= 3 && normalizedValue.length >= 80; + } + + function looksLikeBrowserHelperMarkupText(value) { + const normalizedValue = normalizeText(value); + if (!normalizedValue) { + return false; + } + + return /space-browser-(?:frame-document|shadow-root)/iu.test(normalizedValue) + || /data-space-browser-(?:frame|node|status|frame-url|frame-title|frame-src)/iu.test(normalizedValue); + } + + function looksLikeMinifiedScriptText(value) { + const normalizedValue = normalizeText(value); + if (!normalizedValue || normalizedValue.length < 400) { + return false; + } + + const jsSignals = [ + /\bfunction\b/u, + /\breturn\b/u, + /\bvar\b/u, + /\bnew\b/u, + /\bcase\b/u, + /\bswitch\b/u, + /\bwhile\b/u, + /\bfor\b/u, + /\b(?:localStorage|postMessage|document\.|window\.|parent\.)/u, + /\bthis\./u, + /(?:&&|\|\||>>>|!==|===)/u + ].reduce((count, pattern) => count + 
(pattern.test(normalizedValue) ? 1 : 0), 0); + + if (jsSignals < 4) { + return false; + } + + const punctuationCount = (normalizedValue.match(/[{}[\]();=<>\\]/gu) || []).length; + return punctuationCount / normalizedValue.length >= 0.12; + } + + function shouldDropReadableText(value) { + const normalizedValue = normalizeText(value); + if (!normalizedValue) { + return true; + } + + return looksLikeBrowserHelperMarkupText(normalizedValue) + || looksLikeSerializedHtmlText(normalizedValue) + || looksLikeMinifiedScriptText(normalizedValue); + } + + function normalizeAttributeText(value) { + return normalizeText(value).slice(0, 160); + } + + function escapeMarkdownText(value) { + return String(value ?? "").replace(/([\\`*_{}\[\]()#+\-!|>])/gu, "\\$1"); + } + + function quoteText(value) { + return JSON.stringify(String(value ?? "")); + } + + function truncateText(value, maxLength = 120) { + const normalizedValue = normalizeText(value); + if (normalizedValue.length <= maxLength) { + return normalizedValue; + } + + return `${normalizedValue.slice(0, Math.max(0, maxLength - 1)).trimEnd()}...`; + } + + function delayMs(timeoutMs) { + return new Promise((resolve) => { + globalThis.setTimeout(resolve, Math.max(0, Number(timeoutMs) || 0)); + }); + } + + function parseCssColor(value) { + const normalizedValue = normalizeText(value); + if (!normalizedValue || normalizedValue === "transparent") { + return null; + } + + const rgbMatch = normalizedValue.match(/^rgba?\(([^)]+)\)$/iu); + if (rgbMatch) { + const parts = rgbMatch[1] + .split(",") + .map((part) => Number.parseFloat(String(part || "").trim())) + .filter((part) => Number.isFinite(part)); + if (parts.length >= 3) { + return { + r: Math.max(0, Math.min(255, parts[0])), + g: Math.max(0, Math.min(255, parts[1])), + b: Math.max(0, Math.min(255, parts[2])), + a: parts.length >= 4 ? 
Math.max(0, Math.min(1, parts[3])) : 1 + }; + } + } + + const hexMatch = normalizedValue.match(/^#([\da-f]{3,8})$/iu); + if (!hexMatch) { + return null; + } + + const hex = hexMatch[1]; + if (hex.length === 3 || hex.length === 4) { + const [r, g, b, a = "f"] = hex.split(""); + return { + r: Number.parseInt(`${r}${r}`, 16), + g: Number.parseInt(`${g}${g}`, 16), + b: Number.parseInt(`${b}${b}`, 16), + a: Number.parseInt(`${a}${a}`, 16) / 255 + }; + } + + if (hex.length === 6 || hex.length === 8) { + return { + r: Number.parseInt(hex.slice(0, 2), 16), + g: Number.parseInt(hex.slice(2, 4), 16), + b: Number.parseInt(hex.slice(4, 6), 16), + a: hex.length === 8 ? Number.parseInt(hex.slice(6, 8), 16) / 255 : 1 + }; + } + + return null; + } + + function rgbToHsl(color) { + if (!color) { + return null; + } + + const r = color.r / 255; + const g = color.g / 255; + const b = color.b / 255; + const max = Math.max(r, g, b); + const min = Math.min(r, g, b); + const delta = max - min; + const lightness = (max + min) / 2; + let hue = 0; + let saturation = 0; + + if (delta > 0) { + saturation = delta / (1 - Math.abs(2 * lightness - 1)); + if (max === r) { + hue = 60 * (((g - b) / delta) % 6); + } else if (max === g) { + hue = 60 * (((b - r) / delta) + 2); + } else { + hue = 60 * (((r - g) / delta) + 4); + } + } + + if (hue < 0) { + hue += 360; + } + + return { + hue, + lightness, + saturation + }; + } + + function isTrustedHtmlRequirementError(error) { + return /TrustedHTML/iu.test(String(error?.message || error || "")); + } + + function joinBlocks(blocks) { + return blocks + .map((block) => String(block || "").trim()) + .filter(Boolean) + .join("\n\n") + .trim(); + } + + function cleanReadableMarkdown(value) { + const lines = String(value || "") + .replace(/[\s\S]*?<\/style\\?>/giu, "") + .replace(/[\s\S]*?<\/script\\?>/giu, "") + .replace(//giu, "") + .split("\n"); + + const filteredLines = []; + let insideCodeFence = false; + + lines.forEach((line) => { + const trimmedLine = 
String(line || "").trim(); + if (trimmedLine.startsWith("```")) { + insideCodeFence = !insideCodeFence; + filteredLines.push(line); + return; + } + + if (!trimmedLine || insideCodeFence) { + filteredLines.push(line); + return; + } + + if (shouldDropReadableText(trimmedLine)) { + return; + } + + filteredLines.push(line); + }); + + return filteredLines + .join("\n") + .replace(/\n{3,}/gu, "\n\n") + .trim(); + } + + function joinInlineParts(parts) { + return String(parts + .map((part) => String(part || "").trim()) + .filter(Boolean) + .join(" ")) + .replace(/\s+([,.;!?])/gu, "$1") + .replace(/([([{\u201c])\s+/gu, "$1") + .replace(/\s+([\])}\u201d])/gu, "$1") + .replace(/\s*\n\s*/gu, "\n") + .replace(/[ \t]+\n/gu, "\n") + .replace(/\n{3,}/gu, "\n\n") + .trim(); + } + + function indentBlock(text, level = 1) { + const prefix = " ".repeat(Math.max(0, level)); + return String(text || "") + .split("\n") + .map((line) => `${prefix}${line}`) + .join("\n"); + } + + function createNamedError(name, message, details = {}) { + const error = new Error(message); + error.name = name; + Object.assign(error, details); + return error; + } + + function coerceSelectorList(payload) { + if (typeof payload === "string") { + return [payload]; + } + + if (Array.isArray(payload?.selectors)) { + return payload.selectors; + } + + if (typeof payload?.selectors === "string") { + return [payload.selectors]; + } + + if (Array.isArray(payload?.selector)) { + return payload.selector; + } + + if (typeof payload?.selector === "string") { + return [payload.selector]; + } + + if (Array.isArray(payload)) { + return payload; + } + + return []; + } + + function normalizeSelectorList(payload) { + return coerceSelectorList(payload) + .map((selector) => String(selector || "").trim()) + .filter(Boolean); + } + + function normalizeIncludeLinkUrls(payload) { + return payload?.includeLinkUrls === true; + } + + function normalizeIncludeLabelQuotes(payload) { + return payload?.includeLabelQuotes === true; + } + + 
function normalizeIncludeListIndentation(payload) { + return payload?.includeListIndentation !== false; + } + + function normalizeIncludeListMarkers(payload) { + return payload?.includeListMarkers === true; + } + + function normalizeIncludeStateTags(payload) { + return payload?.includeStateTags !== false; + } + + function normalizeIncludeSemanticTags(payload) { + return payload?.includeSemanticTags !== false; + } + + function formatSummaryValue(value, options = {}) { + const normalizedValue = normalizeText(value); + if (!normalizedValue) { + return ""; + } + + if (options.includeLabelQuotes === true) { + return quoteText(normalizedValue); + } + + return escapeMarkdownText(normalizedValue); + } + + function normalizeFrameChain(value) { + const rawFrameChain = Array.isArray(value) + ? value + : typeof value === "string" + ? value.split(">") + : []; + + return rawFrameChain + .map((entry) => String(entry || "").trim()) + .filter(Boolean); + } + + function getDomHelper() { + const helper = globalThis[DOM_HELPER_KEY]; + if ( + helper + && typeof helper.captureDocument === "function" + && typeof helper.detailNode === "function" + && typeof helper.clickNode === "function" + && typeof helper.typeNode === "function" + && typeof helper.submitNode === "function" + && typeof helper.typeSubmitNode === "function" + && typeof helper.scrollNode === "function" + ) { + return helper; + } + + return null; + } + + function requireDomHelper(actionLabel) { + const helper = getDomHelper(); + if (helper) { + return helper; + } + + throw createNamedError( + "BrowserPageContentHelperUnavailableError", + `Browser page content cannot ${actionLabel} without the desktop DOM helper.`, + { + code: "browser_page_content_dom_helper_unavailable", + details: { + action: String(actionLabel || "resolve") + } + } + ); + } + + function normalizeReferenceId(value) { + if (typeof value === "number" && Number.isFinite(value)) { + return String(Math.trunc(value)); + } + + if (typeof value === "string") { + 
return value.trim(); + } + + if (value && typeof value === "object") { + return normalizeReferenceId(value.referenceId ?? value.ref ?? value.id); + } + + return ""; + } + + function getTagName(element) { + return String(element?.tagName || "").toUpperCase(); + } + + function getAttributeNamesSafe(element) { + try { + if (typeof element?.getAttributeNames === "function") { + return element.getAttributeNames(); + } + + return [...(element?.attributes || [])] + .map((attribute) => String(attribute?.name || "").trim()) + .filter(Boolean); + } catch { + return []; + } + } + + function normalizeInteractiveEventName(value) { + return String(value || "") + .trim() + .toLowerCase() + .split(/[.:]/u, 1)[0]; + } + + function isInteractiveEventName(value) { + return INTERACTIVE_EVENT_NAMES.has(normalizeInteractiveEventName(value)); + } + + function isInteractiveEventAttributeName(attributeName) { + const normalizedName = String(attributeName || "").trim().toLowerCase(); + if (!normalizedName) { + return false; + } + + if (normalizedName.startsWith("@")) { + return isInteractiveEventName(normalizedName.slice(1)); + } + + if (normalizedName.startsWith("x-on:") || normalizedName.startsWith("v-on:")) { + return isInteractiveEventName(normalizedName.slice(5)); + } + + if (normalizedName.startsWith("ng-")) { + return isInteractiveEventName(normalizedName.slice(3)); + } + + if (normalizedName.startsWith("on") && normalizedName.length > 2) { + return isInteractiveEventName(normalizedName.slice(2)); + } + + return false; + } + + function hasHelperManagedNodeReference(element) { + return Boolean(normalizeAttributeText(element?.getAttribute?.("data-space-browser-node-id"))); + } + + function hasInteractiveEventHandlerAttribute(element) { + return getAttributeNamesSafe(element).some((attributeName) => { + return isInteractiveEventAttributeName(attributeName); + }); + } + + function hasInteractiveEventHandlerProperty(element) { + return INTERACTIVE_EVENT_PROPERTIES.some((propertyName) => { 
+ return typeof element?.[propertyName] === "function"; + }); + } + + function hasInteractiveEventHandler(element) { + return hasInteractiveEventHandlerAttribute(element) || hasInteractiveEventHandlerProperty(element); + } + + function isStyleDeclarationHidden(styleValue) { + const normalizedStyleValue = String(styleValue || "") + .toLowerCase() + .replace(/\s+/gu, ""); + + if (!normalizedStyleValue) { + return false; + } + + return /(?:^|;)display:none(?:;|$)/u.test(normalizedStyleValue) + || /(?:^|;)visibility:hidden(?:;|$)/u.test(normalizedStyleValue) + || /(?:^|;)visibility:collapse(?:;|$)/u.test(normalizedStyleValue) + || /(?:^|;)content-visibility:hidden(?:;|$)/u.test(normalizedStyleValue) + || /(?:^|;)opacity:0(?:\.0+)?(?:;|$)/u.test(normalizedStyleValue); + } + + function isComputedStyleHidden(computedStyle) { + if (!computedStyle) { + return false; + } + + const display = normalizeText(computedStyle.display).toLowerCase(); + const visibility = normalizeText(computedStyle.visibility).toLowerCase(); + const contentVisibility = normalizeText(computedStyle.contentVisibility).toLowerCase(); + const opacity = Number(computedStyle.opacity || 1); + + return display === "none" + || visibility === "hidden" + || visibility === "collapse" + || contentVisibility === "hidden" + || opacity <= 0; + } + + function isEffectivelyHiddenByAncestor(element) { + let current = element; + + while (isElementNode(current)) { + if (current.hidden || current.getAttribute?.("aria-hidden") === "true") { + return true; + } + + if (isStyleDeclarationHidden(current.getAttribute?.("style"))) { + return true; + } + + if (isComputedStyleHidden(getComputedStyleSafe(current))) { + return true; + } + + current = current.parentElement; + } + + return false; + } + + function isHiddenElement(element) { + if (!isElementNode(element)) { + return true; + } + + const tagName = getTagName(element); + if (SKIP_TAGS.has(tagName)) { + return true; + } + + if (element.hidden || 
element.getAttribute?.("aria-hidden") === "true") { + return true; + } + + if (tagName === "INPUT" && String(element.getAttribute?.("type") || "").toLowerCase() === "hidden") { + return true; + } + + if (isStyleDeclarationHidden(element.getAttribute?.("style"))) { + return true; + } + + const computedStyle = getComputedStyleSafe(element); + if (isComputedStyleHidden(computedStyle)) { + return true; + } + + return isEffectivelyHiddenByAncestor(element.parentElement); + } + + function isBlockElement(element) { + return BLOCK_TAGS.has(getTagName(element)); + } + + function isInteractiveElement(element) { + if (!isElementNode(element) || isHiddenElement(element)) { + return false; + } + + if (hasHelperManagedNodeReference(element)) { + return true; + } + + const tagName = getTagName(element); + if (tagName === "A" && element.hasAttribute?.("href")) { + return true; + } + + if (tagName === "BUTTON" || tagName === "INPUT" || tagName === "SELECT" || tagName === "TEXTAREA" || tagName === "SUMMARY") { + return true; + } + + if (String(element.getAttribute?.("contenteditable") || "").toLowerCase() === "true") { + return true; + } + + const role = String(element.getAttribute?.("role") || "").trim().toLowerCase(); + return INTERACTIVE_ROLES.has(role) || hasInteractiveEventHandler(element); + } + + function getComputedStyleSafe(element) { + try { + return globalThis.getComputedStyle?.(element) || null; + } catch { + return null; + } + } + + function getElementRectSafe(element) { + try { + const rect = element?.getBoundingClientRect?.(); + if (!rect) { + return null; + } + + return { + height: Number(rect.height) || 0, + width: Number(rect.width) || 0, + x: Number(rect.x) || 0, + y: Number(rect.y) || 0 + }; + } catch { + return null; + } + } + + function readSerializedTagList(element, attributeName) { + const rawValue = normalizeText(element?.getAttribute?.(attributeName)); + if (!rawValue) { + return []; + } + + return rawValue + .split(/\s+/u) + .map((part) => 
normalizeText(part)) + .filter(Boolean); + } + + function detectSemanticTone(element, computedStyle, metadata = {}) { + const opacity = Number(computedStyle?.opacity || 1); + const backgroundColor = parseCssColor(computedStyle?.backgroundColor || ""); + const borderColor = parseCssColor(computedStyle?.borderTopColor || ""); + const foregroundColor = parseCssColor(computedStyle?.color || ""); + const isButtonLike = ["BUTTON", "INPUT", "SUMMARY"].includes(getTagName(element)) + || ["button", "tab", "menuitem"].includes(String(element?.getAttribute?.("role") || "").trim().toLowerCase()); + + if (metadata.disabled || metadata.blocked || opacity <= 0.58) { + return "muted"; + } + + const preferredColor = [backgroundColor, borderColor, foregroundColor] + .filter((color) => color && color.a > 0.15) + .map((color) => ({ + color, + hsl: rgbToHsl(color) + })) + .find((entry) => entry.hsl && entry.hsl.saturation >= 0.2); + + if (!preferredColor) { + return ""; + } + + const { + hue, + lightness, + saturation + } = preferredColor.hsl; + if (saturation < 0.2) { + return ""; + } + + if ((hue >= 345 || hue < 20) && lightness >= 0.18 && lightness <= 0.82) { + return "error"; + } + + if (hue >= 20 && hue < 65 && lightness >= 0.2 && lightness <= 0.9) { + return "warning"; + } + + if (hue >= 65 && hue < 170 && lightness >= 0.16 && lightness <= 0.84) { + return "success"; + } + + if (hue >= 170 && hue < 280 && lightness >= 0.14 && lightness <= 0.82) { + if (isButtonLike && backgroundColor?.a > 0.2) { + return "primary"; + } + return ""; + } + + return ""; + } + + function collectElementStateMetadata(element, options = {}) { + if (!isElementNode(element)) { + return { + descriptorTags: [], + semanticTags: [], + stateTags: [] + }; + } + + const computedStyle = getComputedStyleSafe(element); + const rect = getElementRectSafe(element); + const tagName = getTagName(element); + const ariaDisabled = String(element.getAttribute?.("aria-disabled") || "").trim().toLowerCase() === "true"; + 
const ariaBusy = String(element.getAttribute?.("aria-busy") || "").trim().toLowerCase() === "true"; + const ariaChecked = String(element.getAttribute?.("aria-checked") || "").trim().toLowerCase() === "true"; + const ariaCurrent = normalizeText(element.getAttribute?.("aria-current")); + const ariaInvalid = String(element.getAttribute?.("aria-invalid") || "").trim().toLowerCase() === "true"; + const ariaPressed = String(element.getAttribute?.("aria-pressed") || "").trim().toLowerCase() === "true"; + const ariaReadonly = String(element.getAttribute?.("aria-readonly") || "").trim().toLowerCase() === "true"; + const ariaRequired = String(element.getAttribute?.("aria-required") || "").trim().toLowerCase() === "true"; + const ariaSelected = String(element.getAttribute?.("aria-selected") || "").trim().toLowerCase() === "true"; + const helperStateTags = readSerializedTagList(element, "data-space-browser-state-tags"); + const helperSemanticTags = readSerializedTagList(element, "data-space-browser-semantic-tags"); + const closestInert = typeof element.closest === "function" ? element.closest("[inert]") : null; + const pointerEventsNone = normalizeText(computedStyle?.pointerEvents || "").toLowerCase() === "none"; + const disabled = Boolean(element.disabled || ariaDisabled || closestInert || helperStateTags.includes("disabled")); + const blocked = !disabled && (pointerEventsNone || helperStateTags.includes("blocked")); + const checked = Boolean(element.checked || ariaChecked || helperStateTags.includes("checked")); + const selected = tagName === "OPTION" + ? 
Boolean(element.selected) + : Boolean(ariaSelected || helperStateTags.includes("selected")); + const invalid = Boolean(ariaInvalid || helperStateTags.includes("invalid") || element.matches?.(":invalid")); + const readonly = Boolean(element.readOnly || ariaReadonly); + const required = Boolean(element.required || ariaRequired); + const expanded = String(element.getAttribute?.("aria-expanded") || "").trim().toLowerCase() === "true" || helperStateTags.includes("expanded"); + const pressed = ariaPressed || helperStateTags.includes("pressed"); + const busy = ariaBusy || helperStateTags.includes("busy"); + const current = Boolean((ariaCurrent && ariaCurrent !== "false") || helperStateTags.includes("current")); + const zeroRect = Boolean( + rect + && element.ownerDocument === globalThis.document + && rect.width <= 1 + && rect.height <= 1 + ); + const opacity = Number(computedStyle?.opacity || 1); + const semanticTone = helperSemanticTags[0] || detectSemanticTone(element, computedStyle, { + blocked, + disabled + }); + const stateTags = helperStateTags.length + ? helperStateTags.slice() + : [ + disabled ? "disabled" : "", + !disabled && (blocked || zeroRect) ? "blocked" : "", + checked ? "checked" : "", + selected && tagName !== "SELECT" ? "selected" : "", + invalid ? "invalid" : "", + expanded ? "expanded" : "", + pressed ? "pressed" : "" + ].filter(Boolean); + + const semanticTags = helperSemanticTags.length + ? helperSemanticTags.slice(0, 1) + : (semanticTone ? [semanticTone] : []); + const descriptorTags = [ + ...(options.includeStateTags !== false ? stateTags : []), + ...(options.includeSemanticTags !== false ? 
semanticTags : []) + ]; + + return { + blocked, + busy, + checked, + current, + cursor: normalizeText(computedStyle?.cursor || "").toLowerCase(), + descriptorTags, + disabled, + expanded, + invalid, + opacity, + pointerEventsNone, + pressed, + readonly, + required, + selected, + semanticTags, + semanticTone, + stateTags, + visible: !isHiddenElement(element), + zeroRect + }; + } + + function getReferenceValueMetadata(element) { + const tagName = getTagName(element); + const helperLiveValue = normalizeText(element?.getAttribute?.("data-space-browser-live-value")); + const helperSelectedValue = normalizeText(element?.getAttribute?.("data-space-browser-selected-text")); + if (tagName === "INPUT") { + const inputType = String(element.getAttribute?.("type") || element.type || "text").toLowerCase(); + if (inputType === "password") { + return ""; + } + return truncateText(helperLiveValue || element.value || element.getAttribute?.("value") || "", 96); + } + + if (tagName === "TEXTAREA") { + return truncateText(helperLiveValue || element.value || "", 96); + } + + if (tagName === "SELECT") { + if (helperSelectedValue) { + return helperSelectedValue; + } + const selectedOptions = [...(element.selectedOptions || [])] + .map((option) => truncateText(option.textContent || option.label || option.value || "", 48)) + .filter(Boolean); + return selectedOptions.join(" | "); + } + + if (String(element.getAttribute?.("contenteditable") || "").toLowerCase() === "true") { + return truncateText(element.textContent || "", 96); + } + + return ""; + } + + function collectMetaLines(doc = globalThis.document) { + const lines = []; + const title = normalizeAttributeText(doc?.title || ""); + const description = normalizeAttributeText( + doc?.querySelector?.('meta[name="description"]')?.getAttribute?.("content") || "" + ); + const url = String(globalThis.location?.href || ""); + + if (!title && !description && !url) { + return ""; + } + + lines.push("---"); + if (title) { + lines.push(`title: 
${quoteText(title)}`); + } + if (description) { + lines.push(`description: ${quoteText(description)}`); + } + if (url) { + lines.push(`url: ${quoteText(url)}`); + } + lines.push("---"); + return lines.join("\n"); + } + + function summarizeUrl(value) { + const normalizedValue = String(value || "").trim(); + if (!normalizedValue) { + return ""; + } + + try { + const url = new URL(normalizedValue, globalThis.location?.href || "http://localhost/"); + if (url.origin === globalThis.location?.origin) { + const relative = `${url.pathname || "/"}${url.search || ""}${url.hash || ""}`; + return truncateText(relative || "/", 96); + } + + return truncateText(`${url.hostname}${url.pathname || "/"}`, 96); + } catch { + return truncateText(normalizedValue, 96); + } + } + + function getElementText(element) { + return normalizeText(element?.textContent || ""); + } + + function collectLabelCandidates(element, options = {}) { + const includeAlt = options.includeAlt !== false; + const includeDescendantImageAlt = options.includeDescendantImageAlt !== false; + const includePlaceholder = options.includePlaceholder === true; + const includeText = options.includeText !== false; + const collectedLabels = []; + + try { + if (Array.isArray(element?.labels) || typeof element?.labels?.forEach === "function") { + element.labels.forEach((labelElement) => { + const text = getElementText(labelElement); + if (text) { + collectedLabels.push(text); + } + }); + } + } catch { + // Ignore labels lookup failures from non-form elements. 
+ } + + [ + element?.getAttribute?.("aria-label"), + element?.getAttribute?.("title") + ].forEach((candidate) => { + const text = normalizeAttributeText(candidate); + if (text) { + collectedLabels.push(text); + } + }); + + if (includeAlt) { + const altText = normalizeAttributeText(element?.getAttribute?.("alt")); + if (altText) { + collectedLabels.push(altText); + } + } + + if (includePlaceholder) { + const placeholderText = normalizeAttributeText(element?.getAttribute?.("placeholder")); + if (placeholderText) { + collectedLabels.push(placeholderText); + } + } + + if (includeDescendantImageAlt) { + try { + [...(element?.querySelectorAll?.("img[alt], img[title]") || [])] + .slice(0, 3) + .forEach((mediaElement) => { + const text = normalizeAttributeText( + mediaElement.getAttribute?.("alt") + || mediaElement.getAttribute?.("title") + ); + if (text) { + collectedLabels.push(text); + } + }); + } catch { + // Ignore descendant-media lookup failures. + } + } + + if (includeText) { + const textContent = getElementText(element); + if (textContent) { + collectedLabels.push(textContent); + } + } + + return [...new Set(collectedLabels.filter(Boolean))]; + } + + function getLabelText(element, options = {}) { + return collectLabelCandidates(element, options)[0] || ""; + } + + function serializeElementSnapshot(element) { + if (!isElementNode(element)) { + return ""; + } + + try { + if (typeof element.outerHTML === "string" && element.outerHTML) { + return element.outerHTML; + } + } catch { + // Fall through to XMLSerializer. + } + + try { + if (typeof globalThis.XMLSerializer === "function") { + return new globalThis.XMLSerializer().serializeToString(element); + } + } catch { + // Ignore serialization errors. 
+ } + + return ""; + } + + function getReferenceKind(element) { + const tagName = getTagName(element); + const role = String(element.getAttribute?.("role") || "").trim().toLowerCase(); + const inputType = String(element.getAttribute?.("type") || element.type || "text").toLowerCase(); + + if (tagName === "A" || role === "link") { + return "link"; + } + + if (tagName === "IMG") { + return "image"; + } + + if (tagName === "BUTTON" || ["button", "menuitem", "tab"].includes(role)) { + return "button"; + } + + if (tagName === "TEXTAREA") { + return "textarea"; + } + + if (tagName === "SELECT" || role === "combobox") { + return "select"; + } + + if (tagName === "SUMMARY") { + return "summary"; + } + + if (tagName === "INPUT") { + if (["button", "submit", "reset"].includes(inputType)) { + return "button"; + } + + if (inputType === "checkbox") { + return "checkbox"; + } + + if (inputType === "radio") { + return "radio"; + } + + return `input ${inputType || "text"}`; + } + + if (String(element.getAttribute?.("contenteditable") || "").toLowerCase() === "true") { + return "editable"; + } + + if (role === "searchbox") { + return "input search"; + } + + if (role === "textbox") { + return "input text"; + } + + if (hasHelperManagedNodeReference(element) || hasInteractiveEventHandler(element)) { + return "button"; + } + + return role || tagName.toLowerCase(); + } + + function collectReferenceSummaryData(element, options = {}) { + const tagName = getTagName(element); + const role = String(element.getAttribute?.("role") || "").trim().toLowerCase(); + const id = normalizeAttributeText(element.getAttribute?.("id")); + const name = normalizeAttributeText(element.getAttribute?.("name")); + const kind = getReferenceKind(element); + const stateMetadata = collectElementStateMetadata(element, options); + const formatValue = (value) => formatSummaryValue(value, options); + const includeLinkUrls = options.includeLinkUrls === true; + const parts = []; + const appendFallbackIdOrName = () => { + 
if (id) { + parts.push(`#${id}`); + return; + } + + if (name) { + parts.push(`name=${formatValue(name)}`); + } + }; + + if (tagName === "A" || role === "link") { + const hrefSummary = summarizeUrl(element.getAttribute?.("href") || element.href || ""); + const label = truncateText(getLabelText(element, { + includeAlt: false, + includeDescendantImageAlt: true, + includePlaceholder: false, + includeText: true + }), 120); + const displayLabel = label || hrefSummary; + + if (displayLabel) { + parts.push(formatValue(displayLabel)); + } else { + appendFallbackIdOrName(); + } + + if (includeLinkUrls) { + if (hrefSummary && hrefSummary !== displayLabel) { + parts.push(`-> ${hrefSummary}`); + } + } + } else if (tagName === "BUTTON" || ["button", "menuitem", "tab"].includes(role)) { + const label = truncateText(getLabelText(element, { + includeAlt: false, + includeDescendantImageAlt: true, + includePlaceholder: false, + includeText: true + }), 120); + if (label) { + parts.push(formatValue(label)); + } else { + appendFallbackIdOrName(); + } + } else if (tagName === "TEXTAREA" || role === "textbox" || role === "searchbox") { + const label = truncateText(getLabelText(element, { + includeAlt: false, + includeDescendantImageAlt: false, + includePlaceholder: false, + includeText: true + }), 120); + if (label) { + parts.push(formatValue(label)); + } + const placeholder = normalizeAttributeText(element.getAttribute?.("placeholder")); + if (placeholder) { + parts.push(`placeholder=${formatValue(placeholder)}`); + } else if (!label) { + appendFallbackIdOrName(); + } + } else if (tagName === "SELECT" || role === "combobox") { + const label = truncateText(getLabelText(element, { + includeAlt: false, + includeDescendantImageAlt: false, + includePlaceholder: false, + includeText: true + }), 120); + if (label) { + parts.push(formatValue(label)); + } else { + appendFallbackIdOrName(); + } + + const selectedValue = getReferenceValueMetadata(element); + const selectedOptions = selectedValue + 
? [selectedValue] + : [...(element.selectedOptions || [])] + .map((option) => truncateText(option.textContent || "", 48)) + .filter(Boolean); + if (selectedOptions.length) { + parts.push(`selected=${formatValue(selectedOptions.join(" | "))}`); + } + } else if (tagName === "SUMMARY") { + const label = truncateText(getLabelText(element, { + includeAlt: false, + includeDescendantImageAlt: true, + includePlaceholder: false, + includeText: true + }), 120); + if (label) { + parts.push(formatValue(label)); + } else { + appendFallbackIdOrName(); + } + } else if (tagName === "INPUT") { + const inputType = String(element.getAttribute?.("type") || element.type || "text").toLowerCase(); + if (["button", "submit", "reset"].includes(inputType)) { + const label = truncateText(getLabelText(element, { + includeAlt: false, + includeDescendantImageAlt: false, + includePlaceholder: false, + includeText: false + }) || element.value || "", 120); + if (label) { + parts.push(formatValue(label)); + } else { + appendFallbackIdOrName(); + } + } else if (["checkbox", "radio"].includes(inputType)) { + const label = truncateText(getLabelText(element, { + includeAlt: false, + includeDescendantImageAlt: false, + includePlaceholder: false, + includeText: false + }), 120); + if (label) { + parts.push(formatValue(label)); + } else { + appendFallbackIdOrName(); + } + } else if (inputType === "file") { + const label = truncateText(getLabelText(element, { + includeAlt: false, + includeDescendantImageAlt: false, + includePlaceholder: false, + includeText: false + }), 120); + if (label) { + parts.push(formatValue(label)); + } else { + appendFallbackIdOrName(); + } + } else { + const label = truncateText(getLabelText(element, { + includeAlt: false, + includeDescendantImageAlt: false, + includePlaceholder: false, + includeText: false + }), 120); + if (label) { + parts.push(formatValue(label)); + } + + const placeholder = normalizeAttributeText(element.getAttribute?.("placeholder")); + const value = 
inputType === "password" + ? "" + : getReferenceValueMetadata(element); + + if (placeholder) { + parts.push(`placeholder=${formatValue(placeholder)}`); + } + if (value) { + parts.push(`value=${formatValue(value)}`); + } + if (!label && !placeholder && !value) { + appendFallbackIdOrName(); + } + } + } else if (String(element.getAttribute?.("contenteditable") || "").toLowerCase() === "true") { + const label = truncateText(getLabelText(element, { + includeAlt: false, + includeDescendantImageAlt: false, + includePlaceholder: false, + includeText: true + }), 120); + if (label) { + parts.push(formatValue(label)); + } else { + appendFallbackIdOrName(); + } + } else if (tagName === "IMG") { + const srcSummary = summarizeUrl(element.currentSrc || element.getAttribute?.("src") || element.src || ""); + const label = truncateText(getLabelText(element, { + includeAlt: true, + includeDescendantImageAlt: false, + includePlaceholder: false, + includeText: false + }), 120); + const displayLabel = label || srcSummary; + if (displayLabel) { + parts.push(formatValue(displayLabel)); + } else { + appendFallbackIdOrName(); + } + } else if (role) { + const label = truncateText(getLabelText(element, { + includeAlt: false, + includeDescendantImageAlt: true, + includePlaceholder: false, + includeText: true + }), 120); + if (label) { + parts.push(formatValue(label)); + } else { + appendFallbackIdOrName(); + } + } else { + const label = truncateText(getLabelText(element, { + includeAlt: false, + includeDescendantImageAlt: true, + includePlaceholder: false, + includeText: true + }), 120); + if (label) { + parts.push(formatValue(label)); + } else { + appendFallbackIdOrName(); + } + } + + return { + descriptorTags: stateMetadata.descriptorTags.slice(), + kind, + semanticTags: stateMetadata.semanticTags.slice(), + state: stateMetadata, + summary: parts.filter(Boolean).join(" ") + }; + } + + function createReferenceEntry(element, referenceId, options = {}) { + const nodeId = 
normalizeAttributeText(element.getAttribute?.("data-space-browser-node-id")); + const frameId = normalizeAttributeText(element.getAttribute?.("data-space-browser-frame-id")); + const frameChain = normalizeFrameChain(element.getAttribute?.("data-space-browser-frame-chain")); + const helperBacked = Boolean(nodeId && frameChain.length); + const summaryData = collectReferenceSummaryData(element, options); + + return { + connected: helperBacked ? true : element.isConnected !== false, + dom: serializeElementSnapshot(element), + descriptorTags: summaryData.descriptorTags, + element: helperBacked ? null : element, + frameChain, + frameId, + helperBacked, + id: normalizeAttributeText(element.getAttribute?.("id")), + name: normalizeAttributeText(element.getAttribute?.("name")), + nodeId, + referenceId, + kind: summaryData.kind, + semanticTags: summaryData.semanticTags, + state: summaryData.state, + summary: summaryData.summary, + tagName: getTagName(element) + }; + } + + function ensureReference(element, context) { + if (context.referenceIdsByElement.has(element)) { + return context.referenceIdsByElement.get(element); + } + + const referenceId = String(context.nextReferenceId++); + const entry = createReferenceEntry(element, referenceId, context.options); + context.referenceIdsByElement.set(element, referenceId); + context.entries.set(referenceId, entry); + return referenceId; + } + + function renderReference(element, context) { + const referenceId = ensureReference(element, context); + const entry = context.entries.get(referenceId); + const kind = normalizeText(entry?.kind || getTagName(element).toLowerCase()); + const descriptorTags = Array.isArray(entry?.descriptorTags) + ? entry.descriptorTags.map((tag) => normalizeText(tag)).filter(Boolean) + : []; + const summary = normalizeText(entry?.summary || ""); + const descriptor = [...descriptorTags, kind, referenceId].filter(Boolean).join(" "); + return summary ? 
`[${descriptor}] ${summary}` : `[${descriptor}]`; + } + + function isReferenceableElement(element) { + return isInteractiveElement(element) || getTagName(element) === "IMG"; + } + + function renderInlineNode(node, context) { + if (isTextNode(node)) { + const textContent = normalizeText(node.textContent || ""); + if (shouldDropReadableText(textContent)) { + return ""; + } + + return escapeMarkdownText(textContent); + } + + if (!isElementNode(node) || isHiddenElement(node)) { + return ""; + } + + if (isReferenceableElement(node)) { + return renderReference(node, context); + } + + const tagName = getTagName(node); + + if (tagName === "LABEL" && (node.getAttribute?.("for") || node.querySelector?.("input, textarea, select, button"))) { + return ""; + } + + if (tagName === "BR") { + return "\n"; + } + + if (tagName === "STRONG" || tagName === "B") { + const content = renderInlineChildren(node, context); + return content ? `**${content}**` : ""; + } + + if (tagName === "EM" || tagName === "I") { + const content = renderInlineChildren(node, context); + return content ? `*${content}*` : ""; + } + + if (tagName === "S" || tagName === "STRIKE" || tagName === "DEL") { + const content = renderInlineChildren(node, context); + return content ? `~~${content}~~` : ""; + } + + if (tagName === "CODE") { + const content = normalizeText(node.textContent || ""); + return content ? 
`\`${content.replace(/`/gu, "\\`")}\`` : ""; + } + + return renderInlineChildren(node, context); + } + + function renderInlineChildren(element, context) { + const parts = []; + + element.childNodes.forEach((childNode) => { + const renderedChild = renderInlineNode(childNode, context); + if (renderedChild) { + parts.push(renderedChild); + } + }); + + return joinInlineParts(parts); + } + + function renderParagraph(element, context) { + return renderInlineChildren(element, context); + } + + function renderHeading(element, context) { + const level = Math.min(6, Math.max(1, Number.parseInt(getTagName(element).slice(1), 10) || 1)); + const content = renderInlineChildren(element, context); + return content ? `${"#".repeat(level)} ${content}` : ""; + } + + function renderCodeBlock(element) { + const content = String(element.textContent || "").trimEnd(); + if (!content) { + return ""; + } + + return `\`\`\`\n${content.replace(/```/gu, "\\`\\`\\`")}\n\`\`\``; + } + + function renderBlockquote(element, context) { + const content = renderBlockChildren(element, context); + if (!content) { + return ""; + } + + return content + .split("\n") + .map((line) => `> ${line}`) + .join("\n"); + } + + function renderListItem(element, context, depth, index, ordered) { + const includeListMarkers = context.options.includeListMarkers === true; + const includeListIndentation = context.options.includeListIndentation !== false; + const marker = includeListMarkers ? (ordered ? `${index + 1}.` : "-") : ""; + const indentation = includeListIndentation ? 
" ".repeat(Math.max(0, depth)) : ""; + const inlineParts = []; + const nestedBlocks = []; + + element.childNodes.forEach((childNode) => { + if (isElementNode(childNode) && (getTagName(childNode) === "UL" || getTagName(childNode) === "OL")) { + const nestedList = renderList(childNode, context, depth + 1); + if (nestedList) { + nestedBlocks.push(nestedList); + } + return; + } + + const renderedChild = renderInlineNode(childNode, context); + if (renderedChild) { + inlineParts.push(renderedChild); + } + }); + + const head = joinInlineParts(inlineParts); + const linePrefix = marker ? `${indentation}${marker} ` : indentation; + const lines = [`${linePrefix}${head || "(empty)"}`]; + nestedBlocks.forEach((nestedBlock) => { + lines.push(indentBlock(nestedBlock, includeListIndentation ? 1 : 0)); + }); + return lines.join("\n"); + } + + function renderList(element, context, depth = 0) { + const ordered = getTagName(element) === "OL"; + return [...element.children] + .filter((child) => getTagName(child) === "LI" && !isHiddenElement(child)) + .map((item, index) => renderListItem(item, context, depth, index, ordered)) + .filter(Boolean) + .join("\n"); + } + + function renderTableCell(element, context) { + return renderInlineChildren(element, context); + } + + function renderTable(element, context) { + const rows = [...element.querySelectorAll?.(":scope > thead > tr, :scope > tbody > tr, :scope > tr, :scope > tfoot > tr") || []] + .filter((row) => getTagName(row) === "TR"); + + if (!rows.length) { + return ""; + } + + const renderedRows = rows.map((row) => { + return [...row.children] + .filter((cell) => ["TD", "TH"].includes(getTagName(cell)) && !isHiddenElement(cell)) + .map((cell) => renderTableCell(cell, context)); + }).filter((cells) => cells.length); + + if (!renderedRows.length) { + return ""; + } + + const columnCount = Math.max(...renderedRows.map((cells) => cells.length)); + const normalizedRows = renderedRows.map((cells) => { + const nextCells = cells.slice(); + while 
(nextCells.length < columnCount) { + nextCells.push(""); + } + return nextCells; + }); + + const headerRow = normalizedRows[0]; + const separatorRow = headerRow.map(() => "---"); + const tableLines = [ + `| ${headerRow.join(" | ")} |`, + `| ${separatorRow.join(" | ")} |` + ]; + + normalizedRows.slice(1).forEach((row) => { + tableLines.push(`| ${row.join(" | ")} |`); + }); + + return tableLines.join("\n"); + } + + function renderGenericContainer(element, context) { + return renderBlockChildren(element, context); + } + + function renderElementAsBlock(element, context) { + if (!isElementNode(element) || isHiddenElement(element)) { + return ""; + } + + if (isReferenceableElement(element)) { + return renderReference(element, context); + } + + const tagName = getTagName(element); + + if (tagName === "LABEL" && (element.getAttribute?.("for") || element.querySelector?.("input, textarea, select, button"))) { + return ""; + } + + if (/^H[1-6]$/u.test(tagName)) { + return renderHeading(element, context); + } + + if (tagName === "P") { + return renderParagraph(element, context); + } + + if (tagName === "PRE") { + return renderCodeBlock(element); + } + + if (tagName === "BLOCKQUOTE") { + return renderBlockquote(element, context); + } + + if (tagName === "UL" || tagName === "OL") { + return renderList(element, context); + } + + if (tagName === "TABLE") { + return renderTable(element, context); + } + + if (tagName === "HR") { + return "---"; + } + + return renderGenericContainer(element, context); + } + + function renderBlockChildren(element, context) { + const blocks = []; + const inlineParts = []; + + const flushInlineParts = () => { + const inlineText = joinInlineParts(inlineParts.splice(0, inlineParts.length)); + if (inlineText) { + blocks.push(inlineText); + } + }; + + element.childNodes.forEach((childNode) => { + if (isTextNode(childNode)) { + const rawTextContent = normalizeText(childNode.textContent || ""); + if (shouldDropReadableText(rawTextContent)) { + return; + } + + 
const textContent = escapeMarkdownText(rawTextContent); + if (textContent) { + inlineParts.push(textContent); + } + return; + } + + if (!isElementNode(childNode) || isHiddenElement(childNode)) { + return; + } + + const renderedChild = renderElementAsBlock(childNode, context); + if (!renderedChild) { + return; + } + + if (isBlockElement(childNode) || isReferenceableElement(childNode)) { + flushInlineParts(); + blocks.push(renderedChild); + return; + } + + inlineParts.push(renderedChild); + }); + + flushInlineParts(); + return joinBlocks(blocks); + } + + function createCaptureContext(payload = null) { + return { + entries: new Map(), + nextReferenceId: 1, + options: { + includeLabelQuotes: normalizeIncludeLabelQuotes(payload), + includeLinkUrls: normalizeIncludeLinkUrls(payload), + includeSemanticTags: normalizeIncludeSemanticTags(payload), + includeStateTags: normalizeIncludeStateTags(payload), + includeListIndentation: normalizeIncludeListIndentation(payload), + includeListMarkers: normalizeIncludeListMarkers(payload) + }, + referenceIdsByElement: new WeakMap() + }; + } + + function resolveSelectorTargets(payload, doc = globalThis.document) { + const selectors = normalizeSelectorList(payload); + if (!selectors.length) { + return { + includeMetaData: true, + items: [ + { + key: "document", + targets: [doc?.body || doc?.documentElement].filter(Boolean) + } + ] + }; + } + + return { + includeMetaData: false, + items: selectors.map((selector) => { + let targets = []; + try { + targets = [...(doc?.querySelectorAll?.(selector) || [])]; + } catch (error) { + throw createNamedError( + "BrowserPageContentSelectorError", + `Browser page content could not resolve selector "${selector}".`, + { + code: "browser_page_content_selector_error", + details: { + selector + }, + cause: error + } + ); + } + + return { + key: selector, + targets + }; + }) + }; + } + + function parseSnapshotFragment(html, parser) { + return parser.parseFromString( + `${String(html || "")}`, + "text/html" 
+ ); + } + + function renderSnapshotFragment(html, captureContext, parser) { + const parsedDocument = parseSnapshotFragment(html, parser); + const blocks = []; + const inlineParts = []; + + const flushInlineParts = () => { + const inlineText = joinInlineParts(inlineParts.splice(0, inlineParts.length)); + if (inlineText) { + blocks.push(inlineText); + } + }; + + parsedDocument.body.childNodes.forEach((childNode) => { + if (isTextNode(childNode)) { + const rawTextContent = normalizeText(childNode.textContent || ""); + if (shouldDropReadableText(rawTextContent)) { + return; + } + + const textContent = escapeMarkdownText(rawTextContent); + if (textContent) { + inlineParts.push(textContent); + } + return; + } + + if (!isElementNode(childNode) || isHiddenElement(childNode)) { + return; + } + + const renderedChild = renderElementAsBlock(childNode, captureContext); + if (!renderedChild) { + return; + } + + if (isBlockElement(childNode) || isReferenceableElement(childNode)) { + flushInlineParts(); + blocks.push(renderedChild); + return; + } + + inlineParts.push(renderedChild); + }); + + flushInlineParts(); + return cleanReadableMarkdown(joinBlocks(blocks)); + } + + function captureLive(payload = null) { + const captureContext = createCaptureContext(payload); + const resolvedTargets = resolveSelectorTargets(payload); + const snapshot = {}; + + resolvedTargets.items.forEach((item) => { + const blocks = []; + if (resolvedTargets.includeMetaData && item.key === "document") { + const meta = collectMetaLines(globalThis.document); + if (meta) { + blocks.push(meta); + } + } + + item.targets.forEach((target) => { + const renderedTarget = renderElementAsBlock(target, captureContext); + if (renderedTarget) { + blocks.push(renderedTarget); + } + }); + + snapshot[item.key] = cleanReadableMarkdown(joinBlocks(blocks)); + }); + + state.captureId += 1; + state.capturedAt = Date.now(); + state.backend = "live"; + state.captureOptions = { ...captureContext.options }; + state.entries = 
captureContext.entries; + return snapshot; + } + + async function captureWithDomHelper(payload = null) { + const helper = requireDomHelper("capture content"); + const selectors = normalizeSelectorList(payload); + const helperPayload = { + snapshotMode: "content" + }; + if (selectors.length) { + helperPayload.selectors = selectors; + } + const documentSnapshot = await helper.captureDocument({ + ...helperPayload + }); + const snapshot = {}; + const parser = new globalThis.DOMParser(); + const captureContext = createCaptureContext(payload); + try { + if (selectors.length && documentSnapshot?.targets && typeof documentSnapshot.targets === "object") { + selectors.forEach((selector) => { + snapshot[selector] = renderSnapshotFragment(documentSnapshot.targets?.[selector] || "", captureContext, parser); + }); + + state.captureId += 1; + state.capturedAt = Date.now(); + state.backend = "dom_helper"; + state.captureOptions = { ...captureContext.options }; + state.entries = captureContext.entries; + return snapshot; + } + + const parsedDocument = parser.parseFromString(String(documentSnapshot?.html || ""), "text/html"); + const resolvedTargets = resolveSelectorTargets(payload, parsedDocument); + + resolvedTargets.items.forEach((item) => { + const blocks = []; + if (resolvedTargets.includeMetaData && item.key === "document") { + const meta = collectMetaLines(parsedDocument); + if (meta) { + blocks.push(meta); + } + } + + item.targets.forEach((target) => { + const renderedTarget = renderElementAsBlock(target, captureContext); + if (renderedTarget) { + blocks.push(renderedTarget); + } + }); + + snapshot[item.key] = cleanReadableMarkdown(joinBlocks(blocks)); + }); + + state.captureId += 1; + state.capturedAt = Date.now(); + state.backend = "dom_helper"; + state.captureOptions = { ...captureContext.options }; + state.entries = captureContext.entries; + return snapshot; + } catch (error) { + if (!isTrustedHtmlRequirementError(error)) { + throw error; + } + + return 
captureLive(payload); + } + } + + async function capture(payload = null) { + if (getDomHelper()) { + return captureWithDomHelper(payload); + } + + return captureLive(payload); + } + + function detailLive(entry) { + const liveState = entry.connected && entry.element + ? collectElementStateMetadata(entry.element, state.captureOptions) + : entry.state || collectElementStateMetadata(null); + return { + captureId: state.captureId, + capturedAt: state.capturedAt, + connected: entry.connected, + descriptorTags: liveState.descriptorTags, + dom: entry.connected ? serializeElementSnapshot(entry.element) || entry.dom : entry.dom, + referenceId: entry.referenceId, + semanticTags: liveState.semanticTags, + state: liveState, + summary: entry.summary, + tagName: entry.tagName + }; + } + + async function detail(referenceId) { + const entry = requireReferenceEntry(referenceId, { + actionLabel: "detail", + requireConnected: false + }); + + if (entry.helperBacked) { + const helper = requireDomHelper("resolve detail"); + const resolvedDetail = await helper.detailNode(entry.frameChain, entry.nodeId); + return { + captureId: state.captureId, + capturedAt: state.capturedAt, + connected: resolvedDetail?.connected !== false, + descriptorTags: Array.isArray(resolvedDetail?.descriptorTags) ? resolvedDetail.descriptorTags : (entry.descriptorTags || []), + dom: String(resolvedDetail?.dom || entry.dom || ""), + frameChain: entry.frameChain.slice(), + frameId: entry.frameId, + nodeId: entry.nodeId, + referenceId: entry.referenceId, + semanticTags: Array.isArray(resolvedDetail?.semanticTags) ? 
resolvedDetail.semanticTags : (entry.semanticTags || []), + state: resolvedDetail?.state || entry.state || collectElementStateMetadata(null), + summary: entry.summary, + tagName: String(resolvedDetail?.tagName || entry.tagName || "") + }; + } + + return detailLive(entry); + } + + function requireReferenceEntry(referenceId, options = {}) { + const normalizedReferenceId = normalizeReferenceId(referenceId); + if (!normalizedReferenceId) { + throw createNamedError( + "BrowserPageContentReferenceError", + "Browser page content requests require a reference id.", + { + code: "browser_page_content_reference_required", + details: { + action: String(options.actionLabel || "resolve") + } + } + ); + } + + if (!state.entries.size) { + throw createNamedError( + "BrowserPageContentReferenceError", + `Browser page content has no reference capture for "${normalizedReferenceId}".`, + { + code: "browser_page_content_reference_missing_capture", + details: { + action: String(options.actionLabel || "resolve"), + referenceId: normalizedReferenceId + } + } + ); + } + + const entry = state.entries.get(normalizedReferenceId); + if (!entry) { + throw createNamedError( + "BrowserPageContentReferenceError", + `Browser page content could not find reference "${normalizedReferenceId}".`, + { + code: "browser_page_content_reference_not_found", + details: { + action: String(options.actionLabel || "resolve"), + referenceId: normalizedReferenceId + } + } + ); + } + + refreshReferenceEntry(entry); + + if (options.requireConnected !== false && !entry.connected) { + throw createNamedError( + "BrowserPageContentReferenceError", + `Browser page content reference "${normalizedReferenceId}" is no longer connected.`, + { + code: "browser_page_content_reference_disconnected", + details: { + action: String(options.actionLabel || "resolve"), + referenceId: normalizedReferenceId + } + } + ); + } + + return entry; + } + + function refreshReferenceEntry(entry) { + if (!entry || entry.helperBacked || 
!entry.element) { + return entry; + } + + entry.connected = entry.element.isConnected !== false; + if (entry.connected) { + entry.dom = serializeElementSnapshot(entry.element) || entry.dom; + entry.id = normalizeAttributeText(entry.element.getAttribute?.("id")); + entry.name = normalizeAttributeText(entry.element.getAttribute?.("name")); + const summaryData = collectReferenceSummaryData(entry.element, state.captureOptions); + entry.descriptorTags = summaryData.descriptorTags; + entry.kind = summaryData.kind; + entry.semanticTags = summaryData.semanticTags; + entry.state = summaryData.state; + entry.summary = summaryData.summary; + entry.tagName = getTagName(entry.element); + } + + return entry; + } + + function scrollElementIntoView(element) { + try { + element.scrollIntoView?.({ + behavior: "auto", + block: "center", + inline: "center" + }); + return true; + } catch { + return false; + } + } + + function focusElement(element) { + try { + element.focus?.({ + preventScroll: true + }); + return true; + } catch { + try { + element.focus?.(); + return true; + } catch { + return false; + } + } + } + + function describeActiveElement(element) { + if (!isElementNode(element)) { + return ""; + } + + const tagName = getTagName(element).toLowerCase(); + const id = normalizeAttributeText(element.getAttribute?.("id")); + const name = normalizeAttributeText(element.getAttribute?.("name")); + const label = truncateText(getLabelText(element, { + includeAlt: false, + includeDescendantImageAlt: true, + includePlaceholder: false, + includeText: false + }), 48); + return [tagName, id ? `#${id}` : "", name ? 
`name=${name}` : "", label].filter(Boolean).join(" "); + } + + function getActionObservationRoot(element) { + if (!isElementNode(element)) { + return globalThis.document?.body || globalThis.document?.documentElement || null; + } + + return element.closest?.("form, fieldset, dialog, [role='dialog'], [role='alert'], [role='status'], [aria-live], article, section, main, li, tr, td, th") + || element.parentElement + || element; + } + + function getElementDirectText(element) { + if (!isElementNode(element)) { + return ""; + } + + return normalizeText( + [...(element.childNodes || [])] + .filter((node) => isTextNode(node)) + .map((node) => node.textContent || "") + .join(" ") + ); + } + + function collectNearbyTextEntries(root, limit = 24) { + if (!isElementNode(root)) { + return []; + } + + const entries = []; + const seen = new Set(); + const acceptElement = (element) => { + if (!isElementNode(element) || isHiddenElement(element) || entries.length >= limit) { + return; + } + + const role = normalizeText(element.getAttribute?.("role")).toLowerCase(); + const directText = getElementDirectText(element); + const fallbackText = ["alert", "status"].includes(role) || element.hasAttribute?.("aria-live") + ? getElementText(element) + : ""; + const text = truncateText(directText || fallbackText, 220); + if (!text) { + return; + } + + const key = `${role}|${text}`; + if (seen.has(key)) { + return; + } + seen.add(key); + const state = collectElementStateMetadata(element, { + includeSemanticTags: true, + includeStateTags: true + }); + entries.push({ + invalid: state.invalid === true, + role, + semanticTone: state.semanticTone || "", + text + }); + }; + + acceptElement(root); + const walker = globalThis.document?.createTreeWalker?.(root, globalThis.NodeFilter?.SHOW_ELEMENT ?? 
1); + if (!walker) { + return entries; + } + + let currentNode = walker.nextNode(); + while (currentNode && entries.length < limit) { + acceptElement(currentNode); + currentNode = walker.nextNode(); + } + + return entries; + } + + function captureActionEffectSnapshot(element) { + const observationRoot = getActionObservationRoot(element); + return { + activeElement: describeActiveElement(globalThis.document?.activeElement), + observationRoot, + observationText: truncateText(getElementText(observationRoot), 2000), + targetDom: truncateText(serializeElementSnapshot(element), 2000), + targetState: collectElementStateMetadata(element, { + includeSemanticTags: true, + includeStateTags: true + }), + textEntries: collectNearbyTextEntries(observationRoot), + value: getReferenceValueMetadata(element) + }; + } + + async function waitForObservedActionWindow(observationRoot, { + quietMs = 40, + timeoutMs = 180 + } = {}) { + const target = observationRoot?.ownerDocument?.body + || observationRoot?.ownerDocument?.documentElement + || globalThis.document?.body + || globalThis.document?.documentElement; + if (!target || typeof globalThis.MutationObserver !== "function") { + await delayMs(timeoutMs); + return { + attributeNames: [], + mutationCount: 0 + }; + } + + const attributeNames = new Set(); + let lastMutationAt = 0; + let mutationCount = 0; + const observer = new globalThis.MutationObserver((mutations) => { + mutationCount += mutations.length; + lastMutationAt = Date.now(); + mutations.forEach((mutation) => { + if (mutation.type === "attributes" && mutation.attributeName) { + attributeNames.add(String(mutation.attributeName)); + } + }); + }); + + try { + observer.observe(target, { + attributes: true, + characterData: true, + childList: true, + subtree: true + }); + const startedAt = Date.now(); + while (Date.now() - startedAt < timeoutMs) { + await delayMs(20); + if (mutationCount > 0 && Date.now() - lastMutationAt >= quietMs) { + break; + } + } + } finally { + 
observer.disconnect(); + } + + return { + attributeNames: [...attributeNames], + mutationCount + }; + } + + async function withObservedActionWindow(observationRoot, action, options = {}) { + const target = observationRoot?.ownerDocument?.body + || observationRoot?.ownerDocument?.documentElement + || globalThis.document?.body + || globalThis.document?.documentElement; + if (!target || typeof globalThis.MutationObserver !== "function") { + const result = await action(); + const observedMutations = await waitForObservedActionWindow(observationRoot, options); + return { + observedMutations, + result + }; + } + + const attributeNames = new Set(); + let lastMutationAt = 0; + let mutationCount = 0; + const observer = new globalThis.MutationObserver((mutations) => { + mutationCount += mutations.length; + lastMutationAt = Date.now(); + mutations.forEach((mutation) => { + if (mutation.type === "attributes" && mutation.attributeName) { + attributeNames.add(String(mutation.attributeName)); + } + }); + }); + + try { + observer.observe(target, { + attributes: true, + characterData: true, + childList: true, + subtree: true + }); + const result = await action(); + const quietMs = Math.max(0, Number(options.quietMs) || 40); + const timeoutMs = Math.max(0, Number(options.timeoutMs) || 180); + const startedAt = Date.now(); + while (Date.now() - startedAt < timeoutMs) { + await delayMs(20); + if (mutationCount > 0 && Date.now() - lastMutationAt >= quietMs) { + break; + } + } + return { + observedMutations: { + attributeNames: [...attributeNames], + mutationCount + }, + result + }; + } finally { + observer.disconnect(); + } + } + + function compareDescriptorTags(beforeTags = [], afterTags = []) { + const beforeValue = beforeTags.filter(Boolean).join("|"); + const afterValue = afterTags.filter(Boolean).join("|"); + return beforeValue !== afterValue; + } + + function buildActionEffectResult(entry, beforeSnapshot, afterSnapshot, observedMutations, extra = {}) { + const newTextEntries = 
afterSnapshot.textEntries.filter((entryData) => { + return !beforeSnapshot.textEntries.some((beforeEntry) => beforeEntry.text === entryData.text); + }); + const validationEntries = newTextEntries.filter((entryData) => { + return entryData.invalid + || ["alert", "status"].includes(entryData.role) + || ["error", "warning"].includes(entryData.semanticTone); + }); + const focusChanged = beforeSnapshot.activeElement !== afterSnapshot.activeElement; + const nearbyTextChanged = beforeSnapshot.observationText !== afterSnapshot.observationText; + const valueChanged = beforeSnapshot.value !== afterSnapshot.value; + const checkedChanged = beforeSnapshot.targetState.checked !== afterSnapshot.targetState.checked; + const selectedChanged = beforeSnapshot.targetState.selected !== afterSnapshot.targetState.selected; + const expandedChanged = beforeSnapshot.targetState.expanded !== afterSnapshot.targetState.expanded; + const pressedChanged = beforeSnapshot.targetState.pressed !== afterSnapshot.targetState.pressed; + const descriptorChanged = compareDescriptorTags(beforeSnapshot.targetState.descriptorTags, afterSnapshot.targetState.descriptorTags); + const targetDomChanged = beforeSnapshot.targetDom !== afterSnapshot.targetDom; + const domChanged = Boolean(observedMutations.mutationCount) || targetDomChanged || nearbyTextChanged; + const status = { + alertTextAdded: newTextEntries.some((entryData) => ["alert", "status"].includes(entryData.role)), + checkedChanged, + descriptorChanged, + domChanged, + expandedChanged, + focusChanged, + nearbyTextChanged, + pressedChanged, + reacted: false, + selectedChanged, + targetChanged: descriptorChanged || targetDomChanged || valueChanged || checkedChanged || selectedChanged || expandedChanged || pressedChanged, + targetDomChanged, + valueChanged, + validationTextAdded: validationEntries.length > 0 + }; + status.reacted = Object.entries(status).some(([key, value]) => key !== "reacted" && value === true); + status.noObservedEffect = 
!status.reacted; + + return { + ...extra, + descriptorTags: afterSnapshot.targetState.descriptorTags.slice(), + effect: { + mutationAttributes: observedMutations.attributeNames.slice(0, 8), + mutationCount: observedMutations.mutationCount, + newText: newTextEntries.map((entryData) => entryData.text).slice(0, 3), + semanticHints: [...new Set(newTextEntries.map((entryData) => entryData.semanticTone).filter(Boolean))].slice(0, 3), + validationText: validationEntries.map((entryData) => entryData.text).slice(0, 3) + }, + semanticTags: afterSnapshot.targetState.semanticTags.slice(), + state: afterSnapshot.targetState, + status + }; + } + + function buildActionResult(entry, extra = {}) { + return { + captureId: state.captureId, + descriptorTags: Array.isArray(entry?.descriptorTags) ? entry.descriptorTags.slice() : [], + referenceId: entry.referenceId, + semanticTags: Array.isArray(entry?.semanticTags) ? entry.semanticTags.slice() : [], + state: entry.state || collectElementStateMetadata(entry.element, state.captureOptions), + summary: entry.summary, + tagName: entry.tagName, + ...extra + }; + } + + function buildHelperBackedActionResult(entry, helperResult, extra = {}) { + return { + captureId: state.captureId, + descriptorTags: Array.isArray(helperResult?.descriptorTags) ? helperResult.descriptorTags : (entry.descriptorTags || []), + frameChain: entry.frameChain.slice(), + frameId: entry.frameId, + nodeId: entry.nodeId, + referenceId: entry.referenceId, + semanticTags: Array.isArray(helperResult?.semanticTags) ? 
helperResult.semanticTags : (entry.semanticTags || []), + state: helperResult?.state || entry.state || collectElementStateMetadata(null), + summary: entry.summary, + tagName: String(helperResult?.tagName || entry.tagName || ""), + ...extra + }; + } + + function mergeActionOutcomeResults(...results) { + const normalizedResults = results.filter(Boolean); + const mergedStatus = {}; + const mergedEffect = { + mutationAttributes: [], + mutationCount: 0, + newText: [], + semanticHints: [], + validationText: [] + }; + + normalizedResults.forEach((result) => { + Object.entries(result?.status || {}).forEach(([key, value]) => { + if (typeof value === "boolean") { + mergedStatus[key] = mergedStatus[key] === true || value === true; + } + }); + if (Number.isFinite(result?.effect?.mutationCount)) { + mergedEffect.mutationCount += Number(result.effect.mutationCount); + } + ["mutationAttributes", "newText", "semanticHints", "validationText"].forEach((key) => { + const values = Array.isArray(result?.effect?.[key]) ? result.effect[key] : []; + values.forEach((value) => { + if (value && !mergedEffect[key].includes(value)) { + mergedEffect[key].push(value); + } + }); + }); + }); + + mergedStatus.reacted = Object.entries(mergedStatus).some(([key, value]) => key !== "reacted" && key !== "noObservedEffect" && value === true); + mergedStatus.noObservedEffect = !mergedStatus.reacted; + return { + effect: mergedEffect, + status: mergedStatus + }; + } + + function dispatchDomEvent(target, eventName, EventType = "Event", options = {}) { + const EventConstructor = typeof globalThis[EventType] === "function" + ? globalThis[EventType] + : globalThis.Event; + const event = new EventConstructor(eventName, { + bubbles: true, + cancelable: true, + composed: true, + ...options + }); + target.dispatchEvent(event); + return event; + } + + function dispatchKeyboardEvent(target, eventName, options = {}) { + const KeyboardEventConstructor = typeof globalThis.KeyboardEvent === "function" + ? 
globalThis.KeyboardEvent + : globalThis.Event; + const event = new KeyboardEventConstructor(eventName, { + bubbles: true, + cancelable: true, + composed: true, + code: "Enter", + key: "Enter", + ...options + }); + + [ + ["charCode", Number(options.charCode ?? 0)], + ["keyCode", Number(options.keyCode ?? 13)], + ["which", Number(options.which ?? 13)] + ].forEach(([propertyName, propertyValue]) => { + try { + if (typeof event[propertyName] !== "number") { + Object.defineProperty(event, propertyName, { + configurable: true, + enumerable: true, + value: propertyValue + }); + } + } catch { + // Ignore read-only KeyboardEvent properties. + } + }); + + target.dispatchEvent(event); + return event; + } + + function setNativeValue(element, nextValue) { + const tagName = getTagName(element); + const normalizedValue = String(nextValue ?? ""); + + if (tagName === "INPUT") { + const descriptor = Object.getOwnPropertyDescriptor(globalThis.HTMLInputElement?.prototype || {}, "value"); + if (typeof descriptor?.set === "function") { + descriptor.set.call(element, normalizedValue); + } else { + element.value = normalizedValue; + } + return normalizedValue; + } + + if (tagName === "TEXTAREA") { + const descriptor = Object.getOwnPropertyDescriptor(globalThis.HTMLTextAreaElement?.prototype || {}, "value"); + if (typeof descriptor?.set === "function") { + descriptor.set.call(element, normalizedValue); + } else { + element.value = normalizedValue; + } + return normalizedValue; + } + + if (tagName === "SELECT") { + const matchedOption = [...(element.options || [])].find((option) => { + return option.value === normalizedValue + || normalizeText(option.textContent || "") === normalizeText(normalizedValue) + || normalizeText(option.label || "") === normalizeText(normalizedValue); + }); + + const resolvedValue = matchedOption ? 
matchedOption.value : normalizedValue; + const descriptor = Object.getOwnPropertyDescriptor(globalThis.HTMLSelectElement?.prototype || {}, "value"); + if (typeof descriptor?.set === "function") { + descriptor.set.call(element, resolvedValue); + } else { + element.value = resolvedValue; + } + return resolvedValue; + } + + if (String(element.getAttribute?.("contenteditable") || "").toLowerCase() === "true") { + element.textContent = normalizedValue; + return normalizedValue; + } + + throw createNamedError( + "BrowserPageContentActionError", + `Browser page content cannot type into <${getTagName(element).toLowerCase()}>.`, + { + code: "browser_page_content_type_unsupported" + } + ); + } + + async function updateElementValue(referenceId, value) { + const entry = requireReferenceEntry(referenceId, { + actionLabel: "type" + }); + + if (entry.helperBacked) { + const helper = requireDomHelper("type into reference"); + const typedResult = await helper.typeNode(entry.frameChain, entry.nodeId, value); + return buildHelperBackedActionResult(entry, typedResult, { + effect: typedResult?.effect || {}, + status: typedResult?.status || {}, + value: typedResult?.value ?? String(value ?? "") + }); + } + + const element = entry.element; + const beforeSnapshot = captureActionEffectSnapshot(element); + + const { + result: appliedValue, + observedMutations + } = await withObservedActionWindow(beforeSnapshot.observationRoot, async () => { + scrollElementIntoView(element); + focusElement(element); + const nextValue = setNativeValue(element, value); + + if (typeof element.setSelectionRange === "function") { + try { + element.setSelectionRange(String(nextValue).length, String(nextValue).length); + } catch { + // Ignore selection errors for unsupported input types. + } + } + + dispatchDomEvent(element, "beforeinput", "InputEvent", { + data: String(value ?? ""), + inputType: "insertText" + }); + dispatchDomEvent(element, "input", "InputEvent", { + data: String(value ?? 
""), + inputType: "insertText" + }); + dispatchDomEvent(element, "change"); + return nextValue; + }); + + refreshReferenceEntry(entry); + return buildActionResult(entry, { + ...buildActionEffectResult(entry, beforeSnapshot, captureActionEffectSnapshot(element), observedMutations), + value: appliedValue + }); + } + + async function activateElement(referenceId) { + const entry = requireReferenceEntry(referenceId, { + actionLabel: "click" + }); + + if (entry.helperBacked) { + const helper = requireDomHelper("click reference"); + const clickedResult = await helper.clickNode(entry.frameChain, entry.nodeId); + return buildHelperBackedActionResult(entry, clickedResult, { + effect: clickedResult?.effect || {}, + status: clickedResult?.status || {} + }); + } + + const element = entry.element; + const beforeSnapshot = captureActionEffectSnapshot(element); + + scrollElementIntoView(element); + focusElement(element); + + if (beforeSnapshot.targetState.disabled) { + throw createNamedError( + "BrowserPageContentActionError", + `Browser page content reference "${entry.referenceId}" is disabled.`, + { + code: "browser_page_content_click_disabled" + } + ); + } + + const { + observedMutations + } = await withObservedActionWindow(beforeSnapshot.observationRoot, async () => { + if (typeof element.click === "function") { + element.click(); + } else { + dispatchDomEvent(element, "click", "MouseEvent", { + button: 0 + }); + } + }); + + refreshReferenceEntry(entry); + return buildActionResult(entry, buildActionEffectResult( + entry, + beforeSnapshot, + captureActionEffectSnapshot(element), + observedMutations + )); + } + + async function submitElement(referenceId) { + const entry = requireReferenceEntry(referenceId, { + actionLabel: "submit" + }); + + if (entry.helperBacked) { + const helper = requireDomHelper("submit reference"); + const submittedResult = await helper.submitNode(entry.frameChain, entry.nodeId); + return buildHelperBackedActionResult(entry, submittedResult, { + effect: 
submittedResult?.effect || {}, + status: submittedResult?.status || {} + }); + } + + const element = entry.element; + const tagName = getTagName(element); + const beforeSnapshot = captureActionEffectSnapshot(element); + + const { + observedMutations + } = await withObservedActionWindow(beforeSnapshot.observationRoot, async () => { + scrollElementIntoView(element); + focusElement(element); + + if (tagName === "FORM") { + if (typeof element.requestSubmit === "function") { + element.requestSubmit(); + } else { + const submitEvent = dispatchDomEvent(element, "submit"); + if (!submitEvent.defaultPrevented) { + element.submit?.(); + } + } + } else if (typeof element.form?.requestSubmit === "function") { + if (tagName === "BUTTON" || tagName === "INPUT") { + element.form.requestSubmit(element); + } else { + element.form.requestSubmit(); + } + } else if (element.form) { + const submitEvent = dispatchDomEvent(element.form, "submit"); + if (!submitEvent.defaultPrevented) { + element.form.submit?.(); + } + } else if (typeof element.click === "function") { + element.click(); + } else { + throw createNamedError( + "BrowserPageContentActionError", + `Browser page content cannot submit reference "${entry.referenceId}".`, + { + code: "browser_page_content_submit_unsupported" + } + ); + } + }); + + refreshReferenceEntry(entry); + return buildActionResult(entry, buildActionEffectResult( + entry, + beforeSnapshot, + captureActionEffectSnapshot(element), + observedMutations + )); + } + + function shouldEnterSubmitForm(element) { + const tagName = getTagName(element); + if (tagName !== "INPUT") { + return false; + } + + const inputType = String(element.getAttribute?.("type") || element.type || "text").toLowerCase(); + return ![ + "button", + "checkbox", + "color", + "file", + "hidden", + "image", + "radio", + "range", + "reset", + "submit" + ].includes(inputType); + } + + async function pressEnterElement(referenceId, actionLabel = "type_submit") { + const entry = 
requireReferenceEntry(referenceId, { + actionLabel + }); + + if (entry.helperBacked) { + const helper = requireDomHelper("press enter on reference"); + const submittedResult = await helper.typeSubmitNode(entry.frameChain, entry.nodeId, ""); + return buildHelperBackedActionResult(entry, submittedResult, { + effect: submittedResult?.effect || {}, + status: submittedResult?.status || {} + }); + } + + const element = entry.element; + const beforeSnapshot = captureActionEffectSnapshot(element); + + const { + observedMutations + } = await withObservedActionWindow(beforeSnapshot.observationRoot, async () => { + scrollElementIntoView(element); + focusElement(element); + + const keydownEvent = dispatchKeyboardEvent(element, "keydown", { + charCode: 0, + keyCode: 13, + which: 13 + }); + const keypressEvent = dispatchKeyboardEvent(element, "keypress", { + charCode: 13, + keyCode: 13, + which: 13 + }); + const keyupEvent = dispatchKeyboardEvent(element, "keyup", { + charCode: 0, + keyCode: 13, + which: 13 + }); + + if ( + !keydownEvent.defaultPrevented + && !keypressEvent.defaultPrevented + && !keyupEvent.defaultPrevented + && shouldEnterSubmitForm(element) + ) { + if (typeof element.form?.requestSubmit === "function") { + element.form.requestSubmit(); + } else if (element.form) { + const submitEvent = dispatchDomEvent(element.form, "submit"); + if (!submitEvent.defaultPrevented) { + element.form.submit?.(); + } + } + } + }); + + refreshReferenceEntry(entry); + return buildActionResult(entry, buildActionEffectResult( + entry, + beforeSnapshot, + captureActionEffectSnapshot(element), + observedMutations + )); + } + + async function typeAndSubmit(referenceId, value) { + const entry = requireReferenceEntry(referenceId, { + actionLabel: "type_submit" + }); + + if (entry.helperBacked) { + const helper = requireDomHelper("type and submit reference"); + const submittedResult = await helper.typeSubmitNode(entry.frameChain, entry.nodeId, value); + return 
buildHelperBackedActionResult(entry, submittedResult, { + effect: submittedResult?.effect || {}, + status: submittedResult?.status || {}, + value: submittedResult?.value ?? String(value ?? "") + }); + } + + const typed = await updateElementValue(referenceId, value); + const submitted = await pressEnterElement(referenceId); + const mergedOutcome = mergeActionOutcomeResults(typed, submitted); + + return { + ...submitted, + ...mergedOutcome, + value: typed.value + }; + } + + async function scrollToReference(referenceId) { + const entry = requireReferenceEntry(referenceId, { + actionLabel: "scroll" + }); + + if (entry.helperBacked) { + const helper = requireDomHelper("scroll to reference"); + const scrollResult = await helper.scrollNode(entry.frameChain, entry.nodeId); + return buildHelperBackedActionResult(entry, scrollResult, { + effect: scrollResult?.effect || {}, + status: scrollResult?.status || {} + }); + } + + const beforeSnapshot = captureActionEffectSnapshot(entry.element); + scrollElementIntoView(entry.element); + focusElement(entry.element); + refreshReferenceEntry(entry); + const afterSnapshot = captureActionEffectSnapshot(entry.element); + const scrollEffect = buildActionEffectResult(entry, beforeSnapshot, afterSnapshot, { + attributeNames: [], + mutationCount: 0 + }); + return buildActionResult(entry, { + ...scrollEffect, + status: { + ...scrollEffect.status, + reacted: true, + noObservedEffect: false + } + }); + } + + globalThis[GLOBAL_KEY] = { + click(referenceId) { + return activateElement(referenceId); + }, + capture, + clear() { + state.captureId = 0; + state.capturedAt = 0; + state.captureOptions = { + includeLabelQuotes: false, + includeLinkUrls: false, + includeSemanticTags: true, + includeStateTags: true, + includeListIndentation: true, + includeListMarkers: false + }; + state.entries = new Map(); + }, + detail, + getState() { + return { + captureId: state.captureId, + capturedAt: state.capturedAt, + includeLabelQuotes: 
state.captureOptions.includeLabelQuotes === true, + includeLinkUrls: state.captureOptions.includeLinkUrls === true, + includeSemanticTags: state.captureOptions.includeSemanticTags !== false, + includeStateTags: state.captureOptions.includeStateTags !== false, + includeListIndentation: state.captureOptions.includeListIndentation !== false, + includeListMarkers: state.captureOptions.includeListMarkers === true, + referenceCount: state.entries.size + }; + }, + scroll(referenceId) { + return scrollToReference(referenceId); + }, + submit(referenceId) { + return submitElement(referenceId); + }, + type(referenceId, value) { + return updateElementValue(referenceId, value); + }, + typeSubmit(referenceId, value) { + return typeAndSubmit(referenceId, value); + }, + version: VERSION + }; +})(); diff --git a/plugins/_browser/default_config.yaml b/plugins/_browser/default_config.yaml new file mode 100644 index 000000000..50c62df12 --- /dev/null +++ b/plugins/_browser/default_config.yaml @@ -0,0 +1,10 @@ +# Load unpacked Chromium extension directories into the Browser tool. +# Paths must be readable from the Agent Zero runtime itself. +extensions_enabled: false + +# One unpacked extension directory per item. +extension_paths: [] + +# Optional _model_config preset used by Browser-owned model helpers. +# Empty uses the effective Main Model. 
+model_preset: "" diff --git a/plugins/_browser/extensions/python/_functions/agent/AgentContext/remove/start/_10_cleanup_browser_runtime.py b/plugins/_browser/extensions/python/_functions/agent/AgentContext/remove/start/_10_cleanup_browser_runtime.py new file mode 100644 index 000000000..0d6de048f --- /dev/null +++ b/plugins/_browser/extensions/python/_functions/agent/AgentContext/remove/start/_10_cleanup_browser_runtime.py @@ -0,0 +1,10 @@ +from helpers.extension import Extension +from plugins._browser.helpers.runtime import close_runtime_sync + + +class CleanupBrowserRuntimeOnRemove(Extension): + def execute(self, data: dict = {}, **kwargs): + args = data.get("args", ()) + context_id = args[0] if isinstance(args, tuple) and args else "" + if context_id: + close_runtime_sync(str(context_id), delete_profile=True) diff --git a/plugins/_browser/extensions/python/_functions/agent/AgentContext/reset/start/_10_cleanup_browser_runtime.py b/plugins/_browser/extensions/python/_functions/agent/AgentContext/reset/start/_10_cleanup_browser_runtime.py new file mode 100644 index 000000000..c5b18753e --- /dev/null +++ b/plugins/_browser/extensions/python/_functions/agent/AgentContext/reset/start/_10_cleanup_browser_runtime.py @@ -0,0 +1,11 @@ +from helpers.extension import Extension +from plugins._browser.helpers.runtime import close_runtime_sync + + +class CleanupBrowserRuntimeOnReset(Extension): + def execute(self, data: dict = {}, **kwargs): + args = data.get("args", ()) + context = args[0] if isinstance(args, tuple) and args else None + context_id = getattr(context, "id", "") + if context_id: + close_runtime_sync(context_id, delete_profile=True) diff --git a/plugins/_browser/extensions/python/system_prompt/_20_browser_context.py b/plugins/_browser/extensions/python/system_prompt/_20_browser_context.py new file mode 100644 index 000000000..72edeaaaf --- /dev/null +++ b/plugins/_browser/extensions/python/system_prompt/_20_browser_context.py @@ -0,0 +1,59 @@ +from __future__ 
import annotations + +from typing import Any + +from agent import LoopData +from helpers.extension import Extension +from plugins._browser.helpers.runtime import get_runtime + + +class BrowserContextPrompt(Extension): + async def execute( + self, + system_prompt: list[str] = [], + loop_data: LoopData = LoopData(), + **kwargs: Any, + ): + if not self.agent: + return + + runtime = await get_runtime(self.agent.context.id, create=False) + if not runtime: + return + + try: + listing = await runtime.call("list") + except Exception: + return + + browsers = listing.get("browsers") or [] + if not browsers: + return + + rows = ["browser id|url|title"] + for browser in browsers: + rows.append( + f"{browser.get('id')}|{browser.get('currentUrl', '')}|{browser.get('title', '')}" + ) + + section = ["currently open web browsers", "\n".join(rows)] + last_id = listing.get("last_interacted_browser_id") + if last_id: + try: + state = await runtime.call("state", last_id) + content = await runtime.call("content", last_id, None) + document = content.get("document") if isinstance(content, dict) else "" + if document: + section.extend( + [ + "", + "last interacted web browser", + f"browser id|url|title\n{state.get('id')}|{state.get('currentUrl', '')}|{state.get('title', '')}", + "page content↓", + str(document), + ] + ) + except Exception: + pass + + system_prompt.append("\n".join(section)) diff --git a/plugins/_browser/extensions/python/webui_ws_disconnect/_50_browser.py b/plugins/_browser/extensions/python/webui_ws_disconnect/_50_browser.py new file mode 100644 index 000000000..6579809c3 --- /dev/null +++ b/plugins/_browser/extensions/python/webui_ws_disconnect/_50_browser.py @@ -0,0 +1,24 @@ +from __future__ import annotations + +from typing import Any + +from helpers.extension import Extension +from plugins._browser.api.ws_browser import WsBrowser + + +class BrowserWebuiWsDisconnect(Extension): + async def execute( + self, + instance: Any = None, + sid: str = "", + **kwargs: Any, + ) 
-> None: + if instance is None: + return + handler = WsBrowser( + instance.socketio, + instance.lock, + manager=instance.manager, + namespace=instance.namespace, + ) + await handler.on_disconnect(sid) diff --git a/plugins/_browser/extensions/python/webui_ws_event/_50_browser.py b/plugins/_browser/extensions/python/webui_ws_event/_50_browser.py new file mode 100644 index 000000000..e3bda7b90 --- /dev/null +++ b/plugins/_browser/extensions/python/webui_ws_event/_50_browser.py @@ -0,0 +1,47 @@ +from __future__ import annotations + +from typing import Any + +from helpers.extension import Extension +from helpers.ws_manager import WsResult +from plugins._browser.api.ws_browser import WsBrowser + + +class BrowserWebuiWsEvents(Extension): + async def execute( + self, + instance: Any = None, + sid: str = "", + event_type: str = "", + data: dict[str, Any] | None = None, + response_data: dict[str, Any] | None = None, + **kwargs: Any, + ) -> None: + if not event_type.startswith("browser_") or instance is None or response_data is None: + return + + handler = WsBrowser( + instance.socketio, + instance.lock, + manager=instance.manager, + namespace=instance.namespace, + ) + result = await handler.process(event_type, data or {}, sid) + if result is None: + return + + if isinstance(result, WsResult): + payload = result.as_result( + handler_id=handler.identifier, + fallback_correlation_id=(data or {}).get("correlationId"), + ) + if payload.get("ok"): + response_data.update(payload.get("data") or {}) + else: + response_data["browser_error"] = payload.get("error") or { + "code": "BROWSER_ERROR", + "error": "Browser request failed", + } + return + + response_data.update(result) diff --git a/plugins/_browser/extensions/webui/chat-input-bottom-actions-start/browser-button.html b/plugins/_browser/extensions/webui/chat-input-bottom-actions-start/browser-button.html new file mode 100644 index 000000000..68266ae3e --- /dev/null +++ 
b/plugins/_browser/extensions/webui/chat-input-bottom-actions-start/browser-button.html @@ -0,0 +1,17 @@ + diff --git a/plugins/_browser/extensions/webui/get_tool_message_handler/browser-tool-handler.js b/plugins/_browser/extensions/webui/get_tool_message_handler/browser-tool-handler.js new file mode 100644 index 000000000..0b4ddfc10 --- /dev/null +++ b/plugins/_browser/extensions/webui/get_tool_message_handler/browser-tool-handler.js @@ -0,0 +1,76 @@ +import { + createActionButton, + copyToClipboard, +} from "/components/messages/action-buttons/simple-action-buttons.js"; +import { store as stepDetailStore } from "/components/modals/process-step-detail/step-detail-store.js"; +import { store as speechStore } from "/components/chat/speech/speech-store.js"; +import { + buildDetailPayload, + cleanStepTitle, + drawProcessStep, +} from "/js/messages.js"; + +const BROWSER_MODAL = "/plugins/_browser/webui/main.html"; + +export default async function registerBrowserToolHandler(extData) { + if (extData?.tool_name === "browser") { + extData.handler = drawBrowserTool; + } +} + +function drawBrowserTool({ + id, + type, + heading, + content, + kvps, + timestamp, + agentno = 0, + ...additional +}) { + const title = cleanStepTitle(heading); + const displayKvps = { ...kvps }; + const headerLabels = [ + kvps?._tool_name && { label: kvps._tool_name, class: "tool-name-badge" }, + ].filter(Boolean); + const contentText = String(content ?? 
""); + const browserButton = createActionButton( + "visibility", + "Browser", + () => { + if (window.ensureModalOpen) { + void window.ensureModalOpen(BROWSER_MODAL); + return; + } + void window.openModal?.(BROWSER_MODAL); + }, + ); + browserButton.setAttribute("title", "Open Browser"); + browserButton.setAttribute("aria-label", "Open Browser"); + browserButton.setAttribute("data-bs-placement", "top"); + browserButton.setAttribute("data-bs-trigger", "hover"); + const actionButtons = [browserButton]; + + if (contentText.trim()) { + actionButtons.push( + createActionButton("detail", "", () => + stepDetailStore.showStepDetail( + buildDetailPayload(arguments[0], { headerLabels }), + ), + ), + createActionButton("speak", "", () => speechStore.speak(contentText)), + createActionButton("copy", "", () => copyToClipboard(contentText)), + ); + } + + return drawProcessStep({ + id, + title, + code: "WWW", + classes: undefined, + kvps: displayKvps, + content, + actionButtons: actionButtons.filter(Boolean), + log: arguments[0], + }); +} diff --git a/plugins/_browser/helpers/__init__.py b/plugins/_browser/helpers/__init__.py new file mode 100644 index 000000000..4b18f3bdb --- /dev/null +++ b/plugins/_browser/helpers/__init__.py @@ -0,0 +1 @@ +# Built-in direct browser helpers. 
diff --git a/plugins/_browser/helpers/config.py b/plugins/_browser/helpers/config.py new file mode 100644 index 000000000..a996da098 --- /dev/null +++ b/plugins/_browser/helpers/config.py @@ -0,0 +1,272 @@ +from __future__ import annotations + +from pathlib import Path +from typing import TYPE_CHECKING, Any + +if TYPE_CHECKING: + from agent import Agent + + +PLUGIN_NAME = "_browser" +MODEL_PRESET_KEY = "model_preset" +BASE_BROWSER_ARGS = [ + "--no-sandbox", + "--disable-dev-shm-usage", + "--disable-gpu", +] + + +def _normalize_extension_paths(value: Any) -> list[str]: + if isinstance(value, str): + candidates = value.replace("\r\n", "\n").replace("\r", "\n").split("\n") + elif isinstance(value, (list, tuple, set)): + candidates = list(value) + else: + candidates = [] + + normalized_paths: list[str] = [] + seen: set[str] = set() + for entry in candidates: + raw_path = str(entry or "").strip() + if not raw_path: + continue + normalized = str(Path(raw_path).expanduser()) + if normalized in seen: + continue + seen.add(normalized) + normalized_paths.append(normalized) + return normalized_paths + + +def _normalize_model_preset(value: Any) -> str: + return str(value or "").strip() + + +def normalize_browser_config(settings: dict[str, Any] | None) -> dict[str, Any]: + raw = settings if isinstance(settings, dict) else {} + return { + "extensions_enabled": bool(raw.get("extensions_enabled", False)), + "extension_paths": _normalize_extension_paths(raw.get("extension_paths", [])), + MODEL_PRESET_KEY: _normalize_model_preset(raw.get(MODEL_PRESET_KEY, "")), + } + + +def browser_runtime_config(settings: dict[str, Any] | None) -> dict[str, Any]: + config = normalize_browser_config(settings) + return { + "extensions_enabled": config["extensions_enabled"], + "extension_paths": config["extension_paths"], + } + + +def get_browser_config(agent: "Agent | None" = None) -> dict[str, Any]: + from helpers import plugins + + return 
normalize_browser_config(plugins.get_plugin_config(PLUGIN_NAME, agent=agent) or {}) + + +def get_browser_model_preset_name( + agent: "Agent | None" = None, + settings: dict[str, Any] | None = None, +) -> str: + config = ( + normalize_browser_config(settings) + if settings is not None + else get_browser_config(agent=agent) + ) + return str(config.get(MODEL_PRESET_KEY, "") or "").strip() + + +def get_browser_model_preset_options( + agent: "Agent | None" = None, + settings: dict[str, Any] | None = None, +) -> list[dict[str, Any]]: + from plugins._model_config.helpers import model_config + + selected_name = get_browser_model_preset_name(agent=agent, settings=settings) + options: list[dict[str, Any]] = [] + found_selected = False + + for preset in model_config.get_presets(): + name = str(preset.get("name", "") or "").strip() + if not name: + continue + if name == selected_name: + found_selected = True + chat_cfg = preset.get("chat", {}) if isinstance(preset, dict) else {} + if not isinstance(chat_cfg, dict): + chat_cfg = {} + provider = str(chat_cfg.get("provider", "") or "").strip() + model_name = str(chat_cfg.get("name", "") or "").strip() + summary = " / ".join(part for part in (provider, model_name) if part) + options.append( + { + "name": name, + "label": name, + "missing": False, + "summary": summary, + } + ) + + if selected_name and not found_selected: + options.append( + { + "name": selected_name, + "label": f"{selected_name} (missing)", + "missing": True, + "summary": "", + } + ) + + return options + + +def resolve_browser_model_selection( + agent: "Agent | None" = None, + settings: dict[str, Any] | None = None, +) -> dict[str, Any]: + from plugins._model_config.helpers import model_config + + preset_name = get_browser_model_preset_name(agent=agent, settings=settings) + if preset_name: + preset = model_config.get_preset_by_name(preset_name) + if isinstance(preset, dict): + chat_cfg = preset.get("chat", {}) + if isinstance(chat_cfg, dict) and ( + 
str(chat_cfg.get("provider", "") or "").strip() + or str(chat_cfg.get("name", "") or "").strip() + ): + return { + "config": chat_cfg, + "source_kind": "preset", + "source_label": f"Preset '{preset_name}' via _model_config", + "selected_preset_name": preset_name, + "preset_status": "active", + "warning": "", + } + return { + "config": model_config.get_chat_model_config(agent), + "source_kind": "main", + "source_label": "Main Model via _model_config", + "selected_preset_name": preset_name, + "preset_status": "invalid", + "warning": ( + f"Configured browser preset '{preset_name}' does not define a chat model. " + "Falling back to the Main Model." + ), + } + + return { + "config": model_config.get_chat_model_config(agent), + "source_kind": "main", + "source_label": "Main Model via _model_config", + "selected_preset_name": preset_name, + "preset_status": "missing", + "warning": ( + f"Configured browser preset '{preset_name}' was not found. " + "Falling back to the Main Model." + ), + } + + return { + "config": model_config.get_chat_model_config(agent), + "source_kind": "main", + "source_label": "Main Model via _model_config", + "selected_preset_name": "", + "preset_status": "none", + "warning": "", + } + + +def resolve_browser_model(agent: "Agent", settings: dict[str, Any] | None = None): + selection = resolve_browser_model_selection(agent=agent, settings=settings) + if selection["source_kind"] == "main": + return agent.get_chat_model() + + import models + from plugins._model_config.helpers import model_config + + model_config_object = model_config.build_model_config( + selection["config"], + models.ModelType.CHAT, + ) + return models.get_chat_model( + model_config_object.provider, + model_config_object.name, + model_config=model_config_object, + **model_config_object.build_kwargs(), + ) + + +def describe_browser_extensions(settings: dict[str, Any] | None) -> dict[str, Any]: + config = normalize_browser_config(settings) + path_details: list[dict[str, Any]] = [] + for 
extension_path in config["extension_paths"]: + path = Path(extension_path) + exists = path.exists() + is_dir = path.is_dir() if exists else False + path_details.append( + { + "path": extension_path, + "exists": exists, + "is_dir": is_dir, + "loadable": exists and is_dir, + } + ) + + active_paths = [item["path"] for item in path_details if item["loadable"]] + invalid_paths = [item["path"] for item in path_details if not item["loadable"]] + active = bool(config["extensions_enabled"] and active_paths) + + warnings: list[str] = [] + if config["extensions_enabled"] and not config["extension_paths"]: + warnings.append( + "Extensions are enabled, but no unpacked extension directories are configured." + ) + elif config["extensions_enabled"] and not active_paths: + warnings.append( + "Extensions are enabled, but none of the configured extension directories are readable unpacked folders." + ) + elif invalid_paths: + warnings.append( + "Some configured extension directories are missing or not directories, so they will be skipped." 
+ ) + + return { + "enabled": bool(config["extensions_enabled"]), + "active": active, + "configured_paths": config["extension_paths"], + "active_paths": active_paths, + "invalid_paths": invalid_paths, + "path_details": path_details, + "active_path_count": len(active_paths), + "warnings": warnings, + } + + +def build_browser_launch_config(settings: dict[str, Any] | None) -> dict[str, Any]: + extensions = describe_browser_extensions(settings) + args = list(BASE_BROWSER_ARGS) + channel: str | None = None + browser_mode = "headless_shell" + + if extensions["active"]: + joined_paths = ",".join(extensions["active_paths"]) + args.extend( + [ + f"--disable-extensions-except={joined_paths}", + f"--load-extension={joined_paths}", + ] + ) + channel = "chromium" + browser_mode = "chromium_extensions" + else: + args.insert(0, "--headless=new") + + return { + "args": args, + "browser_mode": browser_mode, + "channel": channel, + "extensions": extensions, + "requires_full_browser": bool(extensions["active"]), + } diff --git a/plugins/_browser/helpers/extension_manager.py b/plugins/_browser/helpers/extension_manager.py new file mode 100644 index 000000000..609fd630e --- /dev/null +++ b/plugins/_browser/helpers/extension_manager.py @@ -0,0 +1,177 @@ +from __future__ import annotations + +import json +import re +import shutil +import tempfile +import urllib.request +import zipfile +from pathlib import Path +from typing import Any + +from helpers import files, plugins +from plugins._browser.helpers.config import PLUGIN_NAME, get_browser_config + + +EXTENSION_ID_RE = re.compile(r"^[a-p]{32}$") +WEB_STORE_ID_RE = re.compile(r"(? 
Path: + root = Path(files.get_abs_path("usr/browser-extensions")) + root.mkdir(parents=True, exist_ok=True) + return root + + +def parse_chrome_web_store_extension_id(value: str) -> str: + source = str(value or "").strip() + if EXTENSION_ID_RE.fullmatch(source): + return source + + match = WEB_STORE_ID_RE.search(source) + if match: + return match.group(1) + + raise ValueError("Enter a Chrome Web Store URL or a 32-character extension id.") + + +def list_browser_extensions() -> list[dict[str, Any]]: + root = get_extensions_root() + config = get_browser_config() + enabled_paths = {str(Path(path).expanduser()) for path in config["extension_paths"]} + entries: list[dict[str, Any]] = [] + + for manifest_path in sorted(root.glob("**/manifest.json")): + extension_dir = manifest_path.parent + try: + manifest = json.loads(manifest_path.read_text(encoding="utf-8")) + except Exception: + manifest = {} + extension_path = str(extension_dir) + entries.append( + { + "name": manifest.get("name") or extension_dir.name, + "version": manifest.get("version") or "", + "path": extension_path, + "enabled": extension_path in enabled_paths, + } + ) + + return entries + + +def install_chrome_web_store_extension(source: str) -> dict[str, Any]: + extension_id = parse_chrome_web_store_extension_id(source) + target = get_extensions_root() / "chrome-web-store" / extension_id + + with tempfile.TemporaryDirectory(prefix="a0-browser-ext-") as tmp: + archive_path = Path(tmp) / f"{extension_id}.crx" + _download_crx(extension_id, archive_path) + payload_path = Path(tmp) / f"{extension_id}.zip" + payload_path.write_bytes(_crx_zip_payload(archive_path.read_bytes())) + extracted_path = Path(tmp) / "extracted" + _safe_extract_zip(payload_path, extracted_path) + + if not (extracted_path / "manifest.json").is_file(): + raise ValueError("Downloaded extension did not contain a manifest.json file.") + + if target.exists(): + shutil.rmtree(target) + target.parent.mkdir(parents=True, exist_ok=True) + 
shutil.copytree(extracted_path, target) + + config = _enable_extension_path(target) + manifest = _read_manifest(target) + return { + "ok": True, + "id": extension_id, + "name": manifest.get("name") or extension_id, + "version": manifest.get("version") or "", + "path": str(target), + "extensions_enabled": config["extensions_enabled"], + "extension_paths": config["extension_paths"], + } + + +def _download_crx(extension_id: str, archive_path: Path) -> None: + url = WEB_STORE_DOWNLOAD_URL.format(extension_id=extension_id) + request = urllib.request.Request( + url, + headers={ + "User-Agent": ( + "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 " + "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" + ) + }, + ) + with urllib.request.urlopen(request, timeout=30) as response: + data = response.read() + if not data: + raise ValueError("Chrome Web Store returned an empty extension package.") + archive_path.write_bytes(data) + + +def _crx_zip_payload(data: bytes) -> bytes: + if data.startswith(b"PK"): + return data + if data[:4] != b"Cr24": + raise ValueError("Downloaded package is not a CRX or ZIP archive.") + + version = int.from_bytes(data[4:8], "little") + if version == 2: + public_key_len = int.from_bytes(data[8:12], "little") + signature_len = int.from_bytes(data[12:16], "little") + offset = 16 + public_key_len + signature_len + elif version == 3: + header_len = int.from_bytes(data[8:12], "little") + offset = 12 + header_len + else: + raise ValueError(f"Unsupported CRX version: {version}.") + + payload = data[offset:] + if not payload.startswith(b"PK"): + raise ValueError("CRX payload did not contain a ZIP archive.") + return payload + + +def _safe_extract_zip(archive_path: Path, target_dir: Path) -> None: + target_dir.mkdir(parents=True, exist_ok=True) + root = target_dir.resolve() + with zipfile.ZipFile(archive_path) as archive: + for member in archive.infolist(): + destination = (target_dir / member.filename).resolve() + if not destination.is_relative_to(root): 
+ raise ValueError("Extension archive contains an unsafe path.") + if member.is_dir(): + destination.mkdir(parents=True, exist_ok=True) + continue + destination.parent.mkdir(parents=True, exist_ok=True) + with archive.open(member) as source, destination.open("wb") as output: + shutil.copyfileobj(source, output) + + +def _enable_extension_path(extension_path: Path) -> dict[str, Any]: + config = get_browser_config() + path = str(extension_path) + paths = list(config["extension_paths"]) + if path not in paths: + paths.append(path) + config["extensions_enabled"] = True + config["extension_paths"] = paths + plugins.save_plugin_config(PLUGIN_NAME, "", "", config) + return config + + +def _read_manifest(extension_path: Path) -> dict[str, Any]: + manifest_path = extension_path / "manifest.json" + try: + return json.loads(manifest_path.read_text(encoding="utf-8")) + except Exception: + return {} diff --git a/plugins/_browser/helpers/playwright.py b/plugins/_browser/helpers/playwright.py new file mode 100644 index 000000000..59c79b059 --- /dev/null +++ b/plugins/_browser/helpers/playwright.py @@ -0,0 +1,57 @@ +import os +import subprocess +from pathlib import Path + +from helpers import files + +HEADLESS_SHELL_PATTERNS = ( + "chromium_headless_shell-*/chrome-*/headless_shell", + "chromium_headless_shell-*/chrome-*/headless_shell.exe", +) + +FULL_CHROMIUM_PATTERNS = ( + "chromium-*/chrome-linux/chrome", + "chromium-*/chrome-win/chrome.exe", +) + + +def get_playwright_cache_dir() -> str: + return files.get_abs_path("tmp/playwright") + + +def configure_playwright_env() -> str: + cache_dir = get_playwright_cache_dir() + os.environ["PLAYWRIGHT_BROWSERS_PATH"] = cache_dir + return cache_dir + + +def get_playwright_binary(*, full_browser: bool = False) -> Path | None: + cache_dir = Path(get_playwright_cache_dir()) + patterns = FULL_CHROMIUM_PATTERNS if full_browser else (HEADLESS_SHELL_PATTERNS + FULL_CHROMIUM_PATTERNS) + for pattern in patterns: + binary = 
next(cache_dir.glob(pattern), None) + if binary and binary.exists(): + return binary + return None + + +def ensure_playwright_binary(*, full_browser: bool = False) -> Path: + binary = get_playwright_binary(full_browser=full_browser) + if binary: + return binary + + cache_dir = configure_playwright_env() + env = os.environ.copy() + env["PLAYWRIGHT_BROWSERS_PATH"] = cache_dir + install_command = ["playwright", "install", "chromium"] + if not full_browser: + install_command.append("--only-shell") + subprocess.check_call( + install_command, + env=env, + ) + + binary = get_playwright_binary(full_browser=full_browser) + if not binary: + raise RuntimeError("Playwright Chromium binary not found after installation") + return binary diff --git a/plugins/_browser/helpers/runtime.py b/plugins/_browser/helpers/runtime.py new file mode 100644 index 000000000..99b777dde --- /dev/null +++ b/plugins/_browser/helpers/runtime.py @@ -0,0 +1,623 @@ +from __future__ import annotations + +import atexit +import asyncio +import base64 +import re +import shutil +import threading +from dataclasses import dataclass +from pathlib import Path +from typing import Any +from urllib.parse import urlsplit, urlunsplit + +from helpers import files +from helpers.defer import DeferredTask +from helpers.print_style import PrintStyle + +from plugins._browser.helpers.config import build_browser_launch_config, get_browser_config +from plugins._browser.helpers.playwright import configure_playwright_env, ensure_playwright_binary + + +PLUGIN_DIR = Path(__file__).resolve().parents[1] +CONTENT_HELPER_PATH = PLUGIN_DIR / "assets" / "browser-page-content.js" +RUNTIME_DATA_KEY = "_browser_runtime" +DEFAULT_VIEWPORT = {"width": 1024, "height": 768} + +_SPECIAL_SCHEME_RE = re.compile(r"^(?:about|blob|data|file|mailto|tel):", re.I) +_URL_SCHEME_RE = re.compile(r"^[a-z][a-z\d+\-.]*://", re.I) +_LOCAL_HOST_RE = re.compile( + r"^(?:localhost|\[[0-9a-f:.]+\]|(?:\d{1,3}\.){3}\d{1,3})(?::\d+)?$", + re.I, +) +_TYPED_HOST_RE 
= re.compile( + r"^(?:localhost|\[[0-9a-f:.]+\]|(?:\d{1,3}\.){3}\d{1,3}|" + r"(?:[a-z\d](?:[a-z\d-]{0,61}[a-z\d])?\.)+[a-z\d-]{2,63})(?::\d+)?$", + re.I, +) +_SAFE_CONTEXT_RE = re.compile(r"[^a-zA-Z0-9_.-]+") + + +def normalize_url(value: str) -> str: + raw = str(value or "").strip() + if not raw: + raise ValueError("Browser navigation requires a non-empty URL.") + + def with_trailing_path(url: str) -> str: + parts = urlsplit(url) + if parts.scheme in {"http", "https"} and not parts.path: + return urlunsplit((parts.scheme, parts.netloc, "/", parts.query, parts.fragment)) + return urlunsplit(parts) + + try: + host = re.split(r"[/?#]", raw, 1)[0] or "" + if ( + not _URL_SCHEME_RE.match(raw) + and not _SPECIAL_SCHEME_RE.match(raw) + and not raw.startswith(("/", "?", "#", ".")) + and not re.search(r"\s", raw) + and _TYPED_HOST_RE.match(host) + ): + protocol = "http://" if _LOCAL_HOST_RE.match(host) else "https://" + return with_trailing_path(protocol + raw) + + parts = urlsplit(raw) + if parts.scheme: + return with_trailing_path(raw) + except Exception: + pass + + return with_trailing_path("https://" + raw) + + +def _safe_context_id(context_id: str) -> str: + return _SAFE_CONTEXT_RE.sub("_", str(context_id or "default")).strip("._") or "default" + + +@dataclass +class BrowserPage: + id: int + page: Any + + +class BrowserRuntime: + def __init__(self, context_id: str): + self.context_id = str(context_id) + self._core = _BrowserRuntimeCore(self.context_id) + self._worker = DeferredTask(thread_name=f"BrowserRuntime-{self.context_id}") + self._closed = False + + async def call(self, method: str, *args: Any, **kwargs: Any) -> Any: + if self._closed and method != "close": + raise RuntimeError("Browser runtime is closed.") + + async def runner(): + fn = getattr(self._core, method) + return await fn(*args, **kwargs) + + return await self._worker.execute_inside(runner) + + async def close(self, delete_profile: bool = False) -> None: + if self._closed: + return + try: + await 
self.call("close", delete_profile=delete_profile) + finally: + self._closed = True + self._worker.kill(terminate_thread=True) + + +class _BrowserRuntimeCore: + def __init__(self, context_id: str): + self.context_id = context_id + self.safe_context_id = _safe_context_id(context_id) + self.playwright = None + self.context = None + self.pages: dict[int, BrowserPage] = {} + self.next_browser_id = 1 + self.last_interacted_browser_id: int | None = None + self._content_helper_source: str | None = None + + @property + def profile_dir(self) -> Path: + return Path(files.get_abs_path("tmp/browser/sessions", self.safe_context_id)) + + @property + def downloads_dir(self) -> Path: + return Path(files.get_abs_path("usr/downloads/browser")) + + async def ensure_started(self) -> None: + if self.context: + return + + from playwright.async_api import async_playwright + + self.profile_dir.mkdir(parents=True, exist_ok=True) + self.downloads_dir.mkdir(parents=True, exist_ok=True) + browser_config = get_browser_config() + launch_config = build_browser_launch_config(browser_config) + configure_playwright_env() + browser_binary = ensure_playwright_binary( + full_browser=launch_config["requires_full_browser"] + ) + + self.playwright = await async_playwright().start() + launch_kwargs: dict[str, Any] = { + "user_data_dir": str(self.profile_dir), + "headless": True, + "accept_downloads": True, + "downloads_path": str(self.downloads_dir), + "viewport": DEFAULT_VIEWPORT, + "screen": DEFAULT_VIEWPORT, + "no_viewport": False, + "args": launch_config["args"], + } + if launch_config["channel"]: + launch_kwargs["channel"] = launch_config["channel"] + else: + launch_kwargs["executable_path"] = str(browser_binary) + self.context = await self.playwright.chromium.launch_persistent_context( + **launch_kwargs + ) + self.context.set_default_timeout(30000) + self.context.set_default_navigation_timeout(30000) + await self.context.add_init_script(self._shadow_dom_script()) + await 
self.context.add_init_script(path=str(CONTENT_HELPER_PATH)) + + for page in list(self.context.pages): + if page.url == "about:blank": + try: + await page.close() + except Exception: + pass + continue + self._register_page(page) + + async def open(self, url: str = "about:blank") -> dict[str, Any]: + await self.ensure_started() + page = await self.context.new_page() + browser_page = self._register_page(page) + self.last_interacted_browser_id = browser_page.id + if url and url != "about:blank": + await self._goto(page, normalize_url(url)) + else: + await self._settle(page) + return {"id": browser_page.id, "state": await self._state(browser_page.id)} + + async def list(self) -> dict[str, Any]: + await self.ensure_started() + return { + "browsers": [await self._state(browser_id) for browser_id in sorted(self.pages)], + "last_interacted_browser_id": self.last_interacted_browser_id, + } + + async def state(self, browser_id: int | str | None = None) -> dict[str, Any]: + await self.ensure_started() + return await self._state(self._resolve_browser_id(browser_id)) + + async def navigate(self, browser_id: int | str | None, url: str) -> dict[str, Any]: + await self.ensure_started() + resolved_id = self._resolve_browser_id(browser_id) + page = self._page(resolved_id) + await self._goto(page, normalize_url(url)) + self.last_interacted_browser_id = resolved_id + return await self._state(resolved_id) + + async def back(self, browser_id: int | str | None = None) -> dict[str, Any]: + await self.ensure_started() + resolved_id = self._resolve_browser_id(browser_id) + page = self._page(resolved_id) + await page.go_back(wait_until="domcontentloaded", timeout=10000) + await self._settle(page) + self.last_interacted_browser_id = resolved_id + return await self._state(resolved_id) + + async def forward(self, browser_id: int | str | None = None) -> dict[str, Any]: + await self.ensure_started() + resolved_id = self._resolve_browser_id(browser_id) + page = self._page(resolved_id) + await 
page.go_forward(wait_until="domcontentloaded", timeout=10000) + await self._settle(page) + self.last_interacted_browser_id = resolved_id + return await self._state(resolved_id) + + async def reload(self, browser_id: int | str | None = None) -> dict[str, Any]: + await self.ensure_started() + resolved_id = self._resolve_browser_id(browser_id) + page = self._page(resolved_id) + await page.reload(wait_until="domcontentloaded", timeout=15000) + await self._settle(page) + self.last_interacted_browser_id = resolved_id + return await self._state(resolved_id) + + async def content( + self, + browser_id: int | str | None = None, + payload: dict[str, Any] | None = None, + ) -> dict[str, Any]: + await self.ensure_started() + resolved_id = self._resolve_browser_id(browser_id) + page = self._page(resolved_id) + await self._ensure_content_helper(page) + result = await page.evaluate( + "(payload) => globalThis.__spaceBrowserPageContent__.capture(payload || null)", + payload or None, + ) + self.last_interacted_browser_id = resolved_id + return result or {} + + async def detail(self, browser_id: int | str | None, reference_id: int | str) -> dict[str, Any]: + await self.ensure_started() + resolved_id = self._resolve_browser_id(browser_id) + page = self._page(resolved_id) + await self._ensure_content_helper(page) + result = await page.evaluate( + "(ref) => globalThis.__spaceBrowserPageContent__.detail(ref)", + reference_id, + ) + self.last_interacted_browser_id = resolved_id + return result or {} + + async def evaluate(self, browser_id: int | str | None, script: str) -> dict[str, Any]: + await self.ensure_started() + resolved_id = self._resolve_browser_id(browser_id) + page = self._page(resolved_id) + result = await page.evaluate(str(script or "undefined")) + self.last_interacted_browser_id = resolved_id + return {"result": result, "state": await self._state(resolved_id)} + + async def click(self, browser_id: int | str | None, reference_id: int | str) -> dict[str, Any]: + return await 
self._reference_action("click", browser_id, reference_id) + + async def submit(self, browser_id: int | str | None, reference_id: int | str) -> dict[str, Any]: + return await self._reference_action("submit", browser_id, reference_id) + + async def scroll(self, browser_id: int | str | None, reference_id: int | str) -> dict[str, Any]: + return await self._reference_action("scroll", browser_id, reference_id) + + async def type( + self, + browser_id: int | str | None, + reference_id: int | str, + text: str, + ) -> dict[str, Any]: + return await self._reference_action("type", browser_id, reference_id, text) + + async def type_submit( + self, + browser_id: int | str | None, + reference_id: int | str, + text: str, + ) -> dict[str, Any]: + return await self._reference_action("typeSubmit", browser_id, reference_id, text) + + async def close_browser(self, browser_id: int | str | None = None) -> dict[str, Any]: + await self.ensure_started() + resolved_id = self._resolve_browser_id(browser_id) + page = self._page(resolved_id) + await page.close() + self.pages.pop(resolved_id, None) + if self.last_interacted_browser_id == resolved_id: + self.last_interacted_browser_id = next(iter(sorted(self.pages)), None) + return await self.list() + + async def close_all_browsers(self) -> dict[str, Any]: + await self.ensure_started() + for browser_id in list(self.pages): + try: + await self.pages[browser_id].page.close() + except Exception: + pass + self.pages.clear() + self.last_interacted_browser_id = None + return {"browsers": [], "last_interacted_browser_id": None} + + async def screenshot( + self, + browser_id: int | str | None = None, + *, + quality: int = 70, + ) -> dict[str, Any]: + await self.ensure_started() + resolved_id = self._resolve_browser_id(browser_id) + page = self._page(resolved_id) + image = await page.screenshot(type="jpeg", quality=max(20, min(95, int(quality)))) + return { + "browser_id": resolved_id, + "mime": "image/jpeg", + "image": 
base64.b64encode(image).decode("ascii"), + "state": await self._state(resolved_id), + } + + async def set_viewport( + self, + browser_id: int | str | None, + width: int, + height: int, + ) -> dict[str, Any]: + await self.ensure_started() + resolved_id = self._resolve_browser_id(browser_id) + page = self._page(resolved_id) + viewport = { + "width": max(320, min(4096, int(width or DEFAULT_VIEWPORT["width"]))), + "height": max(200, min(4096, int(height or DEFAULT_VIEWPORT["height"]))), + } + await page.set_viewport_size(viewport) + self.last_interacted_browser_id = resolved_id + return {"state": await self._state(resolved_id), "viewport": viewport} + + async def mouse( + self, + browser_id: int | str | None, + event_type: str, + x: float, + y: float, + button: str = "left", + ) -> dict[str, Any]: + await self.ensure_started() + resolved_id = self._resolve_browser_id(browser_id) + page = self._page(resolved_id) + event_type = str(event_type or "click").lower() + if event_type == "move": + await page.mouse.move(float(x), float(y)) + elif event_type == "down": + await page.mouse.down(button=button) + elif event_type == "up": + await page.mouse.up(button=button) + else: + await page.mouse.click(float(x), float(y), button=button) + await self._settle(page, short=True) + self.last_interacted_browser_id = resolved_id + return await self._state(resolved_id) + + async def wheel( + self, + browser_id: int | str | None, + x: float, + y: float, + delta_x: float = 0, + delta_y: float = 0, + ) -> dict[str, Any]: + await self.ensure_started() + resolved_id = self._resolve_browser_id(browser_id) + page = self._page(resolved_id) + await page.mouse.move(float(x), float(y)) + await page.mouse.wheel(float(delta_x), float(delta_y)) + await self._settle(page, short=True) + self.last_interacted_browser_id = resolved_id + return await self._state(resolved_id) + + async def keyboard( + self, + browser_id: int | str | None, + *, + key: str = "", + text: str = "", + ) -> dict[str, Any]: + await 
self.ensure_started() + resolved_id = self._resolve_browser_id(browser_id) + page = self._page(resolved_id) + if text: + await page.keyboard.type(str(text)) + elif key: + await page.keyboard.press(str(key)) + await self._settle(page, short=True) + self.last_interacted_browser_id = resolved_id + return await self._state(resolved_id) + + async def close(self, delete_profile: bool = False) -> None: + for browser_id in list(self.pages): + try: + await self.pages[browser_id].page.close() + except Exception: + pass + self.pages.clear() + if self.context: + try: + await self.context.close() + except Exception as exc: + PrintStyle.warning(f"Browser context close failed: {exc}") + self.context = None + if self.playwright: + try: + await self.playwright.stop() + except Exception as exc: + PrintStyle.warning(f"Playwright stop failed: {exc}") + self.playwright = None + self.last_interacted_browser_id = None + if delete_profile: + shutil.rmtree(self.profile_dir, ignore_errors=True) + + async def _reference_action( + self, + helper_method: str, + browser_id: int | str | None, + reference_id: int | str, + text: str | None = None, + ) -> dict[str, Any]: + resolved_id = self._resolve_browser_id(browser_id) + page = self._page(resolved_id) + await self._ensure_content_helper(page) + if text is None: + action = await page.evaluate( + "(args) => globalThis.__spaceBrowserPageContent__[args.method](args.ref)", + {"method": helper_method, "ref": reference_id}, + ) + else: + action = await page.evaluate( + "(args) => globalThis.__spaceBrowserPageContent__[args.method](args.ref, args.text)", + {"method": helper_method, "ref": reference_id, "text": text}, + ) + await self._settle(page, short=False) + self.last_interacted_browser_id = resolved_id + return {"action": action or {}, "state": await self._state(resolved_id)} + + async def _goto(self, page: Any, url: str) -> None: + from playwright.async_api import TimeoutError as PlaywrightTimeoutError + + try: + await page.goto(url, 
wait_until="domcontentloaded", timeout=30000) + except PlaywrightTimeoutError: + PrintStyle.warning(f"Browser navigation timed out after DOM handoff: {url}") + await self._settle(page) + + async def _settle(self, page: Any, short: bool = False) -> None: + from playwright.async_api import TimeoutError as PlaywrightTimeoutError + + try: + await page.wait_for_load_state( + "domcontentloaded", + timeout=1000 if short else 5000, + ) + except PlaywrightTimeoutError: + pass + await asyncio.sleep(0.1 if short else 0.35) + + async def _state(self, browser_id: int) -> dict[str, Any]: + browser_page = self.pages.get(int(browser_id)) + if not browser_page: + raise KeyError(f"Browser {browser_id} is not open.") + page = browser_page.page + try: + title = await page.title() + except Exception: + title = "" + try: + history_length = await page.evaluate("() => globalThis.history?.length || 0") + except Exception: + history_length = 0 + return { + "id": browser_page.id, + "currentUrl": page.url, + "title": title, + "canGoBack": bool(history_length and int(history_length) > 1), + "canGoForward": False, + "loading": False, + } + + def _register_page(self, page: Any) -> BrowserPage: + existing = self._browser_id_for_page(page) + if existing is not None: + return self.pages[existing] + browser_id = self.next_browser_id + self.next_browser_id += 1 + browser_page = BrowserPage(id=browser_id, page=page) + self.pages[browser_id] = browser_page + + def on_close() -> None: + self.pages.pop(browser_id, None) + + page.on("close", on_close) + return browser_page + + def _browser_id_for_page(self, page: Any) -> int | None: + for browser_id, browser_page in self.pages.items(): + if browser_page.page == page: + return browser_id + return None + + def _resolve_browser_id(self, browser_id: int | str | None = None) -> int: + if browser_id is None or str(browser_id).strip() == "": + if self.last_interacted_browser_id in self.pages: + return int(self.last_interacted_browser_id) + if self.pages: + 
return sorted(self.pages)[0] + raise KeyError("No browser is open. Use action=open first.") + value = str(browser_id).strip() + if value.startswith("browser-"): + value = value.split("-", 1)[1] + resolved = int(value) + if resolved not in self.pages: + raise KeyError(f"Browser {resolved} is not open.") + return resolved + + def _page(self, browser_id: int) -> Any: + return self.pages[int(browser_id)].page + + async def _ensure_content_helper(self, page: Any) -> None: + has_helper = await page.evaluate( + "() => Boolean(globalThis.__spaceBrowserPageContent__?.capture)" + ) + if has_helper: + return + if self._content_helper_source is None: + self._content_helper_source = CONTENT_HELPER_PATH.read_text(encoding="utf-8") + await page.evaluate(self._content_helper_source) + + @staticmethod + def _shadow_dom_script() -> str: + return """ +(() => { + const original = Element.prototype.attachShadow; + if (original && !original.__a0BrowserOpenShadowPatch) { + const patched = function attachShadow(options) { + return original.call(this, { ...(options || {}), mode: "open" }); + }; + patched.__a0BrowserOpenShadowPatch = true; + Element.prototype.attachShadow = patched; + } +})(); +""" + + +_runtimes: dict[str, BrowserRuntime] = {} +_runtime_lock = threading.RLock() + + +async def get_runtime(context_id: str, *, create: bool = True) -> BrowserRuntime | None: + context_id = str(context_id or "").strip() + if not context_id: + raise ValueError("context_id is required") + with _runtime_lock: + runtime = _runtimes.get(context_id) + if runtime is None and create: + runtime = BrowserRuntime(context_id) + _runtimes[context_id] = runtime + return runtime + + +async def close_runtime(context_id: str, *, delete_profile: bool = True) -> None: + context_id = str(context_id or "").strip() + if not context_id: + return + with _runtime_lock: + runtime = _runtimes.pop(context_id, None) + if runtime: + await runtime.close(delete_profile=delete_profile) + + +def close_runtime_sync(context_id: 
str, *, delete_profile: bool = True) -> None: + task = DeferredTask(thread_name="BrowserCleanup") + task.start_task(close_runtime, context_id, delete_profile=delete_profile) + try: + task.result_sync(timeout=30) + finally: + task.kill(terminate_thread=True) + + +async def close_all_runtimes(*, delete_profiles: bool = False) -> None: + with _runtime_lock: + runtimes = list(_runtimes.values()) + _runtimes.clear() + for runtime in runtimes: + try: + await runtime.close(delete_profile=delete_profiles) + except Exception as exc: + PrintStyle.warning(f"Browser runtime cleanup failed: {exc}") + + +def close_all_runtimes_sync() -> None: + task = DeferredTask(thread_name="BrowserCleanupAll") + task.start_task(close_all_runtimes, delete_profiles=False) + try: + task.result_sync(timeout=30) + finally: + task.kill(terminate_thread=True) + + +def known_context_ids() -> list[str]: + with _runtime_lock: + return sorted(_runtimes) + + +atexit.register(close_all_runtimes_sync) diff --git a/plugins/_browser/hooks.py b/plugins/_browser/hooks.py new file mode 100644 index 000000000..dc4c54208 --- /dev/null +++ b/plugins/_browser/hooks.py @@ -0,0 +1,47 @@ +from __future__ import annotations + +from helpers import files, plugins, yaml as yaml_helper +from plugins._browser.helpers.config import ( + PLUGIN_NAME, + browser_runtime_config, + normalize_browser_config, +) +from plugins._browser.helpers.runtime import close_all_runtimes_sync + + +def _load_saved_browser_config(project_name: str = "", agent_profile: str = "") -> dict: + entries = plugins.find_plugin_assets( + plugins.CONFIG_FILE_NAME, + plugin_name=PLUGIN_NAME, + project_name=project_name, + agent_profile=agent_profile, + only_first=True, + ) + path = entries[0].get("path", "") if entries else "" + if path and files.exists(path): + return files.read_file_json(path) or {} + + plugin_dir = plugins.find_plugin_dir(PLUGIN_NAME) + default_path = ( + files.get_abs_path(plugin_dir, plugins.CONFIG_DEFAULT_FILE_NAME) + if plugin_dir + 
else "" + ) + if default_path and files.exists(default_path): + return yaml_helper.loads(files.read_file(default_path)) or {} + + return {} + + +def get_plugin_config(default=None, **kwargs): + return normalize_browser_config(default) + + +def save_plugin_config(settings=None, project_name="", agent_profile="", **kwargs): + normalized = normalize_browser_config(settings) + current = normalize_browser_config( + _load_saved_browser_config(project_name=project_name, agent_profile=agent_profile) + ) + if browser_runtime_config(normalized) != browser_runtime_config(current): + close_all_runtimes_sync() + return normalized diff --git a/plugins/_browser/plugin.yaml b/plugins/_browser/plugin.yaml new file mode 100644 index 000000000..f15608154 --- /dev/null +++ b/plugins/_browser/plugin.yaml @@ -0,0 +1,9 @@ +name: _browser +title: Browser +description: Built-in direct Playwright browser tool and WebUI viewer. +version: 1.0.0 +always_enabled: false +settings_sections: + - external +per_project_config: false +per_agent_config: false diff --git a/plugins/_browser/prompts/agent.system.tool.browser.md b/plugins/_browser/prompts/agent.system.tool.browser.md new file mode 100644 index 000000000..b0112ae25 --- /dev/null +++ b/plugins/_browser/prompts/agent.system.tool.browser.md @@ -0,0 +1,48 @@ +### browser +direct Playwright browser control with visible WebUI viewer +use for web browsing, page inspection, forms, downloads, and browser-only tasks +state stays open per chat context +refs come from content as typed markers: [link 3], [button 6], [image 1], [input text 8] + +actions: open list state navigate back forward reload content detail click type submit type_submit scroll evaluate close close_all +common args: action browser_id url ref text selector selectors script + +workflow: +- open creates a new browser and returns id/state +- content returns readable page markdown with typed refs +- detail inspects one ref, including link/image/input/button metadata +- 
click/type/type_submit/submit/scroll use refs from latest content capture and return {action,state} +- navigate/back/forward/reload return fresh state +- list shows open browsers + +examples: +~~~json +{ + "tool_name": "browser", + "tool_args": { + "action": "open", + "url": "https://example.com" + } +} +~~~ + +~~~json +{ + "tool_name": "browser", + "tool_args": { + "action": "content", + "browser_id": 1 + } +} +~~~ + +~~~json +{ + "tool_name": "browser", + "tool_args": { + "action": "click", + "browser_id": 1, + "ref": 3 + } +} +~~~ diff --git a/plugins/_browser/tools/browser.py b/plugins/_browser/tools/browser.py new file mode 100644 index 000000000..8d5a969c4 --- /dev/null +++ b/plugins/_browser/tools/browser.py @@ -0,0 +1,107 @@ +from __future__ import annotations + +import json +from typing import Any + +from helpers.tool import Response, Tool +from plugins._browser.helpers.runtime import get_runtime + + +class Browser(Tool): + async def execute( + self, + action: str = "", + browser_id: int | str | None = None, + url: str = "", + ref: int | str | None = None, + text: str = "", + selector: str = "", + selectors: list[str] | None = None, + script: str = "", + **kwargs: Any, + ) -> Response: + action = str(action or self.method or "state").strip().lower().replace("-", "_") + runtime = await get_runtime(self.agent.context.id) + + try: + if action == "open": + result = await runtime.call("open", url or "about:blank") + elif action == "list": + result = await runtime.call("list") + elif action == "state": + result = await runtime.call("state", browser_id) + elif action == "navigate": + result = await runtime.call("navigate", browser_id, url) + elif action == "back": + result = await runtime.call("back", browser_id) + elif action == "forward": + result = await runtime.call("forward", browser_id) + elif action == "reload": + result = await runtime.call("reload", browser_id) + elif action == "content": + payload = self._selector_payload(selector, selectors) + result = 
await runtime.call("content", browser_id, payload) + elif action == "detail": + result = await runtime.call("detail", browser_id, self._require_ref(ref)) + elif action == "click": + result = await runtime.call("click", browser_id, self._require_ref(ref)) + elif action == "type": + result = await runtime.call("type", browser_id, self._require_ref(ref), text) + elif action == "submit": + result = await runtime.call("submit", browser_id, self._require_ref(ref)) + elif action in {"type_submit", "typesubmit"}: + result = await runtime.call( + "type_submit", + browser_id, + self._require_ref(ref), + text, + ) + elif action == "scroll": + result = await runtime.call("scroll", browser_id, self._require_ref(ref)) + elif action == "evaluate": + result = await runtime.call("evaluate", browser_id, script) + elif action == "close": + result = await runtime.call("close_browser", browser_id) + elif action == "close_all": + result = await runtime.call("close_all_browsers") + else: + return Response( + message=f"Unknown browser action: {action}", + break_loop=False, + ) + except Exception as exc: + return Response(message=f"Browser {action} failed: {exc}", break_loop=False) + + return Response(message=self._format_result(action, result), break_loop=False) + + def get_log_object(self): + return self.agent.context.log.log( + type="tool", + heading=f"icon://captive_portal {self.agent.agent_name}: Using browser", + content="", + kvps=self.args, + _tool_name=self.name, + ) + + @staticmethod + def _require_ref(ref: int | str | None) -> int | str: + if ref is None or str(ref).strip() == "": + raise ValueError("ref is required for this browser action") + return ref + + @staticmethod + def _selector_payload(selector: str = "", selectors: list[str] | None = None) -> dict | None: + if selectors: + return {"selectors": selectors} + if selector: + return {"selector": selector} + return None + + @staticmethod + def _format_result(action: str, result: Any) -> str: + if action == "content" and 
isinstance(result, dict): + if set(result.keys()) == {"document"}: + return str(result.get("document") or "") + return json.dumps(result, indent=2, ensure_ascii=False) + + return json.dumps(result, indent=2, ensure_ascii=False, default=str) diff --git a/plugins/_browser/webui/browser-config-store.js b/plugins/_browser/webui/browser-config-store.js new file mode 100644 index 000000000..9ef3d98ef --- /dev/null +++ b/plugins/_browser/webui/browser-config-store.js @@ -0,0 +1,155 @@ +import { createStore } from "/js/AlpineStore.js"; +import { fetchApi } from "/js/api.js"; + +const MODEL_CONFIG_API = "/plugins/_model_config/model_presets"; + +function normalizePathList(value) { + const source = Array.isArray(value) + ? value + : String(value || "").split(/\r?\n/); + const seen = new Set(); + const paths = []; + for (const item of source) { + const path = String(item || "").trim(); + if (!path || seen.has(path)) continue; + seen.add(path); + paths.push(path); + } + return paths; +} + +function ensureConfig(config) { + if (!config || typeof config !== "object") return null; + if (typeof config.extensions_enabled !== "boolean") { + config.extensions_enabled = Boolean(config.extensions_enabled); + } + config.extension_paths = normalizePathList(config.extension_paths); + config.model_preset = String(config.model_preset || "").trim(); + delete config.model; + return config; +} + +export const store = createStore("browserConfig", { + config: null, + extensionPathsText: "", + presets: [], + presetsLoading: false, + presetsError: "", + _presetsLoaded: false, + + async init(config) { + this.bindConfig(config); + await this.loadPresets(); + }, + + cleanup() { + this.config = null; + this.extensionPathsText = ""; + this.presetsError = ""; + }, + + bindConfig(config) { + const safeConfig = ensureConfig(config); + if (!safeConfig) return; + if (this.config === safeConfig) return; + this.config = safeConfig; + this.extensionPathsText = safeConfig.extension_paths.join("\n"); + }, + + 
setExtensionPathsText(value) { + this.extensionPathsText = String(value || ""); + this.syncExtensionPaths(); + }, + + syncExtensionPaths() { + const safeConfig = ensureConfig(this.config); + if (!safeConfig) return; + safeConfig.extension_paths = normalizePathList(this.extensionPathsText); + }, + + hasPaths() { + return this.pathCount() > 0; + }, + + pathCount() { + return normalizePathList(this.extensionPathsText).length; + }, + + pathCountLabel() { + const count = this.pathCount(); + if (!count) return "No extension paths configured"; + return `${count} path${count === 1 ? "" : "s"} configured`; + }, + + extensionModeReady() { + const safeConfig = ensureConfig(this.config); + return Boolean(safeConfig?.extensions_enabled && this.pathCount()); + }, + + async loadPresets() { + if (this._presetsLoaded || this.presetsLoading) return; + this.presetsLoading = true; + this.presetsError = ""; + try { + const response = await fetchApi(MODEL_CONFIG_API, { + method: "POST", + headers: { "Content-Type": "application/json" }, + body: JSON.stringify({ action: "get" }), + }); + const data = await response.json().catch(() => ({})); + this.presets = Array.isArray(data?.presets) + ? data.presets.filter((preset) => String(preset?.name || "").trim()) + : []; + this._presetsLoaded = true; + } catch (error) { + this.presets = []; + this.presetsError = error instanceof Error ? 
error.message : String(error); + } finally { + this.presetsLoading = false; + } + }, + + selectedPreset() { + const selected = String(this.config?.model_preset || "").trim(); + if (!selected) return null; + return this.presets.find((preset) => preset?.name === selected) || null; + }, + + presetOptions() { + const selected = String(this.config?.model_preset || "").trim(); + const options = this.presets.map((preset) => ({ + ...preset, + label: preset.name, + missing: false, + })); + if (selected && this._presetsLoaded && !options.some((preset) => preset.name === selected)) { + options.push({ + name: selected, + label: `${selected} (missing)`, + missing: true, + }); + } + return options; + }, + + selectedPresetSummary() { + const selected = String(this.config?.model_preset || "").trim(); + if (!selected) return "Using the effective Main Model."; + + const preset = this.selectedPreset(); + if (!preset) return `Preset "${selected}" is not available. Browser will fall back to the Main Model.`; + + const chat = preset.chat || {}; + const parts = [chat.provider, chat.name].filter((item) => String(item || "").trim()); + return parts.length ? 
parts.join(" / ") : "This preset has no Main Model; Browser will fall back to the Main Model."; + }, + + selectedPresetMissing() { + const selected = String(this.config?.model_preset || "").trim(); + return Boolean(selected && this._presetsLoaded && !this.selectedPreset()); + }, + + openPresets() { + void globalThis.openModal?.("/plugins/_model_config/webui/main.html"); + }, +}); diff --git a/plugins/_browser/webui/browser-store.js b/plugins/_browser/webui/browser-store.js new file mode 100644 index 000000000..3184403fd --- /dev/null +++ b/plugins/_browser/webui/browser-store.js @@ -0,0 +1,596 @@ +import { createStore } from "/js/AlpineStore.js"; +import { callJsonApi } from "/js/api.js"; +import { getNamespacedClient } from "/js/websocket.js"; +import { store as chatInputStore } from "/components/chat/input/input-store.js"; +import { store as fileBrowserStore } from "/components/modals/file-browser/file-browser-store.js"; +import { store as pluginSettingsStore } from "/components/plugins/plugin-settings-store.js"; + +const websocket = getNamespacedClient("/ws"); +websocket.addHandlers(["ws_webui"]); + +const EXTENSIONS_ROOT_FALLBACK = "/a0/usr/browser-extensions"; + +function firstOk(response) { + const result = response?.results?.find((item) => item?.ok); + if (result) { + const data = result.data || {}; + if (data.browser_error) { + throw new Error(data.browser_error.error || data.browser_error.code || "Browser request failed"); + } + return data; + } + const error = response?.results?.find((item) => !item?.ok)?.error; + if (error) throw new Error(error.error || error.code || "Browser request failed"); + return {}; +} + +const model = { + loading: true, + error: "", + status: null, + contextId: "", + browsers: [], + activeBrowserId: null, + address: "", + frameSrc: "", + frameState: null, + connected: false, + addressFocused: false, + _frameOff: null, + _stateOff: null, + _lastFrameAt: 0, + _floatingCleanup: null, + _stageElement: null, + _stageResizeObserver: 
null, + _viewportSyncTimer: null, + _lastViewportKey: "", + extensionMenuOpen: false, + extensionInstallUrl: "", + extensionActionLoading: false, + extensionActionMessage: "", + extensionActionError: "", + extensionsRoot: "", + extensionsList: [], + + async refreshStatus() { + this.status = await callJsonApi("/plugins/_browser/status", {}); + }, + + async refreshExtensionsList() { + const response = await callJsonApi("/plugins/_browser/extensions", { action: "list" }); + if (response?.ok) { + this.extensionsRoot = response.root || EXTENSIONS_ROOT_FALLBACK; + this.extensionsList = Array.isArray(response.extensions) ? response.extensions : []; + } + }, + + toggleExtensionsMenu() { + this.extensionMenuOpen = !this.extensionMenuOpen; + if (this.extensionMenuOpen) { + this.extensionActionMessage = ""; + this.extensionActionError = ""; + void this.refreshExtensionsList(); + } + }, + + closeExtensionsMenu() { + this.extensionMenuOpen = false; + }, + + resolveContextId() { + const urlContext = new URLSearchParams(globalThis.location?.search || "").get("ctxid"); + const selectedChat = globalThis.Alpine?.store?.("chats")?.selected; + return globalThis.getContext?.() || urlContext || selectedChat || ""; + }, + + async openExtensionsSettings() { + if (!pluginSettingsStore?.openConfig) { + this.error = "Browser settings are unavailable."; + return; + } + try { + this.closeExtensionsMenu(); + await pluginSettingsStore.openConfig("_browser"); + await this.refreshAfterSettingsClose(); + } catch (error) { + this.error = error instanceof Error ? 
error.message : String(error); + } + }, + + async refreshAfterSettingsClose() { + this.loading = true; + this.error = ""; + try { + await this.refreshStatus(); + await this.refreshExtensionsList(); + this.connected = false; + this.browsers = []; + this.setActiveBrowserId(null); + this.address = ""; + this.frameState = null; + this.frameSrc = ""; + if (this.contextId) { + await this.connectViewer(); + } + } finally { + this.loading = false; + } + }, + + async openExtensionsFolder() { + this.closeExtensionsMenu(); + try { + if (!this.extensionsRoot) { + await this.refreshExtensionsList(); + } + void fileBrowserStore.open(this.extensionsRoot || EXTENSIONS_ROOT_FALLBACK); + } catch (error) { + this.extensionActionError = error instanceof Error ? error.message : String(error); + } + }, + + createExtensionWithAgent() { + this._prefillAgentPrompt( + [ + "Use the a0-browser-ext skill to create a new Chrome extension for Agent Zero's Browser.", + "Start by asking me for the extension name, purpose, target websites, and required permissions.", + `Create it under ${this.extensionsRoot || EXTENSIONS_ROOT_FALLBACK}/ and keep permissions minimal.`, + ].join("\n") + ); + }, + + askAgentInstallExtension() { + const url = String(this.extensionInstallUrl || "").trim(); + this._prefillAgentPrompt( + [ + "Use the a0-browser-ext skill to install and review a Chrome Web Store extension for Agent Zero's Browser.", + url ? 
`Chrome Web Store URL or id: ${url}` : "Ask me for the Chrome Web Store URL or extension id first.", + "Explain the permissions and any sandbox risk before enabling it.", + ].join("\n") + ); + }, + + async installExtensionFromUrl() { + const url = String(this.extensionInstallUrl || "").trim(); + this.extensionActionMessage = ""; + this.extensionActionError = ""; + if (!url) { + this.extensionActionError = "Paste a Chrome Web Store URL or extension id first."; + return; + } + + this.extensionActionLoading = true; + try { + const response = await callJsonApi("/plugins/_browser/extensions", { + action: "install_web_store", + url, + }); + if (!response?.ok) { + throw new Error(response?.error || "Install failed."); + } + this.extensionInstallUrl = ""; + this.extensionActionMessage = `Installed ${response.name || response.id}. Browser sessions restart when extension settings change.`; + await this.refreshStatus(); + await this.refreshExtensionsList(); + } catch (error) { + this.extensionActionError = error instanceof Error ? error.message : String(error); + } finally { + this.extensionActionLoading = false; + } + }, + + _prefillAgentPrompt(prompt) { + chatInputStore.message = prompt; + chatInputStore.adjustTextareaHeight?.(); + chatInputStore.focus?.(); + this.closeExtensionsMenu(); + }, + + async onOpen(element = null) { + this.loading = true; + this.error = ""; + this.setupFloatingModal(element); + this.contextId = this.resolveContextId(); + try { + await this.refreshStatus(); + await this.connectViewer(); + } catch (error) { + this.error = error instanceof Error ? 
error.message : String(error); + } finally { + this.loading = false; + } + }, + + async connectViewer() { + if (!this.contextId) { + this.connected = false; + this.error = "No active chat context is selected."; + return; + } + this.error = ""; + await this._bindSocketEvents(); + const response = await websocket.request( + "browser_viewer_subscribe", + { + context_id: this.contextId, + browser_id: this.activeBrowserId, + }, + { timeoutMs: 10000 }, + ); + const data = firstOk(response); + this.browsers = data.browsers || []; + this.setActiveBrowserId(data.active_browser_id || this.activeBrowserId || null); + this.connected = true; + this.queueViewportSync(true); + }, + + async _bindSocketEvents() { + if (!this._frameOff) { + const frameHandler = ({ data }) => { + if (data?.context_id !== this.contextId) return; + this.browsers = data.browsers || this.browsers; + this.setActiveBrowserId(data.browser_id || data.state?.id || this.activeBrowserId); + this.frameState = data.state || null; + if (!this.addressFocused && data.state?.currentUrl) { + this.address = data.state.currentUrl; + } + this.frameSrc = data.image ? 
`data:${data.mime || "image/jpeg"};base64,${data.image}` : ""; + if (!data.image && !data.state) { + this.setActiveBrowserId(null); + this.frameState = null; + this.frameSrc = ""; + } + this._lastFrameAt = Date.now(); + }; + await websocket.on("browser_viewer_frame", frameHandler); + this._frameOff = () => websocket.off("browser_viewer_frame", frameHandler); + } + if (!this._stateOff) { + const stateHandler = ({ data }) => { + if (data?.context_id !== this.contextId) return; + this.browsers = data.browsers || []; + this.setActiveBrowserId(data.last_interacted_browser_id || this.firstBrowserId()); + this.queueViewportSync(true); + }; + await websocket.on("browser_viewer_state", stateHandler); + this._stateOff = () => websocket.off("browser_viewer_state", stateHandler); + } + }, + + async command(command, extra = {}) { + this.error = ""; + const previousActiveBrowserId = this.activeBrowserId; + try { + const response = await websocket.request( + "browser_viewer_command", + { + context_id: this.contextId, + browser_id: this.activeBrowserId, + command, + ...extra, + }, + { timeoutMs: 20000 }, + ); + const data = firstOk(response); + this.browsers = data.browsers || this.browsers; + const result = data.result || {}; + this.setActiveBrowserId( + result.id + || result.state?.id + || result.last_interacted_browser_id + || data.last_interacted_browser_id + || this.firstBrowserId() + ); + if (!this.activeBrowserId) { + this.frameState = null; + this.frameSrc = ""; + } + if (result.state?.currentUrl || result.currentUrl) { + this.address = result.state?.currentUrl || result.currentUrl; + } + const activeChanged = this.activeBrowserId && this.activeBrowserId !== previousActiveBrowserId; + if ((command === "open" || command === "close" || activeChanged) && this.contextId && this.activeBrowserId) { + await this.connectViewer(); + } + this.queueViewportSync(true); + } catch (error) { + this.error = error instanceof Error ? 
error.message : String(error); + } + }, + + async go() { + const url = String(this.address || "").trim(); + if (!url) return; + this.addressFocused = false; + globalThis.document?.activeElement?.blur?.(); + if (this.activeBrowserId) { + await this.command("navigate", { url }); + } else { + await this.command("open", { url }); + } + }, + + onAddressFocus() { + this.addressFocused = true; + }, + + onAddressBlur() { + this.addressFocused = false; + if (this.frameState?.currentUrl && !String(this.address || "").trim()) { + this.address = this.frameState.currentUrl; + } + }, + + async selectBrowser(id) { + if (String(id || "").trim() === "") { + await this.command("open", { url: "about:blank" }); + return; + } + this.setActiveBrowserId(id); + if (this.contextId) { + await this.connectViewer(); + } + }, + + firstBrowserId() { + const first = Array.isArray(this.browsers) ? this.browsers[0] : null; + return first?.id || null; + }, + + setActiveBrowserId(id) { + const previous = this.activeBrowserId; + const numeric = Number(id) || null; + const exists = !numeric || !Array.isArray(this.browsers) || this.browsers.some((browser) => Number(browser.id) === numeric); + this.activeBrowserId = exists ? 
numeric : null; + if (this.activeBrowserId !== previous) { + this._lastViewportKey = ""; + } + }, + + pointerCoordinatesFor(event, element = null) { + const target = element || event?.currentTarget; + if (!target) return null; + const rect = target.getBoundingClientRect(); + const naturalWidth = target.naturalWidth || rect.width; + const naturalHeight = target.naturalHeight || rect.height; + return { + x: ((event.clientX - rect.left) / Math.max(1, rect.width)) * naturalWidth, + y: ((event.clientY - rect.top) / Math.max(1, rect.height)) * naturalHeight, + }; + }, + + currentViewportSize() { + const stage = this._stageElement; + if (!stage) return null; + const width = Math.floor(stage.clientWidth || 0); + const height = Math.floor(stage.clientHeight || 0); + if (width < 80 || height < 80) return null; + return { + width: Math.max(320, width), + height: Math.max(200, height), + }; + }, + + queueViewportSync(force = false) { + if (this._viewportSyncTimer) { + globalThis.clearTimeout(this._viewportSyncTimer); + } + this._viewportSyncTimer = globalThis.setTimeout(() => { + this._viewportSyncTimer = null; + void this.syncViewport(force); + }, force ? 
0 : 80); + }, + + async syncViewport(force = false) { + if (!this.contextId || !this.activeBrowserId) return; + const viewport = this.currentViewportSize(); + if (!viewport) return; + const key = `${this.activeBrowserId}:${viewport.width}x${viewport.height}`; + if (!force && this._lastViewportKey === key) return; + try { + await websocket.emit("browser_viewer_input", { + context_id: this.contextId, + browser_id: this.activeBrowserId, + input_type: "viewport", + width: viewport.width, + height: viewport.height, + }); + this._lastViewportKey = key; + } catch (error) { + this._lastViewportKey = ""; + console.warn("Browser viewport sync failed", error); + } + }, + + async sendMouse(eventType, event) { + if (!this.activeBrowserId || !event?.currentTarget) return; + const pointer = this.pointerCoordinatesFor(event); + if (!pointer) return; + await websocket.emit("browser_viewer_input", { + context_id: this.contextId, + browser_id: this.activeBrowserId, + input_type: "mouse", + event_type: eventType, + x: pointer.x, + y: pointer.y, + button: "left", + }); + }, + + async sendWheel(event) { + if (!this.activeBrowserId || !event) return; + const image = event.currentTarget?.querySelector?.(".browser-frame") || event.target?.closest?.(".browser-frame"); + const pointer = this.pointerCoordinatesFor(event, image); + if (!pointer) return; + await websocket.emit("browser_viewer_input", { + context_id: this.contextId, + browser_id: this.activeBrowserId, + input_type: "wheel", + x: pointer.x, + y: pointer.y, + delta_x: Number(event.deltaX || 0), + delta_y: Number(event.deltaY || 0), + }); + }, + + async sendKey(event) { + if (!this.activeBrowserId) return; + if (event.ctrlKey || event.metaKey || event.altKey) return; + const editable = ["INPUT", "TEXTAREA", "SELECT"].includes(event.target?.tagName); + if (editable) return; + event.preventDefault(); + const printable = event.key && event.key.length === 1; + await websocket.emit("browser_viewer_input", { + context_id: this.contextId, 
+ browser_id: this.activeBrowserId, + input_type: "keyboard", + key: printable ? "" : event.key, + text: printable ? event.key : "", + }); + }, + + async cleanup() { + if (this.contextId) { + try { + await websocket.emit("browser_viewer_unsubscribe", { context_id: this.contextId }); + } catch {} + } + this._frameOff?.(); + this._stateOff?.(); + this._frameOff = null; + this._stateOff = null; + this._floatingCleanup?.(); + this._floatingCleanup = null; + this._stageResizeObserver?.disconnect?.(); + this._stageResizeObserver = null; + this._stageElement = null; + if (this._viewportSyncTimer) { + globalThis.clearTimeout(this._viewportSyncTimer); + this._viewportSyncTimer = null; + } + this._lastViewportKey = ""; + this.extensionMenuOpen = false; + this.extensionActionLoading = false; + this.connected = false; + }, + + setupFloatingModal(element = null) { + this._floatingCleanup?.(); + const root = element || globalThis.document?.querySelector(".browser-panel"); + const modal = root?.closest?.(".modal"); + const inner = modal?.querySelector?.(".modal-inner"); + const body = modal?.querySelector?.(".modal-bd"); + const header = modal?.querySelector?.(".modal-header"); + const stage = root?.querySelector?.(".browser-stage"); + if (!modal || !inner || !header) return; + modal.classList.add("modal-floating"); + inner.classList.add("browser-modal"); + body?.classList?.add("browser-modal-body"); + this._stageElement = stage || null; + + const rect = inner.getBoundingClientRect(); + inner.style.left = `${Math.max(8, rect.left)}px`; + inner.style.top = `${Math.max(8, rect.top)}px`; + inner.style.transform = "none"; + + let drag = null; + let resizeObserver = null; + const viewportGap = 8; + const clampPosition = (left, top) => { + const bounds = inner.getBoundingClientRect(); + const maxLeft = Math.max(viewportGap, globalThis.innerWidth - bounds.width - viewportGap); + const maxTop = Math.max(viewportGap, globalThis.innerHeight - bounds.height - viewportGap); + return { + 
left: Math.min(Math.max(viewportGap, left), maxLeft), + top: Math.min(Math.max(viewportGap, top), maxTop), + }; + }; + const clampGeometry = () => { + const bounds = inner.getBoundingClientRect(); + const left = Math.max(viewportGap, bounds.left); + const top = Math.max(viewportGap, bounds.top); + const maxWidth = Math.max(320, globalThis.innerWidth - viewportGap * 2); + const maxHeight = Math.max(300, globalThis.innerHeight - viewportGap * 2); + if (bounds.width > maxWidth) { + inner.style.width = `${maxWidth}px`; + } + if (bounds.height > maxHeight) { + inner.style.height = `${maxHeight}px`; + } + const next = clampPosition(left, top); + inner.style.left = `${next.left}px`; + inner.style.top = `${next.top}px`; + inner.style.maxWidth = `${Math.max(320, globalThis.innerWidth - next.left - viewportGap)}px`; + inner.style.maxHeight = `${Math.max(300, globalThis.innerHeight - next.top - viewportGap)}px`; + this.queueViewportSync(); + }; + clampGeometry(); + globalThis.addEventListener("resize", clampGeometry); + if (globalThis.ResizeObserver) { + resizeObserver = new ResizeObserver(clampGeometry); + resizeObserver.observe(inner); + if (stage) { + this._stageResizeObserver?.disconnect?.(); + this._stageResizeObserver = new ResizeObserver(() => this.queueViewportSync()); + this._stageResizeObserver.observe(stage); + } + } + globalThis.requestAnimationFrame(() => this.queueViewportSync(true)); + + const onPointerMove = (event) => { + if (!drag) return; + const next = clampPosition( + drag.left + event.clientX - drag.x, + drag.top + event.clientY - drag.y, + ); + inner.style.left = `${next.left}px`; + inner.style.top = `${next.top}px`; + clampGeometry(); + }; + const onPointerUp = () => { + drag = null; + globalThis.removeEventListener("pointermove", onPointerMove); + globalThis.removeEventListener("pointerup", onPointerUp); + try { + header.releasePointerCapture?.(header.__browserPanelPointerId || 0); + } catch {} + }; + const onPointerDown = (event) => { + if 
(event.button !== 0) return; + if (event.target?.closest?.("button, input, select, textarea, a")) return; + const current = inner.getBoundingClientRect(); + drag = { + x: event.clientX, + y: event.clientY, + left: current.left, + top: current.top, + }; + header.__browserPanelPointerId = event.pointerId; + header.setPointerCapture?.(event.pointerId); + globalThis.addEventListener("pointermove", onPointerMove); + globalThis.addEventListener("pointerup", onPointerUp); + event.preventDefault(); + }; + header.addEventListener("pointerdown", onPointerDown); + + this._floatingCleanup = () => { + header.removeEventListener("pointerdown", onPointerDown); + globalThis.removeEventListener("pointermove", onPointerMove); + globalThis.removeEventListener("pointerup", onPointerUp); + globalThis.removeEventListener("resize", clampGeometry); + resizeObserver?.disconnect?.(); + this._stageResizeObserver?.disconnect?.(); + this._stageResizeObserver = null; + }; + }, + + get activeTitle() { + return this.frameState?.title || "Browser"; + }, + + get activeUrl() { + return this.frameState?.currentUrl || this.address || "about:blank"; + }, +}; + +export const store = createStore("browserPage", model); diff --git a/plugins/_browser/webui/config.html b/plugins/_browser/webui/config.html new file mode 100644 index 000000000..7647e79be --- /dev/null +++ b/plugins/_browser/webui/config.html @@ -0,0 +1,225 @@ + + + Browser Settings + + + + +
+ +
+ + + + diff --git a/plugins/_browser/webui/main.html b/plugins/_browser/webui/main.html new file mode 100644 index 000000000..8fd4b5156 --- /dev/null +++ b/plugins/_browser/webui/main.html @@ -0,0 +1,556 @@ + + + Browser + + + +
+ +
+ + + + diff --git a/plugins/_browser_agent/api/model_preset.py b/plugins/_browser_agent/api/model_preset.py deleted file mode 100644 index 662122e43..000000000 --- a/plugins/_browser_agent/api/model_preset.py +++ /dev/null @@ -1,35 +0,0 @@ -from helpers.api import ApiHandler, Request, Response - -from plugins._browser_agent.helpers.model_preset import ( - get_browser_model_preset_name, - save_browser_model_preset_name, -) -from plugins._model_config.helpers import model_config - - -class ModelPreset(ApiHandler): - async def process(self, input: dict, request: Request) -> dict | Response: - action = str(input.get("action", "get") or "get").strip().lower() - - if action == "get": - return { - "ok": True, - "preset_name": get_browser_model_preset_name(), - } - - if action not in {"set", "clear"}: - return Response(status=400, response=f"Unknown action: {action}") - - preset_name = "" - if action == "set": - preset_name = str(input.get("preset_name", "") or "").strip() - if not preset_name: - return Response(status=400, response="Missing preset_name") - if not model_config.get_preset_by_name(preset_name): - return Response(status=404, response=f"Preset '{preset_name}' not found") - - save_browser_model_preset_name(preset_name) - return { - "ok": True, - "preset_name": preset_name, - } diff --git a/plugins/_browser_agent/api/status.py b/plugins/_browser_agent/api/status.py deleted file mode 100644 index bd382c022..000000000 --- a/plugins/_browser_agent/api/status.py +++ /dev/null @@ -1,54 +0,0 @@ -import importlib.metadata - -from helpers.api import ApiHandler, Request, Response -from plugins._browser_agent.helpers.model_preset import ( - get_browser_model_preset_options, - resolve_browser_model_selection, -) -from plugins._browser_agent.helpers.playwright import ( - get_playwright_binary, - get_playwright_cache_dir, -) - - -class Status(ApiHandler): - async def process(self, input: dict, request: Request) -> dict | Response: - selection = 
resolve_browser_model_selection() - cfg = selection["config"] - binary = get_playwright_binary() - - browser_use_ok = False - browser_use_error = "" - browser_use_version = "" - try: - import browser_use # noqa: F401 - - browser_use_ok = True - browser_use_version = importlib.metadata.version("browser-use") - except Exception as e: - browser_use_error = str(e) - - return { - "plugin": "_browser_agent", - "model_source": selection["source_label"], - "model_source_kind": selection["source_kind"], - "selected_preset_name": selection["selected_preset_name"], - "preset_status": selection["preset_status"], - "preset_warning": selection["warning"], - "available_presets": get_browser_model_preset_options(), - "model": { - "provider": cfg.get("provider", ""), - "name": cfg.get("name", ""), - "vision": bool(cfg.get("vision", False)), - }, - "playwright": { - "cache_dir": get_playwright_cache_dir(), - "binary_found": bool(binary), - "binary_path": str(binary) if binary else "", - }, - "browser_use": { - "import_ok": browser_use_ok, - "version": browser_use_version, - "error": browser_use_error, - }, - } diff --git a/plugins/_browser_agent/assets/init_override.js b/plugins/_browser_agent/assets/init_override.js deleted file mode 100644 index b04b0b98e..000000000 --- a/plugins/_browser_agent/assets/init_override.js +++ /dev/null @@ -1,246 +0,0 @@ -// open all shadow doms -(function () { - const originalAttachShadow = Element.prototype.attachShadow; - Element.prototype.attachShadow = function attachShadow(options) { - return originalAttachShadow.call(this, { ...options, mode: "open" }); - }; -})(); - -// // Create a global bridge for iframe communication -// (function() { -// let elementCounter = 0; -// const ignoredTags = [ -// "style", -// "script", -// "meta", -// "link", -// "svg", -// "noscript", -// "path", -// ]; - -// function isElementVisible(element) { -// // Return true for non-element nodes -// if (element.nodeType !== Node.ELEMENT_NODE) { -// return true; -// } - 
-// const computedStyle = window.getComputedStyle(element); - -// // Check if element is hidden via CSS -// if ( -// computedStyle.display === "none" || -// computedStyle.visibility === "hidden" || -// computedStyle.opacity === "0" -// ) { -// return false; -// } - -// // Check for hidden input type -// if (element.tagName === "INPUT" && element.type === "hidden") { -// return false; -// } - -// // Check for hidden attribute -// if ( -// element.hasAttribute("hidden") || -// element.getAttribute("aria-hidden") === "true" -// ) { -// return false; -// } - -// return true; -// } - -// function convertAttribute(tag, attr) { -// let out = { -// name: attr.name, -// value: attr.value, -// }; - -// if (["srcset"].includes(out.name)) return null; -// if (out.name.startsWith("data-") && out.name != "data-A0UID" && out.name != "data-a0-frame-id") return null; - -// if (tag === "img" && out.value.startsWith("data:")) out.value = "data..."; - -// return out; -// } - -// // This function will be available in all frames -// window.__A0_extractFrameContent = function() { -// // Get the current frame's DOM content -// const extractContent = (node) => { -// if (!node) return ""; - -// let content = ""; -// const tagName = node.tagName ? 
node.tagName.toLowerCase() : ""; - -// // Skip ignored tags -// if (tagName && ignoredTags.includes(tagName)) { -// return ""; -// } - -// if (node.nodeType === Node.ELEMENT_NODE) { -// // Add unique ID to the actual DOM element -// if (tagName) { -// const uid = elementCounter++; -// node.setAttribute("data-A0UID", uid); -// } - -// content += `<${tagName}`; - -// // Add invisible attribute if element is not visible -// if (!isElementVisible(node)) { -// content += " invisible"; -// } - -// // Add attributes with conversion -// for (let attr of node.attributes) { -// const out = convertAttribute(tagName, attr); -// if (out) content += ` ${out.name}="${out.value}"`; -// } - -// if (tagName) { -// content += ` selector="${node.getAttribute("data-A0UID")}"`; -// } - -// content += ">"; - -// // Handle shadow DOM -// if (node.shadowRoot) { -// content += ""; -// for (let shadowChild of node.shadowRoot.childNodes) { -// content += extractContent(shadowChild); -// } -// content += ""; -// } - -// // Handle child nodes -// for (let child of node.childNodes) { -// content += extractContent(child); -// } - -// content += ``; -// } else if (node.nodeType === Node.TEXT_NODE) { -// content += node.textContent; -// } else if (node.nodeType === Node.COMMENT_NODE) { -// content += ``; -// } - -// return content; -// }; - -// return extractContent(document.documentElement); -// }; - -// // Setup message listener in each frame -// window.addEventListener('message', function(event) { -// if (event.data === 'A0_REQUEST_CONTENT') { -// // Extract content and send it back to parent -// const content = window.__A0_extractFrameContent(); -// // Use '*' as targetOrigin since we're in a controlled environment -// window.parent.postMessage({ -// type: 'A0_FRAME_CONTENT', -// content: content, -// frameId: window.frameElement?.getAttribute('data-a0-frame-id') -// }, '*'); -// } -// }); - -// // Function to extract content from all frames -// window.__A0_extractAllFramesContent = async 
function(rootNode = document) { -// let content = ""; - -// // Extract content from current document -// content += window.__A0_extractFrameContent(); - -// // Find all iframes -// const iframes = rootNode.getElementsByTagName('iframe'); - -// // Create a map to store frame contents -// const frameContents = new Map(); - -// // Setup promise for each iframe -// const framePromises = Array.from(iframes).map((iframe) => { -// return new Promise((resolve) => { -// const frameId = 'frame_' + Math.random().toString(36).substr(2, 9); -// iframe.setAttribute('data-a0-frame-id', frameId); - -// // Setup one-time message listener for this specific frame -// const listener = function(event) { -// if (event.data?.type === 'A0_FRAME_CONTENT' && -// event.data?.frameId === frameId) { -// frameContents.set(frameId, event.data.content); -// window.removeEventListener('message', listener); -// resolve(); -// } -// }; -// window.addEventListener('message', listener); - -// // Request content from frame -// iframe.contentWindow.postMessage('A0_REQUEST_CONTENT', '*'); - -// // Timeout after 2 seconds -// setTimeout(resolve, 2000); -// }); -// }); - -// // Wait for all frames to respond or timeout -// await Promise.all(framePromises); - -// // Add frame contents in order -// for (let iframe of iframes) { -// const frameId = iframe.getAttribute('data-a0-frame-id'); -// const frameContent = frameContents.get(frameId); -// if (frameContent) { -// content += ``; -// content += frameContent; -// content += ``; -// } -// } - -// return content; -// }; -// })(); - -// // override iframe creation to inject our script into them -// (function() { -// // Store the original createElement to use for iframe creation -// const originalCreateElement = document.createElement; - -// // Override createElement to catch iframe creation -// document.createElement = function(tagName, options) { -// const element = originalCreateElement.call(document, tagName, options); -// if (tagName.toLowerCase() === 
'iframe') { -// // Override the src setter -// const originalSrcSetter = Object.getOwnPropertyDescriptor(HTMLIFrameElement.prototype, 'src').set; -// Object.defineProperty(element, 'src', { -// set: function(value) { -// // Call original setter -// originalSrcSetter.call(this, value); - -// // Wait for load and inject our script -// this.addEventListener('load', () => { -// try { -// // Try to inject our script into the iframe -// const iframeDoc = this.contentWindow.document; -// const script = iframeDoc.createElement('script'); -// script.textContent = ` -// // Make iframe accessible -// document.domain = document.domain; -// // Disable security policies if possible -// if (window.SecurityPolicyViolationEvent) { -// window.SecurityPolicyViolationEvent = undefined; -// } -// `; -// iframeDoc.head.appendChild(script); -// } catch(e) { -// console.warn('Could not inject into iframe:', e); -// } -// }, { once: true }); -// } -// }); -// } -// return element; -// }; -// })(); diff --git a/plugins/_browser_agent/extensions/python/_functions/agent/Agent/get_browser_model/start/_10_browser_agent.py b/plugins/_browser_agent/extensions/python/_functions/agent/Agent/get_browser_model/start/_10_browser_agent.py deleted file mode 100644 index 57ea73dfb..000000000 --- a/plugins/_browser_agent/extensions/python/_functions/agent/Agent/get_browser_model/start/_10_browser_agent.py +++ /dev/null @@ -1,7 +0,0 @@ -from helpers.extension import Extension -from plugins._browser_agent.helpers.browser_llm import build_browser_model_for_agent - -class BrowserModelProvider(Extension): - def execute(self, data: dict = {}, **kwargs): - if self.agent: - data["result"] = build_browser_model_for_agent(self.agent) diff --git a/plugins/_browser_agent/extensions/webui/get_message_handler/browser-agent-handler.js b/plugins/_browser_agent/extensions/webui/get_message_handler/browser-agent-handler.js deleted file mode 100644 index 5177c1394..000000000 --- 
a/plugins/_browser_agent/extensions/webui/get_message_handler/browser-agent-handler.js +++ /dev/null @@ -1,54 +0,0 @@ -import { - createActionButton, - copyToClipboard, -} from "/components/messages/action-buttons/simple-action-buttons.js"; -import { store as stepDetailStore } from "/components/modals/process-step-detail/step-detail-store.js"; -import { store as speechStore } from "/components/chat/speech/speech-store.js"; -import { - buildDetailPayload, - cleanStepTitle, - drawProcessStep, -} from "/js/messages.js"; - -export default async function registerBrowserAgentHandler(extData) { - if (extData?.type === "browser") { - extData.handler = drawMessageBrowserAgent; - } -} - -function drawMessageBrowserAgent({ - id, - type, - heading, - content, - kvps, - timestamp, - agentno = 0, - ...additional -}) { - const title = cleanStepTitle(heading); - const displayKvps = { ...kvps }; - const answerText = String(kvps?.answer ?? ""); - const actionButtons = answerText.trim() - ? [ - createActionButton("detail", "", () => - stepDetailStore.showStepDetail( - buildDetailPayload(arguments[0], { headerLabels: [] }), - ), - ), - createActionButton("speak", "", () => speechStore.speak(answerText)), - createActionButton("copy", "", () => copyToClipboard(answerText)), - ].filter(Boolean) - : []; - - return drawProcessStep({ - id, - title, - code: "WWW", - classes: undefined, - kvps: displayKvps, - content, - actionButtons, - log: arguments[0], - }); -} diff --git a/plugins/_browser_agent/extensions/webui/get_tool_message_handler/browser-tool-handler.js b/plugins/_browser_agent/extensions/webui/get_tool_message_handler/browser-tool-handler.js deleted file mode 100644 index 7dc480593..000000000 --- a/plugins/_browser_agent/extensions/webui/get_tool_message_handler/browser-tool-handler.js +++ /dev/null @@ -1,15 +0,0 @@ -import { drawMessageToolSimple } from "/js/messages.js"; - -/** - * Registers the browser_agent tool message handler to set the custom badge. 
- * @param {object} extData - */ -export default async function registerBrowserToolHandler(extData) { - if (extData?.tool_name === "browser_agent") { - extData.handler = drawBrowserTool; - } -} - -function drawBrowserTool(args) { - return drawMessageToolSimple({ ...args, code: "WWW" }); -} diff --git a/plugins/_browser_agent/helpers/__init__.py b/plugins/_browser_agent/helpers/__init__.py deleted file mode 100644 index 2d4bd7f1d..000000000 --- a/plugins/_browser_agent/helpers/__init__.py +++ /dev/null @@ -1 +0,0 @@ -# Built-in browser agent helpers. diff --git a/plugins/_browser_agent/helpers/browser_llm.py b/plugins/_browser_agent/helpers/browser_llm.py deleted file mode 100644 index 1b0725237..000000000 --- a/plugins/_browser_agent/helpers/browser_llm.py +++ /dev/null @@ -1,162 +0,0 @@ -from typing import Any, List, Optional -import litellm -from litellm import acompletion -from langchain_core.callbacks.manager import CallbackManagerForLLMRun -from langchain_core.messages import BaseMessage - -import models -from browser_use.llm import ChatGoogle, ChatOpenRouter - -from plugins._browser_agent.helpers import browser_use_monkeypatch -from plugins._browser_agent.helpers import model_preset -from plugins._browser_agent.helpers import browser_use_openrouter_compat -from plugins._browser_agent.helpers import browser_use_output_sanitize - - -_BROWSER_USE_PATCHED = False - - -def apply_browser_use_patches() -> None: - global _BROWSER_USE_PATCHED - if _BROWSER_USE_PATCHED: - return - - browser_use_monkeypatch.apply() - litellm.modify_params = True - _BROWSER_USE_PATCHED = True - - -class AsyncAIChatReplacement: - class _Completions: - def __init__(self, wrapper): - self._wrapper = wrapper - - async def create(self, *args, **kwargs): - return await self._wrapper._acall(*args, **kwargs) - - class _Chat: - def __init__(self, wrapper): - self.completions = AsyncAIChatReplacement._Completions(wrapper) - - def __init__(self, wrapper, *args, **kwargs): - self._wrapper = wrapper 
- self.chat = AsyncAIChatReplacement._Chat(wrapper) - - -class BrowserCompatibleChatWrapper(ChatOpenRouter): - """ - A wrapper for browser agent that can filter/sanitize messages - before sending them to the LLM. - """ - - def __init__(self, *args, **kwargs): - apply_browser_use_patches() - models.turn_off_logging() - self._wrapper = models.LiteLLMChatWrapper(*args, **kwargs) - self.model = self._wrapper.model_name - self.kwargs = self._wrapper.kwargs - - @property - def model_name(self) -> str: - return self._wrapper.model_name - - @property - def provider(self) -> str: - return self._wrapper.provider - - def get_client(self, *args, **kwargs): # type: ignore - return AsyncAIChatReplacement(self, *args, **kwargs) - - async def _acall( - self, - messages: List[BaseMessage], - stop: Optional[List[str]] = None, - run_manager: Optional[CallbackManagerForLLMRun] = None, - **kwargs: Any, - ): - models.apply_rate_limiter_sync(self._wrapper.a0_model_conf, str(messages)) - - try: - model = kwargs.pop("model", None) - effective_model = model or self._wrapper.model_name - kwrgs = {**self._wrapper.kwargs, **kwargs} - request_messages = messages - - # hack from browser-use to fix json schema for gemini (additionalProperties, $defs, $ref) - if "response_format" in kwrgs and "json_schema" in kwrgs["response_format"] and effective_model and effective_model.startswith("gemini/"): - kwrgs["response_format"]["json_schema"] = ChatGoogle("")._fix_gemini_schema(kwrgs["response_format"]["json_schema"]) - - if browser_use_openrouter_compat.should_use_openrouter_prompt_schema_fallback( - provider=self.provider, - model_name=effective_model, - kwargs=kwrgs, - ): - fallback_request = browser_use_openrouter_compat.build_json_object_fallback_request( - messages=messages, - kwargs=kwrgs, - ) - if fallback_request is not None: - request_messages, kwrgs = fallback_request - - resp = await acompletion( - model=self._wrapper.model_name, - messages=request_messages, - stop=stop, - **kwrgs, - ) - - # 
Gemini: strip triple backticks and conform schema - try: - msg = resp.choices[0].message # type: ignore - if self.provider == "gemini" and isinstance(getattr(msg, "content", None), str): - cleaned = browser_use_monkeypatch.gemini_clean_and_conform(msg.content) # type: ignore - if cleaned: - msg.content = cleaned - except Exception: - pass - - except Exception as e: - raise e - - # Structured output: normalize keys/models reject (e.g. "" on action dicts) and repair partial JSON - try: - rf = kwrgs.get("response_format") or {} - if "json_schema" in rf or "json_object" in rf: - msg_obj = resp.choices[0].message - raw_content = getattr(msg_obj, "content", None) - fixed = browser_use_output_sanitize.sanitize_llm_message_content_for_browser_use(raw_content) # type: ignore[arg-type] - if fixed is not None: - msg_obj.content = fixed - except Exception: - pass - - return resp - - -def build_browser_model_from_config( - model_config: models.ModelConfig, -) -> BrowserCompatibleChatWrapper: - apply_browser_use_patches() - original_provider = model_config.provider.lower() - provider_name, kwargs = models._merge_provider_defaults( # type: ignore[attr-defined] - "chat", original_provider, model_config.build_kwargs() - ) - return models._get_litellm_chat( # type: ignore[attr-defined] - BrowserCompatibleChatWrapper, - model_config.name, - provider_name, - model_config, - **kwargs, - ) - -def build_browser_model_for_agent(agent=None) -> BrowserCompatibleChatWrapper: - """Build and return the browser-use adapter using chat model config.""" - from plugins._model_config.helpers.model_config import ( - build_model_config, - ) - import models - - selection = model_preset.resolve_browser_model_selection(agent) - cfg = selection["config"] - mc = build_model_config(cfg, models.ModelType.CHAT) - return build_browser_model_from_config(mc) diff --git a/plugins/_browser_agent/helpers/browser_use.py b/plugins/_browser_agent/helpers/browser_use.py deleted file mode 100644 index 
df4b3c7a6..000000000 --- a/plugins/_browser_agent/helpers/browser_use.py +++ /dev/null @@ -1,4 +0,0 @@ -from helpers import dotenv -dotenv.save_dotenv_value("ANONYMIZED_TELEMETRY", "false") -import browser_use -import browser_use.utils diff --git a/plugins/_browser_agent/helpers/browser_use_monkeypatch.py b/plugins/_browser_agent/helpers/browser_use_monkeypatch.py deleted file mode 100644 index 83d0eb0c9..000000000 --- a/plugins/_browser_agent/helpers/browser_use_monkeypatch.py +++ /dev/null @@ -1,166 +0,0 @@ -from typing import Any -from browser_use.llm import ChatGoogle -from helpers import dirty_json - -from plugins._browser_agent.helpers import browser_use_output_sanitize - - -# ------------------------------------------------------------------------------ -# Gemini Helper for Output Conformance -# ------------------------------------------------------------------------------ -# This function sanitizes and conforms the JSON output from Gemini to match -# the specific schema expectations of the browser-use library. It handles -# markdown fences, aliases actions (like 'complete_task' to 'done'), and -# intelligently constructs a valid 'data' object for the final action. 
- -def gemini_clean_and_conform(text: str): - obj = None - try: - # dirty_json parser is robust enough to handle markdown fences - obj = dirty_json.parse(text) - except Exception: - return None # return None if parsing fails - - if not isinstance(obj, dict): - return None - - obj = browser_use_output_sanitize.normalize_parsed_browser_use_output(obj) - - # Conform actions to browser-use expectations - if isinstance(obj.get("action"), list): - normalized_actions = [] - for item in obj["action"]: - if not isinstance(item, dict): - continue # Skip non-dict items - - action_key, action_value = next(iter(item.items()), (None, None)) - if not action_key: - continue - - # Alias 'complete_task' to 'done' to handle inconsistencies - if action_key == "complete_task": - action_key = "done" - - # Create a mutable copy of the value - v = (action_value or {}).copy() - - if action_key in ("scroll_down", "scroll_up", "scroll"): - is_down = action_key != "scroll_up" - v.setdefault("down", is_down) - v.setdefault("num_pages", 1.0) - normalized_actions.append({"scroll": v}) - elif action_key == "go_to_url": - v.setdefault("new_tab", False) - normalized_actions.append({action_key: v}) - elif action_key == "done": - # If `data` is missing, construct it from other keys - if "data" not in v: - # Pop fields from the top-level `done` object - response_text = v.pop("response", None) - summary_text = v.pop("page_summary", None) - title_text = v.pop("title", "Task Completed") - - final_response = response_text or "Task completed successfully." # browser-use expects string - final_summary = summary_text or "No page summary available." 
# browser-use expects string - - v["data"] = { - "title": title_text, - "response": final_response, - "page_summary": final_summary, - } - - v.setdefault("success", True) - normalized_actions.append({action_key: v}) - else: - normalized_actions.append(item) - obj["action"] = normalized_actions - - return dirty_json.stringify(obj) - -# ------------------------------------------------------------------------------ -# Monkey-patch for browser-use Gemini schema issue -# ------------------------------------------------------------------------------ -# The original _fix_gemini_schema in browser_use.llm.google.chat.ChatGoogle -# removes the 'title' property but fails to remove it from the 'required' list, -# causing a validation error with the Gemini API. This patch corrects that behavior. - -def _patched_fix_gemini_schema(self, schema: dict[str, Any]) -> dict[str, Any]: - """ - Convert a Pydantic model to a Gemini-compatible schema. - - This function removes unsupported properties like 'additionalProperties' and resolves - $ref references that Gemini doesn't support. 
- """ - - # Handle $defs and $ref resolution - if '$defs' in schema: - defs = schema.pop('$defs') - - def resolve_refs(obj: Any) -> Any: - if isinstance(obj, dict): - if '$ref' in obj: - ref = obj.pop('$ref') - ref_name = ref.split('/')[-1] - if ref_name in defs: - # Replace the reference with the actual definition - resolved = defs[ref_name].copy() - # Merge any additional properties from the reference - for key, value in obj.items(): - if key != '$ref': - resolved[key] = value - return resolve_refs(resolved) - return obj - else: - # Recursively process all dictionary values - return {k: resolve_refs(v) for k, v in obj.items()} - elif isinstance(obj, list): - return [resolve_refs(item) for item in obj] - return obj - - schema = resolve_refs(schema) - - # Remove unsupported properties - def clean_schema(obj: Any) -> Any: - if isinstance(obj, dict): - # Remove unsupported properties - cleaned = {} - for key, value in obj.items(): - if key not in ['additionalProperties', 'title', 'default']: - cleaned_value = clean_schema(value) - # Handle empty object properties - Gemini doesn't allow empty OBJECT types - if ( - key == 'properties' - and isinstance(cleaned_value, dict) - and len(cleaned_value) == 0 - and isinstance(obj.get('type', ''), str) - and obj.get('type', '').upper() == 'OBJECT' - ): - # Convert empty object to have at least one property - cleaned['properties'] = {'_placeholder': {'type': 'string'}} - else: - cleaned[key] = cleaned_value - - # If this is an object type with empty properties, add a placeholder - if ( - isinstance(cleaned.get('type', ''), str) - and cleaned.get('type', '').upper() == 'OBJECT' - and 'properties' in cleaned - and isinstance(cleaned['properties'], dict) - and len(cleaned['properties']) == 0 - ): - cleaned['properties'] = {'_placeholder': {'type': 'string'}} - - # PATCH: Also remove 'title' from the required list if it exists - if 'required' in cleaned and isinstance(cleaned.get('required'), list): - cleaned['required'] = [p for p 
in cleaned['required'] if p != 'title'] - - return cleaned - elif isinstance(obj, list): - return [clean_schema(item) for item in obj] - return obj - - return clean_schema(schema) - -def apply(): - """Applies the monkey-patch to ChatGoogle.""" - ChatGoogle._fix_gemini_schema = _patched_fix_gemini_schema diff --git a/plugins/_browser_agent/helpers/browser_use_openrouter_compat.py b/plugins/_browser_agent/helpers/browser_use_openrouter_compat.py deleted file mode 100644 index a377d3d7d..000000000 --- a/plugins/_browser_agent/helpers/browser_use_openrouter_compat.py +++ /dev/null @@ -1,93 +0,0 @@ -from __future__ import annotations - -import copy -import json -from typing import Any - -def is_openrouter_request(provider: str | None, model_name: str | None) -> bool: - provider_name = (provider or "").lower() - model = (model_name or "").lower() - return provider_name == "openrouter" or model.startswith("openrouter/") - - -def has_json_schema_response_format(kwargs: dict[str, Any]) -> bool: - response_format = kwargs.get("response_format") - return isinstance(response_format, dict) and ( - response_format.get("type") == "json_schema" or "json_schema" in response_format - ) - - -def should_use_openrouter_prompt_schema_fallback( - provider: str | None, model_name: str | None, kwargs: dict[str, Any] -) -> bool: - """ - OpenRouter sometimes routes browser-use structured output through providers - that reject large compiled grammars. Avoid the hard error entirely by - downgrading to `json_object` before the first request. - """ - return is_openrouter_request(provider, model_name) and has_json_schema_response_format(kwargs) - - -def relax_strict_tool_schemas(tools: Any) -> Any: - """ - Disable strict tool grammar on fallback while keeping tool definitions intact. 
- """ - if not isinstance(tools, list): - return tools - - relaxed = copy.deepcopy(tools) - for tool in relaxed: - if not isinstance(tool, dict): - continue - function_spec = tool.get("function") - if isinstance(function_spec, dict) and function_spec.get("strict") is True: - function_spec["strict"] = False - return relaxed - - -def _schema_hint_text(response_format: dict[str, Any]) -> str | None: - schema_payload = response_format.get("json_schema") - if not isinstance(schema_payload, dict): - return None - - compact_schema = json.dumps( - schema_payload, - ensure_ascii=False, - separators=(",", ":"), - ) - return ( - "Return only a single JSON object with no markdown fences, prose, or extra text. " - "Follow this schema exactly: " - f"{compact_schema}" - ) - - -def prepend_schema_hint_to_messages( - messages: list[Any], response_format: dict[str, Any] -) -> list[Any]: - hint = _schema_hint_text(response_format) - if not hint: - return list(messages) - return [{"role": "system", "content": hint}, *list(messages)] - - -def build_json_object_fallback_request( - messages: list[Any], - kwargs: dict[str, Any], -) -> tuple[list[Any], dict[str, Any]] | None: - """ - Replace strict json_schema with json_object and move schema guidance into the prompt. - - This keeps browser-use's local validation path while avoiding provider-side - grammar compilation limits on OpenRouter. 
- """ - response_format = kwargs.get("response_format") - if not isinstance(response_format, dict): - return None - - updated_kwargs = copy.deepcopy(kwargs) - updated_kwargs["response_format"] = {"type": "json_object"} - if "tools" in updated_kwargs: - updated_kwargs["tools"] = relax_strict_tool_schemas(updated_kwargs["tools"]) - updated_messages = prepend_schema_hint_to_messages(messages, response_format) - return updated_messages, updated_kwargs diff --git a/plugins/_browser_agent/helpers/browser_use_output_sanitize.py b/plugins/_browser_agent/helpers/browser_use_output_sanitize.py deleted file mode 100644 index 3a161d9ae..000000000 --- a/plugins/_browser_agent/helpers/browser_use_output_sanitize.py +++ /dev/null @@ -1,79 +0,0 @@ -""" -Utilities to normalize LLM replies before browser-use parses them into AgentOutput. - -Some models (e.g. via OpenRouter) emit extra JSON keys such as "" : "", which -Pydantic rejects as extra_forbidden on strict action union members. -""" - -from __future__ import annotations - -from typing import Any - -from helpers import dirty_json - - -def deep_strip_empty_string_keys(obj: Any) -> Any: - """ - Recursively remove dict entries whose key is the empty string. - - Browser-use action objects must be discriminated unions with a single - action key; spurious "" keys break validation for every union variant. - """ - if isinstance(obj, dict): - return { - k: deep_strip_empty_string_keys(v) - for k, v in obj.items() - if k != "" - } - if isinstance(obj, list): - return [deep_strip_empty_string_keys(item) for item in obj] - return obj - - -def normalize_parsed_browser_use_output(obj: dict) -> dict: - """Apply all normalizations safe for a parsed AgentOutput-shaped dict.""" - out = deep_strip_empty_string_keys(obj) - if not isinstance(out, dict): - return obj - return out - - -def parse_and_sanitize_llm_json(text: str) -> str | None: - """ - Parse message content and return JSON text safe for AgentOutput parsing. 
- - Returns None if the string is not a JSON object. - """ - try: - obj = dirty_json.parse(text) - except Exception: - return None - if not isinstance(obj, dict): - return None - return dirty_json.stringify(normalize_parsed_browser_use_output(obj)) - - -def sanitize_llm_message_content_for_browser_use(content: str | None) -> str | None: - """ - Best-effort sanitize assistant message content in place for browser-use. - - - If content parses as a dict: strip bad keys and re-serialize. - - If content is non-JSON or trailing garbage: try dirty_json parse; if dict, sanitize. - - Otherwise return the original string. - """ - if content is None: - return None - stripped = content.strip() - if not stripped: - return content - sanitized = parse_and_sanitize_llm_json(stripped) - if sanitized is not None: - return sanitized - if not stripped.startswith("{"): - try: - obj = dirty_json.parse(stripped) - except Exception: - return content - if isinstance(obj, dict): - return dirty_json.stringify(normalize_parsed_browser_use_output(obj)) - return content diff --git a/plugins/_browser_agent/helpers/model_preset.py b/plugins/_browser_agent/helpers/model_preset.py deleted file mode 100644 index 35b00803b..000000000 --- a/plugins/_browser_agent/helpers/model_preset.py +++ /dev/null @@ -1,122 +0,0 @@ -from __future__ import annotations - -from typing import Any - -from helpers import plugins as plugin_helpers -from plugins._model_config.helpers import model_config - - -MODEL_PRESET_KEY = "model_preset" - - -def get_browser_model_preset_name(agent=None) -> str: - config = plugin_helpers.get_plugin_config("_browser_agent", agent=agent) or {} - return str(config.get(MODEL_PRESET_KEY, "") or "").strip() - - -def get_browser_model_preset_options(agent=None) -> list[dict[str, Any]]: - selected_name = get_browser_model_preset_name(agent) - options: list[dict[str, Any]] = [] - found_selected = False - - for preset in model_config.get_presets(): - name = str(preset.get("name", "") or 
"").strip() - if not name: - continue - if name == selected_name: - found_selected = True - chat_cfg = preset.get("chat", {}) if isinstance(preset, dict) else {} - if not isinstance(chat_cfg, dict): - chat_cfg = {} - provider = str(chat_cfg.get("provider", "") or "").strip() - model_name = str(chat_cfg.get("name", "") or "").strip() - summary = " / ".join(part for part in (provider, model_name) if part) - options.append( - { - "name": name, - "label": name, - "missing": False, - "summary": summary, - } - ) - - if selected_name and not found_selected: - options.append( - { - "name": selected_name, - "label": f"{selected_name} (missing)", - "missing": True, - "summary": "", - } - ) - - return options - - -def resolve_browser_model_selection(agent=None) -> dict[str, Any]: - preset_name = get_browser_model_preset_name(agent) - if preset_name: - preset = model_config.get_preset_by_name(preset_name) - if isinstance(preset, dict): - chat_cfg = preset.get("chat", {}) - if isinstance(chat_cfg, dict) and ( - str(chat_cfg.get("provider", "") or "").strip() - or str(chat_cfg.get("name", "") or "").strip() - ): - return { - "config": chat_cfg, - "source_kind": "preset", - "source_label": f"Preset '{preset_name}' via _model_config", - "selected_preset_name": preset_name, - "preset_status": "active", - "warning": "", - } - return { - "config": model_config.get_chat_model_config(agent), - "source_kind": "main", - "source_label": "Main Model via _model_config", - "selected_preset_name": preset_name, - "preset_status": "invalid", - "warning": ( - f"Configured browser preset '{preset_name}' does not define a chat model. " - "Falling back to the Main Model." - ), - } - - return { - "config": model_config.get_chat_model_config(agent), - "source_kind": "main", - "source_label": "Main Model via _model_config", - "selected_preset_name": preset_name, - "preset_status": "missing", - "warning": ( - f"Configured browser preset '{preset_name}' was not found. 
" - "Falling back to the Main Model." - ), - } - - return { - "config": model_config.get_chat_model_config(agent), - "source_kind": "main", - "source_label": "Main Model via _model_config", - "selected_preset_name": "", - "preset_status": "none", - "warning": "", - } - - -def save_browser_model_preset_name(preset_name: str) -> None: - normalized = str(preset_name or "").strip() - config = plugin_helpers.get_plugin_config("_browser_agent") or {} - - if normalized: - config[MODEL_PRESET_KEY] = normalized - else: - config.pop(MODEL_PRESET_KEY, None) - - plugin_helpers.save_plugin_config( - "_browser_agent", - project_name="", - agent_profile="", - settings=config, - ) diff --git a/plugins/_browser_agent/helpers/playwright.py b/plugins/_browser_agent/helpers/playwright.py deleted file mode 100644 index 10c68d656..000000000 --- a/plugins/_browser_agent/helpers/playwright.py +++ /dev/null @@ -1,38 +0,0 @@ -import os -import sys -from pathlib import Path -import subprocess -from helpers import files - - -# this helper ensures that playwright is installed in /lib/playwright -# should work for both docker and local installation - -def get_playwright_binary(): - pw_cache = Path(get_playwright_cache_dir()) - for pattern in ( - "chromium_headless_shell-*/chrome-*/headless_shell", - "chromium_headless_shell-*/chrome-*/headless_shell.exe", - ): - binary = next(pw_cache.glob(pattern), None) - if binary: - return binary - return None - -def get_playwright_cache_dir(): - return files.get_abs_path("tmp/playwright") - -def ensure_playwright_binary(): - bin = get_playwright_binary() - if not bin: - cache = get_playwright_cache_dir() - env = os.environ.copy() - env["PLAYWRIGHT_BROWSERS_PATH"] = cache - subprocess.check_call( - ["playwright", "install", "chromium", "--only-shell"], - env=env - ) - bin = get_playwright_binary() - if not bin: - raise Exception("Playwright binary not found after installation") - return bin diff --git a/plugins/_browser_agent/plugin.yaml 
b/plugins/_browser_agent/plugin.yaml deleted file mode 100644 index 4ce5fae4e..000000000 --- a/plugins/_browser_agent/plugin.yaml +++ /dev/null @@ -1,8 +0,0 @@ -name: _browser_agent -title: Browser Agent -description: Built-in browser-use automation tool. -version: 1.0.0 -always_enabled: false -settings_sections: [] -per_project_config: false -per_agent_config: false diff --git a/plugins/_browser_agent/prompts/agent.system.tool.browser.md b/plugins/_browser_agent/prompts/agent.system.tool.browser.md deleted file mode 100644 index b7c5be90d..000000000 --- a/plugins/_browser_agent/prompts/agent.system.tool.browser.md +++ /dev/null @@ -1,7 +0,0 @@ -### browser_agent -subordinate browser worker for web tasks -args: `message`, `reset` -- give clear task-oriented instructions, credentials, and a stop condition -- `reset=true` starts a new browser session; `false` continues the current one -- when continuing, refer to open pages instead of restarting -downloads go to `/a0/tmp/downloads` diff --git a/plugins/_browser_agent/prompts/browser_agent.system.md b/plugins/_browser_agent/prompts/browser_agent.system.md deleted file mode 100644 index 70a41a44f..000000000 --- a/plugins/_browser_agent/prompts/browser_agent.system.md +++ /dev/null @@ -1,22 +0,0 @@ -# Operation instruction -Keep your tasks solution as simple and straight forward as possible -Follow instructions as closely as possible -When told go to website, open the website. If no other instructions: stop there -Do not interact with the website unless told to -Always accept all cookies if prompted on the website, NEVER go to browser cookie settings -If asked specific questions about a website, be as precise and close to the actual page content as possible -If you are waiting for instructions: you should end the task and mark as done - -## Task Completion -When you have completed the assigned task OR are waiting for further instructions: -1. Use the "Complete task" action to mark the task as complete -2. 
Provide the required parameters: title, response, and page_summary -3. Do NOT continue taking actions after calling "Complete task" - -## Important Notes -- Always call "Complete task" when your objective is achieved -- In page_summary respond with one paragraph of main content plus an overview of page elements -- Response field is used to answer to user's task or ask additional questions -- If you navigate to a website and no further actions are requested, call "Complete task" immediately -- If you complete any requested interaction (clicking, typing, etc.), call "Complete task" -- Never leave a task running indefinitely - always conclude with "Complete task" diff --git a/plugins/_browser_agent/tools/browser_agent.py b/plugins/_browser_agent/tools/browser_agent.py deleted file mode 100644 index e7098c86f..000000000 --- a/plugins/_browser_agent/tools/browser_agent.py +++ /dev/null @@ -1,440 +0,0 @@ -import asyncio -import time -from typing import Optional, cast -from agent import Agent, InterventionException -from pathlib import Path - -from helpers.tool import Tool, Response -from helpers import files, defer, persist_chat, strings -from plugins._browser_agent.helpers.browser_use import browser_use # type: ignore[attr-defined] -from helpers.print_style import PrintStyle -from plugins._browser_agent.helpers.playwright import ensure_playwright_binary -from helpers.secrets import get_secrets_manager -from extensions.python.message_loop_start._10_iteration_no import get_iter_no -from pydantic import BaseModel -import uuid -from helpers.dirty_json import DirtyJson - - -PLUGIN_DIR = Path(__file__).resolve().parents[1] - - -class State: - @staticmethod - async def create(agent: Agent): - state = State(agent) - return state - - def __init__(self, agent: Agent): - self.agent = agent - self.browser_session: Optional[browser_use.BrowserSession] = None - self.task: Optional[defer.DeferredTask] = None - self.use_agent: Optional[browser_use.Agent] = None - self.secrets_dict: 
Optional[dict[str, str]] = None - self.iter_no = 0 - - def __del__(self): - self.kill_task() - files.delete_dir(self.get_user_data_dir()) # cleanup user data dir - - def get_user_data_dir(self): - return str( - Path.home() - / ".config" - / "browseruse" - / "profiles" - / f"agent_{self.agent.context.id}" - ) - - def _get_browser_http_headers(self): - # ignored for now - return {} - - def _get_browser_vision(self): - from plugins._model_config.helpers.model_config import get_chat_model_config - cfg = get_chat_model_config(self.agent) - return cfg.get("vision", False) - - async def _initialize(self): - if self.browser_session: - return - - # for some reason we need to provide exact path to headless shell, otherwise it looks for headed browser - pw_binary = ensure_playwright_binary() - - self.browser_session = browser_use.BrowserSession( - browser_profile=browser_use.BrowserProfile( - headless=True, - disable_security=True, - chromium_sandbox=False, - accept_downloads=True, - downloads_path=files.get_abs_path("usr/downloads"), - allowed_domains=["*", "http://*", "https://*"], - executable_path=pw_binary, - keep_alive=True, - minimum_wait_page_load_time=1.0, - wait_for_network_idle_page_load_time=2.0, - maximum_wait_page_load_time=10.0, - window_size={"width": 1024, "height": 2048}, - screen={"width": 1024, "height": 2048}, - viewport={"width": 1024, "height": 2048}, - no_viewport=False, - args=["--headless=new", "--no-sandbox"], - # Use a unique user data directory to avoid conflicts - user_data_dir=self.get_user_data_dir(), - extra_http_headers=self._get_browser_http_headers(), - ) - ) - - await self.browser_session.start() if self.browser_session else None - # self.override_hooks() - - # -------------------------------------------------------------------------- - # Patch to enforce vertical viewport size - # -------------------------------------------------------------------------- - # Browser-use auto-configuration overrides viewport settings, causing wrong - # 
aspect ratio. We fix this by directly setting viewport size after startup. - # -------------------------------------------------------------------------- - - if self.browser_session: - try: - page = await self.browser_session.get_current_page() - if page: - await page.set_viewport_size({"width": 1024, "height": 2048}) - except Exception as e: - PrintStyle().warning(f"Could not force set viewport size: {e}") - - # -------------------------------------------------------------------------- - - # Add init script to the browser session - if self.browser_session and self.browser_session.browser_context: - js_override = str(PLUGIN_DIR / "assets" / "init_override.js") - await self.browser_session.browser_context.add_init_script(path=js_override) if self.browser_session else None - - def start_task(self, task: str): - if self.task and self.task.is_alive(): - self.kill_task() - - self.task = defer.DeferredTask( - thread_name="BrowserAgent" + self.agent.context.id - ) - if self.agent.context.task: - self.agent.context.task.add_child_task(self.task, terminate_thread=True) - self.task.start_task(self._run_task, task) if self.task else None - return self.task - - def kill_task(self): - if self.task: - self.task.kill(terminate_thread=True) - self.task = None - if self.browser_session: - try: - import asyncio - - loop = asyncio.new_event_loop() - asyncio.set_event_loop(loop) - loop.run_until_complete(self.browser_session.close()) if self.browser_session else None - loop.close() - except Exception as e: - PrintStyle().error(f"Error closing browser session: {e}") - finally: - self.browser_session = None - self.use_agent = None - self.iter_no = 0 - - async def _run_task(self, task: str): - await self._initialize() - - class DoneResult(BaseModel): - title: str - response: str - page_summary: str - - # Initialize controller - controller = browser_use.Controller(output_model=DoneResult) - - # Register custom completion action with proper ActionResult fields - 
@controller.registry.action("Complete task", param_model=DoneResult) - async def complete_task(params: DoneResult): - result = browser_use.ActionResult( - is_done=True, success=True, extracted_content=params.model_dump_json() - ) - return result - - model = self.agent.get_browser_model() - - try: - - secrets_manager = get_secrets_manager(self.agent.context) - secrets_dict = secrets_manager.load_secrets() - - self.use_agent = browser_use.Agent( - task=task, - browser_session=self.browser_session, - llm=model, - use_vision=self._get_browser_vision(), - extend_system_message=self.agent.read_prompt( - "prompts/browser_agent.system.md" - ), - controller=controller, - enable_memory=False, # Disable memory to avoid state conflicts - llm_timeout=3000, # TODO rem - sensitive_data=cast(dict[str, str | dict[str, str]] | None, secrets_dict or {}), # Pass secrets - ) - except Exception as e: - raise Exception( - f"Browser agent initialization failed. This might be due to model compatibility issues. Error: {e}" - ) from e - - self.iter_no = get_iter_no(self.agent) - - async def hook(agent: browser_use.Agent): - await self.agent.wait_if_paused() - if self.iter_no != get_iter_no(self.agent): - raise InterventionException("Task cancelled") - - # try: - result = None - if self.use_agent: - result = await self.use_agent.run( - max_steps=50, on_step_start=hook, on_step_end=hook - ) - return result - - async def get_page(self): - if self.use_agent and self.browser_session: - try: - return await self.use_agent.browser_session.get_current_page() if self.use_agent.browser_session else None - except Exception: - # Browser session might be closed or invalid - return None - return None - - async def get_selector_map(self): - """Get the selector map for the current page state.""" - if self.use_agent: - await self.use_agent.browser_session.get_state_summary(cache_clickable_elements_hashes=True) if self.use_agent.browser_session else None - return await 
self.use_agent.browser_session.get_selector_map() if self.use_agent.browser_session else None - await self.use_agent.browser_session.get_state_summary( - cache_clickable_elements_hashes=True - ) - return await self.use_agent.browser_session.get_selector_map() - return {} - - -class BrowserAgent(Tool): - - async def execute(self, message="", reset="", **kwargs): - self.guid = self.agent.context.generate_id() # short random id - reset = str(reset).lower().strip() == "true" - await self.prepare_state(reset=reset) - message = get_secrets_manager(self.agent.context).mask_values(message, placeholder="{key}") # mask any potential passwords passed from A0 to browser-use to browser-use format - task = self.state.start_task(message) if self.state else None - - # wait for browser agent to finish and update progress with timeout - timeout_seconds = 300 # 5 minute timeout - start_time = time.time() - - fail_counter = 0 - while not task.is_ready() if task else False: - # Check for timeout to prevent infinite waiting - if time.time() - start_time > timeout_seconds: - PrintStyle().warning( - self._mask(f"Browser agent task timeout after {timeout_seconds} seconds, forcing completion") - ) - break - - await self.agent.handle_intervention() - await asyncio.sleep(1) - try: - if task and task.is_ready(): # otherwise get_update hangs - break - try: - update = await asyncio.wait_for(self.get_update(), timeout=10) - fail_counter = 0 # reset on success - except asyncio.TimeoutError: - fail_counter += 1 - PrintStyle().warning( - self._mask(f"browser_agent.get_update timed out ({fail_counter}/3)") - ) - if fail_counter >= 3: - PrintStyle().warning( - self._mask("3 consecutive browser_agent.get_update timeouts, breaking loop") - ) - break - continue - update_log = update.get("log", get_use_agent_log(None)) - self.update_progress("\n".join(update_log)) - screenshot = update.get("screenshot", None) - if screenshot: - self.log.update(screenshot=screenshot) - except Exception as e: - 
PrintStyle().error(self._mask(f"Error getting update: {str(e)}")) - - if task and not task.is_ready(): - PrintStyle().warning(self._mask("browser_agent.get_update timed out, killing the task")) - self.state.kill_task() if self.state else None - return Response( - message=self._mask("Browser agent task timed out, not output provided."), - break_loop=False, - ) - - # final progress update - if self.state and self.state.use_agent: - log_final = get_use_agent_log(self.state.use_agent) - self.update_progress("\n".join(log_final)) - - # collect result with error handling - try: - result = await task.result() if task else None - except Exception as e: - PrintStyle().error(self._mask(f"Error getting browser agent task result: {str(e)}")) - # Return a timeout response if task.result() fails - answer_text = self._mask(f"Browser agent task failed to return result: {str(e)}") - self.log.update(answer=answer_text) - return Response(message=answer_text, break_loop=False) - # finally: - # # Stop any further browser access after task completion - # # self.state.kill_task() - # pass - - # Check if task completed successfully - if result and result.is_done(): - answer = result.final_result() - try: - if answer and isinstance(answer, str) and answer.strip(): - answer_data = DirtyJson.parse_string(answer) - answer_text = strings.dict_to_text(answer_data) # type: ignore - else: - answer_text = ( - str(answer) if answer else "Task completed successfully" - ) - except Exception as e: - answer_text = ( - str(answer) - if answer - else f"Task completed with parse error: {str(e)}" - ) - else: - # Task hit max_steps without calling done() - urls = result.urls() if result else [] - current_url = urls[-1] if urls else "unknown" - answer_text = ( - f"Task reached step limit without completion. Last page: {current_url}. " - f"The browser agent may need clearer instructions on when to finish." 
- ) - - # Mask answer for logs and response - answer_text = self._mask(answer_text) - - # update the log (without screenshot path here, user can click) - self.log.update(answer=answer_text) - - # add screenshot to the answer if we have it - if ( - self.log.kvps - and "screenshot" in self.log.kvps - and self.log.kvps["screenshot"] - ): - path = self.log.kvps["screenshot"].split("//", 1)[-1].split("&", 1)[0] - answer_text += f"\n\nScreenshot: {path}" - - # respond (with screenshot path) - return Response(message=answer_text, break_loop=False) - - def get_log_object(self): - return self.agent.context.log.log( - type="browser", - heading=f"icon://captive_portal {self.agent.agent_name}: Calling Browser Agent", - content="", - kvps=self.args, - ) - - async def get_update(self): - await self.prepare_state() - - result = {} - agent = self.agent - ua = self.state.use_agent if self.state else None - page = await self.state.get_page() if self.state else None - - if ua and page: - try: - - async def _get_update(): - - # await agent.wait_if_paused() # no need here - - # Build short activity log - result["log"] = get_use_agent_log(ua) - - path = files.get_abs_path( - persist_chat.get_chat_folder_path(agent.context.id), - "browser", - "screenshots", - f"{self.guid}.png", - ) - files.make_dirs(path) - await page.screenshot(path=path, full_page=False, timeout=3000) - result["screenshot"] = f"img://{path}&t={str(time.time())}" - - if self.state and self.state.task and not self.state.task.is_ready(): - await self.state.task.execute_inside(_get_update) - - except Exception: - pass - - return result - - async def prepare_state(self, reset=False): - self.state = self.agent.get_data("_browser_agent_state") - if reset and self.state: - self.state.kill_task() - if not self.state or reset: - self.state = await State.create(self.agent) - self.agent.set_data("_browser_agent_state", self.state) - - def update_progress(self, text): - text = self._mask(text) - short = text.split("\n")[-1] - if 
len(short) > 50: - short = short[:50] + "..." - progress = f"Browser: {short}" - - self.log.update(progress=text) - self.agent.context.log.set_progress(progress) - - def _mask(self, text: str) -> str: - try: - return get_secrets_manager(self.agent.context).mask_values(text or "") - except Exception as e: - return text or "" - - # def __del__(self): - # if self.state: - # self.state.kill_task() - - -def get_use_agent_log(use_agent: browser_use.Agent | None): - result = ["🚦 Starting task"] - if use_agent: - action_results = use_agent.history.action_results() or [] - short_log = [] - for item in action_results: - # final results - if item.is_done: - if item.success: - short_log.append("✅ Done") - else: - short_log.append( - f"❌ Error: {item.error or item.extracted_content or 'Unknown error'}" - ) - - # progress messages - else: - text = item.extracted_content - if text: - first_line = text.split("\n", 1)[0][:200] - short_log.append(first_line) - result.extend(short_log) - return result diff --git a/plugins/_browser_agent/webui/browser-agent-store.js b/plugins/_browser_agent/webui/browser-agent-store.js deleted file mode 100644 index 1f24ba448..000000000 --- a/plugins/_browser_agent/webui/browser-agent-store.js +++ /dev/null @@ -1,51 +0,0 @@ -import { createStore } from "/js/AlpineStore.js"; -import { callJsonApi } from "/js/api.js"; - -const STATUS_API = "/plugins/_browser_agent/status"; -const MODEL_PRESET_API = "/plugins/_browser_agent/model_preset"; - -const model = { - loading: true, - savingPreset: false, - error: "", - status: null, - - async refreshStatus() { - this.status = await callJsonApi(STATUS_API, {}); - }, - - async savePreset(presetName) { - this.savingPreset = true; - try { - await callJsonApi(MODEL_PRESET_API, { - action: presetName ? "set" : "clear", - preset_name: presetName || "", - }); - this.error = ""; - await this.refreshStatus(); - } catch (error) { - this.error = error instanceof Error ? 
error.message : String(error); - await this.refreshStatus(); - } finally { - this.savingPreset = false; - } - }, - - async onOpen() { - this.loading = true; - this.error = ""; - - try { - await this.refreshStatus(); - } catch (error) { - this.status = null; - this.error = error instanceof Error ? error.message : String(error); - } finally { - this.loading = false; - } - }, - - cleanup() {}, -}; - -export const store = createStore("browserAgentPage", model); diff --git a/plugins/_browser_agent/webui/main.html b/plugins/_browser_agent/webui/main.html deleted file mode 100644 index c67edd8d6..000000000 --- a/plugins/_browser_agent/webui/main.html +++ /dev/null @@ -1,232 +0,0 @@ - - - Browser Agent - - - -
- - - - diff --git a/plugins/_browser_agent/webui/thumbnail.jpg b/plugins/_browser_agent/webui/thumbnail.jpg deleted file mode 100644 index 2ea014eb8..000000000 Binary files a/plugins/_browser_agent/webui/thumbnail.jpg and /dev/null differ diff --git a/plugins/_model_config/README.md b/plugins/_model_config/README.md index 15a07861d..d95fa653d 100644 --- a/plugins/_model_config/README.md +++ b/plugins/_model_config/README.md @@ -23,7 +23,6 @@ This plugin centralizes model selection and model-related settings for the appli - Allows a chat context to store a temporary override or preset reference in context data. - **Model object construction** - Builds `ModelConfig` objects and the runtime chat, utility, and embedding wrappers used elsewhere in the app. - - Note: Browser model wiring now lives in the `_browser_agent` plugin. - **API key validation** - Reports configured providers that still require API keys. diff --git a/requirements.txt b/requirements.txt index 8add74428..1bf1ac764 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,6 +1,5 @@ a2wsgi==1.10.8 ansio==0.0.1 -browser-use==0.5.11 docker==7.1.0 duckduckgo-search==6.1.12 faiss-cpu==1.11.0 diff --git a/skills/a0-browser-ext/SKILL.md b/skills/a0-browser-ext/SKILL.md new file mode 100644 index 000000000..ee529a597 --- /dev/null +++ b/skills/a0-browser-ext/SKILL.md @@ -0,0 +1,106 @@ +--- +name: a0-browser-ext +description: Create, inspect, install, and safely maintain Chrome extensions for Agent Zero's built-in Browser plugin. +tags: ["agent-zero", "browser", "chrome-extension", "playwright", "manifest-v3"] +--- + +# Agent Zero Browser Extensions + +Use this skill when the user wants to create a new Browser extension, modify an existing extension, or install a Chrome Web Store extension for Agent Zero's direct `_browser` plugin. + +## Operating Model + +- Agent Zero loads Browser extensions from unpacked directories. +- Create user-owned extensions under `/a0/usr/browser-extensions//`. 
+- Browser extension paths must be visible inside the Docker runtime. Prefer `/a0/usr/browser-extensions/...` paths over host-only paths.
+- The Browser puzzle menu can open "My Browser Extensions", seed a "+ Create New with A0" request, and install Chrome Web Store URLs.
+- Chrome Web Store installs are converted into unpacked extension folders before Browser can load them.
+- Extension setting changes restart active Browser runtimes so Playwright can relaunch Chromium with the extension arguments.
+
+## Safety First
+
+Browser extensions run inside the Docker browser sandbox, but malicious or buggy extensions can still damage that sandboxed environment, corrupt browser profiles, exfiltrate page data visible to the Browser, or make browsing unreliable.
+
+Before creating or installing an extension:
+
+- State the requested behavior in one sentence.
+- List the minimum permissions and host permissions needed.
+- Avoid `<all_urls>` unless the user explicitly needs broad page access.
+- Avoid remote code, eval-style execution, hidden credential collection, and broad network access.
+- Do not store secrets in extension files.
+- Prefer content scripts for page-local behavior and service workers for coordination.
+- Tell the user when an extension can read or modify page content.
+
+## Create New Extension
+
+1. Ask for the extension name, user-visible purpose, target websites, and whether it needs a popup, content script, background service worker, options page, or side panel.
+2. Choose a lowercase slug such as `reader-highlighter`.
+3. Create `/a0/usr/browser-extensions//manifest.json`.
+4. Add only the files the extension actually needs.
+5. Validate JSON syntax and confirm `manifest_version` is `3`.
+6. Keep generated code small, readable, and easy for the user to audit.
+7. After creating the folder, tell the user to open Browser's puzzle menu, use "Browser Extension Settings", enable extensions, and include the new folder path if it is not already enabled.
+
+Minimal Manifest V3 starter:
+
+```json
+{
+  "manifest_version": 3,
+  "name": "Agent Zero Example Extension",
+  "version": "0.1.0",
+  "description": "Small, auditable Browser extension created with Agent Zero.",
+  "permissions": [],
+  "host_permissions": [],
+  "action": {
+    "default_title": "A0 Extension"
+  }
+}
+```
+
+Content script starter:
+
+```json
+{
+  "manifest_version": 3,
+  "name": "Agent Zero Page Helper",
+  "version": "0.1.0",
+  "description": "Adds a small page helper for specific sites.",
+  "permissions": [],
+  "host_permissions": ["https://example.com/*"],
+  "content_scripts": [
+    {
+      "matches": ["https://example.com/*"],
+      "js": ["content.js"],
+      "run_at": "document_idle"
+    }
+  ]
+}
+```
+
+## Install From Chrome Web Store
+
+If the user gives a Chrome Web Store URL or extension id:
+
+1. Confirm they understand the sandbox warning.
+2. Extract the 32-character extension id from the URL.
+3. Prefer the Browser puzzle menu's URL installer for direct installs.
+4. If installing manually, download the CRX from Chrome's update service, extract the ZIP payload safely, and place it under `/a0/usr/browser-extensions/chrome-web-store//`.
+5. Inspect `manifest.json` and summarize name, version, permissions, host permissions, and suspicious capabilities.
+6. Enable only after the user accepts the risk.
+
+Common URL shapes:
+
+```text
+https://chromewebstore.google.com/detail/name/
+https://chrome.google.com/webstore/detail/name/
+
+```
+
+## Review Checklist
+
+- `manifest.json` parses cleanly.
+- Every permission has a reason.
+- Host matches are specific.
+- No credential scraping, hidden data upload, or remote executable code.
+- UI text is concise and tells the truth.
+- The extension can be removed by deleting its folder from `/a0/usr/browser-extensions/` and removing the path from Browser settings.
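Steps 2 and 4 of the Chrome Web Store install list in the skill file above (extracting the 32-character id and pulling the ZIP payload out of a CRX) can be sketched in Python. This is a minimal sketch under stated assumptions, not the plugin's implementation: `parse_extension_id` and `crx_zip_payload` are hypothetical names, the error behavior is invented for illustration, extension ids are assumed to use only the letters a-p, and the CRX3 container is assumed to be the `Cr24` magic, a little-endian version, a little-endian header length, the header, and then the ZIP payload, which is the layout the regression test in this change builds by hand.

```python
import re
import struct
from urllib.parse import urlparse

# Assumption: Chrome extension ids are 32 characters from the letters a-p.
_ID_PATTERN = re.compile(r"[a-p]{32}")


def parse_extension_id(value: str) -> str:
    """Hypothetical helper: accept a bare id or a Web Store detail URL."""
    text = value.strip()
    if "://" in text:
        # urlparse drops the query and fragment; keep the last path segment.
        text = urlparse(text).path.rstrip("/").rsplit("/", 1)[-1]
    if not _ID_PATTERN.fullmatch(text):
        raise ValueError(f"no extension id found in {value!r}")
    return text


def crx_zip_payload(crx: bytes) -> bytes:
    """Hypothetical helper: return the ZIP archive embedded in a CRX3 file."""
    if len(crx) < 12 or crx[:4] != b"Cr24":
        raise ValueError("not a CRX file")
    # Bytes 4-7 hold the format version, bytes 8-11 the header length.
    version, header_size = struct.unpack_from("<II", crx, 4)
    if version != 3:
        raise ValueError(f"unsupported CRX version: {version}")
    payload = crx[12 + header_size:]
    # ZIP archives start with the "PK" signature; reject anything else.
    if not payload.startswith(b"PK"):
        raise ValueError("CRX payload is not a ZIP archive")
    return payload
```

Validating the id before it is used in a download URL or a folder name, and checking the `PK` signature before unpacking, keeps malformed input from reaching the filesystem.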
diff --git a/skills/a0-development/SKILL.md b/skills/a0-development/SKILL.md index 188567ee6..6fb829e7d 100644 --- a/skills/a0-development/SKILL.md +++ b/skills/a0-development/SKILL.md @@ -706,7 +706,7 @@ The framework ships with these core plugins in `/a0/plugins/`: | `_memory` | Persistent vector memory system | | `_text_editor` | File read/write/patch with line numbers | | `_model_config` | LLM model selection and configuration | -| `_browser_agent` | Browser automation and web interaction | +| `_browser` | Direct browser automation and WebUI viewing | | `_infection_check` | Prompt injection safety checks | | `_error_retry` | Retry on critical exceptions | | `_email_integration` | Email communication via IMAP/SMTP | diff --git a/tests/test_browser_agent_regressions.py b/tests/test_browser_agent_regressions.py index 1864678cf..01aa32417 100644 --- a/tests/test_browser_agent_regressions.py +++ b/tests/test_browser_agent_regressions.py @@ -1,74 +1,404 @@ -import asyncio -import importlib -import json import sys +import threading from pathlib import Path from types import SimpleNamespace +import pytest + PROJECT_ROOT = Path(__file__).resolve().parents[1] if str(PROJECT_ROOT) not in sys.path: sys.path.insert(0, str(PROJECT_ROOT)) -import plugins._browser_agent.helpers.browser_use_monkeypatch as browser_use_monkeypatch -import plugins._browser_agent.tools.browser_agent as browser_agent_module +from plugins._browser.helpers.config import ( + build_browser_launch_config, + get_browser_model_preset_options, + normalize_browser_config, + resolve_browser_model_selection, +) +from plugins._browser.helpers.extension_manager import ( + _crx_zip_payload, + parse_chrome_web_store_extension_id, +) +from plugins._browser.helpers.runtime import normalize_url +import plugins._browser.hooks as browser_hooks_module +import plugins._browser.tools.browser as browser_tool_module +import plugins._browser.api.ws_browser as ws_browser_module -def 
test_gemini_clean_and_conform_normalizes_known_single_action_shapes(): - raw = ( - '{"action":[' - '{"complete_task":{"title":"T","response":"R","page_summary":"S"}}' - ']}' +def test_browser_url_normalization_matches_address_bar_hosts(): + assert normalize_url("localhost:3000") == "http://localhost:3000/" + assert normalize_url("127.0.0.1:8000/path") == "http://127.0.0.1:8000/path" + assert normalize_url("novinky.cz") == "https://novinky.cz/" + assert normalize_url("https://example.com") == "https://example.com/" + assert normalize_url("about:blank") == "about:blank" + + +def test_browser_config_normalizes_extension_paths(tmp_path): + extension_dir = tmp_path / "extension" + extension_dir.mkdir() + + config = normalize_browser_config( + { + "extensions_enabled": 1, + "extension_paths": [str(extension_dir), "", " ", str(extension_dir)], + } ) - cleaned = browser_use_monkeypatch.gemini_clean_and_conform(raw) + assert config == { + "extensions_enabled": True, + "extension_paths": [str(extension_dir)], + "model_preset": "", + } - assert cleaned is not None - parsed = json.loads(cleaned) - assert parsed["action"] == [ + +def test_browser_config_normalizes_model_preset(): + assert normalize_browser_config({"model_preset": " Research "})["model_preset"] == "Research" + assert "model" not in normalize_browser_config({"model": "main"}) + + +def test_browser_model_selection_uses_presets(monkeypatch): + import plugins._browser.helpers.config as browser_config_module + from plugins._model_config.helpers import model_config + + monkeypatch.setattr( + browser_config_module, + "get_browser_config", + lambda agent=None: {"model_preset": "Research", "extensions_enabled": False, "extension_paths": []}, + ) + monkeypatch.setattr( + model_config, + "get_preset_by_name", + lambda name: { + "name": "Research", + "chat": {"provider": "openrouter", "name": "example/model"}, + } if name == "Research" else None, + ) + + selection = resolve_browser_model_selection(SimpleNamespace()) + + 
assert selection["source_kind"] == "preset" + assert selection["config"] == {"provider": "openrouter", "name": "example/model"} + + +def test_browser_model_selection_falls_back_to_main_for_missing_preset(monkeypatch): + from plugins._model_config.helpers import model_config + + monkeypatch.setattr(model_config, "get_preset_by_name", lambda name: None) + monkeypatch.setattr( + model_config, + "get_chat_model_config", + lambda agent=None: {"provider": "openrouter", "name": "main/model"}, + ) + + selection = resolve_browser_model_selection(SimpleNamespace(), {"model_preset": "Missing"}) + + assert selection["source_kind"] == "main" + assert selection["preset_status"] == "missing" + assert selection["config"] == {"provider": "openrouter", "name": "main/model"} + + +def test_browser_model_preset_options_include_missing_selected(monkeypatch): + from plugins._model_config.helpers import model_config + + monkeypatch.setattr( + model_config, + "get_presets", + lambda: [{"name": "Balance", "chat": {"provider": "openrouter", "name": "model"}}], + ) + + options = get_browser_model_preset_options(settings={"model_preset": "Deleted"}) + + assert options[-1]["name"] == "Deleted" + assert options[-1]["missing"] is True + + +def test_browser_launch_config_switches_to_chromium_for_extensions(tmp_path): + extension_dir = tmp_path / "extension" + extension_dir.mkdir() + + launch = build_browser_launch_config( { - "done": { - "success": True, - "data": { - "title": "T", - "response": "R", - "page_summary": "S", - }, - } + "extensions_enabled": True, + "extension_paths": [str(extension_dir)], + } + ) + + assert launch["browser_mode"] == "chromium_extensions" + assert launch["channel"] == "chromium" + assert launch["requires_full_browser"] is True + assert launch["extensions"]["active"] is True + assert any(arg.startswith("--load-extension=") for arg in launch["args"]) + assert "--headless=new" not in launch["args"] + + +def test_browser_extension_manager_parses_web_store_urls(): + 
extension_id = "a" * 32 + + assert parse_chrome_web_store_extension_id(extension_id) == extension_id + assert ( + parse_chrome_web_store_extension_id( + f"https://chromewebstore.google.com/detail/example/{extension_id}" + ) + == extension_id + ) + assert ( + parse_chrome_web_store_extension_id( + f"https://chrome.google.com/webstore/detail/example/{extension_id}?hl=en" + ) + == extension_id + ) + + +def test_browser_extension_manager_extracts_crx3_zip_payload(): + payload = b"PK\x03\x04zip-payload" + header = b"metadata" + crx = b"Cr24" + (3).to_bytes(4, "little") + len(header).to_bytes(4, "little") + header + payload + + assert _crx_zip_payload(crx) == payload + + +def test_browser_extension_menu_exposes_agent_and_url_paths(): + html = (PROJECT_ROOT / "plugins" / "_browser" / "webui" / "main.html").read_text( + encoding="utf-8" + ) + skill = PROJECT_ROOT / "skills" / "a0-browser-ext" / "SKILL.md" + + assert "+ Create New with A0" in html + assert "Chrome Web Store URL" in html + assert "My Browser Extensions" in html + assert "malicious or buggy extensions" in html + assert skill.exists() + + +def test_browser_save_plugin_config_restarts_runtimes_on_change(monkeypatch, tmp_path): + extension_dir = tmp_path / "extension" + extension_dir.mkdir() + restarted = [] + + monkeypatch.setattr( + browser_hooks_module, + "_load_saved_browser_config", + lambda project_name="", agent_profile="": { + "extensions_enabled": False, + "extension_paths": [], }, - ] + ) + monkeypatch.setattr( + browser_hooks_module, + "close_all_runtimes_sync", + lambda: restarted.append(True), + ) + + result = browser_hooks_module.save_plugin_config( + { + "extensions_enabled": True, + "extension_paths": [str(extension_dir)], + }, + project_name="", + agent_profile="", + ) + + assert result["extensions_enabled"] is True + assert result["extension_paths"] == [str(extension_dir)] + assert result["model_preset"] == "" + assert restarted == [True] -class DummyBrowserSession: - def __init__(self) -> 
None: - self.kill_called = False - self.close_called = False +def test_browser_save_plugin_config_does_not_restart_runtimes_for_preset_only(monkeypatch): + restarted = [] - async def kill(self) -> None: - self.kill_called = True + monkeypatch.setattr( + browser_hooks_module, + "_load_saved_browser_config", + lambda project_name="", agent_profile="": { + "extensions_enabled": False, + "extension_paths": [], + "model_preset": "", + }, + ) + monkeypatch.setattr( + browser_hooks_module, + "close_all_runtimes_sync", + lambda: restarted.append(True), + ) - async def close(self) -> None: - self.close_called = True + result = browser_hooks_module.save_plugin_config( + { + "extensions_enabled": False, + "extension_paths": [], + "model_preset": "Research", + }, + project_name="", + agent_profile="", + ) + + assert result["model_preset"] == "Research" + assert restarted == [] -class DummyAgent: - def __init__(self) -> None: - self.context = SimpleNamespace(id="ctx", task=None) +@pytest.mark.asyncio +async def test_browser_tool_dispatches_direct_actions(monkeypatch): + calls = [] + + class FakeRuntime: + async def call(self, method, *args): + calls.append((method, args)) + if method == "content": + return {"document": "[link 1] Example"} + return {"ok": True, "method": method, "args": args} + + async def fake_get_runtime(context_id, create=True): + assert context_id == "ctx" + return FakeRuntime() + + monkeypatch.setattr(browser_tool_module, "get_runtime", fake_get_runtime) + agent = SimpleNamespace(context=SimpleNamespace(id="ctx")) + tool = browser_tool_module.Browser( + agent=agent, + name="browser", + method=None, + args={}, + message="", + loop_data=None, + ) + + response = await tool.execute(action="content", browser_id=1) + + assert response.message == "[link 1] Example" + assert calls == [("content", (1, None))] -def test_browser_session_teardown_prefers_kill_for_keep_alive_sessions(): - state = browser_agent_module.State(DummyAgent()) - session = DummyBrowserSession() 
- state.browser_session = session +@pytest.mark.asyncio +async def test_browser_viewer_subscribe_unregisters_stream(monkeypatch): + class FakeRuntime: + def __init__(self) -> None: + self.opened = False - state.kill_task() + async def call(self, method, *args): + if method == "list": + if self.opened: + return { + "browsers": [{"id": 1, "currentUrl": "about:blank", "title": ""}], + "last_interacted_browser_id": 1, + } + return {"browsers": [], "last_interacted_browser_id": None} + if method == "open": + self.opened = True + return {"id": 1, "state": {"id": 1, "currentUrl": "about:blank"}} + raise AssertionError(method) - assert session.kill_called is True - assert session.close_called is False + async def fake_get_runtime(context_id, create=True): + assert context_id == "ctx" + return FakeRuntime() + + monkeypatch.setattr(ws_browser_module, "get_runtime", fake_get_runtime) + monkeypatch.setattr( + ws_browser_module.AgentContext, + "get", + staticmethod(lambda context_id: SimpleNamespace(id=context_id)), + ) + + handler = ws_browser_module.WsBrowser( + SimpleNamespace(), + threading.RLock(), + manager=None, + ) + + result = await handler.process( + "browser_viewer_subscribe", + {"context_id": "ctx", "correlationId": "c1"}, + "sid-1", + ) + + assert result["context_id"] == "ctx" + assert ("sid-1", "ctx") in ws_browser_module.WsBrowser._streams + + await handler.on_disconnect("sid-1") + + assert ("sid-1", "ctx") not in ws_browser_module.WsBrowser._streams -def test_browser_cleanup_extensions_follow_new_extensible_path_layout(): - extension = importlib.import_module("helpers.extension") +@pytest.mark.asyncio +async def test_browser_viewer_viewport_input_dispatches_resize(monkeypatch): + calls = [] + + class FakeRuntime: + async def call(self, method, *args, **kwargs): + calls.append((method, args, kwargs)) + return {"ok": True, "method": method, "args": args} + + async def fake_get_runtime(context_id, create=True): + assert context_id == "ctx" + assert create is False 
+ return FakeRuntime() + + monkeypatch.setattr(ws_browser_module, "get_runtime", fake_get_runtime) + + handler = ws_browser_module.WsBrowser( + SimpleNamespace(), + threading.RLock(), + manager=None, + ) + + result = await handler.process( + "browser_viewer_input", + { + "context_id": "ctx", + "browser_id": 7, + "input_type": "viewport", + "width": 1280, + "height": 720, + }, + "sid-1", + ) + + assert result == {"state": {"ok": True, "method": "set_viewport", "args": (7, 1280, 720)}} + assert calls == [("set_viewport", (7, 1280, 720), {})] + + +@pytest.mark.asyncio +async def test_browser_viewer_wheel_input_dispatches_scroll(monkeypatch): + calls = [] + + class FakeRuntime: + async def call(self, method, *args, **kwargs): + calls.append((method, args, kwargs)) + return {"ok": True, "method": method, "args": args} + + async def fake_get_runtime(context_id, create=True): + assert context_id == "ctx" + assert create is False + return FakeRuntime() + + monkeypatch.setattr(ws_browser_module, "get_runtime", fake_get_runtime) + + handler = ws_browser_module.WsBrowser( + SimpleNamespace(), + threading.RLock(), + manager=None, + ) + + result = await handler.process( + "browser_viewer_input", + { + "context_id": "ctx", + "browser_id": 3, + "input_type": "wheel", + "x": 320, + "y": 480, + "delta_x": 0, + "delta_y": 640, + }, + "sid-1", + ) + + assert result == {"state": {"ok": True, "method": "wheel", "args": (3, 320.0, 480.0, 0.0, 640.0)}} + assert calls == [("wheel", (3, 320.0, 480.0, 0.0, 640.0), {})] + + +def test_browser_cleanup_extensions_follow_extensible_path_layout(): + extension = __import__("helpers.extension", fromlist=["_get_extension_classes"]) remove_classes = extension._get_extension_classes( # type: ignore[attr-defined] "_functions/agent/AgentContext/remove/start" ) @@ -76,5 +406,12 @@ def test_browser_cleanup_extensions_follow_new_extensible_path_layout(): "_functions/agent/AgentContext/reset/start" ) - assert any(cls.__name__ == 
"CleanupBrowserStateOnRemove" for cls in remove_classes) - assert any(cls.__name__ == "CleanupBrowserStateOnReset" for cls in reset_classes) + assert any(cls.__name__ == "CleanupBrowserRuntimeOnRemove" for cls in remove_classes) + assert any(cls.__name__ == "CleanupBrowserRuntimeOnReset" for cls in reset_classes) + + +def test_legacy_browser_dependency_is_removed(): + assert not (PROJECT_ROOT / "plugins" / ("_browser" + "_agent")).exists() + assert ("browser" + "-use") not in (PROJECT_ROOT / "requirements.txt").read_text( + encoding="utf-8" + ) diff --git a/tests/test_webui_extension_surfaces.py b/tests/test_webui_extension_surfaces.py index b494a02f7..2d6880ced 100644 --- a/tests/test_webui_extension_surfaces.py +++ b/tests/test_webui_extension_surfaces.py @@ -10,7 +10,7 @@ from typing import Iterator import pytest from flask import Flask -PROJECT_ROOT = Path(__file__).resolve().parents[2] +PROJECT_ROOT = Path(__file__).resolve().parents[1] if str(PROJECT_ROOT) not in sys.path: sys.path.insert(0, str(PROJECT_ROOT)) @@ -75,6 +75,19 @@ def _temporary_probe_plugin(surface: str) -> Iterator[tuple[str, str]]: dir=plugins_root, ) as temp_plugin_dir: plugin_id = Path(temp_plugin_dir).name + (Path(temp_plugin_dir) / "plugin.yaml").write_text( + ( + f"name: {plugin_id}\n" + f"title: {plugin_id}\n" + "description: Temporary WebUI surface probe.\n" + "version: 0.0.0\n" + "always_enabled: false\n" + ), + encoding="utf-8", + ) + from helpers import cache + + cache.clear("*(plugins)*") probe_file = ( Path(temp_plugin_dir) / "extensions" @@ -91,7 +104,10 @@ def _temporary_probe_plugin(surface: str) -> Iterator[tuple[str, str]]: ), encoding="utf-8", ) - yield plugin_id, probe_file.name + try: + yield plugin_id, probe_file.name + finally: + cache.clear("*(plugins)*") @pytest.mark.asyncio @@ -117,8 +133,13 @@ async def test_webui_surface_extension_point_end_to_end( f"{plugin_id}/extensions/webui/{surface}/{probe_file_name}" ) - assert any( - extension.get("plugin_id") == plugin_id 
- and str(extension.get("path", "")).replace("\\", "/").endswith(expected_suffix) + extension_paths = [ + str( + extension.get("path", "") + if isinstance(extension, dict) + else extension + ).replace("\\", "/") for extension in extensions - ) + ] + + assert any(path.endswith(expected_suffix) for path in extension_paths) diff --git a/webui/js/modals.js b/webui/js/modals.js index c4e444a80..a3076b2fe 100644 --- a/webui/js/modals.js +++ b/webui/js/modals.js @@ -5,6 +5,20 @@ import { callJsExtensions } from "/js/extensions.js"; // Modal functionality const modalStack = []; +function findModalIndexByPath(modalPath) { + return modalStack.findIndex((modal) => modal.path === modalPath); +} + +function focusModal(modalPath) { + const modalIndex = findModalIndexByPath(modalPath); + if (modalIndex === -1) return false; + if (modalIndex === modalStack.length - 1) return true; + const [modal] = modalStack.splice(modalIndex, 1); + modalStack.push(modal); + updateModalZIndexes(); + return true; +} + function getModalScrollElement(modal) { return modal?.element?.querySelector(".modal-scroll"); } @@ -38,6 +52,15 @@ backdrop.style.display = "none"; backdrop.style.backdropFilter = "blur(5px)"; document.body.appendChild(backdrop); +function modalSuppressesBackdrop(modal) { + const path = String(modal?.path || ""); + return path === "/plugins/_browser/webui/main.html" + || path === "plugins/_browser/webui/main.html" + || modal?.element?.classList?.contains("modal-floating") + || modal?.element?.classList?.contains("modal-no-backdrop") + || modal?.inner?.classList?.contains("modal-no-backdrop"); +} + // Function to update z-index for all modals and backdrop function updateModalZIndexes() { // Base z-index for modals @@ -51,20 +74,26 @@ function updateModalZIndexes() { modal.element.style.zIndex = baseZIndex + index * 20; }); - // Always show backdrop - backdrop.style.display = "block"; + const backdropModalStack = modalStack.filter((modal) => !modalSuppressesBackdrop(modal)); - if 
(modalStack.length > 1) { - // For multiple modals, position backdrop between the top two - const topModalIndex = modalStack.length - 1; - const previousModalZIndex = baseZIndex + (topModalIndex - 1) * 20; - backdrop.style.zIndex = previousModalZIndex + 10; - } else if (modalStack.length === 1) { - // For single modal, position backdrop below it - backdrop.style.zIndex = baseZIndex - 1; - } else { - // No modals, hide backdrop + if (backdropModalStack.length === 0) { backdrop.style.display = "none"; + return; + } + + backdrop.style.display = "block"; + backdrop.style.backdropFilter = "blur(5px)"; + backdrop.style.backgroundColor = ""; + + if (backdropModalStack.length === modalStack.length && modalStack.length > 1) { + const topModalIndex = modalStack.length - 1; + backdrop.style.zIndex = baseZIndex + (topModalIndex - 1) * 20 + 10; + } else { + const topBackdropModal = backdropModalStack[backdropModalStack.length - 1]; + const topBackdropModalIndex = modalStack.indexOf(topBackdropModal); + backdrop.style.zIndex = topBackdropModalIndex > 0 + ? 
baseZIndex + (topBackdropModalIndex - 1) * 20 + 10 + : baseZIndex - 1; } } @@ -213,6 +242,26 @@ export async function openModal(modalPath, beforeClose = null) { }); } +export function isModalOpen(modalPath) { + return findModalIndexByPath(modalPath) !== -1; +} + +export async function ensureModalOpen(modalPath, beforeClose = null) { + if (focusModal(modalPath)) return null; + return openModal(modalPath, beforeClose); +} + +export async function toggleModal(modalPath, beforeClose = null) { + if (!isModalOpen(modalPath)) { + return openModal(modalPath, beforeClose); + } + while (isModalOpen(modalPath)) { + const closed = await closeModal(modalPath); + if (closed === false) return false; + } + return true; +} + // Function to close modal export async function closeModal(modalPath = null) { if (modalStack.length === 0) return; @@ -369,3 +418,6 @@ document.addEventListener("keydown", (e) => { globalThis.openModal = openModal; globalThis.closeModal = closeModal; globalThis.scrollModal = scrollModal; +globalThis.isModalOpen = isModalOpen; +globalThis.ensureModalOpen = ensureModalOpen; +globalThis.toggleModal = toggleModal;