Introduce DOX (AGENTS.md) contracts across the repository to formalize ownership and local work contracts: adds .github/AGENTS.md plus AGENTS.md files for agents, api, conf, docker, docs, extensions, helpers, knowledge, lib, plugins, prompts, scripts, skills, tests, tools, usr, webui (and several subfolders). Update root AGENTS.md (content and last-updated date) to include DOX framework guidance and a child DOX index. Update .gitignore to allow usr/ and usr/plugins AGENTS.md files. Remove legacy deep-dive files under docs/agents (banners, components, modals, plugins) and migrate frontend/plugin references to webui/ and plugins/ DOX locations. Also adjust plugins/README.md and several skills/*/SKILL.md files to align with the new DOX layout.
Expose Agent Zero over the Agent Client Protocol as a builtin stdio plugin with session lifecycle support, streaming bridges, registry metadata, and editor workspace handling.
Add lazy dependency installation through the ACP plugin hook using the root requirements pin so self-updated instances can recover without a fresh Docker image.
Pin agent-client-protocol, add focused static coverage, and allow ACP sessions to override the per-context workdir for code execution and workdir prompts.
The LiteParse dependency is already managed from the root requirements.txt, so the document query plugin should not carry a separate requirements file.
This keeps dependency installation centralized for the bundled core plugin.
Start Tailscale Remote Control through Funnel on HTTPS port 443, verify the prepared client supports the funnel command, and give the Funnel startup path a longer approval window.
Forward Tailscale approval/login URLs through the shared tunnel notification stream so the UI renders an actionable browser link like Microsoft Dev Tunnels, with clearer basic-user copy around approval and public Funnel URLs.
Keep ngrok removed from the Remote Control surface and extend regression coverage for Funnel command wiring, unsupported clients, and approval-link handling.
Store and return the connector computer-use contract_version and nested capabilities payload in Agent Zero, surface the contract version in tool status text, and teach the remote computer-use prompt/skill to prefer the structured background dispatch contract over OS-name assumptions.
Move the internal Linux Desktop skill toward a structured-first workflow: state/window commands before screenshots, batched desktopctl sequences in one process, and final screenshots only when pixels matter. Normalize internal system Desktop resize requests so narrow portrait canvas sizes do not shrink the real X11 root display and distort screenshots. Keep the live Docker runtime in sync with the changed Desktop files.
Expose native window listing, indexed window state, and element_action dispatch modes through the Agent Zero connector tool. Update host computer-use prompts and platform skills so agents prefer background structural targeting and only rely on screenshots for foreground or uncertain actions.
Add an adaptive OCR heuristic that samples PDF text density and disables LiteParse OCR for large text-rich PDFs before the OCR path reaches timeout territory.
Keep LiteParse isolated in a subprocess regardless of stale user config, remove the subprocess toggle from the settings UI, and raise the default LiteParse worker count to 2 for a safer multi-chat speedup.
Update Document Query docs and focused tests for the new heuristic, mandatory isolation, and worker default.
Materialize browser, desktop, computer-use, and vision-load screenshots into chat-scoped artifacts so historical image refs survive temporary screenshot pruning.
Keep history serialization free of rescue assumptions, document durable screenshot behavior in tool prompts/skills, and size Xpra canvases from backend-normalized display dimensions to prevent stretched desktop views.
Verified with focused pytest coverage plus live Docker checks for browser screenshot persistence and Xpra canvas dimensions.
Expose the main Document Query parser, retrieval, fetch, LiteParse/OCR, and fallback controls in the plugin settings UI. Add a generated 256x256 JPEG thumbnail under the plugin size limit and cover both the settings wiring and thumbnail constraints with focused tests.
Add a Document Query plugin settings panel that maps to the existing parser_concurrency runtime limit, with a focused regression test so the UI remains wired to the backend setting.
Pin LiteParse to 2.0.3 in both Docker requirements and the plugin hook requirements so new images and existing plugin installs resolve the same tested runtime.
Run LiteParse in a subprocess so native parser crashes cannot take down the Web UI process. Bound parser concurrency and LiteParse workers for multi-chat stability, seed Q&A context with leading document chunks for title/abstract grounding, and keep a small-document fallback when vector search returns no chunks.
Rename the query optimization prompt from optmimize to optimize, update the helper lookup, and fix the concise typo inside the prompt.
Also add a regression assertion for the corrected prompt filename and remove the remaining literal a0_small test references so global audits stay clean.
Add LiteParse as the preferred parser path with legacy parser fallbacks, centralized document fetching, generic user-facing progress, and compatibility shims for the former helper/tool imports.
Install the runtime through Docker requirements for fresh images and through the _document_query plugin hook/startup migration for existing installations.
Move the long document_query tool instructions into a document-query skill and leave a compact tool prompt stub that directs the model to load the skill before using document_query for documents, code-file Q&A, and document-image OCR. Also add default Agent Zero guidance for document/code/OCR Q&A routing.
Tests:
- PYTHONPATH=/home/eclypso/a0/agent-zero-pr-1528 conda run -n a0 pytest tests/test_document_query_plugin.py -q
- python -m compileall -q plugins/_document_query helpers/document_query.py tools/document_query.py tests/test_document_query_plugin.py
- git diff --check
- Live Agent Zero Web UI E2E at localhost:32080: PDF Q&A, code-file Q&A through document_query skill, and W-4 document-image OCR
Broader legacy pytest probe remains blocked by unrelated browser-agent, docker workflow branch expectation, and webui fixture path failures in this older PR worktree.
Rename the Remote Link UI and user-facing messages to Remote Control.
Refactor tunnel startup into provider helpers, remove ngrok support, and keep Cloudflare Tunnel, Microsoft Dev Tunnels, Serveo, and Tailscale wired through the Remote Control selector.
Add provider-time binary preparation, Tailscale userspace tailscaled startup, Tailscale login URL display, Microsoft Dev Tunnel progress feedback, and focused regression coverage for the Remote Control provider flows.
Remove stale runtime entries from the desktop SSH agent directory when a self-update request is consumed. Keep the cleanup best-effort so missing paths, non-directory paths, and unexpected cleanup failures do not block update startup.
Cover successful cleanup, missing-directory skips, and failure fallback with focused self-update manager tests.
Advertise the installed_plugins connector capability and add a protected API endpoint that lists already-installed Agent Zero plugins and toggles supported plugins only.
The endpoint normalizes plugin metadata, preserves the installed-only safety boundary, and refuses changes to protected plugins such as _a0_connector so the CLI cannot disconnect itself.
Add an Agent Zero owned browser DOM helper that captures shadow DOM and iframe content with frame-chain/node references.\n\nInstall the DOM helper before page-content capture for both local and host-browser runtimes, and send DOM helper payloads to A0 CLI host browser sessions when needed.\n\nCover iframe content refs and host-browser payload delivery in focused regression tests.
- add max_active_skills to the _skills plugin config and expose it in the config UI
- enforce the scoped cap consistently in skills runtime, catalog responses, and chat activation
- cover raised and lowered cap behavior with focused skills runtime tests
- make the active Visible/Pinned state read as active instead of grey
- replace the bordered toggle wrapper with a simpler slash-separated control
- remove the visible/hidden count label and its unused store helpers
Preserve MCP image, audio, and resource tool results instead of collapsing non-text responses into an empty textual result. Images and image resources now flow into raw history as data URL attachments, while audio and non-image binary resources are saved as artifacts with normalized paths.
Extract shared media artifact helpers for base64 validation, image data URLs, decoded-size checks, artifact saving, MIME normalization, and safe filenames. Reuse the shared helpers from MCP, browser connector, and computer-use artifact paths, and add focused regression coverage.
Exclude the Desktop profile .ssh/agent runtime directory from usr backups during self-update so live SSH agent sockets do not abort upgrades.
Keep the rule in the self-update manager, where the usr backup actually runs, and cover it with a regression test alongside the existing runtime-socket backup cases.
Add a Linux-specific host computer-use skill, route Wayland/AT-SPI backends to it instead of macOS AX guidance, and include compact structural tree outlines in AX/UIA snapshot responses so agents can pick paths and semantic targets from the tool result.
Add the Windows host computer-use skill and teach computer_use_remote to surface UIA window-management guidance, selector passthrough, and click-last workflow hints. Keep backend-specific actions out of generic guidance while exposing Windows structural operations when the backend advertises them.
Tests: uv run --python 3.12 --with-requirements requirements.txt --with-requirements requirements2.txt --with-requirements requirements.dev.txt --with litellm pytest tests\\test_tool_action_contracts.py tests\\test_a0_connector_prompt_gating.py tests\\test_skills_runtime.py -q
Move explicit AX action names and argument details out of the always-loaded computer_use_remote prompt and generic host-computer-use skill. The generic guidance now only explains backend discovery and skill loading, while host-computer-use-macos remains the detailed home for macOS structural targeting. Also soften the old Super+H hide-window guidance so window actions are chosen from the reported backend and verified visually.
Add a macOS-specific computer-use skill for AX structural targeting and keep the generic host skill backend-neutral. Surface backend ids, families, and advertised features from computer_use_remote start/status results, add backend-gated ax_snapshot and ax_action handling, and prompt the model to load the macOS skill only when the CLI reports matching support.
Extend the existing Codex OAuth image bridge regression to cover multiple text parts before an image_url, matching the computer-use capture shape that combines tool context with a screenshot attachment.
Map COMPUTER_USE_APPROVAL_REQUIRED tool responses into the existing COMPUTER_USE_REARM_REQUIRED stop guidance. This keeps agents from retrying desktop actions or using screenshot fallbacks when macOS permissions still require a user-approved re-arm.
Convert Chat Completions image_url content parts into Responses API input_image parts instead of normalizing multimodal messages down to text.
Keep text-only content lists as plain text and add OAuth bridge regression tests for image passthrough.
Store computer-use screenshots as standalone RawMessage entries after the textual tool result, matching the existing vision_load path so the model receives a real multimodal message.
Prefer shared screenshot file paths over base64 artifacts when available, and tighten host computer-use guidance so agents stop instead of proceeding from unverified state when a screenshot is not visible.
Update computer_use_remote guidance for Ubuntu/GNOME/Wayland so hide-window tasks use Super+H instead of Alt+F9. Reinforce that type results only prove keystrokes were sent and that the agent must verify the fresh screenshot before typing follow-up text or claiming success.
Clarify that computer_use_remote is the only host desktop-control path and that linux-desktop only targets the internal Docker/Xpra Desktop. Add host-computer-use retrieval triggers and regression coverage so host-screen queries rank ahead of the Xpra desktop skill while explicit Agent Zero Desktop requests still route to linux-desktop.
Return computer-use captures as multimodal tool-result content so the model can visually inspect fresh screenshots after each remote action. Keep the textual preview for logs and prune older capture payloads to avoid runaway context growth.
Sanitize embedded image data URLs from prompt token estimates so screenshot attachments do not explode context accounting.\n\nStrengthen computer_use_remote prompt, skill, and capture-result text so state-changing desktop actions are treated as attempts until a fresh screen visibly confirms the requested outcome.
Add the standard tool prompt contract so the model can call computer_use_remote in live sessions.
Keep availability, CLI enablement, trust mode, and re-arm enforcement as runtime checks instead of prompt-loader gating.
Update connector prompt and prompt-budget tests to cover the new exposure path.
Let users hide skills from the model-facing available catalog through the chat Skills selector while keeping pinned skill injection as a separate mode.
Hidden skills are filtered from skill listing, search, loading, relevant recall, and loaded-skill prompt injection, with chat-level show/hide overrides and persistent default hidden-skill config support.