Commit graph

2230 commits

Author SHA1 Message Date
frdel
175baa49db Add AGENTS.md DOX files and migrate docs
Introduce DOX (AGENTS.md) contracts across the repository to formalize ownership and local work contracts: adds .github/AGENTS.md plus AGENTS.md files for agents, api, conf, docker, docs, extensions, helpers, knowledge, lib, plugins, prompts, scripts, skills, tests, tools, usr, webui (and several subfolders). Update root AGENTS.md (content and last-updated date) to include DOX framework guidance and a child DOX index. Update .gitignore to allow usr/ and usr/plugins AGENTS.md files. Remove legacy deep-dive files under docs/agents (banners, components, modals, plugins) and migrate frontend/plugin references to webui/ and plugins/ DOX locations. Also adjust plugins/README.md and several skills/*/SKILL.md files to align with the new DOX layout.
2026-06-01 13:55:07 +02:00
Alessandro
46112c9750 Add builtin ACP plugin
Expose Agent Zero over the Agent Client Protocol as a builtin stdio plugin with session lifecycle support, streaming bridges, registry metadata, and editor workspace handling.

Add lazy dependency installation through the ACP plugin hook using the root requirements pin so self-updated instances can recover without a fresh Docker image.

Pin agent-client-protocol, add focused static coverage, and allow ACP sessions to override the per-context workdir for code execution and workdir prompts.
2026-06-01 02:40:28 +02:00
Alessandro
a7b4fcd798 Remove document query plugin requirements
The LiteParse dependency is already managed from the root requirements.txt, so the document query plugin should not carry a separate requirements file.

This keeps dependency installation centralized for the bundled core plugin.
2026-06-01 02:29:22 +02:00
Alessandro
bb6004c77f Harden Tailscale Funnel remote control
Start Tailscale Remote Control through Funnel on HTTPS port 443, verify the prepared client supports the funnel command, and give the Funnel startup path a longer approval window.

Forward Tailscale approval/login URLs through the shared tunnel notification stream so the UI renders an actionable browser link like Microsoft Dev Tunnels, with clearer basic-user copy around approval and public Funnel URLs.

Keep ngrok removed from the Remote Control surface and extend regression coverage for Funnel command wiring, unsupported clients, and approval-link handling.
2026-06-01 02:11:58 +02:00
Alessandro Frau
08e3feef20
Merge pull request #1677 from cooper-oai/cooper/agent-zero-private-codex-oauth-store
Harden Codex OAuth refresh token ownership
2026-06-01 02:08:31 +02:00
Alessandro
d8710c3f81 Preserve computer-use capability metadata
Store and return the connector computer-use contract_version and nested capabilities payload in Agent Zero, surface the contract version in tool status text, and teach the remote computer-use prompt/skill to prefer the structured background dispatch contract over OS-name assumptions.
2026-05-30 22:39:44 +02:00
Alessandro
ce4156a110 Harden internal Xpra desktop control
Move the internal Linux Desktop skill toward a structured-first workflow: state/window commands before screenshots, batched desktopctl sequences in one process, and final screenshots only when pixels matter. Normalize internal system Desktop resize requests so narrow portrait canvas sizes do not shrink the real X11 root display and distort screenshots. Keep the live Docker runtime in sync with the changed Desktop files.
2026-05-30 22:22:15 +02:00
Alessandro
3c1bdfa5f9 Add background computer-use tool contract
Expose native window listing, indexed window state, and element_action dispatch modes through the Agent Zero connector tool. Update host computer-use prompts and platform skills so agents prefer background structural targeting and only rely on screenshots for foreground or uncertain actions.
2026-05-30 21:48:43 +02:00
Alessandro
9e4b2f1843 Tune LiteParse OCR defaults
Add an adaptive OCR heuristic that samples PDF text density and disables LiteParse OCR for large text-rich PDFs before the OCR path reaches timeout territory.

Keep LiteParse isolated in a subprocess regardless of stale user config, remove the subprocess toggle from the settings UI, and raise the default LiteParse worker count to 2 for a safer multi-chat speedup.

Update Document Query docs and focused tests for the new heuristic, mandatory isolation, and worker default.
2026-05-30 19:02:10 +02:00
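The adaptive OCR heuristic described in 9e4b2f1843 could look roughly like the sketch below. It is an illustration only: the function name, the page-count threshold, and the characters-per-page threshold are all hypothetical, and the commit does not state how text density is actually measured.

```python
from typing import Sequence


def should_run_ocr(
    sampled_page_texts: Sequence[str],
    total_pages: int,
    large_pdf_pages: int = 50,      # hypothetical cutoff for a "large" PDF
    min_chars_per_page: int = 200,  # hypothetical cutoff for "text-rich"
) -> bool:
    """Return False only when a large PDF already has a dense text layer."""
    if not sampled_page_texts:
        # Nothing was sampled, so there is no evidence of a text layer.
        return True
    avg_chars = sum(len(t.strip()) for t in sampled_page_texts) / len(sampled_page_texts)
    is_large = total_pages >= large_pdf_pages
    is_text_rich = avg_chars >= min_chars_per_page
    return not (is_large and is_text_rich)
```

Small or sparse documents keep OCR enabled under this sketch; only the combination of large and text-rich skips it, which matches the stated goal of avoiding OCR timeouts.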
Alessandro
edd58a42d2 Fix durable screenshot artifacts and Xpra sizing
Materialize browser, desktop, computer-use, and vision-load screenshots into chat-scoped artifacts so historical image refs survive temporary screenshot pruning.

Keep history serialization free of rescue assumptions, document durable screenshot behavior in tool prompts/skills, and size Xpra canvases from backend-normalized display dimensions to prevent stretched desktop views.

Verified with focused pytest coverage plus live Docker checks for browser screenshot persistence and Xpra canvas dimensions.
2026-05-30 17:45:19 +02:00
Alessandro Frau
45e4bd892c
Merge pull request #1528 from Deimos-AI/pr/document-query-parser-abstraction
feat: extract document_query into _document_query plugin with parser strategy pattern
2026-05-29 16:57:24 +02:00
Alessandro
98f6c17d15 feat(document_query): expand settings panel and thumbnail
Expose the main Document Query parser, retrieval, fetch, LiteParse/OCR, and fallback controls in the plugin settings UI. Add a generated 256x256 JPEG thumbnail under the plugin size limit and cover both the settings wiring and thumbnail constraints with focused tests.
2026-05-29 16:47:38 +02:00
Alessandro
59dd1c99cb feat(document_query): expose parser concurrency setting
Add a Document Query plugin settings panel that maps to the existing parser_concurrency runtime limit, with a focused regression test so the UI remains wired to the backend setting.
2026-05-29 16:30:47 +02:00
Alessandro
6df3acc1e6 fix(document_query): pin LiteParse dependency
Pin LiteParse to 2.0.3 in both Docker requirements and the plugin hook requirements so new images and existing plugin installs resolve the same tested runtime.
2026-05-29 16:07:20 +02:00
Alessandro
b2ead06a4e fix(document_query): isolate LiteParse parsing
Run LiteParse in a subprocess so native parser crashes cannot take down the Web UI process. Bound parser concurrency and LiteParse workers for multi-chat stability, seed Q&A context with leading document chunks for title/abstract grounding, and keep a small-document fallback when vector search returns no chunks.
2026-05-29 15:51:59 +02:00
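The isolation idea in b2ead06a4e, running a crash-prone parser in a child process so a native crash cannot take down the parent, can be sketched with the standard library. This is a minimal illustration, not the plugin's code: `run_isolated` and its timeout default are hypothetical, and a Python one-liner stands in for the real parser.

```python
import subprocess
import sys
from typing import Tuple


def run_isolated(code: str, timeout_s: float = 30.0) -> Tuple[bool, str]:
    """Run Python source in a child interpreter; return (ok, output_or_error)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout.
        return False, "parser timed out"
    if proc.returncode != 0:
        # On POSIX a negative return code means the child died from a signal.
        return False, f"parser exited with code {proc.returncode}"
    return True, proc.stdout
```

With this shape, a segfault or hard exit in the child surfaces as a failed result that the caller can route to a fallback parser.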
Cooper Gamble
808629942a Identify Agent Zero OAuth refresh requests 2026-05-29 12:06:34 +00:00
Cooper Gamble
96088de923 Harden OAuth auth path and proxy edge cases 2026-05-29 11:54:26 +00:00
Cooper Gamble
d36228af82 Reject shared OAuth auth file aliases 2026-05-29 11:28:42 +00:00
Cooper Gamble
1acf88e323 Preserve private mode for OAuth fallback writes 2026-05-29 11:23:19 +00:00
Cooper Gamble
3c6aaae737 Handle private OAuth auth file edge cases 2026-05-29 11:19:44 +00:00
Cooper Gamble
47078c00ae Support bind-mounted private OAuth auth files 2026-05-29 11:15:08 +00:00
Cooper Gamble
2d5f7b89ec Harden Codex OAuth refresh token ownership 2026-05-29 11:10:16 +00:00
Alessandro
d039af512a fix(document_query): clean prompt spelling and legacy references
Rename the query optimization prompt from optmimize to optimize, update the helper lookup, and fix the concise typo inside the prompt.

Also add a regression assertion for the corrected prompt filename and remove the remaining literal a0_small test references so global audits stay clean.
2026-05-29 12:46:32 +02:00
Alessandro
6ccbae0712 feat(document_query): add liteparse runtime and progressive skill
Add LiteParse as the preferred parser path with legacy parser fallbacks, centralized document fetching, generic user-facing progress, and compatibility shims for the former helper/tool imports.

Install the runtime through Docker requirements for fresh images and through the _document_query plugin hook/startup migration for existing installations.

Move the long document_query tool instructions into a document-query skill and leave a compact tool prompt stub that directs the model to load the skill before using document_query for documents, code-file Q&A, and document-image OCR. Also add default Agent Zero guidance for document/code/OCR Q&A routing.

Tests:
- PYTHONPATH=/home/eclypso/a0/agent-zero-pr-1528 conda run -n a0 pytest tests/test_document_query_plugin.py -q
- python -m compileall -q plugins/_document_query helpers/document_query.py tools/document_query.py tests/test_document_query_plugin.py
- git diff --check
- Live Agent Zero Web UI E2E at localhost:32080: PDF Q&A, code-file Q&A through document_query skill, and W-4 document-image OCR

Broader legacy pytest probe remains blocked by unrelated browser-agent, docker workflow branch expectation, and webui fixture path failures in this older PR worktree.
2026-05-29 12:45:14 +02:00
Deimos Agent
5fd7a6a79e feat: extract document_query into _document_query plugin with parser strategy pattern
- Create plugins/_document_query/ with full plugin structure:
  plugin.yaml, default_config.yaml, tools/, helpers/, helpers/parsers/, prompts/, README.md
- Add BaseParser ABC with asyncio.to_thread offload and configurable timeouts
- Implement 5 parsers: PDF (PyMuPDF+Tesseract), HTML (Markdownify),
  Text (expanded mimetypes: YAML, XML, TOML, JS, TS, shell),
  Image (Unstructured), Unstructured (catch-all)
- Add MIME type registry with priority-based routing via get_parser_for_mimetype()
- Add gather_timeout on asyncio.gather for bounded concurrent fetches
- All config externalized to default_config.yaml
- Disable the core files replaced by the plugin (renamed to ._py.bak)
- Update knowledge_tool._py import to plugin path
2026-05-29 12:45:05 +02:00
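The parser strategy pattern listed in 5fd7a6a79e (an abstract base with `asyncio.to_thread` offload and a timeout, plus priority-based MIME routing) might be sketched as follows. Only the names `BaseParser` and `get_parser_for_mimetype` come from the commit message; the attributes, the two example parsers, the registry list, and the signatures are assumptions made for illustration.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Optional


class BaseParser(ABC):
    mimetypes: tuple = ()
    priority: int = 0        # assumed rule: the highest priority wins
    timeout_s: float = 60.0  # hypothetical default; the commit reads it from config

    @abstractmethod
    def parse_sync(self, data: bytes) -> str:
        """Blocking parse implemented by each concrete parser."""

    async def parse(self, data: bytes) -> str:
        # Offload blocking work to a thread and bound how long we wait.
        # Note: on timeout the worker thread is not interrupted, only abandoned.
        return await asyncio.wait_for(
            asyncio.to_thread(self.parse_sync, data), timeout=self.timeout_s
        )


class TextParser(BaseParser):
    mimetypes = ("text/plain", "application/x-yaml")
    priority = 10

    def parse_sync(self, data: bytes) -> str:
        return data.decode("utf-8", errors="replace")


class CatchAllParser(BaseParser):
    mimetypes = ("*/*",)  # hypothetical wildcard marker for the catch-all
    priority = 0

    def parse_sync(self, data: bytes) -> str:
        return f"<{len(data)} bytes>"


REGISTRY = [TextParser(), CatchAllParser()]


def get_parser_for_mimetype(mimetype: str) -> Optional[BaseParser]:
    """Pick the highest-priority parser that accepts the MIME type."""
    matches = [p for p in REGISTRY if mimetype in p.mimetypes or "*/*" in p.mimetypes]
    return max(matches, key=lambda p: p.priority, default=None)
```

A caller would then run `await get_parser_for_mimetype("text/plain").parse(data)`; unknown types fall through to the catch-all because it has the lowest priority.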
Alessandro
67224672e8 Add Remote Control tunnel providers
Rename the Remote Link UI and user-facing messages to Remote Control.

Refactor tunnel startup into provider helpers, remove ngrok support, and keep Cloudflare Tunnel, Microsoft Dev Tunnels, Serveo, and Tailscale wired through the Remote Control selector.

Add provider-time binary preparation, Tailscale userspace tailscaled startup, Tailscale login URL display, Microsoft Dev Tunnel progress feedback, and focused regression coverage for the Remote Control provider flows.
2026-05-28 22:33:29 +02:00
Alessandro
8af14fcd93 Clean desktop SSH agent state during self-update
Remove stale runtime entries from the desktop SSH agent directory when a self-update request is consumed. Keep the cleanup best-effort so missing paths, non-directory paths, and unexpected cleanup failures do not block update startup.

Cover successful cleanup, missing-directory skips, and failure fallback with focused self-update manager tests.
2026-05-27 16:25:18 +02:00
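The best-effort cleanup described in 8af14fcd93 can be illustrated with a small helper that never raises. The function name and return value are hypothetical; the sketch only mirrors the three cases the commit mentions (missing paths, non-directory paths, and unexpected failures).

```python
import shutil
from pathlib import Path


def cleanup_agent_dir(agent_dir: Path) -> int:
    """Remove every entry inside agent_dir without raising; return entries removed."""
    removed = 0
    try:
        if not agent_dir.is_dir():
            return 0  # missing path or non-directory: nothing to clean
        for entry in agent_dir.iterdir():
            try:
                if entry.is_dir() and not entry.is_symlink():
                    shutil.rmtree(entry)
                else:
                    entry.unlink()  # files, sockets, and symlinks
                removed += 1
            except OSError:
                continue  # skip entries that cannot be removed
    except Exception:
        pass  # unexpected failures must not block update startup
    return removed
```

Swallowing broad exceptions is normally discouraged, but it fits here because the cleanup is an optimization and the update itself must proceed.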
Alessandro
d4dc83ba78 Expose installed plugin toggles
Advertise the installed_plugins connector capability and add a protected API endpoint that lists already-installed Agent Zero plugins and toggles supported plugins only.

The endpoint normalizes plugin metadata, preserves the installed-only safety boundary, and refuses changes to protected plugins such as _a0_connector so the CLI cannot disconnect itself.
2026-05-26 20:05:47 +02:00
Alessandro
ee0acc72f9 Improve browser iframe DOM actions
Add an Agent Zero owned browser DOM helper that captures shadow DOM and iframe content with frame-chain/node references.

Install the DOM helper before page-content capture for both local and host-browser runtimes, and send DOM helper payloads to A0 CLI host browser sessions when needed.

Cover iframe content refs and host-browser payload delivery in focused regression tests.
2026-05-26 17:36:19 +02:00
Alessandro
aa139af454 Update README.md 2026-05-26 17:21:01 +02:00
Alessandro
b337ba3db8 Make skills cap configurable
- add max_active_skills to the _skills plugin config and expose it in the config UI
- enforce the scoped cap consistently in skills runtime, catalog responses, and chat activation
- cover raised and lowered cap behavior with focused skills runtime tests
2026-05-26 17:01:13 +02:00
Alessandro
ecb80d3876 Refine skills modal toggle styling
- make the active Visible/Pinned state read as active instead of grey
- replace the bordered toggle wrapper with a simpler slash-separated control
- remove the visible/hidden count label and its unused store helpers
2026-05-26 15:58:07 +02:00
Alessandro
4f06aa0a8e Fix MCP multimodal content handling
Preserve MCP image, audio, and resource tool results instead of collapsing non-text responses into an empty textual result. Images and image resources now flow into raw history as data URL attachments, while audio and non-image binary resources are saved as artifacts with normalized paths.

Extract shared media artifact helpers for base64 validation, image data URLs, decoded-size checks, artifact saving, MIME normalization, and safe filenames. Reuse the shared helpers from MCP, browser connector, and computer-use artifact paths, and add focused regression coverage.
2026-05-26 15:31:33 +02:00
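Several of the shared media helpers named in 4f06aa0a8e (base64 validation, decoded-size checks, image data URLs) could be combined in one function, as in the sketch below. The function name, the size limit, and the choice to lowercase the MIME type are assumptions; the commit does not show the real helpers.

```python
import base64
import binascii


def image_data_url(b64_payload: str, mime_type: str, max_bytes: int = 5_000_000) -> str:
    """Validate base64 image data and wrap it as a data URL."""
    mime = mime_type.strip().lower()
    if not mime.startswith("image/"):
        raise ValueError(f"not an image MIME type: {mime_type}")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(b64_payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("payload is not valid base64") from exc
    if len(raw) > max_bytes:  # hypothetical limit on the decoded size
        raise ValueError("decoded image exceeds size limit")
    # Re-encode so the URL always carries canonical base64.
    canonical = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{canonical}"
```

Checking the decoded size rather than the encoded length keeps the limit meaningful, since base64 inflates payloads by about a third.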
Alessandro
369e0df17f Skip transient Desktop SSH agent state during self-update backup
Exclude the Desktop profile .ssh/agent runtime directory from usr backups during self-update so live SSH agent sockets do not abort upgrades.

Keep the rule in the self-update manager, where the usr backup actually runs, and cover it with a regression test alongside the existing runtime-socket backup cases.
2026-05-26 14:54:53 +02:00
Alessandro
22e79811f3 Decrease chat composer max-height
Chat input box max-height was set too high, to the point of breaking other UI elements.
2026-05-26 14:01:41 +02:00
Alessandro
a999ed02f2 Refresh README showcase assets
2026-05-23 21:40:00 +02:00
Alessandro
97953db46b Guide computer-use remote through Linux AT-SPI
Add a Linux-specific host computer-use skill, route Wayland/AT-SPI backends to it instead of macOS AX guidance, and include compact structural tree outlines in AX/UIA snapshot responses so agents can pick paths and semantic targets from the tool result.
2026-05-23 19:25:51 +02:00
Alessandro
b670559322 Guide Windows computer use through UIA
Add the Windows host computer-use skill and teach computer_use_remote to surface UIA window-management guidance, selector passthrough, and click-last workflow hints. Keep backend-specific actions out of generic guidance while exposing Windows structural operations when the backend advertises them.

Tests: uv run --python 3.12 --with-requirements requirements.txt --with-requirements requirements2.txt --with-requirements requirements.dev.txt --with litellm pytest tests\test_tool_action_contracts.py tests\test_a0_connector_prompt_gating.py tests\test_skills_runtime.py -q
2026-05-23 18:25:04 +02:00
Alessandro Frau
a931759868 Keep backend computer-use actions out of generic guidance
Move explicit AX action names and argument details out of the always-loaded computer_use_remote prompt and generic host-computer-use skill. The generic guidance now only explains backend discovery and skill loading, while host-computer-use-macos remains the detailed home for macOS structural targeting. Also soften the old Super+H hide-window guidance so window actions are chosen from the reported backend and verified visually.
2026-05-23 15:27:14 +02:00
Alessandro Frau
e7cb3aa3fa Split macOS computer-use backend guidance
Add a macOS-specific computer-use skill for AX structural targeting and keep the generic host skill backend-neutral. Surface backend ids, families, and advertised features from computer_use_remote start/status results, add backend-gated ax_snapshot and ax_action handling, and prompt the model to load the macOS skill only when the CLI reports matching support.
2026-05-23 15:08:06 +02:00
Alessandro Frau
0c9939ba92 Cover Codex OAuth multimodal tool results
Extend the existing Codex OAuth image bridge regression to cover multiple text parts before an image_url, matching the computer-use capture shape that combines tool context with a screenshot attachment.
2026-05-23 15:07:51 +02:00
Alessandro Frau
a80cf3842e Treat computer-use approval denial as rearm required
Map COMPUTER_USE_APPROVAL_REQUIRED tool responses into the existing COMPUTER_USE_REARM_REQUIRED stop guidance. This keeps agents from retrying desktop actions or using screenshot fallbacks when macOS permissions still require a user-approved re-arm.
2026-05-23 13:32:47 +02:00
Alessandro
4a836940f3 Preserve vision inputs in Codex OAuth proxy
Convert Chat Completions image_url content parts into Responses API input_image parts instead of normalizing multimodal messages down to text.

Keep text-only content lists as plain text and add OAuth bridge regression tests for image passthrough.
2026-05-23 12:02:11 +02:00
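The conversion described in 4a836940f3 can be sketched as a pure function over message content. The part shapes follow the commonly documented Chat Completions (`image_url`) and Responses API (`input_text`, `input_image`) formats, but the function name, the newline join for text-only lists, and the handling of unknown part types are assumptions rather than the proxy's actual behavior.

```python
from typing import Union

Content = Union[str, list]


def to_responses_content(content: Content) -> Content:
    """Convert Chat Completions content into Responses API input content."""
    if isinstance(content, str):
        return content
    has_image = any(p.get("type") == "image_url" for p in content)
    if not has_image:
        # Text-only lists stay plain text (joined with newlines by assumption).
        return "\n".join(p.get("text", "") for p in content if p.get("type") == "text")
    parts = []
    for part in content:
        if part.get("type") == "text":
            parts.append({"type": "input_text", "text": part.get("text", "")})
        elif part.get("type") == "image_url":
            image = part.get("image_url", {})
            # Chat Completions nests the URL in an object; accept a bare string too.
            url = image.get("url", "") if isinstance(image, dict) else image
            parts.append({"type": "input_image", "image_url": url})
        # Other part types are dropped in this sketch.
    return parts
```

Multiple text parts before an image, the shape mentioned for computer-use captures, pass through in order as separate `input_text` parts.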
Alessandro
cee9abfde4 Expose computer-use captures as vision messages
Store computer-use screenshots as standalone RawMessage entries after the textual tool result, matching the existing vision_load path so the model receives a real multimodal message.

Prefer shared screenshot file paths over base64 artifacts when available, and tighten host computer-use guidance so agents stop instead of proceeding from unverified state when a screenshot is not visible.
2026-05-23 11:51:24 +02:00
Alessandro
2f9037a195 Prefer Super+H for host window hiding
Update computer_use_remote guidance for Ubuntu/GNOME/Wayland so hide-window tasks use Super+H instead of Alt+F9. Reinforce that type results only prove keystrokes were sent and that the agent must verify the fresh screenshot before typing follow-up text or claiming success.
2026-05-23 11:33:06 +02:00
Alessandro
ae5e462cd7 Separate host computer use from Xpra desktop
Clarify that computer_use_remote is the only host desktop-control path and that linux-desktop only targets the internal Docker/Xpra Desktop. Add host-computer-use retrieval triggers and regression coverage so host-screen queries rank ahead of the Xpra desktop skill while explicit Agent Zero Desktop requests still route to linux-desktop.
2026-05-23 11:19:23 +02:00
Alessandro
30d364bb97 Attach computer-use captures to tool results
Return computer-use captures as multimodal tool-result content so the model can visually inspect fresh screenshots after each remote action. Keep the textual preview for logs and prune older capture payloads to avoid runaway context growth.
2026-05-23 11:10:50 +02:00
Alessandro
1f34b87c00 Require visual verification for computer-use captures
Sanitize embedded image data URLs from prompt token estimates so screenshot attachments do not explode context accounting.

Strengthen computer_use_remote prompt, skill, and capture-result text so state-changing desktop actions are treated as attempts until a fresh screen visibly confirms the requested outcome.
2026-05-23 10:32:37 +02:00
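The token-estimate sanitizing in 1f34b87c00 might work along the lines below: embedded image data URLs are replaced with a short placeholder before the text is measured. The regex, the placeholder, and the four-characters-per-token estimate are all assumptions for illustration, not the project's actual accounting.

```python
import re

# Matches inline base64 image data URLs such as data:image/png;base64,AAAA...
_DATA_URL = re.compile(r"data:image/[A-Za-z0-9.+-]+;base64,[A-Za-z0-9+/=]+")


def strip_image_data_urls(text: str, placeholder: str = "[image]") -> str:
    """Replace each embedded image data URL with a short placeholder."""
    return _DATA_URL.sub(placeholder, text)


def estimate_tokens(text: str) -> int:
    """Rough estimate (about 4 characters per token) that ignores image payloads."""
    return max(1, len(strip_image_data_urls(text)) // 4)
```

Without the stripping step, a single screenshot of a few hundred kilobytes would be counted as tens of thousands of text tokens.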
Alessandro
60c36d16d8 Expose computer_use_remote as a runtime-checked tool
Add the standard tool prompt contract so the model can call computer_use_remote in live sessions.

Keep availability, CLI enablement, trust mode, and re-arm enforcement as runtime checks instead of prompt-loader gating.

Update connector prompt and prompt-budget tests to cover the new exposure path.
2026-05-23 09:40:48 +02:00
Alessandro
5e2c2a86ef Add skill visibility controls
Let users hide skills from the model-facing available catalog through the chat Skills selector while keeping pinned skill injection as a separate mode.

Hidden skills are filtered from skill listing, search, loading, relevant recall, and loaded-skill prompt injection, with chat-level show/hide overrides and persistent default hidden-skill config support.
2026-05-22 17:44:22 +02:00