agent-zero

mirror of https://github.com/agent0ai/agent-zero.git synced 2026-07-10 01:18:29 +00:00

Author	SHA1	Message	Date
frdel	175baa49db	Add AGENTS.md DOX files and migrate docs Introduce DOX (AGENTS.md) contracts across the repository to formalize ownership and local work contracts: adds .github/AGENTS.md plus AGENTS.md files for agents, api, conf, docker, docs, extensions, helpers, knowledge, lib, plugins, prompts, scripts, skills, tests, tools, usr, webui (and several subfolders). Update root AGENTS.md (content and last-updated date) to include DOX framework guidance and a child DOX index. Update .gitignore to allow usr/ and usr/plugins AGENTS.md files. Remove legacy deep-dive files under docs/agents (banners, components, modals, plugins) and migrate frontend/plugin references to webui/ and plugins/ DOX locations. Also adjust plugins/README.md and several skills/*/SKILL.md files to align with the new DOX layout.	2026-06-01 13:55:07 +02:00
Alessandro	46112c9750	Add builtin ACP plugin Expose Agent Zero over the Agent Client Protocol as a builtin stdio plugin with session lifecycle support, streaming bridges, registry metadata, and editor workspace handling. Add lazy dependency installation through the ACP plugin hook using the root requirements pin so self-updated instances can recover without a fresh Docker image. Pin agent-client-protocol, add focused static coverage, and allow ACP sessions to override the per-context workdir for code execution and workdir prompts.	2026-06-01 02:40:28 +02:00
Alessandro	a7b4fcd798	Remove document query plugin requirements The LiteParse dependency is already managed from the root requirements.txt, so the document query plugin should not carry a separate requirements file. This keeps dependency installation centralized for the bundled core plugin.	2026-06-01 02:29:22 +02:00
Alessandro	bb6004c77f	Harden Tailscale Funnel remote control Start Tailscale Remote Control through Funnel on HTTPS port 443, verify the prepared client supports the funnel command, and give the Funnel startup path a longer approval window. Forward Tailscale approval/login URLs through the shared tunnel notification stream so the UI renders an actionable browser link like Microsoft Dev Tunnels, with clearer basic-user copy around approval and public Funnel URLs. Keep ngrok removed from the Remote Control surface and extend regression coverage for Funnel command wiring, unsupported clients, and approval-link handling.	2026-06-01 02:11:58 +02:00
Alessandro Frau	08e3feef20	Merge pull request #1677 from cooper-oai/cooper/agent-zero-private-codex-oauth-store Harden Codex OAuth refresh token ownership	2026-06-01 02:08:31 +02:00
Alessandro	d8710c3f81	Preserve computer-use capability metadata Store and return the connector computer-use contract_version and nested capabilities payload in Agent Zero, surface the contract version in tool status text, and teach the remote computer-use prompt/skill to prefer the structured background dispatch contract over OS-name assumptions.	2026-05-30 22:39:44 +02:00
Alessandro	ce4156a110	Harden internal Xpra desktop control Move the internal Linux Desktop skill toward a structured-first workflow: state/window commands before screenshots, batched desktopctl sequences in one process, and final screenshots only when pixels matter. Normalize internal system Desktop resize requests so narrow portrait canvas sizes do not shrink the real X11 root display and distort screenshots. Keep the live Docker runtime in sync with the changed Desktop files.	2026-05-30 22:22:15 +02:00
Alessandro	3c1bdfa5f9	Add background computer-use tool contract Expose native window listing, indexed window state, and element_action dispatch modes through the Agent Zero connector tool. Update host computer-use prompts and platform skills so agents prefer background structural targeting and only rely on screenshots for foreground or uncertain actions.	2026-05-30 21:48:43 +02:00
Alessandro	9e4b2f1843	Tune LiteParse OCR defaults Add an adaptive OCR heuristic that samples PDF text density and disables LiteParse OCR for large text-rich PDFs before the OCR path reaches timeout territory. Keep LiteParse isolated in a subprocess regardless of stale user config, remove the subprocess toggle from the settings UI, and raise the default LiteParse worker count to 2 for a safer multi-chat speedup. Update Document Query docs and focused tests for the new heuristic, mandatory isolation, and worker default.	2026-05-30 19:02:10 +02:00
Alessandro	edd58a42d2	Fix durable screenshot artifacts and Xpra sizing Materialize browser, desktop, computer-use, and vision-load screenshots into chat-scoped artifacts so historical image refs survive temporary screenshot pruning. Keep history serialization free of rescue assumptions, document durable screenshot behavior in tool prompts/skills, and size Xpra canvases from backend-normalized display dimensions to prevent stretched desktop views. Verified with focused pytest coverage plus live Docker checks for browser screenshot persistence and Xpra canvas dimensions.	2026-05-30 17:45:19 +02:00
Alessandro Frau	45e4bd892c	Merge pull request #1528 from Deimos-AI/pr/document-query-parser-abstraction feat: extract document_query into _document_query plugin with parser strategy pattern	2026-05-29 16:57:24 +02:00
Alessandro	98f6c17d15	feat(document_query): expand settings panel and thumbnail Expose the main Document Query parser, retrieval, fetch, LiteParse/OCR, and fallback controls in the plugin settings UI. Add a generated 256x256 JPEG thumbnail under the plugin size limit and cover both the settings wiring and thumbnail constraints with focused tests.	2026-05-29 16:47:38 +02:00
Alessandro	59dd1c99cb	feat(document_query): expose parser concurrency setting Add a Document Query plugin settings panel that maps to the existing parser_concurrency runtime limit, with a focused regression test so the UI remains wired to the backend setting.	2026-05-29 16:30:47 +02:00
Alessandro	6df3acc1e6	fix(document_query): pin LiteParse dependency Pin LiteParse to 2.0.3 in both Docker requirements and the plugin hook requirements so new images and existing plugin installs resolve the same tested runtime.	2026-05-29 16:07:20 +02:00
Alessandro	b2ead06a4e	fix(document_query): isolate LiteParse parsing Run LiteParse in a subprocess so native parser crashes cannot take down the Web UI process. Bound parser concurrency and LiteParse workers for multi-chat stability, seed Q&A context with leading document chunks for title/abstract grounding, and keep a small-document fallback when vector search returns no chunks.	2026-05-29 15:51:59 +02:00
Cooper Gamble	808629942a	Identify Agent Zero OAuth refresh requests	2026-05-29 12:06:34 +00:00
Cooper Gamble	96088de923	Harden OAuth auth path and proxy edge cases	2026-05-29 11:54:26 +00:00
Cooper Gamble	d36228af82	Reject shared OAuth auth file aliases	2026-05-29 11:28:42 +00:00
Cooper Gamble	1acf88e323	Preserve private mode for OAuth fallback writes	2026-05-29 11:23:19 +00:00
Cooper Gamble	3c6aaae737	Handle private OAuth auth file edge cases	2026-05-29 11:19:44 +00:00
Cooper Gamble	47078c00ae	Support bind-mounted private OAuth auth files	2026-05-29 11:15:08 +00:00
Cooper Gamble	2d5f7b89ec	Harden Codex OAuth refresh token ownership	2026-05-29 11:10:16 +00:00
Alessandro	d039af512a	fix(document_query): clean prompt spelling and legacy references Rename the query optimization prompt from optmimize to optimize, update the helper lookup, and fix the concise typo inside the prompt. Also add a regression assertion for the corrected prompt filename and remove the remaining literal a0_small test references so global audits stay clean.	2026-05-29 12:46:32 +02:00
Alessandro	6ccbae0712	feat(document_query): add liteparse runtime and progressive skill Add LiteParse as the preferred parser path with legacy parser fallbacks, centralized document fetching, generic user-facing progress, and compatibility shims for the former helper/tool imports. Install the runtime through Docker requirements for fresh images and through the _document_query plugin hook/startup migration for existing installations. Move the long document_query tool instructions into a document-query skill and leave a compact tool prompt stub that directs the model to load the skill before using document_query for documents, code-file Q&A, and document-image OCR. Also add default Agent Zero guidance for document/code/OCR Q&A routing. Tests: - PYTHONPATH=/home/eclypso/a0/agent-zero-pr-1528 conda run -n a0 pytest tests/test_document_query_plugin.py -q - python -m compileall -q plugins/_document_query helpers/document_query.py tools/document_query.py tests/test_document_query_plugin.py - git diff --check - Live Agent Zero Web UI E2E at localhost:32080: PDF Q&A, code-file Q&A through document_query skill, and W-4 document-image OCR Broader legacy pytest probe remains blocked by unrelated browser-agent, docker workflow branch expectation, and webui fixture path failures in this older PR worktree.	2026-05-29 12:45:14 +02:00
Deimos Agent	5fd7a6a79e	feat: extract document_query into _document_query plugin with parser strategy pattern - Create plugins/_document_query/ with full plugin structure: plugin.yaml, default_config.yaml, tools/, helpers/, helpers/parsers/, prompts/, README.md - Add BaseParser ABC with asyncio.to_thread offload and configurable timeouts - Implement 5 parsers: PDF (PyMuPDF+Tesseract), HTML (Markdownify), Text (expanded mimetypes: YAML, XML, TOML, JS, TS, shell), Image (Unstructured), Unstructured (catch-all) - Add MIME type registry with priority-based routing via get_parser_for_mimetype() - Add gather_timeout on asyncio.gather for bounded concurrent fetches - All config externalized to default_config.yaml - Disable core files (._py.bak) replaced by plugin - Update knowledge_tool._py import to plugin path	2026-05-29 12:45:05 +02:00
Alessandro	67224672e8	Add Remote Control tunnel providers Rename the Remote Link UI and user-facing messages to Remote Control. Refactor tunnel startup into provider helpers, remove ngrok support, and keep Cloudflare Tunnel, Microsoft Dev Tunnels, Serveo, and Tailscale wired through the Remote Control selector. Add provider-time binary preparation, Tailscale userspace tailscaled startup, Tailscale login URL display, Microsoft Dev Tunnel progress feedback, and focused regression coverage for the Remote Control provider flows.	2026-05-28 22:33:29 +02:00
Alessandro	8af14fcd93	Clean desktop SSH agent state during self-update Remove stale runtime entries from the desktop SSH agent directory when a self-update request is consumed. Keep the cleanup best-effort so missing paths, non-directory paths, and unexpected cleanup failures do not block update startup. Cover successful cleanup, missing-directory skips, and failure fallback with focused self-update manager tests.	2026-05-27 16:25:18 +02:00
Alessandro	d4dc83ba78	Expose installed plugin toggles Advertise the installed_plugins connector capability and add a protected API endpoint that lists already-installed Agent Zero plugins and toggles supported plugins only. The endpoint normalizes plugin metadata, preserves the installed-only safety boundary, and refuses changes to protected plugins such as _a0_connector so the CLI cannot disconnect itself.	2026-05-26 20:05:47 +02:00
Alessandro	ee0acc72f9	Improve browser iframe DOM actions Add an Agent Zero owned browser DOM helper that captures shadow DOM and iframe content with frame-chain/node references. Install the DOM helper before page-content capture for both local and host-browser runtimes, and send DOM helper payloads to A0 CLI host browser sessions when needed. Cover iframe content refs and host-browser payload delivery in focused regression tests.	2026-05-26 17:36:19 +02:00
Alessandro	aa139af454	Update README.md	2026-05-26 17:21:01 +02:00
Alessandro	b337ba3db8	Make skills cap configurable - add max_active_skills to the _skills plugin config and expose it in the config UI - enforce the scoped cap consistently in skills runtime, catalog responses, and chat activation - cover raised and lowered cap behavior with focused skills runtime tests	2026-05-26 17:01:13 +02:00
Alessandro	ecb80d3876	Refine skills modal toggle styling - make the active Visible/Pinned state read as active instead of grey - replace the bordered toggle wrapper with a simpler slash-separated control - remove the visible/hidden count label and its unused store helpers	2026-05-26 15:58:07 +02:00
Alessandro	4f06aa0a8e	Fix MCP multimodal content handling Preserve MCP image, audio, and resource tool results instead of collapsing non-text responses into an empty textual result. Images and image resources now flow into raw history as data URL attachments, while audio and non-image binary resources are saved as artifacts with normalized paths. Extract shared media artifact helpers for base64 validation, image data URLs, decoded-size checks, artifact saving, MIME normalization, and safe filenames. Reuse the shared helpers from MCP, browser connector, and computer-use artifact paths, and add focused regression coverage.	2026-05-26 15:31:33 +02:00
Alessandro	369e0df17f	Skip transient Desktop SSH agent state during self-update backup Exclude the Desktop profile .ssh/agent runtime directory from usr backups during self-update so live SSH agent sockets do not abort upgrades. Keep the rule in the self-update manager, where the usr backup actually runs, and cover it with a regression test alongside the existing runtime-socket backup cases.	2026-05-26 14:54:53 +02:00
Alessandro	22e79811f3	Decrease chat composer max-height Chat input box max-height was set too high, to the point of breaking other UI elements.	2026-05-26 14:01:41 +02:00
Alessandro	a999ed02f2	Refresh README showcase assets	2026-05-23 21:40:00 +02:00
Alessandro	97953db46b	Guide computer-use remote through Linux AT-SPI Add a Linux-specific host computer-use skill, route Wayland/AT-SPI backends to it instead of macOS AX guidance, and include compact structural tree outlines in AX/UIA snapshot responses so agents can pick paths and semantic targets from the tool result.	2026-05-23 19:25:51 +02:00
Alessandro	b670559322	Guide Windows computer use through UIA Add the Windows host computer-use skill and teach computer_use_remote to surface UIA window-management guidance, selector passthrough, and click-last workflow hints. Keep backend-specific actions out of generic guidance while exposing Windows structural operations when the backend advertises them. Tests: uv run --python 3.12 --with-requirements requirements.txt --with-requirements requirements2.txt --with-requirements requirements.dev.txt --with litellm pytest tests\test_tool_action_contracts.py tests\test_a0_connector_prompt_gating.py tests\test_skills_runtime.py -q	2026-05-23 18:25:04 +02:00
Alessandro Frau	a931759868	Keep backend computer-use actions out of generic guidance Move explicit AX action names and argument details out of the always-loaded computer_use_remote prompt and generic host-computer-use skill. The generic guidance now only explains backend discovery and skill loading, while host-computer-use-macos remains the detailed home for macOS structural targeting. Also soften the old Super+H hide-window guidance so window actions are chosen from the reported backend and verified visually.	2026-05-23 15:27:14 +02:00
Alessandro Frau	e7cb3aa3fa	Split macOS computer-use backend guidance Add a macOS-specific computer-use skill for AX structural targeting and keep the generic host skill backend-neutral. Surface backend ids, families, and advertised features from computer_use_remote start/status results, add backend-gated ax_snapshot and ax_action handling, and prompt the model to load the macOS skill only when the CLI reports matching support.	2026-05-23 15:08:06 +02:00
Alessandro Frau	0c9939ba92	Cover Codex OAuth multimodal tool results Extend the existing Codex OAuth image bridge regression to cover multiple text parts before an image_url, matching the computer-use capture shape that combines tool context with a screenshot attachment.	2026-05-23 15:07:51 +02:00
Alessandro Frau	a80cf3842e	Treat computer-use approval denial as rearm required Map COMPUTER_USE_APPROVAL_REQUIRED tool responses into the existing COMPUTER_USE_REARM_REQUIRED stop guidance. This keeps agents from retrying desktop actions or using screenshot fallbacks when macOS permissions still require a user-approved re-arm.	2026-05-23 13:32:47 +02:00
Alessandro	4a836940f3	Preserve vision inputs in Codex OAuth proxy Convert Chat Completions image_url content parts into Responses API input_image parts instead of normalizing multimodal messages down to text. Keep text-only content lists as plain text and add OAuth bridge regression tests for image passthrough.	2026-05-23 12:02:11 +02:00
Alessandro	cee9abfde4	Expose computer-use captures as vision messages Store computer-use screenshots as standalone RawMessage entries after the textual tool result, matching the existing vision_load path so the model receives a real multimodal message. Prefer shared screenshot file paths over base64 artifacts when available, and tighten host computer-use guidance so agents stop instead of proceeding from unverified state when a screenshot is not visible.	2026-05-23 11:51:24 +02:00
Alessandro	2f9037a195	Prefer Super+H for host window hiding Update computer_use_remote guidance for Ubuntu/GNOME/Wayland so hide-window tasks use Super+H instead of Alt+F9. Reinforce that type results only prove keystrokes were sent and that the agent must verify the fresh screenshot before typing follow-up text or claiming success.	2026-05-23 11:33:06 +02:00
Alessandro	ae5e462cd7	Separate host computer use from Xpra desktop Clarify that computer_use_remote is the only host desktop-control path and that linux-desktop only targets the internal Docker/Xpra Desktop. Add host-computer-use retrieval triggers and regression coverage so host-screen queries rank ahead of the Xpra desktop skill while explicit Agent Zero Desktop requests still route to linux-desktop.	2026-05-23 11:19:23 +02:00
Alessandro	30d364bb97	Attach computer-use captures to tool results Return computer-use captures as multimodal tool-result content so the model can visually inspect fresh screenshots after each remote action. Keep the textual preview for logs and prune older capture payloads to avoid runaway context growth.	2026-05-23 11:10:50 +02:00
Alessandro	1f34b87c00	Require visual verification for computer-use captures Sanitize embedded image data URLs from prompt token estimates so screenshot attachments do not explode context accounting. Strengthen computer_use_remote prompt, skill, and capture-result text so state-changing desktop actions are treated as attempts until a fresh screen visibly confirms the requested outcome.	2026-05-23 10:32:37 +02:00
Alessandro	60c36d16d8	Expose computer_use_remote as a runtime-checked tool Add the standard tool prompt contract so the model can call computer_use_remote in live sessions. Keep availability, CLI enablement, trust mode, and re-arm enforcement as runtime checks instead of prompt-loader gating. Update connector prompt and prompt-budget tests to cover the new exposure path.	2026-05-23 09:40:48 +02:00
Alessandro	5e2c2a86ef	Add skill visibility controls Let users hide skills from the model-facing available catalog through the chat Skills selector while keeping pinned skill injection as a separate mode. Hidden skills are filtered from skill listing, search, loading, relevant recall, and loaded-skill prompt injection, with chat-level show/hide overrides and persistent default hidden-skill config support.	2026-05-22 17:44:22 +02:00

2230 commits