Commit graph

477 commits

Author SHA1 Message Date
Alessandro Frau
a80cf3842e Treat computer-use approval denial as rearm required
Map COMPUTER_USE_APPROVAL_REQUIRED tool responses into the existing COMPUTER_USE_REARM_REQUIRED stop guidance. This keeps agents from retrying desktop actions or using screenshot fallbacks when macOS permissions still require a user-approved re-arm.
2026-05-23 13:32:47 +02:00
Alessandro
4a836940f3 Preserve vision inputs in Codex OAuth proxy
Some checks are pending
Build And Publish Docker Images / plan (push) Waiting to run
Build And Publish Docker Images / build (push) Blocked by required conditions
Convert Chat Completions image_url content parts into Responses API input_image parts instead of normalizing multimodal messages down to text.

Keep text-only content lists as plain text and add OAuth bridge regression tests for image passthrough.
2026-05-23 12:02:11 +02:00
Alessandro
cee9abfde4 Expose computer-use captures as vision messages
Store computer-use screenshots as standalone RawMessage entries after the textual tool result, matching the existing vision_load path so the model receives a real multimodal message.

Prefer shared screenshot file paths over base64 artifacts when available, and tighten host computer-use guidance so agents stop instead of proceeding from unverified state when a screenshot is not visible.
2026-05-23 11:51:24 +02:00
Alessandro
2f9037a195 Prefer Super+H for host window hiding
Update computer_use_remote guidance for Ubuntu/GNOME/Wayland so hide-window tasks use Super+H instead of Alt+F9. Reinforce that type results only prove keystrokes were sent and that the agent must verify the fresh screenshot before typing follow-up text or claiming success.
2026-05-23 11:33:06 +02:00
Alessandro
ae5e462cd7 Separate host computer use from Xpra desktop
Clarify that computer_use_remote is the only host desktop-control path and that linux-desktop only targets the internal Docker/Xpra Desktop. Add host-computer-use retrieval triggers and regression coverage so host-screen queries rank ahead of the Xpra desktop skill while explicit Agent Zero Desktop requests still route to linux-desktop.
2026-05-23 11:19:23 +02:00
Alessandro
30d364bb97 Attach computer-use captures to tool results
Return computer-use captures as multimodal tool-result content so the model can visually inspect fresh screenshots after each remote action. Keep the textual preview for logs and prune older capture payloads to avoid runaway context growth.
2026-05-23 11:10:50 +02:00
Alessandro
1f34b87c00 Require visual verification for computer-use captures
Sanitize embedded image data URLs from prompt token estimates so screenshot attachments do not explode context accounting.\n\nStrengthen computer_use_remote prompt, skill, and capture-result text so state-changing desktop actions are treated as attempts until a fresh screen visibly confirms the requested outcome.
2026-05-23 10:32:37 +02:00
Alessandro
60c36d16d8 Expose computer_use_remote as a runtime-checked tool
Add the standard tool prompt contract so the model can call computer_use_remote in live sessions.

Keep availability, CLI enablement, trust mode, and re-arm enforcement as runtime checks instead of prompt-loader gating.

Update connector prompt and prompt-budget tests to cover the new exposure path.
2026-05-23 09:40:48 +02:00
Alessandro
5e2c2a86ef Add skill visibility controls
Some checks are pending
Build And Publish Docker Images / plan (push) Waiting to run
Build And Publish Docker Images / build (push) Blocked by required conditions
Let users hide skills from the model-facing available catalog through the chat Skills selector while keeping pinned skill injection as a separate mode.

Hidden skills are filtered from skill listing, search, loading, relevant recall, and loaded-skill prompt injection, with chat-level show/hide overrides and persistent default hidden-skill config support.
2026-05-22 17:44:22 +02:00
Alessandro
bb48fad754 Improve Codex OAuth model setup UI
Add Main and Utility Codex model selectors to the OAuth plugin config and persist them through the existing model config API.

Clean up the OAuth config layout by removing the redundant Check Models action, moving the available model list above Advanced, softening borders, and removing repeated account labels.

Show account quota usage bars on the welcome dashboard Codex card and add static coverage for the selector, model list, and quota UI.
2026-05-22 17:22:47 +02:00
Alessandro
770b53e292 Expose connector skill activation
Add a protected skills_activate endpoint and context-aware skills_list support so connector clients can activate skills in live chats. Advertise the capability through the connector API.
2026-05-22 17:03:04 +02:00
Alessandro
5464ead7ce Update computer use allow guidance
Point rearm-required Computer Use guidance at /computer-use on and remove the old confirm/free-run wording so the Agent Zero plugin matches the new CLI access model.
2026-05-22 16:20:58 +02:00
Alessandro
4f2d996ac8 Allow file browser to open root Markdown in Editor
Let File Browser-sourced Editor opens register existing Markdown files under the Agent Zero runtime root while preserving the stricter document artifact sandbox for ordinary document operations. Pass the file-browser source through the Editor store and cover the /a0/AGENTS.md-style path with regression tests.
2026-05-22 15:00:03 +02:00
Alessandro
1c9b5c8b21 Add contextual file browser surface actions
Route Markdown files to Editor, txt and Office documents to Desktop, and browser-renderable files to Browser from the file browser action menu. Extend the Desktop/document allowlists for txt files, keep unsupported small files on the legacy editor path, and harden tooltip cleanup for dropdown-triggered modal closes.
2026-05-22 14:45:53 +02:00
Alessandro
b64c43e736 Unify surface modal header actions
Add a shared grouped action rail for Browser, Desktop, and Editor floating surface modals so modality switches, canvas docking, focus mode, New, and close controls appear in a consistent order with separators. Fix the Desktop focus button class collision that hid the canvas docking button, and align dock button labels with canvas terminology.
2026-05-22 14:19:35 +02:00
Alessandro
8601f0d10c Split text editor and Office artifact ownership
- rename document_artifact to office_artifact and remove retired shims/facades
- make text_editor own Markdown saves, canvas-open intent, refresh, and stale-save protection
- keep Office artifacts Desktop-only with Office formats and update skills/tests
2026-05-22 11:21:04 +02:00
Alessandro
c1bdde057c Make Desktop screenshots ephemeral by default
Route in-process Xpra/Desktop screenshot observations through context-scoped ephemeral image refs with vision_load payloads, matching the privacy posture of computer-use and browser screenshots. Keep desktopctl shell observations path-based with aggressive pruning so image payloads are not printed into shell logs, and preserve explicit screenshot paths as durable user-owned artifacts.
2026-05-22 10:21:28 +02:00
Alessandro
430c48d1a5 Make browser screenshots ephemeral and context scoped
Route no-path browser screenshots through an in-process ephemeral image registry that vision_load consumes into the existing data-url model boundary. Stop materializing host-browser artifacts into tmp/browser/host-screenshots, keep explicit path screenshots durable, and make browser log metadata point at the active chat/task context while preserving browser-context detail.
2026-05-22 09:50:47 +02:00
Alessandro
e36cf19bfc Avoid persisting computer-use capture artifacts
Attach base64 computer-use capture artifacts directly as data-image URLs in RawMessage content instead of materializing them under the connector temp capture directory. Keep legacy path-based captures as a fallback while preserving base64 compatibility for model adapters and avoiding durable screenshot files for artifact payloads.
2026-05-22 05:09:25 +02:00
Alessandro
26b3ae00d5 Open saved browser screenshots in image viewer
Route persisted browser screenshot thumbnails to the shared image viewer modal while keeping live Browser previews as Canvas entry points.\n\nAdd regression coverage so static screenshot artifacts do not regress back to opening the Browser surface.
2026-05-22 05:03:11 +02:00
Alessandro
d1827e6c66 Refactor: use user locale for time displays
Some checks are pending
Build And Publish Docker Images / plan (push) Waiting to run
Build And Publish Docker Images / build (push) Blocked by required conditions
Add user-configurable timezone and 12/24-hour preferences, then wire them through settings, runtime snapshots, scheduler payloads, wait handling, notifications, backups, memory, plugin metadata, and frontend formatters.

Keep UTC as the boundary for absolute instants while serializing user-facing dates in the configured or browser-resolved timezone. Preserve scheduler wall-clock inputs in the selected timezone, propagate TZ into desktop/runtime process environments, and restart active desktop sessions when the runtime timezone changes.

Cover the risky paths with timezone regression tests for settings normalization, auto and fixed timezone resolution, scheduler round-trips, memory timestamp conversion, and desktop timezone sync.
2026-05-21 15:26:00 +02:00
Alessandro Frau
fc8ac5d49c
Merge pull request #1643 from nullbr41n/patch-1
Add sender number to WhatsApp user message format
2026-05-21 15:21:26 +02:00
Alessandro
675afa8dee Refactor speech stack into built-in Kokoro TTS and Whisper STT plugins
Some checks are pending
Build And Publish Docker Images / plan (push) Waiting to run
Build And Publish Docker Images / build (push) Blocked by required conditions
Split the legacy core speech stack into two built-in, independently toggleable plugins: `_kokoro_tts` for TTS and `_whisper_stt` for STT.

This refactor keeps dependency installation and bootstrap concerns in Docker/bootstrap/preload, while moving speech-specific tooling, APIs, prompts, UI, and runtime behavior into the plugins. Core now exposes engine-agnostic `tts-service` and `stt-service` brokers, with browser-native TTS preserved as the fallback when Kokoro is disabled.

Included in this change:
- add built-in `_kokoro_tts` plugin with plugin-owned synth API, config, status UI, and provider registration
- add built-in `_whisper_stt` plugin with plugin-owned transcribe API, mic runtime, device UI, prompt injection, and provider registration
- remove legacy core speech APIs/helpers/settings/UI and delete unused `webui/js/speech_browser.js`
- replace the old hardcoded speech settings section with a generic voice surface backed by plugin extensions
- update preload/docs/tests to match the new plugin-owned speech architecture

Behavioral intent:
- both plugins are built-in but not `always_enabled`
- users can now hot-switch TTS and STT independently
- browser TTS remains available when `_kokoro_tts` is off
- Whisper mic UI only appears when `_whisper_stt` is enabled
2026-05-21 05:41:59 +02:00
Alessandro
30315f5227 Reduce plugin scanner false positives
Calibrate scanner prompts around demonstrated risk instead of the mere presence of common plugin capabilities. Treat scoped credentials, network calls, filesystem access, subprocesses, prompts, and generated assets as expected behavior when they match the declared plugin purpose, while keeping warnings and failures for ambiguity, unsafe handling, concealment, exploitability, or purpose mismatch.

Add regression coverage for the rendered scanner prompt so this calibration is preserved.
2026-05-21 04:02:43 +02:00
Alessandro
e96f4a389d Add Editor plugin thumbnail
Create the missing Editor plugin thumbnail as a 256x256 optimized JPEG matching the existing core plugin icon style.

Keep the asset below the 20 KB plugin thumbnail budget.
2026-05-21 03:33:08 +02:00
Alessandro
cf51c792f5 Make error retry count configurable
Some checks are pending
Build And Publish Docker Images / plan (push) Has been skipped
Build And Publish Docker Images / build (push) Waiting to run
Read the _error_retry retry limit from plugin settings instead of using the hardcoded single retry. Add config sanitization, preserve the default retry count in the settings UI, update plugin docs, and cover configured and zero-retry behavior with focused tests.
2026-05-18 03:23:56 +02:00
Alessandro
e0337410e7 Preserve model preset inherited settings
Deep-merge model preset slots with the active configuration so custom context windows, rate limits, and nested kwargs survive preset switches.

Treat legacy utility preset defaults as implicit values, allow omitted utility and embedding slots to inherit configured models, and document the partial-preset behavior.
2026-05-18 02:45:08 +02:00
Alessandro
27aa2d8550 Improve Browser Docker runtime recovery
Clarify Browser settings around internal Docker vs A0 CLI host-browser runtimes. Add recovery guidance to host-browser failures so users can switch back to the internal Docker browser from settings or /browser container. Cover the recovery messaging in host-browser connector tests.
2026-05-18 02:00:31 +02:00
nullbrain
ef32acb989
Add sender number to WhatsApp user message format 2026-05-15 19:45:30 -04:00
Alessandro
38bbff3d9a Add connector message queue protocol
Some checks are pending
Build And Publish Docker Images / plan (push) Waiting to run
Build And Publish Docker Images / build (push) Blocked by required conditions
Advertise message queue support from the Agent Zero connector backend and add WebSocket handlers for queue add, remove, and send operations.

Include queue snapshots in context subscriptions and emit queue updates as the backend state changes so the CLI can stay in sync.
2026-05-15 18:13:32 +02:00
Alessandro
70adbe91a0 Polish Editor and Browser surface cleanup
Remove obsolete Office markdown editor UI and handoff code now that Markdown lives in the dedicated Editor surface.

Harden the Editor modal so it opens directly into a Markdown draft and rebinds Ace to the visible root when switching surfaces.

Make Browser address Enter navigation explicit and update the canvas setup expectations for the slimmer Office shell.
2026-05-15 12:38:29 +02:00
Alessandro
89901b64f0 Polish native Markdown editor experience
Some checks are pending
Build And Publish Docker Images / plan (push) Waiting to run
Build And Publish Docker Images / build (push) Blocked by required conditions
Expand the dedicated Editor surface with safe rendered preview mode, ACE-backed source editing, browser-style tabs, toolbar/file actions, preview search, and richer Markdown rendering for code blocks, task lists, images, tables, math, local links, and footnotes.

Keep open Markdown files synchronized with the active context and saved tool edits, including live refresh for document_artifact and text_editor results without routing Markdown through Desktop/Office.

Add inline preview-page editing, clickable preview task-list checkboxes, source editor rehydration after preview-mode refreshes, and regression coverage for the new editor wiring and sync behavior.
2026-05-15 04:47:24 +02:00
Alessandro
330a0c5790 Split Markdown editor into dedicated surface
Add a builtin _editor plugin that owns Markdown API/WebSocket sessions, canvas and modal UI, live refresh, tabs, prompt Extras for active-context open files, inline close confirmation, and Close All handling.

Route Markdown document artifacts to Editor while keeping Office/Desktop focused on LibreOffice formats, and update Desktop/Office prompts, menus, compatibility shims, and regression coverage.
2026-05-15 02:41:41 +02:00
Alessandro
b48e31bead Forward remote exec reset flag
Forward reset=true from code_execution_remote replacement commands to the connected CLI and document when to use it versus runtime=reset. This lets the CLI tear down stuck host sessions before running the next command.

Tests from /home/eclypso/a0/a0-connector: PYTHONPATH=src conda run -n a0 pytest tests/test_plugin_backend.py::test_code_execution_remote_forwards_reset_true_with_replacement_command -v; ./.venv/bin/python -m pytest tests/test_plugin_backend.py -k 'code_execution_remote or select_remote_exec or ws_connector_exec_result' -v. Mirrored to live container 07e0288dc04f and health check returned HTTP 200.
2026-05-15 00:28:10 +02:00
Alessandro
68c3b8b022 Move office and desktop state under plugin storage
Migrate retired /usr/_office and /usr/_desktop trees from plugin startup into /usr/plugins/<plugin>. Update office document storage, desktop session/runtime paths, and context-scoped screenshots to use the plugin-owned state layout. Add focused tests for retired-state migration and the new path behavior.
2026-05-12 16:21:43 +02:00
Alessandro
7b61ceb241 Reflect connector model overrides in Web UI
Render custom per-chat model overrides in the model switcher instead of hiding them behind a generic Custom label.

Mark model override updates dirty so an already-open Web UI refreshes after CLI or Web UI changes, without exposing API key values in labels.

Add focused regression coverage for switcher rendering hooks and state-sync notifications.
2026-05-12 16:04:02 +02:00
Alessandro
03cc91287e Add Nebius Token Factory provider
Register Nebius Token Factory as an OpenAI-compatible chat provider using the Token Factory API base URL and NEBIUS_API_KEY-derived provider id.

Expose Nebius in onboarding metadata and add static coverage for the provider endpoint and UI listing.
2026-05-12 15:50:24 +02:00
Alessandro
4bab8da3f5 Keep host browser requests on Browser runtime
Route host/local browser requests through the Browser tool instead of desktop or shell fallbacks. Add remote-debugging setup guidance to Browser runtime errors and document the exact Chrome inspect setting in prompts, skills, and Web UI copy.
2026-05-12 15:45:29 +02:00
Alessandro
7b1c84aeca Improve browser tool ergonomics for agent UI control
Teach the Browser content helper to ignore global/delegated framework event bindings so snapshots surface the actual actionable controls instead of broad wrapper elements. Add an accessible name to the Browser address bar for more reliable capture output.

Allow agents to use selector-based reference actions, coordinate click fallbacks, focused-field typing, and string key chords such as CTRL+A across the browser tool, container runtime, and host connector runtime. Cover the behavior with browser regression and host connector tests.
2026-05-12 09:41:13 +02:00
Alessandro
55474443c9 Stabilize document artifact affordances
Make file creation opt-in through document_artifact, move document file cards to final responses, and keep the tool payload as a quiet execution record.

Deduplicate response cards by file identity, refresh open Desktop canvas sessions after saved edits, and harden document_artifact edit input normalization for common append/update shapes.

Update prompts, skills, styles, and regression coverage for response-only file actions and explicit-only canvas opening.
2026-05-12 06:59:22 +02:00
Alessandro
6de7073bf9 Fix blocking history compression edge cases
Detect stalled automatic history compression so the prompt-prep wait loop cannot spin forever when no further reduction is possible.

Split large manual chat compaction input by verified token budget instead of line midpoint, covering single-line 85k+ character histories.

Add regression tests for stalled compression, max-pass bailout, and large single-line compaction chunking.
2026-05-12 04:47:28 +02:00
Alessandro
ba0d90c380 Improve model config provider controls
Reset the custom API base URL whenever the provider dropdown changes so stale endpoints do not carry across provider tests. Move the chat Supports Vision toggle out of Advanced Settings while keeping dependent vision settings, such as Max embeds, inside Advanced.
2026-05-12 03:52:18 +02:00
Alessandro
f17198e126 fix: tighten tool guidance and editor workflows 2026-05-11 11:51:58 +02:00
Alessandro
6ba1f30dca fix: make memory cleanup update stale fragments 2026-05-11 11:51:58 +02:00
Alessandro
6d29268cbd refactor: align skills and tool guidance
Some checks are pending
Build And Publish Docker Images / plan (push) Waiting to run
Build And Publish Docker Images / build (push) Blocked by required conditions
Rename high-impact skills to task-oriented names and move plugin-owned skills into their owning plugin folders.\n\nAlign renamed skill frontmatter with the official SKILL.md standard by keeping trigger language in name/description metadata, replacing the old create-skill wizard with build-skill, and updating browser, A0 connector, computer-use, CLI setup, and scheduler skill references.\n\nTighten the recurring cross-provider guidance gaps surfaced by the evidence sweeps: memory requests now avoid promptinclude-file routing, scheduler prompts distinguish cron schedules from planned ISO dates, document questions prefer document_query, skills_tool search/read_file usage is clearer, normal notifications set info/priority 10, and local/host text editors preserve patch intent.\n\nUpdate regression tests for the renamed skills, plugin ownership, prompt budget reality, and standard frontmatter shape.
2026-05-10 07:13:14 +02:00
Alessandro
854a96e880 Treat bare desktop canvas as ready
Allow the Linux Desktop state collector to report a healthy canvas when XFCE has no active application window, as long as the display, visible windows, and screenshots are available.

Document the readiness rule in the linux-desktop skill and add regression coverage for the bare-desktop active_window=null case.
2026-05-10 00:07:04 +02:00
Alessandro
c8e239d5a2 Harden remote computer use readiness
Track computer-use CLI status, last error, and restore-token presence in connector metadata so stale Free Run settings are no longer treated as ready.

Materialize CLI-provided screenshot artifacts through Agent Zero's file helpers, stop dispatching computer_use_remote actions when metadata already reports rearm required, and teach the skill to give backend-agnostic rearm guidance without screenshot or vision fallbacks.
2026-05-10 00:05:08 +02:00
Alessandro
daf95ec3ab Normalize tool contracts and slim prompt surface
Standardize multi-action tools around tool_args.action while keeping parser compatibility for older tool/args, tool_name:action, and method-shaped requests. This keeps new prompts clean without breaking agents that learned the previous dialect.

Move A0 connector remote execution/file tools into stable standard prompts, make remote targeting independent of the active chat context, and skill-gate beta computer-use remote so it no longer weighs down the always-on tool list.

Align text editor, scheduler, skills, office artifact, memory, notify, and browser prompts/tools around the canonical action contract. Add scheduler update/timezone handling, skills_tool read_file, text editor patch coverage, and fixes for memory_forget, behaviour_adjustment, and code execution progress warnings.

Reduce default prompt pressure by compacting browser and scheduler prompts into skill-backed manifests, shortening skill catalog descriptions, and pruning noisy framework knowledge. Remove obsolete connector prompt stubs and root tool-call knowledge examples.

Tests: conda run -n a0 pytest tests/test_a0_connector_prompt_gating.py tests/test_tool_action_contracts.py tests/test_task_scheduler_timezone.py tests/test_text_editor_context_patch.py tests/test_tool_request_normalization.py tests/test_office_document_store.py::test_odf_is_advertised_and_docx_remains_explicit_compatibility tests/test_office_document_store.py::test_document_artifact_accepts_method_alias_for_ods_create tests/test_skills_runtime.py tests/test_default_prompt_budget.py::test_a0_small_profile_removed_and_prompt_text_generic -q
2026-05-09 21:54:43 +02:00
Alessandro
09d9ed2e80 Bound browser tab usage during research 2026-05-09 17:36:15 +02:00
Alessandro
ef98275c8c Improve final onboarding CLI connector hint
Tighten the final onboarding discovery cards so the setup suggestions take less vertical space.

Add a compact CLI Connector card with macOS/Linux and Windows install commands plus a small A0 terminal preview asset.
2026-05-09 16:47:52 +02:00