Make Markdown the first-class document workflow in the office skills and state the Desktop/LibreOffice path as opt-in for GUI or binary Office work.
Remove passive Browser canvas auto-opening from tool results; Browser result handling now only syncs an already-open Browser canvas, while explicit user buttons can still open the canvas or modal. Add regression coverage for the no-auto-open policy and Markdown-first skill guidance.
Make Desktop canvas and modal handoffs resize the live Xpra viewport deterministically by syncing the visible frame, making backend resize requests authoritative, unloading hidden iframe clients, and guarding Xpra HTML menu callbacks when the menu is disabled. Also forward wheel events from the embedded Xpra canvas so mouse and trackpad scrolling reach the Linux desktop session.
Teach the browser page-content helper to traverse open shadow roots and assigned slot nodes when collecting text, rendering list/inline children, and resolving selectors. This lets Agent Zero inspect modern component-heavy pages more accurately without depending only on light-DOM textContent.
Bump the injected helper version so existing browser contexts can refresh to the new DOM traversal behavior.
Add a linux-desktop skill that teaches Agent Zero how to operate the persistent XFCE/Xpra desktop through desktopctl.sh, including app launch, focus, click, typing, and stable folder entry points for Workdir, Projects, Skills, Agents, and Downloads.
Add a Calc cell-edit helper that opens a workbook through the visible LibreOffice Calc desktop session, updates a requested sheet cell, saves, and verifies the XLSX on disk. Expand the Office canvas setup tests to cover Desktop branding, Xpra package requirements, resize behavior, mobile canvas gating, and the new skill helpers.
Rework the Office canvas into the Desktop surface, with Markdown editing for text documents and official LibreOffice/Xpra sessions for DOCX, XLSX, and PPTX. The panel now presents Desktop-oriented actions, named header buttons, persistent session tabs, adaptive modal/canvas sizing, and fast client-side Xpra frame fitting during resize.
Stop auto-opening the canvas from document tool results, hide the canvas on mobile-width layouts, and emit resize lifecycle events so embedded desktop surfaces can pause expensive work while the user drags.
Remove the Collabora/WOPI runtime and route stack, including the old status APIs, proxy helpers, bootstrap extensions, and WOPI store tests. Add the Markdown-first document store, LibreOffice status/conversion helpers, LibreOfficeKit session bridge, and reusable Xpra virtual desktop gateway used by the new document runtime.
Update image and self-update bootstrap paths so existing containers can acquire the LibreOffice, XFCE, Xpra, and desktop-control dependencies through the normal install hooks instead of an ad hoc manual install.
Keep newly-created Office sessions out of orphan cleanup so in-flight iframe loads do not lose their WOPI tokens during mount refreshes.
Add regression coverage for the fresh-session grace window while preserving cleanup for older orphaned sessions.
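The grace-window rule can be sketched as follows. This is a minimal illustration and not the actual cleanup code: the `Session` shape, the `find_orphans` name, and the 30-second value are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Hypothetical grace period; the real value is not stated in this log.
GRACE_SECONDS = 30.0


@dataclass
class Session:
    session_id: str
    created_at: float  # Unix timestamp of session creation


def find_orphans(
    sessions: Iterable[Session],
    visible_ids: Set[str],
    now: Optional[float] = None,
    grace: float = GRACE_SECONDS,
) -> List[str]:
    """Return ids of sessions that are not visible and are older than the grace window."""
    current = time.time() if now is None else now
    return [
        s.session_id
        for s in sessions
        if s.session_id not in visible_ids and current - s.created_at >= grace
    ]
```

A session created a few seconds ago is skipped even when no tab references it yet, so an iframe that is still loading keeps its token; older unreferenced sessions are still collected.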
Decode byte chunks from the live Codex/ChatGPT account SSE stream before parsing events.
Preserve accumulated output_text deltas when the final response.completed object is present but has no extractable output content.
Update the OAuth tests to cover byte-delivered SSE chunks and empty completed responses.
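A rough sketch of the two fixes above, assuming the stream uses Responses-style events (`response.output_text.delta` and `response.completed`) separated by blank lines with `\n` line endings. The function names are hypothetical and the real bridge may differ.

```python
import codecs
import json
from typing import Iterable


def extract_text(response: dict) -> str:
    """Join any output_text blocks found in a completed response object."""
    parts = []
    for item in response.get("output") or []:
        for block in item.get("content") or []:
            if block.get("type") == "output_text":
                parts.append(block.get("text", ""))
    return "".join(parts)


def collect_output_text(chunks: Iterable[bytes]) -> str:
    """Decode byte chunks incrementally, parse SSE events, and return the final text."""
    # An incremental decoder tolerates multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    deltas = []
    completed_text = ""
    for chunk in chunks:
        buffer += decoder.decode(chunk)
        while "\n\n" in buffer:
            raw_event, buffer = buffer.split("\n\n", 1)
            data_lines = [
                line[5:].lstrip()
                for line in raw_event.splitlines()
                if line.startswith("data:")
            ]
            payload = "\n".join(data_lines)
            if not payload or payload == "[DONE]":
                continue
            event = json.loads(payload)
            if event.get("type") == "response.output_text.delta":
                deltas.append(event.get("delta", ""))
            elif event.get("type") == "response.completed":
                completed_text = extract_text(event.get("response") or {})
    # Fall back to the accumulated deltas when the completed object has no text.
    return completed_text or "".join(deltas)
```

The final `or` is the important part: an empty `response.completed` payload no longer discards text that already arrived as deltas.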
Expose sanitized active main and utility model metadata through the model override endpoint, then render those names in the chat model switcher even when no preset override is active. Keep the inline model names hidden on narrow screens and cover the behavior with a regression check.
Refresh model names after settings save
Refresh the active chat model switcher after _model_config settings are saved so changes to main and utility models appear immediately. Extend the model switcher regression check to cover the save-refresh hook.
Add a discovery hero for the OAuth plugin below the Telegram, Email, and WhatsApp integration cards. The banner uses the supplied OpenAI artwork, opens the OAuth plugin settings, and adapts its copy when a ChatGPT/Codex account is already connected.
Create a generic OAuth Connections plugin with Codex/ChatGPT Account as the first provider, using OpenAI's device-code flow to persist Codex-compatible account tokens.
Expose a loopback OpenAI-compatible wrapper for models, responses, and chat completions, and point LiteLLM at the container-local Agent Zero origin.
Add a dummy API-key extension and focused tests so the account-backed provider appears configured without requiring a user-entered key.
docs: add Codex plan OAuth callout
Highlight that Agent Zero can use an existing OpenAI Codex plan through the new OAuth flow.
Add the account-backed LLM plans image and surface the section from the README navigation, while pointing toward future Gemini CLI and Claude Code integrations.
Handle Codex account SSE chat chunks
Teach the Codex/ChatGPT account bridge to extract text from OpenAI-style SSE chat completion deltas and fall back to a normal output_text response when upstream only streams chunks.
Strip user-supplied stream kwargs before LiteLLM calls so Agent Zero owns streaming mode and custom parameters cannot pass stream twice.
Add targeted tests for streamed delta extraction and reconstructed responses.
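The kwarg stripping can be illustrated with a small helper. The name `build_call_kwargs` is hypothetical; the point is only that a caller writing `completion(stream=True, **user_kwargs)` would hit a `TypeError` for a duplicated keyword if `user_kwargs` also carried `stream`.

```python
from typing import Any, Dict

# Keys the caller is not allowed to set; Agent Zero decides the streaming mode.
RESERVED_KEYS = ("stream",)


def build_call_kwargs(user_kwargs: Dict[str, Any], *, stream: bool) -> Dict[str, Any]:
    """Return a copy of user_kwargs with reserved keys removed and stream set explicitly."""
    cleaned = {k: v for k, v in user_kwargs.items() if k not in RESERVED_KEYS}
    cleaned["stream"] = stream
    return cleaned
```

Because the helper returns a new dict, the user's stored configuration is left untouched and the resulting mapping can be unpacked into the call with exactly one `stream` value.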
update README.md with LLM plans mention
Keep browser sessions context-qualified so tabs from different chats can coexist without closing on context switches.
Create a real chat context when Browser launches from dashboard/no selected context, preserving agent handoff for that session.
Move chat context detail out of visible tab labels and into hover tooltips using only real chat names, with regression coverage for the updated lifecycle.
Replace the raw Collabora setup log with a simple Office setup progress state, redesign the Office dashboard around document cards with lightweight previews, and keep backend WOPI sessions aligned with visible Office tabs. Also preserve the restored Office canvas surface across window refreshes and add regression coverage for the new behavior.
Restart the canvas screencast after page-changing commands and remount viewport metrics when starting or resizing streams so canvas scrolling stays smooth across first mount, new tabs, and navigation.
Move Browser JS off Alpine global store lookups and onto direct store imports, tighten modal/canvas handoff state, and keep annotations aligned with accepted viewport frames.
Improve Browser tab close ergonomics, allow Chromium native error pages to render without blocking the UI, include right-canvas tab polish, and expand regression coverage for these paths.
Decode browser frames before display and only render frames that match the active viewer viewport, avoiding stretched stale screencast images during startup and resize.
Keep rejecting mismatched CDP screencast frames on the backend, extend canvas viewport settling, and cover the behavior with browser regression tests.
Include small browser panel CSS polish.
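The frame-matching rule described in this change could look roughly like the sketch below. The real check lives in the Browser viewer and backend; the `Viewport` type and the `width`/`height` frame keys are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Viewport:
    width: int
    height: int


def frame_matches(frame: dict, active: Viewport) -> bool:
    """A frame is renderable only if it was captured at the active viewport size."""
    return frame.get("width") == active.width and frame.get("height") == active.height


def latest_renderable(frames: List[dict], active: Viewport) -> Optional[dict]:
    """Pick the newest frame that matches the viewport, or None if all are stale."""
    for frame in reversed(frames):
        if frame_matches(frame, active):
            return frame
    return None
```

Returning `None` instead of the newest frame is what avoids the stretched image: the viewer keeps showing nothing (or the previous good frame) until a correctly sized frame arrives.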
Track open Office sessions as tabs so Docs, Sheets, and Slides can switch between files without losing the active editor context.
Add backend support to list and close WOPI sessions, revoking tokens and locks when a tab closes.
Show open-file metadata in the Office start view and keep the mobile canvas rail reachable after closing the canvas.
Teach document_artifact to create embedded spreadsheet charts through a native create_chart operation, including generic line/bar/column/pie/area/scatter support and stock-style OHLC charts.
Parse CSV, TSV, and Markdown table content into real XLSX cells during spreadsheet creation so chart ranges bind to typed data instead of row text blobs.
Update the Office artifact skill and tool prompt to prefer native chart creation over Python fallback, and cover the workflow with regression tests.
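To illustrate why typed cells matter for chart ranges, here is a minimal parsing sketch. It is not the document_artifact implementation: `parse_table` and `coerce` are hypothetical names, and the type coercion is deliberately simple (integers, floats, otherwise text).

```python
import csv
import io
from typing import List


def coerce(value: str):
    """Turn a raw cell string into None, int, float, or stripped text."""
    text = value.strip()
    if text == "":
        return None
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text


def parse_table(content: str) -> List[list]:
    """Parse CSV, TSV, or a Markdown pipe table into rows of typed cells."""
    lines = [line for line in content.strip().splitlines() if line.strip()]
    if not lines:
        return []
    if lines[0].lstrip().startswith("|"):
        rows = []
        for line in lines:
            cells = [c.strip() for c in line.strip().strip("|").split("|")]
            # Skip the Markdown separator row such as |---|---:|
            if all(c and set(c) <= set("-:") for c in cells):
                continue
            rows.append(cells)
    else:
        delimiter = "\t" if "\t" in lines[0] else ","
        rows = list(csv.reader(io.StringIO("\n".join(lines)), delimiter=delimiter))
    return [[coerce(cell) for cell in row] for row in rows]
```

Rows produced this way contain real numbers, so a chart series bound to a column range reads numeric values rather than strings.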
Register Time Travel on Agent Zero's existing /a0/usr watchdog and coalesce automatic snapshot triggers into a single pending commit window capped at one commit per workspace every 10 seconds.
Exclude top-level /a0/usr plugins and nested Git worktrees from root snapshots, preserve self-root Git workspace tracking, and cover the behavior with Time Travel tests.
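One way to picture the coalescing rule is a small bookkeeping class with an injected clock. This is an assumed design for illustration only; `SnapshotCoalescer` is a hypothetical name and the actual plugin may schedule commits differently.

```python
from typing import Dict, List, Set


class SnapshotCoalescer:
    """Collapse many triggers into at most one commit per workspace per interval."""

    def __init__(self, min_interval: float = 10.0) -> None:
        self.min_interval = min_interval
        self._last_commit: Dict[str, float] = {}
        self._pending: Set[str] = set()

    def trigger(self, workspace: str) -> None:
        """Record that a workspace changed; repeated triggers collapse into one."""
        self._pending.add(workspace)

    def flush(self, now: float) -> List[str]:
        """Return workspaces that may commit now and mark them as committed."""
        due = sorted(
            w
            for w in self._pending
            if now - self._last_commit.get(w, float("-inf")) >= self.min_interval
        )
        for workspace in due:
            self._last_commit[workspace] = now
            self._pending.discard(workspace)
        return due
```

A watchdog callback would call `trigger` on every file event and a periodic task would call `flush`; workspaces that changed again inside the 10-second window stay pending until the window has elapsed.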
Add read/edit support for Office document artifacts, including direct DOCX, XLSX, and PPTX updates with version history preservation. Inject compact active canvas metadata so agents can discover opened files without loading file contents. Move detailed usage guidance into the office-artifacts skill and keep the always-on tool prompt lean to avoid context bloat.
Allow the Browser surface to create and select a chat context when opened without an active context.
Reuse an in-flight context creation promise so repeated startup paths do not race, and update commands/viewer connection to ensure a context before calling browser websocket APIs.
Add a browser regression guard for the no-context startup path.
Wait for the right-canvas browser surface to finish its opening transition before using its dimensions as the Playwright viewport.
Measure raw stage dimensions for stability, then apply the existing clamped viewport values so initial screencasts do not render into a stretched canvas.
Add a browser regression guard for the raw viewport settle path.
Require explicit artifact, file, canvas, or format cues before turning response text into an Office artifact, while still allowing standalone deliverable-shaped drafts to open in the canvas. Add a same-turn guard so the response affordance does not duplicate documents already created with document_artifact, plus regression coverage for noisy long-document cases.
Add a clearer onboarding progression for Main Model, Utility Model, and ready states with streamlined copy and calmer calls to action.
Rework email integration setup around provider-first selection, guided defaults, and cleaner account fields.
Restyle Settings and standard modals around a streamlined left-rail layout, clearer section hierarchy, advanced settings disclosures, and stronger update states.
Add persistent update visibility with quieter once-daily update notifications, plus Remote Link and Space Agent actions in the canvas rail. Refresh the tunnel experience as a normal Remote Link modal with clearer copy, QR/mobile affordances, and safer state handling.
Add the _time_travel core plugin with Agent Zero-owned shadow Git snapshots, history/diff/preview/travel/revert APIs, capture hooks, and canvas plus floating window UI surfaces for /a0/usr workspaces.
Wire generic file-browser mutation hooks for UI edits, update modal backdrop handling, remove the legacy _diff_viewer plugin, and replace Diff Viewer tests with focused Time Travel coverage.
Inspired by Space Agent :-)
Move the right canvas rail into a centered floating socket in both open and closed states so chat space is preserved and the canvas content keeps the full panel width.
Add a dedicated dock-style show/hide control, make surface buttons select surfaces without hiding the canvas, remove the header close button, and align the diff viewer header padding with shared spacing tokens.
Update right-canvas.css
Add Codex-inspired annotation UI to the built-in Browser surfaces, including the Annotate toggle, Cmd/Ctrl+. shortcut, selection overlay, inline comments, and batch Draft to chat / Send now actions.
Wire browser_viewer_annotation through the WebSocket and runtime layers, and expose safe DOM metadata extraction for clicked elements and selected areas without leaking password/value data.
Expand regression coverage for the Browser UI, annotation dispatch, runtime helper exposure, prompt formatting, and WebUI extension surface harness behavior.
Add the core _diff_viewer plugin for viewing staged, unstaged, and untracked working-tree changes in the right canvas and window modal.
Include context-aware workspace resolution, safe read-only Git collection, zero-line .gitkeep filtering, unified diff rendering, and focused diff collection tests.
Persist the active agent profile with each chat context and add a context-scoped endpoint for switching profiles without mutating global settings. Update the WebUI selector and docs to treat settings as the default for new chats, and expose the switch through the A0 connector plugin.
Add Browser settings for the default starting page and tool-result autofocus, and wire them through config, APIs, runtime opens, and the settings UI.
Resolve Chrome extension __MSG_* manifest labels from locale metadata so installed extensions show readable names. Stabilize Browser viewport negotiation across canvas and modal surfaces by clearing stale frames, waiting for stable surface dimensions, and forcing sync after dock transitions. Move Browser loading/error state into a thin bottom status bar so it no longer overlays the page viewport.
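For the `__MSG_*` label resolution, a minimal sketch under the standard Chrome extension layout (`manifest.json` plus `_locales/<locale>/messages.json`) might look like this. The function name `resolve_label` is hypothetical, and the sketch only consults `default_locale`.

```python
import json
import re
from pathlib import Path

# Matches manifest values such as "__MSG_extName__".
MSG_PATTERN = re.compile(r"^__MSG_(\w+)__$")


def resolve_label(extension_dir: Path, field: str = "name") -> str:
    """Return a readable manifest label, resolving __MSG_*__ placeholders if possible."""
    manifest = json.loads((extension_dir / "manifest.json").read_text(encoding="utf-8"))
    raw = str(manifest.get(field, ""))
    match = MSG_PATTERN.match(raw)
    if not match:
        return raw
    locale = manifest.get("default_locale", "en")
    messages_path = extension_dir / "_locales" / locale / "messages.json"
    if not messages_path.is_file():
        return raw  # Leave the placeholder rather than guess a name.
    messages = json.loads(messages_path.read_text(encoding="utf-8"))
    wanted = match.group(1).lower()
    # Message names are compared case-insensitively here.
    for key, entry in messages.items():
        if key.lower() == wanted and isinstance(entry, dict):
            return str(entry.get("message", raw))
    return raw
```

When the locale file or the message key is missing, the raw placeholder is returned unchanged so the UI can still show something stable.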
- Auto-open Office and Browser canvas surfaces from fresh tool results, including history/result messages.
- Preserve Browser target IDs when focusing a canvas session from tool output.
- Convert substantial response-style artifacts into Office documents at runtime, without relying only on prompt compliance.
- Attach Office artifact metadata to the completed response log so the canvas opens without leaving a dangling Processing group.
- Polish Office UX by removing the inactive version-history action, showing only the healthy dot, and improving Collabora blank-load recovery with browser state cleanup.
- Deduplicate auto-open events and ignore stale results.
Extend Browser into a reusable panel that can run in either the Universal Canvas or the floating modal. Add canvas registration, dock/undock behavior, and keep the existing modal path working as a fallback.
Stabilize tab switching with viewer tokens and stale-frame rejection, prevent command snapshots from crossing active tabs, and keep tab changes responsive.
Improve canvas navigation and scrolling by making screencast polling non-blocking and removing page-settle waits from wheel input, so the visible frame updates promptly without stretch/catch-up artifacts.
Polish Browser busy feedback with a spinner-only status affordance to avoid misleading “updating browser” copy.
Replace the Browser viewer’s screenshot polling with CDP screencast streaming for much smoother navigation. The runtime now starts/stops CDP screencasts cleanly, acknowledges frames, drops stale frames, and keeps the WebSocket payload compatible with the existing viewer.
Also fix modal viewport sizing by sending the initial stage dimensions on subscribe, applying CDP emulation sizing before the first frame, avoiding image stretching, and increasing screencast JPEG quality to 92. Add regression coverage for the screencast path, frame ack/drop behavior, viewport sizing, and UI rendering assumptions.
-- Still needs thorough performance audit and optimization --
Refine the Browser modal UI with more native-feeling tabs, consistent chrome controls, right-side tab close buttons, and a cleaner extension dropdown. Move the Browser LLM preset into the dropdown with the active Main Model summary, simplify extension settings, remove the global extension enable switch and legacy extension root behavior, and add per-extension enable toggles.
Also update the Chrome extension install/review flow with contextual warning copy, “Scan with A0”, cleaner labels, hidden empty extension state, and regression coverage for the new Browser UX.
- Always launch Browser with full Playwright Chromium instead of switching between headless shell and extension mode
- Cache Chromium under /a0/usr/plugins/_browser/playwright with legacy lookup for existing installs
- Store installed Browser extensions under /a0/usr/plugins/_browser/extensions with legacy extension-root compatibility
- Show clearer first-run Chromium install messaging and extend the initial Browser timeout
- Fix Browser spinner animation for startup and extension install states
- Update Docker Playwright install script and regression coverage
- Download Chrome Web Store extensions using the detected Chrome prodversion instead of a stale hardcoded version
- Update extension settings copy to reflect Chrome Web Store URL support
- Serialize Browser persistent-context startup and clean stale Chromium profile singleton locks
- Increase Browser viewer subscribe timeout for extension-enabled cold starts
- Add regressions for Web Store download URL handling, slow viewer startup, and stale profile lock cleanup
Introduce the new built-in Browser plugin for Agent Zero, replacing the legacy
browser-use-based browser agent with a direct Playwright-powered browser tool,
live WebUI viewer, browser session controls, status APIs, configuration, and
extension-management support.
Add browser-specific modal behavior so the browser can run as a floating,
resizable, no-backdrop window, including modal focus, toggle, and idempotent
open helpers for richer WebUI surfaces.
Remove the old `_browser_agent` core plugin and the `browser-use` dependency,
then clean up stale browser-model wiring and references across agent code,
model configuration docs, setup guides, troubleshooting docs, skills, and
Agent Zero knowledge.
Update regression and WebUI extension-surface coverage for the new browser
architecture and modal behavior.
The legacy browser-use implementation has been extracted from core so it can
continue separately as a community plugin published through the A0 Plugin Index for any users or professionals who were relying on it in their workflows.
Add post-action settle/fresh-capture handling for computer_use_remote, include capture ids and coordinate-space summaries in screenshot attachments, and tighten prompt guidance so agents use the latest capture without assuming semantic/window targeting.
Update the WebSocket disconnect handler signature to accept the disconnect
reason now passed by python-socketio.
Agent Zero does not currently use the reason value, but keeping the parameter
matches the documented Socket.IO callback shape and avoids relying on the
library's legacy one-argument handler fallback.
python-socketio>=5.14.2 documents server disconnect handlers as receiving (sid, reason):
https://python-socketio.readthedocs.io/en/stable/server.html#connect-and-disconnect-events
The 5.14.2 source also passes that reason into the disconnect event. It still has a legacy fallback that retries old one-argument handlers, so omitting the reason parameter would probably work today, but only by leaning on that compatibility behavior.
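For reference, the two-argument handler shape can be sketched without the library itself. The `connected` set and the `handle_disconnect` name are illustrative only; how the handler is registered with the python-socketio server is left to the actual codebase.

```python
import logging
from typing import Set

logger = logging.getLogger("websocket")

# Illustrative registry of connected session ids.
connected: Set[str] = set()


def handle_disconnect(sid: str, reason: str) -> None:
    """Disconnect callback taking (sid, reason), the shape described above.

    The reason is accepted so the signature matches what the library passes;
    it is only logged here and does not change behavior.
    """
    connected.discard(sid)
    logger.debug("client %s disconnected (reason=%s)", sid, reason)
```

Declaring `reason` explicitly means the handler is called directly with both arguments instead of relying on the one-argument retry path.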