Add the explicit browser:screenshot action that writes JPEG/PNG files for vision_load, extend the agent-callable Browser input actions, and document the explicit vision workflow.
Add the browser-forms on-demand skill and regression coverage for dispatch, runtime screenshot files, ref point resolution, upload path normalization, prompt discoverability, and label-wrapped form controls surfaced by the chat-driven E2E.
Keep Browser modal activation passive when switching from Desktop by reusing existing Browser sessions instead of creating a blank tab on viewer subscribe.
Add a Focus mode control to the Browser modal header matching Desktop's fullscreen/restore behavior.
Cover the passive subscribe path and Browser modal focus button in regression tests.
Keep one Xpra Desktop iframe alive across canvas, modal, and keepalive hosts instead of unloading it during normal UI handoffs. Add intentional shutdown/restart state so explicit shutdown is treated as closed, not crashed.
Add the desktop_shutdown Office API path, backend system-desktop shutdown cleanup, and an XFCE panel Shutdown Desktop launcher that requires a second click before writing the shutdown request marker. Hide unsafe logout, lock, and switch-user affordances and cover the lifecycle with focused tests.
Add a desktop_state helper, expanded desktopctl observe-act-verify commands, backend desktop_state support, Extra prompt state, and Xpra bridge diagnostics for the built-in Linux Desktop.
Update the Linux Desktop skill so agents prefer structured/app-native/keyboard workflows, treat coordinate clicks as last resort, and verify terminal or CLI-agent work with fresh final screenshots. Cover the behavior with focused Office desktop state, canvas setup, and office_session tests.
Promote LibreOffice-native ODT, ODS, and ODP as first-class defaults for Writer, Spreadsheet, and Presentation while keeping OOXML as explicit compatibility formats.
Add ODF package generation, validation, read/edit support, and focused tests for Markdown, ODT, ODS, ODP, DOCX, XLSX, and PPTX artifact behavior.
Reduce automatic document response triggering so meta-discussions about generated files do not create artifacts, while explicit file and canvas requests still work through the intended Markdown editor or Desktop affordance.
Preserve the native A0 browser launcher, sync the live container, and validate the flow with real chats and Playwright.
Make the Office canvas mount passive so Xpra starts only when the Desktop surface is opened or an official Office document is created/opened.
Track Desktop host visibility to unload hidden frames, stop monitors, dedupe viewport resize work, and set Xpra offscreen mode according to HTTPS support. Add a near-future note for the tunnel memory footprint.
Show Office desktop startup progress
Display a loading message while the Agent Zero Desktop environment is starting or restarting, so the right-canvas Desktop button gives immediate feedback before Xpra finishes waking up.
Detect cached Playwright contexts that have already closed before reusing the browser runtime.
Clear stale browser pages, popup waiters, screencasts, and interaction state; stop the old Playwright instance; and restart cleanly on next use. Add regression coverage for stale context recovery and unexpected context close events.
Treat local Xpra GUI client packages as best-effort during Office runtime preparation so ARM64 codec dependency gaps do not surface as startup warnings when the browser-hosted Desktop is already usable.
Keep required Desktop Xpra packages strict, trim the ARM Docker fallback to the server/X11/html5 set, and add regression coverage for optional versus required xpra-codecs/libvpx9 failures.
Register the Agent Zero Browser as the Desktop URL handler, queue URL intents from the Xfce environment, and route them into Browser on the opposite canvas/modal surface. Also make floating Browser and Desktop modals pass outside clicks through while preserving interaction inside the modal window.
Use a shared mutable holder for the POSIX PTY master fd and invalidate it before close. This keeps EOF cleanup and TTYSession.close()/kill() idempotent and prevents closing an unrelated resource if the OS reuses the old fd number.
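A minimal sketch of the holder idea described above, not the actual implementation. The class name FdHolder and its methods are hypothetical; the point is only that the descriptor is invalidated before os.close() runs, so a second close is a no-op and can never hit a reused fd number.

```python
import os
import threading
from typing import Optional


class FdHolder:
    """Hypothetical shared holder for a PTY master fd (illustrative name)."""

    def __init__(self, fd: int) -> None:
        self._fd = fd
        self._lock = threading.Lock()

    def get(self) -> Optional[int]:
        # Return the live descriptor, or None once it has been invalidated.
        with self._lock:
            return self._fd if self._fd >= 0 else None

    def close(self) -> bool:
        # Invalidate first: later callers see "closed" and never touch a
        # descriptor number the OS may already have handed to someone else.
        with self._lock:
            fd, self._fd = self._fd, -1
        if fd < 0:
            return False  # already closed, nothing to do
        try:
            os.close(fd)
        except OSError:
            pass  # the fd was already gone; the holder is still invalidated
        return True
```

Because every owner (EOF cleanup, close(), kill()) shares the same holder, whichever path runs first performs the real close and the others return False.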
Detect closed or exited local TTY sessions before writing, convert invalid PTY write errors into retryable session failures, and reset/retry the terminal session once after send/read failures.
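The detect, convert, and retry-once flow can be sketched as follows. This is an illustration under assumptions: RetryableSessionError, write_checked, and send_with_one_retry are hypothetical names, the session object is assumed to expose a closed flag and a write() method, and the choice of EBADF/EIO as the "invalid PTY write" errors is a guess.

```python
import errno


class RetryableSessionError(Exception):
    """Hypothetical error meaning 'reset the session and try again'."""


def write_checked(session, data: bytes) -> int:
    # Refuse to write to a session that is already known to be closed.
    if session.closed:
        raise RetryableSessionError("session is closed")
    try:
        return session.write(data)
    except OSError as exc:
        # Assumption: EBADF/EIO indicate a PTY whose fd or child is gone.
        if exc.errno in (errno.EBADF, errno.EIO):
            raise RetryableSessionError(str(exc)) from exc
        raise


def send_with_one_retry(get_session, reset_session, data: bytes) -> int:
    try:
        return write_checked(get_session(), data)
    except RetryableSessionError:
        # Reset exactly once; a second failure propagates to the caller.
        reset_session()
        return write_checked(get_session(), data)
```

Limiting the retry to a single attempt keeps a persistently broken terminal from looping forever.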
Store and close POSIX PTY master descriptors when terminal sessions are closed or killed, and make local terminal session shutdown await the full TTY cleanup path. This prevents leaked /dev/ptmx descriptors from exhausting the process file descriptor limit.
Route Office canvas renames through the document store so dirty or missing-on-disk Markdown sessions can be materialized at the new path without hitting the generic workdir filesystem rename endpoint. Add regression coverage for missing draft materialization, dirty markdown rename, and the custom rename hook contract.
Add local XDG overrides for the xfce4-mail-reader.desktop and xfce4-web-browser.desktop application IDs shipped by the current desktop runtime, while keeping the older exo-* IDs covered for compatibility. Update the desktop profile tests so future changes assert both generations of launcher IDs.
Force-add curated snapshot paths so workspace .gitignore rules cannot break Time Travel snapshots, while preserving Time Travel's own exclusions for secrets and generated files.
Repair invalid shadow Git repositories by restoring HEAD when possible or quarantining and reinitializing unusable repos, and canonicalize workspace paths to avoid duplicate shadow histories for aliases.
Add regression coverage for ignored paths, corrupt shadow HEAD recovery, and canonical workspace identity.
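One way to get a canonical workspace identity is to resolve symlinks and relative segments before deriving the shadow repository key. The sketch below assumes a hash-derived key; the function name workspace_key and the 16-character SHA-256 prefix are illustrative, not taken from the change.

```python
import hashlib
import os


def workspace_key(path: str) -> str:
    # Resolve symlinks and "." / ".." so every alias maps to one real path.
    canonical = os.path.realpath(os.path.abspath(path))
    # Hypothetical key format: a short, stable digest of the canonical path.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

With this, a symlinked workspace and its target produce the same key, so they share one shadow history instead of two.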
Remove dynamically loaded skills when they are deactivated from the Skills selector. Treat skill names and paths as aliases so scoped defaults, chat overrides, and loaded-skill state resolve consistently.
Allow users to disconnect their OpenAI account by clearing stored ChatGPT OAuth tokens while preserving unrelated auth data.
Fetch and normalize Codex usage windows, then show remaining percentage and reset timing in the OAuth settings UI.
Add focused tests for usage parsing and disconnect cleanup.
Show hidden files by default in Thunar-backed Desktop sessions while preserving existing file manager profile settings.
Hide the default Xfce Mail Reader and Web Browser helper entries from the Applications menu through local XDG overrides, and cover the generated Desktop profile artifacts with targeted tests.
Detect openable Chrome extension UI pages from manifests and expose resolved chrome-extension URLs to the Browser UI.
Render an Open button in the compact Browser extension dropdown and cover manifest UI metadata with regression tests.
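As a rough sketch of manifest-based detection: the standard Chrome manifest fields options_page, options_ui.page, and action/browser_action default_popup name pages that can be opened directly. Which fields the change actually inspects is not stated, so the selection below and the function name openable_ui_pages are assumptions.

```python
from typing import List


def openable_ui_pages(manifest: dict, extension_id: str) -> List[str]:
    # Candidate UI pages, in a hypothetical order of preference.
    candidates = [
        manifest.get("options_page"),
        (manifest.get("options_ui") or {}).get("page"),
        (manifest.get("action") or {}).get("default_popup"),
        (manifest.get("browser_action") or {}).get("default_popup"),
    ]
    urls: List[str] = []
    for page in candidates:
        if isinstance(page, str) and page:
            url = f"chrome-extension://{extension_id}/{page.lstrip('/')}"
            if url not in urls:  # skip duplicates, keep first-seen order
                urls.append(url)
    return urls
```

An extension with no such fields yields an empty list, which is when the UI would presumably omit the Open button.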
Install a curated README into generated Agent Zero Desktop sessions so the Xfce workspace explains the habitat concept, credits the open-source foundations and Jan Tomášek, and gives users Terminal commands for popular agent CLIs.
Keep the README as an _office plugin asset and copy it into the Desktop profile during launcher preparation.
Add a pencil action beside Save that reuses the existing file browser rename modal for open Office documents. Preserve document metadata after filesystem renames, retarget active LibreOffice desktop sessions to the new path, and cover the rename flow in Office regression tests.
Route DOCX, spreadsheets, and presentations exclusively through the Xpra desktop LibreOffice session. Keep the custom canvas path focused on Markdown source editing, remove the old dashboard/preview/native LibreOfficeKit code, and update tests and runtime package declarations to match the new Office surface.
Route binary Office documents through the persistent Desktop surface while keeping Markdown in the custom tabbed editor.
Harden Xpra clipboard bridging and explicit clipboard flags so host paste can reach the desktop session.
Align XFCE and LibreOffice profile paths with Agent Zero locations: downloads for wallpapers, configured workdir for default saves and the Workdir shortcut, and trusted metadata for generated launchers.
Make the embedded Xpra Desktop use the browser cursor as the only visible cursor by suppressing the shadow pointer overlay and pointer-position renderer without blocking pointer input.
Prefer the active Office host iframe when choosing the Desktop frame, then force resize recovery during modal-to-canvas docking so the Xpra desktop, window, and canvas refill the canvas after handoff.
Make the Desktop iframe explicitly focusable and re-arm Xpra keyboard capture on load and click so typed input reaches the remote session reliably.
Add regression assertions for the Xpra keyboard bridge contract.
Bridge copy, cut, paste, and common edit shortcuts from the Browser modal and canvas screenshot surface into the Playwright runtime while preserving native clipboard behavior for Agent Zero UI fields.
Add websocket and runtime clipboard handling with regression coverage for frontend shortcut routing, paste fallback, and viewer input dispatch.
Add a Time Travel entry directly under Files in the sidebar dropdown and route it through the existing modal. Stop Time Travel from registering or mounting a right-canvas surface, and keep modal refresh tied to the modal state.
Expose extension deletion from the Browser internal settings page and keep the compact Browser dropdown focused on quick enable/install actions.
Add a guarded uninstall API that only deletes Browser-managed extension folders, updates enabled extension paths, refreshes the settings UI, and covers managed versus external paths with regression tests.
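The "managed versus external" guard amounts to a path-containment check. A minimal sketch, assuming the managed extensions live under a single root directory; the function name is_managed_extension_path is hypothetical.

```python
from pathlib import Path


def is_managed_extension_path(target: str, managed_root: str) -> bool:
    # Resolve both paths so ".." segments and symlinks cannot escape the root.
    root = Path(managed_root).resolve()
    candidate = Path(target).resolve()
    # Must be strictly inside the root; the root itself is never deletable.
    return candidate != root and root in candidate.parents
```

Comparing resolved path components (rather than string prefixes) avoids treating a sibling such as managed-evil as part of managed.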
Fix annotation panel stacking so draft popovers render above the annotations recap.
Allow the annotations recap tray to float within the browser stage by dragging its header, with bounded positioning and cleanup when annotations are cleared or the browser surface unmounts.
Render Browser tool Screenshot KVPs as clickable live thumbnails that open the Browser canvas while preserving the existing lower-row Browser action.
Add a lightweight websocket snapshot endpoint for existing browser runtimes and keep preview frame memory bounded with revocable object URLs.
Sync document_artifact results into an already-open Office canvas without auto-opening a closed canvas.
Generate PPTX artifacts through the Office plugin writer so PowerPoint decks open in Impress with visible multi-slide content.
Add focused regression coverage for canvas sync behavior and PPTX slide creation.
Keep Office document artifacts from auto-opening the canvas while adding plugin-owned Download and Open in canvas message actions. Add format-specific skills for Markdown, Word, Excel, and presentation workflows, and clarify the startup-warmed Desktop runtime remains visually opt-in.
Cover the Excel method=create path, Markdown-first/no-auto-open policies, response affordance copy, document action buttons, and Desktop bootstrap with focused regressions.
Add an explicit close button to the right canvas toolbar, next to the undock control, and cover its label, handler, and ordering in the canvas regression test.
Treat document_artifact tool_args.method as an action alias so calls like method=create with format=xlsx create workbooks instead of falling back to LibreOffice status. Add regression coverage for the exact XLSX creation shape.
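The alias behavior can be sketched as a small resolver. The name resolve_action and the precedence of action over method are assumptions; the status fallback follows the description above.

```python
def resolve_action(tool_args: dict, default: str = "status") -> str:
    # Assumed precedence: an explicit "action" wins, "method" is the alias.
    for key in ("action", "method"):
        value = tool_args.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip().lower()
    # Neither key is usable: fall back to the status report.
    return default
```

With this, a call shaped like {"method": "create", "format": "xlsx"} resolves to create rather than to the fallback.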
Make Markdown the first-class document workflow in the office skills and state the Desktop/LibreOffice path as opt-in for GUI or binary Office work.
Remove passive Browser canvas auto-opening from tool results; Browser result handling now only syncs an already-open Browser canvas, while explicit user buttons can still open the canvas or modal. Add regression coverage for the no-auto-open policy and Markdown-first skill guidance.
Make Desktop canvas and modal handoffs resize the live Xpra viewport deterministically by syncing the visible frame, making backend resize requests authoritative, unloading hidden iframe clients, and guarding Xpra HTML menu callbacks when the menu is disabled. Also forward wheel events from the embedded Xpra canvas so mouse and trackpad scrolling reach the Linux desktop session.
Teach the browser page-content helper to traverse open shadow roots and assigned slot nodes when collecting text, rendering list/inline children, and resolving selectors. This lets Agent Zero inspect modern component-heavy pages more accurately without depending only on light-DOM textContent.
Bump the injected helper version so existing browser contexts can refresh to the new DOM traversal behavior.
Add a linux-desktop skill that teaches Agent Zero how to operate the persistent XFCE/Xpra desktop through desktopctl.sh, including app launch, focus, click, typing, and stable folder entry points for Workdir, Projects, Skills, Agents, and Downloads.
Add a Calc cell-edit helper that opens a workbook through the visible LibreOffice Calc desktop session, updates a requested sheet cell, saves, and verifies the XLSX on disk. Expand the Office canvas setup tests to cover Desktop branding, Xpra package requirements, resize behavior, mobile canvas gating, and the new skill helpers.
Rework the Office canvas into the Desktop surface, with Markdown editing for text documents and official LibreOffice/Xpra sessions for DOCX, XLSX, and PPTX. The panel now presents Desktop-oriented actions, named header buttons, persistent session tabs, adaptive modal/canvas sizing, and fast client-side Xpra frame fitting during resize.
Stop auto-opening the canvas from document tool results, hide the canvas on mobile-width layouts, and emit resize lifecycle events so embedded desktop surfaces can pause expensive work while the user drags.
Remove the Collabora/WOPI runtime and route stack, including the old status APIs, proxy helpers, bootstrap extensions, and WOPI store tests. Add the Markdown-first document store, LibreOffice status/conversion helpers, LibreOfficeKit session bridge, and reusable Xpra virtual desktop gateway used by the new document runtime.
Update image and self-update bootstrap paths so existing containers can acquire the LibreOffice, XFCE, Xpra, and desktop-control dependencies through the normal install hooks instead of an ad hoc manual install.
- Auto-register tabs opened by site (window.open, target=_blank,
ctrl-click) via context.on("page",...) with registry lock and
closing-state guard.
- Modifier-key click via Playwright trusted input: keyboard.down/up
around mouse.click for coord-based path; locator.click(modifiers=...)
selector fallback for off-screen / hidden elements. Chrome focus
rule: ctrl/meta-click keeps focus on origin tab; override via
focus_popup arg.
- key_chord action: presses keys in order, releases in reverse;
guarantees release on exception. Supports Ctrl+A/C/V style chords.
- Mouse modifiers are click-only (raises ValueError for non-click events).
- list(include_content=true) bulk read across all tabs in parallel
via asyncio.gather (was sequential).
- multi action: batched sub-calls. Different browser_id groups run
concurrently; same browser_id sequentially. Returns array of
{ok, result|error} matching input order. Lets the agent fan out
reads or coordinated mutations across tabs in one tool call.
- Cross-tab work no longer steals viewer focus.
last_interacted_browser_id promotes only on open / set_active /
same-tab work / Chrome popup rule. WebUI auto-open allowlist
tightened to open|navigate|set_active so background actions don't
drag the viewer.
- New set_active action for explicit focus switch.
- JS helper bumps VERSION to force re-injection on cached pages;
exports boundingBoxFor returning {x,y,w,h,selector} for the
trusted-input modifier-click paths.
Backwards-compatible: every new arg is optional with safe defaults.
No removed actions; existing call shapes preserved.
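The scheduling rule of the multi action (different browser_id groups concurrently, same browser_id sequentially, results in input order) can be sketched like this. It is an illustration, not the shipped code: run_multi and the handler callback are hypothetical, and each sub-call is assumed to be a dict carrying a browser_id key.

```python
import asyncio
from collections import defaultdict


async def run_multi(calls, handler):
    # One result slot per sub-call so the output order matches the input.
    results = [None] * len(calls)

    # Group sub-calls by browser_id, preserving order within each group.
    groups = defaultdict(list)
    for index, call in enumerate(calls):
        groups[call.get("browser_id")].append((index, call))

    async def run_group(items):
        # Same browser_id: strictly one after another.
        for index, call in items:
            try:
                results[index] = {"ok": True, "result": await handler(call)}
            except Exception as exc:
                # A failing sub-call is reported, not raised, so the
                # remaining sub-calls still run.
                results[index] = {"ok": False, "error": str(exc)}

    # Different browser_id groups: run concurrently.
    await asyncio.gather(*(run_group(items) for items in groups.values()))
    return results
```

Catching exceptions per sub-call is what allows one tool call to return a full {ok, result|error} array even when some tabs fail.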
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keep newly-created Office sessions out of orphan cleanup so in-flight iframe loads do not lose their WOPI tokens during mount refreshes.
Add regression coverage for the fresh-session grace window while preserving cleanup for older orphaned sessions.
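A grace window for orphan cleanup can be sketched as an age check in front of the usual "not mounted" test. The function name, the session dict shape (id, created_at), and the 30-second default are all assumptions for illustration.

```python
import time


def orphans_to_clean(sessions, mounted_ids, grace_seconds=30.0, now=None):
    # Timestamps are assumed to come from time.monotonic().
    now = time.monotonic() if now is None else now
    return [
        s["id"]
        for s in sessions
        # Orphaned: no mounted iframe currently references the session.
        if s["id"] not in mounted_ids
        # Old enough: fresh sessions may still be loading their iframe.
        and now - s["created_at"] >= grace_seconds
    ]
```

Sessions younger than the window survive a mount refresh even when no iframe references them yet, while older orphans are still collected.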
Decode byte chunks from the live Codex/ChatGPT account SSE stream before parsing events.
Preserve accumulated output_text deltas when the final response.completed object is present but has no extractable output content.
Update the OAuth tests to cover byte-delivered SSE chunks and empty completed responses.
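The two fixes above can be illustrated together: decode bytes incrementally so a multi-byte character split across chunks survives, and fall back to the accumulated deltas when the completed response carries no text. This is a simplified sketch, assuming LF-delimited events with one data line each and event types response.output_text.delta and response.completed; parse_sse_bytes and extract_text are hypothetical names.

```python
import codecs
import json


def extract_text(response: dict) -> str:
    # Collect output_text parts from the completed response, if any.
    parts = []
    for item in response.get("output") or []:
        for content in item.get("content") or []:
            if content.get("type") == "output_text":
                parts.append(content.get("text", ""))
    return "".join(parts)


def parse_sse_bytes(chunks) -> str:
    # Incremental decoding keeps partial UTF-8 sequences until completed.
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    deltas = []
    completed_text = ""
    for chunk in chunks:
        buffer += decoder.decode(chunk)
        while "\n\n" in buffer:
            raw_event, buffer = buffer.split("\n\n", 1)
            for line in raw_event.splitlines():
                if not line.startswith("data:"):
                    continue
                payload = line[5:].strip()
                if not payload or payload == "[DONE]":
                    continue
                event = json.loads(payload)
                kind = event.get("type")
                if kind == "response.output_text.delta":
                    deltas.append(event.get("delta", ""))
                elif kind == "response.completed":
                    completed_text = extract_text(event.get("response") or {})
    # Prefer the final text; otherwise keep what the deltas accumulated.
    return completed_text or "".join(deltas)
```

Calling bytes.decode() on each chunk separately would raise on a split character, which is why the sketch uses an incremental decoder.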