From a174072e67ded2bc4128f8a0da46d042ae042a7d Mon Sep 17 00:00:00 2001
From: Alessandro <155005371+3clyp50@users.noreply.github.com>
Date: Tue, 5 May 2026 17:18:09 +0200
Subject: [PATCH] Generalize Desktop CLI agent guidance

Refine the Linux Desktop skill so agents distinguish shell prompts from
target CLI prompts before sending natural-language input. Use a generic
nested CLI-agent launch pattern with TARGET_CLI and optional FALLBACK_CMD
instead of Codex-specific instructions, and add canvas setup assertions to
keep the guidance generic.
---
 plugins/_office/skills/linux-desktop/SKILL.md | 81 ++++++++++++-------
 tests/test_office_canvas_setup.py             |  6 ++
 2 files changed, 59 insertions(+), 28 deletions(-)

diff --git a/plugins/_office/skills/linux-desktop/SKILL.md b/plugins/_office/skills/linux-desktop/SKILL.md
index 2d69b4a4b..1aec0834e 100644
--- a/plugins/_office/skills/linux-desktop/SKILL.md
+++ b/plugins/_office/skills/linux-desktop/SKILL.md
@@ -38,23 +38,24 @@ Keep these standing rules:
 2. Treat ODF as first-class for LibreOffice office work: ODT in Writer, ODS in Calc, ODP in Impress. Use DOCX/XLSX/PPTX only for explicit Microsoft compatibility.
 3. Use the Desktop only when the user asks for the Desktop, a GUI app, binary Office visual work, or visual confirmation.
 4. Never open the Desktop/canvas automatically from a tool result if the user has not opened it. Offer the explicit Open in canvas action instead.
-5. Launch common apps from the Desktop icons, the header buttons, or `scripts/desktopctl.sh`.
+5. Launch common apps from the Desktop icons, the header buttons, or `/a0/plugins/_office/skills/linux-desktop/scripts/desktopctl.sh`.
 6. Use the external Agent Zero Browser for web browsing. Do not launch an operating-system browser in this version.
 7. Verify GUI work by observing the desktop state, checking window titles, and saving the file before reporting success. If exact terminal text matters, load or inspect the screenshot path returned by the final observation, not a screenshot captured before the text appeared.
 
 ## Control Flow
 
-Use the helper script when the Desktop is already open and you need reliable app launches, clicks, keystrokes, or window checks from the agent shell:
+Use the helper script when the Desktop is already open and you need reliable app launches, clicks, keystrokes, or window checks from the agent shell. In the live Agent Zero runtime, prefer the absolute path so the command works from any current directory:
 
 ```bash
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh check
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh state --json
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh observe --json --screenshot
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh launch calc
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh wait-window LibreOffice
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh windows LibreOffice
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh focus LibreOffice
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh key ctrl+s
+DESKTOP=/a0/plugins/_office/skills/linux-desktop/scripts/desktopctl.sh
+$DESKTOP check
+$DESKTOP state --json
+$DESKTOP observe --json --screenshot
+$DESKTOP launch calc
+$DESKTOP wait-window LibreOffice
+$DESKTOP windows LibreOffice
+$DESKTOP focus LibreOffice
+$DESKTOP key ctrl+s
 ```
 
 The script targets the persistent `agent-zero-desktop` X display, sets `DISPLAY`, `XAUTHORITY`, and `HOME` to the XFCE profile, then uses `xdotool` for input. Startup normally prepares this session. If `check` fails during explicit Desktop work, report that the Desktop runtime is not ready instead of installing packages ad hoc.
@@ -62,21 +63,23 @@ The script targets the persistent `agent-zero-desktop` X display, sets `DISPLAY`
 For direct app launches without coordinates:
 
 ```bash
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh launch writer
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh launch calc
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh launch impress
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh launch terminal
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh launch settings
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh open-path /a0/usr/workdir
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh focus "LibreOffice"
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh paste-text "Text to insert"
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh key ctrl+s
+DESKTOP=/a0/plugins/_office/skills/linux-desktop/scripts/desktopctl.sh
+$DESKTOP launch writer
+$DESKTOP launch calc
+$DESKTOP launch impress
+$DESKTOP launch terminal
+$DESKTOP launch settings
+$DESKTOP open-path /a0/usr/workdir
+$DESKTOP focus "LibreOffice"
+$DESKTOP paste-text "Text to insert"
+$DESKTOP key ctrl+s
 ```
 
 For live spreadsheet coworking, use the Calc helper instead of hand-written UNO snippets:
 
 ```bash
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh calc-set-cell /a0/usr/workdir/example.xlsx Sheet1 B2 "Cowork verified live"
+DESKTOP=/a0/plugins/_office/skills/linux-desktop/scripts/desktopctl.sh
+$DESKTOP calc-set-cell /a0/usr/workdir/example.xlsx Sheet1 B2 "Cowork verified live"
 ```
 
 This opens the workbook in the visible Desktop Calc session if needed, changes the cell through LibreOffice, saves the workbook, and verifies the `.xlsx` on disk. Because the edit happens through the running LibreOffice session, the user can see the sheet update without refreshing the Desktop surface.
@@ -84,14 +87,15 @@ This opens the workbook in the visible Desktop Calc session if needed, changes t
 For coordinate actions, clicks are explicitly last resort. First try `launch`, `open-path`, `wait-window`, `focus`, `key`, `paste-text`, `save`, or an app-native helper. If a coordinate action is still necessary, base it on a fresh screenshot observation and verify immediately afterward:
 
 ```bash
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh observe --json --screenshot
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh click 120 180
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh dblclick 120 180
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh right-click 120 180
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh drag 120 180 400 180
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh scroll down 3
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh type "Text to enter"
-plugins/_office/skills/linux-desktop/scripts/desktopctl.sh observe --json
+DESKTOP=/a0/plugins/_office/skills/linux-desktop/scripts/desktopctl.sh
+$DESKTOP observe --json --screenshot
+$DESKTOP click 120 180
+$DESKTOP dblclick 120 180
+$DESKTOP right-click 120 180
+$DESKTOP drag 120 180 400 180
+$DESKTOP scroll down 3
+$DESKTOP type "Text to enter"
+$DESKTOP observe --json
 ```
 
 When browser automation is available, the higher-level QA flow is:
@@ -115,6 +119,27 @@ Terminal apps are visual state, not structured logs. When the task depends on ex
 
 For nested CLI agents, a successful proof requires both the input prompt and the nested agent's visible response in the final screenshot, or another deterministic saved transcript produced by the CLI itself.
 
+Guard the boundary between the shell and the target CLI carefully:
+
+- A shell prompt such as `root@...#` means the target CLI is not currently receiving chat input. Never paste natural-language text into that shell prompt unless it is deliberately quoted as an argument to a shell command.
+- If launching a CLI by name can fail, use a shell-safe fallback command and wait for the CLI's own prompt before sending the user's natural-language message.
+- After a launch failure such as `command not found`, do not continue by sending the chat message. Start the fallback CLI command or report the blocker.
+- A prompt like `>`, `Implement {feature}`, or a CLI-specific input box inside the terminal is different from a shell prompt. Send chat input with `paste-text "natural language"` followed by `key Return` only when such a prompt is visible.
+
+Example for a nested CLI-agent smoke test:
+
+```bash
+DESKTOP=/a0/plugins/_office/skills/linux-desktop/scripts/desktopctl.sh
+$DESKTOP focus "Terminal"
+$DESKTOP paste-text 'TARGET_CLI="example-cli-agent"; FALLBACK_CMD=""; if command -v "$TARGET_CLI" >/dev/null 2>&1; then "$TARGET_CLI"; elif [ -n "$FALLBACK_CMD" ]; then sh -lc "$FALLBACK_CMD"; else echo "CLI agent not found: $TARGET_CLI"; fi'
+$DESKTOP key Return
+$DESKTOP observe --json --screenshot
+# Verify the screenshot shows the target CLI prompt, not a shell prompt, before sending natural language:
+$DESKTOP paste-text 'Reply with exactly the requested smoke-test token.'
+$DESKTOP key Return
+$DESKTOP observe --json --screenshot
+```
+
 ## Desktop Locations
 
 The Desktop exposes stable folders for common user work:
diff --git a/tests/test_office_canvas_setup.py b/tests/test_office_canvas_setup.py
index 6cd40d42a..a43aa0544 100644
--- a/tests/test_office_canvas_setup.py
+++ b/tests/test_office_canvas_setup.py
@@ -365,6 +365,7 @@ def test_official_libreoffice_desktop_route_and_packages_are_declared():
     assert "/a0/usr/workdir" in linux_desktop_skill
     assert "/a0/usr/projects" in linux_desktop_skill
     assert "desktopctl.sh" in linux_desktop_skill
+    assert "/a0/plugins/_office/skills/linux-desktop/scripts/desktopctl.sh" in linux_desktop_skill
     assert "calc-set-cell" in linux_desktop_skill
     assert "Clicks are explicitly last resort" in linux_desktop_skill or "clicks are explicitly last resort" in linux_desktop_skill
     assert "fresh Desktop observation" in linux_desktop_skill
@@ -372,6 +373,11 @@ def test_official_libreoffice_desktop_route_and_packages_are_declared():
     assert "Terminal And CLI Agent Verification" in linux_desktop_skill
     assert "Do not report from an earlier screenshot path" in linux_desktop_skill
     assert "screenshot path returned by that final observation" in linux_desktop_skill
+    assert "Never paste natural-language text into that shell prompt" in linux_desktop_skill
+    assert "command not found" in linux_desktop_skill
+    assert "TARGET_CLI=\"example-cli-agent\"" in linux_desktop_skill
+    assert "FALLBACK_CMD" in linux_desktop_skill
+    assert "@openai/codex" not in linux_desktop_skill
     assert "xdotool" in linux_desktopctl
     assert "agent-zero-desktop" in linux_desktopctl
     assert "launch_app" in linux_desktopctl
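
The shell-versus-CLI prompt guard that this patch adds to the skill could also be written as a small check. Here is a minimal Python sketch of it. It assumes the visible terminal text is already available as a list of strings, for example from a transcript that the CLI saves itself. The function name `safe_to_send_chat`, the regular expressions, and the default `>` marker are hypothetical illustrations. They are not part of `desktopctl.sh`, the skill, or any CLI agent.

```python
import re

# Hypothetical heuristics for illustration only: real prompts vary by shell and
# by CLI agent, so these patterns are assumptions, not a specification.
SHELL_PROMPT = re.compile(r"^\S+@\S+.*[#$]$|^[#$]$")  # e.g. "root@host:/a0#"
LAUNCH_FAILURE = re.compile(r"command not found|CLI agent not found")


def safe_to_send_chat(
    visible_lines: list[str],
    cli_prompt_markers: tuple[str, ...] = (">",),
) -> bool:
    """Return True only if the last non-empty line looks like a target CLI prompt."""
    lines = [line.rstrip() for line in visible_lines if line.strip()]
    if not lines:
        return False  # Nothing visible yet: observe again before typing.
    last = lines[-1]
    if LAUNCH_FAILURE.search(last):
        return False  # Launch failed: start FALLBACK_CMD or report the blocker.
    if SHELL_PROMPT.match(last):
        return False  # Shell prompt: natural language would run as a command.
    # Treat the line as chat input only when it starts with a known CLI marker.
    return any(last.lstrip().startswith(marker) for marker in cli_prompt_markers)


if __name__ == "__main__":
    print(safe_to_send_chat(["root@abc123:/a0#"]))  # False
    print(safe_to_send_chat(["Welcome to example-cli-agent", "> "]))  # True
```

Bash also shows `> ` as its default continuation prompt (`PS2`). A `True` result therefore only suggests that the CLI is ready. The skill's rule still applies: confirm the target CLI prompt in a fresh screenshot before sending natural language.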