spawn

vrr/spawn

mirror of https://github.com/OpenRouterTeam/spawn.git synced 2026-05-10 12:20:07 +00:00

Author	SHA1	Message	Date
Ahmed Abushagur	9e2f84adf0	fix: use native OpenRouter model_provider for Codex CLI config (#1490 ) Codex CLI's OPENAI_BASE_URL env var approach causes "Invalid Responses API request" errors because OpenRouter doesn't fully support the Responses API wire format via base URL override. Switch all 8 codex scripts to use ~/.codex/config.toml with model_provider="openrouter" which uses the native OpenRouter integration. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 18:47:40 -05:00
A	b29cf4a75d	fix: sync cloud READMEs with current agent list (#1486 ) READMEs across all 8 clouds still referenced 5 removed agents (NanoClaw, Cline, gptme, Plandex, Continue) and were missing ZeroClaw. Users following these docs got 404 errors. Agent: ux-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-19 17:47:57 -05:00
L	a67d83ed38	feat: reorder agents and remove NanoClaw (#1477 ) * feat: add ZeroClaw agent (14.9k stars, native OpenRouter support) Add ZeroClaw — a Rust-based autonomous AI assistant framework by Harvard/MIT/Sundai.Club communities — across all 8 clouds. Scripts: local, hetzner, digitalocean, fly, aws, gcp, daytona, sprite Install: bootstrap.sh with --install-rust + --install-system-deps Config: zeroclaw onboard --provider openrouter (via agent_configure) Env: OPENROUTER_API_KEY + ZEROCLAW_PROVIDER=openrouter (native support) Launch: zeroclaw agent Note: ZeroClaw compiles from Rust source (~5-10 min build time). A build-time warning is shown to set expectations. Also update test/mock-curl-script.sh to stub zeroclaw install URLs and add zeroclaw to mock agent binaries in test/mock.sh. Bump CLI version 0.5.8 → 0.5.9. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * feat: reorder agents and remove NanoClaw New agent order: claude → openclaw → zeroclaw → codex → opencode → kilocode - Remove NanoClaw (8 scripts + manifest entry + matrix entries + README row) - Reorder manifest.json agents section to match new order - Reorder matrix entries by cloud (local/hetzner/fly/aws/daytona/digitalocean/gcp/sprite) with agents in new order within each cloud block - Update README matrix table row order - Update test/mock.sh mock agent binary list to match - Bump CLI version 0.5.9 → 0.5.10 Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-02-19 11:39:03 -08:00
L	f7458952b0	feat: remove Cline, gptme, Plandex, and Continue agents (#1475 ) Delete 32 agent scripts ({cloud}/{cline,gptme,plandex,continue}.sh across 8 clouds), remove the 4 agents from manifest.json with all their matrix entries, update README matrix rows, remove stale mock agent binaries and plandex.ai URL patterns from test harness, update CLI help examples to use remaining agents, and bump version 0.5.7 → 0.5.8. Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-02-19 11:12:46 -08:00
A	5612cda40b	feat: remove Aider, Goose, Open Interpreter, Gemini CLI, Amazon Q from matrix (#1472 ) These 5 agents are being dropped from the Spawn matrix. This removes 45 agent scripts across 9 clouds, cleans the manifest, test fixtures, READMEs, CLI source, and shared library comments. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-19 12:31:00 -05:00
A	d8785c3d0b	security: fix command injection in cline auth via remote env var expansion (#1473 ) All 9 cline.sh scripts embedded OPENROUTER_API_KEY directly into the cloud_run command string, allowing shell metacharacter injection on the remote server. Fix by escaping the dollar sign (\${OPENROUTER_API_KEY}) so the variable is expanded on the remote machine where it's already set via agent_env_vars()/generate_env_config, not locally before being passed to cloud_run. Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-19 12:25:16 -05:00
A	e2d6aa1444	fix: use json_escape in save_vm_connection to prevent malformed JSON (#1470 ) save_vm_connection built JSON via direct string interpolation, which produces malformed output if any value contains quotes, backslashes, or other JSON-special characters. This breaks spawn list/delete/history. Changes: - Use json_escape for all string fields in save_vm_connection - Use json_escape for GCP zone/project metadata values - Switch AWS, GCP, Daytona get_server_name to get_validated_server_name for consistency with Hetzner, DigitalOcean, Fly, OVH Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-19 16:23:27 +00:00
Ahmed Abushagur	8ee54d01a8	fix: harden agent reliability + security across all clouds (#1468 ) * docs: add spawn delete command to README Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: harden openclaw across all clouds — validation, reliability, performance Fixes multiple issues causing openclaw to break on most clouds: Bugs fixed: - Double-prefixed model ID (openrouter/openrouter/auto) in config generation - AWS gateway starting without env vars (missing .zshrc source) - DigitalOcean sourcing .spawnrc instead of .zshrc for gateway - Destructive rm -rf ~/.openclaw on re-runs (now mkdir -p) Validation added: - API key checked against OpenRouter /auth/key endpoint with re-prompt on failure - Model ID verified against OpenRouter model list with re-prompt loop - openrouter/auto and openrouter/free bypass model check Reliability improvements: - Standardized gateway launch with </dev/null & disown across all 9 clouds - Gateway log auto-displayed on startup timeout for diagnostics - 2GB swap added to cloud-init to prevent OOM on small VMs - Portable install timeout (10 min) with macOS gtimeout fallback Performance: - Reordered spawn_agent: OAuth runs while VM provisions (saves 30-60s) - Fly.io: bumped to 2GB RAM + 2 shared CPUs for openclaw - Fly.io: tries bun first (faster), falls back to npm Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: skip sudo in gh install when running as root (Fly.io containers) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review — skip validation in tests, quote escaped cmd, escape model_id - verify_openrouter_key and verify_openrouter_model skip network calls when SPAWN_SKIP_API_VALIDATION, BUN_ENV=test, or NODE_ENV=test is set - install_agent timeout wrapper now quotes the escaped command for defense in depth - model_id in openclaw JSON now uses json_escape() for consistency Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove double-escaping in install_agent that broke shell operators install_agent() was wrapping commands with printf '%q' + bash -c before passing them to the run callback. But run callbacks (run_server, run_sprite, ssh_run_server) already handle escaping for remote transport. The double- escaping turned && \|\| > \| into literal characters, causing 'source' to treat the entire command as a single filename. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use local github-auth.sh instead of curling from main When running from a local checkout, base64-encode the local github-auth.sh and send it inline to the remote machine. This ensures fixes (like the sudo skip for root) take effect immediately without waiting for a merge to main. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle github-auth errors gracefully instead of terminating GitHub CLI setup is optional — failures should not abort the spawn session. Guard both run_callback calls in offer_github_auth with \|\| log_warn so the script continues even if gh install fails. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use GOOGLE_GEMINI_BASE_URL to route Gemini CLI through OpenRouter Gemini CLI ignores OPENAI_BASE_URL — it uses GEMINI_API_KEY to talk directly to Google's API. The OpenRouter key is not a valid Google API key, so all requests fail with "API key not valid". Use GOOGLE_GEMINI_BASE_URL to redirect Gemini CLI to OpenRouter's endpoint. Fixes all 9 cloud gemini scripts + manifest.json. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: guard optional spawn_agent hooks so failures don't kill the session With set -eo pipefail, any unguarded failure terminates the script. Several optional operations in spawn_agent were unguarded: - agent_configure: config file uploads (agent works with defaults) - agent_save_connection: convenience JSON for spawn list - agent_pre_launch: gateway daemons, startup hooks - agent_pre_provision: pre-provision prompts - .spawnrc shell hooks: hooking env vars into .bashrc/.zshrc These now log warnings and continue instead of aborting. Critical steps (cloud_authenticate, agent_install, cloud_provision) still exit on failure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: audit and fix env vars, escaping, and error handling across all agents Audit findings from 3 parallel agents, fixes applied: Env vars (4 agents fixed across 9 clouds each = 36 scripts): - Amazon Q: remove fake OPENAI_* vars (Q uses AWS auth, can't use OpenRouter) - Cline: replace OPENAI_* env vars with `cline auth -p openrouter` command - Open Interpreter: drop OPENAI_* vars, use only OPENROUTER_API_KEY (native support via --model flag) - NanoClaw: add ANTHROPIC_BASE_URL to .env file (was missing, requests went to Anthropic directly) Escaping: - execute_agent_non_interactive: replace printf '%q' with single-quote wrapping to avoid double-escaping on Fly.io Manifest updated for amazonq, cline, interpreter entries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use setsid to detach openclaw gateway daemon from SSH sessions The gateway daemon launch (`nohup openclaw gateway ... & disown`) hangs on all clouds because SSH/exec channels wait for child FDs to close. setsid creates a new session, fully detaching the daemon so the channel can close immediately. Falls back to nohup where setsid is unavailable. Consolidates the daemon launch into a shared start_openclaw_gateway() function used by all 9 cloud scripts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: configure npm global prefix for non-root clouds (AWS, GCP, OVH) AWS Lightsail, GCP, and OVH SSH as non-root users (ubuntu/login user), so `npm install -g` fails with EACCES on /usr/local/lib/node_modules/. Fix: configure npm prefix to ~/.npm-global during cloud-init/setup and add ~/.npm-global/bin to the SSH PATH prefix so agent install commands find globally-installed npm binaries without sudo. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove broken OpenRouter routing from Gemini CLI scripts Gemini CLI uses Google's native API format (/v1beta/models/:streamGenerateContent), not the OpenAI-compatible format (/v1/chat/completions). No base URL override can bridge this — the request formats are fundamentally incompatible. Same situation as Amazon Q (uses vendor-specific auth/API). Removed GEMINI_API_KEY and GOOGLE_GEMINI_BASE_URL from all 9 scripts + manifest. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: auto-install AWS CLI and gcloud SDK when missing Instead of printing manual install instructions and exiting, both CLIs now auto-install: - AWS: downloads official .pkg (macOS) or .zip (Linux) installer - GCP: uses brew cask on macOS, Google's tarball installer on Linux Falls back to manual instructions if auto-install fails. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: nanoclaw — install Docker on Linux, fix hardcoded /root/ path Two issues broke NanoClaw on all clouds: 1. .env upload hardcoded /root/nanoclaw/.env — fails on non-root clouds (AWS=ubuntu, GCP=user, OVH=ubuntu). Now uses upload_config_file with $HOME which expands on the remote side. 2. NanoClaw requires a container runtime. On Linux it uses Docker, but Docker was never installed. Added Docker install via get.docker.com to all cloud scripts (with sudo where SSH user is non-root). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address security review findings from PR #1463 - Reject symlinked github-auth.sh before base64-encoding (falls back to remote URL) - Hide API key from process list using curl -K - instead of -H in verify_openrouter_key Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: quote OPENROUTER_API_KEY in cline auth to prevent command injection Unquoted variable in `cline auth -p openrouter -k ${OPENROUTER_API_KEY}` allows shell metacharacters in the key to execute arbitrary commands on the remote server. Wrapping in escaped double quotes prevents expansion. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 08:36:24 -05:00
Ahmed Abushagur	be904cbe1c	fix: install_agent double-escaping + github-auth reliability (#1460 ) * docs: add spawn delete command to README Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: harden openclaw across all clouds — validation, reliability, performance Fixes multiple issues causing openclaw to break on most clouds: Bugs fixed: - Double-prefixed model ID (openrouter/openrouter/auto) in config generation - AWS gateway starting without env vars (missing .zshrc source) - DigitalOcean sourcing .spawnrc instead of .zshrc for gateway - Destructive rm -rf ~/.openclaw on re-runs (now mkdir -p) Validation added: - API key checked against OpenRouter /auth/key endpoint with re-prompt on failure - Model ID verified against OpenRouter model list with re-prompt loop - openrouter/auto and openrouter/free bypass model check Reliability improvements: - Standardized gateway launch with </dev/null & disown across all 9 clouds - Gateway log auto-displayed on startup timeout for diagnostics - 2GB swap added to cloud-init to prevent OOM on small VMs - Portable install timeout (10 min) with macOS gtimeout fallback Performance: - Reordered spawn_agent: OAuth runs while VM provisions (saves 30-60s) - Fly.io: bumped to 2GB RAM + 2 shared CPUs for openclaw - Fly.io: tries bun first (faster), falls back to npm Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: skip sudo in gh install when running as root (Fly.io containers) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review — skip validation in tests, quote escaped cmd, escape model_id - verify_openrouter_key and verify_openrouter_model skip network calls when SPAWN_SKIP_API_VALIDATION, BUN_ENV=test, or NODE_ENV=test is set - install_agent timeout wrapper now quotes the escaped command for defense in depth - model_id in openclaw JSON now uses json_escape() for consistency Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove double-escaping in install_agent that broke shell operators install_agent() was wrapping commands with printf '%q' + bash -c before passing them to the run callback. But run callbacks (run_server, run_sprite, ssh_run_server) already handle escaping for remote transport. The double- escaping turned && \|\| > \| into literal characters, causing 'source' to treat the entire command as a single filename. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use local github-auth.sh instead of curling from main When running from a local checkout, base64-encode the local github-auth.sh and send it inline to the remote machine. This ensures fixes (like the sudo skip for root) take effect immediately without waiting for a merge to main. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle github-auth errors gracefully instead of terminating GitHub CLI setup is optional — failures should not abort the spawn session. Guard both run_callback calls in offer_github_auth with \|\| log_warn so the script continues even if gh install fails. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use GOOGLE_GEMINI_BASE_URL to route Gemini CLI through OpenRouter Gemini CLI ignores OPENAI_BASE_URL — it uses GEMINI_API_KEY to talk directly to Google's API. The OpenRouter key is not a valid Google API key, so all requests fail with "API key not valid". Use GOOGLE_GEMINI_BASE_URL to redirect Gemini CLI to OpenRouter's endpoint. Fixes all 9 cloud gemini scripts + manifest.json. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: guard optional spawn_agent hooks so failures don't kill the session With set -eo pipefail, any unguarded failure terminates the script. Several optional operations in spawn_agent were unguarded: - agent_configure: config file uploads (agent works with defaults) - agent_save_connection: convenience JSON for spawn list - agent_pre_launch: gateway daemons, startup hooks - agent_pre_provision: pre-provision prompts - .spawnrc shell hooks: hooking env vars into .bashrc/.zshrc These now log warnings and continue instead of aborting. Critical steps (cloud_authenticate, agent_install, cloud_provision) still exit on failure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 05:21:55 -05:00
Ahmed Abushagur	159ad49fec	fix: harden openclaw across all clouds (#1456 ) * docs: add spawn delete command to README Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: harden openclaw across all clouds — validation, reliability, performance Fixes multiple issues causing openclaw to break on most clouds: Bugs fixed: - Double-prefixed model ID (openrouter/openrouter/auto) in config generation - AWS gateway starting without env vars (missing .zshrc source) - DigitalOcean sourcing .spawnrc instead of .zshrc for gateway - Destructive rm -rf ~/.openclaw on re-runs (now mkdir -p) Validation added: - API key checked against OpenRouter /auth/key endpoint with re-prompt on failure - Model ID verified against OpenRouter model list with re-prompt loop - openrouter/auto and openrouter/free bypass model check Reliability improvements: - Standardized gateway launch with </dev/null & disown across all 9 clouds - Gateway log auto-displayed on startup timeout for diagnostics - 2GB swap added to cloud-init to prevent OOM on small VMs - Portable install timeout (10 min) with macOS gtimeout fallback Performance: - Reordered spawn_agent: OAuth runs while VM provisions (saves 30-60s) - Fly.io: bumped to 2GB RAM + 2 shared CPUs for openclaw - Fly.io: tries bun first (faster), falls back to npm Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: skip sudo in gh install when running as root (Fly.io containers) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review — skip validation in tests, quote escaped cmd, escape model_id - verify_openrouter_key and verify_openrouter_model skip network calls when SPAWN_SKIP_API_VALIDATION, BUN_ENV=test, or NODE_ENV=test is set - install_agent timeout wrapper now quotes the escaped command for defense in depth - model_id in openclaw JSON now uses json_escape() for consistency Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove double-escaping in install_agent that broke shell operators install_agent() was wrapping commands with printf '%q' + bash -c before passing them to the run callback. But run callbacks (run_server, run_sprite, ssh_run_server) already handle escaping for remote transport. The double- escaping turned && \|\| > \| into literal characters, causing 'source' to treat the entire command as a single filename. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 09:25:48 +00:00
Ahmed Abushagur	f2795a6d84	fix: Node.js v22 upgrade, aider uv install, SSH & cloud reliability (#1440 ) * fix: use uv --upgrade to ensure Python 3.13-compatible Pillow across all clouds aider-chat on Python 3.13 fails with `ImportError: cannot import name '_imaging' from 'PIL'` when an old Pillow version (pre-10.4) is resolved — those releases have no Python 3.13 binary wheels, so the C extension is missing at runtime. Replace `--with 'Pillow>=10.2.0'` (which was silently broken — the `>` and single quotes get mangled by `printf '%q'` in run_server before the command reaches the remote machine) with `--upgrade`, which forces all transitive deps including Pillow to their latest compatible versions. Also adds a plain-text echo before the install so users see progress instead of a silent hang during the 2-4 minute install. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test: update aider/gptme/interpreter assertions from pip to uv The install method for aider, gptme, and open-interpreter was changed from pip to `uv tool install` across all clouds. The mock test assertions still checked for the old `pip.install.` patterns, causing 9 failures (3 agents × 3 clouds). Update patterns to match the actual `uv tool install` commands now used in all cloud scripts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * ci: trigger test run for uv assertion fix * fix: prevent SSH hangs, restore stderr, fix command escaping across clouds - Add < /dev/null to ssh_run_server and generic_ssh_wait to prevent SSH stdin theft causing sequential install/verify/configure steps to hang - Add ServerAliveInterval, ServerAliveCountMax, ConnectTimeout to default SSH_OPTS so long-running installs don't silently drop on flaky networks - Remove 2>/dev/null from Fly.io run_server so remote command errors are no longer silently swallowed (--quiet flag still suppresses flyctl noise) - Fix Fly.io printf '%q' double-quoting: remove extra quotes around $escaped_cmd that prevented the remote shell from consuming escapes, breaking && \|\| \| operators in commands - Remove broken printf '%q' from Daytona run_server and interactive_session where it escaped shell operators into literal characters since daytona exec has no intermediate shell layer - Pin aider to --python 3.12 instead of --with audioop-lts across all clouds Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add --pty to fly ssh console for interactive sessions fly ssh console -C does not allocate a pseudo-terminal by default, causing interactive TUI agents (aider, claude) to fail with "Input is not a terminal (fd=0)" or completely unresponsive input. Adding --pty forces PTY allocation, matching how other clouds handle interactive sessions (SSH uses -t, Sprite uses -tty). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: prepend ~/.local/bin to PATH in ssh_run_server After uv installs to ~/.local/bin, the current shell session doesn't have it in PATH, causing "uv: command not found" on DigitalOcean and all other SSH-based clouds (Hetzner, AWS, GCP, OVH). Fly.io's run_server already prepends this PATH — now the shared ssh_run_server does the same, fixing all SSH-based clouds at once. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add Node.js to cloud-init for all cloud providers npm-based agents (codex, kilocode, etc.) fail with "npm: command not found" because Node.js isn't installed during cloud-init. Fly.io was the only provider installing Node.js (in wait_for_cloud_init). Now all cloud-init scripts install Node.js v22 LTS from nodesource, matching Fly.io's setup. Also adds ~/.local/bin to PATH in AWS and GCP cloud-init (was already in shared/DigitalOcean/Hetzner). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use apt packages for nodejs/npm instead of nodesource The nodesource setup script (setup_22.x) runs its own apt-get update and repository configuration, nearly doubling cloud-init time and causing hangs on DigitalOcean. Ubuntu 24.04 includes nodejs and npm in its default repos — just add them to the packages list. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add timeouts and better error handling to Daytona CLI commands Daytona CLI commands (login, list, create) can hang indefinitely when the API is slow or unreachable. This causes: - "Failed to create sandbox: timeout" with no recovery - Token validation timeouts misreported as "invalid token" - Users re-entering valid tokens that also timeout Fixes: - Wrap all daytona CLI calls with timeout (30s for auth, 120s for create) - Detect timeout errors separately from auth errors - Show actionable "try again / check status" messages for timeouts - Add nodejs/npm to Daytona wait_for_cloud_init Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: set DAYTONA_API_URL to Daytona Cloud by default The Daytona CLI may default to connecting to a local self-hosted server instead of Daytona Cloud. Without DAYTONA_API_URL set to https://app.daytona.io/api, every CLI command (login, list, create) hangs trying to reach a non-existent local server and times out. The SDK documents this as the default, but the CLI doesn't always pick it up — now we export it explicitly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: symlink n-installed Node.js v22 over apt v18 to prevent shadowing n installs Node.js v22 to /usr/local/bin/node but apt's v18 at /usr/bin/node can shadow it in non-interactive SSH sessions. After n 22, symlink the new binaries over the apt ones so v22 is always resolved. Also fix hcloud CLI token extraction for new TOML format. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address security review, add curl timeouts to trigger workflows - Fix ssh_run_server command injection concern: use single-quoted path_prefix so $HOME/$PATH expand remotely, not locally - Add --connect-timeout 15 --max-time 30 to trigger workflows to prevent 5-min hangs when server streams responses - Handle 409 (dedup) as success — expected when cron fires every 15min but cycles take 35min - Reduce workflow timeout-minutes from 5 to 2 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 06:54:07 -05:00
Ahmed Abushagur	db4aaa0c73	fix: prevent SSH hangs, fix command escaping, pin Python 3.12 for aider (#1439 ) * fix: use uv --upgrade to ensure Python 3.13-compatible Pillow across all clouds aider-chat on Python 3.13 fails with `ImportError: cannot import name '_imaging' from 'PIL'` when an old Pillow version (pre-10.4) is resolved — those releases have no Python 3.13 binary wheels, so the C extension is missing at runtime. Replace `--with 'Pillow>=10.2.0'` (which was silently broken — the `>` and single quotes get mangled by `printf '%q'` in run_server before the command reaches the remote machine) with `--upgrade`, which forces all transitive deps including Pillow to their latest compatible versions. Also adds a plain-text echo before the install so users see progress instead of a silent hang during the 2-4 minute install. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test: update aider/gptme/interpreter assertions from pip to uv The install method for aider, gptme, and open-interpreter was changed from pip to `uv tool install` across all clouds. The mock test assertions still checked for the old `pip.install.` patterns, causing 9 failures (3 agents × 3 clouds). Update patterns to match the actual `uv tool install` commands now used in all cloud scripts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * ci: trigger test run for uv assertion fix * fix: prevent SSH hangs, restore stderr, fix command escaping across clouds - Add < /dev/null to ssh_run_server and generic_ssh_wait to prevent SSH stdin theft causing sequential install/verify/configure steps to hang - Add ServerAliveInterval, ServerAliveCountMax, ConnectTimeout to default SSH_OPTS so long-running installs don't silently drop on flaky networks - Remove 2>/dev/null from Fly.io run_server so remote command errors are no longer silently swallowed (--quiet flag still suppresses flyctl noise) - Fix Fly.io printf '%q' double-quoting: remove extra quotes around $escaped_cmd that prevented the remote shell from consuming escapes, breaking && \|\| \| operators in commands - Remove broken printf '%q' from Daytona run_server and interactive_session where it escaped shell operators into literal characters since daytona exec has no intermediate shell layer - Pin aider to --python 3.12 instead of --with audioop-lts across all clouds Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add --pty to fly ssh console for interactive sessions fly ssh console -C does not allocate a pseudo-terminal by default, causing interactive TUI agents (aider, claude) to fail with "Input is not a terminal (fd=0)" or completely unresponsive input. Adding --pty forces PTY allocation, matching how other clouds handle interactive sessions (SSH uses -t, Sprite uses -tty). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 04:23:15 -05:00
Ahmed Abushagur	d9e6d058e0	fix: use uv --upgrade to ensure Python 3.13-compatible Pillow across all clouds (#1436 ) aider-chat on Python 3.13 fails with `ImportError: cannot import name '_imaging' from 'PIL'` when an old Pillow version (pre-10.4) is resolved — those releases have no Python 3.13 binary wheels, so the C extension is missing at runtime. Replace `--with 'Pillow>=10.2.0'` (which was silently broken — the `>` and single quotes get mangled by `printf '%q'` in run_server before the command reaches the remote machine) with `--upgrade`, which forces all transitive deps including Pillow to their latest compatible versions. Also adds a plain-text echo before the install so users see progress instead of a silent hang during the 2-4 minute install. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 03:21:59 -05:00
Ahmed Abushagur	633ce8eaac	feat: upgrade default server sizes, fix Fly.io agent installs, improve E2E tests (#1428 ) - Upgrade default VM sizes across clouds for better agent performance: - Hetzner: cpx11 → cx23 (with cx22 fallback support for deprecated types) - DigitalOcean: s-2vcpu-2gb → s-2vcpu-4gb - Daytona: 2048MB → 4096MB memory - Oracle: VM.Standard.E2.1.Micro → VM.Standard.A1.Flex - OVH: d2-2 → d2-4 - Fix Fly.io agent failures: - Add Node.js + build-essential to wait_for_cloud_init (fixes npm-based agents) - Prepend PATH in interactive_session (fixes "source not found" errors) - Fix openclaw installs across clouds: use explicit PATH export instead of source - Fix DigitalOcean token validation (check "uuid" not "id") - Fix AWS cloud-init: chown .bashrc/.zshrc to ubuntu user - Improve Hetzner fallback: add "cheapest available" as last-resort fallback - Upgrade E2E tests: per-combo auto-fix, credential collection, robustness fixes Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 22:17:08 -08:00
Ahmed Abushagur	963144ecbd	fix: use pipx to install Aider across all clouds (#1429 ) Ubuntu 24.04 blocks system-wide pip installs (PEP 668 externally-managed- environment). Switch all aider.sh scripts from `pip install aider-chat` to `python3 -m pip install pipx && pipx install aider-chat`, which installs into an isolated virtualenv and works on all target distros. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 22:16:39 -08:00
Ahmed Abushagur	22b6a402f4	feat: E2E test harness, QA pipeline integration, macOS compat linter (#1425 ) * feat: add QA upgrade — macOS compat linter, per-agent mock assertions Layer 1: macOS compat linter (test/macos-compat.sh) - 12 rules (MC001–MC012) catching bash 3.2 incompatibilities - Detects: base64 -w0 file args, non-portable echo flags, source <(), ((var++)), read -d, nounset flag, sed -i, date %N, local -n, declare -A, ${var,,}, and \|& - Added to CI lint.yml in warn-only mode for burn-in - Integrated as Phase 0.5 in qa-dry-run.sh Layer 2: Per-agent mock assertions - test/fixtures/_shared_agent_assertions.sh with install checks for all 15 agents (claude, openclaw, aider, goose, etc.) - Integrated into test/mock.sh via _run_agent_assertions() Also includes branch fixes: - Fix base64 -w0 to use stdin redirect (aws, daytona, fly) - Fix fly/openclaw to use npm install instead of broken curl\|bash Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add E2E test harness and integrate into QA pipeline Add test/e2e.sh — a full E2E test harness that provisions real servers, installs agents, and verifies setup across all clouds. Features: - Smoke test (one canary agent per cloud) and full matrix modes - Credential auto-detection for 8 clouds - Per-cloud preflight validation (sequential) then parallel agent tests - Stale server cleanup, timing history, cross-cloud comparison - Auto-fix and optimization phases via Claude agents - macOS bash 3.2 compatible Integrate E2E as Phase 5 in both qa-cycle.sh and qa-dry-run.sh: - Runs after mock tests pass, gated on cloud credentials - Phase 5b auto-fixes failures using per-agent worktree branches - Parses results and includes in QA summary Also fixes: - shared/common.sh: honour SPAWN_NON_INTERACTIVE=1 in safe_read() - aws/lib/common.sh: fix SSH key import (use cat instead of base64, handle race condition on concurrent imports) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:41:07 -05:00
A	07ff397ee5	security: add SSH key path validation to aws/lib/common.sh (#1414 ) Add validation in ensure_ssh_key() to prevent path traversal and arbitrary file upload attacks: - Validate public key file exists and is a regular file - Reject symlinks to prevent reading sensitive system files - Enforce 10KB size limit (SSH pubkeys are ~100-600 bytes) Fixes #1407 Agent: complexity-hunter Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 12:54:09 -05:00
A	f412fb69bc	ux: wait for OpenClaw gateway to be ready before launching TUI (#1385 ) Fixes #1354 - users experienced a ~30s delay with "gateway not connected" errors when trying to use OpenClaw immediately after launch. Root cause: gateway takes time to bind to port 18789, but TUI launched after only 2 seconds. Solution: Add wait_for_openclaw_gateway() helper that polls the gateway port (max 30s) before launching TUI, ensuring immediate usability. Changes: - shared/common.sh: Add wait_for_openclaw_gateway() function - All openclaw.sh scripts (10 files): Replace sleep 2 with gateway readiness check Agent: ux-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 03:49:53 -05:00
A	c4eccbd72f	feat: prioritize clouds with CLI installed + hcloud CLI integration (#1375 ) * fix: auto-run gcloud auth login on expired GCP tokens Instead of telling users to run `gcloud auth login` manually, just run it automatically when auth check fails or instance creation hits a reauthentication error, then retry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: prioritize clouds with CLI installed + hcloud CLI integration When selecting a cloud provider, clouds are now sorted in 3 tiers: 1. Credentials detected (env vars set) — top priority 2. CLI installed (e.g., gcloud, hcloud, aws) — middle priority 3. Neither — default order Also adds hcloud CLI-first support for Hetzner operations (server create/delete/list, SSH key management, auth) with automatic fallback to the existing REST API when hcloud is not available. Closes #1370 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: rename aws-lightsail to aws across the project Simplifies the cloud key from "aws-lightsail" to "aws" — AWS should have a single entry regardless of the underlying service used. Renames the directory, updates manifest.json matrix keys, CLI map, test fixtures, README, and all agent scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 20:12:35 -08:00

19 commits