spawn

vrr/spawn

mirror of https://github.com/OpenRouterTeam/spawn.git synced 2026-05-19 08:01:17 +00:00

Author	SHA1	Message	Date
L	9fc59ded1c	fix: handle raw m2. macaroon tokens from Fly.io CLI Sessions API (#1552 ) Root cause of 'no tokens found in header' after browser OAuth: The Fly.io CLI Sessions API returns raw macaroon tokens (e.g. m2.XXXX) WITHOUT the 'FlyV1 ' prefix. _sanitize_fly_token only handled fm2_ tokens, so m2. tokens fell through unchanged and were sent as: Authorization: Bearer m2.XXXX Fly.io's Machines API expects FlyV1 macaroon format, not Bearer. Fixes: - _sanitize_fly_token: add m2.* case that wraps as 'FlyV1 m2.XXX' - _try_fly_browser_auth polling: eagerly wrap any non-FlyV1 token with 'FlyV1 ' prefix at the source, before it's echoed back to the caller Token format handling after fix: m2.XXXX → FlyV1 m2.XXXX ← CLI Sessions API (was broken) fm2_XXXX → FlyV1 fm2_XXXX ← still handled (unchanged) FlyV1 fm2_XXXX → FlyV1 fm2_XXXX ← already correct (unchanged) eyJhbGci... → Bearer eyJ... ← legacy JWT (fallback to manual) Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-02-20 23:54:34 -08:00
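Note on 9fc59ded1c: the token mapping listed in the message above can be sketched as a small shell function. This is a minimal sketch, not the code in fly/lib/common.sh; the function name `normalize_fly_token` is hypothetical, and the display-name stripping described in 3280a44c45 is left out.

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the mapping from the commit message.
# Assumption: only the three macaroon cases are handled; anything else
# (for example a legacy JWT) is passed through unchanged.
normalize_fly_token() {
    token="$1"
    case "$token" in
        "FlyV1 "*)  printf '%s\n' "$token" ;;        # already prefixed: unchanged
        fm2_*|m2.*) printf 'FlyV1 %s\n' "$token" ;;  # bare macaroon: add the prefix
        *)          printf '%s\n' "$token" ;;        # not a macaroon: left as-is
    esac
}

normalize_fly_token "m2.XXXX"
normalize_fly_token "fm2_XXXX"
normalize_fly_token "FlyV1 fm2_XXXX"
```

Each call prints the value that would follow `Authorization:` for a macaroon token.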
A	328e6a6da4	fix: replace bun -e with python3 in fly/lib/common.sh to fix 18 mock test failures (#1553 ) bun is not installed in the mock test environment (CI or local test runs). The mock harness stubs bun as a no-op logger, so _fly_json_get() always returned empty string, causing "Failed to extract machine ID" and 18 fly script test failures in bash test/mock.sh. Replace all 4 bun -e invocations with equivalent python3 code: - _fly_json_get: extract top-level JSON field from stdin - _fly_build_machine_body: build machine creation JSON body - _fly_destroy_app: extract machine IDs array - list_servers: format apps table python3 is always available and already has a pass-through mock in test/mock.sh (like /usr/bin/python3). No behavior change for real runs. Before: bash test/mock.sh fly → 18 passed, 18 failed After: bash test/mock.sh fly → 36 passed, 0 failed Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-21 02:19:46 -05:00
L	fe2c0b024b	fix: prevent premature exit when Fly.io CLI session access_token is null (#1551 ) The polling loop in _try_fly_browser_auth() was returning immediately on the first poll (t=2s) because: access_token=$(... "d.get('access_token','')") When the JSON has "access_token": null (before the user completes browser auth), Python's print(None) outputs the string "None". Bash $() captures "None" as non-empty, passes [[ -n "$access_token" ]], and returns it as the token — before the user even sees the browser. Then _validate_fly_token(FLY_API_TOKEN="None") sends: Authorization: Bearer None which Fly.io rejects with: verify: invalid token: no tokens found in header Fix: d.get('access_token') or '' → None or '' = '' (empty, keeps polling) + explicit != "None" guard for belt-and-suspenders Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-02-20 21:37:34 -08:00
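Note on fe2c0b024b: the null-token behaviour described above is easy to reproduce in isolation. The sketch below assumes only that `python3` is on the PATH; the JSON literal and the variable names `buggy` and `fixed` are illustrative and not taken from the repository.

```bash
#!/usr/bin/env bash
# Response body as it looks before the user finishes browser auth.
json='{"access_token": null}'

# Buggy form: the key exists with value null, so .get() returns None
# (the '' default is never used) and print() emits the string "None".
buggy=$(printf '%s' "$json" | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('access_token',''))")

# Fixed form: "None or ''" evaluates to the empty string.
fixed=$(printf '%s' "$json" | python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('access_token') or '')")

printf 'buggy=[%s] fixed=[%s]\n' "$buggy" "$fixed"
# prints: buggy=[None] fixed=[]
```

With the buggy form a `[ -n "$buggy" ]` check passes, which is why the polling loop returned on the first iteration.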
L	4ae781d2a8	fix: remove 2>/dev/null from token validation calls in auth flow (#1549 ) Token validation functions (test_hcloud_token, test_do_token, test_daytona_token, _validate_fly_token) contain rich diagnostic log_error/log_warn messages with error details and fix instructions. Calling them with 2>/dev/null silently discarded all that output, leaving users with no explanation when their token was rejected. shared/common.sh — ensure_api_token_with_provider(): Remove 2>/dev/null from "${test_func}" in both the env-var and config-file validation branches, so callers like test_hcloud_token can print API error details and remediation steps. fly/lib/common.sh — ensure_fly_token(): Remove 2>/dev/null from both _validate_fly_token calls (config-file path and post-browser-OAuth path) so users see why validation failed. Note: Issue 1 (API polling in _poll_instance_once) is intentionally left with 2>/dev/null — suppressing curl errors during a 60-iteration polling loop prevents terminal flooding and is handled by '|| true'. Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-02-20 21:27:42 -08:00
L	8c437435eb	fix: show Fly.io login URL immediately by removing 2>/dev/null suppression (#1548 ) 2>/dev/null on _try_fly_browser_auth() was swallowing all stderr, including the auth URL printf and log_step messages that the user needs to see for sandbox/headless environments. Also add a 'Fetching Fly.io login URL...' log_step before the API call so the user gets immediate feedback while the session is created (the curl call can take 1-2 seconds before the URL is available). Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-02-20 21:12:17 -08:00
A	d6c53d838f	fix: source .spawnrc directly in agent launch commands for reliable env loading (#1546 ) 24 agent scripts (codex, opencode, kilocode, openclaw across 6 clouds) used `source ~/.zshrc && <agent>` which loads env vars indirectly via a hook. This fails silently when .zshrc has errors or the hook install was non-fatal, causing agents to launch without OPENROUTER_API_KEY. Change to `source ~/.spawnrc 2>/dev/null; source ~/.zshrc 2>/dev/null; <agent>` which loads env vars directly (matching claude/zeroclaw pattern) and tolerates .zshrc failures without blocking the agent. Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-20 23:37:03 -05:00
L	be176e4cdb	fix: confirm kebab resource name + improve Fly.io sandbox auth (#1525 ) shared/common.sh — prompt_spawn_name(): Replace log_info with safe_read so user confirms (or overrides) the derived kebab-case resource name before it's used for any cloud resource: Spawn name (e.g. "My Dev Box"): My Claude Box Resource name [my-claude-box]: ⏎ ← press Enter to accept fly/lib/common.sh — _try_fly_browser_auth(): - Print auth URL prominently on its own line (not just as a warning) so sandbox users can copy-paste it into their local browser - Suppress open_browser errors (|| true) so the script doesn't abort if no browser is available - Add explicit sandbox hint while polling - After 120s timeout: offer manual API token entry as a last resort with a direct link to fly.io/dashboard → Tokens Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-02-20 07:12:49 -08:00
Ahmed Abushagur	b5d174a472	fix: pin Codex to 0.94.0 + wire_api=chat for multi-turn stability (#1518 ) * fix: switch Codex wire_api from "responses" to "chat" for multi-turn stability The Responses API format causes "Invalid Responses API request" errors on the second turn and beyond — conversation history items round-trip through OpenRouter with null content fields and missing IDs that fail validation. Chat Completions format is fully supported and avoids this issue. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: pin Codex to 0.94.0 + wire_api=chat for multi-turn stability OpenRouter's Responses API proxy drops required fields (id, content) from conversation-history items on multi-turn requests, causing "Invalid Responses API request" at input[6]+. Codex >=0.97.0 removed wire_api=chat support (openai/codex#10157), so we pin to 0.94.0 — the last release where Chat Completions format still works. Tracking: https://github.com/openai/codex/issues/12114 TODO: unpin once OpenRouter /responses handles round-trip correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-20 04:49:35 -05:00
A	50dd2f26ed	fix: repair Fly.io saved token loading (_load_token_from_config misuse) (#1513 ) ensure_fly_token() called _load_token_from_config with only 1 argument (config file path) but the function requires 3 (config_file, env_var_name, provider_name). The empty env_var_name fails the security validation regex, so the function always returns 1 silently. Users with saved Fly.io tokens in ~/.config/spawn/fly.json were forced to re-authenticate every session. Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-20 03:54:41 -05:00
A	3280a44c45	feat: add browser-based OAuth login for Fly.io + token sanitizer (#1506 ) Replace the prompt-first auth flow with a browser-based CLI session flow (same as `fly auth login`). The new auth chain is: 1. Environment variable (FLY_API_TOKEN) 2. Saved config file (~/.config/spawn/fly.json) 3. flyctl CLI (`fly auth token`) 4. Browser OAuth via Fly.io CLI Sessions API (NEW) 5. Manual token prompt (last resort fallback) The browser flow creates a CLI session via POST /api/v1/cli_sessions, opens the auth URL in the user's browser, then polls for the access token. This is the same mechanism flyctl uses internally. Also add _sanitize_fly_token() to handle the Fly dashboard copy button which includes the display name before the token (e.g. "Deploy Token FlyV1 fm2_..."). The sanitizer strips everything before "FlyV1" or extracts bare "fm2_" tokens, and trims whitespace/newlines. Applied at every token entry point (env var, config, manual prompt). Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-19 22:50:19 -08:00
L	d5690a8b11	feat: spawn name prompt + kebab resource naming across all clouds (#1507 ) * feat: add spawn name prompt and project confirmation to GCP flow Ask for spawn name upfront (before auth), derive kebab-case default for VM naming, and confirm the current GCP project before using it. New interaction order: 1. Spawn name: "My Dev Box" → kebab "my-dev-box" exported as GCP_INSTANCE_NAME_KEBAB 2. gcloud auth + project confirm: "Current project: X Keep? [Y/n]" If no → project picker shown 3. SSH key 4. Machine type picker (existing) 5. Zone picker (existing) 6. Instance name prompt: "Instance name [my-dev-box]: " User can press Enter to accept or type a custom name New functions: _to_kebab_case() — lowercases, replaces non-alnum with hyphens _gcp_prompt_spawn_name() — prompts for display name, exports kebab default; honours SPAWN_NAME env var set by CLI (--name flag) Modified: _gcp_resolve_project() — adds Y/n confirmation when project already set get_server_name() — shows kebab default in prompt, accepts Enter cloud_authenticate() — calls _gcp_prompt_spawn_name first Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * feat: add spawn name prompt to all clouds via shared/common.sh Move _to_kebab_case() and prompt_spawn_name() to shared/common.sh so all clouds get upfront spawn name prompting and kebab-based resource naming. 
shared/common.sh: + _to_kebab_case() — "My Dev Box" → "my-dev-box" + prompt_spawn_name() — asks for display name, exports SPAWN_NAME_DISPLAY and SPAWN_NAME_KEBAB; skips if already set; honours SPAWN_NAME env var from CLI --name flag ~ get_resource_name() — replaces silent SPAWN_NAME fallback with a visible prefilled default: "Enter server name [my-dev-box]: " Per-cloud changes (cloud_authenticate gains prompt_spawn_name first): hetzner, fly, aws, daytona, digitalocean, sprite — one-line change each gcp/lib/common.sh: - Remove _to_kebab_case() (now in shared) - Remove _gcp_prompt_spawn_name() (now in shared as prompt_spawn_name) ~ cloud_authenticate: _gcp_prompt_spawn_name → prompt_spawn_name ~ get_server_name: simplified back to get_validated_server_name (shared get_resource_name now shows the kebab default in the prompt) Result — every cloud shows this flow upfront: Spawn name (e.g. "My Dev Box"): My Claude Box ℹ Resource name: my-claude-box ... Enter server name [my-claude-box]: ⏎ Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: use "Use project '...'?" instead of "Keep this project?" in GCP prompt Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-02-19 22:22:59 -08:00
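Note on d5690a8b11: the kebab-case derivation ("lowercases, replaces non-alnum with hyphens") might look roughly like the sketch below. The function name `to_kebab_case_sketch` is hypothetical, and collapsing runs of separators and trimming leading or trailing hyphens are assumptions; the commit message does not say how the real `_to_kebab_case()` treats those cases.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for _to_kebab_case(); not the shared/common.sh code.
to_kebab_case_sketch() {
    printf '%s\n' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
    # tr lowercases; sed turns each run of non-alphanumerics into one hyphen,
    # then strips hyphens left at either end (both steps are assumptions).
}

to_kebab_case_sketch "My Dev Box"      # prints: my-dev-box
to_kebab_case_sketch "My Claude Box"   # prints: my-claude-box
```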
Ahmed Abushagur	9e2f84adf0	fix: use native OpenRouter model_provider for Codex CLI config (#1490 ) Codex CLI's OPENAI_BASE_URL env var approach causes "Invalid Responses API request" errors because OpenRouter doesn't fully support the Responses API wire format via base URL override. Switch all 8 codex scripts to use ~/.codex/config.toml with model_provider="openrouter" which uses the native OpenRouter integration. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 18:47:40 -05:00
A	b29cf4a75d	fix: sync cloud READMEs with current agent list (#1486 ) READMEs across all 8 clouds still referenced 5 removed agents (NanoClaw, Cline, gptme, Plandex, Continue) and were missing ZeroClaw. Users following these docs got 404 errors. Agent: ux-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-19 17:47:57 -05:00
L	a67d83ed38	feat: reorder agents and remove NanoClaw (#1477 ) * feat: add ZeroClaw agent (14.9k stars, native OpenRouter support) Add ZeroClaw — a Rust-based autonomous AI assistant framework by Harvard/MIT/Sundai.Club communities — across all 8 clouds. Scripts: local, hetzner, digitalocean, fly, aws, gcp, daytona, sprite Install: bootstrap.sh with --install-rust + --install-system-deps Config: zeroclaw onboard --provider openrouter (via agent_configure) Env: OPENROUTER_API_KEY + ZEROCLAW_PROVIDER=openrouter (native support) Launch: zeroclaw agent Note: ZeroClaw compiles from Rust source (~5-10 min build time). A build-time warning is shown to set expectations. Also update test/mock-curl-script.sh to stub zeroclaw install URLs and add zeroclaw to mock agent binaries in test/mock.sh. Bump CLI version 0.5.8 → 0.5.9. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * feat: reorder agents and remove NanoClaw New agent order: claude → openclaw → zeroclaw → codex → opencode → kilocode - Remove NanoClaw (8 scripts + manifest entry + matrix entries + README row) - Reorder manifest.json agents section to match new order - Reorder matrix entries by cloud (local/hetzner/fly/aws/daytona/digitalocean/gcp/sprite) with agents in new order within each cloud block - Update README matrix table row order - Update test/mock.sh mock agent binary list to match - Bump CLI version 0.5.9 → 0.5.10 Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-02-19 11:39:03 -08:00
L	f7458952b0	feat: remove Cline, gptme, Plandex, and Continue agents (#1475 ) Delete 32 agent scripts ({cloud}/{cline,gptme,plandex,continue}.sh across 8 clouds), remove the 4 agents from manifest.json with all their matrix entries, update README matrix rows, remove stale mock agent binaries and plandex.ai URL patterns from test harness, update CLI help examples to use remaining agents, and bump version 0.5.7 → 0.5.8. Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-02-19 11:12:46 -08:00
A	5612cda40b	feat: remove Aider, Goose, Open Interpreter, Gemini CLI, Amazon Q from matrix (#1472 ) These 5 agents are being dropped from the Spawn matrix. This removes 45 agent scripts across 9 clouds, cleans the manifest, test fixtures, READMEs, CLI source, and shared library comments. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-19 12:31:00 -05:00
A	d8785c3d0b	security: fix command injection in cline auth via remote env var expansion (#1473 ) All 9 cline.sh scripts embedded OPENROUTER_API_KEY directly into the cloud_run command string, allowing shell metacharacter injection on the remote server. Fix by escaping the dollar sign (\${OPENROUTER_API_KEY}) so the variable is expanded on the remote machine where it's already set via agent_env_vars()/generate_env_config, not locally before being passed to cloud_run. Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-19 12:25:16 -05:00
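Note on d8785c3d0b: the difference between expanding the key locally and expanding it on the remote side can be shown without any cloud at all. In this sketch `fake_remote_run` is a hypothetical stand-in for `cloud_run` (a fresh `bash -c` that already has the variable in its environment), and the key value is a harmless illustration. The safe form also wraps the reference in escaped double quotes, as the later quoting fix in 8ee54d01a8 describes.

```bash
#!/usr/bin/env bash
KEY='abc; echo INJECTED'          # illustrative "key" containing a metacharacter
OPENROUTER_API_KEY="$KEY"

# Hypothetical stand-in for cloud_run: runs the command string in a new shell
# whose environment already contains OPENROUTER_API_KEY.
fake_remote_run() {
    OPENROUTER_API_KEY="$KEY" bash -c "$1"
}

# Unsafe: expanded locally, so the ';' is spliced into the command text
# and becomes a command separator on the "remote" side.
unsafe=$(fake_remote_run "echo key=${OPENROUTER_API_KEY}")

# Safer: \$ keeps the reference literal; the remote shell expands it inside
# double quotes, so the value stays a single argument.
safe=$(fake_remote_run "echo key=\"\${OPENROUTER_API_KEY}\"")

printf 'unsafe:\n%s\n\nsafe:\n%s\n' "$unsafe" "$safe"
```

The unsafe run prints a second line, `INJECTED`, produced by the injected command; the safe run prints the key text verbatim on one line.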
Ahmed Abushagur	8ee54d01a8	fix: harden agent reliability + security across all clouds (#1468 ) * docs: add spawn delete command to README Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: harden openclaw across all clouds — validation, reliability, performance Fixes multiple issues causing openclaw to break on most clouds: Bugs fixed: - Double-prefixed model ID (openrouter/openrouter/auto) in config generation - AWS gateway starting without env vars (missing .zshrc source) - DigitalOcean sourcing .spawnrc instead of .zshrc for gateway - Destructive rm -rf ~/.openclaw on re-runs (now mkdir -p) Validation added: - API key checked against OpenRouter /auth/key endpoint with re-prompt on failure - Model ID verified against OpenRouter model list with re-prompt loop - openrouter/auto and openrouter/free bypass model check Reliability improvements: - Standardized gateway launch with </dev/null & disown across all 9 clouds - Gateway log auto-displayed on startup timeout for diagnostics - 2GB swap added to cloud-init to prevent OOM on small VMs - Portable install timeout (10 min) with macOS gtimeout fallback Performance: - Reordered spawn_agent: OAuth runs while VM provisions (saves 30-60s) - Fly.io: bumped to 2GB RAM + 2 shared CPUs for openclaw - Fly.io: tries bun first (faster), falls back to npm Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: skip sudo in gh install when running as root (Fly.io containers) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review — skip validation in tests, quote escaped cmd, escape model_id - verify_openrouter_key and verify_openrouter_model skip network calls when SPAWN_SKIP_API_VALIDATION, BUN_ENV=test, or NODE_ENV=test is set - install_agent timeout wrapper now quotes the escaped command for defense in depth - model_id in openclaw JSON now uses json_escape() for consistency Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove double-escaping in install_agent that 
broke shell operators install_agent() was wrapping commands with printf '%q' + bash -c before passing them to the run callback. But run callbacks (run_server, run_sprite, ssh_run_server) already handle escaping for remote transport. The double-escaping turned && || > | into literal characters, causing 'source' to treat the entire command as a single filename. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use local github-auth.sh instead of curling from main When running from a local checkout, base64-encode the local github-auth.sh and send it inline to the remote machine. This ensures fixes (like the sudo skip for root) take effect immediately without waiting for a merge to main. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle github-auth errors gracefully instead of terminating GitHub CLI setup is optional — failures should not abort the spawn session. Guard both run_callback calls in offer_github_auth with || log_warn so the script continues even if gh install fails. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use GOOGLE_GEMINI_BASE_URL to route Gemini CLI through OpenRouter Gemini CLI ignores OPENAI_BASE_URL — it uses GEMINI_API_KEY to talk directly to Google's API. The OpenRouter key is not a valid Google API key, so all requests fail with "API key not valid". Use GOOGLE_GEMINI_BASE_URL to redirect Gemini CLI to OpenRouter's endpoint. Fixes all 9 cloud gemini scripts + manifest.json. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: guard optional spawn_agent hooks so failures don't kill the session With set -eo pipefail, any unguarded failure terminates the script.
Several optional operations in spawn_agent were unguarded: - agent_configure: config file uploads (agent works with defaults) - agent_save_connection: convenience JSON for spawn list - agent_pre_launch: gateway daemons, startup hooks - agent_pre_provision: pre-provision prompts - .spawnrc shell hooks: hooking env vars into .bashrc/.zshrc These now log warnings and continue instead of aborting. Critical steps (cloud_authenticate, agent_install, cloud_provision) still exit on failure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: audit and fix env vars, escaping, and error handling across all agents Audit findings from 3 parallel agents, fixes applied: Env vars (4 agents fixed across 9 clouds each = 36 scripts): - Amazon Q: remove fake OPENAI_* vars (Q uses AWS auth, can't use OpenRouter) - Cline: replace OPENAI_* env vars with `cline auth -p openrouter` command - Open Interpreter: drop OPENAI_* vars, use only OPENROUTER_API_KEY (native support via --model flag) - NanoClaw: add ANTHROPIC_BASE_URL to .env file (was missing, requests went to Anthropic directly) Escaping: - execute_agent_non_interactive: replace printf '%q' with single-quote wrapping to avoid double-escaping on Fly.io Manifest updated for amazonq, cline, interpreter entries. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use setsid to detach openclaw gateway daemon from SSH sessions The gateway daemon launch (`nohup openclaw gateway ... & disown`) hangs on all clouds because SSH/exec channels wait for child FDs to close. setsid creates a new session, fully detaching the daemon so the channel can close immediately. Falls back to nohup where setsid is unavailable. Consolidates the daemon launch into a shared start_openclaw_gateway() function used by all 9 cloud scripts. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: configure npm global prefix for non-root clouds (AWS, GCP, OVH) AWS Lightsail, GCP, and OVH SSH as non-root users (ubuntu/login user), so `npm install -g` fails with EACCES on /usr/local/lib/node_modules/. Fix: configure npm prefix to ~/.npm-global during cloud-init/setup and add ~/.npm-global/bin to the SSH PATH prefix so agent install commands find globally-installed npm binaries without sudo. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove broken OpenRouter routing from Gemini CLI scripts Gemini CLI uses Google's native API format (/v1beta/models/:streamGenerateContent), not the OpenAI-compatible format (/v1/chat/completions). No base URL override can bridge this — the request formats are fundamentally incompatible. Same situation as Amazon Q (uses vendor-specific auth/API). Removed GEMINI_API_KEY and GOOGLE_GEMINI_BASE_URL from all 9 scripts + manifest. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: auto-install AWS CLI and gcloud SDK when missing Instead of printing manual install instructions and exiting, both CLIs now auto-install: - AWS: downloads official .pkg (macOS) or .zip (Linux) installer - GCP: uses brew cask on macOS, Google's tarball installer on Linux Falls back to manual instructions if auto-install fails. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: nanoclaw — install Docker on Linux, fix hardcoded /root/ path Two issues broke NanoClaw on all clouds: 1. .env upload hardcoded /root/nanoclaw/.env — fails on non-root clouds (AWS=ubuntu, GCP=user, OVH=ubuntu). Now uses upload_config_file with $HOME which expands on the remote side. 2. NanoClaw requires a container runtime. On Linux it uses Docker, but Docker was never installed. Added Docker install via get.docker.com to all cloud scripts (with sudo where SSH user is non-root). 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address security review findings from PR #1463 - Reject symlinked github-auth.sh before base64-encoding (falls back to remote URL) - Hide API key from process list using curl -K - instead of -H in verify_openrouter_key Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: quote OPENROUTER_API_KEY in cline auth to prevent command injection Unquoted variable in `cline auth -p openrouter -k ${OPENROUTER_API_KEY}` allows shell metacharacters in the key to execute arbitrary commands on the remote server. Wrapping in escaped double quotes prevents expansion. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 08:36:24 -05:00
Ahmed Abushagur	be904cbe1c	fix: install_agent double-escaping + github-auth reliability (#1460 ) * docs: add spawn delete command to README Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: harden openclaw across all clouds — validation, reliability, performance Fixes multiple issues causing openclaw to break on most clouds: Bugs fixed: - Double-prefixed model ID (openrouter/openrouter/auto) in config generation - AWS gateway starting without env vars (missing .zshrc source) - DigitalOcean sourcing .spawnrc instead of .zshrc for gateway - Destructive rm -rf ~/.openclaw on re-runs (now mkdir -p) Validation added: - API key checked against OpenRouter /auth/key endpoint with re-prompt on failure - Model ID verified against OpenRouter model list with re-prompt loop - openrouter/auto and openrouter/free bypass model check Reliability improvements: - Standardized gateway launch with </dev/null & disown across all 9 clouds - Gateway log auto-displayed on startup timeout for diagnostics - 2GB swap added to cloud-init to prevent OOM on small VMs - Portable install timeout (10 min) with macOS gtimeout fallback Performance: - Reordered spawn_agent: OAuth runs while VM provisions (saves 30-60s) - Fly.io: bumped to 2GB RAM + 2 shared CPUs for openclaw - Fly.io: tries bun first (faster), falls back to npm Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: skip sudo in gh install when running as root (Fly.io containers) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review — skip validation in tests, quote escaped cmd, escape model_id - verify_openrouter_key and verify_openrouter_model skip network calls when SPAWN_SKIP_API_VALIDATION, BUN_ENV=test, or NODE_ENV=test is set - install_agent timeout wrapper now quotes the escaped command for defense in depth - model_id in openclaw JSON now uses json_escape() for consistency Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove double-escaping in install_agent that 
broke shell operators install_agent() was wrapping commands with printf '%q' + bash -c before passing them to the run callback. But run callbacks (run_server, run_sprite, ssh_run_server) already handle escaping for remote transport. The double-escaping turned && || > | into literal characters, causing 'source' to treat the entire command as a single filename. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use local github-auth.sh instead of curling from main When running from a local checkout, base64-encode the local github-auth.sh and send it inline to the remote machine. This ensures fixes (like the sudo skip for root) take effect immediately without waiting for a merge to main. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle github-auth errors gracefully instead of terminating GitHub CLI setup is optional — failures should not abort the spawn session. Guard both run_callback calls in offer_github_auth with || log_warn so the script continues even if gh install fails. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use GOOGLE_GEMINI_BASE_URL to route Gemini CLI through OpenRouter Gemini CLI ignores OPENAI_BASE_URL — it uses GEMINI_API_KEY to talk directly to Google's API. The OpenRouter key is not a valid Google API key, so all requests fail with "API key not valid". Use GOOGLE_GEMINI_BASE_URL to redirect Gemini CLI to OpenRouter's endpoint. Fixes all 9 cloud gemini scripts + manifest.json. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: guard optional spawn_agent hooks so failures don't kill the session With set -eo pipefail, any unguarded failure terminates the script.
Several optional operations in spawn_agent were unguarded: - agent_configure: config file uploads (agent works with defaults) - agent_save_connection: convenience JSON for spawn list - agent_pre_launch: gateway daemons, startup hooks - agent_pre_provision: pre-provision prompts - .spawnrc shell hooks: hooking env vars into .bashrc/.zshrc These now log warnings and continue instead of aborting. Critical steps (cloud_authenticate, agent_install, cloud_provision) still exit on failure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 05:21:55 -05:00
Ahmed Abushagur	159ad49fec	fix: harden openclaw across all clouds (#1456 ) * docs: add spawn delete command to README Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: harden openclaw across all clouds — validation, reliability, performance Fixes multiple issues causing openclaw to break on most clouds: Bugs fixed: - Double-prefixed model ID (openrouter/openrouter/auto) in config generation - AWS gateway starting without env vars (missing .zshrc source) - DigitalOcean sourcing .spawnrc instead of .zshrc for gateway - Destructive rm -rf ~/.openclaw on re-runs (now mkdir -p) Validation added: - API key checked against OpenRouter /auth/key endpoint with re-prompt on failure - Model ID verified against OpenRouter model list with re-prompt loop - openrouter/auto and openrouter/free bypass model check Reliability improvements: - Standardized gateway launch with </dev/null & disown across all 9 clouds - Gateway log auto-displayed on startup timeout for diagnostics - 2GB swap added to cloud-init to prevent OOM on small VMs - Portable install timeout (10 min) with macOS gtimeout fallback Performance: - Reordered spawn_agent: OAuth runs while VM provisions (saves 30-60s) - Fly.io: bumped to 2GB RAM + 2 shared CPUs for openclaw - Fly.io: tries bun first (faster), falls back to npm Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: skip sudo in gh install when running as root (Fly.io containers) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review — skip validation in tests, quote escaped cmd, escape model_id - verify_openrouter_key and verify_openrouter_model skip network calls when SPAWN_SKIP_API_VALIDATION, BUN_ENV=test, or NODE_ENV=test is set - install_agent timeout wrapper now quotes the escaped command for defense in depth - model_id in openclaw JSON now uses json_escape() for consistency Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove double-escaping in install_agent that broke shell operators 
install_agent() was wrapping commands with printf '%q' + bash -c before passing them to the run callback. But run callbacks (run_server, run_sprite, ssh_run_server) already handle escaping for remote transport. The double-escaping turned && || > | into literal characters, causing 'source' to treat the entire command as a single filename. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-19 09:25:48 +00:00
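Note on 159ad49fec: the effect of escaping a command one time too many can be reproduced locally. The sketch below uses a plain `bash -c` as a stand-in for a run callback; the command string is illustrative. `printf '%q'` is a bash extension, so it is invoked through `bash -c` here to keep the sketch independent of the calling shell.

```bash
#!/usr/bin/env bash
cmd='echo first && echo second'

# One shell layer parses the raw string, so && acts as an operator.
bash -c "$cmd"

# %q backslash-escapes the spaces and the && so the whole command becomes
# one literal word. With only one shell layer left to consume the escapes,
# the shell looks for a program literally named "echo first && echo second".
escaped=$(bash -c 'printf "%q" "$1"' _ "$cmd")
printf 'escaped form: %s\n' "$escaped"

if bash -c "$escaped" 2>/dev/null; then
    echo "escaped form ran"
else
    echo "escaped form failed: operators were treated as literal characters"
fi
```

Escaping is only appropriate when exactly one more shell layer will unescape the string than will execute it, which is why the fix removed the extra `printf '%q'` rather than adding quotes.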
A	f3ffb6caed	fix: broken error message in multi-creds validation, predictable temp path (#1442 ) 1. _multi_creds_validate referenced undefined help_url variable, causing empty "Get new credentials from: " error messages when OVH credential validation fails. Added help_url as parameter and pass it from caller. 2. _spawn_inject_env_vars (used by 130+ agent scripts via spawn_agent) uploaded credentials to static /tmp/env_config path. The older inject_env_vars_ssh/inject_env_vars_cb functions document this as a symlink attack vector and use randomized paths. Fixed to match. 3. Removed dead inject_env_vars_fly and inject_env_vars_sprite functions (all agent scripts now use spawn_agent -> _spawn_inject_env_vars). Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-18 07:51:28 -05:00
Ahmed Abushagur	db4aaa0c73	fix: prevent SSH hangs, fix command escaping, pin Python 3.12 for aider (#1439 ) * fix: use uv --upgrade to ensure Python 3.13-compatible Pillow across all clouds aider-chat on Python 3.13 fails with `ImportError: cannot import name '_imaging' from 'PIL'` when an old Pillow version (pre-10.4) is resolved — those releases have no Python 3.13 binary wheels, so the C extension is missing at runtime. Replace `--with 'Pillow>=10.2.0'` (which was silently broken — the `>` and single quotes get mangled by `printf '%q'` in run_server before the command reaches the remote machine) with `--upgrade`, which forces all transitive deps including Pillow to their latest compatible versions. Also adds a plain-text echo before the install so users see progress instead of a silent hang during the 2-4 minute install. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test: update aider/gptme/interpreter assertions from pip to uv The install method for aider, gptme, and open-interpreter was changed from pip to `uv tool install` across all clouds. The mock test assertions still checked for the old `pip.install.` patterns, causing 9 failures (3 agents × 3 clouds). Update patterns to match the actual `uv tool install` commands now used in all cloud scripts. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * ci: trigger test run for uv assertion fix * fix: prevent SSH hangs, restore stderr, fix command escaping across clouds - Add < /dev/null to ssh_run_server and generic_ssh_wait to prevent SSH stdin theft causing sequential install/verify/configure steps to hang - Add ServerAliveInterval, ServerAliveCountMax, ConnectTimeout to default SSH_OPTS so long-running installs don't silently drop on flaky networks - Remove 2>/dev/null from Fly.io run_server so remote command errors are no longer silently swallowed (--quiet flag still suppresses flyctl noise) - Fix Fly.io printf '%q' double-quoting: remove extra quotes around $escaped_cmd that prevented the remote shell from consuming escapes, breaking && || | operators in commands - Remove broken printf '%q' from Daytona run_server and interactive_session where it escaped shell operators into literal characters since daytona exec has no intermediate shell layer - Pin aider to --python 3.12 instead of --with audioop-lts across all clouds Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add --pty to fly ssh console for interactive sessions fly ssh console -C does not allocate a pseudo-terminal by default, causing interactive TUI agents (aider, claude) to fail with "Input is not a terminal (fd=0)" or completely unresponsive input. Adding --pty forces PTY allocation, matching how other clouds handle interactive sessions (SSH uses -t, Sprite uses -tty). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 04:23:15 -05:00
Ahmed Abushagur	d9e6d058e0	fix: use uv --upgrade to ensure Python 3.13-compatible Pillow across all clouds (#1436 ) aider-chat on Python 3.13 fails with `ImportError: cannot import name '_imaging' from 'PIL'` when an old Pillow version (pre-10.4) is resolved — those releases have no Python 3.13 binary wheels, so the C extension is missing at runtime. Replace `--with 'Pillow>=10.2.0'` (which was silently broken — the `>` and single quotes get mangled by `printf '%q'` in run_server before the command reaches the remote machine) with `--upgrade`, which forces all transitive deps including Pillow to their latest compatible versions. Also adds a plain-text echo before the install so users see progress instead of a silent hang during the 2-4 minute install. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 03:21:59 -05:00
Ahmed Abushagur	633ce8eaac	feat: upgrade default server sizes, fix Fly.io agent installs, improve E2E tests (#1428 ) - Upgrade default VM sizes across clouds for better agent performance: - Hetzner: cpx11 → cx23 (with cx22 fallback support for deprecated types) - DigitalOcean: s-2vcpu-2gb → s-2vcpu-4gb - Daytona: 2048MB → 4096MB memory - Oracle: VM.Standard.E2.1.Micro → VM.Standard.A1.Flex - OVH: d2-2 → d2-4 - Fix Fly.io agent failures: - Add Node.js + build-essential to wait_for_cloud_init (fixes npm-based agents) - Prepend PATH in interactive_session (fixes "source not found" errors) - Fix openclaw installs across clouds: use explicit PATH export instead of source - Fix DigitalOcean token validation (check "uuid" not "id") - Fix AWS cloud-init: chown .bashrc/.zshrc to ubuntu user - Improve Hetzner fallback: add "cheapest available" as last-resort fallback - Upgrade E2E tests: per-combo auto-fix, credential collection, robustness fixes Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 22:17:08 -08:00
Ahmed Abushagur	963144ecbd	fix: use pipx to install Aider across all clouds (#1429 ) Ubuntu 24.04 blocks system-wide pip installs (PEP 668 externally-managed-environment). Switch all aider.sh scripts from `pip install aider-chat` to `python3 -m pip install pipx && pipx install aider-chat`, which installs into an isolated virtualenv and works on all target distros. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 22:16:39 -08:00
Ahmed Abushagur	22b6a402f4	feat: E2E test harness, QA pipeline integration, macOS compat linter (#1425 ) * feat: add QA upgrade — macOS compat linter, per-agent mock assertions Layer 1: macOS compat linter (test/macos-compat.sh) - 12 rules (MC001–MC012) catching bash 3.2 incompatibilities - Detects: base64 -w0 file args, non-portable echo flags, source <(), ((var++)), read -d, nounset flag, sed -i, date %N, local -n, declare -A, ${var,,}, and |& - Added to CI lint.yml in warn-only mode for burn-in - Integrated as Phase 0.5 in qa-dry-run.sh Layer 2: Per-agent mock assertions - test/fixtures/_shared_agent_assertions.sh with install checks for all 15 agents (claude, openclaw, aider, goose, etc.) - Integrated into test/mock.sh via _run_agent_assertions() Also includes branch fixes: - Fix base64 -w0 to use stdin redirect (aws, daytona, fly) - Fix fly/openclaw to use npm install instead of broken curl|bash Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add E2E test harness and integrate into QA pipeline Add test/e2e.sh — a full E2E test harness that provisions real servers, installs agents, and verifies setup across all clouds. 
Features: - Smoke test (one canary agent per cloud) and full matrix modes - Credential auto-detection for 8 clouds - Per-cloud preflight validation (sequential) then parallel agent tests - Stale server cleanup, timing history, cross-cloud comparison - Auto-fix and optimization phases via Claude agents - macOS bash 3.2 compatible Integrate E2E as Phase 5 in both qa-cycle.sh and qa-dry-run.sh: - Runs after mock tests pass, gated on cloud credentials - Phase 5b auto-fixes failures using per-agent worktree branches - Parses results and includes in QA summary Also fixes: - shared/common.sh: honour SPAWN_NON_INTERACTIVE=1 in safe_read() - aws/lib/common.sh: fix SSH key import (use cat instead of base64, handle race condition on concurrent imports) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:41:07 -05:00
A	3e13a213f1	security: fix command injection in fly/lib/common.sh bash -c invocations (#1423 ) Quote $escaped_cmd inside the -C argument to bash -c in run_server() and interactive_session() to prevent word splitting. Without quotes, even though printf '%q' escapes shell metacharacters, the shell still splits the escaped command on whitespace before passing it to bash -c, enabling potential argument injection. Fixes #1422 Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 19:35:23 -05:00
Ahmed Abushagur	2f1398f5b4	fix: use official curl installer for OpenClaw on Fly.io (#1391 ) * test: add mock test coverage for all 15 Fly.io agent scripts Fly.io had zero test coverage — every bug fixed this session (stale tokens, FlyV1 auth, name-taken failures, SSH hangs, PATH issues) went undetected. This adds the full mock test infrastructure: - test/fixtures/fly/ — env vars, API assertions, fixture JSONs for app creation, machine creation, and token validation endpoints - test/mock-curl-script.sh — URL stripping for api.machines.dev, body validation for machine creation, synthetic status responses, app creation POST handler, state tracking - test/mock.sh — mock fly/flyctl CLI binary (ssh console, auth token), URL stripping, required field validation, base64 mock - test/record.sh — Fly.io REST endpoints now recordable, live create+delete cycle, error detection, auth var mapping All 15 agent scripts (aider, claude, openclaw, etc.) are automatically discovered and tested: 75 passed, 0 failed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use official curl installer for OpenClaw on Fly.io bun install -g openclaw fails on Fly.io's bare Ubuntu image. Switch to the official installer (curl -fsSL https://openclaw.ai/install.sh | bash) which handles Node.js detection and dependency installation automatically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 06:29:32 -05:00
Ahmed Abushagur	14d36d1e1d	fix: Fly.io SSH reliability and app name UX (#1388 ) * fix: re-prompt on taken Fly.io app names + timeout run_server Two fixes for Fly.io UX: 1. When app name is globally taken by another user, re-prompt instead of failing. Returns exit code 2 from _fly_create_app so create_server can loop with a new name. 2. run_server now has a 5-minute timeout (portable, no coreutils needed) to prevent indefinite hangs like the 3-hour SSH session stall. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: wait for SSH before installing tools on Fly.io The previous wait_for_cloud_init immediately ran apt-get via fly ssh console on a machine that wasn't SSH-reachable yet, causing indefinite hangs. Now: 1. _fly_wait_for_ssh polls with a 30s-timeout echo until SSH responds 2. Shows progress at each step instead of suppressing all output 3. Each run_server call has an explicit timeout (10min for apt, 2min for bun, 30s for PATH exports) 4. Retries package install once on timeout Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: run fly ssh console in foreground, not background fly ssh console breaks when backgrounded with & — it needs a foreground process to establish the connection. Reverted to foreground execution and use timeout/gtimeout when available (Linux/CI). On macOS where timeout isn't available, the user can Ctrl+C hung commands. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: ensure bun PATH is available in non-interactive fly ssh sessions Ubuntu's default .bashrc returns early for non-interactive shells, so "source ~/.bashrc && bun install -g openclaw" silently fails — the PATH line at the bottom of .bashrc is never reached. Fix by prepending ~/.bun/bin to PATH in run_server() so all remote commands have access to tools installed during wait_for_cloud_init. Also fix spawn_agent to explicitly handle agent_install failure instead of relying on set -e (which exits silently). 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 05:54:34 -05:00
Ahmed Abushagur	999751537d	fix: validate saved tokens + handle FlyV1 auth scheme (#1386 ) * fix: validate saved API tokens before use Tokens loaded from config files (e.g. ~/.config/spawn/fly.json) were never validated, so expired or revoked tokens would silently pass through and only fail at the point of use (e.g. app creation). Now the provider's test function runs on config-file tokens too, falling through to a fresh prompt if validation fails. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle FlyV1 token auth scheme for Fly.io Machines API Fly.io dashboard tokens use the format "FlyV1 fm2_..." where "FlyV1" is the authorization scheme itself, not a Bearer token prefix. The script was always sending "Authorization: Bearer FlyV1 fm2_..." which the API rejects with "token validation error". Now detects FlyV1-prefixed tokens and sends them as "Authorization: FlyV1 fm2_..." using custom auth headers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: make refactor service actually run reliably Three fixes for the refactor workflow that was producing zero PRs: 1. community-coordinator: Gemini → Sonnet — Gemini doesn't support the Task tool, causing a respawn on every single cycle 2. Monitoring loop: replace "sleep 5" (which drifted to sleep 30) with explicit short-sleep instructions and CRITICAL rule that every turn must include a tool call to stay alive 3. Lifecycle management: explicit shutdown sequence with retry, preventing early exit that orphans teammates Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 04:31:46 -05:00
A	f412fb69bc	ux: wait for OpenClaw gateway to be ready before launching TUI (#1385 ) Fixes #1354 - users experienced a ~30s delay with "gateway not connected" errors when trying to use OpenClaw immediately after launch. Root cause: gateway takes time to bind to port 18789, but TUI launched after only 2 seconds. Solution: Add wait_for_openclaw_gateway() helper that polls the gateway port (max 30s) before launching TUI, ensuring immediate usability. Changes: - shared/common.sh: Add wait_for_openclaw_gateway() function - All openclaw.sh scripts (10 files): Replace sleep 2 with gateway readiness check Agent: ux-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 03:49:53 -05:00
A	8d533d3908	fix: add error handling for critical ID/IP extraction failures (#1323 ) Prevent silent failures when cloud API responses don't contain expected server/instance IDs or IPs. Without these checks, scripts would continue with empty variables, leading to cryptic failures downstream (e.g., "ssh root@" or API calls with empty IDs). Changes: - fly: Check FLY_MACHINE_ID after extraction, fail fast with clear error - ovh: Check OVH_INSTANCE_ID after extraction, fail fast with clear error - hetzner: Check HETZNER_SERVER_ID and HETZNER_SERVER_IP (+ null check for jq) - digitalocean: Check DO_DROPLET_ID after extraction, fail fast with clear error Impact: Improves reliability by catching API response parsing failures immediately rather than propagating empty values to SSH/API calls. Agent: code-health Co-authored-by: spawn-bot <bot@openrouter.ai> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 20:22:48 -05:00
Ahmed Abushagur	758b575658	feat: add server lifecycle management (reconnect + delete) (#1363 ) Wire up connection tracking across all 10 clouds so users can reconnect to and delete previously spawned servers via `spawn list` and `spawn delete`. Phase 1 - Connection tracking: - Extend save_vm_connection() with cloud and metadata params - Add save_vm_connection to create_server() in all cloud libs - Extend VMConnection with cloud, deleted, deleted_at, metadata fields Phase 2 - Delete via interactive picker: - Add "Delete this server" option to spawn list picker - Build delete scripts that reuse each cloud's destroy_server() - Confirmation UX with spinner feedback - Soft-delete marking in history (deleted records show [deleted]) Phase 3 - Standalone delete command: - spawn delete (aliases: rm, destroy) with interactive picker - Filter support: spawn delete -a <agent> -c <cloud> Also improves reconnect hints for Fly (fly ssh console) and Daytona (daytona ssh) connections. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 17:06:49 -08:00
L	55e6b2e88e	fix: use ~/.spawnrc for env vars instead of inlining into .bashrc (#1362 ) Ubuntu's default .bashrc has an interactive-shell guard that exits early in non-interactive contexts. When SSH runs a command string (ssh -t user@host -- "cmd"), the shell is non-interactive, so env vars appended to .bashrc are never loaded — causing Claude Code to start without OpenRouter credentials and get rejected. Fix: write env vars to ~/.spawnrc and have .bashrc/.zshrc source it. Launch commands source ~/.spawnrc directly, bypassing the guard. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 17:05:17 -08:00
A	ec81c74594	refactor: introduce cloud adapter + spawn_agent runner system (#1340 ) Eliminate ~70% boilerplate across 149 agent scripts by introducing a standard cloud_* adapter interface and spawn_agent orchestration runner. Each cloud's lib/common.sh now exports 7 adapter functions (cloud_authenticate, cloud_provision, cloud_wait_ready, cloud_run, cloud_upload, cloud_interactive, cloud_label) that wrap cloud-specific operations behind a uniform interface. Agent scripts define hooks (agent_install, agent_env_vars, agent_launch_cmd, etc.) and call `spawn_agent "Agent Name"` — the runner handles the full deployment flow: auth → provision → wait → install → API key → env → config → launch. - shared/common.sh: add spawn_agent(), _fn_exists(), _spawn_inject_env_vars() - 10 cloud lib/common.sh files: add cloud_* adapter functions - 149 agent scripts: rewrite to hook pattern (~40-80 lines → ~20-35 lines) - test/run.sh: update 2 sprite test patterns for new adapter paths - Net reduction: ~4,300 lines (2,257 added, 6,563 removed) Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 16:25:44 -08:00
A	392fbb7049	fix: source .bashrc in launch commands so env vars are available (#1325 ) Env vars (OPENROUTER_API_KEY, ANTHROPIC_BASE_URL, etc.) are written to ~/.bashrc by inject_env_vars_* functions, but launch commands only exported PATH inline — they never sourced .bashrc. This meant Claude started without API keys. Previously `source ~/.bashrc` was removed because fnm's eval corrupted PATH. fnm has been completely removed from the codebase, so it's now safe to source .bashrc again. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 15:00:24 -05:00
A	bcb59eb925	fix: stop sourcing rc files in launch command — fnm env destroys PATH (#1261 ) Root cause: the launch command did `source ~/.bashrc; source ~/.zshrc; claude`. The .zshrc contains `eval "$(fnm env)"` which outputs PATH with literal "$PATH" in quotes instead of expanding it, destroying the entire PATH. Confirmed via debugging: - `ssh -t ... 'export PATH=...; which claude'` → works (/root/.bun/bin/claude) - `ssh -t ... 'export PATH=...; source ~/.zshrc; which claude'` → "command not found" - `source ~/.zshrc; echo $PATH` → `"/run/user/0/fnm_multishells/...":"$PATH"` (broken) Fix: - Remove `source ~/.bashrc` and `source ~/.zshrc` from ALL launch commands - ssh -t creates a pseudo-terminal, so bash auto-sources .bashrc for env vars - Explicit PATH export is all we need for finding the claude binary - Remove fnm eval snippet from _finalize_claude_install (it poisoned rc files) - Also: clean up stale ~/.bash_profile, fix cloud-init PATH, move node install after bun attempt Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 01:06:55 -08:00
A	3030b1d036	fix: revert .profile writes, use explicit PATH in launch commands (#1260 ) Stop writing env vars to ~/.profile and ~/.bash_profile — only write to .bashrc and .zshrc. The .profile approach caused issues because login shells source it inconsistently across distros, and creating .bash_profile makes bash -l skip .profile entirely. Replace `bash -lc claude` launch commands with explicit PATH export + source pattern across all cloud providers. This ensures claude is found regardless of shell initialization quirks. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 00:43:49 -08:00
A	46e6f46008	fix: stop creating ~/.bash_profile — was destroying system PATH (#1258 ) On Ubuntu/Debian, ~/.bash_profile doesn't exist by default. When bash starts as a login shell (bash -l), it sources the FIRST file it finds from: ~/.bash_profile, ~/.bash_login, ~/.profile. Since only ~/.profile exists, that's what gets sourced — and ~/.profile sets up the standard PATH (/usr/bin, /bin, etc.) and sources ~/.bashrc. Our inject_env_vars_* functions and _finalize_claude_install were writing to ~/.bash_profile and ~/.zprofile (either via touch+append or via for-loop over all rc files). Creating ~/.bash_profile caused bash -l to source it INSTEAD of ~/.profile, completely losing the standard PATH setup. After deployment, even basic commands like `ls` would fail. Fix: Only write to ~/.profile, ~/.bashrc, ~/.zshrc across all clouds (shared, fly, sprite). These are the standard files that work correctly on all Linux distros without breaking the shell initialization chain. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 00:27:28 -08:00
A	99b21e2797	fix: write env config to all shell startup files including .bash_profile (#1251 ) Root cause: bash -l sources the FIRST of ~/.bash_profile, ~/.bash_login, ~/.profile. If ~/.bash_profile exists (e.g. from cloud-init), ~/.profile is never read and our claude PATH exports are invisible. Additionally, .bashrc has a non-interactive guard that skips exports when sourced from non-interactive shells like `ssh host "cmd"` or `bash -lc`. Fix: write env config and PATH entries to ALL shell startup files: ~/.profile, ~/.bash_profile, ~/.bashrc, ~/.zshrc, ~/.zprofile. This ensures both login and interactive shells on any platform find claude. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 00:04:36 -08:00
A	dac4c62d6c	fix: try bun before npm for Claude Code install, fix PATH in launch (#1249 ) Two fixes: 1. Swap fallback order from curl → npm → bun to curl → bun → npm. Bun is faster and typically pre-installed. Use `bun i -g`. 2. Fix "claude: command not found" at launch. The default .bashrc has a non-interactive guard (`case $- in i) ;; *) return;; esac`) that skips PATH exports when sourced from SSH command strings. Fix: write env config to ~/.profile (always sourced by login shells) in addition to .bashrc/.zshrc, and launch with `bash -lc claude` which starts a login shell that sources ~/.profile. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-15 23:44:02 -08:00
A	db06ff84e0	fix: run claude install --force and persist fnm PATH to shell configs (#1245 ) After installing Claude Code (via any method), run `claude install --force` to set up shell integration, then ensure fnm bootstrap is persisted to both .bashrc and .zshrc so interactive sessions can find node. Also simplify all launch commands across 9 clouds: instead of hardcoding PATH entries that may miss fnm, source the rc files which now contain all the necessary PATH entries from both inject_env_vars and _finalize_claude_install. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-15 23:34:09 -08:00
A	6357e0b2d1	fix: ask GitHub CLI setup before provisioning, not after (#1243 ) Previously offer_github_auth prompted interactively inside inject_env_vars_*, which runs after the server is already provisioned. This means the user sits through provisioning before being asked a simple yes/no question. Split into two phases: - prompt_github_auth: asks the question early (before create_server) - offer_github_auth: executes the install later (after server is up), using the stored answer without re-prompting Falls back to interactive prompt if prompt_github_auth was never called, so non-claude scripts and older clouds keep working unchanged. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-15 23:20:59 -08:00
A	d0847986f8	fix: use shared install_claude_code across all clouds with fnm PATH fix (#1242 ) All cloud claude.sh scripts had inline curl-only installs with no fallback. When the curl installer failed (transient outage, rate limit), installation failed with no recovery. Additionally, fnm-installed Node.js was invisible to subsequent SSH sessions because each SSH command runs in a non-interactive shell that doesn't source .bashrc/.zshrc. Changes: - Migrate 8 cloud scripts to use shared install_claude_code (curl → npm → bun) - Move _ensure_node_runtime before npm/bun install attempts (not after) - Add fnm paths to claude_path so node is discoverable across SSH sessions - Prefix npm/bun install commands with claude_path for PATH visibility - Update test assertion to match new install_claude_code behavior Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-15 23:16:23 -08:00
L	d8ac64863d	fix: inject env vars into both .bashrc and .zshrc, fix PATH across all clouds (#1213 ) API keys and env vars were only written to .zshrc, so SSH sessions using bash couldn't find credentials. Also fixes incorrect ~/.claude/local/bin PATH (claude installs to ~/.local/bin) and syncs interactive_session PATH with cloud-init PATH across all 9 clouds. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-15 17:30:40 -08:00
A	9336998168	fix(ux): add post-session summary to 10 exec-based cloud providers (#1056 ) Users on exec-based clouds (Fly, Render, Koyeb, Northflank, Railway, Modal, Daytona, E2B, CodeSandbox, GitHub Codespaces) got no warning when their session ended that their service was still running and incurring charges. This adds: - _show_exec_post_session_summary() in shared/common.sh for non-SSH providers that use CLI exec commands instead of direct SSH - SPAWN_DASHBOARD_URL for all 10 exec-based clouds so users get actionable dashboard links - Post-session summary calls in each cloud's interactive_session() - 33 new tests covering the exec post-session summary feature Agent: ux-engineer Co-authored-by: A <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-14 00:38:10 -05:00
A	f586e19790	fix(security): replace unquoted heredocs with printf to prevent shell expansion in API keys (#1031 ) Unquoted `<< EOF` heredocs in nanoclaw .env file creation cause shell expansion of the API key value. If an API key contains `$`, backticks, or `\`, the value is silently corrupted or could trigger command execution. Replace with `printf '%s'` which safely writes the value without interpretation. Also fix unquoted variable expansion in upload_config_file's mv command and the github-codespaces/openclaw.sh config heredoc. Fixes 34 scripts across all cloud providers. Agent: security-auditor Co-authored-by: A <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-13 19:41:10 -05:00
A	d2fbd325b0	refactor: decompose fly get_server_name and oracle _setup_vcn_networking (#1000 ) - fly/lib/common.sh: Replace 23-line get_server_name() that duplicated env-var-check, prompt, and validation logic with a one-line call to the shared get_validated_server_name helper, matching all other cloud providers. - oracle/lib/common.sh: Break _setup_vcn_networking (48 lines, 3 distinct responsibilities) into focused helpers: - _create_internet_gateway: creates the IGW resource - _add_default_route: configures the route table - _add_ssh_security_rules: opens SSH port in the security list The orchestrator _setup_vcn_networking now delegates to these three helpers. Agent: complexity-hunter Co-authored-by: A <6723574+louisgv@users.noreply.github.com>	2026-02-13 12:57:11 -08:00
A	a0f6b335a4	fix: harden upload_file path validation with strict allowlist regex across 10 clouds (#993 ) Replace fragile blocklist validation and printf '%q' escaping in upload_file() with strict allowlist regex [a-zA-Z0-9/_.~-]+ across all non-SSH cloud providers. For codesandbox, additionally migrate from shell command interpolation to SDK filesystem API via environment variables, eliminating the injection surface entirely. Affected clouds: codesandbox, daytona, e2b, fly, koyeb, modal, northflank, railway, render, sprite Fixes #989 Agent: security-auditor Co-authored-by: A <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-13 12:20:40 -08:00
A	0f60a2b082	fix: add actionable guidance to agent installation failures across 126 scripts (#966 ) Add log_install_failed helper to shared/common.sh that provides structured troubleshooting for agent install failures: possible causes, SSH debug command (when server IP available), manual install command, and re-run suggestion. Also improve SSH key registration error message. Agent: ux-engineer Co-authored-by: A <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-13 10:14:03 -08:00
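
Note on commit a0f6b335a4 (strict allowlist for upload_file paths): the sketch below illustrates the general technique that commit message describes, namely accepting a remote path only if every character belongs to the set [a-zA-Z0-9/_.~-]. It is not the repository's code. The function name validate_upload_path and the error text are hypothetical, chosen for illustration; only the character class comes from the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and message are assumptions, not from the repo).
# Accepts a path only if it consists entirely of allowlisted characters.
validate_upload_path() {
  local path="$1"
  # Character class quoted from the commit message, anchored at both ends
  # so the WHOLE string must match, not just a substring.
  local allowed='^[a-zA-Z0-9/_.~-]+$'
  if [[ "$path" =~ $allowed ]]; then
    return 0
  fi
  printf 'Invalid remote path: %s\n' "$path" >&2
  return 1
}
```

A caller would typically check the path before building any remote command, for example `validate_upload_path "$remote_path" || return 1`. Two caveats under these assumptions: the pattern rejects spaces, quotes, `$`, `;` and backticks, so such a path can be interpolated without relying on `printf '%q'`, but it still permits `..` segments, so it is not a directory-traversal guard on its own.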

80 commits