spawn

vrr/spawn

mirror of https://github.com/OpenRouterTeam/spawn.git synced 2026-04-28 11:59:29 +00:00

Author	SHA1	Message	Date
A	e9cbab5b7f	fix(sprite): add retry for list failures, increase timeout, refresh auth on expiry (#2936 ) Three fixes for Sprite E2E failures in long-running batches (73+ min): 1. Retry `_sprite_provision_verify`: list failures now retry 3x with exponential backoff (5s, 10s, 20s) instead of failing immediately. Fixes kilocode batch 6 "Could not list Sprite instances" errors. 2. Increase `CREATE_TIMEOUT_SECS` default from 300s to 600s and add `Client.Timeout`, `request canceled`, and `authentication failed` to the transient error retry pattern in `spriteRetry`. Also uses linear backoff (3s * attempt) instead of fixed 3s delay. Fixes hermes batch 7 HTTP timeout errors. 3. Add `_sprite_refresh_auth` + `cloud_refresh_auth` interface. The E2E orchestrator calls `cloud_refresh_auth` before each provisioning batch. For Sprite, this re-validates the token via `sprite org list` and attempts `sprite auth refresh` if expired. Fixes junie batch 8 "authentication failed" errors. Fixes #2934 Agent: ux-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-23 21:47:58 -07:00
A	3630c07c70	fix(e2e): add per-agent timeout to prevent silent hangs in E2E runs (#2720 ) The E2E framework's run_single_agent function had no overall timeout. When provision/verify/input_test steps hung (e.g. cloud_exec blocking on sprite-zeroclaw or digitalocean-opencode), the process would stall indefinitely without writing a .result file, causing silent test failures. Add a per-agent wall-clock timeout (default 1800s, 2400s for junie) that wraps the core provision/verify/input_test logic in a killable subshell. If the timeout expires, the subshell is killed and a "fail" result is written, ensuring E2E batches always complete. Fixes #2714 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-17 13:16:09 -07:00
A	8fe6450485	fix(e2e): increase provision timeout for junie on hetzner (#2683 ) * fix(e2e): increase provision timeout for junie on hetzner junie's install takes >720s on Hetzner, exceeding the default PROVISION_TIMEOUT and causing 100% E2E failure for hetzner-junie. Add a per-agent provision timeout mechanism in common.sh via get_provision_timeout(). This checks (in order): 1. PROVISION_TIMEOUT_<agent> env var override 2. Built-in per-agent default (_PROVISION_TIMEOUT_junie=1200) 3. Global PROVISION_TIMEOUT (720s) provision.sh now calls get_provision_timeout() to resolve the effective timeout per agent instead of using the flat global. Fixes #2680 Agent: code-health Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(security): whitelist-sanitize agent name before eval in get_provision_timeout tr '-' '_' only replaced hyphens, allowing metacharacters like $, backticks, and ; to pass through into eval, enabling shell injection via a crafted agent name. Replace with sed whitelist [A-Za-z0-9_] to strip all unsafe chars. Agent: team-lead Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-16 00:54:03 -07:00
A	68abbee4df	fix(e2e): fix OPENROUTER_API_KEY fallback and sprite env whitelist (#2491 ) On QA VMs running Claude Code via OpenRouter, the API key is stored as ANTHROPIC_AUTH_TOKEN. Add a fallback in common.sh so e2e.sh picks up the key from ANTHROPIC_AUTH_TOKEN when ANTHROPIC_BASE_URL points to openrouter.ai and OPENROUTER_API_KEY is unset. Also add SPRITE_NAME and SPRITE_ORG to the headless env var whitelist in provision.sh — these are emitted by _sprite_headless_env() but were missing from the positive whitelist, causing every Sprite provisioning attempt to log errors and silently skip the env vars. Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 03:23:46 -04:00
A	a22fe9010c	fix: safe printf format strings and document e2e source usage (#2445 ) install.sh: Replace color variable interpolation in printf format strings with %b arguments to prevent format string injection (fixes #2443). common.sh: Use %b for color escapes in logging functions. Document that BASH_SOURCE and source usage in load_cloud_driver is intentional since e2e scripts are filesystem-only, not curl|bash (fixes #2438). Agent: ux-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 12:28:45 -04:00
A	3724bb8ba4	fix: address SSH command injection risks in e2e cloud drivers (#2447 ) Add defense-in-depth validation across all e2e cloud driver scripts: - Validate IP addresses match IPv4 format before use in SSH commands (aws, digitalocean, gcp, hetzner) - Validate SSH username contains only safe characters (gcp) - Validate resource IDs are numeric before interpolating into API URLs (digitalocean droplet IDs, hetzner server IDs) - URL-encode app name in Hetzner API query parameter to prevent query parameter injection - Validate numeric env vars (INPUT_TEST_TIMEOUT, PROVISION_TIMEOUT, INSTALL_WAIT) that get interpolated into remote command strings Fixes #2432, #2433, #2434, #2435, #2442 Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 12:27:47 -04:00
A	c4ae16849d	refactor: remove dead cloud_exec_long and _*_exec_long functions (#2407 ) The cloud_exec_long dispatcher in common.sh and all five cloud-specific _exec_long implementations (aws, digitalocean, gcp, hetzner, sprite) were defined but never called by any code in the e2e test suite. Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 19:39:53 -07:00
A	23fea2df21	fix(e2e): add junie agent to E2E test harness (#2314 ) The junie agent was added in #2300 but the E2E test scripts were not updated. This adds junie to ALL_AGENTS, verify dispatch, input test dispatch, and the provision.sh fallback env configuration. Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-08 00:03:32 -05:00
Ahmed Abushagur	4a90abdaa2	fix(e2e): improve openclaw reliability on AWS and other clouds (#2123 ) * fix(e2e): improve openclaw reliability on AWS and other clouds Three changes to make openclaw e2e tests more robust: 1. Increase PROVISION_TIMEOUT from 480s to 720s — AWS cloud-init for "full" tier (Node.js + Bun + build-essential) can exceed 480s, causing the CLI to be killed before .spawnrc is written. 2. Add .spawnrc manual fallback in provision.sh — if the CLI is killed before writing .spawnrc, construct it via SSH using OPENROUTER_API_KEY with agent-specific env vars (openclaw, zeroclaw). 3. Add retry logic to openclaw gateway input test — the gateway can crash with 1006 websocket closure on resource-constrained instances. Now retries once after killing and restarting the gateway process. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(security): fix command injection in e2e provision scripts - Use printf %q and temp file for api_key handling in provision.sh to prevent shell metachar injection (single quotes, backticks, $) - Double-quote env_b64 interpolation in cloud_exec call to prevent word splitting - Replace echo with printf in bashrc append to avoid portability issues - Replace overbroad pkill -f 'openclaw gateway' in verify.sh with PID-targeted kill via lsof/fuser on port 18789 Agent: pr-maintainer Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: B <6723574+louisgv@users.noreply.github.com>	2026-03-02 23:19:34 -05:00
A	277c4236a3	fix(security): replace eval with direct indirection in load_cloud_driver (#2121 ) Removes eval-based function creation pattern in e2e/lib/common.sh. Uses variable indirection (ACTIVE_CLOUD global + wrapper functions) instead of eval to reduce attack surface. Fixes #2118 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-02 16:50:27 -05:00
A	d713f9650f	feat: add hermes agent to 4 clouds, bump install wait to 600s (#2084 ) - Add hermes shim scripts for GCP, Hetzner, DigitalOcean, and Daytona - Update manifest.json matrix entries from "missing" to "implemented" - Bump default INSTALL_WAIT from 300s to 600s to fix zeroclaw timeout on small VMs where Rust compilation takes 8-12 minutes - Update cloud READMEs with hermes usage docs - Bump CLI version to 0.11.18 Co-authored-by: Ahmed Abushagur <ahmed@abushagur.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 19:31:50 -05:00
A	ab08476a63	refactor: Remove dead code and stale references (#2028 ) - Add `hermes` to ALL_AGENTS in sh/e2e/lib/common.sh (stale: hermes added to manifest.json in #2023 but never added to the e2e agent list) - Add verify_hermes() and input_test_hermes() to sh/e2e/lib/verify.sh and wire them into verify_agent/run_input_test dispatch tables - Remove dead log_warn() from sh/shared/github-auth.sh (defined but never called) - Remove dead get_cloud_env_vars() from sh/shared/key-request.sh (no callers outside file) - Remove dead invalidate_cloud_key() from sh/shared/key-request.sh (no callers anywhere) Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-02-28 12:55:09 -08:00
Ahmed Abushagur	c1e605c884	fix(e2e): increase server sizes and install timeouts (#2014 ) E2E tests were failing because agent installs didn't complete within the default 120s timeout, and small VMs ran out of memory during builds. - INSTALL_WAIT: 120s → 300s (with per-cloud override via cloud_install_wait) - AWS: nano_3_0 → medium_3_0 (all agents need 4GB for reliable installs) - DigitalOcean: s-1vcpu-512mb-10gb → s-2vcpu-2gb, cap at 3 parallel - GCP: e2-medium → e2-standard-2 - Hetzner: cap at 5 parallel (primary IP limit) - Sprite: 300s install wait (slower exec than SSH) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-02-28 00:25:36 -08:00
Ahmed Abushagur	c595c90dc4	fix(e2e): prevent multi-cloud name and file collisions (#2013 ) When multiple clouds run in parallel, they generate the same app name (e.g. e2e-claude-TIMESTAMP) and write to the same temp files (.exit/.stdout/.stderr), causing data corruption. - Include ACTIVE_CLOUD in make_app_name: e2e-gcp-claude-TIMESTAMP - Use ${app_name} instead of ${agent} for provision temp files Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-02-28 03:12:05 -05:00
Ahmed Abushagur	627026a26b	feat(e2e): multi-cloud test suite with cloud driver pattern (#2004 ) * feat(e2e): multi-cloud test suite with cloud driver pattern Scale the E2E test suite from AWS-only to all 6 infrastructure clouds (aws, hetzner, digitalocean, gcp, daytona, sprite) with parallel execution support. Architecture: - Cloud driver pattern: each cloud implements _cloudname_func() functions - load_cloud_driver() wires cloud-specific functions to generic names (cloud_exec, cloud_teardown, etc.) - Shared orchestration stays in one place, cloud details are isolated New files: - sh/e2e/e2e.sh — unified entry point with --cloud flag - sh/e2e/lib/clouds/{aws,hetzner,digitalocean,gcp,daytona,sprite}.sh Refactored: - common.sh — removed AWS constants, added load_cloud_driver() - provision.sh — cloud-agnostic via cloud_headless_env/cloud_provision_verify - verify.sh — replaced aws_ssh with cloud_exec/cloud_exec_long - teardown.sh/cleanup.sh — delegate to cloud driver functions - aws-e2e.sh — thin wrapper: exec e2e.sh --cloud aws Usage: e2e.sh --cloud aws # Single cloud e2e.sh --cloud aws --cloud hetzner # Multiple clouds in parallel e2e.sh --cloud all --parallel 3 # All clouds, 3 agents parallel Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(e2e): prevent subshell EXIT trap inheritance and single-cloud early exit - Reset EXIT trap in multi-cloud subshells to prevent LOG_DIR deletion before the main process reads log files - Use `|| true` for single-cloud run_agents_for_cloud to prevent set -e from skipping the summary on env validation failure Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: default to parallel agent provisioning in e2e tests All agents within a cloud now run in parallel by default instead of sequentially. Use --sequential to restore the old behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: cap sprite parallelism, 4GB for openclaw, remove stderr suppression - Sprite: add _sprite_max_parallel (cap 2 concurrent agents) to avoid CLI rate limiting that caused all 6 agents to fail - AWS: use medium_3_0 (4GB) bundle for openclaw which needs more RAM - Input tests: remove 2>/dev/null from agent commands so failures produce visible error output instead of empty responses - Add cloud_max_parallel to driver interface, respected by e2e.sh Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use bash instead of sh for exec_long across all cloud drivers Ubuntu's /bin/sh is dash, which doesn't support bash-specific PATH sourcing from .spawnrc/.cargo/env. This caused codex and zeroclaw input tests to fail with "command not found" even though verify passed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: codex input test uses positional prompt, not -q flag codex CLI takes prompt as positional arg: `codex "PROMPT"`. The -q flag doesn't exist, causing "Usage:" error output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use codex exec -q for non-interactive input test codex requires `exec` subcommand for non-interactive mode. Plain `codex PROMPT` expects a TTY (stdin is not a terminal). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: codex exec takes no -q flag, just positional prompt Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use cx23 instead of deprecated cx22 for Hetzner e2e tests Hetzner deprecated server type cx22 (ID 104). The default now uses cx23. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-02-27 19:28:08 -08:00
A	d04096a15b	feat!: remove Fly.io cloud provider support (#1979 ) * feat!: remove Fly.io cloud provider support Drop Fly.io as a supported cloud provider. Sprite (which uses Fly.io infrastructure internally) is retained. - Delete packages/cli/src/fly/ module, sh/fly/ scripts, fixtures/fly/ - Remove fly cloud entry and 6 fly matrix entries from manifest.json - Remove fly imports, destroy cases, and connection handlers from commands.ts - Remove fly-ssh sentinel from security.ts - Port E2E test suite from Fly.io to AWS Lightsail (fly-e2e.sh → aws-e2e.sh) - Update README (7 clouds, 42 combinations), CLAUDE.md, and skill prompts - Clean up fly references in build config, gitignore, icon sources - Bump CLI version to 0.11.0 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: restore Docker image build under sh/docker/ Move openclaw Dockerfile from sh/fly/docker/ to sh/docker/ and rename workflow from fly-docker.yml to docker.yml with updated paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: fix extra blank lines in commands.ts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: spawn-bot <spawn-bot@openrouter.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-02-27 00:06:32 -05:00
A	4994c28594	fix(security): harden shell scripts - fix sed portability, curl HTTPS enforcement, token expiry (#1917 ) - MEDIUM: Validate flyctl auth status before empty FLY_API_TOKEN fallback in provision.sh (fail fast instead of silent failure) - LOW: Fix sed -i portability in qa.sh (use sed -i.bak for macOS compat) - LOW: Increase FLY_API_TOKEN expiry from 2h to 8h in common.sh - LOW: Add --proto '=https' to all curl -L calls in digitalocean scripts (6 files) to prevent HTTP downgrade on redirects Fixes #1913 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-25 03:23:32 -08:00
A	154112fb41	feat: add live input/output E2E verification for agents (#1886 ) * feat: add live input/output E2E verification for agents The E2E suite previously only verified static artifacts (binaries, config files, env vars). An agent with a broken API key or crash-on-launch bug would pass all checks. This adds an input test phase that sends a real prompt to each agent and verifies the response contains a marker string. - Add fly_ssh_long() with configurable timeout for long-running commands - Add per-agent input test functions (claude -p, codex -q, openclaw -p, zeroclaw agent -p; opencode/kilocode skip as TUI-only) - Add run_input_test() dispatcher with SKIP_INPUT_TEST env var support - Add --skip-input-test CLI flag to fly-e2e.sh - Chain input test after verify in run_single_agent() pipeline - Add INPUT_TEST_TIMEOUT constant (default 120s, env-overridable) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: format p.text({ message }) to multi-line for biome Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: spawn-bot <spawn-bot@openrouter.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 15:16:30 -05:00
A	b84adfb74e	refactor: move all shell scripts to /sh directory (#1843 ) Reorganizes the project so all shell scripts live under a dedicated /sh directory, enabling the OpenRouter rewrite URL to point at /sh/ instead of the repository root. Moves: - cli/install.sh → sh/cli/install.sh - shared/*.sh → sh/shared/*.sh - {cloud}/{agent}.sh → sh/{cloud}/{agent}.sh (48 scripts) - {cloud}/README.md → sh/{cloud}/README.md - e2e/*.sh → sh/e2e/*.sh - test/macos-compat.sh → sh/test/macos-compat.sh - test/fixtures/*/*.sh → sh/test/fixtures/*/*.sh Updates all references: - RAW_BASE path construction in commands.ts, update-check.ts - GitHub auth URL in agent-setup.ts - Self-referencing URLs in install.sh, github-auth.sh - CI workflow paths in lint.yml, cli-release.yml - Test file paths in install-script-validation, manifest-integrity - Documentation in README.md, cli/README.md, CLAUDE.md - QA scripts in .claude/skills/ Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-23 21:14:54 -08:00

19 commits
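
Commit 8fe6450485 describes a three-step lookup for the provision timeout (per-agent env override, then a built-in per-agent default, then the global PROVISION_TIMEOUT), with the agent name whitelist-sanitized first. The sketch below is a hypothetical illustration of that lookup order, not the code in common.sh. The variable names and the 720/1200 values come from the commit message. The function body is an assumption, and it swaps the eval the commit mentions for bash indirect expansion and the sed whitelist for parameter expansion.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the lookup order from commit 8fe6450485.
# Not the repository's implementation: it uses ${!name} indirection
# rather than eval, so the agent name is never executed as code.

# Global default (720s per the commit message), overridable from the environment.
PROVISION_TIMEOUT="${PROVISION_TIMEOUT:-720}"

# Built-in per-agent default named in the commit message.
_PROVISION_TIMEOUT_junie=1200

get_provision_timeout() {
  local agent="$1" safe override_var builtin_var

  # Map hyphens to underscores, then drop every character outside
  # the whitelist [A-Za-z0-9_].
  safe="${agent//-/_}"
  safe="${safe//[^A-Za-z0-9_]/}"

  override_var="PROVISION_TIMEOUT_${safe}"
  builtin_var="_PROVISION_TIMEOUT_${safe}"

  if [ -n "${!override_var:-}" ]; then
    # 1. Per-agent env var override
    printf '%s\n' "${!override_var}"
  elif [ -n "${!builtin_var:-}" ]; then
    # 2. Built-in per-agent default
    printf '%s\n' "${!builtin_var}"
  else
    # 3. Global fallback
    printf '%s\n' "$PROVISION_TIMEOUT"
  fi
}
```

Under these assumptions, `get_provision_timeout junie` resolves to the built-in default unless `PROVISION_TIMEOUT_junie` is set, and a crafted name such as `x;$(id)` is reduced to `xid` before any variable lookup happens.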