vrr/spawn

mirror of https://github.com/OpenRouterTeam/spawn.git synced 2026-05-21 18:52:56 +00:00

Author	SHA1	Message	Date
A	1cfa9ca1a7	fix(cursor): update binary path from ~/.cursor/bin to ~/.local/bin (#3058 ) The cursor installer changed its binary install location from ~/.cursor/bin/agent to ~/.local/bin/agent (as of 2026-03-25 release). Updates: - agent-setup.ts: fix PATH in install, launchCmd, updateCmd, and the pathScript written to ~/.bashrc/~/.zshrc - verify.sh: fix E2E binary check to look in ~/.local/bin first - Bump CLI to 0.27.3 -- qa/e2e-tester Co-authored-by: spawn-qa-bot <qa@openrouter.ai>	2026-03-27 02:37:40 -07:00
Ahmed Abushagur	dcb740ec68	ci: add cursor agent to Docker image pipeline (#3051 ) Adds cursor.Dockerfile and includes cursor in the docker.yml matrix so nightly builds produce ghcr.io/openrouterteam/spawn-cursor:latest. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-27 13:41:27 +07:00
A	088e33b30e	fix(e2e): correct stale test expectation for hermes timeout fallback (#3044 ) When AGENT_TIMEOUT_hermes is non-numeric, get_agent_timeout() skips the env var and uses the built-in _AGENT_TIMEOUT_hermes=3600, NOT the global AGENT_TIMEOUT=1800. The test expected ${AGENT_TIMEOUT} (1800) but the function correctly returns 3600 (hermes built-in default). This test was failing silently, masking the correct behavior. Also filed OpenRouterTeam/spawn#3042 for cursor missing from e2e framework. Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-26 19:02:23 -07:00
A	1c8011cae5	fix(e2e): add cursor agent to e2e test framework (#3045 ) Add cursor to ALL_AGENTS, verify_cursor, input_test_cursor, and their dispatch cases so e2e sweeps cover the cursor agent. Fixes #3042 Agent: issue-fixer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-27 08:40:51 +07:00
A	499eb494c6	fix(security): use StrictHostKeyChecking=accept-new in all SSH connections (#3037 ) Replace StrictHostKeyChecking=no with accept-new across all E2E cloud drivers (aws, gcp, digitalocean, hetzner), the shared SSH_BASE_OPTS constant, and pull-history.ts. accept-new trusts new hosts on first connection (needed for freshly provisioned VMs) but verifies on subsequent connections, preventing MITM attacks on reconnect. Fixes #3031 Agent: style-reviewer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-26 18:04:40 -07:00
A	917d34d034	fix(e2e): ensure openclaw binary available in --fast mode on Sprite (#3040 ) * fix(e2e): ensure agent binary available after spawnrc fallback When the provision timeout kills the CLI before agent install completes (common in --fast mode on Sprite), the manual .spawnrc fallback creates credentials but does not verify the agent binary is present. This causes "openclaw not found" failures in E2E verification. Add _ensure_agent_binary() that runs after the manual .spawnrc fallback: 1. Checks if the agent binary exists on the remote VM 2. If missing, runs the agent's install command directly 3. Verifies the binary is available after install Also adds cursor agent to the env vars fallback and binary check. Fixes #3028 Agent: ux-engineer Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(security): add --proto '=https' to cursor install curl command Agent: pr-maintainer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-27 07:36:45 +07:00
A	7080d80472	fix(security): prevent race condition in GitHub token file permissions (#3035 ) Before this change, gh auth login wrote the token file with default permissions, and chmod 600 was applied afterward — leaving a window where the file could be read by other users on multi-user systems. Now the credential directory is created with 700 permissions and umask is set to 077 before the write, so the token file is created with restrictive permissions from the start. Agent: complexity-hunter Fixes #3030 Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-26 16:59:42 -07:00
A	aafdb8655f	fix(security): pipe encoded commands via stdin in GCP/AWS exec functions (#3036 ) Replace shell interpolation of base64-encoded commands in SSH invocations with stdin piping. Previously the encoded command was interpolated into the remote shell string; now it is passed via stdin to `base64 -d | bash`, making the approach structurally immune to command injection regardless of the encoded content. Fixes #3029 Fixes #3022 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-27 06:11:50 +07:00
Ahmed Abushagur	c61736e511	feat: add Cursor CLI agent across all clouds (#3018 ) * feat: add Cursor CLI agent across all clouds Adds Cursor's terminal-based AI coding agent (the `agent` command from cursor.com/cli) to the spawn matrix. Routes LLM requests through OpenRouter via --endpoint flag and CURSOR_API_KEY env var. - manifest.json: new cursor agent entry + all 6 cloud matrix entries - agent-setup.ts: install, configure, launch, and update definitions - Shell scripts for all 6 clouds (local, hetzner, aws, do, gcp, sprite) - Config: writes ~/.cursor/cli-config.json with full permissions - Icon: cursor.png from cursor.com/apple-touch-icon.png - All cloud READMEs updated with cursor.sh usage - CLI version bumped to 0.26.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add spawn skill injection for Cursor CLI Writes a .cursor/rules/spawn.mdc rule file with alwaysApply: true during setup, teaching the Cursor agent how to use the spawn CLI to provision child cloud VMs. Uses the same base64 upload pattern as other agent config files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Signed-off-by: Ahmed Abushagur <ahmed@abushagur.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: A <258483684+la14-1@users.noreply.github.com>	2026-03-26 13:53:49 -07:00
A	255ffbf8b7	fix(security): use grep -F for literal string matching in PATH checks (#3021 ) Fixes #3019 Replace `grep -qx` with `grep -qxF` in the `ensure_in_path` function to prevent regex pattern injection. Without -F, attacker-controlled SPAWN_INSTALL_DIR or BUN_INSTALL env vars containing regex metacharacters (e.g. `/.*`) could cause false positive/negative PATH matches, potentially bypassing the symlink creation logic. Agent: issue-fixer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-27 02:56:07 +07:00
A	defca448b0	fix(e2e): load GCP_ZONE from ~/.config/spawn/gcp.json in E2E driver (#3017 ) The GCP E2E cloud driver defaulted to us-central1-a when GCP_ZONE was not set in the environment. The QA VM stores zone config in ~/.config/spawn/gcp.json (alongside GCP_PROJECT) but _gcp_validate_env only read GCP_PROJECT from the environment — it never loaded GCP_ZONE. This caused E2E failures when us-central1-a had insufficient resources: 3 agents (openclaw, opencode, kilocode) failed with "SSH port never opened" because GCP couldn't provision instances in that zone. Fix: load both GCP_PROJECT and GCP_ZONE from the config file in _gcp_validate_env when they are not already set in the environment, matching how key-request.sh loads GCP_PROJECT for provisioning. Verified: all 3 previously failing agents now pass on europe-west1-b. Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-27 01:27:46 +07:00
A	988f5bb7a9	fix(security): validate bun path before symlinking in install.sh (fixes #3009 ) (#3011 ) Add allowlist validation for the bun binary path resolved via `command -v bun` before using it in symlink operations that may run with sudo privileges. If bun is found at an unexpected location, skip the symlink and warn the user. This prevents a privilege escalation attack where a malicious binary on PATH could be symlinked to /usr/local/bin/bun with elevated privileges. Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-26 05:37:45 -07:00
A	463b8398f2	fix: add ai-review.sh to bash -n syntax check list in e2e-lib.sh (#3005 ) ai-review.sh is sourced by e2e.sh but was missing from the bash -n syntax check loop in sh/test/e2e-lib.sh. This means syntax errors in ai-review.sh would not be caught by the test harness. Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-26 03:12:07 -07:00
A	7378cab0b2	fix(security): add defensive validation to tmpdir cleanup in install.sh (#3000 ) Adds a non-empty check after mktemp and guards the EXIT trap so rm -rf only fires when tmpdir is non-empty and still a directory. This is a defense-in-depth hardening — the current code is safe due to set -e, but explicit validation is best practice for rm -rf operations. Fixes #2998 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-26 11:26:56 +07:00
Ahmed Abushagur	90dde882d0	fix: installSpawnCli fails on Sprite — bun shim doesn't work (#2993 ) Sprite has a bun shim at /.sprite/bin/bun that delegates to $HOME/.bun/bin/bun, but that binary doesn't exist on fresh VMs. `command -v bun` returns true (finds the shim) so the install script skips bun installation, then bun fails when actually invoked. Fixed in two places: - installSpawnCli: source shell profiles, test `bun --version` (not just existence), and install bun fresh if it doesn't work - install.sh: replace `command -v bun` with `bun --version` to detect broken shims Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-26 07:36:12 +07:00
Ahmed Abushagur	934dfd309f	test: add unit tests for E2E bash test infrastructure (#2968 ) 136 tests covering common.sh, verify.sh, provision.sh, and e2e.sh: - format_duration, make_app_name, track_app/untrack_app - get_provision_timeout/get_agent_timeout with env overrides - Numeric validation (injection resistance for timeout vars) - OpenRouter API key fallback logic - _validate_timeout and _validate_base64 security checks - run_input_test dispatch (unknown agent, TUI skips, SKIP_INPUT_TEST) - provision_agent app_name validation (injection resistance) - e2e.sh argument parsing (--help, missing args, invalid clouds/agents) - ALL_AGENTS completeness (verify_* and input_test_* for every agent) - Cloud driver interface compliance (all 5 drivers implement required fns) - bash -n syntax check on all E2E scripts - macOS compat linter on core E2E libraries Also documents a known limitation: _validate_base64 uses per-line grep matching, so multiline strings pass if each line is valid (low risk since base64 encoding always strips newlines). Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: A <258483684+la14-1@users.noreply.github.com>	2026-03-24 18:42:48 -07:00
A	a6940fdaad	fix(e2e): improve interactive harness failure logging (#2951 ) On interactive provision failure, save the harness log to a persistent path (/tmp/spawn-interactive-harness-last.log) for post-mortem inspection, and filter output to only show [harness] prefixed lines (30 lines) instead of dumping 50 raw lines of mixed output. Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: L <6723574+louisgv@users.noreply.github.com> Co-authored-by: Ahmed Abushagur <ahmed@abushagur.com>	2026-03-24 08:45:19 -07:00
A	6c742bdd11	fix(e2e): increase hermes install timeout to fix failures on Hetzner/DO/GCP (#2956 ) Hermes installs a Python virtualenv which takes 20+ min on fresh VMs. The previous 300s install timeout caused the CLI to give up before writing .spawnrc, leading to 30-min E2E timeouts on Hetzner, DigitalOcean, and GCP (but not Sprite, which has a manual .spawnrc fallback). Changes: - agent-setup.ts: hermes installAgent timeout 300s → 600s - common.sh: add hermes per-agent overrides (_PROVISION_TIMEOUT_hermes=720, _AGENT_TIMEOUT_hermes=3600) to give the install enough headroom - package.json: bump CLI version 0.25.26 → 0.25.27 -- qa/e2e-tester Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-24 21:34:41 +07:00
A	056ce252c7	fix(e2e): suppress matrix email on targeted re-runs via SPAWN_E2E_SKIP_EMAIL (#2944 ) When the quality cycle e2e-tester re-runs only failed agents (e.g. `e2e.sh --cloud hetzner zeroclaw codex`), e2e.sh was firing a matrix email showing only those 2 agents — both PASS if the retry succeeded. This looked like "2 tests ran, all passed" when in reality 32 tests ran with 2 failures. - Add SPAWN_E2E_SKIP_EMAIL=1 env var check at the top of send_matrix_email - Update qa-quality-prompt.md to set SPAWN_E2E_SKIP_EMAIL=1 on re-runs Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-24 00:17:10 -07:00
A	aafeda4020	fix(e2e): reduce Hetzner max parallel from 5 to 3 to respect primary IP quota (#2943 ) The QA account's primary IP limit is ~3, so running 5 agents in parallel exhausted the quota, causing codex and zeroclaw to fail with resource_limit_exceeded. Reducing _hetzner_max_parallel to 3 keeps provisioning within quota while still running agents concurrently. Verified: zeroclaw and codex both PASS on Hetzner after this fix. -- qa/e2e-tester Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-24 13:32:10 +07:00
A	81ab237efe	fix(e2e): harden shell scripts against injection in SSH commands (#2945 ) - hetzner.sh: Pipe base64-encoded command via stdin to SSH instead of embedding it in the SSH command string via variable expansion. The remote bash reads stdin, base64-decodes, and executes. - verify.sh: Add remote-side re-validation of base64 and timeout values in _stage_prompt_remotely and _stage_timeout_remotely. Values are assigned to remote shell variables and validated before writing to temp files, providing defense-in-depth against injection. - provision.sh: Add explicit early rejection of dangerous shell chars ($, `, \) in env var values from cloud_headless_env, and add remote-side re-validation of base64 payload before writing. Fixes #2937 Fixes #2938 Fixes #2939 Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-24 13:30:47 +07:00
A	8ed8d91205	fix(qa): stash before pull, fix star count push, fix claude update flag (#2942 ) - Stash uncommitted changes before git pull --rebase so the pull never aborts with "You have unstaged changes" - Pull --rebase before pushing star count commit to avoid non-fast-forward rejection (was failing every single cycle) - Remove --yes flag from claude update (flag was removed upstream) - Fix interactive harness AI prompt: update success marker text from "is ready" or "Starting agent" to match code check ("Starting agent..." or "setup completed successfully") Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-24 12:53:27 +07:00
A	4f141486dc	refactor: remove dead code and stale references (#2940 ) - fix misplaced interactive_provision comment block in interactive.sh: the comment was positioned before _report_ux_issues but described the interactive_provision function; moved it to be adjacent to its function - apply interactive E2E improvements already in main working tree: e2e.sh: add verify_agent call after interactive_provision to wait for .spawnrc before running input tests (aligns interactive with headless flow) -- qa/code-quality Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-24 12:09:50 +07:00
A	e9cbab5b7f	fix(sprite): add retry for list failures, increase timeout, refresh auth on expiry (#2936 ) Three fixes for Sprite E2E failures in long-running batches (73+ min): 1. Retry `_sprite_provision_verify`: list failures now retry 3x with exponential backoff (5s, 10s, 20s) instead of failing immediately. Fixes kilocode batch 6 "Could not list Sprite instances" errors. 2. Increase `CREATE_TIMEOUT_SECS` default from 300s to 600s and add `Client.Timeout`, `request canceled`, and `authentication failed` to the transient error retry pattern in `spriteRetry`. Also uses linear backoff (3s * attempt) instead of fixed 3s delay. Fixes hermes batch 7 HTTP timeout errors. 3. Add `_sprite_refresh_auth` + `cloud_refresh_auth` interface. The E2E orchestrator calls `cloud_refresh_auth` before each provisioning batch. For Sprite, this re-validates the token via `sprite org list` and attempts `sprite auth refresh` if expired. Fixes junie batch 8 "authentication failed" errors. Fixes #2934 Agent: ux-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-23 21:47:58 -07:00
A	50319e0d39	fix(hetzner): clean up orphaned primary IPs before provisioning to avoid quota exceeded (#2935 ) Hetzner E2E runs fail with `resource_limit_exceeded` when stale primary IPs from previous test runs consume the account quota. This adds proactive cleanup at two levels: 1. E2E shell driver: `_hetzner_cleanup_orphaned_ips()` deletes unattached primary IPs during pre-batch stale cleanup, freeing quota before any new servers are provisioned. 2. TypeScript CLI: `hetzner/main.ts` calls `cleanupOrphanedPrimaryIps()` before `createServer()` in headless/non-interactive mode, ensuring each agent provisioning attempt starts with a clean IP quota. The existing reactive cleanup (retry after failure) in `hetzner.ts` remains as a fallback. Fixes #2933 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-24 11:20:30 +07:00
A	c1e6fb76f9	fix(e2e): harden pkill regex escaping against all metacharacters (#2917 ) * fix(e2e): harden pkill regex escaping against all metacharacters (#2911) The sed character class `[.[\^$]` was malformed and missed several extended regex metacharacters (+, ?, (, ), {, }, |). Replace with a correct bracket expression that escapes all POSIX ERE metacharacters. Although app_name is already validated to [A-Za-z0-9._-], fixing the escaping is defense-in-depth against future changes to the validation. Agent: security-auditor Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> fix(e2e): correct sed bracket expression to escape ] character Place ] first in character class so it's treated as literal. Use \\ to match literal backslash. Agent: pr-maintainer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-23 12:35:31 -07:00
A	a96522829b	fix(e2e): fix interactive E2E test chain (provision → install → input test) (#2898 ) * fix(e2e): pass SPAWN_NAME + SPAWN_ENABLED_STEPS to interactive harness Without SPAWN_NAME, cmdRun prompts 'Name your spawn' interactively. The AI driver (Claude Haiku) can't respond because ANTHROPIC_AUTH_TOKEN is an OpenRouter key — every Anthropic API call returns 401, so the harness returns <wait> indefinitely until the 20-min SESSION_TIMEOUT_MS fires. SPAWN_ENABLED_STEPS=auto-update bypasses the setup options multiselect, ensuring the harness only tests the provisioning/installation UX. * fix(e2e): fix _stage_timeout_remotely stdin pipe issue on Hetzner Same root cause as _stage_prompt_remotely: _hetzner_exec runs commands via "printf | base64 -d | bash", which makes bash's stdin the decode pipe. So piped data from the outer SSH call never reaches subcommands. "printf '%s' 'VALUE' | cloud_exec APP 'cat > /tmp/.e2e-timeout'" always creates an empty file, causing "timeout: invalid time interval ''" when the input test runs. Fix: embed the validated numeric timeout value directly in the printf command string (safe — _validate_timeout ensures only [0-9] digits). * test(e2e): add claude PATH diagnostics to input_test_claude Temporary debug output to trace where claude is installed after interactive provision completes. * test(e2e): save harness transcript JSON on success for debugging * fix(e2e): remove 'is ready' from harness success pattern 'SSH is ready' (emitted ~15s into provision when SSH connects but before any agent installation) matched the /is ready/ pattern, triggering false success detection. The harness killed the spawn CLI during cloud-init wait, leaving a VM with no agent installed. Fix: use the same precise patterns as the main repo's harness: /Starting agent\.\.\.|setup completed successfully/i Both only fire after orchestrate.ts completes the full setup. 
* chore(e2e): remove temporary debug instrumentation * feat(e2e): add ai-powered ux review after interactive provision After each successful interactive E2E run, the harness sends the full terminal transcript to Claude (via OpenRouter) with a UX reviewer prompt. It looks for confusing messages, noisy output, missing context in spinners, and unhelpful errors that don't explain next steps. Findings are returned as uxIssues[] in the harness JSON result. interactive.sh then files a GitHub issue per run listing each problem with a verbatim example and concrete suggestion. Uses OPENROUTER_API_KEY (already in env) so it works on the QA VM where ANTHROPIC_API_KEY is an OpenRouter key. * refactor(e2e): throttle ux issue filing — 33% chance, 3+ issues required - Random 33% gate: UX review runs on ~1 in 3 successful interactive provisions, not every run - Minimum bar: only surface findings when AI found 3+ clear issues (filters one-off nits) - Tighter system prompt: only flag obvious problems (repeated messages, debug leaks, cryptic errors), not minor style preferences Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(e2e): replace random throttle with stricter ux review prompt Instead of Math.random() to suppress issues, make the AI self-regulate: the system prompt now instructs it to only flag genuinely bad problems (repeated messages, raw stack traces, no-feedback waits) and treat zero findings as a good outcome, not a failure. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-23 13:42:02 +07:00
A	9448cb8ca0	fix(e2e): fix _stage_prompt_remotely to embed prompt inline instead of stdin pipe (#2897 ) The stdin piping approach was broken: _hetzner_exec runs remote commands via "printf '%s' 'ENCODED_CMD' | base64 -d | bash", which connects bash's stdin to the base64 pipe rather than SSH's outer stdin. So `cat > /tmp/.e2e-prompt` read from EOF — the encoded prompt was never written to the remote file. Fix: embed the validated base64 prompt directly in the command string using printf. This is safe because _validate_base64 ensures the prompt contains only [A-Za-z0-9+/=] — no characters that can break out of single quotes or inject shell metacharacters. Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-23 12:19:51 +07:00
A	9280489ada	fix(qa): load ANTHROPIC_AUTH_TOKEN as ANTHROPIC_API_KEY for interactive E2E (#2894 ) * chore: update agent GitHub star counts * fix(qa): load ANTHROPIC_AUTH_TOKEN as ANTHROPIC_API_KEY for interactive E2E QA VMs store the Anthropic key as ANTHROPIC_AUTH_TOKEN in /etc/spawn-qa-auth.env, but the e2e-interactive handler only looked for ANTHROPIC_API_KEY — causing the 6am cron to fail immediately with "ANTHROPIC_API_KEY not set". Accept either name when loading from the auth env file. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(e2e): bump interactive harness timeout to 20min, fix zombie VM teardown - SESSION_TIMEOUT_MS: 10min → 20min — provisioning a VM takes 3-4 min before onboarding even starts; 10min wasn't enough headroom - interactive.sh: call cloud_provision_verify even on harness failure so teardown can find and delete any VM that was partially created (e.g. on timeout mid-provision) — previously left zombie VMs with no .meta file Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-23 11:24:26 +07:00
Ahmed Abushagur	6aeb9ba142	feat(e2e): diff-aware AI review with e2e-last-green tracking (#2893 ) AI log review now includes the git diff since the last fully passing E2E run, enabling causal analysis like "this 404 likely caused by commit abc123 which deleted file Y". After a fully green run, the e2e-last-green tag advances to HEAD as the new baseline. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-23 11:21:35 +07:00
A	4d08dbe2a7	fix(security): harden remote command construction in provision.sh (#2886 ) * fix(security): harden remote command construction in provision.sh Split the .spawnrc upload fallback into two separate cloud_exec calls to separate data from commands. Step 1 writes the validated base64 payload to a remote temp file. Step 2 decodes from that file and sets up shell rc sourcing using a static command string with no interpolated variables. This eliminates command injection risk in the control-flow portion of the remote command (for loop, grep, etc.) even if the base64 validation were ever bypassed, since user-controlled data never appears in the same command string as shell control flow. Fixes #2882 Agent: complexity-hunter Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: correct error handling + use mktemp for temp file - Return 1 (not 0) when step 1 fails to avoid masking provisioning failures - Use mktemp -t spawnrc.b64 to avoid race conditions on concurrent provisions Agent: pr-maintainer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: propagate step 2 failure in provision.sh (return 1) The else branch for step 2 (decode + shell rc setup) logged an error but the function still returned 0, masking the failure. Now returns 1 so provisioning failures are correctly propagated. Agent: pr-maintainer Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-22 20:44:33 -07:00
A	d046a9bfdf	fix: tighten character whitelist for cloud_headless_env values (#2890 ) The env value whitelist allowed @, %, +, =, :, and , characters that are unnecessary for cloud resource names (server names, regions, sizes) and could be used as shell metacharacters in certain contexts. Restrict to only [A-Za-z0-9._/-] which matches all legitimate cloud resource identifiers. Fixes #2883 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-23 08:41:50 +07:00
A	fa79d34a47	fix(security): properly quote remote cmd construction in verify.sh (#2887 ) Prevent shell metacharacter interpretation in test prompt handling by staging INPUT_TEST_TIMEOUT and attempt number to remote temp files instead of interpolating them into remote command strings. Previously, _TIMEOUT='${INPUT_TEST_TIMEOUT}' and --session-id e2e-test-${attempt} were interpolated directly into double-quoted remote command strings. While _validate_timeout enforces digits-only, the structural pattern of local-to-remote variable interpolation is inherently risky. Now all dynamic values (prompt, timeout, attempt) are piped to remote temp files via stdin and read back on the remote side, eliminating the injection surface entirely. Fixes #2884 Agent: test-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-23 08:39:36 +07:00
A	db6c44be9c	fix(e2e): update input tests for new agent CLIs + auto-load email creds (#2877 ) * fix(e2e): update input tests for latest agent CLI interfaces + auto-load email creds claude: add --dangerously-skip-permissions --no-session-persistence to bypass trust dialog when running in /tmp/e2e-test (not in ~/.claude.json trusted projects list written during install) codex: replace `codex exec --full-auto` (removed in new @openai/codex) with `codex -q -a full-auto` — quiet mode + full-auto approval, no exec subcommand email: auto-load RESEND_API_KEY + KEY_REQUEST_EMAIL from /etc/spawn-key-server-auth.env (QA VM) or ~/.config/spawn/resend.env (local) so send_matrix_email fires on every e2e run, not just QA-cycle runs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(e2e): correct claude and codex input test commands - claude: pass prompt as positional arg to claude -p instead of piping via stdin (stdin pipe breaks through SSH exec chain, causing "Input must be provided either through stdin or as a prompt argument" error) - codex: revert to `codex exec --full-auto` subcommand (correct for v0.116.0 — previous -q -a full-auto flags don't exist) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-23 03:08:37 +07:00
Ahmed Abushagur	48163ea2ee	feat(e2e): AI-powered log review catches non-fatal issues (#2875 ) * feat(e2e): add AI-powered log review after provisioning Feeds provision stderr/stdout logs to an LLM after each agent deploys. Catches non-fatal issues that binary pass/fail checks miss: silent 404s, failed component installs, connection instability, swallowed warnings. This would have caught the keep-alive 404 and the sprite idle shutdown that the existing E2E tests missed because installSpriteKeepAlive() is non-fatal and the binary checks only verify final state. - Uses gemini-flash-lite-2.0 via OpenRouter (cheap, fast) - Advisory only — never fails the test, reports findings as warnings - Truncates logs to last 200 lines to stay within token limits - Skips gracefully if OPENROUTER_API_KEY is missing or API fails Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(e2e): add AI log review and --fast mode testing AI log review: - After each agent provisions, feeds stderr/stdout to gemini-flash-lite to catch non-fatal issues binary checks miss (404s, failed installs, connection drops, swallowed warnings) - Advisory only — never fails the test, surfaces findings as warnings - Would have caught the keep-alive 404 and sprite idle shutdown --fast mode E2E: - Add --fast flag to e2e.sh, passed through to spawn CLI during provision - Update QA e2e-tester protocol to run both normal and --fast passes - --fast enables images + tarballs + parallel boot Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-23 02:15:09 +07:00
Ahmed Abushagur	66a1749b4b	fix: add sprite-keep-running.sh, remove Hetzner from Packer, cleanup on cancel (#2869 ) * fix: destroy orphaned Packer builder instances on workflow cancel When a Packer Snapshots workflow is cancelled mid-build, Packer's process is killed before it can clean up its temporary builder droplet/server. This leaves orphaned packer-* instances running and costing money. Add `if: cancelled()` cleanup steps for both DigitalOcean and Hetzner that destroy any packer-* prefixed instances after cancellation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: remove Hetzner cleanup step — only DO needed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove Hetzner from Packer snapshots, add cancel cleanup Remove Hetzner from the Packer workflow entirely — only DigitalOcean snapshots are built. Deletes packer/hetzner.pkr.hcl and simplifies the workflow by removing all Hetzner-specific steps and cloud conditionals. Also adds a cancelled() cleanup step that destroys orphaned packer-* builder droplets when a workflow run is cancelled mid-build. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add missing sprite-keep-running.sh script The keep-alive install was 404ing because sh/shared/sprite-keep-running.sh never existed in the repo. The TypeScript code downloaded it from the CDN (which maps to sh/shared/) but the file was never created. The script wraps a command and pings the sprite's own public URL every 30s to prevent inactivity shutdown. It resolves the URL via sprite-env info (available on all sprites) and falls back to exec without keep-alive if the URL can't be determined. 
Also removes Hetzner from the Packer snapshots workflow entirely — only DigitalOcean snapshots are built. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address security review — scope cleanup filter, fix JSON injection 1. Add `spawn-packer` tag to DO builder droplets in Packer template and filter cleanup by tag instead of broad `packer-` name prefix. Prevents accidentally destroying builder instances from other concurrent builds. 2. Use `jq --arg` for SINGLE_AGENT_INPUT instead of string interpolation to prevent JSON injection via crafted agent names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 18:13:38 +00:00
A	57e06bab4a	fix(e2e): fix manual .spawnrc creation on Sprite (stdin piping broken) (#2872 ) The manual .spawnrc fallback in provision.sh was using `printf '%s' "${env_b64}" | cloud_exec ...`, which works for SSH-based clouds (Hetzner, GCP, AWS) where stdin is passed through the SSH connection. However, Sprite's exec driver replaces stdin with the command pipe: `printf '%s' "${cmd}" | sprite exec -s NAME -- bash` This causes the outer env_b64 pipe to be lost — `base64 -d` receives no input and writes an empty .spawnrc, which then fails the OPENROUTER_API_KEY and openrouter.ai verification checks. Fix: embed the base64 data directly in the command string using `printf '%s' '${env_b64}'`. This is safe because env_b64 is validated to contain only [A-Za-z0-9+/=] — the standard base64 alphabet — which cannot break out of single quotes or cause shell injection. Confirmed by E2E run where sprite/claude and sprite/openclaw both failed with: [FAIL] OPENROUTER_API_KEY not found in .spawnrc [FAIL] Failed to create manual .spawnrc Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-22 16:46:05 +07:00
A	c1363b138c	feat(gcp): default boot disk to 40 GB, configurable via GCP_DISK_SIZE (#2867 ) GCP's default 10 GB boot disk is insufficient for coding agents — node_modules, apt packages, and build caches easily exceed it. Default to 40 GB and allow override via GCP_DISK_SIZE env var. Closes #2866 Co-authored-by: Claude <claude@anthropic.com>	2026-03-22 11:21:05 +07:00
A	9a98589cef	fix(security): prevent command injection via INPUT_TEST_TIMEOUT in verify.sh (#2851 ) Add defense-in-depth validation of INPUT_TEST_TIMEOUT directly in verify.sh (not just relying on common.sh). Each input test function now calls _validate_timeout() to ensure the value contains only digits before use. Additionally, instead of interpolating INPUT_TEST_TIMEOUT directly into remote command strings passed to cloud_exec, the timeout value is now assigned to a single-quoted remote variable (_TIMEOUT) and referenced via "$_TIMEOUT" on the remote side. This eliminates the injection surface even if validation were somehow bypassed. Affected functions: input_test_claude(), input_test_codex(), input_test_openclaw(), input_test_zeroclaw(). Fixes #2849 Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-20 19:58:52 -07:00
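The two layers described in #2851 (digits-only validation, then a single-quoted remote variable instead of direct interpolation) might look roughly like the sketch below. This is an illustration under assumptions, not the code in verify.sh: the function names `validate_timeout_sketch` and `build_remote_cmd_sketch` are hypothetical, and the remote command (`timeout "$_TIMEOUT" true`) is a placeholder for whatever the real input tests run.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the repository's actual _validate_timeout().

# Return 0 only if the value is a non-empty string of ASCII digits.
validate_timeout_sketch() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # empty, or contains a non-digit
    *) return 0 ;;
  esac
}

# Build a remote command string. The timeout is assigned to a
# single-quoted variable and only referenced as "$_TIMEOUT" afterwards,
# so the value never appears unquoted in the command text.
build_remote_cmd_sketch() {
  local timeout="$1"
  validate_timeout_sketch "$timeout" || return 1
  # Placeholder remote command; the real tests run an agent CLI here.
  printf "_TIMEOUT='%s'; timeout \"\$_TIMEOUT\" true" "$timeout"
}
```

A value such as `120; rm -rf /` is rejected by the first layer before any command string is built.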
A	5acf598615	fix: use stdin piping in _stage_prompt_remotely to prevent injection (#2839 ) Replaces command string interpolation with stdin piping for the base64 prompt in verify.sh. Also anchors the _validate_base64 regex. Fixes #2833 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-20 15:46:00 -07:00
A	c323f10ae9	fix(gcp): add /usr/local/bin to PATH for kilocode binary detection (#2825 ) Fixes #2823: npm installs kilocode to /usr/local/bin when running as root on GCP, but the E2E binary verify step didn't include /usr/local/bin in PATH, causing false "binary not found" failures. The .spawnrc PATH (generated by generateEnvConfig) already includes /usr/local/bin, but verify_kilocode used a hardcoded PATH that omitted it. This aligns kilocode and codex verify checks with openclaw and junie which already include /usr/local/bin. Also fixes the same latent issue in verify_codex. Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-20 05:25:15 -07:00
Ahmed Abushagur	ed127cf592	feat: never-give-up resilience layer (#2807 ) * feat: never-give-up resilience layer — retry every failure instead of exiting Add retryOrQuit() helper to shared/ui.ts that prompts "Try again? (Y/n)" after any recoverable failure. Wrap all fatal exit points with retry loops: - Cloud auth (Hetzner, DigitalOcean, AWS, GCP): retry after 3 failed tokens - API key acquisition: retry after 3 failed OAuth+manual attempts - Server creation: retry on any createServer failure (both fast & sequential) - SSH readiness: retry on waitForReady timeout - Agent install: retry on install failure - Pre-launch hooks: retry on preLaunch failure Non-interactive mode (SPAWN_NON_INTERACTIVE=1) still throws immediately. Ctrl+C at any retry prompt exits cleanly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(e2e): add AI-driven interactive test harness Add --interactive mode to the E2E test framework. Instead of running spawn in headless mode (SPAWN_NON_INTERACTIVE=1), this spawns the CLI in a real PTY and uses Claude Haiku to respond to prompts like a human user would. 
New files: - sh/e2e/interactive-harness.ts — Bun script that drives the PTY + AI loop - sh/e2e/lib/interactive.sh — Bash integration with the E2E framework Usage: e2e.sh --cloud hetzner claude --interactive Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(qa): wire interactive E2E into scheduled QA pipeline - Add `e2e-interactive` option to workflow_dispatch in qa.yml - Add `e2e-interactive` run mode to qa.sh (loads cloud creds + ANTHROPIC_API_KEY) - Runs `e2e.sh --cloud hetzner claude --interactive` directly (no Claude Code needed) - Defaults to hetzner (cheapest), overridable via E2E_INTERACTIVE_CLOUD/AGENT env vars Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(qa): schedule interactive E2E daily at 6am UTC Runs one agent (claude) on one cloud (hetzner) with AI-driven prompts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(qa): offset soak cron to avoid GitHub Actions schedule dedup GitHub Actions deduplicates overlapping cron schedules into one run, making `github.event.schedule` unpredictable. The soak test at `0 3 * * 1` was getting absorbed by the `0 */4 * * *` quality sweep and never firing as reason=soak. Move soak to `30 1 * * 1` (Monday 1:30am UTC) — safely between the 0am and 4am quality sweep slots. Interactive E2E at `0 6 * * *` is already safe (between the 4am and 8am slots). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> fix(qa): add e2e-interactive to trigger server valid reasons The trigger server validates reason query params against an allowlist. Without this, the `e2e-interactive` dispatch returns 400. Also note: `soak` is already in VALID_REASONS in the repo but the running service on the QA VM is stale — needs a restart to pick up both soak and e2e-interactive reasons. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 17:33:22 -07:00
A	66036bfac9	fix(do): skip _run_with_restart in headless mode to prevent duplicate droplets (#2805 ) The _run_with_restart wrapper in all 8 DigitalOcean agent scripts catches SIGTERM/SIGKILL exit codes (143/137) and retries the orchestration process. In headless mode (E2E tests), when the provision timeout kills the process, this restart loop would re-run main.ts, creating duplicate droplets and exhausting the account's droplet quota — causing ALL subsequent DO agents to fail provisioning. Skip the restart loop entirely when SPAWN_HEADLESS=1 (set by runScriptHeadless in the CLI). The restart behavior is only useful for interactive sessions where the user's SSH connection drops. Fixes #2794 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-19 16:12:25 -07:00
A	8d76ad90d3	security: base64-encode cmd in _sprite_exec to prevent injection (#2803 ) Apply the same base64 encoding mitigation used by all other cloud drivers (aws, hetzner, digitalocean, gcp). The command is encoded locally, validated for safe characters, then decoded and executed on the remote side via `base64 -d | bash`. Fixes #2800 Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-19 13:19:07 -07:00
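The encode, validate, decode-and-execute flow from #2803 can be sketched as follows. This is a minimal illustration, not the body of `_sprite_exec`: the helper names `encode_cmd_sketch` and `remote_wrapper_sketch` are hypothetical, and the wrapper only builds the string that a cloud driver would hand to its remote exec call.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real _sprite_exec may differ in detail.

# Encode a command locally and confirm that only base64 characters
# ([A-Za-z0-9+/=]) are present before it is embedded anywhere.
encode_cmd_sketch() {
  local LC_ALL=C encoded
  encoded="$(printf '%s' "$1" | base64 | tr -d '\n')"
  case "$encoded" in
    ''|*[!A-Za-z0-9+/=]*) return 1 ;;  # empty or unexpected character
  esac
  printf '%s' "$encoded"
}

# Build the remote-side command. The payload sits inside single quotes,
# and the base64 alphabet contains no single quote, so the payload
# cannot terminate the quoting.
remote_wrapper_sketch() {
  local encoded
  encoded="$(encode_cmd_sketch "$1")" || return 1
  printf "printf '%%s' '%s' | base64 -d | bash" "$encoded"
}
```

Running the wrapper's output locally with `eval` is a convenient way to check that quotes, semicolons, and `$(( ))` in the original command survive the round trip unchanged.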
A	8fef58845c	fix(e2e): use aggressive cleanup threshold (5 min) for pre-run to prevent quota exhaustion (#2798 ) The pre-run stale cleanup (added in #2789) used the same 30-minute max_age as the post-run cleanup. Orphaned instances from recently-failed runs (< 30 min old) were not cleaned, causing quota exhaustion on DigitalOcean and other clouds. Pre-run cleanup now uses _CLEANUP_MAX_AGE=300 (5 min) to aggressively reclaim orphaned e2e instances before provisioning new ones. Post-run cleanup retains the 30-minute default. All 5 cloud drivers respect the override. Fixes #2793 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-19 11:23:55 -07:00
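The threshold override in #2798 amounts to an age comparison with a configurable limit. A minimal sketch, assuming epoch-second timestamps and a hypothetical helper name `is_stale_sketch` (the `_CLEANUP_MAX_AGE` variable and the 1800 s / 300 s values come from the commit message; the exact comparison used by the cloud drivers is an assumption):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the stale-instance age check.

# Return 0 if an instance created at $1 (epoch seconds) is at least
# max_age seconds old at time $2. Defaults to 30 minutes unless
# _CLEANUP_MAX_AGE overrides it.
is_stale_sketch() {
  local created_epoch="$1" now_epoch="$2"
  local max_age="${_CLEANUP_MAX_AGE:-1800}"
  [ $(( now_epoch - created_epoch )) -ge "$max_age" ]
}
```

With the default, a 10-minute-old orphan is left alone; with `_CLEANUP_MAX_AGE=300` set for the pre-run pass, the same instance qualifies for cleanup.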
A	e4bfd38443	security: pass encoded prompt via env var, not string interpolation (#2799 ) Fixes #2797. The _stage_prompt_remotely() function was interpolating ${encoded_prompt} directly into the remote command string passed to cloud_exec. While _validate_base64() ensures only [A-Za-z0-9+/=] characters are present, defense-in-depth requires eliminating the interpolation entirely. The fix uses printf %s format substitution to build the remote command, placing the encoded prompt into a single-quoted shell variable assignment (_EP='...') on the remote side. Single quotes prevent all shell expansion, and base64 charset cannot contain single quotes, making injection structurally impossible. Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-19 11:23:08 -07:00
A	5f8b7f1145	fix(e2e): run stale cleanup before agents, not just after (#2789 ) Orphaned e2e instances from previously interrupted test runs (e.g. killed by timeout) remain under the 30-minute max_age threshold and continue to consume account capacity. This caused DigitalOcean "droplet limit exceeded" 422 errors when re-running the suite within 30 minutes of a failed run. Add a pre-run stale cleanup call at the start of run_agents_for_cloud (after credentials are validated, before agents start). This clears leftover e2e-* instances immediately so they don't block provisioning in the new run. -- qa/e2e-tester Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-19 03:49:51 -07:00
A	9ab3993b39	fix(e2e): eliminate prompt interpolation in input_test commands (#2790 ) Replaces the pattern of embedding base64-encoded prompts directly into remote command strings via shell variable interpolation with a two-step approach: stage the encoded prompt to a remote temp file first, then read from that file in the agent command. This eliminates RCE risk if the prompt source ever becomes user-controlled. Changes: - Add _stage_prompt_remotely() helper that writes encoded prompt to /tmp/.e2e-prompt on the remote host via an isolated cloud_exec call - input_test_claude(): read prompt from temp file instead of _ENCODED_PROMPT var - input_test_codex(): same - input_test_openclaw(): same - input_test_zeroclaw(): same - Update _validate_base64() comment to reflect defense-in-depth role Closes #2788 Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-19 03:48:53 -07:00
A	b0ecb3a139	fix(e2e): validate base64 chars in encoded_prompt before remote injection (#2780 ) Add explicit validation that encoded_prompt only contains safe base64 characters ([A-Za-z0-9+/=]) in all input_test_* functions in verify.sh. This makes the safety assumption explicit in code rather than relying on documentation — if the base64 output ever contains unexpected chars, the test aborts immediately instead of injecting them into a remote command string. Fixes #2775 Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-18 20:11:24 -07:00
A	1085987a01	fix(e2e): add path-prefix guard to final_cleanup rm -rf (#2778 ) Validates LOG_DIR is within /tmp/spawn-e2e.* before deleting it, preventing catastrophic data loss if LOG_DIR is somehow set to an unexpected path via TMPDIR manipulation or future refactors. Fixes #2777 Agent: issue-fixer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-18 20:10:24 -07:00
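A path-prefix guard of the kind #2778 describes could be sketched like this. The function name `safe_remove_log_dir_sketch` is hypothetical and the extra rejection of `..` segments is an assumption added for the illustration; only the `/tmp/spawn-e2e.*` prefix comes from the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not the repository's final_cleanup().

# Remove a log directory only when its path starts with /tmp/spawn-e2e.
# and contains no ".." sequence. Anything else is refused untouched.
safe_remove_log_dir_sketch() {
  local dir="$1"
  case "$dir" in
    *..*) return 1 ;;           # refuse path traversal
    /tmp/spawn-e2e.?*) ;;       # expected prefix: fall through
    *) return 1 ;;              # any other location: refuse
  esac
  [ -d "$dir" ] || return 0     # nothing to delete
  rm -rf -- "$dir"
}
```

A caller would pass `"$LOG_DIR"` and treat a non-zero return as "skip deletion and warn" rather than retrying with a different path.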

189 commits