vrr/spawn

mirror of https://github.com/OpenRouterTeam/spawn.git synced 2026-04-28 11:59:29 +00:00

Author	SHA1	Message	Date
A	f2f981bd0a	fix(e2e): reduce Hetzner batch parallelism from 3 to 2 (#3112) Prevents server_limit_reached errors when pre-existing servers (e.g. spawn-szil) consume quota during E2E batch 1. Fixes #3111 Agent: test-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-31 03:08:18 +07:00
A	499eb494c6	fix(security): use StrictHostKeyChecking=accept-new in all SSH connections (#3037) Replace StrictHostKeyChecking=no with accept-new across all E2E cloud drivers (aws, gcp, digitalocean, hetzner), the shared SSH_BASE_OPTS constant, and pull-history.ts. accept-new trusts new hosts on first connection (needed for freshly provisioned VMs) but verifies on subsequent connections, preventing MITM attacks on reconnect. Fixes #3031 Agent: style-reviewer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-26 18:04:40 -07:00
A	aafeda4020	fix(e2e): reduce Hetzner max parallel from 5 to 3 to respect primary IP quota (#2943) The QA account's primary IP limit is ~3, so running 5 agents in parallel exhausted the quota, causing codex and zeroclaw to fail with resource_limit_exceeded. Reducing _hetzner_max_parallel to 3 keeps provisioning within quota while still running agents concurrently. Verified: zeroclaw and codex both PASS on Hetzner after this fix. -- qa/e2e-tester Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-24 13:32:10 +07:00
A	81ab237efe	fix(e2e): harden shell scripts against injection in SSH commands (#2945) - hetzner.sh: Pipe base64-encoded command via stdin to SSH instead of embedding it in the SSH command string via variable expansion. The remote bash reads stdin, base64-decodes, and executes. - verify.sh: Add remote-side re-validation of base64 and timeout values in _stage_prompt_remotely and _stage_timeout_remotely. Values are assigned to remote shell variables and validated before writing to temp files, providing defense-in-depth against injection. - provision.sh: Add explicit early rejection of dangerous shell chars ($, `, \) in env var values from cloud_headless_env, and add remote-side re-validation of base64 payload before writing. Fixes #2937 Fixes #2938 Fixes #2939 Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-24 13:30:47 +07:00
A	50319e0d39	fix(hetzner): clean up orphaned primary IPs before provisioning to avoid quota exceeded (#2935) Hetzner E2E runs fail with `resource_limit_exceeded` when stale primary IPs from previous test runs consume the account quota. This adds proactive cleanup at two levels: 1. E2E shell driver: `_hetzner_cleanup_orphaned_ips()` deletes unattached primary IPs during pre-batch stale cleanup, freeing quota before any new servers are provisioned. 2. TypeScript CLI: `hetzner/main.ts` calls `cleanupOrphanedPrimaryIps()` before `createServer()` in headless/non-interactive mode, ensuring each agent provisioning attempt starts with a clean IP quota. The existing reactive cleanup (retry after failure) in `hetzner.ts` remains as a fallback. Fixes #2933 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-24 11:20:30 +07:00
A	8fef58845c	fix(e2e): use aggressive cleanup threshold (5 min) for pre-run to prevent quota exhaustion (#2798) The pre-run stale cleanup (added in #2789) used the same 30-minute max_age as the post-run cleanup. Orphaned instances from recently-failed runs (< 30 min old) were not cleaned, causing quota exhaustion on DigitalOcean and other clouds. Pre-run cleanup now uses _CLEANUP_MAX_AGE=300 (5 min) to aggressively reclaim orphaned e2e instances before provisioning new ones. Post-run cleanup retains the 30-minute default. All 5 cloud drivers respect the override. Fixes #2793 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-19 11:23:55 -07:00
A	6fda75ccc8	security: validate base64 output in cloud_exec and soak.sh (defense-in-depth) (#2532) Add base64 character validation ([A-Za-z0-9+/=]) before use in SSH command strings for gcp.sh, aws.sh, and hetzner.sh cloud_exec functions -- matching the existing fix in digitalocean.sh (#2528). Also add a validated _encode_b64 helper to soak.sh and use it for all Telegram bot token encoding, preventing corrupted base64 from breaking out of single-quoted SSH command strings. Closes #2527 Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-12 09:32:48 -04:00
A	e9f8d5ec2d	fix: secure curl header args and provision.sh export whitelist (fixes #2464, fixes #2465) (#2471) - Replace `-H "Authorization: Bearer ..."` curl args with temp curl config files (`-K`) in digitalocean.sh and hetzner.sh e2e drivers, keeping API tokens out of `ps` output - Replace dangerous-var blocklist in provision.sh with a positive whitelist of allowed cloud_headless_env variable names Agent: complexity-hunter Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-10 17:54:32 -07:00
A	1bddd713ea	fix: base64-encode commands in SSH exec to prevent injection (#2448) All four SSH-based cloud drivers (aws, digitalocean, gcp, hetzner) passed the command string directly as an SSH argument, which gets interpreted by the remote shell. While current callers pass trusted E2E test code, this creates a security footgun for future changes. Fix: base64-encode the command locally and decode it on the remote side before piping to bash. The encoded string contains only safe characters [A-Za-z0-9+/=], eliminating any injection vector. Stdin is preserved for callers that pipe data into cloud_exec. Closes #2432, closes #2433, closes #2434, closes #2435 Agent: complexity-hunter Co-authored-by: B <6723574+louisgv@users.noreply.github.com>	2026-03-10 13:22:33 -04:00
A	3724bb8ba4	fix: address SSH command injection risks in e2e cloud drivers (#2447) Add defense-in-depth validation across all e2e cloud driver scripts: - Validate IP addresses match IPv4 format before use in SSH commands (aws, digitalocean, gcp, hetzner) - Validate SSH username contains only safe characters (gcp) - Validate resource IDs are numeric before interpolating into API URLs (digitalocean droplet IDs, hetzner server IDs) - URL-encode app name in Hetzner API query parameter to prevent query parameter injection - Validate numeric env vars (INPUT_TEST_TIMEOUT, PROVISION_TIMEOUT, INSTALL_WAIT) that get interpolated into remote command strings Fixes #2432, #2433, #2434, #2435, #2442 Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-10 12:27:47 -04:00
A	c4ae16849d	refactor: remove dead cloud_exec_long and _*_exec_long functions (#2407) The cloud_exec_long dispatcher in common.sh and all five cloud-specific _exec_long implementations (aws, digitalocean, gcp, hetzner, sprite) were defined but never called by any code in the e2e test suite. Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-09 19:39:53 -07:00
A	1740274323	fix: replace base64 interpolation with stdin piping in all cloud exec_long functions (#2290) Replace unsafe pattern where base64-encoded commands were interpolated into remote command strings with secure stdin piping; command data now travels as stdin rather than as part of the command string, eliminating injection risk from shell metacharacter interpretation. Affected functions across all 5 cloud drivers: - _hetzner_exec_long - _aws_exec_long - _gcp_exec_long - _digitalocean_exec_long - _sprite_exec_long Fixes #2286 Fixes #2287 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-07 14:09:15 -05:00
A	548cfdf0b1	fix(security): apply base64 exec escaping to remaining 4 cloud drivers (#2067) PR #2064 fixed _exec_long shell injection for DigitalOcean and Sprite but missed the same bash -c '${cmd}' pattern in Hetzner, GCP, AWS, and Daytona. Apply the same base64-encoding fix to all four. Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-01 11:50:33 -08:00
Ahmed Abushagur	c1e605c884	fix(e2e): increase server sizes and install timeouts (#2014) E2E tests were failing because agent installs didn't complete within the default 120s timeout, and small VMs ran out of memory during builds. - INSTALL_WAIT: 120s → 300s (with per-cloud override via cloud_install_wait) - AWS: nano_3_0 → medium_3_0 (all agents need 4GB for reliable installs) - DigitalOcean: s-1vcpu-512mb-10gb → s-2vcpu-2gb, cap at 3 parallel - GCP: e2-medium → e2-standard-2 - Hetzner: cap at 5 parallel (primary IP limit) - Sprite: 300s install wait (slower exec than SSH) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-02-28 00:25:36 -08:00
Ahmed Abushagur	627026a26b	feat(e2e): multi-cloud test suite with cloud driver pattern (#2004) * feat(e2e): multi-cloud test suite with cloud driver pattern Scale the E2E test suite from AWS-only to all 6 infrastructure clouds (aws, hetzner, digitalocean, gcp, daytona, sprite) with parallel execution support. Architecture: - Cloud driver pattern: each cloud implements _cloudname_func() functions - load_cloud_driver() wires cloud-specific functions to generic names (cloud_exec, cloud_teardown, etc.) - Shared orchestration stays in one place, cloud details are isolated New files: - sh/e2e/e2e.sh: unified entry point with --cloud flag - sh/e2e/lib/clouds/{aws,hetzner,digitalocean,gcp,daytona,sprite}.sh Refactored: - common.sh: removed AWS constants, added load_cloud_driver() - provision.sh: cloud-agnostic via cloud_headless_env/cloud_provision_verify - verify.sh: replaced aws_ssh with cloud_exec/cloud_exec_long - teardown.sh/cleanup.sh: delegate to cloud driver functions - aws-e2e.sh: thin wrapper: exec e2e.sh --cloud aws Usage: e2e.sh --cloud aws # Single cloud e2e.sh --cloud aws --cloud hetzner # Multiple clouds in parallel e2e.sh --cloud all --parallel 3 # All clouds, 3 agents parallel Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(e2e): prevent subshell EXIT trap inheritance and single-cloud early exit - Reset EXIT trap in multi-cloud subshells to prevent LOG_DIR deletion before the main process reads log files - Use `|| true` for single-cloud run_agents_for_cloud to prevent set -e from skipping the summary on env validation failure Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: default to parallel agent provisioning in e2e tests All agents within a cloud now run in parallel by default instead of sequentially. Use --sequential to restore the old behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: cap sprite parallelism, 4GB for openclaw, remove stderr suppression - Sprite: add _sprite_max_parallel (cap 2 concurrent agents) to avoid CLI rate limiting that caused all 6 agents to fail - AWS: use medium_3_0 (4GB) bundle for openclaw which needs more RAM - Input tests: remove 2>/dev/null from agent commands so failures produce visible error output instead of empty responses - Add cloud_max_parallel to driver interface, respected by e2e.sh Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use bash instead of sh for exec_long across all cloud drivers Ubuntu's /bin/sh is dash, which doesn't support bash-specific PATH sourcing from .spawnrc/.cargo/env. This caused codex and zeroclaw input tests to fail with "command not found" even though verify passed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: codex input test uses positional prompt, not -q flag codex CLI takes prompt as positional arg: `codex "PROMPT"`. The -q flag doesn't exist, causing "Usage:" error output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use codex exec -q for non-interactive input test codex requires `exec` subcommand for non-interactive mode. Plain `codex PROMPT` expects a TTY (stdin is not a terminal). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: codex exec takes no -q flag, just positional prompt Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use cx23 instead of deprecated cx22 for Hetzner e2e tests Hetzner deprecated server type cx22 (ID 104). The default now uses cx23. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-02-27 19:28:08 -08:00

15 commits
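
Several of the commits listed above (#2448, #2532, #2945) describe one recurring pattern: base64-encode the command locally, check that the encoded text contains only `[A-Za-z0-9+/=]`, and send it to the remote side on stdin so that it never becomes part of the SSH command string. The following is a minimal sketch of that pattern, not the repository's code. The function names `encode_b64` and `run_encoded` are hypothetical, and the "transport" argument is an assumption made so the sketch can be tried locally with `bash -c` in place of `ssh`.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: encode_b64 and run_encoded are hypothetical
# helpers, not the functions defined in the spawn e2e drivers.

# Encode a command as single-line base64 and refuse any result that is
# empty or contains a character outside the base64 alphabet.
encode_b64() {
  local encoded
  encoded=$(printf '%s' "$1" | base64 | tr -d '\n')
  case "$encoded" in
    '' | *[!A-Za-z0-9+/=]*)
      echo "encode_b64: invalid base64 output" >&2
      return 1
      ;;
  esac
  printf '%s' "$encoded"
}

# Usage: run_encoded COMMAND TRANSPORT...
# Sends the encoded command on stdin; the only text the receiving shell
# parses as a command line is the fixed literal 'base64 -d | bash'.
run_encoded() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_encoded COMMAND TRANSPORT..." >&2
    return 2
  fi
  local encoded
  encoded=$(encode_b64 "$1") || return 1
  shift
  printf '%s' "$encoded" | "$@" 'base64 -d | bash'
}

# Against a real host the transport would be ssh (host is a placeholder):
#   run_encoded 'uname -a' ssh -o StrictHostKeyChecking=accept-new "root@${ip}"
# For a local dry run, bash -c stands in for the remote shell:
#   run_encoded 'echo hello' bash -c
```

Because the payload occupies stdin, the executed command cannot also read piped input in this variant. That matches the stdin-piping approach described in #2290 and #2945, whereas #2448 describes a variant that keeps stdin free for callers.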