Commit graph

15 commits

Author SHA1 Message Date
A
f2f981bd0a
fix(e2e): reduce Hetzner batch parallelism from 3 to 2 (#3112)
Prevents server_limit_reached errors when pre-existing servers (e.g.
spawn-szil) consume quota during E2E batch 1.

Fixes #3111

Agent: test-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-31 03:08:18 +07:00
A
499eb494c6
fix(security): use StrictHostKeyChecking=accept-new in all SSH connections (#3037)
Replace StrictHostKeyChecking=no with accept-new across all E2E cloud
drivers (aws, gcp, digitalocean, hetzner), the shared SSH_BASE_OPTS
constant, and pull-history.ts. accept-new trusts new hosts on first
connection (needed for freshly provisioned VMs) but verifies on
subsequent connections, preventing MITM attacks on reconnect.

Fixes #3031

Agent: style-reviewer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-26 18:04:40 -07:00
A
aafeda4020
fix(e2e): reduce Hetzner max parallel from 5 to 3 to respect primary IP quota (#2943)
The QA account's primary IP limit is ~3, so running 5 agents in parallel
exhausted the quota, causing codex and zeroclaw to fail with
resource_limit_exceeded. Reducing _hetzner_max_parallel to 3 keeps
provisioning within quota while still running agents concurrently.

Verified: zeroclaw and codex both PASS on Hetzner after this fix.

-- qa/e2e-tester

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-24 13:32:10 +07:00
A
81ab237efe
fix(e2e): harden shell scripts against injection in SSH commands (#2945)
- hetzner.sh: Pipe base64-encoded command via stdin to SSH instead of
  embedding it in the SSH command string via variable expansion. The
  remote bash reads stdin, base64-decodes, and executes.

- verify.sh: Add remote-side re-validation of base64 and timeout values
  in _stage_prompt_remotely and _stage_timeout_remotely. Values are
  assigned to remote shell variables and validated before writing to
  temp files, providing defense-in-depth against injection.

- provision.sh: Add explicit early rejection of dangerous shell chars
  ($, `, \) in env var values from cloud_headless_env, and add
  remote-side re-validation of base64 payload before writing.

Fixes #2937
Fixes #2938
Fixes #2939

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-24 13:30:47 +07:00
A
50319e0d39
fix(hetzner): clean up orphaned primary IPs before provisioning to avoid quota exceeded (#2935)
Hetzner E2E runs fail with `resource_limit_exceeded` when stale primary
IPs from previous test runs consume the account quota. This adds proactive
cleanup at two levels:

1. E2E shell driver: `_hetzner_cleanup_orphaned_ips()` deletes unattached
   primary IPs during pre-batch stale cleanup, freeing quota before any
   new servers are provisioned.

2. TypeScript CLI: `hetzner/main.ts` calls `cleanupOrphanedPrimaryIps()`
   before `createServer()` in headless/non-interactive mode, ensuring
   each agent provisioning attempt starts with a clean IP quota.

The existing reactive cleanup (retry after failure) in `hetzner.ts`
remains as a fallback.

Fixes #2933

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 11:20:30 +07:00
A
8fef58845c
fix(e2e): use aggressive cleanup threshold (5 min) for pre-run to prevent quota exhaustion (#2798)
The pre-run stale cleanup (added in #2789) used the same 30-minute max_age
as the post-run cleanup. Orphaned instances from recently-failed runs (< 30 min
old) were not cleaned, causing quota exhaustion on DigitalOcean and other clouds.

Pre-run cleanup now uses _CLEANUP_MAX_AGE=300 (5 min) to aggressively reclaim
orphaned e2e instances before provisioning new ones. Post-run cleanup retains
the 30-minute default. All 5 cloud drivers respect the override.

Fixes #2793

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 11:23:55 -07:00
A
6fda75ccc8
security: validate base64 output in cloud_exec and soak.sh (defense-in-depth) (#2532)
Add base64 character validation ([A-Za-z0-9+/=]) before use in SSH
command strings for gcp.sh, aws.sh, and hetzner.sh cloud_exec
functions -- matching the existing fix in digitalocean.sh (#2528).

Also add a validated _encode_b64 helper to soak.sh and use it for
all Telegram bot token encoding, preventing corrupted base64 from
breaking out of single-quoted SSH command strings.

Closes #2527

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-12 09:32:48 -04:00
A
e9f8d5ec2d
fix: secure curl header args and provision.sh export whitelist (fixes #2464, fixes #2465) (#2471)
- Replace `-H "Authorization: Bearer ..."` curl args with temp curl config
  files (`-K`) in digitalocean.sh and hetzner.sh e2e drivers, keeping API
  tokens out of `ps` output
- Replace dangerous-var blocklist in provision.sh with a positive whitelist
  of allowed cloud_headless_env variable names

Agent: complexity-hunter

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-10 17:54:32 -07:00
A
1bddd713ea
fix: base64-encode commands in SSH exec to prevent injection (#2448)
All four SSH-based cloud drivers (aws, digitalocean, gcp, hetzner)
passed the command string directly as an SSH argument, which gets
interpreted by the remote shell. While current callers pass trusted
E2E test code, this creates a security footgun for future changes.

Fix: base64-encode the command locally and decode it on the remote
side before piping to bash. The encoded string contains only safe
characters [A-Za-z0-9+/=], eliminating any injection vector. Stdin
is preserved for callers that pipe data into cloud_exec.

Closes #2432, closes #2433, closes #2434, closes #2435

Agent: complexity-hunter

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
2026-03-10 13:22:33 -04:00
A
3724bb8ba4
fix: address SSH command injection risks in e2e cloud drivers (#2447)
Add defense-in-depth validation across all e2e cloud driver scripts:

- Validate IP addresses match IPv4 format before use in SSH commands
  (aws, digitalocean, gcp, hetzner)
- Validate SSH username contains only safe characters (gcp)
- Validate resource IDs are numeric before interpolating into API URLs
  (digitalocean droplet IDs, hetzner server IDs)
- URL-encode app name in Hetzner API query parameter to prevent
  query parameter injection
- Validate numeric env vars (INPUT_TEST_TIMEOUT, PROVISION_TIMEOUT,
  INSTALL_WAIT) that get interpolated into remote command strings

Fixes #2432, #2433, #2434, #2435, #2442

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 12:27:47 -04:00
A
c4ae16849d
refactor: remove dead cloud_exec_long and _*_exec_long functions (#2407)
The cloud_exec_long dispatcher in common.sh and all five cloud-specific
_exec_long implementations (aws, digitalocean, gcp, hetzner, sprite)
were defined but never called by any code in the e2e test suite.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09 19:39:53 -07:00
A
1740274323
fix: replace base64 interpolation with stdin piping in all cloud exec_long functions (#2290)
Replace unsafe pattern where base64-encoded commands were interpolated
into remote command strings with secure stdin piping — command data now
travels as stdin rather than as part of the command string, eliminating
injection risk from shell metacharacter interpretation.

Affected functions across all 5 cloud drivers:
- _hetzner_exec_long
- _aws_exec_long
- _gcp_exec_long
- _digitalocean_exec_long
- _sprite_exec_long

Fixes #2286
Fixes #2287

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 14:09:15 -05:00
A
548cfdf0b1
fix(security): apply base64 exec escaping to remaining 4 cloud drivers (#2067)
PR #2064 fixed _exec_long shell injection for DigitalOcean and Sprite
but missed the same bash -c '${cmd}' pattern in Hetzner, GCP, AWS, and
Daytona. Apply the same base64-encoding fix to all four.

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-01 11:50:33 -08:00
Ahmed Abushagur
c1e605c884
fix(e2e): increase server sizes and install timeouts (#2014)
E2E tests were failing because agent installs didn't complete within
the default 120s timeout, and small VMs ran out of memory during builds.

- INSTALL_WAIT: 120s → 300s (with per-cloud override via cloud_install_wait)
- AWS: nano_3_0 → medium_3_0 (all agents need 4GB for reliable installs)
- DigitalOcean: s-1vcpu-512mb-10gb → s-2vcpu-2gb, cap at 3 parallel
- GCP: e2-medium → e2-standard-2
- Hetzner: cap at 5 parallel (primary IP limit)
- Sprite: 300s install wait (slower exec than SSH)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-02-28 00:25:36 -08:00
Ahmed Abushagur
627026a26b
feat(e2e): multi-cloud test suite with cloud driver pattern (#2004)
* feat(e2e): multi-cloud test suite with cloud driver pattern

Scale the E2E test suite from AWS-only to all 6 infrastructure clouds
(aws, hetzner, digitalocean, gcp, daytona, sprite) with parallel
execution support.

Architecture:
- Cloud driver pattern: each cloud implements _cloudname_func() functions
- load_cloud_driver() wires cloud-specific functions to generic names
  (cloud_exec, cloud_teardown, etc.)
- Shared orchestration stays in one place, cloud details are isolated

New files:
- sh/e2e/e2e.sh — unified entry point with --cloud flag
- sh/e2e/lib/clouds/{aws,hetzner,digitalocean,gcp,daytona,sprite}.sh

Refactored:
- common.sh — removed AWS constants, added load_cloud_driver()
- provision.sh — cloud-agnostic via cloud_headless_env/cloud_provision_verify
- verify.sh — replaced aws_ssh with cloud_exec/cloud_exec_long
- teardown.sh/cleanup.sh — delegate to cloud driver functions
- aws-e2e.sh — thin wrapper: exec e2e.sh --cloud aws

Usage:
  e2e.sh --cloud aws                     # Single cloud
  e2e.sh --cloud aws --cloud hetzner     # Multiple clouds in parallel
  e2e.sh --cloud all --parallel 3        # All clouds, 3 agents parallel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(e2e): prevent subshell EXIT trap inheritance and single-cloud early exit

- Reset EXIT trap in multi-cloud subshells to prevent LOG_DIR deletion
  before the main process reads log files
- Use `|| true` for single-cloud run_agents_for_cloud to prevent set -e
  from skipping the summary on env validation failure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: default to parallel agent provisioning in e2e tests

All agents within a cloud now run in parallel by default instead of
sequentially. Use --sequential to restore the old behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: cap sprite parallelism, 4GB for openclaw, remove stderr suppression

- Sprite: add _sprite_max_parallel (cap 2 concurrent agents) to avoid
  CLI rate limiting that caused all 6 agents to fail
- AWS: use medium_3_0 (4GB) bundle for openclaw which needs more RAM
- Input tests: remove 2>/dev/null from agent commands so failures
  produce visible error output instead of empty responses
- Add cloud_max_parallel to driver interface, respected by e2e.sh

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use bash instead of sh for exec_long across all cloud drivers

Ubuntu's /bin/sh is dash, which doesn't support bash-specific PATH
sourcing from .spawnrc/.cargo/env. This caused codex and zeroclaw
input tests to fail with "command not found" even though verify passed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: codex input test uses positional prompt, not -q flag

codex CLI takes prompt as positional arg: `codex "PROMPT"`.
The -q flag doesn't exist, causing "Usage:" error output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use codex exec -q for non-interactive input test

codex requires `exec` subcommand for non-interactive mode.
Plain `codex PROMPT` expects a TTY (stdin is not a terminal).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: codex exec takes no -q flag, just positional prompt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use cx23 instead of deprecated cx22 for Hetzner e2e tests

Hetzner deprecated server type cx22 (ID 104). The default now uses cx23.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-02-27 19:28:08 -08:00