GCP, Sprite, and DigitalOcean had commented-out code `# local agent="$2"`
in their `_headless_env` functions. Hetzner already used the cleaner style
`# $2 = agent (unused but part of the interface)`. Normalize to match.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Three fixes for Sprite E2E failures in long-running batches (73+ min):
1. Retry `_sprite_provision_verify`: list failures now retry 3x with
exponential backoff (5s, 10s, 20s) instead of failing immediately.
Fixes kilocode batch 6 "Could not list Sprite instances" errors.
2. Increase `CREATE_TIMEOUT_SECS` default from 300s to 600s and add
`Client.Timeout`, `request canceled`, and `authentication failed`
to the transient error retry pattern in `spriteRetry`. Retries now use
linear backoff (3s * attempt) instead of a fixed 3s delay.
Fixes hermes batch 7 HTTP timeout errors.
3. Add `_sprite_refresh_auth` + `cloud_refresh_auth` interface. The
E2E orchestrator calls `cloud_refresh_auth` before each provisioning
batch. For Sprite, this re-validates the token via `sprite org list`
and attempts `sprite auth refresh` if expired.
Fixes junie batch 8 "authentication failed" errors.
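The retry in item 1 can be sketched roughly as below. This is a minimal illustration, not the actual `_sprite_provision_verify` code: `retry_with_backoff` and its arguments are hypothetical names. With 3 retries and a 5s base delay it sleeps 5s, 10s, then 20s between attempts.

```shell
# Hypothetical helper (not from the repo): run a command, retrying up to
# max_retries times after the first failure and doubling the delay each time.
retry_with_backoff() {
  local max_retries="$1" delay="$2"
  shift 2
  local retries=0
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "${retries}" -ge "${max_retries}" ]; then
      return 1  # out of retries, report failure to the caller
    fi
    sleep "${delay}"
    delay=$((delay * 2))
    retries=$((retries + 1))
  done
}
```

A call such as `retry_with_backoff 3 5 some_list_command` (placeholder command) would make at most four attempts in total.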
Fixes #2934
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Apply the same base64 encoding mitigation used by all other cloud
drivers (aws, hetzner, digitalocean, gcp). The command is encoded
locally, validated for safe characters, then decoded and executed
on the remote side via `base64 -d | bash`.
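A rough sketch of the encode, validate, decode flow described above. The helper names are hypothetical, and the "remote side" is simulated with a local pipeline; a real driver would run the decode step through its own remote exec call.

```shell
# Hypothetical helper: base64-encode a command and refuse to continue if the
# result contains anything outside the base64 alphabet.
encode_cmd() {
  local cmd="$1" encoded
  encoded="$(printf '%s' "${cmd}" | base64 | tr -d '\n')"
  case "${encoded}" in
    ''|*[!A-Za-z0-9+/=]*)
      printf 'refusing to send: unexpected characters in encoding\n' >&2
      return 1
      ;;
  esac
  printf '%s' "${encoded}"
}

# Stand-in for the remote side: decode the payload and execute it with bash.
run_encoded() {
  printf '%s' "$1" | base64 -d | bash
}
```

Because the payload only contains base64 characters, quotes and other metacharacters in the original command never reach the outer command string.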
Fixes #2800
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The pre-run stale cleanup (added in #2789) used the same 30-minute max_age
as the post-run cleanup. Orphaned instances from recently failed runs (< 30 min
old) were not cleaned up, causing quota exhaustion on DigitalOcean and other clouds.
Pre-run cleanup now uses _CLEANUP_MAX_AGE=300 (5 min) to aggressively reclaim
orphaned e2e instances before provisioning new ones. Post-run cleanup retains
the 30-minute default. All 5 cloud drivers respect the override.
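One way the override could be honored inside a driver, as a sketch only: `is_stale` and its epoch arguments are hypothetical, and the 1800s fallback stands in for the 30-minute default mentioned above.

```shell
# Hypothetical helper: an instance is stale when its age exceeds the max age.
# _CLEANUP_MAX_AGE (seconds) overrides the 30-minute default when set.
is_stale() {
  local created_epoch="$1" now_epoch="$2"
  local max_age="${_CLEANUP_MAX_AGE:-1800}"
  [ $((now_epoch - created_epoch)) -gt "${max_age}" ]
}
```

With `_CLEANUP_MAX_AGE=300` exported before the pre-run cleanup, a 10-minute-old orphan counts as stale; without it, the same instance survives until the post-run threshold.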
Fixes #2793
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace word-split _sprite_org_flags() call sites with _sprite_cmd()
helper that uses a proper bash array for the -o flag, eliminating
injection risk from org names with spaces or shell metacharacters
- Validate _SPRITE_ORG against [A-Za-z0-9_-]+ in _sprite_validate_env
- Use grep -qF (fixed-string) instead of grep -q for app name matching
to prevent regex metacharacters in names from causing false matches
- Use mktemp for _stderr_tmp in _sprite_exec instead of predictable
PID-based path (/tmp/sprite-exec-err.$$) to prevent symlink attacks
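The first three points can be illustrated with the sketch below. It is not the actual `_sprite_cmd` implementation: `validate_org`, `build_sprite_args`, `SPRITE_ARGS`, and `name_in_list` are hypothetical names, and the script assumes bash because of the array.

```shell
#!/usr/bin/env bash
# Hypothetical: accept only org names matching [A-Za-z0-9_-]+.
validate_org() {
  case "$1" in
    ''|*[!A-Za-z0-9_-]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical: build the argument list as an array so the org name stays a
# single argument even if it contains spaces or metacharacters.
build_sprite_args() {
  SPRITE_ARGS=()
  if [ -n "${_SPRITE_ORG:-}" ]; then
    SPRITE_ARGS+=(-o "${_SPRITE_ORG}")
  fi
  SPRITE_ARGS+=("$@")
}

# Fixed-string match: "." and "*" in the name are literal characters.
name_in_list() {
  printf '%s\n' "$2" | grep -qF -- "$1"
}
```

The array would then be expanded as `"${SPRITE_ARGS[@]}"` after the CLI name, which avoids the word splitting that an unquoted flags string relies on.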
Closes #2436
Agent: complexity-hunter
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
The cloud_exec_long dispatcher in common.sh and all five cloud-specific
_exec_long implementations (aws, digitalocean, gcp, hetzner, sprite)
were defined but never called by any code in the e2e test suite.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces ${cfg}.fix$$ temp pattern with mktemp for guaranteed uniqueness.
Both temp file usages in the function are updated.
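A minimal sketch of the mktemp pattern, assuming a rewrite-then-rename flow. `rewrite_config` is a hypothetical name, and the `sed` expression is only a placeholder for whatever fix the real function applies.

```shell
# Hypothetical: rewrite a config file through a uniquely named temp file
# instead of a predictable "${cfg}.fix$$" path.
rewrite_config() {
  local cfg="$1" tmp
  tmp="$(mktemp "${cfg}.XXXXXX")" || return 1
  # Placeholder transformation: strip trailing whitespace from each line.
  if sed 's/[[:space:]]*$//' "${cfg}" > "${tmp}"; then
    mv "${tmp}" "${cfg}"
  else
    rm -f "${tmp}"
    return 1
  fi
}
```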
Fixes #2354
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Pipe the command via stdin to bash instead of embedding it in a bash -c
string. This eliminates shell injection risk from the unquoted cmd parameter,
consistent with _sprite_exec_long in the same file and other cloud drivers.
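The idea can be sketched as follows, with a local `bash -s` standing in for the remote shell. `run_via_stdin` is a hypothetical name; the real function wraps the driver's own exec call.

```shell
# Hypothetical: send the command text on stdin so it is never spliced into
# the outer command line. "bash -s" reads its script from standard input.
run_via_stdin() {
  printf '%s\n' "$1" | bash -s
}
```

One trade-off of this approach is that the executed command cannot read its own input from stdin, since stdin is carrying the script.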
Fixes #2327
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace the unsafe pattern where base64-encoded commands were interpolated
into remote command strings with stdin piping. Command data now travels
as stdin rather than as part of the command string, eliminating
injection risk from shell metacharacter interpretation.
Affected functions across all 5 cloud drivers:
- _hetzner_exec_long
- _aws_exec_long
- _gcp_exec_long
- _digitalocean_exec_long
- _sprite_exec_long
Fixes #2286
Fixes #2287
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes unquoted ${timeout} in _sprite_exec_long that could allow
command injection if timeout contained shell metacharacters.
Adds numeric validation before use.
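A numeric check of this kind could look like the sketch below; `validate_timeout` is a hypothetical name, not necessarily what the driver uses.

```shell
# Hypothetical: accept only non-empty strings made of decimal digits.
validate_timeout() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

Rejecting anything other than digits means a value like `120; rm -rf /` fails before it can reach a command line.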
Fixes #2117
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Without --force, sprite destroy prompts for confirmation in
non-interactive E2E mode and silently fails ("Ok, come back later!"),
leaving stale instances running indefinitely.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Base64-encode the command before embedding it in bash -c to prevent
single-quote breakout in _sprite_exec_long and _digitalocean_exec_long.
Fixes #2063
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(sprite): fix all 6 Sprite agent installs for E2E
- Use `npm install -g --prefix` instead of `npm config set prefix` to
avoid creating .npmrc that conflicts with nvm on Sprite VMs
- Fix shell environment setup to only modify .bash_profile (not .bashrc)
so non-interactive bash -c commands retain PATH config
- Add $HOME/.cargo/bin to PATH for zeroclaw (Sprite has no ~/.cargo/env)
- Add $HOME/.local/bin to PATH config for Sprite shell environment
- Add sprite E2E cloud driver with org detection, config corruption fix,
direct command embedding (not $1 positional), and retry logic
- Fix provision.sh to kill full process tree after timeout (prevents
orphaned sprite exec sessions from corrupting config)
- Fix verify.sh zeroclaw check to not rely on ~/.cargo/env existing
Tested: 6/6 Sprite agents pass E2E (claude, codex, openclaw, zeroclaw,
opencode, kilocode). Hermes is not in the Sprite manifest.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: biome format - collapse runSprite call to single line
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Replace bash -c "${cmd}" with bash -c '$1' _ "${cmd}" so the
command is passed as a positional argument, not interpolated into
the shell string. Same pattern applied to the timeout wrapper.
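To show why a positional argument is safer than interpolation, here is a small sketch. `print_arg_safely` is a hypothetical helper that passes untrusted text as `$1` to an inner `bash -c` whose script string is fixed.

```shell
# Hypothetical: the inner script is a constant; the untrusted value arrives
# as positional parameter $1 and is never parsed as shell syntax.
print_arg_safely() {
  bash -c 'printf "%s\n" "$1"' _ "$1"
}
```

Calling it with a value such as `$(echo pwned)` prints the text literally instead of running the substitution.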
Fixes #2018
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
E2E tests were failing because agent installs didn't complete within
the default 120s timeout, and small VMs ran out of memory during builds.
- INSTALL_WAIT: 120s → 300s (with per-cloud override via cloud_install_wait)
- AWS: nano_3_0 → medium_3_0 (all agents need 4GB for reliable installs)
- DigitalOcean: s-1vcpu-512mb-10gb → s-2vcpu-2gb, cap at 3 parallel
- GCP: e2-medium → e2-standard-2
- Hetzner: cap at 5 parallel (primary IP limit)
- Sprite: 300s install wait (slower exec than SSH)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
* feat(e2e): multi-cloud test suite with cloud driver pattern
Scale the E2E test suite from AWS-only to all 6 infrastructure clouds
(aws, hetzner, digitalocean, gcp, daytona, sprite) with parallel
execution support.
Architecture:
- Cloud driver pattern: each cloud implements _cloudname_func() functions
- load_cloud_driver() wires cloud-specific functions to generic names
(cloud_exec, cloud_teardown, etc.)
- Shared orchestration stays in one place, cloud details are isolated
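The driver pattern can be sketched roughly as below. This is an illustration under assumptions, not the real `load_cloud_driver()`: the toy `_aws_exec` and `_sprite_exec` bodies and the `ACTIVE_CLOUD` variable are made up, and sourcing the per-cloud file is omitted.

```shell
# Toy drivers following the _cloudname_func naming convention.
_aws_exec()    { printf 'aws:%s\n' "$*"; }
_sprite_exec() { printf 'sprite:%s\n' "$*"; }

# Hypothetical sketch: check the cloud name, confirm the driver function
# exists, then remember which cloud the generic names should dispatch to.
load_cloud_driver() {
  local cloud="$1"
  case "${cloud}" in
    ''|*[!a-z0-9_]*)
      printf 'invalid cloud name: %s\n' "${cloud}" >&2
      return 1
      ;;
  esac
  if ! command -v "_${cloud}_exec" >/dev/null 2>&1; then
    printf 'no driver for cloud: %s\n' "${cloud}" >&2
    return 1
  fi
  ACTIVE_CLOUD="${cloud}"
}

# Generic name used by shared orchestration code.
cloud_exec() { "_${ACTIVE_CLOUD}_exec" "$@"; }
```

After `load_cloud_driver sprite`, shared code calls `cloud_exec ...` without knowing which cloud is behind it.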
New files:
- sh/e2e/e2e.sh — unified entry point with --cloud flag
- sh/e2e/lib/clouds/{aws,hetzner,digitalocean,gcp,daytona,sprite}.sh
Refactored:
- common.sh — removed AWS constants, added load_cloud_driver()
- provision.sh — cloud-agnostic via cloud_headless_env/cloud_provision_verify
- verify.sh — replaced aws_ssh with cloud_exec/cloud_exec_long
- teardown.sh/cleanup.sh — delegate to cloud driver functions
- aws-e2e.sh — thin wrapper: exec e2e.sh --cloud aws
Usage:
  e2e.sh --cloud aws                    # Single cloud
  e2e.sh --cloud aws --cloud hetzner    # Multiple clouds in parallel
  e2e.sh --cloud all --parallel 3       # All clouds, 3 agents in parallel
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(e2e): prevent subshell EXIT trap inheritance and single-cloud early exit
- Reset EXIT trap in multi-cloud subshells to prevent LOG_DIR deletion
before the main process reads log files
- Use `|| true` for single-cloud run_agents_for_cloud to prevent set -e
from skipping the summary on env validation failure
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: default to parallel agent provisioning in e2e tests
All agents within a cloud now run in parallel by default instead of
sequentially. Use --sequential to restore the old behavior.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: cap sprite parallelism, 4GB for openclaw, remove stderr suppression
- Sprite: add _sprite_max_parallel (cap 2 concurrent agents) to avoid
CLI rate limiting that caused all 6 agents to fail
- AWS: use medium_3_0 (4GB) bundle for openclaw which needs more RAM
- Input tests: remove 2>/dev/null from agent commands so failures
produce visible error output instead of empty responses
- Add cloud_max_parallel to driver interface, respected by e2e.sh
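How a per-cloud cap might be applied is sketched below. `effective_parallel` is a hypothetical helper; it assumes the driver reports its cap as a number, with 0 or no value meaning "no cap".

```shell
# Hypothetical: clamp the requested parallelism to the driver's cap.
effective_parallel() {
  local requested="$1" cap="${2:-0}"
  if [ "${cap}" -gt 0 ] && [ "${requested}" -gt "${cap}" ]; then
    printf '%s\n' "${cap}"
  else
    printf '%s\n' "${requested}"
  fi
}
```

For Sprite, six requested agents with a cap of 2 would run two at a time.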
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use bash instead of sh for exec_long across all cloud drivers
Ubuntu's /bin/sh is dash, which doesn't support bash-specific PATH
sourcing from .spawnrc/.cargo/env. This caused codex and zeroclaw
input tests to fail with "command not found" even though verify passed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: codex input test uses positional prompt, not -q flag
codex CLI takes prompt as positional arg: `codex "PROMPT"`.
The -q flag doesn't exist, causing "Usage:" error output.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use codex exec -q for non-interactive input test
codex requires `exec` subcommand for non-interactive mode.
Plain `codex PROMPT` expects a TTY (stdin is not a terminal).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: codex exec takes no -q flag, just positional prompt
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use cx23 instead of deprecated cx22 for Hetzner e2e tests
Hetzner deprecated server type cx22 (ID 104). The default now uses cx23.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>