Commit graph

39 commits

Author SHA1 Message Date
Muhammad Hashmi
9b176cd5b8
feat(daytona): add Daytona provider (#3168)
* feat(daytona): re-add Daytona cloud provider

* fix(daytona): tighten live provider behavior

* fix(daytona): harden reconnect and dashboard flows
2026-04-04 00:36:38 +00:00
A
15df9dfae3
fix(security): array-based agent detection and GCP instance name validation (#3158)
* fix(security): array-based agent detection and GCP instance name validation

Replace shell string concatenation in detectAgent() with individual
`command -v` calls per agent, eliminating the compound shell command.
Add _gcp_validate_instance_name() to validate GCP instance names match
[a-z][a-z0-9-]*[a-z0-9] before passing to gcloud commands.

Fixes #3151
Fixes #3149

Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: add instance name validation in _gcp_cleanup_stale()

Defense-in-depth: validate instance names from GCP API before passing
to gcloud delete, consistent with validation at other call sites.

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-03 11:24:33 +07:00
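A minimal bash sketch of the two checks described in #3158, assuming nothing beyond what the message states. `_gcp_validate_instance_name` applies the regex quoted above. `detect_agents` is a hypothetical stand-in for the TypeScript `detectAgent()` that shows the "one `command -v` per agent" shape, and the agent names passed to it are illustrative. GCP also caps instance names at 63 characters, which this sketch does not check.

```bash
#!/usr/bin/env bash

# Reject anything that does not match the pattern from the commit message
# before it can reach a gcloud command line.
_gcp_validate_instance_name() {
  local name="$1"
  if [[ "${name}" =~ ^[a-z][a-z0-9-]*[a-z0-9]$ ]]; then
    return 0
  fi
  printf 'invalid GCP instance name: %q\n' "${name}" >&2
  return 1
}

# Hypothetical helper: probe each agent binary with its own `command -v`
# call instead of building one compound shell string.
detect_agents() {
  local agent
  for agent in "$@"; do
    if command -v "${agent}" >/dev/null 2>&1; then
      printf '%s\n' "${agent}"
    fi
  done
}
```

Because each name is passed as a separate argument, an agent name can never be spliced into a larger shell expression.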
A
e3278578ee
fix(e2e): skip GCP tests when billing is disabled (#3146)
Add a billing pre-check to _gcp_validate_env so the E2E orchestrator
skips GCP gracefully ("skipped — credentials not configured") instead
of failing every agent individually when billing is disabled.

Fixes #3091

Agent: test-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 19:26:42 +07:00
A
e98a3a5c4b
fix(e2e): use jq to count DigitalOcean droplets instead of grep (#3125)
The previous grep -o '"id":[0-9]*' pattern matched all numeric id fields
in the droplets JSON response (including nested image/region/size ids),
overcounting droplets by 2x and falsely reporting quota exhaustion.

Replace with jq '.droplets | length' which correctly counts only top-level
droplet objects. This restores DigitalOcean capacity detection so e2e runs
can use available droplet slots.

-- qa/e2e-tester

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-31 16:32:33 +07:00
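A small sketch contrasting the two counting approaches from #3125. The sample payload is hypothetical and trimmed to the fields that matter: each droplet carries a nested `image.id`, which the old grep pattern also counts. The jq variant requires `jq` to be installed.

```bash
#!/usr/bin/env bash

# Old approach: every "id":<number> anywhere in the body is counted,
# including ids nested inside other objects.
count_droplets_grep() {
  grep -o '"id":[0-9]*' <<<"$1" | wc -l | tr -d ' '
}

# New approach: only the length of the top-level droplets array.
count_droplets_jq() {
  jq '.droplets | length' <<<"$1"
}

# Hypothetical, trimmed /v2/droplets-style response with two droplets.
sample='{"droplets":[{"id":101,"image":{"id":9001}},{"id":102,"image":{"id":9002}}]}'
# count_droplets_grep "${sample}"  -> 4 (double-counted)
# count_droplets_jq "${sample}"    -> 2
```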
A
455f4cd43e
fix(e2e): redirect DO max_parallel log_warn to stderr (#3110)
_digitalocean_max_parallel() called log_warn which writes colored output
to stdout, polluting the captured return value when invoked via
cloud_max=$(cloud_max_parallel). The downstream integer comparison
[ "${effective_parallel}" -gt "${cloud_max}" ] then fails with
'integer expression expected', silently leaving the droplet limit cap
unapplied. Fix: redirect log_warn output to stderr so only the numeric
value is captured.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-31 11:32:51 +07:00
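The failure mode in #3110 is a general one: `$(...)` captures everything a function prints to stdout. The sketch below uses hypothetical `log_warn` and `_example_max_parallel` stand-ins. The logger writes to stdout, as described above, and the fix is redirecting that one call to stderr so the caller captures only the number.

```bash
#!/usr/bin/env bash

# Hypothetical logger that, like the one described above, writes to stdout.
log_warn() {
  printf '\033[1;33m[WARN]\033[0m %s\n' "$*"
}

_example_max_parallel() {
  local available="$1"
  if [ "${available}" -lt 3 ]; then
    # Without >&2 this colored line would end up inside $(...).
    log_warn "only ${available} droplet slot(s) free" >&2
  fi
  printf '%s\n' "${available}"
}

# Caller: the integer comparison only works if cloud_max is a bare number.
clamp_parallel() {
  local requested="$1" cloud_max
  cloud_max=$(_example_max_parallel "$2")
  if [ "${requested}" -gt "${cloud_max}" ]; then
    requested="${cloud_max}"
  fi
  printf '%s\n' "${requested}"
}
```

The warning still reaches the terminal because stderr is not captured by command substitution.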
A
b0f9f4e7af
refactor(e2e): normalize unused-arg comments in headless_env functions (#3113)
GCP, Sprite, and DigitalOcean had commented-out code `# local agent="$2"`
in their `_headless_env` functions. Hetzner already used the cleaner style
`# $2 = agent (unused but part of the interface)`. Normalize to match.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 03:51:07 +07:00
A
f2f981bd0a
fix(e2e): reduce Hetzner batch parallelism from 3 to 2 (#3112)
Prevents server_limit_reached errors when pre-existing servers (e.g.
spawn-szil) consume quota during E2E batch 1.

Fixes #3111

Agent: test-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-31 03:08:18 +07:00
A
0bd8930c09
fix(digitalocean): use canonical DIGITALOCEAN_ACCESS_TOKEN env var (#3099)
Replaces all references to DO_API_TOKEN with DIGITALOCEAN_ACCESS_TOKEN,
matching DigitalOcean's official CLI and API documentation. This includes
TypeScript source, tests, shell scripts, Packer config, CI workflows,
and documentation.

Supersedes #3068 (rebased onto current main).

Agent: pr-maintainer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-30 08:48:56 +07:00
A
11f0c334aa
fix(digitalocean): fail fast when droplet quota is exhausted, list existing droplets (#3062)
- E2E: _digitalocean_max_parallel() now returns 0 (not 1) when no capacity
- E2E: run_agents_for_cloud() skips cloud with actionable error when capacity is 0
- CLI: checkAccountStatus() includes droplet names in limit-reached error message

Fixes #3059

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 18:49:18 +07:00
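A sketch of the capacity logic described in #3062 together with the limit-minus-count calculation from #2518, assuming the account limit and current droplet count are already known. The real driver queries `/v2/account` and the droplet list. Both function names are hypothetical, and the fallback of 3 follows #2518.

```bash
#!/usr/bin/env bash

# available = droplet_limit - existing droplets, never negative.
# Falls back to 3 when either value is missing or non-numeric (API unreachable).
_example_do_capacity() {
  local limit="$1" count="$2" available
  if ! [[ "${limit}" =~ ^[0-9]+$ && "${count}" =~ ^[0-9]+$ ]]; then
    printf '3\n'
    return 0
  fi
  available=$(( limit - count ))
  if (( available < 0 )); then
    available=0
  fi
  printf '%s\n' "${available}"
}

# Caller: 0 means "skip this cloud" with an actionable message, not "run 1".
_example_run_cloud() {
  local capacity
  capacity=$(_example_do_capacity "$1" "$2")
  if [ "${capacity}" -eq 0 ]; then
    printf 'digitalocean: droplet limit reached (%s/%s); delete droplets or raise the limit\n' \
      "$2" "$1" >&2
    return 1
  fi
  printf 'running with parallelism %s\n' "${capacity}"
}
```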
A
499eb494c6
fix(security): use StrictHostKeyChecking=accept-new in all SSH connections (#3037)
Replace StrictHostKeyChecking=no with accept-new across all E2E cloud
drivers (aws, gcp, digitalocean, hetzner), the shared SSH_BASE_OPTS
constant, and pull-history.ts. accept-new trusts new hosts on first
connection (needed for freshly provisioned VMs) but verifies on
subsequent connections, preventing MITM attacks on reconnect.

Fixes #3031

Agent: style-reviewer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-26 18:04:40 -07:00
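A sketch of a shared option array in bash, assuming the shell drivers hold their SSH options in an array rather than a string. `accept-new` requires OpenSSH 7.6 or later. `ConnectTimeout=10` and the `_example_ssh` wrapper are illustrative additions, not taken from the commit.

```bash
#!/usr/bin/env bash

# One shared policy: trust a host key on first contact, refuse a changed key later.
SSH_BASE_OPTS=(
  -o StrictHostKeyChecking=accept-new
  -o ConnectTimeout=10   # illustrative, not from the commit
)

# Hypothetical wrapper that every driver could call instead of raw ssh.
_example_ssh() {
  local user_host="$1"
  shift
  ssh "${SSH_BASE_OPTS[@]}" "${user_host}" "$@"
}
```

Keeping the options in an array means each `-o` value stays a single argument even if it ever contains spaces.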
A
aafdb8655f
fix(security): pipe encoded commands via stdin in GCP/AWS exec functions (#3036)
Replace shell interpolation of base64-encoded commands in SSH invocations
with stdin piping. Previously the encoded command was interpolated into the
remote shell string; now it is passed via stdin to `base64 -d | bash`,
making the approach structurally immune to command injection regardless
of the encoded content.

Fixes #3029
Fixes #3022

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-27 06:11:50 +07:00
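A minimal sketch of the stdin-piping shape from #3036, with a hypothetical `_example_cloud_exec`. The remote command string is the fixed literal `base64 -d | bash`, and the caller's command only travels as data on stdin, so the remote login shell never parses it. One trade-off of this shape is that stdin is used for the payload, so a caller cannot also pipe its own data into the remote command.

```bash
#!/usr/bin/env bash

_example_cloud_exec() {
  local host="$1" cmd="$2" encoded
  # Encode locally; strip line wrapping so the payload is one token.
  encoded=$(printf '%s' "${cmd}" | base64 | tr -d '\n')
  # The remote side decodes stdin and runs it; nothing is interpolated.
  printf '%s' "${encoded}" |
    ssh -o StrictHostKeyChecking=accept-new "${host}" 'base64 -d | bash'
}
```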
A
defca448b0
fix(e2e): load GCP_ZONE from ~/.config/spawn/gcp.json in E2E driver (#3017)
The GCP E2E cloud driver defaulted to us-central1-a when GCP_ZONE was
not set in the environment. The QA VM stores zone config in
~/.config/spawn/gcp.json (alongside GCP_PROJECT) but _gcp_validate_env
only read GCP_PROJECT from the environment — it never loaded GCP_ZONE.

This caused E2E failures when us-central1-a had insufficient resources:
3 agents (openclaw, opencode, kilocode) failed with "SSH port never
opened" because GCP couldn't provision instances in that zone.

Fix: load both GCP_PROJECT and GCP_ZONE from the config file in
_gcp_validate_env when they are not already set in the environment,
matching how key-request.sh loads GCP_PROJECT for provisioning.

Verified: all 3 previously failing agents now pass on europe-west1-b.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 01:27:46 +07:00
A
aafeda4020
fix(e2e): reduce Hetzner max parallel from 5 to 3 to respect primary IP quota (#2943)
The QA account's primary IP limit is ~3, so running 5 agents in parallel
exhausted the quota, causing codex and zeroclaw to fail with
resource_limit_exceeded. Reducing _hetzner_max_parallel to 3 keeps
provisioning within quota while still running agents concurrently.

Verified: zeroclaw and codex both PASS on Hetzner after this fix.

-- qa/e2e-tester

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-24 13:32:10 +07:00
A
81ab237efe
fix(e2e): harden shell scripts against injection in SSH commands (#2945)
- hetzner.sh: Pipe base64-encoded command via stdin to SSH instead of
  embedding it in the SSH command string via variable expansion. The
  remote bash reads stdin, base64-decodes, and executes.

- verify.sh: Add remote-side re-validation of base64 and timeout values
  in _stage_prompt_remotely and _stage_timeout_remotely. Values are
  assigned to remote shell variables and validated before writing to
  temp files, providing defense-in-depth against injection.

- provision.sh: Add explicit early rejection of dangerous shell chars
  ($, `, \) in env var values from cloud_headless_env, and add
  remote-side re-validation of base64 payload before writing.

Fixes #2937
Fixes #2938
Fixes #2939

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-24 13:30:47 +07:00
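A sketch of the early rejection described for provision.sh in #2945, using a hypothetical `_example_check_env_value`. Any value containing `$`, a backtick, or a backslash is refused before it can be written into a remote script. The variable names in the usage line are illustrative.

```bash
#!/usr/bin/env bash

_example_check_env_value() {
  local name="$1" value="$2"
  case "${value}" in
    *'$'* | *'`'* | *'\'*)
      printf 'rejecting %s: value contains $, ` or \\\n' "${name}" >&2
      return 1
      ;;
  esac
  return 0
}

# Usage (illustrative): _example_check_env_value MODEL_ID "${MODEL_ID}" || exit 1
```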
A
e9cbab5b7f
fix(sprite): add retry for list failures, increase timeout, refresh auth on expiry (#2936)
Three fixes for Sprite E2E failures in long-running batches (73+ min):

1. Retry `_sprite_provision_verify`: list failures now retry 3x with
   exponential backoff (5s, 10s, 20s) instead of failing immediately.
   Fixes kilocode batch 6 "Could not list Sprite instances" errors.

2. Increase `CREATE_TIMEOUT_SECS` default from 300s to 600s and add
   `Client.Timeout`, `request canceled`, and `authentication failed`
   to the transient error retry pattern in `spriteRetry`, which now uses
   linear backoff (3s * attempt) instead of a fixed 3s delay.
   Fixes hermes batch 7 HTTP timeout errors.

3. Add `_sprite_refresh_auth` + `cloud_refresh_auth` interface. The
   E2E orchestrator calls `cloud_refresh_auth` before each provisioning
   batch. For Sprite, this re-validates the token via `sprite org list`
   and attempts `sprite auth refresh` if expired.
   Fixes junie batch 8 "authentication failed" errors.

Fixes #2934

Agent: ux-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 21:47:58 -07:00
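A generic bash sketch of the exponential backoff in fix 1 above, with a hypothetical `_example_retry`. This reads "retry 3x (5s, 10s, 20s)" as one initial attempt plus three retries, so calling it with 4 attempts and a 5s base delay waits 5s, 10s, then 20s. That reading is an assumption.

```bash
#!/usr/bin/env bash

# _example_retry <max_attempts> <initial_delay_secs> <command...>
_example_retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    if (( i < attempts )); then
      printf 'attempt %d/%d failed; retrying in %ss\n' "${i}" "${attempts}" "${delay}" >&2
      sleep "${delay}"
      delay=$(( delay * 2 ))   # 5 -> 10 -> 20
    fi
  done
  return 1
}

# Usage (command name is a placeholder): _example_retry 4 5 list_instances
```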
A
50319e0d39
fix(hetzner): clean up orphaned primary IPs before provisioning to avoid quota exceeded (#2935)
Hetzner E2E runs fail with `resource_limit_exceeded` when stale primary
IPs from previous test runs consume the account quota. This adds proactive
cleanup at two levels:

1. E2E shell driver: `_hetzner_cleanup_orphaned_ips()` deletes unattached
   primary IPs during pre-batch stale cleanup, freeing quota before any
   new servers are provisioned.

2. TypeScript CLI: `hetzner/main.ts` calls `cleanupOrphanedPrimaryIps()`
   before `createServer()` in headless/non-interactive mode, ensuring
   each agent provisioning attempt starts with a clean IP quota.

The existing reactive cleanup (retry after failure) in `hetzner.ts`
remains as a fallback.

Fixes #2933

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 11:20:30 +07:00
A
8d76ad90d3
security: base64-encode cmd in _sprite_exec to prevent injection (#2803)
Apply the same base64 encoding mitigation used by all other cloud
drivers (aws, hetzner, digitalocean, gcp). The command is encoded
locally, validated for safe characters, then decoded and executed
on the remote side via `base64 -d | bash`.

Fixes #2800

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-19 13:19:07 -07:00
A
8fef58845c
fix(e2e): use aggressive cleanup threshold (5 min) for pre-run to prevent quota exhaustion (#2798)
The pre-run stale cleanup (added in #2789) used the same 30-minute max_age
as the post-run cleanup. Orphaned instances from recently failed runs (< 30 min
old) were not cleaned, causing quota exhaustion on DigitalOcean and other clouds.

Pre-run cleanup now uses _CLEANUP_MAX_AGE=300 (5 min) to aggressively reclaim
orphaned e2e instances before provisioning new ones. Post-run cleanup retains
the 30-minute default. All 5 cloud drivers respect the override.

Fixes #2793

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 11:23:55 -07:00
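A sketch of an age check that honours the `_CLEANUP_MAX_AGE` override from #2798, with a hypothetical `_example_is_stale`. The 1800s default mirrors the 30-minute post-run threshold. Whether the real drivers compare with `>` or `>=` is not stated, so the strict comparison here is an assumption.

```bash
#!/usr/bin/env bash

# _example_is_stale <created_epoch> [now_epoch]
_example_is_stale() {
  local created_epoch="$1" now_epoch="${2:-$(date +%s)}"
  local max_age="${_CLEANUP_MAX_AGE:-1800}"
  (( now_epoch - created_epoch > max_age ))
}

# Pre-run: a per-call override keeps the aggressive 5-minute threshold scoped.
#   _CLEANUP_MAX_AGE=300 _example_is_stale "${created}" && delete_instance ...
# Post-run: no override, so the 30-minute default applies.
```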
A
6fda75ccc8
security: validate base64 output in cloud_exec and soak.sh (defense-in-depth) (#2532)
Add base64 character validation ([A-Za-z0-9+/=]) before use in SSH
command strings for gcp.sh, aws.sh, and hetzner.sh cloud_exec
functions -- matching the existing fix in digitalocean.sh (#2528).

Also add a validated _encode_b64 helper to soak.sh and use it for
all Telegram bot token encoding, preventing corrupted base64 from
breaking out of single-quoted SSH command strings.

Closes #2527

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-12 09:32:48 -04:00
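A sketch of what a validated `_encode_b64` helper might look like. The name comes from the message above, but this body is an illustration, not the soak.sh code. It fails closed on empty output or on any character outside `[A-Za-z0-9+/=]`, so the result can be placed inside a single-quoted SSH command string.

```bash
#!/usr/bin/env bash

_encode_b64() {
  local out
  # Strip wrapping newlines that some base64 implementations insert.
  out=$(printf '%s' "$1" | base64 | tr -d '\n')
  if [[ -z "${out}" || ! "${out}" =~ ^[A-Za-z0-9+/=]+$ ]]; then
    printf '_encode_b64: unexpected output\n' >&2
    return 1
  fi
  printf '%s' "${out}"
}

# Usage (illustrative):
#   token_b64=$(_encode_b64 "${BOT_TOKEN}") || exit 1
#   ssh "${host}" "printf '%s' '${token_b64}' | base64 -d > ~/.bot_token"
```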
A
76399eafd9
security: validate base64 in digitalocean.sh SSH exec (defense-in-depth) (#2528)
Add explicit base64 character validation in _digitalocean_exec after
encoding the command, matching the existing pattern in provision.sh.
This ensures the encoded value contains only [A-Za-z0-9+/=] before
embedding it in the SSH command string.

Note: #2527 (provision.sh base64 validation) was already fixed in a
prior commit — the validation at lines 284-289 already rejects
non-base64 characters and empty output.

Fixes #2526

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 08:16:48 -04:00
A
85a2289bb0
fix(e2e): dynamically calculate DigitalOcean parallel capacity from account limit (#2518)
Previously, _digitalocean_max_parallel() always returned 3, assuming all
quota slots were available. When pre-existing droplets occupy slots, the
batch-3 parallel runs fail with "droplet limit exceeded" API errors.

Now queries /v2/account for the actual droplet_limit and subtracts the
current droplet count to compute available capacity. Falls back to 3 if
the API is unreachable.

-- qa/e2e-tester

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-12 02:50:48 -04:00
A
e9f8d5ec2d
fix: secure curl header args and provision.sh export whitelist (fixes #2464, fixes #2465) (#2471)
- Replace `-H "Authorization: Bearer ..."` curl args with temp curl config
  files (`-K`) in digitalocean.sh and hetzner.sh e2e drivers, keeping API
  tokens out of `ps` output
- Replace dangerous-var blocklist in provision.sh with a positive whitelist
  of allowed cloud_headless_env variable names

Agent: complexity-hunter

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-10 17:54:32 -07:00
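A sketch of the curl config technique from #2471, with a hypothetical `_example_write_curl_auth_cfg`. The bearer token is written to a private temp file and passed with `-K`, so it never appears in curl's argv (and therefore not in `ps`). Config values are double-quoted, so this sketch refuses tokens that would need escaping rather than escaping them.

```bash
#!/usr/bin/env bash

# Prints the path of a 0600 curl config file carrying the auth header.
_example_write_curl_auth_cfg() {
  local token="$1" cfg
  case "${token}" in
    '' | *'"'* | *'\'* | *$'\n'*) return 1 ;;
  esac
  cfg=$(mktemp) || return 1
  chmod 600 "${cfg}"
  printf 'header = "Authorization: Bearer %s"\n' "${token}" > "${cfg}"
  printf '%s' "${cfg}"
}

# Usage (token variable is illustrative):
#   cfg=$(_example_write_curl_auth_cfg "${API_TOKEN}") || exit 1
#   curl -sS -K "${cfg}" "https://api.digitalocean.com/v2/account"
#   rm -f "${cfg}"
```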
A
1bddd713ea
fix: base64-encode commands in SSH exec to prevent injection (#2448)
All four SSH-based cloud drivers (aws, digitalocean, gcp, hetzner)
passed the command string directly as an SSH argument, which gets
interpreted by the remote shell. While current callers pass trusted
E2E test code, this creates a security footgun for future changes.

Fix: base64-encode the command locally and decode it on the remote
side before piping to bash. The encoded string contains only safe
characters [A-Za-z0-9+/=], eliminating any injection vector. Stdin
is preserved for callers that pipe data into cloud_exec.

Closes #2432, closes #2433, closes #2434, closes #2435

Agent: complexity-hunter

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
2026-03-10 13:22:33 -04:00
A
47b26deafa
fix: harden Sprite exec against injection via org flags and grep patterns (#2446)
- Replace word-split _sprite_org_flags() call sites with _sprite_cmd()
  helper that uses a proper bash array for the -o flag, eliminating
  injection risk from org names with spaces or shell metacharacters
- Validate _SPRITE_ORG against [A-Za-z0-9_-]+ in _sprite_validate_env
- Use grep -qF (fixed-string) instead of grep -q for app name matching
  to prevent regex metacharacters in names from causing false matches
- Use mktemp for _stderr_tmp in _sprite_exec instead of predictable
  PID-based path (/tmp/sprite-exec-err.$$) to prevent symlink attacks

Closes #2436

Agent: complexity-hunter

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
2026-03-10 10:08:17 -07:00
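A sketch of the array-based `_sprite_cmd` idea from #2446. The helper name is from the message, but the body is an illustration. Where `-o` sits relative to the sprite subcommand is an assumption. `_example_validate_sprite_org` and `_example_list_has_app` are hypothetical, and the `-x` (whole-line) flag on grep is an addition beyond the `-qF` the commit describes.

```bash
#!/usr/bin/env bash

# Empty org means "no flag"; otherwise only [A-Za-z0-9_-] is accepted.
_example_validate_sprite_org() {
  [[ -z "${_SPRITE_ORG:-}" || "${_SPRITE_ORG}" =~ ^[A-Za-z0-9_-]+$ ]]
}

_sprite_cmd() {
  local -a org_args=()
  if [ -n "${_SPRITE_ORG:-}" ]; then
    org_args=(-o "${_SPRITE_ORG}")
  fi
  # ${arr[@]+...} expands to nothing for an empty array, even under set -u on bash 3.2.
  sprite ${org_args[@]+"${org_args[@]}"} "$@"
}

# -F: the app name is a fixed string, so '.' or '*' in it match literally.
_example_list_has_app() {
  printf '%s\n' "$1" | grep -qxF -- "$2"
}
```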
A
3724bb8ba4
fix: address SSH command injection risks in e2e cloud drivers (#2447)
Add defense-in-depth validation across all e2e cloud driver scripts:

- Validate IP addresses match IPv4 format before use in SSH commands
  (aws, digitalocean, gcp, hetzner)
- Validate SSH username contains only safe characters (gcp)
- Validate resource IDs are numeric before interpolating into API URLs
  (digitalocean droplet IDs, hetzner server IDs)
- URL-encode app name in Hetzner API query parameter to prevent
  query parameter injection
- Validate numeric env vars (INPUT_TEST_TIMEOUT, PROVISION_TIMEOUT,
  INSTALL_WAIT) that get interpolated into remote command strings

Fixes #2432, #2433, #2434, #2435, #2442

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 12:27:47 -04:00
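A sketch of three validations listed in #2447: IPv4 format, numeric resource IDs, and URL-encoding of a query value. All three function names are hypothetical. The encoder keeps only RFC 3986 unreserved characters and forces the C locale so multibyte input is encoded byte by byte.

```bash
#!/usr/bin/env bash

_example_is_ipv4() {
  local ip="$1" octet
  [[ "${ip}" =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
  for octet in "${BASH_REMATCH[@]:1}"; do
    (( 10#${octet} <= 255 )) || return 1   # 10# avoids octal parsing of "010"
  done
}

_example_is_numeric_id() {
  [[ "$1" =~ ^[0-9]+$ ]]
}

# Percent-encode everything except A-Z a-z 0-9 . ~ _ -
_example_urlencode() {
  local LC_ALL=C s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "${c}" in
      [A-Za-z0-9.~_-]) out+="${c}" ;;
      *) out+=$(printf '%%%02X' "'${c}") ;;
    esac
  done
  printf '%s' "${out}"
}
```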
A
c4ae16849d
refactor: remove dead cloud_exec_long and _*_exec_long functions (#2407)
The cloud_exec_long dispatcher in common.sh and all five cloud-specific
_exec_long implementations (aws, digitalocean, gcp, hetzner, sprite)
were defined but never called by any code in the e2e test suite.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-09 19:39:53 -07:00
A
bd1399c861
fix: use mktemp in _sprite_fix_config to prevent race conditions (#2359)
Replaces ${cfg}.fix$$ temp pattern with mktemp for guaranteed uniqueness.
Both temp file usages in the function are updated.

Fixes #2354

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 18:46:48 -07:00
A
de732fa695
fix: prevent command injection in _sprite_exec via stdin piping (#2329)
Pipe the command via stdin to bash instead of embedding it in a bash -c
string. This eliminates shell injection risk from the unquoted cmd parameter,
consistent with _sprite_exec_long in the same file and other cloud drivers.

Fixes #2327

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-08 06:44:19 -04:00
A
1740274323
fix: replace base64 interpolation with stdin piping in all cloud exec_long functions (#2290)
Replace unsafe pattern where base64-encoded commands were interpolated
into remote command strings with secure stdin piping — command data now
travels as stdin rather than as part of the command string, eliminating
injection risk from shell metacharacter interpretation.

Affected functions across all 5 cloud drivers:
- _hetzner_exec_long
- _aws_exec_long
- _gcp_exec_long
- _digitalocean_exec_long
- _sprite_exec_long

Fixes #2286
Fixes #2287

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 14:09:15 -05:00
A
035e4bf830
Remove Daytona cloud provider from codebase (#2261)
Simplify the cloud matrix by removing Daytona. All Daytona-specific code,
scripts, tests, and configuration have been removed. Daytona has been moved
to "Previously Considered" in the Cloud Provider Wishlist (#1183) and can
be revived on community demand.

Closes #2260

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-06 18:53:08 -05:00
A
7cc21e4111
fix(security): quote timeout var and validate numeric in sprite.sh (#2120)
Fixes unquoted ${timeout} in _sprite_exec_long that could allow
command injection if timeout contained shell metacharacters.
Adds numeric validation before use.

Fixes #2117

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-02 16:47:39 -05:00
Ahmed Abushagur
9242d44cbb
fix(e2e): add --force to sprite destroy in teardown (#2100)
Without --force, sprite destroy prompts for confirmation in
non-interactive E2E mode and silently fails ("Ok, come back later!"),
leaving stale instances running indefinitely.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 00:24:43 -08:00
A
548cfdf0b1
fix(security): apply base64 exec escaping to remaining 4 cloud drivers (#2067)
PR #2064 fixed _exec_long shell injection for DigitalOcean and Sprite
but missed the same bash -c '${cmd}' pattern in Hetzner, GCP, AWS, and
Daytona. Apply the same base64-encoding fix to all four.

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-01 11:50:33 -08:00
A
862030b776
fix(security): escape cmd args in _exec_long to prevent shell injection (#2064)
Base64-encode the command before embedding it in bash -c to prevent
single-quote breakout in _sprite_exec_long and _digitalocean_exec_long.

Fixes #2063

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 12:42:27 -05:00
Ahmed Abushagur
45caf4b96b
fix(sprite): fix all 6 Sprite agent installs for E2E (#2057)
* fix(sprite): fix all 6 Sprite agent installs for E2E

- Use `npm install -g --prefix` instead of `npm config set prefix` to
  avoid creating .npmrc that conflicts with nvm on Sprite VMs
- Fix shell environment setup to only modify .bash_profile (not .bashrc)
  so non-interactive bash -c commands retain PATH config
- Add $HOME/.cargo/bin to PATH for zeroclaw (Sprite has no ~/.cargo/env)
- Add $HOME/.local/bin to PATH config for Sprite shell environment
- Add sprite E2E cloud driver with org detection, config corruption fix,
  direct command embedding (not $1 positional), and retry logic
- Fix provision.sh to kill full process tree after timeout (prevents
  orphaned sprite exec sessions from corrupting config)
- Fix verify.sh zeroclaw check to not rely on ~/.cargo/env existing

Tested: 6/6 Sprite agents pass E2E (claude, codex, openclaw, zeroclaw,
opencode, kilocode). Hermes is not in the Sprite manifest.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: biome format - collapse runSprite call to single line

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-01 07:15:09 -05:00
Ahmed Abushagur
cd758589c3
fix(e2e): robust DigitalOcean teardown with retry and deletion confirmation (#2046)
The teardown was doing a single DELETE without --max-time, so connection
timeouts caused HTTP 000 and the droplet was never deleted. When running
6 agents in batches of 3, batch 1's stale droplet caused batch 2 to fail
with "will exceed your droplet limit."

Fix:
- Add --max-time 30 to prevent curl hangs
- Retry DELETE up to 3 times on failure
- Poll the API after DELETE to confirm the droplet is actually gone (up to 60s)
- Remove -f flag from curl so %{http_code} is always captured

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 22:09:15 -05:00
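A sketch of the teardown steps listed in #2046, with a hypothetical `_example_do_delete_droplet`. The loop follows the commit: `--max-time 30`, up to 3 DELETE attempts, no `-f` so `%{http_code}` is always printed (000 means the request never completed), then polling GET until 404 for up to 60s. Passing auth through a curl config file (`-K`, as in #2471) rather than `-H` is a choice made for this sketch. `_EXAMPLE_POLL_SLEEP` exists only so the poll interval can be shortened.

```bash
#!/usr/bin/env bash

# _example_do_delete_droplet <droplet_id> <curl_auth_config>
_example_do_delete_droplet() {
  local id="$1" auth_cfg="$2" code attempt i
  local url="https://api.digitalocean.com/v2/droplets/${id}"
  local poll_sleep="${_EXAMPLE_POLL_SLEEP:-5}"
  [[ "${id}" =~ ^[0-9]+$ ]] || return 1

  # Up to 3 DELETE attempts; 204 = accepted, 404 = already gone.
  for attempt in 1 2 3; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 30 \
      -K "${auth_cfg}" -X DELETE "${url}")
    case "${code}" in
      204 | 404) break ;;
    esac
    printf 'DELETE droplet %s: HTTP %s (attempt %s/3)\n' "${id}" "${code}" "${attempt}" >&2
  done

  # Confirm deletion: GET must return 404 (12 polls x 5s = 60s by default).
  for (( i = 0; i < 12; i++ )); do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 30 \
      -K "${auth_cfg}" "${url}")
    if [ "${code}" = "404" ]; then
      return 0
    fi
    sleep "${poll_sleep}"
  done
  return 1
}
```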
A
54c7764d03
fix(security): prevent cmd injection in sprite exec via positional args (#2021)
Replace bash -c "${cmd}" with bash -c '$1' _ "${cmd}" so the
command is passed as a positional argument, not interpolated into
the shell string. Same pattern applied to the timeout wrapper.

Fixes #2018

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 09:42:13 -05:00
Ahmed Abushagur
c1e605c884
fix(e2e): increase server sizes and install timeouts (#2014)
E2E tests were failing because agent installs didn't complete within
the default 120s timeout, and small VMs ran out of memory during builds.

- INSTALL_WAIT: 120s → 300s (with per-cloud override via cloud_install_wait)
- AWS: nano_3_0 → medium_3_0 (all agents need 4GB for reliable installs)
- DigitalOcean: s-1vcpu-512mb-10gb → s-2vcpu-2gb, cap at 3 parallel
- GCP: e2-medium → e2-standard-2
- Hetzner: cap at 5 parallel (primary IP limit)
- Sprite: 300s install wait (slower exec than SSH)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-02-28 00:25:36 -08:00
Ahmed Abushagur
627026a26b
feat(e2e): multi-cloud test suite with cloud driver pattern (#2004)
* feat(e2e): multi-cloud test suite with cloud driver pattern

Scale the E2E test suite from AWS-only to all 6 infrastructure clouds
(aws, hetzner, digitalocean, gcp, daytona, sprite) with parallel
execution support.

Architecture:
- Cloud driver pattern: each cloud implements _cloudname_func() functions
- load_cloud_driver() wires cloud-specific functions to generic names
  (cloud_exec, cloud_teardown, etc.)
- Shared orchestration stays in one place, cloud details are isolated

New files:
- sh/e2e/e2e.sh — unified entry point with --cloud flag
- sh/e2e/lib/clouds/{aws,hetzner,digitalocean,gcp,daytona,sprite}.sh

Refactored:
- common.sh — removed AWS constants, added load_cloud_driver()
- provision.sh — cloud-agnostic via cloud_headless_env/cloud_provision_verify
- verify.sh — replaced aws_ssh with cloud_exec/cloud_exec_long
- teardown.sh/cleanup.sh — delegate to cloud driver functions
- aws-e2e.sh — thin wrapper: exec e2e.sh --cloud aws

Usage:
  e2e.sh --cloud aws                     # Single cloud
  e2e.sh --cloud aws --cloud hetzner     # Multiple clouds in parallel
  e2e.sh --cloud all --parallel 3        # All clouds, 3 agents parallel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(e2e): prevent subshell EXIT trap inheritance and single-cloud early exit

- Reset EXIT trap in multi-cloud subshells to prevent LOG_DIR deletion
  before the main process reads log files
- Use `|| true` for single-cloud run_agents_for_cloud to prevent set -e
  from skipping the summary on env validation failure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: default to parallel agent provisioning in e2e tests

All agents within a cloud now run in parallel by default instead of
sequentially. Use --sequential to restore the old behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: cap sprite parallelism, 4GB for openclaw, remove stderr suppression

- Sprite: add _sprite_max_parallel (cap 2 concurrent agents) to avoid
  CLI rate limiting that caused all 6 agents to fail
- AWS: use medium_3_0 (4GB) bundle for openclaw which needs more RAM
- Input tests: remove 2>/dev/null from agent commands so failures
  produce visible error output instead of empty responses
- Add cloud_max_parallel to driver interface, respected by e2e.sh

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use bash instead of sh for exec_long across all cloud drivers

Ubuntu's /bin/sh is dash, which doesn't support bash-specific PATH
sourcing from .spawnrc/.cargo/env. This caused codex and zeroclaw
input tests to fail with "command not found" even though verify passed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: codex input test uses positional prompt, not -q flag

codex CLI takes prompt as positional arg: `codex "PROMPT"`.
The -q flag doesn't exist, causing "Usage:" error output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use codex exec -q for non-interactive input test

codex requires `exec` subcommand for non-interactive mode.
Plain `codex PROMPT` expects a TTY (stdin is not a terminal).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: codex exec takes no -q flag, just positional prompt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use cx23 instead of deprecated cx22 for Hetzner e2e tests

Hetzner deprecated server type cx22 (ID 104). The default now uses cx23.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-02-27 19:28:08 -08:00
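A minimal sketch of the cloud driver pattern from #2004. This is not the actual `common.sh`. It shows one possible way for `load_cloud_driver()` to wire `_<cloud>_<op>` functions to generic names: check that the driver defines every required function, record the active cloud, and have each generic function dispatch through that variable. The op list and `ACTIVE_CLOUD` are assumptions.

```bash
#!/usr/bin/env bash

# Illustrative driver interface; the real one has more ops.
_CLOUD_OPS=(validate_env exec teardown max_parallel)

load_cloud_driver() {
  local cloud="$1" op
  if ! [[ "${cloud}" =~ ^[a-z]+$ ]]; then
    printf 'bad cloud name: %s\n' "${cloud}" >&2
    return 1
  fi
  # Fail early if the driver file did not define the whole interface.
  for op in "${_CLOUD_OPS[@]}"; do
    if ! declare -F "_${cloud}_${op}" >/dev/null; then
      printf 'driver %s is missing _%s_%s\n' "${cloud}" "${cloud}" "${op}" >&2
      return 1
    fi
  done
  ACTIVE_CLOUD="${cloud}"
}

# Generic names used by the shared orchestration code.
cloud_validate_env() { "_${ACTIVE_CLOUD}_validate_env" "$@"; }
cloud_exec()         { "_${ACTIVE_CLOUD}_exec" "$@"; }
cloud_teardown()     { "_${ACTIVE_CLOUD}_teardown" "$@"; }
cloud_max_parallel() { "_${ACTIVE_CLOUD}_max_parallel" "$@"; }
```

With this shape, orchestration code such as provision.sh or verify.sh only ever calls `cloud_exec` and friends, and adding a cloud means adding one driver file that defines the `_<cloud>_*` functions.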