Commit graph

212 commits

Author SHA1 Message Date
A
8c73bb9713
fix(security): replace fragile printenv with eval parameter expansion in timeout functions (#3238)
The get_provision_timeout and get_agent_timeout functions used printenv with
dynamically constructed variable names, which is fragile across shells and
platforms. Replace with eval-based parameter expansion using the already-
sanitized safe_agent variable (restricted to [A-Za-z0-9_]).

Fixes #3234

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-08 01:44:43 -07:00
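The lookup described in #3238 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual common.sh: the helper `_builtin_agent_timeout` is hypothetical, and the 3600 (hermes) and 1800 (global) values are taken from the #3044 message further down this log. The agent name is reduced to `[A-Za-z0-9_]` before it is spliced into `eval`, and unset or non-numeric overrides fall back to the built-in table.

```bash
#!/usr/bin/env bash
# Illustrative only: names and structure are assumptions, not the real common.sh.
AGENT_TIMEOUT="${AGENT_TIMEOUT:-1800}"

# Built-in per-agent defaults as a case-statement lookup table.
_builtin_agent_timeout() {
  case "$1" in
    hermes) printf '%s\n' 3600 ;;
    *)      printf '%s\n' "$AGENT_TIMEOUT" ;;
  esac
}

get_agent_timeout() {
  local agent="$1" safe_agent value=""
  # Drop every character outside [A-Za-z0-9_] so the name is safe inside eval.
  safe_agent=$(printf '%s' "$agent" | LC_ALL=C tr -cd 'A-Za-z0-9_')
  if [ -n "$safe_agent" ]; then
    # After the outer expansion, eval sees: value="${AGENT_TIMEOUT_<name>:-}"
    eval "value=\"\${AGENT_TIMEOUT_${safe_agent}:-}\""
  fi
  case "$value" in
    ''|*[!0-9]*) _builtin_agent_timeout "$safe_agent" ;;  # unset or non-numeric
    *)           printf '%s\n' "$value" ;;
  esac
}
```

Because only the sanitized name ever reaches `eval`, an input such as `x;echo pwned` becomes the harmless variable suffix `xechopwned`.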
A
1745b78689
fix(security): restrict temp file permissions in send_matrix_email (#3239)
Set umask 077 before mktemp so the temp .ts file is created with 0600
permissions, preventing other users on shared systems from reading it.
Umask is restored immediately after file creation.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-08 15:33:34 +07:00
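A minimal sketch of the umask-then-mktemp pattern that #3239 describes. The helper name `make_private_tmp` and the file name template are assumptions for illustration; the real send_matrix_email code is not shown in this log.

```bash
#!/usr/bin/env bash
# Hypothetical helper; not the repository's implementation.
make_private_tmp() {
  local old_umask tmp rc
  old_umask=$(umask)
  umask 077                      # new files get no group/other permission bits
  tmp=$(mktemp "${TMPDIR:-/tmp}/matrix-email.XXXXXX")
  rc=$?
  umask "$old_umask"             # restore immediately after creation
  [ "$rc" -eq 0 ] || return "$rc"
  printf '%s\n' "$tmp"
}
```

Restoring the saved umask right away keeps the restrictive mask from leaking into later file creation in the same shell.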
A
7e44923fb9
fix(security): eliminate TOCTOU race in e2e.sh LOG_DIR cleanup (#3237)
The previous code resolved symlinks via realpath then operated on the
resolved path, leaving a window where an attacker could swap the symlink
target between resolution and rm -rf (CWE-367).

Fix: reject symlinks outright before deletion, perform the ownership check
on the original path (not the resolved one), and delete the original
path instead of the resolved path. This closes the exploitable TOCTOU
window, since rm -rf on a non-symlink directory does not follow symlinks.

Fixes #3233

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-08 13:11:56 +07:00
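The cleanup order from #3237 might look like the sketch below: refuse symlinks, check ownership on the path as given, then delete that same path. The function name and messages are hypothetical, and `[ -O ... ]` is assumed to be an acceptable ownership test (it is a bash extension, not POSIX).

```bash
#!/usr/bin/env bash
# Hypothetical helper; the real e2e.sh cleanup is not reproduced here.
safe_remove_log_dir() {
  local dir="$1"
  [ -n "$dir" ] || return 1
  if [ -L "$dir" ]; then
    echo "refusing to remove symlink: $dir" >&2
    return 1
  fi
  [ -d "$dir" ] || return 1
  [ -O "$dir" ] || return 1      # must be owned by the current user
  rm -rf -- "$dir"               # act on the original path, never a resolved one
}
```

Nothing is resolved with realpath, so there is no second path whose target could be swapped between the check and the delete.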
A
a9b429e0fd
fix(security): replace eval with safer alternatives in common.sh timeout functions (#3229)
Replace eval-based indirect variable expansion with:
- printenv for environment variable lookups (PROVISION_TIMEOUT_<agent>, AGENT_TIMEOUT_<agent>)
- Case statement lookup tables for builtin per-agent defaults

Fixes #3228

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-08 11:27:03 +07:00
A
05fbb2ebdc
fix(security): validate realpath result before LOG_DIR deletion in e2e.sh (#3225)
Fixes #3222

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-08 07:43:34 +07:00
A
f251ed59ba
fix(security): harden e2e.sh against injection, symlink, and DoS (#3197)
- Sanitize cloud/agent names before building email HTML (#3189)
- Validate result values against allowlist (pass/fail/skip)
- Resolve symlinks and check ownership before rm -rf (#3194)
- Add upper bounds on cloud/agent list sizes (#3190)

Fixes #3189 #3194 #3190

Agent: test-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 06:16:38 -07:00
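The allowlist check on result values mentioned in #3197 can be illustrated with a small case statement. The function name and the `unknown` fallback are assumptions; the log only states that values are validated against pass/fail/skip.

```bash
#!/usr/bin/env bash
# Hypothetical helper: anything outside the allowlist is replaced, never echoed back.
normalize_result() {
  case "$1" in
    pass|fail|skip) printf '%s\n' "$1" ;;
    *)              printf '%s\n' unknown ;;
  esac
}
```

Mapping unexpected input to a fixed token means no attacker-supplied text can reach the email HTML through the result column.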
A
aa98039f95
fix(e2e): validate LOG_DIR ownership before rm -rf in final_cleanup (#3183)
* fix(e2e): validate LOG_DIR ownership before rm -rf in final_cleanup

Adds _E2E_CREATED_LOG_DIR tracking to ensure cleanup only removes
directories created by this script instance, not attacker-controlled paths.

Fixes #3181

Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(e2e): restore SAFE_TMP_ROOT prefix validation alongside ownership check

Defense-in-depth: keep both the path prefix check (SAFE_TMP_ROOT/spawn-e2e.*)
and the ownership check (_E2E_CREATED_LOG_DIR) as two independent layers.

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-05 19:56:55 +07:00
Muhammad Hashmi
a60d238dfc
fix(daytona): set per-sandbox user/org defaults (#3175)
* feat(daytona): re-add Daytona cloud provider

* fix(daytona): tighten live provider behavior

* fix(daytona): harden reconnect and dashboard flows

* fix(daytona): use platform sandbox defaults

* fix(daytona): add user and org defaults

* fix(ux): stop echoing shell script on startup

---------

Signed-off-by: Ahmed Abushagur <ahmed@abushagur.com>
Co-authored-by: Ahmed Abushagur <ahmed@abushagur.com>
2026-04-04 18:08:40 -07:00
Muhammad Hashmi
9b176cd5b8
feat(daytona): add Daytona provider (#3168)
* feat(daytona): re-add Daytona cloud provider

* fix(daytona): tighten live provider behavior

* fix(daytona): harden reconnect and dashboard flows
2026-04-04 00:36:38 +00:00
A
15df9dfae3
fix(security): array-based agent detection and GCP instance name validation (#3158)
* fix(security): array-based agent detection and GCP instance name validation

Replace shell string concatenation in detectAgent() with individual
`command -v` calls per agent, eliminating the compound shell command.
Add _gcp_validate_instance_name() to validate GCP instance names match
[a-z][a-z0-9-]*[a-z0-9] before passing to gcloud commands.

Fixes #3151
Fixes #3149

Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: add instance name validation in _gcp_cleanup_stale()

Defense-in-depth: validate instance names from GCP API before passing
to gcloud delete, consistent with validation at other call sites.

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-03 11:24:33 +07:00
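A sketch of the name check from #3158, using the pattern quoted in the commit message. The function name here is hypothetical (the commit names `_gcp_validate_instance_name`, whose body is not shown), and pinning the locale to C is an added assumption so that `[a-z]` cannot match uppercase letters.

```bash
#!/usr/bin/env bash
# Hypothetical validator built from the pattern [a-z][a-z0-9-]*[a-z0-9].
gcp_instance_name_ok() {
  local LC_ALL=C                          # make [a-z] mean ASCII lowercase only
  local re='^[a-z][a-z0-9-]*[a-z0-9]$'
  [[ "$1" =~ $re ]]
}
```

A caller would run this before handing the name to any gcloud command and skip or abort on failure.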
A
e157637ab8
fix(e2e): add pi to E2E agent coverage (#3156)
Fixes #3152

Agent: ux-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-04-03 10:15:43 +07:00
A
e3278578ee
fix(e2e): skip GCP tests when billing is disabled (#3146)
Add a billing pre-check to _gcp_validate_env so the E2E orchestrator
skips GCP gracefully ("skipped — credentials not configured") instead
of failing every agent individually when billing is disabled.

Fixes #3091

Agent: test-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-02 19:26:42 +07:00
A
c1d8acb73e
feat: add Pi coding agent (shittycodingagent.ai) to spawn (#3128)
Pi is a minimal terminal coding agent by Mario Zechner (~29.8k GitHub
stars) that natively supports OpenRouter via OPENROUTER_API_KEY.
Installed via npm as @mariozechner/pi-coding-agent, CLI command is `pi`.

- Add Pi agent config across all 6 clouds (local, hetzner, aws, do, gcp, sprite)
- Add manifest.json entry with matrix entries
- Add agent-setup.ts config (node cloudInitTier, npm install)
- Add spawn-skill.ts injection path (~/.pi/agent/skills/spawn/SKILL.md)
- Add bash wrappers for all clouds
- Update README matrix (also adds missing Cursor CLI row: 10 agents, 60 combos)

Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 17:34:34 -07:00
A
e98a3a5c4b
fix(e2e): use jq to count DigitalOcean droplets instead of grep (#3125)
The previous grep -o '"id":[0-9]*' pattern matched all numeric id fields
in the droplets JSON response (including nested image/region/size ids),
overcounting droplets by 2x and falsely reporting quota exhaustion.

Replace with jq '.droplets | length' which correctly counts only top-level
droplet objects. This restores DigitalOcean capacity detection so e2e runs
can use available droplet slots.

-- qa/e2e-tester

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-31 16:32:33 +07:00
A
455f4cd43e
fix(e2e): redirect DO max_parallel log_warn to stderr (#3110)
_digitalocean_max_parallel() called log_warn which writes colored output
to stdout, polluting the captured return value when invoked via
cloud_max=$(cloud_max_parallel). The downstream integer comparison
[ "${effective_parallel}" -gt "${cloud_max}" ] then fails with
'integer expression expected', silently leaving the droplet limit cap
unapplied. Fix: redirect log_warn output to stderr so only the numeric
value is captured.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-31 11:32:51 +07:00
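The failure mode in #3110 is easy to reproduce in isolation. In this sketch `log_warn` and `max_parallel` are simplified stand-ins (assumptions) for the real helpers: the warning goes to stderr, so command substitution captures only the number.

```bash
#!/usr/bin/env bash
# Hypothetical stand-ins for log_warn and _digitalocean_max_parallel.
log_warn() {
  printf '\033[1;33m[warn]\033[0m %s\n' "$*" >&2   # diagnostics go to stderr
}

max_parallel() {
  log_warn "droplet quota almost exhausted"
  printf '%s\n' 2                                   # only the number reaches stdout
}

cloud_max=$(max_parallel)        # captures "2"; the warning still appears on stderr
effective_parallel=5
if [ "$effective_parallel" -gt "$cloud_max" ]; then
  effective_parallel="$cloud_max"
fi
```

If `log_warn` wrote to stdout instead, `cloud_max` would contain the colored warning text and the `-gt` comparison would fail with "integer expression expected".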
A
5e0144b645
fix(zeroclaw): remove broken zeroclaw agent (repo 404) (#3107)
* fix(zeroclaw): remove broken zeroclaw agent (repo 404)

The zeroclaw-labs/zeroclaw GitHub repository returns 404 — all installs
fail. Remove zeroclaw entirely from the matrix: agent definition,
setup code, shell scripts, e2e tests, packer config, skill files,
and documentation.

Fixes #3102

Agent: code-health
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(zeroclaw): remove stale zeroclaw reference from discovery.md ARM agents list

Addresses security review on PR #3107 — the last remaining zeroclaw
reference in .claude/rules/discovery.md is now removed.

Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(zeroclaw): remove remaining stale zeroclaw references from CI/packer

Remove zeroclaw from:
- .github/workflows/agent-tarballs.yml ARM build matrix
- .github/workflows/docker.yml agent matrix
- packer/digitalocean.pkr.hcl comment
- sh/e2e/e2e.sh comment

Addresses all 5 stale references flagged in security review of PR #3107.

Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-30 15:35:40 -07:00
A
b0f9f4e7af
refactor(e2e): normalize unused-arg comments in headless_env functions (#3113)
GCP, Sprite, and DigitalOcean had commented-out code `# local agent="$2"`
in their `_headless_env` functions. Hetzner already used the cleaner style
`# $2 = agent (unused but part of the interface)`. Normalize to match.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 03:51:07 +07:00
A
f2f981bd0a
fix(e2e): reduce Hetzner batch parallelism from 3 to 2 (#3112)
Prevents server_limit_reached errors when pre-existing servers (e.g.
spawn-szil) consume quota during E2E batch 1.

Fixes #3111

Agent: test-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-31 03:08:18 +07:00
A
0bd8930c09
fix(digitalocean): use canonical DIGITALOCEAN_ACCESS_TOKEN env var (#3099)
Replaces all references to DO_API_TOKEN with DIGITALOCEAN_ACCESS_TOKEN,
matching DigitalOcean's official CLI and API documentation. This includes
TypeScript source, tests, shell scripts, Packer config, CI workflows,
and documentation.

Supersedes #3068 (rebased onto current main).

Agent: pr-maintainer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-30 08:48:56 +07:00
A
a29d0d8a15
fix(security): replace variable-stored shell code with named function in verify.sh (#3073)
Fixes #3070

The port_check / port_check_r variables stored executable shell code as
strings and expanded them via ${port_check} inside cloud_exec commands.
This is an eval-equivalent pattern: if any part of the variable were ever
derived from dynamic input, it would be directly exploitable as command
injection.

Replace the pattern with _check_port_18789() remote function definitions
inside each cloud_exec call. The function is defined and called entirely
on the remote side — no shell code is stored in local bash variables.

Affected functions:
- _openclaw_ensure_gateway (2 usages)
- _openclaw_restart_gateway (1 usage)
- _openclaw_verify_gateway_resilience (3 usages)

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 11:25:00 +07:00
A
4db068d0c4
fix(github-auth): add sudo availability check before use (#3072)
In rootless containers or environments without sudo, the script
previously failed with cryptic errors. Now fails fast with a clear
error message when non-root and sudo is unavailable.

Fixes #3069

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-28 08:39:22 +07:00
A
f9b81475fe
fix(cursor): remove stale ~/.cursor/bin references missed in #3058 migration (#3066)
Clean up three remaining stale references to ~/.cursor/bin that were
not caught in the #3058 path migration:

- manifest.json: update notes field to reflect ~/.local/bin/agent
- sh/e2e/lib/provision.sh: remove ~/.cursor/bin from path_prefix
- sh/e2e/lib/verify.sh: remove ~/.cursor/bin from binary check PATH

Fixes #3065

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-27 19:51:10 +07:00
A
11f0c334aa
fix(digitalocean): fail fast when droplet quota is exhausted, list existing droplets (#3062)
- E2E: _digitalocean_max_parallel() now returns 0 (not 1) when no capacity
- E2E: run_agents_for_cloud() skips cloud with actionable error when capacity is 0
- CLI: checkAccountStatus() includes droplet names in limit-reached error message

Fixes #3059

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 18:49:18 +07:00
A
1cfa9ca1a7
fix(cursor): update binary path from ~/.cursor/bin to ~/.local/bin (#3058)
The cursor installer changed its binary install location from
~/.cursor/bin/agent to ~/.local/bin/agent (as of 2026-03-25 release).

Updates:
- agent-setup.ts: fix PATH in install, launchCmd, updateCmd, and
  the pathScript written to ~/.bashrc/~/.zshrc
- verify.sh: fix E2E binary check to look in ~/.local/bin first
- Bump CLI to 0.27.3

-- qa/e2e-tester

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-27 02:37:40 -07:00
Ahmed Abushagur
dcb740ec68
ci: add cursor agent to Docker image pipeline (#3051)
Adds cursor.Dockerfile and includes cursor in the docker.yml matrix
so nightly builds produce ghcr.io/openrouterteam/spawn-cursor:latest.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 13:41:27 +07:00
A
088e33b30e
fix(e2e): correct stale test expectation for hermes timeout fallback (#3044)
When AGENT_TIMEOUT_hermes is non-numeric, get_agent_timeout() skips the
env var and uses the built-in _AGENT_TIMEOUT_hermes=3600, NOT the global
AGENT_TIMEOUT=1800. The test expected ${AGENT_TIMEOUT} (1800) but the
function correctly returns 3600 (hermes built-in default). This test was
failing silently, masking the correct behavior.

Also filed OpenRouterTeam/spawn#3042 for cursor missing from e2e framework.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-26 19:02:23 -07:00
A
1c8011cae5
fix(e2e): add cursor agent to e2e test framework (#3045)
Add cursor to ALL_AGENTS, verify_cursor, input_test_cursor, and their
dispatch cases so e2e sweeps cover the cursor agent.

Fixes #3042

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-27 08:40:51 +07:00
A
499eb494c6
fix(security): use StrictHostKeyChecking=accept-new in all SSH connections (#3037)
Replace StrictHostKeyChecking=no with accept-new across all E2E cloud
drivers (aws, gcp, digitalocean, hetzner), the shared SSH_BASE_OPTS
constant, and pull-history.ts. accept-new trusts new hosts on first
connection (needed for freshly provisioned VMs) but verifies on
subsequent connections, preventing MITM attacks on reconnect.

Fixes #3031

Agent: style-reviewer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-26 18:04:40 -07:00
A
917d34d034
fix(e2e): ensure openclaw binary available in --fast mode on Sprite (#3040)
* fix(e2e): ensure agent binary available after spawnrc fallback

When the provision timeout kills the CLI before agent install completes
(common in --fast mode on Sprite), the manual .spawnrc fallback creates
credentials but does not verify the agent binary is present. This causes
"openclaw not found" failures in E2E verification.

Add _ensure_agent_binary() that runs after the manual .spawnrc fallback:
1. Checks if the agent binary exists on the remote VM
2. If missing, runs the agent's install command directly
3. Verifies the binary is available after install

Also adds cursor agent to the env vars fallback and binary check.

Fixes #3028

Agent: ux-engineer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(security): add --proto '=https' to cursor install curl command

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-27 07:36:45 +07:00
A
7080d80472
fix(security): prevent race condition in GitHub token file permissions (#3035)
Before this change, gh auth login wrote the token file with default
permissions, and chmod 600 was applied afterward — leaving a window
where the file could be read by other users on multi-user systems.

Now the credential directory is created with 700 permissions and umask
is set to 077 before the write, so the token file is created with
restrictive permissions from the start.

Agent: complexity-hunter
Fixes #3030

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-26 16:59:42 -07:00
A
aafdb8655f
fix(security): pipe encoded commands via stdin in GCP/AWS exec functions (#3036)
Replace shell interpolation of base64-encoded commands in SSH invocations
with stdin piping. Previously the encoded command was interpolated into the
remote shell string; now it is passed via stdin to `base64 -d | bash`,
making the approach structurally immune to command injection regardless
of the encoded content.

Fixes #3029
Fixes #3022

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-27 06:11:50 +07:00
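The stdin-piping approach from #3036 can be sketched without a real host. Here `bash -c 'base64 -d | bash'` is a local stand-in (an assumption) for the remote side of an SSH invocation, and `run_encoded` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch only. The bash -c call stands in for something like:
#   ssh "$host" 'base64 -d | bash'     (host is hypothetical)
run_encoded() {
  local cmd="$1" encoded
  encoded=$(printf '%s' "$cmd" | base64 | tr -d '\n')
  # The remote command string is a constant; the payload travels on stdin.
  printf '%s\n' "$encoded" | bash -c 'base64 -d | bash'
}
```

Since the encoded payload is never interpolated into the command string, its content cannot change how that string is parsed.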
Ahmed Abushagur
c61736e511
feat: add Cursor CLI agent across all clouds (#3018)
* feat: add Cursor CLI agent across all clouds

Adds Cursor's terminal-based AI coding agent (the `agent` command from
cursor.com/cli) to the spawn matrix. Routes LLM requests through
OpenRouter via --endpoint flag and CURSOR_API_KEY env var.

- manifest.json: new cursor agent entry + all 6 cloud matrix entries
- agent-setup.ts: install, configure, launch, and update definitions
- Shell scripts for all 6 clouds (local, hetzner, aws, do, gcp, sprite)
- Config: writes ~/.cursor/cli-config.json with full permissions
- Icon: cursor.png from cursor.com/apple-touch-icon.png
- All cloud READMEs updated with cursor.sh usage
- CLI version bumped to 0.26.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add spawn skill injection for Cursor CLI

Writes a .cursor/rules/spawn.mdc rule file with alwaysApply: true
during setup, teaching the Cursor agent how to use the spawn CLI
to provision child cloud VMs. Uses the same base64 upload pattern
as other agent config files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: Ahmed Abushagur <ahmed@abushagur.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: A <258483684+la14-1@users.noreply.github.com>
2026-03-26 13:53:49 -07:00
A
255ffbf8b7
fix(security): use grep -F for literal string matching in PATH checks (#3021)
Fixes #3019

Replace `grep -qx` with `grep -qxF` in the `ensure_in_path` function
to prevent regex pattern injection. Without -F, attacker-controlled
SPAWN_INSTALL_DIR or BUN_INSTALL env vars containing regex metacharacters
(e.g. `/.*`) could cause false positive/negative PATH matches, potentially
bypassing the symlink creation logic.

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-27 02:56:07 +07:00
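The difference `-F` makes in #3021 shows up in a small membership test. The function below is a hypothetical simplification of `ensure_in_path`; it takes the colon-separated list as an argument so that the example does not depend on the caller's PATH.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if dir is an exact, literal entry of the list.
path_list_contains() {
  local list="$1" dir="$2"
  # -x matches whole lines, -F treats dir as a fixed string rather than a regex.
  printf '%s\n' "$list" | tr ':' '\n' | grep -qxF -- "$dir"
}
```

With plain `grep -qx`, a value such as `/.*` would match every entry; with `-F` it matches only an entry that is literally `/.*`.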
A
defca448b0
fix(e2e): load GCP_ZONE from ~/.config/spawn/gcp.json in E2E driver (#3017)
The GCP E2E cloud driver defaulted to us-central1-a when GCP_ZONE was
not set in the environment. The QA VM stores zone config in
~/.config/spawn/gcp.json (alongside GCP_PROJECT) but _gcp_validate_env
only read GCP_PROJECT from the environment — it never loaded GCP_ZONE.

This caused E2E failures when us-central1-a had insufficient resources:
3 agents (openclaw, opencode, kilocode) failed with "SSH port never
opened" because GCP couldn't provision instances in that zone.

Fix: load both GCP_PROJECT and GCP_ZONE from the config file in
_gcp_validate_env when they are not already set in the environment,
matching how key-request.sh loads GCP_PROJECT for provisioning.

Verified: all 3 previously failing agents now pass on europe-west1-b.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 01:27:46 +07:00
A
988f5bb7a9
fix(security): validate bun path before symlinking in install.sh (fixes #3009) (#3011)
Add allowlist validation for the bun binary path resolved via `command -v bun`
before using it in symlink operations that may run with sudo privileges. If bun
is found at an unexpected location, skip the symlink and warn the user. This
prevents a privilege escalation attack where a malicious binary on PATH could be
symlinked to /usr/local/bin/bun with elevated privileges.

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 05:37:45 -07:00
A
463b8398f2
fix: add ai-review.sh to bash -n syntax check list in e2e-lib.sh (#3005)
ai-review.sh is sourced by e2e.sh but was missing from the bash -n
syntax check loop in sh/test/e2e-lib.sh. This means syntax errors in
ai-review.sh would not be caught by the test harness.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-26 03:12:07 -07:00
A
7378cab0b2
fix(security): add defensive validation to tmpdir cleanup in install.sh (#3000)
Adds a non-empty check after mktemp and guards the EXIT trap so rm -rf
only fires when tmpdir is non-empty and still a directory. This is a
defense-in-depth hardening — the current code is safe due to set -e,
but explicit validation is best practice for rm -rf operations.

Fixes #2998

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 11:26:56 +07:00
Ahmed Abushagur
90dde882d0
fix: installSpawnCli fails on Sprite — bun shim doesn't work (#2993)
Sprite has a bun shim at /.sprite/bin/bun that delegates to
$HOME/.bun/bin/bun, but that binary doesn't exist on fresh VMs.
`command -v bun` returns true (finds the shim) so the install script
skips bun installation, then bun fails when actually invoked.

Fixed in two places:
- installSpawnCli: source shell profiles, test `bun --version` (not
  just existence), and install bun fresh if it doesn't work
- install.sh: replace `command -v bun` with `bun --version` to detect
  broken shims

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 07:36:12 +07:00
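The distinction drawn in #2993 between "found on PATH" and "actually runs" can be sketched as a one-line probe. The function name is hypothetical, and what the caller does on failure (for example reinstalling bun) is left out because the log does not show it.

```bash
#!/usr/bin/env bash
# Hypothetical probe: a tool counts as usable only if it can report its version.
tool_is_usable() {
  "$1" --version >/dev/null 2>&1
}
```

A shim that merely delegates to a missing binary passes `command -v` but fails this probe, which is the case the commit describes on fresh Sprite VMs.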
Ahmed Abushagur
934dfd309f
test: add unit tests for E2E bash test infrastructure (#2968)
136 tests covering common.sh, verify.sh, provision.sh, and e2e.sh:
- format_duration, make_app_name, track_app/untrack_app
- get_provision_timeout/get_agent_timeout with env overrides
- Numeric validation (injection resistance for timeout vars)
- OpenRouter API key fallback logic
- _validate_timeout and _validate_base64 security checks
- run_input_test dispatch (unknown agent, TUI skips, SKIP_INPUT_TEST)
- provision_agent app_name validation (injection resistance)
- e2e.sh argument parsing (--help, missing args, invalid clouds/agents)
- ALL_AGENTS completeness (verify_* and input_test_* for every agent)
- Cloud driver interface compliance (all 5 drivers implement required fns)
- bash -n syntax check on all E2E scripts
- macOS compat linter on core E2E libraries

Also documents a known limitation: _validate_base64 uses per-line grep
matching, so multiline strings pass if each line is valid (low risk since
base64 encoding always strips newlines).

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: A <258483684+la14-1@users.noreply.github.com>
2026-03-24 18:42:48 -07:00
A
a6940fdaad
fix(e2e): improve interactive harness failure logging (#2951)
On interactive provision failure, save the harness log to a persistent
path (/tmp/spawn-interactive-harness-last.log) for post-mortem inspection,
and filter output to only show [harness] prefixed lines (30 lines) instead
of dumping 50 raw lines of mixed output.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Ahmed Abushagur <ahmed@abushagur.com>
2026-03-24 08:45:19 -07:00
A
6c742bdd11
fix(e2e): increase hermes install timeout to fix failures on Hetzner/DO/GCP (#2956)
Hermes installs a Python virtualenv which takes 20+ min on fresh VMs.
The previous 300s install timeout caused the CLI to give up before
writing .spawnrc, leading to 30-min E2E timeouts on Hetzner, DigitalOcean,
and GCP (but not Sprite, which has a manual .spawnrc fallback).

Changes:
- agent-setup.ts: hermes installAgent timeout 300s → 600s
- common.sh: add hermes per-agent overrides (_PROVISION_TIMEOUT_hermes=720,
  _AGENT_TIMEOUT_hermes=3600) to give the install enough headroom
- package.json: bump CLI version 0.25.26 → 0.25.27

-- qa/e2e-tester

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-24 21:34:41 +07:00
A
056ce252c7
fix(e2e): suppress matrix email on targeted re-runs via SPAWN_E2E_SKIP_EMAIL (#2944)
When the quality cycle e2e-tester re-runs only failed agents
(e.g. `e2e.sh --cloud hetzner zeroclaw codex`), e2e.sh was firing
a matrix email showing only those 2 agents — both PASS if the retry
succeeded. This looked like "2 tests ran, all passed" when in reality
32 tests ran with 2 failures.

- Add SPAWN_E2E_SKIP_EMAIL=1 env var check at the top of send_matrix_email
- Update qa-quality-prompt.md to set SPAWN_E2E_SKIP_EMAIL=1 on re-runs

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 00:17:10 -07:00
A
aafeda4020
fix(e2e): reduce Hetzner max parallel from 5 to 3 to respect primary IP quota (#2943)
The QA account's primary IP limit is ~3, so running 5 agents in parallel
exhausted the quota, causing codex and zeroclaw to fail with
resource_limit_exceeded. Reducing _hetzner_max_parallel to 3 keeps
provisioning within quota while still running agents concurrently.

Verified: zeroclaw and codex both PASS on Hetzner after this fix.

-- qa/e2e-tester

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-24 13:32:10 +07:00
A
81ab237efe
fix(e2e): harden shell scripts against injection in SSH commands (#2945)
- hetzner.sh: Pipe base64-encoded command via stdin to SSH instead of
  embedding it in the SSH command string via variable expansion. The
  remote bash reads stdin, base64-decodes, and executes.

- verify.sh: Add remote-side re-validation of base64 and timeout values
  in _stage_prompt_remotely and _stage_timeout_remotely. Values are
  assigned to remote shell variables and validated before writing to
  temp files, providing defense-in-depth against injection.

- provision.sh: Add explicit early rejection of dangerous shell chars
  ($, `, \) in env var values from cloud_headless_env, and add
  remote-side re-validation of base64 payload before writing.

Fixes #2937
Fixes #2938
Fixes #2939

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-24 13:30:47 +07:00
A
8ed8d91205
fix(qa): stash before pull, fix star count push, fix claude update flag (#2942)
- Stash uncommitted changes before git pull --rebase so the pull
  never aborts with "You have unstaged changes"
- Pull --rebase before pushing star count commit to avoid
  non-fast-forward rejection (was failing every single cycle)
- Remove --yes flag from claude update (flag was removed upstream)
- Fix interactive harness AI prompt: update success marker text from
  "is ready" or "Starting agent" to match code check
  ("Starting agent..." or "setup completed successfully")

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 12:53:27 +07:00
A
4f141486dc
refactor: remove dead code and stale references (#2940)
- fix misplaced interactive_provision comment block in interactive.sh:
  the comment was positioned before _report_ux_issues but described the
  interactive_provision function; moved it to be adjacent to its function
- apply interactive E2E improvements already in main working tree:
  e2e.sh: add verify_agent call after interactive_provision to wait for
  .spawnrc before running input tests (aligns interactive with headless flow)

-- qa/code-quality

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 12:09:50 +07:00
A
e9cbab5b7f
fix(sprite): add retry for list failures, increase timeout, refresh auth on expiry (#2936)
Three fixes for Sprite E2E failures in long-running batches (73+ min):

1. Retry `_sprite_provision_verify`: list failures now retry 3x with
   exponential backoff (5s, 10s, 20s) instead of failing immediately.
   Fixes kilocode batch 6 "Could not list Sprite instances" errors.

2. Increase `CREATE_TIMEOUT_SECS` default from 300s to 600s and add
   `Client.Timeout`, `request canceled`, and `authentication failed`
   to the transient error retry pattern in `spriteRetry`, which now uses
   linear backoff (3s * attempt) instead of a fixed 3s delay.
   Fixes hermes batch 7 HTTP timeout errors.

3. Add `_sprite_refresh_auth` + `cloud_refresh_auth` interface. The
   E2E orchestrator calls `cloud_refresh_auth` before each provisioning
   batch. For Sprite, this re-validates the token via `sprite org list`
   and attempts `sprite auth refresh` if expired.
   Fixes junie batch 8 "authentication failed" errors.
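
The exponential backoff in item 1 could look roughly like the sketch below. `retry_backoff` is a hypothetical helper, not the repo's function, and whether "3x" counts attempts or retries is an assumption left to the caller:

```shell
# Hypothetical helper: retry_backoff MAX_ATTEMPTS BASE_DELAY_SECS CMD [ARGS...]
# Runs CMD until it succeeds or MAX_ATTEMPTS is reached, doubling the delay
# between attempts (for example 5s, 10s, 20s with a base delay of 5).
retry_backoff() {
  local max="$1" delay="$2" attempt=1
  shift 2
  while :; do
    "$@" && return 0
    [ "$attempt" -ge "$max" ] && return 1
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example call (list_instances is a placeholder for the real list command):
#   retry_backoff 4 5 list_instances
```

For the linear backoff in item 2, the doubling step would be replaced by a delay of base * attempt.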

Fixes #2934

Agent: ux-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 21:47:58 -07:00
A
50319e0d39
fix(hetzner): clean up orphaned primary IPs before provisioning to avoid quota exceeded (#2935)
Hetzner E2E runs fail with `resource_limit_exceeded` when stale primary
IPs from previous test runs consume the account quota. This adds proactive
cleanup at two levels:

1. E2E shell driver: `_hetzner_cleanup_orphaned_ips()` deletes unattached
   primary IPs during pre-batch stale cleanup, freeing quota before any
   new servers are provisioned.

2. TypeScript CLI: `hetzner/main.ts` calls `cleanupOrphanedPrimaryIps()`
   before `createServer()` in headless/non-interactive mode, ensuring
   each agent provisioning attempt starts with a clean IP quota.

The existing reactive cleanup (retry after failure) in `hetzner.ts`
remains as a fallback.

Fixes #2933

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 11:20:30 +07:00
A
c1e6fb76f9
fix(e2e): harden pkill regex escaping against all metacharacters (#2917)
* fix(e2e): harden pkill regex escaping against all metacharacters (#2911)

The sed character class `[.[\*^$]` was malformed and missed several
extended regex metacharacters (+, ?, (, ), {, }, |). Replace with a
correct bracket expression that escapes all POSIX ERE metacharacters.

Although app_name is already validated to [A-Za-z0-9._-], fixing the
escaping is defense-in-depth against future changes to the validation.

Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(e2e): correct sed bracket expression to escape ] character

Place ] first in character class so it's treated as literal.
Use \\ to match literal backslash.
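
One way to write such a bracket expression, shown as a sketch rather than the exact expression committed. `escape_ere` is a hypothetical name; the `]` comes first so it is read as a literal:

```shell
# Hypothetical helper: prefix every POSIX ERE metacharacter in $1 with a backslash.
# Inside the bracket expression: ] first (literal), then [ \ . * ^ $ + ? ( ) { } |
escape_ere() {
  printf '%s' "$1" | sed 's/[][\\.*^$+?(){}|]/\\&/g'
}

# Example (placeholder variable): pkill -f -- "$(escape_ere "$app_name")"
```

Names that contain only letters, digits, `_` and `-` pass through unchanged, so the escaping is invisible for inputs that already satisfy the validation.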

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 12:35:31 -07:00
A
a96522829b
fix(e2e): fix interactive E2E test chain (provision → install → input test) (#2898)
* fix(e2e): pass SPAWN_NAME + SPAWN_ENABLED_STEPS to interactive harness

Without SPAWN_NAME, cmdRun prompts 'Name your spawn' interactively.
The AI driver (Claude Haiku) can't respond because ANTHROPIC_AUTH_TOKEN
is an OpenRouter key — every Anthropic API call returns 401, so the harness
returns <wait> indefinitely until the 20-min SESSION_TIMEOUT_MS fires.

SPAWN_ENABLED_STEPS=auto-update bypasses the setup options multiselect,
ensuring the harness only tests the provisioning/installation UX.

* fix(e2e): fix _stage_timeout_remotely stdin pipe issue on Hetzner

Same root cause as _stage_prompt_remotely: _hetzner_exec runs commands via
"printf | base64 -d | bash", which makes bash's stdin the decode pipe.
So piped data from the outer SSH call never reaches subcommands.

"printf '%s' 'VALUE' | cloud_exec APP 'cat > /tmp/.e2e-timeout'" always
creates an empty file, causing "timeout: invalid time interval ''" when
the input test runs.

Fix: embed the validated numeric timeout value directly in the printf
command string. This is safe because _validate_timeout only accepts digits [0-9].
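
A rough illustration of both the failure mode and the fix, under the assumption that the remote command is a plain string handed to a shell. `demo_lost_stdin`, `validate_timeout`, and `build_stage_timeout_cmd` are hypothetical names, not the functions in the repo:

```shell
# Failure mode: bash reads its script from the inner pipe, so `cat` inherits
# that same pipe as stdin and never sees the data piped in from outside.
demo_lost_stdin() {
  printf '%s' 'outer-data' | { printf '%s\n' 'cat' | bash; }
}

# Accept only a non-empty run of digits.
validate_timeout() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  return 0
}

# Fix: embed the validated value in the command string instead of piping it.
build_stage_timeout_cmd() {
  validate_timeout "$1" || return 1
  printf "printf '%%s' '%s' > /tmp/.e2e-timeout" "$1"
}
```

Because the value is restricted to digits before it is interpolated, the generated command cannot carry quotes or other shell syntax.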

* test(e2e): add claude PATH diagnostics to input_test_claude

Temporary debug output to trace where claude is installed
after interactive provision completes.

* test(e2e): save harness transcript JSON on success for debugging

* fix(e2e): remove 'is ready' from harness success pattern

'SSH is ready' (emitted ~15s into provision when SSH connects but before
any agent installation) matched the /is ready/ pattern, triggering false
success detection. The harness killed the spawn CLI during cloud-init wait,
leaving a VM with no agent installed.

Fix: use the same precise patterns as the main repo's harness:
  /Starting agent\.\.\.|setup completed successfully/i
Both only fire after orchestrate.ts completes the full setup.
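
The difference between the two patterns can be checked with `grep -E`. In this sketch the "old" pattern is an assumed reconstruction from the description above, and `matches` is a hypothetical helper:

```shell
# Assumed reconstruction of the loose pattern; the real one may have differed.
old_pattern='is ready|Starting agent'
# Precise pattern quoted in the commit message.
new_pattern='Starting agent\.\.\.|setup completed successfully'

# matches PATTERN LINE: case-insensitive ERE match, returns 0 on a hit.
matches() {
  printf '%s\n' "$2" | grep -Eiq "$1"
}

# Example:
#   matches "$old_pattern" 'SSH is ready'   # hit: the false success
#   matches "$new_pattern" 'SSH is ready'   # no hit
```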

* chore(e2e): remove temporary debug instrumentation

* feat(e2e): add ai-powered ux review after interactive provision

After each successful interactive E2E run, the harness sends the full
terminal transcript to Claude (via OpenRouter) with a UX reviewer prompt.
It looks for confusing messages, noisy output, missing context in spinners,
and unhelpful errors that don't explain next steps.

Findings are returned as uxIssues[] in the harness JSON result.
interactive.sh then files a GitHub issue per run listing each problem
with a verbatim example and concrete suggestion.

Uses OPENROUTER_API_KEY (already in env) so it works on the QA VM
where ANTHROPIC_API_KEY is an OpenRouter key.

* refactor(e2e): throttle ux issue filing — 33% chance, 3+ issues required

- Random 33% gate: UX review runs on ~1 in 3 successful interactive
  provisions, not every run
- Minimum bar: only surface findings when AI found 3+ clear issues
  (filters one-off nits)
- Tighter system prompt: only flag obvious problems (repeated messages,
  debug leaks, cryptic errors), not minor style preferences

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(e2e): replace random throttle with stricter ux review prompt

Instead of Math.random() to suppress issues, make the AI self-regulate:
the system prompt now instructs it to only flag genuinely bad problems
(repeated messages, raw stack traces, no-feedback waits) and treat
zero findings as a good outcome, not a failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 13:42:02 +07:00