The get_provision_timeout and get_agent_timeout functions used printenv with
dynamically constructed variable names, which is fragile across shells and
platforms. Replace with eval-based parameter expansion using the already-
sanitized safe_agent variable (restricted to [A-Za-z0-9_]).
Fixes#3234
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Set umask 077 before mktemp so the temp .ts file is created with 0600
permissions, preventing other users on shared systems from reading it.
Umask is restored immediately after file creation.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The previous code resolved symlinks via realpath then operated on the
resolved path, leaving a window where an attacker could swap the symlink
target between resolution and rm -rf (CWE-367).
Fix: reject symlinks outright before deletion, perform ownership check
on the original path (not the resolved one), and delete the original
path instead of the resolved path. This eliminates the useful TOCTOU
window since rm -rf on a non-symlink directory doesn't follow symlinks.
Fixes#3233
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
- Sanitize cloud/agent names before building email HTML (#3189)
- Validate result values against allowlist (pass/fail/skip)
- Resolve symlinks and check ownership before rm -rf (#3194)
- Add upper bounds on cloud/agent list sizes (#3190)
Fixes#3189#3194#3190
Agent: test-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(e2e): validate LOG_DIR ownership before rm -rf in final_cleanup
Adds _E2E_CREATED_LOG_DIR tracking to ensure cleanup only removes
directories created by this script instance, not attacker-controlled paths.
Fixes#3181
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(e2e): restore SAFE_TMP_ROOT prefix validation alongside ownership check
Defense-in-depth: keep both the path prefix check (SAFE_TMP_ROOT/spawn-e2e.*)
and the ownership check (_E2E_CREATED_LOG_DIR) as two independent layers.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(security): array-based agent detection and GCP instance name validation
Replace shell string concatenation in detectAgent() with individual
`command -v` calls per agent, eliminating the compound shell command.
Add _gcp_validate_instance_name() to validate GCP instance names match
[a-z][a-z0-9-]*[a-z0-9] before passing to gcloud commands.
Fixes#3151Fixes#3149
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: add instance name validation in _gcp_cleanup_stale()
Defense-in-depth: validate instance names from GCP API before passing
to gcloud delete, consistent with validation at other call sites.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Add a billing pre-check to _gcp_validate_env so the E2E orchestrator
skips GCP gracefully ("skipped — credentials not configured") instead
of failing every agent individually when billing is disabled.
Fixes#3091
Agent: test-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous grep -o '"id":[0-9]*' pattern matched all numeric id fields
in the droplets JSON response (including nested image/region/size ids),
overcounting droplets by 2x and falsely reporting quota exhaustion.
Replace with jq '.droplets | length' which correctly counts only top-level
droplet objects. This restores DigitalOcean capacity detection so e2e runs
can use available droplet slots.
-- qa/e2e-tester
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
_digitalocean_max_parallel() called log_warn which writes colored output
to stdout, polluting the captured return value when invoked via
cloud_max=$(cloud_max_parallel). The downstream integer comparison
[ "${effective_parallel}" -gt "${cloud_max}" ] then fails with
'integer expression expected', silently leaving the droplet limit cap
unapplied. Fix: redirect log_warn output to stderr so only the numeric
value is captured.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
GCP, Sprite, and DigitalOcean had commented-out code `# local agent="$2"`
in their `_headless_env` functions. Hetzner already used the cleaner style
`# $2 = agent (unused but part of the interface)`. Normalize to match.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces all references to DO_API_TOKEN with DIGITALOCEAN_ACCESS_TOKEN,
matching DigitalOcean's official CLI and API documentation. This includes
TypeScript source, tests, shell scripts, Packer config, CI workflows,
and documentation.
Supersedes #3068 (rebased onto current main).
Agent: pr-maintainer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes#3070
The port_check / port_check_r variables stored executable shell code as
strings and expanded them via ${port_check} inside cloud_exec commands.
This is an eval-equivalent pattern: if any part of the variable were ever
derived from dynamic input, it would be directly exploitable as command
injection.
Replace the pattern with _check_port_18789() remote function definitions
inside each cloud_exec call. The function is defined and called entirely
on the remote side — no shell code is stored in local bash variables.
Affected functions:
- _openclaw_ensure_gateway (2 usages)
- _openclaw_restart_gateway (1 usage)
- _openclaw_verify_gateway_resilience (3 usages)
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
In rootless containers or environments without sudo, the script
previously failed with cryptic errors. Now fails fast with a clear
error message when non-root and sudo is unavailable.
Fixes#3069
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Clean up three remaining stale references to ~/.cursor/bin that were
not caught in the #3058 path migration:
- manifest.json: update notes field to reflect ~/.local/bin/agent
- sh/e2e/lib/provision.sh: remove ~/.cursor/bin from path_prefix
- sh/e2e/lib/verify.sh: remove ~/.cursor/bin from binary check PATH
Fixes#3065
Agent: issue-fixer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
- E2E: _digitalocean_max_parallel() now returns 0 (not 1) when no capacity
- E2E: run_agents_for_cloud() skips cloud with actionable error when capacity is 0
- CLI: checkAccountStatus() includes droplet names in limit-reached error message
Fixes#3059
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The cursor installer changed its binary install location from
~/.cursor/bin/agent to ~/.local/bin/agent (as of 2026-03-25 release).
Updates:
- agent-setup.ts: fix PATH in install, launchCmd, updateCmd, and
the pathScript written to ~/.bashrc/~/.zshrc
- verify.sh: fix E2E binary check to look in ~/.local/bin first
- Bump CLI to 0.27.3
-- qa/e2e-tester
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Adds cursor.Dockerfile and includes cursor in the docker.yml matrix
so nightly builds produce ghcr.io/openrouterteam/spawn-cursor:latest.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When AGENT_TIMEOUT_hermes is non-numeric, get_agent_timeout() skips the
env var and uses the built-in _AGENT_TIMEOUT_hermes=3600, NOT the global
AGENT_TIMEOUT=1800. The test expected ${AGENT_TIMEOUT} (1800) but the
function correctly returns 3600 (hermes built-in default). This test was
failing silently, masking the correct behavior.
Also filed OpenRouterTeam/spawn#3042 for cursor missing from e2e framework.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Add cursor to ALL_AGENTS, verify_cursor, input_test_cursor, and their
dispatch cases so e2e sweeps cover the cursor agent.
Fixes#3042
Agent: issue-fixer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace StrictHostKeyChecking=no with accept-new across all E2E cloud
drivers (aws, gcp, digitalocean, hetzner), the shared SSH_BASE_OPTS
constant, and pull-history.ts. accept-new trusts new hosts on first
connection (needed for freshly provisioned VMs) but verifies on
subsequent connections, preventing MITM attacks on reconnect.
Fixes#3031
Agent: style-reviewer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(e2e): ensure agent binary available after spawnrc fallback
When the provision timeout kills the CLI before agent install completes
(common in --fast mode on Sprite), the manual .spawnrc fallback creates
credentials but does not verify the agent binary is present. This causes
"openclaw not found" failures in E2E verification.
Add _ensure_agent_binary() that runs after the manual .spawnrc fallback:
1. Checks if the agent binary exists on the remote VM
2. If missing, runs the agent's install command directly
3. Verifies the binary is available after install
Also adds cursor agent to the env vars fallback and binary check.
Fixes#3028
Agent: ux-engineer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(security): add --proto '=https' to cursor install curl command
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Before this change, gh auth login wrote the token file with default
permissions, and chmod 600 was applied afterward — leaving a window
where the file could be read by other users on multi-user systems.
Now the credential directory is created with 700 permissions and umask
is set to 077 before the write, so the token file is created with
restrictive permissions from the start.
Agent: complexity-hunter
Fixes#3030
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace shell interpolation of base64-encoded commands in SSH invocations
with stdin piping. Previously the encoded command was interpolated into the
remote shell string; now it is passed via stdin to `base64 -d | bash`,
making the approach structurally immune to command injection regardless
of the encoded content.
Fixes#3029Fixes#3022
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat: add Cursor CLI agent across all clouds
Adds Cursor's terminal-based AI coding agent (the `agent` command from
cursor.com/cli) to the spawn matrix. Routes LLM requests through
OpenRouter via --endpoint flag and CURSOR_API_KEY env var.
- manifest.json: new cursor agent entry + all 6 cloud matrix entries
- agent-setup.ts: install, configure, launch, and update definitions
- Shell scripts for all 6 clouds (local, hetzner, aws, do, gcp, sprite)
- Config: writes ~/.cursor/cli-config.json with full permissions
- Icon: cursor.png from cursor.com/apple-touch-icon.png
- All cloud READMEs updated with cursor.sh usage
- CLI version bumped to 0.26.0
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add spawn skill injection for Cursor CLI
Writes a .cursor/rules/spawn.mdc rule file with alwaysApply: true
during setup, teaching the Cursor agent how to use the spawn CLI
to provision child cloud VMs. Uses the same base64 upload pattern
as other agent config files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Signed-off-by: Ahmed Abushagur <ahmed@abushagur.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: A <258483684+la14-1@users.noreply.github.com>
Fixes#3019
Replace `grep -qx` with `grep -qxF` in the `ensure_in_path` function
to prevent regex pattern injection. Without -F, attacker-controlled
SPAWN_INSTALL_DIR or BUN_INSTALL env vars containing regex metacharacters
(e.g. `/.*`) could cause false positive/negative PATH matches, potentially
bypassing the symlink creation logic.
Agent: issue-fixer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The GCP E2E cloud driver defaulted to us-central1-a when GCP_ZONE was
not set in the environment. The QA VM stores zone config in
~/.config/spawn/gcp.json (alongside GCP_PROJECT) but _gcp_validate_env
only read GCP_PROJECT from the environment — it never loaded GCP_ZONE.
This caused E2E failures when us-central1-a had insufficient resources:
3 agents (openclaw, opencode, kilocode) failed with "SSH port never
opened" because GCP couldn't provision instances in that zone.
Fix: load both GCP_PROJECT and GCP_ZONE from the config file in
_gcp_validate_env when they are not already set in the environment,
matching how key-request.sh loads GCP_PROJECT for provisioning.
Verified: all 3 previously failing agents now pass on europe-west1-b.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Add allowlist validation for the bun binary path resolved via `command -v bun`
before using it in symlink operations that may run with sudo privileges. If bun
is found at an unexpected location, skip the symlink and warn the user. This
prevents a privilege escalation attack where a malicious binary on PATH could be
symlinked to /usr/local/bin/bun with elevated privileges.
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
ai-review.sh is sourced by e2e.sh but was missing from the bash -n
syntax check loop in sh/test/e2e-lib.sh. This means syntax errors in
ai-review.sh would not be caught by the test harness.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Adds a non-empty check after mktemp and guards the EXIT trap so rm -rf
only fires when tmpdir is non-empty and still a directory. This is a
defense-in-depth hardening — the current code is safe due to set -e,
but explicit validation is best practice for rm -rf operations.
Fixes#2998
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sprite has a bun shim at /.sprite/bin/bun that delegates to
$HOME/.bun/bin/bun, but that binary doesn't exist on fresh VMs.
`command -v bun` returns true (finds the shim) so the install script
skips bun installation, then bun fails when actually invoked.
Fixed in two places:
- installSpawnCli: source shell profiles, test `bun --version` (not
just existence), and install bun fresh if it doesn't work
- install.sh: replace `command -v bun` with `bun --version` to detect
broken shims
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
On interactive provision failure, save the harness log to a persistent
path (/tmp/spawn-interactive-harness-last.log) for post-mortem inspection,
and filter output to only show [harness] prefixed lines (30 lines) instead
of dumping 50 raw lines of mixed output.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Ahmed Abushagur <ahmed@abushagur.com>
Hermes installs a Python virtualenv which takes 20+ min on fresh VMs.
The previous 300s install timeout caused the CLI to give up before
writing .spawnrc, leading to 30-min E2E timeouts on Hetzner, DigitalOcean,
and GCP (but not Sprite, which has a manual .spawnrc fallback).
Changes:
- agent-setup.ts: hermes installAgent timeout 300s → 600s
- common.sh: add hermes per-agent overrides (_PROVISION_TIMEOUT_hermes=720,
_AGENT_TIMEOUT_hermes=3600) to give the install enough headroom
- package.json: bump CLI version 0.25.26 → 0.25.27
-- qa/e2e-tester
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
When the quality cycle e2e-tester re-runs only failed agents
(e.g. `e2e.sh --cloud hetzner zeroclaw codex`), e2e.sh was firing
a matrix email showing only those 2 agents — both PASS if the retry
succeeded. This looked like "2 tests ran, all passed" when in reality
32 tests ran with 2 failures.
- Add SPAWN_E2E_SKIP_EMAIL=1 env var check at the top of send_matrix_email
- Update qa-quality-prompt.md to set SPAWN_E2E_SKIP_EMAIL=1 on re-runs
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The QA account's primary IP limit is ~3, so running 5 agents in parallel
exhausted the quota, causing codex and zeroclaw to fail with
resource_limit_exceeded. Reducing _hetzner_max_parallel to 3 keeps
provisioning within quota while still running agents concurrently.
Verified: zeroclaw and codex both PASS on Hetzner after this fix.
-- qa/e2e-tester
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
- hetzner.sh: Pipe base64-encoded command via stdin to SSH instead of
embedding it in the SSH command string via variable expansion. The
remote bash reads stdin, base64-decodes, and executes.
- verify.sh: Add remote-side re-validation of base64 and timeout values
in _stage_prompt_remotely and _stage_timeout_remotely. Values are
assigned to remote shell variables and validated before writing to
temp files, providing defense-in-depth against injection.
- provision.sh: Add explicit early rejection of dangerous shell chars
($, `, \) in env var values from cloud_headless_env, and add
remote-side re-validation of base64 payload before writing.
Fixes#2937Fixes#2938Fixes#2939
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
- Stash uncommitted changes before git pull --rebase so the pull
never aborts with "You have unstaged changes"
- Pull --rebase before pushing star count commit to avoid
non-fast-forward rejection (was failing every single cycle)
- Remove --yes flag from claude update (flag was removed upstream)
- Fix interactive harness AI prompt: update success marker text from
"is ready" or "Starting agent" to match code check
("Starting agent..." or "setup completed successfully")
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
- fix misplaced interactive_provision comment block in interactive.sh:
the comment was positioned before _report_ux_issues but described the
interactive_provision function; moved it to be adjacent to its function
- apply interactive E2E improvements already in main working tree:
e2e.sh: add verify_agent call after interactive_provision to wait for
.spawnrc before running input tests (aligns interactive with headless flow)
-- qa/code-quality
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Three fixes for Sprite E2E failures in long-running batches (73+ min):
1. Retry `_sprite_provision_verify`: list failures now retry 3x with
exponential backoff (5s, 10s, 20s) instead of failing immediately.
Fixes kilocode batch 6 "Could not list Sprite instances" errors.
2. Increase `CREATE_TIMEOUT_SECS` default from 300s to 600s and add
`Client.Timeout`, `request canceled`, and `authentication failed`
to the transient error retry pattern in `spriteRetry`. Also uses
linear backoff (3s * attempt) instead of fixed 3s delay.
Fixes hermes batch 7 HTTP timeout errors.
3. Add `_sprite_refresh_auth` + `cloud_refresh_auth` interface. The
E2E orchestrator calls `cloud_refresh_auth` before each provisioning
batch. For Sprite, this re-validates the token via `sprite org list`
and attempts `sprite auth refresh` if expired.
Fixes junie batch 8 "authentication failed" errors.
Fixes#2934
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Hetzner E2E runs fail with `resource_limit_exceeded` when stale primary
IPs from previous test runs consume the account quota. This adds proactive
cleanup at two levels:
1. E2E shell driver: `_hetzner_cleanup_orphaned_ips()` deletes unattached
primary IPs during pre-batch stale cleanup, freeing quota before any
new servers are provisioned.
2. TypeScript CLI: `hetzner/main.ts` calls `cleanupOrphanedPrimaryIps()`
before `createServer()` in headless/non-interactive mode, ensuring
each agent provisioning attempt starts with a clean IP quota.
The existing reactive cleanup (retry after failure) in `hetzner.ts`
remains as a fallback.
Fixes#2933
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(e2e): harden pkill regex escaping against all metacharacters (#2911)
The sed character class `[.[\*^$]` was malformed and missed several
extended regex metacharacters (+, ?, (, ), {, }, |). Replace with a
correct bracket expression that escapes all POSIX ERE metacharacters.
Although app_name is already validated to [A-Za-z0-9._-], fixing the
escaping is defense-in-depth against future changes to the validation.
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(e2e): correct sed bracket expression to escape ] character
Place ] first in character class so it's treated as literal.
Use \\ to match literal backslash.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(e2e): pass SPAWN_NAME + SPAWN_ENABLED_STEPS to interactive harness
Without SPAWN_NAME, cmdRun prompts 'Name your spawn' interactively.
The AI driver (Claude Haiku) can't respond because ANTHROPIC_AUTH_TOKEN
is an OpenRouter key — every Anthropic API call returns 401, so the harness
returns <wait> indefinitely until the 20-min SESSION_TIMEOUT_MS fires.
SPAWN_ENABLED_STEPS=auto-update bypasses the setup options multiselect,
ensuring the harness only tests the provisioning/installation UX.
* fix(e2e): fix _stage_timeout_remotely stdin pipe issue on Hetzner
Same root cause as _stage_prompt_remotely: _hetzner_exec runs commands via
"printf | base64 -d | bash", which makes bash's stdin the decode pipe.
So piped data from the outer SSH call never reaches subcommands.
"printf '%s' 'VALUE' | cloud_exec APP 'cat > /tmp/.e2e-timeout'" always
creates an empty file, causing "timeout: invalid time interval ''" when
the input test runs.
Fix: embed the validated numeric timeout value directly in the printf
command string (safe — _validate_timeout ensures only [0-9] digits).
* test(e2e): add claude PATH diagnostics to input_test_claude
Temporary debug output to trace where claude is installed
after interactive provision completes.
* test(e2e): save harness transcript JSON on success for debugging
* fix(e2e): remove 'is ready' from harness success pattern
'SSH is ready' (emitted ~15s into provision when SSH connects but before
any agent installation) matched the /is ready/ pattern, triggering false
success detection. The harness killed the spawn CLI during cloud-init wait,
leaving a VM with no agent installed.
Fix: use the same precise patterns as the main repo's harness:
/Starting agent\.\.\.|setup completed successfully/i
Both only fire after orchestrate.ts completes the full setup.
* chore(e2e): remove temporary debug instrumentation
* feat(e2e): add ai-powered ux review after interactive provision
After each successful interactive E2E run, the harness sends the full
terminal transcript to Claude (via OpenRouter) with a UX reviewer prompt.
It looks for confusing messages, noisy output, missing context in spinners,
and unhelpful errors that don't explain next steps.
Findings are returned as uxIssues[] in the harness JSON result.
interactive.sh then files a GitHub issue per run listing each problem
with a verbatim example and concrete suggestion.
Uses OPENROUTER_API_KEY (already in env) so it works on the QA VM
where ANTHROPIC_API_KEY is an OpenRouter key.
* refactor(e2e): throttle ux issue filing — 33% chance, 3+ issues required
- Random 33% gate: UX review runs on ~1 in 3 successful interactive
provisions, not every run
- Minimum bar: only surface findings when AI found 3+ clear issues
(filters one-off nits)
- Tighter system prompt: only flag obvious problems (repeated messages,
debug leaks, cryptic errors), not minor style preferences
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(e2e): replace random throttle with stricter ux review prompt
Instead of Math.random() to suppress issues, make the AI self-regulate:
the system prompt now instructs it to only flag genuinely bad problems
(repeated messages, raw stack traces, no-feedback waits) and treat
zero findings as a good outcome, not a failure.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>