On GCP VMs (running as root), npm installs openclaw to /usr/local/bin
instead of ~/.npm-global/bin because the system npm prefix is writable
and already in PATH. The E2E verify_openclaw() and related gateway
helper functions only explicitly listed ~/.npm-global/bin, ~/.bun/bin,
and ~/.local/bin — missing /usr/local/bin when .spawnrc sourcing
silently fails in the piped-bash SSH exec context.
Add /usr/local/bin explicitly to all openclaw-related PATH exports in
verify.sh so the binary check succeeds regardless of .spawnrc state.
Fixes#2732
Agent: test-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The E2E framework's run_single_agent function had no overall timeout.
When provision/verify/input_test steps hung (e.g. cloud_exec blocking
on sprite-zeroclaw or digitalocean-opencode), the process would stall
indefinitely without writing a .result file, causing silent test failures.
Add a per-agent wall-clock timeout (default 1800s, 2400s for junie) that
wraps the core provision/verify/input_test logic in a killable subshell.
If the timeout expires, the subshell is killed and a "fail" result is
written, ensuring E2E batches always complete.
Fixes#2714
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(e2e): increase provision timeout for junie on hetzner
junie's install takes >720s on Hetzner, exceeding the default
PROVISION_TIMEOUT and causing 100% E2E failure for hetzner-junie.
Add a per-agent provision timeout mechanism in common.sh via
get_provision_timeout(). This checks (in order):
1. PROVISION_TIMEOUT_<agent> env var override
2. Built-in per-agent default (_PROVISION_TIMEOUT_junie=1200)
3. Global PROVISION_TIMEOUT (720s)
provision.sh now calls get_provision_timeout() to resolve the
effective timeout per agent instead of using the flat global.
Fixes#2680
Agent: code-health
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): whitelist-sanitize agent name before eval in get_provision_timeout
tr '-' '_' only replaced hyphens, allowing metacharacters like $, backticks,
and ; to pass through into eval, enabling shell injection via a crafted agent
name. Replace with sed whitelist [A-Za-z0-9_] to strip all unsafe chars.
Agent: team-lead
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Pass base64-encoded prompts via _ENCODED_PROMPT shell variable assignment
at the start of remote command strings instead of interpolating directly
into single-quoted decode contexts. This prevents quote-escaping
vulnerabilities if INPUT_TEST_PROMPT or the encoding mechanism ever
changes to produce characters that break single-quote delimiters.
Fixes#2666
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
On Sprite VMs, npm's global prefix (from nvm) is writable and in PATH
after sourcing .bashrc, so openclaw installs to the nvm bin dir instead
of ~/.npm-global/bin. The E2E verify_openclaw() binary check only
prepended ~/.npm-global/bin, ~/.bun/bin, and ~/.local/bin — missing the
nvm bin path entirely.
Source .bashrc (in addition to .spawnrc) before the command -v check so
the verify PATH matches the install-time PATH. Applied the same fix to
the ensure/restart gateway helpers and the openclaw input test.
Fixes#2656
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove three dead functions that were defined but never called:
- verify_setup_github — checked GitHub CLI auth status
- verify_setup_browser — checked Chrome browser install
- verify_setup_telegram — checked openclaw Telegram config
These were orphaned helpers (never called from verify_agent or anywhere
else). All agent-specific checks go through verify_agent() which dispatches
to the per-agent verify_*() functions, none of which called these helpers.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
The @jetbrains/junie-cli postinstall script may download the actual
binary to non-standard locations that verify_junie() wasn't checking.
Add ~/.junie/bin, /usr/local/bin, and dynamic npm global bin resolution
to the PATH search in the binary check.
Fixes#2611
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
GCP e2-micro VMs are slow and throttled. When the openclaw gateway is
killed during the resilience test, the lock file is held by the dead
process for ~5s. This causes the first systemd restart attempt to fail
with "lock timeout after 5000ms", requiring a second restart cycle.
Timeline on slow VMs: RestartSec(5) + lock-timeout(5) + RestartSec(5)
+ boot(5) ≈ 20s. The previous 30s window was too tight — the gateway
DID recover but just barely missed the polling window on throttled CPUs.
Increasing to 60s gives a comfortable 3x margin for all VM types.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
ZeroClaw's latest GitHub release (v0.1.9a) ships no binary assets.
The --prefer-prebuilt bootstrap path hits a 404, falls back to Rust
source compilation, and exceeds the 600s install timeout — causing
zeroclaw to fail on all clouds (digitalocean, gcp, hetzner, sprite).
Fix: replace the bootstrap invocation with a direct curl download from
v0.1.7-beta.30 (the last release that ships linux-gnu prebuilt binaries)
into ~/.local/bin. This completes in seconds vs ~20 minutes for a source
build, and removes the swap-space setup step that was only needed for
memory-intensive compilation.
Also remove the now-unused ensureSwapSpace function and update the E2E
verify check to also look in ~/.local/bin for the zeroclaw binary.
-- qa/e2e-tester
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
* feat: add Telegram and WhatsApp options to OpenClaw setup picker
Adds separate "Telegram" and "WhatsApp" checkboxes to the OpenClaw
setup screen:
- Telegram: prompts for bot token from @BotFather, injects into
OpenClaw config via `openclaw config set`
- WhatsApp: reminds user to scan QR code via the web dashboard
after launch (no CLI setup possible)
Updates USER.md with channel-specific guidance when either is selected.
Bump CLI version to 0.16.16.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: run WhatsApp QR scan interactively before TUI launch
Instead of punting WhatsApp setup to "after launch", runs
`openclaw channels login --channel whatsapp` as an interactive SSH
session between gateway start and TUI launch. The user scans the
QR code with their phone during provisioning setup.
Flow: gateway starts → tunnel set up → WhatsApp QR scan → TUI launch
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update WhatsApp hint to reflect pre-TUI QR scanning
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add --config and --steps CLI flags for programmatic setup
Add --config <path> flag to load spawn options from a JSON config file
(model, steps, name, setup data like telegram_bot_token). Add --steps
<list> flag for comma-separated setup step control. Both enable the
web UI and headless automation to control which setup steps run.
Priority order: CLI flags > --config file > env vars > defaults.
- New spawn-config.ts module with valibot validation
- OptionalStep extended with dataEnvVar and interactive metadata
- validateStepNames() for step name validation with warnings
- Telegram setup reads TELEGRAM_BOT_TOKEN env var before prompting
- WhatsApp auto-skipped in headless mode with warning
- promptSetupOptions() skipped when SPAWN_ENABLED_STEPS already set
- E2E verify helpers for github, browser, telegram setup artifacts
- QA reference file documenting all agent setup options
- Version bump to 0.17.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add --model flag and priority order tests
- Add --model <id> CLI flag that sets MODEL_ID env var
- --model is extracted before --config so it takes priority
- Add config-priority.test.ts with 8 tests verifying:
- --model overrides config model
- --steps overrides config steps
- --steps "" disables all steps
- --name overrides config name
- Config tokens apply as defaults
- Explicit env vars override config tokens
- Remove preferences.json from priority order docs (not needed)
- Add --model to help text and unknown-flag guidance
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add --model, --config, --steps to README
Document config file format, setup steps table, and new CLI flags
in the commands table.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address security review feedback
- Move null byte check before path resolution (defense-in-depth)
- Move agent-setup-options.md from .claude/rules/ to .docs/ (git-ignored)
per documentation policy
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resolve rebase conflicts and deduplicate --model flag extraction
Rebase on main introduced a duplicate --model flag extraction block
(one from the PR at line 804, one from main at line 941). Consolidated
into the single early extraction point with -m shorthand support.
Also removed duplicate --model entry from KNOWN_FLAGS set.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
- soak.sh: SOAK_CLOUD env var makes cloud configurable (default: sprite)
- qa.sh: load TELEGRAM_BOT_TOKEN, TELEGRAM_TEST_CHAT_ID, SOAK_CLOUD from
/etc/spawn-qa-auth.env in soak mode
- qa.yml: add weekly Monday 3am UTC scheduled soak trigger
- fix: bun eval → bun -e across soak.sh, key-request.sh, github-auth.sh
(bun eval is not a valid subcommand in bun 1.3.9)
- fix: export _TOKEN via env prefix so process.env._TOKEN works in bun -e
- docs: update shell-scripts.md rule to say bun -e (not bun eval)
Verified: 3/4 Telegram tests pass in smoke test on DigitalOcean (120s wait)
getMe ✓ sendMessage ✓ getWebhookInfo ✓; cron test needs full 55-min window.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Three root-cause bugs in input test functions:
1. Stdin pass-through broken: cloud_exec uses "printf '...' | base64 -d | bash"
on the remote, meaning bash reads the script from its own stdin — not the
outer process's stdin. "PROMPT=$(base64 -d)" inside the script was reading
from the already-consumed pipe, always producing an empty prompt.
Fix: embed the base64-encoded prompt directly in the remote command string.
Base64 output is [A-Za-z0-9+/=] only — safe to embed in single-quoted strings.
2. Zeroclaw flag wrong: "zeroclaw agent -p" was passing the prompt as
--provider (not --prompt). The correct flag for non-interactive single-message
mode is "-m"/"--message".
3. Codex model stale: "openai/gpt-5-codex" does not exist on OpenRouter.
Updated to "openai/gpt-5.1-codex" which is available.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Add base64 character validation ([A-Za-z0-9+/=]) before use in SSH
command strings for gcp.sh, aws.sh, and hetzner.sh cloud_exec
functions -- matching the existing fix in digitalocean.sh (#2528).
Also add a validated _encode_b64 helper to soak.sh and use it for
all Telegram bot token encoding, preventing corrupted base64 from
breaking out of single-quoted SSH command strings.
Closes#2527
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Add explicit base64 character validation in _digitalocean_exec after
encoding the command, matching the existing pattern in provision.sh.
This ensures the encoded value contains only [A-Za-z0-9+/=] before
embedding it in the SSH command string.
Note: #2527 (provision.sh base64 validation) was already fixed in a
prior commit — the validation at lines 284-289 already rejects
non-base64 characters and empty output.
Fixes#2526
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* test: add cron-triggered Telegram reminder to soak test
Tests OpenClaw's ability to stay alive and execute scheduled tasks.
Installs a one-shot cron on the VM before the 1h soak wait that sends
a Telegram message at ~55 min, then verifies the message was sent
after the wait completes. Also moves Telegram config injection before
the soak wait so the cron can use the bot token immediately.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: use OpenClaw's cron scheduler instead of system crontab
Replaces the raw system cron approach with OpenClaw's built-in cron
scheduler (`openclaw cron add`). This properly tests that OpenClaw's
gateway stays alive after 1 hour and can execute scheduled tasks.
The test now:
1. Injects Telegram config + schedules an OpenClaw cron job (--at +55min)
2. Waits 1 hour (soak)
3. Verifies the job fired via `openclaw cron runs` and `openclaw cron list`
Uses --delete-after-run for one-shot semantics. Verification checks both
the run history and the auto-deletion as proof of execution.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: verify cron message on Telegram side via forwardMessage
Instead of trusting OpenClaw's self-reported cron status, we now verify
the message actually exists in the Telegram chat:
1. Extract message_id from OpenClaw's cron execution logs (tries
`openclaw cron runs`, then ~/.openclaw/cron/ directory)
2. Call Telegram's forwardMessage API with that message_id
3. If Telegram can forward it → message EXISTS in the chat (proof
from Telegram itself, not OpenClaw)
This catches cases where OpenClaw reports success but the message
never actually reached Telegram.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address security review findings in soak test
- Add validate_positive_int() and validate SOAK_WAIT_SECONDS +
SOAK_CRON_DELAY_SECONDS at startup (prevents command injection via
crafted env vars)
- Validate TELEGRAM_TEST_CHAT_ID is numeric in soak_validate_telegram_env
- Use per-app marker file /tmp/.spawn-cron-scheduled-${app} to avoid
race conditions when multiple soak tests run on the same VM
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
When provisioning hits a 422 "droplet limit exceeded" response, wait 30s
and retry up to 3 times. Makes E2E suite resilient to transient limit hits
during parallel batch provisioning.
Fixes#2516
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Previously, _digitalocean_max_parallel() always returned 3, assuming all
quota slots were available. When pre-existing droplets occupy slots, the
batch-3 parallel runs fail with "droplet limit exceeded" API errors.
Now queries /v2/account for the actual droplet_limit and subtracts the
current droplet count to compute available capacity. Falls back to 3 if
the API is unreachable.
-- qa/e2e-tester
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
When Sprite (or another cloud) times out during provisioning, provision.sh
falls back to constructing .spawnrc manually over SSH. The claude and codex
agents were missing from the agent-specific case block, so:
- claude: ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN were never written,
causing verify_claude's openrouter.ai check to fail
- codex: OPENAI_API_KEY and OPENAI_BASE_URL were never written
Discovered during E2E run: sprite/claude failed with .spawnrc timeout +
missing openrouter.ai in fallback .spawnrc.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Add a soak test that provisions OpenClaw on Sprite, waits 1 hour for
stabilization, injects a Telegram bot token, and runs integration tests
against the Telegram Bot API (getMe, sendMessage, getWebhookInfo).
- New: sh/e2e/lib/soak.sh — soak test library with all Telegram-specific logic
- Modified: sh/e2e/e2e.sh — add --soak flag to arg parser
- Modified: qa.sh — add soak run mode (bypasses Claude, runs e2e.sh directly)
- Modified: trigger-server.ts — add "soak" to VALID_REASONS
- Modified: qa.yml — add soak to workflow_dispatch options
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: A <258483684+la14-1@users.noreply.github.com>
On QA VMs running Claude Code via OpenRouter, the API key is stored as
ANTHROPIC_AUTH_TOKEN. Add a fallback in common.sh so e2e.sh picks up
the key from ANTHROPIC_AUTH_TOKEN when ANTHROPIC_BASE_URL points to
openrouter.ai and OPENROUTER_API_KEY is unset.
Also add SPRITE_NAME and SPRITE_ORG to the headless env var whitelist
in provision.sh — these are emitted by _sprite_headless_env() but were
missing from the positive whitelist, causing every Sprite provisioning
attempt to log errors and silently skip the env vars.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Replace `-H "Authorization: Bearer ..."` curl args with temp curl config
files (`-K`) in digitalocean.sh and hetzner.sh e2e drivers, keeping API
tokens out of `ps` output
- Replace dangerous-var blocklist in provision.sh with a positive whitelist
of allowed cloud_headless_env variable names
Agent: complexity-hunter
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
All four SSH-based cloud drivers (aws, digitalocean, gcp, hetzner)
passed the command string directly as an SSH argument, which gets
interpreted by the remote shell. While current callers pass trusted
E2E test code, this creates a security footgun for future changes.
Fix: base64-encode the command locally and decode it on the remote
side before piping to bash. The encoded string contains only safe
characters [A-Za-z0-9+/=], eliminating any injection vector. Stdin
is preserved for callers that pipe data into cloud_exec.
Closes#2432, closes#2433, closes#2434, closes#2435
Agent: complexity-hunter
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
- Replace word-split _sprite_org_flags() call sites with _sprite_cmd()
helper that uses a proper bash array for the -o flag, eliminating
injection risk from org names with spaces or shell metacharacters
- Validate _SPRITE_ORG against [A-Za-z0-9_-]+ in _sprite_validate_env
- Use grep -qF (fixed-string) instead of grep -q for app name matching
to prevent regex metacharacters in names from causing false matches
- Use mktemp for _stderr_tmp in _sprite_exec instead of predictable
PID-based path (/tmp/sprite-exec-err.$$) to prevent symlink attacks
Closes#2436
Agent: complexity-hunter
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
- Validate app_name at function entry (alphanumeric, dots, hyphens, underscores
only) before it's used in file paths or passed to cloud_exec
- Add trap-based cleanup for the temp file used during .spawnrc fallback creation
- Add security comments documenting the three-layer defense model: printf %q
quoting, base64 encoding, and stdin piping (no interpolation into command
strings)
The core vulnerability (env_b64 interpolated into the cloud_exec command string)
was already fixed in a prior commit that switched to stdin piping. This change
adds defense-in-depth and documentation.
Fixes#2437, #2441
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
install.sh: Replace color variable interpolation in printf format strings
with %b arguments to prevent format string injection (fixes#2443).
common.sh: Use %b for color escapes in logging functions. Document that
BASH_SOURCE and source usage in load_cloud_driver is intentional since
e2e scripts are filesystem-only, not curl|bash (fixes#2438).
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Add defense-in-depth validation across all e2e cloud driver scripts:
- Validate IP addresses match IPv4 format before use in SSH commands
(aws, digitalocean, gcp, hetzner)
- Validate SSH username contains only safe characters (gcp)
- Validate resource IDs are numeric before interpolating into API URLs
(digitalocean droplet IDs, hetzner server IDs)
- URL-encode app name in Hetzner API query parameter to prevent
query parameter injection
- Validate numeric env vars (INPUT_TEST_TIMEOUT, PROVISION_TIMEOUT,
INSTALL_WAIT) that get interpolated into remote command strings
Fixes#2432, #2433, #2434, #2435, #2442
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* security: escape pkill regex metacharacters in app_name
Fixes#2409 - escape regex metacharacters (., [, \, *, ^, $) in
app_name before using in pkill -f pattern to prevent unintended
process termination. Even though app_name is validated against a
safe character whitelist, . and - are regex metacharacters that
could match broader patterns than intended.
Note: #2410 (unquoted regex in bash conditional) was already fixed
by a prior commit that refactored the code to use sed instead of
[[ =~ BASH_REMATCH ]].
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: remove dead exec_long functions reintroduced from pre-#2407 code
Remove cloud_exec_long dispatcher and all _*_exec_long() functions
from common.sh and cloud driver files (aws, digitalocean, gcp,
hetzner, sprite). These were explicitly removed as dead code in
PR #2407 (commit c4ae1684) and must not be reintroduced.
Issue #2410 (unquoted regex in bash conditional) is already resolved:
the [[ =~ ]] pattern was previously replaced with case/sed parsing.
Fixes#2409Fixes#2410
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The cloud_exec_long dispatcher in common.sh and all five cloud-specific
_exec_long implementations (aws, digitalocean, gcp, hetzner, sprite)
were defined but never called by any code in the e2e test suite.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
After SSH reconnect, agent commands (openclaw, codex, kilocode, junie) were
not found because PATH was only written to ~/.bashrc, which is not sourced
by login shells. Login shells (used by SSH) source ~/.profile or
~/.bash_profile instead.
Changes:
- Write .spawnrc sourcing to ~/.profile and ~/.bash_profile in addition
to ~/.bashrc and ~/.zshrc (orchestrate.ts)
- Write npm-global PATH export to ~/.profile and ~/.bash_profile for all
npm-installed agents: OpenClaw, Codex, Kilo Code, Junie (agent-setup.ts)
- Write Claude Code PATH to ~/.profile and ~/.bash_profile (agent-setup.ts)
- Write OpenCode PATH to ~/.profile and ~/.bash_profile (agent-setup.ts)
- Extract NPM_GLOBAL_PATH_PERSIST constant to DRY up repeated shell snippets
- Fix e2e provision.sh to also write .spawnrc sourcing to login shell configs
- Bump CLI version to 0.15.32
Fixes#2394
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Replaces ${cfg}.fix$$ temp pattern with mktemp for guaranteed uniqueness.
Both temp file usages in the function are updated.
Fixes#2354
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds defense-in-depth check to reject malformed base64 output
before it is embedded in the cloud_exec remote command.
Fixes#2353
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove intermediate $env_b64 shell variable that stored base64-encoded
credentials. Pipe directly from base64 to cloud_exec, preventing any
credential data from appearing in process listings or shell traces.
Fixes#2333
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
- Block dangerous system env vars (PATH, LD_PRELOAD, etc.) before export
- Add explicit alphanumeric validation on env var names
- Validate app_name is non-empty and safe before pkill -f
- Tighten pkill regex from "sprite.*exec.*" to "sprite exec.*"
Fixes#2330#2332
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Pipe the command via stdin to bash instead of embedding it in a bash -c
string. This eliminates shell injection risk from unquoted cmd parameter,
consistent with _sprite_exec_long in the same file and other cloud drivers.
Fixes#2327
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The junie agent was added in #2300 but the E2E test scripts were not
updated. This adds junie to ALL_AGENTS, verify dispatch, input test
dispatch, and the provision.sh fallback env configuration.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Three distinct E2E bugs fixed:
1. SSH key generation race condition: When multiple agents provision in
parallel, concurrent processes all call generateSshKey() and race to
create ~/.ssh/id_ed25519. ssh-keygen won't overwrite an existing file
(prompts on stdin which is "ignore"), causing zeroclaw/codex to fail
with "SSH key generation failed". Fix: check if key already exists
before generating, and re-check after a failed generation attempt.
2. Hetzner SSH key 409 uniqueness_error: The Hetzner API returns HTTP 409
with "SSH key not unique" when the same key content is registered under
a different name. The hetznerApi() function throws on non-2xx before
the error-parsing code runs, and the regex /already/ didn't match
"not unique". Fix: catch 409 in ensureSshKey() and match against
uniqueness_error/not unique/already patterns.
3. Hermes binary not found: The hermes install script (uv tool) creates
the actual binary + venv at ~/.hermes/hermes-agent/venv/ with a symlink
at ~/.local/bin/hermes. The tarball capture script only captured the
symlink + ~/.local/share/, leaving a dangling symlink. Fix: include
~/.hermes/ in capture paths, add venv/bin to verify.sh PATH check,
and update hermes launchCmd to include the venv PATH.
Fixes#2304
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes#2292
Unanchored grep -q would match the marker anywhere in output, including
error messages like "Expected SPAWN_E2E_OK but got...". Using grep -qx
requires the marker to appear as a complete line, preventing false passes.
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace unsafe pattern where base64-encoded commands were interpolated
into remote command strings with secure stdin piping — command data now
travels as stdin rather than as part of the command string, eliminating
injection risk from shell metacharacter interpretation.
Affected functions across all 5 cloud drivers:
- _hetzner_exec_long
- _aws_exec_long
- _gcp_exec_long
- _digitalocean_exec_long
- _sprite_exec_long
Fixes#2286Fixes#2287
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: replace base64 interpolation with stdin piping in verify.sh (Fixes#2283)
Replace unsafe pattern where encoded prompt was interpolated into remote
command strings with secure stdin piping — prompt data now travels as stdin
rather than as part of the command string, eliminating injection risk.
Affected functions: input_test_claude, input_test_codex, input_test_openclaw,
input_test_zeroclaw.
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: use cloud_exec (not cloud_exec_long) for stdin piping
cloud_exec_long ignores stdin - remote base64 -d would hang.
cloud_exec passes cmd to bash -c, which preserves stdin piping.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: restore timeout protection for input tests using cloud_exec
Wraps each agent command in `timeout ${INPUT_TEST_TIMEOUT}` on the remote
side so tests cannot hang indefinitely after switching from cloud_exec_long
to cloud_exec. Updates stale comment referencing cloud_exec_long.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Simplify the cloud matrix by removing Daytona. All Daytona-specific code,
scripts, tests, and configuration have been removed. Daytona has been moved
to "Previously Considered" in the Cloud Provider Wishlist (#1183) and can
be revived on community demand.
Closes#2260
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: tarball workflow failures (root ownership, swapfile, hermes TTY)
- Use sudo mv + chown for tarball in release step (root-owned from capture)
- Skip swapfile creation if /swapfile already exists (GitHub Actions runners)
- Tolerate hermes setup wizard failure when /dev/tty unavailable in CI
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: capture claude symlink target in tarball + fix verify PATH
The claude installer creates a symlink at ~/.local/bin/claude pointing
to ~/.local/share/claude/versions/X.Y.Z. The capture script was missing
~/.local/share/claude/, causing a broken symlink in the tarball.
Also add ~/.npm-global/bin to the verify PATH check for claude (npm
fallback install path).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
* test(e2e): add openclaw gateway kill/restart resilience test
Verifies that the openclaw gateway auto-restarts after being killed
with SIGKILL, validating the systemd Restart=always supervision.
The test runs as part of verify_openclaw:
1. Confirms gateway is listening on :18789
2. Kills it with SIGKILL (simulates a hard crash)
3. Waits up to 30s for systemd to auto-restart it
4. Verifies port 18789 comes back online
If the gateway isn't running (e.g. non-systemd env), the test is
skipped gracefully. On failure, dumps systemd status and gateway
logs for diagnostics.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Revert "test(e2e): add openclaw gateway kill/restart resilience test"
This reverts commit 39b79d5c12.
* test: add unit tests for openclaw gateway resilience config
Verifies that startGateway() produces correct systemd and cron
configuration for auto-restart after a gateway crash:
- Restart=always and RestartSec=5 in the systemd unit
- Cron heartbeat checks port 18789 and restarts if dead
- Wrapper script sources .spawnrc and execs openclaw gateway
- Multiple port-check fallbacks (ss, /dev/tcp, nc)
- Non-systemd fallback to setsid/nohup
- 300s startup timeout
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test(e2e): add openclaw gateway kill/restart resilience test
Kills the gateway with SIGKILL during verify_openclaw and verifies
systemd Restart=always brings it back within 30s. Skips gracefully
on non-systemd environments.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
1. promptSpawnName() now checks DO_DROPLET_NAME before generating a
random name, matching getServerName() behavior. This fixes the e2e
harness creating droplets as spawn-XXXX when it expects
e2e-digitalocean-AGENT-TIMESTAMP.
2. Replace BASH_REMATCH with sed-based parsing in provision.sh for
macOS bash 3.2 compatibility. BASH_REMATCH was returning empty
values, causing `export: '=': not a valid identifier`.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
* fix(security): validate env values in cloud_headless_env parser
Reject values containing shell metacharacters ($, backtick, ;, &, |, <, >)
to prevent potential command injection if a cloud driver returns malicious output.
Fixes#2139
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): replace env value blacklist with whitelist regex
The blacklist approach missed dangerous characters like (), quotes,
backslash, newlines, {}, and !. Switch to a whitelist that only allows
[A-Za-z0-9@%+=:,./_-] — a strict safe set sufficient for env values.
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The fallback .spawnrc construction (used when provision times out before
.spawnrc is written) had two bugs:
1. zeroclaw case wrongly included OPENAI_API_KEY and OPENAI_BASE_URL —
these are hermes env vars, not zeroclaw's. zeroclaw only needs
ZEROCLAW_PROVIDER=openrouter (plus the base OPENROUTER_API_KEY).
2. hermes and kilocode were missing from the case statement entirely.
- hermes needs OPENAI_BASE_URL and OPENAI_API_KEY (verify_hermes
checks for OPENAI_BASE_URL in .spawnrc)
- kilocode needs KILO_PROVIDER_TYPE=openrouter and
KILO_OPEN_ROUTER_API_KEY (verify_kilocode checks KILO_PROVIDER_TYPE)
Without these fixes, hermes and kilocode would fail verification whenever
provisioning timed out before the normal .spawnrc was written.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove `cleanup_stale_apps()` in `sh/e2e/lib/cleanup.sh` which was dead
code — defined but never called. The E2E orchestrator (`e2e.sh`) invokes
`cloud_cleanup_stale` directly on the active cloud driver; the wrapper
function and its file served no purpose.
Also remove the corresponding `source` call in `e2e.sh`.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
* fix: address 4 reliability issues across codebase
1. sprite.ts: add --force to destroy command (stdin is "ignore" so
interactive prompts would hang until 60s timeout)
2. verify.sh: replace /dev/tcp port checks with ss -tln primary
(Debian/Ubuntu bash compiled without /dev/tcp support)
3. verify.sh: make _openclaw_restart_gateway a hard failure instead
of log_warn (matching _openclaw_ensure_gateway behavior)
4. agent-setup.ts: add ss -tln port check + "already running" early
exit + increase timeout from 120s to 300s (gateway takes ~3min
to initialize on AWS medium instances)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: biome format - use consistent double quotes in portCheck
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
* fix(e2e): improve openclaw reliability on AWS and other clouds
Three changes to make openclaw e2e tests more robust:
1. Increase PROVISION_TIMEOUT from 480s to 720s — AWS cloud-init
for "full" tier (Node.js + Bun + build-essential) can exceed 480s,
causing the CLI to be killed before .spawnrc is written.
2. Add .spawnrc manual fallback in provision.sh — if the CLI is killed
before writing .spawnrc, construct it via SSH using OPENROUTER_API_KEY
with agent-specific env vars (openclaw, zeroclaw).
3. Add retry logic to openclaw gateway input test — the gateway can
crash with 1006 websocket closure on resource-constrained instances.
Now retries once after killing and restarting the gateway process.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(security): fix command injection in e2e provision scripts
- Use printf %q and temp file for api_key handling in provision.sh to
prevent shell metachar injection (single quotes, backticks, $)
- Double-quote env_b64 interpolation in cloud_exec call to prevent
word splitting
- Replace echo with printf in bashrc append to avoid portability issues
- Replace overbroad pkill -f 'openclaw gateway' in verify.sh with
PID-targeted kill via lsof/fuser on port 18789
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>