Commit graph

18 commits

Author SHA1 Message Date
Ahmed Abushagur
4ac19a375a
fix: capture claude symlink target + verify PATH (#2245)
* fix: tarball workflow failures (root ownership, swapfile, hermes TTY)

- Use sudo mv + chown for tarball in release step (root-owned from capture)
- Skip swapfile creation if /swapfile already exists (GitHub Actions runners)
- Tolerate hermes setup wizard failure when /dev/tty unavailable in CI

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: capture claude symlink target in tarball + fix verify PATH

The claude installer creates a symlink at ~/.local/bin/claude pointing
to ~/.local/share/claude/versions/X.Y.Z. The capture script was missing
~/.local/share/claude/, causing a broken symlink in the tarball.

Also add ~/.npm-global/bin to the verify PATH check for claude (npm
fallback install path).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-06 10:55:09 -05:00
Ahmed Abushagur
c71f01725b
test: unit tests for openclaw gateway resilience config (#2224)
* test(e2e): add openclaw gateway kill/restart resilience test

Verifies that the openclaw gateway auto-restarts after being killed
with SIGKILL, validating the systemd Restart=always supervision.

The test runs as part of verify_openclaw:
1. Confirms gateway is listening on :18789
2. Kills it with SIGKILL (simulates a hard crash)
3. Waits up to 30s for systemd to auto-restart it
4. Verifies port 18789 comes back online

If the gateway isn't running (e.g. non-systemd env), the test is
skipped gracefully. On failure, dumps systemd status and gateway
logs for diagnostics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert "test(e2e): add openclaw gateway kill/restart resilience test"

This reverts commit 39b79d5c12.

* test: add unit tests for openclaw gateway resilience config

Verifies that startGateway() produces correct systemd and cron
configuration for auto-restart after a gateway crash:

- Restart=always and RestartSec=5 in the systemd unit
- Cron heartbeat checks port 18789 and restarts if dead
- Wrapper script sources .spawnrc and execs openclaw gateway
- Multiple port-check fallbacks (ss, /dev/tcp, nc)
- Non-systemd fallback to setsid/nohup
- 300s startup timeout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test(e2e): add openclaw gateway kill/restart resilience test

Kills the gateway with SIGKILL during verify_openclaw and verifies
systemd Restart=always brings it back within 30s. Skips gracefully
on non-systemd environments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-05 19:38:10 -05:00
Ahmed Abushagur
300b330106
fix: address 4 reliability issues across codebase (#2129)
* fix: address 4 reliability issues across codebase

1. sprite.ts: add --force to destroy command (stdin is "ignore" so
   interactive prompts would hang until 60s timeout)

2. verify.sh: replace /dev/tcp port checks with ss -tln primary
   (Debian/Ubuntu bash compiled without /dev/tcp support)

3. verify.sh: make _openclaw_restart_gateway a hard failure instead
   of log_warn (matching _openclaw_ensure_gateway behavior)

4. agent-setup.ts: add ss -tln port check + "already running" early
   exit + increase timeout from 120s to 300s (gateway takes ~3min
   to initialize on AWS medium instances)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: biome format - use consistent double quotes in portCheck

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-03 03:18:44 -05:00
Ahmed Abushagur
4a90abdaa2
fix(e2e): improve openclaw reliability on AWS and other clouds (#2123)
* fix(e2e): improve openclaw reliability on AWS and other clouds

Three changes to make openclaw e2e tests more robust:

1. Increase PROVISION_TIMEOUT from 480s to 720s — AWS cloud-init
   for "full" tier (Node.js + Bun + build-essential) can exceed 480s,
   causing the CLI to be killed before .spawnrc is written.

2. Add .spawnrc manual fallback in provision.sh — if the CLI is killed
   before writing .spawnrc, construct it via SSH using OPENROUTER_API_KEY
   with agent-specific env vars (openclaw, zeroclaw).

3. Add retry logic to openclaw gateway input test — the gateway can
   crash with 1006 websocket closure on resource-constrained instances.
   Now retries once after killing and restarting the gateway process.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): fix command injection in e2e provision scripts

- Use printf %q and temp file for api_key handling in provision.sh to
  prevent shell metachar injection (single quotes, backticks, $)
- Double-quote env_b64 interpolation in cloud_exec call to prevent
  word splitting
- Replace echo with printf in bashrc append to avoid portability issues
- Replace overbroad pkill -f 'openclaw gateway' in verify.sh with
  PID-targeted kill via lsof/fuser on port 18789

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
2026-03-02 23:19:34 -05:00
Ahmed Abushagur
b35ecdfaff
fix(e2e): drop --timeout flag from openclaw agent command (#2109)
The outer cloud_exec_long already enforces a timeout via
INPUT_TEST_TIMEOUT. The inner --timeout 60 was redundant and could
cause premature kills before the outer timeout expired.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: A <258483684+la14-1@users.noreply.github.com>
2026-03-02 10:57:44 -08:00
Ahmed Abushagur
35badf8d1b
fix(e2e): fail hard when OpenClaw gateway doesn't start (#2111)
The gateway startup was silently swallowed with log_warn, masking
real failures. Now tracks whether the port came up and fails the
test with the gateway log contents if it didn't.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-02 13:51:29 -05:00
Ahmed Abushagur
45caf4b96b
fix(sprite): fix all 6 Sprite agent installs for E2E (#2057)
* fix(sprite): fix all 6 Sprite agent installs for E2E

- Use `npm install -g --prefix` instead of `npm config set prefix` to
  avoid creating .npmrc that conflicts with nvm on Sprite VMs
- Fix shell environment setup to only modify .bash_profile (not .bashrc)
  so non-interactive bash -c commands retain PATH config
- Add $HOME/.cargo/bin to PATH for zeroclaw (Sprite has no ~/.cargo/env)
- Add $HOME/.local/bin to PATH config for Sprite shell environment
- Add sprite E2E cloud driver with org detection, config corruption fix,
  direct command embedding (not $1 positional), and retry logic
- Fix provision.sh to kill full process tree after timeout (prevents
  orphaned sprite exec sessions from corrupting config)
- Fix verify.sh zeroclaw check to not rely on ~/.cargo/env existing

Tested: 6/6 Sprite agents pass E2E (claude, codex, openclaw, zeroclaw,
opencode, kilocode). Hermes is not in the Sprite manifest.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: biome format - collapse runSprite call to single line

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-01 07:15:09 -05:00
Ahmed Abushagur
2155a36a1f
fix(e2e): use explicit PATH for hermes, codex, and kilocode binary checks (#2049)
Non-interactive SSH sessions don't source .bashrc or .zshrc, so binaries
installed to ~/.local/bin (hermes via uv) or ~/.npm-global/bin (codex,
kilocode via npm) were not found during verification.

Fix all three verify functions and the codex input test to use explicit
PATH with the known install directories, matching the pattern already
used by openclaw and claude.

Verified: AWS 7/7, Hetzner 6/6 implemented, GCP 6/6 implemented.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-01 00:07:33 -05:00
Ahmed Abushagur
d8dbf952c2
fix(e2e): fix openclaw input test (PATH, CLI flags, gateway restart) (#2045)
The openclaw e2e input test was failing for three independent reasons:

1. PATH missing ~/.npm-global/bin — openclaw installs via npm with a
   custom prefix, but verify_openclaw and input_test_openclaw didn't
   include that directory in PATH

2. Wrong CLI invocation — used `openclaw -p` which doesn't exist.
   The correct command is `openclaw agent --message "..." --session-id`

3. Gateway not running — the openclaw gateway (port 18789) can die
   between provisioning and verification. Now the input test ensures
   the gateway is running before sending the prompt.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-02-28 20:41:48 -05:00
A
ab08476a63
refactor: Remove dead code and stale references (#2028)
- Add `hermes` to ALL_AGENTS in sh/e2e/lib/common.sh (stale: hermes added to
  manifest.json in #2023 but never added to the e2e agent list)
- Add verify_hermes() and input_test_hermes() to sh/e2e/lib/verify.sh and
  wire them into verify_agent/run_input_test dispatch tables
- Remove dead log_warn() from sh/shared/github-auth.sh (defined but never called)
- Remove dead get_cloud_env_vars() from sh/shared/key-request.sh (no callers outside file)
- Remove dead invalidate_cloud_key() from sh/shared/key-request.sh (no callers anywhere)

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-02-28 12:55:09 -08:00
Ahmed Abushagur
627026a26b
feat(e2e): multi-cloud test suite with cloud driver pattern (#2004)
* feat(e2e): multi-cloud test suite with cloud driver pattern

Scale the E2E test suite from AWS-only to all 6 infrastructure clouds
(aws, hetzner, digitalocean, gcp, daytona, sprite) with parallel
execution support.

Architecture:
- Cloud driver pattern: each cloud implements _cloudname_func() functions
- load_cloud_driver() wires cloud-specific functions to generic names
  (cloud_exec, cloud_teardown, etc.)
- Shared orchestration stays in one place, cloud details are isolated

New files:
- sh/e2e/e2e.sh — unified entry point with --cloud flag
- sh/e2e/lib/clouds/{aws,hetzner,digitalocean,gcp,daytona,sprite}.sh

Refactored:
- common.sh — removed AWS constants, added load_cloud_driver()
- provision.sh — cloud-agnostic via cloud_headless_env/cloud_provision_verify
- verify.sh — replaced aws_ssh with cloud_exec/cloud_exec_long
- teardown.sh/cleanup.sh — delegate to cloud driver functions
- aws-e2e.sh — thin wrapper: exec e2e.sh --cloud aws

Usage:
  e2e.sh --cloud aws                     # Single cloud
  e2e.sh --cloud aws --cloud hetzner     # Multiple clouds in parallel
  e2e.sh --cloud all --parallel 3        # All clouds, 3 agents parallel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(e2e): prevent subshell EXIT trap inheritance and single-cloud early exit

- Reset EXIT trap in multi-cloud subshells to prevent LOG_DIR deletion
  before the main process reads log files
- Use `|| true` for single-cloud run_agents_for_cloud to prevent set -e
  from skipping the summary on env validation failure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: default to parallel agent provisioning in e2e tests

All agents within a cloud now run in parallel by default instead of
sequentially. Use --sequential to restore the old behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: cap sprite parallelism, 4GB for openclaw, remove stderr suppression

- Sprite: add _sprite_max_parallel (cap 2 concurrent agents) to avoid
  CLI rate limiting that caused all 6 agents to fail
- AWS: use medium_3_0 (4GB) bundle for openclaw which needs more RAM
- Input tests: remove 2>/dev/null from agent commands so failures
  produce visible error output instead of empty responses
- Add cloud_max_parallel to driver interface, respected by e2e.sh

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use bash instead of sh for exec_long across all cloud drivers

Ubuntu's /bin/sh is dash, which doesn't support bash-specific PATH
sourcing from .spawnrc/.cargo/env. This caused codex and zeroclaw
input tests to fail with "command not found" even though verify passed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: codex input test uses positional prompt, not -q flag

codex CLI takes prompt as positional arg: `codex "PROMPT"`.
The -q flag doesn't exist, causing "Usage:" error output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use codex exec -q for non-interactive input test

codex requires `exec` subcommand for non-interactive mode.
Plain `codex PROMPT` expects a TTY (stdin is not a terminal).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: codex exec takes no -q flag, just positional prompt

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use cx23 instead of deprecated cx22 for Hetzner e2e tests

Hetzner deprecated server type cx22 (ID 104). The default now uses cx23.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-02-27 19:28:08 -08:00
A
d04096a15b
feat!: remove Fly.io cloud provider support (#1979)
* feat!: remove Fly.io cloud provider support

Drop Fly.io as a supported cloud provider. Sprite (which uses Fly.io
infrastructure internally) is retained.

- Delete packages/cli/src/fly/ module, sh/fly/ scripts, fixtures/fly/
- Remove fly cloud entry and 6 fly matrix entries from manifest.json
- Remove fly imports, destroy cases, and connection handlers from commands.ts
- Remove fly-ssh sentinel from security.ts
- Port E2E test suite from Fly.io to AWS Lightsail (fly-e2e.sh → aws-e2e.sh)
- Update README (7 clouds, 42 combinations), CLAUDE.md, and skill prompts
- Clean up fly references in build config, gitignore, icon sources
- Bump CLI version to 0.11.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: restore Docker image build under sh/docker/

Move openclaw Dockerfile from sh/fly/docker/ to sh/docker/ and rename
workflow from fly-docker.yml to docker.yml with updated paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix extra blank lines in commands.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-02-27 00:06:32 -05:00
A
b6f021ecf2
fix(security): clarify base64+single-quote pattern in fly_ssh (#1937)
Fixes #1933. The comments incorrectly implied base64 encoding alone
prevents injection. Safety relies on the combination of base64 output
(no single quotes in alphabet) + single-quote wrapping. Made this
explicit in all 7 affected comments.

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-25 18:44:51 -05:00
A
b09295131a
fix(security): replace eval with pipe-to-sh in fly_ssh functions (#1928)
Eliminates nested quote eval pattern in favor of direct pipe to sh,
removing potential injection surface in fly_ssh and fly_ssh_long.
Fixes #1927

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 14:50:24 -05:00
A
cac06b2706
fix(security): base64-encode INPUT_TEST_PROMPT in E2E verify.sh to prevent injection (#1923)
Fixes #1921

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 10:26:18 -05:00
A
638250e744
fix(security): use base64 encoding in fly_ssh to prevent command injection (#1918)
Replace single-quote escaping (which only handled ' but not other shell
metacharacters like $(), backticks, ;, ||, &&, |) with base64 encoding.
Base64 output contains only [A-Za-z0-9+/=] characters, completely
eliminating shell metacharacter injection risks regardless of command
content. Compatible with both GNU coreutils (Linux) and BSD (macOS).

Fixes #1912

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-25 03:25:31 -08:00
A
154112fb41
feat: add live input/output E2E verification for agents (#1886)
* feat: add live input/output E2E verification for agents

The E2E suite previously only verified static artifacts (binaries, config
files, env vars). An agent with a broken API key or crash-on-launch bug
would pass all checks. This adds an input test phase that sends a real
prompt to each agent and verifies the response contains a marker string.

- Add fly_ssh_long() with configurable timeout for long-running commands
- Add per-agent input test functions (claude -p, codex -q, openclaw -p,
  zeroclaw agent -p; opencode/kilocode skip as TUI-only)
- Add run_input_test() dispatcher with SKIP_INPUT_TEST env var support
- Add --skip-input-test CLI flag to fly-e2e.sh
- Chain input test after verify in run_single_agent() pipeline
- Add INPUT_TEST_TIMEOUT constant (default 120s, env-overridable)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: format p.text({ message }) to multi-line for biome

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 15:16:30 -05:00
A
b84adfb74e
refactor: move all shell scripts to /sh directory (#1843)
Reorganizes the project so all shell scripts live under a dedicated
/sh directory, enabling the OpenRouter rewrite URL to point at /sh/
instead of the repository root.

Moves:
- cli/install.sh → sh/cli/install.sh
- shared/*.sh → sh/shared/*.sh
- {cloud}/{agent}.sh → sh/{cloud}/{agent}.sh (48 scripts)
- {cloud}/README.md → sh/{cloud}/README.md
- e2e/*.sh → sh/e2e/*.sh
- test/macos-compat.sh → sh/test/macos-compat.sh
- test/fixtures/**/*.sh → sh/test/fixtures/**/*.sh

Updates all references:
- RAW_BASE path construction in commands.ts, update-check.ts
- GitHub auth URL in agent-setup.ts
- Self-referencing URLs in install.sh, github-auth.sh
- CI workflow paths in lint.yml, cli-release.yml
- Test file paths in install-script-validation, manifest-integrity
- Documentation in README.md, cli/README.md, CLAUDE.md
- QA scripts in .claude/skills/

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-23 21:14:54 -08:00
Renamed from e2e/lib/verify.sh (Browse further)