2522 commits

Author SHA1 Message Date
A
18b1a5f50f
fix(install): force IPv4 DNS for npm installs and add junie binary verify (#2920)
* chore: update agent GitHub star counts

* fix(install): force IPv4 DNS for npm installs and add junie binary verify

On Sprite VMs (and potentially other clouds with flaky IPv6 routing), npm
install of packages with native-binary postinstall scripts (kilocode, junie)
fails with i/o timeout when connecting to the npm registry over IPv6.

Changes:
- Add NODE_OPTIONS=--dns-result-order=ipv4first to NPM_PREFIX_SETUP so all
  npm installs prefer IPv4, preventing the IPv6 timeout on first attempt
- Add cd ~ before postinstall re-run in KILOCODE_BINARY_VERIFY to avoid
  "current working directory was deleted" errors in bun/node on retry
- Add JUNIE_BINARY_VERIFY snippet (analogous to kilocode) that detects and
  recovers from a failed junie postinstall by re-running it from $HOME
- Apply JUNIE_BINARY_VERIFY to the junie install command
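
The first change above can be sketched in TypeScript for a process that spawns npm itself. This is only an illustration: the helper name `withIpv4First` and the `Env` type are hypothetical, and only the `--dns-result-order=ipv4first` flag comes from this commit.

```typescript
// Hypothetical helper (not from the repo): returns a copy of `env` whose
// NODE_OPTIONS asks Node to order DNS results IPv4-first.
// Any options already present are preserved.
type Env = Record<string, string | undefined>;

const IPV4_FIRST = "--dns-result-order=ipv4first";

function withIpv4First(env: Env): Env {
  const current = (env.NODE_OPTIONS ?? "").trim();
  // Do not append the flag a second time if it is already set.
  if (current.split(/\s+/).includes(IPV4_FIRST)) {
    return { ...env };
  }
  const merged = current === "" ? IPV4_FIRST : `${current} ${IPV4_FIRST}`;
  return { ...env, NODE_OPTIONS: merged };
}
```

The returned object could be passed as the `env` option of whatever call launches the npm install.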

Fixes sprite kilocode and junie failures seen in E2E run 2026-03-23.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 05:13:12 +07:00
A
e0db833307
fix(update-check): redirect install script stdout to stderr in --output json mode (#2919)
When --output json is requested, the auto-update install script was
running with stdio: "inherit", causing [spawn] install messages to
pollute stdout before the JSON result, breaking JSON consumers.

Fix:
- Pre-scan process.argv for --output json before checkForUpdates()
  is called in index.ts (formal flag parsing happens later at line 944)
- Pass jsonOutput flag through checkForUpdates() -> performAutoUpdate()
- When jsonOutput=true, use stdio: ["pipe", stderr, stderr] for the
  install script execution so all output goes to stderr only
- Set SPAWN_CLI_UPDATED=1 env var on re-exec so JSON consumers can
  detect the update via cli_updated: true in SpawnResult
- Add cli_updated?: boolean to SpawnResult interface in commands/run.ts
- Add tests covering both json and non-json stdio behavior
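
A minimal sketch of the pre-scan and the stdio choice described above. The function names are hypothetical, accepting the `--output=json` spelling is an assumption, and using file descriptor `2` to stand for stderr is one possible way to express `["pipe", stderr, stderr]`.

```typescript
// Hypothetical pre-scan: looks for `--output json` (or, as an assumption,
// `--output=json`) in the raw argv before the real flag parser runs.
function wantsJsonOutput(argv: readonly string[]): boolean {
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--output=json") return true;
    if (arg === "--output" && argv[i + 1] === "json") return true;
  }
  return false;
}

// stdio setting for the install script. In JSON mode stdin is piped and both
// stdout and stderr go to fd 2, so stdout carries only the JSON result.
type StdioSetting = "inherit" | "pipe" | number;

function installStdio(jsonOutput: boolean): StdioSetting | StdioSetting[] {
  return jsonOutput ? ["pipe", 2, 2] : "inherit";
}
```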

Fixes #2918

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-24 03:18:50 +07:00
A
c1e6fb76f9
fix(e2e): harden pkill regex escaping against all metacharacters (#2917)
* fix(e2e): harden pkill regex escaping against all metacharacters (#2911)

The sed character class `[.[\*^$]` was malformed and missed several
extended regex metacharacters (+, ?, (, ), {, }, |). Replace with a
correct bracket expression that escapes all POSIX ERE metacharacters.

Although app_name is already validated to [A-Za-z0-9._-], fixing the
escaping is defense-in-depth against future changes to the validation.

Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(e2e): correct sed bracket expression to escape ] character

Place ] first in character class so it's treated as literal.
Use \\ to match literal backslash.
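
For comparison, the same escaping idea written in TypeScript. The commit itself does this with sed, so the function below is an illustrative equivalent under that assumption, not the code that was changed.

```typescript
// Prefixes every POSIX ERE metacharacter with a backslash so the input
// matches literally when used as a pattern (for example by pkill -f).
function escapeEre(literal: string): string {
  // Inside the class, `\]` and `\\` stand for a literal `]` and backslash.
  return literal.replace(/[\]\\.[*^$+?(){}|]/g, "\\$&");
}
```

Names already restricted to `[A-Za-z0-9._-]` change only where they contain a dot, which is consistent with the defense-in-depth framing above.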

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 12:35:31 -07:00
A
f38ae693de
fix: set SPAWN_NON_INTERACTIVE in headless mode to prevent prompt hangs (#2916)
Headless mode set SPAWN_HEADLESS and SPAWN_MODE but not
SPAWN_NON_INTERACTIVE, which all cloud modules check before prompting.
This caused GCP (and potentially other clouds) to prompt for project
confirmation when stdin was closed, resulting in a fatal error.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 01:22:47 +07:00
A
a959a6db83
fix(types): remove as type assertions from test mocks (#2913)
Add missing fields (signalCode, resourceUsage, pid, killed) to
Bun.spawnSync and Bun.spawn mock return values so they satisfy the
full return types without needing `as` casts or biome-ignore comments.

Agent: style-reviewer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-24 00:24:49 +07:00
A
69a0d476a0
test: remove duplicate and theatrical tests (#2912)
Remove 8 tests that checked constant equality (DEFAULT_DROPLET_SIZE,
DEFAULT_DO_REGION, DEFAULT_MACHINE_TYPE, DEFAULT_ZONE, DEFAULT_SERVER_TYPE,
DEFAULT_LOCATION) across digitalocean/gcp/hetzner cov files — these tests
just hardcode the same string twice and break if the default is changed for
a valid reason.

Also remove 2 sleep() tests from ssh-cov.test.ts: sleep() is a trivial
setTimeout wrapper with no logic, and the timing test added 50ms of real
wall time per run.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-24 00:22:49 +07:00
A
0e17461fcd
test: remove duplicate cmdFix tests from cmd-fix-cov.test.ts (#2910)
Three tests in the `cmdFix (additional coverage)` describe block were
exact duplicates of tests already in cmd-fix.test.ts:

- "fixes directly when only one server" = "directly fixes when only one active server"
- "finds record by name when spawnId matches name" = "fixes by spawn name"
- "shows no active spawns when history is empty" = "shows message when no active spawns"

Removed the duplicate describe block and its now-unused imports.
Unique fixSpawn coverage (security validation, manifest failure, label
fallbacks, success message) is preserved.

Agent: pr-maintainer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-23 21:35:44 +07:00
A
f8e23317c9
fix(cli): fix openclaw DO size and kilocode CWD install failures (#2909)
- digitalocean: change openclaw min size from s-2vcpu-4gb-intel to
  s-2vcpu-4gb (intel variant no longer available in nyc3)
- agent-setup: add cd "$HOME" before kilocode npm install to prevent
  postinstall failure when CWD is deleted during npm global install
- bump version to 0.25.19

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 20:37:48 +07:00
A
59dea5fc09
refactor: remove dead code and stale references (#2908)
- remove `export` from `LocalTarball` interface in `shared/agent-tarball.ts`
  — the type is only used internally as the return type of `downloadTarballLocally`;
  it was never imported from outside the module.

- remove `getTerminalWidth` re-export from `commands/index.ts`
  — `getTerminalWidth` is only called inside `commands/info.ts` itself;
  it was re-exported through the barrel but never imported from there by any consumer or test.

bump CLI version patch: 0.25.18 → 0.25.19

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 19:51:41 +07:00
A
f296544c1c
fix(cli): bump version to 0.25.18 for security fix in #2904 (#2906)
Commit 97b6424 (fix(security): add cmd validation to Sprite
runSprite() and runSpriteSilent()) changed production CLI code without
a corresponding version bump. The CLI has auto-update — without this
bump users won't receive the null-byte injection guard.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-23 18:50:00 +07:00
A
97b6424ebe
fix(security): add cmd validation to Sprite runSprite() and runSpriteSilent() (#2904)
Mirrors the guard already in interactiveSession() and all other clouds.
Null bytes in cmd could truncate commands at the C level.

Fixes #2903

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 17:30:25 +07:00
A
5392ff2d7a
fix: detect and recover from Hetzner primary_ip_limit exceeded error (#2905)
When parallel E2E runs exhaust Hetzner's Primary IP quota, the CLI now
detects the `resource_limit_exceeded` / `primary_ip_limit` error, automatically
cleans up orphaned Primary IPs (unattached to any server), and retries once.
If cleanup doesn't free quota, a clear message guides users to delete stale
resources or request a quota increase.

Fixes #2902

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 17:26:32 +07:00
A
d2f11bbf06
test: remove duplicate and theatrical tests (#2901)
cmd-pick-cov.test.ts: remove 8 theatrical flag-parsing tests that all hit
the same early-exit code path (no stdin options → exit 1). Each test
passed a different flag combination but all verified only that exit(1) was
thrown — no flag-specific behavior was actually exercised. Keep the one
meaningful test: "exits with error when no options provided".

ssh-cov.test.ts: consolidate 5 single-assertion constant-check tests into
2 tests (one per constant). All 5 previously tested string membership in
SSH_BASE_OPTS / SSH_INTERACTIVE_OPTS in separate it() blocks.

Before: 1868 tests, 4454 expect() calls
After:  1857 tests, 4446 expect() calls (-11 tests, -8 expects)

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 16:28:30 +07:00
A
7aba20e327
fix(ux): deduplicate install messages, add newlines to SSH polling, clarify completion messages (#2900)
- Suppress stdout+stderr from `claude install --force` to prevent duplicate
  "successfully installed" messages (was printed up to 4x)
- Make logStepInline fall back to newline-separated output when stderr is not
  a TTY, so SSH port polling status is readable in piped/captured contexts
- Consolidate post-install completion messages into a single clear milestone:
  "Agent setup complete -- {agent} is ready on {cloud}"
- Bump CLI version to 0.25.16
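
The TTY fallback in the second item could look roughly like this. `formatInlineStep` is a hypothetical name, and the real `logStepInline` may format its output differently.

```typescript
// Hypothetical formatter illustrating the fallback: on a terminal the status
// line is redrawn in place; otherwise each update is written on its own line.
function formatInlineStep(message: string, isTty: boolean): string {
  if (isTty) {
    // Carriage return plus ANSI "erase entire line" (ESC[2K), then the status.
    return `\r\x1b[2K${message}`;
  }
  return `${message}\n`;
}
```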

Fixes #2899

Agent: ux-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 15:26:34 +07:00
A
a96522829b
fix(e2e): fix interactive E2E test chain (provision → install → input test) (#2898)
* fix(e2e): pass SPAWN_NAME + SPAWN_ENABLED_STEPS to interactive harness

Without SPAWN_NAME, cmdRun prompts 'Name your spawn' interactively.
The AI driver (Claude Haiku) can't respond because ANTHROPIC_AUTH_TOKEN
is an OpenRouter key — every Anthropic API call returns 401, so the harness
returns <wait> indefinitely until the 20-min SESSION_TIMEOUT_MS fires.

SPAWN_ENABLED_STEPS=auto-update bypasses the setup options multiselect,
ensuring the harness only tests the provisioning/installation UX.

* fix(e2e): fix _stage_timeout_remotely stdin pipe issue on Hetzner

Same root cause as _stage_prompt_remotely: _hetzner_exec runs commands via
"printf | base64 -d | bash", which makes bash's stdin the decode pipe.
So piped data from the outer SSH call never reaches subcommands.

"printf '%s' 'VALUE' | cloud_exec APP 'cat > /tmp/.e2e-timeout'" always
creates an empty file, causing "timeout: invalid time interval ''" when
the input test runs.

Fix: embed the validated numeric timeout value directly in the printf
command string (safe — _validate_timeout ensures only [0-9] digits).

* test(e2e): add claude PATH diagnostics to input_test_claude

Temporary debug output to trace where claude is installed
after interactive provision completes.

* test(e2e): save harness transcript JSON on success for debugging

* fix(e2e): remove 'is ready' from harness success pattern

'SSH is ready' (emitted ~15s into provision when SSH connects but before
any agent installation) matched the /is ready/ pattern, triggering false
success detection. The harness killed the spawn CLI during cloud-init wait,
leaving a VM with no agent installed.

Fix: use the same precise patterns as the main repo's harness:
  /Starting agent\.\.\.|setup completed successfully/i
Both only fire after orchestrate.ts completes the full setup.
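
The false positive is easy to reproduce in isolation. Both patterns below are taken from this message; the sample lines are assumptions chosen for illustration.

```typescript
// The loose pattern that was removed and the precise one that replaces it.
const loosePattern = /is ready/;
const precisePattern = /Starting agent\.\.\.|setup completed successfully/i;

// Early provisioning output: SSH is reachable but no agent is installed yet.
const earlyLine = "SSH is ready";
// A line assumed here to appear only once the full setup has finished.
const finalLine = "Starting agent...";

const looseOnEarly = loosePattern.test(earlyLine);     // true: premature success
const preciseOnEarly = precisePattern.test(earlyLine); // false: keeps waiting
const preciseOnFinal = precisePattern.test(finalLine); // true: real completion
```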

* chore(e2e): remove temporary debug instrumentation

* feat(e2e): add ai-powered ux review after interactive provision

After each successful interactive E2E run, the harness sends the full
terminal transcript to Claude (via OpenRouter) with a UX reviewer prompt.
It looks for confusing messages, noisy output, missing context in spinners,
and unhelpful errors that don't explain next steps.

Findings are returned as uxIssues[] in the harness JSON result.
interactive.sh then files a GitHub issue per run listing each problem
with a verbatim example and concrete suggestion.

Uses OPENROUTER_API_KEY (already in env) so it works on the QA VM
where ANTHROPIC_API_KEY is an OpenRouter key.

* refactor(e2e): throttle ux issue filing — 33% chance, 3+ issues required

- Random 33% gate: UX review runs on ~1 in 3 successful interactive
  provisions, not every run
- Minimum bar: only surface findings when AI found 3+ clear issues
  (filters one-off nits)
- Tighter system prompt: only flag obvious problems (repeated messages,
  debug leaks, cryptic errors), not minor style preferences

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(e2e): replace random throttle with stricter ux review prompt

Instead of Math.random() to suppress issues, make the AI self-regulate:
the system prompt now instructs it to only flag genuinely bad problems
(repeated messages, raw stack traces, no-feedback waits) and treat
zero findings as a good outcome, not a failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 13:42:02 +07:00
A
9448cb8ca0
fix(e2e): fix _stage_prompt_remotely to embed prompt inline instead of stdin pipe (#2897)
The stdin piping approach was broken: _hetzner_exec runs remote commands via
"printf '%s' 'ENCODED_CMD' | base64 -d | bash", which connects bash's stdin to
the base64 pipe rather than SSH's outer stdin. So `cat > /tmp/.e2e-prompt` read
from EOF — the encoded prompt was never written to the remote file.

Fix: embed the validated base64 prompt directly in the command string using
printf. This is safe because _validate_base64 ensures the prompt contains only
[A-Za-z0-9+/=] — no characters that can break out of single quotes or inject
shell metacharacters.
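
A sketch of the validate-then-embed approach, written in TypeScript rather than the shell used by the E2E scripts. The function names are hypothetical; the character class and the `/tmp/.e2e-prompt` path come from this message.

```typescript
// Accepts only the base64 alphabet, so the value cannot close a single-quoted
// shell string or introduce shell metacharacters.
function isValidBase64(value: string): boolean {
  return /^[A-Za-z0-9+\/=]+$/.test(value);
}

// Hypothetical builder: embeds the validated payload directly in the command
// string instead of relying on stdin reaching the remote `cat`.
function buildStagePromptCommand(encodedPrompt: string): string {
  if (!isValidBase64(encodedPrompt)) {
    throw new Error("prompt is not valid base64");
  }
  return `printf '%s' '${encodedPrompt}' > /tmp/.e2e-prompt`;
}
```

Rejecting anything outside the alphabet before interpolation is what makes the single-quoted embedding safe.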

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-23 12:19:51 +07:00
A
e7e3b327a1
test: remove duplicate saveSpawnRecord describe block (#2896)
The saveSpawnRecord tests in history-trimming.test.ts duplicated the
describe block already in history.test.ts. Moved the two unique test
cases ("no cap" 200-record retention and "assign id when missing") into
history.test.ts and removed the duplicate block from history-trimming.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-23 12:14:49 +07:00
A
f1f2667cb0
fix: skip interactive session in headless mode (#2895)
* fix: skip interactive session in headless mode (#2892)

When SPAWN_HEADLESS=1, the orchestrator now exits with code 0 after
provisioning completes instead of attempting to launch the agent
interactively. This fixes Claude Code (and other agents) failing with
"Input must be provided through stdin or --prompt" when spawned via
`--headless --output json` without a prompt.

The VM is fully provisioned and ready — callers can SSH in or use
`spawn connect` to start the agent manually.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: clean up SPAWN_HEADLESS env in test afterEach to prevent leaks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-22 21:38:53 -07:00
A
9280489ada
fix(qa): load ANTHROPIC_AUTH_TOKEN as ANTHROPIC_API_KEY for interactive E2E (#2894)
* chore: update agent GitHub star counts

* fix(qa): load ANTHROPIC_AUTH_TOKEN as ANTHROPIC_API_KEY for interactive E2E

QA VMs store the Anthropic key as ANTHROPIC_AUTH_TOKEN in
/etc/spawn-qa-auth.env, but the e2e-interactive handler only looked for
ANTHROPIC_API_KEY — causing the 6am cron to fail immediately with
"ANTHROPIC_API_KEY not set". Accept either name when loading from the
auth env file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(e2e): bump interactive harness timeout to 20min, fix zombie VM teardown

- SESSION_TIMEOUT_MS: 10min → 20min — provisioning a VM takes 3-4 min
  before onboarding even starts; 10min wasn't enough headroom
- interactive.sh: call cloud_provision_verify even on harness failure so
  teardown can find and delete any VM that was partially created (e.g.
  on timeout mid-provision) — previously left zombie VMs with no .meta file

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-23 11:24:26 +07:00
Ahmed Abushagur
6aeb9ba142
feat(e2e): diff-aware AI review with e2e-last-green tracking (#2893)
AI log review now includes the git diff since the last fully passing
E2E run, enabling causal analysis like "this 404 likely caused by
commit abc123 which deleted file Y". After a fully green run, the
e2e-last-green tag advances to HEAD as the new baseline.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 11:21:35 +07:00
A
4d08dbe2a7
fix(security): harden remote command construction in provision.sh (#2886)
* fix(security): harden remote command construction in provision.sh

Split the .spawnrc upload fallback into two separate cloud_exec calls
to separate data from commands. Step 1 writes the validated base64
payload to a remote temp file. Step 2 decodes from that file and
sets up shell rc sourcing using a static command string with no
interpolated variables.

This eliminates command injection risk in the control-flow portion
of the remote command (for loop, grep, etc.) even if the base64
validation were ever bypassed, since user-controlled data never
appears in the same command string as shell control flow.

Fixes #2882

Agent: complexity-hunter
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: correct error handling + use mktemp for temp file

- Return 1 (not 0) when step 1 fails to avoid masking provisioning failures
- Use mktemp -t spawnrc.b64 to avoid race conditions on concurrent provisions

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: propagate step 2 failure in provision.sh (return 1)

The else branch for step 2 (decode + shell rc setup) logged an error
but the function still returned 0, masking the failure. Now returns 1
so provisioning failures are correctly propagated.

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 20:44:33 -07:00
A
b0593952df
fix(security): validate cmd parameter in sprite interactiveSession (#2888)
Add empty-string and null-byte validation to sprite's interactiveSession,
matching the guards already present in aws, hetzner, digitalocean, and gcp.
Without this check, a raw cmd string is passed directly to bash -c.
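
A guard of this kind might be as small as the following. The name `assertValidCmd` and the error messages are hypothetical.

```typescript
// Hypothetical guard: rejects empty commands and commands containing a NUL
// byte before the string is handed to `bash -c`.
function assertValidCmd(cmd: string): void {
  if (cmd.length === 0) {
    throw new Error("cmd must not be empty");
  }
  if (cmd.includes("\0")) {
    throw new Error("cmd must not contain null bytes");
  }
}
```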

Fixes #2881

Agent: ux-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 18:53:28 -07:00
A
da07fd4031
fix(security): prevent command injection in sprite uploadFile (#2889)
Replace shell string interpolation with array-based exec arguments in
uploadFileSprite. Previously, remotePath and tempRemote were interpolated
into a bash -c string (`mkdir -p $(dirname '${normalizedRemote}') && mv
'${tempRemote}' '${normalizedRemote}'`), which is inherently unsafe
even with regex validation.

Now uses two separate sprite exec calls with paths passed as discrete
array arguments after `--`, and computes dirname in TypeScript using
node:path/posix instead of shell command substitution. Also fixes the
mockBunSpawn test helper to return fresh ReadableStream instances per
call, preventing "ReadableStream already used" errors.
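
The shape of the change can be sketched as below. `buildMoveArgv` is a hypothetical helper, and each inner array is only the portion that would follow `--`. The `sprite exec` prefix and its flags are left out because this message does not spell them out.

```typescript
import { posix } from "node:path";

// Hypothetical helper: returns one argv per exec call. Paths travel as
// discrete arguments, so nothing is interpolated into a shell string and no
// quoting is needed.
function buildMoveArgv(tempRemote: string, remotePath: string): string[][] {
  // dirname is computed locally instead of via $(dirname ...) on the remote.
  const parentDir = posix.dirname(remotePath);
  return [
    ["mkdir", "-p", parentDir],
    ["mv", tempRemote, remotePath],
  ];
}
```

A path containing a single quote passes through unchanged as one argument, which is the case the old interpolated string could not handle safely.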

Fixes #2880

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 18:51:51 -07:00
A
0224b56a4d
fix(digitalocean): detect droplet limit before creation, clear error on 422 (#2891)
checkAccountStatus() now queries the account's droplet_limit and
current droplet count. When at capacity it warns interactively and
throws immediately in headless/E2E mode with a clear message instead
of attempting creation and getting a cryptic 422.
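
The decision logic might be sketched like this. All names, the result type, and the message text are hypothetical, and the real `checkAccountStatus()` obtains the two numbers from the DigitalOcean API rather than taking them as arguments.

```typescript
type CapacityResult = { ok: true } | { ok: false; message: string };

// Compares the current droplet count with the account limit.
function checkDropletCapacity(dropletLimit: number, dropletCount: number): CapacityResult {
  if (dropletCount < dropletLimit) {
    return { ok: true };
  }
  return {
    ok: false,
    message: `Droplet limit reached (${dropletCount}/${dropletLimit})`,
  };
}

// Headless/E2E mode fails fast; interactive mode only warns.
function enforceCapacity(
  result: CapacityResult,
  headless: boolean,
  warn: (message: string) => void,
): void {
  if (result.ok) return;
  if (headless) throw new Error(result.message);
  warn(result.message);
}
```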

Also adds specific detection of droplet limit 422 errors in
createServer() with actionable guidance (limit increase URL).

Bump CLI to 0.25.14.

Fixes #2865

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 18:49:17 -07:00
A
83cd6bc6df
test: remove duplicate generateCodeVerifier/generateCodeChallenge tests from oauth-cov (#2885)
These two describe blocks in oauth-cov.test.ts were redundant subsets of the more
comprehensive coverage already in oauth-pkce.test.ts (which includes RFC 7636 test
vectors, uniqueness checks, padding validation, and base64url character checks).

Duplicates found: 1 function pair (generateCodeVerifier + generateCodeChallenge)
Tests removed: 2
Tests rewritten: 0

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 08:43:14 +07:00
A
d046a9bfdf
fix: tighten character whitelist for cloud_headless_env values (#2890)
The env value whitelist allowed @, %, +, =, :, and , characters that
are unnecessary for cloud resource names (server names, regions, sizes)
and could be used as shell metacharacters in certain contexts. Restrict
to only [A-Za-z0-9._/-] which matches all legitimate cloud resource
identifiers.
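
Expressed as a TypeScript check for illustration (the actual validation lives in the shell E2E code, and the names below are hypothetical), the tightened whitelist behaves as follows.

```typescript
// Only characters needed for server names, regions, and sizes are allowed.
const SAFE_ENV_VALUE = /^[A-Za-z0-9._\/-]+$/;

function isSafeEnvValue(value: string): boolean {
  return SAFE_ENV_VALUE.test(value);
}
```

Values such as `nyc3` or `s-2vcpu-4gb` pass, while anything containing `@`, `%`, `+`, `=`, `:`, or `,` is now rejected.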

Fixes #2883

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 08:41:50 +07:00
A
fa79d34a47
fix(security): properly quote remote cmd construction in verify.sh (#2887)
Prevent shell metacharacter interpretation in test prompt handling
by staging INPUT_TEST_TIMEOUT and attempt number to remote temp files
instead of interpolating them into remote command strings.

Previously, _TIMEOUT='${INPUT_TEST_TIMEOUT}' and --session-id
e2e-test-${attempt} were interpolated directly into double-quoted
remote command strings. While _validate_timeout enforces digits-only,
the structural pattern of local-to-remote variable interpolation is
inherently risky. Now all dynamic values (prompt, timeout, attempt)
are piped to remote temp files via stdin and read back on the remote
side, eliminating the injection surface entirely.

Fixes #2884

Agent: test-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 08:39:36 +07:00
A
054a740e5a
refactor: remove stale Packer comment in hetzner.ts (#2878)
The reference to "Hetzner Packer" was removed in #2869.
Updated the comment to accurately describe the snapshot naming convention.

-- qa/code-quality

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-23 04:14:00 +07:00
A
76afe9546b
test: add missing assertions to no-op smoke tests (#2879)
19 tests across 7 files were calling functions with no expect() calls —
they verified "does not throw" implicitly but provided zero signal on
side effects or return values.

Added assertions to each:
- agent-setup-cov: expect runServer called after graceful failure
- auto-update: expect runServer called on non-fatal SSH error
- aws-cov: assert state.awsRegion set by promptRegion env var paths,
  spawnSync call counts for ensureAwsCli, fetch called for destroyServer
- do-cov: assert SPAWN_NAME_KEBAB preserved on early return,
  fetch NOT called when no token in checkAccountStatus
- gcp-cov: assert spy call counts for authenticate, destroyInstance,
  ensureGcloudCli; spawnSync NOT called when GCP_PROJECT env set;
  fetch NOT called when no project in checkBillingEnabled
- hetzner-cov: assert fetch called for ensureHcloudToken validation
  and for destroyServer REST calls
- ssh-cov: assert connectSpy and bunSpawnSpy called in waitForSsh

All 1925 tests pass. expect() calls increased from 4555 to 4575.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 04:12:18 +07:00
A
db6c44be9c
fix(e2e): update input tests for new agent CLIs + auto-load email creds (#2877)
* fix(e2e): update input tests for latest agent CLI interfaces + auto-load email creds

claude: add --dangerously-skip-permissions --no-session-persistence to bypass
trust dialog when running in /tmp/e2e-test (not in ~/.claude.json trusted
projects list written during install)

codex: replace `codex exec --full-auto` (removed in new @openai/codex) with
`codex -q -a full-auto` — quiet mode + full-auto approval, no exec subcommand

email: auto-load RESEND_API_KEY + KEY_REQUEST_EMAIL from
/etc/spawn-key-server-auth.env (QA VM) or ~/.config/spawn/resend.env (local)
so send_matrix_email fires on every e2e run, not just QA-cycle runs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(e2e): correct claude and codex input test commands

- claude: pass prompt as positional arg to claude -p instead of piping
  via stdin (stdin pipe breaks through SSH exec chain, causing
  "Input must be provided either through stdin or as a prompt argument"
  error)
- codex: revert to `codex exec --full-auto` subcommand (correct for
  v0.116.0 — previous -q -a full-auto flags don't exist)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 03:08:37 +07:00
Ahmed Abushagur
48163ea2ee
feat(e2e): AI-powered log review catches non-fatal issues (#2875)
* feat(e2e): add AI-powered log review after provisioning

Feeds provision stderr/stdout logs to an LLM after each agent deploys.
Catches non-fatal issues that binary pass/fail checks miss: silent 404s,
failed component installs, connection instability, swallowed warnings.

This would have caught the keep-alive 404 and the sprite idle shutdown
that the existing E2E tests missed because installSpriteKeepAlive() is
non-fatal and the binary checks only verify final state.

- Uses gemini-flash-lite-2.0 via OpenRouter (cheap, fast)
- Advisory only — never fails the test, reports findings as warnings
- Truncates logs to last 200 lines to stay within token limits
- Skips gracefully if OPENROUTER_API_KEY is missing or API fails
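
The truncation step can be sketched as a small helper. `tailLines` is a hypothetical name; only the 200-line cap comes from this message.

```typescript
// Keeps at most the last `maxLines` lines of a log.
function tailLines(log: string, maxLines: number): string {
  if (maxLines <= 0) return "";
  const lines = log.split("\n");
  if (lines.length <= maxLines) return log;
  return lines.slice(-maxLines).join("\n");
}
```

Calling `tailLines(provisionLog, 200)` before building the review request keeps the most recent output, which is usually where failures surface.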

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(e2e): add AI log review and --fast mode testing

AI log review:
- After each agent provisions, feeds stderr/stdout to gemini-flash-lite
  to catch non-fatal issues binary checks miss (404s, failed installs,
  connection drops, swallowed warnings)
- Advisory only — never fails the test, surfaces findings as warnings
- Would have caught the keep-alive 404 and sprite idle shutdown

--fast mode E2E:
- Add --fast flag to e2e.sh, passed through to spawn CLI during provision
- Update QA e2e-tester protocol to run both normal and --fast passes
- --fast enables images + tarballs + parallel boot

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-23 02:15:09 +07:00
Ahmed Abushagur
baf03ce47b
fix: prevent sprite idle shutdown during agent install (#2874)
The sprite was going idle and shutting down during long npm install
operations because the remote keep-alive script wasn't installed yet
and sprite exec alone doesn't count as activity.

- Add local keep-alive that pings the sprite's public URL every 30s
  from the client machine during provisioning and agent install
- Stop it when the interactive session starts (remote script takes over)
- Add i/o timeout to spriteRetry's transient error regex so connection
  timeouts are retried instead of failing immediately

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 02:13:07 +07:00
Ahmed Abushagur
66a1749b4b
fix: add sprite-keep-running.sh, remove Hetzner from Packer, cleanup on cancel (#2869)
* fix: destroy orphaned Packer builder instances on workflow cancel

When a Packer Snapshots workflow is cancelled mid-build, Packer's process
is killed before it can clean up its temporary builder droplet/server.
This leaves orphaned packer-* instances running and costing money.

Add `if: cancelled()` cleanup steps for both DigitalOcean and Hetzner
that destroy any packer-* prefixed instances after cancellation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove Hetzner cleanup step — only DO needed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove Hetzner from Packer snapshots, add cancel cleanup

Remove Hetzner from the Packer workflow entirely — only DigitalOcean
snapshots are built. Deletes packer/hetzner.pkr.hcl and simplifies the
workflow by removing all Hetzner-specific steps and cloud conditionals.

Also adds a cancelled() cleanup step that destroys orphaned packer-*
builder droplets when a workflow run is cancelled mid-build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add missing sprite-keep-running.sh script

The keep-alive install was 404ing because sh/shared/sprite-keep-running.sh
never existed in the repo. The TypeScript code downloaded it from the CDN
(which maps to sh/shared/) but the file was never created.

The script wraps a command and pings the sprite's own public URL every 30s
to prevent inactivity shutdown. It resolves the URL via sprite-env info
(available on all sprites) and falls back to exec without keep-alive if
the URL can't be determined.

Also removes Hetzner from the Packer snapshots workflow entirely — only
DigitalOcean snapshots are built.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address security review — scope cleanup filter, fix JSON injection

1. Add `spawn-packer` tag to DO builder droplets in Packer template and
   filter cleanup by tag instead of broad `packer-` name prefix. Prevents
   accidentally destroying builder instances from other concurrent builds.

2. Use `jq --arg` for SINGLE_AGENT_INPUT instead of string interpolation
   to prevent JSON injection via crafted agent names.
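
A minimal sketch of the injection class that item 2 closes, written in TypeScript rather than jq. The helper names and the payload shape are hypothetical; only the principle carries over: let a serializer escape the value instead of splicing it into JSON text.

```typescript
// Hypothetical helpers; only the escaping technique mirrors the commit.

// Vulnerable: the agent name is spliced into the JSON text verbatim.
function unsafePayload(agent: string): string {
  return `{"agent": "${agent}"}`;
}

// Safer: the serializer escapes quotes and backslashes in the value.
function safePayload(agent: string): string {
  return JSON.stringify({ agent });
}

// A crafted name that closes the string early and smuggles in an extra key.
const crafted = 'claude", "admin": "true';

const injected = JSON.parse(unsafePayload(crafted));
const escaped = JSON.parse(safePayload(crafted));

console.log(Object.keys(injected)); // the attacker-controlled "admin" key appears
console.log(Object.keys(escaped)); // only "agent"; the input stayed a plain value
```

`jq --arg` plays the same role as `JSON.stringify` here: the value is bound as data and never parsed as part of the document.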

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 18:13:38 +00:00
A
87f49eba48
test: remove duplicate and theatrical tests (#2873)
Remove 7 redundant tests that test the same code paths as existing tests:

- history.test.ts: consolidate 4 separate "unrecognized JSON value" tests
  (non-array object, JSON string, null, number) into one data-driven test.
  All 4 hit the identical parseHistoryData "Unrecognized format" branch.

- cmd-link-cov.test.ts: remove "exits with error when no IP provided" —
  duplicate of the same test in cmd-link.test.ts with identical behavior.

- update-check-cov.test.ts: remove "skips in test environment" and "skips
  when SPAWN_NO_UPDATE_CHECK=1" — both already covered in update-check.test.ts.

- orchestrate-cov.test.ts: remove "calls preLaunch when defined" — identical
  to the same test in orchestrate.test.ts (same mock setup, same assertion).
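
The history.test.ts consolidation can be sketched as a table-driven check. The `parseHistoryData` below is a hypothetical stand-in (the real function may report the "Unrecognized format" branch differently), and a plain loop is used here instead of the test runner.

```typescript
// Hypothetical stand-in: the real parseHistoryData lives in the repo and may differ.
function parseHistoryData(raw: string): unknown[] {
  const value: unknown = JSON.parse(raw);
  if (!Array.isArray(value)) {
    throw new Error("Unrecognized format");
  }
  return value;
}

// One table replaces four near-identical test bodies.
const unrecognizedInputs: Array<[label: string, raw: string]> = [
  ["non-array object", '{"a": 1}'],
  ["JSON string", '"hello"'],
  ["null", "null"],
  ["number", "42"],
];

// True when the input lands in the "Unrecognized format" branch.
function rejectsAsUnrecognized(raw: string): boolean {
  try {
    parseHistoryData(raw);
    return false;
  } catch (err) {
    return err instanceof Error && err.message === "Unrecognized format";
  }
}

const failures = unrecognizedInputs
  .filter(([, raw]) => !rejectsAsUnrecognized(raw))
  .map(([label]) => label);

console.log(failures.length === 0 ? "all cases rejected" : `not rejected: ${failures.join(", ")}`);
```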

All 1866 remaining tests pass. Lint clean.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 20:22:47 +07:00
A
c25594cf09
test: Remove duplicate killWithTimeout tests (#2870)
* test: remove duplicate and theatrical tests

- cmd-fix-cov.test.ts: remove 6 duplicate fixSpawn tests already covered
  in cmd-fix.test.ts; keep only the unique success message assertion
- icon-integrity.test.ts: consolidate 54 per-entity it() blocks into 4
  data-driven tests (same 67 expect() calls, 50 fewer test cases)
- manifest-type-contracts.test.ts: consolidate per-field for-loop it()
  blocks into 3 grouped tests (same 662 expect() calls, 15 fewer cases)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: remove duplicate killWithTimeout tests from ssh-cov.test.ts

The `killWithTimeout additional` describe block in ssh-cov.test.ts
duplicated scenarios already covered in kill-with-timeout.test.ts:
- "sends SIGTERM then SIGKILL" == kill-with-timeout's SIGKILL grace test
- "does nothing when first kill throws" == kill-with-timeout's SIGTERM throw test

Removed the 2 duplicate tests from ssh-cov.test.ts. The dedicated
kill-with-timeout.test.ts file is the canonical location for
killWithTimeout coverage.

---------

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-22 16:47:59 +07:00
A
57e06bab4a
fix(e2e): fix manual .spawnrc creation on Sprite (stdin piping broken) (#2872)
The manual .spawnrc fallback in provision.sh was using `printf '%s' "${env_b64}" | cloud_exec ...`,
which works for SSH-based clouds (Hetzner, GCP, AWS) where stdin is passed through the SSH
connection. However, Sprite's exec driver replaces stdin with the command pipe:
  `printf '%s' "${cmd}" | sprite exec -s NAME -- bash`
This causes the outer env_b64 pipe to be lost — `base64 -d` receives no input and writes an
empty .spawnrc, which then fails the OPENROUTER_API_KEY and openrouter.ai verification checks.

Fix: embed the base64 data directly in the command string using `printf '%s' '${env_b64}'`.
This is safe because env_b64 is validated to contain only [A-Za-z0-9+/=] — the standard
base64 alphabet — which cannot break out of single quotes or cause shell injection.
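
The validate-then-embed step can be sketched as follows. This is an illustration in TypeScript, not the provision.sh code itself; `buildSpawnrcCommand` and the `~/.spawnrc` redirect target are assumptions made for the example.

```typescript
// Only characters from the standard base64 alphabet (plus padding) are allowed.
const BASE64_ALPHABET = /^[A-Za-z0-9+/=]+$/;

// Hypothetical builder: embeds the payload in the command string so that
// nothing depends on stdin being passed through to the remote shell.
function buildSpawnrcCommand(envB64: string): string {
  if (!BASE64_ALPHABET.test(envB64)) {
    throw new Error("env_b64 contains characters outside the base64 alphabet");
  }
  // Safe to wrap in single quotes: the alphabet contains no quote characters.
  return `printf '%s' '${envB64}' | base64 -d > ~/.spawnrc`;
}

console.log(buildSpawnrcCommand("QUJD")); // "QUJD" is the base64 encoding of "ABC"
```

The validation has to run before the interpolation; without it, a payload containing a single quote could terminate the quoted string early.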

Confirmed by E2E run where sprite/claude and sprite/openclaw both failed with:
  [FAIL] OPENROUTER_API_KEY not found in .spawnrc
  [FAIL] Failed to create manual .spawnrc

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 16:46:05 +07:00
A
cc8b6601ec
refactor: remove stale references and add missing entries to test README (#2871)
- remove stale reference to `commands-update-download.test.ts` (renamed to `cmd-update-cov.test.ts`)
- remove stale reference to `picker.test.ts` (renamed to `picker-cov.test.ts`)
- add 25 missing `-cov.test.ts` files that exist on disk but were undocumented

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-22 15:47:58 +07:00
A
7e56e1839b
test: remove duplicate and theatrical tests (#2868)
- cmd-fix-cov.test.ts: remove 6 duplicate fixSpawn tests already covered
  in cmd-fix.test.ts; keep only the unique success message assertion
- icon-integrity.test.ts: consolidate 54 per-entity it() blocks into 4
  data-driven tests (same 67 expect() calls, 50 fewer test cases)
- manifest-type-contracts.test.ts: consolidate per-field for-loop it()
  blocks into 3 grouped tests (same 662 expect() calls, 15 fewer cases)

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 12:06:55 +07:00
A
c1363b138c
feat(gcp): default boot disk to 40 GB, configurable via GCP_DISK_SIZE (#2867)
GCP's default 10 GB boot disk is insufficient for coding agents — node_modules,
apt packages, and build caches easily exceed it. Default to 40 GB and allow
override via GCP_DISK_SIZE env var.
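
A minimal sketch of the default-plus-override logic. Only the 40 GB default and the `GCP_DISK_SIZE` variable come from the commit; the function name and the validation rules are assumptions.

```typescript
const DEFAULT_GCP_DISK_SIZE_GB = 40;

// Hypothetical resolver: returns the boot disk size in GB.
function resolveDiskSizeGb(env: Record<string, string | undefined>): number {
  const raw = env.GCP_DISK_SIZE;
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_GCP_DISK_SIZE_GB;
  }
  const size = Number(raw);
  // Assumed policy: reject anything that is not a positive whole number.
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error(`Invalid GCP_DISK_SIZE: ${raw}`);
  }
  return size;
}

console.log(resolveDiskSizeGb({})); // 40
console.log(resolveDiskSizeGb({ GCP_DISK_SIZE: "100" })); // 100
```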

Closes #2866

Co-authored-by: Claude <claude@anthropic.com>
2026-03-22 11:21:05 +07:00
A
92f2de4036
test: remove theatrical tests — replace no-op assertions with real signal (#2863)
preflight-credentials.test.ts: all 7 tests had zero expect() calls with
comments like "// No crash = pass". Rewrote to capture logWarn mock calls
from mockClackPrompts() and assert on warning presence and credential names.

sprite-cov.test.ts: 13 out of 23 tests had no expect/rejects calls (just
called functions and discarded results). Added assertions on Bun.spawn call
counts to verify: authenticated paths skip login, unauthenticated paths
trigger login, createSprite reuses vs creates based on list output,
verifySpriteConnectivity calls sprite twice, setupShellEnvironment runs
multiple exec commands.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 08:38:39 +07:00
A
300e2fc221
fix(security): shellQuote cmd in runServer() across all cloud providers (#2862)
Defense-in-depth: explicitly shellQuote(cmd) inside runServer() so the
cmd parameter is always protected by single-quote escaping, regardless
of how the surrounding command string is constructed.

Previously, cmd was interpolated raw into fullCmd before the outer
shellQuote() wrapper. While the outer wrapper did protect it, this
made the safety non-obvious and fragile against future refactors.
The new pattern matches interactiveSession() where cmd gets its own
shellQuote() call.
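
For illustration, a common way to implement single-quote escaping and apply it at both levels. This is a sketch: the repo's actual `shellQuote` and `runServer` may differ, and `buildRemoteCommand` with its `bash -c` wrapper is hypothetical.

```typescript
// Wrap the value in single quotes; each embedded single quote becomes '\''
// (close the quote, emit an escaped quote, reopen the quote).
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Hypothetical builder showing the two layers the commit describes.
function buildRemoteCommand(cmd: string): string {
  // Inner layer: cmd gets its own quoting, whatever surrounds it later.
  const fullCmd = `bash -c ${shellQuote(cmd)}`;
  // Outer layer: the whole command is quoted again for the transport shell.
  return shellQuote(fullCmd);
}

console.log(shellQuote("echo it's fine"));
console.log(buildRemoteCommand("echo hello; whoami"));
```

With the inner call in place, `cmd` stays a single argument even if a later refactor changes or drops the outer wrapper.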

Fixes #2859

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 14:48:37 -07:00
A
3f12cb9ee8
refactor: consolidate duplicate docker constants into shared orchestrate module (#2860)
Consolidate DOCKER_CONTAINER_NAME and DOCKER_REGISTRY constants from
gcp/main.ts and hetzner/main.ts into shared/orchestrate.ts. Both files
defined identical values ("spawn-agent" and "ghcr.io/openrouterteam"); they
now import the shared exports instead.

Bumps CLI patch version to 0.25.11.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-21 14:27:21 -07:00
A
d480a7fec4
test: remove duplicate and theatrical tests (#2861)
- manifest.test.ts: remove 4 duplicate loadManifest error/fallback tests
  (HTTP 500 stale-cache, no-cache-HTTP500-throws, invalid-manifest-throws,
  network-error-throws) — all covered more thoroughly by
  manifest-cache-lifecycle.test.ts

- ssh-keys.test.ts: remove 2-key sorting test superseded by ssh-keys-cov.test.ts
  which validates the full 3-way sort order (ED25519 > RSA > ECDSA)

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 03:43:47 +07:00
A
7ab6c693d3
fix: add --beta docker to help output and update description (#2857)
The --beta docker feature (PR #2854) was missing from `spawn help`
output, and its error description said "Hetzner" only but it also
works on GCP.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-21 06:20:35 -07:00
A
2f329684e0
test: remove duplicate and theatrical tests (#2858)
- aws-cov.test.ts: remove aws/BUNDLES (3 tests) and aws/credential-persistence
  (6 tests) — all scenarios already covered by aws.test.ts with stronger
  assertions (>= 5 tiers vs >= 3, pricing format, naming convention, etc.)

- cmd-run-cov.test.ts: remove "cmdRun dry run" and "cmdRun validation" (3 tests)
  — dry-run is covered more thoroughly in cmdrun-happy-path.test.ts;
  validation tests duplicate commands-error-paths.test.ts exactly

- agent-setup-cov.test.ts: remove "agents return non-empty launch commands"
  (weaker duplicate of "all agents have launchCmd") and "agents have configure
  functions" (no expect() calls — theatrical)

Total: 5 tests removed, 162 lines deleted, 0 regressions (1951 pass)

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 19:49:27 +07:00
Ahmed Abushagur
6d2c4746f5
feat: add --beta docker for Hetzner Docker CE app image (#2854)
* feat: add --beta docker for Hetzner Docker CE app image

Uses Hetzner's pre-built docker-ce app image when --beta docker
(or --fast) is active, giving faster boot times similar to DO
marketplace images. Snapshots still take priority when available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: pull and run pre-built agent Docker images on Hetzner

When --beta docker (or --fast) is active, boots Hetzner with docker-ce
app image, then pulls ghcr.io/openrouterteam/spawn-{agent}:latest and
runs it. All runServer commands are routed through docker exec into
the container, and the interactive session uses docker exec -it.
Skips agent install since the agent is pre-baked in the image.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add --beta docker support for GCP with Container-Optimized OS

When --beta docker (or --fast) is active on GCP, uses cos-stable
from cos-cloud (Docker pre-installed, read-only OS). Skips cloud-init
startup script (incompatible with COS), pulls the pre-built agent
image from ghcr.io, and routes all commands through docker exec.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: correct import path for logInfo/logStep (shared/log.js -> shared/ui.js)

The log.js module does not exist; these functions are exported from ui.ts.
Also merge duplicate ui.js imports per biome organizeImports.

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
2026-03-21 17:10:19 +07:00
A
bfe9fb9808
test: remove duplicate and theatrical tests (#2856)
- Replace 10x `expect(true).toBe(true)` in update-check-cov.test.ts with
  meaningful assertions: skip-condition tests now verify fetch was NOT called,
  fetch-failure tests use `resolves.toBeUndefined()`, backoff edge-case tests
  verify fetch WAS called (proving the skip was bypassed)
- Remove theatrical executor existence check (`typeof executor.execFileSync === "function"`)
  that proved nothing about behavior
- Replace structural `typeof agent.install/envVars/launchCmd === "function"` checks in
  agent-setup-cov.test.ts with assertion that agent names are non-empty strings;
  the downstream tests already prove the functions work by calling them

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-21 15:48:44 +07:00
Ahmed Abushagur
8c7a381375
fix: auto-reconnect on Sprite connection drops (#2855)
Sprite CLI exits with code 1 on "connection closed" (not 255 like SSH).
The reconnect loop now treats exit code 1 on Sprite as a connection
drop, retrying up to 5 times with a 3s delay between attempts.
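
A rough sketch of such a reconnect loop. The exit codes (1 for Sprite, 255 for SSH), the 5 retries, and the 3s delay come from the commit; the function names, the `Cloud` type, and the assumption that each transport has exactly one drop code are illustrative.

```typescript
type Cloud = "sprite" | "ssh";

// Assumption: one "connection dropped" exit code per transport.
function isConnectionDrop(cloud: Cloud, exitCode: number): boolean {
  const dropCode = cloud === "sprite" ? 1 : 255;
  return exitCode === dropCode;
}

// Hypothetical loop: `connect` runs one session and resolves to its exit code.
async function runWithReconnect(
  cloud: Cloud,
  connect: () => Promise<number>,
  maxRetries = 5,
  delayMs = 3000,
): Promise<number> {
  let code = await connect();
  for (let attempt = 1; attempt <= maxRetries && isConnectionDrop(cloud, code); attempt++) {
    // Wait before reconnecting so a flapping link is not hammered.
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    code = await connect();
  }
  return code;
}
```

One trade-off of this rule: a session that genuinely fails with exit code 1 on Sprite is also retried, because the exit code alone cannot tell the two cases apart.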

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 15:13:14 +07:00
A
a3e0dbd4dd
test: remove duplicate and theatrical tests (#2853)
- Remove `digitalocean/findSpawnSnapshot` describe from do-cov.test.ts
  (3 basic tests) — fully superseded by do-snapshot.test.ts (7 thorough
  tests covering name filtering, invalid IDs, network failure, etc.)

- Remove `setupAutoUpdate` describe from agent-setup-cov.test.ts
  (2 shallow tests checking only "systemd" string presence) — fully
  superseded by auto-update.test.ts which verifies exact systemd unit
  content, base64-encoded scripts, timer schedules, and error handling

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 12:24:00 +07:00
Ahmed Abushagur
26332afa56
fix: prevent silent exit in --fast mode on Sprite (#2852)
In fast mode, Promise.allSettled runs server boot, OAuth, and tarball
download concurrently. When all operations complete — especially after
Bun.serve.stop(true) in the OAuth flow removes its event loop handle —
the event loop can appear empty before the await continuation starts
new I/O operations. This causes Bun to exit silently with code 0,
dropping the user back to their shell after "Successfully obtained
OpenRouter API key via OAuth!" with no error.

Fix: keep a dummy setInterval handle alive during the fast-mode
concurrent section so the event loop never drains prematurely.
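
The keep-alive pattern can be sketched as a small wrapper. The name `withEventLoopKeepAlive` and the 1-second interval are assumptions; the commit only says that a dummy `setInterval` handle is held during the concurrent section.

```typescript
// Holds a no-op timer while `work` runs, so the runtime always sees at least
// one active handle and does not exit between awaited steps.
async function withEventLoopKeepAlive<T>(work: () => Promise<T>): Promise<T> {
  const keepAlive = setInterval(() => {}, 1000);
  try {
    return await work();
  } finally {
    // Always release the handle, or the process would never exit on its own.
    clearInterval(keepAlive);
  }
}

// Usage sketch: wrap the concurrent section.
const settled = withEventLoopKeepAlive(() =>
  Promise.allSettled([Promise.resolve("boot"), Promise.resolve("oauth")]),
);
settled.then((results) => console.log(results.length));
```

Clearing the timer in `finally` matters as much as creating it: the handle is released whether the work succeeds or throws.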

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 20:51:02 -07:00