Commit graph

2301 commits

Author SHA1 Message Date
A
e045cf6f78
fix(security): prevent sed delimiter injection and harden SPAWN_ISSUE validation (#2964)
safe_substitute: Switch sed delimiter from | to \x01 (SOH control char) across
qa.sh, refactor.sh, security.sh, and discovery.sh. This eliminates delimiter
injection regardless of value content, since \x01 cannot appear in normal input.
Values containing \x01 are explicitly rejected as defense-in-depth.

SPAWN_ISSUE: Fix qa.sh validation from ^[0-9]+$ to ^[1-9][0-9]*$ to reject
leading zeros and zero itself. Add 32-bit signed integer range check
(max 2147483647) to all three scripts (qa.sh, refactor.sh, security.sh)
to prevent integer overflow in downstream consumers.
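
The SPAWN_ISSUE rule above can be sketched in TypeScript as follows. This is an illustration only: the real checks live in the shell scripts, and the helper name `isValidSpawnIssue` is hypothetical.

```typescript
// Hypothetical helper; the actual validation is implemented in qa.sh,
// refactor.sh, and security.sh.
const MAX_INT32 = 2147483647;

function isValidSpawnIssue(value: string): boolean {
  // ^[1-9][0-9]*$ rejects the empty string, "0", leading zeros, signs, and non-digits.
  if (!/^[1-9][0-9]*$/.test(value)) return false;
  // MAX_INT32 has 10 digits, so anything longer is out of range.
  if (value.length > 10) return false;
  // At most 10 digits, so Number() converts the value exactly.
  return Number(value) <= MAX_INT32;
}

console.log(isValidSpawnIssue("2964"));       // true
console.log(isValidSpawnIssue("007"));        // false
console.log(isValidSpawnIssue("2147483648")); // false
```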

Fixes #2961
Fixes #2962

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25 01:51:41 +07:00
A
ad00f93cf7
test: merge duplicate createCloudAgents describe blocks and use beforeEach (#2959)
Merged "createCloudAgents" and "createCloudAgents detailed" into a single
describe block. Both blocks tested the same function with no structural
distinction, so the split only duplicated test organization.

Eliminated 26 repetitive inline runner object constructions by moving
runner and result setup into beforeEach. This removes ~115 lines of
boilerplate while keeping all 21 tests and their assertions intact.

1895 tests still pass.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 23:53:35 +07:00
A
a6940fdaad
fix(e2e): improve interactive harness failure logging (#2951)
On interactive provision failure, save the harness log to a persistent
path (/tmp/spawn-interactive-harness-last.log) for post-mortem inspection,
and filter output to only show [harness] prefixed lines (30 lines) instead
of dumping 50 raw lines of mixed output.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Ahmed Abushagur <ahmed@abushagur.com>
2026-03-24 08:45:19 -07:00
A
65320abf05
refactor(test): extract shouldSkipCloudInit helper and add unit tests (#2958)
Extracts the inline docker-mode condition from hetzner/main.ts and
gcp/main.ts into a testable exported function in shared/cloud-init.ts,
then adds real unit tests that import from the source. Fixes #2952.

Agent: test-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 22:32:53 +07:00
A
6c742bdd11
fix(e2e): increase hermes install timeout to fix failures on Hetzner/DO/GCP (#2956)
Hermes installs a Python virtualenv which takes 20+ min on fresh VMs.
The previous 300s install timeout caused the CLI to give up before
writing .spawnrc, leading to 30-min E2E timeouts on Hetzner, DigitalOcean,
and GCP (but not Sprite, which has a manual .spawnrc fallback).

Changes:
- agent-setup.ts: hermes installAgent timeout 300s → 600s
- common.sh: add hermes per-agent overrides (_PROVISION_TIMEOUT_hermes=720,
  _AGENT_TIMEOUT_hermes=3600) to give the install enough headroom
- package.json: bump CLI version 0.25.26 → 0.25.27

-- qa/e2e-tester

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-24 21:34:41 +07:00
A
b2606084e6
test: remove theatrical source-grep test for docker-mode waitForReady (#2953)
docker-cloudinit-skip.test.ts was reading source file contents with readFileSync
and checking for the presence of specific string literals. This is a source-grep
anti-pattern: it tests that the text exists, not that the behavior works.

The waitForReady() closure in hetzner/main.ts and gcp/main.ts cannot be directly
unit tested without refactoring (tracked in #2952). The source-grep tests are
removed to avoid false confidence.

Filed https://github.com/OpenRouterTeam/spawn/issues/2952 to track proper
behavioral testing via extracting the skip-cloud-init condition into a testable
exported helper.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-24 20:40:03 +07:00
A
77dbeb95ae
fix(fix): add missing LANG export to buildFixScript (#2954)
`buildFixScript()` was missing `export LANG='C.UTF-8'` that was added to
the canonical `generateEnvConfig()` in commit f93c799d. Users running
`spawn fix` would get a `.spawnrc` without the UTF-8 locale export,
causing garbled Unicode in agent TUIs — the same regression that f93c799d
fixed for fresh provisioning.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-24 20:38:05 +07:00
A
c3cdd7ec8d
test: remove theatrical source-grep tests, replace with real unit tests (#2948)
do-min-size.test.ts was reading source file contents with readFileSync
and checking for the presence of specific strings (bash-grep anti-pattern).
Fixes:
- Export slugRamGb and AGENT_MIN_SIZE from digitalocean.ts
- Import them in main.ts instead of re-defining
- Rewrite do-min-size tests to call functions with inputs and assert outputs
  (3 source-grep tests → 6 behavior tests)

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 02:08:45 -07:00
A
f93c799db8
fix(ux): suppress duplicate install message and set UTF-8 locale (#2950)
1. Suppress Claude Code curl installer stdout — the remote installer
   prints its own "Installation complete!" which duplicated the local
   "Claude Code agent installed successfully" message.

2. Export LANG=C.UTF-8 in both the interactive SSH session command and
   the .spawnrc env config. Fresh cloud VMs often default to the C
   locale which cannot render Unicode properly, causing garbled ANSI
   output in agent TUIs (e.g. "⏵⏵bypasspermissionson" instead of
   properly spaced text).

Fixes #2946

Agent: ux-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 01:59:11 -07:00
A
0f3cb8b2eb
docs(tests): add missing test entries to __tests__/README (#2949)
Two test files (do-min-size.test.ts, docker-cloudinit-skip.test.ts) existed
on disk but were not documented in the README. Add entries for both.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 15:51:43 +07:00
A
056ce252c7
fix(e2e): suppress matrix email on targeted re-runs via SPAWN_E2E_SKIP_EMAIL (#2944)
When the quality cycle e2e-tester re-runs only failed agents
(e.g. `e2e.sh --cloud hetzner zeroclaw codex`), e2e.sh was firing
a matrix email showing only those 2 agents — both PASS if the retry
succeeded. This looked like "2 tests ran, all passed" when in reality
32 tests ran with 2 failures.

- Add SPAWN_E2E_SKIP_EMAIL=1 env var check at the top of send_matrix_email
- Update qa-quality-prompt.md to set SPAWN_E2E_SKIP_EMAIL=1 on re-runs

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 00:17:10 -07:00
A
aafeda4020
fix(e2e): reduce Hetzner max parallel from 5 to 3 to respect primary IP quota (#2943)
The QA account's primary IP limit is ~3, so running 5 agents in parallel
exhausted the quota, causing codex and zeroclaw to fail with
resource_limit_exceeded. Reducing _hetzner_max_parallel to 3 keeps
provisioning within quota while still running agents concurrently.

Verified: zeroclaw and codex both PASS on Hetzner after this fix.

-- qa/e2e-tester

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-24 13:32:10 +07:00
A
81ab237efe
fix(e2e): harden shell scripts against injection in SSH commands (#2945)
- hetzner.sh: Pipe base64-encoded command via stdin to SSH instead of
  embedding it in the SSH command string via variable expansion. The
  remote bash reads stdin, base64-decodes, and executes.

- verify.sh: Add remote-side re-validation of base64 and timeout values
  in _stage_prompt_remotely and _stage_timeout_remotely. Values are
  assigned to remote shell variables and validated before writing to
  temp files, providing defense-in-depth against injection.

- provision.sh: Add explicit early rejection of dangerous shell chars
  ($, `, \) in env var values from cloud_headless_env, and add
  remote-side re-validation of base64 payload before writing.
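
The early rejection described for provision.sh could look roughly like this in TypeScript. The helper name is hypothetical and the real check is written in bash; only the three characters named above are covered.

```typescript
// Hypothetical helper mirroring the provision.sh check.
function hasDangerousShellChars(value: string): boolean {
  // $ starts expansions, ` starts command substitution, \ is the escape character.
  return /[$`\\]/.test(value);
}

console.log(hasDangerousShellChars("sk-or-v1-abc123")); // false
console.log(hasDangerousShellChars("$(id)"));           // true
```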

Fixes #2937
Fixes #2938
Fixes #2939

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-24 13:30:47 +07:00
A
8ed8d91205
fix(qa): stash before pull, fix star count push, fix claude update flag (#2942)
- Stash uncommitted changes before git pull --rebase so the pull
  never aborts with "You have unstaged changes"
- Pull --rebase before pushing star count commit to avoid
  non-fast-forward rejection (was failing every single cycle)
- Remove --yes flag from claude update (flag was removed upstream)
- Fix interactive harness AI prompt: update success marker text from
  "is ready" or "Starting agent" to match code check
  ("Starting agent..." or "setup completed successfully")

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 12:53:27 +07:00
A
4f141486dc
refactor: remove dead code and stale references (#2940)
- fix misplaced interactive_provision comment block in interactive.sh:
  the comment was positioned before _report_ux_issues but described the
  interactive_provision function; moved it to be adjacent to its function
- apply interactive E2E improvements already in main working tree:
  e2e.sh: add verify_agent call after interactive_provision to wait for
  .spawnrc before running input tests (aligns interactive with headless flow)

-- qa/code-quality

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 12:09:50 +07:00
A
e9cbab5b7f
fix(sprite): add retry for list failures, increase timeout, refresh auth on expiry (#2936)
Three fixes for Sprite E2E failures in long-running batches (73+ min):

1. Retry `_sprite_provision_verify`: list failures now retry 3x with
   exponential backoff (5s, 10s, 20s) instead of failing immediately.
   Fixes kilocode batch 6 "Could not list Sprite instances" errors.

2. Increase `CREATE_TIMEOUT_SECS` default from 300s to 600s and add
   `Client.Timeout`, `request canceled`, and `authentication failed`
   to the transient error retry pattern in `spriteRetry`. Also uses
   linear backoff (3s * attempt) instead of fixed 3s delay.
   Fixes hermes batch 7 HTTP timeout errors.

3. Add `_sprite_refresh_auth` + `cloud_refresh_auth` interface. The
   E2E orchestrator calls `cloud_refresh_auth` before each provisioning
   batch. For Sprite, this re-validates the token via `sprite org list`
   and attempts `sprite auth refresh` if expired.
   Fixes junie batch 8 "authentication failed" errors.
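
The two backoff schedules mentioned in items 1 and 2 can be sketched as below. The function names and the injected `sleep` parameter are assumptions made for illustration; the actual retry logic lives in the E2E shell driver and in `spriteRetry`.

```typescript
// Exponential schedule from item 1: 5s, 10s, 20s for attempts 1, 2, 3.
function exponentialDelaySecs(attempt: number, baseSecs = 5): number {
  return baseSecs * 2 ** (attempt - 1);
}

// Linear schedule from item 2: 3s * attempt.
function linearDelaySecs(attempt: number, stepSecs = 3): number {
  return stepSecs * attempt;
}

// Generic retry loop; `sleep` is injected so the caller decides how to wait.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  retries: number,
  delaySecs: (attempt: number) => number,
  sleep: (secs: number) => Promise<void>,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up once the allowed number of retries has been used.
      if (attempt > retries) throw err;
      await sleep(delaySecs(attempt));
    }
  }
}
```

With `retries` set to 3 and `exponentialDelaySecs`, a call that keeps failing is attempted four times in total, waiting 5s, 10s, and 20s between attempts.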

Fixes #2934

Agent: ux-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 21:47:58 -07:00
A
50319e0d39
fix(hetzner): clean up orphaned primary IPs before provisioning to avoid quota exceeded (#2935)
Hetzner E2E runs fail with `resource_limit_exceeded` when stale primary
IPs from previous test runs consume the account quota. This adds proactive
cleanup at two levels:

1. E2E shell driver: `_hetzner_cleanup_orphaned_ips()` deletes unattached
   primary IPs during pre-batch stale cleanup, freeing quota before any
   new servers are provisioned.

2. TypeScript CLI: `hetzner/main.ts` calls `cleanupOrphanedPrimaryIps()`
   before `createServer()` in headless/non-interactive mode, ensuring
   each agent provisioning attempt starts with a clean IP quota.

The existing reactive cleanup (retry after failure) in `hetzner.ts`
remains as a fallback.

Fixes #2933

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 11:20:30 +07:00
Ahmed Abushagur
3b150eabd8
fix: skip cloud-init wait in Hetzner Docker mode (#2924)
Hetzner's waitForReady() was missing the useDocker check that GCP
already has. Non-minimal agents (openclaw, codex) with --beta docker
waited 5 minutes for a cloud-init marker that never appears on Docker
CE app images.

Adds useDocker to the condition and a source-level regression test
verifying both Hetzner and GCP include the check.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 19:36:37 -07:00
Ahmed Abushagur
659fd1c6da
fix: use POSIX normalize for remote Linux paths in validateRemotePath (#2929)
node:path.normalize() is platform-dependent — on Windows it converts
forward slashes to backslashes, which then fail the character allowlist
regex. Remote paths are always Linux paths regardless of the client OS.

Switch to node:path/posix so normalization always uses forward slashes.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 19:34:49 -07:00
Ahmed Abushagur
8d73d73406
fix: rethrow normalized Error in tryCatchIf/asyncTryCatchIf (#2930)
When the guard returns false, both functions re-threw the raw caught
value (e) instead of the normalized Error (err). If a non-Error value
was thrown (string, number), downstream handlers received inconsistent
types instead of always getting Error instances.

Changed throw e → throw err in both functions.
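
A minimal sketch of the behavior after the fix, assuming a guard-first signature and a simple result type. Both are hypothetical; the project's real helpers may be shaped differently.

```typescript
// Hypothetical result type and signature, for illustration only.
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };

function toError(e: unknown): Error {
  return e instanceof Error ? e : new Error(String(e));
}

function tryCatchIf<T>(guard: (err: Error) => boolean, fn: () => T): Result<T> {
  try {
    return { ok: true, value: fn() };
  } catch (e) {
    const err = toError(e);
    if (guard(err)) return { ok: false, error: err };
    // Rethrow the normalized Error, not the raw caught value `e`,
    // so downstream handlers always receive an Error instance.
    throw err;
  }
}
```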

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 19:33:05 -07:00
A
75cff300b4
docs: sync README with source of truth (#2932)
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-24 08:43:49 +07:00
Ahmed Abushagur
56f7840f0c
fix: fail fast when GCP delete is missing project metadata (#2925)
When history metadata lacks a project ID, spawn delete silently fell
back to the gcloud default project, attempting deletion in the wrong
project (404) while the instance kept running and billing.

Now fails fast with a clear error and link to GCP Console. Also adds
a defensive check in destroyInstance() to reject empty project.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-24 08:42:47 +07:00
A
f5f0b9ec64
fix(lint): fix biome violations in packages/shared and add to CI (#2923)
The CI biome check only covered packages/cli/src/, .claude/scripts/,
and .claude/skills/setup-spa/ — packages/shared/src/ was unchecked,
allowing 7 lint/format violations to accumulate in its test files.

- Auto-fix import ordering, formatting, and useNumberNamespace lint
  across 3 test files in packages/shared/src/__tests__/
- Add packages/shared/src/ to the biome check in lint.yml so future
  violations are caught in CI

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 17:49:55 -07:00
Ahmed Abushagur
2f4fef049a
fix: enforce minimum droplet size for any undersized selection (#2931)
The min-size check only triggered when the exact default slug was
selected (s-2vcpu-2gb). Users who chose s-1vcpu-1gb or s-1vcpu-2gb
bypassed the check and got OOM crashes on openclaw.

Now parses RAM from the DO slug and compares GB values, so any size
below the agent's minimum gets upgraded.
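
One way to parse RAM from a slug and compare sizes is sketched below. It is a hypothetical reimplementation: the exported `slugRamGb` in digitalocean.ts may parse slugs differently, and `needsUpgrade` is an illustrative name.

```typescript
// Takes the first "<n>gb" or "<n>mb" segment of the slug as its RAM size.
function slugRamGb(slug: string): number | null {
  const match = /-(\d+)(gb|mb)(?:-|$)/.exec(slug);
  if (!match) return null;
  const amount = Number(match[1]);
  return match[2] === "gb" ? amount : amount / 1024;
}

// True when the selected size has less RAM than the agent's minimum size.
function needsUpgrade(selected: string, minimum: string): boolean {
  const have = slugRamGb(selected);
  const need = slugRamGb(minimum);
  // Unparseable slugs are left alone in this sketch.
  if (have === null || need === null) return false;
  return have < need;
}

console.log(needsUpgrade("s-1vcpu-1gb", "s-2vcpu-4gb")); // true
console.log(needsUpgrade("s-4vcpu-8gb", "s-2vcpu-4gb")); // false
```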

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-24 07:34:44 +07:00
Ahmed Abushagur
42df6f753a
fix: prevent uninstall from truncating RC files with missing end marker (#2927)
If the end marker (# <<< spawn <<<) is missing from .bashrc/.zshrc,
cleanRcFile dropped all content after the start marker. Now detects
unclosed blocks and skips the file with a warning instead of writing
a truncated version.
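
The unclosed-block detection might be sketched like this. The end marker is the one quoted above; the start marker string and the function name are assumptions.

```typescript
// START_MARKER is an assumed value; only the end marker appears in the commit message.
const START_MARKER = "# >>> spawn >>>";
const END_MARKER = "# <<< spawn <<<";

// Returns the cleaned content, or null when a block is never closed,
// in which case the caller should skip the file and warn.
function stripSpawnBlock(content: string): string | null {
  const kept: string[] = [];
  let inside = false;
  for (const line of content.split("\n")) {
    const trimmed = line.trim();
    if (!inside && trimmed === START_MARKER) {
      inside = true;
      continue;
    }
    if (inside && trimmed === END_MARKER) {
      inside = false;
      continue;
    }
    if (!inside) kept.push(line);
  }
  return inside ? null : kept.join("\n");
}
```

Returning null instead of the partial result is what keeps the caller from writing a truncated file.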

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-24 06:54:10 +07:00
Ahmed Abushagur
9651e029df
fix: handle missing ssh-keygen in getSshFingerprint (#2926)
getSshFingerprint called Bun.spawnSync without error handling, crashing
the CLI if ssh-keygen is not in PATH. Wrapped with unwrapOr(tryCatch())
to return empty string on failure, matching getKeyType's pattern.

Also added empty fingerprint handling to Hetzner SSH key registration
(matching DigitalOcean's existing pattern) to skip keys that can't be
fingerprinted instead of attempting re-registration.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-24 06:50:45 +07:00
Ahmed Abushagur
fd2d661e27
fix: validate manifest fields are plain objects, not just truthy (#2921)
* fix: validate manifest fields are plain objects, not just truthy

isValidManifest used !!data.agents/clouds/matrix which accepts strings,
numbers, and arrays. Downstream Object.keys() then silently returns
character indices or array indices instead of real agent/cloud names.
Replace with isPlainObject() checks to reject non-object values.
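
A short sketch of why truthiness is not enough here. The `isPlainObject` and `isValidManifest` bodies below are simplified stand-ins, not the project's actual implementations.

```typescript
// Simplified stand-in: accepts non-null, non-array objects only.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Simplified manifest check covering only the three fields named above.
function isValidManifest(data: unknown): boolean {
  if (!isPlainObject(data)) return false;
  return isPlainObject(data.agents) && isPlainObject(data.clouds) && isPlainObject(data.matrix);
}

// A truthy string passes a !! check, but its keys are character indices.
console.log(Object.keys("abc")); // [ '0', '1', '2' ]
console.log(isValidManifest({ agents: "abc", clouds: {}, matrix: {} })); // false
```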

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add validation tests for non-object manifest fields

Tests that loadManifest rejects manifests where agents/clouds/matrix
are strings, arrays, or numbers instead of plain objects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-24 06:48:54 +07:00
Ahmed Abushagur
472b315762
fix: prevent permanent history lock when PID file write fails (#2928)
Two bugs in acquireLock:
1. PID write failure was ignored — process returned success but left a
   lock dir without a PID file. If it crashed, no other process could
   detect the lock as stale, making it permanent.
2. Lock dirs without PID files were not treated as stale — other
   processes waited until timeout instead of cleaning up immediately.

Fix: retry on PID write failure (clean up dir first), and treat
lock dirs without PID files as broken/stale (force remove).

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 06:47:10 +07:00
Ahmed Abushagur
6a6ca87969
fix: add sudo to tarball mirror commands for non-root SSH users (#2922)
* fix: add sudo to tarball mirror commands for non-root SSH users

The mirror step copies files from /root/ to $HOME/ for non-root users
(e.g. ubuntu on AWS Lightsail), but cp and chown ran without sudo.
A non-root user can't read /root/ or chown root-owned files, so the
mirror silently failed (errors suppressed by 2>/dev/null || true).

Adds sudo to cp/chown in both mirror blocks (tryTarballInstall and
uploadAndExtractTarball) and removes error suppression so failures
propagate to the caller.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: verify sudo in tarball mirror commands for both install paths

Adds tests for tryTarballInstall and uploadAndExtractTarball that assert:
- cp and chown use sudo (needed to read /root/ as non-root user)
- error suppression (2>/dev/null || true) is not present

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Signed-off-by: Ahmed Abushagur <ahmed@abushagur.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 05:47:39 +07:00
A
18b1a5f50f
fix(install): force IPv4 DNS for npm installs and add junie binary verify (#2920)
* chore: update agent GitHub star counts

* fix(install): force IPv4 DNS for npm installs and add junie binary verify

On Sprite VMs (and potentially other clouds with flaky IPv6 routing), npm
install of packages with native-binary postinstall scripts (kilocode, junie)
fails with i/o timeout when connecting to the npm registry over IPv6.

Changes:
- Add NODE_OPTIONS=--dns-result-order=ipv4first to NPM_PREFIX_SETUP so all
  npm installs prefer IPv4, preventing the IPv6 timeout on first attempt
- Add cd ~ before postinstall re-run in KILOCODE_BINARY_VERIFY to avoid
  "current working directory was deleted" errors in bun/node on retry
- Add JUNIE_BINARY_VERIFY snippet (analogous to kilocode) that detects and
  recovers from a failed junie postinstall by re-running it from $HOME
- Apply JUNIE_BINARY_VERIFY to the junie install command

Fixes sprite kilocode and junie failures seen in E2E run 2026-03-23.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-24 05:13:12 +07:00
A
e0db833307
fix(update-check): redirect install script stdout to stderr in --output json mode (#2919)
When --output json is requested, the auto-update install script was
running with stdio: "inherit", causing [spawn] install messages to
pollute stdout before the JSON result, breaking JSON consumers.

Fix:
- Pre-scan process.argv for --output json before checkForUpdates()
  is called in index.ts (formal flag parsing happens later at line 944)
- Pass jsonOutput flag through checkForUpdates() -> performAutoUpdate()
- When jsonOutput=true, use stdio: ["pipe", stderr, stderr] for the
  install script execution so all output goes to stderr only
- Set SPAWN_CLI_UPDATED=1 env var on re-exec so JSON consumers can
  detect the update via cli_updated: true in SpawnResult
- Add cli_updated?: boolean to SpawnResult interface in commands/run.ts
- Add tests covering both json and non-json stdio behavior
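
The pre-scan step can be illustrated with a small helper. The name `wantsJsonOutput` is hypothetical, and only the space-separated `--output json` form described above is handled.

```typescript
// Hypothetical helper: detects "--output json" before formal flag parsing runs.
function wantsJsonOutput(argv: readonly string[]): boolean {
  for (let i = 0; i < argv.length - 1; i++) {
    if (argv[i] === "--output" && argv[i + 1] === "json") return true;
  }
  return false;
}

console.log(wantsJsonOutput(["spawn", "claude", "hetzner", "--output", "json"])); // true
console.log(wantsJsonOutput(["spawn", "claude", "hetzner"]));                     // false
```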

Fixes #2918

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-24 03:18:50 +07:00
A
c1e6fb76f9
fix(e2e): harden pkill regex escaping against all metacharacters (#2917)
* fix(e2e): harden pkill regex escaping against all metacharacters (#2911)

The sed character class `[.[\*^$]` was malformed and missed several
extended regex metacharacters (+, ?, (, ), {, }, |). Replace with a
correct bracket expression that escapes all POSIX ERE metacharacters.

Although app_name is already validated to [A-Za-z0-9._-], fixing the
escaping is defense-in-depth against future changes to the validation.
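
For comparison, the same idea in TypeScript: escape every regex metacharacter in a literal before embedding it in a pattern. This is a sketch, not the sed expression used in the scripts, and JavaScript regex syntax differs from POSIX bracket expressions (here `]` is escaped with a backslash inside the class).

```typescript
// Escapes . * + ? ^ $ { } ( ) | [ ] and backslash so the input matches literally.
function escapeRegex(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const pattern = new RegExp("^" + escapeRegex("my.app+v2") + "$");
console.log(pattern.test("my.app+v2")); // true
console.log(pattern.test("myXapp+v2")); // false
```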

Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(e2e): correct sed bracket expression to escape ] character

Place ] first in character class so it's treated as literal.
Use \\ to match literal backslash.

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 12:35:31 -07:00
A
f38ae693de
fix: set SPAWN_NON_INTERACTIVE in headless mode to prevent prompt hangs (#2916)
Headless mode set SPAWN_HEADLESS and SPAWN_MODE but not
SPAWN_NON_INTERACTIVE, which all cloud modules check before prompting.
This caused GCP (and potentially other clouds) to prompt for project
confirmation when stdin was closed, resulting in a fatal error.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 01:22:47 +07:00
A
a959a6db83
fix(types): remove as type assertions from test mocks (#2913)
Add missing fields (signalCode, resourceUsage, pid, killed) to
Bun.spawnSync and Bun.spawn mock return values so they satisfy the
full return types without needing `as` casts or biome-ignore comments.

Agent: style-reviewer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-24 00:24:49 +07:00
A
69a0d476a0
test: remove duplicate and theatrical tests (#2912)
Remove 8 tests that checked constant equality (DEFAULT_DROPLET_SIZE,
DEFAULT_DO_REGION, DEFAULT_MACHINE_TYPE, DEFAULT_ZONE, DEFAULT_SERVER_TYPE,
DEFAULT_LOCATION) across digitalocean/gcp/hetzner cov files — these tests
just hardcode the same string twice and break if the default is changed for
a valid reason.

Also remove 2 sleep() tests from ssh-cov.test.ts: sleep() is a trivial
setTimeout wrapper with no logic, and the timing test added 50ms of real
wall time per run.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-24 00:22:49 +07:00
A
0e17461fcd
test: remove duplicate cmdFix tests from cmd-fix-cov.test.ts (#2910)
Three tests in the `cmdFix (additional coverage)` describe block were
exact duplicates of tests already in cmd-fix.test.ts:

- "fixes directly when only one server" = "directly fixes when only one active server"
- "finds record by name when spawnId matches name" = "fixes by spawn name"
- "shows no active spawns when history is empty" = "shows message when no active spawns"

Removed the duplicate describe block and its now-unused imports.
Unique fixSpawn coverage (security validation, manifest failure, label
fallbacks, success message) is preserved.

Agent: pr-maintainer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-23 21:35:44 +07:00
A
f8e23317c9
fix(cli): fix openclaw DO size and kilocode CWD install failures (#2909)
- digitalocean: change openclaw min size from s-2vcpu-4gb-intel to
  s-2vcpu-4gb (intel variant no longer available in nyc3)
- agent-setup: add cd "$HOME" before kilocode npm install to prevent
  postinstall failure when CWD is deleted during npm global install
- bump version to 0.25.19

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 20:37:48 +07:00
A
59dea5fc09
refactor: remove dead code and stale references (#2908)
- remove `export` from `LocalTarball` interface in `shared/agent-tarball.ts`
  — the type is only used internally as the return type of `downloadTarballLocally`;
  it was never imported from outside the module.

- remove `getTerminalWidth` re-export from `commands/index.ts`
  — `getTerminalWidth` is only called inside `commands/info.ts` itself;
  it was re-exported through the barrel but never imported from there by any consumer or test.

bump CLI version patch: 0.25.18 → 0.25.19

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 19:51:41 +07:00
A
f296544c1c
fix(cli): bump version to 0.25.18 for security fix in #2904 (#2906)
Commit 97b6424 (fix(security): add cmd validation to Sprite
runSprite() and runSpriteSilent()) changed production CLI code without
a corresponding version bump. The CLI has auto-update — without this
bump users won't receive the null-byte injection guard.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-23 18:50:00 +07:00
A
97b6424ebe
fix(security): add cmd validation to Sprite runSprite() and runSpriteSilent() (#2904)
Mirrors the guard already in interactiveSession() and all other clouds.
Null bytes in cmd could truncate commands at the C level.
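
A guard of this kind might look like the sketch below. The function name and error message are hypothetical; the source only states that null bytes in cmd are rejected.

```typescript
// Hypothetical guard: rejects commands containing a null byte.
function assertSafeCmd(cmd: string): void {
  if (cmd.indexOf("\0") !== -1) {
    throw new Error("Invalid command: contains a null byte");
  }
}

assertSafeCmd("ls -la"); // passes silently
```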

Fixes #2903

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 17:30:25 +07:00
A
5392ff2d7a
fix: detect and recover from Hetzner primary_ip_limit exceeded error (#2905)
When parallel E2E runs exhaust Hetzner's Primary IP quota, the CLI now
detects the `resource_limit_exceeded` / `primary_ip_limit` error, automatically
cleans up orphaned Primary IPs (unattached to any server), and retries once.
If cleanup doesn't free quota, a clear message guides users to delete stale
resources or request a quota increase.

Fixes #2902

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 17:26:32 +07:00
A
d2f11bbf06
test: remove duplicate and theatrical tests (#2901)
cmd-pick-cov.test.ts: remove 8 theatrical flag-parsing tests that all hit
the same early-exit code path (no stdin options → exit 1). Each test
passed a different flag combination but all verified only that exit(1) was
thrown — no flag-specific behavior was actually exercised. Keep the one
meaningful test: "exits with error when no options provided".

ssh-cov.test.ts: consolidate 5 single-assertion constant-check tests into
2 tests (one per constant). All 5 previously tested string membership in
SSH_BASE_OPTS / SSH_INTERACTIVE_OPTS in separate it() blocks.

Before: 1868 tests, 4454 expect() calls
After:  1857 tests, 4446 expect() calls (-11 tests, -8 expects)

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 16:28:30 +07:00
A
7aba20e327
fix(ux): deduplicate install messages, add newlines to SSH polling, clarify completion messages (#2900)
- Suppress stdout+stderr from `claude install --force` to prevent duplicate
  "successfully installed" messages (was printed up to 4x)
- Make logStepInline fall back to newline-separated output when stderr is not
  a TTY, so SSH port polling status is readable in piped/captured contexts
- Consolidate post-install completion messages into a single clear milestone:
  "Agent setup complete -- {agent} is ready on {cloud}"
- Bump CLI version to 0.25.16

Fixes #2899

Agent: ux-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 15:26:34 +07:00
A
a96522829b
fix(e2e): fix interactive E2E test chain (provision → install → input test) (#2898)
* fix(e2e): pass SPAWN_NAME + SPAWN_ENABLED_STEPS to interactive harness

Without SPAWN_NAME, cmdRun prompts 'Name your spawn' interactively.
The AI driver (Claude Haiku) can't respond because ANTHROPIC_AUTH_TOKEN
is an OpenRouter key — every Anthropic API call returns 401, so the harness
returns <wait> indefinitely until the 20-min SESSION_TIMEOUT_MS fires.

SPAWN_ENABLED_STEPS=auto-update bypasses the setup options multiselect,
ensuring the harness only tests the provisioning/installation UX.

* fix(e2e): fix _stage_timeout_remotely stdin pipe issue on Hetzner

Same root cause as _stage_prompt_remotely: _hetzner_exec runs commands via
"printf | base64 -d | bash", which makes bash's stdin the decode pipe.
So piped data from the outer SSH call never reaches subcommands.

"printf '%s' 'VALUE' | cloud_exec APP 'cat > /tmp/.e2e-timeout'" always
creates an empty file, causing "timeout: invalid time interval ''" when
the input test runs.

Fix: embed the validated numeric timeout value directly in the printf
command string (safe — _validate_timeout ensures only [0-9] digits).

* test(e2e): add claude PATH diagnostics to input_test_claude

Temporary debug output to trace where claude is installed
after interactive provision completes.

* test(e2e): save harness transcript JSON on success for debugging

* fix(e2e): remove 'is ready' from harness success pattern

'SSH is ready' (emitted ~15s into provision when SSH connects but before
any agent installation) matched the /is ready/ pattern, triggering false
success detection. The harness killed the spawn CLI during cloud-init wait,
leaving a VM with no agent installed.

Fix: use the same precise patterns as the main repo's harness:
  /Starting agent\.\.\.|setup completed successfully/i
Both only fire after orchestrate.ts completes the full setup.
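
A quick way to see the false positive is to run both patterns over sample lines with grep. This is a sketch only: the helper names are hypothetical, the harness itself does not use grep, and the sample lines other than 'SSH is ready' are assumed from the pattern text.

```shell
#!/usr/bin/env bash
# Old, loose pattern: matches any line containing "is ready".
matches_loose() { printf '%s\n' "$1" | grep -Eq 'is ready'; }

# Stricter pattern quoted in the commit message above (case-insensitive).
matches_strict() {
  printf '%s\n' "$1" | grep -Eiq 'Starting agent\.\.\.|setup completed successfully'
}

for line in "SSH is ready" "Starting agent..." "Setup completed successfully"; do
  loose=no; strict=no
  matches_loose "$line" && loose=yes
  matches_strict "$line" && strict=yes
  printf '%-30s loose=%s strict=%s\n' "$line" "$loose" "$strict"
done
```

Only the loose pattern accepts "SSH is ready", which is the early line that triggered the premature success detection.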

* chore(e2e): remove temporary debug instrumentation

* feat(e2e): add ai-powered ux review after interactive provision

After each successful interactive E2E run, the harness sends the full
terminal transcript to Claude (via OpenRouter) with a UX reviewer prompt.
It looks for confusing messages, noisy output, missing context in spinners,
and unhelpful errors that don't explain next steps.

Findings are returned as uxIssues[] in the harness JSON result.
interactive.sh then files a GitHub issue per run listing each problem
with a verbatim example and concrete suggestion.

Uses OPENROUTER_API_KEY (already in env) so it works on the QA VM
where ANTHROPIC_API_KEY is an OpenRouter key.

* refactor(e2e): throttle ux issue filing — 33% chance, 3+ issues required

- Random 33% gate: UX review runs on ~1 in 3 successful interactive
  provisions, not every run
- Minimum bar: only surface findings when AI found 3+ clear issues
  (filters one-off nits)
- Tighter system prompt: only flag obvious problems (repeated messages,
  debug leaks, cryptic errors), not minor style preferences

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(e2e): replace random throttle with stricter ux review prompt

Instead of Math.random() to suppress issues, make the AI self-regulate:
the system prompt now instructs it to only flag genuinely bad problems
(repeated messages, raw stack traces, no-feedback waits) and treat
zero findings as a good outcome, not a failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 13:42:02 +07:00
A
9448cb8ca0
fix(e2e): fix _stage_prompt_remotely to embed prompt inline instead of stdin pipe (#2897)
The stdin piping approach was broken: _hetzner_exec runs remote commands via
"printf '%s' 'ENCODED_CMD' | base64 -d | bash", which connects bash's stdin to
the base64 pipe rather than SSH's outer stdin. So `cat > /tmp/.e2e-prompt` hit
EOF immediately and the encoded prompt was never written to the remote file.

Fix: embed the validated base64 prompt directly in the command string using
printf. This is safe because _validate_base64 ensures the prompt contains only
[A-Za-z0-9+/=] — no characters that can break out of single quotes or inject
shell metacharacters.
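
Why the character-set check makes single-quote embedding safe can be sketched as follows. The function names are hypothetical and only approximate _validate_base64 as described; the command string is built and printed here, not executed remotely.

```shell
#!/usr/bin/env bash
# Hypothetical validator in the spirit of _validate_base64: accept only the
# standard base64 alphabet plus padding, and reject the empty string.
validate_base64() {
  [[ -n "$1" && "$1" =~ ^[A-Za-z0-9+/=]+$ ]]
}

# Build a remote command string with the payload embedded in single quotes.
# Safe only because validation rules out quotes and shell metacharacters.
build_stage_command() {
  local payload="$1"
  validate_base64 "$payload" || return 1
  printf "printf '%%s' '%s' > /tmp/.e2e-prompt" "$payload"
}

prompt_b64=$(printf '%s' 'Reply with the word OK' | base64 | tr -d '\n')
build_stage_command "$prompt_b64"; echo
build_stage_command "x'; rm -rf ~; '" || echo "rejected: not base64"
```

The second call is refused before any command string is produced, so a payload containing a single quote never reaches the remote shell.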

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-23 12:19:51 +07:00
A
e7e3b327a1
test: remove duplicate saveSpawnRecord describe block (#2896)
The saveSpawnRecord tests in history-trimming.test.ts duplicated the
describe block already in history.test.ts. Moved the two unique test
cases ("no cap" 200-record retention and "assign id when missing") into
history.test.ts and removed the duplicate block from history-trimming.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-23 12:14:49 +07:00
A
f1f2667cb0
fix: skip interactive session in headless mode (#2895)
* fix: skip interactive session in headless mode (#2892)

When SPAWN_HEADLESS=1, the orchestrator now exits with code 0 after
provisioning completes instead of attempting to launch the agent
interactively. This fixes Claude Code (and other agents) failing with
"Input must be provided through stdin or --prompt" when spawned via
`--headless --output json` without a prompt.

The VM is fully provisioned and ready — callers can SSH in or use
`spawn connect` to start the agent manually.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: clean up SPAWN_HEADLESS env in test afterEach to prevent leaks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-22 21:38:53 -07:00
A
9280489ada
fix(qa): load ANTHROPIC_AUTH_TOKEN as ANTHROPIC_API_KEY for interactive E2E (#2894)
* chore: update agent GitHub star counts

* fix(qa): load ANTHROPIC_AUTH_TOKEN as ANTHROPIC_API_KEY for interactive E2E

QA VMs store the Anthropic key as ANTHROPIC_AUTH_TOKEN in
/etc/spawn-qa-auth.env, but the e2e-interactive handler only looked for
ANTHROPIC_API_KEY — causing the 6am cron to fail immediately with
"ANTHROPIC_API_KEY not set". Accept either name when loading from the
auth env file.
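
Accepting either variable name can be done with a nested parameter expansion. The loader below is a hypothetical sketch, not the e2e-interactive handler itself, and it assumes the auth env file contains plain KEY=VALUE lines that are safe to source.

```shell
#!/usr/bin/env bash
# Hypothetical loader: source an auth env file and accept either variable
# name for the Anthropic key, preferring ANTHROPIC_API_KEY when both exist.
load_anthropic_key() {
  local env_file="$1"
  [ -r "$env_file" ] || return 1
  # Assumes the file is trusted shell syntax (plain KEY=VALUE lines).
  # shellcheck disable=SC1090
  . "$env_file"
  ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY:-${ANTHROPIC_AUTH_TOKEN:-}}"
  if [ -z "$ANTHROPIC_API_KEY" ]; then
    echo "ANTHROPIC_API_KEY not set" >&2
    return 1
  fi
  export ANTHROPIC_API_KEY
}

# Usage (path taken from the commit message):
#   load_anthropic_key /etc/spawn-qa-auth.env
```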

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(e2e): bump interactive harness timeout to 20min, fix zombie VM teardown

- SESSION_TIMEOUT_MS: 10min → 20min — provisioning a VM takes 3-4 min
  before onboarding even starts; 10min wasn't enough headroom
- interactive.sh: call cloud_provision_verify even on harness failure so
  teardown can find and delete any VM that was partially created (e.g.
  on timeout mid-provision) — previously left zombie VMs with no .meta file

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-23 11:24:26 +07:00
Ahmed Abushagur
6aeb9ba142
feat(e2e): diff-aware AI review with e2e-last-green tracking (#2893)
AI log review now includes the git diff since the last fully passing
E2E run, enabling causal analysis like "this 404 likely caused by
commit abc123 which deleted file Y". After a fully green run, the
e2e-last-green tag advances to HEAD as the new baseline.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 11:21:35 +07:00
A
4d08dbe2a7
fix(security): harden remote command construction in provision.sh (#2886)
* fix(security): harden remote command construction in provision.sh

Split the .spawnrc upload fallback into two cloud_exec calls
that keep data apart from commands. Step 1 writes the validated base64
payload to a remote temp file. Step 2 decodes from that file and
sets up shell rc sourcing using a static command string with no
interpolated variables.

This eliminates command injection risk in the control-flow portion
of the remote command (for loop, grep, etc.) even if the base64
validation were ever bypassed, since user-controlled data never
appears in the same command string as shell control flow.
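
The two-step split can be sketched locally. Everything here is illustrative: `cloud_exec` is replaced by a hypothetical stub that runs commands in a scratch directory, the fixed file name `.spawnrc.b64` is a simplification, and the shell rc sourcing that step 2 also performs is omitted.

```shell
#!/usr/bin/env bash
# Hypothetical local stand-in for cloud_exec: runs the command string with
# bash inside a scratch directory that plays the role of the remote home.
REMOTE_HOME=$(mktemp -d)
cloud_exec() {
  local _app="$1" cmd="$2"
  (cd "$REMOTE_HOME" && bash -c "$cmd")
}

upload_spawnrc() {
  local app="$1" payload_b64="$2"
  # Validate before embedding: base64 alphabet only.
  [[ "$payload_b64" =~ ^[A-Za-z0-9+/=]+$ ]] || return 1

  # Step 1: data only. The payload is written to a file, nothing else runs.
  cloud_exec "$app" "printf '%s' '${payload_b64}' > .spawnrc.b64" || return 1

  # Step 2: static command string (single-quoted, no interpolation).
  cloud_exec "$app" 'base64 -d < .spawnrc.b64 > .spawnrc && rm -f .spawnrc.b64' || return 1
}

payload=$(printf 'export EXAMPLE_VAR=1\n' | base64 | tr -d '\n')
upload_spawnrc demo-app "$payload" && cat "$REMOTE_HOME/.spawnrc"
```

Because step 2 is a fixed string, the control flow that runs remotely does not change with the payload; only step 1 ever carries caller-supplied data.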

Fixes #2882

Agent: complexity-hunter
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: correct error handling + use mktemp for temp file

- Return 1 (not 0) when step 1 fails to avoid masking provisioning failures
- Use mktemp -t spawnrc.b64 to avoid race conditions on concurrent provisions

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: propagate step 2 failure in provision.sh (return 1)

The else branch for step 2 (decode + shell rc setup) logged an error
but the function still returned 0, masking the failure. Now returns 1
so provisioning failures are correctly propagated.

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 20:44:33 -07:00