The junie agent was added in #2300 but the E2E test scripts were not
updated. This adds junie to ALL_AGENTS, verify dispatch, input test
dispatch, and the provision.sh fallback env configuration.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
- Interactive picker: add blank separator line between entries so label
and subtitle are visually grouped (not blending into adjacent entries)
- Non-interactive table: wrap subtitle in pc.dim() for better contrast
with the bold entry name
- Update pickerHeight to account for added separator lines
Fixes #2309
Agent: issue-fixer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Three distinct E2E bugs fixed:
1. SSH key generation race condition: When multiple agents provision in
parallel, concurrent processes all call generateSshKey() and race to
create ~/.ssh/id_ed25519. ssh-keygen won't overwrite an existing file
(it prompts on stdin, which is set to "ignore"), causing zeroclaw/codex to fail
with "SSH key generation failed". Fix: check if key already exists
before generating, and re-check after a failed generation attempt.
2. Hetzner SSH key 409 uniqueness_error: The Hetzner API returns HTTP 409
with "SSH key not unique" when the same key content is registered under
a different name. The hetznerApi() function throws on non-2xx before
the error-parsing code runs, and the regex /already/ didn't match
"not unique". Fix: catch 409 in ensureSshKey() and match against
uniqueness_error/not unique/already patterns.
3. Hermes binary not found: The hermes install script (uv tool) creates
the actual binary + venv at ~/.hermes/hermes-agent/venv/ with a symlink
at ~/.local/bin/hermes. The tarball capture script only captured the
symlink + ~/.local/share/, leaving a dangling symlink. Fix: include
~/.hermes/ in capture paths, add venv/bin to verify.sh PATH check,
and update hermes launchCmd to include the venv PATH.
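
The 409 matching described in fix 2 can be sketched roughly as follows. This is a minimal illustration, not the actual ensureSshKey() code: the helper name `isDuplicateKeyError`, its signature, and the case-insensitive flag are assumptions made for the example.

```typescript
// Hypothetical helper (not from the codebase): decides whether a failed
// SSH key registration can be treated as "this key is already registered".
const DUPLICATE_KEY_PATTERN = /uniqueness_error|not unique|already/i;

function isDuplicateKeyError(status: number, message: string): boolean {
  // Only a 409 Conflict whose message matches a known phrasing counts.
  return status === 409 && DUPLICATE_KEY_PATTERN.test(message);
}
```

A caller could treat a `true` result as success and continue with the existing key, while rethrowing every other error.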
Fixes #2304
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Tests for getScriptFailureGuidance were failing when cloud credential
env vars (HCLOUD_TOKEN, DO_API_TOKEN) were set in the environment.
The tests expected these vars to appear as "missing" in the output,
but the setup only unset OPENROUTER_API_KEY. Now both the cloud-specific
var and OPENROUTER_API_KEY are saved and unset before each test.
Bump CLI version to 0.15.11.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
The Phase 2 SSH handshake loop in waitForSsh spawns SSH processes
without a per-process timeout. ConnectTimeout=10 only covers TCP
connect — if sshd accepts the connection but stalls during key
exchange or authentication, the process hangs indefinitely. This
causes the entire spawn command to freeze with no way to recover.
Add a 30s killWithTimeout guard to each probe, matching the pattern
already used in every cloud-specific runServer/uploadFile function.
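
A timeout guard of this kind might look like the sketch below. It is not the project's killWithTimeout implementation; the `ProbeProcess` interface and the `waitOrKill` name are hypothetical, chosen only to show the idea of racing the process exit against a timer.

```typescript
// Minimal process shape assumed for this sketch (hypothetical interface).
interface ProbeProcess {
  exited: Promise<number>;
  kill(): void;
}

// Resolves with the exit code, or kills the process and resolves with null
// once timeoutMs has elapsed without the process exiting.
async function waitOrKill(proc: ProbeProcess, timeoutMs: number): Promise<number | null> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<null>((resolve) => {
    timer = setTimeout(() => {
      proc.kill();
      resolve(null);
    }, timeoutMs);
  });
  try {
    return await Promise.race([proc.exited, timedOut]);
  } finally {
    // Always clear the timer so a fast exit does not leave it pending.
    clearTimeout(timer);
  }
}
```

With a guard like this, each SSH probe would be awaited as `waitOrKill(proc, 30_000)` instead of a bare `await proc.exited`.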
-- refactor/code-health
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
After every e2e run, send an HTML matrix report to KEY_REQUEST_EMAIL
via Resend showing pass/fail/skip per agent x cloud combination.
- e2e.sh: add send_matrix_email() — builds result table from LOG_DIR
result files, writes temp TS, calls bun run to POST to Resend API.
Called just before exit so LOG_DIR is still available.
- qa.sh (e2e mode): load RESEND_API_KEY + KEY_REQUEST_EMAIL from
/etc/spawn-key-server-auth.env before launching Claude so the creds
are inherited by the e2e.sh subprocess.
Both changes are no-ops when credentials are absent (silent skip).
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
All four SSH-based uploadFile functions (Hetzner, DO, AWS, GCP) used
`await proc.exited` on SCP subprocesses without any timeout guard.
If SCP hangs due to a network issue, the CLI hangs indefinitely.
This adds the same killWithTimeout pattern already used by runServer
and runServerCapture in these same files: a 120-second timeout that
kills the SCP process if it stalls.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Packer template:
- Match official 90-cleanup.sh: remove SSH host keys, create
revoked_keys, remove cloud-init instances, zero-fill free space,
use --force-confold for upgrades, autoremove/autoclean
- Add Packer manifest post-processor for snapshot ID extraction
- Remove PACKER_LOG=1 (debug logging not needed in production)
Workflow:
- Add "Submit to DO Marketplace" step after successful build
- Reads agent→app_id mapping from MARKETPLACE_APP_IDS secret (JSON)
- Extracts snapshot ID from Packer manifest, PATCHes Vendor API
- Gracefully handles 400 (app already pending review)
- Skips silently if no MARKETPLACE_APP_IDS secret is configured
Setup: add MARKETPLACE_APP_IDS secret as JSON, e.g.:
{"claude":"60089fc6...", "codex":"60089fc7..."}
App IDs come from the DO Vendor Portal after initial approval.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Fixes #2292
Unanchored grep -q would match the marker anywhere in output, including
error messages like "Expected SPAWN_E2E_OK but got...". Using grep -qx
requires the marker to appear as a complete line, preventing false passes.
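
The difference between the two checks can be illustrated in TypeScript. This is only an analogy for the grep behavior, assuming a literal marker with no regex metacharacters; both function names are made up for the example.

```typescript
// Unanchored check: true if the marker appears anywhere (like grep -q).
function containsMarker(output: string, marker: string): boolean {
  return output.includes(marker);
}

// Whole-line check: true only if some line equals the marker exactly
// (like grep -qx with a literal pattern).
function hasMarkerLine(output: string, marker: string): boolean {
  return output.split(/\r?\n/).some((line) => line === marker);
}
```

An error message that merely mentions the marker passes the first check but fails the second.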
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
All 42 agent scripts across 6 clouds used BASH_SOURCE[0] with dirname
for local checkout detection. This breaks curl|bash execution because
BASH_SOURCE resolves to /dev/fd/XX instead of a real path.
Remove the BASH_SOURCE-based SCRIPT_DIR detection and the "Local checkout"
code path from all scripts. The SPAWN_CLI_DIR env var (used by e2e tests)
is the correct mechanism for running from source. Local cloud scripts
that previously lacked SPAWN_CLI_DIR support now have it.
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace unsafe pattern where base64-encoded commands were interpolated
into remote command strings with secure stdin piping — command data now
travels as stdin rather than as part of the command string, eliminating
injection risk from shell metacharacter interpretation.
Affected functions across all 5 cloud drivers:
- _hetzner_exec_long
- _aws_exec_long
- _gcp_exec_long
- _digitalocean_exec_long
- _sprite_exec_long
Fixes #2286
Fixes #2287
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: replace base64 interpolation with stdin piping in verify.sh (Fixes #2283)
Replace unsafe pattern where encoded prompt was interpolated into remote
command strings with secure stdin piping — prompt data now travels as stdin
rather than as part of the command string, eliminating injection risk.
Affected functions: input_test_claude, input_test_codex, input_test_openclaw,
input_test_zeroclaw.
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: use cloud_exec (not cloud_exec_long) for stdin piping
cloud_exec_long ignores stdin - remote base64 -d would hang.
cloud_exec passes cmd to bash -c, which preserves stdin piping.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: restore timeout protection for input tests using cloud_exec
Wraps each agent command in `timeout ${INPUT_TEST_TIMEOUT}` on the remote
side so tests cannot hang indefinitely after switching from cloud_exec_long
to cloud_exec. Updates stale comment referencing cloud_exec_long.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
De-export interfaces, types, and constants that are only used within
their own module files. These were exported but never imported by any
other module or test file, unnecessarily widening the public API surface.
Affected symbols:
- aws: AwsState, Region, REGIONS, AGENT_BUNDLE_DEFAULTS
- digitalocean: DigitalOceanState, DropletSize, DROPLET_SIZES, DoRegion, DO_REGIONS
- gcp: GcpState, MachineTypeTier, MACHINE_TYPES, ZoneOption, ZONES
- hetzner: HetznerState, ServerTypeTier, SERVER_TYPES, LocationOption, LOCATIONS
- sprite: SpriteState
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Add whitelist validation for AGENT_NAME immediately after the empty
check to prevent command injection and path traversal via the parameter.
Although the existing case statement already rejects unknown agents,
explicit upfront validation makes the security intent clear and adds
defense in depth.
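
As an illustration of the idea (written in TypeScript rather than the shell used by the script), an allowlist check could look like this. The set contents and the function name are assumptions, not the project's actual list.

```typescript
// Hypothetical allowlist; the real set of agent names lives in the project.
const ALLOWED_AGENTS: ReadonlySet<string> = new Set([
  "claude",
  "codex",
  "openclaw",
  "zeroclaw",
  "hermes",
]);

// Exact-match lookup: anything with path separators or shell
// metacharacters fails because it is simply not in the set.
function isAllowedAgent(name: string): boolean {
  return ALLOWED_AGENTS.has(name);
}
```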
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The PKCE migration TODO referenced closed issue #2041. The TODO
itself is still valid (DigitalOcean still doesn't support PKCE),
so keep the migration checklist but drop the issue number.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
* refactor: remove commands.ts compatibility shim and fix stale references
- Delete packages/cli/src/commands.ts shim file (only re-exported commands/index.ts)
- Update index.ts to import directly from ./commands/index.js
- Update 24 test files to import from ../commands/index.js
- Fix stale CLAUDE.md reference to commands.ts
- Fix stale QA prompt references to commands.ts and wrong line numbers
- Bump CLI version to 0.15.8
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: remove stale references to deleted commands.ts compatibility shim
---------
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
The v0 fallback path in loadHistory() returned raw parsed JSON array
directly without validating individual elements. This could cause
TypeErrors (e.g. r.agent.toLowerCase() on undefined) in callers like
getActiveServers and filterHistory when corrupted entries exist.
Now filters each element through v.safeParse(SpawnRecordSchema, el),
matching the validation the v1 path already performs.
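
The per-element filtering can be sketched without valibot as below. The `MinimalRecord` shape (only `agent` and `cloud`) is a stand-in assumed for the example; the real SpawnRecordSchema validates more fields.

```typescript
// Hypothetical minimal record shape; the real schema has more fields.
interface MinimalRecord {
  agent: string;
  cloud: string;
}

function isMinimalRecord(value: unknown): value is MinimalRecord {
  if (typeof value !== "object" || value === null) return false;
  return (
    "agent" in value && typeof value.agent === "string" &&
    "cloud" in value && typeof value.cloud === "string"
  );
}

// Keeps only well-formed elements of a legacy bare-array history file.
function filterValidRecords(parsed: unknown): MinimalRecord[] {
  return Array.isArray(parsed) ? parsed.filter(isMinimalRecord) : [];
}
```

Callers such as a filter on `r.agent.toLowerCase()` then never see an element whose `agent` is undefined.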
Fixes #2277
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Three fixes for marketplace validation failures:
1. Install all security updates (apt-get dist-upgrade) — img_check
fails if any security patches are pending.
2. Purge droplet-agent and /opt/digitalocean — img_check fails if
the DO monitoring agent directory exists.
3. Correct img_check.sh filename to 99-img-check.sh — the previous
URL returned 404.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The marketplace-partners repo uses `99-img-check.sh`, not
`img_check.sh`. The wrong filename caused a 404 on curl download,
failing all agent builds with exit code 22.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: claude snapshot build — remove npm fallback from install command
The native install (curl | bash) succeeds but exits non-zero due to a
PATH warning. The || fallback then tries `npm install` which doesn't
exist on the "minimal" tier → exit 127.
Fix: replace npm fallback with binary existence check (same pattern
as hermes agent). If install exits non-zero but ~/.local/bin/claude
exists, the build succeeds.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: snapshot cleanup and lookup — use name prefix instead of tags
DO Packer builder `tags` only apply to the temporary build droplet,
not the resulting snapshot image. Both the workflow cleanup step and
the CLI's findSpawnSnapshot() were querying by `tag_name` which
returned nothing — old snapshots piled up and the CLI couldn't find
existing snapshots.
Fix: filter by snapshot name prefix (`spawn-{agent}-`) instead of
tags, in both the workflow and the CLI. Remove misleading `tags`
from the Packer template. Add test cases for name-prefix filtering.
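
A name-prefix lookup along these lines might look like the following sketch. The `SnapshotInfo` fields and the choice to return the newest match are assumptions for illustration; the actual findSpawnSnapshot() may differ.

```typescript
// Hypothetical subset of the snapshot fields used in this sketch.
interface SnapshotInfo {
  id: string;
  name: string;
  created_at: string; // ISO 8601 timestamp, assumed to share one format
}

// Returns the newest snapshot whose name starts with "spawn-{agent}-", or null.
function pickLatestSnapshot(snapshots: SnapshotInfo[], agent: string): SnapshotInfo | null {
  const prefix = `spawn-${agent}-`;
  const matches = snapshots
    .filter((s) => s.name.startsWith(prefix))
    .sort((a, b) => b.created_at.localeCompare(a.created_at));
  return matches[0] ?? null;
}
```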
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: packer build failures — OOM kill + history builtin
Two issues introduced by PR #2271 (marketplace compliance):
1. Droplet downsized to s-1vcpu-1gb (1GB RAM) — Claude's native
installer and zeroclaw's Rust build get OOM-killed. Restore
s-2vcpu-2gb.
2. Cleanup provisioner uses `history -c` which is a bash builtin.
Packer runs scripts with /bin/sh (dash on Ubuntu) which doesn't
have it → exit 127 on ALL agents. Remove it — the .bash_history
file deletion already handles persistent history.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Switch build droplet from s-2vcpu-2gb to s-1vcpu-1gb ($6/mo) per DO
Marketplace recommendation for cross-size snapshot compatibility
- Add ufw firewall provisioner (deny incoming, allow SSH, enable)
- Replace basic apt-get clean with full DO Marketplace cleanup sequence:
removes SSH authorized_keys, clears bash history, truncates /var/log,
resets machine-id, and runs cloud-init clean so each launched droplet
gets a fresh identity on first boot
- Add img_check.sh validation step (from digitalocean/marketplace-partners)
to verify firewall active, no root password, and security posture before
the snapshot is finalized — build fails if image doesn't meet requirements
Fixes #2269
Agent: issue-fixer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat: restore Packer DO snapshot pipeline for fast agent boot
Restores the nightly Packer snapshot build pipeline (reverted in #2205)
that pre-bakes agent images as DigitalOcean snapshots. When a snapshot
exists on the user's account, droplet boot skips cloud-init and tarball
install entirely — cutting provisioning from ~10min to ~2min.
- Add `packer/digitalocean.pkr.hcl` HCL2 template with multi-region
distribution, apt-lock wait, and snapshot marker
- Add `.github/workflows/packer-snapshots.yml` nightly build with
matrix strategy, auto-cleanup of old snapshots, and injection-safe
env var handling
- Add `findSpawnSnapshot()` to query DO API for pre-built snapshots
- Add `waitForSshOnly()` for snapshot boots (skip cloud-init wait)
- Modify `createServer()` to accept optional `snapshotId` param
- Wire snapshot detection in DO `main.ts` orchestrator
- Add `skipAgentInstall` to `CloudOrchestrator` interface to skip
tarball + install steps when booting from snapshot
- Add 5 unit tests for snapshot lookup (happy path, empty, error,
invalid ID, network failure)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use repo-root-relative path for tier scripts in Packer template
Packer resolves script paths relative to cwd (repo root), not relative
to the .pkr.hcl file. Changed `scripts/tier-*.sh` to
`packer/scripts/tier-*.sh`.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: Packer build region/size and PATH for agent installs
Two issues causing build failures:
1. `s-2vcpu-4gb` not available in `nyc3` — changed build region to
`sfo3` and size to `s-2vcpu-2gb` (universally available, cheaper,
sufficient for building snapshots)
2. Claude install puts binary in `~/.local/bin` which isn't in PATH
during Packer provisioning — added full PATH to environment_vars
on both the install and marker provisioners so agent binaries and
subsequent scripts can find each other
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: remove packages/shared, deduplicate with packages/cli/src/shared
packages/shared duplicated packages/cli/src/shared (parse.ts, result.ts,
type-guards.ts) with the CLI never importing from the shared package.
The only consumer was .claude/skills/setup-spa, which now imports directly
from packages/cli/src/shared via relative paths.
- Delete packages/shared entirely
- Update setup-spa imports to use relative paths to CLI shared
- Remove @openrouter/spawn-shared workspace dependency from setup-spa
- Update CLAUDE.md and type-safety.md references
Agent: complexity-hunter
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: remove packages/shared from lint workflow, fix import sorting
The Biome Lint CI step referenced packages/shared/src/ which no longer
exists after this PR removes the package. Also fix import ordering in
setup-spa files to satisfy Biome's organizeImports rule.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: address Devin review — update stale packages/shared references
- Update type-safety.md line 67: packages/shared/src/parse.ts → packages/cli/src/shared/parse.ts
- Update install.ps1 sparse-checkout: remove packages/shared reference
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
manifest.json has 6 clouds (local, hetzner, aws, digitalocean, gcp,
sprite) and 7 agents, yielding 42 implemented matrix entries. The
README tagline incorrectly stated "7 clouds" and "49 combinations"
— likely stale from when Daytona was still listed.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Remove 5 unused reset*State() exports (aws, hetzner, gcp, digitalocean,
sprite) that were never called anywhere in the codebase. Convert their
associated _state variables from let to const since they are no longer
reassigned.
Remove stale Daytona references in status.ts (comment and IP check)
left over after Daytona cloud provider removal in #2261.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
The status command (PR #2254) added --prune and --json flags but did not
register them in KNOWN_FLAGS. This caused the CLI to reject them with
"Unknown flag" errors before the command could even dispatch.
Bump CLI version 0.15.4 -> 0.15.5.
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Simplify the cloud matrix by removing Daytona. All Daytona-specific code,
scripts, tests, and configuration have been removed. Daytona has been moved
to "Previously Considered" in the Cloud Provider Wishlist (#1183) and can
be revived on community demand.
Closes #2260
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes #2249
The overly broad `>>? word` pattern and generic doubled-operator check
were blocking legitimate natural-language developer prompts like:
- "Fix the merge conflict >> registration flow"
- "Run tests && deploy if they pass"
Root cause: `validatePrompt` is called before the prompt is set as the
`SPAWN_PROMPT` env var. Inside double-quoted shell arguments, `>>` and
`&&` are not interpreted as shell operators, so blocking them provided
no real security benefit while creating confusing UX rejections.
Changes:
- Remove `/>>?\s*[a-zA-Z_]\w{2,}/` pattern (false-positive on >> in English)
- Remove generic `hasDoubledOperators` check (false-positive on && in English)
- Keep all targeted patterns: $(cmd), backticks, ${var}, | bash/sh,
; rm -rf, fd redirections, heredoc, process substitution, path redirects
- Update tests: split broad && / || tests into "commands" vs "natural language"
- Add tests asserting all issue #2249 example prompts are now accepted
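
The targeted approach can be sketched as follows. The regexes here are simplified illustrations covering only a few of the listed categories; they are not the patterns used by `validatePrompt`, and `looksDangerous` is a hypothetical name.

```typescript
// Illustrative patterns only; the real check keeps a longer, more precise list.
const DANGEROUS_PATTERNS: RegExp[] = [
  /\$\(/, // command substitution: $(cmd)
  /`/, // backtick command substitution
  /\$\{/, // variable expansion: ${var}
  /\|\s*(bash|sh)\b/, // piping into a shell
  /;\s*rm\s+-rf\b/, // chained destructive command
];

// True when the prompt matches any targeted pattern. Plain ">>" and "&&"
// are deliberately not in the list.
function looksDangerous(prompt: string): boolean {
  return DANGEROUS_PATTERNS.some((pattern) => pattern.test(prompt));
}
```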
Agent: issue-fixer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes #2252
history.json now uses a versioned envelope:
{ "version": 1, "records": [...] }
This creates a migration escape hatch for future SpawnRecord shape changes.
loadHistory() transparently reads both v0 (bare array) and v1 formats,
automatically migrating v0 files on next write. All write operations now
use writeHistory() to stamp the current schema version consistently.
Validation uses valibot schemas (VMConnectionSchema, SpawnRecordSchema,
HistoryFileV1Schema) so the structure is verified and typed without `as`
casts. Updated all affected tests to check data.records instead of data.
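
Reading both formats into one envelope can be sketched like this. It is a dependency-free illustration, not the valibot-based loadHistory(): `toEnvelope` is a hypothetical name, and returning an empty record list for unrecognized input is an assumption.

```typescript
const HISTORY_VERSION = 1;

interface HistoryEnvelope {
  version: number;
  records: unknown[];
}

// Accepts either a bare array (v0) or a { version, records } envelope (v1)
// and always returns the envelope form.
function toEnvelope(parsed: unknown): HistoryEnvelope {
  if (Array.isArray(parsed)) {
    return { version: HISTORY_VERSION, records: parsed };
  }
  if (
    typeof parsed === "object" && parsed !== null &&
    "version" in parsed && parsed.version === HISTORY_VERSION &&
    "records" in parsed && Array.isArray(parsed.records)
  ) {
    return { version: HISTORY_VERSION, records: parsed.records };
  }
  // Unrecognized shape: treat as empty history (assumption for this sketch).
  return { version: HISTORY_VERSION, records: [] };
}
```

Writing the returned envelope back to disk is what migrates a v0 file on the next write.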
Agent: issue-fixer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the `spawn status` command requested in #2253. The command:
- Reads active (non-deleted) cloud servers from history
- Queries Hetzner and DigitalOcean REST APIs in parallel using saved tokens
- Shows a live-state table: ID, Agent, Cloud, IP, State, Since
- States: running (green), stopped (yellow), gone (dim), unknown (dim)
- --prune flag marks gone servers as deleted in history
- --json flag outputs machine-readable JSON for scripting
- `spawn ps` is an alias for `spawn status`
Other clouds (AWS, GCP, Sprite, Daytona) require CLI auth flows that cannot
run non-interactively; they report "unknown" with a helpful hint.
Agent: issue-fixer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Each cloud module (aws, daytona, digitalocean, gcp, hetzner, sprite) previously
stored per-operation state in bare module-level `let` variables, making them
process-global singletons. This is safe for single-cloud CLI invocations today
but creates latent bugs for multi-cloud orchestration and test isolation.
Replace scattered `let` globals with a single typed `_state` object per module:
- `AwsState` / `resetAwsState()` — 8 fields including `selectedBundle`
- `DaytonaState` / `resetDaytonaState()` — 5 fields
- `DigitalOceanState` / `resetDigitalOceanState()` — 3 fields
- `GcpState` / `resetGcpState()` — 5 fields
- `HetznerState` / `resetHetznerState()` — 3 fields
- `SpriteState` / `resetSpriteState()` — 2 fields
Each module exports a `resetXxxState()` function for test isolation. No function
signatures or existing exports were changed.
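
The pattern can be sketched for an imaginary module as below. The field names and the `ExampleCloudState` type are invented for illustration and do not correspond to any of the real state types listed above.

```typescript
// Hypothetical state shape; field names are illustrative, not the real ones.
interface ExampleCloudState {
  token: string | null;
  serverId: string | null;
  serverIp: string | null;
}

function createInitialState(): ExampleCloudState {
  return { token: null, serverId: null, serverIp: null };
}

// Single module-level state object replacing several bare `let` variables.
let _state: ExampleCloudState = createInitialState();

function setServer(id: string, ip: string): void {
  _state.serverId = id;
  _state.serverIp = ip;
}

function getServerIp(): string | null {
  return _state.serverIp;
}

// Restores defaults so tests do not leak state into each other.
function resetExampleCloudState(): void {
  _state = createInitialState();
}
```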
Fixes #2251
Agent: issue-fixer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: ARM tarball builds + arch-aware download
- Add ARM64 matrix entries for native binary agents (zeroclaw, opencode,
hermes, claude) in agent-tarballs.yml workflow
- Update agent-tarball.ts to detect remote VM arch via uname -m and
download the correct tarball (x86_64 or arm64)
- Change release strategy to support multiple arch assets per tag
- Document ARM build requirements in discovery.md for future agents
- Bump CLI version to 0.15.2
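
Mapping the `uname -m` output to a tarball architecture might look like the sketch below. The function name and the decision to return null for unsupported architectures are assumptions; the real agent-tarball.ts may handle this differently.

```typescript
type TarballArch = "x86_64" | "arm64";

// Maps `uname -m` output to a tarball architecture; returns null when the
// architecture has no matching tarball.
function tarballArchFromUname(unameOutput: string): TarballArch | null {
  switch (unameOutput.trim()) {
    case "x86_64":
    case "amd64":
      return "x86_64";
    case "aarch64": // Linux on 64-bit ARM
    case "arm64": // macOS on Apple silicon
      return "arm64";
    default:
      return null;
  }
}
```

A null result would let the caller fall back to the live install path.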
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use sudo for tarball extraction on non-root SSH clouds
On AWS Lightsail, SSH connects as 'ubuntu' (not root), but tarballs
extract to /root/. Without sudo, tar fails with "Permission denied".
Conditionally use sudo when not running as root (id -u != 0).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
parseJsonRaw was removed in 8b99fe0a but CLAUDE.md and
.claude/rules/type-safety.md still referenced it. Updated
to parseJsonObj which is the current function name.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Docker delivery is superseded by the tarball approach (#2232) which is
faster (curl|tar ~5-15s vs Docker install ~30s + pull ~60s) and works
on every cloud without Docker as a dependency.
- Remove tryInstallFromDocker, withDockerInstall, DOCKER_IMAGE_PREFIX
- Remove dockerImage and slowInstall from AgentConfig
- Remove Docker cloud-init from DigitalOcean
- Unwrap openclaw and zeroclaw to direct install (tarball is tried
first in orchestrate.ts, these are the fallback)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Remove parseJsonRaw from packages/shared — exported but never imported
- Remove dead re-exports from agent-setup.ts (AgentConfig type, generateEnvConfig)
that no consumer imports (all callers use the original modules directly)
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
* fix: tarball workflow failures (root ownership, swapfile, hermes TTY)
- Use sudo mv + chown for tarball in release step (root-owned from capture)
- Skip swapfile creation if /swapfile already exists (GitHub Actions runners)
- Tolerate hermes setup wizard failure when /dev/tty unavailable in CI
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: capture claude symlink target in tarball + fix verify PATH
The claude installer creates a symlink at ~/.local/bin/claude pointing
to ~/.local/share/claude/versions/X.Y.Z. The capture script was missing
~/.local/share/claude/, causing a broken symlink in the tarball.
Also add ~/.npm-global/bin to the verify PATH check for claude (npm
fallback install path).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
- Replace require() calls with ESM import in history-spawn-id.test.ts
(require() violates ESM-only rule per shell-scripts.md)
- Fix stale parseJsonRaw reference in test README (cli parse.ts does
not export parseJsonRaw; only packages/shared does)
- Add 5 missing test file entries to test README
-- qa/code-quality
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Remove two tests from "sequential saves at the boundary" that were
exact duplicates of tests in the "MAX_HISTORY_ENTRIES trimming" section:
- "99 to 100 entries" duplicated "should keep all entries when at exactly 100"
- "100 to 101 entries" duplicated "should trim to 100 when adding entry that exceeds the limit"
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
GITHUB_TOKEN containing newlines, tabs, or carriage returns could
corrupt ~/.config/gh/hosts.yml before permissions are set (line 314)
and bypass validation in downstream consumers. Defense-in-depth fix
following the pattern established in sh/shared/key-request.sh:78.
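
The kind of check involved can be illustrated in TypeScript. The actual fix lives in a shell script and may strip or reject such characters; this sketch assumes rejection, and `isSingleLineToken` is a hypothetical name.

```typescript
// Returns true when the token is non-empty and contains no newline,
// carriage return, or tab characters.
function isSingleLineToken(token: string): boolean {
  return token.length > 0 && !/[\n\r\t]/.test(token);
}
```

A token that fails this check could otherwise inject extra lines into a YAML file written from it.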
Fixes #2239
Agent: team-lead
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat: pre-built agent tarballs on GitHub Releases for fast install
Adds a nightly GitHub Actions workflow that builds and uploads agent
tarballs to rolling GitHub Releases. During provisioning, the CLI now
attempts to download and extract a tarball before falling back to live
install. Priority chain: snapshot > tarball > live install.
- New workflow: .github/workflows/agent-tarballs.yml
- New capture script: packer/scripts/capture-agent.sh
- New module: packages/cli/src/shared/agent-tarball.ts
- Orchestrate tries tarball first on non-local clouds
- Skip tarball when using DO snapshot (skipTarball flag)
- Tests for tarball install + orchestration integration
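
The priority chain can be sketched as a simple decision function. This is a simplification: in practice the tarball is attempted and the CLI falls back on failure, whereas the sketch assumes availability is known up front. All names are hypothetical.

```typescript
type InstallStrategy = "snapshot" | "tarball" | "live";

// Hypothetical inputs describing what is available for this provisioning run.
interface InstallOptions {
  hasSnapshot: boolean;
  isLocalCloud: boolean;
  tarballAvailable: boolean;
}

function chooseInstallStrategy(opts: InstallOptions): InstallStrategy {
  if (opts.hasSnapshot) return "snapshot"; // skip tarball and live install
  if (!opts.isLocalCloud && opts.tarballAvailable) return "tarball";
  return "live"; // fallback: run the agent's normal install command
}
```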
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use global.fetch mock pattern and address security review
- Use `global.fetch = mock(...)` instead of `spyOn(globalThis, "fetch")`
to match codebase convention and fix CI mock interception
- Add URL validation regex to reject shell metacharacters (CRITICAL)
- Add agent name validation in workflow input (MEDIUM)
- Add `jq has()` check before executing install commands (CRITICAL)
- Use `tar -T` instead of unquoted word-splitting in capture-agent.sh (MEDIUM)
- Resolve merge conflicts with upstream/main (keep Docker fields, adapt
to simplified DO flow, bump version to 0.15.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use globalThis.fetch for testability in CI
Bun's native fetch binding doesn't go through global.fetch property
lookup, so global.fetch = mock(...) doesn't intercept it. Using
globalThis.fetch explicitly ensures the mock interception works.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add missing packer dependencies and harden install command safety
- Add packer/agents.json (agent tier + install command definitions)
- Add packer/scripts/tier-{minimal,node,bun,full}.sh (dependency scripts)
- Add basic command safety check rejecting suspicious patterns
- Document packer/agents.json as a trust boundary requiring PR review
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tarballs): fix npm prefix mismatch, add apt-get update, cleanup
- Add apt-get update -y before apt-get install in all tier scripts
- Add --prefix ~/.npm-global to npm install commands in agents.json
so installed packages land where capture-agent.sh expects them
- Rename misleading MARKER_DIR → MARKER_FILE in capture-agent.sh
- Remove stale comment referencing packer snapshots in workflow
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tarballs): detect empty agent installs in capture script
The "no files found" check was dead code — the marker file is always
created before filtering, so FILTERED_FILE always had at least one
entry. Now we count non-marker entries to catch cases where the agent
install silently fails and no actual files are on disk.
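The fix lives in the shell script capture-agent.sh; the TypeScript below only illustrates the counting logic under assumed names (`countAgentFiles`, `fileList`, `markerPath` are hypothetical).

```typescript
// Hypothetical sketch: the marker file is always present in the list, so an
// "empty list" check can never fire. Count everything except the marker.
function countAgentFiles(fileList: string[], markerPath: string): number {
  return fileList
    .map((line) => line.trim())
    .filter((line) => line !== "" && line !== markerPath).length;
}

function hasAgentFiles(fileList: string[], markerPath: string): boolean {
  return countAgentFiles(fileList, markerPath) > 0;
}
```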
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tarballs): use bare fetch() for Bun mock compatibility in CI
In Bun, global.fetch = mock(...) overrides bare fetch() calls but NOT
globalThis.fetch() calls. Every other source file in the codebase uses
bare fetch() and their mocks work fine in CI. Switch to match.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tarballs): use dependency injection for fetch in tests
Bun's global.fetch mock doesn't reliably intercept bare fetch() calls
across all Bun versions in CI. Instead of fighting the runtime, accept
an optional fetchFn parameter (defaults to fetch) and pass mock fetch
directly in tests.
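A minimal sketch of the injection pattern, assuming a simplified signature: `FetchLike` and `tarballAvailable` are hypothetical names, and only the optional `fetchFn` parameter defaulting to `fetch` comes from this commit.

```typescript
// Hypothetical sketch: the function takes its fetch implementation as an
// optional parameter, so tests never need to patch a global.
type FetchLike = (url: string) => Promise<{ ok: boolean }>;

async function tarballAvailable(
  url: string,
  fetchFn: FetchLike = fetch,
): Promise<boolean> {
  try {
    const res = await fetchFn(url);
    return res.ok;
  } catch {
    // Network errors are treated as "not available" rather than fatal.
    return false;
  }
}
```

In a test, a stub such as `async () => ({ ok: true })` can be passed as the second argument, which works the same way on any runtime version.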
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tarballs): bypass mock.module bleed in agent-tarball tests
orchestrate.test.ts uses mock.module("../shared/agent-tarball", ...)
which is process-global in Bun and bleeds into agent-tarball.test.ts.
Import via URL (import.meta.url resolution) to bypass the specifier-
based mock matching and get the real module.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tarballs): eliminate mock.module bleed between test files
Bun's mock.module is process-global — orchestrate.test.ts mocking
agent-tarball poisoned agent-tarball.test.ts (the mock function
ignored the fetchFn parameter and always returned false).
Fix: make tryTarballInstall injectable via OrchestrationOptions.
orchestrate.test.ts passes the mock directly via options instead
of using mock.module. agent-tarball.test.ts imports the real module.
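The options-based injection might look roughly like this. The shapes are assumptions: the real `OrchestrationOptions` has other fields, and `orchestrate`, `defaultTryTarballInstall`, and the return values are hypothetical stand-ins.

```typescript
// Hypothetical sketch: the dependency is overridable per call through the
// options object, so no process-global module mock is needed.
type TryTarballInstall = (agent: string) => Promise<boolean>;

interface OrchestrationOptions {
  tryTarballInstall?: TryTarballInstall;
}

// Stand-in for the real implementation imported from the tarball module.
const defaultTryTarballInstall: TryTarballInstall = async () => false;

async function orchestrate(
  agent: string,
  options: OrchestrationOptions = {},
): Promise<"tarball" | "live"> {
  const tryTarball = options.tryTarballInstall ?? defaultTryTarballInstall;
  return (await tryTarball(agent)) ? "tarball" : "live";
}
```

Because the override is scoped to a single call, one test file cannot change what another test file imports.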
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tests): mock Bun.which in credential priority tests
Tests assumed no cloud CLIs were installed, but machines with hcloud/
doctl would get "CLI installed" hint overrides, failing the assertion.
Spy on Bun.which to return null so tests are environment-independent.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: fix import ordering after rebase
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* security: add curl domain allowlist and expand command blocklist
Addresses security review findings:
- Add domain allowlist for curl/wget targets (claude.ai, opencode.ai,
raw.githubusercontent.com, registry.npmjs.org, crates.io, github.com)
- Expand suspicious command blocklist (python -c, perl -e, ruby -e, dd, /dev/)
- Document 4-layer security model in workflow comments
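As a rough illustration of the allowlist and blocklist described above: the checks in this commit may well live in the workflow rather than in TypeScript, and `checkInstallCommand`, the regexes, the exact-hostname matching, and the choice to check every URL in the command (not only curl/wget arguments) are all assumptions.

```typescript
// Hypothetical sketch of the two checks: a blocklist of suspicious command
// patterns and an allowlist of download domains.
const ALLOWED_DOMAINS = new Set([
  "claude.ai",
  "opencode.ai",
  "raw.githubusercontent.com",
  "registry.npmjs.org",
  "crates.io",
  "github.com",
]);

const SUSPICIOUS_PATTERNS: RegExp[] = [
  /\bpython3?\s+-c\b/,
  /\bperl\s+-e\b/,
  /\bruby\s+-e\b/,
  /\bdd\s/,
  /\/dev\//,
];

function checkInstallCommand(cmd: string): { ok: boolean; reason?: string } {
  for (const pattern of SUSPICIOUS_PATTERNS) {
    if (pattern.test(cmd)) {
      return { ok: false, reason: `suspicious pattern: ${pattern.source}` };
    }
  }
  // Extract every http(s) URL and require an exact hostname match.
  const urls = cmd.match(/https?:\/\/[^\s"'|;)]+/g) ?? [];
  for (const raw of urls) {
    let host: string;
    try {
      host = new URL(raw).hostname;
    } catch {
      return { ok: false, reason: `unparseable URL: ${raw}` };
    }
    if (!ALLOWED_DOMAINS.has(host)) {
      return { ok: false, reason: `domain not allowed: ${host}` };
    }
  }
  return { ok: true };
}
```

A pattern blocklist like this is easy to bypass on its own, which is presumably why it is one layer among several rather than the only defense.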
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* security: add rm -rf to command blocklist
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Signed-off-by: Ahmed Abushagur <ahmed@abushagur.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Remove sh/e2e/aws-e2e.sh: dead backwards-compat wrapper with no
references (superseded by unified e2e.sh --cloud aws)
- Remove getStatusDescription from commands/shared.ts: defined and
tested but never called in production code
- Remove parseJsonRaw from packages/cli/src/shared/parse.ts: zero
production usages (still available in packages/shared if needed)
- Update corresponding test files to remove dead code tests
- Bump CLI version to 0.14.4
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
- manifest.json: change aws auth to AWS_ACCESS_KEY_ID+AWS_SECRET_ACCESS_KEY
so the key-request system includes AWS in its missing-key emails
- sh/e2e/e2e.sh: clouds missing credentials now SKIP (not FAIL), so
running --cloud all is safe and only tests what's configured
- qa.sh: include e2e mode in cloud credential loading (was fixtures+quality only)
- qa-quality-prompt.md: e2e-tester now runs e2e.sh --cloud all --parallel 6 --skip-input-test
- qa-e2e-prompt.md: standalone e2e bot now runs e2e.sh --cloud all --parallel 6
Also wires KEY_SERVER_URL + KEY_SERVER_SECRET into /etc/spawn-qa-auth.env
(system change, not in this commit) so missing-key emails are actually sent.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Ahmed Abushagur <ahmed@abushagur.com>
Add key_request: false to Daytona in manifest.json and update
_parse_cloud_auths() to skip clouds with that flag set.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: add unique spawn IDs to prevent history record corruption
History records were matched by heuristic ("most recent record for this
cloud without a connection"), which caused saveVmConnection and
saveLaunchCmd to overwrite the wrong record during concurrent or failed
spawns.
Fix: every SpawnRecord now has a unique `id` (UUID). All history
operations (saveVmConnection, saveLaunchCmd, removeRecord,
markRecordDeleted, mergeLastConnection) match by id when available,
falling back to the old heuristic for pre-migration records.
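The id-first lookup could be sketched as follows. This is a simplified illustration: the `SpawnRecord` fields other than `id`, the `findRecordIndex` helper, and the assumption that records are stored oldest first are hypothetical.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical, simplified record shape.
interface SpawnRecord {
  id?: string; // absent on pre-migration records
  cloud: string;
  connection?: string; // stand-in for the real connection data
}

function generateSpawnId(): string {
  return randomUUID();
}

// Match by id when a spawn id is supplied; otherwise fall back to the old
// heuristic: the most recent record for this cloud without a connection.
function findRecordIndex(
  records: SpawnRecord[],
  cloud: string,
  spawnId?: string,
): number {
  if (spawnId !== undefined) {
    return records.findIndex((r) => r.id === spawnId);
  }
  for (let i = records.length - 1; i >= 0; i--) {
    const r = records[i];
    if (r && r.cloud === cloud && r.connection === undefined) {
      return i;
    }
  }
  return -1;
}
```

With two concurrent spawns on the same cloud, the heuristic always picks the newer record, whereas the id lookup picks the record that actually belongs to the caller.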
The orchestrator (TS path) now creates the history record AFTER server
creation succeeds, not before — so failed provisions don't leave orphan
entries.
Also adds "Remove from history" option to the spawn ls action picker,
restoring the ability to soft-delete entries without destroying the VM.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add 18 unit tests for spawn ID history behavior
Tests cover:
- generateSpawnId returns unique UUIDs
- saveSpawnRecord auto-generates id when not provided
- saveVmConnection matches by spawnId (not heuristic)
- saveVmConnection does not cross-contaminate concurrent spawns
- saveVmConnection falls back to heuristic without spawnId
- saveLaunchCmd matches by spawnId (not heuristic)
- saveLaunchCmd falls back without spawnId
- removeRecord matches by id, not by timestamp+agent+cloud
- removeRecord handles duplicate timestamps correctly
- removeRecord falls back for legacy records without id
- markRecordDeleted targets correct record by id
- mergeLastConnection uses spawn_id from last-connection.json
- mergeLastConnection falls back to heuristic without spawn_id
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: enable biome import sorting with grouped imports
Adds organizeImports to biome assist config with groups:
1. Type imports
2. Node built-ins
3. Third-party packages
4. @openrouter/* packages
5. Aliases
Auto-fixed import order and lint issues across all TypeScript files,
including .claude/skills/ and packages/cli/src/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>