Commit graph

1869 commits

Author SHA1 Message Date
A
2d69b2806b
fix: improve cloud descriptions for non-technical users (#2328)
Cherry-picks UX improvements from #2321: simplifies cloud descriptions
to plain language, adds account/payment requirements upfront so users
know what they need before starting.

Fixes #2323

Agent: ux-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-08 04:07:25 -07:00
Ahmed Abushagur
bc0c1827bb
fix: reorder auth flow and persist OpenRouter API key (#2320)
* fix: reorder auth flow and persist OpenRouter API key across retries

Two onboarding issues reported by users:

1. After DigitalOcean OAuth, the message said "OpenRouter authentication
   in 5s..." but then a GitHub CLI prompt appeared first. Fix: move API
   key acquisition immediately after cloud auth, before preProvision
   hooks (which include the GitHub prompt). Remove the misleading 5s
   delay message.

2. On retry after billing failure, DigitalOcean token was remembered but
   the OpenRouter API key was lost (only stored in process.env). Fix:
   persist the key to ~/.config/spawn/openrouter.json and load it on
   subsequent runs, matching how cloud tokens are already persisted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add mode 0o700 to config dir and await saveOpenRouterKey

- Add mode: 0o700 to mkdirSync in saveOpenRouterKey to match other cloud
  modules (aws, hetzner, digitalocean) and prevent directory permission leak
- Add missing await on saveOpenRouterKey(manualKey) to ensure manual API
  keys persist to disk before the function returns

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
2026-03-08 06:48:14 -04:00
A
de732fa695
fix: prevent command injection in _sprite_exec via stdin piping (#2329)
Pipe the command via stdin to bash instead of embedding it in a bash -c
string. This eliminates shell injection risk from unquoted cmd parameter,
consistent with _sprite_exec_long in the same file and other cloud drivers.

Fixes #2327

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-08 06:44:19 -04:00
A
fedd024801
refactor: remove dead runServerCapture from all cloud modules (#2325)
The runServerCapture function was defined in aws, hetzner, gcp, and
digitalocean modules but never called anywhere in the codebase. All
cloud modules use runServer (which streams to stderr) and the
CloudRunner interface only requires runServer, not runServerCapture.

Bump CLI version 0.15.14 → 0.15.15.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 01:47:33 -08:00
Ahmed Abushagur
a215848cac
fix: skip SSH key selection prompt, use all keys automatically (#2326)
New users don't know which SSH key to pick. Just use all discovered
keys silently (ed25519 sorted first). If none exist, generate one.

Signed-off-by: Ahmed Abushagur <ahmed@abushagur.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 05:45:13 -04:00
Ahmed Abushagur
dda6d53db7
fix: skip model selection prompt, default to openrouter/auto (#2322)
New users don't know what LLM models are — prompting them to pick one
with no context is confusing and openrouter/auto can route to weak
models. Remove the interactive model prompt entirely; agents use their
modelDefault silently (or MODEL_ID env var for power users).

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 00:54:46 -08:00
Ahmed Abushagur
ff3a60267c
feat: add billing/payment setup guidance for new cloud users (#2319)
Detect billing-related server creation errors, open the cloud's billing
page in the browser, and prompt the user to retry after adding a payment
method. Adds pre-flight account checks for DigitalOcean (account status)
and GCP (billing enabled).

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-08 04:50:51 -04:00
Ahmed Abushagur
c9792f1213
fix: remove banned as type assertions from key-server.ts (#2324)
Replace 3 `as` casts with runtime narrowing:
- `m.clouds as Record<string, any>` → toRecord() helper
- `body.providers as string[]` → Array.isArray + typeof guard
- `fd.get(...) as string` → typeof guard

Closes #2268

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-08 04:49:09 -04:00
A
26149d14b1
fix(spa): detect HTML auth redirects in Slack file downloads (#2316)
Slack file downloads fail silently when the bot token lacks the
files:read OAuth scope — Slack returns an HTML login page instead of
the actual file bytes. This causes Claude Code to send corrupt "images"
to the Anthropic API, which returns 400 "Could not process image".

Changes:
- Add files:read scope to slack-manifest.yml
- Add Content-Type header check in downloadSlackFile (catches text/html)
- Add magic-byte check via looksLikeHtml() as defense-in-depth
- Add tests for both validation paths and the looksLikeHtml helper

Note: After merging, the Slack app must be reinstalled to pick up the
new files:read scope on the bot token.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-08 04:48:37 -04:00
Ahmed Abushagur
0ff1da1093
fix: remove redundant GitHub CLI prompt during provisioning (#2318)
Auto-detect GitHub credentials (GITHUB_TOKEN env var or `gh auth token`)
instead of interactively asking users. Rename promptGithubAuth → detectGithubAuth.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 00:01:09 -08:00
A
459e25a844
feat(cli): show connect-or-create menu when existing spawns are present (#2310)
* feat(cli): show connect-or-create menu when existing spawns are present

When the user runs `spawn` with no arguments and has active servers in
history, display a top-level menu before jumping into the create flow:

  What would you like to do?
  ❯ Connect to existing server
    Create a new server

Selecting "Connect to existing server" opens the same interactive picker
as `spawn list` (activeServerPicker). Selecting "Create a new server" or
having no existing spawns continues with the current create flow, so
there is no behaviour change for first-time users.

Fixes #2308

Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* chore(cli): bump version to 0.15.14

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-08 01:56:37 -05:00
A
053c0a8aec
test: remove 34 theatrical tests from manifest-cache-lifecycle.test.ts (#2317)
Remove tests that verify JavaScript language semantics rather than
application logic. These tests would pass even if the source code
were deleted:

- 18 isValidManifest tests (JS truthiness of null, 0, false, "", [])
- 7 matrixStatus edge cases (Object property lookup with hyphens,
  underscores, empty strings, long keys)
- 5 agentKeys/cloudKeys ordering tests (Object.keys insertion order,
  an ES2015 spec guarantee)
- 3 countImplemented tests (for-loop over 1000 items, single entry,
  non-standard statuses)

Kept 17 tests that exercise real application behavior: cache corruption
recovery, HTTP error fallback, in-memory cache, fallback chains, and
countImplemented case-sensitivity.

Closes #2315

Agent: test-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-08 01:18:54 -05:00
A
bb290c37df
docs: sync README matrix with manifest.json (add Junie) (#2312)
manifest.json has 8 agents (added Junie) and 48 implemented combinations,
but README tagline said "7 agents / 42 combinations" and the matrix table
was missing the Junie row.

-- qa/record-keeper

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-08 00:07:22 -05:00
A
23fea2df21
fix(e2e): add junie agent to E2E test harness (#2314)
The junie agent was added in #2300 but the E2E test scripts were not
updated. This adds junie to ALL_AGENTS, verify dispatch, input test
dispatch, and the provision.sh fallback env configuration.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-08 00:03:32 -05:00
A
bd41641c11
fix(cli): improve visual spacing in spawn list output (#2311)
- Interactive picker: add blank separator line between entries so label
  and subtitle are visually grouped (not blending into adjacent entries)
- Non-interactive table: wrap subtitle in pc.dim() for better contrast
  with the bold entry name
- Update pickerHeight to account for added separator lines

Fixes #2309

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-08 00:01:53 -05:00
A
252e8fc726
feat: add Junie CLI (JetBrains) agent across all 6 clouds (#2300)
Adds JetBrains' Junie CLI as a new agent in the spawn matrix.

- agent: npm install -g @jetbrains/junie-cli, launched via `junie`
- env: JUNIE_OPENROUTER_API_KEY (native OpenRouter BYOK support)
- cloudInitTier: node (npm-based install)
- matrix: all 6 clouds implemented (local, hetzner, aws, digitalocean, gcp, sprite)
- icon: JetBrains org avatar (assets/agents/junie.png)
- tests: 7 unit tests in junie-agent.test.ts
- version bump: 0.15.9 → 0.15.10

Closes #2296

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 19:38:45 -08:00
A
51dec6e877
fix: E2E failures - SSH key gen race, hetzner 409, hermes binary path (#2305)
Three distinct E2E bugs fixed:

1. SSH key generation race condition: When multiple agents provision in
   parallel, concurrent processes all call generateSshKey() and race to
   create ~/.ssh/id_ed25519. ssh-keygen won't overwrite an existing file
   (prompts on stdin which is "ignore"), causing zeroclaw/codex to fail
   with "SSH key generation failed". Fix: check if key already exists
   before generating, and re-check after a failed generation attempt.

2. Hetzner SSH key 409 uniqueness_error: The Hetzner API returns HTTP 409
   with "SSH key not unique" when the same key content is registered under
   a different name. The hetznerApi() function throws on non-2xx before
   the error-parsing code runs, and the regex /already/ didn't match
   "not unique". Fix: catch 409 in ensureSshKey() and match against
   uniqueness_error/not unique/already patterns.

3. Hermes binary not found: The hermes install script (uv tool) creates
   the actual binary + venv at ~/.hermes/hermes-agent/venv/ with a symlink
   at ~/.local/bin/hermes. The tarball capture script only captured the
   symlink + ~/.local/share/, leaving a dangling symlink. Fix: include
   ~/.hermes/ in capture paths, add venv/bin to verify.sh PATH check,
   and update hermes launchCmd to include the venv PATH.

Fixes #2304

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 22:05:44 -05:00
A
e7ac388110
fix: make credential hint tests environment-independent (#2303)
Tests for getScriptFailureGuidance were failing when cloud credential
env vars (HCLOUD_TOKEN, DO_API_TOKEN) were set in the environment.
The tests expected these vars to appear as "missing" in the output,
but only unset OPENROUTER_API_KEY. Now both the cloud-specific var
and OPENROUTER_API_KEY are saved/unset before each test.

Bump CLI version to 0.15.11.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-07 20:41:52 -05:00
A
90ae485c02
fix: add per-process timeout to SSH handshake probes in waitForSsh (#2299)
The Phase 2 SSH handshake loop in waitForSsh spawns SSH processes
without a per-process timeout. ConnectTimeout=10 only covers TCP
connect — if sshd accepts the connection but stalls during key
exchange or authentication, the process hangs indefinitely. This
causes the entire spawn command to freeze with no way to recover.

Add a 30s killWithTimeout guard to each probe, matching the pattern
already used in every cloud-specific runServer/uploadFile function.

-- refactor/code-health

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 18:40:48 -05:00
A
099ad8940e
feat(e2e): send agent x cloud matrix email on completion (#2297)
After every e2e run, send an HTML matrix report to KEY_REQUEST_EMAIL
via Resend showing pass/fail/skip per agent x cloud combination.

- e2e.sh: add send_matrix_email() — builds result table from LOG_DIR
  result files, writes temp TS, calls bun run to POST to Resend API.
  Called just before exit so LOG_DIR is still available.
- qa.sh (e2e mode): load RESEND_API_KEY + KEY_REQUEST_EMAIL from
  /etc/spawn-key-server-auth.env before launching Claude so the creds
  are inherited by the e2e.sh subprocess.

Both changes are no-ops when credentials are absent (silent skip).

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 14:07:55 -08:00
A
1991ffcb15
fix: add timeout protection to uploadFile across all SSH-based clouds (#2298)
All four SSH-based uploadFile functions (Hetzner, DO, AWS, GCP) used
`await proc.exited` on SCP subprocesses without any timeout guard.
If SCP hangs due to a network issue, the CLI hangs indefinitely.

This adds the same killWithTimeout pattern already used by runServer
and runServerCapture in these same files: a 120-second timeout that
kills the SCP process if it stalls.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 13:48:11 -08:00
Ahmed Abushagur
7bebc6558f
feat: full marketplace compliance + automated Vendor API submission (#2295)
Packer template:
- Match official 90-cleanup.sh: remove SSH host keys, create
  revoked_keys, remove cloud-init instances, zero-fill free space,
  use --force-confold for upgrades, autoremove/autoclean
- Add Packer manifest post-processor for snapshot ID extraction
- Remove PACKER_LOG=1 (debug logging not needed in production)

Workflow:
- Add "Submit to DO Marketplace" step after successful build
- Reads agent→app_id mapping from MARKETPLACE_APP_IDS secret (JSON)
- Extracts snapshot ID from Packer manifest, PATCHes Vendor API
- Gracefully handles 400 (app already pending review)
- Skips silently if no MARKETPLACE_APP_IDS secret is configured

Setup: add MARKETPLACE_APP_IDS secret as JSON, e.g.:
  {"claude":"60089fc6...", "codex":"60089fc7..."}
App IDs come from the DO Vendor Portal after initial approval.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 16:40:04 -05:00
A
dadb2387e2
refactor: Fix stale references in qa-quality-prompt and test README (#2294)
- Fix qa-quality-prompt.md references to non-existent packages/shared/src/
  (only packages/cli/ exists; shared code lives in packages/cli/src/shared/)
- Add missing test file entries to __tests__/README.md:
  do-snapshot.test.ts and ui-utils.test.ts

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 15:42:36 -05:00
A
ce06492cb7
fix: use exact-line match for INPUT_TEST_MARKER in E2E verify functions (#2293)
Fixes #2292

Unanchored grep -q would match the marker anywhere in output, including
error messages like "Expected SPAWN_E2E_OK but got...". Using grep -qx
requires the marker to appear as a complete line, preventing false passes.

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 14:40:06 -05:00
A
52addf16e5
fix: remove BASH_SOURCE usage from all cloud agent scripts (Fixes #2285) (#2289)
All 42 agent scripts across 6 clouds used BASH_SOURCE[0] with dirname
for local checkout detection. This breaks curl|bash execution because
BASH_SOURCE resolves to /dev/fd/XX instead of a real path.

Remove the BASH_SOURCE-based SCRIPT_DIR detection and the "Local checkout"
code path from all scripts. The SPAWN_CLI_DIR env var (used by e2e tests)
is the correct mechanism for running from source. Local cloud scripts
that previously lacked SPAWN_CLI_DIR support now have it.

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 14:12:10 -05:00
A
1740274323
fix: replace base64 interpolation with stdin piping in all cloud exec_long functions (#2290)
Replace unsafe pattern where base64-encoded commands were interpolated
into remote command strings with secure stdin piping — command data now
travels as stdin rather than as part of the command string, eliminating
injection risk from shell metacharacter interpretation.

Affected functions across all 5 cloud drivers:
- _hetzner_exec_long
- _aws_exec_long
- _gcp_exec_long
- _digitalocean_exec_long
- _sprite_exec_long

Fixes #2286
Fixes #2287

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 14:09:15 -05:00
A
735e80e376
fix: replace base64 interpolation with stdin piping in verify.sh (Fixes #2283) (#2284)
* fix: replace base64 interpolation with stdin piping in verify.sh (Fixes #2283)

Replace unsafe pattern where encoded prompt was interpolated into remote
command strings with secure stdin piping — prompt data now travels as stdin
rather than as part of the command string, eliminating injection risk.

Affected functions: input_test_claude, input_test_codex, input_test_openclaw,
input_test_zeroclaw.

Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: use cloud_exec (not cloud_exec_long) for stdin piping

cloud_exec_long ignores stdin - remote base64 -d would hang.
cloud_exec passes cmd to bash -c, which preserves stdin piping.

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: restore timeout protection for input tests using cloud_exec

Wraps each agent command in `timeout ${INPUT_TEST_TIMEOUT}` on the remote
side so tests cannot hang indefinitely after switching from cloud_exec_long
to cloud_exec.  Updates stale comment referencing cloud_exec_long.

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 12:41:50 -05:00
A
6eb0234f81
refactor: remove unnecessary exports from cloud modules (#2288)
De-export interfaces, types, and constants that are only used within
their own module files. These were exported but never imported by any
other module or test file, unnecessarily widening the public API surface.

Affected symbols:
- aws: AwsState, Region, REGIONS, AGENT_BUNDLE_DEFAULTS
- digitalocean: DigitalOceanState, DropletSize, DROPLET_SIZES, DoRegion, DO_REGIONS
- gcp: GcpState, MachineTypeTier, MACHINE_TYPES, ZoneOption, ZONES
- hetzner: HetznerState, ServerTypeTier, SERVER_TYPES, LocationOption, LOCATIONS
- sprite: SpriteState

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-07 11:44:55 -05:00
A
70d8462e56
fix: add explicit input validation to capture-agent.sh (Fixes #2281) (#2282)
Add whitelist validation for AGENT_NAME immediately after the empty
check to prevent command injection and path traversal via the parameter.
While the existing case statement catches unknown agents, explicit
upfront validation makes the security intent clear and defensive.

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 06:27:28 -08:00
A
bf28ccde87
fix: remove stale TODO(#2041) reference (issue is closed) (#2280)
The PKCE migration TODO referenced closed issue #2041. The TODO
itself is still valid (DigitalOcean still doesn't support PKCE),
so keep the migration checklist but drop the issue number.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-07 07:49:34 -05:00
A
92e8618d20
refactor: Remove dead code and stale references (#2278)
* refactor: remove commands.ts compatibility shim and fix stale references

- Delete packages/cli/src/commands.ts shim file (only re-exported commands/index.ts)
- Update index.ts to import directly from ./commands/index.js
- Update 24 test files to import from ../commands/index.js
- Fix stale CLAUDE.md reference to commands.ts
- Fix stale QA prompt references to commands.ts and wrong line numbers
- Bump CLI version to 0.15.8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: remove stale references to deleted commands.ts compatibility shim

---------

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-07 03:56:13 -05:00
A
0ef8eb4467
fix: validate v0 history entries against SpawnRecordSchema (#2279)
The v0 fallback path in loadHistory() returned raw parsed JSON array
directly without validating individual elements. This could cause
TypeErrors (e.g. r.agent.toLowerCase() on undefined) in callers like
getActiveServers and filterHistory when corrupted entries exist.

Now filters each element through v.safeParse(SpawnRecordSchema, el),
matching the validation the v1 path already performs.

Fixes #2277

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 03:47:11 -05:00
Ahmed Abushagur
7643b96266
fix: pass DO Marketplace img_check validation (#2276)
Three fixes for marketplace validation failures:

1. Install all security updates (apt-get dist-upgrade) — img_check
   fails if any security patches are pending.
2. Purge droplet-agent and /opt/digitalocean — img_check fails if
   the DO monitoring agent directory exists.
3. Correct img_check.sh filename to 99-img-check.sh — the previous
   URL returned 404.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 02:43:46 -05:00
Ahmed Abushagur
4719b49754
fix: correct img_check.sh filename to 99-img-check.sh (#2275)
The marketplace-partners repo uses `99-img-check.sh`, not
`img_check.sh`. The wrong filename caused a 404 on curl download,
failing all agent builds with exit code 22.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 01:48:52 -05:00
Ahmed Abushagur
5103a763b4
fix: packer build — OOM kill and history builtin (#2274)
* fix: claude snapshot build — remove npm fallback from install command

The native install (curl | bash) succeeds but exits non-zero due to a
PATH warning. The || fallback then tries `npm install` which doesn't
exist on the "minimal" tier → exit 127.

Fix: replace npm fallback with binary existence check (same pattern
as hermes agent). If install exits non-zero but ~/.local/bin/claude
exists, the build succeeds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: snapshot cleanup and lookup — use name prefix instead of tags

DO Packer builder `tags` only apply to the temporary build droplet,
not the resulting snapshot image. Both the workflow cleanup step and
the CLI's findSpawnSnapshot() were querying by `tag_name` which
returned nothing — old snapshots piled up and the CLI couldn't find
existing snapshots.

Fix: filter by snapshot name prefix (`spawn-{agent}-`) instead of
tags, in both the workflow and the CLI. Remove misleading `tags`
from the Packer template. Add test cases for name-prefix filtering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: packer build failures — OOM kill + history builtin

Two issues introduced by PR #2271 (marketplace compliance):

1. Droplet downsized to s-1vcpu-1gb (1GB RAM) — Claude's native
   installer and zeroclaw's Rust build get OOM-killed. Restore
   s-2vcpu-2gb.

2. Cleanup provisioner uses `history -c` which is a bash builtin.
   Packer runs scripts with /bin/sh (dash on Ubuntu) which doesn't
   have it → exit 127 on ALL agents. Remove it — the .bash_history
   file deletion already handles persistent history.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 01:15:39 -05:00
Ahmed Abushagur
d77a067aa4
fix: snapshot cleanup + claude install (name-prefix filter) (#2273)
* fix: claude snapshot build — remove npm fallback from install command

The native install (curl | bash) succeeds but exits non-zero due to a
PATH warning. The || fallback then tries `npm install` which doesn't
exist on the "minimal" tier → exit 127.

Fix: replace npm fallback with binary existence check (same pattern
as hermes agent). If install exits non-zero but ~/.local/bin/claude
exists, the build succeeds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: snapshot cleanup and lookup — use name prefix instead of tags

DO Packer builder `tags` only apply to the temporary build droplet,
not the resulting snapshot image. Both the workflow cleanup step and
the CLI's findSpawnSnapshot() were querying by `tag_name` which
returned nothing — old snapshots piled up and the CLI couldn't find
existing snapshots.

Fix: filter by snapshot name prefix (`spawn-{agent}-`) instead of
tags, in both the workflow and the CLI. Remove misleading `tags`
from the Packer template. Add test cases for name-prefix filtering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 21:32:58 -08:00
A
c3cb98daab
feat: add DO Marketplace compliance to Packer build pipeline (#2271)
- Switch build droplet from s-2vcpu-2gb to s-1vcpu-1gb ($6/mo) per DO
  Marketplace recommendation for cross-size snapshot compatibility
- Add ufw firewall provisioner (deny incoming, allow SSH, enable)
- Replace basic apt-get clean with full DO Marketplace cleanup sequence:
  removes SSH authorized_keys, clears bash history, truncates /var/log,
  resets machine-id, and runs cloud-init clean so each launched droplet
  gets a fresh identity on first boot
- Add img_check.sh validation step (from digitalocean/marketplace-partners)
  to verify firewall active, no root password, and security posture before
  the snapshot is finalized — build fails if image doesn't meet requirements

Fixes #2269

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 00:20:35 -05:00
Ahmed Abushagur
955a6081c1
fix: Packer build region/size and PATH for agent installs (#2270)
* feat: restore Packer DO snapshot pipeline for fast agent boot

Restores the nightly Packer snapshot build pipeline (reverted in #2205)
that pre-bakes agent images as DigitalOcean snapshots. When a snapshot
exists on the user's account, droplet boot skips cloud-init and tarball
install entirely — cutting provisioning from ~10min to ~2min.

- Add `packer/digitalocean.pkr.hcl` HCL2 template with multi-region
  distribution, apt-lock wait, and snapshot marker
- Add `.github/workflows/packer-snapshots.yml` nightly build with
  matrix strategy, auto-cleanup of old snapshots, and injection-safe
  env var handling
- Add `findSpawnSnapshot()` to query DO API for pre-built snapshots
- Add `waitForSshOnly()` for snapshot boots (skip cloud-init wait)
- Modify `createServer()` to accept optional `snapshotId` param
- Wire snapshot detection in DO `main.ts` orchestrator
- Add `skipAgentInstall` to `CloudOrchestrator` interface to skip
  tarball + install steps when booting from snapshot
- Add 5 unit tests for snapshot lookup (happy path, empty, error,
  invalid ID, network failure)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use repo-root-relative path for tier scripts in Packer template

Packer resolves script paths relative to cwd (repo root), not relative
to the .pkr.hcl file. Changed `scripts/tier-*.sh` to
`packer/scripts/tier-*.sh`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: Packer build region/size and PATH for agent installs

Two issues causing build failures:

1. `s-2vcpu-4gb` not available in `nyc3` — changed build region to
   `sfo3` and size to `s-2vcpu-2gb` (universally available, cheaper,
   sufficient for building snapshots)

2. Claude install puts binary in `~/.local/bin` which isn't in PATH
   during Packer provisioning — added full PATH to environment_vars
   on both the install and marker provisioners so agent binaries and
   subsequent scripts can find each other

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 22:45:39 -05:00
A
3a1de9d4cf
refactor: remove packages/shared, deduplicate with CLI shared (#2257)
* refactor: remove packages/shared, deduplicate with packages/cli/src/shared

packages/shared duplicated packages/cli/src/shared (parse.ts, result.ts,
type-guards.ts) with the CLI never importing from the shared package.
The only consumer was .claude/skills/setup-spa, which now imports directly
from packages/cli/src/shared via relative paths.

- Delete packages/shared entirely
- Update setup-spa imports to use relative paths to CLI shared
- Remove @openrouter/spawn-shared workspace dependency from setup-spa
- Update CLAUDE.md and type-safety.md references

Agent: complexity-hunter
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: remove packages/shared from lint workflow, fix import sorting

The Biome Lint CI step referenced packages/shared/src/ which no longer
exists after this PR removes the package. Also fix import ordering in
setup-spa files to satisfy Biome's organizeImports rule.

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: address Devin review — update stale packages/shared references

- Update type-safety.md line 67: packages/shared/src/parse.ts → packages/cli/src/shared/parse.ts
- Update install.ps1 sparse-checkout: remove packages/shared reference

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-06 21:58:42 -05:00
A
66f0aebebb
docs: Sync README with source of truth (#2264)
manifest.json has 6 clouds (local, hetzner, aws, digitalocean, gcp,
sprite) and 7 agents, yielding 42 implemented matrix entries. The
README tagline incorrectly stated "7 clouds" and "49 combinations"
— likely stale from when Daytona was still listed.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-07 01:43:24 +00:00
Ahmed Abushagur
e7b6b0b9fd
fix: Packer tier script path relative to repo root (#2266)
* feat: restore Packer DO snapshot pipeline for fast agent boot

Restores the nightly Packer snapshot build pipeline (reverted in #2205)
that pre-bakes agent images as DigitalOcean snapshots. When a snapshot
exists on the user's account, droplet boot skips cloud-init and tarball
install entirely — cutting provisioning from ~10min to ~2min.

- Add `packer/digitalocean.pkr.hcl` HCL2 template with multi-region
  distribution, apt-lock wait, and snapshot marker
- Add `.github/workflows/packer-snapshots.yml` nightly build with
  matrix strategy, auto-cleanup of old snapshots, and injection-safe
  env var handling
- Add `findSpawnSnapshot()` to query DO API for pre-built snapshots
- Add `waitForSshOnly()` for snapshot boots (skip cloud-init wait)
- Modify `createServer()` to accept optional `snapshotId` param
- Wire snapshot detection in DO `main.ts` orchestrator
- Add `skipAgentInstall` to `CloudOrchestrator` interface to skip
  tarball + install steps when booting from snapshot
- Add 5 unit tests for snapshot lookup (happy path, empty, error,
  invalid ID, network failure)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use repo-root-relative path for tier scripts in Packer template

Packer resolves script paths relative to cwd (repo root), not relative
to the .pkr.hcl file. Changed `scripts/tier-*.sh` to
`packer/scripts/tier-*.sh`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 17:40:57 -08:00
A
df462645a0
refactor: remove dead reset*State functions and stale Daytona references (#2265)
Remove 5 unused reset*State() exports (aws, hetzner, gcp, digitalocean,
sprite) that were never called anywhere in the codebase. Convert their
associated _state variables from let to const since they are no longer
reassigned.

Remove stale Daytona references in status.ts (comment and IP check)
left over after Daytona cloud provider removal in #2261.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-06 20:39:32 -05:00
Ahmed Abushagur
cefcd56327
feat: restore Packer DO snapshot pipeline for fast agent boot (#2262)
Restores the nightly Packer snapshot build pipeline (reverted in #2205)
that pre-bakes agent images as DigitalOcean snapshots. When a snapshot
exists on the user's account, droplet boot skips cloud-init and tarball
install entirely — cutting provisioning from ~10min to ~2min.

- Add `packer/digitalocean.pkr.hcl` HCL2 template with multi-region
  distribution, apt-lock wait, and snapshot marker
- Add `.github/workflows/packer-snapshots.yml` nightly build with
  matrix strategy, auto-cleanup of old snapshots, and injection-safe
  env var handling
- Add `findSpawnSnapshot()` to query DO API for pre-built snapshots
- Add `waitForSshOnly()` for snapshot boots (skip cloud-init wait)
- Modify `createServer()` to accept optional `snapshotId` param
- Wire snapshot detection in DO `main.ts` orchestrator
- Add `skipAgentInstall` to `CloudOrchestrator` interface to skip
  tarball + install steps when booting from snapshot
- Add 5 unit tests for snapshot lookup (happy path, empty, error,
  invalid ID, network failure)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 16:32:05 -08:00
A
9e26d74ddb
fix: add --prune and --json to KNOWN_FLAGS for spawn status (#2263)
The status command (PR #2254) added --prune and --json flags but did not
register them in KNOWN_FLAGS. This caused the CLI to reject them with
"Unknown flag" errors before the command could even dispatch.

Bump CLI version 0.15.4 -> 0.15.5.

Agent: ux-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-06 19:31:07 -05:00
A
035e4bf830
Remove Daytona cloud provider from codebase (#2261)
Simplify the cloud matrix by removing Daytona. All Daytona-specific code,
scripts, tests, and configuration have been removed. Daytona has been moved
to "Previously Considered" in the Cloud Provider Wishlist (#1183) and can
be revived on community demand.

Closes #2260

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-06 18:53:08 -05:00
A
50397f19a3
fix: narrow validatePrompt patterns to prevent false positives on developer phrases (#2259)
Fixes #2249

The overly broad `>>? word` pattern and generic doubled-operator check
were blocking legitimate natural-language developer prompts like:
- "Fix the merge conflict >> registration flow"
- "Run tests && deploy if they pass"

Root cause: `validatePrompt` is called before the prompt is set as the
`SPAWN_PROMPT` env var. Inside double-quoted shell arguments, `>>` and
`&&` are not interpreted as shell operators, so blocking them provided
no real security benefit while creating confusing UX rejections.

Changes:
- Remove `/>>?\s*[a-zA-Z_]\w{2,}/` pattern (false-positive on >> in English)
- Remove generic `hasDoubledOperators` check (false-positive on && in English)
- Keep all targeted patterns: $(cmd), backticks, ${var}, | bash/sh,
  ; rm -rf, fd redirections, heredoc, process substitution, path redirects
- Update tests: split broad && / || tests into "commands" vs "natural language"
- Add tests asserting all issue #2249 example prompts are now accepted

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-06 15:20:39 -08:00
A
2fd3175103
fix: add schema versioning to history.json (v0 bare array → v1 wrapped) (#2256)
Fixes #2252

history.json now uses a versioned envelope:
  { "version": 1, "records": [...] }

This creates a migration escape hatch for future SpawnRecord shape changes.
loadHistory() transparently reads both v0 (bare array) and v1 formats,
automatically migrating v0 files on next write. All write operations now
use writeHistory() to stamp the current schema version consistently.

Validation uses valibot schemas (VMConnectionSchema, SpawnRecordSchema,
HistoryFileV1Schema) so the structure is verified and typed without `as`
casts. Updated all affected tests to check data.records instead of data.

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 15:17:47 -08:00
A
abc15107eb
feat: add spawn status command to show live server state (#2254)
Implements the `spawn status` command requested in #2253. The command:
- Reads active (non-deleted) cloud servers from history
- Queries Hetzner and DigitalOcean REST APIs in parallel using saved tokens
- Shows a live-state table: ID, Agent, Cloud, IP, State, Since
- States: running (green), stopped (yellow), gone (dim), unknown (dim)
- --prune flag marks gone servers as deleted in history
- --json flag outputs machine-readable JSON for scripting
- `spawn ps` is an alias for `spawn status`

Other clouds (AWS, GCP, Sprite, Daytona) require CLI auth flows that cannot
run non-interactively; they report "unknown" with a helpful hint.

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-06 15:13:24 -08:00
A
f862ee563e
refactor: replace module-level mutable globals with typed state objects in cloud providers (#2255)
Each cloud module (aws, daytona, digitalocean, gcp, hetzner, sprite) previously
stored per-operation state in bare module-level `let` variables, making them
process-global singletons. This is safe for single-cloud CLI invocations today
but creates latent bugs for multi-cloud orchestration and test isolation.

Replace scattered `let` globals with a single typed `_state` object per module:
- `AwsState` / `resetAwsState()` — 8 fields including `selectedBundle`
- `DaytonaState` / `resetDaytonaState()` — 5 fields
- `DigitalOceanState` / `resetDigitalOceanState()` — 3 fields
- `GcpState` / `resetGcpState()` — 5 fields
- `HetznerState` / `resetHetznerState()` — 3 fields
- `SpriteState` / `resetSpriteState()` — 2 fields

Each module exports a `resetXxxState()` function for test isolation. No function
signatures or existing exports were changed.

Fixes #2251

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 18:11:46 -05:00
Ahmed Abushagur
141254c4e1
feat: ARM tarball builds + arch-aware download (#2248)
* feat: ARM tarball builds + arch-aware download

- Add ARM64 matrix entries for native binary agents (zeroclaw, opencode,
  hermes, claude) in agent-tarballs.yml workflow
- Update agent-tarball.ts to detect remote VM arch via uname -m and
  download the correct tarball (x86_64 or arm64)
- Change release strategy to support multiple arch assets per tag
- Document ARM build requirements in discovery.md for future agents
- Bump CLI version to 0.15.2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use sudo for tarball extraction on non-root SSH clouds

On AWS Lightsail, SSH connects as 'ubuntu' (not root), but tarballs
extract to /root/. Without sudo, tar fails with "Permission denied".
Conditionally use sudo when not running as root (id -u != 0).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 17:10:33 -05:00