- Revert local security warning to openclaw-only (was blocking all agents)
- Update spawn skill to document how to run prompts on child VMs:
- Always use `bash -lc` (binaries in ~/.local/bin/ need login shell)
- Claude uses `-p` not `--print` or `--headless`
- Add `--dangerously-skip-permissions` for unattended child VMs
- Don't waste tokens with `which`/`find` or creating non-root users
- Sync all on-disk skill files with embedded version
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- E2E: _digitalocean_max_parallel() now returns 0 (not 1) when no capacity
- E2E: run_agents_for_cloud() skips cloud with actionable error when capacity is 0
- CLI: checkAccountStatus() includes droplet names in limit-reached error message
Fixes#3059
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
resolveEntityKey() and checkEntity() checked manifest.agents[input] directly,
bypassing the disabled filter in agentKeys(). This let users run `spawn cursor
<cloud>` even though cursor is disabled, wasting time provisioning a VM for an
agent that can't route through OpenRouter. Now both functions check the disabled
flag and show the disabled_reason to the user.
Also removes stale cursor references from spawn skill templates injected into
child VMs.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Previously the warning only appeared for openclaw. Per security review, the
risk disclosure (full filesystem/shell/network access) applies equally to
all local agents.
Agent: pr-maintainer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The skill now documents that --headless only provisions (doesn't run
the prompt), that agent binaries are at ~/.local/bin/ (not on PATH),
and that --print should be used for one-shot prompts as root instead
of fighting with permission restrictions.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace base64-into-shell interpolation with SCP-based uploadConfigFile()
for Claude Code settings.json and Cursor CLI config files. This eliminates
the attack surface of injecting encoded payloads into shell command strings.
Add chmod 600 on ~/.openclaw/openclaw.json after writing the Telegram bot
token to prevent other users on the VM from reading the token in plaintext.
Fixes#3033Fixes#3034
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat: pull child spawn history back to parent for `spawn tree`
When the interactive session ends (or headless mode completes), the
parent downloads the child VM's history.json and merges records into
local history. Before downloading, it runs `spawn pull-history` on the
child, which recursively pulls from all grandchildren — so the full
tree collapses up to the root regardless of depth.
Changes:
- Add getParentFields() — sets parent_id/depth on saveSpawnRecord calls
- Add pullChildHistory() — downloads + merges child history after session
- Add `spawn pull-history` command for recursive SSH-based history pull
- Add 11 tests for parseAndMergeChildHistory
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: trigger CI recompute
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): validate user/ip params before SSH exec in pull-history
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): use shared validators for SSH params in pull-history and delete
Replace inline regex checks in pull-history.ts with validateUsername()
and validateConnectionIP() from security.ts, matching the pattern used
across connect.ts, fix.ts, and link.ts. Also add the same validation
to delete.ts:pullChildHistory which had no SSH parameter validation.
orchestrate.ts uses the runner abstraction (not raw user@ip), so its
SSH params come from the cloud provider, not untrusted history records.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Ahmed Abushagur <ahmed@abushagur.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
PR #3015 added --yes and -y flags to the delete command but didn't add
them to KNOWN_FLAGS in flags.ts. This caused `spawn delete --name foo --yes`
to fail with "Unknown flag: --yes" because checkUnknownFlags runs before
dispatchDeleteCommand strips these flags.
Also adds delete-specific flags to --help documentation.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Agents running on spawned VMs couldn't delete child spawns because
`spawn delete` requires an interactive terminal for the picker UI.
Added --name and --yes flags: when both are provided in non-interactive
mode, the server matching the name is deleted without prompts. This
enables agents to manage their own child VMs programmatically.
Updated all skill files to teach agents the headless delete syntax.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: A <258483684+la14-1@users.noreply.github.com>
Replace hand-constructed openrouter.json path with getSpawnCloudConfigPath("openrouter")
for single-source-of-truth path resolution. Remove unused _cloudName parameter since
the function delegates ALL cloud credentials unconditionally.
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Add /^[A-Za-z0-9+/=]+$/ validation after each .toString("base64") call
in delegateCloudCredentials() and injectEnvVars(), consistent with the
pattern established in agent-setup.ts by #2988.
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(ux): replace download spinner with stderr logging, reset terminal before SSH handoff
Fixes two UX issues from live E2E session (#3001):
1. Download spinner (p.spinner from @clack/prompts) wrote ANSI escape codes
to stdout. When stdout is captured (E2E harness, piped output), these
sequences appeared as raw text rather than rendered colors. Replace
p.spinner() in downloadScriptWithFallback and downloadBundle with
logStep/logInfo/logError from shared/ui.ts, which write to stderr and
correctly check isTTY before emitting ANSI codes.
2. Garbled output at start of interactive session (overlapping status lines
from the remote agent's TUI) may be caused by residual ANSI state from
@clack/prompts (hidden cursor, active color attributes). Emit
ESC[?25h ESC[0m to stderr before prepareStdinForHandoff() to explicitly
restore cursor visibility and reset all attributes before the SSH session
takes over.
Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: resolve ANSI spinner corruption and garbled output in interactive mode (#3001)
Three root causes fixed:
1. Spinner wrote to stdout while all other CLI status output goes to stderr,
causing ANSI escape sequence interleaving and corruption when both streams
are merged on a terminal. Redirected all p.spinner() calls to process.stderr.
2. unicode-detect.ts (which sets TERM=linux for SSH sessions to force ASCII
fallback) was only imported in commands/shared.ts but not in shared/ui.ts.
Cloud module entry points (hetzner/main.ts, etc.) that import shared/ui.ts
loaded @clack/prompts without the TERM override, causing Unicode spinner
frames in environments that can't render them.
3. After an interactive SSH session ends, the remote agent's TUI (e.g. Claude
Code) may leave the terminal in raw mode with altered attributes. Added
terminal reset (ANSI attribute reset + stty sane) after spawnInteractive()
returns to prevent garbled post-session output.
Agent: ux-engineer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The CLI help output only listed 3 of 5 beta features (tarball, images,
docker). The error output on invalid beta flags and the README both
correctly listed all 5. This adds the missing parallel and recursive
entries to --help for consistency.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
- Remove `export` from `getTerminalWidth` in commands/info.ts — only
used internally, not exported from commands/index.ts barrel
- Remove `export` from `makeDockerExec` in shared/orchestrate.ts — only
used internally by `makeDockerRunner`, no external callers
- Bump CLI version to 0.26.6
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Sprite has a bun shim at /.sprite/bin/bun that delegates to
$HOME/.bun/bin/bun, but that binary doesn't exist on fresh VMs.
`command -v bun` returns true (finds the shim) so the install script
skips bun installation, then bun fails when actually invoked.
Fixed in two places:
- installSpawnCli: source shell profiles, test `bun --version` (not
just existence), and install bun fresh if it doesn't work
- install.sh: replace `command -v bun` with `bun --version` to detect
broken shims
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: spawn step skipped when no explicit --steps passed
The spawn skill injection condition used `enabledSteps?.has("spawn")`
which is falsy when enabledSteps is undefined (no --steps flag). Now
checks the recursive beta flag directly and falls through when no
explicit steps are selected, matching how auto-update works.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: embed skill content in spawn-skill.ts instead of reading from disk
The skills/ directory exists in the repo but isn't bundled when the CLI
is installed via npm. readSkillContent() couldn't find the files at
runtime, causing "No spawn skill file for agent" on every deploy.
Fixed by embedding all skill content directly as string constants in the
module. Removed fs-based getSkillsDir/readSkillContent/getSpawnSkillSourceFile
in favor of a single AGENT_SKILLS config map with inline content.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously only `settingsB64` had a validation check. Added the same
`/^[A-Za-z0-9+/=]+$/` guard for wrapperB64, unitB64, and timerB64
before they are interpolated into shell commands, closing the consistency gap.
Fixes#2986
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The `skips when no credential files exist` test in recursive-spawn.test.ts
was failing in the full suite (1911 pass, 1 fail) because other test files
(oauth-cov.test.ts, cmd-uninstall-cov.test.ts) write openrouter.json and
hetzner.json to $HOME/.config/spawn/ without cleanup, contaminating the
shared sandbox HOME used by bun's test runner. The test passed in isolation
but failed 100% of the time in the full suite.
Fix: add a beforeEach inside the delegateCloudCredentials describe block
that removes $HOME/.config/spawn/ before each test, making the test
self-contained and immune to cross-file pollution.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: add recursive spawn (--beta recursive)
Enables VMs to spawn child VMs. When --beta recursive is active:
- Injects SPAWN_PARENT_ID, SPAWN_DEPTH, SPAWN_BETA=recursive into .spawnrc
- Installs spawn CLI on the VM via install.sh
- Delegates cloud + OpenRouter credentials to the VM
- Tracks parent_id and depth on SpawnRecord for tree relationships
- Adds `spawn tree` command for full recursive tree view
- Adds `spawn history export` for pulling child history via SSH
- Adds `spawn list --json` and `spawn list --flat` flags
- Adds tree rendering in `spawn list` when parent-child relationships exist
- Adds cascade delete support in delete.ts
- Adds mergeChildHistory() for backward-pass history sync
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add recursive spawn to README
Add --beta recursive to beta features table, new commands
(spawn tree, spawn history export, spawn list --flat/--json)
to commands table, and a dedicated Recursive Spawn section
with usage examples for tree view and cascade delete.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add cmdTree coverage tests to fix mock test CI
The CI coverage threshold (90% functions, 80% lines) was failing
because tree.ts had 0% coverage. Added tests that exercise cmdTree
with empty history, tree rendering, JSON output, flat records,
and deleted/depth labels. tree.ts now has 100% coverage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(security): validate cloudName and use valibot in pullChildHistory
- Add cloudName validation against ^[a-z0-9-]+$ to prevent
command injection in delegateCloudCredentials
- Export SpawnRecordSchema from history.ts and replace loose
type guard with valibot schema validation in pullChildHistory
- Resolve merge conflicts with main (include both docker and
recursive beta features)
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* test: add installSpawnCli and delegateCloudCredentials coverage
Export and test installSpawnCli (success + timeout failure paths)
and delegateCloudCredentials (no creds, with creds, write failure,
mkdir failure paths) to improve orchestrate.ts function coverage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: gritQL rule false positives and delete.ts coverage
- use TsAsExpression() AST node instead of backtick pattern to avoid
matching import aliases as type assertions
- export and test findDescendants() and pullChildHistory() to bring
delete.ts line coverage above the 35% threshold
- add 8 new tests for descendant finding and history pull edge cases
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: A <258483684+la14-1@users.noreply.github.com>
- Suppress remote command output in Hetzner runServer() by piping
stdout/stderr instead of inheriting. This prevents raw ANSI escape
sequences from remote install commands (spinners, progress bars)
from leaking into the local terminal as garbled characters, and
eliminates duplicate status messages that were repeated 15+ times.
Captured stderr is logged via logDebug on failure for debugging.
- Add LC_ALL=C.UTF-8 to both the interactive SSH session and the
.spawnrc env config to ensure consistent UTF-8 locale across all
locale categories, preventing garbled Unicode rendering in Claude
Code's TUI welcome interface.
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: remove docker from --fast and fix docker cp into container
Two fixes for --beta docker:
1. Remove "docker" from --fast beta features — --fast was auto-enabling
--beta docker, pulling ghcr images that hang the session.
Users must now opt in explicitly with --beta docker.
2. Fix uploadFile in docker mode — .spawnrc was uploaded to the host
but never copied into the container. Add docker cp after SCP upload
so env vars and configs reach the agent inside the container.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: keep docker in --fast beta features
The docker cp fix resolves the hang — no need to remove docker from
--fast. The issue was missing file copy into the container, not the
docker mode itself.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: extract makeDockerRunner helper, fix uploadFile into container
Add makeDockerRunner() that wraps a CloudRunner so all commands and
file uploads target the Docker container. Replaces inline lambdas in
hetzner/main.ts and gcp/main.ts with a clean one-liner.
The key fix: uploadFile now docker cp's files into the container after
SCP — previously .spawnrc (API keys, env vars) only landed on the host,
so the agent inside the container had no config and hung.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(security): shellQuote remotePath in docker cp command
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract duplicate dockerExec helper from gcp/main.ts and hetzner/main.ts
into shared makeDockerExec() in orchestrate.ts. Both local functions were
identical — wrapping commands with docker exec using DOCKER_CONTAINER_NAME
and shellQuote.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The hermes install script's mini-swe-agent pip dependency uses
git+ssh:// URLs that timeout on fresh cloud VMs (hetzner/gcp/digitalocean)
where outbound SSH to GitHub is blocked or slow.
Add `git config --global url.https://github.com/.insteadOf` rules
before the hermes install and update commands to force git to use
HTTPS instead of SSH for all GitHub URLs. This eliminates the SSH
connection timeout that was causing install failures.
Fixes#2955
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Extracts the inline docker-mode condition from hetzner/main.ts and
gcp/main.ts into a testable exported function in shared/cloud-init.ts,
then adds real unit tests that import from the source. Fixes#2952.
Agent: test-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
`buildFixScript()` was missing `export LANG='C.UTF-8'` that was added to
the canonical `generateEnvConfig()` in commit f93c799d. Users running
`spawn fix` would get a `.spawnrc` without the UTF-8 locale export,
causing garbled Unicode in agent TUIs — the same regression that f93c799d
fixed for fresh provisioning.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
1. Suppress Claude Code curl installer stdout — the remote installer
prints its own "Installation complete!" which duplicated the local
"Claude Code agent installed successfully" message.
2. Export LANG=C.UTF-8 in both the interactive SSH session command and
the .spawnrc env config. Fresh cloud VMs often default to the C
locale which cannot render Unicode properly, causing garbled ANSI
output in agent TUIs (e.g. "⏵⏵bypasspermissionson" instead of
properly spaced text).
Fixes#2946
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Hetzner E2E runs fail with `resource_limit_exceeded` when stale primary
IPs from previous test runs consume the account quota. This adds proactive
cleanup at two levels:
1. E2E shell driver: `_hetzner_cleanup_orphaned_ips()` deletes unattached
primary IPs during pre-batch stale cleanup, freeing quota before any
new servers are provisioned.
2. TypeScript CLI: `hetzner/main.ts` calls `cleanupOrphanedPrimaryIps()`
before `createServer()` in headless/non-interactive mode, ensuring
each agent provisioning attempt starts with a clean IP quota.
The existing reactive cleanup (retry after failure) in `hetzner.ts`
remains as a fallback.
Fixes#2933
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs in acquireLock:
1. PID write failure was ignored — process returned success but left a
lock dir without a PID file. If it crashed, no other process could
detect the lock as stale, making it permanent.
2. Lock dirs without PID files were not treated as stale — other
processes waited until timeout instead of cleaning up immediately.
Fix: retry on PID write failure (clean up dir first), and treat
lock dirs without PID files as broken/stale (force remove).
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: update agent GitHub star counts
* chore: update agent GitHub star counts
* chore: update agent GitHub star counts
* chore: update agent GitHub star counts
* chore: update agent GitHub star counts
* fix(install): force IPv4 DNS for npm installs and add junie binary verify
On Sprite VMs (and potentially other clouds with flaky IPv6 routing), npm
install of packages with native-binary postinstall scripts (kilocode, junie)
fails with i/o timeout when connecting to the npm registry over IPv6.
Changes:
- Add NODE_OPTIONS=--dns-result-order=ipv4first to NPM_PREFIX_SETUP so all
npm installs prefer IPv4, preventing the IPv6 timeout on first attempt
- Add cd ~ before postinstall re-run in KILOCODE_BINARY_VERIFY to avoid
"current working directory was deleted" errors in bun/node on retry
- Add JUNIE_BINARY_VERIFY snippet (analogous to kilocode) that detects and
recovers from a failed junie postinstall by re-running it from $HOME
- Apply JUNIE_BINARY_VERIFY to the junie install command
Fixes sprite kilocode and junie failures seen in E2E run 2026-03-23.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
When --output json is requested, the auto-update install script was
running with stdio: "inherit", causing [spawn] install messages to
pollute stdout before the JSON result, breaking JSON consumers.
Fix:
- Pre-scan process.argv for --output json before checkForUpdates()
is called in index.ts (formal flag parsing happens later at line 944)
- Pass jsonOutput flag through checkForUpdates() -> performAutoUpdate()
- When jsonOutput=true, use stdio: ["pipe", stderr, stderr] for the
install script execution so all output goes to stderr only
- Set SPAWN_CLI_UPDATED=1 env var on re-exec so JSON consumers can
detect the update via cli_updated: true in SpawnResult
- Add cli_updated?: boolean to SpawnResult interface in commands/run.ts
- Add tests covering both json and non-json stdio behavior
Fixes#2918
Agent: issue-fixer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Headless mode set SPAWN_HEADLESS and SPAWN_MODE but not
SPAWN_NON_INTERACTIVE, which all cloud modules check before prompting.
This caused GCP (and potentially other clouds) to prompt for project
confirmation when stdin was closed, resulting in a fatal error.
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add missing fields (signalCode, resourceUsage, pid, killed) to
Bun.spawnSync and Bun.spawn mock return values so they satisfy the
full return types without needing `as` casts or biome-ignore comments.
Agent: style-reviewer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
- remove `export` from `LocalTarball` interface in `shared/agent-tarball.ts`
— the type is only used internally as the return type of `downloadTarballLocally`;
it was never imported from outside the module.
- remove `getTerminalWidth` re-export from `commands/index.ts`
— `getTerminalWidth` is only called inside `commands/info.ts` itself;
it was re-exported through the barrel but never imported from there by any consumer or test.
bump CLI version patch: 0.25.18 → 0.25.19
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Commit 97b6424 (fix(security): add cmd validation to Sprite
runSprite() and runSpriteSilent()) changed production CLI code without
a corresponding version bump. The CLI has auto-update — without this
bump users won't receive the null-byte injection guard.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
When parallel E2E runs exhaust Hetzner's Primary IP quota, the CLI now
detects the `resource_limit_exceeded` / `primary_ip_limit` error, automatically
cleans up orphaned Primary IPs (unattached to any server), and retries once.
If cleanup doesn't free quota, a clear message guides users to delete stale
resources or request a quota increase.
Fixes#2902
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
- Suppress stdout+stderr from `claude install --force` to prevent duplicate
"successfully installed" messages (was printed up to 4x)
- Make logStepInline fall back to newline-separated output when stderr is not
a TTY, so SSH port polling status is readable in piped/captured contexts
- Consolidate post-install completion messages into a single clear milestone:
"Agent setup complete -- {agent} is ready on {cloud}"
- Bump CLI version to 0.25.16
Fixes#2899
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: skip interactive session in headless mode (#2892)
When SPAWN_HEADLESS=1, the orchestrator now exits with code 0 after
provisioning completes instead of attempting to launch the agent
interactively. This fixes Claude Code (and other agents) failing with
"Input must be provided through stdin or --prompt" when spawned via
`--headless --output json` without a prompt.
The VM is fully provisioned and ready — callers can SSH in or use
`spawn connect` to start the agent manually.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: clean up SPAWN_HEADLESS env in test afterEach to prevent leaks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
checkAccountStatus() now queries the account's droplet_limit and
current droplet count. When at capacity it warns interactively and
throws immediately in headless/E2E mode with a clear message instead
of attempting creation and getting a cryptic 422.
Also adds specific detection of droplet limit 422 errors in
createServer() with actionable guidance (limit increase URL).
Bump CLI to 0.25.14.
Fixes#2865
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The sprite was going idle and shutting down during long npm install
operations because the remote keep-alive script wasn't installed yet
and sprite exec alone doesn't count as activity.
- Add local keep-alive that pings the sprite's public URL every 30s
from the client machine during provisioning and agent install
- Stop it when the interactive session starts (remote script takes over)
- Add i/o timeout to spriteRetry's transient error regex so connection
timeouts are retried instead of failing immediately
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GCP's default 10 GB boot disk is insufficient for coding agents — node_modules,
apt packages, and build caches easily exceed it. Default to 40 GB and allow
override via GCP_DISK_SIZE env var.
Closes#2866
Co-authored-by: Claude <claude@anthropic.com>
Consolidate DOCKER_CONTAINER_NAME and DOCKER_REGISTRY constants from
gcp/main.ts and hetzner/main.ts into shared/orchestrate.ts. Both files
defined identical values ("spawn-agent" and "ghcr.io/openrouterteam"); they
now import the shared exports instead.
Bumps CLI patch version to 0.25.11.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
The --beta docker feature (PR #2854) was missing from `spawn help`
output, and its error description said "Hetzner" only but it also
works on GCP.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat: add --beta docker for Hetzner Docker CE app image
Uses Hetzner's pre-built docker-ce app image when --beta docker
(or --fast) is active, giving faster boot times similar to DO
marketplace images. Snapshots still take priority when available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: pull and run pre-built agent Docker images on Hetzner
When --beta docker (or --fast) is active, boots Hetzner with docker-ce
app image, then pulls ghcr.io/openrouterteam/spawn-{agent}:latest and
runs it. All runServer commands are routed through docker exec into
the container, and the interactive session uses docker exec -it.
Skips agent install since the agent is pre-baked in the image.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add --beta docker support for GCP with Container-Optimized OS
When --beta docker (or --fast) is active on GCP, uses cos-stable
from cos-cloud (Docker pre-installed, read-only OS). Skips cloud-init
startup script (incompatible with COS), pulls the pre-built agent
image from ghcr.io, and routes all commands through docker exec.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: correct import path for logInfo/logStep (shared/log.js -> shared/ui.js)
The log.js module does not exist; these functions are exported from ui.ts.
Also merge duplicate ui.js imports per biome organizeImports.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Sprite CLI exits with code 1 on "connection closed" (not 255 like SSH).
The reconnect loop now treats exit code 1 on Sprite as a connection
drop, retrying up to 5 times with a 3s delay between attempts.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In fast mode, Promise.allSettled runs server boot, OAuth, and tarball
download concurrently. When all operations complete — especially after
Bun.serve.stop(true) in the OAuth flow removes its event loop handle —
the event loop can appear empty before the await continuation starts
new I/O operations. This causes Bun to exit silently with code 0,
dropping the user back to their shell after "Successfully obtained
OpenRouter API key via OAuth!" with no error.
Fix: keep a dummy setInterval handle alive during the fast-mode
concurrent section so the event loop never drains prematurely.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: use base64 encoding for GITHUB_TOKEN to prevent injection
Aligns GITHUB_TOKEN handling with the existing base64 pattern used for
OPENROUTER_API_KEY in orchestrate.ts, eliminating the single-quote
escaping vulnerability.
Fixes#2834
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: apply shellQuote to base64-encoded GITHUB_TOKEN
Address security review feedback: wrap the base64-encoded token in
shellQuote() for defense-in-depth, preventing any theoretical shell
metacharacter escape from the interpolated value.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
spawn link is a fully implemented command (440 lines) that was
completely missing from `spawn help`. Users had no way to discover
it through the CLI's self-documentation.
Also adds --fast to the KNOWN_FLAGS set for consistency — it was
accepted by the CLI but not registered in the flag validation set.
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The MAX_HISTORY_ENTRIES=100 cap silently archived records when you
spawned more than 100 times, making older active servers vanish from
`spawn list`. The cap was solving a non-problem — 1000 records is ~500KB.
Removed:
- MAX_HISTORY_ENTRIES constant and trimming logic
- archiveRecords() and readExistingArchive() (no longer needed)
- Smart trim tests (history-trimming.test.ts rewritten to test ordering only)
Existing archive files (~/.spawn/history-YYYY-MM-DD.json) are still
readable by recoverFromArchives() for corruption recovery.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix 24 TypeScript strict mode errors across 7 production files:
- interactive.ts: guard against undefined `val` in validate callback
- list.ts: use already-narrowed `conn` variable instead of `selected.connection`
- run.ts: widen `buildCloudLines` defaults param to `Record<string, unknown>`
- digitalocean.ts: use `toRecord()` to safely drill into nested API responses;
capture narrowed `oauthCode` in const for async closure
- history.ts: backfill missing record IDs via `backfillRecordIds()` helper;
use `v.safeParse` output directly to get properly typed records
- index.ts: use `Manifest` type for `showUnknownCommandError` parameter
- orchestrate.ts: capture narrowed `tunnel` and `getConnectionInfo` in const
variables before async closures
Fixes#2821
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
- GCP coverage tests (6 failures): getServerIp, listServers, and
authenticate tests did not mock the `which gcloud` spawnSync call
inside requireGcloudCmd(), causing "gcloud CLI not found" errors.
Add mockSpawnSyncWithGcloud/mockWhichGcloud helpers that satisfy
the gcloud discovery call before the test-specific mock.
- Sandbox guardrail test (1 failure): cmd-uninstall-cov deletes
~/.spawn and other sandbox directories but never re-creates them.
Since Bun runs test files in the same process, the fs-sandbox
test then fails. Add afterEach restoration of sandbox dirs.
- Add coverageThreshold to bunfig.toml with correct syntax
(coverageThreshold under [test], not [test.coverage])
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>