- Replace repeated 'SSH port closed (N/36)' with periodic updates every 5 attempts
- Add clear 'Provisioning complete. Connecting...' line before agent attach
Fixes#3053
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat: pull child spawn history back to parent for `spawn tree`
When the interactive session ends (or headless mode completes), the
parent downloads the child VM's history.json and merges records into
local history. Before downloading, it runs `spawn pull-history` on the
child, which recursively pulls from all grandchildren — so the full
tree collapses up to the root regardless of depth.
Changes:
- Add getParentFields() — sets parent_id/depth on saveSpawnRecord calls
- Add pullChildHistory() — downloads + merges child history after session
- Add `spawn pull-history` command for recursive SSH-based history pull
- Add 11 tests for parseAndMergeChildHistory
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: trigger CI recompute
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): validate user/ip params before SSH exec in pull-history
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): use shared validators for SSH params in pull-history and delete
Replace inline regex checks in pull-history.ts with validateUsername()
and validateConnectionIP() from security.ts, matching the pattern used
across connect.ts, fix.ts, and link.ts. Also add the same validation
to delete.ts:pullChildHistory which had no SSH parameter validation.
orchestrate.ts uses the runner abstraction (not raw user@ip), so its
SSH params come from the cloud provider, not untrusted history records.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Ahmed Abushagur <ahmed@abushagur.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Shows a non-intrusive "⭐ Enjoying Spawn? Star us on GitHub!" message
to returning users (2+ successful spawns) after a successful spawn
session completes. Shown at most once per 30 days.
- New `maybeShowStarPrompt()` in `shared/star-prompt.ts`
- Tracks `starPromptShownAt` in `~/.config/spawn/preferences.json`
- Called after `execScript()` returns success in cmdRun, cmdInteractive,
and cmdAgentInteractive (skipped in headless mode)
- The `execScript()` return type changed from `void` to `boolean`
to indicate whether the script ran successfully
- Added 7 unit tests covering all gate conditions
Fixes#3020
Agent: issue-fixer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace hand-constructed openrouter.json path with getSpawnCloudConfigPath("openrouter")
for single-source-of-truth path resolution. Remove unused _cloudName parameter since
the function delegates ALL cloud credentials unconditionally.
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Add /^[A-Za-z0-9+/=]+$/ validation after each .toString("base64") call
in delegateCloudCredentials() and injectEnvVars(), consistent with the
pattern established in agent-setup.ts by #2988.
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(ux): replace download spinner with stderr logging, reset terminal before SSH handoff
Fixes two UX issues from live E2E session (#3001):
1. Download spinner (p.spinner from @clack/prompts) wrote ANSI escape codes
to stdout. When stdout is captured (E2E harness, piped output), these
sequences appeared as raw text rather than rendered colors. Replace
p.spinner() in downloadScriptWithFallback and downloadBundle with
logStep/logInfo/logError from shared/ui.ts, which write to stderr and
correctly check isTTY before emitting ANSI codes.
2. Garbled output at start of interactive session (overlapping status lines
from the remote agent's TUI) may be caused by residual ANSI state from
@clack/prompts (hidden cursor, active color attributes). Emit
ESC[?25h ESC[0m to stderr before prepareStdinForHandoff() to explicitly
restore cursor visibility and reset all attributes before the SSH session
takes over.
Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: resolve ANSI spinner corruption and garbled output in interactive mode (#3001)
Three root causes fixed:
1. Spinner wrote to stdout while all other CLI status output goes to stderr,
causing ANSI escape sequence interleaving and corruption when both streams
are merged on a terminal. Redirected all p.spinner() calls to process.stderr.
2. unicode-detect.ts (which sets TERM=linux for SSH sessions to force ASCII
fallback) was only imported in commands/shared.ts but not in shared/ui.ts.
Cloud module entry points (hetzner/main.ts, etc.) that import shared/ui.ts
loaded @clack/prompts without the TERM override, causing Unicode spinner
frames in environments that can't render them.
3. After an interactive SSH session ends, the remote agent's TUI (e.g. Claude
Code) may leave the terminal in raw mode with altered attributes. Added
terminal reset (ANSI attribute reset + stty sane) after spawnInteractive()
returns to prevent garbled post-session output.
Agent: ux-engineer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
delegateCloudCredentials only copied the current cloud's config file
(e.g. sprite.json when spawning on Sprite). Child VMs couldn't spawn
on other clouds because their tokens weren't forwarded.
Now iterates all known clouds and copies every credential file that
exists locally, so the agent can spawn children on any cloud.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove `export` from `getTerminalWidth` in commands/info.ts — only
used internally, not exported from commands/index.ts barrel
- Remove `export` from `makeDockerExec` in shared/orchestrate.ts — only
used internally by `makeDockerRunner`, no external callers
- Bump CLI version to 0.26.6
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Sprite has a bun shim at /.sprite/bin/bun that delegates to
$HOME/.bun/bin/bun, but that binary doesn't exist on fresh VMs.
`command -v bun` returns true (finds the shim) so the install script
skips bun installation, then bun fails when actually invoked.
Fixed in two places:
- installSpawnCli: source shell profiles, test `bun --version` (not
just existence), and install bun fresh if it doesn't work
- install.sh: replace `command -v bun` with `bun --version` to detect
broken shims
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: spawn step skipped when no explicit --steps passed
The spawn skill injection condition used `enabledSteps?.has("spawn")`
which is falsy when enabledSteps is undefined (no --steps flag). Now
checks the recursive beta flag directly and falls through when no
explicit steps are selected, matching how auto-update works.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: embed skill content in spawn-skill.ts instead of reading from disk
The skills/ directory exists in the repo but isn't bundled when the CLI
is installed via npm. readSkillContent() couldn't find the files at
runtime, causing "No spawn skill file for agent" on every deploy.
Fixed by embedding all skill content directly as string constants in the
module. Removed fs-based getSkillsDir/readSkillContent/getSpawnSkillSourceFile
in favor of a single AGENT_SKILLS config map with inline content.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When `--beta recursive` is active, a new "Spawn CLI" setup step injects
agent-native instruction files teaching each agent how to use the `spawn`
CLI to create child VMs. Skill files live in `skills/` at the repo root
and use each agent's native format (YAML frontmatter for Claude/Codex/
OpenClaw, plain markdown for others, append mode for Hermes).
- Add `skills/` directory with 8 agent-specific skill files
- Add `spawn-skill.ts` module with path mapping, file reading, and injection
- Register "spawn" as a conditional setup step gated by `--beta recursive`
- Wire `injectSpawnSkill()` into orchestrate.ts postInstall flow
- Add 52 tests covering path mapping, append mode, file existence, injection
- Bump CLI version to 0.26.0 (minor: new feature)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds non-empty guard to makeDockerExec to make the security boundary
explicit and prevent silent misuse with empty commands.
Fixes#2985
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: add recursive spawn (--beta recursive)
Enables VMs to spawn child VMs. When --beta recursive is active:
- Injects SPAWN_PARENT_ID, SPAWN_DEPTH, SPAWN_BETA=recursive into .spawnrc
- Installs spawn CLI on the VM via install.sh
- Delegates cloud + OpenRouter credentials to the VM
- Tracks parent_id and depth on SpawnRecord for tree relationships
- Adds `spawn tree` command for full recursive tree view
- Adds `spawn history export` for pulling child history via SSH
- Adds `spawn list --json` and `spawn list --flat` flags
- Adds tree rendering in `spawn list` when parent-child relationships exist
- Adds cascade delete support in delete.ts
- Adds mergeChildHistory() for backward-pass history sync
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add recursive spawn to README
Add --beta recursive to beta features table, new commands
(spawn tree, spawn history export, spawn list --flat/--json)
to commands table, and a dedicated Recursive Spawn section
with usage examples for tree view and cascade delete.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add cmdTree coverage tests to fix mock test CI
The CI coverage threshold (90% functions, 80% lines) was failing
because tree.ts had 0% coverage. Added tests that exercise cmdTree
with empty history, tree rendering, JSON output, flat records,
and deleted/depth labels. tree.ts now has 100% coverage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(security): validate cloudName and use valibot in pullChildHistory
- Add cloudName validation against ^[a-z0-9-]+$ to prevent
command injection in delegateCloudCredentials
- Export SpawnRecordSchema from history.ts and replace loose
type guard with valibot schema validation in pullChildHistory
- Resolve merge conflicts with main (include both docker and
recursive beta features)
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* test: add installSpawnCli and delegateCloudCredentials coverage
Export and test installSpawnCli (success + timeout failure paths)
and delegateCloudCredentials (no creds, with creds, write failure,
mkdir failure paths) to improve orchestrate.ts function coverage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: gritQL rule false positives and delete.ts coverage
- use TsAsExpression() AST node instead of backtick pattern to avoid
matching import aliases as type assertions
- export and test findDescendants() and pullChildHistory() to bring
delete.ts line coverage above the 35% threshold
- add 8 new tests for descendant finding and history pull edge cases
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: A <258483684+la14-1@users.noreply.github.com>
* fix: remove docker from --fast and fix docker cp into container
Two fixes for --beta docker:
1. Remove "docker" from --fast beta features — --fast was auto-enabling
--beta docker, pulling ghcr images that hang the session.
Users must now opt in explicitly with --beta docker.
2. Fix uploadFile in docker mode — .spawnrc was uploaded to the host
but never copied into the container. Add docker cp after SCP upload
so env vars and configs reach the agent inside the container.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: keep docker in --fast beta features
The docker cp fix resolves the hang — no need to remove docker from
--fast. The issue was missing file copy into the container, not the
docker mode itself.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: extract makeDockerRunner helper, fix uploadFile into container
Add makeDockerRunner() that wraps a CloudRunner so all commands and
file uploads target the Docker container. Replaces inline lambdas in
hetzner/main.ts and gcp/main.ts with a clean one-liner.
The key fix: uploadFile now docker cp's files into the container after
SCP — previously .spawnrc (API keys, env vars) only landed on the host,
so the agent inside the container had no config and hung.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(security): shellQuote remotePath in docker cp command
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove local tarball download, use remote-only tarball install
The local-download-then-SCP-upload path was unnecessary complexity —
downloading a tarball to the user's machine just to re-upload it to the
VM is wasteful. The VM downloads directly from GitHub instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: force zeroclaw native runtime to prevent Docker container hang
ZeroClaw auto-detects Docker and launches in a container (pulling
ghcr.io/openrouterteam/spawn-zeroclaw), which hangs the interactive
session. Force native mode via ZEROCLAW_RUNTIME=native env var and
adapter = "native" in config.toml.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: disable openclaw Docker sandbox to prevent container hang
Same issue as zeroclaw — openclaw auto-detects Docker and runs agents
in containers, hanging the interactive session. Disable via
agents.defaults.sandbox.mode = off in config and fallback JSON.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: disable codex Docker sandbox to prevent container hang
Codex CLI also auto-detects Docker for sandboxing. Set
sandbox_mode = "danger-full-access" in config.toml — the VM itself
provides isolation, Docker sandboxing just causes hangs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract duplicate dockerExec helper from gcp/main.ts and hetzner/main.ts
into shared makeDockerExec() in orchestrate.ts. Both local functions were
identical — wrapping commands with docker exec using DOCKER_CONTAINER_NAME
and shellQuote.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Add 180s timeout to uploadFileSprite to prevent indefinite hangs during
tarball uploads. Without a timeout, large tarballs or stalled Sprite
connections block the entire provisioning pipeline past the 720s E2E
provision timeout, causing agent binary not-found failures for openclaw,
zeroclaw, and codex.
Also skip the redundant remote tarball download fallback when a local
tarball was already downloaded but its upload/extract failed -- the
remote download would face the same extraction issues. This saves ~150s
in the fallback chain, leaving enough time for the live install to
complete within the provision timeout.
Fixes#2960
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
- Suppress stdout+stderr from `claude install --force` to prevent duplicate
"successfully installed" messages (was printed up to 4x)
- Make logStepInline fall back to newline-separated output when stderr is not
a TTY, so SSH port polling status is readable in piped/captured contexts
- Consolidate post-install completion messages into a single clear milestone:
"Agent setup complete -- {agent} is ready on {cloud}"
- Bump CLI version to 0.25.16
Fixes#2899
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: skip interactive session in headless mode (#2892)
When SPAWN_HEADLESS=1, the orchestrator now exits with code 0 after
provisioning completes instead of attempting to launch the agent
interactively. This fixes Claude Code (and other agents) failing with
"Input must be provided through stdin or --prompt" when spawned via
`--headless --output json` without a prompt.
The VM is fully provisioned and ready — callers can SSH in or use
`spawn connect` to start the agent manually.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: clean up SPAWN_HEADLESS env in test afterEach to prevent leaks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Consolidate DOCKER_CONTAINER_NAME and DOCKER_REGISTRY constants from
gcp/main.ts and hetzner/main.ts into shared/orchestrate.ts. Both files
defined identical values ("spawn-agent" and "ghcr.io/openrouterteam"); they
now import the shared exports instead.
Bumps CLI patch version to 0.25.11.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Sprite CLI exits with code 1 on "connection closed" (not 255 like SSH).
The reconnect loop now treats exit code 1 on Sprite as a connection
drop, retrying up to 5 times with a 3s delay between attempts.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In fast mode, Promise.allSettled runs server boot, OAuth, and tarball
download concurrently. When all operations complete — especially after
Bun.serve.stop(true) in the OAuth flow removes its event loop handle —
the event loop can appear empty before the await continuation starts
new I/O operations. This causes Bun to exit silently with code 0,
dropping the user back to their shell after "Successfully obtained
OpenRouter API key via OAuth!" with no error.
Fix: keep a dummy setInterval handle alive during the fast-mode
concurrent section so the event loop never drains prematurely.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add .js extensions to 124 relative imports that were missing them.
The codebase is "type": "module" (ESM) and the dominant pattern already
used .js extensions, but 35 files had a mix of extensionless and .js
imports — sometimes within the same file. Standardize to .js everywhere.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Fix 24 TypeScript strict mode errors across 7 production files:
- interactive.ts: guard against undefined `val` in validate callback
- list.ts: use already-narrowed `conn` variable instead of `selected.connection`
- run.ts: widen `buildCloudLines` defaults param to `Record<string, unknown>`
- digitalocean.ts: use `toRecord()` to safely drill into nested API responses;
capture narrowed `oauthCode` in const for async closure
- history.ts: backfill missing record IDs via `backfillRecordIds()` helper;
use `v.safeParse` output directly to get properly typed records
- index.ts: use `Manifest` type for `showUnknownCommandError` parameter
- orchestrate.ts: capture narrowed `tunnel` and `getConnectionInfo` in const
variables before async closures
Fixes#2821
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
When SSH exits with code 255 (connection dropped/timed out), retry up
to 5 times with 3s delay between attempts. Clean exits (0), Ctrl+C
(130), and agent crashes exit immediately without retrying.
Only applies to remote clouds — local sessions skip reconnect logic.
Signed-off-by: L <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
* feat: never-give-up resilience layer — retry every failure instead of exiting
Add retryOrQuit() helper to shared/ui.ts that prompts "Try again? (Y/n)"
after any recoverable failure. Wrap all fatal exit points with retry loops:
- Cloud auth (Hetzner, DigitalOcean, AWS, GCP): retry after 3 failed tokens
- API key acquisition: retry after 3 failed OAuth+manual attempts
- Server creation: retry on any createServer failure (both fast & sequential)
- SSH readiness: retry on waitForReady timeout
- Agent install: retry on install failure
- Pre-launch hooks: retry on preLaunch failure
Non-interactive mode (SPAWN_NON_INTERACTIVE=1) still throws immediately.
Ctrl+C at any retry prompt exits cleanly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(e2e): add AI-driven interactive test harness
Add --interactive mode to the E2E test framework. Instead of running spawn
in headless mode (SPAWN_NON_INTERACTIVE=1), this spawns the CLI in a real
PTY and uses Claude Haiku to respond to prompts like a human user would.
New files:
- sh/e2e/interactive-harness.ts — Bun script that drives the PTY + AI loop
- sh/e2e/lib/interactive.sh — Bash integration with the E2E framework
Usage:
e2e.sh --cloud hetzner claude --interactive
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(qa): wire interactive E2E into scheduled QA pipeline
- Add `e2e-interactive` option to workflow_dispatch in qa.yml
- Add `e2e-interactive` run mode to qa.sh (loads cloud creds + ANTHROPIC_API_KEY)
- Runs `e2e.sh --cloud hetzner claude --interactive` directly (no Claude Code needed)
- Defaults to hetzner (cheapest), overridable via E2E_INTERACTIVE_CLOUD/AGENT env vars
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(qa): schedule interactive E2E daily at 6am UTC
Runs one agent (claude) on one cloud (hetzner) with AI-driven prompts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(qa): offset soak cron to avoid GitHub Actions schedule dedup
GitHub Actions deduplicates overlapping cron schedules into one run,
making `github.event.schedule` unpredictable. The soak test at `0 3 * * 1`
was getting absorbed by the `0 */4 * * *` quality sweep and never firing
as reason=soak.
Move soak to `30 1 * * 1` (Monday 1:30am UTC) — safely between the
0am and 4am quality sweep slots. Interactive E2E at `0 6 * * *` is
already safe (between the 4am and 8am slots).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(qa): add e2e-interactive to trigger server valid reasons
The trigger server validates reason query params against an allowlist.
Without this, the `e2e-interactive` dispatch returns 400.
Also note: `soak` is already in VALID_REASONS in the repo but the running
service on the QA VM is stale — needs a restart to pick up both soak and
e2e-interactive reasons.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* perf: skip cloud-init for minimal-tier agents with tarballs/snapshots
Ubuntu 24.04 base images already have curl + git, so minimal-tier
agents (claude, opencode, zeroclaw, hermes) don't need the cloud-init
package install step when using tarballs or snapshots.
Adds skipCloudInit flag to CloudOrchestrator — set automatically when
(tarball || snapshot) && tier === "minimal". Each cloud's waitForReady
checks this flag and calls waitForSshOnly instead of waitForCloudInit.
Saves ~30-60s on minimal-tier agent deploys with --fast or --beta tarball.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add --fast mode and updated beta features to README
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: remove timing table from README
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
* feat: add --fast flag for parallel server boot + setup
Adds `--fast` flag that runs server creation concurrently with API key
prompt, account check, pre-provision hooks, tarball download, and env
config generation. Once SSH is up, uploads tarball and applies config.
--fast implies --beta tarball and --beta images, enabling snapshots
and pre-built tarballs automatically.
Flow without --fast (sequential):
auth → API key → preProvision → size → create → boot → install → configure
Flow with --fast (parallel):
auth → size → [create+boot | API key | preProvision | tarball download | accountCheck]
→ upload tarball → inject env → configure
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add --beta parallel as standalone opt-in for parallel setup
--beta parallel enables the parallel orchestration without implying
tarball/images. --fast still implies all three (tarball + images +
parallel).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tryCatchIf(isFileError) only catches filesystem errors (ENOENT, EACCES),
but JSON.parse throws SyntaxError on corrupted preferences.json. This
was the same bug fixed in 16a2f180 across 4 files, but orchestrate.ts
was missed. A corrupted ~/.spawn/preferences.json would crash the CLI
instead of gracefully falling back to no preferred model.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Installs a systemd timer + oneshot service that updates the agent binary
and system packages every 6 hours without disrupting running instances.
Agent update safety:
- Binary agents (Go, Rust): Linux keeps old inode in memory; safe to replace
- npm agents: Node.js caches modules at startup; running processes unaffected
- New version takes effect on next restart via the existing restart loop
System update safety:
- Disables Ubuntu's unattended-upgrades to prevent dpkg lock contention
- Uses flock -w 300 on /var/lib/dpkg/lock-frontend before apt operations
- DEBIAN_FRONTEND=noninteractive with --force-confdef/--force-confold
User-facing:
- "Auto-update" option in setup multiselect (default on, user can uncheck)
- Skipped for local cloud and non-systemd systems
- Non-fatal: setup failure doesn't block agent launch
- Logs to /var/log/spawn-auto-update.log
Timer: 15min after boot, then every 6h with 30min random jitter.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace hardcoded "bash" shell references with platform-aware utilities so
spawn works natively from PowerShell on Windows without WSL or Git Bash.
- New shared/shell.ts: isWindows(), getLocalShell(), getInstallScriptUrl(),
getInstallCmd(), getWhichCommand() with platform override for testability
- local/local.ts: use getLocalShell() for runLocal() and interactiveSession()
- commands/run.ts: spawnScript/runScriptHeadless use getLocalShell()
- commands/update.ts: Windows downloads install.ps1, runs via PowerShell
- update-check.ts: Windows auto-update uses install.ps1; "where" replaces "which"
- shared/orchestrate.ts: PowerShell-compatible .spawnrc setup for local Windows
- Remote SSH commands unchanged — remote servers are always Linux
Closes#2726
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
When the user selects the GitHub CLI step in setup options (interactive
prompt or --steps github), offerGithubAuth() was silently returning early
if no local gh token was found by detectGithubAuth(). This made the step
unreachable for users without gh installed locally — exactly the ones who
need remote setup most.
Fix: accept an `explicitlyRequested` parameter in offerGithubAuth(). When
true, skip the githubAuthRequested guard and always run the remote install.
The orchestrator passes enabledSteps?.has("github") as this flag.
detectGithubAuth() still auto-enables the step when a local token exists
(convenience forwarding), but can no longer block a user-explicit request.
Fixes#2672
Agent: issue-fixer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Move "Custom model" from OpenClaw-specific to common setup steps so
every agent shows it in the setup menu. Add modelEnvVar to agents that
support model override via environment variable:
- Kilo Code: KILOCODE_MODEL
- ZeroClaw: ZEROCLAW_MODEL
- Hermes: LLM_MODEL
- Junie: JUNIE_MODEL
When a custom model is selected, the env var is injected into .spawnrc
alongside the other agent env vars. OpenClaw continues to use its
existing configure() path. Claude and Codex don't have modelEnvVar
since they handle model routing differently.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add downloadFile to CloudRunner + local OpenClaw config merge
Add `downloadFile(remotePath, localPath)` to the CloudRunner interface
and implement it across all 6 cloud providers (Hetzner, AWS, GCP,
DigitalOcean, Sprite, Local) — mirroring the existing `uploadFile` with
reversed SCP direction.
Replace the OpenClaw config write with a download → deep-merge → upload
flow so config merging happens in our own linted TypeScript instead of
a remote script.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: move isPlainObject and deepMerge to shared utils
Extract `isPlainObject` to `shared/type-guards.ts` and `deepMerge` to
`shared/parse.ts` so they're reusable across the codebase.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: promote isPlainObject to shared package, use across codebase
Move `isPlainObject` from cli/type-guards.ts into
@openrouter/spawn-shared so it can be used everywhere. Replace
inline `val !== null && typeof val === "object" && !Array.isArray(val)`
checks in:
- shared/type-guards.ts (toRecord, toObjectArray)
- shared/parse.ts (parseJsonObj)
- cli/manifest.ts (isValidManifest)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: remove type-guards re-export, import directly from spawn-shared
Delete `packages/cli/src/shared/type-guards.ts` (was just a re-export
barrel). All 35 consuming files now import `getErrorMessage`, `isString`,
`isNumber`, `isPlainObject`, `toRecord`, etc. directly from
`@openrouter/spawn-shared`.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
WhatsApp setup is too complex for normal users (QR scan + separate
device + pairing). Remove it from the setup options entirely.
Also change multiselect defaults to nothing pre-selected — let users
opt in to what they want instead of pre-selecting for them.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
When an agent has an SSH tunnel (e.g., OpenClaw dashboard), store the
tunnel remote port and browser URL template in connection.metadata at
spawn time. On reconnect via `spawn ls` → "Enter agent", re-establish
the SSH tunnel and open the dashboard automatically.
- Add saveMetadata() to history.ts for merging key-value pairs into records
- Store tunnel_remote_port and tunnel_browser_url_template in orchestrate.ts
- Re-establish tunnel in cmdEnterAgent (connect.ts) when metadata is present
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: messaging UX — silence doctor, fix groupPolicy, remove early WhatsApp pairing
- Set groupPolicy to "open" for both Telegram and WhatsApp (was
"allowlist" with empty allowFrom, causing doctor warnings)
- Suppress doctor warning spam by redirecting openclaw config set
stdout to /dev/null
- Remove WhatsApp pairing prompt (appeared immediately after QR scan
before user could message the bot — now just tells them the command)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: improve Telegram/WhatsApp pairing instructions
Add step-by-step instructions for Telegram pairing so users know to
search for their bot in Telegram and message it. Improve WhatsApp
post-link instructions to explain how contacts pair.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: pre-select Telegram in setup options as recommended channel
Telegram has the smoothest setup UX (bot token + pairing code) compared
to WhatsApp (QR scan + separate device). Pre-select it alongside Chrome
in the multiselect and label it as "recommended" in the hint.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Telegram is a built-in channel, not a plugin. Replace broken
`openclaw plugins enable telegram` (OOM) and `openclaw channels add`
(doesn't exist) with proper setup:
- Write channel config (botToken, dmPolicy: pairing, groups) directly
into the atomic JSON config file during setup
- After gateway starts, prompt user to pair via
`openclaw pairing approve <channel> <CODE>`
- WhatsApp: QR scan via `openclaw channels login`, then pairing
- Bump version to 0.17.16
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: set telegram groupPolicy to open during channel setup
OpenClaw defaults groupPolicy to "allowlist" with an empty groupAllowFrom,
which silently drops all group messages. Set it to "open" after adding the
Telegram channel so group messages work out of the box.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use OpenClaw config file for Telegram setup instead of broken CLI commands
Telegram is a built-in channel in OpenClaw, not a plugin. The previous
approach used `openclaw plugins enable telegram` (caused OOM on 2GB) and
`openclaw channels add --channel telegram` (command doesn't exist).
Now writes Telegram config (botToken, enabled, groupPolicy) directly into
the atomic JSON config file during setup. Also sets groupPolicy to "open"
so group messages work out of the box instead of being silently dropped.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use openclaw onboard for channel setup instead of manual config
OpenClaw has a built-in `openclaw onboard` command that interactively
guides users through Telegram/WhatsApp channel setup. Use that instead
of manually prompting for tokens and writing config ourselves.
- Remove custom Telegram token prompt from agent-setup.ts
- Remove broken `openclaw channels add` and `openclaw plugins enable`
- Run `openclaw onboard` after gateway starts for channel setup
- Base config (API key, gateway, model) still written atomically
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: move Telegram/WhatsApp channel setup to after gateway starts
OpenClaw's `channels add` and `channels login` commands require a running
gateway. Previously, Telegram token configuration ran in setupOpenclawConfig
(pre-gateway) using `openclaw config set`, causing the gateway to hang on
startup when a token was present for a disabled-by-default plugin.
Now:
- Plugin enables stay in setupOpenclawConfig (pre-gateway)
- Channel config (token add, QR login) runs in orchestrate.ts step 11c
after the gateway is up, using `openclaw channels add/login`
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* security: use shellQuote instead of jsonEscape for Telegram token
jsonEscape uses JSON.stringify which produces double-quoted strings that
the shell interprets, creating a command injection vector. shellQuote
wraps in single quotes, preventing shell interpretation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: fix biome export ordering in interactive.ts and manifest.ts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
* feat: add Telegram and WhatsApp options to OpenClaw setup picker
Adds separate "Telegram" and "WhatsApp" checkboxes to the OpenClaw
setup screen:
- Telegram: prompts for bot token from @BotFather, injects into
OpenClaw config via `openclaw config set`
- WhatsApp: reminds user to scan QR code via the web dashboard
after launch (no CLI setup possible)
Updates USER.md with channel-specific guidance when either is selected.
Bump CLI version to 0.16.16.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: run WhatsApp QR scan interactively before TUI launch
Instead of punting WhatsApp setup to "after launch", runs
`openclaw channels login --channel whatsapp` as an interactive SSH
session between gateway start and TUI launch. The user scans the
QR code with their phone during provisioning setup.
Flow: gateway starts → tunnel set up → WhatsApp QR scan → TUI launch
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update WhatsApp hint to reflect pre-TUI QR scanning
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add --config and --steps CLI flags for programmatic setup
Add --config <path> flag to load spawn options from a JSON config file
(model, steps, name, setup data like telegram_bot_token). Add --steps
<list> flag for comma-separated setup step control. Both enable the
web UI and headless automation to control which setup steps run.
Priority order: CLI flags > --config file > env vars > defaults.
- New spawn-config.ts module with valibot validation
- OptionalStep extended with dataEnvVar and interactive metadata
- validateStepNames() for step name validation with warnings
- Telegram setup reads TELEGRAM_BOT_TOKEN env var before prompting
- WhatsApp auto-skipped in headless mode with warning
- promptSetupOptions() skipped when SPAWN_ENABLED_STEPS already set
- E2E verify helpers for github, browser, telegram setup artifacts
- QA reference file documenting all agent setup options
- Version bump to 0.17.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add --model flag and priority order tests
- Add --model <id> CLI flag that sets MODEL_ID env var
- --model is extracted before --config so it takes priority
- Add config-priority.test.ts with 8 tests verifying:
- --model overrides config model
- --steps overrides config steps
- --steps "" disables all steps
- --name overrides config name
- Config tokens apply as defaults
- Explicit env vars override config tokens
- Remove preferences.json from priority order docs (not needed)
- Add --model to help text and unknown-flag guidance
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add --model, --config, --steps to README
Document config file format, setup steps table, and new CLI flags
in the commands table.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address security review feedback
- Move null byte check before path resolution (defense-in-depth)
- Move agent-setup-options.md from .claude/rules/ to .docs/ (git-ignored)
per documentation policy
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resolve rebase conflicts and deduplicate --model flag extraction
Rebase on main introduced a duplicate --model flag extraction block
(one from the PR at line 804, one from main at line 941). Consolidated
into the single early extraction point with -m shorthand support.
Also removed duplicate --model entry from KNOWN_FLAGS set.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Adds --model / -m CLI flag to override the agent's default LLM model:
spawn codex gcp --model openai/gpt-5.3-codex
Also supports persistent per-agent model preferences via config file at
~/.config/spawn/preferences.json:
{ "models": { "codex": "openai/gpt-5.3-codex" } }
Priority: --model flag > preferences file > agent default.
This enables a future web UI to pass model selection via CLI args when
invoking spawn programmatically to provision machines.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
* feat: gate tarball install behind --beta=tarball flag
Tarball install is not yet reliable enough to be the default.
Move it behind an opt-in --beta=tarball flag so users can test it
explicitly while live install remains the default path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: support multiple --beta flags (repeatable)
Parse all --beta flags from args in a loop, collecting them into a
comma-separated SPAWN_BETA env var. Consumers check for their feature
with Set.has() so multiple beta features can be active simultaneously.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: replace for(;;) loop with extractAllFlagValues helper
Cleaner approach: a dedicated helper mutates args in place and returns
all values for a repeatable flag, replacing the infinite loop pattern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add validateModelId() to reject model IDs containing shell metacharacters.
The validation is applied in orchestrate.ts immediately after resolving
MODEL_ID from env/agent defaults, before the value reaches any agent
configure function or runServer call. Invalid model IDs are dropped to
undefined with a warning.
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat: unified arrow-key selection + setup checkboxes
Replace p.autocomplete (type-ahead) with p.select (arrow-key navigation)
for agent and cloud selection. Add p.multiselect checkboxes for optional
post-provision setup steps (GitHub CLI, Chrome browser), all ON by default.
Three fast prompts: agent → cloud → setup options. Defaults: OpenClaw,
first cloud with credentials, all steps enabled.
Key changes:
- interactive.ts: p.autocomplete → p.select with initialValue defaults
- interactive.ts: promptSetupOptions() with p.multiselect, exported for reuse
- run.ts: wire setup options into cmdRun direct path
- agents.ts: OptionalStep type, getAgentOptionalSteps() static metadata
- orchestrate.ts: read SPAWN_ENABLED_STEPS env var, gate GitHub auth + configure
- agent-setup.ts: gate Chrome install with enabledSteps in setupOpenclawConfig
- Version bump 0.15.40 → 0.16.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: mirror tarball files to $HOME for non-root SSH users (GCP, AWS)
Tarballs are built with absolute /root/ paths, but GCP and AWS Lightsail
SSH as a regular user whose $HOME is /home/<user>/. After extraction,
binaries like `claude` end up at /root/.claude/local/bin/ but the
launchCmd looks in $HOME/.claude/local/bin/ — causing "command not found".
Add a post-extraction step that copies /root/ dotfiles to $HOME/ when
the SSH user isn't root. This fixes `spawn claude gcp` failing with
exit code 127 after tarball install.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: A <258483684+la14-1@users.noreply.github.com>
OpenClaw runs a web dashboard on port 18791 of the remote VM. This
change SSH-tunnels that port to localhost and auto-opens the browser,
giving users a web UI with zero CLI knowledge needed.
- Add TunnelConfig to AgentConfig interface (agents.ts)
- Add startSshTunnel function with port-finding logic (ssh.ts)
- Capture gateway token in closure so the same token is used for both
the remote config and the browser URL (agent-setup.ts)
- Wire tunnel into orchestration pipeline between preLaunch and
interactiveSession (orchestrate.ts)
- Add getConnectionInfo to CloudOrchestrator interface and implement
in all SSH-based clouds (DO, Hetzner, AWS, GCP)
- Local: opens browser directly at localhost:18791
- Sprite: gracefully skipped (no standard SSH)
- Add USER.md bootstrap to guide OpenClaw users to web dashboard
Closes#2449
Supersedes #2418
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>