spawn

vrr/spawn

mirror of https://github.com/OpenRouterTeam/spawn.git synced 2026-04-28 11:59:29 +00:00

Author	SHA1	Message	Date
A	50319e0d39	fix(hetzner): clean up orphaned primary IPs before provisioning to avoid quota exceeded (#2935 ) Hetzner E2E runs fail with `resource_limit_exceeded` when stale primary IPs from previous test runs consume the account quota. This adds proactive cleanup at two levels: 1. E2E shell driver: `_hetzner_cleanup_orphaned_ips()` deletes unattached primary IPs during pre-batch stale cleanup, freeing quota before any new servers are provisioned. 2. TypeScript CLI: `hetzner/main.ts` calls `cleanupOrphanedPrimaryIps()` before `createServer()` in headless/non-interactive mode, ensuring each agent provisioning attempt starts with a clean IP quota. The existing reactive cleanup (retry after failure) in `hetzner.ts` remains as a fallback. Fixes #2933 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-24 11:20:30 +07:00
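A minimal sketch of how unattached primary IPs might be selected for cleanup. It assumes each object returned by Hetzner's `GET /primary_ips` endpoint has an `id` and a nullable `assignee_id` that is `null` when the IP is not attached to a server. The `PrimaryIp` type and the `findOrphanedPrimaryIps` helper are hypothetical. The actual `cleanupOrphanedPrimaryIps()` and `_hetzner_cleanup_orphaned_ips()` may filter differently.

```typescript
// Hypothetical subset of a Hetzner primary IP object; only the fields used here.
interface PrimaryIp {
  id: number;
  ip: string;
  assignee_id: number | null; // assumed null when the IP is not attached to a server
}

// Returns the IDs of primary IPs that are not attached to any server, i.e. the
// candidates to delete before provisioning so they stop consuming quota.
function findOrphanedPrimaryIps(ips: readonly PrimaryIp[]): number[] {
  return ips.filter((ip) => ip.assignee_id === null).map((ip) => ip.id);
}

const sample: PrimaryIp[] = [
  { id: 1, ip: "203.0.113.10", assignee_id: 42 },
  { id: 2, ip: "203.0.113.11", assignee_id: null },
  { id: 3, ip: "2001:db8::1", assignee_id: null },
];
console.log(findOrphanedPrimaryIps(sample)); // IDs 2 and 3
```

Each returned ID would then be deleted through Hetzner's `DELETE /primary_ips/{id}` before `createServer()` runs. This frees quota up front instead of only after a failed create.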
Ahmed Abushagur	472b315762	fix: prevent permanent history lock when PID file write fails (#2928 ) Two bugs in acquireLock: 1. PID write failure was ignored — process returned success but left a lock dir without a PID file. If it crashed, no other process could detect the lock as stale, making it permanent. 2. Lock dirs without PID files were not treated as stale — other processes waited until timeout instead of cleaning up immediately. Fix: retry on PID write failure (clean up dir first), and treat lock dirs without PID files as broken/stale (force remove). Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-24 06:47:10 +07:00
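A sketch of both fixes as a directory lock. It assumes the lock directory holds a `pid` file, and that file name is hypothetical. `mkdirSync` provides atomic acquisition. A lock whose PID file is missing or names a dead process is removed immediately. If the PID write fails, the directory is removed before retrying, so a PID-less lock never survives. The real `acquireLock` waits until a timeout when a live process holds the lock. This version simply returns `false`.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Extracts a string `code` from an unknown thrown value, if present.
function errorCode(err: unknown): string | undefined {
  if (typeof err === "object" && err !== null && "code" in err) {
    const code = (err as { code: unknown }).code;
    return typeof code === "string" ? code : undefined;
  }
  return undefined;
}

// Signal 0 only checks that the process exists; EPERM means it exists but
// belongs to another user.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return errorCode(err) === "EPERM";
  }
}

// Hypothetical directory lock: returns true once this process holds the lock.
function acquireLock(lockDir: string, maxAttempts = 5): boolean {
  const pidFile = join(lockDir, "pid");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      mkdirSync(lockDir); // atomic: throws EEXIST if the lock already exists
    } catch (err) {
      if (errorCode(err) !== "EEXIST") throw err;
      let holder = Number.NaN;
      try {
        holder = Number.parseInt(readFileSync(pidFile, "utf8"), 10);
      } catch {
        // Missing or unreadable PID file: the lock is broken, treat as stale.
      }
      if (Number.isInteger(holder) && holder > 0 && isProcessAlive(holder)) {
        return false; // held by a live process
      }
      rmSync(lockDir, { recursive: true, force: true }); // stale or broken lock
      continue;
    }
    try {
      writeFileSync(pidFile, String(process.pid));
      return true;
    } catch {
      // Never leave a lock dir without a PID file: remove it and retry.
      rmSync(lockDir, { recursive: true, force: true });
    }
  }
  return false;
}

const base = mkdtempSync(join(tmpdir(), "spawn-lock-"));
const lock = join(base, "history.lock");
mkdirSync(lock); // simulate a broken lock left behind without a PID file
console.log(acquireLock(lock)); // true: the broken lock was cleaned up
```

Treating a PID-less directory as stale assumes the holder writes its PID right after `mkdir`. A competing process that checks during that brief window could remove a live lock, which appears to be the trade-off accepted to avoid permanent locks.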
A	18b1a5f50f	fix(install): force IPv4 DNS for npm installs and add junie binary verify (#2920 ) * chore: update agent GitHub star counts * chore: update agent GitHub star counts * chore: update agent GitHub star counts * chore: update agent GitHub star counts * chore: update agent GitHub star counts * fix(install): force IPv4 DNS for npm installs and add junie binary verify On Sprite VMs (and potentially other clouds with flaky IPv6 routing), npm install of packages with native-binary postinstall scripts (kilocode, junie) fails with i/o timeout when connecting to the npm registry over IPv6. Changes: - Add NODE_OPTIONS=--dns-result-order=ipv4first to NPM_PREFIX_SETUP so all npm installs prefer IPv4, preventing the IPv6 timeout on first attempt - Add cd ~ before postinstall re-run in KILOCODE_BINARY_VERIFY to avoid "current working directory was deleted" errors in bun/node on retry - Add JUNIE_BINARY_VERIFY snippet (analogous to kilocode) that detects and recovers from a failed junie postinstall by re-running it from $HOME - Apply JUNIE_BINARY_VERIFY to the junie install command Fixes sprite kilocode and junie failures seen in E2E run 2026-03-23. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-24 05:13:12 +07:00
A	e0db833307	fix(update-check): redirect install script stdout to stderr in --output json mode (#2919 ) When --output json is requested, the auto-update install script was running with stdio: "inherit", causing [spawn] install messages to pollute stdout before the JSON result, breaking JSON consumers. Fix: - Pre-scan process.argv for --output json before checkForUpdates() is called in index.ts (formal flag parsing happens later at line 944) - Pass jsonOutput flag through checkForUpdates() -> performAutoUpdate() - When jsonOutput=true, use stdio: ["pipe", stderr, stderr] for the install script execution so all output goes to stderr only - Set SPAWN_CLI_UPDATED=1 env var on re-exec so JSON consumers can detect the update via cli_updated: true in SpawnResult - Add cli_updated?: boolean to SpawnResult interface in commands/run.ts - Add tests covering both json and non-json stdio behavior Fixes #2918 Agent: issue-fixer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-24 03:18:50 +07:00
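A sketch of the argv pre-scan and the stdio choice. It assumes Node-style `child_process` semantics, where a `stdio` array is `[stdin, stdout, stderr]` and integer entries are parent file descriptors (fd 2 is stderr). `wantsJsonOutput` and `installStdio` are hypothetical names. Accepting the `--output=json` form is also an assumption.

```typescript
// Scans raw argv before formal flag parsing, so the update check already knows
// whether stdout must stay clean for a JSON result.
function wantsJsonOutput(argv: readonly string[]): boolean {
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === "--output" && argv[i + 1] === "json") return true;
    if (argv[i] === "--output=json") return true; // assumed alternate form
  }
  return false;
}

// In JSON mode, both the child's stdout and stderr go to fd 2, so only the JSON
// result ever reaches fd 1. Otherwise the install script inherits the terminal.
type InstallStdio = "inherit" | ["pipe", number, number];

function installStdio(jsonOutput: boolean): InstallStdio {
  return jsonOutput ? ["pipe", 2, 2] : "inherit";
}

const json = wantsJsonOutput(["bun", "spawn", "claude", "hetzner", "--output", "json"]);
console.log(json, installStdio(json));
```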
A	f38ae693de	fix: set SPAWN_NON_INTERACTIVE in headless mode to prevent prompt hangs (#2916 ) Headless mode set SPAWN_HEADLESS and SPAWN_MODE but not SPAWN_NON_INTERACTIVE, which all cloud modules check before prompting. This caused GCP (and potentially other clouds) to prompt for project confirmation when stdin was closed, resulting in a fatal error. Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-24 01:22:47 +07:00
A	a959a6db83	fix(types): remove `as` type assertions from test mocks (#2913 ) Add missing fields (signalCode, resourceUsage, pid, killed) to Bun.spawnSync and Bun.spawn mock return values so they satisfy the full return types without needing `as` casts or biome-ignore comments. Agent: style-reviewer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-24 00:24:49 +07:00
A	59dea5fc09	refactor: remove dead code and stale references (#2908 ) - remove `export` from `LocalTarball` interface in `shared/agent-tarball.ts` — the type is only used internally as the return type of `downloadTarballLocally`; it was never imported from outside the module. - remove `getTerminalWidth` re-export from `commands/index.ts` — `getTerminalWidth` is only called inside `commands/info.ts` itself; it was re-exported through the barrel but never imported from there by any consumer or test. bump CLI version patch: 0.25.18 → 0.25.19 Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-23 19:51:41 +07:00
A	f296544c1c	fix(cli): bump version to 0.25.18 for security fix in #2904 (#2906 ) Commit `97b6424` (fix(security): add cmd validation to Sprite runSprite() and runSpriteSilent()) changed production CLI code without a corresponding version bump. The CLI has auto-update — without this bump users won't receive the null-byte injection guard. Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-23 18:50:00 +07:00
A	5392ff2d7a	fix: detect and recover from Hetzner primary_ip_limit exceeded error (#2905 ) When parallel E2E runs exhaust Hetzner's Primary IP quota, the CLI now detects the `resource_limit_exceeded` / `primary_ip_limit` error, automatically cleans up orphaned Primary IPs (unattached to any server), and retries once. If cleanup doesn't free quota, a clear message guides users to delete stale resources or request a quota increase. Fixes #2902 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-23 17:26:32 +07:00
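A hedged sketch of the flow this commit describes: detect the quota error, clean up, then retry once. It assumes the API error surfaces as an `Error` whose message contains `resource_limit_exceeded` or `primary_ip_limit`. `isPrimaryIpLimitError`, `createWithQuotaRecovery`, and the `cleanupOrphans` callback (which resolves to the number of IPs deleted) are hypothetical names, not the CLI's API.

```typescript
// True when an error message looks like Hetzner's primary IP quota error.
// Matching on message text is an assumption about how the error surfaces.
function isPrimaryIpLimitError(err: unknown): boolean {
  const text = err instanceof Error ? err.message : String(err);
  return text.includes("resource_limit_exceeded") || text.includes("primary_ip_limit");
}

// Attempts creation. On a quota error it deletes orphaned primary IPs and
// retries exactly once; any other error propagates unchanged.
async function createWithQuotaRecovery<T>(
  create: () => Promise<T>,
  cleanupOrphans: () => Promise<number>, // resolves to the number of IPs deleted
): Promise<T> {
  try {
    return await create();
  } catch (err) {
    if (!isPrimaryIpLimitError(err)) throw err;
    const freed = await cleanupOrphans();
    if (freed === 0) {
      throw new Error(
        "Hetzner primary IP quota exceeded and no orphaned IPs were found. " +
          "Delete stale primary IPs or request a quota increase.",
      );
    }
    return await create(); // single retry; a second failure propagates
  }
}
```

When cleanup frees nothing, a retry would fail the same way. That is why the sketch throws the guidance message instead of retrying.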
A	7aba20e327	fix(ux): deduplicate install messages, add newlines to SSH polling, clarify completion messages (#2900 ) - Suppress stdout+stderr from `claude install --force` to prevent duplicate "successfully installed" messages (was printed up to 4x) - Make logStepInline fall back to newline-separated output when stderr is not a TTY, so SSH port polling status is readable in piped/captured contexts - Consolidate post-install completion messages into a single clear milestone: "Agent setup complete -- {agent} is ready on {cloud}" - Bump CLI version to 0.25.16 Fixes #2899 Agent: ux-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-23 15:26:34 +07:00
A	f1f2667cb0	fix: skip interactive session in headless mode (#2895 ) * fix: skip interactive session in headless mode (#2892) When SPAWN_HEADLESS=1, the orchestrator now exits with code 0 after provisioning completes instead of attempting to launch the agent interactively. This fixes Claude Code (and other agents) failing with "Input must be provided through stdin or --prompt" when spawned via `--headless --output json` without a prompt. The VM is fully provisioned and ready — callers can SSH in or use `spawn connect` to start the agent manually. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: clean up SPAWN_HEADLESS env in test afterEach to prevent leaks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-22 21:38:53 -07:00
A	0224b56a4d	fix(digitalocean): detect droplet limit before creation, clear error on 422 (#2891 ) checkAccountStatus() now queries the account's droplet_limit and current droplet count. When at capacity it warns interactively and throws immediately in headless/E2E mode with a clear message instead of attempting creation and getting a cryptic 422. Also adds specific detection of droplet limit 422 errors in createServer() with actionable guidance (limit increase URL). Bump CLI to 0.25.14. Fixes #2865 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-22 18:49:17 -07:00
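A minimal sketch of the pre-creation capacity check. It assumes the limit comes from the `droplet_limit` field of DigitalOcean's `GET /v2/account` response and that the current droplet count has already been fetched. `checkDropletCapacity` is a hypothetical helper. It returns `"warn"` for interactive sessions and throws in headless/E2E mode, which mirrors the behavior the commit describes.

```typescript
interface DropletCapacity {
  limit: number; // assumed to come from account.droplet_limit
  current: number; // number of droplets already on the account
}

// Fails fast in non-interactive runs instead of letting createServer() hit a 422.
function checkDropletCapacity(cap: DropletCapacity, nonInteractive: boolean): "ok" | "warn" {
  if (cap.current < cap.limit) return "ok";
  const message =
    `DigitalOcean droplet limit reached (${cap.current}/${cap.limit}). ` +
    "Delete unused droplets or request a limit increase.";
  if (nonInteractive) throw new Error(message);
  return "warn"; // the caller shows the warning and lets the user decide
}

console.log(checkDropletCapacity({ limit: 10, current: 3 }, true)); // ok
```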
Ahmed Abushagur	baf03ce47b	fix: prevent sprite idle shutdown during agent install (#2874 ) The sprite was going idle and shutting down during long npm install operations because the remote keep-alive script wasn't installed yet and sprite exec alone doesn't count as activity. - Add local keep-alive that pings the sprite's public URL every 30s from the client machine during provisioning and agent install - Stop it when the interactive session starts (remote script takes over) - Add i/o timeout to spriteRetry's transient error regex so connection timeouts are retried instead of failing immediately Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-23 02:13:07 +07:00
A	c1363b138c	feat(gcp): default boot disk to 40 GB, configurable via GCP_DISK_SIZE (#2867 ) GCP's default 10 GB boot disk is insufficient for coding agents — node_modules, apt packages, and build caches easily exceed it. Default to 40 GB and allow override via GCP_DISK_SIZE env var. Closes #2866 Co-authored-by: Claude <claude@anthropic.com>	2026-03-22 11:21:05 +07:00
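A minimal sketch of resolving the disk size from the environment, assuming `GCP_DISK_SIZE` holds a whole number of gigabytes. `resolveGcpDiskSizeGb`, the 10 GB floor, and the silent fallback to 40 on invalid input are assumptions, not confirmed behavior.

```typescript
const DEFAULT_GCP_DISK_SIZE_GB = 40;
const MIN_GCP_DISK_SIZE_GB = 10; // assumed floor, matching GCP's old 10 GB default

// Returns GCP_DISK_SIZE as an integer number of GB, or 40 if unset or invalid.
function resolveGcpDiskSizeGb(env: Record<string, string | undefined>): number {
  const raw = env.GCP_DISK_SIZE?.trim();
  if (!raw) return DEFAULT_GCP_DISK_SIZE_GB;
  const size = Number(raw);
  if (!Number.isInteger(size) || size < MIN_GCP_DISK_SIZE_GB) return DEFAULT_GCP_DISK_SIZE_GB;
  return size;
}

console.log(resolveGcpDiskSizeGb({})); // 40
console.log(resolveGcpDiskSizeGb({ GCP_DISK_SIZE: "100" })); // 100
```

Since the CLI shells out to `gcloud`, the resolved value could plausibly be passed as `--boot-disk-size=${size}GB` to `gcloud compute instances create`. The exact wiring is not stated in the commit.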
A	3f12cb9ee8	refactor: remove duplicate docker constants into shared orchestrate module (#2860 ) Consolidate DOCKER_CONTAINER_NAME and DOCKER_REGISTRY constants from gcp/main.ts and hetzner/main.ts into shared/orchestrate.ts. Both files defined identical values ("spawn-agent" and "ghcr.io/openrouterteam"); they now import the shared exports instead. Bumps CLI patch version to 0.25.11. Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-21 14:27:21 -07:00
A	7ab6c693d3	fix: add --beta docker to help output and update description (#2857 ) The --beta docker feature (PR #2854) was missing from `spawn help` output, and its error description said "Hetzner" only but it also works on GCP. Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-21 06:20:35 -07:00
Ahmed Abushagur	6d2c4746f5	feat: add --beta docker for Hetzner Docker CE app image (#2854 ) * feat: add --beta docker for Hetzner Docker CE app image Uses Hetzner's pre-built docker-ce app image when --beta docker (or --fast) is active, giving faster boot times similar to DO marketplace images. Snapshots still take priority when available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: pull and run pre-built agent Docker images on Hetzner When --beta docker (or --fast) is active, boots Hetzner with docker-ce app image, then pulls ghcr.io/openrouterteam/spawn-{agent}:latest and runs it. All runServer commands are routed through docker exec into the container, and the interactive session uses docker exec -it. Skips agent install since the agent is pre-baked in the image. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add --beta docker support for GCP with Container-Optimized OS When --beta docker (or --fast) is active on GCP, uses cos-stable from cos-cloud (Docker pre-installed, read-only OS). Skips cloud-init startup script (incompatible with COS), pulls the pre-built agent image from ghcr.io, and routes all commands through docker exec. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: correct import path for logInfo/logStep (shared/log.js -> shared/ui.js) The log.js module does not exist; these functions are exported from ui.ts. Also merge duplicate ui.js imports per biome organizeImports. Agent: pr-maintainer Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: B <6723574+louisgv@users.noreply.github.com>	2026-03-21 17:10:19 +07:00
Ahmed Abushagur	8c7a381375	fix: auto-reconnect on Sprite connection drops (#2855 ) Sprite CLI exits with code 1 on "connection closed" (not 255 like SSH). The reconnect loop now treats exit code 1 on Sprite as a connection drop, retrying up to 5 times with a 3s delay between attempts. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 15:13:14 +07:00
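A synchronous sketch of the reconnect rule. It assumes the session runs through a blocking call that returns its exit code. `runWithReconnect` and `isConnectionDrop` are hypothetical names. The exit codes (255 for SSH, 1 for Sprite), the 5-retry cap, and the 3 s delay come from the commit message.

```typescript
type Transport = "ssh" | "sprite";

// SSH signals a dropped connection with 255; the Sprite CLI exits with 1 on
// "connection closed".
function isConnectionDrop(transport: Transport, exitCode: number): boolean {
  return transport === "ssh" ? exitCode === 255 : exitCode === 1;
}

// Blocking sleep, suitable for a spawnSync-style session loop.
function sleepMs(ms: number): void {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Runs a session and reconnects up to maxRetries times while exits look like drops.
function runWithReconnect(
  transport: Transport,
  runSession: () => number, // returns the session's exit code
  maxRetries = 5,
  delayMs = 3000,
  sleep: (ms: number) => void = sleepMs,
): number {
  let exitCode = runSession();
  for (let retry = 1; retry <= maxRetries && isConnectionDrop(transport, exitCode); retry++) {
    sleep(delayMs);
    exitCode = runSession();
  }
  return exitCode;
}
```

Exit code 1 is generic, so a Sprite session that fails for an unrelated reason is also retried. The retry cap bounds that cost.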
Ahmed Abushagur	26332afa56	fix: prevent silent exit in --fast mode on Sprite (#2852 ) In fast mode, Promise.allSettled runs server boot, OAuth, and tarball download concurrently. When all operations complete — especially after Bun.serve.stop(true) in the OAuth flow removes its event loop handle — the event loop can appear empty before the await continuation starts new I/O operations. This causes Bun to exit silently with code 0, dropping the user back to their shell after "Successfully obtained OpenRouter API key via OAuth!" with no error. Fix: keep a dummy setInterval handle alive during the fast-mode concurrent section so the event loop never drains prematurely. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 20:51:02 -07:00
A	b9e326d649	fix: use base64 encoding for GITHUB_TOKEN to prevent injection (#2840 ) * fix: use base64 encoding for GITHUB_TOKEN to prevent injection Aligns GITHUB_TOKEN handling with the existing base64 pattern used for OPENROUTER_API_KEY in orchestrate.ts, eliminating the single-quote escaping vulnerability. Fixes #2834 Agent: security-auditor Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: apply shellQuote to base64-encoded GITHUB_TOKEN Address security review feedback: wrap the base64-encoded token in shellQuote() for defense-in-depth, preventing any theoretical shell metacharacter escape from the interpolated value. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-20 16:46:49 -07:00
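A sketch of the base64-plus-quoting pattern. The commit does not show the exact remote command, so `buildSecretExport` and the `printf | base64 -d` pipeline are assumptions. `shellQuote` here is the common POSIX single-quote escape and may differ from the project's helper of the same name. Base64 output only uses `[A-Za-z0-9+/=]`, so the secret's raw bytes never appear in shell syntax.

```typescript
// POSIX single-quote escaping: wrap in '...' and replace each ' with '\''.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Builds a remote shell line that exports a secret without embedding it raw.
// The remote side decodes it with coreutils base64 (assumed available).
function buildSecretExport(name: string, secret: string): string {
  if (!/^[A-Z_][A-Z0-9_]*$/.test(name)) throw new Error(`invalid variable name: ${name}`);
  const encoded = Buffer.from(secret, "utf8").toString("base64");
  return `export ${name}="$(printf '%s' ${shellQuote(encoded)} | base64 -d)"`;
}

console.log(buildSecretExport("GITHUB_TOKEN", "ghp_example'; echo pwned"));
```

Base64 alone already removes the quote-escaping problem. The extra `shellQuote` is the defense-in-depth layer the review asked for.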
A	f4e2cd80a4	fix(ux): add spawn link to help output and --fast to KNOWN_FLAGS (#2828 ) spawn link is a fully implemented command (440 lines) that was completely missing from `spawn help`. Users had no way to discover it through the CLI's self-documentation. Also adds --fast to the KNOWN_FLAGS set for consistency — it was accepted by the CLI but not registered in the flag validation set. Agent: ux-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-20 08:49:26 -07:00
Ahmed Abushagur	21c0e1511c	fix: remove 100-entry history cap — keep all records (#2819 ) The MAX_HISTORY_ENTRIES=100 cap silently archived records when you spawned more than 100 times, making older active servers vanish from `spawn list`. The cap was solving a non-problem — 1000 records is ~500KB. Removed: - MAX_HISTORY_ENTRIES constant and trimming logic - archiveRecords() and readExistingArchive() (no longer needed) - Smart trim tests (history-trimming.test.ts rewritten to test ordering only) Existing archive files (~/.spawn/history-YYYY-MM-DD.json) are still readable by recoverFromArchives() for corruption recovery. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-20 06:32:08 -07:00
A	24bdf664ab	fix(types): resolve TypeScript strict mode errors in production code (#2824 ) Fix 24 TypeScript strict mode errors across 7 production files: - interactive.ts: guard against undefined `val` in validate callback - list.ts: use already-narrowed `conn` variable instead of `selected.connection` - run.ts: widen `buildCloudLines` defaults param to `Record<string, unknown>` - digitalocean.ts: use `toRecord()` to safely drill into nested API responses; capture narrowed `oauthCode` in const for async closure - history.ts: backfill missing record IDs via `backfillRecordIds()` helper; use `v.safeParse` output directly to get properly typed records - index.ts: use `Manifest` type for `showUnknownCommandError` parameter - orchestrate.ts: capture narrowed `tunnel` and `getConnectionInfo` in const variables before async closures Fixes #2821 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com>	2026-03-20 03:17:04 -07:00
A	69b6f8aa66	fix(test): fix 7 failing tests — GCP mock gaps and sandbox pollution (#2816 ) - GCP coverage tests (6 failures): getServerIp, listServers, and authenticate tests did not mock the `which gcloud` spawnSync call inside requireGcloudCmd(), causing "gcloud CLI not found" errors. Add mockSpawnSyncWithGcloud/mockWhichGcloud helpers that satisfy the gcloud discovery call before the test-specific mock. - Sandbox guardrail test (1 failure): cmd-uninstall-cov deletes ~/.spawn and other sandbox directories but never re-creates them. Since Bun runs test files in the same process, the fs-sandbox test then fails. Add afterEach restoration of sandbox dirs. - Add coverageThreshold to bunfig.toml with correct syntax (coverageThreshold under [test], not [test.coverage]) Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-19 23:43:13 -07:00
A	9ae3525030	feat: enforce CI coverage thresholds + colocate billing guidance (#2811 ) - Move bunfig.toml to repo root with valid coverageThreshold syntax (line=80%, function=0 to avoid per-file false positives) - Add --coverage flag to CI test step - Delete packages/cli/bunfig.toml (superseded by root config) - Add tests for packages/shared (type-guards, parse, result) - Colocate billing config into each cloud directory (aws/billing.ts, gcp/billing.ts, hetzner/billing.ts, digitalocean/billing.ts) - Refactor billing-guidance.ts: BillingConfig interface replaces cloud-string-keyed Record maps - Bump CLI version to 0.25.1 Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-19 22:52:45 -07:00
Ahmed Abushagur	ed127cf592	feat: never-give-up resilience layer (#2807 ) * feat: never-give-up resilience layer — retry every failure instead of exiting Add retryOrQuit() helper to shared/ui.ts that prompts "Try again? (Y/n)" after any recoverable failure. Wrap all fatal exit points with retry loops: - Cloud auth (Hetzner, DigitalOcean, AWS, GCP): retry after 3 failed tokens - API key acquisition: retry after 3 failed OAuth+manual attempts - Server creation: retry on any createServer failure (both fast & sequential) - SSH readiness: retry on waitForReady timeout - Agent install: retry on install failure - Pre-launch hooks: retry on preLaunch failure Non-interactive mode (SPAWN_NON_INTERACTIVE=1) still throws immediately. Ctrl+C at any retry prompt exits cleanly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(e2e): add AI-driven interactive test harness Add --interactive mode to the E2E test framework. Instead of running spawn in headless mode (SPAWN_NON_INTERACTIVE=1), this spawns the CLI in a real PTY and uses Claude Haiku to respond to prompts like a human user would. New files: - sh/e2e/interactive-harness.ts — Bun script that drives the PTY + AI loop - sh/e2e/lib/interactive.sh — Bash integration with the E2E framework Usage: e2e.sh --cloud hetzner claude --interactive Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(qa): wire interactive E2E into scheduled QA pipeline - Add `e2e-interactive` option to workflow_dispatch in qa.yml - Add `e2e-interactive` run mode to qa.sh (loads cloud creds + ANTHROPIC_API_KEY) - Runs `e2e.sh --cloud hetzner claude --interactive` directly (no Claude Code needed) - Defaults to hetzner (cheapest), overridable via E2E_INTERACTIVE_CLOUD/AGENT env vars Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(qa): schedule interactive E2E daily at 6am UTC Runs one agent (claude) on one cloud (hetzner) with AI-driven prompts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(qa): offset soak cron to avoid GitHub Actions schedule dedup GitHub Actions deduplicates overlapping cron schedules into one run, making `github.event.schedule` unpredictable. The soak test at `0 3 * * 1` was getting absorbed by the `0 */4 * * *` quality sweep and never firing as reason=soak. Move soak to `30 1 * * 1` (Monday 1:30am UTC) — safely between the 0am and 4am quality sweep slots. Interactive E2E at `0 6 * * *` is already safe (between the 4am and 8am slots). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(qa): add e2e-interactive to trigger server valid reasons The trigger server validates reason query params against an allowlist. Without this, the `e2e-interactive` dispatch returns 400. Also note: `soak` is already in VALID_REASONS in the repo but the running service on the QA VM is stale — needs a restart to pick up both soak and e2e-interactive reasons. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 17:33:22 -07:00
Ahmed Abushagur	2280550c18	perf: skip cloud-init for minimal-tier agents with tarballs/snapshots (#2804 ) * perf: skip cloud-init for minimal-tier agents with tarballs/snapshots Ubuntu 24.04 base images already have curl + git, so minimal-tier agents (claude, opencode, zeroclaw, hermes) don't need the cloud-init package install step when using tarballs or snapshots. Adds skipCloudInit flag to CloudOrchestrator — set automatically when (tarball || snapshot) && tier === "minimal". Each cloud's waitForReady checks this flag and calls waitForSshOnly instead of waitForCloudInit. Saves ~30-60s on minimal-tier agent deploys with --fast or --beta tarball. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add --fast mode and updated beta features to README Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: remove timing table from README Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-19 16:14:49 -07:00
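The rule from this commit, written out as a small predicate. `ProvisionPlan`, `shouldSkipCloudInit`, and `readinessCheck` are hypothetical names. Only the `(tarball || snapshot) && tier === "minimal"` condition and the `waitForSshOnly`/`waitForCloudInit` split come from the commit message.

```typescript
interface ProvisionPlan {
  tier: string; // only the "minimal" tier name is taken from the commit
  useTarball: boolean;
  useSnapshot: boolean;
}

// Minimal-tier agents only need curl + git, which Ubuntu 24.04 already ships,
// so cloud-init is unnecessary once the agent comes from a tarball or snapshot.
function shouldSkipCloudInit(plan: ProvisionPlan): boolean {
  return (plan.useTarball || plan.useSnapshot) && plan.tier === "minimal";
}

// Which readiness wait each cloud's waitForReady would choose.
function readinessCheck(plan: ProvisionPlan): "waitForSshOnly" | "waitForCloudInit" {
  return shouldSkipCloudInit(plan) ? "waitForSshOnly" : "waitForCloudInit";
}

console.log(readinessCheck({ tier: "minimal", useTarball: true, useSnapshot: false })); // waitForSshOnly
```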
A	1d0349cc23	test: add SPAWN_FAST fast-mode coverage to orchestrate (#2801 ) Add 6 test cases verifying the Promise.allSettled parallel orchestration path introduced in #2796. Tests cover: happy path, server boot failure propagation, API key failure propagation, tarball fallback to agent.install, local cloud exclusion from fast mode, and non-fatal preProvision/checkAccountReady failures. Agent: test-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-19 13:16:02 -07:00
Ahmed Abushagur	5efbcf9ee7	feat: add --fast flag for parallel server boot + setup (#2796 ) * feat: add --fast flag for parallel server boot + setup Adds `--fast` flag that runs server creation concurrently with API key prompt, account check, pre-provision hooks, tarball download, and env config generation. Once SSH is up, uploads tarball and applies config. --fast implies --beta tarball and --beta images, enabling snapshots and pre-built tarballs automatically. Flow without --fast (sequential): auth → API key → preProvision → size → create → boot → install → configure Flow with --fast (parallel): auth → size → [create+boot | API key | preProvision | tarball download | accountCheck] → upload tarball → inject env → configure Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add --beta parallel as standalone opt-in for parallel setup --beta parallel enables the parallel orchestration without implying tarball/images. --fast still implies all three (tarball + images + parallel). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 10:26:54 -07:00
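A sketch of how the flags might resolve to a feature set, assuming `--beta` values arrive as an array of strings. `resolveBetaFeatures` and the rejection of unknown values are assumptions. The implication rules come from the commit message: `--fast` enables tarball, images, and parallel, while `--beta parallel` enables parallel alone.

```typescript
type BetaFeature = "tarball" | "images" | "parallel";

const KNOWN_BETA_FEATURES: readonly BetaFeature[] = ["tarball", "images", "parallel"];

// --fast turns on every feature; each --beta value turns on exactly one.
function resolveBetaFeatures(fast: boolean, betaFlags: readonly string[]): Set<BetaFeature> {
  const enabled = new Set<BetaFeature>();
  if (fast) for (const feature of KNOWN_BETA_FEATURES) enabled.add(feature);
  for (const flag of betaFlags) {
    const match = KNOWN_BETA_FEATURES.find((known) => known === flag);
    if (!match) throw new Error(`unknown --beta feature: ${flag}`);
    enabled.add(match);
  }
  return enabled;
}

console.log([...resolveBetaFeatures(false, ["parallel"])]); // only "parallel"
```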
A	6772ed1cd7	fix(cli): validate agentKey in buildFixScript and fixSpawn before manifest lookup (#2792 ) Add validateIdentifier() calls to buildFixScript() and fixSpawn() to ensure agent keys from spawn history match [a-z0-9_-]+ before using them to index manifest.agents. This prevents potential prototype pollution or unexpected behavior from tampered history files. Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-19 06:36:06 -07:00
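A sketch of the guard, assuming `manifest.agents` is a plain object keyed by agent ID. `validateIdentifier` exists in the CLI, but the body shown here and the `lookupAgent` helper are illustrative. Note that `__proto__` and `constructor` both match `[a-z0-9_-]+`, so this sketch adds an own-property check. The commit does not say whether the real fix does the same.

```typescript
const IDENTIFIER_RE = /^[a-z0-9_-]+$/;

// Rejects anything outside [a-z0-9_-]+ (path separators, dots, uppercase, etc.).
function validateIdentifier(value: string): string {
  if (!IDENTIFIER_RE.test(value)) throw new Error(`invalid identifier: ${JSON.stringify(value)}`);
  return value;
}

interface AgentDef {
  name: string;
}

// The regex alone still admits "__proto__" and "constructor", so only the
// object's own keys are honored.
function lookupAgent(agents: Record<string, AgentDef>, key: string): AgentDef | undefined {
  validateIdentifier(key);
  return Object.prototype.hasOwnProperty.call(agents, key) ? agents[key] : undefined;
}

const agents: Record<string, AgentDef> = { claude: { name: "Claude Code" } };
console.log(lookupAgent(agents, "claude")?.name); // Claude Code
```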
A	787087144c	fix(cli): bump version to 0.23.2 for missed patch releases (#2787 ) Two CLI changes landed after the last version bump (0.23.1) without incrementing the version: - `d9575acd`: fix(cli): exit with code 1 on spawn fix error paths - `148cc9e7`: refactor: extract duplicate waitForSshSnapshotBoot to shared/ssh.ts The CLI has auto-update enabled — without a version bump, users won't pick up these fixes on next run. Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-19 01:00:10 -07:00
A	15a62a9ad0	fix(cli): use tryCatch for JSON.parse in loadPreferredModel (#2782 ) tryCatchIf(isFileError) only catches filesystem errors (ENOENT, EACCES), but JSON.parse throws SyntaxError on corrupted preferences.json. This was the same bug fixed in `16a2f180` across 4 files, but orchestrate.ts was missed. A corrupted ~/.spawn/preferences.json would crash the CLI instead of gracefully falling back to no preferred model. Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-18 20:15:17 -07:00
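A sketch of the corrected behavior, assuming preferences are stored as JSON with a top-level `model` string. The key name and `loadPreferredModel` body are hypothetical. The point is that one catch covers both the filesystem errors from `readFileSync` and the `SyntaxError` that `JSON.parse` throws on a corrupted file. Either case falls back to no preferred model.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Returns the preferred model, or undefined when the preferences file is
// missing, unreadable, corrupted, or lacks a string "model" field.
function loadPreferredModel(path: string): string | undefined {
  let parsed: unknown;
  try {
    // readFileSync may throw ENOENT/EACCES; JSON.parse may throw SyntaxError.
    parsed = JSON.parse(readFileSync(path, "utf8"));
  } catch {
    return undefined; // catch both kinds, not only filesystem errors
  }
  if (typeof parsed !== "object" || parsed === null || !("model" in parsed)) return undefined;
  const model = parsed.model;
  return typeof model === "string" ? model : undefined;
}

const dir = mkdtempSync(join(tmpdir(), "spawn-prefs-"));
const good = join(dir, "good.json");
const corrupt = join(dir, "corrupt.json");
writeFileSync(good, JSON.stringify({ model: "openrouter/auto" }));
writeFileSync(corrupt, '{ "model": ');
console.log(loadPreferredModel(good)); // openrouter/auto
console.log(loadPreferredModel(corrupt)); // undefined
```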
Ahmed Abushagur	7289f3ef36	feat(hetzner): add snapshot support + Packer image builds (#2774 ) CLI changes: - Add findSpawnSnapshot() to query Hetzner /images?type=snapshot API for pre-built spawn-{agent}-* images (matches by description prefix) - Add waitForSshOnly() for snapshot boots (skips cloud-init polling) - Update createServer() to accept optional snapshotId — boots from snapshot instead of ubuntu-24.04, skips cloud-init userdata - Wire up orchestrator with skipAgentInstall flag Packer changes: - Add packer/hetzner.pkr.hcl using hcloud plugin, mirroring the DO template (tier scripts, agent install, cleanup, manifest) - Unify packer-snapshots.yml to build both DO and Hetzner in a single workflow with cloud×agent matrix and per-cloud cleanup steps Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 16:46:48 -07:00
A	fc98700a24	fix(digitalocean): use s-2vcpu-4gb-intel for openclaw to support nyc3 region (#2769 ) s-2vcpu-4gb is not available in nyc3 (the default E2E region), causing openclaw provisioning to fail with 422. s-2vcpu-4gb-intel offers the same specs (2 vCPUs, 4 GB RAM) and is available in all regions including nyc3. -- qa/e2e-tester Co-authored-by: spawn-qa-bot <qa@openrouter.ai>	2026-03-18 11:26:19 -07:00
A	b46524887d	feat(hetzner): fetch locations from API, re-prompt on unavailable location (#2766 ) Hetzner disabled fsn1 (Falkenstein), causing a fatal HTTP 412 error for all users using the default location. This change: - Fetches available locations dynamically from GET /locations API - Falls back to a hardcoded list if the API call fails - On location-unavailable errors (HTTP 412 resource_unavailable), prompts the user to pick a different location instead of crashing - Changes default location from fsn1 to nbg1 (Nuremberg) - Excludes previously-failed locations from the re-pick list Closes #2764 Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Security Reviewer <security@openrouter.ai> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-18 10:39:42 -07:00
A	75c75d42d4	fix(ui): propagate Ctrl+C/Esc cancellation instead of returning empty string (#2757 ) When p.isCancel() detected user cancellation in prompt() and selectFromList(), the result was silently converted to "" instead of exiting. This caused infinite retry loops in billing prompts, silent fallthrough in oauth key entry, and unintended defaults in name prompts. Now both functions call process.exit(0) on cancel for a clean exit. Fixes #2745 Agent: ux-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-17 23:54:32 -07:00
A	fef312cd47	fix(update): cache successful update checks for 1 hour (#2755 ) checkForUpdates() previously fetched the latest version from GitHub on every single CLI invocation, blocking for up to 10s on slow/offline connections. Now it writes a timestamp to ~/.config/spawn/.update-checked after a successful check and skips the network call if the cache is less than 1 hour old. Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-17 23:08:05 -07:00
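A sketch of the one-hour cache. The stamp path mirrors `~/.config/spawn/.update-checked`, but storing epoch milliseconds as the file body, the clock-skew guard, and the function names are assumptions. The stamp is written only after a successful check, so failed checks are retried on the next run.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const UPDATE_CHECK_TTL_MS = 60 * 60 * 1000; // 1 hour

// True when no successful check is recorded within the last hour.
function shouldCheckForUpdates(stampPath: string, now = Date.now()): boolean {
  let last = Number.NaN;
  try {
    last = Number(readFileSync(stampPath, "utf8").trim());
  } catch {
    return true; // no stamp yet
  }
  // A stamp from the future (clock skew) is treated as expired.
  return !Number.isFinite(last) || last > now || now - last >= UPDATE_CHECK_TTL_MS;
}

// Call only after the network check has succeeded.
function recordUpdateCheck(stampPath: string, now = Date.now()): void {
  writeFileSync(stampPath, String(now));
}

const dir = mkdtempSync(join(tmpdir(), "spawn-update-"));
const stamp = join(dir, ".update-checked");
```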
A	a557fb1002	fix(cli): handle --help and --version flags after positional args (#2750 ) Previously, `spawn claude sprite --help` would warn about extra args and proceed to provision a server. Now trailing help/version flags are detected and handled correctly in both the default command path and verb alias path (e.g., `spawn run claude sprite --help`). Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 22:29:48 -07:00
A	1e190924bf	fix(aws): wait for public IP before returning from waitForInstance (#2746) Lightsail can report state=running before assigning a public IP. Continue polling until both state is running and IP is non-empty, preventing SSH connection failures from an empty IP address. Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 22:16:57 -07:00
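The corrected readiness condition requires both fields, so "running" alone is not enough. This is a minimal sketch, with several assumptions:

- `InstanceStatus` is a hypothetical shape.
- `getStatus` is an injected callback rather than the real Lightsail SDK call.
- The defaults of 60 attempts at 5-second intervals are placeholders, not values taken from spawn.

```typescript
// Hypothetical shape of one Lightsail poll result.
interface InstanceStatus {
  state: string; // e.g. "pending", "running"
  publicIp: string; // "" until Lightsail assigns an address
}

// Ready only when the instance is running AND has a non-empty public IP.
function isInstanceReady(s: InstanceStatus): boolean {
  return s.state === "running" && s.publicIp.trim() !== "";
}

// Poll until ready; resolve with the IP, or reject after maxAttempts polls.
async function waitForInstance(
  getStatus: () => Promise<InstanceStatus>,
  maxAttempts = 60, // placeholder default
  intervalMs = 5000, // placeholder default
): Promise<string> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await getStatus();
    if (isInstanceReady(status)) return status.publicIp;
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`instance not ready with a public IP after ${maxAttempts} attempts`);
}
```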
A	f35696434a	fix(security): use writeFileSync for credential files — Bun.write ignores mode option (#2742) Bun.write does not support the `mode` option, so credential config files (Hetzner, DigitalOcean, AWS, OpenRouter) were created with 0644 permissions instead of the intended 0600, exposing API tokens to other local users. Switch to node:fs writeFileSync which correctly applies file permissions. Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 22:09:36 -07:00
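Node's `writeFileSync` accepts a `mode` option, but the mode only takes effect when the call creates the file; an existing file keeps its old permissions. This sketch uses a hypothetical `writeCredentialFile` helper and assumes POSIX permissions. It follows the write with `chmodSync`, which also tightens files left at 0644 by the earlier `Bun.write` code. The commit does not say whether spawn includes this extra `chmodSync` step.

```typescript
import { chmodSync, writeFileSync } from "node:fs";

// Hypothetical helper: write a credential file readable and writable by its owner only.
function writeCredentialFile(path: string, contents: string): void {
  // `mode` applies only if this call creates the file (and is filtered by the umask).
  writeFileSync(path, contents, { encoding: "utf8", mode: 0o600 });
  // Force 0600 even when the file already existed with looser permissions.
  chmodSync(path, 0o600);
}
```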
A	7fe1bdf6b3	fix(junie): remove JUNIE_MODEL env var to fix 'Unknown model: openrouter/auto' crash (#2735) Junie only accepts its own shorthand model names (gpt, opus, sonnet, etc.) and not OpenRouter model IDs. Removing modelEnvVar lets junie handle its own model routing via the OpenRouter API key instead. Fixes #2734 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-17 21:22:32 -07:00
A	b1de116690	refactor: replace manual multi-level type guards with toRecord/isString in index.ts (#2731) Two instances of the pattern `err && typeof err === "object" && "code" in err` violated the type-safety rule requiring valibot or shared type-guard utilities instead of manual multi-level type checks. Replaced with `toRecord(err)` and `isString()` from @openrouter/spawn-shared for consistent, rule-compliant error code extraction. Also bumps CLI patch version per cli-version.md. -- qa/code-quality Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-17 18:40:16 -07:00
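For reference, a guard-based version of the error-code extraction could look like the sketch below. `toRecord` and `isString` here are local stand-ins written for this sketch, and their behavior is assumed. The real helpers in @openrouter/spawn-shared may differ, for example in how they treat arrays. `getErrorCode` is a hypothetical name.

```typescript
// Local stand-ins for the shared helpers; signatures and behavior are assumptions.
function toRecord(value: unknown): Record<string, unknown> | null {
  return typeof value === "object" && value !== null && !Array.isArray(value)
    ? (value as Record<string, unknown>)
    : null;
}

function isString(value: unknown): value is string {
  return typeof value === "string";
}

// Replaces `err && typeof err === "object" && "code" in err` chains:
// returns the error's string `code` property, or undefined.
function getErrorCode(err: unknown): string | undefined {
  const code = toRecord(err)?.code;
  return isString(code) ? code : undefined;
}
```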
Ahmed Abushagur	66b16d8651	feat: add Windows PowerShell support — remove bash dependency for local execution (#2727) Replace hardcoded "bash" shell references with platform-aware utilities so spawn works natively from PowerShell on Windows without WSL or Git Bash. - New shared/shell.ts: isWindows(), getLocalShell(), getInstallScriptUrl(), getInstallCmd(), getWhichCommand() with platform override for testability - local/local.ts: use getLocalShell() for runLocal() and interactiveSession() - commands/run.ts: spawnScript/runScriptHeadless use getLocalShell() - commands/update.ts: Windows downloads install.ps1, runs via PowerShell - update-check.ts: Windows auto-update uses install.ps1; "where" replaces "which" - shared/orchestrate.ts: PowerShell-compatible .spawnrc setup for local Windows - Remote SSH commands unchanged — remote servers are always Linux Closes #2726 Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-17 16:35:23 -07:00
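Below is a minimal sketch of the platform switch. It assumes a module-level override for tests, as the commit describes. The function names mirror those listed in the commit, but the bodies are guesses. In particular, the choice of `powershell.exe` with `-NoProfile -Command` is an assumption. `getInstallScriptUrl()` is omitted because the commit does not give the URL, and `setPlatformOverride` is a hypothetical name.

```typescript
// Hypothetical sketch; the real shared/shell.ts may differ.
type Platform = string; // e.g. "win32", "linux", "darwin" as reported by process.platform

let platformOverride: Platform | undefined; // test hook ("platform override for testability")

function setPlatformOverride(p: Platform | undefined): void {
  platformOverride = p;
}

function isWindows(): boolean {
  return (platformOverride ?? process.platform) === "win32";
}

// Shell for LOCAL execution only; remote SSH commands always target Linux bash.
function getLocalShell(): { cmd: string; args: string[] } {
  return isWindows()
    ? { cmd: "powershell.exe", args: ["-NoProfile", "-Command"] } // assumed invocation
    : { cmd: "bash", args: ["-c"] };
}

// Windows ships `where`; POSIX systems use `which`.
function getWhichCommand(): string {
  return isWindows() ? "where" : "which";
}
```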
A	1733903a1f	fix(digitalocean): add OAuth recovery in doApi for mid-session 401 errors (#2723) When a DigitalOcean token expires mid-session (after ensureDoToken succeeds), API calls like ensureSshKey, createServer, listServers, destroyServer would crash with "Fatal: DigitalOcean API error 401" because doApi had no recovery path for 401 responses. Now doApi detects 401, attempts OAuth browser flow recovery via tryDoOAuth(), and retries the request with the new token. A re-entrancy guard prevents infinite loops (doApi → tryDoOAuth → doApi → ...). If OAuth recovery fails, the original 401 error is thrown as before. Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 16:13:42 -07:00
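The retry-with-guard pattern can be sketched as follows. This is a hypothetical reduction, not spawn's `doApi`: `createApiClient`, `HttpError`, and `ApiRequest` are invented for the example. `tryOAuth` stands in for `tryDoOAuth()` and resolves to a new token, or to `null` when recovery fails.

```typescript
// Hypothetical sketch of 401 recovery with a re-entrancy guard.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// A request receives the current token and throws HttpError on a non-2xx response.
type ApiRequest = (token: string) => Promise<unknown>;

function createApiClient(
  initialToken: string,
  tryOAuth: () => Promise<string | null>,
): (request: ApiRequest) => Promise<unknown> {
  let token = initialToken;
  let recovering = false; // true while OAuth runs, so nested 401s cannot recurse

  return async function doApi(request: ApiRequest): Promise<unknown> {
    try {
      return await request(token);
    } catch (err) {
      const is401 = err instanceof HttpError && err.status === 401;
      if (!is401 || recovering) throw err;

      recovering = true;
      let newToken: string | null = null;
      try {
        newToken = await tryOAuth();
      } catch {
        newToken = null; // a failed OAuth attempt falls through to the original error
      } finally {
        recovering = false;
      }

      if (!newToken) throw err; // recovery failed: rethrow the original 401
      token = newToken;
      return request(token); // retry exactly once with the refreshed token
    }
  };
}
```

In this sketch the guard is a single flag shared by all calls on one client. A second request that hits a 401 while recovery is in flight therefore fails immediately instead of opening another browser flow. The commit does not say whether spawn behaves the same way.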
A	0e5bfd830b	fix(e2e): double GCP cloud-init wait timeout to 10 minutes for Node install (#2713) * chore: update agent GitHub star counts * fix(gcp): double cloud-init wait timeout to 120 attempts (10 min) GCP startup scripts installing Node.js 22 via `n` from curl take longer than 5 min on cold starts. The previous 60-attempt (5 min) poll timed out with "Startup script may not have completed, continuing..." and proceeded to run `npm install -g @kilocode/cli` before npm was available, causing `npm: command not found` errors. Increase `maxAttempts` from 60 to 120 (10 min) in `waitForCloudInit` to give the Node install enough time to complete on GCP cold starts. Confirmed by E2E run: GCP kilocode failed with npm not found after all 60 poll attempts exhausted; all other GCP agents passed (they don't need Node). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-17 11:51:41 -07:00
Ahmed Abushagur	34785a9a63	feat(hermes): add YOLO mode toggle to setup menu (#2711) Add HERMES_YOLO_MODE as a setup option for Hermes Agent, enabled by default. This disables Hermes's security approval prompts so it can self-install skill dependencies (e.g. himalaya for email) at runtime on dedicated cloud VMs. Users can uncheck it in the setup multiselect if they prefer Hermes to prompt before installing tools. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 10:09:41 -07:00
A	eec83898e4	fix(kilocode): add binary verification after npm install to recover from silent postinstall failures (#2707) @kilocode/cli v7+ uses a native binary postinstall that downloads a platform-specific binary. On some clouds (notably GCP with cloudInitTier "node"), this postinstall can fail silently, leaving the npm bin symlink pointing to a JS wrapper with no actual native binary to exec. The fix adds a KILOCODE_BINARY_VERIFY shell snippet that runs after npm install and: 1. Checks if kilocode is already working (fast path) 2. If not, finds the npm package dir and re-runs the postinstall 3. If still not found, searches for the native binary in the package dir and symlinks it into a PATH-accessible location Fixes #2706 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 08:30:00 -07:00
A	5b2eddb763	fix(sprite): replace personal VM URL with official CDN for keep-alive script (#2701) The sprite-keep-running.sh script was downloaded from a hardcoded personal VM URL (kurt-claw-f.sprites.app) which would break all Sprite deployments if that VM goes offline. Use the official CDN proxy at openrouter.ai/labs/spawn/. Fixes #2699 -- refactor/code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-16 20:04:49 -07:00
A	b854917186	fix(security): validate tunnel URL and port from history before openBrowser() (#2697) Add validateTunnelUrl() and validateTunnelPort() in security.ts to prevent phishing attacks via tampered ~/.spawn/history.json. Apply both validations in cmdEnterAgent() and cmdOpenDashboard() in connect.ts before any tunnel data is used. - validateTunnelUrl: enforce URL starts with http://localhost: or http://127.0.0.1: only (blocks external/phishing URLs) - validateTunnelPort: enforce numeric value in range 1-65535 - Add comprehensive test cases for both validators Fixes #2696 Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-16 15:22:29 -07:00
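The two validators can be sketched as below. This is a hypothetical re-implementation, not the code in security.ts.

The URL check first applies the prefix rule from the commit. It then parses the URL with the WHATWG `URL` class, a step that is this sketch's own addition. The extra step matters because a prefix check alone accepts `http://localhost:@evil.example/`. In that URL the `localhost:` part is parsed as userinfo, so the actual host is `evil.example`.

```typescript
// Hypothetical re-implementation for illustration; security.ts may differ.
const ALLOWED_TUNNEL_HOSTS = new Set(["localhost", "127.0.0.1"]);

function validateTunnelUrl(raw: string): boolean {
  // Prefix rule from the commit message.
  if (!raw.startsWith("http://localhost:") && !raw.startsWith("http://127.0.0.1:")) {
    return false;
  }
  // Extra hardening in this sketch: parse and check the real host, because
  // "http://localhost:@evil.example/" passes the prefix test but targets evil.example.
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false;
  }
  return (
    url.protocol === "http:" &&
    ALLOWED_TUNNEL_HOSTS.has(url.hostname) &&
    url.username === "" &&
    url.password === ""
  );
}

// Accept an integer (or a decimal-digit string) in 1-65535; reject everything else.
function validateTunnelPort(raw: unknown): boolean {
  const text = typeof raw === "number" ? String(raw) : typeof raw === "string" ? raw : "";
  if (!/^\d{1,5}$/.test(text)) return false;
  const port = Number(text);
  return port >= 1 && port <= 65535;
}
```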
A	644593eaea	fix(security): propagate path normalization to all cloud modules (#2693) * fix(security): propagate path normalization to all cloud upload/download functions PR #2690 added normalize() before path traversal checks in AWS but not the other clouds. Apply the same defense-in-depth to GCP, DigitalOcean, Hetzner, Sprite, and shared validateRemotePath. Agent: code-health Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(security): use normalized path in all file transfer operations Addresses code review: replace original remotePath with normalizedRemote in scp commands and bash operations to prevent validation bypass. - digitalocean: use normalizedRemote in uploadFile scp and derive expandedPath from normalizedRemote in downloadFile - hetzner: same pattern for uploadFile/downloadFile - gcp: derive expandedPath from normalizedRemote.replace(...) in both uploadFile and downloadFile - sprite: use normalizedRemote in bash mkdir/mv command and derive expandedPath from normalizedRemote in downloadFile Agent: pr-maintainer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(security): close validation bypass in agent-setup and AWS file ops validateRemotePath() validated the normalized path but returned void, so the caller still used the original unsanitized remotePath in shell commands — bypassing the normalization check entirely. Fix: return the normalized path and use it in all file operations. Also fix AWS uploadFile/downloadFile which validated normalizedRemote but used the original remotePath in scp commands. Agent: pr-maintainer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-16 14:48:59 -07:00
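The pattern the final fix settles on is to validate the normalized path and pass that same value to every command. This sketch uses `validateRemotePath` as a hypothetical stand-in that returns the normalized path. The allowed prefixes and the `scpTarget` helper are assumptions for illustration.

Using one value throughout matters because the two sides can disagree about what a path means. `posix.normalize` collapses `..` lexically, while the remote kernel resolves `..` only after following symlinks. Validating one string and executing another can therefore check a different file from the one the command actually reaches.

```typescript
import { posix } from "node:path";

// Hypothetical allowed roots for remote file operations (an assumption).
const ALLOWED_PREFIXES = ["~/", "/root/", "/tmp/"];

// Validate the normalized form and RETURN it, so callers never reuse the raw input.
function validateRemotePath(remotePath: string): string {
  if (remotePath.includes("\0")) {
    throw new Error("remote path contains a NUL byte");
  }
  const normalized = posix.normalize(remotePath);
  // Any ".." left after normalization climbs above the path's starting point.
  if (normalized.split("/").includes("..")) {
    throw new Error(`remote path escapes its base: ${remotePath}`);
  }
  if (!ALLOWED_PREFIXES.some((prefix) => normalized.startsWith(prefix))) {
    throw new Error(`remote path outside allowed roots: ${remotePath}`);
  }
  return normalized;
}

// Hypothetical caller: only the validated value is interpolated into the scp target.
function scpTarget(host: string, remotePath: string): string {
  const safe = validateRemotePath(remotePath);
  return `${host}:${safe}`;
}
```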

296 commits