The sprite-keep-running.sh script was downloaded from a hardcoded personal
VM URL (kurt-claw-f.sprites.app) which would break all Sprite deployments
if that VM goes offline. Use the official CDN proxy at openrouter.ai/labs/spawn/.
Fixes#2699
-- refactor/code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): propagate path normalization to all cloud upload/download functions
PR #2690 added normalize() before path traversal checks in AWS but not
the other clouds. Apply the same defense-in-depth to GCP, DigitalOcean,
Hetzner, Sprite, and shared validateRemotePath.
Agent: code-health
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(security): use normalized path in all file transfer operations
Addresses code review: replace original remotePath with normalizedRemote
in scp commands and bash operations to prevent validation bypass.
- digitalocean: use normalizedRemote in uploadFile scp and derive
expandedPath from normalizedRemote in downloadFile
- hetzner: same pattern for uploadFile/downloadFile
- gcp: derive expandedPath from normalizedRemote.replace(...) in both
uploadFile and downloadFile
- sprite: use normalizedRemote in bash mkdir/mv command and derive
expandedPath from normalizedRemote in downloadFile
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): close validation bypass in agent-setup and AWS file ops
validateRemotePath() validated the normalized path but returned void,
so the caller still used the original unsanitized remotePath in shell
commands — bypassing the normalization check entirely.
Fix: return the normalized path and use it in all file operations.
Also fix AWS uploadFile/downloadFile which validated normalizedRemote
but used the original remotePath in scp commands.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat: add downloadFile to CloudRunner + local OpenClaw config merge
Add `downloadFile(remotePath, localPath)` to the CloudRunner interface
and implement it across all 6 cloud providers (Hetzner, AWS, GCP,
DigitalOcean, Sprite, Local) — mirroring the existing `uploadFile` with
reversed SCP direction.
Replace the OpenClaw config write with a download → deep-merge → upload
flow so config merging happens in our own linted TypeScript instead of
a remote script.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: move isPlainObject and deepMerge to shared utils
Extract `isPlainObject` to `shared/type-guards.ts` and `deepMerge` to
`shared/parse.ts` so they're reusable across the codebase.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: promote isPlainObject to shared package, use across codebase
Move `isPlainObject` from cli/type-guards.ts into
@openrouter/spawn-shared so it can be used everywhere. Replace
inline `val !== null && typeof val === "object" && !Array.isArray(val)`
checks in:
- shared/type-guards.ts (toRecord, toObjectArray)
- shared/parse.ts (parseJsonObj)
- cli/manifest.ts (isValidManifest)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: remove type-guards re-export, import directly from spawn-shared
Delete `packages/cli/src/shared/type-guards.ts` (was just a re-export
barrel). All 35 consuming files now import `getErrorMessage`, `isString`,
`isNumber`, `isPlainObject`, `toRecord`, etc. directly from
`@openrouter/spawn-shared`.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `sprite create` API call in `createSprite()` had no timeout, so when
the Sprite API blocked for certain agents (kilocode, opencode), the
process hung indefinitely. The bash-level timeout in provision.sh wraps
the outer subshell but the deeply-nested `sprite create` subprocess
could survive signal propagation.
Add a 300s (configurable via SPRITE_CREATE_TIMEOUT) timeout to the
`sprite create` subprocess using the existing killWithTimeout +
asyncTryCatch pattern already used by runSprite() and destroyServer().
Fixes#2612
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Add two new GritQL biome plugins (matching ori repo patterns) that ban
all try/catch and try/finally in TypeScript code. Convert all remaining
blocks across production and test files to use tryCatch/asyncTryCatch
from @openrouter/spawn-shared.
no-try-catch.grit covers all 4 variants:
- try/catch with binding, try/catch without binding
- try/catch/finally with binding, try/catch/finally without binding
no-try-finally.grit covers bare try/finally.
Both exclude shared/result.ts and shared/parse.ts (the implementation layer).
Production files (18): aws, hetzner, digitalocean, gcp, sprite, index,
update-check, ui, ssh, agent-setup, picker, agent-tarball, shared,
run, connect, delete, list
Test files (12): cmdlast, cmd-interactive, cmdrun-happy-path,
commands-resolve-run, commands-swap-resolve, commands-error-paths,
download-and-failure, preload, ssh-keys, update-check, orchestrate,
fs-sandbox, prompt-file-security, security, script-failure-guidance
Bumps CLI version to 0.16.6
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds sprite-keep-running support so sprites stay alive during long
agent sessions instead of shutting down due to inactivity.
- Add installSpriteKeepAlive() to sprite/sprite.ts: downloads and
installs the sprite-keep-running script (~/.local/bin) on the sprite
during setup. Non-fatal: logs a warning if download fails so
deployment still proceeds.
- Modify interactiveSession() to wrap the session command in a temp
script (base64-encoded to handle multi-line restart loops) and exec
it via sprite-keep-running if available, with plain bash fallback.
- Call installSpriteKeepAlive() in sprite/main.ts createServer() step
after setupShellEnvironment(), applying to all Sprite agents.
- Add sprite-keep-alive.test.ts: 11 unit tests covering download URL,
install path, error resilience, session script structure, and
keep-alive wrapper inclusion.
Fixes#2424
Agent: issue-fixer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Move all filesystem path helpers (getUserHome, getSpawnDir, getHistoryPath,
getSpawnCloudConfigPath, getCacheDir, getCacheFile, getUpdateFailedPath,
getSshDir, getTmpDir) into a single shared/paths.ts module. This eliminates
scattered homedir()/process.env.HOME patterns across 8+ files and provides
a single import source for all path resolution.
- Create packages/cli/src/shared/paths.ts with 9 exported functions
- Update 17 source files to import from paths.ts
- Add re-exports in ui.ts and history.ts for backward compatibility
- Remove direct homedir() imports from gcp, sprite, local, ssh-keys, etc.
- Add comprehensive unit tests in paths.test.ts
- Bump CLI version to 0.15.34
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bun's os.homedir() reads from getpwuid() and ignores runtime changes to
process.env.HOME. Named imports capture the native function binding, so
patching os.homedir on the default export doesn't propagate. This caused
all test files using homedir() to write .spawn-test-* dirs to the real
home directory instead of the preload sandbox.
Add getUserHome() helper to shared/ui.ts that prefers process.env.HOME,
replace all direct homedir() calls in production and test code.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidates duplicate server naming logic from 5 cloud modules into shared utilities in src/shared/ui.ts. No behavioral changes - purely structural refactor.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The two-phase save architecture was fundamentally broken: saveVmConnection()
was called inside createServer() BEFORE saveSpawnRecord() created the record,
so the merge-by-spawnId silently failed every time — resulting in records
with no connection data and `spawn ls` showing nothing.
Replace with atomic single-save: createServer() now returns VMConnection,
and the orchestrator calls saveSpawnRecord() once with connection data
included. Removes saveVmConnection(), getConnectionPath(),
mergeLastConnection(), and last-connection.json entirely.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: A <258483684+la14-1@users.noreply.github.com>
Moves getErrorMessage to zero-dep shared module, eliminating 13 inline
copies and 2 hasMessage variant sites across the codebase.
Fixes#2341
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
De-export interfaces, types, and constants that are only used within
their own module files. These were exported but never imported by any
other module or test file, unnecessarily widening the public API surface.
Affected symbols:
- aws: AwsState, Region, REGIONS, AGENT_BUNDLE_DEFAULTS
- digitalocean: DigitalOceanState, DropletSize, DROPLET_SIZES, DoRegion, DO_REGIONS
- gcp: GcpState, MachineTypeTier, MACHINE_TYPES, ZoneOption, ZONES
- hetzner: HetznerState, ServerTypeTier, SERVER_TYPES, LocationOption, LOCATIONS
- sprite: SpriteState
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Remove 5 unused reset*State() exports (aws, hetzner, gcp, digitalocean,
sprite) that were never called anywhere in the codebase. Convert their
associated _state variables from let to const since they are no longer
reassigned.
Remove stale Daytona references in status.ts (comment and IP check)
left over after Daytona cloud provider removal in #2261.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Each cloud module (aws, daytona, digitalocean, gcp, hetzner, sprite) previously
stored per-operation state in bare module-level `let` variables, making them
process-global singletons. This is safe for single-cloud CLI invocations today
but creates latent bugs for multi-cloud orchestration and test isolation.
Replace scattered `let` globals with a single typed `_state` object per module:
- `AwsState` / `resetAwsState()` — 8 fields including `selectedBundle`
- `DaytonaState` / `resetDaytonaState()` — 5 fields
- `DigitalOceanState` / `resetDigitalOceanState()` — 3 fields
- `GcpState` / `resetGcpState()` — 5 fields
- `HetznerState` / `resetHetznerState()` — 3 fields
- `SpriteState` / `resetSpriteState()` — 2 fields
Each module exports a `resetXxxState()` function for test isolation. No function
signatures or existing exports were changed.
Fixes#2251
Agent: issue-fixer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: add unique spawn IDs to prevent history record corruption
History records were matched by heuristic ("most recent record for this
cloud without a connection"), which caused saveVmConnection and
saveLaunchCmd to overwrite the wrong record during concurrent or failed
spawns.
Fix: every SpawnRecord now has a unique `id` (UUID). All history
operations (saveVmConnection, saveLaunchCmd, removeRecord,
markRecordDeleted, mergeLastConnection) match by id when available,
falling back to the old heuristic for pre-migration records.
The orchestrator (TS path) now creates the history record AFTER server
creation succeeds, not before — so failed provisions don't leave orphan
entries.
Also adds "Remove from history" option to the spawn ls action picker,
restoring the ability to soft-delete entries without destroying the VM.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add 18 unit tests for spawn ID history behavior
Tests cover:
- generateSpawnId returns unique UUIDs
- saveSpawnRecord auto-generates id when not provided
- saveVmConnection matches by spawnId (not heuristic)
- saveVmConnection does not cross-contaminate concurrent spawns
- saveVmConnection falls back to heuristic without spawnId
- saveLaunchCmd matches by spawnId (not heuristic)
- saveLaunchCmd falls back without spawnId
- removeRecord matches by id, not by timestamp+agent+cloud
- removeRecord handles duplicate timestamps correctly
- removeRecord falls back for legacy records without id
- markRecordDeleted targets correct record by id
- mergeLastConnection uses spawn_id from last-connection.json
- mergeLastConnection falls back to heuristic without spawn_id
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: enable biome import sorting with grouped imports
Adds organizeImports to biome assist config with groups:
1. Type imports
2. Node built-ins
3. Third-party packages
4. @openrouter/* packages
5. Aliases
Auto-fixed import order and lint issues across all TypeScript files,
including .claude/skills/ and packages/cli/src/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Unref the SIGKILL timer in killWithTimeout() so it doesn't keep the
event loop alive for 5 extra seconds after a timed-out process exits
- Wrap all setTimeout/clearTimeout pairs in try/finally across 6 cloud
providers (12 call sites) to guarantee cleanup on exceptions
- Add missing 60s timeout guard to runSpriteSilent() which could hang
indefinitely on unresponsive sprite processes
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(security): add --proto '=https' to curl calls in TypeScript provisioning
Fixes#2169
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(lint): break long lines for biome format compliance
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: address 4 reliability issues across codebase
1. sprite.ts: add --force to destroy command (stdin is "ignore" so
interactive prompts would hang until 60s timeout)
2. verify.sh: replace /dev/tcp port checks with ss -tln primary
(Debian/Ubuntu bash compiled without /dev/tcp support)
3. verify.sh: make _openclaw_restart_gateway a hard failure instead
of log_warn (matching _openclaw_ensure_gateway behavior)
4. agent-setup.ts: add ss -tln port check + "already running" early
exit + increase timeout from 120s to 300s (gateway takes ~3min
to initialize on AWS medium instances)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: biome format - use consistent double quotes in portCheck
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
- sprite/sprite.ts: Replace duplicate saveVmConnection implementation
with a call to the shared saveVmConnection from history.ts. The local
version duplicated the mkdir + writeFileSync logic already provided by
the shared function, just with Sprite-specific hardcoded values.
Remove now-unused writeFileSync, mkdirSync, and getSpawnDir imports.
- Bump CLI version 0.12.5 → 0.12.6 (patch)
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Add logStepInline/logStepDone helpers to ui.ts and convert all 9
polling loops (DO droplet, DO cloud-init, AWS instance, AWS cloud-init,
Hetzner cloud-init, Daytona SSH, Sprite connectivity, GCP startup,
shared SSH port) from multi-line spam to a single line that updates
in place.
Signed-off-by: Ahmed Abushagur <ahmed@abushagur.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Removed `getState()` from hetzner, gcp, daytona, sprite, and digitalocean
modules. These functions were exported but never called from production code
or tests. The aws module retains its `getState()` which is tested in
custom-flag.test.ts to verify region state mutation.
Also bumps CLI patch version (0.12.2 → 0.12.3) as required per project rules.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The sprite destroy command can hang indefinitely when the Sprite API
is unresponsive. Add a 60s timeout using the existing killWithTimeout
utility (same pattern as runSprite).
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Co-authored-by: A <258483684+la14-1@users.noreply.github.com>
Rebased fix/issue-2083 onto main after commands.ts split (PR #2095).
Key resolutions:
- commands.ts: kept HEAD shim (re-exports from ./commands/index.ts)
- package.json: kept PR version 0.12.0 without @openrouter/spawn-shared dep
- Fixed @openrouter/spawn-shared imports in commands/shared.ts, commands/update.ts,
and __tests__/orchestrate.test.ts that were added after the PR branched
All 1390 tests pass, biome lint clean.
Agent: pr-maintainer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Six of seven cloud main.ts files had hardcoded agent lists that were
stale (missing hermes, added in #2084). Replace all hardcoded lists
with Object.keys(agents).join(", ") so they stay in sync automatically
when new agents are added.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
* refactor: eliminate 7 identical agents.ts boilerplate files
Adds createCloudAgents() factory to shared/agent-setup.ts, reducing
each cloud's agents.ts from 16-line copy-paste to a single call.
Net reduction of 49 lines across 9 files.
Fixes#2078
Agent: complexity-hunter
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* chore: apply biome formatting
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The Sprite saveVmConnection() wrote ~/.spawn/last-connection.json without
restrictive permissions (defaulting to umask 0o644/0o755), unlike the shared
saveVmConnection() in history.ts which correctly uses mode 0o700 for the
directory and 0o600 for the file. On multi-user systems this could expose
server names and connection metadata to other users.
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
runSprite was wired as CloudRunner.runServer but silently dropped the
timeoutSecs parameter. All other clouds (Hetzner, DO, AWS, GCP, Daytona)
implement kill-on-timeout via setTimeout+killWithTimeout; Sprite had zero
timeout protection, so a hung agent install (e.g. ZeroClaw's 600s Rust
compile, Claude Code's 300s install) would hang forever on Sprite.
Matches the pattern used by every other cloud provider.
Agent: team-lead
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(sprite): fix all 6 Sprite agent installs for E2E
- Use `npm install -g --prefix` instead of `npm config set prefix` to
avoid creating .npmrc that conflicts with nvm on Sprite VMs
- Fix shell environment setup to only modify .bash_profile (not .bashrc)
so non-interactive bash -c commands retain PATH config
- Add $HOME/.cargo/bin to PATH for zeroclaw (Sprite has no ~/.cargo/env)
- Add $HOME/.local/bin to PATH config for Sprite shell environment
- Add sprite E2E cloud driver with org detection, config corruption fix,
direct command embedding (not $1 positional), and retry logic
- Fix provision.sh to kill full process tree after timeout (prevents
orphaned sprite exec sessions from corrupting config)
- Fix verify.sh zeroclaw check to not rely on ~/.cargo/env existing
Tested: 6/6 Sprite agents pass E2E (claude, codex, openclaw, zeroclaw,
opencode, kilocode). Hermes is not in the Sprite manifest.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: biome format - collapse runSprite call to single line
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
When HOME is unset (containers, systemd, cron), process.env.HOME produces
literal "undefined" in path strings:
- ssh-keys.ts: SSH discovery/generation writes to "undefined/.ssh/"
- sprite.ts: CLI detection misses ~/.local/bin, PATH update corrupted
- gcp.ts: gcloud detection misses ~/google-cloud-sdk/bin, PATH corrupted
Same fix as #2026: use `process.env.HOME || homedir()` via `join()` for
robust OS-level fallback when HOME is unset.
Agent: team-lead
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The hermes agent was added to manifest.json and sh/sprite/hermes.sh in
feat #2023, but createAgents() in shared/agent-setup.ts was not updated.
This caused sh/sprite/hermes.sh to throw "Unknown agent: hermes" when
resolveAgent() was called.
- Add hermes entry to createAgents() with correct install cmd, envVars, and launchCmd
- Update sprite/main.ts usage error message to include hermes
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Dead code removed:
- `cleanup_stale_apps` function in `sh/e2e/lib/cleanup.sh` — defined but
never called; `e2e.sh` calls `cloud_cleanup_stale` directly instead
- `generateEnvConfig` and `AgentConfig` re-exports from all 7 cloud-specific
`agents.ts` modules (aws, hetzner, gcp, digitalocean, daytona, local,
sprite) — nothing imported these from the cloud modules; they were already
available via `@openrouter/spawn-shared` and `../shared/agents`
All 1435 tests pass, biome lint is clean (0 errors).
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(security): prevent flag injection via hyphen-leading path segments in uploadFile
Reject remote paths where any segment starts with "-" (e.g., "-e", "/tmp/-evil")
across all 6 cloud providers. This prevents potential CLI flag injection in
commands like base64, printf, mv, and scp.
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* style: fix biome format for path validation conditions
Break long if-conditions across multiple lines and add parentheses
around arrow function parameters to satisfy biome formatter.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(aws): increase OpenClaw gateway timeout to 120s and default to medium bundle
OpenClaw gateway consistently times out on AWS Lightsail because the 60s
timeout is too short for cold starts (npm install of 713 packages + gateway
init). Doubles the timeout to 120s and sets the default bundle for OpenClaw
to medium_3_0 (4 GB RAM) since it's too heavy for nano (512 MB).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resolve openclaw binary path for setsid and add npm-global to Sprite PATH
setsid replaces the process image and doesn't inherit the parent shell's
exported PATH, causing "No such file or directory" on Sprite (and potentially
other clouds). Fix by resolving the full binary path with `command -v` before
passing it to setsid. Also adds ~/.npm-global/bin to Sprite's persisted shell
PATH config so openclaw is discoverable in all session types.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(codex): update wire_api from "chat" to "responses"
Codex CLI dropped support for wire_api = "chat" — it now requires
"responses". This was never updated since the original codex integration,
causing an immediate crash loop on launch.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: enable GitHub CLI auth for all agents, not just Claude Code
Only Claude Code had preProvision: promptGithubAuth — all other agents
(codex, openclaw, opencode, kilocode, zeroclaw) skipped GitHub auth
entirely. These are all coding agents that need gh access for PRs,
cloning, etc.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add missing spawn import that crashes headless mode (#1981)
runBashHeadless calls spawn() from node:child_process at line 1112,
but only spawnSync was imported. This causes a ReferenceError crash
whenever --headless mode is used.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
* refactor: remove dead offerGithubAuth exports from cloud agents.ts files
The per-cloud offerGithubAuth re-exports in each cloud's agents.ts were
never called from outside their own module. The actual GitHub auth
orchestration is handled by shared/orchestrate.ts which calls
offerGithubAuth from shared/agent-setup.ts directly.
Also update stale comment in sh/test/fixtures/_shared_agent_assertions.sh
that referenced mock.sh, a test harness file that no longer exists in
the repository.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: collapse multi-line imports to single-line per biome format
After removing offerGithubAuth exports, the remaining 2-import blocks
should be single-line. Also collapse fly/agents.ts 4-import block and
remove trailing blank line.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
* fix: eliminate keystroke loss during interactive agent sessions
Three root causes were identified and fixed:
1. **Event loop fd competition**: Bun.spawn with stdio:"inherit" shares
fd 0 between the parent event loop and the child SSH process. The
kernel arbitrarily splits input bytes between them, causing random
keystroke drops. Introduced spawnInteractive() using Node's
child_process.spawnSync to block the event loop entirely.
2. **Unnecessary shell layers**: AWS and GCP wrapped the SSH command in
an extra `bash -c '...'` layer, creating 3 shell processes before the
agent. Aligned to match Hetzner/DO which pass directly.
3. **stty sane side effects**: prepareStdinForHandoff() ran `stty sane`
which enables ixon (XON/XOFF flow control), causing periodic input
freezes. Removed — setRawMode(false) is sufficient. Also removed
process.stdin.destroy() which could corrupt fd 0.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: biome format + remove stdin unref that broke async spawn
- Fix biome formatting in ssh.ts and commands.ts
- Remove process.stdin.unref() from prepareStdinForHandoff — it
allowed the event loop to exit before async child_process.spawn
finished, causing test failures and potential production issues
with the spawnBash (legacy script execution) path
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Same pipe-buffer deadlock pattern fixed by PRs #1903, #1915, #1920, #1922.
Two instances were missed in those passes.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
PR #1920 fixed pipe buffer deadlock in runServerCapture and
waitForCloudInit but missed 6 other locations where Bun.spawn uses
"pipe" for stderr without draining it before await proc.exited.
When a child process writes >64KB to a piped stderr, the OS pipe
buffer fills, the child blocks on write(), and the parent blocks on
exited — classic deadlock.
Fix: change stderr from "pipe" to "inherit" in all 6 locations since
the stderr output is never read programmatically. This also lets
users see installation errors and SCP errors in real time.
Affected functions:
- fly.ts ensureFlyCli()
- sprite.ts ensureSpriteCli()
- gcp.ts ensureGcloudCli()
- hetzner.ts uploadFile()
- digitalocean.ts uploadFile()
- aws.ts uploadFile()
-- refactor/code-health
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: drain piped stderr before awaiting process exit to prevent deadlock
Awaiting `proc.exited` before reading piped stderr causes a deadlock when
the child process writes enough stderr output to fill the OS pipe buffer
(~64KB). The process blocks waiting for the buffer to drain, but we never
drain it because we're waiting for the process to exit first.
Fix sprite/sprite.ts (createSprite, uploadFileSprite) and aws/aws.ts
(awsCli) to start draining stderr before awaiting exit, matching the
established pattern in gcp/gcp.ts and shared/ssh.ts.
Agent: code-health
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: apply biome format fixes to pass CI
Agent: team-lead
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* ci: add Mock Tests job to satisfy required status check
Split the unit-tests job into mock-tests (runs bun test) and unit-tests
(verifies cloud bundles build). The repo ruleset requires "Mock Tests",
"Unit Tests", and "Biome Lint" checks — the missing "Mock Tests" job was
blocking all PR merges.
Fixes#1901
Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* style: fix pre-existing Biome format issues in 9 files
Auto-applied Biome formatter to src/ to resolve failing "Biome Lint"
required status check. No logic changes — formatting only.
Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Using Node's child_process.spawn() to launch interactive SSH/shell sessions
from inside a Bun process adds unnecessary overhead: an extra process fork,
PTY negotiation indirection, and a forced Bun→Node stdio context switch.
Switch all interactiveSession() functions to Bun.spawn() with
stdio: ["inherit","inherit","inherit"], which hands off file descriptors
directly without forking a Node wrapper process.
Also removes the 500ms hardcoded sleep in orchestrate.ts that was a
band-aid for the old child_process handoff latency. The synchronous
prepareStdinForHandoff() is sufficient on its own.
Affected clouds: hetzner, aws, gcp, digitalocean, fly, daytona, sprite, local
Also fixes runInteractiveCommand() in commands.ts (spawn connect).
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Restructure the repo as a Bun workspace monorepo:
- Move cli/ → packages/cli/
- Create packages/shared/ (@openrouter/spawn-shared) with type-guards and parse utilities
- Add root package.json with workspace configuration
- Update all CLI imports to use @openrouter/spawn-shared
- Deduplicate toRecord/toObjectArray helpers from 4 cloud modules
- Update SPA (slack-bot) to use shared package instead of local toObj()
- Update 48 agent shell scripts for new packages/cli/ path
- Update install.sh, install.ps1, e2e, and test scripts
- Update all GitHub workflows, .gitignore, pre-commit hooks
- Update CLAUDE.md, README.md, and skill prompt references
- Pin all dependency versions (no ^ ranges)
- Bump CLI version 0.9.1 → 0.10.0
All 1908 tests pass. Lint clean. All 8 cloud bundles build.
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>