Commit graph

45 commits

Author SHA1 Message Date
A
d4774fdc8e
fix(sprite): append to ~/.bash_profile and gate exec zsh on interactive shells (#2756)
- Use >> instead of > to append to ~/.bash_profile (preserves existing config)
- Gate exec zsh on interactive shells: [[ $- == *i* ]] && exec /usr/bin/zsh -l
- Bump CLI version to 0.21.7

Fixes #2740

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 23:55:33 -07:00
A
5b2eddb763
fix(sprite): replace personal VM URL with official CDN for keep-alive script (#2701)
The sprite-keep-running.sh script was downloaded from a hardcoded personal
VM URL (kurt-claw-f.sprites.app) which would break all Sprite deployments
if that VM goes offline. Use the official CDN proxy at openrouter.ai/labs/spawn/.

Fixes #2699

-- refactor/code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 20:04:49 -07:00
A
644593eaea
fix(security): propagate path normalization to all cloud modules (#2693)
* fix(security): propagate path normalization to all cloud upload/download functions

PR #2690 added normalize() before path traversal checks in AWS but not
the other clouds. Apply the same defense-in-depth to GCP, DigitalOcean,
Hetzner, Sprite, and shared validateRemotePath.

Agent: code-health

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(security): use normalized path in all file transfer operations

Addresses code review: replace original remotePath with normalizedRemote
in scp commands and bash operations to prevent validation bypass.

- digitalocean: use normalizedRemote in uploadFile scp and derive
  expandedPath from normalizedRemote in downloadFile
- hetzner: same pattern for uploadFile/downloadFile
- gcp: derive expandedPath from normalizedRemote.replace(...) in both
  uploadFile and downloadFile
- sprite: use normalizedRemote in bash mkdir/mv command and derive
  expandedPath from normalizedRemote in downloadFile

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(security): close validation bypass in agent-setup and AWS file ops

validateRemotePath() validated the normalized path but returned void,
so the caller still used the original unsanitized remotePath in shell
commands — bypassing the normalization check entirely.

Fix: return the normalized path and use it in all file operations.

Also fix AWS uploadFile/downloadFile which validated normalizedRemote
but used the original remotePath in scp commands.

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-16 14:48:59 -07:00
A
f7c23de716
feat: add downloadFile to CloudRunner + local OpenClaw config merge (#2636)
* feat: add downloadFile to CloudRunner + local OpenClaw config merge

Add `downloadFile(remotePath, localPath)` to the CloudRunner interface
and implement it across all 6 cloud providers (Hetzner, AWS, GCP,
DigitalOcean, Sprite, Local) — mirroring the existing `uploadFile` with
reversed SCP direction.

Replace the OpenClaw config write with a download → deep-merge → upload
flow so config merging happens in our own linted TypeScript instead of
a remote script.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: move isPlainObject and deepMerge to shared utils

Extract `isPlainObject` to `shared/type-guards.ts` and `deepMerge` to
`shared/parse.ts` so they're reusable across the codebase.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: promote isPlainObject to shared package, use across codebase

Move `isPlainObject` from cli/type-guards.ts into
@openrouter/spawn-shared so it can be used everywhere. Replace
inline `val !== null && typeof val === "object" && !Array.isArray(val)`
checks in:

- shared/type-guards.ts (toRecord, toObjectArray)
- shared/parse.ts (parseJsonObj)
- cli/manifest.ts (isValidManifest)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove type-guards re-export, import directly from spawn-shared

Delete `packages/cli/src/shared/type-guards.ts` (was just a re-export
barrel). All 35 consuming files now import `getErrorMessage`, `isString`,
`isNumber`, `isPlainObject`, `toRecord`, etc. directly from
`@openrouter/spawn-shared`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 15:47:32 -07:00
A
1195fcb648
fix: add timeout to sprite create subprocess to prevent indefinite hang (#2614)
The `sprite create` API call in `createSprite()` had no timeout, so when
the Sprite API blocked for certain agents (kilocode, opencode), the
process hung indefinitely. The bash-level timeout in provision.sh wraps
the outer subshell but the deeply-nested `sprite create` subprocess
could survive signal propagation.

Add a 300s (configurable via SPRITE_CREATE_TIMEOUT) timeout to the
`sprite create` subprocess using the existing killWithTimeout +
asyncTryCatch pattern already used by runSprite() and destroyServer().

Fixes #2612

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-14 08:17:55 -04:00
A
5031d84e6c
refactor: eliminate process-global mock.module() pollution in tests (#2490)
Replace mock.module() calls with dependency injection to prevent
cross-file test pollution in Bun's shared worker process. Changes:

- orchestrate.ts: add getApiKey to OrchestrationOptions
- billing-guidance.ts: add injectable BillingGuidanceDeps parameter
- delete.ts: add optional deleteHandler parameter to confirmAndDelete
- update.ts: add UpdateOptions with injectable runUpdate function
- sprite.ts: add optional spawnFn parameter to interactiveSession
- Remove unnecessary oauth mocks from junie-agent and do-snapshot tests

Only @clack/prompts mock (shared via test-helpers.ts) and
do-payment-warning.test.ts (safe spread pattern) remain.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-10 23:57:57 -07:00
A
46b1e9d42c
refactor: add no-try-catch + no-try-finally grit rules, eliminate all violations (#2481)
Add two new GritQL biome plugins (matching ori repo patterns) that ban
all try/catch and try/finally in TypeScript code. Convert all remaining
blocks across production and test files to use tryCatch/asyncTryCatch
from @openrouter/spawn-shared.

no-try-catch.grit covers all 4 variants:
- try/catch with binding, try/catch without binding
- try/catch/finally with binding, try/catch/finally without binding

no-try-finally.grit covers bare try/finally.

Both exclude shared/result.ts and shared/parse.ts (the implementation layer).

Production files (18): aws, hetzner, digitalocean, gcp, sprite, index,
update-check, ui, ssh, agent-setup, picker, agent-tarball, shared,
run, connect, delete, list

Test files (12): cmdlast, cmd-interactive, cmdrun-happy-path,
commands-resolve-run, commands-swap-resolve, commands-error-paths,
download-and-failure, preload, ssh-keys, update-check, orchestrate,
fs-sandbox, prompt-file-security, security, script-failure-guidance

Bumps CLI version to 0.16.6

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-10 21:27:25 -07:00
A
014d591e68
refactor: convert remaining 5 try/catch blocks to Result helpers (#2480)
Convert the last convertible catch blocks:
- digitalocean.ts: SSH key registration fallback
- sprite.ts: keep-alive soft-dependency install
- agent-tarball.ts: tarball metadata fetch fallback
- list.ts: enter/reconnect connection error recovery (2 blocks)

The remaining ~43 try blocks are all try/finally cleanup (21),
security/billing validation (10), or top-level handlers — none
are candidates for Result helper conversion.

Bumps CLI to 0.16.5.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
2026-03-10 23:01:10 -04:00
A
a7a2032584
refactor: replace ~50 try/catch blocks with Result helpers across 20 files (#2479)
Convert catch-all, catch-swallow, catch-return-fallback, and catch-classify
patterns to use tryCatch/asyncTryCatch/unwrapOr from @openrouter/spawn-shared.

Files changed: aws.ts, hetzner.ts, digitalocean.ts, gcp.ts, run.ts, delete.ts,
shared.ts, ssh.ts, agent-setup.ts, orchestrate.ts, ui.ts, index.ts,
update-check.ts, update.ts, status.ts, picker.ts, interactive.ts, list.ts,
pick.ts, ssh-keys.ts, billing-guidance.ts, oauth.ts, sprite.ts

Preserved all try/finally-only blocks, security-validation-exit blocks,
billing/classify blocks, spinner cleanup, and top-level handleError blocks.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
2026-03-10 19:26:41 -07:00
A
72ccb098ab
feat: integrate Sprite keep-alive tasks for all Sprite agents (#2428)
Adds sprite-keep-running support so sprites stay alive during long
agent sessions instead of shutting down due to inactivity.

- Add installSpriteKeepAlive() to sprite/sprite.ts: downloads and
  installs the sprite-keep-running script (~/.local/bin) on the sprite
  during setup. Non-fatal: logs a warning if download fails so
  deployment still proceeds.

- Modify interactiveSession() to wrap the session command in a temp
  script (base64-encoded to handle multi-line restart loops) and exec
  it via sprite-keep-running if available, with plain bash fallback.

- Call installSpriteKeepAlive() in sprite/main.ts createServer() step
  after setupShellEnvironment(), applying to all Sprite agents.

- Add sprite-keep-alive.test.ts: 11 unit tests covering download URL,
  install path, error resilience, session script structure, and
  keep-alive wrapper inclusion.

Fixes #2424

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-10 02:24:18 -07:00
A
de76599b39
refactor: centralize path resolution into shared/paths.ts (#2422)
Move all filesystem path helpers (getUserHome, getSpawnDir, getHistoryPath,
getSpawnCloudConfigPath, getCacheDir, getCacheFile, getUpdateFailedPath,
getSshDir, getTmpDir) into a single shared/paths.ts module. This eliminates
scattered homedir()/process.env.HOME patterns across 8+ files and provides
a single import source for all path resolution.

- Create packages/cli/src/shared/paths.ts with 9 exported functions
- Update 17 source files to import from paths.ts
- Add re-exports in ui.ts and history.ts for backward compatibility
- Remove direct homedir() imports from gcp, sprite, local, ssh-keys, etc.
- Add comprehensive unit tests in paths.test.ts
- Bump CLI version to 0.15.34

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-10 00:48:03 -07:00
A
486aba49f6
fix: use process.env.HOME instead of os.homedir() for test sandboxing (#2417)
Bun's os.homedir() reads from getpwuid() and ignores runtime changes to
process.env.HOME. Named imports capture the native function binding, so
patching os.homedir on the default export doesn't propagate. This caused
all test files using homedir() to write .spawn-test-* dirs to the real
home directory instead of the preload sandbox.

Add getUserHome() helper to shared/ui.ts that prefers process.env.HOME,
replace all direct homedir() calls in production and test code.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-10 00:20:19 -07:00
A
f272294902
refactor: Deduplicate getServerName and promptSpawnName across cloud modules (#2415)
Consolidates duplicate server naming logic from 5 cloud modules into shared utilities in src/shared/ui.ts. No behavioral changes - purely structural refactor.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-10 05:26:25 +00:00
Ahmed Abushagur
05f744e052
fix: atomic single-save for history records (createServer returns VMConnection) (#2388)
The two-phase save architecture was fundamentally broken: saveVmConnection()
was called inside createServer() BEFORE saveSpawnRecord() created the record,
so the merge-by-spawnId silently failed every time — resulting in records
with no connection data and `spawn ls` showing nothing.

Replace with atomic single-save: createServer() now returns VMConnection,
and the orchestrator calls saveSpawnRecord() once with connection data
included. Removes saveVmConnection(), getConnectionPath(),
mergeLastConnection(), and last-connection.json entirely.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: A <258483684+la14-1@users.noreply.github.com>
2026-03-09 14:32:45 -07:00
A
36582b3b95
refactor: deduplicate getErrorMessage into shared/type-guards.ts (#2343)
Moves getErrorMessage to zero-dep shared module, eliminating 13 inline
copies and 2 hasMessage variant sites across the codebase.

Fixes #2341

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-08 07:45:11 -07:00
A
6eb0234f81
refactor: remove unnecessary exports from cloud modules (#2288)
De-export interfaces, types, and constants that are only used within
their own module files. These were exported but never imported by any
other module or test file, unnecessarily widening the public API surface.

Affected symbols:
- aws: AwsState, Region, REGIONS, AGENT_BUNDLE_DEFAULTS
- digitalocean: DigitalOceanState, DropletSize, DROPLET_SIZES, DoRegion, DO_REGIONS
- gcp: GcpState, MachineTypeTier, MACHINE_TYPES, ZoneOption, ZONES
- hetzner: HetznerState, ServerTypeTier, SERVER_TYPES, LocationOption, LOCATIONS
- sprite: SpriteState

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-07 11:44:55 -05:00
A
df462645a0
refactor: remove dead reset*State functions and stale Daytona references (#2265)
Remove 5 unused reset*State() exports (aws, hetzner, gcp, digitalocean,
sprite) that were never called anywhere in the codebase. Convert their
associated _state variables from let to const since they are no longer
reassigned.

Remove stale Daytona references in status.ts (comment and IP check)
left over after Daytona cloud provider removal in #2261.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-06 20:39:32 -05:00
A
f862ee563e
refactor: replace module-level mutable globals with typed state objects in cloud providers (#2255)
Each cloud module (aws, daytona, digitalocean, gcp, hetzner, sprite) previously
stored per-operation state in bare module-level `let` variables, making them
process-global singletons. This is safe for single-cloud CLI invocations today
but creates latent bugs for multi-cloud orchestration and test isolation.

Replace scattered `let` globals with a single typed `_state` object per module:
- `AwsState` / `resetAwsState()` — 8 fields including `selectedBundle`
- `DaytonaState` / `resetDaytonaState()` — 5 fields
- `DigitalOceanState` / `resetDigitalOceanState()` — 3 fields
- `GcpState` / `resetGcpState()` — 5 fields
- `HetznerState` / `resetHetznerState()` — 3 fields
- `SpriteState` / `resetSpriteState()` — 2 fields

Each module exports a `resetXxxState()` function for test isolation. No function
signatures or existing exports were changed.

Fixes #2251

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-06 18:11:46 -05:00
L
65a81edc57
fix: add unique spawn IDs to prevent history record corruption (#2235)
* fix: add unique spawn IDs to prevent history record corruption

History records were matched by heuristic ("most recent record for this
cloud without a connection"), which caused saveVmConnection and
saveLaunchCmd to overwrite the wrong record during concurrent or failed
spawns.

Fix: every SpawnRecord now has a unique `id` (UUID). All history
operations (saveVmConnection, saveLaunchCmd, removeRecord,
markRecordDeleted, mergeLastConnection) match by id when available,
falling back to the old heuristic for pre-migration records.

The orchestrator (TS path) now creates the history record AFTER server
creation succeeds, not before — so failed provisions don't leave orphan
entries.

Also adds "Remove from history" option to the spawn ls action picker,
restoring the ability to soft-delete entries without destroying the VM.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add 18 unit tests for spawn ID history behavior

Tests cover:
- generateSpawnId returns unique UUIDs
- saveSpawnRecord auto-generates id when not provided
- saveVmConnection matches by spawnId (not heuristic)
- saveVmConnection does not cross-contaminate concurrent spawns
- saveVmConnection falls back to heuristic without spawnId
- saveLaunchCmd matches by spawnId (not heuristic)
- saveLaunchCmd falls back without spawnId
- removeRecord matches by id, not by timestamp+agent+cloud
- removeRecord handles duplicate timestamps correctly
- removeRecord falls back for legacy records without id
- markRecordDeleted targets correct record by id
- mergeLastConnection uses spawn_id from last-connection.json
- mergeLastConnection falls back to heuristic without spawn_id

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: enable biome import sorting with grouped imports

Adds organizeImports to biome assist config with groups:
1. Type imports
2. Node built-ins
3. Third-party packages
4. @openrouter/* packages
5. Aliases

Auto-fixed import order and lint issues across all TypeScript files,
including .claude/skills/ and packages/cli/src/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-05 23:27:03 -08:00
A
701e3af56e
fix: prevent timer leaks and event-loop stalls in SSH timeout handling (#2200)
- Unref the SIGKILL timer in killWithTimeout() so it doesn't keep the
  event loop alive for 5 extra seconds after a timed-out process exits
- Wrap all setTimeout/clearTimeout pairs in try/finally across 6 cloud
  providers (12 call sites) to guarantee cleanup on exceptions
- Add missing 60s timeout guard to runSpriteSilent() which could hang
  indefinitely on unresponsive sprite processes

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-04 19:04:47 -08:00
A
c8581b7958
fix(security): add --proto '=https' to TypeScript curl provisioning calls (#2172)
* fix(security): add --proto '=https' to curl calls in TypeScript provisioning

Fixes #2169

Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(lint): break long lines for biome format compliance

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-04 00:21:37 -08:00
Ahmed Abushagur
300b330106
fix: address 4 reliability issues across codebase (#2129)
* fix: address 4 reliability issues across codebase

1. sprite.ts: add --force to destroy command (stdin is "ignore" so
   interactive prompts would hang until 60s timeout)

2. verify.sh: replace /dev/tcp port checks with ss -tln primary
   (Debian/Ubuntu bash compiled without /dev/tcp support)

3. verify.sh: make _openclaw_restart_gateway a hard failure instead
   of log_warn (matching _openclaw_ensure_gateway behavior)

4. agent-setup.ts: add ss -tln port check + "already running" early
   exit + increase timeout from 120s to 300s (gateway takes ~3min
   to initialize on AWS medium instances)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: biome format - use consistent double quotes in portCheck

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-03 03:18:44 -05:00
A
c9b8ee5997
refactor: Remove dead code and stale references (#2128)
- sprite/sprite.ts: Replace duplicate saveVmConnection implementation
  with a call to the shared saveVmConnection from history.ts. The local
  version duplicated the mkdir + writeFileSync logic already provided by
  the shared function, just with Sprite-specific hardcoded values.
  Remove now-unused writeFileSync, mkdirSync, and getSpawnDir imports.
- Bump CLI version 0.12.5 → 0.12.6 (patch)

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 22:05:38 -08:00
Ahmed Abushagur
f7b3bc91c9
fix: show all polling status on single updating line (#2115)
Add logStepInline/logStepDone helpers to ui.ts and convert all 9
polling loops (DO droplet, DO cloud-init, AWS instance, AWS cloud-init,
Hetzner cloud-init, Daytona SSH, Sprite connectivity, GCP startup,
shared SSH port) from multi-line spam to a single line that updates
in place.

Signed-off-by: Ahmed Abushagur <ahmed@abushagur.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-02 18:10:27 -05:00
A
0e145c2e8a
refactor: Remove dead getState() exports from cloud modules (#2108)
Removed `getState()` from hetzner, gcp, daytona, sprite, and digitalocean
modules. These functions were exported but never called from production code
or tests. The aws module retains its `getState()` which is tested in
custom-flag.test.ts to verify region state mutation.

Also bumps CLI patch version (0.12.2 → 0.12.3) as required per project rules.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 10:58:48 -08:00
Ahmed Abushagur
297e7ff21c
fix(sprite): add 60s timeout to destroyServer to prevent hanging (#2096)
The sprite destroy command can hang indefinitely when the Sprite API
is unresponsive. Add a 60s timeout using the existing killWithTimeout
utility (same pattern as runSprite).

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Co-authored-by: A <258483684+la14-1@users.noreply.github.com>
2026-03-01 22:37:57 -08:00
A
3911b5bc28
refactor: resolve conflicts — merge packages/shared into packages/cli/src/shared (#2092)
Rebased fix/issue-2083 onto main after commands.ts split (PR #2095).
Key resolutions:
- commands.ts: kept HEAD shim (re-exports from ./commands/index.ts)
- package.json: kept PR version 0.12.0 without @openrouter/spawn-shared dep
- Fixed @openrouter/spawn-shared imports in commands/shared.ts, commands/update.ts,
  and __tests__/orchestrate.test.ts that were added after the PR branched

All 1390 tests pass, biome lint clean.

Agent: pr-maintainer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 22:05:41 -08:00
A
4802852fac
fix: derive agent lists dynamically in usage messages (#2089)
Six of seven cloud main.ts files had hardcoded agent lists that were
stale (missing hermes, added in #2084). Replace all hardcoded lists
with Object.keys(agents).join(", ") so they stay in sync automatically
when new agents are added.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
2026-03-01 23:21:15 -05:00
A
608e8b4bdd
refactor: eliminate 7 identical agents.ts boilerplate files (#2088)
* refactor: eliminate 7 identical agents.ts boilerplate files

Adds createCloudAgents() factory to shared/agent-setup.ts, reducing
each cloud's agents.ts from 16-line copy-paste to a single call.
Net reduction of 49 lines across 9 files.

Fixes #2078

Agent: complexity-hunter
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* chore: apply biome formatting

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-01 22:06:57 -05:00
A
dbec3e768c
fix(security): add restrictive file permissions to Sprite saveVmConnection (#2068)
The Sprite saveVmConnection() wrote ~/.spawn/last-connection.json without
restrictive permissions (defaulting to umask 0o644/0o755), unlike the shared
saveVmConnection() in history.ts which correctly uses mode 0o700 for the
directory and 0o600 for the file. On multi-user systems this could expose
server names and connection metadata to other users.

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-01 15:44:15 -05:00
A
cef14ce9ea
fix(sprite): pass timeoutSecs through to runSprite, add kill-on-timeout (#2060)
runSprite was wired as CloudRunner.runServer but silently dropped the
timeoutSecs parameter. All other clouds (Hetzner, DO, AWS, GCP, Daytona)
implement kill-on-timeout via setTimeout+killWithTimeout; Sprite had zero
timeout protection, so a hung agent install (e.g. ZeroClaw's 600s Rust
compile, Claude Code's 300s install) would hang forever on Sprite.

Matches the pattern used by every other cloud provider.

Agent: team-lead

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-01 08:21:26 -05:00
Ahmed Abushagur
45caf4b96b
fix(sprite): fix all 6 Sprite agent installs for E2E (#2057)
* fix(sprite): fix all 6 Sprite agent installs for E2E

- Use `npm install -g --prefix` instead of `npm config set prefix` to
  avoid creating .npmrc that conflicts with nvm on Sprite VMs
- Fix shell environment setup to only modify .bash_profile (not .bashrc)
  so non-interactive bash -c commands retain PATH config
- Add $HOME/.cargo/bin to PATH for zeroclaw (Sprite has no ~/.cargo/env)
- Add $HOME/.local/bin to PATH config for Sprite shell environment
- Add sprite E2E cloud driver with org detection, config corruption fix,
  direct command embedding (not $1 positional), and retry logic
- Fix provision.sh to kill full process tree after timeout (prevents
  orphaned sprite exec sessions from corrupting config)
- Fix verify.sh zeroclaw check to not rely on ~/.cargo/env existing

Tested: 6/6 Sprite agents pass E2E (claude, codex, openclaw, zeroclaw,
opencode, kilocode). Hermes is not in the Sprite manifest.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: biome format - collapse runSprite call to single line

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-01 07:15:09 -05:00
A
44f67462ed
fix: extend HOME hardening to ssh-keys, sprite, gcp (3 files missed by #2026) (#2036)
When HOME is unset (containers, systemd, cron), process.env.HOME produces
literal "undefined" in path strings:
- ssh-keys.ts: SSH discovery/generation writes to "undefined/.ssh/"
- sprite.ts: CLI detection misses ~/.local/bin, PATH update corrupted
- gcp.ts: gcloud detection misses ~/google-cloud-sdk/bin, PATH corrupted

Same fix as #2026: use `process.env.HOME || homedir()` via `join()` for
robust OS-level fallback when HOME is unset.

Agent: team-lead

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 13:51:09 -08:00
A
912d8305c5
fix: add missing hermes agent to createAgents() and update sprite agents list (#2024)
The hermes agent was added to manifest.json and sh/sprite/hermes.sh in
feat #2023, but createAgents() in shared/agent-setup.ts was not updated.
This caused sh/sprite/hermes.sh to throw "Unknown agent: hermes" when
resolveAgent() was called.

- Add hermes entry to createAgents() with correct install cmd, envVars, and launchCmd
- Update sprite/main.ts usage error message to include hermes

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 12:08:18 -05:00
A
87978b424d
refactor: Remove dead code and stale references (#2010)
Dead code removed:
- `cleanup_stale_apps` function in `sh/e2e/lib/cleanup.sh` — defined but
  never called; `e2e.sh` calls `cloud_cleanup_stale` directly instead
- `generateEnvConfig` and `AgentConfig` re-exports from all 7 cloud-specific
  `agents.ts` modules (aws, hetzner, gcp, digitalocean, daytona, local,
  sprite) — nothing imported these from the cloud modules; they were already
  available via `@openrouter/spawn-shared` and `../shared/agents`

All 1435 tests pass, biome lint is clean (0 errors).

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 00:18:20 -05:00
A
d92d0e6e21
fix(security): prevent flag injection via hyphen-leading remote paths (#1996)
* fix(security): prevent flag injection via hyphen-leading path segments in uploadFile

Reject remote paths where any segment starts with "-" (e.g., "-e", "/tmp/-evil")
across all 6 cloud providers. This prevents potential CLI flag injection in
commands like base64, printf, mv, and scp.

Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* style: fix biome format for path validation conditions

Break long if-conditions across multiple lines and add parentheses
around arrow function parameters to satisfy biome formatter.

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-27 11:17:13 -05:00
A
4436a01372
fix(aws): increase OpenClaw gateway timeout and default to medium bundle (#1982)
* fix(aws): increase OpenClaw gateway timeout to 120s and default to medium bundle

OpenClaw gateway consistently times out on AWS Lightsail because the 60s
timeout is too short for cold starts (npm install of 713 packages + gateway
init). Doubles the timeout to 120s and sets the default bundle for OpenClaw
to medium_3_0 (4 GB RAM) since it's too heavy for nano (512 MB).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve openclaw binary path for setsid and add npm-global to Sprite PATH

setsid replaces the process image and doesn't inherit the parent shell's
exported PATH, causing "No such file or directory" on Sprite (and potentially
other clouds). Fix by resolving the full binary path with `command -v` before
passing it to setsid. Also adds ~/.npm-global/bin to Sprite's persisted shell
PATH config so openclaw is discoverable in all session types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(codex): update wire_api from "chat" to "responses"

Codex CLI dropped support for wire_api = "chat" — it now requires
"responses". This was never updated since the original codex integration,
causing an immediate crash loop on launch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: enable GitHub CLI auth for all agents, not just Claude Code

Only Claude Code had preProvision: promptGithubAuth — all other agents
(codex, openclaw, opencode, kilocode, zeroclaw) skipped GitHub auth
entirely. These are all coding agents that need gh access for PRs,
cloning, etc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add missing spawn import that crashes headless mode (#1981)

runBashHeadless calls spawn() from node:child_process at line 1112,
but only spawnSync was imported. This causes a ReferenceError crash
whenever --headless mode is used.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
2026-02-27 01:58:17 -05:00
A
f1c2b4d4e4
refactor: Remove dead code and stale references (#1976)
* refactor: remove dead offerGithubAuth exports from cloud agents.ts files

The per-cloud offerGithubAuth re-exports in each cloud's agents.ts were
never called from outside their own module. The actual GitHub auth
orchestration is handled by shared/orchestrate.ts which calls
offerGithubAuth from shared/agent-setup.ts directly.

Also update stale comment in sh/test/fixtures/_shared_agent_assertions.sh
that referenced mock.sh, a test harness file that no longer exists in
the repository.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: collapse multi-line imports to single-line per biome format

After removing offerGithubAuth exports, the remaining 2-import blocks
should be single-line. Also collapse fly/agents.ts 4-import block and
remove trailing blank line.

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
2026-02-26 22:04:33 -05:00
Ahmed Abushagur
357132b506
fix: eliminate keystroke loss during interactive agent sessions (#1939)
* fix: eliminate keystroke loss during interactive agent sessions

Three root causes were identified and fixed:

1. **Event loop fd competition**: Bun.spawn with stdio:"inherit" shares
   fd 0 between the parent event loop and the child SSH process. The
   kernel arbitrarily splits input bytes between them, causing random
   keystroke drops. Introduced spawnInteractive() using Node's
   child_process.spawnSync to block the event loop entirely.

2. **Unnecessary shell layers**: AWS and GCP wrapped the SSH command in
   an extra `bash -c '...'` layer, creating 3 shell processes before the
   agent. Aligned to match Hetzner/DO which pass directly.

3. **stty sane side effects**: prepareStdinForHandoff() ran `stty sane`
   which enables ixon (XON/XOFF flow control), causing periodic input
   freezes. Removed — setRawMode(false) is sufficient. Also removed
   process.stdin.destroy() which could corrupt fd 0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: biome format + remove stdin unref that broke async spawn

- Fix biome formatting in ssh.ts and commands.ts
- Remove process.stdin.unref() from prepareStdinForHandoff — it
  allowed the event loop to exit before async child_process.spawn
  finished, causing test failures and potential production issues
  with the spawnBash (legacy script execution) path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 00:08:38 -05:00
A
fbb48514e9
fix: drain piped stdout/stderr in aws waitForCloudInit and sprite destroyServer (#1929)
Same pipe-buffer deadlock pattern fixed by PRs #1903, #1915, #1920, #1922.
Two instances were missed in those passes.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-25 15:48:43 -05:00
A
2907ff6068
fix: drain piped stderr in CLI installers and uploadFile to prevent deadlock (#1922)
PR #1920 fixed pipe buffer deadlock in runServerCapture and
waitForCloudInit but missed 6 other locations where Bun.spawn uses
"pipe" for stderr without draining it before await proc.exited.

When a child process writes >64KB to a piped stderr, the OS pipe
buffer fills, the child blocks on write(), and the parent blocks on
exited — classic deadlock.

Fix: change stderr from "pipe" to "inherit" in all 6 locations since
the stderr output is never read programmatically. This also lets
users see installation errors and SCP errors in real time.

Affected functions:
- fly.ts ensureFlyCli()
- sprite.ts ensureSpriteCli()
- gcp.ts ensureGcloudCli()
- hetzner.ts uploadFile()
- digitalocean.ts uploadFile()
- aws.ts uploadFile()

-- refactor/code-health

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-25 09:26:27 -05:00
A
d8e03c7a04
fix: drain piped stderr before awaiting process exit to prevent deadlock (#1903)
* fix: drain piped stderr before awaiting process exit to prevent deadlock

Awaiting `proc.exited` before reading piped stderr causes a deadlock when
the child process writes enough stderr output to fill the OS pipe buffer
(~64KB). The process blocks waiting for the buffer to drain, but we never
drain it because we're waiting for the process to exit first.

Fix sprite/sprite.ts (createSprite, uploadFileSprite) and aws/aws.ts
(awsCli) to start draining stderr before awaiting exit, matching the
established pattern in gcp/gcp.ts and shared/ssh.ts.

Agent: code-health
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: apply biome format fixes to pass CI

Agent: team-lead
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-25 01:54:56 -05:00
A
9e54f0cf57
ci: add Mock Tests job to satisfy required status check (#1904)
* ci: add Mock Tests job to satisfy required status check

Split the unit-tests job into mock-tests (runs bun test) and unit-tests
(verifies cloud bundles build). The repo ruleset requires "Mock Tests",
"Unit Tests", and "Biome Lint" checks — the missing "Mock Tests" job was
blocking all PR merges.

Fixes #1901

Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* style: fix pre-existing Biome format issues in 9 files

Auto-applied Biome formatter to src/ to resolve failing "Biome Lint"
required status check. No logic changes — formatting only.

Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-25 00:54:33 -05:00
A
5b9ceef060
perf: replace child_process.spawn with Bun.spawn for interactive sessions (#1888)
Using Node's child_process.spawn() to launch interactive SSH/shell sessions
from inside a Bun process adds unnecessary overhead: an extra process fork,
PTY negotiation indirection, and a forced Bun→Node stdio context switch.

Switch all interactiveSession() functions to Bun.spawn() with
stdio: ["inherit","inherit","inherit"], which hands off file descriptors
directly without forking a Node wrapper process.

Also removes the 500ms hardcoded sleep in orchestrate.ts that was a
band-aid for the old child_process handoff latency. The synchronous
prepareStdinForHandoff() is sufficient on its own.

Affected clouds: hetzner, aws, gcp, digitalocean, fly, daytona, sprite, local
Also fixes runInteractiveCommand() in commands.ts (spawn connect).

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 15:17:53 -05:00
A
65f6f1be32
feat: Bun workspace monorepo — packages/cli + packages/shared (#1853)
Restructure the repo as a Bun workspace monorepo:

- Move cli/ → packages/cli/
- Create packages/shared/ (@openrouter/spawn-shared) with type-guards and parse utilities
- Add root package.json with workspace configuration
- Update all CLI imports to use @openrouter/spawn-shared
- Deduplicate toRecord/toObjectArray helpers from 4 cloud modules
- Update SPA (slack-bot) to use shared package instead of local toObj()
- Update 48 agent shell scripts for new packages/cli/ path
- Update install.sh, install.ps1, e2e, and test scripts
- Update all GitHub workflows, .gitignore, pre-commit hooks
- Update CLAUDE.md, README.md, and skill prompt references
- Pin all dependency versions (no ^ ranges)
- Bump CLI version 0.9.1 → 0.10.0

All 1908 tests pass. Lint clean. All 8 cloud bundles build.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-23 22:07:05 -08:00