Commit graph

296 commits

Author SHA1 Message Date
A
631722151c
fix(hetzner): add SPAWN_CUSTOM guard to promptServerType (#2065)
Every other cloud provider (GCP, DO, Daytona) gates their size/type
picker behind SPAWN_CUSTOM !== "1" so users get a fast default launch.
Hetzner's promptLocation had the guard but promptServerType was missing
it, causing an unexpected interactive picker on the cheapest/most-used
cloud when running without --custom.

Bump CLI to 0.11.19.

Agent: team-lead

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 12:41:32 -05:00
A
902f3091d3
test: Remove duplicate and theatrical tests (#2061)
* test: Remove duplicate and theatrical tests

- Remove 3 duplicate/always-pass tests from commands-update-download.test.ts:
  "should reject script without shebang via validateScriptContent" (already covered
  in download-and-failure.test.ts and cmdrun-happy-path.test.ts),
  "should reject script with dangerous pattern" (duplicate + always-pass or-chain),
  "should show script-not-found message when both URLs 404" (duplicate of existing 404 test)
- Remove 5 theatrical tests from custom-flag.test.ts that only verify
  constant arrays have entries with defined id/label fields (SERVER_TYPES,
  LOCATIONS, DROPLET_SIZES, DO_REGIONS, SANDBOX_SIZES) — these test constant
  existence, not behavior, and fail due to @openrouter/spawn-shared import error
- Bump CLI version to 0.11.18

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: Remove trailing blank lines in custom-flag.test.ts for biome format

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
2026-03-01 09:09:29 -08:00
A
cef14ce9ea
fix(sprite): pass timeoutSecs through to runSprite, add kill-on-timeout (#2060)
runSprite was wired as CloudRunner.runServer but silently dropped the
timeoutSecs parameter. All other clouds (Hetzner, DO, AWS, GCP, Daytona)
implement kill-on-timeout via setTimeout+killWithTimeout; Sprite had zero
timeout protection, so a hung agent install (e.g. ZeroClaw's 600s Rust
compile, Claude Code's 300s install) would hang forever on Sprite.

Matches the pattern used by every other cloud provider.

Agent: team-lead

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-01 08:21:26 -05:00
A
210519a590
fix(security): document PKCE migration path for DigitalOcean OAuth (#2056)
Adds explicit monitoring obligation and step-by-step migration
checklist to the DO_CLIENT_SECRET comment. Tracks when PKCE was last
verified unsupported (2026-03) and what to do when it becomes
available, addressing the technical debt tracking request from #2041.

Fixes #2041

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 04:11:40 -05:00
A
84133fb036
fix(security): replace validateLaunchCmd blocklist with allowlist (#2053)
* fix(security): replace validateLaunchCmd blocklist with allowlist

The blocklist pattern />\\s*\\// (redirection to absolute path) matched
2>/dev/null, which appears in every valid launch command generated by
agent-setup.ts. This caused mergeLastConnection() to reject and discard
all connection data, breaking the spawn list → "Enter agent" reconnect
flow and spawn last.

Replace the blocklist with a strict allowlist: each semicolon-separated
segment must match one of:
  - source ~/.<rc-file> [2>/dev/null]
  - export PATH=<safe-path>
  - <binary> [simple-args]

This simultaneously fixes the false-positive and closes the latent
injection gap (the old blocklist only blocked '; rm' but not arbitrary
'; <other-cmd>').

Fixes #2052

Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* style: apply biome formatter to fix CI format check

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-01 03:12:27 -05:00
A
9be0c9597d
fix: spawn last reconnects to existing VM instead of always reprovisioning (#2051)
`cmdLast()` was always calling `cmdRun()`, creating a brand-new VM every
time. Wire it into `handleRecordAction()` instead, which already contains
the reconnect-vs-rerun logic used by `spawn list`: if the latest history
record has a live connection (IP + server ID), the user is offered options
to enter the agent or SSH in; only if no connection info exists (or the
user chooses "Spawn a new VM") does it provision a fresh instance.

Also bumps CLI version 0.11.13 → 0.11.14.

Fixes #2050

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-01 01:49:36 -05:00
A
708326a693
refactor: Remove dead exports across TypeScript modules (#2044)
Remove `export` from functions that are only used internally within their
own file and never imported elsewhere. Affected modules:

- `history.ts`: `mergeLastConnection` (only called internally by `getActiveServers`/`filterHistory`)
- `update-check.ts`: `isUpdateBackedOff` (only called internally by `checkForUpdates`)
- `aws/aws.ts`: `waitForSsh` (only called internally by `waitForCloudInit`)
- `gcp/gcp.ts`: `waitForSsh` (only called internally by `waitForCloudInit`)
- `daytona/daytona.ts`: `waitForSsh` (only called internally by `waitForCloudInit`)
- `shared/agent-setup.ts`: 11 implementation helpers (`installAgent`, `uploadConfigFile`,
  `installClaudeCode`, `setupClaudeCodeConfig`, `promptGithubAuth`, `setupCodexConfig`,
  `setupOpenclawConfig`, `startGateway`, `setupZeroclawConfig`, `ensureSwapSpace`,
  `openCodeInstallCmd`) — all only used within `createAgents()`

All 1410 tests pass, biome lint clean (0 errors).

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-02-28 20:39:38 -05:00
A
44f67462ed
fix: extend HOME hardening to ssh-keys, sprite, gcp (3 files missed by #2026) (#2036)
When HOME is unset (containers, systemd, cron), process.env.HOME produces
literal "undefined" in path strings:
- ssh-keys.ts: SSH discovery/generation writes to "undefined/.ssh/"
- sprite.ts: CLI detection misses ~/.local/bin, PATH update corrupted
- gcp.ts: gcloud detection misses ~/google-cloud-sdk/bin, PATH corrupted

Same fix as #2026: use `process.env.HOME || homedir()` via `join()` for
robust OS-level fallback when HOME is unset.

Agent: team-lead

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 13:51:09 -08:00
A
a8b7bb7fb9
test: consolidate wasteful one-per-flag tests in unknown-flags suite (#2029)
Remove 18 redundant/theatrical tests from unknown-flags.test.ts:

- Removed duplicate 'should detect --verbose as unknown' test (same name,
  same assertion, nearly identical inputs as the test 28 lines above it)
- Consolidated 14 individual 'allows known flags' tests — each called
  findUnknownFlag([flag]) with a single flag and expected null — into one
  data-driven loop over all 17 flags; same coverage, 13 fewer test cases
- Removed 'should contain --name flag' which is fully subsumed by the
  immediately following 'should contain all expected flags' test that
  already verifies --name along with 22 other flags

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 20:42:04 +00:00
A
912d8305c5
fix: add missing hermes agent to createAgents() and update sprite agents list (#2024)
The hermes agent was added to manifest.json and sh/sprite/hermes.sh in
feat #2023, but createAgents() in shared/agent-setup.ts was not updated.
This caused sh/sprite/hermes.sh to throw "Unknown agent: hermes" when
resolveAgent() was called.

- Add hermes entry to createAgents() with correct install cmd, envVars, and launchCmd
- Update sprite/main.ts usage error message to include hermes

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 12:08:18 -05:00
A
ae3f4001cc
refactor: Remove dead code and stale references (#2017)
Remove stale comments in test files that referenced deleted test files
(commands-untested.test.ts, commands-helpers.test.ts) and remove
"Agent: X" metadata annotations that became obsolete after the
theatrical test cleanup.

All 1424 tests pass, biome lint clean.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-02-28 02:07:48 -08:00
A
87978b424d
refactor: Remove dead code and stale references (#2010)
Dead code removed:
- `cleanup_stale_apps` function in `sh/e2e/lib/cleanup.sh` — defined but
  never called; `e2e.sh` calls `cloud_cleanup_stale` directly instead
- `generateEnvConfig` and `AgentConfig` re-exports from all 7 cloud-specific
  `agents.ts` modules (aws, hetzner, gcp, digitalocean, daytona, local,
  sprite) — nothing imported these from the cloud modules; they were already
  available via `@openrouter/spawn-shared` and `../shared/agents`

All 1435 tests pass, biome lint is clean (0 errors).

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 00:18:20 -05:00
A
e063180f06
refactor: Remove dead code and stale references (#2008)
Remove `setupOpenclawBatched` from `packages/cli/src/shared/agent-setup.ts`.
This function was exported but never called anywhere in the codebase — it was
superseded by the composable `setupOpenclawConfig` + `startGateway` approach
used in `createAgents()`.

Bump CLI patch version to 0.11.7.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 20:38:06 -05:00
A
4edc886e55
fix(security): validate launch_cmd from history before shell execution (#2006)
* fix(security): validate launch_cmd from history before shell execution

launch_cmd from history.json was passed directly to bash -lc via SSH with no
validation, enabling command injection if the history file was tampered with.

Adds validateLaunchCmd() that blocks $(...), backticks, pipes, command chaining,
redirections, and variable expansion. Validation is applied at both merge time
(history.ts:mergeLastConnection) and execution time (commands.ts:cmdEnterAgent).

Fixes #2005

Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* style: apply biome formatting to security.ts

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-27 18:15:17 -05:00
A
a678402e67
refactor: Remove dead code and stale references (#2003)
- Remove `runLocalCapture` from local/local.ts (exported but never called)
- Remove `listServers` from aws, hetzner, digitalocean, daytona modules
  (all exported but never imported or called anywhere)
- Remove `InstanceListSchema` from aws.ts (only used in removed listServers)
- Remove now-unused imports in daytona.ts (parseJsonRaw, toObjectArray, toRecord)
- Bump CLI version 0.11.4 → 0.11.5

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-02-27 13:43:58 -08:00
A
f49cd97cdf
fix(ux): apply resolveListFilters to cmdDelete so bare positional args work (#2002)
spawn delete hetzner was silently returning "No active servers to delete"
even when the user had active Hetzner servers. The positional arg was
parsed as agentFilter, but no agent is named "hetzner", so the filter
matched nothing. cmdList already calls resolveListFilters() which
auto-promotes a bare arg to cloudFilter when no agent matches — cmdDelete
was missing this call entirely.

Agent: ux-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-27 14:18:55 -05:00
A
d92d0e6e21
fix(security): prevent flag injection via hyphen-leading remote paths (#1996)
* fix(security): prevent flag injection via hyphen-leading path segments in uploadFile

Reject remote paths where any segment starts with "-" (e.g., "-e", "/tmp/-evil")
across all 6 cloud providers. This prevents potential CLI flag injection in
commands like base64, printf, mv, and scp.

Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* style: fix biome format for path validation conditions

Break long if-conditions across multiple lines and add parentheses
around arrow function parameters to satisfy biome formatter.

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-27 11:17:13 -05:00
A
d8131c3df6
fix(hetzner): update deprecated server types to cx23/cpx22 gen (#1983)
* fix(hetzner): update deprecated cx22/cpx21 server types to cx23/cpx22

Hetzner deprecated the entire cx*2 and cpx*1 server lines on Jan 1, 2026.
New orders fail with "server type is deprecated". Updates to the current
gen3 CX and gen2 CPX lines (cx23, cx33, cx43, cx53, cpx22, cpx32).

Also shows the server type picker by default instead of requiring --custom,
so users can choose their instance size on every deploy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(zeroclaw): append autonomy config instead of overwriting onboard output

zeroclaw onboard generates a complete config with required fields like
default_temperature. Our setup was overwriting that with a partial config
missing required fields, causing a crash loop on startup. Now appends
the security/shell settings instead so onboard's fields are preserved.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix biome formatting in agent-setup.ts

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Agent: pr-maintainer

---------

Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
2026-02-27 00:20:31 -08:00
A
4436a01372
fix(aws): increase OpenClaw gateway timeout and default to medium bundle (#1982)
* fix(aws): increase OpenClaw gateway timeout to 120s and default to medium bundle

OpenClaw gateway consistently times out on AWS Lightsail because the 60s
timeout is too short for cold starts (npm install of 713 packages + gateway
init). Doubles the timeout to 120s and sets the default bundle for OpenClaw
to medium_3_0 (4 GB RAM) since it's too heavy for nano (512 MB).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve openclaw binary path for setsid and add npm-global to Sprite PATH

setsid replaces the process image and doesn't inherit the parent shell's
exported PATH, causing "No such file or directory" on Sprite (and potentially
other clouds). Fix by resolving the full binary path with `command -v` before
passing it to setsid. Also adds ~/.npm-global/bin to Sprite's persisted shell
PATH config so openclaw is discoverable in all session types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(codex): update wire_api from "chat" to "responses"

Codex CLI dropped support for wire_api = "chat" — it now requires
"responses". This was never updated since the original codex integration,
causing an immediate crash loop on launch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: enable GitHub CLI auth for all agents, not just Claude Code

Only Claude Code had preProvision: promptGithubAuth — all other agents
(codex, openclaw, opencode, kilocode, zeroclaw) skipped GitHub auth
entirely. These are all coding agents that need gh access for PRs,
cloning, etc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add missing spawn import that crashes headless mode (#1981)

runBashHeadless calls spawn() from node:child_process at line 1112,
but only spawnSync was imported. This causes a ReferenceError crash
whenever --headless mode is used.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
2026-02-27 01:58:17 -05:00
A
d04096a15b
feat!: remove Fly.io cloud provider support (#1979)
* feat!: remove Fly.io cloud provider support

Drop Fly.io as a supported cloud provider. Sprite (which uses Fly.io
infrastructure internally) is retained.

- Delete packages/cli/src/fly/ module, sh/fly/ scripts, fixtures/fly/
- Remove fly cloud entry and 6 fly matrix entries from manifest.json
- Remove fly imports, destroy cases, and connection handlers from commands.ts
- Remove fly-ssh sentinel from security.ts
- Port E2E test suite from Fly.io to AWS Lightsail (fly-e2e.sh → aws-e2e.sh)
- Update README (7 clouds, 42 combinations), CLAUDE.md, and skill prompts
- Clean up fly references in build config, gitignore, icon sources
- Bump CLI version to 0.11.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: restore Docker image build under sh/docker/

Move openclaw Dockerfile from sh/fly/docker/ to sh/docker/ and rename
workflow from fly-docker.yml to docker.yml with updated paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix extra blank lines in commands.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-02-27 00:06:32 -05:00
A
cb4e6839b7
fix: add ~/.npm-global/bin to OpenClaw PATH for gateway, launch, and reconnect (#1972)
* fix: add ~/.npm-global/bin to OpenClaw PATH for gateway, launch, and reconnect

OpenClaw installs to ~/.npm-global/bin/ via npm, but startGateway() and
launchCmd() only included ~/.bun/bin and ~/.local/bin in PATH — so the
`openclaw` binary was never found on non-Fly clouds (DigitalOcean, Hetzner,
AWS, GCP). Fly was unaffected because it uses setupOpenclawBatched() which
correctly includes the npm-global path.

Three fixes:
1. startGateway(): add $HOME/.npm-global/bin to PATH
2. launchCmd(): add $HOME/.npm-global/bin to PATH
3. install(): persist PATH to ~/.bashrc and ~/.zshrc (matching codex/kilocode
   pattern) so reconnects via `spawn openclaw <cloud> --name ...` also work

Closes #1965

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: correct command chaining and idempotency in npm-global PATH setup

- Use curly braces to group grep||echo so PATH append only runs after
  successful npm install (fixes operator precedence bug)
- Skip ~/.zshrc modification when file doesn't exist (avoids creating
  it on non-zsh systems)
- Use grep -qF for literal string matching (no regex interpretation)
- Apply fix to all three affected agents: openclaw, codex, kilocode

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
2026-02-26 20:45:36 -05:00
A
15d8828638
fix: add npm-global/bin to PATH for openclaw gateway on non-Fly clouds (#1969)
* fix: add npm-global/bin to PATH for openclaw startGateway and launchCmd

Fixes crash where openclaw gateway fails to start on non-Fly clouds
(DigitalOcean, Hetzner, AWS, GCP) because ~/.npm-global/bin was absent
from PATH in startGateway() and launchCmd(). Fly was unaffected because
setupOpenclawBatched() already included the correct PATH.

Fixes #1965

Agent: code-health
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* style: fix Biome format error on launchCmd line

Agent: pr-maintainer

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-26 18:46:55 -05:00
A
892830689c
refactor: Remove dead code and consolidate duplicate parseJson helpers (#1963)
- Remove CACHE_DIR dead export from manifest.ts (was defined but never imported anywhere)
- Add parseJsonObj() to @openrouter/spawn-shared for parsing JSON objects
- Remove 4x duplicate local parseJson/LooseObject definitions from hetzner, digitalocean, daytona, fly cloud modules
- Remove now-unused `import * as v from "valibot"` from all 4 cloud modules
- Bump CLI to 0.10.24

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-02-26 16:17:47 -05:00
A
623b4aca64
fix: add npm-global/bin to PATH for codex and kilocode installs (#1953)
Fixes #1947

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-26 06:37:51 -08:00
A
95d4ca93bb
refactor: Remove dead code and stale references (#1950)
Remove the `runWithRetry` function exported from 4 cloud modules (aws, hetzner, gcp, digitalocean)
that were defined but never called anywhere in the codebase. Only `fly.ts` uses its own
`runWithRetry` internally, so that definition is preserved.

Also bump CLI version to 0.10.22 per version policy.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 08:36:08 -05:00
A
9f57e2b506
fix: replace non-null assertions with proper null guards in fly.ts and oauth.ts (#1946)
Replace 6 non-null assertion operators (!) with safe alternatives:
- fly.ts: 4x getCmd()! -> null guard with clear error message
- fly.ts: 1x .pop()! -> fallback with || ""
- oauth.ts: 1x .get("code")! -> hoist value from outer if-check

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-26 05:49:48 -05:00
A
fe6fd20143
refactor: remove duplicate sleep() definitions in fly and daytona modules (#1944)
Both fly.ts and daytona.ts defined a local `sleep` helper identical to the
one already exported from shared/ssh.ts. Remove the local copies and import
the shared function instead, consistent with all other cloud modules.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 03:51:03 -05:00
Ahmed Abushagur
bdcde7bfc4
fix: use spawnSync for script execution to eliminate fd 0 competition (#1942)
The cmdRun path (the main user flow) was still using async
child_process.spawn for script execution. This left Bun's event loop
running while SSH (a grandchild process inside the bash script)
competed for fd 0 input bytes — causing intermittent keystroke loss.

Switch spawnBash to use spawnSync, which blocks the event loop entirely
and gives the child process exclusive terminal access. This matches
what we already did for runInteractiveCommand in #1939.

Also removes dead spawnCalls tracking code from cmdrun-happy-path test.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 02:50:59 -05:00
A
954f3b4893
fix: use user-local npm prefix for openclaw install (#1941)
npm install -g openclaw fails with EACCES on non-root users (e.g.,
ubuntu on AWS Lightsail) because /usr/local/lib/node_modules isn't
writable. Use the same ~/.npm-global prefix pattern already used by
codex and kilocode agents.

Fixes both the standard installAgent path and the batched
setupOpenclawBatched path (used by Fly).

Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 00:57:09 -05:00
Ahmed Abushagur
193cae0b29
fix: pause stdin before interactive handoff to stop keystroke loss (#1938)
The parent process called process.stdin.resume() which put stdin into
flowing mode, making it actively read from fd 0 and discard bytes
(no listeners). This caused the parent to race with the child SSH
process for keystrokes — the kernel gave each byte to whichever
process called read() first, resulting in random keystroke drops.

Switching to pause() makes the parent stop reading from fd 0, so
Bun.spawn(stdio: "inherit") gives the child exclusive access to
the terminal input via dup2().

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 20:38:57 -05:00
A
556f32ecfc
fix: reset terminal state before interactive session handoff (#1934)
* fix: reset terminal state before interactive session handoff

The stdin handoff from TS orchestration to the interactive SSH session
was leaving the terminal in a dirty state, causing users to need 2+
Enter presses or random keystrokes before input worked.

Three fixes:
1. Unconditionally call setRawMode(false) instead of checking isRaw
   first — @clack/core's close() already resets the flag but the
   terminal can still be dirty after multiple readline instances
2. Run `stty sane` to fully reset the terminal line discipline,
   undoing any damage from readline's emitKeypressEvents
3. Resume stdin instead of pausing it — Bun.spawn with stdio:"inherit"
   needs an active stream, a paused stdin causes the child to see
   blocked input

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: fix Biome formatting for Bun.spawnSync call

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ahmed Abushagur <ahmed@abushagur.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-02-25 18:45:27 -05:00
A
98d5a6612b
fix: add SIGKILL fallback to process timeout kills (#1931)
* fix: add SIGKILL fallback to process timeout kills

proc.kill() only sends SIGTERM; SSH processes stuck in network I/O can
ignore SIGTERM and cause the CLI to hang forever waiting on proc.exited.

Add killWithTimeout() to shared/ssh.ts that sends SIGTERM then SIGKILL
after a 5s grace period. Replace all 10 proc.kill() timeout sites across
Fly, AWS, DigitalOcean, GCP and Hetzner providers.

Agent: code-health
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* chore: format files with biome

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-25 16:50:18 -05:00
A
fbb48514e9
fix: drain piped stdout/stderr in aws waitForCloudInit and sprite destroyServer (#1929)
Same pipe-buffer deadlock pattern fixed by PRs #1903, #1915, #1920, #1922.
Two instances were missed in those passes.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-25 15:48:43 -05:00
A
7a5e7580bd
fix(security): validate server_id in cmdConnect and cmdEnterAgent (#1925)
All other connection fields (ip, user, server_name) are validated
against injection before being passed to shell commands, but server_id
was skipped in both cmdConnect and cmdEnterAgent despite being used as
a daytona ssh argument (line 2922). This inconsistency existed while
execDeleteServer, mergeLastConnection, and the headless code path all
correctly validated server_id.

Adds the missing `if (connection.server_id) { validateServerIdentifier(...) }`
guard in both functions, matching the existing server_name pattern.

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-25 12:56:57 -05:00
A
2907ff6068
fix: drain piped stderr in CLI installers and uploadFile to prevent deadlock (#1922)
PR #1920 fixed pipe buffer deadlock in runServerCapture and
waitForCloudInit but missed 6 other locations where Bun.spawn uses
"pipe" for stderr without draining it before await proc.exited.

When a child process writes >64KB to a piped stderr, the OS pipe
buffer fills, the child blocks on write(), and the parent blocks on
exited — classic deadlock.

Fix: change stderr from "pipe" to "inherit" in all 6 locations since
the stderr output is never read programmatically. This also lets
users see installation errors and SCP errors in real time.

Affected functions:
- fly.ts ensureFlyCli()
- sprite.ts ensureSpriteCli()
- gcp.ts ensureGcloudCli()
- hetzner.ts uploadFile()
- digitalocean.ts uploadFile()
- aws.ts uploadFile()

-- refactor/code-health

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-25 09:26:27 -05:00
A
b279cc7c8b
fix: drain piped stderr in Fly/Daytona runServerCapture to prevent deadlock (#1920)
The runServerCapture functions in fly.ts and daytona.ts spawn processes
with stdio: ["pipe", "pipe", "pipe"] but only drain stdout. If stderr
output exceeds the 64KB pipe buffer, the child process blocks on write
and deadlocks. This was already fixed in Hetzner, DigitalOcean, AWS,
GCP, and shared/ssh.ts (commit 2e79d71b) but Fly and Daytona were
missed.

Apply the same Promise.all pattern to drain both pipes concurrently.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-25 07:19:15 -05:00
A
9e309f1ff5
fix(security): escape single quotes in standard SSH enter-agent path (#1902)
The standard SSH path in cmdEnterAgent() interpolated remoteCmd into a
single-quoted bash -lc wrapper without escaping embedded single quotes.
If launch_cmd (from history.json) or the manifest's launch/pre_launch
fields contained a single quote, the shell quoting would break, allowing
unintended command execution on the remote server.

The Fly.io path already had this escaping (PR #1880, #1893) but the
generic SSH fallback did not. This adds the same replace(/'/g, "'\\''")
pattern used everywhere else in the codebase.

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 22:06:13 -08:00
A
40417d845d
fix: use single-quote escaping for restart loop in interactiveSession (#1893)
JSON.stringify double-quoting caused two bugs in the restart wrapper:
1. Literal \n instead of newlines (bash doesn't interpret \n in "...")
2. Shell variables ($vars) expanded to empty strings before script ran

Affected clouds: fly, gcp, hetzner, digitalocean, aws.
Daytona already had the correct single-quote escaping.

Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 18:47:23 -05:00
A
d423ea57d5
refactor: deduplicate pickToTTY as wrapper around pickToTTYWithActions (#1891)
Fixes #1890

Agent: complexity-hunter

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-24 17:48:57 -05:00
A
5b9ceef060
perf: replace child_process.spawn with Bun.spawn for interactive sessions (#1888)
Using Node's child_process.spawn() to launch interactive SSH/shell sessions
from inside a Bun process adds unnecessary overhead: an extra process fork,
PTY negotiation indirection, and a forced Bun→Node stdio context switch.

Switch all interactiveSession() functions to Bun.spawn() with
stdio: ["inherit","inherit","inherit"], which hands off file descriptors
directly without forking a Node wrapper process.

Also removes the 500ms hardcoded sleep in orchestrate.ts that was a
band-aid for the old child_process handoff latency. The synchronous
prepareStdinForHandoff() is sufficient on its own.

Affected clouds: hetzner, aws, gcp, digitalocean, fly, daytona, sprite, local
Also fixes runInteractiveCommand() in commands.ts (spawn connect).

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 15:17:53 -05:00
A
2a1ed4164e
fix(gcp): add network/subnet flags to fix custom VPC subnet mode (#1883)
GCP instance creation was failing with 'Invalid value for field
resource.networkInterfaces[0].subnetwork' when the project VPC uses
custom subnet mode. Add --network and --subnet flags defaulting to
'default', with GCP_NETWORK and GCP_SUBNET env var overrides for
custom VPC setups.

Fixes #1882

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-24 14:27:46 -05:00
A
0edc588629
fix(daytona): use single-quote escaping in interactiveSession to prevent shell injection (#1880)
Replace JSON.stringify double-quoting with single-quote escaping for the
cmd argument in interactiveSession(). Double-quoted strings in bash allow
$() and ${} expansion, making the previous pattern vulnerable to injection
if cmd ever contained shell metacharacters. Single-quoted strings prevent
ALL shell expansion, matching the defense-in-depth approach Fly already uses.

Fixes #1879

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-24 12:28:26 -05:00
A
c701b745ab
fix: add process supervision restart loop for agents on cloud VMs (#1862)
When an agent process dies on a cloud VM (SIGTERM, OOM, crash), it now
automatically restarts after 5 seconds, up to 10 times. Clean exits
(code 0) break out immediately. Local execution is unaffected.

Fixes #1860

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-24 02:51:31 -05:00
A
9f6bc423ac
fix: add swap space before ZeroClaw install to prevent OOM on nano instances (#1851)
* fix: add swap space before ZeroClaw install to prevent OOM on nano instances

ZeroClaw's Rust compilation gets OOM-killed on nano_3_0 (512 MB) — build
fails at a random dependency each run. Add ensureSwapSpace() that creates
a 1 GB swap file before running the installer:

- Idempotent: skips silently if swap already exists
- Non-fatal: logs a warning if sudo fails (larger instances won't need it)
- Timeout bumped from 5 min to 10 min (swap-backed builds are slower)
- Defense-in-depth: --prefer-prebuilt avoids compilation in the common
  case, but fallback source builds still need memory

Fixes #1840

Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: add input validation to ensureSwapSpace() to prevent command injection

Validate sizeMb is a positive integer before interpolating into shell
commands, as requested in security review.

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-23 23:12:13 -08:00
A
f5f020d250
feat: Result monad in shared + Claude Code fixtures + SPA Result adoption (#1858)
* refactor: split SPA into helpers + main, add build script and tests

Split slack-bot.ts into helpers.ts (pure functions) and main.ts (entry
point) for testability. Add build.ts to bundle SPA into spa.js. Add
spa.test.ts with 19 tests covering stream parsing and text helpers.

Improved streaming: tool_use and tool_result events get their own Slack
messages instead of concatenating everything into one. Prompt is passed
via stdin to avoid CLI flag parsing issues with user content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: drop build.ts — run main.ts directly via bun

Bun runs TypeScript natively, no bundling step needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: move Result monad to shared, add Claude Code fixtures, use Result in SPA

- Move Result type/Ok/Err from packages/cli/src/shared/result.ts to
  packages/shared/src/result.ts and re-export from @openrouter/spawn-shared
- Update CLI imports (ui.ts) to use the shared package
- Add fixtures/claude-code/ with realistic stream-json events covering
  all event types (assistant text, tool_use, user tool_result, result)
- Refactor SPA helpers to return Result<T> instead of throwing/returning null:
  loadState() → Result<State>, saveState() → Result<void>,
  downloadSlackFile() → Result<string>, addMapping() → Result<void>
- Update main.ts call sites to handle Result returns
- Update SPA tests to import events from fixtures and test Result returns
- Bump CLI version 0.10.0 → 0.10.1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: biome format issues in aws.test.ts, aws.ts, daytona.ts

Expand inline objects/arrays to multi-line format to satisfy biome
formatter rules. No logic changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-23 22:48:51 -08:00
A
65f6f1be32
feat: Bun workspace monorepo — packages/cli + packages/shared (#1853)
Restructure the repo as a Bun workspace monorepo:

- Move cli/ → packages/cli/
- Create packages/shared/ (@openrouter/spawn-shared) with type-guards and parse utilities
- Add root package.json with workspace configuration
- Update all CLI imports to use @openrouter/spawn-shared
- Deduplicate toRecord/toObjectArray helpers from 4 cloud modules
- Update SPA (slack-bot) to use shared package instead of local toObj()
- Update 48 agent shell scripts for new packages/cli/ path
- Update install.sh, install.ps1, e2e, and test scripts
- Update all GitHub workflows, .gitignore, pre-commit hooks
- Update CLAUDE.md, README.md, and skill prompt references
- Pin all dependency versions (no ^ ranges)
- Bump CLI version 0.9.1 → 0.10.0

All 1908 tests pass. Lint clean. All 8 cloud bundles build.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-23 22:07:05 -08:00
Renamed from cli/package.json (Browse further)