* fix: add ~/.npm-global/bin to OpenClaw PATH for gateway, launch, and reconnect
OpenClaw installs to ~/.npm-global/bin/ via npm, but startGateway() and
launchCmd() only included ~/.bun/bin and ~/.local/bin in PATH — so the
`openclaw` binary was never found on non-Fly clouds (DigitalOcean, Hetzner,
AWS, GCP). Fly was unaffected because it uses setupOpenclawBatched() which
correctly includes the npm-global path.
Three fixes:
1. startGateway(): add $HOME/.npm-global/bin to PATH
2. launchCmd(): add $HOME/.npm-global/bin to PATH
3. install(): persist PATH to ~/.bashrc and ~/.zshrc (matching codex/kilocode
pattern) so reconnects via `spawn openclaw <cloud> --name ...` also work
Closes#1965
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: correct command chaining and idempotency in npm-global PATH setup
- Use curly braces to group grep||echo so PATH append only runs after
successful npm install (fixes operator precedence bug)
- Skip ~/.zshrc modification when file doesn't exist (avoids creating
it on non-zsh systems)
- Use grep -qF for literal string matching (no regex interpretation)
- Apply fix to all three affected agents: openclaw, codex, kilocode
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
* fix: add npm-global/bin to PATH for openclaw startGateway and launchCmd
Fixes crash where openclaw gateway fails to start on non-Fly clouds
(DigitalOcean, Hetzner, AWS, GCP) because ~/.npm-global/bin was absent
from PATH in startGateway() and launchCmd(). Fly was unaffected because
setupOpenclawBatched() already included the correct PATH.
Fixes#1965
Agent: code-health
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* style: fix Biome format error on launchCmd line
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
- Remove CACHE_DIR dead export from manifest.ts (was defined but never imported anywhere)
- Add parseJsonObj() to @openrouter/spawn-shared for parsing JSON objects
- Remove 4x duplicate local parseJson/LooseObject definitions from hetzner, digitalocean, daytona, fly cloud modules
- Remove now-unused `import * as v from "valibot"` from all 4 cloud modules
- Bump CLI to 0.10.24
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Remove the `runWithRetry` function exported from 4 cloud modules (aws, hetzner, gcp, digitalocean)
that were defined but never called anywhere in the codebase. Only `fly.ts` uses its own
`runWithRetry` internally, so that definition is preserved.
Also bump CLI version to 0.10.22 per version policy.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Both fly.ts and daytona.ts defined a local `sleep` helper identical to the
one already exported from shared/ssh.ts. Remove the local copies and import
the shared function instead, consistent with all other cloud modules.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The cmdRun path (the main user flow) was still using async
child_process.spawn for script execution. This left Bun's event loop
running while SSH (a grandchild process inside the bash script)
competed for fd 0 input bytes — causing intermittent keystroke loss.
Switch spawnBash to use spawnSync, which blocks the event loop entirely
and gives the child process exclusive terminal access. This matches
what we already did for runInteractiveCommand in #1939.
Also removes dead spawnCalls tracking code from cmdrun-happy-path test.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
npm install -g openclaw fails with EACCES on non-root users (e.g.,
ubuntu on AWS Lightsail) because /usr/local/lib/node_modules isn't
writable. Use the same ~/.npm-global prefix pattern already used by
codex and kilocode agents.
Fixes both the standard installAgent path and the batched
setupOpenclawBatched path (used by Fly).
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The parent process called process.stdin.resume() which put stdin into
flowing mode, making it actively read from fd 0 and discard bytes
(no listeners). This caused the parent to race with the child SSH
process for keystrokes — the kernel gave each byte to whichever
process called read() first, resulting in random keystroke drops.
Switching to pause() makes the parent stop reading from fd 0, so
Bun.spawn(stdio: "inherit") gives the child exclusive access to
the terminal input via dup2().
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: reset terminal state before interactive session handoff
The stdin handoff from TS orchestration to the interactive SSH session
was leaving the terminal in a dirty state, causing users to need 2+
Enter presses or random keystrokes before input worked.
Three fixes:
1. Unconditionally call setRawMode(false) instead of checking isRaw
first — @clack/core's close() already resets the flag but the
terminal can still be dirty after multiple readline instances
2. Run `stty sane` to fully reset the terminal line discipline,
undoing any damage from readline's emitKeypressEvents
3. Resume stdin instead of pausing it — Bun.spawn with stdio:"inherit"
needs an active stream, a paused stdin causes the child to see
blocked input
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: fix Biome formatting for Bun.spawnSync call
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ahmed Abushagur <ahmed@abushagur.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
* fix: add SIGKILL fallback to process timeout kills
proc.kill() only sends SIGTERM; SSH processes stuck in network I/O can
ignore SIGTERM and cause the CLI to hang forever waiting on proc.exited.
Add killWithTimeout() to shared/ssh.ts that sends SIGTERM then SIGKILL
after a 5s grace period. Replace all 10 proc.kill() timeout sites across
Fly, AWS, DigitalOcean, GCP and Hetzner providers.
Agent: code-health
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* chore: format files with biome
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Same pipe-buffer deadlock pattern fixed by PRs #1903, #1915, #1920, #1922.
Two instances were missed in those passes.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
All other connection fields (ip, user, server_name) are validated
against injection before being passed to shell commands, but server_id
was skipped in both cmdConnect and cmdEnterAgent despite being used as
a daytona ssh argument (line 2922). This inconsistency existed while
execDeleteServer, mergeLastConnection, and the headless code path all
correctly validated server_id.
Adds the missing `if (connection.server_id) { validateServerIdentifier(...) }`
guard in both functions, matching the existing server_name pattern.
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
PR #1920 fixed pipe buffer deadlock in runServerCapture and
waitForCloudInit but missed 6 other locations where Bun.spawn uses
"pipe" for stderr without draining it before await proc.exited.
When a child process writes >64KB to a piped stderr, the OS pipe
buffer fills, the child blocks on write(), and the parent blocks on
exited — classic deadlock.
Fix: change stderr from "pipe" to "inherit" in all 6 locations since
the stderr output is never read programmatically. This also lets
users see installation errors and SCP errors in real time.
Affected functions:
- fly.ts ensureFlyCli()
- sprite.ts ensureSpriteCli()
- gcp.ts ensureGcloudCli()
- hetzner.ts uploadFile()
- digitalocean.ts uploadFile()
- aws.ts uploadFile()
-- refactor/code-health
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The runServerCapture functions in fly.ts and daytona.ts spawn processes
with stdio: ["pipe", "pipe", "pipe"] but only drain stdout. If stderr
output exceeds the 64KB pipe buffer, the child process blocks on write
and deadlocks. This was already fixed in Hetzner, DigitalOcean, AWS,
GCP, and shared/ssh.ts (commit 2e79d71b) but Fly and Daytona were
missed.
Apply the same Promise.all pattern to drain both pipes concurrently.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The standard SSH path in cmdEnterAgent() interpolated remoteCmd into a
single-quoted bash -lc wrapper without escaping embedded single quotes.
If launch_cmd (from history.json) or the manifest's launch/pre_launch
fields contained a single quote, the shell quoting would break, allowing
unintended command execution on the remote server.
The Fly.io path already had this escaping (PR #1880, #1893) but the
generic SSH fallback did not. This adds the same replace(/'/g, "'\\''")
pattern used everywhere else in the codebase.
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
JSON.stringify double-quoting caused two bugs in the restart wrapper:
1. Literal \n instead of newlines (bash doesn't interpret \n in "...")
2. Shell variables ($vars) expanded to empty strings before script ran
Affected clouds: fly, gcp, hetzner, digitalocean, aws.
Daytona already had the correct single-quote escaping.
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Using Node's child_process.spawn() to launch interactive SSH/shell sessions
from inside a Bun process adds unnecessary overhead: an extra process fork,
PTY negotiation indirection, and a forced Bun→Node stdio context switch.
Switch all interactiveSession() functions to Bun.spawn() with
stdio: ["inherit","inherit","inherit"], which hands off file descriptors
directly without forking a Node wrapper process.
Also removes the 500ms hardcoded sleep in orchestrate.ts that was a
band-aid for the old child_process handoff latency. The synchronous
prepareStdinForHandoff() is sufficient on its own.
Affected clouds: hetzner, aws, gcp, digitalocean, fly, daytona, sprite, local
Also fixes runInteractiveCommand() in commands.ts (spawn connect).
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
GCP instance creation was failing with 'Invalid value for field
resource.networkInterfaces[0].subnetwork' when the project VPC uses
custom subnet mode. Add --network and --subnet flags defaulting to
'default', with GCP_NETWORK and GCP_SUBNET env var overrides for
custom VPC setups.
Fixes#1882
Agent: issue-fixer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace JSON.stringify double-quoting with single-quote escaping for the
cmd argument in interactiveSession(). Double-quoted strings in bash allow
$() and ${} expansion, making the previous pattern vulnerable to injection
if cmd ever contained shell metacharacters. Single-quoted strings prevent
ALL shell expansion, matching the defense-in-depth approach Fly already uses.
Fixes#1879
Agent: issue-fixer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
When an agent process dies on a cloud VM (SIGTERM, OOM, crash), it now
automatically restarts after 5 seconds, up to 10 times. Clean exits
(code 0) break out immediately. Local execution is unaffected.
Fixes#1860
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: add swap space before ZeroClaw install to prevent OOM on nano instances
ZeroClaw's Rust compilation gets OOM-killed on nano_3_0 (512 MB) — build
fails at a random dependency each run. Add ensureSwapSpace() that creates
a 1 GB swap file before running the installer:
- Idempotent: skips silently if swap already exists
- Non-fatal: logs a warning if sudo fails (larger instances won't need it)
- Timeout bumped from 5 min to 10 min (swap-backed builds are slower)
- Defense-in-depth: --prefer-prebuilt avoids compilation in the common
case, but fallback source builds still need memory
Fixes#1840
Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: add input validation to ensureSwapSpace() to prevent command injection
Validate sizeMb is a positive integer before interpolating into shell
commands, as requested in security review.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* refactor: split SPA into helpers + main, add build script and tests
Split slack-bot.ts into helpers.ts (pure functions) and main.ts (entry
point) for testability. Add build.ts to bundle SPA into spa.js. Add
spa.test.ts with 19 tests covering stream parsing and text helpers.
Improved streaming: tool_use and tool_result events get their own Slack
messages instead of concatenating everything into one. Prompt is passed
via stdin to avoid CLI flag parsing issues with user content.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: drop build.ts — run main.ts directly via bun
Bun runs TypeScript natively, no bundling step needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: move Result monad to shared, add Claude Code fixtures, use Result in SPA
- Move Result type/Ok/Err from packages/cli/src/shared/result.ts to
packages/shared/src/result.ts and re-export from @openrouter/spawn-shared
- Update CLI imports (ui.ts) to use the shared package
- Add fixtures/claude-code/ with realistic stream-json events covering
all event types (assistant text, tool_use, user tool_result, result)
- Refactor SPA helpers to return Result<T> instead of throwing/returning null:
loadState() → Result<State>, saveState() → Result<void>,
downloadSlackFile() → Result<string>, addMapping() → Result<void>
- Update main.ts call sites to handle Result returns
- Update SPA tests to import events from fixtures and test Result returns
- Bump CLI version 0.10.0 → 0.10.1
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: biome format issues in aws.test.ts, aws.ts, daytona.ts
Expand inline objects/arrays to multi-line format to satisfy biome
formatter rules. No logic changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restructure the repo as a Bun workspace monorepo:
- Move cli/ → packages/cli/
- Create packages/shared/ (@openrouter/spawn-shared) with type-guards and parse utilities
- Add root package.json with workspace configuration
- Update all CLI imports to use @openrouter/spawn-shared
- Deduplicate toRecord/toObjectArray helpers from 4 cloud modules
- Update SPA (slack-bot) to use shared package instead of local toObj()
- Update 48 agent shell scripts for new packages/cli/ path
- Update install.sh, install.ps1, e2e, and test scripts
- Update all GitHub workflows, .gitignore, pre-commit hooks
- Update CLAUDE.md, README.md, and skill prompt references
- Pin all dependency versions (no ^ ranges)
- Bump CLI version 0.9.1 → 0.10.0
All 1908 tests pass. Lint clean. All 8 cloud bundles build.
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>