Commit graph

24 commits

Author SHA1 Message Date
A
e44705d925
fix(ux): reduce SSH wait verbosity and clarify agent handoff (#3056)
- Replace repeated 'SSH port closed (N/36)' with periodic updates every 5 attempts
- Add clear 'Provisioning complete. Connecting...' line before agent attach

Fixes #3053

Agent: ux-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-27 15:22:46 +07:00
A
499eb494c6
fix(security): use StrictHostKeyChecking=accept-new in all SSH connections (#3037)
Replace StrictHostKeyChecking=no with accept-new across all E2E cloud
drivers (aws, gcp, digitalocean, hetzner), the shared SSH_BASE_OPTS
constant, and pull-history.ts. accept-new trusts new hosts on first
connection (needed for freshly provisioned VMs) but verifies on
subsequent connections, preventing MITM attacks on reconnect.

Fixes #3031

Agent: style-reviewer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-26 18:04:40 -07:00
A
52d06c4cb5
fix: resolve ANSI spinner corruption and garbled output (#3001) (#3003)
* fix(ux): replace download spinner with stderr logging, reset terminal before SSH handoff

Fixes two UX issues from live E2E session (#3001):

1. Download spinner (p.spinner from @clack/prompts) wrote ANSI escape codes
   to stdout. When stdout is captured (E2E harness, piped output), these
   sequences appeared as raw text rather than rendered colors. Replace
   p.spinner() in downloadScriptWithFallback and downloadBundle with
   logStep/logInfo/logError from shared/ui.ts, which write to stderr and
   correctly check isTTY before emitting ANSI codes.

2. Garbled output at start of interactive session (overlapping status lines
   from the remote agent's TUI) may be caused by residual ANSI state from
   @clack/prompts (hidden cursor, active color attributes). Emit
   ESC[?25h ESC[0m to stderr before prepareStdinForHandoff() to explicitly
   restore cursor visibility and reset all attributes before the SSH session
   takes over.

Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: resolve ANSI spinner corruption and garbled output in interactive mode (#3001)

Three root causes fixed:

1. Spinner wrote to stdout while all other CLI status output goes to stderr,
   causing ANSI escape sequence interleaving and corruption when both streams
   are merged on a terminal. Redirected all p.spinner() calls to process.stderr.

2. unicode-detect.ts (which sets TERM=linux for SSH sessions to force ASCII
   fallback) was only imported in commands/shared.ts but not in shared/ui.ts.
   Cloud module entry points (hetzner/main.ts, etc.) that import shared/ui.ts
   loaded @clack/prompts without the TERM override, causing Unicode spinner
   frames in environments that can't render them.

3. After an interactive SSH session ends, the remote agent's TUI (e.g. Claude
   Code) may leave the terminal in raw mode with altered attributes. Added
   terminal reset (ANSI attribute reset + stty sane) after spawnInteractive()
   returns to prevent garbled post-session output.

Agent: ux-engineer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 15:28:32 +07:00
Ahmed Abushagur
659fd1c6da
fix: use POSIX normalize for remote Linux paths in validateRemotePath (#2929)
node:path.normalize() is platform-dependent — on Windows it converts
forward slashes to backslashes, which then fail the character allowlist
regex. Remote paths are always Linux paths regardless of the client OS.

Switch to node:path/posix so normalization always uses forward slashes.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 19:34:49 -07:00
A
ffb4cbeb11
fix(security): prevent path traversal in uploadFile/downloadFile across all cloud providers (#2844)
Check for ".." path traversal in the raw input BEFORE normalize() strips
it, fixing CWE-22 where crafted paths like "/tmp/../../etc/passwd"
normalized to "/etc/passwd" and bypassed the post-normalize ".." check.

Extracts a shared validateRemotePath() into shared/ssh.ts and replaces
the duplicated inline validation in all 5 providers (DigitalOcean,
Hetzner, GCP, AWS, Sprite) plus agent-setup.ts.

Fixes #2835

Agent: complexity-hunter

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-20 16:48:58 -07:00
A
1dc9c04eeb
fix: standardize ESM import extensions across 35 production files (#2827)
Add .js extensions to 124 relative imports that were missing them.
The codebase is "type": "module" (ESM) and the dominant pattern already
used .js extensions, but 35 files had a mix of extensionless and .js
imports — sometimes within the same file. Standardize to .js everywhere.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-20 08:51:40 -07:00
A
148cc9e7ee
refactor: extract duplicate waitForSshSnapshotBoot to shared/ssh.ts (#2783)
The waitForSshOnly function was identically duplicated in hetzner.ts and
digitalocean.ts. Extract the shared logic into waitForSshSnapshotBoot() in
shared/ssh.ts and replace the duplicate cloud implementations with thin
wrappers that resolve module-local state before delegating.

-- qa/code-quality

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-18 22:10:25 -07:00
A
035ee3ca63
fix(ssh): always escalate to SIGKILL in killWithTimeout (#2752)
proc.killed is true as soon as kill() is called, not when the process
exits. This meant SIGKILL escalation was always skipped, leaving stuck
processes hanging indefinitely. Remove the faulty guard and always
attempt SIGKILL after the grace period — try/catch handles already-dead
processes.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 05:54:38 +00:00
A
46b1e9d42c
refactor: add no-try-catch + no-try-finally grit rules, eliminate all violations (#2481)
Add two new GritQL biome plugins (matching ori repo patterns) that ban
all try/catch and try/finally in TypeScript code. Convert all remaining
blocks across production and test files to use tryCatch/asyncTryCatch
from @openrouter/spawn-shared.

no-try-catch.grit covers all 4 variants:
- try/catch with binding, try/catch without binding
- try/catch/finally with binding, try/catch/finally without binding

no-try-finally.grit covers bare try/finally.

Both exclude shared/result.ts and shared/parse.ts (the implementation layer).

Production files (18): aws, hetzner, digitalocean, gcp, sprite, index,
update-check, ui, ssh, agent-setup, picker, agent-tarball, shared,
run, connect, delete, list

Test files (12): cmdlast, cmd-interactive, cmdrun-happy-path,
commands-resolve-run, commands-swap-resolve, commands-error-paths,
download-and-failure, preload, ssh-keys, update-check, orchestrate,
fs-sandbox, prompt-file-security, security, script-failure-guidance

Bumps CLI version to 0.16.6

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-10 21:27:25 -07:00
A
a7a2032584
refactor: replace ~50 try/catch blocks with Result helpers across 20 files (#2479)
Convert catch-all, catch-swallow, catch-return-fallback, and catch-classify
patterns to use tryCatch/asyncTryCatch/unwrapOr from @openrouter/spawn-shared.

Files changed: aws.ts, hetzner.ts, digitalocean.ts, gcp.ts, run.ts, delete.ts,
shared.ts, ssh.ts, agent-setup.ts, orchestrate.ts, ui.ts, index.ts,
update-check.ts, update.ts, status.ts, picker.ts, interactive.ts, list.ts,
pick.ts, ssh-keys.ts, billing-guidance.ts, oauth.ts, sprite.ts

Preserved all try/finally-only blocks, security-validation-exit blocks,
billing/classify blocks, spinner cleanup, and top-level handleError blocks.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
2026-03-10 19:26:41 -07:00
Ahmed Abushagur
c77ca106d2
feat: ssh tunnel + browser auto-open for OpenClaw web dashboard (#2452)
OpenClaw runs a web dashboard on port 18791 of the remote VM. This
change SSH-tunnels that port to localhost and auto-opens the browser,
giving users a web UI with zero CLI knowledge needed.

- Add TunnelConfig to AgentConfig interface (agents.ts)
- Add startSshTunnel function with port-finding logic (ssh.ts)
- Capture gateway token in closure so the same token is used for both
  the remote config and the browser URL (agent-setup.ts)
- Wire tunnel into orchestration pipeline between preLaunch and
  interactiveSession (orchestrate.ts)
- Add getConnectionInfo to CloudOrchestrator interface and implement
  in all SSH-based clouds (DO, Hetzner, AWS, GCP)
- Local: opens browser directly at localhost:18791
- Sprite: gracefully skipped (no standard SSH)
- Add USER.md bootstrap to guide OpenClaw users to web dashboard

Closes #2449
Supersedes #2418

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-10 14:25:43 -04:00
Ahmed Abushagur
3c029be108
Revert "feat: wrap cloud VM sessions in tmux for persistence (#2358)" (#2366)
This reverts commit e855790a5d.

Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-09 01:11:57 -04:00
Ahmed Abushagur
e855790a5d
feat: wrap cloud VM sessions in tmux for persistence (#2358)
* feat: wrap cloud VM sessions in tmux for session persistence

- Ctrl+C exits the agent → user lands at a shell prompt (can run CLI commands)
- SSH disconnect → tmux session persists, `spawn last` reattaches
- Install tmux automatically during env setup if not present
- Reconnect flow (`spawn last`, `spawn enter`) also uses tmux attach
- Replaces the restart loop — tmux gives users control over restarts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: auto-tunnel gateway dashboard port over SSH

Forward port 18789 (OpenClaw gateway dashboard) to localhost so users
can access http://localhost:18789 from their browser during SSH sessions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — command injection, port forwarding, tmux install order

1. wrapWithTmux: escape backslashes, $, and backticks in addition to
   double quotes to prevent command injection via tmux send-keys
2. SSH port forwarding: remove unconditional -L 18789 tunnel from
   SSH_INTERACTIVE_OPTS; export SSH_TUNNEL_OPTS for agent-specific use
3. tmux install: try sudo apt-get first (most cloud VMs need it on AWS)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-09 00:22:23 -04:00
A
4f528b1c77
refactor: remove unnecessary exports and fix stale comment (#2338)
- Remove `export` from `verifyOpenrouterKey` in shared/oauth.ts (only used internally)
- Remove `export` from `tcpCheck` in shared/ssh.ts (only used internally)
- Fix stale comment in commands/index.ts referencing non-existent `./commands.js`

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-08 08:51:25 -04:00
A
90ae485c02
fix: add per-process timeout to SSH handshake probes in waitForSsh (#2299)
The Phase 2 SSH handshake loop in waitForSsh spawns SSH processes
without a per-process timeout. ConnectTimeout=10 only covers TCP
connect — if sshd accepts the connection but stalls during key
exchange or authentication, the process hangs indefinitely. This
causes the entire spawn command to freeze with no way to recover.

Add a 30s killWithTimeout guard to each probe, matching the pattern
already used in every cloud-specific runServer/uploadFile function.

-- refactor/code-health

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 18:40:48 -05:00
L
65a81edc57
fix: add unique spawn IDs to prevent history record corruption (#2235)
* fix: add unique spawn IDs to prevent history record corruption

History records were matched by heuristic ("most recent record for this
cloud without a connection"), which caused saveVmConnection and
saveLaunchCmd to overwrite the wrong record during concurrent or failed
spawns.

Fix: every SpawnRecord now has a unique `id` (UUID). All history
operations (saveVmConnection, saveLaunchCmd, removeRecord,
markRecordDeleted, mergeLastConnection) match by id when available,
falling back to the old heuristic for pre-migration records.

The orchestrator (TS path) now creates the history record AFTER server
creation succeeds, not before — so failed provisions don't leave orphan
entries.

Also adds "Remove from history" option to the spawn ls action picker,
restoring the ability to soft-delete entries without destroying the VM.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add 18 unit tests for spawn ID history behavior

Tests cover:
- generateSpawnId returns unique UUIDs
- saveSpawnRecord auto-generates id when not provided
- saveVmConnection matches by spawnId (not heuristic)
- saveVmConnection does not cross-contaminate concurrent spawns
- saveVmConnection falls back to heuristic without spawnId
- saveLaunchCmd matches by spawnId (not heuristic)
- saveLaunchCmd falls back without spawnId
- removeRecord matches by id, not by timestamp+agent+cloud
- removeRecord handles duplicate timestamps correctly
- removeRecord falls back for legacy records without id
- markRecordDeleted targets correct record by id
- mergeLastConnection uses spawn_id from last-connection.json
- mergeLastConnection falls back to heuristic without spawn_id

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: enable biome import sorting with grouped imports

Adds organizeImports to biome assist config with groups:
1. Type imports
2. Node built-ins
3. Third-party packages
4. @openrouter/* packages
5. Aliases

Auto-fixed import order and lint issues across all TypeScript files,
including .claude/skills/ and packages/cli/src/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-05 23:27:03 -08:00
A
701e3af56e
fix: prevent timer leaks and event-loop stalls in SSH timeout handling (#2200)
- Unref the SIGKILL timer in killWithTimeout() so it doesn't keep the
  event loop alive for 5 extra seconds after a timed-out process exits
- Wrap all setTimeout/clearTimeout pairs in try/finally across 6 cloud
  providers (12 call sites) to guarantee cleanup on exceptions
- Add missing 60s timeout guard to runSpriteSilent() which could hang
  indefinitely on unresponsive sprite processes

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-04 19:04:47 -08:00
A
85e0c932c7
fix(ssh): use boolean flag to detect TCP probe success in waitForSsh (#2157)
When the TCP probe succeeds on the final attempt, `attempt` equals
`maxAttempts` after the loop increments it. The previous guard
`attempt >= maxAttempts` then incorrectly threw a timeout error even
though the port was open.

Fix by tracking TCP success with a `tcpOpen` boolean flag and checking
that instead of the attempt counter.

Fixes #2155

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-03 20:47:02 -05:00
Ahmed Abushagur
f7b3bc91c9
fix: show all polling status on single updating line (#2115)
Add logStepInline/logStepDone helpers to ui.ts and convert all 9
polling loops (DO droplet, DO cloud-init, AWS instance, AWS cloud-init,
Hetzner cloud-init, Daytona SSH, Sprite connectivity, GCP startup,
shared SSH port) from multi-line spam to a single line that updates
in place.

Signed-off-by: Ahmed Abushagur <ahmed@abushagur.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-02 18:10:27 -05:00
Ahmed Abushagur
357132b506
fix: eliminate keystroke loss during interactive agent sessions (#1939)
* fix: eliminate keystroke loss during interactive agent sessions

Three root causes were identified and fixed:

1. **Event loop fd competition**: Bun.spawn with stdio:"inherit" shares
   fd 0 between the parent event loop and the child SSH process. The
   kernel arbitrarily splits input bytes between them, causing random
   keystroke drops. Introduced spawnInteractive() using Node's
   child_process.spawnSync to block the event loop entirely.

2. **Unnecessary shell layers**: AWS and GCP wrapped the SSH command in
   an extra `bash -c '...'` layer, creating 3 shell processes before the
   agent. Aligned to match Hetzner/DO which pass directly.

3. **stty sane side effects**: prepareStdinForHandoff() ran `stty sane`
   which enables ixon (XON/XOFF flow control), causing periodic input
   freezes. Removed — setRawMode(false) is sufficient. Also removed
   process.stdin.destroy() which could corrupt fd 0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: biome format + remove stdin unref that broke async spawn

- Fix biome formatting in ssh.ts and commands.ts
- Remove process.stdin.unref() from prepareStdinForHandoff — it
  allowed the event loop to exit before async child_process.spawn
  finished, causing test failures and potential production issues
  with the spawnBash (legacy script execution) path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 00:08:38 -05:00
A
98d5a6612b
fix: add SIGKILL fallback to process timeout kills (#1931)
* fix: add SIGKILL fallback to process timeout kills

proc.kill() only sends SIGTERM; SSH processes stuck in network I/O can
ignore SIGTERM and cause the CLI to hang forever waiting on proc.exited.

Add killWithTimeout() to shared/ssh.ts that sends SIGTERM then SIGKILL
after a 5s grace period. Replace all 10 proc.kill() timeout sites across
Fly, AWS, DigitalOcean, GCP and Hetzner providers.

Agent: code-health
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* chore: format files with biome

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-25 16:50:18 -05:00
A
32fd908101
fix: reduce SSH keystroke latency with EscapeChar=none (#1910)
SSH scans every byte for ~ escape sequences by default, adding
per-keystroke overhead. Disable this for interactive agent sessions
where escape sequences aren't needed. Also add AddressFamily=inet
to skip IPv6 resolution stalls.

Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Ahmed Abushagur <ahmed@abushagur.com>
2026-02-25 00:16:47 -08:00
Ahmed Abushagur
338ae57f71
fix: replace @clack/prompts multiselect with /dev/tty picker for SSH keys (#1907)
* fix: run bun in foreground in DigitalOcean scripts to unfreeze interactive prompts

The _run_with_restart function backgrounded bun with `& + wait` so a SIGTERM
trap could forward the signal. But backgrounding removes bun from the terminal's
foreground process group, which prevents @clack/prompts multiselect from entering
raw mode — arrow keys print as raw escape sequences (^[[A^[[B) and the SSH key
selection prompt freezes.

Fix: run bun in the foreground and detect SIGTERM from exit code 143 (128+15)
instead of using a trap flag + PID tracking. This preserves the restart-on-signal
behavior while giving bun full terminal access for interactive prompts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace @clack/prompts multiselect with /dev/tty picker for SSH keys

When the CLI (parent bun) spawns bash → child bun for cloud scripts,
the parent's event loop keeps fd 0 registered and races with the child's
@clack/prompts for terminal input. This causes the SSH key multiselect
to render but freeze — arrow keys print as raw escape sequences.

Fix: add multiPickToTTY() in picker.ts that opens /dev/tty directly,
bypassing process.stdin entirely. Replace the @clack/prompts multiselect
in ssh-keys.ts with this new function. Also add process.stdin.unref()
to prepareStdinForHandoff() so the parent stops polling fd 0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: disable SSH compression for interactive sessions

Compression=yes adds per-keystroke CPU overhead that causes
noticeable input lag on normal connections. Only beneficial
on slow/high-latency links.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 23:54:54 -08:00
A
65f6f1be32
feat: Bun workspace monorepo — packages/cli + packages/shared (#1853)
Restructure the repo as a Bun workspace monorepo:

- Move cli/ → packages/cli/
- Create packages/shared/ (@openrouter/spawn-shared) with type-guards and parse utilities
- Add root package.json with workspace configuration
- Update all CLI imports to use @openrouter/spawn-shared
- Deduplicate toRecord/toObjectArray helpers from 4 cloud modules
- Update SPA (slack-bot) to use shared package instead of local toObj()
- Update 48 agent shell scripts for new packages/cli/ path
- Update install.sh, install.ps1, e2e, and test scripts
- Update all GitHub workflows, .gitignore, pre-commit hooks
- Update CLAUDE.md, README.md, and skill prompt references
- Pin all dependency versions (no ^ ranges)
- Bump CLI version 0.9.1 → 0.10.0

All 1908 tests pass. Lint clean. All 8 cloud bundles build.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-23 22:07:05 -08:00
Renamed from cli/src/shared/ssh.ts (Browse further)