296 commits

Author SHA1 Message Date
Ahmed Abushagur
b548c5b75a
fix: only pre-select Chrome browser in setup picker (#2512)
#2507 pre-selected all setup options. Only browser should default to
enabled — GitHub CLI and reuse-saved-key are opt-in.
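
A minimal sketch of the default-selection rule. It assumes each option carries a hypothetical `defaultOn` flag; the real `OptionalStep` metadata may be shaped differently. @clack/prompts' `multiselect` accepts an `initialValues` array, which is where the computed list would be passed.

```typescript
// Hypothetical option shape; spawn's real OptionalStep type may differ.
interface SetupOption {
  value: string;
  label: string;
  defaultOn: boolean;
}

// Per #2512, only the browser step starts checked; the others are opt-in.
const SETUP_OPTIONS: SetupOption[] = [
  { value: "browser", label: "Chrome browser", defaultOn: true },
  { value: "github", label: "GitHub CLI", defaultOn: false },
  { value: "reuse-key", label: "Reuse saved API key", defaultOn: false },
];

// Values to hand to a multiselect prompt as its initial selection.
function initialSetupValues(options: SetupOption[]): string[] {
  return options.filter((o) => o.defaultOn).map((o) => o.value);
}

const initial = initialSetupValues(SETUP_OPTIONS);
```

Keeping the default on the option itself means a later change like #2507 versus #2512 becomes a one-field edit rather than a change to the picker logic.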

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 23:05:31 -04:00
Ahmed Abushagur
aa6e7dd1fc
fix: default all setup options to enabled in picker (#2507)
The multiselect picker for setup options (Chrome browser, GitHub CLI,
etc.) started with nothing selected. Now all available options are
pre-selected so users get the full setup by default.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 19:43:03 -04:00
A
65a2efd5ba
fix: gcp use root SSH user instead of whoami (#2503)
The `resolveUsername()` function called `whoami` and validated against a
regex that rejected dots in usernames (e.g. `adrian.hale`), causing
"Invalid username" errors. All other clouds use a static SSH user
(root for Hetzner/DO, ubuntu for AWS).

Switch GCP to use `root` consistently:
- Replace dynamic `whoami` lookup with static `GCP_SSH_USER = "root"`
- Simplify cloud-init startup script (already runs as root)
- Fix bun symlink path to use /root instead of /home/${username}
- Remove unused `username` field from GcpState

Closes #2502

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-11 13:48:49 -07:00
A
150d094ef2
fix: fallback to manual project entry when gcloud projects list fails (#2500)
* fix: fallback to manual project entry when gcloud projects list fails

When the user declines the suggested default GCP project and
`gcloud projects list` fails (e.g. lacking resourcemanager.projects.list
permission), prompt for a manual project ID instead of hard-failing.

Also fix selectFromList() to return "" on cancel (Ctrl+C/Escape) rather
than defaultValue, so canceling a project picker is treated as "no
selection" rather than silently re-using the first project.

Fixes #2499

Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: add GCP project ID format validation for manual entry

Validates user-entered GCP project IDs against the required format
(^[a-z][a-z0-9-]{4,28}[a-z0-9]$) before accepting them. Invalid
entries are rejected with a helpful message and the user is re-prompted.
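
The validation-and-reprompt loop can be sketched as follows. The regex is the one quoted above; `promptForProjectId` and its `ask` callback are hypothetical stand-ins for the real prompt (which is async), kept synchronous here so the loop is easy to follow.

```typescript
// Pattern quoted in #2500: a lowercase letter, then 4-28 lowercase letters,
// digits, or hyphens, then a final letter or digit (6-30 characters total).
const GCP_PROJECT_ID_RE = /^[a-z][a-z0-9-]{4,28}[a-z0-9]$/;

function isValidGcpProjectId(id: string): boolean {
  return GCP_PROJECT_ID_RE.test(id);
}

// Re-prompt until the answer is valid. `ask` returns null when the user
// cancels (Ctrl+C/Escape), which ends the loop without a selection.
function promptForProjectId(ask: () => string | null): string | null {
  while (true) {
    const answer = ask();
    if (answer === null) return null;
    const id = answer.trim();
    if (isValidGcpProjectId(id)) return id;
    console.error(
      `Invalid project ID "${id}": use 6-30 lowercase letters, digits, or hyphens, starting with a letter.`,
    );
  }
}
```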

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-11 15:47:53 -04:00
A
479cbbc009
fix: pass --skip-setup to hermes installer for headless installs (#2496)
The Hermes Agent installer's setup wizard tries to read from /dev/tty,
which fails in headless/non-interactive cloud VM environments. The
installer supports --skip-setup to bypass the wizard; pass it via
bash -s -- --skip-setup.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-11 09:33:27 -04:00
A
5031d84e6c
refactor: eliminate process-global mock.module() pollution in tests (#2490)
Replace mock.module() calls with dependency injection to prevent
cross-file test pollution in Bun's shared worker process. Changes:

- orchestrate.ts: add getApiKey to OrchestrationOptions
- billing-guidance.ts: add injectable BillingGuidanceDeps parameter
- delete.ts: add optional deleteHandler parameter to confirmAndDelete
- update.ts: add UpdateOptions with injectable runUpdate function
- sprite.ts: add optional spawnFn parameter to interactiveSession
- Remove unnecessary oauth mocks from junie-agent and do-snapshot tests

Only @clack/prompts mock (shared via test-helpers.ts) and
do-payment-warning.test.ts (safe spread pattern) remain.
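
A simplified sketch of the injection pattern, using `confirmAndDelete` and a `deleteHandler` parameter as named above. The `SpawnRecord` shape, the default handler, and the return strings are assumptions, and the sketch is synchronous where the real destroy calls are async.

```typescript
// Hypothetical minimal record; the real history record has more fields.
interface SpawnRecord {
  cloud: string;
  serverId: string;
}

// The injectable dependency: anything that can destroy the server for a record.
type DeleteHandler = (record: SpawnRecord) => boolean;

// Production default (hypothetical): would dispatch to the real cloud module.
function defaultDeleteHandler(record: SpawnRecord): boolean {
  throw new Error(`no cloud module wired up for ${record.cloud}`);
}

// Tests pass their own handler instead of calling mock.module(), so nothing
// leaks into other test files that share Bun's worker process.
function confirmAndDelete(
  record: SpawnRecord,
  deleteHandler: DeleteHandler = defaultDeleteHandler,
): string {
  const ok = deleteHandler(record);
  return ok ? `Deleted ${record.serverId}` : `Failed to delete ${record.serverId}`;
}

// Usage in a test: a local fake that records its calls.
const calls: string[] = [];
const result = confirmAndDelete({ cloud: "hetzner", serverId: "123" }, (r) => {
  calls.push(r.serverId);
  return true;
});
```

Because the fake lives in the test's own scope, it disappears with that test, unlike a process-global module mock.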

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-10 23:57:57 -07:00
A
6439cba58c
fix: remove spinner from delete to prevent output overlap (#2487)
* fix: remove spinner from delete command to prevent output overlap

The delete spinner in confirmAndDelete collided with cloud-specific
destroy functions that print their own progress (logStep/logInfo).
This caused the "Instance destroyed" message to overwrite the spinner
line without a newline, producing garbled output.

Remove the spinner and let the cloud destroy functions handle progress
output directly, then show a clean success/failure message after.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: redirect cloud destroy output into delete spinner

Cloud destroy functions (logStep/logInfo) write progress to stderr,
which collided with the @clack spinner on the terminal. Now stderr
writes during the delete are intercepted and fed into s.message()
so the spinner text updates in place instead of garbling the output.
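
A sketch of the interception with restore-in-finally. It targets a minimal hypothetical `Writer` interface so it can run without a terminal; in the command itself the target would be `process.stderr` and `onMessage` would forward to the spinner's `message()`. It is synchronous for brevity; an async version would `await` inside the same `try`.

```typescript
// Minimal stand-in for a writable stream (hypothetical interface).
interface Writer {
  write: (chunk: string) => boolean;
}

// Temporarily reroute writes on `target` into `onMessage`, restoring the
// original write function afterwards even if `fn` throws.
function withRedirectedWrites<T>(
  target: Writer,
  onMessage: (line: string) => void,
  fn: () => T,
): T {
  const original = target.write;
  target.write = (chunk: string): boolean => {
    const line = chunk.trim();
    if (line.length > 0) onMessage(line); // e.g. spinner.message(line)
    return true;
  };
  try {
    return fn();
  } finally {
    target.write = original;
  }
}
```

The `finally` block is what guarantees the "always restores" behaviour the tests in this PR check for.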

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add delete spinner behavior tests

Verify that confirmAndDelete:
- Feeds stderr output from cloud destroy functions into spinner.message()
- Calls spinner.clear() (not stop) so no spinner chrome remains
- Shows p.log.success with the last stderr message as detail
- Shows p.log.error on failure
- Always restores process.stderr.write, even on error
- Works when destroy produces no stderr output

Also adds spinnerClear to the shared test-helpers mock.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove global cloud module mocks that polluted other tests

Only mock hetzner (the cloud used by test records). Other cloud modules
are left un-mocked since they're never called for hetzner records. This
fixes the DO payment warning test failures caused by mock.module being
process-global in Bun.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-10 23:35:12 -07:00
A
0e265d65d7
fix: use parseJsonObj instead of JSON.parse to prevent SyntaxError crashes on corrupted config (#2486)
Five call sites wrapped JSON.parse inside tryCatchIf(isFileError), causing
SyntaxError (from corrupted JSON) to escape uncaught since SyntaxError has no
.code property. Replace with parseJsonObj() which catches SyntaxError internally
and returns null, restoring graceful recovery.

Affected: loadApiToken(), loadSavedOpenRouterKey(), readCache(),
tryLoadLocalManifest(), hasCloudConfigCredentials()
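
A sketch of a parser that turns corrupted JSON into `null` instead of an exception. This is not the shared `parseJsonObj()` itself; in particular, rejecting arrays and primitives is an assumption about what "Obj" means. The point is that the `SyntaxError` is handled at the parse site, because a `code`-based guard such as `isFileError` never matches it.

```typescript
// Returns a plain object, or null when the text is not valid JSON
// or does not describe an object.
function parseJsonObj(text: string): Record<string, unknown> | null {
  let value: unknown;
  try {
    value = JSON.parse(text);
  } catch (err) {
    if (err instanceof SyntaxError) return null; // corrupted file contents
    throw err;
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return null;
  }
  return value as Record<string, unknown>;
}
```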

Fixes #2485

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-11 01:27:07 -04:00
A
9a1dad7fcb
feat: gate tarball install behind --beta=tarball flag (#2482)
* feat: gate tarball install behind --beta=tarball flag

Tarball install is not yet reliable enough to be the default.
Move it behind an opt-in --beta=tarball flag so users can test it
explicitly while live install remains the default path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: support multiple --beta flags (repeatable)

Parse all --beta flags from args in a loop, collecting them into a
comma-separated SPAWN_BETA env var. Consumers check for their feature
with Set.has() so multiple beta features can be active simultaneously.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: replace for(;;) loop with extractAllFlagValues helper

Cleaner approach: a dedicated helper mutates args in place and returns
all values for a repeatable flag, replacing the infinite loop pattern.
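
A possible shape for `extractAllFlagValues`. Accepting both `--beta=value` and `--beta value` is an assumption, as is the second feature name in the usage lines; only the comma-joined `SPAWN_BETA` value and the `Set.has()` check come from the commit message.

```typescript
// Removes every occurrence of `flag` from `args` in place and returns
// the collected values in order.
function extractAllFlagValues(args: string[], flag: string): string[] {
  const values: string[] = [];
  let i = 0;
  while (i < args.length) {
    const arg = args[i] ?? "";
    const next = args[i + 1];
    if (arg.startsWith(`${flag}=`)) {
      values.push(arg.slice(flag.length + 1));
      args.splice(i, 1); // drop "--flag=value"; do not advance i
    } else if (arg === flag && next !== undefined) {
      values.push(next);
      args.splice(i, 2); // drop "--flag" and its value
    } else {
      i++;
    }
  }
  return values;
}

const argv = ["claude", "--beta=tarball", "gcp", "--beta", "other"];
const betas = extractAllFlagValues(argv, "--beta");
const spawnBeta = betas.join(","); // the value that would be exported as SPAWN_BETA
const enabledBetas = new Set(spawnBeta.split(",").filter((s) => s.length > 0));
```

A consumer then checks `enabledBetas.has("tarball")`, so any number of beta features can be active at once.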

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-10 21:24:51 -07:00
A
014d591e68
refactor: convert remaining 5 try/catch blocks to Result helpers (#2480)
Convert the last convertible catch blocks:
- digitalocean.ts: SSH key registration fallback
- sprite.ts: keep-alive soft-dependency install
- agent-tarball.ts: tarball metadata fetch fallback
- list.ts: enter/reconnect connection error recovery (2 blocks)

The remaining ~43 try blocks are all try/finally cleanup (21),
security/billing validation (10), or top-level handlers — none
are candidates for Result helper conversion.

Bumps CLI to 0.16.5.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
2026-03-10 23:01:10 -04:00
A
a7a2032584
refactor: replace ~50 try/catch blocks with Result helpers across 20 files (#2479)
Convert catch-all, catch-swallow, catch-return-fallback, and catch-classify
patterns to use tryCatch/asyncTryCatch/unwrapOr from @openrouter/spawn-shared.

Files changed: aws.ts, hetzner.ts, digitalocean.ts, gcp.ts, run.ts, delete.ts,
shared.ts, ssh.ts, agent-setup.ts, orchestrate.ts, ui.ts, index.ts,
update-check.ts, update.ts, status.ts, picker.ts, interactive.ts, list.ts,
pick.ts, ssh-keys.ts, billing-guidance.ts, oauth.ts, sprite.ts

Preserved all try/finally-only blocks, security-validation-exit blocks,
billing/classify blocks, spinner cleanup, and top-level handleError blocks.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
2026-03-10 19:26:41 -07:00
A
5289a87043
fix: use asyncTryCatch for tarball install + add chown ownership fix (#2478)
Replace try/catch in agent-tarball.ts with asyncTryCatch Result helpers:
- Phase 3 (download/extract): asyncTryCatch → returns false on any failure
- Phase 4 (mirror): asyncTryCatch → non-fatal, logs warning on failure

Add chown ownership fix for non-root SSH users (GCP, AWS Lightsail):
files extracted as root need ownership corrected after mirroring.

Add 5 anti-regression tests for non-root home directory mirroring.

Supersedes #2466.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-10 19:04:20 -07:00
A
3fd17e3d1d
refactor: replace indiscriminate try/catch with guarded Result helpers (#2477)
Add tryCatchIf/asyncTryCatchIf with error predicates (isFileError,
isNetworkError, isOperationalError) so operational errors are handled
explicitly while programming bugs (TypeError, ReferenceError) propagate
and crash visibly instead of being silently swallowed.

Transforms ~40 try/catch blocks across 14 files:
- File I/O (manifest cache, config loading, history) → tryCatchIf(isFileError)
- Network/fetch (API calls, version checks, OAuth) → asyncTryCatchIf(isNetworkError)
- SSH/subprocess (agent setup, tunnel) → asyncTryCatchIf(isOperationalError)
- API retry loops (DO, Hetzner) → guard retries with isNetworkError

Intentionally keeps ~85 try/catch blocks as-is (cleanup/finally, retry
loops, user-facing error handlers, catch-classify-rethrow patterns).
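
The guarded-catch idea, sketched with a hypothetical `Result` shape and an `isFileError` that checks for a string `code` property (as Node's filesystem errors such as `ENOENT` carry). The real helpers in @openrouter/spawn-shared may have different signatures.

```typescript
// Hypothetical Result shape.
type Result<T> = { ok: true; data: T } | { ok: false; error: unknown };

// Catch only errors the guard recognises as operational; anything else
// (TypeError, ReferenceError, SyntaxError, ...) is rethrown so bugs stay visible.
function tryCatchIf<T>(guard: (err: unknown) => boolean, fn: () => T): Result<T> {
  try {
    return { ok: true, data: fn() };
  } catch (err) {
    if (guard(err)) return { ok: false, error: err };
    throw err;
  }
}

// Example guard: filesystem errors carry a string `code` such as "ENOENT".
function isFileError(err: unknown): boolean {
  return err instanceof Error && typeof (err as { code?: unknown }).code === "string";
}
```

Note that a `SyntaxError` from `JSON.parse` has no `code`, so it escapes this guard; that is exactly the gap #2486 later closed.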

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-10 18:55:07 -07:00
A
b3938144b7
fix: validate model ID before shell interpolation (fixes #2460) (#2472)
Add validateModelId() to reject model IDs containing shell metacharacters.
The validation is applied in orchestrate.ts immediately after resolving
MODEL_ID from env/agent defaults, before the value reaches any agent
configure function or runServer call. Invalid model IDs are dropped to
undefined with a warning.
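
One way to write such a check is an allowlist rather than a blocklist of metacharacters. The character set below (letters, digits, `/`, `.`, `-`, `_`, `:`) is an assumption chosen to admit OpenRouter-style IDs like `anthropic/claude-sonnet-4.5`; the actual `validateModelId()` rule is not shown here, and `sanitizeModelId` is a hypothetical wrapper for the drop-to-undefined step.

```typescript
// Must start with an alphanumeric; afterwards only the listed separators.
// Shell metacharacters (; $ ` | & quotes) and whitespace never match.
const MODEL_ID_RE = /^[A-Za-z0-9][A-Za-z0-9._:\/-]*$/;

function validateModelId(id: string): boolean {
  return MODEL_ID_RE.test(id);
}

// Invalid IDs become undefined with a warning, as described above.
function sanitizeModelId(id: string | undefined): string | undefined {
  if (id === undefined) return undefined;
  if (validateModelId(id)) return id;
  console.warn(`Ignoring invalid model ID: ${JSON.stringify(id)}`);
  return undefined;
}
```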

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-10 20:31:32 -04:00
Ahmed Abushagur
d82dea811d
feat: unified arrow-key selection + setup checkboxes (#2459)
* feat: unified arrow-key selection + setup checkboxes

Replace p.autocomplete (type-ahead) with p.select (arrow-key navigation)
for agent and cloud selection. Add p.multiselect checkboxes for optional
post-provision setup steps (GitHub CLI, Chrome browser), all ON by default.

Three fast prompts: agent → cloud → setup options. Defaults: OpenClaw,
first cloud with credentials, all steps enabled.

Key changes:
- interactive.ts: p.autocomplete → p.select with initialValue defaults
- interactive.ts: promptSetupOptions() with p.multiselect, exported for reuse
- run.ts: wire setup options into cmdRun direct path
- agents.ts: OptionalStep type, getAgentOptionalSteps() static metadata
- orchestrate.ts: read SPAWN_ENABLED_STEPS env var, gate GitHub auth + configure
- agent-setup.ts: gate Chrome install with enabledSteps in setupOpenclawConfig
- Version bump 0.15.40 → 0.16.0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: mirror tarball files to $HOME for non-root SSH users (GCP, AWS)

Tarballs are built with absolute /root/ paths, but GCP and AWS Lightsail
SSH as a regular user whose $HOME is /home/<user>/. After extraction,
binaries like `claude` end up at /root/.claude/local/bin/ but the
launchCmd looks in $HOME/.claude/local/bin/ — causing "command not found".

Add a post-extraction step that copies /root/ dotfiles to $HOME/ when
the SSH user isn't root. This fixes `spawn claude gcp` failing with
exit code 127 after tarball install.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: A <258483684+la14-1@users.noreply.github.com>
2026-03-10 14:19:08 -07:00
A
5db9cc2a80
fix: show history table directly when no active servers found in spawn list (#2451)
Instead of telling users to pipe through `spawn list | cat` to view their
spawn history, render the history table inline when no active connections
exist. The | cat workaround was needed because non-interactive mode skips
the picker; now interactive mode falls through to renderListTable directly,
consistent with what `spawn list | cat` was already doing.

Agent: ux-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-10 15:21:00 -04:00
A
a46a92a8a4
fix: add missing PATH entries in Hetzner and DigitalOcean runServer/interactiveSession (#2450)
AWS and GCP both include $HOME/.npm-global/bin and $HOME/.claude/local/bin in the
PATH exported before running remote commands. Hetzner and DO were missing these two
entries, causing "command not found" errors for Claude Code and npm-global packages
on those clouds.
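
A sketch of keeping those entries in one place so every cloud module prepends the same ones. `EXTRA_PATH_DIRS` and `withAgentPath` are hypothetical names, only the two directories named above are included, and `remoteCmd` is assumed to be already safe to run in the remote shell.

```typescript
// Directories the remote shell must search for agent binaries.
// "$HOME" and "$PATH" are left unexpanded for the remote shell to resolve.
const EXTRA_PATH_DIRS = ["$HOME/.npm-global/bin", "$HOME/.claude/local/bin"];

function withAgentPath(remoteCmd: string): string {
  // Double quotes (not single) so the remote shell expands $HOME and $PATH.
  return `export PATH="${EXTRA_PATH_DIRS.join(":")}:$PATH"; ${remoteCmd}`;
}
```

Sharing one helper makes the kind of drift fixed here (two clouds missing entries the others had) harder to reintroduce.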

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-10 14:24:16 -04:00
A
0380ad33f9
refactor: remove dead exports only used within their own files (#2431)
- withSpinner in commands/shared.ts
- ENTITY_DEFS in commands/shared.ts
- isValidManifest in manifest.ts
- waitForInstance in aws/aws.ts
- SignalEntry, ExitCodeEntry in guidance-data.ts

Bump version: 0.15.37 -> 0.15.38

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-10 08:51:15 -04:00
A
15e4715555
fix: validate server ID in status.ts before API calls (#2430)
status.ts passed server_id from history directly into Hetzner/DO API
URLs without calling validateServerIdentifier(). Both delete.ts and
connect.ts validate first; status.ts was the only gap. A tampered
~/.spawn/history.json could craft a server_id with path traversal
characters (e.g. "../v2/account") causing the Bearer token to be
sent to an unintended API endpoint (SSRF via URL path manipulation).

Fix: call validateServerIdentifier() after extracting serverId,
returning "unknown" gracefully on failure.
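
A sketch of validate-before-fetch. The allowlist pattern is an assumption (the real `validateServerIdentifier()` may differ), `checkHetznerStatus` is hypothetical, and `fetchStatus` is injected so the example runs without network access. `encodeURIComponent` adds a second layer in case the pattern is ever loosened.

```typescript
// Hypothetical allowlist: no dots or slashes, so "../" can never appear.
const SERVER_ID_RE = /^[A-Za-z0-9][A-Za-z0-9_-]{0,127}$/;

function validateServerIdentifier(id: string): boolean {
  return SERVER_ID_RE.test(id);
}

type ServerStatus = "running" | "stopped" | "unknown";

// Refuse to build an authenticated API URL from an unvalidated ID.
function checkHetznerStatus(
  serverId: string,
  fetchStatus: (url: string) => ServerStatus,
): ServerStatus {
  if (!validateServerIdentifier(serverId)) return "unknown";
  const url = `https://api.hetzner.cloud/v1/servers/${encodeURIComponent(serverId)}`;
  return fetchStatus(url);
}
```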

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-10 07:17:07 -04:00
A
72ccb098ab
feat: integrate Sprite keep-alive tasks for all Sprite agents (#2428)
Adds sprite-keep-running support so sprites stay alive during long
agent sessions instead of shutting down due to inactivity.

- Add installSpriteKeepAlive() to sprite/sprite.ts: downloads and
  installs the sprite-keep-running script (~/.local/bin) on the sprite
  during setup. Non-fatal: logs a warning if download fails so
  deployment still proceeds.

- Modify interactiveSession() to wrap the session command in a temp
  script (base64-encoded to handle multi-line restart loops) and exec
  it via sprite-keep-running if available, with plain bash fallback.

- Call installSpriteKeepAlive() in sprite/main.ts createServer() step
  after setupShellEnvironment(), applying to all Sprite agents.

- Add sprite-keep-alive.test.ts: 11 unit tests covering download URL,
  install path, error resilience, session script structure, and
  keep-alive wrapper inclusion.

Fixes #2424

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-10 02:24:18 -07:00
A
de76599b39
refactor: centralize path resolution into shared/paths.ts (#2422)
Move all filesystem path helpers (getUserHome, getSpawnDir, getHistoryPath,
getSpawnCloudConfigPath, getCacheDir, getCacheFile, getUpdateFailedPath,
getSshDir, getTmpDir) into a single shared/paths.ts module. This eliminates
scattered homedir()/process.env.HOME patterns across 8+ files and provides
a single import source for all path resolution.

- Create packages/cli/src/shared/paths.ts with 9 exported functions
- Update 17 source files to import from paths.ts
- Add re-exports in ui.ts and history.ts for backward compatibility
- Remove direct homedir() imports from gcp, sprite, local, ssh-keys, etc.
- Add comprehensive unit tests in paths.test.ts
- Bump CLI version to 0.15.34

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-10 00:48:03 -07:00
A
486aba49f6
fix: use process.env.HOME instead of os.homedir() for test sandboxing (#2417)
Bun's os.homedir() reads from getpwuid() and ignores runtime changes to
process.env.HOME. Named imports capture the native function binding, so
patching os.homedir on the default export doesn't propagate. This caused
all test files using homedir() to write .spawn-test-* dirs to the real
home directory instead of the preload sandbox.

Add getUserHome() helper to shared/ui.ts that prefers process.env.HOME,
replace all direct homedir() calls in production and test code.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-10 00:20:19 -07:00
A
f272294902
refactor: Deduplicate getServerName and promptSpawnName across cloud modules (#2415)
Consolidates duplicate server naming logic from 5 cloud modules into shared utilities in src/shared/ui.ts. No behavioral changes - purely structural refactor.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-10 05:26:25 +00:00
L
2da9e6cd46
refactor: restore @openrouter/spawn-shared workspace package (#2405)
* refactor: restore @openrouter/spawn-shared workspace package

Restore packages/shared/ as canonical location for parse.ts, result.ts,
and type-guards.ts. CLI shared files become thin re-exports, preserving
all existing import paths. SPA imports switch from fragile relative paths
to the workspace package.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: sort exports in shared package barrel to satisfy biome

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: sort SPA imports to satisfy biome organizeImports

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-09 17:14:26 -07:00
Ahmed Abushagur
06796ec95c
fix: isolate orchestrate tests from user's ~/.spawn history (#2398)
The orchestrate test suite called runOrchestration (which internally
calls saveSpawnRecord) without setting SPAWN_HOME to a temp directory.
Every test run wrote ~20 fake records into the user's real history,
eventually filling it with 100 connectionless "testagent" entries
and wiping all real spawn history.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 18:46:19 -04:00
L
e182806eee
fix: graceful recovery from corrupted history.json (#2391)
* fix: graceful recovery from corrupted history.json

- Atomic writes (write to .tmp, rename into place) to prevent corruption
- Backup corrupted files with .corrupt suffix before discarding
- Per-record salvaging: if some v1 records are malformed, keep the valid ones
- Archive recovery: when history.json is corrupted, try loading from archives
- Stderr warnings when corruption is detected or records are recovered
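
Per-record salvaging can be sketched as a type guard plus a filter. `HistoryRecord` here is a hypothetical three-field subset of the real v1 schema; the `dropped` count is what a caller would report in the stderr warning.

```typescript
// Hypothetical minimal record shape.
interface HistoryRecord {
  agent: string;
  cloud: string;
  timestamp: string;
}

function isHistoryRecord(v: unknown): v is HistoryRecord {
  if (typeof v !== "object" || v === null) return false;
  const r = v as Record<string, unknown>;
  return (
    typeof r["agent"] === "string" &&
    typeof r["cloud"] === "string" &&
    typeof r["timestamp"] === "string"
  );
}

// Keep the well-formed entries and count the rest. A non-array input yields
// nothing here; the caller would then try the archives instead.
function salvageRecords(raw: unknown): { records: HistoryRecord[]; dropped: number } {
  if (!Array.isArray(raw)) return { records: [], dropped: 0 };
  const records = raw.filter(isHistoryRecord);
  return { records, dropped: raw.length - records.length };
}
```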

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: replace try/catch with Result tryCatch wrapper in history.ts

Add tryCatch() to shared/result.ts and use it throughout history.ts to
eliminate all 7 try/catch blocks. Errors are now handled via Result
pattern matching instead of exception control flow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: A <258483684+la14-1@users.noreply.github.com>
2026-03-09 14:50:29 -07:00
L
d9a25a4720
fix: ESC/Ctrl-C in picker falls back to numbered list instead of cancelling (#2390)
The TTY key loop treated explicit user cancellation (ESC/Ctrl-C) the same
as a TTY failure — both called fallback() which renders a numbered-list
picker. Now the key loop distinguishes between the two: cancel() exits
cleanly, fallback() is only used when /dev/tty is unavailable.
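
The cancel-versus-fallback split can be sketched as a key classifier. The byte values are standard raw-mode terminal input (ESC is `\x1b`, Ctrl-C is `\x03`, arrow keys arrive as `ESC [ A` / `ESC [ B`); the `KeyAction` names are hypothetical. Telling a lone ESC apart from an arrow-key prefix assumes each key sequence arrives in a single read, which is typical but not guaranteed.

```typescript
type KeyAction = "up" | "down" | "select" | "cancel" | "ignore";

// Map raw key bytes read from /dev/tty to picker actions.
// ESC and Ctrl-C are explicit cancellation, never a reason to fall back.
function classifyKey(data: string): KeyAction {
  if (data === "\x1b" || data === "\x03") return "cancel";
  if (data === "\x1b[A") return "up";
  if (data === "\x1b[B") return "down";
  if (data === "\r" || data === "\n") return "select";
  return "ignore";
}
```

With cancellation handled as a key action, the numbered-list fallback only needs to run when opening /dev/tty itself fails.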

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-09 14:28:02 -07:00
Ahmed Abushagur
e38f4483d6
fix: align cloud defaults with manifest (DO size, Hetzner location) (#2387)
DO default was s-2vcpu-4gb which isn't available in nyc3, causing 422
errors. Changed to s-2vcpu-2gb to match manifest.json. Also aligned
Hetzner default location from nbg1 to fsn1 to match manifest.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 18:23:22 +00:00
Ahmed Abushagur
7bab1c3289
fix: set browser.defaultProfile to openclaw for managed browser mode (#2384)
On headless VMs there's no Chrome extension to attach to. Setting
defaultProfile to "openclaw" tells OpenClaw to launch and manage
the browser itself via CDP instead of waiting for an extension relay.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 13:23:23 -04:00
A
2074211d13
fix: wire maxAttempts parameter in waitForCloudInit for hetzner and digitalocean (#2380)
The `_maxAttempts` parameter in both Hetzner and DigitalOcean's
`waitForCloudInit()` was silently ignored — loop bounds and early-exit
checks were hardcoded. Rename to `maxAttempts` and use it consistently,
matching the AWS/GCP implementations.

Fixes #2378

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-09 09:35:43 -04:00
Ahmed Abushagur
4004b51f6d
fix: use curl for Chrome download + capture google-chrome-stable in tarball (#2370)
- wget not available on many cloud VMs, use curl instead
- Remove 2>/dev/null from dpkg/apt so install errors are visible
- Capture /usr/bin/google-chrome-stable in tarball (actual .deb binary name)
- Use curl in packer/agents.json tarball build too

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-08 23:59:32 -07:00
Ahmed Abushagur
7e2f9f45fc
fix: use Google Chrome .deb for OpenClaw browser tool (#2368)
* fix: use Google Chrome .deb instead of Playwright for OpenClaw browser

Snap Chromium on Ubuntu 24.04 fails because AppArmor confinement blocks
CDP control. OpenClaw's own docs recommend installing Google Chrome via
.deb package which bypasses snap entirely.

Also adds browser.noSandbox and browser.executablePath to the OpenClaw
config so the browser tool works out of the box on Linux VMs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unnecessary confirmation prompt when OAuth fails

If OAuth didn't complete, the user obviously wants to paste a key.
The "Paste your API key manually? (Y/n)" prompt was pointless friction.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unnecessary "Continue anyway?" credential confirmation

If the user selected a cloud, they obviously want to continue.
The warning + setup guidance is sufficient — no need to block on a confirm.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: move Chrome install to configure step so it runs after tarball

The tarball path skips agent.install() entirely, so Chrome never got
installed. Moving it to configure() (setupOpenclawConfig) ensures it
always runs regardless of install method.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: bundle Google Chrome in openclaw tarball

Add Chrome .deb install to openclaw's tarball build so it ships
pre-installed. Capture /usr/bin/google-chrome and /opt/google/chrome/
in the tarball. Add dl.google.com to the workflow domain allowlist.

The configure() step still has a fallback install with idempotency
check (command -v google-chrome) for non-tarball installs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use openclaw config set for browser setup + correct binary name

- Use `google-chrome-stable` (actual .deb binary name) not `google-chrome`
- Set browser config via `openclaw config set` CLI (the supported way)
  instead of writing JSON directly which wasn't being picked up
- Remove browser section from JSON config to avoid conflicts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 01:52:08 -04:00
Ahmed Abushagur
57a7a9e033
feat: install Playwright Chromium for OpenClaw browser tool (#2362)
Ubuntu 24.04 replaced chromium-browser with a snap redirect that fails
on cloud VMs without snapd. Playwright's bundled Chromium is
self-contained (~170MB), works headless, and has no snap dependency.

Installed as a non-fatal post-install step — if it fails, the agent
still works but without browser capabilities.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 00:20:33 -04:00
A
3d7ad51f6d
fix: GCP billing retry fails because temp startup script is already deleted (#2361)
The startup script temp file was cleaned up immediately after the first
gcloud call, but the billing retry path re-used the same args array
referencing that file. This meant billing retries always failed with a
file-not-found error. Move cleanup to a try/finally block that runs
after all retry paths. Also add randomness and mode 0o600 to the temp
file path.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-08 23:07:57 -04:00
A
62e1df9be5
refactor: deduplicate PkgVersionSchema to shared/parse.ts (#2357)
Move the PkgVersionSchema (v.object({ version: v.string() })) from its
duplicate definitions in commands/shared.ts and update-check.ts into the
shared parse module. Both consumers now import from the single source.

Bump CLI version to 0.15.22.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-08 21:45:51 -04:00
A
4396703615
refactor: use shared getErrorMessage() and deduplicate OAuth CSS (#2348)
Replace 4 inline `err instanceof Error ? err.message : String(err)`
patterns in aws.ts, digitalocean.ts, and hetzner.ts with the shared
getErrorMessage() helper. The shared helper uses duck-typing which is
more robust across realms/prototypes than instanceof checks.
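
A minimal duck-typed version of such a helper, for illustration; the shared implementation may differ in details. Checking for a string `message` property means an error created in another realm, where `instanceof Error` is false, still yields its message.

```typescript
function getErrorMessage(err: unknown): string {
  if (typeof err === "object" && err !== null && "message" in err) {
    const msg = (err as { message: unknown }).message;
    if (typeof msg === "string") return msg;
  }
  return String(err);
}
```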

Export OAUTH_CSS from shared/oauth.ts and import it in
digitalocean/digitalocean.ts instead of duplicating the 250+ char
CSS string.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-08 13:42:08 -04:00
A
8ac2ae366f
refactor: remove unused hasMessage type guard (#2346)
hasMessage was exported from shared/type-guards.ts but never imported
outside of its own test file. getErrorMessage already covers the
message-extraction use case. Remove the dead function and its tests.

-- qa/code-quality

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-08 12:51:18 -04:00
A
36582b3b95
refactor: deduplicate getErrorMessage into shared/type-guards.ts (#2343)
Moves getErrorMessage to zero-dep shared module, eliminating 13 inline
copies and 2 hasMessage variant sites across the codebase.

Fixes #2341

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-08 07:45:11 -07:00
A
48af1c3459
fix: resolve undefined variable refs in Hetzner billing retry path (#2340)
PR #2335 fixed this bug in digitalocean.ts, gcp.ts, and aws.ts but
missed hetzner.ts. The billing retry block assigned serverId/serverIp
to undefined local variables (hetznerServerId, hetznerServerIp) instead
of _state.serverId / _state.serverIp, so the retry always threw
"Server creation failed" even when the API call succeeded. This also
adds the missing saveVmConnection() call in the retry success path so
the VM is recorded in spawn history.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-08 09:48:54 -04:00
L
a77f70adfc
fix: update cloud picker prompt to 'Pick your cloud' (#2334)
* fix: update cloud picker prompt to "Pick your cloud"

The previous "Where should your agent run?" was vague. Simplify to
"Pick your cloud (type to filter)" for clarity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use "Select a cloud" for cloud picker prompt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-08 05:04:28 -07:00
Ahmed Abushagur
bc0c1827bb
fix: reorder auth flow and persist OpenRouter API key (#2320)
* fix: reorder auth flow and persist OpenRouter API key across retries

Two onboarding issues reported by users:

1. After DigitalOcean OAuth, the message said "OpenRouter authentication
   in 5s..." but then a GitHub CLI prompt appeared first. Fix: move API
   key acquisition immediately after cloud auth, before preProvision
   hooks (which include the GitHub prompt). Remove the misleading 5s
   delay message.

2. On retry after billing failure, DigitalOcean token was remembered but
   the OpenRouter API key was lost (only stored in process.env). Fix:
   persist the key to ~/.config/spawn/openrouter.json and load it on
   subsequent runs, matching how cloud tokens are already persisted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add mode 0o700 to config dir and await saveOpenRouterKey

- Add mode: 0o700 to mkdirSync in saveOpenRouterKey to match other cloud
  modules (aws, hetzner, digitalocean) and prevent directory permission leak
- Add missing await on saveOpenRouterKey(manualKey) to ensure manual API
  keys persist to disk before the function returns

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
2026-03-08 06:48:14 -04:00
Ahmed Abushagur
ff3a60267c
feat: add billing/payment setup guidance for new cloud users (#2319)
Detect billing-related server creation errors, open the cloud's billing
page in the browser, and prompt the user to retry after adding a payment
method. Adds pre-flight account checks for DigitalOcean (account status)
and GCP (billing enabled).

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-08 04:50:51 -04:00
A
459e25a844
feat(cli): show connect-or-create menu when existing spawns are present (#2310)
* feat(cli): show connect-or-create menu when existing spawns are present

When the user runs `spawn` with no arguments and has active servers in
history, display a top-level menu before jumping into the create flow:

  What would you like to do?
  ❯ Connect to existing server
    Create a new server

Selecting "Connect to existing server" opens the same interactive picker
as `spawn list` (activeServerPicker). Selecting "Create a new server" or
having no existing spawns continues with the current create flow, so
there is no behaviour change for first-time users.

Fixes #2308

Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* chore(cli): bump version to 0.15.14

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-08 01:56:37 -05:00
A
bd41641c11
fix(cli): improve visual spacing in spawn list output (#2311)
- Interactive picker: add blank separator line between entries so label
  and subtitle are visually grouped (not blending into adjacent entries)
- Non-interactive table: wrap subtitle in pc.dim() for better contrast
  with the bold entry name
- Update pickerHeight to account for added separator lines

Fixes #2309

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-08 00:01:53 -05:00
A
51dec6e877
fix: E2E failures - SSH key gen race, hetzner 409, hermes binary path (#2305)
Three distinct E2E bugs fixed:

1. SSH key generation race condition: When multiple agents provision in
   parallel, concurrent processes all call generateSshKey() and race to
   create ~/.ssh/id_ed25519. ssh-keygen won't overwrite an existing file
   without confirmation (it prompts on stdin, which is set to "ignore"),
   causing zeroclaw/codex to fail with "SSH key generation failed". Fix:
   check whether the key already exists
   before generating, and re-check after a failed generation attempt.

2. Hetzner SSH key 409 uniqueness_error: The Hetzner API returns HTTP 409
   with "SSH key not unique" when the same key content is registered under
   a different name. The hetznerApi() function throws on non-2xx before
   the error-parsing code runs, and the regex /already/ didn't match
   "not unique". Fix: catch 409 in ensureSshKey() and match against
   uniqueness_error/not unique/already patterns.

3. Hermes binary not found: The hermes install script (uv tool) creates
   the actual binary + venv at ~/.hermes/hermes-agent/venv/ with a symlink
   at ~/.local/bin/hermes. The tarball capture script only captured the
   symlink + ~/.local/share/, leaving a dangling symlink. Fix: include
   ~/.hermes/ in capture paths, add venv/bin to verify.sh PATH check,
   and update hermes launchCmd to include the venv PATH.
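The first two fixes can be sketched in TypeScript as below. This is a hypothetical illustration, not the code from the PR: `ensureSshKeyFile`, the injected `runKeygen` callback (a stand-in for spawning ssh-keygen) and `isHetznerDuplicateKey` are assumed names, and the second helper assumes the HTTP status and error text are available from the caught API error.

```typescript
import { existsSync } from "node:fs";

// Hypothetical sketch of fix 1: check, generate, then re-check.
// `runKeygen` stands in for spawning ssh-keygen; it returns true on exit code 0.
export function ensureSshKeyFile(
  keyPath: string,
  runKeygen: (path: string) => boolean,
): boolean {
  // A parallel provision may already have created the key.
  if (existsSync(keyPath)) {
    return true;
  }
  if (runKeygen(keyPath)) {
    return true;
  }
  // ssh-keygen refuses to overwrite an existing file, so a failure can mean a
  // concurrent process won the race. Re-check before reporting an error.
  return existsSync(keyPath);
}

// Hypothetical sketch of fix 2: treat Hetzner's 409 uniqueness error as
// "key already registered" instead of a fatal failure. Assumes the caller
// has the status code and error text from the caught API error.
export function isHetznerDuplicateKey(status: number, message: string): boolean {
  return status === 409 && /uniqueness_error|not unique|already/i.test(message);
}
```

Injecting the generator keeps the race-handling logic testable without touching ~/.ssh.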

Fixes #2304

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 22:05:44 -05:00
A
e7ac388110
fix: make credential hint tests environment-independent (#2303)
Tests for getScriptFailureGuidance were failing when cloud credential
env vars (HCLOUD_TOKEN, DO_API_TOKEN) were set in the environment.
The tests expected these vars to appear as "missing" in the output,
but unset only OPENROUTER_API_KEY. Now both the cloud-specific var
and OPENROUTER_API_KEY are saved and unset before each test.
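A save/unset/restore step like this can be sketched as a small helper. `withEnvUnset` is hypothetical and not taken from the repository: it saves each named variable, unsets it while the test body runs, and restores the original value (or absence) afterwards.

```typescript
// Hypothetical helper: temporarily unset the named env vars while `fn` runs,
// then restore their original values (or absence), even if `fn` throws.
export function withEnvUnset<T>(names: string[], fn: () => T): T {
  const saved = new Map<string, string | undefined>();
  for (const name of names) {
    saved.set(name, process.env[name]);
    delete process.env[name];
  }
  try {
    return fn();
  } finally {
    for (const [name, value] of saved) {
      if (value === undefined) {
        delete process.env[name];
      } else {
        process.env[name] = value;
      }
    }
  }
}
```

A test would wrap its assertions in `withEnvUnset(["HCLOUD_TOKEN", "OPENROUTER_API_KEY"], () => { ... })`, so the result no longer depends on the developer's shell.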

Bump CLI version to 0.15.11.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
2026-03-07 20:41:52 -05:00
A
90ae485c02
fix: add per-process timeout to SSH handshake probes in waitForSsh (#2299)
The Phase 2 SSH handshake loop in waitForSsh spawns SSH processes
without a per-process timeout. ConnectTimeout=10 only covers TCP
connect — if sshd accepts the connection but stalls during key
exchange or authentication, the process hangs indefinitely. This
causes the entire spawn command to freeze with no way to recover.

Add a 30s killWithTimeout guard to each probe, matching the pattern
already used in every cloud-specific runServer/uploadFile function.
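The timeout guard can be sketched with Node's `child_process` API. `runWithTimeout` is a hypothetical stand-in, not the repository's `killWithTimeout` (whose signature isn't shown here): it starts the process, arms a timer that sends SIGKILL, and clears the timer once the process exits.

```typescript
import { spawn } from "node:child_process";

export interface TimedResult {
  code: number | null; // null when the process was killed by a signal
  timedOut: boolean;
}

// Hypothetical killWithTimeout-style guard: resolve when the child exits,
// killing it first if it runs longer than `timeoutMs`.
export function runWithTimeout(
  cmd: string,
  args: string[],
  timeoutMs: number,
): Promise<TimedResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: "ignore" });
    let timedOut = false;
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGKILL"); // a stalled ssh/scp never exits on its own
    }, timeoutMs);
    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on("exit", (code) => {
      clearTimeout(timer);
      resolve({ code, timedOut });
    });
  });
}
```

In this sketch an SSH handshake probe would pass a 30-second limit, and an SCP upload (#2298) a 120-second one.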

-- refactor/code-health

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 18:40:48 -05:00
A
1991ffcb15
fix: add timeout protection to uploadFile across all SSH-based clouds (#2298)
All four SSH-based uploadFile functions (Hetzner, DO, AWS, GCP) used
`await proc.exited` on SCP subprocesses without any timeout guard.
If SCP hangs due to a network issue, the CLI hangs indefinitely.

This adds the same killWithTimeout pattern already used by runServer
and runServerCapture in these same files: a 120-second timeout that
kills the SCP process if it stalls.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 13:48:11 -08:00
A
0ef8eb4467
fix: validate v0 history entries against SpawnRecordSchema (#2279)
The v0 fallback path in loadHistory() returned the raw parsed JSON array
directly without validating individual elements. This could cause
TypeErrors (e.g. r.agent.toLowerCase() on undefined) in callers like
getActiveServers and filterHistory when corrupted entries exist.

Now filters each element through v.safeParse(SpawnRecordSchema, el),
matching the validation the v1 path already performs.
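Per-element filtering can be illustrated without a schema library. The sketch below swaps valibot's `v.safeParse(SpawnRecordSchema, el)` for a hand-written type guard. The three-field `SpawnRecord` shape is an assumption (only `agent` is mentioned above), and `parseV0History` is a hypothetical name. Malformed entries are dropped instead of crashing later callers.

```typescript
// Assumed minimal record shape; the real SpawnRecordSchema may have more fields.
export interface SpawnRecord {
  agent: string;
  cloud: string;
  timestamp: string;
}

// Hand-written stand-in for v.safeParse(SpawnRecordSchema, el).success.
function isSpawnRecord(el: unknown): el is SpawnRecord {
  if (typeof el !== "object" || el === null) {
    return false;
  }
  const r = el as Record<string, unknown>;
  return (
    typeof r.agent === "string" &&
    typeof r.cloud === "string" &&
    typeof r.timestamp === "string"
  );
}

// Hypothetical v0 loader: parse the legacy bare-array format and keep only
// elements that validate, so callers can safely use r.agent.toLowerCase().
export function parseV0History(raw: string): SpawnRecord[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return [];
  }
  return Array.isArray(parsed) ? parsed.filter(isSpawnRecord) : [];
}
```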

Fixes #2277

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 03:47:11 -05:00
Ahmed Abushagur
d77a067aa4
fix: snapshot cleanup + claude install (name-prefix filter) (#2273)
* fix: claude snapshot build — remove npm fallback from install command

The native install (curl | bash) succeeds but exits non-zero due to a
PATH warning. The || fallback then tries `npm install` which doesn't
exist on the "minimal" tier → exit 127.

Fix: replace npm fallback with binary existence check (same pattern
as hermes agent). If install exits non-zero but ~/.local/bin/claude
exists, the build succeeds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: snapshot cleanup and lookup — use name prefix instead of tags

DO Packer builder `tags` only apply to the temporary build droplet,
not the resulting snapshot image. Both the workflow cleanup step and
the CLI's findSpawnSnapshot() were querying by `tag_name` which
returned nothing — old snapshots piled up and the CLI couldn't find
existing snapshots.

Fix: filter by snapshot name prefix (`spawn-{agent}-`) instead of
tags, in both the workflow and the CLI. Remove misleading `tags`
from the Packer template. Add test cases for name-prefix filtering.
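Name-prefix matching can be sketched as a pure filter over snapshot objects. The `Snapshot` shape and the newest-first ordering are assumptions for illustration. `filterSpawnSnapshots` is a hypothetical name, not the CLI's `findSpawnSnapshot()`.

```typescript
// Assumed shape of a snapshot entry returned by the cloud API.
export interface Snapshot {
  id: string;
  name: string;
  created_at: string; // ISO 8601 timestamp
}

// Keep snapshots named `spawn-{agent}-...`, newest first. The trailing "-"
// keeps agent "claude" from matching a hypothetical "spawn-claudex-..." image.
export function filterSpawnSnapshots(snapshots: Snapshot[], agent: string): Snapshot[] {
  const prefix = `spawn-${agent}-`;
  return snapshots
    .filter((s) => s.name.startsWith(prefix))
    .sort((a, b) => b.created_at.localeCompare(a.created_at));
}
```

In this sketch, the first element would be the snapshot to boot from and the remaining ones would be cleanup candidates.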

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 21:32:58 -08:00