Commit graph

173 commits

Author SHA1 Message Date
A
b84adfb74e
refactor: move all shell scripts to /sh directory (#1843)
Reorganizes the project so all shell scripts live under a dedicated
/sh directory, enabling the OpenRouter rewrite URL to point at /sh/
instead of the repository root.

Moves:
- cli/install.sh → sh/cli/install.sh
- shared/*.sh → sh/shared/*.sh
- {cloud}/{agent}.sh → sh/{cloud}/{agent}.sh (48 scripts)
- {cloud}/README.md → sh/{cloud}/README.md
- e2e/*.sh → sh/e2e/*.sh
- test/macos-compat.sh → sh/test/macos-compat.sh
- test/fixtures/**/*.sh → sh/test/fixtures/**/*.sh

Updates all references:
- RAW_BASE path construction in commands.ts, update-check.ts
- GitHub auth URL in agent-setup.ts
- Self-referencing URLs in install.sh, github-auth.sh
- CI workflow paths in lint.yml, cli-release.yml
- Test file paths in install-script-validation, manifest-integrity
- Documentation in README.md, cli/README.md, CLAUDE.md
- QA scripts in .claude/skills/

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-23 21:14:54 -08:00
A
8394386cd4
fix: install openclaw with npm instead of bun to fix gateway plugin loading (#1833)
* fix: install openclaw with npm instead of bun to fix gateway plugin loading

The OpenClaw gateway daemon runs on Node.js, but openclaw was being
installed via `bun install -g`. Bun and Node use incompatible module
resolution strategies, causing channel plugins (Telegram, Discord, etc.)
to silently fail to load at gateway startup.

Switch both install paths to `npm install -g openclaw` so the daemon's
Node runtime can resolve its dependencies correctly.

Fixes #1828

Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: apply biome formatting to commands-update-download.test.ts

The file was added in #1831 without passing Biome's format check.
Auto-formatted with `bunx @biomejs/biome format --write`.

Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: use bun install/runtime for openclaw instead of npm

npm install is broken on target VMs. Switch all openclaw install
commands back to bun and remove npm prefix from gateway PATH.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
2026-02-24 00:08:40 -05:00
A
18a7f7e96a
fix: improve AWS Lightsail error message for new accounts (#1826)
* fix: improve AWS Lightsail error message for new accounts

Fixes #1824

Agent: ux-engineer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: fix biome formatting in AWS Lightsail error message

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 22:05:50 -05:00
A
0bca426980
security: validate RAW_BASE immediately before curl|bash invocation (#1822)
Fixes #1819

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 19:31:53 -05:00
A
a4e805d83b
fix: restore gateway port wait and batch openclaw setup into fewer SSH sessions (#1820)
The previous refactor (b43d3f1) deleted the gateway port wait entirely,
causing the TUI to launch before the gateway was listening on port 18789.

Changes:
- startGateway() now starts the daemon AND polls port 18789 in the same
  SSH session (up to 60s), using /dev/tcp with nc fallback.
- New setupOpenclawBatched() combines install verification + env var
  setup + openclaw config into a single SSH session (was 6 separate
  SSH calls, now 2 total for the whole openclaw flow).
- New optional `setup` hook on AgentConfig lets agents opt into the
  batched path; other agents are unaffected.

Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 18:46:19 -05:00
A
c76d930046
fix: prevent GCP delete from hanging on interactive project prompt (#1813)
When deleting a GCP instance, resolveProject() could trigger an
interactive prompt ("Use project?") that collides with the deletion
spinner, causing the command to hang indefinitely. This happened when
instance metadata was missing the project (pre-ee653ca instances) or
when GCP_PROJECT was set to an empty string.

Fix: run resolveProject() in non-interactive mode during deletion so it
auto-accepts the gcloud config default. Also fail fast instead of
showing an interactive picker when no project is available.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-23 13:52:16 -05:00
A
463c7a2efb
feat: add --custom flag for machine type/region selection (#1810)
* feat: add --custom flag for interactive machine type/region selection

By default, all clouds now skip size/region prompts and use sensible
defaults for faster provisioning. The --custom flag enables interactive
pickers on all clouds, unifying the previously inconsistent behavior
where some clouds always prompted and others never did.

- AWS: promptRegion/promptBundle gated on SPAWN_CUSTOM
- GCP: promptMachineType/promptZone gated on SPAWN_CUSTOM
- Fly: promptVmOptions gated on SPAWN_CUSTOM
- Hetzner: new promptServerType/promptLocation with type/location arrays
- DigitalOcean: new promptDropletSize/promptDoRegion with size/region arrays
- Daytona: new promptSandboxSize with cpu/memory/disk presets
- Sprite: no change (managed platform, no meaningful size options)
- --custom + --headless is an error (incompatible modes)
- Version bump to 0.8.0 (new feature)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: fix biome format violations in --custom flag code

Auto-format object literals in arrays (expand to multi-line), wrap
long console.error line, and expand inline array in test assertion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-23 12:29:51 -05:00
A
ee653ca4a6
fix: persist GCP zone/project in connection metadata for deletion (#1809)
GCP delete was re-prompting for project/zone because saveVmConnection
didn't save metadata. Now createInstance passes zone and project as
metadata, and mergeLastConnection reads it back into history.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-23 08:17:44 -08:00
A
493ca7a1cd
fix: use SSH_INTERACTIVE_OPTS in interactiveSession() for hetzner/do/aws/gcp (#1805)
BatchMode=yes in SSH_BASE_OPTS actively blocks TTY prompts in interactive
sessions. Four cloud providers (Hetzner, DigitalOcean, AWS, GCP) were
using SSH_BASE_OPTS in their interactiveSession() functions despite
SSH_INTERACTIVE_OPTS being purpose-built for this (added in PR #1795).
This also adds Compression=yes, IPQoS=lowdelay, StrictHostKeyChecking=accept-new,
and the -t flag (already included), aligning with the commands.ts reconnect path.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 09:25:26 -05:00
A
180b19d9f4
fix: reduce SSH interactive lag (GSSAPIAuthentication + TCPKeepAlive) (#1795)
* fix: reduce SSH interactive lag with GSSAPIAuthentication=no and TCPKeepAlive=no

GSSAPIAuthentication causes latency on every SSH interaction when
the server doesn't support Kerberos (i.e. always for our VMs).
TCPKeepAlive is redundant with ServerAliveInterval and can cause
retransmission issues through NAT/firewalls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use SSH_INTERACTIVE_OPTS for all interactive sessions

The reconnect (cmdConnect) and agent launch (cmdEnterAgent) paths
were using bare SSH with only StrictHostKeyChecking, missing all
performance flags. Now they use SSH_INTERACTIVE_OPTS which includes:

- GSSAPIAuthentication=no (skip Kerberos timeout)
- TCPKeepAlive=no (avoid NAT retransmission issues)
- ServerAliveInterval=15 (encrypted keepalives)
- Compression=yes (reduce latency on slow/distant links)
- IPQoS=lowdelay (mark packets for low-latency treatment)
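
The option list above could be expressed as an argv fragment along these lines. This is a sketch only: the option values come from this message, while the array layout and the `buildSshArgs` helper are assumptions, not the code in the repository.

```typescript
// Option values taken from the commit message; the array layout is assumed.
const SSH_INTERACTIVE_OPTS: string[] = [
  "-o", "GSSAPIAuthentication=no", // skip the Kerberos negotiation delay
  "-o", "TCPKeepAlive=no",         // rely on ServerAliveInterval instead
  "-o", "ServerAliveInterval=15",  // encrypted keepalive every 15 seconds
  "-o", "Compression=yes",
  "-o", "IPQoS=lowdelay",
];

// Hypothetical helper: splice the shared options into an interactive ssh argv.
function buildSshArgs(userAtHost: string, remoteCommand: string): string[] {
  return ["ssh", ...SSH_INTERACTIVE_OPTS, "-t", userAtHost, remoteCommand];
}

console.log(buildSshArgs("root@203.0.113.10", "bash -l").join(" "));
```

Keeping the options in one shared array means every interactive path picks up the same flags, which is the drift this commit fixes.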

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-23 03:20:49 -05:00
A
a26d27f139
style: enforce biome format across codebase, add CI check (#1794)
Run `biome format --write` on all 98 source files (38 needed fixes).
The main change: object literals and long argument lists are now expanded
onto separate lines per Biome's `"expand": "always"` setting, making
code much easier to scan on narrow screens.

Add `biome format` check step to CI lint workflow so formatting
regressions are caught on every PR.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 23:32:12 -08:00
A
86cae8ee32
feat: add SSH key discovery & selection across all providers (#1792)
All 4 providers (Hetzner, DO, AWS, GCP) hardcoded ~/.ssh/id_ed25519 and
duplicated key generation logic. Users with id_rsa or custom-named keys
got unwanted new keys generated. This adds a shared ssh-keys module that:

- Scans ~/.ssh/ for all valid key pairs (matching pub + private files)
- With 0 keys: generates id_ed25519 (same as before)
- With 1 key: uses it silently
- With 2+ keys: prompts multiselect (all selected by default)
- Caches the result at module level for the session
- Centralizes getSshFingerprint() (was duplicated in Hetzner + DO)
- All providers now pass -i flags for selected keys to SSH commands

Net -152 lines of duplicated code across providers.
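
The selection rules in the list above can be sketched as two small pure functions. The names `findKeyPairs` and `chooseKeys` are hypothetical, and "valid" is reduced here to "a `.pub` file with a matching private file name"; the real ssh-keys module may check more than that.

```typescript
// Hypothetical helper: pair private/public key files from a directory listing.
// Returns the private key file names, sorted for stable output.
function findKeyPairs(fileNames: string[]): string[] {
  const names = new Set(fileNames);
  return fileNames
    .filter((name) => name.endsWith(".pub") && names.has(name.slice(0, -4)))
    .map((name) => name.slice(0, -4))
    .sort();
}

type KeyChoice =
  | { action: "generate"; name: string }
  | { action: "use"; keys: string[] }
  | { action: "prompt"; keys: string[] };

// Mirrors the 0 / 1 / 2+ rules from the commit message.
function chooseKeys(pairs: string[]): KeyChoice {
  if (pairs.length === 0) {
    return { action: "generate", name: "id_ed25519" };
  }
  if (pairs.length === 1) {
    return { action: "use", keys: pairs };
  }
  return { action: "prompt", keys: pairs }; // multiselect, all selected by default
}

const listing = ["config", "id_rsa", "id_rsa.pub", "known_hosts", "work.pub"];
console.log(chooseKeys(findKeyPairs(listing)).action); // use
```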

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 23:22:50 -08:00
A
b802dfbc16
refactor: extract saveLaunchCmd to history.ts (#1789)
Eliminates copy-paste of saveLaunchCmd across 8 cloud provider files.
The local/local.ts copy had already diverged (using Bun.write() instead
of writeFileSync()), confirming the maintenance risk.

Fixes #1786

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-22 23:11:14 -08:00
A
ed7ebedde4
fix: clean up stdin/TTY state before interactive session handoff (#1790)
After provisioning, @clack/prompts and readline leave stdin with stale
listeners, raw mode, and buffered input. This causes flaky keyboard input
in the interactive SSH session. Add prepareStdinForHandoff() that closes
the shared readline, removes all stdin listeners, resets raw mode, and
pauses stdin before launching the child process.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-23 01:56:49 -05:00
A
3a554a5ada
fix: replace instanceof Error with hasMessage() duck-typing in SSH retry paths (#1785)
wrapSshCall (agent-setup.ts) and spriteRetry (sprite.ts) used `instanceof
Error` to extract error messages — an anti-pattern explicitly avoided
throughout the rest of the codebase (consistent with comments in index.ts,
commands.ts, manifest.ts, etc.). When errors cross module or bundling
boundaries, instanceof returns false even for real Error objects, causing
err.message to fall back to String(err) and producing `[object Object]` in
the retry logs. Uses `hasMessage()` from shared/type-guards for consistent
duck-typed narrowing.
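
A minimal sketch of the duck-typed guard described above. The real `hasMessage()` in shared/type-guards is not shown in this log, so the signature below is an assumption, and `errorText` is an illustrative helper rather than a name from the codebase.

```typescript
// Assumed shape of the guard: true for any object with a string `message`.
function hasMessage(err: unknown): err is { message: string } {
  return (
    typeof err === "object" &&
    err !== null &&
    "message" in err &&
    typeof err.message === "string"
  );
}

// Illustrative helper: extract a loggable message without instanceof.
function errorText(err: unknown): string {
  return hasMessage(err) ? err.message : String(err);
}

// An error-like object that is not an instance of this realm's Error class,
// similar to what instanceof sees when errors cross bundling boundaries.
const foreignError: unknown = { name: "Error", message: "connection refused" };
console.log(foreignError instanceof Error); // false
console.log(errorText(foreignError)); // connection refused
```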

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-23 00:57:03 -05:00
A
fa34d29b7e
fix: explicitly pass SSH identity file for DigitalOcean connections (#1784)
DigitalOcean SSH was failing with "Permission denied (publickey)" because
the SSH client was not explicitly told which identity file to use. When
users have multiple SSH keys or an SSH agent with different keys loaded,
SSH may try the wrong key first and fail — especially with BatchMode=yes
which suppresses interactive fallbacks.

The fix adds `-i ~/.ssh/id_ed25519` to SSH_OPTS (matching AWS's approach)
and passes sshKeyPath to the shared waitForSsh utility, ensuring the
correct key is always used for both the handshake wait and all subsequent
SSH/SCP commands.

Fixes #1783

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-23 00:11:59 -05:00
A
3988ffe90e
fix: check exitCode in openBrowser() and reinit readline after @clack/prompts (#1782)
openBrowser() never checked the exitCode from Bun.spawnSync, so it silently
returned success even when the browser command failed (headless VMs, no
DISPLAY). Now checks exitCode and always shows the URL as fallback.

selectFromList() uses @clack/prompts which creates/destroys its own readline
on stdin. After it finishes, the shared readline in ui.ts can be corrupted
(Bun #1707). Now explicitly closes and nulls the shared readline after
@clack/prompts returns so the next prompt() call gets a fresh one.

Fixes #1770

Agent: ux-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-22 23:21:06 -05:00
A
16c8a2b90b
fix: use getSpawnDir()/getConnectionPath() in all cloud providers (#1774)
Fixes #1769

All 8 cloud providers hard-coded `${process.env.HOME}/.spawn` for
connection data, bypassing the SPAWN_HOME env var support in history.ts.
Replaced all 16 occurrences with getSpawnDir() and getConnectionPath().
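
A rough sketch of the centralised lookup, assuming `SPAWN_HOME` replaces the whole `.spawn` directory and that the helpers take the environment as a parameter. Both points are assumptions; the real functions in history.ts may differ, and `connection.json` is a placeholder file name.

```typescript
type Env = Record<string, string | undefined>;

// Assumed behaviour: SPAWN_HOME wins, otherwise fall back to $HOME/.spawn.
function getSpawnDir(env: Env): string {
  const override = env.SPAWN_HOME;
  if (override !== undefined && override.length > 0) {
    return override;
  }
  return `${env.HOME ?? ""}/.spawn`;
}

// Hypothetical signature: the file name is passed in rather than guessed.
function getConnectionPath(env: Env, fileName: string): string {
  return `${getSpawnDir(env)}/${fileName}`;
}

console.log(getConnectionPath({ HOME: "/home/u" }, "connection.json"));
// /home/u/.spawn/connection.json
```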

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 19:27:21 -08:00
A
0843c5e708
feat: shared SSH wait utility with TCP pre-check and stderr capture (#1779)
Replace 5 duplicated SSH wait implementations (AWS, DO, Hetzner, GCP,
Sprite) with a shared two-phase utility in cli/src/shared/ssh.ts:

- Phase 1: cheap TCP probe (2s intervals) until port 22 opens
- Phase 2: full SSH handshake (3s intervals) with stderr capture
- Adds BatchMode=yes to prevent interactive prompt hangs
- Removes ~220 lines of duplicated sleep/SSH_OPTS/waitForSsh code

Daytona (token auth) and Fly (WireGuard) left unchanged — too different.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 19:17:09 -08:00
A
b62dc1af33
feat: ban `as` type assertions, add runtime schema validation with valibot (#1775)
* fix: resolve all biome lint warnings across the codebase

- Replace all noExplicitAny with proper types (unknown, Record<string, unknown>)
- Fix useBlockStatements in picker.ts (braceless if)
- Fix useNumberNamespace in picker.ts (parseInt → Number.parseInt)
- Codebase now passes biome lint with 0 errors and 0 warnings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: ban `as` type assertions, add runtime schema validation with valibot

Replace all ~170 unsafe `as` type assertions across the entire codebase
(production + tests) with runtime-validated alternatives:

- Add GritQL biome plugin (`no-type-assertion.grit`) that bans all `as`
  casts except `as const`
- Add valibot for schema-validated JSON parsing (`parseJsonWith`)
- Add shared utilities: `parse.ts` (schema parsing), `type-guards.ts`
- Replace `as` casts in all 5 cloud modules (aws, daytona, hetzner,
  digitalocean, fly) with valibot schemas + type guards
- Replace `as` casts in shared modules (manifest, update-check, oauth,
  commands, history, ui)
- Replace `as any` in all 26 test files with proper `new Response()`
  mocks and typed variables
- Add 13 tests for parseJsonWith/parseJsonRaw
- Add "Embrace Bold Changes" culture rule to CLAUDE.md
- Bump version 0.6.19 → 0.7.0

1859 tests pass, 0 lint errors across 95 files, bundle +6KB from valibot.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: move GritQL plugin into cli/lint/ directory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 18:50:53 -08:00
A
f0a70b66a1
feat: multi-line layout for ls/delete — name first, then agent · cloud · time (#1777)
Entries in `spawn ls` and `spawn delete` now display as two lines:
  - Line 1: spawn name (bold)
  - Line 2: Agent · Cloud · relative time

Removes SSH connection info and prompt previews from the list display
to keep it clean and scannable.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 18:33:06 -08:00
A
2413db6ade
fix: truncate picker lines to terminal width to prevent redraw corruption (#1772)
Long labels (e.g. "Claude Code on GCP Compute Engine -- spawn-trial-000-ahmed")
wrap to multiple rows, but the redraw logic uses a fixed line count to cursor-up.
This causes old content to pile up on every arrow-key press.

Query terminal width via `stty size` and truncate all lines to fit within
a single row, with a 1-char margin to prevent auto-wrap edge cases.
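
The truncation step might look roughly like this. Both helper names are hypothetical, and the sketch assumes plain labels without ANSI escape codes or double-width characters, which the real picker may need to handle.

```typescript
// Parse the "rows cols" output of `stty size`; fall back to 80 columns.
function parseSttySize(output: string, fallback = 80): number {
  const cols = Number.parseInt(output.trim().split(/\s+/)[1] ?? "", 10);
  return Number.isFinite(cols) && cols > 0 ? cols : fallback;
}

// Cut a line so it fits in one terminal row, leaving a 1-char margin
// so the terminal never auto-wraps at the last column.
function truncateToWidth(line: string, columns: number): string {
  const max = Math.max(columns - 1, 0);
  return line.length <= max ? line : line.slice(0, max);
}

const label = "Claude Code on GCP Compute Engine -- spawn-trial-000-ahmed";
console.log(truncateToWidth(label, parseSttySize("24 40")).length); // 39
```

With every line guaranteed to occupy one row, a fixed cursor-up count stays correct on each redraw.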

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 17:22:46 -08:00
A
8112276121
feat: add delete sub-menu (destroy/remove) and spawn kill alias (#1765)
Pressing `d` in the server picker now shows a sub-menu:
- Destroy server: hard delete (destroys cloud VM + marks deleted)
- Remove from history: soft delete (removes entry, no cloud API call)
- Cancel: go back to picker

Also adds `kill` as an alias for `spawn delete`.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 15:23:49 -08:00
A
545ddafe4a
fix: extract flags module to fix KNOWN_FLAGS drift in tests (#1757)
KNOWN_FLAGS in unknown-flags.test.ts was copy-pasted from index.ts and
was missing the --name flag, causing silent test gaps. Extract
KNOWN_FLAGS, findUnknownFlag, and expandEqualsFlags into a new flags.ts
module so tests import the real source of truth.

- Create cli/src/flags.ts with KNOWN_FLAGS, findUnknownFlag, expandEqualsFlags
- Update index.ts to import from flags.ts (checkUnknownFlags now uses findUnknownFlag)
- Update unknown-flags.test.ts to import from flags.ts instead of copy-pasting
- Add tests for --name flag, KNOWN_FLAGS completeness, and expandEqualsFlags
- Bump CLI version to 0.6.15
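
A sketch of what the extracted module could contain. The function names come from this message, but the signatures, return types, and the two-entry flag set are assumptions for illustration; the real `KNOWN_FLAGS` list is longer.

```typescript
// Illustrative subset only.
const KNOWN_FLAGS: ReadonlySet<string> = new Set(["--name", "--headless"]);

// Split "--flag=value" into ["--flag", "value"] so later parsing sees one shape.
function expandEqualsFlags(args: string[]): string[] {
  const out: string[] = [];
  for (const arg of args) {
    const eq = arg.indexOf("=");
    if (arg.startsWith("--") && eq > 2) {
      out.push(arg.slice(0, eq), arg.slice(eq + 1));
    } else {
      out.push(arg);
    }
  }
  return out;
}

// Return the first flag-looking argument that is not in KNOWN_FLAGS, or null.
function findUnknownFlag(args: string[]): string | null {
  for (const arg of args) {
    if (arg.startsWith("-") && !KNOWN_FLAGS.has(arg)) {
      return arg;
    }
  }
  return null;
}

console.log(findUnknownFlag(expandEqualsFlags(["claude", "--nmae=box"]))); // --nmae
```

Because tests import the same `KNOWN_FLAGS` the CLI uses, a newly added flag cannot be missing from the test copy.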

Fixes #1744

Agent: test-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-22 18:10:07 -05:00
A
f3a2b85b5b
fix: always confirm cloud resource name with user, even when SPAWN_NAME is set (#1758)
When the CLI collects a display name (SPAWN_NAME), each cloud now shows
the kebab-case derivative as the default in the resource name prompt
instead of silently accepting it. Users can hit Enter to accept or type
an override. Non-interactive mode still skips the prompt.
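
For illustration, a kebab-case derivation could look like the following. The CLI's actual rules are not shown in this log, so `toKebabCase` and its handling of punctuation are assumptions.

```typescript
// Hypothetical helper: lower-case the display name and collapse every run of
// non-alphanumeric characters into a single hyphen.
function toKebabCase(displayName: string): string {
  return displayName
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

console.log(toKebabCase("My Dev Box")); // my-dev-box
console.log(toKebabCase("  Trial #1 (GCP) ")); // trial-1-gcp
```

The derived value would then be offered as the default in the resource name prompt, where the user can accept it or type an override.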

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 14:25:34 -08:00
A
7b021fb1f5
fix: set TERM and use login shell for interactive SSH sessions (#1754)
SSH interactive sessions ran the agent command in a non-login,
non-interactive shell — .bashrc/.profile weren't sourced and TERM
wasn't always set, making the shell feel broken (no colors, bad
line editing, missing env).

Fix for all 6 SSH-based clouds (DO, Hetzner, AWS, GCP, Fly, Daytona):
- Forward local TERM (default xterm-256color) to the remote
- Use `exec bash -l -c` for a proper login shell

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 14:14:13 -08:00
A
3e5cd2d076
fix: spawn fails with bun not found after install (#1748)
* fix: add ~/.bun/bin to shell rc files so spawn finds bun after install

The install script was only adding ~/.local/bin to shell profile files
(bashrc/zshrc/bash_profile), but not ~/.bun/bin. Since the spawn binary
uses #!/usr/bin/env bun as its shebang, bun must be in PATH for spawn
to work. After exec $SHELL, only dirs in rc files are available.

Now ensure_in_path() patches shell rc files for both ~/.local/bin (for
spawn) and ~/.bun/bin (for bun), and correctly checks both when deciding
whether to show "Run spawn" vs "exec $SHELL" instructions.

Fixes #1747

Agent: code-health
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: quote dir in fish_add_path to prevent command injection

Address security review feedback on PR #1748 — unquoted ${dir} in
fish command string could allow injection if HOME/BUN_INSTALL env
vars contain metacharacters.

Agent: code-health
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-22 13:41:27 -08:00
A
57d4ee7eeb
fix: drop apt nodejs/npm, install Node 22 directly via n (#1746)
apt-get install nodejs npm pulls in hundreds of node-* packages
(libhwasan, node-jsonify, node-eslint-utils, etc.) adding 60-90s
to cloud-init. We immediately replace it with Node 22 via n anyway.

Fix: bootstrap n directly from curl and install Node 22 in one step.
No apt nodejs/npm needed.

Before: apt install nodejs npm → npm install -g n → n 22 (slow)
After:  curl n | bash -s install 22 (fast, no apt bloat)

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 12:40:22 -08:00
A
9d3728fd8d
fix: add build-essential to node cloud-init tier (#1743)
* fix: add build-essential to node cloud-init tier

The "node" tier (used by claude, codex, kilocode) was missing
build-essential. Native npm packages that compile C/C++ addons
fail without it. The "full" tier had it but no agent uses "full".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: upgrade openclaw to full cloud-init tier

Openclaw needs the most dependencies (build-essential, nodejs, npm,
bun) but was on the "bun" tier which only installed curl/unzip/git/zsh.
Switch to "full" which includes everything.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 12:27:11 -08:00
A
30f3758902
fix: set HOME=/root in cloud-init userdata to prevent unbound variable (#1741)
DigitalOcean's cloud-init environment doesn't set HOME. Combined
with set -e, any $HOME or ~ reference (bun install, .bashrc writes)
fails with "HOME: unbound variable" and cloud-init silently aborts.

Fixed in both DigitalOcean and Hetzner (same pattern). AWS doesn't
use set -e so is unaffected.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 12:13:56 -08:00
A
dc21fa223b
fix: cloud-init streaming script bash syntax error (#1737)
`.join("; ")` produced invalid bash: `&;` after a background command,
`do;` after `for`, and `then;` after `if`. Use a newline-joined string instead.
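
The failure is easy to reproduce with a small array of script lines. The fragments below are hypothetical (the real streaming script and the marker path are not shown in this log); only the joining behaviour is the point.

```typescript
// Hypothetical script fragments, one shell line per array entry.
const scriptLines: string[] = [
  "tail -f /var/log/cloud-init-output.log &",
  "for i in 1 2 3; do",
  "  if [ -f /root/.cloud-init-complete ]; then",
  "    break",
  "  fi",
  "  sleep 1",
  "done",
];

// "; " after `&`, `do`, and `then` yields `&;`, `do;`, and `then;`,
// none of which bash accepts.
const broken = scriptLines.join("; ");

// A newline is a valid terminator in all three positions.
const fixed = scriptLines.join("\n");

console.log(broken.includes("&;")); // true
console.log(fixed.includes("&;")); // false
```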

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 12:05:54 -08:00
A
b50b27141c
fix: stream cloud-init output instead of blind-polling on DigitalOcean (#1734)
Replace 60×5s blind poll loop ("Cloud-init in progress N/60") with
real-time streaming of /var/log/cloud-init-output.log via tail -f
over SSH. Users now see every apt-get, curl, and error as it happens.

Background checker exits as soon as .cloud-init-complete marker
appears. 5min timeout. Brief 30s fallback poll if streaming fails.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 11:58:29 -08:00
A
ac5e8495b1
feat: customize cloud-init per agent to fix boot timeouts (#1733)
Agents declare their dependency tier (minimal/node/bun/full), and
cloud-init only installs what's needed. Lightweight agents like
OpenCode and ZeroClaw skip Node.js upgrade, Bun install, and
build-essential — saving 60-90s on boot and eliminating the
DigitalOcean cloud-init timeout.

- Add CloudInitTier type + cloudInitTier field to AgentConfig
- Add shared/cloud-init.ts: tier-to-packages mapping
- Update all 6 clouds (DO, Hetzner, AWS, GCP, Fly, Daytona)
- Bump CLI version to 0.6.8

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 11:43:45 -08:00
A
738ad18fee
fix: add 5s delay between DigitalOcean and OpenRouter OAuth flows (#1727)
When both OAuth flows open browser tabs back-to-back, the user may
reactively close the second tab thinking it's a duplicate. Add a 5-second
pause with a message after DO OAuth completes, only when browser auth
was actually used (skipped for env var / saved token paths).

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 11:27:14 -08:00
A
f2010ce3bd
fix: add account:read scope to DigitalOcean OAuth flow (#1724)
OAuth token validation calls GET /v2/account which requires the
account:read scope. Without it, the token exchange succeeds but
validation fails with 403, falling through to manual token entry.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 11:14:26 -08:00
A
e527d79815
feat: DigitalOcean OAuth2 flow for automatic token provisioning (#1716)
* feat: add DigitalOcean OAuth2 flow for automatic token provisioning

Implements the OAuth2 authorization code flow for DigitalOcean as an
alternative to manual API token entry. The flow mirrors the existing
OpenRouter OAuth pattern using Bun.serve() for the local callback.

Changes:
- Add tryDoOAuth() with local Bun.serve callback, CSRF state, and
  code-for-token exchange via DO's /v1/oauth/token endpoint
- Add tryRefreshDoToken() for refreshing expired tokens without
  re-authorization
- Extend config persistence with refresh_token, expires_at, auth_method
- Modify ensureDoToken() flow: env var -> saved config (with refresh) ->
  OAuth browser flow -> manual paste fallback
- OAuth is gated on DO_OAUTH_CLIENT_ID and DO_OAUTH_CLIENT_SECRET env vars
- Add 37 tests covering config persistence, CSRF generation, code
  validation, token expiry, URL construction, and feature toggle
- Bump CLI version to 0.6.5

Closes #1715

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: hardcode DO OAuth credentials, remove env var gate

Embed client_id and client_secret as constants (same pattern as gh CLI,
doctl, gcloud). OAuth is now always available — no env vars needed.
Public CLI clients cannot keep secrets confidential; security comes from
the authorization code flow itself (user consent, localhost redirect,
CSRF state, single-use codes).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add droplet:delete scope for spawn delete support

The spawn CLI's destroyServer() calls DELETE /droplets/{id} which
requires the droplet:delete scope. All its required sub-scopes
(droplet:read, regions:read, sizes:read, actions:read, image:read)
were already present.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 13:49:53 -05:00
A
9cb265d820
refactor: remove all cloud bash libs, convert AWS to JS bundle fallback (#1714)
All clouds now use TypeScript. Convert the last holdout (AWS) from bash
lib fallback to the JS bundle download pattern, then delete all remaining
cloud bash libs and clean up stale test code.

- Convert 6 AWS agent scripts to JS bundle fallback (matching hetzner)
- Delete aws/lib/common.sh and hetzner/lib/common.sh
- Delete orphaned test/fixtures/ovh/
- Stub out dead functions in test/e2e.sh that sourced deleted libs
- Delete 3 test files that only tested cloud bash libs
- Remove dead describe blocks from 3 remaining test files
- Bump CLI version 0.6.3 → 0.6.4

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 10:13:36 -08:00
A
21f7e7683f
refactor: deduplicate remaining 6 clouds into shared agent-setup pattern (#1704)
Convert gcp, daytona, digitalocean, hetzner, sprite, and local clouds
to use shared/agent-setup.ts and shared/orchestrate.ts, matching the
pattern established by AWS and Fly. Each cloud's agents.ts is now a
~26-line thin wrapper; each main.ts uses runOrchestration().

- Delete gcp/lib/common.sh (406 lines of dead bash code)
- Delete cli/src/fly/oauth.ts and cli/src/fly/ui.ts re-export wrappers
- Fix all fly/oauth and fly/ui imports to use shared/ directly
- Update test thresholds for reduced bash cloud count
- Bump CLI version to 0.6.3

Net reduction: ~2,850 lines removed.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 09:20:34 -08:00
A
eac5713ef0
refactor: deduplicate AWS/Fly agent setup into shared modules (#1700)
Extract ~800 lines of duplicated agent helpers and orchestration logic
from aws/agents.ts and fly/agents.ts into shared modules:

- shared/agent-setup.ts: CloudRunner interface, installAgent,
  uploadConfigFile, installClaudeCode, setupClaudeCodeConfig,
  GitHub auth, config helpers, createAgents(), resolveAgent()
- shared/orchestrate.ts: CloudOrchestrator interface + 12-step
  runOrchestration() pipeline
- shared/agents.ts: AgentConfig type + generateEnvConfig (single source)

Each cloud becomes a thin wrapper (~25-60 lines) that constructs a
CloudRunner/CloudOrchestrator from its provider-specific functions.

Also fixes pre-existing test breakage (aws.test.ts imported renamed
exports LIGHTSAIL_BUNDLES/BundleTier → BUNDLES/Bundle) and removes
dead aws/lib/common.sh reference from test/e2e.sh.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 08:40:21 -08:00
A
55df28137d
feat: convert gcp/ cloud provider from Bash to TypeScript (#1694)
Security review approved. All issues resolved.
2026-02-22 08:51:50 -05:00
A
850327c29d
feat: convert aws/ cloud provider from Bash to TypeScript (#1693)
Migrates AWS Lightsail from 609-line bash (aws/lib/common.sh) to TypeScript,
following the established Fly.io/local provider patterns. Type safety eliminates
SigV4 signing bugs, @clack/prompts provides interactive bundle/region pickers,
and error handling is explicit.

- cli/src/aws/aws.ts — Core: AWS CLI wrapper, SigV4 REST API, auth, provisioning, SSH
- cli/src/aws/agents.ts — Agent configs and install helpers
- cli/src/aws/main.ts — Orchestrator
- aws/*.sh — Converted to thin bun shims with bash fallback (curl|bash compatible)
- cli/package.json — Version bump to 0.6.0

Fixes #1675

Agent: complexity-hunter

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-22 07:50:54 -05:00
A
435d9125d5
feat: convert local/ cloud provider from Bash to TypeScript (#1688)
Creates cli/src/local/{main,local,agents}.ts following the Fly.io
pattern. All 6 agent .sh files replaced with thin bun shims.
Extracts shared oauth.ts and ui.ts to cli/src/shared/ for reuse
across cloud providers. Updates fly/ to re-export from shared.

Fixes #1681

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-22 10:49:40 +00:00
A
0f4df7be71
feat: pre-built Docker image for OpenClaw on Fly.io (#1686)
Eliminates the slow waitForCloudInit() + bun install phase by booting
a pre-built image with Node.js, bun, and openclaw already installed.
The image is rebuilt daily via GitHub Actions to pick up new releases.

Other agents are unaffected — they still use ubuntu:24.04 + cloud-init.

Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-22 02:50:46 -05:00
A
4cec25c6b7
fix: pass spawn name through cmdRun and headless flows (#1674)
cmdRun (spawn <agent> <cloud>) was not collecting or passing the spawn
name, so SPAWN_NAME was never set in the script environment and the
history record lacked a name. cmdRunHeadless had the same gap.

- Add promptSpawnName() call to cmdRun and pass result to execScript
- Wire spawnName through HeadlessOptions to runBashHeadless
- Add --name CLI flag to set SPAWN_NAME from the command line
- Skip interactive name prompt when SPAWN_NAME is already set
- Bump CLI to 0.5.33

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 21:52:52 -08:00
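The name-resolution order described in this commit can be sketched roughly as follows. This is an illustration only: `resolveSpawnName` and its return shape are hypothetical, the accepted `--name` syntaxes are an assumption, and so is the rule that the flag wins over an existing `SPAWN_NAME`.

```typescript
// Hypothetical helper: decides where the spawn name comes from.
type SpawnNameSource = "flag" | "env" | "prompt";

interface SpawnNameDecision {
  source: SpawnNameSource;
  name?: string; // undefined when the caller still has to prompt
}

function resolveSpawnName(
  argv: string[],
  env: Record<string, string | undefined>,
): SpawnNameDecision {
  // Assumption: both `--name foo` and `--name=foo` are accepted.
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i] ?? "";
    const next = argv[i + 1];
    if (arg === "--name" && next) {
      return { source: "flag", name: next };
    }
    if (arg.startsWith("--name=") && arg.length > "--name=".length) {
      return { source: "flag", name: arg.slice("--name=".length) };
    }
  }
  // SPAWN_NAME already set: skip the interactive prompt.
  const fromEnv = env.SPAWN_NAME?.trim();
  if (fromEnv) {
    return { source: "env", name: fromEnv };
  }
  // Nothing supplied: the caller should prompt interactively.
  return { source: "prompt" };
}
```

A caller would only invoke the interactive prompt when `source` is `"prompt"`, which matches the "skip when already set" behaviour listed above.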
A
461d945212
chore: bump CLI version to 0.5.32 (#1666)
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 20:30:14 -08:00
A
760fa22dea
fix: bump fly default VM to 4GB, add 10GB volume, hide keepalive dots (#1654)
- Default VM memory: 1024MB → 4096MB (all agents except ZeroClaw
  which stays at 1024MB). Prevents OOM kills during native installs.
- Attach a 10GB persistent volume at /data with a /root/work symlink
  so agents have enough storage to clone repos and work.
  Configurable via FLY_VOLUME_SIZE env var.
- Keepalive: changed dots to spaces so they're invisible in the terminal.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 17:56:02 -08:00
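The defaults listed in this commit could be resolved along these lines. This is a minimal sketch, not the project's code: `resolveFlySizing` is a hypothetical name, the agent identifier `"zeroclaw"` is assumed, and `FLY_VOLUME_SIZE` is assumed to hold a whole number of gigabytes.

```typescript
interface FlyMachineSizing {
  memoryMb: number;
  volumeGb: number;
}

const DEFAULT_MEMORY_MB = 4096; // new default from this commit
const ZEROCLAW_MEMORY_MB = 1024; // ZeroClaw stays at the old value
const DEFAULT_VOLUME_GB = 10; // persistent volume mounted at /data

function resolveFlySizing(
  agent: string,
  env: Record<string, string | undefined>,
): FlyMachineSizing {
  // Assumption: the agent is identified by the lowercase id "zeroclaw".
  const memoryMb =
    agent.toLowerCase() === "zeroclaw" ? ZEROCLAW_MEMORY_MB : DEFAULT_MEMORY_MB;

  // Assumption: FLY_VOLUME_SIZE is a positive integer number of GB.
  // Anything else falls back to the 10GB default.
  const parsed = Number.parseInt(env.FLY_VOLUME_SIZE ?? "", 10);
  const volumeGb =
    Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_VOLUME_GB;

  return { memoryMb, volumeGb };
}
```

Passing the environment in as a parameter, rather than reading a global, keeps the helper easy to exercise in tests.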
A
8650ad15d8
feat: interactive VM size and volume prompts for Fly.io (#1655)
Users can now choose VM size (1x/2x/4x shared CPU tiers) and opt into
persistent volumes during provisioning instead of getting hardcoded defaults.
FLY_VM_MEMORY env var still works for CI/headless mode.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 17:19:09 -08:00
A
b43d3f1b70
fix: combine gateway start + port wait into single SSH session (#1642)
The old flow could open 60 or more separate fly ssh console sessions to
poll port 18789 after starting the gateway daemon. Each session opens
a new WireGuard tunnel, which is slow and flaky.

Now: one SSH session starts the daemon, then polls the port in-band
with a simple for loop. Output from the loop also serves as a
keepalive for flyctl.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 16:16:08 -08:00
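One way to picture the "start, then poll in-band" approach is a helper that builds a single remote script to run in one SSH session. The sketch below is illustrative only: `buildGatewayStartScript` is a hypothetical name, the daemon start command is supplied by the caller because the commit does not show it, the remote shell is assumed to be bash (for `/dev/tcp`), and the 60-attempt default is an assumption.

```typescript
function buildGatewayStartScript(
  startCommand: string, // caller-supplied; the real daemon command is not shown in the commit
  port: number,
  maxAttempts = 60, // assumed default, one attempt per second
): string {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${port}`);
  }
  return [
    // Start the daemon in the background, detached from the session's stdout/stderr.
    `${startCommand} >/dev/null 2>&1 &`,
    // Poll the port from inside the same session; each echo doubles as a keepalive.
    `for i in $(seq 1 ${maxAttempts}); do`,
    `  if (exec 3<>/dev/tcp/127.0.0.1/${port}) 2>/dev/null; then echo "ready"; exit 0; fi`,
    `  echo "waiting ($i/${maxAttempts})"`,
    `  sleep 1`,
    `done`,
    `echo "port ${port} did not open" >&2`,
    `exit 1`,
  ].join("\n");
}
```

The resulting string would be passed as the command of a single SSH invocation, so only one tunnel is established no matter how many polling iterations run.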
A
09d9f597ac
fix: use openclaw curl installer to prevent fly ssh hang (#1640)
bun install -g openclaw spawns child processes that keep stdout/stderr
FDs open, preventing fly ssh console from detecting EOF. Replace with
the official curl installer (--no-onboard) which handles Node detection
and cleanup without leaving orphan processes on the pipe.

See: https://docs.openclaw.ai/install

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 15:58:05 -08:00
A
42f2b66b55
fix: keep stdin pipe open during fly ssh to prevent session teardown (#1638)
flyctl tears down the WireGuard transport when stdin closes ("session
forcibly closed; the remote process may still be running"). This
killed long-running commands like `bun install -g openclaw`.

Instead of calling stdin.end() immediately, keep the pipe open for
the duration of the command and close it after the process exits.
The pipe still prevents interactive prompts from hanging (no data
flows through it), but flyctl no longer interprets the closed fd
as a signal to kill the session.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 15:44:14 -08:00
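The stdin handling described in this commit might look roughly like the sketch below. It uses Node's `child_process.spawn` purely for illustration. The project's actual process API is not shown in the commit, and `runWithOpenStdin` is a hypothetical name.

```typescript
import { spawn } from "node:child_process";

// Runs a command with stdin attached to a pipe that stays open (and silent)
// until the child exits, instead of being closed right after spawn.
function runWithOpenStdin(command: string, args: string[]): Promise<number | null> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, {
      // stdin is a pipe: the child cannot read interactive input from the
      // terminal, yet it never sees EOF while it is running.
      stdio: ["pipe", "inherit", "inherit"],
    });

    child.on("error", reject);

    child.on("exit", (code) => {
      // Close the pipe only after the process has finished.
      if (child.stdin && !child.stdin.destroyed) {
        child.stdin.end();
      }
      resolve(code);
    });
  });
}
```

The only difference from the earlier behaviour is the timing of `stdin.end()`: it moves from immediately after spawn to the exit handler.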