.join("; ") produced invalid bash: &; after background command,
do; after for, then; after if. Use newline-joined string instead.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace 60×5s blind poll loop ("Cloud-init in progress N/60") with
real-time streaming of /var/log/cloud-init-output.log via tail -f
over SSH. Users now see every apt-get, curl, and error as it happens.
Background checker exits as soon as .cloud-init-complete marker
appears. 5min timeout. Brief 30s fallback poll if streaming fails.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agents declare their dependency tier (minimal/node/bun/full), and
cloud-init only installs what's needed. Lightweight agents like
OpenCode and ZeroClaw skip Node.js upgrade, Bun install, and
build-essential — saving 60-90s on boot and eliminating the
DigitalOcean cloud-init timeout.
- Add CloudInitTier type + cloudInitTier field to AgentConfig
- Add shared/cloud-init.ts: tier-to-packages mapping
- Update all 6 clouds (DO, Hetzner, AWS, GCP, Fly, Daytona)
- Bump CLI version to 0.6.8
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: don't re-prompt name for failed spawns, improve retry hint
- Reuse existing spawn name when rerunning from `spawn list` or
`spawn last` instead of prompting for a new name (#1712)
- Include --name flag in retry command hint when a spawn name
was used, e.g. `spawn claude hetzner --name my-box` (#1709)
- Bump CLI version to 0.6.5
Fixes#1712Fixes#1709
Agent: ux-engineer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* test: add unit tests for buildRetryCommand --name flag
Cover the spawnName parameter added for issue #1709:
- with name and no prompt
- with name and short prompt
- with name and long prompt (prompt-file fallback)
- with undefined/empty name (no --name flag)
- verify --name appears before --prompt in output
Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: quote --name value when spawn name contains spaces
Handle edge case where spawn names may contain spaces or quotes
by properly quoting and escaping the --name flag value.
Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: always quote --name value to prevent shell injection
Always wrap spawn names in double quotes in the retry command hint,
not just when names contain spaces. This prevents shell metacharacters
(;, |, &, etc.) in spawn names from being interpreted if users copy
the displayed retry command.
Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
shared/common.sh (3852 lines) was dead code — the entire architecture
was rewritten to TypeScript in cli/src/. No agent scripts source it
anymore. The only consumer was github-auth.sh which just needed 4
log functions (now inlined).
Remove 27 test files that spawned ~800+ real bash/bun subprocesses per
run (the root cause of slow bun test). Every shared-common-*.test.ts
file forked a real bash shell per test case to source shared/common.sh.
CLI subprocess tests spawned `bun run index.ts` per assertion. These
were integration tests, not unit tests.
Also removes:
- mock-tests CI job from test.yml (ran test/mock.sh which opens browser)
- Stale plan files referencing deleted infrastructure
- All CLAUDE.md/README.md references to the old lib/common.sh pattern
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When both OAuth flows open browser tabs back-to-back, the user may
reactively close the second tab thinking it's a duplicate. Add a 5-second
pause with a message after DO OAuth completes, only when browser auth
was actually used (skipped for env var / saved token paths).
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Prevent recursive update check during install (SPAWN_NO_UPDATE_CHECK=1)
- Increase fetch timeout from 5s to 10s for slow/cold connections
- Add 1-hour failure backoff to avoid repeated failed update attempts
- Bump CLI version to 0.6.6
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
OAuth token validation calls GET /v2/account which requires the
account:read scope. Without it, the token exchange succeeds but
validation fails with 403, falling through to manual token entry.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reuse a single readline interface across prompt() calls instead of
creating and closing a new one each time. In Bun, repeatedly calling
createInterface/close on the same stdin causes the "close" event to
fire immediately on subsequent interfaces, which resolved the prompt
with an empty string before the user could type — triggering "Token
cannot be empty".
Fixes#1707
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat: add DigitalOcean OAuth2 flow for automatic token provisioning
Implements the OAuth2 authorization code flow for DigitalOcean as an
alternative to manual API token entry. The flow mirrors the existing
OpenRouter OAuth pattern using Bun.serve() for the local callback.
Changes:
- Add tryDoOAuth() with local Bun.serve callback, CSRF state, and
code-for-token exchange via DO's /v1/oauth/token endpoint
- Add tryRefreshDoToken() for refreshing expired tokens without
re-authorization
- Extend config persistence with refresh_token, expires_at, auth_method
- Modify ensureDoToken() flow: env var -> saved config (with refresh) ->
OAuth browser flow -> manual paste fallback
- OAuth is gated on DO_OAUTH_CLIENT_ID and DO_OAUTH_CLIENT_SECRET env vars
- Add 37 tests covering config persistence, CSRF generation, code
validation, token expiry, URL construction, and feature toggle
- Bump CLI version to 0.6.5
Closes#1715
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: hardcode DO OAuth credentials, remove env var gate
Embed client_id and client_secret as constants (same pattern as gh CLI,
doctl, gcloud). OAuth is now always available — no env vars needed.
Public CLI clients cannot keep secrets confidential; security comes from
the authorization code flow itself (user consent, localhost redirect,
CSRF state, single-use codes).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add droplet:delete scope for spawn delete support
The spawn CLI's destroyServer() calls DELETE /droplets/{id} which
requires the droplet:delete scope. All its required sub-scopes
(droplet:read, regions:read, sizes:read, actions:read, image:read)
were already present.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The npm-global prefix fix from PR #1699 was lost when agent configs
were refactored from gcp/agents.ts into shared/agent-setup.ts. Without
`npm config set prefix ~/.npm-global`, npm install -g uses the system
prefix (/usr/lib/node_modules) which fails with EACCES for non-root
users on GCP.
This also fixes kilocode's postinstall script (postinstall.mjs) which
uses require.resolve() to find @kilocode/cli-linux-x64 — when npm
writes to the system prefix, the postinstall can't write the binary
symlink into the package's bin/ directory.
Fixes#1698
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Delete 5 entirely-duplicate test files and trim 9 others where the same
bash functions were tested identically in multiple places. Every removed
test has a surviving canonical copy — zero coverage lost.
Deleted (all content duplicated elsewhere):
- shared-common-decomposed-helpers.test.ts
- shared-common-oauth-retry.test.ts
- shared-common-oauth-security.test.ts
- shared-common-server-retry.test.ts
- shared-common-token-provider.test.ts
79 files / 38k lines → 74 files / 33k lines
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All clouds now use TypeScript. Convert the last holdout (AWS) from bash
lib fallback to the JS bundle download pattern, then delete all remaining
cloud bash libs and clean up stale test code.
- Convert 6 AWS agent scripts to JS bundle fallback (matching hetzner)
- Delete aws/lib/common.sh and hetzner/lib/common.sh
- Delete orphaned test/fixtures/ovh/
- Stub out dead functions in test/e2e.sh that sourced deleted libs
- Delete 3 test files that only tested cloud bash libs
- Remove dead describe blocks from 3 remaining test files
- Bump CLI version 0.6.3 → 0.6.4
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: replace python3 with bun/jq in shared scripts (#1697)
Replace python3 -c inline scripting with jq (preferred) and bun -e
fallbacks per project policy. Python is not a declared dependency;
jq and bun are the project's scripting runtimes.
Changes:
- shared/common.sh: Replace all 9 python3 -c calls with jq/bun -e
- shared/key-request.sh: Replace all 4 python3 -c calls with jq/bun -e
- check_python_available: Now checks for jq or bun instead of python3
- Update test expectations for JS semantics (true/false vs True/False,
bracket access vs .get(), null handling)
Fixes#1697
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: replace eval() with safe property access, rename check_python_available
Security: eliminate eval() from _extract_json_field() — use regex-based
bracket-notation parser to traverse JSON paths safely. The function now
extracts ['key'] and [N] segments from the expression string and
iterates through them, preventing arbitrary code execution.
Also rename check_python_available() → check_json_processor_available()
throughout the codebase (shared/common.sh, local/lib/common.sh, and
tests) since the function now checks for jq/bun, not python3.
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Automated refactor/discovery agents occasionally run tests from outside
the cli/ directory, where bunfig.toml is not loaded and this preload
never activates. When that happens, HOME stays as the real home dir
(/root on CI), so any subprocess-test-*.txt written by tests leaks
there instead of the sandbox.
Added cleanupStrayTestFiles() which runs both on preload init and on
process exit. This retroactively removes any leftover files from past
runs and prevents accumulation in future ones.
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Convert gcp, daytona, digitalocean, hetzner, sprite, and local clouds
to use shared/agent-setup.ts and shared/orchestrate.ts, matching the
pattern established by AWS and Fly. Each cloud's agents.ts is now a
~26-line thin wrapper; each main.ts uses runOrchestration().
- Delete gcp/lib/common.sh (406 lines of dead bash code)
- Delete cli/src/fly/oauth.ts and cli/src/fly/ui.ts re-export wrappers
- Fix all fly/oauth and fly/ui imports to use shared/ directly
- Update test thresholds for reduced bash cloud count
- Bump CLI version to 0.6.3
Net reduction: ~2,850 lines removed.
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GCP VMs run as a non-root user, so `npm install -g` fails when the npm
prefix points to a system directory. Ensure ~/.npm-global is configured
as the npm prefix before global installs for kilocode, codex, and
openclaw (npm fallback).
Fixes#1698
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Extract ~800 lines of duplicated agent helpers and orchestration logic
from aws/agents.ts and fly/agents.ts into shared modules:
- shared/agent-setup.ts: CloudRunner interface, installAgent,
uploadConfigFile, installClaudeCode, setupClaudeCodeConfig,
GitHub auth, config helpers, createAgents(), resolveAgent()
- shared/orchestrate.ts: CloudOrchestrator interface + 12-step
runOrchestration() pipeline
- shared/agents.ts: AgentConfig type + generateEnvConfig (single source)
Each cloud becomes a thin wrapper (~25-60 lines) that constructs a
CloudRunner/CloudOrchestrator from its provider-specific functions.
Also fixes pre-existing test breakage (aws.test.ts imported renamed
exports LIGHTSAIL_BUNDLES/BundleTier → BUNDLES/Bundle) and removes
dead aws/lib/common.sh reference from test/e2e.sh.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: convert sprite/ cloud provider from Bash to TypeScript
Makes Sprite CLI orchestration (retry, org detection, file upload) cleaner.
Converts 381-line lib/common.sh and 6 agent scripts to TS/Bun.
Fixes#1680
Agent: complexity-hunter
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: add path traversal check, fix regex injection, update test assertions
- Add '..' path traversal rejection in uploadFileSprite
- Replace RegExp constructor with string comparison in createSprite
to prevent regex injection
- Add base64 output validation in main.ts
- Update TS_CLOUDS sets and test count assertions for sprite conversion
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: update test assertions for TS-converted cloud providers
Lowered cloud lib/common.sh count from >= 7 to >= 5 and SSH-based
upload_file count from >= 4 to >= 3 to reflect sprite and digitalocean
being converted from Bash to TypeScript.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: add temp file path validation in sprite uploadConfigFile
Add path validation to ensure the temp file path stays within the
expected tmpdir() directory, preventing potential path manipulation.
The other three security review findings (path traversal, regex
injection, base64 validation) were already addressed in the previous
commit on this branch.
Agent: code-health
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: update test count assertions after sprite TS migration
Both upload-file-security and cloud-lib-source-chain had '>= 5' floor
assertions that assumed sprite had bash lib/common.sh. Now that sprite
is TS-based (no bash lib), the bash-cloud count is 4, not 5.
Agent: team-lead
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat: convert daytona/ cloud provider from Bash to TypeScript
Replaces fragile bash SSH workarounds with structured TypeScript.
Converts 341-line lib/common.sh and 6 agent scripts to TS/Bun.
Fixes#1679
Agent: ux-engineer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: update test assertions for daytona TypeScript conversion
Add daytona to TS_CLOUDS set and lower cloud count thresholds since
daytona no longer has a bash lib/common.sh.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: address security review - path traversal, command injection, test counts
- Add path traversal rejection (reject '..') in uploadConfigFile and uploadFile
- Use single quotes around remotePath in shell commands to prevent expansion
- Add strict remotePath validation to uploadConfigFile (allowlist regex)
- Update TS_CLOUDS sets across all test files for daytona TS conversion
- Adjust upload-file-security test count expectations for TS migrations
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: update test assertions for TS-converted cloud providers
After converting daytona and digitalocean from Bash to TypeScript, the
number of bash-based cloud libs dropped. Updated expected counts:
- cloud-lib-source-chain: >= 6 to >= 5
- cloud-error-guidance create_server: >= 5 to >= 4
- upload-file-security SSH clouds: >= 4 to >= 3
- shared-common-post-session SSH clouds: >= 4 to >= 3
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat: convert hetzner/ cloud provider from Bash to TypeScript
Migrates hetzner/ to the same TypeScript pattern as fly/ and local/:
- Creates cli/src/hetzner/{main.ts,hetzner.ts,agents.ts}
- Replaces 6 bash agent scripts with thin bun shims
- Reuses cli/src/fly/{oauth.ts,ui.ts} for cross-cloud functionality
- Adds hetzner to TS_CLOUDS in manifest-integrity tests
- Bumps CLI version to 0.5.35
Why: Consistent TypeScript architecture across cloud providers enables
type-safe API interactions, better error handling for Hetzner's unusual
"error: null" success response format, and eliminates bash JSON parsing.
Fixes#1676
Agent: code-health
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: validate remotePath in uploadConfigFile to prevent command injection
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: convert digitalocean/ cloud provider from Bash to TypeScript
Replaces python3 usage (violates CLAUDE.md) with native TypeScript.
Converts 277-line lib/common.sh and 6 agent scripts to TS/Bun.
Fixes#1677
Agent: code-health
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: update tests for digitalocean TypeScript conversion
Add digitalocean to TS_CLOUDS set so bash -n tests skip the removed
lib/common.sh. Skip digitalocean scripts in mock tests (same as fly).
Adjust create_server count threshold from 6 to 5.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Creates cli/src/local/{main,local,agents}.ts following the Fly.io
pattern. All 6 agent .sh files replaced with thin bun shims.
Extracts shared oauth.ts and ui.ts to cli/src/shared/ for reuse
across cloud providers. Updates fly/ to re-export from shared.
Fixes#1681
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: validate env-loaded tokens to prevent curl config injection
_load_token_from_env() performed zero validation on API token values
from environment variables before they reached _curl_api(), which
passes auth headers via curl's -K stdin config. A token containing a
double-quote could break out of the config's quoted string and inject
additional curl directives (e.g., redirecting the request to an
attacker-controlled server).
_load_token_from_config() already validates with the same regex
(^[a-zA-Z0-9._/@:+=, -]+$). This adds the same check to the env
path, closing the defense-in-depth gap across all token-loading paths.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: pre-built Docker image for OpenClaw on Fly.io (#1686)
Eliminates the slow waitForCloudInit() + bun install phase by booting
a pre-built image with Node.js, bun, and openclaw already installed.
The image is rebuilt daily via GitHub Actions to pick up new releases.
Other agents are unaffected — they still use ubuntu:24.04 + cloud-init.
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use positional params in macOS curl path to prevent command injection (#1685)
**Why:** The macOS fallback in `request_missing_cloud_keys()` used
`${providers_json}` directly in a curl `-d` argument. If `providers_json`
contained shell metacharacters (e.g., from a failed python3 call), this
could execute arbitrary commands. The Linux path already used the safe
positional parameter pattern (`bash -c '...' -- "$1" "$2" "$3"`).
Unifies both code paths to use the safe positional parameter pattern.
Fixes#1684
Agent: team-lead
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: update test to expect rejection of tokens with newlines
The _load_token_from_env validation now rejects tokens containing
newline characters to prevent curl config injection. Update the test
to expect exit code 1 and verify the warning message is emitted.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Eliminates the slow waitForCloudInit() + bun install phase by booting
a pre-built image with Node.js, bun, and openclaw already installed.
The image is rebuilt daily via GitHub Actions to pick up new releases.
Other agents are unaffected — they still use ubuntu:24.04 + cloud-init.
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: replace require() with ESM imports in bun eval scripts (#1669)
Fixes#1669
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: update test assertion to match ESM import pattern
The test expected require('http') but the PR changed shared/common.sh
to use ESM imports. Update assertion to expect import http from 'http'.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Replaces all references to zeroclaw-labs/zeroclaw/main/scripts/install.sh
with a pinned commit SHA (a117be64). This prevents supply chain attacks via
the mutable 'main' branch reference in curl|bash installer patterns.
Other curl|bash patterns (bun.sh, claude.ai, sprites.dev) use HTTPS to
vendor-controlled domains with no stable commit SHA to pin to -- these
follow industry-standard installer patterns and are left as-is.
Fixes#1670
-- refactor/ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
cmdRun (spawn <agent> <cloud>) was not collecting or passing the spawn
name, so SPAWN_NAME was never set in the script environment and the
history record lacked a name. cmdRunHeadless had the same gap.
- Add promptSpawnName() call to cmdRun and pass result to execScript
- Wire spawnName through HeadlessOptions to runBashHeadless
- Add --name CLI flag to set SPAWN_NAME from the command line
- Skip interactive name prompt when SPAWN_NAME is already set
- Bump CLI to 0.5.33
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Native npm packages (node-gyp, etc.) need gcc/make/libc-dev to compile.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace direct nodejs.org tarball download with the same apt + n
approach used by all other clouds: install nodejs/npm via apt, then
upgrade to v22 LTS via `n`. Also adds zsh to base packages (matching
cloud-init userdata) and removes xz-utils (no longer needed).
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use `bun install -g openclaw` (consistent with Hetzner and other VM
clouds) instead of the curl installer script. Bun and Node.js are
already available from the cloud-init phase.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add performance-1x/2x/4x (dedicated vCPU) options alongside existing
shared CPU tiers. Thread cpuKind through to the Machines API cpu_kind
field so users can provision dedicated VMs for consistent performance.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The interactive volume prompt added unnecessary friction to the
provisioning flow. Volume support remains in fly.ts for programmatic
use via ServerOptions.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When selecting a previous spawn from `spawn ls`, the first option is now
"Enter <agent>" which SSHes into the VM and launches the agent directly,
instead of just opening a plain SSH shell.
The exact launch command is captured at spawn time and stored in the
connection record, so dynamic state (PATH setup, env sourcing) is
preserved for reconnection.
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Write GITHUB_TOKEN to a temp file with 0600 permissions instead of
inlining in the command string, preventing exposure via ps aux and
/proc/*/cmdline.
Fixes#1659
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Thread agent and cloud slugs through to the OpenRouter OAuth URL so
OpenRouter knows which agent/cloud combination the user is deploying.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Default VM memory: 1024MB → 4096MB (all agents except ZeroClaw
which stays at 1024MB). Prevents OOM kills during native installs.
- Attach a 10GB persistent volume at /data with a /root/work symlink
so agents have enough storage to clone repos and work.
Configurable via FLY_VOLUME_SIZE env var.
- Keepalive: changed dots to spaces so they're invisible in terminal.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fly machine exec drops the session when there's no output for too long.
OpenClaw's installer runs silently in non-TTY mode, producing no output
for minutes while npm builds native deps — triggering "ssh shell: session
forcibly closed".
Fix: run the install command in the background and print a dot every 5s
to keep the SSH session alive. Applies to all agents on Fly, not just
OpenClaw.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The validatePromptFilePath function used CJS require("path") inline,
violating the project's ESM-only rule. This could trigger Bun
compatibility issues since the project is "type": "module".
Replace with a top-level `import { resolve } from "path"` statement.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Users can now choose VM size (1x/2x/4x shared CPU tiers) and opt into
persistent volumes during provisioning instead of getting hardcoded defaults.
FLY_VM_MEMORY env var still works for CI/headless mode.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`conn.server_id` holds the Fly machine ID (e.g. "d8d91a0c4e1783")
while `conn.server_name` holds the app name (e.g. "spawn-abc123").
`flyDestroyServer()` calls `/apps/${name}/machines` — it expects the
app name, not a machine ID.
The `server_id || server_name` precedence meant every `spawn delete`
for Fly.io passed a machine ID, causing the Fly API to return
"Could not find App" and leaving the VM running and accumulating charges.
Fix: swap precedence to `server_name || server_id` for the Fly.io path.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The old flow opened up to 60+ separate fly ssh console sessions to
poll port 18789 after starting the gateway daemon. Each session opens
a new WireGuard tunnel which is slow and flaky.
Now: one SSH session starts the daemon, then polls the port in-band
with a simple for loop. Output from the loop also serves as a
keepalive for flyctl.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes for "session forcibly closed" during openclaw install:
1. The openclaw install command piped all output to /dev/null, so
flyctl saw zero bytes flowing and killed the session. Removed
the >/dev/null 2>&1 redirect.
2. Added a background keepalive to runServer that prints a dot to
stderr every 10s. This prevents flyctl from tearing down silent
SSH sessions even if the command itself produces no output for
a while.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bun install -g openclaw spawns child processes that keep stdout/stderr
FDs open, preventing fly ssh console from detecting EOF. Replace with
the official curl installer (--no-onboard) which handles Node detection
and cleanup without leaving orphan processes on the pipe.
See: https://docs.openclaw.ai/install
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two bugs in fly.ts:
1. getCmd() only uses `which` in a subprocess, but Bun.spawnSync inherits
the original PATH — not process.env mutations from ensureFlyCli(). After
installing flyctl to ~/.fly/bin, getCmd() still can't find it. Fix: add
a filesystem fallback that checks ~/.fly/bin directly.
2. ensureFlyToken() resolves the token and saves it to config, but never
writes it to process.env.FLY_API_TOKEN. When fly ssh console runs as a
subprocess, it has no token and can't authenticate. Fix: add
syncTokenToEnv() and call it on every successful token resolution path.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Replace the Node.js install chain (apt nodejs+npm → npm install -g n →
n 22 → symlinks) with a single curl of the v22 binary tarball from
nodejs.org. Eliminates python3 dependency, npm bloat, and the n version
manager. Bun is installed first as the primary package manager.
Fly-only change — other clouds unchanged pending validation.
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
flyctl tears down the WireGuard transport when stdin closes ("session
forcibly closed; the remote process may still be running"). This
killed long-running commands like `bun install -g openclaw`.
Instead of calling stdin.end() immediately, keep the pipe open for
the duration of the command and close it after the process exits.
The pipe still prevents interactive prompts from hanging (no data
flows through it), but flyctl no longer interprets the closed fd
as a signal to kill the session.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After PR #1602 converted fly/ from bash to TypeScript, fly/lib/common.sh was
removed. However, buildDeleteScript() still generated a bash script that tried
to source it via curl, causing spawn delete to always fail for Fly.io servers
with a curl exit 22 (404). Users were left with orphaned apps incurring charges.
Fix: add fly-specific path in execDeleteServer() that calls ensureFlyCli(),
ensureFlyToken(), and destroyServer() directly from the TypeScript fly module,
bypassing the bash script path entirely. Remove the dead case "fly" from
buildDeleteScript().
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>