* test: Remove duplicate and theatrical tests
- cmd-listing-output: Fix always-pass guard: replace `if (localLine)` with an assertion that localLine is defined, followed by the check
- with-retry-result: Replace conditional if (!r.ok) expects with toMatchObject
- run-path-credential-display: Remove 96 lines of duplicate tests:
  - parseAuthEnvVars for credential status (duplicate of commands-exported-utils.test.ts)
  - credential function edge cases with weak OR assertions (duplicate of credential-hints.test.ts)
- Migrate 2 unique edge cases (whitespace trimming, empty separator) to commands-exported-utils.test.ts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: apply biome format to test files in qa/dedup-scanner
Agent: team-lead
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
* fix(hetzner): update deprecated cx22/cpx21 server types to cx23/cpx22
Hetzner deprecated the entire cx*2 and cpx*1 server lines on Jan 1, 2026.
New orders fail with "server type is deprecated". Updates to the current
gen3 CX and gen2 CPX lines (cx23, cx33, cx43, cx53, cpx22, cpx32).
Also shows the server type picker by default instead of requiring --custom,
so users can choose their instance size on every deploy.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(zeroclaw): append autonomy config instead of overwriting onboard output
zeroclaw onboard generates a complete config with required fields like
default_temperature. Our setup was overwriting that with a partial config
missing required fields, causing a crash loop on startup. Now appends
the security/shell settings instead so onboard's fields are preserved.
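A minimal sketch of the difference between the old and new behavior, assuming a TOML-style config file. The file name `config.toml`, the `[autonomy]` table, and its key are hypothetical placeholders, since the actual security/shell settings are not listed here. Appending is only safe as long as onboard's output does not already define the same table.

```typescript
import { appendFileSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical settings block; the real security/shell keys are not shown in this commit.
const AUTONOMY_SETTINGS = '\n[autonomy]\nlevel = "full"\n';

// Old behavior: replaces onboard's complete config, dropping required
// fields such as default_temperature.
function overwriteConfig(path: string): void {
  writeFileSync(path, AUTONOMY_SETTINGS);
}

// New behavior: keeps everything onboard wrote and adds our settings at the end.
function appendConfig(path: string): void {
  appendFileSync(path, AUTONOMY_SETTINGS);
}

// Demo helper: write a fake onboard config into a temp dir, apply a strategy,
// and return the resulting file contents.
function demo(apply: (path: string) => void): string {
  const path = join(mkdtempSync(join(tmpdir(), "zeroclaw-")), "config.toml");
  writeFileSync(path, "default_temperature = 0.7\n");
  apply(path);
  return readFileSync(path, "utf8");
}
```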
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: fix biome formatting in agent-setup.ts
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Agent: pr-maintainer
---------
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
DigitalOcean's token exchange endpoint requires client_secret and does
not support PKCE-only public client flows. The embedded secret follows
the same pattern used by gh CLI, doctl, gcloud, and az CLI. Expanded
the comment to explain:
- Why client_secret is required (no PKCE support)
- Why embedding it is acceptable (public client, RFC 6749 §2.1)
- What security mechanisms are actually relied upon
- When the secret should be removed (if DO adds PKCE)
Fixes #1980
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(aws): increase OpenClaw gateway timeout to 120s and default to medium bundle
OpenClaw gateway consistently times out on AWS Lightsail because the 60s
timeout is too short for cold starts (npm install of 713 packages + gateway
init). Doubles the timeout to 120s and sets OpenClaw's default bundle
to medium_3_0 (4 GB RAM), since OpenClaw is too heavy for nano (512 MB).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: resolve openclaw binary path for setsid and add npm-global to Sprite PATH
setsid replaces the process image and doesn't inherit the parent shell's
exported PATH, causing "No such file or directory" on Sprite (and potentially
other clouds). Fix by resolving the full binary path with `command -v` before
passing it to setsid. Also adds ~/.npm-global/bin to Sprite's persisted shell
PATH config so openclaw is discoverable in all session types.
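A sketch of the resolve-then-setsid shape, written as a TypeScript command builder. `buildSetsidCommand` and the `gateway` argument used in the usage line are hypothetical; the exact remote command the CLI emits is not shown here. The binary is looked up with `command -v` while PATH includes `~/.npm-global/bin`, and setsid receives the resulting absolute path.

```typescript
// Quote a value for POSIX sh by wrapping it in single quotes.
function shQuote(s: string): string {
  return `'${s.replace(/'/g, "'\\''")}'`;
}

// Hypothetical helper: resolve the binary in the current shell, then hand
// setsid an absolute path so setsid itself never has to search PATH.
function buildSetsidCommand(binary: string, args: string[]): string {
  // The binary name is interpolated unquoted, so restrict it to safe characters.
  if (!/^[A-Za-z0-9._-]+$/.test(binary)) {
    throw new Error(`refusing unsafe binary name: ${binary}`);
  }
  const argStr = args.map(shQuote).join(" ");
  return [
    'export PATH="$HOME/.npm-global/bin:$PATH"',
    `BIN="$(command -v ${binary})" || { echo "${binary}: not found" >&2; exit 127; }`,
    `setsid "$BIN"${argStr ? ` ${argStr}` : ""}`,
  ].join("; ");
}
```

Usage: `buildSetsidCommand("openclaw", ["gateway"])` yields a `;`-separated command suitable for `bash -c`.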
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(codex): update wire_api from "chat" to "responses"
Codex CLI dropped support for wire_api = "chat" — it now requires
"responses". This was never updated since the original codex integration,
causing an immediate crash loop on launch.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: enable GitHub CLI auth for all agents, not just Claude Code
Only Claude Code had preProvision: promptGithubAuth — all other agents
(codex, openclaw, opencode, kilocode, zeroclaw) skipped GitHub auth
entirely. These are all coding agents that need gh access for PRs,
cloning, etc.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add missing spawn import that crashes headless mode (#1981)
runBashHeadless calls spawn() from node:child_process at line 1112,
but only spawnSync was imported. This causes a ReferenceError crash
whenever --headless mode is used.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
- Replace local rec() helper in hetzner.ts with shared toRecord() from
@openrouter/spawn-shared, eliminating a duplicate implementation that
already existed in the shared package with equivalent behavior
- Fix stale comment in key-request.sh referencing non-existent qa.sh
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
* feat!: remove Fly.io cloud provider support
Drop Fly.io as a supported cloud provider. Sprite (which uses Fly.io
infrastructure internally) is retained.
- Delete packages/cli/src/fly/ module, sh/fly/ scripts, fixtures/fly/
- Remove fly cloud entry and 6 fly matrix entries from manifest.json
- Remove fly imports, destroy cases, and connection handlers from commands.ts
- Remove fly-ssh sentinel from security.ts
- Port E2E test suite from Fly.io to AWS Lightsail (fly-e2e.sh → aws-e2e.sh)
- Update README (7 clouds, 42 combinations), CLAUDE.md, and skill prompts
- Clean up fly references in build config, gitignore, icon sources
- Bump CLI version to 0.11.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: restore Docker image build under sh/docker/
Move openclaw Dockerfile from sh/fly/docker/ to sh/docker/ and rename
workflow from fly-docker.yml to docker.yml with updated paths.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: fix extra blank lines in commands.ts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
The manifest validation (isValidManifest) describe block in
commands-swap-resolve.test.ts used an always-pass anti-pattern:
try { await loadManifest() } catch {} followed by console.error.some()
assertions. This pattern silently passes even when the expected rejection
path is not triggered.
The same coverage (missing agents/clouds/matrix fields, null data, HTTP
errors, valid manifests) is already provided by manifest-cache-lifecycle.test.ts
with proper expect().rejects.toThrow() assertions.
Remove the duplicate 145-line block. No regression: pass/fail counts unchanged.
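A dependency-free sketch of why the removed pattern always passes and what `rejects.toThrow()` adds. `alwaysPassTest` and `expectRejects` are hypothetical stand-ins for illustration, not the test runner's implementation.

```typescript
// Anti-pattern: when load() resolves, the catch block (where the assertions
// lived) never runs, so the test finishes with zero assertions and passes.
async function alwaysPassTest(load: () => Promise<unknown>): Promise<void> {
  try {
    await load();
  } catch {
    // assertions on console.error calls would go here
  }
}

// Stand-in for expect(promise).rejects.toThrow(pattern): fails when the
// promise resolves, and checks the error message when it rejects.
async function expectRejects(p: Promise<unknown>, pattern?: RegExp): Promise<Error> {
  const outcome = await p.then(
    () => ({ rejected: false as const }),
    (err: unknown) => ({ rejected: true as const, err }),
  );
  if (!outcome.rejected) throw new Error("expected promise to reject, but it resolved");
  if (!(outcome.err instanceof Error)) throw new Error("rejected with a non-Error value");
  if (pattern && !pattern.test(outcome.err.message)) {
    throw new Error(`unexpected error message: ${outcome.err.message}`);
  }
  return outcome.err;
}
```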
-- qa/dedup-scanner
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor: remove dead offerGithubAuth exports from cloud agents.ts files
The per-cloud offerGithubAuth re-exports in each cloud's agents.ts were
never called from outside their own module. The actual GitHub auth
orchestration is handled by shared/orchestrate.ts which calls
offerGithubAuth from shared/agent-setup.ts directly.
Also update stale comment in sh/test/fixtures/_shared_agent_assertions.sh
that referenced mock.sh, a test harness file that no longer exists in
the repository.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: collapse multi-line imports to single-line per biome format
After removing offerGithubAuth exports, the remaining 2-import blocks
should be single-line. Also collapse fly/agents.ts 4-import block and
remove trailing blank line.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
* test: Remove duplicate and theatrical tests
- Remove aws/agents describe block from aws.test.ts — it duplicated
the identical resolveAgent, agent configs, and generateEnvConfig
tests already present in fly.test.ts; both test the same shared
createAgents/resolveAgent logic from shared/agent-setup.ts
- Remove duplicate dotenv + interactive_prompts checks from
manifest-type-contracts.test.ts "Agent optional field types" section
— these are fully covered by the dedicated "Dotenv configuration"
and "Interactive prompts structure" sections below
- Fix always-skip test in history.test.ts: guard was silently skipping
when running as root (CI environment); replaced with explicit early
return inside block statement
- Fix conditional expects in commands-display.test.ts: the
if (line.includes("cloud")) / if (line.includes("agent")) guards
were unnecessary since every agent/cloud line always contains the
count string; rewrote to unconditional output assertions
- Fix redundant if (resolved) guard in run-path-credential-display.test.ts
after expect(resolved).toBe("claude") already guarantees non-null
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: fix biome format issues in test files
Remove trailing blank line in aws.test.ts and expand single-line
if-block to multi-line in history.test.ts per biome format rules.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
* fix: add ~/.npm-global/bin to OpenClaw PATH for gateway, launch, and reconnect
OpenClaw installs to ~/.npm-global/bin/ via npm, but startGateway() and
launchCmd() only included ~/.bun/bin and ~/.local/bin in PATH — so the
`openclaw` binary was never found on non-Fly clouds (DigitalOcean, Hetzner,
AWS, GCP). Fly was unaffected because it uses setupOpenclawBatched() which
correctly includes the npm-global path.
Three fixes:
1. startGateway(): add $HOME/.npm-global/bin to PATH
2. launchCmd(): add $HOME/.npm-global/bin to PATH
3. install(): persist PATH to ~/.bashrc and ~/.zshrc (matching codex/kilocode
pattern) so reconnects via `spawn openclaw <cloud> --name ...` also work
Closes #1965
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: correct command chaining and idempotency in npm-global PATH setup
- Use curly braces to group grep||echo so PATH append only runs after
successful npm install (fixes operator precedence bug)
- Skip ~/.zshrc modification when file doesn't exist (avoids creating
it on non-zsh systems)
- Use grep -qF for literal string matching (no regex interpretation)
- Apply fix to all three affected agents: openclaw, codex, kilocode
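A sketch of the snippet shape these fixes describe, written as a TypeScript string builder; the function name and exact layout are assumptions. In sh, `&&` and `||` have equal precedence and associate left, so `install && grep -q X rc || echo X >> rc` still runs the `echo` when the install fails. The braces scope the `||` to the rc-file update.

```typescript
// Wrap a string in single quotes for POSIX sh.
function shQuote(s: string): string {
  return `'${s.replace(/'/g, "'\\''")}'`;
}

// Hypothetical builder for the install + PATH-persistence snippet.
function installWithPersistedPath(installCmd: string, binDir = "$HOME/.npm-global/bin"): string {
  // Written to rc files verbatim; $HOME and $PATH expand when the rc file is sourced.
  const line = shQuote(`export PATH="${binDir}:$PATH"`);
  // grep -qF: quiet, literal match, so re-running does not duplicate the line.
  const appendOnce = (rc: string) => `{ grep -qF ${line} ${rc} 2>/dev/null || echo ${line} >> ${rc}; }`;
  return [
    installCmd,
    appendOnce("~/.bashrc"),
    // Leave ~/.zshrc alone on systems where it does not exist.
    `{ [ ! -f ~/.zshrc ] || ${appendOnce("~/.zshrc")}; }`,
  ].join(" && ");
}
```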
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Two tests used try/catch with assertions in both branches, meaning they
passed whether loadManifest succeeded or threw. The comment claimed local
manifest fallback could be used, but tryLoadLocalManifest() returns null
in test environments (NODE_ENV=test), so the function always throws here.
Replace with expect().rejects.toThrow() which fails if no error is thrown
and eliminates the banned `err: any` type annotation.
Agent: test-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: add npm-global/bin to PATH for openclaw startGateway and launchCmd
Fixes crash where openclaw gateway fails to start on non-Fly clouds
(DigitalOcean, Hetzner, AWS, GCP) because ~/.npm-global/bin was absent
from PATH in startGateway() and launchCmd(). Fly was unaffected because
setupOpenclawBatched() already included the correct PATH.
Fixes #1965
Agent: code-health
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* style: fix Biome format error on launchCmd line
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The "no active servers" message suggested `spawn list --non-interactive`,
but --non-interactive is not a recognized CLI flag. Running it would
trigger an "Unknown flag" error since checkUnknownFlags() rejects it
before any subcommand dispatch.
Replace with `spawn list | cat`, which correctly forces non-interactive
output by making process.stdout.isTTY falsy.
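A minimal sketch of the check this relies on; `isInteractiveOutput` is a hypothetical name, not the CLI's actual function. When stdout is a pipe, `process.stdout.isTTY` is falsy, so interactive rendering is skipped.

```typescript
// Only the isTTY flag matters for this decision.
interface MaybeTTY {
  isTTY?: boolean;
}

// Interactive rendering only when stdout is a real terminal; with
// `spawn list | cat`, stdout is a pipe and isTTY is not true.
function isInteractiveOutput(stdout: MaybeTTY): boolean {
  return stdout.isTTY === true;
}
```

At the call site this would be `isInteractiveOutput(process.stdout)`.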
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The uploadFile function interpolated base64-encoded file content
directly into a shell command string via ${b64}, allowing potential
shell metacharacter injection and RCE on the Fly.io machine.
Fix: pipe base64 data through stdin instead of embedding it in
the command string, and add base64 character validation as
defense-in-depth (matching the pattern in daytona.ts).
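A sketch of the safer shape, assuming the remote side decodes with `base64 -d`. `prepareUpload` is hypothetical, and how stdin is attached to the Fly machine exec is not shown here. The command string stays constant apart from the quoted path; the payload never enters it.

```typescript
// Strict base64: standard alphabet, optional "=" padding, length a multiple of 4.
// Anything else ($, `, ;, quotes, spaces) is rejected before it nears a shell.
function isStrictBase64(data: string): boolean {
  return data.length % 4 === 0 && /^[A-Za-z0-9+/]*={0,2}$/.test(data);
}

// Hypothetical upload preparation: returns the fixed command and the data
// that should be written to the child's stdin.
function prepareUpload(remotePath: string, content: Buffer): { command: string; stdin: string } {
  const b64 = content.toString("base64");
  if (!isStrictBase64(b64)) throw new Error("refusing upload: payload is not strict base64");
  const quotedPath = `'${remotePath.replace(/'/g, "'\\''")}'`;
  return { command: `base64 -d > ${quotedPath}`, stdin: b64 };
}
```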
Fixes#1961
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
- Remove CACHE_DIR dead export from manifest.ts (was defined but never imported anywhere)
- Add parseJsonObj() to @openrouter/spawn-shared for parsing JSON objects
- Remove 4x duplicate local parseJson/LooseObject definitions from hetzner, digitalocean, daytona, fly cloud modules
- Remove now-unused `import * as v from "valibot"` from all 4 cloud modules
- Bump CLI to 0.10.24
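The shared helper's real signature is not given here. A plausible sketch, assuming it returns a plain object or null; the shared version may validate with valibot rather than the manual checks below.

```typescript
type JsonObj = Record<string, unknown>;

// Parse text and return it only if it is a plain JSON object; invalid JSON,
// arrays, null, and primitives all yield null.
function parseJsonObj(text: string): JsonObj | null {
  let value: unknown;
  try {
    value = JSON.parse(text);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) return null;
  return value as JsonObj;
}
```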
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
- manifest-helpers.test.ts: 7 tests used try/catch where the catch block
held all assertions. Since loadManifest() loads the local manifest.json
when NODE_ENV is not "test", these tests passed silently with 0 assertions.
Fix: set NODE_ENV=test + call _resetCacheForTesting() in beforeEach, and
replace try/catch with expect(...).rejects.toThrow(). Also remove `any`
type annotations on agentKeys/cloudKeys helper manifests.
- security-edge-cases.test.ts: "should use custom field name in error messages"
used a manual guard (throw new Error in try) instead of expect().toThrow().
Replace with 2 clean expect(() => ...).toThrow() calls.
- prompt-file-security.test.ts + security.test.ts: tests that checked multiple
error message properties used try/catch with `catch (e: any)`. Replace with
proper instanceof narrowing so the caught value is typed without `any`.
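A sketch of the `instanceof` narrowing approach. `captureError` is a hypothetical helper and `validatePromptFile` a hypothetical stand-in for the code under test.

```typescript
// Run fn and return the Error it throws; fail if it throws nothing or a non-Error.
// In `catch (e)`, e is unknown; instanceof narrows it to Error without `any`.
function captureError(fn: () => unknown): Error {
  try {
    fn();
  } catch (e) {
    if (e instanceof Error) return e;
    throw new Error(`expected an Error instance, got ${String(e)}`);
  }
  throw new Error("expected function to throw");
}

// Hypothetical validator standing in for the code under test.
function validatePromptFile(path: string): void {
  if (path.split("/").includes("..")) {
    throw new Error(`Prompt file path must not contain "..": ${path}`);
  }
}
```

Several properties can then be asserted on the returned, typed error, e.g. `captureError(() => validatePromptFile("../x")).message`.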
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* test: add coverage for cloud-init tier selection functions
getPackagesForTier, needsNode, and needsBun had zero test coverage
despite non-trivial branching logic (4-way tier switch). Any change
to package lists or tier membership would be silently undetected.
Agent: test-engineer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: format cloud-init.test.ts to pass biome format check
Agent: team-lead
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
spawn list in interactive mode showed "No spawns recorded yet" even when
spawn history existed but no active servers were reachable (e.g. after a
failed spawn or deleted server). Now shows the correct count and hints.
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The opencode project migrated from github.com/anomalyco/opencode to
github.com/sst/opencode. The old org's releases may no longer be
updated, causing opencode provisioning to fail.
Updates:
- Release download URL in agent-setup.ts
- url, creator, and repo fields in manifest.json
- Agent table link in README.md
Fixes #1948
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* test: Remove duplicate and theatrical tests
- Remove duplicate getScriptFailureGuidance describe block from
download-and-failure.test.ts (already covered by script-failure-guidance.test.ts)
- Remove duplicate getStatusDescription and getErrorMessage describe blocks
from download-and-failure.test.ts (covered by commands-exported-utils.test.ts)
- Remove duplicate buildRetryCommand, isRetryableExitCode, getScriptFailureGuidance,
and getErrorMessage describe blocks from run-path-credential-display.test.ts
(all covered by dedicated test files)
- Remove duplicate hasCloudCredentials and credentialHints describe blocks
from run-path-credential-display.test.ts (covered by cloud-credentials.test.ts
and credential-hints.test.ts respectively)
- Fix always-pass conditional patterns in manifest-type-contracts.test.ts:
remove tautological "at least one agent uses X" tests that only registered
when the condition was already true, making them guaranteed-pass noise
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
-- qa/dedup-scanner
* fix: Apply biome format to fix trailing blank lines in test files
Remove trailing blank lines in download-and-failure.test.ts and
run-path-credential-display.test.ts to satisfy biome format check.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
-- qa/team-lead
---------
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Remove the `runWithRetry` functions exported from 4 cloud modules (aws, hetzner, gcp, digitalocean),
which were defined but never called anywhere in the codebase. Only `fly.ts` uses its own
`runWithRetry` internally, so that definition is preserved.
Also bump CLI version to 0.10.22 per version policy.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Daytona was the only cloud provider without process timeouts in
runServer() and runServerCapture(). All other providers (AWS, Fly,
Hetzner, DigitalOcean, GCP) implement setTimeout + killWithTimeout
to prevent the CLI from hanging forever on stalled remote commands.
This adds the same timeout pattern: default 300s, configurable via
the timeoutSecs parameter that the CloudRunner interface already
declares but Daytona was silently ignoring.
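The timeout pattern, sketched against a minimal process shape. `waitWithTimeout` is a hypothetical name and `kill` stands in for killWithTimeout(); the 300s default mirrors the description above.

```typescript
// Race the process exit against a timer that kills the process.
// The timer is always cleared, so a fast exit never leaves it pending.
async function waitWithTimeout(
  exited: Promise<number>,
  kill: () => void,
  timeoutSecs = 300,
): Promise<{ code: number; timedOut: boolean }> {
  let timedOut = false;
  const timer = setTimeout(() => {
    timedOut = true;
    kill();
  }, timeoutSecs * 1000);
  try {
    const code = await exited;
    return { code, timedOut };
  } finally {
    clearTimeout(timer);
  }
}
```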
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* test: remove 5 duplicate and theatrical test files
Remove test files that are fully duplicated by more comprehensive
counterparts, plus one theatrical test that only grep-checks shell
script text without testing behavior.
Duplicates removed:
- manifest-validation.test.ts (subset of manifest-cache-lifecycle.test.ts)
- matrix-compact-footer.test.ts (subset of commands-exported-utils.test.ts)
- commands-output.test.ts (subset of commands-display.test.ts)
- cloud-info.test.ts (subset of commands-cloud-info.test.ts)
Theatrical test removed:
- install-script-validation.test.ts (reads install.sh as string, checks
substring presence -- tests that functions "exist" not that they work)
All 1657 remaining tests pass. Zero regressions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test: Fix always-pass pattern and stale comments
- integration.test.ts: remove conditional `if (cacheExists)` block that
silently skipped the cache-file assertion when the file wasn't written;
the second loadManifest() call already exercises in-memory caching
without needing the conditional; remove now-unused readFileSync/existsSync
imports
- commands.test.ts: remove stale references to cloud-info.test.ts and
commands-output.test.ts (deleted in prior commit) from inline comment;
remove unused createMockManifest import
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Both fly.ts and daytona.ts defined a local `sleep` helper identical to the
one already exported from shared/ssh.ts. Remove the local copies and import
the shared function instead, consistent with all other cloud modules.
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The cmdRun path (the main user flow) was still using async
child_process.spawn for script execution. This left Bun's event loop
running while SSH (a grandchild process inside the bash script)
competed for fd 0 input bytes — causing intermittent keystroke loss.
Switch spawnBash to use spawnSync, which blocks the event loop entirely
and gives the child process exclusive terminal access. This matches
what we already did for runInteractiveCommand in #1939.
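A minimal sketch of a synchronous handoff using `spawnSync` from `node:child_process` (also available under Bun). The helper name and the fallback exit code are assumptions, not the CLI's code.

```typescript
import { spawnSync } from "node:child_process";

// While spawnSync blocks, the parent's event loop cannot poll fd 0, so the
// child (and any ssh it starts) is the only reader of keystrokes.
function runWithExclusiveTerminal(command: string, args: string[]): number {
  const result = spawnSync(command, args, { stdio: "inherit" });
  if (result.error) throw result.error;
  // status is null if the child died from a signal; report 1 in that case (a simplification).
  return result.status ?? 1;
}
```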
Also removes dead spawnCalls tracking code from cmdrun-happy-path test.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
npm install -g openclaw fails with EACCES on non-root users (e.g.,
ubuntu on AWS Lightsail) because /usr/local/lib/node_modules isn't
writable. Use the same ~/.npm-global prefix pattern already used by
codex and kilocode agents.
Fixes both the standard installAgent path and the batched
setupOpenclawBatched path (used by Fly).
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: eliminate keystroke loss during interactive agent sessions
Three root causes were identified and fixed:
1. **Event loop fd competition**: Bun.spawn with stdio:"inherit" shares
fd 0 between the parent event loop and the child SSH process. The
kernel arbitrarily splits input bytes between them, causing random
keystroke drops. Introduced spawnInteractive() using Node's
child_process.spawnSync to block the event loop entirely.
2. **Unnecessary shell layers**: AWS and GCP wrapped the SSH command in
an extra `bash -c '...'` layer, creating 3 shell processes before the
agent. Aligned with Hetzner/DO, which pass the SSH command directly.
3. **stty sane side effects**: prepareStdinForHandoff() ran `stty sane`
which enables ixon (XON/XOFF flow control), causing periodic input
freezes. Removed — setRawMode(false) is sufficient. Also removed
process.stdin.destroy() which could corrupt fd 0.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: biome format + remove stdin unref that broke async spawn
- Fix biome formatting in ssh.ts and commands.ts
- Remove process.stdin.unref() from prepareStdinForHandoff — it
allowed the event loop to exit before async child_process.spawn
finished, causing test failures and potential production issues
with the spawnBash (legacy script execution) path
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The parent process called process.stdin.resume() which put stdin into
flowing mode, making it actively read from fd 0 and discard bytes
(no listeners). This caused the parent to race with the child SSH
process for keystrokes — the kernel gave each byte to whichever
process called read() first, resulting in random keystroke drops.
Switching to pause() makes the parent stop reading from fd 0, so
Bun.spawn(stdio: "inherit") gives the child exclusive access to
the terminal input via dup2().
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
On AWS and GCP, cloud-init ran `n install 22` via `su - ubuntu` (non-root).
The n version manager needs write access to /usr/local/bin/ which the
non-root user may not have reliably in non-interactive cloud-init context.
This caused npm to not be installed/on PATH, breaking `npm install -g
openclaw` with "npm: command not found".
Fix: run n install as root (cloud-init already runs as root) so node/npm
install directly to /usr/local/bin/ which is always on PATH. This matches
what Hetzner and DigitalOcean already do. Also removes the now-unnecessary
npm global prefix configuration since /usr/local is the default.
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Ahmed Abushagur <ahmed@abushagur.com>
When spawn delete encounters a cloud API error (network timeout, 500,
auth failure), the server is still running. Marking the record as
deleted in this case hides it from spawn delete/spawn list, preventing
retry and causing untracked billing.
Only mark as deleted on: (1) successful deletion, (2) server already
gone/404. Error paths keep the record active for retry.
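The rule above as a small decision function. The `DeleteOutcome` type and both helpers are hypothetical; how each provider maps its API responses is not shown here.

```typescript
type DeleteOutcome =
  | { kind: "deleted" }
  | { kind: "not_found" } // server already gone (HTTP 404)
  | { kind: "error"; message: string }; // timeout, 5xx, auth failure, ...

// Map an HTTP status from a delete call; network errors with no status
// should be reported as kind "error" by the caller.
function classifyDeleteStatus(status: number): DeleteOutcome {
  if (status >= 200 && status < 300) return { kind: "deleted" };
  if (status === 404) return { kind: "not_found" };
  return { kind: "error", message: `HTTP ${status}` };
}

// Only mark the history record deleted when the server is definitely gone;
// on errors it may still be running (and billing), so keep it for retry.
function shouldMarkDeleted(outcome: DeleteOutcome): boolean {
  return outcome.kind === "deleted" || outcome.kind === "not_found";
}
```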
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: reset terminal state before interactive session handoff
The stdin handoff from TS orchestration to the interactive SSH session
was leaving the terminal in a dirty state, causing users to need 2+
Enter presses or random keystrokes before input worked.
Three fixes:
1. Unconditionally call setRawMode(false) instead of checking isRaw
first — @clack/core's close() already resets the flag but the
terminal can still be dirty after multiple readline instances
2. Run `stty sane` to fully reset the terminal line discipline,
undoing any damage from readline's emitKeypressEvents
3. Resume stdin instead of pausing it: Bun.spawn with stdio:"inherit"
needs an active stream, and a paused stdin causes the child to see
blocked input
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: fix Biome formatting for Bun.spawnSync call
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ahmed Abushagur <ahmed@abushagur.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
* fix: add SIGKILL fallback to process timeout kills
proc.kill() only sends SIGTERM; SSH processes stuck in network I/O can
ignore SIGTERM and cause the CLI to hang forever waiting on proc.exited.
Add killWithTimeout() to shared/ssh.ts that sends SIGTERM then SIGKILL
after a 5s grace period. Replace all 10 proc.kill() timeout sites across
Fly, AWS, DigitalOcean, GCP and Hetzner providers.
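A sketch of the SIGTERM-then-SIGKILL escalation, written against a minimal process shape so it can be exercised with a fake. The real killWithTimeout() in shared/ssh.ts may have a different signature (for example, it may not return a promise).

```typescript
type KillSignal = "SIGTERM" | "SIGKILL";

// Minimal process shape (Bun subprocesses expose kill() and exited).
interface KillableProcess {
  kill(signal?: KillSignal): void;
  exited: Promise<number>;
}

// Send SIGTERM; if the process has not exited within graceMs, send SIGKILL.
async function killWithGrace(proc: KillableProcess, graceMs = 5000): Promise<void> {
  proc.kill("SIGTERM");
  let timer: ReturnType<typeof setTimeout> | undefined;
  const graceExpired = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), graceMs);
  });
  const first = await Promise.race([proc.exited.then(() => "exited" as const), graceExpired]);
  clearTimeout(timer);
  if (first === "timeout") proc.kill("SIGKILL");
}
```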
Agent: code-health
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* chore: format files with biome
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The error messages in handleRecordAction() recommended
`spawn agent/cloud` (slash notation), but the CLI itself shows
"Tip: use a space instead of slash" when users follow that advice.
Changed to `spawn agent cloud` to match canonical syntax.
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Same pipe-buffer deadlock pattern fixed by PRs #1903, #1915, #1920, #1922.
Two instances were missed in those passes.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
All other connection fields (ip, user, server_name) are validated
against injection before being passed to shell commands, but server_id
was skipped in both cmdConnect and cmdEnterAgent despite being used as
a daytona ssh argument (line 2922). This inconsistency existed while
execDeleteServer, mergeLastConnection, and the headless code path all
correctly validated server_id.
Adds the missing `if (connection.server_id) { validateServerIdentifier(...) }`
guard in both functions, matching the existing server_name pattern.
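A sketch of the guard shape. The allow-list below is an assumption; the rules of the project's validateServerIdentifier() are not shown, so the helper here has a deliberately different, hypothetical name.

```typescript
// Assumed allow-list: letters, digits, dot, underscore, hyphen, starting with
// a letter or digit, at most 128 characters.
const SAFE_ID = /^[A-Za-z0-9][A-Za-z0-9._-]{0,127}$/;

function assertSafeIdentifier(value: string, field: string): void {
  if (!SAFE_ID.test(value)) {
    throw new Error(`Invalid ${field}: ${JSON.stringify(value)}`);
  }
}

interface ConnectionInfo {
  ip?: string;
  user?: string;
  server_name?: string;
  server_id?: string;
}

// Validate optional identifiers only when present, before any of them
// reach a shell command or ssh argument list.
function validateConnectionIds(connection: ConnectionInfo): void {
  if (connection.server_name) assertSafeIdentifier(connection.server_name, "server_name");
  if (connection.server_id) assertSafeIdentifier(connection.server_id, "server_id");
}
```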
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
PR #1920 fixed pipe buffer deadlock in runServerCapture and
waitForCloudInit but missed 6 other locations where Bun.spawn uses
"pipe" for stderr without draining it before await proc.exited.
When a child process writes >64KB to a piped stderr, the OS pipe
buffer fills, the child blocks on write(), and the parent blocks on
exited — classic deadlock.
Fix: change stderr from "pipe" to "inherit" in all 6 locations since
the stderr output is never read programmatically. This also lets
users see installation errors and SCP errors in real time.
Affected functions:
- fly.ts ensureFlyCli()
- sprite.ts ensureSpriteCli()
- gcp.ts ensureGcloudCli()
- hetzner.ts uploadFile()
- digitalocean.ts uploadFile()
- aws.ts uploadFile()
-- refactor/code-health
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The runServerCapture functions in fly.ts and daytona.ts spawn processes
with stdio: ["pipe", "pipe", "pipe"] but only drain stdout. If stderr
output exceeds the 64KB pipe buffer, the child process blocks on write
and deadlocks. This was already fixed in Hetzner, DigitalOcean, AWS,
GCP, and shared/ssh.ts (commit 2e79d71b) but Fly and Daytona were
missed.
Apply the same Promise.all pattern to drain both pipes concurrently.
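A portable sketch of the Promise.all pattern using `node:child_process` (the cloud modules use Bun.spawn, where the same idea reads both `proc.stdout` and `proc.stderr` alongside `proc.exited` in one `Promise.all`). `runCapture` is a hypothetical name.

```typescript
import { spawn } from "node:child_process";
import type { Readable } from "node:stream";

// Collect a stream fully into a string.
async function readAll(stream: Readable): Promise<string> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) chunks.push(Buffer.from(chunk));
  return Buffer.concat(chunks).toString("utf8");
}

// Drain stdout and stderr concurrently with waiting for exit, so a chatty
// stderr (larger than the pipe buffer) cannot leave the child blocked on write().
async function runCapture(
  cmd: string,
  args: string[],
): Promise<{ code: number | null; stdout: string; stderr: string }> {
  const child = spawn(cmd, args, { stdio: ["ignore", "pipe", "pipe"] });
  if (!child.stdout || !child.stderr) throw new Error("expected piped stdout/stderr");
  const exited = new Promise<number | null>((resolve, reject) => {
    child.on("error", reject);
    child.on("close", (code) => resolve(code));
  });
  const [stdout, stderr, code] = await Promise.all([readAll(child.stdout), readAll(child.stderr), exited]);
  return { code, stdout, stderr };
}
```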
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Delete resolve-prompt.test.ts entirely: it defined replicas of
extractFlagValue, resolvePrompt, and handleDefaultCommand from index.ts
rather than importing them. The replicas had already diverged from the
real code (different parameters, missing flag aliases).
Remove replica functions (renderCompactList, renderMatrixFooter) and
their tests from matrix-compact-footer.test.ts while keeping the valid
tests for exported functions (getImplementedClouds, getMissingClouds,
calculateColumnWidth, getTerminalWidth).
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The "Full setup guide" link shown by `spawn <cloud>` pointed to
`/tree/main/{cloud}` which is a 404. The actual READMEs live under
`sh/{cloud}/`, so the URL should be `/tree/main/sh/{cloud}`.
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
PR #1903 fixed a pipe buffer deadlock in awsCli() by draining both
stdout and stderr before awaiting proc.exited. The same pattern existed
in runServerCapture() across 4 cloud providers and waitForCloudInit()
across 3 providers. If SSH produces >64KB of stderr, the child blocks
writing to the full pipe while the parent blocks waiting for exit.
Fixes: hetzner, aws, digitalocean, gcp — 7 locations total.
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
- Always show instance size picker (remove SPAWN_CUSTOM gate) so users
can choose bigger instances instead of silently defaulting to nano
- Add 1GB swap in cloud-init so curl installer doesn't get OOM-killed
on 512MB nano instances
- Set N_PREFIX=$HOME/.n in installClaudeCode so the Node.js fallback
via `n` works as non-root (ubuntu user can't write to /usr/local/n)
- Add $HOME/.n/bin to Claude Code PATH so node is found after fallback
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ahmed Abushagur <ahmed@abushagur.com>
SSH scans every byte for ~ escape sequences by default, adding
per-keystroke overhead. Disable this for interactive agent sessions
where escape sequences aren't needed. Also add AddressFamily=inet
to skip IPv6 resolution stalls.
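A sketch of the two options as ssh arguments (both are standard OpenSSH client options). The builder is hypothetical and omits the CLI's other ssh flags (keys, host-key handling, TTY allocation), which are not shown here.

```typescript
// Hypothetical builder showing only the options named above.
function interactiveSshArgs(user: string, host: string, remoteCmd: string): string[] {
  return [
    "-o", "EscapeChar=none", // no ~ escape scanning on every input byte
    "-o", "AddressFamily=inet", // IPv4 only, avoiding IPv6 resolution stalls
    `${user}@${host}`,
    remoteCmd,
  ];
}
```

The result would be passed as the argument list for `ssh`, e.g. to `spawnSync("ssh", args, { stdio: "inherit" })`.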
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Ahmed Abushagur <ahmed@abushagur.com>
* fix: run bun in foreground in DigitalOcean scripts to unfreeze interactive prompts
The _run_with_restart function backgrounded bun with `& + wait` so a SIGTERM
trap could forward the signal. But backgrounding removes bun from the terminal's
foreground process group, which prevents @clack/prompts multiselect from entering
raw mode — arrow keys print as raw escape sequences (^[[A^[[B) and the SSH key
selection prompt freezes.
Fix: run bun in the foreground and detect SIGTERM from exit code 143 (128+15)
instead of using a trap flag + PID tracking. This preserves the restart-on-signal
behavior while giving bun full terminal access for interactive prompts.
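The exit-status arithmetic, shown in TypeScript for consistency with the rest of the CLI (the actual fix lives in the shell script). `shouldRestartAfter` is a hypothetical mirror of the script's check.

```typescript
// Shells report "terminated by signal N" as exit status 128 + N.
const SIGTERM_SIGNAL_NUMBER = 15;
const SIGTERM_EXIT_STATUS = 128 + SIGTERM_SIGNAL_NUMBER; // 143

// Restart only when the foreground bun process died from SIGTERM; in this
// sketch any other status (0, 1, 130 from SIGINT, ...) ends the loop.
function shouldRestartAfter(exitStatus: number): boolean {
  return exitStatus === SIGTERM_EXIT_STATUS;
}
```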
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: replace @clack/prompts multiselect with /dev/tty picker for SSH keys
When the CLI (parent bun) spawns bash → child bun for cloud scripts,
the parent's event loop keeps fd 0 registered and races with the child's
@clack/prompts for terminal input. This causes the SSH key multiselect
to render but freeze — arrow keys print as raw escape sequences.
Fix: add multiPickToTTY() in picker.ts that opens /dev/tty directly,
bypassing process.stdin entirely. Replace the @clack/prompts multiselect
in ssh-keys.ts with this new function. Also add process.stdin.unref()
to prepareStdinForHandoff() so the parent stops polling fd 0.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* perf: disable SSH compression for interactive sessions
Compression=yes adds per-keystroke CPU overhead that causes
noticeable input lag on normal connections. Only beneficial
on slow/high-latency links.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: drain piped stderr before awaiting process exit to prevent deadlock
Awaiting `proc.exited` before reading piped stderr causes a deadlock when
the child process writes enough stderr output to fill the OS pipe buffer
(~64KB). The process blocks waiting for the buffer to drain, but we never
drain it because we're waiting for the process to exit first.
Fix sprite/sprite.ts (createSprite, uploadFileSprite) and aws/aws.ts
(awsCli) to start draining stderr before awaiting exit, matching the
established pattern in gcp/gcp.ts and shared/ssh.ts.
Agent: code-health
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: apply biome format fixes to pass CI
Agent: team-lead
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The standard SSH path in cmdEnterAgent() interpolated remoteCmd into a
single-quoted bash -lc wrapper without escaping embedded single quotes.
If launch_cmd (from history.json) or the manifest's launch/pre_launch
fields contained a single quote, the shell quoting would break, allowing
unintended command execution on the remote server.
The Fly.io path already had this escaping (PR #1880, #1893) but the
generic SSH fallback did not. This adds the same replace(/'/g, "'\\''")
pattern used everywhere else in the codebase.
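A sketch of the quoting rule: each embedded `'` becomes `'\''` (close the quote, emit an escaped quote, reopen). `wrapForBashLc` is a hypothetical builder for the SSH fallback's remote command.

```typescript
// Wrap an arbitrary string in single quotes for POSIX sh.
function shellSingleQuote(s: string): string {
  return `'${s.replace(/'/g, "'\\''")}'`;
}

// Hypothetical: the remote command handed to ssh for a login shell.
function wrapForBashLc(remoteCmd: string): string {
  return `bash -lc ${shellSingleQuote(remoteCmd)}`;
}
```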
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>