Commit graph

1385 commits

Author SHA1 Message Date
A
ea9bb2bee5
fix: use direct Node.js binary tarball on Fly instead of apt/npm/n (#1637)
Replace the Node.js install chain (apt nodejs+npm → npm install -g n →
n 22 → symlinks) with a single curl of the v22 binary tarball from
nodejs.org. Eliminates python3 dependency, npm bloat, and the n version
manager. Bun is installed first as the primary package manager.

Fly-only change — other clouds unchanged pending validation.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 15:44:39 -08:00
A
42f2b66b55
fix: keep stdin pipe open during fly ssh to prevent session teardown (#1638)
flyctl tears down the WireGuard transport when stdin closes ("session
forcibly closed; the remote process may still be running"). This
killed long-running commands like `bun install -g openclaw`.

Instead of calling stdin.end() immediately, keep the pipe open for
the duration of the command and close it after the process exits.
The pipe still prevents interactive prompts from hanging (no data
flows through it), but flyctl no longer interprets the closed fd
as a signal to kill the session.
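A minimal TypeScript sketch of the idle-pipe approach, assuming Node's `child_process.spawn`. `runWithIdleStdin` is a hypothetical name, not the repo's `runServer`. Nothing is ever written to stdin, and EOF is sent only after the child has exited.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper: run a command with stdin as an idle pipe. No data is
// written, so prompts cannot read input, but the pipe stays open until the
// child exits, so a tool like flyctl never sees an early EOF mid-command.
export function runWithIdleStdin(cmd: string, args: string[]): Promise<number> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: ["pipe", "inherit", "inherit"] });
    // The child's end of the pipe may already be gone when we close ours; ignore EPIPE.
    child.stdin?.on("error", () => {});
    child.on("error", reject);
    child.on("exit", (code) => {
      // Signal EOF only now, after the process has finished.
      child.stdin?.end();
      resolve(code ?? 1);
    });
  });
}
```

Compared with `stdio: "ignore"` (/dev/null), the child still gets a pipe, but the EOF is deferred until the command has finished.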

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 15:44:14 -08:00
A
c0c6f896b9
fix: use TypeScript module for fly spawn delete (bash script sourced missing file) (#1635)
After PR #1602 converted fly/ from bash to TypeScript, fly/lib/common.sh was
removed. However, buildDeleteScript() still generated a bash script that tried
to source it via curl, causing spawn delete to always fail for Fly.io servers
with a curl exit 22 (404). Users were left with orphaned apps incurring charges.

Fix: add fly-specific path in execDeleteServer() that calls ensureFlyCli(),
ensureFlyToken(), and destroyServer() directly from the TypeScript fly module,
bypassing the bash script path entirely. Remove the dead case "fly" from
buildDeleteScript().

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-21 15:28:51 -08:00
A
4bd373f1aa
fix: replace NodeSource with n for Fly Node.js install (#1633)
NodeSource's setup_22.x script adds an APT repo, pulls in python3 as a
dependency, and runs apt-get update twice — slow and heavyweight. Switch
to the same approach used by GCP/Hetzner: install apt's bundled nodejs,
then upgrade to v22 via n with symlinks.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 15:16:11 -08:00
A
b4be9b9d2f
fix: use pipe+close for fly ssh stdin instead of ignore (#1632)
fly ssh console with stdin as /dev/null ("ignore") can cause the
connection to hang — flyctl doesn't get a clean EOF signal to know
when to close the transport. Switch to "pipe" and immediately call
stdin.end() so flyctl receives a proper EOF.

Applied to runServer, runServerCapture, and uploadFile.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 15:12:44 -08:00
A
9b6e860bba
fix: hetzner destroy_server false failure and daytona silent error swallow (#1622)
Hetzner: grep '"error"' matched '"error": null' in successful DELETE
responses, causing every server deletion to log "Failed to destroy
server" and return 1 even when the server was actually destroyed.
Fix: use jq -e '.action' (consistent with create_server's approach)
which is only present on success.
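A TypeScript sketch contrasting the two checks. The response bodies in the usage are made-up examples, not captured Hetzner output, and both function names are hypothetical.

```typescript
// The buggy approach: a substring test for the "error" key also matches a
// successful body that contains "error": null.
export function naiveLooksFailed(body: string): boolean {
  return body.includes('"error"');
}

// Mirrors `jq -e '.action'`: succeed only when the parsed body has an `action`
// value that is neither null nor false (jq -e exits non-zero for those).
export function deleteSucceeded(body: string): boolean {
  try {
    const parsed: unknown = JSON.parse(body);
    if (typeof parsed !== "object" || parsed === null) return false;
    const action = (parsed as Record<string, unknown>).action;
    return action !== undefined && action !== null && action !== false;
  } catch {
    return false; // non-JSON body: treat as failure
  }
}
```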

Daytona: >/dev/null 2>&1 || true suppressed all errors and always
logged "Sandbox destroyed" regardless of outcome. Fix: capture response,
check exit code and error JSON fields, log billing warning on failure.

Same class of bug as AWS (#1606) and GCP/DigitalOcean (#1615).

Agent: team-lead

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 18:12:24 -05:00
A
46cadfd5e1
fix: add --no-install-recommends to all apt calls across clouds (#1631)
Same fix as the fly PR (#1629) applied to all bash-based clouds.
Without --no-install-recommends, `git` pulls in python3 (~50MB)
via recommended packages on Ubuntu 24.04.

Affected files:
- gcp/lib/common.sh — cloud-init userdata
- aws/lib/common.sh — cloud-init userdata + unzip install
- daytona/lib/common.sh — sandbox base tools
- shared/common.sh — jq install + Node.js fallback
- shared/github-auth.sh — gh CLI install

Also added DEBIAN_FRONTEND=noninteractive and ca-certificates
where missing.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 18:12:19 -05:00
A
576fc05c6e
fix: prevent openclaw install hang on fly by closing inherited FDs (#1630)
`bun install -g openclaw` spawns child processes that keep stdout/stderr
FDs open, preventing `fly ssh console -C` from detecting EOF and returning.
Wrap the install in a subshell with redirected output so the children inherit
the redirected FDs instead of the SSH session's stdout/stderr.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 15:04:35 -08:00
A
246351874e
fix: skip unnecessary apt recommends (python3) during fly VM setup (#1629)
git on Ubuntu 24.04 pulls in python3 via recommended packages. Use
--no-install-recommends to install only direct dependencies. Also
added ca-certificates explicitly since it's needed for HTTPS but
won't be auto-pulled without recommends.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 15:02:55 -08:00
A
cbd8c87a6d
fix: prevent terminal hang during fly agent install + fatal preLaunch (#1628)
Two fixes:

1. runServer() was inheriting stdin, so commands like `claude install`
   that try to read input would hang the terminal indefinitely. Changed
   stdin to "ignore" (/dev/null) for non-interactive remote commands.

2. preLaunch failures (e.g. OpenClaw gateway) were silently swallowed,
   dropping users into a broken TUI with no gateway. Now preLaunch
   errors propagate — users get a clear error instead of a mystery hang.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 14:53:07 -08:00
A
79da7298f9
test: remove 11 more theater/duplicate test files (825 tests) (#1627)
Removed files fall into two categories:

1. Replica files (7) — define inline copies of functions and test the
   copies instead of real code:
   - resolve-list-filters.test.ts (75 tests, explicit "Exact replica" comment)
   - credential-display-lines.test.ts (44 tests, "test via exact replicas")
   - cli-pipeline.test.ts (43 tests, replica extractFlagValue)
   - index-parsing.test.ts (38 tests, replica expandEqualsFlags/extractPromptArgs)
   - list-prompt-display.test.ts (32 tests, replica suggestCloudsForPrompt)
   - prompt-file-errors.test.ts (34 tests, replica handlePromptFileError)
   - commands-helpers.test.ts (64 tests, replica calculateColumnWidth/validateNonEmptyString)

2. Duplicate coverage (4) — import real functions but test the exact
   same helpers already covered in commands-exported-utils.test.ts:
   - credential-prioritization.test.ts (57 tests, also has replica functions)
   - list-output-helpers.test.ts (97 tests, same 10+ functions)
   - time-auth-record-helpers.test.ts (70 tests, same 9 functions)
   - manifest-real-data.test.ts (44 tests, same functions against real manifest)

Before: 92 files, 4,469 tests
After:  81 files, 3,644 tests (0 failures)

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 14:40:37 -08:00
A
cbecb9cbea
fix: suppress interactive dpkg prompts during fly VM setup (#1626)
tzdata (pulled in as a Node.js dependency) tries to run
dpkg-reconfigure interactively, which fails on headless Fly
machines. Set DEBIAN_FRONTEND=noninteractive so apt silently
accepts defaults.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 14:35:27 -08:00
A
8f47fac0c2
fix: add strict quality guardrails to test-engineer agent prompt (#1624)
The test-engineer agent was generating hundreds of fake tests that
copy-pasted source functions inline instead of importing them. These
tests pass even when the real code is broken (2,497 removed in #1620).

Three layers of defense:
- Global: ban bulk test generation with replica pattern from any agent
- Team lead: explicit rejection criteria for inline-replica test plans
- Test-engineer: 6 strict non-negotiable quality rules (must import
  from source, max 1 new file per cycle, prioritize fixing over adding)

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 14:26:41 -08:00
A
bd78f6dc1f
fix: replace fly machine exec with fly ssh console to fix 408 timeouts (#1623)
fly machine exec uses Fly's HTTP exec API which randomly returns 408
deadline_exceeded on commands >30s. Switch all non-interactive remote
execution (runServer, runServerCapture, uploadFile) to fly ssh console -C
which uses WireGuard tunneling and is reliable for long-running commands.

Also batch ~25 individual remote calls into ~4 combined shell scripts:
- waitForCloudInit: 8 calls → 1 (apt, node, bun, PATH setup)
- installClaudeCode: 8 calls → 1 (cleanup, install, finalize)
- setupClaudeCodeConfig: 5 calls → 1 (inline base64 file writes)
- env setup in main.ts: 4 calls → 1 (inline base64 + shell hooks)
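The "inline base64 file writes" batching can be sketched as follows. `buildWriteScript` and `shellQuote` are hypothetical helpers, and the sketch assumes the remote machine has `base64 -d` and `dirname` (GNU coreutils).

```typescript
// Hypothetical helper: single-quote a string for POSIX sh, embedding any
// literal ' as '\''.
function shellQuote(s: string): string {
  return `'${s.replace(/'/g, "'\\''")}'`;
}

// Hypothetical sketch: fold several file writes into one script so a single
// remote call can apply them. Base64 keeps quotes, newlines and `$` in the
// file contents from breaking the script. Paths should be absolute, since
// `~` is not expanded inside quotes.
export function buildWriteScript(files: Record<string, string>): string {
  const lines = ["set -e"];
  for (const [path, content] of Object.entries(files)) {
    const b64 = Buffer.from(content, "utf8").toString("base64");
    const target = shellQuote(path);
    lines.push(`mkdir -p "$(dirname ${target})"`);
    lines.push(`printf '%s' '${b64}' | base64 -d > ${target}`);
  }
  return lines.join("\n");
}
```

The resulting string would be passed as the single command to `fly ssh console -C`, so N file writes cost one round trip instead of N.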

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-02-21 14:25:05 -08:00
A
9c0ebcba63
test: remove unicode token test (contradicts security validation) (#1625)
_load_token_from_config intentionally rejects non-ASCII tokens via
regex validation to prevent curl injection. The test expected unicode
tokens to pass, contradicting the code's security design.
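For illustration, an allowlist of this kind can be sketched in TypeScript. The character set below is an assumption (it includes the comma needed for multi-segment macaroon tokens), not the regex `_load_token_from_config` actually uses.

```typescript
// Hypothetical allowlist: a fixed set of printable ASCII characters. Anything
// outside it (non-ASCII, quotes, spaces, newlines, `;`) is rejected before the
// token can reach a curl command line.
const TOKEN_PATTERN = /^[A-Za-z0-9_.,:=+\/-]+$/;

export function isSafeToken(token: string): boolean {
  return TOKEN_PATTERN.test(token);
}
```

Under such a rule a unicode token is rejected by design, which is why a test expecting it to pass contradicts the validation.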

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 14:22:53 -08:00
A
978221c673
test: remove sandbox-verification.test.ts (always skipped in CI) (#1621)
CI runs `bun test` from the repo root, not `cli/`, so the
bunfig.toml preload that sets up the sandbox never loads. All 17
tests skip silently — they verify preload infrastructure, not
application code.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 14:12:15 -08:00
A
175b1a798b
test: remove 38 replica/duplicate test files (2,497 fake tests) (#1620)
These test files were auto-generated by an AI agent and test copy-pasted
"replica" functions defined inline — not the real source code. They pass
even when the actual code is broken, providing false confidence.

Two categories removed:

1. Replica-only files (34 files, ~1,482 tests): Define inline copies of
   functions and test those copies instead of importing from source.
   Examples: key-server.test.ts, trigger-server.test.ts,
   index-dispatch-routing.test.ts, verb-aliases.test.ts

2. Duplicate-with-imports files (4 files, ~631 tests): Import real
   functions but duplicate coverage already in
   commands-exported-utils.test.ts. Examples:
   commands-credential-display-internals.test.ts (178 tests),
   cli-core-edge-cases.test.ts (237 tests)

Before: 131 files, 6,966 tests (5 failing)
After:   93 files, 4,469 tests (1 pre-existing failure)

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 14:09:06 -08:00
A
b055c0a285
fix: gcp and digitalocean destroy_server silently swallow errors (#1615)
GCP's destroy_server redirected both stdout and stderr to /dev/null
without checking the exit code, so deletion failures were invisible
to users. DigitalOcean's destroy_server never checked the API response
for error payloads, always reporting success.

Both bugs could leave cloud instances running (and charging money)
while telling users they were destroyed. Same class of bug fixed for
AWS in PR #1606.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-21 17:08:53 -05:00
A
53db26056c
fix: aws safe_read calls discard user input, breaking CLI install flow (#1613)
safe_read() outputs via stdout and takes only one argument (the prompt).
Three call sites in aws/lib/common.sh incorrectly passed a variable name
as a second argument instead of using command substitution:

  safe_read "prompt" varname    # BUG: varname never assigned
  varname=$(safe_read "prompt") # CORRECT: captures stdout

This caused:
- Install prompt always defaulting to "y" (user's "n" was ignored)
- AWS credentials never being captured after CLI install, leaving
  AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY empty, so the
  install-then-configure code path always failed silently

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-21 17:08:50 -05:00
A
0f59f0e844
fix: fly machine wait timeout exceeds API max of 60s (#1619)
The Fly Machines API enforces a [1s, 1m0s] range on
WaitMachineRequest.Timeout. We were passing 90s, which caused an
invalid_argument error and prevented machines from starting.

Lower the default to 60s (the API maximum) and retry up to 3 times
so slow-starting machines still have a full 3-minute window.
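A sketch of the clamp-and-retry logic, assuming a hypothetical `waitOnce` callback that performs one wait request and resolves `true` once the machine has started. The bounds are the ones cited above; the function names are illustrative.

```typescript
// Bounds the commit cites for WaitMachineRequest.Timeout: [1s, 60s].
const MIN_WAIT_S = 1;
const MAX_WAIT_S = 60;

export function clampWaitTimeout(seconds: number): number {
  return Math.min(MAX_WAIT_S, Math.max(MIN_WAIT_S, Math.round(seconds)));
}

// Hypothetical wrapper: retry up to `attempts` times, each at the clamped
// timeout. With the defaults that is 3 x 60s = a 3-minute window.
export async function waitForMachine(
  waitOnce: (timeoutS: number) => Promise<boolean>,
  attempts = 3,
  timeoutS = MAX_WAIT_S,
): Promise<boolean> {
  const t = clampWaitTimeout(timeoutS);
  for (let i = 0; i < attempts; i++) {
    if (await waitOnce(t)) return true;
  }
  return false;
}
```

Clamping instead of passing the configured value through means an over-long timeout degrades to the API maximum rather than failing with invalid_argument.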

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 14:00:13 -08:00
A
0569c86ba5
fix: strip ANSI codes in tests that compare plain text output (#1617)
On CI (GitHub Actions), `CI=true` causes picocolors to enable ANSI
output. Tests comparing against plain text (e.g., `toContain("--prompt
requires a value")`) fail because the actual output wraps text in bold/
dim ANSI codes.

Fixes:
- Subprocess tests (runCli): add NO_COLOR=1 to child env
- Mock capture tests: add stripAnsi() helper to output getters
- Bash subprocess tests: add NO_COLOR=1 to execSync env
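Both fixes can be sketched in a few lines. `stripAnsi` and `noColorEnv` here are illustrative helpers that handle only SGR color/style sequences, not every ANSI escape.

```typescript
// SGR sequences such as ESC[1m (bold) and ESC[22m (bold off), which picocolors
// emits when color is enabled (e.g. because CI=true is set).
const SGR_PATTERN = /\u001b\[[0-9;]*m/g;

export function stripAnsi(text: string): string {
  return text.replace(SGR_PATTERN, "");
}

// For subprocess tests: copy the environment and force color off.
// picocolors disables color whenever NO_COLOR is present.
export function noColorEnv(base: NodeJS.ProcessEnv = process.env): NodeJS.ProcessEnv {
  return { ...base, NO_COLOR: "1" };
}
```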

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 13:52:30 -08:00
A
d69c4f0f02
fix: reject non-2xx responses in Fly.io token validation (#1614)
testFlyToken() fallback to /v1/user accepted 404 plain text responses
because hasError() only checks for JSON "error"/"errors" keys. Adding
resp.ok check ensures non-2xx responses are correctly rejected.
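A sketch of the corrected acceptance check, assuming the global Fetch `Response` type. `hasError` below is a simplified stand-in, not the repo's implementation.

```typescript
// Simplified stand-in for hasError(): only JSON "error"/"errors" keys count.
function hasError(body: unknown): boolean {
  return typeof body === "object" && body !== null && ("error" in body || "errors" in body);
}

// Accept the /v1/user fallback only on a 2xx status whose body parses as JSON
// without error keys. A plain-text 404 now fails the resp.ok check up front.
export async function acceptUserResponse(resp: Response): Promise<boolean> {
  if (!resp.ok) return false;
  try {
    return !hasError(await resp.json());
  } catch {
    return false; // 2xx but not JSON
  }
}
```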

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 13:47:40 -08:00
A
01e5fa842d
fix: update tests for fly TypeScript shim scripts and env injection (#1609)
After the fly provider was converted to TypeScript (PR #1602), the bash
shim scripts no longer source lib/common.sh or reference OPENROUTER_API_KEY
directly -- that logic moved to TypeScript. Skip TypeScript shim scripts
in bash-specific convention checks.

Also fixes:
- URL regex in cloud-error-guidance to exclude backticks/commas from
  template literals in heredocs
- aws added to skipProviders for destroy_server error check (uses set -e
  and internal process.exit, not explicit return 1)
- inject_env_vars_local test regex updated to match semicolon separator
  instead of && (matches actual shared/common.sh implementation)

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 12:45:21 -08:00
A
262d081756
refactor: move fly TS into cli/src/fly/, add build-clouds.sh (#1604)
Move all fly TypeScript files from fly/lib/*.ts and fly/main.ts into
cli/src/fly/. This gives them access to cli/node_modules (@clack/prompts),
biome linting, and the existing bun:test infrastructure — no symlinks or
NODE_PATH hacks needed.

The org picker now uses @clack/prompts select() directly (static import,
bundled at build time).

New: cli/build-clouds.sh — auto-discovers cli/src/*/main.ts and bundles
each into {cloud}.js. Scalable to future cloud TS migrations:
  bash cli/build-clouds.sh        # build all
  bash cli/build-clouds.sh fly    # build one

Shims now check for cli/src/fly/main.ts (local) or download fly.js from
GitHub releases (remote curl|bash).

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 12:34:09 -08:00
L
14fb352b52
feat: add Windows PowerShell installer (install.ps1) (#1610)
The only existing installer (install.sh) is bash-only and fails silently
on Windows PowerShell — 'curl ... | bash' errors because bash.exe is not
available outside WSL.

install.ps1 implements the same logic as install.sh for PowerShell:
- Checks bun >= 1.2.0; installs via bun.sh/install.ps1 if missing
- Downloads CLI source via git sparse-checkout or GitHub API fallback
- Builds with 'bun install && bun run build'; falls back to pre-built binary
- Installs to %USERPROFILE%\.local\bin (or SPAWN_INSTALL_DIR override)
- Creates spawn.cmd wrapper for cmd.exe compatibility
- Adds install dir to the user's persistent PATH if not already present

Usage:
  irm https://raw.githubusercontent.com/OpenRouterTeam/spawn/main/cli/install.ps1 | iex

README updated with Windows PowerShell install instructions alongside
the existing macOS/Linux/WSL command.

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 12:31:23 -08:00
Ahmed Abushagur
76cb8ead3b
fix: unpin Codex version, restore wire_api=responses (#1608)
Reverts the 0.94.0 pin — install latest Codex and use the required
wire_api="responses" format.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 20:12:23 +00:00
A
70dd127edc
fix: aws destroy_server silently swallows errors, fix URL regex in test (#1606)
The aws destroy_server function had conditional logic (if/else for CLI
vs REST mode) but no error handling; failures were silently ignored and
"Instance destroyed" was logged even on failure. This could leave
instances running and incurring charges without the user knowing.

Also fix the URL extraction regex in cloud-error-guidance.test.ts to
exclude backtick characters, preventing false positives from template
literals in embedded TypeScript code.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-21 15:09:44 -05:00
A
eff99caefe
fix: apply default spawn name when user presses Enter without typing (#1605)
promptSpawnName() used `placeholder` (visual hint only) without `defaultValue`,
so pressing Enter returned an empty string instead of applying the placeholder.
Now generates a unique default like `spawn-a3f2` with a random suffix to avoid
Fly.io global name collisions.
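A sketch of the default-name logic. Both helpers are hypothetical. The second shows what supplying `defaultValue` (in addition to `placeholder`) achieves: an empty submission falls back to the generated name.

```typescript
import { randomBytes } from "node:crypto";

// Names like `spawn-a3f2`: 2 random bytes give a 4-hex-digit suffix, which
// makes collisions in Fly.io's global app namespace less likely (not impossible).
export function defaultSpawnName(): string {
  return `spawn-${randomBytes(2).toString("hex")}`;
}

// Empty or whitespace-only input resolves to the default.
export function resolveSpawnName(input: string | undefined, fallback: string): string {
  const trimmed = (input ?? "").trim();
  return trimmed === "" ? fallback : trimmed;
}
```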

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 11:46:59 -08:00
A
d7ff0739a2
fix: fly auth token deprecated + org picker + macaroon tokens (#1603)
* fix: fly auth token deprecated + org picker + macaroon discharge tokens

Three fixes for the fly/ TypeScript provider:

1. `fly auth token` is deprecated — newer flyctl outputs a message, not
   a token. Now tries `fly tokens create org --expiry 24h` first, with
   `fly auth token` as fallback. Uses org tokens (not deploy) since
   spawn needs to create new apps.

2. Token sanitization stripped macaroon discharge tokens at commas
   (`fm2_[^ ,]*` → `fm2_\S+`). The full composite token
   `fm2_xxx,fm2_yyy,fo1_zzz` is now preserved.

3. Org picker upgraded from numbered 1/2 input to arrow-key interactive
   selector with cursor navigation, scroll windowing, and fallback to
   numbered list when TTY is unavailable.

Also fixes: testFlyToken fallback sent `Bearer FlyV1 ...` (double prefix)
for macaroon tokens — now dispatches FlyV1 vs Bearer correctly.
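The regex change and the header dispatch can be sketched in TypeScript. The dispatch rule (a `FlyV1 ` prefix or an `fm2_` token selects the FlyV1 scheme) is an assumption for illustration; the repo's exact rule may differ.

```typescript
// Old pattern stops at the first comma; the new one keeps the whole composite token.
const OLD_TOKEN_RE = /fm2_[^ ,]*/;
const NEW_TOKEN_RE = /fm2_\S+/;

export function extractToken(output: string, re: RegExp = NEW_TOKEN_RE): string | null {
  const m = output.match(re);
  return m ? m[0] : null;
}

// Hypothetical dispatch: macaroon tokens are sent as `FlyV1 <token>`, others as
// `Bearer <token>`, and a token that already carries the FlyV1 prefix is not
// prefixed again (avoiding the `Bearer FlyV1 ...` double prefix).
export function authHeader(token: string): string {
  if (token.startsWith("FlyV1 ")) return token;
  if (token.startsWith("fm2_")) return `FlyV1 ${token}`;
  return `Bearer ${token}`;
}
```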

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: never run test/mock.sh locally — opens browser, CI only

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 11:06:19 -08:00
A
2ef621cc69
refactor: convert fly/ cloud provider from bash to TypeScript (#1601) (#1602)
Replace fly/lib/common.sh (741 lines of bash) with a TypeScript
implementation using Bun runtime. The fly/ provider was the most
complex bash code in the project — recent fixes (#1597, #1599, #1600)
highlight the pain of debugging HTTP calls, JSON parsing, and multi-step
auth flows in shell.

New TypeScript modules:
- fly/lib/ui.ts — logging, prompts, validation (zero deps)
- fly/lib/fly.ts — API client (fetch), auth chain, org listing, provisioning
- fly/lib/oauth.ts — OpenRouter OAuth via Bun.serve(), key management
- fly/lib/agents.ts — typed agent configs for all 6 agents
- fly/main.ts — orchestrator entry point

Agent .sh files become thin shims (~30 lines) that install bun if needed,
download TS sources for curl|bash execution, and delegate to main.ts.

Test coverage:
- 44 TypeScript unit tests (bun test) for pure logic
- 4 fly failure-mode tests (mock.sh) for error scenarios
- All existing test suites pass (110 run.sh, 76 mock.sh)

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 10:41:34 -08:00
A
119e0f3660
fix: use fly auth login (OAuth) instead of manual token paste (#1600)
* fix: use fly auth login (OAuth) instead of manual token paste

The fly auth flow was falling back to ensure_api_token_with_provider
which prompts users to manually paste a token from the dashboard.
This is bad UX when `fly auth login` exists and handles browser-based
OAuth automatically.

New auth chain:
1. FLY_API_TOKEN env var (if set and valid)
2. Saved config (~/.config/spawn/fly.json)
3. Existing fly CLI session (fly auth token)
4. fly auth login — browser OAuth flow (NEW)

Removes the manual token paste fallback entirely. If fly CLI isn't
installed, fails with a clear install instruction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add manual token paste as final fallback after OAuth

Auth chain is now:
1. FLY_API_TOKEN env var
2. Saved config
3. fly auth token (existing session)
4. fly auth login (OAuth)
5. Manual token paste (last resort)
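The chain can be sketched as an ordered list of sources tried until one yields a valid token. `TokenSource` and `resolveToken` are hypothetical names; the real source implementations (env lookup, config file, flyctl, OAuth, prompt) are assumed, not shown.

```typescript
// Each source returns a token or null; the first valid token wins.
type TokenSource = { name: string; get: () => Promise<string | null> };

export async function resolveToken(
  sources: TokenSource[],
  isValid: (token: string) => Promise<boolean>,
): Promise<{ source: string; token: string } | null> {
  for (const { name, get } of sources) {
    const token = await get();
    if (token && (await isValid(token))) return { source: name, token };
  }
  return null; // every source failed; caller reports an actionable error
}
```

Later sources (the OAuth flow, the manual prompt) are interactive, so they run only when every earlier, non-interactive source has failed.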

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 17:43:34 +00:00
A
fae0d764fd
fix: fail loudly with root cause when fly org fetch fails (#1599)
Previously _fly_list_orgs silently swallowed all errors (2>/dev/null
everywhere) and _fly_prompt_org fell back to manual input with no
diagnostic info. Now both paths (fly CLI + GraphQL) surface specific
failure reasons — missing CLI, empty output, parse errors with raw
JSON, GraphQL errors — and _fly_prompt_org fails hard with actionable
debug hints instead of silently defaulting.

Also always show the org picker when fetch succeeds (no silent default).

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 09:14:40 -08:00
A
c25566d810
fix: fall back to GraphQL API for Fly.io org listing when flyctl fails (#1597)
_fly_list_orgs previously relied solely on `flyctl orgs list --json`.
When flyctl is absent or its output is unexpected, the user gets dumped
into a manual "Enter Fly.io org slug" prompt — even though we already
have a valid API token.

Now tries flyctl first, then falls back to the Fly.io GraphQL API
(`api.fly.io/graphql`) using the saved FLY_API_TOKEN. Works with
both Bearer and FlyV1 macaroon tokens.

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 09:03:21 -08:00
A
65317f3969
fix: bun mock shim forwards args + strips TS for node fallback on CI (#1598)
The mock bun shim was broken on CI (ubuntu-latest, no real bun):
- Only passed $2 to node, dropping the extra `-- field default` args
  that _fly_json needs
- Didn't strip TypeScript annotations (: any[], as any) that node can't parse

Fixes:
- shift 2 to preserve extra args, forward them to both real bun and node
- sed -E strips TS type annotations before passing to node --input-type=module
- All fly tests now pass under the node-only CI fallback path

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 09:02:23 -08:00
A
a1a1ae85dc
refactor: replace python3 with bun+TS in fly/lib/common.sh (#1596)
* refactor: replace python3 with bun+TS in fly/lib/common.sh, fix token validation

Three targeted fixes to the Fly.io library:

1. Replace all python3 with bun+TypeScript:
   - _fly_json: stdin-piped field extractor via bun -e (no eval, no env var
     size limits — handles arbitrarily large API responses)
   - _fly_json_ids: dedicated machine ID extractor for destroy_server
   - _fly_list_orgs: bun -e with flat dict + nodes/organizations support
   - list_servers: bun -e formatted table output
   Zero python3 invocations remain in the file.

2. Dual-endpoint _test_fly_token: tries Machines API first (deploy tokens),
   falls back to api.fly.io/v1/user (OAuth/personal tokens). Prevents
   rejecting valid personal tokens that lack Machines API access.

3. No more eval(): _fly_json uses direct property access (d[field]) instead
   of python3 eval(expr), eliminating the code injection surface entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: always prompt user for Fly.io org, never silently default

_fly_prompt_org now asks the user directly when the org list can't be
fetched, instead of silently falling back to "personal".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 08:56:24 -08:00
A
2af285dc52
feat: fall back to SigV4 REST API when AWS CLI is absent (aws/lightsail) (#1583)
* feat: fall back to SigV4 REST API when AWS CLI is absent (aws/lightsail)

If `aws` CLI is not installed but AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY
are set, provision Lightsail instances directly via the REST API instead of
erroring out.

- Add _lightsail_rest(): inline Bun TypeScript that computes SigV4 signatures
  via node:crypto and calls the Lightsail API with native fetch — no openssl
  or curl gymnastics required
- Add _ls_json(): dot-path JSON parser, prefers jq, falls back to bun eval
- ensure_aws_cli() now sets LIGHTSAIL_MODE=cli|rest; REST mode requires bun
  (already a project dependency) and shows a clear error if missing
- All API calls in ensure_ssh_key, create_server, _wait_for_lightsail_instance,
  destroy_server, list_servers are gated on LIGHTSAIL_MODE
- Replace all python3 JSON encoding (key import, userdata, list table) with
  bun eval — consistent with project tooling
- No more auto-install of the 200 MB AWS CLI binary
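The SigV4 core that such inline code needs can be sketched with `node:crypto`. This covers only signing-key derivation and the string to sign; canonical-request construction is omitted, and using `lightsail` as the service name in the usage is an assumption.

```typescript
import { createHash, createHmac } from "node:crypto";

function hmac(key: string | Buffer, data: string): Buffer {
  return createHmac("sha256", key).update(data, "utf8").digest();
}

export function sha256Hex(data: string): string {
  return createHash("sha256").update(data, "utf8").digest("hex");
}

// Signing-key derivation: an HMAC chain over date (YYYYMMDD), region, service
// and the literal "aws4_request", seeded with "AWS4" + secret access key.
export function signingKey(secret: string, date: string, region: string, service: string): Buffer {
  const kDate = hmac(`AWS4${secret}`, date);
  const kRegion = hmac(kDate, region);
  const kService = hmac(kRegion, service);
  return hmac(kService, "aws4_request");
}

// String to sign: algorithm, timestamp, credential scope, hashed canonical request.
export function stringToSign(amzDate: string, scope: string, canonicalRequest: string): string {
  return ["AWS4-HMAC-SHA256", amzDate, scope, sha256Hex(canonicalRequest)].join("\n");
}

// Final signature: hex HMAC of the string to sign under the derived key.
export function sign(key: Buffer, toSign: string): string {
  return createHmac("sha256", key).update(toSign, "utf8").digest("hex");
}
```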

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add interactive AWS CLI install when CLI is missing

When neither aws CLI nor raw credentials are found, prompt the user
to install AWS CLI v2 on the spot (macOS .pkg / Linux zip installer).
After install, prompt for Access Key ID + Secret and validate via
sts:GetCallerIdentity before proceeding.

The decision cascade is now:
  1. Existing aws CLI with valid creds → cli mode
  2. Raw env-var creds + bun available → rest mode
  3. Offer to install aws CLI → prompt for creds → cli mode
  4. Creds collected during install + bun → rest mode fallback
  5. Nothing worked → show manual instructions
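The cascade reads as a pure decision function once each probe's outcome is known. The sketch below flattens the interactive steps into booleans; `ProbeResult` and `chooseMode` are hypothetical names.

```typescript
type LightsailMode = "cli" | "rest" | "manual";

// Hypothetical summary of what each probe found.
interface ProbeResult {
  cliHasValidCreds: boolean;  // step 1: aws CLI present and authenticated
  envCreds: boolean;          // AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY set
  bunAvailable: boolean;
  installedCliWorks: boolean; // step 3: user accepted install and creds validated
  collectedCreds: boolean;    // step 4: creds entered during the install prompt
}

export function chooseMode(p: ProbeResult): LightsailMode {
  if (p.cliHasValidCreds) return "cli";
  if (p.envCreds && p.bunAvailable) return "rest";
  if (p.installedCliWorks) return "cli";
  if (p.collectedCreds && p.bunAvailable) return "rest";
  return "manual";
}
```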

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: eliminate code/path injection in bun eval calls (aws/lib/common.sh)

Pass shell variables as process.argv arguments instead of interpolating
them into JavaScript string literals:

- _ls_json(): path parameter passed as process.argv[2] (was CRITICAL
  code injection — attacker-controlled path could escape the string)
- ensure_ssh_key(): pub_path and key_name passed as process.argv[2..3]
  (was HIGH — path injection via $HOME)
- create_server(): ud_tmp, name, az, bundle passed as process.argv[2..5]
  (was MEDIUM — temp file path interpolation)
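Once the path arrives as data (per the fix, via `process.argv[2]`), the lookup only needs to treat it as a list of keys. A sketch of such a dot-path reader; `getByPath` is a hypothetical name, not `_ls_json` itself.

```typescript
// Hypothetical dot-path lookup, e.g. "instances.0.publicIpAddress". Each
// segment is used only as an own-property key, never evaluated, so a hostile
// path can at worst return undefined.
export function getByPath(root: unknown, path: string): unknown {
  let cur: unknown = root;
  for (const key of path.split(".")) {
    if (cur === null || typeof cur !== "object") return undefined;
    if (!Object.prototype.hasOwnProperty.call(cur, key)) return undefined;
    cur = (cur as Record<string, unknown>)[key];
  }
  return cur;
}
```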

Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
2026-02-21 16:46:57 +00:00
A
a0da57d559
docs: add CLAUDE.md rule — always use bun+ts for inline scripting, never python (#1595)
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 08:46:46 -08:00
A
279eddd689
fix: verify Node.js install after nodesource setup, add fallback (#1581) (#1592)
The nodesource setup_22.x script can run successfully but leave nodejs
uninstalled on Fly.io machines. Add post-install verification with
`which node && node --version`, fall back to default Debian nodejs
package if nodesource fails, increase timeout from 120s to 180s, and
report a clear error if node is unavailable after all attempts.

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 11:46:25 -05:00
A
51493ccc75
refactor: rewrite fly/lib/common.sh — eliminate redundancy and fragility (#1594)
- Replace _fly_json_get (stdin pipe) + _fly_parse_error with unified _fly_json
  that takes JSON string, python expression, and default as arguments
- Collapse 5-step auth chain into 2 steps: fly auth token → ensure_api_token_with_provider
- Replace _validate_fly_token (dual-endpoint fallback) with _test_fly_token (single Machines API call)
- Replace python3-based _fly_build_machine_body with printf + json_escape
- Remove fly ssh console -C fallback from run_server (FLY_MACHINE_ID always set)
- Remove base64 fallback from upload_file (stdin pipe via fly machine exec only)
- Remove collision re-prompt loop from create_server, fail with actionable message
- Add _fly_cleanup_on_failure to delete app when machine creation fails
- Lighten wait_for_cloud_init: install curl/unzip/git only (drop build-essential, python3-pip, zsh)
- Delete unused functions: _try_flyctl_auth, _try_fly_browser_auth, get_fly_org, _fly_json_get, _fly_parse_error

730 → 536 lines. All 112 mock tests pass, zero regressions.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 08:43:00 -08:00
A
3db19d90ac
fix: accept comma in Fly.io macaroon tokens & handle flat org dict (#1593)
Real `fly auth token` returns comma-separated multi-segment macaroon
tokens (fm2_...,fm2_...,fo1_...). The token validation regex rejected
commas, forcing re-auth on every run. Add comma to the allowed charset.

`fly orgs list --json` returns a flat dict ({"slug": "Name"}) on some
flyctl versions, not the list/nodes format the parser expected. Detect
and handle both formats so the org picker works correctly.
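A sketch of a normalizer covering the flat dict plus the list and `nodes` shapes mentioned in these commits. `normalizeOrgs` is hypothetical, and any field names beyond `slug` and `name` are not handled.

```typescript
interface Org { slug: string; name: string }

export function normalizeOrgs(raw: unknown): Org[] {
  // List shape: keep entries that have a string slug; name defaults to slug.
  const fromList = (items: unknown[]): Org[] =>
    items.flatMap((item) => {
      if (item && typeof item === "object") {
        const { slug, name } = item as { slug?: unknown; name?: unknown };
        if (typeof slug === "string") {
          return [{ slug, name: typeof name === "string" ? name : slug }];
        }
      }
      return [];
    });

  if (Array.isArray(raw)) return fromList(raw);
  if (raw && typeof raw === "object") {
    const obj = raw as Record<string, unknown>;
    if (Array.isArray(obj.nodes)) return fromList(obj.nodes);
    // Flat dict: every value is a display name keyed by slug.
    const entries = Object.entries(obj);
    if (entries.every(([, v]) => typeof v === "string")) {
      return entries.map(([slug, name]) => ({ slug, name: name as string }));
    }
  }
  return [];
}
```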

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 08:20:48 -08:00
A
ed9501235b
fix: bash 3.2 compat — sed for pattern sub + split local var=$(cmd) (#1572, #1571) (#1587)
Issue #1572: Replace bash 4+ ${//} pattern substitution in generate_env_config
with sed for macOS bash 3.2 compatibility.

Issue #1571: Split local var=$(cmd) declarations in fly/lib/common.sh so
exit codes propagate correctly with set -e on macOS bash 3.2.
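Both patterns in miniature; the function names are hypothetical. `local out=$(cmd)` reports the status of `local` itself, so under `set -e` a failing `cmd` goes unnoticed, while declaring first and assigning on a separate line keeps the substitution's status. The `sed` helper stands in for `${var//pattern/replacement}` and assumes neither argument contains `|` or other sed metacharacters.

```bash
# `local` swallows the command substitution's exit status.
_demo_masked_status() {
  local out=$(false)
  echo "masked status: $?"     # status of `local`: 0
}

# Split declaration: the assignment carries the substitution's status.
_demo_split_status() {
  local out
  out=$(false)
  echo "split status: $?"      # status of `false`: 1
}

# sed-based global replace instead of ${var//pattern/replacement}.
_replace_all() {
  printf '%s\n' "$1" | sed "s|$2|$3|g"
}
```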

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 07:49:17 -08:00
A
ce8b1afdf8
fix: always rm temp env file even if .zshrc append fails (#1573) (#1586)
Use semicolons instead of && for rm in inject_env_vars, inject_env_vars_sprite,
inject_env_vars_cb, and inject_env_vars_cloud so the temp file containing the
API key is always deleted even if ~/.zshrc doesn't exist or append fails.
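A sketch of the cleanup pattern with a hypothetical `append_and_cleanup` helper (the real `inject_env_vars*` functions differ). The `rm` is its own statement rather than the right-hand side of `&&`, so it runs whether or not the append succeeds, and the append's status is still returned.

```bash
# Hypothetical helper: append a temp env file to an rc file, always delete it.
# One-line equivalent of the fix: cat "$tmp" >> ~/.zshrc; rm -f "$tmp"
append_and_cleanup() {
  local tmp_file="$1" rc_file="$2" status=0
  cat "$tmp_file" >> "$rc_file" || status=$?
  # Runs unconditionally, so the key-bearing temp file never lingers.
  rm -f "$tmp_file"
  return "$status"
}
```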

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 10:45:55 -05:00
A
aa4174db9e
fix: add retry logic to wait_for_cloud_init for error recovery (#1575) (#1588)
Add _fly_run_with_retry helper that wraps run_server with configurable
retry count, sleep interval, and timeout. Apply it to package manager
and installer commands in wait_for_cloud_init so transient failures
(network timeouts, apt lock contention) no longer abort the entire
cloud-init sequence.
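A sketch of a retry wrapper in the spirit of `_fly_run_with_retry`. The argument order and the meaning of the timeout (a cap on total elapsed time across attempts, measured with bash's `SECONDS`) are assumptions, and `run_server` is replaced by a local stand-in so the example runs on its own.

```bash
# Stand-in for the real run_server (which executes on the Fly machine).
run_server() {
  bash -c "$1"
}

# Hypothetical signature: _fly_run_with_retry MAX_ATTEMPTS SLEEP_SECS TIMEOUT_SECS CMD
# Stops on success, after MAX_ATTEMPTS tries, or once TIMEOUT_SECS have elapsed.
_fly_run_with_retry() {
  local max_attempts="$1" sleep_secs="$2" timeout_secs="$3" cmd="$4"
  local attempt=1 start=$SECONDS
  while true; do
    if run_server "$cmd"; then
      return 0
    fi
    if (( attempt >= max_attempts || SECONDS - start >= timeout_secs )); then
      echo "command failed after $attempt attempt(s): $cmd" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$sleep_secs"
  done
}
```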

Agent: complexity-hunter

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 10:45:32 -05:00
A
cbb6198258
test: add Fly.io failure-mode tests for SSH tunnel, API errors (#1579) (#1590)
Add mock tests covering real failure scenarios that were previously
untested despite 36/36 happy-path tests passing:

- API rate limit (429): mock curl returns 429 for cloud API calls
- Machine creation failure (422): mock curl returns 422 for POST to */machines*
- SSH tunnel failure: fly ssh console / fly machine exec exit non-zero
  (simulates WireGuard tunnel context deadline exceeded)
- SSH timeout: the fly CLI never returns "ok", so _fly_wait_for_ssh exhausts its retries

The fly mock now checks MOCK_ERROR_SCENARIO to simulate CLI-level failures
(ssh_tunnel_failure, ssh_timeout) in addition to the existing curl-level
error injection (rate_limit, create_failure).
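A simplified sketch of how a `fly` mock can branch on `MOCK_ERROR_SCENARIO`; it is not the actual test/mock.sh code, and the error text is invented. Tunnel failures exit non-zero with an error message, the timeout scenario never prints "ok", and everything else succeeds.

```bash
# Hypothetical fly CLI mock keyed on MOCK_ERROR_SCENARIO and the subcommand.
fly() {
  case "${MOCK_ERROR_SCENARIO:-}:${1:-} ${2:-}" in
    "ssh_tunnel_failure:ssh console"|"ssh_tunnel_failure:machine exec")
      echo "Error: tunnel unavailable: context deadline exceeded" >&2
      return 1 ;;
    "ssh_timeout:ssh console"|"ssh_timeout:machine exec")
      # Never print "ok", so a readiness poll keeps retrying until it gives up.
      return 1 ;;
    *)
      echo "ok" ;;
  esac
}
```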

Agent: test-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 10:43:17 -05:00
A
282803a9bb
fix: add debug logging to ensure_fly_token auth chain (#1574) (#1589)
Add log_info/log_warn messages at each step of the 5-step auth chain
so users can see which auth method is being tried and why fallbacks occur.

Agent: ux-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 10:42:46 -05:00
L
3eca4221c6
fix: address architectural brittleness in Fly.io integration (issue #1581) (#1585)
Resolves sub-issues #1569, #1570, #1576, #1577, #1578, #1580.

#1569 — /wait endpoint replaces polling loop:
  _fly_wait_for_machine_start now makes one blocking API call,
  GET /apps/{app}/machines/{id}/wait?state=started&timeout=90,
  instead of 30 polls.
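A sketch of the single blocking call, written against the public Machines API with `curl`. Assumptions: the base URL `https://api.machines.dev/v1`, bearer auth from `FLY_API_TOKEN`, and the hypothetical `FLY_API_BASE` override and function names; the actual code goes through its own `fly_api` helper.

```bash
# Builds the wait URL for a machine (hypothetical helper).
_fly_wait_url() {
  local app="$1" machine_id="$2" timeout="${3:-90}"
  printf '%s/apps/%s/machines/%s/wait?state=started&timeout=%s\n' \
    "${FLY_API_BASE:-https://api.machines.dev/v1}" "$app" "$machine_id" "$timeout"
}

# One request that blocks until the machine is started or the wait times out.
# curl -f turns an HTTP error status into a non-zero exit.
_fly_wait_for_started() {
  if [ -z "${FLY_API_TOKEN:-}" ]; then
    echo "FLY_API_TOKEN is not set" >&2
    return 1
  fi
  curl -fsS -H "Authorization: Bearer $FLY_API_TOKEN" \
    "$(_fly_wait_url "$1" "$2" "${3:-90}")"
}
```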

#1570 — fly machine exec replaces fly ssh console for run_server:
  run_server uses 'fly machine exec MACHINE_ID --app APP -- bash -c cmd'
  (direct API, no WireGuard tunnel) when FLY_MACHINE_ID is set. Falls
  back to 'fly ssh console -C' for environments without a machine ID.
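A sketch of the dispatch described above, with the `fly machine exec` arguments in the form quoted in this commit. `FLY_APP_NAME` holding the app name is an assumption, and the fallback's `--app` flag is added for illustration.

```bash
# Hypothetical run_server dispatch; not the actual common.sh function.
run_server() {
  local cmd="$1"
  if [ -n "${FLY_MACHINE_ID:-}" ]; then
    # Direct Machines API exec: no WireGuard tunnel involved.
    fly machine exec "$FLY_MACHINE_ID" --app "$FLY_APP_NAME" -- bash -c "$cmd"
  else
    # Fallback for environments without a machine ID.
    fly ssh console --app "$FLY_APP_NAME" -C "$cmd"
  fi
}
```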

#1576 — App name collision loop capped at 5 retries:
  Prevents an infinite re-prompt loop. Suggests setting the FLY_APP_NAME env
  var after 5 failed attempts.

#1577 — destroy_server errors are now reported:
  All fly_api calls check for error responses. Reports failed machine
  deletions and exits non-zero on app deletion failure instead of
  always logging "destroyed" regardless of outcome.
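A sketch of error-aware teardown. Assumptions: a `fly_api METHOD PATH` helper that prints the response body, failures signalled by an `"error"` field or an unparseable body, and machine IDs passed in by the caller; `_fly_api_failed` is a hypothetical name. Machine failures are reported and the loop continues; an app deletion failure returns non-zero.

```bash
# Succeeds (returns 0) when the JSON response body indicates a failure.
_fly_api_failed() {
  python3 -c '
import json, sys
try:
    body = json.loads(sys.argv[1]) if sys.argv[1].strip() else {}
except ValueError:
    sys.exit(0)          # unparseable response: treat as failure
sys.exit(0 if isinstance(body, dict) and body.get("error") else 1)
' "$1"
}

# Usage: destroy_server APP MACHINE_ID...
destroy_server() {
  local app="$1" failed=0 id resp
  shift
  for id in "$@"; do
    resp=$(fly_api DELETE "/apps/$app/machines/$id")
    if _fly_api_failed "$resp"; then
      echo "failed to delete machine $id" >&2
      failed=1
    fi
  done
  resp=$(fly_api DELETE "/apps/$app")
  if _fly_api_failed "$resp"; then
    echo "failed to delete app $app" >&2
    return 1
  fi
  if [ "$failed" -ne 0 ]; then
    echo "app $app deleted; some machines reported errors" >&2
  fi
  echo "destroyed $app"
}
```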

#1578 — bun replaced with python3 for all JSON parsing:
  _fly_json_get, _fly_build_machine_body, _fly_list_orgs, destroy_server,
  list_servers all use python3 -c now. python3 is universally available;
  bun was only available after cloud-init completed on the target machine.

#1580 — upload_file uses stdin pipe instead of base64 string injection:
  'fly machine exec ... -- bash -c "cat > path" < local_file' streams
  file content directly. Eliminates the command-length/injection risk of
  embedding base64 content in a shell argument string.
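A sketch of the streamed upload. The remote path is quoted with `printf %q` before being embedded in `bash -c`, an addition here so paths with spaces survive; the file content never appears on the command line because it arrives on stdin. `FLY_MACHINE_ID` and `FLY_APP_NAME` are assumed to be set.

```bash
# Hypothetical upload_file: stream a local file to a path on the machine.
upload_file() {
  local local_path="$1" remote_path="$2"
  # File bytes travel over stdin; only the quoted path is part of the command.
  fly machine exec "$FLY_MACHINE_ID" --app "$FLY_APP_NAME" -- \
    bash -c "cat > $(printf '%q' "$remote_path")" < "$local_path"
}
```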

test/mock.sh: add 'fly machine exec' case to the fly CLI mock.
test/fixtures/fly/_env.sh: add FLY_MACHINE_ID to test env.

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 07:19:23 -08:00
L
b42d1a52d6
fix: don't bail on non-zero exit from fly orgs list + pass JSON as arg (#1582)
Some flyctl versions exit non-zero even on success. Removed '|| return 1'
so the output is always captured. Empty output is still a failure.

Also pass JSON as a bun argument (process.argv[1]) instead of piping via
stdin — avoids any Bun.stdin buffering issue in the _fly_list_orgs context.
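A sketch of capturing `fly orgs list --json` while ignoring its exit status. The `|| true` is an assumption added so the assignment cannot trip `set -e`; only empty output counts as failure. `_fly_list_orgs_json` is a hypothetical name.

```bash
# Hypothetical helper: print the org list JSON, failing only on empty output.
_fly_list_orgs_json() {
  local out
  # No `|| return 1`: some flyctl versions exit non-zero even on success.
  out=$(fly orgs list --json 2>/dev/null) || true
  if [ -z "$out" ]; then
    return 1
  fi
  printf '%s\n' "$out"
}
```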

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 07:03:56 -08:00
L
07ce24b710
fix: capture interactive_pick output and export FLY_ORG in _fly_prompt_org (#1568)
interactive_pick() echoes the selected value to stdout — it does NOT
export the env var. _fly_prompt_org was calling it without capturing
the output, so FLY_ORG was never set and the echo printed the org
slug as a raw string to the terminal.

Fix: org=$(interactive_pick ...) && export FLY_ORG.
Also guard with the standard FLY_ORG / SPAWN_NON_INTERACTIVE early-exit.
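A sketch of the corrected pattern. `interactive_pick` here is a stand-in that, like the real helper, echoes its choice instead of exporting it (it simply returns the first option); the org slugs are placeholders.

```bash
# Stand-in picker: echoes the selection, exports nothing.
interactive_pick() {
  printf '%s\n' "$1"
}

_fly_prompt_org() {
  # Early exit: an org is already chosen, or prompting is disabled.
  if [ -n "${FLY_ORG:-}" ] || [ -n "${SPAWN_NON_INTERACTIVE:-}" ]; then
    return 0
  fi
  local org
  # Capture the echoed value, then export it for child processes.
  org=$(interactive_pick personal acme) && export FLY_ORG="$org"
}
```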

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 06:40:23 -08:00
L
6fab9b6ae5
fix: use fly orgs list for org picker + strip ANSI from token capture (#1567)
1. _fly_list_orgs: use 'fly orgs list --json' (flyctl) instead of the
   non-existent api.fly.io/v1/organizations REST endpoint. Pipe through
   interactive_pick (same pattern as Hetzner/GCP pickers) so org
   selection uses the shared arrow-key / fzf / numbered-list picker.

2. fly auth token captures: add 'sed s/\x1b...//g' to strip ANSI color
   escape codes. flyctl may output the token with terminal colors even
   when stdout is piped; the ESC character (\033) fails the security
   character check (^[a-zA-Z0-9._/@:+=\ -]+$) causing the token to be
   marked malformed and cleared on the next run.
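A portable way to strip SGR color codes from a captured token. The ESC byte is produced with bash's `$'\033'` quoting rather than a `\x1b` escape inside the sed script, since BSD sed on macOS may not interpret `\x1b`. Only `ESC [ ... m` sequences are removed; `_strip_ansi` is a hypothetical name.

```bash
# Remove ANSI color (SGR) sequences from stdin.
_strip_ansi() {
  local esc=$'\033'
  sed "s/${esc}\[[0-9;]*m//g"
}

token=$(printf '\033[32mfm2_abc,fo1_def\033[0m\n' | _strip_ansi)
```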

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 06:35:44 -08:00
L
0f63596a12
feat: use Fly.io API to list orgs in picker for _fly_prompt_org (#1566)
Replace flyctl-based org listing with a direct API call to
api.fly.io/v1/organizations, feeding results into _display_and_select
(the shared arrow-key / fzf / numbered-list picker).

_fly_list_orgs():
  - Calls GET /v1/organizations with Bearer auth
  - Emits pipe-delimited "slug|name (type)" lines for _display_and_select

_fly_prompt_org():
  - Single org: auto-selects silently
  - Multiple orgs: shows arrow-key picker via _display_and_select
    (defaults to "personal" if that slug is in the list)
  - API unavailable: falls back to safe_read prompt with "personal" default
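A sketch of the selection defaults described above, operating on the pipe-delimited "slug|name (type)" lines. `_fly_default_org` is hypothetical; it only decides which slug to preselect and leaves the picker itself out.

```bash
# Reads "slug|name (type)" lines on stdin and prints the slug to preselect:
# the only org if there is exactly one, "personal" if listed or if no orgs
# came back (the safe_read default), otherwise the first slug.
_fly_default_org() {
  local line slug first="" count=0 has_personal=0
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -z "$line" ]; then continue; fi
    slug=${line%%"|"*}
    count=$((count + 1))
    if [ -z "$first" ]; then first=$slug; fi
    if [ "$slug" = "personal" ]; then has_personal=1; fi
  done
  if [ "$count" -eq 1 ]; then
    printf '%s\n' "$first"
  elif [ "$count" -eq 0 ] || [ "$has_personal" -eq 1 ]; then
    printf '%s\n' personal
  else
    printf '%s\n' "$first"
  fi
}
```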

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-02-21 06:31:57 -08:00