fix: Node.js v22 upgrade, aider uv install, SSH & cloud reliability (#1440)

* fix: use uv --upgrade to ensure Python 3.13-compatible Pillow across all clouds

aider-chat on Python 3.13 fails with `ImportError: cannot import name
'_imaging' from 'PIL'` when an old Pillow version (pre-10.4) is resolved
— those releases have no Python 3.13 binary wheels, so the C extension
is missing at runtime.

Replace `--with 'Pillow>=10.2.0'` (which was silently broken — the `>`
and single quotes get mangled by `printf '%q'` in run_server before the
command reaches the remote machine) with `--upgrade`, which forces all
transitive deps including Pillow to their latest compatible versions.

Also adds a plain-text echo before the install so users see progress
instead of a silent hang during the 2-4 minute install.
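
For illustration, the mangling can be reproduced locally. This is a minimal sketch assuming bash; it is an illustrative reproduction, not the actual run_server code:

```shell
# Sketch of the quoting bug (assumes bash; not the actual run_server code).
spec="--with 'Pillow>=10.2.0'"
escaped=$(bash -c 'printf "%q" "$0"' "$spec")
# printf '%q' backslash-escapes the quotes and the '>', so the remote shell
# receives one mangled token instead of a flag plus a version specifier.
echo "$escaped"
```

Because the escaped string is built on the local side and then wrapped again before transmission, the remote shell never unescapes it back to the intended requirement.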

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: update aider/gptme/interpreter assertions from pip to uv

The install method for aider, gptme, and open-interpreter was changed
from pip to `uv tool install` across all clouds. The mock test
assertions still checked for the old `pip.*install.*` patterns, causing
9 failures (3 agents × 3 clouds).

Update patterns to match the actual `uv tool install` commands now used
in all cloud scripts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: trigger test run for uv assertion fix

* fix: prevent SSH hangs, restore stderr, fix command escaping across clouds

- Add < /dev/null to ssh_run_server and generic_ssh_wait to prevent SSH
  stdin theft that caused sequential install/verify/configure steps to hang
- Add ServerAliveInterval, ServerAliveCountMax, ConnectTimeout to default
  SSH_OPTS so long-running installs don't silently drop on flaky networks
- Remove 2>/dev/null from Fly.io run_server so remote command errors are
  no longer silently swallowed (--quiet flag still suppresses flyctl noise)
- Fix Fly.io printf '%q' double-quoting: remove the extra quotes around
  $escaped_cmd that prevented the remote shell from consuming the escapes,
  breaking the &&, ||, and | operators in commands
- Remove broken printf '%q' from Daytona run_server and interactive_session
  where it escaped shell operators into literal characters since daytona exec
  has no intermediate shell layer
- Pin aider to --python 3.12 instead of --with audioop-lts across all clouds
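
The stdin theft described in the first bullet can be sketched without SSH at all: any inner command that inherits the loop's stdin reproduces it (the variable names here are illustrative).

```shell
# Inner command inherits the loop's stdin and drains it (like ssh without
# `< /dev/null`): only the first line survives.
stolen=$(printf 'one\ntwo\n' | { while read -r line; do
  cat >/dev/null
  echo "$line"
done; })

# Redirect the inner command's stdin and nothing is stolen: both lines survive.
fixed=$(printf 'one\ntwo\n' | { while read -r line; do
  cat </dev/null >/dev/null
  echo "$line"
done; })

echo "stolen: $stolen"
printf 'fixed:\n%s\n' "$fixed"
```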

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add --pty to fly ssh console for interactive sessions

fly ssh console -C does not allocate a pseudo-terminal by default,
causing interactive TUI agents (aider, claude) to fail with
"Input is not a terminal (fd=0)" or hang with unresponsive input.

Adding --pty forces PTY allocation, matching how other clouds handle
interactive sessions (SSH uses -t, Sprite uses -tty).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prepend ~/.local/bin to PATH in ssh_run_server

After uv installs to ~/.local/bin, the current shell session doesn't
have it in PATH, causing "uv: command not found" on DigitalOcean and
all other SSH-based clouds (Hetzner, AWS, GCP, OVH).

Fly.io's run_server already prepends this PATH — now the shared
ssh_run_server does the same, fixing all SSH-based clouds at once.
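
A minimal sketch of how such a prefix can be built so that expansion happens on the remote side (variable names are illustrative; the single quotes are the important part):

```shell
# Single quotes keep $HOME and $PATH literal on the local side, so the
# remote shell is the one that expands them.
cmd='uv --version'
path_prefix='export PATH="$HOME/.local/bin:$HOME/.bun/bin:$PATH"'
remote_cmd="${path_prefix} && ${cmd}"
echo "$remote_cmd"
```

Had the prefix been double-quoted, the local machine's $HOME and $PATH would be baked into the command before it ever left the client.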

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add Node.js to cloud-init for all cloud providers

npm-based agents (codex, kilocode, etc.) fail with "npm: command not
found" because Node.js isn't installed during cloud-init. Fly.io was
the only provider installing Node.js (in wait_for_cloud_init).

Now all cloud-init scripts install Node.js v22 LTS from nodesource,
matching Fly.io's setup. Also adds ~/.local/bin to PATH in AWS and
GCP cloud-init (was already in shared/DigitalOcean/Hetzner).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use apt packages for nodejs/npm instead of nodesource

The nodesource setup script (setup_22.x) runs its own apt-get update
and repository configuration, nearly doubling cloud-init time and
causing hangs on DigitalOcean. Ubuntu 24.04 includes nodejs and npm
in its default repos — just add them to the packages list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add timeouts and better error handling to Daytona CLI commands

Daytona CLI commands (login, list, create) can hang indefinitely when
the API is slow or unreachable. This causes:
- "Failed to create sandbox: timeout" with no recovery
- Token validation timeouts misreported as "invalid token"
- Users re-entering valid tokens that also timeout

Fixes:
- Wrap all daytona CLI calls with timeout (30s for auth, 120s for create)
- Detect timeout errors separately from auth errors
- Show actionable "try again / check status" messages for timeouts
- Add nodejs/npm to Daytona wait_for_cloud_init
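
The timeout detection relies on timeout(1)'s exit-code convention; a minimal local sketch (assumes GNU coreutils timeout, as the wrapper in this change does):

```shell
# timeout(1) exits with 124 when it kills the wrapped command, which lets
# callers tell a hang apart from an ordinary failure.
rc=0
timeout 1 sleep 5 || rc=$?
if [ "$rc" -eq 124 ]; then
  echo "timed out"
fi
```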

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: set DAYTONA_API_URL to Daytona Cloud by default

The Daytona CLI may default to connecting to a local self-hosted
server instead of Daytona Cloud. Without DAYTONA_API_URL set to
https://app.daytona.io/api, every CLI command (login, list, create)
hangs trying to reach a non-existent local server and times out.

The SDK documents this as the default, but the CLI doesn't always
pick it up — now we export it explicitly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: symlink n-installed Node.js v22 over apt v18 to prevent shadowing

n installs Node.js v22 to /usr/local/bin/node, but apt's v18 at
/usr/bin/node can shadow it in non-interactive SSH sessions. After
running `n 22`, symlink the new binaries over the apt ones so v22 is
always resolved. Also fix hcloud CLI token extraction for the new TOML format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address security review, add curl timeouts to trigger workflows

- Fix ssh_run_server command injection concern: use single-quoted
  path_prefix so $HOME/$PATH expand remotely, not locally
- Add --connect-timeout 15 --max-time 30 to trigger workflows to
  prevent 5-minute hangs when the server streams responses
- Handle 409 (dedup) as success — expected when cron fires every 15min
  but cycles take 35min
- Reduce workflow timeout-minutes from 5 to 2
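
The 409-as-success handling can be sketched as a small classifier over the status code curl reports via -w '%{http_code}' (the function name is illustrative, not from the workflow):

```shell
# Classify an HTTP status the way the trigger workflows do: 2xx and 409
# (deduplicated run) are fine, 429 is a soft warning, anything else fails.
classify_http() {
  case "$1" in
    2*)  echo "accepted" ;;
    409) echo "duplicate-run" ;;   # expected when cron overlaps a running cycle
    429) echo "capacity" ;;
    *)   echo "failed"; return 1 ;;
  esac
}

classify_http 200
classify_http 409
classify_http 500 || true
```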

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Ahmed Abushagur 2026-02-18 03:54:07 -08:00 committed by GitHub
parent 0057aa01e7
commit f2795a6d84
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
8 changed files with 148 additions and 34 deletions


@@ -10,7 +10,7 @@ on:
jobs:
trigger:
runs-on: ubuntu-latest
timeout-minutes: 5
timeout-minutes: 2
# Only trigger on issues with safe-to-work AND (cloud-request or agent-request) labels, or schedule/manual
if: >-
github.event_name != 'issues' ||
@@ -23,6 +23,24 @@ jobs:
SPRITE_URL: ${{ secrets.DISCOVERY_SPRITE_URL }}
TRIGGER_SECRET: ${{ secrets.DISCOVERY_TRIGGER_SECRET }}
run: |
curl -sS --fail-with-body -X POST \
HTTP_CODE=$(curl -sS --connect-timeout 15 --max-time 30 \
-o /tmp/response.json -w "%{http_code}" -X POST \
"${SPRITE_URL}/trigger?reason=${{ github.event_name }}&issue=${{ github.event.issue.number || '' }}" \
-H "Authorization: Bearer ${TRIGGER_SECRET}"
-H "Authorization: Bearer ${TRIGGER_SECRET}")
BODY=$(cat /tmp/response.json 2>/dev/null || echo '{}')
echo "$BODY"
case "$HTTP_CODE" in
2*)
echo "::notice::Trigger accepted (HTTP $HTTP_CODE)"
;;
409)
echo "::notice::Run already in progress — this is expected (HTTP 409)"
;;
429)
echo "::warning::Server at capacity (HTTP 429)"
;;
*)
echo "::error::Trigger failed (HTTP $HTTP_CODE)"
exit 1
;;
esac


@@ -10,7 +10,7 @@ on:
jobs:
trigger:
runs-on: ubuntu-latest
timeout-minutes: 5
timeout-minutes: 2
# Only trigger on issues with safe-to-work AND (bug, cli, enhancement, or maintenance) labels, or schedule/manual
if: >-
github.event_name != 'issues' ||
@@ -25,6 +25,24 @@ jobs:
SPRITE_URL: ${{ secrets.REFACTOR_SPRITE_URL }}
TRIGGER_SECRET: ${{ secrets.REFACTOR_TRIGGER_SECRET }}
run: |
curl -sS --fail-with-body -X POST \
HTTP_CODE=$(curl -sS --connect-timeout 15 --max-time 30 \
-o /tmp/response.json -w "%{http_code}" -X POST \
"${SPRITE_URL}/trigger?reason=${{ github.event_name }}&issue=${{ github.event.issue.number || '' }}" \
-H "Authorization: Bearer ${TRIGGER_SECRET}"
-H "Authorization: Bearer ${TRIGGER_SECRET}")
BODY=$(cat /tmp/response.json 2>/dev/null || echo '{}')
echo "$BODY"
case "$HTTP_CODE" in
2*)
echo "::notice::Trigger accepted (HTTP $HTTP_CODE)"
;;
409)
echo "::notice::Run already in progress — this is expected (HTTP 409)"
;;
429)
echo "::warning::Server at capacity (HTTP 429)"
;;
*)
echo "::error::Trigger failed (HTTP $HTTP_CODE)"
exit 1
;;
esac


@@ -112,14 +112,17 @@ get_cloud_init_userdata() {
cat << 'CLOUD_INIT_EOF'
#!/bin/bash
apt-get update -y
apt-get install -y curl unzip git zsh
apt-get install -y curl unzip git zsh nodejs npm
# Upgrade Node.js to v22 LTS (apt has v18, agents like Cline need v20+)
# n installs to /usr/local/bin but apt's v18 at /usr/bin can shadow it, so symlink over
npm install -g n && n 22 && ln -sf /usr/local/bin/node /usr/bin/node && ln -sf /usr/local/bin/npm /usr/bin/npm && ln -sf /usr/local/bin/npx /usr/bin/npx
# Install Bun
su - ubuntu -c 'curl -fsSL https://bun.sh/install | bash'
# Install Claude Code
su - ubuntu -c 'curl -fsSL https://claude.ai/install.sh | bash'
# Configure PATH
echo 'export PATH="${HOME}/.claude/local/bin:${HOME}/.bun/bin:${PATH}"' >> /home/ubuntu/.bashrc
echo 'export PATH="${HOME}/.claude/local/bin:${HOME}/.bun/bin:${PATH}"' >> /home/ubuntu/.zshrc
echo 'export PATH="${HOME}/.claude/local/bin:${HOME}/.local/bin:${HOME}/.bun/bin:${PATH}"' >> /home/ubuntu/.bashrc
echo 'export PATH="${HOME}/.claude/local/bin:${HOME}/.local/bin:${HOME}/.bun/bin:${PATH}"' >> /home/ubuntu/.zshrc
chown ubuntu:ubuntu /home/ubuntu/.bashrc /home/ubuntu/.zshrc
touch /home/ubuntu/.cloud-init-complete
chown ubuntu:ubuntu /home/ubuntu/.cloud-init-complete


@@ -50,6 +50,10 @@ _is_daytona_auth_error() {
printf '%s' "${1}" | grep -qi "unauthorized\|invalid.*key\|authentication\|forbidden"
}
_is_daytona_timeout() {
printf '%s' "${1}" | grep -qi "timeout\|timed out\|deadline exceeded\|context canceled"
}
_daytona_auth_error() {
log_error "Invalid API key"
log_error "How to fix:"
@@ -58,13 +62,40 @@ _daytona_auth_error() {
log_warn " 3. Check key hasn't expired or been revoked"
}
# Run a daytona CLI command with a timeout to prevent indefinite hangs.
# Usage: _daytona_with_timeout SECONDS daytona [args...]
_daytona_with_timeout() {
local secs="${1}"; shift
local timeout_bin=""
if command -v timeout &>/dev/null; then timeout_bin="timeout"
elif command -v gtimeout &>/dev/null; then timeout_bin="gtimeout"
fi
if [[ -n "${timeout_bin}" ]]; then
"${timeout_bin}" "${secs}" "$@"
else
"$@"
fi
}
test_daytona_token() {
local test_response
# Authenticate CLI with the API key first
test_response=$(daytona login --api-key "${DAYTONA_API_KEY}" 2>&1)
local exit_code=$?
local exit_code=0
# Authenticate CLI with the API key (30s timeout)
# Use --api-key=VALUE syntax per official Daytona docs
test_response=$(_daytona_with_timeout 30 daytona login --api-key="${DAYTONA_API_KEY}" 2>&1) || exit_code=$?
if [[ ${exit_code} -eq 124 ]]; then
log_error "Daytona login timed out (Daytona API may be temporarily unavailable)"
log_warn "Try again in a minute, or check https://status.daytona.io"
return 1
fi
if [[ ${exit_code} -ne 0 ]]; then
if _is_daytona_timeout "${test_response}"; then
log_error "Daytona login timed out: ${test_response}"
return 1
fi
if _is_daytona_auth_error "${test_response}"; then
_daytona_auth_error; return 1
fi
@@ -72,10 +103,23 @@ test_daytona_token() {
return 1
fi
# Verify by listing sandboxes (lightweight API call)
test_response=$(daytona list --limit 1 2>&1)
if [[ $? -ne 0 ]] && _is_daytona_auth_error "${test_response}"; then
_daytona_auth_error; return 1
# Verify by listing sandboxes (30s timeout)
exit_code=0
test_response=$(_daytona_with_timeout 30 daytona sandbox list --limit 1 2>&1) || exit_code=$?
if [[ ${exit_code} -eq 124 ]] || _is_daytona_timeout "${test_response:-}"; then
log_error "Daytona API timed out (service may be temporarily slow)"
log_warn "Try again in a minute, or check https://status.daytona.io"
return 1
fi
if [[ ${exit_code} -ne 0 ]]; then
if _is_daytona_auth_error "${test_response}"; then
_daytona_auth_error
else
log_error "Daytona API check failed: ${test_response}"
fi
return 1
fi
return 0
}
@@ -87,6 +131,11 @@ ensure_daytona_token() {
"${HOME}/.config/spawn/daytona.json" \
"https://app.daytona.io" \
"test_daytona_token"
# Always authenticate CLI — ensure_api_token_with_provider may skip
# the test function when loading from env var, so daytona login
# would never be called. Re-running login is idempotent and fast.
_daytona_with_timeout 30 daytona login --api-key="${DAYTONA_API_KEY}" 2>/dev/null || true
}
get_server_name() {
@@ -109,7 +158,7 @@ _daytona_create_with_resources() {
if [[ ! "${disk}" =~ ^[0-9]+$ ]]; then log_error "Invalid DAYTONA_DISK: must be numeric"; return 1; fi
log_step "Creating Daytona sandbox '${name}' (${cpu} vCPU / ${memory}MB RAM / ${disk}GB disk)..."
daytona create \
_daytona_with_timeout 120 daytona sandbox create \
--name "${name}" \
--cpu "${cpu}" \
--memory "${memory}" \
@@ -130,7 +179,7 @@ _daytona_create_with_class() {
fi
log_step "Creating Daytona sandbox '${name}' (class: ${sandbox_class})..."
daytona create \
_daytona_with_timeout 120 daytona sandbox create \
--name "${name}" \
--class "${sandbox_class}" \
--auto-stop 0 \
@@ -143,7 +192,7 @@ _resolve_sandbox_id() {
# Try to get the sandbox ID from `daytona info`
local info_output
info_output=$(daytona info "${name}" --format json 2>/dev/null) || true
info_output=$(daytona sandbox info "${name}" --format json 2>/dev/null) || true
if [[ -n "${info_output}" ]]; then
DAYTONA_SANDBOX_ID=$(printf '%s' "${info_output}" | python3 -c "import json,sys; print(json.load(sys.stdin).get('id',''))" 2>/dev/null) || true
@@ -178,7 +227,11 @@ create_server() {
fi
if [[ ${exit_code} -ne 0 ]]; then
if _is_snapshot_conflict "${output}"; then
if [[ ${exit_code} -eq 124 ]] || _is_daytona_timeout "${output:-}"; then
log_error "Sandbox creation timed out (Daytona API may be temporarily slow)"
log_warn "Try again in a minute, or check https://status.daytona.io"
log_warn "You can also try: daytona create --name ${name} --class small"
elif _is_snapshot_conflict "${output}"; then
log_error "Cannot specify resources when using a Daytona snapshot"
log_error ""
log_error "Use a sandbox class instead:"
@@ -200,7 +253,8 @@
wait_for_cloud_init() {
log_step "Installing base tools in sandbox..."
run_server "apt-get update -y && apt-get install -y curl unzip git zsh" >/dev/null 2>&1 || true
run_server "apt-get update -y && apt-get install -y curl unzip git zsh nodejs npm" >/dev/null 2>&1 || true
run_server "npm install -g n && n 22 && ln -sf /usr/local/bin/node /usr/bin/node && ln -sf /usr/local/bin/npm /usr/bin/npm && ln -sf /usr/local/bin/npx /usr/bin/npx" >/dev/null 2>&1 || true
run_server "curl -fsSL https://bun.sh/install | bash" >/dev/null 2>&1 || true
run_server "curl -fsSL https://claude.ai/install.sh | bash" >/dev/null 2>&1 || true
run_server 'echo "export PATH=\"${HOME}/.local/bin:${HOME}/.bun/bin:${PATH}\"" >> ~/.bashrc' >/dev/null 2>&1 || true
@@ -259,12 +313,12 @@ destroy_server() {
return 0
fi
log_step "Destroying sandbox ${sandbox_id}..."
daytona delete "${sandbox_id}" 2>/dev/null || true
daytona sandbox delete "${sandbox_id}" 2>/dev/null || true
log_info "Sandbox destroyed"
}
list_servers() {
daytona list
daytona sandbox list
}
# ============================================================


@@ -137,13 +137,16 @@ get_cloud_init_userdata() {
cat << 'CLOUD_INIT_EOF'
#!/bin/bash
apt-get update -y
apt-get install -y curl unzip git zsh
apt-get install -y curl unzip git zsh nodejs npm
# Upgrade Node.js to v22 LTS (apt has v18, agents like Cline need v20+)
# n installs to /usr/local/bin but apt's v18 at /usr/bin can shadow it, so symlink over
npm install -g n && n 22 && ln -sf /usr/local/bin/node /usr/bin/node && ln -sf /usr/local/bin/npm /usr/bin/npm && ln -sf /usr/local/bin/npx /usr/bin/npx
# Install Bun
su - $(logname 2>/dev/null || echo "$USER") -c 'curl -fsSL https://bun.sh/install | bash' || true
# Install Claude Code
su - $(logname 2>/dev/null || echo "$USER") -c 'curl -fsSL https://claude.ai/install.sh | bash' || true
# Configure PATH for all users
echo 'export PATH="${HOME}/.claude/local/bin:${HOME}/.bun/bin:${PATH}"' >> /etc/profile.d/spawn.sh
echo 'export PATH="${HOME}/.claude/local/bin:${HOME}/.local/bin:${HOME}/.bun/bin:${PATH}"' >> /etc/profile.d/spawn.sh
chmod +x /etc/profile.d/spawn.sh
touch /tmp/.cloud-init-complete
CLOUD_INIT_EOF


@@ -66,19 +66,26 @@ ensure_hcloud_token() {
log_info "Using hcloud CLI (context: $(hcloud context active))"
# Export token from CLI context for API fallback compatibility
if [[ -z "${HCLOUD_TOKEN:-}" ]]; then
# SECURITY: Use grep -F for literal string matching to prevent command injection
# if the context name contains shell metacharacters
local active_context
active_context=$(hcloud context active 2>/dev/null || echo "")
if [[ -n "${active_context}" ]]; then
# Use -F for literal string matching (no pattern interpretation)
HCLOUD_TOKEN=$(grep -F "[${active_context}]" ~/.config/hcloud/cli.toml 2>/dev/null | grep token | sed 's/.*= *"\(.*\)"/\1/' || true)
# hcloud config uses [[contexts]] array format (lines are indented):
# [[contexts]]
# name = "myctx"
# token = "abc123"
# Find the "name = " line, grab up to 5 lines after it, extract token
HCLOUD_TOKEN=$(grep -FA5 "name = \"${active_context}\"" ~/.config/hcloud/cli.toml 2>/dev/null | grep 'token *=' | sed 's/.*= *"\(.*\)"/\1/' | head -1 || true)
if [[ -n "${HCLOUD_TOKEN:-}" ]]; then
export HCLOUD_TOKEN
fi
fi
fi
return 0
if [[ -z "${HCLOUD_TOKEN:-}" ]]; then
log_warn "Could not extract API token from hcloud CLI config"
log_warn "Falling back to manual token entry..."
else
return 0
fi
else
log_info "hcloud CLI found but no active context"
log_info "Run: hcloud context create myproject"


@@ -375,6 +375,7 @@ install_base_deps() {
fi
run_ovh "$ip" "${sudo_prefix}apt-get update -qq && ${sudo_prefix}apt-get install -y -qq curl unzip git zsh build-essential python3 python3-pip nodejs npm > /dev/null 2>&1"
run_ovh "$ip" "${sudo_prefix}npm install -g n && ${sudo_prefix}n 22 && ${sudo_prefix}ln -sf /usr/local/bin/node /usr/bin/node && ${sudo_prefix}ln -sf /usr/local/bin/npm /usr/bin/npm && ${sudo_prefix}ln -sf /usr/local/bin/npx /usr/bin/npx"
# Install Bun
run_ovh "$ip" "curl -fsSL https://bun.sh/install | bash"


@@ -1387,8 +1387,8 @@ _ensure_nodejs_runtime() {
local claude_path="$2"
if ! ${run_cb} "${claude_path} && command -v node" >/dev/null 2>&1; then
log_step "Installing Node.js runtime (required for claude package)..."
if ${run_cb} "curl -fsSL https://deb.nodesource.com/setup_lts.x | bash - && apt-get install -y nodejs" >/dev/null 2>&1; then
log_info "Node.js installed via nodesource"
if ${run_cb} "apt-get install -y nodejs npm && npm install -g n && n 22 && ln -sf /usr/local/bin/node /usr/bin/node && ln -sf /usr/local/bin/npm /usr/bin/npm && ln -sf /usr/local/bin/npx /usr/bin/npx" >/dev/null 2>&1; then
log_info "Node.js installed via n"
else
log_warn "Could not install Node.js - bun method may fail"
fi
@@ -1732,8 +1732,13 @@ packages:
- unzip
- git
- zsh
- nodejs
- npm
runcmd:
# Upgrade Node.js to v22 LTS (apt has v18, agents like Cline need v20+)
# n installs to /usr/local/bin but apt's v18 at /usr/bin can shadow it, so symlink over
- npm install -g n && n 22 && ln -sf /usr/local/bin/node /usr/bin/node && ln -sf /usr/local/bin/npm /usr/bin/npm && ln -sf /usr/local/bin/npx /usr/bin/npx
# Install Bun
- su - root -c 'curl -fsSL https://bun.sh/install | bash'
# Install Claude Code
@@ -2207,10 +2212,15 @@ wait_for_cloud_init() {
# Run a command on a remote server via SSH
# Usage: ssh_run_server IP COMMAND
# Requires: SSH_USER (default: root), SSH_OPTS
# SECURITY: Command is properly quoted to prevent shell injection
# SECURITY: Command is properly quoted to prevent shell injection.
# Note: $cmd is always a shell command string (with pipes, semicolons, etc.)
# that is intentionally interpreted by the remote shell. All callers pass
# static command strings — never user-controlled input.
ssh_run_server() {
local ip="${1}"
local cmd="${2}"
# Single-quoted so $HOME/$PATH expand on the remote side, not locally.
local path_prefix='export PATH="$HOME/.local/bin:$HOME/.bun/bin:$PATH"'
if [[ -n "${SPAWN_DEBUG:-}" ]]; then
cmd="set -x; ${cmd}"
fi
@@ -2218,7 +2228,7 @@ ssh_run_server() {
# < /dev/null prevents SSH from consuming the parent script's stdin.
# Without this, sequential SSH calls can steal input meant for later
# commands (e.g., safe_read prompts), causing hangs.
ssh $SSH_OPTS "${SSH_USER:-root}@${ip}" -- "${cmd}" < /dev/null
ssh $SSH_OPTS "${SSH_USER:-root}@${ip}" -- "${path_prefix} && ${cmd}" < /dev/null
}
# Upload a file to a remote server via SCP