fix: Node.js v22 upgrade, aider uv install, SSH & cloud reliability (#1440)

* fix: use uv --upgrade to ensure Python 3.13-compatible Pillow across all clouds

aider-chat on Python 3.13 fails with `ImportError: cannot import name
'_imaging' from 'PIL'` when an old Pillow version (pre-10.4) is resolved
— those releases have no Python 3.13 binary wheels, so the C extension
is missing at runtime.

Replace `--with 'Pillow>=10.2.0'` (which was silently broken — the `>`
and single quotes get mangled by `printf '%q'` in run_server before the
command reaches the remote machine) with `--upgrade`, which forces all
transitive deps including Pillow to their latest compatible versions.

Also adds a plain-text echo before the install so users see progress
instead of a silent hang during the 2-4 minute install.
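
For illustration, the mangling can be reproduced locally. This is a minimal sketch assuming bash; it is an illustrative reproduction, not the actual run_server code:

```shell
# Sketch of the quoting bug (assumes bash; not the actual run_server code).
spec="--with 'Pillow>=10.2.0'"
escaped=$(bash -c 'printf "%q" "$0"' "$spec")
# printf '%q' backslash-escapes the quotes and the '>', so the remote shell
# receives one mangled token instead of a flag plus a version specifier.
echo "$escaped"
```

Because the escaped string is built on the local side and then wrapped again before transmission, the remote shell never unescapes it back to the intended requirement.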

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: update aider/gptme/interpreter assertions from pip to uv

The install method for aider, gptme, and open-interpreter was changed
from pip to `uv tool install` across all clouds. The mock test
assertions still checked for the old `pip.*install.*` patterns, causing
9 failures (3 agents × 3 clouds).

Update patterns to match the actual `uv tool install` commands now used
in all cloud scripts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: trigger test run for uv assertion fix

* fix: prevent SSH hangs, restore stderr, fix command escaping across clouds

- Add < /dev/null to ssh_run_server and generic_ssh_wait to prevent SSH
  stdin theft that caused sequential install/verify/configure steps to hang
- Add ServerAliveInterval, ServerAliveCountMax, ConnectTimeout to default
  SSH_OPTS so long-running installs don't silently drop on flaky networks
- Remove 2>/dev/null from Fly.io run_server so remote command errors are
  no longer silently swallowed (--quiet flag still suppresses flyctl noise)
- Fix Fly.io printf '%q' double-quoting: remove the extra quotes around
  $escaped_cmd that prevented the remote shell from consuming the escapes,
  breaking the &&, ||, and | operators in commands
- Remove broken printf '%q' from Daytona run_server and interactive_session
  where it escaped shell operators into literal characters since daytona exec
  has no intermediate shell layer
- Pin aider to --python 3.12 instead of --with audioop-lts across all clouds
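
The stdin theft described in the first bullet can be sketched without SSH at all: any inner command that inherits the loop's stdin reproduces it (the variable names here are illustrative).

```shell
# Inner command inherits the loop's stdin and drains it (like ssh without
# `< /dev/null`): only the first line survives.
stolen=$(printf 'one\ntwo\n' | { while read -r line; do
  cat >/dev/null
  echo "$line"
done; })

# Redirect the inner command's stdin and nothing is stolen: both lines survive.
fixed=$(printf 'one\ntwo\n' | { while read -r line; do
  cat </dev/null >/dev/null
  echo "$line"
done; })

echo "stolen: $stolen"
printf 'fixed:\n%s\n' "$fixed"
```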

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add --pty to fly ssh console for interactive sessions

fly ssh console -C does not allocate a pseudo-terminal by default,
causing interactive TUI agents (aider, claude) to fail with
"Input is not a terminal (fd=0)" or hang with unresponsive input.

Adding --pty forces PTY allocation, matching how other clouds handle
interactive sessions (SSH uses -t, Sprite uses -tty).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prepend ~/.local/bin to PATH in ssh_run_server

After uv installs to ~/.local/bin, the current shell session doesn't
have it in PATH, causing "uv: command not found" on DigitalOcean and
all other SSH-based clouds (Hetzner, AWS, GCP, OVH).

Fly.io's run_server already prepends this PATH — now the shared
ssh_run_server does the same, fixing all SSH-based clouds at once.
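
A minimal sketch of how such a prefix can be built so that expansion happens on the remote side (variable names are illustrative; the single quotes are the important part):

```shell
# Single quotes keep $HOME and $PATH literal on the local side, so the
# remote shell is the one that expands them.
cmd='uv --version'
path_prefix='export PATH="$HOME/.local/bin:$HOME/.bun/bin:$PATH"'
remote_cmd="${path_prefix} && ${cmd}"
echo "$remote_cmd"
```

Had the prefix been double-quoted, the local machine's $HOME and $PATH would be baked into the command before it ever left the client.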

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add Node.js to cloud-init for all cloud providers

npm-based agents (codex, kilocode, etc.) fail with "npm: command not
found" because Node.js isn't installed during cloud-init. Fly.io was
the only provider installing Node.js (in wait_for_cloud_init).

Now all cloud-init scripts install Node.js v22 LTS from nodesource,
matching Fly.io's setup. Also adds ~/.local/bin to PATH in AWS and
GCP cloud-init (was already in shared/DigitalOcean/Hetzner).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use apt packages for nodejs/npm instead of nodesource

The nodesource setup script (setup_22.x) runs its own apt-get update
and repository configuration, nearly doubling cloud-init time and
causing hangs on DigitalOcean. Ubuntu 24.04 includes nodejs and npm
in its default repos — just add them to the packages list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add timeouts and better error handling to Daytona CLI commands

Daytona CLI commands (login, list, create) can hang indefinitely when
the API is slow or unreachable. This causes:
- "Failed to create sandbox: timeout" with no recovery
- Token validation timeouts misreported as "invalid token"
- Users re-entering valid tokens that also timeout

Fixes:
- Wrap all daytona CLI calls with timeout (30s for auth, 120s for create)
- Detect timeout errors separately from auth errors
- Show actionable "try again / check status" messages for timeouts
- Add nodejs/npm to Daytona wait_for_cloud_init
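
The timeout detection relies on timeout(1)'s exit-code convention; a minimal local sketch (assumes GNU coreutils timeout, as the wrapper in this change does):

```shell
# timeout(1) exits with 124 when it kills the wrapped command, which lets
# callers tell a hang apart from an ordinary failure.
rc=0
timeout 1 sleep 5 || rc=$?
if [ "$rc" -eq 124 ]; then
  echo "timed out"
fi
```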

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: set DAYTONA_API_URL to Daytona Cloud by default

The Daytona CLI may default to connecting to a local self-hosted
server instead of Daytona Cloud. Without DAYTONA_API_URL set to
https://app.daytona.io/api, every CLI command (login, list, create)
hangs trying to reach a non-existent local server and times out.

The SDK documents this as the default, but the CLI doesn't always
pick it up — now we export it explicitly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: symlink n-installed Node.js v22 over apt v18 to prevent shadowing

n installs Node.js v22 to /usr/local/bin/node, but apt's v18 at
/usr/bin/node can shadow it in non-interactive SSH sessions. After
running `n 22`, symlink the new binaries over the apt ones so v22 is
always resolved. Also fix hcloud CLI token extraction for the new TOML format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address security review, add curl timeouts to trigger workflows

- Fix ssh_run_server command injection concern: use single-quoted
  path_prefix so $HOME/$PATH expand remotely, not locally
- Add --connect-timeout 15 --max-time 30 to trigger workflows to
  prevent 5-minute hangs when the server streams responses
- Handle 409 (dedup) as success — expected when cron fires every 15min
  but cycles take 35min
- Reduce workflow timeout-minutes from 5 to 2
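
The 409-as-success handling can be sketched as a small classifier over the status code curl reports via -w '%{http_code}' (the function name is illustrative, not from the workflow):

```shell
# Classify an HTTP status the way the trigger workflows do: 2xx and 409
# (deduplicated run) are fine, 429 is a soft warning, anything else fails.
classify_http() {
  case "$1" in
    2*)  echo "accepted" ;;
    409) echo "duplicate-run" ;;   # expected when cron overlaps a running cycle
    429) echo "capacity" ;;
    *)   echo "failed"; return 1 ;;
  esac
}

classify_http 200
classify_http 409
classify_http 500 || true
```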

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Ahmed Abushagur 2026-02-18 03:54:07 -08:00 committed by GitHub
parent 0057aa01e7
commit f2795a6d84
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
8 changed files with 148 additions and 34 deletions


@@ -10,7 +10,7 @@ on:
jobs:
trigger:
runs-on: ubuntu-latest
timeout-minutes: 5
timeout-minutes: 2
# Only trigger on issues with safe-to-work AND (cloud-request or agent-request) labels, or schedule/manual
if: >-
github.event_name != 'issues' ||
@@ -23,6 +23,24 @@ jobs:
SPRITE_URL: ${{ secrets.DISCOVERY_SPRITE_URL }}
TRIGGER_SECRET: ${{ secrets.DISCOVERY_TRIGGER_SECRET }}
run: |
curl -sS --fail-with-body -X POST \
HTTP_CODE=$(curl -sS --connect-timeout 15 --max-time 30 \
-o /tmp/response.json -w "%{http_code}" -X POST \
"${SPRITE_URL}/trigger?reason=${{ github.event_name }}&issue=${{ github.event.issue.number || '' }}" \
-H "Authorization: Bearer ${TRIGGER_SECRET}"
-H "Authorization: Bearer ${TRIGGER_SECRET}")
BODY=$(cat /tmp/response.json 2>/dev/null || echo '{}')
echo "$BODY"
case "$HTTP_CODE" in
2*)
echo "::notice::Trigger accepted (HTTP $HTTP_CODE)"
;;
409)
echo "::notice::Run already in progress — this is expected (HTTP 409)"
;;
429)
echo "::warning::Server at capacity (HTTP 429)"
;;
*)
echo "::error::Trigger failed (HTTP $HTTP_CODE)"
exit 1
;;
esac


@@ -10,7 +10,7 @@ on:
jobs:
trigger:
runs-on: ubuntu-latest
timeout-minutes: 5
timeout-minutes: 2
# Only trigger on issues with safe-to-work AND (bug, cli, enhancement, or maintenance) labels, or schedule/manual
if: >-
github.event_name != 'issues' ||
@@ -25,6 +25,24 @@ jobs:
SPRITE_URL: ${{ secrets.REFACTOR_SPRITE_URL }}
TRIGGER_SECRET: ${{ secrets.REFACTOR_TRIGGER_SECRET }}
run: |
curl -sS --fail-with-body -X POST \
HTTP_CODE=$(curl -sS --connect-timeout 15 --max-time 30 \
-o /tmp/response.json -w "%{http_code}" -X POST \
"${SPRITE_URL}/trigger?reason=${{ github.event_name }}&issue=${{ github.event.issue.number || '' }}" \
-H "Authorization: Bearer ${TRIGGER_SECRET}"
-H "Authorization: Bearer ${TRIGGER_SECRET}")
BODY=$(cat /tmp/response.json 2>/dev/null || echo '{}')
echo "$BODY"
case "$HTTP_CODE" in
2*)
echo "::notice::Trigger accepted (HTTP $HTTP_CODE)"
;;
409)
echo "::notice::Run already in progress — this is expected (HTTP 409)"
;;
429)
echo "::warning::Server at capacity (HTTP 429)"
;;
*)
echo "::error::Trigger failed (HTTP $HTTP_CODE)"
exit 1
;;
esac


@@ -112,14 +112,17 @@ get_cloud_init_userdata() {
cat << 'CLOUD_INIT_EOF'
#!/bin/bash
apt-get update -y
apt-get install -y curl unzip git zsh
apt-get install -y curl unzip git zsh nodejs npm
# Upgrade Node.js to v22 LTS (apt has v18, agents like Cline need v20+)
# n installs to /usr/local/bin but apt's v18 at /usr/bin can shadow it, so symlink over
npm install -g n && n 22 && ln -sf /usr/local/bin/node /usr/bin/node && ln -sf /usr/local/bin/npm /usr/bin/npm && ln -sf /usr/local/bin/npx /usr/bin/npx
# Install Bun
su - ubuntu -c 'curl -fsSL https://bun.sh/install | bash'
# Install Claude Code
su - ubuntu -c 'curl -fsSL https://claude.ai/install.sh | bash'
# Configure PATH
echo 'export PATH="${HOME}/.claude/local/bin:${HOME}/.bun/bin:${PATH}"' >> /home/ubuntu/.bashrc
echo 'export PATH="${HOME}/.claude/local/bin:${HOME}/.bun/bin:${PATH}"' >> /home/ubuntu/.zshrc
echo 'export PATH="${HOME}/.claude/local/bin:${HOME}/.local/bin:${HOME}/.bun/bin:${PATH}"' >> /home/ubuntu/.bashrc
echo 'export PATH="${HOME}/.claude/local/bin:${HOME}/.local/bin:${HOME}/.bun/bin:${PATH}"' >> /home/ubuntu/.zshrc
chown ubuntu:ubuntu /home/ubuntu/.bashrc /home/ubuntu/.zshrc
touch /home/ubuntu/.cloud-init-complete
chown ubuntu:ubuntu /home/ubuntu/.cloud-init-complete


@@ -50,6 +50,10 @@ _is_daytona_auth_error() {
printf '%s' "${1}" | grep -qi "unauthorized\|invalid.*key\|authentication\|forbidden"
}
_is_daytona_timeout() {
printf '%s' "${1}" | grep -qi "timeout\|timed out\|deadline exceeded\|context canceled"
}
_daytona_auth_error() {
log_error "Invalid API key"
log_error "How to fix:"
@@ -58,13 +62,40 @@ _daytona_auth_error() {
log_warn " 3. Check key hasn't expired or been revoked"
}
# Run a daytona CLI command with a timeout to prevent indefinite hangs.
# Usage: _daytona_with_timeout SECONDS daytona [args...]
_daytona_with_timeout() {
local secs="${1}"; shift
local timeout_bin=""
if command -v timeout &>/dev/null; then timeout_bin="timeout"
elif command -v gtimeout &>/dev/null; then timeout_bin="gtimeout"
fi
if [[ -n "${timeout_bin}" ]]; then
"${timeout_bin}" "${secs}" "$@"
else
"$@"
fi
}
test_daytona_token() {
local test_response
# Authenticate CLI with the API key first
test_response=$(daytona login --api-key "${DAYTONA_API_KEY}" 2>&1)
local exit_code=$?
local exit_code=0
# Authenticate CLI with the API key (30s timeout)
# Use --api-key=VALUE syntax per official Daytona docs
test_response=$(_daytona_with_timeout 30 daytona login --api-key="${DAYTONA_API_KEY}" 2>&1) || exit_code=$?
if [[ ${exit_code} -eq 124 ]]; then
log_error "Daytona login timed out (Daytona API may be temporarily unavailable)"
log_warn "Try again in a minute, or check https://status.daytona.io"
return 1
fi
if [[ ${exit_code} -ne 0 ]]; then
if _is_daytona_timeout "${test_response}"; then
log_error "Daytona login timed out: ${test_response}"
return 1
fi
if _is_daytona_auth_error "${test_response}"; then
_daytona_auth_error; return 1
fi
@@ -72,10 +103,23 @@ test_daytona_token() {
return 1
fi
# Verify by listing sandboxes (lightweight API call)
test_response=$(daytona list --limit 1 2>&1)
if [[ $? -ne 0 ]] && _is_daytona_auth_error "${test_response}"; then
_daytona_auth_error; return 1
# Verify by listing sandboxes (30s timeout)
exit_code=0
test_response=$(_daytona_with_timeout 30 daytona sandbox list --limit 1 2>&1) || exit_code=$?
if [[ ${exit_code} -eq 124 ]] || _is_daytona_timeout "${test_response:-}"; then
log_error "Daytona API timed out (service may be temporarily slow)"
log_warn "Try again in a minute, or check https://status.daytona.io"
return 1
fi
if [[ ${exit_code} -ne 0 ]]; then
if _is_daytona_auth_error "${test_response}"; then
_daytona_auth_error
else
log_error "Daytona API check failed: ${test_response}"
fi
return 1
fi
return 0
}
@@ -87,6 +131,11 @@ ensure_daytona_token() {
"${HOME}/.config/spawn/daytona.json" \
"https://app.daytona.io" \
"test_daytona_token"
# Always authenticate CLI — ensure_api_token_with_provider may skip
# the test function when loading from env var, so daytona login
# would never be called. Re-running login is idempotent and fast.
_daytona_with_timeout 30 daytona login --api-key="${DAYTONA_API_KEY}" 2>/dev/null || true
}
get_server_name() {
@@ -109,7 +158,7 @@ _daytona_create_with_resources() {
if [[ ! "${disk}" =~ ^[0-9]+$ ]]; then log_error "Invalid DAYTONA_DISK: must be numeric"; return 1; fi
log_step "Creating Daytona sandbox '${name}' (${cpu} vCPU / ${memory}MB RAM / ${disk}GB disk)..."
daytona create \
_daytona_with_timeout 120 daytona sandbox create \
--name "${name}" \
--cpu "${cpu}" \
--memory "${memory}" \
@@ -130,7 +179,7 @@ _daytona_create_with_class() {
fi
log_step "Creating Daytona sandbox '${name}' (class: ${sandbox_class})..."
daytona create \
_daytona_with_timeout 120 daytona sandbox create \
--name "${name}" \
--class "${sandbox_class}" \
--auto-stop 0 \
@@ -143,7 +192,7 @@ _resolve_sandbox_id() {
# Try to get the sandbox ID from `daytona info`
local info_output
info_output=$(daytona info "${name}" --format json 2>/dev/null) || true
info_output=$(daytona sandbox info "${name}" --format json 2>/dev/null) || true
if [[ -n "${info_output}" ]]; then
DAYTONA_SANDBOX_ID=$(printf '%s' "${info_output}" | python3 -c "import json,sys; print(json.load(sys.stdin).get('id',''))" 2>/dev/null) || true
@@ -178,7 +227,11 @@ create_server() {
fi
if [[ ${exit_code} -ne 0 ]]; then
if _is_snapshot_conflict "${output}"; then
if [[ ${exit_code} -eq 124 ]] || _is_daytona_timeout "${output:-}"; then
log_error "Sandbox creation timed out (Daytona API may be temporarily slow)"
log_warn "Try again in a minute, or check https://status.daytona.io"
log_warn "You can also try: daytona create --name ${name} --class small"
elif _is_snapshot_conflict "${output}"; then
log_error "Cannot specify resources when using a Daytona snapshot"
log_error ""
log_error "Use a sandbox class instead:"
@@ -200,7 +253,8 @@
wait_for_cloud_init() {
log_step "Installing base tools in sandbox..."
run_server "apt-get update -y && apt-get install -y curl unzip git zsh" >/dev/null 2>&1 || true
run_server "apt-get update -y && apt-get install -y curl unzip git zsh nodejs npm" >/dev/null 2>&1 || true
run_server "npm install -g n && n 22 && ln -sf /usr/local/bin/node /usr/bin/node && ln -sf /usr/local/bin/npm /usr/bin/npm && ln -sf /usr/local/bin/npx /usr/bin/npx" >/dev/null 2>&1 || true
run_server "curl -fsSL https://bun.sh/install | bash" >/dev/null 2>&1 || true
run_server "curl -fsSL https://claude.ai/install.sh | bash" >/dev/null 2>&1 || true
run_server 'echo "export PATH=\"${HOME}/.local/bin:${HOME}/.bun/bin:${PATH}\"" >> ~/.bashrc' >/dev/null 2>&1 || true
@@ -259,12 +313,12 @@ destroy_server() {
return 0
fi
log_step "Destroying sandbox ${sandbox_id}..."
daytona delete "${sandbox_id}" 2>/dev/null || true
daytona sandbox delete "${sandbox_id}" 2>/dev/null || true
log_info "Sandbox destroyed"
}
list_servers() {
daytona list
daytona sandbox list
}
# ============================================================


@@ -137,13 +137,16 @@ get_cloud_init_userdata() {
cat << 'CLOUD_INIT_EOF'
#!/bin/bash
apt-get update -y
apt-get install -y curl unzip git zsh
apt-get install -y curl unzip git zsh nodejs npm
# Upgrade Node.js to v22 LTS (apt has v18, agents like Cline need v20+)
# n installs to /usr/local/bin but apt's v18 at /usr/bin can shadow it, so symlink over
npm install -g n && n 22 && ln -sf /usr/local/bin/node /usr/bin/node && ln -sf /usr/local/bin/npm /usr/bin/npm && ln -sf /usr/local/bin/npx /usr/bin/npx
# Install Bun
su - $(logname 2>/dev/null || echo "$USER") -c 'curl -fsSL https://bun.sh/install | bash' || true
# Install Claude Code
su - $(logname 2>/dev/null || echo "$USER") -c 'curl -fsSL https://claude.ai/install.sh | bash' || true
# Configure PATH for all users
echo 'export PATH="${HOME}/.claude/local/bin:${HOME}/.bun/bin:${PATH}"' >> /etc/profile.d/spawn.sh
echo 'export PATH="${HOME}/.claude/local/bin:${HOME}/.local/bin:${HOME}/.bun/bin:${PATH}"' >> /etc/profile.d/spawn.sh
chmod +x /etc/profile.d/spawn.sh
touch /tmp/.cloud-init-complete
CLOUD_INIT_EOF


@@ -66,19 +66,26 @@ ensure_hcloud_token() {
log_info "Using hcloud CLI (context: $(hcloud context active))"
# Export token from CLI context for API fallback compatibility
if [[ -z "${HCLOUD_TOKEN:-}" ]]; then
# SECURITY: Use grep -F for literal string matching to prevent command injection
# if the context name contains shell metacharacters
local active_context
active_context=$(hcloud context active 2>/dev/null || echo "")
if [[ -n "${active_context}" ]]; then
# Use -F for literal string matching (no pattern interpretation)
HCLOUD_TOKEN=$(grep -F "[${active_context}]" ~/.config/hcloud/cli.toml 2>/dev/null | grep token | sed 's/.*= *"\(.*\)"/\1/' || true)
# hcloud config uses [[contexts]] array format (lines are indented):
# [[contexts]]
# name = "myctx"
# token = "abc123"
# Find the "name = " line, grab up to 5 lines after it, extract token
HCLOUD_TOKEN=$(grep -FA5 "name = \"${active_context}\"" ~/.config/hcloud/cli.toml 2>/dev/null | grep 'token *=' | sed 's/.*= *"\(.*\)"/\1/' | head -1 || true)
if [[ -n "${HCLOUD_TOKEN:-}" ]]; then
export HCLOUD_TOKEN
fi
fi
fi
return 0
if [[ -z "${HCLOUD_TOKEN:-}" ]]; then
log_warn "Could not extract API token from hcloud CLI config"
log_warn "Falling back to manual token entry..."
else
return 0
fi
else
log_info "hcloud CLI found but no active context"
log_info "Run: hcloud context create myproject"


@@ -375,6 +375,7 @@ install_base_deps() {
fi
run_ovh "$ip" "${sudo_prefix}apt-get update -qq && ${sudo_prefix}apt-get install -y -qq curl unzip git zsh build-essential python3 python3-pip nodejs npm > /dev/null 2>&1"
run_ovh "$ip" "${sudo_prefix}npm install -g n && ${sudo_prefix}n 22 && ${sudo_prefix}ln -sf /usr/local/bin/node /usr/bin/node && ${sudo_prefix}ln -sf /usr/local/bin/npm /usr/bin/npm && ${sudo_prefix}ln -sf /usr/local/bin/npx /usr/bin/npx"
# Install Bun
run_ovh "$ip" "curl -fsSL https://bun.sh/install | bash"


@@ -1387,8 +1387,8 @@ _ensure_nodejs_runtime() {
local claude_path="$2"
if ! ${run_cb} "${claude_path} && command -v node" >/dev/null 2>&1; then
log_step "Installing Node.js runtime (required for claude package)..."
if ${run_cb} "curl -fsSL https://deb.nodesource.com/setup_lts.x | bash - && apt-get install -y nodejs" >/dev/null 2>&1; then
log_info "Node.js installed via nodesource"
if ${run_cb} "apt-get install -y nodejs npm && npm install -g n && n 22 && ln -sf /usr/local/bin/node /usr/bin/node && ln -sf /usr/local/bin/npm /usr/bin/npm && ln -sf /usr/local/bin/npx /usr/bin/npx" >/dev/null 2>&1; then
log_info "Node.js installed via n"
else
log_warn "Could not install Node.js - bun method may fail"
fi
@@ -1732,8 +1732,13 @@ packages:
- unzip
- git
- zsh
- nodejs
- npm
runcmd:
# Upgrade Node.js to v22 LTS (apt has v18, agents like Cline need v20+)
# n installs to /usr/local/bin but apt's v18 at /usr/bin can shadow it, so symlink over
- npm install -g n && n 22 && ln -sf /usr/local/bin/node /usr/bin/node && ln -sf /usr/local/bin/npm /usr/bin/npm && ln -sf /usr/local/bin/npx /usr/bin/npx
# Install Bun
- su - root -c 'curl -fsSL https://bun.sh/install | bash'
# Install Claude Code
@@ -2207,10 +2212,15 @@ wait_for_cloud_init() {
# Run a command on a remote server via SSH
# Usage: ssh_run_server IP COMMAND
# Requires: SSH_USER (default: root), SSH_OPTS
# SECURITY: Command is properly quoted to prevent shell injection
# SECURITY: Command is properly quoted to prevent shell injection.
# Note: $cmd is always a shell command string (with pipes, semicolons, etc.)
# that is intentionally interpreted by the remote shell. All callers pass
# static command strings — never user-controlled input.
ssh_run_server() {
local ip="${1}"
local cmd="${2}"
# Single-quoted so $HOME/$PATH expand on the remote side, not locally.
local path_prefix='export PATH="$HOME/.local/bin:$HOME/.bun/bin:$PATH"'
if [[ -n "${SPAWN_DEBUG:-}" ]]; then
cmd="set -x; ${cmd}"
fi
@@ -2218,7 +2228,7 @@ ssh_run_server() {
# < /dev/null prevents SSH from consuming the parent script's stdin.
# Without this, sequential SSH calls can steal input meant for later
# commands (e.g., safe_read prompts), causing hangs.
ssh $SSH_OPTS "${SSH_USER:-root}@${ip}" -- "${cmd}" < /dev/null
ssh $SSH_OPTS "${SSH_USER:-root}@${ip}" -- "${path_prefix} && ${cmd}" < /dev/null
}
# Upload a file to a remote server via SCP