spawn/daytona/lib/common.sh
Ahmed Abushagur f2795a6d84
fix: Node.js v22 upgrade, aider uv install, SSH & cloud reliability (#1440)
* fix: use uv --upgrade to ensure Python 3.13-compatible Pillow across all clouds

aider-chat on Python 3.13 fails with `ImportError: cannot import name
'_imaging' from 'PIL'` when an old Pillow version (pre-10.4) is resolved
— those releases have no Python 3.13 binary wheels, so the C extension
is missing at runtime.

Replace `--with 'Pillow>=10.2.0'` (which was silently broken — the `>`
and single quotes get mangled by `printf '%q'` in run_server before the
command reaches the remote machine) with `--upgrade`, which forces all
transitive deps including Pillow to their latest compatible versions.
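
A minimal sketch of the mangling described above. The variable names are illustrative assumptions and are not taken from run_server. It only shows that `printf '%q'` output differs from the original text until exactly one shell pass unescapes it:

```bash
#!/bin/bash
# Illustrative only: variable names are assumptions, not code from run_server.
arg="--with 'Pillow>=10.2.0'"

# %q backslash-escapes shell metacharacters such as '>', quotes and spaces.
escaped=$(printf '%q' "${arg}")
printf 'escaped: %s\n' "${escaped}"

# One shell pass (eval here) undoes the escaping and restores the original text.
eval "restored=${escaped}"
[[ "${restored}" == "${arg}" ]] && echo "one unescape pass: intact"

# With no shell pass, or with an extra layer of quoting, the backslashes
# stay in the string and the receiver sees a different requirement.
[[ "${escaped}" != "${arg}" ]] && echo "no unescape pass: mangled"
```

If the escaped string reaches the remote side without that single unescape pass, the package specifier no longer reads as `Pillow>=10.2.0`, which matches the silent breakage noted above.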

Also adds a plain-text echo before the install so users see progress
instead of a silent hang during the 2-4 minute install.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: update aider/gptme/interpreter assertions from pip to uv

The install method for aider, gptme, and open-interpreter was changed
from pip to `uv tool install` across all clouds. The mock test
assertions still checked for the old `pip.*install.*` patterns, causing
9 failures (3 agents × 3 clouds).

Update patterns to match the actual `uv tool install` commands now used
in all cloud scripts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: trigger test run for uv assertion fix

* fix: prevent SSH hangs, restore stderr, fix command escaping across clouds

- Add < /dev/null to ssh_run_server and generic_ssh_wait to prevent SSH
  stdin theft causing sequential install/verify/configure steps to hang
- Add ServerAliveInterval, ServerAliveCountMax, ConnectTimeout to default
  SSH_OPTS so long-running installs don't silently drop on flaky networks
- Remove 2>/dev/null from Fly.io run_server so remote command errors are
  no longer silently swallowed (--quiet flag still suppresses flyctl noise)
- Fix Fly.io printf '%q' double-quoting: remove extra quotes around
  $escaped_cmd that prevented the remote shell from consuming escapes,
  breaking && || | operators in commands
- Remove broken printf '%q' from Daytona run_server and interactive_session
  where it escaped shell operators into literal characters since daytona exec
  has no intermediate shell layer
- Pin aider to --python 3.12 instead of --with audioop-lts across all clouds
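
The stdin theft behind the first bullet can be reproduced without a server. In this sketch `cat` stands in for ssh (both read whatever is on stdin), and the function names are hypothetical. Here the symptom is skipped steps rather than a hang, but the mechanism is the same: the child process consumes input meant for the caller.

```bash
#!/bin/bash
# 'cat' stands in for ssh here: like ssh, it reads whatever is on stdin.
steal_stdin() { cat >/dev/null; }

run_steps() {
    # $1 is "leaky" or "guarded"
    local mode="${1}" step
    printf 'install\nverify\nconfigure\n' | while read -r step; do
        if [[ "${mode}" == "guarded" ]]; then
            steal_stdin < /dev/null   # child sees EOF, loop input is untouched
        else
            steal_stdin               # child drains the remaining steps
        fi
        echo "${step}"
    done
}

run_steps leaky     # prints only: install
run_steps guarded   # prints: install, verify, configure
```

Redirecting from /dev/null is the same guard the bullet describes for ssh_run_server and generic_ssh_wait.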

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add --pty to fly ssh console for interactive sessions

fly ssh console -C does not allocate a pseudo-terminal by default,
causing interactive TUI agents (aider, claude) to fail with
"Input is not a terminal (fd=0)" or to stop responding to input entirely.

Adding --pty forces PTY allocation, matching how other clouds handle
interactive sessions (SSH uses -t, Sprite uses -tty).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prepend ~/.local/bin to PATH in ssh_run_server

After uv installs to ~/.local/bin, the current shell session doesn't
have it in PATH, causing "uv: command not found" on DigitalOcean and
all other SSH-based clouds (Hetzner, AWS, GCP, OVH).

Fly.io's run_server already prepends this PATH — now the shared
ssh_run_server does the same, fixing all SSH-based clouds at once.
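
A rough sketch of the idea, assuming a helper that builds the remote command string. `with_local_bin` is a hypothetical name, not the actual ssh_run_server code. The prefix is single-quoted so that `$HOME` and `$PATH` are expanded on the remote machine rather than locally:

```bash
#!/bin/bash
# Hypothetical helper, not the actual ssh_run_server implementation.
with_local_bin() {
    # Single quotes keep $HOME and $PATH literal so the remote shell expands them.
    local path_prefix='export PATH="$HOME/.local/bin:$PATH"'
    printf '%s; %s' "${path_prefix}" "${1}"
}

remote_cmd=$(with_local_bin "uv --version")
echo "${remote_cmd}"
# The string would then be handed to ssh, e.g.: ssh user@host "${remote_cmd}" < /dev/null
```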

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add Node.js to cloud-init for all cloud providers

npm-based agents (codex, kilocode, etc.) fail with "npm: command not
found" because Node.js isn't installed during cloud-init. Fly.io was
the only provider installing Node.js (in wait_for_cloud_init).

Now all cloud-init scripts install Node.js v22 LTS from nodesource,
matching Fly.io's setup. Also adds ~/.local/bin to PATH in AWS and
GCP cloud-init (was already in shared/DigitalOcean/Hetzner).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use apt packages for nodejs/npm instead of nodesource

The nodesource setup script (setup_22.x) runs its own apt-get update
and repository configuration, nearly doubling cloud-init time and
causing hangs on DigitalOcean. Ubuntu 24.04 includes nodejs and npm
in its default repos — just add them to the packages list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add timeouts and better error handling to Daytona CLI commands

Daytona CLI commands (login, list, create) can hang indefinitely when
the API is slow or unreachable. This causes:
- "Failed to create sandbox: timeout" with no recovery
- Token validation timeouts misreported as "invalid token"
- Users re-entering valid tokens that also timeout

Fixes:
- Wrap all daytona CLI calls with timeout (30s for auth, 120s for create)
- Detect timeout errors separately from auth errors
- Show actionable "try again / check status" messages for timeouts
- Add nodejs/npm to Daytona wait_for_cloud_init
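
A small sketch of how a timeout can be told apart from other failures, assuming GNU coreutils `timeout` is installed. `sleep` stands in for a slow CLI call and `classify_exit` is a hypothetical helper:

```bash
#!/bin/bash
# Sketch: classify a command's outcome; 'sleep' stands in for a slow CLI call.
classify_exit() {
    case "${1}" in
        0)   echo "ok" ;;
        124) echo "timeout" ;;   # status GNU timeout returns when the limit is hit
        *)   echo "error" ;;
    esac
}

rc=0
timeout 1 sleep 3 || rc=$?
classify_exit "${rc}"
```

Checking for status 124 before inspecting the error text is what lets a slow API be reported as a timeout instead of an invalid token.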

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: set DAYTONA_API_URL to Daytona Cloud by default

The Daytona CLI may default to connecting to a local self-hosted
server instead of Daytona Cloud. Without DAYTONA_API_URL set to
https://app.daytona.io/api, every CLI command (login, list, create)
hangs trying to reach a non-existent local server and times out.

The SDK documents this as the default, but the CLI doesn't always
pick it up — now we export it explicitly.
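
One way such an export might look. Keeping a value the caller has already set (for example a self-hosted server) is an assumption of this sketch, not something the commit states:

```bash
#!/bin/bash
# Keep a caller-provided value; otherwise fall back to Daytona Cloud.
export DAYTONA_API_URL="${DAYTONA_API_URL:-https://app.daytona.io/api}"
echo "Using Daytona API at ${DAYTONA_API_URL}"
```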

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: symlink n-installed Node.js v22 over apt v18 to prevent shadowing

n installs Node.js v22 to /usr/local/bin/node but apt's v18 at
/usr/bin/node can shadow it in non-interactive SSH sessions. After
n 22, symlink the new binaries over the apt ones so v22 is always
resolved. Also fix hcloud CLI token extraction for new TOML format.
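
The shadowing can be simulated in a temporary directory without touching real system paths. The two fake `node` scripts below are stand-ins for the apt and n installs, and `resolve_node` is a hypothetical helper that mimics a session whose PATH only reaches the apt location:

```bash
#!/bin/bash
# Simulate the two install locations inside a temp dir (no real system paths touched).
root=$(mktemp -d)
mkdir -p "${root}/usr/bin" "${root}/usr/local/bin"
printf '#!/bin/sh\necho v18\n' > "${root}/usr/bin/node"        # stand-in for apt's node
printf '#!/bin/sh\necho v22\n' > "${root}/usr/local/bin/node"  # stand-in for n's node
chmod +x "${root}/usr/bin/node" "${root}/usr/local/bin/node"

# A session whose PATH lacks /usr/local/bin resolves the old binary.
resolve_node() { ( PATH="${root}/usr/bin"; node ); }

before=$(resolve_node)
ln -sf "${root}/usr/local/bin/node" "${root}/usr/bin/node"
after=$(resolve_node)

echo "before symlink: ${before}, after symlink: ${after}"
rm -rf "${root}"
```

After the symlink, both locations lead to the same binary, so the PATH order of a given session no longer matters.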

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address security review, add curl timeouts to trigger workflows

- Fix ssh_run_server command injection concern: use single-quoted
  path_prefix so $HOME/$PATH expand remotely, not locally
- Add --connect-timeout 15 --max-time 30 to trigger workflows to
  prevent 5-min hangs when server streams responses
- Handle 409 (dedup) as success — expected when cron fires every 15min
  but cycles take 35min
- Reduce workflow timeout-minutes from 5 to 2
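
A hedged sketch of the trigger logic described in these bullets. The function names and the endpoint passed to `trigger` are assumptions; only the curl flags and the 409 handling come from the text above:

```bash
#!/bin/bash
# Hypothetical sketch: function names and the endpoint URL are assumptions.
is_trigger_ok() {
    case "${1}" in
        2??|409) return 0 ;;   # 409 = a cycle is already running (dedup), not a failure
        *)       return 1 ;;
    esac
}

trigger() {
    local url="${1}" status
    # 000 is a placeholder status used when curl itself fails or times out.
    status=$(curl -sS -o /dev/null -w '%{http_code}' \
        --connect-timeout 15 --max-time 30 -X POST "${url}") || status="000"
    is_trigger_ok "${status}"
}
```

A workflow step could then call `trigger "<endpoint>"` and fail only when the status is neither 2xx nor 409.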

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-18 06:54:07 -05:00

334 lines
12 KiB
Bash

#!/bin/bash
# Common bash functions for Daytona sandbox spawn scripts
# Uses Daytona CLI (daytona) — https://www.daytona.io
# Sandboxes are cloud dev environments with true SSH access
# Default: --class small (override with DAYTONA_CLASS or explicit DAYTONA_CPU/MEMORY/DISK)
# Bash safety flags
set -eo pipefail
# ============================================================
# Provider-agnostic functions
# ============================================================
# Source shared provider-agnostic functions (local or remote fallback)
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" 2>/dev/null && pwd)"
if [[ -n "${SCRIPT_DIR}" && -f "${SCRIPT_DIR}/../../shared/common.sh" ]]; then
source "${SCRIPT_DIR}/../../shared/common.sh"
else
eval "$(curl -fsSL https://raw.githubusercontent.com/OpenRouterTeam/spawn/main/shared/common.sh)"
fi
# Note: Provider-agnostic functions (logging, OAuth, browser, nc_listen) are now in shared/common.sh
# ============================================================
# Daytona specific functions
# ============================================================
SPAWN_DASHBOARD_URL="https://app.daytona.io/"
ensure_daytona_cli() {
if ! command -v daytona &>/dev/null; then
log_step "Installing Daytona CLI..."
if command -v brew &>/dev/null; then
brew install daytonaio/cli/daytona 2>/dev/null || {
log_error "Failed to install Daytona CLI via Homebrew"
log_error "Install manually: brew install daytonaio/cli/daytona"
return 1
}
else
log_error "Daytona CLI not found and Homebrew is not available"
log_error "Install manually: brew install daytonaio/cli/daytona"
log_error "See: https://www.daytona.io/docs/en/getting-started"
return 1
fi
fi
log_info "Daytona CLI available"
}
_is_daytona_auth_error() {
printf '%s' "${1}" | grep -qi "unauthorized\|invalid.*key\|authentication\|forbidden"
}
_is_daytona_timeout() {
printf '%s' "${1}" | grep -qi "timeout\|timed out\|deadline exceeded\|context canceled"
}
_daytona_auth_error() {
log_error "Invalid API key"
log_error "How to fix:"
log_warn " 1. Verify API key at: https://app.daytona.io"
log_warn " 2. Ensure the key has sandbox permissions"
log_warn " 3. Check key hasn't expired or been revoked"
}
# Run a daytona CLI command with a timeout to prevent indefinite hangs.
# Usage: _daytona_with_timeout SECONDS daytona [args...]
_daytona_with_timeout() {
local secs="${1}"; shift
local timeout_bin=""
if command -v timeout &>/dev/null; then timeout_bin="timeout"
elif command -v gtimeout &>/dev/null; then timeout_bin="gtimeout"
fi
if [[ -n "${timeout_bin}" ]]; then
"${timeout_bin}" "${secs}" "$@"
else
"$@"
fi
}
test_daytona_token() {
local test_response
local exit_code=0
# Authenticate CLI with the API key (30s timeout)
# Use --api-key=VALUE syntax per official Daytona docs
test_response=$(_daytona_with_timeout 30 daytona login --api-key="${DAYTONA_API_KEY}" 2>&1) || exit_code=$?
if [[ ${exit_code} -eq 124 ]]; then
log_error "Daytona login timed out (Daytona API may be temporarily unavailable)"
log_warn "Try again in a minute, or check https://status.daytona.io"
return 1
fi
if [[ ${exit_code} -ne 0 ]]; then
if _is_daytona_timeout "${test_response}"; then
log_error "Daytona login timed out: ${test_response}"
return 1
fi
if _is_daytona_auth_error "${test_response}"; then
_daytona_auth_error; return 1
fi
log_error "Daytona login failed: ${test_response}"
return 1
fi
# Verify by listing sandboxes (30s timeout)
exit_code=0
test_response=$(_daytona_with_timeout 30 daytona sandbox list --limit 1 2>&1) || exit_code=$?
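# Only inspect the output for timeout text when the command actually failed
if [[ ${exit_code} -eq 124 ]] || { [[ ${exit_code} -ne 0 ]] && _is_daytona_timeout "${test_response:-}"; }; then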
log_error "Daytona API timed out (service may be temporarily slow)"
log_warn "Try again in a minute, or check https://status.daytona.io"
return 1
fi
if [[ ${exit_code} -ne 0 ]]; then
if _is_daytona_auth_error "${test_response}"; then
_daytona_auth_error
else
log_error "Daytona API check failed: ${test_response}"
fi
return 1
fi
return 0
}
ensure_daytona_token() {
ensure_api_token_with_provider \
"Daytona" \
"DAYTONA_API_KEY" \
"${HOME}/.config/spawn/daytona.json" \
"https://app.daytona.io" \
"test_daytona_token"
# Always authenticate CLI — ensure_api_token_with_provider may skip
# the test function when loading from env var, so daytona login
# would never be called. Re-running login is idempotent and fast.
_daytona_with_timeout 30 daytona login --api-key="${DAYTONA_API_KEY}" 2>/dev/null || true
}
get_server_name() {
get_resource_name "DAYTONA_SANDBOX_NAME" "Enter sandbox name: "
}
_is_snapshot_conflict() {
printf '%s' "${1}" | grep -qi "cannot specify.*resources.*snapshot\|cannot specify.*sandbox.*resources"
}
_daytona_create_with_resources() {
local name="${1}"
local cpu="${DAYTONA_CPU:-2}"
local memory="${DAYTONA_MEMORY:-4096}"
local disk="${DAYTONA_DISK:-5}"
# Validate numeric env vars to prevent command injection
if [[ ! "${cpu}" =~ ^[0-9]+$ ]]; then log_error "Invalid DAYTONA_CPU: must be numeric"; return 1; fi
if [[ ! "${memory}" =~ ^[0-9]+$ ]]; then log_error "Invalid DAYTONA_MEMORY: must be numeric"; return 1; fi
if [[ ! "${disk}" =~ ^[0-9]+$ ]]; then log_error "Invalid DAYTONA_DISK: must be numeric"; return 1; fi
log_step "Creating Daytona sandbox '${name}' (${cpu} vCPU / ${memory}MB RAM / ${disk}GB disk)..."
_daytona_with_timeout 120 daytona sandbox create \
--name "${name}" \
--cpu "${cpu}" \
--memory "${memory}" \
--disk "${disk}" \
--auto-stop 0 \
--auto-archive 0 \
2>&1
}
_daytona_create_with_class() {
local name="${1}"
local sandbox_class="${DAYTONA_CLASS:-small}"
# Validate class to prevent injection (alphanumeric, hyphens, underscores)
if [[ ! "${sandbox_class}" =~ ^[a-zA-Z0-9_-]+$ ]]; then
log_error "Invalid DAYTONA_CLASS: must be alphanumeric (with hyphens/underscores)"
return 1
fi
log_step "Creating Daytona sandbox '${name}' (class: ${sandbox_class})..."
_daytona_with_timeout 120 daytona sandbox create \
--name "${name}" \
--class "${sandbox_class}" \
--auto-stop 0 \
--auto-archive 0 \
2>&1
}
_resolve_sandbox_id() {
local name="${1}"
# Try to get the sandbox ID from `daytona sandbox info`
local info_output
info_output=$(daytona sandbox info "${name}" --format json 2>/dev/null) || true
if [[ -n "${info_output}" ]]; then
DAYTONA_SANDBOX_ID=$(printf '%s' "${info_output}" | python3 -c "import json,sys; print(json.load(sys.stdin).get('id',''))" 2>/dev/null) || true
fi
# Fall back to using the name as the identifier (Daytona accepts both)
if [[ -z "${DAYTONA_SANDBOX_ID:-}" ]]; then
DAYTONA_SANDBOX_ID="${name}"
fi
export DAYTONA_SANDBOX_ID
export DAYTONA_SANDBOX_NAME_ACTUAL="${name}"
}
create_server() {
local name="${1}"
local output
local exit_code=0
# Try explicit resources first if any resource env vars are set
if [[ -n "${DAYTONA_CPU:-}" || -n "${DAYTONA_MEMORY:-}" || -n "${DAYTONA_DISK:-}" ]]; then
output=$(_daytona_create_with_resources "${name}") && exit_code=0 || exit_code=$?
# Detect snapshot/resource conflict and fall back to --class
if [[ ${exit_code} -ne 0 ]] && _is_snapshot_conflict "${output}"; then
log_warn "Daytona rejected explicit resource flags (snapshot in use)"
log_step "Retrying with --class ${DAYTONA_CLASS:-small}..."
output=$(_daytona_create_with_class "${name}") && exit_code=0 || exit_code=$?
fi
else
output=$(_daytona_create_with_class "${name}") && exit_code=0 || exit_code=$?
fi
if [[ ${exit_code} -ne 0 ]]; then
if [[ ${exit_code} -eq 124 ]] || _is_daytona_timeout "${output:-}"; then
log_error "Sandbox creation timed out (Daytona API may be temporarily slow)"
log_warn "Try again in a minute, or check https://status.daytona.io"
log_warn "You can also try: daytona sandbox create --name ${name} --class small"
elif _is_snapshot_conflict "${output}"; then
log_error "Cannot specify resources when using a Daytona snapshot"
log_error ""
log_error "Use a sandbox class instead:"
log_error " DAYTONA_CLASS=small spawn <agent> daytona"
log_error ""
log_error "Or unset explicit resource variables:"
log_error " unset DAYTONA_CPU DAYTONA_MEMORY DAYTONA_DISK"
else
log_error "Failed to create sandbox: ${output}"
fi
return 1
fi
_resolve_sandbox_id "${name}"
log_info "Sandbox created: ${DAYTONA_SANDBOX_ID}"
save_vm_connection "daytona-sandbox" "daytona" "${DAYTONA_SANDBOX_ID}" "$name" "daytona"
}
wait_for_cloud_init() {
log_step "Installing base tools in sandbox..."
run_server "apt-get update -y && apt-get install -y curl unzip git zsh nodejs npm" >/dev/null 2>&1 || true
run_server "npm install -g n && n 22 && ln -sf /usr/local/bin/node /usr/bin/node && ln -sf /usr/local/bin/npm /usr/bin/npm && ln -sf /usr/local/bin/npx /usr/bin/npx" >/dev/null 2>&1 || true
run_server "curl -fsSL https://bun.sh/install | bash" >/dev/null 2>&1 || true
run_server "curl -fsSL https://claude.ai/install.sh | bash" >/dev/null 2>&1 || true
run_server 'echo "export PATH=\"${HOME}/.local/bin:${HOME}/.bun/bin:${PATH}\"" >> ~/.bashrc' >/dev/null 2>&1 || true
run_server 'echo "export PATH=\"${HOME}/.local/bin:${HOME}/.bun/bin:${PATH}\"" >> ~/.zshrc' >/dev/null 2>&1 || true
log_info "Base tools installed"
}
# Daytona uses `daytona exec` for running commands in sandboxes.
# The command string is passed directly to bash -c as a single argument.
# All callers pass trusted, hardcoded command strings (not user input).
# Do NOT use printf '%q' here — it escapes shell operators like && and ||
# into literal characters, breaking multi-part commands.
run_server() {
local cmd="${1}"
daytona exec "${DAYTONA_SANDBOX_ID}" -- bash -c "${cmd}"
}
upload_file() {
local local_path="${1}"
local remote_path="${2}"
# SECURITY: Strict allowlist validation — only safe path characters
if [[ ! "${remote_path}" =~ ^[a-zA-Z0-9/_.~-]+$ ]]; then
log_error "Invalid remote path (must contain only alphanumeric, /, _, ., ~, -): ${remote_path}"
return 1
fi
# base64 output is safe (alphanumeric + /+=) so no injection risk
local content
content=$(base64 -w0 < "${local_path}" 2>/dev/null || base64 < "${local_path}")
daytona exec "${DAYTONA_SANDBOX_ID}" -- bash -c "printf '%s' '${content}' | base64 -d > '${remote_path}'"
}
# Daytona has true SSH support — much better than exec-only providers
interactive_session() {
local cmd="${1}"
local session_exit=0
if [[ -z "${cmd}" ]]; then
# Pure interactive shell via SSH
daytona ssh "${DAYTONA_SANDBOX_ID}" || session_exit=$?
else
# Run a specific command interactively via exec.
# Pass directly to bash -c — do NOT use printf '%q' (see run_server comment).
daytona exec "${DAYTONA_SANDBOX_ID}" -- bash -c "${cmd}" || session_exit=$?
fi
SERVER_NAME="${DAYTONA_SANDBOX_ID:-}" SPAWN_RECONNECT_CMD="daytona ssh ${DAYTONA_SANDBOX_ID:-}" \
_show_exec_post_session_summary
return "${session_exit}"
}
destroy_server() {
local sandbox_id="${1:-${DAYTONA_SANDBOX_ID:-}}"
if [[ -z "${sandbox_id}" ]]; then
log_warn "No sandbox ID to destroy"
return 0
fi
log_step "Destroying sandbox ${sandbox_id}..."
daytona sandbox delete "${sandbox_id}" 2>/dev/null || true
log_info "Sandbox destroyed"
}
list_servers() {
daytona sandbox list
}
# ============================================================
# Cloud adapter interface
# ============================================================
cloud_authenticate() { ensure_daytona_cli; ensure_daytona_token; }
cloud_provision() { create_server "$1"; }
cloud_wait_ready() { wait_for_cloud_init; }
cloud_run() { run_server "$1"; }
cloud_upload() { upload_file "$1" "$2"; }
cloud_interactive() { interactive_session "$1"; }
cloud_label() { echo "Daytona sandbox"; }