vrr/spawn

mirror of https://github.com/OpenRouterTeam/spawn.git synced 2026-04-28 03:49:31 +00:00

Author	SHA1	Message	Date
Ahmed Abushagur	4e87523c4f	fix(packer): repair cursor tarball + hermes interactive install (#3367) agent-tarballs.yml has been failing nightly since 2026-03-27 and packer-snapshots.yml since 2026-04-25. Two distinct breakages. cursor: capture-agent.sh's allowlist was missing cursor, so the install step succeeded but the capture step rejected the agent name. Adds cursor to the allowlist plus its capture paths (~/.local/bin/ for the `agent` symlink, ~/.local/share/cursor-agent/ for the extracted package, matching what verify.sh and cursor-proxy already expect). hermes: The upstream installer launches an interactive setup wizard after install, which fails in CI with `/dev/tty: No such device or address`. Production code already passes `--skip-setup` (see packages/cli/src/shared/agent-setup.ts:1336); packer/agents.json was the lone exception. Adds the same flag. Both pipelines read from packer/agents.json, so this single edit unblocks both the daily tarball build and the DO marketplace image build for hermes. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 18:31:40 -07:00
A	5e0144b645	fix(zeroclaw): remove broken zeroclaw agent (repo 404) (#3107 ) * fix(zeroclaw): remove broken zeroclaw agent (repo 404) The zeroclaw-labs/zeroclaw GitHub repository returns 404 — all installs fail. Remove zeroclaw entirely from the matrix: agent definition, setup code, shell scripts, e2e tests, packer config, skill files, and documentation. Fixes #3102 Agent: code-health Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(zeroclaw): remove stale zeroclaw reference from discovery.md ARM agents list Addresses security review on PR #3107 — the last remaining zeroclaw reference in .claude/rules/discovery.md is now removed. Agent: issue-fixer Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(zeroclaw): remove remaining stale zeroclaw references from CI/packer Remove zeroclaw from: - .github/workflows/agent-tarballs.yml ARM build matrix - .github/workflows/docker.yml agent matrix - packer/digitalocean.pkr.hcl comment - sh/e2e/e2e.sh comment Addresses all 5 stale references flagged in security review of PR #3107. Agent: issue-fixer Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-30 15:35:40 -07:00
A	0bd8930c09	fix(digitalocean): use canonical DIGITALOCEAN_ACCESS_TOKEN env var (#3099 ) Replaces all references to DO_API_TOKEN with DIGITALOCEAN_ACCESS_TOKEN, matching DigitalOcean's official CLI and API documentation. This includes TypeScript source, tests, shell scripts, Packer config, CI workflows, and documentation. Supersedes #3068 (rebased onto current main). Agent: pr-maintainer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-30 08:48:56 +07:00
Ahmed Abushagur	3687fb38c3	ci: add cursor agent to tarball build pipeline (#3049 ) Cursor CLI installs a native binary via curl, so it needs both x86_64 and arm64 builds. Also adds cursor.com to the allowed domains list. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-27 13:42:46 +07:00
Ahmed Abushagur	66a1749b4b	fix: add sprite-keep-running.sh, remove Hetzner from Packer, cleanup on cancel (#2869) * fix: destroy orphaned Packer builder instances on workflow cancel When a Packer Snapshots workflow is cancelled mid-build, Packer's process is killed before it can clean up its temporary builder droplet/server. This leaves orphaned packer-* instances running and costing money. Add `if: cancelled()` cleanup steps for both DigitalOcean and Hetzner that destroy any packer-* prefixed instances after cancellation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: remove Hetzner cleanup step — only DO needed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove Hetzner from Packer snapshots, add cancel cleanup Remove Hetzner from the Packer workflow entirely — only DigitalOcean snapshots are built. Deletes packer/hetzner.pkr.hcl and simplifies the workflow by removing all Hetzner-specific steps and cloud conditionals. Also adds a cancelled() cleanup step that destroys orphaned packer-* builder droplets when a workflow run is cancelled mid-build. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add missing sprite-keep-running.sh script The keep-alive install was 404ing because sh/shared/sprite-keep-running.sh never existed in the repo. The TypeScript code downloaded it from the CDN (which maps to sh/shared/) but the file was never created. The script wraps a command and pings the sprite's own public URL every 30s to prevent inactivity shutdown. It resolves the URL via sprite-env info (available on all sprites) and falls back to exec without keep-alive if the URL can't be determined.
Also removes Hetzner from the Packer snapshots workflow entirely — only DigitalOcean snapshots are built. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address security review — scope cleanup filter, fix JSON injection 1. Add `spawn-packer` tag to DO builder droplets in Packer template and filter cleanup by tag instead of broad `packer-` name prefix. Prevents accidentally destroying builder instances from other concurrent builds. 2. Use `jq --arg` for SINGLE_AGENT_INPUT instead of string interpolation to prevent JSON injection via crafted agent names. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 18:13:38 +00:00
Ahmed Abushagur	2825884fee	fix(packer): use cpx22 in nbg1 for Hetzner builds (#2785 ) cx23 is only available in Helsinki — poor availability. Switch to cpx22 (AMD, 2 vCPU, 4GB) which is available in nbg1/hel1/sin. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 22:52:01 -07:00
Ahmed Abushagur	7289f3ef36	feat(hetzner): add snapshot support + Packer image builds (#2774) CLI changes: - Add findSpawnSnapshot() to query Hetzner /images?type=snapshot API for pre-built spawn-{agent}-* images (matches by description prefix) - Add waitForSshOnly() for snapshot boots (skips cloud-init polling) - Update createServer() to accept optional snapshotId — boots from snapshot instead of ubuntu-24.04, skips cloud-init userdata - Wire up orchestrator with skipAgentInstall flag Packer changes: - Add packer/hetzner.pkr.hcl using hcloud plugin, mirroring the DO template (tier scripts, agent install, cleanup, manifest) - Unify packer-snapshots.yml to build both DO and Hetzner in a single workflow with cloud×agent matrix and per-cloud cleanup steps Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 16:46:48 -07:00
Ahmed Abushagur	34fc9b6d4d	fix: increase packer snapshot transfer timeout to 60m (#2648 ) * fix: increase packer snapshot transfer timeout to 60m The default 30m timeout is too short for transferring snapshots to distant DO regions (blr1, sgp1, syd1). This caused zeroclaw and kilocode builds to fail despite successful provisioning. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * revert: remove batch splitting from packer workflow DO droplet cap is no longer an issue — revert to single parallel build job for all agents. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 04:48:11 -04:00
A	0d66125fd6	fix: add junie to tarball build pipeline (#2541 ) Junie was added as a fully implemented agent (manifest, agent scripts, agent-setup.ts) but the packer/tarball pipeline was never updated. This meant the nightly agent-tarballs workflow could not build a pre-built tarball for Junie, forcing all deployments to do a live npm install. - Add junie entry to packer/agents.json (tier: node, @jetbrains/junie-cli) - Add junie to capture-agent.sh allowlist and path-capture case (npm-based, same as codex/kilocode — captures /root/.npm-global/) Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-12 18:45:03 -04:00
Ahmed Abushagur	4004b51f6d	fix: use curl for Chrome download + capture google-chrome-stable in tarball (#2370 ) - wget not available on many cloud VMs, use curl instead - Remove 2>/dev/null from dpkg/apt so install errors are visible - Capture /usr/bin/google-chrome-stable in tarball (actual .deb binary name) - Use curl in packer/agents.json tarball build too Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-08 23:59:32 -07:00
Ahmed Abushagur	7e2f9f45fc	fix: use Google Chrome .deb for OpenClaw browser tool (#2368 ) * fix: use Google Chrome .deb instead of Playwright for OpenClaw browser Snap Chromium on Ubuntu 24.04 fails because AppArmor confinement blocks CDP control. OpenClaw's own docs recommend installing Google Chrome via .deb package which bypasses snap entirely. Also adds browser.noSandbox and browser.executablePath to the OpenClaw config so the browser tool works out of the box on Linux VMs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove unnecessary confirmation prompt when OAuth fails If OAuth didn't complete, the user obviously wants to paste a key. The "Paste your API key manually? (Y/n)" prompt was pointless friction. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove unnecessary "Continue anyway?" credential confirmation If the user selected a cloud, they obviously want to continue. The warning + setup guidance is sufficient — no need to block on a confirm. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: move Chrome install to configure step so it runs after tarball The tarball path skips agent.install() entirely, so Chrome never got installed. Moving it to configure() (setupOpenclawConfig) ensures it always runs regardless of install method. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: bundle Google Chrome in openclaw tarball Add Chrome .deb install to openclaw's tarball build so it ships pre-installed. Capture /usr/bin/google-chrome and /opt/google/chrome/ in the tarball. Add dl.google.com to the workflow domain allowlist. The configure() step still has a fallback install with idempotency check (command -v google-chrome) for non-tarball installs. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use openclaw config set for browser setup + correct binary name - Use `google-chrome-stable` (actual .deb binary name) not `google-chrome` - Set browser config via `openclaw config set` CLI (the supported way) instead of writing JSON directly which wasn't being picked up - Remove browser section from JSON config to avoid conflicts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 01:52:08 -04:00
A	51dec6e877	fix: E2E failures - SSH key gen race, hetzner 409, hermes binary path (#2305 ) Three distinct E2E bugs fixed: 1. SSH key generation race condition: When multiple agents provision in parallel, concurrent processes all call generateSshKey() and race to create ~/.ssh/id_ed25519. ssh-keygen won't overwrite an existing file (prompts on stdin which is "ignore"), causing zeroclaw/codex to fail with "SSH key generation failed". Fix: check if key already exists before generating, and re-check after a failed generation attempt. 2. Hetzner SSH key 409 uniqueness_error: The Hetzner API returns HTTP 409 with "SSH key not unique" when the same key content is registered under a different name. The hetznerApi() function throws on non-2xx before the error-parsing code runs, and the regex /already/ didn't match "not unique". Fix: catch 409 in ensureSshKey() and match against uniqueness_error/not unique/already patterns. 3. Hermes binary not found: The hermes install script (uv tool) creates the actual binary + venv at ~/.hermes/hermes-agent/venv/ with a symlink at ~/.local/bin/hermes. The tarball capture script only captured the symlink + ~/.local/share/, leaving a dangling symlink. Fix: include ~/.hermes/ in capture paths, add venv/bin to verify.sh PATH check, and update hermes launchCmd to include the venv PATH. Fixes #2304 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-07 22:05:44 -05:00
Ahmed Abushagur	7bebc6558f	feat: full marketplace compliance + automated Vendor API submission (#2295 ) Packer template: - Match official 90-cleanup.sh: remove SSH host keys, create revoked_keys, remove cloud-init instances, zero-fill free space, use --force-confold for upgrades, autoremove/autoclean - Add Packer manifest post-processor for snapshot ID extraction - Remove PACKER_LOG=1 (debug logging not needed in production) Workflow: - Add "Submit to DO Marketplace" step after successful build - Reads agent→app_id mapping from MARKETPLACE_APP_IDS secret (JSON) - Extracts snapshot ID from Packer manifest, PATCHes Vendor API - Gracefully handles 400 (app already pending review) - Skips silently if no MARKETPLACE_APP_IDS secret is configured Setup: add MARKETPLACE_APP_IDS secret as JSON, e.g.: {"claude":"60089fc6...", "codex":"60089fc7..."} App IDs come from the DO Vendor Portal after initial approval. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 16:40:04 -05:00
A	70d8462e56	fix: add explicit input validation to capture-agent.sh (Fixes #2281 ) (#2282 ) Add whitelist validation for AGENT_NAME immediately after the empty check to prevent command injection and path traversal via the parameter. While the existing case statement catches unknown agents, explicit upfront validation makes the security intent clear and defensive. Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-07 06:27:28 -08:00
Ahmed Abushagur	7643b96266	fix: pass DO Marketplace img_check validation (#2276 ) Three fixes for marketplace validation failures: 1. Install all security updates (apt-get dist-upgrade) — img_check fails if any security patches are pending. 2. Purge droplet-agent and /opt/digitalocean — img_check fails if the DO monitoring agent directory exists. 3. Correct img_check.sh filename to 99-img-check.sh — the previous URL returned 404. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 02:43:46 -05:00
Ahmed Abushagur	4719b49754	fix: correct img_check.sh filename to 99-img-check.sh (#2275 ) The marketplace-partners repo uses `99-img-check.sh`, not `img_check.sh`. The wrong filename caused a 404 on curl download, failing all agent builds with exit code 22. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 01:48:52 -05:00
Ahmed Abushagur	5103a763b4	fix: packer build — OOM kill and history builtin (#2274) * fix: claude snapshot build — remove npm fallback from install command The native install (curl | bash) succeeds but exits non-zero due to a PATH warning. The || fallback then tries `npm install` which doesn't exist on the "minimal" tier → exit 127. Fix: replace npm fallback with binary existence check (same pattern as hermes agent). If install exits non-zero but ~/.local/bin/claude exists, the build succeeds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: snapshot cleanup and lookup — use name prefix instead of tags DO Packer builder `tags` only apply to the temporary build droplet, not the resulting snapshot image. Both the workflow cleanup step and the CLI's findSpawnSnapshot() were querying by `tag_name` which returned nothing — old snapshots piled up and the CLI couldn't find existing snapshots. Fix: filter by snapshot name prefix (`spawn-{agent}-`) instead of tags, in both the workflow and the CLI. Remove misleading `tags` from the Packer template. Add test cases for name-prefix filtering. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: packer build failures — OOM kill + history builtin Two issues introduced by PR #2271 (marketplace compliance): 1. Droplet downsized to s-1vcpu-1gb (1GB RAM) — Claude's native installer and zeroclaw's Rust build get OOM-killed. Restore s-2vcpu-2gb. 2. Cleanup provisioner uses `history -c` which is a bash builtin. Packer runs scripts with /bin/sh (dash on Ubuntu) which doesn't have it → exit 127 on ALL agents. Remove it — the .bash_history file deletion already handles persistent history. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 01:15:39 -05:00
Ahmed Abushagur	d77a067aa4	fix: snapshot cleanup + claude install (name-prefix filter) (#2273) * fix: claude snapshot build — remove npm fallback from install command The native install (curl | bash) succeeds but exits non-zero due to a PATH warning. The || fallback then tries `npm install` which doesn't exist on the "minimal" tier → exit 127. Fix: replace npm fallback with binary existence check (same pattern as hermes agent). If install exits non-zero but ~/.local/bin/claude exists, the build succeeds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: snapshot cleanup and lookup — use name prefix instead of tags DO Packer builder `tags` only apply to the temporary build droplet, not the resulting snapshot image. Both the workflow cleanup step and the CLI's findSpawnSnapshot() were querying by `tag_name` which returned nothing — old snapshots piled up and the CLI couldn't find existing snapshots. Fix: filter by snapshot name prefix (`spawn-{agent}-`) instead of tags, in both the workflow and the CLI. Remove misleading `tags` from the Packer template. Add test cases for name-prefix filtering. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 21:32:58 -08:00
A	c3cb98daab	feat: add DO Marketplace compliance to Packer build pipeline (#2271 ) - Switch build droplet from s-2vcpu-2gb to s-1vcpu-1gb ($6/mo) per DO Marketplace recommendation for cross-size snapshot compatibility - Add ufw firewall provisioner (deny incoming, allow SSH, enable) - Replace basic apt-get clean with full DO Marketplace cleanup sequence: removes SSH authorized_keys, clears bash history, truncates /var/log, resets machine-id, and runs cloud-init clean so each launched droplet gets a fresh identity on first boot - Add img_check.sh validation step (from digitalocean/marketplace-partners) to verify firewall active, no root password, and security posture before the snapshot is finalized — build fails if image doesn't meet requirements Fixes #2269 Agent: issue-fixer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-03-07 00:20:35 -05:00
Ahmed Abushagur	955a6081c1	fix: Packer build region/size and PATH for agent installs (#2270 ) * feat: restore Packer DO snapshot pipeline for fast agent boot Restores the nightly Packer snapshot build pipeline (reverted in #2205) that pre-bakes agent images as DigitalOcean snapshots. When a snapshot exists on the user's account, droplet boot skips cloud-init and tarball install entirely — cutting provisioning from ~10min to ~2min. - Add `packer/digitalocean.pkr.hcl` HCL2 template with multi-region distribution, apt-lock wait, and snapshot marker - Add `.github/workflows/packer-snapshots.yml` nightly build with matrix strategy, auto-cleanup of old snapshots, and injection-safe env var handling - Add `findSpawnSnapshot()` to query DO API for pre-built snapshots - Add `waitForSshOnly()` for snapshot boots (skip cloud-init wait) - Modify `createServer()` to accept optional `snapshotId` param - Wire snapshot detection in DO `main.ts` orchestrator - Add `skipAgentInstall` to `CloudOrchestrator` interface to skip tarball + install steps when booting from snapshot - Add 5 unit tests for snapshot lookup (happy path, empty, error, invalid ID, network failure) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use repo-root-relative path for tier scripts in Packer template Packer resolves script paths relative to cwd (repo root), not relative to the .pkr.hcl file. Changed `scripts/tier-.sh` to `packer/scripts/tier-.sh`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: Packer build region/size and PATH for agent installs Two issues causing build failures: 1. `s-2vcpu-4gb` not available in `nyc3` — changed build region to `sfo3` and size to `s-2vcpu-2gb` (universally available, cheaper, sufficient for building snapshots) 2. 
Claude install puts binary in `~/.local/bin` which isn't in PATH during Packer provisioning — added full PATH to environment_vars on both the install and marker provisioners so agent binaries and subsequent scripts can find each other Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 22:45:39 -05:00
Ahmed Abushagur	e7b6b0b9fd	fix: Packer tier script path relative to repo root (#2266 ) * feat: restore Packer DO snapshot pipeline for fast agent boot Restores the nightly Packer snapshot build pipeline (reverted in #2205) that pre-bakes agent images as DigitalOcean snapshots. When a snapshot exists on the user's account, droplet boot skips cloud-init and tarball install entirely — cutting provisioning from ~10min to ~2min. - Add `packer/digitalocean.pkr.hcl` HCL2 template with multi-region distribution, apt-lock wait, and snapshot marker - Add `.github/workflows/packer-snapshots.yml` nightly build with matrix strategy, auto-cleanup of old snapshots, and injection-safe env var handling - Add `findSpawnSnapshot()` to query DO API for pre-built snapshots - Add `waitForSshOnly()` for snapshot boots (skip cloud-init wait) - Modify `createServer()` to accept optional `snapshotId` param - Wire snapshot detection in DO `main.ts` orchestrator - Add `skipAgentInstall` to `CloudOrchestrator` interface to skip tarball + install steps when booting from snapshot - Add 5 unit tests for snapshot lookup (happy path, empty, error, invalid ID, network failure) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use repo-root-relative path for tier scripts in Packer template Packer resolves script paths relative to cwd (repo root), not relative to the .pkr.hcl file. Changed `scripts/tier-.sh` to `packer/scripts/tier-.sh`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 17:40:57 -08:00
Ahmed Abushagur	cefcd56327	feat: restore Packer DO snapshot pipeline for fast agent boot (#2262 ) Restores the nightly Packer snapshot build pipeline (reverted in #2205) that pre-bakes agent images as DigitalOcean snapshots. When a snapshot exists on the user's account, droplet boot skips cloud-init and tarball install entirely — cutting provisioning from ~10min to ~2min. - Add `packer/digitalocean.pkr.hcl` HCL2 template with multi-region distribution, apt-lock wait, and snapshot marker - Add `.github/workflows/packer-snapshots.yml` nightly build with matrix strategy, auto-cleanup of old snapshots, and injection-safe env var handling - Add `findSpawnSnapshot()` to query DO API for pre-built snapshots - Add `waitForSshOnly()` for snapshot boots (skip cloud-init wait) - Modify `createServer()` to accept optional `snapshotId` param - Wire snapshot detection in DO `main.ts` orchestrator - Add `skipAgentInstall` to `CloudOrchestrator` interface to skip tarball + install steps when booting from snapshot - Add 5 unit tests for snapshot lookup (happy path, empty, error, invalid ID, network failure) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 16:32:05 -08:00
Ahmed Abushagur	4ac19a375a	fix: capture claude symlink target + verify PATH (#2245 ) * fix: tarball workflow failures (root ownership, swapfile, hermes TTY) - Use sudo mv + chown for tarball in release step (root-owned from capture) - Skip swapfile creation if /swapfile already exists (GitHub Actions runners) - Tolerate hermes setup wizard failure when /dev/tty unavailable in CI Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: capture claude symlink target in tarball + fix verify PATH The claude installer creates a symlink at ~/.local/bin/claude pointing to ~/.local/share/claude/versions/X.Y.Z. The capture script was missing ~/.local/share/claude/, causing a broken symlink in the tarball. Also add ~/.npm-global/bin to the verify PATH check for claude (npm fallback install path). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-03-06 10:55:09 -05:00
Ahmed Abushagur	ba9690ea23	fix: tarball workflow failures (root ownership, swapfile, hermes TTY) (#2240 ) - Use sudo mv + chown for tarball in release step (root-owned from capture) - Skip swapfile creation if /swapfile already exists (GitHub Actions runners) - Tolerate hermes setup wizard failure when /dev/tty unavailable in CI Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 05:48:46 -05:00
Ahmed Abushagur	8072c084c2	feat: pre-built agent tarballs for fast install (#2232 ) * feat: pre-built agent tarballs on GitHub Releases for fast install Adds a nightly GitHub Actions workflow that builds and uploads agent tarballs to rolling GitHub Releases. During provisioning, the CLI now attempts to download and extract a tarball before falling back to live install. Priority chain: snapshot > tarball > live install. - New workflow: .github/workflows/agent-tarballs.yml - New capture script: packer/scripts/capture-agent.sh - New module: packages/cli/src/shared/agent-tarball.ts - Orchestrate tries tarball first on non-local clouds - Skip tarball when using DO snapshot (skipTarball flag) - Tests for tarball install + orchestration integration Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use global.fetch mock pattern and address security review - Use `global.fetch = mock(...)` instead of `spyOn(globalThis, "fetch")` to match codebase convention and fix CI mock interception - Add URL validation regex to reject shell metacharacters (CRITICAL) - Add agent name validation in workflow input (MEDIUM) - Add `jq has()` check before executing install commands (CRITICAL) - Use `tar -T` instead of unquoted word-splitting in capture-agent.sh (MEDIUM) - Resolve merge conflicts with upstream/main (keep Docker fields, adapt to simplified DO flow, bump version to 0.15.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use globalThis.fetch for testability in CI Bun's native fetch binding doesn't go through global.fetch property lookup, so global.fetch = mock(...) doesn't intercept it. Using globalThis.fetch explicitly ensures the mock interception works. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add missing packer dependencies and harden install command safety - Add packer/agents.json (agent tier + install command definitions) - Add packer/scripts/tier-{minimal,node,bun,full}.sh (dependency scripts) - Add basic command safety check rejecting suspicious patterns - Document packer/agents.json as a trust boundary requiring PR review Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tarballs): fix npm prefix mismatch, add apt-get update, cleanup - Add apt-get update -y before apt-get install in all tier scripts - Add --prefix ~/.npm-global to npm install commands in agents.json so installed packages land where capture-agent.sh expects them - Rename misleading MARKER_DIR → MARKER_FILE in capture-agent.sh - Remove stale comment referencing packer snapshots in workflow Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tarballs): detect empty agent installs in capture script The "no files found" check was dead code — the marker file is always created before filtering, so FILTERED_FILE always had at least one entry. Now we count non-marker entries to catch cases where the agent install silently fails and no actual files are on disk. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tarballs): use bare fetch() for Bun mock compatibility in CI In Bun, global.fetch = mock(...) overrides bare fetch() calls but NOT globalThis.fetch() calls. Every other source file in the codebase uses bare fetch() and their mocks work fine in CI. Switch to match. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tarballs): use dependency injection for fetch in tests Bun's global.fetch mock doesn't reliably intercept bare fetch() calls across all Bun versions in CI. Instead of fighting the runtime, accept an optional fetchFn parameter (defaults to fetch) and pass mock fetch directly in tests. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tarballs): bypass mock.module bleed in agent-tarball tests orchestrate.test.ts uses mock.module("../shared/agent-tarball", ...) which is process-global in Bun and bleeds into agent-tarball.test.ts. Import via URL (import.meta.url resolution) to bypass the specifier- based mock matching and get the real module. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tarballs): eliminate mock.module bleed between test files Bun's mock.module is process-global — orchestrate.test.ts mocking agent-tarball poisoned agent-tarball.test.ts (the mock function ignored the fetchFn parameter and always returned false). Fix: make tryTarballInstall injectable via OrchestrationOptions. orchestrate.test.ts passes the mock directly via options instead of using mock.module. agent-tarball.test.ts imports the real module. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tests): mock Bun.which in credential priority tests Tests assumed no cloud CLIs were installed, but machines with hcloud/ doctl would get "CLI installed" hint overrides, failing the assertion. Spy on Bun.which to return null so tests are environment-independent. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: fix import ordering after rebase Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * security: add curl domain allowlist and expand command blocklist Addresses security review findings: - Add domain allowlist for curl/wget targets (claude.ai, opencode.ai, raw.githubusercontent.com, registry.npmjs.org, crates.io, github.com) - Expand suspicious command blocklist (python -c, perl -e, ruby -e, dd, /dev/) - Document 4-layer security model in workflow comments Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * security: add rm -rf to command blocklist Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Signed-off-by: Ahmed Abushagur <ahmed@abushagur.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 04:49:39 -05:00
Ahmed Abushagur	07c2c08e3a	revert: remove Packer snapshot pipeline (#2205 ) DO snapshots are private and account-scoped — users on different accounts cannot see snapshots built by the CI token. Docker images are the better approach for cross-account pre-built agents. Removes: packer/, packer-snapshots workflow, snapshot lookup code, and snapshot test. Reverts DO CLI to plain cloud-init flow. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 02:48:52 -05:00
Ahmed Abushagur	ed98a59318	feat(digitalocean): Packer nightly snapshot pipeline for fast boot (#2198) * feat(digitalocean): Packer nightly snapshot pipeline for fast boot Add pre-built Packer snapshots for DigitalOcean droplets. Instead of 10-20 min cloud-init + agent install on every boot, snapshot-based droplets boot in ~2-3 min (SSH only, agent pre-installed). - Packer HCL2 template with parametrized agent/tier builds - Agent build matrix (packer/agents.json) for all 7 agents - Tier scripts mirroring cloud-init.ts package tiers - Nightly GitHub Actions workflow (4 AM UTC, max-parallel: 3) - Automatic cleanup: keeps only latest snapshot per agent - CLI: findSpawnSnapshot() looks up pre-built images via DO API - CLI: waitForSshOnly() skips cloud-init when using snapshots - CLI: createServer() accepts optional snapshotId, skips user_data - CLI: main.ts routes to fast path when snapshot detected - Tests for findSpawnSnapshot() (5 cases, all passing) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(packer): use var-file for install_commands to avoid shell quoting issues The previous approach passed install_commands as `-var` inline, but GitHub Actions expands `${{ }}` before shell evaluation — JSON arrays with `|`, `&&`, and `"` characters break shell quoting. Fix: generate a `.auto.pkrvars.json` file (auto-loaded by Packer) using jq with --argjson for safe JSON handling. Also route all `${{ inputs }}` and `${{ matrix }}` values through env vars to prevent script injection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 20:47:46 -08:00

27 commits