Commit graph

27 commits

Author SHA1 Message Date
Ahmed Abushagur
4e87523c4f
fix(packer): repair cursor tarball + hermes interactive install (#3367)
Some checks are pending
CLI Release / Build and release CLI (push) Waiting to run
Lint / ShellCheck (push) Waiting to run
Lint / Biome Lint (push) Waiting to run
Lint / macOS Compatibility (push) Waiting to run
agent-tarballs.yml has been failing nightly since 2026-03-27 and
packer-snapshots.yml since 2026-04-25. Two distinct breakages.

cursor:
  capture-agent.sh's allowlist was missing cursor, so the install
  step succeeded but the capture step rejected the agent name.
  Adds cursor to the allowlist plus its capture paths
  (~/.local/bin/ for the `agent` symlink, ~/.local/share/cursor-agent/
  for the extracted package, matching what verify.sh and cursor-proxy
  already expect).

hermes:
  The upstream installer launches an interactive setup wizard after
  install, which fails in CI with `/dev/tty: No such device or
  address`. Production code already passes `--skip-setup` (see
  packages/cli/src/shared/agent-setup.ts:1336); packer/agents.json
  was the lone exception. Adds the same flag.

Both pipelines read from packer/agents.json, so this single edit
unblocks both the daily tarball build and the DO marketplace image
build for hermes.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 18:31:40 -07:00
A
5e0144b645
fix(zeroclaw): remove broken zeroclaw agent (repo 404) (#3107)
* fix(zeroclaw): remove broken zeroclaw agent (repo 404)

The zeroclaw-labs/zeroclaw GitHub repository returns 404 — all installs
fail. Remove zeroclaw entirely from the matrix: agent definition,
setup code, shell scripts, e2e tests, packer config, skill files,
and documentation.

Fixes #3102

Agent: code-health
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(zeroclaw): remove stale zeroclaw reference from discovery.md ARM agents list

Addresses security review on PR #3107 — the last remaining zeroclaw
reference in .claude/rules/discovery.md is now removed.

Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(zeroclaw): remove remaining stale zeroclaw references from CI/packer

Remove zeroclaw from:
- .github/workflows/agent-tarballs.yml ARM build matrix
- .github/workflows/docker.yml agent matrix
- packer/digitalocean.pkr.hcl comment
- sh/e2e/e2e.sh comment

Addresses all 5 stale references flagged in security review of PR #3107.

Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-30 15:35:40 -07:00
A
0bd8930c09
fix(digitalocean): use canonical DIGITALOCEAN_ACCESS_TOKEN env var (#3099)
Replaces all references to DO_API_TOKEN with DIGITALOCEAN_ACCESS_TOKEN,
matching DigitalOcean's official CLI and API documentation. This includes
TypeScript source, tests, shell scripts, Packer config, CI workflows,
and documentation.

Supersedes #3068 (rebased onto current main).

Agent: pr-maintainer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-30 08:48:56 +07:00
Ahmed Abushagur
3687fb38c3
ci: add cursor agent to tarball build pipeline (#3049)
Cursor CLI installs a native binary via curl, so it needs both x86_64
and arm64 builds. Also adds cursor.com to the allowed domains list.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-27 13:42:46 +07:00
Ahmed Abushagur
66a1749b4b
fix: add sprite-keep-running.sh, remove Hetzner from Packer, cleanup on cancel (#2869)
Some checks are pending
CLI Release / Build and release CLI (push) Waiting to run
Lint / ShellCheck (push) Waiting to run
Lint / Biome Lint (push) Waiting to run
Lint / macOS Compatibility (push) Waiting to run
* fix: destroy orphaned Packer builder instances on workflow cancel

When a Packer Snapshots workflow is cancelled mid-build, Packer's process
is killed before it can clean up its temporary builder droplet/server.
This leaves orphaned packer-* instances running and costing money.

Add `if: cancelled()` cleanup steps for both DigitalOcean and Hetzner
that destroy any packer-* prefixed instances after cancellation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove Hetzner cleanup step — only DO needed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove Hetzner from Packer snapshots, add cancel cleanup

Remove Hetzner from the Packer workflow entirely — only DigitalOcean
snapshots are built. Deletes packer/hetzner.pkr.hcl and simplifies the
workflow by removing all Hetzner-specific steps and cloud conditionals.

Also adds a cancelled() cleanup step that destroys orphaned packer-*
builder droplets when a workflow run is cancelled mid-build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add missing sprite-keep-running.sh script

The keep-alive install was 404ing because sh/shared/sprite-keep-running.sh
never existed in the repo. The TypeScript code downloaded it from the CDN
(which maps to sh/shared/) but the file was never created.

The script wraps a command and pings the sprite's own public URL every 30s
to prevent inactivity shutdown. It resolves the URL via sprite-env info
(available on all sprites) and falls back to exec without keep-alive if
the URL can't be determined.

Also removes Hetzner from the Packer snapshots workflow entirely — only
DigitalOcean snapshots are built.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address security review — scope cleanup filter, fix JSON injection

1. Add `spawn-packer` tag to DO builder droplets in Packer template and
   filter cleanup by tag instead of broad `packer-` name prefix. Prevents
   accidentally destroying builder instances from other concurrent builds.

2. Use `jq --arg` for SINGLE_AGENT_INPUT instead of string interpolation
   to prevent JSON injection via crafted agent names.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 18:13:38 +00:00
Ahmed Abushagur
2825884fee
fix(packer): use cpx22 in nbg1 for Hetzner builds (#2785)
cx23 is only available in Helsinki — poor availability. Switch to
cpx22 (AMD, 2 vCPU, 4GB) which is available in nbg1/hel1/sin.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 22:52:01 -07:00
Ahmed Abushagur
7289f3ef36
feat(hetzner): add snapshot support + Packer image builds (#2774)
Some checks failed
CLI Release / Build and release CLI (push) Failing after 31s
Lint / ShellCheck (push) Successful in 40s
Lint / Biome Lint (push) Failing after 14s
Lint / macOS Compatibility (push) Successful in 18s
CLI changes:
- Add findSpawnSnapshot() to query Hetzner /images?type=snapshot API
  for pre-built spawn-{agent}-* images (matches by description prefix)
- Add waitForSshOnly() for snapshot boots (skips cloud-init polling)
- Update createServer() to accept optional snapshotId — boots from
  snapshot instead of ubuntu-24.04, skips cloud-init userdata
- Wire up orchestrator with skipAgentInstall flag

Packer changes:
- Add packer/hetzner.pkr.hcl using hcloud plugin, mirroring the DO
  template (tier scripts, agent install, cleanup, manifest)
- Unify packer-snapshots.yml to build both DO and Hetzner in a single
  workflow with cloud×agent matrix and per-cloud cleanup steps

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 16:46:48 -07:00
Ahmed Abushagur
34fc9b6d4d
fix: increase packer snapshot transfer timeout to 60m (#2648)
* fix: increase packer snapshot transfer timeout to 60m

The default 30m timeout is too short for transferring snapshots to
distant DO regions (blr1, sgp1, syd1). This caused zeroclaw and
kilocode builds to fail despite successful provisioning.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* revert: remove batch splitting from packer workflow

DO droplet cap is no longer an issue — revert to single parallel build
job for all agents.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 04:48:11 -04:00
A
0d66125fd6
fix: add junie to tarball build pipeline (#2541)
Junie was added as a fully implemented agent (manifest, agent scripts,
agent-setup.ts) but the packer/tarball pipeline was never updated.
This meant the nightly agent-tarballs workflow could not build a
pre-built tarball for Junie, forcing all deployments to do a live
npm install.

- Add junie entry to packer/agents.json (tier: node, @jetbrains/junie-cli)
- Add junie to capture-agent.sh allowlist and path-capture case
  (npm-based, same as codex/kilocode — captures /root/.npm-global/)

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-12 18:45:03 -04:00
Ahmed Abushagur
4004b51f6d
fix: use curl for Chrome download + capture google-chrome-stable in tarball (#2370)
- wget not available on many cloud VMs, use curl instead
- Remove 2>/dev/null from dpkg/apt so install errors are visible
- Capture /usr/bin/google-chrome-stable in tarball (actual .deb binary name)
- Use curl in packer/agents.json tarball build too

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-08 23:59:32 -07:00
Ahmed Abushagur
7e2f9f45fc
fix: use Google Chrome .deb for OpenClaw browser tool (#2368)
* fix: use Google Chrome .deb instead of Playwright for OpenClaw browser

Snap Chromium on Ubuntu 24.04 fails because AppArmor confinement blocks
CDP control. OpenClaw's own docs recommend installing Google Chrome via
.deb package which bypasses snap entirely.

Also adds browser.noSandbox and browser.executablePath to the OpenClaw
config so the browser tool works out of the box on Linux VMs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unnecessary confirmation prompt when OAuth fails

If OAuth didn't complete, the user obviously wants to paste a key.
The "Paste your API key manually? (Y/n)" prompt was pointless friction.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove unnecessary "Continue anyway?" credential confirmation

If the user selected a cloud, they obviously want to continue.
The warning + setup guidance is sufficient — no need to block on a confirm.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: move Chrome install to configure step so it runs after tarball

The tarball path skips agent.install() entirely, so Chrome never got
installed. Moving it to configure() (setupOpenclawConfig) ensures it
always runs regardless of install method.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: bundle Google Chrome in openclaw tarball

Add Chrome .deb install to openclaw's tarball build so it ships
pre-installed. Capture /usr/bin/google-chrome and /opt/google/chrome/
in the tarball. Add dl.google.com to the workflow domain allowlist.

The configure() step still has a fallback install with idempotency
check (command -v google-chrome) for non-tarball installs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use openclaw config set for browser setup + correct binary name

- Use `google-chrome-stable` (actual .deb binary name) not `google-chrome`
- Set browser config via `openclaw config set` CLI (the supported way)
  instead of writing JSON directly which wasn't being picked up
- Remove browser section from JSON config to avoid conflicts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 01:52:08 -04:00
A
51dec6e877
fix: E2E failures - SSH key gen race, hetzner 409, hermes binary path (#2305)
Three distinct E2E bugs fixed:

1. SSH key generation race condition: When multiple agents provision in
   parallel, concurrent processes all call generateSshKey() and race to
   create ~/.ssh/id_ed25519. ssh-keygen won't overwrite an existing file
   (prompts on stdin which is "ignore"), causing zeroclaw/codex to fail
   with "SSH key generation failed". Fix: check if key already exists
   before generating, and re-check after a failed generation attempt.

2. Hetzner SSH key 409 uniqueness_error: The Hetzner API returns HTTP 409
   with "SSH key not unique" when the same key content is registered under
   a different name. The hetznerApi() function throws on non-2xx before
   the error-parsing code runs, and the regex /already/ didn't match
   "not unique". Fix: catch 409 in ensureSshKey() and match against
   uniqueness_error/not unique/already patterns.

3. Hermes binary not found: The hermes install script (uv tool) creates
   the actual binary + venv at ~/.hermes/hermes-agent/venv/ with a symlink
   at ~/.local/bin/hermes. The tarball capture script only captured the
   symlink + ~/.local/share/, leaving a dangling symlink. Fix: include
   ~/.hermes/ in capture paths, add venv/bin to verify.sh PATH check,
   and update hermes launchCmd to include the venv PATH.

Fixes #2304

Agent: code-health

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 22:05:44 -05:00
Ahmed Abushagur
7bebc6558f
feat: full marketplace compliance + automated Vendor API submission (#2295)
Packer template:
- Match official 90-cleanup.sh: remove SSH host keys, create
  revoked_keys, remove cloud-init instances, zero-fill free space,
  use --force-confold for upgrades, autoremove/autoclean
- Add Packer manifest post-processor for snapshot ID extraction
- Remove PACKER_LOG=1 (debug logging not needed in production)

Workflow:
- Add "Submit to DO Marketplace" step after successful build
- Reads agent→app_id mapping from MARKETPLACE_APP_IDS secret (JSON)
- Extracts snapshot ID from Packer manifest, PATCHes Vendor API
- Gracefully handles 400 (app already pending review)
- Skips silently if no MARKETPLACE_APP_IDS secret is configured

Setup: add MARKETPLACE_APP_IDS secret as JSON, e.g.:
  {"claude":"60089fc6...", "codex":"60089fc7..."}
App IDs come from the DO Vendor Portal after initial approval.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 16:40:04 -05:00
A
70d8462e56
fix: add explicit input validation to capture-agent.sh (Fixes #2281) (#2282)
Add whitelist validation for AGENT_NAME immediately after the empty
check to prevent command injection and path traversal via the parameter.
While the existing case statement catches unknown agents, explicit
upfront validation makes the security intent clear and defensive.

Agent: security-auditor

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 06:27:28 -08:00
Ahmed Abushagur
7643b96266
fix: pass DO Marketplace img_check validation (#2276)
Three fixes for marketplace validation failures:

1. Install all security updates (apt-get dist-upgrade) — img_check
   fails if any security patches are pending.
2. Purge droplet-agent and /opt/digitalocean — img_check fails if
   the DO monitoring agent directory exists.
3. Correct img_check.sh filename to 99-img-check.sh — the previous
   URL returned 404.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 02:43:46 -05:00
Ahmed Abushagur
4719b49754
fix: correct img_check.sh filename to 99-img-check.sh (#2275)
The marketplace-partners repo uses `99-img-check.sh`, not
`img_check.sh`. The wrong filename caused a 404 on curl download,
failing all agent builds with exit code 22.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 01:48:52 -05:00
Ahmed Abushagur
5103a763b4
fix: packer build — OOM kill and history builtin (#2274)
* fix: claude snapshot build — remove npm fallback from install command

The native install (curl | bash) succeeds but exits non-zero due to a
PATH warning. The || fallback then tries `npm install` which doesn't
exist on the "minimal" tier → exit 127.

Fix: replace npm fallback with binary existence check (same pattern
as hermes agent). If install exits non-zero but ~/.local/bin/claude
exists, the build succeeds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: snapshot cleanup and lookup — use name prefix instead of tags

DO Packer builder `tags` only apply to the temporary build droplet,
not the resulting snapshot image. Both the workflow cleanup step and
the CLI's findSpawnSnapshot() were querying by `tag_name` which
returned nothing — old snapshots piled up and the CLI couldn't find
existing snapshots.

Fix: filter by snapshot name prefix (`spawn-{agent}-`) instead of
tags, in both the workflow and the CLI. Remove misleading `tags`
from the Packer template. Add test cases for name-prefix filtering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: packer build failures — OOM kill + history builtin

Two issues introduced by PR #2271 (marketplace compliance):

1. Droplet downsized to s-1vcpu-1gb (1GB RAM) — Claude's native
   installer and zeroclaw's Rust build get OOM-killed. Restore
   s-2vcpu-2gb.

2. Cleanup provisioner uses `history -c` which is a bash builtin.
   Packer runs scripts with /bin/sh (dash on Ubuntu) which doesn't
   have it → exit 127 on ALL agents. Remove it — the .bash_history
   file deletion already handles persistent history.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 01:15:39 -05:00
Ahmed Abushagur
d77a067aa4
fix: snapshot cleanup + claude install (name-prefix filter) (#2273)
* fix: claude snapshot build — remove npm fallback from install command

The native install (curl | bash) succeeds but exits non-zero due to a
PATH warning. The || fallback then tries `npm install` which doesn't
exist on the "minimal" tier → exit 127.

Fix: replace npm fallback with binary existence check (same pattern
as hermes agent). If install exits non-zero but ~/.local/bin/claude
exists, the build succeeds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: snapshot cleanup and lookup — use name prefix instead of tags

DO Packer builder `tags` only apply to the temporary build droplet,
not the resulting snapshot image. Both the workflow cleanup step and
the CLI's findSpawnSnapshot() were querying by `tag_name` which
returned nothing — old snapshots piled up and the CLI couldn't find
existing snapshots.

Fix: filter by snapshot name prefix (`spawn-{agent}-`) instead of
tags, in both the workflow and the CLI. Remove misleading `tags`
from the Packer template. Add test cases for name-prefix filtering.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 21:32:58 -08:00
A
c3cb98daab
feat: add DO Marketplace compliance to Packer build pipeline (#2271)
- Switch build droplet from s-2vcpu-2gb to s-1vcpu-1gb ($6/mo) per DO
  Marketplace recommendation for cross-size snapshot compatibility
- Add ufw firewall provisioner (deny incoming, allow SSH, enable)
- Replace basic apt-get clean with full DO Marketplace cleanup sequence:
  removes SSH authorized_keys, clears bash history, truncates /var/log,
  resets machine-id, and runs cloud-init clean so each launched droplet
  gets a fresh identity on first boot
- Add img_check.sh validation step (from digitalocean/marketplace-partners)
  to verify firewall active, no root password, and security posture before
  the snapshot is finalized — build fails if image doesn't meet requirements

Fixes #2269

Agent: issue-fixer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-03-07 00:20:35 -05:00
Ahmed Abushagur
955a6081c1
fix: Packer build region/size and PATH for agent installs (#2270)
* feat: restore Packer DO snapshot pipeline for fast agent boot

Restores the nightly Packer snapshot build pipeline (reverted in #2205)
that pre-bakes agent images as DigitalOcean snapshots. When a snapshot
exists on the user's account, droplet boot skips cloud-init and tarball
install entirely — cutting provisioning from ~10min to ~2min.

- Add `packer/digitalocean.pkr.hcl` HCL2 template with multi-region
  distribution, apt-lock wait, and snapshot marker
- Add `.github/workflows/packer-snapshots.yml` nightly build with
  matrix strategy, auto-cleanup of old snapshots, and injection-safe
  env var handling
- Add `findSpawnSnapshot()` to query DO API for pre-built snapshots
- Add `waitForSshOnly()` for snapshot boots (skip cloud-init wait)
- Modify `createServer()` to accept optional `snapshotId` param
- Wire snapshot detection in DO `main.ts` orchestrator
- Add `skipAgentInstall` to `CloudOrchestrator` interface to skip
  tarball + install steps when booting from snapshot
- Add 5 unit tests for snapshot lookup (happy path, empty, error,
  invalid ID, network failure)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use repo-root-relative path for tier scripts in Packer template

Packer resolves script paths relative to cwd (repo root), not relative
to the .pkr.hcl file. Changed `scripts/tier-*.sh` to
`packer/scripts/tier-*.sh`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: Packer build region/size and PATH for agent installs

Two issues causing build failures:

1. `s-2vcpu-4gb` not available in `nyc3` — changed build region to
   `sfo3` and size to `s-2vcpu-2gb` (universally available, cheaper,
   sufficient for building snapshots)

2. Claude install puts binary in `~/.local/bin` which isn't in PATH
   during Packer provisioning — added full PATH to environment_vars
   on both the install and marker provisioners so agent binaries and
   subsequent scripts can find each other

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 22:45:39 -05:00
Ahmed Abushagur
e7b6b0b9fd
fix: Packer tier script path relative to repo root (#2266)
* feat: restore Packer DO snapshot pipeline for fast agent boot

Restores the nightly Packer snapshot build pipeline (reverted in #2205)
that pre-bakes agent images as DigitalOcean snapshots. When a snapshot
exists on the user's account, droplet boot skips cloud-init and tarball
install entirely — cutting provisioning from ~10min to ~2min.

- Add `packer/digitalocean.pkr.hcl` HCL2 template with multi-region
  distribution, apt-lock wait, and snapshot marker
- Add `.github/workflows/packer-snapshots.yml` nightly build with
  matrix strategy, auto-cleanup of old snapshots, and injection-safe
  env var handling
- Add `findSpawnSnapshot()` to query DO API for pre-built snapshots
- Add `waitForSshOnly()` for snapshot boots (skip cloud-init wait)
- Modify `createServer()` to accept optional `snapshotId` param
- Wire snapshot detection in DO `main.ts` orchestrator
- Add `skipAgentInstall` to `CloudOrchestrator` interface to skip
  tarball + install steps when booting from snapshot
- Add 5 unit tests for snapshot lookup (happy path, empty, error,
  invalid ID, network failure)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use repo-root-relative path for tier scripts in Packer template

Packer resolves script paths relative to cwd (repo root), not relative
to the .pkr.hcl file. Changed `scripts/tier-*.sh` to
`packer/scripts/tier-*.sh`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 17:40:57 -08:00
Ahmed Abushagur
cefcd56327
feat: restore Packer DO snapshot pipeline for fast agent boot (#2262)
Restores the nightly Packer snapshot build pipeline (reverted in #2205)
that pre-bakes agent images as DigitalOcean snapshots. When a snapshot
exists on the user's account, droplet boot skips cloud-init and tarball
install entirely — cutting provisioning from ~10min to ~2min.

- Add `packer/digitalocean.pkr.hcl` HCL2 template with multi-region
  distribution, apt-lock wait, and snapshot marker
- Add `.github/workflows/packer-snapshots.yml` nightly build with
  matrix strategy, auto-cleanup of old snapshots, and injection-safe
  env var handling
- Add `findSpawnSnapshot()` to query DO API for pre-built snapshots
- Add `waitForSshOnly()` for snapshot boots (skip cloud-init wait)
- Modify `createServer()` to accept optional `snapshotId` param
- Wire snapshot detection in DO `main.ts` orchestrator
- Add `skipAgentInstall` to `CloudOrchestrator` interface to skip
  tarball + install steps when booting from snapshot
- Add 5 unit tests for snapshot lookup (happy path, empty, error,
  invalid ID, network failure)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 16:32:05 -08:00
Ahmed Abushagur
4ac19a375a
fix: capture claude symlink target + verify PATH (#2245)
* fix: tarball workflow failures (root ownership, swapfile, hermes TTY)

- Use sudo mv + chown for tarball in release step (root-owned from capture)
- Skip swapfile creation if /swapfile already exists (GitHub Actions runners)
- Tolerate hermes setup wizard failure when /dev/tty unavailable in CI

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: capture claude symlink target in tarball + fix verify PATH

The claude installer creates a symlink at ~/.local/bin/claude pointing
to ~/.local/share/claude/versions/X.Y.Z. The capture script was missing
~/.local/share/claude/, causing a broken symlink in the tarball.

Also add ~/.npm-global/bin to the verify PATH check for claude (npm
fallback install path).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
2026-03-06 10:55:09 -05:00
Ahmed Abushagur
ba9690ea23
fix: tarball workflow failures (root ownership, swapfile, hermes TTY) (#2240)
- Use sudo mv + chown for tarball in release step (root-owned from capture)
- Skip swapfile creation if /swapfile already exists (GitHub Actions runners)
- Tolerate hermes setup wizard failure when /dev/tty unavailable in CI

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 05:48:46 -05:00
Ahmed Abushagur
8072c084c2
feat: pre-built agent tarballs for fast install (#2232)
* feat: pre-built agent tarballs on GitHub Releases for fast install

Adds a nightly GitHub Actions workflow that builds and uploads agent
tarballs to rolling GitHub Releases. During provisioning, the CLI now
attempts to download and extract a tarball before falling back to live
install. Priority chain: snapshot > tarball > live install.

- New workflow: .github/workflows/agent-tarballs.yml
- New capture script: packer/scripts/capture-agent.sh
- New module: packages/cli/src/shared/agent-tarball.ts
- Orchestrate tries tarball first on non-local clouds
- Skip tarball when using DO snapshot (skipTarball flag)
- Tests for tarball install + orchestration integration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use global.fetch mock pattern and address security review

- Use `global.fetch = mock(...)` instead of `spyOn(globalThis, "fetch")`
  to match codebase convention and fix CI mock interception
- Add URL validation regex to reject shell metacharacters (CRITICAL)
- Add agent name validation in workflow input (MEDIUM)
- Add `jq has()` check before executing install commands (CRITICAL)
- Use `tar -T` instead of unquoted word-splitting in capture-agent.sh (MEDIUM)
- Resolve merge conflicts with upstream/main (keep Docker fields, adapt
  to simplified DO flow, bump version to 0.15.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use globalThis.fetch for testability in CI

Bun's native fetch binding doesn't go through global.fetch property
lookup, so global.fetch = mock(...) doesn't intercept it. Using
globalThis.fetch explicitly ensures the mock interception works.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add missing packer dependencies and harden install command safety

- Add packer/agents.json (agent tier + install command definitions)
- Add packer/scripts/tier-{minimal,node,bun,full}.sh (dependency scripts)
- Add basic command safety check rejecting suspicious patterns
- Document packer/agents.json as a trust boundary requiring PR review

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(tarballs): fix npm prefix mismatch, add apt-get update, cleanup

- Add apt-get update -y before apt-get install in all tier scripts
- Add --prefix ~/.npm-global to npm install commands in agents.json
  so installed packages land where capture-agent.sh expects them
- Rename misleading MARKER_DIR → MARKER_FILE in capture-agent.sh
- Remove stale comment referencing packer snapshots in workflow

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(tarballs): detect empty agent installs in capture script

The "no files found" check was dead code — the marker file is always
created before filtering, so FILTERED_FILE always had at least one
entry. Now we count non-marker entries to catch cases where the agent
install silently fails and no actual files are on disk.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(tarballs): use bare fetch() for Bun mock compatibility in CI

In Bun, global.fetch = mock(...) overrides bare fetch() calls but NOT
globalThis.fetch() calls. Every other source file in the codebase uses
bare fetch() and their mocks work fine in CI. Switch to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(tarballs): use dependency injection for fetch in tests

Bun's global.fetch mock doesn't reliably intercept bare fetch() calls
across all Bun versions in CI. Instead of fighting the runtime, accept
an optional fetchFn parameter (defaults to fetch) and pass mock fetch
directly in tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(tarballs): bypass mock.module bleed in agent-tarball tests

orchestrate.test.ts uses mock.module("../shared/agent-tarball", ...)
which is process-global in Bun and bleeds into agent-tarball.test.ts.
Import via URL (import.meta.url resolution) to bypass the specifier-
based mock matching and get the real module.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(tarballs): eliminate mock.module bleed between test files

Bun's mock.module is process-global — orchestrate.test.ts mocking
agent-tarball poisoned agent-tarball.test.ts (the mock function
ignored the fetchFn parameter and always returned false).

Fix: make tryTarballInstall injectable via OrchestrationOptions.
orchestrate.test.ts passes the mock directly via options instead
of using mock.module. agent-tarball.test.ts imports the real module.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(tests): mock Bun.which in credential priority tests

Tests assumed no cloud CLIs were installed, but machines with hcloud/
doctl would get "CLI installed" hint overrides, failing the assertion.
Spy on Bun.which to return null so tests are environment-independent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: fix import ordering after rebase

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* security: add curl domain allowlist and expand command blocklist

Addresses security review findings:
- Add domain allowlist for curl/wget targets (claude.ai, opencode.ai,
  raw.githubusercontent.com, registry.npmjs.org, crates.io, github.com)
- Expand suspicious command blocklist (python -c, perl -e, ruby -e, dd, /dev/)
- Document 4-layer security model in workflow comments

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* security: add rm -rf to command blocklist

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Signed-off-by: Ahmed Abushagur <ahmed@abushagur.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 04:49:39 -05:00
Ahmed Abushagur
07c2c08e3a
revert: remove Packer snapshot pipeline (#2205)
DO snapshots are private and account-scoped — users on different
accounts cannot see snapshots built by the CI token. Docker images
are the better approach for cross-account pre-built agents.

Removes: packer/, packer-snapshots workflow, snapshot lookup code,
and snapshot test. Reverts DO CLI to plain cloud-init flow.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 02:48:52 -05:00
Ahmed Abushagur
ed98a59318
feat(digitalocean): Packer nightly snapshot pipeline for fast boot (#2198)
* feat(digitalocean): Packer nightly snapshot pipeline for fast boot

Add pre-built Packer snapshots for DigitalOcean droplets. Instead of
10-20 min cloud-init + agent install on every boot, snapshot-based
droplets boot in ~2-3 min (SSH only, agent pre-installed).

- Packer HCL2 template with parametrized agent/tier builds
- Agent build matrix (packer/agents.json) for all 7 agents
- Tier scripts mirroring cloud-init.ts package tiers
- Nightly GitHub Actions workflow (4 AM UTC, max-parallel: 3)
- Automatic cleanup: keeps only latest snapshot per agent
- CLI: findSpawnSnapshot() looks up pre-built images via DO API
- CLI: waitForSshOnly() skips cloud-init when using snapshots
- CLI: createServer() accepts optional snapshotId, skips user_data
- CLI: main.ts routes to fast path when snapshot detected
- Tests for findSpawnSnapshot() (5 cases, all passing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(packer): use var-file for install_commands to avoid shell quoting issues

The previous approach passed install_commands as `-var` inline, but
GitHub Actions expands `${{ }}` before shell evaluation — JSON arrays
with `|`, `&&`, and `"` characters break shell quoting.

Fix: generate a `.auto.pkrvars.json` file (auto-loaded by Packer)
using jq with --argjson for safe JSON handling. Also route all
`${{ inputs }}` and `${{ matrix }}` values through env vars to
prevent script injection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-04 20:47:46 -08:00