spawn/.github/workflows/packer-snapshots.yml
Ahmed Abushagur 66a1749b4b
fix: add sprite-keep-running.sh, remove Hetzner from Packer, cleanup on cancel (#2869)
* fix: destroy orphaned Packer builder instances on workflow cancel

When a Packer Snapshots workflow is cancelled mid-build, Packer's process
is killed before it can clean up its temporary builder droplet/server.
This leaves orphaned packer-* instances running and costing money.

Add `if: cancelled()` cleanup steps for both DigitalOcean and Hetzner
that destroy any packer-* prefixed instances after cancellation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove Hetzner cleanup step — only DO needed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove Hetzner from Packer snapshots, add cancel cleanup

Remove Hetzner from the Packer workflow entirely — only DigitalOcean
snapshots are built. Deletes packer/hetzner.pkr.hcl and simplifies the
workflow by removing all Hetzner-specific steps and cloud conditionals.

Also adds a cancelled() cleanup step that destroys orphaned packer-*
builder droplets when a workflow run is cancelled mid-build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add missing sprite-keep-running.sh script

The keep-alive install was 404ing because sh/shared/sprite-keep-running.sh
never existed in the repo. The TypeScript code downloaded it from the CDN
(which maps to sh/shared/) but the file was never created.

The script wraps a command and pings the sprite's own public URL every 30s
to prevent inactivity shutdown. It resolves the URL via sprite-env info
(available on all sprites) and falls back to exec without keep-alive if
the URL can't be determined.
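
The wrapper behaviour described above could look roughly like the sketch below. This is not the actual sh/shared/sprite-keep-running.sh: the function name `keep_running` and the `KEEPALIVE_URL` / `KEEPALIVE_INTERVAL` variables are hypothetical, the URL is taken from the environment instead of being resolved via sprite-env info (whose output format is not shown here), and the fallback runs the command directly rather than using exec so the function stays testable.

```shell
#!/usr/bin/env bash
# Illustrative keep-alive wrapper (assumed names, not the repo's script).
# KEEPALIVE_URL: URL to ping; KEEPALIVE_INTERVAL: seconds between pings.

keep_running() {
  local url="${KEEPALIVE_URL:-}"
  local interval="${KEEPALIVE_INTERVAL:-30}"

  # URL unknown: run the command without keep-alive.
  if [ -z "$url" ]; then
    "$@"
    return $?
  fi

  # Ping the URL in the background until the wrapped command finishes.
  # Output is discarded so the pinger never holds the caller's pipes open.
  (
    while true; do
      curl -fsS -o /dev/null --max-time 10 "$url" || true
      sleep "$interval"
    done
  ) >/dev/null 2>&1 &
  local pinger=$!

  # Run the wrapped command and remember its exit status.
  "$@"
  local rc=$?

  # Stop the pinger and propagate the command's status.
  kill "$pinger" 2>/dev/null || true
  wait "$pinger" 2>/dev/null || true
  return "$rc"
}
```

With this sketch, `KEEPALIVE_URL=https://example.invalid keep_running my-agent --flag` would run `my-agent --flag` while pinging the URL every 30 seconds, and would return the agent's own exit status.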

Also removes Hetzner from the Packer snapshots workflow entirely — only
DigitalOcean snapshots are built.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address security review — scope cleanup filter, fix JSON injection

1. Add `spawn-packer` tag to DO builder droplets in Packer template and
   filter cleanup by tag instead of broad `packer-` name prefix. Prevents
   accidentally destroying builder instances from other concurrent builds.

2. Use `jq --arg` for SINGLE_AGENT_INPUT instead of string interpolation
   to prevent JSON injection via crafted agent names.
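
To illustrate point 2, here is a small sketch comparing string interpolation with `jq --arg`. The crafted agent name is a made-up example, not one seen in practice; the `jq -nc --arg agent ... '[$agent]'` call is the same form the workflow uses.

```shell
#!/usr/bin/env bash
# A crafted agent name containing a quote and extra JSON (hypothetical input).
agent='x", "injected'

# Unsafe: string interpolation lets the input change the JSON structure.
unsafe="[\"${agent}\"]"

# Safe: jq --arg binds the value as a string and escapes it on output.
safe=$(jq -nc --arg agent "$agent" '[$agent]')

echo "unsafe: $(printf '%s' "$unsafe" | jq 'length') element(s)"
echo "safe:   $(printf '%s' "$safe" | jq 'length') element(s)"
```

The interpolated array ends up with two elements because the embedded quote closes the string early, while the `--arg` version stays a single-element array holding the literal input.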

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 18:13:38 +00:00


name: Packer Snapshots

on:
  schedule:
    # Nightly at 4 AM UTC (before tarball build at 5 AM)
    - cron: "0 4 * * *"
  workflow_dispatch:
    inputs:
      agent:
        description: "Single agent to build (leave empty for all)"
        required: false
        type: string

permissions:
  contents: read

jobs:
  matrix:
    name: Generate matrix
    runs-on: ubuntu-latest
    outputs:
      include: ${{ steps.set.outputs.include }}
    steps:
      - uses: actions/checkout@v4

      - id: set
        run: |
          SINGLE_AGENT="${SINGLE_AGENT_INPUT}"
          if [ -n "$SINGLE_AGENT" ]; then
            AGENTS=$(jq -nc --arg agent "$SINGLE_AGENT" '[$agent]')
          else
            AGENTS=$(jq -c 'keys' packer/agents.json)
          fi
          # Build a flat include array: [{agent, cloud}, ...]
          INCLUDE=$(jq -nc --argjson agents "$AGENTS" \
            '[$agents[] as $a | {agent: $a, cloud: "digitalocean"}]')
          echo "include=${INCLUDE}" >> "$GITHUB_OUTPUT"
        env:
          SINGLE_AGENT_INPUT: ${{ inputs.agent }}

  build:
    name: "digitalocean/${{ matrix.agent }}"
    needs: matrix
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        include: ${{ fromJson(needs.matrix.outputs.include) }}
    steps:
      - uses: actions/checkout@v4

      - name: Read agent config
        id: config
        run: |
          TIER=$(jq -r --arg a "$AGENT_NAME" '.[$a].tier // "minimal"' packer/agents.json)
          INSTALL=$(jq -c --arg a "$AGENT_NAME" '.[$a].install // []' packer/agents.json)
          echo "tier=${TIER}" >> "$GITHUB_OUTPUT"
          echo "install=${INSTALL}" >> "$GITHUB_OUTPUT"
        env:
          AGENT_NAME: ${{ matrix.agent }}

      - name: Setup Packer
        uses: hashicorp/setup-packer@main
        with:
          version: latest

      - name: Init Packer plugins
        run: packer init packer/digitalocean.pkr.hcl

      - name: Generate variables file
        run: |
          jq -n \
            --arg token "$DO_API_TOKEN" \
            --arg agent "$AGENT_NAME" \
            --arg tier "$TIER" \
            --argjson install "$INSTALL_COMMANDS" \
            '{
              do_api_token: $token,
              agent_name: $agent,
              cloud_init_tier: $tier,
              install_commands: $install
            }' > packer/auto.pkrvars.json
        env:
          DO_API_TOKEN: ${{ secrets.DO_API_TOKEN }}
          AGENT_NAME: ${{ matrix.agent }}
          TIER: ${{ steps.config.outputs.tier }}
          INSTALL_COMMANDS: ${{ steps.config.outputs.install }}

      - name: Build snapshot
        run: packer build -var-file=packer/auto.pkrvars.json packer/digitalocean.pkr.hcl

      # When a workflow is cancelled, Packer is killed before it can destroy
      # the temporary builder droplet — leaving orphaned instances.
      - name: Destroy orphaned builder droplets
        if: cancelled()
        run: |
          # Filter by spawn-packer tag to avoid destroying builder droplets from other workflows
          DROPLET_IDS=$(curl -s -H "Authorization: Bearer ${DO_API_TOKEN}" \
            "https://api.digitalocean.com/v2/droplets?per_page=200&tag_name=spawn-packer" \
            | jq -r '.droplets[].id')
          if [ -z "$DROPLET_IDS" ]; then
            echo "No orphaned packer builder droplets found"
            exit 0
          fi
          for ID in $DROPLET_IDS; do
            echo "Destroying orphaned builder droplet: ${ID}"
            curl -s -X DELETE -H "Authorization: Bearer ${DO_API_TOKEN}" \
              "https://api.digitalocean.com/v2/droplets/${ID}" || true
          done
        env:
          DO_API_TOKEN: ${{ secrets.DO_API_TOKEN }}

      - name: Cleanup old snapshots
        if: success()
        run: |
          PREFIX="spawn-${AGENT_NAME}-"
          SNAPSHOTS=$(curl -s -H "Authorization: Bearer ${DO_API_TOKEN}" \
            "https://api.digitalocean.com/v2/images?private=true&per_page=100" \
            | jq -r --arg prefix "$PREFIX" \
              '[.images[] | select(.name | startswith($prefix))] | sort_by(.created_at) | reverse | .[1:] | .[].id')
          for ID in $SNAPSHOTS; do
            echo "Deleting old snapshot: ${ID}"
            curl -s -X DELETE -H "Authorization: Bearer ${DO_API_TOKEN}" \
              "https://api.digitalocean.com/v2/images/${ID}" || true
          done
        env:
          DO_API_TOKEN: ${{ secrets.DO_API_TOKEN }}
          AGENT_NAME: ${{ matrix.agent }}

      - name: Submit to DO Marketplace
        if: success()
        run: |
          # Skip if no marketplace app IDs configured
          if [ -z "$MARKETPLACE_APP_IDS" ]; then
            echo "No MARKETPLACE_APP_IDS secret — skipping marketplace submission"
            exit 0
          fi
          # Look up this agent's app ID from the JSON map
          APP_ID=$(echo "$MARKETPLACE_APP_IDS" | jq -r --arg a "$AGENT_NAME" '.[$a] // empty')
          if [ -z "$APP_ID" ]; then
            echo "No marketplace app ID for agent ${AGENT_NAME} — skipping"
            exit 0
          fi
          # Extract snapshot ID from Packer manifest
          # artifact_id format is "region:snapshot_id" (e.g. "sfo3:12345678")
          IMG_ID=$(jq '.builds[-1].artifact_id | split(":")[1] | tonumber' packer/manifest.json)
          if [ -z "$IMG_ID" ] || [ "$IMG_ID" = "null" ]; then
            echo "Failed to extract snapshot ID from manifest"
            exit 1
          fi
          echo "Submitting snapshot ${IMG_ID} for ${AGENT_NAME} (app: ${APP_ID})"
          # PATCH the Vendor API — updates go to "pending" review.
          # 400 = app already pending/in-review (expected for nightly runs), not an error.
          HTTP_CODE=$(curl -s -o /tmp/mp-response.json -w "%{http_code}" \
            -X PATCH \
            -H "Content-Type: application/json" \
            -H "Authorization: Bearer ${DO_API_TOKEN}" \
            -d "$(jq -n \
              --arg reason "Nightly rebuild — $(date -u '+%Y-%m-%d')" \
              --argjson imageId "$IMG_ID" \
              '{reasonForUpdate: $reason, imageId: $imageId}')" \
            "https://api.digitalocean.com/api/v1/vendor-portal/apps/${APP_ID}")
          case "$HTTP_CODE" in
            200) echo "Marketplace submission accepted (pending review)" ;;
            400) echo "App already pending review — skipping (expected for nightly runs)" ;;
            *) echo "Marketplace API returned ${HTTP_CODE}:"
               cat /tmp/mp-response.json
               exit 1 ;;
          esac
        env:
          DO_API_TOKEN: ${{ secrets.DO_API_TOKEN }}
          AGENT_NAME: ${{ matrix.agent }}
          MARKETPLACE_APP_IDS: ${{ secrets.MARKETPLACE_APP_IDS }}