- Spinner completion messages now show "done" state instead of repeating
the in-progress message (e.g., "Loading manifest" instead of "Loading manifest...")
- Script failures show actionable troubleshooting (missing credentials,
rate limits, dependencies) instead of generic "Script exited with code N"
- Ctrl+C (exit code 130) exits silently instead of showing an error
- Fuzzy matching for unknown agents/clouds now also searches display names,
so "Hetzner" suggests "hetzner" even when the key doesn't fuzzy-match
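
A minimal sketch of the display-name lookup described above, written in Python for illustration rather than in the CLI's own code. The `suggest_key` helper, the `clouds` excerpt, and the 0.6 cutoff are assumptions, not the project's actual implementation:

```python
import difflib

def suggest_key(query, entries, cutoff=0.6):
    """Return the closest manifest key for `query`, searching keys and display names."""
    q = query.strip().lower()
    # Map every searchable string (key or display name) back to its key.
    candidates = {}
    for key, display_name in entries.items():
        candidates.setdefault(key.lower(), key)
        candidates.setdefault(display_name.lower(), key)
    matches = difflib.get_close_matches(q, list(candidates), n=1, cutoff=cutoff)
    return candidates[matches[0]] if matches else None

# Hypothetical manifest excerpt: key -> display name.
clouds = {"hetzner": "Hetzner Cloud", "do": "DigitalOcean", "aws-lightsail": "AWS Lightsail"}
print(suggest_key("Digital Ocean", clouds))
print(suggest_key("lightsale", clouds))
```

Searching display names is what lets an input such as "Digital Ocean" reach a short key like "do", which it would never fuzzy-match directly.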
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: `claude -p` (print mode) terminates the session when the
model produces a text response with no tool call. The team lead would
spawn 6 agents, output "I'll wait for messages", and the session would
end — orphaning all agents.
Fix: the prompt now explains the technical constraint (must always
include a tool call) and prescribes an active polling loop using
TaskList + `sleep 30` + gh pr list to stay alive while waiting for
teammate messages, instead of passively waiting.
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the naive `claude | tee` pipe with a background process +
watchdog loop that monitors log file growth every 10 seconds.
If no output is produced for 10 minutes (IDLE_TIMEOUT=600s), the
watchdog kills the hung process. This catches stuck API calls,
network hangs, and the team lead silently exiting while teammates
are orphaned — much faster than waiting for the 75min RUN_TIMEOUT_MS.
Team cycle: 10min idle timeout, 60min hard wall-clock timeout
Single cycle: 10min idle timeout, 35min hard wall-clock timeout
The next cron trigger starts a fresh cycle automatically.
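
The watchdog idea can be sketched as follows. This is a Python illustration under stated assumptions, not the shell loop from this commit: `run_with_watchdog` and its parameter names are hypothetical, and only the 10s poll interval and the 600s idle timeout come from the text above:

```python
import os
import subprocess
import sys
import time

def run_with_watchdog(cmd, log_path, idle_timeout=600, hard_timeout=3600, poll_interval=10):
    """Run cmd in the background and kill it if its log stops growing.

    Returns (reason, returncode); reason is "exited", "idle", or "hard".
    """
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        start = last_growth = time.monotonic()
        last_size = 0
        reason = "exited"
        while proc.poll() is None:
            time.sleep(poll_interval)
            if proc.poll() is not None:
                break  # finished on its own while we slept
            now = time.monotonic()
            size = os.path.getsize(log_path)
            if size != last_size:
                last_size, last_growth = size, now  # output seen: reset idle clock
            if now - start >= hard_timeout:
                reason = "hard"
            elif now - last_growth >= idle_timeout:
                reason = "idle"
            if reason != "exited":
                proc.kill()
                break
        return reason, proc.wait()
```

Tracking log growth rather than process liveness is the point: a process blocked on a stuck API call is still alive, but it stops writing.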
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rename "Missing" column to "Not available on" to avoid confusion
- Change "all clouds" to "-- all clouds supported" for full coverage agents
- Only show +/- grid legend in grid view (not compact view)
- Fix help text alignment for "spawn list" command
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
- create_server(): Validate hostname, plan, region env vars with
validate_resource_name(); pass all values via sys.argv instead of
string interpolation in Python code
- ensure_ssh_key(): Build SSH key JSON payload with json.dumps via
sys.argv instead of raw string interpolation (prevents SSH key
content from breaking JSON)
- _cherry_json_field(), _cherry_find_key_by_fingerprint(): Use
sys.argv instead of bash variable interpolation in Python strings
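
To illustrate why sys.argv is safer than interpolation, here is a small Python sketch. The `build_ssh_key_payload` helper and the `label`/`key` field names are assumptions for the example, not the exact payload built in cherry/lib/common.sh:

```python
import json
import subprocess
import sys

# The inline program is a constant: caller-supplied text is never spliced into it.
BUILD_PAYLOAD = "import json, sys; print(json.dumps({'label': sys.argv[1], 'key': sys.argv[2]}))"

def build_ssh_key_payload(label, public_key):
    """Build a JSON payload by passing values as arguments, not as source code."""
    result = subprocess.run(
        [sys.executable, "-c", BUILD_PAYLOAD, label, public_key],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# A value that would break (or hijack) a program built by string interpolation.
tricky = "ssh-ed25519 AAAA' + __import__('os').getcwd() + ' user@\"host\""
payload = build_ssh_key_payload("spawn-key", tricky)
print(json.loads(payload)["key"] == tricky)
```

Because the values arrive as data in sys.argv, quotes inside them are escaped by json.dumps instead of being parsed as Python.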
Agent: security-auditor
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The team lead was spawning 6 agents and then exiting because the refactor
prompt lacked explicit instructions to stay alive and wait for messages
(the discovery prompt already has them). Added the WaitForMessage
monitoring loop pattern from discovery.sh.
Also increased IDLE_TIMEOUT from 180s to 600s — 3 min was too
aggressive, killing legitimate cycles where agents are working and the
leader is waiting for their responses.
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Creates agent deployment scripts for Cherry Servers that were marked
as "implemented" in manifest.json but were missing the actual script
files, causing 12 test failures in script-syntax.test.ts.
Added scripts: claude, nanoclaw, aider, codex, interpreter, gemini,
amazonq, cline, gptme, opencode, plandex, kilocode
Agent: test-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The IONOS provider interpolated $IONOS_USERNAME and $IONOS_PASSWORD
directly into Python string literals when saving credentials, allowing
arbitrary code execution via crafted credential values containing
single quotes. Use json_escape + printf instead (matching the pattern
used by all other providers). Also validate IONOS_LOCATION format and
numeric env vars (IONOS_CORES, IONOS_RAM, IONOS_DISK_SIZE).
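
The validation step might look roughly like this in Python. The location pattern (two letters, a slash, three letters, as in "de/fra") and both helper names are assumptions, not the checks actually added to the IONOS provider:

```python
import re

LOCATION_RE = re.compile(r"[a-z]{2}/[a-z]{3}")  # assumed shape, e.g. "de/fra"

def validate_location(value):
    """Accept only values that match the assumed location shape in full."""
    if not LOCATION_RE.fullmatch(value):
        raise ValueError(f"invalid location: {value!r}")
    return value

def validate_positive_int(name, value):
    """Accept only plain ASCII digits that form a positive integer."""
    # isascii() and isdigit() together reject signs, spaces, quotes, and non-ASCII digits.
    if not (value.isascii() and value.isdigit()) or int(value) <= 0:
        raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return int(value)

print(validate_location("de/fra"))
print(validate_positive_int("IONOS_CORES", "2"))
```

Rejecting anything outside a narrow allowlist means a crafted value never reaches the code that builds requests or saves credentials.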
Agent: security-auditor
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Users typing "spawn Claude" or "spawn Claude Code" now get resolved
to the correct key automatically instead of an "invalid characters"
error. Works for both agents and clouds in single-arg info and
two-arg run paths.
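
A sketch of the resolution order, in Python for illustration. The `resolve_key` helper and the `agents` excerpt are hypothetical; the CLI's real lookup may differ:

```python
def resolve_key(user_input, entries):
    """Map user input to a manifest key: exact key, then case-insensitive key or display name."""
    if user_input in entries:
        return user_input
    wanted = user_input.strip().lower()
    for key, display_name in entries.items():
        if wanted in (key.lower(), display_name.lower()):
            return key
    return None

# Hypothetical manifest excerpt: key -> display name.
agents = {"claude": "Claude Code", "aider": "Aider"}
print(resolve_key("Claude", agents))
print(resolve_key("Claude Code", agents))
```

Resolving before validating is what avoids the "invalid characters" error: the space in "Claude Code" is never checked against the key format.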
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The lead was spawning teammates via Task (fire-and-forget) and then
ending its conversation after 14 turns. Teammates became orphaned
with no coordination or shutdown sequence.
Added three changes to the team lead prompt:
1. Upfront warning: "your session MUST stay alive for the entire cycle"
2. New "Monitoring Loop" section with explicit WaitForMessage pattern
and common-mistake callout (BAD: spawn then exit, GOOD: spawn then
WaitForMessage loop then shutdown)
3. End instruction restructured into 3 mandatory phases:
Setup → Monitor (WaitForMessage loop) → Shutdown
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The team lead's claude process can hang indefinitely when an API call
doesn't return (observed: pre-flight check hung for 30+ min while 6
agents were orphaned). The hard timeout does not fire until the full
40 min have elapsed.
Now monitors log file growth every 10s. If no output for 3 minutes
(IDLE_TIMEOUT=180s), the process is killed immediately. The next
5-minute cron trigger starts a fresh cycle — no wasted time.
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ensure_sprite_exists() now polls `sprite list` until the sprite
appears (up to 30s) instead of a fixed sleep. This eliminates the
spurious "sprite not found" errors that appeared while the sprite
was still provisioning.
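
The poll-instead-of-sleep pattern can be sketched generically. This is a Python illustration, not the shell code in ensure_sprite_exists(); `wait_until` is a hypothetical helper, and in the real function the check would wrap `sprite list`:

```python
import time

def wait_until(check, timeout=30.0, interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)

# Stand-in for "does the sprite show up in the listing yet?"
attempts = {"n": 0}
def appears_on_third_try():
    attempts["n"] += 1
    return attempts["n"] >= 3

print(wait_until(appears_on_third_try, timeout=5, interval=0.01))
```

Polling returns as soon as the resource appears and only pays the full 30s in the failure case, whereas a fixed sleep is either too short or always too long.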
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Budget European provider starting at $2/month for 1 vCore/1GB RAM
- REST API via CloudAPI v6 with Basic Auth (username + password)
- Datacenter-based resource organization (auto-creates if needed)
- Volume-based boot disks with cloud-init support
- Implemented 3 agents: claude, aider, goose
- 11 agents marked as missing for future implementation
Agent: cloud-scout-1
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Clarifies that spawn agents use remote LLM APIs, not local inference,
which is why cheap CPU instances suffice and GPU clouds are unnecessary.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Spawn runs coding agents that call LLM APIs — they need affordable
CPU instances with SSH access, not expensive GPU VMs. Updated the
discovery prompts and CLAUDE.md to explicitly avoid GPU clouds and
prioritize budget VPS providers and container platforms.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add paperspace/lib/common.sh with API and CLI support
- Implement 3 agent scripts: claude, aider, openclaw
- Add Paperspace to manifest.json clouds section
- Add 14 matrix entries (paperspace x all agents)
- Create README with usage docs and pricing info
Paperspace is a GPU cloud (now part of DigitalOcean) with:
- Hourly billing, free bandwidth
- GPU machines from $0.46/hr (M4000) to $1.90/hr (A6000)
- Three regions: NY2, CA1, AMS1
- Both pspace CLI and REST API support
Agent: cloud-scout-2
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
- Auto-correct swapped arguments (e.g., `spawn sprite claude` now runs
as `spawn claude sprite`) instead of just warning and exiting
- Document `ls` alias for `list` in help text
- Add SPAWN_NO_UPDATE_CHECK env var to troubleshooting section
- Bump version to 0.2.21
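
The argument auto-correction can be pictured with a short Python sketch. `normalize_args` and the sample sets are assumptions for illustration, not the CLI's actual code:

```python
def normalize_args(first, second, agents, clouds):
    """Return (agent, cloud, swapped), reordering when the user typed the cloud first."""
    if first in clouds and second in agents and first not in agents:
        return second, first, True
    return first, second, False

# Hypothetical key sets.
agents = {"claude", "aider"}
clouds = {"sprite", "hetzner"}
print(normalize_args("sprite", "claude", agents, clouds))
```

The swap only happens when the reading is unambiguous, so a correctly ordered command is never rewritten.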
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kamatera: Extract _kamatera_queue_field and _extract_kamatera_wan_ip helpers
to deduplicate inline Python blocks in wait_for_command (49->33 lines) and
get_kamatera_server_ip (49->26 lines).
Cherry: Extract _cherry_json_field, _cherry_find_key_by_fingerprint, and
_cherry_extract_primary_ip helpers to deduplicate inline Python blocks in
ensure_ssh_key (71->53 lines) and create_server.
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests the compact list view triggered when the matrix grid is wider
than the terminal. Covers view switching logic, count formatting,
missing clouds column, "all clouds" display, and edge cases.
Agent: test-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cli/cli.js is a bun build output that should never have been committed.
Remove it from tracking and add it to .gitignore.
Fixes #296
Agent: team-lead
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three fixes:
- claude -p --output-format stream-json requires --verbose flag
- Trap cleanup ran twice (SIGTERM + EXIT) — add re-entry guard
- trigger-server drain() error left HTTP stream open — wrap in
try/finally so controller.close() always runs, preventing
curl error 18 (transfer closed with outstanding read data)
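
Two of these fixes follow general patterns that can be sketched in Python. The `Cleanup` class and `stream_chunks` function are illustrative stand-ins, not the shell trap or the trigger-server code:

```python
class Cleanup:
    """Run a cleanup callback at most once, however many times it is triggered."""

    def __init__(self, action):
        self._action = action
        self._done = False

    def __call__(self):
        if self._done:  # re-entry guard: a second trigger is a no-op
            return
        self._done = True
        self._action()

def stream_chunks(chunks, write, close):
    """Forward chunks to write(); close() runs even if write() raises."""
    try:
        for chunk in chunks:
            write(chunk)
    finally:
        close()

ran = []
cleanup = Cleanup(lambda: ran.append("cleaned"))
cleanup()  # e.g. triggered by SIGTERM
cleanup()  # e.g. triggered again on EXIT
print(ran)
```

The guard makes cleanup idempotent, and the finally block guarantees the stream is closed on the error path as well as the success path.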
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
claude -p buffers all output until completion, so the trigger server
only saw heartbeats during 15-40 min runs. Adding --output-format
stream-json makes claude emit JSON events (tool calls, messages) in
real-time, giving visibility into what the agent is doing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds 25 tests for the previously untested showInfoOrError function which
handles single-argument CLI routing (agent info, cloud info, unknown
command with fuzzy suggestions). Tests cover valid agents, valid clouds,
unknown names, fuzzy match suggestions, help flag routing, and edge cases.
Agent: test-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `spawn list` grid was 888 characters wide with 30 clouds, making it
completely unreadable in standard terminals (80-120 columns). Now detects
terminal width and automatically switches to a compact view showing each
agent with its cloud count and any missing clouds.
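
A rough Python sketch of the view-switching logic. The function names, the column layout, and the row format are assumptions; only the idea of comparing grid width with terminal width comes from this commit:

```python
import shutil

def grid_width(agent_col, cloud_names, padding=2):
    """Width of a grid with one agent column plus one padded column per cloud."""
    return agent_col + sum(len(name) + padding for name in cloud_names)

def choose_view(width_needed, terminal_width=None):
    """Pick the grid when it fits the terminal, otherwise the compact list."""
    if terminal_width is None:
        terminal_width = shutil.get_terminal_size(fallback=(80, 24)).columns
    return "grid" if width_needed <= terminal_width else "compact"

def compact_row(agent, supported, all_clouds):
    """One compact line: agent, cloud count, and any clouds it is missing on."""
    missing = [c for c in all_clouds if c not in supported]
    note = ", ".join(missing) if missing else "all clouds"
    return f"{agent}  {len(supported)}/{len(all_clouds)}  {note}"

print(choose_view(888, 120))
print(compact_row("aider", {"sprite"}, ["sprite", "hetzner"]))
```

Listing only the missing clouds keeps each row short, since most agents are implemented on most clouds.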
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Split _fly_create_and_start_machine (70 lines) into _fly_create_machine
and _fly_wait_for_machine_start for single-responsibility
- Replace ensure_koyeb_token (38 lines) with ensure_api_token_with_provider
- Replace ensure_railway_token (37 lines) with ensure_api_token_with_provider
- Remove _save_koyeb_token and _save_railway_token (handled by shared helper)
Net reduction: ~80 lines of duplicated code
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Cherry Servers as a new cloud provider with:
- REST API-based server provisioning
- SSH key management via API
- Full root access to cloud VPS instances
- Hourly billing with no commitments
Implementation includes:
- cherry/lib/common.sh with Cherry Servers API primitives
- cherry/openclaw.sh for OpenClaw deployment
- cherry/goose.sh for Goose deployment
- cherry/README.md with authentication and usage docs
- manifest.json updates (cloud entry + 14 matrix entries)
Agent: cloud-scout
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>