MODAL_SANDBOX_ID and sandbox name were interpolated directly into
Python code strings, allowing potential code injection. Now all
user-controlled values are passed via environment variables and
read with os.environ in Python.
Changes:
- create_server: pass name/image via _MODAL_NAME/_MODAL_IMAGE env vars,
use getattr() for image lookup, add sandbox name validation
- run_server: pass sandbox ID and command via env vars
- interactive_session: pass sandbox ID and command via env vars
- destroy_server: pass sandbox ID via env var
- Add validate_sandbox_id() to enforce sb-<alphanumeric> format
- upload_file: remove printf '%q' escaping (base64 is safe)
Agent: security-auditor
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement ovh/cline.sh and ovh/gptme.sh using OVH
primitives. Both scripts provision an OVHcloud instance,
install the agent, inject OpenRouter credentials, and
launch an interactive session.
Agent: gap-filler
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- OAuth failures now explain WHY they failed (timeout, port conflict,
no runtime, network) and suggest specific fixes
- Add duration hints to long-running operations (SSH wait: 30-90s,
OAuth: 10-30s) so users know what to expect
- validateImplementation shows exact `spawn <agent> <cloud>` commands
users can run instead of just listing cloud names
- SSH wait failure suggests checking cloud provider dashboard
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Civo: Extract build_create_instance_body() for JSON body construction
and wait_for_civo_instance() for the status polling loop, reducing
create_server() from 113 to 53 lines.
Kamatera: Extract validate_kamatera_params() for input validation and
build_kamatera_server_body() for JSON body construction, reducing
create_server() from 107 to 62 lines.
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement ovh/goose.sh and ovh/interpreter.sh using OVH
primitives. Both scripts provision an OVHcloud instance,
install the agent, inject OpenRouter credentials, and
launch an interactive session.
Agent: gap-filler-2
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement ovh/opencode.sh and ovh/plandex.sh using OVH
primitives. Both scripts provision an OVHcloud instance,
install the agent, inject OpenRouter credentials, and
launch an interactive session.
Agent: gap-filler-5
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement ovh/gemini.sh and ovh/amazonq.sh using OVH
primitives. Both scripts provision an OVHcloud instance,
install the agent, inject OpenRouter credentials, and
launch an interactive session.
Agent: gap-filler-3
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement ovh/openclaw.sh and ovh/nanoclaw.sh using OVH
primitives from lib/common.sh. Both scripts provision an
OVHcloud instance, install the agent, inject OpenRouter
credentials, and launch an interactive session.
Agent: gap-filler-1
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Default RUN_TIMEOUT_MS increased to 7200000 (2h) based on observed
team cycle durations of 1-2 hours
- SKILL.md now documents the data-driven tuning approach: start high
(6-12h), collect log data, then tune down to 2x longest observed cycle
- Updated health/trigger response docs and workflow template with
429-tolerant curl pattern
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The prompt-based 15-minute time budget was advisory only — cycles could
run for hours (9.5h observed on Feb 9). Now:
- refactor.sh wraps `claude -p` in `timeout 1200` (20 min) with SIGTERM
then SIGKILL after 60s grace. Distinguishes timeout vs failure in logs.
- trigger-server.ts adds a 60-second interval timer that proactively
reaps stale runs instead of only checking on incoming requests.
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add legend to `spawn list` matrix output (+ implemented, - not yet available)
- Show cloud identifier keys in `spawn <agent>` info output for easy copy-paste
- Add CLI shortcut hint in interactive mode after selection
- Add agent descriptions to `spawn agents` output
- Add agent counts to `spawn clouds` output for consistency
- Fix misleading "Updating" spinner in `spawn update` (it only checks)
- Add `spawn help` to help text command listing
- Improve footer hints in agents/clouds output with actionable commands
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Before checking concurrency, the trigger server now:
- Checks if tracked processes are actually alive (kill -0)
- Reaps dead processes that exited without cleanup
- Kills runs that exceed RUN_TIMEOUT_MS (default 30min)
Health endpoint now reports per-run details (pid, age, reason).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When MAX_CONCURRENT=1 and a cycle is in progress, the trigger server
returns 429. This is expected behavior, not an error — the previous
curl -f treated it as failure (exit code 22).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tests the extractFlagValue generic function and the full CLI flag
extraction pipeline (--prompt/-p and --prompt-file). Existing tests
in index-parsing.test.ts and index-edge-cases.test.ts use simplified
re-implementations; these tests cover the exact behavior including
error messages, process.exit on missing values, startsWith("-") guard,
sequential two-pass extraction, and edge cases with flag-like values.
Agent: test-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- `spawn agents` now shows the key users need to type (e.g., `claude`)
alongside the display name and cloud count
- `spawn clouds` now shows the key (e.g., `sprite`) alongside the display
name and description
- Both commands show a usage hint at the bottom
- Error when both --prompt and --prompt-file are provided (was silently
overwriting)
- Remove duplicate agent validation in handleDefaultCommand (was loading
manifest twice without spinner, showing different error format)
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add 15-minute hard deadline with escalation at 10/12/15 min marks
- Limit each agent to ONE PR (prevents runaway micro-refactors)
- Add periodic issue re-scan every 5 minutes + mandatory final sweep
- Update shutdown checklist to verify per-issue comment coverage
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Terminals without UTF-8 support display garbled characters (e.g., "â"
instead of bullets). Replace all Unicode symbols (bullets, em dashes,
arrows, check marks, box drawing) with ASCII equivalents.
Fixes#99
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: Extract duplicated prompt flag parsing into extractFlagValue helper
The --prompt and --prompt-file argument extraction in main() shared identical
patterns for flag detection, value validation, and args splicing. Extracted
into a reusable extractFlagValue() function that handles all three concerns.
Agent: complexity-hunter
* refactor: Consolidate multiple python3 JSON reads into single calls
OVH, Kamatera, and UpCloud each spawned separate python3 processes to
read different fields from the same JSON config file. Consolidate into
a single python3 call per file, printing all fields at once and reading
them with bash read. Also fixes OVH using string interpolation for the
file path instead of the safer sys.argv[1] pattern.
Agent: complexity-hunter
* refactor: Extract flyctl auth and token validation from ensure_fly_token
Split the 75-line ensure_fly_token into focused helpers:
- _try_flyctl_auth: encapsulates flyctl CLI token retrieval
- _validate_fly_token: encapsulates API validation with error reporting
The main function is now a clear sequential flow of token source attempts.
Agent: complexity-hunter
* refactor: Deduplicate retry backoff logic in kamatera_api
The two error branches (network error and HTTP 429/503) had identical
interval update and attempt increment code. Restructure with early
return for success, then unified backoff at the end of the loop.
Agent: complexity-hunter
* refactor: Remove unnecessary async IIFE wrapper in validateAndGetAgent
The function wrapped its body in `return (async () => { ... })()` when
it can simply be declared as `async function` directly.
Agent: complexity-hunter
---------
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
- Fix triple-quote injection in SSH keys (Scaleway, UpCloud), userdata
(BinaryLane), init scripts (Civo, Kamatera), and GraphQL queries
(RunPod) by passing data via stdin/json_escape instead of inline
string interpolation
- Add input validation for all cloud provider env vars (region, type,
plan, etc.) using validate_region_name/validate_resource_name to block
shell metacharacters before they reach Python string interpolation
- Validate Modal image name as Python identifier to prevent code injection
- Validate numeric env vars (RAM, GPU count, disk size) across all providers
Affects: 19 cloud provider lib/common.sh files
Agent: security-auditor
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three new test files target gaps in existing coverage:
- index-edge-cases: tests startsWith("-") guard for --prompt/-p values, --prompt-file
validation, combined flag extraction order, and agent list truncation logic
- manifest-helpers: tests isValidManifest with unusual data shapes (arrays, strings,
numbers), corrupted cache handling, and countImplemented case sensitivity
- security-encoding: tests unicode homoglyphs, null bytes, CRLF line endings, BOM
markers, and control character handling in identifier/script/prompt validation
Agent: test-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- validateImplementation: Show which clouds ARE available when a
combination isn't implemented, instead of a dead-end error
- Interactive mode: Add guidance when no clouds available for agent
- handleError: Add 'spawn help' hint to generic error handler
- handleDefaultCommand: Show agent keys alongside names so users
know what to type (e.g., "claude" not just "Claude Code")
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Matrix now shows 14 agents x 21 clouds (264 implemented, 30 missing).
Added OVHcloud and Kamatera cloud columns, Kilo Code agent row.
Agent: team-lead
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kilo Code (15K+ GitHub stars, 98 HN points) is an all-in-one agentic
engineering platform with native OpenRouter support. Adds agent entry,
sprite implementation, and missing matrix entries for all 21 clouds.
Agent: team-lead
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds Kamatera (25+ global datacenters, REST API, hourly billing) as
a new cloud provider. Implements all 13 agent scripts with full
lifecycle: create, wait, destroy, SSH, upload.
Agent: team-lead
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
printf %q escapes spaces and shell metacharacters, turning "claude install"
into "claude\ install" — which bash -c interprets as a single command named
"claude install" (with literal space). This broke all multi-word commands
passed to run_sprite, including pipes, redirects, and && chains.
Since all callers pass trusted, hardcoded command strings (not user input),
the command string should be passed directly to bash -c for normal shell
parsing.
Fixes#88
Agent: team-lead
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The sprite/claude.sh script was using 'claude install' which requires
claude to already be on PATH. Changed to use the curl installer which
downloads and installs the binary from scratch.
Fixes#88
Agent: issue-responder
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Show clear error when --prompt/-p or --prompt-file is used without a
value (previously silently ignored)
- Fix --prompt-file splice index bug when used after --prompt
- Replace echo -e with printf in fly/lib/common.sh for macOS bash 3.x
compatibility
- Fix incorrect env var name in README (DIGITALOCEAN_TOKEN -> DO_API_TOKEN)
- Add missing agent entries (gptme, OpenCode, Plandex) to 11 cloud READMEs
- Add all 13 agents to Civo README (previously only had 3)
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add generic_cloud_api_custom_auth() to shared/common.sh for cloud
providers that use non-Bearer auth headers. Replace ~120 lines of
duplicated retry logic in upcloud_api() and scaleway_api() with
calls to the new shared function.
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Change trigger-server MAX_CONCURRENT default from 3 to 1 to prevent
overlapping cycles that duplicate GitHub issue comments
- Add SIGTERM/SIGINT handling to trigger-server so running scripts finish
gracefully on service restart instead of being killed mid-flight
- Add cleanup trap to refactor.sh for worktree/tempfile cleanup on exit
- Add pre-cycle cleanup of stale worktrees, merged branches, and
abandoned PRs from previously interrupted cycles
- Add mandatory Lifecycle Management section to team lead prompt requiring
shutdown_request to all teammates before exiting
- Add dedup checks to community-coordinator: check existing comments
before posting to prevent duplicate acknowledgments/resolutions
- Pass issue number in workflow trigger reason for better logging
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Latitude.sh as the 19th cloud provider in the spawn matrix.
Latitude.sh offers bare metal servers and VMs via REST API with
hourly billing, global locations, and plans starting at $0.07/hr.
New files:
- latitude/lib/common.sh: Provider functions (API wrapper, server
creation/deletion, SSH key management, wait-for-ready)
- latitude/{agent}.sh: All 13 agent deployment scripts
- latitude/README.md: Usage docs with env vars and pricing
Updated:
- manifest.json: Added latitude cloud + 13 matrix entries
- README.md: Updated matrix table (19 clouds, 247 combinations)
Agent: cloud-scout
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace hand-rolled ensure_vultr_token() and ensure_linode_token() with
calls to ensure_api_token_with_provider(), matching the pattern already
used by Hetzner, DigitalOcean, Lambda, E2B, and Scaleway.
Extracts test_vultr_token() and test_linode_token() validation functions
to preserve provider-specific error messages and remediation guidance.
Removes ~70 lines of duplicated env-check/config-file/prompt/save logic.
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Adds 30 new test cases covering previously untested functions in commands.ts
- Tests for getStatusDescription, renderMatrix helpers, validation logic
- Tests for error handling functions and download fallback logic
- Tests for agent/cloud validation and implementation checking
- Tests for calculateColumnWidth variations with different parameters
- Tests for isLocalSpawnCheckout file detection logic
This improves test coverage for core command logic that wasn't previously tested,
focusing on pure functions and logic that can be tested without full module mocking.
Agent: test-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
When sprite version output doesn't match the expected format, the message
now omits the version rather than displaying "unknown". Also broadened the
version regex to match versions without 'v' prefix.
Fixes#79
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The redirect `> /dev/null 2>&1` was being escaped by `run_sprite`'s
`printf %q`, causing the command to be interpreted incorrectly:
/usr/bin/bash: line 1: claude install > /dev/null 2>&1: No such file or directory
Removing the redirect allows users to see installation progress and
simplifies the command. Installation success is already verified by
the subsequent check on line 33.
Fixes#80
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Same bug as improve.sh — was cd'ing into the skills directory
instead of the repo root.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
improve.sh was setting REPO_ROOT to its own directory, causing
manifest.json lookups and git commands to fail silently.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Added 35 tests covering helper functions in commands.ts
- Tests cover error handling, string validation, column width calculation
- Tests verify renderMatrixRow color selection logic
- Tests validate isLocalSpawnCheckout and report functions
- All 35 new tests pass
- Focus on pure functions and functions with minimal side effects
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
Split the 66-line generic_cloud_api function into focused helpers to reduce
complexity and eliminate duplication:
- _parse_api_response: Extracts HTTP code and response body (10 lines)
- _make_api_request: Builds curl args and executes request (27 lines)
- _handle_api_transient_error: Centralizes retry logic for all error types (24 lines)
Main function reduced from 66 to 41 lines (38% reduction). Behavior unchanged:
still retries on network errors and transient HTTP codes (429, 503), with
exponential backoff. All test assertions pass.
This extraction pattern makes it clearer how retry logic flows and easier to
modify error handling in the future without duplicating patterns.
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>