- Add SPAWN_SKIP_API_VALIDATION=1 and SPAWN_SKIP_GITHUB_AUTH=1 to
sprite test environment so verify_openrouter_key() doesn't make real
HTTP calls with the fake test key (which gets 401, clears the key,
and falls into OAuth — causing all sprite assertions to fail)
- Update agent iteration lists from stale "claude openclaw nanoclaw" to
current "claude openclaw codex opencode kilocode zeroclaw"
- Remove dead nanoclaw case from _assert_agent_specific
- Remove 5 dead agent cases (nanoclaw, cline, gptme, plandex, continue)
from _shared_agent_assertions.sh, add zeroclaw
Result: 108 passed, 0 failed (was: 48 passed, 18 failed)
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The json_escape fallback (used when python3 is unavailable) only escaped
backslashes and double quotes, producing invalid JSON when input contained
newlines, tabs, or carriage returns. This could cause JSON injection in
API request bodies sent to cloud providers (Hetzner, DigitalOcean, Fly.io)
and corrupt credential config files.
Add escaping for \n, \r, and \t in the fallback path. The python3 primary
path (json.dumps) was already correct.
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Why: `set -eo pipefail` + `output=$(shellcheck ...)` on line 659 of
test/run.sh causes immediate exit when shellcheck finds any warning,
preventing the entire shell test suite from running. 53 CLI tests also
fail due to stale assertions after agents/clouds were removed in recent
PRs.
Fixes:
- test/run.sh:659 — add `|| true` to shellcheck command substitution so
shell test suite runs to completion even when scripts have warnings
- manifest-real-data.test.ts — lower agent count min from 10→5,
matrix count min from 80→40 (now 6 agents, 48 matrix entries)
- agent-env-injection-contract.test.ts — lower script count min
from 70→40 (now 47 implemented scripts)
- script-conventions.test.ts — same script count fix (70→40)
- cloud-lib-source-chain.test.ts — lower cloud lib min from 9→8
(OVH removed, now 8 clouds)
- commands-credential-display-internals.test.ts — add missing
@clack/prompts mock (tests call p.log.error but never mocked it)
- commands-exported-helpers-edges.test.ts — fix environment-dependent
assertion: only check credential-based hintOverrides, not
CLI-installed ones (sprite CLI is installed in CI/dev)
- agent-config-setup.test.ts — fix stale model ID assertion
("openrouter/anthropic/..." → "anthropic/...") and stale mkdir
command ("rm -rf && mkdir" → "mkdir -p")
- agent-info-quickstart.test.ts — remove sprite from singleAuthManifest
fixture (sprite CLI installed causes sprite to be prioritized over
hetzner, breaking 4 tests); update count assertions for single cloud
Agent: team-lead
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Why: test/record.sh used local -n (bash 4.3+ namerefs) which crashes
on macOS's default bash 3.2, breaking contributor workflow for recording
API fixtures. Fixes#1480.
Inlines the _export_env_vars_from_fields helper directly into
_load_multi_config_from_file, eliminating the nameref dependency while
preserving the security validation of env var names.
Agent: team-lead
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Delete 32 agent scripts ({cloud}/{cline,gptme,plandex,continue}.sh across
8 clouds), remove the 4 agents from manifest.json with all their matrix
entries, update README matrix rows, remove stale mock agent binaries and
plandex.ai URL patterns from test harness, update CLI help examples to use
remaining agents, and bump version 0.5.7 → 0.5.8.
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Remove OVH as a cloud provider: delete ovh/ directory (lib + 11 agent
scripts), remove from manifest.json clouds and all ovh/* matrix entries,
update README matrix table, remove OVH destroy case in CLI commands,
and clean up all test harness references (mock.sh, mock-curl-script.sh,
record.sh, e2e.sh, cloud-lib-api-surface.test.ts, test-infra-sync.test.ts)
- Make featured_cloud an array (string[]) so agents can recommend multiple
clouds; update manifest.ts type, all 10 manifest.json values, and the
prioritizeCloudsByCredentials() comparison in commands.ts
- Sandbox OAuth in subprocess tests: add OPENROUTER_API_KEY=sk-or-test-fake
to the default env in cli-entry-edge-cases.test.ts and
cmdrun-resolution.test.ts so get_or_prompt_api_key() never triggers the
real OAuth browser flow during test runs
- Fix upload-file-security.test.ts SSH cloud count (5→4) after OVH removal
- Bump CLI version 0.5.6 → 0.5.7
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
These 5 agents are being dropped from the Spawn matrix. This removes
45 agent scripts across 9 clouds, cleans the manifest, test fixtures,
READMEs, CLI source, and shared library comments.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. _multi_creds_validate referenced undefined help_url variable, causing
empty "Get new credentials from: " error messages when OVH credential
validation fails. Added help_url as parameter and pass it from caller.
2. _spawn_inject_env_vars (used by 130+ agent scripts via spawn_agent)
uploaded credentials to static /tmp/env_config path. The older
inject_env_vars_ssh/inject_env_vars_cb functions document this as a
symlink attack vector and use randomized paths. Fixed to match.
3. Removed dead inject_env_vars_fly and inject_env_vars_sprite functions
(all agent scripts now use spawn_agent -> _spawn_inject_env_vars).
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: use uv --upgrade to ensure Python 3.13-compatible Pillow across all clouds
aider-chat on Python 3.13 fails with `ImportError: cannot import name
'_imaging' from 'PIL'` when an old Pillow version (pre-10.4) is resolved
— those releases have no Python 3.13 binary wheels, so the C extension
is missing at runtime.
Replace `--with 'Pillow>=10.2.0'` (which was silently broken — the `>`
and single quotes get mangled by `printf '%q'` in run_server before the
command reaches the remote machine) with `--upgrade`, which forces all
transitive deps including Pillow to their latest compatible versions.
Also adds a plain-text echo before the install so users see progress
instead of a silent hang during the 2-4 minute install.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test: update aider/gptme/interpreter assertions from pip to uv
The install method for aider, gptme, and open-interpreter was changed
from pip to `uv tool install` across all clouds. The mock test
assertions still checked for the old `pip.*install.*` patterns, causing
9 failures (3 agents × 3 clouds).
Update patterns to match the actual `uv tool install` commands now used
in all cloud scripts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* ci: trigger test run for uv assertion fix
* fix: prevent SSH hangs, restore stderr, fix command escaping across clouds
- Add < /dev/null to ssh_run_server and generic_ssh_wait to prevent SSH
stdin theft causing sequential install/verify/configure steps to hang
- Add ServerAliveInterval, ServerAliveCountMax, ConnectTimeout to default
SSH_OPTS so long-running installs don't silently drop on flaky networks
- Remove 2>/dev/null from Fly.io run_server so remote command errors are
no longer silently swallowed (--quiet flag still suppresses flyctl noise)
- Fix Fly.io printf '%q' double-quoting: remove extra quotes around
$escaped_cmd that prevented the remote shell from consuming escapes,
breaking && || | operators in commands
- Remove broken printf '%q' from Daytona run_server and interactive_session
where it escaped shell operators into literal characters since daytona exec
has no intermediate shell layer
- Pin aider to --python 3.12 instead of --with audioop-lts across all clouds
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add --pty to fly ssh console for interactive sessions
fly ssh console -C does not allocate a pseudo-terminal by default,
causing interactive TUI agents (aider, claude) to fail with
"Input is not a terminal (fd=0)" or completely unresponsive input.
Adding --pty forces PTY allocation, matching how other clouds handle
interactive sessions (SSH uses -t, Sprite uses -tty).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Oracle Cloud is removed as a supported provider. Each agent now has a
`featured_cloud` field in manifest.json that controls cloud sort order
in the CLI picker — featured clouds appear after credential-detected
clouds but before CLI-installed ones, with a "recommended" hint.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add QA upgrade — macOS compat linter, per-agent mock assertions
Layer 1: macOS compat linter (test/macos-compat.sh)
- 12 rules (MC001–MC012) catching bash 3.2 incompatibilities
- Detects: base64 -w0 file args, non-portable echo flags, source <(),
((var++)), read -d, nounset flag, sed -i, date %N, local -n,
declare -A, ${var,,}, and |&
- Added to CI lint.yml in warn-only mode for burn-in
- Integrated as Phase 0.5 in qa-dry-run.sh
Layer 2: Per-agent mock assertions
- test/fixtures/_shared_agent_assertions.sh with install checks
for all 15 agents (claude, openclaw, aider, goose, etc.)
- Integrated into test/mock.sh via _run_agent_assertions()
Also includes branch fixes:
- Fix base64 -w0 to use stdin redirect (aws, daytona, fly)
- Fix fly/openclaw to use npm install instead of broken curl|bash
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add E2E test harness and integrate into QA pipeline
Add test/e2e.sh — a full E2E test harness that provisions real servers,
installs agents, and verifies setup across all clouds. Features:
- Smoke test (one canary agent per cloud) and full matrix modes
- Credential auto-detection for 8 clouds
- Per-cloud preflight validation (sequential) then parallel agent tests
- Stale server cleanup, timing history, cross-cloud comparison
- Auto-fix and optimization phases via Claude agents
- macOS bash 3.2 compatible
Integrate E2E as Phase 5 in both qa-cycle.sh and qa-dry-run.sh:
- Runs after mock tests pass, gated on cloud credentials
- Phase 5b auto-fixes failures using per-agent worktree branches
- Parses results and includes in QA summary
Also fixes:
- shared/common.sh: honour SPAWN_NON_INTERACTIVE=1 in safe_read()
- aws/lib/common.sh: fix SSH key import (use cat instead of base64,
handle race condition on concurrent imports)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
test/run.sh (3 failures fixed):
- Export TEST_DIR so sprite mock tracks create→list state across processes
- Add sleep mock to avoid 30s polling loops in ensure_sprite_exists
- Add timeout/gtimeout, python3 pass-through mocks for host protection
- Set HOME to fake home for isolation, create fake home directory structure
- Clean up /tmp/spawn_* temp files in cleanup trap
test/mock.sh (29 failures fixed):
- Fix fly mock to detect "echo ok" in fly ssh console -C arguments
(including printf %q escaped form) so _fly_wait_for_ssh() succeeds
- Add timeout/gtimeout pass-through mocks to prevent system calls
- Add python3 delegate mock for JSON parsing in shared/common.sh
- Clean up /tmp/spawn_* temp files in cleanup trap
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes#1409
The bash sandbox test now verifies that test runs don't create or
modify agent-specific directories and configuration files:
- Checks that ~/.openclaw, ~/.sprite, and ~/.claude directories are
not created by test runs
- Verifies ~/.claude.json and ~/.claude/settings.json are not modified
during tests (using mtime comparison to handle pre-existing files)
- Skips checks for directories/files that existed before tests ran to
avoid false positives in development environments
This ensures tests remain properly sandboxed and don't pollute the
production environment with agent artifacts.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes#1403
**Changes:**
1. **test/run.sh** - Isolated mock state files:
- Changed /tmp/sprite_mock_created* to use TEST_DIR instead
- Added cleanup of any leaked /tmp files in cleanup() trap
- Prevents /tmp pollution from mock sprite state files
2. **test/record.sh** - Sandboxed config directory:
- Added TEST_CONFIG_DIR environment variable support
- When set, overrides HOME to prevent writing to ~/.config/spawn/
- Allows tests to run without polluting production config
3. **test/qa-dry-run.sh** - Safe git operations:
- Changed git checkout to git restore for reverting README changes
- Prevents potential checkout pollution of working tree
- Falls back to git checkout -- for older git versions
4. **test/test-sandbox.sh** - New verification test:
- Verifies no /tmp pollution after test/run.sh
- Verifies production config not modified
- Verifies mock.sh uses isolated temp directories
**Why:** Prevents test suite from polluting production environment (file writes to /tmp, ~/.config/spawn/, git state mutations).
Agent: test-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fly.io had zero test coverage — every bug fixed this session (stale
tokens, FlyV1 auth, name-taken failures, SSH hangs, PATH issues) went
undetected. This adds the full mock test infrastructure:
- test/fixtures/fly/ — env vars, API assertions, fixture JSONs for
app creation, machine creation, and token validation endpoints
- test/mock-curl-script.sh — URL stripping for api.machines.dev,
body validation for machine creation, synthetic status responses,
app creation POST handler, state tracking
- test/mock.sh — mock fly/flyctl CLI binary (ssh console, auth token),
URL stripping, required field validation, base64 mock
- test/record.sh — Fly.io REST endpoints now recordable, live
create+delete cycle, error detection, auth var mapping
All 15 agent scripts (aider, claude, openclaw, etc.) are automatically
discovered and tested: 75 passed, 0 failed.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: auto-run gcloud auth login on expired GCP tokens
Instead of telling users to run `gcloud auth login` manually, just
run it automatically when auth check fails or instance creation hits
a reauthentication error, then retry.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: prioritize clouds with CLI installed + hcloud CLI integration
When selecting a cloud provider, clouds are now sorted in 3 tiers:
1. Credentials detected (env vars set) — top priority
2. CLI installed (e.g., gcloud, hcloud, aws) — middle priority
3. Neither — default order
Also adds hcloud CLI-first support for Hetzner operations (server
create/delete/list, SSH key management, auth) with automatic fallback
to the existing REST API when hcloud is not available.
Closes#1370
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: rename aws-lightsail to aws across the project
Simplifies the cloud key from "aws-lightsail" to "aws" — AWS should
have a single entry regardless of the underlying service used.
Renames the directory, updates manifest.json matrix keys, CLI map,
test fixtures, README, and all agent scripts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All cloud claude.sh scripts had inline curl-only installs with no fallback.
When the curl installer failed (transient outage, rate limit), installation
failed with no recovery. Additionally, fnm-installed Node.js was invisible
to subsequent SSH sessions because each SSH command runs in a non-interactive
shell that doesn't source .bashrc/.zshrc.
Changes:
- Migrate 8 cloud scripts to use shared install_claude_code (curl → npm → bun)
- Move _ensure_node_runtime before npm/bun install attempts (not after)
- Add fnm paths to claude_path so node is discoverable across SSH sessions
- Prefix npm/bun install commands with claude_path for PATH visibility
- Update test assertion to match new install_claude_code behavior
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduce from 41 cloud providers to 10 (9 + local) curated for launch:
- local (free), oracle (free tier), hetzner (~€3.29/mo), ovh (~€3.50/mo),
fly (free tier), aws-lightsail ($3.50/mo), daytona (pay-per-second),
digitalocean ($4/mo), gcp ($7.11/mo), sprite (Fly.io VMs)
Changes:
- Remove 30 cloud directories, test fixtures, and provider-specific tests
- Slim manifest.json from 600 to 150 matrix entries, sorted by price
- Update CLAUDE.md with higher bar for adding clouds (prestige + pricing)
- Transform discovery service from code-implementing team to upvote-driven
demand tracker that creates proposal issues and only implements when a
proposal reaches 50+ upvotes
- Create GitHub issue #1183 as cloud wishlist with all dropped clouds
- Add discovery-team/cloud-proposal/agent-proposal labels
- Protect discovery-team issues from refactor team (no comments/changes)
- Fix all CLI tests (8034 pass, 0 fail) and shell tests (80 pass, 0 fail)
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The mock test assertion was checking for GET /sshkeys but the actual
Scaleway API endpoint is /ssh-keys (with a hyphen), causing all 15
scaleway agent tests to fail the "fetches SSH keys" check.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract pattern-matching logic in _strip_api_base() into separate helper functions (_strip_gcore_endpoint, _strip_scaleway_endpoint) to reduce function complexity from 36 lines to organized cases with extracted handlers.
Refactor ensure_api_token_with_provider() in shared/common.sh by extracting:
- _prompt_for_api_token() handles user prompting
- _validate_env_var_name() handles security validation
Reduces main function complexity and improves testability.
Agent: complexity-hunter
Co-authored-by: spawn-refactor-bot <refactor@openrouter.ai>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Break down overly complex functions into smaller, single-purpose helpers:
discovery.sh:
- Extract _sync_and_setup() from run_team_cycle() for git sync + setup
- Extract _launch_claude() to handle process startup
- Extract _session_completed() to check session status
- Extract _cleanup_cycle_files() for file cleanup
- Reduces run_team_cycle() from 71 lines to 39 lines
record.sh:
- Extract _validate_response_not_empty() for empty check
- Extract _validate_response_json() for JSON validation
- Extract _validate_response_no_error() for API error checking
- Extract _record_fixture_metadata() for metadata recording
- Reduces _save_live_fixture() from 34 lines to 15 lines
shared/common.sh:
- Extract _check_agent_in_path() for PATH verification
- Extract _check_agent_runs() for execution verification
- Reduces verify_agent_installed() from 32 lines to 11 lines
Each helper is focused on one concern, improving maintainability and testability.
Co-authored-by: spawn-refactor-bot <refactor@openrouter.ai>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Extract 60+ line nested case statement in _validate_body() into
dedicated _get_required_fields() function using cloud:endpoint pattern
matching. Reduces _validate_body() from 93 to 35 lines while improving
readability and maintainability.
Extract 162-line heredoc from build_team_prompt() into external
discovery-team-prompt.txt template file. Reduces function to 6 lines,
making discovery.sh more maintainable.
All 80 bash tests pass. No functionality change.
Co-authored-by: Spawn Refactor Service <refactor@spawn.service>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
Adds missing test infrastructure functions that were previously only in
mock-curl-script.sh but required by test-infra-sync.test.ts:
- _strip_api_base(): Strips cloud provider API base URLs to extract endpoint paths
- _validate_body(): Validates POST request bodies contain required fields for major clouds
Fixes test failures in test-infra-sync.test.ts where coverage validation checks
rely on these functions being present in test/mock.sh.
Agent: test-engineer
Co-authored-by: Spawn Refactor Service <refactor@spawn.service>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Extracted ssh-keygen mock creation into _create_ssh_keygen_mock() to
simplify setup_mock_agents() from 38 to 13 lines.
Extracted validation and response handling in test/record.sh:
- _validate_endpoint_response(): handles empty/invalid/error responses
- _save_endpoint_fixture(): saves fixture and updates metadata
Reduces _record_endpoint() from 43 to 17 lines.
Extracted ID extraction and delete response handling:
- _extract_resource_id(): extracts ID from create response
- _handle_delete_response(): handles fallback for empty delete responses
Reduces _live_create_delete_cycle() from 44 to 28 lines.
All 79 tests pass.
Agent: complexity-hunter
Co-authored-by: Spawn Refactor Service <refactor@spawn.service>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Extracted the large 270-line embedded mock curl script from the
setup_mock_curl() function into a separate file (mock-curl-script.sh).
This reduces setup_mock_curl() from 270 lines to 6 lines, improving
readability and maintainability.
The refactoring:
- Creates test/mock-curl-script.sh with all mock curl implementation
- Simplifies setup_mock_curl() to copy the external script
- Maintains identical functionality (all tests pass)
- Makes the mock curl logic easier to understand and modify
Agent: complexity-hunter
Co-authored-by: Spawn Refactor Service <refactor@spawn.service>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Added _api_assertions.sh fixtures for binarylane, genesiscloud, hyperstack, kamatera, latitude, ovh, scaleway, and upcloud to enable comprehensive mock test coverage. Updated _validate_body() in test/mock.sh to validate POST request bodies for all cloud providers, ensuring payload correctness. Fixed syntax error in gcore validation (!! to ;;).
Co-authored-by: Spawn Refactor Service <refactor@spawn.service>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
Extracted helper functions to reduce cyclomatic complexity:
test/mock.sh:
- Extract _wait_with_timeout() from run_script_with_timeout() (reduced from 32→17 lines)
- Extract _setup_test_env() and _record_categorized_result() from run_test() (reduced from 50→26 lines)
test/record.sh:
- Refactor has_api_error() to use lambda dict for cloud-specific checks (improved readability, same logic)
- Extract _format_env_var_display() from list_clouds() to eliminate nested loop (reduced from 48→32 lines)
All functions maintain identical behavior and pass syntax validation.
Co-authored-by: Spawn Refactor Service <refactor@spawn.service>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Extract assertion tracking and fixture detection logic in mock.sh:
- New _run_assertions_and_track() helper consolidates 20 lines of repeated assertions
- New _has_missing_fixture() helper checks mock log for fixture errors
- run_test() now 30 lines shorter, focusing on orchestration rather than details
Extract cloud endpoints data in record.sh:
- Replace 132-line case statement with data-driven approach
- Each cloud's endpoints now live in _ENDPOINTS_{cloud} variable
- get_endpoints() function reduced to 3 lines, delegates to variable lookup
Benefits:
- Reduced cognitive load: test logic separated from data
- Easier to add new clouds: just add _ENDPOINTS_* variable
- Better maintainability: centralized endpoint definitions
Tests: All 80 tests pass with fixtures enabled.
Co-authored-by: Spawn Refactor Service <refactor@spawn.service>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The Gcore PR (#1079) introduced `!!` instead of `;;` as case statement
terminators in 4 places, causing a syntax error on line 542 that breaks
all fixture recording.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
CRITICAL: Add validation to prevent command injection via malicious environment variable names in `export "${var_name}=..."` patterns.
Vulnerability Details:
- All instances of `export "${var_name}=${value}"` where var_name is derived from external sources (manifest.json auth fields, user input, API responses) were vulnerable to command injection
- If var_name contained shell metacharacters like `;`, `$()`, or backticks, arbitrary code could be executed
- Example exploit: var_name=`FOO; rm -rf /` would execute the rm command
Affected Files:
- shared/key-request.sh: _try_load_env_var() - var_name from manifest.json
- shared/common.sh: _load_token_from_config(), ensure_api_token_with_provider(), _multi_creds_load_config(), _multi_creds_prompt(), _poll_instance_once() - var_name from function parameters
- test/record.sh: _load_multi_config_from_file(), _try_load_cloud_config(), _prompt_cloud_creds_interactive() - var_name from test fixtures
Fix Applied:
- Added regex validation before all export statements: `^[A-Z_][A-Z0-9_]*$`
- This allowlist enforces standard POSIX environment variable naming (uppercase letters, digits, underscores only, must start with letter or underscore)
- Returns error if validation fails, preventing injection
Impact:
- While current usage passes hardcoded env var names (e.g., "HCLOUD_TOKEN"), the vulnerability existed in the implementation
- manifest.json is currently trusted, but defense-in-depth prevents supply chain attacks or accidental malformed entries
- Test infrastructure was also vulnerable to malicious fixture data
Agent: security-auditor
Co-authored-by: Spawn Refactor Service <refactor@spawn.service>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Add ServerSpace (serverspace.io) as a new cloud provider with global
locations (EU, US, Asia). Uses REST API with X-API-KEY auth and async
task-based server creation with polling.
- serverspace/lib/common.sh: Full provider library with API wrapper,
SSH key management, server provisioning with cloud-init, task polling
- serverspace/claude.sh: Claude Code agent deployment
- serverspace/aider.sh: Aider agent deployment
- serverspace/goose.sh: Goose agent deployment
- manifest.json: Cloud definition + 15 matrix entries (3 implemented)
- test/mock.sh: URL stripping, body validation, synthetic responses
- test/record.sh: Endpoints, auth, API calls, error detection
- test/fixtures/serverspace/: Mock fixtures for all API endpoints
Co-authored-by: OpenRouter Bot <noreply@openrouter.ai>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
- test/mock.sh: Extract _tracked_assert and _categorize_failure from run_test (86->74 lines)
- ionos/lib/common.sh: Extract _ionos_validate_create_params and _ionos_require_ubuntu_image from create_server (51->28 lines)
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
Civo tests failed because networks.json, disk_images.json, and
correctly-named sshkeys.json fixtures were missing. Hetzner tests
failed because datacenters.json was missing (needed for server type
validation). Scaleway tests failed because SCW_DEFAULT_PROJECT_ID
was missing from env, images.json had no Ubuntu images, and
create_server.json fixture was absent.
Also adds Civo and Scaleway to mock's _synthetic_active_response
for instance polling, and fixes Scaleway account API URL stripping.
Results: 435 passed, 0 failed, 1 skipped (previously 270/165/1).
Agent: pr-maintainer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(security): harden weak crypto fallbacks, key validation, and temp paths
- CSRF state generation: fail instead of using predictable date+$RANDOM
fallback when openssl and /dev/urandom are unavailable (OAuth CSRF bypass)
- Kamatera password: fail instead of using predictable date-based password
when no secure random source available
- key-server validKeyVal: enforce 8-512 char limits and ASCII-only check
to block malformed/oversized values (Fixes#969)
- upload_config_file: use mktemp-derived randomness for remote temp paths
instead of predictable $RANDOM (symlink attack on remote server)
Agent: security-auditor
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(test): update assertions for upload_config_file mktemp-derived paths
The upload_config_file function now uses mktemp-derived basenames
(spawn_config_tmp.XXX) instead of the original filename for remote temp
paths. Update test/run.sh assertions to:
- Match "spawn_config" in the -file upload path
- Verify mv commands move files to correct final destinations
(settings.json, .claude.json)
Addresses reviewer feedback on PR #1039.
Agent: pr-maintainer
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5 improvements to the QA cycle:
1. Fix agents now get structured failure context — categorized failures
(exit_code, missing_api_call, missing_env, no_fixture) instead of
raw 500-line test output, plus a passing agent for comparison
2. Fix agent changes are verified before committing — re-runs mock tests
after the agent finishes and only commits if results actually improved,
discarding bad fixes that would create noise PRs
3. Test results now include failure categories — mock.sh records
cloud/agent:fail:reason instead of just cloud/agent:fail, enabling
smarter failure routing
4. Mock curl logs NO_FIXTURE warnings when no fixture matches a GET
request, surfacing false-confidence gaps where tests pass with
synthetic fallback data
5. Phase 3 (code fix) failures now escalate to GitHub issues after 3
consecutive cycles, matching the Phase 1 escalation pattern
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Extract _get_multi_cred_spec, _load_multi_config_from_file, and
_save_multi_config_to_file helpers to eliminate duplicated per-cloud
config blocks in try_load_config, save_config, has_credentials,
prompt_credentials, and list_clouds.
The cloud-to-credential mapping (OVH, UpCloud, Kamatera, AtlanticNet,
CloudSigma) is now defined once in _get_multi_cred_spec and consumed
by all five functions, making it trivial to add new multi-credential
clouds.
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements Webdock cloud provider with full API integration:
- webdock/lib/common.sh with REST API primitives
- claude.sh, cline.sh, aider.sh agent scripts
- Test coverage in test/record.sh and test/mock.sh
- manifest.json updated with cloud entry and matrix
- README.md with usage documentation
Webdock offers affordable European VPS (€2.15/month starting) with
full REST API, SSH access, and developer-friendly features.
Agent: cloud-scout-1
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Both mock.sh and record.sh now run each cloud's tests/recordings
concurrently as background jobs instead of sequentially.
Results are aggregated after all clouds finish.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: strip ANSI colors before grepping test summary
The mock test output uses ANSI escape codes for colored ✓/✗/━━━
characters, so the grep in the Post summary step couldn't match
them. Strip colors with sed first.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use NO_COLOR standard instead of sed to strip ANSI codes
mock.sh now respects the NO_COLOR env var (https://no-color.org/).
CI sets NO_COLOR=1 so grep matches ✓/✗/━━━ cleanly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add CloudSigma cloud provider
Add CloudSigma as a new cloud provider with API-first architecture:
- Create cloudsigma/lib/common.sh with HTTP Basic Auth support
- Implement cloudsigma/claude.sh and cloudsigma/aider.sh agent scripts
- Add CloudSigma to manifest.json (38th cloud provider)
- Add matrix entries for all 15 agents (2 implemented, 13 missing)
- Update test/record.sh with CloudSigma endpoints and auth handling
- Update test/mock.sh with URL-stripping for CloudSigma API
- Add cloudsigma/README.md with usage documentation
CloudSigma features:
- API v2.0 with HTTP Basic Auth (email:password)
- Regions: ZRH (Zurich), WDC (Washington DC), LVS (Las Vegas)
- Granular resource control (CPU/RAM/Disk independently configurable)
- Ubuntu 24.04 cloned from public library drives
- SSH access via cloudsigma user
- Pay-as-you-go pricing starting at ~$14/month
Agent: cloud-scout
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: address security review comments for CloudSigma provider
- [CRITICAL] Fix command injection in credential saving: use sys.argv
instead of raw shell interpolation in Python strings
- [CRITICAL] Fix shell injection in create_cloudsigma_drive: pass name
and size via sys.argv instead of inline interpolation
- [CRITICAL] Fix shell injection in SSH key fingerprint lookups: pass
fingerprint via sys.argv
- [HIGH] Replace hardcoded VNC password with random generation via
openssl rand -hex 8
- [MEDIUM] Fix config file path injection: pass via sys.argv
Agent: pr-maintainer
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: B (Discovery Team) <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Add HOSTKEY (https://hostkey.com/) as a new cloud provider to the spawn
matrix. HOSTKEY offers affordable VPS hosting starting from €1/month with
hourly billing, making it suitable for running AI agents that use remote
API inference.
Changes:
- Created hostkey/lib/common.sh with HOSTKEY API wrappers
- Implemented hostkey/claude.sh (Claude Code agent)
- Implemented hostkey/openclaw.sh (OpenClaw agent)
- Added HOSTKEY to manifest.json clouds section
- Added matrix entries for all 15 agents (2 implemented, 13 missing)
- Updated test/record.sh with HOSTKEY test infrastructure
- Updated test/mock.sh with HOSTKEY URL handling
- Created hostkey/README.md with usage instructions
Data centers: Amsterdam, Frankfurt, Helsinki, Reykjavik, Istanbul, New York
Agent: cloud-scout
Co-authored-by: B (Discovery Team) <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Add Atlantic.Net Cloud as a new cloud provider with REST API support.
Starting at $4-8/mo for budget VPS instances with SSH access.
Implementation:
- Created atlanticnet/lib/common.sh with HMAC-SHA256 API auth
- Implemented 3 agent scripts: claude.sh, aider.sh, openclaw.sh
- Updated manifest.json with cloud entry and 15 matrix entries
- Added test coverage in test/record.sh and test/mock.sh
- Created atlanticnet/README.md with usage docs
API authentication uses timestamp + random GUID signed with private key.
Defaults: G2.2GB plan, ubuntu-24.04_64bit image, USEAST2 location.
Agent: cloud-scout-1
Co-authored-by: B (Discovery Team) <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Reduces setup_env_for_cloud (84 lines -> 8 lines) and assert_cloud_api_calls
(32 lines -> 9 lines) in test/mock.sh by moving cloud-specific data into
per-cloud _env.sh and _api_assertions.sh files in test/fixtures/.
Adding a new cloud's test config now only requires creating two small files
in the fixtures directory instead of editing case branches in mock.sh.
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add CodeSandbox as a new sandbox cloud provider for running AI agents.
CodeSandbox features:
- Firecracker microVMs with ~2 second start times
- SDK/CLI-based exec (no SSH)
- Free tier: 40 hours/month on Build plan
- Secure isolated environments
Implementation:
- Created codesandbox/lib/common.sh with SDK wrapper functions
- Implemented 3 agent scripts: claude, aider, openclaw
- Added CodeSandbox to manifest.json clouds
- Created matrix entries (3 implemented, 12 missing)
- Updated test/record.sh to list as non-recordable CLI cloud
- Added codesandbox/README.md with usage instructions
The implementation follows the existing pattern from e2b and modal,
using Node.js SDK (@codesandbox/sdk) for sandbox lifecycle management.
Agent: cloud-scout
Co-authored-by: B (Discovery Team) <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>