1. _multi_creds_validate referenced undefined help_url variable, causing
empty "Get new credentials from: " error messages when OVH credential
validation fails. Added help_url as parameter and pass it from caller.
2. _spawn_inject_env_vars (used by 130+ agent scripts via spawn_agent)
uploaded credentials to static /tmp/env_config path. The older
inject_env_vars_ssh/inject_env_vars_cb functions document this as a
symlink attack vector and use randomized paths. Fixed to match.
3. Removed dead inject_env_vars_fly and inject_env_vars_sprite functions
(all agent scripts now use spawn_agent -> _spawn_inject_env_vars).
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: use uv --upgrade to ensure Python 3.13-compatible Pillow across all clouds
aider-chat on Python 3.13 fails with `ImportError: cannot import name
'_imaging' from 'PIL'` when an old Pillow version (pre-10.4) is resolved
— those releases have no Python 3.13 binary wheels, so the C extension
is missing at runtime.
Replace `--with 'Pillow>=10.2.0'` (which was silently broken — the `>`
and single quotes get mangled by `printf '%q'` in run_server before the
command reaches the remote machine) with `--upgrade`, which forces all
transitive deps including Pillow to their latest compatible versions.
Also adds a plain-text echo before the install so users see progress
instead of a silent hang during the 2-4 minute install.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test: update aider/gptme/interpreter assertions from pip to uv
The install method for aider, gptme, and open-interpreter was changed
from pip to `uv tool install` across all clouds. The mock test
assertions still checked for the old `pip.*install.*` patterns, causing
9 failures (3 agents × 3 clouds).
Update patterns to match the actual `uv tool install` commands now used
in all cloud scripts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* ci: trigger test run for uv assertion fix
* fix: prevent SSH hangs, restore stderr, fix command escaping across clouds
- Add < /dev/null to ssh_run_server and generic_ssh_wait to prevent SSH
stdin theft causing sequential install/verify/configure steps to hang
- Add ServerAliveInterval, ServerAliveCountMax, ConnectTimeout to default
SSH_OPTS so long-running installs don't silently drop on flaky networks
- Remove 2>/dev/null from Fly.io run_server so remote command errors are
no longer silently swallowed (--quiet flag still suppresses flyctl noise)
- Fix Fly.io printf '%q' double-quoting: remove extra quotes around
$escaped_cmd that prevented the remote shell from consuming escapes,
breaking && || | operators in commands
- Remove broken printf '%q' from Daytona run_server and interactive_session
where it escaped shell operators into literal characters since daytona exec
has no intermediate shell layer
- Pin aider to --python 3.12 instead of --with audioop-lts across all clouds
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add --pty to fly ssh console for interactive sessions
fly ssh console -C does not allocate a pseudo-terminal by default,
causing interactive TUI agents (aider, claude) to fail with
"Input is not a terminal (fd=0)" or completely unresponsive input.
Adding --pty forces PTY allocation, matching how other clouds handle
interactive sessions (SSH uses -t, Sprite uses -tty).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Oracle Cloud is removed as a supported provider. Each agent now has a
`featured_cloud` field in manifest.json that controls cloud sort order
in the CLI picker — featured clouds appear after credential-detected
clouds but before CLI-installed ones, with a "recommended" hint.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add QA upgrade — macOS compat linter, per-agent mock assertions
Layer 1: macOS compat linter (test/macos-compat.sh)
- 12 rules (MC001–MC012) catching bash 3.2 incompatibilities
- Detects: base64 -w0 file args, non-portable echo flags, source <(),
((var++)), read -d, nounset flag, sed -i, date %N, local -n,
declare -A, ${var,,}, and |&
- Added to CI lint.yml in warn-only mode for burn-in
- Integrated as Phase 0.5 in qa-dry-run.sh
Layer 2: Per-agent mock assertions
- test/fixtures/_shared_agent_assertions.sh with install checks
for all 15 agents (claude, openclaw, aider, goose, etc.)
- Integrated into test/mock.sh via _run_agent_assertions()
Also includes branch fixes:
- Fix base64 -w0 to use stdin redirect (aws, daytona, fly)
- Fix fly/openclaw to use npm install instead of broken curl|bash
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add E2E test harness and integrate into QA pipeline
Add test/e2e.sh — a full E2E test harness that provisions real servers,
installs agents, and verifies setup across all clouds. Features:
- Smoke test (one canary agent per cloud) and full matrix modes
- Credential auto-detection for 8 clouds
- Per-cloud preflight validation (sequential) then parallel agent tests
- Stale server cleanup, timing history, cross-cloud comparison
- Auto-fix and optimization phases via Claude agents
- macOS bash 3.2 compatible
Integrate E2E as Phase 5 in both qa-cycle.sh and qa-dry-run.sh:
- Runs after mock tests pass, gated on cloud credentials
- Phase 5b auto-fixes failures using per-agent worktree branches
- Parses results and includes in QA summary
Also fixes:
- shared/common.sh: honour SPAWN_NON_INTERACTIVE=1 in safe_read()
- aws/lib/common.sh: fix SSH key import (use cat instead of base64,
handle race condition on concurrent imports)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
test/run.sh (3 failures fixed):
- Export TEST_DIR so sprite mock tracks create→list state across processes
- Add sleep mock to avoid 30s polling loops in ensure_sprite_exists
- Add timeout/gtimeout, python3 pass-through mocks for host protection
- Set HOME to fake home for isolation, create fake home directory structure
- Clean up /tmp/spawn_* temp files in cleanup trap
test/mock.sh (29 failures fixed):
- Fix fly mock to detect "echo ok" in fly ssh console -C arguments
(including printf %q escaped form) so _fly_wait_for_ssh() succeeds
- Add timeout/gtimeout pass-through mocks to prevent system calls
- Add python3 delegate mock for JSON parsing in shared/common.sh
- Clean up /tmp/spawn_* temp files in cleanup trap
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes#1409
The bash sandbox test now verifies that test runs don't create or
modify agent-specific directories and configuration files:
- Checks that ~/.openclaw, ~/.sprite, and ~/.claude directories are
not created by test runs
- Verifies ~/.claude.json and ~/.claude/settings.json are not modified
during tests (using mtime comparison to handle pre-existing files)
- Skips checks for directories/files that existed before tests ran to
avoid false positives in development environments
This ensures tests remain properly sandboxed and don't pollute the
production environment with agent artifacts.
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes#1403
**Changes:**
1. **test/run.sh** - Isolated mock state files:
- Changed /tmp/sprite_mock_created* to use TEST_DIR instead
- Added cleanup of any leaked /tmp files in cleanup() trap
- Prevents /tmp pollution from mock sprite state files
2. **test/record.sh** - Sandboxed config directory:
- Added TEST_CONFIG_DIR environment variable support
- When set, overrides HOME to prevent writing to ~/.config/spawn/
- Allows tests to run without polluting production config
3. **test/qa-dry-run.sh** - Safe git operations:
- Changed git checkout to git restore for reverting README changes
- Prevents potential checkout pollution of working tree
- Falls back to git checkout -- for older git versions
4. **test/test-sandbox.sh** - New verification test:
- Verifies no /tmp pollution after test/run.sh
- Verifies production config not modified
- Verifies mock.sh uses isolated temp directories
**Why:** Prevents test suite from polluting production environment (file writes to /tmp, ~/.config/spawn/, git state mutations).
Agent: test-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fly.io had zero test coverage — every bug fixed this session (stale
tokens, FlyV1 auth, name-taken failures, SSH hangs, PATH issues) went
undetected. This adds the full mock test infrastructure:
- test/fixtures/fly/ — env vars, API assertions, fixture JSONs for
app creation, machine creation, and token validation endpoints
- test/mock-curl-script.sh — URL stripping for api.machines.dev,
body validation for machine creation, synthetic status responses,
app creation POST handler, state tracking
- test/mock.sh — mock fly/flyctl CLI binary (ssh console, auth token),
URL stripping, required field validation, base64 mock
- test/record.sh — Fly.io REST endpoints now recordable, live
create+delete cycle, error detection, auth var mapping
All 15 agent scripts (aider, claude, openclaw, etc.) are automatically
discovered and tested: 75 passed, 0 failed.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: auto-run gcloud auth login on expired GCP tokens
Instead of telling users to run `gcloud auth login` manually, just
run it automatically when auth check fails or instance creation hits
a reauthentication error, then retry.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: prioritize clouds with CLI installed + hcloud CLI integration
When selecting a cloud provider, clouds are now sorted in 3 tiers:
1. Credentials detected (env vars set) — top priority
2. CLI installed (e.g., gcloud, hcloud, aws) — middle priority
3. Neither — default order
Also adds hcloud CLI-first support for Hetzner operations (server
create/delete/list, SSH key management, auth) with automatic fallback
to the existing REST API when hcloud is not available.
Closes#1370
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: rename aws-lightsail to aws across the project
Simplifies the cloud key from "aws-lightsail" to "aws" — AWS should
have a single entry regardless of the underlying service used.
Renames the directory, updates manifest.json matrix keys, CLI map,
test fixtures, README, and all agent scripts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All cloud claude.sh scripts had inline curl-only installs with no fallback.
When the curl installer failed (transient outage, rate limit), installation
failed with no recovery. Additionally, fnm-installed Node.js was invisible
to subsequent SSH sessions because each SSH command runs in a non-interactive
shell that doesn't source .bashrc/.zshrc.
Changes:
- Migrate 8 cloud scripts to use shared install_claude_code (curl → npm → bun)
- Move _ensure_node_runtime before npm/bun install attempts (not after)
- Add fnm paths to claude_path so node is discoverable across SSH sessions
- Prefix npm/bun install commands with claude_path for PATH visibility
- Update test assertion to match new install_claude_code behavior
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduce from 41 cloud providers to 10 (9 + local) curated for launch:
- local (free), oracle (free tier), hetzner (~€3.29/mo), ovh (~€3.50/mo),
fly (free tier), aws-lightsail ($3.50/mo), daytona (pay-per-second),
digitalocean ($4/mo), gcp ($7.11/mo), sprite (Fly.io VMs)
Changes:
- Remove 30 cloud directories, test fixtures, and provider-specific tests
- Slim manifest.json from 600 to 150 matrix entries, sorted by price
- Update CLAUDE.md with higher bar for adding clouds (prestige + pricing)
- Transform discovery service from code-implementing team to upvote-driven
demand tracker that creates proposal issues and only implements when a
proposal reaches 50+ upvotes
- Create GitHub issue #1183 as cloud wishlist with all dropped clouds
- Add discovery-team/cloud-proposal/agent-proposal labels
- Protect discovery-team issues from refactor team (no comments/changes)
- Fix all CLI tests (8034 pass, 0 fail) and shell tests (80 pass, 0 fail)
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The mock test assertion was checking for GET /sshkeys but the actual
Scaleway API endpoint is /ssh-keys (with a hyphen), causing all 15
scaleway agent tests to fail the "fetches SSH keys" check.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract pattern-matching logic in _strip_api_base() into separate helper functions (_strip_gcore_endpoint, _strip_scaleway_endpoint) to reduce function complexity from 36 lines to organized cases with extracted handlers.
Refactor ensure_api_token_with_provider() in shared/common.sh by extracting:
- _prompt_for_api_token() handles user prompting
- _validate_env_var_name() handles security validation
Reduces main function complexity and improves testability.
Agent: complexity-hunter
Co-authored-by: spawn-refactor-bot <refactor@openrouter.ai>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Break down overly complex functions into smaller, single-purpose helpers:
discovery.sh:
- Extract _sync_and_setup() from run_team_cycle() for git sync + setup
- Extract _launch_claude() to handle process startup
- Extract _session_completed() to check session status
- Extract _cleanup_cycle_files() for file cleanup
- Reduces run_team_cycle() from 71 lines to 39 lines
record.sh:
- Extract _validate_response_not_empty() for empty check
- Extract _validate_response_json() for JSON validation
- Extract _validate_response_no_error() for API error checking
- Extract _record_fixture_metadata() for metadata recording
- Reduces _save_live_fixture() from 34 lines to 15 lines
shared/common.sh:
- Extract _check_agent_in_path() for PATH verification
- Extract _check_agent_runs() for execution verification
- Reduces verify_agent_installed() from 32 lines to 11 lines
Each helper is focused on one concern, improving maintainability and testability.
Co-authored-by: spawn-refactor-bot <refactor@openrouter.ai>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Extract 60+ line nested case statement in _validate_body() into
dedicated _get_required_fields() function using cloud:endpoint pattern
matching. Reduces _validate_body() from 93 to 35 lines while improving
readability and maintainability.
Extract 162-line heredoc from build_team_prompt() into external
discovery-team-prompt.txt template file. Reduces function to 6 lines,
making discovery.sh more maintainable.
All 80 bash tests pass. No functionality change.
Co-authored-by: Spawn Refactor Service <refactor@spawn.service>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
Adds missing test infrastructure functions that were previously only in
mock-curl-script.sh but required by test-infra-sync.test.ts:
- _strip_api_base(): Strips cloud provider API base URLs to extract endpoint paths
- _validate_body(): Validates POST request bodies contain required fields for major clouds
Fixes test failures in test-infra-sync.test.ts where coverage validation checks
rely on these functions being present in test/mock.sh.
Agent: test-engineer
Co-authored-by: Spawn Refactor Service <refactor@spawn.service>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Extracted ssh-keygen mock creation into _create_ssh_keygen_mock() to
simplify setup_mock_agents() from 38 to 13 lines.
Extracted validation and response handling in test/record.sh:
- _validate_endpoint_response(): handles empty/invalid/error responses
- _save_endpoint_fixture(): saves fixture and updates metadata
Reduces _record_endpoint() from 43 to 17 lines.
Extracted ID extraction and delete response handling:
- _extract_resource_id(): extracts ID from create response
- _handle_delete_response(): handles fallback for empty delete responses
Reduces _live_create_delete_cycle() from 44 to 28 lines.
All 79 tests pass.
Agent: complexity-hunter
Co-authored-by: Spawn Refactor Service <refactor@spawn.service>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Extracted the large 270-line embedded mock curl script from the
setup_mock_curl() function into a separate file (mock-curl-script.sh).
This reduces setup_mock_curl() from 270 lines to 6 lines, improving
readability and maintainability.
The refactoring:
- Creates test/mock-curl-script.sh with all mock curl implementation
- Simplifies setup_mock_curl() to copy the external script
- Maintains identical functionality (all tests pass)
- Makes the mock curl logic easier to understand and modify
Agent: complexity-hunter
Co-authored-by: Spawn Refactor Service <refactor@spawn.service>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Added _api_assertions.sh fixtures for binarylane, genesiscloud, hyperstack, kamatera, latitude, ovh, scaleway, and upcloud to enable comprehensive mock test coverage. Updated _validate_body() in test/mock.sh to validate POST request bodies for all cloud providers, ensuring payload correctness. Fixed syntax error in gcore validation (!! to ;;).
Co-authored-by: Spawn Refactor Service <refactor@spawn.service>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
Extracted helper functions to reduce cyclomatic complexity:
test/mock.sh:
- Extract _wait_with_timeout() from run_script_with_timeout() (reduced from 32→17 lines)
- Extract _setup_test_env() and _record_categorized_result() from run_test() (reduced from 50→26 lines)
test/record.sh:
- Refactor has_api_error() to use lambda dict for cloud-specific checks (improved readability, same logic)
- Extract _format_env_var_display() from list_clouds() to eliminate nested loop (reduced from 48→32 lines)
All functions maintain identical behavior and pass syntax validation.
Co-authored-by: Spawn Refactor Service <refactor@spawn.service>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Extract assertion tracking and fixture detection logic in mock.sh:
- New _run_assertions_and_track() helper consolidates 20 lines of repeated assertions
- New _has_missing_fixture() helper checks mock log for fixture errors
- run_test() now 30 lines shorter, focusing on orchestration rather than details
Extract cloud endpoints data in record.sh:
- Replace 132-line case statement with data-driven approach
- Each cloud's endpoints now live in _ENDPOINTS_{cloud} variable
- get_endpoints() function reduced to 3 lines, delegates to variable lookup
Benefits:
- Reduced cognitive load: test logic separated from data
- Easier to add new clouds: just add _ENDPOINTS_* variable
- Better maintainability: centralized endpoint definitions
Tests: All 80 tests pass with fixtures enabled.
Co-authored-by: Spawn Refactor Service <refactor@spawn.service>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The Gcore PR (#1079) introduced `!!` instead of `;;` as case statement
terminators in 4 places, causing a syntax error on line 542 that breaks
all fixture recording.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
CRITICAL: Add validation to prevent command injection via malicious environment variable names in `export "${var_name}=..."` patterns.
Vulnerability Details:
- All instances of `export "${var_name}=${value}"` where var_name is derived from external sources (manifest.json auth fields, user input, API responses) were vulnerable to command injection
- If var_name contained shell metacharacters like `;`, `$()`, or backticks, arbitrary code could be executed
- Example exploit: var_name=`FOO; rm -rf /` would execute the rm command
Affected Files:
- shared/key-request.sh: _try_load_env_var() - var_name from manifest.json
- shared/common.sh: _load_token_from_config(), ensure_api_token_with_provider(), _multi_creds_load_config(), _multi_creds_prompt(), _poll_instance_once() - var_name from function parameters
- test/record.sh: _load_multi_config_from_file(), _try_load_cloud_config(), _prompt_cloud_creds_interactive() - var_name from test fixtures
Fix Applied:
- Added regex validation before all export statements: `^[A-Z_][A-Z0-9_]*$`
- This allowlist enforces standard POSIX environment variable naming (uppercase letters, digits, underscores only, must start with letter or underscore)
- Returns error if validation fails, preventing injection
Impact:
- While current usage passes hardcoded env var names (e.g., "HCLOUD_TOKEN"), the vulnerability existed in the implementation
- manifest.json is currently trusted, but defense-in-depth prevents supply chain attacks or accidental malformed entries
- Test infrastructure was also vulnerable to malicious fixture data
Agent: security-auditor
Co-authored-by: Spawn Refactor Service <refactor@spawn.service>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Add ServerSpace (serverspace.io) as a new cloud provider with global
locations (EU, US, Asia). Uses REST API with X-API-KEY auth and async
task-based server creation with polling.
- serverspace/lib/common.sh: Full provider library with API wrapper,
SSH key management, server provisioning with cloud-init, task polling
- serverspace/claude.sh: Claude Code agent deployment
- serverspace/aider.sh: Aider agent deployment
- serverspace/goose.sh: Goose agent deployment
- manifest.json: Cloud definition + 15 matrix entries (3 implemented)
- test/mock.sh: URL stripping, body validation, synthetic responses
- test/record.sh: Endpoints, auth, API calls, error detection
- test/fixtures/serverspace/: Mock fixtures for all API endpoints
Co-authored-by: OpenRouter Bot <noreply@openrouter.ai>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
- test/mock.sh: Extract _tracked_assert and _categorize_failure from run_test (86->74 lines)
- ionos/lib/common.sh: Extract _ionos_validate_create_params and _ionos_require_ubuntu_image from create_server (51->28 lines)
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
Civo tests failed because networks.json, disk_images.json, and
correctly-named sshkeys.json fixtures were missing. Hetzner tests
failed because datacenters.json was missing (needed for server type
validation). Scaleway tests failed because SCW_DEFAULT_PROJECT_ID
was missing from env, images.json had no Ubuntu images, and
create_server.json fixture was absent.
Also adds Civo and Scaleway to mock's _synthetic_active_response
for instance polling, and fixes Scaleway account API URL stripping.
Results: 435 passed, 0 failed, 1 skipped (previously 270/165/1).
Agent: pr-maintainer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(security): harden weak crypto fallbacks, key validation, and temp paths
- CSRF state generation: fail instead of using predictable date+$RANDOM
fallback when openssl and /dev/urandom are unavailable (OAuth CSRF bypass)
- Kamatera password: fail instead of using predictable date-based password
when no secure random source available
- key-server validKeyVal: enforce 8-512 char limits and ASCII-only check
to block malformed/oversized values (Fixes#969)
- upload_config_file: use mktemp-derived randomness for remote temp paths
instead of predictable $RANDOM (symlink attack on remote server)
Agent: security-auditor
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(test): update assertions for upload_config_file mktemp-derived paths
The upload_config_file function now uses mktemp-derived basenames
(spawn_config_tmp.XXX) instead of the original filename for remote temp
paths. Update test/run.sh assertions to:
- Match "spawn_config" in the -file upload path
- Verify mv commands move files to correct final destinations
(settings.json, .claude.json)
Addresses reviewer feedback on PR #1039.
Agent: pr-maintainer
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5 improvements to the QA cycle:
1. Fix agents now get structured failure context — categorized failures
(exit_code, missing_api_call, missing_env, no_fixture) instead of
raw 500-line test output, plus a passing agent for comparison
2. Fix agent changes are verified before committing — re-runs mock tests
after the agent finishes and only commits if results actually improved,
discarding bad fixes that would create noise PRs
3. Test results now include failure categories — mock.sh records
cloud/agent:fail:reason instead of just cloud/agent:fail, enabling
smarter failure routing
4. Mock curl logs NO_FIXTURE warnings when no fixture matches a GET
request, surfacing false-confidence gaps where tests pass with
synthetic fallback data
5. Phase 3 (code fix) failures now escalate to GitHub issues after 3
consecutive cycles, matching the Phase 1 escalation pattern
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Extract _get_multi_cred_spec, _load_multi_config_from_file, and
_save_multi_config_to_file helpers to eliminate duplicated per-cloud
config blocks in try_load_config, save_config, has_credentials,
prompt_credentials, and list_clouds.
The cloud-to-credential mapping (OVH, UpCloud, Kamatera, AtlanticNet,
CloudSigma) is now defined once in _get_multi_cred_spec and consumed
by all five functions, making it trivial to add new multi-credential
clouds.
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements Webdock cloud provider with full API integration:
- webdock/lib/common.sh with REST API primitives
- claude.sh, cline.sh, aider.sh agent scripts
- Test coverage in test/record.sh and test/mock.sh
- manifest.json updated with cloud entry and matrix
- README.md with usage documentation
Webdock offers affordable European VPS (€2.15/month starting) with
full REST API, SSH access, and developer-friendly features.
Agent: cloud-scout-1
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Both mock.sh and record.sh now run each cloud's tests/recordings
concurrently as background jobs instead of sequentially.
Results are aggregated after all clouds finish.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: strip ANSI colors before grepping test summary
The mock test output uses ANSI escape codes for colored ✓/✗/━━━
characters, so the grep in the Post summary step couldn't match
them. Strip colors with sed first.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use NO_COLOR standard instead of sed to strip ANSI codes
mock.sh now respects the NO_COLOR env var (https://no-color.org/).
CI sets NO_COLOR=1 so grep matches ✓/✗/━━━ cleanly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add CloudSigma cloud provider
Add CloudSigma as a new cloud provider with API-first architecture:
- Create cloudsigma/lib/common.sh with HTTP Basic Auth support
- Implement cloudsigma/claude.sh and cloudsigma/aider.sh agent scripts
- Add CloudSigma to manifest.json (38th cloud provider)
- Add matrix entries for all 15 agents (2 implemented, 13 missing)
- Update test/record.sh with CloudSigma endpoints and auth handling
- Update test/mock.sh with URL-stripping for CloudSigma API
- Add cloudsigma/README.md with usage documentation
CloudSigma features:
- API v2.0 with HTTP Basic Auth (email:password)
- Regions: ZRH (Zurich), WDC (Washington DC), LVS (Las Vegas)
- Granular resource control (CPU/RAM/Disk independently configurable)
- Ubuntu 24.04 cloned from public library drives
- SSH access via cloudsigma user
- Pay-as-you-go pricing starting at ~$14/month
Agent: cloud-scout
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: address security review comments for CloudSigma provider
- [CRITICAL] Fix command injection in credential saving: use sys.argv
instead of raw shell interpolation in Python strings
- [CRITICAL] Fix shell injection in create_cloudsigma_drive: pass name
and size via sys.argv instead of inline interpolation
- [CRITICAL] Fix shell injection in SSH key fingerprint lookups: pass
fingerprint via sys.argv
- [HIGH] Replace hardcoded VNC password with random generation via
openssl rand -hex 8
- [MEDIUM] Fix config file path injection: pass via sys.argv
Agent: pr-maintainer
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: B (Discovery Team) <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Add HOSTKEY (https://hostkey.com/) as a new cloud provider to the spawn
matrix. HOSTKEY offers affordable VPS hosting starting from €1/month with
hourly billing, making it suitable for running AI agents that use remote
API inference.
Changes:
- Created hostkey/lib/common.sh with HOSTKEY API wrappers
- Implemented hostkey/claude.sh (Claude Code agent)
- Implemented hostkey/openclaw.sh (OpenClaw agent)
- Added HOSTKEY to manifest.json clouds section
- Added matrix entries for all 15 agents (2 implemented, 13 missing)
- Updated test/record.sh with HOSTKEY test infrastructure
- Updated test/mock.sh with HOSTKEY URL handling
- Created hostkey/README.md with usage instructions
Data centers: Amsterdam, Frankfurt, Helsinki, Reykjavik, Istanbul, New York
Agent: cloud-scout
Co-authored-by: B (Discovery Team) <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Add Atlantic.Net Cloud as a new cloud provider with REST API support.
Starting at $4-8/mo for budget VPS instances with SSH access.
Implementation:
- Created atlanticnet/lib/common.sh with HMAC-SHA256 API auth
- Implemented 3 agent scripts: claude.sh, aider.sh, openclaw.sh
- Updated manifest.json with cloud entry and 15 matrix entries
- Added test coverage in test/record.sh and test/mock.sh
- Created atlanticnet/README.md with usage docs
API authentication uses timestamp + random GUID signed with private key.
Defaults: G2.2GB plan, ubuntu-24.04_64bit image, USEAST2 location.
Agent: cloud-scout-1
Co-authored-by: B (Discovery Team) <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Reduces setup_env_for_cloud (84 lines -> 8 lines) and assert_cloud_api_calls
(32 lines -> 9 lines) in test/mock.sh by moving cloud-specific data into
per-cloud _env.sh and _api_assertions.sh files in test/fixtures/.
Adding a new cloud's test config now only requires creating two small files
in the fixtures directory instead of editing case branches in mock.sh.
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add CodeSandbox as a new sandbox cloud provider for running AI agents.
CodeSandbox features:
- Firecracker microVMs with ~2 second start times
- SDK/CLI-based exec (no SSH)
- Free tier: 40 hours/month on Build plan
- Secure isolated environments
Implementation:
- Created codesandbox/lib/common.sh with SDK wrapper functions
- Implemented 3 agent scripts: claude, aider, openclaw
- Added CodeSandbox to manifest.json clouds
- Created matrix entries (3 implemented, 12 missing)
- Updated test/record.sh to list as non-recordable CLI cloud
- Added codesandbox/README.md with usage instructions
The implementation follows the existing pattern from e2b and modal,
using Node.js SDK (@codesandbox/sdk) for sandbox lifecycle management.
Agent: cloud-scout
Co-authored-by: B (Discovery Team) <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The 5 per-cloud live recording functions (_live_hetzner, _live_digitalocean,
_live_vultr, _live_linode, _live_civo) each duplicated 50-65 lines of
identical create->save->extract-id->delete->save logic. Extract a generic
_live_create_delete_cycle helper that handles the shared flow, with per-cloud
body builder functions providing only the cloud-specific parts.
Reduces test/record.sh by 112 lines (1016 -> 904) while preserving all
behavior including cloud-specific delete delays and empty-response fallbacks.
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use sys.argv to pass shell values to inline Python instead of direct
string interpolation, preventing single-quote injection attacks across
cloud lib common.sh files and test/record.sh.
Also fix eval injection in test/record.sh try_load_config() by replacing
eval of Python-generated export statements with safe tab-separated
parsing and direct variable assignment.
Fixes#759Fixes#760
Agent: security-auditor
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
((var++)) returns exit code 1 when the variable is 0 (falsy), which
causes set -e to terminate the script. Replace all instances with
the safe var=$((var + 1)) pattern in sprite/lib/common.sh and
test/run.sh.
Fixes#762
Agent: community-coordinator
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace hardcoded 4-cloud script list in run_shellcheck with dynamic
discovery that covers all 21 clouds automatically
- Convert 3 inline JSON templates (setup_claude_code_config,
setup_openclaw_config, setup_continue_config) from single-line printf
to readable heredocs while preserving json_escape security
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
Break down the 150-line run_test() function into focused helpers:
- run_script_with_timeout(): script execution with env vars and timeout
- show_failure_output(): display last 20 lines on failure
- assert_error_scenario(): handle error scenario assertions
- assert_cloud_api_calls(): cloud-specific API call assertions
- record_test_result(): write pass/fail to RESULTS_FILE
run_test() is now 57 lines (62% reduction), each helper is under 35 lines.
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The mock curl heredoc script was a monolithic 287-line function with
inline arg parsing, error injection, URL routing, body validation,
fixture lookup, and state tracking all in one flow.
Extract 10 focused helper functions within the heredoc:
- _parse_args: curl argument parsing
- _maybe_inject_error: MOCK_ERROR_SCENARIO handling
- _handle_special_urls: install scripts, OpenRouter, spawn repo
- _strip_api_base: URL-to-endpoint mapping for 14 cloud APIs
- _check_fields / _validate_body: POST body validation
- _try_fixture: fixture file lookup
- _synthetic_active_response: cloud-specific GET-by-ID responses
- _respond_get / _respond_post: METHOD-based response routing
- _track_state: creation/deletion state tracking
The main logic is now a 26-line sequence of named function calls,
making the mock's control flow immediately readable.
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>