1018 commits

Author SHA1 Message Date
A
122b59e4da
test: add 99 tests for post-session summary and SPAWN_DASHBOARD_URL convention (#1040)
Cover the _show_post_session_summary function and updated
ssh_interactive_session integration from PR #1037. Tests verify:

- Summary warns user their server is still running with IP
- Dashboard URL shown when SPAWN_DASHBOARD_URL is set
- Generic message when no dashboard URL is available
- Reconnect command uses correct SSH_USER and IP
- SSH exit code preserved through the summary display
- All 25 SSH-based cloud providers set SPAWN_DASHBOARD_URL
- SPAWN_DASHBOARD_URL uses HTTPS and is defined before usage
- Detects custom interactive_session implementations missing summary
  (alibabacloud flagged as known gap)

Agent: test-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 20:46:40 -05:00
A
01c91798d6
refactor(alibabacloud): simplify create_server by extracting helpers (#1035)
- Extract `_aliyun_json_list_first` helper for flat JSON lists (unlike
  `_aliyun_json_field` which handles lists of dicts)
- Extract `_aliyun_extract_instance_id` to replace inline Python parser
- Extract `_ensure_network_infrastructure` to consolidate VPC/vSwitch/SG setup
- Use `_log_diagnostic` for structured error reporting (consistent with
  patterns in shared/common.sh)

Reduces create_server from 86 to 69 lines and eliminates inline Python.

Agent: complexity-hunter

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 20:12:40 -05:00
A
beceb69962
test: add 151 tests for key-server security-critical logic (#1036)
Add comprehensive test coverage for the key-server
(.claude/skills/setup-agent-team/key-server.ts), which previously had
zero tests despite containing security-critical logic:

- validKeyVal: API key validation (control chars, shell metacharacters,
  length limits) - 37 tests
- SAFE_PROVIDER_RE: path traversal prevention in provider names - 21 tests
- UUID_RE: batch ID format validation - 12 tests
- signHmac/verifyHmac: HMAC signing and verification for signed URLs - 17 tests
- isAuthed: timing-safe Bearer token auth - 9 tests
- rateCheck: rate limiting logic - 8 tests
- esc: HTML escaping for XSS prevention - 13 tests
- cleanup: data store batch expiry logic - 9 tests
- Key submission validation flow - 6 tests
- Route matching, security headers, backward compat - 19 tests

Agent: test-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 20:11:35 -05:00
Ahmed Abushagur
c6d0cb218e
improve: make QA bot more effective with structured failures and verification (#1034)
5 improvements to the QA cycle:

1. Fix agents now get structured failure context — categorized failures
   (exit_code, missing_api_call, missing_env, no_fixture) instead of
   raw 500-line test output, plus a passing agent for comparison

2. Fix agent changes are verified before committing — re-runs mock tests
   after the agent finishes and only commits if results actually improved,
   discarding bad fixes that would create noise PRs

3. Test results now include failure categories — mock.sh records
   cloud/agent:fail:reason instead of just cloud/agent:fail, enabling
   smarter failure routing

4. Mock curl logs NO_FIXTURE warnings when no fixture matches a GET
   request, surfacing false-confidence gaps where tests pass with
   synthetic fallback data

5. Phase 3 (code fix) failures now escalate to GitHub issues after 3
   consecutive cycles, matching the Phase 1 escalation pattern

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 20:07:54 -05:00
A
f121b60d80
fix(ux): show post-session summary with server status and reconnect info (#1037)
After an interactive SSH session ends, users are now shown:
- A warning that their server is still running (and may incur charges)
- A link to the cloud provider's dashboard to manage/delete it
- The SSH command to reconnect

This prevents users from unknowingly leaving servers running after
exiting their agent session. Covers all 25 SSH-based cloud providers.

Agent: ux-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 20:06:40 -05:00
A
f586e19790
fix(security): replace unquoted heredocs with printf to prevent shell expansion in API keys (#1031)
Unquoted `<< EOF` heredocs in nanoclaw .env file creation cause shell
expansion of the API key value. If an API key contains `$`, backticks,
or `\`, the value is silently corrupted or could trigger command
execution. Replace with `printf '%s'` which safely writes the value
without interpretation.

Also fix unquoted variable expansion in upload_config_file's mv command
and the github-codespaces/openclaw.sh config heredoc.

Fixes 34 scripts across all cloud providers.
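
A minimal sketch of how this kind of corruption can arise. It assumes (the commit does not confirm this) that the heredoc text is assembled in one shell and executed by a second one, as happens when a command string is sent to a server. The key value and the `HOME=/expanded` setting are made up for the demonstration.

```sh
# Hypothetical API key containing a '$' sequence.
key='abc$HOME'

# Unsafe: the key is pasted into command text that contains an unquoted
# heredoc. The shell that runs the text expands $HOME inside the body.
cmd="cat <<EOF
KEY=${key}
EOF
"
unsafe=$(HOME=/expanded sh -c "$cmd")

# Safer: pass the key as an argument and let printf '%s' write it verbatim.
safe=$(HOME=/expanded sh -c 'printf "KEY=%s\n" "$1"' sh "$key")

printf 'unsafe: %s\n' "$unsafe"
printf 'safe:   %s\n' "$safe"
```

With these inputs the first line should read `KEY=abc/expanded` and the second `KEY=abc$HOME`: only the `printf` form preserves the key.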

Agent: security-auditor

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 19:41:10 -05:00
A
e452ea8944
fix(security): validate branch names and cloud names in qa-cycle.sh (#1033)
Add validate_branch_name() and validate_cloud_name() to qa-cycle.sh to
prevent command injection via unvalidated strings passed to git/gh
commands. Cloud names parsed from test/record.sh output via sed were
used directly in branch names, git push, git worktree, and gh pr create
commands without validation.

Fixes #1028

Agent: security-auditor

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 19:40:14 -05:00
A
8c052aceb8
test: add 239 tests for CloudSigma provider patterns and conventions (#1030)
Validates CloudSigma's unique architecture: region-based API URLs,
HTTP Basic Auth (email + password), drive cloning workflow, python3
JSON construction, SSRF-preventing region validation, and SSH with
'cloudsigma' user. Covers lib/common.sh API surface, all 8 agent
scripts, manifest consistency, and test infrastructure (mock.sh +
record.sh).

Agent: test-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 19:39:58 -05:00
A
059690f8d7
fix(ux): include cloud provider dashboard URLs in script failure and interrupt messages (#1029)
When spawn scripts fail or are interrupted, error messages now include
the cloud provider's actual dashboard URL instead of generic "check your
cloud provider dashboard" text. This helps users quickly navigate to
their provider to check server status, clean up orphaned resources, or
debug provisioning failures.

Agent: ux-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 16:01:57 -08:00
A
7d6bc0292b
fix(ux): add preflight credential check to interactive mode (#1027)
The interactive flow (bare `spawn`) was missing the preflight credential
warning that the direct `spawn <agent> <cloud>` path already had. Users
who picked an agent and cloud interactively would not be warned about
missing credentials, leading to confusing failures from the cloud
provider script. Now both paths warn about missing credentials before
launching.

Agent: ux-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 18:52:03 -05:00
A
334e10ead2
refactor: decompose ensure_aliyun_credentials and extract _aliyun_instance_public_ip (#1026)
Extract _aliyun_load_or_prompt_credentials and _aliyun_configure_cli from
the 68-line ensure_aliyun_credentials function, reducing it to 16 lines.
Extract _aliyun_instance_public_ip to replace inline Python in
_wait_for_aliyun_instance, making IP extraction reusable and consistent
with the existing _aliyun_json_field helper pattern.

Agent: complexity-hunter

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 18:31:39 -05:00
A
b6a07e3c60
fix: prevent sensitive file exfiltration via --prompt-file flag (#1024)
Add path validation to --prompt-file to block reading sensitive files
(SSH keys, cloud credentials, .env files, etc.) whose contents would be
sent to remote agents. Also adds file size validation (1MB limit) and
stat-based file type checking.

Fixes #991

Agent: security-auditor

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 18:30:05 -05:00
A
b76f04cd78
fix(ux): show cloud count and credential readiness in interactive agent picker (#1025)
When users run `spawn` interactively, the agent picker now shows how many
clouds each agent supports and how many have credentials ready. This helps
users quickly identify which agents they can deploy immediately.

Before: "Claude Code  AI coding assistant"
After:  "Claude Code  2 clouds, 1 ready"

Agent: ux-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-13 18:29:25 -05:00
A
46b760cf2b
test: add 364 tests for Oracle Cloud Infrastructure provider patterns (#1023)
Covers OCI CLI dependency management, VCN networking decomposition
(VCN -> IGW -> route -> security rules -> subnet), instance creation
with flex shape handling, cloud-init userdata, SSH delegation,
server destruction, availability domain handling, and all 15 agent
scripts following correct provisioning flow.

Agent: test-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 18:07:41 -05:00
A
aafe3d1ce4
fix: eliminate duplicate Loading manifest spinner in agent/cloud info (#1021)
When running `spawn claude` or `spawn hetzner`, the "Loading manifest..."
spinner appeared twice: once in showInfoOrError() and again in
cmdAgentInfo/cmdCloudInfo via validateAndGetEntity(). Pass the
pre-loaded manifest to avoid the redundant load and spinner flash.

Agent: ux-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 18:07:08 -05:00
A
415df93ea0
refactor: decompose latitude and contabo create_server into focused helpers (#1022)
Extract validation, error handling, and response parsing from
create_server into dedicated helpers following the pattern from PR #1016.

Latitude helpers: _latitude_validate_inputs, _latitude_check_create_error,
_latitude_extract_server_id

Contabo helpers: _contabo_validate_inputs, _contabo_check_create_error,
_contabo_extract_instance_id

Agent: complexity-hunter

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 18:05:18 -05:00
A
5412b9891f
fix: validate ALIYUN_IMAGE_ID and fix HOSTKEY input validation ordering (#1019)
- Add validate_resource_name check for ALIYUN_IMAGE_ID env var in
  alibabacloud create_server, consistent with other providers (Contabo,
  Webdock) that validate user-controllable image identifiers
- Move HOSTKEY location validation before _pick_instance_preset call,
  which uses the location in an API request — validates input before
  use rather than after

Agent: security-auditor

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 18:03:34 -05:00
A
41afc76537
test: add 157 tests for Alibaba Cloud provider patterns and conventions (#1020)
Alibaba Cloud was added in commit 0d9307a with zero test coverage.
This adds comprehensive tests covering:
- lib/common.sh API surface (required + provider-specific functions)
- CLI installation and credential handling
- SSH key management (DescribeKeyPairs, ImportKeyPair)
- Server lifecycle (VPC, vSwitch, SecurityGroup, RunInstances)
- Network infrastructure setup (CIDR ranges, availability zones)
- Instance polling behavior
- Security conventions (input validation, safe JSON parsing, macOS compat)
- Agent script patterns (claude.sh, codex.sh, gemini.sh)
- OpenRouter env var injection via SSH
- Manifest consistency checks

Agent: test-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 18:00:55 -05:00
A
f1e7939188
refactor: decompose alibabacloud create_server into focused helpers (#1018)
Extract _ensure_vpc, _ensure_vswitch, _aliyun_json_field, and
_aliyun_json_top_field from the 182-line create_server function.
This reduces create_server to 85 lines and eliminates repeated
inline Python JSON parsing across multiple functions.

Agent: complexity-hunter

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 17:54:47 -05:00
A
8a5d03995b
fix: validate provider name in invalidate_cloud_key and improve key validation (#1017)
- Add regex validation (^[a-z0-9][a-z0-9._-]{0,63}$) to invalidate_cloud_key()
  in shared/key-request.sh to prevent path traversal attacks that could delete
  arbitrary files via crafted provider names (e.g., ../../etc/important)

- Improve validKeyVal() in key-server.ts to block control characters
  (U+0000-U+001F, U+007F-U+009F) and enforce a 4096-byte max length on
  API key values, preventing injection of null bytes, newlines, and
  excessively long values
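
The provider-name rule above could be checked in portable shell roughly as follows. This is a sketch, not the code in shared/key-request.sh: the function name `is_safe_provider_name` is hypothetical, and `case` patterns are used instead of a regex engine so that embedded newlines are rejected as well.

```sh
# Keep a-z and 0-9 ranges byte-based regardless of the caller's locale.
LC_ALL=C; export LC_ALL

# Returns 0 when $1 matches ^[a-z0-9][a-z0-9._-]{0,63}$, 1 otherwise.
is_safe_provider_name() {
  name=$1
  # The regex allows 1 to 64 characters in total.
  [ "${#name}" -ge 1 ] && [ "${#name}" -le 64 ] || return 1
  case $name in
    [!a-z0-9]*) return 1 ;;       # first character must be a-z or 0-9
    *[!a-z0-9._-]*) return 1 ;;   # any '/', space, newline, etc. is rejected
  esac
  return 0
}

is_safe_provider_name hetzner && echo "hetzner: accepted"
is_safe_provider_name ../../etc/important || echo "traversal: rejected"
```

Because `/` is never in the allowlist, a name that passes cannot introduce a new path component when it is appended to a directory.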

Agent: security-auditor

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 14:43:44 -08:00
A
388770126f
refactor: decompose webdock create_server and koyeb ensure_koyeb_cli into focused helpers (#1016)
webdock/lib/common.sh:
- Extract _webdock_get_public_key_ids() for SSH key ID fetching
- Extract _webdock_validate_inputs() for input validation
- Extract _webdock_handle_create_response() for response parsing and error reporting
- create_server reduced from 53 to 24 lines

koyeb/lib/common.sh:
- Extract _koyeb_detect_os() for OS detection
- Extract _koyeb_detect_arch() for architecture detection
- Extract _koyeb_install_cli() for download and PATH setup
- ensure_koyeb_cli reduced from 51 to 13 lines

Agent: complexity-hunter

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 14:22:24 -08:00
A
beec9ab8a3
fix: show signal names instead of 'code null' when scripts are killed (#1014)
When a spawn script is killed by a signal (SIGKILL, SIGTERM, SIGHUP, etc.),
Node.js returns exit code null. Previously this produced the confusing message
"Script exited with code null". Now detects the actual signal and shows
signal-specific guidance: OOM suggestions for SIGKILL, terminal reconnection
tips for SIGHUP, spot instance warnings for SIGTERM.

Fixes #1011

Agent: ux-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 14:12:43 -08:00
A
a260dce642
test: add 133 tests for Webdock provider patterns and conventions (#1015)
Webdock was added in PR #1001 with zero dedicated test coverage.
This adds comprehensive tests validating:
- lib/common.sh API surface (required + provider-specific functions)
- API base URL and constants
- Credential handling (ensure_api_token_with_provider pattern)
- SSH key management (json_escape for injection prevention)
- Server lifecycle (generic_cloud_api, generic_wait_for_instance)
- SSH delegation pattern (ssh_run_server, ssh_upload_file, etc.)
- Security conventions (no echo -e, no set -u, validate_resource_name)
- Agent script patterns (claude, aider, cline)
- Manifest consistency (type, auth, exec_method, defaults)
- Test infrastructure coverage (mock.sh and record.sh entries)

Agent: test-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 14:11:47 -08:00
A
0d9307a907
feat: Add Alibaba Cloud provider support (#1002)
Adds Alibaba Cloud (Aliyun) ECS provider with 3 initial agent implementations.

Provider details:
- API: Alibaba Cloud CLI (aliyun ecs commands)
- Pricing: Starting at ~$3.50/month for entry-level instances
- Regions: Global coverage with strong Asia-Pacific presence
- Instance types: Burstable T5 instances for cost-effective compute

Implements: claude, codex, gemini

Key features:
- Automatic CLI installation
- VPC and vSwitch auto-creation
- Security group configuration with SSH access
- Cloud-init support for automated agent setup
- Credential persistence in ~/.config/spawn/alibabacloud.json

Test coverage: Skipped (CLI-based provider, test infrastructure targets REST APIs)

Agent: cloud-scout-2

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-13 13:57:22 -08:00
A
69d08e6b1d
fix: improve CLI UX with clearer credential status and help docs (#1012)
- Change 'auth: TOKEN' to 'needs TOKEN' with yellow highlight in spawn clouds
- Always show legend footer explaining ready/needs indicators
- Add --clear hint to spawn list footer
- Show --version/-v and --help/-h aliases in help text
- Document SPAWN_UNICODE=1 env var in help
- Include HTTP status code in update fetch errors
- Bump version to next patch

Fixes #1010

Agent: issue-fixer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-13 13:53:53 -08:00
A
bd28508fd8
test: add 40 tests for decomposed shared/common.sh helpers (#1009)
Tests cover the recently decomposed helper functions from PR #976
(cmdAgentInfo, generic_wait_for_instance) to ensure the refactored
helpers maintain correct behavior.

Agent: test-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
2026-02-13 13:42:53 -08:00
A
ea5d462f4f
refactor: decompose multi-credential config handling in test/record.sh (#1004)
Extract _get_multi_cred_spec, _load_multi_config_from_file, and
_save_multi_config_to_file helpers to eliminate duplicated per-cloud
config blocks in try_load_config, save_config, has_credentials,
prompt_credentials, and list_clouds.

The cloud-to-credential mapping (OVH, UpCloud, Kamatera, AtlanticNet,
CloudSigma) is now defined once in _get_multi_cred_spec and consumed
by all five functions, making it trivial to add new multi-credential
clouds.

Agent: complexity-hunter

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 13:34:37 -08:00
A
b0ebaa94bb
refactor: decompose load_cloud_keys_from_config into focused helpers (#1007)
Extract three helpers from the 82-line, 14-conditional function:
- _parse_cloud_auths: extract cloud auth specs from manifest.json
- _try_load_env_var: load a single env var from env or config file
- _load_cloud_credentials: load all env vars for one cloud provider

The main function is now a 36-line orchestrator with clear flow:
validate prerequisites -> parse manifest -> iterate clouds -> summarize.

Agent: complexity-hunter

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
2026-02-13 13:29:53 -08:00
A
d3919cafda
test: add 55 tests for agent info quick-start display (#1005)
Cover printAgentQuickStart (commands.ts), which previously had zero test coverage:
- Single-auth and multi-auth cloud credential display
- URL hint placement (only on first auth var)
- All/partial/no credentials detection ("ready to go" vs export lines)
- No-auth cloud (auth="none") handling
- Agent info header, install line, available clouds listing
- Credential prioritization in cloud ordering
- Grouped cloud type display and credential indicators
- Pure logic replica tests for quick-start computation

Agent: test-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 13:25:26 -08:00
A
2a66805b33
feat: Add Webdock provider support (#1001)
Implements Webdock cloud provider with full API integration:
- webdock/lib/common.sh with REST API primitives
- claude.sh, cline.sh, aider.sh agent scripts
- Test coverage in test/record.sh and test/mock.sh
- manifest.json updated with cloud entry and matrix
- README.md with usage documentation

Webdock offers affordable European VPS hosting (starting at €2.15/month) with
a full REST API, SSH access, and developer-friendly features.

Agent: cloud-scout-1

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-13 13:24:06 -08:00
A
ac56e5454a
fix: improve CLI UX with better error messages and help text (#1003)
- Show list-specific flags (-a, -c, --clear) in unknown flag error
- Add specific error for empty prompt files instead of generic validation
- Document SPAWN_UNICODE=1 env var in help text and troubleshooting
- Show filter/clear hints in interactive list picker

Agent: ux-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
2026-02-13 13:04:48 -08:00
A
1a2cec6b81
feat: Add CloudSigma support for 6 agents (#998)
Implements CloudSigma matrix entries for openclaw, nanoclaw, interpreter, continue, gemini, and codex. All scripts follow the standard CloudSigma pattern with OpenRouter API key injection.

Agent: gap-filler

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-13 12:57:49 -08:00
A
d2fbd325b0
refactor: decompose fly get_server_name and oracle _setup_vcn_networking (#1000)
- fly/lib/common.sh: Replace 23-line get_server_name() that duplicated
  env-var-check, prompt, and validation logic with a one-line call to the
  shared get_validated_server_name helper, matching all other cloud providers.

- oracle/lib/common.sh: Break _setup_vcn_networking (48 lines, 3 distinct
  responsibilities) into focused helpers:
  - _create_internet_gateway: creates the IGW resource
  - _add_default_route: configures the route table
  - _add_ssh_security_rules: opens SSH port in the security list
  The orchestrator _setup_vcn_networking now delegates to these three helpers.

Agent: complexity-hunter

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
2026-02-13 12:57:11 -08:00
A
edc5993b61
feat: Add HOSTKEY support for nanoclaw, aider, goose, cline, continue (#999)
Implements 5 missing HOSTKEY matrix entries:
- hostkey/nanoclaw
- hostkey/aider
- hostkey/goose
- hostkey/cline
- hostkey/continue

All scripts follow the standard pattern:
1. Authenticate with HOSTKEY
2. Create server instance
3. Install agent
4. Configure OpenRouter API key injection
5. Launch interactive session

Agent: gap-filler

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-13 12:40:29 -08:00
A
892d53397f
fix: add --delete-branch to all gh pr close commands (#997)
Ensures closing a PR also deletes its remote branch, consistent with
how gh pr merge already uses --delete-branch. Removes redundant manual
git push origin --delete calls that were previously needed.

Fixes #942

Agent: pr-maintainer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 12:32:43 -08:00
A
b39a691b16
fix: validate SLACK_WEBHOOK format to prevent command injection (#996)
SLACK_WEBHOOK was embedded directly in heredocs at three locations,
allowing potential command injection if the env var contained shell
metacharacters. Added early validation requiring the URL to match
the expected Slack webhook format (https://hooks.slack.com/...).
Also stopped leaking the full webhook URL into prompt text.

Fixes #992

Agent: security-auditor

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
2026-02-13 12:28:09 -08:00
A
583d2a63fc
test: add 38 tests validating test infrastructure stays in sync with manifest (#995)
Validates that test/mock.sh and test/record.sh stay in sync with
manifest.json. When a new cloud provider is added, CLAUDE.md mandates
updating both files with endpoint mappings, auth env vars, and API
dispatchers. These tests catch configuration drift automatically:

- ALL_RECORDABLE_CLOUDS completeness and no duplicates
- get_endpoints(), get_auth_env_var(), call_api() coverage parity
- _strip_api_base() URL patterns match fixture directories
- Fixture directories have required _env.sh and _metadata.json
- Auth env vars in record.sh match manifest auth fields
- Shell script conventions (shebang, set -eo pipefail, no echo -e)
- Test infrastructure conventions (NO_COLOR, cleanup traps, counters)

Agent: test-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 12:21:53 -08:00
A
a0f6b335a4
fix: harden upload_file path validation with strict allowlist regex across 10 clouds (#993)
Replace fragile blocklist validation and printf '%q' escaping in upload_file()
with strict allowlist regex [a-zA-Z0-9/_.~-]+ across all non-SSH cloud providers.
For codesandbox, additionally migrate from shell command interpolation to SDK
filesystem API via environment variables, eliminating the injection surface entirely.

Affected clouds: codesandbox, daytona, e2b, fly, koyeb, modal, northflank,
railway, render, sprite

Fixes #989

Agent: security-auditor

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 12:20:40 -08:00
A
67424c4bdc
refactor: decompose ensure_jq and ensure_gh_cli into focused helpers (#994)
Extract platform-specific install logic from monolithic installer functions
into small, focused helpers. Both functions had nested OS/package-manager
cascades (depth 3-4) that made the control flow hard to follow.

ensure_jq (shared/common.sh):
- Extract _install_jq_brew, _install_jq_apt, _install_jq_dnf, _install_jq_apk
- Extract _report_jq_not_found for the fallthrough error message
- Main function becomes a clean dispatcher + verification

ensure_gh_cli + _install_gh_binary (shared/github-auth.sh):
- Extract _install_gh_brew, _install_gh_apt, _install_gh_dnf
- Extract _detect_gh_platform, _fetch_gh_latest_version, _download_and_install_gh
- _install_gh_binary drops from 71 to 12 lines as a clean orchestrator
- ensure_gh_cli drops from 57 to 29 lines

No behavior changes. All tests pass, bash -n passes.

Agent: complexity-hunter

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 12:14:56 -08:00
Ahmed Abushagur
e720e15c9b
fix: give QA fix agents full mock test output instead of 10-line snippets (#988)
Previously, Phase 3 fix agents only got the last 10 lines grepped from
the log file per failing script. This was often insufficient to diagnose
the root cause. Now runs `bash test/mock.sh {cloud}` per failing cloud
and feeds the complete output to the fix agent.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 11:59:59 -08:00
A
fff84779dc
refactor: decompose cmdAgentInfo and generic_wait_for_instance into focused helpers (#976)
Extract printAgentQuickStart from cmdAgentInfo (63 -> 43 lines), paralleling
the existing printCloudQuickStart pattern. Extract _poll_instance_once and
_report_instance_timeout from generic_wait_for_instance (52 -> 20 lines),
eliminating duplicated elapsed/sleep/increment code in the polling loop.

Agent: complexity-hunter

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
2026-02-13 11:48:07 -08:00
Ahmed Abushagur
1d9a2dbad1
perf: run cloud tests and recordings in parallel (#982)
Both mock.sh and record.sh now run each cloud's tests/recordings
concurrently as background jobs instead of sequentially.
Results are aggregated after all clouds finish.
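
The background-job pattern described here might look like the sketch below. `run_cloud` and the cloud names are placeholders rather than the real mock.sh code; the hypothetical runner fails for `beta` so the summary has something to count.

```sh
# Hypothetical per-cloud runner: succeeds for every cloud except "beta".
run_cloud() {
  [ "$1" != beta ]
}

log_dir=$(mktemp -d)
pids=""
for cloud in alpha beta gamma; do
  # Each cloud runs as its own background job with its own log file.
  run_cloud "$cloud" > "$log_dir/$cloud.log" 2>&1 &
  pids="$pids $!"
done

# Aggregate only after every job has finished.
failed=0
for pid in $pids; do
  wait "$pid" || failed=$((failed + 1))
done
rm -rf "$log_dir"
echo "failed clouds: $failed"
```

Waiting on each PID individually, rather than a bare `wait`, is what lets the script recover each job's exit status.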

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 11:44:57 -08:00
Ahmed Abushagur
4e3f77f9bb
feat: track consecutive fixture recording failures and auto-escalate (#986)
When a cloud's fixture recording fails 3+ consecutive QA cycles, the
system now auto-creates a GitHub issue flagging the persistent failure.
This catches stale API keys, changed endpoints, and other silent
regressions that would otherwise go unnoticed.

- Persistent tracker at .docs/qa-record-failures.json (git-ignored)
- Counter increments on failure, resets on success
- Deduplicates: skips issue creation if one already exists for that cloud
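
A rough sketch of the counter logic, assuming one plain-text counter file per cloud instead of the JSON tracker the commit describes. `record_result` and `state_dir` are hypothetical names, and issue creation is reduced to printing `escalate`.

```sh
# Hypothetical state directory; the commit keeps its state in a JSON file.
state_dir=$(mktemp -d)

# record_result CLOUD STATUS   (STATUS is "ok" or "fail")
# Prints "escalate" once CLOUD has failed 3 or more cycles in a row.
# CLOUD is used in a file name, so callers should validate it first.
record_result() {
  file="$state_dir/$1.count"
  if [ "$2" = ok ]; then
    rm -f "$file"                 # a success resets the streak
    return 0
  fi
  count=0
  [ -f "$file" ] && count=$(cat "$file")
  count=$((count + 1))
  printf '%s\n' "$count" > "$file"
  [ "$count" -ge 3 ] && echo escalate
  return 0
}

record_result demo fail
record_result demo fail
record_result demo fail   # third failure in a row prints "escalate"
record_result demo ok     # clears the streak again
```

Deduplication against existing issues is left out; it would sit where `escalate` is printed.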

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 11:42:17 -08:00
Ahmed Abushagur
353f20d53a
docs: expand test infrastructure instructions for discovery bot (#987)
The bot was under-updating test/mock.sh when adding new clouds because
the prompt only mentioned URL stripping. Now lists all 4 required
mock.sh functions and all 5 required record.sh functions explicitly.

Also adds a "Mock Test Infrastructure" reference table to CLAUDE.md so
both human contributors and bots know exactly what to update.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 11:41:25 -08:00
A
1d4e5b874c
fix: add defense-in-depth for SPAWN_HOME path validation and manifest JSON sanitization (#984)
- Validate SPAWN_HOME is an absolute path, reject relative paths to prevent
  unintended file writes (addresses #980)
- Resolve SPAWN_HOME to canonical form to collapse .. segments
- Strip __proto__, constructor, and prototype keys from parsed manifest JSON
  to prevent prototype pollution (addresses #979)
- Apply sanitization to all manifest ingestion paths (GitHub fetch, disk cache,
  local dev manifest)
- Add 12 tests covering path validation and JSON sanitization

Agent: security-auditor

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 11:37:10 -08:00
A
16b9132c7c
fix: add confirmation to history clear and improve UX details (#983)
- Add interactive confirmation prompt before clearing spawn history
  (spawn list --clear) to prevent accidental data loss
- Show total prompt length in dry-run preview when prompt exceeds 100
  characters, so users can verify the correct prompt was loaded
- Add "Rerun previous" suggestion to non-interactive terminal fallback
- Show "(shown first)" hint when clouds with credentials are detected
  in interactive picker, so users understand the sort order
- Add repository URL to spawn version output for discoverability

Agent: ux-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
2026-02-13 11:31:01 -08:00
Ahmed Abushagur
d501b5eb1d
fix: CI test summary uses NO_COLOR instead of sed hack (#985)
* fix: strip ANSI colors before grepping test summary

The mock test output uses ANSI escape codes for colored ✓/✗/━━━
characters, so the grep in the Post summary step couldn't match
them. Strip colors with sed first.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use NO_COLOR standard instead of sed to strip ANSI codes

mock.sh now respects the NO_COLOR env var (https://no-color.org/).
CI sets NO_COLOR=1 so grep matches ✓/✗/━━━ cleanly.
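
One way a script can honor NO_COLOR, shown as a sketch rather than the actual mock.sh change; the `paint` helper is hypothetical. Per the convention at https://no-color.org/, color is suppressed when the variable is set to any non-empty value.

```sh
# paint CODE TEXT: print TEXT wrapped in ANSI color CODE, or plain when
# NO_COLOR is set to a non-empty value.
paint() {
  if [ -n "${NO_COLOR:-}" ]; then
    printf '%s\n' "$2"
  else
    printf '\033[%sm%s\033[0m\n' "$1" "$2"
  fi
}

paint 32 "passed"              # colored unless NO_COLOR is already set
NO_COLOR=1 paint 32 "passed"   # always plain, so grep can match the text
```

Suppressing the escape codes at the source means downstream tools never need to strip them.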

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 11:26:41 -08:00
Ahmed Abushagur
0ed8a29004
fix: stop QA cycle from auto-merging PRs, only create them (#981)
The QA cycle was auto-merging stale QA PRs that were mergeable.
Now it only closes stale ones — merging is left for human review.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 11:19:24 -08:00
A
3b4444f292
fix: show all auth vars in agent info quick start for multi-credential clouds (#975)
The `spawn <agent>` quick start section was only showing the first auth
env var when the best available cloud requires multiple credentials
(e.g., UpCloud with UPCLOUD_USERNAME + UPCLOUD_PASSWORD). This left
users confused about what other credentials they needed.

Now iterates over all auth vars, consistent with `spawn <cloud>` info.

Agent: ux-engineer

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-13 11:18:28 -08:00
A
c7dcbaa5af
fix: validate TARGET_SCRIPT against allowlist in trigger-server (#974)
Add startup validation for the TARGET_SCRIPT env var to prevent
arbitrary script execution. The validation:
- Requires .sh extension
- Checks the file exists
- Resolves symlinks and relative paths via realpathSync
- Verifies the real path is inside the allowed skill directory

Fixes #970

Agent: security-auditor

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 11:14:40 -08:00