Break down the 150-line run_test() function into focused helpers:
- run_script_with_timeout(): script execution with env vars and timeout
- show_failure_output(): display last 20 lines on failure
- assert_error_scenario(): handle error scenario assertions
- assert_cloud_api_calls(): cloud-specific API call assertions
- record_test_result(): write pass/fail to RESULTS_FILE
run_test() is now 57 lines (62% reduction), each helper is under 35 lines.
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add comprehensive test coverage for the untested credential management
pipeline (_load_token_from_env, _load_token_from_config,
_validate_token_with_provider, _save_token_to_config,
_multi_creds_all_env_set, _multi_creds_load_config,
_multi_creds_validate) plus save/load roundtrip integration tests.
These functions are used by every cloud provider script but had zero
test coverage. Tests run in real bash subprocesses sourcing
shared/common.sh to catch actual shell behavior.
Agent: test-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
install.sh is the critical entry point for new users (curl | bash) and
has been modified in 3 recent PRs but had zero test coverage. These tests
validate structure, conventions, security, curl|bash compatibility, the
source-mode fallback wrapper, clone_cli logic, find_install_dir, and
ensure_in_path behavior.
Agent: test-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The mock curl heredoc script was a monolithic 287-line function with
inline arg parsing, error injection, URL routing, body validation,
fixture lookup, and state tracking all in one flow.
Extract 10 focused helper functions within the heredoc:
- _parse_args: curl argument parsing
- _maybe_inject_error: MOCK_ERROR_SCENARIO handling
- _handle_special_urls: install scripts, OpenRouter, spawn repo
- _strip_api_base: URL-to-endpoint mapping for 14 cloud APIs
- _check_fields / _validate_body: POST body validation
- _try_fixture: fixture file lookup
- _synthetic_active_response: cloud-specific GET-by-ID responses
- _respond_get / _respond_post: METHOD-based response routing
- _track_state: creation/deletion state tracking
The main logic is now a 26-line sequence of named function calls,
making the mock's control flow immediately readable.
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add comprehensive test coverage for the single-token credential management
functions in shared/common.sh that previously had zero test coverage:
- _load_token_from_env (env var detection, edge cases)
- _load_token_from_config (JSON config loading, error handling)
- _validate_token_with_provider (validation callback, env var cleanup)
- _save_token_to_config (secure file creation, JSON escaping, roundtrips)
- ensure_api_token_with_provider (full flow integration tests)
These functions are used by every single-token cloud provider (Hetzner,
DigitalOcean, Vultr, Lambda, Linode, etc.) and are security-critical
for credential handling.
Agent: test-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract _create_logging_mock and _create_silent_mock from setup_mock_agents
(test/mock.sh) to eliminate repetitive mock creation patterns
- Extract _record_ensure_credentials, _record_endpoint, and
_record_write_metadata from record_cloud (test/record.sh) to separate
credential checking, API recording, and metadata writing concerns
Pure refactoring — no behavior changes.
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a spawn script fails (e.g., SSH timeout, credentials issue), the
retry command shown to the user was `spawn <agent> <cloud>`, dropping
the --prompt argument the user originally provided. This was a regression
from PR #683 which accidentally removed the buildRetryCommand function
and prompt parameter that PR #712 had added.
Restores buildRetryCommand (truncates to 60 chars, escapes quotes) and
passes prompt through reportScriptFailure so users can copy-paste the
full retry command without reconstructing it from memory.
Adds 7 tests for buildRetryCommand covering truncation, quote escaping,
empty/undefined prompt, and boundary cases.
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously, when a user hit Ctrl+C during script execution, the CLI
silently exited with code 130. This left users unaware that a server
may have already been created and could still be running, potentially
incurring charges.
Now shows a warning about orphaned resources before exiting.
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Delete start-refactor.sh from setup-trigger-service (hardcoded secret)
- Broaden .gitignore to cover all .claude/skills/*/start-*.sh
- Rotated REFACTOR_TRIGGER_SECRET in GitHub Actions
- Service now runs from correct setup-agent-team location
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Simplify from 6 modes (Hexa-Mode) to 4 modes (Quad-Mode) by folding
single-PR review and hygiene into a unified review_all mode that runs
every 15 minutes. This removes the pull_request trigger entirely since
review_all catches all open PRs on schedule, and absorbs staleness
checks + branch cleanup into the same cycle.
Remaining modes: team_building, triage, review_all, scan.
Co-authored-by: Sprite <noreply@sprites.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New issues are triaged by the security team before other workflows can
act on them. The triage agent checks for prompt injection, social
engineering, spam, and unsafe payloads — marking safe issues with
`safe-to-work`, closing malicious ones, or flagging unclear ones for
human review. Discovery and refactor workflows now require the
`safe-to-work` label in addition to their existing label requirements.
Co-authored-by: Sprite <noreply@sprites.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New issue template: Team Building (team-building label) — 2 fields:
which agent team to improve + what to change
- Security team gets a new team_building mode: reads the issue, spawns
implementer + reviewer (both Opus), creates PR, reviews, merges, closes issue
- Discovery workflow: only triggers on cloud-request / agent-request issues
- Refactor workflow: only triggers on bug / cli issues
- Security workflow: only triggers on team-building issues (+ PR/schedule)
- All workflows still run on schedule and workflow_dispatch as before
Co-authored-by: Sprite <noreply@sprites.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add security review team for PR review (#543)
Adds a security team that automatically reviews every PR for security
issues (injection, credential leaks, unsafe patterns, macOS compat)
and sends Slack notifications to #spawn when concerns are found.
- security.sh: dual-mode cycle script (PR review + scheduled scan)
- security.yml: GitHub Actions workflow on pull_request events
- start-security.sh: gitignored wrapper with secrets (deployed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: expand security team with hygiene, scan modes + auto-merge clean PRs
- PR mode: 2-agent team (code-reviewer + test-verifier) reviews PRs.
If zero findings, auto-approves AND merges. If concerns, requests
changes and sends Slack notification to #spawn.
- Hygiene mode (every 6h): pr-triager + branch-cleaner close stale PRs,
file follow-up issues, delete orphan branches.
- Scan mode (daily): shell-auditor + code-auditor + drift-detector
perform full repo security audit, file GitHub issues for findings.
- All modes use Claude Code agent teams (TeamCreate, parallel teammates
via Task tool, SendMessage coordination, TaskList monitoring).
- Workflow updated with schedule triggers and workflow_dispatch inputs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: upgrade all security auditor agents to Opus model
All security-critical roles (code-reviewer, pr-triager, shell-auditor,
code-auditor) now use Opus. Helper roles (test-verifier, branch-cleaner,
drift-detector) remain on Haiku.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: auto-merge PRs with MEDIUM/LOW or no findings
Only CRITICAL/HIGH findings block a PR. MEDIUM/LOW are informational
notes included in the approving review — PR still gets merged.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Sprite <noreply@sprites.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: bun install creates empty directories in proot (Termux)
because proot can't intercept bun's symlink/hardlink/copy_file_range
syscalls. This breaks both local build and source-mode fallback.
Fix: when `bun run build` fails, download the pre-built cli.js from
the `cli-latest` GitHub release. The bundled binary is self-contained
(80KB, all deps inlined) and only needs the bun runtime.
- Add CI workflow (.github/workflows/cli-release.yml) that builds and
uploads cli.js to a rolling `cli-latest` release on every push to main
- Replace broken source-mode fallback with GitHub release download
- Bump CLI version to 0.2.63
Co-authored-by: Sprite <noreply@sprite.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a spawn script fails with credential-related errors, the error
message now always includes "Run spawn <cloud> for setup instructions"
alongside the required env var names. Previously, this setup hint was
only shown when the auth env var names were unknown.
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bun 1.3.8 on Termux proot doesn't resolve node_modules by walking up
from the source file directory. Changing cwd to ~/.spawn/ (where
node_modules lives) before exec ensures packages are found.
Co-authored-by: Sprite <noreply@sprite.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a spawn script fails with exit code 255 (SSH connection failure),
the CLI now retries up to 2 times with progressive delays (5s, 10s).
Non-retryable failures (syntax errors, permission denied, Ctrl+C, and
generic exit code 1) are not retried and fail immediately as before.
Fixes#705
Agent: issue-fixer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bun 1.3.8 in Termux proot cannot resolve packages with --packages bundle
even with bun.lock present and after --force reinstall. When the bundled
build fails, install source + node_modules to ~/.spawn/ and create a
wrapper script that runs via `bun` directly.
Co-authored-by: Sprite <noreply@sprite.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The non-git download path did not fetch bun.lock, causing bun install
to resolve dependencies from scratch. On older bun versions (e.g. 1.3.8
in Termux proot), this produced a node_modules layout that broke
`bun build --packages bundle`.
- Download bun.lock in the non-git (curl) path
- Add build retry with `bun install --force` fallback
- Enforce minimum bun version (1.2.0) with auto-upgrade
- Bump CLI version to 0.2.60
Co-authored-by: Sprite <noreply@sprite.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The local/opencode.sh script already exists and is functional.
Agent: gap-filler
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Agents can self-review their PRs (read diff, add comments) but must
never merge. PRs get labeled `needs-team-review` and stay open for
external review by maintainers or a separate review cycle.
Changes across all three scripts:
- refactor.sh: added No Self-Merge Rule section, updated worktree
pattern, issue fix workflow, monitoring loop, and lifecycle mgmt
- discovery.sh: added No Self-Merge Rule section, updated worktree
pattern, git workflow, branch cleaner, and lifecycle mgmt
- qa-cycle.sh: renamed push_and_merge_pr to push_and_create_pr,
removed all gh pr merge calls, added self-review + labeling
Agent: team-lead
Co-authored-by: Sprite <noreply@sprite.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kamatera: Replace _load_kamatera_config, _validate_kamatera_credentials, and
ensure_kamatera_token (62 lines of custom env/config/prompt/validate/save logic)
with ensure_multi_credentials (5 lines).
OVH: Replace _ovh_prompt_credentials, ensure_ovh_authenticated (72 lines of
custom credential management) with ensure_multi_credentials (8 lines).
Replace wait_for_ovh_instance (38 lines of custom polling with backoff) with
generic_wait_for_instance (8 lines).
Net: -175 lines, same behavior, consistent patterns across providers.
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously, `spawn list hetzner` always treated the bare positional
argument as an agent filter, returning 0 results since "hetzner" is a
cloud, not an agent. Now resolveListFilters auto-detects: when the
filter doesn't resolve as an agent but does resolve as a cloud, it
reclassifies to a cloud filter. This matches the help text which
promises "Filter history by agent or cloud name".
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds label lifecycle transitions (Pending Review → Under Review → In Progress)
directly into both issue mode and refactor mode prompts per maintainer request
on #546. The community-coordinator now manages labels at each stage of issue
engagement, and the shutdown checklist verifies all open issues are labeled.
Fixes#546
Agent: team-lead
Co-authored-by: Sprite <noreply@sprite.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cover groupByType, buildAgentLines, buildCloudLines, credentialHint,
mapToSelectOptions, buildRecordLabel, buildRecordHint, and
resolveDisplayName edge cases. Uses the established replica pattern
since these functions are not exported.
Agent: test-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implements Plandex on Netcup VPS using netcup/lib/common.sh primitives.
Updates manifest.json to mark netcup/plandex as implemented.
Updates netcup/README.md with Plandex usage instructions.
Agent: gap-filler-netcup
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Implements Open Interpreter on local machine using local lib/common.sh primitives.
Agent: gap-filler-local
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Exit codes 126 (permission denied) and 137 (killed/OOM) previously
showed terse one-line messages with no suggestions for what to do.
Now they include specific causes and remediation steps, consistent
with all other exit codes in getScriptFailureGuidance.
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `echo ""` on line 351 of get_model_id_interactive() was going to
stdout, causing it to be captured by command substitution into MODEL_ID.
This injected a newline into the openclaw.json config, breaking JSON
parsing with "invalid character '\n' at 15:0".
Fixes#553
Agent: issue-fixer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
13 cloud providers had identical 5-line check_ssh_key functions that
fetch SSH keys from the provider API and grep for the fingerprint.
Extract this pattern into a shared check_ssh_key_by_fingerprint helper
in shared/common.sh, reducing each cloud's function to a single line.
Affected clouds: BinaryLane, Cherry, Civo, Contabo, DigitalOcean,
Genesis Cloud, Hetzner, Hostinger, Latitude, Linode, OVH, Scaleway,
Vultr.
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The generic_wait_for_instance timeout message previously just said
"did not become active in time" with no guidance. Now it follows the
same pattern as generic_ssh_wait by telling users what to do next.
Similarly, _validate_token_with_provider now shows the env var name
so users can set it directly instead of re-running interactively.
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Support `spawn list claude` as shorthand for `spawn list -a claude`
- Add --agent and --cloud as long-flag aliases for -a and -c
- Fix flaky cmdlist-integration tests by priming manifest cache in
beforeEach and isolating XDG_CACHE_HOME to prevent cross-test leakage
- Export _resetCacheForTesting from manifest.ts for deterministic tests
- Update help text with new filter syntax and examples
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The get_model_id_interactive function returned MODEL_ID from env vars
without calling validate_model_id, bypassing the allowlist check. Also
migrated 13 legacy scripts from raw safe_read to get_model_id_interactive
which includes validation.
Agent: security-auditor
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The json_escape function already adds quotes around strings, so using
"%s" in printf was adding a second set of quotes, resulting in invalid
JSON like `"OPENROUTER_API_KEY": ""value""`.
Fixed railway/openclaw.sh and koyeb/openclaw.sh to use %s (unquoted)
for API key and token fields, matching the correct pattern used in
fly/openclaw.sh and shared/common.sh.
Fixes#542
Agent: issue-responder
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Cover the untested filter resolution path added in PR #537 where
cmdList resolves display names (e.g., "Claude Code" -> "claude") before
querying history. Also cover buildRecordLabel/buildRecordHint helpers
from PR #531 and the double-quote escaping in rerun prompt suggestions.
Agent: test-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The saveSpawnRecord MAX_HISTORY_ENTRIES=100 trimming was completely untested.
These tests cover: trimming at boundary (99->100, 100->101), trimming well
over limit (150+1), prompt preservation through trimming, sequential saves
crossing the limit, filterHistory reverse-chronological ordering, boundary
conditions (empty/missing dir), and file format after trimming.
Agent: test-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
The word "list" is ambiguous in a CLI that also has "spawn agents" and
"spawn clouds" -- users naturally expect "spawn list" to list resources,
not show history. Adding "spawn history" as a first-class alias makes
the history command more discoverable. Also documents both aliases (ls,
history) in the help text.
Agent: ux-engineer
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. _cloud_api_retry_loop: consolidate two duplicate retry branches
(network error + HTTP 429/503) into a single retry path using a
retry_reason variable. Reduces from 47 to 43 lines, eliminates
duplicated _api_should_retry_on_error / _update_retry_interval calls.
2. interactive_pick: extract list display + selection into reusable
_display_and_select helper. The main function is now a thin wrapper
that checks env var, fetches items, then delegates to the helper.
3. generic_ssh_wait: replace inline backoff calculation (3 lines) with
existing _update_retry_interval helper, reducing duplication.
Agent: complexity-hunter
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>