spawn

vrr/spawn

mirror of https://github.com/OpenRouterTeam/spawn.git synced 2026-05-08 10:09:30 +00:00

Author	SHA1	Message	Date
A	cee05aba80	security: fix incomplete command injection detection in prompt validation (#1401) * security: fix incomplete command injection detection in prompt validation Agent: security-auditor Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: refine command injection patterns to avoid false positives Addresses changes requested in PR review: - Updated && and || patterns to only match when followed by common shell commands - Added context-aware check to exclude programming expressions like "a > b && c < d" - Maintains security by still catching shell command chaining attempts - All security tests pass including new edge case tests Fixes false positive rejection of legitimate programming expressions while still detecting shell injection attempts from issue #1400. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 11:51:33 -05:00
A	1a1c06e038	test: sandbox bash tests to prevent production env pollution (#1404 ) Fixes #1403 Changes: 1. test/run.sh - Isolated mock state files: - Changed /tmp/sprite_mock_created* to use TEST_DIR instead - Added cleanup of any leaked /tmp files in cleanup() trap - Prevents /tmp pollution from mock sprite state files 2. test/record.sh - Sandboxed config directory: - Added TEST_CONFIG_DIR environment variable support - When set, overrides HOME to prevent writing to ~/.config/spawn/ - Allows tests to run without polluting production config 3. test/qa-dry-run.sh - Safe git operations: - Changed git checkout to git restore for reverting README changes - Prevents potential checkout pollution of working tree - Falls back to git checkout -- for older git versions 4. test/test-sandbox.sh - New verification test: - Verifies no /tmp pollution after test/run.sh - Verifies production config not modified - Verifies mock.sh uses isolated temp directories Why: Prevents test suite from polluting production environment (file writes to /tmp, ~/.config/spawn/, git state mutations). Agent: test-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 11:26:17 -05:00
L	f3cfe890f7	refactor: simplify trigger server to fire-and-forget + fix monitoring loop prompts (#1384 ) The trigger server streamed script stdout back to GitHub Actions via a long-lived HTTP response, requiring --http1.1, heartbeat injection, server.timeout(req, 0), createEnqueuer, drainStreamOutput, and 90-min GH Actions timeouts. In practice GitHub Actions is just a dumb trigger — the real state lives on the VM (log files, journalctl). Simplify to fire-and-forget: spawn script, return 200 JSON immediately. Also fix the refactor and discovery team lead monitoring loops. The prompts buried the loop in a single compressed line that the model ignored (doing Bash("sleep 10") repeatedly without calling TaskList). Replace with a dedicated "Monitor Loop (CRITICAL)" section with numbered steps, matching the security.sh pattern that actually works. Changes: - trigger-server.ts: remove ~150 lines of streaming code (createEnqueuer, drainStreamOutput, startStreamingRun, heartbeat, ReadableStream), replace with startFireAndForgetRun (stdout: "inherit", immediate JSON) - All 4 workflows: simple curl POST, timeout-minutes 90→5, remove --http1.1/-N/--max-time/exit-code handling - refactor.sh: add Monitor Loop (CRITICAL) section with numbered steps - discovery-team-prompt.txt: same Monitor Loop fix - SKILL.md: update architecture docs, remove streaming sections Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-17 10:47:52 -05:00
A	aff3b73850	security: fix medium/low findings from scan (#1395 ) * security: fix medium severity findings from scan #763 Addresses remaining medium-severity security findings from issue #763: 1. Path traversal in invalidate_cloud_key (shared/key-request.sh) - Removed dots from provider name validation regex - Changed from ^[a-z0-9][a-z0-9._-]{0,63}$ to ^[a-z0-9][a-z0-9_-]{0,63}$ - Prevents path traversal via sequences like "foo..bar" 2. Background process timeout (shared/key-request.sh) - Wrapped fire-and-forget key request in timeout 15s - Prevents leaked subprocess if curl hangs beyond --max-time 3. Rate limiting IP spoofing (.claude/skills/setup-agent-team/key-server.ts) - Switched from x-forwarded-for header to server.requestIP(req) - Uses actual connection IP instead of spoofable header Agent: security-auditor Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: add macOS portability for timeout command Address review feedback from security team - timeout command is not available on macOS by default. Added fallback pattern that: - Uses timeout on Linux (prevents subprocess leak) - Falls back to curl --max-time only on macOS This ensures request_missing_cloud_keys() works on both platforms. Agent: pr-maintainer Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * security: fix command injection vulnerability in key-request.sh Fixes the critical command injection vulnerability identified in security review. Changes: - Use positional parameters ($1, $2, $3) instead of variable interpolation in bash -c - Pass variables via -- delimiter to prevent shell escaping issues - Replace echo with printf for proper formatting (macOS bash 3.x compat) - Maintain timeout wrapper on Linux and curl --max-time fallback on macOS Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 09:29:20 -05:00
A	026963bf78	fix: readonly property assignments and test expectations (#1396 ) Fixed readonly property assignments in commands-compact-list.test.ts by using the existing setTerminalWidth() helper instead of direct Object.defineProperty() calls. This makes the code more maintainable and consistent. Updated oracle-provider-patterns.test.ts to check for install_claude_code function instead of the outdated claude.ai/install.sh reference, matching the current oracle/claude.sh implementation. Changes: - Replaced 4 inline Object.defineProperty() calls with setTerminalWidth() helper - Updated oracle claude.sh test to check for install_claude_code instead of claude.ai/install.sh - All compact list tests passing (20/20) Fixes #1366 Agent: complexity-hunter Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 05:14:40 -08:00
A	31c35594ba	fix: enhance CLI test sandboxing with .ssh directory and verification tests (#1398 ) This commit addresses issue #1373 by improving the test sandbox to prevent accidental writes to the real user environment. Changes: 1. Enhanced preload.ts: - Added .ssh directory creation in sandboxed HOME - Expanded documentation explaining sandboxing strategy - Clarified safety guarantees for filesystem operations 2. Added sandbox-verification.test.ts: - Comprehensive test suite verifying sandbox isolation - Tests environment variable sandboxing (HOME, XDG_*) - Tests pre-created directories (.config, .ssh, .claude, .cache) - Tests filesystem isolation (writes stay in temp directory) - Tests subprocess isolation (bash inherits sandboxed env) - Tests safety guarantees (no exposure of /root paths) The existing preload.ts already prevented writes to real home directory by redirecting process.env.HOME and XDG variables to temp directories. This commit strengthens that sandboxing with the .ssh directory and adds comprehensive verification tests to ensure the sandbox works correctly. Fixes #1373 Agent: test-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 08:05:29 -05:00
A	7544dd0dcb	feat(cli): add spawn name for each run (#1397 ) Implements spawn name feature (#1372) to improve UX: - Add optional spawn name prompt in interactive mode - Pass spawn name via SPAWN_NAME env var to shell scripts - Shell scripts use spawn name as default for resource names - Store spawn name in history for future reference - Bump CLI version to 0.4.0 The spawn name is prompted before agent/cloud selection and automatically used as the default for platform-specific resource names (server name on Hetzner, sprite name on Sprite, etc.). Agent: ux-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 08:05:17 -05:00
A	27e7f32da3	fix: apply test fixes and shell conventions from #1358 (#1394 ) Applied the test fixes from PR #1358: 1. Fixed process.stdout.columns mutation in commands-compact-list.test.ts - Replaced direct property assignments with Object.defineProperty - Created setColumns() helper function for strict mode compatibility - Removed duplicate setTerminalWidth() function 2. Updated oracle-provider-patterns.test.ts assertion - Changed from checking for "claude.ai/install.sh" URL - Now checks for "install_claude_code" function name - Matches current oracle/claude.sh implementation Note: Shell scripts (aws/gptme.sh, gcp/gptme.sh) already have set -eo pipefail from previous commits - no changes needed. Fixes #1365 Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 07:59:27 -05:00
A	e55cd149c2	feat(cli): add type-ahead filtering to agent and cloud selection (#1393 ) Replace select prompts with autocomplete for improved UX when choosing agents and clouds. Users can now type to filter the list, significantly reducing time to find desired options in long lists. - Replace p.select with p.autocomplete for agent selection - Replace p.select with p.autocomplete for cloud selection - Add "type to filter" messaging and placeholder text - Update CLI version 0.3.2 → 0.3.3 Fixes #1367 Agent: ux-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 07:21:06 -05:00
A	06351d6ea0	fix: validate connection parameters to prevent command injection (#1381 , #1380 ) (#1392 ) Add input validation for SSH connection parameters (IP, username, server_name) and server identifiers used in delete operations. This prevents command injection attacks if ~/.spawn/history.json is corrupted or tampered with. Changes: - Add validateConnectionIP() - validates IPv4/IPv6 addresses and sentinels - Add validateUsername() - validates Unix username format - Add validateServerIdentifier() - validates server names/IDs - Update cmdConnect() to validate all connection params before use - Update buildDeleteScript() to validate server IDs before interpolation - Update mergeLastConnection() to validate data from bash scripts - Add comprehensive test coverage for all validation functions - Bump CLI version to 0.3.3 (security patch) Security impact: - Prevents HIGH severity command injection via history.ip/user (issue #1381) - Prevents MEDIUM severity command injection via server_id (issue #1380) Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 06:32:24 -05:00
Ahmed Abushagur	18f970217d	fix: bot two-track spawning — labeled issues skip plan mode (#1389 ) The refactor bot was too passive: it ran for 29 minutes, all 7 teammates used plan mode, none submitted plans, and it ignored a HIGH severity security issue plus 4 safe-to-work issues. Root cause: plan_mode_required on ALL teammates created too much friction for issue-driven work. Teammates had to analyze, plan, submit, and wait for approval — all within a tight time window. Fix: two-track spawning system: - Issue track: teammates assigned to labeled issues (safe-to-work, security, bug) spawn WITHOUT plan mode. The label IS the approval. - Proactive track: teammates doing optional scanning still use plan mode to prevent invented work. Also: - Diminishing Returns Rule now explicitly exempts issue-driven work - Issue-First Policy is now forceful: labeled issues are mandates - Team structure maps teammates to issue label types - Cycle timeout bumped from 15 to 25 min for issue fixing - Discovery prompt updated with same two-track pattern Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 06:29:52 -05:00
Ahmed Abushagur	2f1398f5b4	fix: use official curl installer for OpenClaw on Fly.io (#1391) * test: add mock test coverage for all 15 Fly.io agent scripts Fly.io had zero test coverage — every bug fixed this session (stale tokens, FlyV1 auth, name-taken failures, SSH hangs, PATH issues) went undetected. This adds the full mock test infrastructure: - test/fixtures/fly/ — env vars, API assertions, fixture JSONs for app creation, machine creation, and token validation endpoints - test/mock-curl-script.sh — URL stripping for api.machines.dev, body validation for machine creation, synthetic status responses, app creation POST handler, state tracking - test/mock.sh — mock fly/flyctl CLI binary (ssh console, auth token), URL stripping, required field validation, base64 mock - test/record.sh — Fly.io REST endpoints now recordable, live create+delete cycle, error detection, auth var mapping All 15 agent scripts (aider, claude, openclaw, etc.) are automatically discovered and tested: 75 passed, 0 failed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use official curl installer for OpenClaw on Fly.io bun install -g openclaw fails on Fly.io's bare Ubuntu image. Switch to the official installer (curl -fsSL https://openclaw.ai/install.sh | bash) which handles Node.js detection and dependency installation automatically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 06:29:32 -05:00
Ahmed Abushagur	14d36d1e1d	fix: Fly.io SSH reliability and app name UX (#1388) * fix: re-prompt on taken Fly.io app names + timeout run_server Two fixes for Fly.io UX: 1. When app name is globally taken by another user, re-prompt instead of failing. Returns exit code 2 from _fly_create_app so create_server can loop with a new name. 2. run_server now has a 5-minute timeout (portable, no coreutils needed) to prevent indefinite hangs like the 3-hour SSH session stall. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: wait for SSH before installing tools on Fly.io The previous wait_for_cloud_init immediately ran apt-get via fly ssh console on a machine that wasn't SSH-reachable yet, causing indefinite hangs. Now: 1. _fly_wait_for_ssh polls with a 30s-timeout echo until SSH responds 2. Shows progress at each step instead of suppressing all output 3. Each run_server call has an explicit timeout (10min for apt, 2min for bun, 30s for PATH exports) 4. Retries package install once on timeout Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: run fly ssh console in foreground, not background fly ssh console breaks when backgrounded with & — it needs a foreground process to establish the connection. Reverted to foreground execution and use timeout/gtimeout when available (Linux/CI). On macOS where timeout isn't available, the user can Ctrl+C hung commands. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: ensure bun PATH is available in non-interactive fly ssh sessions Ubuntu's default .bashrc returns early for non-interactive shells, so "source ~/.bashrc && bun install -g openclaw" silently fails — the PATH line at the bottom of .bashrc is never reached. Fix by prepending ~/.bun/bin to PATH in run_server() so all remote commands have access to tools installed during wait_for_cloud_init. Also fix spawn_agent to explicitly handle agent_install failure instead of relying on set -e (which exits silently). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 05:54:34 -05:00
Ahmed Abushagur	a9d0ee9863	test: add mock test coverage for all 15 Fly.io agent scripts (#1390 ) Fly.io had zero test coverage — every bug fixed this session (stale tokens, FlyV1 auth, name-taken failures, SSH hangs, PATH issues) went undetected. This adds the full mock test infrastructure: - test/fixtures/fly/ — env vars, API assertions, fixture JSONs for app creation, machine creation, and token validation endpoints - test/mock-curl-script.sh — URL stripping for api.machines.dev, body validation for machine creation, synthetic status responses, app creation POST handler, state tracking - test/mock.sh — mock fly/flyctl CLI binary (ssh console, auth token), URL stripping, required field validation, base64 mock - test/record.sh — Fly.io REST endpoints now recordable, live create+delete cycle, error detection, auth var mapping All 15 agent scripts (aider, claude, openclaw, etc.) are automatically discovered and tested: 75 passed, 0 failed. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 05:52:26 -05:00
A	c3dff4be7b	fix: update local cloud tests to validate hook-based abstraction (#1387 ) Why: 79 tests failing due to checking implementation details instead of behavior The local cloud provider tests were written before the spawn_agent hook-based abstraction was introduced. Tests expected scripts to directly call functions like ensure_local_ready, get_openrouter_api_key_oauth, and inject_env_vars_local. Current architecture uses hooks: - agent_install() - defines installation steps - agent_env_vars() - defines env config via generate_env_config - agent_launch_cmd() - defines launch command - spawn_agent() - framework orchestrates auth, env injection, launch Updated tests to validate: 1. Scripts call spawn_agent (not ensure_local_ready directly) 2. Scripts define agent_env_vars hook (not direct env var checks) 3. Scripts define agent_install and agent_launch_cmd hooks 4. Launch commands source ~/.spawnrc or define agent_env_vars Result: 79 test failures fixed, 226/226 tests passing Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 05:07:26 -05:00
Ahmed Abushagur	999751537d	fix: validate saved tokens + handle FlyV1 auth scheme (#1386 ) * fix: validate saved API tokens before use Tokens loaded from config files (e.g. ~/.config/spawn/fly.json) were never validated, so expired or revoked tokens would silently pass through and only fail at the point of use (e.g. app creation). Now the provider's test function runs on config-file tokens too, falling through to a fresh prompt if validation fails. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle FlyV1 token auth scheme for Fly.io Machines API Fly.io dashboard tokens use the format "FlyV1 fm2_..." where "FlyV1" is the authorization scheme itself, not a Bearer token prefix. The script was always sending "Authorization: Bearer FlyV1 fm2_..." which the API rejects with "token validation error". Now detects FlyV1-prefixed tokens and sends them as "Authorization: FlyV1 fm2_..." using custom auth headers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: make refactor service actually run reliably Three fixes for the refactor workflow that was producing zero PRs: 1. community-coordinator: Gemini → Sonnet — Gemini doesn't support the Task tool, causing a respawn on every single cycle 2. Monitoring loop: replace "sleep 5" (which drifted to sleep 30) with explicit short-sleep instructions and CRITICAL rule that every turn must include a tool call to stay alive 3. Lifecycle management: explicit shutdown sequence with retry, preventing early exit that orphans teammates Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 04:31:46 -05:00
A	f412fb69bc	ux: wait for OpenClaw gateway to be ready before launching TUI (#1385 ) Fixes #1354 - users experienced a ~30s delay with "gateway not connected" errors when trying to use OpenClaw immediately after launch. Root cause: gateway takes time to bind to port 18789, but TUI launched after only 2 seconds. Solution: Add wait_for_openclaw_gateway() helper that polls the gateway port (max 30s) before launching TUI, ensuring immediate usability. Changes: - shared/common.sh: Add wait_for_openclaw_gateway() function - All openclaw.sh scripts (10 files): Replace sleep 2 with gateway readiness check Agent: ux-engineer Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 03:49:53 -05:00
A	5678129126	fix: prevent silent agent installation failures (#1382) Make install_agent() check exit codes and fail fast when installation commands return non-zero. Previously, the function would silently continue even when installations failed due to bash || operators returning 0. This fix ensures that installation failures (network timeouts, missing dependencies, package not found) are caught immediately with actionable error messages instead of confusing runtime errors during session launch. Affected ~30 agent scripts using patterns like: - pip install X 2>/dev/null || pip3 install X - command -v bun && bun install X || npm install X Agent: code-health Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 03:11:51 -05:00
A	d2b6fc1ae4	security: fix path traversal in CLI installer file downloads (#1383) Fixes path traversal vulnerability where unvalidated filenames from GitHub API could write files outside intended directory. Attack vector: MITM attack or DNS hijacking could inject filenames like "../../../../../../tmp/evil.ts" to write arbitrary files. Fix: Validate filenames before download - block "..", "/", and "\\" to ensure files are written only within ${dest}/cli/src/ Severity: HIGH/CRITICAL Affects: All users running installer via curl|bash Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 03:09:12 -05:00
A	30138f6a8a	security: fix path traversal in CLI installer and hetzner token extraction (#1379 ) Fixes #1376 - HIGH severity path traversal in CLI installer Fixes #1377 - MEDIUM severity unquoted variable in hetzner token extraction Changes: - cli/install.sh: Replace string prefix matching with canonicalized path comparison to prevent path traversal in rm -rf cleanup. The previous check could be bypassed with sequences like "/tmp/../../home/user". - hetzner/lib/common.sh: Quote xargs placeholder variable to prevent unexpected behavior if hcloud context name contains shell metacharacters. Agent: security-auditor Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-17 01:51:13 -05:00
A	99a9badf62	ci: increase refactor team frequency to every 15 minutes (#1378 ) Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 20:50:03 -08:00
A	c4eccbd72f	feat: prioritize clouds with CLI installed + hcloud CLI integration (#1375 ) * fix: auto-run gcloud auth login on expired GCP tokens Instead of telling users to run `gcloud auth login` manually, just run it automatically when auth check fails or instance creation hits a reauthentication error, then retry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: prioritize clouds with CLI installed + hcloud CLI integration When selecting a cloud provider, clouds are now sorted in 3 tiers: 1. Credentials detected (env vars set) — top priority 2. CLI installed (e.g., gcloud, hcloud, aws) — middle priority 3. Neither — default order Also adds hcloud CLI-first support for Hetzner operations (server create/delete/list, SSH key management, auth) with automatic fallback to the existing REST API when hcloud is not available. Closes #1370 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: rename aws-lightsail to aws across the project Simplifies the cloud key from "aws-lightsail" to "aws" — AWS should have a single entry regardless of the underlying service used. Renames the directory, updates manifest.json matrix keys, CLI map, test fixtures, README, and all agent scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 20:12:35 -08:00
L	d452fdea37	chore: upgrade workflow models to Gemini Flash + Sonnet (#1374 ) * chore: replace open-source models with Gemini Flash and Sonnet in workflows Drop moonshotai/kimi-k2.5 and Haiku from refactor/security workflows. Lightweight tasks (triage, issue-checker, community-coordinator) now use google/gemini-3-flash-preview; all other teammates upgraded to Sonnet. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: ensure CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 in all workflows Add the required feature flag export to refactor.sh and security.sh (discovery.sh already had it). Also update SKILL.md wrapper template and agent teams reference section to document the requirement. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: persist CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS into .spawnrc All three service scripts now check for ~/.spawnrc and idempotently append the agent teams feature flag if missing. This ensures every Claude session on the VM inherits the flag, not just the one launched by the service script. Also documents the pattern in SKILL.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: add CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS to qa-cycle.sh Complete the coverage — qa-cycle.sh now also exports the agent teams feature flag and persists it to .spawnrc, matching the other three service scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 20:00:29 -08:00
A	2b87735e3d	refactor: extract error guidance data structures into separate module (#1335 ) Extracted EXIT_CODE_GUIDANCE and SIGNAL_GUIDANCE from commands.ts into a new guidance-data.ts module. This reduces commands.ts complexity by 100+ lines, making error handling logic more maintainable and focused. Changes: - New file: cli/src/guidance-data.ts (116 lines) with error/signal guidance data - Refactored: commands.ts now 100 lines shorter, imports guidance data - Improved: Exit code 1 handling to avoid circular dependency with credentialHints The extracted module is a pure data file focused on error messages and guidance, separate from the command execution logic. Co-authored-by: spawn-bot <bot@openrouter.ai> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 19:45:28 -08:00
A	6be328c314	fix: auto-run gcloud auth login on expired GCP tokens (#1371 ) Instead of telling users to run `gcloud auth login` manually, just run it automatically when auth check fails or instance creation hits a reauthentication error, then retry. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 19:34:54 -08:00
Ahmed Abushagur	054732404d	fix: prevent watchdog from killing teammates prematurely (#1369 ) The result event detection in refactor.sh, discovery.sh, and security.sh was killing the entire process tree 30s after the team lead's session ended. In team-based workflows, the team lead's "result" event fires after spawning teammates — while the actual work is still running as child processes. Instead of immediately killing on result detection, monitor the claude process's child processes via pgrep. While teammates are running, reset the idle counter to prevent false timeouts. Only shut down once all teammate processes have completed (or the hard timeout fires). Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 22:06:08 -05:00
A	87184ebbf7	fix(security): validate OAuth code format before file write (#1322 ) CRITICAL: Prevent injection via malicious OAuth callback Vulnerability: - OAuth code from query param was written directly to file - Attacker-controlled OAuth provider could inject: - Newlines (write multiple files via code="line1\nline2") - Control characters to corrupt subsequent parsing - Excessively long strings (DoS via disk fill) Fix: - Added strict validation: alphanumeric + dash/underscore only - Length constraint: 16-128 chars (matches real OAuth codes) - Fail with 400 status if validation fails - Type coercion (String()) prevents prototype pollution Impact: HIGH - Affects: All users running OAuth flow (default auth method) - Attack vector: Malicious redirect to fake OAuth endpoint - Severity: Code injection, file system manipulation Agent: security-auditor Co-authored-by: spawn-bot <bot@openrouter.ai> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 21:04:43 -05:00
Ahmed Abushagur	378b2c7d1d	test: add filesystem isolation preload for CLI tests (#1250 ) Redirects HOME and XDG dirs to a temp directory before tests run, preventing any test from accidentally writing to the real user's home directory (e.g. ~/.claude/settings.json, ~/.zshrc). Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 21:04:18 -05:00
A	7388c1eaec	ux: clarify that 149/150 combinations are working (local/opencode missing) (#1343 ) The tagline claimed "149 combinations" without context. Users might think this is a limitation or wonder why 15×10=150. The matrix table shows the blank cell for local/opencode, but the tagline should clarify this upfront. Agent: ux-engineer Co-authored-by: spawn-bot <bot@openrouter.ai> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 21:01:08 -05:00
A	ffd9626ae9	simplify issue templates — let the refactor team triage (#1368 ) Remove verbose fields (dropdowns, use cases, environment, proposed UX) from all issue templates. Humans just need to say what they want; the refactor team handles enrichment and triage. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 17:45:18 -08:00
A	b8ae1d6a68	ux: complete commands table in README (#1336 ) Add missing commands to README commands table for consistency with CLI help: - spawn <cloud> (show available agents) - spawn list <filter> (filter by agent/cloud name) - spawn list -a/-c (explicit filters) - spawn list --clear (clear history) - spawn last (rerun most recent) - spawn help (show help) - spawn version (show version) Updated descriptions to match CLI help output exactly. Agent: ux-engineer Co-authored-by: spawn-bot <bot@openrouter.ai> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 20:32:22 -05:00
A	40514526bd	fix: improve error handling and prevent race conditions in cleanup (#1349 ) Three reliability improvements: 1. OAuth session cleanup: Verify PID still exists before killing to prevent accidentally killing unrelated processes if PID is reused by the OS. Uses kill -0 check before sending SIGTERM. 2. Float arithmetic fallback: Check for python3 availability before using it for fractional POLL_INTERVAL support. Falls back to integer seconds with explicit comment about potential early timeout. 3. Exit code preservation: Add clarifying comment about exit code capture timing in refactor.sh cleanup trap (already correct, now documented). Agent: code-health Co-authored-by: spawn-bot <bot@openrouter.ai> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 20:30:26 -05:00
L	d80d747fab	docs: improve service URL detection in setup-agent-team skill (#1364 ) Add Step 5.5 with three options for determining service URL: - Option A: Sprite/Fly.io VMs (flyctl status) - Option B: Hetzner/other cloud VMs with public IP (curl ipify.org) - Option C: Custom domain/reverse proxy setups Also fix cron syntax typo in Step 6 ('0 /2 * ' -> '0 /2 * * * *'). Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 17:29:50 -08:00
A	fb144fa47d	fix: check saved cloud configs in credential validation Fixes #1197 by checking for saved credentials in ~/.config/spawn/{cloud}.json files. This prevents false-positive credential warnings when cloud-specific credentials are saved via config files (as done by cloud setup scripts). Advantages over PR #1288: - Works with all credential key names (not just api_key/token) - Handles multi-credential clouds correctly (OVH, Contabo) - Generic approach checks for any non-empty credential value Security review: ✅ No vulnerabilities detected - Path traversal protected - Safe JSON parsing - No information disclosure - Correct multi-cloud credential logic	2026-02-16 20:29:08 -05:00
A	9c0420f865	fix: update help examples to reference existing clouds and document --debug flag (#1350 ) UX improvements: - Replace outdated cloud references (vultr/linode) with existing clouds (ovh/gcp) in help examples - Add missing --debug flag to README commands table - Ensure all documented examples reference clouds that exist in the matrix These changes prevent user confusion when following examples in help text and documentation. Agent: ux-engineer Co-authored-by: spawn-bot <bot@openrouter.ai> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 20:28:44 -05:00
A	b540f69248	fix: track OAuth temp directories for cleanup on exit (#1344 ) Security review complete. Merge conflict resolved (combined error handling + track_temp_file). All tests passed (80/80). Low-risk reliability fix.	2026-02-16 20:28:35 -05:00
A	e92522f138	fix: add error logging to empty catch blocks in test helpers (#1334 ) * fix: add error logging to empty catch blocks in test helpers Previously, test helper functions had 14 empty catch blocks that silently swallowed all errors during cleanup operations (reading and deleting temporary stderr files). This change adds error logging that: - Allows expected errors (ENOENT for missing files, exit code 1 for cat) - Logs unexpected errors to console for debugging This improves test reliability by surfacing unexpected filesystem or permission errors that could indicate real problems, while still allowing the intended best-effort cleanup behavior. Fixes: Empty catch blocks in 6 test files Impact: Better test debugging and error visibility Agent: code-health Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix: improve error handling in Python fallback and directory deletion 1. Python arithmetic fallback (shared/common.sh:713): - Changed from: \|\| echo "$((elapsed + 1))" - Changed to: explicit if/else with error detection - Impact: Python errors are now properly caught instead of masked by \|\| 2. Unvalidated directory deletion (cli/install.sh:142): - Added path validation before rm -rf - Checks: path is within dest directory AND directory exists - Impact: Prevents accidental deletion if variables are malformed Both changes improve safety and error visibility without breaking existing functionality. Agent: code-health Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: spawn-bot <bot@openrouter.ai> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 20:28:30 -05:00
A	4acb28a263	test: fix bun PATH in subprocess tests and set -eo pipefail in shell scripts (#1353 ) Fixes 256 failing tests that spawn bun subprocesses. These tests were failing because bun was not in the child process PATH. Ensures all CLI test helpers pass PATH with $HOME/.bun/bin included. Also corrects two gptme.sh scripts to use 'set -eo pipefail' instead of bare 'set -e' for proper error handling, per shellcheck conventions. Changes: - 7 CLI test files: add PATH=$HOME/.bun/bin to execSync/spawnSync env - 2 shell scripts: use set -eo pipefail for proper error handling Results: 256 tests now passing, 0 failures in subprocess CLI tests. Co-authored-by: test-engineer <agent@spawn.local> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 20:28:17 -05:00
A	2fe8956729	fix: improve error handling by capturing error objects in catch blocks (#1360 ) Replace empty catch blocks with explicit error parameters for better debugging and potential future error logging. Changes include: - Add error parameter to all catch blocks (currently 7 instances) - Enable conditional debug logging for non-fatal history write failures - Maintain backward compatibility - no behavior changes - Improve code maintainability and debugging capability This addresses code health issue where errors were silently swallowed without any reference, making debugging difficult. Agent: code-health Co-authored-by: test-engineer <agent@spawn.local> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 20:27:35 -05:00
A	2630a5d0d8	security: escape single quotes in OAuth server script generation (#1342 ) Prevents potential code injection if malicious parameters containing single quotes are passed to _generate_oauth_server_script(). The function embeds bash variables directly into a Node.js script string using single-quoted JS strings. Without escaping, a crafted parameter like "foo'; malicious(); '" could break out of the string context. While current callers use safe values (randomUUID, tempfile paths, HTML constants), defense-in-depth requires sanitizing at the point of use to prevent future regressions if callers change. Fixes: CWE-94 (Code Injection) Severity: HIGH Impact: Remote code execution if attacker controls OAuth state token, file paths, or HTML content Agent: security-auditor Co-authored-by: spawn-bot <bot@openrouter.ai> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 20:27:05 -05:00
A	7b9912a7ca	Reduce code complexity by extracting helper functions (#1352 ) Refactored two high-complexity functions to improve maintainability: 1. shared/common.sh: Extract install_claude_code() into 5 focused helpers: - _finalize_claude_install: Setup shell integration - _verify_claude_installed: Check if installation succeeded - _install_via_curl: Curl installer method - _ensure_nodejs_runtime: Node.js runtime setup - _install_via_bun: Bun installer method Main function now reads as a clear sequence of steps. 2. cli/src/commands.ts: Simplify credential checking in printQuickStart: - Extract checkAllCredentialsReady() for clarity - Extract printAuthVariableStatus() to handle auth var display - Extract buildCloudCommandHint() for cloud hint formatting Reduces complexity and improves readability. All 80 tests pass. No functional changes. Co-authored-by: spawn-bot <bot@openrouter.ai> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 20:26:15 -05:00
A	42bd3bf96b	fix: add safety checks to prevent destructive rm -rf operations (#1319 ) Improves codebase reliability by adding critical safety validations: 1. cleanup_oauth_session: Added path validation before rm -rf - Prevents accidental deletion if oauth_dir is empty, /, or /tmp - Validates path starts with /tmp/ and is not just /tmp itself - Prevents catastrophic system damage from failed mktemp 2. _init_oauth_session: Added mktemp failure detection - Checks if mktemp -d succeeded before using oauth_dir - Returns error with actionable message if temp dir creation fails - Prevents empty oauth_dir from propagating to rm -rf 3. refactor.sh SPAWN_ISSUE validation: Strengthened regex - Changed from ^[0-9]+$ to ^[1-9][0-9]*$ - Prevents SPAWN_ISSUE="0" from creating issue-0 worktrees - Ensures issue numbers are positive integers (>= 1) These fixes prevent potential data loss from edge cases in OAuth cleanup and refactor service issue handling. Agent: code-health Co-authored-by: spawn-bot <bot@openrouter.ai> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 20:26:09 -05:00
A	b5e8dbbfc5	security: fix temp file race condition in credential upload (#1333 ) HIGH severity: Three functions used hardcoded /tmp/env_config for uploading API keys, creating a TOCTOU race condition where attackers on multi-user systems could create symlinks to exfiltrate OPENROUTER_API_KEY and other credentials. Fixed by using unpredictable temp file names with mktemp-derived randomness, matching the secure pattern in write_remote_file_via_callback(). Affected functions: - inject_env_vars_with_ssh() (line 1094) - inject_env_vars_local() (line 1128) - inject_env_vars_cb() (line 1363) Agent: security-auditor Co-authored-by: spawn-bot <bot@openrouter.ai> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 20:25:59 -05:00
A	da30c7f5d3	security: replace eval with native indirect expansion in test/record.sh (#1351 ) Replaces fragile eval-based indirect variable expansion with bash's native ${!var} syntax. This eliminates potential command injection risks and improves code clarity. Changes: - Line 139: eval "local val=\${...}" → local val="${!env_var:-}" - Line 168: eval "local current_val=\${...}" → local current_val="${!env_var:-}" - Line 215: eval "[[ -n \${...} ]]" → [[ -n "${!env_var:-}" ]] - Line 223: eval "[[ -n \${...} ]]" → [[ -n "${!env_var:-}" ]] - Line 246: eval "local val=\${...}" → local val="${!env_var:-}" - Line 276: eval "local current=\${...}" → local current="${!var_name:-}" Security impact: Removes eval usage that could theoretically allow command injection if env var names were ever user-controlled (currently not the case, but pattern is fragile). Fixes part of issue #763 (MEDIUM: Indirect variable expansion via eval) Agent: security-auditor Co-authored-by: spawn-bot <bot@openrouter.ai> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 20:25:48 -05:00
A	d2866b2976	ux: standardize destroy_server() wrapper for OVH and Sprite (#1345 ) Adds destroy_server() wrapper functions to OVH and Sprite cloud libraries to match the standardized function name used by 8 other clouds. Before: - OVH used destroy_ovh_instance() - Sprite had no destroy function - Cross-cloud scripts couldn't use a uniform destroy_server() call After: - OVH: destroy_server() wraps destroy_ovh_instance() - Sprite: destroy_server() wraps "sprite destroy <name>" CLI command - Cross-cloud scripts can now call destroy_server() uniformly This fixes the blocker for PR #1217 which hardcodes destroy_server() calls that would silently fail for OVH and Sprite users. Fixes #1178 Agent: ux-engineer Co-authored-by: spawn-bot <bot@openrouter.ai> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 20:24:40 -05:00
A	8c845869b3	ux: improve error message formatting and clarity (#1324 ) - Show agent display names instead of keys in cloud suggestion errors - Add visual spacing in "not yet implemented" error output for better scannability - Improve readability of error messages with strategic blank lines Agent: ux-engineer Co-authored-by: spawn-bot <bot@openrouter.ai> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 20:24:38 -05:00
A	8ba3e97ed6	fix: add critical error handling and input validation (#1356 ) - Fix race condition in cleanup_oauth_session: Kill process group to prevent zombie OAuth server processes - Add mktemp failure handling in _init_oauth_session: Prevents undefined behavior when /tmp is full or inaccessible - Add env var name validation in generate_env_config: Prevents shell injection via malformed KEY=value pairs Agent: code-health Co-authored-by: test-engineer <agent@spawn.local> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 20:24:30 -05:00
Ahmed Abushagur	3fbdf56c4c	fix: add guardrails to prevent bots from inventing unnecessary work (#1347 ) - Add team lead pre-approval gate: teammates spawn in plan mode and must get approval before creating any PR (hard gate, not just prompt rules) - Add diminishing returns rule: default posture is "code is good, shut down" - Add dedup rule: check for existing open/closed PRs before creating new ones - Require concrete PR justification (what breaks without this change) - Add off-limits files list (.github/workflows, .claude/skills, CLAUDE.md) - Use git pathspec exclusions in refactor.sh to never stage protected files - Constrain pr-maintainer to only act on approved or feedback PRs - Reduce refactor cron from every 5 minutes to every 2 hours Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 20:24:25 -05:00
A	5f39b035c6	refactor: extract credential loading helpers to reduce complexity in test/record.sh (#1348 ) Split credential loading logic into focused helper functions: - _export_env_vars_from_fields: Extract array export logic (16 lines) - _load_single_token_config: Extract single-token loading (14 lines) Changes: - try_load_config reduced from 39 to 28 lines (28% reduction) - _load_multi_config_from_file reduced from 38 to 26 lines (32% reduction) - Eliminated duplicate env var validation logic - Improved readability with clear separation of concerns All 80 tests passing. No functional changes. Agent: complexity-hunter Co-authored-by: spawn-bot <bot@openrouter.ai> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 20:23:49 -05:00
A	8228bf19ed	ux: fix readonly property assignment errors in terminal width tests (#1357 ) The tests were failing because process.stdout.columns is a readonly property in Bun's test environment. Changed all direct assignments to use Object.defineProperty() which allows setting readonly properties during tests. Changes: - Added setTerminalWidth() helper in commands-compact-list.test.ts - Updated all test cases to use Object.defineProperty() instead of direct assignment - Fixed afterEach cleanup to properly restore original columns value - Same fixes applied to commands-list-grid.test.ts This ensures tests pass in Bun runtime while maintaining the same test coverage. Agent: ux-engineer Co-authored-by: test-engineer <agent@spawn.local> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-16 20:23:46 -05:00
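Note on 8228bf19ed: the commit describes replacing direct assignment to a readonly property with Object.defineProperty() and restoring the original value in afterEach. A minimal sketch of that pattern follows. The helper name overrideProperty and the fakeStdout object are hypothetical stand-ins for illustration; this is not the repository's setTerminalWidth() helper, and it assumes the target property is configurable.

```typescript
// Hypothetical helper: override a property that cannot be assigned directly
// and return a function that restores the original descriptor.
type Restore = () => void;

function overrideProperty<T extends object, K extends keyof T>(
  target: T,
  key: K,
  value: T[K],
): Restore {
  // Remember the original descriptor so the override can be undone exactly.
  const original = Object.getOwnPropertyDescriptor(target, key);

  // Replace the property with a plain, writable data property.
  Object.defineProperty(target, key, {
    value,
    writable: true,
    configurable: true,
    enumerable: original?.enumerable ?? true,
  });

  return () => {
    if (original) {
      // Put the original getter or value back.
      Object.defineProperty(target, key, original);
    } else {
      // The property did not exist on the object itself before the override.
      Reflect.deleteProperty(target, key);
    }
  };
}

// Stand-in for process.stdout (an assumption for this sketch): "columns" is
// exposed through a getter only, so a plain assignment would fail.
const fakeStdout = Object.defineProperty({} as { columns: number }, "columns", {
  get: () => 80,
  configurable: true,
  enumerable: true,
});

const restore = overrideProperty(fakeStdout, "columns", 40);
const narrowed = fakeStdout.columns; // value seen while the override is active
restore();
const restored = fakeStdout.columns; // value seen after cleanup
```

In a test suite, the restore function would typically be called from afterEach so that one test's terminal width does not leak into the next.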


1207 commits