spawn

vrr/spawn

mirror of https://github.com/OpenRouterTeam/spawn.git synced 2026-04-29 04:19:30 +00:00

Author	SHA1	Message	Date
A	d04096a15b	feat!: remove Fly.io cloud provider support (#1979 ) * feat!: remove Fly.io cloud provider support Drop Fly.io as a supported cloud provider. Sprite (which uses Fly.io infrastructure internally) is retained. - Delete packages/cli/src/fly/ module, sh/fly/ scripts, fixtures/fly/ - Remove fly cloud entry and 6 fly matrix entries from manifest.json - Remove fly imports, destroy cases, and connection handlers from commands.ts - Remove fly-ssh sentinel from security.ts - Port E2E test suite from Fly.io to AWS Lightsail (fly-e2e.sh → aws-e2e.sh) - Update README (7 clouds, 42 combinations), CLAUDE.md, and skill prompts - Clean up fly references in build config, gitignore, icon sources - Bump CLI version to 0.11.0 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: restore Docker image build under sh/docker/ Move openclaw Dockerfile from sh/fly/docker/ to sh/docker/ and rename workflow from fly-docker.yml to docker.yml with updated paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: fix extra blank lines in commands.ts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: spawn-bot <spawn-bot@openrouter.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-02-27 00:06:32 -05:00
A	9e54f0cf57	ci: add Mock Tests job to satisfy required status check (#1904 ) * ci: add Mock Tests job to satisfy required status check Split the unit-tests job into mock-tests (runs bun test) and unit-tests (verifies cloud bundles build). The repo ruleset requires "Mock Tests", "Unit Tests", and "Biome Lint" checks — the missing "Mock Tests" job was blocking all PR merges. Fixes #1901 Agent: issue-fixer Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * style: fix pre-existing Biome format issues in 9 files Auto-applied Biome formatter to src/ to resolve failing "Biome Lint" required status check. No logic changes — formatting only. Agent: issue-fixer Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-25 00:54:33 -05:00
A	b2bddc4ba5	ci: bump QA cron from daily to every 4 hours (#1895 ) Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-24 16:46:55 -08:00
A	98a0d0f68f	feat(qa): add e2e-tester as subagent in scheduled quality sweep (#1894 ) E2E tests now run as a 4th teammate alongside test-runner, dedup-scanner, and code-quality-reviewer during schedule-triggered QA cycles. The standalone e2e mode is preserved for on-demand use. - Add e2e-tester teammate to qa-quality-prompt.md - Increase quality mode timeout from 35 to 40 min - Add "e2e" to trigger-server valid reasons - Re-enable daily schedule in qa.yml, default to "schedule" Co-authored-by: spawn-bot <spawn-bot@openrouter.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 19:34:35 -05:00
A	58c91571f1	fix: make macOS compat linter blocking and add 6 missing rules (#1867 ) The linter was running in CI with --warn-only, meaning it never blocked anything — effectively vaporware. This removes --warn-only to make it a real gate. Also adds rules for bash 4.0+ features that were documented in CLAUDE.md but not enforced: - MC014: readarray/mapfile (bash 4.0+) - MC015: coproc (bash 4.0+) - MC016: &>> redirect (bash 4.0+) - MC017: relative source paths (breaks curl|bash) - MC018: wait -n (bash 4.3+) - MC019: declare -g (bash 4.2+) Excludes .claude/worktrees/ from scanning (temp copies, not committed code). Co-authored-by: spawn-bot <spawn-bot@openrouter.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 03:50:12 -05:00
A	65f6f1be32	feat: Bun workspace monorepo — packages/cli + packages/shared (#1853 ) Restructure the repo as a Bun workspace monorepo: - Move cli/ → packages/cli/ - Create packages/shared/ (@openrouter/spawn-shared) with type-guards and parse utilities - Add root package.json with workspace configuration - Update all CLI imports to use @openrouter/spawn-shared - Deduplicate toRecord/toObjectArray helpers from 4 cloud modules - Update SPA (slack-bot) to use shared package instead of local toObj() - Update 48 agent shell scripts for new packages/cli/ path - Update install.sh, install.ps1, e2e, and test scripts - Update all GitHub workflows, .gitignore, pre-commit hooks - Update CLAUDE.md, README.md, and skill prompt references - Pin all dependency versions (no ^ ranges) - Bump CLI version 0.9.1 → 0.10.0 All 1908 tests pass. Lint clean. All 8 cloud bundles build. Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-23 22:07:05 -08:00
A	b84adfb74e	refactor: move all shell scripts to /sh directory (#1843 ) Reorganizes the project so all shell scripts live under a dedicated /sh directory, enabling the OpenRouter rewrite URL to point at /sh/ instead of the repository root. Moves: - cli/install.sh → sh/cli/install.sh - shared/.sh → sh/shared/.sh - {cloud}/{agent}.sh → sh/{cloud}/{agent}.sh (48 scripts) - {cloud}/README.md → sh/{cloud}/README.md - e2e/.sh → sh/e2e/.sh - test/macos-compat.sh → sh/test/macos-compat.sh - test/fixtures/*/.sh → sh/test/fixtures/*/.sh Updates all references: - RAW_BASE path construction in commands.ts, update-check.ts - GitHub auth URL in agent-setup.ts - Self-referencing URLs in install.sh, github-auth.sh - CI workflow paths in lint.yml, cli-release.yml - Test file paths in install-script-validation, manifest-integrity - Documentation in README.md, cli/README.md, CLAUDE.md - QA scripts in .claude/skills/ Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-23 21:14:54 -08:00
A	fccb73d147	feat: add Fly.io E2E test suite and QA e2e mode (#1823 ) - Add e2e/ directory with fly-e2e.sh orchestrator and lib/ helpers (provision, verify, teardown, cleanup) that provision real Fly.io VMs, verify agent installation, and tear everything down - Fix openclaw E2E failure by setting MODEL_ID=openrouter/auto to bypass interactive model selection prompt in headless mode - Add e2e mode to qa.sh (reason=e2e) that launches a Claude agent to run the E2E suite and investigate/fix any failures - Update qa.yml with reason dropdown (e2e/schedule/fixtures), kept disabled Co-authored-by: spawn-bot <spawn-bot@openrouter.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 19:31:44 -05:00
A	aa88e70488	fix: add concurrency guard and workflow_dispatch to CLI release (#1812 ) The race condition: two PRs merged 3 seconds apart both triggered the CLI Release workflow. The second run (v0.7.12) finished last and overwrote the release with a stale binary, even though the repo HEAD was at v0.8.0. - Add concurrency group so concurrent releases cancel the older one - Add workflow_dispatch trigger for manual re-runs Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-23 13:52:17 -05:00
A	a26d27f139	style: enforce biome format across codebase, add CI check (#1794 ) Run `biome format --write` on all 98 source files (38 needed fixes). The main change: object literals and long argument lists are now expanded onto separate lines per Biome's `"expand": "always"` setting, making code much easier to scan on narrow screens. Add `biome format` check step to CI lint workflow so formatting regressions are caught on every PR. Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-22 23:32:12 -08:00
A	ec210e37af	fix: Result monad for retry logic — prevent duplicate server creation (#1771 ) * fix: Result monad for retry logic — prevent duplicate server creation SSH exit 255 after an interactive session caused runWithRetries to retry the entire bash script, creating duplicate servers. The old withRetry also blindly retried all errors including timeouts where the remote command may have already completed. Introduces a Result<T> monad (Ok/Err) so callers explicitly signal whether a failure is retryable (return Err) or fatal (throw). Adds wrapSshCall() that classifies SSH errors: transient connection failures are retryable, timeouts are not. Removes retry loop from the top-level script runner entirely since it spans server creation + interactive session. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: mandate draft-PR-first workflow for all changes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add biome lint to CI and pre-commit hook, fix lint violations - Add Biome lint job to .github/workflows/lint.yml - Add TypeScript lint check to .githooks/pre-commit - Fix useBlockStatements violations in ui.ts and tests - Add biome lint to CLAUDE.md "After Each Change" checklist Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: rename Result.value to Result.data Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: clean up stale pre-commit hook - Remove dead check for deleted functions (write_oauth_response_file, create_oauth_response_html) — they no longer exist in the codebase - Fix early exit skipping Biome lint when no .sh files are staged - Replace echo -e with printf (the hook was using the pattern it bans) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve biome lint errors blocking CI - Fix useImportType: import { type Result } → import type { Result } - Fix noUnusedImports: remove unused KNOWN_FLAGS import - Fix noUnusedTemplateLiteral: template literal → string literal Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-22 20:39:42 -05:00
A	60986e5a05	refactor: remove shared/common.sh and 27 subprocess-heavy test files (#1728 ) shared/common.sh (3852 lines) was dead code — the entire architecture was rewritten to TypeScript in cli/src/. No agent scripts source it anymore. The only consumer was github-auth.sh which just needed 4 log functions (now inlined). Remove 27 test files that spawned ~800+ real bash/bun subprocesses per run (the root cause of slow bun test). Every shared-common-*.test.ts file forked a real bash shell per test case to source shared/common.sh. CLI subprocess tests spawned `bun run index.ts` per assertion. These were integration tests, not unit tests. Also removes: - mock-tests CI job from test.yml (ran test/mock.sh which opens browser) - Stale plan files referencing deleted infrastructure - All CLAUDE.md/README.md references to the old lib/common.sh pattern Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-22 11:32:27 -08:00
A	33bd3e615c	chore: disable QA workflow schedule until VM is fixed (#1722 ) Keep workflow_dispatch for manual testing. Re-enable cron when the QA VM is back online. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-22 11:06:50 -08:00
A	2303e65022	fix: allow rich text in bug report issue template (#1710 ) Remove `render: shell` from the "What happened?" textarea so users can paste screenshots, drag & drop files, and use markdown formatting instead of being forced into a plain-text code block. Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-22 10:04:20 -08:00
A	0f4df7be71	feat: pre-built Docker image for OpenClaw on Fly.io (#1686 ) Eliminates the slow waitForCloudInit() + bun install phase by booting a pre-built image with Node.js, bun, and openclaw already installed. The image is rebuilt daily via GitHub Actions to pick up new releases. Other agents are unaffected — they still use ubuntu:24.04 + cloud-init. Co-authored-by: spawn-bot <spawn-bot@openrouter.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 02:50:46 -05:00
A	262d081756	refactor: move fly TS into cli/src/fly/, add build-clouds.sh (#1604 ) Move all fly TypeScript files from fly/lib/.ts and fly/main.ts into cli/src/fly/. This gives them access to cli/node_modules (@clack/prompts), biome linting, and the existing bun:test infrastructure — no symlinks or NODE_PATH hacks needed. The org picker now uses @clack/prompts select() directly (static import, bundled at build time). New: cli/build-clouds.sh — auto-discovers cli/src//main.ts and bundles each into {cloud}.js. Scalable to future cloud TS migrations: bash cli/build-clouds.sh # build all bash cli/build-clouds.sh fly # build one Shims now check for cli/src/fly/main.ts (local) or download fly.js from GitHub releases (remote curl|bash). Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-21 12:34:09 -08:00
Ahmed Abushagur	f2795a6d84	fix: Node.js v22 upgrade, aider uv install, SSH & cloud reliability (#1440 ) * fix: use uv --upgrade to ensure Python 3.13-compatible Pillow across all clouds aider-chat on Python 3.13 fails with `ImportError: cannot import name '_imaging' from 'PIL'` when an old Pillow version (pre-10.4) is resolved — those releases have no Python 3.13 binary wheels, so the C extension is missing at runtime. Replace `--with 'Pillow>=10.2.0'` (which was silently broken — the `>` and single quotes get mangled by `printf '%q'` in run_server before the command reaches the remote machine) with `--upgrade`, which forces all transitive deps including Pillow to their latest compatible versions. Also adds a plain-text echo before the install so users see progress instead of a silent hang during the 2-4 minute install. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test: update aider/gptme/interpreter assertions from pip to uv The install method for aider, gptme, and open-interpreter was changed from pip to `uv tool install` across all clouds. The mock test assertions still checked for the old `pip.install.` patterns, causing 9 failures (3 agents × 3 clouds). Update patterns to match the actual `uv tool install` commands now used in all cloud scripts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * ci: trigger test run for uv assertion fix * fix: prevent SSH hangs, restore stderr, fix command escaping across clouds - Add < /dev/null to ssh_run_server and generic_ssh_wait to prevent SSH stdin theft causing sequential install/verify/configure steps to hang - Add ServerAliveInterval, ServerAliveCountMax, ConnectTimeout to default SSH_OPTS so long-running installs don't silently drop on flaky networks - Remove 2>/dev/null from Fly.io run_server so remote command errors are no longer silently swallowed (--quiet flag still suppresses flyctl noise) - Fix Fly.io printf '%q' double-quoting: remove extra quotes around $escaped_cmd that prevented the remote shell from consuming escapes, breaking && || | operators in commands - Remove broken printf '%q' from Daytona run_server and interactive_session where it escaped shell operators into literal characters since daytona exec has no intermediate shell layer - Pin aider to --python 3.12 instead of --with audioop-lts across all clouds Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add --pty to fly ssh console for interactive sessions fly ssh console -C does not allocate a pseudo-terminal by default, causing interactive TUI agents (aider, claude) to fail with "Input is not a terminal (fd=0)" or completely unresponsive input. Adding --pty forces PTY allocation, matching how other clouds handle interactive sessions (SSH uses -t, Sprite uses -tty). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: prepend ~/.local/bin to PATH in ssh_run_server After uv installs to ~/.local/bin, the current shell session doesn't have it in PATH, causing "uv: command not found" on DigitalOcean and all other SSH-based clouds (Hetzner, AWS, GCP, OVH). Fly.io's run_server already prepends this PATH — now the shared ssh_run_server does the same, fixing all SSH-based clouds at once. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add Node.js to cloud-init for all cloud providers npm-based agents (codex, kilocode, etc.) fail with "npm: command not found" because Node.js isn't installed during cloud-init. Fly.io was the only provider installing Node.js (in wait_for_cloud_init). Now all cloud-init scripts install Node.js v22 LTS from nodesource, matching Fly.io's setup. Also adds ~/.local/bin to PATH in AWS and GCP cloud-init (was already in shared/DigitalOcean/Hetzner). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use apt packages for nodejs/npm instead of nodesource The nodesource setup script (setup_22.x) runs its own apt-get update and repository configuration, nearly doubling cloud-init time and causing hangs on DigitalOcean. Ubuntu 24.04 includes nodejs and npm in its default repos — just add them to the packages list. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add timeouts and better error handling to Daytona CLI commands Daytona CLI commands (login, list, create) can hang indefinitely when the API is slow or unreachable. This causes: - "Failed to create sandbox: timeout" with no recovery - Token validation timeouts misreported as "invalid token" - Users re-entering valid tokens that also timeout Fixes: - Wrap all daytona CLI calls with timeout (30s for auth, 120s for create) - Detect timeout errors separately from auth errors - Show actionable "try again / check status" messages for timeouts - Add nodejs/npm to Daytona wait_for_cloud_init Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: set DAYTONA_API_URL to Daytona Cloud by default The Daytona CLI may default to connecting to a local self-hosted server instead of Daytona Cloud. Without DAYTONA_API_URL set to https://app.daytona.io/api, every CLI command (login, list, create) hangs trying to reach a non-existent local server and times out. The SDK documents this as the default, but the CLI doesn't always pick it up — now we export it explicitly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: symlink n-installed Node.js v22 over apt v18 to prevent shadowing n installs Node.js v22 to /usr/local/bin/node but apt's v18 at /usr/bin/node can shadow it in non-interactive SSH sessions. After n 22, symlink the new binaries over the apt ones so v22 is always resolved. Also fix hcloud CLI token extraction for new TOML format. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address security review, add curl timeouts to trigger workflows - Fix ssh_run_server command injection concern: use single-quoted path_prefix so $HOME/$PATH expand remotely, not locally - Add --connect-timeout 15 --max-time 30 to trigger workflows to prevent 5-min hangs when server streams responses - Handle 409 (dedup) as success — expected when cron fires every 15min but cycles take 35min - Reduce workflow timeout-minutes from 5 to 2 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 06:54:07 -05:00
Ahmed Abushagur	22b6a402f4	feat: E2E test harness, QA pipeline integration, macOS compat linter (#1425 ) * feat: add QA upgrade — macOS compat linter, per-agent mock assertions Layer 1: macOS compat linter (test/macos-compat.sh) - 12 rules (MC001–MC012) catching bash 3.2 incompatibilities - Detects: base64 -w0 file args, non-portable echo flags, source <(), ((var++)), read -d, nounset flag, sed -i, date %N, local -n, declare -A, ${var,,}, and |& - Added to CI lint.yml in warn-only mode for burn-in - Integrated as Phase 0.5 in qa-dry-run.sh Layer 2: Per-agent mock assertions - test/fixtures/_shared_agent_assertions.sh with install checks for all 15 agents (claude, openclaw, aider, goose, etc.) - Integrated into test/mock.sh via _run_agent_assertions() Also includes branch fixes: - Fix base64 -w0 to use stdin redirect (aws, daytona, fly) - Fix fly/openclaw to use npm install instead of broken curl|bash Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add E2E test harness and integrate into QA pipeline Add test/e2e.sh — a full E2E test harness that provisions real servers, installs agents, and verifies setup across all clouds. Features: - Smoke test (one canary agent per cloud) and full matrix modes - Credential auto-detection for 8 clouds - Per-cloud preflight validation (sequential) then parallel agent tests - Stale server cleanup, timing history, cross-cloud comparison - Auto-fix and optimization phases via Claude agents - macOS bash 3.2 compatible Integrate E2E as Phase 5 in both qa-cycle.sh and qa-dry-run.sh: - Runs after mock tests pass, gated on cloud credentials - Phase 5b auto-fixes failures using per-agent worktree branches - Parses results and includes in QA summary Also fixes: - shared/common.sh: honour SPAWN_NON_INTERACTIVE=1 in safe_read() - aws/lib/common.sh: fix SSH key import (use cat instead of base64, handle race condition on concurrent imports) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:41:07 -05:00
L	6e13256d96	refactor: simplify claude launch — no streaming, no output monitoring (#1412 ) Replace the complex claude launch pattern (subshell + PID file + tee pipe + stream-json + 50-line watchdog monitoring log file growth + session-end detection) with a simple direct launch: claude -p "..." >> "${LOG_FILE}" 2>&1 & The watchdog is now just a wall-clock timeout. The idle-output detection, stream-json result parsing, and tee piping are all removed. Also remove GitHub Actions concurrency groups — the trigger server already handles dedup (409 for same issue, 409 for same reason), making the GH Actions concurrency groups redundant queuing. Changes: - refactor.sh: simple launch + wall-clock-only watchdog - security.sh: same simplification - discovery.sh: same (refactored _kill_claude_process and _run_watchdog_loop to simpler signatures) - All 4 workflows: remove concurrency groups Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-17 09:02:47 -08:00
L	f3cfe890f7	refactor: simplify trigger server to fire-and-forget + fix monitoring loop prompts (#1384 ) The trigger server streamed script stdout back to GitHub Actions via a long-lived HTTP response, requiring --http1.1, heartbeat injection, server.timeout(req, 0), createEnqueuer, drainStreamOutput, and 90-min GH Actions timeouts. In practice GitHub Actions is just a dumb trigger — the real state lives on the VM (log files, journalctl). Simplify to fire-and-forget: spawn script, return 200 JSON immediately. Also fix the refactor and discovery team lead monitoring loops. The prompts buried the loop in a single compressed line that the model ignored (doing Bash("sleep 10") repeatedly without calling TaskList). Replace with a dedicated "Monitor Loop (CRITICAL)" section with numbered steps, matching the security.sh pattern that actually works. Changes: - trigger-server.ts: remove ~150 lines of streaming code (createEnqueuer, drainStreamOutput, startStreamingRun, heartbeat, ReadableStream), replace with startFireAndForgetRun (stdout: "inherit", immediate JSON) - All 4 workflows: simple curl POST, timeout-minutes 90→5, remove --http1.1/-N/--max-time/exit-code handling - refactor.sh: add Monitor Loop (CRITICAL) section with numbered steps - discovery-team-prompt.txt: same Monitor Loop fix - SKILL.md: update architecture docs, remove streaming sections Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-17 10:47:52 -05:00
A	99a9badf62	ci: increase refactor team frequency to every 15 minutes (#1378 ) Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 20:50:03 -08:00
A	ffd9626ae9	simplify issue templates — let the refactor team triage (#1368 ) Remove verbose fields (dropdowns, use cases, environment, proposed UX) from all issue templates. Humans just need to say what they want; the refactor team handles enrichment and triage. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 17:45:18 -08:00
Ahmed Abushagur	3fbdf56c4c	fix: add guardrails to prevent bots from inventing unnecessary work (#1347 ) - Add team lead pre-approval gate: teammates spawn in plan mode and must get approval before creating any PR (hard gate, not just prompt rules) - Add diminishing returns rule: default posture is "code is good, shut down" - Add dedup rule: check for existing open/closed PRs before creating new ones - Require concrete PR justification (what breaks without this change) - Add off-limits files list (.github/workflows, .claude/skills, CLAUDE.md) - Use git pathspec exclusions in refactor.sh to never stage protected files - Constrain pr-maintainer to only act on approved or feedback PRs - Reduce refactor cron from every 5 minutes to every 2 hours Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 20:24:25 -05:00
A	a4fe0388c1	fix: allow repo collaborators through the gate workflow (#1166 ) Previously only org members were allowed. Now checks both org membership and repo collaborator status, so invited collaborators can open issues and PRs without being blocked. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-14 18:32:50 -08:00
A	8108d57999	fix: add write permissions to gate workflow (#1148 ) The default GITHUB_TOKEN lacks issues and pull-requests write access, causing 403 when trying to close issues/PRs from non-org members. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-14 16:37:49 -08:00
A	2a5137a919	feat: add gate workflow to restrict issues/PRs to org members (#1146 ) Automatically closes issues and PRs opened by non-members of the OpenRouterTeam org with an explanatory comment. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-14 19:33:02 -05:00
A	d589b0d74e	fix: tilde expansion in upload_config_file + bump refactor frequency (#1131 ) Fix #1114 — `mv` failed because `~/.claude/settings.json` was single-quoted on the remote shell, preventing tilde expansion. Remove the single quotes around remote_path and add a mkdir -p safety net. Also bump the refactor team cron from hourly to every 5 minutes. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-14 17:08:36 -05:00
L	0a0512652a	chore: reduce workflow cron frequencies (#1046 ) - discovery: every 30 min → every 3 days - refactor: every 5 min → hourly - security: every 5 min → every 30 min Co-authored-by: Security Reviewer <security-reviewer@spawn.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-13 18:55:40 -08:00
Ahmed Abushagur	b4abe8012f	fix(ci): propagate mock test exit code and fix broken pipe in summary (#1032 ) * fix(ci): propagate mock test exit code and fix broken pipe in summary The test workflow had three issues: - mock.sh exit code was swallowed by tee (no pipefail), so the check always passed even with 165 failures - grep|head pipe caused "write error: Broken pipe" in post summary - Summary was noisy with 100+ individual result lines Now uses PIPESTATUS[0] to capture the real exit code, shows a clean results line plus collapsible failures list, and fails the check when tests fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(ci): report test results without blocking PRs Pre-existing failures (165) shouldn't block unrelated PRs. The summary still shows pass/fail counts and a collapsible failures list so the bot can see the results. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * perf(ci): increase QA cycle frequency from daily to every 4 hours Daily runs meant breakage could go undetected for up to 24 hours. Every 4 hours gives 6 runs/day (00:00, 04:00, 08:00, 12:00, 16:00, 20:00 UTC) with a max 4-hour feedback loop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(ci): add missing Check results step to fail on test errors Addresses review feedback: - The exit code was captured via PIPESTATUS[0] into GITHUB_OUTPUT but no subsequent step consumed it, so the workflow always passed even when tests failed. Added a "Check results" step that reads the captured exit code and fails the job accordingly. - Reverted QA cron schedule change (every 4 hours back to daily at 06:00 UTC) as it was unrelated to the test exit code fix and should be proposed separately if desired. Agent: pr-maintainer Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: A <6723574+louisgv@users.noreply.github.com>	2026-02-13 20:46:45 -05:00
Ahmed Abushagur	d501b5eb1d	fix: CI test summary uses NO_COLOR instead of sed hack (#985 ) * fix: strip ANSI colors before grepping test summary The mock test output uses ANSI escape codes for colored ✓/✗/━━━ characters, so the grep in the Post summary step couldn't match them. Strip colors with sed first. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use NO_COLOR standard instead of sed to strip ANSI codes mock.sh now respects the NO_COLOR env var (https://no-color.org/). CI sets NO_COLOR=1 so grep matches ✓/✗/━━━ cleanly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 11:26:41 -08:00
Ahmed Abushagur	50b2e98d7d	ci: add mock test workflow for PRs (#977 ) Runs `bash test/mock.sh` on every pull request targeting main. Includes concurrency grouping to cancel stale runs and a 10-minute timeout. Results are posted to the GitHub Actions step summary. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 11:11:15 -08:00
L	f69f95c7c7	refactor: Simplify security workflow to match discovery/refactor pattern (#929 ) Move mode-detection logic from the GitHub Actions workflow into security.sh where it belongs. The workflow now passes github.event_name directly as the reason parameter (like discovery.yml and refactor.yml), and security.sh uses `gh issue view` to check labels when reason=issues. - Remove 25-line if/elif/else reason-mapping block from security.yml - Remove workflow_dispatch mode input (server-side handles it) - Add `if:` label guard for issues (safe-to-work + team-building/security) - Add `labeled` to issue trigger types - Set cancel-in-progress: false (prevents killing long review_all runs) - Bump cron to */5 - Handle schedule/workflow_dispatch → review_all in security.sh - Keep backwards compat for direct team_building/triage reasons Co-authored-by: Security Reviewer <security-reviewer@spawn.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-13 05:26:21 -08:00
L	49bb39c8ec	fix: prevent duplicate review_all runs via reason-based dedup (#848 ) Two problems: 1. Schedule was every 20 min but review_all cycles take 35 min, causing overlapping triggers that fill both slots 2. Trigger server only deduped by issue number, not by reason, so two review_all runs could stack up Fixes: - Change schedule from */20 to 0,45 (every 45 min) - Add reason-based dedup in trigger-server.ts: reject 409 if a non-issue run with the same reason is already in progress Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-13 01:41:11 -08:00
L	56c4c020d5	feat: consolidate security review_all and scan into single 20-min cycle (#802 ) The two scheduled modes (review_all every 15 min, scan every 30 min) competed for MAX_CONCURRENT=1 on the trigger server, causing 429 drops and 30-55+ min gaps. Merge both into a single cycle that runs every 20 min, prioritizing PR review but also performing lightweight repo scanning when capacity allows (≤5 open PRs). Also prevents refactor agents from closing issues manually — issues now auto-close via `Fixes #N` in the PR body when merged. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-12 20:29:56 -08:00
L	f7c6e07867	feat: security triage applies full label taxonomy (#766 ) * feat: security triage now applies full label taxonomy Triage mode now applies: - Safety label (safe-to-work / malicious / needs-human-review) - Content-type label (bug, enhancement, security, question, etc.) - Lifecycle label (Pending Review) so downstream teams can pick up Team-building mode now transitions lifecycle labels: - Adds "In Progress" at start, removes it on close Added a "Available Labels Reference" section to the triage prompt documenting all label categories for the agent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: all security-filed issues get safe-to-work + Pending Review Issues filed by the security team (scan findings, drift/anomaly reports, follow-up issues from closed PRs) now automatically get `safe-to-work` and `Pending Review` labels so downstream teams can immediately pick them up without waiting for another triage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove Pending Review from safe-to-work issues safe-to-work already means triage is complete — adding Pending Review is redundant and confusing. Now only UNCLEAR issues get Pending Review (they still need human attention). SAFE issues and security-filed issues skip straight to actionable with just safe-to-work. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: normalize all labels to kebab-case Renamed on GitHub: - "In Progress" → "in-progress" - "Pending Review" → "pending-review" - "Under Review" → "under-review" - "good first issue" → "good-first-issue" - "help wanted" → "help-wanted" Updated all references in security.sh and refactor.sh to match. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: align issue templates and workflows with actual repo labels Created missing labels: cloud-request, agent-request, cli. Replaced nonexistent needs-triage with pending-review in all templates. 
Templates updated: - bug_report: bug + pending-review - cli_feature_request: cli + enhancement + pending-review - cloud_request: cloud-request + enhancement + pending-review - agent_request: agent-request + enhancement + pending-review Workflows updated: - refactor.yml: trigger on safe-to-work AND (bug\|cli\|enhancement\|maintenance) - discovery.yml: already correct (safe-to-work AND cloud-request\|agent-request) - security.yml: already correct (team-building label check) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Sprite <noreply@sprites.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-12 16:20:07 -08:00
L	15e2ca6caf	feat: consolidate security modes — merge pr+hygiene into review_all (#739 ) Simplify from 6 modes (Hexa-Mode) to 4 modes (Quad-Mode) by folding single-PR review and hygiene into a unified review_all mode that runs every 15 minutes. This removes the pull_request trigger entirely since review_all catches all open PRs on schedule, and absorbs staleness checks + branch cleanup into the same cycle. Remaining modes: team_building, triage, review_all, scan. Co-authored-by: Sprite <noreply@sprites.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-12 14:53:26 -08:00
L	4924a7d5db	feat: add security triage gate for issue safety before agent processing (#734 ) New issues are triaged by the security team before other workflows can act on them. The triage agent checks for prompt injection, social engineering, spam, and unsafe payloads — marking safe issues with `safe-to-work`, closing malicious ones, or flagging unclear ones for human review. Discovery and refactor workflows now require the `safe-to-work` label in addition to their existing label requirements. Co-authored-by: Sprite <noreply@sprites.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-12 14:23:33 -08:00
L	4d175ae6c7	feat: add Team Building issue template + route workflows by label (#733 ) - New issue template: Team Building (team-building label) — 2 fields: which agent team to improve + what to change - Security team gets a new team_building mode: reads the issue, spawns implementer + reviewer (both Opus), creates PR, reviews, merges, closes issue - Discovery workflow: only triggers on cloud-request / agent-request issues - Refactor workflow: only triggers on bug / cli issues - Security workflow: only triggers on team-building issues (+ PR/schedule) - All workflows still run on schedule and workflow_dispatch as before Co-authored-by: Sprite <noreply@sprites.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-12 14:17:57 -08:00
L	56ba47109c	feat: add security review team for PR review (#543 ) (#730 ) * feat: add security review team for PR review (#543) Adds a security team that automatically reviews every PR for security issues (injection, credential leaks, unsafe patterns, macOS compat) and sends Slack notifications to #spawn when concerns are found. - security.sh: dual-mode cycle script (PR review + scheduled scan) - security.yml: GitHub Actions workflow on pull_request events - start-security.sh: gitignored wrapper with secrets (deployed) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: expand security team with hygiene, scan modes + auto-merge clean PRs - PR mode: 2-agent team (code-reviewer + test-verifier) reviews PRs. If zero findings, auto-approves AND merges. If concerns, requests changes and sends Slack notification to #spawn. - Hygiene mode (every 6h): pr-triager + branch-cleaner close stale PRs, file follow-up issues, delete orphan branches. - Scan mode (daily): shell-auditor + code-auditor + drift-detector perform full repo security audit, file GitHub issues for findings. - All modes use Claude Code agent teams (TeamCreate, parallel teammates via Task tool, SendMessage coordination, TaskList monitoring). - Workflow updated with schedule triggers and workflow_dispatch inputs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: upgrade all security auditor agents to Opus model All security-critical roles (code-reviewer, pr-triager, shell-auditor, code-auditor) now use Opus. Helper roles (test-verifier, branch-cleaner, drift-detector) remain on Haiku. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: auto-merge PRs with MEDIUM/LOW or no findings Only CRITICAL/HIGH findings block a PR. MEDIUM/LOW are informational notes included in the approving review — PR still gets merged. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Sprite <noreply@sprites.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-12 14:04:38 -08:00
L	d961947983	fix: download pre-built CLI from GitHub release when local build fails (#728 ) Root cause: bun install creates empty directories in proot (Termux) because proot can't intercept bun's symlink/hardlink/copy_file_range syscalls. This breaks both local build and source-mode fallback. Fix: when `bun run build` fails, download the pre-built cli.js from the `cli-latest` GitHub release. The bundled binary is self-contained (80KB, all deps inlined) and only needs the bun runtime. - Add CI workflow (.github/workflows/cli-release.yml) that builds and uploads cli.js to a rolling `cli-latest` release on every push to main - Replace broken source-mode fallback with GitHub release download - Bump CLI version to 0.2.63 Co-authored-by: Sprite <noreply@sprite.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-12 13:48:45 -08:00
Sprite	7abfb045af	chore: Improve issue templates and add CLI feature request Agent request: remove redundant name field (already in title), broaden traction criteria to include fork activity and venture funding. Cloud request: remove redundant name field (already in title), consolidate API docs and billing into Additional Context. New: CLI feature request template for spawn CLI improvements. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-11 16:34:01 +00:00
Ahmed Abushagur	8b9f9a0e5a	QA-Bot setup (#335 ) * feat: testing * feat: auto-fix dead apis * fix: mock works * feat: new fixtures * fix: more clouds tested * fix: dry run fix * fix: civo valid size * fix: civo result wait * feat: fixtures * feat: per cloud agent	2026-02-10 19:51:07 -08:00
B	200b6dc5b2	fix: Force HTTP/1.1 for streaming to avoid HTTP/2 stream errors HTTP/2 has strict stream lifecycle management that doesn't play well with long-lived chunked responses — curl exits with error 92 (stream not closed cleanly: INTERNAL_ERROR). HTTP/1.1 handles persistent streaming connections natively. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-10 22:51:35 +00:00
B	874b9c95f4	feat: Stream script output back to GH Actions instead of keep-alive Replace the broken keep-alive ping loop with a fundamentally better approach: the trigger server now streams the script's stdout/stderr back as the HTTP response body in chunks. The GH Action holds the curl connection open for the entire cycle duration (~90 min timeout). This works because Sprite keeps VMs alive while "actively servicing HTTP requests." A single long-lived streaming response satisfies this naturally — no synthetic pings needed. Key changes: trigger-server.ts: - /trigger now returns a streaming text/plain Response - stdout/stderr piped through ReadableStream with chunked output - 30s heartbeat lines injected during silent periods - Client disconnect handled gracefully (process keeps running) - X-Accel-Buffering: no header to prevent proxy buffering discovery.yml / refactor.yml: - curl -sSN --fail-with-body streams output in real-time - timeout-minutes: 90 to hold the connection for full cycles - Error responses (429/409/401) still print body and exit cleanly discovery.sh / refactor.sh: - Removed all keep-alive logic (start_keepalive/stop_keepalive) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-10 18:09:26 +00:00
A	6f47c852c8	Increase refactor workflow frequency from 30min to 5min Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-10 16:24:12 +00:00
B	1029320cff	refactor: Rename improve to discovery and remove improve CLI command Rename the GitHub workflow, scripts, and service from "improve" to "discovery" to better reflect what the automation does. Remove the `spawn improve` CLI command entirely — the discovery/refactor loops are internal automation, not user-facing CLI features. File renames: - .github/workflows/improve.yml → discovery.yml - .claude/skills/.../improve.sh → discovery.sh - .claude/skills/.../start-improve.sh → start-discovery.sh - Service: improve-trigger → discovery-trigger Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-10 16:13:56 +00:00
A	7ace2695e6	feat: Run issue-fix cycles concurrently with refactor cycles (#145 ) Issue triggers now spawn lightweight 2-agent runs (15-min timeout) in isolated worktrees, while refactor cycles continue independently with the full 6-agent team (30-min timeout). Duplicate issue runs are rejected with 409. - trigger-server.ts: pass SPAWN_ISSUE/SPAWN_REASON env vars to script, add issue dedup (409), include issue in health/trigger responses - refactor.sh: dual-mode (issue vs refactor) with isolated worktrees, mode-specific prompts and timeouts, scoped cleanup - start-refactor.sh: set MAX_CONCURRENT=3 (gitignored, local only) - refactor.yml: handle 409 alongside existing 429 Co-authored-by: A <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-09 22:15:19 -08:00
B	6b5a547e2d	fix: Treat 429 (cycle already running) as success in workflows When MAX_CONCURRENT=1 and a cycle is in progress, the trigger server returns 429. This is expected behavior, not an error — the previous curl -f treated it as failure (exit code 22). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-10 03:43:32 +00:00
A	ab343d26a2	fix: Prevent duplicate work, add graceful shutdown, and enforce team lifecycle (#86 ) - Change trigger-server MAX_CONCURRENT default from 3 to 1 to prevent overlapping cycles that duplicate GitHub issue comments - Add SIGTERM/SIGINT handling to trigger-server so running scripts finish gracefully on service restart instead of being killed mid-flight - Add cleanup trap to refactor.sh for worktree/tempfile cleanup on exit - Add pre-cycle cleanup of stale worktrees, merged branches, and abandoned PRs from previously interrupted cycles - Add mandatory Lifecycle Management section to team lead prompt requiring shutdown_request to all teammates before exiting - Add dedup checks to community-coordinator: check existing comments before posting to prevent duplicate acknowledgments/resolutions - Pass issue number in workflow trigger reason for better logging Co-authored-by: A <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-09 09:10:56 -08:00
B	aeec170dfa	feat: Add agent and cloud request issue templates Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-09 10:10:10 +00:00

74 commits