vrr/spawn

mirror of https://github.com/OpenRouterTeam/spawn.git synced 2026-04-28 11:59:29 +00:00

Author	SHA1	Message	Date
Ahmed Abushagur	141254c4e1	feat: ARM tarball builds + arch-aware download (#2248 ) * feat: ARM tarball builds + arch-aware download - Add ARM64 matrix entries for native binary agents (zeroclaw, opencode, hermes, claude) in agent-tarballs.yml workflow - Update agent-tarball.ts to detect remote VM arch via uname -m and download the correct tarball (x86_64 or arm64) - Change release strategy to support multiple arch assets per tag - Document ARM build requirements in discovery.md for future agents - Bump CLI version to 0.15.2 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use sudo for tarball extraction on non-root SSH clouds On AWS Lightsail, SSH connects as 'ubuntu' (not root), but tarballs extract to /root/. Without sudo, tar fails with "Permission denied". Conditionally use sudo when not running as root (id -u != 0). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 17:10:33 -05:00
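Note on 141254c4e1: the arch-aware download and the conditional sudo described above might look roughly like the sketch below. It assumes the output of `uname -m` and the remote user id have already been fetched over SSH. The function names, the accepted aliases (`amd64`, `aarch64`) and the `-C /` extraction target are illustrative assumptions, not taken from agent-tarball.ts.

```typescript
// Hypothetical helper names; only the x86_64/arm64 split and the
// "sudo when not root" rule come from the commit message.
type TarballArch = "x86_64" | "arm64";

// Map raw `uname -m` output to the tarball architecture label.
function archFromUname(unameOutput: string): TarballArch | null {
  const machine = unameOutput.trim();
  if (machine === "x86_64" || machine === "amd64") return "x86_64";
  if (machine === "aarch64" || machine === "arm64") return "arm64";
  return null; // unknown arch: caller should fall back to live install
}

// Build the extraction command, prefixing sudo when the SSH user is not root
// (the `id -u != 0` case). Assumes tarballPath contains no single quotes.
function extractCommand(tarballPath: string, uid: number): string {
  const sudo = uid === 0 ? "" : "sudo ";
  return `${sudo}tar -xzf '${tarballPath}' -C /`;
}
```

Returning `null` for an unrecognized machine string keeps the decision to fall back with the caller instead of guessing an architecture.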
Ahmed Abushagur	ba9690ea23	fix: tarball workflow failures (root ownership, swapfile, hermes TTY) (#2240) - Use sudo mv + chown for tarball in release step (root-owned from capture) - Skip swapfile creation if /swapfile already exists (GitHub Actions runners) - Tolerate hermes setup wizard failure when /dev/tty unavailable in CI Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 05:48:46 -05:00
Ahmed Abushagur	8072c084c2	feat: pre-built agent tarballs for fast install (#2232 ) * feat: pre-built agent tarballs on GitHub Releases for fast install Adds a nightly GitHub Actions workflow that builds and uploads agent tarballs to rolling GitHub Releases. During provisioning, the CLI now attempts to download and extract a tarball before falling back to live install. Priority chain: snapshot > tarball > live install. - New workflow: .github/workflows/agent-tarballs.yml - New capture script: packer/scripts/capture-agent.sh - New module: packages/cli/src/shared/agent-tarball.ts - Orchestrate tries tarball first on non-local clouds - Skip tarball when using DO snapshot (skipTarball flag) - Tests for tarball install + orchestration integration Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use global.fetch mock pattern and address security review - Use `global.fetch = mock(...)` instead of `spyOn(globalThis, "fetch")` to match codebase convention and fix CI mock interception - Add URL validation regex to reject shell metacharacters (CRITICAL) - Add agent name validation in workflow input (MEDIUM) - Add `jq has()` check before executing install commands (CRITICAL) - Use `tar -T` instead of unquoted word-splitting in capture-agent.sh (MEDIUM) - Resolve merge conflicts with upstream/main (keep Docker fields, adapt to simplified DO flow, bump version to 0.15.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use globalThis.fetch for testability in CI Bun's native fetch binding doesn't go through global.fetch property lookup, so global.fetch = mock(...) doesn't intercept it. Using globalThis.fetch explicitly ensures the mock interception works. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add missing packer dependencies and harden install command safety - Add packer/agents.json (agent tier + install command definitions) - Add packer/scripts/tier-{minimal,node,bun,full}.sh (dependency scripts) - Add basic command safety check rejecting suspicious patterns - Document packer/agents.json as a trust boundary requiring PR review Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tarballs): fix npm prefix mismatch, add apt-get update, cleanup - Add apt-get update -y before apt-get install in all tier scripts - Add --prefix ~/.npm-global to npm install commands in agents.json so installed packages land where capture-agent.sh expects them - Rename misleading MARKER_DIR → MARKER_FILE in capture-agent.sh - Remove stale comment referencing packer snapshots in workflow Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tarballs): detect empty agent installs in capture script The "no files found" check was dead code — the marker file is always created before filtering, so FILTERED_FILE always had at least one entry. Now we count non-marker entries to catch cases where the agent install silently fails and no actual files are on disk. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tarballs): use bare fetch() for Bun mock compatibility in CI In Bun, global.fetch = mock(...) overrides bare fetch() calls but NOT globalThis.fetch() calls. Every other source file in the codebase uses bare fetch() and their mocks work fine in CI. Switch to match. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tarballs): use dependency injection for fetch in tests Bun's global.fetch mock doesn't reliably intercept bare fetch() calls across all Bun versions in CI. Instead of fighting the runtime, accept an optional fetchFn parameter (defaults to fetch) and pass mock fetch directly in tests. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tarballs): bypass mock.module bleed in agent-tarball tests orchestrate.test.ts uses mock.module("../shared/agent-tarball", ...) which is process-global in Bun and bleeds into agent-tarball.test.ts. Import via URL (import.meta.url resolution) to bypass the specifier- based mock matching and get the real module. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tarballs): eliminate mock.module bleed between test files Bun's mock.module is process-global — orchestrate.test.ts mocking agent-tarball poisoned agent-tarball.test.ts (the mock function ignored the fetchFn parameter and always returned false). Fix: make tryTarballInstall injectable via OrchestrationOptions. orchestrate.test.ts passes the mock directly via options instead of using mock.module. agent-tarball.test.ts imports the real module. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tests): mock Bun.which in credential priority tests Tests assumed no cloud CLIs were installed, but machines with hcloud/ doctl would get "CLI installed" hint overrides, failing the assertion. Spy on Bun.which to return null so tests are environment-independent. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: fix import ordering after rebase Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * security: add curl domain allowlist and expand command blocklist Addresses security review findings: - Add domain allowlist for curl/wget targets (claude.ai, opencode.ai, raw.githubusercontent.com, registry.npmjs.org, crates.io, github.com) - Expand suspicious command blocklist (python -c, perl -e, ruby -e, dd, /dev/) - Document 4-layer security model in workflow comments Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * security: add rm -rf to command blocklist Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Signed-off-by: Ahmed Abushagur <ahmed@abushagur.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 04:49:39 -05:00
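Note on 8072c084c2: the tarball-then-live fallback and the injectable `tryTarballInstall` option might be sketched as follows. This is a simplified, synchronous model (the real code is presumably async). Apart from `tryTarballInstall`, `OrchestrationOptions` and `skipTarball`, which the commit message names, every identifier and signature here is hypothetical. Snapshot handling is not modelled beyond `skipTarball` disabling the tarball attempt.

```typescript
type InstallPath = "tarball" | "live";

interface OrchestrationOptions {
  isLocalCloud: boolean;
  skipTarball?: boolean; // e.g. set when a snapshot is in use
  tryTarballInstall?: (agent: string) => boolean; // injected by tests
}

// Placeholder default for this sketch; a real one would download and extract.
function defaultTryTarballInstall(_agent: string): boolean {
  return false;
}

function installAgent(
  agent: string,
  liveInstall: (agent: string) => void,
  options: OrchestrationOptions,
): InstallPath {
  const tryTarball = options.tryTarballInstall ?? defaultTryTarballInstall;
  const tarballAllowed = !options.isLocalCloud && !options.skipTarball;
  // Tarball first on non-local clouds; any failure falls through to live install.
  if (tarballAllowed && tryTarball(agent)) return "tarball";
  liveInstall(agent);
  return "live";
}
```

Passing the mock through options, rather than through `mock.module`, keeps the substitution local to one call, which is the property the commit message says fixed the cross-file bleed.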
Ahmed Abushagur	77c3e34803	feat(docker): replace Packer snapshots with Docker-based agent delivery (#2206 ) * feat(docker): replace Packer snapshots with Docker-based agent delivery Docker images on GHCR are public and cross-account, unlike DO snapshots which are private/account-scoped. Cloud-init installs Docker + pulls the agent image during boot. The install step extracts pre-built binaries via `docker cp` and falls back to normal install if unavailable. - Add Dockerfiles for all 7 agents (claude, codex, openclaw, opencode, kilocode, zeroclaw, hermes) - Convert docker.yml to matrix build for all agents - Add tryInstallFromDocker() shared helper with Docker-first install - Add Docker pull to DigitalOcean cloud-init userdata - Remove Packer snapshot pipeline, lookup, and SSH-only wait - Remove packer/ directory (HCL templates, tier scripts, agents.json) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * security: address review findings in docker agent delivery - Add agentName validation regex (/^[a-z0-9-]+$/) in digitalocean.ts before interpolation into cloud-init script - Quote dockerImage variable in all docker command strings in agent-setup.ts to prevent command injection - Restrict docker cp to specific known directories (.claude, .bun, .local, .npm, .cargo, .opencode) instead of blanket /root/. Agent: pr-maintainer Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: B <6723574+louisgv@users.noreply.github.com>	2026-03-05 11:23:56 -05:00
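Note on 77c3e34803: validating the agent name before interpolating it into a shell line could look like the sketch below. The pattern `/^[a-z0-9-]+$/` is quoted in the commit message. The function name, the `registry` parameter and the `spawn-<agent>:latest` image naming are assumptions made for illustration.

```typescript
// Pattern taken from the commit message; everything else is hypothetical.
const AGENT_NAME_PATTERN = /^[a-z0-9-]+$/;

function dockerPullLine(agentName: string, registry: string): string {
  if (!AGENT_NAME_PATTERN.test(agentName)) {
    // Reject before the value can reach a shell or cloud-init script.
    throw new Error(`Invalid agent name: ${JSON.stringify(agentName)}`);
  }
  // registry is assumed to be a trusted constant in this sketch.
  const image = `${registry}/spawn-${agentName}:latest`;
  // Quote the image reference, as the second commit in this PR describes.
  return `docker pull "${image}"`;
}
```

For example, `dockerPullLine("claude", "ghcr.io/example")` yields `docker pull "ghcr.io/example/spawn-claude:latest"`, while a name containing `;` or whitespace throws.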
Ahmed Abushagur	07c2c08e3a	revert: remove Packer snapshot pipeline (#2205 ) DO snapshots are private and account-scoped — users on different accounts cannot see snapshots built by the CI token. Docker images are the better approach for cross-account pre-built agents. Removes: packer/, packer-snapshots workflow, snapshot lookup code, and snapshot test. Reverts DO CLI to plain cloud-init flow. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 02:48:52 -05:00
Ahmed Abushagur	96ffb3e201	fix(packer): pass var file explicitly to packer build (#2203) Packer wasn't auto-loading build.auto.pkrvars.json, causing "Unset variable" errors. Pass it explicitly with -var-file. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 01:16:41 -05:00
Ahmed Abushagur	ed98a59318	feat(digitalocean): Packer nightly snapshot pipeline for fast boot (#2198 ) * feat(digitalocean): Packer nightly snapshot pipeline for fast boot Add pre-built Packer snapshots for DigitalOcean droplets. Instead of 10-20 min cloud-init + agent install on every boot, snapshot-based droplets boot in ~2-3 min (SSH only, agent pre-installed). - Packer HCL2 template with parametrized agent/tier builds - Agent build matrix (packer/agents.json) for all 7 agents - Tier scripts mirroring cloud-init.ts package tiers - Nightly GitHub Actions workflow (4 AM UTC, max-parallel: 3) - Automatic cleanup: keeps only latest snapshot per agent - CLI: findSpawnSnapshot() looks up pre-built images via DO API - CLI: waitForSshOnly() skips cloud-init when using snapshots - CLI: createServer() accepts optional snapshotId, skips user_data - CLI: main.ts routes to fast path when snapshot detected - Tests for findSpawnSnapshot() (5 cases, all passing) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(packer): use var-file for install_commands to avoid shell quoting issues The previous approach passed install_commands as `-var` inline, but GitHub Actions expands `${{ }}` before shell evaluation — JSON arrays with `\|`, `&&`, and `"` characters break shell quoting. Fix: generate a `.auto.pkrvars.json` file (auto-loaded by Packer) using jq with --argjson for safe JSON handling. Also route all `${{ inputs }}` and `${{ matrix }}` values through env vars to prevent script injection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 20:47:46 -08:00
L	61bcedc0eb	feat: migrate to openrouter.ai/labs/spawn CDN + release artifact version checks (#2178 ) * feat: migrate shell script URLs to openrouter.ai/labs/spawn CDN Users on older CLI versions can't auto-update because the repo was restructured (cli/ → packages/cli/), so old version-check URLs 404. This decouples the CLI from the repo's internal directory structure: - Shell script URLs (install, agent scripts, github-auth) now use openrouter.ai/labs/spawn/* as primary with GitHub raw as fallback - Version checks now use GitHub release artifact (cli-latest/version) as primary — a static URL that never changes regardless of repo layout - CI workflow updated to publish a `version` file alongside cli.js - Remove GITHUB_RAW_URL_PATTERN validation (no longer needed since install URL is now a hardcoded CDN string, not interpolated) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: fix biome formatting in update-check test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: CLAUDE.md says biome lint but should say biome check biome lint only checks lint rules, not formatting. biome check does both. The hooks and CI already run biome check — the docs were out of sync. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(hooks): PostToolUse hook wasn't running biome on CLI source files Two bugs in validate-file.ts: 1. Config search only checked 1-2 levels up from the edited file, but biome.json is at packages/cli/ — 3 levels above src/__tests__/*.ts. Fix: walk up directories until biome.json is found (or hit root). 2. Ran `biome format` (prints formatted output, always exits 0) instead of `biome format --check` (exits non-zero if file needs formatting). Fix: use `biome check` which does lint + format check in one pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-03 23:34:58 -08:00
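Note on 61bcedc0eb: the "walk up directories until biome.json is found (or hit root)" fix in validate-file.ts might be sketched as below. The sketch assumes absolute POSIX-style paths and takes an injected `exists` predicate so it can run without touching the filesystem. The function name and signature are hypothetical.

```typescript
// Returns the nearest ancestor directory (including startDir) that contains
// fileName, or null when the root is reached without a match.
function findConfigDir(
  startDir: string,
  fileName: string,
  exists: (path: string) => boolean,
): string | null {
  let dir = startDir.replace(/\/+$/, ""); // "" represents the root "/"
  while (true) {
    if (exists(`${dir}/${fileName}`)) return dir === "" ? "/" : dir;
    if (dir === "") return null; // already checked the root
    const slash = dir.lastIndexOf("/");
    if (slash < 0) return null; // relative path: out of scope for this sketch
    dir = dir.slice(0, slash);
  }
}
```

Starting from `/repo/packages/cli/src/__tests__` with a config at `/repo/packages/cli/biome.json`, the loop checks three directories before matching, which a fixed one-or-two-level search would miss.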
A	446923c447	refactor: extract inline hook commands to TypeScript scripts (#2174 ) * refactor: extract inline hook commands to TypeScript scripts in .claude/scripts/ Replace long inline `bash -c '...'` one-liners in .claude/settings.json with standalone TypeScript scripts that are easier to read, debug, and maintain: - enforce-worktree.ts: PreToolUse hook ensuring edits happen in worktrees - validate-file.ts: PostToolUse hook for .sh/.ts file validation - pre-merge-check.ts: PreToolUse hook running biome + tests before merge Add .claude/scripts as a bun workspace package (@spawn/hooks). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: replace manual typeguards with valibot schemas in hook scripts - Extract shared schemas (FilePathInput, CommandInput, parseStdin) to schemas.ts - Replace inline multi-level typeof/in checks with v.safeParse() calls - Add valibot dependency to @spawn/hooks package - Add CLAUDE.md rule: always prefer valibot over manual typeguards, share schemas Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: split CLAUDE.md into modular .claude/rules/ files Split the 437-line monolithic CLAUDE.md into a lean 89-line project overview plus 9 focused rules files in .claude/rules/ (auto-loaded by Claude Code): - culture.md — embrace bold changes, parallelize, verify exhaustively - shell-scripts.md — curl\|bash compat, macOS bash 3.x, ESM only, bun not python - type-safety.md — no `as` assertions, ALWAYS use valibot (never manual typeguards) - testing.md — bun:test only, no vitest, no subprocess spawning - git-workflow.md — worktree-first mandatory workflow - autonomous-loops.md — discovery/refactor service architecture - discovery.md — how to fill matrix gaps, add clouds/agents - documentation.md — never commit docs, use .docs/ - cli-version.md — bump version on every CLI change The type-safety rule now explicitly mandates valibot schemas over manual typeguard chains in all cases beyond 
single-primitive narrowing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(lint): run biome check across all packages in CI The lint workflow only checked packages/cli/src/. Now it checks all TypeScript locations in a single biome check command: - packages/cli/src/ (with GritQL plugins) - packages/shared/src/ (new biome.json) - .claude/scripts/ (new biome.json) - .claude/skills/setup-spa/ Fixed all pre-existing lint/format errors: - node: protocol on all Node.js built-in imports in hook scripts - useBlockStatements in packages/shared/src/type-guards.ts - expand formatting in .claude/skills/setup-spa/main.ts and spa.test.ts Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-03 23:05:41 -08:00
A	d04096a15b	feat!: remove Fly.io cloud provider support (#1979 ) * feat!: remove Fly.io cloud provider support Drop Fly.io as a supported cloud provider. Sprite (which uses Fly.io infrastructure internally) is retained. - Delete packages/cli/src/fly/ module, sh/fly/ scripts, fixtures/fly/ - Remove fly cloud entry and 6 fly matrix entries from manifest.json - Remove fly imports, destroy cases, and connection handlers from commands.ts - Remove fly-ssh sentinel from security.ts - Port E2E test suite from Fly.io to AWS Lightsail (fly-e2e.sh → aws-e2e.sh) - Update README (7 clouds, 42 combinations), CLAUDE.md, and skill prompts - Clean up fly references in build config, gitignore, icon sources - Bump CLI version to 0.11.0 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: restore Docker image build under sh/docker/ Move openclaw Dockerfile from sh/fly/docker/ to sh/docker/ and rename workflow from fly-docker.yml to docker.yml with updated paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * style: fix extra blank lines in commands.ts Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: spawn-bot <spawn-bot@openrouter.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: L <6723574+louisgv@users.noreply.github.com>	2026-02-27 00:06:32 -05:00
A	9e54f0cf57	ci: add Mock Tests job to satisfy required status check (#1904 ) * ci: add Mock Tests job to satisfy required status check Split the unit-tests job into mock-tests (runs bun test) and unit-tests (verifies cloud bundles build). The repo ruleset requires "Mock Tests", "Unit Tests", and "Biome Lint" checks — the missing "Mock Tests" job was blocking all PR merges. Fixes #1901 Agent: issue-fixer Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * style: fix pre-existing Biome format issues in 9 files Auto-applied Biome formatter to src/ to resolve failing "Biome Lint" required status check. No logic changes — formatting only. Agent: issue-fixer Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --------- Co-authored-by: B <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-25 00:54:33 -05:00
A	b2bddc4ba5	ci: bump QA cron from daily to every 4 hours (#1895) Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-24 16:46:55 -08:00
A	98a0d0f68f	feat(qa): add e2e-tester as subagent in scheduled quality sweep (#1894 ) E2E tests now run as a 4th teammate alongside test-runner, dedup-scanner, and code-quality-reviewer during schedule-triggered QA cycles. The standalone e2e mode is preserved for on-demand use. - Add e2e-tester teammate to qa-quality-prompt.md - Increase quality mode timeout from 35 to 40 min - Add "e2e" to trigger-server valid reasons - Re-enable daily schedule in qa.yml, default to "schedule" Co-authored-by: spawn-bot <spawn-bot@openrouter.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 19:34:35 -05:00
A	58c91571f1	fix: make macOS compat linter blocking and add 6 missing rules (#1867) The linter was running in CI with --warn-only, meaning it never blocked anything (effectively vaporware). This removes --warn-only to make it a real gate. Also adds rules for bash 4.0+ features that were documented in CLAUDE.md but not enforced: - MC014: readarray/mapfile (bash 4.0+) - MC015: coproc (bash 4.0+) - MC016: &>> redirect (bash 4.0+) - MC017: relative source paths (breaks curl|bash) - MC018: wait -n (bash 4.3+) - MC019: declare -g (bash 4.2+) Excludes .claude/worktrees/ from scanning (temp copies, not committed code). Co-authored-by: spawn-bot <spawn-bot@openrouter.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 03:50:12 -05:00
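Note on 58c91571f1: a blocking, rule-based scan for the bash 4+ constructs listed above could be sketched as follows. The rule IDs and what they target come from the commit message. The regular expressions, the line-by-line strategy and the names `RULES` and `lintScript` are assumptions, and the actual linter may detect these constructs quite differently.

```typescript
interface Rule {
  id: string;
  description: string;
  pattern: RegExp; // no "g" flag, so test() stays stateless
}

// Patterns are illustrative approximations only.
const RULES: Rule[] = [
  { id: "MC014", description: "readarray/mapfile (bash 4.0+)", pattern: /\b(?:readarray|mapfile)\b/ },
  { id: "MC015", description: "coproc (bash 4.0+)", pattern: /\bcoproc\b/ },
  { id: "MC016", description: "&>> redirect (bash 4.0+)", pattern: /&>>/ },
  { id: "MC017", description: "relative source path", pattern: /^\s*(?:source|\.)\s+\.{1,2}\// },
  { id: "MC018", description: "wait -n (bash 4.3+)", pattern: /\bwait\s+-n\b/ },
  { id: "MC019", description: "declare -g (bash 4.2+)", pattern: /\bdeclare\s+-g\b/ },
];

interface Finding {
  id: string;
  line: number; // 1-based
}

function lintScript(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    if (text.trim().startsWith("#")) return; // skip full-line comments and the shebang
    for (const rule of RULES) {
      if (rule.pattern.test(text)) findings.push({ id: rule.id, line: index + 1 });
    }
  });
  return findings;
}
```

A blocking gate would then exit non-zero whenever `lintScript` returns a non-empty list, which is the behavioural change the commit describes when it drops `--warn-only`.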
A	65f6f1be32	feat: Bun workspace monorepo — packages/cli + packages/shared (#1853 ) Restructure the repo as a Bun workspace monorepo: - Move cli/ → packages/cli/ - Create packages/shared/ (@openrouter/spawn-shared) with type-guards and parse utilities - Add root package.json with workspace configuration - Update all CLI imports to use @openrouter/spawn-shared - Deduplicate toRecord/toObjectArray helpers from 4 cloud modules - Update SPA (slack-bot) to use shared package instead of local toObj() - Update 48 agent shell scripts for new packages/cli/ path - Update install.sh, install.ps1, e2e, and test scripts - Update all GitHub workflows, .gitignore, pre-commit hooks - Update CLAUDE.md, README.md, and skill prompt references - Pin all dependency versions (no ^ ranges) - Bump CLI version 0.9.1 → 0.10.0 All 1908 tests pass. Lint clean. All 8 cloud bundles build. Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-23 22:07:05 -08:00
A	b84adfb74e	refactor: move all shell scripts to /sh directory (#1843 ) Reorganizes the project so all shell scripts live under a dedicated /sh directory, enabling the OpenRouter rewrite URL to point at /sh/ instead of the repository root. Moves: - cli/install.sh → sh/cli/install.sh - shared/.sh → sh/shared/.sh - {cloud}/{agent}.sh → sh/{cloud}/{agent}.sh (48 scripts) - {cloud}/README.md → sh/{cloud}/README.md - e2e/.sh → sh/e2e/.sh - test/macos-compat.sh → sh/test/macos-compat.sh - test/fixtures/*/.sh → sh/test/fixtures/*/.sh Updates all references: - RAW_BASE path construction in commands.ts, update-check.ts - GitHub auth URL in agent-setup.ts - Self-referencing URLs in install.sh, github-auth.sh - CI workflow paths in lint.yml, cli-release.yml - Test file paths in install-script-validation, manifest-integrity - Documentation in README.md, cli/README.md, CLAUDE.md - QA scripts in .claude/skills/ Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-23 21:14:54 -08:00
A	fccb73d147	feat: add Fly.io E2E test suite and QA e2e mode (#1823 ) - Add e2e/ directory with fly-e2e.sh orchestrator and lib/ helpers (provision, verify, teardown, cleanup) that provision real Fly.io VMs, verify agent installation, and tear everything down - Fix openclaw E2E failure by setting MODEL_ID=openrouter/auto to bypass interactive model selection prompt in headless mode - Add e2e mode to qa.sh (reason=e2e) that launches a Claude agent to run the E2E suite and investigate/fix any failures - Update qa.yml with reason dropdown (e2e/schedule/fixtures), kept disabled Co-authored-by: spawn-bot <spawn-bot@openrouter.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 19:31:44 -05:00
A	aa88e70488	fix: add concurrency guard and workflow_dispatch to CLI release (#1812 ) The race condition: two PRs merged 3 seconds apart both triggered the CLI Release workflow. The second run (v0.7.12) finished last and overwrote the release with a stale binary, even though the repo HEAD was at v0.8.0. - Add concurrency group so concurrent releases cancel the older one - Add workflow_dispatch trigger for manual re-runs Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-23 13:52:17 -05:00
A	a26d27f139	style: enforce biome format across codebase, add CI check (#1794 ) Run `biome format --write` on all 98 source files (38 needed fixes). The main change: object literals and long argument lists are now expanded onto separate lines per Biome's `"expand": "always"` setting, making code much easier to scan on narrow screens. Add `biome format` check step to CI lint workflow so formatting regressions are caught on every PR. Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-22 23:32:12 -08:00
A	ec210e37af	fix: Result monad for retry logic — prevent duplicate server creation (#1771 ) * fix: Result monad for retry logic — prevent duplicate server creation SSH exit 255 after an interactive session caused runWithRetries to retry the entire bash script, creating duplicate servers. The old withRetry also blindly retried all errors including timeouts where the remote command may have already completed. Introduces a Result<T> monad (Ok/Err) so callers explicitly signal whether a failure is retryable (return Err) or fatal (throw). Adds wrapSshCall() that classifies SSH errors: transient connection failures are retryable, timeouts are not. Removes retry loop from the top-level script runner entirely since it spans server creation + interactive session. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: mandate draft-PR-first workflow for all changes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add biome lint to CI and pre-commit hook, fix lint violations - Add Biome lint job to .github/workflows/lint.yml - Add TypeScript lint check to .githooks/pre-commit - Fix useBlockStatements violations in ui.ts and tests - Add biome lint to CLAUDE.md "After Each Change" checklist Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: rename Result.value to Result.data Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: clean up stale pre-commit hook - Remove dead check for deleted functions (write_oauth_response_file, create_oauth_response_html) — they no longer exist in the codebase - Fix early exit skipping Biome lint when no .sh files are staged - Replace echo -e with printf (the hook was using the pattern it bans) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve biome lint errors blocking CI - Fix useImportType: import { type Result } → import type { Result } - Fix noUnusedImports: remove unused KNOWN_FLAGS import - Fix 
noUnusedTemplateLiteral: template literal → string literal Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-22 20:39:42 -05:00
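Note on ec210e37af: the retry contract described above (return `Err` for a retryable failure, throw for a fatal one) might be modelled like this. It is a synchronous sketch for readability. The `Result` shape with a `data` field follows the rename mentioned in the commit, but the `ok`/`error` field names and the exact signature of `withRetry` are assumptions.

```typescript
type Result<T> = { ok: true; data: T } | { ok: false; error: Error };

function Ok<T>(data: T): Result<T> {
  return { ok: true, data };
}

function Err(error: Error): Result<never> {
  return { ok: false, error };
}

// Retries only when the attempt *returns* Err. Anything the attempt *throws*
// propagates immediately and is never retried.
function withRetry<T>(attempt: () => Result<T>, maxAttempts: number): T {
  let last: Error = new Error("no attempts made");
  for (let i = 0; i < maxAttempts; i++) {
    const result = attempt();
    if (result.ok) return result.data;
    last = result.error; // retryable: try again if attempts remain
  }
  throw last;
}
```

Under this contract, a wrapper such as the `wrapSshCall()` mentioned in the commit would return `Err` for transient connection failures and throw on timeouts, so a command that may already have completed remotely is not run twice.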
A	60986e5a05	refactor: remove shared/common.sh and 27 subprocess-heavy test files (#1728 ) shared/common.sh (3852 lines) was dead code — the entire architecture was rewritten to TypeScript in cli/src/. No agent scripts source it anymore. The only consumer was github-auth.sh which just needed 4 log functions (now inlined). Remove 27 test files that spawned ~800+ real bash/bun subprocesses per run (the root cause of slow bun test). Every shared-common-*.test.ts file forked a real bash shell per test case to source shared/common.sh. CLI subprocess tests spawned `bun run index.ts` per assertion. These were integration tests, not unit tests. Also removes: - mock-tests CI job from test.yml (ran test/mock.sh which opens browser) - Stale plan files referencing deleted infrastructure - All CLAUDE.md/README.md references to the old lib/common.sh pattern Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-22 11:32:27 -08:00
A	33bd3e615c	chore: disable QA workflow schedule until VM is fixed (#1722) Keep workflow_dispatch for manual testing. Re-enable cron when the QA VM is back online. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-22 11:06:50 -08:00
A	0f4df7be71	feat: pre-built Docker image for OpenClaw on Fly.io (#1686 ) Eliminates the slow waitForCloudInit() + bun install phase by booting a pre-built image with Node.js, bun, and openclaw already installed. The image is rebuilt daily via GitHub Actions to pick up new releases. Other agents are unaffected — they still use ubuntu:24.04 + cloud-init. Co-authored-by: spawn-bot <spawn-bot@openrouter.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-22 02:50:46 -05:00
A	262d081756	refactor: move fly TS into cli/src/fly/, add build-clouds.sh (#1604) Move all fly TypeScript files from fly/lib/.ts and fly/main.ts into cli/src/fly/. This gives them access to cli/node_modules (@clack/prompts), biome linting, and the existing bun:test infrastructure, with no symlinks or NODE_PATH hacks needed. The org picker now uses @clack/prompts select() directly (static import, bundled at build time). New: cli/build-clouds.sh auto-discovers cli/src//main.ts and bundles each into {cloud}.js. Scalable to future cloud TS migrations: `bash cli/build-clouds.sh` (build all), `bash cli/build-clouds.sh fly` (build one). Shims now check for cli/src/fly/main.ts (local) or download fly.js from GitHub releases (remote curl|bash). Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-21 12:34:09 -08:00
Ahmed Abushagur	f2795a6d84	fix: Node.js v22 upgrade, aider uv install, SSH & cloud reliability (#1440 ) * fix: use uv --upgrade to ensure Python 3.13-compatible Pillow across all clouds aider-chat on Python 3.13 fails with `ImportError: cannot import name '_imaging' from 'PIL'` when an old Pillow version (pre-10.4) is resolved — those releases have no Python 3.13 binary wheels, so the C extension is missing at runtime. Replace `--with 'Pillow>=10.2.0'` (which was silently broken — the `>` and single quotes get mangled by `printf '%q'` in run_server before the command reaches the remote machine) with `--upgrade`, which forces all transitive deps including Pillow to their latest compatible versions. Also adds a plain-text echo before the install so users see progress instead of a silent hang during the 2-4 minute install. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * test: update aider/gptme/interpreter assertions from pip to uv The install method for aider, gptme, and open-interpreter was changed from pip to `uv tool install` across all clouds. The mock test assertions still checked for the old `pip.install.` patterns, causing 9 failures (3 agents × 3 clouds). Update patterns to match the actual `uv tool install` commands now used in all cloud scripts. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * ci: trigger test run for uv assertion fix * fix: prevent SSH hangs, restore stderr, fix command escaping across clouds - Add < /dev/null to ssh_run_server and generic_ssh_wait to prevent SSH stdin theft causing sequential install/verify/configure steps to hang - Add ServerAliveInterval, ServerAliveCountMax, ConnectTimeout to default SSH_OPTS so long-running installs don't silently drop on flaky networks - Remove 2>/dev/null from Fly.io run_server so remote command errors are no longer silently swallowed (--quiet flag still suppresses flyctl noise) - Fix Fly.io printf '%q' double-quoting: remove extra quotes around $escaped_cmd that prevented the remote shell from consuming escapes, breaking && \|\| \| operators in commands - Remove broken printf '%q' from Daytona run_server and interactive_session where it escaped shell operators into literal characters since daytona exec has no intermediate shell layer - Pin aider to --python 3.12 instead of --with audioop-lts across all clouds Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add --pty to fly ssh console for interactive sessions fly ssh console -C does not allocate a pseudo-terminal by default, causing interactive TUI agents (aider, claude) to fail with "Input is not a terminal (fd=0)" or completely unresponsive input. Adding --pty forces PTY allocation, matching how other clouds handle interactive sessions (SSH uses -t, Sprite uses -tty). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: prepend ~/.local/bin to PATH in ssh_run_server After uv installs to ~/.local/bin, the current shell session doesn't have it in PATH, causing "uv: command not found" on DigitalOcean and all other SSH-based clouds (Hetzner, AWS, GCP, OVH). Fly.io's run_server already prepends this PATH — now the shared ssh_run_server does the same, fixing all SSH-based clouds at once. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add Node.js to cloud-init for all cloud providers npm-based agents (codex, kilocode, etc.) fail with "npm: command not found" because Node.js isn't installed during cloud-init. Fly.io was the only provider installing Node.js (in wait_for_cloud_init). Now all cloud-init scripts install Node.js v22 LTS from nodesource, matching Fly.io's setup. Also adds ~/.local/bin to PATH in AWS and GCP cloud-init (was already in shared/DigitalOcean/Hetzner). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use apt packages for nodejs/npm instead of nodesource The nodesource setup script (setup_22.x) runs its own apt-get update and repository configuration, nearly doubling cloud-init time and causing hangs on DigitalOcean. Ubuntu 24.04 includes nodejs and npm in its default repos — just add them to the packages list. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: add timeouts and better error handling to Daytona CLI commands Daytona CLI commands (login, list, create) can hang indefinitely when the API is slow or unreachable. This causes: - "Failed to create sandbox: timeout" with no recovery - Token validation timeouts misreported as "invalid token" - Users re-entering valid tokens that also timeout Fixes: - Wrap all daytona CLI calls with timeout (30s for auth, 120s for create) - Detect timeout errors separately from auth errors - Show actionable "try again / check status" messages for timeouts - Add nodejs/npm to Daytona wait_for_cloud_init Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: set DAYTONA_API_URL to Daytona Cloud by default The Daytona CLI may default to connecting to a local self-hosted server instead of Daytona Cloud. Without DAYTONA_API_URL set to https://app.daytona.io/api, every CLI command (login, list, create) hangs trying to reach a non-existent local server and times out. 
The SDK documents this as the default, but the CLI doesn't always pick it up — now we export it explicitly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: symlink n-installed Node.js v22 over apt v18 to prevent shadowing n installs Node.js v22 to /usr/local/bin/node but apt's v18 at /usr/bin/node can shadow it in non-interactive SSH sessions. After n 22, symlink the new binaries over the apt ones so v22 is always resolved. Also fix hcloud CLI token extraction for new TOML format. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address security review, add curl timeouts to trigger workflows - Fix ssh_run_server command injection concern: use single-quoted path_prefix so $HOME/$PATH expand remotely, not locally - Add --connect-timeout 15 --max-time 30 to trigger workflows to prevent 5-min hangs when server streams responses - Handle 409 (dedup) as success — expected when cron fires every 15min but cycles take 35min - Reduce workflow timeout-minutes from 5 to 2 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-18 06:54:07 -05:00
Ahmed Abushagur	22b6a402f4	feat: E2E test harness, QA pipeline integration, macOS compat linter (#1425 ) * feat: add QA upgrade — macOS compat linter, per-agent mock assertions Layer 1: macOS compat linter (test/macos-compat.sh) - 12 rules (MC001–MC012) catching bash 3.2 incompatibilities - Detects: base64 -w0 file args, non-portable echo flags, source <(), ((var++)), read -d, nounset flag, sed -i, date %N, local -n, declare -A, ${var,,}, and \|& - Added to CI lint.yml in warn-only mode for burn-in - Integrated as Phase 0.5 in qa-dry-run.sh Layer 2: Per-agent mock assertions - test/fixtures/_shared_agent_assertions.sh with install checks for all 15 agents (claude, openclaw, aider, goose, etc.) - Integrated into test/mock.sh via _run_agent_assertions() Also includes branch fixes: - Fix base64 -w0 to use stdin redirect (aws, daytona, fly) - Fix fly/openclaw to use npm install instead of broken curl\|bash Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add E2E test harness and integrate into QA pipeline Add test/e2e.sh — a full E2E test harness that provisions real servers, installs agents, and verifies setup across all clouds. 
Features: - Smoke test (one canary agent per cloud) and full matrix modes - Credential auto-detection for 8 clouds - Per-cloud preflight validation (sequential) then parallel agent tests - Stale server cleanup, timing history, cross-cloud comparison - Auto-fix and optimization phases via Claude agents - macOS bash 3.2 compatible Integrate E2E as Phase 5 in both qa-cycle.sh and qa-dry-run.sh: - Runs after mock tests pass, gated on cloud credentials - Phase 5b auto-fixes failures using per-agent worktree branches - Parses results and includes in QA summary Also fixes: - shared/common.sh: honour SPAWN_NON_INTERACTIVE=1 in safe_read() - aws/lib/common.sh: fix SSH key import (use cat instead of base64, handle race condition on concurrent imports) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-17 20:41:07 -05:00
L	6e13256d96	refactor: simplify claude launch — no streaming, no output monitoring (#1412 ) Replace the complex claude launch pattern (subshell + PID file + tee pipe + stream-json + 50-line watchdog monitoring log file growth + session-end detection) with a simple direct launch: claude -p "..." >> "${LOG_FILE}" 2>&1 & The watchdog is now just a wall-clock timeout. The idle-output detection, stream-json result parsing, and tee piping are all removed. Also remove GitHub Actions concurrency groups — the trigger server already handles dedup (409 for same issue, 409 for same reason), making the GH Actions concurrency groups redundant queuing. Changes: - refactor.sh: simple launch + wall-clock-only watchdog - security.sh: same simplification - discovery.sh: same (refactored _kill_claude_process and _run_watchdog_loop to simpler signatures) - All 4 workflows: remove concurrency groups Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-17 09:02:47 -08:00
L	f3cfe890f7	refactor: simplify trigger server to fire-and-forget + fix monitoring loop prompts (#1384 ) The trigger server streamed script stdout back to GitHub Actions via a long-lived HTTP response, requiring --http1.1, heartbeat injection, server.timeout(req, 0), createEnqueuer, drainStreamOutput, and 90-min GH Actions timeouts. In practice GitHub Actions is just a dumb trigger — the real state lives on the VM (log files, journalctl). Simplify to fire-and-forget: spawn script, return 200 JSON immediately. Also fix the refactor and discovery team lead monitoring loops. The prompts buried the loop in a single compressed line that the model ignored (doing Bash("sleep 10") repeatedly without calling TaskList). Replace with a dedicated "Monitor Loop (CRITICAL)" section with numbered steps, matching the security.sh pattern that actually works. Changes: - trigger-server.ts: remove ~150 lines of streaming code (createEnqueuer, drainStreamOutput, startStreamingRun, heartbeat, ReadableStream), replace with startFireAndForgetRun (stdout: "inherit", immediate JSON) - All 4 workflows: simple curl POST, timeout-minutes 90→5, remove --http1.1/-N/--max-time/exit-code handling - refactor.sh: add Monitor Loop (CRITICAL) section with numbered steps - discovery-team-prompt.txt: same Monitor Loop fix - SKILL.md: update architecture docs, remove streaming sections Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-17 10:47:52 -05:00
A	99a9badf62	ci: increase refactor team frequency to every 15 minutes (#1378 ) Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-16 20:50:03 -08:00
Ahmed Abushagur	3fbdf56c4c	fix: add guardrails to prevent bots from inventing unnecessary work (#1347 ) - Add team lead pre-approval gate: teammates spawn in plan mode and must get approval before creating any PR (hard gate, not just prompt rules) - Add diminishing returns rule: default posture is "code is good, shut down" - Add dedup rule: check for existing open/closed PRs before creating new ones - Require concrete PR justification (what breaks without this change) - Add off-limits files list (.github/workflows, .claude/skills, CLAUDE.md) - Use git pathspec exclusions in refactor.sh to never stage protected files - Constrain pr-maintainer to only act on approved or feedback PRs - Reduce refactor cron from every 5 minutes to every 2 hours Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-16 20:24:25 -05:00
A	a4fe0388c1	fix: allow repo collaborators through the gate workflow (#1166 ) Previously only org members were allowed. Now checks both org membership and repo collaborator status, so invited collaborators can open issues and PRs without being blocked. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-14 18:32:50 -08:00
A	8108d57999	fix: add write permissions to gate workflow (#1148 ) The default GITHUB_TOKEN lacks issues and pull-requests write access, causing 403 when trying to close issues/PRs from non-org members. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-14 16:37:49 -08:00
A	2a5137a919	feat: add gate workflow to restrict issues/PRs to org members (#1146 ) Automatically closes issues and PRs opened by non-members of the OpenRouterTeam org with an explanatory comment. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-14 19:33:02 -05:00
A	d589b0d74e	fix: tilde expansion in upload_config_file + bump refactor frequency (#1131 ) Fix #1114 — `mv` failed because `~/.claude/settings.json` was single-quoted on the remote shell, preventing tilde expansion. Remove the single quotes around remote_path and add a mkdir -p safety net. Also bump the refactor team cron from hourly to every 5 minutes. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-14 17:08:36 -05:00
L	0a0512652a	chore: reduce workflow cron frequencies (#1046 ) - discovery: every 30 min → every 3 days - refactor: every 5 min → hourly - security: every 5 min → every 30 min Co-authored-by: Security Reviewer <security-reviewer@spawn.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-13 18:55:40 -08:00
Ahmed Abushagur	b4abe8012f	fix(ci): propagate mock test exit code and fix broken pipe in summary (#1032 ) * fix(ci): propagate mock test exit code and fix broken pipe in summary The test workflow had three issues: - mock.sh exit code was swallowed by tee (no pipefail), so the check always passed even with 165 failures - grep\|head pipe caused "write error: Broken pipe" in post summary - Summary was noisy with 100+ individual result lines Now uses PIPESTATUS[0] to capture the real exit code, shows a clean results line plus collapsible failures list, and fails the check when tests fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(ci): report test results without blocking PRs Pre-existing failures (165) shouldn't block unrelated PRs. The summary still shows pass/fail counts and a collapsible failures list so the bot can see the results. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * perf(ci): increase QA cycle frequency from daily to every 4 hours Daily runs meant breakage could go undetected for up to 24 hours. Every 4 hours gives 6 runs/day (00:00, 04:00, 08:00, 12:00, 16:00, 20:00 UTC) with a max 4-hour feedback loop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(ci): add missing Check results step to fail on test errors Addresses review feedback: - The exit code was captured via PIPESTATUS[0] into GITHUB_OUTPUT but no subsequent step consumed it, so the workflow always passed even when tests failed. Added a "Check results" step that reads the captured exit code and fails the job accordingly. - Reverted QA cron schedule change (every 4 hours back to daily at 06:00 UTC) as it was unrelated to the test exit code fix and should be proposed separately if desired. Agent: pr-maintainer Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: A <6723574+louisgv@users.noreply.github.com>	2026-02-13 20:46:45 -05:00
Ahmed Abushagur	d501b5eb1d	fix: CI test summary uses NO_COLOR instead of sed hack (#985 ) * fix: strip ANSI colors before grepping test summary The mock test output uses ANSI escape codes for colored ✓/✗/━━━ characters, so the grep in the Post summary step couldn't match them. Strip colors with sed first. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use NO_COLOR standard instead of sed to strip ANSI codes mock.sh now respects the NO_COLOR env var (https://no-color.org/). CI sets NO_COLOR=1 so grep matches ✓/✗/━━━ cleanly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 11:26:41 -08:00
Ahmed Abushagur	50b2e98d7d	ci: add mock test workflow for PRs (#977 ) Runs `bash test/mock.sh` on every pull request targeting main. Includes concurrency grouping to cancel stale runs and a 10-minute timeout. Results are posted to the GitHub Actions step summary. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 11:11:15 -08:00
L	f69f95c7c7	refactor: Simplify security workflow to match discovery/refactor pattern (#929 ) Move mode-detection logic from the GitHub Actions workflow into security.sh where it belongs. The workflow now passes github.event_name directly as the reason parameter (like discovery.yml and refactor.yml), and security.sh uses `gh issue view` to check labels when reason=issues. - Remove 25-line if/elif/else reason-mapping block from security.yml - Remove workflow_dispatch mode input (server-side handles it) - Add `if:` label guard for issues (safe-to-work + team-building/security) - Add `labeled` to issue trigger types - Set cancel-in-progress: false (prevents killing long review_all runs) - Bump cron to */5 - Handle schedule/workflow_dispatch → review_all in security.sh - Keep backwards compat for direct team_building/triage reasons Co-authored-by: Security Reviewer <security-reviewer@spawn.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-13 05:26:21 -08:00
L	49bb39c8ec	fix: prevent duplicate review_all runs via reason-based dedup (#848 ) Two problems: 1. Schedule was every 20 min but review_all cycles take 35 min, causing overlapping triggers that fill both slots 2. Trigger server only deduped by issue number, not by reason, so two review_all runs could stack up Fixes: - Change schedule from */20 to 0,45 (every 45 min) - Add reason-based dedup in trigger-server.ts: reject 409 if a non-issue run with the same reason is already in progress Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-13 01:41:11 -08:00
L	56c4c020d5	feat: consolidate security review_all and scan into single 20-min cycle (#802 ) The two scheduled modes (review_all every 15 min, scan every 30 min) competed for MAX_CONCURRENT=1 on the trigger server, causing 429 drops and 30-55+ min gaps. Merge both into a single cycle that runs every 20 min, prioritizing PR review but also performing lightweight repo scanning when capacity allows (≤5 open PRs). Also prevents refactor agents from closing issues manually — issues now auto-close via `Fixes #N` in the PR body when merged. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-12 20:29:56 -08:00
L	f7c6e07867	feat: security triage applies full label taxonomy (#766 ) * feat: security triage now applies full label taxonomy Triage mode now applies: - Safety label (safe-to-work / malicious / needs-human-review) - Content-type label (bug, enhancement, security, question, etc.) - Lifecycle label (Pending Review) so downstream teams can pick up Team-building mode now transitions lifecycle labels: - Adds "In Progress" at start, removes it on close Added a "Available Labels Reference" section to the triage prompt documenting all label categories for the agent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: all security-filed issues get safe-to-work + Pending Review Issues filed by the security team (scan findings, drift/anomaly reports, follow-up issues from closed PRs) now automatically get `safe-to-work` and `Pending Review` labels so downstream teams can immediately pick them up without waiting for another triage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove Pending Review from safe-to-work issues safe-to-work already means triage is complete — adding Pending Review is redundant and confusing. Now only UNCLEAR issues get Pending Review (they still need human attention). SAFE issues and security-filed issues skip straight to actionable with just safe-to-work. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: normalize all labels to kebab-case Renamed on GitHub: - "In Progress" → "in-progress" - "Pending Review" → "pending-review" - "Under Review" → "under-review" - "good first issue" → "good-first-issue" - "help wanted" → "help-wanted" Updated all references in security.sh and refactor.sh to match. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: align issue templates and workflows with actual repo labels Created missing labels: cloud-request, agent-request, cli. Replaced nonexistent needs-triage with pending-review in all templates. 
Templates updated: - bug_report: bug + pending-review - cli_feature_request: cli + enhancement + pending-review - cloud_request: cloud-request + enhancement + pending-review - agent_request: agent-request + enhancement + pending-review Workflows updated: - refactor.yml: trigger on safe-to-work AND (bug\|cli\|enhancement\|maintenance) - discovery.yml: already correct (safe-to-work AND cloud-request\|agent-request) - security.yml: already correct (team-building label check) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Sprite <noreply@sprites.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-12 16:20:07 -08:00
L	15e2ca6caf	feat: consolidate security modes — merge pr+hygiene into review_all (#739 ) Simplify from 6 modes (Hexa-Mode) to 4 modes (Quad-Mode) by folding single-PR review and hygiene into a unified review_all mode that runs every 15 minutes. This removes the pull_request trigger entirely since review_all catches all open PRs on schedule, and absorbs staleness checks + branch cleanup into the same cycle. Remaining modes: team_building, triage, review_all, scan. Co-authored-by: Sprite <noreply@sprites.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-12 14:53:26 -08:00
L	4924a7d5db	feat: add security triage gate for issue safety before agent processing (#734 ) New issues are triaged by the security team before other workflows can act on them. The triage agent checks for prompt injection, social engineering, spam, and unsafe payloads — marking safe issues with `safe-to-work`, closing malicious ones, or flagging unclear ones for human review. Discovery and refactor workflows now require the `safe-to-work` label in addition to their existing label requirements. Co-authored-by: Sprite <noreply@sprites.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-12 14:23:33 -08:00
L	4d175ae6c7	feat: add Team Building issue template + route workflows by label (#733 ) - New issue template: Team Building (team-building label) — 2 fields: which agent team to improve + what to change - Security team gets a new team_building mode: reads the issue, spawns implementer + reviewer (both Opus), creates PR, reviews, merges, closes issue - Discovery workflow: only triggers on cloud-request / agent-request issues - Refactor workflow: only triggers on bug / cli issues - Security workflow: only triggers on team-building issues (+ PR/schedule) - All workflows still run on schedule and workflow_dispatch as before Co-authored-by: Sprite <noreply@sprites.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-12 14:17:57 -08:00
L	56ba47109c	feat: add security review team for PR review (#543 ) (#730 ) * feat: add security review team for PR review (#543) Adds a security team that automatically reviews every PR for security issues (injection, credential leaks, unsafe patterns, macOS compat) and sends Slack notifications to #spawn when concerns are found. - security.sh: dual-mode cycle script (PR review + scheduled scan) - security.yml: GitHub Actions workflow on pull_request events - start-security.sh: gitignored wrapper with secrets (deployed) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: expand security team with hygiene, scan modes + auto-merge clean PRs - PR mode: 2-agent team (code-reviewer + test-verifier) reviews PRs. If zero findings, auto-approves AND merges. If concerns, requests changes and sends Slack notification to #spawn. - Hygiene mode (every 6h): pr-triager + branch-cleaner close stale PRs, file follow-up issues, delete orphan branches. - Scan mode (daily): shell-auditor + code-auditor + drift-detector perform full repo security audit, file GitHub issues for findings. - All modes use Claude Code agent teams (TeamCreate, parallel teammates via Task tool, SendMessage coordination, TaskList monitoring). - Workflow updated with schedule triggers and workflow_dispatch inputs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: upgrade all security auditor agents to Opus model All security-critical roles (code-reviewer, pr-triager, shell-auditor, code-auditor) now use Opus. Helper roles (test-verifier, branch-cleaner, drift-detector) remain on Haiku. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: auto-merge PRs with MEDIUM/LOW or no findings Only CRITICAL/HIGH findings block a PR. MEDIUM/LOW are informational notes included in the approving review — PR still gets merged. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Sprite <noreply@sprites.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-12 14:04:38 -08:00
L	d961947983	fix: download pre-built CLI from GitHub release when local build fails (#728 ) Root cause: bun install creates empty directories in proot (Termux) because proot can't intercept bun's symlink/hardlink/copy_file_range syscalls. This breaks both local build and source-mode fallback. Fix: when `bun run build` fails, download the pre-built cli.js from the `cli-latest` GitHub release. The bundled binary is self-contained (80KB, all deps inlined) and only needs the bun runtime. - Add CI workflow (.github/workflows/cli-release.yml) that builds and uploads cli.js to a rolling `cli-latest` release on every push to main - Replace broken source-mode fallback with GitHub release download - Bump CLI version to 0.2.63 Co-authored-by: Sprite <noreply@sprite.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-12 13:48:45 -08:00
Ahmed Abushagur	8b9f9a0e5a	QA-Bot setup (#335 ) * feat: testing * feat: auto-fix dead apis * fix: mock works * feat: new fixtures * fix: more clouds tested * fix: dry run fix * fix: civo valid size * fix: civo result wait * feat: fixtures * feat: per cloud agent	2026-02-10 19:51:07 -08:00
B	200b6dc5b2	fix: Force HTTP/1.1 for streaming to avoid HTTP/2 stream errors HTTP/2 has strict stream lifecycle management that doesn't play well with long-lived chunked responses — curl exits with error 92 (stream not closed cleanly: INTERNAL_ERROR). HTTP/1.1 handles persistent streaming connections natively. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-10 22:51:35 +00:00
B	874b9c95f4	feat: Stream script output back to GH Actions instead of keep-alive Replace the broken keep-alive ping loop with a fundamentally better approach: the trigger server now streams the script's stdout/stderr back as the HTTP response body in chunks. The GH Action holds the curl connection open for the entire cycle duration (~90 min timeout). This works because Sprite keeps VMs alive while "actively servicing HTTP requests." A single long-lived streaming response satisfies this naturally — no synthetic pings needed. Key changes: trigger-server.ts: - /trigger now returns a streaming text/plain Response - stdout/stderr piped through ReadableStream with chunked output - 30s heartbeat lines injected during silent periods - Client disconnect handled gracefully (process keeps running) - X-Accel-Buffering: no header to prevent proxy buffering discovery.yml / refactor.yml: - curl -sSN --fail-with-body streams output in real-time - timeout-minutes: 90 to hold the connection for full cycles - Error responses (429/409/401) still print body and exit cleanly discovery.sh / refactor.sh: - Removed all keep-alive logic (start_keepalive/stop_keepalive) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-10 18:09:26 +00:00


78 commits