* feat: ARM tarball builds + arch-aware download
- Add ARM64 matrix entries for native binary agents (zeroclaw, opencode,
hermes, claude) in agent-tarballs.yml workflow
- Update agent-tarball.ts to detect remote VM arch via uname -m and
download the correct tarball (x86_64 or arm64)
- Change release strategy to support multiple arch assets per tag
- Document ARM build requirements in discovery.md for future agents
- Bump CLI version to 0.15.2
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use sudo for tarball extraction on non-root SSH clouds
On AWS Lightsail, SSH connects as 'ubuntu' (not root), but tarballs
extract to /root/. Without sudo, tar fails with "Permission denied".
Conditionally use sudo when not running as root (id -u != 0).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Use sudo mv + chown for tarball in release step (root-owned from capture)
- Skip swapfile creation if /swapfile already exists (GitHub Actions runners)
- Tolerate hermes setup wizard failure when /dev/tty unavailable in CI
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: pre-built agent tarballs on GitHub Releases for fast install
Adds a nightly GitHub Actions workflow that builds and uploads agent
tarballs to rolling GitHub Releases. During provisioning, the CLI now
attempts to download and extract a tarball before falling back to live
install. Priority chain: snapshot > tarball > live install.
- New workflow: .github/workflows/agent-tarballs.yml
- New capture script: packer/scripts/capture-agent.sh
- New module: packages/cli/src/shared/agent-tarball.ts
- Orchestrate tries tarball first on non-local clouds
- Skip tarball when using DO snapshot (skipTarball flag)
- Tests for tarball install + orchestration integration
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use global.fetch mock pattern and address security review
- Use `global.fetch = mock(...)` instead of `spyOn(globalThis, "fetch")`
to match codebase convention and fix CI mock interception
- Add URL validation regex to reject shell metacharacters (CRITICAL)
- Add agent name validation in workflow input (MEDIUM)
- Add `jq has()` check before executing install commands (CRITICAL)
- Use `tar -T` instead of unquoted word-splitting in capture-agent.sh (MEDIUM)
- Resolve merge conflicts with upstream/main (keep Docker fields, adapt
to simplified DO flow, bump version to 0.15.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use globalThis.fetch for testability in CI
Bun's native fetch binding doesn't go through global.fetch property
lookup, so global.fetch = mock(...) doesn't intercept it. Using
globalThis.fetch explicitly ensures the mock interception works.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add missing packer dependencies and harden install command safety
- Add packer/agents.json (agent tier + install command definitions)
- Add packer/scripts/tier-{minimal,node,bun,full}.sh (dependency scripts)
- Add basic command safety check rejecting suspicious patterns
- Document packer/agents.json as a trust boundary requiring PR review
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tarballs): fix npm prefix mismatch, add apt-get update, cleanup
- Add apt-get update -y before apt-get install in all tier scripts
- Add --prefix ~/.npm-global to npm install commands in agents.json
so installed packages land where capture-agent.sh expects them
- Rename misleading MARKER_DIR → MARKER_FILE in capture-agent.sh
- Remove stale comment referencing packer snapshots in workflow
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tarballs): detect empty agent installs in capture script
The "no files found" check was dead code — the marker file is always
created before filtering, so FILTERED_FILE always had at least one
entry. Now we count non-marker entries to catch cases where the agent
install silently fails and no actual files are on disk.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tarballs): use bare fetch() for Bun mock compatibility in CI
In Bun, global.fetch = mock(...) overrides bare fetch() calls but NOT
globalThis.fetch() calls. Every other source file in the codebase uses
bare fetch() and their mocks work fine in CI. Switch to match.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tarballs): use dependency injection for fetch in tests
Bun's global.fetch mock doesn't reliably intercept bare fetch() calls
across all Bun versions in CI. Instead of fighting the runtime, accept
an optional fetchFn parameter (defaults to fetch) and pass mock fetch
directly in tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tarballs): bypass mock.module bleed in agent-tarball tests
orchestrate.test.ts uses mock.module("../shared/agent-tarball", ...)
which is process-global in Bun and bleeds into agent-tarball.test.ts.
Import via URL (import.meta.url resolution) to bypass the specifier-
based mock matching and get the real module.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tarballs): eliminate mock.module bleed between test files
Bun's mock.module is process-global — orchestrate.test.ts mocking
agent-tarball poisoned agent-tarball.test.ts (the mock function
ignored the fetchFn parameter and always returned false).
Fix: make tryTarballInstall injectable via OrchestrationOptions.
orchestrate.test.ts passes the mock directly via options instead
of using mock.module. agent-tarball.test.ts imports the real module.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(tests): mock Bun.which in credential priority tests
Tests assumed no cloud CLIs were installed, but machines with hcloud/
doctl would get "CLI installed" hint overrides, failing the assertion.
Spy on Bun.which to return null so tests are environment-independent.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: fix import ordering after rebase
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* security: add curl domain allowlist and expand command blocklist
Addresses security review findings:
- Add domain allowlist for curl/wget targets (claude.ai, opencode.ai,
raw.githubusercontent.com, registry.npmjs.org, crates.io, github.com)
- Expand suspicious command blocklist (python -c, perl -e, ruby -e, dd, /dev/)
- Document 4-layer security model in workflow comments
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* security: add rm -rf to command blocklist
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Signed-off-by: Ahmed Abushagur <ahmed@abushagur.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat(docker): replace Packer snapshots with Docker-based agent delivery
Docker images on GHCR are public and cross-account, unlike DO snapshots
which are private/account-scoped. Cloud-init installs Docker + pulls the
agent image during boot. The install step extracts pre-built binaries via
`docker cp` and falls back to normal install if unavailable.
- Add Dockerfiles for all 7 agents (claude, codex, openclaw, opencode,
kilocode, zeroclaw, hermes)
- Convert docker.yml to matrix build for all agents
- Add tryInstallFromDocker() shared helper with Docker-first install
- Add Docker pull to DigitalOcean cloud-init userdata
- Remove Packer snapshot pipeline, lookup, and SSH-only wait
- Remove packer/ directory (HCL templates, tier scripts, agents.json)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* security: address review findings in docker agent delivery
- Add agentName validation regex (/^[a-z0-9-]+$/) in digitalocean.ts
before interpolation into cloud-init script
- Quote dockerImage variable in all docker command strings in
agent-setup.ts to prevent command injection
- Restrict docker cp to specific known directories (.claude, .bun,
.local, .npm, .cargo, .opencode) instead of blanket /root/.
Agent: pr-maintainer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
DO snapshots are private and account-scoped — users on different
accounts cannot see snapshots built by the CI token. Docker images
are the better approach for cross-account pre-built agents.
Removes: packer/, packer-snapshots workflow, snapshot lookup code,
and snapshot test. Reverts DO CLI to plain cloud-init flow.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Packer wasn't auto-loading build.auto.pkrvars.json, causing
"Unset variable" errors. Pass it explicitly with -var-file.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat(digitalocean): Packer nightly snapshot pipeline for fast boot
Add pre-built Packer snapshots for DigitalOcean droplets. Instead of
10-20 min cloud-init + agent install on every boot, snapshot-based
droplets boot in ~2-3 min (SSH only, agent pre-installed).
- Packer HCL2 template with parametrized agent/tier builds
- Agent build matrix (packer/agents.json) for all 7 agents
- Tier scripts mirroring cloud-init.ts package tiers
- Nightly GitHub Actions workflow (4 AM UTC, max-parallel: 3)
- Automatic cleanup: keeps only latest snapshot per agent
- CLI: findSpawnSnapshot() looks up pre-built images via DO API
- CLI: waitForSshOnly() skips cloud-init when using snapshots
- CLI: createServer() accepts optional snapshotId, skips user_data
- CLI: main.ts routes to fast path when snapshot detected
- Tests for findSpawnSnapshot() (5 cases, all passing)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(packer): use var-file for install_commands to avoid shell quoting issues
The previous approach passed install_commands as `-var` inline, but
GitHub Actions expands `${{ }}` before shell evaluation — JSON arrays
with `|`, `&&`, and `"` characters break shell quoting.
Fix: generate a `.auto.pkrvars.json` file (auto-loaded by Packer)
using jq with --argjson for safe JSON handling. Also route all
`${{ inputs }}` and `${{ matrix }}` values through env vars to
prevent script injection.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: migrate shell script URLs to openrouter.ai/labs/spawn CDN
Users on older CLI versions can't auto-update because the repo was restructured
(cli/ → packages/cli/), so old version-check URLs 404. This decouples the CLI
from the repo's internal directory structure:
- Shell script URLs (install, agent scripts, github-auth) now use
openrouter.ai/labs/spawn/* as primary with GitHub raw as fallback
- Version checks now use GitHub release artifact (cli-latest/version)
as primary — a static URL that never changes regardless of repo layout
- CI workflow updated to publish a `version` file alongside cli.js
- Remove GITHUB_RAW_URL_PATTERN validation (no longer needed since
install URL is now a hardcoded CDN string, not interpolated)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: fix biome formatting in update-check test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: CLAUDE.md says biome lint but should say biome check
biome lint only checks lint rules, not formatting. biome check does both.
The hooks and CI already run biome check — the docs were out of sync.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(hooks): PostToolUse hook wasn't running biome on CLI source files
Two bugs in validate-file.ts:
1. Config search only checked 1-2 levels up from the edited file, but
biome.json is at packages/cli/ — 3 levels above src/__tests__/*.ts.
Fix: walk up directories until biome.json is found (or hit root).
2. Ran `biome format` (prints formatted output, always exits 0) instead
of `biome format --check` (exits non-zero if file needs formatting).
Fix: use `biome check` which does lint + format check in one pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: extract inline hook commands to TypeScript scripts in .claude/scripts/
Replace long inline `bash -c '...'` one-liners in .claude/settings.json with
standalone TypeScript scripts that are easier to read, debug, and maintain:
- enforce-worktree.ts: PreToolUse hook ensuring edits happen in worktrees
- validate-file.ts: PostToolUse hook for .sh/.ts file validation
- pre-merge-check.ts: PreToolUse hook running biome + tests before merge
Add .claude/scripts as a bun workspace package (@spawn/hooks).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: replace manual typeguards with valibot schemas in hook scripts
- Extract shared schemas (FilePathInput, CommandInput, parseStdin) to schemas.ts
- Replace inline multi-level typeof/in checks with v.safeParse() calls
- Add valibot dependency to @spawn/hooks package
- Add CLAUDE.md rule: always prefer valibot over manual typeguards, share schemas
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: split CLAUDE.md into modular .claude/rules/ files
Split the 437-line monolithic CLAUDE.md into a lean 89-line project overview
plus 9 focused rules files in .claude/rules/ (auto-loaded by Claude Code):
- culture.md — embrace bold changes, parallelize, verify exhaustively
- shell-scripts.md — curl|bash compat, macOS bash 3.x, ESM only, bun not python
- type-safety.md — no `as` assertions, ALWAYS use valibot (never manual typeguards)
- testing.md — bun:test only, no vitest, no subprocess spawning
- git-workflow.md — worktree-first mandatory workflow
- autonomous-loops.md — discovery/refactor service architecture
- discovery.md — how to fill matrix gaps, add clouds/agents
- documentation.md — never commit docs, use .docs/
- cli-version.md — bump version on every CLI change
The type-safety rule now explicitly mandates valibot schemas over manual
typeguard chains in all cases beyond single-primitive narrowing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(lint): run biome check across all packages in CI
The lint workflow only checked packages/cli/src/. Now it checks all
TypeScript locations in a single biome check command:
- packages/cli/src/ (with GritQL plugins)
- packages/shared/src/ (new biome.json)
- .claude/scripts/ (new biome.json)
- .claude/skills/setup-spa/
Fixed all pre-existing lint/format errors:
- node: protocol on all Node.js built-in imports in hook scripts
- useBlockStatements in packages/shared/src/type-guards.ts
- expand formatting in .claude/skills/setup-spa/main.ts and spa.test.ts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat!: remove Fly.io cloud provider support
Drop Fly.io as a supported cloud provider. Sprite (which uses Fly.io
infrastructure internally) is retained.
- Delete packages/cli/src/fly/ module, sh/fly/ scripts, fixtures/fly/
- Remove fly cloud entry and 6 fly matrix entries from manifest.json
- Remove fly imports, destroy cases, and connection handlers from commands.ts
- Remove fly-ssh sentinel from security.ts
- Port E2E test suite from Fly.io to AWS Lightsail (fly-e2e.sh → aws-e2e.sh)
- Update README (7 clouds, 42 combinations), CLAUDE.md, and skill prompts
- Clean up fly references in build config, gitignore, icon sources
- Bump CLI version to 0.11.0
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: restore Docker image build under sh/docker/
Move openclaw Dockerfile from sh/fly/docker/ to sh/docker/ and rename
workflow from fly-docker.yml to docker.yml with updated paths.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* style: fix extra blank lines in commands.ts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: L <6723574+louisgv@users.noreply.github.com>
* ci: add Mock Tests job to satisfy required status check
Split the unit-tests job into mock-tests (runs bun test) and unit-tests
(verifies cloud bundles build). The repo ruleset requires "Mock Tests",
"Unit Tests", and "Biome Lint" checks — the missing "Mock Tests" job was
blocking all PR merges.
Fixes#1901
Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* style: fix pre-existing Biome format issues in 9 files
Auto-applied Biome formatter to src/ to resolve failing "Biome Lint"
required status check. No logic changes — formatting only.
Agent: issue-fixer
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
E2E tests now run as a 4th teammate alongside test-runner,
dedup-scanner, and code-quality-reviewer during schedule-triggered
QA cycles. The standalone e2e mode is preserved for on-demand use.
- Add e2e-tester teammate to qa-quality-prompt.md
- Increase quality mode timeout from 35 to 40 min
- Add "e2e" to trigger-server valid reasons
- Re-enable daily schedule in qa.yml, default to "schedule"
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The linter was running in CI with --warn-only, meaning it never
blocked anything — effectively vaporware. This removes --warn-only
to make it a real gate.
Also adds rules for bash 4.0+ features that were documented in
CLAUDE.md but not enforced:
- MC014: readarray/mapfile (bash 4.0+)
- MC015: coproc (bash 4.0+)
- MC016: &>> redirect (bash 4.0+)
- MC017: relative source paths (breaks curl|bash)
- MC018: wait -n (bash 4.3+)
- MC019: declare -g (bash 4.2+)
Excludes .claude/worktrees/ from scanning (temp copies, not
committed code).
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Restructure the repo as a Bun workspace monorepo:
- Move cli/ → packages/cli/
- Create packages/shared/ (@openrouter/spawn-shared) with type-guards and parse utilities
- Add root package.json with workspace configuration
- Update all CLI imports to use @openrouter/spawn-shared
- Deduplicate toRecord/toObjectArray helpers from 4 cloud modules
- Update SPA (slack-bot) to use shared package instead of local toObj()
- Update 48 agent shell scripts for new packages/cli/ path
- Update install.sh, install.ps1, e2e, and test scripts
- Update all GitHub workflows, .gitignore, pre-commit hooks
- Update CLAUDE.md, README.md, and skill prompt references
- Pin all dependency versions (no ^ ranges)
- Bump CLI version 0.9.1 → 0.10.0
All 1908 tests pass. Lint clean. All 8 cloud bundles build.
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reorganizes the project so all shell scripts live under a dedicated
/sh directory, enabling the OpenRouter rewrite URL to point at /sh/
instead of the repository root.
Moves:
- cli/install.sh → sh/cli/install.sh
- shared/*.sh → sh/shared/*.sh
- {cloud}/{agent}.sh → sh/{cloud}/{agent}.sh (48 scripts)
- {cloud}/README.md → sh/{cloud}/README.md
- e2e/*.sh → sh/e2e/*.sh
- test/macos-compat.sh → sh/test/macos-compat.sh
- test/fixtures/**/*.sh → sh/test/fixtures/**/*.sh
Updates all references:
- RAW_BASE path construction in commands.ts, update-check.ts
- GitHub auth URL in agent-setup.ts
- Self-referencing URLs in install.sh, github-auth.sh
- CI workflow paths in lint.yml, cli-release.yml
- Test file paths in install-script-validation, manifest-integrity
- Documentation in README.md, cli/README.md, CLAUDE.md
- QA scripts in .claude/skills/
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add e2e/ directory with fly-e2e.sh orchestrator and lib/ helpers
(provision, verify, teardown, cleanup) that provision real Fly.io VMs,
verify agent installation, and tear everything down
- Fix openclaw E2E failure by setting MODEL_ID=openrouter/auto to bypass
interactive model selection prompt in headless mode
- Add e2e mode to qa.sh (reason=e2e) that launches a Claude agent to run
the E2E suite and investigate/fix any failures
- Update qa.yml with reason dropdown (e2e/schedule/fixtures), kept disabled
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The race condition: two PRs merged 3 seconds apart both triggered the
CLI Release workflow. The second run (v0.7.12) finished last and
overwrote the release with a stale binary, even though the repo HEAD
was at v0.8.0.
- Add concurrency group so concurrent releases cancel the older one
- Add workflow_dispatch trigger for manual re-runs
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Run `biome format --write` on all 98 source files (38 needed fixes).
The main change: object literals and long argument lists are now expanded
onto separate lines per Biome's `"expand": "always"` setting, making
code much easier to scan on narrow screens.
Add `biome format` check step to CI lint workflow so formatting
regressions are caught on every PR.
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: Result monad for retry logic — prevent duplicate server creation
SSH exit 255 after an interactive session caused runWithRetries to retry
the entire bash script, creating duplicate servers. The old withRetry
also blindly retried all errors including timeouts where the remote
command may have already completed.
Introduces a Result<T> monad (Ok/Err) so callers explicitly signal
whether a failure is retryable (return Err) or fatal (throw). Adds
wrapSshCall() that classifies SSH errors: transient connection failures
are retryable, timeouts are not. Removes retry loop from the top-level
script runner entirely since it spans server creation + interactive
session.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: mandate draft-PR-first workflow for all changes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add biome lint to CI and pre-commit hook, fix lint violations
- Add Biome lint job to .github/workflows/lint.yml
- Add TypeScript lint check to .githooks/pre-commit
- Fix useBlockStatements violations in ui.ts and tests
- Add biome lint to CLAUDE.md "After Each Change" checklist
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: rename Result.value to Result.data
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: clean up stale pre-commit hook
- Remove dead check for deleted functions (write_oauth_response_file,
create_oauth_response_html) — they no longer exist in the codebase
- Fix early exit skipping Biome lint when no .sh files are staged
- Replace echo -e with printf (the hook was using the pattern it bans)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve biome lint errors blocking CI
- Fix useImportType: import { type Result } → import type { Result }
- Fix noUnusedImports: remove unused KNOWN_FLAGS import
- Fix noUnusedTemplateLiteral: template literal → string literal
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
shared/common.sh (3852 lines) was dead code — the entire architecture
was rewritten to TypeScript in cli/src/. No agent scripts source it
anymore. The only consumer was github-auth.sh which just needed 4
log functions (now inlined).
Remove 27 test files that spawned ~800+ real bash/bun subprocesses per
run (the root cause of slow bun test). Every shared-common-*.test.ts
file forked a real bash shell per test case to source shared/common.sh.
CLI subprocess tests spawned `bun run index.ts` per assertion. These
were integration tests, not unit tests.
Also removes:
- mock-tests CI job from test.yml (ran test/mock.sh which opens browser)
- Stale plan files referencing deleted infrastructure
- All CLAUDE.md/README.md references to the old lib/common.sh pattern
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keep workflow_dispatch for manual testing. Re-enable cron when the
QA VM is back online.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Eliminates the slow waitForCloudInit() + bun install phase by booting
a pre-built image with Node.js, bun, and openclaw already installed.
The image is rebuilt daily via GitHub Actions to pick up new releases.
Other agents are unaffected — they still use ubuntu:24.04 + cloud-init.
Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Move all fly TypeScript files from fly/lib/*.ts and fly/main.ts into
cli/src/fly/. This gives them access to cli/node_modules (@clack/prompts),
biome linting, and the existing bun:test infrastructure — no symlinks or
NODE_PATH hacks needed.
The org picker now uses @clack/prompts select() directly (static import,
bundled at build time).
New: cli/build-clouds.sh — auto-discovers cli/src/*/main.ts and bundles
each into {cloud}.js. Scalable to future cloud TS migrations:
bash cli/build-clouds.sh # build all
bash cli/build-clouds.sh fly # build one
Shims now check for cli/src/fly/main.ts (local) or download fly.js from
GitHub releases (remote curl|bash).
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: use uv --upgrade to ensure Python 3.13-compatible Pillow across all clouds
aider-chat on Python 3.13 fails with `ImportError: cannot import name
'_imaging' from 'PIL'` when an old Pillow version (pre-10.4) is resolved
— those releases have no Python 3.13 binary wheels, so the C extension
is missing at runtime.
Replace `--with 'Pillow>=10.2.0'` (which was silently broken — the `>`
and single quotes get mangled by `printf '%q'` in run_server before the
command reaches the remote machine) with `--upgrade`, which forces all
transitive deps including Pillow to their latest compatible versions.
Also adds a plain-text echo before the install so users see progress
instead of a silent hang during the 2-4 minute install.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test: update aider/gptme/interpreter assertions from pip to uv
The install method for aider, gptme, and open-interpreter was changed
from pip to `uv tool install` across all clouds. The mock test
assertions still checked for the old `pip.*install.*` patterns, causing
9 failures (3 agents × 3 clouds).
Update patterns to match the actual `uv tool install` commands now used
in all cloud scripts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* ci: trigger test run for uv assertion fix
* fix: prevent SSH hangs, restore stderr, fix command escaping across clouds
- Add < /dev/null to ssh_run_server and generic_ssh_wait to prevent SSH
stdin theft causing sequential install/verify/configure steps to hang
- Add ServerAliveInterval, ServerAliveCountMax, ConnectTimeout to default
SSH_OPTS so long-running installs don't silently drop on flaky networks
- Remove 2>/dev/null from Fly.io run_server so remote command errors are
no longer silently swallowed (--quiet flag still suppresses flyctl noise)
- Fix Fly.io printf '%q' double-quoting: remove extra quotes around
$escaped_cmd that prevented the remote shell from consuming escapes,
breaking && || | operators in commands
- Remove broken printf '%q' from Daytona run_server and interactive_session
where it escaped shell operators into literal characters since daytona exec
has no intermediate shell layer
- Pin aider to --python 3.12 instead of --with audioop-lts across all clouds
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add --pty to fly ssh console for interactive sessions
fly ssh console -C does not allocate a pseudo-terminal by default,
causing interactive TUI agents (aider, claude) to fail with
"Input is not a terminal (fd=0)" or completely unresponsive input.
Adding --pty forces PTY allocation, matching how other clouds handle
interactive sessions (SSH uses -t, Sprite uses -tty).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: prepend ~/.local/bin to PATH in ssh_run_server
After uv installs to ~/.local/bin, the current shell session doesn't
have it in PATH, causing "uv: command not found" on DigitalOcean and
all other SSH-based clouds (Hetzner, AWS, GCP, OVH).
Fly.io's run_server already prepends this PATH — now the shared
ssh_run_server does the same, fixing all SSH-based clouds at once.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add Node.js to cloud-init for all cloud providers
npm-based agents (codex, kilocode, etc.) fail with "npm: command not
found" because Node.js isn't installed during cloud-init. Fly.io was
the only provider installing Node.js (in wait_for_cloud_init).
Now all cloud-init scripts install Node.js v22 LTS from nodesource,
matching Fly.io's setup. Also adds ~/.local/bin to PATH in AWS and
GCP cloud-init (was already in shared/DigitalOcean/Hetzner).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use apt packages for nodejs/npm instead of nodesource
The nodesource setup script (setup_22.x) runs its own apt-get update
and repository configuration, nearly doubling cloud-init time and
causing hangs on DigitalOcean. Ubuntu 24.04 includes nodejs and npm
in its default repos — just add them to the packages list.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add timeouts and better error handling to Daytona CLI commands
Daytona CLI commands (login, list, create) can hang indefinitely when
the API is slow or unreachable. This causes:
- "Failed to create sandbox: timeout" with no recovery
- Token validation timeouts misreported as "invalid token"
- Users re-entering valid tokens that also timeout
Fixes:
- Wrap all daytona CLI calls with timeout (30s for auth, 120s for create)
- Detect timeout errors separately from auth errors
- Show actionable "try again / check status" messages for timeouts
- Add nodejs/npm to Daytona wait_for_cloud_init
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: set DAYTONA_API_URL to Daytona Cloud by default
The Daytona CLI may default to connecting to a local self-hosted
server instead of Daytona Cloud. Without DAYTONA_API_URL set to
https://app.daytona.io/api, every CLI command (login, list, create)
hangs trying to reach a non-existent local server and times out.
The SDK documents this as the default, but the CLI doesn't always
pick it up — now we export it explicitly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: symlink n-installed Node.js v22 over apt v18 to prevent shadowing
n installs Node.js v22 to /usr/local/bin/node but apt's v18 at
/usr/bin/node can shadow it in non-interactive SSH sessions. After
n 22, symlink the new binaries over the apt ones so v22 is always
resolved. Also fix hcloud CLI token extraction for new TOML format.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address security review, add curl timeouts to trigger workflows
- Fix ssh_run_server command injection concern: use single-quoted
path_prefix so $HOME/$PATH expand remotely, not locally
- Add --connect-timeout 15 --max-time 30 to trigger workflows to
prevent 5-min hangs when server streams responses
- Handle 409 (dedup) as success — expected when cron fires every 15min
but cycles take 35min
- Reduce workflow timeout-minutes from 5 to 2
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: add QA upgrade — macOS compat linter, per-agent mock assertions
Layer 1: macOS compat linter (test/macos-compat.sh)
- 12 rules (MC001–MC012) catching bash 3.2 incompatibilities
- Detects: base64 -w0 file args, non-portable echo flags, source <(),
((var++)), read -d, nounset flag, sed -i, date %N, local -n,
declare -A, ${var,,}, and |&
- Added to CI lint.yml in warn-only mode for burn-in
- Integrated as Phase 0.5 in qa-dry-run.sh
Layer 2: Per-agent mock assertions
- test/fixtures/_shared_agent_assertions.sh with install checks
for all 15 agents (claude, openclaw, aider, goose, etc.)
- Integrated into test/mock.sh via _run_agent_assertions()
Also includes branch fixes:
- Fix base64 -w0 to use stdin redirect (aws, daytona, fly)
- Fix fly/openclaw to use npm install instead of broken curl|bash
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add E2E test harness and integrate into QA pipeline
Add test/e2e.sh — a full E2E test harness that provisions real servers,
installs agents, and verifies setup across all clouds. Features:
- Smoke test (one canary agent per cloud) and full matrix modes
- Credential auto-detection for 8 clouds
- Per-cloud preflight validation (sequential) then parallel agent tests
- Stale server cleanup, timing history, cross-cloud comparison
- Auto-fix and optimization phases via Claude agents
- macOS bash 3.2 compatible
Integrate E2E as Phase 5 in both qa-cycle.sh and qa-dry-run.sh:
- Runs after mock tests pass, gated on cloud credentials
- Phase 5b auto-fixes failures using per-agent worktree branches
- Parses results and includes in QA summary
Also fixes:
- shared/common.sh: honour SPAWN_NON_INTERACTIVE=1 in safe_read()
- aws/lib/common.sh: fix SSH key import (use cat instead of base64,
handle race condition on concurrent imports)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Replace the complex claude launch pattern (subshell + PID file + tee
pipe + stream-json + 50-line watchdog monitoring log file growth +
session-end detection) with a simple direct launch:
claude -p "..." >> "${LOG_FILE}" 2>&1 &
The watchdog is now just a wall-clock timeout. The idle-output detection,
stream-json result parsing, and tee piping are all removed.
Also remove GitHub Actions concurrency groups — the trigger server
already handles dedup (409 for same issue, 409 for same reason), making
the GH Actions concurrency groups redundant queuing.
Changes:
- refactor.sh: simple launch + wall-clock-only watchdog
- security.sh: same simplification
- discovery.sh: same (refactored _kill_claude_process and
_run_watchdog_loop to simpler signatures)
- All 4 workflows: remove concurrency groups
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The trigger server streamed script stdout back to GitHub Actions via a
long-lived HTTP response, requiring --http1.1, heartbeat injection,
server.timeout(req, 0), createEnqueuer, drainStreamOutput, and 90-min
GH Actions timeouts. In practice GitHub Actions is just a dumb trigger
— the real state lives on the VM (log files, journalctl). Simplify to
fire-and-forget: spawn script, return 200 JSON immediately.
Also fix the refactor and discovery team lead monitoring loops. The
prompts buried the loop in a single compressed line that the model
ignored (doing Bash("sleep 10") repeatedly without calling TaskList).
Replace with a dedicated "Monitor Loop (CRITICAL)" section with numbered
steps, matching the security.sh pattern that actually works.
Changes:
- trigger-server.ts: remove ~150 lines of streaming code (createEnqueuer,
drainStreamOutput, startStreamingRun, heartbeat, ReadableStream),
replace with startFireAndForgetRun (stdout: "inherit", immediate JSON)
- All 4 workflows: simple curl POST, timeout-minutes 90→5, remove
--http1.1/-N/--max-time/exit-code handling
- refactor.sh: add Monitor Loop (CRITICAL) section with numbered steps
- discovery-team-prompt.txt: same Monitor Loop fix
- SKILL.md: update architecture docs, remove streaming sections
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add team lead pre-approval gate: teammates spawn in plan mode and must
get approval before creating any PR (hard gate, not just prompt rules)
- Add diminishing returns rule: default posture is "code is good, shut down"
- Add dedup rule: check for existing open/closed PRs before creating new ones
- Require concrete PR justification (what breaks without this change)
- Add off-limits files list (.github/workflows, .claude/skills, CLAUDE.md)
- Use git pathspec exclusions in refactor.sh to never stage protected files
- Constrain pr-maintainer to only act on approved or feedback PRs
- Reduce refactor cron from every 5 minutes to every 2 hours
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Previously only org members were allowed. Now checks both org membership
and repo collaborator status, so invited collaborators can open issues
and PRs without being blocked.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The default GITHUB_TOKEN lacks issues and pull-requests write access,
causing 403 when trying to close issues/PRs from non-org members.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Automatically closes issues and PRs opened by non-members of the
OpenRouterTeam org with an explanatory comment.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix#1114 — `mv` failed because `~/.claude/settings.json` was
single-quoted on the remote shell, preventing tilde expansion.
Remove the single quotes around remote_path and add a mkdir -p
safety net.
Also bump the refactor team cron from hourly to every 5 minutes.
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- discovery: every 30 min → every 3 days
- refactor: every 5 min → hourly
- security: every 5 min → every 30 min
Co-authored-by: Security Reviewer <security-reviewer@spawn.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): propagate mock test exit code and fix broken pipe in summary
The test workflow had three issues:
- mock.sh exit code was swallowed by tee (no pipefail), so the check
always passed even with 165 failures
- grep|head pipe caused "write error: Broken pipe" in post summary
- Summary was noisy with 100+ individual result lines
Now uses PIPESTATUS[0] to capture the real exit code, shows a clean
results line plus collapsible failures list, and fails the check when
tests fail.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(ci): report test results without blocking PRs
Pre-existing failures (165) shouldn't block unrelated PRs. The summary
still shows pass/fail counts and a collapsible failures list so the bot
can see the results.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* perf(ci): increase QA cycle frequency from daily to every 4 hours
Daily runs meant breakage could go undetected for up to 24 hours.
Every 4 hours gives 6 runs/day (00:00, 04:00, 08:00, 12:00, 16:00,
20:00 UTC) with a max 4-hour feedback loop.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(ci): add missing Check results step to fail on test errors
Addresses review feedback:
- The exit code was captured via PIPESTATUS[0] into GITHUB_OUTPUT but
no subsequent step consumed it, so the workflow always passed even
when tests failed. Added a "Check results" step that reads the
captured exit code and fails the job accordingly.
- Reverted QA cron schedule change (every 4 hours back to daily at
06:00 UTC) as it was unrelated to the test exit code fix and should
be proposed separately if desired.
Agent: pr-maintainer
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
* fix: strip ANSI colors before grepping test summary
The mock test output uses ANSI escape codes for colored ✓/✗/━━━
characters, so the grep in the Post summary step couldn't match
them. Strip colors with sed first.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use NO_COLOR standard instead of sed to strip ANSI codes
mock.sh now respects the NO_COLOR env var (https://no-color.org/).
CI sets NO_COLOR=1 so grep matches ✓/✗/━━━ cleanly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Runs `bash test/mock.sh` on every pull request targeting main.
Includes concurrency grouping to cancel stale runs and a 10-minute
timeout. Results are posted to the GitHub Actions step summary.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Move mode-detection logic from the GitHub Actions workflow into
security.sh where it belongs. The workflow now passes github.event_name
directly as the reason parameter (like discovery.yml and refactor.yml),
and security.sh uses `gh issue view` to check labels when reason=issues.
- Remove 25-line if/elif/else reason-mapping block from security.yml
- Remove workflow_dispatch mode input (server-side handles it)
- Add `if:` label guard for issues (safe-to-work + team-building/security)
- Add `labeled` to issue trigger types
- Set cancel-in-progress: false (prevents killing long review_all runs)
- Bump cron to */5
- Handle schedule/workflow_dispatch → review_all in security.sh
- Keep backwards compat for direct team_building/triage reasons
Co-authored-by: Security Reviewer <security-reviewer@spawn.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two problems:
1. Schedule was every 20 min but review_all cycles take 35 min,
causing overlapping triggers that fill both slots
2. Trigger server only deduped by issue number, not by reason,
so two review_all runs could stack up
Fixes:
- Change schedule from */20 to 0,45 (every 45 min)
- Add reason-based dedup in trigger-server.ts: reject 409 if a
non-issue run with the same reason is already in progress
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The two scheduled modes (review_all every 15 min, scan every 30 min)
competed for MAX_CONCURRENT=1 on the trigger server, causing 429 drops
and 30-55+ min gaps. Merge both into a single cycle that runs every
20 min, prioritizing PR review but also performing lightweight repo
scanning when capacity allows (≤5 open PRs).
Also prevents refactor agents from closing issues manually — issues
now auto-close via `Fixes #N` in the PR body when merged.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: security triage now applies full label taxonomy
Triage mode now applies:
- Safety label (safe-to-work / malicious / needs-human-review)
- Content-type label (bug, enhancement, security, question, etc.)
- Lifecycle label (Pending Review) so downstream teams can pick up
Team-building mode now transitions lifecycle labels:
- Adds "In Progress" at start, removes it on close
Added a "Available Labels Reference" section to the triage prompt
documenting all label categories for the agent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: all security-filed issues get safe-to-work + Pending Review
Issues filed by the security team (scan findings, drift/anomaly
reports, follow-up issues from closed PRs) now automatically get
`safe-to-work` and `Pending Review` labels so downstream teams
can immediately pick them up without waiting for another triage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove Pending Review from safe-to-work issues
safe-to-work already means triage is complete — adding Pending Review
is redundant and confusing. Now only UNCLEAR issues get Pending Review
(they still need human attention). SAFE issues and security-filed
issues skip straight to actionable with just safe-to-work.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: normalize all labels to kebab-case
Renamed on GitHub:
- "In Progress" → "in-progress"
- "Pending Review" → "pending-review"
- "Under Review" → "under-review"
- "good first issue" → "good-first-issue"
- "help wanted" → "help-wanted"
Updated all references in security.sh and refactor.sh to match.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: align issue templates and workflows with actual repo labels
Created missing labels: cloud-request, agent-request, cli.
Replaced nonexistent needs-triage with pending-review in all templates.
Templates updated:
- bug_report: bug + pending-review
- cli_feature_request: cli + enhancement + pending-review
- cloud_request: cloud-request + enhancement + pending-review
- agent_request: agent-request + enhancement + pending-review
Workflows updated:
- refactor.yml: trigger on safe-to-work AND (bug|cli|enhancement|maintenance)
- discovery.yml: already correct (safe-to-work AND cloud-request|agent-request)
- security.yml: already correct (team-building label check)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Sprite <noreply@sprites.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Simplify from 6 modes (Hexa-Mode) to 4 modes (Quad-Mode) by folding
single-PR review and hygiene into a unified review_all mode that runs
every 15 minutes. This removes the pull_request trigger entirely since
review_all catches all open PRs on schedule, and absorbs staleness
checks + branch cleanup into the same cycle.
Remaining modes: team_building, triage, review_all, scan.
Co-authored-by: Sprite <noreply@sprites.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New issues are triaged by the security team before other workflows can
act on them. The triage agent checks for prompt injection, social
engineering, spam, and unsafe payloads — marking safe issues with
`safe-to-work`, closing malicious ones, or flagging unclear ones for
human review. Discovery and refactor workflows now require the
`safe-to-work` label in addition to their existing label requirements.
Co-authored-by: Sprite <noreply@sprites.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New issue template: Team Building (team-building label) — 2 fields:
which agent team to improve + what to change
- Security team gets a new team_building mode: reads the issue, spawns
implementer + reviewer (both Opus), creates PR, reviews, merges, closes issue
- Discovery workflow: only triggers on cloud-request / agent-request issues
- Refactor workflow: only triggers on bug / cli issues
- Security workflow: only triggers on team-building issues (+ PR/schedule)
- All workflows still run on schedule and workflow_dispatch as before
Co-authored-by: Sprite <noreply@sprites.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add security review team for PR review (#543)
Adds a security team that automatically reviews every PR for security
issues (injection, credential leaks, unsafe patterns, macOS compat)
and sends Slack notifications to #spawn when concerns are found.
- security.sh: dual-mode cycle script (PR review + scheduled scan)
- security.yml: GitHub Actions workflow on pull_request events
- start-security.sh: gitignored wrapper with secrets (deployed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: expand security team with hygiene, scan modes + auto-merge clean PRs
- PR mode: 2-agent team (code-reviewer + test-verifier) reviews PRs.
If zero findings, auto-approves AND merges. If concerns, requests
changes and sends Slack notification to #spawn.
- Hygiene mode (every 6h): pr-triager + branch-cleaner close stale PRs,
file follow-up issues, delete orphan branches.
- Scan mode (daily): shell-auditor + code-auditor + drift-detector
perform full repo security audit, file GitHub issues for findings.
- All modes use Claude Code agent teams (TeamCreate, parallel teammates
via Task tool, SendMessage coordination, TaskList monitoring).
- Workflow updated with schedule triggers and workflow_dispatch inputs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: upgrade all security auditor agents to Opus model
All security-critical roles (code-reviewer, pr-triager, shell-auditor,
code-auditor) now use Opus. Helper roles (test-verifier, branch-cleaner,
drift-detector) remain on Haiku.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: auto-merge PRs with MEDIUM/LOW or no findings
Only CRITICAL/HIGH findings block a PR. MEDIUM/LOW are informational
notes included in the approving review — PR still gets merged.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Sprite <noreply@sprites.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: bun install creates empty directories in proot (Termux)
because proot can't intercept bun's symlink/hardlink/copy_file_range
syscalls. This breaks both local build and source-mode fallback.
Fix: when `bun run build` fails, download the pre-built cli.js from
the `cli-latest` GitHub release. The bundled binary is self-contained
(80KB, all deps inlined) and only needs the bun runtime.
- Add CI workflow (.github/workflows/cli-release.yml) that builds and
uploads cli.js to a rolling `cli-latest` release on every push to main
- Replace broken source-mode fallback with GitHub release download
- Bump CLI version to 0.2.63
Co-authored-by: Sprite <noreply@sprite.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HTTP/2 has strict stream lifecycle management that doesn't play well
with long-lived chunked responses — curl exits with error 92
(stream not closed cleanly: INTERNAL_ERROR). HTTP/1.1 handles
persistent streaming connections natively.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the broken keep-alive ping loop with a fundamentally better
approach: the trigger server now streams the script's stdout/stderr
back as the HTTP response body in chunks. The GH Action holds the
curl connection open for the entire cycle duration (~90 min timeout).
This works because Sprite keeps VMs alive while "actively servicing
HTTP requests." A single long-lived streaming response satisfies
this naturally — no synthetic pings needed.
Key changes:
trigger-server.ts:
- /trigger now returns a streaming text/plain Response
- stdout/stderr piped through ReadableStream with chunked output
- 30s heartbeat lines injected during silent periods
- Client disconnect handled gracefully (process keeps running)
- X-Accel-Buffering: no header to prevent proxy buffering
discovery.yml / refactor.yml:
- curl -sSN --fail-with-body streams output in real-time
- timeout-minutes: 90 to hold the connection for full cycles
- Error responses (429/409/401) still print body and exit cleanly
discovery.sh / refactor.sh:
- Removed all keep-alive logic (start_keepalive/stop_keepalive)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>