Commit graph

54 commits

Author SHA1 Message Date
Ahmed Abushagur
f2795a6d84
fix: Node.js v22 upgrade, aider uv install, SSH & cloud reliability (#1440)
* fix: use uv --upgrade to ensure Python 3.13-compatible Pillow across all clouds

aider-chat on Python 3.13 fails with `ImportError: cannot import name
'_imaging' from 'PIL'` when an old Pillow version (pre-10.4) is resolved
— those releases have no Python 3.13 binary wheels, so the C extension
is missing at runtime.

Replace `--with 'Pillow>=10.2.0'` (which was silently broken — the `>`
and single quotes get mangled by `printf '%q'` in run_server before the
command reaches the remote machine) with `--upgrade`, which forces all
transitive deps including Pillow to their latest compatible versions.

Also adds a plain-text echo before the install so users see progress
instead of a silent hang during the 2-4 minute install.
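
The mangling described above can be reproduced locally. This is a minimal sketch, not the project's run_server: it assumes bash is installed (`%q` is a bash extension to printf, so bash is invoked explicitly), and it uses an extra layer of double quotes as one illustrative way the escaped string can arrive with its backslashes still in place.

```shell
# The argument the old install command tried to pass through.
arg="--with 'Pillow>=10.2.0'"

# %q is a bash extension to printf, so call bash explicitly.
escaped=$(bash -c 'printf "%q" "$1"' _ "$arg")

# Parsed exactly once by a shell, the escaped form round-trips.
eval "roundtrip=$escaped"

# Wrapped in another layer of double quotes, the backslashes are kept
# as literal characters and the argument no longer matches.
eval "mangled=\"$escaped\""

printf 'original:  %s\n' "$arg"
printf 'escaped:   %s\n' "$escaped"
printf 'roundtrip: %s\n' "$roundtrip"
printf 'mangled:   %s\n' "$mangled"
```

The escaped form is only safe when exactly one shell parses it on the way to the remote side; `--upgrade` avoids the question because it contains no characters that need escaping.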

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: update aider/gptme/interpreter assertions from pip to uv

The install method for aider, gptme, and open-interpreter was changed
from pip to `uv tool install` across all clouds. The mock test
assertions still checked for the old `pip.*install.*` patterns, causing
9 failures (3 agents × 3 clouds).

Update patterns to match the actual `uv tool install` commands now used
in all cloud scripts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: trigger test run for uv assertion fix

* fix: prevent SSH hangs, restore stderr, fix command escaping across clouds

- Add < /dev/null to ssh_run_server and generic_ssh_wait to prevent SSH
  stdin theft causing sequential install/verify/configure steps to hang
- Add ServerAliveInterval, ServerAliveCountMax, ConnectTimeout to default
  SSH_OPTS so long-running installs don't silently drop on flaky networks
- Remove 2>/dev/null from Fly.io run_server so remote command errors are
  no longer silently swallowed (--quiet flag still suppresses flyctl noise)
- Fix Fly.io printf '%q' double-quoting: remove extra quotes around
  $escaped_cmd that prevented the remote shell from consuming escapes,
  breaking && || | operators in commands
- Remove broken printf '%q' from Daytona run_server and interactive_session
  where it escaped shell operators into literal characters since daytona exec
  has no intermediate shell layer
- Pin aider to --python 3.12 instead of --with audioop-lts across all clouds
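
The stdin problem in the first bullet is easy to reproduce without a network. A minimal sketch, assuming a hypothetical `greedy` function that stands in for ssh (both read whatever stdin they inherit): inside a `while read` loop the stand-in drains the list of remaining steps unless its stdin is redirected from /dev/null. The failure described above was a hang, not skipped steps; the sketch only shows the shared-stdin mechanism.

```shell
# Hypothetical stand-in for ssh: it reads whatever stdin it inherits.
greedy() { cat >/dev/null; }

steps='install
verify
configure'

# Without a redirect, the first call swallows the remaining lines.
ran_unprotected=0
while read -r step; do
  greedy
  ran_unprotected=$((ran_unprotected + 1))
done <<EOF
$steps
EOF

# With < /dev/null the loop keeps its own input and runs every step.
ran_protected=0
while read -r step; do
  greedy </dev/null
  ran_protected=$((ran_protected + 1))
done <<EOF
$steps
EOF

echo "steps run without redirect: $ran_unprotected"
echo "steps run with redirect:    $ran_protected"
```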

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add --pty to fly ssh console for interactive sessions

fly ssh console -C does not allocate a pseudo-terminal by default,
causing interactive TUI agents (aider, claude) to fail with
"Input is not a terminal (fd=0)" or completely unresponsive input.

Adding --pty forces PTY allocation, matching how other clouds handle
interactive sessions (SSH uses -t, Sprite uses -tty).
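
The error message comes from a terminal check on file descriptor 0. A small sketch of that kind of check, using the POSIX `test -t` operator; the function name `stdin_kind` is made up for illustration and is not taken from aider, claude, or flyctl.

```shell
# Report whether stdin (fd 0) is attached to a terminal.
stdin_kind() {
  if [ -t 0 ]; then
    echo "terminal"
  else
    echo "not a terminal"
  fi
}

# With stdin redirected, as in a session without a PTY, the check fails.
stdin_kind </dev/null
```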

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prepend ~/.local/bin to PATH in ssh_run_server

After uv installs to ~/.local/bin, the current shell session doesn't
have it in PATH, causing "uv: command not found" on DigitalOcean and
all other SSH-based clouds (Hetzner, AWS, GCP, OVH).

Fly.io's run_server already prepends this PATH — now the shared
ssh_run_server does the same, fixing all SSH-based clouds at once.
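
A minimal sketch of the idea, assuming a hypothetical `path_prefix` string; the real ssh_run_server is not shown in this log. A child `/bin/sh` with its own HOME and PATH stands in for the remote machine. The prefix is single-quoted so that `$HOME` and `$PATH` are expanded by the shell that runs the command, not by the one that builds it.

```shell
# Hypothetical prefix; single quotes keep $HOME and $PATH unexpanded here.
path_prefix='export PATH="$HOME/.local/bin:$PATH"'

# Stand-in for the remote side: a child shell with a different HOME and PATH.
remote_path=$(HOME=/home/remote-user PATH=/usr/bin:/bin \
  /bin/sh -c "$path_prefix"'; printf "%s" "$PATH"')

echo "$remote_path"
```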

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add Node.js to cloud-init for all cloud providers

npm-based agents (codex, kilocode, etc.) fail with "npm: command not
found" because Node.js isn't installed during cloud-init. Fly.io was
the only provider installing Node.js (in wait_for_cloud_init).

Now all cloud-init scripts install Node.js v22 LTS from nodesource,
matching Fly.io's setup. Also adds ~/.local/bin to PATH in AWS and
GCP cloud-init (was already in shared/DigitalOcean/Hetzner).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use apt packages for nodejs/npm instead of nodesource

The nodesource setup script (setup_22.x) runs its own apt-get update
and repository configuration, nearly doubling cloud-init time and
causing hangs on DigitalOcean. Ubuntu 24.04 includes nodejs and npm
in its default repos — just add them to the packages list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add timeouts and better error handling to Daytona CLI commands

Daytona CLI commands (login, list, create) can hang indefinitely when
the API is slow or unreachable. This causes:
- "Failed to create sandbox: timeout" with no recovery
- Token validation timeouts misreported as "invalid token"
- Users re-entering valid tokens that also time out

Fixes:
- Wrap all daytona CLI calls with timeout (30s for auth, 120s for create)
- Detect timeout errors separately from auth errors
- Show actionable "try again / check status" messages for timeouts
- Add nodejs/npm to Daytona wait_for_cloud_init
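
The timeout wrapping and the separate timeout detection listed above can be sketched as follows. This assumes GNU coreutils `timeout`, which exits with status 124 when the limit is reached; `run_limited` is a hypothetical helper, and `sleep`, `true`, and `false` stand in for the daytona CLI.

```shell
# Run a command under a time limit and classify the outcome.
# Prints "ok", "timeout", or "error" (hypothetical labels).
run_limited() {
  limit=$1
  shift
  status=0
  timeout "$limit" "$@" || status=$?
  case "$status" in
    0)   echo "ok" ;;
    124) echo "timeout" ;;   # GNU timeout's exit status when the limit is hit
    *)   echo "error" ;;
  esac
}

run_limited 1 sleep 5   # stand-in for a slow or unreachable API
run_limited 5 false     # stand-in for a command that fails on its own
run_limited 5 true
```

Keeping the 124 case separate is what lets a caller show a "try again" message without telling the user their token is invalid.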

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: set DAYTONA_API_URL to Daytona Cloud by default

The Daytona CLI may default to connecting to a local self-hosted
server instead of Daytona Cloud. Without DAYTONA_API_URL set to
https://app.daytona.io/api, every CLI command (login, list, create)
hangs trying to reach a non-existent local server and times out.

The SDK documents this as the default, but the CLI doesn't always
pick it up — now we export it explicitly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: symlink n-installed Node.js v22 over apt v18 to prevent shadowing

n installs Node.js v22 to /usr/local/bin/node but apt's v18 at
/usr/bin/node can shadow it in non-interactive SSH sessions. After
n 22, symlink the new binaries over the apt ones so v22 is always
resolved. Also fix hcloud CLI token extraction for new TOML format.
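
The shadowing and the symlink fix can be reproduced in a scratch directory. This is a sketch under assumptions: two fake `node` scripts stand in for the apt and n installs, and the session PATH is assumed to contain only the apt-style directory.

```shell
# Scratch tree standing in for /usr/bin (apt) and /usr/local/bin (n).
root=$(mktemp -d)
mkdir -p "$root/usr/bin" "$root/usr/local/bin"
printf '#!/bin/sh\necho v18\n' > "$root/usr/bin/node"
printf '#!/bin/sh\necho v22\n' > "$root/usr/local/bin/node"
chmod +x "$root/usr/bin/node" "$root/usr/local/bin/node"

# A session whose PATH lacks the /usr/local/bin stand-in sees the old binary.
before=$(PATH="$root/usr/bin"; node)

# Symlink the new binary over the old one, as the commit describes.
ln -sf "$root/usr/local/bin/node" "$root/usr/bin/node"
after=$(PATH="$root/usr/bin"; node)

echo "before: $before, after: $after"
rm -rf "$root"
```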

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address security review, add curl timeouts to trigger workflows

- Fix ssh_run_server command injection concern: use single-quoted
  path_prefix so $HOME/$PATH expand remotely, not locally
- Add --connect-timeout 15 --max-time 30 to trigger workflows to
  prevent 5-min hangs when the server streams responses
- Handle 409 (dedup) as success — expected when cron fires every 15min
  but cycles take 35min
- Reduce workflow timeout-minutes from 5 to 2
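
The status handling in the third bullet might look like the sketch below. `interpret_status` is a hypothetical helper, and treating every 2xx response as success is an assumption; only the 409 case and the two curl timeout flags come from the commit.

```shell
# Map an HTTP status from the trigger server to a workflow exit status.
interpret_status() {
  case "$1" in
    2??) echo "triggered"; return 0 ;;
    409) echo "already running (dedup), treated as success"; return 0 ;;
    *)   echo "unexpected status $1"; return 1 ;;
  esac
}

# In a workflow the status would come from something like:
#   curl -sS -o /dev/null -w '%{http_code}' \
#     --connect-timeout 15 --max-time 30 -X POST "$TRIGGER_URL"
# ($TRIGGER_URL is a placeholder, not a name from the repository.)
interpret_status 200
interpret_status 409
interpret_status 500 || echo "workflow would fail here"
```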

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-18 06:54:07 -05:00
Ahmed Abushagur
22b6a402f4
feat: E2E test harness, QA pipeline integration, macOS compat linter (#1425)
* feat: add QA upgrade — macOS compat linter, per-agent mock assertions

Layer 1: macOS compat linter (test/macos-compat.sh)
- 12 rules (MC001–MC012) catching bash 3.2 incompatibilities
- Detects: base64 -w0 file args, non-portable echo flags, source <(),
  ((var++)), read -d, nounset flag, sed -i, date %N, local -n,
  declare -A, ${var,,}, and |&
- Added to CI lint.yml in warn-only mode for burn-in
- Integrated as Phase 0.5 in qa-dry-run.sh
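
A rule of this kind can be as simple as one pattern match per construct. The sketch below is not test/macos-compat.sh; it is a hypothetical three-pattern check for `declare -A`, `local -n`, and `|&`, without the MC rule numbers, to show the general shape.

```shell
# Print "line:text" for each line using a construct bash 3.2 lacks.
# Returns non-zero when nothing is found (grep's convention).
lint_bash32() {
  grep -nE 'declare -A|local -n|\|&' "$1"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
#!/bin/bash
declare -A seen
count=$((count + 1))
make build |& tee build.log
EOF

lint_bash32 "$sample"
rm -f "$sample"
```

Running it on the sample flags lines 2 and 4 and leaves the portable arithmetic on line 3 alone. A warn-only mode would print the findings and ignore the return status.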

Layer 2: Per-agent mock assertions
- test/fixtures/_shared_agent_assertions.sh with install checks
  for all 15 agents (claude, openclaw, aider, goose, etc.)
- Integrated into test/mock.sh via _run_agent_assertions()

Also includes branch fixes:
- Fix base64 -w0 to use stdin redirect (aws, daytona, fly)
- Fix fly/openclaw to use npm install instead of broken curl|bash

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add E2E test harness and integrate into QA pipeline

Add test/e2e.sh — a full E2E test harness that provisions real servers,
installs agents, and verifies setup across all clouds. Features:
- Smoke test (one canary agent per cloud) and full matrix modes
- Credential auto-detection for 8 clouds
- Per-cloud preflight validation (sequential) then parallel agent tests
- Stale server cleanup, timing history, cross-cloud comparison
- Auto-fix and optimization phases via Claude agents
- macOS bash 3.2 compatible

Integrate E2E as Phase 5 in both qa-cycle.sh and qa-dry-run.sh:
- Runs after mock tests pass, gated on cloud credentials
- Phase 5b auto-fixes failures using per-agent worktree branches
- Parses results and includes in QA summary

Also fixes:
- shared/common.sh: honour SPAWN_NON_INTERACTIVE=1 in safe_read()
- aws/lib/common.sh: fix SSH key import (use cat instead of base64,
  handle race condition on concurrent imports)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 20:41:07 -05:00
L
6e13256d96
refactor: simplify claude launch — no streaming, no output monitoring (#1412)
Replace the complex claude launch pattern (subshell + PID file + tee
pipe + stream-json + 50-line watchdog monitoring log file growth +
session-end detection) with a simple direct launch:

  claude -p "..." >> "${LOG_FILE}" 2>&1 &

The watchdog is now just a wall-clock timeout. The idle-output detection,
stream-json result parsing, and tee piping are all removed.

Also remove GitHub Actions concurrency groups. The trigger server
already handles dedup (409 for same issue, 409 for same reason), so
the GH Actions concurrency groups only added redundant queuing.

Changes:
- refactor.sh: simple launch + wall-clock-only watchdog
- security.sh: same simplification
- discovery.sh: same (refactored _kill_claude_process and
  _run_watchdog_loop to simpler signatures)
- All 4 workflows: remove concurrency groups
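
A wall-clock-only watchdog can be sketched as below. The function name and the kill-on-deadline behaviour are assumptions for illustration; the scripts' actual watchdog code is not reproduced in this log, and `sleep` stands in for the claude process.

```shell
# Run a command in the background and kill it if it outlives the deadline.
# No output monitoring: elapsed time is the only signal.
run_with_deadline() {
  deadline=$1
  shift
  "$@" &
  pid=$!
  # Watchdog: after the deadline, terminate the command if it is still there.
  ( sleep "$deadline"; kill "$pid" 2>/dev/null ) >/dev/null 2>&1 &
  watchdog=$!
  rc=0
  wait "$pid" || rc=$?
  # The command is done, so the watchdog is no longer needed.
  kill "$watchdog" 2>/dev/null || true
  wait "$watchdog" 2>/dev/null || true
  return "$rc"
}

run_with_deadline 1 sleep 10 || echo "killed by watchdog (status $?)"
run_with_deadline 5 true && echo "finished before the deadline"
```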

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-17 09:02:47 -08:00
L
f3cfe890f7
refactor: simplify trigger server to fire-and-forget + fix monitoring loop prompts (#1384)
The trigger server streamed script stdout back to GitHub Actions via a
long-lived HTTP response, requiring --http1.1, heartbeat injection,
server.timeout(req, 0), createEnqueuer, drainStreamOutput, and 90-min
GH Actions timeouts. In practice GitHub Actions is just a dumb trigger
— the real state lives on the VM (log files, journalctl). Simplify to
fire-and-forget: spawn script, return 200 JSON immediately.

Also fix the refactor and discovery team lead monitoring loops. The
prompts buried the loop in a single compressed line that the model
ignored (doing Bash("sleep 10") repeatedly without calling TaskList).
Replace with a dedicated "Monitor Loop (CRITICAL)" section with numbered
steps, matching the security.sh pattern that actually works.

Changes:
- trigger-server.ts: remove ~150 lines of streaming code (createEnqueuer,
  drainStreamOutput, startStreamingRun, heartbeat, ReadableStream),
  replace with startFireAndForgetRun (stdout: "inherit", immediate JSON)
- All 4 workflows: simple curl POST, timeout-minutes 90→5, remove
  --http1.1/-N/--max-time/exit-code handling
- refactor.sh: add Monitor Loop (CRITICAL) section with numbered steps
- discovery-team-prompt.txt: same Monitor Loop fix
- SKILL.md: update architecture docs, remove streaming sections

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-17 10:47:52 -05:00
A
99a9badf62
ci: increase refactor team frequency to every 15 minutes (#1378)
Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-16 20:50:03 -08:00
Ahmed Abushagur
3fbdf56c4c
fix: add guardrails to prevent bots from inventing unnecessary work (#1347)
- Add team lead pre-approval gate: teammates spawn in plan mode and must
  get approval before creating any PR (hard gate, not just prompt rules)
- Add diminishing returns rule: default posture is "code is good, shut down"
- Add dedup rule: check for existing open/closed PRs before creating new ones
- Require concrete PR justification (what breaks without this change)
- Add off-limits files list (.github/workflows, .claude/skills, CLAUDE.md)
- Use git pathspec exclusions in refactor.sh to never stage protected files
- Constrain pr-maintainer to only act on approved or feedback PRs
- Reduce refactor cron from every 5 minutes to every 2 hours
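
The pathspec bullet can be illustrated with git's `:(exclude)` pathspec magic. The exact command in refactor.sh is not shown in this log, so treat this as a sketch: it stages everything in a scratch repository except the off-limits paths listed above.

```shell
# Scratch repository with one ordinary file and three protected paths.
repo=$(mktemp -d)
git init -q "$repo"
mkdir -p "$repo/.github/workflows" "$repo/.claude/skills" "$repo/shared"
echo "on: push" > "$repo/.github/workflows/ci.yml"
echo "skill"    > "$repo/.claude/skills/demo.md"
echo "rules"    > "$repo/CLAUDE.md"
echo "code"     > "$repo/shared/common.sh"

# Stage everything except the protected paths.
git -C "$repo" add -A -- . \
  ':(exclude).github/workflows' \
  ':(exclude).claude/skills' \
  ':(exclude)CLAUDE.md'

staged=$(git -C "$repo" diff --cached --name-only)
echo "$staged"
rm -rf "$repo"
```

Only shared/common.sh ends up staged. Because the exclusion is applied by git itself, it holds even if an agent ignores the prompt rules.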

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 20:24:25 -05:00
A
a4fe0388c1
fix: allow repo collaborators through the gate workflow (#1166)
Previously only org members were allowed. Now checks both org membership
and repo collaborator status, so invited collaborators can open issues
and PRs without being blocked.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-14 18:32:50 -08:00
A
8108d57999
fix: add write permissions to gate workflow (#1148)
The default GITHUB_TOKEN lacks issues and pull-requests write access,
causing 403 when trying to close issues/PRs from non-org members.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-14 16:37:49 -08:00
A
2a5137a919
feat: add gate workflow to restrict issues/PRs to org members (#1146)
Automatically closes issues and PRs opened by non-members of the
OpenRouterTeam org with an explanatory comment.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-14 19:33:02 -05:00
A
d589b0d74e
fix: tilde expansion in upload_config_file + bump refactor frequency (#1131)
Fix #1114 — `mv` failed because `~/.claude/settings.json` was
single-quoted on the remote shell, preventing tilde expansion.
Remove the single quotes around remote_path and add a mkdir -p
safety net.
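
The quoting difference is easy to see in isolation. In this sketch a child `/bin/sh` with a made-up HOME stands in for the remote shell; the path is the one from the bug report.

```shell
# Single-quoted on the "remote" side: the tilde stays literal.
quoted=$(HOME=/home/demo /bin/sh -c "echo '~/.claude/settings.json'")

# Unquoted: the shell expands the leading tilde to $HOME.
unquoted=$(HOME=/home/demo /bin/sh -c "echo ~/.claude/settings.json")

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

Dropping the quotes entirely gives up protection against spaces in the path; quoting only the part after `~/` would keep both behaviours.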

Also bump the refactor team cron from hourly to every 5 minutes.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-14 17:08:36 -05:00
L
0a0512652a
chore: reduce workflow cron frequencies (#1046)
- discovery: every 30 min → every 3 days
- refactor: every 5 min → hourly
- security: every 5 min → every 30 min

Co-authored-by: Security Reviewer <security-reviewer@spawn.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 18:55:40 -08:00
Ahmed Abushagur
b4abe8012f
fix(ci): propagate mock test exit code and fix broken pipe in summary (#1032)
* fix(ci): propagate mock test exit code and fix broken pipe in summary

The test workflow had three issues:
- mock.sh exit code was swallowed by tee (no pipefail), so the check
  always passed even with 165 failures
- grep|head pipe caused "write error: Broken pipe" in post summary
- Summary was noisy with 100+ individual result lines

Now uses PIPESTATUS[0] to capture the real exit code, shows a clean
results line plus collapsible failures list, and fails the check when
tests fail.
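
The swallowed exit code and the PIPESTATUS fix can be sketched as follows. A temporary script stands in for test/mock.sh, and bash is invoked explicitly because PIPESTATUS is a bash array; the variable names are illustrative.

```shell
# Stand-in for test/mock.sh: prints a result line, then fails.
suite=$(mktemp)
printf 'echo "1 failed"\nexit 3\n' > "$suite"

# Without pipefail, the pipeline reports tee's status, so the failure is lost.
swallowed=$(sh -c 'sh "$1" | tee /dev/null >/dev/null; echo "$?"' _ "$suite")

# PIPESTATUS[0] holds the status of the first command in the last pipeline.
captured=$(bash -c 'sh "$1" | tee /dev/null >/dev/null; echo "${PIPESTATUS[0]}"' _ "$suite")

echo "pipeline status: $swallowed, suite status: $captured"
rm -f "$suite"
```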

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): report test results without blocking PRs

Pre-existing failures (165) shouldn't block unrelated PRs. The summary
still shows pass/fail counts and a collapsible failures list so the bot
can see the results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf(ci): increase QA cycle frequency from daily to every 4 hours

Daily runs meant breakage could go undetected for up to 24 hours.
Every 4 hours gives 6 runs/day (00:00, 04:00, 08:00, 12:00, 16:00,
20:00 UTC) with a max 4-hour feedback loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(ci): add missing Check results step to fail on test errors

Addresses review feedback:
- The exit code was captured via PIPESTATUS[0] into GITHUB_OUTPUT but
  no subsequent step consumed it, so the workflow always passed even
  when tests failed. Added a "Check results" step that reads the
  captured exit code and fails the job accordingly.
- Reverted QA cron schedule change (every 4 hours back to daily at
  06:00 UTC) as it was unrelated to the test exit code fix and should
  be proposed separately if desired.

Agent: pr-maintainer
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
2026-02-13 20:46:45 -05:00
Ahmed Abushagur
d501b5eb1d
fix: CI test summary uses NO_COLOR instead of sed hack (#985)
* fix: strip ANSI colors before grepping test summary

The mock test output uses ANSI escape codes for colored ✓/✗/━━━
characters, so the grep in the Post summary step couldn't match
them. Strip colors with sed first.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use NO_COLOR standard instead of sed to strip ANSI codes

mock.sh now respects the NO_COLOR env var (https://no-color.org/).
CI sets NO_COLOR=1 so grep matches ✓/✗/━━━ cleanly.
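
Honouring NO_COLOR usually comes down to one check before emitting escape codes. A minimal sketch with a hypothetical `format_pass` helper; mock.sh's actual implementation is not shown in this log.

```shell
# Print a pass line, coloured unless NO_COLOR is set to a non-empty value.
format_pass() {
  if [ -n "${NO_COLOR:-}" ]; then
    printf '✓ %s\n' "$1"
  else
    printf '\033[32m✓ %s\033[0m\n' "$1"
  fi
}

# With NO_COLOR=1 the line is plain text that grep can match directly.
(NO_COLOR=1; format_pass "aider on fly") | grep -c '^✓ '
```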

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 11:26:41 -08:00
Ahmed Abushagur
50b2e98d7d
ci: add mock test workflow for PRs (#977)
Runs `bash test/mock.sh` on every pull request targeting main.
Includes concurrency grouping to cancel stale runs and a 10-minute
timeout. Results are posted to the GitHub Actions step summary.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 11:11:15 -08:00
L
f69f95c7c7
refactor: Simplify security workflow to match discovery/refactor pattern (#929)
Move mode-detection logic from the GitHub Actions workflow into
security.sh where it belongs. The workflow now passes github.event_name
directly as the reason parameter (like discovery.yml and refactor.yml),
and security.sh uses `gh issue view` to check labels when reason=issues.

- Remove 25-line if/elif/else reason-mapping block from security.yml
- Remove workflow_dispatch mode input (server-side handles it)
- Add `if:` label guard for issues (safe-to-work + team-building/security)
- Add `labeled` to issue trigger types
- Set cancel-in-progress: false (prevents killing long review_all runs)
- Bump cron to */5
- Handle schedule/workflow_dispatch → review_all in security.sh
- Keep backwards compat for direct team_building/triage reasons

Co-authored-by: Security Reviewer <security-reviewer@spawn.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 05:26:21 -08:00
L
49bb39c8ec
fix: prevent duplicate review_all runs via reason-based dedup (#848)
Two problems:
1. Schedule was every 20 min but review_all cycles take 35 min,
   causing overlapping triggers that fill both slots
2. Trigger server only deduped by issue number, not by reason,
   so two review_all runs could stack up

Fixes:
- Change schedule from */20 to 0,45 (minutes 0 and 45 of each hour)
- Add reason-based dedup in trigger-server.ts: reject with 409 if a
  non-issue run with the same reason is already in progress
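
Reason-based dedup amounts to refusing a second run while one with the same reason is active. The real check lives in trigger-server.ts; the sketch below swaps in a different technique, lock directories created with `mkdir`, purely to show the accept/reject logic in shell. All names are hypothetical.

```shell
lock_root=$(mktemp -d)

# Print the HTTP status the server would answer with for this reason.
try_start() {
  # mkdir is atomic: it fails if a run with this reason already holds the lock.
  if mkdir "$lock_root/$1" 2>/dev/null; then
    echo 200
  else
    echo 409
  fi
}

# Release the lock when the run finishes.
finish() { rmdir "$lock_root/$1"; }

try_start review_all   # first run is accepted
try_start review_all   # duplicate while the first is in progress
finish review_all
try_start review_all   # accepted again once the first run is done
finish review_all
rmdir "$lock_root"
```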

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-13 01:41:11 -08:00
L
56c4c020d5
feat: consolidate security review_all and scan into single 20-min cycle (#802)
The two scheduled modes (review_all every 15 min, scan every 30 min)
competed for MAX_CONCURRENT=1 on the trigger server, causing 429 drops
and 30-55+ min gaps. Merge both into a single cycle that runs every
20 min, prioritizing PR review but also performing lightweight repo
scanning when capacity allows (≤5 open PRs).

Also prevents refactor agents from closing issues manually — issues
now auto-close via `Fixes #N` in the PR body when merged.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-12 20:29:56 -08:00
L
f7c6e07867
feat: security triage applies full label taxonomy (#766)
* feat: security triage now applies full label taxonomy

Triage mode now applies:
- Safety label (safe-to-work / malicious / needs-human-review)
- Content-type label (bug, enhancement, security, question, etc.)
- Lifecycle label (Pending Review) so downstream teams can pick up

Team-building mode now transitions lifecycle labels:
- Adds "In Progress" at start, removes it on close

Added an "Available Labels Reference" section to the triage prompt
documenting all label categories for the agent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: all security-filed issues get safe-to-work + Pending Review

Issues filed by the security team (scan findings, drift/anomaly
reports, follow-up issues from closed PRs) now automatically get
`safe-to-work` and `Pending Review` labels so downstream teams
can immediately pick them up without waiting for another triage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove Pending Review from safe-to-work issues

safe-to-work already means triage is complete — adding Pending Review
is redundant and confusing. Now only UNCLEAR issues get Pending Review
(they still need human attention). SAFE issues and security-filed
issues skip straight to actionable with just safe-to-work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: normalize all labels to kebab-case

Renamed on GitHub:
- "In Progress" → "in-progress"
- "Pending Review" → "pending-review"
- "Under Review" → "under-review"
- "good first issue" → "good-first-issue"
- "help wanted" → "help-wanted"

Updated all references in security.sh and refactor.sh to match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: align issue templates and workflows with actual repo labels

Created missing labels: cloud-request, agent-request, cli.
Replaced nonexistent needs-triage with pending-review in all templates.

Templates updated:
- bug_report: bug + pending-review
- cli_feature_request: cli + enhancement + pending-review
- cloud_request: cloud-request + enhancement + pending-review
- agent_request: agent-request + enhancement + pending-review

Workflows updated:
- refactor.yml: trigger on safe-to-work AND (bug|cli|enhancement|maintenance)
- discovery.yml: already correct (safe-to-work AND cloud-request|agent-request)
- security.yml: already correct (team-building label check)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Sprite <noreply@sprites.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-12 16:20:07 -08:00
L
15e2ca6caf
feat: consolidate security modes — merge pr+hygiene into review_all (#739)
Simplify from 6 modes (Hexa-Mode) to 4 modes (Quad-Mode) by folding
single-PR review and hygiene into a unified review_all mode that runs
every 15 minutes. This removes the pull_request trigger entirely since
review_all catches all open PRs on schedule, and absorbs staleness
checks + branch cleanup into the same cycle.

Remaining modes: team_building, triage, review_all, scan.

Co-authored-by: Sprite <noreply@sprites.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-12 14:53:26 -08:00
L
4924a7d5db
feat: add security triage gate for issue safety before agent processing (#734)
New issues are triaged by the security team before other workflows can
act on them. The triage agent checks for prompt injection, social
engineering, spam, and unsafe payloads — marking safe issues with
`safe-to-work`, closing malicious ones, or flagging unclear ones for
human review. Discovery and refactor workflows now require the
`safe-to-work` label in addition to their existing label requirements.

Co-authored-by: Sprite <noreply@sprites.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-12 14:23:33 -08:00
L
4d175ae6c7
feat: add Team Building issue template + route workflows by label (#733)
- New issue template: Team Building (team-building label) — 2 fields:
  which agent team to improve + what to change
- Security team gets a new team_building mode: reads the issue, spawns
  implementer + reviewer (both Opus), creates PR, reviews, merges, closes issue
- Discovery workflow: only triggers on cloud-request / agent-request issues
- Refactor workflow: only triggers on bug / cli issues
- Security workflow: only triggers on team-building issues (+ PR/schedule)
- All workflows still run on schedule and workflow_dispatch as before

Co-authored-by: Sprite <noreply@sprites.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-12 14:17:57 -08:00
L
56ba47109c
feat: add security review team for PR review (#543) (#730)
* feat: add security review team for PR review (#543)

Adds a security team that automatically reviews every PR for security
issues (injection, credential leaks, unsafe patterns, macOS compat)
and sends Slack notifications to #spawn when concerns are found.

- security.sh: dual-mode cycle script (PR review + scheduled scan)
- security.yml: GitHub Actions workflow on pull_request events
- start-security.sh: gitignored wrapper with secrets (deployed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: expand security team with hygiene, scan modes + auto-merge clean PRs

- PR mode: 2-agent team (code-reviewer + test-verifier) reviews PRs.
  If zero findings, auto-approves AND merges. If concerns, requests
  changes and sends Slack notification to #spawn.
- Hygiene mode (every 6h): pr-triager + branch-cleaner close stale PRs,
  file follow-up issues, delete orphan branches.
- Scan mode (daily): shell-auditor + code-auditor + drift-detector
  perform full repo security audit, file GitHub issues for findings.
- All modes use Claude Code agent teams (TeamCreate, parallel teammates
  via Task tool, SendMessage coordination, TaskList monitoring).
- Workflow updated with schedule triggers and workflow_dispatch inputs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: upgrade all security auditor agents to Opus model

All security-critical roles (code-reviewer, pr-triager, shell-auditor,
code-auditor) now use Opus. Helper roles (test-verifier, branch-cleaner,
drift-detector) remain on Haiku.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: auto-merge PRs with MEDIUM/LOW or no findings

Only CRITICAL/HIGH findings block a PR. MEDIUM/LOW are informational
notes included in the approving review — PR still gets merged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Sprite <noreply@sprites.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-12 14:04:38 -08:00
L
d961947983
fix: download pre-built CLI from GitHub release when local build fails (#728)
Root cause: bun install creates empty directories in proot (Termux)
because proot can't intercept bun's symlink/hardlink/copy_file_range
syscalls. This breaks both local build and source-mode fallback.

Fix: when `bun run build` fails, download the pre-built cli.js from
the `cli-latest` GitHub release. The bundled binary is self-contained
(80KB, all deps inlined) and only needs the bun runtime.

- Add CI workflow (.github/workflows/cli-release.yml) that builds and
  uploads cli.js to a rolling `cli-latest` release on every push to main
- Replace broken source-mode fallback with GitHub release download
- Bump CLI version to 0.2.63

Co-authored-by: Sprite <noreply@sprite.dev>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-12 13:48:45 -08:00
Ahmed Abushagur
8b9f9a0e5a
QA-Bot setup (#335)
* feat: testing

* feat: auto-fix dead apis

* fix: mock works

* feat: new fixtures

* fix: more clouds tested

* fix: dry run fix

* fix: civo valid size

* fix: civo result wait

* feat: fixtures

* feat: per cloud agent
2026-02-10 19:51:07 -08:00
B
200b6dc5b2
fix: Force HTTP/1.1 for streaming to avoid HTTP/2 stream errors
HTTP/2 has strict stream lifecycle management that doesn't play well
with long-lived chunked responses — curl exits with error 92
(stream not closed cleanly: INTERNAL_ERROR). HTTP/1.1 handles
persistent streaming connections natively.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-10 22:51:35 +00:00
B
874b9c95f4
feat: Stream script output back to GH Actions instead of keep-alive
Replace the broken keep-alive ping loop with a fundamentally better
approach: the trigger server now streams the script's stdout/stderr
back as the HTTP response body in chunks. The GH Action holds the
curl connection open for the entire cycle duration (~90 min timeout).

This works because Sprite keeps VMs alive while "actively servicing
HTTP requests." A single long-lived streaming response satisfies
this naturally — no synthetic pings needed.

Key changes:

trigger-server.ts:
- /trigger now returns a streaming text/plain Response
- stdout/stderr piped through ReadableStream with chunked output
- 30s heartbeat lines injected during silent periods
- Client disconnect handled gracefully (process keeps running)
- X-Accel-Buffering: no header to prevent proxy buffering

discovery.yml / refactor.yml:
- curl -sSN --fail-with-body streams output in real-time
- timeout-minutes: 90 to hold the connection for full cycles
- Error responses (429/409/401) still print body and exit cleanly

discovery.sh / refactor.sh:
- Removed all keep-alive logic (start_keepalive/stop_keepalive)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-10 18:09:26 +00:00
A
6f47c852c8
Increase refactor workflow frequency from 30min to 5min
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-10 16:24:12 +00:00
B
1029320cff
refactor: Rename improve to discovery and remove improve CLI command
Rename the GitHub workflow, scripts, and service from "improve" to
"discovery" to better reflect what the automation does. Remove the
`spawn improve` CLI command entirely — the discovery/refactor loops
are internal automation, not user-facing CLI features.

File renames:
- .github/workflows/improve.yml → discovery.yml
- .claude/skills/.../improve.sh → discovery.sh
- .claude/skills/.../start-improve.sh → start-discovery.sh
- Service: improve-trigger → discovery-trigger

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-10 16:13:56 +00:00
A
7ace2695e6
feat: Run issue-fix cycles concurrently with refactor cycles (#145)
Issue triggers now spawn lightweight 2-agent runs (15-min timeout) in
isolated worktrees, while refactor cycles continue independently with
the full 6-agent team (30-min timeout). Duplicate issue runs are
rejected with 409.

- trigger-server.ts: pass SPAWN_ISSUE/SPAWN_REASON env vars to script,
  add issue dedup (409), include issue in health/trigger responses
- refactor.sh: dual-mode (issue vs refactor) with isolated worktrees,
  mode-specific prompts and timeouts, scoped cleanup
- start-refactor.sh: set MAX_CONCURRENT=3 (gitignored, local only)
- refactor.yml: handle 409 alongside existing 429

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-09 22:15:19 -08:00
B
6b5a547e2d
fix: Treat 429 (cycle already running) as success in workflows
When MAX_CONCURRENT=1 and a cycle is in progress, the trigger server
returns 429. This is expected behavior, not an error — the previous
curl -f treated it as failure (exit code 22).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-10 03:43:32 +00:00
A
ab343d26a2
fix: Prevent duplicate work, add graceful shutdown, and enforce team lifecycle (#86)
- Change trigger-server MAX_CONCURRENT default from 3 to 1 to prevent
  overlapping cycles that duplicate GitHub issue comments
- Add SIGTERM/SIGINT handling to trigger-server so running scripts finish
  gracefully on service restart instead of being killed mid-flight
- Add cleanup trap to refactor.sh for worktree/tempfile cleanup on exit
- Add pre-cycle cleanup of stale worktrees, merged branches, and
  abandoned PRs from previously interrupted cycles
- Add mandatory Lifecycle Management section to team lead prompt requiring
  shutdown_request to all teammates before exiting
- Add dedup checks to community-coordinator: check existing comments
  before posting to prevent duplicate acknowledgments/resolutions
- Pass issue number in workflow trigger reason for better logging

Co-authored-by: A <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-09 09:10:56 -08:00
B
4c456df091
fix: Switch to direct sprite URL with bearer auth
The Sprite start service API (/services/{name}/start) returns
"service name required" for all service names — appears to be an API
bug. Switched to hitting the sprite's public URL directly with
TRIGGER_SECRET bearer auth instead.

- Re-added TRIGGER_SECRET auth to trigger-server.ts
- Set sprite url_settings.auth to "public"
- Updated both workflows to use SPRITE_URL + TRIGGER_SECRET pattern
- Aligned workflow structure (both use same env vars and curl format)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-09 09:07:49 +00:00
B
9eb9e74295
debug: Print secret lengths and hash to verify values
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-09 08:35:23 +00:00
B
87e5790880
debug: Echo SVC_NAME in refactor workflow
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-09 08:16:52 +00:00
B
341710d1cc
rename: Improve workflow to Discovery
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-09 08:15:19 +00:00
B
460ee25690
chore: Align improve workflow with refactor workflow
- Use env vars from secrets instead of hardcoded names
- Add issues trigger (opened, reopened)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-09 08:14:36 +00:00
Sprite
a361d92e13
fix: Pass env vars correctly in refactor workflow
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-09 08:13:09 +00:00
Sprite
15dd5e264f
debug: Exact curl from docs with hyphenated service name
2026-02-09 01:14:07 +00:00
Sprite
4f78b9b172
debug: Test alternate URL path formats
2026-02-09 01:12:22 +00:00
Sprite
4c35f1db78
debug: Test start API vs direct sprite URL
2026-02-09 01:11:12 +00:00
Sprite
a433b067ad
debug: Test start service with body and alternate paths
2026-02-09 01:10:03 +00:00
Sprite
44cafc7cc5
debug: Test API at each level to isolate failure
2026-02-09 01:08:59 +00:00
Sprite
58f9e8d34d
debug: Hardcode sprite/service names to isolate API issue
2026-02-09 01:07:47 +00:00
Sprite
6066afcf18
fix: Rename service to improve_trigger (underscores for API compat)
Sprite API rejects service names with hyphens. Renamed from
improve-trigger to improve_trigger.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-09 01:06:46 +00:00
Sprite
774c3d0cc1
debug: Add verbose logging to improve workflow
2026-02-09 01:05:33 +00:00
Sprite
758e79bb59
fix: Inline secret refs in curl URL to avoid env var issues
SERVICE_NAME env var may conflict with GitHub Actions internals.
Inline the secrets directly in the URL template instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-09 01:04:26 +00:00
Sprite
57cf080c39
chore: Run refactor workflow every 30 minutes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-09 00:09:04 +00:00
Sprite
66221dac80
fix: Use duration=0s to fire-and-forget on start service API
The Sprite start service API returns streaming NDJSON, causing curl -f
to fail with exit code 22. Use duration=0s to return immediately and
drop -f flag since the response is streaming.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-08 23:40:50 +00:00
Sprite
b7b102a352
fix: Remove curl timeout on trigger workflows
Sprite may take time to wake from pause, causing --max-time 30 to fail
with exit code 22.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-08 21:33:03 +00:00
Sprite
38ffd7ebd6
feat: Update trigger workflows to use Sprite start service API
- Replace SPRITE_URL/SPRITE_SECRET pattern with SPRITE_NAME/SERVICE_NAME
- Use Sprite start service API endpoint (api.sprites.dev)
- Share SPRITE_TOKEN across all services
- Update skill documentation to reflect new approach
- Delete deprecated URL/SECRET based secrets

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-08 20:29:19 +00:00