spawn

vrr/spawn

mirror of https://github.com/OpenRouterTeam/spawn.git synced 2026-05-01 21:30:21 +00:00

Author	SHA1	Message	Date
Ahmed Abushagur	ed127cf592	feat: never-give-up resilience layer (#2807 ) Some checks failed CLI Release / Build and release CLI (push) Failing after 5s Details Lint / Biome Lint (push) Failing after 4s Details Lint / macOS Compatibility (push) Successful in 15s Details Lint / ShellCheck (push) Successful in 59s Details * feat: never-give-up resilience layer — retry every failure instead of exiting Add retryOrQuit() helper to shared/ui.ts that prompts "Try again? (Y/n)" after any recoverable failure. Wrap all fatal exit points with retry loops: - Cloud auth (Hetzner, DigitalOcean, AWS, GCP): retry after 3 failed tokens - API key acquisition: retry after 3 failed OAuth+manual attempts - Server creation: retry on any createServer failure (both fast & sequential) - SSH readiness: retry on waitForReady timeout - Agent install: retry on install failure - Pre-launch hooks: retry on preLaunch failure Non-interactive mode (SPAWN_NON_INTERACTIVE=1) still throws immediately. Ctrl+C at any retry prompt exits cleanly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(e2e): add AI-driven interactive test harness Add --interactive mode to the E2E test framework. Instead of running spawn in headless mode (SPAWN_NON_INTERACTIVE=1), this spawns the CLI in a real PTY and uses Claude Haiku to respond to prompts like a human user would. New files: - sh/e2e/interactive-harness.ts — Bun script that drives the PTY + AI loop - sh/e2e/lib/interactive.sh — Bash integration with the E2E framework Usage: e2e.sh --cloud hetzner claude --interactive Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(qa): wire interactive E2E into scheduled QA pipeline - Add `e2e-interactive` option to workflow_dispatch in qa.yml - Add `e2e-interactive` run mode to qa.sh (loads cloud creds + ANTHROPIC_API_KEY) - Runs `e2e.sh --cloud hetzner claude --interactive` directly (no Claude Code needed) - Defaults to hetzner (cheapest), overridable via E2E_INTERACTIVE_CLOUD/AGENT env vars Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(qa): schedule interactive E2E daily at 6am UTC Runs one agent (claude) on one cloud (hetzner) with AI-driven prompts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(qa): offset soak cron to avoid GitHub Actions schedule dedup GitHub Actions deduplicates overlapping cron schedules into one run, making `github.event.schedule` unpredictable. The soak test at `0 3 * * 1` was getting absorbed by the `0 /4 * ` quality sweep and never firing as reason=soak. Move soak to `30 1 * 1` (Monday 1:30am UTC) — safely between the 0am and 4am quality sweep slots. Interactive E2E at `0 6 * * ` is already safe (between the 4am and 8am slots). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> fix(qa): add e2e-interactive to trigger server valid reasons The trigger server validates reason query params against an allowlist. Without this, the `e2e-interactive` dispatch returns 400. Also note: `soak` is already in VALID_REASONS in the repo but the running service on the QA VM is stale — needs a restart to pick up both soak and e2e-interactive reasons. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 17:33:22 -07:00
A	6081c0a17f	feat(qa): telegram soak test on digitalocean + fix bun -e (#2547 ) - soak.sh: SOAK_CLOUD env var makes cloud configurable (default: sprite) - qa.sh: load TELEGRAM_BOT_TOKEN, TELEGRAM_TEST_CHAT_ID, SOAK_CLOUD from /etc/spawn-qa-auth.env in soak mode - qa.yml: add weekly Monday 3am UTC scheduled soak trigger - fix: bun eval → bun -e across soak.sh, key-request.sh, github-auth.sh (bun eval is not a valid subcommand in bun 1.3.9) - fix: export _TOKEN via env prefix so process.env._TOKEN works in bun -e - docs: update shell-scripts.md rule to say bun -e (not bun eval) Verified: 3/4 Telegram tests pass in smoke test on DigitalOcean (120s wait) getMe ✓ sendMessage ✓ getWebhookInfo ✓; cron test needs full 55-min window. Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-12 19:45:18 -04:00
Ahmed Abushagur	330c10fcd2	feat: add Telegram soak test for OpenClaw (--soak mode) (#2492 ) Add a soak test that provisions OpenClaw on Sprite, waits 1 hour for stabilization, injects a Telegram bot token, and runs integration tests against the Telegram Bot API (getMe, sendMessage, getWebhookInfo). - New: sh/e2e/lib/soak.sh — soak test library with all Telegram-specific logic - Modified: sh/e2e/e2e.sh — add --soak flag to arg parser - Modified: qa.sh — add soak run mode (bypasses Claude, runs e2e.sh directly) - Modified: trigger-server.ts — add "soak" to VALID_REASONS - Modified: qa.yml — add soak to workflow_dispatch options Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: A <258483684+la14-1@users.noreply.github.com>	2026-03-11 05:51:53 -04:00
A	b2bddc4ba5	ci: bump QA cron from daily to every 4 hours (#1895 ) Co-authored-by: spawn-qa-bot <qa@openrouter.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-24 16:46:55 -08:00
A	98a0d0f68f	feat(qa): add e2e-tester as subagent in scheduled quality sweep (#1894 ) E2E tests now run as a 4th teammate alongside test-runner, dedup-scanner, and code-quality-reviewer during schedule-triggered QA cycles. The standalone e2e mode is preserved for on-demand use. - Add e2e-tester teammate to qa-quality-prompt.md - Increase quality mode timeout from 35 to 40 min - Add "e2e" to trigger-server valid reasons - Re-enable daily schedule in qa.yml, default to "schedule" Co-authored-by: spawn-bot <spawn-bot@openrouter.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-24 19:34:35 -05:00
A	fccb73d147	feat: add Fly.io E2E test suite and QA e2e mode (#1823 ) - Add e2e/ directory with fly-e2e.sh orchestrator and lib/ helpers (provision, verify, teardown, cleanup) that provision real Fly.io VMs, verify agent installation, and tear everything down - Fix openclaw E2E failure by setting MODEL_ID=openrouter/auto to bypass interactive model selection prompt in headless mode - Add e2e mode to qa.sh (reason=e2e) that launches a Claude agent to run the E2E suite and investigate/fix any failures - Update qa.yml with reason dropdown (e2e/schedule/fixtures), kept disabled Co-authored-by: spawn-bot <spawn-bot@openrouter.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 19:31:44 -05:00
A	33bd3e615c	chore: disable QA workflow schedule until VM is fixed (#1722 ) Keep workflow_dispatch for manual testing. Re-enable cron when the QA VM is back online. Co-authored-by: lab <6723574+louisgv@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-22 11:06:50 -08:00
L	6e13256d96	refactor: simplify claude launch — no streaming, no output monitoring (#1412 ) Replace the complex claude launch pattern (subshell + PID file + tee pipe + stream-json + 50-line watchdog monitoring log file growth + session-end detection) with a simple direct launch: claude -p "..." >> "${LOG_FILE}" 2>&1 & The watchdog is now just a wall-clock timeout. The idle-output detection, stream-json result parsing, and tee piping are all removed. Also remove GitHub Actions concurrency groups — the trigger server already handles dedup (409 for same issue, 409 for same reason), making the GH Actions concurrency groups redundant queuing. Changes: - refactor.sh: simple launch + wall-clock-only watchdog - security.sh: same simplification - discovery.sh: same (refactored _kill_claude_process and _run_watchdog_loop to simpler signatures) - All 4 workflows: remove concurrency groups Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-17 09:02:47 -08:00
L	f3cfe890f7	refactor: simplify trigger server to fire-and-forget + fix monitoring loop prompts (#1384 ) The trigger server streamed script stdout back to GitHub Actions via a long-lived HTTP response, requiring --http1.1, heartbeat injection, server.timeout(req, 0), createEnqueuer, drainStreamOutput, and 90-min GH Actions timeouts. In practice GitHub Actions is just a dumb trigger — the real state lives on the VM (log files, journalctl). Simplify to fire-and-forget: spawn script, return 200 JSON immediately. Also fix the refactor and discovery team lead monitoring loops. The prompts buried the loop in a single compressed line that the model ignored (doing Bash("sleep 10") repeatedly without calling TaskList). Replace with a dedicated "Monitor Loop (CRITICAL)" section with numbered steps, matching the security.sh pattern that actually works. Changes: - trigger-server.ts: remove ~150 lines of streaming code (createEnqueuer, drainStreamOutput, startStreamingRun, heartbeat, ReadableStream), replace with startFireAndForgetRun (stdout: "inherit", immediate JSON) - All 4 workflows: simple curl POST, timeout-minutes 90→5, remove --http1.1/-N/--max-time/exit-code handling - refactor.sh: add Monitor Loop (CRITICAL) section with numbered steps - discovery-team-prompt.txt: same Monitor Loop fix - SKILL.md: update architecture docs, remove streaming sections Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-17 10:47:52 -05:00
Ahmed Abushagur	8b9f9a0e5a	QA-Bot setup (#335 ) * feat: testing * feat: auto-fix dead apis * fix: mock works * feat: new fixtures * fix: more clouds tested * fix: dry run fix * fix: civo valid size * fix: civo result wait * feat: fixtures * feat: per cloud agent	2026-02-10 19:51:07 -08:00

10 commits