Commit graph

10 commits

Author SHA1 Message Date
Ahmed Abushagur
ed127cf592
feat: never-give-up resilience layer (#2807)
Some checks failed
CLI Release / Build and release CLI (push) Failing after 5s
Lint / Biome Lint (push) Failing after 4s
Lint / macOS Compatibility (push) Successful in 15s
Lint / ShellCheck (push) Successful in 59s
* feat: never-give-up resilience layer — retry every failure instead of exiting

Add retryOrQuit() helper to shared/ui.ts that prompts "Try again? (Y/n)"
after any recoverable failure. Wrap all fatal exit points with retry loops:

- Cloud auth (Hetzner, DigitalOcean, AWS, GCP): retry after 3 failed tokens
- API key acquisition: retry after 3 failed OAuth+manual attempts
- Server creation: retry on any createServer failure (both fast & sequential)
- SSH readiness: retry on waitForReady timeout
- Agent install: retry on install failure
- Pre-launch hooks: retry on preLaunch failure

Non-interactive mode (SPAWN_NON_INTERACTIVE=1) still throws immediately.
Ctrl+C at any retry prompt exits cleanly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(e2e): add AI-driven interactive test harness

Add --interactive mode to the E2E test framework. Instead of running spawn
in headless mode (SPAWN_NON_INTERACTIVE=1), this spawns the CLI in a real
PTY and uses Claude Haiku to respond to prompts like a human user would.

New files:
- sh/e2e/interactive-harness.ts — Bun script that drives the PTY + AI loop
- sh/e2e/lib/interactive.sh — Bash integration with the E2E framework

Usage:
  e2e.sh --cloud hetzner claude --interactive

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(qa): wire interactive E2E into scheduled QA pipeline

- Add `e2e-interactive` option to workflow_dispatch in qa.yml
- Add `e2e-interactive` run mode to qa.sh (loads cloud creds + ANTHROPIC_API_KEY)
- Runs `e2e.sh --cloud hetzner claude --interactive` directly (no Claude Code needed)
- Defaults to hetzner (cheapest), overridable via E2E_INTERACTIVE_CLOUD/AGENT env vars

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(qa): schedule interactive E2E daily at 6am UTC

Runs one agent (claude) on one cloud (hetzner) with AI-driven prompts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(qa): offset soak cron to avoid GitHub Actions schedule dedup

GitHub Actions deduplicates overlapping cron schedules into one run,
making `github.event.schedule` unpredictable. The soak test at `0 3 * * 1`
was getting absorbed by the `0 */4 * * *` quality sweep and never firing
as reason=soak.

Move soak to `30 1 * * 1` (Monday 1:30am UTC) — safely between the
0am and 4am quality sweep slots. Interactive E2E at `0 6 * * *` is
already safe (between the 4am and 8am slots).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(qa): add e2e-interactive to trigger server valid reasons

The trigger server validates reason query params against an allowlist.
Without this, the `e2e-interactive` dispatch returns 400.

Also note: `soak` is already in VALID_REASONS in the repo but the running
service on the QA VM is stale — needs a restart to pick up both soak and
e2e-interactive reasons.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 17:33:22 -07:00
A
6081c0a17f
feat(qa): telegram soak test on digitalocean + fix bun -e (#2547)
- soak.sh: SOAK_CLOUD env var makes cloud configurable (default: sprite)
- qa.sh: load TELEGRAM_BOT_TOKEN, TELEGRAM_TEST_CHAT_ID, SOAK_CLOUD from
  /etc/spawn-qa-auth.env in soak mode
- qa.yml: add weekly Monday 3am UTC scheduled soak trigger
- fix: bun eval → bun -e across soak.sh, key-request.sh, github-auth.sh
  (bun eval is not a valid subcommand in bun 1.3.9)
- fix: export _TOKEN via env prefix so process.env._TOKEN works in bun -e
- docs: update shell-scripts.md rule to say bun -e (not bun eval)

Verified: 3/4 Telegram tests pass in smoke test on DigitalOcean (120s wait)
getMe ✓ sendMessage ✓ getWebhookInfo ✓; cron test needs full 55-min window.

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 19:45:18 -04:00
Ahmed Abushagur
330c10fcd2
feat: add Telegram soak test for OpenClaw (--soak mode) (#2492)
Add a soak test that provisions OpenClaw on Sprite, waits 1 hour for
stabilization, injects a Telegram bot token, and runs integration tests
against the Telegram Bot API (getMe, sendMessage, getWebhookInfo).

- New: sh/e2e/lib/soak.sh — soak test library with all Telegram-specific logic
- Modified: sh/e2e/e2e.sh — add --soak flag to arg parser
- Modified: qa.sh — add soak run mode (bypasses Claude, runs e2e.sh directly)
- Modified: trigger-server.ts — add "soak" to VALID_REASONS
- Modified: qa.yml — add soak to workflow_dispatch options

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: A <258483684+la14-1@users.noreply.github.com>
2026-03-11 05:51:53 -04:00
A
b2bddc4ba5
ci: bump QA cron from daily to every 4 hours (#1895)
Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 16:46:55 -08:00
A
98a0d0f68f
feat(qa): add e2e-tester as subagent in scheduled quality sweep (#1894)
E2E tests now run as a 4th teammate alongside test-runner,
dedup-scanner, and code-quality-reviewer during schedule-triggered
QA cycles. The standalone e2e mode is preserved for on-demand use.

- Add e2e-tester teammate to qa-quality-prompt.md
- Increase quality mode timeout from 35 to 40 min
- Add "e2e" to trigger-server valid reasons
- Re-enable daily schedule in qa.yml, default to "schedule"

Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 19:34:35 -05:00
A
fccb73d147
feat: add Fly.io E2E test suite and QA e2e mode (#1823)
- Add e2e/ directory with fly-e2e.sh orchestrator and lib/ helpers
  (provision, verify, teardown, cleanup) that provision real Fly.io VMs,
  verify agent installation, and tear everything down
- Fix openclaw E2E failure by setting MODEL_ID=openrouter/auto to bypass
  interactive model selection prompt in headless mode
- Add e2e mode to qa.sh (reason=e2e) that launches a Claude agent to run
  the E2E suite and investigate/fix any failures
- Update qa.yml with reason dropdown (e2e/schedule/fixtures), kept disabled

Co-authored-by: spawn-bot <spawn-bot@openrouter.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 19:31:44 -05:00
A
33bd3e615c
chore: disable QA workflow schedule until VM is fixed (#1722)
Keep workflow_dispatch for manual testing. Re-enable cron when the
QA VM is back online.

Co-authored-by: lab <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-22 11:06:50 -08:00
L
6e13256d96
refactor: simplify claude launch — no streaming, no output monitoring (#1412)
Replace the complex claude launch pattern (subshell + PID file + tee
pipe + stream-json + 50-line watchdog monitoring log file growth +
session-end detection) with a simple direct launch:

  claude -p "..." >> "${LOG_FILE}" 2>&1 &

The watchdog is now just a wall-clock timeout. The idle-output detection,
stream-json result parsing, and tee piping are all removed.

Also remove GitHub Actions concurrency groups — the trigger server
already handles dedup (409 for same issue, 409 for same reason), making
the GH Actions concurrency groups redundant queuing.

Changes:
- refactor.sh: simple launch + wall-clock-only watchdog
- security.sh: same simplification
- discovery.sh: same (refactored _kill_claude_process and
  _run_watchdog_loop to simpler signatures)
- All 4 workflows: remove concurrency groups

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-17 09:02:47 -08:00
L
f3cfe890f7
refactor: simplify trigger server to fire-and-forget + fix monitoring loop prompts (#1384)
The trigger server streamed script stdout back to GitHub Actions via a
long-lived HTTP response, requiring --http1.1, heartbeat injection,
server.timeout(req, 0), createEnqueuer, drainStreamOutput, and 90-min
GH Actions timeouts. In practice GitHub Actions is just a dumb trigger
— the real state lives on the VM (log files, journalctl). Simplify to
fire-and-forget: spawn script, return 200 JSON immediately.

Also fix the refactor and discovery team lead monitoring loops. The
prompts buried the loop in a single compressed line that the model
ignored (doing Bash("sleep 10") repeatedly without calling TaskList).
Replace with a dedicated "Monitor Loop (CRITICAL)" section with numbered
steps, matching the security.sh pattern that actually works.

Changes:
- trigger-server.ts: remove ~150 lines of streaming code (createEnqueuer,
  drainStreamOutput, startStreamingRun, heartbeat, ReadableStream),
  replace with startFireAndForgetRun (stdout: "inherit", immediate JSON)
- All 4 workflows: simple curl POST, timeout-minutes 90→5, remove
  --http1.1/-N/--max-time/exit-code handling
- refactor.sh: add Monitor Loop (CRITICAL) section with numbered steps
- discovery-team-prompt.txt: same Monitor Loop fix
- SKILL.md: update architecture docs, remove streaming sections

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-02-17 10:47:52 -05:00
Ahmed Abushagur
8b9f9a0e5a
QA-Bot setup (#335)
* feat: testing

* feat: auto-fix dead apis

* fix: mock works

* feat: new fixtures

* fix: more clouds tested

* fix: dry run fix

* fix: civo valid size

* fix: civo result wait

* feat: fixtures

* feat: per cloud agent
2026-02-10 19:51:07 -08:00