spawn

vrr/spawn

mirror of https://github.com/OpenRouterTeam/spawn.git synced 2026-05-20 01:11:18 +00:00

a96522829b fix(e2e): fix interactive E2E test chain (provision → install → input test) (#2898)	2026-03-23 13:42:02 +07:00

* fix(e2e): pass SPAWN_NAME + SPAWN_ENABLED_STEPS to interactive harness

  Without SPAWN_NAME, cmdRun prompts 'Name your spawn' interactively. The AI driver (Claude Haiku) can't respond because ANTHROPIC_AUTH_TOKEN is an OpenRouter key: every Anthropic API call returns 401, so the harness returns <wait> indefinitely until the 20-min SESSION_TIMEOUT_MS fires.

  SPAWN_ENABLED_STEPS=auto-update bypasses the setup options multiselect, ensuring the harness only tests the provisioning/installation UX.

* fix(e2e): fix _stage_timeout_remotely stdin pipe issue on Hetzner

  Same root cause as _stage_prompt_remotely: _hetzner_exec runs commands via "printf | base64 -d | bash", which makes bash's stdin the decode pipe. So piped data from the outer SSH call never reaches subcommands.

  "printf '%s' 'VALUE' | cloud_exec APP 'cat > /tmp/.e2e-timeout'" always creates an empty file, causing "timeout: invalid time interval ''" when the input test runs.

  Fix: embed the validated numeric timeout value directly in the printf command string (safe — _validate_timeout ensures only [0-9] digits).

* test(e2e): add claude PATH diagnostics to input_test_claude

  Temporary debug output to trace where claude is installed after interactive provision completes.

* test(e2e): save harness transcript JSON on success for debugging

* fix(e2e): remove 'is ready' from harness success pattern

  'SSH is ready' (emitted ~15s into provision when SSH connects but before any agent installation) matched the /is ready/ pattern, triggering false success detection. The harness killed the spawn CLI during cloud-init wait, leaving a VM with no agent installed.

  Fix: use the same precise patterns as the main repo's harness:

    /Starting agent\.\.\.|setup completed successfully/i

  Both only fire after orchestrate.ts completes the full setup.

* chore(e2e): remove temporary debug instrumentation

* feat(e2e): add ai-powered ux review after interactive provision

  After each successful interactive E2E run, the harness sends the full terminal transcript to Claude (via OpenRouter) with a UX reviewer prompt. It looks for confusing messages, noisy output, missing context in spinners, and unhelpful errors that don't explain next steps.

  Findings are returned as uxIssues[] in the harness JSON result. interactive.sh then files a GitHub issue per run listing each problem with a verbatim example and concrete suggestion.

  Uses OPENROUTER_API_KEY (already in env) so it works on the QA VM where ANTHROPIC_API_KEY is an OpenRouter key.

* refactor(e2e): throttle ux issue filing — 33% chance, 3+ issues required

  - Random 33% gate: UX review runs on ~1 in 3 successful interactive provisions, not every run
  - Minimum bar: only surface findings when AI found 3+ clear issues (filters one-off nits)
  - Tighter system prompt: only flag obvious problems (repeated messages, debug leaks, cryptic errors), not minor style preferences

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(e2e): replace random throttle with stricter ux review prompt

  Instead of Math.random() to suppress issues, make the AI self-regulate: the system prompt now instructs it to only flag genuinely bad problems (repeated messages, raw stack traces, no-feedback waits) and treat zero findings as a good outcome, not a failure.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
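
The stdin problem described in the _stage_timeout_remotely fix can be reproduced locally. The following is a minimal sketch under stated assumptions, not the repository's code: fake_exec, validate_timeout and stage_timeout are hypothetical stand-ins for _hetzner_exec, _validate_timeout and _stage_timeout_remotely, whose real implementations are not shown here. fake_exec runs the command on the local machine instead of over SSH, but hands it to bash through the same "printf | base64 -d | bash" shape, and it assumes a base64 that accepts -d (as GNU coreutils does).

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for _hetzner_exec: the command text reaches bash
# on its stdin, through the base64 decode pipe.
fake_exec() {
  local encoded
  encoded=$(printf '%s' "$1" | base64 | tr -d '\n')
  # bash's stdin is now the decode pipe, so any data piped into
  # fake_exec from outside is never seen by the command it runs.
  printf '%s' "$encoded" | base64 -d | bash
}

# Hypothetical validator: accept only a non-empty string of digits.
validate_timeout() {
  case "$1" in
    '' | *[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Sketch of the fix: embed the validated value in the command string
# instead of piping it in. Embedding is only safe because the value
# has been checked to contain digits and nothing else.
stage_timeout() {
  local value="$1" dest="$2"
  validate_timeout "$value" || { echo "invalid timeout: $value" >&2; return 1; }
  fake_exec "printf '%s' '${value}' > '${dest}'"
}

demo_dir=$(mktemp -d)

# Broken pattern: 'cat' inherits the already-drained decode pipe,
# so the staged file is created but stays empty.
printf '%s' 30 | fake_exec "cat > '${demo_dir}/piped'"

# Fixed pattern: the value travels inside the command string itself.
stage_timeout 30 "${demo_dir}/embedded"
```

After running this, the file named piped is empty while the file named embedded contains the timeout value, which mirrors the "invalid time interval ''" symptom and its fix. Rejecting anything other than digits before embedding is what keeps the string interpolation from becoming a command injection path.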
..
aws	docs: add missing agent entries to all cloud READMEs (#2494)	2026-03-11 05:49:50 -04:00
cli	feat(cli): add spawn uninstall command (#2724)	2026-03-17 16:33:09 -07:00
digitalocean	fix(do): skip _run_with_restart in headless mode to prevent duplicate droplets (#2805)	2026-03-19 16:12:25 -07:00
docker	feat: add junie Dockerfile for Docker image builds (#2601)	2026-03-13 19:40:51 -07:00
e2e	fix(e2e): fix interactive E2E test chain (provision → install → input test) (#2898)	2026-03-23 13:42:02 +07:00
gcp	feat(gcp): default boot disk to 40 GB, configurable via GCP_DISK_SIZE (#2867)	2026-03-22 11:21:05 +07:00
hetzner	docs: add missing agent entries to all cloud READMEs (#2494)	2026-03-11 05:49:50 -04:00
local	docs: add missing agent entries to all cloud READMEs (#2494)	2026-03-11 05:49:50 -04:00
shared	fix: add sprite-keep-running.sh, remove Hetzner from Packer, cleanup on cancel (#2869)	2026-03-22 18:13:38 +00:00
sprite	docs: add missing agent entries to all cloud READMEs (#2494)	2026-03-11 05:49:50 -04:00
test	refactor: Remove dead code and stale references (#2062)	2026-03-01 11:45:24 -05:00