spawn/sh
a96522829b
fix(e2e): fix interactive E2E test chain (provision → install → input test) (#2898)
* fix(e2e): pass SPAWN_NAME + SPAWN_ENABLED_STEPS to interactive harness

Without SPAWN_NAME, cmdRun prompts 'Name your spawn' interactively.
The AI driver (Claude Haiku) can't respond because ANTHROPIC_AUTH_TOKEN
is an OpenRouter key: every Anthropic API call returns 401, so the harness
keeps returning <wait> until the 20-min SESSION_TIMEOUT_MS fires.

SPAWN_ENABLED_STEPS=auto-update bypasses the setup options multiselect,
ensuring the harness only tests the provisioning/installation UX.

* fix(e2e): fix _stage_timeout_remotely stdin pipe issue on Hetzner

Same root cause as _stage_prompt_remotely: _hetzner_exec runs commands via
"printf | base64 -d | bash", which makes bash's stdin the decode pipe.
So piped data from the outer SSH call never reaches subcommands.

"printf '%s' 'VALUE' | cloud_exec APP 'cat > /tmp/.e2e-timeout'" always
creates an empty file, causing "timeout: invalid time interval ''" when
the input test runs.

Fix: embed the validated numeric timeout value directly in the printf
command string (safe — _validate_timeout ensures only [0-9] digits).
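
The stdin behaviour described above can be reproduced locally. This is a
minimal sketch, not the repo's code: fake_exec is a hypothetical stand-in
for _hetzner_exec that runs the same "printf | base64 -d | bash" chain
without SSH, and the value 120 and the temp file are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for _hetzner_exec: the command travels as base64 and
# is decoded straight into bash's stdin (no SSH involved in this sketch).
fake_exec() {
  local encoded
  encoded=$(printf '%s' "$1" | base64 | tr -d '\n')
  bash -c "printf '%s' '$encoded' | base64 -d | bash"
}

out=$(mktemp)

# Broken: the value is piped into fake_exec, but the inner bash (and the cat
# it starts) read from the decode pipe, so the piped value is never seen.
printf '%s' '120' | fake_exec "cat > '$out'"
broken=$(cat "$out")

# Fixed: embed the digits-only value in the command string itself.
timeout_value=120   # assumed to be already validated as [0-9] only
fake_exec "printf '%s' '$timeout_value' > '$out'"
fixed=$(cat "$out")

echo "broken='$broken' fixed='$fixed'"
rm -f "$out"
```

Embedding the value is only safe because it is restricted to digits; an
arbitrary string would need escaping before being spliced into the command.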

* test(e2e): add claude PATH diagnostics to input_test_claude

Temporary debug output to trace where claude is installed
after interactive provision completes.

* test(e2e): save harness transcript JSON on success for debugging

* fix(e2e): remove 'is ready' from harness success pattern

'SSH is ready' (emitted ~15s into provision when SSH connects but before
any agent installation) matched the /is ready/ pattern, triggering false
success detection. The harness killed the spawn CLI during cloud-init wait,
leaving a VM with no agent installed.

Fix: use the same precise patterns as the main repo's harness:
  /Starting agent\.\.\.|setup completed successfully/i
Both only fire after orchestrate.ts completes the full setup.
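
The false positive can be checked with a quick sketch. It uses grep -Ei as
an approximation of the harness's JavaScript regex; the matches helper and
the variable names are hypothetical, and only the sample lines and patterns
quoted in this message are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the line matches the extended regex,
# case-insensitively (approximating the /.../i flag).
matches() { printf '%s\n' "$2" | grep -Eiq -- "$1"; }

old_fragment='is ready'
new_pattern='Starting agent\.\.\.|setup completed successfully'

early_line='SSH is ready'       # emitted before any agent installation
late_line='Starting agent...'   # emitted once setup has finished

for line in "$early_line" "$late_line"; do
  if matches "$new_pattern" "$line"; then
    echo "success: $line"
  else
    echo "keep waiting: $line"
  fi
done
```

With the removed fragment, the early line would have counted as success;
with the new pattern only the late line does.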

* chore(e2e): remove temporary debug instrumentation

* feat(e2e): add ai-powered ux review after interactive provision

After each successful interactive E2E run, the harness sends the full
terminal transcript to Claude (via OpenRouter) with a UX reviewer prompt.
It looks for confusing messages, noisy output, missing context in spinners,
and unhelpful errors that don't explain next steps.

Findings are returned as uxIssues[] in the harness JSON result.
interactive.sh then files one GitHub issue per run, listing each problem
with a verbatim example and a concrete suggestion.

Uses OPENROUTER_API_KEY (already in env) so it works on the QA VM
where ANTHROPIC_API_KEY is an OpenRouter key.

* refactor(e2e): throttle ux issue filing — 33% chance, 3+ issues required

- Random 33% gate: UX review runs on ~1 in 3 successful interactive
  provisions, not every run
- Minimum bar: only surface findings when AI found 3+ clear issues
  (filters one-off nits)
- Tighter system prompt: only flag obvious problems (repeated messages,
  debug leaks, cryptic errors), not minor style preferences

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(e2e): replace random throttle with stricter ux review prompt

Instead of Math.random() to suppress issues, make the AI self-regulate:
the system prompt now instructs it to only flag genuinely bad problems
(repeated messages, raw stack traces, no-feedback waits) and treat
zero findings as a good outcome, not a failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: spawn-qa-bot <qa@openrouter.ai>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 13:42:02 +07:00
Name | Last commit | Date
aws | docs: add missing agent entries to all cloud READMEs (#2494) | 2026-03-11 05:49:50 -04:00
cli | feat(cli): add spawn uninstall command (#2724) | 2026-03-17 16:33:09 -07:00
digitalocean | fix(do): skip _run_with_restart in headless mode to prevent duplicate droplets (#2805) | 2026-03-19 16:12:25 -07:00
docker | feat: add junie Dockerfile for Docker image builds (#2601) | 2026-03-13 19:40:51 -07:00
e2e | fix(e2e): fix interactive E2E test chain (provision → install → input test) (#2898) | 2026-03-23 13:42:02 +07:00
gcp | feat(gcp): default boot disk to 40 GB, configurable via GCP_DISK_SIZE (#2867) | 2026-03-22 11:21:05 +07:00
hetzner | docs: add missing agent entries to all cloud READMEs (#2494) | 2026-03-11 05:49:50 -04:00
local | docs: add missing agent entries to all cloud READMEs (#2494) | 2026-03-11 05:49:50 -04:00
shared | fix: add sprite-keep-running.sh, remove Hetzner from Packer, cleanup on cancel (#2869) | 2026-03-22 18:13:38 +00:00
sprite | docs: add missing agent entries to all cloud READMEs (#2494) | 2026-03-11 05:49:50 -04:00
test | refactor: Remove dead code and stale references (#2062) | 2026-03-01 11:45:24 -05:00