spawn/sh/e2e/lib/clouds
A e9cbab5b7f
fix(sprite): add retry for list failures, increase timeout, refresh auth on expiry (#2936)
Three fixes for Sprite E2E failures in long-running batches (73+ min):

1. Retry `_sprite_provision_verify`: list failures now retry 3x with
   exponential backoff (5s, 10s, 20s) instead of failing immediately.
   Fixes kilocode batch 6 "Could not list Sprite instances" errors.

2. Increase `CREATE_TIMEOUT_SECS` default from 300s to 600s and add
   `Client.Timeout`, `request canceled`, and `authentication failed`
   to the transient error retry pattern in `spriteRetry`. Retries now use
   linear backoff (3s * attempt) instead of a fixed 3s delay.
   Fixes hermes batch 7 HTTP timeout errors.

3. Add `_sprite_refresh_auth` + `cloud_refresh_auth` interface. The
   E2E orchestrator calls `cloud_refresh_auth` before each provisioning
   batch. For Sprite, this re-validates the token via `sprite org list`
   and attempts `sprite auth refresh` if expired.
   Fixes junie batch 8 "authentication failed" errors.
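
The three behaviors above can be sketched roughly as follows. This is an illustrative sketch, not the code from this commit: the function names `retry_exponential`, `is_transient_error`, `linear_delay_secs`, and `refresh_auth_sketch` and the `SLEEP_CMD` / `SPRITE_BIN` overrides are hypothetical. It assumes "retry 3x" means three retries after the first attempt, and that `sprite org list` and `sprite auth refresh` exit non-zero on failure.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; all function names below are hypothetical.

# Retry a command, doubling the delay after each failure.
# Usage: retry_exponential <max_retries> <base_delay_secs> <command...>
retry_exponential() {
  local max_retries="$1" delay="$2" attempt=0
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_retries" ]; then
      return 1                          # retries exhausted
    fi
    attempt=$((attempt + 1))
    printf 'attempt %d failed, retrying in %ss\n' "$attempt" "$delay" >&2
    "${SLEEP_CMD:-sleep}" "$delay"      # SLEEP_CMD is a test hook (assumption)
    delay=$((delay * 2))                # 5s -> 10s -> 20s for a base of 5
  done
  return 0
}

# Return 0 if an error message matches one of the transient patterns
# named in this commit message.
is_transient_error() {
  case "$1" in
    *"Client.Timeout"*|*"request canceled"*|*"authentication failed"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Linear backoff: 3s * attempt number.
linear_delay_secs() {
  echo $((3 * $1))
}

# Re-validate the token; attempt a refresh only if validation fails.
refresh_auth_sketch() {
  local sprite_bin="${SPRITE_BIN:-sprite}"   # override is a test hook (assumption)
  if "$sprite_bin" org list >/dev/null 2>&1; then
    return 0                                 # token still valid
  fi
  "$sprite_bin" auth refresh >/dev/null 2>&1 || return 1
  "$sprite_bin" org list >/dev/null 2>&1     # confirm the refreshed token works
}
```

With these helpers, `retry_exponential 3 5 <list command>` would sleep 5s, 10s, and 20s between attempts before giving up, and `linear_delay_secs 2` would yield a 6s delay for the second retry.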

Fixes #2934

Agent: ux-engineer

Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 21:47:58 -07:00
| File | Last commit | Date |
| --- | --- | --- |
| aws.sh | fix(e2e): use aggressive cleanup threshold (5 min) for pre-run to prevent quota exhaustion (#2798) | 2026-03-19 11:23:55 -07:00 |
| digitalocean.sh | fix(e2e): use aggressive cleanup threshold (5 min) for pre-run to prevent quota exhaustion (#2798) | 2026-03-19 11:23:55 -07:00 |
| gcp.sh | fix(e2e): use aggressive cleanup threshold (5 min) for pre-run to prevent quota exhaustion (#2798) | 2026-03-19 11:23:55 -07:00 |
| hetzner.sh | fix(hetzner): clean up orphaned primary IPs before provisioning to avoid quota exceeded (#2935) | 2026-03-24 11:20:30 +07:00 |
| sprite.sh | fix(sprite): add retry for list failures, increase timeout, refresh auth on expiry (#2936) | 2026-03-23 21:47:58 -07:00 |