Mirror of https://github.com/OpenRouterTeam/spawn.git
Synced 2026-04-28 03:49:31 +00:00
fix(sprite): add retry for list failures, increase timeout, refresh auth on expiry (#2936)
Three fixes for Sprite E2E failures in long-running batches (73+ min):

1. Retry `_sprite_provision_verify`: list failures now retry 3x with exponential backoff (5s, 10s, 20s) instead of failing immediately. Fixes kilocode batch 6 "Could not list Sprite instances" errors.

2. Increase `CREATE_TIMEOUT_SECS` default from 300s to 600s and add `Client.Timeout`, `request canceled`, and `authentication failed` to the transient error retry pattern in `spriteRetry`. Also uses linear backoff (3s * attempt) instead of a fixed 3s delay. Fixes hermes batch 7 HTTP timeout errors.

3. Add `_sprite_refresh_auth` + `cloud_refresh_auth` interface. The E2E orchestrator calls `cloud_refresh_auth` before each provisioning batch. For Sprite, this re-validates the token via `sprite org list` and attempts `sprite auth refresh` if expired. Fixes junie batch 8 "authentication failed" errors.

Fixes #2934

Agent: ux-engineer
Co-authored-by: B <6723574+louisgv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
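
The retry behaviour described in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `retry_with_backoff` and the `SLEEP_CMD` override are hypothetical names, and it assumes "retry 3x" means one initial attempt plus up to three retries, with the delay doubling before each retry (5s, 10s, 20s).

```bash
# Hypothetical helper (not from the repository).
# Usage: retry_with_backoff MAX_RETRIES BASE_DELAY_SECS COMMAND [ARGS...]
# Runs COMMAND once, then retries up to MAX_RETRIES times, doubling the
# delay before each retry. Returns 0 on the first success, 1 otherwise.
retry_with_backoff() {
  local max_retries="$1" delay="$2"
  shift 2
  local retries=0
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "${retries}" -ge "${max_retries}" ]; then
      return 1
    fi
    # SLEEP_CMD is an assumed test hook; it defaults to the real sleep.
    "${SLEEP_CMD:-sleep}" "${delay}"
    delay=$((delay * 2))
    retries=$((retries + 1))
  done
}
```

A call such as `retry_with_backoff 3 5 some_list_command` would then wait 5s, 10s, and 20s between attempts before giving up, where `some_list_command` stands in for whatever command lists instances.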
parent 50319e0d39
commit e9cbab5b7f
4 changed files with 116 additions and 23 deletions
@@ -133,6 +133,15 @@ cloud_install_wait() {
   fi
 }
 
+# Refresh auth token if the cloud driver supports it (e.g. Sprite tokens
+# expire after ~60 min). Called before each provisioning batch to prevent
+# auth expiry failures in long-running E2E suites. See #2934.
+cloud_refresh_auth() {
+  if type "_${ACTIVE_CLOUD}_refresh_auth" >/dev/null 2>&1; then
+    "_${ACTIVE_CLOUD}_refresh_auth" "$@"
+  fi
+}
+
 # ---------------------------------------------------------------------------
 # Per-agent provision timeout overrides
 #