ci(mac): retry Playwright JSON crash + GGUF detect retry + MLX is_gguf guard

Two distinct Mac UI Chat failures captured in PR 5312's CI:

1. /api/inference/load 500 with FileNotFoundError on config.json for
   unsloth/gemma-3-270m-it-GGUF (a GGUF-only repo). Run 25487410091.
   Root cause: detect_gguf_model_remote in
   studio/backend/utils/models/model_config.py had a single
   hf_model_info call with no retry. On a transient HF Hub flake
   it returned None silently, the route at routes/inference.py:592
   treated the repo as non-GGUF, and dispatched to the MLX
   orchestrator. The orchestrator's _build_model_config re-ran
   from_identifier in the subprocess (this time succeeding,
   logging "Detected remote GGUF") but then handed an is_gguf=True
   ModelConfig to MLXInferenceBackend.load_model, which ignored
   is_gguf and called FastMLXModel.from_pretrained →
   mlx_lm.utils.load_model → opened a non-existent config.json on
   the GGUF-only repo. Fix:
     a) detect_gguf_model_remote makes up to 3 attempts with 1s/2s
        backoff between them, skipping retry on
        RepositoryNotFoundError / GatedRepoError /
        RevisionNotFoundError / EntryNotFoundError (those are
        permanent).
     b) MLXInferenceBackend.load_model now raises a clear
        RuntimeError if config.is_gguf=True, instead of letting
        mlx_lm surface a cryptic 'config.json does not exist'.

2. Playwright pipeTransport.js 'Unexpected end of JSON input' on
   macos-14 free runners. Runs 25489049059 + 25489429306. Chromium
   browser process dies mid-test → driver Node process can't parse
   the truncated JSON-RPC line and exits. Hits ~50% of runs (well
   above acceptable flake). Fix: retry the chat-UI step up to 3
   times, FULLY resetting Studio (kill, reset-password, reboot,
   /api/health wait, re-export STUDIO_OLD/NEW/NEW2_PW) between
   attempts so the change-password flow finds a fresh bootstrap on
   each retry. Same retry shape on the extra-UI step. Real
   assertion / timeout failures don't match the JSON-input pattern
   so they bypass retry and surface immediately. Updated the
   install-step comment to drop the now-incorrect '1.55-1.57 ship a
   Node 22 driver' claim — all 1.55-1.58 Mac drivers are Node 24,
   the racy crash is in pipeTransport itself.
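
The retry gating described in item 2 can be sketched in Python as follows. This is a minimal illustration of the decision logic only, not the workflow's actual shell code: `run_once` and `reset` are hypothetical callables standing in for the Playwright script run and the full Studio reset, and `should_retry` / `run_with_reset` are names invented for this sketch.

```python
from typing import Callable, Tuple

# Substring the workflow greps for in the attempt log.
CRASH_MARKER = "Unexpected end of JSON input"


def should_retry(rc: int, log_text: str, attempt: int, max_attempts: int) -> bool:
    """Retry only a failed run whose log shows the pipeTransport crash,
    and only while attempts remain."""
    return rc != 0 and CRASH_MARKER in log_text and attempt < max_attempts


def run_with_reset(
    run_once: Callable[[], Tuple[int, str]],  # hypothetical: returns (exit code, log text)
    reset: Callable[[], None],                # hypothetical: full Studio reset
    max_attempts: int = 3,
) -> int:
    """Run the step, resetting state between crash-triggered retries.
    Returns the exit code of the last attempt."""
    attempt = 1
    while True:
        rc, log_text = run_once()
        if rc == 0:
            return 0
        if not should_retry(rc, log_text, attempt, max_attempts):
            # Real assertion/timeout failures land here and surface immediately.
            return rc
        reset()
        attempt += 1
```

Under these assumptions, a run that fails with an assertion error returns its exit code after a single attempt with no reset, while a run that keeps hitting the crash marker is attempted three times with two resets in between.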
Daniel Han 2026-05-07 10:17:21 +00:00
parent 8bc52ff1c5
commit d35bf6ac8e
3 changed files with 157 additions and 16 deletions
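
The detection retry from fix 1(a) in the commit message can be sketched roughly as below. This is a simplified stand-in under stated assumptions, not the code in model_config.py: `fetch` is a hypothetical callable replacing the HF Hub lookup, `PermanentLookupError` is a hypothetical placeholder for the permanent error types named in the message, and the `sleep` parameter exists only so the delays can be observed without waiting.

```python
import time
from typing import Callable, Optional, Tuple, Type


class PermanentLookupError(Exception):
    """Hypothetical stand-in for errors such as RepositoryNotFoundError."""


def retry_with_backoff(
    fetch: Callable[[], Optional[str]],
    attempts: int = 3,
    base_delay: float = 1.0,
    permanent: Tuple[Type[BaseException], ...] = (PermanentLookupError,),
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch() up to `attempts` times with exponential backoff.
    Permanent errors return None at once; transient errors are retried."""
    for attempt in range(attempts):
        try:
            return fetch()
        except permanent:
            # Retrying cannot help, so do not spend further attempts.
            return None
        except Exception:
            # Sleep only if another attempt follows.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return None
```

With three attempts this shape sleeps twice at most (1s, then 2s); a 4s delay would only occur if a fourth attempt were allowed.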

@ -91,11 +91,13 @@ jobs:
# No --with-deps on Mac: that flag installs Linux apt packages.
# GitHub-hosted macos-14 ships the system frameworks Chromium
# needs already.
# Pin <1.58 because playwright 1.59 ships a Node 24 driver that
# crashes with 'SyntaxError: Unexpected end of JSON input' in
# pipeTransport.js when the Chromium child writes an empty line
# during launch on macos-14 free runners. 1.55-1.57 still ship
# a Node 22 driver and don't hit this race.
# Pinned <1.58 because all 1.55-1.58 drivers ship Node 24 on
# macos-14 and intermittently hit 'SyntaxError: Unexpected end
# of JSON input' in pipeTransport.js when the Chromium child
# transiently flushes an empty buffer. The crash is racy and
# only triggers on a fraction of runs; the retry wrapper in
# "Drive the chat UI with Playwright" is what actually keeps
# the job green when the race fires.
run: |
pip install 'playwright>=1.55,<1.58'
python -m playwright install chromium
@ -139,9 +141,58 @@ jobs:
# available to llama.cpp from CI; gemma-3-270m turn latency
# has been observed to crowd the 180s default. Triple it.
STUDIO_UI_TURN_TIMEOUT_MS: '540000'
# Retry up to 3 times to absorb the racy Playwright Node 24
# pipeTransport.js 'Unexpected end of JSON input' crash that
# fires intermittently on macos-14 free runners (Chromium
# browser process dies mid-test → driver Node process can't
# parse the truncated JSON-RPC line and exits). The retry
# FULLY resets Studio (kill, reset-password, reboot, wait
# /api/health, re-export bootstrap pw) before re-running the
# script so the change-password flow finds a fresh bootstrap.
# A real test failure (assertion / timeout) does NOT match the
# JSON pattern so it bypasses retry and surfaces immediately.
run: |
mkdir -p logs/playwright
python tests/studio/playwright_chat_ui.py
attempt=1
max_attempts=3
while : ; do
set +e
python tests/studio/playwright_chat_ui.py 2>&1 | tee logs/playwright_attempt_${attempt}.log
rc=${PIPESTATUS[0]}
set -e
if [ "$rc" -eq 0 ]; then
break
fi
if grep -q "Unexpected end of JSON input" logs/playwright_attempt_${attempt}.log \
&& [ "$attempt" -lt "$max_attempts" ]; then
echo "::warning::Playwright pipeTransport JSON crash on attempt ${attempt}; resetting Studio and retrying..."
kill "${STUDIO_PID}" 2>/dev/null || true
sleep 2
unsloth studio reset-password
UNSLOTH_API_ONLY=1 unsloth studio -H 127.0.0.1 -p "$STUDIO_PORT" \
> "logs/studio_retry_${attempt}.log" 2>&1 &
STUDIO_PID=$!
echo "STUDIO_PID=$STUDIO_PID" >> "$GITHUB_ENV"
for i in $(seq 1 180); do
if curl -fs "http://127.0.0.1:${STUDIO_PORT}/api/health" > /tmp/health.json \
&& jq -e '.status == "healthy"' /tmp/health.json >/dev/null; then
break
fi
sleep 1
done
STUDIO_OLD_PW=$(cat ~/.unsloth/studio/auth/.bootstrap_password)
STUDIO_NEW_PW="CIUi-$(python -c 'import secrets; print(secrets.token_urlsafe(16))')"
STUDIO_NEW2_PW="CIUi-$(python -c 'import secrets; print(secrets.token_urlsafe(16))')"
echo "::add-mask::$STUDIO_OLD_PW"
echo "::add-mask::$STUDIO_NEW_PW"
echo "::add-mask::$STUDIO_NEW2_PW"
export STUDIO_OLD_PW STUDIO_NEW_PW STUDIO_NEW2_PW
attempt=$((attempt + 1))
sleep 3
continue
fi
exit "$rc"
done
- name: Stop Studio (chat-ui ends with Shutdown click; this is belt-and-suspenders)
if: always()
@ -187,9 +238,48 @@ jobs:
STUDIO_UI_TURN_TIMEOUT_MS: '540000'
GGUF_REPO: ${{ env.GGUF_REPO }}
GGUF_VARIANT: ${{ env.GGUF_VARIANT }}
# Same pipeTransport JSON-crash retry shape as "Drive the chat
# UI with Playwright" -- see comment there.
run: |
mkdir -p logs/playwright_extra
python tests/studio/playwright_extra_ui.py
attempt=1
max_attempts=3
while : ; do
set +e
python tests/studio/playwright_extra_ui.py 2>&1 | tee logs/playwright_extra_attempt_${attempt}.log
rc=${PIPESTATUS[0]}
set -e
if [ "$rc" -eq 0 ]; then
break
fi
if grep -q "Unexpected end of JSON input" logs/playwright_extra_attempt_${attempt}.log \
&& [ "$attempt" -lt "$max_attempts" ]; then
echo "::warning::Playwright pipeTransport JSON crash on attempt ${attempt}; resetting Studio and retrying..."
kill "${STUDIO_EXTRA_PID}" 2>/dev/null || true
sleep 2
unsloth studio reset-password
UNSLOTH_API_ONLY=1 unsloth studio -H 127.0.0.1 -p 18897 \
> "logs/studio_extra_retry_${attempt}.log" 2>&1 &
STUDIO_EXTRA_PID=$!
echo "STUDIO_EXTRA_PID=$STUDIO_EXTRA_PID" >> "$GITHUB_ENV"
for i in $(seq 1 180); do
if curl -fs "http://127.0.0.1:18897/api/health" > /tmp/health2.json \
&& jq -e '.status == "healthy"' /tmp/health2.json >/dev/null; then
break
fi
sleep 1
done
STUDIO_OLD_PW=$(cat ~/.unsloth/studio/auth/.bootstrap_password)
STUDIO_NEW_PW="CIUiExtra-$(python -c 'import secrets; print(secrets.token_urlsafe(16))')"
echo "::add-mask::$STUDIO_OLD_PW"
echo "::add-mask::$STUDIO_NEW_PW"
export STUDIO_OLD_PW STUDIO_NEW_PW
attempt=$((attempt + 1))
sleep 3
continue
fi
exit "$rc"
done
- name: Stop second Studio
if: always()

@ -78,6 +78,28 @@ class MLXInferenceBackend:
model_name = config.identifier if hasattr(config, "identifier") else str(config)
is_vision = getattr(config, "is_vision", False)
# GGUF guard. GGUF models are served via llama-server in the
# parent process, NOT via mlx-lm in this MLX subprocess. The
# route at studio/backend/routes/inference.py:592 (`if config.
# is_gguf:`) is responsible for sending GGUF traffic to the
# llama-server backend before reaching the MLX orchestrator.
# If we end up here with is_gguf=True, the route's
# `detect_gguf_model_remote` returned None on its first call
# (transient HF Hub flake) but the subprocess re-detection
# succeeded. The subprocess cannot reach into the parent's
# llama-server, so all we can do is raise loudly so the caller
# gets a clear error instead of a cryptic
# "config.json does not exist" from mlx_lm.utils.load_model.
if getattr(config, "is_gguf", False):
raise RuntimeError(
f"MLXInferenceBackend cannot load GGUF model '{model_name}': "
f"GGUF models must be served by llama-server in the parent "
f"process. The /api/inference/load route should have "
f"detected this repo as GGUF before dispatching to the MLX "
f"orchestrator -- this fallback indicates a transient HF "
f"Hub failure during initial detection. Retry the request."
)
if hf_token:
import os

@ -1327,16 +1327,45 @@ def detect_gguf_model_remote(
Check if a HuggingFace repo contains GGUF files.
Returns the filename of the best GGUF file in the repo, or None.
"""
try:
from huggingface_hub import model_info as hf_model_info
info = hf_model_info(repo_id, token = hf_token)
repo_files = [s.rfilename for s in info.siblings]
return _pick_best_gguf(repo_files)
except Exception as e:
logger.debug(f"Could not check GGUF files for '{repo_id}': {e}")
return None
Retries on transient HF Hub failures (network hiccups, 5xx, slow
cold-start of the API). Without retry, a single transient failure
here returns None silently and the caller treats the repo as
non-GGUF -- which on Apple Silicon (Mac UI route) means falling
through to the MLX backend, which then fails opening a non-existent
config.json on the GGUF-only repo. Three attempts with 1s/2s
backoff between them cover the typical free-runner HF Hub flakiness.
"""
import time
from huggingface_hub import model_info as hf_model_info
last_err: Optional[Exception] = None
for attempt in range(3):
try:
info = hf_model_info(repo_id, token = hf_token)
repo_files = [s.rfilename for s in info.siblings]
return _pick_best_gguf(repo_files)
except Exception as e:
last_err = e
# 404 / RepoNotFound is permanent -- don't waste attempts.
err_name = type(e).__name__
if err_name in (
"RepositoryNotFoundError",
"GatedRepoError",
"RevisionNotFoundError",
"EntryNotFoundError",
):
logger.debug(
f"Could not check GGUF files for '{repo_id}': {e}"
)
return None
if attempt < 2:
time.sleep(2 ** attempt)
logger.warning(
f"Could not check GGUF files for '{repo_id}' after 3 attempts: "
f"{last_err}"
)
return None
def download_gguf_file(