mirror of
https://github.com/unslothai/unsloth.git
synced 2026-05-17 03:56:07 +00:00
ci(mac): retry Playwright JSON crash + GGUF detect retry + MLX is_gguf guard
Two distinct Mac UI Chat failures captured in PR 5312's CI:
1. /api/inference/load 500 with FileNotFoundError on config.json for
unsloth/gemma-3-270m-it-GGUF (a GGUF-only repo). Run 25487410091.
Root cause: detect_gguf_model_remote in
studio/backend/utils/models/model_config.py had a single
hf_model_info call with no retry. On a transient HF Hub flake
it returned None silently, the route at routes/inference.py:592
treated the repo as non-GGUF, and dispatched to the MLX
orchestrator. The orchestrator's _build_model_config re-ran
from_identifier in the subprocess (this time succeeding,
logging "Detected remote GGUF") but then handed an is_gguf=True
ModelConfig to MLXInferenceBackend.load_model, which ignored
is_gguf and called FastMLXModel.from_pretrained →
mlx_lm.utils.load_model → opened a non-existent config.json on
the GGUF-only repo. Fix:
a) detect_gguf_model_remote retries up to 3 times with 1/2/4s
backoff, bypassing retry on RepositoryNotFoundError /
GatedRepoError / RevisionNotFoundError / EntryNotFoundError
(those are permanent).
b) MLXInferenceBackend.load_model now raises a clear
RuntimeError if config.is_gguf=True, instead of letting
mlx_lm surface a cryptic 'config.json does not exist'.
2. Playwright pipeTransport.js 'Unexpected end of JSON input' on
macos-14 free runners. Runs 25489049059 + 25489429306. Chromium
browser process dies mid-test → driver Node process can't parse
the truncated JSON-RPC line and exits. Hits ~50% of runs (well
above acceptable flake). Fix: retry the chat-UI step up to 3
times, FULLY resetting Studio (kill, reset-password, reboot,
/api/health wait, re-export STUDIO_OLD/NEW/NEW2_PW) between
attempts so the change-password flow finds a fresh bootstrap on
each retry. Same retry shape on the extra-UI step. Real
assertion / timeout failures don't match the JSON-input pattern
so they bypass retry and surface immediately. Updated the
install-step comment to drop the now-incorrect '1.55-1.57 ship a
Node 22 driver' claim — all 1.55-1.58 Mac drivers are Node 24,
the racy crash is in pipeTransport itself.
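The retry shape in fix 1a (bounded attempts, exponential backoff, immediate bail-out on errors that can never succeed on retry) can be sketched standalone. This is an illustrative sketch, not Studio code: `with_retry` and `fetch` are hypothetical stand-ins for `detect_gguf_model_remote`'s call to `hf_model_info`, keyed on exception class names exactly as the fix is:

```python
import time

# Error class NAMES treated as permanent, mirroring the commit's list.
PERMANENT = {
    "RepositoryNotFoundError",
    "GatedRepoError",
    "RevisionNotFoundError",
    "EntryNotFoundError",
}

def with_retry(fetch, attempts=3):
    """Call fetch() up to `attempts` times, sleeping 2**attempt seconds
    between tries; bail out at once on permanent errors. None on failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as e:
            # A 404 / gated repo will fail identically every time --
            # retrying only burns CI minutes.
            if type(e).__name__ in PERMANENT:
                return None
            if attempt < attempts - 1:
                time.sleep(2 ** attempt)
    return None
```

Matching on `type(e).__name__` rather than the classes themselves avoids importing huggingface_hub's exception types at module scope, at the cost of missing subclasses.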
parent 8bc52ff1c5
commit d35bf6ac8e

3 changed files with 157 additions and 16 deletions

.github/workflows/studio-mac-ui-smoke.yml (vendored): 104 changes
@@ -91,11 +91,13 @@ jobs:
         # No --with-deps on Mac: that flag installs Linux apt packages.
         # GitHub-hosted macos-14 ships the system frameworks Chromium
         # needs already.
-        # Pin <1.58 because playwright 1.59 ships a Node 24 driver that
-        # crashes with 'SyntaxError: Unexpected end of JSON input' in
-        # pipeTransport.js when the Chromium child writes an empty line
-        # during launch on macos-14 free runners. 1.55-1.57 still ship
-        # a Node 22 driver and don't hit this race.
+        # Pinned <1.58 because all 1.55-1.58 drivers ship Node 24 on
+        # macos-14 and intermittently hit 'SyntaxError: Unexpected end
+        # of JSON input' in pipeTransport.js when the Chromium child
+        # transiently flushes an empty buffer. The crash is racy and
+        # only triggers on a fraction of runs; the retry wrapper in
+        # "Drive the chat UI with Playwright" is what actually keeps
+        # the job green when the race fires.
         run: |
           pip install 'playwright>=1.55,<1.58'
           python -m playwright install chromium
@@ -139,9 +141,58 @@ jobs:
           # available to llama.cpp from CI; gemma-3-270m turn latency
           # has been observed to crowd the 180s default. Triple it.
           STUDIO_UI_TURN_TIMEOUT_MS: '540000'
+        # Retry up to 3 times to absorb the racy Playwright Node 24
+        # pipeTransport.js 'Unexpected end of JSON input' crash that
+        # fires intermittently on macos-14 free runners (Chromium
+        # browser process dies mid-test → driver Node process can't
+        # parse the truncated JSON-RPC line and exits). The retry
+        # FULLY resets Studio (kill, reset-password, reboot, wait
+        # /api/health, re-export bootstrap pw) before re-running the
+        # script so the change-password flow finds a fresh bootstrap.
+        # A real test failure (assertion / timeout) does NOT match the
+        # JSON pattern so it bypasses retry and surfaces immediately.
         run: |
           mkdir -p logs/playwright
-          python tests/studio/playwright_chat_ui.py
+          attempt=1
+          max_attempts=3
+          while : ; do
+            set +e
+            python tests/studio/playwright_chat_ui.py 2>&1 | tee logs/playwright_attempt_${attempt}.log
+            rc=${PIPESTATUS[0]}
+            set -e
+            if [ "$rc" -eq 0 ]; then
+              break
+            fi
+            if grep -q "Unexpected end of JSON input" logs/playwright_attempt_${attempt}.log \
+                && [ "$attempt" -lt "$max_attempts" ]; then
+              echo "::warning::Playwright pipeTransport JSON crash on attempt ${attempt}; resetting Studio and retrying..."
+              kill "${STUDIO_PID}" 2>/dev/null || true
+              sleep 2
+              unsloth studio reset-password
+              UNSLOTH_API_ONLY=1 unsloth studio -H 127.0.0.1 -p "$STUDIO_PORT" \
+                > "logs/studio_retry_${attempt}.log" 2>&1 &
+              STUDIO_PID=$!
+              echo "STUDIO_PID=$STUDIO_PID" >> "$GITHUB_ENV"
+              for i in $(seq 1 180); do
+                if curl -fs "http://127.0.0.1:${STUDIO_PORT}/api/health" > /tmp/health.json \
+                    && jq -e '.status == "healthy"' /tmp/health.json >/dev/null; then
+                  break
+                fi
+                sleep 1
+              done
+              STUDIO_OLD_PW=$(cat ~/.unsloth/studio/auth/.bootstrap_password)
+              STUDIO_NEW_PW="CIUi-$(python -c 'import secrets; print(secrets.token_urlsafe(16))')"
+              STUDIO_NEW2_PW="CIUi-$(python -c 'import secrets; print(secrets.token_urlsafe(16))')"
+              echo "::add-mask::$STUDIO_OLD_PW"
+              echo "::add-mask::$STUDIO_NEW_PW"
+              echo "::add-mask::$STUDIO_NEW2_PW"
+              export STUDIO_OLD_PW STUDIO_NEW_PW STUDIO_NEW2_PW
+              attempt=$((attempt + 1))
+              sleep 3
+              continue
+            fi
+            exit "$rc"
+          done

       - name: Stop Studio (chat-ui ends with Shutdown click; this is belt-and-suspenders)
         if: always()
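One detail of the retry loop worth calling out: because the test output is piped through `tee`, `$?` reports the exit status of `tee` (almost always 0), not of the Playwright script, which is why the loop reads bash's `${PIPESTATUS[0]}` instead. A minimal, standalone illustration (bash-specific; `false` stands in for the failing script):

```shell
#!/usr/bin/env bash
# After a pipeline, $? is the status of the LAST command. tee exits 0,
# so a crashed script on the left of the pipe would look green:
false | tee /tmp/pipestatus_demo.log
echo "naive \$?: $?"        # 0 -- tee's status, the crash is masked

# bash's PIPESTATUS array keeps every stage's status; index 0 is the
# left-hand command. Capture it IMMEDIATELY -- the next command
# overwrites the array.
false | tee /tmp/pipestatus_demo.log
rc=${PIPESTATUS[0]}
echo "pipeline rc: $rc"     # 1 -- the real failure
```

Note this is a bashism; under plain POSIX `sh` the array does not exist, which is fine here since GitHub-hosted macOS runners execute `run:` steps with bash.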
@@ -187,9 +238,48 @@ jobs:
           STUDIO_UI_TURN_TIMEOUT_MS: '540000'
           GGUF_REPO: ${{ env.GGUF_REPO }}
           GGUF_VARIANT: ${{ env.GGUF_VARIANT }}
+        # Same pipeTransport JSON-crash retry shape as "Drive the chat
+        # UI with Playwright" -- see comment there.
         run: |
           mkdir -p logs/playwright_extra
-          python tests/studio/playwright_extra_ui.py
+          attempt=1
+          max_attempts=3
+          while : ; do
+            set +e
+            python tests/studio/playwright_extra_ui.py 2>&1 | tee logs/playwright_extra_attempt_${attempt}.log
+            rc=${PIPESTATUS[0]}
+            set -e
+            if [ "$rc" -eq 0 ]; then
+              break
+            fi
+            if grep -q "Unexpected end of JSON input" logs/playwright_extra_attempt_${attempt}.log \
+                && [ "$attempt" -lt "$max_attempts" ]; then
+              echo "::warning::Playwright pipeTransport JSON crash on attempt ${attempt}; resetting Studio and retrying..."
+              kill "${STUDIO_EXTRA_PID}" 2>/dev/null || true
+              sleep 2
+              unsloth studio reset-password
+              UNSLOTH_API_ONLY=1 unsloth studio -H 127.0.0.1 -p 18897 \
+                > "logs/studio_extra_retry_${attempt}.log" 2>&1 &
+              STUDIO_EXTRA_PID=$!
+              echo "STUDIO_EXTRA_PID=$STUDIO_EXTRA_PID" >> "$GITHUB_ENV"
+              for i in $(seq 1 180); do
+                if curl -fs "http://127.0.0.1:18897/api/health" > /tmp/health2.json \
+                    && jq -e '.status == "healthy"' /tmp/health2.json >/dev/null; then
+                  break
+                fi
+                sleep 1
+              done
+              STUDIO_OLD_PW=$(cat ~/.unsloth/studio/auth/.bootstrap_password)
+              STUDIO_NEW_PW="CIUiExtra-$(python -c 'import secrets; print(secrets.token_urlsafe(16))')"
+              echo "::add-mask::$STUDIO_OLD_PW"
+              echo "::add-mask::$STUDIO_NEW_PW"
+              export STUDIO_OLD_PW STUDIO_NEW_PW
+              attempt=$((attempt + 1))
+              sleep 3
+              continue
+            fi
+            exit "$rc"
+          done

       - name: Stop second Studio
         if: always()
@@ -78,6 +78,28 @@ class MLXInferenceBackend:
         model_name = config.identifier if hasattr(config, "identifier") else str(config)
         is_vision = getattr(config, "is_vision", False)

+        # GGUF guard. GGUF models are served via llama-server in the
+        # parent process, NOT via mlx-lm in this MLX subprocess. The
+        # route at studio/backend/routes/inference.py:592 (`if config.
+        # is_gguf:`) is responsible for sending GGUF traffic to the
+        # llama-server backend before reaching the MLX orchestrator.
+        # If we end up here with is_gguf=True, the route's
+        # `detect_gguf_model_remote` returned None on its first call
+        # (transient HF Hub flake) but the subprocess re-detection
+        # succeeded. The subprocess cannot reach into the parent's
+        # llama-server, so all we can do is raise loudly so the caller
+        # gets a clear error instead of a cryptic
+        # "config.json does not exist" from mlx_lm.utils.load_model.
+        if getattr(config, "is_gguf", False):
+            raise RuntimeError(
+                f"MLXInferenceBackend cannot load GGUF model '{model_name}': "
+                f"GGUF models must be served by llama-server in the parent "
+                f"process. The /api/inference/load route should have "
+                f"detected this repo as GGUF before dispatching to the MLX "
+                f"orchestrator -- this fallback indicates a transient HF "
+                f"Hub failure during initial detection. Retry the request."
+            )
+
         if hf_token:
             import os
studio/backend/utils/models/model_config.py
@@ -1327,16 +1327,45 @@ def detect_gguf_model_remote(
     Check if a HuggingFace repo contains GGUF files.

     Returns the filename of the best GGUF file in the repo, or None.
-    """
-    try:
-        from huggingface_hub import model_info as hf_model_info
-
-        info = hf_model_info(repo_id, token = hf_token)
-        repo_files = [s.rfilename for s in info.siblings]
-        return _pick_best_gguf(repo_files)
-    except Exception as e:
-        logger.debug(f"Could not check GGUF files for '{repo_id}': {e}")
-        return None
+
+    Retries on transient HF Hub failures (network hiccups, 5xx, slow
+    cold-start of the API). Without retry, a single transient failure
+    here returns None silently and the caller treats the repo as
+    non-GGUF -- which on Apple Silicon (Mac UI route) means falling
+    through to the MLX backend, which then fails opening a non-existent
+    config.json on the GGUF-only repo. Three attempts with 1s/2s/4s
+    backoff covers the typical free-runner HF Hub flakiness.
+    """
+    import time
+    from huggingface_hub import model_info as hf_model_info
+
+    last_err: Optional[Exception] = None
+    for attempt in range(3):
+        try:
+            info = hf_model_info(repo_id, token = hf_token)
+            repo_files = [s.rfilename for s in info.siblings]
+            return _pick_best_gguf(repo_files)
+        except Exception as e:
+            last_err = e
+            # 404 / RepoNotFound is permanent -- don't waste attempts.
+            err_name = type(e).__name__
+            if err_name in (
+                "RepositoryNotFoundError",
+                "GatedRepoError",
+                "RevisionNotFoundError",
+                "EntryNotFoundError",
+            ):
+                logger.debug(
+                    f"Could not check GGUF files for '{repo_id}': {e}"
+                )
+                return None
+            if attempt < 2:
+                time.sleep(2 ** attempt)
+    logger.warning(
+        f"Could not check GGUF files for '{repo_id}' after 3 attempts: "
+        f"{last_err}"
+    )
+    return None


 def download_gguf_file(
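Failure mode 1 can be reproduced in miniature. Everything below is an illustrative stand-in, not Studio code: `make_flaky_hub` models an HF Hub that flakes exactly once, `detect_once` and `detect_retry` model the pre- and post-fix detection shapes (the real function also backs off between attempts, skips permanent errors, and picks the best of several GGUF files), and the filename is made up:

```python
def detect_once(model_info, repo_id):
    """Pre-fix shape: a single attempt; any failure collapses to None,
    so the caller misroutes the GGUF-only repo to the MLX backend."""
    try:
        files = model_info(repo_id)
        return next((f for f in files if f.endswith(".gguf")), None)
    except Exception:
        return None

def detect_retry(model_info, repo_id, attempts=3):
    """Post-fix shape: bounded retries absorb a transient failure."""
    for attempt in range(attempts):
        try:
            files = model_info(repo_id)
            return next((f for f in files if f.endswith(".gguf")), None)
        except Exception:
            pass  # real code sleeps 2**attempt s and bails on 404/gated
    return None

def make_flaky_hub():
    """HF Hub stand-in that fails on the first call only."""
    state = {"calls": 0}
    def hub(repo_id):
        state["calls"] += 1
        if state["calls"] == 1:
            raise ConnectionError("transient HF Hub failure")
        return ["README.md", "model-Q8_0.gguf"]
    return hub

print(detect_once(make_flaky_hub(), "unsloth/gemma-3-270m-it-GGUF"))   # None -- misrouted
print(detect_retry(make_flaky_hub(), "unsloth/gemma-3-270m-it-GGUF"))  # model-Q8_0.gguf
```

The None from the first shape is exactly what sent run 25487410091 down the MLX path; the second shape, plus the `is_gguf` guard in MLXInferenceBackend.load_model as a backstop, closes both ends of the race.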