mirror of
https://github.com/unslothai/unsloth.git
synced 2026-05-17 03:56:07 +00:00
ci(mac): retry Playwright JSON crash + GGUF detect retry + MLX is_gguf guard
Two distinct Mac UI Chat failures captured in PR 5312's CI:
1. /api/inference/load 500 with FileNotFoundError on config.json for
unsloth/gemma-3-270m-it-GGUF (a GGUF-only repo). Run 25487410091.
Root cause: detect_gguf_model_remote in
studio/backend/utils/models/model_config.py had a single
hf_model_info call with no retry. On a transient HF Hub flake
it returned None silently, the route at routes/inference.py:592
treated the repo as non-GGUF, and dispatched to the MLX
orchestrator. The orchestrator's _build_model_config re-ran
from_identifier in the subprocess (this time succeeding,
logging "Detected remote GGUF") but then handed an is_gguf=True
ModelConfig to MLXInferenceBackend.load_model, which ignored
is_gguf and called FastMLXModel.from_pretrained →
mlx_lm.utils.load_model → opened a non-existent config.json on
the GGUF-only repo. Fix:
a) detect_gguf_model_remote retries up to 3 times with 1/2/4s
backoff, bypassing retry on RepositoryNotFoundError /
GatedRepoError / RevisionNotFoundError / EntryNotFoundError
(those are permanent).
b) MLXInferenceBackend.load_model now raises a clear
RuntimeError if config.is_gguf=True, instead of letting
mlx_lm surface a cryptic 'config.json does not exist'.
2. Playwright pipeTransport.js 'Unexpected end of JSON input' on
macos-14 free runners. Runs 25489049059 + 25489429306. Chromium
browser process dies mid-test → driver Node process can't parse
the truncated JSON-RPC line and exits. Hits ~50% of runs (well
above acceptable flake). Fix: retry the chat-UI step up to 3
times, FULLY resetting Studio (kill, reset-password, reboot,
/api/health wait, re-export STUDIO_OLD/NEW/NEW2_PW) between
attempts so the change-password flow finds a fresh bootstrap on
each retry. Same retry shape on the extra-UI step. Real
assertion / timeout failures don't match the JSON-input pattern
so they bypass retry and surface immediately. Updated the
install-step comment to drop the now-incorrect '1.55-1.57 ship a
Node 22 driver' claim — all 1.55-1.58 Mac drivers are Node 24,
the racy crash is in pipeTransport itself.
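The retry shape in fix 1a (bounded attempts, exponential backoff, immediate bail-out on errors that can never succeed on retry) can be sketched standalone. This is an illustrative sketch, not Studio code: `with_retry` and `fetch` are hypothetical stand-ins for `detect_gguf_model_remote`'s call to `hf_model_info`, keyed on exception class names exactly as the fix is:

```python
import time

# Error class NAMES treated as permanent, mirroring the commit's list.
PERMANENT = {
    "RepositoryNotFoundError",
    "GatedRepoError",
    "RevisionNotFoundError",
    "EntryNotFoundError",
}

def with_retry(fetch, attempts=3):
    """Call fetch() up to `attempts` times, sleeping 2**attempt seconds
    between tries; bail out at once on permanent errors. None on failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as e:
            # A 404 / gated repo will fail identically every time --
            # retrying only burns CI minutes.
            if type(e).__name__ in PERMANENT:
                return None
            if attempt < attempts - 1:
                time.sleep(2 ** attempt)
    return None
```

Matching on `type(e).__name__` rather than the classes themselves avoids importing huggingface_hub's exception types at module scope, at the cost of missing subclasses.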
parent 8bc52ff1c5
commit d35bf6ac8e

3 changed files with 157 additions and 16 deletions

.github/workflows/studio-mac-ui-smoke.yml (vendored): 104 changes
@@ -91,11 +91,13 @@ jobs:
         # No --with-deps on Mac: that flag installs Linux apt packages.
         # GitHub-hosted macos-14 ships the system frameworks Chromium
         # needs already.
-        # Pin <1.58 because playwright 1.59 ships a Node 24 driver that
-        # crashes with 'SyntaxError: Unexpected end of JSON input' in
-        # pipeTransport.js when the Chromium child writes an empty line
-        # during launch on macos-14 free runners. 1.55-1.57 still ship
-        # a Node 22 driver and don't hit this race.
+        # Pinned <1.58 because all 1.55-1.58 drivers ship Node 24 on
+        # macos-14 and intermittently hit 'SyntaxError: Unexpected end
+        # of JSON input' in pipeTransport.js when the Chromium child
+        # transiently flushes an empty buffer. The crash is racy and
+        # only triggers on a fraction of runs; the retry wrapper in
+        # "Drive the chat UI with Playwright" is what actually keeps
+        # the job green when the race fires.
         run: |
           pip install 'playwright>=1.55,<1.58'
           python -m playwright install chromium
@@ -139,9 +141,58 @@ jobs:
           # available to llama.cpp from CI; gemma-3-270m turn latency
           # has been observed to crowd the 180s default. Triple it.
           STUDIO_UI_TURN_TIMEOUT_MS: '540000'
+        # Retry up to 3 times to absorb the racy Playwright Node 24
+        # pipeTransport.js 'Unexpected end of JSON input' crash that
+        # fires intermittently on macos-14 free runners (Chromium
+        # browser process dies mid-test → driver Node process can't
+        # parse the truncated JSON-RPC line and exits). The retry
+        # FULLY resets Studio (kill, reset-password, reboot, wait
+        # /api/health, re-export bootstrap pw) before re-running the
+        # script so the change-password flow finds a fresh bootstrap.
+        # A real test failure (assertion / timeout) does NOT match the
+        # JSON pattern so it bypasses retry and surfaces immediately.
         run: |
           mkdir -p logs/playwright
-          python tests/studio/playwright_chat_ui.py
+          attempt=1
+          max_attempts=3
+          while : ; do
+            set +e
+            python tests/studio/playwright_chat_ui.py 2>&1 | tee logs/playwright_attempt_${attempt}.log
+            rc=${PIPESTATUS[0]}
+            set -e
+            if [ "$rc" -eq 0 ]; then
+              break
+            fi
+            if grep -q "Unexpected end of JSON input" logs/playwright_attempt_${attempt}.log \
+                && [ "$attempt" -lt "$max_attempts" ]; then
+              echo "::warning::Playwright pipeTransport JSON crash on attempt ${attempt}; resetting Studio and retrying..."
+              kill "${STUDIO_PID}" 2>/dev/null || true
+              sleep 2
+              unsloth studio reset-password
+              UNSLOTH_API_ONLY=1 unsloth studio -H 127.0.0.1 -p "$STUDIO_PORT" \
+                > "logs/studio_retry_${attempt}.log" 2>&1 &
+              STUDIO_PID=$!
+              echo "STUDIO_PID=$STUDIO_PID" >> "$GITHUB_ENV"
+              for i in $(seq 1 180); do
+                if curl -fs "http://127.0.0.1:${STUDIO_PORT}/api/health" > /tmp/health.json \
+                    && jq -e '.status == "healthy"' /tmp/health.json >/dev/null; then
+                  break
+                fi
+                sleep 1
+              done
+              STUDIO_OLD_PW=$(cat ~/.unsloth/studio/auth/.bootstrap_password)
+              STUDIO_NEW_PW="CIUi-$(python -c 'import secrets; print(secrets.token_urlsafe(16))')"
+              STUDIO_NEW2_PW="CIUi-$(python -c 'import secrets; print(secrets.token_urlsafe(16))')"
+              echo "::add-mask::$STUDIO_OLD_PW"
+              echo "::add-mask::$STUDIO_NEW_PW"
+              echo "::add-mask::$STUDIO_NEW2_PW"
+              export STUDIO_OLD_PW STUDIO_NEW_PW STUDIO_NEW2_PW
+              attempt=$((attempt + 1))
+              sleep 3
+              continue
+            fi
+            exit "$rc"
+          done

       - name: Stop Studio (chat-ui ends with Shutdown click; this is belt-and-suspenders)
         if: always()
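One detail of the retry loop worth calling out: because the test output is piped through `tee`, `$?` reports the exit status of `tee` (almost always 0), not of the Playwright script, which is why the loop reads bash's `${PIPESTATUS[0]}` instead. A minimal, standalone illustration (bash-specific; `false` stands in for the failing script):

```shell
#!/usr/bin/env bash
# After a pipeline, $? is the status of the LAST command. tee exits 0,
# so a crashed script on the left of the pipe would look green:
false | tee /tmp/pipestatus_demo.log
echo "naive \$?: $?"        # 0 -- tee's status, the crash is masked

# bash's PIPESTATUS array keeps every stage's status; index 0 is the
# left-hand command. Capture it IMMEDIATELY -- the next command
# overwrites the array.
false | tee /tmp/pipestatus_demo.log
rc=${PIPESTATUS[0]}
echo "pipeline rc: $rc"     # 1 -- the real failure
```

Note this is a bashism; under plain POSIX `sh` the array does not exist, which is fine here since GitHub-hosted macOS runners execute `run:` steps with bash.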
@@ -187,9 +238,48 @@ jobs:
           STUDIO_UI_TURN_TIMEOUT_MS: '540000'
           GGUF_REPO: ${{ env.GGUF_REPO }}
           GGUF_VARIANT: ${{ env.GGUF_VARIANT }}
+        # Same pipeTransport JSON-crash retry shape as "Drive the chat
+        # UI with Playwright" -- see comment there.
         run: |
           mkdir -p logs/playwright_extra
-          python tests/studio/playwright_extra_ui.py
+          attempt=1
+          max_attempts=3
+          while : ; do
+            set +e
+            python tests/studio/playwright_extra_ui.py 2>&1 | tee logs/playwright_extra_attempt_${attempt}.log
+            rc=${PIPESTATUS[0]}
+            set -e
+            if [ "$rc" -eq 0 ]; then
+              break
+            fi
+            if grep -q "Unexpected end of JSON input" logs/playwright_extra_attempt_${attempt}.log \
+                && [ "$attempt" -lt "$max_attempts" ]; then
+              echo "::warning::Playwright pipeTransport JSON crash on attempt ${attempt}; resetting Studio and retrying..."
+              kill "${STUDIO_EXTRA_PID}" 2>/dev/null || true
+              sleep 2
+              unsloth studio reset-password
+              UNSLOTH_API_ONLY=1 unsloth studio -H 127.0.0.1 -p 18897 \
+                > "logs/studio_extra_retry_${attempt}.log" 2>&1 &
+              STUDIO_EXTRA_PID=$!
+              echo "STUDIO_EXTRA_PID=$STUDIO_EXTRA_PID" >> "$GITHUB_ENV"
+              for i in $(seq 1 180); do
+                if curl -fs "http://127.0.0.1:18897/api/health" > /tmp/health2.json \
+                    && jq -e '.status == "healthy"' /tmp/health2.json >/dev/null; then
+                  break
+                fi
+                sleep 1
+              done
+              STUDIO_OLD_PW=$(cat ~/.unsloth/studio/auth/.bootstrap_password)
+              STUDIO_NEW_PW="CIUiExtra-$(python -c 'import secrets; print(secrets.token_urlsafe(16))')"
+              echo "::add-mask::$STUDIO_OLD_PW"
+              echo "::add-mask::$STUDIO_NEW_PW"
+              export STUDIO_OLD_PW STUDIO_NEW_PW
+              attempt=$((attempt + 1))
+              sleep 3
+              continue
+            fi
+            exit "$rc"
+          done

       - name: Stop second Studio
         if: always()
@@ -78,6 +78,28 @@ class MLXInferenceBackend:
         model_name = config.identifier if hasattr(config, "identifier") else str(config)
         is_vision = getattr(config, "is_vision", False)

+        # GGUF guard. GGUF models are served via llama-server in the
+        # parent process, NOT via mlx-lm in this MLX subprocess. The
+        # route at studio/backend/routes/inference.py:592 (`if config.
+        # is_gguf:`) is responsible for sending GGUF traffic to the
+        # llama-server backend before reaching the MLX orchestrator.
+        # If we end up here with is_gguf=True, the route's
+        # `detect_gguf_model_remote` returned None on its first call
+        # (transient HF Hub flake) but the subprocess re-detection
+        # succeeded. The subprocess cannot reach into the parent's
+        # llama-server, so all we can do is raise loudly so the caller
+        # gets a clear error instead of a cryptic
+        # "config.json does not exist" from mlx_lm.utils.load_model.
+        if getattr(config, "is_gguf", False):
+            raise RuntimeError(
+                f"MLXInferenceBackend cannot load GGUF model '{model_name}': "
+                f"GGUF models must be served by llama-server in the parent "
+                f"process. The /api/inference/load route should have "
+                f"detected this repo as GGUF before dispatching to the MLX "
+                f"orchestrator -- this fallback indicates a transient HF "
+                f"Hub failure during initial detection. Retry the request."
+            )
+
         if hf_token:
             import os
studio/backend/utils/models/model_config.py
@@ -1327,16 +1327,45 @@ def detect_gguf_model_remote(
     Check if a HuggingFace repo contains GGUF files.

     Returns the filename of the best GGUF file in the repo, or None.
-    """
-    try:
-        from huggingface_hub import model_info as hf_model_info
-
-        info = hf_model_info(repo_id, token = hf_token)
-        repo_files = [s.rfilename for s in info.siblings]
-        return _pick_best_gguf(repo_files)
-    except Exception as e:
-        logger.debug(f"Could not check GGUF files for '{repo_id}': {e}")
-        return None
+
+    Retries on transient HF Hub failures (network hiccups, 5xx, slow
+    cold-start of the API). Without retry, a single transient failure
+    here returns None silently and the caller treats the repo as
+    non-GGUF -- which on Apple Silicon (Mac UI route) means falling
+    through to the MLX backend, which then fails opening a non-existent
+    config.json on the GGUF-only repo. Three attempts with 1s/2s/4s
+    backoff covers the typical free-runner HF Hub flakiness.
+    """
+    import time
+    from huggingface_hub import model_info as hf_model_info
+
+    last_err: Optional[Exception] = None
+    for attempt in range(3):
+        try:
+            info = hf_model_info(repo_id, token = hf_token)
+            repo_files = [s.rfilename for s in info.siblings]
+            return _pick_best_gguf(repo_files)
+        except Exception as e:
+            last_err = e
+            # 404 / RepoNotFound is permanent -- don't waste attempts.
+            err_name = type(e).__name__
+            if err_name in (
+                "RepositoryNotFoundError",
+                "GatedRepoError",
+                "RevisionNotFoundError",
+                "EntryNotFoundError",
+            ):
+                logger.debug(
+                    f"Could not check GGUF files for '{repo_id}': {e}"
+                )
+                return None
+            if attempt < 2:
+                time.sleep(2 ** attempt)
+    logger.warning(
+        f"Could not check GGUF files for '{repo_id}' after 3 attempts: "
+        f"{last_err}"
+    )
+    return None


 def download_gguf_file(
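Failure mode 1 can be reproduced in miniature. Everything below is an illustrative stand-in, not Studio code: `make_flaky_hub` models an HF Hub that flakes exactly once, `detect_once` and `detect_retry` model the pre- and post-fix detection shapes (the real function also backs off between attempts, skips permanent errors, and picks the best of several GGUF files), and the filename is made up:

```python
def detect_once(model_info, repo_id):
    """Pre-fix shape: a single attempt; any failure collapses to None,
    so the caller misroutes the GGUF-only repo to the MLX backend."""
    try:
        files = model_info(repo_id)
        return next((f for f in files if f.endswith(".gguf")), None)
    except Exception:
        return None

def detect_retry(model_info, repo_id, attempts=3):
    """Post-fix shape: bounded retries absorb a transient failure."""
    for attempt in range(attempts):
        try:
            files = model_info(repo_id)
            return next((f for f in files if f.endswith(".gguf")), None)
        except Exception:
            pass  # real code sleeps 2**attempt s and bails on 404/gated
    return None

def make_flaky_hub():
    """HF Hub stand-in that fails on the first call only."""
    state = {"calls": 0}
    def hub(repo_id):
        state["calls"] += 1
        if state["calls"] == 1:
            raise ConnectionError("transient HF Hub failure")
        return ["README.md", "model-Q8_0.gguf"]
    return hub

print(detect_once(make_flaky_hub(), "unsloth/gemma-3-270m-it-GGUF"))   # None -- misrouted
print(detect_retry(make_flaky_hub(), "unsloth/gemma-3-270m-it-GGUF"))  # model-Q8_0.gguf
```

The None from the first shape is exactly what sent run 25487410091 down the MLX path; the second shape, plus the `is_gguf` guard in MLXInferenceBackend.load_model as a backstop, closes both ends of the race.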