unsloth/.github/scripts
Daniel Han 54a86c3514
ci: route every hf download through xet-tuned stall-retry wrapper (#5476)
Root cause of the Mac json-images 30 min timeout (run 25950714888 /
PR #5430): huggingface_hub>=1.15 deprecated `hf_transfer` and routes
every transfer through `hf-xet`. The CI step's unpinned
`pip install --upgrade huggingface_hub hf_transfer` jumped to 1.15.0
+ hf-xet 1.5.0. The 940 MB mmproj finished in ~21s, but the 3 GB
gemma-4 GGUF reached ~46% and then went completely silent for the
remaining 29 minutes -- no progress bytes, no error, no exit -- until
the job timeout fired.

This wraps every CI `hf download` in a new
`.github/scripts/hf-download-with-retry.sh`:

  * Drops the no-op `HF_HUB_ENABLE_HF_TRANSFER=1` prefix and the
    `hf_transfer` install (both are deprecated on 1.15+ and only
    emit a FutureWarning now).
  * Exports the hf-xet high-performance knobs Daniel asked for:
        HF_XET_HIGH_PERFORMANCE=1
        HF_XET_CHUNK_CACHE_SIZE_BYTES=0
        HF_XET_NUM_CONCURRENT_RANGE_GETS=64
        HF_XET_RECONSTRUCT_WRITE_SEQUENTIALLY=0
        HF_XET_CLIENT_READ_TIMEOUT=500
  * Watchdogs each attempt: if `hf download` has not exited after
    HF_DOWNLOAD_STALL_SECONDS (default 180s = 3 min), the wrapper
    sends SIGTERM, sleeps 2s, sends SIGKILL, then retries. Retries
    are unbounded; the enclosing job's `timeout-minutes` is the
    real cap.
  * Accepts an optional 3rd positional `LOCAL_DIR`; omitting it lets
    `hf` use the default HF_HUB_CACHE, which is what the
    HF_HOME-priming jobs need.
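
A minimal sketch of the watchdog-and-retry pattern described in the
list above. It is not the contents of `hf-download-with-retry.sh`;
the function names `run_with_watchdog` and `retry_until_success`, the
1-second polling interval, and the 124 "timed out" status are
assumptions made for illustration. Only the environment variable
names and values, the 180s default, and the SIGTERM / sleep 2 /
SIGKILL sequence come from the commit message.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; helper names and the 124 status are hypothetical.

# hf-xet knobs exactly as listed in the commit message.
export HF_XET_HIGH_PERFORMANCE=1
export HF_XET_CHUNK_CACHE_SIZE_BYTES=0
export HF_XET_NUM_CONCURRENT_RANGE_GETS=64
export HF_XET_RECONSTRUCT_WRITE_SEQUENTIALLY=0
export HF_XET_CLIENT_READ_TIMEOUT=500

# Per-attempt time limit in seconds (default 180, per the commit message).
STALL_SECONDS="${HF_DOWNLOAD_STALL_SECONDS:-180}"

# Run a command once under a time limit.
# Usage: run_with_watchdog LIMIT_SECONDS CMD [ARGS...]
# Returns the command's own exit status, or 124 if it had to be killed.
run_with_watchdog() {
  local limit="$1"
  shift
  "$@" &
  local pid=$!
  local waited=0
  # Poll once per second while the child is still alive.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$waited" -ge "$limit" ]; then
      kill -TERM "$pid" 2>/dev/null || true
      sleep 2
      kill -KILL "$pid" 2>/dev/null || true
      wait "$pid" 2>/dev/null || true
      return 124
    fi
    sleep 1
    waited=$((waited + 1))
  done
  # Child exited on its own: propagate its status.
  wait "$pid"
}

# Retry a command until it succeeds. Deliberately unbounded: the
# enclosing CI job's timeout is expected to be the real cap.
retry_until_success() {
  local attempt=1
  until run_with_watchdog "$STALL_SECONDS" "$@"; do
    echo "attempt ${attempt} failed or timed out; retrying" >&2
    attempt=$((attempt + 1))
  done
}
```

A call site might then look like
`retry_until_success hf download "$REPO" "$FILE"`, optionally with
`--local-dir "$LOCAL_DIR"` appended, where `REPO`, `FILE`, and
`LOCAL_DIR` are placeholders supplied by the workflow. Note that this
sketch, like the behaviour described above, limits total attempt
duration rather than measuring byte-level progress.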

19 call sites migrated across mlx-ci.yml + 9 studio-*-smoke.yml
workflows. The inline `python -c "from huggingface_hub import
hf_hub_download; ..."` block in mlx-ci.yml is also routed through
the wrapper so every hf transfer in CI gets the same treatment.

Also reverts the json-images timeout 45 -> 30 from #5475: the bump
was masking this hang, not fixing it.
2026-05-15 21:11:56 -07:00
hf-download-with-retry.sh ci: route every hf download through xet-tuned stall-retry wrapper (#5476) 2026-05-15 21:11:56 -07:00