Mirror of https://github.com/unslothai/unsloth.git, synced 2026-05-17 21:14:06 +00:00
12 commits
79adfd9c71
studio: skip flash-attn install on Blackwell GPUs (sm_100+) (#5420)
* studio: skip flash-attn install on Blackwell GPUs (sm_100+)
Dao-AILab does not publish prebuilt flash-attn wheels for sm_100, sm_120, or sm_121, and the older-arch wheels fail to load on Blackwell. Add a shared has_blackwell_gpu() helper and gate both the install-time (install_python_stack._ensure_flash_attn) and runtime (worker._ensure_flash_attn_for_long_context) paths on it. Detection uses nvidia-smi --query-gpu=compute_cap, which works on Linux and Windows.
* test: stub has_blackwell_gpu in pre-existing runtime flash-attn tests
prefers_prebuilt_wheel and falls_back_to_pypi exercise the install paths that the Blackwell guard now short-circuits. Make them explicit about non-Blackwell so they pass on real Blackwell hosts.
* studio: cache has_blackwell_gpu, skip Blackwell warning under NO_TORCH
- Wrap has_blackwell_gpu in functools.lru_cache so repeated calls in a single process avoid redundant nvidia-smi spawns. Tests clear the cache via setup_method/teardown_method.
- In _ensure_flash_attn, run the NO_TORCH short-circuit before the Blackwell check so GGUF-only users (who never install torch anyway) do not see a Blackwell warning. The Blackwell check still runs above the IS_WINDOWS / IS_MACOS gates so Blackwell-on-Windows users still see the explicit reason rather than a silent OS skip.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* test: add has_blackwell_gpu to mlx worker test wheel_utils stub
test_mlx_training_worker_config loads worker.py against a hand-rolled utils.wheel_utils stub. Adding has_blackwell_gpu to the stub symbol list lets worker's import line resolve.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
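For orientation, a minimal sketch of what a compute-capability gate like this could look like. The nvidia-smi query and the lru_cache wrapping come from the message above; the CSV format flag, error handling, and placement are illustrative assumptions, not the shipped studio code.

```python
import functools
import subprocess

@functools.lru_cache(maxsize=None)
def has_blackwell_gpu() -> bool:
    """Best-effort check for an sm_100+ (Blackwell) GPU via nvidia-smi.

    Returns False when nvidia-smi is missing or its output is unparseable,
    so callers fall back to the normal flash-attn install path.
    """
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10, check=True,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return False
    for line in out.splitlines():
        cap = line.strip()
        if not cap:
            continue
        try:
            major = int(cap.split(".")[0])
        except ValueError:
            continue
        if major >= 10:  # sm_100 / sm_120 / sm_121 all report compute_cap >= 10
            return True
    return False
```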

1c91f49d83
fix: unblock 4 tests deselected/skipped in #5312 (real bugs) (#5359)
* fix: unblock 4 tests deselected/skipped in #5312 (real bugs)
PR #5312 surfaced two real regressions by turning previously-silent skips into explicit `--deselect` / `pytest.skip(...)` blocks. Both were left as follow-ups rather than fixed in that PR. This PR fixes the underlying bugs so the suppressions can be dropped.
1. studio/backend/requirements/no-torch-runtime.txt: pin tokenizers
Installing with `--no-deps -r no-torch-runtime.txt` (the path install.sh takes for the no-torch / GGUF-only mode) resolves transformers to 5.3.0 and tokenizers to the latest available (0.23.1). transformers 5.3.0 requires `tokenizers>=0.22.0,<=0.23.0`, so `from transformers import AutoConfig` then fails at import time:
  ImportError: tokenizers>=0.22.0,<=0.23.0 is required for a normal functioning of this module, but found tokenizers==0.23.1.
Pin `tokenizers>=0.22.0,<=0.23.0` to match the constraint embedded inside every transformers version in the allowed window (4.56.0..5.3.0). Verified locally: a fresh `uv venv` + `uv pip install --no-deps -r no-torch-runtime.txt` followed by `from transformers import AutoConfig` now succeeds.
Unblocks 3 deselected cases in studio-backend-ci.yml:
- TestE2ETokenizersFix::test_autoconfig_works_with_no_torch_runtime (parametrized py 3.12 + 3.13 -> 2 cases)
- TestE2EFullNoTorchSandbox::test_autoconfig_succeeds
2. unsloth/models/rl.py: defensive wrapper for _patch_trl_rl_trainers
_patch_trl_rl_trainers has many internal `try: ... except: ... return` branches, but several paths (notably inspect.getsource on the thin wrappers TRL 1.x leaves in trl.trainer for trainers that moved to trl.experimental) can still propagate exceptions. The umbrella patch_trl_rl_trainers() ring-fences each call with try/except + warning_once, but direct callers (the CI shim in consolidated-tests-ci.yml, downstream tools, end-user scripts) used to see the raw exception, which forced #5312's CI heredoc to ring-fence with:
  except Exception as e:
      # TRL 1.x renames break the patch helper internally; we
      # accept that here and skip rather than fail the cell.
      pytest.skip(f"_patch_trl_rl_trainers raised: ...")
Rename the existing implementation to _patch_trl_rl_trainers_impl and make _patch_trl_rl_trainers a thin wrapper that catches any uncaught exception and routes it through logger.info, matching the umbrella wrapper's behaviour. Power users who want the raw raising behaviour for their own diagnostics can still call _patch_trl_rl_trainers_impl directly.
Adds tests/python/test_patch_trl_rl_trainers_defensive.py to lock the contract: the wrapper must never raise, and it must delegate to the impl on the happy path.
Unblocks 1 skip in consolidated-tests-ci.yml's test_compile_sft_trainer_patch.
Follow-up for #5312 once this lands: drop the two `--deselect` lines in studio-backend-ci.yml's repo-cpu-tests step and drop the `except Exception ... pytest.skip(f"_patch_trl_rl_trainers raised: ")` block in consolidated-tests-ci.yml's test_compile_sft_trainer_patch.
* chore: tighten comments and docstrings in the new code
Drop verbose justifications down to one or two lines per site. The PR description carries the full context; in-file comments only need to point at the WHY.
* chore(no-torch-runtime): drop redundant lower bound on tokenizers
tokenizers 0.23.0 was never published to PyPI (versions go 0.22.2 -> 0.23.1), so `tokenizers<=0.23.0` resolves to 0.22.2 in practice, the same version the explicit >=0.22.0,<=0.23.0 pin resolved to. Verified on Python 3.12 and 3.13.
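A rough sketch of the defensive-wrapper shape described in item 2. The logger name, signature, and message text are placeholders; only the _patch_trl_rl_trainers / _patch_trl_rl_trainers_impl split and the catch-and-log contract are taken from the message.

```python
import logging

logger = logging.getLogger(__name__)  # placeholder; the real module has its own logger

def _patch_trl_rl_trainers_impl(*args, **kwargs):
    # Original patching logic lives here unchanged; some internal paths
    # (e.g. inspect.getsource on TRL 1.x thin wrappers) may still raise.
    ...

def _patch_trl_rl_trainers(*args, **kwargs):
    """Never-raising wrapper: direct callers get a log line instead of an exception."""
    try:
        return _patch_trl_rl_trainers_impl(*args, **kwargs)
    except Exception as e:
        logger.info(f"Unsloth: skipping TRL trainer patch ({e})")
        return None
```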

a56c959233
Add Studio PR-time CI: pin enforcement, frontend, backend, wheel smoke (#5298)
* Add Studio PR-time CI: pin enforcement, frontend, backend, wheel smoke
The repo currently has no PR-time CI; only release-desktop.yml (manual) and
stale.yml (issue pinger). studio/backend/tests/ has 35 test files (~860
tests collected) that never run automatically. Frontend lint/typecheck/build
scripts exist in package.json but are not gated on PRs either. This is the
gap that let 2026.5.1 ship with the broken Studio chat-history bundle.
Adds four ubuntu-latest workflows, all CPU-only and free for public repos:
studio-pin-enforce.yml
Greps studio/frontend/package.json for caret/tilde ranges on the
@assistant-ui surface (and assistant-stream). Blocks the exact regression
vector that produced 2026.5.1 (^0.12.19 resolving to a breaking 0.12.28).
studio-frontend-ci.yml
npm ci (strict lockfile), tree-clean check after, typecheck, vite build,
bundle grep for the Studio unstable_Provider call site (<= 3 hits = OK,
>= 4 = the 2026.5.1 regression), 75 MB dist budget, biome non-blocking.
Uploads dist on failure.
studio-backend-ci.yml
Runs the existing studio/backend/tests/ suite on Python 3.10/3.11/3.12.
Excludes test_studio_api.py (live model + GGUF download) and
llama_cpp_load_progress_live (spawns a real llama.cpp). Local run on this
branch: 861 pass, 4 skipped, 5 deselected. ruff non-blocking.
wheel-smoke.yml
python -m build, then verifies the produced wheel:
- ships studio/frontend/package-lock.json
- ships studio/frontend/dist/index.html
- does NOT ship studio/frontend/node_modules/
- does NOT ship studio/frontend/bun.lock
- main JS bundle has < 4 unstable_Provider hits
Then installs the wheel into a fresh venv with a lightweight dep set and
imports studio.backend.main. Locally validated against the wheel built
from this branch.
Each workflow has concurrency cancellation on the same ref. biome and ruff
are gated as non-blocking until the existing accumulated drift is cleared
(~470 biome errors today); remove the bypass in a follow-up.
Notes verified locally:
- pin enforcement: PASS (carets dropped on this branch)
- frontend npm ci -> typecheck -> build -> grep -> budget: PASS
- bundle: 48 MB, hits=1
- backend pytest: 861 pass, 1 GPU-pollution failure not reproducible on
GPU-less runners (won't reproduce on ubuntu-latest)
- wheel build: 13s, produces unsloth-2026.5.2-py3-none-any.whl
- wheel content sanity: all five checks PASS
* CI: install full backend dep set + refine pytest filter for CPU runners
First CI run on PR #5298 surfaced two real gaps:
1. pytest collection failed at `import yaml` in utils/models/model_config.
Locally my workspace venv had pyyaml as a transitive dependency; CI's clean Python
3.10/3.11/3.12 environments didn't, so collection hit ModuleNotFoundError on the very
first test module. The same gap blew up the wheel-smoke `from studio.backend.main
import app` step.
2. Once the import chain was complete, ~9 tests still failed because they
exercise GPU-only paths or live transformers introspection that can't run
on a GPU-less `ubuntu-latest` runner regardless of code correctness:
- TestGpuAutoSelection
- TestPreSpawnGpuResolution
- TestPerGpuFitGuardAllCounts
- TestTransformersIntrospection
- test_returns_cuda_when_cuda_available
- test_calls_cuda_cache_when_cuda
Fix:
- Backend CI installs `studio/backend/requirements/studio.txt` (the
declared backend dep set) + the extras the import chain needs but
studio.txt omits (python-multipart, sqlalchemy, cryptography, pyyaml,
jinja2, mammoth, unpdf, requests, etc.) + torch CPU wheel + transformers.
- Refine the pytest -k filter to deselect the GPU/introspection-bound
classes by name. Deselections are commented inline with the reason.
- wheel-smoke uses the same dep set so the import smoke matches.
Locally validated against the freshly-built unsloth-2026.5.2 wheel:
831 passed, 5 skipped, 35 deselected, 0 failed in 47s
Studio backend imports cleanly in a fresh venv after the wheel install.
* CI: collapse multiline pytest -k expression to a single line
YAML's | block-scalar fed the newlines verbatim into the -k argument and
pytest rejected it as 'Wrong expression passed to -k'. Same logical filter
on one line.
* CI: rename jobs so the GitHub UI shows what each check actually does
Adds a per-job 'name:' to all four workflows so the PR check list reads:
Studio pin enforcement / @assistant-ui must be pinned exactly
Studio frontend CI / Frontend build + bundle sanity
Studio backend CI / Backend pytest (Python 3.10|3.11|3.12)
Studio backend CI / Backend ruff lint (non-blocking)
Wheel build + smoke / Wheel build + content sanity + import smoke
Instead of the default '<workflow> / <job-key>' which was opaque
('check', 'build', 'pytest (3.10)', 'ruff', 'wheel').
* CI: add Python 3.13 to backend pytest matrix
Verified locally: 831 backend tests pass under Python 3.13 with the same
filter set used for 3.10 / 3.11 / 3.12.
* CI: add Studio inference smoke + Tauri build smoke
Two new workflows. Both CPU-only, both free on `ubuntu-latest`.
studio-inference-smoke.yml
The only workflow we have that proves "Studio actually works", as opposed
to "the bundle parses" or "the imports succeed":
- runs install.sh --local --no-torch (lean Studio install)
- downloads unsloth/gemma-4-E2B-it-GGUF UD-IQ3_XXS into actions/cache
- boots Studio in api-only mode
- logs in with the bootstrap password, changes it, re-logs
- POST /api/inference/load on the GGUF
- POST /api/inference/chat/completions and asserts a non-empty
assistant response
Validated end-to-end locally on a fresh main install: model loaded,
chat completion returned `Hello!` against the same GGUF the workflow
uses.
studio-tauri-smoke.yml
PR-time variant of release-desktop.yml. Linux-only debug build
(`tauri build --debug --no-bundle`) on ubuntu-22.04. Catches
src-tauri Cargo.toml / Rust source breakage, tauri.conf.json drift,
and frontend-distDir wiring. Pinned to the same Tauri CLI version
(2.10.1) as release-desktop.yml so CLI bumps surface in CI before
they break the release pipeline. Mac and Windows desktop builds
stay manual via release-desktop.yml because they need code-signing
secrets.
* CI: use 'hf download' instead of deprecated 'huggingface-cli download'
huggingface_hub 1.13.0 dropped the huggingface-cli entrypoint. The
replacement is the 'hf' CLI shipped with the same package. Same args,
just s/huggingface-cli/hf/.
* CI: assert llama.cpp prebuilt path was used on ubuntu-latest
The inference-smoke job runs on ubuntu-latest (CPU-only, x86_64), which
is exactly the host shape that should pick up ggml-org/llama.cpp's
bin-ubuntu-x64.tar.gz prebuilt directly. If install.sh ever falls back
to a source build on this runner, the studio/setup.sh routing has
regressed and every CPU-only Linux user is paying a 3 minute compile
cost again.
Tee install.sh output to logs/install.log, then fail the job if the log
contains "falling back to source build" or is missing the success
marker "prebuilt installed and validated" / "prebuilt up to date and
validated".
Also include logs/install.log in the failure artifact so the prebuilt
diagnostics are uploaded alongside studio.log when the job fails.
* Tighten prebuilt-assertion comment in studio-inference-smoke
* CI: switch inference-smoke model to Qwen3.5-2B UD-IQ3_XXS
Drops the Gemma 4 E2B GGUF (~2.3 GB) for unsloth/Qwen3.5-2B-GGUF
(UD-IQ3_XXS, ~890 MiB). Cache-miss download is roughly a third of
what it was, and CPU inference on ubuntu-latest finishes well
inside the 25 minute job budget.
Verified locally: load via /api/inference/load returns
status=loaded, is_gguf=true, supports_reasoning=true,
supports_tools=true; chat completion returns a non-empty assistant
message ("Hello!").
* CI: add workflow_dispatch to inference-smoke for manual cache pre-warm
* CI: fold pin-enforce grep into studio-frontend-ci, drop standalone workflow
The "@assistant-ui must be pinned exactly" check was its own ~7 second
workflow, doing a single grep on studio/frontend/package.json. Move it
into studio-frontend-ci.yml as a pre-install step (right after
checkout, before any node setup so a violation fails fast). One fewer
top-level check row on every PR, same coverage.
Add a FIXME so this step is dropped once @assistant-ui/* and
assistant-stream leave 0.x: on 1.x, caret ranges are conventional and
this becomes overzealous.
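The workflow step itself is a grep over package.json; a standalone Python equivalent of the rule it enforces might look like the sketch below (the file path, output format, and helper name are illustrative).

```python
import json
import re
import sys

def check_pins(package_json="studio/frontend/package.json") -> int:
    """Fail when any @assistant-ui/* or assistant-stream dependency uses a caret/tilde range."""
    with open(package_json) as f:
        pkg = json.load(f)
    bad = []
    for section in ("dependencies", "devDependencies"):
        for name, spec in pkg.get(section, {}).items():
            pinned_surface = name.startswith("@assistant-ui/") or name == "assistant-stream"
            if pinned_surface and re.match(r"^[\^~]", spec):
                bad.append(f"{name}: {spec}")
    if bad:
        print("Unpinned @assistant-ui surface packages:\n  " + "\n  ".join(bad))
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(check_pins())
```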
* CI: add Repo tests (CPU) job, mirroring unsloth-zoo PR #624 conftest
The top-level tests/ tree was previously not run anywhere. 23 of its
files are CPU-friendly with the right harness: pure-Python helpers,
ast walks, installer logic, and CLI shape tests. Locally validated:
302 passed, 9 skipped, 12 deselected in ~7 seconds on Python 3.12.
Three pieces:
1. tests/conftest.py -- GPU-free harness, mirrors the conftest landed
in unslothai/unsloth-zoo PR #624. Pre-loads unsloth_zoo.device_type
and unsloth.device_type under a temporarily-mocked
torch.cuda.is_available() so each module's @cache permanently
captures "cuda" and the import chain succeeds on a CPU runner.
Also stubs torch.cuda.get_device_capability /
is_bf16_supported / mem_get_info, which unsloth/__init__.py and
unsloth_zoo.temporary_patches probe at import time when
DEVICE_TYPE == "cuda". On a real accelerator the harness is
skipped and detection runs normally.
2. Two existing tests were leaking sys.modules state across the
session because they injected stubs without an __spec__ and
without restoration:
- tests/test_raw_text.py shoved a "datasets" stub into
sys.modules. transformers' import_utils later did
importlib.util.find_spec("datasets") and got
ValueError: datasets.__spec__ is None.
- tests/python/test_fast_sentence_transformer_redirect_lifecycle.py
shoved "transformers", "sentence_transformers", and
"sentence_transformers.models" stubs in. Subsequent tests
that did `import transformers` got the non-package stub.
Fix: set __spec__ on stubs, plus an autouse fixture in the
sentence-transformer test file that restores the three keys
after each test.
3. .github/workflows/studio-backend-ci.yml gains a third job,
`Repo tests (CPU)`, that installs the same dep set as the
backend-pytest matrix (Python 3.12 only -- the tests are
version-independent), exports PYTHONPATH=studio so tests/python/*
can import install_python_stack, and runs the 23-file subset
above with `-m 'not server and not e2e'`.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* CI: install unsloth_zoo for Repo CPU tests, harden conftest fallback
The CPU job at run 25422050018 broke at conftest collection: the
preload of unsloth.device_type pulled in `from unsloth_zoo.utils import
Version` and ubuntu-latest didn't have unsloth_zoo on the path because
it is an optional dep of unsloth. Two fixes:
1. Install unsloth_zoo>=2026.5.1 alongside the other deps in the Repo
tests (CPU) job (it's also what unsloth's optional `huggingface`
extra pins).
2. Wrap the body of _preload_device_type in conftest.py in a try/except
so any import failure (missing prereq, broken module, etc.) cleanly
returns False instead of aborting the entire collection. The caller
already falls back to the stub device_type module on False, so the
net behavior is "best effort: real device_type if possible, stub
otherwise" instead of "abort the test session".
* kernels.utils: guard CUDA_STREAMS / XPU_STREAMS init for DEVICE_COUNT==0
When DEVICE_COUNT is 0 (CPU host: no visible NVIDIA / AMD / Intel GPU)
the dict comprehension {... for i in range(0)} was empty and the
subsequent max(_CUDA_STREAMS.keys()) raised
ValueError: max() iterable argument is empty
during module import. That made unsloth.kernels.utils unimportable on
any CPU runner, which in turn blocked all of tests/saving/**, three
top-level tests/test_*.py, and tests/qlora/test_unsloth_qlora_train_and_merge.py
from even collecting on CPU CI.
Wrap the per-device-index dict comprehension and max() machinery in
a DEVICE_COUNT > 0 guard. When DEVICE_COUNT is 0 fall back to empty
containers (CUDA_STREAMS = (), WEIGHT_BUFFERS = [], ABSMAX_BUFFERS = []).
The consumer functions further down in this module index these arrays
by device_index but only during real GPU work, so the empty fallbacks
never get touched on a CPU host.
GPU-safety verified locally: with 8 visible CUDA devices, CUDA_STREAMS
has 8 entries (identical to before this PR). With CUDA_VISIBLE_DEVICES=""
the module imports cleanly, CUDA_STREAMS is (), and the previously
blocked tests now collect (test_get_model_name passes 38 subtests,
test_resolve_model_class passes 9, test_model_registry collects all 8
parametrizations).
Same shape applied to the DEVICE_TYPE == "xpu" branch for symmetry.
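A sketch of the guard shape, not the actual kernels/utils code; the container names and the empty fallbacks come from the message, the construction details are assumptions.

```python
import torch

DEVICE_COUNT = torch.cuda.device_count()

if DEVICE_COUNT > 0:
    # One stream per visible device; max()/indexing are only safe when the
    # comprehension is non-empty.
    _streams = {i: torch.cuda.Stream(device=i) for i in range(DEVICE_COUNT)}
    CUDA_STREAMS = tuple(_streams[i] for i in range(max(_streams.keys()) + 1))
    WEIGHT_BUFFERS = [None] * DEVICE_COUNT
    ABSMAX_BUFFERS = [None] * DEVICE_COUNT
else:
    # CPU-only host (DEVICE_COUNT == 0): fall back to empty containers. The
    # consumers index these only during real GPU work, so they are never
    # touched here.
    CUDA_STREAMS = ()
    WEIGHT_BUFFERS = []
    ABSMAX_BUFFERS = []
```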
* CI: switch Repo tests (CPU) to auto-discovery + isolate flakes
Three changes, locally validated end-to-end (779 passed, 11 skipped,
23 deselected, 0 failed across all three steps):
1. Repo tests (CPU, auto-discovered): replace the explicit 23-file
list with `pytest tests/` plus a small set of `--ignore` and
`--deselect` flags. New tests under tests/python, tests/studio
(excluding the two state-sensitive files), and top-level
tests/test_*.py are picked up automatically with no workflow edit.
--ignore covers:
- tests/qlora and tests/saving: GPU-bound by design
- tests/utils: helpers folder, not tests
- tests/sh: shell suite handled in its own step
- two state-polluting hardware-spoof files (next step)
-m 'not server and not e2e': honours markers already declared
in tests/python/conftest.py
--deselect: test_model_registration / test_all_model_registration
hit huggingface_hub live; they belong on a network job
2. Hardware-spoof tests (state-sensitive, run in isolation):
tests/studio/test_hardware_dispatch_matrix.py and
tests/studio/test_is_mlx_dispatch_gate.py mutate module globals
in studio.backend.utils.hardware.hardware (IS_ROCM, DEVICE) via
their spoof fixtures, and the leak crosses file boundaries.
Running them in their own pytest invocation avoids polluting the
main sweep. Both pass cleanly in isolation: 28 passed, 1 skipped.
3. Shell installer tests: explicitly enumerated subset that does not
depend on install.ps1 layout (test_install_host_defaults.sh has
drifted; that's a separate followup).
Test fixes folded in to keep the run green:
- tests/studio/install/test_rocm_support.py::TestAmdGpuMonitoring
::test_amd_primary_gpu_with_mock now clears
HIP/ROCR/CUDA_VISIBLE_DEVICES via monkeypatch so
_first_visible_amd_gpu_id() does not short-circuit when the runner
sets CUDA_VISIBLE_DEVICES="" to suppress CUDA.
- tests/studio/test_hardware_dispatch_matrix.py::spoof_hardware
fixture now stubs torch.cuda.get_device_properties when
cuda_available is True so detect_hardware()'s device_name probe
does not call into _cuda_init() on a CPU runner.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* CI: install torchvision (CPU) so unsloth_zoo.vision_utils can import
Run 25430652224 collected three test modules that import unsloth and
crashed at unsloth_zoo/vision_utils.py:68 with
ModuleNotFoundError: No module named 'torchvision'
unsloth_zoo.vision_utils unconditionally imports torchvision at module
scope, and unsloth.models._utils pulls vision_utils in. The Repo tests
(CPU) job installed torch from the CPU index but not torchvision, so
any test that imports unsloth.models.* failed at collection.
Add torchvision<0.26 to the same pip install --index-url
https://download.pytorch.org/whl/cpu line.
* CI: install bitsandbytes (CPU build) for unsloth.models._utils import
Run 25430982243 collected three test modules that import unsloth and
crashed at unsloth/models/_utils.py:1166 with
ModuleNotFoundError: No module named 'bitsandbytes'
The bnb import there is unconditional. Recent bnb versions (>=0.45)
ship a CPU build so the wheel installs on a free Linux runner and the
import resolves; the kernels still raise on use but the module
collects, which is enough for these CPU tests.
Add 'bitsandbytes>=0.45' to the Repo tests (CPU) deps.
* CI: rename workflows + guard kernels.utils CPU-torch binding
Workflow renames (top-level `name:` keys; affects PR check rows):
Studio backend CI -> Backend CI
Studio frontend CI -> Frontend CI
Studio inference smoke -> Studio GGUF CI
Studio Tauri smoke -> Studio Tauri CI
Wheel build + smoke -> Wheel CI
Backend CI's matrix job goes from "Backend pytest (Python 3.10)" to
just "(Python 3.10)" so the GitHub UI row reads
"Backend CI / (Python 3.10)" rather than the old verbose form.
Production guard for CPU torch (run 25431126138):
unsloth/kernels/utils.py:165 was an unconditional
_gpu_getCurrentRawStream = torch._C._cuda_getCurrentRawStream
which raised AttributeError on a CPU-only torch wheel because the
compiled CUDA backend is absent. Three test modules (test_get_model_name,
test_model_registry, test_resolve_model_class) crashed at collection
because their import chain reaches this line.
Add a hasattr probe: when torch is built without CUDA, fall through to
a no-op binding that returns 0. _get_tensor_stream is only invoked
during real GPU work, so the no-op is never executed on a CPU host.
GPU-safety verified locally: with 8 visible CUDA devices the binding
still resolves to the real torch._C._cuda_getCurrentRawStream
(behaviour identical to before this PR). The XPU branch is untouched.
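The probe itself is small; a sketch under the assumption that only the binding changes and the GPU-only callers stay untouched:

```python
import torch

if hasattr(torch._C, "_cuda_getCurrentRawStream"):
    # CUDA-built torch: bind the real raw-stream getter, exactly as before.
    _gpu_getCurrentRawStream = torch._C._cuda_getCurrentRawStream
else:
    # CPU-only wheel: the compiled CUDA backend is absent. Fall through to a
    # no-op that returns 0; _get_tensor_stream only runs during real GPU work,
    # so this branch is never exercised on a CPU host.
    def _gpu_getCurrentRawStream(device_index):
        return 0
```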
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

d65149795b
feat(studio): MLX training tab on Apple Silicon (LoRA / full FT, VLM, export) (#5265)
* Add Apple Silicon MLX routing
- Rewrite __init__.py: detect MLX on macOS arm64 before any torch imports
- Extract original GPU init to _gpu_init.py (unchanged)
- MLX path imports FastMLXModel from unsloth_zoo, skips all GPU code
- GPU path unchanged: from ._gpu_init import *
* mlx with studio
* updating temporary install.sh
* adding t_v5 path
* fixing vision training
* adding chat
* minor
* Adding export and fixing training issues, inference with lora adaptors
* fix: MLX worker pass load_in_4bit, override is_vlm based on dataset, streaming for VLM
* Merge mlx-apple-silicon into main
* update install.sh to point to main branch
* fix: export returns 3 values (success, message, output_path) matching upstream worker
* fix(mlx): show training-process peak memory in Studio UI, not system-wide
Studio UI was showing ~95 GB during MLX training because get_gpu_utilization
read "In use system memory" from IORegistry's AGXAccelerator — system-wide
GPU memory across all processes (training + backend + browser + Display).
Now the trainer's mx.get_peak_memory() value is forwarded through the
progress event and surfaced via /api/train/hardware while training is
active. Falls back to the system-wide reading when training is not running.
* fix(mlx): make is_bfloat16_supported() detect M1/M2 (no native bf16)
M1 and M2 chips emulate bf16 in software on the GPU, causing 40-70%
slower prefill compared to native fp16. M3+ have native bf16 (macOS
Sonoma+ MPSGraph). Replaces the always-True stub with chip-aware
detection via mx.device_info().
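A hedged sketch of chip-aware detection: mx.device_info is the source the message names, but the returned key name and the chip-string parsing below are assumptions, not the shipped implementation.

```python
import platform

def is_bfloat16_supported() -> bool:
    """Sketch: M1/M2 emulate bf16 on the GPU, M3 and later have native bf16."""
    if platform.system() != "Darwin" or platform.machine() != "arm64":
        return False
    try:
        import mlx.core as mx
        chip = str(mx.device_info().get("device_name", ""))  # key name assumed
    except Exception:
        return False
    return not any(f"Apple {gen}" in chip for gen in ("M1", "M2"))
```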
* feat(mlx): wire training_type="Full Finetuning" through MLX worker
Compute use_lora from the UI's training_type before loading the model,
pass full_finetuning=not use_lora to FastMLXModel.from_pretrained, and
let the existing 'if use_lora' branch skip get_peft_model. Matches the
GPU worker's flow.
* fix(mlx): pass save_method='merged_16bit' from Studio's export page
Previously the MLX path called save_pretrained_merged() with no
save_method, which fell through to a no-op that didn't actually fuse
LoRA into the base. Now Studio's "Merged Model" export properly
fuses LoRA + dequantizes any 4-bit base to bf16, matching the GPU
behavior for the same UI option.
* fix(studio): pass private to MLX push, return 3-tuples consistently
- MLX push_to_hub branch now forwards private=private (matches GPU)
- Existing 2-tuple early-returns ('repo_id+token required', 'PEFT model
needed') were tripping the route's 3-tuple unpack. Added a None
output_path so the unpack always succeeds.
* studio wirings
* Merge pull request #5 from Manan17/feat/quant_config
studio wirings
* fix(mlx): wire train_on_completions for VLM via per-template lookup
Mirror the GPU worker: stop excluding VLMs and stop hardcoding
template detection. Look up the model in MODEL_TO_TEMPLATE_MAPPER and
fetch the per-template instruction/response markers from
TEMPLATE_TO_RESPONSES_MAPPER. The frontend already force-disables
train_on_completions for vision+image and audio cases, so backend
just trusts the flag.
* wire in lora rslora, init lora weights, random_state
* loftq studio error message fix
* handle unknown optim and lr scheduler
* Merge pull request #6 from Manan17/update/peftkwargs
Update/peftkwargs
* feat(mlx): pass finetune_language/attention/mlp/vision flags to FastMLXModel
Studio's four UI checkboxes now actually flow through to MLX get_peft_model
(which was just updated in unsloth-zoo to honor them). Also drops the
incorrect train_projector wiring that tied projector LoRA to the
attn/mlp flags — those are language-side toggles, not projector toggles.
Co-Authored-By: Manan17 <shahmanan170602@gmail.com>
* feat(mlx,ux): auto-imply finetune_language_layers when user picks attn/mlp
UI guardrail. The four checkboxes (vision/language/attention/MLP) carry
"scope × module-type" semantics that aren't obvious — picking just
"Attention modules" + "MLP modules" without "Language layers" naturally
reads as "fine-tune attn/mlp" but our backend reads it as "fine-tune
attn/mlp modules in *no* tower" → empty target_modules → zero
trainable params → crash inside value_and_grad.
If user selected attn or mlp module types but no layer scope, default
to language scope. Power users can still explicitly choose
language=False, vision=True if they want vision-only fine-tuning of
attn/mlp.
Co-Authored-By: Manan17 <shahmanan170602@gmail.com>
* fix(mlx): wire top_k, repetition_penalty, and VLM top_p through to mlx-lm/mlx-vlm
Inference UI sliders for top_k and repetition_penalty had no effect on
MLX, and VLM top_p was also silently dropped. Plus a latent pre-existing
bug: mlx_vlm.generate_step expects temperature= (long form), but we
were passing temp= which silently fell into **kwargs — every VLM chat
was effectively greedy regardless of the temperature slider.
Text path (_generate_text):
- make_sampler now receives top_k in addition to temp/top_p
- make_logits_processors built and forwarded when repetition_penalty is
non-trivial (skip when 0.0/1.0 to avoid pointless overhead)
VLM path (_generate_vlm):
- Pass top_p, top_k, repetition_penalty as kwargs (mlx_vlm.stream_generate
forwards them to generate_step's sampler/logits_processor builders)
- Rename temp= → temperature= so it's actually consumed
Verified end-to-end with a smoke test on Qwen2.5-0.5B-Instruct (text) and
Qwen2.5-VL-3B-Instruct (VLM): each of {greedy, top_p=0.5, top_k=10,
rep_pen=1.5} now produces a distinct output, proving the parameters
reach the sampler.
Co-Authored-By: Manan17 <shahmanan170602@gmail.com>
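A sketch of the text-path wiring: make_sampler and make_logits_processors are the mlx-lm helpers the message names, while the function shape and the treatment of the no-op penalty values are illustrative.

```python
from mlx_lm.sample_utils import make_logits_processors, make_sampler

def _build_sampling(temp, top_p, top_k, repetition_penalty):
    # Sampler now sees top_k alongside temp/top_p.
    sampler = make_sampler(temp=temp, top_p=top_p, top_k=top_k)
    # Only build the processor when the penalty actually does something
    # (0.0 / 1.0 are treated as "off" to avoid pointless overhead).
    logits_processors = None
    if repetition_penalty and repetition_penalty != 1.0:
        logits_processors = make_logits_processors(repetition_penalty=repetition_penalty)
    return sampler, logits_processors
```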
* feat(mlx): map format_type to MLX save_method, reuse local save dir for hub push
- export_merged_model: format_type="4-bit (FP4)" → save_method="merged_4bit"
(was hardcoded merged_16bit, ignoring the UI choice).
- Both export_merged_model and export_base_model now pass save_directory=
to push_to_hub_merged so it reuses the just-written local folder
instead of re-saving under a relative "username/model" directory.
Co-Authored-By: Manan17 <shahmanan170602@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* restore install
* fix(mlx): restore FastVisionModel as a distinct class
unsloth/__init__.py was assigning `FastVisionModel = FastLanguageModel`
right after defining `class FastVisionModel(FastLanguageModel)` with a
`for_training` static method. The alias erased the class binding, so
the documented `FastVisionModel.for_training(model)` call from upstream
Unsloth's VLM notebooks raised `AttributeError` on MLX.
Remove the offending alias. `FastVisionModel` is now a real subclass of
`FastLanguageModel` again — inherits `from_pretrained` /
`get_peft_model` / `for_inference`, exposes `for_training` as a no-op
pass-through (no-op because MLX doesn't have a train/eval mode flag;
the call exists purely for GPU/MLX notebook parity).
Verified end-to-end: Qwen3-VL-2B + LaTeX_OCR LoRA + vision LoRA via
FastVisionModel.from_pretrained → get_peft_model → for_training →
MLXTrainer.train() runs 10 steps cleanly (loss 1.10 → 0.12, no NaNs,
peak 5.89 GB).
Studio's path (FastLanguageModel.from_pretrained for any repo,
auto-detect VLM in the loader) is unaffected. Tier-1 review finding #8.
* Studio: harden MLX training and export, restore GPU init guards
Studio export
Restore Tuple[bool, str, Optional[str]] contract on export_merged_model,
export_base_model, export_gguf, and export_lora_adapter, populating
output_path on successful local saves so routes/worker/CLI/frontend
details.output_path is non-empty again.
Lift the GPU save_method assignment out of the local-save branch so
Hub-only merged exports (save_directory='', push_to_hub=True) no longer
hit UnboundLocalError on the push branch.
For MLX merged and base hub-only export, stage to a tempfile.TemporaryDirectory
before push_to_hub_merged instead of passing save_directory=''.
Source _IS_MLX from unsloth instead of recomputing the platform check
(single source of truth, also enforces mlx-package availability).
Studio MLX training/inference
Pass token=hf_token into FastMLXModel.from_pretrained for gated/private
models, matching the inference path.
Strip hf_token and wandb_token from wandb.init(config=...) so secrets
do not leak into the W&B run config.
Replace load_from_disk(local_datasets[0]) with the existing
UnslothTrainer._resolve_local_files / _loader_for_files helpers so
uploaded JSON/JSONL/CSV/Parquet files train through the normal datasets
loader (load_from_disk still used for HF save_to_disk directories).
Make the dataset slice helper inclusive at the end and treat 0 as a real
index instead of "unset", matching the GPU and embedding paths.
Add a status_message -> message alias inside _send so the existing parent
pump (training.py) renders MLX status updates instead of blanks.
Forward min_p through generate_chat_response into _generate_text /
_generate_vlm and into make_sampler / vlm_kwargs so the sampling control
is no longer a no-op on MLX.
Wrap unsloth_zoo.mlx_loader / mlx_trainer imports with a clearer
ImportError pointing users at install.sh for Apple Silicon.
Exit the MLX stop-polling thread on EOFError/OSError instead of
busy-looping when the queue/pipe is permanently closed (one-line
why-safe rationale inline).
Studio frontend
ParamsSection subscribes to platform deviceType via the Zustand hook so
the gradient checkpointing dropdown re-renders after the async device
fetch completes.
Studio hardware
get_gpu_utilization MLX branch now reads _read_apple_gpu_stats once and
derives VRAM totals from psutil, removing the second ioreg subprocess
per utilization poll.
Unsloth core
Restore the os.geteuid == 0 guard around the CUDA ldconfig recovery
that was lost when GPU initialization moved into _gpu_init.py, plus the
non-root manual-fix warning branch. Non-root CUDA users no longer shell
out to ldconfig at import time.
Load dataprep/raw_text via importlib so the MLX import path no longer
pulls torch in through dataprep/__init__.py -> synthetic.py.
FastVisionModel.from_pretrained overrides the inherited delegator only
to inject text_only=False; this is an extension, not a duplication, and
is needed so VLM checkpoint loads keep the vision tower.
Wrap the MLX-branch unsloth_zoo import with a clearer ImportError.
* Studio: regression tests for MLX training/export and GPU init ldconfig guard
tests/python/test_gpu_init_ldconfig_guard.py asserts the geteuid root
check still wraps the ldconfig recovery and the non-root branch warns
bnb users; AST + source-text inspection so the test runs without torch.
tests/studio/test_export_output_path_contract.py covers the
Tuple[bool, str, Optional[str]] return contract on every export method,
the output_path assignment after successful local save, the Hub-only
GPU save_method binding fix, the MLX hub-only TemporaryDirectory
staging, and the single-source `_IS_MLX` import from unsloth.
tests/studio/test_mlx_training_worker_behaviors.py covers token
forwarding to FastMLXModel.from_pretrained, wandb config secret
stripping, file-aware local dataset loading, status_message ->
message aliasing, inclusive slice semantics, EOFError/OSError stop
thread exit, and the friendly mlx_loader / mlx_trainer ImportError.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix(mlx): cap inference memory + release wired on unload + tame worker pre-pin
Three memory-hardening fixes for Studio's MLX path:
1. Inference applies the same Metal caps as the trainer.
load_model previously only called set_wired_limit(100% of recommended)
with no upper memory_limit, leaving large VLM checkpoints unbounded
during the loader allocation. Add _configure_memory_limits() that sets
memory_limit to 85% of recommended and wired_limit to min(recommended,
memory_limit) — matching MLXTrainer's defaults so behavior is the same
whether the user trains or just runs inference.
2. unload_model releases pinned memory back to the OS — but only when
the cache is empty. Without this, pinned wired bytes stayed allocated
to MLX after the model was gone, starving other apps. The release is
guarded on `not self.models` so unloading one of several cached
models doesn't un-pin weights still in use.
3. Worker pre-cap is conservative instead of aggressive.
The previous pre-pin set_wired_limit(100% of recommended) competed
with MLXTrainer's later more conservative cap. Replace with the same
85%-memory / min(rec, memory) pair that the trainer applies later
(idempotent re-apply). Bounds the model load + LoRA setup window
without over-pinning.
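A sketch of the shared cap described in item 1: the 0.85 factor and the min(recommended, memory_limit) wired cap come from the message, while the exact MLX API locations (which have moved between releases) and the device_info key name are assumptions.

```python
import mlx.core as mx

def _configure_memory_limits():
    # Recommended working-set size for this GPU (key name assumed).
    recommended = mx.metal.device_info()["max_recommended_working_set_size"]
    memory_limit = int(recommended * 0.85)
    mx.set_memory_limit(memory_limit)                   # cap total Metal allocation
    mx.set_wired_limit(min(recommended, memory_limit))  # never wire more than the cap
```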
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* tests/studio: regression tests for the _IS_MLX dispatch gate
Two gates drive every MLX-vs-CUDA dispatch decision in Studio:
1. unsloth._IS_MLX in unsloth/__init__.py — evaluated once at import
time, read by Studio worker code to choose the GPU vs MLX trainer
and inference paths. Defined as
Darwin AND arm64 AND find_spec("mlx") is not None.
2. utils.hardware.detect_hardware() — runtime probe with priority
CUDA > XPU > MLX > CPU. The MLX branch is reached only when both
CUDA and XPU are unavailable and the host is Apple Silicon and
mlx is importable.
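The gate expression itself, as described above (a one-time evaluation at import):

```python
import platform
from importlib.util import find_spec

_IS_MLX = (
    platform.system() == "Darwin"
    and platform.machine() == "arm64"
    and find_spec("mlx") is not None
)
```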
Neither gate had a direct test. Adds tests/studio/test_is_mlx_dispatch_gate.py
with six tests:
test_is_mlx_gate_uses_three_required_predicates
AST-walks unsloth/__init__.py and asserts the _IS_MLX assignment
is a BoolOp(And) of platform.system()=="Darwin",
platform.machine()=="arm64", and find_spec("mlx") is not None.
Catches accidental rewrites that drop a predicate.
test_is_mlx_gate_true_on_apple_silicon_with_mlx_present
Spoofs platform to Darwin/arm64, injects a fake mlx module so
find_spec returns a real ModuleSpec, re-evaluates the gate
expression. Verifies it flips True under the exact conditions
Studio expects.
test_is_mlx_gate_false_when_mlx_missing
Spoofs Apple Silicon but with mlx absent. Verifies the gate stays
False (so a Mac without mlx installed does not pretend to have
MLX support).
test_is_mlx_gate_false_on_non_apple_silicon
Canary on the actual Linux+CUDA / AMD / Intel test host: the gate
must remain False regardless of whether mlx happens to be
importable. Protects existing GPU users from accidental MLX
hijack when MLX support evolves.
test_detect_hardware_picks_mlx_when_only_apple_silicon_available
Forces torch.cuda and torch.xpu off, spoofs Apple Silicon, injects
fake mlx and mlx.core. detect_hardware() must return DeviceType.MLX.
test_detect_hardware_picks_cuda_on_real_host
Canary: on a real CUDA host detect_hardware() must return
DeviceType.CUDA. Protects against the MLX branch shadowing CUDA
dispatch on NVIDIA / AMD ROCm hosts.
Uses the same monkeypatch.setitem(sys.modules, ...) fake-mlx pattern as
the existing test_mlx_inference_backend.py — no new test infrastructure,
no real mlx install required.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add AGPL-3.0 SPDX header to Studio MLX regression tests
Four Studio MLX test files shipped without an SPDX-License-Identifier:
studio/backend/tests/test_mlx_training_worker_config.py
tests/studio/test_mlx_training_worker_behaviors.py
tests/studio/test_export_output_path_contract.py
tests/studio/test_is_mlx_dispatch_gate.py
They sit in or alongside studio/backend/, which is governed by
studio/LICENSE.AGPL-3.0, and exercise AGPL Studio code. Add the same
"# SPDX-License-Identifier: AGPL-3.0-only" header that's already on
test_mlx_inference_backend.py so the license declaration matches
the code under test rather than defaulting to the repo-root
Apache-2.0.
* Wrap MLX submodule imports with friendly install hint
The _IS_MLX block at the top of unsloth/__init__.py already catches the
missing-package case with a friendly install hint, but the follow-up
"from unsloth_zoo.mlx_trainer import ..." and "from unsloth_zoo.mlx_loader import ..."
lines run unguarded. An Apple Silicon user who has unsloth-zoo installed
but on an older version (e.g. the current PyPI release, before the MLX
modules ship) sees a raw ImportError on the submodule rather than the
hint that points at install.sh.
Wrap the two submodule imports in the same try/except shape so the
friendly install message fires whether the package is missing entirely
or just predates the MLX submodules. No-op once both packages release
together; smooths the transitional window where unsloth/main has merged
but unsloth-zoo on PyPI has not.
---------
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>

680d43a488
Fix FastSentenceTransformer loading with newer sentence-transformers (#5259)
* Fix FastSentenceTransformer compatibility with sentence-transformers 5.4
* Support varied Transformer init signatures
Detect Transformer.__init__ parameters and build init kwargs accordingly so trust_remote_code and other args are passed using the correct names. Instead of unconditionally using model_args/config_args, the code now inspects the constructor to decide between model_kwargs/config_kwargs vs model_args/config_args and also sets processor_kwargs or tokenizer_args when present. Initializes Transformer with constructed transformer_kwargs (including max_seq_length) to improve compatibility with different Transformer implementations.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Harden SentenceTransformer path and module checks
* Scrub .github/workflows for staging push (matches staging base)
* Guard auto_model write in FastSentenceTransformer._apply_torch_compile
On sentence-transformers >=5.4 Transformer.auto_model is a read-only @property backed by self.model, so a direct assignment raises AttributeError. The two get_peft_model paths already guard the write with isinstance(getattr(type(...), "auto_model", None), property); the auto-compile path missed the same guard, which broke the default trainer path whenever max_steps >= _compile_threshold.
* Add tests for FastSentenceTransformer property guards
* Tighten FastSentenceTransformer redirect lifecycle tests
Drop a duplicate assertion-less case, remove dead AST extraction helper, and trim unused imports. The remaining six tests cover substitution on match, restoration on constructor exception, passthrough for unrelated names, pathlib.Path normalisation, trailing slash handling, and the no-identifier guard.
* Sync .github/workflows with upstream author branch
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Avoid sharing trust_remote_code kwargs dict across constructor buckets
In FastSentenceTransformer._create_transformer_module, the same trust_remote_code_kwargs dict was being assigned to model_kwargs, config_kwargs, and processor_kwargs (or model_args / config_args / tokenizer_args) on the Transformer constructor. transformers' from_pretrained code paths (configuration_utils, auto_factory, processing_auto, etc.) call kwargs.pop("trust_remote_code", ...) on the dict they receive, which would drain the shared object and silently strip trust_remote_code from the other buckets. Pass an independent copy to each bucket so subsequent buckets and any pass-through auxiliary loads still see trust_remote_code.
* Wire do_lower_case and return_dict through Transformer init for ST 5.4
In FastSentenceTransformer._create_transformer_module:
- When Transformer.__init__ accepts do_lower_case (ST 5.4+), pass the unsloth tokenizer's do_lower_case as a constructor kwarg. The existing post-init attribute assignment alone is too late: ST 5.4's __init__ uses do_lower_case to install a Lowercase normalizer on tokenizer.backend_tokenizer.normalizer, which is not re-applied if we only set the attribute after construction. The post-init line is preserved untouched for older ST versions.
- Add return_dict to the manually completed model_forward_params set so wrapped models with forward(*args, **kwargs) signatures keep ST's forced dict-like output safety net. ST 5.4's own __init__ unions the forward signature with the same set plus return_dict; the previous override silently dropped it.
* Preserve flash-attention forward keys when wrapping ST 5.4 Transformer
Sentence-transformers 5.4's Transformer.__init__ calls _can_flatten_inputs() during construction, which augments self.model_forward_params with cu_seq_lens_q, cu_seq_lens_k, max_length_q, max_length_k, seq_idx whenever feature-extraction with text modality, the torch backend, flash-attention 2, and varlen flash-attn support are all available. The post-init override of transformer_module.model_forward_params used to replace the attribute outright, silently dropping those keys so ST's preprocess() filter stripped flash-attn kwargs before reaching model.forward. Snapshot the constructor-populated set first, leave the existing overwrite intact for the forward-signature plus tokenizer keys, and union the snapshot back in so flash-attn forwarding keeps working on ST 5.4. For older sentence-transformers releases the attribute is absent and getattr returns an empty set, leaving behavior unchanged.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
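A condensed sketch of two of the behaviors above (constructor inspection plus per-bucket copies of the trust_remote_code dict); Transformer is the sentence-transformers module class, while the helper name and return shape are illustrative.

```python
import inspect
from sentence_transformers.models import Transformer

def _build_transformer_kwargs(max_seq_length, trust_remote_code):
    """Pick the kwarg buckets this sentence-transformers version accepts.

    Each bucket gets its own copy of the trust_remote_code dict, because
    transformers' from_pretrained paths pop("trust_remote_code", ...) from
    whatever dict they receive and would drain a shared object.
    """
    params = inspect.signature(Transformer.__init__).parameters
    trc = {"trust_remote_code": trust_remote_code}
    kwargs = {"max_seq_length": max_seq_length}
    if "model_kwargs" in params:
        # Newer naming (model_kwargs / config_kwargs, optionally processor_kwargs).
        kwargs["model_kwargs"] = dict(trc)
        kwargs["config_kwargs"] = dict(trc)
        if "processor_kwargs" in params:
            kwargs["processor_kwargs"] = dict(trc)
    else:
        # Older naming (model_args / config_args, optionally tokenizer_args).
        kwargs["model_args"] = dict(trc)
        kwargs["config_args"] = dict(trc)
        if "tokenizer_args" in params:
            kwargs["tokenizer_args"] = dict(trc)
    return kwargs
```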

0da8af56d6
unsloth run: add --enable-tools/--disable-tools server-side tool policy (#5277)
* Add process-level tool_policy state for unsloth run
* Apply tool_policy override at chat/completions, /messages, and tool pass-through gates
* Add pure resolver for unsloth run --enable-tools/--disable-tools
* Wire --enable-tools/--disable-tools into unsloth run
* Color tool-policy notices and confirmation prompt in Claude orange
* Always show tool-status notice; print URL + API key in silent mode
* Treat any non-loopback bind as external; forward --yes after parent prompt
* Fix tool_policy double-module bug: import via state.tool_policy to share global with routes

4f9c8321a2
Fix DPO trainer multi process hang (#5199)
* Fix DPO trainer multi process hang
* Fix datacollator error
* further dpo vision changes
* cleanup
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Harden DPO vision row processing and source rewrites
- dpo_trainer_vision_signature_columns: also match TRL 0.22.x layout (image_sizes followed by ref_chosen_logps), so vision keys are not stripped via remove_unused_columns on the originally-affected version.
- dpo_trainer_concatenated_inputs: fall back to inserting after the image_sizes block when no token_type_ids anchor follows it.
- Apply the same vision model_kwargs forwarding rewrite to _compute_loss_liger via dpo_trainer_compute_loss_liger so the Liger DPO path does not drop pixel_position_ids/image_position_ids/mm_token_type_ids when args.use_liger_loss is true.
- dpo_trainer_vision_process_row:
  - guard chosen/rejected EOS append with tokenizer.eos_token_id is not None
  - use features.get("images") and features.get("prompt") to match the existing get on line 164 and avoid KeyError on rows without those keys
  - drop the torch.is_tensor gate so list-form pixel_position_ids/image_position_ids returned without return_tensors are still aliased
  - skip the loop entry for image_position_ids when it was already promoted to pixel_position_ids, so the output dict no longer carries both keys with identical data
- dpo_trainer_data_collator_vision_keys: switch from pad_sequence to trl.trainer.utils.pad with padding_side='left' (matches the DPO collator's prompt left-pad) and padding_value=-1 for *_position_ids keys (sentinel for padded patches), 0 otherwise. Skip the key when not every example carries it. Falls back to pad_sequence if trl.pad is unavailable or the tensor rank is too high.
- dpo_trainer_prepare_dataset: keep TRL's writer_batch_size=10 when popping num_proc; removing it defaults to 1000 and reintroduces the vision OOM risk that writer_batch_size=10 was set to avoid.
* DPO vision row: keep upstream-facing keys and fix patch padding
- dpo_trainer_vision_process_row: no longer aliases image_position_ids to pixel_position_ids. Each upstream-emitted vision key is forwarded under its own name. Gemma4 ForConditionalGeneration.forward accepts image_position_ids directly and renames it to pixel_position_ids only at the vision-tower call site, so aliasing in the row helper hid the kwarg the model actually consumes.
- dpo_trainer_vision_process_row: extract pixel_values via "in" membership instead of unconditional indexing. With the missing-images path returning [] to the processor, modern processors no longer emit a pixel_values key, and the previous indexing raised KeyError.
- dpo_trainer_data_collator_vision_keys: pick padding_side per key family. *_position_ids tensors are patch-aligned to pixel_values (TRL's DataCollatorForPreference right-pads pixel_values), so pad them right with the -1 sentinel; mm_token_type_ids is token-aligned to prompt_input_ids (left-padded by TRL), so pad it left with 0.
* DPO vision: handle multi-image prompts and arbitrary-rank collator pad
- dpo_trainer_vision_process_row: when a prompt is missing vision placeholders, insert one placeholder per missing image instead of always inserting a single token. Multi-image rows now satisfy the processor's token-vs-image count check rather than under-inserting and tripping the placeholder/feature mismatch.
- dpo_trainer_data_collator_vision_keys: drop the dim()<=2 gate around trl.trainer.utils.pad. trl.pad handles arbitrary rank correctly, while the previous fallback to torch.nn.utils.rnn.pad_sequence raised RuntimeError on rank-3 patch-position tensors with mismatched non-leading dimensions. The pad_sequence path remains as a degraded fallback only when trl.pad is unavailable or raises.
* DPO vision row: support scalar images and align prompt-aligned aux ids
- dpo_trainer_vision_process_row: type-aware normalization of the features['images'] column instead of a truthiness/len check that raised on single image objects (PIL.Image has no __len__) and on numpy ndarrays (truthiness ambiguous). Lists/tuples count as their length, scalar image objects count as one, None counts as zero, and the original value is forwarded to the processor.
- dpo_trainer_vision_process_row: when max_prompt_length truncates prompt_input_ids, also slice token_type_ids and mm_token_type_ids by the same [-max_prompt_length:] suffix. Those keys are 1:1 token aligned to prompt_input_ids (Gemma 4 vision attention keys off mm_token_type_ids per modular_gemma4.py), so leaving them at the original length silently misaligned the multimodal mask.
* DPO vision row: stop synthesizing vision-token placeholders
Pass features['prompt'] and features['images'] straight to the processor without inserting any extra placeholder tokens. The previous helper used processing_class.image_token, which is the right prompt placeholder for Gemma 4 but the wrong one for Gemma 3 (whose prompt placeholder is boi_token while image_token is the inner expansion target). Synthesizing that token also broke multi-image rows: text ended up with N placeholders while the row helper only forwarded the first image's pixel_values via the standard [0] indexing that mirrors upstream TRL process_row, so token vs image-feature counts diverged. Removing the synthesis matches stock TRL behavior; users provide the correct placeholders for their processor in the prompt.
* Add tests for DPO vision row processor passthrough
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
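A sketch of the per-key padding policy the collator changes describe. trl.trainer.utils.pad with padding_side is what the message relies on; the helper name, key test, and fallback shape are assumptions.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

try:
    from trl.trainer.utils import pad as trl_pad
except ImportError:
    trl_pad = None

def _pad_vision_key(tensors, key):
    if key.endswith("_position_ids"):
        # Patch-aligned to pixel_values, which TRL right-pads: right-pad with a
        # -1 sentinel marking padded patches.
        side, value = "right", -1
    else:
        # mm_token_type_ids is token-aligned to prompt_input_ids (left-padded
        # by TRL's preference collator): left-pad with 0.
        side, value = "left", 0
    if trl_pad is not None:
        try:
            return trl_pad(tensors, padding_value=value, padding_side=side)
        except Exception:
            pass
    # Degraded fallback only: pad_sequence cannot left-pad and fails on rank-3
    # tensors with mismatched non-leading dimensions.
    return pad_sequence(tensors, batch_first=True, padding_value=value)
```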
||
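As a rough illustration of the per-key-family padding rules the DPO vision commit above settles on, the sketch below right-pads the patch-aligned *_position_ids keys with a -1 sentinel, left-pads the token-aligned mm_token_type_ids with 0, prefers trl.trainer.utils.pad, and degrades to pad_sequence only when trl.pad is unavailable or raises. It is a sketch, not the actual dpo_trainer_data_collator_vision_keys patch: the function and dict names are hypothetical, and it assumes trl.trainer.utils.pad takes (tensors, padding_value, padding_side) as the commit message uses it.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

try:
    from trl.trainer.utils import pad as trl_pad  # handles arbitrary-rank tensors
except ImportError:  # degraded fallback, as described in the commit above
    trl_pad = None

# Padding rules taken from the commit message:
#   *_position_ids  -> patch-aligned to pixel_values (right-padded by TRL), sentinel -1
#   mm_token_type_ids -> token-aligned to prompt_input_ids (left-padded by TRL), fill 0
_VISION_KEY_PADDING = {
    "pixel_position_ids": ("right", -1),
    "image_position_ids": ("right", -1),
    "mm_token_type_ids":  ("left", 0),
}

def collate_vision_keys(examples):
    """Hypothetical collator fragment for the extra vision keys."""
    batch = {}
    for key, (side, fill) in _VISION_KEY_PADDING.items():
        if not all(key in ex for ex in examples):
            continue  # skip keys that not every example carries
        tensors = [torch.as_tensor(ex[key]) for ex in examples]
        if trl_pad is not None:
            try:
                batch[key] = trl_pad(tensors, padding_value=fill, padding_side=side)
                continue
            except Exception:
                pass  # fall through to the degraded pad_sequence path
        # Degraded fallback: right-pads only and requires matching trailing dims.
        batch[key] = pad_sequence(tensors, batch_first=True, padding_value=fill)
    return batch
```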
|
|
13928b5f0e
|
Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var (#5024)
* Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var
  When set, UNSLOTH_PYTORCH_MIRROR overrides the default https://download.pytorch.org/whl base URL in all four install scripts (install.sh, install.ps1, studio/setup.ps1, studio/install_python_stack.py). When unset or empty, the official URL is used. This lets users behind corporate proxies or in regions with poor connectivity to pytorch.org point at a local mirror without patching scripts.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Add pytest for UNSLOTH_PYTORCH_MIRROR in install_python_stack.py
  Tests that _PYTORCH_WHL_BASE picks up the env var when set, falls back to the official URL when unset or empty, and preserves the value as-is (including trailing slashes).
* Remove stale test assertions for missing install.sh messages
* Fix GPU mocking in test_get_torch_index_url.sh
  Extract _has_usable_nvidia_gpu and _has_amd_rocm_gpu alongside get_torch_index_url so the GPU-presence checks work in tests. Add -L flag handling to mock nvidia-smi so it passes the GPU listing check. All 26 tests now pass on CPU-only machines.
* Strip trailing slash from UNSLOTH_PYTORCH_MIRROR to avoid double-slash URLs
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> |
||
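A minimal Python sketch of the resolution order this commit describes, roughly how install_python_stack.py could compute its _PYTORCH_WHL_BASE: the env var wins when set and non-empty, trailing slashes are stripped, and the official index is the fallback. The helper name below is illustrative, not the actual function in the repo.

```python
import os

_OFFICIAL_PYTORCH_WHL_BASE = "https://download.pytorch.org/whl"

def _resolve_pytorch_whl_base() -> str:
    # UNSLOTH_PYTORCH_MIRROR overrides the official wheel index when set;
    # unset or empty falls back to the default. The trailing slash is
    # stripped so later f"{base}/cu124"-style joins do not produce "//".
    mirror = os.environ.get("UNSLOTH_PYTORCH_MIRROR", "").strip()
    return mirror.rstrip("/") if mirror else _OFFICIAL_PYTORCH_WHL_BASE
```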
|
|
da78c6be71
|
[Studio] Install flash attn at setup time for linux (#4979)
* [Studio] Install flash attn at setup time for linux
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* cleanup changes
  Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Test cases
* wheel_utils: narrow url_exists exceptions and log at debug level
---------
Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai> |
||
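The last bullet above, narrowing url_exists exceptions and logging at debug level, maps to a pattern like the following. This is a hedged sketch only: the probe strategy, signature, and error set are assumptions, with just the helper name and the "narrow exceptions, log at debug" intent taken from the commit message.

```python
import logging
import urllib.error
import urllib.request

logger = logging.getLogger(__name__)

def url_exists(url: str, timeout: float = 10.0) -> bool:
    # Probe a wheel URL with a HEAD request. Catch only the network errors
    # we expect and log them at debug level, instead of a broad `except
    # Exception` that would hide real bugs and spam warnings.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, TimeoutError) as exc:
        logger.debug("url_exists(%s) failed: %s", url, exc)
        return False
```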
|
|
d22b2a18f9
|
fix: add tokenizers to no-torch deps and TORCH_CONSTRAINT for arm64 macOS py313+ (#4748)
* fix: add tokenizers to no-torch runtime deps and add TORCH_CONSTRAINT for arm64 macOS py313+
  Two installer fixes:
  1. Add `tokenizers` to `no-torch-runtime.txt` before `transformers`. Without it, `from transformers import AutoConfig` crashes on startup because `--no-deps` skips transitive dependencies.
  2. Add `TORCH_CONSTRAINT` variable to `install.sh`. On arm64 macOS with Python 3.13+, tighten the torch requirement to `>=2.6` since torch <2.6 has no cp313 arm64 wheels. The variable replaces the previously hard-coded constraint in the uv pip install line.
  Includes 66 tests (42 pytest + 24 bash) covering:
  - Structural checks on install.sh, install.ps1, no-torch-runtime.txt
  - Shell snippet tests with mocked python for 13 platform/version combos
  - Mock uv integration verifying correct constraint string
  - E2E venv tests on Python 3.12 and 3.13 confirming AutoConfig works
  - Negative control proving AutoConfig fails without tokenizers
  - Full no-torch sandbox regression guards (safetensors, huggingface_hub)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Fix incomplete no-torch manifest and align E2E tests with real --no-deps path
  - Add missing transitive deps to no-torch-runtime.txt that are required under --no-deps: regex, typing_extensions, filelock, httpx, httpcore, certifi, idna, anyio, sniffio, h11. Without these, `from transformers import AutoConfig` still fails after install.sh --no-torch.
  - Change all E2E tests to use --no-deps (matching what install.sh does) instead of normal dep resolution. Previous tests passed even with an incomplete manifest because uv backfilled transitive deps.
  - Rewrite negative control to derive from the real no-torch-runtime.txt with tokenizers stripped, proving the specific fix matters.
  - Replace GNU-only sed -i with heredoc in shell test for macOS compat.
  - Remove unused os/sys imports from Python test file.
  - Quote SKIP_TORCH and mock uv paths in bash -c strings.
* Assert install succeeds before checking import results in E2E tests
  Address review feedback: test_torch_not_importable and test_tokenizers_directly_importable in Group 3 now assert that uv pip install returns 0 before checking import behavior. This prevents false positives when the install itself fails silently.
* Assert install succeeds in negative control and tighten error check
  - Add missing install-success assertion in test_negative_control_no_tokenizers to prevent false positives from network/install failures.
  - Tighten error message check to look for "tokenizers" in stderr or ModuleNotFoundError, rather than the generic "No module" substring which could match unrelated import failures.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> |
||
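The TORCH_CONSTRAINT decision above lives in install.sh as shell logic; the snippet below is only a Python rendering of the same rule for readability, assuming the standard platform/sys modules and a hypothetical helper name: on arm64 macOS with Python 3.13 or newer the torch requirement tightens to >=2.6, everywhere else it stays unpinned.

```python
import platform
import sys

def torch_constraint() -> str:
    # Hypothetical Python rendering of install.sh's TORCH_CONSTRAINT choice:
    # torch < 2.6 publishes no cp313 arm64 macOS wheels, so that combination
    # needs the tighter requirement; all other platforms keep plain "torch".
    is_macos_arm64 = sys.platform == "darwin" and platform.machine() == "arm64"
    if is_macos_arm64 and sys.version_info >= (3, 13):
        return "torch>=2.6"
    return "torch"
```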
|
|
2ffc8d2cea
|
tests: add no-torch / Intel Mac test suite (#4646)
* tests: add no-torch / Intel Mac test suite
  Add comprehensive test coverage for the no-torch / --no-torch installer and Studio backend changes introduced in #4624.
  Shell tests (tests/sh/test_mac_intel_compat.sh):
  - version_ge edge cases (9 tests)
  - Architecture detection + Python version resolution (4 tests)
  - get_torch_index_url on Darwin (2 tests)
  - UNSLOTH_NO_TORCH propagation via SKIP_TORCH (5 tests)
  - E2E uv venv creation at Python 3.12 (3 tests)
  - E2E torch skip with mock uv shim (4 tests)
  - UNSLOTH_NO_TORCH env propagation (4 tests)
  - --python override flag parsing + resolution (11 tests)
  - --no-torch flag parsing (4 tests)
  - SKIP_TORCH unification (3 tests)
  - CPU hint printing (2 tests)
  Python tests (tests/python/test_no_torch_filtering.py):
  - _filter_requirements unit tests with synthetic + real requirements files
  - NO_TORCH / IS_MACOS constant parsing
  - Subprocess mock of install_python_stack() across platform configs
  - install.sh --no-torch flag structural + subprocess tests
  Python tests (tests/python/test_studio_import_no_torch.py):
  - AST checks for data_collators.py, chat_templates.py, format_conversion.py
  - Parametrized venv tests (Python 3.12 + 3.13) for no-torch exec
  - Dataclass instantiation without torch
  - format_conversion convert functions without torch
  - Negative controls (import torch fails, torchao fails)
  Python tests (tests/python/test_e2e_no_torch_sandbox.py):
  - Before/after import chain tests
  - Edge cases (broken torch, fake torch, lazy import)
  - Hardware detection without torch
  - install.sh logic tests (flag parsing, version resolution)
  - install_python_stack filtering tests
  - Live server startup tests (opt-in via @server marker)
* fix: address review comments on test suite
  - Fix always-true assertion in test_studio_import_no_torch.py (or True)
  - Make IS_MACOS test platform-aware instead of hardcoding Linux
  - Restore torchvision + torchaudio in server test cleanup (not just torch)
  - Include server stderr in skip message for easier debugging
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> |
||
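For orientation, the _filter_requirements behavior these tests exercise is roughly the following: with the no-torch mode active, torch-family requirements are dropped while everything else passes through. The signature, the exact family set, and the parsing below are assumptions for illustration; only the function name and the intent come from the commit message.

```python
import re

_TORCH_FAMILY = {"torch", "torchvision", "torchaudio", "torchao"}

def _filter_requirements(lines, no_torch: bool):
    # Hypothetical sketch: when UNSLOTH_NO_TORCH / --no-torch is active,
    # drop torch-family requirement lines; keep comments, blanks, and
    # every other requirement untouched.
    kept = []
    for line in lines:
        stripped = line.strip()
        if no_torch and stripped and not stripped.startswith("#"):
            match = re.match(r"[A-Za-z0-9_.-]+", stripped)
            if match and match.group(0).lower() in _TORCH_FAMILY:
                continue
        kept.append(line)
    return kept

# Example: _filter_requirements(["torch>=2.6", "transformers", "tokenizers"], no_torch=True)
# keeps ["transformers", "tokenizers"].
```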
|
|
19e9c60a8e
|
Consolidate dual venvs and separate install from update (#4530)
* refactor: consolidate dual venvs into single ~/.unsloth/studio/unsloth_studio
* refactor: separate install.sh (first-time) from setup.sh (smart update with PyPI version check)
* fix: install.sh calls setup.sh directly, keep both setup and update CLI commands
* fix: use importlib.resources.files() directly without _path attribute
* fix: bootstrap uv before pip upgrade to handle uv venvs without pip
* fix: frontend 404 when launched via CLI, add global symlink to ~/.local/bin
* feat: add --local flag to install.sh and unsloth studio update for branch testing
* fix: resolve repo root from script location for --local installs
* feat: add --package flag to install.sh for testing with custom package names
* feat: add --package flag to unsloth studio update
* fix: always nuke venv in install.sh for clean installs
* revert: remove Windows changes, will handle in separate PR
* fix: error when --package is passed without an argument
* revert: restore Windows scripts to current main
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix: always explicitly set STUDIO_LOCAL_INSTALL and STUDIO_PACKAGE_NAME env vars
* fix: pass explicit STUDIO_LOCAL_REPO env var for --local installs
* fix: align banner box for Setup vs Update labels
* deprecate: hide 'unsloth studio setup' command, point users to update/install.sh
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix: check stdout not stdin for auto-launch detection (curl pipe fix)
* fix: update install URL to unsloth.ai/install.sh
* fix: update install.sh usage comments to unsloth.ai/install.sh
* fix: use --upgrade-package for base deps to preserve existing torch/CUDA installs
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix: --local install now also installs unsloth-zoo via base.txt before editable overlay
* fix: don't skip base packages for --local installs (editable needs unsloth-zoo)
* refactor: move --local full dep install to install.sh, keep SKIP_STUDIO_BASE for all paths
* feat: add migration support for old .venv and CWD-based installs in setup.sh
* Revert "feat: add migration support for old .venv and CWD-based installs in setup.sh"
This reverts commit
|
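The "use importlib.resources.files() directly without _path attribute" bullet above corresponds to a pattern like the one below: resolving a packaged asset through the public Traversable API instead of reaching for a private attribute. The package and resource names are placeholders, not the actual Studio layout.

```python
from importlib.resources import files

# Resolve a packaged frontend asset via the public importlib.resources API.
# "unsloth_studio" and "frontend/index.html" are illustrative names only.
frontend_root = files("unsloth_studio") / "frontend"
index_html = (frontend_root / "index.html").read_text(encoding="utf-8")
```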