Mirror of https://github.com/unslothai/unsloth.git, synced 2026-05-17 21:14:06 +00:00
12 commits
79adfd9c71
studio: skip flash-attn install on Blackwell GPUs (sm_100+) (#5420)
* studio: skip flash-attn install on Blackwell GPUs (sm_100+)
Dao-AILab does not publish prebuilt flash-attn wheels for sm_100, sm_120, or sm_121, and the older-arch wheels fail to load on Blackwell. Add a shared has_blackwell_gpu() helper and gate both the install-time (install_python_stack._ensure_flash_attn) and runtime (worker._ensure_flash_attn_for_long_context) paths on it. Detection uses nvidia-smi --query-gpu=compute_cap, which works on Linux and Windows.
* test: stub has_blackwell_gpu in pre-existing runtime flash-attn tests
prefers_prebuilt_wheel and falls_back_to_pypi exercise the install paths that the Blackwell guard now short-circuits. Make them explicit about non-Blackwell so they pass on real Blackwell hosts.
* studio: cache has_blackwell_gpu, skip Blackwell warning under NO_TORCH
- Wrap has_blackwell_gpu in functools.lru_cache so repeated calls in a single process avoid redundant nvidia-smi spawns. Tests clear the cache via setup_method/teardown_method.
- In _ensure_flash_attn, run the NO_TORCH short-circuit before the Blackwell check so GGUF-only users (who never install torch anyway) do not see a Blackwell warning. The Blackwell check still runs above the IS_WINDOWS / IS_MACOS gates so Blackwell-on-Windows users still see the explicit reason rather than a silent OS skip.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* test: add has_blackwell_gpu to mlx worker test wheel_utils stub
test_mlx_training_worker_config loads worker.py against a hand-rolled utils.wheel_utils stub. Adding has_blackwell_gpu to the stub symbol list lets worker's import line resolve.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
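For orientation, a minimal sketch of what a compute-capability gate like this could look like. The nvidia-smi query and the lru_cache wrapping come from the message above; the CSV format flag, error handling, and placement are illustrative assumptions, not the shipped studio code.

```python
import functools
import subprocess

@functools.lru_cache(maxsize=None)
def has_blackwell_gpu() -> bool:
    """Best-effort check for an sm_100+ (Blackwell) GPU via nvidia-smi.

    Returns False when nvidia-smi is missing or its output is unparseable,
    so callers fall back to the normal flash-attn install path.
    """
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10, check=True,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return False
    for line in out.splitlines():
        cap = line.strip()
        if not cap:
            continue
        try:
            major = int(cap.split(".")[0])
        except ValueError:
            continue
        if major >= 10:  # sm_100 / sm_120 / sm_121 all report compute_cap >= 10
            return True
    return False
```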

1c91f49d83
fix: unblock 4 tests deselected/skipped in #5312 (real bugs) (#5359)
* fix: unblock 4 tests deselected/skipped in #5312 (real bugs)
PR #5312 surfaced two real regressions by turning previously-silent skips into explicit `--deselect` / `pytest.skip(...)` blocks. Both were left as follow-ups rather than fixed in that PR. This PR fixes the underlying bugs so the suppressions can be dropped.
1. studio/backend/requirements/no-torch-runtime.txt: pin tokenizers
Installing with `--no-deps -r no-torch-runtime.txt` (the path install.sh takes for the no-torch / GGUF-only mode) resolves transformers to 5.3.0 and tokenizers to the latest available (0.23.1). transformers 5.3.0 requires `tokenizers>=0.22.0,<=0.23.0`, so `from transformers import AutoConfig` then fails at import time:
  ImportError: tokenizers>=0.22.0,<=0.23.0 is required for a normal functioning of this module, but found tokenizers==0.23.1.
Pin `tokenizers>=0.22.0,<=0.23.0` to match the constraint embedded inside every transformers version in the allowed window (4.56.0..5.3.0). Verified locally: a fresh `uv venv` + `uv pip install --no-deps -r no-torch-runtime.txt` followed by `from transformers import AutoConfig` now succeeds.
Unblocks 3 deselected cases in studio-backend-ci.yml:
- TestE2ETokenizersFix::test_autoconfig_works_with_no_torch_runtime (parametrized py 3.12 + 3.13 -> 2 cases)
- TestE2EFullNoTorchSandbox::test_autoconfig_succeeds
2. unsloth/models/rl.py: defensive wrapper for _patch_trl_rl_trainers
_patch_trl_rl_trainers has many internal `try: ... except: ... return` branches, but several paths (notably inspect.getsource on the thin wrappers TRL 1.x leaves in trl.trainer for trainers that moved to trl.experimental) can still propagate exceptions. The umbrella patch_trl_rl_trainers() ring-fences each call with try/except + warning_once, but direct callers (the CI shim in consolidated-tests-ci.yml, downstream tools, end-user scripts) used to see the raw exception, which forced #5312's CI heredoc to ring-fence with:
  except Exception as e:
      # TRL 1.x renames break the patch helper internally; we
      # accept that here and skip rather than fail the cell.
      pytest.skip(f"_patch_trl_rl_trainers raised: ...")
Rename the existing implementation to _patch_trl_rl_trainers_impl and make _patch_trl_rl_trainers a thin wrapper that catches any uncaught exception and routes it through logger.info, matching the umbrella wrapper's behaviour. Power users who want the raw raising behaviour for their own diagnostics can still call _patch_trl_rl_trainers_impl directly.
Adds tests/python/test_patch_trl_rl_trainers_defensive.py to lock the contract: the wrapper must never raise, and it must delegate to the impl on the happy path.
Unblocks 1 skip in consolidated-tests-ci.yml's test_compile_sft_trainer_patch.
Follow-up for #5312 once this lands: drop the two `--deselect` lines in studio-backend-ci.yml's repo-cpu-tests step and drop the `except Exception ... pytest.skip(f"_patch_trl_rl_trainers raised: ")` block in consolidated-tests-ci.yml's test_compile_sft_trainer_patch.
* chore: tighten comments and docstrings in the new code
Drop verbose justifications down to one or two lines per site. The PR description carries the full context; in-file comments only need to point at the WHY.
* chore(no-torch-runtime): drop redundant lower bound on tokenizers
tokenizers 0.23.0 was never published to PyPI (versions go 0.22.2 -> 0.23.1), so `tokenizers<=0.23.0` resolves to 0.22.2 in practice, the same version the explicit >=0.22.0,<=0.23.0 pin resolved to. Verified on Python 3.12 and 3.13.
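A rough sketch of the defensive-wrapper shape described in item 2. The logger name, signature, and message text are placeholders; only the _patch_trl_rl_trainers / _patch_trl_rl_trainers_impl split and the catch-and-log contract are taken from the message.

```python
import logging

logger = logging.getLogger(__name__)  # placeholder; the real module has its own logger

def _patch_trl_rl_trainers_impl(*args, **kwargs):
    # Original patching logic lives here unchanged; some internal paths
    # (e.g. inspect.getsource on TRL 1.x thin wrappers) may still raise.
    ...

def _patch_trl_rl_trainers(*args, **kwargs):
    """Never-raising wrapper: direct callers get a log line instead of an exception."""
    try:
        return _patch_trl_rl_trainers_impl(*args, **kwargs)
    except Exception as e:
        logger.info(f"Unsloth: skipping TRL trainer patch ({e})")
        return None
```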

a56c959233
Add Studio PR-time CI: pin enforcement, frontend, backend, wheel smoke (#5298)
* Add Studio PR-time CI: pin enforcement, frontend, backend, wheel smoke
The repo currently has no PR-time CI; only release-desktop.yml (manual) and
stale.yml (issue pinger). studio/backend/tests/ has 35 test files (~860
tests collected) that never run automatically. Frontend lint/typecheck/build
scripts exist in package.json but are not gated on PRs either. This is the
gap that let 2026.5.1 ship with the broken Studio chat-history bundle.
Adds four ubuntu-latest workflows, all CPU-only and free for public repos:
studio-pin-enforce.yml
Greps studio/frontend/package.json for caret/tilde ranges on the
@assistant-ui surface (and assistant-stream). Blocks the exact regression
vector that produced 2026.5.1 (^0.12.19 resolving to a breaking 0.12.28).
studio-frontend-ci.yml
npm ci (strict lockfile), tree-clean check after, typecheck, vite build,
bundle grep for the Studio unstable_Provider call site (<= 3 hits = OK,
>= 4 = the 2026.5.1 regression), 75 MB dist budget, biome non-blocking.
Uploads dist on failure.
studio-backend-ci.yml
Runs the existing studio/backend/tests/ suite on Python 3.10/3.11/3.12.
Excludes test_studio_api.py (live model + GGUF download) and
llama_cpp_load_progress_live (spawns a real llama.cpp). Local run on this
branch: 861 pass, 4 skipped, 5 deselected. ruff non-blocking.
wheel-smoke.yml
python -m build, then verifies the produced wheel:
- ships studio/frontend/package-lock.json
- ships studio/frontend/dist/index.html
- does NOT ship studio/frontend/node_modules/
- does NOT ship studio/frontend/bun.lock
- main JS bundle has < 4 unstable_Provider hits
Then installs the wheel into a fresh venv with a lightweight dep set and
imports studio.backend.main. Locally validated against the wheel built
from this branch.
Each workflow has concurrency cancellation on the same ref. biome and ruff
are gated as non-blocking until the existing accumulated drift is cleared
(~470 biome errors today); remove the bypass in a follow-up.
Notes verified locally:
- pin enforcement: PASS (carets dropped on this branch)
- frontend npm ci -> typecheck -> build -> grep -> budget: PASS
- bundle: 48 MB, hits=1
- backend pytest: 861 pass, 1 GPU-pollution failure not reproducible on
GPU-less runners (won't reproduce on ubuntu-latest)
- wheel build: 13s, produces unsloth-2026.5.2-py3-none-any.whl
- wheel content sanity: all five checks PASS
* CI: install full backend dep set + refine pytest filter for CPU runners
First CI run on PR #5298 surfaced two real gaps:
1. pytest collection failed at `import yaml` in utils/models/model_config.
Locally my workspace venv had pyyaml as a transitive dependency; CI's clean Python
3.10/3.11/3.12 environments didn't, so collection hit ModuleNotFoundError on the very
first test module. The same gap blew up the wheel-smoke `from studio.backend.main
import app` step.
2. Once the import chain was complete, ~9 tests still failed because they
exercise GPU-only paths or live transformers introspection that can't run
on a GPU-less `ubuntu-latest` runner regardless of code correctness:
- TestGpuAutoSelection
- TestPreSpawnGpuResolution
- TestPerGpuFitGuardAllCounts
- TestTransformersIntrospection
- test_returns_cuda_when_cuda_available
- test_calls_cuda_cache_when_cuda
Fix:
- Backend CI installs `studio/backend/requirements/studio.txt` (the
declared backend dep set) + the extras the import chain needs but
studio.txt omits (python-multipart, sqlalchemy, cryptography, pyyaml,
jinja2, mammoth, unpdf, requests, etc.) + torch CPU wheel + transformers.
- Refine the pytest -k filter to deselect the GPU/introspection-bound
classes by name. Deselections are commented inline with the reason.
- wheel-smoke uses the same dep set so the import smoke matches.
Locally validated against the freshly-built unsloth-2026.5.2 wheel:
831 passed, 5 skipped, 35 deselected, 0 failed in 47s
Studio backend imports cleanly in a fresh venv after the wheel install.
* CI: collapse multiline pytest -k expression to a single line
YAML's | block-scalar fed the newlines verbatim into the -k argument and
pytest rejected it as 'Wrong expression passed to -k'. Same logical filter
on one line.
* CI: rename jobs so the GitHub UI shows what each check actually does
Adds a per-job 'name:' to all four workflows so the PR check list reads:
Studio pin enforcement / @assistant-ui must be pinned exactly
Studio frontend CI / Frontend build + bundle sanity
Studio backend CI / Backend pytest (Python 3.10|3.11|3.12)
Studio backend CI / Backend ruff lint (non-blocking)
Wheel build + smoke / Wheel build + content sanity + import smoke
Instead of the default '<workflow> / <job-key>' which was opaque
('check', 'build', 'pytest (3.10)', 'ruff', 'wheel').
* CI: add Python 3.13 to backend pytest matrix
Verified locally: 831 backend tests pass under Python 3.13 with the same
filter set used for 3.10 / 3.11 / 3.12.
* CI: add Studio inference smoke + Tauri build smoke
Two new workflows. Both CPU-only, both free on `ubuntu-latest`.
studio-inference-smoke.yml
The only workflow we have that proves "Studio actually works", as opposed
to "the bundle parses" or "the imports succeed":
- runs install.sh --local --no-torch (lean Studio install)
- downloads unsloth/gemma-4-E2B-it-GGUF UD-IQ3_XXS into actions/cache
- boots Studio in api-only mode
- logs in with the bootstrap password, changes it, re-logs
- POST /api/inference/load on the GGUF
- POST /api/inference/chat/completions and asserts a non-empty
assistant response
Validated end-to-end locally on a fresh main install: model loaded,
chat completion returned `Hello!` against the same GGUF the workflow
uses.
studio-tauri-smoke.yml
PR-time variant of release-desktop.yml. Linux-only debug build
(`tauri build --debug --no-bundle`) on ubuntu-22.04. Catches
src-tauri Cargo.toml / Rust source breakage, tauri.conf.json drift,
and frontend-distDir wiring. Pinned to the same Tauri CLI version
(2.10.1) as release-desktop.yml so CLI bumps surface in CI before
they break the release pipeline. Mac and Windows desktop builds
stay manual via release-desktop.yml because they need code-signing
secrets.
* CI: use 'hf download' instead of deprecated 'huggingface-cli download'
huggingface_hub 1.13.0 dropped the huggingface-cli entrypoint. The
replacement is the 'hf' CLI shipped with the same package. Same args,
just s/huggingface-cli/hf/.
* CI: assert llama.cpp prebuilt path was used on ubuntu-latest
The inference-smoke job runs on ubuntu-latest (CPU-only, x86_64), which
is exactly the host shape that should pick up ggml-org/llama.cpp's
bin-ubuntu-x64.tar.gz prebuilt directly. If install.sh ever falls back
to a source build on this runner, the studio/setup.sh routing has
regressed and every CPU-only Linux user is paying a 3 minute compile
cost again.
Tee install.sh output to logs/install.log, then fail the job if the log
contains "falling back to source build" or is missing the success
marker "prebuilt installed and validated" / "prebuilt up to date and
validated".
Also include logs/install.log in the failure artifact so the prebuilt
diagnostics are uploaded alongside studio.log when the job fails.
* Tighten prebuilt-assertion comment in studio-inference-smoke
* CI: switch inference-smoke model to Qwen3.5-2B UD-IQ3_XXS
Drops the Gemma 4 E2B GGUF (~2.3 GB) for unsloth/Qwen3.5-2B-GGUF
(UD-IQ3_XXS, ~890 MiB). Cache-miss download is roughly a third of
what it was, and CPU inference on ubuntu-latest finishes well
inside the 25 minute job budget.
Verified locally: load via /api/inference/load returns
status=loaded, is_gguf=true, supports_reasoning=true,
supports_tools=true; chat completion returns a non-empty assistant
message ("Hello!").
* CI: add workflow_dispatch to inference-smoke for manual cache pre-warm
* CI: fold pin-enforce grep into studio-frontend-ci, drop standalone workflow
The "@assistant-ui must be pinned exactly" check was its own ~7 second
workflow, doing a single grep on studio/frontend/package.json. Move it
into studio-frontend-ci.yml as a pre-install step (right after
checkout, before any node setup so a violation fails fast). One fewer
top-level check row on every PR, same coverage.
Add a FIXME so this step is dropped once @assistant-ui/* and
assistant-stream leave 0.x: on 1.x, caret ranges are conventional and
this becomes overzealous.
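The workflow step itself is a grep over package.json; a standalone Python equivalent of the rule it enforces might look like the sketch below (the file path, output format, and helper name are illustrative).

```python
import json
import re
import sys

def check_pins(package_json="studio/frontend/package.json") -> int:
    """Fail when any @assistant-ui/* or assistant-stream dependency uses a caret/tilde range."""
    with open(package_json) as f:
        pkg = json.load(f)
    bad = []
    for section in ("dependencies", "devDependencies"):
        for name, spec in pkg.get(section, {}).items():
            pinned_surface = name.startswith("@assistant-ui/") or name == "assistant-stream"
            if pinned_surface and re.match(r"^[\^~]", spec):
                bad.append(f"{name}: {spec}")
    if bad:
        print("Unpinned @assistant-ui surface packages:\n  " + "\n  ".join(bad))
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(check_pins())
```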
* CI: add Repo tests (CPU) job, mirroring unsloth-zoo PR #624 conftest
The top-level tests/ tree was previously not run anywhere. 23 of its
files are CPU-friendly with the right harness: pure-Python helpers,
ast walks, installer logic, and CLI shape tests. Locally validated:
302 passed, 9 skipped, 12 deselected in ~7 seconds on Python 3.12.
Three pieces:
1. tests/conftest.py -- GPU-free harness, mirrors the conftest landed
in unslothai/unsloth-zoo PR #624. Pre-loads unsloth_zoo.device_type
and unsloth.device_type under a temporarily-mocked
torch.cuda.is_available() so each module's @cache permanently
captures "cuda" and the import chain succeeds on a CPU runner.
Also stubs torch.cuda.get_device_capability /
is_bf16_supported / mem_get_info, which unsloth/__init__.py and
unsloth_zoo.temporary_patches probe at import time when
DEVICE_TYPE == "cuda". On a real accelerator the harness is
skipped and detection runs normally.
2. Two existing tests were leaking sys.modules state across the
session because they injected stubs without an __spec__ and
without restoration:
- tests/test_raw_text.py shoved a "datasets" stub into
sys.modules. transformers' import_utils later did
importlib.util.find_spec("datasets") and got
ValueError: datasets.__spec__ is None.
- tests/python/test_fast_sentence_transformer_redirect_lifecycle.py
shoved "transformers", "sentence_transformers", and
"sentence_transformers.models" stubs in. Subsequent tests
that did `import transformers` got the non-package stub.
Fix: set __spec__ on stubs, plus an autouse fixture in the
sentence-transformer test file that restores the three keys
after each test.
3. .github/workflows/studio-backend-ci.yml gains a third job,
`Repo tests (CPU)`, that installs the same dep set as the
backend-pytest matrix (Python 3.12 only -- the tests are
version-independent), exports PYTHONPATH=studio so tests/python/*
can import install_python_stack, and runs the 23-file subset
above with `-m 'not server and not e2e'`.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* CI: install unsloth_zoo for Repo CPU tests, harden conftest fallback
The CPU job at run 25422050018 broke at conftest collection: the
preload of unsloth.device_type pulled in `from unsloth_zoo.utils import
Version` and ubuntu-latest didn't have unsloth_zoo on the path because
it is an optional dep of unsloth. Two fixes:
1. Install unsloth_zoo>=2026.5.1 alongside the other deps in the Repo
tests (CPU) job (it's also what unsloth's optional `huggingface`
extra pins).
2. Wrap the body of _preload_device_type in conftest.py in a try/except
so any import failure (missing prereq, broken module, etc.) cleanly
returns False instead of aborting the entire collection. The caller
already falls back to the stub device_type module on False, so the
net behavior is "best effort: real device_type if possible, stub
otherwise" instead of "abort the test session".
* kernels.utils: guard CUDA_STREAMS / XPU_STREAMS init for DEVICE_COUNT==0
When DEVICE_COUNT is 0 (CPU host: no visible NVIDIA / AMD / Intel GPU)
the dict comprehension {... for i in range(0)} was empty and the
subsequent max(_CUDA_STREAMS.keys()) raised
ValueError: max() iterable argument is empty
during module import. That made unsloth.kernels.utils unimportable on
any CPU runner, which in turn blocked all of tests/saving/**, three
top-level tests/test_*.py, and tests/qlora/test_unsloth_qlora_train_and_merge.py
from even collecting on CPU CI.
Wrap the per-device-index dict comprehension and max() machinery in
a DEVICE_COUNT > 0 guard. When DEVICE_COUNT is 0 fall back to empty
containers (CUDA_STREAMS = (), WEIGHT_BUFFERS = [], ABSMAX_BUFFERS = []).
The consumer functions further down in this module index these arrays
by device_index but only during real GPU work, so the empty fallbacks
never get touched on a CPU host.
GPU-safety verified locally: with 8 visible CUDA devices, CUDA_STREAMS
has 8 entries (identical to before this PR). With CUDA_VISIBLE_DEVICES=""
the module imports cleanly, CUDA_STREAMS is (), and the previously
blocked tests now collect (test_get_model_name passes 38 subtests,
test_resolve_model_class passes 9, test_model_registry collects all 8
parametrizations).
Same shape applied to the DEVICE_TYPE == "xpu" branch for symmetry.
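A sketch of the guard shape, not the actual kernels/utils code; the container names and the empty fallbacks come from the message, the construction details are assumptions.

```python
import torch

DEVICE_COUNT = torch.cuda.device_count()

if DEVICE_COUNT > 0:
    # One stream per visible device; max()/indexing are only safe when the
    # comprehension is non-empty.
    _streams = {i: torch.cuda.Stream(device=i) for i in range(DEVICE_COUNT)}
    CUDA_STREAMS = tuple(_streams[i] for i in range(max(_streams.keys()) + 1))
    WEIGHT_BUFFERS = [None] * DEVICE_COUNT
    ABSMAX_BUFFERS = [None] * DEVICE_COUNT
else:
    # CPU-only host (DEVICE_COUNT == 0): fall back to empty containers. The
    # consumers index these only during real GPU work, so they are never
    # touched here.
    CUDA_STREAMS = ()
    WEIGHT_BUFFERS = []
    ABSMAX_BUFFERS = []
```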
* CI: switch Repo tests (CPU) to auto-discovery + isolate flakes
Three changes, locally validated end-to-end (779 passed, 11 skipped,
23 deselected, 0 failed across all three steps):
1. Repo tests (CPU, auto-discovered): replace the explicit 23-file
list with `pytest tests/` plus a small set of `--ignore` and
`--deselect` flags. New tests under tests/python, tests/studio
(excluding the two state-sensitive files), and top-level
tests/test_*.py are picked up automatically with no workflow edit.
--ignore covers:
- tests/qlora and tests/saving: GPU-bound by design
- tests/utils: helpers folder, not tests
- tests/sh: shell suite handled in its own step
- two state-polluting hardware-spoof files (next step)
-m 'not server and not e2e': honours markers already declared
in tests/python/conftest.py
--deselect: test_model_registration / test_all_model_registration
hit huggingface_hub live; they belong on a network job
2. Hardware-spoof tests (state-sensitive, run in isolation):
tests/studio/test_hardware_dispatch_matrix.py and
tests/studio/test_is_mlx_dispatch_gate.py mutate module globals
in studio.backend.utils.hardware.hardware (IS_ROCM, DEVICE) via
their spoof fixtures, and the leak crosses file boundaries.
Running them in their own pytest invocation avoids polluting the
main sweep. Both pass cleanly in isolation: 28 passed, 1 skipped.
3. Shell installer tests: explicitly enumerated subset that does not
depend on install.ps1 layout (test_install_host_defaults.sh has
drifted; that's a separate followup).
Test fixes folded in to keep the run green:
- tests/studio/install/test_rocm_support.py::TestAmdGpuMonitoring
::test_amd_primary_gpu_with_mock now clears
HIP/ROCR/CUDA_VISIBLE_DEVICES via monkeypatch so
_first_visible_amd_gpu_id() does not short-circuit when the runner
sets CUDA_VISIBLE_DEVICES="" to suppress CUDA.
- tests/studio/test_hardware_dispatch_matrix.py::spoof_hardware
fixture now stubs torch.cuda.get_device_properties when
cuda_available is True so detect_hardware()'s device_name probe
does not call into _cuda_init() on a CPU runner.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* CI: install torchvision (CPU) so unsloth_zoo.vision_utils can import
Run 25430652224 collected three test modules that import unsloth and
crashed at unsloth_zoo/vision_utils.py:68 with
ModuleNotFoundError: No module named 'torchvision'
unsloth_zoo.vision_utils unconditionally imports torchvision at module
scope, and unsloth.models._utils pulls vision_utils in. The Repo tests
(CPU) job installed torch from the CPU index but not torchvision, so
any test that imports unsloth.models.* failed at collection.
Add torchvision<0.26 to the same pip install --index-url
https://download.pytorch.org/whl/cpu line.
* CI: install bitsandbytes (CPU build) for unsloth.models._utils import
Run 25430982243 collected three test modules that import unsloth and
crashed at unsloth/models/_utils.py:1166 with
ModuleNotFoundError: No module named 'bitsandbytes'
The bnb import there is unconditional. Recent bnb versions (>=0.45)
ship a CPU build so the wheel installs on a free Linux runner and the
import resolves; the kernels still raise on use but the module
collects, which is enough for these CPU tests.
Add 'bitsandbytes>=0.45' to the Repo tests (CPU) deps.
* CI: rename workflows + guard kernels.utils CPU-torch binding
Workflow renames (top-level `name:` keys; affects PR check rows):
Studio backend CI -> Backend CI
Studio frontend CI -> Frontend CI
Studio inference smoke -> Studio GGUF CI
Studio Tauri smoke -> Studio Tauri CI
Wheel build + smoke -> Wheel CI
Backend CI's matrix job goes from "Backend pytest (Python 3.10)" to
just "(Python 3.10)" so the GitHub UI row reads
"Backend CI / (Python 3.10)" rather than the old verbose form.
Production guard for CPU torch (run 25431126138):
unsloth/kernels/utils.py:165 was an unconditional
_gpu_getCurrentRawStream = torch._C._cuda_getCurrentRawStream
which raised AttributeError on a CPU-only torch wheel because the
compiled CUDA backend is absent. Three test modules (test_get_model_name,
test_model_registry, test_resolve_model_class) crashed at collection
because their import chain reaches this line.
Add a hasattr probe: when torch is built without CUDA, fall through to
a no-op binding that returns 0. _get_tensor_stream is only invoked
during real GPU work, so the no-op is never executed on a CPU host.
GPU-safety verified locally: with 8 visible CUDA devices the binding
still resolves to the real torch._C._cuda_getCurrentRawStream
(behaviour identical to before this PR). The XPU branch is untouched.
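The probe itself is small; a sketch under the assumption that only the binding changes and the GPU-only callers stay untouched:

```python
import torch

if hasattr(torch._C, "_cuda_getCurrentRawStream"):
    # CUDA-built torch: bind the real raw-stream getter, exactly as before.
    _gpu_getCurrentRawStream = torch._C._cuda_getCurrentRawStream
else:
    # CPU-only wheel: the compiled CUDA backend is absent. Fall through to a
    # no-op that returns 0; _get_tensor_stream only runs during real GPU work,
    # so this branch is never exercised on a CPU host.
    def _gpu_getCurrentRawStream(device_index):
        return 0
```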
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

d65149795b
feat(studio): MLX training tab on Apple Silicon (LoRA / full FT, VLM, export) (#5265)
* Add Apple Silicon MLX routing
- Rewrite __init__.py: detect MLX on macOS arm64 before any torch imports
- Extract original GPU init to _gpu_init.py (unchanged)
- MLX path imports FastMLXModel from unsloth_zoo, skips all GPU code
- GPU path unchanged: from ._gpu_init import *
* mlx with studio
* updating temporary install.sh
* adding t_v5 path
* fixing vision training
* adding chat
* minor
* Adding export and fixing training issues, inference with lora adaptors
* fix: MLX worker pass load_in_4bit, override is_vlm based on dataset, streaming for VLM
* Merge mlx-apple-silicon into main
* update install.sh to point to main branch
* fix: export returns 3 values (success, message, output_path) matching upstream worker
* fix(mlx): show training-process peak memory in Studio UI, not system-wide
Studio UI was showing ~95 GB during MLX training because get_gpu_utilization
read "In use system memory" from IORegistry's AGXAccelerator — system-wide
GPU memory across all processes (training + backend + browser + Display).
Now the trainer's mx.get_peak_memory() value is forwarded through the
progress event and surfaced via /api/train/hardware while training is
active. Falls back to the system-wide reading when training is not running.
* fix(mlx): make is_bfloat16_supported() detect M1/M2 (no native bf16)
M1 and M2 chips emulate bf16 in software on the GPU, causing 40-70%
slower prefill compared to native fp16. M3+ have native bf16 (macOS
Sonoma+ MPSGraph). Replaces the always-True stub with chip-aware
detection via mx.device_info().
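A hedged sketch of chip-aware detection: mx.device_info is the source the message names, but the returned key name and the chip-string parsing below are assumptions, not the shipped implementation.

```python
import platform

def is_bfloat16_supported() -> bool:
    """Sketch: M1/M2 emulate bf16 on the GPU, M3 and later have native bf16."""
    if platform.system() != "Darwin" or platform.machine() != "arm64":
        return False
    try:
        import mlx.core as mx
        chip = str(mx.device_info().get("device_name", ""))  # key name assumed
    except Exception:
        return False
    return not any(f"Apple {gen}" in chip for gen in ("M1", "M2"))
```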
* feat(mlx): wire training_type="Full Finetuning" through MLX worker
Compute use_lora from the UI's training_type before loading the model,
pass full_finetuning=not use_lora to FastMLXModel.from_pretrained, and
let the existing 'if use_lora' branch skip get_peft_model. Matches the
GPU worker's flow.
* fix(mlx): pass save_method='merged_16bit' from Studio's export page
Previously the MLX path called save_pretrained_merged() with no
save_method, which fell through to a no-op that didn't actually fuse
LoRA into the base. Now Studio's "Merged Model" export properly
fuses LoRA + dequantizes any 4-bit base to bf16, matching the GPU
behavior for the same UI option.
* fix(studio): pass private to MLX push, return 3-tuples consistently
- MLX push_to_hub branch now forwards private=private (matches GPU)
- Existing 2-tuple early-returns ('repo_id+token required', 'PEFT model
needed') were tripping the route's 3-tuple unpack. Added a None
output_path so the unpack always succeeds.
* studio wirings
* Merge pull request #5 from Manan17/feat/quant_config
studio wirings
* fix(mlx): wire train_on_completions for VLM via per-template lookup
Mirror the GPU worker: stop excluding VLMs and stop hardcoding
template detection. Look up the model in MODEL_TO_TEMPLATE_MAPPER and
fetch the per-template instruction/response markers from
TEMPLATE_TO_RESPONSES_MAPPER. The frontend already force-disables
train_on_completions for vision+image and audio cases, so backend
just trusts the flag.
* wire in lora rslora, init lora weights, random_state
* loftq studio error message fix
* handle unknown optim and lr scheduler
* Merge pull request #6 from Manan17/update/peftkwargs
Update/peftkwargs
* feat(mlx): pass finetune_language/attention/mlp/vision flags to FastMLXModel
Studio's four UI checkboxes now actually flow through to MLX get_peft_model
(which was just updated in unsloth-zoo to honor them). Also drops the
incorrect train_projector wiring that tied projector LoRA to the
attn/mlp flags — those are language-side toggles, not projector toggles.
Co-Authored-By: Manan17 <shahmanan170602@gmail.com>
* feat(mlx,ux): auto-imply finetune_language_layers when user picks attn/mlp
UI guardrail. The four checkboxes (vision/language/attention/MLP) carry
"scope × module-type" semantics that aren't obvious — picking just
"Attention modules" + "MLP modules" without "Language layers" naturally
reads as "fine-tune attn/mlp" but our backend reads it as "fine-tune
attn/mlp modules in *no* tower" → empty target_modules → zero
trainable params → crash inside value_and_grad.
If user selected attn or mlp module types but no layer scope, default
to language scope. Power users can still explicitly choose
language=False, vision=True if they want vision-only fine-tuning of
attn/mlp.
Co-Authored-By: Manan17 <shahmanan170602@gmail.com>
* fix(mlx): wire top_k, repetition_penalty, and VLM top_p through to mlx-lm/mlx-vlm
Inference UI sliders for top_k and repetition_penalty had no effect on
MLX, and VLM top_p was also silently dropped. Plus a latent pre-existing
bug: mlx_vlm.generate_step expects temperature= (long form), but we
were passing temp= which silently fell into **kwargs — every VLM chat
was effectively greedy regardless of the temperature slider.
Text path (_generate_text):
- make_sampler now receives top_k in addition to temp/top_p
- make_logits_processors built and forwarded when repetition_penalty is
non-trivial (skip when 0.0/1.0 to avoid pointless overhead)
VLM path (_generate_vlm):
- Pass top_p, top_k, repetition_penalty as kwargs (mlx_vlm.stream_generate
forwards them to generate_step's sampler/logits_processor builders)
- Rename temp= → temperature= so it's actually consumed
Verified end-to-end with a smoke test on Qwen2.5-0.5B-Instruct (text) and
Qwen2.5-VL-3B-Instruct (VLM): each of {greedy, top_p=0.5, top_k=10,
rep_pen=1.5} now produces a distinct output, proving the parameters
reach the sampler.
Co-Authored-By: Manan17 <shahmanan170602@gmail.com>
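A sketch of the text-path wiring: make_sampler and make_logits_processors are the mlx-lm helpers the message names, while the function shape and the treatment of the no-op penalty values are illustrative.

```python
from mlx_lm.sample_utils import make_logits_processors, make_sampler

def _build_sampling(temp, top_p, top_k, repetition_penalty):
    # Sampler now sees top_k alongside temp/top_p.
    sampler = make_sampler(temp=temp, top_p=top_p, top_k=top_k)
    # Only build the processor when the penalty actually does something
    # (0.0 / 1.0 are treated as "off" to avoid pointless overhead).
    logits_processors = None
    if repetition_penalty and repetition_penalty != 1.0:
        logits_processors = make_logits_processors(repetition_penalty=repetition_penalty)
    return sampler, logits_processors
```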
* feat(mlx): map format_type to MLX save_method, reuse local save dir for hub push
- export_merged_model: format_type="4-bit (FP4)" → save_method="merged_4bit"
(was hardcoded merged_16bit, ignoring the UI choice).
- Both export_merged_model and export_base_model now pass save_directory=
to push_to_hub_merged so it reuses the just-written local folder
instead of re-saving under a relative "username/model" directory.
Co-Authored-By: Manan17 <shahmanan170602@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* restore install
* fix(mlx): restore FastVisionModel as a distinct class
unsloth/__init__.py was assigning `FastVisionModel = FastLanguageModel`
right after defining `class FastVisionModel(FastLanguageModel)` with a
`for_training` static method. The alias erased the class binding, so
the documented `FastVisionModel.for_training(model)` call from upstream
Unsloth's VLM notebooks raised `AttributeError` on MLX.
Remove the offending alias. `FastVisionModel` is now a real subclass of
`FastLanguageModel` again — inherits `from_pretrained` /
`get_peft_model` / `for_inference`, exposes `for_training` as a no-op
pass-through (no-op because MLX doesn't have a train/eval mode flag;
the call exists purely for GPU/MLX notebook parity).
Verified end-to-end: Qwen3-VL-2B + LaTeX_OCR LoRA + vision LoRA via
FastVisionModel.from_pretrained → get_peft_model → for_training →
MLXTrainer.train() runs 10 steps cleanly (loss 1.10 → 0.12, no NaNs,
peak 5.89 GB).
Studio's path (FastLanguageModel.from_pretrained for any repo,
auto-detect VLM in the loader) is unaffected. Tier-1 review finding #8.
* Studio: harden MLX training and export, restore GPU init guards
Studio export
Restore Tuple[bool, str, Optional[str]] contract on export_merged_model,
export_base_model, export_gguf, and export_lora_adapter, populating
output_path on successful local saves so routes/worker/CLI/frontend
details.output_path is non-empty again.
Lift the GPU save_method assignment out of the local-save branch so
Hub-only merged exports (save_directory='', push_to_hub=True) no longer
hit UnboundLocalError on the push branch.
For MLX merged and base hub-only export, stage to a tempfile.TemporaryDirectory
before push_to_hub_merged instead of passing save_directory=''.
Source _IS_MLX from unsloth instead of recomputing the platform check
(single source of truth, also enforces mlx-package availability).
Studio MLX training/inference
Pass token=hf_token into FastMLXModel.from_pretrained for gated/private
models, matching the inference path.
Strip hf_token and wandb_token from wandb.init(config=...) so secrets
do not leak into the W&B run config.
Replace load_from_disk(local_datasets[0]) with the existing
UnslothTrainer._resolve_local_files / _loader_for_files helpers so
uploaded JSON/JSONL/CSV/Parquet files train through the normal datasets
loader (load_from_disk still used for HF save_to_disk directories).
Make the dataset slice helper inclusive at the end and treat 0 as a real
index instead of "unset", matching the GPU and embedding paths.
Add a status_message -> message alias inside _send so the existing parent
pump (training.py) renders MLX status updates instead of blanks.
Forward min_p through generate_chat_response into _generate_text /
_generate_vlm and into make_sampler / vlm_kwargs so the sampling control
is no longer a no-op on MLX.
Wrap unsloth_zoo.mlx_loader / mlx_trainer imports with a clearer
ImportError pointing users at install.sh for Apple Silicon.
Exit the MLX stop-polling thread on EOFError/OSError instead of
busy-looping when the queue/pipe is permanently closed (one-line
why-safe rationale inline).
Studio frontend
ParamsSection subscribes to platform deviceType via the Zustand hook so
the gradient checkpointing dropdown re-renders after the async device
fetch completes.
Studio hardware
get_gpu_utilization MLX branch now reads _read_apple_gpu_stats once and
derives VRAM totals from psutil, removing the second ioreg subprocess
per utilization poll.
Unsloth core
Restore the os.geteuid == 0 guard around the CUDA ldconfig recovery
that was lost when GPU initialization moved into _gpu_init.py, plus the
non-root manual-fix warning branch. Non-root CUDA users no longer shell
out to ldconfig at import time.
Load dataprep/raw_text via importlib so the MLX import path no longer
pulls torch in through dataprep/__init__.py -> synthetic.py.
FastVisionModel.from_pretrained overrides the inherited delegator only
to inject text_only=False; this is an extension, not a duplication, and
is needed so VLM checkpoint loads keep the vision tower.
Wrap the MLX-branch unsloth_zoo import with a clearer ImportError.
* Studio: regression tests for MLX training/export and GPU init ldconfig guard
tests/python/test_gpu_init_ldconfig_guard.py asserts the geteuid root
check still wraps the ldconfig recovery and the non-root branch warns
bnb users; AST + source-text inspection so the test runs without torch.
tests/studio/test_export_output_path_contract.py covers the
Tuple[bool, str, Optional[str]] return contract on every export method,
the output_path assignment after successful local save, the Hub-only
GPU save_method binding fix, the MLX hub-only TemporaryDirectory
staging, and the single-source `_IS_MLX` import from unsloth.
tests/studio/test_mlx_training_worker_behaviors.py covers token
forwarding to FastMLXModel.from_pretrained, wandb config secret
stripping, file-aware local dataset loading, status_message ->
message aliasing, inclusive slice semantics, EOFError/OSError stop
thread exit, and the friendly mlx_loader / mlx_trainer ImportError.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix(mlx): cap inference memory + release wired on unload + tame worker pre-pin
Three memory-hardening fixes for Studio's MLX path:
1. Inference applies the same Metal caps as the trainer.
load_model previously only called set_wired_limit(100% of recommended)
with no upper memory_limit, leaving large VLM checkpoints unbounded
during the loader allocation. Add _configure_memory_limits() that sets
memory_limit to 85% of recommended and wired_limit to min(recommended,
memory_limit) — matching MLXTrainer's defaults so behavior is the same
whether the user trains or just runs inference.
2. unload_model releases pinned memory back to the OS — but only when
the cache is empty. Without this, pinned wired bytes stayed allocated
to MLX after the model was gone, starving other apps. The release is
guarded on `not self.models` so unloading one of several cached
models doesn't un-pin weights still in use.
3. Worker pre-cap is conservative instead of aggressive.
The previous pre-pin set_wired_limit(100% of recommended) competed
with MLXTrainer's later more conservative cap. Replace with the same
85%-memory / min(rec, memory) pair that the trainer applies later
(idempotent re-apply). Bounds the model load + LoRA setup window
without over-pinning.
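A sketch of the shared cap described in item 1: the 0.85 factor and the min(recommended, memory_limit) wired cap come from the message, while the exact MLX API locations (which have moved between releases) and the device_info key name are assumptions.

```python
import mlx.core as mx

def _configure_memory_limits():
    # Recommended working-set size for this GPU (key name assumed).
    recommended = mx.metal.device_info()["max_recommended_working_set_size"]
    memory_limit = int(recommended * 0.85)
    mx.set_memory_limit(memory_limit)                   # cap total Metal allocation
    mx.set_wired_limit(min(recommended, memory_limit))  # never wire more than the cap
```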
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* tests/studio: regression tests for the _IS_MLX dispatch gate
Two gates drive every MLX-vs-CUDA dispatch decision in Studio:
1. unsloth._IS_MLX in unsloth/__init__.py — evaluated once at import
time, read by Studio worker code to choose the GPU vs MLX trainer
and inference paths. Defined as
Darwin AND arm64 AND find_spec("mlx") is not None.
2. utils.hardware.detect_hardware() — runtime probe with priority
CUDA > XPU > MLX > CPU. The MLX branch is reached only when both
CUDA and XPU are unavailable and the host is Apple Silicon and
mlx is importable.
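The gate expression itself, as described above (a one-time evaluation at import):

```python
import platform
from importlib.util import find_spec

_IS_MLX = (
    platform.system() == "Darwin"
    and platform.machine() == "arm64"
    and find_spec("mlx") is not None
)
```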
Neither gate had a direct test. Adds tests/studio/test_is_mlx_dispatch_gate.py
with six tests:
test_is_mlx_gate_uses_three_required_predicates
AST-walks unsloth/__init__.py and asserts the _IS_MLX assignment
is a BoolOp(And) of platform.system()=="Darwin",
platform.machine()=="arm64", and find_spec("mlx") is not None.
Catches accidental rewrites that drop a predicate.
test_is_mlx_gate_true_on_apple_silicon_with_mlx_present
Spoofs platform to Darwin/arm64, injects a fake mlx module so
find_spec returns a real ModuleSpec, re-evaluates the gate
expression. Verifies it flips True under the exact conditions
Studio expects.
test_is_mlx_gate_false_when_mlx_missing
Spoofs Apple Silicon but with mlx absent. Verifies the gate stays
False (so a Mac without mlx installed does not pretend to have
MLX support).
test_is_mlx_gate_false_on_non_apple_silicon
Canary on the actual Linux+CUDA / AMD / Intel test host: the gate
must remain False regardless of whether mlx happens to be
importable. Protects existing GPU users from accidental MLX
hijack when MLX support evolves.
test_detect_hardware_picks_mlx_when_only_apple_silicon_available
Forces torch.cuda and torch.xpu off, spoofs Apple Silicon, injects
fake mlx and mlx.core. detect_hardware() must return DeviceType.MLX.
test_detect_hardware_picks_cuda_on_real_host
Canary: on a real CUDA host detect_hardware() must return
DeviceType.CUDA. Protects against the MLX branch shadowing CUDA
dispatch on NVIDIA / AMD ROCm hosts.
Uses the same monkeypatch.setitem(sys.modules, ...) fake-mlx pattern as
the existing test_mlx_inference_backend.py — no new test infrastructure,
no real mlx install required.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add AGPL-3.0 SPDX header to Studio MLX regression tests
Four Studio MLX test files shipped without an SPDX-License-Identifier:
studio/backend/tests/test_mlx_training_worker_config.py
tests/studio/test_mlx_training_worker_behaviors.py
tests/studio/test_export_output_path_contract.py
tests/studio/test_is_mlx_dispatch_gate.py
They sit in or alongside studio/backend/, which is governed by
studio/LICENSE.AGPL-3.0, and exercise AGPL Studio code. Add the same
"# SPDX-License-Identifier: AGPL-3.0-only" header that's already on
test_mlx_inference_backend.py so the license declaration matches
the code under test rather than defaulting to the repo-root
Apache-2.0.
* Wrap MLX submodule imports with friendly install hint
The _IS_MLX block at the top of unsloth/__init__.py already catches the
missing-package case with a friendly install hint, but the follow-up
"from unsloth_zoo.mlx_trainer import ..." and "from unsloth_zoo.mlx_loader import ..."
lines run unguarded. An Apple Silicon user who has unsloth-zoo installed
but on an older version (e.g. the current PyPI release, before the MLX
modules ship) sees a raw ImportError on the submodule rather than the
hint that points at install.sh.
Wrap the two submodule imports in the same try/except shape so the
friendly install message fires whether the package is missing entirely
or just predates the MLX submodules. No-op once both packages release
together; smooths the transitional window where unsloth/main has merged
but unsloth-zoo on PyPI has not.
---------
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>

680d43a488
Fix FastSentenceTransformer loading with newer sentence-transformers (#5259)
* Fix FastSentenceTransformer compatibility with sentence-transformers 5.4
* Support varied Transformer init signatures
Detect Transformer.__init__ parameters and build init kwargs accordingly so trust_remote_code and other args are passed using the correct names. Instead of unconditionally using model_args/config_args, the code now inspects the constructor to decide between model_kwargs/config_kwargs vs model_args/config_args and also sets processor_kwargs or tokenizer_args when present. Initializes Transformer with constructed transformer_kwargs (including max_seq_length) to improve compatibility with different Transformer implementations.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Harden SentenceTransformer path and module checks
* Scrub .github/workflows for staging push (matches staging base)
* Guard auto_model write in FastSentenceTransformer._apply_torch_compile
On sentence-transformers >=5.4 Transformer.auto_model is a read-only @property backed by self.model, so a direct assignment raises AttributeError. The two get_peft_model paths already guard the write with isinstance(getattr(type(...), "auto_model", None), property); the auto-compile path missed the same guard, which broke the default trainer path whenever max_steps >= _compile_threshold.
* Add tests for FastSentenceTransformer property guards
* Tighten FastSentenceTransformer redirect lifecycle tests
Drop a duplicate assertion-less case, remove dead AST extraction helper, and trim unused imports. The remaining six tests cover substitution on match, restoration on constructor exception, passthrough for unrelated names, pathlib.Path normalisation, trailing slash handling, and the no-identifier guard.
* Sync .github/workflows with upstream author branch
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Avoid sharing trust_remote_code kwargs dict across constructor buckets
In FastSentenceTransformer._create_transformer_module, the same trust_remote_code_kwargs dict was being assigned to model_kwargs, config_kwargs, and processor_kwargs (or model_args / config_args / tokenizer_args) on the Transformer constructor. transformers' from_pretrained code paths (configuration_utils, auto_factory, processing_auto, etc.) call kwargs.pop("trust_remote_code", ...) on the dict they receive, which would drain the shared object and silently strip trust_remote_code from the other buckets. Pass an independent copy to each bucket so subsequent buckets and any pass-through auxiliary loads still see trust_remote_code.
* Wire do_lower_case and return_dict through Transformer init for ST 5.4
In FastSentenceTransformer._create_transformer_module:
- When Transformer.__init__ accepts do_lower_case (ST 5.4+), pass the unsloth tokenizer's do_lower_case as a constructor kwarg. The existing post-init attribute assignment alone is too late: ST 5.4's __init__ uses do_lower_case to install a Lowercase normalizer on tokenizer.backend_tokenizer.normalizer, which is not re-applied if we only set the attribute after construction. The post-init line is preserved untouched for older ST versions.
- Add return_dict to the manually completed model_forward_params set so wrapped models with forward(*args, **kwargs) signatures keep ST's forced dict-like output safety net. ST 5.4's own __init__ unions the forward signature with the same set plus return_dict; the previous override silently dropped it.
* Preserve flash-attention forward keys when wrapping ST 5.4 Transformer
Sentence-transformers 5.4's Transformer.__init__ calls _can_flatten_inputs() during construction, which augments self.model_forward_params with cu_seq_lens_q, cu_seq_lens_k, max_length_q, max_length_k, seq_idx whenever feature-extraction with text modality, the torch backend, flash-attention 2, and varlen flash-attn support are all available. The post-init override of transformer_module.model_forward_params used to replace the attribute outright, silently dropping those keys so ST's preprocess() filter stripped flash-attn kwargs before reaching model.forward. Snapshot the constructor-populated set first, leave the existing overwrite intact for the forward-signature plus tokenizer keys, and union the snapshot back in so flash-attn forwarding keeps working on ST 5.4. For older sentence-transformers releases the attribute is absent and getattr returns an empty set, leaving behavior unchanged.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
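A condensed sketch of two of the behaviors above (constructor inspection plus per-bucket copies of the trust_remote_code dict); Transformer is the sentence-transformers module class, while the helper name and return shape are illustrative.

```python
import inspect
from sentence_transformers.models import Transformer

def _build_transformer_kwargs(max_seq_length, trust_remote_code):
    """Pick the kwarg buckets this sentence-transformers version accepts.

    Each bucket gets its own copy of the trust_remote_code dict, because
    transformers' from_pretrained paths pop("trust_remote_code", ...) from
    whatever dict they receive and would drain a shared object.
    """
    params = inspect.signature(Transformer.__init__).parameters
    trc = {"trust_remote_code": trust_remote_code}
    kwargs = {"max_seq_length": max_seq_length}
    if "model_kwargs" in params:
        # Newer naming (model_kwargs / config_kwargs, optionally processor_kwargs).
        kwargs["model_kwargs"] = dict(trc)
        kwargs["config_kwargs"] = dict(trc)
        if "processor_kwargs" in params:
            kwargs["processor_kwargs"] = dict(trc)
    else:
        # Older naming (model_args / config_args, optionally tokenizer_args).
        kwargs["model_args"] = dict(trc)
        kwargs["config_args"] = dict(trc)
        if "tokenizer_args" in params:
            kwargs["tokenizer_args"] = dict(trc)
    return kwargs
```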

0da8af56d6
unsloth run: add --enable-tools/--disable-tools server-side tool policy (#5277)
* Add process-level tool_policy state for unsloth run
* Apply tool_policy override at chat/completions, /messages, and tool pass-through gates
* Add pure resolver for unsloth run --enable-tools/--disable-tools
* Wire --enable-tools/--disable-tools into unsloth run
* Color tool-policy notices and confirmation prompt in Claude orange
* Always show tool-status notice; print URL + API key in silent mode
* Treat any non-loopback bind as external; forward --yes after parent prompt
* Fix tool_policy double-module bug: import via state.tool_policy to share global with routes

4f9c8321a2
Fix DPO trainer multi process hang (#5199)
* Fix DPO trainer multi process hang
* Fix datacollator error
* further dpo vision changes
* cleanup
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Harden DPO vision row processing and source rewrites
- dpo_trainer_vision_signature_columns: also match TRL 0.22.x layout (image_sizes followed by ref_chosen_logps), so vision keys are not stripped via remove_unused_columns on the originally-affected version.
- dpo_trainer_concatenated_inputs: fall back to inserting after the image_sizes block when no token_type_ids anchor follows it.
- Apply the same vision model_kwargs forwarding rewrite to _compute_loss_liger via dpo_trainer_compute_loss_liger so the Liger DPO path does not drop pixel_position_ids/image_position_ids/mm_token_type_ids when args.use_liger_loss is true.
- dpo_trainer_vision_process_row:
  - guard chosen/rejected EOS append with tokenizer.eos_token_id is not None
  - use features.get("images") and features.get("prompt") to match the existing get on line 164 and avoid KeyError on rows without those keys
  - drop the torch.is_tensor gate so list-form pixel_position_ids/image_position_ids returned without return_tensors are still aliased
  - skip the loop entry for image_position_ids when it was already promoted to pixel_position_ids, so the output dict no longer carries both keys with identical data
- dpo_trainer_data_collator_vision_keys: switch from pad_sequence to trl.trainer.utils.pad with padding_side='left' (matches the DPO collator's prompt left-pad) and padding_value=-1 for *_position_ids keys (sentinel for padded patches), 0 otherwise. Skip the key when not every example carries it. Falls back to pad_sequence if trl.pad is unavailable or the tensor rank is too high.
- dpo_trainer_prepare_dataset: keep TRL's writer_batch_size=10 when popping num_proc; removing it defaults to 1000 and reintroduces the vision OOM risk that writer_batch_size=10 was set to avoid.
* DPO vision row: keep upstream-facing keys and fix patch padding
- dpo_trainer_vision_process_row: no longer aliases image_position_ids to pixel_position_ids. Each upstream-emitted vision key is forwarded under its own name. Gemma4 ForConditionalGeneration.forward accepts image_position_ids directly and renames it to pixel_position_ids only at the vision-tower call site, so aliasing in the row helper hid the kwarg the model actually consumes.
- dpo_trainer_vision_process_row: extract pixel_values via "in" membership instead of unconditional indexing. With the missing-images path returning [] to the processor, modern processors no longer emit a pixel_values key, and the previous indexing raised KeyError.
- dpo_trainer_data_collator_vision_keys: pick padding_side per key family. *_position_ids tensors are patch-aligned to pixel_values (TRL's DataCollatorForPreference right-pads pixel_values), so pad them right with the -1 sentinel; mm_token_type_ids is token-aligned to prompt_input_ids (left-padded by TRL), so pad it left with 0.
* DPO vision: handle multi-image prompts and arbitrary-rank collator pad
- dpo_trainer_vision_process_row: when a prompt is missing vision placeholders, insert one placeholder per missing image instead of always inserting a single token. Multi-image rows now satisfy the processor's token-vs-image count check rather than under-inserting and tripping the placeholder/feature mismatch.
- dpo_trainer_data_collator_vision_keys: drop the dim()<=2 gate around trl.trainer.utils.pad. trl.pad handles arbitrary rank correctly, while the previous fallback to torch.nn.utils.rnn.pad_sequence raised RuntimeError on rank-3 patch-position tensors with mismatched non-leading dimensions. The pad_sequence path remains as a degraded fallback only when trl.pad is unavailable or raises.
* DPO vision row: support scalar images and align prompt-aligned aux ids
- dpo_trainer_vision_process_row: type-aware normalization of the features['images'] column instead of a truthiness/len check that raised on single image objects (PIL.Image has no __len__) and on numpy ndarrays (truthiness ambiguous). Lists/tuples count as their length, scalar image objects count as one, None counts as zero, and the original value is forwarded to the processor.
- dpo_trainer_vision_process_row: when max_prompt_length truncates prompt_input_ids, also slice token_type_ids and mm_token_type_ids by the same [-max_prompt_length:] suffix. Those keys are 1:1 token aligned to prompt_input_ids (Gemma 4 vision attention keys off mm_token_type_ids per modular_gemma4.py), so leaving them at the original length silently misaligned the multimodal mask.
* DPO vision row: stop synthesizing vision-token placeholders
Pass features['prompt'] and features['images'] straight to the processor without inserting any extra placeholder tokens. The previous helper used processing_class.image_token, which is the right prompt placeholder for Gemma 4 but the wrong one for Gemma 3 (whose prompt placeholder is boi_token while image_token is the inner expansion target). Synthesizing that token also broke multi-image rows: text ended up with N placeholders while the row helper only forwarded the first image's pixel_values via the standard [0] indexing that mirrors upstream TRL process_row, so token vs image-feature counts diverged. Removing the synthesis matches stock TRL behavior; users provide the correct placeholders for their processor in the prompt.
* Add tests for DPO vision row processor passthrough
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
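A sketch of the per-key padding policy the collator changes describe. trl.trainer.utils.pad with padding_side is what the message relies on; the helper name, key test, and fallback shape are assumptions.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

try:
    from trl.trainer.utils import pad as trl_pad
except ImportError:
    trl_pad = None

def _pad_vision_key(tensors, key):
    if key.endswith("_position_ids"):
        # Patch-aligned to pixel_values, which TRL right-pads: right-pad with a
        # -1 sentinel marking padded patches.
        side, value = "right", -1
    else:
        # mm_token_type_ids is token-aligned to prompt_input_ids (left-padded
        # by TRL's preference collator): left-pad with 0.
        side, value = "left", 0
    if trl_pad is not None:
        try:
            return trl_pad(tensors, padding_value=value, padding_side=side)
        except Exception:
            pass
    # Degraded fallback only: pad_sequence cannot left-pad and fails on rank-3
    # tensors with mismatched non-leading dimensions.
    return pad_sequence(tensors, batch_first=True, padding_value=value)
```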
||
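As a rough illustration of the per-key-family padding rules the DPO vision commit above settles on, the sketch below right-pads the patch-aligned *_position_ids keys with a -1 sentinel, left-pads the token-aligned mm_token_type_ids with 0, prefers trl.trainer.utils.pad, and degrades to pad_sequence only when trl.pad is unavailable or raises. It is a sketch, not the actual dpo_trainer_data_collator_vision_keys patch: the function and dict names are hypothetical, and it assumes trl.trainer.utils.pad takes (tensors, padding_value, padding_side) as the commit message uses it.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

try:
    from trl.trainer.utils import pad as trl_pad  # handles arbitrary-rank tensors
except ImportError:  # degraded fallback, as described in the commit above
    trl_pad = None

# Padding rules taken from the commit message:
#   *_position_ids  -> patch-aligned to pixel_values (right-padded by TRL), sentinel -1
#   mm_token_type_ids -> token-aligned to prompt_input_ids (left-padded by TRL), fill 0
_VISION_KEY_PADDING = {
    "pixel_position_ids": ("right", -1),
    "image_position_ids": ("right", -1),
    "mm_token_type_ids":  ("left", 0),
}

def collate_vision_keys(examples):
    """Hypothetical collator fragment for the extra vision keys."""
    batch = {}
    for key, (side, fill) in _VISION_KEY_PADDING.items():
        if not all(key in ex for ex in examples):
            continue  # skip keys that not every example carries
        tensors = [torch.as_tensor(ex[key]) for ex in examples]
        if trl_pad is not None:
            try:
                batch[key] = trl_pad(tensors, padding_value=fill, padding_side=side)
                continue
            except Exception:
                pass  # fall through to the degraded pad_sequence path
        # Degraded fallback: right-pads only and requires matching trailing dims.
        batch[key] = pad_sequence(tensors, batch_first=True, padding_value=fill)
    return batch
```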
|
|
13928b5f0e
|
Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var (#5024)
* Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var
  When set, UNSLOTH_PYTORCH_MIRROR overrides the default https://download.pytorch.org/whl base URL in all four install scripts (install.sh, install.ps1, studio/setup.ps1, studio/install_python_stack.py). When unset or empty, the official URL is used. This lets users behind corporate proxies or in regions with poor connectivity to pytorch.org point at a local mirror without patching scripts.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Add pytest for UNSLOTH_PYTORCH_MIRROR in install_python_stack.py
  Tests that _PYTORCH_WHL_BASE picks up the env var when set, falls back to the official URL when unset or empty, and preserves the value as-is (including trailing slashes).
* Remove stale test assertions for missing install.sh messages
* Fix GPU mocking in test_get_torch_index_url.sh
  Extract _has_usable_nvidia_gpu and _has_amd_rocm_gpu alongside get_torch_index_url so the GPU-presence checks work in tests. Add -L flag handling to mock nvidia-smi so it passes the GPU listing check. All 26 tests now pass on CPU-only machines.
* Strip trailing slash from UNSLOTH_PYTORCH_MIRROR to avoid double-slash URLs
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> |
||
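A minimal Python sketch of the resolution order this commit describes, roughly how install_python_stack.py could compute its _PYTORCH_WHL_BASE: the env var wins when set and non-empty, trailing slashes are stripped, and the official index is the fallback. The helper name below is illustrative, not the actual function in the repo.

```python
import os

_OFFICIAL_PYTORCH_WHL_BASE = "https://download.pytorch.org/whl"

def _resolve_pytorch_whl_base() -> str:
    # UNSLOTH_PYTORCH_MIRROR overrides the official wheel index when set;
    # unset or empty falls back to the default. The trailing slash is
    # stripped so later f"{base}/cu124"-style joins do not produce "//".
    mirror = os.environ.get("UNSLOTH_PYTORCH_MIRROR", "").strip()
    return mirror.rstrip("/") if mirror else _OFFICIAL_PYTORCH_WHL_BASE
```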
|
|
da78c6be71
|
[Studio] Install flash attn at setup time for linux (#4979)
* [Studio] Install flash attn at setup time for linux
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* cleanup changes
  Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Test cases
* wheel_utils: narrow url_exists exceptions and log at debug level
---------
Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai> |
||
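The last bullet above, narrowing url_exists exceptions and logging at debug level, maps to a pattern like the following. This is a hedged sketch only: the probe strategy, signature, and error set are assumptions, with just the helper name and the "narrow exceptions, log at debug" intent taken from the commit message.

```python
import logging
import urllib.error
import urllib.request

logger = logging.getLogger(__name__)

def url_exists(url: str, timeout: float = 10.0) -> bool:
    # Probe a wheel URL with a HEAD request. Catch only the network errors
    # we expect and log them at debug level, instead of a broad `except
    # Exception` that would hide real bugs and spam warnings.
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, TimeoutError) as exc:
        logger.debug("url_exists(%s) failed: %s", url, exc)
        return False
```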
|
|
d22b2a18f9
|
fix: add tokenizers to no-torch deps and TORCH_CONSTRAINT for arm64 macOS py313+ (#4748)
* fix: add tokenizers to no-torch runtime deps and add TORCH_CONSTRAINT for arm64 macOS py313+
  Two installer fixes:
  1. Add `tokenizers` to `no-torch-runtime.txt` before `transformers`. Without it, `from transformers import AutoConfig` crashes on startup because `--no-deps` skips transitive dependencies.
  2. Add `TORCH_CONSTRAINT` variable to `install.sh`. On arm64 macOS with Python 3.13+, tighten the torch requirement to `>=2.6` since torch <2.6 has no cp313 arm64 wheels. The variable replaces the previously hard-coded constraint in the uv pip install line.
  Includes 66 tests (42 pytest + 24 bash) covering:
  - Structural checks on install.sh, install.ps1, no-torch-runtime.txt
  - Shell snippet tests with mocked python for 13 platform/version combos
  - Mock uv integration verifying correct constraint string
  - E2E venv tests on Python 3.12 and 3.13 confirming AutoConfig works
  - Negative control proving AutoConfig fails without tokenizers
  - Full no-torch sandbox regression guards (safetensors, huggingface_hub)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Fix incomplete no-torch manifest and align E2E tests with real --no-deps path
  - Add missing transitive deps to no-torch-runtime.txt that are required under --no-deps: regex, typing_extensions, filelock, httpx, httpcore, certifi, idna, anyio, sniffio, h11. Without these, `from transformers import AutoConfig` still fails after install.sh --no-torch.
  - Change all E2E tests to use --no-deps (matching what install.sh does) instead of normal dep resolution. Previous tests passed even with an incomplete manifest because uv backfilled transitive deps.
  - Rewrite negative control to derive from the real no-torch-runtime.txt with tokenizers stripped, proving the specific fix matters.
  - Replace GNU-only sed -i with heredoc in shell test for macOS compat.
  - Remove unused os/sys imports from Python test file.
  - Quote SKIP_TORCH and mock uv paths in bash -c strings.
* Assert install succeeds before checking import results in E2E tests
  Address review feedback: test_torch_not_importable and test_tokenizers_directly_importable in Group 3 now assert that uv pip install returns 0 before checking import behavior. This prevents false positives when the install itself fails silently.
* Assert install succeeds in negative control and tighten error check
  - Add missing install-success assertion in test_negative_control_no_tokenizers to prevent false positives from network/install failures.
  - Tighten error message check to look for "tokenizers" in stderr or ModuleNotFoundError, rather than the generic "No module" substring which could match unrelated import failures.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> |
||
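The TORCH_CONSTRAINT decision above lives in install.sh as shell logic; the snippet below is only a Python rendering of the same rule for readability, assuming the standard platform/sys modules and a hypothetical helper name: on arm64 macOS with Python 3.13 or newer the torch requirement tightens to >=2.6, everywhere else it stays unpinned.

```python
import platform
import sys

def torch_constraint() -> str:
    # Hypothetical Python rendering of install.sh's TORCH_CONSTRAINT choice:
    # torch < 2.6 publishes no cp313 arm64 macOS wheels, so that combination
    # needs the tighter requirement; all other platforms keep plain "torch".
    is_macos_arm64 = sys.platform == "darwin" and platform.machine() == "arm64"
    if is_macos_arm64 and sys.version_info >= (3, 13):
        return "torch>=2.6"
    return "torch"
```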
|
|
2ffc8d2cea
|
tests: add no-torch / Intel Mac test suite (#4646)
* tests: add no-torch / Intel Mac test suite
  Add comprehensive test coverage for the no-torch / --no-torch installer and Studio backend changes introduced in #4624.
  Shell tests (tests/sh/test_mac_intel_compat.sh):
  - version_ge edge cases (9 tests)
  - Architecture detection + Python version resolution (4 tests)
  - get_torch_index_url on Darwin (2 tests)
  - UNSLOTH_NO_TORCH propagation via SKIP_TORCH (5 tests)
  - E2E uv venv creation at Python 3.12 (3 tests)
  - E2E torch skip with mock uv shim (4 tests)
  - UNSLOTH_NO_TORCH env propagation (4 tests)
  - --python override flag parsing + resolution (11 tests)
  - --no-torch flag parsing (4 tests)
  - SKIP_TORCH unification (3 tests)
  - CPU hint printing (2 tests)
  Python tests (tests/python/test_no_torch_filtering.py):
  - _filter_requirements unit tests with synthetic + real requirements files
  - NO_TORCH / IS_MACOS constant parsing
  - Subprocess mock of install_python_stack() across platform configs
  - install.sh --no-torch flag structural + subprocess tests
  Python tests (tests/python/test_studio_import_no_torch.py):
  - AST checks for data_collators.py, chat_templates.py, format_conversion.py
  - Parametrized venv tests (Python 3.12 + 3.13) for no-torch exec
  - Dataclass instantiation without torch
  - format_conversion convert functions without torch
  - Negative controls (import torch fails, torchao fails)
  Python tests (tests/python/test_e2e_no_torch_sandbox.py):
  - Before/after import chain tests
  - Edge cases (broken torch, fake torch, lazy import)
  - Hardware detection without torch
  - install.sh logic tests (flag parsing, version resolution)
  - install_python_stack filtering tests
  - Live server startup tests (opt-in via @server marker)
* fix: address review comments on test suite
  - Fix always-true assertion in test_studio_import_no_torch.py (or True)
  - Make IS_MACOS test platform-aware instead of hardcoding Linux
  - Restore torchvision + torchaudio in server test cleanup (not just torch)
  - Include server stderr in skip message for easier debugging
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> |
||
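For orientation, the _filter_requirements behavior these tests exercise is roughly the following: with the no-torch mode active, torch-family requirements are dropped while everything else passes through. The signature, the exact family set, and the parsing below are assumptions for illustration; only the function name and the intent come from the commit message.

```python
import re

_TORCH_FAMILY = {"torch", "torchvision", "torchaudio", "torchao"}

def _filter_requirements(lines, no_torch: bool):
    # Hypothetical sketch: when UNSLOTH_NO_TORCH / --no-torch is active,
    # drop torch-family requirement lines; keep comments, blanks, and
    # every other requirement untouched.
    kept = []
    for line in lines:
        stripped = line.strip()
        if no_torch and stripped and not stripped.startswith("#"):
            match = re.match(r"[A-Za-z0-9_.-]+", stripped)
            if match and match.group(0).lower() in _TORCH_FAMILY:
                continue
        kept.append(line)
    return kept

# Example: _filter_requirements(["torch>=2.6", "transformers", "tokenizers"], no_torch=True)
# keeps ["transformers", "tokenizers"].
```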
|
|
19e9c60a8e
|
Consolidate dual venvs and separate install from update (#4530)
* refactor: consolidate dual venvs into single ~/.unsloth/studio/unsloth_studio
* refactor: separate install.sh (first-time) from setup.sh (smart update with PyPI version check)
* fix: install.sh calls setup.sh directly, keep both setup and update CLI commands
* fix: use importlib.resources.files() directly without _path attribute
* fix: bootstrap uv before pip upgrade to handle uv venvs without pip
* fix: frontend 404 when launched via CLI, add global symlink to ~/.local/bin
* feat: add --local flag to install.sh and unsloth studio update for branch testing
* fix: resolve repo root from script location for --local installs
* feat: add --package flag to install.sh for testing with custom package names
* feat: add --package flag to unsloth studio update
* fix: always nuke venv in install.sh for clean installs
* revert: remove Windows changes, will handle in separate PR
* fix: error when --package is passed without an argument
* revert: restore Windows scripts to current main
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix: always explicitly set STUDIO_LOCAL_INSTALL and STUDIO_PACKAGE_NAME env vars
* fix: pass explicit STUDIO_LOCAL_REPO env var for --local installs
* fix: align banner box for Setup vs Update labels
* deprecate: hide 'unsloth studio setup' command, point users to update/install.sh
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix: check stdout not stdin for auto-launch detection (curl pipe fix)
* fix: update install URL to unsloth.ai/install.sh
* fix: update install.sh usage comments to unsloth.ai/install.sh
* fix: use --upgrade-package for base deps to preserve existing torch/CUDA installs
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix: --local install now also installs unsloth-zoo via base.txt before editable overlay
* fix: don't skip base packages for --local installs (editable needs unsloth-zoo)
* refactor: move --local full dep install to install.sh, keep SKIP_STUDIO_BASE for all paths
* feat: add migration support for old .venv and CWD-based installs in setup.sh
* Revert "feat: add migration support for old .venv and CWD-based installs in setup.sh"
This reverts commit
|
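The "use importlib.resources.files() directly without _path attribute" bullet above corresponds to a pattern like the one below: resolving a packaged asset through the public Traversable API instead of reaching for a private attribute. The package and resource names are placeholders, not the actual Studio layout.

```python
from importlib.resources import files

# Resolve a packaged frontend asset via the public importlib.resources API.
# "unsloth_studio" and "frontend/index.html" are illustrative names only.
frontend_root = files("unsloth_studio") / "frontend"
index_html = (frontend_root / "index.html").read_text(encoding="utf-8")
```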