unsloth

mirror of https://github.com/unslothai/unsloth.git synced 2026-05-21 18:34:15 +00:00

Author	SHA1	Message	Date
Daniel Han	08bb85fcda	Create CODEOWNERS (#4039)	2026-02-12 02:56:13 -08:00
Lei Zhenyuan	cdc9dc1fb1	fix for tma (#4023)	2026-02-10 17:50:33 -08:00
Datta Nimmaturi	6804c05130	Misc fixes (#4018 ) * convert print to logger * Print but cleaner * Hide model on multiple devices * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix typo transfomers -> transformers, revert MoE message change * Update MoE detection message to show num_experts and target_modules * Fix llama-cli path in save info message * target_parameters warning for moe * fix should_convert_module for llm_int8_skip_modules * fix should_convert_module for llm_int8_skip_modules * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Logging filters * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * negation * remove should_convert_module patch * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-10 06:31:34 -08:00
Daniel Han	10338dbaa4	Fix warmup_ratio deprecation for transformers >= 5.0 (#4019 ) * Fix warmup_ratio deprecation warning for transformers >= 5.0 In transformers 5.0, warmup_ratio is deprecated in favor of warmup_steps which now accepts float values (< 1 = ratio, >= 1 = absolute steps). The compiler now conditionally sets warmup_steps=0.1 on transformers >= 5.0 (same semantics as warmup_ratio=0.1) and keeps warmup_ratio=0.1 on older versions where warmup_steps only accepts int. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-10 06:17:47 -08:00
Daniel Han	f106eec5e9	Fix Gemma3 4B training on transformers 5.x (token_type_ids) (#4017 ) * Inject token_type_ids for Gemma3 multimodal training on transformers 5.x In transformers 5.x, create_causal_mask_mapping() raises ValueError when is_training=True and token_type_ids is None. When doing text-only SFT on Gemma3 4B (a multimodal model), the dataset_utils detection for _needs_token_type_ids can miss because: - The model is wrapped in PeftModel, so type(model).__module__ points to peft.peft_model instead of transformers - The processing_class is a tokenizer (not Gemma3Processor), so the fallback MRO check resolves to a module without create_causal_mask_mapping This adds a fallback in _unsloth_pre_compute_loss that injects token_type_ids=zeros when: 1. token_type_ids is not already in inputs 2. The inner model config has model_type "gemma3" 3. The model's module has create_causal_mask_mapping (transformers 5.x) 4. The model is in training mode On transformers 4.x, create_causal_mask_mapping does not exist so this check is inert. Depends on: unslothai/unsloth-zoo#488 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-10 05:14:36 -08:00
andrewor14	cd24ea0e50	FP8: Load model on-the-fly in vLLM (#3717 ) * FP8: Load model on-the-fly in vLLM Summary: Existing support for `load_in_fp8=True` performs an offline quantization when loading the initial model. This is no longer necessary as of vllm==0.12.0 (after https://github.com/vllm-project/vllm/pull/23014), where we can quantize the model on-the-fly when we load it: ``` llm = LLM( ... hf_overrides={ "quantization_config_dict_str": json.dumps(torchao_config), }, ) ``` Note: Needs https://github.com/unslothai/unsloth-zoo/pull/380 Test Plan: https://gist.github.com/andrewor14/5b85119fae46845d07b608d420907423 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix on-the-fly FP8: always check mapper first, fallback to on-the-fly The original implementation bypasses the FP8 mapper entirely for vllm >= 0.12.0, meaning models like Llama-3.2-1B-Instruct and Qwen3-8B that have pre-quantized FP8-Block/FP8 checkpoints would never use them. This fixes the priority order: 1. Mapper has a pre-quantized model -> use it (always) 2. Mapper has no match + vllm >= 0.12.0 -> on-the-fly FP8 via torchao 3. Mapper has no match + vllm < 0.12.0 -> offline quantization Changes: - loader_utils.py: Move vllm >= 0.12.0 check after mapper lookups - loader.py: Set load_in_fp8=False when mapper resolves to a pre-quantized model to prevent double quantization Tested on B200 with Llama-3.2-1B-Instruct and Qwen3-8B. Corrected code produces results matching baseline (pre-quantized path preserved). --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-10 05:10:13 -08:00
Datta Nimmaturi	3df65308f3	[Misc] Fixes (#4015 ) * convert print to logger * Print but cleaner * Hide model on multiple devices * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix typo transfomers -> transformers, revert MoE message change * Update MoE detection message to show num_experts and target_modules --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-10 02:08:55 -08:00
Roland Tannous	fe5a7d11b6	add llama.cpp prefix to gguf conversion help messages (#4016)	2026-02-10 01:59:05 -08:00
Fizza Mukhtar	a353fad514	Fix #3397 : Prevent trainer tokenization hang with safe num_proc (#4013 ) * Fix #3397: Prevent trainer tokenization hang with safe num_proc * Fix #3397: Add missing import sys for Windows-safe tokenization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Consolidate with existing num_proc guard in dataset_utils.py --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-10 01:53:46 -08:00
Daniel Han	acfe670357	Fix EmbeddingGemma float16 NaN via FORCE_FLOAT32 for gemma3_text (#4014 ) * Fix EmbeddingGemma float16 NaN by adding gemma3_text to FORCE_FLOAT32 and SDPA lists * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-10 01:40:13 -08:00
Daniel Han	a2f4f04ea5	Inject model reference for dynamic token_type_ids detection in SFTTrainer (#4012 ) * Inject model reference for dynamic token_type_ids detection in SFTTrainer * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-10 00:37:07 -08:00
Daniel Han	a35e866625	Suppress vLLM v1 executor sleep/wake log messages (#4011 ) * Suppress vLLM v1 executor sleep/wake log messages Add HideLoggingMessage filters for vllm.v1.executor.abstract logger to suppress repetitive sleep/wake INFO and WARNING messages that spam training output when UNSLOTH_VLLM_STANDBY is enabled. The existing filter at line 275 handles the legacy vllm.executor.executor_base path; this adds coverage for the v1 engine path used by vllm 0.11+. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-09 23:51:58 -08:00
pre-commit-ci[bot]	293b431e77	[pre-commit.ci] pre-commit autoupdate (#4009 ) updates: - [github.com/astral-sh/ruff-pre-commit: v0.14.14 → v0.15.0](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.14...v0.15.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-09 17:32:18 -08:00
Daniel Han	4f5de9ba93	Silence peft target_parameters RuntimeWarning for MoE models (#4008 ) * Silence peft target_parameters RuntimeWarning for MoE models Wrap _get_peft_model calls with warnings.catch_warnings() to suppress the "target_parameters were set but no parameter was matched" warning. This fires on MoE models where expert layers use nn.Parameter naming that peft warns about but handles correctly. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-09 08:25:40 -08:00
Daniel Han	4924a5f6aa	Silence TRL's batch_size=1 padding-free warning in compiled trainer source (#4007 ) Strip the "anihilate"/"annihilate" warning block from compiled trainer source so it does not fire when Unsloth auto-enables padding-free mode with batch size 1 (the common single-GPU case). Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-09 07:55:29 -08:00
Daniel Han	f3f3c9dfb9	Fix dtype mismatch in fp16 + 4-bit/8-bit LoRA training (#4005 ) * Fix dtype mismatch in fp16 + 4-bit/8-bit LoRA training Two fixes for training with dtype=torch.float16 and load_in_4bit=True: 1. fast_lora.py: fast_dequantize() returns tensors in quant_state.dtype (typically bfloat16 or float32), but activations may be float16. The subsequent matmul/addmm operations require matching dtypes. Add dtype casts after each fast_dequantize() call in LoRA_MLP.backward and LoRA_QKV.backward (5 locations total). 2. rl.py: TRL unconditionally casts trainable parameters to bfloat16 in the peft init block. When training with fp16=True, this causes GradScaler to crash since it requires float32 parameters. Make the cast conditional -- use float32 when fp16 is enabled, bfloat16 otherwise. This is a no-op for GRPOTrainer (whose peft init block is already removed by the existing regex), but fixes SFTTrainer and other TRL trainers. Tested with Llama-3.2-1B-Instruct 4-bit on both fp16 and bf16 training. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix fp16 + 4-bit LoRA: thread correct_dtype through post_patch Root cause: fast_dequantize returns tensors in quant_state.dtype, which for pre-quantized models is bfloat16 (from config.json). The post_patch methods in llama/gemma/gemma2 call patch_model_and_tokenizer without passing correct_dtype, so quant_state.dtype is never overridden to match the user's requested dtype. This causes a dtype mismatch crash in the backward pass when training with dtype=torch.float16. Fix: pass the user's dtype from from_pretrained through post_patch to patch_model_and_tokenizer as correct_dtype, matching the pattern already used by vision.py. Revert the 5 symptom-level dtype casts in fast_lora.py (upW, gateW, QW, KW, VW) since they are no longer needed with quant_state.dtype properly set at the source. 
Tested: fp16+4bit and bf16+4bit Llama-3.2-1B-Instruct 15-step SFT runs both complete successfully with similar losses (~1.558 vs ~1.563). * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove TRL's unconditional bfloat16 cast instead of patching the dtype TRL 0.26.0+ hardcodes `param.data.to(torch.bfloat16)` for all trainable params in quantized models, citing the QLoRA paper recommendation. This is wrong: it ignores the user's requested dtype and breaks GradScaler when fp16=True. The block exists in sft_trainer, grpo_trainer, rloo_trainer, and reward_trainer (not dpo_trainer). Previous fix patched the cast to be dtype-conditional. This commit replaces the entire guard `if getattr(model, "is_loaded_in_4bit", ...) or getattr(model, "is_loaded_in_8bit", ...):` with `if False:` to disable the block entirely. Unsloth already handles adapter dtype via patch_model_and_tokenizer, making TRL's cast both unnecessary and harmful. For GRPOTrainer the enclosing peft init block is already removed by the regex above, making this a no-op for GRPO. --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-09 07:39:26 -08:00
Daniel Han	0a04b1b22c	Fix trl.experimental thin wrapper compilation and OOM from peft_config overwrite (#4006 ) * Fix trainer compilation failures from trl.experimental thin wrappers * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix OOM from prepare_model_for_kbit_training overwriting peft_config patching --------- Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-09 07:04:55 -08:00
Daniel Han	14fe579629	Fix VLM model + text-only dataset ValueError in TRL 0.22.x (#4004 ) TRL 0.22.x checks _is_vlm (model type) instead of _is_vision_dataset (dataset content, added in 0.25.1+) in _set_signature_columns_if_needed. When _is_vlm=True (e.g. Gemma3), signature columns are set to vision-only ["messages","prompt","completion","images"], which has zero overlap with tokenized text columns [input_ids, labels, attention_mask, ...], causing a ValueError. Fix: expand the VLM branch signature columns to include both vision and text column names. Extra columns not present in the dataset are harmlessly ignored by _remove_unused_columns (it only raises when zero columns match). Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-09 06:24:58 -08:00
Daniel Han	ba7366be53	Fix notebook compatibility for transformers 4.57.6 and TRL 0.22-0.27 (#3998 ) * Patch before compile? * Fix notebook compatibility for transformers 4.57.6 and TRL 0.22-0.27 Fixes several notebook failures discovered during testing all 125 notebooks with transformers==4.57.6 + tRL 0.22.2 and TRL 0.27.1. Warning suppression (import_fixes.py): - Suppress torch 2.9+ pin_memory/is_pinned device deprecation warnings - Suppress cuda.cudart/cuda.nvrtc module deprecation FutureWarning - Filter vllm "Level is deprecated" stderr noise - Filter PydanticSerializationUnexpectedValue warnings - Filter Triton "df: No such file" stderr noise VLM tokenizer loading (vision.py): - Add _construct_vlm_processor_fallback() for models where AutoProcessor.from_pretrained fails (e.g., ERNIE 4.5 VL, LFM2.5-VL) - Wrap processor loading in try/except with fallback to manual construction from separate image_processor + tokenizer components - Add fallback to AutoTokenizer/PreTrainedTokenizerFast when tokenizer loading or patching fails TRL 0.27.1 trainer compatibility (trainer.py): - Add _resolve_trainer_params() to handle thin wrapper trainers that only have def __init__(self, args, kwargs) (e.g., ORPOTrainer in TRL 0.27.1) by walking MRO for real parameter signature VLM _is_vlm detection (rl.py): - Replace blanket _is_vlm=False override with model-architecture-based detection that checks vision_config or ForConditionalGeneration class name, fixing VLM training when bare tokenizer is passed as processing_class ModernBERT SDPA compatibility (loader.py, sentence_transformer.py): - Add "modernbert" to DISABLE_SDPA_MODEL_NAMES to avoid stride alignment issues with torch.compile backward pass - Add DISABLE_SDPA check for sentence transformer models Other fixes (_utils.py): - Suppress false uninitialized weight warnings for VLM multi_modal_projector.layer_norm Tested: 92/125 notebooks pass with TRL 0.22.2, 94/125 with TRL 0.27.1. 
Remaining failures are infra (missing FFmpeg, network timeouts, GPU arch) not code bugs. [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix KTO shape mismatch on TRL 0.27.2+ and truncation alignment - Patch KTO get_batch_logps to auto-align logits and labels when Unsloth model forward truncates input_ids beyond max_seq_length. TRL 0.27.2 changed _process_tokens to only truncate completions (not prompts), so sequences with long prompts exceed max_seq_length and trigger model-side truncation. The original ValueError is replaced with min-length alignment. - Also truncate attention_mask in LlamaModel forward when input_ids are truncated to max_seq_length, preventing shape mismatches in attention. - Widen except clause in rl_replacements.py openenv import from `except ImportError` to `except (ImportError, NameError, Exception)` to handle vllm SamplingParams NameError in TRL 0.27.2. * Fix TRL 0.26+ thin wrapper resolution, enable ModernBERT SDPA, clean up warning filters TRL 0.26+ thin wrapper resolution (rl.py): - Filter _-prefixed private imports when discovering Trainer/Config classes - Look up Config in separate _config.py module when not found in trainer module - Detect thin wrappers (<1000 chars source) and resolve to experimental parent via MRO walk; use resolved module for imports and create_new_function - Enables all 15 trainers to patch successfully (was 5/15 before) ModernBERT SDPA (loader.py): - Remove "modernbert" from DISABLE_SDPA_MODEL_NAMES - SDPA works correctly for both classification and sentence transformers - Verified: 88.9% accuracy on emotion classification, correct domain-specific embeddings after sentence transformer fine-tuning Warning filter cleanup (import_fixes.py): - Remove cuda.cudart/cuda.nvrtc FutureWarning filters (no such warnings exist in torch 2.9.1+; proactive suppression is unnecessary) [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see 
https://pre-commit.ci * Remove multi_modal_projector.layer_norm from uninitialized weight guard The LFM2.5-VL projector LayerNorm is properly initialized by transformers and does not need to be excluded from the uninitialized weight check. The original exclusion was added as a workaround but is no longer needed after the upstream fix. * Add transformers 5.0 compat: rope_theta helper, config-as-dim detection, BatchEncoding guard, try/except for TRL trainer source, push_to_hub_token compiler fix - llama.py: Add _get_rope_theta() helper handling both config.rope_theta and rope_parameters dict - llama.py: Handle BatchEncoding in unsloth_fast_generate (transformers 5.0+ returns BatchEncoding from apply_chat_template) - gemma.py: Detect config passed as dim arg in GemmaFixedRotaryEmbedding - tokenizer_utils.py: Add try/except for TRL trainer getsource in patch_sft_trainer_tokenizer - rl_replacements.py: Add compiler fix replacing bare pop("push_to_hub_token") with pop(..., None) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use trl.experimental string check instead of char-count heuristic for thin wrapper detection The <1000 / >1000 char threshold was fragile -- XPOConfig's parent is only 994 chars and would be skipped. All thin wrappers in TRL 0.26+ contain "trl.experimental" in their deprecation warning, while no real trainer or config class does, making it a reliable detection marker. * Move DISABLE_SDPA_MODEL_NAMES import to module level in sentence_transformer The function-level import was redundant since loader.py is already imported at module level. Move it to the existing loader import line. --------- Co-authored-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-09 05:11:50 -08:00
siddhu donda	884ce4601f	fix: add inputs_embeds support in _fast_prepare_inputs_for_generation (#3798 ) (#3814 ) Add `inputs_embeds` parameter to `_fast_prepare_inputs_for_generation` so `model.generate(inputs_embeds=...)` works with Unsloth-patched models. Changes: - Add `inputs_embeds=None` to function signature (fixes HF inspect check) - Track `use_inputs_embeds` flag: True when inputs_embeds provided and no cache - Conditionally return inputs_embeds on first step, input_ids on subsequent steps - Handle input_ids being None/empty for batch size and device extraction - Add attention_mask None-guard before slicing Fixes: https://github.com/unslothai/unsloth/issues/3798 Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: siddhudonda <siddhudonda@users.noreply.github.com>	2026-02-09 04:59:43 -08:00
Daniel Han	3b1e8d0ae6	Update README.md	2026-02-09 04:50:54 -08:00
Daniel Han	60dd7269a5	Fix broken documentation links, typos, and formatting in README (#4003 ) - Fix 14 broken documentation links (all returning 404) caused by docs site restructuring (install-and-update -> install, pages moved to /docs/blog/ and /docs/models/tutorials/) - Fix "Qwen2.3-VL" -> "Qwen3-VL" (model does not exist) - Fix incorrect "GSPO" label on gpt-oss GRPO notebook - Fix "4b-bit" typo -> "4-bit" - Fix "sodoku" typo -> "sudoku" - Fix double dash formatting on FP8 GRPO notebook list item - Fix citation URL from http:// to https:// - Update "MultiGPU coming soon" to "is now supported" - Fix Windows installation step numbering (1,3,5,6,7 -> 1,2,3,4,5) - Fix Advanced/Troubleshooting step numbering (5,6,5 -> 4,5,6) Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-09 04:46:46 -08:00
Fizza Mukhtar	c98312f229	Fix multi-GPU loading for quantized models in distributed training (#3917 ) When using torchrun with quantized models (4bit/8bit/fp8), each rank must load the model directly onto its own GPU. The default device_map ("sequential") places everything on GPU 0, causing illegal memory access errors when Accelerate tries to relocate quantized weights. Use the existing prepare_device_map() utility from loader_utils to detect distributed training via LOCAL_RANK/WORLD_SIZE env vars and override device_map to target each rank's local GPU. This is applied in both FastLanguageModel.from_pretrained and FastModel.from_pretrained, covering text, vision, and audio model paths. Fixes #3914 Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-09 04:26:21 -08:00
Mohammad Miadh Angkad	336bec216a	Refactor Ollama template wiring and harden packing helpers (#3890 ) * Refactor Ollama template wiring and harden packing helpers Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu> * Fix Qwen3 and Gemma3n template bindings and tidy packing test helper * Fix gptoss Ollama comment and tinyllama stop parameter - Fix wrong comment referencing gemma3n for gptoss_ollama in chat_templates.py - Add missing stop keyword to tinyllama PARAMETER in ollama_template_mappers.py * Fix _DummyTrainer compatibility across TRL versions The try/except only handled the removal of return_position_ids (TRL v0.24+) but not the absence of padding_free (TRL v0.18.2). Gracefully degrade through all optional collator flags so the test works from trl>=0.18.2 through v0.27+. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu> Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-09 04:04:48 -08:00
RektPunk	f868d8b073	[Feature] seperate gguf file path (#3934 ) * seperate gguf * fix Modelfile log * ollama Modelfile create * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix GGUF file placement: move initial conversion to _gguf dir, fix cleanup - Move initial GGUF files (from convert_to_gguf) into {model_directory}_gguf/ immediately after conversion, so all GGUF outputs live in the dedicated directory regardless of quantization method (fixes bf16-only case where quant == first_conversion skipped the loop and _gguf dir was never created) - Remove redundant gguf_directory/makedirs from inside the re-quant loop since the directory is now created before the loop - Use Path.unlink(missing_ok=True) for base GGUF cleanup robustness - Unify Modelfile location to {save_directory}_gguf/Modelfile for both VLM and non-VLM models - Fix print message to show actual modelfile_location path - Add gguf_directory key to return dict - Clean up {save_directory}_gguf in push_to_hub_gguf error/finally blocks * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-09 04:00:14 -08:00
Etherll	315178b5c3	Add push_to_hub_gguf support for FastSentenceTransformer (#4002 ) * Implement GGUF upload method for SentenceTransformer Added a method to convert and upload SentenceTransformer models to GGUF format, including handling of tokenizer, quantization methods, and repository management on Hugging Face Hub. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-09 00:51:26 -08:00
Daniel Han	b47b081f99	Fix triton 3.6.0 + torch 2.9.x torch.compile crash (missing cluster_dims) (#4001) Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-08 20:18:25 -08:00
Daniel Han	c43a5b8f02	Fix multiprocessing crash on Windows/macOS and unify num_proc logic (#3999 ) On Windows and macOS (Python 3.8+), multiprocessing uses the spawn start method. When datasets .map(num_proc=N) is called, it creates a Pool(N) which re-imports __main__ in each worker, causing infinite recursion and a RuntimeError during bootstrapping. Guard the auto-computed dataset_num_proc in the generated Config __init__ by checking multiprocessing.get_start_method() != 'fork'. When the start method is not fork (spawn/forkserver), force dataset_num_proc = None so datasets takes the single-process path. Linux fork behavior is unchanged. Also replace the fixed memory threshold logic with the simpler adaptive approach: cap at 64, then min(num_proc, int(available_gb)), with a safety floor of 1 when available memory is at or below 2GB. Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-08 02:50:06 -08:00
pluesclues	c6de138e62	Update rl_replacements.py (#3990)	2026-02-05 08:22:42 -08:00
Daniel Han	3a4b1e7fc5	Disable torchcodec in transformers when FFmpeg is missing (#3989 ) * Disable torchcodec in transformers when FFmpeg is missing When torchcodec is installed but FFmpeg libraries are unavailable, transformers still thinks torchcodec is available (via find_spec check) and tries to use it for audio loading, causing RuntimeError. This adds disable_torchcodec_if_broken() which tests if torchcodec can actually load its native libraries, and if not, patches transformers' _torchcodec_available to False so it falls back to librosa instead. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-05 06:54:09 -08:00
Daniel Han	145f6aaeb1	Fix cutlass inductor options for PyTorch < 2.8.0 (#3988 ) The cuda.cutlass_epilogue_fusion_enabled and cuda.cutlass_tma_only inductor config options were added in PyTorch 2.8.0. Using these options on older PyTorch versions causes a RuntimeError during GRPOTrainer initialization. This fix adds a version check to only include these options when running PyTorch 2.8.0 or later, allowing GRPO training to work on older PyTorch versions (e.g., Colab environments with PyTorch 2.5-2.7). Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-05 06:40:11 -08:00
Daniel Han	7b42acae94	Fix RuntimeError not caught when torchcodec fails to load (#3987 ) When datasets library has torchcodec installed but FFmpeg libraries are missing, torchcodec raises a RuntimeError during import. The exception handler only caught ImportError and AttributeError, causing the error to propagate and crash Unsloth imports in environments like Colab where FFmpeg may not be installed. Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-02-05 06:35:10 -08:00
Daniel Han	ce256c43bc	Merge branch 'main' of https://github.com/unslothai/unsloth	2026-02-05 06:10:06 -08:00
Daniel Han	f463f692d6	MoE release	2026-02-05 06:09:56 -08:00
Datta Nimmaturi	fad6957555	[MoE] Improve moe kernels for unsloth fine tuning (#3812 ) * Improve MoE performance * small changes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix imports * disable autotune * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * LoRA for MoE * Make autotune default * make dy contiguous * use non lora model as base for RL * Revert "use non lora model as base for RL" This reverts commit bc8f15629d060593b2eaf436f158ff5ac9df0d5d. * fixup derp * non TMA [T4] * Revert "non TMA [T4]" This reverts commit 35304566690e7c9ab9632899920c85bff322409a. * Fixes for VL MoE and v5 transformers * [transformers] [v5] remove unused hybridcache (#3910) * remote unused hybridcache * cleanup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * No double compile for qwen3moe * Fix top_k on trl GRPO * Recognise GLM as MoE * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix missing RotaryEmbeddingConfigMixin * Licensing for autotuning cache * Cleanup --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Erland366 <erland.pg366@gmail.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-02-05 06:03:25 -08:00
Daniel Han	2883ce4091	Update _utils.py	2026-02-05 05:58:00 -08:00
Daniel Han	ff3f78b6b9	Add PyTorch 2.10 and xformers 0.0.34 support (#3985 ) - Add cu126/cu128/cu130 xformers 0.0.34 wheel dependencies for torch 2.10 - Add cu126-torch2100, cu128-torch2100, cu130-torch2100 meta-dependencies - Add cu126-ampere-torch2100, cu128-ampere-torch2100, cu130-ampere-torch2100 variants - Update _auto_install.py version detection for torch 2.10.x - Add CUDA check for torch 2.10 (requires CUDA 12.6, 12.8, or 13.0) - Update README.md with torch 2.10 installation instructions Co-authored-by: Daniel Hanchen <danielhanchen@users.noreply.github.com>	2026-02-05 05:56:26 -08:00
Daniel Han	7ceebe4554	Silence non-actionable TRL trainer import failures (#3980 ) _patch_trl_rl_trainers enumerates all trainer modules from dir(trl.trainer) and attempts to import each one. Modules like alignprop_trainer fail because they depend on optional packages (diffusers) that may not be installed. The failure is harmless but the print() call produces noise on every import. Change print() to logger.info() so these messages only appear when UNSLOTH_ENABLE_LOGGING=1. Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-02-05 05:32:52 -08:00
Daniel Han	5798267401	Silence third-party deprecation warnings and fix socket leak (#3983 ) * Silence third-party deprecation warnings and fix socket resource leak - Add warning filters for TorchAO deprecated import paths - Filter SWIG builtin type warnings from bitsandbytes/triton - Filter Triton autotuner deprecation warnings - Filter Python 3.12+ multiprocessing fork warnings - Filter resource warnings for unclosed sockets/files - Fix socket leak in has_internet() by properly closing socket * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-05 04:55:52 -08:00
Daniel Han	649865caca	Fix GPT-OSS BlockMask error during inference (#3982 ) GPT-OSS models use eager attention during inference because flex attention returns incorrect results (likely due to left padding). However, when _attn_implementation is set to "flex_attention", transformers creates BlockMask objects which cause a TypeError when passed to the eager attention path: TypeError: unsupported operand type(s) for +=: 'Tensor' and 'BlockMask' This fix excludes GPT-OSS from using flex_attention, keeping it on the eager path to avoid the BlockMask/Tensor type mismatch.	2026-02-05 04:28:46 -08:00
Daniel Han	6f3e52bbcf	Prefer flex attention when available (#3979 ) * Enable flex attention by default * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Avoid dropping flex attention when SDPA unsupported --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-05 03:19:04 -08:00
pluesclues	9b34982509	Trl 0.27.0 update (#3965 ) * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl_replacements.py * Update rl_replacements.py * Update rl.py * Update rl.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update rl_replacements.py * Update rl.py * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update rl_replacements.py, remove chat template from codexes commits * Update rl.py, got rid of gradient checkpointing code that did not work --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-04 23:01:16 -08:00
Daniel Han	e1c682e6d2	Fix torchvision compatibility check for source builds and future torch versions (#3978 ) * Fix torchvision compatibility check for source builds and future torch versions The torchvision version check raised a hard ImportError for custom/source-built PyTorch installations (e.g. AMD ROCm from source with +git* suffixes), even when the actual build was functional. This also silently skipped any torch version not already in the hardcoded table, giving no warning at all for future releases. Changes: - Detect custom/source builds by checking the raw version string's local identifier against known standard prefixes (cu, rocm, cpu, xpu). Our custom Version() strips local identifiers via regex, so detection must happen on the raw string before parsing. - Downgrade to a warning (instead of ImportError) for custom/source builds, since their version numbers may not follow standard PyPI release pairings. - Add formula-based inference for future torch versions not yet in the table. The torch->torchvision minor version formula (torch 2.x -> tv 0.(x+15)) has held for every release from torch 2.0 through 2.9. For formula-predicted versions, mismatches produce a warning rather than a hard error. - Add UNSLOTH_SKIP_TORCHVISION_CHECK=1 env var to skip the check entirely. - Wrap importlib_version and Version calls in try/except so broken metadata never crashes the import. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address review: stricter regex, case insensitivity, pre-release detection Fixes three edge cases found during review: 1. Regex precision: cu/xpu now require a trailing digit (cu\d, xpu\d) to avoid false negatives on suffixes like "+custom_build" that happen to start with "cu". cpu/xpu match as exact strings only. 2. Case insensitivity: added re.IGNORECASE so "+ROCM6.3" and "+CPU" are correctly recognized as standard builds rather than custom ones. 3. Pre-release detection: nightly/dev/alpha/beta/rc builds with standard CUDA/ROCm suffixes (e.g. "2.7.0.dev20250301+cu124") now produce a warning instead of a hard ImportError. These builds commonly have version mismatches that are expected during development. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address PR review comments: fullmatch, env var casing, torchvision pre-release 1. Switch re.match to re.fullmatch for the custom build regex so the entire local identifier must match. Fixes false negatives where suffixes like +cu124_custom were misclassified as standard because re.match only checked the start of the string. 2. Use .lower() for the UNSLOTH_SKIP_TORCHVISION_CHECK env var so any casing of "true" / "TRUE" / etc. is accepted. 3. Check torchvision_version_raw for pre-release tags in addition to torch_version_raw, so a stable torch paired with a nightly torchvision (e.g. 0.23.0.dev...) also gets a warning instead of a hard ImportError. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-04 04:50:26 -08:00
Daniel Han	4f75ec2fc8	Add vLLM + torch < 2.9.0 + SM100 compatibility check (#3973 ) vLLM's distributed module (device_communicators) crashes with std::bad_alloc when imported on SM100 GPUs (B200/B100/Blackwell) with torch < 2.9.0. This adds an early check that runs before vLLM is imported, providing a helpful error message instead of a cryptic C++ exception. The check: 1. Detects if vLLM is installed 2. Checks if torch version is < 2.9.0 3. Checks if any GPU is SM100 (Blackwell) 4. If all conditions met, raises RuntimeError with clear upgrade instructions	2026-02-03 03:10:24 -08:00
Daniel Han	d5f5b7d6a6	Add TRL truncation regression and metadata loss fixes (Fixes 1 and 3) (#3971 ) * Add TRL truncation regression and metadata loss fixes Fix 1: TRL 0.24.0-0.25.1 right-truncation regression - These versions pass max_length=self.max_prompt_length and truncation=True to the tokenizer, which right-truncates prompts and strips the assistant turn suffix - Use regex to remove these kwargs from the generated code Fix 3: Metadata loss for chat_template_kwargs - TRL 0.24.0+ extracts prompts = [x["prompt"] for x in inputs], losing metadata like reasoning_effort - Inject code to store per-sample chat_template_kwargs on self before extraction - Preserve these kwargs in prompts_text generation for all TRL versions Tested with TRL versions 0.22.2, 0.23.1, 0.24.0, 0.25.1, 0.26.2, and 0.27.1. * Update Fix 1 comment with detailed TRL version behavior explanation Expand the comment for the TRL 0.24.0-0.25.1 truncation regression fix to clarify what each TRL version does: - TRL 0.22.2-0.23.1: Uses truncate_with_protected_tokens() for smart truncation that preserves rightmost tokens and protects special tokens - TRL 0.24.0-0.25.1: Removed smart truncation, passes kwargs directly to tokenizer (max_length, truncation=True, add_special_tokens=False) - TRL 0.26.2+: Removed these kwargs entirely The fix removes these problematic kwargs so 0.24.0-0.25.1 behaves like 0.26.2+ (no tokenizer-level truncation). --------- Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com>	2026-02-03 03:00:12 -08:00
Daniel Han	8f44ae0eda	Fix num_train_epochs=None causing TypeError in GRPOConfig (#3972 ) When users pass `num_train_epochs=None` to GRPOConfig (relying on max_steps to control training duration), Trainer.__init__ fails with: TypeError: '>' not supported between instances of 'NoneType' and 'int' This happens because transformers.Trainer does `args.num_train_epochs > 0` in its __init__ which fails when the value is None. This fix converts None to 3.0 (the default) before Trainer initialization. The actual training duration is still controlled by max_steps since it takes precedence when both are set. Example that now works: ```python config = GRPOConfig( num_train_epochs=None, # Previously caused TypeError max_steps=500, # This controls actual duration ... ) ```	2026-02-03 02:48:40 -08:00
Daniel Han	41417693e4	Fix Vision GRPO string prompts and OpenEnv async compatibility (#3964 ) * [fix] Vision GRPO string prompts and OpenEnv async compatibility - Guard prepare_multimodal_messages in GRPO trainer to skip processing when prompts are pre-templated strings. Notebooks that pre-apply apply_chat_template() produce strings with image tokens already embedded; calling prepare_multimodal_messages on those crashes with TypeError. - Apply nest_asyncio when OpenEnv EnvClient exposes async reset/step, so scripts using run_until_complete() wrappers work in all contexts. - Add wrapper to call patch_torchcodec_audio_decoder() from unsloth_zoo for AudioDecoder dict-compatibility. * Add apply_chat_template guard for pre-templated string prompts in Vision GRPO When notebooks pre-apply apply_chat_template, prompts become strings. The existing guard skips prepare_multimodal_messages for strings. This adds a second guard to skip apply_chat_template in the forward_kwargs block, using prompts directly as prompts_text instead. Covers both TRL 0.25.x (no tools param) and TRL 0.26.2+ (with tools=self.tools). Non-matching replacements silently pass for older TRL versions. * Add TRL 0.25.1 single-line variant for apply_chat_template guard TRL 0.25.1 uses single-line formatting for apply_chat_template: apply_chat_template({"prompt": prompt}, ...)["prompt"] While TRL 0.26.2+ uses multi-line formatting: apply_chat_template( {"prompt": prompt}, ... )["prompt"] Add both variants to ensure full backwards compatibility. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-03 02:03:46 -08:00
Daniel Han	949f1ce573	Fix TRL 0.27.0 GRPO compatibility and PEFT model handling (#3969 ) * Fix TRL 0.27.0 GRPO compatibility and PEFT model handling - Remove use_reentrant=False from gradient_checkpointing_kwargs for TRL 0.27.0+ TRL 0.27.0 auto-sets use_reentrant=False in GRPOConfig.__post_init__, but Unsloth gradient checkpointing requires use_reentrant=True. This adds a post-init cleanup that removes the setting when present. - Handle prepare_peft_model standalone function pattern for TRL 0.22.0+ TRL changed from self._prepare_peft_model() method to prepare_peft_model() standalone function. Both patterns are now bypassed to let Unsloth handle PEFT model preparation. Tested with TRL versions 0.22.2, 0.23.1, 0.24.0, 0.25.1, 0.26.2, and 0.27.1. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-03 01:56:31 -08:00
Kaitao Yang	7dd3ae8768	reduce code duplication (#3877 ) * reduce code duplication * address reviewer feedback: keep original function name - Keep original function name `_offload_frozen_module_for_training` - Make `offload_device` parameter Optional (can be None) - Keep original error handling (return None for missing modules_to_save) - Maintain code deduplication by reusing the helper function --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-02-03 00:27:49 -08:00
Daniel Han	8f0b57ae18	Use standard gradient checkpointing for small sequence lengths (#3867 ) * Use standard gradient checkpointing for small sequence lengths When max_seq_length < 512, the overhead of gradient offloading in gc="unsloth" mode is not worth it. Benchmarks on B200 show: \| seq_len \| gc=unsloth \| gc=True \| Difference \| \|---------\|------------\|----------\|------------\| \| 256 \| 6,803 t/s \| 6,993 t/s\| +2.8% \| \| 384 \| 9,889 t/s \| 9,963 t/s\| +0.7% \| \| 512 \| 13,151 t/s \| 13,092 t/s\| -0.4% \| \| 1024 \| 26,662 t/s \| 25,094 t/s\| -5.9% \| The crossover point is around seq_len 384-512. For sequences shorter than 512, we now automatically use standard gradient checkpointing instead of the custom offloading implementation. Additionally, when user explicitly sets use_gradient_checkpointing to True or False in get_peft_model, it now correctly overrides any previous "unsloth" patching from from_pretrained. This ensures consistent behavior regardless of the order of function calls. Updated in three locations: - FastLlamaModel.get_peft_model (llama.py) - FastLanguageModel.from_pretrained (loader.py) - FastModel.from_pretrained (loader.py) * Refactor: extract gradient checkpointing heuristic into utility function Addresses code review feedback to reduce duplication. The gradient checkpointing heuristic logic was duplicated in 3 places: - FastLlamaModel.get_peft_model (llama.py) - FastLanguageModel.from_pretrained (loader.py) - FastModel.from_pretrained (loader.py) Created apply_unsloth_gradient_checkpointing() utility function in _utils.py that handles: - Heuristic: seq < 512 falls back to standard gc - Explicit True/False overrides unpatch previous patching - Returns the effective use_gradient_checkpointing value Net reduction of ~6 lines while improving maintainability. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: danielhanchen <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-02-02 23:57:09 -08:00
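
Note on commit 8f0b57ae18: the gradient checkpointing heuristic it describes can be sketched roughly as below. This is an illustration only, not the code in _utils.py. The function name `choose_gradient_checkpointing` and its signature are hypothetical; only the 512-token threshold and the rule that an explicit True/False is honoured come from the commit message. The unpatching side effect mentioned in the message is not modelled here.

```python
from typing import Union

# The setting is either a bool or the string "unsloth" (the offloading mode).
GCSetting = Union[bool, str]


def choose_gradient_checkpointing(
    requested: GCSetting,
    max_seq_length: int,
    threshold: int = 512,  # crossover point quoted in the commit message
) -> GCSetting:
    """Return the effective gradient checkpointing mode (hypothetical helper)."""
    # Short sequences: offloading overhead outweighs its benefit, so fall
    # back to standard gradient checkpointing.
    if requested == "unsloth" and max_seq_length < threshold:
        return True
    # Explicit True/False, or "unsloth" on longer sequences, is kept as-is.
    return requested


if __name__ == "__main__":
    for seq_len in (256, 384, 512, 1024):
        print(seq_len, choose_gradient_checkpointing("unsloth", seq_len))
```

With the benchmark sequence lengths from the commit message, 256 and 384 resolve to standard checkpointing, while 512 and 1024 stay on the "unsloth" mode.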


3371 commits
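
Note on commit e1c682e6d2: the two ideas in its message, the torch 2.x -> torchvision 0.(x+15) formula and the full-match test on the version's local identifier, can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's implementation. The helper names and the exact regular expression are hypothetical; the commit message only says that `re.fullmatch` and `re.IGNORECASE` are used and that cu, rocm, cpu and xpu are the known standard prefixes.

```python
import re

# Assumed pattern for standard PyPI-style local identifiers (cu124, rocm6.3,
# cpu, xpu). The real pattern in the repository may differ.
_STANDARD_LOCAL = re.compile(r"cu\d+|rocm\d[\d.]*|cpu|xpu", re.IGNORECASE)


def predicted_torchvision(torch_version: str) -> str:
    """Predict the paired torchvision minor release for a torch 2.x version."""
    public = torch_version.split("+", 1)[0]  # drop the local identifier
    major, minor = public.split(".")[:2]
    if int(major) != 2:
        raise ValueError("the formula is only stated for torch 2.x")
    return f"0.{int(minor) + 15}"


def is_custom_build(raw_version: str) -> bool:
    """True when the local identifier is not a known standard suffix."""
    if "+" not in raw_version:
        return False  # no local identifier at all
    local = raw_version.split("+", 1)[1]
    # fullmatch: the whole identifier must match, so "cu124_custom" is custom.
    return _STANDARD_LOCAL.fullmatch(local) is None


if __name__ == "__main__":
    print(predicted_torchvision("2.9.0+cu128"))
    print(is_custom_build("2.8.0+git1a2b3c"))
    print(is_custom_build("2.8.0+cu124_custom"))
    print(is_custom_build("2.8.0+ROCM6.3"))
```

Detection runs on the raw version string because, per the commit message, the project's own Version() parser strips local identifiers before comparison.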