* chore: fix typos in studio/backend/routes/models.py
* chore: fix typos in tests/saving/non_peft/test_mistral_non_peft.py
* chore: fix typos in tests/saving/non_peft/test_whisper_non_peft.py
* chore: fix typos in tests/saving/vision_models/test_index_file_sharded_model.py
* chore: fix typos in tests/saving/vision_models/test_push_to_hub_merged.py
* chore: fix typos in tests/saving/vision_models/test_save_merge_qwen2.5vl32B_model_ocr_benchmark.py
* chore: fix typos in tests/saving/vision_models/test_save_merge_vision_model_ocr_benchmark.py
* chore: fix typos in unsloth/import_fixes.py
* Split: keep only 6 file(s)
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
* fix: patch CONTROL type for special tokens in sentencepiece GGUF export (fixes #5070)
When converting a Gemma 3 fine-tune to GGUF via save_pretrained_gguf,
tokens like <start_of_turn> (id=105) and <end_of_turn> (id=106) are
already present in the sentencepiece model but are typed as NORMAL (1)
instead of CONTROL (3). llama.cpp only recognises CONTROL tokens when
parse_special=True is active, so these tokens get BPE-split during
chat inference and the model produces garbage output.
fix_sentencepiece_gguf now reads tokenizer.json's added_tokens list and,
for any token with "special": true whose ID falls within the existing
sentencepiece vocabulary, updates its type from NORMAL to CONTROL before
writing the patched tokenizer.model to disk. The same CONTROL type is
also applied when new tokens are appended for the out-of-range case, so
both code paths are consistent.
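A minimal sketch of the retag, folding in the bounds and malformed-entry guards from the review rounds below; the helper name is illustrative and the body is a reconstruction, not the verbatim fix_sentencepiece_gguf:
```
# Illustrative retag, assuming tokenizer.json and tokenizer.model sit
# side by side in save_directory.
import json, os
from transformers.convert_slow_tokenizer import import_protobuf

def retag_special_tokens(save_directory):
    model_pb2 = import_protobuf()
    tokenizer_file = model_pb2.ModelProto()
    with open(os.path.join(save_directory, "tokenizer.model"), "rb") as f:
        tokenizer_file.ParseFromString(f.read())
    with open(os.path.join(save_directory, "tokenizer.json")) as f:
        added_tokens = json.load(f).get("added_tokens", [])

    NORMAL  = model_pb2.ModelProto.SentencePiece.NORMAL   # type 1
    CONTROL = model_pb2.ModelProto.SentencePiece.CONTROL  # type 3
    patched = 0
    for entry in added_tokens:
        token_id = entry.get("id")
        if not entry.get("special") or not isinstance(token_id, int):
            continue  # skip non-special or malformed entries
        # Only retag tokens already inside the sentencepiece vocabulary.
        if 0 <= token_id < len(tokenizer_file.pieces):
            piece = tokenizer_file.pieces[token_id]
            if piece.type == NORMAL:
                piece.type = CONTROL
                patched += 1
    if patched > 0:
        with open(os.path.join(save_directory, "tokenizer.model"), "wb") as f:
            f.write(tokenizer_file.SerializeToString())
```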
* Wire fix_sentencepiece_gguf into tokenizer save path and guard np.diff
- save.py: call fix_sentencepiece_gguf inside unsloth_tokenizer_save_pretrained
after _preserve_sentencepiece_tokenizer_assets. The helper was previously
unreferenced in the repo, so the PR's CONTROL-type patch never actually ran
during save_pretrained_gguf.
- tokenizer_utils.py: add an early-return guard for len(added_tokens_ids) < 2
before the existing np.diff contiguity check. np.diff on a single-element
array returns [] and .min() raises ValueError, which would discard the new
in-vocab CONTROL patch; the guard flushes tokenizer.model first. Guard is
inserted before the existing lines (diff = np.diff(...) and the min/max
check) so their blame is unchanged.
Dropped the separate refactor to fold the four duplicated "if patched > 0:
write tokenizer.model" blocks into a helper because doing so re-indents
lines whose blame is "Formatting & bug fixes"; the duplication
remains the author's pattern.
* Fix review findings: negative token_id guard and np.diff single-element
- tokenizer_utils.py:481: add a 0 <= lower bound to the special_token_ids
  bounds check. Previously a negative token_id from tokenizer.json passed
  'token_id < sentence_piece_size', and Python's negative indexing wrapped it
  to tokenizer_file.pieces[-1], silently retagging the last piece to CONTROL
  (illustrated below).
- tokenizer_utils.py:513: replace the loop-1 'if len < 2: return' guard
(which was too broad: it silently skipped vocab extension for single-entry
added_tokens.json) with a pre-pass that substitutes a trivially-contiguous
2-element sentinel for the contiguity check, then restores the original
array before the append loop. Lines 519 ('diff = np.diff(added_tokens_ids)')
and 520-529 (min/max/boundary checks and early-return write blocks) are
left literally unchanged so blame remains intact.
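The wraparound that the new lower bound rejects, on illustrative values:
```
# Why 'token_id < sentence_piece_size' alone is not enough:
pieces = ["<pad>", "<s>", "</s>"]    # stand-in for tokenizer_file.pieces
token_id = -1
print(token_id < len(pieces))        # True  -> the old check passes
print(pieces[token_id])              # '</s>' -> negative index wraps to the last piece
print(0 <= token_id < len(pieces))   # False -> the new guard rejects it
```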
* Restore real added_tokens_ids before min boundary check
Move the '_real_added_tokens_ids' restore above the
'added_tokens_ids.min() != sentence_piece_size' check. With the previous
order the sentinel [sentence_piece_size, sentence_piece_size + 1] was
still in scope when the min check ran, so any single-entry added_tokens.json
with an out-of-range start id (e.g. 99 when sentence_piece_size=2) bypassed
the boundary check and fell through to the append loop.
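Combining the last two entries, the final shape of the guard (a hedged reconstruction; variable names follow the notes above, and the stand-in return strings abbreviate the real flush-and-return blocks):
```
import numpy as np

def check_and_extend(added_tokens_ids, sentence_piece_size):
    _real_added_tokens_ids = added_tokens_ids
    if len(added_tokens_ids) < 2:
        # np.diff on a 1-element array returns [] and [].min() raises
        # ValueError; substitute a trivially contiguous pair for the
        # contiguity check only.
        added_tokens_ids = np.array([sentence_piece_size, sentence_piece_size + 1])

    diff = np.diff(added_tokens_ids)   # the existing line, left unchanged
    if diff.min() != 1:
        return "non-contiguous ids"    # stands in for the existing checks

    added_tokens_ids = _real_added_tokens_ids  # restore BEFORE the boundary check
    if added_tokens_ids.min() != sentence_piece_size:
        return "out-of-range start id" # now catches e.g. [99] with size 2

    return "extend vocabulary"         # append loop runs over the real ids

print(check_and_extend(np.array([99]), 2))  # 'out-of-range start id'
```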
* Scope fix_sentencepiece_gguf to GGUF export path only
Previously wired fix_sentencepiece_gguf into unsloth_tokenizer_save_pretrained,
which is the generic monkey-patch replacement for every tokenizer.save_pretrained
call. That caused the GGUF-specific mutation (and the unconditional protobuf
import in fix_sentencepiece_gguf) to run on every LoRA / merged 16-bit /
push_to_hub / torchao save, where it has no purpose and can abort the entire
save if the protobuf runtime is unavailable.
- save.py: remove fix_sentencepiece_gguf call from unsloth_tokenizer_save_pretrained.
- save.py: add the call inside unsloth_save_pretrained_gguf immediately before
save_to_gguf, wrapped in try/except so a protobuf import failure logs a
warning and lets GGUF conversion proceed rather than aborting the save.
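A sketch of that call site (the placement is per the notes above; the logger setup and message text are illustrative):
```
import logging
logger = logging.getLogger(__name__)

# Inside unsloth_save_pretrained_gguf, immediately before save_to_gguf:
try:
    fix_sentencepiece_gguf(save_directory)
except Exception as e:
    logger.warning(
        f"Unsloth: fix_sentencepiece_gguf failed ({type(e).__name__}: {e}), "
        "continuing GGUF conversion with the unpatched tokenizer.model."
    )
# save_to_gguf(...) runs next either way.
```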
* Broaden special-token retag to USER_DEFINED and narrow save.py except
- tokenizer_utils.py:483: the in-vocab retag previously only promoted NORMAL
pieces to CONTROL, but the real Gemma tokenizer (e.g.
unsloth/functiongemma-270m-it) stores <start_of_turn>/<end_of_turn> as
USER_DEFINED (type 4).
Extend the predicate to cover both NORMAL and USER_DEFINED so tokens marked
"special": true in tokenizer.json are promoted regardless of their current
sentencepiece type. Only tokens explicitly flagged special are touched, so
non-special USER_DEFINED pieces are unchanged; already-CONTROL pieces stay
unchanged. The warning message is generalised accordingly.
- save.py:2294: narrow the except clause from Exception to ImportError. The
loop-3 try/except was added to tolerate a missing protobuf runtime; leaving
it broad also swallows OSError/PermissionError mid-write, which would ship
a corrupted tokenizer.model to save_to_gguf. ImportError still covers the
protobuf case while letting I/O errors propagate to the outer save handler.
* Harden fix_sentencepiece_gguf: widen except, protobuf fallback, revert USER_DEFINED widen, guard entry id
- save.py:2294: widen except from ImportError back to Exception. The loop-4
narrowing let JSONDecodeError / KeyError / OSError / PermissionError from
fix_sentencepiece_gguf abort the entire GGUF export, a regression vs
pre-PR behavior. The outer save_to_gguf try/except still covers GGUF-side
failures; any fix-side failure now logs a typed warning and lets
conversion proceed.
- tokenizer_utils.py:445: the direct 'from transformers.utils import
  sentencepiece_model_pb2' raises TypeError ("Descriptors cannot be created
  directly") on modern protobuf runtimes. Prepend a sys.modules.setdefault
  pre-population using transformers.convert_slow_tokenizer.import_protobuf()
  so the subsequent from-import finds a compatible module via the module
  cache (sketched after this list). The original import line is left verbatim
  in place as the final resolver.
- tokenizer_utils.py:483: revert loop-4 widening; retag only NORMAL pieces
to CONTROL. Retagging USER_DEFINED pieces caused a concrete tokenization
regression where an intentionally-USER_DEFINED in-vocab special token had
its sentencepiece encoding broken ('<user> hello' changed from [11, 3, 8]
to [11, 0, 12, 21, 0, 8]). The PR's stated scope is the NORMAL->CONTROL
Gemma case; USER_DEFINED handling is deferred.
- tokenizer_utils.py:475: defensive guard around entry["id"]. A malformed
added_tokens entry missing the "id" field or with a non-int id is now
skipped rather than raising KeyError / inserting garbage.
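The tokenizer_utils.py:445 pre-population, sketched; import_protobuf is the existing transformers fallback loader, and the trick relies on the from-import consulting sys.modules when attribute lookup on the package fails:
```
import sys
from transformers.convert_slow_tokenizer import import_protobuf

# Pre-populate the module cache with a protobuf-runtime-compatible module
# so the verbatim import below resolves instead of raising TypeError.
sys.modules.setdefault(
    "transformers.utils.sentencepiece_model_pb2", import_protobuf()
)
from transformers.utils import sentencepiece_model_pb2  # original line, kept verbatim
```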
* Add review tests for sentencepiece GGUF fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: octo-patch <octo-patch@github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [WIP] Fast inference for qwen3.5
* fix tokenizer not saving properly
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* extend to VLM and clean up
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* gate tokenizer.model saving
* fix for gated/private models
* Fix tokenizer save review findings
- save.py:261 restore dict-based _TOKENIZER_MODEL_CACHE so negative
  results are cached (sketched below); the set() in 0129fb5e regressed
  non-SentencePiece tokenizer saves to a fresh HfApi.model_info call on
  every checkpoint. Don't cache on exception so gated/private repos can
  retry later with a valid token.
- save.py:282 guard `repo_info.siblings` with `or []`; huggingface_hub
types this Optional and returns None for empty or new repos, which
made any() raise TypeError out of save_pretrained.
- save.py:3487 split push_to_hub into local save + _preserve + push so
uploaded tokenizer_config.json/tokenizer.model include the fix rather
than the unfixed copies written before the upload.
- save.py:3352 call patch_saving_functions on tokenizers passed to
unsloth_save_pretrained_torchao to match the other three save
entrypoints; previously torchao saves skipped the preservation patch.
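A sketch of the cache behaviour from the first two items; HfApi.model_info and RepoSibling.rfilename are real huggingface_hub API, while the helper name and shape are illustrative:
```
from huggingface_hub import HfApi

_TOKENIZER_MODEL_CACHE = {}  # dict, so negative results are cached too

def _hub_has_tokenizer_model(repo_id, token=None):
    if repo_id in _TOKENIZER_MODEL_CACHE:
        return _TOKENIZER_MODEL_CACHE[repo_id]
    try:
        repo_info = HfApi().model_info(repo_id, token=token)
    except Exception:
        # Gated/private repo or transient error: do NOT cache, so a
        # later call with a valid token can retry.
        return False
    # huggingface_hub types siblings as Optional and may return None.
    siblings = repo_info.siblings or []
    result = any(s.rfilename == "tokenizer.model" for s in siblings)
    _TOKENIZER_MODEL_CACHE[repo_id] = result
    return result
```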
* Fix push_to_hub repo_id conflict and torchao token forwarding
- save.py:3493-3496 pop `repo_id` from kwargs (defaulting to
`save_directory`) before calling `self.push_to_hub(repo_id, **kwargs)`.
The previous `self.push_to_hub(save_directory, **kwargs)` passed
`save_directory` as the first positional `repo_id` while also
forwarding a user-supplied `repo_id` through kwargs, raising
`TypeError: got multiple values for argument 'repo_id'` on the
standard `save_pretrained(local_path, push_to_hub=True, repo_id=...)`
call shape. This regression was introduced by the earlier iteration
that split push_to_hub into an explicit second step.
- save.py:3314 forward `token=token` on the torchao non-PEFT
`tokenizer.save_pretrained(torchao_save_directory)` call so the
patched wrapper can reach gated repos when HF_TOKEN is not in the
environment. Left the sibling `unsloth_generic_save` call at 3063
untouched (blame points at an earlier full-finetuned
save_pretrained_merged fix and the token gap there is lower risk).
* Fix torchao tokenizer reload and push_to_hub repo_id default
- save.py:3283 after `auto_processor.from_pretrained(save_directory)`
re-runs `patch_saving_functions(tokenizer)` on the freshly loaded
tokenizer. The rebind at 3283 was overwriting the patched tokenizer
passed into `unsloth_save_pretrained_torchao`, so the subsequent
`tokenizer.push_to_hub` (3309) and `tokenizer.save_pretrained`
(3314) bypassed `_preserve_sentencepiece_tokenizer_assets` and left
`{save_directory}-torchao` without `tokenizer.model` / restored
`added_tokens_decoder`.
- save.py:3497 fall back to `os.path.basename(save_directory)` for
`repo_id` instead of the raw `save_directory`. The round-2 fallback
diverged from `transformers.PreTrainedTokenizerBase.save_pretrained`,
which defaults `repo_id = save_directory.split(os.path.sep)[-1]`;
nested local paths like `./out/my-repo` now resolve to `my-repo`
(the Hub id) instead of the full filesystem path.
* Revert tokenizer save_pretrained repo_id basename fallback
- save.py:3497 default `repo_id` back to `save_directory` as-is rather
than `os.path.basename(save_directory)`. The basename fallback (added
last iteration to match upstream transformers) stripped the user
namespace from the Unsloth convention `tokenizer.save_pretrained(
"user/repo", push_to_hub=True)`, redirecting the upload to
`{current_user}/repo`. save.py itself treats `save_directory` as the
repo id at 572, 593, 1723, 1779, 1836, 1844, 1858, and 3025, so the
wrapper should follow the same convention. Users who pass a nested
filesystem path with `push_to_hub=True` can supply explicit
`repo_id=...`.
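After these three rounds, the final shape of the push path (a hedged reconstruction; the wrapper body is abbreviated):
```
def unsloth_tokenizer_save_pretrained(self, save_directory, **kwargs):
    push_to_hub = kwargs.pop("push_to_hub", False)
    # Local save + asset preservation run first, so the pushed files
    # include the fixed tokenizer_config.json / tokenizer.model.
    ...  # original save_pretrained + _preserve_sentencepiece_tokenizer_assets
    if push_to_hub:
        # Pop repo_id so it is never passed both positionally and via
        # kwargs; default to save_directory as-is, matching the Unsloth
        # convention of treating save_directory as the Hub repo id.
        repo_id = kwargs.pop("repo_id", save_directory)
        self.push_to_hub(repo_id, **kwargs)
```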
* Guard processor.tokenizer recursion against None
save.py:3511 change `elif hasattr(model, "tokenizer")` to
`elif getattr(model, "tokenizer", None) is not None`. The previous
guard only checked attribute existence; a ProcessorMixin that sets
`tokenizer = None` (audio-only or manually constructed) would enter
the branch and crash inside the recursive patch_saving_functions on
`model.push_to_hub.__name__`.
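The failure mode on an illustrative object:
```
class AudioOnlyProcessor:          # hypothetical processor with no tokenizer
    tokenizer = None

p = AudioOnlyProcessor()
print(hasattr(p, "tokenizer"))                     # True  -> old guard entered the branch
print(getattr(p, "tokenizer", None) is not None)   # False -> new guard skips it safely
```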
* Add review tests for tokenizer save
* Consolidate review tests
Drop redundant assertion in test_patch_saving_functions_still_patches_non_none_tokenizer.
The hasattr check already proves the patch applied; the or-chained
repeat assertion added no signal.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* vllm sampling params fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* do not patch base_trainer
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* separate vllm fixes
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Apply suggestion from @danielhanchen
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"
This reverts commit 58b483dc0d1790f99580665801d3fa0d7267c533.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"
This reverts commit b2497519659a9f301e7a633795d9efdafdc2b277.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"
This reverts commit de3daaf429f81aceb6632932b0cb1af5149652a8.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Summary:
Previously the test was not run correctly and saving to a local path was
not tested; this PR adds support for that and tests it properly.
Note: `python tests/saving/test_unsloth_save.py` does not run the tests.
Test Plan:
pytest tests/saving/test_unsloth_save.py -k test_save_torchao
Summary:
Allow users to merge the LoRA weights and then do post-training quantization with torchao
Usage:
```
from torchao.quantization import Int8DynamicActivationInt8WeightConfig

torchao_config = Int8DynamicActivationInt8WeightConfig()
model.save_pretrained_torchao(
    save_path,
    tokenizer=tokenizer,
    torchao_config=torchao_config,
)
```
Test Plan:
python tests/saving/test_unsloth_save.py