* chore: fix typos in studio/backend/routes/models.py
* chore: fix typos in tests/saving/non_peft/test_mistral_non_peft.py
* chore: fix typos in tests/saving/non_peft/test_whisper_non_peft.py
* chore: fix typos in tests/saving/vision_models/test_index_file_sharded_model.py
* chore: fix typos in tests/saving/vision_models/test_push_to_hub_merged.py
* chore: fix typos in tests/saving/vision_models/test_save_merge_qwen2.5vl32B_model_ocr_benchmark.py
* chore: fix typos in tests/saving/vision_models/test_save_merge_vision_model_ocr_benchmark.py
* chore: fix typos in unsloth/import_fixes.py
* Split: keep only 6 file(s)
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
* fix: patch CONTROL type for special tokens in sentencepiece GGUF export (fixes #5070)
When converting a Gemma 3 fine-tune to GGUF via save_pretrained_gguf,
tokens like <start_of_turn> (id=105) and <end_of_turn> (id=106) are
already present in the sentencepiece model but are typed as NORMAL (1)
instead of CONTROL (3). llama.cpp only recognises CONTROL tokens when
parse_special=True is active, so these tokens get BPE-split during
chat inference and the model produces garbage output.
fix_sentencepiece_gguf now reads tokenizer.json's added_tokens list and,
for any token with "special": true whose ID falls within the existing
sentencepiece vocabulary, updates its type from NORMAL to CONTROL before
writing the patched tokenizer.model to disk. The same CONTROL type is
also applied when new tokens are appended for the out-of-range case, so
both code paths are consistent.
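A minimal sketch of the retag, folding in the bounds and malformed-entry guards from the review rounds below; the helper name is illustrative and the body is a reconstruction, not the verbatim fix_sentencepiece_gguf:
```
# Illustrative retag, assuming tokenizer.json and tokenizer.model sit
# side by side in save_directory.
import json, os
from transformers.convert_slow_tokenizer import import_protobuf

def retag_special_tokens(save_directory):
    model_pb2 = import_protobuf()
    tokenizer_file = model_pb2.ModelProto()
    with open(os.path.join(save_directory, "tokenizer.model"), "rb") as f:
        tokenizer_file.ParseFromString(f.read())
    with open(os.path.join(save_directory, "tokenizer.json")) as f:
        added_tokens = json.load(f).get("added_tokens", [])

    NORMAL  = model_pb2.ModelProto.SentencePiece.NORMAL   # type 1
    CONTROL = model_pb2.ModelProto.SentencePiece.CONTROL  # type 3
    patched = 0
    for entry in added_tokens:
        token_id = entry.get("id")
        if not entry.get("special") or not isinstance(token_id, int):
            continue  # skip non-special or malformed entries
        # Only retag tokens already inside the sentencepiece vocabulary.
        if 0 <= token_id < len(tokenizer_file.pieces):
            piece = tokenizer_file.pieces[token_id]
            if piece.type == NORMAL:
                piece.type = CONTROL
                patched += 1
    if patched > 0:
        with open(os.path.join(save_directory, "tokenizer.model"), "wb") as f:
            f.write(tokenizer_file.SerializeToString())
```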
* Wire fix_sentencepiece_gguf into tokenizer save path and guard np.diff
- save.py: call fix_sentencepiece_gguf inside unsloth_tokenizer_save_pretrained
after _preserve_sentencepiece_tokenizer_assets. The helper was previously
unreferenced in the repo, so the PR's CONTROL-type patch never actually ran
during save_pretrained_gguf.
- tokenizer_utils.py: add an early-return guard for len(added_tokens_ids) < 2
before the existing np.diff contiguity check. np.diff on a single-element
array returns [] and .min() raises ValueError, which would discard the new
in-vocab CONTROL patch; the guard flushes tokenizer.model first. Guard is
inserted before the existing lines (diff = np.diff(...) and the min/max
check) so their blame is unchanged.
Dropped the separate refactor to fold the four duplicated "if patched > 0:
write tokenizer.model" blocks into a helper because doing so re-indents
lines whose blame is "Formatting & bug fixes"; the duplication
remains the author's pattern.
* Fix review findings: negative token_id guard and np.diff single-element
- tokenizer_utils.py:481: add a 0 <= lower bound to the special_token_ids
  bounds check. Previously a negative token_id from tokenizer.json passed
  'token_id < sentence_piece_size', and Python's negative indexing wrapped it
  to tokenizer_file.pieces[-1], silently retagging the last piece to CONTROL
  (illustrated below).
- tokenizer_utils.py:513: replace the loop-1 'if len < 2: return' guard
(which was too broad: it silently skipped vocab extension for single-entry
added_tokens.json) with a pre-pass that substitutes a trivially-contiguous
2-element sentinel for the contiguity check, then restores the original
array before the append loop. Lines 519 ('diff = np.diff(added_tokens_ids)')
and 520-529 (min/max/boundary checks and early-return write blocks) are
left literally unchanged so blame remains intact.
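The wraparound that the new lower bound rejects, on illustrative values:
```
# Why 'token_id < sentence_piece_size' alone is not enough:
pieces = ["<pad>", "<s>", "</s>"]    # stand-in for tokenizer_file.pieces
token_id = -1
print(token_id < len(pieces))        # True  -> the old check passes
print(pieces[token_id])              # '</s>' -> negative index wraps to the last piece
print(0 <= token_id < len(pieces))   # False -> the new guard rejects it
```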
* Restore real added_tokens_ids before min boundary check
Move the '_real_added_tokens_ids' restore above the
'added_tokens_ids.min() != sentence_piece_size' check. With the previous
order the sentinel [sentence_piece_size, sentence_piece_size + 1] was
still in scope when the min check ran, so any single-entry added_tokens.json
with an out-of-range start id (e.g. 99 when sentence_piece_size=2) bypassed
the boundary check and fell through to the append loop.
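Combining the last two entries, the final shape of the guard (a hedged reconstruction; variable names follow the notes above, and the stand-in return strings abbreviate the real flush-and-return blocks):
```
import numpy as np

def check_and_extend(added_tokens_ids, sentence_piece_size):
    _real_added_tokens_ids = added_tokens_ids
    if len(added_tokens_ids) < 2:
        # np.diff on a 1-element array returns [] and [].min() raises
        # ValueError; substitute a trivially contiguous pair for the
        # contiguity check only.
        added_tokens_ids = np.array([sentence_piece_size, sentence_piece_size + 1])

    diff = np.diff(added_tokens_ids)   # the existing line, left unchanged
    if diff.min() != 1:
        return "non-contiguous ids"    # stands in for the existing checks

    added_tokens_ids = _real_added_tokens_ids  # restore BEFORE the boundary check
    if added_tokens_ids.min() != sentence_piece_size:
        return "out-of-range start id" # now catches e.g. [99] with size 2

    return "extend vocabulary"         # append loop runs over the real ids

print(check_and_extend(np.array([99]), 2))  # 'out-of-range start id'
```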
* Scope fix_sentencepiece_gguf to GGUF export path only
Previously wired fix_sentencepiece_gguf into unsloth_tokenizer_save_pretrained,
which is the generic monkey-patch replacement for every tokenizer.save_pretrained
call. That caused the GGUF-specific mutation (and the unconditional protobuf
import in fix_sentencepiece_gguf) to run on every LoRA / merged 16-bit /
push_to_hub / torchao save, where it has no purpose and can abort the entire
save if the protobuf runtime is unavailable.
- save.py: remove fix_sentencepiece_gguf call from unsloth_tokenizer_save_pretrained.
- save.py: add the call inside unsloth_save_pretrained_gguf immediately before
save_to_gguf, wrapped in try/except so a protobuf import failure logs a
warning and lets GGUF conversion proceed rather than aborting the save.
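A sketch of that call site (the placement is per the notes above; the logger setup and message text are illustrative):
```
import logging
logger = logging.getLogger(__name__)

# Inside unsloth_save_pretrained_gguf, immediately before save_to_gguf:
try:
    fix_sentencepiece_gguf(save_directory)
except Exception as e:
    logger.warning(
        f"Unsloth: fix_sentencepiece_gguf failed ({type(e).__name__}: {e}), "
        "continuing GGUF conversion with the unpatched tokenizer.model."
    )
# save_to_gguf(...) runs next either way.
```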
* Broaden special-token retag to USER_DEFINED and narrow save.py except
- tokenizer_utils.py:483: the in-vocab retag previously only promoted NORMAL
pieces to CONTROL, but the real Gemma tokenizer (e.g.
unsloth/functiongemma-270m-it) stores <start_of_turn>/<end_of_turn> as
USER_DEFINED (type 4).
Extend the predicate to cover both NORMAL and USER_DEFINED so tokens marked
"special": true in tokenizer.json are promoted regardless of their current
sentencepiece type. Only tokens explicitly flagged special are touched, so
non-special USER_DEFINED pieces are unchanged; already-CONTROL pieces stay
unchanged. The warning message is generalised accordingly.
- save.py:2294: narrow the except clause from Exception to ImportError. The
loop-3 try/except was added to tolerate a missing protobuf runtime; leaving
it broad also swallows OSError/PermissionError mid-write, which would ship
a corrupted tokenizer.model to save_to_gguf. ImportError still covers the
protobuf case while letting I/O errors propagate to the outer save handler.
* Harden fix_sentencepiece_gguf: widen except, protobuf fallback, revert USER_DEFINED widen, guard entry id
- save.py:2294: widen except from ImportError back to Exception. The loop-4
narrowing let JSONDecodeError / KeyError / OSError / PermissionError from
fix_sentencepiece_gguf abort the entire GGUF export, a regression vs
pre-PR behavior. The outer save_to_gguf try/except still covers GGUF-side
failures; any fix-side failure now logs a typed warning and lets
conversion proceed.
- tokenizer_utils.py:445: the direct 'from transformers.utils import
  sentencepiece_model_pb2' raises TypeError ("Descriptors cannot be created
  directly") on modern protobuf runtimes. Prepend a sys.modules.setdefault
  pre-population using transformers.convert_slow_tokenizer.import_protobuf()
  so the subsequent from-import finds a compatible module via the module
  cache (sketched after this list). The original import line is left verbatim
  in place as the final resolver.
- tokenizer_utils.py:483: revert loop-4 widening; retag only NORMAL pieces
to CONTROL. Retagging USER_DEFINED pieces caused a concrete tokenization
regression where an intentionally-USER_DEFINED in-vocab special token had
its sentencepiece encoding broken ('<user> hello' changed from [11, 3, 8]
to [11, 0, 12, 21, 0, 8]). The PR's stated scope is the NORMAL->CONTROL
Gemma case; USER_DEFINED handling is deferred.
- tokenizer_utils.py:475: defensive guard around entry["id"]. A malformed
added_tokens entry missing the "id" field or with a non-int id is now
skipped rather than raising KeyError / inserting garbage.
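The tokenizer_utils.py:445 pre-population, sketched; import_protobuf is the existing transformers fallback loader, and the trick relies on the from-import consulting sys.modules when attribute lookup on the package fails:
```
import sys
from transformers.convert_slow_tokenizer import import_protobuf

# Pre-populate the module cache with a protobuf-runtime-compatible module
# so the verbatim import below resolves instead of raising TypeError.
sys.modules.setdefault(
    "transformers.utils.sentencepiece_model_pb2", import_protobuf()
)
from transformers.utils import sentencepiece_model_pb2  # original line, kept verbatim
```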
* Add review tests for sentencepiece GGUF fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: octo-patch <octo-patch@github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* [WIP] Fast inference for qwen3.5
* fix tokenizer not saving properly
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* extend to VLM and clean up
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* gate tokenizer.model saving
* fix for gated/private models
* Fix tokenizer save review findings
- save.py:261 restore dict-based _TOKENIZER_MODEL_CACHE so negative
  results are cached (sketched below); the set() in 0129fb5e regressed
  non-SentencePiece tokenizer saves to a fresh HfApi.model_info call on
  every checkpoint. Don't cache on exception so gated/private repos can
  retry later with a valid token.
- save.py:282 guard `repo_info.siblings` with `or []`; huggingface_hub
types this Optional and returns None for empty or new repos, which
made any() raise TypeError out of save_pretrained.
- save.py:3487 split push_to_hub into local save + _preserve + push so
uploaded tokenizer_config.json/tokenizer.model include the fix rather
than the unfixed copies written before the upload.
- save.py:3352 call patch_saving_functions on tokenizers passed to
unsloth_save_pretrained_torchao to match the other three save
entrypoints; previously torchao saves skipped the preservation patch.
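A sketch of the cache behaviour from the first two items; HfApi.model_info and RepoSibling.rfilename are real huggingface_hub API, while the helper name and shape are illustrative:
```
from huggingface_hub import HfApi

_TOKENIZER_MODEL_CACHE = {}  # dict, so negative results are cached too

def _hub_has_tokenizer_model(repo_id, token=None):
    if repo_id in _TOKENIZER_MODEL_CACHE:
        return _TOKENIZER_MODEL_CACHE[repo_id]
    try:
        repo_info = HfApi().model_info(repo_id, token=token)
    except Exception:
        # Gated/private repo or transient error: do NOT cache, so a
        # later call with a valid token can retry.
        return False
    # huggingface_hub types siblings as Optional and may return None.
    siblings = repo_info.siblings or []
    result = any(s.rfilename == "tokenizer.model" for s in siblings)
    _TOKENIZER_MODEL_CACHE[repo_id] = result
    return result
```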
* Fix push_to_hub repo_id conflict and torchao token forwarding
- save.py:3493-3496 pop `repo_id` from kwargs (defaulting to
`save_directory`) before calling `self.push_to_hub(repo_id, **kwargs)`.
The previous `self.push_to_hub(save_directory, **kwargs)` passed
`save_directory` as the first positional `repo_id` while also
forwarding a user-supplied `repo_id` through kwargs, raising
`TypeError: got multiple values for argument 'repo_id'` on the
standard `save_pretrained(local_path, push_to_hub=True, repo_id=...)`
call shape. This regression was introduced by the earlier iteration
that split push_to_hub into an explicit second step.
- save.py:3314 forward `token=token` on the torchao non-PEFT
`tokenizer.save_pretrained(torchao_save_directory)` call so the
patched wrapper can reach gated repos when HF_TOKEN is not in the
environment. Left the sibling `unsloth_generic_save` call at 3063
untouched (blame points at an earlier full-finetuned
save_pretrained_merged fix and the token gap there is lower risk).
* Fix torchao tokenizer reload and push_to_hub repo_id default
- save.py:3283 after `auto_processor.from_pretrained(save_directory)`
re-runs `patch_saving_functions(tokenizer)` on the freshly loaded
tokenizer. The rebind at 3283 was overwriting the patched tokenizer
passed into `unsloth_save_pretrained_torchao`, so the subsequent
`tokenizer.push_to_hub` (3309) and `tokenizer.save_pretrained`
(3314) bypassed `_preserve_sentencepiece_tokenizer_assets` and left
`{save_directory}-torchao` without `tokenizer.model` / restored
`added_tokens_decoder`.
- save.py:3497 fall back to `os.path.basename(save_directory)` for
`repo_id` instead of the raw `save_directory`. The round-2 fallback
diverged from `transformers.PreTrainedTokenizerBase.save_pretrained`,
which defaults `repo_id = save_directory.split(os.path.sep)[-1]`;
nested local paths like `./out/my-repo` now resolve to `my-repo`
(the Hub id) instead of the full filesystem path.
* Revert tokenizer save_pretrained repo_id basename fallback
- save.py:3497 default `repo_id` back to `save_directory` as-is rather
than `os.path.basename(save_directory)`. The basename fallback (added
last iteration to match upstream transformers) stripped the user
namespace from the Unsloth convention `tokenizer.save_pretrained(
"user/repo", push_to_hub=True)`, redirecting the upload to
`{current_user}/repo`. save.py itself treats `save_directory` as the
repo id at 572, 593, 1723, 1779, 1836, 1844, 1858, and 3025, so the
wrapper should follow the same convention. Users who pass a nested
filesystem path with `push_to_hub=True` can supply explicit
`repo_id=...`.
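After these three rounds, the final shape of the push path (a hedged reconstruction; the wrapper body is abbreviated):
```
def unsloth_tokenizer_save_pretrained(self, save_directory, **kwargs):
    push_to_hub = kwargs.pop("push_to_hub", False)
    # Local save + asset preservation run first, so the pushed files
    # include the fixed tokenizer_config.json / tokenizer.model.
    ...  # original save_pretrained + _preserve_sentencepiece_tokenizer_assets
    if push_to_hub:
        # Pop repo_id so it is never passed both positionally and via
        # kwargs; default to save_directory as-is, matching the Unsloth
        # convention of treating save_directory as the Hub repo id.
        repo_id = kwargs.pop("repo_id", save_directory)
        self.push_to_hub(repo_id, **kwargs)
```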
* Guard processor.tokenizer recursion against None
save.py:3511 change `elif hasattr(model, "tokenizer")` to
`elif getattr(model, "tokenizer", None) is not None`. The previous
guard only checked attribute existence; a ProcessorMixin that sets
`tokenizer = None` (audio-only or manually constructed) would enter
the branch and crash inside the recursive patch_saving_functions on
`model.push_to_hub.__name__`.
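The failure mode on an illustrative object:
```
class AudioOnlyProcessor:          # hypothetical processor with no tokenizer
    tokenizer = None

p = AudioOnlyProcessor()
print(hasattr(p, "tokenizer"))                     # True  -> old guard entered the branch
print(getattr(p, "tokenizer", None) is not None)   # False -> new guard skips it safely
```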
* Add review tests for tokenizer save
* Consolidate review tests
Drop redundant assertion in test_patch_saving_functions_still_patches_non_none_tokenizer.
The hasattr check already proves the patch applied; the or-chained
repeat assertion added no signal.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
* vllm sampling params fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* do not patch base_trainer
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* separate vllm fixes
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Apply suggestion from @danielhanchen
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"
This reverts commit 58b483dc0d1790f99580665801d3fa0d7267c533.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"
This reverts commit b2497519659a9f301e7a633795d9efdafdc2b277.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"
This reverts commit de3daaf429f81aceb6632932b0cb1af5149652a8.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Summary:
Previously the test was not run correctly and saving to a local path was
not tested; this PR adds support for that and tests it properly.
Note: `python tests/saving/test_unsloth_save.py` does not run the tests.
Test Plan:
pytest tests/saving/test_unsloth_save.py -k test_save_torchao
Summary:
Allow users to merge the LoRA weights and then do post-training quantization with torchao
Usage:
```
from torchao.quantization import Int8DynamicActivationInt8WeightConfig

torchao_config = Int8DynamicActivationInt8WeightConfig()
model.save_pretrained_torchao(
    save_path,
    tokenizer=tokenizer,
    torchao_config=torchao_config,
)
```
Test Plan:
python tests/saving/test_unsloth_save.py