unsloth

mirror of https://github.com/unslothai/unsloth.git synced 2026-07-09 15:58:41 +00:00

Daniel Han d4fbc81d3a Restore dropped FP8 weight_scale_inv tensors on load (#6978)	2026-07-09 06:44:44 -07:00

* Restore dropped FP8 weight_scale_inv tensors on load

  Some block-scale FP8 checkpoints (for example Qwen3.6-27B-FP8, issue #6200) load with transformers leaving an mlp.gate_proj as a plain bf16 Linear instead of an fp8 module. Its raw quantized values are read into the bf16 weight and the weight_scale_inv is dropped as an unexpected key, so the weight is used un-scaled and the base model is garbage (perplexity around 2 million).

  After load, for every checkpoint weight_scale_inv whose live weight is not fp8, dequantize the orphaned weight in place using the block scale from the checkpoint index. Modules that were converted correctly keep an fp8 weight and are skipped, so healthy checkpoints and single-file checkpoints are a no-op.

  Verified on Qwen3.6-27B-FP8: 64 gate_proj scales restored, perplexity 2028902 to 8.9. No-op on Qwen3-8B-FP8 (all scales already live).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Harden FP8 weight_scale_inv restore from review

  - Skip restore when the model has no fp8 weights, so an intentionally dequantized load (load_in_16bit) is never re-scaled and corrupted.
  - Thread revision, subfolder and cache_dir through the index and shard downloads so scales come from the same snapshot as the weights.
  - Cover unsharded single-file model.safetensors checkpoints (no index).
  - Handle transposed block-scale layouts and skip on a true grid mismatch instead of applying a wrong scale.
  - Match text-only VLM loads where the language_model prefix was stripped.
  - Restore on the FastLanguageModel text path too, not only vision.
  - Handle a scalar weight_block_size; per-tensor error handling so one bad tensor cannot abort the rest or hide a partial mutation.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Address second review round on FP8 scale restore

  - Bound peak memory: dequantize block views in place with the fp32 scale broadcast instead of materializing a full expanded scale and fp32 copy, so a near-VRAM-limit load is not pushed into OOM by the repair.
  - Restore on the sequence-classification load path too.
  - Cover more VLM key remappings (language_model.model.* to model.language_model.) when matching modules.
  - Skip the restore for variant loads (variant=...) rather than risk applying default-checkpoint scales to variant weights.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Align FP8 scale restore revision with the loaded weights and warn on disk-offloaded layers

  In llama.py the CausalLM/SequenceClassification weight loads resolve model_name on its default branch (revision is not forwarded there), so read the dropped weight_scale_inv tensors from the same default branch instead of the requested revision, avoiding rescaling default-branch weights with scales from another revision.

  In loader_utils.py a disk-offloaded layer keeps its weight on the meta device until the offload hook materializes it, so the scale cannot be applied in place. Skip such layers explicitly and print a warning rather than silently leaving them unscaled.

* Tighten comments in the FP8 scale restore path

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
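
The in-place block dequantization that the commit message describes can be illustrated with a small sketch. This is not the code from the commit: the function name `dequantize_blocks_in_place` is hypothetical, and plain nested lists stand in for the bf16 weight and fp32 scale tensors so the example runs without torch. It assumes one scale per (block rows x block cols) tile, accepts a scalar block size, tries the transposed scale layout, and leaves the weight untouched when the scale grid does not match.

```python
import math


def dequantize_blocks_in_place(weight, scale_inv, block_size):
    """Multiply every block of `weight` by its scale, mutating `weight`.

    weight     : list of rows (stand-in for the orphaned 2-D weight)
    scale_inv  : 2-D list holding one scale per block
    block_size : int or (block_rows, block_cols)

    Returns True when scales were applied, False when the scale grid does
    not fit the weight (the weight is then left unchanged).
    """
    # A scalar block size means square blocks.
    if isinstance(block_size, int):
        bm = bn = block_size
    else:
        bm, bn = block_size

    rows, cols = len(weight), len(weight[0])
    grid = (math.ceil(rows / bm), math.ceil(cols / bn))
    shape = (len(scale_inv), len(scale_inv[0]))

    if shape == grid:
        def lookup(bi, bj):
            return scale_inv[bi][bj]
    elif shape == (grid[1], grid[0]):
        # Scale grid stored transposed relative to the weight.
        def lookup(bi, bj):
            return scale_inv[bj][bi]
    else:
        # True grid mismatch: skip rather than apply a wrong scale.
        return False

    # Walk block by block so no full-size expanded scale is ever built.
    for bi in range(grid[0]):
        for bj in range(grid[1]):
            s = lookup(bi, bj)
            for i in range(bi * bm, min((bi + 1) * bm, rows)):
                row = weight[i]
                for j in range(bj * bn, min((bj + 1) * bn, cols)):
                    row[j] *= s
    return True


w = [[1.0] * 4 for _ in range(4)]
applied = dequantize_blocks_in_place(w, [[2.0, 3.0], [4.0, 5.0]], 2)
print(applied, w[0], w[3])  # True [2.0, 2.0, 3.0, 3.0] [4.0, 4.0, 5.0, 5.0]
```

Iterating per block keeps the extra memory at a single scale value at a time, which mirrors the stated goal of not materializing a full expanded scale. A tensor-based version would scale block views rather than loop over elements.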
notebooks	Reduce and tighten comments and docstrings across the test suite (#6429)	2026-06-18 01:07:09 -07:00
python	Fix Windows installer torch index override (#6972)	2026-07-09 03:46:47 -07:00
qlora	Formatting: ruff line-length 100, kwarg-spacing passes, drop blank after short local imports (#6079)	2026-06-08 04:24:13 -07:00
saving	Fix repeated base model downloads across checkpoint exports (#6896)	2026-07-06 23:38:29 -07:00
security	scan_packages: key baseline on matched-code hash so payloads in baselined files are not auto-suppressed (#6552)	2026-07-01 04:03:59 -07:00
sh	Fix Windows installer torch index override (#6972)	2026-07-09 03:46:47 -07:00
studio	Studio: remove dead direct_linux_release_plan path (#7030)	2026-07-09 05:09:16 -07:00
studio_setup_ps1	Make Visual Studio + CMake optional on Windows (prebuilt llama.cpp needs no build tools) (#6499)	2026-06-22 01:11:09 -07:00
utils	Keep native RoPE scaling when extending context; carry rope_theta for linear (#7028)	2026-07-09 04:20:41 -07:00
version_compat	Fix FastSentenceTransformer Qwen embedding preprocessing (#6939)	2026-07-09 01:46:22 -07:00
vllm_compat	Reduce and tighten comments and docstrings across the test suite (#6429)	2026-06-18 01:07:09 -07:00
__init__.py	Qwen 3, Bug Fixes (#2445)	2025-04-30 22:38:39 -07:00
_zoo_aggressive_cuda_spoof.py	Reduce and tighten comments and docstrings across the test suite (#6429)	2026-06-18 01:07:09 -07:00
_zoo_rocm_spoof.py	Add RDNA 2/3/4 ROCm routing tests via a CPU-only torch spoof (#6935)	2026-07-07 04:41:37 -07:00
conftest.py	Reduce and tighten comments and docstrings across the test suite (#6429)	2026-06-18 01:07:09 -07:00
run_all.sh	Installer: make UV_OVERRIDE space-safe on Apple Silicon (#6503) (#6639)	2026-06-24 17:34:18 -07:00
test_attention_implementation.py	fix(gpt-oss): prefer flex attention over sdpa (#5701)	2026-05-22 08:38:38 -07:00
test_attn_impl_honor_explicit.py	Honor an explicit sdpa or flex_attention request when flash is disabled (#6847)	2026-07-06 05:45:45 -07:00
test_bad_mappings_redirect.py	Fix BAD_MAPPINGS not redirecting the -unsloth-bnb-4bit dynamic quants (#6949)	2026-07-08 17:37:44 -03:00
test_callback_signature_drift.py	Reduce and tighten comments and docstrings across the test suite (#6429)	2026-06-18 01:07:09 -07:00
test_cli_export_unpacking.py	Reduce and tighten comments and docstrings across the test suite (#6429)	2026-06-18 01:07:09 -07:00
test_enforce_kwargs_spacing.py	Reduce and tighten comments and docstrings across the test suite (#6429)	2026-06-18 01:07:09 -07:00
test_fast_gemv_dispatch.py	Fix fast inference crash on compressed-tensors FP8 models (#7025)	2026-07-09 04:10:59 -07:00
test_fast_generate_slow_guard.py	fast_generate: clear error for vLLM-style inputs when fast_inference=False (#6786)	2026-07-03 08:16:32 -07:00
test_finetune_last_n_layers.py	Reduce and tighten comments and docstrings across the test suite (#6429)	2026-06-18 01:07:09 -07:00
test_fp8_device_context.py	Guard FP8 Triton launches with tensor device context (#6888)	2026-07-08 18:32:51 -03:00
test_fp8_restore_dropped_scale.py	Restore dropped FP8 weight_scale_inv tensors on load (#6978)	2026-07-09 06:44:44 -07:00
test_fp8_tiny_e8m0.py	Handle odd shapes and non-float scales in FP8BlockQuantLinear (#6848)	2026-07-06 05:44:55 -07:00
test_fused_ce_not_return_dict_logits.py	fix: use EMPTY_LOGITS on the fused-CE not-return_dict path (#2068) (#6482)	2026-06-23 01:26:55 -07:00
test_gemma4_chat_template.py	Reduce and tighten comments and docstrings across the test suite (#6429)	2026-06-18 01:07:09 -07:00
test_gemma_2b_mapper_key.py	Fix duplicate unsloth/gemma-2b-bnb-4bit mapper key routing the base 4bit repo to the instruct model (#6891)	2026-07-08 17:40:39 -03:00
test_generate_kwarg_gate.py	Fix gpt-oss offload_embedding and generate() kwargs, and guard offload_embedding on tied/vLLM models (#6774)	2026-07-01 22:39:00 -07:00
test_get_model_name.py	Formatting: ruff line-length 100, kwarg-spacing passes, drop blank after short local imports (#6079)	2026-06-08 04:24:13 -07:00
test_gradient_checkpointing_restore.py	Fix TrainingArguments silently disabling unsloth gradient checkpointing (#6829)	2026-07-03 16:35:02 +01:00
test_ignored_tokenizer_casing.py	Match IGNORED_TOKENIZER_NAMES case-insensitively (#6620)	2026-06-23 19:40:50 -03:00
test_import_fixes_drift.py	fix: keep LoRA reloads working with PEFT 0.19 (#6748)	2026-06-30 20:26:57 +01:00
test_loader_glob_skip.py	Reduce and tighten comments and docstrings across the test suite (#6429)	2026-06-18 01:07:09 -07:00
test_missing_torchvision_vlm.py	Fix misleading 'only for image models' error for Qwen3-VL when torchvision is missing (#6525)	2026-06-23 01:28:09 -07:00
test_model_registry.py	Reduce and tighten comments and docstrings across the test suite (#6429)	2026-06-18 01:07:09 -07:00
test_moe_lora_targets.py	Scope MoE expert LoRA detection to actual MLP projection targets (#6849)	2026-07-06 05:45:06 -07:00
test_multi_image_grpo_chunking.py	Reduce and tighten comments and docstrings across the test suite (#6429)	2026-06-18 01:07:09 -07:00
test_nvfp4_quant_load.py	Reduce and tighten comments and docstrings across the test suite (#6429)	2026-06-18 01:07:09 -07:00
test_offline_loading_helpers.py	Fix offline checkpoint load/export: "tokenizer is weirdly not loaded" (#6554)	2026-06-25 23:16:53 -07:00
test_offload_embedding_hooks.py	Fix gpt-oss offload_embedding and generate() kwargs, and guard offload_embedding on tied/vLLM models (#6774)	2026-07-01 22:39:00 -07:00
test_offload_tied_guard.py	Fix gpt-oss offload_embedding and generate() kwargs, and guard offload_embedding on tied/vLLM models (#6774)	2026-07-01 22:39:00 -07:00
test_peft_tensor_parallel_compat.py	fix: keep LoRA reloads working with PEFT 0.19 (#6748)	2026-06-30 20:26:57 +01:00
test_peft_weight_converter_compat.py	Formatting: ruff line-length 100, kwarg-spacing passes, drop blank after short local imports (#6079)	2026-06-08 04:24:13 -07:00
test_prefetch_snapshot_scope.py	Auto Xet to HTTP download fallback in from_pretrained; share Studio's fallback via unsloth_zoo (#6638)	2026-07-06 05:13:25 -07:00
test_pretrain_compile_reset.py	Add regression tests for the stray-forward compile-cache reset (#6569)	2026-06-22 07:22:47 -07:00
test_public_api_surface.py	Reduce and tighten comments and docstrings across the test suite (#6429)	2026-06-18 01:07:09 -07:00
test_raw_text.py	Reduce and tighten comments and docstrings across the test suite (#6429)	2026-06-18 01:07:09 -07:00
test_resolve_model_class.py	Formatting: ruff line-length 100, kwarg-spacing passes, drop blank after short local imports (#6079)	2026-06-08 04:24:13 -07:00
test_studio_install_workspace_guard.py	Update studio root-resilience tests for the inference-backend refactor (#6490) (#6553)	2026-06-22 01:10:49 -07:00
test_studio_root_resilience.py	Update studio root-resilience tests for the inference-backend refactor (#6490) (#6553)	2026-06-22 01:10:49 -07:00
test_studio_shutdown_thread_wait.py	Studio: fix Ctrl+C shutdown ordering (installer shell + uvicorn thread wait) (#6566)	2026-06-22 07:41:10 -07:00
test_synthetic_chunk_data.py	fix: correct class name in SyntheticDataKit.chunk_data guard message (#6901)	2026-07-06 07:11:57 -07:00
test_tool_mask_zoo_compat.py	Formatting: ruff line-length 100, kwarg-spacing passes, drop blank after short local imports (#6079)	2026-06-08 04:24:13 -07:00
test_uninitialized_position_ids.py	Load DeepSeek-OCR and other VLMs that register AutoModel in auto_map (#6421)	2026-06-18 07:03:43 -07:00
test_video_path_validation.py	Reduce and tighten comments and docstrings across the test suite (#6429)	2026-06-18 01:07:09 -07:00
test_vllm_broken_detection.py	Fix fast_inference crash on ABI-broken vLLM: probe compiled extensions, not just import vllm (#6621)	2026-06-26 22:43:36 -07:00
test_windows_rocm_bnb_version.py	Reduce and tighten comments and docstrings across the test suite (#6429)	2026-06-18 01:07:09 -07:00