mirror of
https://github.com/unslothai/unsloth.git
synced 2026-05-17 21:14:06 +00:00
* Fix FastSentenceTransformer compatibility with sentence-transformers 5.4
* Support varied Transformer init signatures

  Detect Transformer.__init__ parameters and build init kwargs accordingly so trust_remote_code and other args are passed using the correct names. Instead of unconditionally using model_args/config_args, the code now inspects the constructor to decide between model_kwargs/config_kwargs vs model_args/config_args and also sets processor_kwargs or tokenizer_args when present. Initializes Transformer with constructed transformer_kwargs (including max_seq_length) to improve compatibility with different Transformer implementations.
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci
* Harden SentenceTransformer path and module checks
* Scrub .github/workflows for staging push (matches staging base)
* Guard auto_model write in FastSentenceTransformer._apply_torch_compile

  On sentence-transformers >=5.4 Transformer.auto_model is a read-only @property backed by self.model, so a direct assignment raises AttributeError. The two get_peft_model paths already guard the write with isinstance(getattr(type(...), "auto_model", None), property); the auto-compile path missed the same guard, which broke the default trainer path whenever max_steps >= _compile_threshold.
* Add tests for FastSentenceTransformer property guards
* Tighten FastSentenceTransformer redirect lifecycle tests

  Drop a duplicate assertion-less case, remove dead AST extraction helper, and trim unused imports. The remaining six tests cover substitution on match, restoration on constructor exception, passthrough for unrelated names, pathlib.Path normalisation, trailing slash handling, and the no-identifier guard.
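
The constructor inspection and the auto_model property guard described in the entries above can be sketched roughly as follows. This is an illustration only, not the code from the commit: NewStyleTransformer, OldStyleTransformer, build_transformer_kwargs, and set_auto_model are hypothetical names, and writing to module.model when auto_model is a property is an assumption based on the statement that the property is backed by self.model.

```python
import inspect


class NewStyleTransformer:
    """Hypothetical stand-in for a constructor using the *_kwargs names."""

    def __init__(self, name, max_seq_length=None, model_kwargs=None,
                 config_kwargs=None, processor_kwargs=None):
        self.max_seq_length = max_seq_length
        self.model = object()

    @property
    def auto_model(self):
        # Read-only view backed by self.model; there is no setter.
        return self.model


class OldStyleTransformer:
    """Hypothetical stand-in for a constructor using the *_args names."""

    def __init__(self, name, max_seq_length=None, model_args=None,
                 config_args=None, tokenizer_args=None):
        self.max_seq_length = max_seq_length
        self.auto_model = object()  # plain, writable attribute


def build_transformer_kwargs(transformer_cls, max_seq_length, trust_remote_code):
    """Pick keyword names based on what the constructor actually accepts."""
    params = inspect.signature(transformer_cls.__init__).parameters
    kwargs = {"max_seq_length": max_seq_length}
    name_pairs = (
        ("model_kwargs", "model_args"),
        ("config_kwargs", "config_args"),
        ("processor_kwargs", "tokenizer_args"),
    )
    for new_name, old_name in name_pairs:
        chosen = new_name if new_name in params else old_name
        if chosen in params:
            # A fresh dict per bucket, so no two buckets share state.
            kwargs[chosen] = {"trust_remote_code": trust_remote_code}
    return kwargs


def set_auto_model(module, new_model):
    """Assign the wrapped model without tripping over a read-only property."""
    if isinstance(getattr(type(module), "auto_model", None), property):
        module.model = new_model  # assumption: the property reads self.model
    else:
        module.auto_model = new_model


new_kwargs = build_transformer_kwargs(NewStyleTransformer, 512, True)
old_kwargs = build_transformer_kwargs(OldStyleTransformer, 512, True)
replacement = object()
new_module = NewStyleTransformer("demo", **new_kwargs)
old_module = OldStyleTransformer("demo", **old_kwargs)
set_auto_model(new_module, replacement)
set_auto_model(old_module, replacement)
```

Checking type(module) rather than the instance matters here: getattr on the instance would evaluate the property and return the model itself, so the isinstance test would never see the property object.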
* Sync .github/workflows with upstream author branch
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci
* Avoid sharing trust_remote_code kwargs dict across constructor buckets

  In FastSentenceTransformer._create_transformer_module, the same trust_remote_code_kwargs dict was being assigned to model_kwargs, config_kwargs, and processor_kwargs (or model_args / config_args / tokenizer_args) on the Transformer constructor. transformers' from_pretrained code paths (configuration_utils, auto_factory, processing_auto, etc.) call kwargs.pop("trust_remote_code", ...) on the dict they receive, which would drain the shared object and silently strip trust_remote_code from the other buckets. Pass an independent copy to each bucket so subsequent buckets and any pass-through auxiliary loads still see trust_remote_code.
* Wire do_lower_case and return_dict through Transformer init for ST 5.4

  In FastSentenceTransformer._create_transformer_module:

  - When Transformer.__init__ accepts do_lower_case (ST 5.4+), pass the unsloth tokenizer's do_lower_case as a constructor kwarg. The existing post-init attribute assignment alone is too late: ST 5.4's __init__ uses do_lower_case to install a Lowercase normalizer on tokenizer.backend_tokenizer.normalizer, which is not re-applied if we only set the attribute after construction. The post-init line is preserved untouched for older ST versions.
  - Add return_dict to the manually completed model_forward_params set so wrapped models with forward(*args, **kwargs) signatures keep ST's forced dict-like output safety net. ST 5.4's own __init__ unions the forward signature with the same set plus return_dict; the previous override silently dropped it.
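
The shared-dict problem described above is easy to reproduce in isolation. The sketch below is a toy model rather than the transformers code path: consume_trust_remote_code is a hypothetical stand-in for a loader that pops the key from the dict it receives, as the commit message says the from_pretrained paths do.

```python
def consume_trust_remote_code(kwargs):
    """Hypothetical loader step: pops the flag from the dict it is given."""
    return kwargs.pop("trust_remote_code", False)


bucket_names = ("model_kwargs", "config_kwargs", "processor_kwargs")

# Buggy pattern: every bucket points at the same dict object.
shared = {"trust_remote_code": True}
shared_buckets = {name: shared for name in bucket_names}
seen_with_shared = [consume_trust_remote_code(shared_buckets[n]) for n in bucket_names]

# Fixed pattern: each bucket receives its own shallow copy.
base = {"trust_remote_code": True}
copied_buckets = {name: dict(base) for name in bucket_names}
seen_with_copies = [consume_trust_remote_code(copied_buckets[n]) for n in bucket_names]

print(seen_with_shared)  # [True, False, False]
print(seen_with_copies)  # [True, True, True]
```

A shallow copy is enough in this toy case because the dict holds a single boolean; nested mutable values would need copy.deepcopy instead.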
* Preserve flash-attention forward keys when wrapping ST 5.4 Transformer

  Sentence-transformers 5.4's Transformer.__init__ calls _can_flatten_inputs() during construction, which augments self.model_forward_params with cu_seq_lens_q, cu_seq_lens_k, max_length_q, max_length_k, seq_idx whenever feature-extraction with text modality, the torch backend, flash-attention 2, and varlen flash-attn support are all available. The post-init override of transformer_module.model_forward_params used to replace the attribute outright, silently dropping those keys so ST's preprocess() filter stripped flash-attn kwargs before reaching model.forward. Snapshot the constructor-populated set first, leave the existing overwrite intact for the forward-signature plus tokenizer keys, and union the snapshot back in so flash-attn forwarding keeps working on ST 5.4. For older sentence-transformers releases the attribute is absent and getattr returns an empty set, leaving behavior unchanged.
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
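
The snapshot-and-union step from the flash-attention entry above can be sketched as below. This is a minimal illustration under stated assumptions: FakeTransformerModule, example_forward, and override_forward_params are hypothetical names, and the tokenizer key set is made up for the example.

```python
import inspect

FLASH_KEYS = {"cu_seq_lens_q", "cu_seq_lens_k", "max_length_q", "max_length_k", "seq_idx"}


class FakeTransformerModule:
    """Hypothetical module; newer versions pre-populate model_forward_params."""

    def __init__(self, populate_flash_keys):
        if populate_flash_keys:
            self.model_forward_params = {"input_ids", "attention_mask"} | FLASH_KEYS


def example_forward(input_ids, attention_mask=None):
    """Hypothetical forward function whose signature supplies the base keys."""
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def override_forward_params(module, forward_fn, tokenizer_keys):
    # 1. Snapshot whatever the constructor populated (empty set if absent).
    snapshot = set(getattr(module, "model_forward_params", set()))
    # 2. Overwrite with forward-signature keys plus tokenizer keys.
    signature_keys = set(inspect.signature(forward_fn).parameters)
    module.model_forward_params = signature_keys | set(tokenizer_keys)
    # 3. Union the snapshot back in so constructor-added keys survive.
    module.model_forward_params |= snapshot
    return module.model_forward_params


tokenizer_keys = {"input_ids", "attention_mask", "token_type_ids"}  # illustrative
newer = override_forward_params(FakeTransformerModule(True), example_forward, tokenizer_keys)
older = override_forward_params(FakeTransformerModule(False), example_forward, tokenizer_keys)
```

For the module without a pre-populated attribute the snapshot is empty, so the result is just the signature and tokenizer keys, matching the "behavior unchanged" claim for older releases.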
| File |
|---|
| __init__.py |
| conftest.py |
| test_cross_platform_parity.py |
| test_dpo_vision_processor_passthrough.py |
| test_e2e_no_torch_sandbox.py |
| test_fast_sentence_transformer_redirect_lifecycle.py |
| test_flash_attn_install_python_stack.py |
| test_install_python_stack.py |
| test_no_torch_filtering.py |
| test_studio_import_no_torch.py |
| test_tokenizers_and_torch_constraint.py |
| test_unsloth_run_tool_policy_resolver.py |