unsloth

mirror of https://github.com/unslothai/unsloth.git synced 2026-05-18 23:35:25 +00:00

Datta Nimmaturi 4f9c8321a2 Fix DPO trainer multi process hang (#5199)

* Fix DPO trainer multi process hang
* Fix datacollator error
* further dpo vision changes
* cleanup
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci
* Harden DPO vision row processing and source rewrites

  - dpo_trainer_vision_signature_columns: also match TRL 0.22.x layout (image_sizes followed by ref_chosen_logps), so vision keys are not stripped via remove_unused_columns on the originally-affected version.
  - dpo_trainer_concatenated_inputs: fall back to inserting after the image_sizes block when no token_type_ids anchor follows it.
  - Apply the same vision model_kwargs forwarding rewrite to _compute_loss_liger via dpo_trainer_compute_loss_liger so the Liger DPO path does not drop pixel_position_ids/image_position_ids/mm_token_type_ids when args.use_liger_loss is true.
  - dpo_trainer_vision_process_row:
    - guard chosen/rejected EOS append with tokenizer.eos_token_id is not None
    - use features.get("images") and features.get("prompt") to match the existing get on line 164 and avoid KeyError on rows without those keys
    - drop the torch.is_tensor gate so list-form pixel_position_ids/image_position_ids returned without return_tensors are still aliased
    - skip the loop entry for image_position_ids when it was already promoted to pixel_position_ids, so the output dict no longer carries both keys with identical data
  - dpo_trainer_data_collator_vision_keys: switch from pad_sequence to trl.trainer.utils.pad with padding_side='left' (matches the DPO collator's prompt left-pad) and padding_value=-1 for _position_ids keys (sentinel for padded patches), 0 otherwise. Skip the key when not every example carries it. Falls back to pad_sequence if trl.pad is unavailable or the tensor rank is too high.
  - dpo_trainer_prepare_dataset: keep TRL's writer_batch_size=10 when popping num_proc; removing it defaults to 1000 and reintroduces the vision OOM risk that writer_batch_size=10 was set to avoid.

  DPO vision row: keep upstream-facing keys and fix patch padding

  - dpo_trainer_vision_process_row: no longer aliases image_position_ids to pixel_position_ids. Each upstream-emitted vision key is forwarded under its own name. Gemma4 ForConditionalGeneration.forward accepts image_position_ids directly and renames it to pixel_position_ids only at the vision-tower call site, so aliasing in the row helper hid the kwarg the model actually consumes.
  - dpo_trainer_vision_process_row: extract pixel_values via "in" membership instead of unconditional indexing. With the missing-images path returning [] to the processor, modern processors no longer emit a pixel_values key, and the previous indexing raised KeyError.
  - dpo_trainer_data_collator_vision_keys: pick padding_side per key family. _position_ids tensors are patch-aligned to pixel_values (TRL's DataCollatorForPreference right-pads pixel_values), so pad them right with the -1 sentinel; mm_token_type_ids is token-aligned to prompt_input_ids (left-padded by TRL), so pad it left with 0.

  DPO vision: handle multi-image prompts and arbitrary-rank collator pad

  - dpo_trainer_vision_process_row: when a prompt is missing vision placeholders, insert one placeholder per missing image instead of always inserting a single token. Multi-image rows now satisfy the processor's token-vs-image count check rather than under-inserting and tripping the placeholder/feature mismatch.
  - dpo_trainer_data_collator_vision_keys: drop the dim()<=2 gate around trl.trainer.utils.pad. trl.pad handles arbitrary rank correctly, while the previous fallback to torch.nn.utils.rnn.pad_sequence raised RuntimeError on rank-3 patch-position tensors with mismatched non-leading dimensions. The pad_sequence path remains as a degraded fallback only when trl.pad is unavailable or raises.
* DPO vision row: support scalar images and align prompt-aligned aux ids

  - dpo_trainer_vision_process_row: type-aware normalization of the features['images'] column instead of a truthiness/len check that raised on single image objects (PIL.Image has no __len__) and on numpy ndarrays (truthiness ambiguous). Lists/tuples count as their length, scalar image objects count as one, None counts as zero, and the original value is forwarded to the processor.
  - dpo_trainer_vision_process_row: when max_prompt_length truncates prompt_input_ids, also slice token_type_ids and mm_token_type_ids by the same [-max_prompt_length:] suffix. Those keys are 1:1 token aligned to prompt_input_ids (Gemma 4 vision attention keys off mm_token_type_ids per modular_gemma4.py), so leaving them at the original length silently misaligned the multimodal mask.
* DPO vision row: stop synthesizing vision-token placeholders

  Pass features['prompt'] and features['images'] straight to the processor without inserting any extra placeholder tokens. The previous helper used processing_class.image_token, which is the right prompt placeholder for Gemma 4 but the wrong one for Gemma 3 (whose prompt placeholder is boi_token while image_token is the inner expansion target). Synthesizing that token also broke multi-image rows: text ended up with N placeholders while the row helper only forwarded the first image's pixel_values via the standard [0] indexing that mirrors upstream TRL process_row, so token vs image-feature counts diverged. Removing the synthesis matches stock TRL behavior; users provide the correct placeholders for their processor in the prompt.
* Add tests for DPO vision row processor passthrough
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>

2026-04-29 04:15:34 -07:00
__init__.py	Consolidate dual venvs and separate install from update (#4530)	2026-03-25 05:24:21 -07:00
conftest.py	fix: add tokenizers to no-torch deps and TORCH_CONSTRAINT for arm64 macOS py313+ (#4748)	2026-04-01 06:12:17 -07:00
test_cross_platform_parity.py	Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var (#5024)	2026-04-15 11:39:11 +04:00
test_dpo_vision_processor_passthrough.py	Fix DPO trainer multi process hang (#5199)	2026-04-29 04:15:34 -07:00
test_e2e_no_torch_sandbox.py	tests: add no-torch / Intel Mac test suite (#4646)	2026-03-27 02:33:45 -07:00
test_flash_attn_install_python_stack.py	[Studio] Install flash attn at setup time for linux (#4979)	2026-04-14 16:40:17 +04:00
test_install_python_stack.py	Consolidate dual venvs and separate install from update (#4530)	2026-03-25 05:24:21 -07:00
test_no_torch_filtering.py	[Studio] Install flash attn at setup time for linux (#4979)	2026-04-14 16:40:17 +04:00
test_studio_import_no_torch.py	tests: add no-torch / Intel Mac test suite (#4646)	2026-03-27 02:33:45 -07:00
test_tokenizers_and_torch_constraint.py	fix: add tokenizers to no-torch deps and TORCH_CONSTRAINT for arm64 macOS py313+ (#4748)	2026-04-01 06:12:17 -07:00
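
The commit message for #5199 describes three small rules in prose: type-aware counting of the images column, slicing token-aligned auxiliary ids by the same suffix as prompt_input_ids, and choosing the padding side and value per key family. The following is a minimal sketch of those rules, not the code from the commit. It uses plain Python lists in one dimension instead of tensors, does not call trl.trainer.utils.pad, and every function name in it (count_images, truncate_prompt_aligned, pad_1d, pad_vision_key) is hypothetical.

```python
def count_images(images):
    """Count images without calling len() or bool() on a single image object."""
    if images is None:
        return 0
    if isinstance(images, (list, tuple)):
        return len(images)
    # Assumption: any other object (e.g. a single PIL image or ndarray) is one image.
    return 1


def truncate_prompt_aligned(row, max_prompt_length,
                            aligned_keys=("token_type_ids", "mm_token_type_ids")):
    """Keep the last max_prompt_length tokens of the prompt and of every
    key that is 1:1 token-aligned with it, so they stay the same length."""
    out = dict(row)
    if not max_prompt_length:
        # Assumption of this sketch: None or 0 means "do not truncate".
        return out
    for key in ("prompt_input_ids", *aligned_keys):
        if key in out:
            out[key] = out[key][-max_prompt_length:]
    return out


def pad_1d(seqs, padding_value, padding_side):
    """Pad a batch of 1-D lists to a common length on the given side."""
    width = max(len(s) for s in seqs)
    padded = []
    for s in seqs:
        fill = [padding_value] * (width - len(s))
        padded.append(list(s) + fill if padding_side == "right" else fill + list(s))
    return padded


def pad_vision_key(key, seqs):
    """Pick side and value per key family, following the commit message:
    *_position_ids are right-padded with -1, other keys left-padded with 0."""
    if key.endswith("_position_ids"):
        return pad_1d(seqs, -1, "right")
    return pad_1d(seqs, 0, "left")


row = {"prompt_input_ids": [1, 2, 3, 4], "mm_token_type_ids": [0, 1, 1, 0]}
short = truncate_prompt_aligned(row, 2)
batch_positions = pad_vision_key("image_position_ids", [[5, 6], [7]])
batch_types = pad_vision_key("mm_token_type_ids", [[1, 1], [1]])
```

In this sketch the truncated prompt and its mm_token_type_ids keep equal lengths, which is the alignment property the commit message says was silently lost before the fix.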