fix: add missing gpu_experts_mask=None to KTMoEWrapper call in SFT wrapper

KTMoEWrapper.__new__() requires gpu_experts_mask as a positional
argument, but the SFT wrapper omitted it, so MoE layer wrapping failed
silently and FSDP2 then attempted to broadcast all expert weights
(OOM/NCCL crash).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
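
For context, a minimal sketch of the silent-failure mode described above. This is not the repository's actual code: the class body, the placeholder argument values, and the try/except around the wrapping loop are assumptions; only the required gpu_experts_mask argument comes from the commit message.

class KTMoEWrapper:
    # Assumed signature: gpu_experts_mask has no default, so omitting it
    # raises TypeError (per the commit message above).
    def __new__(cls, num_experts_per_tok, hidden_size, moe_intermediate_size,
                gpu_experts_mask, num_gpu_experts, cpuinfer_threads,
                threadpool_count):
        return super().__new__(cls)

wrapped = []
for layer_idx in range(3):  # stand-in for iterating the model's MoE layers
    try:
        wrapped.append(KTMoEWrapper(
            num_experts_per_tok=8,   # placeholder values
            hidden_size=4096,
            moe_intermediate_size=1408,
            num_gpu_experts=0,
            cpuinfer_threads=1,
            threadpool_count=1,
        ))  # gpu_experts_mask omitted, as in the buggy call
    except TypeError:
        # If the loop swallows the error like this, the layer is silently
        # left unwrapped, and FSDP2 later tries to broadcast the full
        # expert weights on every rank (the OOM/NCCL crash above).
        continue

print(len(wrapped))  # 0 -- no layer was actually wrapped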
mrhaoxx 2026-04-10 02:18:40 +08:00
parent 5bfcb5f784
commit 6d4632b8c7
@@ -312,6 +312,7 @@ def wrap_moe_layers_with_kt_wrapper(model: nn.Module, kt_plugin: Any) -> list[KT
     num_experts_per_tok=moe_config.num_experts_per_tok,
     hidden_size=hidden_size,
     moe_intermediate_size=moe_config.intermediate_size,
+    gpu_experts_mask=None,
     num_gpu_experts=0,
     cpuinfer_threads=getattr(cfg, "kt_num_threads", 1),
     threadpool_count=threadpool_count,
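
For reference, the corrected call site as a runnable sketch, reusing the hypothetical KTMoEWrapper class above. moe_config, cfg, hidden_size, and threadpool_count are stand-ins for values taken from the real config, and the meaning attributed to gpu_experts_mask=None is an assumption.

from types import SimpleNamespace

# Stand-in config objects; the real ones come from the model and plugin.
moe_config = SimpleNamespace(num_experts_per_tok=8, intermediate_size=1408)
cfg = SimpleNamespace(kt_num_threads=2)
hidden_size = 4096
threadpool_count = 1

wrapper = KTMoEWrapper(
    num_experts_per_tok=moe_config.num_experts_per_tok,
    hidden_size=hidden_size,
    moe_intermediate_size=moe_config.intermediate_size,
    gpu_experts_mask=None,  # the newly added argument; None presumably
                            # means no per-expert GPU mask, consistent
                            # with num_gpu_experts=0 below
    num_gpu_experts=0,
    cpuinfer_threads=getattr(cfg, "kt_num_threads", 1),
    threadpool_count=threadpool_count,
)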