kvcache-ai-ktransformers/kt-kernel/python
mrhaoxx 58d7eabb9b
feat(sft): support transformers v5 fused expert format
Fused experts (e.g. Qwen3MoeExperts) store weights as 3D Parameters
(gate_up_proj [E,2I,H], down_proj [E,H,I]) instead of per-expert
nn.Linear modules. PEFT cannot attach LoRA to these, so we create
KT-managed LoRA buffers with Kaiming init, nn.Parameter wrappers
for the optimizer, and a pre-assigned .grad for the C++ backward pass.
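
A minimal sketch of the buffer-creation idea in plain PyTorch (the helper
name, rank argument, and dtype below are illustrative assumptions, not the
exact kt-kernel implementation; the shapes follow the fused format above):

    import math
    import torch
    import torch.nn as nn

    def create_fused_expert_lora_buffers(experts, rank, dtype=torch.bfloat16):
        # Hypothetical helper: experts.gate_up_proj is [E, 2I, H] and
        # experts.down_proj is [E, H, I], as in the fused v5 format.
        E, two_i, H = experts.gate_up_proj.shape
        I = experts.down_proj.shape[2]
        params = {}
        for name, (in_f, out_f) in {"gate_up_proj": (H, two_i),
                                    "down_proj": (I, H)}.items():
            lora_a = torch.empty(E, rank, in_f, dtype=dtype)
            for e in range(E):  # Kaiming init per expert, mirroring nn.Linear's default
                nn.init.kaiming_uniform_(lora_a[e], a=math.sqrt(5))
            lora_b = torch.zeros(E, out_f, rank, dtype=dtype)  # LoRA B starts at zero
            for suffix, buf in (("lora_A", lora_a), ("lora_B", lora_b)):
                p = nn.Parameter(buf)           # wrapper so the optimizer can track it
                p.grad = torch.zeros_like(buf)  # pre-assigned .grad for the C++ backward to fill
                params[f"{name}.{suffix}"] = p
        return params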

- arch.py: detect_fused_experts() to detect the fused weight format
- weights.py: fused format extraction and weight clearing
- wrapper.py: detect fused at wrap time, store _fused_experts/_lora_rank
- lora.py: _create_fused_expert_lora_buffers, save/load fused LoRA,
  get_kt_lora_params collects fused params, deduplicate wrapper lookup
- layer.py: handle v5 TopKRouter tuple output (sketched after this list), remove dead code
- autograd.py: sync_forward_sft/submit_forward_sft API rename
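
For the layer.py item, a hedged sketch of normalizing the router output
across versions (it assumes the v5 TopKRouter already returns a
(routing_weights, selected_experts) tuple while the older path returned
raw router logits; the softmax/top-k fallback is illustrative only):

    import torch

    def normalize_router_output(router_out, top_k):
        # Assumption: transformers v5 TopKRouter yields (weights, indices) directly.
        if isinstance(router_out, tuple):
            return router_out
        # Fallback for an older single-logits output (illustrative, not kt-kernel code).
        probs = torch.softmax(router_out.float(), dim=-1)
        weights, indices = torch.topk(probs, top_k, dim=-1)
        return weights, indices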

Verified: v5 loss and expert-LoRA values match the v4 baseline; v4 backward compatibility preserved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 13:21:29 +08:00
cli [fix] improve Sglang kt-kernel detect time duration (#1887) 2026-03-18 23:07:40 +08:00
sft feat(sft): support transformers v5 fused expert format 2026-04-20 13:21:29 +08:00
utils merge: integrate origin/main into sft branch 2026-04-08 23:19:28 +08:00
__init__.py merge: integrate origin/main into sft branch 2026-04-08 23:19:28 +08:00
_cpu_detect.py [feat](kt-kernel): Fix CPU instruction set variants for build & install (#1746) 2025-12-24 18:57:45 +08:00
experts.py merge: integrate origin/main into sft branch 2026-04-08 23:19:28 +08:00
experts_base.py merge: integrate origin/main into sft branch 2026-04-08 23:19:28 +08:00