kvcache-ai-ktransformers/kt-kernel/python/sft
mrhaoxx dd1da65d90
feat(sft): add Qwen3.5 MoE support + fused checkpoint loading
- arch.py: add Qwen3_5Moe arch match, read config from text_config,
  _get_layers_prefix returns model.language_model.layers for Qwen3.5,
  _get_model_container_and_layers searches the language_model attribute
- weights.py: load_experts_from_checkpoint_files detects the fused format
  (gate_up_proj in weight_map) and splits it into gate/up/down, as sketched
  below
- wrapper.py: hidden_size falls back to text_config (see the second sketch
  below, which also covers the arch.py lookups)
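
A minimal sketch of the fused split, assuming gate_up_proj has shape
[num_experts, hidden, 2 * intermediate] with gate concatenated before up on
the last dimension; function and key names here are illustrative, not the
actual weights.py API:

```python
# Hedged sketch of fused-format detection and splitting; key names and the
# tensor layout are assumptions for illustration, not the exact weights.py code.
import torch


def is_fused_format(weight_map: dict) -> bool:
    # Fused checkpoints store all experts' gate/up weights in a single
    # "...experts.gate_up_proj" tensor instead of per-expert entries.
    return any("gate_up_proj" in key for key in weight_map)


def split_gate_up(gate_up: torch.Tensor):
    # Assumed layout: [num_experts, hidden, 2 * intermediate], gate half first.
    gate, up = gate_up.chunk(2, dim=-1)
    return gate, up


# Toy check: 4 experts, hidden=8, intermediate=16.
fused = torch.randn(4, 8, 32)
gate, up = split_gate_up(fused)
assert gate.shape == (4, 8, 16) and up.shape == (4, 8, 16)
```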

Verified: Qwen3.5-35B-A3B (256 experts, fused format) E2E pass.
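
A second sketch covering the arch.py/wrapper.py side: reading hidden_size from
the nested text_config and resolving the layers prefix through language_model.
Helper names are illustrative, not the real method names:

```python
# Hedged sketch of the nested-config lookups for Qwen3.5-style wrappers.
def get_hidden_size(config):
    # Fall back to the nested text_config when hidden_size is not top-level.
    if hasattr(config, "hidden_size"):
        return config.hidden_size
    return config.text_config.hidden_size


def get_layers_prefix(model) -> str:
    # Qwen3.5 nests the decoder stack as model.language_model.layers;
    # earlier MoE models expose model.layers directly.
    container = getattr(model, "model", model)
    if hasattr(container, "language_model"):
        return "model.language_model.layers"
    return "model.layers"
```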

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 17:19:15 +08:00
__init__.py feat(sft): AMX MoE SFT backend with LoRA support 2026-04-08 23:11:00 +08:00
amx.py feat(sft): AMX MoE SFT backend with LoRA support 2026-04-08 23:11:00 +08:00
arch.py feat(sft): add Qwen3.5 MoE support + fused checkpoint loading 2026-04-20 17:19:15 +08:00
autograd.py feat(sft): support transformers v5 fused expert format 2026-04-20 13:21:29 +08:00
base.py feat(sft): AMX MoE SFT backend with LoRA support 2026-04-08 23:11:00 +08:00
config.py refactor(sft): share_backward_bb default True, share_cache_pool auto-derived 2026-04-09 20:10:38 +08:00
dist_utils.py refactor(sft): unify KTConfig field names with kt_ prefix, add share_cache_pool, remove dead code 2026-04-09 14:17:50 +08:00
layer.py feat(sft): support transformers v5 fused expert format 2026-04-20 13:21:29 +08:00
lora.py feat(sft): support transformers v5 fused expert format 2026-04-20 13:21:29 +08:00
weights.py feat(sft): add Qwen3.5 MoE support + fused checkpoint loading 2026-04-20 17:19:15 +08:00
wrapper.py feat(sft): add Qwen3.5 MoE support + fused checkpoint loading 2026-04-20 17:19:15 +08:00