vrr/kvcache-ai-ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2026-04-29 12:19:50 +00:00

Author	SHA1	Message	Date
mrhaoxx	f36699affd	feat(sft): AMX MoE SFT backend with LoRA support Complete SFT (Supervised Fine-Tuning) backend for MoE models using AMX SIMD: Core C++ implementation: - sft_moe.hpp: Forward/backward with LoRA fused operations (~5500 lines) - moe-sft-tp.hpp: Tensor-parallel wrapper for multi-NUMA - amx/moe-sft-tp.hpp: AMX-specific TP implementation - avx_kernels.hpp: AVX512 SIMD kernels for LoRA GEMM - amx_kernels.hpp: AMX tile kernels for Panel5 rank-outer optimization - worker_pool: RDTSC profiling, Chrome trace output, SFT timer infrastructure - ext_bindings.cpp: SFT MOE pybind bindings (BF16/INT8/INT4 + SkipLoRA variants) Python sft/ submodule (kt_kernel.sft): - base.py: BaseSFTMoEWrapper with buffer management (template method pattern) - amx.py: AMXSFTMoEWrapper (weight loading, C++ task construction) - autograd.py: KTMoEFunction (torch.autograd.Function for distributed training) - layer.py: KTMoELayerWrapper (nn.Module replacing HF MoE layers) - arch.py: MOEArchConfig (Qwen3/DeepSeek/Mixtral architecture detection) - weights.py: Expert weight extraction and checkpoint loading - lora.py: PEFT LoRA adaptation (view buffers, grad buffers, save/load adapter) - wrapper.py: wrap_moe_layers_with_kt_wrapper, load_kt_model, build_kt_device_map - config.py: KTConfig dataclass (DeepSpeed-style opaque config passthrough) - dist_utils.py: Distributed gather/scatter, checkpoint-phase detection Design decisions: - Rank-0-only expert pattern: only rank 0 holds C++ wrapper and expert weights - DeepSpeed-style integration: accelerate keeps only KTransformersPlugin (framework interaction fields), all logic in kt_kernel.sft - Inference isolation: importing kt_kernel does not load sft/ submodule - Old field name compatibility: _get_kt_config() converts kt_xxx→xxx automatically Verified: Qwen3-235B-A22B 4GPU AMXBF16 training, loss converges normally.	2026-04-08 23:11:00 +08:00
ErvinXie	d8046e1bb4	Kt minimax (#1742) [feat]: fp8 kernel and kt-cli support	2025-12-24 15:39:44 +08:00
Jiaqi Liao	fcf8882075	[Feature] Add avx-based kimi-k2 support (#1656) * support Kimi-K2-Thinking original weight fix amx kernel bug * update k2 avx kernel. * feat: add CPUInfer write buffer task * [feat]: add kimi k2 cpu write buffer support - Implement write_weights_to_buffer function in k2-moe.hpp for extracting GPU expert weights - Fix down (w2) weight column-wise slicing for different TP configurations - Support three TP scenarios: cpu_tp == gpu_tp, cpu_tp > gpu_tp, cpu_tp < gpu_tp - Add comprehensive test cases for weight extraction validation - Ensure compatibility with Kimi model's MoE architecture * [fix]: correct write_weight_scale_to_buffer expert offset calculation Fixed the bug in write_weight_scale_to_buffer_task where expert offsets in GPU buffers were incorrectly calculated. Changed from using per_expert_gpu sizes to using full gpu_tp sizes, ensuring correct memory layout for multi-expert scenarios. Also added benchmark scripts for k2 moe and write buffer operations, and cleaned up debug output in test files. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * [feat]: add write buffer wrapper * [fix] fix comment --------- Co-authored-by: ouqingliang <1692110604@qq.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-12-02 16:01:07 +08:00
Jiaqi Liao	94c25626dc	Fix kt-kernel for new wrapper (#1588) * update README for kt-kernel * style: format C++ and Python code in kt-kernel - Format C++ files: task_queue, ext_bindings, and MoE operators - Format Python utility modules: amx, llamafile, and loader - Improve code readability and consistency	2025-11-10 21:47:34 +08:00
Jiaqi Liao	9bc00e587b	Refactor KTMoEWrapper backend (#1587) * universal backend for cpu inference * expert defer	2025-11-10 20:26:15 +08:00

5 commits