Add numa_nodes parameter to BaseMoEWrapper and all subclasses, allowing
users to explicitly specify which NUMA node IDs to use for subpool
mapping instead of always defaulting to sequential [0, 1, ..., N-1].
This enables running multiple KTransformers instances on different NUMA
nodes of the same machine, e.g. --kt-threadpool-count 1 --kt-numa-nodes 1
to bind to NUMA node 1. Previously this required external numactl
workarounds since subpool_numa_map was hardcoded to start from 0.
* support Kimi-K2-Thinking original weight
fix amx kernel bug
* update k2 avx kernel.
* feat: add CPUInfer write buffer task
* [feat]: add kimi k2 cpu write buffer support
- Implement write_weights_to_buffer function in k2-moe.hpp for extracting GPU expert weights
- Fix down (w2) weight column-wise slicing for different TP configurations
- Support three TP scenarios: cpu_tp == gpu_tp, cpu_tp > gpu_tp, cpu_tp < gpu_tp
- Add comprehensive test cases for weight extraction validation
- Ensure compatibility with Kimi model's MoE architecture
* [fix]: correct write_weight_scale_to_buffer expert offset calculation
Fixed the bug in write_weight_scale_to_buffer_task where expert offsets in GPU buffers were incorrectly calculated. Changed from using per_expert_gpu sizes to using full gpu_tp sizes, ensuring correct memory layout for multi-expert scenarios.
Also added benchmark scripts for k2 moe and write buffer operations, and cleaned up debug output in test files.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* [feat]: add write buffer wrapper
* [fix] fix comment
---------
Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>