kvcache-ai-ktransformers/kt-kernel/python/utils
ErvinXie 71f683acec
Support Native Kimi K2 Thinking (#1663)
* [feat]: fix k2 prefill

* Update Kimi-K2-Thinking.md

* Create Kimi-K2-Thinking-Native.md

* Update Kimi-K2-Thinking.md

* Update Kimi-K2-Thinking.md

* Update Kimi-K2-Thinking-Native.md

* [perf] optimize K2 MoE weight loading with per-expert pointers

- Avoid expensive torch.stack().contiguous() in Python (was ~6.6s)
- Use per-expert pointer arrays (gate_projs) instead of contiguous memory
- C++ worker pool performs parallel memcpy for TP slicing
- Add LOAD_TIME_PROFILE for load_weights timing analysis
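
The pointer-array idea in the list above can be illustrated with a minimal, stdlib-only sketch. It is not the kt-kernel implementation: plain `ctypes` buffers stand in for the per-expert tensors, a Python thread pool stands in for the C++ worker pool, and the shapes, the `TP` degree, the row-wise split, and the helper names `copy_tp_slice` and `load_tp_rank` are all hypothetical. Only the name `gate_projs` comes from the commit message.

```python
import ctypes
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sizes, chosen only to keep the example small.
NUM_EXPERTS = 4
ROWS, COLS = 8, 16          # per-expert weight shape, one byte per element
TP = 2                      # assumed tensor-parallel degree (rows split evenly)
SLICE_BYTES = (ROWS // TP) * COLS

# Each expert's weight lives in its own buffer (stand-in for separate tensors).
experts = [
    (ctypes.c_uint8 * (ROWS * COLS))(*[(e * 31 + i) % 256 for i in range(ROWS * COLS)])
    for e in range(NUM_EXPERTS)
]

# Per-expert pointer array: only addresses are gathered, no weight data is
# copied, so there is no stack-then-contiguous step on the Python side.
gate_projs = [ctypes.addressof(buf) for buf in experts]


def copy_tp_slice(dst, expert_idx, tp_rank):
    """Copy one expert's slice for `tp_rank` into its slot in `dst`."""
    src = gate_projs[expert_idx] + tp_rank * SLICE_BYTES
    dst_addr = ctypes.addressof(dst) + expert_idx * SLICE_BYTES
    ctypes.memmove(dst_addr, src, SLICE_BYTES)


def load_tp_rank(tp_rank, workers=4):
    """Gather every expert's slice for one TP rank using a pool of workers."""
    dst = (ctypes.c_uint8 * (NUM_EXPERTS * SLICE_BYTES))()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task writes a disjoint region of dst, so no locking is needed.
        list(pool.map(lambda e: copy_tp_slice(dst, e, tp_rank), range(NUM_EXPERTS)))
    return dst


rank1 = load_tp_rank(tp_rank=1)
```

The only contiguous buffer ever built is the destination for a single TP rank, and the copies into it are independent per expert, which is what allows them to run in parallel.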

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-05 21:53:05 +08:00
File | Last commit | Date
__init__.py | [Feature] Add avx-based kimi-k2 support (#1656) | 2025-12-02 16:01:07 +08:00
amx.py | Support Native Kimi K2 Thinking (#1663) | 2025-12-05 21:53:05 +08:00
llamafile.py | Fix kt-kernel for new wrapper (#1588) | 2025-11-10 21:47:34 +08:00
loader.py | [Feature] Add avx-based kimi-k2 support (#1656) | 2025-12-02 16:01:07 +08:00
moe_kernel.py | [feat](moe_kernel): add amd blis support (int8) (#1600) | 2025-11-27 12:08:53 +08:00