kvcache-ai-ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2026-04-28 03:39:48 +00:00

ErvinXie 71f683acec Support Native Kimi K2 Thinking (#1663)	2025-12-05 21:53:05 +08:00

* [feat]: fix k2 prefill
* Update Kimi-K2-Thinking.md
* Create Kimi-K2-Thinking-Native.md
* Update Kimi-K2-Thinking.md
* Update Kimi-K2-Thinking.md
* Update Kimi-K2-Thinking-Native.md
* [perf] optimize K2 MoE weight loading with per-expert pointers
  - Avoid expensive torch.stack().contiguous() in Python (was ~6.6s)
  - Use per-expert pointer arrays (gate_projs) instead of contiguous memory
  - C++ worker pool performs parallel memcpy for TP slicing
  - Add LOAD_TIME_PROFILE for load_weights timing analysis

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>

__init__.py	[Feature] Add avx-based kimi-k2 support (#1656)	2025-12-02 16:01:07 +08:00
amx.py	Support Native Kimi K2 Thinking (#1663)	2025-12-05 21:53:05 +08:00
llamafile.py	Fix kt-kernel for new wrapper (#1588)	2025-11-10 21:47:34 +08:00
loader.py	[Feature] Add avx-based kimi-k2 support (#1656)	2025-12-02 16:01:07 +08:00
moe_kernel.py	[feat](moe_kernel): add amd blis support (int8) (#1600)	2025-11-27 12:08:53 +08:00