Jiaqi Liao
db82d99fa6
feat: add fallback expert prefix lookup in loader.py from kimi_k2.5 ( #1822 )
2026-01-30 14:09:38 +08:00
Jiaqi Liao
edc48aba37
[fix]: fix wrapper import issue ( #1819 )
2026-01-28 16:31:56 +08:00
Oql
bf4c8a690b
Add Native Precision Tutorial, update worker strategy and README.md ( #1807 )
2026-01-23 18:00:13 +08:00
SCDESPERTATE
b0f827d2a9
[chore](cuda): explicitly use ele_per_blk var for better readability ( #1784 )
2026-01-23 11:11:08 +08:00
mrhaoxx
b27de4068b
[fix]: fix exp_avx512 for act_fn ( #1797 )
2026-01-20 11:07:22 +08:00
Jianwei Dong
027832c590
[feat](kt-kernel): CPU-GPU experts sched ( #1796 )
2026-01-16 17:01:15 +08:00
Oql
6277da4c2b
support GLM 4.7 ( #1791 )
support GLM 4.7
2026-01-13 17:36:25 +08:00
watamario15
667030d6e6
[kt-kernel]: Fix ignored build configurations in install.sh and CMakeLists.txt ( #1789 )
* Correct variable defaults
* Remove CMAKE_BUILD_TYPE setting in CMakeLists
2026-01-12 22:16:19 +08:00
Oql
5edc456749
support Native BF16 format MoE. ( #1788 )
support Native BF16 format MoE
2026-01-12 14:43:28 +08:00
Oql
ddb957596f
Fix moe bug. ( #1783 )
* [fix]: fix moe.hpp load from file bug.
* [fix]: fix all moe hpp init bug.
* [fix]: fix moe & awq-moe bug.
2026-01-05 17:02:24 +08:00
Oql
dc6394e501
[fix]: fix moe hpp bug. ( #1780 )
fix moe hpp init bug.
2026-01-04 19:32:56 +08:00
Jianwei Dong
9adc91714f
Remove kt-kernel-cuda, kt-kernel uses the version with cuda ( #1769 )
2025-12-30 10:23:58 +08:00
ZiWei Yuan
b096b01fbc
[docs]: add kt-cli doc and update corresponding website ( #1768 )
2025-12-29 23:06:22 +08:00
ErvinXie
9539ab91eb
Cli ( #1765 )
* [feat]: add custom option for kt run
* [feat]: depth 3
2025-12-29 15:18:42 +08:00
Jianwei Dong
4b235cdaa4
fix cuda wheel build ( #1766 )
2025-12-29 12:42:06 +08:00
Jianwei Dong
559a3ad4ac
fix pypi cuda install ( #1763 )
2025-12-29 11:19:43 +08:00
Jiaqi Liao
46b0f36980
[feat](kt-kernel): Fix CPU instruction set variants for build & install ( #1746 )
* [feat]: Enhance CPU feature detection and support for AVX512 extensions
- Added cmake/DetectCPU.cmake for automatic CPU feature detection.
- Updated CMakeLists.txt to include auto-detection logic for AVX512 features.
- Modified install.sh to include new AVX512_VBMI option for FP8 MoE.
- Enhanced _cpu_detect.py to support progressive matching of CPU variants.
- Created scripts/check_cpu_features.py for manual CPU feature checks.
- Updated setup.py to reflect changes in CPU variant building and environment variables.
* [fix](kt-kernel): Add conditional inclusion of FP8 MoE for AVX512 BF16 support
* [chore](kt-kernel): update project version to 0.5.0 in CMakeLists.txt and version.py
2025-12-24 18:57:45 +08:00
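The log for #1746 mentions "progressive matching of CPU variants" without spelling out the rule. One way such matching could work is sketched below. The variant table, the flag sets, and the `pick_variant` helper are hypothetical illustrations and are not taken from `_cpu_detect.py`.

```python
# Hypothetical variant table, ordered from most to least demanding.
# Names and required flags are illustrative assumptions only.
VARIANTS = [
    ("avx512_vbmi", {"avx512f", "avx512bw", "avx512vbmi"}),
    ("avx512", {"avx512f", "avx512bw"}),
    ("avx2", {"avx2"}),
]


def pick_variant(cpu_flags):
    """Return the first (most capable) variant whose flags are all present."""
    for name, required in VARIANTS:
        if required <= cpu_flags:  # subset test: every required flag detected
            return name
    return "generic"  # assumed fallback when nothing matches


if __name__ == "__main__":
    print(pick_variant({"avx2", "avx512f", "avx512bw"}))
```

Walking the table from the most demanding entry downward means a CPU that lacks one extension falls through to the next weaker build rather than failing outright.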
ZiWei Yuan
dc5feece8f
[docs]: update doc link ( #1745 )
2025-12-24 18:00:47 +08:00
ZiWei Yuan
3315335fb1
[docs]: update docs to kt-kernel & add amd_blis doc ( #1744 )
2025-12-24 17:55:15 +08:00
ErvinXie
d8046e1bb4
Kt minimax ( #1742 )
[feat]: fp8 kernel and kt-cli support
2025-12-24 15:39:44 +08:00
Jianwei Dong
39449ed1af
update PyPI Install and readme ( #1731 )
2025-12-18 17:21:47 +08:00
Jiaqi Liao
3c134359bc
Fix CPU Instruction Set and Installation ( #1729 )
* [fix](kt-kernel): fix AVX512 cpu instruction set detection
* [feat](kt-kernel): AVX512 fallback kernel for RAW-INT4
* [fix](kt-kernel): fix setup version issue
* [fix](kt-kernel): update install for custom build
* [docs](kt-kernel): new installation guide for various cpu instruction set
* [fix](kt-kernel): fix _mm512_dpbusd_epi32_compat fallback implementation
* [style](kt-kernel): clang format
2025-12-18 00:11:57 +08:00
ErvinXie
a8667ddb58
[fix](test): fix import kt-kernel ( #1728 )
2025-12-17 19:46:32 +08:00
SCDESPERTATE
6fc4080a7d
[fix](kt-kernel): fix typo in moe-tp's forward time-profiling ( #1720 )
* [fix](kt-kernel): fix typo in moe-tp's forward time-profiling
* [fix](kt-kernel): fix the experts count in profiling
---------
Co-authored-by: KMSorSMS <yzwliam@126.com>
2025-12-17 12:06:33 +08:00
Jianwei Dong
1f79f6da92
[feat](kt-kernel): Add automatic deployment workflow ( #1719 )
2025-12-16 15:20:06 +08:00
SCDESPERTATE
008de19e16
[fix](kt-kernel): drop the weights held in Python for loading weights operation in C++ ( #1695 )
2025-12-12 11:42:33 +08:00
ZiWei Yuan
53f6a6d6e1
[feat]: patch kml problem ( #1704 )
2025-12-11 14:40:29 +08:00
Jianwei Dong
c65febe05c
[feat]: Automatically detect whether blis is installed on amd cpus ( #1702 )
2025-12-11 14:25:36 +08:00
Oql
e87a042ef0
[fix](kt-kernel): fix write_buffer do numa job ( #1699 )
2025-12-10 16:39:16 +08:00
mrhaoxx
503295fc88
[feat](kt-kernel): refactor convert_cpu_weights.py to support conversion for GLM-4.6V ( #1687 )
Signed-off-by: mrhaoxx <mr.haoxx@gmail.com>
2025-12-09 14:24:41 +08:00
Oql
ac69ea891e
Fix K2 MoE decode bug in buffer management ( #1686 )
2025-12-08 21:08:28 +08:00
Oql
8139c092bf
Reduce CPU memory usage during large chunk prefill ( Fixes #1676 ) ( #1683 )
* fix(amx): add BufferASmallKGroupImpl to fix buffer overflow in from_mat
The original BufferAKGroupImpl::from_mat writes 64 bytes per K_STEP iteration
but when K_STEP=32 (for GemmKernel224Int4SmallKGroup), this causes buffer overflow.
BufferASmallKGroupImpl overrides from_mat to write only 32 bytes per iteration.
* perf(k2-moe): optimize memory allocation with pooled buffers
- Replace per-expert buffer allocation with shared memory pools
- Dynamically assign buffer slices based on activated experts
- Add group_size inference from scale tensor shape in amx.py
* delete kimi k2 forward test
* add TODO comment for pool_count_ calculation
2025-12-08 20:19:07 +08:00
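The overflow described in #1683 comes down to simple arithmetic. The sketch below assumes the buffer holds one byte per element along K (an assumption for illustration) and uses a hypothetical `bytes_written` helper; it does not reproduce the actual `from_mat` code.

```python
def bytes_written(k, k_step, bytes_per_iter):
    """Total bytes a loop writes when it emits bytes_per_iter per K_STEP chunk."""
    iterations = k // k_step
    return iterations * bytes_per_iter


def overflows(k, k_step, bytes_per_iter):
    """True if the writes exceed a buffer of k bytes (assumed: 1 byte per element)."""
    return bytes_written(k, k_step, bytes_per_iter) > k


if __name__ == "__main__":
    k = 128  # illustrative K dimension
    print(overflows(k, k_step=64, bytes_per_iter=64))  # 64-byte writes, K_STEP=64
    print(overflows(k, k_step=32, bytes_per_iter=64))  # 64-byte writes, K_STEP=32
    print(overflows(k, k_step=32, bytes_per_iter=32))  # 32-byte writes, K_STEP=32
```

Under these assumptions, halving K_STEP doubles the iteration count, so keeping the 64-byte write doubles the total bytes written; matching the write size to K_STEP restores the balance.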
Jiaqi Liao
721b6c4c94
[docs] Update Native Kimi-K2-Thinking documentation and kt-kernel parameters ( #1671 )
2025-12-05 22:46:16 +08:00
ErvinXie
71f683acec
Support Native Kimi K2 Thinking ( #1663 )
* [feat]: fix k2 prefill
* Update Kimi-K2-Thinking.md
* Create Kimi-K2-Thinking-Native.md
* Update Kimi-K2-Thinking.md
* Update Kimi-K2-Thinking.md
* Update Kimi-K2-Thinking-Native.md
* [perf] optimize K2 MoE weight loading with per-expert pointers
- Avoid expensive torch.stack().contiguous() in Python (was ~6.6s)
- Use per-expert pointer arrays (gate_projs) instead of contiguous memory
- C++ worker pool performs parallel memcpy for TP slicing
- Add LOAD_TIME_PROFILE for load_weights timing analysis
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-05 21:53:05 +08:00
ZiWei Yuan
4850424345
[docs]: add amd blis backend usage guide ( #1669 )
2025-12-05 16:52:26 +08:00
Jiaqi Liao
0698252484
[fix](kt-kernel): gate RAWINT4 behind AVX512 and avoid AVX2 build break ( #1660 )
2025-12-03 00:43:23 +08:00
Jiaqi Liao
fcf8882075
[Feature] Add avx-based kimi-k2 support ( #1656 )
* support Kimi-K2-Thinking original weight
fix amx kernel bug
* update k2 avx kernel.
* feat: add CPUInfer write buffer task
* [feat]: add kimi k2 cpu write buffer support
- Implement write_weights_to_buffer function in k2-moe.hpp for extracting GPU expert weights
- Fix down (w2) weight column-wise slicing for different TP configurations
- Support three TP scenarios: cpu_tp == gpu_tp, cpu_tp > gpu_tp, cpu_tp < gpu_tp
- Add comprehensive test cases for weight extraction validation
- Ensure compatibility with Kimi model's MoE architecture
* [fix]: correct write_weight_scale_to_buffer expert offset calculation
Fixed the bug in write_weight_scale_to_buffer_task where expert offsets in GPU buffers were incorrectly calculated. Changed from using per_expert_gpu sizes to using full gpu_tp sizes, ensuring correct memory layout for multi-expert scenarios.
Also added benchmark scripts for k2 moe and write buffer operations, and cleaned up debug output in test files.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
* [feat]: add write buffer wrapper
* [fix] fix comment
---------
Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-02 16:01:07 +08:00
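The three TP scenarios listed in #1656 (cpu_tp == gpu_tp, cpu_tp > gpu_tp, cpu_tp < gpu_tp) can be pictured as mapping one CPU shard of a sliced dimension onto GPU shards. The sketch below is a generic interval mapping under the assumption that the dimension divides evenly by both TP sizes. `map_cpu_slice_to_gpu` is a hypothetical name and is not the `write_weights_to_buffer` implementation.

```python
def map_cpu_slice_to_gpu(cpu_rank, cpu_tp, gpu_tp, dim):
    """Split one CPU shard into (gpu_rank, offset_in_gpu_shard, length) pieces.

    Assumes dim is divisible by both cpu_tp and gpu_tp.
    """
    cpu_len = dim // cpu_tp
    gpu_len = dim // gpu_tp
    pos = cpu_rank * cpu_len  # global start of this CPU shard
    end = pos + cpu_len       # global end (exclusive)
    pieces = []
    while pos < end:
        gpu_rank = pos // gpu_len
        gpu_end = (gpu_rank + 1) * gpu_len
        length = min(end, gpu_end) - pos  # stop at whichever boundary is first
        pieces.append((gpu_rank, pos - gpu_rank * gpu_len, length))
        pos += length
    return pieces


if __name__ == "__main__":
    for cpu_tp, gpu_tp in [(2, 2), (4, 2), (2, 4)]:
        print(cpu_tp, gpu_tp, map_cpu_slice_to_gpu(1, cpu_tp, gpu_tp, dim=8))
```

With equal TP sizes each CPU shard maps to exactly one GPU shard at offset 0; with cpu_tp > gpu_tp it lands inside a GPU shard at a nonzero offset; with cpu_tp < gpu_tp it spans several GPU shards.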
ZiWei Yuan
c2b8c60c4e
[ci]: add int4_1 & int4_1k ( #1653 )
* [feat]: init amd adaption
* [feat]: add blis support
* [fix]: fix setup and moe kernel wrapper
* [fix](setup.py): support rebuild with cache and import kt_kernel works fine
* [feat]: add moe_kernel converter for amd and implement the load method (haven't tested yet)
* [feat](moe_kernel/moe.hpp): delete unused memory when using save
* [fix](moe_kernel): update PLAIN for pack
* [fix](moe_kernel): rm printf debug
* [fix](moe_kernel): skip gpu experts
* [fix](moe_kernel/moe.hpp): update include memory path
* [feat](moe_kernel/moe.hpp): support expert deferral
* [feat]: finish amd
* [ci]: add int4_1 & int4_1k
---------
Co-authored-by: mrhaoxx <mr.haoxx@gmail.com>
2025-12-02 15:58:14 +08:00
Jianwei Dong
fd78fe520a
fix(scripts): resolve OOM when converting gpu weights and update README ( #1640 )
2025-12-01 14:15:14 +08:00
mrhaoxx
637c49c83f
[feat](kt-kernel): support qwen3-vl weights convert ( #1648 )
2025-11-27 22:29:09 +08:00
Jianwei Dong
c256150e08
update ci test ( #1647 )
2025-11-27 16:39:48 +08:00
ZiWei Yuan
1374b98ee5
[feat](moe_kernel): add amd blis support (int8) ( #1600 )
* [feat]: init amd adaption
* [feat]: add blis support
* [fix]: fix setup and moe kernel wrapper
* [fix](setup.py): support rebuild with cache and import kt_kernel works fine
* [feat]: add moe_kernel converter for amd and implement the load method (haven't tested yet)
* [feat](moe_kernel/moe.hpp): delete unused memory when using save
* [fix](moe_kernel): update PLAIN for pack
* [fix](moe_kernel): rm printf debug
* [fix](moe_kernel): skip gpu experts
* [fix](moe_kernel/moe.hpp): update include memory path
* [feat](moe_kernel/moe.hpp): support expert deferral
* [feat]: finish amd
---------
Co-authored-by: mrhaoxx <mr.haoxx@gmail.com>
2025-11-27 12:08:53 +08:00
Jianwei Dong
fef6dd98a8
add accuracy and performance test ( #1643 )
2025-11-27 10:56:39 +08:00
Jiaqi Liao
e7d1c1de09
fix(llamafile): resolve deferred experts data race and update README ( #1646 )
2025-11-26 23:19:37 +08:00
Jianwei Dong
51745a9ea1
add ci ( #1642 )
2025-11-25 20:52:08 +08:00
DocShotgun
e72a4fb880
[feat](kt-kernel): Add resume arg to CPU weight conversion ( #1630 )
* [feat]: kt-kernel: Add resume arg to CPU weight conversion
* [docs]: kt-kernel: Document resume arg for CPU weight conversion
* [fix]: kt-kernel: Only print resume layer if in use
* [fix]: kt-kernel: Don't log skipped layers when using resume_layer
2025-11-22 12:00:15 +08:00
Jiaqi Liao
e69c67713f
[refactor] fix third_party issue ( #1632 )
* [refactor]: relocate third_party directory
* [fix]: fix custom_flashinfer for kt-sft
2025-11-20 13:55:55 +08:00
ZiWei Yuan
aef6672dd8
[docs]: add contributing guide and add hooks install ( #1613 )
* [feat]: update kt-kernel hooks and add contribution guide
* [docs]: add contributing guide
* [style]: format the python file and cpp file in kt-kernel
2025-11-15 18:26:49 +08:00
ZiWei Yuan
c32fefb1cd
[doc]: update web doc and kt-kernel doc ( #1609 )
* [doc]: update web doc and kt-kernel doc
* [doc](book.toml): add book.toml for rust book compile
2025-11-13 20:44:13 +08:00
Jiaqi Liao
4bd0fe812b
docs(kt-kernel): improve SGLang integration documentation and fix syntax errors ( #1607 )
- Clarified instructions for SGLang integration with kt-kernel
2025-11-13 19:23:00 +08:00