mrhaoxx
7a9daf0cd4
[feat](kt-kernel): support AVX2-only inference for BF16, FP8, and GPTQ INT4 ( #1892 )
* feat: support avx2 bf16 fp8 inference
* feat: support avx2 gptq int4 inference
* fix: numeric issues in fp8 dequant
* Tutorial avx2 (#1900 )
* fix: prevent injecting -DLLAMA_AVX512=ON on AVX2-only machines
* docs: add AVX2 tutorial for running KTransformers on AVX2-only CPUs
* Tutorial avx2 (#1901 )
* fix: prevent injecting -DLLAMA_AVX512=ON on AVX2-only machines
* docs: add AVX2 tutorial for running KTransformers on AVX2-only CPUs
* docs: update README.md
---------
Co-authored-by: Benjamin F <159887351+yyj6666667@users.noreply.github.com>
2026-03-27 14:45:02 +08:00
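The "numeric issues in fp8 dequant" fix above concerns FP8 decoding. As background, here is a minimal sketch of OCP E4M3 dequantization in plain Python; this is a hand-rolled illustration of the format (bias 7, one reserved NaN pattern), not kt-kernel's actual kernel code:

```python
def dequant_fp8_e4m3(byte: int, scale: float = 1.0) -> float:
    """Decode one OCP FP8 E4M3 value (1 sign, 4 exponent, 3 mantissa bits)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    man = byte & 0x07
    if exp == 0:                        # subnormal: no implicit leading 1
        val = (man / 8.0) * 2.0 ** (-6)
    elif exp == 0x0F and man == 0x07:   # E4M3 reserves only this pattern for NaN
        return float("nan")
    else:                               # normal: implicit leading 1, bias 7
        val = (1.0 + man / 8.0) * 2.0 ** (exp - 7)
    return sign * val * scale
```

Subtleties like the single NaN encoding and the 448.0 maximum are exactly where "numeric issues" tend to hide in fast dequant paths.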
Jianwei Dong
027832c590
[feat](kt-kernel): CPU-GPU expert scheduling ( #1796 )
2026-01-16 17:01:15 +08:00
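At its core, CPU-GPU expert scheduling partitions each token's routed experts by where their weights live. A toy sketch under that assumption (names hypothetical, not kt-kernel's API):

```python
def schedule_experts(topk_ids, gpu_resident):
    """Partition one token's selected expert ids into a GPU batch and a
    CPU batch, based on which experts have GPU-resident weights.
    Illustrative helper only; the real scheduler also batches across tokens."""
    gpu, cpu = [], []
    for e in topk_ids:
        (gpu if e in gpu_resident else cpu).append(e)
    return gpu, cpu
```

The GPU batch can then be dispatched asynchronously while CPU experts run on the host, overlapping the two.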
Oql
5edc456749
support Native BF16 format MoE. ( #1788 )
2026-01-12 14:43:28 +08:00
ErvinXie
d8046e1bb4
Kt minimax ( #1742 )
[feat]: fp8 kernel and kt-cli support
2025-12-24 15:39:44 +08:00
Jiaqi Liao
fcf8882075
[Feature] Add AVX-based Kimi-K2 support ( #1656 )
* support Kimi-K2-Thinking original weights; fix AMX kernel bug
* update k2 avx kernel.
* feat: add CPUInfer write buffer task
* [feat]: add kimi k2 cpu write buffer support
- Implement write_weights_to_buffer function in k2-moe.hpp for extracting GPU expert weights
- Fix down (w2) weight column-wise slicing for different TP configurations
- Support three TP scenarios: cpu_tp == gpu_tp, cpu_tp > gpu_tp, cpu_tp < gpu_tp
- Add comprehensive test cases for weight extraction validation
- Ensure compatibility with Kimi model's MoE architecture
* [fix]: correct write_weight_scale_to_buffer expert offset calculation
Fixed the bug in write_weight_scale_to_buffer_task where expert offsets in GPU buffers were incorrectly calculated. Changed from using per_expert_gpu sizes to using full gpu_tp sizes, ensuring correct memory layout for multi-expert scenarios.
Also added benchmark scripts for k2 moe and write buffer operations, and cleaned up debug output in test files.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com>
* [feat]: add write buffer wrapper
* [fix] fix comment
---------
Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-02 16:01:07 +08:00
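The down-projection (w2) column-wise slicing across the three TP scenarios above can be sketched as follows. All names and shapes are simplified illustrations, not the k2-moe.hpp implementation; the assumption is that each TP rank owns a contiguous column slice of the intermediate dimension:

```python
import numpy as np

def slice_w2_for_ranks(w2, tp):
    """Split a down-projection weight [hidden, intermediate] column-wise
    across `tp` tensor-parallel ranks (hypothetical helper)."""
    return np.split(w2, tp, axis=1)

def remap_cpu_to_gpu(cpu_shards, cpu_tp, gpu_tp):
    """Re-slice CPU-side column shards to the GPU TP layout, covering the
    three cases from the commit: equal TP (pass through), cpu_tp > gpu_tp
    (concatenate several CPU shards per GPU rank), and cpu_tp < gpu_tp
    (split each CPU shard into several GPU shards)."""
    if cpu_tp == gpu_tp:
        return list(cpu_shards)
    if cpu_tp > gpu_tp:
        per = cpu_tp // gpu_tp
        return [np.concatenate(cpu_shards[i * per:(i + 1) * per], axis=1)
                for i in range(gpu_tp)]
    per = gpu_tp // cpu_tp
    out = []
    for shard in cpu_shards:
        out.extend(np.split(shard, per, axis=1))
    return out
```

Getting the column (not row) axis right for w2 is exactly the slicing fix the bullet list describes.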
ZiWei Yuan
1374b98ee5
[feat](moe_kernel): add amd blis support (int8) ( #1600 )
* [feat]: init amd adaption
* [feat]: add blis support
* [fix]: fix setup and MoE kernel wrapper
* [fix](setup.py): support rebuild with cache; `import kt_kernel` works fine
* [feat]: add moe_kernel converter for AMD and implement the load method (not yet tested)
* [feat](moe_kernel/moe.hpp): delete unused memory when using save
* [fix](moe_kernel): update PLAIN for pack
* [fix](moe_kernel): remove printf debugging
* [fix](moe_kernel): skip gpu experts
* [fix](moe_kernel/moe.hpp): update include memory path
* [feat](moe_kernel/moe.hpp): support expert deferral
* [feat]: finish AMD support
---------
Co-authored-by: mrhaoxx <mr.haoxx@gmail.com>
2025-11-27 12:08:53 +08:00
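As background for running int8 MoE weights on an integer GEMM backend, here is a minimal weight-only quantize/matvec sketch in NumPy. It illustrates only the per-output-channel scale bookkeeping such a backend implies; it is not the BLIS code path, and both function names are hypothetical:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-output-channel int8 quantization (illustrative)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matvec(q, scale, x):
    """Weight-only int8 matvec: integer weights are widened, multiplied
    against the activation, then rescaled per output channel."""
    return (q.astype(np.float32) @ x) * scale[:, 0]
```

The per-channel scales keep quantization error proportional to each row's magnitude, which is why the result stays close to the fp32 product.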
Jiaqi Liao
9bc00e587b
Refactor KTMoEWrapper backend ( #1587 )
* universal backend for CPU inference
* expert deferral
2025-11-10 20:26:15 +08:00
chenht2022
6fe30af50d
Merge branch 'main' into develop-cht
2025-11-03 14:35:44 +00:00
ovowei
f854d03bd7
update kt-kernel
2025-11-03 15:19:52 +08:00
chenht2022
dd4377b60b
feat: add deferred expert scheduling support
2025-10-31 08:03:37 +00:00
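Deferred expert scheduling can be pictured as a work queue drained under a per-step budget: expert work that does not fit the current step is carried over to the next one, letting CPU expert compute overlap with other work. A toy sketch under that assumption (not the actual scheduler; all names are illustrative):

```python
from collections import deque

class DeferredExpertQueue:
    """Toy model of deferred expert scheduling: submitted expert ids are
    executed eagerly up to a per-step budget, and the remainder is
    deferred to later steps."""

    def __init__(self, budget):
        self.budget = budget      # experts executed per step
        self.pending = deque()    # deferred expert ids, FIFO

    def submit(self, expert_ids):
        self.pending.extend(expert_ids)

    def step(self):
        done = []
        while self.pending and len(done) < self.budget:
            done.append(self.pending.popleft())
        return done
```

The FIFO order bounds how long any one expert's computation can be deferred.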
ovowei
28d8663374
fix
2025-10-22 18:14:34 +08:00
Atream
4c5fcf9774
add kt-kernel
2025-10-12 05:13:00 +00:00