mrhaoxx
7a9daf0cd4
[feat](kt-kernel): support avx2 only inference for bf16 fp8 and gptq int4 ( #1892 )
...
Book-CI / test (push) Waiting to run
Book-CI / test-1 (push) Waiting to run
Book-CI / test-2 (push) Waiting to run
Deploy / deploy (macos-latest) (push) Waiting to run
Deploy / deploy (ubuntu-latest) (push) Waiting to run
Deploy / deploy (windows-latest) (push) Waiting to run
* feat: support avx2 bf16 fp8 inference
* feat: support avx2 gptq int4 inference
* fix: numeric issues in fp8 dequant
* Tutorial avx2 (#1900 )
* fix: prevent injecting -DLLAMA_AVX512=ON on AVX2-only machines
* docs: add AVX2 tutorial for running KTransformers on AVX2-only CPUs
* Tutorial avx2 (#1901 )
* fix: prevent injecting -DLLAMA_AVX512=ON on AVX2-only machines
* docs: add AVX2 tutorial for running KTransformers on AVX2-only CPUs
* docs: update README.md
---------
Co-authored-by: Benjamin F <159887351+yyj6666667@users.noreply.github.com>
2026-03-27 14:45:02 +08:00
Chen Hongtao
9e69fccb02
[feat]: add mistral moe loader compatibility ( #1873 )
...
Book-CI / test-1 (push) Has been cancelled
Book-CI / test-2 (push) Has been cancelled
Deploy / deploy (macos-latest) (push) Has been cancelled
Book-CI / test (push) Has been cancelled
Deploy / deploy (ubuntu-latest) (push) Has been cancelled
Deploy / deploy (windows-latest) (push) Has been cancelled
Co-authored-by: chenht2022 <chenht2022@users.noreply.github.com>
2026-02-28 17:50:23 +08:00
VYSE V.E.O
20262b2743
Fix Qwen3.5 FP8 load for VL detection ( #1857 )
...
Book-CI / test (push) Has been cancelled
Book-CI / test-1 (push) Has been cancelled
Book-CI / test-2 (push) Has been cancelled
Deploy / deploy (macos-latest) (push) Has been cancelled
Deploy / deploy (ubuntu-latest) (push) Has been cancelled
Deploy / deploy (windows-latest) (push) Has been cancelled
* Fix Qwen3.5 FP8 load for VL detection
1, for VL models(Qwen3.5), modify base_key: model.layers.{N} -> model.language_model.layers.{N}
2, clean DUPLICATED class BF16SafeTensorLoader(SafeTensorLoader) , only the first overrided one.
* Indent type
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-26 15:47:22 +08:00
Jianwei Dong
16a8b98f3e
support qwen3.5 ( #1846 )
Book-CI / test (push) Has been cancelled
Book-CI / test-1 (push) Has been cancelled
Book-CI / test-2 (push) Has been cancelled
Deploy / deploy (macos-latest) (push) Has been cancelled
Deploy / deploy (ubuntu-latest) (push) Has been cancelled
Deploy / deploy (windows-latest) (push) Has been cancelled
2026-02-16 15:48:14 +08:00
Jiaqi Liao
db82d99fa6
feat: add fallback expert prefix lookup in loader.py from kimi_k2.5 ( #1822 )
2026-01-30 14:09:38 +08:00
Oql
6277da4c2b
support GLM 4.7 ( #1791 )
...
Book-CI / test-2 (push) Has been cancelled
Book-CI / test (push) Has been cancelled
Book-CI / test-1 (push) Has been cancelled
Deploy / deploy (macos-latest) (push) Has been cancelled
Deploy / deploy (ubuntu-latest) (push) Has been cancelled
Deploy / deploy (windows-latest) (push) Has been cancelled
support GLM 4.7
2026-01-13 17:36:25 +08:00
Oql
5edc456749
support Native BF16 format MoE. ( #1788 )
...
support Native BF16 format MoE
2026-01-12 14:43:28 +08:00
ErvinXie
d8046e1bb4
Kt minimax ( #1742 )
...
[feat]: fp8 kernel and kt-cli support
2025-12-24 15:39:44 +08:00
Jiaqi Liao
fcf8882075
[Feature] Add avx-based kimi-k2 support ( #1656 )
...
Book-CI / test-2 (push) Waiting to run
Book-CI / test (push) Waiting to run
Book-CI / test-1 (push) Waiting to run
Deploy / deploy (macos-latest) (push) Waiting to run
Deploy / deploy (ubuntu-latest) (push) Waiting to run
Deploy / deploy (windows-latest) (push) Waiting to run
* support Kimi-K2-Thinking original weight
fix amx kernel bug
* update k2 avx kernel.
* feat: add CPUInfer write buffer task
* [feat]: add kimi k2 cpu write buffer support
- Implement write_weights_to_buffer function in k2-moe.hpp for extracting GPU expert weights
- Fix down (w2) weight column-wise slicing for different TP configurations
- Support three TP scenarios: cpu_tp == gpu_tp, cpu_tp > gpu_tp, cpu_tp < gpu_tp
- Add comprehensive test cases for weight extraction validation
- Ensure compatibility with Kimi model's MoE architecture
* [fix]: correct write_weight_scale_to_buffer expert offset calculation
Fixed the bug in write_weight_scale_to_buffer_task where expert offsets in GPU buffers were incorrectly calculated. Changed from using per_expert_gpu sizes to using full gpu_tp sizes, ensuring correct memory layout for multi-expert scenarios.
Also added benchmark scripts for k2 moe and write buffer operations, and cleaned up debug output in test files.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude <noreply@anthropic.com>
* [feat]: add write buffer wrapper
* [fix] fix comment
---------
Co-authored-by: ouqingliang <1692110604@qq.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-02 16:01:07 +08:00
Jiaqi Liao
94c25626dc
Fix kt-kernel for new wrapper ( #1588 )
...
Book-CI / test (push) Waiting to run
Book-CI / test-1 (push) Waiting to run
Book-CI / test-2 (push) Waiting to run
Deploy / deploy (macos-latest) (push) Waiting to run
Deploy / deploy (ubuntu-latest) (push) Waiting to run
Deploy / deploy (windows-latest) (push) Waiting to run
* update README for kt-kernel
* style: format C++ and Python code in kt-kernel
- Format C++ files: task_queue, ext_bindings, and MoE operators
- Format Python utility modules: amx, llamafile, and loader
- Improve code readability and consistency
2025-11-10 21:47:34 +08:00
Jiaqi Liao
9bc00e587b
Refactor KTMoEWrapper backend ( #1587 )
...
Book-CI / test (push) Waiting to run
Book-CI / test-1 (push) Waiting to run
Book-CI / test-2 (push) Waiting to run
Deploy / deploy (macos-latest) (push) Waiting to run
Deploy / deploy (ubuntu-latest) (push) Waiting to run
Deploy / deploy (windows-latest) (push) Waiting to run
* universal backend for cpu inference
* expert defer
2025-11-10 20:26:15 +08:00