Commit graph

116 commits

Author SHA1 Message Date
Jianwei Dong
17d9e49dd0
Update README.md 2026-04-18 21:28:46 +08:00
Jianwei Dong
a9f28d495b
Update README.md (#1934) 2026-04-18 21:10:25 +08:00
mrhaoxx
7a9daf0cd4
[feat](kt-kernel): support avx2 only inference for bf16 fp8 and gptq int4 (#1892)
* feat: support avx2 bf16 fp8 inference

* feat: support avx2 gptq int4 inference

* fix: numeric issues in fp8 dequant

* Tutorial avx2 (#1900)

* fix: prevent injecting -DLLAMA_AVX512=ON on AVX2-only machines

* docs: add AVX2 tutorial for running KTransformers on AVX2-only CPUs

* Tutorial avx2 (#1901)

* fix: prevent injecting -DLLAMA_AVX512=ON on AVX2-only machines

* docs: add AVX2 tutorial for running KTransformers on AVX2-only CPUs

* docs: update README.md

---------

Co-authored-by: Benjamin F <159887351+yyj6666667@users.noreply.github.com>
2026-03-27 14:45:02 +08:00
Jiaqi Liao
411b69bec0
Add GLM-5 Day0 Support update to README (#1851)
2026-02-15 11:37:12 +08:00
Jiaqi Liao
7d9943365a
Add MiniMax-M2.5 Day0 support update (#1850)
Added update for MiniMax-M2.5 Day0 support in the README.
2026-02-13 22:49:10 +08:00
Jiaqi Liao
2f6f7f1921
Kimi k2.5 doc (#1812)
* [doc]: add Kimi-K2.5 deploy&sft guide

* [doc]: add Kimi-K2.5 deploy&sft guide
2026-01-27 13:33:25 +08:00
Oql
5bd5c8f750
[fix]: fix experts-sched-Tutorial.md (#1808)
2026-01-23 18:06:24 +08:00
Oql
bf4c8a690b
Add Native Precision Tutorial, update worker strategy and README.md (#1807) 2026-01-23 18:00:13 +08:00
Jiaqi Liao
be668074de
Update tutorial links in README.md (#1749) 2025-12-25 14:26:10 +08:00
ZiWei Yuan
dc5feece8f
[docs]: update doc link (#1745) 2025-12-24 18:00:47 +08:00
ErvinXie
d8046e1bb4
Kt minimax (#1742)
[feat]: fp8 kernel and kt-cli support
2025-12-24 15:39:44 +08:00
mrhaoxx
e7d277d163
[docs]: refine README for dpo updates (#1740)
* [docs]: refine dpo tutorial

* [docs]: refine README for dpo updates

* Update doc/en/DPO_tutorial.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* [docs]: update website doc & refine location

---------

Co-authored-by: ErvinXie <ervinxie@foxmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: ZiWei Yuan <yzwliam@126.com>
2025-12-24 11:20:08 +08:00
ZiWei Yuan
34230eaf44
[docs]: Fix image link in README.md (#1718)
Updated image link to use raw GitHub URL for better accessibility.
2025-12-15 17:10:15 +08:00
ZiWei Yuan
2f1b743050
[docs]: update website doc png (#1696) 2025-12-11 13:01:32 +08:00
Jiaqi Liao
721b6c4c94
[docs] Update Native Kimi-K2-Thinking documentation and kt-kernel parameters (#1671) 2025-12-05 22:46:16 +08:00
Jiaqi Liao
1ca3a2662e
Add 9#AISoft to the list of contributors (#1668) 2025-12-05 15:44:04 +08:00
Peilin Li
171578a7ec
[refactor]: Change named 'KT-SFT' to 'kt-sft' (#1626)
* Change named 'KT-SFT' to 'kt-sft'

* [docs]: update kt-sft name

---------

Co-authored-by: ZiWei Yuan <yzwliam@126.com>
2025-11-17 11:48:42 +08:00
ZiWei Yuan
550e4986f5
[docs]: update README.md (#1616)
* [docs]: update README.md
2025-11-15 20:56:26 +08:00
ZiWei Yuan
7c2ad6dbca
[docs]: update README.md (#1614)
* [docs]: update README.md
2025-11-15 18:34:27 +08:00
ErvinXie
5179f0d634
Add roadmap link to README (#1585) 2025-11-10 18:15:53 +08:00
Jiaqi Liao
07322ca2bd
Refactor: restructure repository to focus on kt-kernel and KT-SFT modules (#1583)
* refactor repo

* fix README
2025-11-10 17:57:48 +08:00
ErvinXie
2cb1674020
Fix image reference in README.md (#1584)
Updated image reference in README for heterogeneous computing.
2025-11-10 17:53:41 +08:00
Jiaqi Liao
57d14d22bc
Refactor: restructure repository to focus on kt-kernel and KT-SFT modules (#1581)
* refactor: move legacy code to archive/ directory

  - Moved ktransformers, csrc, third_party, merge_tensors to archive/
  - Moved build scripts and configurations to archive/
  - Kept kt-kernel, KT-SFT, doc, and README files in root
  - Preserved complete git history for all moved files

* refactor: restructure repository to focus on kt-kernel and KT-SFT modules

* fix README

* fix README

* fix README

* fix README

* docs: add performance benchmarks to kt-kernel section

Add comprehensive performance data for kt-kernel to match KT-SFT's presentation:
- AMX kernel optimization: 21.3 TFLOPS (3.9× faster than PyTorch)
- Prefill phase: up to 20× speedup vs baseline
- Decode phase: up to 4× speedup
- NUMA optimization: up to 63% throughput improvement
- Multi-GPU (8×L20): 227.85 tokens/s total throughput with DeepSeek-R1 FP8

Source: https://lmsys.org/blog/2025-10-22-KTransformers/

This provides users with concrete performance metrics for both core modules,
making it easier to understand the capabilities of each component.

* refactor: improve kt-kernel performance data with specific hardware and models

Replace generic performance descriptions with concrete benchmarks:
- Specify exact hardware: 8×L20 GPU + Xeon Gold 6454S, Single/Dual-socket Xeon + AMX
- Include specific models: DeepSeek-R1-0528 (FP8), DeepSeek-V3 (671B)
- Show detailed metrics: total throughput, output throughput, concurrency details
- Match KT-SFT presentation style for consistency

This provides users with actionable performance data they can use to evaluate
hardware requirements and expected performance for their use cases.

* fix README

* docs: clean up performance table and improve formatting

* add pic for README

* refactor: simplify .gitmodules and backup legacy submodules

- Remove 7 legacy submodules from root .gitmodules (archive/third_party/*)
- Keep only 2 active submodules for kt-kernel (llama.cpp, pybind11)
- Backup complete .gitmodules to archive/.gitmodules
- Add documentation in archive/README.md for researchers who need legacy submodules

This reduces initial clone size by ~500MB and avoids downloading unused dependencies.

* refactor: move doc/ back to root directory

Keep documentation in root for easier access and maintenance.

* refactor: consolidate all images to doc/assets/

- Move kt-kernel/assets/heterogeneous_computing.png to doc/assets/
- Remove KT-SFT/assets/ (images already in doc/assets/)
- Update KT-SFT/README.md image references to ../doc/assets/
- Eliminates ~7.9MB image duplication
- Centralizes all documentation assets in one location

* fix pic path for README
2025-11-10 17:42:26 +08:00
Atream
86229c852d
Add update for Kimi-K2-Thinking support 2025-11-06 17:56:46 +08:00
ovowei
44e47ad75a update readme.md 2025-11-05 23:30:58 +08:00
ovowei
00f038e763 update readme.md 2025-11-05 23:29:59 +08:00
ovowei
1e17d75bfd fix 2025-10-30 10:47:05 +08:00
ovowei
ca21992e46 update readme.md. (Support Ascend NPU) 2025-10-27 20:53:06 +08:00
Atream
8ef6111ae0
Update README with Citation link 2025-10-10 19:12:31 +08:00
Atream
1e48eab7d5
Add citation section to README
Added citation section with reference to KTransformers paper.
2025-10-10 18:59:29 +08:00
Atream
e93abc93ec
Add SGLang Integration to README.md 2025-10-10 18:50:05 +08:00
Jianwei Dong
d4b3fe2427
Merge branch 'main' into support-qwen3next 2025-09-12 21:59:32 +08:00
djw
a44b710649 support qwen3 next 2025-09-11 11:55:09 +00:00
Azure
24fe61bbc3
Update date for Kimi-K2-0905 support 2025-09-05 17:47:17 +08:00
Azure-Tang
b6d36bffbb update kimi-k2-0905 2025-09-05 03:52:43 +00:00
qiyuxinlin
1334ddc833 update readme 2025-07-25 17:02:36 +00:00
Atream
cf79c93fae
Update README.md 2025-07-11 09:35:12 +08:00
Atream
18690d819f
Update README.md 2025-07-11 09:34:07 +08:00
ErvinXie
aadf31b35d
Update README.md 2025-06-30 17:55:49 +08:00
ErvinXie
a9a72e52c3
Update README.md 2025-06-30 14:56:46 +08:00
liam Yuan
22d0d9ccb2 update vendor ZTE name 2025-06-23 21:07:17 +08:00
liam Yuan
cb77b52c63 update vendor support list 2025-06-23 21:00:01 +08:00
Atream
d051a14941
Update README.md 2025-05-15 10:29:43 +08:00
rnwang04
142fb7ce6c Enable support for Intel XPU devices, add support for DeepSeek V2/V3 first 2025-05-14 19:37:27 +00:00
Atream
7ebf82a492
Update Qwen3 date 2025-04-29 09:43:13 +08:00
qiyuxinlin
a3ba63665a update readme 2025-04-28 22:38:41 +00:00
qiyuxinlin
89823ccb1f update readme 2025-04-28 22:34:47 +00:00
qiyuxinlin
e7763a4b59 update readme 2025-04-28 22:32:35 +00:00
qiyuxinlin
d3ebdafd4b update readme 2025-04-28 22:31:09 +00:00
qiyuxinlin
59b0631e33 update readme 2025-04-28 22:26:38 +00:00