kvcache-ai/ktransformers: doc/zh
Latest commit: mrhaoxx 7a9daf0cd4
[feat](kt-kernel): support avx2 only inference for bf16 fp8 and gptq int4 (#1892)
* feat: support avx2 bf16 fp8 inference

* feat: support avx2 gptq int4 inference

* fix: numeric issues in fp8 dequant

* Tutorial avx2 (#1900)

* fix: prevent injecting -DLLAMA_AVX512=ON on AVX2-only machines

* docs: add AVX2 tutorial for running KTransformers on AVX2-only CPUs

* Tutorial avx2 (#1901)

* fix: prevent injecting -DLLAMA_AVX512=ON on AVX2-only machines

* docs: add AVX2 tutorial for running KTransformers on AVX2-only CPUs

* docs: update README.md

---------

Co-authored-by: Benjamin F <159887351+yyj6666667@users.noreply.github.com>
2026-03-27 14:45:02 +08:00
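The "prevent injecting -DLLAMA_AVX512=ON on AVX2-only machines" fix above can be sketched as a small CPU-flag check before configuring CMake. The `select_arch_flag` helper and its detection logic are illustrative assumptions, not the project's actual build code; only the `-DLLAMA_AVX512=ON` option comes from the commit message.

```shell
#!/bin/sh
# Sketch (assumption, not KTransformers' real build script): only emit the
# AVX-512 CMake option when the CPU actually reports the avx512f flag, so
# AVX2-only machines build without it.
select_arch_flag() {
    # $1: space-separated CPU flag list, e.g. from `grep -m1 flags /proc/cpuinfo`
    case " $1 " in
        *" avx512f "*) echo "-DLLAMA_AVX512=ON" ;;  # AVX-512 capable CPU
        *)             echo "" ;;                   # AVX2-only: inject nothing
    esac
}

select_arch_flag "fpu avx2 avx512f"   # -> -DLLAMA_AVX512=ON
select_arch_flag "fpu avx2"           # -> (empty)
```

On a real machine the flag list would be read from `/proc/cpuinfo` (Linux) and the result appended to the `cmake` invocation.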
| File | Last commit | Date |
| --- | --- | --- |
| api/server | Initial commit | 2024-07-27 16:06:58 +08:00 |
| AVX2-Tutorial_zh.md | [feat](kt-kernel): support avx2 only inference for bf16 fp8 and gptq int4 (#1892) | 2026-03-27 14:45:02 +08:00 |
| clawdbot_integration_guide.md | [docs]: add clawd bot docs | 2026-01-30 15:51:30 +08:00 |
| DeepseekR1_V3_tutorial_zh.md | add flashinfer to cuda device | 2025-05-15 07:03:45 +00:00 |
| DeepseekR1_V3_tutorial_zh_for_Ascend_NPU.md | Npu revise benchmark results and prerequisites (#1716) | 2025-12-16 14:26:44 +08:00 |
| KTransformers-Fine-Tuning_Developer-Technical-Notes_zh.md | [feat](cmake & doc): fix bug with cmake arch detect & update doc for sft | 2025-11-04 08:46:26 +00:00 |
| KTransformers-Fine-Tuning_User-Guide_zh.md | fix: remove py310 as guide | 2025-11-08 08:54:32 +00:00 |
| Low-cost_Cloud_Training_and_Inference_KTransformers+AutoDL+LlamaFactory_An_On-demand_Cost-efficient_Integrated_Pipeline_for_Ultra-large_Model_Fine-tuning_and_Inference.pdf | Add files via upload (#1814) | 2026-01-27 17:44:50 +08:00 |
| Qwen3-MoE_tutorial_zh_for_Ascend_NPU.md | fix: qwen3-npu bugs; update: add readme-for-qwen3-npu (#1717) | 2025-12-16 14:27:04 +08:00 |
| 【云端低价训推】 KTransformers+AutoDL+LlamaFactory:随用随租的低成本超大模型「微调+推理」一体化流程.pdf | Add AutoDL Tutorial (#1801) | 2026-01-22 14:52:47 +08:00 |