kvcache-ai-ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2026-05-19 08:00:45 +00:00


Benjamin F f05b4009f3	2026-05-11 12:00:30 +08:00

[fix](kt-kernel): fix double mem used by safetensor loader (#1997)

Release the SafeTensor mmap loader singleton after each layer's load_weights() completes. The C++ engine already holds a deep copy (cpu_infer.sync() guarantees this), so releasing the mmap handles is safe. The next layer recreates the loader on demand.

This halves peak memory usage during model loading (e.g. DSv3.2: 1.2T -> 613G).

Based on #1966 by @poryfly, adapted to the v0.6.2.post3 codebase (adds MXFP4 support missing from the original PR).

Co-authored-by: xiongchenhui <xiongchenhui@hisense.com>
ci	add ci (#1642)	2025-11-25 20:52:08 +08:00
per_commit	[feat](kt-kernel): add AVX2/AVX-VNNI RAWINT4 MoE backend (#1942)	2026-04-30 17:16:49 +08:00
__init__.py	add ci (#1642)	2025-11-25 20:52:08 +08:00
run_suite.py	update ci test (#1647)	2025-11-27 16:39:48 +08:00
test_generate_gpu_experts_masks.py	[feat](kt-kernel): CPU-GPU experts sched (#1796)	2026-01-16 17:01:15 +08:00
test_native_moe_loader_auto_release.py	[fix](kt-kernel): fix double mem used by safetensor loader (#1997)	2026-05-11 12:00:30 +08:00
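
The commit message for #1997 describes releasing an mmap-backed loader singleton once each layer's weights have been deep-copied, and recreating it on demand for the next layer. The sketch below illustrates that general pattern only; it is not the kt-kernel code. `MmapLoader`, `load_weights`, and `demo` are hypothetical names, the "weights" are plain bytes in temporary files, and the `bytes(...)` copy stands in for the deep copy that the commit says the C++ engine holds after `cpu_infer.sync()`.

```python
import mmap
import os
import tempfile


class MmapLoader:
    """Hypothetical stand-in for an mmap-backed weight loader singleton."""

    _instance = None

    def __init__(self):
        self._handles = {}  # path -> (file object, mmap object)

    @classmethod
    def get(cls):
        # Recreate the loader on demand if it was released earlier.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    @classmethod
    def release(cls):
        # Close every mapping and file, then drop the singleton.
        if cls._instance is not None:
            for fh, mapped in cls._instance._handles.values():
                mapped.close()
                fh.close()
            cls._instance = None

    def read(self, path):
        # Map the file read-only on first access and cache the handle.
        if path not in self._handles:
            fh = open(path, "rb")
            mapped = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
            self._handles[path] = (fh, mapped)
        return self._handles[path][1]


def load_weights(path):
    """Load one layer, copy it out of the mapping, then release the loader."""
    view = MmapLoader.get().read(path)
    weights = bytes(view[:])  # independent copy; assumed analogue of the engine's deep copy
    MmapLoader.release()      # safe only because the copy above is complete
    return weights


def demo():
    """Load three fake layers; the singleton never outlives a single layer."""
    layers = []
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(3):
            path = os.path.join(tmp, f"layer{i}.bin")
            with open(path, "wb") as fh:
                fh.write(bytes([i]) * 8)
            layers.append(load_weights(path))
            assert MmapLoader._instance is None
    return layers
```

Releasing inside `load_weights` keeps at most one layer's mappings alive at a time, so the mapped pages and the copied weights for earlier layers are not both resident. The release is only valid once the copy has finished, which is the guarantee the commit attributes to `cpu_infer.sync()`.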