kvcache-ai-ktransformers/kt-kernel/test
Benjamin F f05b4009f3
[fix](kt-kernel): fix double mem used by safetensor loader (#1997)
Release the SafeTensor mmap loader singleton after each layer's
load_weights() completes. The C++ engine already holds a deep copy
(cpu_infer.sync() guarantees this), so releasing the mmap handles is
safe. The next layer recreates the loader on demand.
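
The release-and-recreate pattern described above can be sketched roughly as follows. This is a minimal illustration, not the kt-kernel implementation: `MmapLoader`, `load_layer`, and `demo` are hypothetical names, and a plain byte copy stands in for the deep copy that the C++ engine holds after cpu_infer.sync().

```python
import mmap
import os
import tempfile


class MmapLoader:
    """Hypothetical stand-in for a SafeTensor mmap loader singleton."""

    _instance = None

    def __init__(self):
        self._handles = {}  # path -> (file object, mmap object)

    @classmethod
    def get(cls):
        # Recreate the singleton on demand if an earlier layer released it.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    @classmethod
    def release(cls):
        # Close every mapping so the OS can drop the mapped pages.
        if cls._instance is not None:
            for f, mm in cls._instance._handles.values():
                mm.close()
                f.close()
            cls._instance = None

    def read(self, path):
        # Map the file once and reuse the mapping until release() is called.
        if path not in self._handles:
            f = open(path, "rb")
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            self._handles[path] = (f, mm)
        return self._handles[path][1]


def load_layer(path):
    """Copy one layer's bytes out of the mapping, then release the loader."""
    loader = MmapLoader.get()
    try:
        # Slicing an mmap returns a new bytes object, i.e. an independent copy.
        weights = loader.read(path)[:]
    finally:
        MmapLoader.release()
    return weights


def demo():
    """Load two fake layers; report copy size and whether the loader is gone."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(2):
            path = os.path.join(tmp, f"layer{i}.bin")
            with open(path, "wb") as f:
                f.write(bytes([i]) * 16)
            weights = load_layer(path)
            results.append((len(weights), MmapLoader._instance is None))
    return results
```

In this sketch the mapping is only closed after the copy has been taken, which mirrors the safety argument in the commit message: once an independent copy exists, the mapped pages are no longer needed.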

This halves peak memory usage during model loading (e.g. DSv3.2:
1.2T -> 613G).

Based on #1966 by @poryfly — adapted to v0.6.2.post3 codebase
(adds MXFP4 support missing from the original PR).

Co-authored-by: xiongchenhui <xiongchenhui@hisense.com>
2026-05-11 12:00:30 +08:00
ci                                       add ci (#1642)                                                       2025-11-25 20:52:08 +08:00
per_commit                               [feat](kt-kernel): add AVX2/AVX-VNNI RAWINT4 MoE backend (#1942)     2026-04-30 17:16:49 +08:00
__init__.py                              add ci (#1642)                                                       2025-11-25 20:52:08 +08:00
run_suite.py                             update ci test (#1647)                                               2025-11-27 16:39:48 +08:00
test_generate_gpu_experts_masks.py       [feat](kt-kernel): CPU-GPU experts sched (#1796)                     2026-01-16 17:01:15 +08:00
test_native_moe_loader_auto_release.py   [fix](kt-kernel): fix double mem used by safetensor loader (#1997)   2026-05-11 12:00:30 +08:00