Mirror of https://github.com/kvcache-ai/ktransformers.git, synced 2026-05-19 08:00:45 +00:00
Latest commit:

Release the SafeTensor mmap loader singleton after each layer's load_weights() completes. The C++ engine already holds a deep copy (cpu_infer.sync() guarantees this), so releasing the mmap handles is safe. The next layer recreates the loader on demand.

This halves peak memory usage during model loading (e.g. DSv3.2: 1.2T -> 613G).

Based on #1966 by @poryfly, adapted to the v0.6.2.post3 codebase (adds MXFP4 support missing from the original PR).

Co-authored-by: xiongchenhui <xiongchenhui@hisense.com>

| Name | | |
|---|---|---|
| ci | ||
| per_commit | ||
| __init__.py | ||
| run_suite.py | ||
| test_generate_gpu_experts_masks.py | ||
| test_native_moe_loader_auto_release.py | ||
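
The release-after-load pattern described in the commit message can be sketched in plain Python. This is a minimal illustration and not the ktransformers implementation: `MmapLoader`, `get_loader`, `release_loader`, and `load_layer_weights` are hypothetical names, and the standard-library `mmap` module stands in for the SafeTensor loader. The copy returned by `read` plays the role of the deep copy that the commit says the C++ engine holds.

```python
import mmap
import os
import tempfile


class MmapLoader:
    """Hypothetical stand-in for a file-backed weight loader."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def read(self, offset, size):
        # Slicing an mmap returns a new bytes object (a copy),
        # so the result stays valid after the mapping is closed.
        return self._map[offset:offset + size]

    def close(self):
        self._map.close()
        self._file.close()


_loader = None  # module-level singleton, created on demand


def get_loader(path):
    """Return the shared loader, creating it if it was released."""
    global _loader
    if _loader is None:
        _loader = MmapLoader(path)
    return _loader


def release_loader():
    """Close the mapping and drop the singleton so memory can be reclaimed."""
    global _loader
    if _loader is not None:
        _loader.close()
        _loader = None


def load_layer_weights(path, offset, size):
    """Copy one layer's bytes out, then release the mmap handles."""
    try:
        return get_loader(path).read(offset, size)
    finally:
        release_loader()


def demo():
    # Two fake 9-byte "layers" written to a temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"layer0000layer0001")
        path = f.name
    try:
        layers = [load_layer_weights(path, i * 9, 9) for i in range(2)]
        return layers, _loader is None
    finally:
        os.remove(path)
```

Calling `demo()` loads both fake layers one at a time. Each call recreates the loader and releases it again, so at most one mapping is open at any point, while the copied bytes remain usable afterwards.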