toy support for experts on GPU, no CUDA Graph

Atream 2025-02-15 15:16:00 +00:00
parent 1548c99234
commit c189d55bd1
6 changed files with 199 additions and 65 deletions

@@ -713,6 +713,8 @@
generate_device: "cuda:7"
prefill_device: "cuda:7"
# don't inject lm_head if already inject marlin experts
# For final modules (model.norm and lm_head), ensure they are on GPU 7 (as in your original config)
- match:
name: "(^model\\.layers\\.(4[5-9]|5[0-9]|60)\\.)|(^model\\.norm)|(^lm_head)"