model : NvFP4 quantized LM head support (#23046)

* NvFP4 quantized LM head support

Signed-off-by: ynankani <ynankani@nvidia.com>

* Address review comments

Signed-off-by: ynankani <ynankani@nvidia.com>

* Add assert for NvFP4 LM head and tied embeddings

Signed-off-by: ynankani <ynankani@nvidia.com>

* Address review comments

Signed-off-by: ynankani <ynankani@nvidia.com>

* Create output_s tensor only when the LM head is NvFP4

Signed-off-by: ynankani <ynankani@nvidia.com>

---------

Signed-off-by: ynankani <ynankani@nvidia.com>
ynankani 2026-05-16 09:09:27 +00:00 committed by GitHub
parent 59778f0196
commit 42928bc14d
103 changed files with 121 additions and 101 deletions

@@ -141,7 +141,7 @@ llama_model_maincoder::graph::graph(const llama_model & model, const llm_graph_p
 res->t_embd = cur;
 // lm_head
-cur = build_lora_mm(model.output, cur);
+cur = build_lora_mm(model.output, cur, model.output_s);
 cb(cur, "result_output", -1);
 res->t_logits = cur;
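
The changed call passes `model.output_s` alongside the LM head weight. A minimal sketch of what an optional output scale could look like, assuming `output_s` acts as a single multiplicative scale on the projection result and is absent (null) when the LM head is not NvFP4. The function `lm_head_project` and its parameters are hypothetical stand-ins for illustration, not llama.cpp or ggml API:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the LM head projection: logits = W * x,
// optionally multiplied by one per-tensor scale.
// w is row-major with shape [n_vocab x n_embd]; x has n_embd elements.
// scale == nullptr models the unquantized case where no scale tensor exists.
std::vector<float> lm_head_project(const std::vector<float> & w,
                                   const std::vector<float> & x,
                                   std::size_t                n_vocab,
                                   const float *              scale = nullptr) {
    const std::size_t  n_embd = x.size();
    std::vector<float> logits(n_vocab, 0.0f);

    for (std::size_t r = 0; r < n_vocab; ++r) {
        float acc = 0.0f;
        for (std::size_t c = 0; c < n_embd; ++c) {
            acc += w[r * n_embd + c] * x[c];
        }
        // Apply the scale only when one was provided.
        logits[r] = scale ? acc * (*scale) : acc;
    }
    return logits;
}
```

Called without a scale, the sketch returns the plain matrix-vector product; called with a pointer to a scale, every logit is multiplied by that value. Defaulting the extra argument to null keeps call sites for unquantized heads unchanged.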