ynankani
|
42928bc14d
|
model : NvFP4 quantized LM head support (#23046)
* NvFP4 quantized LM head support
Signed-off-by: ynankani <ynankani@nvidia.com>
* Address review comments
Signed-off-by: ynankani <ynankani@nvidia.com>
* Add assert for NvFP4 LM head and tied embeddings
Signed-off-by: ynankani <ynankani@nvidia.com>
* Address review comments
Signed-off-by: ynankani <ynankani@nvidia.com>
* Create output_s tensor only when LM head is NvFP4
Signed-off-by: ynankani <ynankani@nvidia.com>
---------
Signed-off-by: ynankani <ynankani@nvidia.com>
|
2026-05-16 11:09:27 +02:00 |
|
Xuan-Son Nguyen
|
994118a183
|
model: move load_hparams and load_tensors to per-model definition (#22004)
* git-friendly migration
* add build_graph
* nits
* exclude old code from build
* wip
* add llm_arch_model_i
* prepare downstream functions
* nits
* nits
* wip
* wip
* add back create_tensor_qkv
* fix files missing include
* enforce one llm_build per arch
* cmake: use glob
* missing model params
* nits
* wip
* wip (2)
* wip (3)
* test-llama-archs is happy
* improve switch case
* move more stuff into llm_arch_model_i
* fix downstream code
* nits
* nits (2)
* fix order
* llama_model_base
* LLAMA_LOAD_LOCALS
* small fix
* fix build errors
* auto
* rm migration script and ifdef
|
2026-05-04 12:36:59 +02:00 |
|
Sigbjørn Skjæret
|
4f02d47339
|
model : refactor bias tensor variable names (#22079)
* refactor bias tensor variable names
* use create_tensor_qkv for jina-bert-v2
|
2026-04-18 20:12:00 +02:00 |
|
Xuan-Son Nguyen
|
4fbdabdc61
|
model: using single llm_build per arch (#21970)
* model: using single llm_build per arch
* fix merge
* nits
|
2026-04-16 21:10:22 +02:00 |
|