ggml: add Q1_0 1-bit quantization support (CPU) (#21273)

* ggml: add Q1_0 and Q1_0_g128 1-bit quantization support (CPU)

* add generic fallback for x86

* remove Q1_0 (group size 32)

* rename Q1_0_g128 => Q1_0

* fix Q1_0 LlamaFileType Enum

* Fix trailing spaces; add generic fallback for other backends

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix \r\n spacing + arch-fallback

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Pasha Khosravi 2026-04-06 11:55:21 -07:00 committed by GitHub
parent 506200cf8b
commit 2e1f0a889e
21 changed files with 285 additions and 5 deletions


@@ -154,6 +154,7 @@ extern "C" {
         LLAMA_FTYPE_MOSTLY_TQ2_0 = 37, // except 1d tensors
         LLAMA_FTYPE_MOSTLY_MXFP4_MOE = 38, // except 1d tensors
         LLAMA_FTYPE_MOSTLY_NVFP4 = 39, // except 1d tensors
+        LLAMA_FTYPE_MOSTLY_Q1_0 = 40, // except 1d tensors
         LLAMA_FTYPE_GUESSED = 1024, // not specified in the model file
     };
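
For readers unfamiliar with 1-bit group quantization, here is a minimal sketch of the general idea using the group size of 128 mentioned in the commit message. It is illustrative only: the struct `demo_block_1bit`, the functions `demo_quantize_1bit` and `demo_dequantize_1bit`, the `float` scale, and the mean-absolute-value scale rule are all assumptions made for this example, not the block layout or kernels added by this commit.

```c
#include <stdint.h>
#include <string.h>

#define GROUP_SIZE 128  /* group size named in the commit message */

/* Hypothetical layout for illustration; NOT ggml's actual Q1_0 struct. */
typedef struct {
    float   scale;                 /* one scale per group (assumed float here) */
    uint8_t bits[GROUP_SIZE / 8];  /* one sign bit per weight, 16 bytes total */
} demo_block_1bit;

/* Quantize GROUP_SIZE floats: scale = mean |x|, bit = 1 when x >= 0. */
static void demo_quantize_1bit(const float *x, demo_block_1bit *out) {
    float sum_abs = 0.0f;
    memset(out->bits, 0, sizeof(out->bits));
    for (int i = 0; i < GROUP_SIZE; ++i) {
        sum_abs += (x[i] < 0.0f) ? -x[i] : x[i];
        if (x[i] >= 0.0f) {
            out->bits[i / 8] |= (uint8_t)(1u << (i % 8));
        }
    }
    out->scale = sum_abs / GROUP_SIZE;
}

/* Dequantize: each weight becomes +scale (bit set) or -scale (bit clear). */
static void demo_dequantize_1bit(const demo_block_1bit *in, float *y) {
    for (int i = 0; i < GROUP_SIZE; ++i) {
        int bit = (in->bits[i / 8] >> (i % 8)) & 1;
        y[i] = bit ? in->scale : -in->scale;
    }
}
```

In this sketch each group stores 16 bytes of sign bits plus one scale, so every weight is reconstructed as either `+scale` or `-scale`. The real format may pick the scale differently or store it in a narrower type.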