# quantize

TODO
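
A minimal usage sketch, assuming the tool is built as `quantize` as in upstream llama.cpp (the binary name and build path may differ in this fork), and assuming an F16 GGUF input file named `ggml-model-f16.gguf`:

```bash
# Quantize an F16 GGUF model to Q4_K_M (one of the types listed below).
# Usage: ./quantize <input.gguf> <output.gguf> <type> [nthreads]
./quantize ggml-model-f16.gguf ggml-model-Q4_K_M.gguf Q4_K_M
```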

## Llama 2 7B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.35                  |
| Q3_K_S       | 3.50                  |
| Q3_K_M       | 3.91                  |
| Q3_K_L       | 4.27                  |
| Q4_K_S       | 4.58                  |
| Q4_K_M       | 4.84                  |
| Q5_K_S       | 5.52                  |
| Q5_K_M       | 5.68                  |
| Q6_K         | 6.56                  |
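
As a rough guide to what these figures mean on disk: BPW times the parameter count, divided by 8, gives an approximate file size. For Llama 2 7B (about 6.74 B parameters), Q4_K_M at 4.84 BPW works out to roughly 6.74×10⁹ × 4.84 / 8 ≈ 4.1 GB, plus a small overhead for GGUF metadata.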

## Llama 2 13B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.34                  |
| Q3_K_S       | 3.48                  |
| Q3_K_M       | 3.89                  |
| Q3_K_L       | 4.26                  |
| Q4_K_S       | 4.56                  |
| Q4_K_M       | 4.83                  |
| Q5_K_S       | 5.51                  |
| Q5_K_M       | 5.67                  |
| Q6_K         | 6.56                  |

## Llama 2 70B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.40                  |
| Q3_K_S       | 3.47                  |
| Q3_K_M       | 3.85                  |
| Q3_K_L       | 4.19                  |
| Q4_K_S       | 4.53                  |
| Q4_K_M       | 4.80                  |
| Q5_K_S       | 5.50                  |
| Q5_K_M       | 5.65                  |
| Q6_K         | 6.56                  |