# quantize

TODO
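
A minimal usage sketch, assuming the tool is built as `quantize` as in upstream llama.cpp (the binary name and build path may differ in this fork), and assuming an F16 GGUF input file named `ggml-model-f16.gguf`:

```bash
# Quantize an F16 GGUF model to Q4_K_M (one of the types listed below).
# Usage: ./quantize <input.gguf> <output.gguf> <type> [nthreads]
./quantize ggml-model-f16.gguf ggml-model-Q4_K_M.gguf Q4_K_M
```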

## Llama 2 7B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.35                  |
| Q3_K_S       | 3.50                  |
| Q3_K_M       | 3.91                  |
| Q3_K_L       | 4.27                  |
| Q4_K_S       | 4.58                  |
| Q4_K_M       | 4.84                  |
| Q5_K_S       | 5.52                  |
| Q5_K_M       | 5.68                  |
| Q6_K         | 6.56                  |
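
As a rough guide to what these figures mean on disk: BPW times the parameter count, divided by 8, gives an approximate file size. For Llama 2 7B (about 6.74 B parameters), Q4_K_M at 4.84 BPW works out to roughly 6.74×10⁹ × 4.84 / 8 ≈ 4.1 GB, plus a small overhead for GGUF metadata.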

## Llama 2 13B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.34                  |
| Q3_K_S       | 3.48                  |
| Q3_K_M       | 3.89                  |
| Q3_K_L       | 4.26                  |
| Q4_K_S       | 4.56                  |
| Q4_K_M       | 4.83                  |
| Q5_K_S       | 5.51                  |
| Q5_K_M       | 5.67                  |
| Q6_K         | 6.56                  |

## Llama 2 70B

| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.40                  |
| Q3_K_S       | 3.47                  |
| Q3_K_M       | 3.85                  |
| Q3_K_L       | 4.19                  |
| Q4_K_S       | 4.53                  |
| Q4_K_M       | 4.80                  |
| Q5_K_S       | 5.50                  |
| Q5_K_M       | 5.65                  |
| Q6_K         | 6.56                  |