Henri Vasserman
3b4a53138f
Merge 'origin/master' into hipblas
2023-04-28 10:08:41 +03:00
Henri Vasserman
a1caa48611
add more cuda defines
This is so 'slaren/cuda-f16f32' would merge.
2023-04-28 10:08:21 +03:00
Georgi Gerganov
574406dc7e
ggml : add Q5_0 and Q5_1 quantization (#1187)
* ggml : add Q5_0 quantization (cuBLAS only)
* ggml : fix Q5_0 qh -> uint32_t
* ggml : fix q5_0 histogram stats
* ggml : q5_0 scalar dot product
* ggml : q5_0 ARM NEON dot
* ggml : q5_0 more efficient ARM NEON using uint64_t masks
* ggml : rename Q5_0 -> Q5_1
* ggml : adding Q5_0 mode
* quantize : add Q5_0 and Q5_1 to map
* ggml : AVX2 optimizations for Q5_0, Q5_1 (#1195)
---------
Co-authored-by: Stephan Walter <stephan@walter.name>
2023-04-26 23:14:13 +03:00
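The Q5_0 commit above adds a 5-bit block format: each block shares one scale `d`, the low four bits of each code are packed into nibbles, and the fifth bits are collected into a `uint32_t qh` (the "qh -> uint32_t" fix in the bullets). A minimal Python sketch of the idea, not the real C implementation in ggml.c; the block size of 32 and the exact rounding used here are assumptions modeled on ggml's Q4_0:

```python
QK5_0 = 32  # elements per block (assumed, matching ggml's other block formats)

def quantize_q5_0(xs):
    """Quantize one block of 32 floats to 5-bit codes plus a shared scale.
    Illustrative only; the real kernel is C and also packs nibbles."""
    assert len(xs) == QK5_0
    # pick the value with the largest magnitude, keeping its sign
    amax = max(xs, key=abs)
    d = amax / -16.0
    inv_d = 1.0 / d if d != 0.0 else 0.0
    # codes lie in [0, 31]; 16 is the implicit zero point
    qs = [min(31, int(x * inv_d + 16.5)) for x in xs]
    # gather the fifth bit of every code into one 32-bit word,
    # as the "qh -> uint32_t" bullet describes
    qh = 0
    for i, q in enumerate(qs):
        qh |= ((q >> 4) & 1) << i
    return d, qs, qh

def dequantize_q5_0(d, qs):
    return [(q - 16) * d for q in qs]
```

Q5_1 (per the rename bullets) is the asymmetric variant, carrying a minimum `m` alongside `d` the way Q4_1 does relative to Q4_0.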
Henri Vasserman
ef51e9ecac
Merge branch 'ggerganov:master' into hipblas
2023-04-26 12:46:26 +03:00
Georgi Gerganov
7a32fcb3b2
ggml : add Q8_0 quantization format (rename the old one to Q8_1) (ARM NEON) (#1179)
* ggml : add Q8_0 quantization format (rename the old one to Q8_1)
* tests : fix test-quantize-fns
* ggml : finalize Q8_0 implementation
* ggml : use q4_0_q8_0 and q4_2_q8_0
* ggml : fix Q8_0 dot product bug (ARM)
* ggml : Q8_0 unroll x2
* ggml : fix bug - using wrong block type
* ggml : extend quantize_fns_t with "vec_dot_type"
* ggml : fix Q8_0 to use 255 values out of 256
* ggml : fix assert using wrong QK4_2 instead of QK4_3
2023-04-25 23:40:51 +03:00
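Q8_0 is the simplest of these formats: one scale per block and plain signed 8-bit codes, used as the activation side of the `q4_0_q8_0` dot products named in the bullets. The "255 values out of 256" fix restricts codes to [-127, 127], dropping -128 so the range is symmetric. A hedged Python sketch (block size 32 assumed, real code is C):

```python
QK8_0 = 32  # assumed block size, matching ggml's other formats

def quantize_q8_0(xs):
    """One shared scale, signed 8-bit codes in [-127, 127] (255 of 256 values)."""
    assert len(xs) == QK8_0
    amax = max(abs(x) for x in xs)
    d = amax / 127.0
    inv_d = 1.0 / d if d != 0.0 else 0.0
    qs = [round(x * inv_d) for x in xs]  # |x * inv_d| <= 127, so no clamp needed
    return d, qs

def dequantize_q8_0(d, qs):
    return [q * d for q in qs]
```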
Henri Vasserman
db7a01297e
Merge 'origin/master' into hipblas
2023-04-23 21:49:28 +03:00
slaren
50cb666b8a
Improve cuBLAS performance by using a memory pool (#1094)
* Improve cuBLAS performance by using a memory pool
* Move cuda specific definitions to ggml-cuda.h/cu
* Add CXX flags to nvcc
* Change memory pool synchronization mechanism to a spin lock
* General code cleanup
2023-04-21 21:59:17 +02:00
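The memory-pool commit avoids paying the cost of a device allocation on every cuBLAS call by recycling freed scratch buffers. A toy host-side Python sketch of the idea only, with assumptions labeled: `bytearray` stands in for `cudaMalloc`, and `threading.Lock` stands in for the spin lock the commit describes; the real code is C/CUDA in ggml-cuda.

```python
import threading

class BufferPool:
    """Toy buffer pool: reuse freed buffers instead of reallocating.
    Hypothetical stand-in for the cuBLAS scratch-buffer pool."""
    def __init__(self):
        self._lock = threading.Lock()  # real code: spin lock
        self._free = []                # reusable buffers

    def alloc(self, size):
        with self._lock:
            # first-fit: hand back any free buffer that is big enough
            for i, buf in enumerate(self._free):
                if len(buf) >= size:
                    return self._free.pop(i)
        return bytearray(size)  # pool miss: allocate fresh

    def free(self, buf):
        with self._lock:
            self._free.append(buf)  # keep for reuse instead of releasing
```

The payoff is that repeated same-sized matmuls hit the pool after the first call, so the allocator is off the hot path.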
slaren
2005469ea1
Add Q4_3 support to cuBLAS (#1086)
2023-04-20 20:49:53 +02:00
slaren
02d6988121
Improve cuBLAS performance by dequantizing on the GPU (#1065)
2023-04-20 03:14:14 +02:00