Concedo
4605074245
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
# README.md
# ggml.c
2023-04-20 17:30:54 +08:00
Stephan Walter
c8c2c52482
AVX2 optimization for vec_dot_q4_2_q8_0 ( #1068 )
2023-04-20 08:45:41 +02:00
slaren
02d6988121
Improve cuBLAS performance by dequantizing on the GPU ( #1065 )
2023-04-20 03:14:14 +02:00
Kawrakow
f7d05095b4
Q4_2 quantization with rmse-optimized scale and quants ( #1062 )
...
* Q4_2 quantization with rmse-optimized scale and quants
For quantize-stats we get
q4_2: rmse 0.00159301, maxerr 0.17480469, 95pct<0.0030, median<0.0012
For 7B perplexity with BLAS enabled we get 6.2038 after 655 chunks.
Quantization is slow (~90 seconds on my Mac for 7B) because it is not
multi-threaded as in PR #896.
* ggml : satisfy the sanitizer builds
Not sure why this makes them fail
* Better follow ggml conventions for function names
* Fixed type as per reviewer comment
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-19 20:20:14 +02:00
Georgi Gerganov
884e7d7a2b
ggml : use 8-bit precision for Q4_1 intermediate results ( #1047 )
...
* ggml : use 8-bit precision for Q4_1 intermediate results (ARM)
* ggml : optimize ggml_vec_dot_q4_1_q8_0() via vmlaq_n_f32
56 ms/token with Q4_1 !
* ggml : AVX2 implementation of ggml_vec_dot_q4_1_q8_0 (#1051 )
* gitignore : ignore ppl-*.txt files
---------
Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>
2023-04-19 20:10:08 +03:00
Stephan Walter
f3d4edf504
ggml : Q4 cleanup - remove 4-bit dot product code ( #1061 )
...
* Q4 cleanup
* Remove unused AVX512 Q4_0 code
2023-04-19 19:06:37 +03:00
Concedo
be1222c36e
Merged the upstream cuBLAS feature
2023-04-19 20:45:37 +08:00
slaren
8944a13296
Add NVIDIA cuBLAS support ( #1044 )
2023-04-19 11:22:45 +02:00
Concedo
f662a9a230
Merge branch 'master' into concedo
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# CMakeLists.txt
# Makefile
# README.md
2023-04-19 16:34:51 +08:00
slaren
6667401238
Multi-threaded ggml_cpy ( #1035 )
...
* Multi-threaded ggml_cpy
* Update ggml.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Also fix wdata offset in ggml_compute_forward_add_q_f32
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-04-19 00:53:24 +02:00
Georgi Gerganov
77a73403ca
ggml : add new Q4_2 quantization (ARM only) ( #1046 )
...
* ggml : Q4_2 ARM
* ggml : add ggml_is_quantized()
* llama : update llama_type_name() with Q4_2 entry
* ggml : speed-up q4_2
- 4 threads: ~100ms -> ~90ms
- 8 threads: ~55ms -> ~50ms
* ggml : optimize q4_2 using vmlaq_n_f32 + vmulq_n_f32
2023-04-18 23:54:57 +03:00
Georgi Gerganov
50a8a2af97
ggml : scratch that - vmlaq_n_f32 is always better
...
Had a background process that was messing with the timings
2023-04-18 23:11:23 +03:00
Georgi Gerganov
dcdd65e296
ggml : optimize ggml_vec_dot_q4_0_q8_0() using vectorized accumulators
2023-04-18 22:59:17 +03:00
Concedo
ac61e34d5f
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# README.md
2023-04-18 17:38:10 +08:00
slaren
315a95a4d3
Add LoRA support ( #820 )
2023-04-17 17:28:55 +02:00
Georgi Gerganov
69b740289f
ggml : avoid using ggml_fp16_to_fp32() and ggml_fp32_to_fp16() in ggml.c
2023-04-17 16:16:23 +03:00
Ivan Komarov
f266259ad9
Speedup the AVX-512 implementation of ggml_vec_dot_q4_0() ( #933 )
2023-04-17 15:10:57 +02:00
Concedo
5a4d1b5d15
Merge branch 'master' into concedo
...
# Conflicts:
# CMakeLists.txt
# Makefile
2023-04-16 14:08:23 +08:00
Stephan Walter
2f7c8e014e
Fix potential int8 overflow in non-SIMD vec_dot ( #986 )
2023-04-15 18:28:56 +00:00
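
The commit title above does not say where the overflow occurred, so the following is only a generic illustration of the hazard, not the actual patch. It is a minimal sketch of a scalar dot product over `int8_t` values that accumulates in `int`; the name `example_dot_i8` is hypothetical and does not come from ggml.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from ggml.c): scalar dot product of two int8
 * vectors. Each product can reach 128 * 128 = 16384, which does not fit in
 * int8_t, so both the product and the running sum are kept in int. */
static int example_dot_i8(const int8_t *a, const int8_t *b, int n) {
    assert(a && b && n >= 0);
    int sum = 0; /* wide accumulator */
    for (int i = 0; i < n; ++i) {
        sum += (int)a[i] * (int)b[i];
    }
    return sum;
}
```

Storing an intermediate product or partial sum back into an `int8_t` variable would truncate it, which is the kind of mistake a wide accumulator avoids.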
Concedo
3e992eabb4
Merge remote-tracking branch 'occam/clblast-gpu-dequant' into concedo
2023-04-16 00:26:54 +08:00
Stephan Walter
0ad964631f
Refactor ggml.c for future tensor types ( #1001 )
2023-04-15 16:25:38 +00:00
0cc4m
57d046eeb6
Enable dequantization on GPU for CLBlast
2023-04-15 18:04:24 +02:00
Georgi Gerganov
e95b6554b4
ggml : add Q8_0 quantization for intermediate results ( #951 )
...
* ggml : add Q8_0 quantization for intermediate results
* quantize-stats : fix test + add it to Makefile default
* Q8: use int8_t, AVX/AVX2 optimizations
* ggml : fix quantize_row_q8_0() ARM_NEON rounding
* minor : updates after rebase to latest master
* quantize-stats : delete obsolete strings
* ggml : fix q4_1 dot func
---------
Co-authored-by: Stephan Walter <stephan@walter.name>
2023-04-15 17:53:22 +03:00
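
As a rough illustration of what 8-bit block quantization of intermediate results involves, here is a minimal scalar sketch. It assumes a block of 32 floats sharing one scale, with the scale chosen so that the largest magnitude maps to 127. The names `EXAMPLE_QK`, `example_block_q8` and `example_quantize_block_q8` are hypothetical, and the rounding is hand-rolled to keep the sketch free of libm; it should not be read as the code in ggml.c.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_QK 32 /* block size assumed for illustration */

/* Hypothetical block layout: one float scale plus EXAMPLE_QK signed bytes. */
typedef struct {
    float  d;              /* per-block scale */
    int8_t qs[EXAMPLE_QK]; /* quantized values in [-127, 127] */
} example_block_q8;

/* Quantize one block: x[i] is approximated by d * qs[i]. */
static void example_quantize_block_q8(const float *x, example_block_q8 *y) {
    assert(x && y);

    /* Find the largest absolute value in the block. */
    float amax = 0.0f;
    for (int i = 0; i < EXAMPLE_QK; ++i) {
        const float v = x[i] < 0.0f ? -x[i] : x[i];
        if (v > amax) amax = v;
    }

    const float d  = amax / 127.0f;
    const float id = d != 0.0f ? 1.0f / d : 0.0f; /* all-zero block: avoid 1/0 */
    y->d = d;

    for (int i = 0; i < EXAMPLE_QK; ++i) {
        const float v = x[i] * id;
        /* Round half away from zero, then narrow to int8. */
        y->qs[i] = (int8_t)(v + (v >= 0.0f ? 0.5f : -0.5f));
    }
}
```

Dequantization under the same assumptions is simply `y->d * y->qs[i]`, so the per-element error is bounded by about half the scale.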
Georgi Gerganov
aa485cee33
ggml : use posix_memalign on non-Windows env
2023-04-15 14:25:45 +03:00
Concedo
d00b865eb1
Merge branch 'master' into concedo
...
# Conflicts:
# .devops/full.Dockerfile
# Makefile
# flake.nix
2023-04-15 11:33:43 +08:00
Pavol Rusnak
c56b715269
Expose type name from ggml ( #970 )
...
Avoid duplication of type names in utils
Co-authored-by: Håkon H. Hitland <haakon@likedan.net>
2023-04-14 20:05:37 +02:00
Kerfuffle
c9a59b70a5
ggml : add unary and binary map operations ( #874 )
...
* GGML map ops proof of concept.
* Various cleanups.
Add handling for task setting.
Add handling for ggml_compute_backward.
Rename functions to ggml_map_unary_f32 and ggml_map_binary_f32
Fix compiler warnings related to casting function pointers and `void *`
Reorder functions and definitions based on the GGML op number.
Use typedefs for map op function pointer types.
* Fix position of map ops cases in ggml_compute_forward
2023-04-14 17:43:55 +03:00
Concedo
a819f22cac
Merge branch 'master' into concedo
...
# Conflicts:
# CMakeLists.txt
# Makefile
# README.md
# flake.nix
2023-04-14 21:40:33 +08:00
Georgi Gerganov
1623a6e9b4
ggml : minor
2023-04-14 13:31:29 +03:00
Georgi Gerganov
c14e0d2f23
ggml : always allocate buffers with size multiple of GGML_MEM_ALIGN
2023-04-14 13:31:15 +03:00
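
Rounding a requested size up to a multiple of the alignment is usually a one-line bit trick. The sketch below shows that idea under stated assumptions: `EXAMPLE_MEM_ALIGN` and `example_align_up` are hypothetical names, and the value 16 is chosen only for illustration rather than taken from ggml.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical alignment for illustration; GGML_MEM_ALIGN itself is defined in ggml.c. */
#define EXAMPLE_MEM_ALIGN 16

/* Round size up to the next multiple of align (align must be a power of two). */
static size_t example_align_up(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

For instance, `example_align_up(17, EXAMPLE_MEM_ALIGN)` yields 32, while a size that is already a multiple of the alignment is returned unchanged.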
Georgi Gerganov
0f07cacb05
ggml : fix q4_1 dot product types
2023-04-14 09:45:42 +03:00
Howard Su
c5d70f5c9e
ggml : optimize rope function to avoid calling powf in the tight loop ( #807 )
2023-04-14 09:24:52 +03:00
Georgi Gerganov
a3a2a0eda8
ggml : add GGML_DEFAULT_N_THREADS
2023-04-13 18:36:48 +03:00
Georgi Gerganov
d990e3fffc
ggml : speed-up ggml_vec_dot_q4_1() ARM_NEON + 32-bit ARM support ( #900 )
...
* ggml : speed-up q4_1 ARM_NEON by ~5%
* ggml : implement vaddvq when missing
* ggml : implement vminvq and vmaxvq when missing
* ggml : implement vzip when missing
* ggml : fix comment
* ggml : try to use correct ifdef
2023-04-13 18:32:36 +03:00
Stephan Walter
6232f2d7fd
ggml : optimize non-SIMD Q4_0 vector dot product ( #703 )
2023-04-13 17:59:50 +03:00
Pavol Rusnak
6c248707f5
ggml : introduce GGML_ALIGNED_MALLOC/GGML_ALIGNED_FREE macros ( #884 )
...
which allows us to use aligned_alloc or _aligned_malloc functions
2023-04-13 17:08:32 +03:00
Vladimir
8c3ffc2f04
ggml : update cblas_sgemm columns var to be more reasonable ( #838 )
2023-04-13 16:24:30 +03:00
Concedo
4faae0afa9
Merged upstream, fixed OSX compile errors, integrated noavx2 build into main
2023-04-12 18:08:55 +08:00
Pavol Rusnak
8b679987cd
Fix whitespace, add .editorconfig, add GitHub workflow ( #883 )
2023-04-11 19:45:44 +00:00
Concedo
9245c7d7d0
Merge branch 'master' into concedo
2023-04-11 23:38:15 +08:00
Concedo
23c675b2e6
integrated optional (experimental) CLBlast support
2023-04-11 23:33:44 +08:00
Stephan Walter
3e6e70d8e8
Add enum llama_ftype, sync ggml_type to model files ( #709 )
2023-04-11 15:03:51 +00:00
comex
2663d2c678
Windows fixes ( #890 )
...
Mostly for msys2 and mingw64 builds, which are different from each other
and different from standard Visual Studio builds. Isn't Windows fun?
- Define _GNU_SOURCE in more files (it's already used in ggml.c for
Linux's sake).
- Don't use PrefetchVirtualMemory if not building for Windows 8 or later
(mingw64 doesn't by default). But warn the user about this situation
since it's probably not intended.
- Check for NOMINMAX already being defined, which it is on mingw64.
- Actually use the `increment` variable (bug in my `pizza` PR).
- Suppress unused variable warnings in the fake pthread_create and
pthread_join implementations for Windows.
- (not Windows-related) Remove mention of `asprintf` from comment;
`asprintf` is no longer used.
Fixes #871 .
2023-04-11 15:19:54 +02:00
Concedo
c9f18082fd
Merge remote-tracking branch 'occam/clblast' into concedo
2023-04-11 17:01:31 +08:00
Georgi Gerganov
461ba9e66e
ggml : fix WASM build
2023-04-10 23:20:01 +03:00
Georgi Gerganov
c3ac702e5e
ggml : add ggml_cont() + optimize ggml_cpy() for contiguous dst
2023-04-10 22:42:28 +03:00
Georgi Gerganov
9d634ef452
ggml : remove trailing whitespaces
2023-04-10 22:42:28 +03:00
Marco Matthies
d9a239c410
Simplify to include lower-case windows.h always, fix compile on mingw32 ( #747 )
2023-04-10 19:57:59 +02:00
Georgi Gerganov
684da25926
ggml : fix quantize_row_q4_1() ARM_NEON ( close #876 )
2023-04-10 19:29:48 +03:00
0cc4m
c3db99ea32
Allow use of OpenCL GPU-based BLAS using CLBlast instead of OpenBLAS for context processing
2023-04-10 18:20:40 +02:00