koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-12 14:11:27 +00:00

Author	SHA1	Message	Date
Concedo	94827172e0	Merge branch 'master' into concedo # Conflicts: # CMakeLists.txt # Makefile # ggml-cuda.cu # ggml-cuda.h	2023-05-02 14:38:31 +08:00
DannyDaemonic	f4cef87edf	Add git-based build information for better issue tracking (#1232) * Add git-based build information for better issue tracking * macOS fix * "build (hash)" and "CMAKE_SOURCE_DIR" changes * Redo "CMAKE_CURRENT_SOURCE_DIR" and clearer build messages * Fix conditional dependency on missing target * Broke out build-info.cmake, added find_package fallback, and added build into to all examples, added dependencies to Makefile * 4 space indenting for cmake, attempt to clean up my mess in Makefile * Short hash, less fancy Makefile, and don't modify build-info.h if it wouldn't change it	2023-05-01 18:23:47 +02:00
Concedo	3de34ee492	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # ggml-opencl.c	2023-05-01 12:03:46 +08:00
Pavol Rusnak	6f79699286	build: add armv{6,7,8} support to cmake (#1251) - flags copied from Makefile - updated comments in both CMakeLists.txt and Makefile to match reality	2023-04-30 20:48:38 +02:00
Stephan Walter	f0d70f147d	Various fixes to mat_mul benchmark (#1253)	2023-04-30 12:32:37 +00:00
Concedo	b3315459c7	pilled the new dequants for clblast, fixed some ooms	2023-04-30 14:15:44 +08:00
Georgi Gerganov	214b6a3570	ggml : adjust mul_mat_f16 work memory (#1226) * llama : minor - remove explicity int64_t cast * ggml : reduce memory buffer for F16 mul_mat when not using cuBLAS * ggml : add asserts to guard for incorrect wsize	2023-04-29 18:43:28 +03:00
Georgi Gerganov	305eb5afd5	build : fix reference to old llama_util.h	2023-04-29 13:53:12 +03:00
Concedo	bb282a4ecf	reinstated the q4_3 format, for backwards compatibility.	2023-04-29 11:42:04 +08:00
Concedo	0fc1772a8f	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # README.md # ggml.c	2023-04-29 11:14:05 +08:00
slaren	7fc50c051a	cuBLAS: use host pinned memory and dequantize while copying (#1207) * cuBLAS: dequantize simultaneously while copying memory * cuBLAS: use host pinned memory * cuBLAS: improve ggml_compute_forward_mul_mat_f16_f32 with pinned memory * cuBLAS: also pin kv cache * fix rebase	2023-04-29 02:04:18 +02:00
0cc4m	7296c961d9	ggml : add CLBlast support (#1164) * Allow use of OpenCL GPU-based BLAS using ClBlast instead of OpenBLAS for context processing * Improve ClBlast implementation, avoid recreating buffers, remove redundant transfers * Finish merge of ClBlast support * Move CLBlast implementation to separate file Add buffer reuse code (adapted from slaren's cuda implementation) * Add q4_2 and q4_3 CLBlast support, improve code * Double CLBlast speed by disabling OpenBLAS thread workaround Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com> Co-authored-by: slaren <2141330+slaren@users.noreply.github.com> * Fix device selection env variable names * Fix cast in opencl kernels * Add CLBlast to CMakeLists.txt * Replace buffer pool with static buffers a, b, qb, c Fix compile warnings * Fix typos, use GGML_TYPE defines, improve code * Improve btype dequant kernel selection code, add error if type is unsupported * Improve code quality * Move internal stuff out of header * Use internal enums instead of CLBlast enums * Remove leftover C++ includes and defines * Make event use easier to read Co-authored-by: Henri Vasserman <henv@hot.ee> * Use c compiler for opencl files * Simplify code, fix include * First check error, then release event * Make globals static, fix indentation * Rename dequant kernels file to conform with other file names * Fix import cl file name --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com> Co-authored-by: slaren <2141330+slaren@users.noreply.github.com> Co-authored-by: Henri Vasserman <henv@hot.ee> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-04-28 17:57:16 +03:00
Johannes Gäßler	92a6e13a31	Add Manjaro CUDA include and lib dirs to Makefile (#1212)	2023-04-28 15:40:32 +02:00
Concedo	032a171867	integrated q5 formats	2023-04-28 12:58:39 +08:00
Concedo	235daf4016	Merge branch 'master' into concedo # Conflicts: # .github/workflows/build.yml # README.md	2023-04-25 20:44:22 +08:00
slaren	e4cf982e0d	Fix cuda compilation (#1128) * Fix: Issue with CUBLAS compilation error due to missing -fPIC flag --------- Co-authored-by: B1gM8c <89020353+B1gM8c@users.noreply.github.com>	2023-04-24 17:29:58 +02:00
Concedo	59fb174678	fixed compile errors, made mmap automatic when lora is selected, added updated quantizers and quantization handling for gpt neox gpt 2 and gptj	2023-04-24 23:20:06 +08:00
Concedo	8e615c8245	Merge branch 'master' into concedo_experimental # Conflicts: # README.md	2023-04-24 12:20:08 +08:00
Georgi Gerganov	e4422e299c	ggml : better PERF prints + support "LLAMA_PERF=1 make"	2023-04-23 18:15:39 +03:00
Concedo	1b7aa2b815	Merge branch 'master' into concedo # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # Makefile	2023-04-22 16:22:08 +08:00
Georgi Gerganov	872c365a91	ggml : fix AVX build + update to new Q8_0 format	2023-04-22 11:08:12 +03:00
Concedo	7b3d04e5d4	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt	2023-04-22 10:58:16 +08:00
Concedo	4fa3dfe8bc	just doesn't work properly on windows. will leave it as a manual flag for others	2023-04-22 10:57:38 +08:00
slaren	50cb666b8a	Improve cuBLAS performance by using a memory pool (#1094) * Improve cuBLAS performance by using a memory pool * Move cuda specific definitions to ggml-cuda.h/cu * Add CXX flags to nvcc * Change memory pool synchronization mechanism to a spin lock General code cleanup	2023-04-21 21:59:17 +02:00
Concedo	68898046c2	accidentally added the binaries onto repo again.	2023-04-22 00:41:19 +08:00
Concedo	f555db44ec	adding the libraries for cublas first. but i cannot get the kernel to work yet	2023-04-21 23:24:09 +08:00
Concedo	794a38a2e8	Revert "cublas is not feasible at this time. removed for now" This reverts commit `3687db7cf7`.	2023-04-21 21:02:40 +08:00
Concedo	5160053e51	merged llama adapter into the rest of the gpt adapters	2023-04-21 17:47:48 +08:00
Concedo	82d74ca1a6	Merge branch 'master' into concedo # Conflicts: # .github/workflows/build.yml	2023-04-21 16:24:30 +08:00
Concedo	3687db7cf7	cublas is not feasible at this time. removed for now	2023-04-21 16:14:23 +08:00
slaren	2005469ea1	Add Q4_3 support to cuBLAS (#1086)	2023-04-20 20:49:53 +02:00
Concedo	07bb31b034	wip dont use	2023-04-21 00:35:54 +08:00
Concedo	7ba36c2c6c	trying to put out penguin based fires. sorry for inconvenience	2023-04-20 23:15:07 +08:00
源文雨	5addcb120c	fix: LLAMA_CUBLAS=1 undefined reference 'shm_open' (#1080)	2023-04-20 15:28:43 +02:00
Concedo	4605074245	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # README.md # ggml.c	2023-04-20 17:30:54 +08:00
Concedo	0b08ec7c5d	forgot to remove this	2023-04-20 16:28:47 +08:00
Concedo	346cd68903	make linux and OSX build process equal to windows. Now it will build all applicable libraries, for a full build do `make LLAMA_OPENBLAS=1 LLAMA_CLBLAST=1`	2023-04-20 15:53:55 +08:00
slaren	02d6988121	Improve cuBLAS performance by dequantizing on the GPU (#1065)	2023-04-20 03:14:14 +02:00
Stephan Walter	f3d4edf504	ggml : Q4 cleanup - remove 4-bit dot product code (#1061) * Q4 cleanup * Remove unused AVX512 Q4_0 code	2023-04-19 19:06:37 +03:00
Concedo	be1222c36e	Merged the upstream cublas feature,	2023-04-19 20:45:37 +08:00
slaren	8944a13296	Add NVIDIA cuBLAS support (#1044)	2023-04-19 11:22:45 +02:00
Concedo	f662a9a230	Merge branch 'master' into concedo # Conflicts: # .github/workflows/build.yml # .github/workflows/docker.yml # CMakeLists.txt # Makefile # README.md	2023-04-19 16:34:51 +08:00
Kawrakow	5ecff35151	Adding a simple program to measure speed of dot products (#1041) On my Mac, the direct Q4_1 product is marginally slower (~69 vs ~55 us for Q4_0). The SIMD-ified ggml version is now almost 2X slower (~121 us). On a Ryzen 7950X CPU, the direct product for Q4_1 quantization is faster than the AVX2 implementation (~60 vs ~62 us). --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>	2023-04-18 19:00:14 +00:00
Concedo	ea01771dd5	rwkv is done	2023-04-18 20:55:01 +08:00
Concedo	763ad172c0	arranged files, updated kobold lite, modified makefile for extra link args on linux, started RWKV implementation	2023-04-17 17:31:45 +08:00
Concedo	6548d3b3fb	Added prints for stopping sequences, made makefile 1% friendlier to arch linux users	2023-04-16 20:43:17 +08:00
Georgi Gerganov	e95b6554b4	ggml : add Q8_0 quantization for intermediate results (#951) * ggml : add Q8_0 quantization for intermediate results * quantize-stats : fix test + add it to Makefile default * Q8: use int8_t, AVX/AVX2 optimizations * ggml : fix quantize_row_q8_0() ARM_NEON rounding * minor : updates after rebase to latest master * quantize-stats : delete obsolete strings * ggml : fix q4_1 dot func --------- Co-authored-by: Stephan Walter <stephan@walter.name>	2023-04-15 17:53:22 +03:00
Concedo	d00b865eb1	Merge branch 'master' into concedo # Conflicts: # .devops/full.Dockerfile # Makefile # flake.nix	2023-04-15 11:33:43 +08:00
Stephan Walter	93265e988a	make : fix dependencies, use auto variables (#983)	2023-04-14 22:39:48 +03:00
Concedo	932d981222	more make targets	2023-04-14 21:54:18 +08:00

114 commits