koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-10 17:14:36 +00:00

Author	SHA1	Message	Date
0cc4m	dcb2ed4826	OpenCL: Fix duplication of layers in VRAM and RAM, add GPU mul kernel (#1653) * Use events instead of clFinish, where possible * OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel * Reduce queueing overhead for contiguous tensors by using single mul kernel call * Adapt to #1612 cl_mem malloc changes * Reduce code duplication between cuda and opencl branches * Improve implementation	2023-06-04 08:12:05 +02:00
Concedo	96b0e536b7	Merge branch 'opencl-dev-concedo' into concedo_experimental	2023-06-02 22:12:14 +08:00
Concedo	59fe16877d	Clblast fixes + enhancements to save VRAM: 1. Change all Clblast buffers to CL_MEM_READ_WRITE, as the pool malloc currently doesn't properly handle them. 2. When recycling buffers in pool malloc, always assign the SMALLEST available buffer that fits, instead of the FIRST available buffer 3. When failing to recycle a buffer in pool malloc (all too small), instead recycle the largest available free buffer by resizing it.	2023-06-02 22:10:49 +08:00
Concedo	8d0c81e7cc	Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental	2023-06-02 12:19:59 +08:00
0cc4m	24239f0df7	Improve implementation	2023-06-01 18:57:08 +02:00
Concedo	234270bd83	back to 32 block size, not better	2023-06-01 00:14:22 +08:00
Concedo	446e42a8c6	change dmmv block size	2023-05-31 21:40:12 +08:00
Concedo	077ee4e989	Revert "Revert "opencl : no need to allocate cl_mem on heap (#1612)"" This reverts commit `4afa38e744`.	2023-05-31 18:00:52 +08:00
Concedo	50c85bea4c	Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental	2023-05-31 17:53:14 +08:00
0cc4m	5e1eecfe12	Adapt to #1612 cl_mem malloc changes	2023-05-31 07:07:47 +02:00
0cc4m	49aaf08387	Merge remote-tracking branch 'origin/master' into opencl-dev	2023-05-31 06:58:51 +02:00
Concedo	85c9f7df41	Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental	2023-05-31 10:20:32 +08:00
Concedo	4afa38e744	Revert "opencl : no need to allocate cl_mem on heap (#1612)" This reverts commit `bb051d9723`.	2023-05-31 10:20:23 +08:00
0cc4m	ac6b49ed45	Reduce queueing overhead for contiguous tensors by using single mul kernel call	2023-05-30 18:49:53 +02:00
Concedo	3a73ebe8d2	Merge branch 'master' into concedo_experimental # Conflicts: # .devops/full.Dockerfile # .devops/main.Dockerfile # Makefile	2023-05-29 16:47:32 +08:00
Concedo	254a9ff12c	Merge commit '`ebc5d0651a`' into concedo_experimental # Conflicts: # ggml-opencl.cpp	2023-05-29 16:26:24 +08:00
Howard Su	bb051d9723	opencl : no need to allocate cl_mem on heap (#1612)	2023-05-28 20:13:36 +03:00
Howard Su	ca74884f66	opencl : use strstr to check if fp16 supported (#1611) * Use strstr to check if fp16 supported * Ensure ext_buffer is null terminated	2023-05-28 20:09:56 +03:00
0cc4m	97c5cca4e5	OpenCL: Don't load gpu layers into RAM, add mul_f32 kernel	2023-05-27 12:00:56 +02:00
0cc4m	ebc5d0651a	Use events instead of clFinish, where possible	2023-05-27 10:03:35 +02:00
Concedo	6d7749c98f	no difference	2023-05-27 12:42:19 +08:00
Concedo	bd4fe936f5	cleanup sampling code	2023-05-27 11:58:39 +08:00
Concedo	c97e10c50c	Merge branch 'master' into concedo_experimental	2023-05-24 00:36:30 +08:00
Maarten ter Huurne	7d873811f3	Fix handling of "invalid property" when creating OpenCL command queue (#1565) The `clCreateCommandQueue()` function will return the code `CL_INVALID_QUEUE_PROPERTIES` when passed unsupported properties, not `CL_INVALID_PROPERTY` as the original code was checking for.	2023-05-23 19:01:15 +03:00
Concedo	5bf9784381	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # ggml-opencl.cpp # llama.cpp	2023-05-23 18:19:16 +08:00
0cc4m	2e6cd4b025	OpenCL Token Generation Acceleration (#1459) * Move back to C++ for OpenCL * Refactor OpenCL code to work more like the CUDA code, add missing functions * Deduplicate dequant kernels * Add OpenCL compile options * Use compile args for preprocessing constants * Restore default platform + device selection by id behavior --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de> Co-authored-by: Henri Vasserman <henv@hot.ee>	2023-05-23 00:33:24 +03:00
Concedo	981d5ba866	Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # Makefile # README.md # ggml-opencl.cpp # llama.cpp # otherarch/ggml_v2-opencl-legacy.c	2023-05-22 16:16:48 +08:00
Concedo	587308a202	fixed some build errors on linux, changed icon resolution, added more error printing	2023-05-22 12:18:42 +08:00
0cc4m	18e9dd87da	Explicitely set GEMM type	2023-05-21 08:34:17 +02:00
0cc4m	b6b39960c0	Use compile args for preprocessing constants	2023-05-21 08:17:17 +02:00
0cc4m	a1657d0233	Add OpenCL compile options	2023-05-21 07:53:22 +02:00
0cc4m	e41a7ae40c	Fix convert_row_f16 kernel issue	2023-05-21 07:53:22 +02:00
0cc4m	457eff920e	Deduplicate dequant kernels	2023-05-21 07:53:22 +02:00
0cc4m	42e1a2ba3d	Fix tensor load to device Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2023-05-21 07:50:27 +02:00
0cc4m	cda2d488f9	Fix error in convert f16 to f32 kernel call	2023-05-21 07:49:54 +02:00
0cc4m	915d0d1168	Generate dequant_mul_mat kernels from simple templates	2023-05-21 07:49:24 +02:00
0cc4m	cb588e2aa4	Add remaining dequant_mul_mat functions	2023-05-21 07:47:18 +02:00
0cc4m	8c7a7cea2e	Fix dequant_mul_mat kernel	2023-05-21 07:44:49 +02:00
0cc4m	5f610c90bf	Fix bugs in dequant_mul_mat code	2023-05-21 07:44:48 +02:00
0cc4m	17e53dbb7e	Refactor OpenCL code to work more like the CUDA code, add missing functions	2023-05-21 07:42:06 +02:00
0cc4m	a7e3bee4cc	Move back to C++ for OpenCL	2023-05-21 06:17:31 +02:00
Concedo	c048bcfec4	remove old filever checks (+7 squashed commit) Squashed commit: [b72627a] new format not working [e568870] old ver works [7053b77] compile errors fixed, fixing linkers [4ae8889] add new ver [ff82dfd] file format checks [25b8aa8] refactoring type names [931063b] still merging	2023-05-21 00:15:39 +08:00
0cc4m	02914698f0	Update Q4_0, Q4_1 and Q8_0 to use half instead of float	2023-05-20 07:45:56 +02:00
0cc4m	285f8f990b	Explicitely set CLBlast GEMM type	2023-05-20 07:26:38 +02:00
0cc4m	78b1d8351f	Add OpenCL compile options	2023-05-19 21:18:57 +02:00
0cc4m	b73c437e83	Fix convert_row_f16 kernel issue	2023-05-18 08:05:19 +02:00
0cc4m	0df55da4ca	Deduplicate dequant kernels	2023-05-18 07:35:40 +02:00
0cc4m	67dbd356b6	Remove redundant constant values	2023-05-17 19:20:46 +02:00
0cc4m	de10afa80f	Fix tensor load to device Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2023-05-16 18:49:49 +02:00
0cc4m	b3ff66d87f	Fix error in convert f16 to f32 kernel call	2023-05-16 13:05:33 +02:00


106 commits
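
Commit 59fe16877d in the log above describes two changes to buffer recycling in the pool malloc: hand out the smallest free buffer that fits rather than the first one found, and, when every free buffer is too small, resize the largest free buffer instead of leaving it idle. The sketch below illustrates only that selection policy. It is not the code from ggml-opencl.cpp: `PoolBuffer`, `BufferPool`, `acquire`, `release`, `capacity` and `count` are hypothetical names, and plain byte counts stand in for real `cl_mem` handles so the example compiles without an OpenCL SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a pooled device buffer. A real pool would also
// hold a cl_mem handle; only the size matters for the selection policy.
struct PoolBuffer {
    std::size_t size;  // capacity in bytes
    bool in_use;       // true while handed out to a caller
};

class BufferPool {
public:
    // Hands out a buffer of at least `size` bytes and returns its index.
    std::size_t acquire(std::size_t size) {
        assert(size > 0);
        const std::size_t none = static_cast<std::size_t>(-1);
        std::size_t best = none;     // smallest free buffer that fits
        std::size_t largest = none;  // largest free buffer (fallback)

        for (std::size_t i = 0; i < buffers_.size(); ++i) {
            const PoolBuffer& b = buffers_[i];
            if (b.in_use) continue;
            if (b.size >= size &&
                (best == none || b.size < buffers_[best].size)) {
                best = i;
            }
            if (largest == none || b.size > buffers_[largest].size) {
                largest = i;
            }
        }

        if (best != none) {
            // Best fit: reuse the smallest free buffer that is big enough.
            buffers_[best].in_use = true;
            return best;
        }
        if (largest != none) {
            // All free buffers are too small: grow the largest one in place.
            // With OpenCL this would mean releasing and recreating the buffer.
            buffers_[largest].size = size;
            buffers_[largest].in_use = true;
            return largest;
        }
        // Nothing free at all: allocate a new buffer.
        buffers_.push_back(PoolBuffer{size, true});
        return buffers_.size() - 1;
    }

    // Returns a buffer to the pool so it can be recycled.
    void release(std::size_t index) { buffers_.at(index).in_use = false; }

    std::size_t capacity(std::size_t index) const { return buffers_.at(index).size; }
    std::size_t count() const { return buffers_.size(); }

private:
    std::vector<PoolBuffer> buffers_;
};
```

As a usage illustration: with free buffers of 100, 400 and 200 bytes, a request for 150 bytes takes the 200-byte buffer under this policy, where a first-fit scan would have taken the 400-byte one. A following request for 1000 bytes finds nothing large enough among the free buffers, so the 400-byte buffer is resized to 1000 bytes and the pool still holds three buffers.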