koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-04-28 11:40:43 +00:00

Author	SHA1	Message	Date
Concedo	50a27793d3	upgrade windows runners to windows 2022, cu11 still uses vs2019 this should finally work (+21 squashed commit) Squashed commit: [5edac5b59] Revert "quick dbg" This reverts commit fd62a997cc6684bb89242d5e7b0ae2aed83fd27f. [fd62a997c] quick dbg [bcccae7e6] sanity check 2 [568e2eb08] sanity check [2f30d573a] please work 2 [cf8765221] please work [c535e60d9] try a small trick [d4ba79b80] 2022 test [3f146b000] t2 [4a3b9a9b4] revert and test [4bdc9a149] reverted test2 [5081cb4a3] reverted test [ea9a826f3] broken test [3c11ae389] compare 2019 [8ecec4fec] not for cu12 [0be964f3a] added vs2019 for the other runners [5d24641cb] debugging 4 [1dee79207] debugging 3 [ab172f133] more debugging 2 [b1a895e84] more debugging [5d21d8bd0] vs2019 setup	2025-06-06 14:02:34 +08:00
Concedo	a341188f84	add install for vs2019	2025-06-05 10:32:57 +08:00
Concedo	a74d8669b3	try hardcoded path (+1 squashed commits) Squashed commits: [711b43d9d] let's see if VS2019 can work	2025-06-05 10:26:02 +08:00
Concedo	f3bb947a13	cuda use wmma flash attention for turing (+1 squashed commits) Squashed commits: [3c5112398] 117 (+10 squashed commit) Squashed commit: [4f01bb2d4] 117 graphs 80v [7549034ea] 117 graphs [dabf9cb99] checking if cuda 11.5.2 works [ba7ccdb7a] another try cu11.7 only [752cf2ae5] increase aria2c download log rate [dc4f198fd] test send turing to wmma flash attention [496a22e83] temp build test cu11.7.0 [ca759c424] temp build test cu11.7 [c46ada17c] test build: enable virtual80 for oldcpu [3ccfd939a] test build: with cuda graphs for all	2025-06-01 11:41:45 +08:00
henk717	b8883e254a	KoboldCpp.sh updates (#1562) * YR makefile upstream * Create make_portable_rocm_libs.sh * update makefile, support llama portable, ditch all unnecessary changes * Delete make_portable_rocm_libs.sh should not be needed * koboldcpp.sh updates * Small rocm fixes * ROCm is now a cuda version not a command * Don't commit temp file * Don't commit temp file * 1200 has errors, removing it for now * Only rebuild rocm with rebuild * Update kcpp-build-release-linux.yaml * Fix rocm filename * ROCm Linux CI * We need more diskspace * Workaround for lockfile getting stuck Why do I have to do hacks like this.... * Update kcpp-build-release-linux-rocm.yaml * Dont apt update rocm You don't allow us to apt update? Better not break things github! * Container maybe? * Turns out we aren't root, so we use sudo * Cleanup ROCm CI PR * Build for Runpods GPU * We also need rocblas * More cleanup just in case * Update kcpp-build-release-linux-rocm.yaml --------- Co-authored-by: LostRuins Concedo <39025047+LostRuins@users.noreply.github.com>	2025-05-26 15:24:49 +08:00
Concedo	0dca953d78	removed winget workflow	2025-05-24 16:40:39 +08:00
Concedo	55cc9acec5	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/release.yml # README.md # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/ggml-cann.cpp # tools/mtmd/CMakeLists.txt # tools/mtmd/clip.cpp # tools/mtmd/clip.h	2025-05-24 12:10:36 +08:00
Diego Devesa	b775345d78	ci : enable winget package updates (#13734)	2025-05-23 23:14:00 +03:00
Diego Devesa	a70a8a69c2	ci : add winget package updater (#13732)	2025-05-23 22:09:38 +02:00
Diego Devesa	3079e9ac8e	release : fix windows hip release (#13707) * release : fix windows hip release * make single hip release with multiple targets	2025-05-23 00:21:37 +02:00
Concedo	fdca5ba71e	declutter	2025-05-22 22:58:47 +08:00
Concedo	8bd6f9f9ae	added a simple cross platform launch script for unpacked dirs	2025-05-22 22:09:46 +08:00
Diego Devesa	d643bb2c79	releases : build CPU backend separately (windows) (#13642)	2025-05-21 22:09:57 +02:00
Concedo	d04b4eeb04	merge not working	2025-05-21 18:06:41 +08:00
R0CKSTAR	33983057d0	musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (#13647) * musa: fix build warning (unused parameter) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: upgrade MUSA SDK version to rc4.0.1 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Update ggml/src/ggml-cuda/cpy.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-05-21 09:58:49 +08:00
Alberto Cabrera Pérez	f71f40a284	ci : upgraded oneAPI version in SYCL workflows and dockerfile (#13532)	2025-05-19 11:46:09 +01:00
Concedo	59300dbdf5	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/actions/windows-setup-curl/action.yml # .github/workflows/build-linux-cross.yml # README.md # common/CMakeLists.txt # examples/parallel/README.md # examples/parallel/parallel.cpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-vulkan/CMakeLists.txt # tools/server/README.md	2025-05-18 23:27:53 +08:00
Concedo	be3e93c76a	bundle AGPL license and llama.cpp's MIT license into binaries. clarified some licensing terms, updated readme (+1 squashed commits) Squashed commits: [61c152daf] bundle AGPL license and llama.cpp's MIT license into binaries. clarified some licensing terms, updated readme	2025-05-18 02:21:27 +08:00
Diego Devesa	415e40a357	releases : use arm version of curl for arm releases (#13592)	2025-05-16 19:36:51 +02:00
Sigbjørn Skjæret	7c07ac244d	ci : add ppc64el to build-linux-cross (#13575)	2025-05-16 14:54:23 +02:00
Thammachart Chinvarapon	b064a51a4e	ci: free_disk_space flag enabled for intel variant (#13426) before cleanup: 20G after cleanup: 44G after all built and pushed: 24G https://github.com/Thammachart/llama.cpp/actions/runs/14945093573/job/41987371245	2025-05-10 16:34:48 +02:00
Jeff Bolz	dc1d2adfc0	vulkan: scalar flash attention implementation (#13324) * vulkan: scalar flash attention implementation * vulkan: always use fp32 for scalar flash attention * vulkan: use vector loads in scalar flash attention shader * vulkan: remove PV matrix, helps with register usage * vulkan: reduce register usage in scalar FA, but perf may be slightly worse * vulkan: load each Q value once. optimize O reduction. more tuning * vulkan: support q4_0/q8_0 KV in scalar FA * CI: increase timeout to accommodate newly-supported tests * vulkan: for scalar FA, select between 1 and 8 rows * vulkan: avoid using Float16 capability in scalar FA	2025-05-10 08:07:07 +02:00
Concedo	2f5f4ee65a	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # common/CMakeLists.txt	2025-05-09 14:18:20 +08:00
Diego Devesa	15e03282bb	ci : limit write permission to only the release step + fixes (#13392) * ci : limit write permission to only the release step * fix win cuda file name * fix license file copy on multi-config generators	2025-05-08 23:45:22 +02:00
Concedo	2439014a03	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # examples/embedding/embedding.cpp # tools/imatrix/imatrix.cpp # tools/perplexity/perplexity.cpp	2025-05-08 23:41:02 +08:00
Diego Devesa	70a6991edf	ci : move release workflow to a separate file (#13362)	2025-05-08 13:15:28 +02:00
Diego Devesa	814f795e06	docker : disable arm64 and intel images (#13356)	2025-05-07 16:36:33 +02:00
Concedo	b951310ca5	tryout smaller binaries	2025-05-07 14:56:34 +08:00
Diego Devesa	9f2da5871f	llama : build windows releases with dl backends (#13220)	2025-05-04 14:20:49 +02:00
Diego Devesa	1d36b3670b	llama : move end-user examples to tools directory (#13249) * llama : move end-user examples to tools directory --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-05-02 20:27:13 +02:00
Concedo	bc452da452	improved comfyui compatibility, tweaked hf search	2025-05-02 16:18:31 +08:00
bandoti	d24d592808	ci: fix cross-compile sync issues (#12804)	2025-05-01 19:06:39 -03:00
bandoti	00137157fc	Disable CI cross-compile builds (#13022)	2025-04-19 18:05:03 +02:00
hipudding	54a7272043	CANN: Add x86 build ci (#12950) * CANN: Add x86 build ci * CANN: fix code format	2025-04-15 12:08:55 +01:00
Concedo	c94aec1930	update workflows, update gemma default adapter sysprompt	2025-04-12 18:38:23 +08:00
Concedo	b42fa821d8	try allow build from commit hash	2025-04-12 13:37:10 +08:00
Concedo	7a7bdeab6d	json to gbnf endpoint added	2025-04-12 11:41:11 +08:00
R0CKSTAR	8ac9f5d765	ci : Replace freediskspace to free_disk_space in docker.yml (#12861) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-04-11 09:26:17 +02:00
R0CKSTAR	d9a63b2f2e	musa: enable freediskspace for docker image build (#12839) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-04-09 11:22:30 +02:00
Chenguang Li	6e1c4cebdb	CANN: Support Opt CONV_TRANSPOSE_1D and ELU (#12786) * [CANN] Support ELU and CONV_TRANSPOSE_1D * [CANN]Modification review comments * [CANN]Modification review comments * [CANN]name adjustment * [CANN]remove lambda used in template * [CANN]Use std::func instead of template * [CANN]Modify the code according to the review comments --------- Signed-off-by: noemotiovon <noemotiovon@gmail.com>	2025-04-09 14:04:14 +08:00
Concedo	b99ee451f8	Merge commit '4ccea213bc' into concedo_experimental # Conflicts: # .devops/cpu.Dockerfile # .devops/cuda.Dockerfile # .devops/intel.Dockerfile # .devops/musa.Dockerfile # .devops/rocm.Dockerfile # .github/workflows/bench.yml.disabled # .github/workflows/build.yml # .github/workflows/server.yml # CMakeLists.txt # build-xcframework.sh # ci/run.sh # common/CMakeLists.txt # examples/llama.android/llama/build.gradle.kts # examples/perplexity/perplexity.cpp # examples/run/CMakeLists.txt # examples/server/tests/README.md # examples/sycl/win-build-sycl.bat # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/aclnn_ops.h # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cpu/ggml-cpu.c # licenses/LICENSE-linenoise # scripts/sync-ggml.last # tests/CMakeLists.txt	2025-04-08 21:26:23 +08:00
Concedo	822cf2430e	Merge commit 'f1e3eb4249' into concedo_experimental # Conflicts: # .github/workflows/build.yml # README.md # docs/backend/SYCL.md # examples/llava/clip.cpp # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-vulkan/cmake/host-toolchain.cmake.in	2025-04-08 20:48:53 +08:00
Xuan-Son Nguyen	bd3f59f812	cmake : enable curl by default (#12761) * cmake : enable curl by default * no curl if no examples * fix build * fix build-linux-cross * add windows-setup-curl * fix * shell * fix path * fix windows-latest-cmake* * run: include_directories * LLAMA_RUN_EXTRA_LIBS * sycl: no llama_curl * no test-arg-parser on windows * clarification * try riscv64 / arm64 * windows: include libcurl inside release binary * add msg * fix mac / ios / android build * will this fix xcode? * try clearing the cache * add bunch of licenses * revert clear cache * fix xcode * fix xcode (2) * fix typo	2025-04-07 13:35:19 +02:00
Concedo	5edbacdd0e	fix tools (+3 squashed commit) Squashed commit: [95a489ee] fix tools build [1d3d3451] add accelerate [2837705c] edit a line	2025-04-06 21:30:48 +08:00
Concedo	8415cac7ac	add vk shaders source (+1 squashed commits) Squashed commits: [45359f49] add vk shaders source	2025-04-05 22:45:18 +08:00
Concedo	34ddd874fe	try containerized ci (+3 squashed commit) Squashed commit: [f0600744] troubleshooting [fe11073c] cap auto threads at 32 due to diminishing returns [0c7f8a1d] troubleshooting	2025-04-05 01:51:03 +08:00
bandoti	1be76e4620	ci: add Linux cross-compile build (#12428)	2025-04-04 14:05:12 -03:00
Concedo	57e12b73af	try containerized ci (+1 squashed commits) Squashed commits: [fc53c200] try containerized ci (+1 squashed commits) Squashed commits: [4b48b0d5] try containerized ci	2025-04-04 17:19:27 +08:00
0cc4m	a8a1f33567	Vulkan: Add DP4A MMQ and Q8_1 quantization shader (#12135) * Vulkan: Add DP4A MMQ and Q8_1 quantization shader * Add q4_0 x q8_1 matrix matrix multiplication support * Vulkan: Add int8 coopmat MMQ support * Vulkan: Add q4_1, q5_0 and q5_1 quants, improve integer dot code * Add GL_EXT_integer_dot_product check * Remove ggml changes, fix mmq pipeline picker * Remove ggml changes, restore Intel coopmat behaviour * Fix glsl compile attempt when integer vec dot is not supported * Remove redundant code, use non-saturating integer dot, enable all matmul sizes for mmq * Remove redundant comment * Fix integer dot check * Fix compile issue with unsupported int dot glslc * Update Windows build Vulkan SDK version	2025-03-31 14:37:01 +02:00
Concedo	143b611274	updated workflows	2025-03-19 21:56:35 +08:00


399 commits