koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-16 03:49:42 +00:00

Author	SHA1	Message	Date
Concedo	3fa4843850	Merge branch 'upstream' into concedo_experimental # Conflicts: # examples/server/README.md # src/llama-model.cpp	2025-02-08 22:57:18 +08:00
Karol Kontny	4d3465c5ae	ggml: Fix data race in ggml threadpool (#11736) After the barrier in last iteration is executed, still the loop termination condition will be executed. However main thread can destroy the cgraph object and its nodes already, then another thread will access it, but the thing is already gone. Also trouble can happen when n_nodes == 0 or abort is called, but I'm not sure if the prior situation is possible. Last synchronization should be done after the loop to ensure the cgraph/cplan won't be accessed after the main thread exits from the function.	2025-02-08 15:30:53 +01:00
Concedo	27b9358baf	Merge branch 'upstream' into concedo_experimental # Conflicts: # examples/run/run.cpp # scripts/sync-ggml.last	2025-02-08 01:31:49 +08:00
Jinyang He	225bbbfa39	ggml : optimize and build warning fix for LoongArch (#11709) * ggml : optimize convert f32<->f16 for loongarch_asx * ggml : optimize loongarch_asx extend i16,i8,u8 to i32,i16 * ggml : Fix warnings when run cpu CI locally on LoongArch	2025-02-07 09:38:31 +02:00
Concedo	f13498df13	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/tools.sh # .devops/vulkan.Dockerfile # .github/workflows/build.yml # .github/workflows/docker.yml # .github/workflows/server.yml # Makefile # README.md # cmake/llama-config.cmake.in # common/CMakeLists.txt # examples/gbnf-validator/gbnf-validator.cpp # examples/run/run.cpp # examples/server/README.md # examples/server/tests/README.md # ggml/src/CMakeLists.txt # ggml/src/ggml-hip/CMakeLists.txt # scripts/sync-ggml.last # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-chat-template.cpp # tests/test-grammar-integration.cpp	2025-02-01 17:14:59 +08:00
issixx	d2e518e9b4	ggml-cpu : fix ggml_graph_compute_thread did not terminate on abort. (ggml/1065) some threads kept looping and failed to terminate properly after an abort during CPU execution. Co-authored-by: issi <issi@gmail.com>	2025-01-29 11:24:51 +02:00
Concedo	bec231422a	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # Makefile # README.md # common/CMakeLists.txt # docs/backend/SYCL.md # docs/build.md # docs/docker.md # examples/export-lora/export-lora.cpp # examples/main/README.md # examples/main/main.cpp # examples/run/README.md # examples/run/run.cpp # examples/server/README.md # examples/simple-chat/simple-chat.cpp # ggml/CMakeLists.txt # ggml/src/ggml-hip/CMakeLists.txt # src/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-chat-template.cpp	2025-01-25 14:16:50 +08:00
Johannes Gäßler	8137b4bb2b	CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380)	2025-01-24 12:38:31 +01:00
Concedo	96407502cd	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # examples/llama-bench/llama-bench.cpp # examples/llama.android/llama/src/main/cpp/llama-android.cpp # examples/llama.android/llama/src/main/java/android/llama/cpp/LLamaAndroid.kt # src/llama-vocab.cpp # tests/test-backend-ops.cpp	2025-01-17 23:13:50 +08:00
Jeff Bolz	bd38ddea01	vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11166) * vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl Shaders are based on cpy.cu. * vulkan: support copy from q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl to f32 * ggml: copy q->f32 assumes some contiguity in the destination	2025-01-16 22:47:10 +01:00
Johannes Gäßler	9c8dcefe17	CUDA: backwards pass for misc. ops, add tests (#11257) * CUDA: backwards pass for misc. ops, add tests * remove restrict from pointers	2025-01-16 16:43:38 +01:00
Concedo	11cd7c7bb0	survived the storm, again	2025-01-16 22:25:18 +08:00
Johannes Gäßler	432df2d5f9	RoPE: fix back, CUDA support for back + noncont. (#11240) * RoPE: fix back, CUDA support for back + noncont. * fix comments reg. non-cont. RoPE support [no-ci]	2025-01-15 12:51:37 +01:00
Concedo	b154bd3671	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # docs/build.md # docs/development/HOWTO-add-model.md # tests/test-backend-ops.cpp # tests/test-chat-template.cpp	2025-01-10 17:57:38 +08:00
Molly Sophia	ee7136c6d1	llama: add support for QRWKV6 model architecture (#11001) llama: add support for QRWKV6 model architecture (#11001) * WIP: Add support for RWKV6Qwen2 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * RWKV: Some graph simplification Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Add support for RWKV6Qwen2 with cpu and cuda GLA Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix some typos Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * code format changes Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix wkv test & add gla test Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Fix cuda warning Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Update README.md Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * Update ggml/src/ggml-cuda/gla.cu Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Fix fused lerp weights loading with RWKV6 Signed-off-by: Molly Sophia <mollysophia379@gmail.com> * better sanity check skipping for QRWKV6 in llama-quant thanks @compilade Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: compilade <git@compilade.net> --------- Signed-off-by: Molly Sophia <mollysophia379@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: compilade <git@compilade.net>	2025-01-10 09:58:08 +08:00
Concedo	e788b8289a	You'll never take us alive We swore that death will do us part They'll call our crimes a work of art	2025-01-09 11:27:06 +08:00
Concedo	7c671f289e	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/docker.yml # examples/cvector-generator/mean.hpp # examples/cvector-generator/pca.hpp # examples/export-lora/export-lora.cpp # examples/rpc/rpc-server.cpp # examples/run/README.md # examples/run/run.cpp # examples/server/CMakeLists.txt # examples/server/README.md # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-vulkan/ggml-vulkan.cpp # scripts/compare-llama-bench.py # scripts/hf.sh # tests/test-chat-template.cpp	2024-12-28 12:48:34 +08:00
Djip007	2cd43f4900	ggml : more perfo with llamafile tinyblas on x86_64 (#10714) * more perfo with llamafile tinyblas on x86_64. - add bf16 support - change dispatch strategy (thanks: https://github.com/ikawrakow/ik_llama.cpp/pull/71) - reduce memory bandwidth simple tinyblas dispatch and more cache friendly * tinyblas dynamic dispatching * sgemm: add M blocks. * - git 2.47 use short id of len 9. - show-progress is not part of GNU Wget2 * remove not stable test	2024-12-24 18:54:49 +01:00
Diego Devesa	32d6ee6385	ggml : fix const usage in SSE path (#10962)	2024-12-23 20:25:52 +01:00
Concedo	50648de0af	rephrase tensor moved warning, cleanup and prepare for ci	2024-12-19 22:57:43 +08:00
Concedo	f456ed7237	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/nix/package.nix # .devops/tools.sh # .github/workflows/build.yml # Makefile # README.md # common/CMakeLists.txt # common/common.h # examples/llava/CMakeLists.txt # examples/run/CMakeLists.txt # examples/run/README.md # examples/run/run.cpp # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-kompute/ggml-kompute.cpp # tests/test-backend-ops.cpp # tests/test-rope.cpp	2024-12-15 15:30:10 +08:00
Concedo	1e07043a6e	clean and rename old clblast files in preparation for merge	2024-12-15 15:29:02 +08:00
HimariO	ba1cb19cdd	llama : add Qwen2VL support + multimodal RoPE (#10361) * Barebone Qwen2VL LLM converter * Add Qwen2VL cli entrypoint * [WIP] add qwen2vl arch * Verify m-rope output * Add vl-rope/2d-rope support for qwen2vl ViT * update qwen2vl cli tool * update 5D tensor op workaround * [WIP] qwen2vl vision model * make batch and clip utils compatible with qwen2vl * [WIP] create inference workflow, gguf convert script but fix * correcting vision-rope behavior, add the missing last layer back to ViT * add arg parser to qwen2vl_surgery * replace variable size array with vector * cuda-gdb cmake preset * add fp32 mrope, vision rope kernel * add fp16 support for qwen2vl and m-rope * add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION` * fix rope op mode switching, outdated func args * update `llama_hparams` * update to keep upstream changes * resolve linter, test errors * add makefile entry, update special image padding token * add mrope unit test, fix few compiler warnings * rename `mrope` related function, params * minor updates on debug util, bug fixes * add `m-rope` testcase to `test-backend-ops` * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix trailing whitespace * store `llama_hparams.rope_sections` with fixed size array * update position id tensor size check in GGML_OP_ROPE * minor updates * update `ggml_backend__supports_op` of unsupported backends remote old `rope_section` compare operator --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-14 14:43:46 +02:00
Concedo	ed75f8a741	up to date merge, without vulkan-gen-shaders. They will be built before each release from now on, as they are very large	2024-12-13 17:18:01 +08:00
Concedo	de64b9198c	merge checkpoint 2 - functional merge without q4_0_4_4 (need regen shaders)	2024-12-13 17:04:19 +08:00
Concedo	4c4ce5e808	rewritten checkpoint 1 - before coopmat	2024-12-13 16:55:23 +08:00
Karol Kontny	d583cd03f6	ggml : Fix compilation issues on ARM platform when building without fp16 (#10811)	2024-12-13 01:04:19 +01:00
Diego Devesa	cb13ef85a4	remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) other windows build fixes	2024-12-12 19:02:49 +01:00
Concedo	4548d893ee	better way to handle termux compatibility (+2 squashed commit) Squashed commit: [301986f11] better way to handle termux compatibility [16b03b225] updated lite	2024-12-11 15:05:01 +08:00
Concedo	a11bba5893	cleanup, fix native build for arm (+28 squashed commit) Squashed commit: [d1f6a4154] bundle library [947ab84b7] undo [0f9aba8d8] test [e9ac93873] test [920438202] test [1c6d98804] Revert "quick test" This reverts commit acf8ec8940. [acf8ec894] quick test [6a9937233] undo [5a263a5bd] test [ddfd82bca] test [0b30e45da] test [c3bfece55] messed up [2a4b37fe0] Revert "test" This reverts commit 80a1fcaeaf. [80a1fcaea] test [e2aa7d944] test [264d80200] test [f5b123173] undo [1ffacc484] test [63c0be926] undo [510e0377e] ofast try fix [4ac199b20] try fix sigill [1bc987ba2] try fix illegal instruction [7697252b1] edit [f87087b28] check gcc ver [e9dfe2cef] try using qemu to do the pyinstaller [b411192db] revert [25b5301e5] try using qemu to do the pyinstaller [58038cddc] try using qemu to do the pyinstaller	2024-12-10 19:42:23 +08:00
Djip007	19d8762ab6	ggml : refactor online repacking (#10446) * rename ggml-cpu-aarch64.c to .cpp * reformat extra cpu backend. - clean Q4_0_N_M and IQ4_0_N_M - remove from "file" tensor type - allow only with dynamic repack - extract cpu extra bufts and convert to C++ - hbm - "aarch64" - more generic use of extra buffer - generalise extra_supports_op - new API for "cpu-accel": - amx - aarch64 * clang-format * Clean Q4_0_N_M ref Enable restrict on C++ * add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack * added/corrected control on tensor size for Q4 repacking. * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add debug logs on repacks. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-07 14:37:50 +02:00
PAB	a8cbab201d	ggml: add `GGML_SET` Metal kernel + i32 CPU kernel (ggml/1037) * implemented cpu kernel * add i32 test cases in test-backend-ops * typedef `ggml_metal_kargs_set` * implemented `kernel_set` * memcpy	2024-12-05 13:27:33 +02:00
PAB	c2082d93a8	ggml : add `GGML_PAD_REFLECT_1D` operation (ggml/1034) * ggml_pad_reflect_1d defined in header * implemented on CPU * called the forward pass * impl Metal kernel * added Metal kernel * added OP_PAD_REFLECT_1D in test-backend-ops.cpp * add test-pad-reflect-1d test case * test case support multiple backend	2024-12-05 13:27:31 +02:00
Diego Devesa	59f4db1088	ggml : add predefined list of CPU backend variants to build (#10626) * ggml : add predefined list of CPU backend variants to build * update CPU dockerfiles	2024-12-04 14:45:40 +01:00
Diego Devesa	2803540814	ggml-cpu : fix HWCAP2_I8MM value (#10646)	2024-12-04 14:40:44 +01:00
Concedo	557bcaf86e	Merge branch 'upstream' into concedo_experimental # Conflicts: # .clang-tidy # .github/workflows/build.yml # Makefile # Package.swift # common/CMakeLists.txt # examples/batched-bench/CMakeLists.txt # examples/batched/CMakeLists.txt # examples/convert-llama2c-to-ggml/CMakeLists.txt # examples/cvector-generator/CMakeLists.txt # examples/embedding/CMakeLists.txt # examples/eval-callback/CMakeLists.txt # examples/export-lora/CMakeLists.txt # examples/gbnf-validator/CMakeLists.txt # examples/gguf-split/CMakeLists.txt # examples/gguf/CMakeLists.txt # examples/gritlm/CMakeLists.txt # examples/imatrix/CMakeLists.txt # examples/infill/CMakeLists.txt # examples/llama-bench/CMakeLists.txt # examples/llava/CMakeLists.txt # examples/lookahead/CMakeLists.txt # examples/lookup/CMakeLists.txt # examples/main-cmake-pkg/CMakeLists.txt # examples/main/CMakeLists.txt # examples/parallel/CMakeLists.txt # examples/passkey/CMakeLists.txt # examples/perplexity/CMakeLists.txt # examples/quantize-stats/CMakeLists.txt # examples/quantize/CMakeLists.txt # examples/retrieval/CMakeLists.txt # examples/run/CMakeLists.txt # examples/save-load-state/CMakeLists.txt # examples/server/CMakeLists.txt # examples/simple-chat/CMakeLists.txt # examples/simple/CMakeLists.txt # examples/speculative-simple/CMakeLists.txt # examples/speculative/CMakeLists.txt # examples/tokenize/CMakeLists.txt # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-backend.cpp # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt # pocs/vdot/CMakeLists.txt # src/CMakeLists.txt # src/unicode.cpp # tests/test-sampling.cpp	2024-11-30 12:24:51 +08:00
Concedo	697ca70115	temp checkpoint	2024-11-30 12:13:20 +08:00
Concedo	ec95241e38	temp checkpoint	2024-11-30 11:59:27 +08:00
Diego Devesa	7cc2d2c889	ggml : move AMX to the CPU backend (#10570) * ggml : move AMX to the CPU backend --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-29 21:54:58 +01:00
Georgi Gerganov	f0678c5ff4	ggml : fix I8MM Q4_1 scaling factor conversion (#10562) ggml-ci	2024-11-29 16:25:39 +02:00
Georgi Gerganov	76b27d29c2	ggml : fix row condition for i8mm kernels (#10561) ggml-ci	2024-11-28 14:56:37 +02:00
Shupei Fan	c202cef168	ggml-cpu: support IQ4_NL_4_4 by runtime repack (#10541) * ggml-cpu: support IQ4_NL_4_4 by runtime repack * ggml-cpu: add __ARM_FEATURE_DOTPROD guard	2024-11-28 13:52:03 +01:00
Concedo	ec581b19d8	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/ISSUE_TEMPLATE/010-bug-compilation.yml # .github/ISSUE_TEMPLATE/011-bug-results.yml # .github/ISSUE_TEMPLATE/019-bug-misc.yml # .github/workflows/build.yml # .github/workflows/docker.yml # CMakeLists.txt # Makefile # Package.swift # examples/CMakeLists.txt # examples/eval-callback/CMakeLists.txt # examples/llama-bench/llama-bench.cpp # examples/server/README.md # examples/server/server.cpp # examples/simple-chat/simple-chat.cpp # examples/simple/simple.cpp # examples/speculative-simple/speculative-simple.cpp # examples/speculative/speculative.cpp # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-amx/CMakeLists.txt # ggml/src/ggml-blas/CMakeLists.txt # ggml/src/ggml-cann/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-hip/CMakeLists.txt # ggml/src/ggml-kompute/CMakeLists.txt # ggml/src/ggml-kompute/ggml-kompute.cpp # ggml/src/ggml-metal/CMakeLists.txt # ggml/src/ggml-musa/CMakeLists.txt # ggml/src/ggml-rpc/CMakeLists.txt # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-vulkan/CMakeLists.txt # pocs/CMakeLists.txt # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-quantize-fns.cpp	2024-11-26 17:01:20 +08:00
Diego Devesa	5931c1f233	ggml : add support for dynamic loading of backends (#10469) * ggml : add support for dynamic loading of backends --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-25 15:13:39 +01:00
Concedo	83350ec314	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/ISSUE_TEMPLATE/020-enhancement.yml # .github/ISSUE_TEMPLATE/030-research.yml # .github/ISSUE_TEMPLATE/040-refactor.yml # .github/workflows/build.yml # Makefile # common/CMakeLists.txt # examples/CMakeLists.txt # examples/infill/infill.cpp # examples/lookahead/lookahead.cpp # examples/lookup/lookup-stats.cpp # examples/lookup/lookup.cpp # examples/parallel/parallel.cpp # examples/retrieval/retrieval.cpp # examples/save-load-state/save-load-state.cpp # examples/speculative/speculative.cpp # flake.lock # ggml/src/ggml-cann/CMakeLists.txt # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/kernels/CMakeLists.txt # ggml/src/ggml-cann/kernels/dup.cpp # ggml/src/ggml-cann/kernels/get_row_f16.cpp # ggml/src/ggml-cann/kernels/get_row_f32.cpp # ggml/src/ggml-cann/kernels/get_row_q4_0.cpp # tests/test-arg-parser.cpp # tests/test-backend-ops.cpp	2024-11-25 16:26:08 +08:00
Diego Devesa	55ed008b2d	ggml : do not use ARM features not included in the build (#10457)	2024-11-23 14:41:12 +01:00
Concedo	091a432cf6	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/full-cuda.Dockerfile # .devops/llama-cli-cann.Dockerfile # .devops/llama-cli-cuda.Dockerfile # .devops/llama-cli-intel.Dockerfile # .devops/llama-cli-musa.Dockerfile # .devops/llama-cli-vulkan.Dockerfile # .devops/llama-server-cuda.Dockerfile # .devops/llama-server-intel.Dockerfile # .devops/llama-server-musa.Dockerfile # .devops/llama-server-vulkan.Dockerfile # .gitignore # CMakeLists.txt # Makefile # cmake/llama-config.cmake.in # docs/backend/SYCL.md # docs/build.md # examples/llama-bench/llama-bench.cpp # flake.lock # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-backend.cpp # ggml/src/ggml-blas/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cpu/ggml-cpu.c # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-hip/CMakeLists.txt # ggml/src/ggml-metal/CMakeLists.txt # ggml/src/ggml-musa/CMakeLists.txt # ggml/src/ggml-sycl/CMakeLists.txt # scripts/sync-ggml.last # tests/test-backend-ops.cpp	2024-11-21 16:26:24 +08:00
Concedo	282a647689	Merge commit '467576b6cc' into concedo_experimental # Conflicts: # .gitignore # Makefile # README.md # common/common.h # docs/build.md # examples/infill/infill.cpp # examples/perplexity/perplexity.cpp # examples/server/README.md # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-cuda/CMakeLists.txt # scripts/sync-ggml-am.sh # scripts/sync-ggml.sh # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-opt.cpp # tests/test-quantize-perf.cpp	2024-11-21 16:05:21 +08:00
FirstTimeEZ	a43178299c	ggml : fix undefined reference to 'getcpu' (#10354) https://github.com/ggerganov/llama.cpp/issues/10352	2024-11-17 10:39:22 +02:00
Johannes Gäßler	8a43e940ab	ggml: new optimization interface (ggml/988)	2024-11-17 08:30:29 +02:00

55 commits