koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-06-01 14:29:33 +00:00

Author	SHA1	Message	Date
Concedo	557bcaf86e	Merge branch 'upstream' into concedo_experimental # Conflicts: # .clang-tidy # .github/workflows/build.yml # Makefile # Package.swift # common/CMakeLists.txt # examples/batched-bench/CMakeLists.txt # examples/batched/CMakeLists.txt # examples/convert-llama2c-to-ggml/CMakeLists.txt # examples/cvector-generator/CMakeLists.txt # examples/embedding/CMakeLists.txt # examples/eval-callback/CMakeLists.txt # examples/export-lora/CMakeLists.txt # examples/gbnf-validator/CMakeLists.txt # examples/gguf-split/CMakeLists.txt # examples/gguf/CMakeLists.txt # examples/gritlm/CMakeLists.txt # examples/imatrix/CMakeLists.txt # examples/infill/CMakeLists.txt # examples/llama-bench/CMakeLists.txt # examples/llava/CMakeLists.txt # examples/lookahead/CMakeLists.txt # examples/lookup/CMakeLists.txt # examples/main-cmake-pkg/CMakeLists.txt # examples/main/CMakeLists.txt # examples/parallel/CMakeLists.txt # examples/passkey/CMakeLists.txt # examples/perplexity/CMakeLists.txt # examples/quantize-stats/CMakeLists.txt # examples/quantize/CMakeLists.txt # examples/retrieval/CMakeLists.txt # examples/run/CMakeLists.txt # examples/save-load-state/CMakeLists.txt # examples/server/CMakeLists.txt # examples/simple-chat/CMakeLists.txt # examples/simple/CMakeLists.txt # examples/speculative-simple/CMakeLists.txt # examples/speculative/CMakeLists.txt # examples/tokenize/CMakeLists.txt # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-backend.cpp # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt # pocs/vdot/CMakeLists.txt # src/CMakeLists.txt # src/unicode.cpp # tests/test-sampling.cpp	2024-11-30 12:24:51 +08:00
Concedo	ec95241e38	temp checkpoint	2024-11-30 11:59:27 +08:00
Diego Devesa	7cc2d2c889	ggml : move AMX to the CPU backend (#10570) * ggml : move AMX to the CPU backend --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-29 21:54:58 +01:00
Georgi Gerganov	4c0a95b107	llama : add missing model types	2024-11-28 20:45:07 +02:00
Georgi Gerganov	ab96610b1e	cmake : enable warnings in llama (#10474) * cmake : enable warnings in llama ggml-ci * cmake : add llama_get_flags and respect LLAMA_FATAL_WARNINGS * cmake : get_flags -> ggml_get_flags * speculative-simple : fix warnings * cmake : reuse ggml_get_flags ggml-ci * speculative-simple : fix compile warning ggml-ci	2024-11-26 14:18:08 +02:00
Concedo	ec581b19d8	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/ISSUE_TEMPLATE/010-bug-compilation.yml # .github/ISSUE_TEMPLATE/011-bug-results.yml # .github/ISSUE_TEMPLATE/019-bug-misc.yml # .github/workflows/build.yml # .github/workflows/docker.yml # CMakeLists.txt # Makefile # Package.swift # examples/CMakeLists.txt # examples/eval-callback/CMakeLists.txt # examples/llama-bench/llama-bench.cpp # examples/server/README.md # examples/server/server.cpp # examples/simple-chat/simple-chat.cpp # examples/simple/simple.cpp # examples/speculative-simple/speculative-simple.cpp # examples/speculative/speculative.cpp # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-amx/CMakeLists.txt # ggml/src/ggml-blas/CMakeLists.txt # ggml/src/ggml-cann/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-hip/CMakeLists.txt # ggml/src/ggml-kompute/CMakeLists.txt # ggml/src/ggml-kompute/ggml-kompute.cpp # ggml/src/ggml-metal/CMakeLists.txt # ggml/src/ggml-musa/CMakeLists.txt # ggml/src/ggml-rpc/CMakeLists.txt # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-vulkan/CMakeLists.txt # pocs/CMakeLists.txt # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-quantize-fns.cpp	2024-11-26 17:01:20 +08:00
Shane A	80acb7b430	Rename Olmo1124 to Olmo2 (#10500)	2024-11-25 19:36:09 +01:00
Diego Devesa	10bce0450f	llama : accept a list of devices to use to offload a model (#10497) * llama : accept a list of devices to use to offload a model * accept `--dev none` to completely disable offloading * fix dev list with dl backends * rename env parameter to LLAMA_ARG_DEVICE for consistency	2024-11-25 19:30:06 +01:00
Diego Devesa	5931c1f233	ggml : add support for dynamic loading of backends (#10469) * ggml : add support for dynamic loading of backends --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-25 15:13:39 +01:00
Concedo	83350ec314	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/ISSUE_TEMPLATE/020-enhancement.yml # .github/ISSUE_TEMPLATE/030-research.yml # .github/ISSUE_TEMPLATE/040-refactor.yml # .github/workflows/build.yml # Makefile # common/CMakeLists.txt # examples/CMakeLists.txt # examples/infill/infill.cpp # examples/lookahead/lookahead.cpp # examples/lookup/lookup-stats.cpp # examples/lookup/lookup.cpp # examples/parallel/parallel.cpp # examples/retrieval/retrieval.cpp # examples/save-load-state/save-load-state.cpp # examples/speculative/speculative.cpp # flake.lock # ggml/src/ggml-cann/CMakeLists.txt # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/kernels/CMakeLists.txt # ggml/src/ggml-cann/kernels/dup.cpp # ggml/src/ggml-cann/kernels/get_row_f16.cpp # ggml/src/ggml-cann/kernels/get_row_f32.cpp # ggml/src/ggml-cann/kernels/get_row_q4_0.cpp # tests/test-arg-parser.cpp # tests/test-backend-ops.cpp	2024-11-25 16:26:08 +08:00
Diego Devesa	dc39012cba	llama : fix op mul check with command-r-plus (#10476)	2024-11-24 16:10:26 +01:00
Concedo	116879144c	better error messages	2024-11-23 18:55:01 +08:00
Concedo	091a432cf6	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/full-cuda.Dockerfile # .devops/llama-cli-cann.Dockerfile # .devops/llama-cli-cuda.Dockerfile # .devops/llama-cli-intel.Dockerfile # .devops/llama-cli-musa.Dockerfile # .devops/llama-cli-vulkan.Dockerfile # .devops/llama-server-cuda.Dockerfile # .devops/llama-server-intel.Dockerfile # .devops/llama-server-musa.Dockerfile # .devops/llama-server-vulkan.Dockerfile # .gitignore # CMakeLists.txt # Makefile # cmake/llama-config.cmake.in # docs/backend/SYCL.md # docs/build.md # examples/llama-bench/llama-bench.cpp # flake.lock # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-backend.cpp # ggml/src/ggml-blas/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cpu/ggml-cpu.c # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-hip/CMakeLists.txt # ggml/src/ggml-metal/CMakeLists.txt # ggml/src/ggml-musa/CMakeLists.txt # ggml/src/ggml-sycl/CMakeLists.txt # scripts/sync-ggml.last # tests/test-backend-ops.cpp	2024-11-21 16:26:24 +08:00
Georgi Gerganov	1bb30bf28c	llama : handle KV shift for recurrent models (#10402) ggml-ci	2024-11-21 10:22:47 +02:00
Concedo	282a647689	Merge commit '467576b6cc' into concedo_experimental # Conflicts: # .gitignore # Makefile # README.md # common/common.h # docs/build.md # examples/infill/infill.cpp # examples/perplexity/perplexity.cpp # examples/server/README.md # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-cuda/CMakeLists.txt # scripts/sync-ggml-am.sh # scripts/sync-ggml.sh # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-opt.cpp # tests/test-quantize-perf.cpp	2024-11-21 16:05:21 +08:00
Georgi Gerganov	8e752a777b	llama : add check for KV cache shifts (#10401) ggml-ci	2024-11-19 13:29:26 +02:00
Shane A	a88ad007de	llama : add OLMo November 2024 support (#10394) * Add OLMo November 2024 constants * Add OLMo November 2024 converter * Add loading of OLMo November 2024 tensors and hyper parameters * Add building of OLMo November 2024 model	2024-11-19 11:04:08 +02:00
Concedo	d5feaa8a3d	fixed old mixtral models, but at what cost? was it worth it?	2024-11-19 01:01:25 +08:00
Diego Devesa	be5caccef9	llama : only use default buffer types for the KV cache (#10358)	2024-11-17 12:25:45 +01:00
Johannes Gäßler	4e54be0ec6	llama/ex: remove --logdir argument (#10339)	2024-11-16 23:00:41 +01:00
Concedo	590553ef07	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/llama-cli-intel.Dockerfile # .devops/llama-server-intel.Dockerfile # .github/workflows/build.yml # CMakePresets.json # Makefile # docs/backend/SYCL.md # docs/build.md # ggml/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # scripts/compare-llama-bench.py # scripts/sync-ggml-am.sh # scripts/sync-ggml.last	2024-11-16 17:20:14 +08:00
Concedo	70aee82552	attempts a backflip, but does he stick the landing?	2024-11-16 17:05:45 +08:00
FirstTimeEZ	89e4caaaf0	llama : save number of parameters and the size in llama_model (#10286) fixes #10285	2024-11-16 01:42:13 +01:00
Charles Xu	1607a5e5b0	backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921) * backend-cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2024-11-15 01:28:50 +01:00
Diego Devesa	ae8de6d50a	ggml : build backends as libraries (#10256) * ggml : build backends as libraries --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>	2024-11-14 18:04:35 +01:00
Concedo	df080b074d	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # examples/server/README.md # examples/speculative/speculative.cpp # flake.lock # ggml/src/CMakeLists.txt # scripts/sync-ggml.last # tests/test-backend-ops.cpp	2024-11-14 21:40:52 +08:00
Michael Podvitskiy	fb4a0ec083	llama : propagate the results of `graph_compute` (#9525) * llama: propagating the results of `graph_compute` to the user interface * llama: reverting kv_cache in case of failed compute * llama: `llama_kv_cache_state` was removed, only the result of `llama_graph_compute` is returned * llama: restore a kv_cache in case of failed computation * llama: correct reverting of the entire batch. also updates `llama_kv_cache_find_slot`, will correctly count the number of `used` cells for recurrent models * llama: updated comments * llama : add comments about KV cache state after error --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-13 20:00:35 +02:00
Georgi Gerganov	f018acba22	llama : fix Qwen model type strings	2024-11-09 11:26:34 +02:00
Concedo	a244b1ffd2	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # Makefile # Package.swift # ci/run.sh # docs/backend/SYCL.md # examples/llama-bench/llama-bench.cpp # examples/server/CMakeLists.txt # examples/server/README.md # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # grammars/README.md # scripts/sync-ggml-am.sh # scripts/sync-ggml.last # scripts/sync-ggml.sh # tests/run-json-schema-to-grammar.mjs # tests/test-backend-ops.cpp	2024-11-09 13:36:47 +08:00
wwoodsTM	5107e8cea3	DRY: Fixes clone functionality (#10192)	2024-11-07 16:20:25 +01:00
Zhiyuan Li	3bcd40b3c5	Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration (#10133 ) * rwkv6: rename to wkv6 * rwkv6: support avx2 avx512 armv8 armv9 * rwkv6: update cuda file name * rwkv6: rename params * wkv on sycl * sycl: add some ops * sycl: Enhance OP support judgment * wkv6: drop armv9 and tranfer to GGML style ggml-ci * sync : ggml * update the function to use appropriate types * fix define error * Update ggml/src/ggml-cpu.c * add appropriate asserts * move element-wise functions outside * put the declaration outside the loop * rewrite to be more inline with the common pattern for distributing threads * use recommended way GGML_TENSOR_LOCALS --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: Plamen Minev <pacominev@gmail.com> Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com> Co-authored-by: Meng, Hengyu <airdldl@163.com>	2024-11-07 15:19:10 +08:00
Concedo	628dcd640e	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # examples/server/README.md	2024-11-06 23:13:00 +08:00
Diego Devesa	94d8cb8be1	metal : fix from ptr buffer name (#10189)	2024-11-06 12:10:07 +01:00
Gabe Goodhart	b8deef0ec0	llama : add <|tool_call|> formatting to Granite template (#10177) Branch: GraniteToolCallTemplate Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>	2024-11-05 14:23:04 +02:00
Concedo	bb13925f39	Merge branch 'upstream' into concedo_experimental # Conflicts: # CMakePresets.json # Makefile # Package.swift # ci/run.sh # common/CMakeLists.txt # examples/CMakeLists.txt # flake.lock # ggml/src/CMakeLists.txt # ggml/src/ggml-backend.cpp # ggml/src/ggml.c # pocs/vdot/q8dot.cpp # pocs/vdot/vdot.cpp # tests/test-backend-ops.cpp # tests/test-grad0.cpp # tests/test-quantize-fns.cpp # tests/test-quantize-perf.cpp # tests/test-rope.cpp	2024-11-04 16:54:53 +08:00
Concedo	c7e351bf41	add exception for ibm granite, then keep using f16 kq mul for HIPBLAS only for now pending ROCM investigation re https://github.com/ggerganov/llama.cpp/pull/10015	2024-11-04 15:47:13 +08:00
Diego Devesa	9f40989351	ggml : move CPU backend to a separate file (#10144)	2024-11-03 19:34:08 +01:00
Concedo	33721615b5	fixed build issues	2024-11-03 11:01:51 +08:00
Concedo	bc30ebd044	Merge branch 'upstream' into concedo_experimental # Conflicts: # Makefile # README.md # examples/CMakeLists.txt # examples/main/README.md # ggml/src/CMakeLists.txt # ggml/src/kompute-shaders/common.comp # scripts/sync-ggml.last # src/llama.cpp	2024-11-02 21:57:29 +08:00
Concedo	223c5f0844	clblast survived	2024-11-02 21:51:38 +08:00
Georgi Gerganov	1926d6e39d	llama : adjust default context size + print warnings (#10136) * llama : adjust default context size + print warnings ggml-ci * ggml-ci : add missing gpu-layers + adjust context sizes	2024-11-02 15:18:56 +02:00
Concedo	3072db6895	remove annoying eog prints	2024-11-02 12:44:33 +08:00
Diego Devesa	e991e3127f	llama : use smart pointers for ggml resources (#10117)	2024-11-01 23:48:26 +01:00
Diego Devesa	85679d37f3	llama : improve output buffer type selection (#10098)	2024-11-01 00:49:53 +01:00
Diego Devesa	1e9f94994e	quantize : fix --keep-split (#10114)	2024-11-01 00:45:34 +01:00
Diego Devesa	c02e5ab2a6	llama : fix buffer checks for mamba and rwk (#10111) * llama : fix buffer checks for mamba and rwk * llama : fix missing worst case flag during reserve * cuda : fix supports_op for norm * disable sched SET_CAUSE	2024-10-31 22:54:23 +01:00
Zhenwei Jin	ab3d71f97f	loader: refactor tensor weights storage (#9935) * loader: refactor tensor weights storage * use sorted map, sort weights by layer --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-10-31 19:50:39 +01:00
Concedo	a46f8acd03	note: also has support for completion tokens count	2024-11-01 00:44:14 +08:00
Diego Devesa	dea5e86051	ggml : check tensor name lengths in gguf files (#10100)	2024-10-31 11:40:59 +01:00
Diego Devesa	c5b0f4b5d9	llama : refactor model loader with backend registry (#10026)	2024-10-30 02:01:23 +01:00

260 commits