koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-10 04:00:53 +00:00

Author	SHA1	Message	Date
Concedo	4356a00f4a	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # ci/run.sh # docs/function-calling.md # examples/gritlm/gritlm.cpp # ggml/CMakeLists.txt # ggml/cmake/common.cmake # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cpu/ggml-cpu.c # ggml/src/ggml-hip/CMakeLists.txt # ggml/src/ggml-vulkan/CMakeLists.txt # ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt # requirements/requirements-compare-llama-bench.txt # scripts/compare-llama-bench.py # tests/CMakeLists.txt	2025-06-18 00:16:54 +08:00
Reithan	f07434f4c1	streamline grammar sampler to speed up generation while using heavy grammar (#1606)	2025-06-17 23:04:59 +08:00
xctan	860a9e4eef	ggml-cpu : remove the weak alias trick (#14221)	2025-06-17 12:58:32 +03:00
R0CKSTAR	fe9d60e74a	musa: fix build warning (unused variable) (#14231) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-06-17 17:48:08 +08:00
Sigbjørn Skjæret	e434e69183	common : suggest --jinja when autodetection fails (#14222)	2025-06-16 21:58:42 +02:00
Georgi Gerganov	89fea80d29	server : fix incorrect usage of llama_get_embeddings() (#14225) * server : fix incorrect usage of llama_get_embeddings() ggml-ci * cont : fix the fix ggml-ci	2025-06-16 22:33:27 +03:00
Concedo	ab29be54c4	comfyui compat - serve temporary upload endpoint for img2img	2025-06-16 23:18:47 +08:00
Diego Devesa	6adc3c3ebc	llama : add thread safety test (#14035) * llama : add thread safety test * llamafile : remove global state * llama : better LLAMA_SPLIT_MODE_NONE logic when main_gpu < 0 GPU devices are not used --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-06-16 08:11:43 -07:00
bandoti	0dbcabde8c	cmake: clean up external project logic for vulkan-shaders-gen (#14179) * Remove install step for vulkan-shaders-gen * Add install step to normalize msvc with make * Regenerate modified shaders at build-time	2025-06-16 10:32:13 -03:00
Đinh Trọng Huy	ad590be98c	model : add NeoBERT (#14164) * convert neobert model to gguf * add inference graph * fix flake8 lint * followed reviewer suggestions Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * follow reviewers suggestions Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * override NeoBERT feed-forward length --------- Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-06-16 14:53:41 +02:00
uvos	7d6d91babf	HIP: disable rocwmma on gfx12 by default until rocm 7.0 (#14202)	2025-06-16 13:47:38 +02:00
Georgi Gerganov	d3e64b9f49	llama : rework embeddings logic (#14208) * llama : rework embeddings logic ggml-ci * cont : fix rerank ggml-ci * cont : engrish [no ci] * cont : fix rerank ggml-ci * server : support both embeddings and completions with single model ggml-ci * cont : avoid embeddings_org ggml-ci	2025-06-16 14:14:00 +03:00
Charles Xu	3ba0d843c6	ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206)	2025-06-16 11:47:57 +02:00
Bartowski	0bf49eb668	convert : remove arcee change in convert_hf_to_gguf_update.py (#14207)	2025-06-16 10:16:06 +02:00
Đinh Trọng Huy	4ad243677b	gguf-py : allow key override when adding value to GGUFWriter (#14194) Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>	2025-06-16 09:20:59 +02:00
Jeff Bolz	c89c2d1ab9	vulkan: mutex around vkQueueSubmit (#14127) This fixes the remaining crash in test-thread-safety on my system.	2025-06-16 08:21:08 +02:00
xctan	3555b3004b	ggml-cpu : rework weak alias on apple targets (#14146) * ggml-cpu : rework weak alias on apple targets * fix powerpc detection * fix ppc detection * fix powerpc detection on darwin	2025-06-16 13:54:15 +08:00
Bartowski	d7da8dc83a	model : Add support for Arcee AI's upcoming AFM model (#14185) * Add Arcee AFM support * Add draft update code * Fix linter and update URL, may still not be final * Update src/llama-model.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Remote accidental blank line --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-06-16 01:04:06 +02:00
Eric Curtin	cd355eda7d	server : When listening on a unix domain socket don't print http:// and port (#14180) Instead show something like this: main: server is listening on file.sock - starting the main loop Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2025-06-15 23:36:22 +02:00
Ed Addario	30e5b01de2	quantize : change int to unsigned int for KV overrides (#14197)	2025-06-15 18:53:45 +02:00
Concedo	6c9654f744	updated lite and docs	2025-06-15 23:44:51 +08:00
uvos	e54b394082	CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (#14196)	2025-06-15 17:30:13 +02:00
Concedo	861a2f5275	terminal title	2025-06-15 21:51:44 +08:00
uvos	2c2caa4443	HIP: Replace usage of depricated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (#14183)	2025-06-15 15:45:27 +02:00
Georgi Gerganov	5fce5f948d	kv-cache : fix use-after-move of defrag info (#14189) ggml-ci	2025-06-15 10:52:11 +03:00
Mikko Juola	9ae4143bc6	model : add dots.llm1 architecture support (#14044) (#14118) Adds: * Dots1Model to convert_hf_to_gguf.py * Computation graph code to llama-model.cpp * Chat template to llama-chat.cpp to detect this model's template. --- The model is called "dots.llm1" (I decided to shorten it to dots1 or DOTS1 in the code generally) architecture. The only models that exist as of writing of this commit that follow this architecture are "dots.llm1.inst" and "dots.llm1.base" from here: * https://huggingface.co/rednote-hilab/dots.llm1.inst * https://huggingface.co/rednote-hilab/dots.llm1.base The model architecture is a combination of Qwen and Deepseek parts, as seen here: `ffe12627b4/src/transformers/models/dots1/modular_dots1.py`	2025-06-15 09:52:06 +02:00
Georgi Gerganov	c311ac664d	cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) ggml-ci	2025-06-15 10:08:58 +03:00
Georgi Gerganov	b9912ac570	batch : auto-gen positions + verify multi-sequence input (#14177) * batch : verify multi-sequence input batches ggml-ci * cont : auto-gen positions + verify multi-seq input ggml-ci * cont : first print debug info, then perform validation ggml-ci * cont : fix position auto-gen + add comments ggml-ci	2025-06-15 09:18:37 +03:00
Pepijn de Vos	00ba772610	docs : remove WIP since PR has been merged (#13912)	2025-06-15 08:06:37 +02:00
Piotr	3cb203c89f	llama-chat : Do not throw when tool parsing fails (#14012) Currently when a model generates output which looks like a tool call, but is invalid an exception is thrown and not handled, causing the cli or llama-server to bail. Instead, handle the chat parser exception and simply return the generated text in such cases. Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>	2025-06-14 17:25:15 +01:00
Concedo	9809deed6a	updated docs	2025-06-14 16:56:28 +08:00
Aman Gupta	2e42be42bd	compare-llama-bench: add option to plot (#14169) * compare llama-bench: add option to plot * Address review comments: convert case + add type hints * Add matplotlib to requirements * fix tests * Improve comment and fix assert condition for test * Add back default test_name, add --plot_log_scale * use log_scale regardless of x_values	2025-06-14 10:34:20 +02:00
Concedo	238be98efa	Allow override config for gguf files when reloading in admin mode, updated lite, fixed typo (+1 squashed commits) Squashed commits: [fe14845cc] Allow override config for gguf files when reloading in admin mode, updated lite (+2 squashed commit) Squashed commit: [9ded66aa5] Allow override config for gguf files when reloading in admin mode [9597f6a34] update lite	2025-06-14 12:00:20 +08:00
Concedo	bfb47cbcd8	Revert "revert padding change for sd chroma" This reverts commit `7de88802f9`.	2025-06-14 10:10:34 +08:00
Concedo	5f9e96e82d	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/intel.Dockerfile # CMakeLists.txt # README.md # common/CMakeLists.txt # docs/multimodal.md # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-metal/CMakeLists.txt # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/cpy.cpp # ggml/src/ggml-sycl/gemm.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # src/llama-context.cpp	2025-06-14 09:05:45 +08:00
Concedo	69e4a32ca2	Merge commit '`d4e0d95cf5`' into concedo_experimental # Conflicts: # .github/workflows/build.yml # common/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-rpc/ggml-rpc.cpp # scripts/sync-ggml.last # tests/CMakeLists.txt	2025-06-14 01:58:53 +08:00
Concedo	33809c9e82	doing what i must because i can, after the mess that is https://github.com/ggml-org/llama.cpp/pull/13892 there is so much duplicate code in each cpu arch, i expect upstream will prune it eventually arch detection has no fallback if all the arches are not found, by right we should set GGML_CPU_GENERIC i should be relaxing its the weekend	2025-06-14 01:41:16 +08:00
Georgi Gerganov	fb85a288d7	vocab : fix build (#14175) ggml-ci	2025-06-13 20:03:05 +03:00
Svetlozar Georgiev	40643edb86	sycl: fix docker image (#14144)	2025-06-13 18:32:56 +02:00
Guy Goldenberg	3cfbbdb44e	Merge commit from fork * vocab : prevent integer overflow during load * Add static cast and GGML_ABORT --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-06-13 19:20:25 +03:00
Concedo	f50c793140	not working - refactoring	2025-06-14 00:03:21 +08:00
Georgi Gerganov	80709b70a2	batch : add LLAMA_BATCH_DEBUG environment variable (#14172) * batch : add LLAMA_BATCH_DEBUG environment variable ggml-ci * cont : improve seq_id display	2025-06-13 18:35:00 +03:00
Concedo	c494525b33	update deprecated apis	2025-06-13 22:21:15 +08:00
Concedo	4204f111f7	Merge commit '`8f47e25f56`' into concedo_experimental # Conflicts: # .github/labeler.yml # .github/workflows/build-linux-cross.yml # docs/backend/CANN.md # examples/batched.swift/Sources/main.swift # examples/embedding/embedding.cpp # examples/gritlm/gritlm.cpp # examples/llama.android/llama/src/main/cpp/llama-android.cpp # examples/llama.swiftui/llama.cpp.swift/LibLlama.swift # examples/lookahead/lookahead.cpp # examples/lookup/lookup.cpp # examples/parallel/parallel.cpp # examples/passkey/passkey.cpp # examples/retrieval/retrieval.cpp # examples/save-load-state/save-load-state.cpp # examples/simple-chat/simple-chat.cpp # examples/speculative-simple/speculative-simple.cpp # examples/speculative/speculative.cpp # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/cpy.cpp # ggml/src/ggml-sycl/dequantize.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-sycl/vecdotq.hpp # tools/batched-bench/batched-bench.cpp # tools/cvector-generator/cvector-generator.cpp # tools/imatrix/imatrix.cpp # tools/llama-bench/llama-bench.cpp # tools/perplexity/perplexity.cpp # tools/run/run.cpp	2025-06-13 22:05:03 +08:00
ddpasa	26ff3685bf	docs : Update multimodal.md (#14122) * Update multimodal.md * Update multimodal.md	2025-06-13 15:17:53 +02:00
Wagner Bruna	f6d2d1ce5c	configurable resolution limit (#1586) * refactor image gen configuration screen * make image size limit configurable * fix resolution limits and keep dimensions closer to the original ratio * use 0.0 for the configured default image size limit This prevents the current default value from being saved into the config files, in case we later decide to adopt a different value. * export image model version when loading * restore model-specific default image size limit * change the image area restriction to be specified by a square side * move image resolution limits down to the C++ level * Revert "export image model version when loading" This reverts commit `fa65b23de3`. * Linting Fixes: PY: - Inconsistent var name sd_restrict_square -> sd_restrict_square_var - GUI swap back to using absolute row numbers for now. - fstring fix - size_limit -> side_limit inconsistency C++: - roundup_64 standalone function - refactor sd_fix_resolution variable names for clarity - move "anti crashing" hard total megapixel limit always to be applied after soft total megapixel limit instead of conditionally only when sd_restrict_square is unset * allow unsafe resolutions if debugmode is on --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2025-06-13 20:05:20 +08:00
Georgi Gerganov	60c666347b	batch : rework llama_batch_allocr (#14153) * batch : rework llama_batch_allocr ggml-ci * cont : move validation inside class ggml-ci * cont : move output counting to class ggml-ci * cont : minor ggml-ci * batch : add TODOs ggml-ci	2025-06-13 13:47:55 +03:00
Reithan	f1c9db4174	fix-loss-of-destroyed-tokens-in-grammar-pre-pass (#1600)	2025-06-13 18:46:38 +08:00
Georgi Gerganov	b7cc7745e3	readme : remove survey link (#14168)	2025-06-13 11:55:44 +03:00
Christian Kastner	cc8d081879	cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167) * cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT * cmake: Pass on LLAMA_BUILD_* to GGML_BUILD_*	2025-06-13 10:38:52 +02:00

8463 commits