Concedo
e35c6b8f9b
remove t5 masking sdcpp
2025-06-18 21:05:03 +08:00
Concedo
e0a7694328
try to set cuda pcie order first thing
2025-06-18 20:25:38 +08:00
Concedo
a8d33ebb0d
increase genamt hardlimit from 0.1 to 0.2 ratio
2025-06-18 19:34:12 +08:00
Concedo
40443a98f5
show available RAM, fixed SD vae tiling noise
2025-06-18 18:44:50 +08:00
Concedo
7966bdd1ad
allow embeddings model to use gpu
2025-06-18 00:46:30 +08:00
Concedo
4356a00f4a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# ci/run.sh
# docs/function-calling.md
# examples/gritlm/gritlm.cpp
# ggml/CMakeLists.txt
# ggml/cmake/common.cmake
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/ggml-cpu.c
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-vulkan/CMakeLists.txt
# ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt
# requirements/requirements-compare-llama-bench.txt
# scripts/compare-llama-bench.py
# tests/CMakeLists.txt
2025-06-18 00:16:54 +08:00
Reithan
f07434f4c1
streamline grammar sampler to speed up generation while using heavy grammar (#1606)
2025-06-17 23:04:59 +08:00
xctan
860a9e4eef
ggml-cpu : remove the weak alias trick (#14221)
2025-06-17 12:58:32 +03:00
R0CKSTAR
fe9d60e74a
musa: fix build warning (unused variable) (#14231)
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-06-17 17:48:08 +08:00
Sigbjørn Skjæret
e434e69183
common : suggest --jinja when autodetection fails (#14222)
2025-06-16 21:58:42 +02:00
Georgi Gerganov
89fea80d29
server : fix incorrect usage of llama_get_embeddings() (#14225)
...
* server : fix incorrect usage of llama_get_embeddings()
ggml-ci
* cont : fix the fix
ggml-ci
2025-06-16 22:33:27 +03:00
Concedo
ab29be54c4
comfyui compat - serve temporary upload endpoint for img2img
2025-06-16 23:18:47 +08:00
Diego Devesa
6adc3c3ebc
llama : add thread safety test (#14035)
...
* llama : add thread safety test
* llamafile : remove global state
* llama : better LLAMA_SPLIT_MODE_NONE logic
when main_gpu < 0 GPU devices are not used
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-16 08:11:43 -07:00
bandoti
0dbcabde8c
cmake: clean up external project logic for vulkan-shaders-gen (#14179)
...
* Remove install step for vulkan-shaders-gen
* Add install step to normalize msvc with make
* Regenerate modified shaders at build-time
2025-06-16 10:32:13 -03:00
Đinh Trọng Huy
ad590be98c
model : add NeoBERT (#14164)
...
* convert neobert model to gguf
* add inference graph
* fix flake8 lint
* followed reviewer suggestions
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* follow reviewers suggestions
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* override NeoBERT feed-forward length
---------
Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-16 14:53:41 +02:00
uvos
7d6d91babf
HIP: disable rocwmma on gfx12 by default until rocm 7.0 (#14202)
2025-06-16 13:47:38 +02:00
Georgi Gerganov
d3e64b9f49
llama : rework embeddings logic (#14208)
...
* llama : rework embeddings logic
ggml-ci
* cont : fix rerank
ggml-ci
* cont : engrish [no ci]
* cont : fix rerank
ggml-ci
* server : support both embeddings and completions with single model
ggml-ci
* cont : avoid embeddings_org
ggml-ci
2025-06-16 14:14:00 +03:00
Charles Xu
3ba0d843c6
ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206)
2025-06-16 11:47:57 +02:00
Bartowski
0bf49eb668
convert : remove arcee change in convert_hf_to_gguf_update.py (#14207)
2025-06-16 10:16:06 +02:00
Đinh Trọng Huy
4ad243677b
gguf-py : allow key override when adding value to GGUFWriter (#14194)
...
Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
2025-06-16 09:20:59 +02:00
Jeff Bolz
c89c2d1ab9
vulkan: mutex around vkQueueSubmit (#14127)
...
This fixes the remaining crash in test-thread-safety on my system.
2025-06-16 08:21:08 +02:00
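The Vulkan spec requires external synchronization when submitting to the same VkQueue from multiple threads, so serializing vkQueueSubmit behind a mutex is the standard fix. A minimal sketch of the pattern, with illustrative names (not the actual ggml-vulkan code):

```cpp
#include <mutex>
#include <vulkan/vulkan.h>

// vkQueueSubmit requires external synchronization per VkQueue, so all
// submissions are funneled through one lock. Names here are illustrative.
static std::mutex g_queue_mutex;

static VkResult guarded_queue_submit(VkQueue queue, uint32_t submit_count,
                                     const VkSubmitInfo * submits, VkFence fence) {
    std::lock_guard<std::mutex> lock(g_queue_mutex);
    return vkQueueSubmit(queue, submit_count, submits, fence);
}
```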
xctan
3555b3004b
ggml-cpu : rework weak alias on apple targets (#14146)
...
* ggml-cpu : rework weak alias on apple targets
* fix powerpc detection
* fix ppc detection
* fix powerpc detection on darwin
2025-06-16 13:54:15 +08:00
Bartowski
d7da8dc83a
model : Add support for Arcee AI's upcoming AFM model (#14185)
...
* Add Arcee AFM support
* Add draft update code
* Fix linter and update URL, may still not be final
* Update src/llama-model.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Remove accidental blank line
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-06-16 01:04:06 +02:00
Eric Curtin
cd355eda7d
server : When listening on a unix domain socket don't print http:// and port (#14180)
...
Instead show something like this:
main: server is listening on file.sock - starting the main loop
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-06-15 23:36:22 +02:00
Ed Addario
30e5b01de2
quantize : change int to unsigned int for KV overrides (#14197)
2025-06-15 18:53:45 +02:00
Concedo
6c9654f744
updated lite and docs
2025-06-15 23:44:51 +08:00
uvos
e54b394082
CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (#14196)
2025-06-15 17:30:13 +02:00
Concedo
861a2f5275
terminal title
2025-06-15 21:51:44 +08:00
uvos
2c2caa4443
HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ (#14183)
2025-06-15 15:45:27 +02:00
Georgi Gerganov
5fce5f948d
kv-cache : fix use-after-move of defrag info (#14189)
...
ggml-ci
2025-06-15 10:52:11 +03:00
Mikko Juola
9ae4143bc6
model : add dots.llm1 architecture support (#14044) (#14118)
...
Adds:
* Dots1Model to convert_hf_to_gguf.py
* Computation graph code to llama-model.cpp
* Chat template to llama-chat.cpp to detect this model's template.
---
The architecture is called "dots.llm1" (I decided to shorten it to dots1 or
DOTS1 in the code generally).
The only models that exist as of writing of this commit that follow this
architecture are "dots.llm1.inst" and "dots.llm1.base" from here:
* https://huggingface.co/rednote-hilab/dots.llm1.inst
* https://huggingface.co/rednote-hilab/dots.llm1.base
The model architecture is a combination of Qwen and Deepseek parts, as
seen here:
ffe12627b4/src/transformers/models/dots1/modular_dots1.py
2025-06-15 09:52:06 +02:00
Georgi Gerganov
c311ac664d
cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188)
...
ggml-ci
2025-06-15 10:08:58 +03:00
Georgi Gerganov
b9912ac570
batch : auto-gen positions + verify multi-sequence input (#14177)
...
* batch : verify multi-sequence input batches
ggml-ci
* cont : auto-gen positions + verify multi-seq input
ggml-ci
* cont : first print debug info, then perform validation
ggml-ci
* cont : fix position auto-gen + add comments
ggml-ci
2025-06-15 09:18:37 +03:00
Pepijn de Vos
00ba772610
docs : remove WIP since PR has been merged (#13912)
2025-06-15 08:06:37 +02:00
Piotr
3cb203c89f
llama-chat : Do not throw when tool parsing fails (#14012)
...
Currently, when a model generates output that looks like a tool call
but is invalid, an exception is thrown and not handled, causing the
cli or llama-server to bail. Instead, handle the chat parser exception
and simply return the generated text in such cases.
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-06-14 17:25:15 +01:00
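The described fallback is a plain catch-and-degrade pattern. A minimal sketch under assumed names (parse_tool_call is a hypothetical stand-in for the chat parser, not the real llama.cpp API):

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the chat parser: throws when the output
// looks like a tool call but cannot be parsed.
static std::string parse_tool_call(const std::string & text) {
    if (text.rfind("<tool_call>", 0) == 0 &&
        text.find("</tool_call>") == std::string::npos) {
        throw std::runtime_error("malformed tool call");
    }
    return text;
}

// The fix: catch the parser exception and return the generated text
// verbatim instead of letting the cli / server bail out.
static std::string handle_model_output(const std::string & text) {
    try {
        return parse_tool_call(text);
    } catch (const std::exception &) {
        return text;
    }
}
```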
Concedo
9809deed6a
updated docs
2025-06-14 16:56:28 +08:00
Aman Gupta
2e42be42bd
compare-llama-bench: add option to plot (#14169)
...
* compare llama-bench: add option to plot
* Address review comments: convert case + add type hints
* Add matplotlib to requirements
* fix tests
* Improve comment and fix assert condition for test
* Add back default test_name, add --plot_log_scale
* use log_scale regardless of x_values
2025-06-14 10:34:20 +02:00
Concedo
238be98efa
Allow override config for gguf files when reloading in admin mode, updated lite, fixed typo (+1 squashed commits)
...
Squashed commits:
[fe14845cc] Allow override config for gguf files when reloading in admin mode, updated lite (+2 squashed commit)
Squashed commit:
[9ded66aa5] Allow override config for gguf files when reloading in admin mode
[9597f6a34] update lite
2025-06-14 12:00:20 +08:00
Concedo
bfb47cbcd8
Revert "revert padding change for sd chroma"
...
This reverts commit 7de88802f9.
2025-06-14 10:10:34 +08:00
Concedo
5f9e96e82d
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/intel.Dockerfile
# CMakeLists.txt
# README.md
# common/CMakeLists.txt
# docs/multimodal.md
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-metal/CMakeLists.txt
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/cpy.cpp
# ggml/src/ggml-sycl/gemm.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# src/llama-context.cpp
2025-06-14 09:05:45 +08:00
Concedo
69e4a32ca2
Merge commit 'd4e0d95cf5' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# common/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-rpc/ggml-rpc.cpp
# scripts/sync-ggml.last
# tests/CMakeLists.txt
2025-06-14 01:58:53 +08:00
Concedo
33809c9e82
doing what i must because i can, after the mess that is https://github.com/ggml-org/llama.cpp/pull/13892
...
there is so much duplicate code in each cpu arch, i expect upstream will prune it eventually
arch detection has no fallback if all the arches are not found, by right we should set GGML_CPU_GENERIC
i should be relaxing, it's the weekend
2025-06-14 01:41:16 +08:00
Georgi Gerganov
fb85a288d7
vocab : fix build (#14175)
...
ggml-ci
2025-06-13 20:03:05 +03:00
Svetlozar Georgiev
40643edb86
sycl: fix docker image (#14144)
2025-06-13 18:32:56 +02:00
Guy Goldenberg
3cfbbdb44e
Merge commit from fork
...
* vocab : prevent integer overflow during load
* Add static cast and GGML_ABORT
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-13 19:20:25 +03:00
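The overflow guard described above amounts to validating a wide value before narrowing it. A minimal sketch of the general pattern (illustrative only; the real code aborts via the GGML_ABORT macro inside the vocab loader):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Abort instead of silently truncating when a size read from a model
// file does not fit the narrower type used downstream.
static int32_t checked_narrow(size_t v) {
    if (v > static_cast<size_t>(INT32_MAX)) {
        std::fprintf(stderr, "vocab: value %zu overflows int32_t\n", v);
        std::abort(); // the real code uses GGML_ABORT
    }
    return static_cast<int32_t>(v);
}
```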
Concedo
f50c793140
not working - refactoring
2025-06-14 00:03:21 +08:00
Georgi Gerganov
80709b70a2
batch : add LLAMA_BATCH_DEBUG environment variable (#14172)
...
* batch : add LLAMA_BATCH_DEBUG environment variable
ggml-ci
* cont : improve seq_id display
2025-06-13 18:35:00 +03:00
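Such env-var switches are typically read once and cached. A minimal sketch of how a LLAMA_BATCH_DEBUG-style gate might look (an assumption for illustration, not the actual implementation):

```cpp
#include <cstdio>
#include <cstdlib>

// Read the debug level once; LLAMA_BATCH_DEBUG=1 would enable logging,
// larger values more verbosity. Details here are assumptions.
static int batch_debug_level() {
    static const int level = [] {
        const char * v = std::getenv("LLAMA_BATCH_DEBUG");
        return v ? std::atoi(v) : 0;
    }();
    return level;
}

static void debug_log_batch(int n_tokens, int n_seqs) {
    if (batch_debug_level() > 0) {
        std::fprintf(stderr, "batch: n_tokens = %d, n_seqs = %d\n", n_tokens, n_seqs);
    }
}
```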
Concedo
c494525b33
update deprecated apis
2025-06-13 22:21:15 +08:00
Concedo
4204f111f7
Merge commit '8f47e25f56' into concedo_experimental
...
# Conflicts:
# .github/labeler.yml
# .github/workflows/build-linux-cross.yml
# docs/backend/CANN.md
# examples/batched.swift/Sources/main.swift
# examples/embedding/embedding.cpp
# examples/gritlm/gritlm.cpp
# examples/llama.android/llama/src/main/cpp/llama-android.cpp
# examples/llama.swiftui/llama.cpp.swift/LibLlama.swift
# examples/lookahead/lookahead.cpp
# examples/lookup/lookup.cpp
# examples/parallel/parallel.cpp
# examples/passkey/passkey.cpp
# examples/retrieval/retrieval.cpp
# examples/save-load-state/save-load-state.cpp
# examples/simple-chat/simple-chat.cpp
# examples/speculative-simple/speculative-simple.cpp
# examples/speculative/speculative.cpp
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/cpy.cpp
# ggml/src/ggml-sycl/dequantize.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# tools/batched-bench/batched-bench.cpp
# tools/cvector-generator/cvector-generator.cpp
# tools/imatrix/imatrix.cpp
# tools/llama-bench/llama-bench.cpp
# tools/perplexity/perplexity.cpp
# tools/run/run.cpp
2025-06-13 22:05:03 +08:00
ddpasa
26ff3685bf
docs : Update multimodal.md (#14122)
...
* Update multimodal.md
* Update multimodal.md
2025-06-13 15:17:53 +02:00