Concedo
4e40f2aaf4
added photomaker face cloning
2025-06-20 21:33:36 +08:00
Concedo
21881a861d
rename restrict square to sdclampedsoft
2025-06-20 15:39:55 +08:00
Concedo
175c99081e
merged https://github.com/leejet/stable-diffusion.cpp/issues/588 to fix vae tiling, ref https://github.com/LostRuins/koboldcpp/issues/1603
2025-06-20 11:13:04 +08:00
Concedo
b925bbfc6d
add simple api example
2025-06-19 23:05:28 +08:00
Concedo
771261f5be
updated sdui
2025-06-19 22:16:23 +08:00
Concedo
924dfa7cd3
bump version
2025-06-18 21:37:24 +08:00
Concedo
9e49350507
merge occam's https://github.com/ggml-org/llama.cpp/pull/14249
2025-06-18 21:23:23 +08:00
Concedo
5f0a7a84ae
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-vulkan/CMakeLists.txt
# scripts/sync-ggml.last
2025-06-18 21:22:51 +08:00
Concedo
268b6f76df
updated lite
2025-06-18 21:06:24 +08:00
Concedo
e35c6b8f9b
remove t5 masking sdcpp
2025-06-18 21:05:03 +08:00
Concedo
e0a7694328
try to set cuda pcie order first thing
2025-06-18 20:25:38 +08:00
Charles Xu
ef035803eb
ggml: Add Apple support for GGML_CPU_ALL_VARIANTS ( #14258 )
2025-06-18 12:40:07 +01:00
Concedo
a8d33ebb0d
increase genamt hardlimit from 0.1 to 0.2 ratio
2025-06-18 19:34:12 +08:00
Concedo
40443a98f5
show available RAM, fixed SD vae tiling noise
2025-06-18 18:44:50 +08:00
Xuan-Son Nguyen
413977de32
mtmd : refactor llava-uhd preprocessing logic ( #14247 )
...
* mtmd : refactor llava-uhd preprocessing logic
* fix editorconfig
2025-06-18 10:43:57 +02:00
Xuan-Son Nguyen
95402553a5
llama-chat : fix multiple system message for gemma, orion ( #14246 )
2025-06-18 09:58:43 +02:00
Sigbjørn Skjæret
3865cff4f5
convert : fix null head_dim AutoConfig regression ( #14248 )
2025-06-18 09:52:07 +02:00
Georgi Gerganov
d03172cc79
sync : ggml
...
ggml-ci
2025-06-18 09:59:21 +03:00
Daniel Bevenius
dd8e59f443
ggml : disable warnings for tests when using MSVC (ggml/1273)
...
* ggml : disable warnings for tests when using MSVC
This commit disables warnings for tests on Windows when using MSVC.
The motivation for this is that it brings the build output more
in line with what Linux/macOS systems produce.
One warning is still generated for the tests:
```console
Building Custom Rule C:/ggml/tests/CMakeLists.txt
cl : command line warning D9025: overriding '/DNDEBUG' with '/UNDEBUG'
[C:\ggml\build\tests\test-arange.vcxproj]
test-arange.cpp
test-arange.vcxproj -> C:\ggml\build\bin\Release\test-arange.exe
```
* ggml : fix typo in tests disable list
2025-06-18 09:59:21 +03:00
Daniel Bevenius
bbe98d2784
ggml : remove unused ggml_context_container (ggml/1272)
...
This commit removes the unused `ggml_context_container` structure from
the ggml library. It looks like the usage of this struct was removed in
commit 4757fe18d56ec11bf9c07feaca6e9d5b5357e7f4 ("ggml : alloc
ggml_contexts on the heap (whisper/2525)").
The motivation for this change is to improve code clarity/readability.
2025-06-18 09:59:21 +03:00
Daniel Bevenius
c2056ed6d4
examples : include examples in msvc disable warn (ggml/1270)
...
This commit adds the examples to the list of targets for which MSVC
warnings are ignored.
The motivation for this is that the examples currently generate a number
of warnings that are already ignored/disabled for the core ggml project.
This makes for cleaner output when building.
2025-06-18 09:59:21 +03:00
bandoti
c46503014d
cmake: remove shader-gen step-targets from ggml-vulkan ( #14226 )
...
* Remove step-targets from vulkan-shaders-gen
* Unset DESTDIR when building vulkan-shaders-gen
2025-06-17 22:33:25 +02:00
Concedo
7966bdd1ad
allow embeddings model to use gpu
2025-06-18 00:46:30 +08:00
Concedo
4356a00f4a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# ci/run.sh
# docs/function-calling.md
# examples/gritlm/gritlm.cpp
# ggml/CMakeLists.txt
# ggml/cmake/common.cmake
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/ggml-cpu.c
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-vulkan/CMakeLists.txt
# ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt
# requirements/requirements-compare-llama-bench.txt
# scripts/compare-llama-bench.py
# tests/CMakeLists.txt
2025-06-18 00:16:54 +08:00
Reithan
f07434f4c1
streamline grammar sampler to speed up generation while using heavy grammar ( #1606 )
2025-06-17 23:04:59 +08:00
xctan
860a9e4eef
ggml-cpu : remove the weak alias trick ( #14221 )
2025-06-17 12:58:32 +03:00
R0CKSTAR
fe9d60e74a
musa: fix build warning (unused variable) ( #14231 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-06-17 17:48:08 +08:00
Sigbjørn Skjæret
e434e69183
common : suggest --jinja when autodetection fails ( #14222 )
2025-06-16 21:58:42 +02:00
Georgi Gerganov
89fea80d29
server : fix incorrect usage of llama_get_embeddings() ( #14225 )
...
* server : fix incorrect usage of llama_get_embeddings()
ggml-ci
* cont : fix the fix
ggml-ci
2025-06-16 22:33:27 +03:00
Concedo
ab29be54c4
comfyui compat - serve temporary upload endpoint for img2img
2025-06-16 23:18:47 +08:00
Diego Devesa
6adc3c3ebc
llama : add thread safety test ( #14035 )
...
* llama : add thread safety test
* llamafile : remove global state
* llama : better LLAMA_SPLIT_MODE_NONE logic
when main_gpu < 0, GPU devices are not used
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-16 08:11:43 -07:00
bandoti
0dbcabde8c
cmake: clean up external project logic for vulkan-shaders-gen ( #14179 )
...
* Remove install step for vulkan-shaders-gen
* Add install step to normalize msvc with make
* Regenerate modified shaders at build-time
2025-06-16 10:32:13 -03:00
Đinh Trọng Huy
ad590be98c
model : add NeoBERT ( #14164 )
...
* convert neobert model to gguf
* add inference graph
* fix flake8 lint
* followed reviewer suggestions
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* follow reviewers suggestions
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* override NeoBERT feed-forward length
---------
Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-16 14:53:41 +02:00
uvos
7d6d91babf
HIP: disable rocwmma on gfx12 by default until rocm 7.0 ( #14202 )
2025-06-16 13:47:38 +02:00
Georgi Gerganov
d3e64b9f49
llama : rework embeddings logic ( #14208 )
...
* llama : rework embeddings logic
ggml-ci
* cont : fix rerank
ggml-ci
* cont : engrish [no ci]
* cont : fix rerank
ggml-ci
* server : support both embeddings and completions with single model
ggml-ci
* cont : avoid embeddings_org
ggml-ci
2025-06-16 14:14:00 +03:00
Charles Xu
3ba0d843c6
ggml: Add Android support for GGML_CPU_ALL_VARIANTS ( #14206 )
2025-06-16 11:47:57 +02:00
Bartowski
0bf49eb668
convert : remove arcee change in convert_hf_to_gguf_update.py ( #14207 )
2025-06-16 10:16:06 +02:00
Đinh Trọng Huy
4ad243677b
gguf-py : allow key override when adding value to GGUFWriter ( #14194 )
...
Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
2025-06-16 09:20:59 +02:00
Jeff Bolz
c89c2d1ab9
vulkan: mutex around vkQueueSubmit ( #14127 )
...
This fixes the remaining crash in test-thread-safety on my system.
2025-06-16 08:21:08 +02:00
xctan
3555b3004b
ggml-cpu : rework weak alias on apple targets ( #14146 )
...
* ggml-cpu : rework weak alias on apple targets
* fix powerpc detection
* fix ppc detection
* fix powerpc detection on darwin
2025-06-16 13:54:15 +08:00
Bartowski
d7da8dc83a
model : Add support for Arcee AI's upcoming AFM model ( #14185 )
...
* Add Arcee AFM support
* Add draft update code
* Fix linter and update URL, may still not be final
* Update src/llama-model.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Remove accidental blank line
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-06-16 01:04:06 +02:00
Eric Curtin
cd355eda7d
server : When listening on a unix domain socket don't print http:// and port ( #14180 )
...
Instead show something like this:
main: server is listening on file.sock - starting the main loop
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-06-15 23:36:22 +02:00
Ed Addario
30e5b01de2
quantize : change int to unsigned int for KV overrides ( #14197 )
2025-06-15 18:53:45 +02:00
Concedo
6c9654f744
updated lite and docs
2025-06-15 23:44:51 +08:00
uvos
e54b394082
CUDA/HIP: fix ssm_scan on devices where warp size is not 32 ( #14196 )
2025-06-15 17:30:13 +02:00
Concedo
861a2f5275
terminal title
2025-06-15 21:51:44 +08:00
uvos
2c2caa4443
HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ ( #14183 )
2025-06-15 15:45:27 +02:00
Georgi Gerganov
5fce5f948d
kv-cache : fix use-after-move of defrag info ( #14189 )
...
ggml-ci
2025-06-15 10:52:11 +03:00
Mikko Juola
9ae4143bc6
model : add dots.llm1 architecture support ( #14044 ) ( #14118 )
...
Adds:
* Dots1Model to convert_hf_to_gguf.py
* Computation graph code to llama-model.cpp
* Chat template to llama-chat.cpp to detect this model's template.
---
The model architecture is called "dots.llm1" (generally shortened to
dots1 or DOTS1 in the code).
The only models that exist as of writing of this commit that follow this
architecture are "dots.llm1.inst" and "dots.llm1.base" from here:
* https://huggingface.co/rednote-hilab/dots.llm1.inst
* https://huggingface.co/rednote-hilab/dots.llm1.base
The model architecture is a combination of Qwen and Deepseek parts, as
seen here:
ffe12627b4/src/transformers/models/dots1/modular_dots1.py
2025-06-15 09:52:06 +02:00
Georgi Gerganov
c311ac664d
cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ ( #14188 )
...
ggml-ci
2025-06-15 10:08:58 +03:00