Commit graph

9934 commits

Author SHA1 Message Date
Concedo
4eaf05dfeb handle oai without v1 prefix 2025-10-16 02:16:49 +08:00
Concedo
dfeccea3a1 added shitty fractional scaling support for GNOME. but really just use KDE 2025-10-15 22:28:04 +08:00
Concedo
5207b8d4be more sd path fallbacks 2025-10-15 15:22:06 +08:00
Concedo
610ba18971 sdcpp precision fix 2025-10-15 11:08:35 +08:00
Concedo
a8c023d906 quick test rocm 2025-10-14 17:19:16 +08:00
Concedo
833a778b18 try fix cu11 fa again 2025-10-13 16:36:59 +08:00
Concedo
3a42c6b523 apply fix from https://github.com/ggml-org/llama.cpp/pull/16558 2025-10-13 15:31:26 +08:00
Concedo
c6884a1462 Revert "revert https://github.com/ggml-org/llama.cpp/pull/15953 for now as it breaks kokoro"
This reverts commit 20678ddca1.
2025-10-13 15:25:53 +08:00
Concedo
ca8f36195f try fix cu11 fa 2025-10-13 14:50:47 +08:00
Concedo
8b787866c6 fixed a typo 2025-10-13 11:14:38 +08:00
Concedo
59aa1529dc add embeddings vulkan to makefile 2025-10-13 11:05:45 +08:00
Concedo
20678ddca1 revert https://github.com/ggml-org/llama.cpp/pull/15953 for now as it breaks kokoro 2025-10-13 10:36:51 +08:00
Concedo
121e2fefc8 updated lite 2025-10-12 20:52:16 +08:00
Concedo
54db35cd7a fix t5 scale as well 2025-10-12 20:35:46 +08:00
Concedo
e0ba01c65e fix cuda builds 2025-10-12 20:09:16 +08:00
Concedo
1a360b8458 sdcpp: optimize the handling of the FeedForward precision fix (+1 squashed commits)
Squashed commits:

[621ff6392] sdcpp: optimize the handling of the FeedForward precision fix (+1 squashed commits)

Squashed commits:

[05b16906c] sdcpp: optimize the handling of the FeedForward precision fix
2025-10-12 17:49:38 +08:00
Concedo
9503547ca1 Merge remote-tracking branch 'lcpp/gg/cacheless-embd' into concedo_experimental 2025-10-12 16:47:48 +08:00
Concedo
7e7da2583e Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-cuda/CMakeLists.txt
#	ggml/src/ggml-cuda/common.cuh
#	ggml/src/ggml-cuda/fattn.cu
#	ggml/src/ggml-hip/CMakeLists.txt
#	ggml/src/ggml-musa/CMakeLists.txt
2025-10-12 16:42:51 +08:00
Concedo
76d5fcbe49 fix the issue that occurs when using CUDA with k-quants weights 2025-10-12 16:18:03 +08:00
Georgi Gerganov
d4d465bce4
graph : support cacheless embeddings with FA and iSWA 2025-10-12 10:35:38 +03:00
Georgi Gerganov
4b2dae383d
common : update presets (#16504)
* presets : add --embd-gemma-default and remove old embedding presets

* presets : add gpt-oss presets

* presets : add vision presets

* cont : remove reasoning overrides [no ci]

* cont : fix batch size for embedding gemma [no ci]
2025-10-12 09:29:13 +03:00
sirus20x6
41aac5c69b
ggml : Fix FP16 ELU positive branch (#16519)
Co-authored-by: Aaron <shelhamer.aaron@gmail.com>
2025-10-12 08:25:37 +03:00
Daniel Bevenius
a2fba89a42
hparams : add check for layer index in is_recurrent (#16511)
* hparams : add check for layer index in is_recurrent

This commit adds a check in the is_recurrent method to ensure that the
provided layer index is within the valid range.

The motivation for this change is to prevent potential out-of-bounds
accesses, and to be consistent with other methods in the class that
perform similar checks, like is_swa.
2025-10-12 07:19:06 +02:00
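The check described in the commit above can be sketched in a few lines; the struct and member names here are illustrative stand-ins, not the actual llama.cpp hparams fields:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative stand-in for the hparams class; field names are assumptions.
struct hparams_sketch {
    uint32_t n_layer = 4;
    std::vector<bool> recurrent_layer = {false, true, false, true};

    bool is_recurrent(uint32_t il) const {
        // the added check: reject out-of-range layer indices up front
        assert(il < n_layer && "layer index out of range");
        return recurrent_layer[il];
    }
};
```

Without the assertion, a bad layer index would silently read past the end of the per-layer array.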
sirus20x6
20cc625edc
ggml: Correct SVE implementation in ggml_vec_dot_f16_unroll (#16518)
The previous SVE implementation for `ggml_vec_dot_f16_unroll` contained a bug due to a copy-paste error. The wrong variable was used in an FMA instruction, leading to incorrect results. This commit corrects the variable usage and improves the clarity of the code by renaming variables to avoid confusion.

Co-authored-by: Aaron <shelhamer.aaron@gmail.com>
2025-10-12 08:15:00 +03:00
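The class of bug fixed above (an FMA reading from the wrong unrolled row) is easiest to see in scalar form; this is a plain-C++ illustration of the failure mode, not the SVE intrinsics code itself:

```cpp
#include <cstddef>

// Scalar illustration of an unrolled dot product: two rows of x against a
// shared vector y. Each row must accumulate into its own sum; the bug was
// the moral equivalent of writing x0[i] in the second line below.
void vec_dot_f32_unroll2(size_t n, float * s,
                         const float * x0, const float * x1, const float * y) {
    float sum0 = 0.0f;
    float sum1 = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum0 += x0[i] * y[i];
        sum1 += x1[i] * y[i];
    }
    s[0] = sum0;
    s[1] = sum1;
}
```

Distinct variable names per unrolled row, as the commit's rename does, make this kind of copy-paste slip much harder to miss.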
Concedo
a0ed446e61 handle numbers outside int32 range with wrapping 2025-10-12 12:46:45 +08:00
Wagner Bruna
9f9494cf3f
sd: add 'default' to the list of supported samplers (#1788) 2025-10-12 12:35:56 +08:00
Concedo
65c2129f65 https://github.com/leejet/stable-diffusion.cpp/pull/877/commits/47c0f8e4bd6916442d04b0a4412554cf3a043e8d 2025-10-12 10:01:29 +08:00
Johannes Gäßler
11f0af5504
CUDA: faster tile FA, add oob checks, more HSs (#16492) 2025-10-11 20:54:32 +02:00
Concedo
720fc30832 Merge branch 'upstream' into concedo_experimental 2025-10-11 23:19:38 +08:00
Concedo
e92f9fd422 cursed hack for RNN models 2025-10-11 23:14:55 +08:00
Georgi Gerganov
a3cb04744f
metal : fix mul-mm condition + fix mul-mv permuted kernels (#16494)
2025-10-11 16:54:10 +03:00
Pascal
4a8fbe0a5e
feat: render user content as markdown option (#16358)
* feat: render user content as markdown option
- Add a persisted 'renderUserContentAsMarkdown' preference to the settings defaults and info metadata so the choice survives reloads like other options
- Surface the new 'Render user content as Markdown' checkbox in the General section of the chat settings dialog, beneath the PDF toggle
- Render user chat messages with 'MarkdownContent' when the new setting is enabled, matching assistant formatting while preserving the existing card styling otherwise
- chore: update webui build output

* chore: update webui build output
2025-10-11 15:50:49 +02:00
Yann Follet
31d0ff1869
server / ranking : add sorting and management of top_n (#16403)
* server / ranking : add sorting and management of top_n

* Make it backward compatible: if no top_n is given, return
all results

here is a script to run some tests

```script

URL=${1:-http://127.0.0.1:8181}

curl "$URL/v1/rerank" -H "Content-Type: application/json" \
 -d '{ "model": "M", "query": "What is the recipe to make bread ?",
 "return_text" : true,
 "texts" : true,
 "top_n": 6,
 "documents": [
 "voici la recette pour faire du pain, il faut de la farine de l eau et du levain et du sel",
 "it is a bear",
 "bread recipe : floor, water, yest, salt",
 "The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.",
 "here is the ingedients to bake bread : 500g floor, 350g water, 120g fresh refresh yest, 15g salt",
 "recipe to make cookies : floor, eggs, water, chocolat",
 "here is the recipe to make bread : 500g floor, 350g water, 120g fresh refresh yest, 15g salt",
 "il fait tres beau aujourd hui",
 "je n ai pas faim, je ne veux pas manger",
 "je suis a paris"
 ] }' | jq
```

* use resize() instead for(...)

* simplify top_n init since no need to return error

result to test :

./tests.sh unit/test_rerank.py -v -x
==================================================== test session starts =====================================================
platform linux -- Python 3.12.3, pytest-8.3.5, pluggy-1.6.0 -- /home/yann/dev/yann/llama.cpp/tools/server/tests/test/bin/python3
cachedir: .pytest_cache
rootdir: /home/yann/dev/yann/llama.cpp/tools/server/tests
configfile: pytest.ini
plugins: anyio-4.11.0
collected 8 items

unit/test_rerank.py::test_rerank PASSED                                                                                [ 12%]
unit/test_rerank.py::test_rerank_tei_format PASSED                                                                     [ 25%]
unit/test_rerank.py::test_invalid_rerank_req[documents0] PASSED                                                        [ 37%]
unit/test_rerank.py::test_invalid_rerank_req[None] PASSED                                                              [ 50%]
unit/test_rerank.py::test_invalid_rerank_req[123] PASSED                                                               [ 62%]
unit/test_rerank.py::test_invalid_rerank_req[documents3] PASSED                                                        [ 75%]
unit/test_rerank.py::test_rerank_usage[Machine learning is-A machine-Learning is-19] PASSED                            [ 87%]
unit/test_rerank.py::test_rerank_usage[Which city?-Machine learning is -Paris, capitale de la-26] PASSED               [100%]

===================================================== 8 passed in 4.31s ======================================================

* add rerank top_n unit test

here is the result :

./tests.sh unit/test_rerank.py -v -x
=================================================================== test session starts ===================================================================
platform linux -- Python 3.12.3, pytest-8.3.5, pluggy-1.6.0 -- /home/yann/dev/yann/llama.cpp/tools/server/tests/test/bin/python3
cachedir: .pytest_cache
rootdir: /home/yann/dev/yann/llama.cpp/tools/server/tests
configfile: pytest.ini
plugins: anyio-4.11.0
collected 16 items

unit/test_rerank.py::test_rerank PASSED                                                                                                             [  6%]
unit/test_rerank.py::test_rerank_tei_format PASSED                                                                                                  [ 12%]
unit/test_rerank.py::test_invalid_rerank_req[documents0] PASSED                                                                                     [ 18%]
unit/test_rerank.py::test_invalid_rerank_req[None] PASSED                                                                                           [ 25%]
unit/test_rerank.py::test_invalid_rerank_req[123] PASSED                                                                                            [ 31%]
unit/test_rerank.py::test_invalid_rerank_req[documents3] PASSED                                                                                     [ 37%]
unit/test_rerank.py::test_rerank_usage[Machine learning is-A machine-Learning is-19] PASSED                                                         [ 43%]
unit/test_rerank.py::test_rerank_usage[Which city?-Machine learning is -Paris, capitale de la-26] PASSED                                            [ 50%]
unit/test_rerank.py::test_rerank_top_n[None-4] PASSED                                                                                               [ 56%]
unit/test_rerank.py::test_rerank_top_n[2-2] PASSED                                                                                                  [ 62%]
unit/test_rerank.py::test_rerank_top_n[4-4] PASSED                                                                                                  [ 68%]
unit/test_rerank.py::test_rerank_top_n[99-4] PASSED                                                                                                 [ 75%]
unit/test_rerank.py::test_rerank_tei_top_n[None-4] PASSED                                                                                           [ 81%]
unit/test_rerank.py::test_rerank_tei_top_n[2-2] PASSED                                                                                              [ 87%]
unit/test_rerank.py::test_rerank_tei_top_n[4-4] PASSED                                                                                              [ 93%]
unit/test_rerank.py::test_rerank_tei_top_n[99-4] PASSED                                                                                             [100%]

=================================================================== 16 passed in 8.84s ===================================================================

* editor config check fix
2025-10-11 16:39:04 +03:00
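The top_n handling described above (sort by score, then truncate with resize() rather than a manual loop) can be sketched as follows; the types and names are assumptions for illustration, not the server's actual ones:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch: rank (score, doc_index) pairs and keep at most top_n of them.
std::vector<std::pair<float, int>> rerank_top_n(
        std::vector<std::pair<float, int>> results, size_t top_n) {
    std::sort(results.begin(), results.end(),
              [](const auto & a, const auto & b) { return a.first > b.first; });
    if (top_n < results.size()) {
        results.resize(top_n); // resize() instead of a copy loop
    }
    return results;
}
```

An over-large top_n (like the 99 in the unit tests) simply falls through the size check and returns everything, which matches the test_rerank_top_n[99-4] cases above.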
Diego Devesa
97870e6497
cuda : avoid initializing unused devices (#16510) 2025-10-11 13:02:26 +02:00
amirai21
477a66b035
convert : correctly handle LLaMA tokenizer for Jamba (#16470)
* fix: convert_hf_to_gguf - change Jamba non-sentencepiece mode (tokenizer.json) vocab construction

* fix: convert_hf_to_gguf - jamba non-sentencepiece tokenizer to use _set_vocab_llama_hf func

* fix: convert_hf_to_gguf - removed get_vocab_base_pre from jamba
2025-10-11 10:33:41 +02:00
Concedo
0cc0ea4cf9 reset prompt template idx 2025-10-11 12:30:07 +08:00
Concedo
5cea2fe944 don't enforce dims 2025-10-11 11:34:47 +08:00
Concedo
80f88eb703 wip qwen image edit. not working yet 2025-10-11 11:24:17 +08:00
Concedo
6d8f8cd65b Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ggml/src/CMakeLists.txt
2025-10-11 10:01:43 +08:00
Georgi Gerganov
e60f01d941
server : fix division by zero when reporting stats (#16501)
2025-10-10 22:15:05 +03:00
Georgi Gerganov
81086cd6a3
vocab : mark EOT token for Granite models (#16499)
* vocab : mark EOT token for Granite models

* sampling : fallback to EOS when EOT is not found
2025-10-10 17:17:31 +03:00
Radoslav Gerganov
68ee98ae18
server : return HTTP 400 if prompt exceeds context length (#16486)
In streaming mode, when the prompt exceeds the context length, the server
returns an HTTP 200 status code with a JSON error in the body. This is very
confusing and inconsistent with all other inference engines, which return
an HTTP 4xx error in this case.

This patch fixes this problem and makes the server return HTTP 400 in
such cases.
2025-10-10 16:11:07 +02:00
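The decision the patch moves in front of the stream is trivial to state; this sketch is purely illustrative of the status-code choice, not the server's actual code path:

```cpp
// Decide the HTTP status up front: reject over-length prompts with 400
// instead of starting a 200 stream that only carries a JSON error body.
int prompt_status_code(int n_prompt_tokens, int n_ctx) {
    return n_prompt_tokens > n_ctx ? 400 : 200;
}
```

The key point is that the check happens before any response headers are sent, since the status line cannot be changed once streaming has begun.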
Radoslav Gerganov
cdb6da468c
server : log requests to /v1/completions (#16495) 2025-10-10 13:22:27 +03:00
Concedo
bc09f34f66 only accept qwen image pruned models matching 40 or 41 layers 2025-10-10 16:26:32 +08:00
Wagner Bruna
bc762fe9b4
add support for Qwen Image Pruning (#1779)
From leejet/stable-diffusion.cpp#874 .
2025-10-10 16:22:47 +08:00
Prajwal B Mehendarkar
6d69ab3f26
cmake : Dont define XOPENSOURCE on AIX (#16481) 2025-10-10 11:15:46 +03:00
Wagner Bruna
bece22f996
fix encoding VAE tiling for Qwen Image (#1785) 2025-10-10 10:07:50 +08:00
Pascal
1faa13a118
webui: updated the chat service to only include max_tokens in the req… (#16489)
* webui: updated the chat service to only include max_tokens in the request payload when the setting is explicitly provided, while still mapping explicit zero or null values to the infinite-token sentinel

* chore: update webui build output
2025-10-09 22:54:57 +02:00
duduta
1deee0f8d4
cpu : optimize the ggml NORM operation (#15953)
* ggml-cpu: optimize norm operation to use intrinsics or Accelerate

rename function

add endif macro comment

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Aaron Teo <taronaeo@gmail.com>

* implement s390x SIMD suggested by @taronaeo

* add TODO comment

* tidy up spaces

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
2025-10-09 21:11:15 +02:00
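For reference, the scalar form of the NORM operation the commit above vectorizes is mean-centering followed by scaling with the inverse standard deviation; this generic sketch is not the ggml-cpu code, just the computation it performs:

```cpp
#include <cmath>
#include <cstddef>

// Scalar normalization: y[i] = (x[i] - mean) / sqrt(var + eps).
void norm_f32(size_t n, float * y, const float * x, float eps) {
    float mean = 0.0f;
    for (size_t i = 0; i < n; ++i) mean += x[i];
    mean /= (float) n;

    float var = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        const float d = x[i] - mean;
        var += d * d;
    }
    var /= (float) n;

    const float scale = 1.0f / std::sqrt(var + eps);
    for (size_t i = 0; i < n; ++i) y[i] = (x[i] - mean) * scale;
}
```

The three loops over the row are what intrinsics (or Accelerate on macOS) can each speed up independently.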
Georgi Gerganov
d00cbea63c
server : host-memory prompt caching (#16391)
* minor : code style

* server : fix prompt similarity calculation

* server : initial host-memory prompt caching

* cont

* server : refactor

* cont

* cont : make the server task of the slot const

* cont : minor [no ci]

* server : cache prompts and checkpoints only for completion tasks

* server : improve prompt caching logic

* cont : fix check for number of cached prompts [no ci]

* server : improve caching logic, add -cram CLI arg

* server : print prompt mismatch info

* cont : better naming [no ci]

* server : improve prompt cache loading logic

* server : add option to debug the slot contents (#16482)

* server : add option to debug the slot contents

* Update tools/server/server.cpp

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>

* server : add option to disable prompt cache

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
2025-10-09 18:54:51 +03:00
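Host-memory prompt caching, as the commit series above describes it, boils down to keeping past prompts (with their saved state) and reusing the entry that shares the longest token prefix with the new request. A toy sketch, with all names invented here rather than taken from the server code:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy cache entry: the tokenized prompt this state was computed for.
// (A real entry would also carry the saved KV-cache / state blob.)
struct prompt_cache_entry {
    std::vector<int32_t> tokens;
};

// Length of the common prefix between two token sequences.
static size_t common_prefix(const std::vector<int32_t> & a,
                            const std::vector<int32_t> & b) {
    size_t i = 0;
    while (i < a.size() && i < b.size() && a[i] == b[i]) ++i;
    return i;
}

// Pick the cached entry whose prompt shares the longest prefix with the
// new one; returns -1 when nothing matches at all.
int best_cache_entry(const std::vector<prompt_cache_entry> & cache,
                     const std::vector<int32_t> & prompt) {
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < cache.size(); ++i) {
        const size_t len = common_prefix(cache[i].tokens, prompt);
        if (len > best_len) {
            best_len = len;
            best = (int) i;
        }
    }
    return best;
}
```

Only the tokens after the shared prefix then need to be re-evaluated, which is where the memory-for-compute trade (bounded by the -cram limit mentioned above) pays off.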