koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-04-28 03:30:20 +00:00

Author	SHA1	Message	Date
Adrien Gallouët	463b6a963c	tools : enable kvu in perplexity for hellaswag, winogrande, multiple-choice (#19954) llama-perplexity -hf unsloth/Qwen3-0.6B-GGUF:Q4_K_M -f winogrande-debiased-eval.csv --winogrande winogrande_score : tokenizing selected tasks winogrande_score : calculating winogrande score over selected tasks. split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag) decode: failed to find a memory slot for batch of size 46 failed to decode the batch, n_batch = 2048, ret = 1 winogrande_score: llama_decode() failed same for hellaswag: split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag) decode: failed to find a memory slot for batch of size 99 failed to decode the batch, n_batch = 2048, ret = 1 hellaswag_score: llama_decode() failed Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-13 21:25:57 +01:00
ZeroV0LT	f17b3be63f	llama : fix pooling assertion crash in chunked GDN detection path (#20468) * llama : fix pooling assertion crash in chunked GDN detection path The chunked fused Gated Delta Net detection in sched_reserve() calls graph_reserve(16*n_seqs, n_seqs, n_outputs, ...) where n_outputs = n_seqs. This creates a dimension mismatch in build_pooling() for embedding models with mean/rank pooling: build_inp_mean() creates a tensor with shape [n_tokens=16*n_seqs, ...] while t_embd is reduced to [n_outputs=n_seqs, ...] via out_ids, causing ggml_mul_mat to assert on ggml_can_mul_mat(a, b). Fix: pass n_tokens as n_outputs in the chunked GDN graph reservation, matching the pattern used by the pp/tg worst-case reservations. Regression introduced by #20340 (`d28961d`). Same class of bug as #12517, fixed by #12545. * server : add mean pooling tests to embedding test suite Add test_embedding_pooling_mean and test_embedding_pooling_mean_multiple to cover the --pooling mean codepath, which was previously untested. These tests would have caught the regression introduced by #20340 where build_pooling() crashes with a ggml_mul_mat assertion due to mismatched dimensions in the chunked GDN detection path. --------- Co-authored-by: Domenico Crupi <domenico@zerovolt.it>	2026-03-13 20:53:42 +02:00
SoftwareRenderer	d7ba99c485	server: reset counter related to kill-switch on client error (#20513) * server: reset kill-switch on client error This avoids triggering a server kill switch. If the client sends a request that exceeds the configured context size, an appropriate HTTP 400 response is provided and no tokens are generated. However since no tokens are generated, update_slots() increments n_empty_consecutive. If the client sends 3 such messages in a row, the server terminates. * moved counter reset as per recommendation * cont : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-03-13 19:58:09 +02:00
Concedo	04915d99ee	Merge commit '`451ef08432`' into concedo_experimental # Conflicts: # .github/workflows/build.yml # README.md # docs/ops.md # docs/ops/Vulkan.csv # src/llama-model-loader.cpp # src/llama-model.cpp # src/llama.cpp # tests/CMakeLists.txt # tests/peg-parser/test-basic.cpp # tests/peg-parser/test-json-parser.cpp # tests/peg-parser/test-python-dict-parser.cpp # tests/peg-parser/test-unicode.cpp # tests/test-chat-auto-parser.cpp # tests/test-chat-peg-parser.cpp # tests/test-chat.cpp # tools/CMakeLists.txt	2026-03-13 23:33:37 +08:00
Concedo	d2c911884d	Merge commit '`213c4a0b81`' into concedo_experimental # Conflicts: # CODEOWNERS # common/CMakeLists.txt # common/chat-peg-parser.cpp # common/chat.cpp # docs/backend/SYCL.md # docs/development/parsing.md # docs/ops.md # docs/ops/SYCL.csv # embd_res/templates/Apriel-1.6-15b-Thinker-fixed.jinja # embd_res/templates/Bielik-11B-v3.0-Instruct.jinja # embd_res/templates/GLM-4.7-Flash.jinja # embd_res/templates/LFM2-8B-A1B.jinja # embd_res/templates/StepFun3.5-Flash.jinja # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/backend.hpp # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/convert.hpp # ggml/src/ggml-sycl/count-equal.cpp # ggml/src/ggml-sycl/dpct/helper.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/presets.hpp # ggml/src/ggml-sycl/softmax.cpp # ggml/src/ggml-sycl/vecdotq.hpp # models/templates/Apertus-8B-Instruct.jinja # models/templates/CohereForAI-c4ai-command-r7b-12-2024-tool_use.jinja # models/templates/Qwen-QwQ-32B.jinja # models/templates/Qwen3-Coder.jinja # models/templates/deepseek-ai-DeepSeek-R1-Distill-Llama-8B.jinja # models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja # models/templates/deepseek-ai-DeepSeek-V3.1.jinja # models/templates/fireworks-ai-llama-3-firefunction-v2.jinja # models/templates/moonshotai-Kimi-K2.jinja # models/templates/unsloth-Apriel-1.5.jinja # tests/CMakeLists.txt # tests/peg-parser/test-basic.cpp # tests/peg-parser/tests.h # tests/test-backend-ops.cpp # tests/test-chat-peg-parser.cpp # tests/test-chat-template.cpp # tests/test-chat.cpp # tests/test-json-schema-to-grammar.cpp # tests/test-peg-parser.cpp # tools/CMakeLists.txt # tools/cli/cli.cpp	2026-03-13 21:35:56 +08:00
Daniel Bevenius	8f974d2392	mtmd : rename mtmd_get_audio_bitrate to mtmd_get_audio_sample_rate (#20105) This commit renames the function `mtmd_get_audio_bitrate` to `mtmd_get_audio_sample_rate` to better reflect its purpose. The motivation for this is that the function currently returns the audio sample rate, not the bitrate (sample_rate × bit_depth × channels), and that is how it is used in the code as well. This is a breaking change, but I believe mtmd is still in experimental/development phase so it might be alright to simply rename.	2026-03-13 12:30:02 +01:00
Piotr Wilkin (ilintar)	0e810413bb	tests : use `reasoning` instead of `reasoning_budget` in server tests (#20432)	2026-03-12 13:41:01 +01:00
Pascal	de190154c8	New conversations now auto-select the first loaded model (#20403) * webui: auto-select first loaded model for new conversations in router mode * chore: update webui build output	2026-03-12 09:07:05 +01:00
DAN™	fdb17643d3	model : add support for Phi4ForCausalLMV (#20168) * Add support for Phi4ForCausalLMV. * Fix Phi-4 vision parity (correcting SigLIP2 patch-kernel export layout) and matching HF NaFlex resize behavior in mtmd. * Rename constants + fix tokenizer label * Clean-ups. * Fix GGUF export. * Set tokenizer.ggml.pre explicitly. * Default vocab name rather than forcing it. * Clean-ups. * Fix indent. * Fix subscriptable error. * remove overcomplicated code path * Clean-ups. --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-03-12 00:25:54 +01:00
Piotr Wilkin (ilintar)	acb7c79069	common/parser: handle reasoning budget (#20297) * v1 * Finished! * Handle cli * Reasoning sampler * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Less explosive terminology :) * Add utf-8 case and tests * common : migrate reasoning budget sampler to common * cont : clean up * cont : expose state and allow passing as initial state * cont : remove unused imports * cont : update state machine doc string --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Alde Rojas <hello@alde.dev>	2026-03-11 10:26:12 +01:00
Pascal	00de615345	Fix agentic mcp image single model (#20339) * webui: fix MCP image attachments dropped during the agentic loop in single-model mode * chore: update webui build output	2026-03-11 05:31:33 +01:00
Concedo	6adcd0b5db	Merge commit '`34df42f7be`' into concedo_experimental # Conflicts: # README.md # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-hexagon/htp/CMakeLists.txt # ggml/src/ggml-hexagon/htp/act-ops.c # ggml/src/ggml-hexagon/htp/binary-ops.c # ggml/src/ggml-hexagon/htp/cpy-ops.c # ggml/src/ggml-hexagon/htp/get-rows-ops.c # ggml/src/ggml-hexagon/htp/htp-msg.h # ggml/src/ggml-hexagon/htp/htp-ops.h # ggml/src/ggml-hexagon/htp/hvx-arith.h # ggml/src/ggml-hexagon/htp/hvx-base.h # ggml/src/ggml-hexagon/htp/hvx-inverse.h # ggml/src/ggml-hexagon/htp/hvx-utils.h # ggml/src/ggml-hexagon/htp/main.c # ggml/src/ggml-hexagon/htp/rope-ops.c # ggml/src/ggml-hexagon/htp/set-rows-ops.c # ggml/src/ggml-hexagon/htp/softmax-ops.c # ggml/src/ggml-hexagon/htp/unary-ops.c # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # tests/test-backend-ops.cpp # tools/cli/cli.cpp # tools/server/webui/src/lib/components/app/chat/ChatScreen/ChatScreen.svelte	2026-03-10 22:20:04 +08:00
Concedo	746664fde6	Merge commit '`2cd20b72ed`' into concedo_experimental # Conflicts: # CONTRIBUTING.md # docs/backend/CANN.md # docs/backend/SYCL.md # docs/backend/snapdragon/README.md # docs/backend/snapdragon/windows.md # docs/build.md # docs/multimodal/MobileVLM.md # docs/ops.md # docs/ops/WebGPU.csv # examples/debug/README.md # examples/llama.vim # examples/model-conversion/README.md # examples/sycl/README.md # ggml/src/ggml-cpu/amx/mmq.cpp # ggml/src/ggml-cpu/arch/x86/repack.cpp # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-hexagon/htp-drv.cpp # ggml/src/ggml-hexagon/htp/flash-attn-ops.c # ggml/src/ggml-hexagon/htp/hvx-base.h # ggml/src/ggml-hexagon/htp/hvx-copy.h # ggml/src/ggml-hexagon/htp/hvx-inverse.h # ggml/src/ggml-hexagon/htp/hvx-reduce.h # ggml/src/ggml-hexagon/htp/matmul-ops.c # ggml/src/ggml-hexagon/htp/rope-ops.c # ggml/src/ggml-hexagon/htp/worker-pool.c # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/cpy.cl # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/quants.hpp # ggml/src/ggml-sycl/softmax.cpp # ggml/src/ggml-vulkan/CMakeLists.txt # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # scripts/pr2wt.sh # scripts/server-bench.py # scripts/snapdragon/windows/run-cli.ps1 # tests/test-alloc.cpp # tests/test-backend-ops.cpp # tests/test-chat.cpp # tools/cli/cli.cpp # tools/completion/README.md # tools/cvector-generator/cvector-generator.cpp # tools/imatrix/README.md # tools/perplexity/README.md # tools/server/public_simplechat/readme.md # tools/server/tests/README.md	2026-03-10 22:11:08 +08:00
Georgi Gerganov	a7b3dee7a5	server : make 2 checkpoints near the end of the prompt (#20288) * server : make 2 checkpoints near the end of the prompt * cont : adjust checkpoints	2026-03-10 14:28:23 +02:00
ddh0	1dab5f5a44	llama-quant : fail early on missing imatrix, refactor type selection, code cleanup (#19770) * quantize : imatrix-fail early + code cleanup * fix manual override printing it's in the preliminary loop now, so needs to be on its own line * revert header changes per ggerganov * remove old #includes * clarify naming rename `tensor_quantization` to `tensor_typo_option` to describe its functionality * fix per barto	2026-03-10 08:16:05 +02:00
Evan Huus	23fbfcb1ad	server: Parse port numbers from MCP server URLs in CORS proxy (#20208) * Parse port numbers from MCP server URLs * Pass scheme to http proxy for determining whether to use SSL * Fix download on non-standard port and re-add port to logging * add test --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-03-09 17:47:54 +01:00
Georgi Gerganov	96cfc4992c	server : fix checkpoints n_tokens calculation (#20287)	2026-03-09 16:47:06 +02:00
Georgi Gerganov	344ee2a38a	server : warn swa-full is not supported for non-SWA models (#20291)	2026-03-09 16:44:25 +02:00
Georgi Gerganov	d6e1556499	server : fix off-by-1 in server_tokens::size_up_to_pos() (#20279) * server : fix off-by-1 in server_tokens::size_up_to_pos() * cont : fix typo [no ci]	2026-03-09 16:43:38 +02:00
Georgi Gerganov	107d599952	server : add kill switch when server is stuck (#20277)	2026-03-09 10:33:12 +02:00
Aaron Teo	ae87863dc1	llama-bench: introduce `-hf` and `-hff` flags & use `--mmap 1` by default (#20211)	2026-03-09 09:05:44 +08:00
Georgi Gerganov	d417bc43dd	server : do not create checkpoints right after mtmd chunks (#20232)	2026-03-08 22:16:46 +02:00
Johannes Gäßler	a976ff081b	llama: end-to-end tests (#19802) * tests: add end-to-end tests per model architecture * fixup for rebase * fix use-after-free in llama-model-loader.cpp * fix CI * fix WebGPU * fix CI * disable CI for macOS-latest-cmake-arm64 * use expert_weights_scale only if != 0.0f * comments	2026-03-08 12:30:21 +01:00
decahedron1	ff52ee964d	server : correct index on finish in OAI completion streams (#20226)	2026-03-08 10:08:57 +01:00
Piotr Wilkin (ilintar)	566059a26b	Autoparser - complete refactoring of parser architecture (#18675) * Autoparser - full single commit squish * Final pre-merge changes: minor fixes, Kimi 2.5 model parser	2026-03-06 21:01:00 +01:00
Tom Vaucourt	e68f2fb894	server : preserve anthropic thinking blocks in conversion (#20120) * server : preserve anthropic thinking blocks in conversion (#20090) * server : add tests for anthropic thinking block conversion --------- Co-authored-by: root <root@llamacpp.home>	2026-03-06 17:41:12 +01:00
Concedo	d20e60ddd5	Merge branch 'upstream' into concedo_experimental # Conflicts: # docs/build.md # examples/batched/batched.cpp # examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp # examples/deprecation-warning/deprecation-warning.cpp # examples/eval-callback/eval-callback.cpp # examples/gen-docs/gen-docs.cpp # examples/gguf-hash/gguf-hash.cpp # examples/gguf/gguf.cpp # examples/lookahead/lookahead.cpp # examples/lookup/lookup-create.cpp # examples/lookup/lookup-merge.cpp # examples/lookup/lookup-stats.cpp # examples/lookup/lookup.cpp # examples/parallel/parallel.cpp # examples/passkey/passkey.cpp # examples/retrieval/retrieval.cpp # examples/save-load-state/save-load-state.cpp # examples/simple-chat/simple-chat.cpp # examples/simple/simple.cpp # examples/speculative-simple/speculative-simple.cpp # examples/speculative/speculative.cpp # examples/sycl/ls-sycl-device.cpp # examples/training/finetune.cpp # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cpu/amx/common.h # ggml/src/ggml-cpu/kleidiai/kernels.cpp # ggml/src/ggml-opencl/CMakeLists.txt # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-opencl/kernels/cvt.cl # ggml/src/ggml-opencl/kernels/gemv_noshuffle_general_q8_0_f32.cl # ggml/src/ggml-opencl/kernels/transpose.cl # ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl # ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl # scripts/get-wikitext-2.sh # tests/test-backend-ops.cpp # tools/batched-bench/batched-bench.cpp # tools/cvector-generator/cvector-generator.cpp # tools/export-lora/export-lora.cpp # tools/imatrix/imatrix.cpp # tools/llama-bench/llama-bench.cpp # tools/perplexity/perplexity.cpp # tools/rpc/rpc-server.cpp # tools/tokenize/tokenize.cpp	2026-03-06 21:19:49 +08:00
Concedo	abcca8c0f9	do not use the mxfp4 repack - repack must be synced again from before this commit if it's ever to be used in future. this will break compilation with older w64devkit	2026-03-06 21:07:41 +08:00
JustCommitRandomness	2fbc3b2ae5	Adjust int types in format strings (#2009) * tweak format string types This may not be all of them, but it's the ones which warn on OpenBSD * complete the changes needed to fix the format string specifiers * avoid using inttypes, directly cast to size_t (u64 usually) instead --------- Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>	2026-03-06 19:06:18 +08:00
Piotr Wilkin (ilintar)	f5ddcd1696	Checkpoint every n tokens: squash (#20087)	2026-03-06 11:39:26 +01:00
Aleksander Grygier	f6235a41ef	webui: Agentic Loop + MCP Client with support for Tools, Resources and Prompts (#18655)	2026-03-06 10:00:39 +01:00
Roj234	f7db3f3789	cli : Don't clear system prompt when using '/clear' (#20067) * Enhance /clear command to include system prompt Add system prompt to messages when clearing chat history. * Use lambda	2026-03-06 06:41:11 +01:00
Sigbjørn Skjæret	b5ed0e058c	cli : add command and file auto-completion (#19985)	2026-03-05 10:47:28 +01:00
Aleksander Grygier	5e335ba113	webui: Improvements for Models Selector UI (#20066)	2026-03-05 08:52:22 +01:00
Marcel Petrick	92f7da00b4	chore : correct typos [no ci] (#20041) * fix(docs): correct typos found during code review Non-functional changes only: - Fixed minor spelling mistakes in comments - Corrected typos in user-facing strings - No variables, logic, or functional code was modified. Signed-off-by: Marcel Petrick <mail@marcelpetrick.it> * Update docs/backend/CANN.md Co-authored-by: Aaron Teo <taronaeo@gmail.com> * Revert "Auxiliary commit to revert individual files from 846d1c301281178efbc6ce6060ad34c1ebe45af8" This reverts commit 02fcf0c7db661d5ff3eff96b2b2db9fdb7213256. * Update tests/test-backend-ops.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update tests/test-backend-ops.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Signed-off-by: Marcel Petrick <mail@marcelpetrick.it> Co-authored-by: Aaron Teo <taronaeo@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-05 08:50:21 +01:00
Sigbjørn Skjæret	d969e933e1	tools : add missing clocale include in mtmd-cli [no ci] (#20107)	2026-03-04 14:18:04 +01:00
SamareshSingh	cb8f4fa3f8	Fix locale-dependent float printing in GGUF metadata (#17331) * Set C locale for consistent float formatting across all binaries. * Add C locale setting to all tools binaries Add std::setlocale(LC_NUMERIC, "C") to all 16 binaries in the tools/ directory to ensure consistent floating-point formatting. * Apply suggestion from @JohannesGaessler --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2026-03-04 09:30:40 +01:00
standby24x7	54910bd4f3	completion : Fix a typo in warning message (#20082) resuse -> reuse	2026-03-04 06:44:49 +01:00
Concedo	4e358265a3	Merge commit '`8387ffb28d`' into concedo_experimental # Conflicts: # docs/backend/VirtGPU.md # docs/backend/ZenDNN.md # ggml/src/ggml-cpu/amx/amx.cpp # ggml/src/ggml-cpu/amx/mmq.cpp # ggml/src/ggml-sycl/add-id.cpp # ggml/src/ggml-virtgpu/backend/backend-dispatched-backend.cpp # ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer-type.cpp # ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp # ggml/src/ggml-virtgpu/backend/backend-dispatched.cpp # ggml/src/ggml-virtgpu/backend/backend-dispatched.gen.h # ggml/src/ggml-virtgpu/backend/backend-dispatched.h # ggml/src/ggml-virtgpu/backend/backend-virgl-apir.h # ggml/src/ggml-virtgpu/backend/backend.cpp # ggml/src/ggml-virtgpu/backend/shared/api_remoting.h # ggml/src/ggml-virtgpu/backend/shared/apir_backend.gen.h # ggml/src/ggml-virtgpu/backend/shared/apir_backend.h # ggml/src/ggml-virtgpu/backend/shared/apir_cs.h # ggml/src/ggml-virtgpu/backend/shared/apir_cs_ggml.h # ggml/src/ggml-virtgpu/backend/shared/apir_cs_rpc.h # ggml/src/ggml-virtgpu/ggml-backend-buffer-type.cpp # ggml/src/ggml-virtgpu/ggml-backend-device.cpp # ggml/src/ggml-virtgpu/ggml-backend-reg.cpp # ggml/src/ggml-virtgpu/ggml-backend.cpp # ggml/src/ggml-virtgpu/ggml-remoting.h # ggml/src/ggml-virtgpu/include/apir_hw.h # ggml/src/ggml-virtgpu/regenerate_remoting.py # ggml/src/ggml-virtgpu/virtgpu-forward-backend.cpp # ggml/src/ggml-virtgpu/virtgpu-forward-buffer-type.cpp # ggml/src/ggml-virtgpu/virtgpu-forward-buffer.cpp # ggml/src/ggml-virtgpu/virtgpu-forward-device.cpp # ggml/src/ggml-virtgpu/virtgpu-forward-impl.h # ggml/src/ggml-virtgpu/virtgpu-forward.gen.h # ggml/src/ggml-virtgpu/virtgpu.cpp # ggml/src/ggml-virtgpu/virtgpu.h # ggml/src/ggml-zendnn/CMakeLists.txt # ggml/src/ggml-zendnn/ggml-zendnn.cpp # src/CMakeLists.txt # tests/CMakeLists.txt # tests/test-tokenizer-0.sh # tools/cli/README.md # tools/completion/README.md # tools/imatrix/imatrix.cpp # tools/server/README.md	2026-02-28 12:45:16 +08:00
Roj234	3e6ab244ad	server: Add pragma once to server-context.h (#19944)	2026-02-27 18:28:36 +01:00
Sami Kama	5596a35791	server: Mirroring /v1/responses to /responses to match /v1/chat/completions pattern (#19873)	2026-02-28 00:44:42 +08:00
Pascal	2e7e638523	server : support multiple model aliases via comma-separated --alias (#19926) * server : support multiple model aliases via comma-separated --alias * server : update --alias description and regenerate docs * server : multiple model aliases and tags - address review feedback from ngxson - --alias accepts comma-separated values (std::set, no duplicates) - --tags for informational metadata (not used for routing) - aliases resolve transparently in router via get_meta/has_model - /v1/models exposes aliases and tags fields * regenerate docs * nits * server : use first alias as model_name for backward compat address review feedback from ngxson * server : add single-model test for aliases and tags	2026-02-27 07:05:23 +01:00
Georgi Gerganov	37964f44f9	mtmd : fix padding of n_tokens (#19930)	2026-02-26 18:39:49 +02:00
Georgi Gerganov	01cd448b8c	server : fix ctx checkpoint restore logic (#19924)	2026-02-26 18:20:16 +02:00
drrros	efba35a860	server: fix load-on-startup not respected in ini file (#19897) Co-authored-by: Roman Marchenko <r.marchenko@ideco.ru>	2026-02-26 12:32:31 +01:00
Maximilian Werk	66287bdaac	model : add Jina Embeddings v5 Nano (partial EuroBERT) support (#19826) * WIP: Add EuroBERT support with autoformatting changes This commit includes: - EuroBERT model implementation for GGUF conversion - C++ backend support for EuroBERT architecture - Unintended autoformatting changes to Python files Saving before reverting formatting-only changes. * feat: add back eos assert when not last token pooling * feat: removed duplicated code and cleanup * feat: removed not working architectures and unnecessary check * fix: typo * fix: dynamic pooling config * feat: added an example model for eurobert * feat: proper llama-vocab implementation for jina-v5 * fix: removed unnecessary comments	2026-02-26 12:14:09 +01:00
yggdrasil75	bd72300591	server : fix typo in server README.md (#19900) fix typo	2026-02-26 11:26:16 +01:00
Concedo	749a606374	whisper broke	2026-02-26 16:45:04 +08:00
Concedo	44182ebefe	Merge commit '`8c2c0108dd`' into concedo_experimental # Conflicts: # examples/model-conversion/Makefile # examples/model-conversion/scripts/utils/inspect-org-model.py # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-hexagon/htp/act-ops.c # ggml/src/ggml-hexagon/htp/get-rows-ops.c # ggml/src/ggml-hexagon/htp/hex-dma.h # ggml/src/ggml-hexagon/htp/htp-ops.h # ggml/src/ggml-hexagon/htp/matmul-ops.c # ggml/src/ggml-hexagon/htp/rope-ops.c # ggml/src/ggml-hexagon/htp/set-rows-ops.c # ggml/src/ggml-hexagon/htp/softmax-ops.c # ggml/src/ggml-hexagon/htp/unary-ops.c # scripts/snapdragon/adb/run-cli.sh # scripts/snapdragon/adb/run-completion.sh # scripts/snapdragon/adb/run-mtmd.sh # scripts/snapdragon/windows/run-cli.ps1 # scripts/sync_vendor.py # tests/test-backend-sampler.cpp	2026-02-26 16:30:37 +08:00
Concedo	7e53bfd28d	Merge commit '`2b6dfe824d`' into concedo_experimental # Conflicts: # .github/workflows/release.yml # examples/save-load-state/save-load-state.cpp # src/llama-context.cpp # tools/cli/cli.cpp	2026-02-26 15:07:23 +08:00

997 commits