koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-04-28 03:30:20 +00:00

Author	SHA1	Message	Date
Diego Devesa	cf0a43bb64	llama-bench : add defrag-thold, check for invalid ranges (#13487 )	2025-05-13 00:31:37 +02:00
Concedo	21e31e255b	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/docker.yml # README.md # build-xcframework.sh # common/CMakeLists.txt # examples/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-metal/ggml-metal.m # ggml/src/ggml-metal/ggml-metal.metal # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-sycl/backend.hpp # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-sycl/vecdotq.hpp # scripts/compare-llama-bench.py # src/CMakeLists.txt # src/llama-model.cpp # src/llama.cpp # tests/test-backend-ops.cpp # tests/test-opt.cpp # tools/llama-bench/README.md # tools/llama-bench/llama-bench.cpp # tools/mtmd/CMakeLists.txt # tools/mtmd/README.md # tools/mtmd/clip.cpp # tools/rpc/rpc-server.cpp # tools/server/CMakeLists.txt # tools/server/README.md	2025-05-13 00:28:35 +08:00
Xuan-Son Nguyen	de4c07f937	clip : cap max image size 1024 for qwen vl model (#13478 )	2025-05-12 15:06:51 +02:00
Anudit Nagar	91159ee9df	server : allow content to be null in oaicompat_completion_params_parse (#13477 )	2025-05-12 13:56:42 +02:00
Diego Devesa	22cdab343b	llama-bench : accept ranges for integer parameters (#13410 )	2025-05-12 13:08:22 +02:00
City	c104023994	mtmd : Use RMS norm for InternVL 3 38B and 78B mmproj (#13459 )	2025-05-12 00:39:06 +02:00
Anthony Umfer	9a390c4829	tools : fix uninitialized llama_batch in server (#13436 ) * add constructor to initialize server_context::batch, preventing destructor's call to llama_batch_free from causing an invalid free() * Update tools/server/server.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * use C++11 initializer syntax * switch from Copy-list-initialization to Direct-list-initialization --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-05-11 17:08:26 +02:00
David Huang	7f323a589f	Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386 )	2025-05-11 14:18:39 +02:00
City	3eac209319	mtmd : support InternVL 3 38B and 78B mmproj (#13443 ) * Support InternVL 3 38B and 78B mmproj * Swap norms in clip.cpp * Group variables together	2025-05-11 11:35:52 +02:00
Xuan-Son Nguyen	a634d75d1b	mtmd : move helpers to dedicated file (#13442 ) * mtmd : move helpers to dedicated file * fix windows build * rm redundant include	2025-05-11 11:34:23 +02:00
Concedo	f841b29c41	fixed unicode paths	2025-05-11 14:05:54 +08:00
Xuan-Son Nguyen	15e6125a39	mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434 ) * mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl * fix typo	2025-05-10 19:57:54 +02:00
Xuan-Son Nguyen	3b24d26c22	server : update docs (#13432 )	2025-05-10 18:44:49 +02:00
Xuan-Son Nguyen	053367d149	mtmd : support InternVL 2.5 and 3 (#13422 ) * convert : internvl support * InternVL3-1B working * fix regression * rm mobilevlm from test * fix conversion * add test for internvl * add to list of pre-quant * restore boi/eoi check * add clarify comment for norm eps	2025-05-10 16:26:42 +02:00
Xuan-Son Nguyen	33eff40240	server : vision support via libmtmd (#12898 ) * server : (experimental) vision support via libmtmd * mtmd : add more api around mtmd_image_tokens * mtmd : add more api around mtmd_image_tokens * mtmd : ability to calc image hash * shared_ptr for mtmd_image_tokens * move hash to user-define ID (fixed) * abstract out the batch management * small fix * refactor logic adding tokens to batch * implement hashing image * use FNV hash, now hash bitmap instead of file data * allow decoding image embedding to be split into batches * rm whitespace * disable some features when mtmd is on * fix --no-mmproj-offload * mtmd_context_params no timings * refactor server_inp to server_tokens * fix the failing test case * init * wip * working version * add mtmd::bitmaps * add test target * rm redundant define * test: mtmd_input_chunks_free * rm outdated comment * fix merging issue * explicitly create mtmd::input_chunks * mtmd_input_chunk_copy * add clone() * improve server_input struct * clip : fix confused naming ffn_up and ffn_down * rm ffn_i/o/g naming * rename n_embd, n_ff * small fix * no check n_ff * fix detokenize * add const to various places * add warning about breaking changes * add c api * helper: use mtmd_image_tokens_get_n_pos * fix ctx_shift * fix name shadowing * more strict condition * support remote image_url * remote image_url log * add CI test * do not log base64 * add "has_multimodal" to /props * remove dangling image * speculative: use slot.cache_tokens.insert * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * rm can_be_detokenized * on prmpt processing done, assert cache_tokens.size * handle_completions_impl returns void * adapt the new web ui * update docs and hot topics * rm assert * small fix (2) --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-05-09 19:29:37 +02:00
Concedo	6bb44391bd	Merge commit '5c86c9ed3e' into concedo_experimental # Conflicts: # tools/imatrix/imatrix.cpp # tools/mtmd/README.md # tools/run/README.md # tools/run/run.cpp	2025-05-10 00:30:18 +08:00
Diego Devesa	27ebfcacba	llama : do not crash if there is no CPU backend (#13395 ) * llama : do not crash if there is no CPU backend * add checks to examples	2025-05-09 13:02:07 +02:00
Bartowski	efb8b47eda	imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (#13389 ) * Add --parse-special for enabling parsing of special tokens in imatrix calculation * whitespace	2025-05-09 11:53:58 +02:00
R0CKSTAR	0527771dd8	llama-run: add support for downloading models from ModelScope (#13370 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-05-09 10:25:50 +01:00
Concedo	42f6930e13	Merge branch 'upstream' into concedo_experimental # Conflicts: # ggml/src/ggml-rpc/ggml-rpc.cpp	2025-05-09 17:18:14 +08:00
Xuan-Son Nguyen	2189fd3b63	mtmd : fix batch_view for m-rope (#13397 ) * mtmd : fix batch_view for m-rope * nits : fix comment	2025-05-09 11:18:02 +02:00
Xuan-Son Nguyen	3f96aeff39	llama : one-off chat template fix for Mistral-Small-2503 (#13398 ) * llama : one-off chat template fix for Mistral-Small-2503 * update readme * add mistral-v7-tekken	2025-05-09 11:17:51 +02:00
Xuan-Son Nguyen	d9c4accaff	server : (webui) rename has_multimodal --> modalities (#13393 ) * server : (webui) rename has_multimodal --> modalities * allow converting SVG to PNG * less complicated code	2025-05-09 09:06:37 +02:00
Concedo	2f5f4ee65a	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # common/CMakeLists.txt	2025-05-09 14:18:20 +08:00
Matt Clayton	f05a6d71a0	mtmd : Expose helper_decode_image_chunk (#13366 ) * mtmd: Expose helper_decode_image, output_embd_copy, image_tokens_copy/free * Slim down * Cleanups	2025-05-08 20:25:39 +02:00
Xuan-Son Nguyen	ee01d71e58	server : (webui) fix a very small misalignment (#13387 ) * server : (webui) fix a very small misalignment * restore font-bold	2025-05-08 18:51:45 +02:00
Concedo	2439014a03	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # examples/embedding/embedding.cpp # tools/imatrix/imatrix.cpp # tools/perplexity/perplexity.cpp	2025-05-08 23:41:02 +08:00
Xuan-Son Nguyen	8c83449cb7	server : (webui) revamp the input area, plus many small UI improvements (#13365 ) * rework the input area * process selected file * change all icons to heroicons * fix thought process collapse * move conversation more menu to sidebar * sun icon --> moon icon * rm default system message * stricter upload file check, only allow image if server has mtmd * build it * add renaming * better autoscroll * build * add conversation group * fix scroll * extra context first, then user input in the end * fix <hr> tag * clean up a bit * build * add mb-3 for <pre> * throttle adjustTextareaHeight to make it less laggy * (nits) missing padding in sidebar * rm stray console log	2025-05-08 15:37:29 +02:00
welix	0ccc121354	mtmd : fix the calculation of n_tokens for smolvlm (#13381 ) Co-authored-by: Taichi Nishimura <Taichi.A.Nishimura@sony.com>	2025-05-08 15:03:53 +02:00
Georgi Gerganov	6562e5a4d6	context : allow cache-less context for embeddings (#13108 ) * context : allow cache-less context for embeddings ggml-ci * context : enable reranking with encode() ggml-ci * context : encode() clears embd_seq ggml-ci * examples : use llama_encode() when appropriate ggml-ci * models : nomic bert moe does not require KV cache * llama : update comments for llama_decode/llama_encode ggml-ci * context : update warning log [no ci]	2025-05-08 14:28:33 +03:00
Georgi Gerganov	51fb96b1ff	context : remove logits_all flag (#13284 ) * context : remove logits_all flag ggml-ci * llama : remove logits_all flag + reorder llama_context_params ggml-ci	2025-05-08 14:26:50 +03:00
Concedo	38b3bffcef	Merge branch 'upstream' into concedo_experimental # Conflicts: # CMakePresets.json # ggml/src/ggml-cuda/CMakeLists.txt # tests/test-sampling.cpp # tools/mtmd/clip.cpp	2025-05-07 19:47:44 +08:00
Xuan-Son Nguyen	32916a4907	clip : refactor graph builder (#13321 ) * mtmd : refactor graph builder * fix qwen2vl * clean up siglip cgraph * pixtral migrated * move minicpmv to a dedicated build function * move max_feature_layer to build_llava * use build_attn for minicpm resampler * fix windows build * add comment for batch_size * also support tinygemma3 test model * qwen2vl does not use RMS norm * fix qwen2vl norm (2)	2025-05-06 22:40:24 +02:00
Concedo	ffe23f0e93	Merge branch 'upstream' into concedo_experimental # Conflicts: # ggml/src/ggml-sycl/ggml-sycl.cpp # pyproject.toml	2025-05-06 23:39:45 +08:00
Concedo	0fa435b2a6	Merge commit '9b61acf060' into concedo_experimental # Conflicts: # Makefile # docs/multimodal/MobileVLM.md # docs/multimodal/glmedge.md # docs/multimodal/llava.md # docs/multimodal/minicpmo2.6.md # docs/multimodal/minicpmv2.5.md # docs/multimodal/minicpmv2.6.md # requirements/requirements-all.txt # tools/mtmd/CMakeLists.txt # tools/mtmd/README.md # tools/mtmd/android/adb_run.sh # tools/mtmd/android/build_64.sh # tools/mtmd/clip-quantize-cli.cpp	2025-05-06 23:34:21 +08:00
Concedo	1377a93a73	Merge commit '5215b91e93' into concedo_experimental # Conflicts: # .github/workflows/build.yml # cmake/x64-windows-llvm.cmake # ggml/src/ggml-rpc/ggml-rpc.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # tests/CMakeLists.txt # tools/imatrix/imatrix.cpp # tools/llava/clip.cpp # tools/rpc/rpc-server.cpp	2025-05-06 23:15:04 +08:00
oobabooga	233461f812	sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (#13264 ) * sampling: add Top-nσ sampler to `llama-server` and sampler ordering * revert: sampler ordering * revert: VS' crappy auto-formatting * revert: VS' crappy auto-formatting pt.2 * revert: my crappy eye sight... * sampling: add XTC to Top-nσ sampler chain * sampling: add Dyna. Temp. to Top-nσ sampler chain * sampling: actually remove Top-nσ from sampler(oops) * Integrate top_n_sigma into main sampler chain * Define COMMON_SAMPLER_TYPE_TOP_N_SIGMA * Formatting * Lint * Exit early in the sampler if nsigma < 0 --------- Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>	2025-05-05 22:12:19 +02:00
igardev	b34c859146	server : Webui - change setText command from parent window to also send the message. (#13309 ) * setText command from parent window for llama-vscode now sends the message automatically. * Upgrade packages versions to fix vulnerabilities with "npm audit fix" command. * Fix code formatting. * Add index.html.gz changes. * Revert "Upgrade packages versions to fix vulnerabilities with "npm audit fix" command." This reverts commit 67687b7fda8a293724ba92ea30bb151677406bc8. * easier approach * add setTimeout --------- Co-authored-by: igardev <ivailo.gardev@akros.ch> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-05-05 16:03:31 +02:00
Xuan-Son Nguyen	9b61acf060	mtmd : rename llava directory to mtmd (#13311 ) * mv llava to mtmd * change ref everywhere	2025-05-05 16:02:55 +02:00
Xuan-Son Nguyen	5215b91e93	clip : fix confused naming ffn_up and ffn_down (#13290 ) * clip : fix confused naming ffn_up and ffn_down * rm ffn_i/o/g naming * rename n_embd, n_ff * small fix * no check n_ff	2025-05-05 12:54:44 +02:00
Xuan-Son Nguyen	27aa259532	mtmd : add C public API (#13184 ) * init * wip * working version * add mtmd::bitmaps * add test target * rm redundant define * test: mtmd_input_chunks_free * rm outdated comment * fix merging issue * explicitly create mtmd::input_chunks * mtmd_input_chunk_copy * add clone() * add const to various places * add warning about breaking changes * helper: use mtmd_image_tokens_get_n_pos	2025-05-04 23:43:42 +02:00
Diego Devesa	9fdfcdaedd	rpc : use backend registry, support dl backends (#13304 )	2025-05-04 21:25:43 +02:00
Diego Devesa	86bd60d3fe	llava/mtmd : fixes to fully support dl backends (#13303 )	2025-05-04 17:05:20 +02:00
Johannes Gäßler	3e959f0976	imatrix: fix oob writes if src1 is not contiguous (#13286 )	2025-05-04 00:50:37 +02:00
Xuan-Son Nguyen	36667c8edc	clip : revert the change of BOI/EOI token for GLM-edge (⚠️ breaking change) (#13259 )	2025-05-03 20:07:54 +02:00
Concedo	5a2808ffaf	Merge branch 'upstream' into concedo_experimental # Conflicts: # .flake8 # .github/labeler.yml # .github/workflows/bench.yml.disabled # .github/workflows/build-linux-cross.yml # .github/workflows/build.yml # .github/workflows/server.yml # .gitignore # CMakeLists.txt # CODEOWNERS # Makefile # README.md # SECURITY.md # build-xcframework.sh # ci/run.sh # docs/development/HOWTO-add-model.md # docs/multimodal/MobileVLM.md # docs/multimodal/glmedge.md # docs/multimodal/llava.md # docs/multimodal/minicpmo2.6.md # docs/multimodal/minicpmv2.5.md # docs/multimodal/minicpmv2.6.md # examples/CMakeLists.txt # examples/pydantic_models_to_grammar_examples.py # grammars/README.md # pyrightconfig.json # requirements/requirements-all.txt # scripts/fetch_server_test_models.py # scripts/tool_bench.py # scripts/xxd.cmake # tests/CMakeLists.txt # tests/run-json-schema-to-grammar.mjs # tools/batched-bench/CMakeLists.txt # tools/batched-bench/README.md # tools/batched-bench/batched-bench.cpp # tools/cvector-generator/CMakeLists.txt # tools/cvector-generator/README.md # tools/cvector-generator/completions.txt # tools/cvector-generator/cvector-generator.cpp # tools/cvector-generator/mean.hpp # tools/cvector-generator/negative.txt # tools/cvector-generator/pca.hpp # tools/cvector-generator/positive.txt # tools/export-lora/CMakeLists.txt # tools/export-lora/README.md # tools/export-lora/export-lora.cpp # tools/gguf-split/CMakeLists.txt # tools/gguf-split/README.md # tools/imatrix/CMakeLists.txt # tools/imatrix/README.md # tools/imatrix/imatrix.cpp # tools/llama-bench/CMakeLists.txt # tools/llama-bench/README.md # tools/llama-bench/llama-bench.cpp # tools/llava/CMakeLists.txt # tools/llava/README.md # tools/llava/android/adb_run.sh # tools/llava/android/build_64.sh # tools/llava/clip-quantize-cli.cpp # tools/main/CMakeLists.txt # tools/main/README.md # tools/perplexity/CMakeLists.txt # tools/perplexity/README.md # tools/perplexity/perplexity.cpp # tools/quantize/CMakeLists.txt # tools/rpc/CMakeLists.txt # tools/rpc/README.md # tools/rpc/rpc-server.cpp # tools/run/CMakeLists.txt # tools/run/README.md # tools/run/linenoise.cpp/linenoise.cpp # tools/run/linenoise.cpp/linenoise.h # tools/run/run.cpp # tools/server/CMakeLists.txt # tools/server/README.md # tools/server/bench/README.md # tools/server/public_simplechat/readme.md # tools/server/tests/README.md # tools/server/themes/README.md # tools/server/themes/buttons-top/README.md # tools/server/themes/wild/README.md # tools/tokenize/CMakeLists.txt # tools/tokenize/tokenize.cpp	2025-05-03 12:15:36 +08:00
Diego Devesa	1d36b3670b	llama : move end-user examples to tools directory (#13249 ) * llama : move end-user examples to tools directory --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-05-02 20:27:13 +02:00

997 commits