Commit graph

1785 commits

Concedo
3fefb3bdf2 Merge commit 'f0adb80bf7' into concedo_experimental
# Conflicts:
#	docs/backend/CANN.md
#	docs/backend/SYCL.md
#	docs/docker.md
#	examples/sycl/run-llama2.sh
#	examples/sycl/win-run-llama2.bat
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	tools/llama-bench/README.md
2025-05-21 19:10:57 +08:00
Alberto Cabrera Pérez
725f23f1f3
sycl : backend documentation review (#13544)
* sycl: reviewing and updating docs

* Updates Runtime error codes

* Improves OOM troubleshooting entry

* Added a llama 3 sample

* Updated supported models

* Updated releases table
2025-05-19 14:38:20 +01:00
Nick
9c55e5c5c2
fix: check model pointer validity before use (#13631) 2025-05-19 13:25:41 +03:00
Georgi Gerganov
518329b2d4
parallel : add option for non-shared and larger prompts (#13598)
* parallel : add option for non-shared and larger prompts

* parallel : update readme [no ci]

* cont : add note about base models [no ci]

* parallel : better var name

ggml-ci
2025-05-17 12:58:55 +03:00
Concedo
21e31e255b Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	README.md
#	build-xcframework.sh
#	common/CMakeLists.txt
#	examples/CMakeLists.txt
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-cuda/CMakeLists.txt
#	ggml/src/ggml-metal/ggml-metal.m
#	ggml/src/ggml-metal/ggml-metal.metal
#	ggml/src/ggml-sycl/CMakeLists.txt
#	ggml/src/ggml-sycl/backend.hpp
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/mmvq.cpp
#	ggml/src/ggml-sycl/vecdotq.hpp
#	scripts/compare-llama-bench.py
#	src/CMakeLists.txt
#	src/llama-model.cpp
#	src/llama.cpp
#	tests/test-backend-ops.cpp
#	tests/test-opt.cpp
#	tools/llama-bench/README.md
#	tools/llama-bench/llama-bench.cpp
#	tools/mtmd/CMakeLists.txt
#	tools/mtmd/README.md
#	tools/mtmd/clip.cpp
#	tools/rpc/rpc-server.cpp
#	tools/server/CMakeLists.txt
#	tools/server/README.md
2025-05-13 00:28:35 +08:00
Johannes Gäßler
10d2af0eaa
llama/ggml: add LLM training support (#10544)
* llama/ggml: add LLM training support

more compact progress bar

llama_save_model_to_file

llama_opt_param_filter

ggml_graph_dup force_grads

refactor ggml_opt, fix test-opt

* remove logits_all

* refactor CUDA implementation for ACC

* reset graph at beginning of opt period
2025-05-12 14:44:49 +02:00
Georgi Gerganov
6562e5a4d6
context : allow cache-less context for embeddings (#13108)
* context : allow cache-less context for embeddings

ggml-ci

* context : enable reranking with encode()

ggml-ci

* context : encode() clears embd_seq

ggml-ci

* examples : use llama_encode() when appropriate

ggml-ci

* models : nomic bert moe does not require KV cache

* llama : update comments for llama_decode/llama_encode

ggml-ci

* context : update warning log [no ci]
2025-05-08 14:28:33 +03:00
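
To illustrate what the cache-less embedding context above enables, here is a minimal sketch using the llama-cpp-python bindings (a separate wrapper project, not part of this commit; the model path is a placeholder assumption):

```python
# Minimal embedding sketch via llama-cpp-python (assumption: a local GGUF
# embedding model exists at this path). With the commit above, the core
# library can serve such encode-only calls without allocating a KV cache.
from llama_cpp import Llama

llm = Llama(model_path="models/nomic-embed-text.gguf", embedding=True)
out = llm.create_embedding("The quick brown fox")
vec = out["data"][0]["embedding"]
print(len(vec))  # embedding dimensionality
```
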
Georgi Gerganov
4773d7a02f
examples : remove infill (#13283)
ggml-ci
2025-05-07 10:28:02 +03:00
Concedo
5a2808ffaf Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.flake8
#	.github/labeler.yml
#	.github/workflows/bench.yml.disabled
#	.github/workflows/build-linux-cross.yml
#	.github/workflows/build.yml
#	.github/workflows/server.yml
#	.gitignore
#	CMakeLists.txt
#	CODEOWNERS
#	Makefile
#	README.md
#	SECURITY.md
#	build-xcframework.sh
#	ci/run.sh
#	docs/development/HOWTO-add-model.md
#	docs/multimodal/MobileVLM.md
#	docs/multimodal/glmedge.md
#	docs/multimodal/llava.md
#	docs/multimodal/minicpmo2.6.md
#	docs/multimodal/minicpmv2.5.md
#	docs/multimodal/minicpmv2.6.md
#	examples/CMakeLists.txt
#	examples/pydantic_models_to_grammar_examples.py
#	grammars/README.md
#	pyrightconfig.json
#	requirements/requirements-all.txt
#	scripts/fetch_server_test_models.py
#	scripts/tool_bench.py
#	scripts/xxd.cmake
#	tests/CMakeLists.txt
#	tests/run-json-schema-to-grammar.mjs
#	tools/batched-bench/CMakeLists.txt
#	tools/batched-bench/README.md
#	tools/batched-bench/batched-bench.cpp
#	tools/cvector-generator/CMakeLists.txt
#	tools/cvector-generator/README.md
#	tools/cvector-generator/completions.txt
#	tools/cvector-generator/cvector-generator.cpp
#	tools/cvector-generator/mean.hpp
#	tools/cvector-generator/negative.txt
#	tools/cvector-generator/pca.hpp
#	tools/cvector-generator/positive.txt
#	tools/export-lora/CMakeLists.txt
#	tools/export-lora/README.md
#	tools/export-lora/export-lora.cpp
#	tools/gguf-split/CMakeLists.txt
#	tools/gguf-split/README.md
#	tools/imatrix/CMakeLists.txt
#	tools/imatrix/README.md
#	tools/imatrix/imatrix.cpp
#	tools/llama-bench/CMakeLists.txt
#	tools/llama-bench/README.md
#	tools/llama-bench/llama-bench.cpp
#	tools/llava/CMakeLists.txt
#	tools/llava/README.md
#	tools/llava/android/adb_run.sh
#	tools/llava/android/build_64.sh
#	tools/llava/clip-quantize-cli.cpp
#	tools/main/CMakeLists.txt
#	tools/main/README.md
#	tools/perplexity/CMakeLists.txt
#	tools/perplexity/README.md
#	tools/perplexity/perplexity.cpp
#	tools/quantize/CMakeLists.txt
#	tools/rpc/CMakeLists.txt
#	tools/rpc/README.md
#	tools/rpc/rpc-server.cpp
#	tools/run/CMakeLists.txt
#	tools/run/README.md
#	tools/run/linenoise.cpp/linenoise.cpp
#	tools/run/linenoise.cpp/linenoise.h
#	tools/run/run.cpp
#	tools/server/CMakeLists.txt
#	tools/server/README.md
#	tools/server/bench/README.md
#	tools/server/public_simplechat/readme.md
#	tools/server/tests/README.md
#	tools/server/themes/README.md
#	tools/server/themes/buttons-top/README.md
#	tools/server/themes/wild/README.md
#	tools/tokenize/CMakeLists.txt
#	tools/tokenize/tokenize.cpp
2025-05-03 12:15:36 +08:00
Concedo
0951ad9f58 temp merge, not working 2025-05-03 11:42:01 +08:00
Diego Devesa
1d36b3670b
llama : move end-user examples to tools directory (#13249)
* llama : move end-user examples to tools directory

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00
Xuan-Son Nguyen
074e42ab31
convert : converting mmproj for Qwen2/2.5VL from convert_hf_to_gguf (#13209)
* wip

* qwen2.5vl ok

* vision: fix models missing "text_config"

* add test

* fix test repo name

* fix 32B model

* Revert "fix 32B model"

This reverts commit 651752f1ae25fe8a01c1e57c18cf2eca80b2774e.

* clarify about 32B

* rm qwen surgery script

* update llava/readme

* move V_ENC_EMBD_PATCH handling to Qwen2VLVisionModel
2025-05-02 17:17:15 +02:00
Concedo
d8f1f73dd7 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build-linux-cross.yml
#	.github/workflows/build.yml
#	cmake/build-info.cmake
#	common/CMakeLists.txt
#	examples/llava/README.md
#	examples/server/README.md
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cuda/CMakeLists.txt
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-vulkan/CMakeLists.txt
#	ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt
#	scripts/sync-ggml.last
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
2025-05-02 16:54:15 +08:00
Concedo
ca53d1bedc Merge commit '13c9a3319b' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-cpu/CMakeLists.txt
#	scripts/sync-ggml.last
#	tests/test-backend-ops.cpp
2025-05-02 16:42:16 +08:00
Shakil Ahmed
e84773ab60
mtmd-cli : fix out_of_range when input image path is empty (#13244)
* fix out_of_range error to keep the chat loop running

* Update examples/llava/mtmd-cli.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* mtmd-cli : load image right away

* add a new line for readability

* rm printf

* Update examples/llava/mtmd-cli.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update examples/llava/mtmd-cli.cpp

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-05-02 10:20:27 +02:00
Georgi Gerganov
fab647e884
server : add cache reuse card link to help (#13230)
* server : add cache reuse card link to help

* args : use short url
2025-05-02 09:48:31 +03:00
Loïc Carrère
b6e4ff69b8
clip : (minicpmv) Re-enable upscaling of images smaller than the CLIP image size (#13237) 2025-05-01 21:32:21 +02:00
Xuan-Son Nguyen
8936784f7a
mtmd : add **vision** support for Mistral Small 3.1 (#13231)
* convert ok

* load ok, missing patch merger

* ah sheet it works

* update llava/readme

* add test

* fix test
2025-05-01 17:05:42 +02:00
Tatsuya Tanaka
ceda28ef8e
llava : remove duplicate include (#13207) 2025-04-30 15:25:20 +02:00
Concedo
dbb6bbf8ea fixed clip quantize 2025-04-30 20:45:40 +08:00
Concedo
8273739412 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/cpu.Dockerfile
#	.devops/cuda.Dockerfile
#	.devops/intel.Dockerfile
#	.devops/llama-cli-cann.Dockerfile
#	.devops/musa.Dockerfile
#	.devops/rocm.Dockerfile
#	.devops/vulkan.Dockerfile
#	examples/llama-bench/llama-bench.cpp
#	examples/rpc/rpc-server.cpp
#	scripts/compare-llama-bench.py
#	tests/test-quantize-stats.cpp
2025-04-30 17:22:18 +08:00
xiaofei
a0f7016d17
rpc : fix cache directory initialization (#13188)
Signed-off-by: xiaofei <hbuxiaofei@gmail.com>
2025-04-30 09:29:22 +03:00
matteo
e2e1ddb93a
server : Prefilling assistant message in OpenAI-compatible API (#13174)
* Prefilling assistant message in OpenAI-compatible API

* fixed indentation

* fixed code convention

* simplify method usage

* allow no more than one assistant message at the end of messages

* merge checks into prefill code

* Update examples/server/utils.hpp

---------

Co-authored-by: matteo <matteo@naspc.lan>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-04-29 20:33:10 +02:00
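
For readers unfamiliar with the prefill feature in the commit above, a hedged sketch of the request shape (server URL and model name are placeholders): a trailing assistant message is continued rather than answered.

```python
# Sketch of assistant-message prefill against llama-server's
# OpenAI-compatible endpoint (#13174). URL and model name are placeholders.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local-model",
        "messages": [
            {"role": "user", "content": "Name three prime numbers."},
            # The final assistant message acts as a prefill: the server
            # continues this partial reply instead of generating from scratch.
            {"role": "assistant", "content": "Sure: 2,"},
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```
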
Alberto Cabrera Pérez
5a63980117
llama-bench: fixed size of fields to correctly map to values (#13183) 2025-04-29 17:24:36 +02:00
Concedo
be66a77ca5 add f16 quantclip 2025-04-29 22:25:52 +08:00
Concedo
b2ecfa0f55 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	examples/llama-bench/README.md
#	examples/llama-bench/llama-bench.cpp
#	examples/llava/CMakeLists.txt
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-sycl/element_wise.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	tests/test-chat-template.cpp
2025-04-29 21:05:16 +08:00
Xuan-Son Nguyen
00e3e5a194
mtmd : add qwen2vl and qwen2.5vl (#13141)
* llava : add clip_n_output_tokens, deprecate clip_n_patches

* mtmd : add qwen2vl and qwen2.5vl

* decode_embd_batch::set_position_...

* working version

* deprecate llama-qwen2vl-cli

* correct order W, H of clip_embd_nbytes_by_img

* edit existing line in hot topics
2025-04-29 11:47:04 +02:00
Xuan-Son Nguyen
eaea325324
clip : fix model size display (#13153) 2025-04-28 21:23:19 +02:00
Concedo
4d8a7a6594 fix occasional clip segfault, fix glm4 (+1 squashed commit)
Squashed commits:

[bd71cd688] GLM4 fix wip
2025-04-29 01:42:50 +08:00
Vishal Agarwal
1831f538f7
llama-bench: add -d depth arg (#13096)
* add depth param

* update llama-bench README and add depth param

* llama-bench: default params for depth arg for faster execution

* Update examples/llama-bench/README.md

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* fix buffer print ub

* use user provided args

* remove extra whitespaces

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-04-28 16:50:39 +02:00
Xuan-Son Nguyen
4e87962e34
mtmd : fix glm-edge redundant token count (#13139)
* mtmd : fix glm-edge redundant token count

* fix chat template

* temporarily disable GLMEdge chat template test
2025-04-28 16:12:56 +02:00
Xuan-Son Nguyen
d2b2031e5f
llama : (mrope) allow using normal 1D position for text token (#13138)
* llama : (mrope) use normal position for text token

* rm n_pos_per_embd from llm_graph_input_attn_temp
2025-04-28 14:20:56 +02:00
Xuan-Son Nguyen
5fa9e63be8
clip : refactor set input for cgraph + fix qwen2.5vl input (#13136)
* clip : refactor set input for cgraph

* more strict assert

* minicpmv : use clip_n_mmproj_embd instead of copying the same code everywhere

* split qwen2 and qwen2.5 code blocks

* minor style fix
2025-04-28 12:18:59 +02:00
4onen
c0a97b762e
llama-bench : Add --override-tensors arg (#12922)
* Add --override-tensors option to llama-bench

* Correct llama-bench --override-tensors to --override-tensor

* llama-bench: Update --override-tensors parsing to match --tensor-split and appear in the test matrix.

* Make new llama-bench util functions static to fix Ubuntu CI

* llama-bench: Correct -ot corner cases (No -ot calls, leading and trailing empty -ot spans, etc.)
2025-04-27 23:48:26 +02:00
LostRuins Concedo
59e991c23c
Fixes Qwen2.5VL segfault during inference with https://github.com/ggml-org/llama.cpp/pull/12402, as the has_qwen2vl_merger migration was incomplete (#13133) 2025-04-27 12:43:37 +02:00
Concedo
37060f54da backwards compat: handle older HimariO quants 2025-04-27 17:38:22 +08:00
Concedo
f8b7ddeac0 emergency fix for q25vl 2025-04-27 16:46:33 +08:00
HimariO
ca2bb89eac
clip : Add Qwen2.5VL support (#12402)
* implement vision model architecture, gguf converter

* handle window attention inputs

* add debug utils

* fix a few incorrect tensor memory layouts

* move position id remap out of ggml to avoid int32 CUDA operations

* cleaning up

* ignore transformers Qwen2_5_xxx type check

* remove rarely used `qwen2vl-cli` debug functions

* remove commented-out code blocks

* fix attn weight scaling after rebase

* add `PROJECTOR_TYPE_QWEN2_5_VL`

* remove `KEY_USE_GLU_MLP`, `KEY_USE_RMS_NORM`

* replace `KEY_FULLATTN_BLK_IDX` with `KEY_WIN_ATTN_PATTERN`

* remove `attn_window_size` from gguf

* fix model conversion

* clean up

* fix merging problem

* add test

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-27 10:10:34 +02:00
Concedo
1b0481f4b1 wip qwen25vl merge 2025-04-27 13:07:07 +08:00
Concedo
36c8db1248 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/llava/clip-impl.h
#	examples/llava/clip.cpp
#	tests/test-arg-parser.cpp
#	tests/test-json-schema-to-grammar.cpp
2025-04-27 12:51:02 +08:00
Xuan Son Nguyen
53a15d014f add test 2025-04-26 23:00:41 +02:00
Xuan Son Nguyen
89be919988 fix merging problem 2025-04-26 22:54:41 +02:00
Xuan Son Nguyen
82f8e72ecd Merge branch 'master' into qwen25-vl 2025-04-26 22:45:06 +02:00
Xuan-Son Nguyen
4753791e70
clip : improve projector naming (#13118)
* clip : improve projector naming

* no more kv has_llava_projector

* rm unused kv

* rm more unused
2025-04-26 22:39:47 +02:00
Xuan Son Nguyen
0c74ea54f5 clean up 2025-04-26 22:37:05 +02:00
Xuan Son Nguyen
5085dbb293 Merge branch 'master' into qwen25-vl 2025-04-26 22:24:04 +02:00
HimariO
7e1bb0437a remove attn_window_size from gguf 2025-04-26 20:19:51 +08:00
frob
d5fe4e81bd
grammar : handle maxItems == 0 in JSON schema (#13117)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-04-26 10:10:20 +02:00
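
As a reminder of the edge case handled in the commit above: `maxItems: 0` constrains an array to be empty, so the emitted grammar must accept only `[]`. A small validation sketch (illustrative schema, not taken from the PR):

```python
# Illustrates the maxItems == 0 edge case from #13117: the schema admits
# only the empty array. Uses the `jsonschema` package for validation.
import jsonschema

schema = {"type": "array", "maxItems": 0}

jsonschema.validate([], schema)       # OK: empty array satisfies maxItems == 0
try:
    jsonschema.validate([1], schema)  # rejected: no items are permitted
except jsonschema.ValidationError as err:
    print("rejected:", err.message)
```
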
Concedo
3f545eadbe Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	tests/test-backend-ops.cpp
2025-04-26 09:12:40 +08:00
HimariO
77b144a8e7 replace KEY_FULLATTN_BLK_IDX with KEY_WIN_ATTN_PATTERN 2025-04-26 01:00:00 +08:00