koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-16 19:59:16 +00:00

Author	SHA1	Message	Date
Concedo	88660dd59d	merged qwen2.5vl again	2025-04-08 21:32:25 +08:00
Concedo	822cf2430e	Merge commit '`f1e3eb4249`' into concedo_experimental # Conflicts: # .github/workflows/build.yml # README.md # docs/backend/SYCL.md # examples/llava/clip.cpp # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-vulkan/cmake/host-toolchain.cmake.in	2025-04-08 20:48:53 +08:00
Concedo	c58e9a2be3	revert q2.5vl before merge (+1 squashed commits) Squashed commits: [3197ea95] Revert "add tentative support for qwen2.5vl vision from HimariO fork" This reverts commit `911669087a`.	2025-04-08 20:38:41 +08:00
HimariO	b28ad7ecca	fix attn weight scaling after rebase	2025-04-07 22:07:56 +08:00
HimariO	223edef897	remove commented-out code blocks	2025-04-07 21:52:37 +08:00
HimariO	dde96b4774	remove not so often use `qwen2vl-cli` debug functions	2025-04-07 21:52:37 +08:00
HimariO	8fcf682b28	ignore transformers Qwen2_5_xxx type check	2025-04-07 21:52:37 +08:00
HimariO	fdae70a832	cleaning up	2025-04-07 21:52:37 +08:00
HimariO	c891300c1e	move position id remap out of ggml to avoid int32 cuda operations	2025-04-07 21:52:37 +08:00
HimariO	e18f6a3238	fix few incorrect tensor memory layout	2025-04-07 21:52:37 +08:00
HimariO	ecd673f0c5	add debug utils	2025-04-07 21:51:18 +08:00
HimariO	9c827814e6	handle window attention inputs	2025-04-07 21:51:18 +08:00
HimariO	9c7cc6de9c	implment vision model architecture, gguf convertor	2025-04-07 21:46:06 +08:00
Sergey Fedorov	f1e3eb4249	common : fix includes in arg.cpp and gemma3-cli.cpp (#12766 ) * arg.cpp: add a missing include * gemma3-cli.cpp: fix cinttypes include	2025-04-05 17:46:00 +02:00
Xuan-Son Nguyen	0364178ca2	clip : refactor clip_init, add tests (#12757 ) * refactor clip_init * fix loading file * fix style * test ok * better test with report * add missing headers * clarify * add KEY_MM_PATCH_MERGE_TYPE * remove bool has_* pattern * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/llava/clip.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * use ggml_soft_max_ext * refactor logging system * add minicpm-v-o 2.6 for testing * use nullptr everywhere * fix Yi-VL model --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-04-05 17:17:40 +02:00
Concedo	103d60ed2c	Merge branch 'upstream' into concedo_experimental # Conflicts: # common/common.cpp # examples/batched-bench/batched-bench.cpp # examples/batched/batched.cpp # examples/export-lora/export-lora.cpp # examples/gritlm/gritlm.cpp # examples/parallel/parallel.cpp # examples/passkey/passkey.cpp # examples/speculative-simple/speculative-simple.cpp # examples/speculative/speculative.cpp # ggml/src/ggml-cann/CMakeLists.txt # ggml/src/ggml-cann/acl_tensor.cpp # ggml/src/ggml-cann/acl_tensor.h # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/aclnn_ops.h # ggml/src/ggml-vulkan/CMakeLists.txt # tests/test-arg-parser.cpp # tests/test-backend-ops.cpp	2025-04-03 18:57:49 +08:00
Xuan-Son Nguyen	267c1399f1	common : refactor downloading system, handle mmproj with -hf option (#12694 ) * (wip) refactor downloading system [no ci] * fix all examples * fix mmproj with -hf * gemma3: update readme * only handle mmproj in llava example * fix multi-shard download * windows: fix problem with std::min and std::max * fix 2	2025-04-01 23:44:05 +02:00
Concedo	9e182b3e78	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # README.md # docs/backend/SYCL.md # ggml/src/ggml-sycl/CMakeLists.txt # ggml/src/ggml-vulkan/CMakeLists.txt # ggml/src/ggml-vulkan/ggml-vulkan.cpp # scripts/sync-ggml.last # tests/test-chat-template.cpp	2025-04-01 20:16:07 +08:00
Sigbjørn Skjæret	1a85949067	llava : proper description fix (#12668 )	2025-03-31 11:28:30 +02:00
Sigbjørn Skjæret	f52d59d771	llava : fix clip loading GGUFs with missing description (#12660 )	2025-03-31 11:07:07 +02:00
Concedo	1ebadc515e	add streaming support for oai tools (+2 squashed commit) Squashed commit: [4d080b37] qwen2.5vl surgery script [4bebe7e5] add streaming support for oai tools	2025-03-31 16:49:15 +08:00
Concedo	911669087a	add tentative support for qwen2.5vl vision from HimariO fork	2025-03-29 22:52:43 +08:00
Concedo	396875e1c4	update api docs and lite	2025-03-29 15:39:25 +08:00
Ivy233	02082f1519	clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566 ) * [Fix] Compiling clip-quantize-cli and running it in a CUDA environment will cause ggml_fp16_to_fp32 to report an error when trying to access video memory. You need to switch to the CPU backend to run quantize. After the fix, it will automatically run in the CPU backend and will no longer be bound to CUDA. * [Fix]Roll back the signature and implementation of clip_model_load, and change the call in clip_model_quantize to clip_init.	2025-03-26 15:06:04 +01:00
Concedo	bfc30066c9	fixed a clip processing bug	2025-03-15 17:49:49 +08:00
Concedo	0db4ae6237	traded my ink for a pen	2025-03-14 11:58:15 +08:00
Georgi Gerganov	e0dbec0bc6	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 ) * llama : refactor llama_context, llama_kv_cache, llm_build_context ggml-ci * graph : don't mutate the KV cache during defrag ggml-ci * context : reduce virtuals + remove test function ggml-ci * context : move interface implementation to source file + factory ggml-ci * graph : move KV cache build functions to llama_context impl ggml-ci * graph : remove model reference from build_pooling ggml-ci * graph : remove llama_model reference ggml-ci * kv_cache : provide rope factors ggml-ci * graph : rework inputs to use only unique_ptr, remove attn input abstraction ggml-ci * context : remove llama_context_i abstraction ggml-ci * context : clean-up ggml-ci * graph : clean-up ggml-ci * llama : remove redundant keywords (struct, enum) ggml-ci * model : adapt gemma3 ggml-ci * graph : restore same attention ops as on master ggml-ci * llama : remove TODO + fix indent ggml-ci	2025-03-13 12:35:44 +02:00
Concedo	1ef41c2124	streamline output console log (+1 squashed commits) Squashed commits: [ca474bdd] streamline output console log	2025-03-13 15:33:49 +08:00
Concedo	16137f4281	gemma3 now works correctly	2025-03-13 14:34:18 +08:00
Concedo	77debb1b1b	gemma3 vision works, but is using more tokens than expected - may need resizing	2025-03-13 00:31:16 +08:00
Xuan-Son Nguyen	7841fc723e	llama : Add Gemma 3 support (+ experimental vision capability) (#12343 ) * llama : Add Gemma 3 text-only support * fix python coding style * fix compile on ubuntu * python: fix style * fix ubuntu compile * fix build on ubuntu (again) * fix ubuntu build, finally * clip : Experimental support for Gemma 3 vision (#12344) * clip : Experimental support for Gemma 3 vision * fix build * PRId64	2025-03-12 09:30:24 +01:00
Xuan-Son Nguyen	96e1280839	clip : bring back GPU support (#12322 ) * clip : bring back GPU support * use n_gpu_layers param * fix double free * ggml_backend_init_by_type * clean up	2025-03-11 09:20:16 +01:00
tc-mb	8352cdc87b	llava : fix bug in minicpm-v code (#11513 ) * fix bug in minicpm-v code * update readme of minicpm-v	2025-03-10 10:33:24 +02:00
Concedo	ec43d2b147	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # README.md # common/common.cpp # examples/embedding/embedding.cpp # examples/json_schema_to_grammar.py # examples/llama.android/llama/src/main/cpp/llama-android.cpp # examples/llama.swiftui/README.md # examples/llama.swiftui/llama.swiftui.xcodeproj/project.pbxproj # examples/lookahead/lookahead.cpp # examples/parallel/parallel.cpp # examples/passkey/passkey.cpp # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # requirements.txt # requirements/requirements-all.txt # scripts/fetch_server_test_models.py # tests/test-chat.cpp # tests/test-json-schema-to-grammar.cpp	2025-03-06 18:54:58 +08:00
Aaron Teo	e9b2f84f14	llava: add big-endian conversion for image encoder (#12218 ) Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-03-06 09:33:21 +01:00
Concedo	6b7d2349a7	Rewrite history to fix bad vulkan shader commits without increasing repo size added dpe colab (+8 squashed commit) Squashed commit: [b8362da4] updated lite [ed6c037d] move nsigma into the regular sampler stack [ac5f61c6] relative filepath fixed [05fe96ab] export template [ed0a5a3e] nix_example.md: refactor (#1401) * nix_example.md: add override example * nix_example.md: drop graphics example, already basic nixos knowledge * nix_example.md: format * nix_example.md: Vulkan is disabled on macOS Disabled in: `1ccd253acc` * nix_examples.md: nixpkgs.config.cuda{Arches -> Capabilities} Fixes: https://github.com/LostRuins/koboldcpp/issues/1367 [675c62f7] AutoGuess: Phi 4 (mini) (#1402) [`4bf56982`] phrasing [`b8c0df04`] Add Rep Pen to Top N Sigma sampler chain (#1397) - place after nsigma and before xtc (+3 squashed commit) Squashed commit: [`87c52b97`] disable VMM from HIP [`ee8906f3`] edit description [`e85c0e69`] Remove Unnecessary Rep Counting (#1394) * stop counting reps * fix range-based initializer * strike that - reverse it	2025-03-05 00:02:20 +08:00
Alex Brooks	84d5f4bc19	Update granite vision docs for 3.2 model (#12105 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-02-28 11:31:47 +00:00
Ting Lou	a800ae46da	llava : add struct for FFI bindgen (#12079 ) * add struct for FFI bindgen * Apply suggestions from code review --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-02-26 15:26:52 +01:00
Alex Brooks	4d1051a40f	Add Doc for Converting Granite Vision -> GGUF (#12006 ) * Add example docs for granite vision Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-02-25 10:46:05 +01:00
Alex Brooks	7a2c913e66	llava : Add Granite Vision Support (#11794 ) * Add super wip scripts for multimodal granite gguf Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Add example for converting mmgranite to gguf Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * remove hardcoded path Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Add vision feature layer to gguf params Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Clean up llava surgery and remove name substitution hacks Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Add transformers llava next tensor name mapping Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Make siglip / openclip mutuall exclusive Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix projector linear substitution Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix linear 2 substitution index Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Increase max flattened gridpoints to 64 Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix hardcoded concat for multiple feature layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Pull vision feature layers out of gguf keys Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * fix num gridpoints and use all layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Avoid dropping last image encoder layer in llava models Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use 10 for max number of patches Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Standardize vision feature layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Cleanup logs Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Update comment for vision feature layer init Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Update notes for alternative to legacy llm conversion script Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix notes rendering Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Add v prefix to vision feature layer log Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use current defaults for feature layer Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use constant for max gridpoints / feat layers, style fixes Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * clarify non-negative feature layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Remove CLIP_API from func signature Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * USE MAX_IMAGE_FEATURE_LAYERS const in layer calc Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Clarify feature layers are non negative ints and not uint Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix condition for reading feature layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * pop last llava layer when feature layers are unset Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix unset vision layer 0 Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Update examples/llava/clip.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Reenable assertion for out of bounds get_rows Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use std vector for gridpoints and feature layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Caculate max feature layer at load time Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Include base patch for granite vision allocation Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix trailing whitespace Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Add max num patches = 10 back for minicpmv Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use unordered set to store feature layers Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use max feature layer for postnorm Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Apply suggestions from code review --------- Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-02-24 17:09:51 +01:00
Concedo	159c47f0e6	Merge commit '`335eb04a91`' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CONTRIBUTING.md # Makefile # docs/build.md # examples/llama.swiftui/llama.swiftui/UI/ContentView.swift # examples/run/run.cpp # ggml/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-musa/CMakeLists.txt	2025-02-24 11:55:14 +08:00
Ting Lou	36c258ee92	llava: build clip image from pixels (#11999 ) * llava: export function `clip_build_img_from_pixels` to build image from pixels decoded by other libraries instead of stb_image.h for better performance * Apply suggestions from code review --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-02-22 15:28:28 +01:00
Alex Brooks	ee02ad02c5	clip : fix visual encoders with no CLS (#11982 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-02-21 08:11:03 +02:00
Concedo	f144b1f345	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/llama-cpp-cuda.srpm.spec # .devops/llama-cpp.srpm.spec # .devops/nix/package.nix # .devops/rocm.Dockerfile # .github/ISSUE_TEMPLATE/020-enhancement.yml # .github/ISSUE_TEMPLATE/030-research.yml # .github/ISSUE_TEMPLATE/040-refactor.yml # .github/ISSUE_TEMPLATE/config.yml # .github/pull_request_template.md # .github/workflows/bench.yml.disabled # .github/workflows/build.yml # .github/workflows/labeler.yml # CONTRIBUTING.md # Makefile # README.md # SECURITY.md # ci/README.md # common/CMakeLists.txt # docs/android.md # docs/backend/SYCL.md # docs/build.md # docs/cuda-fedora.md # docs/development/HOWTO-add-model.md # docs/docker.md # docs/install.md # docs/llguidance.md # examples/cvector-generator/README.md # examples/imatrix/README.md # examples/imatrix/imatrix.cpp # examples/llama.android/llama/src/main/cpp/CMakeLists.txt # examples/llama.swiftui/README.md # examples/llama.vim # examples/lookahead/README.md # examples/lookup/README.md # examples/main/README.md # examples/passkey/README.md # examples/pydantic_models_to_grammar_examples.py # examples/retrieval/README.md # examples/server/CMakeLists.txt # examples/server/README.md # examples/simple-cmake-pkg/README.md # examples/speculative/README.md # flake.nix # grammars/README.md # pyproject.toml # scripts/check-requirements.sh	2025-02-16 02:08:39 +08:00
Georgi Gerganov	68ff663a04	repo : update links to new url (#11886 ) * repo : update links to new url ggml-ci * cont : more urls ggml-ci	2025-02-15 16:40:57 +02:00
Concedo	db6db9dff9	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/close-issue.yml # .github/workflows/server.yml # AUTHORS # CMakeLists.txt # Makefile # README.md # cmake/llama.pc.in # common/CMakeLists.txt # docs/build.md # examples/batched.swift/Sources/main.swift # examples/llama.swiftui/llama.cpp.swift/LibLlama.swift # examples/llava/CMakeLists.txt # examples/llava/clip.h # examples/run/run.cpp # examples/server/README.md # ggml/CMakeLists.txt # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-hip/CMakeLists.txt # ggml/src/ggml-musa/CMakeLists.txt # scripts/sync-ggml.last # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-chat-template.cpp # tests/test-grammar-integration.cpp # tests/test-json-schema-to-grammar.cpp	2025-02-07 00:52:31 +08:00
SAMI	1ec208083c	llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644 ) * Added quantization for visual projector * Added README * Fixed the clip quantize implementation in the file * Fixed the gcc warning regarding minor linting * Removed trailing whitespace	2025-02-05 10:45:40 +03:00
piDack	0cec062a63	llama : add support for GLM-Edge and GLM-Edge-V series models (#10573 ) * add glm edge chat model * use config partial_rotary_factor as rope ratio * support for glm edge model * vision model support * remove debug info * fix format * llava.cpp trailing whitespace * remove unused AutoTokenizer * Update src/llama.cpp for not contain <\|end\|> or </s> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> * add edge template * fix chat template * fix confict * fix confict * fix ci err * fix format err * fix template err * 9b hf chat support * format * format clip.cpp * fix format * Apply suggestions from code review * Apply suggestions from code review * Update examples/llava/clip.cpp * fix format * minor : style --------- Co-authored-by: liyuhang <yuhang.li@zhipuai.cn> Co-authored-by: piDack <pcdack@hotmail.co> Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: liyuhang <yuhang.li@aminer.cn> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-02 09:48:46 +02:00
Concedo	70f1d8d746	vision can set max res (+1 squashed commits) Squashed commits: [938fc655] vision can set max res	2025-01-30 00:19:49 +08:00
Concedo	bec231422a	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # Makefile # README.md # common/CMakeLists.txt # docs/backend/SYCL.md # docs/build.md # docs/docker.md # examples/export-lora/export-lora.cpp # examples/main/README.md # examples/main/main.cpp # examples/run/README.md # examples/run/run.cpp # examples/server/README.md # examples/simple-chat/simple-chat.cpp # ggml/CMakeLists.txt # ggml/src/ggml-hip/CMakeLists.txt # src/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-chat-template.cpp	2025-01-25 14:16:50 +08:00

239 commits