Concedo
88660dd59d
merged qwen2.5vl again
2025-04-08 21:32:25 +08:00
Concedo
822cf2430e
Merge commit 'f1e3eb4249' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# docs/backend/SYCL.md
# examples/llava/clip.cpp
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-vulkan/cmake/host-toolchain.cmake.in
2025-04-08 20:48:53 +08:00
Concedo
c58e9a2be3
revert q2.5vl before merge (+1 squashed commits)
...
Squashed commits:
[3197ea95] Revert "add tentative support for qwen2.5vl vision from HimariO fork"
This reverts commit 911669087a.
2025-04-08 20:38:41 +08:00
HimariO
b28ad7ecca
fix attn weight scaling after rebase
2025-04-07 22:07:56 +08:00
HimariO
223edef897
remove commented-out code blocks
2025-04-07 21:52:37 +08:00
HimariO
dde96b4774
remove rarely used qwen2vl-cli debug functions
2025-04-07 21:52:37 +08:00
HimariO
8fcf682b28
ignore transformers Qwen2_5_xxx type check
2025-04-07 21:52:37 +08:00
HimariO
fdae70a832
cleaning up
2025-04-07 21:52:37 +08:00
HimariO
c891300c1e
move position id remap out of ggml to avoid int32 cuda operations
2025-04-07 21:52:37 +08:00
HimariO
e18f6a3238
fix a few incorrect tensor memory layouts
2025-04-07 21:52:37 +08:00
HimariO
ecd673f0c5
add debug utils
2025-04-07 21:51:18 +08:00
HimariO
9c827814e6
handle window attention inputs
2025-04-07 21:51:18 +08:00
HimariO
9c7cc6de9c
implement vision model architecture, gguf converter
2025-04-07 21:46:06 +08:00
Sergey Fedorov
f1e3eb4249
common : fix includes in arg.cpp and gemma3-cli.cpp (#12766)
...
* arg.cpp: add a missing include
* gemma3-cli.cpp: fix cinttypes include
2025-04-05 17:46:00 +02:00
Xuan-Son Nguyen
0364178ca2
clip : refactor clip_init, add tests (#12757)
...
* refactor clip_init
* fix loading file
* fix style
* test ok
* better test with report
* add missing headers
* clarify
* add KEY_MM_PATCH_MERGE_TYPE
* remove bool has_* pattern
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/llava/clip.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* use ggml_soft_max_ext
* refactor logging system
* add minicpm-v-o 2.6 for testing
* use nullptr everywhere
* fix Yi-VL model
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-04-05 17:17:40 +02:00
Concedo
103d60ed2c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# common/common.cpp
# examples/batched-bench/batched-bench.cpp
# examples/batched/batched.cpp
# examples/export-lora/export-lora.cpp
# examples/gritlm/gritlm.cpp
# examples/parallel/parallel.cpp
# examples/passkey/passkey.cpp
# examples/speculative-simple/speculative-simple.cpp
# examples/speculative/speculative.cpp
# ggml/src/ggml-cann/CMakeLists.txt
# ggml/src/ggml-cann/acl_tensor.cpp
# ggml/src/ggml-cann/acl_tensor.h
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-vulkan/CMakeLists.txt
# tests/test-arg-parser.cpp
# tests/test-backend-ops.cpp
2025-04-03 18:57:49 +08:00
Xuan-Son Nguyen
267c1399f1
common : refactor downloading system, handle mmproj with -hf option (#12694)
...
* (wip) refactor downloading system [no ci]
* fix all examples
* fix mmproj with -hf
* gemma3: update readme
* only handle mmproj in llava example
* fix multi-shard download
* windows: fix problem with std::min and std::max
* fix 2
2025-04-01 23:44:05 +02:00
Concedo
9e182b3e78
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# docs/backend/SYCL.md
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-vulkan/CMakeLists.txt
# ggml/src/ggml-vulkan/ggml-vulkan.cpp
# scripts/sync-ggml.last
# tests/test-chat-template.cpp
2025-04-01 20:16:07 +08:00
Sigbjørn Skjæret
1a85949067
llava : proper description fix (#12668)
2025-03-31 11:28:30 +02:00
Sigbjørn Skjæret
f52d59d771
llava : fix clip loading GGUFs with missing description (#12660)
2025-03-31 11:07:07 +02:00
Concedo
1ebadc515e
add streaming support for oai tools (+2 squashed commit)
...
Squashed commit:
[4d080b37] qwen2.5vl surgery script
[4bebe7e5] add streaming support for oai tools
2025-03-31 16:49:15 +08:00
Concedo
911669087a
add tentative support for qwen2.5vl vision from HimariO fork
2025-03-29 22:52:43 +08:00
Concedo
396875e1c4
update api docs and lite
2025-03-29 15:39:25 +08:00
Ivy233
02082f1519
clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566)
...
* [Fix] Compiling clip-quantize-cli and running it in a CUDA environment causes ggml_fp16_to_fp32 to report an error when it tries to access video memory. Quantization has to run on the CPU backend.
After the fix, it automatically runs on the CPU backend and is no longer bound to CUDA.
* [Fix] Roll back the signature and implementation of clip_model_load, and change the call in clip_model_quantize to clip_init.
2025-03-26 15:06:04 +01:00
Concedo
bfc30066c9
fixed a clip processing bug
2025-03-15 17:49:49 +08:00
Concedo
0db4ae6237
traded my ink for a pen
2025-03-14 11:58:15 +08:00
Georgi Gerganov
e0dbec0bc6
llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)
...
* llama : refactor llama_context, llama_kv_cache, llm_build_context
ggml-ci
* graph : don't mutate the KV cache during defrag
ggml-ci
* context : reduce virtuals + remove test function
ggml-ci
* context : move interface implementation to source file + factory
ggml-ci
* graph : move KV cache build functions to llama_context impl
ggml-ci
* graph : remove model reference from build_pooling
ggml-ci
* graph : remove llama_model reference
ggml-ci
* kv_cache : provide rope factors
ggml-ci
* graph : rework inputs to use only unique_ptr, remove attn input abstraction
ggml-ci
* context : remove llama_context_i abstraction
ggml-ci
* context : clean-up
ggml-ci
* graph : clean-up
ggml-ci
* llama : remove redundant keywords (struct, enum)
ggml-ci
* model : adapt gemma3
ggml-ci
* graph : restore same attention ops as on master
ggml-ci
* llama : remove TODO + fix indent
ggml-ci
2025-03-13 12:35:44 +02:00
Concedo
1ef41c2124
streamline output console log (+1 squashed commits)
...
Squashed commits:
[ca474bdd] streamline output console log
2025-03-13 15:33:49 +08:00
Concedo
16137f4281
gemma3 now works correctly
2025-03-13 14:34:18 +08:00
Concedo
77debb1b1b
gemma3 vision works, but is using more tokens than expected - may need resizing
2025-03-13 00:31:16 +08:00
Xuan-Son Nguyen
7841fc723e
llama : Add Gemma 3 support (+ experimental vision capability) (#12343)
...
* llama : Add Gemma 3 text-only support
* fix python coding style
* fix compile on ubuntu
* python: fix style
* fix ubuntu compile
* fix build on ubuntu (again)
* fix ubuntu build, finally
* clip : Experimental support for Gemma 3 vision (#12344)
* clip : Experimental support for Gemma 3 vision
* fix build
* PRId64
2025-03-12 09:30:24 +01:00
Xuan-Son Nguyen
96e1280839
clip : bring back GPU support (#12322)
...
* clip : bring back GPU support
* use n_gpu_layers param
* fix double free
* ggml_backend_init_by_type
* clean up
2025-03-11 09:20:16 +01:00
tc-mb
8352cdc87b
llava : fix bug in minicpm-v code (#11513)
...
* fix bug in minicpm-v code
* update readme of minicpm-v
2025-03-10 10:33:24 +02:00
Concedo
ec43d2b147
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# common/common.cpp
# examples/embedding/embedding.cpp
# examples/json_schema_to_grammar.py
# examples/llama.android/llama/src/main/cpp/llama-android.cpp
# examples/llama.swiftui/README.md
# examples/llama.swiftui/llama.swiftui.xcodeproj/project.pbxproj
# examples/lookahead/lookahead.cpp
# examples/parallel/parallel.cpp
# examples/passkey/passkey.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# requirements.txt
# requirements/requirements-all.txt
# scripts/fetch_server_test_models.py
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
2025-03-06 18:54:58 +08:00
Aaron Teo
e9b2f84f14
llava: add big-endian conversion for image encoder (#12218)
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-03-06 09:33:21 +01:00
Concedo
6b7d2349a7
Rewrite history to fix bad vulkan shader commits without increasing repo size
...
added dpe colab (+8 squashed commit)
Squashed commit:
[b8362da4] updated lite
[ed6c037d] move nsigma into the regular sampler stack
[ac5f61c6] relative filepath fixed
[05fe96ab] export template
[ed0a5a3e] nix_example.md: refactor (#1401)
* nix_example.md: add override example
* nix_example.md: drop graphics example, already basic nixos knowledge
* nix_example.md: format
* nix_example.md: Vulkan is disabled on macOS
Disabled in: 1ccd253acc
* nix_examples.md: nixpkgs.config.cuda{Arches -> Capabilities}
Fixes: https://github.com/LostRuins/koboldcpp/issues/1367
[675c62f7] AutoGuess: Phi 4 (mini) (#1402)
[4bf56982] phrasing
[b8c0df04] Add Rep Pen to Top N Sigma sampler chain (#1397)
- place after nsigma and before xtc (+3 squashed commit)
Squashed commit:
[87c52b97] disable VMM from HIP
[ee8906f3] edit description
[e85c0e69] Remove Unnecessary Rep Counting (#1394)
* stop counting reps
* fix range-based initializer
* strike that - reverse it
2025-03-05 00:02:20 +08:00
Alex Brooks
84d5f4bc19
Update granite vision docs for 3.2 model (#12105)
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-28 11:31:47 +00:00
Ting Lou
a800ae46da
llava : add struct for FFI bindgen (#12079)
...
* add struct for FFI bindgen
* Apply suggestions from code review
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-02-26 15:26:52 +01:00
Alex Brooks
4d1051a40f
Add Doc for Converting Granite Vision -> GGUF (#12006)
...
* Add example docs for granite vision
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-25 10:46:05 +01:00
Alex Brooks
7a2c913e66
llava : Add Granite Vision Support (#11794)
...
* Add super wip scripts for multimodal granite gguf
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add example for converting mmgranite to gguf
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* remove hardcoded path
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add vision feature layer to gguf params
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Clean up llava surgery and remove name substitution hacks
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add transformers llava next tensor name mapping
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Make siglip / openclip mutually exclusive
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix projector linear substitution
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix linear 2 substitution index
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Increase max flattened gridpoints to 64
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix hardcoded concat for multiple feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Pull vision feature layers out of gguf keys
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* fix num gridpoints and use all layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Avoid dropping last image encoder layer in llava models
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use 10 for max number of patches
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Standardize vision feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Cleanup logs
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Update comment for vision feature layer init
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Update notes for alternative to legacy llm conversion script
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix notes rendering
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add v prefix to vision feature layer log
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use current defaults for feature layer
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use constant for max gridpoints / feat layers, style fixes
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* clarify non-negative feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Remove CLIP_API from func signature
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use MAX_IMAGE_FEATURE_LAYERS const in layer calc
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Clarify feature layers are non negative ints and not uint
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix condition for reading feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* pop last llava layer when feature layers are unset
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix unset vision layer 0
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Update examples/llava/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Reenable assertion for out of bounds get_rows
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use std vector for gridpoints and feature layers
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Calculate max feature layer at load time
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Include base patch for granite vision allocation
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Fix trailing whitespace
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Add max num patches = 10 back for minicpmv
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use unordered set to store feature layers
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Use max feature layer for postnorm
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
* Apply suggestions from code review
---------
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-02-24 17:09:51 +01:00
Concedo
159c47f0e6
Merge commit '335eb04a91' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CONTRIBUTING.md
# Makefile
# docs/build.md
# examples/llama.swiftui/llama.swiftui/UI/ContentView.swift
# examples/run/run.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
2025-02-24 11:55:14 +08:00
Ting Lou
36c258ee92
llava: build clip image from pixels (#11999)
...
* llava: export function `clip_build_img_from_pixels` to build image from pixels decoded by other libraries instead of stb_image.h for better performance
* Apply suggestions from code review
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-02-22 15:28:28 +01:00
Alex Brooks
ee02ad02c5
clip : fix visual encoders with no CLS (#11982)
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2025-02-21 08:11:03 +02:00
Concedo
f144b1f345
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/llama-cpp-cuda.srpm.spec
# .devops/llama-cpp.srpm.spec
# .devops/nix/package.nix
# .devops/rocm.Dockerfile
# .github/ISSUE_TEMPLATE/020-enhancement.yml
# .github/ISSUE_TEMPLATE/030-research.yml
# .github/ISSUE_TEMPLATE/040-refactor.yml
# .github/ISSUE_TEMPLATE/config.yml
# .github/pull_request_template.md
# .github/workflows/bench.yml.disabled
# .github/workflows/build.yml
# .github/workflows/labeler.yml
# CONTRIBUTING.md
# Makefile
# README.md
# SECURITY.md
# ci/README.md
# common/CMakeLists.txt
# docs/android.md
# docs/backend/SYCL.md
# docs/build.md
# docs/cuda-fedora.md
# docs/development/HOWTO-add-model.md
# docs/docker.md
# docs/install.md
# docs/llguidance.md
# examples/cvector-generator/README.md
# examples/imatrix/README.md
# examples/imatrix/imatrix.cpp
# examples/llama.android/llama/src/main/cpp/CMakeLists.txt
# examples/llama.swiftui/README.md
# examples/llama.vim
# examples/lookahead/README.md
# examples/lookup/README.md
# examples/main/README.md
# examples/passkey/README.md
# examples/pydantic_models_to_grammar_examples.py
# examples/retrieval/README.md
# examples/server/CMakeLists.txt
# examples/server/README.md
# examples/simple-cmake-pkg/README.md
# examples/speculative/README.md
# flake.nix
# grammars/README.md
# pyproject.toml
# scripts/check-requirements.sh
2025-02-16 02:08:39 +08:00
Georgi Gerganov
68ff663a04
repo : update links to new url (#11886)
...
* repo : update links to new url
ggml-ci
* cont : more urls
ggml-ci
2025-02-15 16:40:57 +02:00
Concedo
db6db9dff9
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/close-issue.yml
# .github/workflows/server.yml
# AUTHORS
# CMakeLists.txt
# Makefile
# README.md
# cmake/llama.pc.in
# common/CMakeLists.txt
# docs/build.md
# examples/batched.swift/Sources/main.swift
# examples/llama.swiftui/llama.cpp.swift/LibLlama.swift
# examples/llava/CMakeLists.txt
# examples/llava/clip.h
# examples/run/run.cpp
# examples/server/README.md
# ggml/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
# scripts/sync-ggml.last
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
2025-02-07 00:52:31 +08:00
SAMI
1ec208083c
llava: add quantization for the visual projector LLAVA, Qwen2VL (#11644)
...
* Added quantization for visual projector
* Added README
* Fixed the clip quantize implementation in the file
* Fixed the gcc warning regarding minor linting
* Removed trailing whitespace
2025-02-05 10:45:40 +03:00
piDack
0cec062a63
llama : add support for GLM-Edge and GLM-Edge-V series models (#10573)
...
* add glm edge chat model
* use config partial_rotary_factor as rope ratio
* support for glm edge model
* vision model support
* remove debug info
* fix format
* llava.cpp trailing whitespace
* remove unused AutoTokenizer
* Update src/llama.cpp to not contain <|end|> or </s>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* add edge template
* fix chat template
* fix conflict
* fix conflict
* fix ci err
* fix format err
* fix template err
* 9b hf chat support
* format
* format clip.cpp
* fix format
* Apply suggestions from code review
* Apply suggestions from code review
* Update examples/llava/clip.cpp
* fix format
* minor : style
---------
Co-authored-by: liyuhang <yuhang.li@zhipuai.cn>
Co-authored-by: piDack <pcdack@hotmail.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: liyuhang <yuhang.li@aminer.cn>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-02 09:48:46 +02:00
Concedo
70f1d8d746
vision can set max res (+1 squashed commits)
...
Squashed commits:
[938fc655] vision can set max res
2025-01-30 00:19:49 +08:00
Concedo
bec231422a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# README.md
# common/CMakeLists.txt
# docs/backend/SYCL.md
# docs/build.md
# docs/docker.md
# examples/export-lora/export-lora.cpp
# examples/main/README.md
# examples/main/main.cpp
# examples/run/README.md
# examples/run/run.cpp
# examples/server/README.md
# examples/simple-chat/simple-chat.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# src/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
2025-01-25 14:16:50 +08:00