Concedo
06159939d9
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# Makefile
# docs/build.md
# examples/rpc/rpc-server.cpp
# examples/sycl/build.sh
# ggml/CMakeLists.txt
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-hip/CMakeLists.txt
# scripts/sync-ggml.last
2025-04-17 00:52:37 +08:00
Russyyds
d6d2c2ab8c
Add performance print for gemma3 in example ( #12929 )
2025-04-14 19:18:20 +02:00
Neo Zhang Jianyu
81c7e64fc2
dsiable curl lib check, this action is missed by commit bd3f59f812
( #12761 ) ( #12937 )
2025-04-14 18:19:07 +08:00
Ed Addario
71e90e8813
quantize: Handle user-defined quantization levels for additional tensors ( #12511 )
...
* Add llama_model_quantize_params parameters
* Add new quantize parameters parsing and validation
* Update usage
* Add new parameters defaults
* Add new quantization parameters logic
* Add llama_model_quantize_params parameters
* Add new quantize parameters parsing and validation
* Update usage
* Add new parameters defaults
* Add new quantization parameters logic
* Minor refactoring as per the contributors' coding guidelines
* Update descriptions to match existing style
* Add llama_model_quantize_params parameters
* Add new quantize parameters parsing and validation
* Update usage
* Add new parameters defaults
* Add new quantization parameters logic
* Minor refactoring as per the contributors' guidelines
* Implement general --tensor-type instead of tensor-specific command option
* Fix implied type bug
* Restore missing #includes
* Add regex capability for tensor selection
* Refactor function name and update ALLOWED_TENSOR_TYPE
* Add missing #include
* Handle edge case when tensor name is cls.output
* Minor logging improvement
2025-04-13 21:29:28 +03:00
Prajwal B Mehendarkar
bc091a4dc5
common : Define cache directory on AIX ( #12915 )
2025-04-12 17:33:39 +02:00
Concedo
a6149ad0fc
fixed g3 adapter back
2025-04-12 23:17:54 +08:00
Concedo
9f94f62768
fixed segfault
2025-04-12 19:08:27 +08:00
Concedo
7b4254bef9
not working on cpu
2025-04-12 18:55:29 +08:00
Matt Clayton
e59ea539b8
llava: Fix cpu-only clip image encoding sefault ( #12907 )
...
* llava: Fix cpu-only clip image encoding
* clip : no smart ptr for ggml_backend_t
* Fix for backend_ptr push_back
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-12 07:29:03 +02:00
Concedo
7e1289ade8
fixes for sdcpp
2025-04-12 10:08:23 +08:00
Concedo
a0ae187563
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/docker.yml
# README.md
# build-xcframework.sh
# examples/llava/CMakeLists.txt
# examples/llava/clip.cpp
# examples/rpc/rpc-server.cpp
# examples/run/run.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
2025-04-12 10:06:47 +08:00
Concedo
ea9bd61e47
Merge commit ' 64eda5deb9
' into concedo_experimental
...
# Conflicts:
# .devops/cuda.Dockerfile
# .devops/intel.Dockerfile
# .devops/llama-cli-cann.Dockerfile
# .devops/musa.Dockerfile
# .devops/rocm.Dockerfile
# .devops/vulkan.Dockerfile
# .github/workflows/build.yml
# .github/workflows/docker.yml
# README.md
# docs/backend/SYCL.md
# examples/llava/clip.cpp
# examples/server_embd.py
# ggml/src/ggml-cann/acl_tensor.cpp
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/ggml-cann.cpp
# src/CMakeLists.txt
# tests/test-chat-template.cpp
2025-04-12 08:31:22 +08:00
Georgi Gerganov
c94085df28
server : add VSCode's Github Copilot Chat support ( #12896 )
...
* server : add VSCode's Github Copilot Chat support
* cont : update handler name
2025-04-11 23:37:41 +03:00
yuri@FreeBSD
e8a62631b3
rpc : Set cache directory in rpc-server.cpp on FreeBSD ( #12903 )
2025-04-11 22:04:14 +02:00
tastelikefeet
b2034c2b55
contrib: support modelscope community ( #12664 )
...
* support download from modelscope
* support login
* remove comments
* add arguments
* fix code
* fix win32
* test passed
* fix readme
* revert readme
* change to MODEL_ENDPOINT
* revert tail line
* fix readme
* refactor model endpoint
* remove blank line
* fix header
* fix as comments
* update comment
* update readme
---------
Co-authored-by: tastelikefeet <yuze.zyz@alibaba-inc/com>
2025-04-11 14:01:56 +02:00
Xuan-Son Nguyen
0c50923944
clip : use smart pointer ( ⚠️ breaking change) ( #12869 )
...
* clip : use smart pointers
* fix warmup
* add forward declaration
* misisng include
* fix include (2)
* composite
* simplify batch ptr
* fix conflict
2025-04-11 12:09:39 +02:00
Xuan-Son Nguyen
8b9cc7cdd8
llava : introduce libmtmd ( #12849 )
...
* wip llava2
* migrated gemma3 to llava2
* add timings
* correct pre/postfix
* fix missing include
* fix compilation unused var warn
* update llava2_tokenize
* change name llava2 --> mtmd
* improve api
* refine helpers
* Update examples/llava/mtmd.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-04-10 22:57:16 +02:00
Plamen Minev
381603a775
ci: detach common from the library ( #12827 )
...
* fix: detach common from the library
* fix: building chat test template
2025-04-09 10:11:11 +02:00
Xuan-Son Nguyen
65a69e6e1b
clip : do not print ftype ( #12832 )
2025-04-09 10:09:53 +02:00
Matt Clayton
b32efad2bc
llava: improve clip_ctx destructor to not memleak load_image_size ( #12834 )
2025-04-08 22:01:58 +02:00
Georgi Gerganov
a19b5cef16
llama : fix FA when KV cache is not used (i.e. embeddings) ( #12825 )
...
* ggml : FA supports F32 V
* graph : cast KV to F16 when the KV cache is not used
ggml-ci
* server : add test that exercises embeddings with FA enabled
ggml-ci
2025-04-08 19:54:51 +03:00
Xuan-Son Nguyen
78a1ba0a4f
server : fix thread.join() on exit ( #12831 )
2025-04-08 18:37:06 +02:00
dm4
2dabf759e7
llava: add more helper functions to check projector types in clip context ( #12824 )
...
Signed-off-by: dm4 <sunrisedm4@gmail.com>
2025-04-08 15:49:13 +02:00
Concedo
ebf924c5d1
Merge branch 'upstream' into concedo_experimental
2025-04-08 21:46:30 +08:00
Concedo
88660dd59d
merged qwen2.5vl again
2025-04-08 21:32:25 +08:00
Concedo
822cf2430e
Merge commit ' f1e3eb4249
' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# docs/backend/SYCL.md
# examples/llava/clip.cpp
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-vulkan/cmake/host-toolchain.cmake.in
2025-04-08 20:48:53 +08:00
Concedo
c58e9a2be3
revert q2.5vl before merge (+1 squashed commits)
...
Squashed commits:
[3197ea95] Revert "add tentative support for qwen2.5vl vision from HimariO fork"
This reverts commit 911669087a
.
2025-04-08 20:38:41 +08:00
characharm
8ca6e1c3a4
server : webui : Improve Chat Input with Auto-Sizing Textarea ( #12785 )
...
* Update ChatScreen.tsx
* useAutosizeTextarea.ts
useAutosizeTextarea to encapsulate the logic.
* Implement responsive auto-sizing chat textarea
Replaces the manual textarea resizing with an automatic height adjustment based on content.
- `useChatTextarea` hook to manage textarea state and auto-sizing logic via refs, preserving the optimization
- Textarea now grows vertically up to a maximum height (`lg:max-h-48`) on large screens (lg breakpoint and up).
- Disables auto-sizing and enables manual vertical resizing (`resize-vertical`) on smaller screens for better mobile usability.
- Aligns the "Send" button to the bottom of the textarea (`items-end`) for consistent positioning during resize.
* -update compressed index.html.gz after npm run build
-refactor: replace OptimizedTextareaValue with AutosizeTextareaApi in VSCode context hook
* chore: normalize line endings to LF
refactor: AutosizeTextareaApi -> chatTextareaApi
* refactor: Rename interface to PascalCase
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-08 11:14:59 +02:00
stduhpf
4ccea213bc
hellaswag: display estimated score confidence interval ( #12797 )
2025-04-07 18:47:08 +03:00
HimariO
b28ad7ecca
fix attn weight scaling after rebase
2025-04-07 22:07:56 +08:00
HimariO
223edef897
remove commented-out code blocks
2025-04-07 21:52:37 +08:00
HimariO
dde96b4774
remove not so often use qwen2vl-cli
debug functions
2025-04-07 21:52:37 +08:00
HimariO
8fcf682b28
ignore transformers Qwen2_5_xxx type check
2025-04-07 21:52:37 +08:00
HimariO
fdae70a832
cleaning up
2025-04-07 21:52:37 +08:00
HimariO
c891300c1e
move position id remap out of ggml to avoid int32 cuda operations
2025-04-07 21:52:37 +08:00
HimariO
e18f6a3238
fix few incorrect tensor memory layout
2025-04-07 21:52:37 +08:00
HimariO
ecd673f0c5
add debug utils
2025-04-07 21:51:18 +08:00
HimariO
9c827814e6
handle window attention inputs
2025-04-07 21:51:18 +08:00
HimariO
9c7cc6de9c
implment vision model architecture, gguf convertor
2025-04-07 21:46:06 +08:00
Concedo
a3f7de7142
fixed outetts docs
2025-04-07 21:31:43 +08:00
Xuan-Son Nguyen
bd3f59f812
cmake : enable curl by default ( #12761 )
...
* cmake : enable curl by default
* no curl if no examples
* fix build
* fix build-linux-cross
* add windows-setup-curl
* fix
* shell
* fix path
* fix windows-latest-cmake*
* run: include_directories
* LLAMA_RUN_EXTRA_LIBS
* sycl: no llama_curl
* no test-arg-parser on windows
* clarification
* try riscv64 / arm64
* windows: include libcurl inside release binary
* add msg
* fix mac / ios / android build
* will this fix xcode?
* try clearing the cache
* add bunch of licenses
* revert clear cache
* fix xcode
* fix xcode (2)
* fix typo
2025-04-07 13:35:19 +02:00
Concedo
5edbacdd0e
fix tools (+3 squashed commit)
...
Squashed commit:
[95a489ee] fix tools build
[1d3d3451] add accelerate
[2837705c
] edit a line
2025-04-06 21:30:48 +08:00
Sergey Fedorov
f1e3eb4249
common : fix includes in arg.cpp and gemma3-cli.cpp ( #12766 )
...
* arg.cpp: add a missing include
* gemma3-cli.cpp: fix cinttypes include
2025-04-05 17:46:00 +02:00
Xuan-Son Nguyen
0364178ca2
clip : refactor clip_init, add tests ( #12757 )
...
* refactor clip_init
* fix loading file
* fix style
* test ok
* better test with report
* add missing headers
* clarify
* add KEY_MM_PATCH_MERGE_TYPE
* remove bool has_* pattern
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update examples/llava/clip.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* use ggml_soft_max_ext
* refactor logging system
* add minicpm-v-o 2.6 for testing
* use nullptr everywhere
* fix Yi-VL model
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-04-05 17:17:40 +02:00
Nauful Shaikh
b772394297
server : webui : Upgrade daisyui, tailwindcss. ( #12735 )
...
* Upgrade daisyui, tailwindcss.
* Switch to all themes.
* Revert a change.
* Update formatting.
* Install packages before npm build.
* Revert "Install packages before npm build."
This reverts commit 336c5147e614e60993162794ba9d9d4629a916f8.
* Add index.html.gz
* run build
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-04 16:09:52 +02:00
nick huang
23106f94ea
gguf-split : --merge now respects --dry-run option ( #12681 )
...
* gguf-split now respects dry-run option
* removing trailing space
2025-04-04 16:09:12 +02:00
Concedo
103d60ed2c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# common/common.cpp
# examples/batched-bench/batched-bench.cpp
# examples/batched/batched.cpp
# examples/export-lora/export-lora.cpp
# examples/gritlm/gritlm.cpp
# examples/parallel/parallel.cpp
# examples/passkey/passkey.cpp
# examples/speculative-simple/speculative-simple.cpp
# examples/speculative/speculative.cpp
# ggml/src/ggml-cann/CMakeLists.txt
# ggml/src/ggml-cann/acl_tensor.cpp
# ggml/src/ggml-cann/acl_tensor.h
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-vulkan/CMakeLists.txt
# tests/test-arg-parser.cpp
# tests/test-backend-ops.cpp
2025-04-03 18:57:49 +08:00
Georgi Gerganov
a10b36c91a
llama : refactor kv cache guard ( #12695 )
...
* llama : refactor kv cache guard
ggml-ci
* cont : fix comment [no ci]
* llama : fix kv_cache restore logic
ggml-ci
* context : simplify kv cache updates
ggml-ci
* cont : better name [no ci]
* llama : fix llama_decode return code when could not find KV slot
ggml-ci
* context : change log err -> warn [no ci]
* kv-cache : add comment + warning
2025-04-02 14:32:59 +03:00
Xuan-Son Nguyen
42eb248f46
common : remove json.hpp from common.cpp ( #12697 )
...
* common : remove json.hpp from common.cpp
* fix comment
2025-04-02 09:58:34 +02:00
Xuan-Son Nguyen
267c1399f1
common : refactor downloading system, handle mmproj with -hf option ( #12694 )
...
* (wip) refactor downloading system [no ci]
* fix all examples
* fix mmproj with -hf
* gemma3: update readme
* only handle mmproj in llava example
* fix multi-shard download
* windows: fix problem with std::min and std::max
* fix 2
2025-04-01 23:44:05 +02:00