Concedo
59300dbdf5
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/actions/windows-setup-curl/action.yml
# .github/workflows/build-linux-cross.yml
# README.md
# common/CMakeLists.txt
# examples/parallel/README.md
# examples/parallel/parallel.cpp
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-vulkan/CMakeLists.txt
# tools/server/README.md
2025-05-18 23:27:53 +08:00
Isaac McFadyen
6a2bc8bfb7
server : added --no-prefill-assistant flag (#13608)
* added no-prefill-assistant flag
* reworded documentation comment
* updated server README.md
2025-05-17 23:59:48 +02:00
Xuan-Son Nguyen
6aa892ec2a
server : do not return error out of context (with ctx shift disabled) (#13577)
2025-05-16 21:50:00 +02:00
Xuan-Son Nguyen
aea9f8b4e7
webui : improve accessibility for visually impaired people (#13551)
* webui : improve accessibility for visually impaired people
* add a11y for extra contents
* fix some labels being read twice
* add skip to main content
2025-05-16 21:49:01 +02:00
Concedo
e5d26a2356
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# common/CMakeLists.txt
# docs/backend/SYCL.md
# ggml/CMakeLists.txt
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/binbcast.cpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/dequantize.hpp
# ggml/src/ggml-sycl/dmmv.cpp
# ggml/src/ggml-sycl/gemm.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# ggml/src/ggml-vulkan/CMakeLists.txt
# ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt
# ggml/src/gguf.cpp
# scripts/compare-llama-bench.py
# tests/CMakeLists.txt
# tests/test-chat.cpp
# tools/llama-bench/llama-bench.cpp
# tools/server/README.md
2025-05-16 15:30:31 +08:00
Concedo
6cafc0e73e
Merge commit '71bdbdb587' into concedo_experimental
# Conflicts:
# ggml/src/ggml-cpu/CMakeLists.txt
# tools/batched-bench/batched-bench.cpp
# tools/mtmd/clip.h
2025-05-16 15:25:15 +08:00
Concedo
12e6928ec2
i'm gonna regret this, aren't i?
2025-05-15 23:59:55 +08:00
Concedo
7a76e237b8
fixed clip quantize again
2025-05-15 23:22:12 +08:00
Diego Devesa
6c8b91500e
llama-bench : fix -ot with dl backends (#13563)
2025-05-15 15:46:55 +02:00
Xuan-Son Nguyen
3cc1f1f1d2
webui : handle PDF input (as text or image) + convert pasted long content to file (#13562)
* webui : handle PDF input (as text or image)
* handle the case where pdf image + server without mtmd
* fix bug missing pages
2025-05-15 14:24:50 +02:00
Piotr Wilkin (ilintar)
c753d7bed0
server : proper error handling for missing elements in messages array (OpenAI compatible backend) (#13540)
2025-05-15 08:40:58 +02:00
Georgi Gerganov
b2838049cc
bench : handle decode errors (#13548)
ggml-ci
2025-05-15 05:57:02 +03:00
Olivier Chafik
aa48e373f2
server: inject date_string in llama 3.x template + fix date for firefunction v2 (#12802)
* Inject date_string in llama 3.x + fix for functionary v2
https://github.com/ggml-org/llama.cpp/issues/12729
* move/fix detection of functionary v3.1 before llama 3.x, fix & test their non-tool mode
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* generate more tokens in test_completion_with_required_tool_tiny_fast to avoid truncation
---------
Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-05-15 02:39:51 +01:00
Olivier Chafik
3198405e98
common: add partial regex support (#12808)
* move string_find_partial_stop & string_ends_with to common
* add common_regex (supports partial matches)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update common/regex-partial.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update common/regex-partial.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update common/regex-partial.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* partial regex: add missing iterator end checks
* string utils: use string_views
* direct throw to avoid ggml.h include
* regex-partial: replace missed ggml_asserts
---------
Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-05-14 19:50:57 +01:00
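Commit 3198405e98 moves `string_find_partial_stop` into common. The idea behind a partial stop check is that, while streaming, the tail of the generated text may be the beginning of a stop string that has not fully arrived yet, so it should be held back rather than sent to the client. The following is a minimal C++17 sketch of that check, not the upstream code: the names `ends_with` and `find_partial_stop` are hypothetical, and the return convention (index where the partial match starts, or `npos`) is an assumption.

```cpp
#include <cstddef>
#include <string_view>

// True if `str` ends with `suffix`.
// Hand-rolled because std::string_view::ends_with requires C++20.
constexpr bool ends_with(std::string_view str, std::string_view suffix) {
    return str.size() >= suffix.size() &&
           str.compare(str.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: if some non-empty prefix of `stop` is a suffix of `str`,
// return the index in `str` where that prefix begins (longest prefix wins);
// otherwise return npos. A full match of `stop` at the end also counts.
constexpr std::size_t find_partial_stop(std::string_view str, std::string_view stop) {
    if (str.empty() || stop.empty()) {
        return std::string_view::npos;
    }
    for (std::size_t len = stop.size(); len > 0; --len) {
        if (ends_with(str, stop.substr(0, len))) {
            return str.size() - len;
        }
    }
    return std::string_view::npos;
}
```

For example, with the stop string `</stop>` and streamed text `Hello </s`, the sketch returns 6, so a server using this approach would withhold the trailing `</s` until more tokens show whether the stop string completes.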
Georgi Gerganov
053174436f
server : passthrough the /models endpoint during loading (#13535)
* server : passthrough the /models endpoint during loading
* server : update readme + return json for "meta" field
2025-05-14 15:42:10 +03:00
Xuan-Son Nguyen
360a9c98e1
server : fix cache_tokens bug with no cache_prompt (#13533)
2025-05-14 13:35:07 +02:00
Xuan-Son Nguyen
bb1681fbd5
webui : use fflate for more deterministic gzip compress (#13525)
* webui : use pako for more deterministic gzip compress
* simpler code
* use fflate instead of pako
2025-05-14 10:26:12 +02:00
Luca Stefani
d486dd3e8e
webui: Allow pasting file from clipboard (#13526)
* server: Allow pasting file from clipboard
* server: Prevent default action on file paste
* update build
* format then build combined
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-14 10:07:31 +02:00
Ed Addario
e5c834f718
quantize : improve tensor-type pattern matching (#13033)
2025-05-13 19:12:31 +02:00
Xuan-Son Nguyen
71bdbdb587
clip : clip.h become private API (⚠️ breaking change) (#13510)
2025-05-13 17:07:21 +02:00
Georgi Gerganov
b89d605a91
batched-bench : fix pp batch contents (#13492)
2025-05-13 18:01:53 +03:00
Xuan-Son Nguyen
b4726345ac
mtmd : remove libllava, remove clip-quantize-cli (⚠️ breaking change) (#13460)
* mtmd : remove libllava, remove clip-quantize-cli
* rm clip_model_quantize
2025-05-13 15:33:58 +02:00
Concedo
11984f1040
fixed autoguess adapters, fixed tool builds
2025-05-13 19:38:56 +08:00
Diego Devesa
cf0a43bb64
llama-bench : add defrag-thold, check for invalid ranges (#13487)
2025-05-13 00:31:37 +02:00
Concedo
21e31e255b
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# README.md
# build-xcframework.sh
# common/CMakeLists.txt
# examples/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-metal/ggml-metal.m
# ggml/src/ggml-metal/ggml-metal.metal
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/backend.hpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# scripts/compare-llama-bench.py
# src/CMakeLists.txt
# src/llama-model.cpp
# src/llama.cpp
# tests/test-backend-ops.cpp
# tests/test-opt.cpp
# tools/llama-bench/README.md
# tools/llama-bench/llama-bench.cpp
# tools/mtmd/CMakeLists.txt
# tools/mtmd/README.md
# tools/mtmd/clip.cpp
# tools/rpc/rpc-server.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
2025-05-13 00:28:35 +08:00
Xuan-Son Nguyen
de4c07f937
clip : cap max image size 1024 for qwen vl model (#13478)
2025-05-12 15:06:51 +02:00
Anudit Nagar
91159ee9df
server : allow content to be null in oaicompat_completion_params_parse (#13477)
2025-05-12 13:56:42 +02:00
Diego Devesa
22cdab343b
llama-bench : accept ranges for integer parameters (#13410)
2025-05-12 13:08:22 +02:00
City
c104023994
mtmd : Use RMS norm for InternVL 3 38B and 78B mmproj (#13459)
2025-05-12 00:39:06 +02:00
Anthony Umfer
9a390c4829
tools : fix uninitialized llama_batch in server (#13436)
* add constructor to initialize server_context::batch, preventing destructor's call to llama_batch_free from causing an invalid free()
* Update tools/server/server.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* use C++11 initializer syntax
* switch from Copy-list-initialization to Direct-list-initialization
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-05-11 17:08:26 +02:00
David Huang
7f323a589f
Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B (#13386)
2025-05-11 14:18:39 +02:00
City
3eac209319
mtmd : support InternVL 3 38B and 78B mmproj (#13443)
* Support InternVL 3 38B and 78B mmproj
* Swap norms in clip.cpp
* Group variables together
2025-05-11 11:35:52 +02:00
Xuan-Son Nguyen
a634d75d1b
mtmd : move helpers to dedicated file (#13442)
* mtmd : move helpers to dedicated file
* fix windows build
* rm redundant include
2025-05-11 11:34:23 +02:00
Concedo
f841b29c41
fixed unicode paths
2025-05-11 14:05:54 +08:00
Xuan-Son Nguyen
15e6125a39
mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434)
* mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl
* fix typo
2025-05-10 19:57:54 +02:00
Xuan-Son Nguyen
3b24d26c22
server : update docs (#13432)
2025-05-10 18:44:49 +02:00
Xuan-Son Nguyen
053367d149
mtmd : support InternVL 2.5 and 3 (#13422)
* convert : internvl support
* InternVL3-1B working
* fix regression
* rm mobilevlm from test
* fix conversion
* add test for internvl
* add to list of pre-quant
* restore boi/eoi check
* add clarify comment for norm eps
2025-05-10 16:26:42 +02:00
Xuan-Son Nguyen
33eff40240
server : vision support via libmtmd (#12898)
* server : (experimental) vision support via libmtmd
* mtmd : add more api around mtmd_image_tokens
* mtmd : add more api around mtmd_image_tokens
* mtmd : ability to calc image hash
* shared_ptr for mtmd_image_tokens
* move hash to user-defined ID (fixed)
* abstract out the batch management
* small fix
* refactor logic adding tokens to batch
* implement hashing image
* use FNV hash, now hash bitmap instead of file data
* allow decoding image embedding to be split into batches
* rm whitespace
* disable some features when mtmd is on
* fix --no-mmproj-offload
* mtmd_context_params no timings
* refactor server_inp to server_tokens
* fix the failing test case
* init
* wip
* working version
* add mtmd::bitmaps
* add test target
* rm redundant define
* test: mtmd_input_chunks_free
* rm outdated comment
* fix merging issue
* explicitly create mtmd::input_chunks
* mtmd_input_chunk_copy
* add clone()
* improve server_input struct
* clip : fix confused naming ffn_up and ffn_down
* rm ffn_i/o/g naming
* rename n_embd, n_ff
* small fix
* no check n_ff
* fix detokenize
* add const to various places
* add warning about breaking changes
* add c api
* helper: use mtmd_image_tokens_get_n_pos
* fix ctx_shift
* fix name shadowing
* more strict condition
* support remote image_url
* remote image_url log
* add CI test
* do not log base64
* add "has_multimodal" to /props
* remove dangling image
* speculative: use slot.cache_tokens.insert
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* rm can_be_detokenized
* on prompt processing done, assert cache_tokens.size
* handle_completions_impl returns void
* adapt the new web ui
* update docs and hot topics
* rm assert
* small fix (2)
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-05-09 19:29:37 +02:00
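The vision-support commit (33eff40240) notes that image hashing switched to an FNV hash computed over the decoded bitmap instead of the file data. To illustrate what that means, here is a sketch of 64-bit FNV-1a in C++17. The choice of the FNV-1a variant at 64-bit width, and the helper names `fnv1a_64` and `hash_bitmap_bytes`, are assumptions for illustration and are not taken from the commit.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Standard 64-bit FNV offset basis and prime.
constexpr std::uint64_t FNV64_OFFSET = 0xcbf29ce484222325ULL;
constexpr std::uint64_t FNV64_PRIME  = 0x100000001b3ULL;

// FNV-1a: XOR each byte into the hash, then multiply by the prime.
// Unsigned overflow wraps modulo 2^64, which is what FNV expects.
constexpr std::uint64_t fnv1a_64(std::string_view bytes) {
    std::uint64_t h = FNV64_OFFSET;
    for (char c : bytes) {
        h ^= static_cast<unsigned char>(c);
        h *= FNV64_PRIME;
    }
    return h;
}

// Hypothetical helper: hash decoded pixel data (e.g. packed RGB bytes)
// rather than the compressed file contents.
inline std::uint64_t hash_bitmap_bytes(const std::vector<unsigned char>& pixels) {
    return fnv1a_64(std::string_view(reinterpret_cast<const char*>(pixels.data()),
                                     pixels.size()));
}
```

Hashing the decoded pixels means two files that decode to identical pixel data map to the same ID, regardless of how each file was encoded on disk.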
Concedo
6bb44391bd
Merge commit '5c86c9ed3e' into concedo_experimental
# Conflicts:
# tools/imatrix/imatrix.cpp
# tools/mtmd/README.md
# tools/run/README.md
# tools/run/run.cpp
2025-05-10 00:30:18 +08:00
Diego Devesa
27ebfcacba
llama : do not crash if there is no CPU backend (#13395)
* llama : do not crash if there is no CPU backend
* add checks to examples
2025-05-09 13:02:07 +02:00
Bartowski
efb8b47eda
imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (#13389)
* Add --parse-special for enabling parsing of special tokens in imatrix calculation
* whitespace
2025-05-09 11:53:58 +02:00
R0CKSTAR
0527771dd8
llama-run: add support for downloading models from ModelScope (#13370)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-05-09 10:25:50 +01:00
Concedo
42f6930e13
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# ggml/src/ggml-rpc/ggml-rpc.cpp
2025-05-09 17:18:14 +08:00
Xuan-Son Nguyen
2189fd3b63
mtmd : fix batch_view for m-rope (#13397)
* mtmd : fix batch_view for m-rope
* nits : fix comment
2025-05-09 11:18:02 +02:00
Xuan-Son Nguyen
3f96aeff39
llama : one-off chat template fix for Mistral-Small-2503 (#13398)
* llama : one-off chat template fix for Mistral-Small-2503
* update readme
* add mistral-v7-tekken
2025-05-09 11:17:51 +02:00
Xuan-Son Nguyen
d9c4accaff
server : (webui) rename has_multimodal --> modalities (#13393)
* server : (webui) rename has_multimodal --> modalities
* allow converting SVG to PNG
* less complicated code
2025-05-09 09:06:37 +02:00
Concedo
2f5f4ee65a
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# common/CMakeLists.txt
2025-05-09 14:18:20 +08:00
Matt Clayton
f05a6d71a0
mtmd : Expose helper_decode_image_chunk (#13366)
* mtmd: Expose helper_decode_image, output_embd_copy, image_tokens_copy/free
* Slim down
* Cleanups
2025-05-08 20:25:39 +02:00
Xuan-Son Nguyen
ee01d71e58
server : (webui) fix a very small misalignment (#13387)
* server : (webui) fix a very small misalignment
* restore font-bold
2025-05-08 18:51:45 +02:00
Concedo
2439014a03
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# examples/embedding/embedding.cpp
# tools/imatrix/imatrix.cpp
# tools/perplexity/perplexity.cpp
2025-05-08 23:41:02 +08:00