Diego Devesa
cf0a43bb64
llama-bench : add defrag-thold, check for invalid ranges (#13487)
2025-05-13 00:31:37 +02:00
Concedo
21e31e255b
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# README.md
# build-xcframework.sh
# common/CMakeLists.txt
# examples/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-metal/ggml-metal.m
# ggml/src/ggml-metal/ggml-metal.metal
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/backend.hpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# scripts/compare-llama-bench.py
# src/CMakeLists.txt
# src/llama-model.cpp
# src/llama.cpp
# tests/test-backend-ops.cpp
# tests/test-opt.cpp
# tools/llama-bench/README.md
# tools/llama-bench/llama-bench.cpp
# tools/mtmd/CMakeLists.txt
# tools/mtmd/README.md
# tools/mtmd/clip.cpp
# tools/rpc/rpc-server.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
2025-05-13 00:28:35 +08:00
Xuan-Son Nguyen
de4c07f937
clip : cap max image size 1024 for qwen vl model (#13478)
2025-05-12 15:06:51 +02:00
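Commits #13478 and #13434 cap the resolution of images fed to the Qwen2-VL / Qwen2.5-VL vision encoder. The sketch below only illustrates the general idea and is not the clip.cpp code. It assumes the cap applies to the longest side (1024 px in #13478), that the aspect ratio is preserved, and that each side is rounded down to a multiple of a hypothetical `align` value (for example 28, for 14 px patches merged 2x2). `ImageSize` and `cap_image_size` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical image dimensions, in pixels.
struct ImageSize {
    int width;
    int height;
};

// Illustrative helper (not from clip.cpp): shrink, never enlarge, so that the
// longest side is at most max_side, keep the aspect ratio, then round each
// side down to a multiple of align (but never below one align unit).
ImageSize cap_image_size(ImageSize in, int max_side, int align) {
    assert(max_side > 0 && align > 0);
    const int longest = std::max(in.width, in.height);
    auto fit = [&](int side) {
        // Integer arithmetic avoids floating-point rounding surprises.
        long long scaled = longest > max_side
            ? static_cast<long long>(side) * max_side / longest
            : side;
        int aligned = static_cast<int>(scaled / align) * align;
        return std::max(aligned, align);
    };
    return {fit(in.width), fit(in.height)};
}
```

With max_side = 1024 and align = 28, a 2048x1024 image becomes 1008x504, while a 640x480 image is only aligned, to 616x476.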
Anudit Nagar
91159ee9df
server : allow content to be null in oaicompat_completion_params_parse (#13477)
2025-05-12 13:56:42 +02:00
Diego Devesa
22cdab343b
llama-bench : accept ranges for integer parameters (#13410)
2025-05-12 13:08:22 +02:00
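Commit #13410 lets llama-bench take ranges for integer parameters, and #13487 adds checks for invalid ranges. Below is a minimal sketch of such a parser, assuming `first-last`, `first-last+step` and `first-last*mult` forms; the name `parse_int_range` and the exact validation rules are illustrative assumptions, not the llama-bench implementation.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative parser (not llama-bench's): expands "N", "first-last",
// "first-last+step" or "first-last*mult" into the integers it denotes.
// std::stoi throws std::invalid_argument on malformed numbers.
std::vector<int> parse_int_range(const std::string & s) {
    size_t pos = 0;
    const int first = std::stoi(s, &pos);
    if (pos == s.size()) {
        return {first}; // a single value
    }
    if (s[pos] != '-') {
        throw std::invalid_argument("invalid range: " + s);
    }
    const std::string rest = s.substr(pos + 1);
    size_t end = 0;
    const int last = std::stoi(rest, &end);
    char op = '+';
    int step = 1;
    if (end < rest.size()) {
        op = rest[end];
        const std::string step_str = rest.substr(end + 1);
        size_t step_end = 0;
        step = std::stoi(step_str, &step_end);
        if ((op != '+' && op != '*') || step_end != step_str.size()) {
            throw std::invalid_argument("invalid range: " + s);
        }
    }
    // Reject ranges that are empty or would never terminate.
    if (last < first || (op == '+' && step <= 0) ||
        (op == '*' && (step <= 1 || first <= 0))) {
        throw std::invalid_argument("invalid range: " + s);
    }
    std::vector<int> values;
    // long long avoids overflow when stepping past a bound near INT_MAX.
    for (long long v = first; v <= last; v = (op == '+') ? v + step : v * step) {
        values.push_back(static_cast<int>(v));
    }
    return values;
}
```

Under these rules `1-8+2` expands to 1, 3, 5, 7 and `1-16*2` to 1, 2, 4, 8, 16, while `8-1` or `1-8+0` is rejected instead of producing an empty or endless list.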
City
c104023994
mtmd : Use RMS norm for InternVL 3 38B and 78B mmproj (#13459)
2025-05-12 00:39:06 +02:00
Anthony Umfer
9a390c4829
tools : fix uninitialized llama_batch in server (#13436)
* add constructor to initialize server_context::batch, preventing destructor's call to llama_batch_free from causing an invalid free()
* Update tools/server/server.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* use C++11 initializer syntax
* switch from Copy-list-initialization to Direct-list-initialization
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-05-11 17:08:26 +02:00
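Commit #13436 fixes an invalid `free()` that happened when the destructor released a `llama_batch` that had never been initialized. The sketch below shows the pattern with hypothetical stand-ins (`fake_batch`, `fake_batch_free`, `context_owner`) rather than the real llama.cpp types: a brace default member initializer (direct-list-initialization) zero-initializes the struct, so its pointers start as `nullptr`, and `std::free(nullptr)` is a no-op.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for a C struct that owns heap buffers.
struct fake_batch {
    int     n_tokens;
    int   * token;
    float * embd;
};

// Hypothetical stand-in for a batch free function; std::free(nullptr) is a no-op.
void fake_batch_free(fake_batch batch) {
    std::free(batch.token);
    std::free(batch.embd);
}

struct context_owner {
    // Empty braces zero-initialize every member, so the destructor never frees
    // an indeterminate pointer even if the batch was never allocated. Without
    // them, a non-static context_owner would hold indeterminate pointers.
    fake_batch batch {};

    context_owner() = default;
    // Non-copyable: the batch has a single owner, so copies would double-free.
    context_owner(const context_owner &) = delete;
    context_owner & operator=(const context_owner &) = delete;

    ~context_owner() {
        fake_batch_free(batch);
    }
};
```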
David Huang
7f323a589f
Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B (#13386)
2025-05-11 14:18:39 +02:00
City
3eac209319
mtmd : support InternVL 3 38B and 78B mmproj (#13443)
* Support InternVL 3 38B and 78B mmproj
* Swap norms in clip.cpp
* Group variables together
2025-05-11 11:35:52 +02:00
Xuan-Son Nguyen
a634d75d1b
mtmd : move helpers to dedicated file (#13442)
* mtmd : move helpers to dedicated file
* fix windows build
* rm redundant include
2025-05-11 11:34:23 +02:00
Concedo
f841b29c41
fixed unicode paths
2025-05-11 14:05:54 +08:00
Xuan-Son Nguyen
15e6125a39
mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434)
* mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl
* fix typo
2025-05-10 19:57:54 +02:00
Xuan-Son Nguyen
3b24d26c22
server : update docs (#13432)
2025-05-10 18:44:49 +02:00
Xuan-Son Nguyen
053367d149
mtmd : support InternVL 2.5 and 3 (#13422)
* convert : internvl support
* InternVL3-1B working
* fix regression
* rm mobilevlm from test
* fix conversion
* add test for internvl
* add to list of pre-quant
* restore boi/eoi check
* add clarify comment for norm eps
2025-05-10 16:26:42 +02:00
Xuan-Son Nguyen
33eff40240
server : vision support via libmtmd (#12898)
* server : (experimental) vision support via libmtmd
* mtmd : add more api around mtmd_image_tokens
* mtmd : add more api around mtmd_image_tokens
* mtmd : ability to calc image hash
* shared_ptr for mtmd_image_tokens
* move hash to user-define ID (fixed)
* abstract out the batch management
* small fix
* refactor logic adding tokens to batch
* implement hashing image
* use FNV hash, now hash bitmap instead of file data
* allow decoding image embedding to be split into batches
* rm whitespace
* disable some features when mtmd is on
* fix --no-mmproj-offload
* mtmd_context_params no timings
* refactor server_inp to server_tokens
* fix the failing test case
* init
* wip
* working version
* add mtmd::bitmaps
* add test target
* rm redundant define
* test: mtmd_input_chunks_free
* rm outdated comment
* fix merging issue
* explicitly create mtmd::input_chunks
* mtmd_input_chunk_copy
* add clone()
* improve server_input struct
* clip : fix confused naming ffn_up and ffn_down
* rm ffn_i/o/g naming
* rename n_embd, n_ff
* small fix
* no check n_ff
* fix detokenize
* add const to various places
* add warning about breaking changes
* add c api
* helper: use mtmd_image_tokens_get_n_pos
* fix ctx_shift
* fix name shadowing
* more strict condition
* support remote image_url
* remote image_url log
* add CI test
* do not log base64
* add "has_multimodal" to /props
* remove dangling image
* speculative: use slot.cache_tokens.insert
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* rm can_be_detokenized
* on prompt processing done, assert cache_tokens.size
* handle_completions_impl returns void
* adapt the new web ui
* update docs and hot topics
* rm assert
* small fix (2)
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-05-09 19:29:37 +02:00
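Commit #12898 identifies images by an FNV hash of the decoded bitmap rather than of the file data. As a reference point, here is a minimal FNV-1a (64-bit) sketch; whether the server uses FNV-1 or FNV-1a, and exactly which bytes it feeds in, are assumptions here, and `fnv1a_64` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// FNV-1a, 64-bit: for each byte, XOR it into the state, then multiply by the
// FNV prime (unsigned arithmetic wraps modulo 2^64). Fast, not cryptographic.
uint64_t fnv1a_64(const uint8_t * data, size_t len) {
    uint64_t hash = 0xcbf29ce484222325ULL; // FNV 64-bit offset basis
    for (size_t i = 0; i < len; ++i) {
        hash ^= data[i];
        hash *= 0x100000001b3ULL;          // FNV 64-bit prime
    }
    return hash;
}
```

Hashing decoded pixels means two files that decode to identical pixels get the same ID, which suits a cache key; collisions remain possible because FNV is not collision-resistant.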
Concedo
6bb44391bd
Merge commit '5c86c9ed3e' into concedo_experimental
# Conflicts:
# tools/imatrix/imatrix.cpp
# tools/mtmd/README.md
# tools/run/README.md
# tools/run/run.cpp
2025-05-10 00:30:18 +08:00
Diego Devesa
27ebfcacba
llama : do not crash if there is no CPU backend (#13395)
* llama : do not crash if there is no CPU backend
* add checks to examples
2025-05-09 13:02:07 +02:00
Bartowski
efb8b47eda
imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (#13389)
* Add --parse-special for enabling parsing of special tokens in imatrix calculation
* whitespace
2025-05-09 11:53:58 +02:00
R0CKSTAR
0527771dd8
llama-run: add support for downloading models from ModelScope (#13370)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-05-09 10:25:50 +01:00
Concedo
42f6930e13
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# ggml/src/ggml-rpc/ggml-rpc.cpp
2025-05-09 17:18:14 +08:00
Xuan-Son Nguyen
2189fd3b63
mtmd : fix batch_view for m-rope (#13397)
* mtmd : fix batch_view for m-rope
* nits : fix comment
2025-05-09 11:18:02 +02:00
Xuan-Son Nguyen
3f96aeff39
llama : one-off chat template fix for Mistral-Small-2503 (#13398)
* llama : one-off chat template fix for Mistral-Small-2503
* update readme
* add mistral-v7-tekken
2025-05-09 11:17:51 +02:00
Xuan-Son Nguyen
d9c4accaff
server : (webui) rename has_multimodal --> modalities (#13393)
* server : (webui) rename has_multimodal --> modalities
* allow converting SVG to PNG
* less complicated code
2025-05-09 09:06:37 +02:00
Concedo
2f5f4ee65a
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# common/CMakeLists.txt
2025-05-09 14:18:20 +08:00
Matt Clayton
f05a6d71a0
mtmd : Expose helper_decode_image_chunk (#13366)
* mtmd: Expose helper_decode_image, output_embd_copy, image_tokens_copy/free
* Slim down
* Cleanups
2025-05-08 20:25:39 +02:00
Xuan-Son Nguyen
ee01d71e58
server : (webui) fix a very small misalignment (#13387)
* server : (webui) fix a very small misalignment
* restore font-bold
2025-05-08 18:51:45 +02:00
Concedo
2439014a03
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# examples/embedding/embedding.cpp
# tools/imatrix/imatrix.cpp
# tools/perplexity/perplexity.cpp
2025-05-08 23:41:02 +08:00
Xuan-Son Nguyen
8c83449cb7
server : (webui) revamp the input area, plus many small UI improvements (#13365)
* rework the input area
* process selected file
* change all icons to heroicons
* fix thought process collapse
* move conversation more menu to sidebar
* sun icon --> moon icon
* rm default system message
* stricter upload file check, only allow image if server has mtmd
* build it
* add renaming
* better autoscroll
* build
* add conversation group
* fix scroll
* extra context first, then user input in the end
* fix <hr> tag
* clean up a bit
* build
* add mb-3 for <pre>
* throttle adjustTextareaHeight to make it less laggy
* (nits) missing padding in sidebar
* rm stray console log
2025-05-08 15:37:29 +02:00
welix
0ccc121354
mtmd : fix the calculation of n_tokens for smolvlm (#13381)
Co-authored-by: Taichi Nishimura <Taichi.A.Nishimura@sony.com>
2025-05-08 15:03:53 +02:00
Georgi Gerganov
6562e5a4d6
context : allow cache-less context for embeddings (#13108)
* context : allow cache-less context for embeddings
ggml-ci
* context : enable reranking with encode()
ggml-ci
* context : encode() clears embd_seq
ggml-ci
* examples : use llama_encode() when appropriate
ggml-ci
* models : nomic bert moe does not require KV cache
* llama : update comments for llama_decode/llama_encode
ggml-ci
* context : update warning log [no ci]
2025-05-08 14:28:33 +03:00
Georgi Gerganov
51fb96b1ff
context : remove logits_all flag (#13284)
* context : remove logits_all flag
ggml-ci
* llama : remove logits_all flag + reorder llama_context_params
ggml-ci
2025-05-08 14:26:50 +03:00
Concedo
38b3bffcef
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# CMakePresets.json
# ggml/src/ggml-cuda/CMakeLists.txt
# tests/test-sampling.cpp
# tools/mtmd/clip.cpp
2025-05-07 19:47:44 +08:00
Xuan-Son Nguyen
32916a4907
clip : refactor graph builder (#13321)
* mtmd : refactor graph builder
* fix qwen2vl
* clean up siglip cgraph
* pixtral migrated
* move minicpmv to a dedicated build function
* move max_feature_layer to build_llava
* use build_attn for minicpm resampler
* fix windows build
* add comment for batch_size
* also support tinygemma3 test model
* qwen2vl does not use RMS norm
* fix qwen2vl norm (2)
2025-05-06 22:40:24 +02:00
Concedo
ffe23f0e93
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# ggml/src/ggml-sycl/ggml-sycl.cpp
# pyproject.toml
2025-05-06 23:39:45 +08:00
Concedo
0fa435b2a6
Merge commit '9b61acf060' into concedo_experimental
# Conflicts:
# Makefile
# docs/multimodal/MobileVLM.md
# docs/multimodal/glmedge.md
# docs/multimodal/llava.md
# docs/multimodal/minicpmo2.6.md
# docs/multimodal/minicpmv2.5.md
# docs/multimodal/minicpmv2.6.md
# requirements/requirements-all.txt
# tools/mtmd/CMakeLists.txt
# tools/mtmd/README.md
# tools/mtmd/android/adb_run.sh
# tools/mtmd/android/build_64.sh
# tools/mtmd/clip-quantize-cli.cpp
2025-05-06 23:34:21 +08:00
Concedo
1377a93a73
Merge commit '5215b91e93' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# cmake/x64-windows-llvm.cmake
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/CMakeLists.txt
# tools/imatrix/imatrix.cpp
# tools/llava/clip.cpp
# tools/rpc/rpc-server.cpp
2025-05-06 23:15:04 +08:00
oobabooga
233461f812
sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (#13264)
* sampling: add Top-nσ sampler to `llama-server` and sampler ordering
* revert: sampler ordering
* revert: VS' crappy auto-formatting
* revert: VS' crappy auto-formatting pt.2
* revert: my crappy eyesight...
* sampling: add XTC to Top-nσ sampler chain
* sampling: add Dyna. Temp. to Top-nσ sampler chain
* sampling: actually remove Top-nσ from sampler (oops)
* Integrate top_n_sigma into main sampler chain
* Define COMMON_SAMPLER_TYPE_TOP_N_SIGMA
* Formatting
* Lint
* Exit early in the sampler if nsigma < 0
---------
Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>
2025-05-05 22:12:19 +02:00
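Commit #13264 moves the Top-nσ sampler into the main sampling chain. Top-nσ keeps only tokens whose logit lies within n standard deviations of the maximum logit. The sketch below applies that rule to a plain vector of logits; the name `top_n_sigma`, the use of the population standard deviation, and skipping already-masked (non-finite) logits are assumptions, not a copy of llama.cpp's sampler.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Illustrative Top-n-sigma filter: logits below max - n * sigma are masked to
// -infinity, where sigma is the population standard deviation of the finite
// logits. A negative n disables the filter (the "exit early" case).
void top_n_sigma(std::vector<float> & logits, float n) {
    if (n < 0.0f) {
        return;
    }
    double sum = 0.0;
    size_t count = 0;
    float max_logit = -std::numeric_limits<float>::infinity();
    for (float l : logits) {
        if (std::isfinite(l)) {
            sum += l;
            ++count;
            max_logit = std::max(max_logit, l);
        }
    }
    if (count == 0) {
        return; // nothing left to filter
    }
    const double mean = sum / count;
    double sq = 0.0;
    for (float l : logits) {
        if (std::isfinite(l)) {
            sq += (l - mean) * (l - mean);
        }
    }
    const double sigma = std::sqrt(sq / count);
    const double threshold = max_logit - n * sigma;
    for (float & l : logits) {
        if (l < threshold) {
            l = -std::numeric_limits<float>::infinity();
        }
    }
}
```

For logits {1, 2, 3, 4} and n = 1, sigma is about 1.118, so the threshold is about 2.88 and only the tokens with logits 3 and 4 survive.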
igardev
b34c859146
server : Webui - change setText command from parent window to also send the message. (#13309)
* setText command from parent window for llama-vscode now sends the message automatically.
* Upgrade packages versions to fix vulnerabilities with "npm audit fix" command.
* Fix code formatting.
* Add index.html.gz changes.
* Revert "Upgrade packages versions to fix vulnerabilities with "npm audit fix" command."
This reverts commit 67687b7fda8a293724ba92ea30bb151677406bc8.
* easier approach
* add setTimeout
---------
Co-authored-by: igardev <ivailo.gardev@akros.ch>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-05 16:03:31 +02:00
Xuan-Son Nguyen
9b61acf060
mtmd : rename llava directory to mtmd (#13311)
* mv llava to mtmd
* change ref everywhere
2025-05-05 16:02:55 +02:00
Xuan-Son Nguyen
5215b91e93
clip : fix confused naming ffn_up and ffn_down (#13290)
* clip : fix confused naming ffn_up and ffn_down
* rm ffn_i/o/g naming
* rename n_embd, n_ff
* small fix
* no check n_ff
2025-05-05 12:54:44 +02:00
Xuan-Son Nguyen
27aa259532
mtmd : add C public API (#13184)
* init
* wip
* working version
* add mtmd::bitmaps
* add test target
* rm redundant define
* test: mtmd_input_chunks_free
* rm outdated comment
* fix merging issue
* explicitly create mtmd::input_chunks
* mtmd_input_chunk_copy
* add clone()
* add const to various places
* add warning about breaking changes
* helper: use mtmd_image_tokens_get_n_pos
2025-05-04 23:43:42 +02:00
Diego Devesa
9fdfcdaedd
rpc : use backend registry, support dl backends (#13304)
2025-05-04 21:25:43 +02:00
Diego Devesa
86bd60d3fe
llava/mtmd : fixes to fully support dl backends (#13303)
2025-05-04 17:05:20 +02:00
Johannes Gäßler
3e959f0976
imatrix: fix oob writes if src1 is not contiguous (#13286)
2025-05-04 00:50:37 +02:00
Xuan-Son Nguyen
36667c8edc
clip : revert the change of BOI/EOI token for GLM-edge (⚠️ breaking change) (#13259)
2025-05-03 20:07:54 +02:00
Concedo
5a2808ffaf
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .flake8
# .github/labeler.yml
# .github/workflows/bench.yml.disabled
# .github/workflows/build-linux-cross.yml
# .github/workflows/build.yml
# .github/workflows/server.yml
# .gitignore
# CMakeLists.txt
# CODEOWNERS
# Makefile
# README.md
# SECURITY.md
# build-xcframework.sh
# ci/run.sh
# docs/development/HOWTO-add-model.md
# docs/multimodal/MobileVLM.md
# docs/multimodal/glmedge.md
# docs/multimodal/llava.md
# docs/multimodal/minicpmo2.6.md
# docs/multimodal/minicpmv2.5.md
# docs/multimodal/minicpmv2.6.md
# examples/CMakeLists.txt
# examples/pydantic_models_to_grammar_examples.py
# grammars/README.md
# pyrightconfig.json
# requirements/requirements-all.txt
# scripts/fetch_server_test_models.py
# scripts/tool_bench.py
# scripts/xxd.cmake
# tests/CMakeLists.txt
# tests/run-json-schema-to-grammar.mjs
# tools/batched-bench/CMakeLists.txt
# tools/batched-bench/README.md
# tools/batched-bench/batched-bench.cpp
# tools/cvector-generator/CMakeLists.txt
# tools/cvector-generator/README.md
# tools/cvector-generator/completions.txt
# tools/cvector-generator/cvector-generator.cpp
# tools/cvector-generator/mean.hpp
# tools/cvector-generator/negative.txt
# tools/cvector-generator/pca.hpp
# tools/cvector-generator/positive.txt
# tools/export-lora/CMakeLists.txt
# tools/export-lora/README.md
# tools/export-lora/export-lora.cpp
# tools/gguf-split/CMakeLists.txt
# tools/gguf-split/README.md
# tools/imatrix/CMakeLists.txt
# tools/imatrix/README.md
# tools/imatrix/imatrix.cpp
# tools/llama-bench/CMakeLists.txt
# tools/llama-bench/README.md
# tools/llama-bench/llama-bench.cpp
# tools/llava/CMakeLists.txt
# tools/llava/README.md
# tools/llava/android/adb_run.sh
# tools/llava/android/build_64.sh
# tools/llava/clip-quantize-cli.cpp
# tools/main/CMakeLists.txt
# tools/main/README.md
# tools/perplexity/CMakeLists.txt
# tools/perplexity/README.md
# tools/perplexity/perplexity.cpp
# tools/quantize/CMakeLists.txt
# tools/rpc/CMakeLists.txt
# tools/rpc/README.md
# tools/rpc/rpc-server.cpp
# tools/run/CMakeLists.txt
# tools/run/README.md
# tools/run/linenoise.cpp/linenoise.cpp
# tools/run/linenoise.cpp/linenoise.h
# tools/run/run.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
# tools/server/bench/README.md
# tools/server/public_simplechat/readme.md
# tools/server/tests/README.md
# tools/server/themes/README.md
# tools/server/themes/buttons-top/README.md
# tools/server/themes/wild/README.md
# tools/tokenize/CMakeLists.txt
# tools/tokenize/tokenize.cpp
2025-05-03 12:15:36 +08:00
Diego Devesa
1d36b3670b
llama : move end-user examples to tools directory (#13249)
* llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00