Commit graph

7892 commits

Author SHA1 Message Date
Concedo
38b3bffcef Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	CMakePresets.json
#	ggml/src/ggml-cuda/CMakeLists.txt
#	tests/test-sampling.cpp
#	tools/mtmd/clip.cpp
2025-05-07 19:47:44 +08:00
Concedo
4e97b69657 updated lite 2025-05-07 18:43:35 +08:00
Concedo
fa22c1a5a4 fixed cfg scale, but turns out it sucks. embedded aria2c into pyinstaller 2025-05-07 18:30:36 +08:00
Concedo
b951310ca5 tryout smaller binaries 2025-05-07 14:56:34 +08:00
Johannes Gäßler
141a908a59
CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (#13135) 2025-05-06 23:35:51 +02:00
Xuan-Son Nguyen
32916a4907
clip : refactor graph builder (#13321)
* mtmd : refactor graph builder

* fix qwen2vl

* clean up siglip cgraph

* pixtral migrated

* move minicpmv to a dedicated build function

* move max_feature_layer to build_llava

* use build_attn for minicpm resampler

* fix windows build

* add comment for batch_size

* also support tinygemma3 test model

* qwen2vl does not use RMS norm

* fix qwen2vl norm (2)
2025-05-06 22:40:24 +02:00
DocShotgun
ffc727203a
sampling : make top_n_sigma no-op at <=0 or a single candidate (#13345) 2025-05-06 22:36:24 +02:00
oobabooga
91a86a6f35
sampling : don't consider -infinity values in top_n_sigma (#13344) 2025-05-06 20:24:15 +02:00
Diego Devesa
f4ed10b69c
cmake : remove arm64 msvc presets (#13342) 2025-05-06 20:15:31 +02:00
Concedo
a5b6f372a3 cfg scale wip 2025-05-07 00:36:00 +08:00
Concedo
ffe23f0e93 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	pyproject.toml
2025-05-06 23:39:45 +08:00
Concedo
0fa435b2a6 Merge commit '9b61acf060' into concedo_experimental
# Conflicts:
#	Makefile
#	docs/multimodal/MobileVLM.md
#	docs/multimodal/glmedge.md
#	docs/multimodal/llava.md
#	docs/multimodal/minicpmo2.6.md
#	docs/multimodal/minicpmv2.5.md
#	docs/multimodal/minicpmv2.6.md
#	requirements/requirements-all.txt
#	tools/mtmd/CMakeLists.txt
#	tools/mtmd/README.md
#	tools/mtmd/android/adb_run.sh
#	tools/mtmd/android/build_64.sh
#	tools/mtmd/clip-quantize-cli.cpp
2025-05-06 23:34:21 +08:00
Concedo
1377a93a73 Merge commit '5215b91e93' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	cmake/x64-windows-llvm.cmake
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	tests/CMakeLists.txt
#	tools/imatrix/imatrix.cpp
#	tools/llava/clip.cpp
#	tools/rpc/rpc-server.cpp
2025-05-06 23:15:04 +08:00
Concedo
38a8778f24 wip cfg scale 2025-05-06 23:06:25 +08:00
Akarshan Biswas
1e333d5bba
SYCL: Disable reorder optimize by default and stop setting tensor extras when optimize is disabled (#13254)
* SYCL: Do not set tensor extras when reorder optimize is disabled

* SYCL: Disable reorder optimize by default
2025-05-06 20:27:06 +05:30
Xuan-Son Nguyen
2f54e348ad
llama : fix build_ffn without gate (#13336)
* llama : fix build_ffn without gate

* fix build on windows

* Revert "fix build on windows"

This reverts commit fc420d3c7eef3481d3d2f313fef2757cb33a7c56.
2025-05-06 14:25:40 +02:00
Johannes Gäßler
2356fb1d53
CUDA: fix bad asserts for partial offload (#13337) 2025-05-06 13:58:51 +02:00
Concedo
13cee48740 embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits)
Squashed commits:

[b9b695217] embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits)

Squashed commits:

[90b5d389d] embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits)

Squashed commits:

[fbbaa989f] embed aria2c for windows
2025-05-06 18:56:02 +08:00
Sigbjørn Skjæret
764b85627b
convert : qwen2/3moe : set yarn metadata if present (#13331)
* set yarn metadata if present

* add comment about enabling YaRN

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
2025-05-06 11:12:06 +02:00
Concedo
9981ba8427 glm4 special BOS handling 2025-05-06 16:41:55 +08:00
Johannes Gäßler
15a28ec8c7
CUDA: fix --split-mode row for MMQ (#13323) 2025-05-06 08:36:46 +02:00
compilade
a7366faa5b
gguf-py : avoid requiring pyside6 for other scripts (#13036)
- gguf-py : remove gguf-py/gguf/scripts/__init__.py because it's not needed

Implicit namespaces are supported since Python 3.3 (https://peps.python.org/pep-0420/),
and the entrypoints in pyproject.toml can directly refer to the main functions.
2025-05-05 22:27:31 -04:00
Johannes Gäßler
9070365020
CUDA: fix logic for clearing padding with -ngl 0 (#13320) 2025-05-05 22:32:13 +02:00
oobabooga
233461f812
sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (#13264)
* sampling: add Top-nσ sampler to `llama-server` and sampler ordering

* revert: sampler ordering

* revert: VS' crappy auto-formatting

* revert: VS' crappy auto-formatting pt.2

* revert: my crappy eye sight...

* sampling: add XTC to Top-nσ sampler chain

* sampling: add Dyna. Temp. to Top-nσ sampler chain

* sampling: actually remove Top-nσ from sampler(oops)

* Integrate top_n_sigma into main sampler chain

* Define COMMON_SAMPLER_TYPE_TOP_N_SIGMA

* Formatting

* Lint

* Exit early in the sampler if nsigma < 0

---------

Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>
2025-05-05 22:12:19 +02:00
Concedo
f59b5eb561 added toggle for guidance 2025-05-05 22:21:46 +08:00
igardev
b34c859146
server : Webui - change setText command from parent window to also send the message. (#13309)
* setText command from parent window for llama-vscode now sends the message automatically.

* Upgrade packages versions to fix vulnerabilities with "npm audit fix" command.

* Fix code formatting.

* Add index.html.gz changes.

* Revert "Upgrade packages versions to fix vulnerabilities with "npm audit fix" command."

This reverts commit 67687b7fda8a293724ba92ea30bb151677406bc8.

* easier approach

* add setTimeout

---------

Co-authored-by: igardev <ivailo.gardev@akros.ch>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-05 16:03:31 +02:00
Xuan-Son Nguyen
9b61acf060
mtmd : rename llava directory to mtmd (#13311)
* mv llava to mtmd

* change ref everywhere
2025-05-05 16:02:55 +02:00
Xuan-Son Nguyen
5215b91e93
clip : fix confused naming ffn_up and ffn_down (#13290)
* clip :  fix confused naming ffn_up and ffn_down

* rm ffn_i/o/g naming

* rename n_embd, n_ff

* small fix

* no check n_ff
2025-05-05 12:54:44 +02:00
Sigbjørn Skjæret
ae803bfc3d
convert : bailingmoe : set yarn metadata if present (#13312) 2025-05-05 12:34:26 +02:00
Concedo
41142ad67a try sm35 2025-05-05 17:35:11 +08:00
Akarshan Biswas
66645a5285
SYCL: Disable mul_mat kernels for noncontiguous tensor b (#13308)
ggml-ci
2025-05-05 13:39:10 +05:30
Xuan-Son Nguyen
27aa259532
mtmd : add C public API (#13184)
* init

* wip

* working version

* add mtmd::bitmaps

* add test target

* rm redundant define

* test: mtmd_input_chunks_free

* rm outdated comment

* fix merging issue

* explicitly create mtmd::input_chunks

* mtmd_input_chunk_copy

* add clone()

* add const to various places

* add warning about breaking changes

* helper: use mtmd_image_tokens_get_n_pos
2025-05-04 23:43:42 +02:00
Diego Devesa
9fdfcdaedd
rpc : use backend registry, support dl backends (#13304) 2025-05-04 21:25:43 +02:00
Aaron Teo
6eb7d25c70
ggml : activate s390x simd for Q3_K (#13301)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-05-04 19:49:12 +02:00
Diego Devesa
86bd60d3fe
llava/mtmd : fixes to fully support dl backends (#13303) 2025-05-04 17:05:20 +02:00
Diego Devesa
9f2da5871f
llama : build windows releases with dl backends (#13220) 2025-05-04 14:20:49 +02:00
Johannes Gäßler
93c4e23905
CUDA: fix race condition in MMQ stream-k fixup (#13299) 2025-05-04 14:16:39 +02:00
Johannes Gäßler
8afbd96818
CUDA: fix race condition in MMQ ids_dst (#13294) 2025-05-04 13:58:38 +02:00
Jeff Bolz
8ae5ebcf85
vulkan: Additional type support for unary, binary, and copy (#13266)
Support f16->f32 copy.
Support f16->f16 and f32->f32 unary ops.
Support all combinations of f16/f32 for src0/src1/dst for add/sub/mul/div.
2025-05-04 07:17:16 +02:00
Johannes Gäßler
3e959f0976
imatrix: fix oob writes if src1 is not contiguous (#13286) 2025-05-04 00:50:37 +02:00
Xuan-Son Nguyen
36667c8edc
clip : revert the change of BOI/EOI token for GLM-edge (⚠️ breaking change) (#13259) 2025-05-03 20:07:54 +02:00
Concedo
a0498ad1b1 update sdui fixed resetting 2025-05-04 00:16:08 +08:00
ymcki
3bf785f3ef
llama : Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843) 2025-05-03 17:39:51 +02:00
Concedo
5a2808ffaf Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.flake8
#	.github/labeler.yml
#	.github/workflows/bench.yml.disabled
#	.github/workflows/build-linux-cross.yml
#	.github/workflows/build.yml
#	.github/workflows/server.yml
#	.gitignore
#	CMakeLists.txt
#	CODEOWNERS
#	Makefile
#	README.md
#	SECURITY.md
#	build-xcframework.sh
#	ci/run.sh
#	docs/development/HOWTO-add-model.md
#	docs/multimodal/MobileVLM.md
#	docs/multimodal/glmedge.md
#	docs/multimodal/llava.md
#	docs/multimodal/minicpmo2.6.md
#	docs/multimodal/minicpmv2.5.md
#	docs/multimodal/minicpmv2.6.md
#	examples/CMakeLists.txt
#	examples/pydantic_models_to_grammar_examples.py
#	grammars/README.md
#	pyrightconfig.json
#	requirements/requirements-all.txt
#	scripts/fetch_server_test_models.py
#	scripts/tool_bench.py
#	scripts/xxd.cmake
#	tests/CMakeLists.txt
#	tests/run-json-schema-to-grammar.mjs
#	tools/batched-bench/CMakeLists.txt
#	tools/batched-bench/README.md
#	tools/batched-bench/batched-bench.cpp
#	tools/cvector-generator/CMakeLists.txt
#	tools/cvector-generator/README.md
#	tools/cvector-generator/completions.txt
#	tools/cvector-generator/cvector-generator.cpp
#	tools/cvector-generator/mean.hpp
#	tools/cvector-generator/negative.txt
#	tools/cvector-generator/pca.hpp
#	tools/cvector-generator/positive.txt
#	tools/export-lora/CMakeLists.txt
#	tools/export-lora/README.md
#	tools/export-lora/export-lora.cpp
#	tools/gguf-split/CMakeLists.txt
#	tools/gguf-split/README.md
#	tools/imatrix/CMakeLists.txt
#	tools/imatrix/README.md
#	tools/imatrix/imatrix.cpp
#	tools/llama-bench/CMakeLists.txt
#	tools/llama-bench/README.md
#	tools/llama-bench/llama-bench.cpp
#	tools/llava/CMakeLists.txt
#	tools/llava/README.md
#	tools/llava/android/adb_run.sh
#	tools/llava/android/build_64.sh
#	tools/llava/clip-quantize-cli.cpp
#	tools/main/CMakeLists.txt
#	tools/main/README.md
#	tools/perplexity/CMakeLists.txt
#	tools/perplexity/README.md
#	tools/perplexity/perplexity.cpp
#	tools/quantize/CMakeLists.txt
#	tools/rpc/CMakeLists.txt
#	tools/rpc/README.md
#	tools/rpc/rpc-server.cpp
#	tools/run/CMakeLists.txt
#	tools/run/README.md
#	tools/run/linenoise.cpp/linenoise.cpp
#	tools/run/linenoise.cpp/linenoise.h
#	tools/run/run.cpp
#	tools/server/CMakeLists.txt
#	tools/server/README.md
#	tools/server/bench/README.md
#	tools/server/public_simplechat/readme.md
#	tools/server/tests/README.md
#	tools/server/themes/README.md
#	tools/server/themes/buttons-top/README.md
#	tools/server/themes/wild/README.md
#	tools/tokenize/CMakeLists.txt
#	tools/tokenize/tokenize.cpp
2025-05-03 12:15:36 +08:00
Concedo
b258e23003 fixed bad merge 2025-05-03 11:43:01 +08:00
Concedo
0951ad9f58 temp merge, not working 2025-05-03 11:42:01 +08:00
Concedo
1228f91ccb even better comfyui handling, dynamic node ids 2025-05-03 11:21:22 +08:00
Concedo
6cb36ce1ae better zenity checks for multilingual 2025-05-03 10:09:47 +08:00
Diego Devesa
1d36b3670b
llama : move end-user examples to tools directory (#13249)
* llama : move end-user examples to tools directory

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00
Georgi Gerganov
b34443923c
sync : ggml (#13268)
* vulkan : kernels for depthwise 2D convolution (CONV_2D_DW) (ggml/1204)

* vulkan : add kernels for depthwise 2d convolution (OP_CONV_2D_DW)

* review: remove src_x/y < 0 checks; add performance tests

* sync : ggml

ggml-ci

* vulkan : fix lint (#0)

---------

Co-authored-by: Acly <aclysia@gmail.com>
2025-05-02 20:54:30 +03:00