Commit graph

10692 commits

Author SHA1 Message Date
Diego Devesa
5a4ff43e7d
llama : disable pipeline parallelism if compute buffer allocation fails (#16748) 2025-10-27 21:51:28 +01:00
Acly
10640e31aa
ggml : fix interpolate with align-corners and ne=1 (#16700)
* ggml : fix interpolate with align-corners and ne=1

* avoid division by zero if one of the spatial dimensions is 1
* cpu, cuda, opencl returned correct result anyway due to clamp
* vulkan didn't clamp for align-corners so results were broken

* fix clang warning
2025-10-27 21:50:22 +01:00
Johannes Gäßler
80d28f104c
HIP: fix AMDGPU_TARGETS, update documentation (#16803) 2025-10-27 21:39:49 +01:00
Xuan-Son Nguyen
c55d53acec
model : add LightOnOCR-1B model (#16764)
* model : add LightOnOCR-1B model

* add test
2025-10-27 16:02:58 +01:00
Concedo
010d0215fa cleanup 2025-10-27 22:37:53 +08:00
Concedo
eaee2110c3 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	ggml/src/ggml-sycl/backend.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	tests/test-backend-ops.cpp
2025-10-27 22:36:19 +08:00
Wagner Bruna
c652d08f02
sd: sync to master-340-9e28be6 (#1816) 2025-10-27 21:47:48 +08:00
Johannes Gäßler
945501f5ea
llama: fix leaked buffers for mmap + split files (#16765)
Some checks failed
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Has been cancelled
Python check requirements.txt / check-requirements (push) Has been cancelled
Python Type-Check / pyright type-check (push) Has been cancelled
2025-10-27 09:17:31 +01:00
Aman Gupta
75cbdd3fce
test-backend-ops: print failed tests at the end (#16785) 2025-10-27 09:25:10 +08:00
tamarPal
2b9bd9bf4e
sycl: add ROLL operation support (#16665)
* sycl: add ROLL operation support

- Implement ggml_sycl_roll function for F32 tensors
- Add multi-axis roll operation with SYCL kernel
- Support all 4 tensor dimensions with proper shift normalization
- Add roll.cpp and roll.hpp to SYCL backend
- Update backend dispatch and supports_op for GGML_OP_ROLL
- Tests: 17662/17662 pass with identical CPU reference results

* fix: remove trailing whitespace from roll.cpp

- Fix EditorConfig violations in ggml/src/ggml-sycl/roll.cpp
- Remove trailing spaces from lines 6, 11, 28, 47, 58, 60

* ci: retrigger

* sycl: remove wait() calls from ROLL operation

* fix: editorconfig — LF endings + final newline for roll.hpp

---------

Co-authored-by: tamarPal <tamarPal@example.com>
2025-10-27 09:20:24 +08:00
shani-f
59fc1ec8e8
sycl: add REPEAT_BACK operation support (#16734)
* SYCL repeat_back v1 — add core op + switch case

* Implement repeat_back SYCL operation and minor fixes

* Update ggml/src/ggml-sycl/repeat_back.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update ggml/src/ggml-sycl/repeat_back.hpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update ggml/src/ggml-sycl/ggml-sycl.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-27 09:19:50 +08:00
Aman Gupta
75d33b9302
CUDA: support for weight clamp in top-k norm (#16702) 2025-10-27 09:06:16 +08:00
Acly
3470a5c891
ggml-alloc : make gallocr prefer chunks that allow memory reuse (#16788) 2025-10-26 23:19:03 +01:00
Sigbjørn Skjæret
bd562fe4f7
cuda : use fast copy when src and dst are of different type and contiguous (#16789)
* use fast copy when src and dst are contiguous and same shape

* use int64_t ne and ignore shape
2025-10-26 21:31:41 +01:00
leejet
bbac6a26b2
ggml: fix cuda kernel launch configuration for k_compute_batched_ptrs to support large batch (#16744)
* fix k_compute_batched_ptrs

* add backend ops test

* Update ggml/src/ggml-cuda/ggml-cuda.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* reduce the batch size

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-10-26 19:13:31 +01:00
Sigbjørn Skjæret
73a48c9790
convert : enable expert group selection for all models with it (#16691) 2025-10-26 17:21:23 +01:00
Sigbjørn Skjæret
f696428ce8
graph : add clamping to ffn_moe_weights_sum to avoid div-by-zero (#16655)
* add missing norm topk bias

* use clamping instead, update number and add comment
2025-10-26 17:20:32 +01:00
Sigbjørn Skjæret
7cce4f8158
model : set res->t_embd in SmallThinker models (#16782) 2025-10-26 16:08:52 +01:00
amirai21
8d8862829c
docs : add Jamba to Text-only models list (#16778) 2025-10-26 13:01:20 +01:00
Aman Gupta
f77c13b91f
CUDA: General GEMV fusion (#16715) 2025-10-26 19:28:04 +08:00
Concedo
d229774e11 added compatibility endpoint for VITS api 2025-10-26 17:35:10 +08:00
Gilad S.
3cfa9c3f12
vulkan: deduplicate Microsoft Direct3D12 devices (#16689)
* fix: deduplicate and deprioritize Microsoft Direct3D12 vulkan devices from the `vulkan-dozen` driver

* style: indent

* fix: decrease priority

* fix: switch to `||`
2025-10-26 05:37:38 +01:00
Concedo
b730c99ecb fixed a typo 2025-10-26 10:06:59 +08:00
Galunid
5d195f17bc
convert : handle mmproj filename/path properly (#16760)
* convert: handle mmproj model output filename properly

* remove redundant commits

* Add model_type to gguf utility

* Use mmproj- prefix instead of suffix

* Apply CISC suggestion

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-25 20:41:36 +02:00
Concedo
f03ac3e08e updated lite 2025-10-25 22:38:39 +08:00
Concedo
59fafefbe6 Merge branch 'upstream' into concedo_experimental 2025-10-25 22:38:24 +08:00
Wagner Bruna
2cab657c60
sd: sync to master-336-917f7bf (#1810) 2025-10-25 21:19:35 +08:00
Shunta Saito
226f295f4d
model : set res->t_embd in PLaMo2 models (#16766) 2025-10-25 12:26:27 +02:00
Giuseppe Scrivano
f90b4a8efe
vulkan: delete dead code (#16732)
ggml_vk_create_buffer_temp is not used anywhere, and it is the only
caller for ggml_vk_pool_malloc.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2025-10-25 10:59:54 +02:00
Concedo
9b842edc9a Merge branch 'concedo_experimental' of https://github.com/LostRuins/koboldcpp into concedo_experimental 2025-10-25 16:31:12 +08:00
Concedo
27bf4454d4 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/model-conversion/scripts/causal/run-org-model.py
#	tests/test-backend-ops.cpp
2025-10-25 16:30:57 +08:00
Jeff Bolz
8423d01931
vulkan: Optimize SSM_SCAN (#16645) 2025-10-25 07:04:12 +02:00
tsite
97867f1990
add alt umt5xxl tensor name (#1813) 2025-10-25 12:48:08 +08:00
compilade
5cca2542ac
convert : avoid dequantizing mxfp4 for GPT-OSS (#16756)
Some checks failed
Python Type-Check / pyright type-check (push) Has been cancelled
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Has been cancelled
Python check requirements.txt / check-requirements (push) Has been cancelled
2025-10-24 20:52:00 -04:00
leejet
55945d2ef5
ggml: fix CUDA grid launch condition for large block_nums.y in binbcast (#16742)
* Fix CUDA grid launch condition for large block_nums.y

* add backend ops test

* reduce test  repetitions
2025-10-24 21:39:37 +02:00
Concedo
da5f5db940 updated lite 2025-10-24 23:38:31 +08:00
Aman Gupta
0bcb40b48c
CUDA: use CUB for arbitary size argsort (#16754) 2025-10-24 20:46:19 +08:00
Florian Badie
69e9ff0103
webui: support q URL parameter (#16728)
* webui: support q URL parameter

Fixes #16722
I’ve checked that it works with Firefox’s AI tools

* webui: apply suggestions from code review

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* chore: update webui static build

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-10-24 14:10:29 +02:00
Concedo
57e1d9c822 rename blasbatchsize to batchsize 2025-10-24 18:16:54 +08:00
Concedo
3712c6e6cd Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	requirements/requirements-convert_hf_to_gguf.txt
#	tools/imatrix/CMakeLists.txt
#	tools/run/CMakeLists.txt
2025-10-24 18:12:16 +08:00
Daniel Bevenius
5a91109a5d
model-conversion : add trust_remote_code for orig model run [no ci] (#16751)
This commit add the trust_remote_code=True argument when loading models
using AutoConfig, AutoTokenizer, and AutoModelForCausalLM for the run
original model script.

The motivation for this is that some models require custom code to be
loaded properly, and setting trust_remote_code=True avoids a prompt
asking for user confirmation:
```console
(venv) $ make causal-run-original-model
The repository /path/to/model contains custom code which must be
executed to correctly load the model. You can inspect the repository
content at /path/to/model.

Do you wish to run the custom code? [y/N] N
```

Having this as the default seems like a safe choice as we have to clone
or download the models we convert and would be expecting to run any
custom code they have.
2025-10-24 12:02:02 +02:00
Concedo
c18d6a23ff revert for mistral common fix merge 2025-10-24 17:55:36 +08:00
Concedo
68c9d955d2 support multiple override kv 2025-10-24 17:28:54 +08:00
compilade
f8f071fadd
convert : handle pre-quantized models (#14810)
Some checks are pending
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Waiting to run
Python check requirements.txt / check-requirements (push) Waiting to run
Python Type-Check / pyright type-check (push) Waiting to run
* convert : begin handling pre-quantized models

* convert : fix conversion from FP8 for Deepseek-V3.1-Base
2025-10-23 16:31:41 -04:00
Johannes Gäßler
0bf47a1dbb
server: add memory breakdown print (#16740) 2025-10-23 21:30:17 +02:00
Wagner Bruna
fef73919ea
sd: clean up changes against stable-diffusion.cpp 90ef5f8 (#1804)
* sd: clean up changes against stable-diffusion.cpp 90ef5f8

Clean up the diff, and include a few missing changes, mainly from
the upscaler and model weight type statistics.

* added line clear again

* remove excess spaces

---------

Co-authored-by: LostRuins Concedo <39025047+LostRuins@users.noreply.github.com>
2025-10-23 22:00:33 +08:00
Julien Denize
dd62dcfab9
convert : Make mistral-common dependency optional (#16738)
* Make mistral-common dependency optional

* Fix typing
2025-10-23 15:54:46 +02:00
Xuan-Son Nguyen
d0660f237a
mtmd-cli : allow using --jinja (#16718)
* mtmd-cli : allow using --jinja

* support -sys

* implement chat_history

* fix clear memory

* rm -sys support, added TODO
2025-10-23 15:00:49 +02:00
Concedo
0aaa8ca3ab update lite 2025-10-23 19:38:54 +08:00
Prajwal B Mehendarkar
fe6a9882ac
Manually link -lbsd to resolve flock symbol on AIX (#16610) 2025-10-23 19:37:31 +08:00