Concedo
b5d3dcb6c0
add workflow for older pc
2025-10-29 17:35:04 +08:00
Concedo
010d0215fa
cleanup
2025-10-27 22:37:53 +08:00
Concedo
eaee2110c3
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# ggml/src/ggml-sycl/backend.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-backend-ops.cpp
2025-10-27 22:36:19 +08:00
Wagner Bruna
c652d08f02
sd: sync to master-340-9e28be6 ( #1816 )
2025-10-27 21:47:48 +08:00
Johannes Gäßler
945501f5ea
llama: fix leaked buffers for mmap + split files ( #16765 )
2025-10-27 09:17:31 +01:00
Aman Gupta
75cbdd3fce
test-backend-ops: print failed tests at the end ( #16785 )
2025-10-27 09:25:10 +08:00
tamarPal
2b9bd9bf4e
sycl: add ROLL operation support ( #16665 )
...
* sycl: add ROLL operation support
- Implement ggml_sycl_roll function for F32 tensors
- Add multi-axis roll operation with SYCL kernel
- Support all 4 tensor dimensions with proper shift normalization
- Add roll.cpp and roll.hpp to SYCL backend
- Update backend dispatch and supports_op for GGML_OP_ROLL
- Tests: 17662/17662 pass with identical CPU reference results
* fix: remove trailing whitespace from roll.cpp
- Fix EditorConfig violations in ggml/src/ggml-sycl/roll.cpp
- Remove trailing spaces from lines 6, 11, 28, 47, 58, 60
* ci: retrigger
* sycl: remove wait() calls from ROLL operation
* fix: editorconfig — LF endings + final newline for roll.hpp
---------
Co-authored-by: tamarPal <tamarPal@example.com>
2025-10-27 09:20:24 +08:00
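The "shift normalization" mentioned in the ROLL commit above can be illustrated with a small sketch. This is not the SYCL kernel from the commit; it is plain Python on nested lists, limited to two axes, and the function name `roll_2d` and the shift direction (positive shifts move elements toward higher indices, wrapping around) are assumptions made for illustration.

```python
def roll_2d(src, shift_rows, shift_cols):
    """Cyclically shift a 2D list along both axes (hypothetical helper)."""
    rows = len(src)
    cols = len(src[0]) if rows else 0
    if rows == 0 or cols == 0:
        return [list(r) for r in src]
    # Normalize shifts so negative or oversized values land in [0, size).
    sr = shift_rows % rows
    sc = shift_cols % cols
    # Each output element reads from the source position shifted backwards.
    return [
        [src[(r - sr) % rows][(c - sc) % cols] for c in range(cols)]
        for r in range(rows)
    ]


if __name__ == "__main__":
    print(roll_2d([[1, 2, 3], [4, 5, 6]], 1, 1))
```

Normalizing the shift once up front means the per-element index computation never sees a negative or out-of-range offset, which is the property the commit message refers to.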
shani-f
59fc1ec8e8
sycl: add REPEAT_BACK operation support ( #16734 )
...
* SYCL repeat_back v1 — add core op + switch case
* Implement repeat_back SYCL operation and minor fixes
* Update ggml/src/ggml-sycl/repeat_back.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update ggml/src/ggml-sycl/repeat_back.hpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update ggml/src/ggml-sycl/ggml-sycl.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-27 09:19:50 +08:00
Aman Gupta
75d33b9302
CUDA: support for weight clamp in top-k norm ( #16702 )
2025-10-27 09:06:16 +08:00
Acly
3470a5c891
ggml-alloc : make gallocr prefer chunks that allow memory reuse ( #16788 )
2025-10-26 23:19:03 +01:00
Sigbjørn Skjæret
bd562fe4f7
cuda : use fast copy when src and dst are of different type and contiguous ( #16789 )
...
* use fast copy when src and dst are contiguous and same shape
* use int64_t ne and ignore shape
2025-10-26 21:31:41 +01:00
leejet
bbac6a26b2
ggml: fix cuda kernel launch configuration for k_compute_batched_ptrs to support large batch ( #16744 )
...
* fix k_compute_batched_ptrs
* add backend ops test
* Update ggml/src/ggml-cuda/ggml-cuda.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* reduce the batch size
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-10-26 19:13:31 +01:00
Sigbjørn Skjæret
73a48c9790
convert : enable expert group selection for all models with it ( #16691 )
2025-10-26 17:21:23 +01:00
Sigbjørn Skjæret
f696428ce8
graph : add clamping to ffn_moe_weights_sum to avoid div-by-zero ( #16655 )
...
* add missing norm topk bias
* use clamping instead, update number and add comment
2025-10-26 17:20:32 +01:00
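The idea behind the clamping commit above, dividing by a sum that is first clamped to a small positive minimum, can be sketched as follows. This is a minimal illustration in plain Python, not the graph code from the commit; the function name `normalize_weights` and the value of `min_sum` are assumptions.

```python
def normalize_weights(weights, min_sum=1e-6):
    """Divide each weight by the total, clamping the total away from zero.

    `min_sum` is a hypothetical lower bound chosen for illustration.
    """
    total = sum(weights)
    # Clamp the denominator so an all-zero input cannot divide by zero.
    total = max(total, min_sum)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(normalize_weights([1.0, 3.0]))
    print(normalize_weights([0.0, 0.0]))
```

For inputs whose sum is already above the bound, the clamp has no effect and the result is the ordinary normalization.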
Sigbjørn Skjæret
7cce4f8158
model : set res->t_embd in SmallThinker models ( #16782 )
2025-10-26 16:08:52 +01:00
amirai21
8d8862829c
docs : add Jamba to Text-only models list ( #16778 )
2025-10-26 13:01:20 +01:00
Aman Gupta
f77c13b91f
CUDA: General GEMV fusion ( #16715 )
2025-10-26 19:28:04 +08:00
Concedo
d229774e11
added compatibility endpoint for VITS api
2025-10-26 17:35:10 +08:00
Gilad S.
3cfa9c3f12
vulkan: deduplicate Microsoft Direct3D12 devices ( #16689 )
...
* fix: deduplicate and deprioritize Microsoft Direct3D12 vulkan devices from the `vulkan-dozen` driver
* style: indent
* fix: decrease priority
* fix: switch to `||`
2025-10-26 05:37:38 +01:00
Concedo
b730c99ecb
fixed a typo
2025-10-26 10:06:59 +08:00
Galunid
5d195f17bc
convert : handle mmproj filename/path properly ( #16760 )
...
* convert: handle mmproj model output filename properly
* remove redundant commits
* Add model_type to gguf utility
* Use mmproj- prefix instead of suffix
* Apply CISC suggestion
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-25 20:41:36 +02:00
Concedo
f03ac3e08e
updated lite
2025-10-25 22:38:39 +08:00
Concedo
59fafefbe6
Merge branch 'upstream' into concedo_experimental
2025-10-25 22:38:24 +08:00
Wagner Bruna
2cab657c60
sd: sync to master-336-917f7bf ( #1810 )
2025-10-25 21:19:35 +08:00
Shunta Saito
226f295f4d
model : set res->t_embd in PLaMo2 models ( #16766 )
2025-10-25 12:26:27 +02:00
Giuseppe Scrivano
f90b4a8efe
vulkan: delete dead code ( #16732 )
...
ggml_vk_create_buffer_temp is not used anywhere, and it is the only
caller for ggml_vk_pool_malloc.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2025-10-25 10:59:54 +02:00
Concedo
9b842edc9a
Merge branch 'concedo_experimental' of https://github.com/LostRuins/koboldcpp into concedo_experimental
2025-10-25 16:31:12 +08:00
Concedo
27bf4454d4
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# examples/model-conversion/scripts/causal/run-org-model.py
# tests/test-backend-ops.cpp
2025-10-25 16:30:57 +08:00
Jeff Bolz
8423d01931
vulkan: Optimize SSM_SCAN ( #16645 )
2025-10-25 07:04:12 +02:00
tsite
97867f1990
add alt umt5xxl tensor name ( #1813 )
2025-10-25 12:48:08 +08:00
compilade
5cca2542ac
convert : avoid dequantizing mxfp4 for GPT-OSS ( #16756 )
2025-10-24 20:52:00 -04:00
leejet
55945d2ef5
ggml: fix CUDA grid launch condition for large block_nums.y in binbcast ( #16742 )
...
* Fix CUDA grid launch condition for large block_nums.y
* add backend ops test
* reduce test repetitions
2025-10-24 21:39:37 +02:00
Concedo
da5f5db940
updated lite
2025-10-24 23:38:31 +08:00
Aman Gupta
0bcb40b48c
CUDA: use CUB for arbitrary size argsort ( #16754 )
2025-10-24 20:46:19 +08:00
Florian Badie
69e9ff0103
webui: support q URL parameter ( #16728 )
...
* webui: support q URL parameter
Fixes #16722
I’ve checked that it works with Firefox’s AI tools
* webui: apply suggestions from code review
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* chore: update webui static build
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-10-24 14:10:29 +02:00
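Reading a `q` query parameter from a page URL, as the webui commit above describes, might look like the sketch below. The actual webui is not written in Python; this only illustrates the parsing step with the standard library, and the function name `initial_prompt_from_url` and the example URLs are assumptions.

```python
from urllib.parse import parse_qs, urlparse


def initial_prompt_from_url(url):
    """Return the first value of the `q` query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("q")
    return values[0] if values else None


if __name__ == "__main__":
    print(initial_prompt_from_url("http://localhost:8080/?q=hello+world"))
```

Note that `parse_qs` drops blank values by default, so a bare `?q=` is treated here the same as a missing parameter.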
Concedo
57e1d9c822
rename blasbatchsize to batchsize
2025-10-24 18:16:54 +08:00
Concedo
3712c6e6cd
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# requirements/requirements-convert_hf_to_gguf.txt
# tools/imatrix/CMakeLists.txt
# tools/run/CMakeLists.txt
2025-10-24 18:12:16 +08:00
Daniel Bevenius
5a91109a5d
model-conversion : add trust_remote_code for orig model run [no ci] ( #16751 )
...
This commit adds the trust_remote_code=True argument when loading models
using AutoConfig, AutoTokenizer, and AutoModelForCausalLM in the
run-original-model script.
The motivation for this is that some models require custom code to be
loaded properly, and setting trust_remote_code=True avoids a prompt
asking for user confirmation:
```console
(venv) $ make causal-run-original-model
The repository /path/to/model contains custom code which must be
executed to correctly load the model. You can inspect the repository
content at /path/to/model.
Do you wish to run the custom code? [y/N] N
```
Having this as the default seems like a safe choice as we have to clone
or download the models we convert and would be expecting to run any
custom code they have.
2025-10-24 12:02:02 +02:00
Concedo
c18d6a23ff
revert for mistral common fix merge
2025-10-24 17:55:36 +08:00
Concedo
68c9d955d2
support multiple override kv
2025-10-24 17:28:54 +08:00
compilade
f8f071fadd
convert : handle pre-quantized models ( #14810 )
...
* convert : begin handling pre-quantized models
* convert : fix conversion from FP8 for Deepseek-V3.1-Base
2025-10-23 16:31:41 -04:00
Johannes Gäßler
0bf47a1dbb
server: add memory breakdown print ( #16740 )
2025-10-23 21:30:17 +02:00
Wagner Bruna
fef73919ea
sd: clean up changes against stable-diffusion.cpp 90ef5f8 ( #1804 )
...
* sd: clean up changes against stable-diffusion.cpp 90ef5f8
Clean up the diff, and include a few missing changes, mainly from
the upscaler and model weight type statistics.
* added line clear again
* remove excess spaces
---------
Co-authored-by: LostRuins Concedo <39025047+LostRuins@users.noreply.github.com>
2025-10-23 22:00:33 +08:00
Julien Denize
dd62dcfab9
convert : Make mistral-common dependency optional ( #16738 )
...
* Make mistral-common dependency optional
* Fix typing
2025-10-23 15:54:46 +02:00
Xuan-Son Nguyen
d0660f237a
mtmd-cli : allow using --jinja ( #16718 )
...
* mtmd-cli : allow using --jinja
* support -sys
* implement chat_history
* fix clear memory
* rm -sys support, added TODO
2025-10-23 15:00:49 +02:00
Concedo
0aaa8ca3ab
update lite
2025-10-23 19:38:54 +08:00
Prajwal B Mehendarkar
fe6a9882ac
Manually link -lbsd to resolve flock symbol on AIX ( #16610 )
2025-10-23 19:37:31 +08:00
Aman Gupta
061f0eff02
ggml-cuda: use passed ops instead of hardcoded ops ( #16712 )
2025-10-23 19:14:06 +08:00
matteo
8cf6b42d46
server : send partial stop string when <EOG> is reached ( #15007 )
2025-10-23 12:32:24 +03:00
Concedo
12a8bfd453
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CODEOWNERS
# README.md
# docs/ops.md
# docs/ops/SYCL.csv
# docs/ops/Vulkan.csv
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/backend.hpp
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/element_wise.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-backend-ops.cpp
# tests/test-thread-safety.cpp
2025-10-23 17:22:17 +08:00