Commit graph

8148 commits

Author SHA1 Message Date
Concedo
779a41f23e Merge commit 'c3a2624339' into concedo_experimental 2025-05-24 22:56:02 +08:00
Concedo
f97bbdde00 fix to allow all EOGs to trigger a stop, occam's glm4 fix, 2025-05-24 22:55:11 +08:00
Sigbjørn Skjæret
c3a2624339
vocab : fix ugm tokenizer precision (#13743) 2025-05-24 12:29:09 +02:00
Concedo
bd7a40f326 fix alpaca leak too broad 2025-05-24 18:26:04 +08:00
Johannes Gäßler
ffd0eae60b
CUDA: fix race condition in FA vector kernels (#13742) 2025-05-24 11:46:19 +02:00
Concedo
0dca953d78 removed winget workflow 2025-05-24 16:40:39 +08:00
Concedo
55cc9acec5 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/release.yml
#	README.md
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/ggml-cann.cpp
#	tools/mtmd/CMakeLists.txt
#	tools/mtmd/clip.cpp
#	tools/mtmd/clip.h
2025-05-24 12:10:36 +08:00
Concedo
ec04115ae9 swa options now available 2025-05-24 11:50:37 +08:00
Diego Devesa
b775345d78
ci : enable winget package updates (#13734) 2025-05-23 23:14:00 +03:00
Diego Devesa
a70a8a69c2
ci : add winget package updater (#13732) 2025-05-23 22:09:38 +02:00
Concedo
748dfcc2e4 massively improved tool calling 2025-05-24 02:26:11 +08:00
Georgi Gerganov
d13d0f6135
hparams : initialize arrays (#13728)
ggml-ci
2025-05-23 20:16:13 +03:00
Xuan-Son Nguyen
8a2afb7520
llama : allow custom list of swa_layers (#13726) 2025-05-23 17:07:04 +02:00
Concedo
c4df151298 experimental swa flag 2025-05-23 21:33:26 +08:00
Concedo
499283c63a rename define to match upstream 2025-05-23 17:10:12 +08:00
Xuan-Son Nguyen
9ecf3e66a3
server : support audio input (#13714)
* server : support audio input

* add audio support on webui
2025-05-23 11:03:47 +02:00
Chenguang Li
faaaff5f94
CANN: Support MUL_MAT_ID for q8_0 and q4_0 (#13705)
* [CANN]Support MUL_MAT_ID Q8 && Q4

Signed-off-by: noemotiovon <757486878@qq.com>

* codestyle adjustment

Signed-off-by: noemotiovon <757486878@qq.com>

---------

Signed-off-by: noemotiovon <757486878@qq.com>
2025-05-23 16:47:53 +08:00
Xuan-Son Nguyen
e16c4731c7
ggml : fix the order of ggml_unary_op (#13718) 2025-05-23 08:12:48 +02:00
Jeff Bolz
1dcd01960c
vulkan: support CPY from any type to itself (#13695)
Reuse the f16/f32 copy shaders, and just scale the number of elements
according to the type size.
2025-05-23 06:45:02 +02:00
Jeff Bolz
c10ed6cbcc
vulkan: Disable coopmat/coopmat2/bfloat extensions if glslc doesn't support it (#13696) 2025-05-23 06:33:45 +02:00
Judd
a127ff1780
use LOG_WARN to replace std::cerr (#13657) 2025-05-23 06:33:08 +02:00
Concedo
22ef97d7d3 Merge commit 'ab86335760' into concedo_experimental
# Conflicts:
#	.github/workflows/release.yml
#	examples/retrieval/retrieval.cpp
#	examples/simple-chat/simple-chat.cpp
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	requirements/requirements-convert_hf_to_gguf.txt
#	requirements/requirements-convert_hf_to_gguf_update.txt
#	requirements/requirements-convert_lora_to_gguf.txt
#	tools/run/run.cpp
2025-05-23 11:41:36 +08:00
Diego Devesa
3079e9ac8e
release : fix windows hip release (#13707)
* release : fix windows hip release

* make single hip release with multiple targets
2025-05-23 00:21:37 +02:00
Georgi Gerganov
8a1d206f1d
tts : fix n_ubatch + make WavTokenizer cache-less (#13713)
ggml-ci
2025-05-22 22:21:07 +03:00
Xuan-Son Nguyen
797990c4bc
mtmd : add ultravox audio input (#13623)
* convert ok, load ok

* warmup ok

* test

* still does not work?

* fix padding

* temporary give up

* fix merge conflict

* build_ultravox()

* rm test

* fix merge conflict

* add necessary mtmd APIs

* first working version (only 4s of audio)

* will this monster compile?

* fix compile

* please compile

* fPIC

* fix windows

* various fixes

* clean up audio_helpers

* fix conversion

* add some debug stuff

* long audio input ok

* adapt the api

* add --audio arg

* final touch UX

* add miniaudio to readme

* fix typo

* refactor kv metadata

* mtmd_default_marker()
2025-05-22 20:42:48 +02:00
Aaron Teo
ab86335760
common: Include torch package for s390x (#13699)
* common: update requirements.txt to include pytorch nightly for s390x

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* common: fix torch installation via pip for s390x

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-05-22 21:31:29 +03:00
Concedo
ae8f01c2d4 no need for chat start for now 2025-05-22 23:19:29 +08:00
Concedo
fdca5ba71e declutter 2025-05-22 22:58:47 +08:00
Concedo
69b5d4d4af cursed hack for glm4, may or may not be better 2025-05-22 22:40:37 +08:00
Concedo
8bd6f9f9ae added a simple cross platform launch script for unpacked dirs 2025-05-22 22:09:46 +08:00
Georgi Gerganov
cc74d5be99
server : pad small embedding batches (#13692)
ggml-ci
2025-05-22 16:33:39 +03:00
Concedo
e68a5f448c add ddim sampler 2025-05-22 21:28:01 +08:00
Sigbjørn Skjæret
5be24af73d
gguf-py : correct charsmap parameter typing (#13701) 2025-05-22 14:25:05 +02:00
Nicolò Scipione
d394a9aedc
sycl : Remove waits from function calls (#13702)
* removes the waits in async memcpy functions
2025-05-22 12:54:43 +01:00
Concedo
f125e724eb fix off-by-one npast during some instances of fast forwarding 2025-05-22 19:51:21 +08:00
Ewan Crawford
6b56a64690
SYCL: Avoid using with SYCL-Graph for unsupported nodes (#13587)
Currently on a CUDA backend to SYCL when running
`GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0` there
are two operations that throw an exception from the blocking
waits during queue recording.

* `-o CONCAT` : Use of blocking waits on a queue that's being recorded https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187
* `-o MUL_MAT_ID`: Blocking wait on a recording queue for a copy to host memory https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074

We've noticed that `ggml-cuda.cu` has the
[check_node_graph_compatibility_and_refresh_copy_ops](39e73ae0d6/ggml/src/ggml-cuda/ggml-cuda.cu (L2458-L2458))
method for checking if a graph can be used, even if enabled. I've taken a
similar approach in this PR by adding a method to `ggml-sycl.cpp` for checking
if a graph can be used for the operations even if a user has asked for it to be
enabled.
2025-05-22 16:24:09 +08:00
Concedo
f10574e598 debug text 2025-05-22 14:22:01 +08:00
Henry Linjamäki
a4e8912dfd
opencl: Add support for multiple devices (#12622)
* opencl: Add support for multiple devices

... but limited to one platform. A platform with a GPU will be preferred.

Additionally:

* Filter out devices that lack capabilities needed by the backend
  implementation (half support, OpenCL 2.0+, etc).

* Make ggml_backend_opencl_reg() thread-safe.

* fixup: fix an error in sync_with_other_backends

... when there is only one OpenCL device available.
2025-05-21 16:21:45 -07:00
Henry Linjamäki
edbf42edfd
opencl: fix couple crashes (#12795)
* opencl: fix couple crashes

* fix kernel launches failed on devices which do not support
  non-uniform work-groups. When non-uniform work-groups are not
  supported, set `local_work_size` to NULL (= let driver choose the
  work-group sizes). This patch does not cover everything - just the
  cases tested by test-backend-ops.

* fix sub-buffer creation failed due to `cl_buffer_region::origin` not
  being aligned to `CL_DEVICE_MEM_BASE_ADDR_ALIGN`.

* OpenCL: query non-uniform WG sizes only on OpenCL 3.0+
2025-05-21 13:21:17 -07:00
Diego Devesa
d643bb2c79
releases : build CPU backend separately (windows) (#13642) 2025-05-21 22:09:57 +02:00
Georgi Gerganov
8e186ef0e7
hparams : support models for which all layers use SWA (#13682)
ggml-ci
2025-05-21 20:00:49 +03:00
Georgi Gerganov
5fbfe384d4
server : improve error reporting (#13680) 2025-05-21 19:46:56 +03:00
antichristHater
c76532e7ba
convert : add qwen2vl support for unsloth merges (#13686) 2025-05-21 18:40:35 +02:00
Concedo
440350327c set random range for seed 2025-05-21 23:47:18 +08:00
Wagner Bruna
5d0cfc9db3
store on the image the actual random seed, for reproducibility (#1549) 2025-05-21 23:40:47 +08:00
Wagner Bruna
7dc3e3e64b
store clip skip value on generated images (#1551) 2025-05-21 23:37:48 +08:00
Concedo
da7fd4aa57 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/musa.Dockerfile
#	.github/workflows/build.yml
#	README.md
#	ci/README.md
#	docs/docker.md
#	examples/lookahead/lookahead.cpp
#	examples/lookup/lookup.cpp
#	examples/parallel/parallel.cpp
#	ggml/src/ggml-musa/CMakeLists.txt
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	tests/test-arg-parser.cpp
2025-05-21 23:12:22 +08:00
Sigbjørn Skjæret
2aa777d86d
examples : switch retrieval to llama_encode (#13685)
* switch retrieval to llama_encode

* enable --no-warmup for retrieval
2025-05-21 16:57:38 +02:00
Concedo
9f976e9c65 swa full used unless ctx shift and fast forward disabled 2025-05-21 22:47:45 +08:00
Emmanuel Ferdman
eb0f5c28d3
gguf-py : display the invalid gguf type (#13687)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-05-21 16:33:54 +02:00