Concedo
ec04115ae9
swa options now available
2025-05-24 11:50:37 +08:00
Concedo
748dfcc2e4
massively improved tool calling
2025-05-24 02:26:11 +08:00
Concedo
c4df151298
experimental swa flag
2025-05-23 21:33:26 +08:00
Concedo
499283c63a
rename define to match upstream
2025-05-23 17:10:12 +08:00
Concedo
22ef97d7d3
Merge commit 'ab86335760' into concedo_experimental
...
# Conflicts:
# .github/workflows/release.yml
# examples/retrieval/retrieval.cpp
# examples/simple-chat/simple-chat.cpp
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# requirements/requirements-convert_hf_to_gguf.txt
# requirements/requirements-convert_hf_to_gguf_update.txt
# requirements/requirements-convert_lora_to_gguf.txt
# tools/run/run.cpp
2025-05-23 11:41:36 +08:00
Aaron Teo
ab86335760
common: Include torch package for s390x (#13699)
...
* common: update requirements.txt to include pytorch nightly for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* common: fix torch installation via pip for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-05-22 21:31:29 +03:00
Concedo
ae8f01c2d4
no need for chat start for now
2025-05-22 23:19:29 +08:00
Concedo
fdca5ba71e
declutter
2025-05-22 22:58:47 +08:00
Concedo
69b5d4d4af
cursed hack for glm4, may or may not be better
2025-05-22 22:40:37 +08:00
Concedo
8bd6f9f9ae
added a simple cross platform launch script for unpacked dirs
2025-05-22 22:09:46 +08:00
Georgi Gerganov
cc74d5be99
server : pad small embedding batches (#13692)
...
ggml-ci
2025-05-22 16:33:39 +03:00
Concedo
e68a5f448c
add ddim sampler
2025-05-22 21:28:01 +08:00
Sigbjørn Skjæret
5be24af73d
gguf-py : correct charsmap parameter typing (#13701)
2025-05-22 14:25:05 +02:00
Nicolò Scipione
d394a9aedc
sycl : Remove waits from function calls (#13702)
...
* removes the waits in async memcpy functions
2025-05-22 12:54:43 +01:00
Concedo
f125e724eb
fix off-by-one npast during some instances of fast forwarding
2025-05-22 19:51:21 +08:00
Ewan Crawford
6b56a64690
SYCL: Avoid using with SYCL-Graph for unsupported nodes (#13587)
...
Currently, when running
`GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0` with SYCL
on a CUDA backend, two operations throw an exception from the blocking
waits during queue recording.
* `-o CONCAT`: Use of blocking waits on a queue that's being recorded https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/concat.cpp#L185-L187
* `-o MUL_MAT_ID`: Blocking wait on a recording queue for a copy to host memory https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-sycl/ggml-sycl.cpp#L3072-L3074
We've noticed that `ggml-cuda.cu` has the
[check_node_graph_compatibility_and_refresh_copy_ops](39e73ae0d6/ggml/src/ggml-cuda/ggml-cuda.cu (L2458-L2458))
method for checking if a graph can be used, even if enabled. I've taken a
similar approach in this PR by adding a method to `ggml-sycl.cpp` for checking
if a graph can be used for the operations even if a user has asked for it to be
enabled.
2025-05-22 16:24:09 +08:00
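The compatibility check the SYCL-Graph commit above describes can be sketched as follows. This is an illustrative sketch only, not the backend's actual code; the function and set names are assumptions, and the two op names come from the PR body:

```python
# Before recording a backend graph, scan its nodes and fall back to eager
# execution if any node's op is known to issue blocking waits while the
# queue is being recorded (CONCAT and MUL_MAT_ID, per the PR description).
UNSUPPORTED_OPS = {"CONCAT", "MUL_MAT_ID"}  # assumption: ops with blocking waits

def graph_is_recordable(node_ops, graphs_enabled=True):
    """Return True only if graph capture is enabled and every node is safe."""
    if not graphs_enabled:
        return False
    return all(op not in UNSUPPORTED_OPS for op in node_ops)
```

The point of the design is that the user flag (`GGML_SYCL_DISABLE_GRAPH=0`) only expresses intent; the backend still vetoes graph capture per-graph when an incompatible node is present.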
Concedo
f10574e598
debug text
2025-05-22 14:22:01 +08:00
Henry Linjamäki
a4e8912dfd
opencl: Add support for multiple devices (#12622)
...
* opencl: Add support for multiple devices
... but limited to one platform. A platform with a GPU will be preferred.
Additionally:
* Filter out devices that lack capabilities needed by the backend
implementation (half support, OpenCL 2.0+, etc).
* Make ggml_backend_opencl_reg() thread-safe.
* fixup: fix an error in sync_with_other_backends
... when there is only one OpenCL device available.
2025-05-21 16:21:45 -07:00
Henry Linjamäki
edbf42edfd
opencl: fix a couple of crashes (#12795)
...
* opencl: fix a couple of crashes
* fix kernel launches failing on devices that do not support
non-uniform work-groups. When non-uniform work-groups are not
supported, set `local_work_size` to NULL (= let the driver choose the
work-group sizes). This patch does not cover everything - just the
cases tested by test-backend-ops.
* fix sub-buffer creation failing due to `cl_buffer_region::origin` not
being aligned to `CL_DEVICE_MEM_BASE_ADDR_ALIGN`.
* OpenCL: query non-uniform WG sizes only on OpenCL 3.0+
2025-05-21 13:21:17 -07:00
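The sub-buffer alignment fix in the entry above boils down to integer rounding: OpenCL requires `cl_buffer_region::origin` to be a multiple of the device's `CL_DEVICE_MEM_BASE_ADDR_ALIGN` (reported in bits). A minimal sketch of the arithmetic, with hypothetical names (this is not the backend's code, and how the remainder is consumed is an assumption):

```python
# Align a requested sub-buffer origin down to the device's base-address
# alignment and carry the remainder as an extra offset, growing the region
# so it still covers the originally requested bytes.
def align_sub_buffer(origin, size, base_addr_align_bits):
    align_bytes = base_addr_align_bits // 8        # spec reports bits
    aligned_origin = (origin // align_bytes) * align_bytes  # round down
    offset = origin - aligned_origin               # remainder, e.g. a kernel arg
    return aligned_origin, size + offset, offset
```

With a typical 1024-bit (128-byte) alignment, a requested origin of 100 becomes an aligned origin of 0 plus an offset of 100; an already-aligned origin passes through unchanged.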
Diego Devesa
d643bb2c79
releases : build CPU backend separately (windows) (#13642)
2025-05-21 22:09:57 +02:00
Georgi Gerganov
8e186ef0e7
hparams : support models for which all layers use SWA (#13682)
...
ggml-ci
2025-05-21 20:00:49 +03:00
Georgi Gerganov
5fbfe384d4
server : improve error reporting (#13680)
2025-05-21 19:46:56 +03:00
antichristHater
c76532e7ba
convert : add qwen2vl support for unsloth merges (#13686)
2025-05-21 18:40:35 +02:00
Concedo
440350327c
set random range for seed
2025-05-21 23:47:18 +08:00
Wagner Bruna
5d0cfc9db3
store on the image the actual random seed, for reproducibility (#1549)
2025-05-21 23:40:47 +08:00
Wagner Bruna
7dc3e3e64b
store clip skip value on generated images (#1551)
2025-05-21 23:37:48 +08:00
Concedo
da7fd4aa57
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/musa.Dockerfile
# .github/workflows/build.yml
# README.md
# ci/README.md
# docs/docker.md
# examples/lookahead/lookahead.cpp
# examples/lookup/lookup.cpp
# examples/parallel/parallel.cpp
# ggml/src/ggml-musa/CMakeLists.txt
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-arg-parser.cpp
2025-05-21 23:12:22 +08:00
Sigbjørn Skjæret
2aa777d86d
examples : switch retrieval to llama_encode (#13685)
...
* switch retrieval to llama_encode
* enable --no-warmup for retrieval
2025-05-21 16:57:38 +02:00
Concedo
9f976e9c65
swa full used unless ctx shift and fast forward disabled
2025-05-21 22:47:45 +08:00
Emmanuel Ferdman
eb0f5c28d3
gguf-py : display the invalid gguf type (#13687)
...
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-05-21 16:33:54 +02:00
Xuan-Son Nguyen
cf4cb59e64
ggml : add ggml_gelu_erf() (#13667)
...
* ggml : add ggml_gelu_na (not approximated)
* fix naming order
* rename na --> erf
* apply review suggestions
* revert naming order
2025-05-21 16:26:33 +02:00
Concedo
5b6ed445de
better warning message
2025-05-21 21:47:40 +08:00
Robin Davidsson
0d5c742161
server : Add the endpoints /api/tags and /api/chat (#13659)
...
* Add the endpoints /api/tags and /api/chat
Add the endpoints /api/tags and /api/chat, and improve the model metadata response
* Remove trailing whitespaces
* Removed code that is not needed for copilot to work.
2025-05-21 15:15:27 +02:00
Dorin-Andrei Geman
42158ae2e8
server : fix first message identification (#13634)
...
* server : fix first message identification
When using the OpenAI SDK (https://github.com/openai/openai-node/blob/master/src/lib/ChatCompletionStream.ts#L623-L626) we noticed that the expected assistant role is missing in the first streaming message. Fix this by correctly checking for the first message.
Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
* server : Fix checks for first role message for stream=True
Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
---------
Signed-off-by: Dorin Geman <dorin.geman@docker.com>
Co-authored-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
2025-05-21 15:07:57 +02:00
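The streaming fix above concerns what the first delta of an OpenAI-style stream must carry: clients such as the linked SDK expect the assistant role on the first chunk. A hedged sketch with a hypothetical helper (not the server's actual code):

```python
# Build the "delta" payload for one streamed chat-completion chunk.
# OpenAI-style clients read the role from the first delta of each choice,
# so it must be included exactly once, on the first chunk.
def make_stream_delta(content, is_first):
    delta = {"content": content}
    if is_first:
        delta["role"] = "assistant"  # expected by SDK stream parsers
    return delta
```

The bug class this guards against is emitting every chunk with the same shape, leaving the role unset and confusing role-tracking logic on the client side.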
Georgi Gerganov
797f2ac062
kv-cache : simplify the interface (#13660)
...
* kv-cache : simplify the interface
ggml-ci
* context : revert llama_batch_allocr position change
ggml-ci
2025-05-21 15:11:13 +03:00
Concedo
3fefb3bdf2
Merge commit 'f0adb80bf7' into concedo_experimental
...
# Conflicts:
# docs/backend/CANN.md
# docs/backend/SYCL.md
# docs/docker.md
# examples/sycl/run-llama2.sh
# examples/sycl/win-run-llama2.bat
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tools/llama-bench/README.md
2025-05-21 19:10:57 +08:00
Concedo
c0edde61c5
hey what do you know, it worked
2025-05-21 18:50:05 +08:00
Georgi Gerganov
b44890df2e
model : disable SWA for Phi models (#13676)
...
* model : disable SWA for Phi models
ggml-ci
* model : update warning message
* model : print warning only if n_swa > 0
* model : fix typo
2025-05-21 13:09:21 +03:00
Concedo
d04b4eeb04
merge not working
2025-05-21 18:06:41 +08:00
Concedo
8b6dfbd1be
disabling the gMask prefix for glm-4 completions
2025-05-21 17:29:24 +08:00
Concedo
49305942ab
try disabling the gMask prefix for glm-4 completions
2025-05-21 16:47:08 +08:00
R0CKSTAR
33983057d0
musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (#13647)
...
* musa: fix build warning (unused parameter)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: upgrade MUSA SDK version to rc4.0.1
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Update ggml/src/ggml-cuda/cpy.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-05-21 09:58:49 +08:00
Eve
fb1cab201c
vulkan: fix warnings (#13626)
...
* small fixes
* remove ifdef
2025-05-20 21:35:16 +00:00
Concedo
c64557a851
add darkdetect to reqs
2025-05-21 01:16:15 +08:00
l3utterfly
b7a17463ec
mtmd-helper : bug fix to token batching in mtmd (#13650)
...
* Update mtmd-helper.cpp
* Update tools/mtmd/mtmd-helper.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-05-20 18:55:30 +02:00
Georgi Gerganov
be0239693c
model : fix llama4 graph (#13663)
...
ggml-ci
2025-05-20 19:21:04 +03:00
Georgi Gerganov
a4090d1174
llama : remove llama_kv_cache_view API + remove deprecated (#13653)
...
ggml-ci
2025-05-20 16:13:16 +03:00
Johannes Gäßler
b69f1647f9
CUDA: skip fully masked-out KV in FA vec kernel (#13584)
...
* CUDA: skip fully masked-out KV in FA vec kernel
2025-05-20 14:45:07 +02:00
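The idea named in the commit title above: in flash attention, a KV tile whose mask entries are all -inf contributes zero weight after softmax, so the kernel can skip it entirely. An illustrative sketch of the selection logic (not the CUDA kernel; names and tiling are assumptions):

```python
import math

# Yield indices of KV tiles that contain at least one unmasked position.
# A tile whose mask values are all -inf would get zero softmax weight,
# so processing it is pure wasted work.
def kv_tiles_to_process(mask, tile_size):
    for start in range(0, len(mask), tile_size):
        tile = mask[start:start + tile_size]
        if any(m != -math.inf for m in tile):
            yield start // tile_size
```

This matters most for causal or sliding-window masks, where late query positions see long fully-masked KV prefixes or suffixes.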
Sigbjørn Skjæret
759e37b0d8
tests : avoid github urls due to throttling (#13654)
2025-05-20 12:03:17 +02:00
Svetlozar Georgiev
4245e622e0
sycl: disable reorder for sycl mulmat (#13536)
2025-05-20 11:34:15 +02:00