Commit graph

10692 commits

Author SHA1 Message Date
Sigbjørn Skjæret
d4e0d95cf5
chore : clean up relative source dir paths (#14128) 2025-06-11 19:04:23 +02:00
Sigbjørn Skjæret
cc66a7f78f
tests : add test-tokenizers-repo (#14017) 2025-06-11 17:16:32 +02:00
Jeff Bolz
bd248d4dc7
vulkan: Better thread-safety for command pools/buffers (#14116)
This change moves the command pool/buffer tracking into a vk_command_pool
structure. There are two instances per context (for compute+transfer) and
two instances per device for operations that don't go through a context.
This should prevent separate contexts from stomping on each other.
2025-06-11 09:48:52 -05:00
Aman
7781e5fe99
webui: Wrap long numbers instead of infinite horizontal scroll (#14062)
* webui: Wrap long numbers instead of infinite horizontal scroll

* Use tailwind class

* update index.html.gz
2025-06-11 16:42:25 +02:00
Georgi Gerganov
89a184fa71
kv-cache : relax SWA masking condition (#14119)
ggml-ci
2025-06-11 16:48:45 +03:00
Taylor
2baf07727f
server : pass default --keep argument (#14120) 2025-06-11 13:43:43 +03:00
Georgi Gerganov
7ae2932116
kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121) 2025-06-11 12:52:45 +03:00
Jeff Bolz
1f7d50b293
vulkan: Track descriptor pools/sets per-context (#14109)
Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8)
and move it to the vk_device. Move all the descriptor pool and set tracking to
the context - none of it is specific to pipelines anymore. It has a single vector
of pools and vector of sets, and a single counter to track requests and a single
counter to track use.
2025-06-11 07:19:25 +02:00
lhez
4c763c8d1b
opencl: add mul_mv_id_q4_0_f32_8x_flat (#14003) 2025-06-10 16:55:58 -07:00
compilade
dad5c44398
kv-cache : avoid modifying recurrent cells when setting inputs (#13834)
* kv-cache : avoid modifying recurrent cells when setting inputs

* kv-cache : remove inp_s_mask

It was replaced with equivalent and simpler functionality
with rs_z (the first zeroed state) and the already-existing inp_s_copy.

* kv-cache : fix non-consecutive token pos warning for recurrent models

The problem was apparently caused by how the tail cells were swapped.

* graph : simplify logic for recurrent state copies

* kv-cache : use cell without src refs for rs_z in recurrent cache

* llama-graph : fix recurrent state copy

The `state_copy` shuffle assumes everything is moved at once,
which is not true when `states_extra` is copied back to the cache
before copying the range of states between `head` and `head + n_seqs`.
This is only a problem if any of the cells in [`head`, `head + n_seqs`)
have an `src` in [`head + n_seqs`, `head + n_kv`),
which does happen when `n_ubatch > 1` in the `llama-parallel` example.

Changing the order of the operations avoids the potential overwrite
before use, although when copies are avoided (like with Mamba2),
this will require further changes.

* llama-graph : rename n_state to state_size in build_recurrent_state

This naming should reduce confusion between the state size
and the number of states.
2025-06-10 18:20:14 -04:00
Sigbjørn Skjæret
55f6b9fa65
convert : fix duplicate key DeepSeek-R1 conversion error (#14103) 2025-06-10 23:29:52 +02:00
Concedo
5cdb2d3fc6 cleanup 2025-06-11 01:35:40 +08:00
Sigbjørn Skjæret
3678b838bb
llama : support GEGLU for jina-bert-v2 (#14090) 2025-06-10 18:02:08 +02:00
Jeff Bolz
652b70e667
vulkan: force device 0 in CI (#14106) 2025-06-10 10:53:47 -05:00
Juk Armstrong
3a12db23b6
Fixed spec timings to: accepted/tested instead of accepted/drafted (#14104) 2025-06-10 16:48:07 +01:00
Georgi Gerganov
ae92c1855b sync : ggml
ggml-ci
2025-06-10 18:39:33 +03:00
Georgi Gerganov
b7ce1ad1e3 ggml : fix weak alias win32 (whisper/0)
ggml-ci
2025-06-10 18:39:33 +03:00
henk717
f151648f03 Pyinstaller launcher and dependency updates
This PR adds a new launcher executable to the unpack feature, eliminating the need to have python and its dependencies in the unpacked version. It also does a few dependency changes to help future proof.
2025-06-10 23:08:02 +08:00
Concedo
8386546e08 Switched VS2019 for revert cu12.1 build, hopefully solves dll issues
try change order (+3 squashed commit)

Squashed commit:

[457f02507] try newer jimver

[64af28862] windows pyinstaller shim. the final loader will be moved into the packed directory later.

[0272ecf2d] try alternative way of getting cuda toolkit 12.4 since jimver wont work, also fix rocm
try again (+3 squashed commit)

Squashed commit:

[133e81633] try without pwsh

[4d99cefba] try without pwsh

[bdfa91e7d] try alternative way of getting cuda toolkit 12.4, also fix rocm
2025-06-10 23:08:02 +08:00
0cc4m
97340b4c99
Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (#14099) 2025-06-10 13:01:33 +01:00
Isaac McFadyen
2bb0467043
rpc : nicer error messages for RPC server crash (#14076) 2025-06-10 09:41:01 +03:00
Georgi Gerganov
b8e2194efc sync : ggml
ggml-ci
2025-06-10 09:21:56 +03:00
Kai Pastor
1a3b5e80f7 Add in-build ggml::ggml ALIAS library (ggml/1260)
Enable uniform linking with subproject and with find_package.
2025-06-10 09:21:56 +03:00
Georgi Gerganov
1f63e75f3b
metal : use less stack memory in FA kernel (#14088)
* metal : use less stack memory in FA kernel

ggml-ci

* cont : fix BF16 variant
2025-06-09 23:05:02 +03:00
Georgi Gerganov
40cbf571c9
kv-cache : fix shift and defrag logic (#14081)
* kv-cache : fix shift

ggml-ci

* cont : reset shift[i]

ggml-ci

* cont : fix defrag erasing cells that didn't move

ggml-ci
2025-06-09 23:04:35 +03:00
Diego Devesa
7f4fbe5183
llama : allow building all tests on windows when not using shared libs (#13980)
* llama : allow building all tests on windows when not using shared libraries

* add static windows build to ci

* tests : enable debug logs for test-chat

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-09 20:03:09 +02:00
Concedo
28b35ca879 allow wmma flag for rocm 2025-06-10 01:23:48 +08:00
Concedo
7d8aa31f1f fixed embeddings, added new parameter to limit max embeddings context 2025-06-10 01:11:55 +08:00
xctan
f470bc36be
ggml-cpu : split arch-specific implementations (#13892)
* move ggml-cpu-aarch64 to repack

* split quantize_row_q8_0/1

* split helper functions

* split ggml_vec_dot_q4_0_q8_0

* split ggml_vec_dot_q4_1_q8_1

* split ggml_vec_dot_q5_0_q8_0

* split ggml_vec_dot_q5_1_q8_1

* split ggml_vec_dot_q8_0_q8_0

* split ggml_vec_dot_tq1_0_q8_K

* split ggml_vec_dot_tq2_0_q8_K

* split ggml_vec_dot_q2_K_q8_K

* split ggml_vec_dot_q3_K_q8_K

* split ggml_vec_dot_q4_K_q8_K

* split ggml_vec_dot_q5_K_q8_K

* split ggml_vec_dot_q6_K_q8_K

* split ggml_vec_dot_iq2_xxs_q8_K

* split ggml_vec_dot_iq2_xs_q8_K

* split ggml_vec_dot_iq2_s_q8_K

* split ggml_vec_dot_iq3_xxs_q8_K

* split ggml_vec_dot_iq3_s_q8_K

* split ggml_vec_dot_iq1_s_q8_K

* split ggml_vec_dot_iq1_m_q8_K

* split ggml_vec_dot_iq4_nl_q8_0

* split ggml_vec_dot_iq4_xs_q8_K

* fix typos

* fix missing prototypes

* rename ggml-cpu-quants.c

* rename ggml-cpu-traits

* rename arm folder

* move cpu-feats-x86.cpp

* rename ggml-cpu-hbm

* update arm detection macro in quants.c

* move iq quant tables

* split ggml_quantize_mat_q8_0/K

* split ggml_gemv_*

* split ggml_gemm_*

* rename namespace aarch64 to repack

* use weak aliases to replace test macros

* rename GGML_CPU_AARCH64 to GGML_CPU_REPACK

* rename more aarch64 to repack

* clean up rebase leftover

* fix compilation errors

* remove trailing spaces

* try to fix clang compilation errors

* try to fix clang compilation errors again

* try to fix clang compilation errors, 3rd attempt

* try to fix clang compilation errors, 4th attempt

* try to fix clang compilation errors, 5th attempt

* try to fix clang compilation errors, 6th attempt

* try to fix clang compilation errors, 7th attempt

* try to fix clang compilation errors, 8th attempt

* try to fix clang compilation errors, 9th attempt

* more cleanup

* fix compilation errors

* fix apple targets

* fix a typo in arm version of ggml_vec_dot_q4_K_q8_K

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-09 16:47:13 +02:00
Diego Devesa
8f47e25f56
cuda : fix device sync on buffer clear (#14033) 2025-06-09 16:36:26 +02:00
Georgi Gerganov
201b31dc2e
graph : fix geglu (#14077)
ggml-ci
2025-06-09 17:17:31 +03:00
Xinpeng Dou
e21d2d4ae2
CANN: Simplify the environment variable setting(#13104)
* Simplify the environment variable setting to specify the memory pool type.

* Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options.

* update

* fix CI

* update

* delete whitespace

* fix according to review

* update CANN.md

* update CANN.md
2025-06-09 19:47:39 +08:00
R0CKSTAR
dc0623fddb
webui: fix sidebar being covered by main content (#14082)
* webui: fix sidebar being covered by main content

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* webui: update index.html.gz

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-06-09 12:01:17 +02:00
Georgi Gerganov
87d34b381d
server : fix LRU check (#14079)
ggml-ci
2025-06-09 12:57:58 +03:00
Concedo
8780b33c64 consolidate imports 2025-06-09 17:48:54 +08:00
Nicolò Scipione
b460d16ae8
sycl: Add reorder to Q6_K mmvq implementation (#13885)
* Add Reorder to Q6_K mmvq implementation

* Address PR comments: clean up comments

* Remove unused parameter after refactoring q4_k

* Adding inline to function and removing unnecessary reference to int

---------

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-06-09 11:47:07 +02:00
Concedo
deece4be69 missed a build target 2025-06-09 17:05:56 +08:00
Concedo
68ec00909b updated lite (+1 squashed commits)
Squashed commits:

[375c5768b] updated lite
2025-06-09 16:33:42 +08:00
Đinh Trọng Huy
91a8ee6a6f
add geglu activation function (#14074)
Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
2025-06-09 05:15:31 +01:00
Yuanhao Ji
056eb74534
CANN: Enable labeler for Ascend NPU (#13914) 2025-06-09 11:20:06 +08:00
Diego Devesa
247e5c6e44
cuda : fix buffer type check with integrated GPUs (#14069) 2025-06-08 11:39:56 -07:00
Concedo
82d7c53b85 embeddings handle base64 2025-06-09 00:26:40 +08:00
Concedo
7de88802f9 revert padding change for sd chroma 2025-06-08 23:48:46 +08:00
Concedo
1cf7648305 fixed adapter 2025-06-08 23:24:11 +08:00
Concedo
771bd7197b updated lite (+1 squashed commits)
Squashed commits:

[907f10f2f] updated lite
2025-06-08 23:22:26 +08:00
Concedo
6c5c8be48d try to make rocm work for the github ci, requires disabling rocwmma 2025-06-08 21:52:29 +08:00
Concedo
7f57846c2f update bundled vcrts 2025-06-08 19:39:42 +08:00
Concedo
2d4c1aa5a0 chroma support is now usable 2025-06-08 18:53:59 +08:00
Concedo
30cf433ab4 merge base support for chroma, however its not working correctly 2025-06-08 18:06:23 +08:00
Concedo
dcf88d6e78 Revert "make tts use gpu by default. use --ttscpu to disable"
This reverts commit 669f80265b.
2025-06-08 17:08:04 +08:00