Concedo
52cf1ded0c
remove unwanted print
2025-03-14 00:24:28 +08:00
Concedo
bdf2977372
fixed windows ci
2025-03-13 20:45:16 +08:00
Concedo
0460d92cc3
disable context shifting for gemma3
2025-03-13 20:28:26 +08:00
Concedo
ca698f0cbe
tweaked sd img metadata
2025-03-13 20:04:29 +08:00
Wagner Bruna
5413be2c1b
sd: add generation parameters to image metadata (#1416)
...
Straight adaptation from stable-diffusion.cpp main.cpp.
2025-03-13 19:35:06 +08:00
Concedo
2c9ade61fe
test automatic vk shader rebuilding
2025-03-13 19:34:15 +08:00
Concedo
e75539e8cb
too many issues without BOS (+1 squashed commits)
...
Squashed commits:
[7138d941] only print bos alert in debug
2025-03-13 16:48:29 +08:00
Concedo
1ef41c2124
streamline output console log (+1 squashed commits)
...
Squashed commits:
[ca474bdd] streamline output console log
2025-03-13 15:33:49 +08:00
Concedo
16137f4281
gemma3 now works correctly
2025-03-13 14:34:18 +08:00
Concedo
57c9523405
sd lora from url
2025-03-13 10:55:01 +08:00
Concedo
77debb1b1b
gemma3 vision works, but is using more tokens than expected - may need resizing
2025-03-13 00:31:16 +08:00
Daniel Bevenius
80a02aa858
llama.swiftui : fix xcframework dir in README [no ci] (#12353)
...
This commit fixes the path to the xcframework in the README file which I
had forgotten to change after renaming the build directory.
2025-03-12 13:45:32 +01:00
Concedo
eb1809c105
add more perf stats
2025-03-12 18:58:27 +08:00
Alberto Cabrera Pérez
363f8c5d67
sycl : variable sg_size support for mmvq kernels (#12336)
2025-03-12 09:57:32 +00:00
uvos
34c961b181
CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (#12315)
...
When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were converted to
selectable warp size. However, the fattn-vec kernels don't work with 64-wide warps for now, so we need
to avoid launching them with parameters for warp64.
2025-03-12 10:14:11 +01:00
Xuan-Son Nguyen
7841fc723e
llama : Add Gemma 3 support (+ experimental vision capability) (#12343)
...
* llama : Add Gemma 3 text-only support
* fix python coding style
* fix compile on ubuntu
* python: fix style
* fix ubuntu compile
* fix build on ubuntu (again)
* fix ubuntu build, finally
* clip : Experimental support for Gemma 3 vision (#12344)
* clip : Experimental support for Gemma 3 vision
* fix build
* PRId64
2025-03-12 09:30:24 +01:00
Jeff Bolz
bf69cfe62f
vulkan: fix bug in coopmat1 mul_mat_id (#12316)
...
* tests: run mul_mat_id with a larger N
* vulkan: fix bug in coopmat1 mul_mat_id
2025-03-12 06:59:19 +01:00
Concedo
e500968f92
fixed ggml common path in metal build
2025-03-12 10:58:57 +08:00
uvos
10f2e81809
CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows per block between host and device code. (#12177)
...
refactor mmqv to unify the calculation of nwarps and rows per block between host and device code.
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-03-11 20:16:03 +01:00
jklincn
ba7654380a
ggml-backend : fix backend search path (#12330)
...
* Fix backend search path
* replace .native() with '/'
* reverted .native()
2025-03-11 14:25:17 +01:00
BB-fat
6ab2e4765a
metal : Cache the Metal library at the device context level (#12265)
2025-03-11 13:45:02 +02:00
Xuan-Son Nguyen
96e1280839
clip : bring back GPU support (#12322)
...
* clip : bring back GPU support
* use n_gpu_layers param
* fix double free
* ggml_backend_init_by_type
* clean up
2025-03-11 09:20:16 +01:00
Eve
2c9f833d17
mat vec double buffer (#12188)
2025-03-10 19:28:11 +00:00
R0CKSTAR
251364549f
musa: support new arch mp_31 and update doc (#12296)
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-03-10 18:18:25 +01:00
Henry Linjamäki
8acdacb3ea
opencl: use OpenCL C standard supported by the device (#12221)
...
This patch nudges llama.cpp a bit toward being supported on PoCL, which
doesn't support OpenCL C CL2.0. The issue is solved by querying the
device for the supported OpenCL C versions and using the highest one
available.
2025-03-10 09:57:00 -07:00
John Bean
89b2b56e86
readme: added Sidekick to available UIs (#12311)
2025-03-10 16:13:09 +02:00
Concedo
b0541f3652
added draft results
2025-03-10 22:03:20 +08:00
Concedo
3a406b37a7
updated lite
2025-03-10 20:45:10 +08:00
Georgi Gerganov
e128a1bf5b
tests : fix test-quantize-fns to init the CPU backend (#12306)
...
ggml-ci
2025-03-10 14:07:15 +02:00
marcoStocchi
6ef79a67ca
common : refactor '-o' option (#12278)
...
As discussed in PR 'llama-tts : add -o option' (#12042):
* common_params : 'out_file' string is the only output file name parameter left in common_params. It's intended to be used in all example programs implementing an '-o' option.
* cvector-generator, export-lora, imatrix : default output filenames moved from 'common_params' to the 'main()' of each example program.
2025-03-10 13:34:13 +02:00
Olivier Chafik
4e39a3c332
server: extract <think> tags from qwq outputs (#12297)
...
* extract <think> tags from qwq outputs
* const for all static regexes in chat.cpp
2025-03-10 10:59:03 +00:00
Olivier Chafik
be421fc429
tool-call: ensure there's always a non-empty tool call id (#12292)
2025-03-10 09:45:29 +00:00
Olivier Chafik
87c2630546
allow missing content in message if tool_calls provided (#12293)
2025-03-10 09:45:07 +00:00
Olivier Chafik
2b3a25c212
sampler: fixes trigger tokens + lazy grammars (fix typo cast from token to string) (#12291)
...
* Fix typo in lazy grammar handling (fixes trigger tokens)
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-10 09:44:42 +00:00
tc-mb
8352cdc87b
llava : fix bug in minicpm-v code (#11513)
...
* fix bug in minicpm-v code
* update readme of minicpm-v
2025-03-10 10:33:24 +02:00
Concedo
6b7c3ae1d3
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# AUTHORS
# README.md
# ci/run.sh
# docs/build.md
# ggml/src/CMakeLists.txt
# ggml/src/ggml-metal/CMakeLists.txt
# scripts/sync-ggml.last
2025-03-10 10:32:41 +08:00
Georgi Gerganov
1e2f78a004
server : add speculative decoding presets for FIM (#12287)
2025-03-09 19:08:20 +02:00
Concedo
b061024812
remove some unhelpful warnings
2025-03-09 22:06:33 +08:00
Concedo
dbd8c680ba
allow remote saving to google drive
2025-03-09 15:04:43 +08:00
Georgi Gerganov
0fd7ca7a21
authors : update (#12271)
2025-03-08 18:26:00 +02:00
Jason C.H
6fefc05a7a
ggml-backend : make path_str compatible with C++20 (#12269)
2025-03-08 17:02:39 +01:00
Concedo
7eadd0a1d3
add GGML_HIP_ROCWMMA_FATTN
2025-03-08 17:15:41 +08:00
Georgi Gerganov
7ab364390f
server : infill gen ends on new line (#12254)
2025-03-07 20:54:30 +02:00
Concedo
72bc855e8a
honor add bos token settings from metadata
2025-03-07 22:10:50 +08:00
Daniel Bevenius
7c7f3b7f43
ggml : skip intermediate .air file when compiling .metallib (#12247)
...
This commit updates the compilation of default.metallib to skip the
intermediate .air (Apple Intermediate Representation) file.
The motivation for this change is to simplify the custom command a
little and avoid generating and then removing the .air file.
2025-03-07 14:15:27 +01:00
Georgi Gerganov
102ac1891d
sync : ggml
...
ggml-ci
2025-03-07 14:49:44 +02:00
vmobilis
d6ae2fa061
ggml : ggml_compute_forward_concat() for arbitrary tensor type (ggml/1118)
...
* ggml_compute_forward_concat() for arbitrary tensor type
* Check that tensors' types match
* ggml-cpu.c: check type of source tensors
* ggml-cpu.c: move tensor type check to ggml_compute_forward_concat()
* ggml.c: check concatenated tensor type
* Remove tensor type check from ggml_compute_forward_concat() in ggml-cpu.c, as it was moved to ggml.c.
2025-03-07 14:49:44 +02:00
Rémy O
68d0027f3d
ggml-cpu: faster AVX2 variant for IQ1_M (#12216)
2025-03-07 13:54:22 +02:00
Georgi Gerganov
ea002810a2
ci : fix save-load test invocations (#12245)
2025-03-07 12:19:31 +02:00
Sigbjørn Skjæret
8fad3c7a7c
server : Log original chat template parsing error (#12233)
2025-03-07 11:15:33 +01:00