Commit graph

7391 commits

Author SHA1 Message Date
Concedo
5d7c5e9e33 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/tts/tts.cpp
2025-03-16 15:42:39 +08:00
Concedo
2401502cbd improvement to tool calling, allowing specific tools to be used 2025-03-16 15:20:08 +08:00
Concedo
9f7fd63160 revert unwanted change to tool calling 2025-03-16 01:35:48 +08:00
marcoStocchi
f4c3dd5daa
llama-tts : add '-o' option (#12398)
* added -o option to specify an output file name

* llama-tts returns ENOENT in case of file write error

note : PR #12042 is closed as superseded with this one.
2025-03-15 17:23:11 +01:00
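For reference, the '-o' behavior described above comes down to: write the synthesized audio to the named file and report ENOENT when the write fails. A minimal illustrative sketch (the save_audio helper is hypothetical; the actual tts.cpp code differs):

```cpp
// Hypothetical sketch, not the actual tts.cpp code: propagate ENOENT to the
// caller when the requested output file cannot be created or fully written.
#include <cerrno>
#include <cstdio>

static int save_audio(const char * path, const float * data, size_t n) {
    FILE * f = std::fopen(path, "wb");
    if (f == nullptr) {
        return ENOENT;  // file could not be created
    }
    const size_t written = std::fwrite(data, sizeof(float), n, f);
    std::fclose(f);
    return written == n ? 0 : ENOENT;  // a short write is also reported as ENOENT
}

int main() {
    const float samples[4] = {0.0f, 0.1f, 0.2f, 0.3f};
    return save_audio("output.wav", samples, 4);  // the file named via '-o'
}
```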
Concedo
98eade358a more rocm include dir 2025-03-15 23:29:00 +08:00
aubreyli
3d35d87b41
SYCL: Delete redundant plus sign and space (#12391) 2025-03-15 15:49:03 +01:00
fairydreaming
b19bd064c0
SYCL : support non-contiguous tensors in binary ops (add, sub, etc.) (#12399)
* sycl : support non-contiguous tensors in binary ops

* sycl : silence unused variable warning

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-03-15 22:19:30 +08:00
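As context for this change: a binary op receives a non-contiguous operand whenever one input is a strided view, such as a transpose. The sketch below is illustrative graph building only (no backend or compute setup) and is not code from the PR; before this change, the SYCL backend would have needed a contiguous copy of the view first.

```cpp
// Illustrative only: build an add whose second operand is a non-contiguous
// view (a transpose). With this PR the SYCL backend can run such binary ops
// directly instead of requiring a contiguous copy of the operand first.
#include "ggml.h"

int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ nullptr,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a  = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8, 4);
    struct ggml_tensor * b  = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 8);
    struct ggml_tensor * bt = ggml_transpose(ctx, b);  // non-contiguous view of b
    struct ggml_tensor * c  = ggml_add(ctx, a, bt);    // binary op on a strided operand
    (void) c;

    ggml_free(ctx);
    return 0;
}
```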
Concedo
67851e5415 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/run/run.cpp
#	ggml/src/ggml-cann/aclnn_ops.cpp
2025-03-15 19:54:19 +08:00
Concedo
e84596ec1a add config for default gen tokens and bos toggle 2025-03-15 19:53:06 +08:00
Concedo
bfc30066c9 fixed a clip processing bug 2025-03-15 17:49:49 +08:00
Concedo
7272165e0e verbosity 2025-03-15 12:13:04 +08:00
Concedo
4212f0b8e8 wip on multiple fixes 2025-03-15 10:50:36 +08:00
Chenguang Li
92a391327e
[CANN]MUL_MAT optimization (#12382) 2025-03-15 09:31:08 +08:00
Eric Curtin
9f2250ba72
Add CLI arg to llama-run to adjust the number of threads used (#12370)
We default to 4; sometimes we want to adjust this manually.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-03-14 16:41:20 +00:00
Sigbjørn Skjæret
774973b8f3
main : add -sysf / --system-prompt-file (#12249) (#12250)
* add system_prompt_file

* add -sysf / --system-prompt-file

* remove system_prompt_file
2025-03-14 16:57:05 +01:00
Concedo
4a29e216e7 edit readme 2025-03-14 21:06:55 +08:00
fairydreaming
8fcb563613
Load all MoE experts during warmup (#11571)
* llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup

* common : use new API to enable warmup mode during model warmup

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-03-14 13:47:05 +01:00
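The llama_set_warmup() call introduced here is a per-context toggle. A rough sketch of how a caller might drive it, assuming the llama.h helpers of this period (simplified; the real warmup path in common.cpp also clears the KV cache and synchronizes afterwards):

```cpp
// Rough sketch, not the actual common.cpp warmup code: enable warmup mode
// around a dummy decode so every MoE expert's weights get loaded/touched.
#include "llama.h"

static void warmup(llama_context * ctx, llama_token bos) {
    llama_set_warmup(ctx, true);                       // route through all experts

    llama_batch batch = llama_batch_get_one(&bos, 1);  // single dummy token
    llama_decode(ctx, batch);                          // triggers the expert loads

    llama_set_warmup(ctx, false);                      // restore normal top-k routing
}
```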
Concedo
d7498e7e8a added model switching to gguf in admin mode (auto guess layers) 2025-03-14 19:45:55 +08:00
Concedo
30cb77a900 rename replace_instruct_placeholders field 2025-03-14 18:37:12 +08:00
Concedo
be3bba67ff Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	src/llama-model.cpp
2025-03-14 18:25:21 +08:00
Victor
add2a3aa5a
server: fix "--grammar-file" parameter (#12285) 2025-03-14 11:21:17 +01:00
Concedo
782e1e193a replaced winclinfo.exe with a simplified simpleclinfo.exe that only provides device names and nothing else (+1 squashed commit)
Squashed commits:

[4a73c8d3] replaced winclinfo.exe with a simplified simpleclinfo.exe that only provides device names and nothing else
2025-03-14 18:18:32 +08:00
Concedo
6a1dd57435 gemma3 template, updated lite, fixed tool calling, reenable ctx shift for gemma3 2025-03-14 17:47:01 +08:00
Georgi Gerganov
c522ce4143
graph : simplify attn input build for unified KV cache (#12381)
ggml-ci
2025-03-14 10:47:44 +02:00
Georgi Gerganov
081bee8c64
hparams : add SWA rope parameters (#12374)
ggml-ci
2025-03-14 09:03:24 +02:00
Concedo
7dc72db9de Merge branch 'upstream' into concedo_experimental 2025-03-14 11:58:53 +08:00
Concedo
0db4ae6237 traded my ink for a pen 2025-03-14 11:58:15 +08:00
Georgi Gerganov
84d5475541
llama : fix Gemma3 SWA KV cache shift (#12373)
* llama : fix Gemma3 SWA KV cache shift

ggml-ci

* hparams : add comment [no ci]
2025-03-13 19:08:07 +02:00
Concedo
52cf1ded0c remove unwanted print 2025-03-14 00:24:28 +08:00
Concedo
bdf2977372 fixed windows ci 2025-03-13 20:45:16 +08:00
Concedo
0460d92cc3 disable context shifting for gemma3 2025-03-13 20:28:26 +08:00
Concedo
ca698f0cbe tweaked sd img metadata 2025-03-13 20:04:29 +08:00
Wagner Bruna
5413be2c1b
sd: add generation parameters to image metadata (#1416)
Straight adaptation from stable-diffusion.cpp main.cpp.
2025-03-13 19:35:06 +08:00
Xuan-Son Nguyen
be7c303410
arg : no n_predict = -2 for examples except for main and infill (#12364) 2025-03-13 12:34:54 +01:00
Concedo
2c9ade61fe test automatic vk shader rebuilding 2025-03-13 19:34:15 +08:00
Georgi Gerganov
e0dbec0bc6
llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)
* llama : refactor llama_context, llama_kv_cache, llm_build_context

ggml-ci

* graph : don't mutate the KV cache during defrag

ggml-ci

* context : reduce virtuals + remove test function

ggml-ci

* context : move interface implementation to source file + factory

ggml-ci

* graph : move KV cache build functions to llama_context impl

ggml-ci

* graph : remove model reference from build_pooling

ggml-ci

* graph : remove llama_model reference

ggml-ci

* kv_cache : provide rope factors

ggml-ci

* graph : rework inputs to use only unique_ptr, remove attn input abstraction

ggml-ci

* context : remove llama_context_i abstraction

ggml-ci

* context : clean-up

ggml-ci

* graph : clean-up

ggml-ci

* llama : remove redundant keywords (struct, enum)

ggml-ci

* model : adapt gemma3

ggml-ci

* graph : restore same attention ops as on master

ggml-ci

* llama : remove TODO + fix indent

ggml-ci
2025-03-13 12:35:44 +02:00
Ishaan Gandhi
2048b5913d
server : fix crash when using verbose output with input tokens that are not in printable range (#12178) (#12338)
* Fix DOS index bug

* Remove new APIs

* remove extra line

* Remove from API

* Add extra newline

* Update examples/server/server.cpp

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-03-13 11:10:05 +01:00
Concedo
e75539e8cb too many issues without BOS (+1 squashed commit)
Squashed commits:

[7138d941] only print bos alert in debug
2025-03-13 16:48:29 +08:00
Concedo
1ef41c2124 streamline output console log (+1 squashed commit)
Squashed commits:

[ca474bdd] streamline output console log
2025-03-13 15:33:49 +08:00
Concedo
16137f4281 gemma3 now works correctly 2025-03-13 14:34:18 +08:00
Concedo
57c9523405 sd lora from url 2025-03-13 10:55:01 +08:00
Oscar Barenys
f08f4b3187
Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK for VK_NV_cooperative_matrix2 support (#12301) 2025-03-12 20:06:58 +01:00
Concedo
77debb1b1b gemma3 vision works, but is using more tokens than expected - may need resizing 2025-03-13 00:31:16 +08:00
Daniel Bevenius
80a02aa858
llama.swiftui : fix xcframework dir in README [no ci] (#12353)
This commit fixes the path to the xcframework in the README file, which I
had forgotten to update after renaming the build directory.
2025-03-12 13:45:32 +01:00
Concedo
eb1809c105 add more perf stats 2025-03-12 18:58:27 +08:00
Alberto Cabrera Pérez
363f8c5d67
sycl : variable sg_size support for mmvq kernels (#12336) 2025-03-12 09:57:32 +00:00
uvos
34c961b181
CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (#12315)
When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were
converted to a selectable warp size. However, the fattn-vec kernels don't work with
64-wide warps for now, so we need to avoid launching them with parameters for warp64.
2025-03-12 10:14:11 +01:00
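The fix boils down to a device-capability guard. An illustrative host-side check (HIP shown; the actual ggml CUDA/HIP dispatch logic is more involved):

```cpp
// Illustrative guard, not the actual ggml dispatch code: only allow the
// fattn-vec kernels when the device warp size is 32, since they are not
// yet warp64-safe; callers fall back to another attention path otherwise.
#include <hip/hip_runtime.h>

bool device_supports_fattn_vec(int device) {
    hipDeviceProp_t prop;
    if (hipGetDeviceProperties(&prop, device) != hipSuccess) {
        return false;
    }
    return prop.warpSize == 32;  // CDNA/GCN devices report 64 and are excluded
}
```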
Xuan-Son Nguyen
7841fc723e
llama : Add Gemma 3 support (+ experimental vision capability) (#12343)
* llama : Add Gemma 3 text-only support

* fix python coding style

* fix compile on ubuntu

* python: fix style

* fix ubuntu compile

* fix build on ubuntu (again)

* fix ubuntu build, finally

* clip : Experimental support for Gemma 3 vision (#12344)

* clip : Experimental support for Gemma 3 vision

* fix build

* PRId64
2025-03-12 09:30:24 +01:00
Jeff Bolz
bf69cfe62f
vulkan: fix bug in coopmat1 mul_mat_id (#12316)
* tests: run mul_mat_id with a larger N

* vulkan: fix bug in coopmat1 mul_mat_id
2025-03-12 06:59:19 +01:00
Concedo
e500968f92 fixed ggml common path in metal build 2025-03-12 10:58:57 +08:00