Concedo
5d7c5e9e33
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# examples/tts/tts.cpp
2025-03-16 15:42:39 +08:00
Concedo
2401502cbd
improvement to tool calling, allowing specific tools to be used
2025-03-16 15:20:08 +08:00
Concedo
9f7fd63160
revert unwanted change to tool calling
2025-03-16 01:35:48 +08:00
marcoStocchi
f4c3dd5daa
llama-tts : add '-o' option ( #12398 )
* added -o option to specify an output file name
* llama-tts returns ENOENT in case of file write error
note : PR #12042 is closed as superseded by this one.
2025-03-15 17:23:11 +01:00
Concedo
98eade358a
more rocm include dir
2025-03-15 23:29:00 +08:00
aubreyli
3d35d87b41
SYCL: Delete redundant plus sign and space ( #12391 )
2025-03-15 15:49:03 +01:00
fairydreaming
b19bd064c0
SYCL : support non-contiguous tensors in binary ops (add, sub, etc) ( #12399 )
* sycl : support non-contiguous tensors in binary ops
* sycl : silence unused variable warning
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-03-15 22:19:30 +08:00
Concedo
67851e5415
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# examples/run/run.cpp
# ggml/src/ggml-cann/aclnn_ops.cpp
2025-03-15 19:54:19 +08:00
Concedo
e84596ec1a
add config for default gen tokens and bos toggle
2025-03-15 19:53:06 +08:00
Concedo
bfc30066c9
fixed a clip processing bug
2025-03-15 17:49:49 +08:00
Concedo
7272165e0e
verbosity
2025-03-15 12:13:04 +08:00
Concedo
4212f0b8e8
wip on multiple fixes
2025-03-15 10:50:36 +08:00
Chenguang Li
92a391327e
[CANN]MUL_MAT optimization ( #12382 )
2025-03-15 09:31:08 +08:00
Eric Curtin
9f2250ba72
Add CLI arg to llama-run to adjust the number of threads used ( #12370 )
We default to 4; sometimes we want to adjust this manually.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-03-14 16:41:20 +00:00
Sigbjørn Skjæret
774973b8f3
main : add -sysf / --system-prompt-file ( #12249 ) ( #12250 )
* add system_prompt_file
* add -sysf / --system-prompt-file
* remove system_prompt_file
2025-03-14 16:57:05 +01:00
Concedo
4a29e216e7
edit readme
2025-03-14 21:06:55 +08:00
fairydreaming
8fcb563613
Load all MoE experts during warmup ( #11571 )
* llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup
* common : use new API to enable warmup mode during model warmup
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-03-14 13:47:05 +01:00
Concedo
d7498e7e8a
added model switching to gguf in admin mode (auto guess layers)
2025-03-14 19:45:55 +08:00
Concedo
30cb77a900
rename replace_instruct_placeholders field
2025-03-14 18:37:12 +08:00
Concedo
be3bba67ff
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# src/llama-model.cpp
2025-03-14 18:25:21 +08:00
Victor
add2a3aa5a
server: fix "--grammar-file" parameter ( #12285 )
2025-03-14 11:21:17 +01:00
Concedo
782e1e193a
replaced winclinfo.exe with a simplified simpleclinfo.exe that only provides device names and nothing else (+1 squashed commits)
Squashed commits:
[4a73c8d3] replaced winclinfo.exe with a simplified simpleclinfo.exe that only provides device names and nothing else
2025-03-14 18:18:32 +08:00
Concedo
6a1dd57435
gemma3 template, updated lite, fixed tool calling, reenable ctx shift for gemma3
2025-03-14 17:47:01 +08:00
Georgi Gerganov
c522ce4143
graph : simplify attn input build for unified KV cache ( #12381 )
ggml-ci
2025-03-14 10:47:44 +02:00
Georgi Gerganov
081bee8c64
hparams : add SWA rope parameters ( #12374 )
ggml-ci
2025-03-14 09:03:24 +02:00
Concedo
7dc72db9de
Merge branch 'upstream' into concedo_experimental
2025-03-14 11:58:53 +08:00
Concedo
0db4ae6237
traded my ink for a pen
2025-03-14 11:58:15 +08:00
Georgi Gerganov
84d5475541
llama : fix Gemma3 SWA KV cache shift ( #12373 )
* llama : fix Gemma3 SWA KV cache shift
ggml-ci
* hparams : add comment [no ci]
2025-03-13 19:08:07 +02:00
Concedo
52cf1ded0c
remove unwanted print
2025-03-14 00:24:28 +08:00
Concedo
bdf2977372
fixed windows ci
2025-03-13 20:45:16 +08:00
Concedo
0460d92cc3
disable context shifting for gemma3
2025-03-13 20:28:26 +08:00
Concedo
ca698f0cbe
tweaked sd img metadata
2025-03-13 20:04:29 +08:00
Wagner Bruna
5413be2c1b
sd: add generation parameters to image metadata ( #1416 )
Straight adaptation from stable-diffusion.cpp main.cpp.
2025-03-13 19:35:06 +08:00
Xuan-Son Nguyen
be7c303410
arg : no n_predict = -2 for examples except for main and infill ( #12364 )
2025-03-13 12:34:54 +01:00
Concedo
2c9ade61fe
test automatic vk shader rebuilding
2025-03-13 19:34:15 +08:00
Georgi Gerganov
e0dbec0bc6
llama : refactor llama_context, llama_kv_cache, llm_build_context ( #12181 )
* llama : refactor llama_context, llama_kv_cache, llm_build_context
ggml-ci
* graph : don't mutate the KV cache during defrag
ggml-ci
* context : reduce virtuals + remove test function
ggml-ci
* context : move interface implementation to source file + factory
ggml-ci
* graph : move KV cache build functions to llama_context impl
ggml-ci
* graph : remove model reference from build_pooling
ggml-ci
* graph : remove llama_model reference
ggml-ci
* kv_cache : provide rope factors
ggml-ci
* graph : rework inputs to use only unique_ptr, remove attn input abstraction
ggml-ci
* context : remove llama_context_i abstraction
ggml-ci
* context : clean-up
ggml-ci
* graph : clean-up
ggml-ci
* llama : remove redundant keywords (struct, enum)
ggml-ci
* model : adapt gemma3
ggml-ci
* graph : restore same attention ops as on master
ggml-ci
* llama : remove TODO + fix indent
ggml-ci
2025-03-13 12:35:44 +02:00
Ishaan Gandhi
2048b5913d
server : fix crash when using verbose output with input tokens that are not in printable range ( #12178 ) ( #12338 )
* Fix DOS index bug
* Remove new APIs
* remove extra line
* Remove from API
* Add extra newline
* Update examples/server/server.cpp
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-03-13 11:10:05 +01:00
Concedo
e75539e8cb
too many issues without BOS (+1 squashed commits)
Squashed commits:
[7138d941] only print bos alert in debug
2025-03-13 16:48:29 +08:00
Concedo
1ef41c2124
streamline output console log (+1 squashed commits)
Squashed commits:
[ca474bdd] streamline output console log
2025-03-13 15:33:49 +08:00
Concedo
16137f4281
gemma3 now works correctly
2025-03-13 14:34:18 +08:00
Concedo
57c9523405
sd lora from url
2025-03-13 10:55:01 +08:00
Oscar Barenys
f08f4b3187
Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK for VK_NV_cooperative_matrix2 support ( #12301 )
2025-03-12 20:06:58 +01:00
Concedo
77debb1b1b
gemma3 vision works, but is using more tokens than expected - may need resizing
2025-03-13 00:31:16 +08:00
Daniel Bevenius
80a02aa858
llama.swiftui : fix xcframework dir in README [no ci] ( #12353 )
This commit fixes the path to the xcframework in the README file which I
had forgotten to change after renaming the build directory.
2025-03-12 13:45:32 +01:00
Concedo
eb1809c105
add more perf stats
2025-03-12 18:58:27 +08:00
Alberto Cabrera Pérez
363f8c5d67
sycl : variable sg_size support for mmvq kernels ( #12336 )
2025-03-12 09:57:32 +00:00
uvos
34c961b181
CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 ( #12315 )
When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were converted to
selectable warp size; however, the fattn-vec kernels don't work with 64-wide warps for now, so we need
to avoid launching them with parameters for warp64.
2025-03-12 10:14:11 +01:00
Xuan-Son Nguyen
7841fc723e
llama : Add Gemma 3 support (+ experimental vision capability) ( #12343 )
* llama : Add Gemma 3 text-only support
* fix python coding style
* fix compile on ubuntu
* python: fix style
* fix ubuntu compile
* fix build on ubuntu (again)
* fix ubuntu build, finally
* clip : Experimental support for Gemma 3 vision ( #12344 )
* clip : Experimental support for Gemma 3 vision
* fix build
* PRId64
2025-03-12 09:30:24 +01:00
Jeff Bolz
bf69cfe62f
vulkan: fix bug in coopmat1 mul_mat_id ( #12316 )
* tests: run mul_mat_id with a larger N
* vulkan: fix bug in coopmat1 mul_mat_id
2025-03-12 06:59:19 +01:00
Concedo
e500968f92
fixed ggml common path in metal build
2025-03-12 10:58:57 +08:00