Commit graph

7248 commits

Concedo
0cfd8d23cb handle symlinks (+1 squashed commit)
Squashed commits:

[fb8477b9] fixed makefile (+4 squashed commits)

Squashed commits:

[4a245bba] fixed a makefile issue

[d68eba69] alias usehipblas to usecublas

[a9ab0a7c] dynamic rocwmma selection

[fefe17c7] revert rocwmma
2025-03-17 21:03:30 +08:00
Concedo
131107dc91 lite fix admin button display issue when preload story 2025-03-17 00:17:31 +08:00
Concedo
6888f5495d allow quantkv with contextshift 2025-03-16 21:48:42 +08:00
Concedo
e466ce65e2 updated sd metadata 2025-03-16 20:12:43 +08:00
Concedo
8708403ee9 revert clean 2025-03-16 17:53:35 +08:00
Concedo
5ef1722d5f fix for sd 2025-03-16 17:02:42 +08:00
Concedo
0954e9e476 improve model estimation 2025-03-16 16:14:13 +08:00
Concedo
5d7c5e9e33 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/tts/tts.cpp
2025-03-16 15:42:39 +08:00
Concedo
2401502cbd improvement to tool calling, allowing specific tools to be used 2025-03-16 15:20:08 +08:00
Concedo
9f7fd63160 revert unwanted change to tool calling 2025-03-16 01:35:48 +08:00
marcoStocchi
f4c3dd5daa
llama-tts : add '-o' option (#12398)
* added -o option to specify an output file name

* llama-tts returns ENOENT in case of file write error

note : PR #12042 is closed as superseded by this one.
2025-03-15 17:23:11 +01:00
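The llama-tts commit above describes writing output to a user-chosen file and returning ENOENT on a write failure. A minimal sketch of that error-handling shape, assuming a hypothetical `save_wav` helper (name and signature are illustrative, not the PR's actual code):

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <string>

// Illustrative sketch: write audio bytes to the path given by "-o",
// reporting ENOENT to the caller if the file cannot be created.
static int save_wav(const std::string & path, const std::string & data) {
    FILE * f = std::fopen(path.c_str(), "wb");
    if (!f) {
        return ENOENT; // surface the file-write error instead of crashing
    }
    std::fwrite(data.data(), 1, data.size(), f);
    std::fclose(f);
    return 0;
}
```

Returning an errno value (rather than aborting) lets callers and scripts distinguish a bad output path from a synthesis failure.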
Concedo
98eade358a more rocm include dir 2025-03-15 23:29:00 +08:00
aubreyli
3d35d87b41
SYCL: Delete redundant plus sign and space (#12391) 2025-03-15 15:49:03 +01:00
fairydreaming
b19bd064c0
SYCL : support non-contiguous tensors in binary ops (add, sub, etc) (#12399)
* sycl : support non-contiguous tensors in binary ops

* sycl : silence unused variable warning

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-03-15 22:19:30 +08:00
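The SYCL commit above adds support for non-contiguous tensors in binary ops. The core idea is stride-aware indexing instead of assuming elements are packed; a simplified 2-D sketch (illustrative only, real ggml/SYCL kernels use byte strides and four dimensions):

```cpp
#include <cassert>
#include <cstdint>

// 2-D element-wise add where each tensor carries its own row stride,
// so a non-contiguous view (stride > cols) indexes correctly:
// dst[i][j] = a[i][j] + b[i][j]
static void add2d(const float * a, int64_t sa,
                  const float * b, int64_t sb,
                  float * dst, int64_t sd,
                  int64_t rows, int64_t cols) {
    for (int64_t i = 0; i < rows; ++i) {
        for (int64_t j = 0; j < cols; ++j) {
            dst[i*sd + j] = a[i*sa + j] + b[i*sb + j];
        }
    }
}
```

A contiguous tensor is simply the special case where the stride equals the row length.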
Concedo
67851e5415 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/run/run.cpp
#	ggml/src/ggml-cann/aclnn_ops.cpp
2025-03-15 19:54:19 +08:00
Concedo
e84596ec1a add config for default gen tokens and bos toggle 2025-03-15 19:53:06 +08:00
Concedo
bfc30066c9 fixed a clip processing bug 2025-03-15 17:49:49 +08:00
Concedo
7272165e0e verbosity 2025-03-15 12:13:04 +08:00
Concedo
4212f0b8e8 wip on multiple fixes 2025-03-15 10:50:36 +08:00
Chenguang Li
92a391327e
[CANN]MUL_MAT optimization (#12382) 2025-03-15 09:31:08 +08:00
Eric Curtin
9f2250ba72
Add CLI arg to llama-run to adjust the number of threads used (#12370)
We default to 4; sometimes we want to adjust this manually.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-03-14 16:41:20 +00:00
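The llama-run commit above adds a thread-count flag with a default of 4. A hedged sketch of that pattern (flag names and the parsing helper are illustrative, not the PR's actual code):

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Illustrative: scan argv-style tokens for a thread-count flag,
// falling back to the default of 4 mentioned in the commit message.
static int parse_threads(const std::vector<std::string> & args) {
    int n_threads = 4; // default when the flag is absent
    for (size_t i = 0; i + 1 < args.size(); ++i) {
        if (args[i] == "--threads" || args[i] == "-t") {
            n_threads = std::atoi(args[i + 1].c_str());
        }
    }
    return n_threads;
}
```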
Sigbjørn Skjæret
774973b8f3
main : add -sysf / --system-prompt-file (#12249) (#12250)
* add system_prompt_file

* add -sysf / --system-prompt-file

* remove system_prompt_file
2025-03-14 16:57:05 +01:00
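The `-sysf / --system-prompt-file` commit above loads the system prompt from a file instead of the command line. A minimal sketch of the file-reading step (helper name is hypothetical):

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Illustrative: slurp the whole file into a string to use as the
// system prompt; an empty result signals a missing/unreadable file.
static std::string read_system_prompt(const std::string & path) {
    std::ifstream f(path);
    std::ostringstream ss;
    ss << f.rdbuf();
    return ss.str();
}
```

Reading from a file avoids shell-quoting problems with long or multi-line system prompts.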
Concedo
4a29e216e7 edit readme 2025-03-14 21:06:55 +08:00
fairydreaming
8fcb563613
Load all MoE experts during warmup (#11571)
* llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup

* common : use new API to enable warmup mode during model warmup

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-03-14 13:47:05 +01:00
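The warmup commit above routes through all MoE experts during warmup so every expert's weights get touched (paged in), then returns to normal top-k routing. A toy sketch of that toggle, with made-up numbers (the real `llama_set_warmup()` operates on the llama context, not a struct like this):

```cpp
#include <cassert>

// Illustrative MoE routing config: normally only the top-k experts run
// per token; in warmup mode every expert runs once so all weights load.
struct moe_cfg {
    int  n_expert      = 8;     // total experts in the model
    int  n_expert_used = 2;     // top-k experts used per token normally
    bool warmup        = false; // set during model warmup
};

static int experts_for_pass(const moe_cfg & c) {
    return c.warmup ? c.n_expert : c.n_expert_used;
}
```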
Concedo
d7498e7e8a added model switching to gguf in admin mode (auto guess layers) 2025-03-14 19:45:55 +08:00
Concedo
30cb77a900 rename replace_instruct_placeholders field 2025-03-14 18:37:12 +08:00
Concedo
be3bba67ff Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	src/llama-model.cpp
2025-03-14 18:25:21 +08:00
Victor
add2a3aa5a
server: fix "--grammar-file" parameter (#12285) 2025-03-14 11:21:17 +01:00
Concedo
782e1e193a replaced winclinfo.exe with a simplified simpleclinfo.exe that only provides device names and nothing else (+1 squashed commit)
Squashed commits:

[4a73c8d3] replaced winclinfo.exe with a simplified simpleclinfo.exe that only provides device names and nothing else
2025-03-14 18:18:32 +08:00
Concedo
6a1dd57435 gemma3 template, updated lite, fixed tool calling, reenable ctx shift for gemma3 2025-03-14 17:47:01 +08:00
Georgi Gerganov
c522ce4143
graph : simplify attn input build for unified KV cache (#12381)
ggml-ci
2025-03-14 10:47:44 +02:00
Georgi Gerganov
081bee8c64
hparams : add SWA rope parameters (#12374)
ggml-ci
2025-03-14 09:03:24 +02:00
Concedo
7dc72db9de Merge branch 'upstream' into concedo_experimental 2025-03-14 11:58:53 +08:00
Concedo
0db4ae6237 traded my ink for a pen 2025-03-14 11:58:15 +08:00
Georgi Gerganov
84d5475541
llama : fix Gemma3 SWA KV cache shift (#12373)
* llama : fix Gemma3 SWA KV cache shift

ggml-ci

* hparams : add comment [no ci]
2025-03-13 19:08:07 +02:00
Concedo
52cf1ded0c remove unwanted print 2025-03-14 00:24:28 +08:00
Concedo
bdf2977372 fixed windows ci 2025-03-13 20:45:16 +08:00
Concedo
0460d92cc3 disable context shifting for gemma3 2025-03-13 20:28:26 +08:00
Concedo
ca698f0cbe tweaked sd img metadata 2025-03-13 20:04:29 +08:00
Wagner Bruna
5413be2c1b
sd: add generation parameters to image metadata (#1416)
Straight adaptation from stable-diffusion.cpp main.cpp.
2025-03-13 19:35:06 +08:00
Xuan-Son Nguyen
be7c303410
arg : no n_predict = -2 for examples except for main and infill (#12364) 2025-03-13 12:34:54 +01:00
Concedo
2c9ade61fe test automatic vk shader rebuilding 2025-03-13 19:34:15 +08:00
Georgi Gerganov
e0dbec0bc6
llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)
* llama : refactor llama_context, llama_kv_cache, llm_build_context

ggml-ci

* graph : don't mutate the KV cache during defrag

ggml-ci

* context : reduce virtuals + remove test function

ggml-ci

* context : move interface implementation to source file + factory

ggml-ci

* graph : move KV cache build functions to llama_context impl

ggml-ci

* graph : remove model reference from build_pooling

ggml-ci

* graph : remove llama_model reference

ggml-ci

* kv_cache : provide rope factors

ggml-ci

* graph : rework inputs to use only unique_ptr, remove attn input abstraction

ggml-ci

* context : remove llama_context_i abstraction

ggml-ci

* context : clean-up

ggml-ci

* graph : clean-up

ggml-ci

* llama : remove redundant keywords (struct, enum)

ggml-ci

* model : adapt gemma3

ggml-ci

* graph : restore same attention ops as on master

ggml-ci

* llama : remove TODO + fix indent

ggml-ci
2025-03-13 12:35:44 +02:00
Ishaan Gandhi
2048b5913d
server : fix crash when using verbose output with input tokens that are not in printable range (#12178) (#12338)
* Fix DOS index bug

* Remove new APIs

* remove extra line

* Remove from API

* Add extra newline

* Update examples/server/server.cpp

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-03-13 11:10:05 +01:00
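The server fix above hardens verbose output against tokens whose text contains non-printable bytes. A sketch of the defensive idea, escaping anything outside the printable ASCII range before logging (the escaping scheme is illustrative, not the PR's actual fix):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Illustrative: replace non-printable bytes with \xNN escapes so
// arbitrary token contents cannot corrupt or crash verbose logging.
static std::string escape_nonprintable(const std::string & s) {
    std::string out;
    for (unsigned char c : s) {
        if (c >= 0x20 && c < 0x7f) {
            out += (char) c;
        } else {
            char buf[8];
            std::snprintf(buf, sizeof(buf), "\\x%02x", c);
            out += buf;
        }
    }
    return out;
}
```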
Concedo
e75539e8cb too many issues without BOS (+1 squashed commit)
Squashed commits:

[7138d941] only print bos alert in debug
2025-03-13 16:48:29 +08:00
Concedo
1ef41c2124 streamline output console log (+1 squashed commit)
Squashed commits:

[ca474bdd] streamline output console log
2025-03-13 15:33:49 +08:00
Concedo
16137f4281 gemma3 now works correctly 2025-03-13 14:34:18 +08:00
Concedo
57c9523405 sd lora from url 2025-03-13 10:55:01 +08:00
Oscar Barenys
f08f4b3187
Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK for VK_NV_cooperative_matrix2 support (#12301) 2025-03-12 20:06:58 +01:00
Concedo
77debb1b1b gemma3 vision works, but is using more tokens than expected - may need resizing 2025-03-13 00:31:16 +08:00