Commit graph

1188 commits

Author SHA1 Message Date
Concedo
645b09ea20 renamed promptlimit to genlimit, now applies to API requests as well, can be set in the ui. hide API info display if running in CLI mode. 2025-08-30 00:26:05 +08:00
Concedo
3060dfb99f Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/model-conversion/Makefile
#	examples/model-conversion/scripts/causal/convert-model.sh
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/common.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cuda/CMakeLists.txt
#	scripts/compare-commits.sh
2025-08-28 23:17:29 +08:00
Concedo
3655ecf9b3 minor template and tts ui fixes 2025-08-27 22:30:09 +08:00
Concedo
205a0b8d4c fix kokoro replacement, add 4096 batch size option 2025-08-25 15:57:13 +08:00
Concedo
b0a8d11584 add tts max length for kokoro (+1 squashed commits)
Squashed commits:

[c1c6feaf] add tts max length for kokoro
2025-08-24 17:57:29 +08:00
Concedo
a6aa47322b csv fix 2025-08-23 12:48:11 +08:00
Concedo
80dabbb689 minor adjustments for sdquant: allow backend to do the translation for the type more defensively, adjust the UI dropdown for clarity. 2025-08-22 23:23:32 +08:00
Wagner Bruna
2f8b0ec538 Support q8_0 quantization for image model loading (#1692)
* Support q8_0 quantization for image model loading

q4_0 may degrade quality significantly, especially for smaller
models like SD 1.5 and SDXL. q8_0 provides a middle-ground,
giving half the memory savings of q4_0 but loading faster and
with less quality loss.

* Accept --sdquant with no parameters

* Use numerical values for the sdquant option
2025-08-22 22:17:15 +08:00
Concedo
7fef0bc949 fix filename regex for whisper 2025-08-22 22:04:05 +08:00
Concedo
9dd6b4c930 improve whisper transcribe apt regex 2025-08-22 17:13:51 +08:00
liuyunrui123
c13db49d5b Log output supports utf8 encoding display (#1700) 2025-08-21 16:52:03 +08:00
Concedo
3210b378e8 better tool calls 2025-08-20 22:11:31 +08:00
Concedo
eb33467c8c fixed text 2025-08-20 12:25:04 +08:00
Wagner Bruna
6003e90e50 Add flash attention and conv2d direct controls for image generation (#1678)
* Add separate flash attention config for image generation

* Add config option for Conv2D Direct
2025-08-20 12:17:57 +08:00
Concedo
9fb0611115 handle contractions correctly, bump defaults 2025-08-18 22:33:44 +08:00
Concedo
2abe11071b custom voice handling 2025-08-18 16:57:34 +08:00
Concedo
685129fb5a add missing title, set max tts length to 1024, updated lite (+2 squashed commits)
Squashed commit:

[0737a028] add missing title

[a42328b0] add max tts length 1024
2025-08-17 21:42:56 +08:00
Concedo
bcaf379509 tts.cpp merged and working in kcpp! 2025-08-17 18:09:28 +08:00
Concedo
52606e9b1d tts cpp model is now loadable in kcpp 2025-08-17 15:47:22 +08:00
Concedo
5a921a40f9 add overridenativecontext flag, stop nagging me 2025-08-14 22:54:45 +08:00
Concedo
4b2ca1169c more consistency fixes 2025-08-13 19:28:53 +08:00
Concedo
955cf66bbc load embedding at current maxctx instead of max trained ctx by default 2025-08-13 18:42:14 +08:00
Concedo
06a3ee4c3b populate better server identifier headers. 2025-08-13 16:10:30 +08:00
Concedo
30e2f25c05 alias tensorsplit, fixed python error 2025-08-10 22:38:14 +08:00
Concedo
8e6d27f629 handle if assistant_message_gen and assistant_message_gen!=assistant_message_start, replace final output tag with unspaced (gen) version if exists 2025-08-10 16:51:34 +08:00
kallewoof
204739e7f1 Adapter fixes (#1659)
* test adapters

* add assistant_gen adapter key

* add support for chat templates stored as .jinja files

* removed mistakenly committed gated-tokenizers link

* autoguess: Harmony: add missing newline prefixes to system_end
2025-08-10 16:19:50 +08:00
Concedo
89266ac6b8 autoguess adapter make case insensitive 2025-08-10 00:58:47 +08:00
Concedo
487d509b44 try fix oldpc cuda broken without flash attn since upstream pr14361 between 1.94 and 1.95 (+1 squashed commits)
Squashed commits:

[940f0c639] try fix oldpc cuda broken without flash attn since upstream pr14361 between 1.94 and 1.95
2025-08-10 00:10:37 +08:00
Concedo
4c1faf61b2 increment version (+1 squashed commits)
Squashed commits:

[6e5080ad2] increment version
2025-08-09 20:53:26 +08:00
Concedo
ced98823a1 kai api tool calling 2025-08-09 10:51:10 +08:00
Concedo
9e7a940ce4 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/softmax_4_f16.cl
#	ggml/src/ggml-opencl/kernels/softmax_4_f32.cl
#	ggml/src/ggml-opencl/kernels/softmax_f16.cl
#	ggml/src/ggml-opencl/kernels/softmax_f32.cl
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
2025-08-09 01:24:52 +08:00
Concedo
8a71eb03c0 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	ggml/cmake/ggml-config.cmake.in
#	ggml/src/ggml-cann/CMakeLists.txt
#	ggml/src/ggml-cann/common.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cuda/fattn.cu
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	requirements/requirements-convert_hf_to_gguf.txt
#	scripts/compare-llama-bench.py
#	tests/test-chat-template.cpp
#	tests/test-chat.cpp
#	tools/llama-bench/llama-bench.cpp
2025-08-07 21:23:09 +08:00
Concedo
e40d26b9e7 allow offloading moe to cpu with --moecpu 2025-08-05 23:42:42 +08:00
Concedo
9fbbd9e127 half measure for mistral spaced formats 2025-08-04 23:48:11 +08:00
Concedo
6cb8f95b5b tool calling params have been ported over to KAI api and can be used, same syntax as OAI endpoint 2025-08-03 16:21:57 +08:00
Concedo
fa815f76c9 updated model recs (+1 squashed commits)
Squashed commits:

[3e0431ae1] updated model recs
2025-08-02 11:41:37 +08:00
Concedo
cd0dc0abec allow tool calls to be triggered by any role 2025-08-02 10:00:35 +08:00
Concedo
a87c05f8c1 move function call determination to separate method 2025-07-31 14:14:38 +08:00
Concedo
cade9f42bc bump defaults 2025-07-31 12:05:57 +08:00
Concedo
1976bb3f53 fixes for tool calling 2025-07-30 19:25:39 +08:00
Concedo
abf527a207 clearer multimodal capability display 2025-07-28 22:54:49 +08:00
Concedo
ecb2cbf547 fix url params parse search 2025-07-27 16:41:42 +08:00
Concedo
8192cd6747 handle multi tool calls 2025-07-25 23:06:23 +08:00
Concedo
f25339c92b handle empty objects returned by tool calls, also remove misinterpretation of the tool calls instruct tag within ChatML autoguess 2025-07-25 22:22:27 +08:00
Concedo
0d72c794fa Merge commit 'c8ade30036' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-cuda/CMakeLists.txt
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/im2col_f16.cl
#	ggml/src/ggml-opencl/kernels/im2col_f32.cl
#	ggml/src/ggml-sycl/im2col.cpp
#	tools/mtmd/clip.cpp
2025-07-25 19:42:45 +08:00
Concedo
8f622cfb50 debugmode longer prints 2025-07-23 19:28:39 +08:00
Concedo
4b348d0b7e add 2 more save slots 2025-07-22 21:07:19 +08:00
Concedo
75154a3d91 add ping endpoint 2025-07-22 18:55:35 +08:00
Concedo
30675b0798 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	CODEOWNERS
#	docs/build.md
#	scripts/sync-ggml.last
#	tests/test-backend-ops.cpp
#	tools/imatrix/README.md
#	tools/imatrix/imatrix.cpp
2025-07-20 22:47:31 +08:00
Concedo
b028dd4e84 minor fixes 2025-07-18 13:22:59 +08:00