Commit graph

1085 commits

Author SHA1 Message Date
Concedo
205a0b8d4c fix kokoro replacement, add 4096 batch size option 2025-08-25 15:57:13 +08:00
Concedo
b0a8d11584 add tts max length for kokoro (+1 squashed commits)
Squashed commits:

[c1c6feaf] add tts max length for kokoro
2025-08-24 17:57:29 +08:00
Concedo
a6aa47322b csv fix 2025-08-23 12:48:11 +08:00
Concedo
80dabbb689 minor adjustments for sdquant: allow backend to do the translation for the type more defensively, adjust the UI dropdown for clarity. 2025-08-22 23:23:32 +08:00
Wagner Bruna
2f8b0ec538 Support q8_0 quantization for image model loading (#1692)
* Support q8_0 quantization for image model loading

q4_0 may degrade quality significantly, especially for smaller
models like SD 1.5 and SDXL. q8_0 provides a middle ground,
giving half the memory savings of q4_0 but loading faster and
with less quality loss.

* Accept --sdquant with no parameters

* Use numerical values for the sdquant option
2025-08-22 22:17:15 +08:00
Concedo
7fef0bc949 fix filename regex for whisper 2025-08-22 22:04:05 +08:00
Concedo
9dd6b4c930 improve whisper transcribe apt regex 2025-08-22 17:13:51 +08:00
liuyunrui123
c13db49d5b Log output supports utf8 encoding display (#1700) 2025-08-21 16:52:03 +08:00
Concedo
3210b378e8 better tool calls 2025-08-20 22:11:31 +08:00
Concedo
eb33467c8c fixed text 2025-08-20 12:25:04 +08:00
Wagner Bruna
6003e90e50 Add flash attention and conv2d direct controls for image generation (#1678)
* Add separate flash attention config for image generation

* Add config option for Conv2D Direct
2025-08-20 12:17:57 +08:00
Concedo
9fb0611115 handle contractions correctly, bump defaults 2025-08-18 22:33:44 +08:00
Concedo
2abe11071b custom voice handling 2025-08-18 16:57:34 +08:00
Concedo
685129fb5a add missing title, set max tts length to 1024, updated lite (+2 squashed commit)
Squashed commit:

[0737a028] add missing title

[a42328b0] add max tts length 1024
2025-08-17 21:42:56 +08:00
Concedo
bcaf379509 tts.cpp merged and working in kcpp! 2025-08-17 18:09:28 +08:00
Concedo
52606e9b1d tts cpp model is now loadable in kcpp 2025-08-17 15:47:22 +08:00
Concedo
5a921a40f9 add overridenativecontext flag, stop nagging me 2025-08-14 22:54:45 +08:00
Concedo
4b2ca1169c more consistency fixes 2025-08-13 19:28:53 +08:00
Concedo
955cf66bbc load embedding at current maxctx instead of max trained ctx by default 2025-08-13 18:42:14 +08:00
Concedo
06a3ee4c3b populate better server identifier headers. 2025-08-13 16:10:30 +08:00
Concedo
30e2f25c05 alias tensorsplit, fixed python error 2025-08-10 22:38:14 +08:00
Concedo
8e6d27f629 handle if assistant_message_gen and assistant_message_gen!=assistant_message_start, replace final output tag with unspaced (gen) version if exists 2025-08-10 16:51:34 +08:00
kallewoof
204739e7f1 Adapter fixes (#1659)
* test adapters

* add assistant_gen adapter key

* add support for chat templates stored as .jinja files

* removed mistakenly committed gated-tokenizers link

* autoguess: Harmony: add missing newline prefixes to system_end
2025-08-10 16:19:50 +08:00
Concedo
89266ac6b8 autoguess adapter make case insensitive 2025-08-10 00:58:47 +08:00
Concedo
487d509b44 try fix oldpc cuda broken without flash attn since upstream pr14361 between 1.94 and 1.95 (+1 squashed commits)
Squashed commits:

[940f0c639] try fix oldpc cuda broken without flash attn since upstream pr14361 between 1.94 and 1.95
2025-08-10 00:10:37 +08:00
Concedo
4c1faf61b2 increment version (+1 squashed commits)
Squashed commits:

[6e5080ad2] increment version
2025-08-09 20:53:26 +08:00
Concedo
ced98823a1 kai api tool calling 2025-08-09 10:51:10 +08:00
Concedo
9e7a940ce4 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/softmax_4_f16.cl
#	ggml/src/ggml-opencl/kernels/softmax_4_f32.cl
#	ggml/src/ggml-opencl/kernels/softmax_f16.cl
#	ggml/src/ggml-opencl/kernels/softmax_f32.cl
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
2025-08-09 01:24:52 +08:00
Concedo
8a71eb03c0 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	ggml/cmake/ggml-config.cmake.in
#	ggml/src/ggml-cann/CMakeLists.txt
#	ggml/src/ggml-cann/common.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cuda/fattn.cu
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	requirements/requirements-convert_hf_to_gguf.txt
#	scripts/compare-llama-bench.py
#	tests/test-chat-template.cpp
#	tests/test-chat.cpp
#	tools/llama-bench/llama-bench.cpp
2025-08-07 21:23:09 +08:00
Concedo
e40d26b9e7 allow offloading moe to cpu with --moecpu 2025-08-05 23:42:42 +08:00
Concedo
9fbbd9e127 half measure for mistral spaced formats 2025-08-04 23:48:11 +08:00
Concedo
6cb8f95b5b tool calling params have been ported over to KAI api and can be used, same syntax as OAI endpoint 2025-08-03 16:21:57 +08:00
Concedo
fa815f76c9 updated model recs (+1 squashed commits)
Squashed commits:

[3e0431ae1] updated model recs
2025-08-02 11:41:37 +08:00
Concedo
cd0dc0abec allow tool calls to be triggered by any role 2025-08-02 10:00:35 +08:00
Concedo
a87c05f8c1 move function call determination to separate method 2025-07-31 14:14:38 +08:00
Concedo
cade9f42bc bump defaults 2025-07-31 12:05:57 +08:00
Concedo
1976bb3f53 fixes for tool calling 2025-07-30 19:25:39 +08:00
Concedo
abf527a207 clearer multimodal capability display 2025-07-28 22:54:49 +08:00
Concedo
ecb2cbf547 fix url params parse search 2025-07-27 16:41:42 +08:00
Concedo
8192cd6747 handle multi tool calls 2025-07-25 23:06:23 +08:00
Concedo
f25339c92b handle empty objects returned by tool calls, also remove misinterpretation of the tools calls instruct tag within ChatML autoguess 2025-07-25 22:22:27 +08:00
Concedo
0d72c794fa Merge commit 'c8ade30036139e32108fee53d8b7164dbfda4bee' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-cuda/CMakeLists.txt
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/im2col_f16.cl
#	ggml/src/ggml-opencl/kernels/im2col_f32.cl
#	ggml/src/ggml-sycl/im2col.cpp
#	tools/mtmd/clip.cpp
2025-07-25 19:42:45 +08:00
Concedo
8f622cfb50 debugmode longer prints 2025-07-23 19:28:39 +08:00
Concedo
4b348d0b7e add 2 more save slots 2025-07-22 21:07:19 +08:00
Concedo
75154a3d91 add ping endpoint 2025-07-22 18:55:35 +08:00
Concedo
30675b0798 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	CODEOWNERS
#	docs/build.md
#	scripts/sync-ggml.last
#	tests/test-backend-ops.cpp
#	tools/imatrix/README.md
#	tools/imatrix/imatrix.cpp
2025-07-20 22:47:31 +08:00
Concedo
b028dd4e84 minor fixes 2025-07-18 13:22:59 +08:00
Concedo
1ca666f9c1 allow handling multipart files up to 999 2025-07-18 01:18:28 +08:00
Concedo
afca31bfbe handle clean_env for remotetunnel 2025-07-17 18:21:22 +08:00
Concedo
d4a394ff73 label attached media with ids 2025-07-17 10:04:46 +08:00