Concedo
3326bdc00a
if blank, autoguess template
2025-09-27 12:49:32 +08:00
Concedo
c7a1eec4e4
try to solve ttscpp oom regression
2025-09-24 17:45:28 +08:00
Concedo
d3f9db8d33
fix system32 writability check
2025-09-24 14:43:41 +08:00
Concedo
174d00bb74
fix aria2c with both download cases
2025-09-22 21:47:08 +08:00
Concedo
59b6a09ae1
try to fix kokoro alloc again
2025-09-22 21:22:41 +08:00
Concedo
216b766aee
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build-riscv-native.yml
# .github/workflows/build.yml
# ci/README.md
# ci/run.sh
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# tests/test-backend-ops.cpp
2025-09-22 13:56:02 +08:00
Concedo
08c0246f24
prioritize cwd for downloads if it's writable
2025-09-22 13:47:50 +08:00
Concedo
13bee0d39d
some minor fixes
2025-09-22 13:20:06 +08:00
Concedo
c8686a627e
don't mandate mistral common for other model usage
2025-09-21 21:16:02 +08:00
Concedo
9e7661352c
Revert "FA default on"
This reverts commit 19c3efb34a.
2025-09-21 17:28:49 +08:00
Concedo
19c3efb34a
FA default on
2025-09-14 11:53:18 +08:00
Concedo
bf8fc4659b
updated lite, tweak default rep pens, default fa off (+3 squashed commit)
Squashed commit:
[be2b10125] default rep pen 1.05
[cb4527b15] better to default fa off
[126104fe7] updated lite
2025-09-14 00:11:52 +08:00
Concedo
89feffc0e4
fix aria2c
2025-09-06 09:45:14 +08:00
Concedo
f9ce2a00f0
consistent file download locations
2025-09-06 09:26:45 +08:00
Concedo
979e2113e2
flash attention is now checked by default when using gui launcher
2025-09-03 23:36:43 +08:00
Concedo
5c4ad392ea
added a new parameter --ratelimit that will apply per-IP based rate limiting (to help prevent abuse of public instances).
2025-09-01 22:08:13 +08:00
Concedo
53360e2cff
linting
2025-08-30 15:27:31 +08:00
lone-cloud
cb9bd2fc4a
fix automatic VRAM detection for ROCm and Vulkan backends (#1715)
* use rocminfo for ROCm VRAM detection
* vulkan VRAM detection needs to consider all heaps; don't print that we're unable to detect VRAM until all detection has run
2025-08-30 15:22:32 +08:00
Concedo
7b396bd917
added v1 voices endpoint, added lcpp aliases for cli, fixed dia wrong voice
2025-08-30 11:20:18 +08:00
Concedo
645b09ea20
renamed promptlimit to genlimit, now applies to API requests as well, can be set in the ui. hide API info display if running in CLI mode.
2025-08-30 00:26:05 +08:00
Concedo
3060dfb99f
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# examples/model-conversion/Makefile
# examples/model-conversion/scripts/causal/convert-model.sh
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cuda/CMakeLists.txt
# scripts/compare-commits.sh
2025-08-28 23:17:29 +08:00
Concedo
3655ecf9b3
minor template and tts ui fixes
2025-08-27 22:30:09 +08:00
Concedo
205a0b8d4c
fix kokoro replacement, add 4096 batch size option
2025-08-25 15:57:13 +08:00
Concedo
b0a8d11584
add tts max length for kokoro (+1 squashed commits)
Squashed commits:
[c1c6feaf] add tts max length for kokoro
2025-08-24 17:57:29 +08:00
Concedo
a6aa47322b
csv fix
2025-08-23 12:48:11 +08:00
Concedo
80dabbb689
minor adjustments for sdquant: allow backend to do the translation for the type more defensively, adjust the UI dropdown for clarity.
2025-08-22 23:23:32 +08:00
Wagner Bruna
2f8b0ec538
Support q8_0 quantization for image model loading (#1692)
* Support q8_0 quantization for image model loading
q4_0 may degrade quality significantly, especially for smaller
models like SD 1.5 and SDXL. q8_0 provides a middle ground,
giving half the memory savings of q4_0 but loading faster and
with less quality loss.
* Accept --sdquant with no parameters
* Use numerical values for the sdquant option
2025-08-22 22:17:15 +08:00
Concedo
7fef0bc949
fix filename regex for whisper
2025-08-22 22:04:05 +08:00
Concedo
9dd6b4c930
improve whisper transcribe apt regex
2025-08-22 17:13:51 +08:00
liuyunrui123
c13db49d5b
Log output supports utf8 encoding display (#1700)
2025-08-21 16:52:03 +08:00
Concedo
3210b378e8
better tool calls
2025-08-20 22:11:31 +08:00
Concedo
eb33467c8c
fixed text
2025-08-20 12:25:04 +08:00
Wagner Bruna
6003e90e50
Add flash attention and conv2d direct controls for image generation (#1678)
* Add separate flash attention config for image generation
* Add config option for Conv2D Direct
2025-08-20 12:17:57 +08:00
Concedo
9fb0611115
handle contractions correctly, bump defaults
2025-08-18 22:33:44 +08:00
Concedo
2abe11071b
custom voice handling
2025-08-18 16:57:34 +08:00
Concedo
685129fb5a
add missing title, set max tts length to 1024, updated lite (+2 squashed commit)
Squashed commit:
[0737a028] add missing title
[a42328b0] add max tts length 1024
2025-08-17 21:42:56 +08:00
Concedo
bcaf379509
tts.cpp merged and working in kcpp!
2025-08-17 18:09:28 +08:00
Concedo
52606e9b1d
tts cpp model is now loadable in kcpp
2025-08-17 15:47:22 +08:00
Concedo
5a921a40f9
add overridenativecontext flag, stop nagging me
2025-08-14 22:54:45 +08:00
Concedo
4b2ca1169c
more consistency fixes
2025-08-13 19:28:53 +08:00
Concedo
955cf66bbc
load embedding at current maxctx instead of max trained ctx by default
2025-08-13 18:42:14 +08:00
Concedo
06a3ee4c3b
populate better server identifier headers.
2025-08-13 16:10:30 +08:00
Concedo
30e2f25c05
alias tensorsplit, fixed python error
2025-08-10 22:38:14 +08:00
Concedo
8e6d27f629
handle if assistant_message_gen and assistant_message_gen!=assistant_message_start, replace final output tag with unspaced (gen) version if exists
2025-08-10 16:51:34 +08:00
kallewoof
204739e7f1
Adapter fixes (#1659)
* test adapters
* add assistant_gen adapter key
* add support for chat templates stored as .jinja files
* removed mistakenly committed gated-tokenizers link
* autoguess: Harmony: add missing newline prefixes to system_end
2025-08-10 16:19:50 +08:00
Concedo
89266ac6b8
autoguess adapter make case insensitive
2025-08-10 00:58:47 +08:00
Concedo
487d509b44
try fix oldpc cuda broken without flash attn since upstream pr14361 between 1.94 and 1.95 (+1 squashed commits)
Squashed commits:
[940f0c639] try fix oldpc cuda broken without flash attn since upstream pr14361 between 1.94 and 1.95
2025-08-10 00:10:37 +08:00
Concedo
4c1faf61b2
increment version (+1 squashed commits)
Squashed commits:
[6e5080ad2] increment version
2025-08-09 20:53:26 +08:00
Concedo
ced98823a1
kai api tool calling
2025-08-09 10:51:10 +08:00
Concedo
9e7a940ce4
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/softmax_4_f16.cl
# ggml/src/ggml-opencl/kernels/softmax_4_f32.cl
# ggml/src/ggml-opencl/kernels/softmax_f16.cl
# ggml/src/ggml-opencl/kernels/softmax_f32.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
2025-08-09 01:24:52 +08:00