Concedo
993925ba96
gracefully handle bad grammar instead of crashing
2026-03-23 17:00:53 +08:00
Concedo
13db5aee9e
stub files for loading ace step
2026-02-22 23:15:08 +08:00
Concedo
7f485e5287
remove CLBlast, part 1
2026-01-23 13:50:12 +08:00
Concedo
c9c15749e0
wip on adding esrgan upscaling
2026-01-20 00:35:35 +08:00
Llama
95ebfdcde8
Add token ids to logprob data returned by the API (#1928)
Previously, logprobs only contained the token string
and byte data, as well as the log probability itself.
For workflows that require the token id, translating
from the token bytes to the token id is potentially
costly and unreliable. It is simple and inexpensive
to expose the numeric token ids directly instead.
2026-01-18 16:30:46 +08:00
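The change described above can be sketched with a minimal example. The field names here are illustrative assumptions based on the commit description (token string, byte data, log probability, plus the new numeric id), not the exact API schema:

```python
# Sketch of a logprob entry before and after the change described above.
# Field names and the id value are illustrative, not the exact API schema.

def attach_token_id(entry: dict, token_id: int) -> dict:
    """Return a copy of the entry with the numeric token id added
    alongside the existing string/bytes/logprob data."""
    return {**entry, "id": token_id}

before = {"token": "Hello", "bytes": [72, 101, 108, 108, 111], "logprob": -0.12}
after = attach_token_id(before, 15043)  # hypothetical token id
```

With the id exposed directly, a client no longer needs to re-tokenize the byte data to recover it.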
Concedo
d2b2224b0d
vulkan env var always takes priority
2026-01-17 10:34:45 +08:00
Wagner Bruna
f30da43b7f
sd: get the available schedulers directly from sd.cpp (#1900)
Avoids a hardcoded list on the Python side.
2025-12-24 21:55:24 +08:00
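The idea behind this change can be illustrated with a small sketch. Suppose (hypothetically; the exported symbol and wire format here are assumptions, not sd.cpp's actual interface) the C side returned the scheduler names as one delimited string; the Python side would then parse that instead of maintaining its own hardcoded list:

```python
# Hypothetical sketch: parse a scheduler list handed over from the C side
# as a comma-separated C string, instead of hardcoding the names in Python.

def parse_scheduler_list(raw: bytes) -> list[str]:
    """Split a buffer like b"discrete,karras,exponential" into names,
    dropping any empty entries."""
    return [name for name in raw.decode("utf-8").split(",") if name]

schedulers = parse_scheduler_list(b"discrete,karras,exponential")
```

Keeping the list on one side avoids the two copies drifting apart when upstream adds a scheduler.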
Concedo
1e083d9c8b
integrate autofit for upstream, removed forceversion
2025-12-17 18:42:47 +08:00
Ruben Garcia
06d39dff73
Fix warnings (#1864)
2025-11-29 20:18:38 +08:00
Concedo
abf527a207
clearer multimodal capability display
2025-07-28 22:54:49 +08:00
Concedo
811463a704
split audio and vision detection separately
2025-07-13 17:47:15 +08:00
Concedo
65ff041827
added more perf stats
2025-06-21 12:12:28 +08:00
Concedo
736030bb9f
save and load state upgraded to 3 available states
2025-06-04 22:09:40 +08:00
Concedo
53f1511396
use a static buffer for kv reloads instead. also, added into lite ui
2025-06-03 22:32:46 +08:00
Concedo
4b57108508
Save KV State and Load KV State to memory added. GUI not yet updated
2025-06-03 17:46:29 +08:00
Concedo
8e1ebc55b5
dropped support for lora base as upstream no longer uses it. If provided, it will be silently ignored
2025-06-02 12:49:53 +08:00
Concedo
6a709be50a
replace deprecated
2025-03-27 10:27:20 +08:00
Concedo
3992fb79cc
wip adding embeddings support
2025-03-24 18:01:23 +08:00
Concedo
eb1809c105
add more perf stats
2025-03-12 18:58:27 +08:00
Concedo
b3de1598e7
Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS
tts is functional (+6 squashed commits)
Squashed commit:
[22396311] wip tts
[3a883027] tts not yet working
[0dcfab0e] fix silly bug
[a378d9ef] some long overdue cleanup
[fc5a6fb5] Wip tts
[39f50497] wip TTS integration
2025-01-13 14:23:25 +08:00
Concedo
f75bbb945f
speculative decoding initial impl completed (+6 squashed commits)
Squashed commit:
[0a6306ca0] draft wip dont use (will be squashed)
[a758a1c9c] wip dont use (will be squashed)
[e1994d3ce] wip dont use
[f59690d68] wip
[77228147d] wip on spec decoding. dont use yet
[2445bca54] wip adding speculative decoding (+1 squashed commit)
Squashed commits:
[50e341bb7] wip adding speculative decoding
2024-11-30 10:41:10 +08:00
Concedo
2c1a06a07d
wip ollama emulation, added detokenize endpoint
2024-11-23 22:48:03 +08:00
Concedo
272828cab0
tweaks to chat template
2024-11-21 11:10:30 +08:00
kallewoof
547ab2aebb
API: add /props route (#1222)
* API: add an /extra/chat_template route
A lot of manual tweaking is done when swapping between models. Some of it can be automated, or better defaults chosen, given more information such as the model's chat template. This PR adds an endpoint /extra/chat_template which returns the model's chat template string as-is under a 'chat_template' key. The front end can then use it to derive the proper templates, use it directly, or at least warn the user when they try to use e.g. a Mistral preset with a Llama 3.1 model.
* switch to pre-established /props endpoint for chat template
* bug-fix (upstream): one-off in string juggling
2024-11-21 10:58:32 +08:00
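A front end consuming the /props endpoint described above might extract the template like this. This is a minimal sketch: the only assumed detail of the response shape is the 'chat_template' key named in the commit text, and the sample payload is hypothetical:

```python
import json

def get_chat_template(props_body: str):
    """Pull the raw chat template string out of a /props JSON response body.
    Returns None if the server did not include one."""
    props = json.loads(props_body)
    return props.get("chat_template")

# Hypothetical response payload:
sample = '{"chat_template": "{% for message in messages %}...{% endfor %}"}'
template = get_chat_template(sample)
```

The front end can then match the returned string against its known presets, or warn when the loaded model's template does not fit the selected one.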
Concedo
aa26a58085
added logprobs api and logprobs viewer
2024-11-01 00:22:15 +08:00
Concedo
90f5cd0f67
wip logprobs data
2024-10-30 00:59:34 +08:00
Concedo
12fd16bfd4
Merge commit 'df270ef745' into concedo_experimental
# Conflicts:
# Makefile
# common/CMakeLists.txt
# common/common.h
# common/sampling.cpp
# common/sampling.h
# examples/infill/infill.cpp
# examples/llama-bench/llama-bench.cpp
# examples/quantize-stats/quantize-stats.cpp
# examples/server/server.cpp
# include/llama.h
# src/llama-sampling.cpp
# src/llama-sampling.h
# src/llama.cpp
# tests/test-grammar-integration.cpp
# tests/test-grammar-parser.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-llama-grammar.cpp
# tests/test-sampling.cpp
2024-09-09 17:10:08 +08:00
Concedo
27bbdf7d2a
added link for novita AI, added legacy warning for old GGML models
2024-09-09 11:19:32 +08:00
Concedo
813cf829b5
allow selecting multigpu on vulkan
2024-06-06 18:36:56 +08:00
Concedo
10b148f4c2
added skip bos for tokenize endpoint
2024-06-05 10:49:11 +08:00
Concedo
f24aef8792
initial whisper integration
2024-05-29 23:13:11 +08:00
Concedo
7968bdebbb
added more stats in perf
2024-03-16 16:53:48 +08:00
Concedo
d943c739a8
wip submitting of llava image to backend
2024-03-10 17:14:27 +08:00
Concedo
5a44d4de2b
refactor and clean identifiers for sd, fix cmake
2024-02-29 18:28:45 +08:00
Concedo
524ba12abd
refactor - do not use a copy buffer to store generation outputs, instead return a cpp allocated ptr
2024-02-29 14:02:20 +08:00
Concedo
f75e479db0
WIP on sdcpp integration
2024-02-29 00:40:07 +08:00
Concedo
4cd571db89
vulkan multigpu, show uptime
2024-02-08 16:54:38 +08:00
Concedo
ec2dbd99a3
Merge branch 'master' into concedo_experimental
# Conflicts:
# CMakeLists.txt
# Makefile
# README.md
# flake.lock
# llama.cpp
2024-02-07 22:21:32 +08:00
Concedo
481f7a6fbc
warn about unsupported arch
2024-01-26 14:22:43 +08:00
Concedo
2a4a7241e6
Merge branch 'vulkan_test' into concedo_experimental
# Conflicts:
# CMakeLists.txt
# Makefile
# llama.cpp
2024-01-25 23:01:44 +08:00
Concedo
d9a7bd577a
gpu layer offloading disabled for phi models in clblast
2024-01-25 17:40:05 +08:00
Concedo
94e68fe474
added field to show recent seed
2024-01-02 15:35:04 +08:00
Nexesenex
cf360f3e62
Update expose.cpp '#include <cstdint>' (#586)
2023-12-28 15:01:22 +08:00
Concedo
6570a2005b
token count includes ids
2023-12-03 15:44:53 +08:00
Concedo
be92cfa125
added preloadstory
2023-11-10 13:05:22 +08:00
Concedo
7fb809b94b
fixed auto rope scaling (+1 squashed commit)
Squashed commits:
[b1767874] wip
2023-09-07 14:45:08 +08:00
Concedo
1301bd7e29
Fix to skip GPU offloading so falcon models work correctly
2023-08-30 18:26:41 +08:00
Concedo
b95a4ccb22
added a token counting endpoint, set mmq as default
2023-08-24 20:41:49 +08:00
Concedo
280abaf029
added stop reason in the perf endpoint
2023-07-24 11:55:35 +08:00
Concedo
39dc1a46c4
added token count, updated lite
2023-07-20 14:41:06 +08:00