96 commits

Author SHA1 Message Date
Concedo
abf527a207 clearer multimodal capability display 2025-07-28 22:54:49 +08:00
Concedo
811463a704 split audio and vision detection separately 2025-07-13 17:47:15 +08:00
Concedo
65ff041827 added more perf stats 2025-06-21 12:12:28 +08:00
Concedo
736030bb9f save and load state upgraded to 3 available states 2025-06-04 22:09:40 +08:00
Concedo
53f1511396 use a static buffer for kv reloads instead. also, added into lite ui 2025-06-03 22:32:46 +08:00
Concedo
4b57108508 Save KV State and Load KV State to memory added. GUI not yet updated 2025-06-03 17:46:29 +08:00
Concedo
8e1ebc55b5 dropped support for lora base as upstream no longer uses it. If provided it will be silently ignored 2025-06-02 12:49:53 +08:00
Concedo
6a709be50a replace deprecated 2025-03-27 10:27:20 +08:00
Concedo
3992fb79cc wip adding embeddings support 2025-03-24 18:01:23 +08:00
Concedo
eb1809c105 add more perf stats 2025-03-12 18:58:27 +08:00
Concedo
b3de1598e7 Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS
tts is functional (+6 squashed commit)

Squashed commit:

[22396311] wip tts

[3a883027] tts not yet working

[0dcfab0e] fix silly bug

[a378d9ef] some long overdue cleanup

[fc5a6fb5] Wip tts

[39f50497] wip TTS integration
2025-01-13 14:23:25 +08:00
Concedo
f75bbb945f speculative decoding initial impl completed (+6 squashed commit)
Squashed commit:

[0a6306ca0] draft wip dont use (will be squashed)

[a758a1c9c] wip dont use (will be squashed)

[e1994d3ce] wip dont use

[f59690d68] wip

[77228147d] wip on spec decoding. dont use yet

[2445bca54] wip adding speculative decoding (+1 squashed commits)

Squashed commits:

[50e341bb7] wip adding speculative decoding
2024-11-30 10:41:10 +08:00
Concedo
2c1a06a07d wip ollama emulation, added detokenize endpoint 2024-11-23 22:48:03 +08:00
Concedo
272828cab0 tweaks to chat template 2024-11-21 11:10:30 +08:00
kallewoof
547ab2aebb API: add /props route (#1222)
* API: add an /extra/chat_template route

A lot of manual tweaking is done when swapping between models. We can automate or make better assumptions about some of them by having more information, such as chat template. This PR adds an endpoint /extra/chat_template which returns the model chat template string as is in a 'chat_template' key. The front end can then use this to derive the proper templates or use it as is, or at least warn the user when they are trying to use e.g. a Mistral preset with a Llama 3.1 model.

* switch to pre-established /props endpoint for chat template

* bug-fix (upstream): one-off in string juggling
2024-11-21 10:58:32 +08:00
Concedo
aa26a58085 added logprobs api and logprobs viewer 2024-11-01 00:22:15 +08:00
Concedo
90f5cd0f67 wip logprobs data 2024-10-30 00:59:34 +08:00
Concedo
12fd16bfd4 Merge commit 'df270ef745' into concedo_experimental
# Conflicts:
#	Makefile
#	common/CMakeLists.txt
#	common/common.h
#	common/sampling.cpp
#	common/sampling.h
#	examples/infill/infill.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/quantize-stats/quantize-stats.cpp
#	examples/server/server.cpp
#	include/llama.h
#	src/llama-sampling.cpp
#	src/llama-sampling.h
#	src/llama.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-grammar-parser.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tests/test-llama-grammar.cpp
#	tests/test-sampling.cpp
2024-09-09 17:10:08 +08:00
Concedo
27bbdf7d2a added link for novita AI, added legacy warning for old GGML models 2024-09-09 11:19:32 +08:00
Concedo
813cf829b5 allow selecting multigpu on vulkan 2024-06-06 18:36:56 +08:00
Concedo
10b148f4c2 added skip bos for tokenize endpoint 2024-06-05 10:49:11 +08:00
Concedo
f24aef8792 initial whisper integration 2024-05-29 23:13:11 +08:00
Concedo
7968bdebbb added more stats in perf 2024-03-16 16:53:48 +08:00
Concedo
d943c739a8 wip submitting of llava image to backend 2024-03-10 17:14:27 +08:00
Concedo
5a44d4de2b refactor and clean identifiers for sd, fix cmake 2024-02-29 18:28:45 +08:00
Concedo
524ba12abd refactor - do not use a copy buffer to store generation outputs, instead return a cpp allocated ptr 2024-02-29 14:02:20 +08:00
Concedo
f75e479db0 WIP on sdcpp integration 2024-02-29 00:40:07 +08:00
Concedo
4cd571db89 vulkan multigpu, show uptime 2024-02-08 16:54:38 +08:00
Concedo
ec2dbd99a3 Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	README.md
#	flake.lock
#	llama.cpp
2024-02-07 22:21:32 +08:00
Concedo
481f7a6fbc warn about unsupported arch 2024-01-26 14:22:43 +08:00
Concedo
2a4a7241e6 Merge branch 'vulkan_test' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	llama.cpp
2024-01-25 23:01:44 +08:00
Concedo
d9a7bd577a gpu layer offloading disabled for phi models in clblast 2024-01-25 17:40:05 +08:00
Concedo
94e68fe474 added field to show recent seed 2024-01-02 15:35:04 +08:00
Nexesenex
cf360f3e62 Update expose.cpp '#include <cstdint> (#586) 2023-12-28 15:01:22 +08:00
Concedo
6570a2005b token count includes ids 2023-12-03 15:44:53 +08:00
Concedo
be92cfa125 added preloadstory 2023-11-10 13:05:22 +08:00
Concedo
7fb809b94b fixed auto rope scaling (+1 squashed commits)
Squashed commits:

[b1767874] wip
2023-09-07 14:45:08 +08:00
Concedo
1301bd7e29 Fix to skip GPU offloading so falcon models work correctly 2023-08-30 18:26:41 +08:00
Concedo
b95a4ccb22 added a token counting endpoint, set mmq as default 2023-08-24 20:41:49 +08:00
Concedo
280abaf029 added stop reason in the perf endpoint 2023-07-24 11:55:35 +08:00
Concedo
39dc1a46c4 added token count, updated lite 2023-07-20 14:41:06 +08:00
Concedo
1d1111e10f expose timing info in web api 2023-07-11 18:56:06 +08:00
callMeMakerRen
4e46673f80 Merge branch 'LostRuins:concedo' into concedo 2023-07-08 09:33:26 +08:00
shutup
1727e652f1 expose some useful info that can be used in statistics of performance 2023-07-07 11:52:58 +08:00
Concedo
27a0907cfa backport MM256_SET_M128I to ggml_v2, updated lite, added support for selecting the GPU for cublas 2023-07-06 22:33:46 +08:00
Concedo
66a3f4e421 added support for lora base 2023-06-10 19:29:45 +08:00
Concedo
43f7e40470 added extra endpoints for abort gen and polled streaming 2023-06-10 18:13:26 +08:00
SammCheese
e6231c3055 back to http.server, improved implementation 2023-06-09 12:17:55 +02:00
SammCheese
9a8da35ec4 working streaming. TODO: fix lite 2023-06-08 18:34:23 +02:00
SammCheese
97971291e9 draft: token streaming 2023-06-08 18:34:08 +02:00