Concedo
52606e9b1d
tts cpp model is now loadable in kcpp
2025-08-17 15:47:22 +08:00
Concedo
7b5cf7143f
handle gguf already containing renamed diffusion tensors prefix
2025-08-12 22:42:29 +08:00
Concedo
3468c2834d
fixed adv mode
2025-08-08 22:26:36 +08:00
Concedo
5a3b2e3921
fix for jamba models - they have recurrent layers like rwkv, so context shifting and forwarding won't work on them.
2025-07-12 18:54:40 +08:00
Concedo
c45b8dc56f
fix for gemma3n
2025-07-10 17:39:08 +08:00
Concedo
736030bb9f
save and load state upgraded to 3 available states
2025-06-04 22:09:40 +08:00
Concedo
53f1511396
use a static buffer for KV reloads instead; also added to the Lite UI
2025-06-03 22:32:46 +08:00
Concedo
4b57108508
Added Save KV State and Load KV State (to memory). GUI not yet updated
2025-06-03 17:46:29 +08:00
Concedo
c2802af9e8
fixed qwen3, sd, and glm4
2025-04-29 20:50:46 +08:00
Concedo
4decd6bea1
GLM4 batch clamp
2025-04-26 09:42:17 +08:00
Concedo
3992fb79cc
wip adding embeddings support
2025-03-24 18:01:23 +08:00
Concedo
0460d92cc3
disable context shifting for gemma3
2025-03-13 20:28:26 +08:00
Concedo
b162c25a5e
fixed moe experts to use detected arch for key
2025-02-10 17:46:08 +08:00
Concedo
b3de1598e7
Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS
...
tts is functional (+6 squashed commits)
Squashed commit:
[22396311] wip tts
[3a883027] tts not yet working
[0dcfab0e] fix silly bug
[a378d9ef] some long overdue cleanup
[fc5a6fb5] Wip tts
[39f50497] wip TTS integration
2025-01-13 14:23:25 +08:00
Concedo
00d154b32b
wip on qwen2vl integration, updated msvc runtimes
2024-12-15 23:58:02 +08:00
Concedo
2c1a06a07d
wip ollama emulation, added detokenize endpoint
2024-11-23 22:48:03 +08:00
kallewoof
547ab2aebb
API: add /props route ( #1222 )
...
* API: add an /extra/chat_template route
A lot of manual tweaking is done when swapping between models. We can automate some of it, or at least make better assumptions, by exposing more information such as the chat template. This PR adds an endpoint /extra/chat_template which returns the model's chat template string as-is under a 'chat_template' key. The front end can then use this to derive the proper templates, use it directly, or at least warn the user when they try to use e.g. a Mistral preset with a Llama 3.1 model.
* switch to pre-established /props endpoint for chat template
* bug-fix (upstream): off-by-one in string juggling
2024-11-21 10:58:32 +08:00
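The /props entry above suggests the front end can sniff the returned chat template string to pick a matching preset. A minimal sketch of that idea follows; the function name and the marker-based heuristics are illustrative assumptions, not the actual front-end logic, though the template markers themselves are the well-known Mistral, Llama 3, and ChatML delimiters.

```python
def guess_preset(chat_template: str) -> str:
    """Guess a prompt-format preset from a model's chat template string
    (as returned under the 'chat_template' key of /props).
    Heuristics here are illustrative, not kcpp's real detection code."""
    t = chat_template.lower()
    if "[inst]" in t:                  # Mistral-style instruct markers
        return "mistral"
    if "<|start_header_id|>" in t:     # Llama 3 header tokens
        return "llama3"
    if "<|im_start|>" in t:            # ChatML delimiters
        return "chatml"
    return "unknown"
```

With this in place, the UI could warn when the guessed preset disagrees with the one the user selected, which is exactly the Mistral-preset-on-Llama-3.1 mismatch the PR description calls out.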
Concedo
90f5cd0f67
wip logprobs data
2024-10-30 00:59:34 +08:00
Concedo
fc7fe2e7a0
allow rwkv6 to run although it's broken
2024-09-09 20:50:58 +08:00
Concedo
0dd3907940
qwen2: warning for FA (flash attention)
2024-07-09 20:53:25 +08:00
Nexesenex
cb2336f5d9
Gradient rope formula with offsets ( #938 )
...
* Gradient rope formula with offsets
Positive for Solar models
Negative for Llama 1 and 2 models
* Update gpttype_adapter.cpp
Remove L1/L2
* cleanup PR, skip llama models, keep prints behind debug mode
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-06-25 20:46:34 +08:00
askmyteapot
1e72b65c38
GradientAI Auto ROPE Base calculation ( #910 )
...
* GradientAI Auto ROPE Base calculation
https://gradient.ai/blog/scaling-rotational-embeddings-for-long-context-language-models
has a formula that better fits the ideal rope scaling.
Tested with Llama 3; checked the calculation is correct for Llama 2. Retains logic for not scaling rope if under trained CTX.
* add in solar scaling logic
Solar based models require the context values to be multiplied by 8. This is (I'm guessing) because the positions are based on a 32k context, but with a sliding window of 4k.
* Update model_adapter.h
adding in a tensor count check to identify Solar models (435 tensors).
* Update model_adapter.cpp
add in n_tensor count for solar identification
* refactor and cleanup GradientAI rope scaling
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-06-13 18:12:00 +08:00
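The two rope entries above describe auto-computing a rope frequency base from the trained vs. target context, with a Solar-specific x8 context multiplier and a skip when under the trained context. A rough sketch under assumptions: the PR's exact GradientAI formula is not reproduced here; this uses the widely known NTK-aware form b' = b * s^(d / (d - 2)), and the function name, default base, and head dimension are illustrative.

```python
def auto_rope_base(train_ctx: int, target_ctx: int,
                   base: float = 10000.0, head_dim: int = 128,
                   is_solar: bool = False) -> float:
    """Sketch of auto rope-base scaling in the spirit of the commits above.
    Uses the standard NTK-aware formula, not necessarily the PR's exact one."""
    if is_solar:
        # Per the commit: Solar models need context values multiplied by 8
        # (positions laid out for 32k context, but a 4k sliding window).
        target_ctx *= 8
    if target_ctx <= train_ctx:
        # Retain the logic of not scaling rope when under the trained CTX.
        return base
    scale = target_ctx / train_ctx
    return base * scale ** (head_dim / (head_dim - 2))
```

The later "offsets" commit adds per-family adjustments on top of this (positive for Solar, negative for Llama 1/2), which would slot in as additive tweaks to the computed base.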
Concedo
10b148f4c2
added skip bos for tokenize endpoint
2024-06-05 10:49:11 +08:00
Concedo
f24aef8792
initial whisper integration
2024-05-29 23:13:11 +08:00
Concedo
47c42fd45c
fix for mamba processing
2024-03-13 13:27:46 +08:00
Concedo
5a44d4de2b
refactor and clean identifiers for sd, fix cmake
2024-02-29 18:28:45 +08:00
Concedo
524ba12abd
refactor - do not use a copy buffer to store generation outputs, instead return a cpp allocated ptr
2024-02-29 14:02:20 +08:00
Concedo
f75e479db0
WIP on sdcpp integration
2024-02-29 00:40:07 +08:00
Concedo
762eeb6204
triage for opencl
2024-01-27 11:09:43 +08:00
Concedo
d9a7bd577a
gpu layer offloading disabled for phi models in clblast
2024-01-25 17:40:05 +08:00
Concedo
6570a2005b
token count includes ids
2023-12-03 15:44:53 +08:00
Concedo
8b919b5b57
allow customized rope to use model set values
2023-11-15 16:21:52 +08:00
Concedo
839fc6dac8
handle freq_base_train
2023-10-24 23:44:22 +08:00
Concedo
c1ca1de2ac
fixed support for old falcon models
2023-10-18 17:20:44 +08:00
Concedo
7fb809b94b
fixed auto rope scaling (+1 squashed commit)
...
Squashed commits:
[b1767874] wip
2023-09-07 14:45:08 +08:00
Concedo
d4c22a8b02
updated lite, added autorope config based on trained ctxlen, hotfix for falcon gpu broken
2023-08-30 16:50:55 +08:00
Concedo
b95a4ccb22
added a token counting endpoint, set mmq as default
2023-08-24 20:41:49 +08:00
Concedo
981c9131f0
gguf for llama is working
2023-08-23 16:07:07 +08:00
Concedo
39cc83e8c9
incomplete merge, compiles but generates rubbish
2023-08-22 23:12:47 +08:00
Concedo
43f7e40470
added extra endpoints for abort gen and polled streaming
2023-06-10 18:13:26 +08:00
Concedo
d28ed99e59
remove unused declarations
2023-06-09 18:01:55 +08:00
Concedo
6f82e17b7a
added MPT support
2023-06-03 16:14:08 +08:00
Concedo
5d9f5b28a6
rwkv integration completed
2023-05-28 00:48:56 +08:00
Concedo
c048bcfec4
remove old filever checks (+7 squashed commits)
...
Squashed commit:
[b72627a] new format not working
[e568870] old ver works
[7053b77] compile errors fixed, fixing linkers
[4ae8889] add new ver
[ff82dfd] file format checks
[25b8aa8] refactoring type names
[931063b] still merging
2023-05-21 00:15:39 +08:00
Concedo
b692e4d2a4
wip
2023-05-14 17:21:07 +08:00
Concedo
05cf5f7d6e
partially working, but the blas matmul is broken
2023-05-13 11:35:38 +08:00
Concedo
2f2eff6e13
the dark gods have been sated, and redpajama is integrated... but at what cost?
2023-05-08 20:58:00 +08:00
Concedo
5eec5d6ed9
Added backwards compatibility to an earlier version of NeoX.
2023-04-25 20:34:18 +08:00
Concedo
ef13443047
wip pythia integration
2023-04-22 01:08:23 +08:00
Concedo
45ec09d31b
fast forwarding for rwkv for unmodified contexts
2023-04-19 15:09:35 +08:00