Concedo
64ce5fca15
better approach when the SWA window is exceeded: simply refill the window. This is not 100% correct, but good enough for fast-forward users. Disable FF or increase the window if it is not good enough
2026-04-17 11:44:13 +08:00
Concedo
dc2e6ca2e3
fix header path
2026-04-05 11:02:08 +08:00
Concedo
eb3422996a
BOS fix for gemma4
2026-04-04 22:15:01 +08:00
Concedo
226c79338f
handle glm4.7 flash template
2026-01-28 23:29:08 +08:00
Concedo
b867b67e7e
added mechanics for a full clear if fast forward is not used; this should help recover from bad states
2025-12-05 16:43:37 +08:00
Concedo
8631bbcee3
linting
2025-11-18 18:56:31 +08:00
LostRuins Concedo
5751c30790
add vulkan for whisper
2025-11-13 15:37:58 +08:00
Concedo
3b30f12ca7
future proof handling of rnn models
2025-10-07 19:12:47 +08:00
Concedo
7857578f45
handle more rnn models
2025-10-07 13:47:15 +08:00
Concedo
5d89a48a50
add more rnn models supported
2025-09-24 18:14:59 +08:00
Concedo
52606e9b1d
tts cpp model is now loadable in kcpp
2025-08-17 15:47:22 +08:00
Concedo
7b5cf7143f
handle gguf already containing renamed diffusion tensors prefix
2025-08-12 22:42:29 +08:00
Concedo
3468c2834d
fixed adv mode
2025-08-08 22:26:36 +08:00
Concedo
61c19fea56
fixed glm4 sop, lower regex max stacks (+2 squashed commits)
...
Squashed commit:
[47e39ae5d] lower regex max stack again
[0a32ca232] lower regex max stack again
2025-08-06 17:10:57 +08:00
Concedo
5a3b2e3921
fix for jamba models - they have recurrent layers like rwkv, so context shifting and forwarding won't work on them.
2025-07-12 18:54:40 +08:00
Concedo
c45b8dc56f
fix for gemma3n
2025-07-10 17:39:08 +08:00
Concedo
f125e724eb
fix off-by-one npast during some instances of fast forwarding
2025-05-22 19:51:21 +08:00
Concedo
f841b29c41
fixed unicode paths
2025-05-11 14:05:54 +08:00
Concedo
c2802af9e8
fix qwen3, fixed sd, fixed glm4
2025-04-29 20:50:46 +08:00
Concedo
4decd6bea1
GLM4 batch clamp
2025-04-26 09:42:17 +08:00
Concedo
35dc8387e9
fixed rwkv7 handling
2025-04-26 02:13:06 +08:00
Concedo
0460d92cc3
disable context shifting for gemma3
2025-03-13 20:28:26 +08:00
Concedo
b162c25a5e
fixed moe experts to use detected arch for key
2025-02-10 17:46:08 +08:00
Concedo
e788b8289a
You'll never take us alive
...
We swore that death will do us part
They'll call our crimes a work of art
2025-01-09 11:27:06 +08:00
Concedo
00d154b32b
wip on qwen2vl integration, updated msvc runtimes
2024-12-15 23:58:02 +08:00
Concedo
bb13925f39
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CMakePresets.json
# Makefile
# Package.swift
# ci/run.sh
# common/CMakeLists.txt
# examples/CMakeLists.txt
# flake.lock
# ggml/src/CMakeLists.txt
# ggml/src/ggml-backend.cpp
# ggml/src/ggml.c
# pocs/vdot/q8dot.cpp
# pocs/vdot/vdot.cpp
# tests/test-backend-ops.cpp
# tests/test-grad0.cpp
# tests/test-quantize-fns.cpp
# tests/test-quantize-perf.cpp
# tests/test-rope.cpp
2024-11-04 16:54:53 +08:00
Concedo
fc7fe2e7a0
allow rwkv6 to run although it's broken
2024-09-09 20:50:58 +08:00
Concedo
0dd3907940
qwen2 warning FA
2024-07-09 20:53:25 +08:00
Nexesenex
cb2336f5d9
Gradient rope formula with offsets ( #938 )
...
* Gradient rope formula with offsets
Positive for Solar models
Negative for Llama 1 and 2 models
* Update gpttype_adapter.cpp
Remove L1/L2
* cleanup PR, skip llama models, keep prints behind debug mode
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-06-25 20:46:34 +08:00
askmyteapot
1e72b65c38
GradientAI Auto ROPE Base calculation ( #910 )
...
* GradientAI Auto ROPE Base calculation
https://gradient.ai/blog/scaling-rotational-embeddings-for-long-context-language-models
has a formula that better fits the ideal rope scaling.
Tested with Llama3; checked the calculation is correct for llama2. Retains logic for not scaling rope if under trained CTX.
* add in solar scaling logic
Solar-based models require the context values to be multiplied by 8. This is (I'm guessing) because the positions are based on a 32k context, but with a sliding window of 4k.
* Update model_adapter.h
adding in tensor count to identify Solar models based on a tensor count of 435.
* Update model_adapter.cpp
add in n_tensor count for solar identification
* refactor and cleanup GradientAI rope scaling
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-06-13 18:12:00 +08:00
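The two ROPE entries above describe a gradient-style auto base calculation plus a Solar-specific ×8 context multiplier. A minimal sketch of that idea is below; it is reconstructed only from the commit descriptions, not from the actual gpttype_adapter.cpp code, so the function name, the NTK-style exponent, and where exactly the multiplier is applied are all assumptions.

```cpp
#include <cmath>

// Hypothetical sketch, NOT the real koboldcpp implementation.
// Raises the rotary base when the requested context exceeds the trained
// context; Solar models get their trained context multiplied by 8 first,
// as the commit message describes.
double auto_rope_base(double base, double n_ctx, double n_ctx_train,
                      bool is_solar)
{
    if (is_solar) {
        n_ctx_train *= 8.0; // positions trained on 32k ctx, 4k sliding window
    }
    if (n_ctx <= n_ctx_train) {
        return base; // retain logic: don't scale rope under trained ctx
    }
    const double two_pi = 6.28318530717958647692;
    const double chi_ctx       = n_ctx / two_pi;
    const double chi_ctx_train = n_ctx_train / two_pi;
    // grow the base so the longest rotary period spans the new context
    return std::pow(base, std::log(chi_ctx) / std::log(chi_ctx_train));
}
```

Under this sketch, extending an 8k-trained model to 16k raises a 10000 base to roughly 2.4× its value, while a request at or below the trained length returns the base unchanged, and a Solar model's trained length is treated as 8× larger before that comparison.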
Concedo
47c42fd45c
fix for mamba processing
2024-03-13 13:27:46 +08:00
Concedo
f75e479db0
WIP on sdcpp integration
2024-02-29 00:40:07 +08:00
Concedo
762eeb6204
triage for opencl
2024-01-27 11:09:43 +08:00
Concedo
d9a7bd577a
gpu layer offloading disabled for phi models in clblast
2024-01-25 17:40:05 +08:00
Concedo
375003b458
always show reported arch
2023-12-22 11:15:07 +08:00
Concedo
8b919b5b57
allow customized rope to use model set values
2023-11-15 16:21:52 +08:00
Concedo
5db89b90b7
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .gitignore
# CMakeLists.txt
# Makefile
# README.md
# build.zig
# ggml-opencl.cpp
# tests/CMakeLists.txt
# tests/test-double-float.cpp
# tests/test-sampling.cpp
2023-10-25 23:58:15 +08:00
Concedo
839fc6dac8
handle freq_base_train
2023-10-24 23:44:22 +08:00
Concedo
c1ca1de2ac
fixed support for old falcon models
2023-10-18 17:20:44 +08:00
Concedo
7fb809b94b
fixed auto rope scaling (+1 squashed commit)
...
Squashed commits:
[b1767874] wip
2023-09-07 14:45:08 +08:00
Concedo
d4c22a8b02
updated lite, added autorope config based on trained ctxlen, hotfix for falcon gpu broken
2023-08-30 16:50:55 +08:00
Concedo
4b00916ac7
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .dockerignore
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# README.md
# flake.lock
# flake.nix
# tests/CMakeLists.txt
2023-08-28 14:19:05 +08:00
Concedo
bfdc596d58
gguf reader in file format detection
2023-08-23 19:19:52 +08:00
Concedo
39cc83e8c9
incomplete merge, compiles but generates rubbish
2023-08-22 23:12:47 +08:00
Concedo
3a7853d259
handle stablecode-completion-alpha-3b
2023-08-09 21:07:57 +08:00
Concedo
df9135e3a9
fixing memory bugs
2023-06-23 18:41:23 +08:00
Concedo
9b6c35b651
rwkv speed enhancements (batch processing), fixed a rwkv token processing bug
2023-06-13 16:02:12 +08:00
Concedo
6f82e17b7a
added MPT support
2023-06-03 16:14:08 +08:00
Concedo
5d9f5b28a6
rwkv integration completed
2023-05-28 00:48:56 +08:00
Concedo
01a0f206df
added support for starcoder, which is basically gpt2
2023-05-27 13:35:40 +08:00