Concedo
3210b378e8
better tool calls
2025-08-20 22:11:31 +08:00
Concedo
5a921a40f9
add overridenativecontext flag, stop nagging me
2025-08-14 22:54:45 +08:00
Concedo
4c1faf61b2
increment version (+1 squashed commits)
...
Squashed commits:
[6e5080ad2] increment version
2025-08-09 20:53:26 +08:00
Concedo
338b1fe97e
readjusted mistral and oai template, fixed compile issue on termux, updated lite, show generated token ids in debug mode
2025-08-07 21:14:48 +08:00
Concedo
34487d3c02
gpt oss harmony template
2025-08-06 11:39:40 +08:00
Concedo
e40d26b9e7
allow offloading moe to cpu with --moecpu
2025-08-05 23:42:42 +08:00
Concedo
428a07416a
cleanup some debug
2025-08-05 00:07:22 +08:00
Concedo
3284757b56
voxtral mini is really bad
2025-07-29 21:22:17 +08:00
Concedo
abf527a207
clearer multimodal capability display
2025-07-28 22:54:49 +08:00
Concedo
12a6088a65
added voxtral support, however without the magic token it hears audio as text
2025-07-28 22:35:59 +08:00
Concedo
b87864144b
no ctx shift for all mrope
2025-07-25 13:53:20 +08:00
Concedo
9f4d0f6ccf
fixed swa pp bug by retrying smaller batches
2025-07-21 23:34:22 +08:00
Concedo
6d50def409
default kv_unified to true, handle LLAMA_SET_ROWS.
2025-07-21 16:13:20 +08:00
Concedo
b028dd4e84
minor fixes
2025-07-18 13:22:59 +08:00
Concedo
f0564f9caf
updated lite, added better separators for multimodal chunks (universal)
2025-07-17 00:11:08 +08:00
Concedo
bc2877d2fe
test without g3n fix
2025-07-13 23:42:59 +08:00
Concedo
811463a704
split audio and vision detection separately
2025-07-13 17:47:15 +08:00
Concedo
dca49de059
fixed qwen2 audio issues, works fine now (+3 squashed commit)
...
Squashed commit:
[b3053a1ba] updated lite
[5071630d6] fixed mtmd issues, audio works
[06efa5af4] fix mtmd compile
2025-07-12 18:54:41 +08:00
Concedo
5a3b2e3921
fix for jamba models - they have recurrent layers like rwkv, so context shifting and forwarding won't work on them.
2025-07-12 18:54:40 +08:00
Concedo
e9473305d0
wip2 (+1 squashed commits)
...
Squashed commits:
[4628777b6] wip
2025-07-12 18:54:40 +08:00
Concedo
c45b8dc56f
fix for gemma3n
2025-07-10 17:39:08 +08:00
Reithan
0097de5c57
improve performance by actually applying nsigma's masking ( #1602 )
...
merging, please report any issues.
2025-07-07 15:41:46 +08:00
Concedo
2e14338455
additional padding for the swa kv cache itself
2025-06-28 15:52:48 +08:00
Concedo
815d2056d9
gentoken reservations
2025-06-28 09:16:20 +08:00
Concedo
39b0699c71
fixed savestates with drafting
2025-06-27 20:35:38 +08:00
Reithan
54dde5e565
Add memoized cache to llama_grammar_reject_candidates_for_stack ( #1615 )
...
* Add memoized cache to llama_grammar_reject_candidates_for_stack
* make size cutoff more aggressive and move to outer branch
* update comment
* add cache reset whenever grammar is reloaded
* remove explicit reference types for compiler transportability
2025-06-25 19:22:19 +08:00
Concedo
65ff041827
added more perf stats
2025-06-21 12:12:28 +08:00
Reithan
f07434f4c1
streamline grammar sampler to speed up generation while using heavy grammar ( #1606 )
2025-06-17 23:04:59 +08:00
Concedo
c494525b33
update deprecated apis
2025-06-13 22:21:15 +08:00
Reithan
f1c9db4174
fix-loss-of-destroyed-tokens-in-grammar-pre-pass ( #1600 )
2025-06-13 18:46:38 +08:00
Concedo
5bac0fb3d5
remove debug prints for now, they were kind of cluttered
2025-06-13 16:00:23 +08:00
Reithan
5af9138ebe
Improve GBNF performance by attempting culled grammar search first ( #1597 )
...
* cull tokens with top_3k first before running grammar, fallback to unculled if none found
* fix errors
* fix improvement and test against concedo's GBNF
* revert non-culling changes
2025-06-13 15:57:27 +08:00
Concedo
1cbe716e45
allow setting maingpu
2025-06-12 17:53:43 +08:00
Concedo
f6bbc350f2
various qol fixes
2025-06-05 10:26:02 +08:00
Concedo
736030bb9f
save and load state upgraded to 3 available states
2025-06-04 22:09:40 +08:00
Concedo
53f1511396
use a static buffer for kv reloads instead. also, added into lite ui
2025-06-03 22:32:46 +08:00
Concedo
4b57108508
Save KV State and Load KV State to memory added. GUI not yet updated
2025-06-03 17:46:29 +08:00
Concedo
6ce85c54d6
not working correctly
2025-06-02 22:12:10 +08:00
Concedo
8e1ebc55b5
dropped support for lora base as upstream no longer uses it. If provided it will be silently ignored
2025-06-02 12:49:53 +08:00
Concedo
51dc1cf920
added scale for text lora
2025-06-02 00:13:42 +08:00
Concedo
0c108f6054
Merge commit '34b7c0439ed0f98575cc4689dfecd98991dee8be' into concedo_experimental
...
# Conflicts:
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/element_wise.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# scripts/sync-ggml.last
# src/CMakeLists.txt
# tools/mtmd/clip.cpp
2025-05-31 12:27:45 +08:00
Concedo
f97bbdde00
fix to allow all EOGs to trigger a stop, occam's glm4 fix
2025-05-24 22:55:11 +08:00
Concedo
ec04115ae9
swa options now available
2025-05-24 11:50:37 +08:00
Concedo
c4df151298
experimental swa flag
2025-05-23 21:33:26 +08:00
Concedo
69b5d4d4af
cursed hack for glm4, may or may not be better
2025-05-22 22:40:37 +08:00
Concedo
f125e724eb
fix off-by-one npast during some instances of fast forwarding
2025-05-22 19:51:21 +08:00
Concedo
f10574e598
debug text
2025-05-22 14:22:01 +08:00
Concedo
9f976e9c65
swa full used unless ctx shift and fast forward disabled
2025-05-21 22:47:45 +08:00
Concedo
3fefb3bdf2
Merge commit 'f0adb80bf7c2c0d80abb04f4533b5513622d9964' into concedo_experimental
...
# Conflicts:
# docs/backend/CANN.md
# docs/backend/SYCL.md
# docs/docker.md
# examples/sycl/run-llama2.sh
# examples/sycl/win-run-llama2.bat
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tools/llama-bench/README.md
2025-05-21 19:10:57 +08:00
Concedo
8b6dfbd1be
disabling the gMask prefix for glm-4 completions
2025-05-21 17:29:24 +08:00