Commit graph

452 commits

Author SHA1 Message Date
CasualAutopsy
7703bed260 Temp: Fix Needlessly Iterating on Candidates During Greedy Sampling (#1854) 2025-11-22 16:06:50 +08:00
Concedo
8631bbcee3 linting 2025-11-18 18:56:31 +08:00
LostRuins Concedo
7aea1d7c02 clean up unused llava functions, fix qwen3vl loading 2025-11-18 10:34:55 +08:00
LostRuins Concedo
281542aa0d add smoothing curve, not tested 2025-11-17 23:07:35 +08:00
LostRuins Concedo
3fe0e39b62 Merge commit '4dca015b7e' into concedo_experimental
# Conflicts:
#	.github/copilot-instructions.md
#	README.md
#	docs/ops.md
#	docs/ops/CPU.csv
#	docs/ops/CUDA.csv
#	docs/ops/Vulkan.csv
#	ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp
#	src/CMakeLists.txt
#	tests/test-backend-ops.cpp
2025-11-16 18:33:58 +08:00
LostRuins Concedo
86f907272a relocated shader compile warning 2025-11-15 23:17:47 +08:00
LostRuins Concedo
d6a2ad8455 still not really working right 2025-11-09 01:57:48 +08:00
LostRuins Concedo
cfb22b5c9d rename a missed BLAS -> batch 2025-11-06 16:11:26 +08:00
Concedo
0891b0752d qwen3vl fixed (+2 squashed commit)
Squashed commit:

[89f65ed0c] wip fixing q3vl

[6fa34cff2] wip fixing q3vl
2025-10-31 17:52:33 +08:00
Concedo
57e1d9c822 rename blasbatchsize to batchsize 2025-10-24 18:16:54 +08:00
Concedo
68c9d955d2 support multiple override kv 2025-10-24 17:28:54 +08:00
Concedo
e92f9fd422 cursed hack for RNN models 2025-10-11 23:14:55 +08:00
Concedo
3b30f12ca7 future proof handling of rnn models 2025-10-07 19:12:47 +08:00
Concedo
5d89a48a50 add more rnn models supported 2025-09-24 18:14:59 +08:00
Concedo
7e35954695 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	docs/build.md
#	docs/function-calling.md
#	examples/eval-callback/eval-callback.cpp
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-cpu/kleidiai/kernels.cpp
#	ggml/src/ggml-cpu/kleidiai/kernels.h
#	ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
#	scripts/compare-llama-bench.py
#	scripts/server-bench.py
#	scripts/tool_bench.py
#	tests/test-chat.cpp
#	tools/batched-bench/batched-bench.cpp
#	tools/llama-bench/llama-bench.cpp
#	tools/server/README.md
2025-08-31 23:33:36 +08:00
Concedo
3210b378e8 better tool calls 2025-08-20 22:11:31 +08:00
Concedo
5a921a40f9 add overridenativecontext flag, stop nagging me 2025-08-14 22:54:45 +08:00
Concedo
4c1faf61b2 increment version (+1 squashed commits)
Squashed commits:

[6e5080ad2] increment version
2025-08-09 20:53:26 +08:00
Concedo
338b1fe97e readjusted mistral and oai template, fixed compile issue on termux, updated lite, show generated token ids in debug mode 2025-08-07 21:14:48 +08:00
Concedo
34487d3c02 gpt oss harmony template 2025-08-06 11:39:40 +08:00
Concedo
e40d26b9e7 allow offloading moe to cpu with --moecpu 2025-08-05 23:42:42 +08:00
Concedo
428a07416a cleanup some debug 2025-08-05 00:07:22 +08:00
Concedo
3284757b56 voxstral mini is really bad 2025-07-29 21:22:17 +08:00
Concedo
abf527a207 clearer multimodal capability display 2025-07-28 22:54:49 +08:00
Concedo
12a6088a65 added voxtral support, however without the magic token it hears audio as text 2025-07-28 22:35:59 +08:00
Concedo
b87864144b no ctx shift for all mrope 2025-07-25 13:53:20 +08:00
Concedo
9f4d0f6ccf fixed swa pp bug by retrying smaller batches 2025-07-21 23:34:22 +08:00
Concedo
6d50def409 default kv_unified to true, handle LLAMA_SET_ROWS. 2025-07-21 16:13:20 +08:00
Concedo
b028dd4e84 minor fixes 2025-07-18 13:22:59 +08:00
Concedo
f0564f9caf updated lite, added better separators for multimodal chunks (universal) 2025-07-17 00:11:08 +08:00
Concedo
bc2877d2fe test without g3n fix 2025-07-13 23:42:59 +08:00
Concedo
811463a704 split audio and vision detection separately 2025-07-13 17:47:15 +08:00
Concedo
dca49de059 fixed qwen2 audio issues, works fine now (+3 squashed commit)
Squashed commit:

[b3053a1ba] updated lite

[5071630d6] fixed mtmd issues, audio works

[06efa5af4] fix mtmd compile
2025-07-12 18:54:41 +08:00
Concedo
5a3b2e3921 fix for jamba models - they have recurrent layers like rwkv, so context shifting and forwarding wont work on them. 2025-07-12 18:54:40 +08:00
Concedo
e9473305d0 wip2 (+1 squashed commits)
Squashed commits:

[4628777b6] wip
2025-07-12 18:54:40 +08:00
Concedo
c45b8dc56f fix for gemma3n 2025-07-10 17:39:08 +08:00
Reithan
0097de5c57 improve performance by actually applying nsigma's masking (#1602)
merging, please report any issues.
2025-07-07 15:41:46 +08:00
Concedo
2e14338455 additional padding for the swa kv cache itself 2025-06-28 15:52:48 +08:00
Concedo
815d2056d9 gentoken reservations 2025-06-28 09:16:20 +08:00
Concedo
39b0699c71 fixed savestates with drafting 2025-06-27 20:35:38 +08:00
Reithan
54dde5e565 Add memoized cache to llama_grammar_reject_candidates_for_stack (#1615)
* Add memoized cache to llama_grammar_reject_candidates_for_stack

* make size cutoff more aggressive and move to outer branch

* update comment

* add cache reset whenever grammar is reloaded

* remove explicit reference types for compiler transportability
2025-06-25 19:22:19 +08:00
Concedo
65ff041827 added more perf stats 2025-06-21 12:12:28 +08:00
Reithan
f07434f4c1 streamline grammar sampler to speed up generation while using heavy grammar (#1606) 2025-06-17 23:04:59 +08:00
Concedo
c494525b33 update deprecated apis 2025-06-13 22:21:15 +08:00
Reithan
f1c9db4174 fix-loss-of-destroyed-tokens-in-grammar-pre-pass (#1600) 2025-06-13 18:46:38 +08:00
Concedo
5bac0fb3d5 remove debug prints for now, they were kind of cluttered 2025-06-13 16:00:23 +08:00
Reithan
5af9138ebe Improve GNBF performance by attempting culled grammar search first (#1597)
* cull tokens with top_3k first before running grammar, fallback to unculled if none found

* fix errors

* fix improvement and test against concedo's GBNF

* revert non-culling changes
2025-06-13 15:57:27 +08:00
Concedo
1cbe716e45 allow setting maingpu 2025-06-12 17:53:43 +08:00
Concedo
f6bbc350f2 various qol fixes 2025-06-05 10:26:02 +08:00
Concedo
736030bb9f save and load state upgraded to 3 available states 2025-06-04 22:09:40 +08:00