Concedo
e9473305d0
wip2 (+1 squashed commits)
...
Squashed commits:
[4628777b6] wip
2025-07-12 18:54:40 +08:00
Concedo
c45b8dc56f
fix for gemma3n
2025-07-10 17:39:08 +08:00
Reithan
0097de5c57
improve performance by actually applying nsigma's masking ( #1602 )
...
merging, please report any issues.
2025-07-07 15:41:46 +08:00
Concedo
2e14338455
additional padding for the swa kv cache itself
2025-06-28 15:52:48 +08:00
Concedo
815d2056d9
gentoken reservations
2025-06-28 09:16:20 +08:00
Concedo
39b0699c71
fixed savestates with drafting
2025-06-27 20:35:38 +08:00
Reithan
54dde5e565
Add memoized cache to llama_grammar_reject_candidates_for_stack ( #1615 )
...
* Add memoized cache to llama_grammar_reject_candidates_for_stack
* make size cutoff more aggressive and move to outer branch
* update comment
* add cache reset whenever grammar is reloaded
* remove explicit reference types for compiler transportability
2025-06-25 19:22:19 +08:00
Concedo
65ff041827
added more perf stats
2025-06-21 12:12:28 +08:00
Reithan
f07434f4c1
streamline grammar sampler to speed up generation while using heavy grammar ( #1606 )
2025-06-17 23:04:59 +08:00
Concedo
c494525b33
update deprecated apis
2025-06-13 22:21:15 +08:00
Reithan
f1c9db4174
fix-loss-of-destroyed-tokens-in-grammar-pre-pass ( #1600 )
2025-06-13 18:46:38 +08:00
Concedo
5bac0fb3d5
remove debug prints for now, they were kind of cluttered
2025-06-13 16:00:23 +08:00
Reithan
5af9138ebe
Improve GBNF performance by attempting culled grammar search first ( #1597 )
...
* cull tokens with top_3k first before running grammar, fallback to unculled if none found
* fix errors
* fix improvement and test against concedo's GBNF
* revert non-culling changes
2025-06-13 15:57:27 +08:00
Concedo
1cbe716e45
allow setting maingpu
2025-06-12 17:53:43 +08:00
Concedo
f6bbc350f2
various qol fixes
2025-06-05 10:26:02 +08:00
Concedo
736030bb9f
save and load state upgraded to 3 available states
2025-06-04 22:09:40 +08:00
Concedo
53f1511396
use a static buffer for kv reloads instead. also, added into lite ui
2025-06-03 22:32:46 +08:00
Concedo
4b57108508
Save KV State and Load KV State to memory added. GUI not yet updated
2025-06-03 17:46:29 +08:00
Concedo
6ce85c54d6
not working correctly
2025-06-02 22:12:10 +08:00
Concedo
8e1ebc55b5
dropped support for lora base as upstream no longer uses it. If provided it will be silently ignored
2025-06-02 12:49:53 +08:00
Concedo
51dc1cf920
added scale for text lora
2025-06-02 00:13:42 +08:00
Concedo
0c108f6054
Merge commit '34b7c0439e' into concedo_experimental
...
# Conflicts:
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/element_wise.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# scripts/sync-ggml.last
# src/CMakeLists.txt
# tools/mtmd/clip.cpp
2025-05-31 12:27:45 +08:00
Concedo
f97bbdde00
fix to allow all EOGs to trigger a stop, occam's glm4 fix
2025-05-24 22:55:11 +08:00
Concedo
ec04115ae9
swa options now available
2025-05-24 11:50:37 +08:00
Concedo
c4df151298
experimental swa flag
2025-05-23 21:33:26 +08:00
Concedo
69b5d4d4af
cursed hack for glm4, may or may not be better
2025-05-22 22:40:37 +08:00
Concedo
f125e724eb
fix off-by-one npast during some instances of fast forwarding
2025-05-22 19:51:21 +08:00
Concedo
f10574e598
debug text
2025-05-22 14:22:01 +08:00
Concedo
9f976e9c65
swa full used unless ctx shift and fast forward disabled
2025-05-21 22:47:45 +08:00
Concedo
3fefb3bdf2
Merge commit 'f0adb80bf7' into concedo_experimental
...
# Conflicts:
# docs/backend/CANN.md
# docs/backend/SYCL.md
# docs/docker.md
# examples/sycl/run-llama2.sh
# examples/sycl/win-run-llama2.bat
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tools/llama-bench/README.md
2025-05-21 19:10:57 +08:00
Concedo
8b6dfbd1be
disabling the gMask prefix for glm-4 completions
2025-05-21 17:29:24 +08:00
Concedo
49305942ab
try disabling the gMask prefix for glm-4 completions
2025-05-21 16:47:08 +08:00
Concedo
5a499a5d2e
updated lite, fixed multi clip skip and seeds not incrementing (+2 squashed commit)
...
Squashed commit:
[a9328e29a] fixed multi clip skip and seeds not incrementing
[cad3aa9db] streamline some debug outputs
2025-05-19 23:59:58 +08:00
Concedo
6cafc0e73e
Merge commit '71bdbdb587' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-cpu/CMakeLists.txt
# tools/batched-bench/batched-bench.cpp
# tools/mtmd/clip.h
2025-05-16 15:25:15 +08:00
Concedo
35284bcdb5
glm4 clamp 8 on vk
2025-05-13 17:03:24 +08:00
Concedo
48f86bbbc7
tweaked text
2025-05-13 15:54:59 +08:00
Concedo
2819f784d4
use a threadpool, seems to improve tg performance
2025-05-12 18:06:10 +08:00
Concedo
ea2e5ed1e9
mmq debug log
2025-05-09 18:30:11 +08:00
Concedo
2439014a03
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# examples/embedding/embedding.cpp
# tools/imatrix/imatrix.cpp
# tools/perplexity/perplexity.cpp
2025-05-08 23:41:02 +08:00
Concedo
fa22c1a5a4
fixed cfg scale, but turns out it sucks. embedded aria2c into pyinstaller
2025-05-07 18:30:36 +08:00
Concedo
a5b6f372a3
cfg scale wip
2025-05-07 00:36:00 +08:00
Concedo
0fa435b2a6
Merge commit '9b61acf060' into concedo_experimental
...
# Conflicts:
# Makefile
# docs/multimodal/MobileVLM.md
# docs/multimodal/glmedge.md
# docs/multimodal/llava.md
# docs/multimodal/minicpmo2.6.md
# docs/multimodal/minicpmv2.5.md
# docs/multimodal/minicpmv2.6.md
# requirements/requirements-all.txt
# tools/mtmd/CMakeLists.txt
# tools/mtmd/README.md
# tools/mtmd/android/adb_run.sh
# tools/mtmd/android/build_64.sh
# tools/mtmd/clip-quantize-cli.cpp
2025-05-06 23:34:21 +08:00
Concedo
38a8778f24
wip cfg scale
2025-05-06 23:06:25 +08:00
Concedo
13cee48740
embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits)
...
Squashed commits:
[b9b695217] embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits)
Squashed commits:
[90b5d389d] embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits)
Squashed commits:
[fbbaa989f] embed aria2c for windows
2025-05-06 18:56:02 +08:00
Concedo
9981ba8427
glm4 special BOS handling
2025-05-06 16:41:55 +08:00
Concedo
f59b5eb561
added toggle for guidance
2025-05-05 22:21:46 +08:00
Concedo
5a2808ffaf
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .flake8
# .github/labeler.yml
# .github/workflows/bench.yml.disabled
# .github/workflows/build-linux-cross.yml
# .github/workflows/build.yml
# .github/workflows/server.yml
# .gitignore
# CMakeLists.txt
# CODEOWNERS
# Makefile
# README.md
# SECURITY.md
# build-xcframework.sh
# ci/run.sh
# docs/development/HOWTO-add-model.md
# docs/multimodal/MobileVLM.md
# docs/multimodal/glmedge.md
# docs/multimodal/llava.md
# docs/multimodal/minicpmo2.6.md
# docs/multimodal/minicpmv2.5.md
# docs/multimodal/minicpmv2.6.md
# examples/CMakeLists.txt
# examples/pydantic_models_to_grammar_examples.py
# grammars/README.md
# pyrightconfig.json
# requirements/requirements-all.txt
# scripts/fetch_server_test_models.py
# scripts/tool_bench.py
# scripts/xxd.cmake
# tests/CMakeLists.txt
# tests/run-json-schema-to-grammar.mjs
# tools/batched-bench/CMakeLists.txt
# tools/batched-bench/README.md
# tools/batched-bench/batched-bench.cpp
# tools/cvector-generator/CMakeLists.txt
# tools/cvector-generator/README.md
# tools/cvector-generator/completions.txt
# tools/cvector-generator/cvector-generator.cpp
# tools/cvector-generator/mean.hpp
# tools/cvector-generator/negative.txt
# tools/cvector-generator/pca.hpp
# tools/cvector-generator/positive.txt
# tools/export-lora/CMakeLists.txt
# tools/export-lora/README.md
# tools/export-lora/export-lora.cpp
# tools/gguf-split/CMakeLists.txt
# tools/gguf-split/README.md
# tools/imatrix/CMakeLists.txt
# tools/imatrix/README.md
# tools/imatrix/imatrix.cpp
# tools/llama-bench/CMakeLists.txt
# tools/llama-bench/README.md
# tools/llama-bench/llama-bench.cpp
# tools/llava/CMakeLists.txt
# tools/llava/README.md
# tools/llava/android/adb_run.sh
# tools/llava/android/build_64.sh
# tools/llava/clip-quantize-cli.cpp
# tools/main/CMakeLists.txt
# tools/main/README.md
# tools/perplexity/CMakeLists.txt
# tools/perplexity/README.md
# tools/perplexity/perplexity.cpp
# tools/quantize/CMakeLists.txt
# tools/rpc/CMakeLists.txt
# tools/rpc/README.md
# tools/rpc/rpc-server.cpp
# tools/run/CMakeLists.txt
# tools/run/README.md
# tools/run/linenoise.cpp/linenoise.cpp
# tools/run/linenoise.cpp/linenoise.h
# tools/run/run.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
# tools/server/bench/README.md
# tools/server/public_simplechat/readme.md
# tools/server/tests/README.md
# tools/server/themes/README.md
# tools/server/themes/buttons-top/README.md
# tools/server/themes/wild/README.md
# tools/tokenize/CMakeLists.txt
# tools/tokenize/tokenize.cpp
2025-05-03 12:15:36 +08:00
Concedo
5d382970ec
glm4 unclamp for all except vulkan
2025-04-30 17:19:38 +08:00
Concedo
9fdec02914
unclamp glm4 in debug
2025-04-30 14:49:52 +08:00
Concedo
c2802af9e8
fix qwen3, fixed sd, fixed glm4
2025-04-29 20:50:46 +08:00