Commit graph

395 commits

Author SHA1 Message Date
Concedo
ec04115ae9 swa options now available 2025-05-24 11:50:37 +08:00
Concedo
c4df151298 experimental swa flag 2025-05-23 21:33:26 +08:00
Concedo
69b5d4d4af cursed hack for glm4, may or may not be better 2025-05-22 22:40:37 +08:00
Concedo
f125e724eb fix off-by-one npast during some instances of fast forwarding 2025-05-22 19:51:21 +08:00
Concedo
f10574e598 debug text 2025-05-22 14:22:01 +08:00
Concedo
9f976e9c65 swa full used unless ctx shift and fast forward disabled 2025-05-21 22:47:45 +08:00
Concedo
3fefb3bdf2 Merge commit 'f0adb80bf7' into concedo_experimental
# Conflicts:
#	docs/backend/CANN.md
#	docs/backend/SYCL.md
#	docs/docker.md
#	examples/sycl/run-llama2.sh
#	examples/sycl/win-run-llama2.bat
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	tools/llama-bench/README.md
2025-05-21 19:10:57 +08:00
Concedo
8b6dfbd1be disabling the gMask prefix for glm-4 completions 2025-05-21 17:29:24 +08:00
Concedo
49305942ab try disabling the gMask prefix for glm-4 completions 2025-05-21 16:47:08 +08:00
Concedo
5a499a5d2e updated lite, fixed multi clip skip and seeds not incrementing (+2 squashed commit)
Squashed commit:

[a9328e29a] fixed multi clip skip and seeds not incrementing

[cad3aa9db] streamline some debug outputs
2025-05-19 23:59:58 +08:00
Concedo
6cafc0e73e Merge commit '71bdbdb587' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-cpu/CMakeLists.txt
#	tools/batched-bench/batched-bench.cpp
#	tools/mtmd/clip.h
2025-05-16 15:25:15 +08:00
Concedo
35284bcdb5 glm4 clamp 8 on vk 2025-05-13 17:03:24 +08:00
Concedo
48f86bbbc7 tweaked text 2025-05-13 15:54:59 +08:00
Concedo
2819f784d4 use a threadpool, seems to improve tg performance 2025-05-12 18:06:10 +08:00
Concedo
ea2e5ed1e9 mmq debug log 2025-05-09 18:30:11 +08:00
Concedo
2439014a03 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	examples/embedding/embedding.cpp
#	tools/imatrix/imatrix.cpp
#	tools/perplexity/perplexity.cpp
2025-05-08 23:41:02 +08:00
Concedo
fa22c1a5a4 fixed cfg scale, but turns out it sucks. embedded aria2c into pyinstaller 2025-05-07 18:30:36 +08:00
Concedo
a5b6f372a3 cfg scale wip 2025-05-07 00:36:00 +08:00
Concedo
0fa435b2a6 Merge commit '9b61acf060' into concedo_experimental
# Conflicts:
#	Makefile
#	docs/multimodal/MobileVLM.md
#	docs/multimodal/glmedge.md
#	docs/multimodal/llava.md
#	docs/multimodal/minicpmo2.6.md
#	docs/multimodal/minicpmv2.5.md
#	docs/multimodal/minicpmv2.6.md
#	requirements/requirements-all.txt
#	tools/mtmd/CMakeLists.txt
#	tools/mtmd/README.md
#	tools/mtmd/android/adb_run.sh
#	tools/mtmd/android/build_64.sh
#	tools/mtmd/clip-quantize-cli.cpp
2025-05-06 23:34:21 +08:00
Concedo
38a8778f24 wip cfg scale 2025-05-06 23:06:25 +08:00
Concedo
13cee48740 embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits)
Squashed commits:

[b9b695217] embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits)

Squashed commits:

[90b5d389d] embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits)

Squashed commits:

[fbbaa989f] embed aria2c for windows
2025-05-06 18:56:02 +08:00
Concedo
9981ba8427 glm4 special BOS handling 2025-05-06 16:41:55 +08:00
Concedo
f59b5eb561 added toggle for guidance 2025-05-05 22:21:46 +08:00
Concedo
5a2808ffaf Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.flake8
#	.github/labeler.yml
#	.github/workflows/bench.yml.disabled
#	.github/workflows/build-linux-cross.yml
#	.github/workflows/build.yml
#	.github/workflows/server.yml
#	.gitignore
#	CMakeLists.txt
#	CODEOWNERS
#	Makefile
#	README.md
#	SECURITY.md
#	build-xcframework.sh
#	ci/run.sh
#	docs/development/HOWTO-add-model.md
#	docs/multimodal/MobileVLM.md
#	docs/multimodal/glmedge.md
#	docs/multimodal/llava.md
#	docs/multimodal/minicpmo2.6.md
#	docs/multimodal/minicpmv2.5.md
#	docs/multimodal/minicpmv2.6.md
#	examples/CMakeLists.txt
#	examples/pydantic_models_to_grammar_examples.py
#	grammars/README.md
#	pyrightconfig.json
#	requirements/requirements-all.txt
#	scripts/fetch_server_test_models.py
#	scripts/tool_bench.py
#	scripts/xxd.cmake
#	tests/CMakeLists.txt
#	tests/run-json-schema-to-grammar.mjs
#	tools/batched-bench/CMakeLists.txt
#	tools/batched-bench/README.md
#	tools/batched-bench/batched-bench.cpp
#	tools/cvector-generator/CMakeLists.txt
#	tools/cvector-generator/README.md
#	tools/cvector-generator/completions.txt
#	tools/cvector-generator/cvector-generator.cpp
#	tools/cvector-generator/mean.hpp
#	tools/cvector-generator/negative.txt
#	tools/cvector-generator/pca.hpp
#	tools/cvector-generator/positive.txt
#	tools/export-lora/CMakeLists.txt
#	tools/export-lora/README.md
#	tools/export-lora/export-lora.cpp
#	tools/gguf-split/CMakeLists.txt
#	tools/gguf-split/README.md
#	tools/imatrix/CMakeLists.txt
#	tools/imatrix/README.md
#	tools/imatrix/imatrix.cpp
#	tools/llama-bench/CMakeLists.txt
#	tools/llama-bench/README.md
#	tools/llama-bench/llama-bench.cpp
#	tools/llava/CMakeLists.txt
#	tools/llava/README.md
#	tools/llava/android/adb_run.sh
#	tools/llava/android/build_64.sh
#	tools/llava/clip-quantize-cli.cpp
#	tools/main/CMakeLists.txt
#	tools/main/README.md
#	tools/perplexity/CMakeLists.txt
#	tools/perplexity/README.md
#	tools/perplexity/perplexity.cpp
#	tools/quantize/CMakeLists.txt
#	tools/rpc/CMakeLists.txt
#	tools/rpc/README.md
#	tools/rpc/rpc-server.cpp
#	tools/run/CMakeLists.txt
#	tools/run/README.md
#	tools/run/linenoise.cpp/linenoise.cpp
#	tools/run/linenoise.cpp/linenoise.h
#	tools/run/run.cpp
#	tools/server/CMakeLists.txt
#	tools/server/README.md
#	tools/server/bench/README.md
#	tools/server/public_simplechat/readme.md
#	tools/server/tests/README.md
#	tools/server/themes/README.md
#	tools/server/themes/buttons-top/README.md
#	tools/server/themes/wild/README.md
#	tools/tokenize/CMakeLists.txt
#	tools/tokenize/tokenize.cpp
2025-05-03 12:15:36 +08:00
Concedo
5d382970ec glm4 unclamp for all except vulkan 2025-04-30 17:19:38 +08:00
Concedo
9fdec02914 unclamp glm4 in debug 2025-04-30 14:49:52 +08:00
Concedo
c2802af9e8 fix qwen3, fixed sd, fixed glm4 2025-04-29 20:50:46 +08:00
Concedo
4d8a7a6594 fix occasional clip segfault, fix glm4 (+1 squashed commits)
Squashed commits:

[bd71cd688] GLM4 fix wip
2025-04-29 01:42:50 +08:00
Concedo
cb1c182673 add more warmup (+1 squashed commits)
Squashed commits:

[9578d5352] updated lite
2025-04-26 10:22:09 +08:00
Concedo
4decd6bea1 GLM4 batch clamp 2025-04-26 09:42:17 +08:00
Concedo
6b6597ebf1 allow for single token prompt processing (actual batch size 1) 2025-04-25 16:54:46 +08:00
Concedo
28a2723100 merged pixtral support, not fully working 2025-04-24 15:27:02 +08:00
Concedo
9cd6a1add2 allow mmproj to be run on cpu 2025-04-21 21:03:10 +08:00
Concedo
2ed6850c0b added override tensor 2025-04-20 20:56:17 +08:00
Concedo
c67510718e kv override option (+1 squashed commits)
Squashed commits:

[e615fc01] kv override option
2025-04-17 14:22:30 +08:00
Concedo
93a226d9e4 added prefix for llava, reverted system role in template as it degraded gemma3. truncated debug logs 2025-04-05 18:06:41 +08:00
Concedo
b3143384b4 larger warmup batch 2025-04-05 10:57:04 +08:00
Concedo
61a73347c6 fixed mrope for multiple images in qwen2vl (+1 squashed commits)
Squashed commits:

[63e4d91c] fixed mrope for multiple images in qwen2vl (+1 squashed commits)

Squashed commits:

[bb78db1e] wip fixing mrope
2025-03-30 17:23:58 +08:00
Concedo
6a709be50a replace deprecated 2025-03-27 10:27:20 +08:00
Concedo
e84596ec1a add config for default gen tokens and bos toggle 2025-03-15 19:53:06 +08:00
Concedo
4212f0b8e8 wip on multiple fixes 2025-03-15 10:50:36 +08:00
Concedo
6a1dd57435 gemma3 template, updated lite, fixed tool calling, reenable ctx shift for gemma3 2025-03-14 17:47:01 +08:00
Concedo
0db4ae6237 traded my ink for a pen 2025-03-14 11:58:15 +08:00
Concedo
52cf1ded0c remove unwanted print 2025-03-14 00:24:28 +08:00
Concedo
0460d92cc3 disable context shifting for gemma3 2025-03-13 20:28:26 +08:00
Concedo
e75539e8cb too many issues without BOS (+1 squashed commits)
Squashed commits:

[7138d941] only print bos alert in debug
2025-03-13 16:48:29 +08:00
Concedo
1ef41c2124 streamline output console log (+1 squashed commits)
Squashed commits:

[ca474bdd] streamline output console log
2025-03-13 15:33:49 +08:00
Concedo
77debb1b1b gemma3 vision works, but is using more tokens than expected - may need resizing 2025-03-13 00:31:16 +08:00
Concedo
eb1809c105 add more perf stats 2025-03-12 18:58:27 +08:00
Concedo
b0541f3652 added draft results 2025-03-10 22:03:20 +08:00