Commit graph

365 commits

Author SHA1 Message Date
Concedo
6b6597ebf1 allow for single token prompt processing (actual batch size 1) 2025-04-25 16:54:46 +08:00
Concedo
28a2723100 merged pixtral support, not fully working 2025-04-24 15:27:02 +08:00
Concedo
9cd6a1add2 allow mmproj to be run on cpu 2025-04-21 21:03:10 +08:00
Concedo
2ed6850c0b added override tensor 2025-04-20 20:56:17 +08:00
Concedo
c67510718e kv override option (+1 squashed commits)
Squashed commits:

[e615fc01] kv override option
2025-04-17 14:22:30 +08:00
Concedo
93a226d9e4 added prefix for llava, reverted system role in template as it degraded gemma3. truncated debug logs 2025-04-05 18:06:41 +08:00
Concedo
b3143384b4 larger warmup batch 2025-04-05 10:57:04 +08:00
Concedo
61a73347c6 fixed mrope for multiple images in qwen2vl (+1 squashed commits)
Squashed commits:

[63e4d91c] fixed mrope for multiple images in qwen2vl (+1 squashed commits)

Squashed commits:

[bb78db1e] wip fixing mrope
2025-03-30 17:23:58 +08:00
Concedo
6a709be50a replace deprecated 2025-03-27 10:27:20 +08:00
Concedo
e84596ec1a add config for default gen tokens and bos toggle 2025-03-15 19:53:06 +08:00
Concedo
4212f0b8e8 wip on multiple fixes 2025-03-15 10:50:36 +08:00
Concedo
6a1dd57435 gemma3 template, updated lite, fixed tool calling, reenable ctx shift for gemma3 2025-03-14 17:47:01 +08:00
Concedo
0db4ae6237 traded my ink for a pen 2025-03-14 11:58:15 +08:00
Concedo
52cf1ded0c remove unwanted print 2025-03-14 00:24:28 +08:00
Concedo
0460d92cc3 disable context shifting for gemma3 2025-03-13 20:28:26 +08:00
Concedo
e75539e8cb too many issues without BOS (+1 squashed commits)
Squashed commits:

[7138d941] only print bos alert in debug
2025-03-13 16:48:29 +08:00
Concedo
1ef41c2124 streamline output console log (+1 squashed commits)
Squashed commits:

[ca474bdd] streamline output console log
2025-03-13 15:33:49 +08:00
Concedo
77debb1b1b gemma3 vision works, but is using more tokens than expected - may need resizing 2025-03-13 00:31:16 +08:00
Concedo
eb1809c105 add more perf stats 2025-03-12 18:58:27 +08:00
Concedo
b0541f3652 added draft results 2025-03-10 22:03:20 +08:00
Concedo
72bc855e8a honor add bos token settings from metadata 2025-03-07 22:10:50 +08:00
Concedo
6b7d2349a7 Rewrite history to fix bad vulkan shader commits without increasing repo size
added dpe colab (+8 squashed commit)

Squashed commit:

[b8362da4] updated lite

[ed6c037d] move nsigma into the regular sampler stack

[ac5f61c6] relative filepath fixed

[05fe96ab] export template

[ed0a5a3e] nix_example.md: refactor (#1401)

* nix_example.md: add override example

* nix_example.md: drop graphics example, already basic nixos knowledge

* nix_example.md: format

* nix_example.md: Vulkan is disabled on macOS

Disabled in: 1ccd253acc

* nix_examples.md: nixpkgs.config.cuda{Arches -> Capabilities}

Fixes: https://github.com/LostRuins/koboldcpp/issues/1367

[675c62f7] AutoGuess: Phi 4 (mini) (#1402)

[4bf56982] phrasing

[b8c0df04] Add Rep Pen to Top N Sigma sampler chain (#1397)

- place after nsigma and before xtc (+3 squashed commit)

Squashed commit:

[87c52b97] disable VMM from HIP

[ee8906f3] edit description

[e85c0e69] Remove Unnecessary Rep Counting (#1394)

* stop counting reps

* fix range-based initializer

* strike that - reverse it
2025-03-05 00:02:20 +08:00
Reithan
62cd9bb0b2 use range neq zero instead of lt (#1388) 2025-02-24 18:47:19 +08:00
Concedo
f2ac10c014 added nsigma to lite 2025-02-21 15:11:24 +08:00
EquinoxPsychosis
2740af3660 add top n sigma sampler from llama.cpp (#1384)
* Add N Sigma Sampler

* update nsigma sampler chain

* xtc position fix

* remove stray newline

---------

Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>
2025-02-21 14:31:42 +08:00
Concedo
6d7ef10671 Merge branch 'upstream' into concedo_experimental
Re-enable qwen2vl GPU for vulkan https://github.com/ggml-org/llama.cpp/pull/11902

# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	.gitignore
#	CONTRIBUTING.md
#	Makefile
#	common/CMakeLists.txt
#	common/arg.cpp
#	common/common.cpp
#	examples/main/main.cpp
#	examples/run/run.cpp
#	examples/server/tests/README.md
#	ggml/src/ggml-cuda/mma.cuh
#	scripts/get_chat_template.py
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
#	tests/test-chat.cpp
2025-02-20 23:17:20 +08:00
Concedo
b162c25a5e fixed moe experts to use detected arch for key 2025-02-10 17:46:08 +08:00
Concedo
d22eca6c47 fix potential crash in autoguess 2025-02-09 12:33:28 +08:00
Concedo
e68a3cf1dc fixed some functions when no model is loaded 2025-02-08 11:15:26 +08:00
Concedo
8fef9f3fb5 reloading is working correctly. 2025-02-06 22:24:18 +08:00
Concedo
fd84b062f9 allow reuse of clip embds 2025-01-30 19:02:45 +08:00
Concedo
f4e2f4b069 disable context shift when using mrope 2025-01-30 00:36:05 +08:00
Concedo
70f1d8d746 vision can set max res (+1 squashed commits)
Squashed commits:

[938fc655] vision can set max res
2025-01-30 00:19:49 +08:00
Concedo
0e45d3bb7a quiet flags now set at load time 2025-01-25 16:46:56 +08:00
Concedo
cca4a934dd fix for chat templates and drafting 2025-01-23 11:49:40 +08:00
Concedo
0e74db7fd4 fixed another tts bug, clblast selection and quiet mode 2025-01-22 21:36:13 +08:00
Concedo
2a00ee8fa8 broken commit 2025-01-16 21:41:18 +08:00
Concedo
b3de1598e7 Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS
tts is functional (+6 squashed commit)

Squashed commit:

[22396311] wip tts

[3a883027] tts not yet working

[0dcfab0e] fix silly bug

[a378d9ef] some long overdue cleanup

[fc5a6fb5] Wip tts

[39f50497] wip TTS integration
2025-01-13 14:23:25 +08:00
Nexes the Elder
3e6ef8e0ef Probable typo (#1287) 2024-12-26 11:51:04 +08:00
Concedo
10d4fc637d fixed a bug with drafting tokens 2024-12-23 11:36:08 +08:00
Concedo
fd5100c382 fix for query param 2024-12-21 10:41:25 +08:00
Concedo
4c56b7cada Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	examples/gbnf-validator/gbnf-validator.cpp
#	examples/llava/clip.cpp
#	examples/run/README.md
#	examples/run/run.cpp
#	examples/server/README.md
#	ggml/src/ggml-cpu/CMakeLists.txt
#	src/llama.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-llama-grammar.cpp
2024-12-21 09:41:49 +08:00
Concedo
b7d3274523 temporarily make qwen2vl use clip on cpu for vulkan and macos 2024-12-21 09:15:31 +08:00
Concedo
bc297da91e remove unused function 2024-12-16 11:39:52 +08:00
Concedo
00d154b32b wip on qwen2vl integration, updated msvc runtimes 2024-12-15 23:58:02 +08:00
Concedo
60cd68a39d draft model sets gpu split instead of id, made mmq default for cli 2024-12-14 23:58:45 +08:00
Concedo
595cc6975f added new flags --moeexperts --failsafe --draftgpulayers and --draftgpuid 2024-12-13 17:11:59 +08:00
Concedo
00a686fc72 fixed fast forwarding context corruption after abort during prompt processing 2024-12-10 22:37:40 +08:00
Concedo
5106816eac drafted tokens debug prints 2024-12-05 17:05:20 +08:00
Concedo
e93c2427b4 allow incompatible vocab in debugmode 2024-12-01 14:11:03 +08:00