Concedo
61a73347c6
fixed mrope for multiple images in qwen2vl (+1 squashed commits)
...
Squashed commits:
[63e4d91c] fixed mrope for multiple images in qwen2vl (+1 squashed commits)
Squashed commits:
[bb78db1e] wip fixing mrope
2025-03-30 17:23:58 +08:00
Concedo
6a709be50a
replace deprecated
2025-03-27 10:27:20 +08:00
Concedo
e84596ec1a
add config for default gen tokens and bos toggle
2025-03-15 19:53:06 +08:00
Concedo
4212f0b8e8
wip on multiple fixes
2025-03-15 10:50:36 +08:00
Concedo
6a1dd57435
gemma3 template, updated lite, fixed tool calling, reenable ctx shift for gemma3
2025-03-14 17:47:01 +08:00
Concedo
0db4ae6237
traded my ink for a pen
2025-03-14 11:58:15 +08:00
Concedo
52cf1ded0c
remove unwanted print
2025-03-14 00:24:28 +08:00
Concedo
0460d92cc3
disable context shifting for gemma3
2025-03-13 20:28:26 +08:00
Concedo
e75539e8cb
too many issues without BOS (+1 squashed commits)
...
Squashed commits:
[7138d941] only print bos alert in debug
2025-03-13 16:48:29 +08:00
Concedo
1ef41c2124
streamline output console log (+1 squashed commits)
...
Squashed commits:
[ca474bdd] streamline output console log
2025-03-13 15:33:49 +08:00
Concedo
77debb1b1b
gemma3 vision works, but is using more tokens than expected - may need resizing
2025-03-13 00:31:16 +08:00
Concedo
eb1809c105
add more perf stats
2025-03-12 18:58:27 +08:00
Concedo
b0541f3652
added draft results
2025-03-10 22:03:20 +08:00
Concedo
72bc855e8a
honor add bos token settings from metadata
2025-03-07 22:10:50 +08:00
Concedo
6b7d2349a7
Rewrite history to fix bad vulkan shader commits without increasing repo size
...
added dpe colab (+8 squashed commit)
Squashed commit:
[b8362da4] updated lite
[ed6c037d] move nsigma into the regular sampler stack
[ac5f61c6] relative filepath fixed
[05fe96ab] export template
[ed0a5a3e] nix_example.md: refactor (#1401)
* nix_example.md: add override example
* nix_example.md: drop graphics example, already basic nixos knowledge
* nix_example.md: format
* nix_example.md: Vulkan is disabled on macOS
Disabled in: 1ccd253acc
* nix_examples.md: nixpkgs.config.cuda{Arches -> Capabilities}
Fixes: https://github.com/LostRuins/koboldcpp/issues/1367
[675c62f7] AutoGuess: Phi 4 (mini) (#1402)
[4bf56982] phrasing
[b8c0df04] Add Rep Pen to Top N Sigma sampler chain (#1397)
- place after nsigma and before xtc (+3 squashed commit)
Squashed commit:
[87c52b97] disable VMM from HIP
[ee8906f3] edit description
[e85c0e69] Remove Unnecessary Rep Counting (#1394)
* stop counting reps
* fix range-based initializer
* strike that - reverse it
2025-03-05 00:02:20 +08:00
Reithan
62cd9bb0b2
use range neq zero instead of lt (#1388)
2025-02-24 18:47:19 +08:00
Concedo
f2ac10c014
added nsigma to lite
2025-02-21 15:11:24 +08:00
EquinoxPsychosis
2740af3660
add top n sigma sampler from llama.cpp (#1384)
...
* Add N Sigma Sampler
* update nsigma sampler chain
* xtc position fix
* remove stray newline
---------
Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>
2025-02-21 14:31:42 +08:00
Concedo
6d7ef10671
Merge branch 'upstream' into concedo_experimental
...
Renable qwen2vl GPU for vulkan https://github.com/ggml-org/llama.cpp/pull/11902
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# .gitignore
# CONTRIBUTING.md
# Makefile
# common/CMakeLists.txt
# common/arg.cpp
# common/common.cpp
# examples/main/main.cpp
# examples/run/run.cpp
# examples/server/tests/README.md
# ggml/src/ggml-cuda/mma.cuh
# scripts/get_chat_template.py
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-chat.cpp
2025-02-20 23:17:20 +08:00
Concedo
b162c25a5e
fixed moe experts to use detected arch for key
2025-02-10 17:46:08 +08:00
Concedo
d22eca6c47
fix potential crash in autoguess
2025-02-09 12:33:28 +08:00
Concedo
e68a3cf1dc
fixed some functions when no model is loaded
2025-02-08 11:15:26 +08:00
Concedo
8fef9f3fb5
reloading is working correctly.
2025-02-06 22:24:18 +08:00
Concedo
fd84b062f9
allow reuse of clip embds
2025-01-30 19:02:45 +08:00
Concedo
f4e2f4b069
disable context shift when using mrope
2025-01-30 00:36:05 +08:00
Concedo
70f1d8d746
vision can set max res (+1 squashed commits)
...
Squashed commits:
[938fc655] vision can set max res
2025-01-30 00:19:49 +08:00
Concedo
0e45d3bb7a
quiet flags now set at load time
2025-01-25 16:46:56 +08:00
Concedo
cca4a934dd
fix for chat templates and drafting
2025-01-23 11:49:40 +08:00
Concedo
0e74db7fd4
fixed another tts bug, clblast selection and quiet mode
2025-01-22 21:36:13 +08:00
Concedo
2a00ee8fa8
broken commit
2025-01-16 21:41:18 +08:00
Concedo
b3de1598e7
Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS
...
tts is functional (+6 squashed commit)
Squashed commit:
[22396311] wip tts
[3a883027] tts not yet working
[0dcfab0e] fix silly bug
[a378d9ef] some long overdue cleanup
[fc5a6fb5] Wip tts
[39f50497] wip TTS integration
2025-01-13 14:23:25 +08:00
Nexes the Elder
3e6ef8e0ef
Probable typo (#1287)
2024-12-26 11:51:04 +08:00
Concedo
10d4fc637d
fixed a bug with drafting tokens
2024-12-23 11:36:08 +08:00
Concedo
fd5100c382
fix for query param
2024-12-21 10:41:25 +08:00
Concedo
4c56b7cada
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# examples/gbnf-validator/gbnf-validator.cpp
# examples/llava/clip.cpp
# examples/run/README.md
# examples/run/run.cpp
# examples/server/README.md
# ggml/src/ggml-cpu/CMakeLists.txt
# src/llama.cpp
# tests/test-grammar-integration.cpp
# tests/test-llama-grammar.cpp
2024-12-21 09:41:49 +08:00
Concedo
b7d3274523
temporarily make qwenv2l use clip on cpu for vulkan and macos
2024-12-21 09:15:31 +08:00
Concedo
bc297da91e
remove unused function
2024-12-16 11:39:52 +08:00
Concedo
00d154b32b
wip on qwen2vl integration, updated msvc runtimes
2024-12-15 23:58:02 +08:00
Concedo
60cd68a39d
draft model sets gpu split instead of id, made mmq default for cli
2024-12-14 23:58:45 +08:00
Concedo
595cc6975f
added new flags --moeexperts --failsafe --draftgpulayers and --draftgpuid
2024-12-13 17:11:59 +08:00
Concedo
00a686fc72
fixed fast forwarding context corruption after abort during prompt processing
2024-12-10 22:37:40 +08:00
Concedo
5106816eac
drafted tokens debug prints
2024-12-05 17:05:20 +08:00
Concedo
e93c2427b4
allow incompatible vocab in debugmode
2024-12-01 14:11:03 +08:00
Concedo
32ac3153e4
default speculative set to 8. added more adapter fields
2024-11-30 16:18:27 +08:00
Concedo
e0c59486ee
default to 12 tokens drafted
2024-11-30 11:52:07 +08:00
Concedo
b21d0fe3ac
customizable speculative size
2024-11-30 11:28:19 +08:00
Concedo
f75bbb945f
speculative decoding initial impl completed (+6 squashed commit)
...
Squashed commit:
[0a6306ca0] draft wip dont use (will be squashed)
[a758a1c9c] wip dont use (will be squashed)
[e1994d3ce] wip dont use
[f59690d68] wip
[77228147d] wip on spec decoding. dont use yet
[2445bca54] wip adding speculative decoding (+1 squashed commits)
Squashed commits:
[50e341bb7] wip adding speculative decoding
2024-11-30 10:41:10 +08:00
Concedo
b9e99c69e8
fixed build
2024-11-26 22:06:55 +08:00
Concedo
62dde8cfb2
ollama sync completions mostly working. stupid api.
2024-11-23 23:31:37 +08:00
Concedo
2c1a06a07d
wip ollama emulation, added detokenize endpoint
2024-11-23 22:48:03 +08:00