Concedo
|
6b6597ebf1
|
allow for single token prompt processing (actual batch size 1)
|
2025-04-25 16:54:46 +08:00 |
|
Concedo
|
28a2723100
|
merged pixtral support, not fully working
|
2025-04-24 15:27:02 +08:00 |
|
Concedo
|
9cd6a1add2
|
allow mmproj to be run on cpu
|
2025-04-21 21:03:10 +08:00 |
|
Concedo
|
2ed6850c0b
|
added override tensor
|
2025-04-20 20:56:17 +08:00 |
|
Concedo
|
c67510718e
|
kv override option (+1 squashed commits)
Squashed commits:
[e615fc01] kv override option
|
2025-04-17 14:22:30 +08:00 |
|
Concedo
|
93a226d9e4
|
added prefix for llava, reverted system role in template as it degraded gemma3. truncated debug logs
|
2025-04-05 18:06:41 +08:00 |
|
Concedo
|
b3143384b4
|
larger warmup batch
|
2025-04-05 10:57:04 +08:00 |
|
Concedo
|
61a73347c6
|
fixed mrope for multiple images in qwen2vl (+1 squashed commits)
Squashed commits:
[63e4d91c] fixed mrope for multiple images in qwen2vl (+1 squashed commits)
Squashed commits:
[bb78db1e] wip fixing mrope
|
2025-03-30 17:23:58 +08:00 |
|
Concedo
|
6a709be50a
|
replace deprecated
|
2025-03-27 10:27:20 +08:00 |
|
Concedo
|
e84596ec1a
|
add config for default gen tokens and bos toggle
|
2025-03-15 19:53:06 +08:00 |
|
Concedo
|
4212f0b8e8
|
wip on multiple fixes
|
2025-03-15 10:50:36 +08:00 |
|
Concedo
|
6a1dd57435
|
gemma3 template, updated lite, fixed tool calling, reenable ctx shift for gemma3
|
2025-03-14 17:47:01 +08:00 |
|
Concedo
|
0db4ae6237
|
traded my ink for a pen
|
2025-03-14 11:58:15 +08:00 |
|
Concedo
|
52cf1ded0c
|
remove unwanted print
|
2025-03-14 00:24:28 +08:00 |
|
Concedo
|
0460d92cc3
|
disable context shifting for gemma3
|
2025-03-13 20:28:26 +08:00 |
|
Concedo
|
e75539e8cb
|
too many issues without BOS (+1 squashed commits)
Squashed commits:
[7138d941] only print bos alert in debug
|
2025-03-13 16:48:29 +08:00 |
|
Concedo
|
1ef41c2124
|
streamline output console log (+1 squashed commits)
Squashed commits:
[ca474bdd] streamline output console log
|
2025-03-13 15:33:49 +08:00 |
|
Concedo
|
77debb1b1b
|
gemma3 vision works, but is using more tokens than expected - may need resizing
|
2025-03-13 00:31:16 +08:00 |
|
Concedo
|
eb1809c105
|
add more perf stats
|
2025-03-12 18:58:27 +08:00 |
|
Concedo
|
b0541f3652
|
added draft results
|
2025-03-10 22:03:20 +08:00 |
|
Concedo
|
72bc855e8a
|
honor add bos token settings from metadata
|
2025-03-07 22:10:50 +08:00 |
|
Concedo
|
6b7d2349a7
|
Rewrite history to fix bad vulkan shader commits without increasing repo size
added dpe colab (+8 squashed commit)
Squashed commit:
[b8362da4] updated lite
[ed6c037d] move nsigma into the regular sampler stack
[ac5f61c6] relative filepath fixed
[05fe96ab] export template
[ed0a5a3e] nix_example.md: refactor (#1401)
* nix_example.md: add override example
* nix_example.md: drop graphics example, already basic nixos knowledge
* nix_example.md: format
* nix_example.md: Vulkan is disabled on macOS
Disabled in: 1ccd253acc
* nix_examples.md: nixpkgs.config.cuda{Arches -> Capabilities}
Fixes: https://github.com/LostRuins/koboldcpp/issues/1367
[675c62f7] AutoGuess: Phi 4 (mini) (#1402)
[4bf56982] phrasing
[b8c0df04] Add Rep Pen to Top N Sigma sampler chain (#1397)
- place after nsigma and before xtc (+3 squashed commit)
Squashed commit:
[87c52b97] disable VMM from HIP
[ee8906f3] edit description
[e85c0e69] Remove Unnecessary Rep Counting (#1394)
* stop counting reps
* fix range-based initializer
* strike that - reverse it
|
2025-03-05 00:02:20 +08:00 |
|
Reithan
|
62cd9bb0b2
|
use range neq zero instead of lt (#1388)
|
2025-02-24 18:47:19 +08:00 |
|
Concedo
|
f2ac10c014
|
added nsigma to lite
|
2025-02-21 15:11:24 +08:00 |
|
EquinoxPsychosis
|
2740af3660
|
add top n sigma sampler from llama.cpp (#1384)
* Add N Sigma Sampler
* update nsigma sampler chain
* xtc position fix
* remove stray newline
---------
Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>
|
2025-02-21 14:31:42 +08:00 |
|
Concedo
|
6d7ef10671
|
Merge branch 'upstream' into concedo_experimental
Re-enable qwen2vl GPU for vulkan https://github.com/ggml-org/llama.cpp/pull/11902
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# .gitignore
# CONTRIBUTING.md
# Makefile
# common/CMakeLists.txt
# common/arg.cpp
# common/common.cpp
# examples/main/main.cpp
# examples/run/run.cpp
# examples/server/tests/README.md
# ggml/src/ggml-cuda/mma.cuh
# scripts/get_chat_template.py
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-chat.cpp
|
2025-02-20 23:17:20 +08:00 |
|
Concedo
|
b162c25a5e
|
fixed moe experts to use detected arch for key
|
2025-02-10 17:46:08 +08:00 |
|
Concedo
|
d22eca6c47
|
fix potential crash in autoguess
|
2025-02-09 12:33:28 +08:00 |
|
Concedo
|
e68a3cf1dc
|
fixed some functions when no model is loaded
|
2025-02-08 11:15:26 +08:00 |
|
Concedo
|
8fef9f3fb5
|
reloading is working correctly.
|
2025-02-06 22:24:18 +08:00 |
|
Concedo
|
fd84b062f9
|
allow reuse of clip embds
|
2025-01-30 19:02:45 +08:00 |
|
Concedo
|
f4e2f4b069
|
disable context shift when using mrope
|
2025-01-30 00:36:05 +08:00 |
|
Concedo
|
70f1d8d746
|
vision can set max res (+1 squashed commits)
Squashed commits:
[938fc655] vision can set max res
|
2025-01-30 00:19:49 +08:00 |
|
Concedo
|
0e45d3bb7a
|
quiet flags now set at load time
|
2025-01-25 16:46:56 +08:00 |
|
Concedo
|
cca4a934dd
|
fix for chat templates and drafting
|
2025-01-23 11:49:40 +08:00 |
|
Concedo
|
0e74db7fd4
|
fixed another tts bug, clblast selection and quiet mode
|
2025-01-22 21:36:13 +08:00 |
|
Concedo
|
2a00ee8fa8
|
broken commit
|
2025-01-16 21:41:18 +08:00 |
|
Concedo
|
b3de1598e7
|
Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS
tts is functional (+6 squashed commit)
Squashed commit:
[22396311] wip tts
[3a883027] tts not yet working
[0dcfab0e] fix silly bug
[a378d9ef] some long overdue cleanup
[fc5a6fb5] Wip tts
[39f50497] wip TTS integration
|
2025-01-13 14:23:25 +08:00 |
|
Nexes the Elder
|
3e6ef8e0ef
|
Probable typo (#1287)
|
2024-12-26 11:51:04 +08:00 |
|
Concedo
|
10d4fc637d
|
fixed a bug with drafting tokens
|
2024-12-23 11:36:08 +08:00 |
|
Concedo
|
fd5100c382
|
fix for query param
|
2024-12-21 10:41:25 +08:00 |
|
Concedo
|
4c56b7cada
|
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# README.md
# examples/gbnf-validator/gbnf-validator.cpp
# examples/llava/clip.cpp
# examples/run/README.md
# examples/run/run.cpp
# examples/server/README.md
# ggml/src/ggml-cpu/CMakeLists.txt
# src/llama.cpp
# tests/test-grammar-integration.cpp
# tests/test-llama-grammar.cpp
|
2024-12-21 09:41:49 +08:00 |
|
Concedo
|
b7d3274523
|
temporarily make qwen2vl use clip on cpu for vulkan and macos
|
2024-12-21 09:15:31 +08:00 |
|
Concedo
|
bc297da91e
|
remove unused function
|
2024-12-16 11:39:52 +08:00 |
|
Concedo
|
00d154b32b
|
wip on qwen2vl integration, updated msvc runtimes
|
2024-12-15 23:58:02 +08:00 |
|
Concedo
|
60cd68a39d
|
draft model sets gpu split instead of id, made mmq default for cli
|
2024-12-14 23:58:45 +08:00 |
|
Concedo
|
595cc6975f
|
added new flags --moeexperts --failsafe --draftgpulayers and --draftgpuid
|
2024-12-13 17:11:59 +08:00 |
|
Concedo
|
00a686fc72
|
fixed fast forwarding context corruption after abort during prompt processing
|
2024-12-10 22:37:40 +08:00 |
|
Concedo
|
5106816eac
|
drafted tokens debug prints
|
2024-12-05 17:05:20 +08:00 |
|
Concedo
|
e93c2427b4
|
allow incompatible vocab in debugmode
|
2024-12-01 14:11:03 +08:00 |
|