Concedo
e84596ec1a
add config for default gen tokens and bos toggle
2025-03-15 19:53:06 +08:00
Concedo
4212f0b8e8
wip on multiple fixes
2025-03-15 10:50:36 +08:00
Concedo
6a1dd57435
gemma3 template, updated lite, fixed tool calling, re-enabled ctx shift for gemma3
2025-03-14 17:47:01 +08:00
Concedo
0db4ae6237
traded my ink for a pen
2025-03-14 11:58:15 +08:00
Concedo
52cf1ded0c
remove unwanted print
2025-03-14 00:24:28 +08:00
Concedo
0460d92cc3
disable context shifting for gemma3
2025-03-13 20:28:26 +08:00
Concedo
e75539e8cb
too many issues without BOS (+1 squashed commit)
...
Squashed commits:
[7138d941] only print bos alert in debug
2025-03-13 16:48:29 +08:00
Concedo
1ef41c2124
streamline output console log (+1 squashed commit)
...
Squashed commits:
[ca474bdd] streamline output console log
2025-03-13 15:33:49 +08:00
Concedo
77debb1b1b
gemma3 vision works, but is using more tokens than expected - may need resizing
2025-03-13 00:31:16 +08:00
Concedo
eb1809c105
add more perf stats
2025-03-12 18:58:27 +08:00
Concedo
b0541f3652
added draft results
2025-03-10 22:03:20 +08:00
Concedo
72bc855e8a
honor add bos token settings from metadata
2025-03-07 22:10:50 +08:00
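For reference, this setting comes from GGUF metadata: the tokenizer.ggml.add_bos_token key records whether a BOS token should be prepended. A minimal sketch of inspecting it with the gguf Python package (the field accessors below match the gguf library as I understand it; treat them as an assumption if your version differs):

    import sys
    from gguf import GGUFReader  # pip install gguf

    reader = GGUFReader(sys.argv[1])
    field = reader.get_field("tokenizer.ggml.add_bos_token")
    if field is None:
        print("no metadata: fall back to the loader's default")
    else:
        # for a BOOL field the payload is a single-element array
        print("add_bos_token =", bool(field.parts[field.data[0]][0]))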
Concedo
6b7d2349a7
Rewrite history to fix bad vulkan shader commits without increasing repo size
...
added dpe colab (+8 squashed commits)
Squashed commits:
[b8362da4] updated lite
[ed6c037d] move nsigma into the regular sampler stack
[ac5f61c6] relative filepath fixed
[05fe96ab] export template
[ed0a5a3e] nix_example.md: refactor (#1401)
* nix_example.md: add override example
* nix_example.md: drop graphics example, already basic nixos knowledge
* nix_example.md: format
* nix_example.md: Vulkan is disabled on macOS
Disabled in: 1ccd253acc
* nix_example.md: nixpkgs.config.cuda{Arches -> Capabilities}
Fixes: https://github.com/LostRuins/koboldcpp/issues/1367
[675c62f7] AutoGuess: Phi 4 (mini) (#1402)
[4bf56982] phrasing
[b8c0df04] Add Rep Pen to Top N Sigma sampler chain (#1397)
- place after nsigma and before xtc (+3 squashed commits)
Squashed commits:
[87c52b97] disable VMM from HIP
[ee8906f3] edit description
[e85c0e69] Remove Unnecessary Rep Counting (#1394)
* stop counting reps
* fix range-based initializer
* strike that - reverse it
2025-03-05 00:02:20 +08:00
Reithan
62cd9bb0b2
use range neq zero instead of lt (#1388)
2025-02-24 18:47:19 +08:00
Concedo
f2ac10c014
added nsigma to lite
2025-02-21 15:11:24 +08:00
EquinoxPsychosis
2740af3660
add top n sigma sampler from llama.cpp (#1384)
...
* Add N Sigma Sampler
* update nsigma sampler chain
* xtc position fix
* remove stray newline
---------
Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>
2025-02-21 14:31:42 +08:00
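Top-n-sigma keeps only tokens whose logit lies within n standard deviations of the best logit, so the cutoff scales with how peaked the distribution is. A minimal numpy sketch of the idea (an illustrative re-derivation, not the llama.cpp implementation):

    import numpy as np

    def top_n_sigma(logits: np.ndarray, n: float = 1.0) -> np.ndarray:
        # mask tokens more than n standard deviations below the
        # best logit; the survivors go on to softmax/sampling
        threshold = logits.max() - n * logits.std()
        return np.where(logits >= threshold, logits, -np.inf)

    logits = np.array([5.0, 4.8, 3.0, -2.0, -7.0])
    print(top_n_sigma(logits))  # the distant tail becomes -inf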
Concedo
6d7ef10671
Merge branch 'upstream' into concedo_experimental
...
Re-enable qwen2vl GPU for vulkan https://github.com/ggml-org/llama.cpp/pull/11902
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# .gitignore
# CONTRIBUTING.md
# Makefile
# common/CMakeLists.txt
# common/arg.cpp
# common/common.cpp
# examples/main/main.cpp
# examples/run/run.cpp
# examples/server/tests/README.md
# ggml/src/ggml-cuda/mma.cuh
# scripts/get_chat_template.py
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-chat.cpp
2025-02-20 23:17:20 +08:00
Concedo
b162c25a5e
fixed moe experts to use detected arch for key
2025-02-10 17:46:08 +08:00
Concedo
d22eca6c47
fix potential crash in autoguess
2025-02-09 12:33:28 +08:00
Concedo
e68a3cf1dc
fixed some functions when no model is loaded
2025-02-08 11:15:26 +08:00
Concedo
8fef9f3fb5
reloading is working correctly.
2025-02-06 22:24:18 +08:00
Concedo
fd84b062f9
allow reuse of clip embds
2025-01-30 19:02:45 +08:00
Concedo
f4e2f4b069
disable context shift when using mrope
2025-01-30 00:36:05 +08:00
Concedo
70f1d8d746
vision can set max res (+1 squashed commit)
...
Squashed commits:
[938fc655] vision can set max res
2025-01-30 00:19:49 +08:00
Concedo
0e45d3bb7a
quiet flags now set at load time
2025-01-25 16:46:56 +08:00
Concedo
cca4a934dd
fix for chat templates and drafting
2025-01-23 11:49:40 +08:00
Concedo
0e74db7fd4
fixed another tts bug, clblast selection and quiet mode
2025-01-22 21:36:13 +08:00
Concedo
2a00ee8fa8
broken commit
2025-01-16 21:41:18 +08:00
Concedo
b3de1598e7
Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS
...
tts is functional (+6 squashed commits)
Squashed commits:
[22396311] wip tts
[3a883027] tts not yet working
[0dcfab0e] fix silly bug
[a378d9ef] some long overdue cleanup
[fc5a6fb5] Wip tts
[39f50497] wip TTS integration
2025-01-13 14:23:25 +08:00
Nexes the Elder
3e6ef8e0ef
Probable typo (#1287)
2024-12-26 11:51:04 +08:00
Concedo
10d4fc637d
fixed a bug with drafting tokens
2024-12-23 11:36:08 +08:00
Concedo
fd5100c382
fix for query param
2024-12-21 10:41:25 +08:00
Concedo
4c56b7cada
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# examples/gbnf-validator/gbnf-validator.cpp
# examples/llava/clip.cpp
# examples/run/README.md
# examples/run/run.cpp
# examples/server/README.md
# ggml/src/ggml-cpu/CMakeLists.txt
# src/llama.cpp
# tests/test-grammar-integration.cpp
# tests/test-llama-grammar.cpp
2024-12-21 09:41:49 +08:00
Concedo
b7d3274523
temporarily make qwen2vl use clip on cpu for vulkan and macos
2024-12-21 09:15:31 +08:00
Concedo
bc297da91e
remove unused function
2024-12-16 11:39:52 +08:00
Concedo
00d154b32b
wip on qwen2vl integration, updated msvc runtimes
2024-12-15 23:58:02 +08:00
Concedo
60cd68a39d
draft model sets gpu split instead of id, made mmq default for cli
2024-12-14 23:58:45 +08:00
Concedo
595cc6975f
added new flags --moeexperts --failsafe --draftgpulayers and --draftgpuid
2024-12-13 17:11:59 +08:00
Concedo
00a686fc72
fixed fast forwarding context corruption after abort during prompt processing
2024-12-10 22:37:40 +08:00
Concedo
5106816eac
drafted tokens debug prints
2024-12-05 17:05:20 +08:00
Concedo
e93c2427b4
allow incompatible vocab in debugmode
2024-12-01 14:11:03 +08:00
Concedo
32ac3153e4
default speculative set to 8. added more adapter fields
2024-11-30 16:18:27 +08:00
Concedo
e0c59486ee
default to 12 tokens drafted
2024-11-30 11:52:07 +08:00
Concedo
b21d0fe3ac
customizable speculative size
2024-11-30 11:28:19 +08:00
Concedo
f75bbb945f
speculative decoding initial impl completed (+6 squashed commits)
...
Squashed commits:
[0a6306ca0] draft wip dont use (will be squashed)
[a758a1c9c] wip dont use (will be squashed)
[e1994d3ce] wip dont use
[f59690d68] wip
[77228147d] wip on spec decoding. dont use yet
[2445bca54] wip adding speculative decoding (+1 squashed commits)
Squashed commits:
[50e341bb7] wip adding speculative decoding
2024-11-30 10:41:10 +08:00
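The draft-then-verify loop these commits build up works roughly as follows: a small draft model proposes a run of tokens cheaply, the large target model scores them in one pass, and the longest agreeing prefix is accepted, with the target's own pick substituted at the first mismatch. A toy sketch under a hypothetical deterministic model interface (ToyLM and argmax are stand-ins, not the actual C++ API):

    class ToyLM:
        """Hypothetical stand-in: deterministic next-token model."""
        def __init__(self, seed: int):
            self.seed = seed

        def argmax(self, ctx: list[int]) -> int:
            return hash((self.seed, tuple(ctx))) % 100

    def speculative_step(target: ToyLM, draft: ToyLM,
                         ctx: list[int], n_draft: int = 8) -> list[int]:
        # 1) draft model greedily proposes n_draft tokens
        proposal, dctx = [], list(ctx)
        for _ in range(n_draft):
            tok = draft.argmax(dctx)
            proposal.append(tok)
            dctx.append(tok)
        # 2) target verifies: keep tokens it agrees with, emit its
        #    own choice at the first mismatch, then stop
        out = list(ctx)
        for tok in proposal:
            want = target.argmax(out)
            out.append(want)
            if want != tok:
                break
        return out[len(ctx):]

    # identical models agree on every token, so all 8 drafts land
    print(speculative_step(ToyLM(1), ToyLM(1), [1, 2, 3]))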
Concedo
b9e99c69e8
fixed build
2024-11-26 22:06:55 +08:00
Concedo
62dde8cfb2
ollama sync completions mostly working. stupid api.
2024-11-23 23:31:37 +08:00
Concedo
2c1a06a07d
wip ollama emulation, added detokenize endpoint
2024-11-23 22:48:03 +08:00
Concedo
1dd37933e3
fixed grammar not resetting correctly
2024-11-23 09:55:12 +08:00
kallewoof
547ab2aebb
API: add /props route (#1222)
...
* API: add an /extra/chat_template route
A lot of manual tweaking is done when swapping between models. We can automate or make better assumptions about some of it by having more information, such as the chat template. This PR adds an endpoint /extra/chat_template which returns the model's chat template string as-is in a 'chat_template' key. The front end can then use this to derive the proper templates, use it as-is, or at least warn the user when they are trying to use e.g. a Mistral preset with a Llama 3.1 model.
* switch to pre-established /props endpoint for chat template
* bug-fix (upstream): off-by-one in string juggling
2024-11-21 10:58:32 +08:00
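A minimal sketch of consuming the route (assuming a local koboldcpp instance on its default port, 5001): fetch /props and inspect the chat_template key before choosing an instruct preset.

    import json
    import urllib.request

    # assumes koboldcpp is serving on its default port, 5001
    with urllib.request.urlopen("http://localhost:5001/props") as resp:
        props = json.load(resp)

    template = props.get("chat_template", "")
    print(template[:200] or "no chat template embedded in this model")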