Commit graph

343 commits

Author | SHA1 | Message | Date
Reithan
62cd9bb0b2
use range neq zero instead of lt (#1388) 2025-02-24 18:47:19 +08:00
Concedo
f2ac10c014 added nsigma to lite 2025-02-21 15:11:24 +08:00
EquinoxPsychosis
2740af3660
add top n sigma sampler from llama.cpp (#1384)
* Add N Sigma Sampler

* update nsigma sampler chain

* xtc position fix

* remove stray newline

---------

Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>
2025-02-21 14:31:42 +08:00
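For orientation only: top-n-sigma keeps the tokens whose logit lies within n standard deviations of the maximum logit and discards the rest before sampling. A minimal numpy sketch of that idea follows; it is not the llama.cpp/KoboldCpp code, and every name in it is illustrative.

```python
# Hypothetical sketch of top-n-sigma filtering (not the actual sampler code).
import numpy as np

def top_n_sigma_filter(logits: np.ndarray, n_sigma: float = 1.0) -> np.ndarray:
    """Keep tokens whose logit is within n_sigma standard deviations of the
    maximum logit; mask the rest and return a renormalized distribution."""
    max_logit = logits.max()
    sigma = logits.std()
    keep = logits >= (max_logit - n_sigma * sigma)
    filtered = np.where(keep, logits, -np.inf)   # drop the tail entirely
    probs = np.exp(filtered - max_logit)         # numerically stable softmax
    return probs / probs.sum()

# Example: draw one token id from the filtered distribution.
rng = np.random.default_rng(0)
logits = rng.normal(size=32000)
probs = top_n_sigma_filter(logits, n_sigma=1.5)
token_id = rng.choice(len(probs), p=probs)
print(token_id)
```

Because the filter sits in a chain with the other samplers, its position relative to XTC matters, which appears to be what the sub-commits above adjust.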
Concedo
6d7ef10671 Merge branch 'upstream' into concedo_experimental
Re-enable qwen2vl GPU for vulkan https://github.com/ggml-org/llama.cpp/pull/11902

# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	.gitignore
#	CONTRIBUTING.md
#	Makefile
#	common/CMakeLists.txt
#	common/arg.cpp
#	common/common.cpp
#	examples/main/main.cpp
#	examples/run/run.cpp
#	examples/server/tests/README.md
#	ggml/src/ggml-cuda/mma.cuh
#	scripts/get_chat_template.py
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
#	tests/test-chat.cpp
2025-02-20 23:17:20 +08:00
Concedo
b162c25a5e fixed moe experts to use detected arch for key 2025-02-10 17:46:08 +08:00
Concedo
d22eca6c47 fix potential crash in autoguess 2025-02-09 12:33:28 +08:00
Concedo
e68a3cf1dc fixed some functions when no model is loaded 2025-02-08 11:15:26 +08:00
Concedo
8fef9f3fb5 reloading is working correctly. 2025-02-06 22:24:18 +08:00
Concedo
fd84b062f9 allow reuse of clip embds 2025-01-30 19:02:45 +08:00
Concedo
f4e2f4b069 disable context shift when using mrope 2025-01-30 00:36:05 +08:00
Concedo
70f1d8d746 vision can set max res (+1 squashed commit)
Squashed commits:

[938fc655] vision can set max res
2025-01-30 00:19:49 +08:00
Concedo
0e45d3bb7a quiet flags now set at load time 2025-01-25 16:46:56 +08:00
Concedo
cca4a934dd fix for chat templates and drafting 2025-01-23 11:49:40 +08:00
Concedo
0e74db7fd4 fixed another tts bug, clblast selection and quiet mode 2025-01-22 21:36:13 +08:00
Concedo
2a00ee8fa8 broken commit 2025-01-16 21:41:18 +08:00
Concedo
b3de1598e7 Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS
tts is functional (+6 squashed commits)

Squashed commit:

[22396311] wip tts

[3a883027] tts not yet working

[0dcfab0e] fix silly bug

[a378d9ef] some long overdue cleanup

[fc5a6fb5] Wip tts

[39f50497] wip TTS integration
2025-01-13 14:23:25 +08:00
Nexes the Elder
3e6ef8e0ef
Probable typo (#1287) 2024-12-26 11:51:04 +08:00
Concedo
10d4fc637d fixed a bug with drafting tokens 2024-12-23 11:36:08 +08:00
Concedo
fd5100c382 fix for query param 2024-12-21 10:41:25 +08:00
Concedo
4c56b7cada Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	examples/gbnf-validator/gbnf-validator.cpp
#	examples/llava/clip.cpp
#	examples/run/README.md
#	examples/run/run.cpp
#	examples/server/README.md
#	ggml/src/ggml-cpu/CMakeLists.txt
#	src/llama.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-llama-grammar.cpp
2024-12-21 09:41:49 +08:00
Concedo
b7d3274523 temporarily make qwen2vl use clip on cpu for vulkan and macos 2024-12-21 09:15:31 +08:00
Concedo
bc297da91e remove unused function 2024-12-16 11:39:52 +08:00
Concedo
00d154b32b wip on qwen2vl integration, updated msvc runtimes 2024-12-15 23:58:02 +08:00
Concedo
60cd68a39d draft model sets gpu split instead of id, made mmq default for cli 2024-12-14 23:58:45 +08:00
Concedo
595cc6975f added new flags --moeexperts --failsafe --draftgpulayers and --draftgpuid 2024-12-13 17:11:59 +08:00
Concedo
00a686fc72 fixed fast forwarding context corruption after abort during prompt processing 2024-12-10 22:37:40 +08:00
Concedo
5106816eac drafted tokens debug prints 2024-12-05 17:05:20 +08:00
Concedo
e93c2427b4 allow incompatible vocab in debugmode 2024-12-01 14:11:03 +08:00
Concedo
32ac3153e4 default speculative set to 8. added more adapter fields 2024-11-30 16:18:27 +08:00
Concedo
e0c59486ee default to 12 tokens drafted 2024-11-30 11:52:07 +08:00
Concedo
b21d0fe3ac customizable speculative size 2024-11-30 11:28:19 +08:00
Concedo
f75bbb945f speculative decoding initial impl completed (+6 squashed commits)
Squashed commit:

[0a6306ca0] draft wip dont use (will be squashed)

[a758a1c9c] wip dont use (will be squashed)

[e1994d3ce] wip dont use

[f59690d68] wip

[77228147d] wip on spec decoding. dont use yet

[2445bca54] wip adding speculative decoding (+1 squashed commit)

Squashed commits:

[50e341bb7] wip adding speculative decoding
2024-11-30 10:41:10 +08:00
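As background on the feature completed above: speculative decoding has a small draft model cheaply propose several tokens, which the large target model then verifies in one batched pass. A simplified greedy-acceptance sketch, with draft_model, target_model and n_draft as hypothetical stand-ins, not the committed implementation:

```python
def speculative_step(prompt_ids, draft_model, target_model, n_draft=8):
    # 1. The cheap draft model proposes n_draft tokens autoregressively.
    ctx = list(prompt_ids)
    draft = []
    for _ in range(n_draft):
        tok = draft_model(ctx)          # hypothetical: greedy next-token id
        draft.append(tok)
        ctx.append(tok)

    # 2. The expensive target model scores the whole drafted span in a single
    #    batched forward pass and reports its own pick at every position.
    verified = target_model(prompt_ids, draft)   # hypothetical: list of ids

    # 3. Accept the longest agreeing prefix, plus the target's correction at
    #    the first mismatch, so at least one token is always produced.
    accepted = []
    for d, v in zip(draft, verified):
        if d == v:
            accepted.append(d)
        else:
            accepted.append(v)
            break
    return accepted
```

The "customizable speculative size" and default-draft-count commits listed just above correspond to tuning the number of drafted tokens (n_draft here).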
Concedo
b9e99c69e8 fixed build 2024-11-26 22:06:55 +08:00
Concedo
62dde8cfb2 ollama sync completions mostly working. stupid api. 2024-11-23 23:31:37 +08:00
Concedo
2c1a06a07d wip ollama emulation, added detokenize endpoint 2024-11-23 22:48:03 +08:00
Concedo
1dd37933e3 fixed grammar not resetting correctly 2024-11-23 09:55:12 +08:00
kallewoof
547ab2aebb
API: add /props route (#1222)
* API: add an /extra/chat_template route

A lot of manual tweaking is needed when swapping between models. We can automate some of it, or at least make better assumptions, by exposing more information such as the chat template. This PR adds an endpoint /extra/chat_template which returns the model's chat template string as-is under a 'chat_template' key. The front end can then use it to derive the proper templates, use it directly, or at least warn the user when they are trying to use e.g. a Mistral preset with a Llama 3.1 model.

* switch to pre-established /props endpoint for chat template

* bug-fix (upstream): off-by-one in string juggling
2024-11-21 10:58:32 +08:00
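A minimal client-side sketch of how a front end might consume the endpoint described above, assuming a local KoboldCpp-style server on port 5001 and a JSON response that carries the template under a 'chat_template' key; the detection heuristics are purely illustrative:

```python
import requests

resp = requests.get("http://localhost:5001/props", timeout=10)
resp.raise_for_status()
template = resp.json().get("chat_template", "")

# Illustrative heuristics only: warn or switch presets based on the template.
if "[INST]" in template:
    print("Looks like a Mistral-style instruct template")
elif "<|start_header_id|>" in template:
    print("Looks like a Llama 3.x-style template")
else:
    print("Unrecognized template; fall back to asking the user")
```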
Concedo
70aee82552 attempts a backflip, but does he stick the landing? 2024-11-16 17:05:45 +08:00
Concedo
bfa118ee45 fix llava segfault 2024-11-14 14:16:39 +08:00
Concedo
4b96c3bba8 try new batch api (not actually batching) 2024-11-14 13:47:26 +08:00
Concedo
3813f6c517 added new flag nofastforward allowing users to disable fast forwarding 2024-11-13 10:59:01 +08:00
Concedo
48e9372337 prevent outputting infinity to logprobs (+1 squashed commit)
Squashed commits:

[bcc5f8b92] prevent outputting infinity to logprobs
2024-11-13 00:09:53 +08:00
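Context for the fix above: strict JSON has no representation for Infinity, so a logprob of -inf (a zero-probability token) has to be replaced with a finite floor before serialization. A hypothetical sketch of that idea, not the actual KoboldCpp code:

```python
import json
import math

LOGPROB_FLOOR = -99999.0   # arbitrary finite stand-in for -inf

def sanitize_logprobs(logprobs):
    """Replace non-finite logprobs with a finite floor so the payload
    stays valid, parseable JSON for every client."""
    return [lp if math.isfinite(lp) else LOGPROB_FLOOR for lp in logprobs]

payload = {"logprobs": sanitize_logprobs([-0.02, -3.1, float("-inf")])}
print(json.dumps(payload))   # no bare "-Infinity" token in the output
```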
kallewoof
3c36bbdcd7
debug: display tokens that were dropped by XTC sampler when debugmode is enabled (#1201) 2024-11-06 23:09:28 +08:00
Concedo
223c5f0844 clblast survived 2024-11-02 21:51:38 +08:00
Concedo
bbebc76817 fix top picks bug, lower input anti abuse thresholds (+1 squashed commit)
Squashed commits:

[a81d9b21] fix top picks bug, lower input anti abuse thresholds
2024-11-01 16:42:13 +08:00
Concedo
aa26a58085 added logprobs api and logprobs viewer 2024-11-01 00:22:15 +08:00
Concedo
90f5cd0f67 wip logprobs data 2024-10-30 00:59:34 +08:00
Concedo
94a5a27b85 Alone in the darkness
They're coming for you
I know they will try to catch me too
Alone in the darkness
They're calling for you
There's nowhere to run for cover
2024-10-24 22:29:20 +08:00
Concedo
becd737e0f slightly increase padding to handle longer gen amts 2024-10-23 22:58:41 +08:00
Maya
8bb220329c
Dynamic sizes for sequences (#1157)
* Dynamic sizes for sequences

* cleanup PR - move all dynamic fields to end of payload, ensure correct null handling to match existing behavior, add anti abuse limit of max 512 for dynamic fields

* adjust anti abuse limits

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-10-16 23:55:11 +08:00