Commit graph

1047 commits

Author SHA1 Message Date
Concedo
3fea11675d websearch integrated into lite, changed to POST 2024-12-30 17:30:41 +08:00
Concedo
6026501ed2 websearch functional 2024-12-30 12:01:51 +08:00
Concedo
709dab6289 improved websearch endpoint 2024-12-29 19:39:16 +08:00
Concedo
5451a8e8a9 updated lite 2024-12-29 17:04:29 +08:00
Concedo
2de1975ca2 improve websearch api 2024-12-28 23:36:40 +08:00
Concedo
baaecd1c65 added a basic websearch proxy 2024-12-28 19:07:00 +08:00
Concedo
29afdb7c90 minor linting 2024-12-28 12:21:35 +08:00
kallewoof
23ec550835
PoC: add chat template heuristics (#1283)
* PoC: add chat template heuristics

The fallback chat template adapter of Vicuna is not ideal in some cases (e.g. a test against a sub-portion of the BBC news classification task on Kaggle gave an 82% accuracy with Vicuna and 88% with the official ChatML format for a q4_k_m Qwen 2.5 3B-Instruct gguf).

This PR adds a proof of concept simple heuristic which looks at the chat template and upgrades the adapter when it is able to.

* gemma 2 heuristic

* Phi 4, Llama 3.x heuristics

* better qwen vs generic heuristic

* cleanup

* mistral (generic) heuristic

* fix sys msg for mistral

* phi 3.5

* mistral v3

* cohere (aya expanse 32b based)

* only derive from chat template if AutoGuess

* add notes about alpaca fallbacks

* added AutoGuess.json dummy

* add mistral v7

* switch to using a json list with search strings
2024-12-28 12:15:23 +08:00
Concedo
5f8f483fae fixed typo (+1 squashed commits)
Squashed commits:

[b586d187] fixed typo
2024-12-23 21:57:34 +08:00
Concedo
13abf591d2 patch release for drafting fix 2024-12-23 11:40:02 +08:00
Concedo
4c56b7cada Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	examples/gbnf-validator/gbnf-validator.cpp
#	examples/llava/clip.cpp
#	examples/run/README.md
#	examples/run/run.cpp
#	examples/server/README.md
#	ggml/src/ggml-cpu/CMakeLists.txt
#	src/llama.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-llama-grammar.cpp
2024-12-21 09:41:49 +08:00
Concedo
fc52a38a25 handle urls as config download in model param 2024-12-20 10:56:07 +08:00
Concedo
6089421423 always follow pci bus id 2024-12-18 00:46:48 +08:00
Concedo
60cd68a39d draft model sets gpu split instead of id, made mmq default for cli 2024-12-14 23:58:45 +08:00
Concedo
595cc6975f added new flags --moeexperts --failsafe --draftgpulayers and --draftgpuid 2024-12-13 17:11:59 +08:00
Concedo
a11bba5893 cleanup, fix native build for arm (+28 squashed commit)
Squashed commit:

[d1f6a4154] bundle library

[947ab84b7] undo

[0f9aba8d8] test

[e9ac93873] test

[920438202] test

[1c6d98804] Revert "quick test"

This reverts commit acf8ec8940.

[acf8ec894] quick test

[6a9937233] undo

[5a263a5bd] test

[ddfd82bca] test

[0b30e45da] test

[c3bfece55] messed up

[2a4b37fe0] Revert "test"

This reverts commit 80a1fcaeaf.

[80a1fcaea] test

[e2aa7d944] test

[264d80200] test

[f5b123173] undo

[1ffacc484] test

[63c0be926] undo

[510e0377e] ofast try fix

[4ac199b20] try fix sigill

[1bc987ba2] try fix illegal instruction

[7697252b1] edit

[f87087b28] check gcc ver

[e9dfe2cef] try using qemu to do the pyinstaller

[b411192db] revert

[25b5301e5] try using qemu to do the pyinstaller

[58038cddc] try using qemu to do the pyinstaller
2024-12-10 19:42:23 +08:00
Concedo
e9d2332dd8 improved tool calls and whisper 2024-12-06 14:34:31 +08:00
Concedo
836c06d91a minor edit 2024-12-06 00:37:38 +08:00
Concedo
d0d1d922de handle and fix temp paths to chat completions adapter 2024-12-05 17:22:35 +08:00
Concedo
2787fca6b4 refactored library selection, fixed ollama params 2024-12-05 16:47:52 +08:00
Concedo
52cc908f7f default trim_stop to true, which trims any tokens after a stop sequence and the stop sequence itself. This is potentially a breaking change. 2024-12-03 22:44:10 +08:00
Concedo
2ba5949054 updated sdcpp, also set euler as default sampler 2024-12-01 17:00:20 +08:00
Concedo
42228b9746 warning when selecting non gguf models 2024-12-01 13:35:51 +08:00
Concedo
b7cd210cd2 more linting with Ruff (+1 squashed commits)
Squashed commits:

[43802cfe2] Applied default Ruff linting
2024-12-01 01:23:13 +08:00
Concedo
409e393d10 fixed critical bug in image model loader 2024-11-30 23:28:24 +08:00
Concedo
0028e71993 special handling to resolve incomplete utf8 token sequences in qwen 2024-11-30 16:54:01 +08:00
Concedo
32ac3153e4 default speculative set to 8. added more adapter fields 2024-11-30 16:18:27 +08:00
Concedo
e0c59486ee default to 12 tokens drafted 2024-11-30 11:52:07 +08:00
Concedo
b21d0fe3ac customizable speculative size 2024-11-30 11:28:19 +08:00
Concedo
f75bbb945f speculative decoding initial impl completed (+6 squashed commit)
Squashed commit:

[0a6306ca0] draft wip dont use (will be squashed)

[a758a1c9c] wip dont use (will be squashed)

[e1994d3ce] wip dont use

[f59690d68] wip

[77228147d] wip on spec decoding. dont use yet

[2445bca54] wip adding speculative decoding (+1 squashed commits)

Squashed commits:

[50e341bb7] wip adding speculative decoding
2024-11-30 10:41:10 +08:00
kallewoof
fd320f6682
/props endpoint: provide context size through default_generation_settings (#1237) 2024-11-26 16:15:27 +08:00
Concedo
1e0792a3ef comfyui emulation also done 2024-11-24 15:39:03 +08:00
Concedo
9bd27323e7 emulate comfyui txt2img 2024-11-24 11:28:12 +08:00
Concedo
bf28d956ae ollama chat api done 2024-11-24 00:10:15 +08:00
Concedo
62dde8cfb2 ollama sync completions mostly working. stupid api. 2024-11-23 23:31:37 +08:00
Concedo
2c1a06a07d wip ollama emulation, added detokenize endpoint 2024-11-23 22:48:03 +08:00
Concedo
c0da7e4dcf multiplayer activity tracking 2024-11-23 19:59:55 +08:00
Concedo
1dd37933e3 fixed grammar not resetting correctly 2024-11-23 09:55:12 +08:00
Concedo
18f227625b multiplayer fixes 2024-11-22 19:02:31 +08:00
mkarr
ac6a0cde91
Support chunked encoding. (#1226)
* Support chunked encoding.

The koboldcpp API does not support HTTP chunked encoding. Some HTTP
libraries, notable Go's net/http can automatically choose to use chunked
encoding. This adds support for chunked encoding within the do_POST()
handler.

* refactor slightly to add additional safety checks and follow original format

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-11-21 18:24:04 +08:00
Concedo
c2ca2ec2bc updated docs, fixed a few issues with multiplayer 2024-11-21 18:16:13 +08:00
Concedo
272828cab0 tweaks to chat template 2024-11-21 11:10:30 +08:00
kallewoof
547ab2aebb
API: add /props route (#1222)
* API: add an /extra/chat_template route

A lot of manual tweaking is done when swapping between models. We can automate or make better assumptions about some of them by having more information, such as chat template. This PR adds an endpoint /extra/chat_template which returns the model chat template string as is in a 'chat_template' key. The front end can then use this to derive the proper templates or use it as is, or at least warn the user when they are trying to use e.g. a Mistral preset with a Llama 3.1 model.

* switch to pre-established /props endpoint for chat template

* bug-fix (upstream): one-off in string juggling
2024-11-21 10:58:32 +08:00
Concedo
8ab3eb89a8 updated lite 2024-11-21 10:43:48 +08:00
Concedo
a439dcb38e multiplayer error handling 2024-11-19 23:31:48 +08:00
Concedo
1b663e10c8 first functional multiplayer 2024-11-19 22:49:28 +08:00
Concedo
14cbd07eaa more wip multiplayer 2024-11-19 18:09:26 +08:00
Concedo
39124828ab wip multiplayer 2024-11-17 23:29:25 +08:00
Concedo
a8694698fd accept gguf text encoders for sd 2024-11-16 17:23:02 +08:00
Concedo
70aee82552 attempts a backflip, but does he stick the landing? 2024-11-16 17:05:45 +08:00