Commit graph

857 commits

Author SHA1 Message Date
Concedo
0cb599546e increase max supported llava images to 8 2025-01-09 22:12:06 +08:00
Concedo
c73d99ccac updated lite 2025-01-08 13:35:59 +08:00
Concedo
568e476997 added toggle for vae tiling, use custom memory buffer 2025-01-08 13:12:03 +08:00
Concedo
d752846116 fixed ask save file 2025-01-07 22:11:15 +08:00
Concedo
58791612d2 sse3 mode for noavx2 clblast, fixed metadata, added version command 2025-01-06 21:59:05 +08:00
Concedo
9b32482089 fixed bug in aesthetic ui 2025-01-05 18:04:02 +08:00
Concedo
1559d4d2fb fixed defective websearch 2025-01-04 16:47:38 +08:00
Concedo
e07e73aeb4 updated lite 2025-01-04 10:47:48 +08:00
Concedo
8de44d1e41 refactored some outputs 2024-12-30 22:30:27 +08:00
Concedo
5eb314a04b websearch length limits and caching 2024-12-30 18:30:54 +08:00
Concedo
3fea11675d websearch integrated into lite, changed to POST 2024-12-30 17:30:41 +08:00
Concedo
6026501ed2 websearch functional 2024-12-30 12:01:51 +08:00
Concedo
709dab6289 improved websearch endpoint 2024-12-29 19:39:16 +08:00
Concedo
5451a8e8a9 updated lite 2024-12-29 17:04:29 +08:00
Concedo
2de1975ca2 improve websearch api 2024-12-28 23:36:40 +08:00
Concedo
baaecd1c65 added a basic websearch proxy 2024-12-28 19:07:00 +08:00
Concedo
29afdb7c90 minor linting 2024-12-28 12:21:35 +08:00
kallewoof
23ec550835
PoC: add chat template heuristics (#1283)
* PoC: add chat template heuristics

The fallback chat template adapter of Vicuna is not ideal in some cases: for example, a test against a sub-portion of the BBC news classification task on Kaggle gave 82% accuracy with Vicuna versus 88% with the official ChatML format for a q4_k_m Qwen 2.5 3B-Instruct GGUF.

This PR adds a proof-of-concept heuristic that inspects the chat template and upgrades the adapter when it can.

* gemma 2 heuristic

* Phi 4, Llama 3.x heuristics

* better qwen vs generic heuristic

* cleanup

* mistral (generic) heuristic

* fix sys msg for mistral

* phi 3.5

* mistral v3

* cohere (aya expanse 32b based)

* only derive from chat template if AutoGuess

* add notes about alpaca fallbacks

* added AutoGuess.json dummy

* add mistral v7

* switch to using a json list with search strings
2024-12-28 12:15:23 +08:00
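
The last bullet above describes driving the heuristics from a JSON list of search strings. A minimal Python sketch of that idea follows; the rules-file layout and the field names (search, adapter) are illustrative assumptions, not the actual AutoGuess.json schema.

```python
import json

def guess_adapter(chat_template: str, rules_path: str = "AutoGuess.json"):
    """Upgrade to the first adapter whose search strings all appear in the
    model's embedded chat template. Sketch only; field names are assumed."""
    with open(rules_path, "r", encoding="utf-8") as f:
        rules = json.load(f)  # a JSON list, one entry per known template family
    for rule in rules:
        if all(needle in chat_template for needle in rule["search"]):
            return rule["adapter"]
    return None  # no match: keep the existing fallback (e.g. Vicuna/Alpaca) adapter
```

Keeping the heuristics as data rather than code makes it straightforward to add new template families without touching the loader.
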
Concedo
5f8f483fae fixed typo (+1 squashed commit)
Squashed commits:

[b586d187] fixed typo
2024-12-23 21:57:34 +08:00
Concedo
13abf591d2 patch release for drafting fix 2024-12-23 11:40:02 +08:00
Concedo
4c56b7cada Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	examples/gbnf-validator/gbnf-validator.cpp
#	examples/llava/clip.cpp
#	examples/run/README.md
#	examples/run/run.cpp
#	examples/server/README.md
#	ggml/src/ggml-cpu/CMakeLists.txt
#	src/llama.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-llama-grammar.cpp
2024-12-21 09:41:49 +08:00
Concedo
fc52a38a25 handle urls as config download in model param 2024-12-20 10:56:07 +08:00
Concedo
6089421423 always follow pci bus id 2024-12-18 00:46:48 +08:00
Concedo
60cd68a39d draft model sets gpu split instead of id, made mmq default for cli 2024-12-14 23:58:45 +08:00
Concedo
595cc6975f added new flags --moeexperts --failsafe --draftgpulayers and --draftgpuid 2024-12-13 17:11:59 +08:00
Concedo
a11bba5893 cleanup, fix native build for arm (+28 squashed commits)
Squashed commit:

[d1f6a4154] bundle library

[947ab84b7] undo

[0f9aba8d8] test

[e9ac93873] test

[920438202] test

[1c6d98804] Revert "quick test"

This reverts commit acf8ec8940.

[acf8ec894] quick test

[6a9937233] undo

[5a263a5bd] test

[ddfd82bca] test

[0b30e45da] test

[c3bfece55] messed up

[2a4b37fe0] Revert "test"

This reverts commit 80a1fcaeaf.

[80a1fcaea] test

[e2aa7d944] test

[264d80200] test

[f5b123173] undo

[1ffacc484] test

[63c0be926] undo

[510e0377e] ofast try fix

[4ac199b20] try fix sigill

[1bc987ba2] try fix illegal instruction

[7697252b1] edit

[f87087b28] check gcc ver

[e9dfe2cef] try using qemu to do the pyinstaller

[b411192db] revert

[25b5301e5] try using qemu to do the pyinstaller

[58038cddc] try using qemu to do the pyinstaller
2024-12-10 19:42:23 +08:00
Concedo
e9d2332dd8 improved tool calls and whisper 2024-12-06 14:34:31 +08:00
Concedo
836c06d91a minor edit 2024-12-06 00:37:38 +08:00
Concedo
d0d1d922de handle and fix temp paths to chat completions adapter 2024-12-05 17:22:35 +08:00
Concedo
2787fca6b4 refactored library selection, fixed ollama params 2024-12-05 16:47:52 +08:00
Concedo
52cc908f7f default trim_stop to true, which trims any tokens after a stop sequence and the stop sequence itself. This is potentially a breaking change. 2024-12-03 22:44:10 +08:00
Concedo
2ba5949054 updated sdcpp, also set euler as default sampler 2024-12-01 17:00:20 +08:00
Concedo
42228b9746 warning when selecting non gguf models 2024-12-01 13:35:51 +08:00
Concedo
b7cd210cd2 more linting with Ruff (+1 squashed commits)
Squashed commits:

[43802cfe2] Applied default Ruff linting
2024-12-01 01:23:13 +08:00
Concedo
409e393d10 fixed critical bug in image model loader 2024-11-30 23:28:24 +08:00
Concedo
0028e71993 special handling to resolve incomplete utf8 token sequences in qwen 2024-11-30 16:54:01 +08:00
Concedo
32ac3153e4 default speculative set to 8. added more adapter fields 2024-11-30 16:18:27 +08:00
Concedo
e0c59486ee default to 12 tokens drafted 2024-11-30 11:52:07 +08:00
Concedo
b21d0fe3ac customizable speculative size 2024-11-30 11:28:19 +08:00
Concedo
f75bbb945f speculative decoding initial impl completed (+6 squashed commits)
Squashed commit:

[0a6306ca0] draft wip dont use (will be squashed)

[a758a1c9c] wip dont use (will be squashed)

[e1994d3ce] wip dont use

[f59690d68] wip

[77228147d] wip on spec decoding. dont use yet

[2445bca54] wip adding speculative decoding (+1 squashed commits)

Squashed commits:

[50e341bb7] wip adding speculative decoding
2024-11-30 10:41:10 +08:00
kallewoof
fd320f6682
/props endpoint: provide context size through default_generation_settings (#1237) 2024-11-26 16:15:27 +08:00
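
As a hedged usage example, a client can read the advertised context size from that endpoint roughly as below; the port and the exact fields inside default_generation_settings (e.g. n_ctx) are assumptions based on typical KoboldCpp/llama.cpp defaults and may vary by version.

```python
import json
import urllib.request

# Assumes a local instance on the default port 5001 (an assumption; adjust as needed).
with urllib.request.urlopen("http://localhost:5001/props") as resp:
    props = json.load(resp)

# default_generation_settings mirrors a slot's settings; n_ctx is the assumed field name.
print("reported context size:", props.get("default_generation_settings", {}).get("n_ctx"))
```
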
Concedo
1e0792a3ef comfyui emulation also done 2024-11-24 15:39:03 +08:00
Concedo
9bd27323e7 emulate comfyui txt2img 2024-11-24 11:28:12 +08:00
Concedo
bf28d956ae ollama chat api done 2024-11-24 00:10:15 +08:00
Concedo
62dde8cfb2 ollama sync completions mostly working. stupid api. 2024-11-23 23:31:37 +08:00
Concedo
2c1a06a07d wip ollama emulation, added detokenize endpoint 2024-11-23 22:48:03 +08:00
Concedo
c0da7e4dcf multiplayer activity tracking 2024-11-23 19:59:55 +08:00
Concedo
1dd37933e3 fixed grammar not resetting correctly 2024-11-23 09:55:12 +08:00
Concedo
18f227625b multiplayer fixes 2024-11-22 19:02:31 +08:00
mkarr
ac6a0cde91
Support chunked encoding. (#1226)
* Support chunked encoding.

The koboldcpp API does not support HTTP chunked encoding. Some HTTP
libraries, notably Go's net/http, can automatically choose to use chunked
encoding. This adds support for chunked encoding within the do_POST()
handler.

* refactor slightly to add additional safety checks and follow original format

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-11-21 18:24:04 +08:00
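
A minimal sketch of what reading a chunked request body inside do_POST() can look like, assuming a BaseHTTPRequestHandler-style handler; this is an illustrative reimplementation, not the exact code merged in #1226.

```python
def read_request_body(handler) -> bytes:
    """Read the request body, honoring Transfer-Encoding: chunked when present.
    Sketch only; the merged implementation may differ."""
    if handler.headers.get("Transfer-Encoding", "").lower() == "chunked":
        body = b""
        while True:
            # each chunk starts with its size in hex, optionally followed by extensions after ';'
            size_line = handler.rfile.readline().strip()
            chunk_size = int(size_line.split(b";")[0], 16)
            if chunk_size == 0:
                # consume optional trailer lines and the terminating blank line
                while handler.rfile.readline().strip():
                    pass
                break
            body += handler.rfile.read(chunk_size)
            handler.rfile.readline()  # discard the CRLF that follows each chunk
        return body
    # non-chunked requests fall back to Content-Length as before
    length = int(handler.headers.get("Content-Length", 0))
    return handler.rfile.read(length)
```
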