Concedo
12cdcf0abe
improved browser opening
2025-01-11 22:53:43 +08:00
Concedo
93b2bebc2f
add more options for context size
2025-01-10 19:08:42 +08:00
Concedo
0305841dd5
added a gguf file analyzer
2025-01-10 16:27:48 +08:00
Concedo
91b6e29af3
added multilingual support for whisper
2025-01-09 23:28:52 +08:00
Concedo
0cb599546e
increase max supported llava images to 8
2025-01-09 22:12:06 +08:00
Concedo
c73d99ccac
updated lite
2025-01-08 13:35:59 +08:00
Concedo
568e476997
added toggle for vae tiling, use custom memory buffer
2025-01-08 13:12:03 +08:00
Concedo
d752846116
fixed ask save file
2025-01-07 22:11:15 +08:00
Concedo
58791612d2
sse3 mode for noavx2 clblast, fixed metadata, added version command
2025-01-06 21:59:05 +08:00
Concedo
9b32482089
fixed bug in aesthetic ui
2025-01-05 18:04:02 +08:00
Concedo
1559d4d2fb
fixed defective websearch
2025-01-04 16:47:38 +08:00
Concedo
e07e73aeb4
updated lite
2025-01-04 10:47:48 +08:00
Concedo
8de44d1e41
refactored some outputs
2024-12-30 22:30:27 +08:00
Concedo
5eb314a04b
websearch length limits and caching
2024-12-30 18:30:54 +08:00
Concedo
3fea11675d
websearch integrated into lite, changed to POST
2024-12-30 17:30:41 +08:00
Concedo
6026501ed2
websearch functional
2024-12-30 12:01:51 +08:00
Concedo
709dab6289
improved websearch endpoint
2024-12-29 19:39:16 +08:00
Concedo
5451a8e8a9
updated lite
2024-12-29 17:04:29 +08:00
Concedo
2de1975ca2
improve websearch api
2024-12-28 23:36:40 +08:00
Concedo
baaecd1c65
added a basic websearch proxy
2024-12-28 19:07:00 +08:00
Concedo
29afdb7c90
minor linting
2024-12-28 12:21:35 +08:00
kallewoof
23ec550835
PoC: add chat template heuristics (#1283)
* PoC: add chat template heuristics
The fallback chat template adapter of Vicuna is not ideal in some cases (e.g. a test against a sub-portion of the BBC news classification task on Kaggle gave an 82% accuracy with Vicuna and 88% with the official ChatML format for a q4_k_m Qwen 2.5 3B-Instruct gguf).
This PR adds a simple proof-of-concept heuristic that inspects the chat template and upgrades the adapter when it can.
* gemma 2 heuristic
* Phi 4, Llama 3.x heuristics
* better qwen vs generic heuristic
* cleanup
* mistral (generic) heuristic
* fix sys msg for mistral
* phi 3.5
* mistral v3
* cohere (aya expanse 32b based)
* only derive from chat template if AutoGuess
* add notes about alpaca fallbacks
* added AutoGuess.json dummy
* add mistral v7
* switch to using a json list with search strings
2024-12-28 12:15:23 +08:00
Concedo
5f8f483fae
fixed typo (+1 squashed commits)
Squashed commits:
[b586d187] fixed typo
2024-12-23 21:57:34 +08:00
Concedo
13abf591d2
patch release for drafting fix
2024-12-23 11:40:02 +08:00
Concedo
4c56b7cada
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# README.md
# examples/gbnf-validator/gbnf-validator.cpp
# examples/llava/clip.cpp
# examples/run/README.md
# examples/run/run.cpp
# examples/server/README.md
# ggml/src/ggml-cpu/CMakeLists.txt
# src/llama.cpp
# tests/test-grammar-integration.cpp
# tests/test-llama-grammar.cpp
2024-12-21 09:41:49 +08:00
Concedo
fc52a38a25
handle urls as config download in model param
2024-12-20 10:56:07 +08:00
Concedo
6089421423
always follow pci bus id
2024-12-18 00:46:48 +08:00
Concedo
60cd68a39d
draft model sets gpu split instead of id, made mmq default for cli
2024-12-14 23:58:45 +08:00
Concedo
595cc6975f
added new flags --moeexperts --failsafe --draftgpulayers and --draftgpuid
2024-12-13 17:11:59 +08:00
Concedo
a11bba5893
cleanup, fix native build for arm (+28 squashed commit)
Squashed commit:
[d1f6a4154] bundle library
[947ab84b7] undo
[0f9aba8d8] test
[e9ac93873] test
[920438202] test
[1c6d98804] Revert "quick test"
This reverts commit acf8ec8940.
[acf8ec894] quick test
[6a9937233] undo
[5a263a5bd] test
[ddfd82bca] test
[0b30e45da] test
[c3bfece55] messed up
[2a4b37fe0] Revert "test"
This reverts commit 80a1fcaeaf.
[80a1fcaea] test
[e2aa7d944] test
[264d80200] test
[f5b123173] undo
[1ffacc484] test
[63c0be926] undo
[510e0377e] ofast try fix
[4ac199b20] try fix sigill
[1bc987ba2] try fix illegal instruction
[7697252b1] edit
[f87087b28] check gcc ver
[e9dfe2cef] try using qemu to do the pyinstaller
[b411192db] revert
[25b5301e5] try using qemu to do the pyinstaller
[58038cddc] try using qemu to do the pyinstaller
2024-12-10 19:42:23 +08:00
Concedo
e9d2332dd8
improved tool calls and whisper
2024-12-06 14:34:31 +08:00
Concedo
836c06d91a
minor edit
2024-12-06 00:37:38 +08:00
Concedo
d0d1d922de
handle and fix temp paths to chat completions adapter
2024-12-05 17:22:35 +08:00
Concedo
2787fca6b4
refactored library selection, fixed ollama params
2024-12-05 16:47:52 +08:00
Concedo
52cc908f7f
default trim_stop to true, which trims any tokens after a stop sequence and the stop sequence itself. This is potentially a breaking change.
2024-12-03 22:44:10 +08:00
Concedo
2ba5949054
updated sdcpp, also set euler as default sampler
2024-12-01 17:00:20 +08:00
Concedo
42228b9746
warning when selecting non gguf models
2024-12-01 13:35:51 +08:00
Concedo
b7cd210cd2
more linting with Ruff (+1 squashed commits)
Squashed commits:
[43802cfe2] Applied default Ruff linting
2024-12-01 01:23:13 +08:00
Concedo
409e393d10
fixed critical bug in image model loader
2024-11-30 23:28:24 +08:00
Concedo
0028e71993
special handling to resolve incomplete utf8 token sequences in qwen
2024-11-30 16:54:01 +08:00
Concedo
32ac3153e4
default speculative set to 8. added more adapter fields
2024-11-30 16:18:27 +08:00
Concedo
e0c59486ee
default to 12 tokens drafted
2024-11-30 11:52:07 +08:00
Concedo
b21d0fe3ac
customizable speculative size
2024-11-30 11:28:19 +08:00
Concedo
f75bbb945f
speculative decoding initial impl completed (+6 squashed commit)
Squashed commit:
[0a6306ca0] draft wip dont use (will be squashed)
[a758a1c9c] wip dont use (will be squashed)
[e1994d3ce] wip dont use
[f59690d68] wip
[77228147d] wip on spec decoding. dont use yet
[2445bca54] wip adding speculative decoding (+1 squashed commits)
Squashed commits:
[50e341bb7] wip adding speculative decoding
2024-11-30 10:41:10 +08:00
kallewoof
fd320f6682
/props endpoint: provide context size through default_generation_settings (#1237)
2024-11-26 16:15:27 +08:00
Concedo
1e0792a3ef
comfyui emulation also done
2024-11-24 15:39:03 +08:00
Concedo
9bd27323e7
emulate comfyui txt2img
2024-11-24 11:28:12 +08:00
Concedo
bf28d956ae
ollama chat api done
2024-11-24 00:10:15 +08:00
Concedo
62dde8cfb2
ollama sync completions mostly working. stupid api.
2024-11-23 23:31:37 +08:00
Concedo
2c1a06a07d
wip ollama emulation, added detokenize endpoint
2024-11-23 22:48:03 +08:00