Concedo
b7d3274523
temporarily make qwenv2l use clip on cpu for vulkan and macos
2024-12-21 09:15:31 +08:00
Concedo
bc297da91e
remove unused function
2024-12-16 11:39:52 +08:00
Concedo
00d154b32b
wip on qwen2vl integration, updated msvc runtimes
2024-12-15 23:58:02 +08:00
Concedo
60cd68a39d
draft model sets gpu split instead of id, made mmq default for cli
2024-12-14 23:58:45 +08:00
Concedo
595cc6975f
added new flags --moeexperts --failsafe --draftgpulayers and --draftgpuid
2024-12-13 17:11:59 +08:00
Concedo
00a686fc72
fixed fast forwarding context corruption after abort during prompt processing
2024-12-10 22:37:40 +08:00
Concedo
5106816eac
drafted tokens debug prints
2024-12-05 17:05:20 +08:00
Concedo
e93c2427b4
allow incompatible vocab in debugmode
2024-12-01 14:11:03 +08:00
Concedo
32ac3153e4
default speculative set to 8. added more adapter fields
2024-11-30 16:18:27 +08:00
Concedo
e0c59486ee
default to 12 tokens drafted
2024-11-30 11:52:07 +08:00
Concedo
b21d0fe3ac
customizable speculative size
2024-11-30 11:28:19 +08:00
Concedo
f75bbb945f
speculative decoding initial impl completed (+6 squashed commits)
Squashed commit:
[0a6306ca0] draft wip dont use (will be squashed)
[a758a1c9c] wip dont use (will be squashed)
[e1994d3ce] wip dont use
[f59690d68] wip
[77228147d] wip on spec decoding. dont use yet
[2445bca54] wip adding speculative decoding (+1 squashed commits)
Squashed commits:
[50e341bb7] wip adding speculative decoding
2024-11-30 10:41:10 +08:00
Concedo
b9e99c69e8
fixed build
2024-11-26 22:06:55 +08:00
Concedo
62dde8cfb2
ollama sync completions mostly working. stupid api.
2024-11-23 23:31:37 +08:00
Concedo
2c1a06a07d
wip ollama emulation, added detokenize endpoint
2024-11-23 22:48:03 +08:00
Concedo
1dd37933e3
fixed grammar not resetting correctly
2024-11-23 09:55:12 +08:00
kallewoof
547ab2aebb
API: add /props route (#1222)
* API: add an /extra/chat_template route
A lot of manual tweaking is done when swapping between models. We can automate or make better assumptions about some of them by having more information, such as chat template. This PR adds an endpoint /extra/chat_template which returns the model chat template string as is in a 'chat_template' key. The front end can then use this to derive the proper templates or use it as is, or at least warn the user when they are trying to use e.g. a Mistral preset with a Llama 3.1 model.
* switch to pre-established /props endpoint for chat template
* bug-fix (upstream): off-by-one in string juggling
2024-11-21 10:58:32 +08:00
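The /props PR above can be illustrated with a minimal sketch of a consuming front end. The response shape is an assumption based on the PR text (the model's chat template returned verbatim under a 'chat_template' key); the sample template and the Mistral heuristic are purely illustrative.

```python
import json

# Stand-in for the body a GET /props request might return, per the PR
# description; the actual endpoint may include additional keys.
sample_response = json.dumps({
    "chat_template": "{% for message in messages %}{{ message.content }}{% endfor %}",
})

props = json.loads(sample_response)
template = props.get("chat_template", "")

# A front end could inspect the template string to sanity-check preset
# choices, e.g. warn when a Mistral-style preset ([INST] markers) is
# paired with a model whose template lacks them.
if "[INST]" in template:
    print("template looks Mistral-style")
else:
    print("template is not Mistral-style")
```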
Concedo
70aee82552
attempts a backflip, but does he stick the landing?
2024-11-16 17:05:45 +08:00
Concedo
bfa118ee45
fix llava segfault
2024-11-14 14:16:39 +08:00
Concedo
4b96c3bba8
try new batch api (not actually batching)
2024-11-14 13:47:26 +08:00
Concedo
3813f6c517
added new flag nofastforward allowing users to disable fast forwarding
2024-11-13 10:59:01 +08:00
Concedo
48e9372337
prevent outputting infinity to logprobs (+1 squashed commit)
Squashed commits:
[bcc5f8b92] prevent outputting infinity to logprobs
2024-11-13 00:09:53 +08:00
kallewoof
3c36bbdcd7
debug: display tokens that were dropped by XTC sampler when debugmode is enabled (#1201)
2024-11-06 23:09:28 +08:00
Concedo
223c5f0844
clblast survived
2024-11-02 21:51:38 +08:00
Concedo
bbebc76817
fix top picks bug, lower input anti abuse thresholds (+1 squashed commit)
Squashed commits:
[a81d9b21] fix top picks bug, lower input anti abuse thresholds
2024-11-01 16:42:13 +08:00
Concedo
aa26a58085
added logprobs api and logprobs viewer
2024-11-01 00:22:15 +08:00
Concedo
90f5cd0f67
wip logprobs data
2024-10-30 00:59:34 +08:00
Concedo
94a5a27b85
Alone in the darkness
They're coming for you
I know they will try to catch me too
Alone in the darkness
They're calling for you
There's nowhere to run for cover
2024-10-24 22:29:20 +08:00
Concedo
becd737e0f
slightly increase padding to handle longer gen amts
2024-10-23 22:58:41 +08:00
Maya
8bb220329c
Dynamic sizes for sequences (#1157)
* Dynamic sizes for sequences
* cleanup PR - move all dynamic fields to end of payload, ensure correct null handling to match existing behavior, add anti abuse limit of max 512 for dynamic fields
* adjust anti abuse limits
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-10-16 23:55:11 +08:00
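The dynamic-fields PR above mentions two behaviors worth making concrete: nulls must pass through unchanged to match existing behavior, and dynamic fields get an anti-abuse cap of max 512 entries. The sketch below is a hypothetical rendering of that rule; the function name and constant are illustrative, not KoboldCpp's actual identifiers.

```python
# Anti-abuse cap on dynamic payload fields, per the PR notes.
MAX_DYNAMIC_FIELD = 512

def clamp_dynamic_field(value):
    # A missing/null field stays null, matching pre-existing behavior.
    if value is None:
        return None
    # Oversized dynamic fields are silently truncated to the cap.
    return value[:MAX_DYNAMIC_FIELD]

print(len(clamp_dynamic_field(list(range(1000)))))  # oversized input is capped
```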
Concedo
7f76425450
lower topk prefilter token amount to 3k
2024-10-16 20:39:41 +08:00
Concedo
cff72c5d26
remove unwanted print
2024-10-11 18:56:32 +08:00
Concedo
e692a79aab
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/docker.yml
# CMakeLists.txt
# CONTRIBUTING.md
# docs/android.md
# docs/docker.md
# examples/embedding/embedding.cpp
# examples/imatrix/imatrix.cpp
# examples/infill/infill.cpp
# examples/llama-bench/llama-bench.cpp
# examples/main/README.md
# examples/parallel/parallel.cpp
# examples/perplexity/perplexity.cpp
# examples/quantize-stats/quantize-stats.cpp
# examples/save-load-state/save-load-state.cpp
# examples/server/README.md
# examples/simple/CMakeLists.txt
# examples/speculative/speculative.cpp
# flake.lock
# ggml/src/CMakeLists.txt
# ggml/src/ggml-blas.cpp
# pocs/vdot/q8dot.cpp
# pocs/vdot/vdot.cpp
# scripts/debug-test.sh
# scripts/sync-ggml.last
# src/llama.cpp
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-quantize-fns.cpp
# tests/test-quantize-perf.cpp
# tests/test-tokenizer-0.cpp
# tests/test-tokenizer-1-bpe.cpp
# tests/test-tokenizer-1-spm.cpp
2024-10-11 11:59:59 +08:00
Maya
5c9650d68e
Fix access violation when using banned_phrases (#1154)
2024-10-10 21:46:39 +08:00
Concedo
fe5479f286
unify antislop and token bans
2024-10-10 18:21:07 +08:00
Concedo
9b614d46bd
antislop sampler working
2024-10-09 16:33:04 +08:00
Concedo
36e9bac98f
wip anti slop sampler
2024-10-09 13:34:47 +08:00
Concedo
f78f8d3d45
wip anti slop
2024-10-07 23:18:13 +08:00
Concedo
65f3c68399
wip antislop
2024-10-07 20:19:22 +08:00
Concedo
740c5e01cb
added token delay feature
2024-10-07 19:45:51 +08:00
Concedo
3e8bb10e2d
wip on rewind function
2024-10-06 16:21:03 +08:00
Concedo
c38d1ecc8d
update templates, fix rwkv
2024-09-22 01:32:12 +08:00
Concedo
53bf0fb32d
removed openblas backend, merged into CPU (with llamafile for BLAS). GPU backend is now automatically selected when running from CLI unless noblas is specified.
2024-09-15 19:21:52 +08:00
Concedo
e44ddf26ef
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/server.yml
# CMakeLists.txt
# Makefile
# examples/embedding/embedding.cpp
# examples/imatrix/imatrix.cpp
# examples/llama-bench/llama-bench.cpp
# examples/llava/MobileVLM-README.md
# examples/parallel/parallel.cpp
# examples/perplexity/perplexity.cpp
# examples/quantize/CMakeLists.txt
# examples/server/README.md
# examples/speculative/speculative.cpp
# tests/test-backend-ops.cpp
2024-09-13 16:17:24 +08:00
Concedo
7bdac9bc44
prevent shifting on rwkv
2024-09-11 20:22:45 +08:00
Concedo
eee67281be
move kcpp params out
2024-09-10 16:30:12 +08:00
Concedo
fc7fe2e7a0
allow rwkv6 to run although its broken
2024-09-09 20:50:58 +08:00
Concedo
b63158005f
All samplers moved to kcpp side
2024-09-09 18:14:11 +08:00
Concedo
12fd16bfd4
Merge commit 'df270ef745' into concedo_experimental
# Conflicts:
# Makefile
# common/CMakeLists.txt
# common/common.h
# common/sampling.cpp
# common/sampling.h
# examples/infill/infill.cpp
# examples/llama-bench/llama-bench.cpp
# examples/quantize-stats/quantize-stats.cpp
# examples/server/server.cpp
# include/llama.h
# src/llama-sampling.cpp
# src/llama-sampling.h
# src/llama.cpp
# tests/test-grammar-integration.cpp
# tests/test-grammar-parser.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-llama-grammar.cpp
# tests/test-sampling.cpp
2024-09-09 17:10:08 +08:00
Concedo
c78690737c
fix for DRY segfault on unicode character substring tokenization
2024-09-08 18:25:00 +08:00