Concedo
2c1a06a07d
wip ollama emulation, added detokenize endpoint
2024-11-23 22:48:03 +08:00
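The detokenize endpoint mentioned above can be sketched as a thin handler that maps token ids back to text. The payload shape and helper names below are assumptions for illustration, not the repo's actual API.

```python
# Hedged sketch of a detokenize endpoint handler: map a list of
# token ids back into text. 'tokens' as the payload key and the
# injected detokenize_fn are illustrative assumptions.
def handle_detokenize(payload, detokenize_fn):
    tokens = payload.get("tokens", [])
    return {"text": detokenize_fn(tokens)}
```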
Concedo
1dd37933e3
fixed grammar not resetting correctly
2024-11-23 09:55:12 +08:00
kallewoof
547ab2aebb
API: add /props route ( #1222 )
* API: add an /extra/chat_template route
A lot of manual tweaking is done when swapping between models. We can automate or make better assumptions about some of them by having more information, such as chat template. This PR adds an endpoint /extra/chat_template which returns the model chat template string as is in a 'chat_template' key. The front end can then use this to derive the proper templates or use it as is, or at least warn the user when they are trying to use e.g. a Mistral preset with a Llama 3.1 model.
* switch to pre-established /props endpoint for chat template
* bug-fix (upstream): off-by-one in string juggling
2024-11-21 10:58:32 +08:00
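The /props flow described in the PR above can be sketched as follows. The base URL/port and both helpers are assumptions for illustration; only the 'chat_template' key comes from the commit message.

```python
import json
from urllib.request import urlopen

# Hedged sketch: fetch the model's chat template from a running
# server's /props route (URL and port are assumptions).
def fetch_chat_template(base_url="http://localhost:5001"):
    with urlopen(f"{base_url}/props") as resp:
        props = json.load(resp)
    # the template string is returned as-is under 'chat_template'
    return props.get("chat_template", "")

# A front end could apply simple heuristics to the template text to
# warn about preset mismatches, e.g. a Mistral preset on a Llama 3.1
# model (the marker below is a Llama 3 chat-template token).
def looks_like_llama3(template: str) -> bool:
    return "<|start_header_id|>" in template
```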
Concedo
70aee82552
attempts a backflip, but does he stick the landing?
2024-11-16 17:05:45 +08:00
Concedo
bfa118ee45
fix llava segfault
2024-11-14 14:16:39 +08:00
Concedo
4b96c3bba8
try new batch api (not actually batching)
2024-11-14 13:47:26 +08:00
Concedo
3813f6c517
added new flag nofastforward allowing users to disable fast forwarding
2024-11-13 10:59:01 +08:00
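The fast forwarding (context reuse) that the new nofastforward flag disables can be sketched as reusing the longest common token prefix between the previous and new prompt, so only the changed tail is re-evaluated. Function names here are illustrative, not the repo's code.

```python
# Length of the shared token prefix between two prompts.
def common_prefix_len(prev_tokens, new_tokens):
    n = 0
    for a, b in zip(prev_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

# With fast forwarding on, only the tokens after the shared prefix
# need evaluation; disabling it re-evaluates everything.
def tokens_to_evaluate(prev_tokens, new_tokens, fastforward=True):
    if not fastforward:
        return new_tokens
    skip = common_prefix_len(prev_tokens, new_tokens)
    return new_tokens[skip:]
```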
Concedo
48e9372337
prevent outputting infinity to logprobs (+1 squashed commits)
Squashed commits:
[bcc5f8b92] prevent outputting infinity to logprobs
2024-11-13 00:09:53 +08:00
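A minimal sketch of the fix class above, assuming logprob values may be -inf for masked tokens: JSON cannot encode Infinity or NaN, so such values are clamped to a finite floor before the logprobs payload is serialized. The floor value is a hypothetical sentinel.

```python
import math

LOGPROB_FLOOR = -100000.0  # hypothetical finite sentinel

# Replace non-finite logprobs with the floor so the JSON payload
# stays valid; finite values pass through unchanged.
def sanitize_logprob(lp: float) -> float:
    if math.isinf(lp) or math.isnan(lp):
        return LOGPROB_FLOOR
    return lp
```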
kallewoof
3c36bbdcd7
debug: display tokens that were dropped by XTC sampler when debugmode is enabled ( #1201 )
2024-11-06 23:09:28 +08:00
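A hedged sketch of XTC ("exclude top choices") sampling and the debug information the commit above surfaces: with probability xtc_probability, drop every candidate whose probability meets xtc_threshold except the least likely of them, and return the dropped tokens so they can be displayed in debug mode. Names and exact semantics here are assumptions, not the repo's implementation.

```python
import random

def xtc_filter(candidates, xtc_threshold=0.1, xtc_probability=1.0, rng=random):
    # candidates: list of (token, prob) sorted descending by prob
    if rng.random() >= xtc_probability:
        return candidates, []          # sampler not triggered this step
    above = [c for c in candidates if c[1] >= xtc_threshold]
    if len(above) < 2:
        return candidates, []          # nothing to exclude
    dropped = above[:-1]               # keep the last token above threshold
    kept = candidates[len(dropped):]
    return kept, dropped
```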
Concedo
223c5f0844
clblast survived
2024-11-02 21:51:38 +08:00
Concedo
bbebc76817
fix top picks bug, lower input anti-abuse thresholds (+1 squashed commits)
Squashed commits:
[a81d9b21] fix top picks bug, lower input anti-abuse thresholds
2024-11-01 16:42:13 +08:00
Concedo
aa26a58085
added logprobs api and logprobs viewer
2024-11-01 00:22:15 +08:00
Concedo
90f5cd0f67
wip logprobs data
2024-10-30 00:59:34 +08:00
Concedo
94a5a27b85
Alone in the darkness
They're coming for you
I know they will try to catch me too
Alone in the darkness
They're calling for you
There's nowhere to run for cover
2024-10-24 22:29:20 +08:00
Concedo
becd737e0f
slightly increase padding to handle longer gen amts
2024-10-23 22:58:41 +08:00
Maya
8bb220329c
Dynamic sizes for sequences ( #1157 )
* Dynamic sizes for sequences
* cleanup PR: move all dynamic fields to the end of the payload, ensure correct null handling to match existing behavior, add an anti-abuse limit of max 512 for dynamic fields
* adjust anti-abuse limits
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-10-16 23:55:11 +08:00
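The anti-abuse clamp described in the PR above can be sketched as: dynamic payload fields are truncated to at most 512 entries, and missing (null) fields fall back to the previous behavior. Names are hypothetical.

```python
MAX_DYNAMIC_LEN = 512  # cap from the PR description

# Clamp a dynamic request field; None preserves the pre-existing
# default behavior, anything longer than the cap is truncated.
def clamp_dynamic(field, default):
    if field is None:
        return default
    return field[:MAX_DYNAMIC_LEN]
```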
Concedo
7f76425450
lower topk prefilter token amount to 3k
2024-10-16 20:39:41 +08:00
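The top-k prefilter the commit above tunes can be sketched as keeping only the indices of the 3000 highest logits before sampling. The 3k value comes from the commit message; the helper itself is illustrative.

```python
import heapq

PREFILTER_K = 3000  # value from the commit message

# Return the indices of the k largest logits; if the vocabulary is
# already smaller than k, keep everything.
def prefilter_topk(logits, k=PREFILTER_K):
    if len(logits) <= k:
        return list(range(len(logits)))
    return heapq.nlargest(k, range(len(logits)), key=logits.__getitem__)
```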
Concedo
cff72c5d26
remove unwanted print
2024-10-11 18:56:32 +08:00
Concedo
e692a79aab
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/docker.yml
# CMakeLists.txt
# CONTRIBUTING.md
# docs/android.md
# docs/docker.md
# examples/embedding/embedding.cpp
# examples/imatrix/imatrix.cpp
# examples/infill/infill.cpp
# examples/llama-bench/llama-bench.cpp
# examples/main/README.md
# examples/parallel/parallel.cpp
# examples/perplexity/perplexity.cpp
# examples/quantize-stats/quantize-stats.cpp
# examples/save-load-state/save-load-state.cpp
# examples/server/README.md
# examples/simple/CMakeLists.txt
# examples/speculative/speculative.cpp
# flake.lock
# ggml/src/CMakeLists.txt
# ggml/src/ggml-blas.cpp
# pocs/vdot/q8dot.cpp
# pocs/vdot/vdot.cpp
# scripts/debug-test.sh
# scripts/sync-ggml.last
# src/llama.cpp
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-quantize-fns.cpp
# tests/test-quantize-perf.cpp
# tests/test-tokenizer-0.cpp
# tests/test-tokenizer-1-bpe.cpp
# tests/test-tokenizer-1-spm.cpp
2024-10-11 11:59:59 +08:00
Maya
5c9650d68e
Fix access violation when using banned_phrases ( #1154 )
2024-10-10 21:46:39 +08:00
Concedo
fe5479f286
unify antislop and token bans
2024-10-10 18:21:07 +08:00
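A hedged sketch of the anti-slop ("phrase ban") idea in the commits above: if the running output ends with a banned phrase, rewind past it so the caller can resample with that continuation banned. All names are illustrative, and the real sampler works on tokens rather than raw text.

```python
# Check the running output against banned phrases; on a hit, return
# the rewound text plus the phrase that triggered the ban, otherwise
# return the output unchanged with None.
def check_antislop(output: str, banned_phrases):
    for phrase in banned_phrases:
        if phrase and output.endswith(phrase):
            return output[: len(output) - len(phrase)], phrase
    return output, None
```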
Concedo
9b614d46bd
antislop sampler working
2024-10-09 16:33:04 +08:00
Concedo
36e9bac98f
wip anti slop sampler
2024-10-09 13:34:47 +08:00
Concedo
f78f8d3d45
wip anti slop
2024-10-07 23:18:13 +08:00
Concedo
65f3c68399
wip antislop
2024-10-07 20:19:22 +08:00
Concedo
740c5e01cb
added token delay feature
2024-10-07 19:45:51 +08:00
Concedo
3e8bb10e2d
wip on rewind function
2024-10-06 16:21:03 +08:00
Concedo
c38d1ecc8d
update templates, fix rwkv
2024-09-22 01:32:12 +08:00
Concedo
53bf0fb32d
removed OpenBLAS backend, merged into the CPU backend (with llamafile for BLAS). The GPU backend is now automatically selected when running from the CLI unless noblas is specified.
2024-09-15 19:21:52 +08:00
Concedo
e44ddf26ef
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/server.yml
# CMakeLists.txt
# Makefile
# examples/embedding/embedding.cpp
# examples/imatrix/imatrix.cpp
# examples/llama-bench/llama-bench.cpp
# examples/llava/MobileVLM-README.md
# examples/parallel/parallel.cpp
# examples/perplexity/perplexity.cpp
# examples/quantize/CMakeLists.txt
# examples/server/README.md
# examples/speculative/speculative.cpp
# tests/test-backend-ops.cpp
2024-09-13 16:17:24 +08:00
Concedo
7bdac9bc44
prevent shifting on rwkv
2024-09-11 20:22:45 +08:00
Concedo
eee67281be
move kcpp params out
2024-09-10 16:30:12 +08:00
Concedo
fc7fe2e7a0
allow rwkv6 to run although it's broken
2024-09-09 20:50:58 +08:00
Concedo
b63158005f
All samplers moved to kcpp side
2024-09-09 18:14:11 +08:00
Concedo
12fd16bfd4
Merge commit 'df270ef745' into concedo_experimental
# Conflicts:
# Makefile
# common/CMakeLists.txt
# common/common.h
# common/sampling.cpp
# common/sampling.h
# examples/infill/infill.cpp
# examples/llama-bench/llama-bench.cpp
# examples/quantize-stats/quantize-stats.cpp
# examples/server/server.cpp
# include/llama.h
# src/llama-sampling.cpp
# src/llama-sampling.h
# src/llama.cpp
# tests/test-grammar-integration.cpp
# tests/test-grammar-parser.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-llama-grammar.cpp
# tests/test-sampling.cpp
2024-09-09 17:10:08 +08:00
Concedo
c78690737c
fix for DRY segfault on unicode character substring tokenization
2024-09-08 18:25:00 +08:00
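The failure class behind the DRY fix above can be illustrated in a few lines: taking byte-level substrings of UTF-8 text can split a multi-byte character, which then no longer decodes (or re-tokenizes) cleanly.

```python
# 'é' encodes to two bytes in UTF-8; slicing mid-codepoint leaves
# an invalid byte sequence behind.
s = "héllo"
raw = s.encode("utf-8")   # b'h\xc3\xa9llo'
bad = raw[:2]             # cuts the 2-byte 'é' in half
# decoding the truncated bytes yields a replacement character
assert bad.decode("utf-8", errors="replace") == "h\ufffd"
```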
Concedo
d220495dd4
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/llama-cli-cuda.Dockerfile
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-intel.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .devops/llama-server.Dockerfile
# .github/workflows/docker.yml
# docs/docker.md
# examples/llama-bench/llama-bench.cpp
# flake.lock
# ggml/include/ggml.h
# ggml/src/CMakeLists.txt
# scripts/sync-ggml.last
# src/llama.cpp
# tests/test-backend-ops.cpp
# tests/test-grad0.cpp
# tests/test-rope.cpp
2024-08-30 10:37:39 +08:00
Concedo
b78a637da5
try to optimize context shifting
2024-08-26 23:07:31 +08:00
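Context shifting, which the commit above optimizes, can be sketched as: when the context window is full, drop a chunk of the oldest non-protected tokens and continue generating instead of re-evaluating the whole prompt. Parameter names are assumptions.

```python
# Preserve a protected prefix (e.g. memory/system prompt) and discard
# shift_amount tokens immediately after it once the window fills.
def shift_context(tokens, max_ctx, keep_prefix, shift_amount):
    if len(tokens) < max_ctx:
        return tokens  # still room, nothing to do
    return tokens[:keep_prefix] + tokens[keep_prefix + shift_amount:]
```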
Concedo
cca3c4c78b
xtc fixes
2024-08-22 23:18:46 +08:00
Concedo
fc2545dc83
fixed a typo
2024-08-22 00:25:56 +08:00
Concedo
5bf527a6ae
added xtc sampler
2024-08-21 23:57:15 +08:00
Concedo
1a7ecd55e6
timing for init step, clip for vulkan
2024-08-21 18:14:53 +08:00
Concedo
cd69ab218e
fixed DRY
2024-08-21 17:01:28 +08:00
Concedo
6a4becb731
dry is still buggy because token indexes are wrong
2024-08-21 00:59:26 +08:00
Concedo
db6ef8d1e1
revert dry state reset
2024-08-20 22:22:21 +08:00
Concedo
c1ae350e5b
fixed race condition when generating
2024-08-20 20:17:55 +08:00
Concedo
e12ab53488
force clear some DRY state vars on new generation - not sure if this helps
2024-08-14 21:35:39 +08:00
Concedo
689a17d756
always prefilter to 5k logits
2024-08-12 22:27:06 +08:00
Concedo
729eb1e552
no fast forward for empty prompt
2024-07-27 16:29:35 +08:00
Concedo
eb5b4d0186
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# Makefile
# Package.swift
# src/CMakeLists.txt
# src/llama.cpp
# tests/test-grammar-integration.cpp
# tests/test-llama-grammar.cpp
2024-07-23 23:20:32 +08:00