Commit graph

308 commits

Author SHA1 Message Date
Concedo
1dd37933e3 fixed grammar not resetting correctly 2024-11-23 09:55:12 +08:00
kallewoof
547ab2aebb
API: add /props route (#1222)
* API: add an /extra/chat_template route

A lot of manual tweaking is done when swapping between models. We can automate some of it, or at least make better assumptions, by exposing more information, such as the chat template. This PR adds an endpoint /extra/chat_template which returns the model's chat template string as-is under a 'chat_template' key. The front end can then use this to derive the proper templates, use it directly, or at least warn the user when they are trying to use e.g. a Mistral preset with a Llama 3.1 model.

* switch to pre-established /props endpoint for chat template

* bug-fix (upstream): off-by-one in string juggling
2024-11-21 10:58:32 +08:00
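The front-end flow described above can be sketched as follows. This is a hypothetical illustration, not the actual client code: the endpoint path /props and the 'chat_template' key come from the PR description, while the template markers, family names, and helper functions are illustrative assumptions.

```python
# Hypothetical sketch: fetch /props and warn when the model's chat template
# does not match the user's chosen preset. Marker strings are assumptions.
import json
import urllib.request

def fetch_props(base_url="http://localhost:5001"):
    """Return the server's /props JSON, which includes 'chat_template'."""
    with urllib.request.urlopen(f"{base_url}/props") as resp:
        return json.load(resp)

def guess_family(chat_template):
    """Very rough heuristic mapping a raw template string to a model family."""
    if "[INST]" in chat_template:
        return "mistral"
    if "<|start_header_id|>" in chat_template:
        return "llama3"
    return "unknown"

def preset_mismatch(chat_template, chosen_preset):
    """True when the template clearly belongs to a different family."""
    family = guess_family(chat_template)
    return family != "unknown" and family != chosen_preset
```

A front end could call `preset_mismatch(fetch_props()["chat_template"], "llama3")` and show a warning banner when it returns True.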
Concedo
70aee82552 attempts a backflip, but does he stick the landing? 2024-11-16 17:05:45 +08:00
Concedo
bfa118ee45 fix llava segfault 2024-11-14 14:16:39 +08:00
Concedo
4b96c3bba8 try new batch api (not actually batching) 2024-11-14 13:47:26 +08:00
Concedo
3813f6c517 added new flag nofastforward allowing users to disable fast forwarding 2024-11-13 10:59:01 +08:00
Concedo
48e9372337 prevent outputting infinity to logprobs (+1 squashed commits)
Squashed commits:

[bcc5f8b92] prevent outputting infinity to logprobs
2024-11-13 00:09:53 +08:00
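The motivation for the fix above is that JSON has no literal for Infinity or NaN, so a -inf logprob serialized naively produces a payload many clients reject. A minimal sketch of the idea, with an illustrative floor value (not the one used in the actual commit):

```python
# Hypothetical sketch: clamp non-finite logprob values to a large finite
# floor before serializing, so the payload stays valid JSON.
import json
import math

LOGPROB_FLOOR = -999999.0  # illustrative floor, not the value used upstream

def sanitize_logprobs(logprobs):
    """Replace -inf/NaN logprobs so json.dumps never emits 'Infinity'."""
    return [lp if math.isfinite(lp) else LOGPROB_FLOOR for lp in logprobs]

payload = json.dumps({"logprobs": sanitize_logprobs([-0.5, float("-inf")])})
```

Note that Python's `json.dumps` would otherwise emit the non-standard token `Infinity`, which strict JSON parsers refuse to read back.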
kallewoof
3c36bbdcd7
debug: display tokens that were dropped by XTC sampler when debugmode is enabled (#1201) 2024-11-06 23:09:28 +08:00
Concedo
223c5f0844 clblast survived 2024-11-02 21:51:38 +08:00
Concedo
bbebc76817 fix top picks bug, lower input anti abuse thresholds (+1 squashed commits)
Squashed commits:

[a81d9b21] fix top picks bug, lower input anti abuse thresholds
2024-11-01 16:42:13 +08:00
Concedo
aa26a58085 added logprobs api and logprobs viewer 2024-11-01 00:22:15 +08:00
Concedo
90f5cd0f67 wip logprobs data 2024-10-30 00:59:34 +08:00
Concedo
94a5a27b85 Alone in the darkness
They're coming for you
I know they will try to catch me too
Alone in the darkness
They're calling for you
There's nowhere to run for cover
2024-10-24 22:29:20 +08:00
Concedo
becd737e0f slightly increase padding to handle longer gen amts 2024-10-23 22:58:41 +08:00
Maya
8bb220329c
Dynamic sizes for sequences (#1157)
* Dynamic sizes for sequences

* cleanup PR - move all dynamic fields to the end of the payload, ensure correct null handling to match existing behavior, add an anti-abuse limit of max 512 for dynamic fields

* adjust anti abuse limits

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-10-16 23:55:11 +08:00
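The null handling and the 512-entry anti-abuse cap described in the PR above can be sketched like this. Function and field names are hypothetical; only the limit of 512 and the treat-null-as-unset behavior come from the PR description.

```python
# Hypothetical sketch: accept a variable-length list of sequence fields,
# treat null/absent as "unset" (matching the old fixed-size behavior),
# and clamp the count as an anti-abuse measure.
MAX_DYNAMIC_FIELDS = 512  # limit stated in the PR

def clamp_dynamic_fields(values, limit=MAX_DYNAMIC_FIELDS):
    """Normalize a dynamic payload field: None -> [], drop null entries,
    and truncate to the anti-abuse limit."""
    if values is None:
        return []
    return [v for v in values if v is not None][:limit]
```

A request handler would run each dynamic field (e.g. stop sequences) through this before passing it to the generation loop, so oversized or malformed payloads degrade gracefully instead of erroring.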
Concedo
7f76425450 lower topk prefilter token amount to 3k 2024-10-16 20:39:41 +08:00
Concedo
cff72c5d26 remove unwanted print 2024-10-11 18:56:32 +08:00
Concedo
e692a79aab Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/docker.yml
#	CMakeLists.txt
#	CONTRIBUTING.md
#	docs/android.md
#	docs/docker.md
#	examples/embedding/embedding.cpp
#	examples/imatrix/imatrix.cpp
#	examples/infill/infill.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/main/README.md
#	examples/parallel/parallel.cpp
#	examples/perplexity/perplexity.cpp
#	examples/quantize-stats/quantize-stats.cpp
#	examples/save-load-state/save-load-state.cpp
#	examples/server/README.md
#	examples/simple/CMakeLists.txt
#	examples/speculative/speculative.cpp
#	flake.lock
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-blas.cpp
#	pocs/vdot/q8dot.cpp
#	pocs/vdot/vdot.cpp
#	scripts/debug-test.sh
#	scripts/sync-ggml.last
#	src/llama.cpp
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
#	tests/test-quantize-fns.cpp
#	tests/test-quantize-perf.cpp
#	tests/test-tokenizer-0.cpp
#	tests/test-tokenizer-1-bpe.cpp
#	tests/test-tokenizer-1-spm.cpp
2024-10-11 11:59:59 +08:00
Maya
5c9650d68e
Fix access violation when using banned_phrases (#1154) 2024-10-10 21:46:39 +08:00
Concedo
fe5479f286 unify antislop and token bans 2024-10-10 18:21:07 +08:00
Concedo
9b614d46bd antislop sampler working 2024-10-09 16:33:04 +08:00
Concedo
36e9bac98f wip anti slop sampler 2024-10-09 13:34:47 +08:00
Concedo
f78f8d3d45 wip anti slop 2024-10-07 23:18:13 +08:00
Concedo
65f3c68399 wip antislop 2024-10-07 20:19:22 +08:00
Concedo
740c5e01cb added token delay feature 2024-10-07 19:45:51 +08:00
Concedo
3e8bb10e2d wip on rewind function 2024-10-06 16:21:03 +08:00
Concedo
c38d1ecc8d update templates, fix rwkv 2024-09-22 01:32:12 +08:00
Concedo
53bf0fb32d removed openblas backend, merged into CPU (with llamafile for BLAS). GPU backend is now automatically selected when running from CLI unless noblas is specified. 2024-09-15 19:21:52 +08:00
Concedo
e44ddf26ef Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/server.yml
#	CMakeLists.txt
#	Makefile
#	examples/embedding/embedding.cpp
#	examples/imatrix/imatrix.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/llava/MobileVLM-README.md
#	examples/parallel/parallel.cpp
#	examples/perplexity/perplexity.cpp
#	examples/quantize/CMakeLists.txt
#	examples/server/README.md
#	examples/speculative/speculative.cpp
#	tests/test-backend-ops.cpp
2024-09-13 16:17:24 +08:00
Concedo
7bdac9bc44 prevent shifting on rwkv 2024-09-11 20:22:45 +08:00
Concedo
eee67281be move kcpp params out 2024-09-10 16:30:12 +08:00
Concedo
fc7fe2e7a0 allow rwkv6 to run although it's broken 2024-09-09 20:50:58 +08:00
Concedo
b63158005f All samplers moved to kcpp side 2024-09-09 18:14:11 +08:00
Concedo
12fd16bfd4 Merge commit 'df270ef745' into concedo_experimental
# Conflicts:
#	Makefile
#	common/CMakeLists.txt
#	common/common.h
#	common/sampling.cpp
#	common/sampling.h
#	examples/infill/infill.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/quantize-stats/quantize-stats.cpp
#	examples/server/server.cpp
#	include/llama.h
#	src/llama-sampling.cpp
#	src/llama-sampling.h
#	src/llama.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-grammar-parser.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tests/test-llama-grammar.cpp
#	tests/test-sampling.cpp
2024-09-09 17:10:08 +08:00
Concedo
c78690737c fix for DRY segfault on unicode character substring tokenization 2024-09-08 18:25:00 +08:00
Concedo
d220495dd4 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/full-cuda.Dockerfile
#	.devops/llama-cli-cuda.Dockerfile
#	.devops/llama-server-cuda.Dockerfile
#	.devops/llama-server-intel.Dockerfile
#	.devops/llama-server-rocm.Dockerfile
#	.devops/llama-server-vulkan.Dockerfile
#	.devops/llama-server.Dockerfile
#	.github/workflows/docker.yml
#	docs/docker.md
#	examples/llama-bench/llama-bench.cpp
#	flake.lock
#	ggml/include/ggml.h
#	ggml/src/CMakeLists.txt
#	scripts/sync-ggml.last
#	src/llama.cpp
#	tests/test-backend-ops.cpp
#	tests/test-grad0.cpp
#	tests/test-rope.cpp
2024-08-30 10:37:39 +08:00
Concedo
b78a637da5 try to optimize context shifting 2024-08-26 23:07:31 +08:00
Concedo
cca3c4c78b xtc fixes 2024-08-22 23:18:46 +08:00
Concedo
fc2545dc83 fixed a typo 2024-08-22 00:25:56 +08:00
Concedo
5bf527a6ae added xtc sampler 2024-08-21 23:57:15 +08:00
Concedo
1a7ecd55e6 timing for init step, clip for vulkan 2024-08-21 18:14:53 +08:00
Concedo
cd69ab218e fixed DRY 2024-08-21 17:01:28 +08:00
Concedo
6a4becb731 dry is still buggy because token indexes are wrong 2024-08-21 00:59:26 +08:00
Concedo
db6ef8d1e1 revert dry state reset 2024-08-20 22:22:21 +08:00
Concedo
c1ae350e5b fixed race condition when generating 2024-08-20 20:17:55 +08:00
Concedo
e12ab53488 force clear some DRY state vars on new generation - not sure if this helps 2024-08-14 21:35:39 +08:00
Concedo
689a17d756 always prefilter to 5k logits 2024-08-12 22:27:06 +08:00
Concedo
729eb1e552 no fast forward for empty prompt 2024-07-27 16:29:35 +08:00
Concedo
eb5b4d0186 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	Makefile
#	Package.swift
#	src/CMakeLists.txt
#	src/llama.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-llama-grammar.cpp
2024-07-23 23:20:32 +08:00
Concedo
e2b36aa6cf fixed dry loading seq when not in use, set kcppt to -1 layers by default 2024-07-22 15:44:34 +08:00