Commit graph

304 commits

Author SHA1 Message Date
Concedo
4b96c3bba8 try new batch api (not actually batching) 2024-11-14 13:47:26 +08:00
Concedo
3813f6c517 added new flag nofastforward allowing users to disable fast forwarding 2024-11-13 10:59:01 +08:00
Concedo
48e9372337 prevent outputting infinity to logprobs (+1 squashed commits)
Squashed commits:

[bcc5f8b92] prevent outputting infinity to logprobs
2024-11-13 00:09:53 +08:00
kallewoof
3c36bbdcd7 debug: display tokens that were dropped by XTC sampler when debugmode is enabled (#1201) 2024-11-06 23:09:28 +08:00
Concedo
223c5f0844 clblast survived 2024-11-02 21:51:38 +08:00
Concedo
bbebc76817 fix top picks bug, lower input anti abuse thresholds (+1 squashed commits)
Squashed commits:

[a81d9b21] fix top picks bug, lower input anti abuse thresholds
2024-11-01 16:42:13 +08:00
Concedo
aa26a58085 added logprobs api and logprobs viewer 2024-11-01 00:22:15 +08:00
Concedo
90f5cd0f67 wip logprobs data 2024-10-30 00:59:34 +08:00
Concedo
94a5a27b85 Alone in the darkness
They're coming for you
I know they will try to catch me too
Alone in the darkness
They're calling for you
There's nowhere to run for cover
2024-10-24 22:29:20 +08:00
Concedo
becd737e0f slightly increase padding to handle longer gen amts 2024-10-23 22:58:41 +08:00
Maya
8bb220329c Dynamic sizes for sequences (#1157)
* Dynamic sizes for sequences

* cleanup PR - move all dynamic fields to end of payload, ensure correct null handling to match existing behavior, add anti abuse limit of max 512 for dynamic fields

* adjust anti abuse limits

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-10-16 23:55:11 +08:00
Concedo
7f76425450 lower topk prefilter token amount to 3k 2024-10-16 20:39:41 +08:00
Concedo
cff72c5d26 remove unwanted print 2024-10-11 18:56:32 +08:00
Concedo
e692a79aab Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/docker.yml
#	CMakeLists.txt
#	CONTRIBUTING.md
#	docs/android.md
#	docs/docker.md
#	examples/embedding/embedding.cpp
#	examples/imatrix/imatrix.cpp
#	examples/infill/infill.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/main/README.md
#	examples/parallel/parallel.cpp
#	examples/perplexity/perplexity.cpp
#	examples/quantize-stats/quantize-stats.cpp
#	examples/save-load-state/save-load-state.cpp
#	examples/server/README.md
#	examples/simple/CMakeLists.txt
#	examples/speculative/speculative.cpp
#	flake.lock
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-blas.cpp
#	pocs/vdot/q8dot.cpp
#	pocs/vdot/vdot.cpp
#	scripts/debug-test.sh
#	scripts/sync-ggml.last
#	src/llama.cpp
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
#	tests/test-quantize-fns.cpp
#	tests/test-quantize-perf.cpp
#	tests/test-tokenizer-0.cpp
#	tests/test-tokenizer-1-bpe.cpp
#	tests/test-tokenizer-1-spm.cpp
2024-10-11 11:59:59 +08:00
Maya
5c9650d68e Fix access violation when using banned_phrases (#1154) 2024-10-10 21:46:39 +08:00
Concedo
fe5479f286 unify antislop and token bans 2024-10-10 18:21:07 +08:00
Concedo
9b614d46bd antislop sampler working 2024-10-09 16:33:04 +08:00
Concedo
36e9bac98f wip anti slop sampler 2024-10-09 13:34:47 +08:00
Concedo
f78f8d3d45 wip anti slop 2024-10-07 23:18:13 +08:00
Concedo
65f3c68399 wip antislop 2024-10-07 20:19:22 +08:00
Concedo
740c5e01cb added token delay feature 2024-10-07 19:45:51 +08:00
Concedo
3e8bb10e2d wip on rewind function 2024-10-06 16:21:03 +08:00
Concedo
c38d1ecc8d update templates, fix rwkv 2024-09-22 01:32:12 +08:00
Concedo
53bf0fb32d removed openblas backend, merged into CPU (with llamafile for BLAS). GPU backend is now automatically selected when running from CLI unless noblas is specified. 2024-09-15 19:21:52 +08:00
Concedo
e44ddf26ef Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/server.yml
#	CMakeLists.txt
#	Makefile
#	examples/embedding/embedding.cpp
#	examples/imatrix/imatrix.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/llava/MobileVLM-README.md
#	examples/parallel/parallel.cpp
#	examples/perplexity/perplexity.cpp
#	examples/quantize/CMakeLists.txt
#	examples/server/README.md
#	examples/speculative/speculative.cpp
#	tests/test-backend-ops.cpp
2024-09-13 16:17:24 +08:00
Concedo
7bdac9bc44 prevent shifting on rwkv 2024-09-11 20:22:45 +08:00
Concedo
eee67281be move kcpp params out 2024-09-10 16:30:12 +08:00
Concedo
fc7fe2e7a0 allow rwkv6 to run although its broken 2024-09-09 20:50:58 +08:00
Concedo
b63158005f All samplers moved to kcpp side 2024-09-09 18:14:11 +08:00
Concedo
12fd16bfd4 Merge commit 'df270ef745' into concedo_experimental
# Conflicts:
#	Makefile
#	common/CMakeLists.txt
#	common/common.h
#	common/sampling.cpp
#	common/sampling.h
#	examples/infill/infill.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/quantize-stats/quantize-stats.cpp
#	examples/server/server.cpp
#	include/llama.h
#	src/llama-sampling.cpp
#	src/llama-sampling.h
#	src/llama.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-grammar-parser.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tests/test-llama-grammar.cpp
#	tests/test-sampling.cpp
2024-09-09 17:10:08 +08:00
Concedo
c78690737c fix for DRY segfault on unicode character substring tokenization 2024-09-08 18:25:00 +08:00
Concedo
d220495dd4 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/full-cuda.Dockerfile
#	.devops/llama-cli-cuda.Dockerfile
#	.devops/llama-server-cuda.Dockerfile
#	.devops/llama-server-intel.Dockerfile
#	.devops/llama-server-rocm.Dockerfile
#	.devops/llama-server-vulkan.Dockerfile
#	.devops/llama-server.Dockerfile
#	.github/workflows/docker.yml
#	docs/docker.md
#	examples/llama-bench/llama-bench.cpp
#	flake.lock
#	ggml/include/ggml.h
#	ggml/src/CMakeLists.txt
#	scripts/sync-ggml.last
#	src/llama.cpp
#	tests/test-backend-ops.cpp
#	tests/test-grad0.cpp
#	tests/test-rope.cpp
2024-08-30 10:37:39 +08:00
Concedo
b78a637da5 try to optimize context shifting 2024-08-26 23:07:31 +08:00
Concedo
cca3c4c78b xtc fixes 2024-08-22 23:18:46 +08:00
Concedo
fc2545dc83 fixed a typo 2024-08-22 00:25:56 +08:00
Concedo
5bf527a6ae added xtc sampler 2024-08-21 23:57:15 +08:00
Concedo
1a7ecd55e6 timing for init step, clip for vulkan 2024-08-21 18:14:53 +08:00
Concedo
cd69ab218e fixed DRY 2024-08-21 17:01:28 +08:00
Concedo
6a4becb731 dry is still buggy because token indexes are wrong 2024-08-21 00:59:26 +08:00
Concedo
db6ef8d1e1 revert dry state reset 2024-08-20 22:22:21 +08:00
Concedo
c1ae350e5b fixed race condition when generating 2024-08-20 20:17:55 +08:00
Concedo
e12ab53488 force clear some DRY state vars on new generation - not sure if this helps 2024-08-14 21:35:39 +08:00
Concedo
689a17d756 always prefilter to 5k logits 2024-08-12 22:27:06 +08:00
Concedo
729eb1e552 no fast forward for empty prompt 2024-07-27 16:29:35 +08:00
Concedo
eb5b4d0186 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	Makefile
#	Package.swift
#	src/CMakeLists.txt
#	src/llama.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-llama-grammar.cpp
2024-07-23 23:20:32 +08:00
Concedo
e2b36aa6cf fixed dry loading seq when not in use, set kcppt to -1 layers by default 2024-07-22 15:44:34 +08:00
Concedo
0ecf13fc13 updated lite, extra error logging 2024-07-21 17:55:47 +08:00
Concedo
24b9616344 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/full-cuda.Dockerfile
#	.devops/full-rocm.Dockerfile
#	.devops/full.Dockerfile
#	.devops/llama-cli-cuda.Dockerfile
#	.devops/llama-cli-intel.Dockerfile
#	.devops/llama-cli-rocm.Dockerfile
#	.devops/llama-cli-vulkan.Dockerfile
#	.devops/llama-cli.Dockerfile
#	.devops/llama-server-cuda.Dockerfile
#	.devops/llama-server-intel.Dockerfile
#	.devops/llama-server-rocm.Dockerfile
#	.devops/llama-server-vulkan.Dockerfile
#	.devops/llama-server.Dockerfile
#	CMakeLists.txt
#	CONTRIBUTING.md
#	Makefile
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	requirements.txt
#	src/llama.cpp
#	tests/test-backend-ops.cpp
2024-07-19 14:23:33 +08:00
Concedo
5988243aee fix wrong order, fix llava debug mode failure 2024-07-17 15:30:19 +08:00
Concedo
d775a419b2 updated lite with chat inject, added layer detect, added more console logging 2024-07-16 23:10:15 +08:00