Concedo
4b96c3bba8
try new batch api (not actually batching)
2024-11-14 13:47:26 +08:00
Concedo
3813f6c517
added new flag nofastforward allowing users to disable fast forwarding
2024-11-13 10:59:01 +08:00
Concedo
48e9372337
prevent outputting infinity to logprobs (+1 squashed commits)
Squashed commits:
[bcc5f8b92] prevent outputting infinity to logprobs
2024-11-13 00:09:53 +08:00
kallewoof
3c36bbdcd7
debug: display tokens that were dropped by XTC sampler when debugmode is enabled (#1201)
2024-11-06 23:09:28 +08:00
Concedo
223c5f0844
clblast survived
2024-11-02 21:51:38 +08:00
Concedo
bbebc76817
fix top picks bug, lower input anti abuse thresholds (+1 squashed commits)
Squashed commits:
[a81d9b21] fix top picks bug, lower input anti abuse thresholds
2024-11-01 16:42:13 +08:00
Concedo
aa26a58085
added logprobs api and logprobs viewer
2024-11-01 00:22:15 +08:00
Concedo
90f5cd0f67
wip logprobs data
2024-10-30 00:59:34 +08:00
Concedo
94a5a27b85
Alone in the darkness
They're coming for you
I know they will try to catch me too
Alone in the darkness
They're calling for you
There's nowhere to run for cover
2024-10-24 22:29:20 +08:00
Concedo
becd737e0f
slightly increase padding to handle longer gen amts
2024-10-23 22:58:41 +08:00
Maya
8bb220329c
Dynamic sizes for sequences (#1157)
* Dynamic sizes for sequences
* cleanup PR - move all dynamic fields to end of payload, ensure correct null handling to match existing behavior, add anti abuse limit of max 512 for dynamic fields
* adjust anti abuse limits
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-10-16 23:55:11 +08:00
Concedo
7f76425450
lower topk prefilter token amount to 3k
2024-10-16 20:39:41 +08:00
Concedo
cff72c5d26
remove unwanted print
2024-10-11 18:56:32 +08:00
Concedo
e692a79aab
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/docker.yml
# CMakeLists.txt
# CONTRIBUTING.md
# docs/android.md
# docs/docker.md
# examples/embedding/embedding.cpp
# examples/imatrix/imatrix.cpp
# examples/infill/infill.cpp
# examples/llama-bench/llama-bench.cpp
# examples/main/README.md
# examples/parallel/parallel.cpp
# examples/perplexity/perplexity.cpp
# examples/quantize-stats/quantize-stats.cpp
# examples/save-load-state/save-load-state.cpp
# examples/server/README.md
# examples/simple/CMakeLists.txt
# examples/speculative/speculative.cpp
# flake.lock
# ggml/src/CMakeLists.txt
# ggml/src/ggml-blas.cpp
# pocs/vdot/q8dot.cpp
# pocs/vdot/vdot.cpp
# scripts/debug-test.sh
# scripts/sync-ggml.last
# src/llama.cpp
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-quantize-fns.cpp
# tests/test-quantize-perf.cpp
# tests/test-tokenizer-0.cpp
# tests/test-tokenizer-1-bpe.cpp
# tests/test-tokenizer-1-spm.cpp
2024-10-11 11:59:59 +08:00
Maya
5c9650d68e
Fix access violation when using banned_phrases (#1154)
2024-10-10 21:46:39 +08:00
Concedo
fe5479f286
unify antislop and token bans
2024-10-10 18:21:07 +08:00
Concedo
9b614d46bd
antislop sampler working
2024-10-09 16:33:04 +08:00
Concedo
36e9bac98f
wip anti slop sampler
2024-10-09 13:34:47 +08:00
Concedo
f78f8d3d45
wip anti slop
2024-10-07 23:18:13 +08:00
Concedo
65f3c68399
wip antislop
2024-10-07 20:19:22 +08:00
Concedo
740c5e01cb
added token delay feature
2024-10-07 19:45:51 +08:00
Concedo
3e8bb10e2d
wip on rewind function
2024-10-06 16:21:03 +08:00
Concedo
c38d1ecc8d
update templates, fix rwkv
2024-09-22 01:32:12 +08:00
Concedo
53bf0fb32d
removed openblas backend, merged into CPU (with llamafile for BLAS). GPU backend is now automatically selected when running from CLI unless noblas is specified.
2024-09-15 19:21:52 +08:00
Concedo
e44ddf26ef
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/server.yml
# CMakeLists.txt
# Makefile
# examples/embedding/embedding.cpp
# examples/imatrix/imatrix.cpp
# examples/llama-bench/llama-bench.cpp
# examples/llava/MobileVLM-README.md
# examples/parallel/parallel.cpp
# examples/perplexity/perplexity.cpp
# examples/quantize/CMakeLists.txt
# examples/server/README.md
# examples/speculative/speculative.cpp
# tests/test-backend-ops.cpp
2024-09-13 16:17:24 +08:00
Concedo
7bdac9bc44
prevent shifting on rwkv
2024-09-11 20:22:45 +08:00
Concedo
eee67281be
move kcpp params out
2024-09-10 16:30:12 +08:00
Concedo
fc7fe2e7a0
allow rwkv6 to run although its broken
2024-09-09 20:50:58 +08:00
Concedo
b63158005f
All samplers moved to kcpp side
2024-09-09 18:14:11 +08:00
Concedo
12fd16bfd4
Merge commit 'df270ef745' into concedo_experimental
# Conflicts:
# Makefile
# common/CMakeLists.txt
# common/common.h
# common/sampling.cpp
# common/sampling.h
# examples/infill/infill.cpp
# examples/llama-bench/llama-bench.cpp
# examples/quantize-stats/quantize-stats.cpp
# examples/server/server.cpp
# include/llama.h
# src/llama-sampling.cpp
# src/llama-sampling.h
# src/llama.cpp
# tests/test-grammar-integration.cpp
# tests/test-grammar-parser.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-llama-grammar.cpp
# tests/test-sampling.cpp
2024-09-09 17:10:08 +08:00
Concedo
c78690737c
fix for DRY segfault on unicode character substring tokenization
2024-09-08 18:25:00 +08:00
Concedo
d220495dd4
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/llama-cli-cuda.Dockerfile
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-intel.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .devops/llama-server.Dockerfile
# .github/workflows/docker.yml
# docs/docker.md
# examples/llama-bench/llama-bench.cpp
# flake.lock
# ggml/include/ggml.h
# ggml/src/CMakeLists.txt
# scripts/sync-ggml.last
# src/llama.cpp
# tests/test-backend-ops.cpp
# tests/test-grad0.cpp
# tests/test-rope.cpp
2024-08-30 10:37:39 +08:00
Concedo
b78a637da5
try to optimize context shifting
2024-08-26 23:07:31 +08:00
Concedo
cca3c4c78b
xtc fixes
2024-08-22 23:18:46 +08:00
Concedo
fc2545dc83
fixed a typo
2024-08-22 00:25:56 +08:00
Concedo
5bf527a6ae
added xtc sampler
2024-08-21 23:57:15 +08:00
Concedo
1a7ecd55e6
timing for init step, clip for vulkan
2024-08-21 18:14:53 +08:00
Concedo
cd69ab218e
fixed DRY
2024-08-21 17:01:28 +08:00
Concedo
6a4becb731
dry is still buggy because token indexes are wrong
2024-08-21 00:59:26 +08:00
Concedo
db6ef8d1e1
revert dry state reset
2024-08-20 22:22:21 +08:00
Concedo
c1ae350e5b
fixed race condition when generating
2024-08-20 20:17:55 +08:00
Concedo
e12ab53488
force clear some DRY state vars on new generation - not sure if this helps
2024-08-14 21:35:39 +08:00
Concedo
689a17d756
always prefilter to 5k logits
2024-08-12 22:27:06 +08:00
Concedo
729eb1e552
no fast forward for empty prompt
2024-07-27 16:29:35 +08:00
Concedo
eb5b4d0186
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# Makefile
# Package.swift
# src/CMakeLists.txt
# src/llama.cpp
# tests/test-grammar-integration.cpp
# tests/test-llama-grammar.cpp
2024-07-23 23:20:32 +08:00
Concedo
e2b36aa6cf
fixed dry loading seq when not in use, set kcppt to -1 layers by default
2024-07-22 15:44:34 +08:00
Concedo
0ecf13fc13
updated lite, extra error logging
2024-07-21 17:55:47 +08:00
Concedo
24b9616344
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/full-rocm.Dockerfile
# .devops/full.Dockerfile
# .devops/llama-cli-cuda.Dockerfile
# .devops/llama-cli-intel.Dockerfile
# .devops/llama-cli-rocm.Dockerfile
# .devops/llama-cli-vulkan.Dockerfile
# .devops/llama-cli.Dockerfile
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-intel.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .devops/llama-server.Dockerfile
# CMakeLists.txt
# CONTRIBUTING.md
# Makefile
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# requirements.txt
# src/llama.cpp
# tests/test-backend-ops.cpp
2024-07-19 14:23:33 +08:00
Concedo
5988243aee
fix wrong order, fix llava debug mode failure
2024-07-17 15:30:19 +08:00
Concedo
d775a419b2
updated lite with chat inject, added layer detect, added more console logging
2024-07-16 23:10:15 +08:00