askmyteapot
1e72b65c38
GradientAI Auto ROPE Base calculation (#910)
...
* GradientAI Auto ROPE Base calculation
https://gradient.ai/blog/scaling-rotational-embeddings-for-long-context-language-models
has a formula that better fits the ideal rope scaling.
Tested with Llama3, and checked that the calculation is correct for Llama2. Retains the logic for not scaling rope when under the trained CTX (a minimal sketch of the calculation follows this commit entry).
* add in solar scaling logic
Solar-based models require the context values to be multiplied by 8. This is (I'm guessing) because the positions are based on a 32k context, but with a sliding window of 4k.
* Update model_adapter.h
Adding in tensor count to identify Solar models, based on their tensor count of 435.
* Update model_adapter.cpp
add in n_tensor count for Solar identification
* refactor and cleanup GradientAI rope scaling
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-06-13 18:12:00 +08:00
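A minimal sketch (in C++, with hypothetical names, not the actual KoboldCpp symbols) of the calculation this commit describes, assuming the formula from the linked Gradient AI post, new_base = base^(log(ctx'/2π) / log(ctx/2π)), plus the Solar ×8 context multiplier and the retained no-scaling path below the trained context:

```cpp
#include <cmath>

// Hypothetical sketch of a GradientAI-style rope base calculation; the real
// implementation lives in the KoboldCpp adapter code and may differ.
static float gradient_rope_base(float orig_base, int n_ctx_train,
                                int n_ctx_desired, bool is_solar)
{
    if (n_ctx_desired <= n_ctx_train) {
        return orig_base; // retained logic: no rope scaling under trained CTX
    }
    const float mult   = is_solar ? 8.0f : 1.0f; // Solar: ctx values x8
    const float two_pi = 6.28318530718f;
    const float chi_train = (n_ctx_train * mult) / two_pi;
    const float chi_new   = (n_ctx_desired * mult) / two_pi;
    // new_base = base ^ (log(chi_new) / log(chi_train)); the log base cancels.
    return std::pow(orig_base, std::log10(chi_new) / std::log10(chi_train));
}
```

For example, gradient_rope_base(500000.0f, 8192, 32768, false) raises the rope base exponentially rather than applying a linear scale factor, which is the point of the blog post's fit.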
Concedo
10b148f4c2
added skip bos for tokenize endpoint
2024-06-05 10:49:11 +08:00
Concedo
10a1d628ad
added new binding fields for quant k and quant v
2024-06-03 14:35:59 +08:00
Concedo
4b664b3409
improved EOT handling
2024-05-19 22:04:51 +08:00
Concedo
1db3421c52
multiple minor fixes
2024-05-17 15:47:53 +08:00
Concedo
44443edfda
rep pen slope works (+1 squashed commits)
...
Squashed commits:
[535ad566] experiment with rep pen range
2024-05-15 17:20:57 +08:00
Concedo
eff01660e4
re-added smart context due to people complaining
2024-05-11 17:25:03 +08:00
Concedo
dbe72b959e
tidy up and refactor code to support old flags
2024-05-10 16:50:53 +08:00
Concedo
173c7272d5
EOS bypass mode added
2024-05-06 18:01:49 +08:00
Concedo
b48ea96ead
removed unwanted debug prints
2024-05-01 11:35:07 +08:00
Concedo
c65448d17a
add flash attention toggle
2024-04-30 21:29:11 +08:00
Concedo
17a24d753c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/main-intel.Dockerfile
# .devops/main-vulkan.Dockerfile
# .devops/server-intel.Dockerfile
# .devops/server-vulkan.Dockerfile
# .github/workflows/bench.yml
# .github/workflows/build.yml
# .github/workflows/python-lint.yml
# .github/workflows/server.yml
# .gitignore
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# flake.lock
# llama.cpp
# models/ggml-vocab-falcon.gguf
# models/ggml-vocab-llama-spm.gguf
# models/ggml-vocab-mpt.gguf
# models/ggml-vocab-stablelm.gguf
# models/ggml-vocab-starcoder.gguf
# requirements.txt
# scripts/check-requirements.sh
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-grammar-integration.cpp
# tests/test-tokenizer-0-bpe.py
# tests/test-tokenizer-0-spm.py
# tests/test-tokenizer-1-spm.cpp
2024-04-30 21:04:17 +08:00
Concedo
c230b78906
refactored a lot of code; removed bantokens and moved it to the API
2024-04-27 17:57:13 +08:00
Concedo
4ec8a9c57b
expose stop reason in generation
2024-04-27 01:12:12 +08:00
Concedo
0871c7cbd1
Added additional debug info, increased ctx sizes, and fixed a bug when loading the Vulkan config
2024-04-25 23:07:37 +08:00
Concedo
cb2dbe9e9a
improved rep pen speed
2024-04-24 21:29:21 +08:00
Concedo
b4d2031215
merged, added ability to render special tokens
2024-04-22 18:19:58 +08:00
Concedo
3170284fc3
added support for special tokens as stop sequences
2024-04-20 09:48:32 +08:00
Concedo
b01820dec7
auto rope scaling changes
2024-04-19 23:08:55 +08:00
Concedo
9a25d77cc1
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# ggml-cuda.cu
# ggml.c
# grammars/README.md
# scripts/get-wikitext-2.sh
# scripts/hf.sh
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
2024-04-14 21:18:39 +08:00
Concedo
125f84aa02
fixed compiler warnings
2024-04-08 16:40:55 +08:00
Concedo
a530afa1e4
Merge commit '280345968d' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/llama-cpp-cuda.srpm.spec
# .devops/main-cuda.Dockerfile
# .devops/nix/package.nix
# .devops/server-cuda.Dockerfile
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# README.md
# ci/run.sh
# docs/token_generation_performance_tips.md
# flake.lock
# llama.cpp
# scripts/LlamaConfig.cmake.in
# scripts/compare-commits.sh
# scripts/server-llm.sh
# tests/test-quantize-fns.cpp
2024-04-07 20:27:17 +08:00
Concedo
2ef03c9de6
fix for physical batch size
2024-03-15 16:45:20 +08:00
Concedo
47c42fd45c
fix for mamba processing
2024-03-13 13:27:46 +08:00
Concedo
484d90c330
llava support is now fully functioning
2024-03-11 15:55:32 +08:00
Concedo
d943c739a8
wip submitting of llava image to backend
2024-03-10 17:14:27 +08:00
Concedo
c08d7e5042
wip integration of llava
2024-03-10 11:18:47 +08:00
Concedo
7c64845dea
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .devops/nix/sif.nix
# .github/workflows/build.yml
# .github/workflows/python-check-requirements.yml
# README-sycl.md
# README.md
# flake.lock
# flake.nix
# requirements/requirements-convert-hf-to-gguf.txt
# scripts/compare-llama-bench.py
2024-03-04 15:33:33 +08:00
Concedo
2d9a90b652
try to fix ci compile errors (+1 squashed commits)
...
Squashed commits:
[d0d49663] fixed log multiline (+1 squashed commits)
Squashed commits:
[81a8befe] try to fix linux build error (+1 squashed commits)
Squashed commits:
[22850dda] try to fix build (+1 squashed commits)
Squashed commits:
[b8294611] missing type
2024-03-01 23:38:15 +08:00
Concedo
55af5446ad
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# README.md
# ci/run.sh
# llama.cpp
# scripts/sync-ggml.last
2024-03-01 17:41:37 +08:00
Concedo
524ba12abd
refactor - do not use a copy buffer to store generation outputs; instead return a C++-allocated pointer (see the sketch after this entry)
2024-02-29 14:02:20 +08:00
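A minimal sketch of the pattern this refactor describes, with hypothetical names (not the actual KoboldCpp API): the generated text stays in storage owned by the C++ side, and the binding receives a raw pointer instead of the text being copied into a separate output buffer:

```cpp
#include <string>

// Output owned by the library; reused (overwritten) on each generation.
static std::string generation_output;

extern "C" const char * get_generation_output()
{
    // The pointer is valid until the next generation replaces the string,
    // so a caller (e.g. Python via ctypes) should copy it out immediately.
    return generation_output.c_str();
}
```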
Concedo
f75e479db0
WIP on sdcpp integration
2024-02-29 00:40:07 +08:00
Concedo
ad638285de
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# Makefile
# README.md
# flake.lock
# ggml-cuda.cu
# llama.cpp
# tests/test-backend-ops.cpp
# tests/test-quantize-fns.cpp
2024-02-28 13:41:35 +08:00
Concedo
d47e13c892
fixed compile error: GGML_BACKEND_TYPE_GPU (+1 squashed commits)
...
Squashed commits:
[00ca282a] fixed compile error: LLAMA_SPLIT_MODE_ROW
2024-02-26 10:55:35 +08:00
Concedo
b5ba6c9ece
test to see if -Ofast for the ggml library plus batching adjustments fixes the speed regression for ggmlv1 models
2024-02-25 21:14:53 +08:00
Concedo
6d6d79f359
fixed a horrible bug in thread counts
2024-02-22 23:57:40 +08:00
Concedo
8d5e25008f
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
# README.md
# ci/run.sh
# tests/test-tokenizer-0-falcon.cpp
# tests/test-tokenizer-0-llama.cpp
# tests/test-tokenizer-1-bpe.cpp
# tests/test-tokenizer-1-llama.cpp
2024-02-17 15:22:05 +08:00
Concedo
066e73d769
context shift even more lenient
2024-02-11 18:30:38 +08:00
Concedo
590af480ab
context shift more forgiving
2024-02-10 20:49:21 +08:00
Concedo
35111ce01a
row split mode is now a toggle
2024-02-09 18:35:58 +08:00
Concedo
992eea71d7
fixes for vulkan multigpu
2024-02-09 14:42:27 +08:00
Concedo
fe424a5466
tensor split active text
2024-02-09 12:02:23 +08:00
Concedo
4cd571db89
vulkan multigpu, show uptime
2024-02-08 16:54:38 +08:00
Concedo
35c32fd0f2
refactor some old code with batching
2024-02-05 15:54:45 +08:00
Alexander Abushady
4cb956c7db
Quadratic Sampling UI (#652)
...
* Quadratic Sampling UI
Kalomaze's Quadratic Sampling now has a UI within KCPP.
* remove debug prints
* cleanup, add smooth sampler to dynatemp
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-02-04 16:26:27 +08:00
Concedo
2b02cd75c7
reformat debug logging
2024-02-01 23:20:51 +08:00
Concedo
340fbbbb04
show warning if genamt >= ctxsize, show t/s values
2024-01-31 18:51:42 +08:00
Concedo
13dcf4b556
print seed
2024-01-31 14:42:47 +08:00
Concedo
21ab727e83
change split mode to rows
2024-01-30 22:30:08 +08:00
Concedo
ed09a854f0
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .gitignore
# CMakeLists.txt
# Makefile
# README.md
# ci/run.sh
# ggml-opencl.cpp
# tests/CMakeLists.txt
2024-01-27 11:45:07 +08:00