Commit graph

239 commits

Author SHA1 Message Date
Concedo
b48ea96ead removed unwanted debugs 2024-05-01 11:35:07 +08:00
Concedo
c65448d17a add flash attention toggle 2024-04-30 21:29:11 +08:00
Concedo
17a24d753c Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/main-intel.Dockerfile
#	.devops/main-vulkan.Dockerfile
#	.devops/server-intel.Dockerfile
#	.devops/server-vulkan.Dockerfile
#	.github/workflows/bench.yml
#	.github/workflows/build.yml
#	.github/workflows/python-lint.yml
#	.github/workflows/server.yml
#	.gitignore
#	Makefile
#	README-sycl.md
#	README.md
#	ci/run.sh
#	flake.lock
#	llama.cpp
#	models/ggml-vocab-falcon.gguf
#	models/ggml-vocab-llama-spm.gguf
#	models/ggml-vocab-mpt.gguf
#	models/ggml-vocab-stablelm.gguf
#	models/ggml-vocab-starcoder.gguf
#	requirements.txt
#	scripts/check-requirements.sh
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-tokenizer-0-bpe.py
#	tests/test-tokenizer-0-spm.py
#	tests/test-tokenizer-1-spm.cpp
2024-04-30 21:04:17 +08:00
Concedo
c230b78906 refactored a lot of code, remove bantokens, move it to api 2024-04-27 17:57:13 +08:00
Concedo
4ec8a9c57b expose stop reason in generation 2024-04-27 01:12:12 +08:00
Concedo
0871c7cbd1 Add additional debug info and increased ctx sizes, fixed a bug loading vulkan config 2024-04-25 23:07:37 +08:00
Concedo
cb2dbe9e9a improved rep pen speed 2024-04-24 21:29:21 +08:00
Concedo
b4d2031215 merged, added ability to render special tokens 2024-04-22 18:19:58 +08:00
Concedo
3170284fc3 added support for special tokens as stop sequences 2024-04-20 09:48:32 +08:00
Concedo
b01820dec7 auto rope scaling changes 2024-04-19 23:08:55 +08:00
Concedo
9a25d77cc1 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	Makefile
#	README-sycl.md
#	README.md
#	ci/run.sh
#	ggml-cuda.cu
#	ggml.c
#	grammars/README.md
#	scripts/get-wikitext-2.sh
#	scripts/hf.sh
#	scripts/sync-ggml.last
#	tests/test-backend-ops.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-json-schema-to-grammar.cpp
2024-04-14 21:18:39 +08:00
Concedo
125f84aa02 fixed compiler warnings 2024-04-08 16:40:55 +08:00
Concedo
a530afa1e4 Merge commit '280345968d' into concedo_experimental
# Conflicts:
#	.devops/full-cuda.Dockerfile
#	.devops/llama-cpp-cuda.srpm.spec
#	.devops/main-cuda.Dockerfile
#	.devops/nix/package.nix
#	.devops/server-cuda.Dockerfile
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
#	README.md
#	ci/run.sh
#	docs/token_generation_performance_tips.md
#	flake.lock
#	llama.cpp
#	scripts/LlamaConfig.cmake.in
#	scripts/compare-commits.sh
#	scripts/server-llm.sh
#	tests/test-quantize-fns.cpp
2024-04-07 20:27:17 +08:00
Concedo
2ef03c9de6 fix for physical batch size 2024-03-15 16:45:20 +08:00
Concedo
47c42fd45c fix for mamba processing 2024-03-13 13:27:46 +08:00
Concedo
484d90c330 llava support is now fully functioning 2024-03-11 15:55:32 +08:00
Concedo
d943c739a8 wip submitting of llava image to backend 2024-03-10 17:14:27 +08:00
Concedo
c08d7e5042 wip integration of llava 2024-03-10 11:18:47 +08:00
Concedo
7c64845dea Merge branch 'master' into concedo_experimental
# Conflicts:
#	.devops/nix/sif.nix
#	.github/workflows/build.yml
#	.github/workflows/python-check-requirements.yml
#	README-sycl.md
#	README.md
#	flake.lock
#	flake.nix
#	requirements/requirements-convert-hf-to-gguf.txt
#	scripts/compare-llama-bench.py
2024-03-04 15:33:33 +08:00
Concedo
2d9a90b652 try to fix ci compile errors (+1 squashed commits)
Squashed commits:

[d0d49663] fixed log multiline (+1 squashed commits)

Squashed commits:

[81a8befe] try to fix linux build error (+1 squashed commits)

Squashed commits:

[22850dda] try to fix build (+1 squashed commits)

Squashed commits:

[b8294611] missing type
2024-03-01 23:38:15 +08:00
Concedo
55af5446ad Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
#	ci/run.sh
#	llama.cpp
#	scripts/sync-ggml.last
2024-03-01 17:41:37 +08:00
Concedo
524ba12abd refactor - do not use a copy buffer to store generation outputs, instead return a cpp allocated ptr 2024-02-29 14:02:20 +08:00
Concedo
f75e479db0 WIP on sdcpp integration 2024-02-29 00:40:07 +08:00
Concedo
ad638285de Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
#	README.md
#	flake.lock
#	ggml-cuda.cu
#	llama.cpp
#	tests/test-backend-ops.cpp
#	tests/test-quantize-fns.cpp
2024-02-28 13:41:35 +08:00
Concedo
d47e13c892 fixed compile error: GGML_BACKEND_TYPE_GPU (+1 squashed commits)
Squashed commits:

[00ca282a] fixed compile error: LLAMA_SPLIT_MODE_ROW
2024-02-26 10:55:35 +08:00
Concedo
b5ba6c9ece test to see if Ofast for ggml library plus batching adjustments fixes speed regression for ggmlv1 models 2024-02-25 21:14:53 +08:00
Concedo
6d6d79f359 fixed a horrible bug in thread counts 2024-02-22 23:57:40 +08:00
Concedo
8d5e25008f Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	README.md
#	ci/run.sh
#	tests/test-tokenizer-0-falcon.cpp
#	tests/test-tokenizer-0-llama.cpp
#	tests/test-tokenizer-1-bpe.cpp
#	tests/test-tokenizer-1-llama.cpp
2024-02-17 15:22:05 +08:00
Concedo
066e73d769 context shift even more lenient 2024-02-11 18:30:38 +08:00
Concedo
590af480ab contextshift more forgiving 2024-02-10 20:49:21 +08:00
Concedo
35111ce01a row split mode is now a toggle 2024-02-09 18:35:58 +08:00
Concedo
992eea71d7 fixes for vulkan multigpu 2024-02-09 14:42:27 +08:00
Concedo
fe424a5466 tensor split active text 2024-02-09 12:02:23 +08:00
Concedo
4cd571db89 vulkan multigpu, show uptime 2024-02-08 16:54:38 +08:00
Concedo
35c32fd0f2 refactor some old code with batching 2024-02-05 15:54:45 +08:00
Alexander Abushady
4cb956c7db
Quadratic Sampling UI (#652)
* Quadratic Sampling UI

Kalomaze's Quadratic Sampling, now has a UI within KCPP.

* remove debug prints

* cleanup, add smooth sampler to dynatemp

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-02-04 16:26:27 +08:00
Concedo
2b02cd75c7 reformat debug logging 2024-02-01 23:20:51 +08:00
Concedo
340fbbbb04 show warning if genamt >= ctxsize, show t/s values 2024-01-31 18:51:42 +08:00
Concedo
13dcf4b556 print seed 2024-01-31 14:42:47 +08:00
Concedo
21ab727e83 change split mode to rows 2024-01-30 22:30:08 +08:00
Concedo
ed09a854f0 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	README.md
#	ci/run.sh
#	ggml-opencl.cpp
#	tests/CMakeLists.txt
2024-01-27 11:45:07 +08:00
Concedo
762eeb6204 triage for opencl 2024-01-27 11:09:43 +08:00
Concedo
d9a7bd577a gpu layer offloading disabled for phi models in clblast 2024-01-25 17:40:05 +08:00
Concedo
08236ccc97 better abort handling, added support for dynatemp exponent 2024-01-23 16:56:12 +08:00
Concedo
5ff53507c4 fixed compile issues for cublas 2024-01-21 14:23:48 +08:00
Concedo
5639c1a520 units (+2 squashed commit)
Squashed commit:

[166979d9] units conversion

[038dd5d4] get rid of all warnings (+1 squashed commits)

Squashed commits:

[6efd1e1b] get rid of all warnings
2024-01-20 23:53:21 +08:00
Concedo
db14de5c32 fossilize ggml library ver 3, to support ggjtv3 2024-01-20 10:49:25 +08:00
kalomaze
123bff9a0f
Full DynaTemp implementation + UI (#600)
* move Dynatemp changes to new branch

* fix float header

* Properly reintroduce variable expert count

Controllable through experts.txt

* first pass at DynaTemp UI

Checkbox partial implemented, Min and Max Temp implemented

* DynaTemp UI Checkbox

Trigger DynaTemp on checkbox

* DynaTemp UI checkbox edition

Hell Yeah! DynaTemp!

* Remove greedy dynatemp

* Fix race condition caused by debug print

* Fixed broken presets and miro

Fixes broken presets and mirostat

* Remove debug function + HHI temp

Also removed unnecessary softmax double precision

* Fix whitespace (?) for generate function

* epic upstream renaming scheme fix

* fix stupid indents

* Other cleanup

Reintroduce unused rep pen function, move temp functions first before entropy dynamic temp

* Slight indent fix

* revert batch pyinstaller maker to mainline

and also delete experts.txt since adjustable routing is also being removed for the PR

* compact dynatemp into a single value, dynatemp_range. This is a float giving the allowed deviation above and below the base temperature when using dynatemp. For example, to get dynatemp_min=0.3 and dynatemp_max=0.5, simply set temperature=0.4 and dynatemp_range=0.1. Functionally dynatemp operates the same, but usage is simpler because there is a single, easy-to-adjust value.

---------

Co-authored-by: Alexander Abushady <aabushady214@gmail.com>
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-01-06 11:13:16 +08:00
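
Note on the dynatemp_range arithmetic described in the commit above: the following is a minimal sketch, not the project's actual code. The function names are hypothetical, and clamping the lower bound at zero is an assumption of this sketch rather than something the commit message states. It only illustrates how a base temperature plus a single range value maps to the older min/max pair, and back.

```python
def dynatemp_bounds(temperature: float, dynatemp_range: float) -> tuple[float, float]:
    """Return (min_temp, max_temp) implied by a base temperature and a range.

    Hypothetical helper. Clamping the lower bound at 0.0 is an assumption
    of this sketch, made so the minimum temperature is never negative.
    """
    low = max(0.0, temperature - dynatemp_range)
    high = temperature + dynatemp_range
    return low, high


def to_temperature_and_range(dynatemp_min: float, dynatemp_max: float) -> tuple[float, float]:
    """Convert an old-style (min, max) pair into (temperature, dynatemp_range).

    The base temperature is the midpoint of the interval and the range is
    half of its width.
    """
    temperature = (dynatemp_min + dynatemp_max) / 2
    dynatemp_range = (dynatemp_max - dynatemp_min) / 2
    return temperature, dynatemp_range


if __name__ == "__main__":
    # The example from the commit message: temperature=0.4, dynatemp_range=0.1.
    low, high = dynatemp_bounds(temperature=0.4, dynatemp_range=0.1)
    print(f"min={low:.2f} max={high:.2f}")  # prints "min=0.30 max=0.50"
```

With this mapping, a range of 0 collapses both bounds onto the base temperature, so the interval has zero width.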
Concedo
e49d398f73 use same struct size for cuda and non cuda (+1 squashed commits)
Squashed commits:

[6eee8e2f] use same struct size for cuda and non cuda
2024-01-03 16:05:54 +08:00
Concedo
94e68fe474 added field to show recent seed 2024-01-02 15:35:04 +08:00