Concedo
7c64845dea
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .devops/nix/sif.nix
# .github/workflows/build.yml
# .github/workflows/python-check-requirements.yml
# README-sycl.md
# README.md
# flake.lock
# flake.nix
# requirements/requirements-convert-hf-to-gguf.txt
# scripts/compare-llama-bench.py
2024-03-04 15:33:33 +08:00
Concedo
2d9a90b652
try to fix ci compile errors (+1 squashed commits)
...
Squashed commits:
[d0d49663] fixed log multiline (+1 squashed commits)
Squashed commits:
[81a8befe] try to fix linux build error (+1 squashed commits)
Squashed commits:
[22850dda] try to fix build (+1 squashed commits)
Squashed commits:
[b8294611] missing type
2024-03-01 23:38:15 +08:00
Concedo
55af5446ad
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# README.md
# ci/run.sh
# llama.cpp
# scripts/sync-ggml.last
2024-03-01 17:41:37 +08:00
Concedo
524ba12abd
refactor - do not use a copy buffer to store generation outputs, instead return a cpp allocated ptr
2024-02-29 14:02:20 +08:00
Concedo
f75e479db0
WIP on sdcpp integration
2024-02-29 00:40:07 +08:00
Concedo
ad638285de
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# Makefile
# README.md
# flake.lock
# ggml-cuda.cu
# llama.cpp
# tests/test-backend-ops.cpp
# tests/test-quantize-fns.cpp
2024-02-28 13:41:35 +08:00
Concedo
d47e13c892
fixed compile error: GGML_BACKEND_TYPE_GPU (+1 squashed commits)
...
Squashed commits:
[00ca282a] fixed compile error: LLAMA_SPLIT_MODE_ROW
2024-02-26 10:55:35 +08:00
Concedo
b5ba6c9ece
test to see if -Ofast for the ggml library plus batching adjustments fixes speed regression for ggmlv1 models
2024-02-25 21:14:53 +08:00
Concedo
6d6d79f359
fixed a horrible bug in thread counts
2024-02-22 23:57:40 +08:00
Concedo
8d5e25008f
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
# README.md
# ci/run.sh
# tests/test-tokenizer-0-falcon.cpp
# tests/test-tokenizer-0-llama.cpp
# tests/test-tokenizer-1-bpe.cpp
# tests/test-tokenizer-1-llama.cpp
2024-02-17 15:22:05 +08:00
Concedo
066e73d769
context shift even more lenient
2024-02-11 18:30:38 +08:00
Concedo
590af480ab
contextshift more forgiving
2024-02-10 20:49:21 +08:00
Concedo
35111ce01a
row split mode is now a toggle
2024-02-09 18:35:58 +08:00
Concedo
992eea71d7
fixes for vulkan multigpu
2024-02-09 14:42:27 +08:00
Concedo
fe424a5466
tensor split active text
2024-02-09 12:02:23 +08:00
Concedo
4cd571db89
vulkan multigpu, show uptime
2024-02-08 16:54:38 +08:00
Concedo
35c32fd0f2
refactor some old code with batching
2024-02-05 15:54:45 +08:00
Alexander Abushady
4cb956c7db
Quadratic Sampling UI (#652)
...
* Quadratic Sampling UI
Kalomaze's Quadratic Sampling now has a UI within KCPP.
* remove debug prints
* cleanup, add smooth sampler to dynatemp
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-02-04 16:26:27 +08:00
Concedo
2b02cd75c7
reformat debug logging
2024-02-01 23:20:51 +08:00
Concedo
340fbbbb04
show warning if genamt >= ctxsize, show t/s values
2024-01-31 18:51:42 +08:00
Concedo
13dcf4b556
print seed
2024-01-31 14:42:47 +08:00
Concedo
21ab727e83
change split mode to rows
2024-01-30 22:30:08 +08:00
Concedo
ed09a854f0
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .gitignore
# CMakeLists.txt
# Makefile
# README.md
# ci/run.sh
# ggml-opencl.cpp
# tests/CMakeLists.txt
2024-01-27 11:45:07 +08:00
Concedo
762eeb6204
triage for opencl
2024-01-27 11:09:43 +08:00
Concedo
d9a7bd577a
gpu layer offloading disabled for phi models in clblast
2024-01-25 17:40:05 +08:00
Concedo
08236ccc97
better abort handling, added support for dynatemp exponent
2024-01-23 16:56:12 +08:00
Concedo
5ff53507c4
fixed compile issues for cublas
2024-01-21 14:23:48 +08:00
Concedo
5639c1a520
units (+2 squashed commits)
...
Squashed commit:
[166979d9] units conversion
[038dd5d4] get rid of all warnings (+1 squashed commits)
Squashed commits:
[6efd1e1b] get rid of all warnings
2024-01-20 23:53:21 +08:00
Concedo
db14de5c32
fossilize ggml library ver 3, to support ggjtv3
2024-01-20 10:49:25 +08:00
kalomaze
123bff9a0f
Full DynaTemp implementation + UI (#600)
...
* move Dynatemp changes to new branch
* fix float header
* Properly reintroduce variable expert count
Controllable through experts.txt
* first pass at DynaTemp UI
Checkbox partial implemented, Min and Max Temp implemented
* DynaTemp UI Checkbox
Trigger DynaTemp on checkbox
* DynaTemp UI checkbox edition
Hell Yeah! DynaTemp!
* Remove greedy dynatemp
* Fix race condition caused by debug print
Fixed broken presets and mirostat
* Remove debug function + HHI temp
Also removed unnecessary softmax double precision
* Fix whitespace (?) for generate function
* epic upstream renaming scheme fix
* fix stupid indents
* Other cleanup
Reintroduce unused rep pen function, move temp functions first before entropy dynamic temp
* Slight indent fix
* revert batch pyinstaller maker to mainline
and also delete experts.txt since adjustable routing is also being removed for the PR
* compact dynatemp into a single value, dynatemp_range. This float represents the allowed deviation above and below the base temperature when using dynatemp: for dynatemp_min=0.3 and dynatemp_max=0.5, simply set temperature=0.4 and dynatemp_range=0.1. Dynatemp operates the same functionally, but this simplifies usage to a single easy-to-adjust value.
---------
Co-authored-by: Alexander Abushady <aabushady214@gmail.com>
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-01-06 11:13:16 +08:00
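The dynatemp_range compaction described in the commit above can be sketched as follows. This is a minimal illustration of the arithmetic in the commit message, not the merged C++ implementation; the helper name is hypothetical:

```python
def dynatemp_bounds(temperature: float, dynatemp_range: float) -> tuple[float, float]:
    """Derive the dynamic-temperature bounds from a single range value.

    The base temperature is the midpoint, and dynatemp_range is the
    allowed deviation on either side of it.
    """
    # Clamp the lower bound so the effective temperature never goes negative.
    dynatemp_min = max(0.0, temperature - dynatemp_range)
    dynatemp_max = temperature + dynatemp_range
    return dynatemp_min, dynatemp_max

# The example from the commit message: temperature=0.4, dynatemp_range=0.1
lo, hi = dynatemp_bounds(0.4, 0.1)  # lo ≈ 0.3, hi ≈ 0.5
```

The sampler then chooses an effective temperature between these bounds (in DynaTemp, based on the entropy of the token distribution).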
Concedo
e49d398f73
use same struct size for cuda and non cuda (+1 squashed commits)
...
Squashed commits:
[6eee8e2f] use same struct size for cuda and non cuda
2024-01-03 16:05:54 +08:00
Concedo
94e68fe474
added field to show recent seed
2024-01-02 15:35:04 +08:00
Concedo
5e59112de8
prevent other calls when uninitialized
2023-12-28 12:04:53 +08:00
Concedo
2d5d82e915
allocate gpt_params on the heap instead to avoid a rare segfault
2023-12-28 11:48:21 +08:00
DebuggingLife46
e733a9e425
Add logit_bias to the OpenAI API (#577)
...
* Add logit_bias to the OpenAI API
* Cleanup and refactor, test in swagger.
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2023-12-27 00:26:19 +08:00
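The logit_bias field added in the commit above follows the OpenAI API semantics: a map from token id to an additive bias applied to the raw logits before sampling. A minimal sketch of those semantics (the helper name is an assumption, not KoboldCpp's actual code):

```python
def apply_logit_bias(logits: dict[int, float], logit_bias: dict[int, float]) -> dict[int, float]:
    """Add each bias to the corresponding token's raw logit.

    In the OpenAI API, biases range from -100 to 100; -100 effectively
    bans a token and +100 effectively forces it.
    """
    out = dict(logits)  # leave the caller's logits untouched
    for token_id, bias in logit_bias.items():
        if token_id in out:
            out[token_id] += bias
    return out
```

A request would carry something like `"logit_bias": {"15043": -100}` to suppress a specific token id.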
Concedo
8823e8b06d
added presence penalty into lite ui
2023-12-23 10:39:40 +08:00
Concedo
77463e0e9c
batch size improvements
2023-12-22 15:27:40 +08:00
Concedo
3f863eed72
add presence penalty
2023-12-19 23:18:56 +08:00
Concedo
7469f202ea
use lowvram flag for offload qkv
2023-12-08 18:16:14 +08:00
Concedo
ec21fa7712
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .gitignore
# CMakeLists.txt
# Makefile
# Package.swift
# README.md
# ggml-cuda.cu
# llama.cpp
# llama.h
# scripts/sync-ggml.sh
# tests/CMakeLists.txt
2023-12-08 17:42:26 +08:00
Concedo
c7511526a2
noscript mode is done
2023-12-07 00:52:25 +08:00
Concedo
6570a2005b
token count includes ids
2023-12-03 15:44:53 +08:00
Concedo
c142c5634a
fixed segfault with clblast by reverting the commit referenced in issue https://github.com/ggerganov/llama.cpp/issues/4296
2023-12-03 00:56:00 +08:00
Concedo
12f66eaa1d
adjust fragmentation fix
2023-12-02 15:59:08 +08:00
Concedo
a012342a77
updated docs, shifted kv extra space to be subtracted from user's ctx value instead of added on load.
2023-11-30 14:19:40 +08:00
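The accounting change above can be pictured as follows. KV_EXTRA is an assumed placeholder; the real reserve amount is internal to KoboldCpp. Before this commit the slack was added on top of the user's context value at load time; afterwards it is carved out of the user's value, so the allocation never exceeds what the user asked for:

```python
KV_EXTRA = 80  # assumed placeholder for the KV-cache fragmentation slack, in tokens

def context_budget(user_ctx: int) -> tuple[int, int]:
    """Return (allocated_ctx, usable_ctx) under the post-commit scheme."""
    allocated = user_ctx                   # allocation stays at the user's requested value
    usable = max(0, user_ctx - KV_EXTRA)   # generation budget after reserving the slack
    return allocated, usable
```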
Concedo
ba5c33319b
Allocate a small amount of extra context for GGUF to deal with KV fragmentation causing issues in some scenarios.
2023-11-28 20:55:14 +08:00
Concedo
bffa78116d
explore quiet mode
2023-11-26 23:57:27 +08:00
Concedo
a6eb9b8010
Fix GPT2 not loading due to graph too small
2023-11-26 23:06:42 +08:00
Concedo
eb42c73953
revert auto rope scaling for already-ropetuned models - just use their values
2023-11-24 14:20:36 +08:00
Concedo
4d7c14be73
fix stop seq escaping newline
2023-11-20 22:35:45 +08:00