Concedo
f75bbb945f
speculative decoding initial impl completed (+6 squashed commit)
...
Squashed commit:
[0a6306ca0] draft wip dont use (will be squashed)
[a758a1c9c] wip dont use (will be squashed)
[e1994d3ce] wip dont use
[f59690d68] wip
[77228147d] wip on spec decoding. dont use yet
[2445bca54] wip adding speculative decoding (+1 squashed commits)
Squashed commits:
[50e341bb7] wip adding speculative decoding
2024-11-30 10:41:10 +08:00
Concedo
2c1a06a07d
wip ollama emulation, added detokenize endpoint
2024-11-23 22:48:03 +08:00
Concedo
272828cab0
tweaks to chat template
2024-11-21 11:10:30 +08:00
kallewoof
547ab2aebb
API: add /props route (#1222)
...
* API: add an /extra/chat_template route
A lot of manual tweaking is done when swapping between models. We can automate or make better assumptions about some of them by having more information, such as chat template. This PR adds an endpoint /extra/chat_template which returns the model chat template string as is in a 'chat_template' key. The front end can then use this to derive the proper templates or use it as is, or at least warn the user when they are trying to use e.g. a Mistral preset with a Llama 3.1 model.
* switch to pre-established /props endpoint for chat template
* bug-fix (upstream): one-off in string juggling
2024-11-21 10:58:32 +08:00
Concedo
aa26a58085
added logprobs api and logprobs viewer
2024-11-01 00:22:15 +08:00
Concedo
90f5cd0f67
wip logprobs data
2024-10-30 00:59:34 +08:00
Concedo
12fd16bfd4
Merge commit 'df270ef745' into concedo_experimental
...
# Conflicts:
# Makefile
# common/CMakeLists.txt
# common/common.h
# common/sampling.cpp
# common/sampling.h
# examples/infill/infill.cpp
# examples/llama-bench/llama-bench.cpp
# examples/quantize-stats/quantize-stats.cpp
# examples/server/server.cpp
# include/llama.h
# src/llama-sampling.cpp
# src/llama-sampling.h
# src/llama.cpp
# tests/test-grammar-integration.cpp
# tests/test-grammar-parser.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-llama-grammar.cpp
# tests/test-sampling.cpp
2024-09-09 17:10:08 +08:00
Concedo
27bbdf7d2a
added link for novita AI, added legacy warning for old GGML models
2024-09-09 11:19:32 +08:00
Concedo
813cf829b5
allow selecting multigpu on vulkan
2024-06-06 18:36:56 +08:00
Concedo
10b148f4c2
added skip bos for tokenize endpoint
2024-06-05 10:49:11 +08:00
Concedo
f24aef8792
initial whisper integration
2024-05-29 23:13:11 +08:00
Concedo
7968bdebbb
added more stats in perf
2024-03-16 16:53:48 +08:00
Concedo
d943c739a8
wip submitting of llava image to backend
2024-03-10 17:14:27 +08:00
Concedo
5a44d4de2b
refactor and clean identifiers for sd, fix cmake
2024-02-29 18:28:45 +08:00
Concedo
524ba12abd
refactor - do not use a copy buffer to store generation outputs, instead return a cpp allocated ptr
2024-02-29 14:02:20 +08:00
Concedo
f75e479db0
WIP on sdcpp integration
2024-02-29 00:40:07 +08:00
Concedo
4cd571db89
vulkan multigpu, show uptime
2024-02-08 16:54:38 +08:00
Concedo
ec2dbd99a3
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
# README.md
# flake.lock
# llama.cpp
2024-02-07 22:21:32 +08:00
Concedo
481f7a6fbc
warn about unsupported arch
2024-01-26 14:22:43 +08:00
Concedo
2a4a7241e6
Merge branch 'vulkan_test' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
# llama.cpp
2024-01-25 23:01:44 +08:00
Concedo
d9a7bd577a
gpu layer offloading disabled for phi models in clblast
2024-01-25 17:40:05 +08:00
Concedo
94e68fe474
added field to show recent seed
2024-01-02 15:35:04 +08:00
Nexesenex
cf360f3e62
Update expose.cpp '#include <cstdint> (#586)
2023-12-28 15:01:22 +08:00
Concedo
6570a2005b
token count includes ids
2023-12-03 15:44:53 +08:00
Concedo
be92cfa125
added preloadstory
2023-11-10 13:05:22 +08:00
Concedo
7fb809b94b
fixed auto rope scaling (+1 squashed commits)
...
Squashed commits:
[b1767874] wip
2023-09-07 14:45:08 +08:00
Concedo
1301bd7e29
Fix to skip GPU offloading so falcon models work correctly
2023-08-30 18:26:41 +08:00
Concedo
b95a4ccb22
added a token counting endpoint, set mmq as default
2023-08-24 20:41:49 +08:00
Concedo
280abaf029
added stop reason in the perf endpoint
2023-07-24 11:55:35 +08:00
Concedo
39dc1a46c4
added token count, updated lite
2023-07-20 14:41:06 +08:00
Concedo
1d1111e10f
expose timing info in web api
2023-07-11 18:56:06 +08:00
callMeMakerRen
4e46673f80
Merge branch 'LostRuins:concedo' into concedo
2023-07-08 09:33:26 +08:00
shutup
1727e652f1
expose some useful info that can be used in statistics of performance
2023-07-07 11:52:58 +08:00
Concedo
27a0907cfa
backport MM256_SET_M128I to ggml_v2, updated lite, added support for selecting the GPU for cublas
2023-07-06 22:33:46 +08:00
Concedo
66a3f4e421
added support for lora base
2023-06-10 19:29:45 +08:00
Concedo
43f7e40470
added extra endpoints for abort gen and polled streaming
2023-06-10 18:13:26 +08:00
SammCheese
e6231c3055
back to http.server, improved implementation
2023-06-09 12:17:55 +02:00
SammCheese
9a8da35ec4
working streaming. TODO: fix lite
2023-06-08 18:34:23 +02:00
SammCheese
97971291e9
draft: token streaming
2023-06-08 18:34:08 +02:00
Concedo
a6a0fa338a
cleanup indentation, fixing cublas build
2023-06-08 22:40:53 +08:00
Concedo
6f82e17b7a
added MPT support
2023-06-03 16:14:08 +08:00
Concedo
5d9f5b28a6
rwkv integration completed
2023-05-28 00:48:56 +08:00
Concedo
981d5ba866
Merge remote-tracking branch 'occam/opencl-dev' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# README.md
# ggml-opencl.cpp
# llama.cpp
# otherarch/ggml_v2-opencl-legacy.c
2023-05-22 16:16:48 +08:00
Concedo
75e4548821
missed out gpt2
2023-05-21 01:44:47 +08:00
Concedo
c048bcfec4
remove old filever checks (+7 squashed commit)
...
Squashed commit:
[b72627a] new format not working
[e568870] old ver works
[7053b77] compile errors fixed, fixing linkers
[4ae8889] add new ver
[ff82dfd] file format checks
[25b8aa8] refactoring type names
[931063b] still merging
2023-05-21 00:15:39 +08:00
Concedo
b692e4d2a4
wip
2023-05-14 17:21:07 +08:00
Concedo
2f2eff6e13
the dark gods have been sated, and redpajama is integrated... but at what cost?
2023-05-08 20:58:00 +08:00
Concedo
ff93b394da
fixed a typo
2023-05-06 12:37:34 +08:00
Concedo
2edbcebe27
added optional force versioning flag
2023-05-05 22:02:00 +08:00
Concedo
0fc1772a8f
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
# README.md
# ggml.c
2023-04-29 11:14:05 +08:00