djw
|
3f9bbf1181
|
support qwen3, dont speak human language
|
2025-04-28 08:44:47 +00:00 |
|
chenht2022
|
f3d842a0ca
|
support AMX
|
2025-04-25 14:47:16 +00:00 |
|
wang jiahao
|
b90362b5e6
|
Merge pull request #1198 from kvcache-ai/fix-max_new_tokens
fix load default max_new_tokens
|
2025-04-25 12:22:41 +08:00 |
|
qiyuxinlin
|
7af83f9efb
|
fix load default max_new_tokens
|
2025-04-25 04:20:12 +00:00 |
|
Atream
|
67042d11e3
|
Merge pull request #1193 from kvcache-ai/fix-chat-template-encoding
fix chat template encoding
|
2025-04-23 22:44:46 -06:00 |
|
Atream
|
46493789eb
|
fix chat template encoding
|
2025-04-24 12:44:16 +08:00 |
|
wang jiahao
|
449a83dff6
|
Merge pull request #1183 from kvcache-ai/check-para
add check-para
|
2025-04-23 16:27:18 +08:00 |
|
Alisehen
|
f7d939313b
|
Merge remote-tracking branch 'origin/main' into check-para
|
2025-04-23 02:40:14 +00:00 |
|
Alisehen
|
99540ad01f
|
add check parameters
|
2025-04-23 02:38:43 +00:00 |
|
wang jiahao
|
7e4813e8ad
|
Merge pull request #1184 from kvcache-ai/update_param
change test
|
2025-04-22 20:55:11 +08:00 |
|
qiyuxinlin
|
3a044e6b14
|
change test
|
2025-04-22 12:50:39 +00:00 |
|
Alisehen
|
c995bdbbfa
|
add check-para
|
2025-04-22 09:30:08 +00:00 |
|
wang jiahao
|
739358789e
|
Merge pull request #1182 from kvcache-ai/fix-kill-balance_serve
kill serve lead to kill sched and engine
|
2025-04-22 17:28:06 +08:00 |
|
qiyuxinlin
|
4f9950e30c
|
kill serve lead to kill sched and engine
|
2025-04-22 09:25:44 +00:00 |
|
wang jiahao
|
4c41f3a35f
|
Merge pull request #1180 from kvcache-ai/update_param
update speed test
|
2025-04-22 15:39:57 +08:00 |
|
qiyuxinlin
|
b17ab8653c
|
update speed test
|
2025-04-22 07:38:05 +00:00 |
|
wang jiahao
|
485588017b
|
Merge pull request #1177 from kvcache-ai/update_param
Update param
|
2025-04-22 10:14:36 +08:00 |
|
qiyuxinlin
|
f5287e908a
|
fix no balance_serve import error
|
2025-04-22 02:11:18 +00:00 |
|
qiyuxinlin
|
03a65d6bea
|
roll back ktransformers backend, add max_tokens, max_completion_tokens param
|
2025-04-21 12:55:37 +00:00 |
|
wang jiahao
|
a1162eea01
|
Merge pull request #1158 from Creeper-MZ/function_call
Update Function call
|
2025-04-19 16:31:37 +08:00 |
|
Creeper-MZ
|
133ba746e9
|
Optimize prompts; fix some DeepSeek R1 compatibility issues
fix non stream
|
2025-04-19 01:20:27 -04:00 |
|
Atream
|
34c199403b
|
Merge pull request #1170 from onepick/fix-cmake-error
Fix cmake config error
|
2025-04-18 07:51:03 -06:00 |
|
wang jiahao
|
0892d37d2d
|
Merge pull request #1172 from kvcache-ai/move_create_sched
Move KV cache creation to balance_serve
|
2025-04-18 18:19:29 +08:00 |
|
qiyuxinlin
|
38e841900d
|
Move KV cache creation to balance_serve
|
2025-04-18 10:10:07 +00:00 |
|
onepick
|
c5edd3fdf0
|
Fix cmake config error
Signed-off-by: onepick <jiajuku12@163.com>
|
2025-04-18 15:43:03 +08:00 |
|
Atream
|
e44c45e782
|
Merge pull request #1163 from cyhasuka/main
Enh: Make Ollama perf data more accurate, consistent with OpenAI's implementation
|
2025-04-18 00:50:58 -06:00 |
|
Atream
|
08f0bd5e13
|
Merge pull request #1168 from kvcache-ai/Atream-patch-1
remove hard code max_length
|
2025-04-17 22:40:28 -06:00 |
|
Atream
|
e6fb4d5a58
|
remove hard code max_length
|
2025-04-18 12:11:18 +08:00 |
|
Jianwei Dong
|
22a30d707d
|
Merge pull request #1167 from kvcache-ai/update-llama4-tutorial-patch-1
update llama4 tutorial
|
2025-04-18 11:44:11 +08:00 |
|
djw
|
dfaf2b20fb
|
update llama4 tutorial
|
2025-04-18 03:42:48 +00:00 |
|
Creeper-MZ
|
62c4023160
|
Fixed #1155
|
2025-04-17 10:21:51 -04:00 |
|
Yuhao Tsui
|
eff5bbc202
|
Merge branch 'kvcache-ai:main' into main
|
2025-04-17 22:01:31 +08:00 |
|
Creeper-MZ
|
4fb19bfcae
|
Update chat.py
|
2025-04-17 09:19:14 -04:00 |
|
ZiWei Yuan
|
8770b6d573
|
Merge pull request #1159 from onepick/fix-rocm-build-error
Fix some build error for ROCM
|
2025-04-17 19:57:44 +08:00 |
|
onepick
|
6a7624fe4a
|
Change the logic to build device since cuda is as default
Signed-off-by: onepick <jiajuku12@163.com>
|
2025-04-17 19:44:05 +08:00 |
|
Yuhao Tsui
|
8ce34b3b5c
|
Modify the performance calculation module
Modify the performance data calculation module from estimation to retrieving from `raw_usage`.
|
2025-04-17 16:57:53 +08:00 |
|
wang jiahao
|
6e4da83d4b
|
Merge pull request #978 from cyhasuka/main
Feat: Support Non-streaming chat in Ollama backend
|
2025-04-17 14:34:35 +08:00 |
|
wang jiahao
|
b055132369
|
Merge pull request #1154 from 344303947/features/add-function-calling
Fix the error caused by the client not passing temperature and top_p being empty
|
2025-04-17 14:31:02 +08:00 |
|
onepick
|
97f1995696
|
Fix some build error for ROCM
1. Fix terrible logic in CMakeLists.txt
2. using the correct typedef for hip
Signed-off-by: onepick <jiajuku12@163.com>
|
2025-04-17 11:34:33 +08:00 |
|
Creeper-MZ
|
cb266c98d4
|
Fix a bug
|
2025-04-16 23:31:33 -04:00 |
|
wang jiahao
|
3efb66213b
|
Merge pull request #1157 from jiangshibiao/dev-fix-bug
Add bsz_tensors param to torch linear
|
2025-04-17 10:11:01 +08:00 |
|
Creeper-MZ
|
6bc2e85343
|
Update chat.py
|
2025-04-16 15:54:23 -04:00 |
|
Creeper-MZ
|
88f688e2c8
|
Change token injection logic to reduce the number of injected tokens and prevent forgetting
Update chat.py
|
2025-04-16 15:52:24 -04:00 |
|
root
|
921061666c
|
fix some bugs
|
2025-04-17 00:48:09 +08:00 |
|
kevin
|
c8db24d5eb
|
Update config.py
|
2025-04-16 17:32:08 +08:00 |
|
kevin
|
badf7a1bb1
|
Merge branch 'kvcache-ai:main' into features/add-function-calling
|
2025-04-16 17:21:27 +08:00 |
|
Chengyu Qiu
|
d2cf81423f
|
Merge pull request #1135 from Creeper-MZ/function_call
Feat: Add Function call support
|
2025-04-16 09:57:22 +08:00 |
|
ZiWei Yuan
|
fcbd41e175
|
Merge pull request #1143 from jizhilong/improve-cmake-subprocess-output
feat(build): display limited tail of subprocesses in real time
|
2025-04-15 17:37:44 +08:00 |
|
jizhilong
|
0638ea298d
|
feat(build): display limited tail of subprocesses in real time
this is a followup on #1108
|
2025-04-15 16:40:38 +08:00 |
|
ZiWei Yuan
|
8dc1ab9e04
|
Merge pull request #1108 from jizhilong/expose-cmake-logs
chore: show cmake output in real time during build_ext
|
2025-04-14 17:07:00 +08:00 |
|