qiyuxinlin
4f9950e30c
killing serve leads to killing sched and engine
2025-04-22 09:25:44 +00:00
wang jiahao
4c41f3a35f
Merge pull request #1180 from kvcache-ai/update_param
...
update speed test
2025-04-22 15:39:57 +08:00
qiyuxinlin
b17ab8653c
update speed test
2025-04-22 07:38:05 +00:00
wang jiahao
485588017b
Merge pull request #1177 from kvcache-ai/update_param
...
Update param
2025-04-22 10:14:36 +08:00
qiyuxinlin
f5287e908a
fix no balance_serve import error
2025-04-22 02:11:18 +00:00
qiyuxinlin
03a65d6bea
roll back ktransformers backend, add max_tokens, max_completion_tokens param
2025-04-21 12:55:37 +00:00
wang jiahao
a1162eea01
Merge pull request #1158 from Creeper-MZ/function_call
...
Update Function call
2025-04-19 16:31:37 +08:00
Creeper-MZ
133ba746e9
Optimize the prompt and resolve some Deepseek r1 compatibility issues
...
Optimize the prompt and resolve some Deepseek r1 compatibility issues
fix non stream
2025-04-19 01:20:27 -04:00
Atream
34c199403b
Merge pull request #1170 from onepick/fix-cmake-error
...
Fix cmake config error
2025-04-18 07:51:03 -06:00
wang jiahao
0892d37d2d
Merge pull request #1172 from kvcache-ai/move_create_sched
...
Move KV cache creation to balance_serve
2025-04-18 18:19:29 +08:00
qiyuxinlin
38e841900d
Move KV cache creation to balance_serve
2025-04-18 10:10:07 +00:00
onepick
c5edd3fdf0
Fix cmake config error
...
Signed-off-by: onepick <jiajuku12@163.com>
2025-04-18 15:43:03 +08:00
Atream
e44c45e782
Merge pull request #1163 from cyhasuka/main
...
Enh: Make Ollama perf data more accurate, consistent with OpenAI's implementation
2025-04-18 00:50:58 -06:00
Atream
08f0bd5e13
Merge pull request #1168 from kvcache-ai/Atream-patch-1
...
remove hard code max_length
2025-04-17 22:40:28 -06:00
Atream
e6fb4d5a58
remove hard code max_length
2025-04-18 12:11:18 +08:00
Jianwei Dong
22a30d707d
Merge pull request #1167 from kvcache-ai/update-llama4-tutorial-patch-1
...
update llama4 tutorial
2025-04-18 11:44:11 +08:00
djw
dfaf2b20fb
update llama4 tutorial
2025-04-18 03:42:48 +00:00
Creeper-MZ
62c4023160
Fixed #1155
2025-04-17 10:21:51 -04:00
Yuhao Tsui
eff5bbc202
Merge branch 'kvcache-ai:main' into main
2025-04-17 22:01:31 +08:00
Creeper-MZ
4fb19bfcae
Update chat.py
2025-04-17 09:19:14 -04:00
ZiWei Yuan
8770b6d573
Merge pull request #1159 from onepick/fix-rocm-build-error
...
Fix some build error for ROCM
2025-04-17 19:57:44 +08:00
onepick
6a7624fe4a
Change the device build logic, since cuda is the default
...
Signed-off-by: onepick <jiajuku12@163.com>
2025-04-17 19:44:05 +08:00
Yuhao Tsui
8ce34b3b5c
Modify the performance calculation module
...
Modify the performance data calculation module from estimation to retrieving from `raw_usage`.
2025-04-17 16:57:53 +08:00
wang jiahao
6e4da83d4b
Merge pull request #978 from cyhasuka/main
...
Feat: Support Non-streaming chat in Ollama backend
2025-04-17 14:34:35 +08:00
wang jiahao
b055132369
Merge pull request #1154 from 344303947/features/add-function-calling
...
Fix the error caused by the client not passing temperature or top_p, leaving them empty
2025-04-17 14:31:02 +08:00
onepick
97f1995696
Fix some build error for ROCM
...
1. Fix terrible logic in CMakeLists.txt
2. Use the correct typedef for hip
Signed-off-by: onepick <jiajuku12@163.com>
2025-04-17 11:34:33 +08:00
Creeper-MZ
cb266c98d4
Fix a bug
2025-04-16 23:31:33 -04:00
wang jiahao
3efb66213b
Merge pull request #1157 from jiangshibiao/dev-fix-bug
...
Add bsz_tensors param to torch linear
2025-04-17 10:11:01 +08:00
Creeper-MZ
6bc2e85343
Update chat.py
2025-04-16 15:54:23 -04:00
Creeper-MZ
88f688e2c8
Change the token injection logic to reduce the amount of injected tokens and prevent forgetting
...
Update chat.py
Update chat.py
Update chat.py
2025-04-16 15:52:24 -04:00
root
921061666c
fix some bugs
2025-04-17 00:48:09 +08:00
kevin
c8db24d5eb
Update config.py
...
Update config.py
2025-04-16 17:32:08 +08:00
kevin
badf7a1bb1
Merge branch 'kvcache-ai:main' into features/add-function-calling
2025-04-16 17:21:27 +08:00
Chengyu Qiu
d2cf81423f
Merge pull request #1135 from Creeper-MZ/function_call
...
Feat: Add Function call support
2025-04-16 09:57:22 +08:00
ZiWei Yuan
fcbd41e175
Merge pull request #1143 from jizhilong/improve-cmake-subprocess-output
...
feat(build): display limited tail of subprocesses in real time
2025-04-15 17:37:44 +08:00
jizhilong
0638ea298d
feat(build): display limited tail of subprocesses in real time
...
this is a followup on #1108
2025-04-15 16:40:38 +08:00
ZiWei Yuan
8dc1ab9e04
Merge pull request #1108 from jizhilong/expose-cmake-logs
...
chore: show cmake output in real time during build_ext
2025-04-14 17:07:00 +08:00
sean.su
8699109129
Refactor the chat interface to support tool calling and parameter processing
...
Defined new data structures in chat.py to replace OpenAI's original implementation, adding support for tool calling.
Implemented logic for extracting and processing tool calls, enabling dynamic function invocation during conversations.
Added methods in balance_serve.py to retrieve sampling parameters, handling default values and edge cases.
Updated ktransformers.py and transformers.py to support the passing of tool parameters.
Modified the default value of top_p in config.py to 1.0 to increase generation diversity.
Extended the message model in chat.py to support the transmission of tool call information.
These changes enhance the system's flexibility and functionality, enabling more complex interaction patterns.
2025-04-14 15:23:37 +08:00
Creeper-MZ
a7e8d7c1af
update function_call
2025-04-13 23:48:51 -04:00
wang jiahao
038db30ec9
Merge pull request #1132 from wangkuigang-yewu-cmss/long-prompt-crash
...
Prevent the rpc process from crashing when a long prompt is used
2025-04-13 22:06:11 +08:00
wangkuigang-yewu-cmss
4538bdae97
prevent rpc process from crashing on long prompt
...
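When the prompt exceeds cache_len, the rpc process crashes, making the whole service unavailable.
This adds a check so that overly long prompts are filtered out early in the request.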
2025-04-13 16:13:16 +08:00
ErvinXie
797dac7e31
Merge pull request #1109 from aubreyli/libxxhash-fPIC
...
xxHash: fix link error due to non-position-independent code
2025-04-13 14:15:31 +08:00
ZiWei Yuan
77956822ce
Merge pull request #1116 from ikawrakow/ik/add_copyright
...
Add missing references to ik_llama.cpp
2025-04-13 11:53:12 +08:00
Iwan Kawrakow
99a247e167
Spelling
2025-04-11 10:15:42 +03:00
Iwan Kawrakow
c46b0c59d0
Add missing references to ik_llama.cpp
2025-04-11 09:39:57 +03:00
Aubrey Li
63ca2fa84d
xxHash: fix link error due to non-position-independent code
...
Add PROPERTIES POSITION_INDEPENDENT_CODE option to fix the
following error:
/usr/bin/ld: ../../third_party/xxHash/libxxhash.a(xxhash.c.o):
relocation R_X86_64_32S against `.rodata' can not be used when
making a shared object; recompile with -fPIC
Trying to link a non-PIC static library libxxhash.a into a
.so shared library, which is not allowed. The object file
xxhash.c.o must be recompiled with explicit -fPIC support.
2025-04-10 21:50:23 +08:00
jizhilong
690d4d42f9
chore: show cmake output in real time during build_ext
...
otherwise cmake error messages may be suppressed, making debugging
difficult
2025-04-10 21:33:04 +08:00
Atream
35ba63e259
Merge pull request #1103 from kvcache-ai/Atream-patch-6
...
Create SECURITY.md
2025-04-09 19:50:57 +08:00
Atream
5f8cdc7640
Create SECURITY.md
2025-04-09 19:50:38 +08:00
Atream
92a67ab549
Merge pull request #1101 from kvcache-ai/Atream-patch-5
...
Update llama4.md
2025-04-09 19:23:46 +08:00