Jesse
e204a0bb6b
Merge 8c8cb207aa
into ee2ede0412
2025-08-05 15:24:17 +08:00
qiyuxinlin
9e1560bb82
GLM4 and SmallThinker
2025-07-25 16:56:36 +00:00
djw
17246bf84f
support smt and glm4
2025-07-25 15:03:27 +00:00
djw
48bc6185b5
support smt and glm4
2025-07-25 12:48:51 +00:00
qiyuxinlin
712ad1fa3c
smallthinker right
2025-07-25 12:46:14 +00:00
qiyuxinlin
71c1d4eed7
smallthink run
2025-07-24 15:08:29 +00:00
djw
590fcb41cd
support smt and glm4
2025-07-24 12:31:01 +00:00
djw
b66d96db97
support smt and glm4
2025-07-24 08:40:58 +00:00
wang jiahao
a2e95e467a
Update balance_serve.py
2025-07-12 13:14:35 +08:00
Jesse CreateThis
8c8cb207aa
Apply magikRUKKOLA's patch from issue #1417
2025-07-06 19:45:06 +00:00
ouqingliang
90cff820cf
update kvc disk path config.
2025-06-30 15:09:35 +00:00
Ye Zhou
00949d5e8d
Mirror #1247 in server mode
2025-05-29 15:30:40 +08:00
qiyuxinlin
b40f13abeb
fix deduplicate_and_sort cudagraphs
2025-05-15 04:09:34 +00:00
qiyuxinlin
e8e83308a9
fix flashinfer float_workspace_buffer being too small
2025-05-14 09:33:52 +00:00
qiyuxinlin
c6aa379de2
support safetensor load, delete architectures argument
2025-05-09 10:38:29 +00:00
Atream
7adb7281f4
fix-cache-lens
2025-04-30 03:37:43 +00:00
qiyuxinlin
27990dc6fb
fix load bug
2025-04-28 21:08:13 +00:00
djw
33cbd47086
support qwen3
2025-04-28 18:15:35 +00:00
djw
0da3792b27
support qwen3
2025-04-28 14:05:24 +00:00
djw
3f9bbf1181
support qwen3, doesn't speak human language
2025-04-28 08:44:47 +00:00
qiyuxinlin
7af83f9efb
fix load default max_new_tokens
2025-04-25 04:20:12 +00:00
Atream
46493789eb
fix chat template encoding
2025-04-24 12:44:16 +08:00
Alisehen
f7d939313b
Merge remote-tracking branch 'origin/main' into check-para
2025-04-23 02:40:14 +00:00
Alisehen
99540ad01f
add check parameters
2025-04-23 02:38:43 +00:00
Alisehen
c995bdbbfa
add check-para
2025-04-22 09:30:08 +00:00
qiyuxinlin
4f9950e30c
killing serve now also kills sched and engine
2025-04-22 09:25:44 +00:00
qiyuxinlin
b17ab8653c
update speed test
2025-04-22 07:38:05 +00:00
qiyuxinlin
03a65d6bea
roll back ktransformers backend, add max_tokens, max_completion_tokens param
2025-04-21 12:55:37 +00:00
wang jiahao
a1162eea01
Merge pull request #1158 from Creeper-MZ/function_call
Update Function call
2025-04-19 16:31:37 +08:00
Creeper-MZ
133ba746e9
Optimize the prompts to fix some compatibility issues with Deepseek r1
fix non stream
2025-04-19 01:20:27 -04:00
wang jiahao
0892d37d2d
Merge pull request #1172 from kvcache-ai/move_create_sched
Move KV cache creation to balance_serve
2025-04-18 18:19:29 +08:00
qiyuxinlin
38e841900d
Move KV cache creation to balance_serve
2025-04-18 10:10:07 +00:00
Atream
e44c45e782
Merge pull request #1163 from cyhasuka/main
Enh: Make Ollama perf data more accurate, consistent with OpenAI's implementation
2025-04-18 00:50:58 -06:00
Atream
e6fb4d5a58
remove hard code max_length
2025-04-18 12:11:18 +08:00
Creeper-MZ
62c4023160
Fixed #1155
2025-04-17 10:21:51 -04:00
Creeper-MZ
4fb19bfcae
Update chat.py
2025-04-17 09:19:14 -04:00
Yuhao Tsui
8ce34b3b5c
Modify the performance calculation module
Modify the performance data calculation module from estimation to retrieving from `raw_usage`.
2025-04-17 16:57:53 +08:00
wang jiahao
6e4da83d4b
Merge pull request #978 from cyhasuka/main
Feat: Support Non-streaming chat in Ollama backend
2025-04-17 14:34:35 +08:00
Creeper-MZ
cb266c98d4
Fix a bug
2025-04-16 23:31:33 -04:00
Creeper-MZ
6bc2e85343
Update chat.py
2025-04-16 15:54:23 -04:00
Creeper-MZ
88f688e2c8
Change the token injection logic to reduce the number of injected tokens and prevent forgetting
Update chat.py
Update chat.py
Update chat.py
2025-04-16 15:52:24 -04:00
kevin
c8db24d5eb
Update config.py
2025-04-16 17:32:08 +08:00
sean.su
8699109129
Refactor the chat interface to support tool calling and parameter processing
Defined new data structures in chat.py to replace OpenAI's original implementation, adding support for tool calling.
Implemented logic for extracting and processing tool calls, enabling dynamic function invocation during conversations.
Added methods in balance_serve.py to retrieve sampling parameters, handling default values and edge cases.
Updated ktransformers.py and transformers.py to support the passing of tool parameters.
Modified the default value of top_p in config.py to 1.0 to increase generation diversity.
Extended the message model in chat.py to support the transmission of tool call information.
These changes enhance the system's flexibility and functionality, enabling more complex interaction patterns.
2025-04-14 15:23:37 +08:00
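Note on the entry above: the commit says sampling parameters are retrieved with default values and edge cases handled, and later entries in this log fix `temperature=0` and `top_p = 0`. The sketch below shows one way such a resolver could look. It is not the code in balance_serve.py: the function name, the `default_temperature` value, and the `eps` clamp are assumptions made for illustration; only the `top_p` default of 1.0 comes from the commit message.

```python
def resolve_sampling_params(temperature=None, top_p=None,
                            default_temperature=0.6,  # assumed value, not from the log
                            default_top_p=1.0,        # default stated in the commit message
                            eps=1e-4):                # assumed clamp for zero values
    """Return (temperature, top_p) with defaults applied and edge cases clamped.

    Hypothetical helper: illustrates the idea only.
    """
    t = default_temperature if temperature is None else float(temperature)
    p = default_top_p if top_p is None else float(top_p)

    # A zero (or negative) temperature would divide logits by zero,
    # so clamp it to a small positive value (near-greedy sampling).
    if t <= 0:
        t = eps

    # top_p must lie in (0, 1]; clamp both ends.
    if p <= 0:
        p = eps
    p = min(p, 1.0)

    return t, p
```

Clamping rather than rejecting keeps requests with `temperature=0` working as near-greedy decoding; whether the project clamps or special-cases these values is not stated in the log.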
Creeper-MZ
a7e8d7c1af
update function_call
2025-04-13 23:48:51 -04:00
wangkuigang-yewu-cmss
4538bdae97
prevent rpc process from crashing on long prompt
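When the prompt exceeds cache_len, the rpc process crashes, making the whole service unavailable.
This adds a check so that overlong prompts are filtered out early in the request.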
2025-04-13 16:13:16 +08:00
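Note on the entry above: the early length check it describes can be sketched as follows. This is a minimal illustration, not the project's code: `check_prompt_length` and `PromptTooLongError` are hypothetical names, and the prompt is assumed to be already tokenized into a list of ids. Only `cache_len` appears in the commit message.

```python
class PromptTooLongError(ValueError):
    """Raised when a prompt has more tokens than the KV cache can hold."""


def check_prompt_length(prompt_token_ids, cache_len):
    """Reject an overlong prompt before it reaches the rpc process.

    Hypothetical helper: returns the token count if the prompt fits,
    otherwise raises PromptTooLongError.
    """
    n_tokens = len(prompt_token_ids)
    if n_tokens > cache_len:
        raise PromptTooLongError(
            f"prompt has {n_tokens} tokens, which exceeds cache_len={cache_len}"
        )
    return n_tokens
```

A server handler could catch this exception and return a client error, so that a single overlong request fails on its own instead of taking down the backend process.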
Yuhao Tsui
877aec858e
Merge branch 'kvcache-ai:main' into main
2025-04-09 11:46:39 +08:00
qiyuxinlin
64de784328
format kvc2, delete quant_configs, move model_configs to ~/.ktransformers
2025-04-08 10:06:07 +00:00
dongjw
ec03bcbd7f
fix temperature=0, flashinfer sample error
2025-04-07 12:30:47 +08:00
Qin's repo
2c3a3a1e1c
solve [Bug] #1023
Only modified the mixed single and double quotes in server/config/config.py
2025-04-03 14:37:32 +08:00
dongjw
5c7ed7b579
fix top_p = 0 bug
2025-04-01 20:38:33 +08:00