Commit graph

100 commits

Author SHA1 Message Date
qiyuxinlin
b17ab8653c update speed test 2025-04-22 07:38:05 +00:00
qiyuxinlin
03a65d6bea roll back ktransformers backend, add max_tokens, max_completion_tokens param 2025-04-21 12:55:37 +00:00
wang jiahao
a1162eea01
Merge pull request #1158 from Creeper-MZ/function_call
Update Function call
2025-04-19 16:31:37 +08:00
Creeper-MZ
133ba746e9 Optimize prompts to resolve some Deepseek r1 compatibility issues

fix non stream
2025-04-19 01:20:27 -04:00
wang jiahao
0892d37d2d
Merge pull request #1172 from kvcache-ai/move_create_sched
Move KV cache creation to balance_serve
2025-04-18 18:19:29 +08:00
qiyuxinlin
38e841900d Move KV cache creation to balance_serve 2025-04-18 10:10:07 +00:00
Atream
e44c45e782
Merge pull request #1163 from cyhasuka/main
Enh: Make Ollama perf data more accurate, consistent with OpenAI's implementation
2025-04-18 00:50:58 -06:00
Atream
e6fb4d5a58
remove hard code max_length 2025-04-18 12:11:18 +08:00
Creeper-MZ
62c4023160 Fixed #1155 2025-04-17 10:21:51 -04:00
Creeper-MZ
4fb19bfcae Update chat.py 2025-04-17 09:19:14 -04:00
Yuhao Tsui
8ce34b3b5c
Modify the performance calculation module
Modify the performance data calculation module from estimation to retrieving from `raw_usage`.
2025-04-17 16:57:53 +08:00
wang jiahao
6e4da83d4b
Merge pull request #978 from cyhasuka/main
Feat: Support Non-streaming chat in Ollama backend
2025-04-17 14:34:35 +08:00
Creeper-MZ
cb266c98d4 Fix a bug 2025-04-16 23:31:33 -04:00
Creeper-MZ
6bc2e85343 Update chat.py 2025-04-16 15:54:23 -04:00
Creeper-MZ
88f688e2c8 Change the token injection logic to reduce the number of injected tokens and prevent forgetting
Update chat.py
2025-04-16 15:52:24 -04:00
kevin
c8db24d5eb
Update config.py
2025-04-16 17:32:08 +08:00
sean.su
8699109129 Refactor the chat interface to support tool calling and parameter processing
Defined new data structures in chat.py to replace OpenAI's original implementation, adding support for tool calling.

Implemented logic for extracting and processing tool calls, enabling dynamic function invocation during conversations.

Added methods in balance_serve.py to retrieve sampling parameters, handling default values and edge cases.

Updated ktransformers.py and transformers.py to support the passing of tool parameters.

Modified the default value of top_p in config.py to 1.0 to increase generation diversity.

Extended the message model in chat.py to support the transmission of tool call information.

These changes enhance the system's flexibility and functionality, enabling more complex interaction patterns.
2025-04-14 15:23:37 +08:00
Creeper-MZ
a7e8d7c1af update function_call 2025-04-13 23:48:51 -04:00
wangkuigang-yewu-cmss
4538bdae97 prevent rpc process from crashing on long prompt
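When the prompt exceeds cache_len, the rpc process crashes and the whole service becomes unavailable.
This adds a check so that overly long prompts are filtered out early in the request.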
2025-04-13 16:13:16 +08:00
Yuhao Tsui
877aec858e
Merge branch 'kvcache-ai:main' into main 2025-04-09 11:46:39 +08:00
qiyuxinlin
64de784328 format kvc2, delete quant_configs, move model_configs to ~/.ktransformers 2025-04-08 10:06:07 +00:00
dongjw
ec03bcbd7f fix temperature=0, flashinfer sample error 2025-04-07 12:30:47 +08:00
Qin's repo
2c3a3a1e1c
solve [Bug] #1023
Only modified the mixed single and double quotes in server/config/config.py
2025-04-03 14:37:32 +08:00
dongjw
5c7ed7b579 fix top_p = 0 bug 2025-04-01 20:38:33 +08:00
Azure-Tang
31677181c3 Fix ktransformers-server flashinfer wrapper position arg issue;
Fix db position issue
2025-04-01 07:30:23 +00:00
Atream
25cee5810e add balance-serve, support concurrency 2025-03-31 22:55:32 +08:00
Yuhao Tsui
84164f584c
Update completions.py 2025-03-26 15:39:46 +08:00
Yuhao Tsui
e5694f91c0
Merge branch 'kvcache-ai:main' into main 2025-03-10 09:10:28 +08:00
Atream
09c043d8a6
Merge pull request #842 from BITcyman/fix-openai_chat_completion
[fix] thread context bug
2025-03-07 22:56:19 +08:00
BITcyman
08a8b553d6 [fix] thread context bug 2025-03-07 14:52:16 +00:00
Atream
d453c320f1 fix flashinfer precision 2025-03-07 14:07:00 +00:00
BITcyman
299c4dca64 [update] support openai chat completion api 2025-03-07 08:51:09 +00:00
Yuhao Tsui
d050d8655f
Update completions.py 2025-03-06 11:16:33 +08:00
chenmz00
b2ba795cfd
fix: list models API
Fix the list models API to match the corresponding OpenAI API format.
2025-03-05 21:49:27 +08:00
Azure
662c1e4c14 small fix about max new token 2025-03-05 09:25:41 +00:00
wang jiahao
48b9800790
Merge pull request #759 from 3wweiweiwu/fix_top_p_typo
fix typo for top_p
2025-03-02 13:58:11 +08:00
1668068727@qq.com
7cdf8139f0 fix ollama api temperature bug 2025-03-02 13:55:26 +08:00
Wix Woo
3aa0cfc29d fix typo for top_p 2025-03-01 20:15:36 +00:00
Atream
ca1dc1e7d1
Merge branch 'main' into main 2025-03-01 23:24:10 +08:00
Atream
fa03ea48dd Merge branch 'main' into feat-chunk-prefill-flashinfer 2025-03-01 11:35:09 +00:00
Atream
f35e8d41d8 support chunk prefill, support 139K context for 24G VRAM 2025-03-01 11:28:25 +00:00
liam
80e0536fb0 Merge branch 'main' of https://github.com/KMSorSMS/ktransformers into main 2025-03-01 00:12:21 +08:00
liam
8ddc990668 fix server cache lens 2025-03-01 00:09:57 +08:00
qiyuxinlin
22df52e94e fix temperature 2025-02-27 21:00:44 +08:00
lazymio
b121ca4df8
Fix according to upstream changes 2025-02-27 18:11:35 +08:00
wang jiahao
26f7b4af11
Merge branch 'main' into temperature_top_p_from_request 2025-02-27 18:08:55 +08:00
Atream
798e1d0cfa
Merge pull request #532 from xv44586/fix-sse-formatting
fix: fix SSE formatting
2025-02-27 12:19:23 +08:00
Atream
f403cde6d4
Merge pull request #650 from ceerRep/main
feat: basic api key support
2025-02-27 12:16:53 +08:00
swu-hyk
ec7e912fee modify 2025-02-26 19:21:30 +08:00
swu-hyk
68e7df3a25 implementation of chat routing for Ollama 2025-02-26 17:05:00 +08:00