Commit graph

37 commits

Author SHA1 Message Date
sean.su
8699109129 Refactor the chat interface to support tool calling and parameter processing
Defined new data structures in chat.py to replace OpenAI's original implementation, adding support for tool calling.

Implemented logic for extracting and processing tool calls, enabling dynamic function invocation during conversations.

Added methods in balance_serve.py to retrieve sampling parameters, handling default values and edge cases.

Updated ktransformers.py and transformers.py to support the passing of tool parameters.

Modified the default value of top_p in config.py to 1.0 to increase generation diversity.

Extended the message model in chat.py to support the transmission of tool call information.

These changes enhance the system's flexibility and functionality, enabling more complex interaction patterns.
2025-04-14 15:23:37 +08:00
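The tool-call message model and extraction logic described in this commit could look roughly like the following sketch. All names here (`FunctionCall`, `ToolCall`, `ChatMessage`, `extract_tool_calls`) are illustrative stand-ins following the OpenAI chat schema, not the actual definitions in chat.py:

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FunctionCall:
    name: str
    arguments: str  # JSON-encoded argument string, as in the OpenAI schema


@dataclass
class ToolCall:
    id: str
    function: FunctionCall
    type: str = "function"


@dataclass
class ChatMessage:
    # Message model extended with tool-call information.
    role: str
    content: Optional[str] = None
    tool_calls: List[ToolCall] = field(default_factory=list)


def extract_tool_calls(raw: str) -> List[ToolCall]:
    """Parse a model response that encodes tool calls as a JSON array.

    Returns an empty list when the response is not valid JSON,
    so plain-text replies pass through untouched.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []
    return [
        ToolCall(
            id=f"call_{i}",
            function=FunctionCall(
                name=call["name"],
                arguments=json.dumps(call.get("arguments", {})),
            ),
        )
        for i, call in enumerate(payload)
    ]
```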
dongjw
5c7ed7b579 fix top_p = 0 bug 2025-04-01 20:38:33 +08:00
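A `top_p` of 0 would mask out every token, which is the kind of edge case this fix and the sampling-parameter handling in balance_serve.py address. A minimal sketch of such defaulting logic (function and default values are assumptions, not the repo's actual code):

```python
def resolve_sampling_params(temperature=None, top_p=None,
                            default_temperature=0.6, default_top_p=1.0):
    """Fill in defaults and clamp request-supplied sampling parameters."""
    # temperature == 0 conventionally means greedy decoding, so keep it;
    # only missing or negative values fall back to the default.
    if temperature is None or temperature < 0:
        temperature = default_temperature
    # top_p == 0 would filter out every token, so treat 0/None as "use default".
    if top_p is None or top_p <= 0:
        top_p = default_top_p
    return temperature, min(top_p, 1.0)
```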
Azure-Tang
31677181c3 Fix ktransformers-server flashinfer wrapper positional-arg issue; fix db position issue 2025-04-01 07:30:23 +00:00

Atream
d453c320f1 fix flashinfer precision 2025-03-07 14:07:00 +00:00
BITcyman
299c4dca64 [update] support openai chat completion api 2025-03-07 08:51:09 +00:00
Azure
662c1e4c14 small fix about max new token 2025-03-05 09:25:41 +00:00
Atream
ca1dc1e7d1 Merge branch 'main' into main 2025-03-01 23:24:10 +08:00
Atream
fa03ea48dd Merge branch 'main' into feat-chunk-prefill-flashinfer 2025-03-01 11:35:09 +00:00
Atream
f35e8d41d8 support chunk prefill, support 139K context for 24G VRAM 2025-03-01 11:28:25 +00:00
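Chunked prefill splits a long prompt into fixed-size pieces and runs them through the model sequentially, so the transient prefill activations fit in VRAM while the KV cache still grows to the full context. A toy sketch of the driver loop (the `forward` callback stands in for the real model call and is an assumption, not the repo's API):

```python
def prefill_in_chunks(tokens, chunk_size, forward):
    """Prefill `tokens` chunk by chunk, threading the KV-cache length through.

    `forward(chunk, past_len)` must process one chunk while attending to the
    `past_len` tokens already in the KV cache, and return the chunk's logits.
    """
    past_len = 0
    last_logits = None
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        last_logits = forward(chunk, past_len)
        past_len += len(chunk)
    # Only the final chunk's logits matter for generating the next token.
    return last_logits, past_len
```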
liam
80e0536fb0 Merge branch 'main' of https://github.com/KMSorSMS/ktransformers into main 2025-03-01 00:12:21 +08:00
liam
8ddc990668 fix server cache lens 2025-03-01 00:09:57 +08:00
qiyuxinlin
22df52e94e fix temperature 2025-02-27 21:00:44 +08:00
wang jiahao
26f7b4af11 Merge branch 'main' into temperature_top_p_from_request 2025-02-27 18:08:55 +08:00
Atream
b443c7dfa2 Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill (Feat absorb for long prefill) 2025-02-25 16:53:21 +08:00
Atream
f4c198bd42 support absorb for prefill long context 2025-02-25 08:52:02 +00:00
Azure
4dc5518e4d update fp8 kernel tutorial 2025-02-24 15:37:01 +00:00
lazymio
76487c4dcb Revert repetition_penalty as it is not in the API spec 2025-02-24 21:30:03 +08:00
lazymio
bf36547f98 Also allow repetition_penalty 2025-02-24 21:07:35 +08:00
lazymio
8704c09192 Allow temperature and top_p from requests 2025-02-24 21:01:33 +08:00
Atream
a529518346 clean PR code and disable flashinfer 2025-02-19 04:42:47 +00:00
ceerrep
73d072f609 Merge branch 'fix_precision_MLA' of https://github.com/kvcache-ai/ktransformers into server-prefix-cache 2025-02-18 11:44:28 +08:00
Xie Weiyu
f029588b61 fix server warmup 2025-02-18 11:39:45 +08:00
Xie Weiyu
c176e516b5 server mix mla 2025-02-17 20:40:28 +08:00
ceerrep
cd9f7f8f34 fix: server: drop <think> tag in chat template 2025-02-17 14:25:27 +08:00
ceerrep
bb0ccc7b1a feat: add prefix cache for server 2025-02-17 00:10:55 +08:00
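A server-side prefix cache reuses the KV cache of a previous request whose token sequence shares a prefix with the new one, so only the unshared suffix needs prefilling. A minimal sketch of the matching logic (class and method names are illustrative, not the repo's implementation):

```python
def common_prefix_len(a, b):
    """Length of the longest shared prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class PrefixCache:
    """Maps previously seen token prefixes to their KV-cache handles."""

    def __init__(self):
        self.entries = []  # list of (token_tuple, kv_handle)

    def insert(self, tokens, kv_handle):
        self.entries.append((tuple(tokens), kv_handle))

    def lookup(self, tokens):
        """Return (matched_len, kv_handle) for the best cached prefix.

        The caller only needs to prefill tokens[matched_len:].
        """
        best_len, best_kv = 0, None
        for cached, kv in self.entries:
            n = common_prefix_len(cached, tokens)
            if n > best_len:
                best_len, best_kv = n, kv
        return best_len, best_kv
```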
MuWinds
f74c2d1d17 Fix deprecated torch.backends.cuda.sdp_kernel() usage 2025-02-15 12:41:51 +08:00
liam
4385e85096 support force thinking 2025-02-12 12:43:53 +08:00
liam
6f3a39be08 update force_think config 2025-02-12 12:10:16 +08:00
liam
e536e1420d update force_think 2025-02-12 11:42:55 +08:00
liam
c18ecd7b7f add flush print in local_chat output and change default optimize yaml of deepseekv3 to single gpu 2025-02-08 13:15:52 +08:00
Azure
907251c743 done support deepseekv3 2025-02-04 15:53:38 +00:00
Azure
476b1d8dc6 support deepseekv3; runnable but has a precision problem 2025-01-31 08:27:24 +00:00
liam
c2b4dc805c 🚑️ roll back transformer.py and find that multi-turn chat history has a minor accuracy error 2024-11-04 14:02:19 +08:00
anyanqilin
2d67016d14 wjh-change 2024-11-04 14:02:19 +08:00
liam
7c94df4bcf 🚑️ roll back transformer.py to the buggy version, and fix a typo in local_chat.py 2024-11-04 14:02:19 +08:00
liam
dd1d8667f3 refactor local_chat and fix message slice bug in server 2024-11-04 14:02:19 +08:00
chenxl
18c42e67df Initial commit 2024-07-27 16:06:58 +08:00