Commit graph

34 commits

Author SHA1 Message Date
sean.su
8699109129 Refactor the chat interface to support tool calling and parameter processing
Defined new data structures in chat.py to replace OpenAI's original implementation, adding support for tool calling.

Implemented logic for extracting and processing tool calls, enabling dynamic function invocation during conversations.

Added methods in balance_serve.py to retrieve sampling parameters, handling default values and edge cases.

Updated ktransformers.py and transformers.py to support the passing of tool parameters.

Modified the default value of top_p in config.py to 1.0 to increase generation diversity.

Extended the message model in chat.py to support the transmission of tool call information.

These changes enhance the system's flexibility and functionality, enabling more complex interaction patterns.
2025-04-14 15:23:37 +08:00
Atream
25cee5810e add balance-serve, support concurrence 2025-03-31 22:55:32 +08:00
BITcyman
299c4dca64 [update] support openai chat completion api 2025-03-07 08:51:09 +00:00
wang jiahao
48b9800790
Merge pull request #759 from 3wweiweiwu/fix_top_p_typo
fix typo for top_p
2025-03-02 13:58:11 +08:00
1668068727@qq.com
7cdf8139f0 fix ollama api temperature bug 2025-03-02 13:55:26 +08:00
Wix Woo
3aa0cfc29d fix typo for top_p 2025-03-01 20:15:36 +00:00
Atream
fa03ea48dd Merge branch 'main' into feat-chunk-prefill-flashinfer 2025-03-01 11:35:09 +00:00
Atream
f35e8d41d8 support chunk prefill, support 139K context for 24G VRAM 2025-03-01 11:28:25 +00:00
liam
80e0536fb0 Merge branch 'main' of https://github.com/KMSorSMS/ktransformers into main 2025-03-01 00:12:21 +08:00
liam
8ddc990668 fix server cache lens 2025-03-01 00:09:57 +08:00
qiyuxinlin
22df52e94e fix temperature 2025-02-27 21:00:44 +08:00
lazymio
b121ca4df8
Fix according to upstream changes 2025-02-27 18:11:35 +08:00
wang jiahao
26f7b4af11
Merge branch 'main' into temperature_top_p_from_request 2025-02-27 18:08:55 +08:00
Atream
b443c7dfa2
Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill
Feat absorb for long prefill
2025-02-25 16:53:21 +08:00
Atream
f4c198bd42 support absorb for prefill long context 2025-02-25 08:52:02 +00:00
Azure
36fbeee341 Update doc 2025-02-25 08:21:18 +00:00
lazymio
07eb712a73
Left out 2025-02-24 21:51:14 +08:00
lazymio
8704c09192
Allow temperature and top_p from requests 2025-02-24 21:01:33 +08:00
Atream
7e1fe256c8 optimize GPU 2025-02-21 05:06:57 +00:00
ceerrep
73d072f609 Merge branch 'fix_precision_MLA' of https://github.com/kvcache-ai/ktransformers into server-prefix-cache 2025-02-18 11:44:28 +08:00
Xie Weiyu
f029588b61 fix server warmup 2025-02-18 11:39:45 +08:00
ceerrep
c70b6f4d5b fix: use 'cuda:0' by default if torch_device is 'cuda' 2025-02-18 11:15:17 +08:00
Xie Weiyu
c176e516b5 server mix mla 2025-02-17 20:40:28 +08:00
ceerrep
ee24eb8dc3 fix: fix server for triton kernel 2025-02-17 18:08:45 +08:00
ceerrep
bb0ccc7b1a feat: add prefix cache for server 2025-02-17 00:10:55 +08:00
hrz6976
2c3dcd9774 Add a lock to server inference() 2025-02-13 10:05:22 +00:00
Azure
c4d9bc6670 support KExpertsMarlin backend 2025-02-07 05:57:40 +00:00
Azure
907251c743 done support deepseekv3 2025-02-04 15:53:38 +00:00
Azure
476b1d8dc6 support deepseekv3; runable but have precition problem 2025-01-31 08:27:24 +00:00
liam
dd1d8667f3 : refactor local_chat and fix message slice bug in server 2024-11-04 14:02:19 +08:00
chenxl
b9f0819a86 None for load config 2024-08-22 15:52:25 +00:00
TangJingqi
170b7a6001 fix server don't accept yaml path as param; fix server static cache device problem 2024-08-21 14:19:43 +08:00
Atream
412055d450 [feature] experts can be injected using CPUInfer
[fix] fix ktransformers interface when use new CUDAGraphRunner
[fix] fix YAML and optimize logic, the top rule has the highest priority
2024-08-14 16:10:54 +08:00
chenxl
18c42e67df Initial commit 2024-07-27 16:06:58 +08:00