sean.su
8699109129
Refactor the chat interface to support tool calling and parameter processing
Defined new data structures in chat.py to replace OpenAI's original implementation, adding support for tool calling.
Implemented logic for extracting and processing tool calls, enabling dynamic function invocation during conversations.
Added methods in balance_serve.py to retrieve sampling parameters, handling default values and edge cases.
Updated ktransformers.py and transformers.py to support the passing of tool parameters.
Modified the default value of top_p in config.py to 1.0 to increase generation diversity.
Extended the message model in chat.py to support the transmission of tool call information.
These changes enhance the system's flexibility and functionality, enabling more complex interaction patterns.
2025-04-14 15:23:37 +08:00
Atream
25cee5810e
add balance-serve, support concurrency
2025-03-31 22:55:32 +08:00
BITcyman
299c4dca64
[update] support openai chat completion api
2025-03-07 08:51:09 +00:00
wang jiahao
48b9800790
Merge pull request #759 from 3wweiweiwu/fix_top_p_typo
fix typo for top_p
2025-03-02 13:58:11 +08:00
1668068727@qq.com
7cdf8139f0
fix ollama api temperature bug
2025-03-02 13:55:26 +08:00
Wix Woo
3aa0cfc29d
fix typo for top_p
2025-03-01 20:15:36 +00:00
Atream
fa03ea48dd
Merge branch 'main' into feat-chunk-prefill-flashinfer
2025-03-01 11:35:09 +00:00
Atream
f35e8d41d8
support chunk prefill, support 139K context for 24G VRAM
2025-03-01 11:28:25 +00:00
liam
80e0536fb0
Merge branch 'main' of https://github.com/KMSorSMS/ktransformers into main
2025-03-01 00:12:21 +08:00
liam
8ddc990668
⚡ fix server cache lens
2025-03-01 00:09:57 +08:00
qiyuxinlin
22df52e94e
fix temperature
2025-02-27 21:00:44 +08:00
lazymio
b121ca4df8
Fix according to upstream changes
2025-02-27 18:11:35 +08:00
wang jiahao
26f7b4af11
Merge branch 'main' into temperature_top_p_from_request
2025-02-27 18:08:55 +08:00
Atream
b443c7dfa2
Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill
Feat absorb for long prefill
2025-02-25 16:53:21 +08:00
Atream
f4c198bd42
support absorb for prefill long context
2025-02-25 08:52:02 +00:00
Azure
36fbeee341
Update doc
2025-02-25 08:21:18 +00:00
lazymio
07eb712a73
Left out
2025-02-24 21:51:14 +08:00
lazymio
8704c09192
Allow temperature and top_p from requests
2025-02-24 21:01:33 +08:00
Atream
7e1fe256c8
optimize GPU
2025-02-21 05:06:57 +00:00
ceerrep
73d072f609
Merge branch 'fix_precision_MLA' of https://github.com/kvcache-ai/ktransformers into server-prefix-cache
2025-02-18 11:44:28 +08:00
Xie Weiyu
f029588b61
fix server warmup
2025-02-18 11:39:45 +08:00
ceerrep
c70b6f4d5b
fix: use 'cuda:0' by default if torch_device is 'cuda'
2025-02-18 11:15:17 +08:00
Xie Weiyu
c176e516b5
server mix mla
2025-02-17 20:40:28 +08:00
ceerrep
ee24eb8dc3
fix: fix server for triton kernel
2025-02-17 18:08:45 +08:00
ceerrep
bb0ccc7b1a
feat: add prefix cache for server
2025-02-17 00:10:55 +08:00
hrz6976
2c3dcd9774
Add a lock to server inference()
2025-02-13 10:05:22 +00:00
Azure
c4d9bc6670
support KExpertsMarlin backend
2025-02-07 05:57:40 +00:00
Azure
907251c743
finish DeepSeek-V3 support
2025-02-04 15:53:38 +00:00
Azure
476b1d8dc6
support DeepSeek-V3; runnable but has precision problems
2025-01-31 08:27:24 +00:00
liam
dd1d8667f3
✨ : refactor local_chat and fix message slice bug in server
2024-11-04 14:02:19 +08:00
chenxl
b9f0819a86
None for load config
2024-08-22 15:52:25 +00:00
TangJingqi
170b7a6001
fix server not accepting a YAML path as a parameter; fix server static cache device problem
2024-08-21 14:19:43 +08:00
Atream
412055d450
[feature] experts can be injected using CPUInfer
[fix] fix ktransformers interface when using the new CUDAGraphRunner
[fix] fix YAML and optimize logic; the top rule has the highest priority
2024-08-14 16:10:54 +08:00
chenxl
18c42e67df
Initial commit
2024-07-27 16:06:58 +08:00