sean.su
8699109129
Refactor the chat interface to support tool calling and parameter processing
Defined new data structures in chat.py to replace OpenAI's original implementation, adding support for tool calling.
Implemented logic for extracting and processing tool calls, enabling dynamic function invocation during conversations.
Added methods in balance_serve.py to retrieve sampling parameters, handling default values and edge cases.
Updated ktransformers.py and transformers.py to support the passing of tool parameters.
Modified the default value of top_p in config.py to 1.0 to increase generation diversity.
Extended the message model in chat.py to support the transmission of tool call information.
These changes enhance the system's flexibility and functionality, enabling more complex interaction patterns.
2025-04-14 15:23:37 +08:00
dongjw
5c7ed7b579
fix top_p = 0 bug
2025-04-01 20:38:33 +08:00
Azure-Tang
31677181c3
Fix ktransformers-server flashinfer wrapper position arg issue;
Fix db position issue
2025-04-01 07:30:23 +00:00
Atream
d453c320f1
fix flashinfer precision
2025-03-07 14:07:00 +00:00
BITcyman
299c4dca64
[update] support openai chat completion api
2025-03-07 08:51:09 +00:00
Azure
662c1e4c14
small fix about max new token
2025-03-05 09:25:41 +00:00
Atream
ca1dc1e7d1
Merge branch 'main' into main
2025-03-01 23:24:10 +08:00
Atream
fa03ea48dd
Merge branch 'main' into feat-chunk-prefill-flashinfer
2025-03-01 11:35:09 +00:00
Atream
f35e8d41d8
support chunk prefill, support 139K context for 24G VRAM
2025-03-01 11:28:25 +00:00
liam
80e0536fb0
Merge branch 'main' of https://github.com/KMSorSMS/ktransformers into main
2025-03-01 00:12:21 +08:00
liam
8ddc990668
⚡ fix server cache lens
2025-03-01 00:09:57 +08:00
qiyuxinlin
22df52e94e
fix temperature
2025-02-27 21:00:44 +08:00
wang jiahao
26f7b4af11
Merge branch 'main' into temperature_top_p_from_request
2025-02-27 18:08:55 +08:00
Atream
b443c7dfa2
Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill
Feat absorb for long prefill
2025-02-25 16:53:21 +08:00
Atream
f4c198bd42
support absorb for prefill long context
2025-02-25 08:52:02 +00:00
Azure
4dc5518e4d
update fp8 kernel tutorial
2025-02-24 15:37:01 +00:00
lazymio
76487c4dcb
Revert repetition_penalty as it is not in API spec
2025-02-24 21:30:03 +08:00
lazymio
bf36547f98
Also allow repetition_penalty
2025-02-24 21:07:35 +08:00
lazymio
8704c09192
Allow temperature and top_p from requests
2025-02-24 21:01:33 +08:00
Atream
a529518346
clean PR code and disable flashinfer
2025-02-19 04:42:47 +00:00
ceerrep
73d072f609
Merge branch 'fix_precision_MLA' of https://github.com/kvcache-ai/ktransformers into server-prefix-cache
2025-02-18 11:44:28 +08:00
Xie Weiyu
f029588b61
fix server warmup
2025-02-18 11:39:45 +08:00
Xie Weiyu
c176e516b5
server mix mla
2025-02-17 20:40:28 +08:00
ceerrep
cd9f7f8f34
fix: server: drop <think> tag in chat template
2025-02-17 14:25:27 +08:00
ceerrep
bb0ccc7b1a
feat: add prefix cache for server
2025-02-17 00:10:55 +08:00
MuWinds
f74c2d1d17
Resolve the torch.backends.cuda.sdp_kernel() deprecation warning
2025-02-15 12:41:51 +08:00
liam
4385e85096
⚡ support force thinking
2025-02-12 12:43:53 +08:00
liam
6f3a39be08
⚡ update force_think config
2025-02-12 12:10:16 +08:00
liam
e536e1420d
⚡ update force_think
2025-02-12 11:42:55 +08:00
liam
c18ecd7b7f
⚡ add flush print in local_chat output and change default optimize yaml of deepseekv3 to single gpu
2025-02-08 13:15:52 +08:00
Azure
907251c743
finish deepseekv3 support
2025-02-04 15:53:38 +00:00
Azure
476b1d8dc6
support deepseekv3; runnable but has precision problems
2025-01-31 08:27:24 +00:00
liam
c2b4dc805c
🚑: roll back transformer.py and find that multiple chat histories cause a minor accuracy error
2024-11-04 14:02:19 +08:00
anyanqilin
2d67016d14
wjh-change
2024-11-04 14:02:19 +08:00
liam
7c94df4bcf
🚑: roll back the buggy version of transformer.py, and fix a typo in local_chat.py
2024-11-04 14:02:19 +08:00
liam
dd1d8667f3
✨ : refactor local_chat and fix message slice bug in server
2024-11-04 14:02:19 +08:00
chenxl
18c42e67df
Initial commit
2024-07-27 16:06:58 +08:00