Commit graph

203 commits

Author SHA1 Message Date
Atream
09c043d8a6
Merge pull request #842 from BITcyman/fix-openai_chat_completion
[fix] thread context bug
2025-03-07 22:56:19 +08:00
BITcyman
08a8b553d6 [fix] thread context bug 2025-03-07 14:52:16 +00:00
Atream
f8c1821f1d
Update __init__.py 2025-03-07 22:08:48 +08:00
Atream
d453c320f1 fix flashinfer precision 2025-03-07 14:07:00 +00:00
BITcyman
299c4dca64 [update] support openai chat completion api 2025-03-07 08:51:09 +00:00
ZiWei Yuan
63b1c8525b
Merge pull request #820 from kvcache-ai/develop-0.2.3
Develop 0.2.3 ready to release
2025-03-06 14:46:09 +08:00
liam
8eeb6dd432 update compile option for avx512vpopcntdq 2025-03-06 12:18:04 +08:00
chenmz00
b2ba795cfd
fix: list models API
Fix the list models API to match the corresponding OpenAI API format.
2025-03-05 21:49:27 +08:00
liam
9c343b4f71 🔖 release v0.2.3 2025-03-05 20:24:11 +08:00
liam
848fe8ab97 release v0.2.3 2025-03-05 20:21:04 +08:00
Azure
d7becadcf7 Merge branch 'develop-0.2.3' of https://github.com/kvcache-ai/ktransformers into develop-0.2.3 2025-03-05 09:26:23 +00:00
Azure
662c1e4c14 small fix about max new token 2025-03-05 09:25:41 +00:00
liam
dc10480ef6 add humaneval support 2025-03-04 20:54:49 +08:00
Yi Pan
01755a60c0
fix: wrong shape in KLinearMarlin. 2025-03-03 17:34:45 +08:00
Atream
8963ae7817
Update __init__.py 2025-03-03 16:49:50 +08:00
wang jiahao
48b9800790
Merge pull request #759 from 3wweiweiwu/fix_top_p_typo
fix typo for top_p
2025-03-02 13:58:11 +08:00
1668068727@qq.com
7cdf8139f0 fix ollama api temperature bug 2025-03-02 13:55:26 +08:00
Wix Woo
3aa0cfc29d fix typo for top_p 2025-03-01 20:15:36 +00:00
Atream
ca1dc1e7d1
Merge branch 'main' into main 2025-03-01 23:24:10 +08:00
宁鹏涛
71286ec1c0
Update local_chat.py
Fix: config.architectures[0] == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM" always evaluates to true
2025-03-01 21:52:48 +08:00
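The `local_chat.py` fix above addresses a classic Python precedence pitfall: `x == "A" or "B"` parses as `(x == "A") or "B"`, and a non-empty string is truthy, so the condition is always true. A minimal illustration (the `arch` value is made up for the example):

```python
arch = "LlamaForCausalLM"  # an architecture that should NOT match

# Buggy: parses as (arch == "DeepseekV2ForCausalLM") or "DeepseekV3ForCausalLM".
# The non-empty string "DeepseekV3ForCausalLM" is truthy, so this is always true.
buggy = arch == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM"

# Fixed: test membership against both architecture names.
fixed = arch in ("DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM")

print(bool(buggy), fixed)  # → True False
```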
Atream
fa03ea48dd Merge branch 'main' into feat-chunk-prefill-flashinfer 2025-03-01 11:35:09 +00:00
Atream
f35e8d41d8 support chunk prefill, support 139K context for 24G VRAM 2025-03-01 11:28:25 +00:00
liam
80e0536fb0 Merge branch 'main' of https://github.com/KMSorSMS/ktransformers into main 2025-03-01 00:12:21 +08:00
liam
8ddc990668 fix server cache lens 2025-03-01 00:09:57 +08:00
Shuaiyi
a34a25d5cc Delete unused code 2025-02-27 13:18:19 +00:00
wang jiahao
7a19f3b781
Merge pull request #721 from kvcache-ai/fix_temperature
fix temperature
2025-02-27 21:01:21 +08:00
qiyuxinlin
22df52e94e fix temperature 2025-02-27 21:00:44 +08:00
Atream
85e2cc7bf4
Merge pull request #719 from kvcache-ai/fix-use-generation-json
use generation config from json file in official repo
2025-02-27 19:49:41 +08:00
Atream
e645d84794 use generation config from json file in official repo 2025-02-27 11:48:34 +00:00
lazymio
b121ca4df8
Fix according to upstream changes 2025-02-27 18:11:35 +08:00
wang jiahao
26f7b4af11
Merge branch 'main' into temperature_top_p_from_request 2025-02-27 18:08:55 +08:00
Atream
50c691297f
Merge pull request #622 from akemimadoka/fix-msvc
Fix missing macro definition for KTRANSFORMERS_USE_CUDA and <chrono> includes on MSVC
2025-02-27 17:42:00 +08:00
Atream
0422152cf3
Merge pull request #670 from akemimadoka/fix-win
Fix RuntimeError on Windows caused by integer overflow in np.prod
2025-02-27 17:40:27 +08:00
Atream
798e1d0cfa
Merge pull request #532 from xv44586/fix-sse-formatting
fix: fix SSE formatting
2025-02-27 12:19:23 +08:00
Atream
f403cde6d4
Merge pull request #650 from ceerRep/main
feat: basic api key support
2025-02-27 12:16:53 +08:00
Atream
8db6a4d402
Merge branch 'main' into main 2025-02-27 12:12:32 +08:00
wang jiahao
3c8c580580
Merge pull request #691 from swu-hyk/ollama_api_chat
feat:implementation of chat routing for Ollama
2025-02-27 11:17:48 +08:00
Atream
90eb87b3fc
Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml 2025-02-26 21:53:50 +08:00
swu-hyk
ec7e912fee modify 2025-02-26 19:21:30 +08:00
swu-hyk
68e7df3a25 implementation of chat routing for Ollama 2025-02-26 17:05:00 +08:00
Chen Hongtao
9660b2cc1e
Merge pull request #685 from vproxy-tools/main
fix numa cpu distribution
2025-02-26 15:35:19 +08:00
liam
ffb86c66e3 fix experts torch 2025-02-26 15:04:40 +08:00
wkgcass
b2bff17775 fix numa cpu distribution
The NUMA node location is calculated from the total number of worker
threads, so we should always use the actual thread count rather than
a min()-capped value.
2025-02-26 14:49:57 +08:00
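The commit above describes deriving each worker thread's NUMA node from the total thread count, where a min() cap on that count skews the mapping. A purely hypothetical sketch of that failure mode (the helper and numbers are illustrative, not the project's actual code):

```python
def numa_node_for(thread_id: int, total_threads: int, numa_nodes: int) -> int:
    # Even split: thread i of N goes to node i * nodes // N.
    return thread_id * numa_nodes // total_threads

actual_threads = 6
capped_threads = min(actual_threads, 4)  # the kind of min() op the fix removes

# With the capped count, later threads map past the last valid node;
# with the actual count, all 6 threads split evenly across the 2 nodes.
wrong = [numa_node_for(i, capped_threads, 2) for i in range(actual_threads)]
right = [numa_node_for(i, actual_threads, 2) for i in range(actual_threads)]

print(wrong)  # → [0, 0, 1, 1, 2, 2]  (node 2 does not exist)
print(right)  # → [0, 0, 0, 1, 1, 1]
```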
akemimadoka
8817777e11 Fix RuntimeError on Windows caused by integer overflow in np.prod 2025-02-26 03:50:12 +08:00
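The Windows fix above relates to NumPy's platform-dependent default integer: on Windows (with NumPy before 2.0) the default int is 32-bit, so `np.prod` over a large tensor shape can silently wrap around and later trigger a RuntimeError when the result is used as a size. A hedged sketch of the fix, forcing a 64-bit accumulator (the shape is illustrative):

```python
import numpy as np

shape = (64, 1024, 1024, 1024)  # 2**36 elements, beyond the int32 range

# On platforms where the default int is 32-bit (e.g. Windows, NumPy < 2.0),
# np.prod(shape) overflows; an explicit 64-bit dtype is portable.
n_elems = np.prod(shape, dtype=np.int64)

print(int(n_elems))  # → 68719476736
```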
liam
ddf3339339 release v0.2.2rc1 2025-02-25 22:06:36 +08:00
Azure
91c1619296 Merge branch 'develop-0.2.2' into support-fp8
Update README.md
2025-02-25 13:43:26 +00:00
Azure
2c0cce90d0 add fp8 multi gpu yaml example 2025-02-25 13:32:09 +00:00
Atream
d9b2895bd3 Merge branch 'fix-update-flashinfer_wrapper_local_chat' into develop-0.2.2 2025-02-25 12:47:48 +00:00
Atream
477ac28a9c fix-update-flashinfer_wrapper_local_chat 2025-02-25 12:47:31 +00:00
Azure
7e5962af3d fix fp8 multi gpu; update FAQ 2025-02-25 10:52:29 +00:00