Atream
09c043d8a6
Merge pull request #842 from BITcyman/fix-openai_chat_completion
...
[fix] thread context bug
2025-03-07 22:56:19 +08:00
BITcyman
08a8b553d6
[fix] thread context bug
2025-03-07 14:52:16 +00:00
Atream
f8c1821f1d
Update __init__.py
2025-03-07 22:08:48 +08:00
Atream
d453c320f1
fix flashinfer precision
2025-03-07 14:07:00 +00:00
BITcyman
299c4dca64
[update] support openai chat completion api
2025-03-07 08:51:09 +00:00
ZiWei Yuan
63b1c8525b
Merge pull request #820 from kvcache-ai/develop-0.2.3
...
Develop 0.2.3 ready to release
2025-03-06 14:46:09 +08:00
liam
8eeb6dd432
⚡ update compile option for avx512vpopcntdq
2025-03-06 12:18:04 +08:00
chenmz00
b2ba795cfd
fix: list models API
...
Fix the list models API to match the corresponding OpenAI API format.
2025-03-05 21:49:27 +08:00
liam
9c343b4f71
🔖 release v0.2.3
2025-03-05 20:24:11 +08:00
liam
848fe8ab97
⚡ release v0.2.3
2025-03-05 20:21:04 +08:00
Azure
d7becadcf7
Merge branch 'develop-0.2.3' of https://github.com/kvcache-ai/ktransformers into develop-0.2.3
2025-03-05 09:26:23 +00:00
Azure
662c1e4c14
small fix about max new token
2025-03-05 09:25:41 +00:00
liam
dc10480ef6
⚡ add humaneval support
2025-03-04 20:54:49 +08:00
Yi Pan
01755a60c0
fix: wrong shape in KLinearMarlin.
2025-03-03 17:34:45 +08:00
Atream
8963ae7817
Update __init__.py
2025-03-03 16:49:50 +08:00
wang jiahao
48b9800790
Merge pull request #759 from 3wweiweiwu/fix_top_p_typo
...
fix typo for top_p
2025-03-02 13:58:11 +08:00
1668068727@qq.com
7cdf8139f0
fix ollama api temperature bug
2025-03-02 13:55:26 +08:00
Wix Woo
3aa0cfc29d
fix typo for top_p
2025-03-01 20:15:36 +00:00
Atream
ca1dc1e7d1
Merge branch 'main' into main
2025-03-01 23:24:10 +08:00
宁鹏涛
71286ec1c0
Update local_chat.py
...
Fix config.architectures[0] == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM" always evaluating to true
2025-03-01 21:52:48 +08:00
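The bug described in this commit is a common Python pitfall: `a == "X" or "Y"` parses as `(a == "X") or "Y"`, and a non-empty string literal is always truthy. The sketch below reproduces the behaviour in isolation; the function names are hypothetical, and the membership test is only one plausible way to write the fix, not necessarily the change made in local_chat.py.

```python
def is_deepseek_buggy(arch: str) -> bool:
    # Parsed as (arch == "DeepseekV2ForCausalLM") or "DeepseekV3ForCausalLM".
    # The right-hand string literal is non-empty, hence truthy, so the whole
    # expression is truthy for every value of arch.
    return bool(arch == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM")


def is_deepseek_fixed(arch: str) -> bool:
    # Hypothetical fix: compare against each architecture name explicitly.
    return arch in ("DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM")


if __name__ == "__main__":
    for arch in ("DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM", "LlamaForCausalLM"):
        print(arch, is_deepseek_buggy(arch), is_deepseek_fixed(arch))
```

With the buggy form, an unrelated architecture such as "LlamaForCausalLM" is still treated as a DeepSeek model, which is the "always true" condition the commit message refers to.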
Atream
fa03ea48dd
Merge branch 'main' into feat-chunk-prefill-flashinfer
2025-03-01 11:35:09 +00:00
Atream
f35e8d41d8
support chunk prefill, support 139K context for 24G VRAM
2025-03-01 11:28:25 +00:00
liam
80e0536fb0
Merge branch 'main' of https://github.com/KMSorSMS/ktransformers into main
2025-03-01 00:12:21 +08:00
liam
8ddc990668
⚡ fix server cache lens
2025-03-01 00:09:57 +08:00
Shuaiyi
a34a25d5cc
Delete unused code
2025-02-27 13:18:19 +00:00
wang jiahao
7a19f3b781
Merge pull request #721 from kvcache-ai/fix_temperature
...
fix temperature
2025-02-27 21:01:21 +08:00
qiyuxinlin
22df52e94e
fix temperature
2025-02-27 21:00:44 +08:00
Atream
85e2cc7bf4
Merge pull request #719 from kvcache-ai/fix-use-generation-json
...
use generation config from json file in official repo
2025-02-27 19:49:41 +08:00
Atream
e645d84794
use generation config from json file in official repo
2025-02-27 11:48:34 +00:00
lazymio
b121ca4df8
Fix according to upstream changes
2025-02-27 18:11:35 +08:00
wang jiahao
26f7b4af11
Merge branch 'main' into temperature_top_p_from_request
2025-02-27 18:08:55 +08:00
Atream
50c691297f
Merge pull request #622 from akemimadoka/fix-msvc
...
Fix missing macro definition for KTRANSFORMERS_USE_CUDA and <chrono> includes on MSVC
2025-02-27 17:42:00 +08:00
Atream
0422152cf3
Merge pull request #670 from akemimadoka/fix-win
...
Fix RuntimeError on Windows caused by integer overflow in np.prod
2025-02-27 17:40:27 +08:00
Atream
798e1d0cfa
Merge pull request #532 from xv44586/fix-sse-formatting
...
fix: fix SSE formatting
2025-02-27 12:19:23 +08:00
Atream
f403cde6d4
Merge pull request #650 from ceerRep/main
...
feat: basic api key support
2025-02-27 12:16:53 +08:00
Atream
8db6a4d402
Merge branch 'main' into main
2025-02-27 12:12:32 +08:00
wang jiahao
3c8c580580
Merge pull request #691 from swu-hyk/ollama_api_chat
...
feat:implementation of chat routing for Ollama
2025-02-27 11:17:48 +08:00
Atream
90eb87b3fc
Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml
2025-02-26 21:53:50 +08:00
swu-hyk
ec7e912fee
modify
2025-02-26 19:21:30 +08:00
swu-hyk
68e7df3a25
implementation of chat routing for Ollama
2025-02-26 17:05:00 +08:00
Chen Hongtao
9660b2cc1e
Merge pull request #685 from vproxy-tools/main
...
fix numa cpu distribution
2025-02-26 15:35:19 +08:00
liam
ffb86c66e3
⚡ fix experts torch
2025-02-26 15:04:40 +08:00
wkgcass
b2bff17775
fix numa cpu distribution
...
The numa node location would be calculated based on the total number
of worker threads.
So we should always use the actual number of threads instead of using a min() op.
2025-02-26 14:49:57 +08:00
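The reasoning in this commit message can be illustrated with a toy mapping. This is only a sketch under stated assumptions: `numa_node_for` and its proportional formula are hypothetical and are not taken from the ktransformers source. It shows why a thread count capped by a min() can place threads on nodes that do not exist when more threads are actually running.

```python
def numa_node_for(thread_id: int, thread_count: int, numa_nodes: int) -> int:
    # Hypothetical proportional placement: spread thread ids evenly over nodes.
    # The result is only in range when thread_count is the real thread total.
    return thread_id * numa_nodes // thread_count


if __name__ == "__main__":
    actual_threads = 8
    numa_nodes = 2

    # Using the actual number of threads: ids 0-3 on node 0, ids 4-7 on node 1.
    print([numa_node_for(i, actual_threads, numa_nodes) for i in range(actual_threads)])

    # Using a count capped by min() (4 here, an assumed value): later threads
    # are assigned to node indices 2 and 3, which do not exist on this machine.
    capped = min(actual_threads, 4)
    print([numa_node_for(i, capped, numa_nodes) for i in range(actual_threads)])
```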
akemimadoka
8817777e11
Fix RuntimeError on Windows caused by integer overflow in np.prod
2025-02-26 03:50:12 +08:00
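A likely cause of this overflow, stated here as an assumption rather than something the commit confirms, is that NumPy releases before 2.0 use a 32-bit default integer on Windows, so `np.prod` over a large shape wraps around. The sketch below simulates that wraparound with the standard library only; `wrap_int32` and the example shape are hypothetical.

```python
import math


def wrap_int32(value: int) -> int:
    """Reduce an exact integer to what a signed 32-bit accumulator would hold."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


if __name__ == "__main__":
    shape = (65536, 32768)        # hypothetical shape: 2**16 * 2**15 elements
    exact = math.prod(shape)      # Python ints are arbitrary precision
    wrapped = wrap_int32(exact)   # what a 32-bit product would report
    print(exact, wrapped)
```

Computing the product with `math.prod`, or passing `dtype=np.int64` to `np.prod`, keeps the element count exact regardless of the platform's default integer width.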
liam
ddf3339339
⚡ release v0.2.2rc1
2025-02-25 22:06:36 +08:00
Azure
91c1619296
Merge branch 'develop-0.2.2' into support-fp8
...
Update README.md
2025-02-25 13:43:26 +00:00
Azure
2c0cce90d0
add fp8 multi gpu yaml example
2025-02-25 13:32:09 +00:00
Atream
d9b2895bd3
Merge branch 'fix-update-flashinfer_wrapper_local_chat' into develop-0.2.2
2025-02-25 12:47:48 +00:00
Atream
477ac28a9c
fix-update-flashinfer_wrapper_local_chat
2025-02-25 12:47:31 +00:00
Azure
7e5962af3d
fix fp8 multi gpu; update FAQ
2025-02-25 10:52:29 +00:00