Commit graph

203 commits

Author SHA1 Message Date
Atream
09c043d8a6
Merge pull request #842 from BITcyman/fix-openai_chat_completion
[fix] thread context bug
2025-03-07 22:56:19 +08:00
BITcyman
08a8b553d6 [fix] thread context bug 2025-03-07 14:52:16 +00:00
Atream
f8c1821f1d
Update __init__.py 2025-03-07 22:08:48 +08:00
Atream
d453c320f1 fix flashinfer precision 2025-03-07 14:07:00 +00:00
BITcyman
299c4dca64 [update] support openai chat completion api 2025-03-07 08:51:09 +00:00
ZiWei Yuan
63b1c8525b
Merge pull request #820 from kvcache-ai/develop-0.2.3
Develop 0.2.3 ready to release
2025-03-06 14:46:09 +08:00
liam
8eeb6dd432 update compile option for avx512vpopcntdq 2025-03-06 12:18:04 +08:00
chenmz00
b2ba795cfd
fix: list models API
Fix the list models API to match the corresponding OpenAI API format.
2025-03-05 21:49:27 +08:00
liam
9c343b4f71 🔖 release v0.2.3 2025-03-05 20:24:11 +08:00
liam
848fe8ab97 release v0.2.3 2025-03-05 20:21:04 +08:00
Azure
d7becadcf7 Merge branch 'develop-0.2.3' of https://github.com/kvcache-ai/ktransformers into develop-0.2.3 2025-03-05 09:26:23 +00:00
Azure
662c1e4c14 small fix about max new token 2025-03-05 09:25:41 +00:00
liam
dc10480ef6 add humaneval support 2025-03-04 20:54:49 +08:00
Yi Pan
01755a60c0
fix: wrong shape in KLinearMarlin. 2025-03-03 17:34:45 +08:00
Atream
8963ae7817
Update __init__.py 2025-03-03 16:49:50 +08:00
wang jiahao
48b9800790
Merge pull request #759 from 3wweiweiwu/fix_top_p_typo
fix typo for top_p
2025-03-02 13:58:11 +08:00
1668068727@qq.com
7cdf8139f0 fix ollama api temperature bug 2025-03-02 13:55:26 +08:00
Wix Woo
3aa0cfc29d fix typo for top_p 2025-03-01 20:15:36 +00:00
Atream
ca1dc1e7d1
Merge branch 'main' into main 2025-03-01 23:24:10 +08:00
宁鹏涛
71286ec1c0
Update local_chat.py
Fix: config.architectures[0] == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM" always evaluates to true
2025-03-01 21:52:48 +08:00
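The `local_chat.py` fix above addresses a classic Python precedence pitfall: `x == "A" or "B"` parses as `(x == "A") or "B"`, and a non-empty string is truthy, so the condition is always true. A minimal illustration (the `arch` value is made up for the example):

```python
arch = "LlamaForCausalLM"  # an architecture that should NOT match

# Buggy: parses as (arch == "DeepseekV2ForCausalLM") or "DeepseekV3ForCausalLM".
# The non-empty string "DeepseekV3ForCausalLM" is truthy, so this is always true.
buggy = arch == "DeepseekV2ForCausalLM" or "DeepseekV3ForCausalLM"

# Fixed: test membership against both architecture names.
fixed = arch in ("DeepseekV2ForCausalLM", "DeepseekV3ForCausalLM")

print(bool(buggy), fixed)  # → True False
```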
Atream
fa03ea48dd Merge branch 'main' into feat-chunk-prefill-flashinfer 2025-03-01 11:35:09 +00:00
Atream
f35e8d41d8 support chunk prefill, support 139K context for 24G VRAM 2025-03-01 11:28:25 +00:00
liam
80e0536fb0 Merge branch 'main' of https://github.com/KMSorSMS/ktransformers into main 2025-03-01 00:12:21 +08:00
liam
8ddc990668 fix server cache lens 2025-03-01 00:09:57 +08:00
Shuaiyi
a34a25d5cc Delete unused code 2025-02-27 13:18:19 +00:00
wang jiahao
7a19f3b781
Merge pull request #721 from kvcache-ai/fix_temperature
fix temperature
2025-02-27 21:01:21 +08:00
qiyuxinlin
22df52e94e fix temperature 2025-02-27 21:00:44 +08:00
Atream
85e2cc7bf4
Merge pull request #719 from kvcache-ai/fix-use-generation-json
use generation config from json file in official repo
2025-02-27 19:49:41 +08:00
Atream
e645d84794 use generation config from json file in official repo 2025-02-27 11:48:34 +00:00
lazymio
b121ca4df8
Fix according to upstream changes 2025-02-27 18:11:35 +08:00
wang jiahao
26f7b4af11
Merge branch 'main' into temperature_top_p_from_request 2025-02-27 18:08:55 +08:00
Atream
50c691297f
Merge pull request #622 from akemimadoka/fix-msvc
Fix missing macro definition for KTRANSFORMERS_USE_CUDA and <chrono> includes on MSVC
2025-02-27 17:42:00 +08:00
Atream
0422152cf3
Merge pull request #670 from akemimadoka/fix-win
Fix RuntimeError on Windows caused by integer overflow in np.prod
2025-02-27 17:40:27 +08:00
Atream
798e1d0cfa
Merge pull request #532 from xv44586/fix-sse-formatting
fix: fix SSE formatting
2025-02-27 12:19:23 +08:00
Atream
f403cde6d4
Merge pull request #650 from ceerRep/main
feat: basic api key support
2025-02-27 12:16:53 +08:00
Atream
8db6a4d402
Merge branch 'main' into main 2025-02-27 12:12:32 +08:00
wang jiahao
3c8c580580
Merge pull request #691 from swu-hyk/ollama_api_chat
feat:implementation of chat routing for Ollama
2025-02-27 11:17:48 +08:00
Atream
90eb87b3fc
Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml 2025-02-26 21:53:50 +08:00
swu-hyk
ec7e912fee modify 2025-02-26 19:21:30 +08:00
swu-hyk
68e7df3a25 implementation of chat routing for Ollama 2025-02-26 17:05:00 +08:00
Chen Hongtao
9660b2cc1e
Merge pull request #685 from vproxy-tools/main
fix numa cpu distribution
2025-02-26 15:35:19 +08:00
liam
ffb86c66e3 fix experts torch 2025-02-26 15:04:40 +08:00
wkgcass
b2bff17775 fix numa cpu distribution
The NUMA node location is calculated from the total number of worker
threads, so we should always use the actual thread count rather than
a min()-capped value.
2025-02-26 14:49:57 +08:00
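The commit above describes deriving each worker thread's NUMA node from the total thread count, where a min() cap on that count skews the mapping. A purely hypothetical sketch of that failure mode (the helper and numbers are illustrative, not the project's actual code):

```python
def numa_node_for(thread_id: int, total_threads: int, numa_nodes: int) -> int:
    # Even split: thread i of N goes to node i * nodes // N.
    return thread_id * numa_nodes // total_threads

actual_threads = 6
capped_threads = min(actual_threads, 4)  # the kind of min() op the fix removes

# With the capped count, later threads map past the last valid node;
# with the actual count, all 6 threads split evenly across the 2 nodes.
wrong = [numa_node_for(i, capped_threads, 2) for i in range(actual_threads)]
right = [numa_node_for(i, actual_threads, 2) for i in range(actual_threads)]

print(wrong)  # → [0, 0, 1, 1, 2, 2]  (node 2 does not exist)
print(right)  # → [0, 0, 0, 1, 1, 1]
```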
akemimadoka
8817777e11 Fix RuntimeError on Windows caused by integer overflow in np.prod 2025-02-26 03:50:12 +08:00
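The Windows fix above relates to NumPy's platform-dependent default integer: on Windows (with NumPy before 2.0) the default int is 32-bit, so `np.prod` over a large tensor shape can silently wrap around and later trigger a RuntimeError when the result is used as a size. A hedged sketch of the fix, forcing a 64-bit accumulator (the shape is illustrative):

```python
import numpy as np

shape = (64, 1024, 1024, 1024)  # 2**36 elements, beyond the int32 range

# On platforms where the default int is 32-bit (e.g. Windows, NumPy < 2.0),
# np.prod(shape) overflows; an explicit 64-bit dtype is portable.
n_elems = np.prod(shape, dtype=np.int64)

print(int(n_elems))  # → 68719476736
```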
liam
ddf3339339 release v0.2.2rc1 2025-02-25 22:06:36 +08:00
Azure
91c1619296 Merge branch 'develop-0.2.2' into support-fp8
Update README.md
2025-02-25 13:43:26 +00:00
Azure
2c0cce90d0 add fp8 multi gpu yaml example 2025-02-25 13:32:09 +00:00
Atream
d9b2895bd3 Merge branch 'fix-update-flashinfer_wrapper_local_chat' into develop-0.2.2 2025-02-25 12:47:48 +00:00
Atream
477ac28a9c fix-update-flashinfer_wrapper_local_chat 2025-02-25 12:47:31 +00:00
Azure
7e5962af3d fix fp8 multi gpu; update FAQ 2025-02-25 10:52:29 +00:00