Commit graph

230 commits

Author SHA1 Message Date
liam  8ddc990668  fix server cache lens  2025-03-01 00:09:57 +08:00
Shuaiyi  a34a25d5cc  Delete unused code  2025-02-27 13:18:19 +00:00
wang jiahao  7a19f3b781  Merge pull request #721 from kvcache-ai/fix_temperature: fix temperature  2025-02-27 21:01:21 +08:00
qiyuxinlin  22df52e94e  fix temperature  2025-02-27 21:00:44 +08:00
Atream  85e2cc7bf4  Merge pull request #719 from kvcache-ai/fix-use-generation-json: use generation config from json file in official repo  2025-02-27 19:49:41 +08:00
Atream  e645d84794  use generation config from json file in official repo  2025-02-27 11:48:34 +00:00
lazymio  b121ca4df8  Fix according to upstream changes  2025-02-27 18:11:35 +08:00
wang jiahao  26f7b4af11  Merge branch 'main' into temperature_top_p_from_request  2025-02-27 18:08:55 +08:00
Atream  50c691297f  Merge pull request #622 from akemimadoka/fix-msvc: Fix missing macro definition for KTRANSFORMERS_USE_CUDA and <chrono> includes on MSVC  2025-02-27 17:42:00 +08:00
Atream  0422152cf3  Merge pull request #670 from akemimadoka/fix-win: Fix RuntimeError on Windows caused by integer overflow in np.prod  2025-02-27 17:40:27 +08:00
Atream  798e1d0cfa  Merge pull request #532 from xv44586/fix-sse-formatting: fix SSE formatting  2025-02-27 12:19:23 +08:00
Atream  f403cde6d4  Merge pull request #650 from ceerRep/main: feat: basic api key support  2025-02-27 12:16:53 +08:00
Atream  8db6a4d402  Merge branch 'main' into main  2025-02-27 12:12:32 +08:00
wang jiahao  3c8c580580  Merge pull request #691 from swu-hyk/ollama_api_chat: feat: implementation of chat routing for Ollama  2025-02-27 11:17:48 +08:00
Atream  90eb87b3fc  Update DeepSeek-V3-Chat-multi-gpu-marlin.yaml  2025-02-26 21:53:50 +08:00
swu-hyk  ec7e912fee  modify  2025-02-26 19:21:30 +08:00
swu-hyk  68e7df3a25  implementation of chat routing for Ollama  2025-02-26 17:05:00 +08:00
Chen Hongtao  9660b2cc1e  Merge pull request #685 from vproxy-tools/main: fix numa cpu distribution  2025-02-26 15:35:19 +08:00
liam  ffb86c66e3  fix experts torch  2025-02-26 15:04:40 +08:00
wkgcass  b2bff17775  fix numa cpu distribution (the NUMA node location is calculated from the total number of worker threads, so the actual thread count should always be used instead of a min() op)  2025-02-26 14:49:57 +08:00
akemimadoka  8817777e11  Fix RuntimeError on Windows caused by integer overflow in np.prod  2025-02-26 03:50:12 +08:00
liam  ddf3339339  release v0.2.2rc1  2025-02-25 22:06:36 +08:00
Azure  91c1619296  Merge branch 'develop-0.2.2' into support-fp8: Update README.md  2025-02-25 13:43:26 +00:00
Azure  2c0cce90d0  add fp8 multi gpu yaml example  2025-02-25 13:32:09 +00:00
Atream  d9b2895bd3  Merge branch 'fix-update-flashinfer_wrapper_local_chat' into develop-0.2.2  2025-02-25 12:47:48 +00:00
Atream  477ac28a9c  fix-update-flashinfer_wrapper_local_chat  2025-02-25 12:47:31 +00:00
Azure  7e5962af3d  fix fp8 multi gpu; update FAQ  2025-02-25 10:52:29 +00:00
liam  0ca0b99fab  update .gitignore; add docker dev container  2025-02-25 17:22:11 +08:00
Azure  5474be5299  Merge branch 'main' into develop-0.2.2  2025-02-25 09:04:22 +00:00
Atream  b443c7dfa2  Merge pull request #657 from kvcache-ai/feat-absorb-for-long-prefill: Feat absorb for long prefill  2025-02-25 16:53:21 +08:00
Atream  f4c198bd42  support absorb for long-context prefill  2025-02-25 08:52:02 +00:00
Azure  36fbeee341  Update doc  2025-02-25 08:21:18 +00:00
ceerrep  f639fbc19e  feat: basic api key support  2025-02-25 14:11:39 +08:00
Azure  4dc5518e4d  update fp8 kernel tutorial  2025-02-24 15:37:01 +00:00
Atream  7b2a6690ab  Merge pull request #608 from makllama/fix_musa_ext: musa: support bf16  2025-02-24 23:12:54 +08:00
Xiaodong Ye  f88c05a6f1  Ensure backward compatibility with Torch 2.2 (Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>)  2025-02-24 21:55:30 +08:00
lazymio  07eb712a73  Left out  2025-02-24 21:51:14 +08:00
lazymio  91062a834f  Default values  2025-02-24 21:38:01 +08:00
lazymio  76487c4dcb  Revert repetition_penalty as it is not in API spec  2025-02-24 21:30:03 +08:00
lazymio  05ad288453  Also /chat/completions  2025-02-24 21:08:36 +08:00
lazymio  bf36547f98  Also allow repetition_penalty  2025-02-24 21:07:35 +08:00
lazymio  8704c09192  Allow temperature and top_p from requests  2025-02-24 21:01:33 +08:00
Azure  ca7366d2db  Merge remote-tracking branch 'upstream/develop-0.2.2' into support-fp8  2025-02-24 11:58:10 +00:00
Azure  581a524f65  Add data loader to read special weights for fp8; add special weight process script  2025-02-24 11:34:17 +00:00
Atream  e9b1216a9a  Merge branch 'main' into feat-absorb-for-long-prefill  2025-02-24 09:44:17 +00:00
Atream  f327695079  fix KExpertsMarlin on GPU without CUDA Graph  2025-02-24 09:30:54 +00:00
Yuhao Tsui  cea07d1998  Feat: Clear cache during weight loading to prevent OOM on GPUs with <=8GB VRAM (explicitly clears the CUDA cache during weight loading to mitigate memory fragmentation, particularly beneficial for low-VRAM GPUs)  2025-02-24 10:09:42 +08:00
akemimadoka  706e69f4fc  Fix missing macro definition for KTRANSFORMERS_USE_CUDA and <chrono> includes on MSVC  2025-02-24 01:37:50 +08:00
Atream  f5f6c6b95d  update yaml  2025-02-23 14:33:58 +00:00
Atream  e8e02e5ccc  support Moonlight  2025-02-23 14:21:18 +00:00