kvcache-ai-ktransformers/ktransformers/ktransformers_ext
2024-08-12 11:41:26 +00:00
bench 1) Linear and MLP operators support qlen>1; 2) All operators now share a single memory buffer; 3) Refactor CPUInfer submit/sync logic. 2024-08-08 09:04:36 +00:00
cmake [feature] support python 310 and multi instruction 2024-07-31 13:58:17 +00:00
cpu_backend [ADD] support multi-gpu qlen>1 q5_k 2024-08-12 11:41:26 +00:00
cuda [ADD] support multi-gpu qlen>1 q5_k 2024-08-12 11:41:26 +00:00
examples 1) Linear and MLP operators support qlen>1; 2) All operators now share a single memory buffer; 3) Refactor CPUInfer submit/sync logic. 2024-08-08 09:04:36 +00:00
operators [ADD] support multi-gpu qlen>1 q5_k 2024-08-12 11:41:26 +00:00
CMakeLists.txt [ADD] support multi-gpu qlen>1 q5_k 2024-08-12 11:41:26 +00:00
ext_bindings.cpp [ADD] support multi-gpu qlen>1 q5_k 2024-08-12 11:41:26 +00:00