kvcache-ai-ktransformers/ktransformers/ktransformers_ext
2024-08-12 12:53:12 +00:00
Name | Last commit | Date
bench | 1) Linear and MLP operators support qlen>1; 2) All operators now share a single memory buffer; 3) Refactor CPUInfer submit/sync logic. | 2024-08-08 09:04:36 +00:00
cmake | [feature] support python 310 and multi instruction | 2024-07-31 13:58:17 +00:00
cpu_backend | Merge remote-tracking branch 'upstream/main' into develop-0.1.2 | 2024-08-12 12:31:49 +00:00
cuda | [feature] support q2_k & q3_k dequantize on gpu | 2024-08-12 12:53:12 +00:00
examples | 1) Linear and MLP operators support qlen>1; 2) All operators now share a single memory buffer; 3) Refactor CPUInfer submit/sync logic. | 2024-08-08 09:04:36 +00:00
operators | [ADD] support multi-gpu qlen>1 q5_k | 2024-08-12 11:41:26 +00:00
CMakeLists.txt | [ADD] support multi-gpu qlen>1 q5_k | 2024-08-12 11:41:26 +00:00
ext_bindings.cpp | [ADD] support multi-gpu qlen>1 q5_k | 2024-08-12 11:41:26 +00:00