kvcache-ai-ktransformers

Mirror of https://github.com/kvcache-ai/ktransformers.git, last synced 2025-09-15 01:29:42 +00:00

Aubrey Li a12e8ab46e	2025-03-21 23:58:20 +08:00
yaml: fix Marlin AssertionError
Marlin quantized linear supports only GPU devices, so when generate_op is changed to "KLinearMarlin", generate_device must be changed to "cuda" accordingly.
Fixes: `e5b001d76f` ("Update readme; Format code; Add example yaml.")
configs	update rope calculation; update modeling.py; update gate for moe	2025-02-01 07:32:21 +00:00
ktransformers_ext	fix rocm compilation	2025-03-15 12:34:03 -04:00
models	optimize gguf dequant, save mem, support Q2_K	2025-02-22 06:13:01 +00:00
operators	Update gate.py	2025-03-20 14:54:01 +08:00
optimize	yaml: fix Marlin AssertionError	2025-03-21 23:58:20 +08:00
server	Merge pull request #842 from BITcyman/fix-openai_chat_completion	2025-03-07 22:56:19 +08:00
tests	Merge pull request #943 from SkqLiao/main	2025-03-20 18:49:34 +08:00
util	merge main; Add torch q8 linear	2025-03-14 05:52:07 -04:00
website	✨: refactor local_chat and fix message slice bug in server	2024-11-04 14:02:19 +08:00
__init__.py	🔖 release v0.2.3post2	2025-03-15 18:04:10 +08:00
local_chat.py	merge main; Add torch q8 linear	2025-03-14 05:52:07 -04:00
local_chat_test.py	local chat for cicd test	2025-03-15 02:31:19 +08:00
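
The Marlin fix noted in the commit message at the top of this listing comes down to one consistency rule: a rule that sets generate_op to "KLinearMarlin" must also set generate_device to "cuda". The sketch below checks that rule on a plain dictionary. Only the names generate_op, generate_device, "KLinearMarlin", and "cuda" come from the commit message; the function name, the dictionary layout, and the choice to accept any device string starting with "cuda" (such as "cuda:0") are assumptions made for illustration, not the project's actual validation code.

```python
from typing import Any, Dict, List


def marlin_device_problems(kwargs: Dict[str, Any]) -> List[str]:
    """Return a list of problems for one rule's kwargs (empty if consistent).

    Hypothetical helper: it only encodes the constraint described in the
    commit message, namely that Marlin quantized linear needs a GPU device.
    """
    problems: List[str] = []
    op = kwargs.get("generate_op")
    device = str(kwargs.get("generate_device", ""))
    # Assumption: any device string beginning with "cuda" counts as a GPU.
    if op == "KLinearMarlin" and not device.startswith("cuda"):
        problems.append(
            f'generate_op is "KLinearMarlin" but generate_device is '
            f'"{device}"; expected "cuda"'
        )
    return problems


# Hypothetical kwargs, as they might look after loading a rule from YAML.
bad = {"generate_op": "KLinearMarlin", "generate_device": "cpu"}
good = {"generate_op": "KLinearMarlin", "generate_device": "cuda"}

print(marlin_device_problems(bad))   # one message describing the mismatch
print(marlin_device_problems(good))  # empty list
```

Running a check like this over every rule before loading a model would report the mismatch up front rather than leaving it to surface as an AssertionError at load time.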