kvcache-ai/ktransformers — contents of the `ktransformers/` directory (listing as of 2025-02-17 00:10:55 +08:00)
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| configs | update rope calculation; update modeling.py; update gate for moe | 2025-02-01 07:32:21 +00:00 |
| ktransformers_ext | toy support for experts on GPU, no CUDA Graph | 2025-02-15 15:16:00 +00:00 |
| models | feat: add prefix cache for server | 2025-02-17 00:10:55 +08:00 |
| operators | Mock triton mla due to precision issue | 2025-02-16 06:03:12 +00:00 |
| optimize | toy support for experts on GPU, no CUDA Graph | 2025-02-15 15:16:00 +00:00 |
| server | feat: add prefix cache for server | 2025-02-17 00:10:55 +08:00 |
| tests | [fix] format classes and files name | 2024-08-15 10:44:59 +08:00 |
| util | Merge pull request #333 from kvcache-ai/feat_experts_gpu | 2025-02-15 23:30:24 +08:00 |
| website | : refactor local_chat and fix message slice bug in server | 2024-11-04 14:02:19 +08:00 |
| __init__.py | [feature] update docker image and entrypoint | 2025-02-15 07:55:33 +00:00 |
| local_chat.py | support force thinking | 2025-02-12 12:43:53 +08:00 |