vrr/kvcache-ai-ktransformers

mirror of https://github.com/kvcache-ai/ktransformers.git synced 2026-05-01 21:21:12 +00:00

Author	SHA1	Message	Date
Jianwei Dong	15c624dcae	Fix/sglang kt detection (#1875) * [feat]: simplify sglang installation with submodule, auto-sync CI, and version alignment - Add kvcache-ai/sglang as git submodule at third_party/sglang (branch = main) - Add top-level install.sh for one-click source installation (sglang + kt-kernel) - Add sglang-kt as hard dependency in kt-kernel/pyproject.toml - Add CI workflow to auto-sync sglang submodule daily and create PR - Add CI workflow to build and publish sglang-kt to PyPI - Integrate sglang-kt build into release-pypi.yml (version.py bump publishes both packages) - Align sglang-kt version with ktransformers via SGLANG_KT_VERSION env var injection - Update Dockerfile to use submodule and inject aligned version - Update all 13 doc files, CLI hints, and i18n strings to reference new install methods Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [build]: bump version to 0.5.2 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [build]: rename PyPI package from kt-kernel to ktransformers Users can now `pip install ktransformers` to get everything (sglang-kt is auto-installed as a dependency). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Revert "[build]: rename PyPI package from kt-kernel to ktransformers" This reverts commit `e0cbbf6364`. * [build]: add ktransformers meta-package for PyPI `pip install ktransformers` now works as a single install command. It pulls kt-kernel (which in turn pulls sglang-kt). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [fix]: show sglang-kt package version in kt version command - Prioritize sglang-kt package version (aligned with ktransformers) over sglang internal __version__ - Update display name from "sglang" to "sglang-kt" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [fix]: improve sglang-kt detection in kt doctor and kt version Recognize sglang-kt package name as proof of kvcache-ai fork installation. Previously both commands fell through to "PyPI (not recommended)" for non-editable local source installs. Now version.py reuses the centralized check_sglang_installation() logic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [build]: bump version to 0.5.2.post1 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-04 16:54:48 +08:00
ErvinXie	71f683acec	Support Native Kimi K2 Thinking (#1663) * [feat]: fix k2 prefill * Update Kimi-K2-Thinking.md * Create Kimi-K2-Thinking-Native.md * Update Kimi-K2-Thinking.md * Update Kimi-K2-Thinking.md * Update Kimi-K2-Thinking-Native.md * [perf] optimize K2 MoE weight loading with per-expert pointers - Avoid expensive torch.stack().contiguous() in Python (was ~6.6s) - Use per-expert pointer arrays (gate_projs) instead of contiguous memory - C++ worker pool performs parallel memcpy for TP slicing - Add LOAD_TIME_PROFILE for load_weights timing analysis 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: ouqingliang <1692110604@qq.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-12-05 21:53:05 +08:00
Jiaqi Liao	46af8fcab5	[doc] fix kt parameters (#1629)	2025-11-19 16:41:57 +08:00
Atream	b67cc4095d	Change attention backend to 'flashinfer' in launch command. Updated the launch command to include 'flashinfer' as the attention backend.	2025-11-08 20:56:09 +08:00
Atream	0651dbda04	Simplify launch command by removing unused option. Removed the unused '--attention-backend triton' option from the launch command.	2025-11-08 16:54:18 +08:00
Atream	d6ee384fe2	Fix download link for Kimi-K2-Thinking weights. Updated the download link for AMX INT4 quantized weights.	2025-11-06 19:07:15 +08:00
Atream	d419024bb4	Add KTransformers SGLang inference documentation. Add documentation for KTransformers SGLang inference deployment, including installation steps, model download links, server launch instructions, and performance benchmarks.	2025-11-06 17:53:58 +08:00

7 commits