Commit graph

  • 69a566831d Merge 61ded4f6bc into f0772445a1 login256 2026-05-18 20:45:46 +00:00
  • f2658823be Merge 3278c92e61 into f0772445a1 Benjamin 2026-05-18 11:01:43 +00:00
  • 3278c92e61 [fix](kt-kernel): add intermediate_size parity check in MXFP4 TP flat-buffer path yyj 2026-05-18 18:58:43 +08:00
  • 86d18f94db Merge 08bdfc22cf into f0772445a1 RaQiu 2026-05-18 17:19:56 +08:00
  • f5d44c8611 Merge upstream/main: adopt ActivationBF16/DequantizedWeight non-AVX512BF16 fallback yyj 2026-05-18 16:31:09 +08:00
  • 0e3d43adea [fix](kt-kernel): address PR #2010 review — memory leaks, alignment, dynamic expert update yyj 2026-05-18 16:05:41 +08:00
  • f0772445a1 [perf]: native path for MXFP4 MoE on AVX512F (#2006) main Jim James 2026-05-18 02:44:33 -05:00
  • 6ca197868a [feat](kt-kernel): AVX2 MXFP4 MoE + AMX tile MXFP4 dispatch yyj 2026-05-17 13:20:06 +00:00
  • 08bdfc22cf fix(mesh): chunk prefill scratch promotions RaQiu 2026-05-17 00:56:45 +08:00
  • fb1a829e41 fix(mesh): skip score defer during prefill RaQiu 2026-05-17 00:43:39 +08:00
  • d7f7e66505 [refactor](mesh): remove mmap residency leftovers RaQiu 2026-05-16 23:43:22 +08:00
  • 15ddab0c19 [chore](mesh): move windows and desktop work off main RaQiu 2026-05-16 21:27:37 +08:00
  • 778e7affd2 [fix](mesh): clean extracted include endings RaQiu 2026-05-16 21:18:02 +08:00
  • 7c9b768475 [refactor](mesh): extract mesh runtime helpers RaQiu 2026-05-16 21:15:32 +08:00
  • 744f91e712 [perf]: move inline static constants outside structs Jim James 2026-05-15 03:58:27 -05:00
  • 6f4560aefd [perf]: native path for MXFP4 MoE on AVX512F Jim James 2026-05-15 01:23:57 -05:00
  • 95e20f9c55 [build]: sync sglang submodule to ebaff7729b9e41c29d94f8d19a53473d321dc566 (#2005) github-actions[bot] 2026-05-14 22:25:31 +08:00
  • d60ddf135e [build]: sync sglang submodule to ebaff7729b9e41c29d94f8d19a53473d321dc566 auto/sync-sglang ovowei 2026-05-14 09:00:00 +00:00
  • e23b63259f [feat](mesh): restore prefill expert-window path RaQiu 2026-05-14 10:10:00 +08:00
  • b3a21c1476 [feat](mesh): add prefill layer-window residency mode RaQiu 2026-05-13 19:37:53 +08:00
  • 61c8c56f1c [fix](mesh): guard resident io by buffer layout RaQiu 2026-05-12 09:56:30 +08:00
  • 11ffbebedc [feat](mesh): restore deferred miss instrumentation RaQiu 2026-05-11 23:32:42 +08:00
  • 6ff1396bda [chore]: merge upstream main RaQiu 2026-05-11 22:19:46 +08:00
  • 3d95b6cb5c [fix](mesh): skip full-gate observe during cuda graph capture RaQiu 2026-05-11 16:01:10 +08:00
  • 39894eb7e5 [feat](mesh): implement slot residency and iouring prefetch RaQiu 2026-05-11 15:42:48 +08:00
  • f05b4009f3 [fix](kt-kernel): fix double mem used by safetensor loader (#1997) Benjamin F 2026-05-11 12:00:30 +08:00
  • 4a033215bf [fix](kt-kernel): fix double mem used by safetensor loader xiongchenhui 2026-05-10 14:18:45 +08:00
  • bb15fdf47e Release/0.6.2.post3: carry kt-kernel SwiGLU clamp companion missing from post2 v0.6.2.post3 Benjamin F 2026-05-10 03:55:02 +08:00
  • 67abbc8087 [fix](release): 0.6.2.post3 — carry kt-kernel SwiGLU clamp companion missing from post2 yyj 2026-05-09 18:31:25 +00:00
  • bd6f4fc18e [feat](v4-2604b): plumb swiglu_limit through MXFP4 CPU MoE path yyj 2026-05-06 20:54:03 +08:00
  • a0f9b299bc docs: add entrypoints and support matrix doc-reorg-entrypoints-support-matrix JimmyPeilinLi 2026-05-10 01:22:13 +08:00
  • 37db9a3b83 0.6.2.post2: submodule refactor and update tutorial (#1993) v0.6.2.post2 Benjamin F 2026-05-09 18:53:59 +08:00
  • 990812f678 0.6.2.post2: submodule refactor and update tutorial yyj 2026-05-09 18:50:14 +08:00
  • f83141512c [release]: 0.6.2.post2 yyj 2026-05-09 17:40:49 +08:00
  • 04d3737945 build: bump sglang submodule to 43ed1ec77 (V4-Flash hybrid SWA hang fix + DSV4 plugin refactor) yyj 2026-05-09 17:40:10 +08:00
  • 57967bc109 [build]: sync sglang submodule to 43ed1ec77a7fc37e6eeedbf9191e23abd418ae85 ovowei 2026-05-09 08:35:57 +00:00
  • f7c4fa68c5 [fix]: add guard for SFT MoE and remove guard for AMX FP4 MoE on AVX512F+BW (#1980) Jim James 2026-05-08 03:05:22 -05:00
  • c465557c23 docs(v4-flash): add optional AMXINT4 CPU-weight conversion path (#1986) Benjamin F 2026-05-08 15:35:05 +08:00
  • c5d9d652c3 docs(v4-flash): add optional AMXINT4 CPU-weight conversion path yyj 2026-05-08 15:27:02 +08:00
  • ddea72ed16 [fix]: harden iouring promotion concurrency RaQiu 2026-05-08 11:58:09 +08:00
  • dfded881ba [fix]: tolerate safetensors handles without close RaQiu 2026-05-07 23:48:09 +08:00
  • cce6b0979f [fix]: align safetensors header without strict payload check RaQiu 2026-05-07 23:45:30 +08:00
  • 00a2434310 [fix]: support Qwen3.5 expert key conversion RaQiu 2026-05-07 23:35:51 +08:00
  • 4c3d3cf02f [feat]: add safetensors direct I/O aligner RaQiu 2026-05-07 23:28:22 +08:00
  • b6d0d292d9 [feat]: complete AMX io_uring resident backend RaQiu 2026-05-07 23:16:06 +08:00
  • 61ded4f6bc Fix duplicate BF16 loader definition lyt 2026-05-07 08:02:29 +00:00
  • 1bdfaac9a9 [fix]: fallback when cpuinfer stream hooks are unavailable RaQiu 2026-05-07 14:34:38 +08:00
  • 1c96c53642 [fix]: skip tiered provider for moe without promotion hooks RaQiu 2026-05-07 14:19:26 +08:00
  • 4892595406 [fix]: detect Qwen3.5 unfused BF16 experts RaQiu 2026-05-07 14:09:47 +08:00
  • cba709123b [feat]: add residency policy selection for tiered weights RaQiu 2026-05-07 14:03:08 +08:00
  • 896de01024 [fix]: accept NUMA nodes in KTMoE wrapper RaQiu 2026-05-07 13:56:52 +08:00
  • 5b758e8f1a [feat]: wire AMX io_uring file-slot backend RaQiu 2026-05-07 12:04:17 +08:00
  • 5c0b7f2d5e [fix]: add guard for SFT MoE and remove guard for AMX FP4 MoE on AVX512F+BW Jim James 2026-05-06 14:47:57 -05:00
  • 99a39728b3 [fix]: io_uring compilation and test fixes RaQiu 2026-05-06 23:44:49 +08:00
  • 3740081249 [fix]: AsyncExpertReader constructor signature - remove num_workers parameter RaQiu 2026-05-06 22:10:38 +08:00
  • 48b50c8886 [feat]: io_uring Python bindings and CLI integration RaQiu 2026-05-06 21:56:16 +08:00
  • 04e1bf132c [feat]: io_uring async I/O for expert weight loading (Sprint 1-2) RaQiu 2026-05-06 17:39:55 +08:00
  • 8b9d233d42 docs(v4-flash): tilelang install, MTP flags, Ampere unsupported (#1979) Benjamin F 2026-05-06 17:29:38 +08:00
  • 0207231e8c docs(v4-flash): tilelang install, MTP flags, Ampere unsupported yyj 2026-05-06 15:34:58 +08:00
  • 5eeaa13426 [fix](kt-kernel): fix double mem used when loading xiongchenhui 2026-05-06 11:33:16 +08:00
  • d7b5b49a3e [release]: 0.6.2.post1 v0.6.2.post1 Benjamin F 2026-05-03 21:07:23 +08:00
  • dcf5b1c42b release: bump version to 0.6.2.post1 yyj 2026-05-03 21:01:29 +08:00
  • 96189972d8 build: bump sglang submodule to c9edb75e0 (V4-Flash GPU prefill fallback fix + perf) (#1975) Benjamin F 2026-05-03 19:42:19 +08:00
  • 3dc081f527 build: bump sglang submodule to c9edb75e0 (V4-Flash GPU prefill fallback fix + perf) yyj 2026-05-03 19:38:09 +08:00
  • 088ed979d5 docs(v4-flash): pin transformers==4.57.1 in tutorial prerequisites (#1974) Benjamin F 2026-05-03 16:07:31 +08:00
  • a4ef3ad7f1 docs(v4-flash): pin transformers==4.57.1 in tutorial prerequisites yyj 2026-05-03 16:06:10 +08:00
  • 4b4312c0a2 release: bump version to 0.6.2 (#1973) v0.6.2 Benjamin F 2026-05-03 14:28:09 +08:00
  • f2ce7d061a release: bump version to 0.6.2 yyj 2026-05-03 14:10:46 +08:00
  • bb3b6e8413 build: bump sglang submodule to 40d3a82 (V4-Flash flashinfer guard) (#1972) Benjamin F 2026-05-03 14:06:33 +08:00
  • 11c0667027 build: bump sglang submodule to 40d3a82 (V4-Flash flashinfer guard) yyj 2026-05-03 13:15:09 +08:00
  • 53f356c328 deploy: 041bdfc636 gh-pages yyj6666667 2026-05-03 02:48:51 +00:00
  • 041bdfc636 [New Model] DeepSeek-V4-Flash: kt-kernel MXFP4 MoE + sglang hybrid inference (#1970) Benjamin F 2026-05-03 10:48:31 +08:00
  • 667107996d docs(v4-flash): hybrid CPU/GPU recipe + bump kt-sglang submodule yyj 2026-05-02 21:46:53 +08:00
  • 1f197e3540 fix(loader): avoid uint16 lshift in ue8m0->bf16 conversion yyj 2026-04-26 18:52:27 +08:00
  • d50c09eea5 [perf](kt-kernel): MXFP4 MoE add mat-mat 4×4 tile, refine mat-vec reduce (#1957) Benjamin F 2026-04-26 17:34:08 +08:00
  • 5c2593901d [feat](kt-kernel): adapt MXFP4 MoE backend for DeepSeek-V4-Flash (#1950) Benjamin F 2026-04-25 18:11:53 +08:00
  • 6f5b413750 [feat](kt-kernel): add MXFP4 MoE operator with E2M1 weights × BF16 activations ouqingliang 2026-04-21 02:53:04 +00:00
  • fe06c4d355 [build]: sync sglang submodule to 537eb762b0881071a0e098bd78666fe052b83deb (#1967) github-actions[bot] 2026-05-02 12:42:04 +08:00
  • 9ccc262ab2 [build]: sync sglang submodule to 537eb762b0881071a0e098bd78666fe052b83deb ovowei 2026-05-01 08:43:28 +00:00
  • fb4e11db95 deploy: 02be2bf53f jdai0 2026-04-30 09:17:11 +00:00
  • 02be2bf53f [feat](kt-kernel): add AVX2/AVX-VNNI RAWINT4 MoE backend (#1942) Aliez Ren 2026-04-30 18:16:49 +09:00
  • 7b5c87d07e Add instructions for AVX2 compilation Jiaheng Dai 2026-04-30 17:00:15 +08:00
  • b91a54e185 Update AVX2 tutorial with AVX2 compilation instructions Jiaheng Dai 2026-04-30 16:58:46 +08:00
  • c449365810 [fix](kt-kernel): fix double mem used by safetensor loader xiongchenhui 2026-04-30 16:27:27 +08:00
  • 07f39626ae deploy: 8c634d5dca JimmyPeilinLi 2026-04-30 08:25:52 +00:00
  • 8c634d5dca [docs]: refresh kt inference and sft entry points Peilin Li 2026-04-30 16:25:34 +08:00
  • dbc254a0f5 [docs]: refresh kt inference and sft entry points JimmyPeilinLi 2026-04-30 08:23:48 +00:00
  • 24b1941b85 [fix]: point sglang extra to post2 (#1964) v0.6.1 Peilin Li 2026-04-30 11:57:02 +08:00
  • 5928ccb0b5 [fix]: point sglang extra to post2 JimmyPeilinLi 2026-04-30 03:16:05 +00:00
  • 72044ad65f [build]: bump v0.6.1 post1 package metadata v0.6.1.post1 Peilin Li 2026-04-30 01:02:44 +08:00
  • 429b83f57a [build]: bump v0.6.1 post1 package metadata JimmyPeilinLi 2026-04-29 17:01:37 +00:00
  • ef5822639f [fix](kt-kernel): pin torch 2.9.1 wheel baseline Peilin Li 2026-04-30 00:57:24 +08:00
  • 719fa0a06d [build](sglang): bump submodule for torch 2.9.1 baseline JimmyPeilinLi 2026-04-29 16:07:42 +00:00
  • 65c04c4c55 [fix](kt-kernel): pin torch 2.9.1 wheel baseline JimmyPeilinLi 2026-04-29 16:07:00 +00:00
  • 9f34ef46e6 [fix](Qwen3 series): fix gibberish output by correcting RoPE write-back (#31) (#1959) Benjamin F 2026-04-27 22:04:29 +08:00
  • 1cb73c7100 [fix](Qwen3 series): fix gibberish output by correcting RoPE write-back (#31) bump-sglang-pr31 yyj 2026-04-27 21:59:38 +08:00
  • 6fea6b7d99 deploy: 0656e01ac1 JimmyPeilinLi 2026-04-26 16:46:01 +00:00
  • 0656e01ac1 [docs]: refresh KT install commands (#1958) Peilin Li 2026-04-27 00:45:43 +08:00
  • d93ea7e21e [docs]: refresh KT install commands docs-v061-refresh JimmyPeilinLi 2026-04-26 16:29:20 +00:00
  • a7a575d41e [perf](kt-kernel): MXFP4 MoE add mat-mat 4×4 tile, refine mat-vec reduce (#1957) fp4-moe-amx Benjamin F 2026-04-26 17:34:08 +08:00