align sft branch with main: revert worker_pool, strip sft_timer, fix inference defaults

- Revert worker_pool.cpp/.h to main (remove RDTSC timer, Chrome Trace,
  sft_timer namespace, ITT API, extended do_work_stealing_job API)
- Strip all sft_timer instrumentation from sft-only files (sft_moe.hpp,
  moe-sft-tp.hpp, avx_kernels.hpp)
- Restore pin_memory=True in KExpertsCPUBuffer (inference path)
- Restore fused tensor transpose logic in convert_cpu_weights.py (main layout)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mrhaoxx 2026-04-21 17:39:56 +08:00
parent 168e10f254
commit a789729923
7 changed files with 159 additions and 766 deletions

@@ -90,7 +90,7 @@ class KExpertsCPUBuffer:
         hidden_size = hidden_states.shape[-1]
         batch_size = hidden_states.shape[0]
-        pin_memory = False
+        pin_memory = True
         if batch_size in cls.capture_buffers:
             return cls.capture_buffers[batch_size]