Add a numa_nodes parameter to BaseMoEWrapper and all of its subclasses
so that users can explicitly specify which NUMA node IDs to use for
subpool mapping, instead of always defaulting to the sequential list
[0, 1, ..., N-1]. This makes it possible to run multiple KTransformers
instances on different NUMA nodes of the same machine, e.g.
--kt-threadpool-count 1 --kt-numa-nodes 1 binds a single subpool to
NUMA node 1. Previously this required external numactl workarounds,
because subpool_numa_map was hardcoded to start from 0.
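
The mapping described above can be sketched roughly as follows. This is
an illustration only, not the code in this change: the helper name
resolve_subpool_numa_map and its length and sign checks are assumptions
made for the example; only numa_nodes, subpool_numa_map, and the
sequential default come from the description.

```python
from typing import List, Optional, Sequence


def resolve_subpool_numa_map(
    threadpool_count: int,
    numa_nodes: Optional[Sequence[int]] = None,
) -> List[int]:
    """Hypothetical helper: return one NUMA node ID per subpool."""
    if threadpool_count < 1:
        raise ValueError("threadpool_count must be at least 1")

    # Old behaviour, kept as the default: sequential IDs starting at 0.
    if numa_nodes is None:
        return list(range(threadpool_count))

    nodes = list(numa_nodes)
    # Assumption for this sketch: exactly one node ID per subpool.
    if len(nodes) != threadpool_count:
        raise ValueError(
            f"expected {threadpool_count} NUMA node IDs, got {len(nodes)}"
        )
    if any(node < 0 for node in nodes):
        raise ValueError("NUMA node IDs must be non-negative")
    return nodes


# Default: two subpools mapped to nodes 0 and 1.
default_map = resolve_subpool_numa_map(2)
# Explicit: a single subpool bound to NUMA node 1.
pinned_map = resolve_subpool_numa_map(1, numa_nodes=[1])
```

With this shape, a second instance on the same machine could pass
numa_nodes=[0] to stay on the other node.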

* update README for kt-kernel
* style: format C++ and Python code in kt-kernel
  - Format C++ files: task_queue, ext_bindings, and MoE operators
  - Format Python utility modules: amx, llamafile, and loader
  - Improve code readability and consistency