diff --git a/doc/en/AMX.md b/doc/en/AMX.md
index 1061e9f..00a2e22 100644
--- a/doc/en/AMX.md
+++ b/doc/en/AMX.md
@@ -20,9 +20,9 @@ Here is the Qwen3MoE startup command:
 ``` python
 # llamafile backend
-python ktransformers/server/main.py --architectures Qwen3MoeForCausalLM --model_path --gguf_path --optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve.yaml
+python ktransformers/server/main.py --architectures Qwen3MoeForCausalLM --model_path --gguf_path --optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve.yaml --backend_type balance_serve
 
 # AMX backend
-python ktransformers/server/main.py --architectures Qwen3MoeForCausalLM --model_path --gguf_path --optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve-amx.yaml
+python ktransformers/server/main.py --architectures Qwen3MoeForCausalLM --model_path --gguf_path --optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve-amx.yaml --backend_type balance_serve
 ```
 
 **Note: At present, Qwen3MoE running with AMX can only read BF16 GGUF; support for loading from safetensor will be added later.**