[docs]: update web doc (#1625)
This commit is contained in:
parent be6db6f46b
commit ab8ad0a110
3 changed files with 27 additions and 44 deletions
@@ -1,23 +1,13 @@
- [KTransformers Fine-Tuning × LLaMA-Factory Integration – User Guide](#ktransformers-fine-tuning-x-llama-factory-integration-–-user-guide)
  - [Introduction](#introduction)
    - [Fine-Tuning Results (Examples)](#fine-tuning-results-examples)
      - [Stylized Dialogue (CatGirl tone)](#stylized-dialogue-catgirl-tone)
      - [Benchmarks](#benchmarks)
        - [Translational-Style dataset](#translational-style-dataset)
        - [AfriMed-QA (short answer)](#afrimed-qa-short-answer)
        - [AfriMed-QA (multiple choice)](#afrimed-qa-multiple-choice)
  - [Quick to Start](#quick-to-start)
    - [Environment Setup](#environment-setup)
    - [Core Feature 1: Use KTransformers backend to fine-tune ultra-large MoE models](#core-feature-1-use-ktransformers-backend-to-fine-tune-ultra-large-moe-models)
    - [Core Feature 2: Chat with the fine-tuned model (base + LoRA adapter)](#core-feature-2-chat-with-the-fine-tuned-model-base--lora-adapter)
    - [Core Feature 3: Batch inference + metrics (base + LoRA adapter)](#core-feature-3-batch-inference--metrics-base--lora-adapter)
  - [KT Fine-Tuning Speed (User-Side View)](#kt-fine-tuning-speed-user-side-view)
    - [End-to-End Performance](#end-to-end-performance)
    - [GPU/CPU Memory Footprint](#gpucpu-memory-footprint)
  - [Conclusion](#conclusion)
@@ -33,7 +23,7 @@ Our goal is to give resource-constrained researchers a **local path to explore f
As shown below, LLaMA-Factory is the unified orchestration/configuration layer for the whole fine-tuning workflow—handling data, training scheduling, LoRA injection, and inference interfaces. **KTransformers** acts as a pluggable high-performance backend that takes over core operators like Attention/MoE under the same training configs, enabling efficient **GPU+CPU heterogeneous cooperation**.

Within LLaMA-Factory, we compared LoRA fine-tuning with **HuggingFace**, **Unsloth**, and **KTransformers** backends. KTransformers is the **only workable 4090-class solution** for ultra-large MoE models (e.g., 671B) and also delivers higher throughput and lower GPU memory on smaller MoE models (e.g., DeepSeek-14B).
@@ -46,7 +36,7 @@ Within LLaMA-Factory, we compared LoRA fine-tuning with **HuggingFace**, **Unslo
† **1400 GB** is a **theoretical** FP16 full-parameter resident footprint (not runnable). **70 GB** is the **measured peak** with KT strategy (Attention on GPU + layered MoE offload).
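As a quick sanity check on the theoretical figure (assuming the 671B-parameter DeepSeek model discussed above), FP16 weights alone come to

$$
671 \times 10^{9}\ \text{params} \times 2\ \text{bytes/param} \approx 1.34 \times 10^{12}\ \text{bytes} \approx 1342\ \text{GB},
$$

which matches the quoted ~1400 GB in order of magnitude before any activations, gradients, or runtime buffers are counted.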

### Fine-Tuning Results (Examples)
@@ -56,7 +46,7 @@ Dataset: [NekoQA-10K](https://zhuanlan.zhihu.com/p/1934983798233231689). Goal: i
The figure compares responses from the base vs. fine-tuned models. The fine-tuned model maintains the target tone and address terms more consistently (red boxes), validating the effectiveness of **style-transfer fine-tuning**.

#### Benchmarks
@@ -219,7 +209,7 @@ We recommend **AMX acceleration** where available (`lscpu | grep amx`). AMX supp
Outputs go to `output_dir` in safetensors format plus adapter metadata for later loading.
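For orientation, a finished run's `output_dir` typically ends up looking like the sketch below. The directory name follows the hypothetical config sketch earlier, and the exact file names depend on the LLaMA-Factory/PEFT versions in use.

```bash
ls saves/deepseek-v3-lora/
# adapter_model.safetensors   <- LoRA weights in safetensors format
# adapter_config.json         <- adapter metadata read back when the adapter is loaded
# trainer_log.jsonl           <- per-step training log
# training_args.bin, README.md, ...
```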

### Core Feature 2: Chat with the fine-tuned model (base + LoRA adapter)
@@ -244,7 +234,7 @@ We also support **GGUF** adapters: for safetensors, set the **directory**; for G
During loading, LLaMA-Factory maps layer names to KT’s naming. You’ll see logs like `Loaded adapter weight: XXX -> XXX`:

### Core Feature 3: Batch inference + metrics (base + LoRA adapter)