Mirror of https://github.com/onestardao/WFGY.git, synced 2026-04-28 11:40:07 +00:00.
Commit 039454bc54 (parent dcc0515aef): Update exllama.md, 1 file changed, 17 insertions(+), 0 deletions(-).
# ExLLaMA: Guardrails and Fix Patterns
<details>
<summary><strong>🧭 Quick Return to Map</strong></summary>

<br>

> You are in a sub-page of **LocalDeploy_Inference**.
> To reorient, go back here:
>
> - [**LocalDeploy_Inference** — on-prem deployment and model inference](./README.md)
> - [**WFGY Global Fix Map** — main Emergency Room, 300+ structured fixes](../README.md)
> - [**WFGY Problem Map 1.0** — 16 reproducible failure modes](../../README.md)
>
> Think of this page as a desk within a ward.
> If you need the full triage and all prescriptions, return to the Emergency Room lobby.
</details>

ExLLaMA (along with its successors ExLLaMA2 and ExLLaMA-HF) is a highly optimized CUDA inference backend used under **TextGen WebUI** and in custom pipelines.
It can run very large models (65B+) on limited VRAM, but it often becomes unstable when models are sharded, quantized, or paired with retrieval layers.
This guide stabilizes ExLLaMA with structural guardrails.
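Before applying guardrails, it helps to sanity-check whether a model should fit at all; many "instability" reports are simply VRAM exhaustion. The sketch below is illustrative back-of-envelope arithmetic, not part of ExLLaMA's API; the `overhead` factor is an assumption covering KV cache, activations, and fragmentation, and should be tuned for your context length.

```python
def weight_vram_gb(params_billions: float, bits_per_weight: float,
                   overhead: float = 1.2) -> float:
    """Rough VRAM estimate (in GB) for a quantized model's weights.

    overhead (assumption): headroom for KV cache, activations,
    and memory fragmentation; raise it for long contexts.
    """
    bytes_per_weight = bits_per_weight / 8.0
    return params_billions * bytes_per_weight * overhead

# A 65B model at 4-bit quantization lands near 39 GB (fits on two
# 24 GB cards), while fp16 weights alone would need roughly 156 GB.
print(round(weight_vram_gb(65, 4), 1))
print(round(weight_vram_gb(65, 16), 1))
```

If the estimate exceeds your total VRAM even before retrieval layers are added, fix the budget first; no guardrail below will rescue an out-of-memory configuration.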