Update exllamaV2.md

PSBigBig 2025-09-05 11:15:10 +08:00 committed by GitHub
parent 039454bc54
commit e5dd545c40

# ExLlamaV2: Guardrails and Fix Patterns
<details>
<summary><strong>🧭 Quick Return to Map</strong></summary>
<br>
> You are in a sub-page of **LocalDeploy_Inference**.
> To reorient, go back here:
>
> - [**LocalDeploy_Inference** — on-prem deployment and model inference](./README.md)
> - [**WFGY Global Fix Map** — main Emergency Room, 300+ structured fixes](../README.md)
> - [**WFGY Problem Map 1.0** — 16 reproducible failure modes](../../README.md)
>
> Think of this page as a desk within a ward.
> If you need the full triage and all prescriptions, return to the Emergency Room lobby.
</details>

ExLlamaV2 is a specialized inference backend for LLaMA-family models built around optimized low-bit quantization (its EXL2 format supports mixed precision, roughly 2 to 8 bits per weight).
It delivers higher throughput and lower VRAM usage than generic backends, but the aggressive quantization introduces new risks: accuracy loss, schema drift, and numerical instability.
This page maps those issues to WFGY structural fixes, each with a measurable acceptance target.
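To see where the accuracy risk comes from, the round-trip error of low-bit quantization can be sketched in a few lines. This is a minimal, illustrative model of symmetric per-group quantization, not ExLlamaV2's actual EXL2 kernels; the function name, group size, and sample weights are all hypothetical.

```python
def quantize_roundtrip_4bit(values, group_size=4):
    """Illustrative symmetric 4-bit group quantization round trip.

    Each group is scaled so its largest magnitude maps to the signed
    4-bit range, rounded to an integer grid, then dequantized back to
    float. The residual between input and output is the kind of error
    that low-bit backends trade for speed and VRAM savings.
    """
    out = []
    for i in range(0, len(values), group_size):
        group = values[i:i + group_size]
        # Per-group scale; fall back to 1.0 for an all-zero group.
        scale = max(abs(v) for v in group) / 7 or 1.0
        for v in group:
            q = max(-8, min(7, round(v / scale)))  # clamp to signed int4
            out.append(q * scale)                   # dequantize
    return out

weights = [0.12, -0.95, 0.33, 0.07, 1.8, -0.02, 0.4, -1.1]
dequantized = quantize_roundtrip_4bit(weights)
max_err = max(abs(a - b) for a, b in zip(weights, dequantized))
print(f"max round-trip error: {max_err:.4f}")
```

Note that the per-group maximum is reproduced exactly while mid-range values absorb up to half a quantization step of error; real EXL2 quantization mitigates this with calibration and mixed bit widths, which is why the fix patterns below focus on detecting residual drift rather than assuming it away.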