Update llamacpp.md

2026-04-28 11:40:07 +00:00 · 2025-09-05 11:15:58 +08:00 · 2025-09-05 11:15:58 +08:00 · aba1dc4de8
commit aba1dc4de8
parent 235bd1a3b5
1 changed files with 17 additions and 0 deletions
--- a/ProblemMap/GlobalFixMap/LocalDeploy_Inference/llamacpp.md
+++ b/ProblemMap/GlobalFixMap/LocalDeploy_Inference/llamacpp.md
@@ -1,5 +1,22 @@
 # Llama.cpp: Guardrails and Fix Patterns

+<details>
+  <summary><strong>🧭 Quick Return to Map</strong></summary>
+
+<br>
+
+  > You are in a sub-page of **LocalDeploy_Inference**.  
+  > To reorient, go back here:  
+  >
+  > - [**LocalDeploy_Inference** — on-prem deployment and model inference](./README.md)  
+  > - [**WFGY Global Fix Map** — main Emergency Room, 300+ structured fixes](../README.md)  
+  > - [**WFGY Problem Map 1.0** — 16 reproducible failure modes](../../README.md)  
+  >
+  > Think of this page as a desk within a ward.  
+  > If you need the full triage and all prescriptions, return to the Emergency Room lobby.
+</details>
+
+
 [Llama.cpp](https://github.com/ggerganov/llama.cpp) is a widely used local inference runtime for GGML/GGUF models.
 It enables CPU/GPU inference across diverse hardware, but deployments are prone to fragile states: mismatched quantization, KV-cache drift, and long-context instability.
 This page defines reproducible WFGY-based guardrails and direct fixes.
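One cheap guardrail against the format mismatches mentioned above is to check a model file's header before handing it to llama.cpp. This is a minimal sketch that assumes the published GGUF layout for v2 and later: the magic bytes `GGUF`, then a little-endian uint32 version and two uint64 counts. `read_gguf_header` and `GGUFHeader` are hypothetical helper names and are not part of llama.cpp.

```python
import io
import struct
from typing import BinaryIO, NamedTuple

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def _read_exact(f: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes or fail loudly (truncated download, wrong file)."""
    data = f.read(n)
    if len(data) != n:
        raise ValueError(f"file truncated: wanted {n} bytes, got {len(data)}")
    return data


def read_gguf_header(f: BinaryIO) -> GGUFHeader:
    """Parse the fixed GGUF preamble, assuming the little-endian v2+ layout:
    magic (4 bytes), version (uint32), tensor_count (uint64),
    metadata_kv_count (uint64).
    """
    magic = _read_exact(f, 4)
    if magic != GGUF_MAGIC:
        # Legacy GGML .bin files and non-model files are rejected here.
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    (version,) = struct.unpack("<I", _read_exact(f, 4))
    if version < 2:
        # GGUF v1 stored the two counts as uint32; this sketch does not parse v1.
        raise ValueError(f"GGUF v{version} is not handled by this check")
    tensor_count, kv_count = struct.unpack("<QQ", _read_exact(f, 16))
    return GGUFHeader(version, tensor_count, kv_count)


if __name__ == "__main__":
    # Synthetic in-memory header: GGUF v3 with 291 tensors and 24 metadata entries.
    blob = GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24)
    print(read_gguf_header(io.BytesIO(blob)))
```

For a real model, open the file in binary mode (`open(path, "rb")`) and pass the handle. A clear `ValueError` up front is easier to triage than a failed load deep inside the runtime.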