Local Deploy & Inference — Global Fix Map
🏥 Quick Return to Emergency Room
You are at a specialist desk.
For full triage and doctors on duty, return here:
- WFGY Global Fix Map — main Emergency Room, 300+ structured fixes
- WFGY Problem Map 1.0 — 16 reproducible failure modes
Think of this page as a sub-room.
If you want full consultation and prescriptions, go back to the Emergency Room lobby.
A beginner-friendly hub to stabilize locally hosted LLMs on your own machine or cluster.
Use this folder when it looks like the “model is broken” but the real cause is infra settings: tokenizer mismatch, rope scaling, kv-cache size, build flags, or server parameters.
Every guide links back to WFGY with measurable acceptance targets. No infra rebuild required.
When to use this folder
- Local server gives fluent answers but citations point to the wrong snippet
- Same input produces different outputs on each run
- JSON mode fails on long answers or tool calls loop endlessly
- Latency keeps growing after a few turns, or context cuts off too early
- Quantized model outputs diverge heavily from fp16 baseline
- Retrieval quality drops after switching loaders or UIs
Open these first
- Recovery map: RAG Architecture & Recovery
- Retrieval knobs: Retrieval Playbook
- Traceability schema: Retrieval Traceability
- Meaning vs similarity: Embedding ≠ Semantic
- Rank ordering: Rerankers
- Drift in long runs: Context Drift, Entropy Collapse
- Logic collapse and repair: Logic Collapse
- Guarding against bad prompts: Prompt Injection
- Contract schema for snippets: Data Contracts
Acceptance targets
- ΔS(question, retrieved) ≤ 0.45
- Coverage of target section ≥ 0.70
- λ convergent across 3 paraphrases × 2 seeds
- E_resonance stays flat on long windows
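The first two targets can be checked mechanically. The sketch below is a minimal illustration, not the WFGY implementation: this page does not define ΔS, so the code assumes ΔS is 1 minus the cosine similarity of two embedding vectors, and the function names `delta_s` and `meets_targets` are hypothetical.

```python
import math

def delta_s(a, b):
    """ASSUMPTION: dS = 1 - cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return 1.0 - dot / norm

def meets_targets(ds, coverage, ds_max=0.45, coverage_min=0.70):
    """Check the two numeric targets listed above: dS <= 0.45, coverage >= 0.70."""
    return ds <= ds_max and coverage >= coverage_min
```

Pass in embeddings from whichever embedding model your retriever already uses, so the score reflects the same vector space as the index.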
Quick routes to per-tool pages
Common local causes & fixes
| Symptom | Likely cause | Fix |
|---|---|---|
| Wrong snippet despite high similarity | Tokenizer mismatch, analyzer drift | Align tokenizer files, check retriever metric, use Embedding ≠ Semantic |
| JSON tool calls unstable | Schema drift, free text in outputs | Enforce Data Contracts, apply Logic Collapse |
| Outputs flip each run | Context order drift, variance | Clamp header order, use Context Drift, enforce trace table |
| Hybrid retrieval worse than single | Ranker instability | Split parsing → pattern_query_parsing_split.md |
| Fixed hallucination returns later | Long chain decay | hallucination-reentry.md |
Local-specific guardrails
- Model format: GGUF vs safetensors vs HF transformers → use same tokenizer and rope scale
- Quantization: Compare q4/q8 vs fp16; if ΔS drifts, tune kv_cache and sampling params
- Server flags: Align defaults (temp, top_p, penalties, stop tokens) across servers
- Tokenizer & casing: Keep analyzers consistent across retrievers, rerankers, HyDE
- Batching: Fix batch size during eval; dynamic batching can change outputs between runs and be mistaken for sampling “randomness”
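To make the "align defaults" guardrail concrete, here is a small sketch that diffs the effective sampling settings of several servers. The server labels and values are placeholders you fill in by hand from each server's configuration; nothing here queries a real server API, and `diff_sampling` is a hypothetical helper.

```python
def diff_sampling(configs):
    """Return {param: {server: value}} for every parameter that is not
    identical across all servers. A missing parameter is reported as None."""
    keys = set()
    for cfg in configs.values():
        keys.update(cfg)
    mismatches = {}
    for key in sorted(keys):
        values = {server: cfg.get(key) for server, cfg in configs.items()}
        vals = list(values.values())
        if any(v != vals[0] for v in vals[1:]):
            mismatches[key] = values
    return mismatches

# Placeholder values for illustration only, not real server defaults.
configs = {
    "server_a": {"temperature": 0.7, "top_p": 0.9},
    "server_b": {"temperature": 0.7, "top_p": 0.95, "stop": ["</s>"]},
}
for param, values in diff_sampling(configs).items():
    print(param, values)
```

An empty result means the servers are aligned on every parameter you listed; anything else is a candidate explanation for "the same model behaves differently".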
60-second fix checklist
- Compute ΔS(question, retrieved) and ΔS(retrieved, anchor)
  - <0.40 = stable, 0.40–0.60 = risky, ≥0.60 = broken
- Probe λ_observe at k=5,10,20; if ΔS flat & high → metric/index bug
- Apply modules:
  - Retrieval drift → BBMC + Data Contracts
  - Collapse in reasoning → BBCR + BBAM
  - Dead ends in long runs → BBPF alternate paths
- Verify coverage ≥0.70 and λ convergent on 2 seeds
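The banding and the k-probe from the checklist can be sketched as follows. The band edges come from the checklist; the `tolerance` used to decide that a curve is "flat" is an assumption, and both function names are hypothetical.

```python
def band(ds):
    """Map a dS value to the checklist bands."""
    if ds < 0.40:
        return "stable"
    if ds < 0.60:
        return "risky"
    return "broken"

def flat_and_high(ds_by_k, high=0.60, tolerance=0.05):
    """True when dS stays high and barely moves as k grows, which the
    checklist reads as a metric/index bug.
    ASSUMPTION: 'flat' means max - min <= tolerance."""
    values = list(ds_by_k.values())
    return min(values) >= high and max(values) - min(values) <= tolerance

# Example probe at k = 5, 10, 20 with made-up measurements.
probe = {5: 0.66, 10: 0.65, 20: 0.67}
print(band(probe[10]), flat_and_high(probe))
```

If ΔS falls as k grows, retrieval is at least finding better material with more candidates, so look at ranking and filtering before suspecting the index.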
Copy-paste prompt for local servers
I have TXT OS + WFGY loaded.
Local setup:
* server: <ollama|vllm|llama.cpp|tgi|lmstudio|koboldcpp|openwebui>
* model: <name>, quant=<fp16|q4|q8|awq|gptq>, ctx=<...>, rope=<...>
* sampling: temp=<...>, top_p=<...>, max_tokens=<...>
* retriever: <metric>, <analyzer>, k=<...>
Tell me:
1. which layer is failing and why
2. which WFGY page to open
3. steps to push ΔS ≤ 0.45 and keep λ convergent
4. reproducible test to confirm
FAQ (Beginner-Friendly)
Q: Why does my local model give fluent text but wrong citations?
A: Usually the model is not at fault; the tokenizer or retriever is mismatched. Fix by aligning tokenizer files and checking ΔS against the gold section.
Q: Why does JSON mode fail locally but work on cloud APIs?
A: Local servers often don’t enforce schema strictly. Apply Data Contracts and disallow free-form prose in tool outputs.
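A minimal sketch of client-side enforcement, assuming your tool outputs are meant to be a single JSON object with a known set of fields. The field names and the `validate_tool_output` helper are hypothetical and stand in for whatever your Data Contract specifies; only the Python standard library is used.

```python
import json

def validate_tool_output(raw, required):
    """Accept only a bare JSON object with exactly the required fields.
    `required` maps field name -> expected Python type.
    Returns (ok, reason)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Any prose before or after the JSON lands here.
        return False, f"not pure JSON: {exc.msg}"
    if not isinstance(data, dict):
        return False, "top-level value must be an object"
    for field, expected in required.items():
        if field not in data:
            return False, f"missing field: {field}"
        if not isinstance(data[field], expected):
            return False, f"wrong type for field: {field}"
    extra = set(data) - set(required)
    if extra:
        return False, f"unexpected fields: {sorted(extra)}"
    return True, "ok"

schema = {"tool": str, "args": dict}  # hypothetical contract
print(validate_tool_output('{"tool": "search", "args": {"q": "x"}}', schema))
print(validate_tool_output('Sure! {"tool": "search", "args": {}}', schema))
```

Rejecting the response and retrying on a failed check keeps free-form prose out of the tool loop even when the server itself does not enforce a schema.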
Q: My quantized model is much worse — is quantization broken?
A: Not always. A small kv_cache or a mis-scaled rope setting can also cause drift. Compare fp16 vs quant on a gold set before blaming quantization.
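One way to run that comparison, sketched under simple assumptions: you already have answers from both builds for the same gold questions, and exact match after whitespace and case normalization is a good enough score for your task. The `agreement_report` helper is hypothetical; swap in your own scoring if exact match is too strict.

```python
def _norm(text):
    """Lowercase and collapse whitespace so trivial formatting does not count."""
    return " ".join(text.lower().split())

def agreement_report(gold, fp16_answers, quant_answers):
    """Compare both builds against the same gold answers.
    Returns accuracy for each build plus the indices where fp16 was
    right and the quantized build was wrong."""
    if not gold or not (len(gold) == len(fp16_answers) == len(quant_answers)):
        raise ValueError("need equal-length, non-empty answer lists")
    fp16_ok = [_norm(a) == _norm(g) for a, g in zip(fp16_answers, gold)]
    quant_ok = [_norm(a) == _norm(g) for a, g in zip(quant_answers, gold)]
    return {
        "fp16_accuracy": sum(fp16_ok) / len(gold),
        "quant_accuracy": sum(quant_ok) / len(gold),
        "quant_only_failures": [
            i for i, (f, q) in enumerate(zip(fp16_ok, quant_ok)) if f and not q
        ],
    }
```

If fp16 is also wrong on the same items, the problem sits upstream of quantization, for example in retrieval, context length, or rope settings.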
Q: Why do answers flip between runs?
A: Usually header order, batching, or sampling randomness. Use variance clamps (BBAM) and fix batch size during tests.
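Before applying fixes, it helps to measure how often answers actually flip. The sketch below fingerprints repeated runs of one prompt. `generate` is a hypothetical callable that you supply; it should wrap your local server call with the seed, sampling parameters, and batch size held fixed.

```python
import hashlib
from collections import Counter

def run_fingerprints(generate, prompt, runs=5):
    """Call generate(prompt) several times and count distinct outputs
    by a short SHA-256 fingerprint."""
    counts = Counter()
    for _ in range(runs):
        text = generate(prompt)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        counts[digest] += 1
    return counts

def is_deterministic(counts):
    """True when every run produced byte-identical output."""
    return len(counts) == 1
```

If outputs still differ with sampling pinned, compare runs with a fixed batch size and identical context ordering next, since those are the other two causes named above.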
Q: Which numbers matter for stability?
A: ΔS ≤ 0.45, coverage ≥0.70, λ convergent across paraphrases, flat E_resonance over long docs.
🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly |
Explore More
| Layer | Page | What it’s for |
|---|---|---|
| Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| Engine | WFGY 1.0 | Original PDF based tension engine |
| Engine | WFGY 2.0 | Production tension kernel and math engine for RAG and agents |
| Engine | WFGY 3.0 | TXT based Singularity tension engine, 131 S class set |
| Map | Problem Map 1.0 | Flagship 16 problem RAG failure checklist and fix map |
| Map | Problem Map 2.0 | RAG focused recovery pipeline |
| Map | Problem Map 3.0 | Global Debug Card, image as a debug protocol layer |
| Map | Semantic Clinic | Symptom to family to exact fix |
| Map | Grandma’s Clinic | Plain language stories mapped to Problem Map 1.0 |
| Onboarding | Starter Village | Guided tour for newcomers |
| App | TXT OS | TXT semantic OS, fast boot |
| App | Blah Blah Blah | Abstract and paradox Q and A built on TXT OS |
| App | Blur Blur Blur | Text to image with semantic control |
| App | Blow Blow Blow | Reasoning game engine and memory demo |