
Local Deploy & Inference — Global Fix Map

🏥 Quick Return to Emergency Room

You are at a specialist desk.
For full triage and doctors on duty, return here:

Think of this page as a sub-room.
If you want full consultation and prescriptions, go back to the Emergency Room lobby.

A beginner-friendly hub to stabilize locally hosted LLMs on your own machine or cluster.
Use this folder when it looks like the “model is broken” but the real cause is infra settings: tokenizer mismatch, rope scaling, kv-cache size, build flags, or server parameters.
Every guide links back to WFGY with measurable acceptance targets. No infra rebuild required.


When to use this folder

  • Local server gives fluent answers but citations point to the wrong snippet
  • Same input produces different outputs on each run
  • JSON mode fails on long answers or tool calls loop endlessly
  • Latency keeps growing after a few turns, or context cuts off too early
  • Quantized model outputs diverge heavily from fp16 baseline
  • Retrieval quality drops after switching loaders or UIs

Open these first


Acceptance targets

  • ΔS(question, retrieved) ≤ 0.45
  • Coverage of target section ≥ 0.70
  • λ convergent across 3 paraphrases × 2 seeds
  • E_resonance stays flat on long windows
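The ΔS targets above can be approximated with any embedding model. A minimal sketch, assuming ΔS is measured as 1 − cosine similarity between embedding vectors (the WFGY engine defines ΔS formally; this is only an illustrative stand-in):

```python
import math

def delta_s(vec_a, vec_b):
    """Approximate ΔS as 1 - cosine similarity between two embedding
    vectors. Lower is better: 0 means semantically aligned."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    return 1.0 - dot / (norm_a * norm_b)

# Identical vectors -> ΔS = 0 (stable); orthogonal vectors -> ΔS = 1 (broken)
print(delta_s([1.0, 0.0], [1.0, 0.0]))  # 0.0
print(delta_s([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Swap in real embedding vectors from your retriever to check a run against the ≤ 0.45 target.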

Quick routes to per-tool pages


Common local causes & fixes

| Symptom | Likely cause | Fix |
|---|---|---|
| Wrong snippet despite high similarity | Tokenizer mismatch, analyzer drift | Align tokenizer files, check retriever metric, use Embedding ≠ Semantic |
| JSON tool calls unstable | Schema drift, free text in outputs | Enforce Data Contracts, apply Logic Collapse |
| Outputs flip each run | Context order drift, variance | Clamp header order, use Context Drift, enforce trace table |
| Hybrid retrieval worse than single | Ranker instability | Split parsing → pattern_query_parsing_split.md |
| Fixed hallucination returns later | Long-chain decay | hallucination-reentry.md |

Local-specific guardrails

  • Model format: GGUF vs safetensors vs HF transformers → use same tokenizer and rope scale
  • Quantization: Compare q4/q8 vs fp16; if ΔS drifts, tune kv_cache and sampling params
  • Server flags: Align defaults (temp, top_p, penalties, stop tokens) across servers
  • Tokenizer & casing: Keep analyzers consistent across retrievers, rerankers, HyDE
  • Batching: Fix batch size during eval; dynamic batching fakes “randomness”
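One way to enforce the server-flags and batching guardrails above is to pin a single sampling config and render it for every backend, instead of trusting each server's defaults. The key names below are hypothetical and modeled on common llama.cpp/vLLM-style options; map them to each backend's actual flag names in your own launcher:

```python
# Hypothetical shared sampling config; map these keys to each server's
# real flag names in your launcher. Pinning seed and batch size keeps
# eval runs comparable across backends.
PINNED_SAMPLING = {
    "temperature": 0.2,
    "top_p": 0.9,
    "repeat_penalty": 1.1,
    "stop": ["</s>"],
    "seed": 42,          # fixed seed so runs are reproducible
    "batch_size": 1,     # fixed batch size; dynamic batching fakes randomness
}

def as_cli_flags(cfg):
    """Render the pinned config as CLI-style flags for a launcher script."""
    parts = []
    for key, value in cfg.items():
        if isinstance(value, list):
            value = ",".join(value)
        parts.append(f"--{key.replace('_', '-')}={value}")
    return " ".join(parts)

print(as_cli_flags(PINNED_SAMPLING))
```

The point is not the exact flag syntax but having one source of truth, so a "model regression" cannot secretly be a defaults mismatch between two servers.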

60-second fix checklist

  1. Compute ΔS(question, retrieved) and ΔS(retrieved, anchor)
    • <0.40 = stable, 0.40–0.60 = risky, ≥0.60 = broken
  2. Probe λ_observe at k=5,10,20; if ΔS flat & high → metric/index bug
  3. Apply modules:
    • Retrieval drift → BBMC + Data Contracts
    • Collapse in reasoning → BBCR + BBAM
    • Dead ends in long runs → BBPF alternate paths
  4. Verify coverage ≥0.70 and λ convergent on 2 seeds
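The checklist above can be sketched as a small triage function. The thresholds mirror the acceptance targets; `triage` is a hypothetical helper, and the ΔS inputs come from whatever semantic-distance measure you use:

```python
def triage(ds_question_retrieved, ds_retrieved_anchor, coverage):
    """Classify a run against the acceptance targets.
    ds_* are ΔS measurements; coverage is the fraction of the gold
    section recovered. Bands mirror the 60-second checklist."""
    worst = max(ds_question_retrieved, ds_retrieved_anchor)
    if worst >= 0.60:
        band = "broken"
    elif worst >= 0.40:
        band = "risky"
    else:
        band = "stable"
    passed = ds_question_retrieved <= 0.45 and coverage >= 0.70
    return band, passed

print(triage(0.32, 0.38, 0.81))  # ('stable', True)
print(triage(0.52, 0.61, 0.66))  # ('broken', False)
```

Run it on the same question at k=5, 10, 20: if the band stays "broken" no matter what k is, suspect a metric or index bug rather than the model.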

Copy-paste prompt for local servers


```txt
I have TXT OS + WFGY loaded.

Local setup:

* server: <ollama|vllm|llama.cpp|tgi|lmstudio|koboldcpp|openwebui>
* model: <name>, quant=<fp16|q4|q8|awq|gptq>, ctx=<...>, rope=<...>
* sampling: temp=<...>, top_p=<...>, max_tokens=<...>
* retriever: <metric>, <analyzer>, k=<...>

Tell me:

1. which layer is failing and why
2. which WFGY page to open
3. steps to push ΔS ≤ 0.45 and keep λ convergent
4. reproducible test to confirm
```


FAQ (Beginner-Friendly)

Q: Why does my local model give fluent text but wrong citations?
A: Usually it is not the model itself but a tokenizer or retriever mismatch. Fix by aligning tokenizer files and checking ΔS against the gold section.

Q: Why does JSON mode fail locally but work on cloud APIs?
A: Local servers often do not enforce schemas strictly. Apply Data Contracts and disallow free-form prose in tool outputs.

Q: My quantized model is much worse — is quantization broken?
A: Not always. Small kv_cache or rope mis-scaling causes drift. Compare fp16 vs quant on a gold set before blaming quantization.
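The fp16-vs-quant comparison the answer suggests can be sketched as a per-question drift check on a gold set. The scores below are hypothetical ΔS measurements you would collect yourself; large drift points at quantization, small drift at kv-cache, rope, or sampling settings:

```python
def mean_drift(fp16_scores, quant_scores):
    """Mean per-question ΔS gap between an fp16 baseline and a
    quantized build, scored on the same gold questions."""
    gaps = [abs(q - f) for f, q in zip(fp16_scores, quant_scores)]
    return sum(gaps) / len(gaps)

# Hypothetical ΔS scores on a 4-question gold set
fp16 = [0.31, 0.28, 0.40, 0.35]
q4   = [0.33, 0.30, 0.44, 0.36]
print(mean_drift(fp16, q4))  # small drift: blame settings, not quantization
```

If the drift is within your noise band across seeds, the quantized build is fine and the regression lives elsewhere in the stack.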

Q: Why do answers flip between runs?
A: Header order, batching, or randomness. Use variance clamps (BBAM) and fix batch size during tests.

Q: Which numbers matter for stability?
A: ΔS ≤ 0.45, coverage ≥0.70, λ convergent across paraphrases, flat E_resonance over long docs.


🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly |

🧭 Explore More

| Module | Description | Link |
|---|---|---|
| WFGY Core | WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack | View → |
| Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | View → |
| Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | View → |
| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | View → |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | View → |
| Benchmark vs GPT-5 | Stress test GPT-5 with full WFGY reasoning suite | View → |
| 🧙‍♂️ Starter Village 🏡 | New here? Lost in symbols? Click here and let the wizard guide you through | Start → |

👑 Early Stargazers: See the Hall of Fame
Engineers, hackers, and open source builders who supported WFGY from day one.

WFGY Engine 2.0 is already unlocked. Star the repo to help others discover it and unlock more on the Unlock Board.

WFGY Main   TXT OS   Blah   Blot   Bloc   Blur   Blow