
Local Deploy & Inference — Global Fix Map

A hub to stabilize locally hosted models on your own machine or cluster. Use this folder when symptoms look like a “model problem” but the root cause is tokenizer skew, rope scaling, kv-cache settings, build flags, or server parameters. Every fix maps back to WFGY pages with measurable targets, so you can verify without changing infra elsewhere.

When to use this folder

  • Local server returns plausible text but citations do not line up with the right snippet.
  • Answers vary between runs on the same input.
  • JSON mode breaks on long outputs, or tool calls loop.
  • Latency spikes after a few turns, or context truncates early.
  • Quantized model behaves very differently from the fp16 baseline.
  • After switching loaders or UIs, retrieval quality drops.

Open these first

Acceptance targets

  • ΔS(question, retrieved) ≤ 0.45
  • Coverage of target section ≥ 0.70
  • λ remains convergent across three paraphrases and two seeds
  • E_resonance stays flat on long windows
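
These targets can be gated in code. The sketch below is a minimal version, assuming ΔS is computed as 1 minus cosine similarity between embedding vectors and that λ states are logged per run as strings; delta_s and passes_targets are illustrative names, not WFGY APIs, and the E_resonance check is left to your long-window logs.

```python
# Gate sketch for the acceptance targets above. Names are illustrative.
import numpy as np

def delta_s(vec_a: np.ndarray, vec_b: np.ndarray) -> float:
    """ΔS as 1 minus cosine similarity between two embedding vectors."""
    cos = float(np.dot(vec_a, vec_b)
                / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b)))
    return 1.0 - cos

def passes_targets(ds: float, coverage: float, lambda_states: list[str]) -> bool:
    """True only when every target on this page is met."""
    return (
        ds <= 0.45                    # ΔS(question, retrieved)
        and coverage >= 0.70          # coverage of target section
        and all(s == "convergent" for s in lambda_states)  # 3 paraphrases x 2 seeds
    )
```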

Quick routes to per-tool pages

  • ctransformers.md
  • exllama.md
  • gpt4all.md
  • jan.md
  • koboldcpp.md
  • llamacpp.md
  • lmstudio.md
  • ollama.md
  • textgen-webui.md
  • tgi.md
  • vllm.md

Map symptoms to structural fixes

Local-specific guardrails

  • Model format and loader
    GGUF vs safetensors vs HF transformers produce different tokenizer defaults and rope settings. Keep the same tokenizer files and rope scale when comparing.
    Check: max context, rope base, rope scale, sliding window, logits processors.

  • Quantization parity
    Compare the quantized model to the fp16 baseline on a small gold set. If ΔS rises or λ flips, adjust kv_cache size and sampling params before suspecting retrieval. A parity sketch follows this list.

  • Server flags
    Normalize across servers: temperature, top_p, min_p, frequency_penalty, presence_penalty, max_tokens, stop sequences, repetition penalty. Mismatched defaults mimic reasoning bugs. A normalization sketch also follows this list.

  • Tokenizer and casing
    Keep casing and analyzer identical for HyDE, rerank, and retriever prompts. A different tokenizer in the local UI explains “looks the same but not the same.”

  • Concurrency and batching
    Turn off dynamic batching during eval runs or fix batch size. Batch drift looks like “randomness.”
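
A parity sketch for the quantization bullet above, assuming both builds sit behind OpenAI-compatible endpoints and using sentence-transformers on the embedding side. The URLs, model names, gold set, and 0.10 drift threshold are placeholders to adapt.

```python
# Compare fp16 vs quantized on a gold set via ΔS. Placeholders throughout.
import requests
from sentence_transformers import SentenceTransformer

EMBEDDER = SentenceTransformer("all-MiniLM-L6-v2")

def ask(base_url: str, model: str, prompt: str) -> str:
    """Query one server with pinned sampling so the comparison isolates the quant."""
    resp = requests.post(
        f"{base_url}/v1/chat/completions",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.0,
            "max_tokens": 256,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def delta_s(text_a: str, text_b: str) -> float:
    """ΔS as 1 minus cosine similarity of normalized embeddings."""
    vec_a, vec_b = EMBEDDER.encode([text_a, text_b], normalize_embeddings=True)
    return 1.0 - float(vec_a @ vec_b)

gold = [
    ("question 1", "expected anchor text 1"),  # your gold set goes here
]
for question, anchor in gold:
    ds_fp16 = delta_s(ask("http://localhost:8000", "model-fp16", question), anchor)
    ds_quant = delta_s(ask("http://localhost:8001", "model-q4_k_m", question), anchor)
    if ds_quant - ds_fp16 > 0.10:  # drift threshold, tune to your gold set
        print(f"quant drift: {question!r} fp16={ds_fp16:.2f} quant={ds_quant:.2f}")
```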
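
And a normalization sketch for the server-flags bullet: one canonical sampling profile translated per server, so eval runs differ only in the backend. Field names follow each project's documented request schema; verify any you have not used against your server version.

```python
# One canonical sampling profile, translated per server API.
CANONICAL = {
    "temperature": 0.2,
    "top_p": 0.9,
    "min_p": 0.05,
    "max_tokens": 512,
    "stop": ["</answer>"],
    "repeat_penalty": 1.1,
}

def to_openai_compatible(p: dict) -> dict:
    # vLLM, llama.cpp server, LM Studio, and Ollama's /v1 route take these keys.
    return {k: p[k] for k in ("temperature", "top_p", "max_tokens", "stop")}

def to_ollama_native(p: dict) -> dict:
    # Ollama's native /api/generate nests sampling under "options" and
    # renames the token budget to num_predict.
    return {"options": {
        "temperature": p["temperature"],
        "top_p": p["top_p"],
        "min_p": p["min_p"],
        "num_predict": p["max_tokens"],
        "stop": p["stop"],
        "repeat_penalty": p["repeat_penalty"],
    }}
```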

60-second fix checklist

  1. Measure ΔS
    Compute ΔS(question, retrieved) and ΔS(retrieved, expected anchor).
    Stable: below 0.40. Transitional: 0.40 to 0.60. Risk: 0.60 and above.

  2. Probe λ_observe
    Vary k at 5, 10, 20. If ΔS stays flat and high, suspect metric or index mismatch.
    Reorder prompt headers. If ΔS spikes, lock the schema. A probe sketch follows this checklist.

  3. Apply the module

  • Retrieval drift → BBMC plus Data Contracts
  • Reasoning collapse → BBCR bridge plus BBAM, verify with Logic Collapse
  • Dead ends in long chains → BBPF alternate paths, then re-join with BBCR
  4. Verify
    Coverage at least 0.70 on three paraphrases. λ convergent on two seeds.
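
A probe sketch for steps 1 and 2, with retrieve and embed standing in for your own retriever and embedding model:

```python
# Sweep k, recompute ΔS, and read the shape of the curve.
import numpy as np

def probe(question: str, retrieve, embed) -> None:
    q_vec = embed(question)
    for k in (5, 10, 20):
        chunks = retrieve(question, k=k)   # your retriever
        ctx_vec = embed(" ".join(chunks))
        cos = float(np.dot(q_vec, ctx_vec)
                    / (np.linalg.norm(q_vec) * np.linalg.norm(ctx_vec)))
        print(f"k={k:>2}  ΔS={1.0 - cos:.2f}")
    # Flat and high across k: suspect metric or index mismatch.
    # Falling as k grows: retrieval is fine, inspect the prompt schema next.
```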

Copy-paste prompt for local servers


You have TXT OS and the WFGY Problem Map loaded.

My local inference setup:

* server: <ollama | vllm | llama.cpp | tgi | lmstudio | koboldcpp | openwebui>
* model: <name>, quant: <fp16 | q4_k_m | q8_0 | awq | gptq>, ctx: <n_ctx>, rope: <base, scale>
* sampling: temp=<...> top_p=<...> min_p=<...> max_tokens=<...> stop=[...]
* retriever: <metric>, <analyzer>, k=<...>

Tell me:

1. which layer is failing and why,
2. which exact WFGY page to open,
3. the minimal steps to push ΔS ≤ 0.45 and keep λ convergent,
4. a reproducible test to verify it.
   Use BBMC, BBPF, BBCR, BBAM when relevant.



🔗 Quick-Start Downloads (60 sec)

  • WFGY 1.0 PDF (Engine Paper): 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>”
  • TXT OS (plain-text OS, TXTOS.txt): 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly

🧭 Explore More

  • WFGY Core: WFGY 2.0 engine is live, with the full symbolic reasoning architecture and math stack
  • Problem Map 1.0: Initial 16-mode diagnostic and symbolic fix framework
  • Problem Map 2.0: RAG-focused failure tree, modular fixes, and pipelines
  • Semantic Clinic Index: Expanded failure catalog covering prompt injection, memory bugs, and logic drift
  • Semantic Blueprint: Layer-based symbolic reasoning and semantic modulations
  • Benchmark vs GPT-5: Stress test GPT-5 with the full WFGY reasoning suite

🧙‍♂️ Starter Village 🏡 New here? Lost in symbols? Let the wizard guide you through.

👑 Early Stargazers: See the Hall of Fame
Engineers, hackers, and open source builders who supported WFGY from day one.

WFGY Engine 2.0 is already unlocked. Star the repo to help others discover it and unlock more on the Unlock Board.
