
Local Deploy & Inference — Global Fix Map

A hub to stabilize locally hosted models on your own machine or cluster. Use this folder when symptoms look like a “model problem” but the root cause is tokenizer skew, rope scaling, kv-cache settings, build flags, or server parameters. Every fix maps back to WFGY pages with measurable targets, so you can verify without changing infra elsewhere.

When to use this folder

  • Local server returns plausible text but citations do not line up with the right snippet.
  • Answers vary between runs on the same input.
  • JSON mode breaks on long outputs, or tool calls loop.
  • Latency spikes after a few turns, or context truncates early.
  • Quantized model behaves very differently from the fp16 baseline.
  • After switching loaders or UIs, retrieval quality drops.

Open these first

Acceptance targets

  • ΔS(question, retrieved) ≤ 0.45
  • Coverage of target section ≥ 0.70
  • λ remains convergent across three paraphrases and two seeds
  • E_resonance stays flat on long windows
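
These targets can be gated in code. The sketch below is a minimal version, assuming ΔS is computed as 1 minus cosine similarity between embedding vectors and that λ states are logged per run as strings; delta_s and passes_targets are illustrative names, not WFGY APIs, and the E_resonance check is left to your long-window logs.

```python
# Gate sketch for the acceptance targets above. Names are illustrative.
import numpy as np

def delta_s(vec_a: np.ndarray, vec_b: np.ndarray) -> float:
    """ΔS as 1 minus cosine similarity between two embedding vectors."""
    cos = float(np.dot(vec_a, vec_b)
                / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b)))
    return 1.0 - cos

def passes_targets(ds: float, coverage: float, lambda_states: list[str]) -> bool:
    """True only when every target on this page is met."""
    return (
        ds <= 0.45                    # ΔS(question, retrieved)
        and coverage >= 0.70          # coverage of target section
        and all(s == "convergent" for s in lambda_states)  # 3 paraphrases x 2 seeds
    )
```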

Quick routes to per-tool pages

  • ctransformers.md
  • exllama.md
  • gpt4all.md
  • jan.md
  • koboldcpp.md
  • llamacpp.md
  • lmstudio.md
  • ollama.md
  • textgen-webui.md
  • tgi.md
  • vllm.md

Map symptoms to structural fixes

Local-specific guardrails

  • Model format and loader
    GGUF vs safetensors vs HF transformers produce different tokenizer defaults and rope settings. Keep the same tokenizer files and rope scale when comparing.
    Check: max context, rope base, rope scale, sliding window, logits processors.

  • Quantization parity
    Compare the quantized model to the fp16 baseline on a small gold set. If ΔS rises or λ flips, adjust kv_cache size and sampling params before suspecting retrieval. A parity sketch follows this list.

  • Server flags
    Normalize across servers: temperature, top_p, min_p, frequency_penalty, presence_penalty, max_tokens, stop sequences, repetition penalty. Mismatched defaults mimic reasoning bugs. A normalization sketch also follows this list.

  • Tokenizer and casing
    Keep casing and analyzer identical for HyDE, rerank, and retriever prompts. A different tokenizer in the local UI explains “looks the same but not the same.”

  • Concurrency and batching
    Turn off dynamic batching during eval runs or fix batch size. Batch drift looks like “randomness.”
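
A parity sketch for the quantization bullet above, assuming both builds sit behind OpenAI-compatible endpoints and using sentence-transformers on the embedding side. The URLs, model names, gold set, and 0.10 drift threshold are placeholders to adapt.

```python
# Compare fp16 vs quantized on a gold set via ΔS. Placeholders throughout.
import requests
from sentence_transformers import SentenceTransformer

EMBEDDER = SentenceTransformer("all-MiniLM-L6-v2")

def ask(base_url: str, model: str, prompt: str) -> str:
    """Query one server with pinned sampling so the comparison isolates the quant."""
    resp = requests.post(
        f"{base_url}/v1/chat/completions",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.0,
            "max_tokens": 256,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def delta_s(text_a: str, text_b: str) -> float:
    """ΔS as 1 minus cosine similarity of normalized embeddings."""
    vec_a, vec_b = EMBEDDER.encode([text_a, text_b], normalize_embeddings=True)
    return 1.0 - float(vec_a @ vec_b)

gold = [
    ("question 1", "expected anchor text 1"),  # your gold set goes here
]
for question, anchor in gold:
    ds_fp16 = delta_s(ask("http://localhost:8000", "model-fp16", question), anchor)
    ds_quant = delta_s(ask("http://localhost:8001", "model-q4_k_m", question), anchor)
    if ds_quant - ds_fp16 > 0.10:  # drift threshold, tune to your gold set
        print(f"quant drift: {question!r} fp16={ds_fp16:.2f} quant={ds_quant:.2f}")
```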
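
And a normalization sketch for the server-flags bullet: one canonical sampling profile translated per server, so eval runs differ only in the backend. Field names follow each project's documented request schema; verify any you have not used against your server version.

```python
# One canonical sampling profile, translated per server API.
CANONICAL = {
    "temperature": 0.2,
    "top_p": 0.9,
    "min_p": 0.05,
    "max_tokens": 512,
    "stop": ["</answer>"],
    "repeat_penalty": 1.1,
}

def to_openai_compatible(p: dict) -> dict:
    # vLLM, llama.cpp server, LM Studio, and Ollama's /v1 route take these keys.
    return {k: p[k] for k in ("temperature", "top_p", "max_tokens", "stop")}

def to_ollama_native(p: dict) -> dict:
    # Ollama's native /api/generate nests sampling under "options" and
    # renames the token budget to num_predict.
    return {"options": {
        "temperature": p["temperature"],
        "top_p": p["top_p"],
        "min_p": p["min_p"],
        "num_predict": p["max_tokens"],
        "stop": p["stop"],
        "repeat_penalty": p["repeat_penalty"],
    }}
```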

60-second fix checklist

  1. Measure ΔS
    Compute ΔS(question, retrieved) and ΔS(retrieved, expected anchor).
    Stable: below 0.40. Transitional: 0.40 to 0.60. Risk: 0.60 and above.

  2. Probe λ_observe
    Vary k at 5, 10, 20. If ΔS stays flat and high, suspect metric or index mismatch.
    Reorder prompt headers. If ΔS spikes, lock the schema. A probe sketch follows this checklist.

  3. Apply the module

  • Retrieval drift → BBMC plus Data Contracts
  • Reasoning collapse → BBCR bridge plus BBAM, verify with Logic Collapse
  • Dead ends in long chains → BBPF alternate paths, then re-join with BBCR
  4. Verify
    Coverage at least 0.70 on three paraphrases. λ convergent on two seeds.
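
A probe sketch for steps 1 and 2, with retrieve and embed standing in for your own retriever and embedding model:

```python
# Sweep k, recompute ΔS, and read the shape of the curve.
import numpy as np

def probe(question: str, retrieve, embed) -> None:
    q_vec = embed(question)
    for k in (5, 10, 20):
        chunks = retrieve(question, k=k)   # your retriever
        ctx_vec = embed(" ".join(chunks))
        cos = float(np.dot(q_vec, ctx_vec)
                    / (np.linalg.norm(q_vec) * np.linalg.norm(ctx_vec)))
        print(f"k={k:>2}  ΔS={1.0 - cos:.2f}")
    # Flat and high across k: suspect metric or index mismatch.
    # Falling as k grows: retrieval is fine, inspect the prompt schema next.
```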

Copy-paste prompt for local servers


You have TXT OS and the WFGY Problem Map loaded.

My local inference setup:

* server: <ollama | vllm | llama.cpp | tgi | lmstudio | koboldcpp | openwebui>
* model: <name>, quant: <fp16 | q4_k_m | q8_0 | awq | gptq>, ctx: <n_ctx>, rope: <base, scale>
* sampling: temp=<...> top_p=<...> min_p=<...> max_tokens=<...> stop=[...]
* retriever: <metric>, <analyzer>, k=<...>

Tell me:

1. which layer is failing and why,
2. which exact WFGY page to open,
3. the minimal steps to push ΔS ≤ 0.45 and keep λ convergent,
4. a reproducible test to verify it.
   Use BBMC, BBPF, BBCR, BBAM when relevant.



🔗 Quick-Start Downloads (60 sec)

  • WFGY 1.0 PDF (Engine Paper): 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>”
  • TXT OS (plain-text OS, TXTOS.txt): 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly

🧭 Explore More

  • WFGY Core: WFGY 2.0 engine is live, with the full symbolic reasoning architecture and math stack
  • Problem Map 1.0: Initial 16-mode diagnostic and symbolic fix framework
  • Problem Map 2.0: RAG-focused failure tree, modular fixes, and pipelines
  • Semantic Clinic Index: Expanded failure catalog covering prompt injection, memory bugs, and logic drift
  • Semantic Blueprint: Layer-based symbolic reasoning and semantic modulations
  • Benchmark vs GPT-5: Stress test GPT-5 with the full WFGY reasoning suite

🧙‍♂️ Starter Village 🏡 New here? Lost in symbols? Let the wizard guide you through.

👑 Early Stargazers: See the Hall of Fame
Engineers, hackers, and open source builders who supported WFGY from day one.

WFGY Engine 2.0 is already unlocked. Star the repo to help others discover it and unlock more on the Unlock Board.
