Problem Map — Examples Index (hands-on, SDK-free)

This folder gives you copy-paste runnable walkthroughs that fix real failure modes from the Problem Map.
Everything is SDK-free. You can use Ollama, LangChain, vanilla OpenAI, or anything else.
Each example follows a strict format so you can reproduce, verify, and ship.

What you will get from each example

  • Problem: the exact failure it targets and its Problem Map number
  • Inputs: tiny reproducible corpus + the prompt template
  • Steps: minimal commands you can paste into a terminal
  • Checks: how to verify the fix worked, with clear pass/fail signals
  • Why it works: one short paragraph, no fluff

If you have not run a guarded RAG pipeline yet, start here:
Getting Started → ProblemMap/getting-started.md


Example list

01) Basic Guarded Answer (No.1 Hallucination & Chunk Drift)

  • File: example_01_basic_fix.md
  • Goal: force the model to answer only from evidence or say not in context.
  • Outcome: fewer fabrications, stable citations.

02) Self-Reflection Trace (No.1, No.2)

  • File: example_02_self_reflection.md
  • Goal: log query → retrieved chunks → answer, then reflect on where drift starts.
  • Outcome: fast pinpointing of retrieval vs generation faults.

03) Pipeline Patch: Intersection + Rerank (No.1, No.4)

  • File: example_03_pipeline_patch.md
  • Goal: combine BM25 ∩ embedding, then rerank by cosine to remove tail noise.
  • Outcome: higher citation hit rate with the same token budget.

04) Multi-Agent Coordination Boundary (No.6 Logic Collapse)

  • File: example_04_multi_agent_coordination.md

05) Vector Store Repair and Metrics (No.3 Index Schema Drift)

  • File: example_05_vectorstore_repair.md

06) Prompt-Injection Block (Clinic: Injection)

  • File: example_06_prompt_injection_block.md

07) Bootstrap Ordering (No.14)

  • File: example_07_bootstrap_ordering.md

08) Evaluate RAG Quality (Precision, Refusal, Citations)

  • File: example_08_eval_rag_quality.md
  • Goal: run a tiny benchmark on precision, refusal rate, and citation overlap.
  • Outcome: a repeatable baseline you can compare after every change (a minimal metric sketch follows this list)
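
As a taste of what example 08 measures, here is a minimal Python sketch that reads runs/trace.jsonl (defined in the conventions below) and computes refusal rate and citation overlap. It assumes citations can be parsed out of the answer text using the answer format from convention 1:

import json, re

def eval_traces(path="runs/trace.jsonl"):
    runs = [json.loads(line) for line in open(path)]
    # refusal rate: share of runs whose answer is the exact refusal string
    refusals = sum(r["answer"].strip() == "not in context" for r in runs)
    overlaps = []
    for r in runs:
        # pull "- citations: [id,...]" out of the answer text
        m = re.search(r"citations:\s*\[([^\]]*)\]", r["answer"])
        if not m:
            continue
        cited = {s.strip() for s in m.group(1).split(",") if s.strip()}
        retrieved = {c["id"] for c in r["chunks"]}
        if cited:
            overlaps.append(len(cited & retrieved) / len(cited))
    return {
        "refusal_rate": refusals / len(runs),
        "citation_overlap": sum(overlaps) / len(overlaps) if overlaps else None,
    }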

Conventions used by all examples

1) Evidence-only answer template

Use only the evidence. If not provable, reply exactly: not in context.
Answer format:
- claim
- citations: [id,...]

2) Minimal hybrid retrieval

  • Retrieve by BM25 and embeddings
  • Intersect, then rerank by cosine
  • Keep top 8 after rerank, drop the tail (a minimal sketch follows this list)
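
Below is a minimal, dependency-free Python sketch of this recipe. The embed function is a stand-in for whatever embedding client you use and is assumed to return a plain list of floats:

import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    # standard Okapi BM25 over pre-tokenized documents
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    df = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t in tf:
                idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
                s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def hybrid_retrieve(q, chunks, embed, k_each=20, k_final=8):
    # 1) score by BM25 and by embedding similarity
    tokens = [c["text"].lower().split() for c in chunks]
    bm = bm25_scores(q.lower().split(), tokens)
    qv = embed(q)
    sims = [cosine(qv, embed(c["text"])) for c in chunks]
    # 2) intersect the two top-k sets
    bm_top = {i for i, _ in sorted(enumerate(bm), key=lambda p: -p[1])[:k_each]}
    vec_top = {i for i, _ in sorted(enumerate(sims), key=lambda p: -p[1])[:k_each]}
    # 3) rerank the intersection by cosine and drop the tail
    keep = sorted(bm_top & vec_top, key=lambda i: -sims[i])[:k_final]
    return [dict(chunks[i], score=round(sims[i], 4)) for i in keep]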

3) Trace everything

Each run appends one JSON line to runs/trace.jsonl:

{"ts": 1699999999, "q": "question", "chunks": [{"id":"p12#2","score":0.83}], "answer":"...", "ok": true}

4) Verification rules

  • If the answer contains facts outside evidence, it fails
  • If evidence is insufficient, the correct output is not in context
  • Citation ids in the answer must exist in the retrieved set (see the checker sketch below)
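
The last two rules can be mechanized. A minimal checker sketch follows; the regex assumes the answer format from convention 1, and rule one (no facts outside evidence) still needs a human or an LLM judge:

import re

def verify(answer, retrieved_ids):
    # rule 2: the exact refusal string is always a valid output
    if answer.strip() == "not in context":
        return True
    # rule 3: every cited id must come from the retrieved set
    m = re.search(r"citations:\s*\[([^\]]*)\]", answer)
    if not m:
        return False
    cited = [s.strip() for s in m.group(1).split(",") if s.strip()]
    return bool(cited) and all(c in retrieved_ids for c in cited)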

Quick smoke test before any example

This is a lightweight end-to-end test you can paste now.

# 1) Prepare a tiny corpus (two short pages as plain text)
mkdir -p data
cat > data/pages.json <<'JSON'
[
  {"id":"p1","page":1,"text":"The library defines X. X is a constrained mapping. See also Y."},
  {"id":"p2","page":2,"text":"Y is unrelated to X. It describes a separate protocol."}
]
JSON

# 2) Create two small chunks (one per page)
cat > data/chunks.json <<'JSON'
[
  {"id":"p1#1","page":1,"text":"X is a constrained mapping."},
  {"id":"p2#1","page":2,"text":"Y is unrelated to X. It describes a separate protocol."}
]
JSON

Now choose Python or Node. Both use the same conventions.

Python prompt builder (paste into a REPL and adapt to your LLM client):

def build_prompt(q, chunks):
    ctx = "\n\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return (
        "Use only the evidence. If not provable, reply exactly: not in context.\n"
        "Answer format:\n"
        "- claim\n- citations: [id,...]\n\n"
        f"Question: {q}\n\nEvidence:\n{ctx}\n"
    )

q = "What is X?"
chunks = [
  {"id":"p1#1","text":"X is a constrained mapping."},
  {"id":"p2#1","text":"Y is unrelated to X. It describes a separate protocol."}
]
print(build_prompt(q, chunks))
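
To turn the prompt into an answer, feed it to whatever client you run. As one hedged example, if Ollama is serving locally, a minimal call could look like this (the model name llama3 is an assumption; swap in whatever you have pulled):

import json, urllib.request

def ask_ollama(prompt, model="llama3"):  # model name is an assumption
    # Ollama's local generate endpoint; stream=False returns one JSON body
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask_ollama(build_prompt(q, chunks)))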

Node prompt builder:

function buildPrompt(q, chunks) {
  const ctx = chunks.map(c => `[${c.id}] ${c.text}`).join("\n\n");
  return `Use only the evidence. If not provable, reply exactly: not in context.
Answer format:
- claim
- citations: [id,...]

Question: ${q}

Evidence:
${ctx}
`;
}

console.log(buildPrompt("What is X?", [
  { id:"p1#1", text:"X is a constrained mapping." },
  { id:"p2#1", text:"Y is unrelated to X. It describes a separate protocol." }
]));

Pass criteria

  • A correct answer for “What is X?” must say “X is a constrained mapping.” and cite [p1#1]
  • If you ask “What is Z?”, the only valid answer is not in context (both cases are checked in the snippet below)
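
Both cases can be exercised mechanically with the verify helper sketched under the verification rules:

retrieved = ["p1#1", "p2#1"]
good = "- claim: X is a constrained mapping.\n- citations: [p1#1]"
assert verify(good, retrieved)                # cites an id that was actually retrieved
assert verify("not in context", retrieved)    # exact refusal string passes
assert not verify("- claim: Z is great.\n- citations: [p9#9]", retrieved)  # unknown id fails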

How to use these examples in a real repo

  1. Create a branch named pm-examples
  2. Copy any example_*.md you need into your project's docs folder
  3. Replace the tiny corpus with your own pages.json and chunks.json
  4. Keep the template and the verification rules
  5. Commit traces for future debugging and audits

If you discover a better parameter or a corner case, open a PR to improve the example. Clear diffs beat long explanations.


Troubleshooting

  • Model refuses too often: increase top-k to 12 before rerank, but keep only the top 8 after rerank.
  • Citations look wrong: check that chunk ids are preserved all the way into the prompt.
  • Different runs give different answers: fix your seed where possible and avoid mixing corpora built by different chunkers.

🔗 Quick-Start Downloads (60 sec)

  • WFGY 1.0 (PDF Engine Paper): 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>”
  • TXT OS (plain-text OS, TXTOS.txt): 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” and the OS boots instantly

🧭 Explore More

  • WFGY Core: WFGY 2.0 engine is live, full symbolic reasoning architecture and math stack (View →)
  • Problem Map 1.0: initial 16-mode diagnostic and symbolic fix framework (View →)
  • Problem Map 2.0: RAG-focused failure tree, modular fixes, and pipelines (View →)
  • Semantic Clinic Index: expanded failure catalog of prompt injection, memory bugs, and logic drift (View →)
  • Semantic Blueprint: layer-based symbolic reasoning and semantic modulations (View →)
  • Benchmark vs GPT-5: stress test GPT-5 with the full WFGY reasoning suite (View →)
  • 🧙‍♂️ Starter Village 🏡: new here? Lost in symbols? Let the wizard guide you through (Start →)

👑 Early Stargazers: See the Hall of Fame
Engineers, hackers, and open source builders who supported WFGY from day one.

WFGY Engine 2.0 is already unlocked. Star the repo to help others discover it and unlock more on the Unlock Board.

WFGY Main · TXT OS · Blah · Blot · Bloc · Blur · Blow