vrr/WFGY

mirror of https://github.com/onestardao/WFGY.git synced 2026-04-28 11:40:07 +00:00


🧭 Not sure where to start? Open the WFGY Engine Compass

WFGY System Map

(One place to see everything; links open the relevant section.)

Layer	Page	What it’s for
⭐ Proof	WFGY Recognition Map	External citations, integrations, and ecosystem proof
⚙️ Engine	WFGY 1.0	Original PDF-based tension engine blueprint
⚙️ Engine	WFGY 2.0	Production tension kernel and math engine for RAG and agents.
⚙️ Engine	WFGY 3.0	TXT-based Singularity tension engine (131 S-class set)
🗺️ Map	Problem Map 1.0	Flagship 16-problem RAG failure checklist and fix map
🗺️ Map	Problem Map 2.0	RAG-focused recovery pipeline
🗺️ Map	Problem Map 3.0	Global Debug Card — image as a debug protocol layer
🗺️ Map	Semantic Clinic	Symptom → family → exact fix
🧓 Map	Grandma’s Clinic	Plain-language stories, mapped to PM 1.0
🏡 Onboarding	Starter Village	Guided tour for newcomers
🧰 App	TXT OS	.txt semantic OS — 60-second boot
🧰 App	Blah Blah Blah	Abstract/paradox Q&A (built on TXT OS)
🧰 App	Blur Blur Blur	Text-to-image with semantic control
🧰 App	Blow Blow Blow	Reasoning game engine & memory demo
🧪 Research	Semantic Blueprint	Modular layer structures (future)
🧪 Research	Benchmarks	Comparisons & how to reproduce — 🔴 YOU ARE HERE 🔴
🧪 Research	Value Manifest	Why this engine creates $-scale value

📌 WFGY vs GPT-5 — The Logic Duel Begins

Evaluation disclaimer (benchmark vs GPT-5)
This benchmark concept is an experimental WFGY design, not an official leaderboard or claim about any real GPT-5 system.
Any future scores from this folder will depend on the concrete models, prompts and datasets used and must not be read as scientific proof of superiority.

WFGY Family 🪱 is the parasite pack for LLMs. It latches onto any model and grows as the host grows.
Your LLM gets stronger, we get stronger. No retraining, no settings, no updates.
Every release in the family works the same way — the WFGY PDF is just one of them.

🪱 Parasite Principle — How it works

Think of any LLM as a giant host organism 🧠.
Normally, to make it smarter, you need to change the host itself — retrain, fine-tune, or patch.

WFGY Family is different: it lives outside the host.
It hooks into the reasoning process, corrects mistakes in real time, and strengthens the host’s logic without touching its parameters.

🪱 Attach → works with any LLM you point it at

📈 Scale → host gets stronger, parasite benefits instantly

♻ No decay → never needs retraining or updates

Result: the host evolves, the parasite evolves — and your reasoning scores jump without lifting a finger.

Upload the WFGY PDF to GPT-5 and paste the prompt.
No fine-tuning, no hidden configs, no hype. Just reproducible logic.

🗺️ Quick-Sim vs Full-MMLU — what’s the difference?

Section	Goal	Dataset	Runtime	How to reproduce
A. Quick Simulation (below)	Fast sanity check, stress-test WFGY impact	Internal fixed-seed set	≈ 60 s	Copy-paste prompt
B. 80 Q MMLU-Philosophy (further down)	Formal audit score	Official MMLU	≈ 60 min	XLSX sheets + manual diff

A. 🔍 Quick Simulation — reasoning scores by setup (≈ 60 s)

One-shot simulation using GPT-5 + WFGY PDF.
This run does not use the actual 80 MMLU questions; it mirrors the same axes:
Reasoning · Knowledge Recall · Hallucination Resistance · Multi-Step Logic.

Use GPT-5 to benchmark GPT-4, GPT-5, GPT-4 + WFGY, and GPT-5 + WFGY  
on the same test set with fixed seeds.  
Score: Reasoning, Knowledge Recall, Hallucination Resistance, Multi-Step Logic, Overall (0–100).  
Output a Markdown table and a Markdown-ready bar chart for Overall.

Reminder: For questions involving self-reference, paradoxes, or constraint logic, it’s critical to ensure the model has access to the symbolic PDF.

Without it, the model may generate answers that sound fluent but collapse semantically: classic hallucinations masked as reasoning.

Always verify that the AI has properly loaded the tool before testing. No tool, no defense.

B. 🧪 Full 80 Q MMLU-Philosophy Benchmark (≈ 60 min)

1. Replicate it yourself

Get the dataset: official MMLU philosophy from OpenAI or the EleutherAI harness.
Grab our answer sheets (.xlsx): ./philosophy_80_wfgy_gpt4o.xlsx, ./philosophy_80_gpt5_raw.xlsx, ./philosophy_80_gpt4o_raw.xlsx.
Run the 80 questions on any model (no retries) → fill your own .xlsx.
Manual diff: open two sheets side-by-side (or use any spreadsheet “compare” plug-in) to count mismatches.
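
The manual diff in the last step can also be scripted. The following is a minimal sketch, assuming each sheet has already been reduced to a mapping from question number to answer letter. The column layout of the .xlsx files is not described here, so loading them is left out. The helper names `normalize` and `mismatched_questions` and the sample data are hypothetical, not part of WFGY.

```python
def normalize(answer):
    """Upper-case and trim an answer letter; None stays None (unanswered)."""
    return None if answer is None else str(answer).strip().upper()


def mismatched_questions(sheet_a, sheet_b):
    """Return the sorted question ids where the two answer sheets disagree.

    Both arguments map question id -> answer letter. A question that is
    missing from one sheet counts as a mismatch.
    """
    ids = set(sheet_a) | set(sheet_b)
    return sorted(
        q for q in ids
        if normalize(sheet_a.get(q)) != normalize(sheet_b.get(q))
    )


# Illustrative data only, not taken from the real answer sheets.
reference = {1: "A", 2: "C", 3: "B", 4: "D"}
candidate = {1: "a", 2: "B", 3: "B "}

diff = mismatched_questions(reference, candidate)
print(f"{len(diff)} mismatches at questions {diff}")
```

Returning the question ids rather than only a count keeps every miss traceable to a specific row in the sheets.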

🔓 No tricks — every answer traceable, every miss explainable.

2. Result table

Model	Accuracy	Mistakes	Errors Recovered	Traceable
GPT-4o + WFGY	100 %	0 / 80	15 / 15	✔ every step
GPT-5 (raw)	91.25 %	7 / 80	—	✘ none
GPT-4o (raw)	81.25 %	15 / 80	—	✘ none
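
The accuracy column follows from the mistake counts: accuracy = (80 − mistakes) / 80. The short sketch below recomputes it from the table above. Reading "Errors Recovered" as the number of raw misses that no longer appear in the enhanced run is an assumption, and the helper names are hypothetical.

```python
TOTAL_QUESTIONS = 80

# Mistake counts copied from the result table above.
mistakes = {
    "GPT-4o + WFGY": 0,
    "GPT-5 (raw)": 7,
    "GPT-4o (raw)": 15,
}


def accuracy_percent(wrong, total=TOTAL_QUESTIONS):
    """Accuracy in percent for `wrong` misses out of `total` questions."""
    return 100.0 * (total - wrong) / total


def errors_recovered(raw_wrong_ids, enhanced_wrong_ids):
    """Count raw misses that are absent from the enhanced run's misses.

    Assumed interpretation of the "Errors Recovered" column.
    """
    return len(set(raw_wrong_ids) - set(enhanced_wrong_ids))


for model, wrong in mistakes.items():
    print(f"{model}: {accuracy_percent(wrong):.2f} %")
```

With 7 and 15 misses out of 80 this gives 91.25 % and 81.25 %, matching the table.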

Rule of thumb: stronger host → bigger WFGY lift. GPT-6? Same files, same rules.

3. Why philosophy?

Most fragile domain — long-range abstraction.
Tests reasoning, not trivia.
Downstream proxy — pass philosophy, survive policy & ethics.

💬 TL;DR

WFGY isn’t a model — it’s a math-based sanity layer you can slap onto any LLM.
Use GPT-4o, GPT-5, or whatever’s next — WFGY is your reasoning booster.

Start with the WFGY PDF or GitHub and replicate.

📌 Introduction

WFGY is a symbiotic reasoning layer: stronger host ⇒ larger lift.
Here we attach it to GPT-4o and GPT-5 via either the PDF pipeline or TXT OS interface.
No fine-tuning, no prompt voodoo: only symbolic constraints and traceable logic.

📌 Benchmark result details

Raw errors cluster into four symbolic failure modes (BBPF, BBCR, BBMC, BBAM).
WFGY applies ΔS control, entropy modulation, and path-symmetry enforcement.
Full taxonomy in the paper.

📌 Download the evidence

WFGY-enhanced answers (80 / 80) → ./philosophy_80_wfgy_gpt4o.xlsx
GPT-5 raw answers → ./philosophy_80_gpt5_raw.xlsx
GPT-4o raw answers → ./philosophy_80_gpt4o_raw.xlsx
Error-by-error comparison (GPT-4o vs GPT-5 vs WFGY), detailed fix log → ./philosophy_error_comparison.md

Explore More

Layer	Page	What it’s for
⭐ Proof	WFGY Recognition Map	External citations, integrations, and ecosystem proof
⚙️ Engine	WFGY 1.0	Original PDF tension engine and early logic sketch (legacy reference)
⚙️ Engine	WFGY 2.0	Production tension kernel for RAG and agent systems
⚙️ Engine	WFGY 3.0	TXT-based Singularity tension engine (131 S-class set)
🗺️ Map	Problem Map 1.0	Flagship 16-problem RAG failure taxonomy and fix map
🗺️ Map	Problem Map 2.0	Global Debug Card for RAG and agent pipeline diagnosis
🗺️ Map	Problem Map 3.0	Global AI troubleshooting atlas and failure pattern map
🧰 App	TXT OS	.txt semantic OS with fast bootstrap
🧰 App	Blah Blah Blah	Abstract and paradox Q&A built on TXT OS
🧰 App	Blur Blur Blur	Text-to-image generation with semantic control
🏡 Onboarding	Starter Village	Guided entry point for new users

If this repository helped, starring it improves discovery so more builders can find the docs and tools.
