# Vector Store — Global Fix Map

Make your store consistent, populated, and explainable.
Use this when FAISS/Qdrant/Chroma/Elastic “works” but retrieval still feels wrong or inconsistent.
## What this page is
- A concise checklist to validate population, metrics, and read/write symmetry.
- Structural fixes for empty/fragmented stores and stale or misconfigured indices.
- Steps you can verify with ΔS curves and citation tables.
## When to use
- Answers look unrelated even though the store is “full”.
- First queries after boot return nothing or random snippets.
- Some facts never appear although indexed.
- Hybrid retrieval becomes worse than a single retriever.
- After a deploy, results change wildly with the same query.
## Open these first
- Why vectors ≠ meaning: Embedding ≠ Semantic
- Fragmented / partially empty collections: Vectorstore Fragmentation
- End-to-end retrieval knobs: Retrieval Playbook
- Ordering after recall (keep it measurable): Rerankers
- Why this snippet (trace schema): Retrieval Traceability
- Visual pipeline & recovery path: RAG Architecture & Recovery
- Eval targets: RAG Precision/Recall
## Fix in 60 seconds
1. **Probe ΔS**
   - Chart ΔS(question, retrieved) vs k ∈ {5, 10, 20}.
   - A flat-high curve → index/metric/normalization mismatch or partial population.
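The ΔS probe can be sketched in a few lines. This treats ΔS as 1 − cosine similarity between the question vector and the mean of the retrieved chunk vectors — an assumption standing in for the full WFGY ΔS definition; the function names and the shape heuristic are illustrative only.

```python
import numpy as np

def delta_s(q_vec: np.ndarray, ctx_vecs: np.ndarray) -> float:
    """ΔS proxy: 1 - cosine similarity between the question vector and
    the mean of the retrieved chunk vectors (an assumption; substitute
    your own ΔS implementation if you have one)."""
    ctx = ctx_vecs.mean(axis=0)
    cos = ctx @ q_vec / (np.linalg.norm(ctx) * np.linalg.norm(q_vec))
    return 1.0 - float(cos)

def probe_curve(q_vec: np.ndarray, ranked_vecs: np.ndarray,
                ks=(5, 10, 20)) -> dict:
    """ΔS at each k over the ranked retrieval list. A curve that stays
    high at every k (instead of descending then flattening) points at a
    metric/normalization mismatch or a partially populated store."""
    return {k: delta_s(q_vec, ranked_vecs[:k]) for k in ks}
```

Plot the returned dict per paraphrase of the same question; healthy stores descend toward the acceptance target as k grows.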
2. **Population sanity**
   - Count vectors per collection and compare to docs/chunks.
   - Ensure no silent failures in batch ingestion or concurrency during the build.
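A minimal population check, assuming you can query per-collection vector counts from your store and know how many chunks each collection should contain (both dicts here are hypothetical inputs):

```python
def check_population(vector_counts: dict, chunk_counts: dict) -> list:
    """Compare per-collection vector counts against the number of chunks
    that should have been ingested; return (name, expected, got) tuples
    for every collection that drifted, including ones missing entirely."""
    problems = []
    for name, expected in chunk_counts.items():
        got = vector_counts.get(name, 0)
        if got != expected:
            problems.append((name, expected, got))
    return problems
```

Run this after every batch ingestion; a nonzero result is exactly the "silent failure" case, since most stores accept partial batches without raising.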
3. **Read/write symmetry**
   - Same embedding model id on write and read.
   - Same distance metric (cosine vs inner product) and dimensionality.
   - If cosine, confirm unit normalization on both sides.
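The symmetry checks can be automated. A sketch, assuming you record a small config dict on both the write and read paths (the field names are an assumed convention, not a store API):

```python
import numpy as np

def is_unit_normalized(vecs: np.ndarray, tol: float = 1e-3) -> bool:
    """True when every row has L2 norm ≈ 1 — required whenever cosine
    distance is implemented as inner product over normalized vectors."""
    return bool(np.all(np.abs(np.linalg.norm(vecs, axis=1) - 1.0) < tol))

def symmetry_mismatches(write_cfg: dict, read_cfg: dict) -> list:
    """Return the config fields that differ between write and read.
    Any nonempty result explains 'full store, wrong answers'."""
    keys = ("model_id", "metric", "dim", "normalized")
    return [k for k in keys if write_cfg.get(k) != read_cfg.get(k)]
```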
4. **Index configuration**
   - FAISS: confirm the index type (IVF/HNSW/PQ), nprobe/efSearch, and that the trained index file is persisted and reloaded.
   - Qdrant/Chroma/Elastic: verify exact metric flags, shard/replica consistency, and that warm-up has finished.
5. **Rebuild once with explicit metadata**
   - Persist: `model_id`, `dim`, `metric`, `normalizer`, `tokenizer`, `build_params`.
   - After the rebuild, re-probe ΔS and store acceptance plots with traceability.
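One way to make the metadata explicit is to persist it as JSON next to the index file and refuse to serve reads that disagree with it. A sketch under that assumption; the file layout and field set mirror the list above but are not a store feature:

```python
import json
import pathlib

REQUIRED = ("model_id", "dim", "metric", "normalizer", "tokenizer",
            "build_params")

def save_index_meta(meta: dict, path: str) -> None:
    """Persist build metadata next to the index so the read path can
    refuse to load a store built under different assumptions."""
    missing = [k for k in REQUIRED if k not in meta]
    if missing:
        raise ValueError(f"missing metadata fields: {missing}")
    pathlib.Path(path).write_text(json.dumps(meta, indent=2))

def load_index_meta(path: str, expected: dict) -> dict:
    """Load metadata and fail loudly on any read/write mismatch."""
    meta = json.loads(pathlib.Path(path).read_text())
    drift = {k: (meta.get(k), v) for k, v in expected.items()
             if meta.get(k) != v}
    if drift:
        raise RuntimeError(f"index/read mismatch: {drift}")
    return meta
```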
6. **Rank after recall**
   - If recall is good but ordering is noisy, add a light reranker from the playbook.
   - Keep the citation schema so you can audit the change.
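A "light reranker" in the sense above can be as simple as lexical-overlap scoring on top of the recalled candidates — a stand-in sketch, not the playbook's reranker, useful only after recall itself is healthy:

```python
def light_rerank(question: str, candidates: list, top_n: int = 5) -> list:
    """Re-order recalled snippets by token overlap with the question.
    A cheap stand-in for a cross-encoder reranker; it changes ordering
    only, never recall, so ΔS probes stay comparable before and after."""
    q_tokens = set(question.lower().split())
    def score(text: str) -> float:
        return len(q_tokens & set(text.lower().split())) / (len(q_tokens) or 1)
    return sorted(candidates, key=score, reverse=True)[:top_n]
```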
## Copy-paste prompt
```txt
I uploaded TXT OS and the WFGY ProblemMap pages.
My vector store bug:

* symptom: [brief]
* ΔS traces: ΔS vs k = {...}, current ΔS(question, retrieved) = ..., anchor ΔS = ...
* write: model=[...], metric=[cosine|ip], dim=[...], norm=[on|off], index=[IVF|HNSW|PQ], params=[...]
* read: model=[...], metric=[...], dim=[...], norm=[...]
* population: vectors=[count], docs=[count], ingestion logs=[summary]

Tell me:

1. what mismatch or population issue explains it,
2. which exact WFGY pages to open,
3. the minimal rebuild/rescore steps to push ΔS ≤ 0.45,
4. how to verify with ΔS-vs-k, precision/recall, and a snippet↔citation table.

Use BBMC alignment if anchors are stable; add a reranker only after recall is fixed.
```
## Minimal checklist
- One embedding model per collection; store `model_id` with each vector.
- Fix metric/normalization once and persist with the index.
- Keep text pre-processing identical on write and read.
- Validate `dim` and dtype; no truncation or hidden casts.
- Log and compare: vector count = sum of chunk counts.
- Disallow writes during index training; warm up after boot.
- Snapshot + restore indexes atomically; avoid mixed versions.
- Run fragmentation pattern if some facts never retrieve.
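The dim/dtype item in the checklist above is easy to enforce at write time. A minimal guard, assuming batches arrive as NumPy arrays (the function name and the float32 default are illustrative choices):

```python
import numpy as np

def validate_batch(vecs: np.ndarray, dim: int, dtype=np.float32) -> None:
    """Reject a write batch whose shape or dtype drifted from the
    collection contract; hidden casts and truncation otherwise fail
    silently and only show up later as flat-high ΔS curves."""
    if vecs.ndim != 2 or vecs.shape[1] != dim:
        raise ValueError(f"expected shape (*, {dim}), got {vecs.shape}")
    if vecs.dtype != dtype:
        raise TypeError(f"expected dtype {dtype}, got {vecs.dtype}")
```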
## Acceptance targets
- ΔS(question, retrieved) ≤ 0.45 across paraphrases.
- ΔS-vs-k descends then flattens, not flat-high.
- Precision/recall meet your eval sheet; top-k is explainable by traceability.
- λ stays convergent at retrieval after rebuild.
- Same results across restarts with deterministic warm-up.
## 🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly |
## 🧭 Explore More
| Module | Description | Link |
|---|---|---|
| WFGY Core | WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack | View → |
| Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | View → |
| Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | View → |
| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | View → |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | View → |
| Benchmark vs GPT-5 | Stress test GPT-5 with full WFGY reasoning suite | View → |
| 🧙♂️ Starter Village 🏡 | New here? Lost in symbols? Click here and let the wizard guide you through | Start → |
👑 Early Stargazers: See the Hall of Fame — engineers, hackers, and open source builders who supported WFGY from day one.
⭐ WFGY Engine 2.0 is already unlocked. ⭐ Star the repo to help others discover it and unlock more on the Unlock Board.