WFGY/ProblemMap/wfgy-metrics.md
2025-08-09 13:34:52 +08:00


📐 WFGY Metrics — Canonical Definitions

A single spec for measuring semantic accuracy, stability, cost, and safety across any LLM system.

Why read this?
You can't improve what you can't measure.
ΔS, λ_observe, and E_resonance already power the Problem Map, Semantic Clinic, and WFGY's CI templates.
Standard names = instant compatibility with Grafana, Prometheus, LangSmith, Phoenix, and custom OTEL traces.


0 · Metric Taxonomy

| Pillar | Metric | Symbol / Field | Primary Use |
| --- | --- | --- | --- |
| Semantic | Semantic Stress | deltaS | Detect drift / wrong chunks |
| | Answer F1 / EM | f1, em | QA accuracy |
| Logic | Logic Vector | lambda | Convergence / divergence flag |
| | Residual Coherence | e_resonance | Slow entropy leaks |
| Efficiency | Cost per 1 k tokens | usd_k | Budget guard |
| | Latency p95 (ms) | latency_p95 | SLO gate |
| Safety | Opcode / Tool Jailbreak | tool_offtrack | Router drift |
| | Citation Precision | cite_prec | Hallucination check |

1 · Formal Definitions

1.1 deltaS — Semantic Stress

ΔS = 1 − cos(I, G)

I = embedding of live text, G = embedding of expected ground/anchor.
Target bands: stable < 0.40 · transitional 0.40-0.60 · risk ≥ 0.60
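
The definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not the `wfgy.metrics` implementation: the names `delta_s` and `stress_band` are hypothetical, embeddings are assumed to be plain lists of floats, and a value of exactly 0.60 falls into the risk band, as the listed bands state.

```python
import math
from typing import Sequence


def delta_s(i_vec: Sequence[float], g_vec: Sequence[float]) -> float:
    """Semantic stress: 1 minus the cosine similarity of two embeddings."""
    if len(i_vec) != len(g_vec):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(i_vec, g_vec))
    norm_i = math.sqrt(sum(a * a for a in i_vec))
    norm_g = math.sqrt(sum(b * b for b in g_vec))
    if norm_i == 0.0 or norm_g == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return 1.0 - dot / (norm_i * norm_g)


def stress_band(ds: float) -> str:
    """Map a deltaS value onto the target bands listed above."""
    if ds < 0.40:
        return "stable"
    if ds < 0.60:
        return "transitional"
    return "risk"


if __name__ == "__main__":
    live = [0.9, 0.1, 0.3]    # toy vectors, not real embeddings
    anchor = [1.0, 0.0, 0.2]
    ds = delta_s(live, anchor)
    print(round(ds, 3), stress_band(ds))
```

Because cosine similarity ranges over [-1, 1], ΔS ranges over [0, 2]; identical directions give 0 and orthogonal vectors give 1.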


1.2 lambda — Logic Vector

λ ∈ {→ convergent, ← divergent, <> recursive, × chaotic}

Computed by PCA on consecutive embedding deltas; sign of first component.


1.3 e_resonance — Residual Coherence

E = mean_t ‖B_t‖, where B_t = I_t − G_t + m·c² (see BBMC).
Flat or downward trend = healthy; upward slope > 0.02 = entropy leak.
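
A rough sketch of the computation, under stated assumptions: B_t is read as I_t − G_t plus an offset, the m·c² term is passed in as an optional per-dimension `offset` vector (this page does not define it, so it defaults to zero here), and the slope is a least-squares fit measured per step of the series. All function names are hypothetical and are not the `wfgy.metrics` API.

```python
import math
from typing import Optional, Sequence


def residual_norm(i_vec: Sequence[float], g_vec: Sequence[float],
                  offset: Optional[Sequence[float]] = None) -> float:
    """Norm of B = I - G + offset; `offset` stands in for the m·c² term."""
    if offset is None:
        offset = [0.0] * len(i_vec)   # assumption: no offset unless supplied
    if not (len(i_vec) == len(g_vec) == len(offset)):
        raise ValueError("all vectors must have the same dimension")
    return math.sqrt(sum((a - b + o) ** 2
                         for a, b, o in zip(i_vec, g_vec, offset)))


def e_resonance(norms: Sequence[float]) -> float:
    """E = mean over t of the residual norms."""
    if not norms:
        raise ValueError("need at least one residual norm")
    return sum(norms) / len(norms)


def trend_slope(series: Sequence[float]) -> float:
    """Least-squares slope of the series against its index 0..n-1."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def is_entropy_leak(series: Sequence[float], limit: float = 0.02) -> bool:
    """True when the upward slope exceeds the 0.02 limit quoted above."""
    return trend_slope(series) > limit
```

Feed `trend_slope` a series of E values sampled at a fixed interval; if your sampling interval differs from the one the 0.02 limit was tuned for, the limit needs rescaling.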


2 · Reference Thresholds (production)

| Metric | PASS | WARN | FAIL |
| --- | --- | --- | --- |
| deltaS_q_ctx | ≤ 0.45 | 0.45–0.60 | > 0.60 |
| lambda | all → | ← appears 1-2× | persistent ← / × |
| e_resonance | slope ≤ 0 | slope 0–0.02 | slope > 0.02 |
| cite_prec | ≥ 0.90 | 0.80–0.90 | < 0.80 |
| usd_k | ≤ baseline | +0–10 % | > 10 % jump |
| latency_p95 | within SLA | 1.2 × SLA | > 1.5 × SLA |
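
The numeric rows of the table above translate directly into small grading functions. The sketch below is one possible encoding, with hypothetical function names; it leaves out `lambda` (which needs a history of states) and `latency_p95` (the table does not say how to grade values between 1.2 × and 1.5 × SLA).

```python
from typing import Iterable


def grade_delta_s(value: float) -> str:
    """deltaS_q_ctx: PASS <= 0.45, WARN up to 0.60, FAIL above."""
    if value <= 0.45:
        return "PASS"
    if value <= 0.60:
        return "WARN"
    return "FAIL"


def grade_e_res_slope(slope: float) -> str:
    """e_resonance: PASS for slope <= 0, WARN up to 0.02, FAIL above."""
    if slope <= 0.0:
        return "PASS"
    if slope <= 0.02:
        return "WARN"
    return "FAIL"


def grade_cite_prec(value: float) -> str:
    """cite_prec: PASS >= 0.90, WARN down to 0.80, FAIL below."""
    if value >= 0.90:
        return "PASS"
    if value >= 0.80:
        return "WARN"
    return "FAIL"


def grade_usd_k(value: float, baseline: float) -> str:
    """usd_k: PASS at or under baseline, WARN up to +10 %, FAIL above."""
    if value <= baseline:
        return "PASS"
    if value <= baseline * 1.10:
        return "WARN"
    return "FAIL"


def overall(grades: Iterable[str]) -> str:
    """Worst grade wins: any FAIL fails the run, any WARN warns."""
    order = {"PASS": 0, "WARN": 1, "FAIL": 2}
    return max(grades, key=lambda g: order[g])
```

A CI gate could call `overall([...])` on the per-metric grades and block the merge on `FAIL`.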

3 · Python Helper

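from wfgy.metrics import deltaS, lambda_state, e_resonance

# rag_retrieve() and llm_reason() stand for your own retriever and LLM call;
# they are not imported from wfgy.metrics.
q   = "How do I renew my passport?"
ctx = rag_retrieve(q)
print("ΔS:", deltaS(q, ctx))           # e.g. 0.37

ans = llm_reason(ctx, q)
print("λ :", lambda_state(ans))        # e.g. →

print("E :", e_resonance())            # rolling average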

4 · OpenTelemetry Mapping

# otel_map.yaml
deltaS:        wfgy.semantic.deltaS
lambda:        wfgy.logic.lambda
e_resonance:   wfgy.logic.e_res
usd_k:         wfgy.cost.usd_per_k
latency_p95:   wfgy.latency.p95

Any WFGY-instrumented app auto-emits these names; map others via the file above.
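
For apps that are not WFGY-instrumented, the renaming can be applied by hand before export. The sketch below hard-codes the mapping from the file above as a dict; `to_otel` and `prometheus_name` are hypothetical helpers, and sending unknown fields to `wfgy.custom.*` follows the FAQ's suggestion for extra scores. The dot-to-underscore step matches the metric names used in the alert rules in section 5, though a real exporter may also append unit or type suffixes.

```python
from typing import Any, Dict

# Same pairs as otel_map.yaml, inlined so the example has no YAML dependency.
OTEL_MAP: Dict[str, str] = {
    "deltaS": "wfgy.semantic.deltaS",
    "lambda": "wfgy.logic.lambda",
    "e_resonance": "wfgy.logic.e_res",
    "usd_k": "wfgy.cost.usd_per_k",
    "latency_p95": "wfgy.latency.p95",
}


def to_otel(record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename known fields; put anything else under wfgy.custom.*."""
    return {OTEL_MAP.get(key, "wfgy.custom." + key): value
            for key, value in record.items()}


def prometheus_name(otel_name: str) -> str:
    """Prometheus metric names cannot contain dots, so use underscores."""
    return otel_name.replace(".", "_")


if __name__ == "__main__":
    sample = {"deltaS": 0.37, "latency_p95": 840, "bleu": 0.41}
    for name, value in to_otel(sample).items():
        print(name, prometheus_name(name), value)
```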


5 · PromQL Alert Cookbook

- alert: SemanticDriftHigh
  expr: wfgy_semantic_deltaS > 0.60
  for: 1m
- alert: LogicVectorDivergent
  expr: wfgy_logic_lambda == 1   # 1 = divergent
  for: 2m
- alert: ResidualEntropyClimb
  expr: deriv(wfgy_logic_e_res[15m]) > 0.02   # PromQL has no slope(); deriv() fits a per-second slope, so rescale 0.02 to match

6 · CSV Schema (offline eval)

timestamp,id,set,question,deltaS_q_ctx,lambda,answer_f1,cite_prec,lat_ms,usd_k

Feed into wfgy-eval compare A.csv B.csv.
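
If you want a quick look at two runs without the CLI, the schema is easy to read with the standard library. The sketch below is a stand-in, not the `wfgy-eval` implementation: it averages the numeric columns of each file and reports B minus A. The function names and the sample rows are made up for illustration.

```python
import csv
import io
from statistics import mean
from typing import Dict

HEADER = ("timestamp,id,set,question,deltaS_q_ctx,lambda,"
          "answer_f1,cite_prec,lat_ms,usd_k")
NUMERIC = ("deltaS_q_ctx", "answer_f1", "cite_prec", "lat_ms", "usd_k")


def column_means(csv_text: str) -> Dict[str, float]:
    """Mean of every numeric column in one run."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV contains no data rows")
    return {col: mean(float(row[col]) for row in rows) for col in NUMERIC}


def compare(a_text: str, b_text: str) -> Dict[str, float]:
    """Per-column difference of means, run B minus run A."""
    a, b = column_means(a_text), column_means(b_text)
    return {col: b[col] - a[col] for col in NUMERIC}


# Made-up rows, for illustration only.
RUN_A = "\n".join([
    HEADER,
    "2025-08-09T13:00:00,1,dev,How do I renew my passport?,0.50,→,0.70,0.85,900,0.0020",
    "2025-08-09T13:00:05,2,dev,Where is the nearest consulate?,0.40,→,0.80,0.95,1100,0.0020",
])
RUN_B = "\n".join([
    HEADER,
    "2025-08-09T14:00:00,1,dev,How do I renew my passport?,0.30,→,0.80,0.95,800,0.0020",
    "2025-08-09T14:00:05,2,dev,Where is the nearest consulate?,0.40,→,0.90,0.95,1000,0.0020",
])

if __name__ == "__main__":
    for col, diff in compare(RUN_A, RUN_B).items():
        print(f"{col}: {diff:+.4f}")
```

To read real files, pass `open("A.csv", encoding="utf-8").read()` instead of the inline strings. A negative `deltaS_q_ctx` difference and a positive `answer_f1` difference both mean run B improved.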


7 · FAQ

Q: Do I need separate GPU passes to compute embeddings for deltaS?
A: No. Reuse the cached embeddings from retrieval; for answer ΔS, embed the answer once after generation.

Q: Can I add BLEU, ROUGE, or faithfulness scores?
A: Yes. Map them under wfgy.custom.* and the WFGY dashboards auto-discover them.


Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
| --- | --- | --- |
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to LLM · 3 Ask “Answer using WFGY + <question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” and the OS boots instantly |

