vrr/WFGY

mirror of https://github.com/onestardao/WFGY.git synced 2026-04-28 19:50:17 +00:00

2025-08-15 23:36:37 +08:00


Live monitoring & alerting — RAG services

Goal: a list of recommended metrics, alert rules, and dashboard panels that keep RAG pipelines observable and actionable.

Core metrics to collect (recommended names)

Service-level

rag_e2e_latency_seconds (histogram) — E2E latency (request in → answer out)
rag_error_count_total — errors per endpoint
rag_request_count_total — total requests

Retrieval-level

retriever_qps_total (counter of retrieval calls; derive QPS with rate())
retriever_retrieved_docs_count (per request)
retriever_empty_result_count_total — unexpected empty sets

Vectorstore

vectorstore_index_load_time_seconds
vectorstore_memory_bytes
vectorstore_indexed_docs_total

Accuracy/provenance

rag_citation_hit_rate (CHR gauge over sliding window)
rag_precision_shipped (periodic batch scorer push)
rag_under_refusal_count_total

Infrastructure

llm_api_rate_limited_total
llm_api_error_total
queue_backlog_count (if using background queues)
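The rag_citation_hit_rate gauge above is described as a value over a sliding window. A minimal sketch of that computation follows. It assumes each answered request yields one boolean "citation hit" outcome; the class name SlidingHitRate and the 600-second default window are illustrative choices, not part of this repo.

```python
import time
from collections import deque
from typing import Optional


class SlidingHitRate:
    """Fraction of recorded outcomes that were hits within the last `window_s` seconds."""

    def __init__(self, window_s: float = 600.0):
        self.window_s = window_s
        self._events = deque()  # (timestamp, hit) pairs, oldest first
        self._hits = 0

    def record(self, hit: bool, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._events.append((now, bool(hit)))
        self._hits += int(bool(hit))
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop outcomes that are older than the window.
        cutoff = now - self.window_s
        while self._events and self._events[0][0] < cutoff:
            _, old_hit = self._events.popleft()
            self._hits -= int(old_hit)

    def value(self, now: Optional[float] = None) -> Optional[float]:
        """Return the hit rate, or None when the window holds no outcomes."""
        now = time.time() if now is None else now
        self._evict(now)
        if not self._events:
            return None
        return self._hits / len(self._events)


# Explicit timestamps keep the example deterministic.
chr_window = SlidingHitRate(window_s=10)
chr_window.record(True, now=0)
chr_window.record(False, now=5)
chr_window.record(True, now=8)
```

Returning None for an empty window lets the exporter skip the sample instead of reporting a misleading 0.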

Suggested PromQL alerts (examples)

Tune thresholds to your workload.

A) Latency breach (interactive)

alert: RAGHighP95Latency
expr: histogram_quantile(0.95, sum(rate(rag_e2e_latency_seconds_bucket[5m])) by (le,instance)) > 2
for: 5m
labels:
  severity: page
annotations:
  summary: "RAG p95 > 2s ({{ $labels.instance }})"
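To make the p95 expression in rule A easier to reason about, here is a simplified sketch of how a quantile can be estimated from cumulative histogram buckets by linear interpolation. It follows the behaviour documented for histogram_quantile but is not the Prometheus implementation, and the bucket boundaries and counts are made-up illustration data.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    `buckets` must be sorted by upper bound and end with a +Inf bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be within [0, 1]")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("the last bucket must have an upper bound of +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # The quantile lies beyond the last finite bucket:
                # report the highest finite upper bound.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            if count == prev_count:
                return bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical latency buckets in seconds: (le, cumulative count).
latency_buckets = [(0.5, 50), (1.0, 80), (2.0, 94), (5.0, 99), (math.inf, 100)]
p95 = histogram_quantile(0.95, latency_buckets)
```

With these buckets the p95 estimate falls in the 2 s to 5 s bucket, so the reported value depends heavily on where the bucket boundaries sit. Placing a boundary at or near the alert threshold (2 s here) keeps the alert from reacting to interpolation error.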

B) Error spike

alert: RAGErrorSpike
expr: increase(rag_error_count_total[5m]) > 50
for: 2m
labels: { severity: page }

C) Retriever empty results

alert: RetrieverEmptyResults
expr: increase(retriever_empty_result_count_total[5m]) > 1
for: 5m
labels: { severity: ticket }

D) CHR drop

alert: CHRDrop
expr: rag_citation_hit_rate < 0.6
for: 10m
labels: { severity: ticket }

E) LLM auth failure

alert: LLMAuthFail
expr: increase(llm_api_error_total{code="401"}[5m]) > 0
for: 1m
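Each rule above pairs an expression with a `for:` duration, which delays firing until the condition has held continuously across evaluations. The sketch below is a simplified model of that behaviour, not Prometheus code; the function name alert_states and the sample values are illustrative assumptions.

```python
def alert_states(samples, condition, for_seconds):
    """Return one state per evaluation: 'inactive', 'pending' or 'firing'.

    `samples` is a list of (timestamp_seconds, value) pairs, one per rule evaluation.
    """
    states = []
    active_since = None
    for ts, value in samples:
        if condition(value):
            if active_since is None:
                active_since = ts  # the condition just became true
            held_for = ts - active_since
            states.append("firing" if held_for >= for_seconds else "pending")
        else:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
    return states


# Model of rule D (CHRDrop): rag_citation_hit_rate < 0.6 for 10m,
# with made-up samples taken every 5 minutes.
chr_samples = [(0, 0.70), (300, 0.55), (600, 0.50), (900, 0.58), (1200, 0.65), (1500, 0.40)]
states = alert_states(chr_samples, lambda v: v < 0.6, for_seconds=600)
```

In this trace the alert is pending at 300 s and 600 s, fires at 900 s, and returns to inactive as soon as one sample recovers. A short `for:` value pages on brief dips, while a long one delays detection.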

Dashboard panels (recommended)

E2E latency (p50/p95/p99) trend.
Requests per second and error rate.
Retriever QPS, avg retrieved docs, empty results.
CHR & Precision (batch scorer push).
Vectorstore memory & disk IO.
LLM provider error & rate-limit metrics.

Incident playbook (fast actions)

If CHR drops → run a diagnostic retrieval for 10 golden queries and record the retrieved ids and cosine scores.
If the retriever returns empty results → check vectorstore health and index partitions. Restart the index shard if needed.
If E2E latency spikes together with LLM errors → throttle traffic, apply a hard rate limit, and roll back the deploy if needed.
If LLM auth fails → rotate the key and redeploy secrets.
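The golden-query diagnostic in the first action can be sketched as below. This is a toy version under stated assumptions: a small in-memory dict stands in for the real retriever and vectorstore, and the names diagnose, golden, and index plus all vectors are hypothetical illustration data.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def diagnose(golden, index, k=3):
    """For each golden query, list the top-k (doc_id, score) pairs and whether
    the expected document was among them."""
    report = []
    for query_id, (query_vec, expected_id) in golden.items():
        scored = sorted(
            ((cosine(query_vec, doc_vec), doc_id) for doc_id, doc_vec in index.items()),
            reverse=True,
        )[:k]
        top = [(doc_id, round(score, 3)) for score, doc_id in scored]
        report.append({
            "query": query_id,
            "expected": expected_id,
            "top": top,
            "hit": any(doc_id == expected_id for doc_id, _ in top),
        })
    return report


# Hypothetical two-dimensional embeddings, for illustration only.
index = {"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0], "doc-c": [1.0, 1.0]}
golden = {
    "q1": ([1.0, 0.1], "doc-a"),  # expected to retrieve doc-a
    "q2": ([0.0, 1.0], "doc-a"),  # deliberately mismatched, to show a miss
}
report = diagnose(golden, index, k=1)
```

Comparing such a report against one captured while CHR was healthy shows whether the drop comes from retrieval (different ids or lower scores) or from a later stage.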

How to integrate scoring metrics

A periodic scorer job should push rag_citation_hit_rate and rag_precision_shipped as gauges, each computed over a short window (5–15 minutes).
For the full batch evaluation, run score_eval.py (see ProblemMap/eval/README.md) nightly and push the summary metrics via a small exporter.
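One possible shape for the "small exporter" is sketched below, using only the standard library. It assumes a Prometheus Pushgateway is the push target; the gateway URL, the job name rag_scorer, and the function names are assumptions, and the output format of score_eval.py is not modelled here.

```python
import urllib.request


def render_gauges(metrics):
    """Render a {name: value} dict as gauges in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {float(value)}")
    return "\n".join(lines) + "\n"


def push_gauges(metrics, gateway_url, job="rag_scorer"):
    """PUT the rendered metrics to a Pushgateway, replacing the previous group for `job`.

    `gateway_url` is an assumed deployment detail, for example "http://pushgateway:9091".
    """
    body = render_gauges(metrics).encode("utf-8")
    request = urllib.request.Request(
        f"{gateway_url}/metrics/job/{job}", data=body, method="PUT"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Example summary values; in practice these would come from the scorer job.
payload = render_gauges({"rag_citation_hit_rate": 0.82, "rag_precision_shipped": 0.9})
```

A Pushgateway keeps serving the last pushed value until it is replaced or deleted, so a stalled scorer job leaves a stale gauge behind. Pushing a timestamp gauge alongside the scores and alerting on its age is one way to catch that.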

Troubleshooting queries (PromQL examples)

Check p95 per instance:

histogram_quantile(0.95, sum(rate(rag_e2e_latency_seconds_bucket[5m])) by (le,instance))

CHR trend:

avg_over_time(rag_citation_hit_rate[30m])

