vrr/WFGY

mirror of https://github.com/onestardao/WFGY.git synced 2026-04-28 11:40:07 +00:00

2025-08-15 23:36:20 +08:00

Deployment checklist — RAG pipeline (pre-deploy & post-deploy)

Purpose: a short, rigorous checklist to verify your environment and reduce bootstrap/dependency issues during deployment.

Before you deploy (pre-flight)

1) Environment & prerequisites

Kubernetes cluster accessible; kubectl points to correct context.
```
kubectl config current-context
kubectl get nodes
```

Ensure the cluster has enough CPU, memory, and ephemeral storage for the vectorstore, and confirm resource quotas.
Secrets: API keys (LLM), db credentials, vectorstore creds in k8s Secret or vault.
Helm chart / manifests: reviewed and values set for production (replicas, resources, liveness/readiness).
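The context and secret checks above can be turned into a fail-fast pre-flight step. The following is a minimal sketch, not part of the original checklist: `require_context` and `require_secret` are hypothetical helper names, and the context and secret names passed to them are placeholders you would replace with your own.

```sh
# Hypothetical helper: fail fast when kubectl targets the wrong cluster.
# The expected context name is an assumption supplied by the caller.
require_context() {
  expected="$1"
  current="$(kubectl config current-context 2>/dev/null)" || {
    echo "kubectl has no current context" >&2
    return 2
  }
  if [ "$current" != "$expected" ]; then
    echo "context is '$current', expected '$expected'" >&2
    return 1
  fi
  echo "context OK: $current"
}

# Hypothetical helper: confirm a named Secret exists in a namespace.
require_secret() {
  ns="$1"; name="$2"
  kubectl -n "$ns" get secret "$name" >/dev/null 2>&1 || {
    echo "missing secret $ns/$name" >&2
    return 1
  }
}
```

A possible invocation, with placeholder names: `require_context my-prod-context && require_secret rag-prod llm-api-key`. If secrets live in a vault rather than in k8s Secrets, the second check does not apply.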

2) Configuration sanity

values.yaml contains:
- resources.requests and limits for retriever/generator.
- replicaCount >= 2 for critical services (unless expected load is very small).
- readinessProbe and livenessProbe configured.
Vector store sizing: index_shards, disk IOPS, memory (embedding index memory).
Network egress rules for model API (if external LLM).
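One way to catch missing probes or resource settings before installing is to render the chart and scan the output. The sketch below assumes the chart renders standard Deployment manifests. `check_manifest_keys` is a hypothetical helper, and it is a coarse text check rather than a YAML parser: it only confirms that each key appears at least once somewhere in the rendered output.

```sh
# Reads rendered manifests on stdin and reports which required keys are absent.
# Coarse by design: a key found in any one container satisfies the check.
check_manifest_keys() {
  manifest="$(cat)"
  missing=0
  for key in readinessProbe livenessProbe requests limits; do
    if ! printf '%s\n' "$manifest" | grep -q "^[[:space:]]*${key}:"; then
      echo "missing: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With the release name and values file used later in this checklist, a call might look like `helm template rag . -f values.prod.yaml | check_manifest_keys`. Running `helm lint . -f values.prod.yaml` first can catch chart syntax problems that this check would not.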

3) Observability & alarms

Prometheus scraping configured for app metrics endpoints (/metrics).
Default dashboards in Grafana (latency, error-rate, retriever QPS, CHR).
Alerts configured (see live_monitoring_rag.md for suggested alerts).

Deploy steps

Create namespace & secrets:

kubectl create ns rag-prod --dry-run=client -o yaml | kubectl apply -f -
kubectl -n rag-prod apply -f k8s/secrets.yaml

Install/upgrade Helm chart:

helm upgrade --install rag . -n rag-prod -f values.prod.yaml

Wait for pods to be ready (watch):

kubectl -n rag-prod rollout status deploy/rag-api -w
kubectl -n rag-prod get pods -o wide

Smoke tests (simple requests):

curl -fsS http://<ingress>/healthz
curl -fsS -X POST http://<ingress>/api/qa -H 'Content-Type: application/json' -d '{"qid":"smoke-1","q":"Who is the CEO of WFGY?"}' | jq .
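Right after a rollout the ingress may take a few seconds to route traffic, so a single health request can fail spuriously. A minimal retry sketch follows. `wait_healthy` is a hypothetical helper, and the default attempt count and delay are arbitrary assumptions rather than values from this checklist.

```sh
# Polls a health URL until it answers with a success status or attempts run out.
# Args: URL, max attempts (default 10), seconds to sleep after a failure (default 3).
wait_healthy() {
  url="$1"; attempts="${2:-10}"; delay="${3:-3}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors (4xx/5xx).
    if curl -fsS -o /dev/null "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not healthy after $attempts attempts: $url" >&2
  return 1
}
```

For example, `wait_healthy http://<ingress>/healthz 20 3` would gate the QA smoke request on the health endpoint answering first.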

Post-deploy checks (first 15 minutes)

Confirm retriever returns docs for 10 sample queries:
- Use your retrieval debug endpoint to inspect retrieved_ids.
Confirm p95 E2E latency ≤ target (by env). Collect from Grafana or kubectl logs.
Confirm CHR on 10 smoke items ≥ expected baseline (manually assert correctness).
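When Grafana is not yet populated, p95 can be estimated from raw latencies gathered during the smoke run. The sketch below assumes you have one numeric latency per line (for example, extracted from logs or from curl timings) and uses the nearest-rank definition of the percentile. `p95` and `p95_within` are hypothetical helper names.

```sh
# Nearest-rank p95 of numeric latencies, one value per line on stdin.
p95() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((NR * 95 + 99) / 100)   # integer form of ceil(0.95 * NR)
      print v[rank]
    }'
}

# Succeeds when the p95 of stdin is at or below the target (same unit).
p95_within() {
  target="$1"
  value="$(p95)" || return 2
  echo "p95=$value target=$target"
  awk -v v="$value" -v t="$target" 'BEGIN { exit !(v + 0 <= t + 0) }'
}
```

A call such as `p95_within 800 < latencies_ms.txt` (file name and target are placeholders) exits non-zero when the target is exceeded. With only a handful of samples the estimate is dominated by the slowest request, so treat it as a rough signal.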

Check for error spikes in logs:

kubectl -n rag-prod logs -l app=rag --since=10m --tail=-1 | grep -E "ERROR|WARN" | head -n 200
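A count per level is often easier to compare against a baseline than a page of raw lines. This is a small sketch that assumes the application writes the literal strings `ERROR` and `WARN` in its log lines. `count_log_levels` is a hypothetical helper.

```sh
# Counts lines containing ERROR and lines containing WARN from stdin.
count_log_levels() {
  awk '/ERROR/ { e++ } /WARN/ { w++ } END { printf "ERROR=%d WARN=%d\n", e, w }'
}
```

It can be fed from the same log command, for example `kubectl -n rag-prod logs -l app=rag --since=10m --tail=-1 | count_log_levels`.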

Common config gotchas (double-check)

Vectorstore read-only mode accidentally set? (affects writes)
LLM rate-limiting / auth errors (wrong key or quota).
Wrong index/namespace names between chunker and retriever (off-by-one).
Probes misconfigured — containers get restarted continuously.

Rollback criteria

Rollback if any of:

p95 latency > target, sustained for 10 min.
Error rate > 3× baseline and not transient.
Retrieval failures (empty pool) > 1% of requests.
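These criteria can be encoded as a single check so the rollback decision is not made by eye. The sketch below assumes its inputs are already 10-minute aggregates pulled from your monitoring. It does not itself judge whether a breach is sustained or transient. `should_rollback` is a hypothetical helper, and rates are expressed as fractions (0.01 means 1%).

```sh
# Exits 0 (roll back) when any rollback criterion is met, 1 otherwise.
# Args: p95 target_p95 error_rate baseline_error_rate empty_retrieval_fraction
should_rollback() {
  awk -v p95="$1" -v target="$2" -v err="$3" -v base="$4" -v empty="$5" '
    BEGIN {
      reason = ""
      if (p95 + 0 > target + 0)      reason = reason " p95>target"
      if (err + 0 > 3 * (base + 0))  reason = reason " error-rate>3x-baseline"
      if (empty + 0 > 0.01)          reason = reason " empty-retrieval>1%"
      if (reason != "") { print "ROLLBACK:" reason; exit 0 }
      print "OK"
      exit 1
    }'
}
```

For instance, `should_rollback 900 800 0.01 0.01 0.001` prints `ROLLBACK: p95>target` and exits 0, while values inside all three limits print `OK` and exit 1.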

Rollback command example:

helm rollback rag <previous_revision> -n rag-prod
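To find the value for `<previous_revision>`, `helm history` lists the release's revisions. When the revision argument is omitted, Helm 3 rolls back to the previous release. The wrapper below is a sketch under those assumptions. `rollback_release` is a hypothetical helper name.

```sh
# Shows recent release history, then rolls back.
# Args: release, namespace, optional revision (omitted = previous release).
rollback_release() {
  release="$1"; ns="$2"; revision="${3:-}"
  helm history "$release" -n "$ns" --max 5 || return 1
  if [ -n "$revision" ]; then
    helm rollback "$release" "$revision" -n "$ns" --wait
  else
    helm rollback "$release" -n "$ns" --wait
  fi
}
```

Example: `rollback_release rag rag-prod` rolls back one step, and `rollback_release rag rag-prod 7` targets a specific revision (the number 7 is only illustrative).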

Quick checklist (copy/paste)

Namespace created, secrets applied
Helm values validated (resources, probes)
Prometheus/Grafana dashboards in place
Smoke tests passed (health & basic QA)
Alerts deployed
Canary with a small traffic share → monitor 10–30 min

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
| --- | --- | --- |
| WFGY 1.0 PDF | Engine Paper | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” (OS boots instantly) |

🧭 Explore More

| Module | Description | Link |
| --- | --- | --- |
| WFGY Core | WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack | View → |
| Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | View → |
| Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | View → |
| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | View → |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | View → |
| Benchmark vs GPT-5 | Stress test GPT-5 with full WFGY reasoning suite | View → |
| 🧙‍♂️ Starter Village 🏡 | New here? Lost in symbols? Click here and let the wizard guide you through | Start → |

👑 Early Stargazers: See the Hall of Fame —
Engineers, hackers, and open source builders who supported WFGY from day one.

⭐ WFGY Engine 2.0 is already unlocked. ⭐ Star the repo to help others discover it and unlock more on the Unlock Board.
