Deployment checklist — RAG pipeline (pre-deploy & post-deploy)
Purpose: a short, rigorous checklist to verify your environment and reduce bootstrap/dependency issues during deployment.
Before you deploy (pre-flight)
1) Environment & prerequisites
- Kubernetes cluster accessible; `kubectl` points to the correct context:

      kubectl config current-context
      kubectl get nodes

- Ensure cluster resources: CPU / memory / ephemeral storage for vectorstore. Confirm quotas.
- Secrets: API keys (LLM), db credentials, vectorstore creds in k8s Secret or vault.
- Helm chart / manifests: reviewed and values set for production (replicas, resources, liveness/readiness).
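The context check above can be scripted so a deploy stops early when `kubectl` points at the wrong cluster. This is a minimal sketch: `EXPECTED_CONTEXT`, `context_matches`, and `preflight` are hypothetical names introduced here, not part of the pipeline.

```shell
# Pre-flight sketch. EXPECTED_CONTEXT is an assumed variable that you set
# to the name of your production context before calling preflight.

# Succeeds only when both names are non-empty and identical.
context_matches() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Compares the active kubectl context with EXPECTED_CONTEXT, then lists nodes.
preflight() {
  expected="${EXPECTED_CONTEXT:-}"
  actual="$(kubectl config current-context)" || return 1
  if ! context_matches "$expected" "$actual"; then
    echo "context mismatch: expected '$expected', got '$actual'" >&2
    return 1
  fi
  kubectl get nodes
}
```

Run it as `EXPECTED_CONTEXT=<your-context> preflight` before any `helm` or `kubectl apply` step.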
2) Configuration sanity
- `values.yaml` contains:
  - `resources.requests` and `limits` for retriever/generator.
  - `replicaCount >= 2` for critical services (if expected load > small).
  - `readinessProbe` and `livenessProbe` configured.
- Vector store sizing: `index_shards`, disk IOPS, memory (embedding index memory).
- Network egress rules for model API (if external LLM).
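A quick way to catch a values file that is missing one of the keys listed above is a plain text scan. The sketch below is deliberately crude: `missing_keys` is a hypothetical helper, it matches key names anywhere in the file rather than parsing YAML, and it cannot tell which service a key belongs to.

```shell
# Prints every required key that never appears as "key:" at the start of a
# line (ignoring indentation). Text match only; this is not a YAML parser.
missing_keys() {
  file="$1"
  shift
  for key in "$@"; do
    grep -Eq "^[[:space:]]*${key}:" "$file" || echo "$key"
  done
}
```

For example, `missing_keys values.prod.yaml requests limits replicaCount readinessProbe livenessProbe` prints nothing when all five keys are present. Treat an empty result as a hint, not proof, that the file is complete.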
3) Observability & alarms
- Prometheus scraping configured for app metrics endpoints (`/metrics`).
- Default dashboards in Grafana (latency, error-rate, retriever QPS, CHR).
- Alerts configured (see `live_monitoring_rag.md` for suggested alerts).
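Before relying on dashboards, it can help to confirm that the `/metrics` endpoint actually exposes the series you expect. A minimal sketch, assuming the endpoint returns the Prometheus text format; `has_metric` is a hypothetical helper and any metric name you pass to it is your own.

```shell
# Reads Prometheus text-format output on stdin and succeeds if at least one
# sample line exists for the exact metric name given as $1.
# Comment lines (# HELP, # TYPE) are not counted as samples.
has_metric() {
  grep -Eq "^$1(\{| )"
}
```

Usage would look like `curl -fsS http://<pod>/metrics | has_metric <metric_name>`, with both placeholders filled in for your deployment.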
Deploy steps
- Create namespace & secrets:

      kubectl create ns rag-prod || true
      kubectl -n rag-prod apply -f k8s/secrets.yaml

- Install/upgrade Helm chart:

      helm upgrade --install rag . -n rag-prod -f values.prod.yaml

- Wait for pods to be ready (watch):

      kubectl -n rag-prod rollout status deploy/rag-api -w
      kubectl -n rag-prod get pods -o wide

- Smoke tests (simple requests):

      curl -fsS http://<ingress>/healthz
      curl -fsS -X POST http://<ingress>/api/qa -d '{"qid":"smoke-1","q":"Who is the CEO of WFGY?" }' | jq
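Right after a rollout the ingress may need a few seconds before it answers, so the smoke tests are easier to trust when the health check retries. The sketch below wraps the two requests above; `retry` and `smoke` are hypothetical helpers, the retry count and delay are arbitrary, and the `Content-Type` header is an assumption about what the API expects.

```shell
# Runs a command up to $1 times, sleeping $2 seconds between failed attempts.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  return 1
}

# smoke BASE_URL: health check with retries, then one QA request.
smoke() {
  base="$1"
  retry 5 3 curl -fsS "$base/healthz" >/dev/null || return 1
  curl -fsS -X POST "$base/api/qa" \
    -H 'Content-Type: application/json' \
    -d '{"qid":"smoke-1","q":"Who is the CEO of WFGY?"}' | jq .
}
```

Call it as `smoke http://<ingress>`. A non-zero exit means the health check never passed or the QA request failed.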
Post-deploy checks (first 15 minutes)
- Confirm retriever returns docs for 10 sample queries:
  - Use your `retrieval` debug endpoint to inspect `retrieved_ids`.
- Confirm p95 E2E latency ≤ target (by env). Collect from Grafana or `kubectl logs`.
- Confirm CHR on 10 smoke items ≥ expected baseline (manually assert correctness).
- Check for error spikes in logs:

      kubectl -n rag-prod logs -l app=rag --since=10m --tail=-1 | grep -E "ERROR|WARN" | head -n 200
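When Grafana is not yet available, p95 can be estimated from raw latency values. The sketch below uses the nearest-rank method on one number per line; `p95` is a hypothetical helper, and how you extract latencies from your logs depends on a log format this checklist does not define.

```shell
# Reads one latency value per line on stdin and prints the 95th percentile
# using the nearest-rank method: the value at position ceil(0.95 * n)
# in ascending order. Exits non-zero on empty input.
p95() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      idx = int((NR * 95 + 99) / 100)   # integer form of ceil(0.95 * NR)
      print v[idx]
    }'
}
```

If your logs carried a field such as `latency_ms=123` (an assumed format), a pipeline like `kubectl -n rag-prod logs -l app=rag --since=15m --tail=-1 | sed -n 's/.*latency_ms=\([0-9.]*\).*/\1/p' | p95` would feed it. With only a handful of samples the result is close to the maximum, so collect enough requests before comparing against the target.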
Common config gotchas (double-check)
- Vectorstore read-only mode accidentally set? (affects writes)
- LLM rate-limiting / auth errors (wrong key or quota).
- Wrong index/namespace names between chunker and retriever (off-by-one).
- Probes misconfigured: containers restart continuously.
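Misconfigured probes usually show up as climbing restart counts. A minimal sketch for spotting them: `restarting_pods` and `check_restarts` are hypothetical helpers, the threshold of 3 is arbitrary, and only the first container of each pod is inspected.

```shell
# Reads "<pod-name> <restart-count>" lines on stdin and prints the names of
# pods whose count is greater than the threshold given as $1.
restarting_pods() {
  awk -v max="$1" '$2 + 0 > max + 0 { print $1 }'
}

# Lists pods in rag-prod whose first container restarted more than 3 times.
check_restarts() {
  kubectl -n rag-prod get pods \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[0].restartCount}{"\n"}{end}' \
    | restarting_pods 3
}
```

Any pod name printed by `check_restarts` is worth a `kubectl describe pod` to see which probe is failing.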
Rollback criteria
Roll back if any of the following holds:
- P95 > target and sustained for 10m.
- Error rate > 3× baseline and not transient.
- Retrieval failures (empty pool) > 1% of requests.
Rollback command example:
helm rollback rag <previous_revision> -n rag-prod
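The three rollback criteria can be encoded as a single check so the decision is not made by eye under pressure. This is a sketch under stated assumptions: `should_rollback` is a hypothetical helper, you supply the measured values yourself, and it cannot judge whether an error spike is transient, so that call stays with the operator.

```shell
# should_rollback P95 TARGET SUSTAINED_MIN ERR_RATE BASELINE_ERR EMPTY_PCT
# Exits 0 (and prints the reasons) when any rollback criterion is met:
#   - p95 above target for at least 10 minutes
#   - error rate above 3x baseline
#   - empty retrievals above 1% of requests (EMPTY_PCT is in percent)
# Exits 1 when none is met.
should_rollback() {
  awk -v p95="$1" -v target="$2" -v mins="$3" \
      -v err="$4" -v base="$5" -v empty="$6" '
    BEGIN {
      hit = 0
      if (p95 + 0 > target + 0 && mins + 0 >= 10) {
        print "p95 above target for 10m or more"; hit = 1
      }
      if (err + 0 > 3 * base) { print "error rate above 3x baseline"; hit = 1 }
      if (empty + 0 > 1)      { print "empty retrievals above 1%"; hit = 1 }
      exit (hit ? 0 : 1)
    }'
}
```

For example, `should_rollback 950 800 12 0.004 0.002 0.3` exits 0 because p95 has exceeded the target for 12 minutes, at which point the `helm rollback` command above applies.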
Quick checklist (copy/paste)
- Namespace created, secrets applied
- Helm values validated (resources, probes)
- Prometheus/Grafana dashboards in place
- Smoke tests passed (health & basic QA)
- Alerts deployed
- Canary traffic small → monitor 10–30 min
Links
- Debug playbook → debug_playbook.md
- Live monitoring → live_monitoring_rag.md