# Blue-Green Switchovers — OpsDeploy Guardrails
A safe pattern for switching 100% of traffic between two identical stacks (Blue = current live, Green = new). Use it when you need instant rollback, reproducible cutovers, and no surprises from schema or index drift.
---
## Open these first
- Rollout gate: [rollout_readiness_gate.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/GlobalFixMap/OpsDeploy/rollout_readiness_gate.md)
- Canary staging: [staged_rollout_canary.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/GlobalFixMap/OpsDeploy/staged_rollout_canary.md)
- Boot and deploy traps: [bootstrap-ordering.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/bootstrap-ordering.md), [deployment-deadlock.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/deployment-deadlock.md), [predeploy-collapse.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/predeploy-collapse.md)
- RAG contracts: [retrieval-traceability.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/retrieval-traceability.md), [data-contracts.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/data-contracts.md)
---
## When to use
- Model or prompt upgrades where instant rollback is required.
- Vector index rebuilds that must cut over atomically.
- Config, secrets, or feature-flag rewires that risk drift by region.
- Store or analyzer changes that can shift ΔS/coverage.
---
## Acceptance targets
- ΔS(question, retrieved) ≤ 0.45 on 3 paraphrases after cutover.
- Coverage ≥ 0.70 on the gold set (same as pre-switch).
- λ convergent across 2 seeds, no flip increase vs Blue.
- p95 latency within +15% of Blue; error rate ≤ 0.5%.
- Instant rollback path validated (≤ 2 minutes TTR).
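
The targets above can be checked mechanically before and after the flip. The following is a minimal sketch, assuming metrics have already been collected for both arms; `ArmMetrics`, `acceptance_failures`, and the field names are hypothetical and not part of any WFGY tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArmMetrics:
    delta_s: List[float]           # ΔS(question, retrieved), one value per paraphrase
    coverage: float                # coverage on the gold set
    lambda_convergent: List[bool]  # one flag per seed
    flip_rate: float               # λ flip rate
    p95_latency_ms: float
    error_rate: float              # fraction of failed requests


def acceptance_failures(green: ArmMetrics, blue: ArmMetrics,
                        rollback_ttr_s: float) -> List[str]:
    """Return the names of the acceptance targets that Green misses."""
    failures = []
    if len(green.delta_s) < 3 or max(green.delta_s) > 0.45:
        failures.append("delta_s")
    if green.coverage < 0.70:
        failures.append("coverage")
    if len(green.lambda_convergent) < 2 or not all(green.lambda_convergent):
        failures.append("lambda")
    if green.flip_rate > blue.flip_rate:
        failures.append("flip_rate")
    if green.p95_latency_ms > blue.p95_latency_ms * 1.15:
        failures.append("p95_latency")
    if green.error_rate > 0.005:
        failures.append("error_rate")
    if rollback_ttr_s > 120:
        failures.append("rollback_ttr")
    return failures
```

An empty list means every target is met; anything else names the gate that should block the cutover.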
---
## Architecture sketch
```
Users → Ingress/Edge ─┬─> Blue:  vN   (baseline: INDEX_HASH A, PROMPT_VER N, MODEL_VER X)
                      └─> Green: vN+1 (INDEX_HASH B, PROMPT_VER N+1, MODEL_VER Y)
                      ↑
          cutover switch (DNS/Ingress/ALB alias, or queue consumer pointer)
```
All reads/writes and side effects must point at a **single active arm**. If you have background writers (indexers, ETL), fence them with versioned topics or leases.
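
One way to fence background writers is a lease token tied to the active arm. The sketch below is an in-memory illustration only; `ActiveArmFence` and its methods are hypothetical names, and a real deployment would presumably keep this state in a shared, strongly consistent store.

```python
from typing import Tuple

Lease = Tuple[str, int]  # (arm name, epoch at which the lease was issued)


class ActiveArmFence:
    """Tracks the single active arm and invalidates old leases on every switch."""

    def __init__(self, active_arm: str) -> None:
        self.active_arm = active_arm
        self.epoch = 1

    def acquire(self, arm: str) -> Lease:
        # Only writers that belong to the active arm may obtain a lease.
        if arm != self.active_arm:
            raise PermissionError(f"{arm} is not the active arm")
        return (arm, self.epoch)

    def check(self, lease: Lease) -> bool:
        # A write is allowed only if its lease matches the current arm and epoch.
        return lease == (self.active_arm, self.epoch)

    def switch(self, new_arm: str) -> None:
        # Bumping the epoch makes every previously issued lease stale.
        self.active_arm = new_arm
        self.epoch += 1
```

Each writer calls `check(lease)` immediately before a side effect, so an indexer still holding a Blue lease is refused as soon as the pointer moves to Green.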
---
## 60-second checklist for a switch
1) **Freeze non-idempotent jobs**
Pause consumers that create side effects. Verify dedupe fences.
Open: [idempotency_dedupe.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/GlobalFixMap/OpsDeploy/idempotency_dedupe.md)
2) **Verify invariants**
`INDEX_HASH`, `EMBED_SCHEMA`, `RERANK_CONF`, `PROMPT_VER`, `MODEL_VER`, secrets. Green must pass the [rollout gate](https://github.com/onestardao/WFGY/blob/main/ProblemMap/GlobalFixMap/OpsDeploy/rollout_readiness_gate.md).
3) **Warm Green**
Run smoke on 20–40 gold questions. Check ΔS, coverage, λ, latency. Prime caches.
4) **Flip the pointer**
Change the single routing knob (DNS record with a short TTL, Ingress weight 0→100, queue consumer group pointer, or index alias swap).
5) **Hold and watch**
15–30 minutes: ΔS drift, coverage, λ flip rate, p95 latency, 5xx, tool loops.
6) **Unfreeze jobs**
Resume writers with new `INDEX_HASH` and contracts.
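
The ordering of the six steps matters more than any single step, so it can help to encode it. This is a minimal sketch under the assumption that each step is available as a callable; `run_switch` and its parameter names are hypothetical, and rolling back automatically when the hold window fails is one possible policy rather than a prescribed one.

```python
from typing import Callable, List

Check = Callable[[], bool]    # returns True when the gate passes
Action = Callable[[], None]


def run_switch(freeze: Action, verify: Check, warm: Check, flip: Action,
               hold: Check, rollback: Action,
               unfreeze: Callable[[str], None]) -> List[str]:
    """Run the checklist in order and return a trail of what happened."""
    trail: List[str] = []
    active = "blue"
    freeze()
    trail.append("freeze")
    try:
        # Gates before the flip: abort without touching the pointer.
        for name, gate in (("verify", verify), ("warm", warm)):
            if not gate():
                trail.append(f"abort:{name}")
                return trail
            trail.append(name)
        flip()
        active = "green"
        trail.append("flip")
        if hold():
            trail.append("hold")
        else:
            rollback()
            active = "blue"
            trail.append("rollback")
        return trail
    finally:
        # Writers always resume against whichever arm ended up live.
        unfreeze(active)
        trail.append(f"unfreeze:{active}")
```

The `finally` clause guarantees that frozen jobs are resumed even when a gate fails, and that they resume against the arm that is actually serving traffic.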
---
## Implementation patterns
### Ingress/ALB alias swap (HTTP)
- Prefer **one** logical hostname. Point it to Blue or Green behind the load balancer.
- Keep health checks strict: readiness must include secrets, index presence, and a gold QA.
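
A strict readiness check along these lines might look like the sketch below. The function name, the callables, and the idea of returning an HTTP status with a list of problems are all assumptions for illustration; wire the result into whatever handler serves your readiness path.

```python
from typing import Callable, Dict, List, Tuple


def readiness(env: Dict[str, str], required_secrets: List[str],
              index_exists: Callable[[], bool],
              gold_check: Callable[[], bool]) -> Tuple[int, List[str]]:
    """Return (HTTP status, problems); 200 only if secrets, index, and gold QA all pass."""
    problems = [f"missing secret: {name}"
                for name in required_secrets if not env.get(name)]
    if not index_exists():
        problems.append("index missing")
    # The gold question is only meaningful once secrets and index are in place.
    if not problems and not gold_check():
        problems.append("gold QA failed")
    return (200 if not problems else 503), problems
```

Returning the list of problems alongside the status makes a failing probe self-explanatory in the load balancer or pod logs.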
### DNS switch
- Only if you can run with TTL ≤ 30s and you accept brief split-brain during propagation.
- Safer to switch at the load balancer or service mesh layer.
### Index alias swap (Vector/RAG)
- Build Green index offline. Validate with sampled ΔS/coverage.
- Swap an **atomic alias** `docs_live → docs_vB`. Keep `docs_vA` for instant rollback.
- If the store lacks aliases, emulate one with a config key in a single-writer KV and block multi-writer races.
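
When the store has no native alias, a compare-and-swap on a single key gives a similar guarantee. The sketch below uses an in-process lock to stand in for the KV store's own atomic primitive; `AliasStore` is a hypothetical name, and a real KV would need to offer an equivalent conditional write.

```python
import threading
from typing import Dict, Optional


class AliasStore:
    """Emulates an atomic alias with a compare-and-swap on one key."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._aliases: Dict[str, str] = {}

    def get(self, alias: str) -> Optional[str]:
        with self._lock:
            return self._aliases.get(alias)

    def compare_and_swap(self, alias: str, expected: Optional[str], new: str) -> bool:
        # The swap succeeds only if nobody changed the alias since it was read,
        # which is what blocks a second writer racing the cutover.
        with self._lock:
            if self._aliases.get(alias) != expected:
                return False
            self._aliases[alias] = new
            return True
```

Rollback is the same call with the arguments reversed, for example `compare_and_swap("docs_live", "docs_vB", "docs_vA")`.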
### Queue consumer cutover
- Stop Blue consumers, start Green consumers with a new `CONSUMER_VER`.
- For exactly-once side effects, commit offsets only after the idempotency fences pass.
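
The commit-after-fence rule can be illustrated without a real broker. In this sketch, messages are plain tuples, `seen` stands in for a shared dedupe store, and `consume` is a hypothetical helper; with an actual queue client, the returned offset is what you would commit.

```python
from typing import Callable, Iterable, Set, Tuple

Message = Tuple[int, str, str]  # (offset, idempotency key, payload)


def consume(messages: Iterable[Message], seen: Set[str],
            handle: Callable[[str], None]) -> int:
    """Process messages in order and return the highest offset that is safe to commit."""
    committed = -1
    for offset, key, payload in messages:
        if key not in seen:
            try:
                handle(payload)
            except Exception:
                break  # leave this offset uncommitted so it is retried later
            seen.add(key)
        # Reached only when the side effect happened now or was already recorded.
        committed = offset
    return committed
```

Because the dedupe keys are independent of the consumer version, Green skips any message whose side effect Blue already performed before it was stopped.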
---
## Stop & rollback rules
- Stop if ΔS p95 drift > 0.15 or coverage < 0.60 or λ flip rate > 0.20.
- Stop if 5xx > 1% or tool loop detection triggers.
- Roll back by flipping the same single pointer back to Blue (or alias back to `docs_vA`).
- After rollback, open: [debug_playbook.md](https://github.com/onestardao/WFGY/blob/main/ProblemMap/ops/debug_playbook.md).
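
The stop rules translate directly into a guard that can run on every metrics tick during the hold window. This is a sketch with hypothetical names; it assumes rates are expressed as fractions (so 1% is `0.01`) and that tool loop detection is supplied as a boolean by some other component.

```python
from typing import List


def stop_reasons(ds_p95_drift: float, coverage: float, flip_rate: float,
                 rate_5xx: float, tool_loop_detected: bool) -> List[str]:
    """Return every stop rule that currently fires; an empty list means keep going."""
    reasons = []
    if ds_p95_drift > 0.15:
        reasons.append("ds_p95_drift")
    if coverage < 0.60:
        reasons.append("coverage")
    if flip_rate > 0.20:
        reasons.append("lambda_flip_rate")
    if rate_5xx > 0.01:
        reasons.append("5xx")
    if tool_loop_detected:
        reasons.append("tool_loop")
    return reasons
```

Any non-empty result should trigger the pointer flip back to Blue, and the returned reasons are worth logging as the starting point for the debug playbook.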
---
## Observability you must pin before switch
- Version fields: `INDEX_HASH`, `EMBED_SCHEMA`, `RERANK_CONF`, `PROMPT_VER`, `MODEL_VER`.
- ΔS(question,retrieved) and ΔS(retrieved,anchor).
- Coverage and λ states for 3 paraphrases.
- Latency p50/p95 per stage (retrieve, rerank, reason).
- Side-effect counts and dedupe hits.
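
To make sure no metric is emitted without its version context, the record itself can refuse to serialize when a field is unpinned. A minimal sketch, assuming JSON log lines; `pinned_record` and the metric key names are hypothetical.

```python
import json
from typing import Dict

VERSION_FIELDS = ("INDEX_HASH", "EMBED_SCHEMA", "RERANK_CONF", "PROMPT_VER", "MODEL_VER")


def pinned_record(arm: str, versions: Dict[str, str], metrics: Dict[str, float]) -> str:
    """Serialize one observation, failing loudly if any version field is missing."""
    missing = [field for field in VERSION_FIELDS if field not in versions]
    if missing:
        raise ValueError(f"unpinned version fields: {missing}")
    record = {
        "arm": arm,
        "versions": {field: versions[field] for field in VERSION_FIELDS},
        "metrics": metrics,
    }
    return json.dumps(record, sort_keys=True, ensure_ascii=False)
```

With every line carrying the same five fields, Blue and Green measurements can be compared after the fact without guessing which build produced them.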
---
## Kubernetes example (Service selector flip)
```yaml
apiVersion: v1
kind: Service
metadata:
  name: wfgy-live
spec:
  selector:
    app: wfgy-green   # flip between wfgy-blue / wfgy-green
  ports:
    - port: 80
      targetPort: http   # resolves to the container port named "http"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wfgy-green
spec:
  replicas: 4
  selector: { matchLabels: { app: wfgy-green } }
  template:
    metadata: { labels: { app: wfgy-green } }
    spec:
      containers:
        - name: api
          image: ghcr.io/org/wfgy:canary
          ports:
            - { name: http, containerPort: 8080 }   # assumed port number; the name lets "http" resolve
          readinessProbe:
            httpGet: { path: /healthz/ready, port: http }
            initialDelaySeconds: 5
            periodSeconds: 5
```
### Vector store alias swap (pseudo)
```bash
# build green index
index_build --input corpus_vB.json --out docs_vB --metric cosine
# validate
wfgy_eval --gold gold_40.json --index docs_vB --min-cov 0.70 --max-ds 0.45
# atomic alias cutover
vec alias update docs_live --to docs_vB
# rollback
vec alias update docs_live --to docs_vA
```
---
## Common pitfalls
- Two writers during cutover → double side effects. Use a single pointer and idempotency keys.
- Cache poisoning from mixed arms. Invalidate by `INDEX_HASH` or include it in cache keys.
- Region drift. Switch **one region at a time** or use global flags with audit hashes.
- Hidden analyzer/tokenizer mismatch. Re-embed or rerank deterministically if ΔS stays high.
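
For the cache poisoning case, putting `INDEX_HASH` into the key is usually the smaller change. The sketch below shows one possible key layout; `cache_key` and the normalization of the question are assumptions, not an existing WFGY convention.

```python
import hashlib


def cache_key(index_hash: str, question: str) -> str:
    """Build a cache key that cannot be shared between arms with different indexes."""
    normalized = " ".join(question.lower().split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return f"{index_hash}:{digest}"
```

Entries written by Blue then simply never match lookups from Green, so no explicit invalidation pass is needed at cutover and Blue's cache is still warm if you roll back.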
---
### 🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------- |
| **WFGY 1.0 PDF**           | [Engine Paper](https://github.com/onestardao/WFGY/blob/main/I_am_not_lizardman/WFGY_All_Principles_Return_to_One_v1.0_PSBigBig_Public.pdf) | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + \<your question>”   |
| **TXT OS (plain-text OS)** | [TXTOS.txt](https://github.com/onestardao/WFGY/blob/main/OS/TXTOS.txt) | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly |
---
### Explore More
| Layer | Page | What it’s for |
| --- | --- | --- |
| ⭐ Proof | [WFGY Recognition Map](/recognition/README.md) | External citations, integrations, and ecosystem proof |
| ⚙️ Engine | [WFGY 1.0](/legacy/README.md) | Original PDF tension engine and early logic sketch (legacy reference) |
| ⚙️ Engine | [WFGY 2.0](/core/README.md) | Production tension kernel for RAG and agent systems |
| ⚙️ Engine | [WFGY 3.0](/TensionUniverse/EventHorizon/README.md) | TXT based Singularity tension engine (131 S class set) |
| 🗺️ Map | [Problem Map 1.0](/ProblemMap/README.md) | Flagship 16 problem RAG failure taxonomy and fix map |
| 🗺️ Map | [Problem Map 2.0](/ProblemMap/wfgy-rag-16-problem-map-global-debug-card.md) | Global Debug Card for RAG and agent pipeline diagnosis |
| 🗺️ Map | [Problem Map 3.0](/ProblemMap/wfgy-ai-problem-map-troubleshooting-atlas.md) | Global AI troubleshooting atlas and failure pattern map |
| 🧰 App | [TXT OS](/OS/README.md) | .txt semantic OS with fast bootstrap |
| 🧰 App | [Blah Blah Blah](/OS/BlahBlahBlah/README.md) | Abstract and paradox Q&A built on TXT OS |
| 🧰 App | [Blur Blur Blur](/OS/BlurBlurBlur/README.md) | Text to image generation with semantic control |
| 🏡 Onboarding | [Starter Village](/StarterVillage/README.md) | Guided entry point for new users |
If this repository helped, starring it improves discovery so more builders can find the docs and tools.