# Example 07 — Bootstrap Ordering & Readiness Gate (No.14)
**Goal**
Eliminate “starts but not ready” failures. We introduce a **deterministic bootstrap sequence** and a **readiness gate** that only flips to `READY` after models, indices, and policies are warmed and verified. Queries routed before that point are rejected fast with `503 Service Unavailable`.
**Problem Map link**
Targets **No.14 — Bootstrap Ordering**. Secondary reductions in **No.11 (Symbolic Collapse)** by ensuring commits and caches are only exposed post-warmup.
**Outcome**
- Predictable start-up: no first-minute 500s or null answers
- A single, observable `READY` flag guarded by verifiable checks
- Reproducible warm-up that runs the **same path** as production queries (no special backdoor)
---
## 1) Typical failure shapes we will kill
- **Cold model**: first call triggers model download/compile; timeouts or partial prompts
- **Index not loaded**: FAISS/ANN handle exists but vectors are empty; retrieval returns nothing
- **Manifest mismatch**: runtime embeds with settings incompatible with index (drift)
- **Guard missing**: early requests bypass guard/template while warm-up initializes it
- **Racey caches**: concurrent warm-ups rebuild the same index/embeddings; last writer wins
- **Probe inversion**: liveness probe green, readiness probe also green, but the service is still warming
---
## 2) Bootstrap plan (phased)
We require each phase to **prove** completion before moving on.
0. **Config load** → read runtime config; fail fast if missing
1. **Manifest validation** → compare `index_out/manifest.json` with runtime (Example 05 validator)
2. **Model warm-up** → load the embedding/generation models; run a 1-token dry call
3. **Index warm-up** → load vector index into memory; fetch `ids.json`; touch a sentinel vector
4. **Sentinel query** → run a realistic retrieval+guarded template end-to-end; verify refusal/answer semantics
5. **Flip `READY`** → atomically set readiness; `/readyz` switches from 503 to 200
If any phase fails: keep `/livez` green, **/readyz red**, and retry phase with backoff. Do **not** accept traffic.
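
The retry rule above can be sketched as a small phase runner. This is a minimal illustration, not the code used in the listings that follow: `run_phases`, `base_delay`, `max_delay`, and the injectable `sleep` are hypothetical names, and it retries only the failing phase with exponential backoff, whereas the Path A listing restarts from the first phase after a fixed delay.

```python
import time

def run_phases(phases, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Run (name, fn) phases in order; retry a failing phase with backoff.

    Returns a list of (name, attempts) once every phase has succeeded.
    `sleep` is injectable so the backoff can be tested without waiting.
    """
    report = []
    for name, fn in phases:
        attempts, delay = 0, base_delay
        while True:
            attempts += 1
            try:
                fn()          # a phase proves completion by not raising
                break
            except Exception as exc:
                print(f"phase {name} failed (attempt {attempts}): {exc}")
                sleep(delay)
                delay = min(delay * 2, max_delay)  # exponential, capped
        report.append((name, attempts))
    return report
```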
---
## 3) Path A — Python (stdlib HTTP, single file)
Create `server.py`.
```python
# server.py -- readiness-gated RAG microservice (stdlib only)
import json, os, threading, time, socketserver, http.server, urllib.parse, urllib.request

READY = False
READY_LOCK = threading.Lock()
STATE = {"phase": "init", "errors": []}

# --- utilities ---------------------------------------------------------------
def log(msg):
    print(time.strftime("%H:%M:%S"), msg, flush=True)

def http_post_json(url, body, headers, timeout=60):
    req = urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                                 headers=headers)
    # the timeout keeps a hung upstream from stalling warm-up indefinitely
    with urllib.request.urlopen(req, timeout=timeout) as r:
        return json.loads(r.read().decode("utf-8"))

def call_openai(prompt, model=os.getenv("OPENAI_MODEL", "gpt-4o-mini")):
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY")
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    }
    j = http_post_json("https://api.openai.com/v1/chat/completions", body, {
        "Content-Type": "application/json", "Authorization": f"Bearer {api_key}"
    })
    return j["choices"][0]["message"]["content"].strip()

def build_prompt(q, chunks):
    ctx = "\n\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return (
        "Use only the evidence. If not provable, reply exactly: not in context.\n"
        "Answer format:\n- claim\n- citations: [id,...]\n\n"
        f"Question: {q}\n\nEvidence:\n{ctx}\n"
    )
# --- phases ------------------------------------------------------------------
def phase_config():
    STATE["phase"] = "config"
    # minimal runtime config in env or file
    required = ["OPENAI_API_KEY"]
    for k in required:
        if not os.getenv(k):
            raise RuntimeError(f"missing {k}")
    log("config OK")

def phase_manifest():
    STATE["phase"] = "manifest"
    # compare manifest vs runtime; reuse your Example 05 fields
    with open("index_out/manifest.json", encoding="utf8") as f:
        manifest = json.load(f)
    # NOTE: the embedding and chunker values below are copied from the manifest
    # as placeholders, so those comparisons always pass. Replace them with the
    # real runtime settings to make the check meaningful.
    runtime = {
        "index_type": "faiss.IndexFlatIP",
        "metric": "inner_product",
        "embedding": {"model": manifest["embedding"]["model"],
                      "dimension": manifest["embedding"]["dimension"],
                      "normalized": True},
        "chunker_version": manifest["chunker"]["version"],
        "text_preproc": manifest["text_preproc"],
    }
    # simple equality check
    mismatches = []
    if manifest["index_type"] != runtime["index_type"]:
        mismatches.append("index_type")
    if manifest["metric"] != runtime["metric"]:
        mismatches.append("metric")
    if manifest["embedding"]["dimension"] != runtime["embedding"]["dimension"]:
        mismatches.append("embedding.dimension")
    if manifest["embedding"]["model"] != runtime["embedding"]["model"]:
        mismatches.append("embedding.model")
    if manifest["chunker"]["version"] != runtime["chunker_version"]:
        mismatches.append("chunker.version")
    if mismatches:
        raise RuntimeError("manifest mismatch: " + ", ".join(mismatches))
    log("manifest OK")

def phase_model():
    STATE["phase"] = "model"
    # short dry call to ensure the model is reachable
    _ = call_openai("Reply with: ok", model=os.getenv("OPENAI_MODEL", "gpt-4o-mini"))
    log("model warm OK")

def phase_index():
    STATE["phase"] = "index"
    # lightweight: just ensure ids.json and chunks.json are readable
    with open("index_out/ids.json", encoding="utf8") as f:
        ids = json.load(f)
    with open("data/chunks.json", encoding="utf8") as f:
        chunks = json.load(f)
    if not ids or not chunks:
        raise RuntimeError("index or chunks missing")
    STATE["ids"] = ids
    STATE["chunks"] = {c["id"]: c for c in chunks}
    log("index warm OK")

def phase_sentinel():
    STATE["phase"] = "sentinel"
    # end-to-end smoke: take first two chunks; ask a question that should refuse or answer
    ids = STATE["ids"][:2]
    chunks = [STATE["chunks"][i] for i in ids if i in STATE["chunks"]]
    q = "What is X?"
    prompt = build_prompt(q, chunks)
    ans = call_openai(prompt)
    if not (("citations" in ans.lower()) or ("not in context" in ans.lower())):
        raise RuntimeError("sentinel failed template")
    log("sentinel OK")
def warmup():
    global READY
    phases = (phase_config, phase_manifest, phase_model, phase_index, phase_sentinel)
    # loop instead of recursing: a long outage must not hit the recursion limit
    while True:
        try:
            for step in phases:
                step()
        except Exception as e:
            STATE["errors"].append({"phase": STATE["phase"], "error": str(e)})
            del STATE["errors"][:-50]  # keep the error history bounded
            log(f"[retry in 3s] {STATE['phase']} failed: {e}")
            time.sleep(3)
            continue  # restart phases from the beginning
        with READY_LOCK:
            READY = True
            STATE["phase"] = "ready"
        log("READY=TRUE")
        return
# --- HTTP server -------------------------------------------------------------
class Handler(http.server.BaseHTTPRequestHandler):
    def _write(self, code, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        url = urllib.parse.urlsplit(self.path)
        if url.path == "/livez":
            return self._write(200, {"live": True, "phase": STATE["phase"]})
        if url.path == "/readyz":
            return self._write(200 if READY else 503,
                               {"ready": READY, "phase": STATE["phase"],
                                "errors": STATE["errors"][-5:]})
        if url.path == "/answer":
            if not READY:
                return self._write(503, {"error": "not ready"})
            # minimal demo answer using first two chunks; parse_qs URL-decodes q
            q = urllib.parse.parse_qs(url.query).get("q", ["What is X?"])[0]
            ids = STATE["ids"][:2]
            chunks = [STATE["chunks"][i] for i in ids if i in STATE["chunks"]]
            try:
                ans = call_openai(build_prompt(q, chunks))
            except Exception as e:
                return self._write(502, {"error": f"upstream call failed: {e}"})
            return self._write(200, {"q": q, "answer": ans, "chunks": ids})
        return self._write(404, {"error": "not found"})

def main():
    threading.Thread(target=warmup, daemon=True).start()
    # threaded server: a slow /answer call must not block /livez and /readyz probes
    socketserver.ThreadingTCPServer.allow_reuse_address = True
    socketserver.ThreadingTCPServer.daemon_threads = True
    with socketserver.ThreadingTCPServer(("", 8080), Handler) as httpd:
        log("listening on :8080")
        httpd.serve_forever()

if __name__ == "__main__":
    main()
```
Run:
```bash
# preconditions: index_out/{manifest.json,ids.json} and data/chunks.json exist
OPENAI_API_KEY=sk-xxx python server.py
# in another shell
curl -s localhost:8080/livez
curl -s localhost:8080/readyz # expect 503 until warm-up ends, then 200
curl -s "localhost:8080/answer?q=What%20is%20X%3F"
```
**Pass criteria**
* `/livez` is **always** 200 after process starts
* `/readyz` returns 503 **until** sentinel passes; then flips to 200 and stays green
* `/answer` rejects with 503 before ready; serves after ready
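
The pass criteria above can be checked from a script rather than by hand. The sketch below is one hedged way to do it, assuming the server from this section listens on `localhost:8080`; `probe` and `wait_until_ready` are hypothetical helper names, and the `check` callable is injectable so the polling logic can be exercised without a running server.

```python
import time
import urllib.error
import urllib.request

def probe(url, timeout=2.0):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code   # urlopen raises on 503; the code is still the answer
    except (urllib.error.URLError, OSError):
        return None

def wait_until_ready(check, deadline_s=120.0, interval_s=1.0, sleep=time.sleep):
    """Call check() until it returns 200; return every status code observed."""
    seen = []
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        code = check()
        seen.append(code)
        if code == 200:
            return seen
        sleep(interval_s)
    raise TimeoutError(f"readiness never reached 200; last codes: {seen[-5:]}")

# Usage against the demo server (assumed address):
#   codes = wait_until_ready(lambda: probe("http://localhost:8080/readyz"))
#   assert all(c in (None, 503) for c in codes[:-1])  # red until the flip
```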
---
## 4) Path B — Node (http stdlib, single file)
Create `server.mjs`.
```js
// server.mjs -- readiness-gated RAG microservice (Node stdlib)
import http from "node:http";
import fs from "node:fs";
import https from "node:https";

let READY = false;
const STATE = { phase: "init", errors: [], ids: [], chunks: {} };

function log(...a) { console.log(new Date().toISOString(), ...a); }

function callOpenAI(prompt, model = process.env.OPENAI_MODEL || "gpt-4o-mini") {
  const key = process.env.OPENAI_API_KEY;
  if (!key) throw new Error("Set OPENAI_API_KEY");
  const body = JSON.stringify({ model, messages: [{ role: "user", content: prompt }], temperature: 0 });
  return new Promise((resolve, reject) => {
    const req = https.request("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${key}`,
        "Content-Length": Buffer.byteLength(body),
      },
    }, res => {
      let d = "";
      res.setEncoding("utf8");
      res.on("data", x => d += x);
      res.on("end", () => {
        // reject rather than throw: an error body has no `choices` array
        try { resolve(JSON.parse(d).choices[0].message.content.trim()); }
        catch (e) { reject(new Error(`bad upstream response (HTTP ${res.statusCode})`)); }
      });
    });
    req.on("error", reject);
    req.write(body);
    req.end();
  });
}

function buildPrompt(q, chunks) {
  const ctx = chunks.map(c => `[${c.id}] ${c.text}`).join("\n\n");
  return `Use only the evidence. If not provable, reply exactly: not in context.
Answer format:
- claim
- citations: [id,...]

Question: ${q}

Evidence:
${ctx}
`;
}

async function warmup() {
  try {
    STATE.phase = "config";
    if (!process.env.OPENAI_API_KEY) throw new Error("missing OPENAI_API_KEY");

    STATE.phase = "manifest";
    const m = JSON.parse(fs.readFileSync("index_out/manifest.json", "utf8"));
    if (m.index_type !== "faiss.IndexFlatIP" || m.metric !== "inner_product") throw new Error("manifest mismatch");

    STATE.phase = "model";
    await callOpenAI("Reply with: ok");

    STATE.phase = "index";
    STATE.ids = JSON.parse(fs.readFileSync("index_out/ids.json", "utf8"));
    const chunksArr = JSON.parse(fs.readFileSync("data/chunks.json", "utf8"));
    STATE.chunks = Object.fromEntries(chunksArr.map(c => [c.id, c]));

    STATE.phase = "sentinel";
    const ids = STATE.ids.slice(0, 2);
    const chunks = ids.map(i => STATE.chunks[i]).filter(Boolean);
    const ans = await callOpenAI(buildPrompt("What is X?", chunks));
    if (!(ans.toLowerCase().includes("citations") || ans.toLowerCase().includes("not in context")))
      throw new Error("sentinel failed template");

    READY = true; STATE.phase = "ready"; log("READY=TRUE");
  } catch (e) {
    STATE.errors.push({ phase: STATE.phase, error: String(e) });
    log("[retry in 3s]", STATE.phase, e);
    setTimeout(warmup, 3000);
  }
}

const server = http.createServer(async (req, res) => {
  const json = (code, obj) => { res.writeHead(code, { "Content-Type": "application/json" }); res.end(JSON.stringify(obj)); };
  const url = new URL(req.url, "http://localhost"); // base is only needed for parsing
  if (url.pathname === "/livez") return json(200, { live: true, phase: STATE.phase });
  if (url.pathname === "/readyz") return json(READY ? 200 : 503, { ready: READY, phase: STATE.phase, errors: STATE.errors.slice(-5) });
  if (url.pathname === "/answer") {
    if (!READY) return json(503, { error: "not ready" });
    const q = url.searchParams.get("q") || "What is X?"; // searchParams URL-decodes q
    const ids = STATE.ids.slice(0, 2);
    const chunks = ids.map(i => STATE.chunks[i]).filter(Boolean);
    try {
      const ans = await callOpenAI(buildPrompt(q, chunks));
      return json(200, { q, answer: ans, chunks: ids });
    } catch (e) {
      // an unhandled rejection here would otherwise crash the process
      return json(502, { error: String(e) });
    }
  }
  return json(404, { error: "not found" });
});

server.listen(8080, () => { log("listening on :8080"); warmup(); });
```
Run:
```bash
OPENAI_API_KEY=sk-xxx node server.mjs
curl -s localhost:8080/readyz
```
**Pass criteria** mirror the Python version.
---
## 5) K8s / container probes (production hints)
**Key principle:** **liveness** means “process is alive”, **readiness** means “pipeline is provably usable.”
```yaml
# deployment snippet (k8s)
livenessProbe:
  httpGet: { path: /livez, port: 8080 }
  initialDelaySeconds: 5
  periodSeconds: 5
readinessProbe:
  httpGet: { path: /readyz, port: 8080 }
  initialDelaySeconds: 5
  periodSeconds: 3
```
**Order of dependencies at boot**
1. Config & secrets
2. Storage mounts (index files)
3. Network (LLM endpoint)
4. Service warm-up → `/readyz` flips
5. Gate ingress/traffic
---
## 6) Anti-race guarantees
* **Single warm-up owner**: a background thread/task holds a lock; other threads read `READY` only
* **Idempotent warm-up**: re-running warm-up does not corrupt state; it re-verifies and flips `READY` if all checks pass
* **Atomic flag**: only set `READY=True` **after** all phases succeed; do not stream partial readiness
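
The three guarantees above can be combined in one small object. The following is a sketch under stated assumptions, not the mechanism used in the listings: `ReadinessGate` and its method names are hypothetical, a non-blocking `threading.Lock` stands in for the "single owner", and a `threading.Event` serves as the atomic flag.

```python
import threading

class ReadinessGate:
    """Single-owner, idempotent warm-up guarding one readiness flag."""

    def __init__(self, checks):
        self._checks = list(checks)        # callables that raise on failure
        self._owner = threading.Lock()     # held by whoever is warming up
        self._ready = threading.Event()    # the only thing readers look at

    def is_ready(self):
        return self._ready.is_set()

    def invalidate(self):
        self._ready.clear()

    def warm_up(self):
        """Run every check; return True if this call owned the warm-up and it passed.

        Returns False immediately when another thread is already warming up.
        """
        if not self._owner.acquire(blocking=False):
            return False
        try:
            for check in self._checks:
                check()
        except Exception:
            self._ready.clear()            # a failed re-verify turns the gate red
            raise
        else:
            self._ready.set()              # set only after all checks pass
            return True
        finally:
            self._owner.release()
```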
---
## 7) Common mistakes & quick fixes
* **Probe copy-paste**: exposing `/readyz` that always returns 200 → defeats the point; wire it to the real checks
* **Warm-up path ≠ prod path**: using a special “test prompt” that skips the guard/template → sentinel passes but prod fails; run the **same template**
* **Index drift post-deploy**: replacing `index_out` files on disk without flipping readiness to red → the service keeps serving stale ids; add a file watcher to invalidate readiness and re-warm
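
For the index-drift fix, a polling watcher is probably the simplest stdlib-only option. The sketch below assumes the index files live at the paths used earlier; `fingerprint`, `IndexWatcher`, and the `on_change` callback are hypothetical names, and the callback is where readiness would be set to red and a re-warm scheduled.

```python
import os

def fingerprint(paths):
    """Return (path, mtime_ns, size) per file; a missing file maps to (path, None, None)."""
    out = []
    for p in paths:
        try:
            st = os.stat(p)
            out.append((p, st.st_mtime_ns, st.st_size))
        except FileNotFoundError:
            out.append((p, None, None))
    return tuple(out)

class IndexWatcher:
    """Detect on-disk changes to index files by comparing fingerprints."""

    def __init__(self, paths, on_change):
        self._paths = list(paths)
        self._on_change = on_change
        self._last = fingerprint(self._paths)

    def poll(self):
        """Return True (and call on_change once) if any file changed since the last poll."""
        current = fingerprint(self._paths)
        if current == self._last:
            return False
        self._last = current
        self._on_change()
        return True

# Usage idea: call watcher.poll() every few seconds from a background thread, e.g.
#   watcher = IndexWatcher(["index_out/manifest.json", "index_out/ids.json"], on_change=go_red)
# where go_red is a hypothetical function that clears READY and restarts warm-up.
```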
---
## 8) What to log
Append a line to `runs/boot.jsonl` at each phase:
```json
{"ts":1699999999,"phase":"manifest","ok":true}
{"ts":1699999999,"phase":"ready","ok":true}
```
Alert on repeated retries or prolonged `ready=false` after deploy.
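
A boot log in that shape can be written with a few lines of stdlib code. This is a minimal sketch: `log_phase` is a hypothetical helper, the optional `error` field is an assumption not present in the sample records above, and `now` is injectable only to make the timestamp testable.

```python
import json
import os
import time

def log_phase(phase, ok, path="runs/boot.jsonl", error=None, now=time.time):
    """Append one JSON line per phase result and return the record written."""
    record = {"ts": int(now()), "phase": phase, "ok": bool(ok)}
    if error is not None:
        record["error"] = str(error)   # assumed extra field for failed phases
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, separators=(",", ":")) + "\n")
    return record

# Usage idea: call log_phase("manifest", True) after each phase succeeds and
# log_phase(phase_name, False, error=exc) in the retry branch of warm-up.
```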
---
## 9) Next steps
* Combine with **Example 03** and **05** so readiness includes retrieval quality (knee cut present) and schema validation
* Add **Ops** dashboards (see `ops/live_monitoring_rag.md`): track `readyz` duration, first-token latency, refusal and citation rates
* For multi-instance rollouts, use a rolling update policy that drains traffic from old pods only after new pods report `ready=true` for N seconds (on a Kubernetes Deployment, `minReadySeconds` expresses this window)
---
### 🧭 Explore More
| Module | Description | Link |
|-----------------------|----------------------------------------------------------|----------|
| WFGY Core | Standalone semantic reasoning engine for any LLM | [View →](https://github.com/onestardao/WFGY/tree/main/core/README.md) |
| Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | [View →](https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md) |
| Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/rag-architecture-and-recovery.md) |
| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/SemanticClinicIndex.md) |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | [View →](https://github.com/onestardao/WFGY/tree/main/SemanticBlueprint/README.md) |
| Benchmark vs GPT-5 | Stress test GPT-5 with full WFGY reasoning suite | [View →](https://github.com/onestardao/WFGY/tree/main/benchmarks/benchmark-vs-gpt5/README.md) |