Example 07 — Bootstrap Ordering & Readiness Gate (No.14)
Goal
Eliminate “starts but not ready” failures. We introduce a deterministic bootstrap sequence and a readiness gate that only flips to READY after models, indices, and policies are warmed and verified. Queries routed before that point are rejected fast with 503 Service Unavailable.
Problem Map link
Targets No.14 — Bootstrap Ordering. Secondary reductions in No.11 (Symbolic Collapse) by ensuring commits and caches are only exposed post-warmup.
Outcome
- Predictable start-up: no first-minute 500s or null answers
- A single, observable `READY` flag guarded by verifiable checks
- Reproducible warm-up that runs the same path as production queries (no special backdoor)
1) Typical failure shapes we will kill
- Cold model: first call triggers model download/compile; timeouts or partial prompts
- Index not loaded: FAISS/ANN handle exists but vectors are empty; retrieval returns nothing
- Manifest mismatch: runtime embeds with settings incompatible with index (drift)
- Guard missing: early requests bypass guard/template while warm-up initializes it
- Racy caches: concurrent warm-ups rebuild the same index/embeddings; last writer wins
- Probe inversion: liveness probe green, readiness probe also green, but the service is still warming
2) Bootstrap plan (phased)
We require each phase to prove completion before moving on.
- Config load → read runtime config; fail fast if missing
- Manifest validation → compare `index_out/manifest.json` with runtime (Example 05 validator)
- Model warm-up → load the embedding/generation models; run a 1-token dry call
- Index warm-up → load vector index into memory; fetch `ids.json`; touch a sentinel vector
- Sentinel query → run a realistic retrieval+guarded template end-to-end; verify refusal/answer semantics
- Flip `READY` → atomically set readiness; export the `/readyz` endpoint

If any phase fails: keep `/livez` green, `/readyz` red, and retry the phase with backoff (a minimal sketch of this loop follows). Do not accept traffic.
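Before the full servers below, here is a minimal sketch of that retry loop, assuming hypothetical `phase_*` callables and a shared state dict; the single-file servers in Path A/B use a simpler fixed 3-second retry.

```python
# Sketch only: generic phase runner with exponential backoff (names are illustrative)
import time

def run_phases(phases, state, base_delay=1.0, max_delay=30.0):
    """phases: ordered callables that raise on failure; state: dict holding the READY flag."""
    delay = base_delay
    while not state.get("ready"):
        try:
            for phase in phases:               # every phase must prove completion, in order
                phase()
            state["ready"] = True              # flip READY only after all phases passed
        except Exception as exc:
            print(f"warm-up failed: {exc}; retry in {delay:.0f}s")
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # back off, then restart from the first phase
```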
3) Path A — Python (stdlib HTTP, single file)
Create server.py.
```python
# server.py -- readiness-gated RAG microservice (stdlib only)
import json, os, threading, time, socketserver, http.server, urllib.parse, urllib.request

READY = False
READY_LOCK = threading.Lock()
STATE = {"phase": "init", "errors": []}

# --- utilities ---------------------------------------------------------------

def log(msg):
    print(time.strftime("%H:%M:%S"), msg, flush=True)

def http_post_json(url, body, headers):
    req = urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                                 headers=headers)
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read().decode("utf-8"))

def call_openai(prompt, model=None):
    model = model or os.getenv("OPENAI_MODEL", "gpt-4o-mini")
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY")
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0
    }
    j = http_post_json("https://api.openai.com/v1/chat/completions", body, {
        "Content-Type": "application/json", "Authorization": f"Bearer {api_key}"
    })
    return j["choices"][0]["message"]["content"].strip()

def build_prompt(q, chunks):
    ctx = "\n\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return (
        "Use only the evidence. If not provable, reply exactly: not in context.\n"
        "Answer format:\n- claim\n- citations: [id,...]\n\n"
        f"Question: {q}\n\nEvidence:\n{ctx}\n"
    )

# --- phases ------------------------------------------------------------------

def phase_config():
    STATE["phase"] = "config"
    # minimal runtime config in env or file
    required = ["OPENAI_API_KEY"]
    for k in required:
        if not os.getenv(k):
            raise RuntimeError(f"missing {k}")
    log("config OK")

def phase_manifest():
    STATE["phase"] = "manifest"
    # compare manifest vs runtime; reuse your Example 05 fields
    with open("index_out/manifest.json", encoding="utf-8") as f:
        manifest = json.load(f)
    runtime = {
        "index_type": "faiss.IndexFlatIP",
        "metric": "inner_product",
        "embedding": {"model": manifest["embedding"]["model"],
                      "dimension": manifest["embedding"]["dimension"],
                      "normalized": True},
        "chunker_version": manifest["chunker"]["version"],
        "text_preproc": manifest["text_preproc"]
    }
    # simple equality check
    mismatches = []
    if manifest["index_type"] != runtime["index_type"]:
        mismatches.append("index_type")
    if manifest["metric"] != runtime["metric"]:
        mismatches.append("metric")
    if manifest["embedding"]["dimension"] != runtime["embedding"]["dimension"]:
        mismatches.append("embedding.dimension")
    if manifest["embedding"]["model"] != runtime["embedding"]["model"]:
        mismatches.append("embedding.model")
    if manifest["chunker"]["version"] != runtime["chunker_version"]:
        mismatches.append("chunker.version")
    if mismatches:
        raise RuntimeError("manifest mismatch: " + ", ".join(mismatches))
    log("manifest OK")

def phase_model():
    STATE["phase"] = "model"
    # 1-token dry pass to ensure the model is reachable
    _ = call_openai("Reply with: ok")
    log("model warm OK")

def phase_index():
    STATE["phase"] = "index"
    # lightweight: just ensure ids.json and chunks.json are readable
    with open("index_out/ids.json", encoding="utf-8") as f:
        ids = json.load(f)
    with open("data/chunks.json", encoding="utf-8") as f:
        chunks = json.load(f)
    if not ids or not chunks:
        raise RuntimeError("index or chunks missing")
    STATE["ids"] = ids
    STATE["chunks"] = {c["id"]: c for c in chunks}
    log("index warm OK")

def phase_sentinel():
    STATE["phase"] = "sentinel"
    # end-to-end smoke: take first two chunks; ask a question that should refuse or answer
    ids = STATE["ids"][:2]
    chunks = [STATE["chunks"][i] for i in ids if i in STATE["chunks"]]
    q = "What is X?"
    prompt = build_prompt(q, chunks)
    ans = call_openai(prompt)
    if not (("citations" in ans.lower()) or ("not in context" in ans.lower())):
        raise RuntimeError("sentinel failed template")
    log("sentinel OK")

def warmup():
    global READY
    phases = (phase_config, phase_manifest, phase_model, phase_index, phase_sentinel)
    while not READY:
        try:
            for step in phases:
                step()
            with READY_LOCK:
                READY = True
            STATE["phase"] = "ready"
            log("READY=TRUE")
        except Exception as e:
            STATE["errors"].append({"phase": STATE["phase"], "error": str(e)})
            log(f"[retry in 3s] {STATE['phase']} failed: {e}")
            time.sleep(3)  # then restart all phases from the beginning

# --- HTTP server -------------------------------------------------------------

class Handler(http.server.BaseHTTPRequestHandler):
    def _write(self, code, payload):
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps(payload).encode("utf-8"))

    def do_GET(self):
        if self.path == "/livez":
            return self._write(200, {"live": True, "phase": STATE["phase"]})
        if self.path == "/readyz":
            return self._write(200 if READY else 503,
                               {"ready": READY, "phase": STATE["phase"],
                                "errors": STATE["errors"][-5:]})
        if self.path.startswith("/answer"):
            if not READY:
                return self._write(503, {"error": "not ready"})
            # minimal demo answer using first two chunks
            q = urllib.parse.unquote_plus(self.path.split("q=", 1)[1]) if "q=" in self.path else "What is X?"
            ids = STATE["ids"][:2]
            chunks = [STATE["chunks"][i] for i in ids if i in STATE["chunks"]]
            ans = call_openai(build_prompt(q, chunks))
            return self._write(200, {"q": q, "answer": ans, "chunks": ids})
        return self._write(404, {"error": "not found"})

def main():
    threading.Thread(target=warmup, daemon=True).start()
    with socketserver.TCPServer(("", 8080), Handler) as httpd:
        log("listening on :8080")
        httpd.serve_forever()

if __name__ == "__main__":
    main()
```
Run:
```bash
# preconditions: index_out/{manifest.json,ids.json} and data/chunks.json exist
OPENAI_API_KEY=sk-xxx python server.py

# in another shell
curl -s localhost:8080/livez
curl -s localhost:8080/readyz      # expect 503 until warm-up ends, then 200
curl -s "localhost:8080/answer?q=What%20is%20X?"
```
Pass criteria
- `/livez` is always 200 after the process starts
- `/readyz` returns 503 until the sentinel passes; then flips to 200 and stays green
- `/answer` rejects with 503 before ready; serves after ready
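To verify the flip without eyeballing curl output, you can poll `/readyz` until it returns 200. This is an illustrative helper, not part of the example files; the URL and timeouts are assumptions.

```python
# wait_ready.py -- illustrative helper: block until /readyz returns 200, or give up
import sys, time, urllib.error, urllib.request

def wait_ready(url="http://localhost:8080/readyz", timeout_s=120, interval_s=2):
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url) as r:   # 200 only after warm-up finished
                if r.status == 200:
                    return True
        except urllib.error.HTTPError as e:          # 503 while still warming
            print("not ready:", e.read().decode("utf-8", "replace"))
        except OSError:
            pass                                     # server not listening yet
        time.sleep(interval_s)
    return False

if __name__ == "__main__":
    sys.exit(0 if wait_ready() else 1)
```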
4) Path B — Node (http stdlib, single file)
Create server.mjs.
```js
// server.mjs -- readiness-gated RAG microservice (Node stdlib)
import http from "node:http";
import fs from "node:fs";
import https from "node:https";

let READY = false;
const STATE = { phase: "init", errors: [], ids: [], chunks: {} };

function log(...a){ console.log(new Date().toISOString(), ...a); }

function callOpenAI(prompt, model = process.env.OPENAI_MODEL || "gpt-4o-mini"){
  const key = process.env.OPENAI_API_KEY;
  if(!key) throw new Error("Set OPENAI_API_KEY");
  const body = JSON.stringify({ model, messages: [{ role: "user", content: prompt }], temperature: 0 });
  return new Promise((resolve, reject) => {
    const req = https.request("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${key}`,
        "Content-Length": Buffer.byteLength(body)
      }
    }, res => {
      let d = "";
      res.on("data", x => d += x);
      res.on("end", () => {
        try { resolve(JSON.parse(d).choices[0].message.content.trim()); }
        catch (e) { reject(e); }
      });
    });
    req.on("error", reject);
    req.write(body);
    req.end();
  });
}

function buildPrompt(q, chunks){
  const ctx = chunks.map(c => `[${c.id}] ${c.text}`).join("\n\n");
  return `Use only the evidence. If not provable, reply exactly: not in context.
Answer format:
- claim
- citations: [id,...]

Question: ${q}

Evidence:
${ctx}
`;
}

async function warmup(){
  try {
    STATE.phase = "config";
    if(!process.env.OPENAI_API_KEY) throw new Error("missing OPENAI_API_KEY");

    STATE.phase = "manifest";
    const m = JSON.parse(fs.readFileSync("index_out/manifest.json", "utf8"));
    if(m.index_type !== "faiss.IndexFlatIP" || m.metric !== "inner_product") throw new Error("manifest mismatch");

    STATE.phase = "model";
    await callOpenAI("Reply with: ok");

    STATE.phase = "index";
    STATE.ids = JSON.parse(fs.readFileSync("index_out/ids.json", "utf8"));
    const chunksArr = JSON.parse(fs.readFileSync("data/chunks.json", "utf8"));
    STATE.chunks = Object.fromEntries(chunksArr.map(c => [c.id, c]));

    STATE.phase = "sentinel";
    const ids = STATE.ids.slice(0, 2);
    const chunks = ids.map(i => STATE.chunks[i]).filter(Boolean);
    const ans = await callOpenAI(buildPrompt("What is X?", chunks));
    if(!(ans.toLowerCase().includes("citations") || ans.toLowerCase().includes("not in context")))
      throw new Error("sentinel failed template");

    READY = true; STATE.phase = "ready"; log("READY=TRUE");
  } catch (e){
    STATE.errors.push({ phase: STATE.phase, error: String(e) });
    log("[retry in 3s]", STATE.phase, e);
    setTimeout(warmup, 3000);
  }
}

const server = http.createServer(async (req, res) => {
  const json = (code, obj) => { res.writeHead(code, { "Content-Type": "application/json" }); res.end(JSON.stringify(obj)); };
  if(req.url === "/livez") return json(200, { live: true, phase: STATE.phase });
  if(req.url === "/readyz") return json(READY ? 200 : 503, { ready: READY, phase: STATE.phase, errors: STATE.errors.slice(-5) });
  if(req.url.startsWith("/answer")){
    if(!READY) return json(503, { error: "not ready" });
    try {
      const q = decodeURIComponent((req.url.split("q=")[1] || "What is X?").replace(/\+/g, " "));
      const ids = STATE.ids.slice(0, 2);
      const chunks = ids.map(i => STATE.chunks[i]).filter(Boolean);
      const ans = await callOpenAI(buildPrompt(q, chunks));
      return json(200, { q, answer: ans, chunks: ids });
    } catch (e){
      return json(500, { error: String(e) });
    }
  }
  return json(404, { error: "not found" });
});

server.listen(8080, () => { log("listening on :8080"); warmup(); });
```
Run:
```bash
OPENAI_API_KEY=sk-xxx node server.mjs
curl -s localhost:8080/readyz
```
Pass criteria mirror the Python version.
5) K8s / container probes (production hints)
Key principle: liveness means “process is alive”, readiness means “pipeline is provably usable.”
```yaml
# deployment snippet (k8s)
livenessProbe:
  httpGet: { path: /livez, port: 8080 }
  initialDelaySeconds: 5
  periodSeconds: 5
readinessProbe:
  httpGet: { path: /readyz, port: 8080 }
  initialDelaySeconds: 5
  periodSeconds: 3
```
Order of dependencies at boot
- Config & secrets
- Storage mounts (index files)
- Network (LLM endpoint)
- Service warm-up → `/readyz` flips
- Gate ingress/traffic
6) Anti-race guarantees
- Single warm-up owner: a background thread/task holds a lock; other threads read `READY` only (sketched below)
- Idempotent warm-up: re-running warm-up does not corrupt state; it re-verifies and flips `READY` if all checks pass
- Atomic flag: only set `READY=True` after all phases succeed; do not stream partial readiness
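A minimal sketch of the single-owner and idempotency rules, assuming the same `READY` flag idea as `server.py`; `WARMUP_LOCK` and `run_all_phases` are illustrative names, not part of the example files.

```python
# Sketch only: one warm-up owner, idempotent re-entry, atomic flip
import threading

READY = False
READY_LOCK = threading.Lock()
WARMUP_LOCK = threading.Lock()     # hypothetical extra lock: only one warm-up owner at a time

def warmup_once(run_all_phases):
    """run_all_phases: the phase sequence from section 2; raises on any failed check."""
    global READY
    if not WARMUP_LOCK.acquire(blocking=False):
        return                     # another thread already owns warm-up
    try:
        if READY:
            return                 # idempotent: already warm, nothing to rebuild
        run_all_phases()
        with READY_LOCK:
            READY = True           # atomic flip, only after every phase passed
    finally:
        WARMUP_LOCK.release()
```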
7) Common mistakes & quick fixes
- Probe copy-paste: exposing a `/readyz` that always returns 200 → defeats the point; wire it to the real checks
- Warm-up path ≠ prod path: using a special "test prompt" that skips the guard/template → sentinel passes but prod fails; run the same template
- Index drift post-deploy: replacing `index_out` files on disk without flipping readiness to red → serves stale ids; add a file watcher to invalidate readiness and re-warm (see the sketch below)
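A minimal sketch of that file watcher, stdlib-only and polling mtime rather than using inotify; `invalidate` and `rewarm` are whatever flips `READY` to false and re-runs the warm-up phases in your server.

```python
# Sketch only: invalidate readiness when index files change on disk (mtime polling)
import os, time

def watch_index(path="index_out/manifest.json", interval_s=5,
                invalidate=lambda: None, rewarm=lambda: None):
    """invalidate() should set READY=False; rewarm() should re-run the warm-up phases."""
    last = os.path.getmtime(path)
    while True:
        time.sleep(interval_s)
        try:
            mtime = os.path.getmtime(path)
        except FileNotFoundError:
            mtime = None                 # file is being swapped out right now
        if mtime != last:
            last = mtime
            invalidate()                 # flip readiness red before touching caches
            rewarm()                     # re-verify manifest, index, sentinel

# run it in a daemon thread next to warmup(), e.g. threading.Thread(target=watch_index, daemon=True).start()
```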
8) What to log
Append a line to `runs/boot.jsonl` at each phase:
```json
{"ts":1699999999,"phase":"manifest","ok":true}
{"ts":1699999999,"phase":"ready","ok":true}
```
Alert on repeated retries or prolonged ready=false after deploy.
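A tiny helper for that log, as a sketch; the field names follow the lines above and anything else is an assumption.

```python
# Sketch only: append one JSONL record per bootstrap phase to runs/boot.jsonl
import json, os, time

def log_boot(phase, ok, path="runs/boot.jsonl", **extra):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    record = {"ts": int(time.time()), "phase": phase, "ok": bool(ok), **extra}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# e.g. log_boot("manifest", True) or log_boot("sentinel", False, error="template check failed")
```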
9) Next steps
- Combine with Example 03 and 05 so readiness includes retrieval quality (knee cut present) and schema validation
- Add ops dashboards (see `ops/live_monitoring_rag.md`): track `readyz` duration, first-token latency, refusal and citation rates
- For multi-instance rollouts, use a rolling update policy that drains traffic from old pods only after new pods report `ready=true` for N seconds
🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly |
🧭 Explore More
| Module | Description | Link |
|---|---|---|
| WFGY Core | WFGY 2.0 engine is live: full symbolic reasoning architecture and math stack | View → |
| Problem Map 1.0 | Initial 16-mode diagnostic and symbolic fix framework | View → |
| Problem Map 2.0 | RAG-focused failure tree, modular fixes, and pipelines | View → |
| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | View → |
| Semantic Blueprint | Layer-based symbolic reasoning & semantic modulations | View → |
| Benchmark vs GPT-5 | Stress test GPT-5 with full WFGY reasoning suite | View → |
| 🧙♂️ Starter Village 🏡 | New here? Lost in symbols? Click here and let the wizard guide you through | Start → |
👑 Early Stargazers: See the Hall of Fame —
Engineers, hackers, and open source builders who supported WFGY from day one.
⭐ WFGY Engine 2.0 is already unlocked. ⭐ Star the repo to help others discover it and unlock more on the Unlock Board.