# Example 07 — Bootstrap Ordering & Readiness Gate (No.14)

**Goal**  
Eliminate “starts but not ready” failures. We introduce a **deterministic bootstrap sequence** and a **readiness gate** that only flips to `READY` after models, indices, and policies are warmed and verified. Queries routed before that point are rejected fast with `503 Service Unavailable`.

**Problem Map link**  
Targets **No.14 — Bootstrap Ordering**. Secondary reductions in **No.11 (Symbolic Collapse)** by ensuring commits and caches are only exposed post-warmup.

**Outcome**  
- Predictable start-up: no first-minute 500s or null answers  
- A single, observable `READY` flag guarded by verifiable checks  
- Reproducible warm-up that runs the **same path** as production queries (no special backdoor)

---

## 1) Typical failure shapes we will kill

- **Cold model**: first call triggers model download/compile; timeouts or partial prompts  
- **Index not loaded**: FAISS/ANN handle exists but vectors are empty; retrieval returns nothing  
- **Manifest mismatch**: runtime embeds with settings incompatible with index (drift)  
- **Guard missing**: early requests bypass guard/template while warm-up initializes it  
- **Racey caches**: concurrent warm-ups rebuild the same index/embeddings; last writer wins  
- **Probe inversion**: liveness probe green, readiness probe also green, but the service is still warming

---

## 2) Bootstrap plan (phased)

We require each phase to **prove** completion before moving on.

0. **Config load** → read runtime config; fail fast if missing  
1. **Manifest validation** → compare `index_out/manifest.json` with runtime (Example 05 validator)  
2. **Model warm-up** → load the embedding/generation models; run a 1-token dry call  
3. **Index warm-up** → load vector index into memory; fetch `ids.json`; touch a sentinel vector  
4. **Sentinel query** → run a realistic retrieval+guarded template end-to-end; verify refusal/answer semantics  
5. **Flip `READY`** → atomically set readiness; export `/readyz` endpoint

If any phase fails: keep `/livez` green, **/readyz red**, and retry phase with backoff. Do **not** accept traffic.

---

## 3) Path A — Python (stdlib HTTP, single file)

Create `server.py`.

```python
# server.py -- readiness-gated RAG microservice (stdlib only)
import json, os, threading, time, socketserver, http.server, urllib.request

READY = False
READY_LOCK = threading.Lock()
STATE = {"phase": "init", "errors": []}

# --- utilities ---------------------------------------------------------------
def log(msg): print(time.strftime("%H:%M:%S"), msg, flush=True)

def http_post_json(url, body, headers):
    req = urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                                 headers=headers)
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read().decode("utf-8"))

def call_openai(prompt, model=os.getenv("OPENAI_MODEL","gpt-4o-mini")):
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key: raise RuntimeError("Set OPENAI_API_KEY")
    body = {
        "model": model,
        "messages": [{"role":"user","content":prompt}],
        "temperature": 0
    }
    j = http_post_json("https://api.openai.com/v1/chat/completions", body, {
        "Content-Type":"application/json", "Authorization": f"Bearer {api_key}"
    })
    return j["choices"][0]["message"]["content"].strip()

def build_prompt(q, chunks):
    ctx = "\n\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return (
        "Use only the evidence. If not provable, reply exactly: not in context.\n"
        "Answer format:\n- claim\n- citations: [id,...]\n\n"
        f"Question: {q}\n\nEvidence:\n{ctx}\n"
    )

# --- phases ------------------------------------------------------------------
def phase_config():
    STATE["phase"] = "config"
    # minimal runtime config in env or file
    required = ["OPENAI_API_KEY"]
    for k in required:
        if not os.getenv(k):
            raise RuntimeError(f"missing {k}")
    log("config OK")

def phase_manifest():
    STATE["phase"] = "manifest"
    # compare manifest vs runtime; reuse your Example 05 fields
    manifest = json.load(open("index_out/manifest.json", encoding="utf8"))
    runtime = {
        "index_type": "faiss.IndexFlatIP",
        "metric": "inner_product",
        "embedding": { "model": manifest["embedding"]["model"],
                       "dimension": manifest["embedding"]["dimension"],
                       "normalized": True },
        "chunker_version": manifest["chunker"]["version"],
        "text_preproc": manifest["text_preproc"]
    }
    # simple equality check
    mismatches = []
    if manifest["index_type"] != runtime["index_type"]:
        mismatches.append("index_type")
    if manifest["metric"] != runtime["metric"]:
        mismatches.append("metric")
    if manifest["embedding"]["dimension"] != runtime["embedding"]["dimension"]:
        mismatches.append("embedding.dimension")
    if manifest["embedding"]["model"] != runtime["embedding"]["model"]:
        mismatches.append("embedding.model")
    if manifest["chunker"]["version"] != runtime["chunker_version"]:
        mismatches.append("chunker.version")
    if mismatches:
        raise RuntimeError("manifest mismatch: " + ", ".join(mismatches))
    log("manifest OK")

def phase_model():
    STATE["phase"] = "model"
    # 1-token dry pass to ensure the model is reachable
    _ = call_openai("Reply with: ok", model=os.getenv("OPENAI_MODEL","gpt-4o-mini"))
    log("model warm OK")

def phase_index():
    STATE["phase"] = "index"
    # lightweight: just ensure ids.json and chunks.json are readable
    ids = json.load(open("index_out/ids.json"))
    chunks = json.load(open("data/chunks.json", encoding="utf8"))
    if not ids or not chunks:
        raise RuntimeError("index or chunks missing")
    STATE["ids"] = ids
    STATE["chunks"] = {c["id"]: c for c in chunks}
    log("index warm OK")

def phase_sentinel():
    STATE["phase"] = "sentinel"
    # end-to-end smoke: take first two chunks; ask a question that should refuse or answer
    ids = STATE["ids"][:2]
    chunks = [STATE["chunks"][i] for i in ids if i in STATE["chunks"]]
    q = "What is X?"
    prompt = build_prompt(q, chunks)
    ans = call_openai(prompt)
    if not (("citations" in ans.lower()) or ("not in context" in ans.lower())):
        raise RuntimeError("sentinel failed template")
    log("sentinel OK")

def warmup():
    global READY
    try:
        for step in (phase_config, phase_manifest, phase_model, phase_index, phase_sentinel):
            try:
                step()
            except Exception as e:
                STATE["errors"].append({"phase": STATE["phase"], "error": str(e)})
                log(f"[retry in 3s] {STATE['phase']} failed: {e}")
                time.sleep(3)
                return warmup()  # restart phases from the beginning
        with READY_LOCK:
            READY = True
            STATE["phase"] = "ready"
        log("READY=TRUE")
    except Exception as e:
        STATE["errors"].append({"phase": "fatal", "error": str(e)})
        log("fatal warmup error: " + str(e))

# --- HTTP server -------------------------------------------------------------
class Handler(http.server.BaseHTTPRequestHandler):
    def _write(self, code, payload):
        self.send_response(code); self.send_header("Content-Type","application/json"); self.end_headers()
        self.wfile.write(json.dumps(payload).encode("utf-8"))

    def do_GET(self):
        if self.path == "/livez":
            return self._write(200, {"live": True, "phase": STATE["phase"]})
        if self.path == "/readyz":
            return self._write(200 if READY else 503, {"ready": READY, "phase": STATE["phase"], "errors": STATE["errors"][-5:]})
        if self.path.startswith("/answer"):
            if not READY: return self._write(503, {"error":"not ready"})
            # minimal demo answer using first two chunks
            q = self.path.split("q=",1)[1] if "q=" in self.path else "What is X?"
            ids = STATE["ids"][:2]
            chunks = [STATE["chunks"][i] for i in ids if i in STATE["chunks"]]
            ans = call_openai(build_prompt(q, chunks))
            return self._write(200, {"q": q, "answer": ans, "chunks": ids})
        return self._write(404, {"error":"not found"})

def main():
    threading.Thread(target=warmup, daemon=True).start()
    with socketserver.TCPServer(("", 8080), Handler) as httpd:
        log("listening on :8080")
        httpd.serve_forever()

if __name__ == "__main__":
    main()
````

Run:

```bash
# preconditions: index_out/{manifest.json,ids.json} and data/chunks.json exist
OPENAI_API_KEY=sk-xxx python server.py
# in another shell
curl -s localhost:8080/livez
curl -s localhost:8080/readyz   # expect 503 until warm-up ends, then 200
curl -s "localhost:8080/answer?q=What%20is%20X?"
```

**Pass criteria**

* `/livez` is **always** 200 after process starts
* `/readyz` returns 503 **until** sentinel passes; then flips to 200 and stays green
* `/answer` rejects with 503 before ready; serves after ready

---

## 4) Path B — Node (http stdlib, single file)

Create `server.mjs`.

```js
// server.mjs -- readiness-gated RAG microservice (Node stdlib)
import http from "node:http";
import fs from "node:fs";
import https from "node:https";

let READY = false;
const STATE = { phase: "init", errors: [], ids: [], chunks: {} };

function log(...a){ console.log(new Date().toISOString(), ...a); }

function callOpenAI(prompt, model=process.env.OPENAI_MODEL || "gpt-4o-mini"){
  const key = process.env.OPENAI_API_KEY; if(!key) throw new Error("Set OPENAI_API_KEY");
  const body = JSON.stringify({ model, messages:[{role:"user", content: prompt}], temperature: 0 });
  return new Promise((resolve,reject)=>{
    const req = https.request("https://api.openai.com/v1/chat/completions", {
      method:"POST",
      headers:{ "Content-Type":"application/json", "Authorization":`Bearer ${key}`, "Content-Length": Buffer.byteLength(body) }
    }, res => { let d=""; res.on("data",x=>d+=x); res.on("end",()=>resolve(JSON.parse(d).choices[0].message.content.trim())); });
    req.on("error", reject); req.write(body); req.end();
  });
}

function buildPrompt(q, chunks){
  const ctx = chunks.map(c => `[${c.id}] ${c.text}`).join("\n\n");
  return `Use only the evidence. If not provable, reply exactly: not in context.
Answer format:
- claim
- citations: [id,...]

Question: ${q}

Evidence:
${ctx}
`;
}

async function warmup(){
  try {
    STATE.phase = "config";
    if(!process.env.OPENAI_API_KEY) throw new Error("missing OPENAI_API_KEY");

    STATE.phase = "manifest";
    const m = JSON.parse(fs.readFileSync("index_out/manifest.json","utf8"));
    if(m.index_type !== "faiss.IndexFlatIP" || m.metric !== "inner_product") throw new Error("manifest mismatch");

    STATE.phase = "model";
    await callOpenAI("Reply with: ok");

    STATE.phase = "index";
    STATE.ids = JSON.parse(fs.readFileSync("index_out/ids.json","utf8"));
    const chunksArr = JSON.parse(fs.readFileSync("data/chunks.json","utf8"));
    STATE.chunks = Object.fromEntries(chunksArr.map(c => [c.id, c]));

    STATE.phase = "sentinel";
    const ids = STATE.ids.slice(0,2);
    const chunks = ids.map(i => STATE.chunks[i]).filter(Boolean);
    const ans = await callOpenAI(buildPrompt("What is X?", chunks));
    if(!(ans.toLowerCase().includes("citations") || ans.toLowerCase().includes("not in context")))
      throw new Error("sentinel failed template");

    READY = true; STATE.phase = "ready"; log("READY=TRUE");
  } catch (e){
    STATE.errors.push({ phase: STATE.phase, error: String(e) });
    log("[retry in 3s]", STATE.phase, e);
    setTimeout(warmup, 3000);
  }
}

const server = http.createServer(async (req,res)=>{
  const json = (code,obj)=>{ res.writeHead(code,{"Content-Type":"application/json"}); res.end(JSON.stringify(obj)); };
  if(req.url === "/livez") return json(200, {live:true, phase:STATE.phase});
  if(req.url === "/readyz") return json(READY?200:503, {ready:READY, phase:STATE.phase, errors:STATE.errors.slice(-5)});
  if(req.url.startsWith("/answer")){
    if(!READY) return json(503, {error:"not ready"});
    const q = (req.url.split("q=")[1]||"What is X?").replace(/\+/g," ");
    const ids = STATE.ids.slice(0,2);
    const chunks = ids.map(i => STATE.chunks[i]).filter(Boolean);
    const ans = await callOpenAI(buildPrompt(q, chunks));
    return json(200, {q, answer: ans, chunks: ids});
  }
  return json(404, {error:"not found"});
});

server.listen(8080, ()=>{ log("listening on :8080"); warmup(); });
```

Run:

```bash
OPENAI_API_KEY=sk-xxx node server.mjs
curl -s localhost:8080/readyz
```

**Pass criteria** mirror the Python version.

---

## 5) K8s / container probes (production hints)

**Key principle:** **liveness** means “process is alive”, **readiness** means “pipeline is provably usable.”

```yaml
# deployment snippet (k8s)
livenessProbe:
  httpGet: { path: /livez, port: 8080 }
  initialDelaySeconds: 5
  periodSeconds: 5
readinessProbe:
  httpGet: { path: /readyz, port: 8080 }
  initialDelaySeconds: 5
  periodSeconds: 3
```

**Order of dependencies at boot**

1. Config & secrets
2. Storage mounts (index files)
3. Network (LLM endpoint)
4. Service warm-up → `/readyz` flips
5. Gate ingress/traffic

---

## 6) Anti-race guarantees

* **Single warm-up owner**: a background thread/task holds a lock; other threads read `READY` only
* **Idempotent warm-up**: re-running warm-up does not corrupt state; it re-verifies and flips `READY` if all checks pass
* **Atomic flag**: only set `READY=True` **after** all phases succeed; do not stream partial readiness

---

## 7) Common mistakes & quick fixes

* **Probe copy-paste**: exposing `/readyz` that always returns 200 → defeats the point; wire it to the real checks
* **Warm-up path ≠ prod path**: using a special “test prompt” that skips the guard/template → sentinel passes but prod fails; run the **same template**
* **Index drift post-deploy**: replacing `index_out` files on disk without flipping readiness to red → serve stale ids; add a file watcher to invalidate readiness and re-warm

---

## 8) What to log

Append a line to `runs/boot.jsonl` at each phase:

```json
{"ts":1699999999,"phase":"manifest","ok":true}
{"ts":1699999999,"phase":"ready","ok":true}
```

Alert on repeated retries or prolonged `ready=false` after deploy.

---

## 9) Next steps

* Combine with **Example 03** and **05** so readiness includes retrieval quality (knee cut present) and schema validation
* Add **Ops** dashboards (see `ops/live_monitoring_rag.md`): track `readyz` duration, first-token latency, refusal and citation rates
* For multi-instance rollouts, use a rolling update policy that drains traffic from old pods only after new pods report `ready=true` for N seconds

---


### 🧭 Explore More

| Module                | Description                                              | Link     |
|-----------------------|----------------------------------------------------------|----------|
| WFGY Core             | Standalone semantic reasoning engine for any LLM         | [View →](https://github.com/onestardao/WFGY/tree/main/core/README.md) |
| Problem Map 1.0       | Initial 16-mode diagnostic and symbolic fix framework    | [View →](https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md) |
| Problem Map 2.0       | RAG-focused failure tree, modular fixes, and pipelines   | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/rag-architecture-and-recovery.md) |
| Semantic Clinic Index | Expanded failure catalog: prompt injection, memory bugs, logic drift | [View →](https://github.com/onestardao/WFGY/blob/main/ProblemMap/SemanticClinicIndex.md) |
| Semantic Blueprint    | Layer-based symbolic reasoning & semantic modulations   | [View →](https://github.com/onestardao/WFGY/tree/main/SemanticBlueprint/README.md) |
| Benchmark vs GPT-5    | Stress test GPT-5 with full WFGY reasoning suite         | [View →](https://github.com/onestardao/WFGY/tree/main/benchmarks/benchmark-vs-gpt5/README.md) |


---

> 👑 **Early Stargazers: [See the Hall of Fame](https://github.com/onestardao/WFGY/tree/main/stargazers)** —  
> Engineers, hackers, and open source builders who supported WFGY from day one.

> <img src="https://img.shields.io/github/stars/onestardao/WFGY?style=social" alt="GitHub stars"> ⭐ Help reach 10,000 stars by 2025-09-01 to unlock Engine 2.0 for everyone  ⭐ **[Star WFGY on GitHub](https://github.com/onestardao/WFGY)**


<div align="center">

[![WFGY Main](https://img.shields.io/badge/WFGY-Main-red?style=flat-square)](https://github.com/onestardao/WFGY)
&nbsp;
[![TXT OS](https://img.shields.io/badge/TXT%20OS-Reasoning%20OS-orange?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS)
&nbsp;
[![Blah](https://img.shields.io/badge/Blah-Semantic%20Embed-yellow?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlahBlahBlah)
&nbsp;
[![Blot](https://img.shields.io/badge/Blot-Persona%20Core-green?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlotBlotBlot)
&nbsp;
[![Bloc](https://img.shields.io/badge/Bloc-Reasoning%20Compiler-blue?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlocBlocBloc)
&nbsp;
[![Blur](https://img.shields.io/badge/Blur-Text2Image%20Engine-navy?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlurBlurBlur)
&nbsp;
[![Blow](https://img.shields.io/badge/Blow-Game%20Logic-purple?style=flat-square)](https://github.com/onestardao/WFGY/tree/main/OS/BlowBlowBlow)

</div>