
📒 Problem #16 · Pre-Deploy Collapse Problem Map

“Everything looked fine in CI… until nothing booted in prod.”

Pre-deploy collapse happens before a single user query is served.
Migrations pass, tests are green, images ship — yet the first container crashes, or the very first LLM call returns a 500. Common root causes:

  • Model checkpoint ≠ tokenizer version
  • Env var misspellings (e.g., OPEN_API_KEY vs OPENAI_API_KEY)
  • Hidden f-strings / Jinja placeholders left unresolved
  • GPU driver mismatch with the container base (CUDA 11 vs 12)

WFGY's preflight sanity layer runs a semantic diff between the declared runtime and the effective runtime, catching mismatches before traffic starts.
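The idea behind such a declared-vs-effective diff can be sketched in a few lines. This is an illustration, not WFGY's actual implementation; the key names (`tokenizer_sha`, `pytorch_sha`, `cuda`) are assumptions borrowed from the "How It Works" description below:

```python
def preflight_diff(declared: dict, effective: dict) -> list[str]:
    """Compare declared runtime metadata against what the container reports.

    Returns a list of human-readable mismatches; an empty list means
    the deploy is safe to proceed.
    """
    mismatches = []
    for key in ("tokenizer_sha", "pytorch_sha", "cuda"):
        want, have = declared.get(key), effective.get(key)
        if want != have:
            mismatches.append(f"{key}: declared {want!r} != effective {have!r}")
    return mismatches

# Example: a tokenizer checkpoint skew is caught before any traffic is served.
declared = {"tokenizer_sha": "ad4c1b9", "pytorch_sha": "2.1.0", "cuda": "12.1"}
effective = {"tokenizer_sha": "9e7f02d", "pytorch_sha": "2.1.0", "cuda": "12.1"}
problems = preflight_diff(declared, effective)
```

In a real pipeline the non-empty `problems` list would abort the deploy, exactly as in the demo output further down.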


🚨 Typical Pre-Deploy Collapses

| Pattern | Real-World Fallout |
| --- | --- |
| Tokenizer / checkpoint version skew | Embeds garbage; queries hit 0% recall |
| Missing secret in KV store | First API call returns 401 / segfault |
| CUDA / driver mismatch | GPU container exits with code 139 |
| Undigested template vars (`{{ }}`) | Prompt crashes, empty completions |

🛡️ WFGY Pre-Flight Guards

| Collapse Pattern | Guard Module | Remedy | Status |
| --- | --- | --- | --- |
| Version skew | SemVers Diff | Abort deploy if model.json ↔ runtime mismatch | Stable |
| Missing secret | Boot Checkpoint | Block start until secret present | Stable |
| Driver mismatch | CudaProbe | Warn & fall back to CPU safe mode | ⚠️ Beta |
| Stray `{{var}}` tokens | Prompt Lint | Fail CI; highlight undeclared vars | Stable |

📝 How It Works

  1. SemVers Diff
    Parses `modelcard.json`, compares `tokenizer_sha`, `pytorch_sha`, `cuda`, etc., with the container runtime; throws on mismatch unless `--force` is passed.

  2. Boot Checkpoint (shared)
    A Kubernetes init container polls the secret store; the pod fails after `secret_timeout`.

  3. CudaProbe
    Minimal `nvidia-smi` check; if the driver does not match the compiled CUDA version, WFGY rewrites the env var `CUDA_VISIBLE_DEVICES=""` and logs the downgrade.

  4. Prompt Lint
    CI step: scans prompts for `{{ }}` or `${}` tokens lacking a default in `prompt_vars.yaml`.


✍️ Demo — Tokenizer Version Skew

$ wgfy preflight
✔ env vars ............... OK
✖ checkpoint ↔ tokenizer .. MISMATCH
  • model: facebook/llama-2-7b-chat-hf  tokenizer_sha = `ad4c1b9`
  • runtime: tokenizer_sha = `9e7f02d`
  → Aborting deploy (use --force to override)

🗺️ Module Cheat-Sheet

| Module | Role |
| --- | --- |
| SemVers Diff | Catch model / tokenizer skew |
| Boot Checkpoint | Ensure secrets & config exist |
| CudaProbe | Verify GPU driver compatibility |
| Prompt Lint | Fail CI on stray template vars |

📊 Implementation Status

| Feature | State |
| --- | --- |
| SemVers diff | Stable |
| Boot checkpoint | Stable |
| CudaProbe fallback | ⚠️ Beta |
| Prompt lint in CI action | Stable |

📝 Tips & Limits

  • Add `ignore_versions: ["minor"]` in `wgfy.yaml` to allow 1-patch drifts.
  • Set `secret_timeout = 90s` for slower vaults.
  • GPU fallback adds ~0.4 s of latency per request; tune `cuda_probe.mode`.
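Putting the tips above together, a `wgfy.yaml` might look like the following. Only `ignore_versions`, `secret_timeout`, and `cuda_probe.mode` come from this document; the layout and the `warn` value are illustrative guesses:

```yaml
# wgfy.yaml — preflight configuration (sketch)
ignore_versions: ["minor"]   # tolerate 1-patch version drifts
secret_timeout: 90s          # give slower vaults more time before failing the pod
cuda_probe:
  mode: warn                 # hypothetical value: warn + CPU fallback instead of hard abort
```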

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
| --- | --- | --- |
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly |

Explore More

| Module | Description | Link |
| --- | --- | --- |
| WFGY Core | Canonical framework entry point | View |
| Problem Map | Diagnostic map and navigation hub | View |
| Tension Universe Experiments | MVP experiment field | View |
| Recognition | Where WFGY is referenced or adopted | View |
| AI Guide | Anti-hallucination reading protocol for tools | View |

If this repository helps, starring it improves discovery for other builders.