
Ops & Deploy — Global Fix Map

🏥 Quick Return to Emergency Room

You are in a specialist desk.
For full triage and doctors on duty, return here:

Think of this page as a sub-room.
If you want full consultation and prescriptions, go back to the Emergency Room lobby.

A compact hub to ship safely and keep RAG/LLM systems stable after release.
Use this folder to pick the right guardrail, verify with measurable targets, and recover fast when things wobble. No infra change required.


Open these first


When to use this folder

  • First calls after deploy crash or return stale content.
  • ΔS and citations looked fine yesterday but flip today.
  • Rate limits cascade, queues spike, latency climbs.
  • Canary looks good then full rollout breaks retrieval.
  • Index swap succeeds but answers cite old snippets.
  • Retries cause duplicate side effects or charges.
  • Feature flags bleed traffic into unfinished paths.
  • Maintenance windows corrupt embeddings or anchors.

Acceptance targets for a safe rollout

  • ΔS(question, retrieved) ≤ 0.45 across three paraphrases.
  • Coverage ≥ 0.70 on the expected new section.
  • λ remains convergent on 2 seeds during rollout.
  • Idempotency ≥ 99.9% on retry storms.
  • Zero silent index mismatches (hash + counts match).
  • P95 latency stays in budget with backpressure active.
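The targets above can be read as a single pass/fail gate. This is a minimal sketch that assumes you already collect each metric elsewhere. The `RolloutMetrics` record and the `rollout_gate` function are hypothetical names for illustration, not part of WFGY, and the P95 budget is whatever your service defines.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RolloutMetrics:
    # Hypothetical record; field names are illustrative, not a WFGY API.
    delta_s: List[float]            # ΔS(question, retrieved), one value per paraphrase
    coverage: float                 # coverage of the expected new section (0..1)
    lambda_convergent: List[bool]   # λ convergence flag, one per seed
    idempotency_rate: float         # share of retried requests with no duplicate side effect
    index_hash_match: bool          # new index hash equals the expected hash
    index_count_match: bool         # new index doc count equals the expected count
    p95_latency_ms: float
    p95_budget_ms: float


def rollout_gate(m: RolloutMetrics) -> List[str]:
    """Return the list of failed targets; an empty list means the rollout may proceed."""
    failures = []
    if len(m.delta_s) < 3 or max(m.delta_s) > 0.45:
        failures.append("ΔS above 0.45 or fewer than 3 paraphrases")
    if m.coverage < 0.70:
        failures.append("coverage below 0.70")
    if len(m.lambda_convergent) < 2 or not all(m.lambda_convergent):
        failures.append("λ not convergent on 2 seeds")
    if m.idempotency_rate < 0.999:
        failures.append("idempotency below 99.9%")
    if not (m.index_hash_match and m.index_count_match):
        failures.append("silent index mismatch (hash or count)")
    if m.p95_latency_ms > m.p95_budget_ms:
        failures.append("P95 latency over budget")
    return failures


metrics = RolloutMetrics([0.31, 0.38, 0.42], 0.74, [True, True], 0.9995, True, True, 820.0, 1000.0)
print(rollout_gate(metrics))  # → []
```

Returning the list of failures, rather than a bare boolean, tells the on-call engineer which target blocked the release.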

Quick routes — per-page guides

Scenario → fix page
  • Rollout readiness → rollout_readiness_gate.md
  • Canary strategy → staged_rollout_canary.md
  • Blue/green cutover → blue_green_switchovers.md
  • Version pin & freeze → version_pinning_and_model_lock.md
  • Vector index swap → vector_index_build_and_swap.md
  • Cache warmup → cache_warmup_invalidation.md
  • Rate limits → rate_limit_backpressure.md
  • Feature flags → feature_flags_safe_launch.md
  • Idempotency → idempotency_dedupe.md
  • Retry logic → retry_backoff.md
  • Rollback plan → rollback_and_fast_recovery.md
  • Postmortems → postmortem_and_regression_tests.md
  • Change freeze → release_calendar_and_change_freeze.md
  • Incident comms → incident_comms_and_statuspage.md
  • Shadow traffic → shadow_traffic_mirroring.md
  • Maintenance window → read_only_mode_and_maintenance_window.md
  • DB migrations → db_migration_guardrails.md

60-second ship checklist

  1. Freeze the world → Pin model IDs, prompt revs, index hashes.
  2. Warm up safely → Build index off-path, preload caches with canary.
  3. Shadow then canary → Mirror prod queries, step rollout 5% → 25% → 100%.
  4. Guard the edge → Enable backpressure, retries with jitter, idempotency keys.
  5. Know your exit → Keep rollback switch and comms draft ready.
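Step 4 of the checklist can be sketched as a client-side retry loop with capped exponential backoff and full jitter that reuses one idempotency key across attempts. This is an illustration under assumptions: `send` is a hypothetical callable that takes the key, and `TransientError` stands in for whatever your client raises on a 429 or a timeout.

```python
import random
import time
import uuid


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as HTTP 429 or a timeout."""


def call_with_retries(send, max_attempts=5, base_delay=0.2, max_delay=5.0, sleep=time.sleep):
    """Call send(idempotency_key) with capped exponential backoff and full jitter.

    One key is generated per logical request and reused on every attempt,
    so a server that honours idempotency keys can drop the duplicates.
    """
    key = str(uuid.uuid4())
    for attempt in range(max_attempts):
        try:
            return send(key)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: wait a random time between 0 and the capped exponential delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
```

Full jitter spreads retries out, so clients that failed together do not retry in lockstep and re-trigger the rate limit. Injecting `sleep` keeps the function easy to test.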

Symptoms → exact fix

What you see → open this
  • Deploy points to old snippets → vector_index_build_and_swap.md · cache_warmup_invalidation.md
  • Canary fine, full rollout breaks → staged_rollout_canary.md · feature_flags_safe_launch.md
  • Wrong model after failover → version_pinning_and_model_lock.md
  • Retries duplicate charges → idempotency_dedupe.md · retry_backoff.md
  • Rate-limit storms, timeouts → rate_limit_backpressure.md
  • Need rollback now → rollback_and_fast_recovery.md · blue_green_switchovers.md
  • Maintenance corrupts anchors → read_only_mode_and_maintenance_window.md · db_migration_guardrails.md
  • Unsure if safe to ship → rollout_readiness_gate.md

FAQ

Q: What does ΔS mean here?
A: ΔS is a stability score. It measures how much the retrieved content drifts from the expected anchor when you change the query slightly. Lower is better (≤ 0.45 is safe).
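The FAQ does not pin down a formula. This sketch assumes that ΔS is computed as one minus the cosine similarity of two embedding vectors, for example a question paraphrase and the chunk retrieved for it, as in the acceptance targets. The `delta_s` and `worst_delta_s` helpers are hypothetical, and the vectors would come from your own embedding model.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


def delta_s(a, b):
    # Assumed definition: ΔS = 1 - cos(a, b). 0 means same direction; larger means more drift.
    return 1.0 - cosine(a, b)


def worst_delta_s(question_vecs, retrieved_vecs):
    # The target is "≤ 0.45 across three paraphrases", so gate on the worst pair.
    return max(delta_s(q, r) for q, r in zip(question_vecs, retrieved_vecs))
```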

Q: What is λ convergence?
A: λ tracks whether retrieval order flips unpredictably. If λ is stable across seeds, your rollout is consistent.
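As a rough stand-in for λ convergence, and not WFGY's own definition, you can compare the top-k retrieval order across seeded runs. This sketch assumes that each run is a ranked list of document IDs. `lambda_convergent`, `pairwise_order_agreement`, and the `min_agreement` threshold are illustrative names.

```python
from itertools import combinations


def pairwise_order_agreement(run_a, run_b):
    """Fraction of document pairs (shared by both runs) ranked in the same relative order."""
    pos_a = {doc: i for i, doc in enumerate(run_a)}
    pos_b = {doc: i for i, doc in enumerate(run_b)}
    shared = [doc for doc in run_a if doc in pos_b]
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 1.0
    agree = sum((pos_a[x] < pos_a[y]) == (pos_b[x] < pos_b[y]) for x, y in pairs)
    return agree / len(pairs)


def lambda_convergent(runs, k=5, min_agreement=1.0):
    """Treat retrieval as convergent if all seeded runs surface the same top-k in (nearly) the same order."""
    tops = [run[:k] for run in runs]
    for a, b in combinations(tops, 2):
        if set(a) != set(b):
            return False  # different documents surfaced: divergent
        if pairwise_order_agreement(a, b) < min_agreement:
            return False  # order flipped more than the tolerance allows
    return True
```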

Q: Why do I need idempotency keys?
A: Without them, retries can double-charge a user or run the same side-effect twice. Keys make every request “safe to retry.”
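Here is a minimal server-side sketch of the idea, assuming an in-memory store. The first call with a given key runs the side effect and stores its result, and later calls with the same key replay that result. `IdempotencyStore` is hypothetical. Production systems typically persist keys in a shared database or cache with an expiry.

```python
import threading


class IdempotencyStore:
    """Hypothetical in-memory store: each key's side effect runs at most once."""

    def __init__(self):
        self._results = {}
        self._lock = threading.Lock()

    def run_once(self, key, side_effect):
        # Holding the lock while the side effect runs serializes all calls;
        # acceptable for a sketch, too coarse for production.
        with self._lock:
            if key in self._results:
                return self._results[key]  # replay: no second charge or write
            result = side_effect()
            self._results[key] = result  # only successful results are stored
            return result
```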

Q: How do I know if my index swap worked?
A: Compare doc counts and hashes before cutover. If they don't match, you're pointing at an incomplete index.
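One way to make the count-and-hash check concrete, assuming that you can enumerate `(doc_id, text)` pairs from both the source corpus and the freshly built index, is to fingerprint each side with SHA-256 and compare the results before cutover. `index_fingerprint` and `safe_to_swap` are hypothetical helpers.

```python
import hashlib


def index_fingerprint(docs):
    """Return (count, digest) for an iterable of (doc_id, text) pairs, independent of order."""
    items = sorted(docs)  # sort so that build order does not change the digest
    h = hashlib.sha256()
    for doc_id, text in items:
        h.update(doc_id.encode("utf-8"))
        h.update(b"\x00")  # separator between the id and the fixed-length content digest
        h.update(hashlib.sha256(text.encode("utf-8")).digest())
    return len(items), h.hexdigest()


def safe_to_swap(expected_docs, new_index_docs):
    # Both the count and the digest must match; either mismatch blocks the cutover.
    return index_fingerprint(expected_docs) == index_fingerprint(new_index_docs)
```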

Q: Canary looked fine but production broke — why?
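A: A canary carries only a small share of traffic, so it often hides tail latency, cache misses, and load-dependent rate limits. Ramp through increasing percentages of live traffic and check the acceptance targets at each step.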
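A staged ramp can be sketched as a loop that raises the traffic share and checks health at each step, dropping back to 0% on the first failure. `set_traffic_percent` and `check_health` are hypothetical callbacks for your router or flag system and for your metrics (for example, the acceptance targets). The 5/25/100 steps come from the ship checklist.

```python
def staged_rollout(set_traffic_percent, check_health, steps=(5, 25, 100)):
    """Ramp traffic step by step; roll back to 0% on the first failed health check."""
    for pct in steps:
        set_traffic_percent(pct)
        if not check_health(pct):
            set_traffic_percent(0)  # send all traffic back to the old version
            return False
    return True
```

A real rollout would also wait for a soak period at each step before checking, so tail latency and cache effects have time to show up.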

Q: Why do you mention rollback comms?
A: Technical rollback is only half the job. Users and stakeholders need fast updates, so pre-drafted Statuspage or Slack messages are essential.


🔗 Quick-Start Downloads (60 sec)

  • WFGY 1.0 PDF (Engine Paper): 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>”
  • TXT OS, the plain-text OS (TXTOS.txt): 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” and the OS boots instantly

Explore More

  • WFGY Core: canonical framework entry point
  • Problem Map: diagnostic map and navigation hub
  • Tension Universe Experiments: MVP experiment field
  • Recognition: where WFGY is referenced or adopted
  • AI Guide: anti-hallucination reading protocol for tools

If this repository helps, starring it improves discovery for other builders.