WFGY/ProblemMap/GlobalFixMap/Cloud_Serverless/deploy_traffic_shaping.md

Deploy and Traffic Shaping — Guardrails

🧭 Quick Return to Map

You are in a sub-page of Cloud_Serverless.
To reorient, go back here:

Think of this page as a desk within a ward.
If you need the full triage and all prescriptions, return to the Emergency Room lobby.

Control how new code reaches users so you avoid cold spikes, duplicate effects, and version-skew crashes. This page gives a fast, provider-agnostic playbook for serverless rollouts with RAG, streaming, queues, and webhooks in the loop.

When to use this page

  • “Zero-downtime” deploys still produce 5xx or p95 jumps.
  • Webhooks or jobs fire twice during slot swaps or revision flips.
  • Streaming responses cut mid-flight while traffic shifts.
  • Vector writes or indexes receive mixed schema during rollout.
  • Canary looks healthy, then collapses at 25–50 percent.

Open these first

Acceptance targets

  • No increase in 5xx rate at each traffic step.
  • p95 latency delta between old and new ≤ 10 percent at equal load.
  • Cold-start share ≤ 5 percent after warm-up gates.
  • No duplicate side effects. Dedupe hit rate ≥ 99.9 percent for idempotent POST.
  • For RAG: ΔS(question, retrieved) stays within ±0.03 between old and new. λ remains convergent on two seeds.

Fix in 60 seconds

  1. Canary with gates. Route traffic 1 → 5 → 25 → 50 → 100 percent, advancing only if the gates pass at each step: 5xx flat, p95 flat, ΔS stable, queue age flat.

  2. Pre-warm and pin. Set min instances or provisioned concurrency for the new revision. Pin a canary header X-Release: new for synthetic traffic before shifting users.

  3. Graceful draining. Enable connection draining. Keep the old revision serving for N seconds while you stop sending it new requests. Do not kill active streams.

  4. Idempotency fence. Add Idempotency-Key on all write paths and background jobs. Store it in KV for at least release_window + 24 h.

  5. Schema locks. Embed INDEX_HASH, SCHEMA_REV, and MODEL_TAG in every request. Reject the call if the backend is behind, and route the user to the matching revision.
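The step-1 gate check can be sketched as a small function. This is a minimal sketch, assuming pre-aggregated per-revision metrics; the metric names, dict shapes, and thresholds are illustrative assumptions, not any provider's API:

```python
# Hypothetical gate check for one canary step. `old` and `new` are
# per-revision metric snapshots; `probe_drift` is the list of ΔS drift
# values measured on the fixed canary question set.

def gates_pass(old, new, probe_drift):
    checks = [
        new["5xx_rate"] - old["5xx_rate"] <= 0.0005,   # 5xx delta ≤ 0.05%
        new["p95_ms"] <= old["p95_ms"] * 1.10,         # p95 delta ≤ 10%
        max(abs(d) for d in probe_drift) <= 0.03,      # ΔS drift within ±0.03
        new["queue_age_slope"] <= 0,                   # queue age not growing
    ]
    return all(checks)

STEPS = [1, 5, 25, 50, 100]  # traffic percentages from step 1

def next_step(current_pct):
    """Return the next traffic percentage, or None once at 100."""
    i = STEPS.index(current_pct)
    return STEPS[i + 1] if i + 1 < len(STEPS) else None
```

If `gates_pass` returns False, shift traffic back to the previous revision instead of calling `next_step`.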


Typical breakpoints → exact fix

  • Version skew between retriever and writer. New code writes updated fields while the retriever still uses the old schema. Lock with SCHEMA_REV in requests and refuse cross-rev traffic. Open: Data Contracts

  • Webhook storms after deploy. Retries plus a slot swap trigger double delivery. Use dedupe key = sha256(source_id + event_rev + index_hash). Open: Bootstrap Ordering

  • Canary fine, 25 percent fails. The new revision saturates cold CPU or NAT. Raise min instances or provisioned concurrency before the 25 percent step. Open: Pre-Deploy Collapse

  • Streaming cuts during shift. The traffic router closes long-lived connections. Enable draining and pin streams to a revision using a sticky cookie or header.

  • Queues drain twice. Two workers on different revisions read the same shard without a fence. Lock by queue_partition + release_id.

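The webhook dedupe key above is a stable hash, so it can be computed the same way on every revision. A minimal sketch; the in-memory set stands in for whatever KV store with TTL your platform provides:

```python
import hashlib

def dedupe_key(source_id: str, event_rev: str, index_hash: str) -> str:
    """sha256(source_id + event_rev + index_hash), hex-encoded."""
    raw = f"{source_id}{event_rev}{index_hash}".encode()
    return hashlib.sha256(raw).hexdigest()

_seen = set()  # stand-in for a shared KV store with TTL

def deliver_once(source_id, event_rev, index_hash, handler):
    """Run handler only the first time this event is seen."""
    key = dedupe_key(source_id, event_rev, index_hash)
    if key in _seen:
        return False  # duplicate delivery from a retry or slot swap
    _seen.add(key)
    handler()
    return True
```

Because both revisions hash the same inputs, old and new workers suppress each other's duplicates during the swap.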

Minimal recipes you can copy

A) Weighted rollout with pinned canary

Headers:
- X-Release: canary
- X-Canary: true
Routing:
- 1% → 5% → 25% → 50% → 100%
Gates per step:
- 5xx delta ≤ 0.05%, p95 delta ≤ 10%
- ΔS drift ≤ 0.03 on canary questions (k=10)
- Queue age slope ≤ 0
Rollback:
- instant shift to previous revision if any gate fails

B) Serverless config checklist

Warm-up
- min_instances or provisioned_concurrency ≥ expected p50 load
- warm path hits cache, model, secrets, vector client

Draining
- connection_drain_seconds: 60–120
- keep old revision for draining window after last route

Idempotency
- Idempotency-Key on POST, stored in KV ≥ 24h
- retry policy: 3 tries, jitter, total time < request timeout

Schema pins
- SCHEMA_REV, INDEX_HASH, MODEL_TAG in every call
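The idempotency and schema-pin items in this checklist can be enforced in one request guard. A hedged sketch: the header names follow the checklist, but the expected pin values, TTL handling, and the dict-backed KV are stand-in assumptions:

```python
import time

# Pins the current backend revision accepts (values are illustrative).
EXPECTED = {"SCHEMA_REV": "r42", "INDEX_HASH": "abc123", "MODEL_TAG": "m1"}
IDEMPOTENCY_TTL = 24 * 3600  # keep keys at least release_window + 24 h

_kv = {}  # Idempotency-Key -> first-seen timestamp (stand-in for real KV)

def admit(headers, now=None):
    """Reject cross-revision traffic and replayed writes.
    Returns (accepted, reason)."""
    now = time.time() if now is None else now
    for pin, want in EXPECTED.items():
        if headers.get(pin) != want:
            return False, f"stale {pin}: route to matching revision"
    key = headers.get("Idempotency-Key")
    if key is None:
        return False, "missing Idempotency-Key on write path"
    first = _kv.get(key)
    if first is not None and now - first < IDEMPOTENCY_TTL:
        return False, "duplicate write suppressed"
    _kv[key] = now
    return True, "ok"
```

A rejected pin means the caller should be re-routed to the revision whose pins match, not retried against this one.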

C) Background jobs and batch

During the 1–25% steps:
- pause heavy batch, only allow user-facing flows
- cap queue concurrency per partition
- fence by (partition, release_id)
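The (partition, release_id) fence above only needs a shared lock table keyed by partition. A minimal sketch, assuming a dict as a stand-in for whatever shared store backs your queue:

```python
_locks = {}  # queue_partition -> release_id that currently owns it

def acquire_partition(partition: str, release_id: str) -> bool:
    """Only one release may drain a partition at a time.
    Re-acquiring from the same release is allowed."""
    owner = _locks.setdefault(partition, release_id)
    return owner == release_id

def release_partition(partition: str, release_id: str) -> None:
    """Free the partition, but only if this release owns it."""
    if _locks.get(partition) == release_id:
        del _locks[partition]
```

With this fence, a worker on the old revision and one on the new revision can never drain the same shard concurrently during the rollout window.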

Observability you must add

  • Split all metrics by release_id and revision.
  • 5xx, p95, cold-start rate per revision.
  • Queue age, retries, dedupe_hits.
  • ΔS and λ on a fixed probe set, old vs new.
  • Rollout timeline with traffic percent and gate decisions.
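Splitting every metric by release_id, as the first bullet requires, means tagging each sample at record time. A minimal in-process sketch; real setups would forward these to a metrics backend, and the names here are illustrative:

```python
from collections import defaultdict

_samples = defaultdict(list)  # (metric, release_id) -> recorded values

def record(metric: str, release_id: str, value: float) -> None:
    """Tag every sample with the revision that served it."""
    _samples[(metric, release_id)].append(value)

def p95(metric: str, release_id: str) -> float:
    """Nearest-rank p95 for one revision's samples."""
    vals = sorted(_samples[(metric, release_id)])
    return vals[int(0.95 * (len(vals) - 1))]
```

Comparing `p95("latency_ms", "old")` against `p95("latency_ms", "new")` at each traffic step is exactly the per-revision delta the gates need.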

Verification

  • Probe set shows ΔS and λ unchanged within targets.
  • No duplicate side effects in logs for the rollout window.
  • p95 and error rate flat across each step.
  • Draining proves streams finish on old revision.

When to escalate

  • Any gate fails twice at the same step → freeze traffic, roll back, keep warm, investigate NAT, DNS, or cold-start headroom.
  • Schema incompatibility detected → hold rollout, bump SCHEMA_REV, re-index or add translation layer, re-run canary.
  • Sticky routing not respected → switch to cookie pinning or per-request header pin.

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|------|------|--------------|
| WFGY 1.0 | PDF Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly |

Explore More

| Layer | Page | What it's for |
|-------|------|---------------|
| Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| ⚙️ Engine | WFGY 1.0 | Original PDF tension engine and early logic sketch (legacy reference) |
| ⚙️ Engine | WFGY 2.0 | Production tension kernel for RAG and agent systems |
| ⚙️ Engine | WFGY 3.0 | TXT-based Singularity tension engine (131 S-class set) |
| 🗺️ Map | Problem Map 1.0 | Flagship 16-problem RAG failure taxonomy and fix map |
| 🗺️ Map | Problem Map 2.0 | Global Debug Card for RAG and agent pipeline diagnosis |
| 🗺️ Map | Problem Map 3.0 | Global AI troubleshooting atlas and failure-pattern map |
| 🧰 App | TXT OS | .txt semantic OS with fast bootstrap |
| 🧰 App | Blah Blah Blah | Abstract and paradox Q&A built on TXT OS |
| 🧰 App | Blur Blur Blur | Text-to-image generation with semantic control |
| 🏡 Onboarding | Starter Village | Guided entry point for new users |

If this repository helped, starring it improves discovery so more builders can find the docs and tools.