

Deploy and Traffic Shaping — Guardrails

🧭 Quick Return to Map

You are in a sub-page of Cloud_Serverless.
To reorient, go back here:

Think of this page as a desk within a ward.
If you need the full triage and all prescriptions, return to the Emergency Room lobby.

Control how new code reaches users so you avoid cold spikes, duplicate effects, and version-skew crashes. This page gives a fast, provider-agnostic playbook for serverless rollouts with RAG, streaming, queues, and webhooks in the loop.

When to use this page

  • “Zero-downtime” deploys still produce 5xx or p95 jumps.
  • Webhooks or jobs fire twice during slot swaps or revision flips.
  • Streaming responses cut mid-flight while traffic shifts.
  • Vector writes or indexes receive mixed schema during rollout.
  • Canary looks healthy, then collapses at 25 to 50 percent.

Open these first

Acceptance targets

  • No increase in 5xx rate at each traffic step.
  • p95 latency delta between old and new ≤ 10 percent at equal load.
  • Cold-start share ≤ 5 percent after warm-up gates.
  • No duplicate side effects. Dedupe hit rate ≥ 99.9 percent for idempotent POST.
  • For RAG: ΔS(question, retrieved) stays within ±0.03 between old and new. λ remains convergent on two seeds.
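
The RAG target above can be checked mechanically. A minimal sketch, assuming you can embed the canary probe set on both revisions; `delta_s` here is a plain cosine distance standing in for WFGY's ΔS metric, and `drift_ok` is a hypothetical helper name:

```python
import math

def delta_s(vec_a, vec_b):
    """Cosine distance between two embedding vectors (a stand-in for ΔS)."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm = math.sqrt(sum(a * a for a in vec_a)) * math.sqrt(sum(b * b for b in vec_b))
    return 1.0 - dot / norm

def drift_ok(old_scores, new_scores, tolerance=0.03):
    """Pass only if every probe's ΔS moved by at most ±tolerance between revisions."""
    return all(abs(n - o) <= tolerance for o, n in zip(old_scores, new_scores))
```

Run the same probe set against both revisions and gate the rollout on `drift_ok`; repeat on a second seed to confirm λ stays convergent.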

Fix in 60 seconds

  1. Canary with gates. Route 1 → 5 → 25 → 50 → 100 percent, advancing only if the gates pass: 5xx flat, p95 flat, ΔS stable, queue age flat.

  2. Pre-warm and pin. Set min instances or provisioned concurrency for the new revision. Pin a canary header X-Release: new for synthetic traffic before shifting users.

  3. Graceful draining. Enable connection draining. Keep the old revision serving for N seconds while you stop sending it new requests. Do not kill active streams.

  4. Idempotency fence. Add Idempotency-Key on all write paths and background jobs. Store keys in KV for at least release_window + 24 h.

  5. Schema locks. Embed INDEX_HASH, SCHEMA_REV, and MODEL_TAG in every request. Reject if the backend is behind, and route the user to the matching revision.
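
The gated progression in step 1 can be sketched as a loop. `shift_traffic`, `gates_pass`, and `rollback` are hypothetical hooks you would wire to your provider's routing API and your metrics backend:

```python
STEPS = [1, 5, 25, 50, 100]  # traffic percent routed to the new revision at each step

def run_rollout(shift_traffic, gates_pass, rollback):
    """Advance through the canary steps; any failed gate triggers instant rollback.

    shift_traffic(pct) routes pct percent of traffic to the new revision,
    gates_pass() evaluates 5xx, p95, ΔS drift, and queue age after the shift,
    rollback() shifts all traffic back to the previous revision.
    """
    for pct in STEPS:
        shift_traffic(pct)
        if not gates_pass():
            rollback()
            return False
    return True
```

The key property is that a gate failure at any step rolls back instantly rather than pausing, which keeps the blast radius bounded to that step's traffic share.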


Typical breakpoints → exact fix

  • Version skew between retriever and writer. New code writes updated fields while the retriever still uses the old schema. Lock with SCHEMA_REV in requests and refuse cross-rev traffic. Open: Data Contracts

  • Webhook storms after deploy. Retries plus a slot swap trigger double delivery. Use dedupe key = sha256(source_id + event_rev + index_hash). Open: Bootstrap Ordering

  • Canary fine, 25 percent fails. The new revision saturates cold CPU or NAT. Raise min instances or provisioned concurrency before the 25 percent step. Open: Pre-Deploy Collapse

  • Streaming cuts during shift. The traffic router closes long-lived connections. Enable draining and pin streams to a revision using a sticky cookie or header.

  • Queues drain twice. Two workers on different revisions read the same shard without a fence. Lock by queue_partition + release_id.
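
The webhook dedupe key above can be sketched as follows. `deliver_once` and the in-memory `seen` set are illustrative stand-ins for a KV store with a TTL of at least the release window; note the sketch joins the fields with a separator before hashing, to avoid accidental collisions between concatenated values:

```python
import hashlib

def dedupe_key(source_id: str, event_rev: str, index_hash: str) -> str:
    """Stable dedupe key: the same event on either revision hashes to one key."""
    raw = f"{source_id}:{event_rev}:{index_hash}".encode()
    return hashlib.sha256(raw).hexdigest()

def deliver_once(event, seen, handler):
    """Drop the event if its key was already processed during the rollout window."""
    key = dedupe_key(event["source_id"], event["event_rev"], event["index_hash"])
    if key in seen:
        return False
    seen.add(key)
    handler(event)
    return True
```

Because the key excludes the release id, a retry delivered to the other revision during a slot swap still collides with the first delivery and is dropped.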


Minimal recipes you can copy

A) Weighted rollout with pinned canary

Headers:
- X-Release: canary
- X-Canary: true
Routing:
- 1% → 5% → 25% → 50% → 100%
Gates per step:
- 5xx delta ≤ 0.05%, p95 delta ≤ 10%
- ΔS drift ≤ 0.03 on canary questions (k=10)
- Queue age slope ≤ 0
Rollback:
- instant shift to previous revision if any gate fails
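
The per-step gates in recipe A can be expressed as one predicate. The metric dict shape and field names are assumptions; the thresholds mirror the recipe above:

```python
def gates_pass(old, new, queue_age_slope):
    """Evaluate recipe A's gates between the old and new revision at equal load."""
    return (
        new["5xx_rate"] - old["5xx_rate"] <= 0.0005                   # 5xx delta <= 0.05%
        and (new["p95_ms"] - old["p95_ms"]) / old["p95_ms"] <= 0.10   # p95 delta <= 10%
        and abs(new["delta_s"] - old["delta_s"]) <= 0.03              # ΔS drift <= 0.03
        and queue_age_slope <= 0                                      # queue not backing up
    )
```

Evaluating all gates as one boolean makes the rollback decision unambiguous: any single failed condition fails the step.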

B) Serverless config checklist

Warm-up
- min_instances or provisioned_concurrency ≥ expected p50 load
- warm path hits cache, model, secrets, vector client

Draining
- connection_drain_seconds: 60-120
- keep old revision for draining window after last route

Idempotency
- Idempotency-Key on POST, stored in KV ≥ 24h
- retry policy: 3 tries, jitter, total time < request timeout

Schema pins
- SCHEMA_REV, INDEX_HASH, MODEL_TAG in every call
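
The schema pins can be enforced with a small check at the edge. `check_pins` and the pinned values are hypothetical; in practice the expected pins come from your release metadata, and a mismatch routes the request to the matching revision instead of serving it:

```python
# Hypothetical pins for the currently deployed revision.
EXPECTED = {"SCHEMA_REV": "v12", "INDEX_HASH": "abc123", "MODEL_TAG": "m-2025-01"}

def check_pins(headers, expected=EXPECTED):
    """Reject calls whose schema pins do not match this revision.

    Returns (ok, reason); the caller uses the reason to route or log.
    """
    for pin in ("SCHEMA_REV", "INDEX_HASH", "MODEL_TAG"):
        if headers.get(pin) != expected[pin]:
            return False, f"{pin} mismatch: got {headers.get(pin)!r}"
    return True, "ok"
```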

C) Background jobs and batch

During the 1-25% steps:
- pause heavy batch, only allow user-facing flows
- cap queue concurrency per partition
- fence by (partition, release_id)
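
The (partition, release_id) fence can be sketched with a first-writer-wins lock; the `locks` dict stands in for a shared KV store with atomic set-if-absent semantics:

```python
def acquire_fence(locks, partition, release_id):
    """Grant the partition to exactly one release; a second revision is refused."""
    holder = locks.setdefault(partition, release_id)
    return holder == release_id
```

A worker that fails to acquire the fence skips the shard, so two revisions never drain the same partition concurrently during the rollout.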

Observability you must add

  • Split all metrics by release_id and revision.
  • 5xx, p95, cold-start rate per revision.
  • Queue age, retries, dedupe_hits.
  • ΔS and λ on a fixed probe set, old vs new.
  • Rollout timeline with traffic percent and gate decisions.
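
Splitting every counter by release_id can be as simple as keying each metric on the revision. A minimal sketch; a real system would use its metrics library's label or tag support instead of a dict:

```python
from collections import defaultdict

class RolloutMetrics:
    """Counters split by (release_id, metric) so old vs new compare cleanly."""

    def __init__(self):
        self.counts = defaultdict(int)

    def bump(self, release_id, metric, value=1):
        self.counts[(release_id, metric)] += value

    def rate(self, release_id, numerator, denominator):
        den = self.counts[(release_id, denominator)]
        return self.counts[(release_id, numerator)] / den if den else 0.0
```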

Verification

  • Probe set shows ΔS and λ unchanged within targets.
  • No duplicate side effects in logs for the rollout window.
  • p95 and error rate flat across each step.
  • Draining proves streams finish on old revision.

When to escalate

  • Any gate fails twice at the same step → freeze traffic, roll back, keep warm, investigate NAT, DNS, or cold-start headroom.
  • Schema incompatibility detected → hold rollout, bump SCHEMA_REV, re-index or add translation layer, re-run canary.
  • Sticky routing not respected → switch to cookie pinning or per-request header pin.

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask "Answer using WFGY + <your question>" |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type "hello world" — OS boots instantly |

Explore More

| Layer | Page | What it's for |
|---|---|---|
| Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| Engine | WFGY 1.0 | Original PDF-based tension engine |
| Engine | WFGY 2.0 | Production tension kernel and math engine for RAG and agents |
| Engine | WFGY 3.0 | TXT-based Singularity tension engine, 131 S-class set |
| Map | Problem Map 1.0 | Flagship 16-problem RAG failure checklist and fix map |
| Map | Problem Map 2.0 | RAG-focused recovery pipeline |
| Map | Problem Map 3.0 | Global Debug Card, image as a debug protocol layer |
| Map | Semantic Clinic | Symptom to family to exact fix |
| Map | Grandma's Clinic | Plain-language stories mapped to Problem Map 1.0 |
| Onboarding | Starter Village | Guided tour for newcomers |
| App | TXT OS | TXT semantic OS, fast boot |
| App | Blah Blah Blah | Abstract and paradox Q&A built on TXT OS |
| App | Blur Blur Blur | Text-to-image with semantic control |
| App | Blow Blow Blow | Reasoning game engine and memory demo |

If this repository helped, starring it improves discovery so more builders can find the docs and tools.