
Stateless KV and Queue Patterns — Guardrails

🧭 Quick Return to Map

You are in a sub-page of Cloud_Serverless.
To reorient, go back here:

Think of this page as a desk within a ward.
If you need the full triage and all prescriptions, return to the Emergency Room lobby.

A compact repair guide for serverless and edge stacks that must survive retries, bursts, out-of-order events, and duplicate webhooks. All patterns map back to Problem Map pages with measurable targets.

When to use this page

  • Functions are stateless and scale horizontally.
  • You see duplicate side effects on retries or webhook storms.
  • Event order is not guaranteed and you must reconcile late arrivals.
  • Writes cross multiple systems and need exactly-once semantics in practice.
  • You need queues for backpressure without introducing deadlocks.

Open these first

Acceptance targets

  • Duplicate side-effects ≤ 0.1 percent across 1 million events.
  • P99 queue latency ≤ 5 seconds after backpressure is active.
  • Lock contention ≤ 3 percent with P95 wait ≤ 100 ms.
  • Exactly-once at the boundary: user-visible action never applied twice even with three retries.
  • Audit fields present on every message: event_id, source_id, revision, index_hash, dedupe_key, attempt. An example envelope follows this list.
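
A minimal example of the audit envelope. All values are placeholders; dedupe_key is the sha256 described in the fix steps below.

{
  "event_id": "evt_01HZX3…",
  "source_id": "billing-webhook",
  "revision": 42,
  "index_hash": "9f2c…",
  "dedupe_key": "a41be07d…",
  "attempt": 1
}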

Fix in 60 seconds

  1. Define the dedupe key. dedupe_key = sha256(source_id + revision + index_hash). Store it at kv/done/<dedupe_key> = {ts, status, result_id} with a TTL ≥ the maximum replay window. Contract: data-contracts.md

  2. Use a leased lock for single-writer sections. Hold kv/lock/<resource> with a random token value and ttl = function_timeout - safety_margin. Only the holder of the token can release. On expiry, another worker may acquire.

  3. Put work behind a queue with a visibility timeout. The visibility timeout must exceed max_processing_time + jitter. Failed items go to a DLQ after N attempts. Record the attempt count in the message. A consumer sketch follows this list.

  4. Outbox before publish. Write the business record and outbox/<txid> in the same atomic store. A relay process reads the outbox and publishes to the queue. Consumers use inbox/<txid> as the dedupe ledger.

  5. Idempotent side-effects. Before any external call, check kv/done/<dedupe_key>. If present, return the prior result. If absent, perform the effect, then write the ledger.
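
A minimal consumer sketch for step 3, assuming AWS SQS through boto3. QUEUE_URL and process() are illustrative placeholders, and on SQS the DLQ hop itself comes from the queue's redrive policy (maxReceiveCount = N) rather than from code.

import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.example/queue"          # placeholder

def poll_once():
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
        VisibilityTimeout=120,                   # > max_processing_time + jitter
        AttributeNames=["ApproximateReceiveCount"],
    )
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        body["attempt"] = int(msg["Attributes"]["ApproximateReceiveCount"])
        try:
            process(body)                        # your idempotent handler
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])
        except Exception:
            # leave the message alone: it becomes visible again after the
            # timeout and moves to the DLQ once the redrive policy trips
            pass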


Typical breakpoints → exact fix

  • Webhook storms and double billing. Cause: missing dedupe ledger or a TTL that is too short. Fix: add kv/done/<dedupe_key> with a long TTL and verify it before charging. Open: ops/live_monitoring_rag.md

  • Queue grows without bound. Cause: no backpressure or priority lanes. Fix: add per-tenant consumption quotas and a high-watermark shed rule. Open: ops/debug_playbook.md

  • Deadlocks when multiple writers compete. Cause: locks without leases, or cross-locks held across resources. Fix: use a single resource lock per critical section with a time-boxed lease. Open: deployment-deadlock.md

  • Out-of-order events overwrite newer state. Fix: apply a monotonic guard that rejects a message when revision_in_msg < current_revision, and keep a compaction job to expire stale inbox entries. A guard sketch follows this list. Open: bootstrap-ordering.md
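
A sketch of the monotonic guard, assuming any key-value store that can read and write a per-source revision; apply_if_newer and the rev/ prefix are illustrative. If several writers can race on the same source_id, run this check inside the leased lock from recipe A below.

def apply_if_newer(kv, source_id: str, msg_revision: int, apply_state) -> bool:
    key = "rev/" + source_id
    current = int(kv.get(key) or 0)
    if msg_revision <= current:
        return False                  # late or duplicate arrival, reject it
    apply_state()                     # write the new state
    kv.set(key, msg_revision)         # advance the high-water mark
    return True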


Minimal recipes you can copy

A) Leased lock with safe release

acquire(key, ttl):
  token = random()                          # unique per holder
  ok = kv.setnx("lock/"+key, token, ttl)    # set only if absent, with a lease TTL
  return ok ? token : null

release(key, token):
  # the compare and the delete must run as one atomic step on the store,
  # otherwise the lease can expire between get and del and you delete
  # another worker's lock
  if kv.get("lock/"+key) == token:
    kv.del("lock/"+key)
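
A runnable version of the same recipe, assuming Redis and the redis-py client; acquire_lock and release_lock are illustrative names. The release runs as a server-side script so the compare and the delete cannot interleave with another worker's acquire.

import secrets
import redis

r = redis.Redis()

RELEASE = r.register_script("""
if redis.call('get', KEYS[1]) == ARGV[1] then
  return redis.call('del', KEYS[1])
end
return 0
""")

def acquire_lock(resource: str, ttl_ms: int) -> str | None:
    token = secrets.token_hex(16)
    # set only if absent, with the lease expressed in milliseconds
    ok = r.set("lock/" + resource, token, nx=True, px=ttl_ms)
    return token if ok else None

def release_lock(resource: str, token: str) -> bool:
    return RELEASE(keys=["lock/" + resource], args=[token]) == 1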

B) Idempotency ledger

dedupe_key = sha256(source_id + revision + index_hash)

prior = kv.get("done/"+dedupe_key)
if prior:
  return replay(prior)                      # effect already applied, return the prior result

result = do_side_effect()
kv.set("done/"+dedupe_key, {ts: now(), result_id: result.id}, ttl=30d)
return result

# get-then-set is not atomic: if two deliveries can run concurrently,
# claim the key with setnx first or wrap this section in the lock from A
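
A sketch combining recipes A and B, assuming the Redis client and the acquire_lock / release_lock helpers from the lock example above; do_side_effect is a placeholder for your external call. Every delivery returns the same ledger record, whether it performed the effect or replayed it.

import json
import time

def apply_once(dedupe_key: str, payload: dict) -> dict:
    done_key = "done/" + dedupe_key
    prior = r.get(done_key)
    if prior:
        return json.loads(prior)                 # already applied, replay the record

    token = acquire_lock(dedupe_key, ttl_ms=30_000)
    if token is None:
        # another worker is mid-effect; let the queue redeliver later
        raise RuntimeError("lock held, retry later")

    try:
        prior = r.get(done_key)                  # re-check under the lock
        if prior:
            return json.loads(prior)
        result = do_side_effect(payload)         # placeholder external call
        record = {"ts": time.time(), "status": "done", "result_id": result["id"]}
        r.set(done_key, json.dumps(record), ex=30 * 24 * 3600)   # TTL ≥ replay window
        return record
    finally:
        release_lock(dedupe_key, token)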

C) Outbox and inbox

// producer tx: business row and outbox row commit in the same transaction
db.begin()
db.insert("orders", row)
db.insert("outbox", {txid, payload})
db.commit()

// relay: send before delete gives at-least-once delivery;
// duplicates are absorbed by the consumer's inbox ledger
for msg in db.scan("outbox"):
  queue.send({msg.txid, msg.payload})
  db.delete("outbox", msg.txid)

// consumer: inbox/<txid> is the dedupe ledger
if kv.get("inbox/"+txid): return           // already handled, drop silently
handle(payload)
kv.set("inbox/"+txid, now(), ttl=30d)      // TTL ≥ max replay window

D) Token bucket for backpressure

# a fixed-window counter that resets every second: a simple approximation
# of a token bucket, enough to shed bursts per tenant
bucket = "q_tokens/"+tenant
tokens = kv.incrby(bucket, 1, ttl=1s)
if tokens > limit: requeue_with_delay()
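
If the fixed window above is too bursty, here is a sketch of a true token bucket, assuming the same Redis client as in the earlier examples; q_tokens/ and allow() are illustrative. The refill runs in a server-side script so concurrent consumers see one consistent count.

import time

TOKEN_BUCKET = r.register_script("""
local key      = KEYS[1]
local rate     = tonumber(ARGV[1])        -- tokens added per second
local capacity = tonumber(ARGV[2])
local now      = tonumber(ARGV[3])

local tokens = tonumber(redis.call('hget', key, 'tokens')) or capacity
local ts     = tonumber(redis.call('hget', key, 'ts')) or now

tokens = math.min(capacity, tokens + (now - ts) * rate)
local allowed = 0
if tokens >= 1 then
  tokens  = tokens - 1
  allowed = 1
end
redis.call('hset', key, 'tokens', tokens, 'ts', now)
redis.call('expire', key, 60)
return allowed
""")

def allow(tenant: str, rate: float, capacity: int) -> bool:
    ok = TOKEN_BUCKET(keys=["q_tokens/" + tenant],
                      args=[rate, capacity, time.time()])
    return ok == 1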

Observability you must add

  • queue_depth, age_of_oldest, consume_rate, requeue_rate, dlq_rate.
  • lock_acquired, lock_conflicts, lock_wait_ms_p95.
  • dedupe_hits, double_effects_detected.
  • Trace fields on every event: event_id, dedupe_key, attempt, tenant, route.

Verification

  • Fire a test that sends the same event 5 times with random delays. Only one side-effect is applied. A test sketch follows this list.
  • Kill a worker mid-flight. Message becomes visible again and finishes once.
  • Reorder by sending revision=1,3,2. State ends at 3. Open: eval/eval_rag_precision_recall.md
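
A sketch of the duplicate-delivery check, assuming the apply_once helper from recipe B above; the key and payload values are illustrative.

import random
import time

def test_same_event_five_times_applies_once():
    payload = {"source_id": "billing-webhook", "revision": 7, "amount": 42}
    key = "a41be07d-test"                        # fixed key = same logical event
    records = []
    for _ in range(5):
        time.sleep(random.uniform(0, 0.2))       # random delays between deliveries
        records.append(apply_once(key, payload))
    # all five deliveries replay the same ledger record, so the effect ran once
    assert len({rec["result_id"] for rec in records}) == 1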

When to escalate

  • If locks create long waits or cascades, redesign the critical section into outbox plus eventual reconciliation.
  • If queue depth oscillates, split lanes by priority or tenant and apply quotas. Open: ops/live_monitoring_rag.md

🔗 Quick-Start Downloads (60 sec)

| Tool | Link | 3-Step Setup |
|------|------|--------------|
| WFGY 1.0 PDF | Engine Paper | 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly |

Explore More

| Module | Description | Link |
|--------|-------------|------|
| WFGY Core | Canonical framework entry point | View |
| Problem Map | Diagnostic map and navigation hub | View |
| Tension Universe Experiments | MVP experiment field | View |
| Recognition | Where WFGY is referenced or adopted | View |
| AI Guide | Anti-hallucination reading protocol for tools | View |

If this repository helps, starring it improves discovery for other builders.