Serverless Limits Matrix and Safe Budgets
🧭 Quick Return to Map
You are in a sub-page of Cloud_Serverless.
To reorient, go back here:
- Cloud_Serverless — scalable functions and event-driven pipelines
- WFGY Global Fix Map — main Emergency Room, 300+ structured fixes
- WFGY Problem Map 1.0 — 16 reproducible failure modes
Think of this page as a desk within a ward.
If you need the full triage and all prescriptions, return to the Emergency Room lobby.
A compact cheat-sheet to keep your RAG and agent flows inside the real limits of each serverless stack. Use this to set safe budgets for time, payload, memory, concurrency, and streaming so you never ship a workflow that silently truncates or times out.
When to use this page
- “Works locally” but 413/414/431/502 appear in cloud.
- Streams cut off around 55–60 seconds even though your function timeout is set to 300.
- Vector writes fail only under load due to soft concurrency ceilings.
- JSON mode breaks on large tool outputs or long citations.
- Cold starts spike latency after deploy or scale-out.
Open these first
- End-to-end runtime ceilings: Timeouts and Streaming Limits
- Cold starts, concurrency, provisioned capacity: Cold Start and Concurrency
- Idempotency for retries and queues: Stateless KV and Queue Patterns
- Retrieval correctness under truncation: Retrieval Playbook · Data Contracts
- Live probes and rollback: Live Monitoring for RAG · Debug Playbook
Acceptance targets
- Effective timeout budget computed and enforced at build time. No 5xx raised by upstream timeouts in synthetic probes.
- p95 response size ≤ 70% of smallest body limit across the path.
- Zero occurrences of 413/414/431 in production after rollout.
- For RAG: no citation truncation, ΔS(question, retrieved) ≤ 0.45 on the probe set.
Compute your real budgets first
Rule of minimums: the effective limit is the strictest element across the full path.
effective_timeout = min( client_timeout,
                         edge/LB_timeout,
                         gateway_timeout,
                         function_timeout,
                         upstream_model_timeout )

effective_body_in = min( client_upload_limit,
                         edge_header_limit + body_limit,
                         gateway_body_limit,
                         function_memory_and_tmp,
                         parser/frame_limits )

effective_body_out = min( upstream_chunking,
                          function_response_limit,
                          compression_window,
                          edge/LB_response_limit,
                          client_reader_limit )
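The rule of minimums can be sketched as a small helper. The hop names and numbers below are illustrative placeholders, not real provider limits:

```python
# Sketch: compute an effective budget as the minimum across every hop in the
# path. Hop names and values here are illustrative, not real provider limits.

def effective_limit(hops: dict[str, int]) -> tuple[str, int]:
    """Return (limiting_hop, value) for the strictest element in the path."""
    hop, value = min(hops.items(), key=lambda kv: kv[1])
    return hop, value

timeout_hops_ms = {
    "client": 60_000,
    "edge_lb": 55_000,        # many edges idle out near 55-60 s
    "gateway": 30_000,
    "function": 300_000,      # "my function says 300" does not help here
    "upstream_model": 120_000,
}

hop, budget = effective_limit(timeout_hops_ms)
print(hop, budget)  # the gateway, not the function, sets the real ceiling
```

Knowing which hop is the limiter matters as much as the number itself, since that is the hop you negotiate with or route around.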
Quick mapping tips
- Headers often cap at 8–16 KB. Large auth payloads or tool schemas can hit 431. Collapse headers and move contracts into the body.
- Many edges proxy WebSocket and SSE differently. Budget for the stricter one when you stream.
- Body limits interact with memory. If your platform ties CPU to memory, raising memory may fix both CPU starvation and output buffering.
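To illustrate the header-collapse tip above, here is a minimal sketch that packs several trace keys into one compact header value. The header name `x-ctx` and the keys are hypothetical:

```python
import base64
import json

# Sketch: collapse many small headers into one compact header to stay under
# typical 8-16 KB header caps. "x-ctx" and the trace keys are hypothetical.
trace = {"trace_id": "abc123", "tenant": "t-42", "release": "2025-08-27"}

packed = base64.urlsafe_b64encode(
    json.dumps(trace, separators=(",", ":")).encode()
).decode()
headers = {"x-ctx": packed}  # one header instead of three

# The receiving side unpacks the same value.
restored = json.loads(base64.urlsafe_b64decode(headers["x-ctx"]))
assert restored == trace
```

Anything that does not fit comfortably in one compact header, such as a tool schema contract, belongs in the body or behind a versioned blob id.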
The limits you must record per platform
Track these keys for each provider you deploy to. Do not hardcode numbers inside app code. Read from env or a small JSON and ship it with your release.
{
  "provider": "cloudflare|vercel|aws_lambda|gcp_cf|cloud_run|azure_f|fly|netlify",
  "region": "ap-southeast-1",
  "timeout_ms": 30000,
  "stream_idle_ms": 120000,
  "concurrency": {
    "soft": 100,
    "burst": 500,
    "account_ceiling_hint": "ticket required to raise beyond soft"
  },
  "memory_mb": 1024,
  "tmp_storage_mb": 512,
  "body_in_mb": 5,
  "body_out_mb": 6,
  "headers_kb": 8,
  "ws_supported": true,
  "sse_supported": true,
  "retry_semantics": "at_least_once|at_most_once",
  "notes": "anycast edge, drains needed for region flips"
}
Store per-route overrides. For example, upload routes differ from chat routes.
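One way to apply per-route overrides is a shallow merge over the provider defaults. The key names follow the schema above, but the merge helper and route paths are assumptions:

```python
# Sketch: per-route overrides as a shallow merge over provider defaults.
# Keys follow the schema above; the helper and route paths are assumptions.

DEFAULTS = {"timeout_ms": 30000, "body_in_mb": 5, "body_out_mb": 6}
ROUTE_OVERRIDES = {
    "/upload": {"body_in_mb": 25, "timeout_ms": 60000},  # uploads differ from chat
    "/chat": {"stream_idle_ms": 120000},
}

def limits_for(route: str) -> dict:
    """Provider defaults with this route's overrides layered on top."""
    return {**DEFAULTS, **ROUTE_OVERRIDES.get(route, {})}

print(limits_for("/upload")["body_in_mb"])  # 25
print(limits_for("/chat")["body_in_mb"])    # falls back to the default, 5
```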
Copy-paste validator for CI
Add a tiny test that fails builds if you exceed the smallest limit across all active deployments.
Inputs:
- ROUTES.json with {route, path, needs_stream, max_req_mb, max_resp_mb, target_p95_ms}
- LIMITS.json per environment as in the schema above
Check:
1) For each route, timeout_budget = min(all provider timeout_ms) - drain_safety_ms
Assert target_p95_ms ≤ 0.7 * timeout_budget
2) Assert route.max_req_mb ≤ min(body_in_mb across providers) * 0.7
3) Assert route.max_resp_mb ≤ min(body_out_mb across providers) * 0.7
4) If needs_stream == true → assert all providers sse_supported == true and stream_idle_ms ≥ target_window
5) If concurrency soft < expected peak → require provisioned capacity plan
Fail with a one-line tip including the smallest limiting provider.
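A minimal runnable version of those checks might look like this. The `drain_safety_ms` default and the message format are assumptions; the 0.7 headroom factor matches the acceptance targets above:

```python
# Sketch of the CI validator. drain_safety_ms and the message format are
# assumptions; the 0.7 headroom factor matches the acceptance targets above.

def validate(routes: list[dict], limits: list[dict], drain_safety_ms: int = 2000) -> list[str]:
    """Return a list of one-line failures naming the smallest limiting provider."""
    errors = []
    min_timeout = min(limits, key=lambda p: p["timeout_ms"])
    min_body_in = min(limits, key=lambda p: p["body_in_mb"])
    min_body_out = min(limits, key=lambda p: p["body_out_mb"])
    timeout_budget = min_timeout["timeout_ms"] - drain_safety_ms

    for r in routes:
        if r["target_p95_ms"] > 0.7 * timeout_budget:
            errors.append(f'{r["route"]}: p95 over budget, limited by {min_timeout["provider"]}')
        if r["max_req_mb"] > 0.7 * min_body_in["body_in_mb"]:
            errors.append(f'{r["route"]}: request too large, limited by {min_body_in["provider"]}')
        if r["max_resp_mb"] > 0.7 * min_body_out["body_out_mb"]:
            errors.append(f'{r["route"]}: response too large, limited by {min_body_out["provider"]}')
        if r.get("needs_stream") and not all(p["sse_supported"] for p in limits):
            errors.append(f'{r["route"]}: needs SSE but a provider does not support it')
    return errors

routes = [{"route": "/chat", "target_p95_ms": 15000, "max_req_mb": 1,
           "max_resp_mb": 2, "needs_stream": True}]
limits = [{"provider": "aws_lambda", "timeout_ms": 30000, "body_in_mb": 6,
           "body_out_mb": 6, "sse_supported": True},
          {"provider": "cloudflare", "timeout_ms": 30000, "body_in_mb": 10,
           "body_out_mb": 10, "sse_supported": True}]
print(validate(routes, limits))  # [] when every route fits inside the minimums
```

Run this in CI against every active environment file so the build fails before a deploy ever hits the wall.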
Typical failure patterns → exact fix
- 413 Payload Too Large on uploads or tool schemas. Shrink the request body. Move oversized JSON schemas to a versioned blob and reference them by id. Open: Data Contracts
- 431 Request Header Fields Too Large. Headers grew due to long auth tokens or excessive trace keys. Collapse them into a single compact header. Open: Prompt Injection to lock the tool schema and avoid header bloat.
- Streams end at a fixed wall clock. You hit an edge or LB idle timeout. Send heartbeat frames every N seconds and lower per-frame payload to keep buffers small. Open: Timeouts and Streaming Limits
- JSON mode breaks on long citations. The response body exceeded a proxy frame, or you buffered the whole stream. Switch to chunked JSON lines with cite-then-explain and a strict schema. Open: Retrieval Playbook
- Burst traffic returns 429 despite “unlimited” concurrency. Account or regional soft limits apply. Pre-warm with provisioned capacity and gate at the queue producer using idempotency keys. Open: Cold Start and Concurrency · KV and Queues
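For the idle-timeout case, a heartbeat can be sketched as SSE comment frames interleaved with data frames. The interval and frame shape are assumptions:

```python
import time

# Sketch: interleave SSE heartbeat comments with data frames so edge/LB idle
# timers never see a silent connection. Interval and frame shape are assumptions.

def sse_frames(chunks, heartbeat_every_s: float = 15.0, now=time.monotonic):
    """Yield SSE data frames, inserting ": ping" comments during long gaps."""
    last_sent = now()
    for chunk in chunks:
        if now() - last_sent >= heartbeat_every_s:
            yield ": ping\n\n"          # comment frame, ignored by SSE clients
            last_sent = now()
        yield f"data: {chunk}\n\n"
        last_sent = now()

for frame in sse_frames(["hello", "world"]):
    print(frame, end="")
```

Lines beginning with `:` are comments in the SSE format, so clients skip them while the proxy's idle timer still sees bytes on the wire.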
Pattern: budget table you can paste in a repo
Create infra/limits/<env>.json and keep it close to code. Example skeleton for a dual-provider rollout:
{
  "env": "prod-2025-08-27",
  "providers": [
    {
      "provider": "aws_lambda",
      "region": "ap-southeast-1",
      "timeout_ms": 30000,
      "stream_idle_ms": 120000,
      "concurrency": { "soft": 150, "burst": 500 },
      "memory_mb": 1024,
      "tmp_storage_mb": 512,
      "body_in_mb": 6,
      "body_out_mb": 6,
      "headers_kb": 8,
      "ws_supported": false,
      "sse_supported": true,
      "retry_semantics": "at_least_once",
      "notes": "ALB behind API GW, drain on deploy"
    },
    {
      "provider": "cloudflare",
      "region": "global-anycast",
      "timeout_ms": 30000,
      "stream_idle_ms": 120000,
      "concurrency": { "soft": 1000, "burst": 5000 },
      "memory_mb": 256,
      "tmp_storage_mb": 0,
      "body_in_mb": 10,
      "body_out_mb": 10,
      "headers_kb": 8,
      "ws_supported": true,
      "sse_supported": true,
      "retry_semantics": "at_most_once",
      "notes": "Edge streams, pin region on long flows"
    }
  ]
}
Then let CI compute the minimums and fail fast if your routes exceed them.
Observability you should add
- Per-route histograms for payload in/out.
- Count of 413/414/431 and which hop returned them.
- Stream lifetime distribution and idle gaps.
- Cold start count and warm pool utilization.
- Queue age and rejection rate when concurrency caps.
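A minimal in-process version of the payload histogram and the 4xx-by-hop counter might look like this. The bucket edges and label names are assumptions; in production you would wire these into your real metrics exporter:

```python
import bisect
from collections import Counter

# Sketch: per-route payload histogram plus a counter for 413/414/431 by hop.
# Bucket edges and label names are assumptions; wire real exporters in prod.
BUCKETS_KB = [1, 8, 64, 256, 1024, 6144]  # last bucket near a 6 MB body limit

payload_hist: Counter = Counter()   # (route, direction, bucket_kb) -> count
reject_hist: Counter = Counter()    # (status, hop) -> count

def observe_payload(route: str, direction: str, size_kb: float) -> None:
    i = bisect.bisect_left(BUCKETS_KB, size_kb)
    bucket = BUCKETS_KB[i] if i < len(BUCKETS_KB) else float("inf")
    payload_hist[(route, direction, bucket)] += 1

def observe_reject(status: int, hop: str) -> None:
    if status in (413, 414, 431):
        reject_hist[(status, hop)] += 1

observe_payload("/chat", "out", 120.0)
observe_reject(431, "edge")
print(payload_hist[("/chat", "out", 256)], reject_hist[(431, "edge")])  # 1 1
```

Recording which hop returned the rejection is the part most teams skip, and it is exactly what tells you whether to fix the edge, the gateway, or the function.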
Verification
- Synthetic routes produce zero 413/431.
- Stream probes survive at least 2× your target window.
- p95 stays below 70% of the smallest timeout limit after rollout.
- RAG probe set shows no truncation of citations.
When to escalate
- Repeated 502/504 with flat app CPU. You are hitting proxy timeouts. Lower the hop count or raise the edge timeout where possible; otherwise split the plan into shorter calls.
- Body out is the wall. Switch to paged responses or chunked JSON lines with a cite index.
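The chunked JSON lines escape hatch can be sketched as a generator that emits one small record per line, with a trailing cite index. The field names are assumptions:

```python
import json

# Sketch: page a large answer as JSON Lines, each record small enough to fit
# under body_out limits, with a trailing cite index. Field names are assumptions.

def jsonl_pages(answer_parts, citations, max_part_chars: int = 2000):
    for i, part in enumerate(answer_parts):
        for j in range(0, len(part), max_part_chars):
            yield json.dumps({"seq": i, "text": part[j:j + max_part_chars]}) + "\n"
    yield json.dumps({"cite_index": citations}) + "\n"

pages = list(jsonl_pages(["short answer"], [{"id": "doc-1", "span": "p2"}]))
print(len(pages))  # 2: one text record plus the cite index
```

Each line parses on its own, so a truncated stream still yields every complete record received so far instead of one unparseable blob.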
🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
|---|---|---|
| WFGY 1.0 PDF | Engine Paper | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + <your question>” |
| TXT OS (plain-text OS) | TXTOS.txt | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly |
Explore More
| Layer | Page | What it’s for |
|---|---|---|
| ⭐ Proof | WFGY Recognition Map | External citations, integrations, and ecosystem proof |
| ⚙️ Engine | WFGY 1.0 | Original PDF tension engine and early logic sketch (legacy reference) |
| ⚙️ Engine | WFGY 2.0 | Production tension kernel for RAG and agent systems |
| ⚙️ Engine | WFGY 3.0 | TXT-based Singularity tension engine (131 S-class set) |
| 🗺️ Map | Problem Map 1.0 | Flagship 16 problem RAG failure taxonomy and fix map |
| 🗺️ Map | Problem Map 2.0 | Global Debug Card for RAG and agent pipeline diagnosis |
| 🗺️ Map | Problem Map 3.0 | Global AI troubleshooting atlas and failure pattern map |
| 🧰 App | TXT OS | .txt semantic OS with fast bootstrap |
| 🧰 App | Blah Blah Blah | Abstract and paradox Q&A built on TXT OS |
| 🧰 App | Blur Blur Blur | Text to image generation with semantic control |
| 🏡 Onboarding | Starter Village | Guided entry point for new users |
If this repository helped, starring it improves discovery so more builders can find the docs and tools.