
Serverless Limits Matrix and Safe Budgets

🧭 Quick Return to Map

You are in a sub-page of Cloud_Serverless.
To reorient, go back here:

Think of this page as a desk within a ward.
If you need the full triage and all prescriptions, return to the Emergency Room lobby.

A compact cheat-sheet to keep your RAG and agent flows inside the real limits of each serverless stack. Use this to set safe budgets for time, payload, memory, concurrency, and streaming so you never ship a workflow that silently truncates or times out.

When to use this page

  • “Works locally” but 413/414/431/502 appear in cloud.
  • Streams cut around 55–60 seconds even though your function timeout says 300.
  • Vector writes fail only under load due to soft concurrency ceilings.
  • JSON mode breaks on large tool outputs or long citations.
  • Cold starts spike latency after deploy or scale-out.

Open these first

Acceptance targets

  • Effective timeout budget computed and enforced at build time. No 5xx raised by upstream timeouts in synthetic probes.
  • p95 response size ≤ 70% of smallest body limit across the path.
  • Zero occurrences of 413/414/431 in production after rollout.
  • For RAG: no citation truncation, ΔS(question, retrieved) ≤ 0.45 on the probe set.

Compute your real budgets first

Rule of minimums: the effective limit is the strictest element across the full path.

effective_timeout  = min( client_timeout,
                           edge/LB_timeout,
                           gateway_timeout,
                           function_timeout,
                           upstream_model_timeout )

effective_body_in  = min( client_upload_limit,
                           edge_header_limit + body_limit,
                           gateway_body_limit,
                           function_memory_and_tmp,
                           parser/frame limits )

effective_body_out = min( upstream_chunking,
                           function_response_limit,
                           compression_window,
                           edge/LB_response_limit,
                           client_reader_limit )
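
Under these definitions, each budget is just a min over the hops. A minimal sketch, with illustrative hop names and numbers rather than real platform defaults:

```python
# Rule of minimums: the effective budget is the strictest hop on the path.
# All hop names and values below are illustrative, not platform defaults.

def effective_budget(limits: dict) -> int:
    """Return the strictest limit across all hops on the path."""
    return min(limits.values())

timeouts_ms = {
    "client": 60_000,
    "edge_lb": 60_000,
    "gateway": 29_000,        # often the silent winner
    "function": 300_000,      # the "300 seconds" your config promises
    "upstream_model": 120_000,
}

print(effective_budget(timeouts_ms))  # 29000, the gateway wins, not the function
```

Run the same helper over body-in and body-out limits, then enforce the results at build time rather than discovering them in production.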

Quick mapping tips

  • Headers often cap at 8–16 KB. Large auth payloads or tool schemas can hit 431. Collapse headers and move contracts into the body.
  • Many edges proxy WebSocket and SSE differently. Budget for the stricter one when you stream.
  • Body limits interact with memory. If your platform ties CPU to memory, raising memory may fix both CPU starvation and output buffering.
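
Collapsing headers, as suggested above, can mean packing scattered metadata into one compact value. A hypothetical sketch (the `x-ctx` header name and the fields are assumptions):

```python
import base64
import json

# Hypothetical sketch: collapse many small trace/auth headers into a single
# compact header so the total stays well under a typical 8 KB cap.
def collapse_headers(meta: dict) -> dict:
    packed = base64.urlsafe_b64encode(
        json.dumps(meta, separators=(",", ":")).encode()
    ).decode()
    return {"x-ctx": packed}

headers = collapse_headers({"trace_id": "abc123", "tenant": "t-42", "tool_rev": 7})
assert len(headers["x-ctx"]) < 8 * 1024  # stays inside a typical header cap
```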

The limits you must record per platform

Track these keys for each provider you deploy to. Do not hardcode numbers inside app code. Read from env or a small JSON and ship it with your release.

{
  "provider": "cloudflare|vercel|aws_lambda|gcp_cf|cloud_run|azure_f|fly|netlify",
  "region": "ap-southeast-1",
  "timeout_ms": 30000,
  "stream_idle_ms": 120000,
  "concurrency": {
    "soft": 100,
    "burst": 500,
    "account_ceiling_hint": "ticket required to raise beyond soft"
  },
  "memory_mb": 1024,
  "tmp_storage_mb": 512,
  "body_in_mb": 5,
  "body_out_mb": 6,
  "headers_kb": 8,
  "ws_supported": true,
  "sse_supported": true,
  "retry_semantics": "at_least_once|at_most_once",
  "notes": "anycast edge, drains needed for region flips"
}

Store per-route overrides. For example, upload routes differ from chat routes.
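
Per-route overrides can be a plain merge over the provider-wide defaults. A minimal sketch with hypothetical routes and values:

```python
# Hypothetical sketch: route-level values win over the provider-wide
# defaults recorded in LIMITS.json.
DEFAULTS = {"timeout_ms": 30000, "body_in_mb": 5, "body_out_mb": 6}
ROUTE_OVERRIDES = {
    "/upload": {"body_in_mb": 50},     # upload routes need bigger request bodies
    "/chat":   {"timeout_ms": 120000}, # chat routes need longer timeouts
}

def limits_for(route: str) -> dict:
    # Later dict wins on key collisions, so overrides shadow defaults.
    return {**DEFAULTS, **ROUTE_OVERRIDES.get(route, {})}
```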


Copy-paste validator for CI

Add a tiny test that fails builds if you exceed the smallest limit across all active deployments.

Inputs:
- ROUTES.json with {route, path, needs_stream, max_req_mb, max_resp_mb, target_p95_ms}
- LIMITS.json per environment as in the schema above

Check:
1) For each route, timeout_budget = min(all provider timeout_ms) - drain_safety_ms
   Assert target_p95_ms ≤ 0.7 * timeout_budget
2) Assert route.max_req_mb ≤ min(body_in_mb across providers) * 0.7
3) Assert route.max_resp_mb ≤ min(body_out_mb across providers) * 0.7
4) If needs_stream == true → assert all providers sse_supported == true and stream_idle_ms ≥ target_window
5) If concurrency soft < expected peak → require provisioned capacity plan

Fail with a one-line tip including the smallest limiting provider.
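
The five checks above can be sketched in a few lines of Python; `DRAIN_SAFETY_MS` and the exact error wording are assumptions, adapt them to your repo:

```python
# Minimal sketch of the CI check described above. In real CI, load
# ROUTES.json and LIMITS.json, call validate, and fail the build if
# the returned list is non-empty.
DRAIN_SAFETY_MS = 2_000  # assumption: safety margin for connection drains

def validate(routes: list, providers: list) -> list:
    """Return one-line errors naming the smallest limiting provider."""
    errors = []
    timeout_budget = min(p["timeout_ms"] for p in providers) - DRAIN_SAFETY_MS
    body_in = min(p["body_in_mb"] for p in providers)
    body_out = min(p["body_out_mb"] for p in providers)
    limiting = min(providers, key=lambda p: p["timeout_ms"])["provider"]
    for r in routes:
        if r["target_p95_ms"] > 0.7 * timeout_budget:
            errors.append(f"{r['route']}: p95 target > 70% of {timeout_budget} ms budget (limiting: {limiting})")
        if r["max_req_mb"] > 0.7 * body_in:
            errors.append(f"{r['route']}: request body > 70% of {body_in} MB")
        if r["max_resp_mb"] > 0.7 * body_out:
            errors.append(f"{r['route']}: response body > 70% of {body_out} MB")
        if r.get("needs_stream") and not all(p["sse_supported"] for p in providers):
            errors.append(f"{r['route']}: a provider on the path lacks SSE")
    return errors
```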

Typical failure patterns → exact fix

  • 413 Payload Too Large on uploads or tool schemas. Shrink the request body. Move oversized JSON schemas to a versioned blob and reference them by id. Open: Data Contracts

  • 431 Request Header Fields Too Large. Headers grew from long auth tokens or excessive trace keys. Collapse them into a single compact header. Open: Prompt Injection to lock the tool schema and avoid header bloat.

  • Streams end at a fixed wall clock. You hit an edge or LB idle timeout. Send heartbeat frames every N seconds and lower the per-frame payload to keep buffers small. Open: Timeouts and Streaming Limits

  • JSON mode breaks on long citations. The response body exceeded a proxy frame, or you buffered the whole stream. Switch to chunked JSON lines with cite-then-explain and a strict schema. Open: Retrieval Playbook

  • Burst traffic returns 429 despite “unlimited” concurrency. Account or regional soft limits still apply. Pre-warm with provisioned capacity and gate at the queue producer using idempotency keys. Open: Cold Start and Concurrency · KV and Queues
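
For the fixed-wall-clock stream failure, heartbeat frames can be interleaved like this. A sketch; the 15-second interval is an assumption, keep it well under the strictest stream_idle_ms on your path:

```python
import time

# Hypothetical sketch: interleave SSE comment frames (": ping") so an
# idle-timeout proxy never sees a silent connection during long generations.
HEARTBEAT_S = 15  # assumption: well under the strictest stream_idle_ms

def sse_stream(chunks, heartbeat_s=HEARTBEAT_S, clock=time.monotonic):
    """Yield SSE frames, inserting a comment heartbeat when idle too long."""
    last = clock()
    for chunk in chunks:
        if clock() - last >= heartbeat_s:
            yield ": ping\n\n"  # comment frame, ignored by EventSource clients
            last = clock()
        yield f"data: {chunk}\n\n"
    yield "data: [DONE]\n\n"
```

The `clock` parameter exists so the heartbeat logic is testable without real waiting.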


Pattern: budget table you can paste in a repo

Create infra/limits/<env>.json and keep it close to code. Example skeleton for a dual-provider rollout:

{
  "env": "prod-2025-08-27",
  "providers": [
    {
      "provider": "aws_lambda",
      "region": "ap-southeast-1",
      "timeout_ms": 30000,
      "stream_idle_ms": 120000,
      "concurrency": { "soft": 150, "burst": 500 },
      "memory_mb": 1024,
      "tmp_storage_mb": 512,
      "body_in_mb": 6,
      "body_out_mb": 6,
      "headers_kb": 8,
      "ws_supported": false,
      "sse_supported": true,
      "retry_semantics": "at_least_once",
      "notes": "ALB behind API GW, drain on deploy"
    },
    {
      "provider": "cloudflare",
      "region": "global-anycast",
      "timeout_ms": 30000,
      "stream_idle_ms": 120000,
      "concurrency": { "soft": 1000, "burst": 5000 },
      "memory_mb": 256,
      "tmp_storage_mb": 0,
      "body_in_mb": 10,
      "body_out_mb": 10,
      "headers_kb": 8,
      "ws_supported": true,
      "sse_supported": true,
      "retry_semantics": "at_most_once",
      "notes": "Edge streams, pin region on long flows"
    }
  ]
}

Then let CI compute the minimums and fail fast if your routes exceed them.


Observability you should add

  • Per-route histograms for payload in/out.
  • Count of 413/414/431 and which hop returned them.
  • Stream lifetime distribution and idle gaps.
  • Cold start count and warm pool utilization.
  • Queue age and rejection rate when concurrency caps.

Verification

  • Synthetic routes produce zero 413/431.
  • Stream probes survive at least 2× your target window.
  • p95 stays below 70% of the smallest timeout limit after rollout.
  • RAG probe set shows no truncation of citations.

When to escalate

  • Repeated 502/504 with flat app CPU. You are hitting proxy timeouts. Lower hop count or raise edge timeout where possible, otherwise split the plan.
  • Body out is the wall. Switch to paged responses or chunked JSON lines with a cite index.
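
Chunked JSON lines with a cite index can be as simple as a generator that emits the citations first, so no single response body has to clear the smallest body_out limit. A hypothetical sketch:

```python
import json

# Hypothetical sketch: page a large answer as JSON Lines, cite-then-explain,
# so each emitted line stays far below the smallest body_out limit.
def jsonl_pages(citations, explanation_chunks):
    yield json.dumps({"type": "cite_index", "citations": citations}) + "\n"
    for i, chunk in enumerate(explanation_chunks):
        yield json.dumps({"type": "chunk", "seq": i, "text": chunk}) + "\n"
    yield json.dumps({"type": "end"}) + "\n"
```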

🔗 Quick-Start Downloads (60 sec)

Tool Link 3-Step Setup
WFGY 1.0 PDF Engine Paper 1 Download · 2 Upload to your LLM · 3 Ask “Answer using WFGY + <your question>”
TXT OS (plain-text OS) TXTOS.txt 1 Download · 2 Paste into any LLM chat · 3 Type “hello world” — OS boots instantly

Explore More

Layer Page What it's for
Proof WFGY Recognition Map External citations, integrations, and ecosystem proof
⚙️ Engine WFGY 1.0 Original PDF tension engine and early logic sketch (legacy reference)
⚙️ Engine WFGY 2.0 Production tension kernel for RAG and agent systems
⚙️ Engine WFGY 3.0 TXT based Singularity tension engine (131 S class set)
🗺️ Map Problem Map 1.0 Flagship 16 problem RAG failure taxonomy and fix map
🗺️ Map Problem Map 2.0 Global Debug Card for RAG and agent pipeline diagnosis
🗺️ Map Problem Map 3.0 Global AI troubleshooting atlas and failure pattern map
🧰 App TXT OS .txt semantic OS with fast bootstrap
🧰 App Blah Blah Blah Abstract and paradox Q&A built on TXT OS
🧰 App Blur Blur Blur Text to image generation with semantic control
🏡 Onboarding Starter Village Guided entry point for new users

If this repository helped, starring it improves discovery so more builders can find the docs and tools.