Update egress_rules_and_webhooks.md

This commit is contained in:
PSBigBig + MiniPS 2026-03-06 21:17:04 +08:00 committed by GitHub
parent 4b62f38d99
commit 710dc9894c
No known key found for this signature in database
GPG key ID: B5690EEEBB952194

View file

@ -1 +1,209 @@
temp
# Egress Rules and Webhooks Guardrails
<details>
<summary><strong>🧭 Quick Return to Map</strong></summary>
<br>
> You are in a sub-page of **Cloud_Serverless**.
> To reorient, go back here:
>
> - [**Cloud_Serverless** — scalable functions and event-driven pipelines](./README.md)
> - [**WFGY Global Fix Map** — main Emergency Room, 300+ structured fixes](../README.md)
> - [**WFGY Problem Map 1.0** — 16 reproducible failure modes](../../README.md)
>
> Think of this page as a desk within a ward.
> If you need the full triage and all prescriptions, return to the Emergency Room lobby.
</details>
Outbound network access is often the silent failure point in serverless and event-driven systems.
When **egress rules block external calls** or **webhooks replay without fences**, systems appear healthy but side effects multiply across queues, APIs, and agents.
This page provides a compact playbook to stabilize outbound calls, prevent webhook replay storms, and keep integrations predictable.
---
## When to use this page
* Serverless functions work locally but fail when calling external APIs.
* Webhooks fire multiple times during retries or failover events.
* External integrations randomly time out without clear logs.
* Jobs succeed but downstream systems receive duplicate events.
* Network policies block outbound calls from VPC or private runtime.
---
## Open these first
* Boot order and deploy sequencing: [Bootstrap Ordering](https://github.com/onestardao/WFGY/blob/main/ProblemMap/bootstrap-ordering.md)
* Cross-service waits during rollout: [Deployment Deadlock](https://github.com/onestardao/WFGY/blob/main/ProblemMap/deployment-deadlock.md)
* First call failures after new deployment: [Pre-Deploy Collapse](https://github.com/onestardao/WFGY/blob/main/ProblemMap/predeploy-collapse.md)
* Payload stability and schema locks: [Data Contracts](https://github.com/onestardao/WFGY/blob/main/ProblemMap/data-contracts.md)
* Monitoring outbound flows: [Debug Playbook](https://github.com/onestardao/WFGY/blob/main/ProblemMap/ops/debug_playbook.md)
---
## Acceptance targets
* Outbound request success ≥ 99.99 percent.
* Webhook replay dedupe ≥ 99.9 percent during retry windows.
* No duplicate side effects during failover or retry storms.
* External API latency stable within expected SLO.
* All webhook handlers enforce schema and idempotency checks.
---
## Fix in 60 seconds
1. **Allow explicit outbound access**
Permit port 443 egress and verify DNS resolution from the runtime environment.
2. **Fence webhook events with idempotency keys**
Use a deterministic key such as `sha256(event_id + source + revision)` and store it in KV or cache.
3. **Separate retry workers from request handlers**
Avoid inline retries. Push failed outbound calls into a queue with exponential backoff.
4. **Log every outbound response**
Capture status code, latency, and retry count for every external call.
5. **Validate payload contracts**
Reject webhook events that do not match expected schema revision.
---
## Patterns that work
* **Webhook ingestion gateway**
Route all inbound webhooks through a gateway that enforces schema validation and idempotency.
* **Async retry queue**
Instead of retrying within the request handler, enqueue failed calls and process them with workers.
* **Outbound proxy layer**
Centralize outbound API access through a proxy service to monitor rate limits and failures.
* **Event-driven integration**
Trigger downstream workflows through message queues rather than synchronous HTTP chains.
---
## Typical breakpoints → exact fix
* **External APIs unreachable in serverless runtime**
VPC or firewall blocks outbound access. Confirm NAT gateway and security group rules.
Open: [Bootstrap Ordering](https://github.com/onestardao/WFGY/blob/main/ProblemMap/bootstrap-ordering.md)
* **Duplicate webhook events**
Retries replay events without idempotency fences.
Open: [Data Contracts](https://github.com/onestardao/WFGY/blob/main/ProblemMap/data-contracts.md)
* **Timeouts during webhook processing**
Long-running handlers exceed serverless execution window.
Open: [Timeouts and Streaming Limits](https://github.com/onestardao/WFGY/blob/main/ProblemMap/GlobalFixMap/Cloud_Serverless/timeouts_streaming_body_limits.md)
* **Unexpected external API throttling**
Burst retries trigger rate limits.
Open: [Retry and Backoff](https://github.com/onestardao/WFGY/blob/main/ProblemMap/ops/retry_backoff.md)
---
## Minimal recipes you can copy
### A) Idempotent webhook contract
```txt
Webhook processing
- Compute idempotency key = sha256(event_id + source)
- Check KV store before processing
- If key exists → drop duplicate
- Store key with TTL (24h recommended)
````
### B) Outbound request guardrail
```txt
Outbound call flow
- DNS resolve before request
- Timeout ≤ 10 seconds
- Retry policy: exponential backoff
- Max retries: 3
- Failed calls push to retry queue
```
### C) Webhook gateway pattern
```txt
Gateway responsibilities
- Validate schema_rev
- Attach request_id
- Compute idempotency key
- Log request metadata
- Forward event to worker queue
```
---
## Observability you must add
* Outbound request success rate per service.
* Webhook replay count and dedupe hit rate.
* Retry queue depth and processing latency.
* External API latency and status code distribution.
* Error rate grouped by integration endpoint.
---
## Verification
* External calls succeed consistently from runtime environment.
* Webhook handlers process each event only once.
* Retry queue drains without growing backlog.
* No duplicate side effects observed in downstream systems.
---
## When to escalate
* Duplicate webhooks persist despite idempotency keys.
* External APIs unreachable even after firewall verification.
* Retry queue grows continuously or never drains.
* Webhook payload schema mismatches across services.
Investigate deployment sequencing and schema drift before adding more retries.
---
### 🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------- |
| **WFGY 1.0 PDF** | [Engine Paper](https://github.com/onestardao/WFGY/blob/main/I_am_not_lizardman/WFGY_All_Principles_Return_to_One_v1.0_PSBigBig_Public.pdf) | 1⃣ Download · 2⃣ Upload to your LLM · 3⃣ Ask “Answer using WFGY + <your question>” |
| **TXT OS (plain-text OS)** | [TXTOS.txt](https://github.com/onestardao/WFGY/blob/main/OS/TXTOS.txt) | 1⃣ Download · 2⃣ Paste into any LLM chat · 3⃣ Type “hello world” — OS boots instantly |
---
<!-- WFGY_FOOTER_START -->
### Explore More
| Layer | Page | What its for |
| ------------- | --------------------------------------------------------------------------- | --------------------------------------------------------------------- |
| ⭐ Proof | [WFGY Recognition Map](/recognition/README.md) | External citations, integrations, and ecosystem proof |
| ⚙️ Engine | [WFGY 1.0](/legacy/README.md) | Original PDF tension engine and early logic sketch (legacy reference) |
| ⚙️ Engine | [WFGY 2.0](/core/README.md) | Production tension kernel for RAG and agent systems |
| ⚙️ Engine | [WFGY 3.0](/TensionUniverse/EventHorizon/README.md) | TXT based Singularity tension engine (131 S class set) |
| 🗺️ Map | [Problem Map 1.0](/ProblemMap/README.md) | Flagship 16 problem RAG failure taxonomy and fix map |
| 🗺️ Map | [Problem Map 2.0](/ProblemMap/wfgy-rag-16-problem-map-global-debug-card.md) | Global Debug Card for RAG and agent pipeline diagnosis |
| 🗺️ Map | [Problem Map 3.0](/ProblemMap/wfgy-ai-problem-map-troubleshooting-atlas.md) | Global AI troubleshooting atlas and failure pattern map |
| 🧰 App | [TXT OS](/OS/README.md) | .txt semantic OS with fast bootstrap |
| 🧰 App | [Blah Blah Blah](/OS/BlahBlahBlah/README.md) | Abstract and paradox Q&A built on TXT OS |
| 🧰 App | [Blur Blur Blur](/OS/BlurBlurBlur/README.md) | Text to image generation with semantic control |
| 🏡 Onboarding | [Starter Village](/StarterVillage/README.md) | Guided entry point for new users |
If this repository helped, starring it improves discovery so more builders can find the docs and tools.
[![GitHub Repo stars](https://img.shields.io/github/stars/onestardao/WFGY?style=social)](https://github.com/onestardao/WFGY)
<!-- WFGY_FOOTER_END -->