# Environment Bootstrap and Migrations Guardrails
🧭 Quick Return to Map
> You are in a sub-page of **Cloud_Serverless**.
> To reorient, go back here:
>
> - [**Cloud_Serverless** — scalable functions and event-driven pipelines](./README.md)
> - [**WFGY Global Fix Map** — main Emergency Room, 300+ structured fixes](../README.md)
> - [**WFGY Problem Map 1.0** — 16 reproducible failure modes](../../README.md)
>
> Think of this page as a desk within a ward.
> If you need the full triage and all prescriptions, return to the Emergency Room lobby.
Modern cloud systems rarely fail because of code alone.
Most incidents happen when a **new environment boots incorrectly**, or when **database / schema migrations run at the wrong time**.
When environments initialize out of order, migrations can race with services, schemas drift between regions, and agents read partially upgraded data.
This page provides guardrails for safe environment bootstrap and predictable migration workflows in serverless and event-driven systems.
---
## When to use this page
* New deployments fail on the first request but work after retry.
* Database schema mismatches appear after a rollout.
* Services start before dependencies are ready.
* Jobs run migrations twice or skip them entirely.
* A new region or environment returns inconsistent responses.
---
## Open these first
* Boot order and deploy sequencing:
[Bootstrap Ordering](https://github.com/onestardao/WFGY/blob/main/ProblemMap/bootstrap-ordering.md)
* Cross-service waits during rollout:
[Deployment Deadlock](https://github.com/onestardao/WFGY/blob/main/ProblemMap/deployment-deadlock.md)
* First request fails after a deploy:
[Pre-Deploy Collapse](https://github.com/onestardao/WFGY/blob/main/ProblemMap/predeploy-collapse.md)
* Payload contracts and schema stability:
[Data Contracts](https://github.com/onestardao/WFGY/blob/main/ProblemMap/data-contracts.md)
* Retrieval correctness after index rebuild:
[Retrieval Playbook](https://github.com/onestardao/WFGY/blob/main/ProblemMap/retrieval-playbook.md)
---
## Acceptance targets
* Environment bootstrap completes without manual retries.
* Migrations execute exactly once per revision.
* Schema versions consistent across all regions.
* No service reads data from a partially migrated schema.
* Migration runtime predictable and observable.
For RAG stacks:
* ΔS(question, retrieved) drift ≤ 0.03 after reindex.
* Index metadata identical across environments before traffic.
---
## Fix in 60 seconds
1. **Separate bootstrap from service startup**
Infrastructure initialization, secrets loading, and migrations should run before application services accept traffic.
2. **Run migrations as a controlled job**
Execute migrations through a dedicated job runner or CI pipeline rather than inside request handlers.
3. **Version everything**
Track `schema_rev`, `index_hash`, and `release_id`.
Services refuse to start if incompatible versions are detected.
4. **Gate traffic after bootstrap**
Block user traffic until health probes confirm:
* schema ready
* secrets loaded
* index parity verified
5. **Record migration state**
Store migration history in a durable table so jobs cannot re-run completed migrations.
---
## Patterns that work
* **Migration job runner**
Use a scheduled job or container task to execute migrations before service rollout.
* **Immutable environment revisions**
Each deploy produces a new revision with explicit schema and index versions.
* **Schema compatibility windows**
Design migrations so old services can still read during rollout.
* **Bootstrap contract checks**
Health probes validate schema version, secrets version, and index hash before allowing traffic.
---
## Typical breakpoints → exact fix
* **Service starts before migration finishes**
Boot sequence incorrect.
Gate startup until migration completion.
Open:
[Bootstrap Ordering](https://github.com/onestardao/WFGY/blob/main/ProblemMap/bootstrap-ordering.md)
---
* **Migration runs twice**
Retry job executes again without state tracking.
Add migration history table and idempotent scripts.
Open:
[Data Contracts](https://github.com/onestardao/WFGY/blob/main/ProblemMap/data-contracts.md)
---
* **Schema mismatch between regions**
Migration ran only in primary region.
Replicate migration workflow or rebuild schema in each region.
Open:
[Multi-Region and Failover Routing](./multi_region_and_failover_routing.md)
---
* **RAG index corrupted after deploy**
Index rebuilt during live traffic.
Gate queries until index parity verified.
Open:
[Retrieval Playbook](https://github.com/onestardao/WFGY/blob/main/ProblemMap/retrieval-playbook.md)
---
## Minimal recipes you can copy
### A) Migration job contract
```txt
Migration workflow
- revision: r2025-08-30
- schema_rev: sc-21
Steps
1. acquire migration lock
2. run migration scripts
3. verify schema version
4. release lock
5. record revision in migration_history
````
---
### B) Environment bootstrap gate
```txt
Startup gate conditions
- secrets_rev matches expected version
- schema_rev compatible with service
- index_hash equal across nodes
- health probes return OK
Only then enable user traffic.
```
---
### C) Migration history table
```txt
Table: migration_history
Columns
- revision_id
- applied_at
- checksum
- operator
Rule
- reject duplicate revision_id
- migrations run strictly in order
```
---
## Observability you must add
* Migration duration and success rate.
* Schema version per environment.
* Bootstrap failure counts.
* Deployment revision vs schema revision mismatch.
* Index parity checks during rollout.
---
## Verification
* Environment boots without retries.
* Migration runs exactly once per revision.
* All services report identical schema version.
* No errors appear during first request after deploy.
---
## When to escalate
* Schema mismatches continue after migration replay.
* Services boot successfully but fail on first request.
* Migration locks remain active indefinitely.
* Index rebuild causes retrieval drift.
Investigate deploy sequencing, schema compatibility design, and environment bootstrap contracts.
---
### 🔗 Quick-Start Downloads (60 sec)
| Tool | Link | 3-Step Setup |
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------- |
| **WFGY 1.0 PDF** | [Engine Paper](https://github.com/onestardao/WFGY/blob/main/I_am_not_lizardman/WFGY_All_Principles_Return_to_One_v1.0_PSBigBig_Public.pdf) | 1️⃣ Download · 2️⃣ Upload to your LLM · 3️⃣ Ask “Answer using WFGY + \” |
| **TXT OS (plain-text OS)** | [TXTOS.txt](https://github.com/onestardao/WFGY/blob/main/OS/TXTOS.txt) | 1️⃣ Download · 2️⃣ Paste into any LLM chat · 3️⃣ Type “hello world” — OS boots instantly |
---
### Explore More
| Layer | Page | What it’s for |
| ------------- | --------------------------------------------------------------------------- | ----------------------------------- |
| ⭐ Proof | [WFGY Recognition Map](/recognition/README.md) | External citations and integrations |
| ⚙️ Engine | [WFGY 1.0](/legacy/README.md) | Original tension engine |
| ⚙️ Engine | [WFGY 2.0](/core/README.md) | Production reasoning kernel |
| ⚙️ Engine | [WFGY 3.0](/TensionUniverse/EventHorizon/README.md) | TXT Singularity engine |
| 🗺️ Map | [Problem Map 1.0](/ProblemMap/README.md) | 16 reproducible failure modes |
| 🗺️ Map | [Problem Map 2.0](/ProblemMap/wfgy-rag-16-problem-map-global-debug-card.md) | Global Debug Card |
| 🗺️ Map | [Problem Map 3.0](/ProblemMap/wfgy-ai-problem-map-troubleshooting-atlas.md) | AI troubleshooting atlas |
| 🧰 App | [TXT OS](/OS/README.md) | Plain-text semantic OS |
| 🏡 Onboarding | [Starter Village](/StarterVillage/README.md) | Guided entry |
If this repository helped, starring it improves discovery so more builders can find it.