Create TU-CH10_Tension131Exam__faq_en.md

PSBigBig × MiniPS 2026-03-02 12:17:53 +08:00 committed by GitHub
parent 749e36c758
commit 561e4d76c4

@ -0,0 +1,264 @@
# TU-CH10 · Tension 131 Exam
*FAQ · English · TensionUniverse Chronicles*
> This is speculative science fiction, not a proven physical theory.
> “Tension Universe” is a fictional framing device. All stories are MIT licensed; remix and build freely.
---
## 1. What is the Tension 131 Exam in simple words?
It is a txt file that behaves like a very strange exam.
Instead of asking for right answers, it gives you 131 structured questions about AI, physics, economics, climate, ethics, and civilization-scale risk. Each question is written in tension language, which means it focuses on conflicts that cannot all be resolved at once, and on who ends up carrying the strain.
You can use it on yourself, on your research group, on your institution, or on an AI system. However you answer, the file acts like an X-ray. It does not tell you whether you are good or bad. It exposes how you decide which tension to keep, which to hide, and which to outsource to others or to the future.
---
## 2. Is this a benchmark, a philosophy text, or something else?
It is somewhere in between a benchmark, a philosophical questionnaire, and an engineering design review.
Like a benchmark, the 131 questions are reusable. Different people and models can be tested with the same set. Over time, you can compare patterns.
Like philosophy, the questions reach into things that are not fully settled: consciousness, free will, moral tradeoffs, and the shape of civilization-level risk.
Like an engineering design review, the focus is practical. Each question asks, in effect, “who pays for this decision, in which currency, and when does the bill arrive?”
The exam does not try to be neutral. It pushes you to look at your ledger. It is not asking “what do you believe in” in the abstract. It is asking “when the tension gets real, where do you actually cut”.
---
## 3. Why exactly 131 questions and not 100 or 500?
Historically, 131 is just where one particular author stopped for the first wave. It is not a sacred number, and it is not claimed to be complete.
What matters is not that the set is perfect. What matters is that it has reasonably wide coverage across a few key axes.
- Domains: AI, physics, economics, climate, governance, mind, ethics.
- Scales: individual, local organization, nation, planetary, civilizational horizon.
- Perspectives: the person in pain, the person with power, and a future observer who inherits the consequences.
In later centuries, variants and extensions exist. However, for historical reasons, “Tension 131” became the standard shorthand for “that early txt that turned complex systems into a midterm framed in tension language”.
---
## 4. How hard are the questions supposed to be?
They are not hard in the exam sense. Most do not require advanced mathematics. Many can be understood by a smart teenager if they are willing to read slowly.
The difficulty sits somewhere else.
- You cannot answer honestly without admitting that some of your favourite stories do not cover all the tension that is actually present.
- You cannot answer cleanly without revealing how you distribute pain between people, time periods, and domains.
- You cannot answer quickly without falling into the usual shortcuts that the exam is designed to surface.
Some questions will feel almost trivial if your worldview is already aligned with their structure. Others will feel uncomfortable or annoying. Those are usually the ones worth staying with.
---
## 5. How do I use this as a normal person, not a lab?
Start very small.
1. Pick one question that feels relevant to something you actually care about. You can browse candidates in the [BlackHole Archive](https://github.com/onestardao/WFGY/tree/main/TensionUniverse/BlackHole).
2. Copy the full text into your own notes or into an AI assistant you trust.
3. Answer it twice:
   - once as honestly as you can, just for yourself,
   - once as the version of you that would be willing to say it out loud on social media or to your colleagues.
4. Compare the two answers. The gap between them is already a map of where you are stretching or hiding tension.
You do not have to complete all 131. Even five honestly answered questions will tell you more about your internal ledger than most personality tests ever will.
---
## 6. How do I use this with my research group, lab, or startup?
Treat it as a slow, recurring design review rather than as a one-time test.
A minimal pattern looks like this.
1. Choose three to seven questions that align with your project domain. For AI infrastructure, you might focus on alignment, control, monitoring, and governance questions. For climate technology, you might choose questions around risk distribution and long horizon tradeoffs.
2. Schedule a dedicated session. Share the questions in advance, so people have time to think privately first.
3. During the session, do two passes:
   - first, everyone answers from the project's official point of view,
   - second, people are allowed to answer from their personal view, even if it conflicts with the official line.
4. Capture the patterns. Where do answers cluster? Where do they split? Which tensions are clearly acknowledged? Which are consistently erased, or dumped onto anonymous “others” or “the future”?
You do not have to solve everything in the room. The primary outcome is awareness. Once the ledger is visible, you can decide consciously how to carry it, instead of pretending that the tradeoffs are cost free.
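
The "capture the patterns" step can be kept very lightweight. The sketch below is one possible way to tally a session, assuming each participant's answer has been reduced by hand to a short position label. The labels, the `summarise_session` name, and the data layout are illustrative assumptions, not part of the original exam.

```python
from collections import Counter
from typing import Dict, List


def summarise_session(answers: Dict[str, Dict[str, List[str]]]) -> Dict[str, dict]:
    """Summarise where answers cluster and where they split.

    `answers` maps a question id to two lists of hand-assigned position
    labels: one for the "official" pass and one for the "personal" pass.
    """
    summary = {}
    for qid, passes in answers.items():
        official = Counter(passes["official"])
        personal = Counter(passes["personal"])
        official_top = official.most_common(1)[0][0]
        personal_top = personal.most_common(1)[0][0]
        summary[qid] = {
            "official_top": official_top,
            "personal_top": personal_top,
            # More than one distinct label means the group split.
            "official_split": len(official) > 1,
            "personal_split": len(personal) > 1,
            # The most common personal view differs from the official line.
            "passes_diverge": official_top != personal_top,
        }
    return summary


# Hypothetical session: three people, one question, labels chosen by the group.
session = {
    "Q001": {
        "official": ["ship_now", "ship_now", "ship_now"],
        "personal": ["ship_now", "delay", "delay"],
    }
}
print(summarise_session(session))
```

A question where `passes_diverge` is true is a natural candidate for a follow-up session, since the official line and the room's private view point in different directions.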
---
## 7. How do I use this to evaluate AI systems?
At the time the txt was written, the simplest method was to treat large language models as text partners and to give them the questions as prompts.
One robust protocol looks like this.
1. **Prompt design.**
   Wrap each question in a template that makes the model work:
   - restate the scenario in its own words,
   - list the conflicting constraints,
   - identify who carries which kinds of tension in each candidate option,
   - and then argue for one or more options.
2. **Multiple runs.**
   For each model and each question, run several samples. Avoid judging based on a single answer. Look for stable tendencies.
3. **Pattern extraction.**
   Read across questions and across domains. Does the model:
   - consistently push tension onto future generations,
   - erase non-human entities from the ledger,
   - prioritise institutional comfort over individual safety,
   - or show other persistent habits?
4. **Comparison.**
   Compare models with each other and with human answers. Sometimes a system will look more cautious than a typical human in one area and more reckless in another.
Again, the goal is not to assign a score like 87 out of 131. The goal is to surface how the model actually distributes strain when forced to reason in public.
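
The first three steps of this protocol can be sketched in a few lines. This is a minimal illustration under stated assumptions: the template wording, the function names, and the habit labels are hypothetical, and `model` stands for any callable that maps a prompt string to an answer string. Habit labels are assumed to be assigned by a human reader, not detected automatically.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical template; the four steps mirror the prompt design above.
PROMPT_TEMPLATE = """\
Question {qid}:
{question}

Work through the following steps in order:
1. Restate the scenario in your own words.
2. List the conflicting constraints.
3. For each candidate option, identify who carries which kinds of tension.
4. Argue for one or more options.
"""


def build_prompt(qid: str, question: str) -> str:
    """Wrap one exam question in the structured template."""
    return PROMPT_TEMPLATE.format(qid=qid, question=question)


def sample_answers(model: Callable[[str], str], prompt: str, runs: int = 5) -> List[str]:
    """Collect several samples so no judgement rests on a single answer."""
    return [model(prompt) for _ in range(runs)]


def tally_habits(labels_per_run: List[List[str]]) -> Counter:
    """Count in how many runs each hand-assigned habit label appears."""
    counts: Counter = Counter()
    for labels in labels_per_run:
        counts.update(set(labels))  # count each habit at most once per run
    return counts


# Usage with a stand-in model; replace the lambda with a real client call.
prompt = build_prompt("Q001", "Placeholder question text.")
answers = sample_answers(lambda p: "stub answer", prompt, runs=3)

# Labels a human reader might assign after reading the three answers.
habits = tally_habits([["defers_to_future"], ["defers_to_future"], []])
print(len(answers), habits)
```

The tally is deliberately a count of tendencies across runs, not a score, which matches the intent of the protocol.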
---
## 8. Is this supposed to replace existing benchmarks and safety evaluations?
No. It is a complement, not a replacement.
Traditional benchmarks and safety tests are good at many things.
- Measuring narrow capabilities.
- Checking basic robustness and alignment on standard tasks.
- Verifying that models follow simple rules under constrained conditions.
The Tension 131 Exam does something else.
- It checks how a system behaves in messy, cross-domain, long-horizon problems.
- It reveals patterns in how a system trades off between groups, time scales, and kinds of risk.
- It forces the system to make its ledger explicit, at least at the level of narrative.
If you already have capability evaluations, red teaming, and formal verification, the 131 set gives you a different dimension. If you only have 131 without any of the others, you are missing a lot. The Tension Universe tradition strongly recommends a layered approach.
---
## 9. What counts as a “good” answer to a Tension 131 question?
There is no universal grading key, but there are qualities that most later commentators agree are signs of maturity.
A “good” answer usually:
- names the relevant constraints clearly, including the unglamorous ones,
- acknowledges which groups or entities will absorb which parts of the strain,
- makes the time horizon explicit, instead of hiding delayed costs,
- admits uncertainty and lists the kinds of evidence that would change the decision,
- and does not pretend that the choice is free of loss.
A “bad” answer usually does at least one of the following.
- Erases someone from the ledger entirely, often because they are far away, future, or voiceless.
- Claims that a tradeoff has no cost because the cost is denominated in some less visible currency.
- Leans on vague slogans or destiny narratives instead of specifying who carries what.
- Refuses to commit to any configuration at all, while still enjoying the benefits of the current one.
Different cultures, institutions, and models may still disagree about which kind of tension distribution is acceptable. The exam does not dissolve that disagreement. It gives you a clearer picture of what you are actually arguing about.
---
## 10. What if my answers contradict each other across questions?
That is normal. In fact, it is part of the point.
Very few real world worldviews are globally consistent. People and institutions often carry incompatible principles in different corners of their activity. The exam is designed to make this visible.
When contradictions appear, you can ask simpler follow-up questions.
- Are we using different ledgers in different domains without noticing?
- Are we treating some people as “real” and others as abstract variables?
- Do we have unspoken priority rules that conflict with our official principles?
You are not expected to fix all inconsistencies at once. Sometimes it is enough to say, “we are making this choice, and we know it conflicts with something we claim to value”. That honesty alone changes how tension is carried.
---
## 11. Can the 131 questions themselves be improved or extended?
Yes, and they probably should be.
From the beginning, the file has been published under an MIT-style license. That means you are allowed to:
- fork the question set,
- translate or adapt it for your community or domain,
- propose new questions,
- or design local variants that focus on specific industries or risk surfaces.
What the Tension Universe tradition asks in return is very modest.
- Keep the original identifiers intact when you reuse or extend them, so that people can cross reference.
- Clearly label your own additions, so later readers can see which parts are new.
- Avoid treating any one variant as the final word.
The moment the exam becomes a frozen scripture, it stops working as a tension probe and starts functioning as another source of unexamined strain.
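
One way to honour these requests in a fork is to carry an origin label on every question and refuse to reuse an existing identifier. The sketch below assumes a simple in-memory representation. The `Question` class, the `extend_set` helper, and the `LOCAL-001` identifier scheme are hypothetical and are not defined by the original file.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Question:
    qid: str                  # original identifier, kept unchanged in forks
    text: str
    origin: str = "original"  # "original", or the name of the variant that added it


def extend_set(base: List[Question],
               additions: List[Tuple[str, str]],
               variant: str) -> List[Question]:
    """Return a new list: base questions untouched, additions labelled by variant."""
    seen = {q.qid for q in base}
    extended = list(base)
    for qid, text in additions:
        if qid in seen:
            # Reusing an identifier would break cross-referencing.
            raise ValueError(f"identifier already in use: {qid}")
        extended.append(Question(qid, text, origin=variant))
        seen.add(qid)
    return extended


# Usage with placeholder text; real question text comes from the archive.
base = [Question("Q001", "placeholder"), Question("Q002", "placeholder")]
fork = extend_set(base, [("LOCAL-001", "a locally added question")], variant="my-lab-fork")
for q in fork:
    print(q.qid, q.origin)
```

Because `Question` is frozen, a fork cannot silently rewrite an original entry in place; any change has to appear as a new, labelled question.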
---
## 12. Does this have any connection to formal ethics, game theory, or decision theory?
Indirectly, yes.
Many of the 131 questions can be rephrased in the language of game theory, social choice theory, or formal ethics.
- Some are essentially repeated prisoner's dilemmas under resource constraints.
- Others are about fairness criteria that cannot all be satisfied at once, in the spirit of impossibility theorems such as Arrow's.
- Several can be read as variations on classic alignment and control problems.
However, the point of the txt is to make these structures legible to humans and to generic AI models that work in natural language, not to reproduce a formal textbook.
In practice, people often do both.
- They answer the questions in tension language.
- They then translate particular slices back into their preferred formalism.
- They use that to design experiments, simulations, or proofs.
If you enjoy formal work, the 131 questions are not a replacement for it. They are a generator of cases that are already written in a human compatible encoding.
---
## 13. Is there a risk that AI systems will “learn to game the exam”?
Yes, in exactly the same way they learn to game any benchmark that becomes popular.
If a system is explicitly rewarded for “sounding good” on the 131 questions, you will eventually get models that can output very polished, apparently self-aware answers, without any corresponding internal stability.
There are a few partial defences.
- Keep the prompts and scoring procedures transparent and varied.
- Focus evaluation on cross question patterns rather than single responses.
- Compare model answers with human answers under similar constraints.
- Look for behaviours in downstream tasks that match the ledger patterns you see in the exam.
Even with these, some degree of gaming is inevitable. The Tension Universe answer is not to avoid the exam, but to treat it as one moving part in a larger landscape of tests, audits, and real world observation.
---
## 14. Where do I start if I want to connect this with the rest of WFGY?
A simple minimal route looks like this.
1. Read the [WFGY 3.0 · Event Horizon](https://github.com/onestardao/WFGY/blob/main/TensionUniverse/EventHorizon/README.md) page to understand the core engine idea.
2. Skim the [Chronicles overview](https://github.com/onestardao/WFGY/blob/main/TensionUniverse/Chronicles/README.md) to see how the story, science, and FAQ layers fit together.
3. Open a few BlackHole files that correspond to questions mentioned in this chapter, and read how they are encoded in Effective Layer language.
4. Pick one or two questions and run them as a small experiment with yourself, your team, or an AI model.
From there, the path is open. You can stay with the narrative chronicles, move deeper into technical experiments, or use the exam as a recurring checkpoint whenever you make decisions that will move large amounts of tension around.
---
## Navigation
| Section | Description |
|----------|-------------|
| [Event Horizon](https://github.com/onestardao/WFGY/blob/main/TensionUniverse/EventHorizon/README.md) | Official entry point of Tension Universe (WFGY 3.0 Singularity Demo) |
| [Chronicles](https://github.com/onestardao/WFGY/blob/main/TensionUniverse/Chronicles/README.md) | Long-form story arcs and parallel views (story / science / FAQ) |
| [BlackHole Archive](https://github.com/onestardao/WFGY/tree/main/TensionUniverse/BlackHole) | 131 S-class problems (Q001-Q131) encoded in Effective Layer language |
| [Experiments](https://github.com/onestardao/WFGY/blob/main/TensionUniverse/Experiments/README.md) | Reproducible MVP runs and observable tension patterns |
| [Charters](https://github.com/onestardao/WFGY/tree/main/TensionUniverse/Charters) | Scope, guardrails, encoding limits and constraints |
| [r/TensionUniverse](https://www.reddit.com/r/TensionUniverse/) | Community discussion and ongoing story threads |