feat: build an AI agent from 0 to 1 -- 11 progressive sessions

- 11 sessions from basic agent loop to autonomous teams - Python MVP implementations for each session - Mental-model-first docs in en/zh/ja - Interactive web platform with step-through visualizations - Incremental architecture: each session adds one mechanism
2026-04-29 23:09:32 +00:00 · 2026-02-21 14:37:42 +08:00 · 2026-02-21 14:37:42 +08:00 · c6a27ef1d7
commit c6a27ef1d7
156 changed files with 28059 additions and 0 deletions
--- a/web/src/data/annotations/s01.json
+++ b/web/src/data/annotations/s01.json
@ -0,0 +1,47 @@
+{
+  "version": "s01",
+  "decisions": [
+    {
+      "id": "one-tool-sufficiency",
+      "title": "Why Bash Alone Is Enough",
+      "description": "Bash can read files, write files, run arbitrary programs, pipe data between processes, and manage the filesystem. Any additional tool (read_file, write_file, etc.) would be a strict subset of what bash already provides. Adding more tools doesn't unlock new capabilities -- it just adds surface area for confusion. The model has to learn fewer tool schemas, and the implementation stays under 100 lines. This is the minimal viable agent: one tool, one loop.",
+      "alternatives": "We could have started with a richer toolset (file I/O, HTTP, database), but that would obscure the core insight: an LLM with a shell is already a general-purpose agent. Starting minimal also makes it obvious what each subsequent version actually adds.",
+      "zh": {
+        "title": "为什么仅靠 Bash 就够了",
+        "description": "Bash 能读写文件、运行任意程序、在进程间传递数据、管理文件系统。任何额外的工具（read_file、write_file 等）都只是 bash 已有能力的子集。增加工具并不会解锁新能力，只会增加模型需要理解的接口。模型只需学习一个工具的 schema，实现代码不超过 100 行。这就是最小可行 agent：一个工具，一个循环。"
+      },
+      "ja": {
+        "title": "Bash だけで十分な理由",
+        "description": "Bash はファイルの読み書き、任意のプログラムの実行、プロセス間のデータパイプ、ファイルシステムの管理が可能です。追加のツール（read_file、write_file など）は bash が既に提供している機能の部分集合に過ぎません。ツールを増やしても新しい能力は得られず、モデルが理解すべきインターフェースが増えるだけです。モデルが学習するスキーマは1つだけで、実装は100行以内に収まります。これが最小限の実用的エージェント：1つのツール、1つのループです。"
+      }
+    },
+    {
+      "id": "process-as-subagent",
+      "title": "Recursive Process Spawning as Subagent Mechanism",
+      "description": "When the agent runs `python v0.py \"subtask\"`, it spawns a completely new process with a fresh LLM context. This child process is effectively a subagent: it has its own system prompt, its own conversation history, and its own task focus. When it finishes, the parent gets the stdout result. This is subagent delegation without any framework -- just Unix process semantics. Each child process naturally isolates concerns because it literally cannot see the parent's context.",
+      "alternatives": "A framework-level subagent system (like v3's Task tool) gives more control over what tools the subagent can access and how results are returned. But at v0, the point is to show that process spawning is the most primitive form of agent delegation -- no shared memory, no message passing, just stdin/stdout.",
+      "zh": {
+        "title": "用递归进程创建实现子代理机制",
+        "description": "当 agent 执行 `python v0.py \"subtask\"` 时，它会创建一个全新的进程，拥有全新的 LLM 上下文。这个子进程实际上就是一个子代理：有自己的系统提示词、对话历史和任务焦点。子进程完成后，父进程通过 stdout 获取结果。这就是不依赖任何框架的子代理委派——纯粹的 Unix 进程语义。每个子进程天然隔离关注点，因为它根本看不到父进程的上下文。"
+      },
+      "ja": {
+        "title": "再帰プロセス生成によるサブエージェント機構",
+        "description": "エージェントが `python v0.py \"subtask\"` を実行すると、新しい LLM コンテキストを持つ完全に新しいプロセスが生成されます。この子プロセスは事実上サブエージェントです：独自のシステムプロンプト、会話履歴、タスクフォーカスを持ちます。完了すると、親プロセスは stdout で結果を受け取ります。これはフレームワークなしのサブエージェント委任です——共有メモリもメッセージパッシングもなく、stdin/stdout だけです。各子プロセスは親のコンテキストを参照できないため、関心の分離が自然に実現されます。"
+      }
+    },
+    {
+      "id": "model-drives-everything",
+      "title": "No Planning Framework -- The Model Decides",
+      "description": "There is no planner, no task queue, no state machine. The system prompt tells the model how to approach problems, and the model decides what bash command to run next based on the conversation so far. This is intentional: at this level, adding a planning layer would be premature abstraction. The model's chain-of-thought IS the plan. The agent loop just keeps asking the model what to do until it stops requesting tools.",
+      "alternatives": "Later versions (v2) add explicit planning via TodoWrite. But v0 proves that implicit planning through the model's reasoning is sufficient for many tasks. The planning framework only becomes necessary when you need external visibility into the agent's intentions.",
+      "zh": {
+        "title": "没有规划框架——由模型自行决策",
+        "description": "没有规划器，没有任务队列，没有状态机。系统提示词告诉模型如何处理问题，模型根据对话历史决定下一步执行什么 bash 命令。这是有意为之的：在这个层级，添加规划层属于过早抽象。模型的思维链本身就是计划。agent 循环只是不断询问模型下一步做什么，直到模型不再请求工具为止。"
+      },
+      "ja": {
+        "title": "計画フレームワークなし——モデルが全てを決定",
+        "description": "プランナーもタスクキューも状態マシンもありません。システムプロンプトがモデルに問題の取り組み方を伝え、モデルがこれまでの会話に基づいて次に実行する bash コマンドを決定します。これは意図的な設計です：このレベルでは計画レイヤーの追加は時期尚早な抽象化です。モデルの思考の連鎖そのものが計画です。エージェントループはモデルがツールの呼び出しを止めるまで、次の行動を問い続けるだけです。"
+      }
+    }
+  ]
+}
--- a/web/src/data/annotations/s02.json
+++ b/web/src/data/annotations/s02.json
@ -0,0 +1,47 @@
+{
+  "version": "s02",
+  "decisions": [
+    {
+      "id": "four-tools-not-twenty",
+      "title": "Why Exactly Four Tools",
+      "description": "The four tools are bash, read_file, write_file, and edit_file. Together they cover roughly 95% of coding tasks. Bash handles execution and arbitrary commands. Read_file provides precise file reading with line numbers. Write_file creates or overwrites files. Edit_file does surgical string replacement. More tools would increase the model's cognitive load -- it has to decide which tool to use, and more options means more chances of picking the wrong one. Fewer tools also means fewer tool schemas to maintain and fewer edge cases to handle.",
+      "alternatives": "We could add specialized tools (list_directory, search_files, http_request), and later versions do. But at this stage, bash already covers those use cases. The split from v0's single tool to v1's four tools is specifically about giving the model structured I/O for file operations, where bash's quoting and escaping often trips up the model.",
+      "zh": {
+        "title": "为什么恰好四个工具",
+        "description": "四个工具分别是 bash、read_file、write_file 和 edit_file，覆盖了大约 95% 的编程任务。Bash 处理执行和任意命令；read_file 提供带行号的精确文件读取；write_file 创建或覆盖文件；edit_file 做精确的字符串替换。工具越多，模型的认知负担越重——它必须在更多选项中做选择，选错的概率也随之增加。更少的工具也意味着更少的 schema 需要维护、更少的边界情况需要处理。"
+      },
+      "ja": {
+        "title": "なぜ正確に4つのツールなのか",
+        "description": "4つのツールは bash、read_file、write_file、edit_file です。これらでコーディングタスクの約95%をカバーします。Bash は実行と任意のコマンドを処理し、read_file は行番号付きの正確なファイル読み取りを提供し、write_file はファイルの作成・上書きを行い、edit_file は外科的な文字列置換を行います。ツールが増えるとモデルの認知負荷が増大し、どのツールを使うかの判断でミスが増えます。ツールが少ないことは、メンテナンスすべきスキーマとエッジケースの削減も意味します。"
+      }
+    },
+    {
+      "id": "model-as-agent",
+      "title": "The Model IS the Agent",
+      "description": "The core agent loop is trivially simple: while True, call the LLM, if it returns tool_use blocks then execute them and feed results back, if it returns only text then stop. There is no router, no decision tree, no workflow engine. The model itself decides what to do, when to stop, and how to recover from errors. The code is just plumbing that connects the model to tools. This is a philosophical stance: agent behavior emerges from the model, not from the framework.",
+      "alternatives": "Many agent frameworks add elaborate orchestration layers: ReAct loops with explicit Thought/Action/Observation parsing, LangChain-style chains, AutoGPT-style goal decomposition. These frameworks assume the model needs scaffolding to behave as an agent. Our approach assumes the model already knows how to be an agent -- it just needs tools to act on the world.",
+      "zh": {
+        "title": "模型本身就是代理",
+        "description": "核心 agent 循环极其简单：不断调用 LLM，如果返回 tool_use 块就执行并回传结果，如果只返回文本就停止。没有路由器，没有决策树，没有工作流引擎。模型自己决定做什么、何时停止、如何从错误中恢复。代码只是连接模型和工具的管道。这是一种设计哲学：agent 行为从模型中涌现，而非由框架定义。"
+      },
+      "ja": {
+        "title": "モデルそのものがエージェント",
+        "description": "コアのエージェントループは極めてシンプルです：LLM を呼び出し続け、tool_use ブロックが返されればそれを実行して結果をフィードバックし、テキストのみが返されれば停止します。ルーターも決定木もワークフローエンジンもありません。モデル自体が何をすべきか、いつ停止するか、エラーからどう回復するかを決定します。コードはモデルとツールを接続する配管に過ぎません。これは設計思想です：エージェントの振る舞いはフレームワークではなくモデルから創発するものです。"
+      }
+    },
+    {
+      "id": "explicit-tool-schemas",
+      "title": "JSON Schemas for Every Tool",
+      "description": "Each tool defines a strict JSON schema for its input parameters. For example, edit_file requires old_string and new_string as exact strings, not regex patterns. This eliminates an entire class of bugs: the model can't pass malformed input because the API validates against the schema before execution. It also makes the model's intent unambiguous -- when it calls edit_file with specific strings, there's no parsing ambiguity about what it wants to change.",
+      "alternatives": "Some agent systems let the model output free-form text that gets parsed with regex or heuristics (e.g., extracting code from markdown blocks). This is fragile -- the model might format output slightly differently and break the parser. JSON schemas trade flexibility for reliability.",
+      "zh": {
+        "title": "每个工具都有 JSON Schema",
+        "description": "每个工具都为输入参数定义了严格的 JSON schema。例如，edit_file 要求 old_string 和 new_string 是精确的字符串，而非正则表达式。这消除了一整类错误：模型无法传递格式错误的输入，因为 API 会在执行前校验 schema。这也使模型的意图变得明确——当它用特定字符串调用 edit_file 时，不存在关于它想修改什么的解析歧义。"
+      },
+      "ja": {
+        "title": "全ツールに JSON Schema を定義",
+        "description": "各ツールは入力パラメータに対して厳密な JSON Schema を定義しています。例えば edit_file は old_string と new_string を正確な文字列として要求し、正規表現は使いません。これにより一連のバグを排除できます：API がスキーマに対して実行前にバリデーションを行うため、モデルは不正な入力を渡せません。モデルの意図も明確になります――特定の文字列で edit_file を呼び出す際、何を変更したいかについて解析の曖昧さがありません。"
+      }
+    }
+  ]
+}
--- a/web/src/data/annotations/s03.json
+++ b/web/src/data/annotations/s03.json
@ -0,0 +1,47 @@
+{
+  "version": "s03",
+  "decisions": [
+    {
+      "id": "visible-planning",
+      "title": "Making Plans Visible via TodoWrite",
+      "description": "Instead of letting the model plan silently in its chain-of-thought, we force plans to be externalized through the TodoWrite tool. Each plan item has a status (pending, in_progress, completed) that gets tracked explicitly. This has three benefits: (1) users can see what the agent intends to do before it does it, (2) developers can debug agent behavior by inspecting the plan state, (3) the agent itself can refer back to its plan in later turns when earlier context has scrolled away.",
+      "alternatives": "The model could plan internally via chain-of-thought reasoning (as it does in v0/v1). Internal planning works but is invisible and ephemeral -- once the thinking scrolls out of context, the plan is lost. Claude's extended thinking is another option, but it's not inspectable by the user or by downstream tools.",
+      "zh": {
+        "title": "通过 TodoWrite 让计划可见",
+        "description": "我们不让模型在思维链中默默规划，而是强制通过 TodoWrite 工具将计划外化。每个计划项都有可追踪的状态（pending、in_progress、completed）。这有三个好处：(1) 用户可以在执行前看到 agent 打算做什么；(2) 开发者可以通过检查计划状态来调试 agent 行为；(3) agent 自身可以在后续轮次中引用计划，即使早期上下文已经滚出窗口。"
+      },
+      "ja": {
+        "title": "TodoWrite による計画の可視化",
+        "description": "モデルが思考の連鎖の中で黙って計画するのではなく、TodoWrite ツールを通じて計画を外部化することを強制します。各計画項目には追跡可能なステータス（pending、in_progress、completed）があります。利点は3つ：(1) ユーザーがエージェントの意図を実行前に確認できる、(2) 開発者が計画状態を検査してデバッグできる、(3) エージェント自身が以前のコンテキストがスクロールアウトした後でも計画を参照できる。"
+      }
+    },
+    {
+      "id": "single-in-progress",
+      "title": "Only One Task Can Be In-Progress",
+      "description": "The TodoWrite tool enforces that at most one task has status 'in_progress' at any time. If the model tries to start a second task, it must first complete or abandon the current one. This constraint prevents a subtle failure mode: models that try to 'multitask' by interleaving work on multiple items tend to lose track of state and produce half-finished results. Sequential focus produces higher quality than parallel thrashing.",
+      "alternatives": "Allowing multiple in-progress items would let the agent context-switch between tasks, which seems more flexible. In practice, LLMs handle context-switching poorly -- they lose track of which task they were working on and mix up details between tasks. The single-focus constraint is a guardrail that improves output quality.",
+      "zh": {
+        "title": "同一时间只允许一个任务进行中",
+        "description": "TodoWrite 工具强制要求任何时候最多只能有一个任务处于 in_progress 状态。如果模型想开始第二个任务，必须先完成或放弃当前任务。这个约束防止了一种隐蔽的失败模式：试图通过交替处理多个项目来'多任务'的模型，往往会丢失状态并产出半成品。顺序执行的专注度远高于并行切换。"
+      },
+      "ja": {
+        "title": "同時に進行中にできるタスクは1つだけ",
+        "description": "TodoWrite ツールは、同時に 'in_progress' 状態のタスクを最大1つに制限します。モデルが2つ目のタスクを開始しようとする場合、まず現在のタスクを完了または中断する必要があります。この制約は微妙な失敗モードを防ぎます：複数の項目を交互に処理して「マルチタスク」しようとするモデルは、状態を見失い中途半端な結果を生みがちです。逐次的な集中は並行的な切り替えよりも高品質な出力を生み出します。"
+      }
+    },
+    {
+      "id": "max-twenty-items",
+      "title": "Maximum of 20 Plan Items",
+      "description": "TodoWrite caps the plan at 20 items. This is a deliberate constraint against over-planning. Models tend to decompose tasks into increasingly fine-grained steps when unconstrained, producing 50-item plans where each step is trivial. Long plans are fragile: if step 15 fails, the remaining 35 steps may all be invalid. Short plans (under 20 items) stay at the right abstraction level and are easier to adapt when reality diverges from the plan.",
+      "alternatives": "No cap would give the model full flexibility, but in practice leads to absurdly detailed plans. A dynamic cap (proportional to task complexity) would be smarter but adds complexity. The fixed cap of 20 is a simple heuristic that works well empirically -- most real coding tasks can be expressed in 5-15 meaningful steps.",
+      "zh": {
+        "title": "计划项上限为 20 条",
+        "description": "TodoWrite 将计划项限制在 20 条以内。这是对过度规划的刻意约束。不加限制时，模型倾向于将任务分解成越来越细粒度的步骤，产出 50 条的计划，每一步都微不足道。冗长的计划很脆弱：如果第 15 步失败，剩下的 35 步可能全部作废。20 条以内的短计划保持在正确的抽象层级，更容易在现实偏离计划时做出调整。"
+      },
+      "ja": {
+        "title": "計画項目の上限は20個",
+        "description": "TodoWrite は計画を20項目に制限します。これは過度な計画に対する意図的な制約です。制約がないとモデルはタスクをどんどん細かいステップに分解し、各ステップが些末な50項目の計画を作りがちです。長い計画は脆弱です：ステップ15が失敗すると残りの35ステップは全て無効になりかねません。20項目以内の短い計画は適切な抽象度を保ち、現実が計画から逸脱した際の適応が容易です。"
+      }
+    }
+  ]
+}
--- a/web/src/data/annotations/s04.json
+++ b/web/src/data/annotations/s04.json
@ -0,0 +1,47 @@
+{
+  "version": "s04",
+  "decisions": [
+    {
+      "id": "context-isolation",
+      "title": "Subagents Get Fresh Context, Not Shared History",
+      "description": "When a parent agent spawns a subagent via the Task tool, the subagent starts with a clean message history containing only the system prompt and the delegated task description. It does NOT inherit the parent's conversation. This is context isolation: the subagent can focus entirely on its specific subtask without being distracted by hundreds of messages from the parent's broader conversation. The result is returned to the parent as a single tool_result, collapsing potentially dozens of subagent turns into one concise answer.",
+      "alternatives": "Sharing the parent's full context would give the subagent more information, but it would also flood the subagent with irrelevant details. Context window is finite -- filling it with parent history leaves less room for the subagent's own work. Fork-based approaches (copy the parent context) are a middle ground but still waste tokens on irrelevant history.",
+      "zh": {
+        "title": "子代理获得全新上下文，而非共享历史",
+        "description": "当父代理通过 Task 工具创建子代理时，子代理从全新的消息历史开始，只包含系统提示词和委派的任务描述，不继承父代理的对话。这就是上下文隔离：子代理可以完全专注于特定子任务，不会被父代理长达数百条消息的对话干扰。结果作为单条 tool_result 返回给父代理，将子代理可能数十轮的交互压缩为一个简洁的回答。"
+      },
+      "ja": {
+        "title": "サブエージェントは共有履歴ではなく新しいコンテキストを取得",
+        "description": "親エージェントが Task ツールでサブエージェントを生成すると、サブエージェントはシステムプロンプトと委任されたタスク説明のみを含むクリーンなメッセージ履歴から開始します。親の会話は引き継ぎません。これがコンテキスト分離です：サブエージェントは親の広範な会話の何百ものメッセージに気を取られることなく、特定のサブタスクに完全に集中できます。結果は単一の tool_result として親に返され、サブエージェントの数十ターンが1つの簡潔な回答に凝縮されます。"
+      }
+    },
+    {
+      "id": "tool-filtering",
+      "title": "Explore Agents Cannot Write Files",
+      "description": "When spawning a subagent with the 'Explore' type, it receives only read-only tools: bash (with restrictions), read_file, and search tools. It cannot call write_file or edit_file. This implements the principle of least privilege: an agent tasked with 'find all usages of function X' doesn't need write access. Removing write tools eliminates the risk of accidental file modification during exploration, and it also narrows the tool space so the model makes better decisions with fewer options.",
+      "alternatives": "Giving all subagents full tool access is simpler to implement but violates least privilege. A permission-request system (subagent asks parent for write access) adds complexity and latency. Static tool filtering by role is the pragmatic middle ground -- simple to implement, effective at preventing accidents.",
+      "zh": {
+        "title": "Explore 代理不能写入文件",
+        "description": "创建 Explore 类型的子代理时，它只获得只读工具：bash（有限制）、read_file 和搜索工具，不能调用 write_file 或 edit_file。这实现了最小权限原则：一个被委派'查找函数 X 所有使用位置'的代理不需要写权限。移除写工具消除了探索过程中误修改文件的风险，同时缩小了工具空间，让模型在更少的选项中做出更好的决策。"
+      },
+      "ja": {
+        "title": "Explore エージェントはファイルを書き込めない",
+        "description": "Explore タイプのサブエージェントを生成すると、読み取り専用ツールのみが提供されます：bash（制限付き）、read_file、検索ツール。write_file や edit_file は使えません。これは最小権限の原則の実装です：「関数 X の全使用箇所を見つける」タスクに書き込み権限は不要です。書き込みツールを除外することで探索中の誤ったファイル変更リスクを排除し、ツール空間を狭めてモデルがより良い判断を下せるようにします。"
+      }
+    },
+    {
+      "id": "no-recursive-task",
+      "title": "Subagents Cannot Spawn Their Own Subagents",
+      "description": "The Task tool is not included in the subagent's tool set. A subagent must complete its work directly -- it cannot delegate further. This prevents infinite delegation loops: without this constraint, an agent could spawn a subagent that spawns another subagent, each one re-delegating the same task in slightly different words, consuming tokens without making progress. One level of delegation handles the vast majority of use cases. If a task is too complex for a single subagent, the parent should decompose it differently.",
+      "alternatives": "Allowing recursive delegation (bounded by depth) would handle deeply nested tasks but adds complexity and the risk of runaway token consumption. In practice, single-level delegation covers most real-world coding tasks. Multi-level delegation is addressed in later versions (v6+) through persistent team structures instead of recursive spawning.",
+      "zh": {
+        "title": "子代理不能再创建子代理",
+        "description": "Task 工具不包含在子代理的工具集中。子代理必须直接完成工作，不能继续委派。这防止了无限委派循环：没有这个约束，一个代理可能创建子代理，子代理又创建子代理，每一层都用略微不同的措辞重新委派同一任务，消耗 token 却毫无进展。一层委派足以处理绝大多数场景。如果任务对单个子代理来说太复杂，应该由父代理重新分解。"
+      },
+      "ja": {
+        "title": "サブエージェントは自身のサブエージェントを生成できない",
+        "description": "Task ツールはサブエージェントのツールセットに含まれません。サブエージェントは作業を直接完了しなければならず、さらなる委任はできません。これにより無限委任ループを防止します：この制約がなければ、エージェントがサブエージェントを生成し、そのサブエージェントがさらにサブエージェントを生成し、それぞれが微妙に異なる言葉で同じタスクを再委任してトークンを消費するだけで進捗しない可能性があります。一段階の委任で大多数のユースケースに対応できます。"
+      }
+    }
+  ]
+}
--- a/web/src/data/annotations/s05.json
+++ b/web/src/data/annotations/s05.json
@ -0,0 +1,47 @@
+{
+  "version": "s05",
+  "decisions": [
+    {
+      "id": "tool-result-injection",
+      "title": "Skills Inject via tool_result, Not System Prompt",
+      "description": "When the agent invokes the Skill tool, the skill's content (a SKILL.md file) is returned as a tool_result in a user message, not injected into the system prompt. This is a deliberate caching optimization: the system prompt remains static across turns, which means API providers can cache it (Anthropic's prompt caching, OpenAI's system message caching). If skill content were in the system prompt, it would change every time a new skill is loaded, invalidating the cache. By putting dynamic content in tool_result, we keep the expensive system prompt cacheable while still getting skill knowledge into context.",
+      "alternatives": "Injecting skills into the system prompt is simpler and gives skills higher priority in the model's attention. But it breaks prompt caching (every skill load creates a new system prompt variant) and bloats the system prompt over time as skills accumulate. The tool_result approach keeps things cache-friendly at the cost of slightly lower attention priority.",
+      "zh": {
+        "title": "技能通过 tool_result 注入，而非系统提示词",
+        "description": "当 agent 调用 Skill 工具时，技能内容（SKILL.md 文件）作为 tool_result 在用户消息中返回，而非注入系统提示词。这是一个刻意的缓存优化：系统提示词在各轮次间保持静态，API 提供商可以缓存它（Anthropic 的 prompt caching、OpenAI 的 system message caching）。如果技能内容在系统提示词中，每次加载新技能都会使缓存失效。将动态内容放在 tool_result 中，既保持了昂贵的系统提示词可缓存，又让技能知识进入了上下文。"
+      },
+      "ja": {
+        "title": "スキルはシステムプロンプトではなく tool_result で注入",
+        "description": "エージェントが Skill ツールを呼び出すと、スキルの内容（SKILL.md ファイル）はシステムプロンプトへの注入ではなく、ユーザーメッセージ内の tool_result として返されます。これは意図的なキャッシュ最適化です：システムプロンプトはターン間で静的に保たれるため、API プロバイダーがキャッシュできます（Anthropic のプロンプトキャッシュ、OpenAI のシステムメッセージキャッシュ）。スキル内容がシステムプロンプト内にあると、新しいスキルをロードするたびにキャッシュが無効化されます。動的コンテンツを tool_result に配置することで、高コストなシステムプロンプトのキャッシュ可能性を維持しつつ、スキル知識をコンテキストに取り込めます。"
+      }
+    },
+    {
+      "id": "lazy-loading",
+      "title": "On-Demand Skill Loading Instead of Upfront",
+      "description": "Skills are not loaded at startup. The agent starts with only the skill names and descriptions (from frontmatter). When the agent decides it needs a specific skill, it calls the Skill tool, which loads the full SKILL.md body into context. This keeps the initial prompt small and focused. An agent solving a Python bug doesn't need the Kubernetes deployment skill loaded -- that would waste context window space and potentially confuse the model with irrelevant instructions.",
+      "alternatives": "Loading all skills upfront guarantees the model always has all knowledge available, but wastes tokens on irrelevant skills and may hit context limits. A recommendation system (model suggests skills, human approves) adds latency. Lazy loading lets the model self-serve the knowledge it needs, when it needs it.",
+      "zh": {
+        "title": "按需加载技能而非预加载",
+        "description": "技能不会在启动时加载。Agent 初始只拥有技能名称和描述（来自 frontmatter）。当 agent 判断需要特定技能时，调用 Skill 工具将完整的 SKILL.md 内容加载到上下文中。这保持了初始提示词的精简。一个正在修复 Python bug 的 agent 不需要加载 Kubernetes 部署技能——那会浪费上下文窗口空间，还可能用无关指令干扰模型。"
+      },
+      "ja": {
+        "title": "起動時ではなくオンデマンドでスキルを読み込み",
+        "description": "スキルは起動時に読み込まれません。エージェントは最初、スキルの名前と説明（フロントマターから）のみを持ちます。エージェントが特定のスキルが必要だと判断すると、Skill ツールを呼び出して完全な SKILL.md の内容をコンテキストに読み込みます。これにより初期プロンプトを小さく保ちます。Python のバグを修正しているエージェントに Kubernetes デプロイのスキルは不要です――コンテキストウィンドウの無駄遣いであり、無関係な指示でモデルを混乱させかねません。"
+      }
+    },
+    {
+      "id": "frontmatter-body-split",
+      "title": "YAML Frontmatter + Markdown Body in SKILL.md",
+      "description": "Each SKILL.md file has two parts: YAML frontmatter (name, description, globs) and a markdown body (the actual instructions). The frontmatter serves as metadata for the skill registry -- it's what gets listed when the agent asks 'what skills are available?' The body is the payload that gets loaded on demand. This separation means you can list 100 skills (reading only frontmatter, a few bytes each) without loading 100 full instruction sets (potentially thousands of tokens each).",
+      "alternatives": "A separate metadata file (skill.yaml + skill.md) would work but doubles the number of files. Embedding metadata in the markdown (as headings or comments) requires parsing the full file to extract metadata. Frontmatter is a well-established convention (Jekyll, Hugo, Astro) that keeps metadata and content co-located but separately parseable.",
+      "zh": {
+        "title": "SKILL.md 采用 YAML Frontmatter + Markdown 正文",
+        "description": "每个 SKILL.md 文件有两部分：YAML frontmatter（名称、描述、globs）和 markdown 正文（实际指令）。Frontmatter 作为技能注册表的元数据——当 agent 问'有哪些可用技能'时，展示的就是这些信息。正文是按需加载的有效负载。这种分离意味着可以列出 100 个技能（每个只读几字节的 frontmatter）而不必加载 100 套完整指令集（每套可能数千 token）。"
+      },
+      "ja": {
+        "title": "SKILL.md で YAML フロントマター + Markdown 本文",
+        "description": "各 SKILL.md ファイルは2つの部分で構成されます：YAML フロントマター（名前、説明、globs）と Markdown 本文（実際の指示）。フロントマターはスキルレジストリのメタデータとして機能し、エージェントが「どんなスキルが利用可能か」と問い合わせた際に一覧表示されます。本文はオンデマンドで読み込まれるペイロードです。この分離により、100個のスキル一覧表示（各数バイトのフロントマターのみ読み取り）が100個の完全な指示セット（各数千トークン）のロードなしに可能になります。"
+      }
+    }
+  ]
+}
--- a/web/src/data/annotations/s06.json
+++ b/web/src/data/annotations/s06.json
@ -0,0 +1,61 @@
+{
+  "version": "s06",
+  "decisions": [
+    {
+      "id": "three-layer-compression",
+      "title": "Three-Layer Compression Strategy",
+      "description": "Context management uses three distinct layers, each with different cost/benefit profiles. (1) Microcompact runs every turn and is nearly free: it truncates tool_result blocks from older messages, stripping verbose command output that's no longer needed. (2) Auto_compact triggers when token count exceeds a threshold: it calls the LLM to generate a conversation summary, which is expensive but dramatically reduces context size. (3) Manual compact is user-triggered for explicit 'start fresh' moments. Layering these means the cheap operation runs constantly (keeping context tidy) while the expensive operation runs rarely (only when actually needed).",
+      "alternatives": "A single compression strategy (e.g., always summarize at 80% capacity) would be simpler but wasteful -- most of the time, microcompact alone keeps things manageable. A sliding window (drop oldest N messages) is cheap but loses important context. The three-layer approach gives the best token efficiency: cheap cleanup constantly, expensive summarization rarely.",
+      "zh": {
+        "title": "三层压缩策略",
+        "description": "上下文管理使用三个独立的层次，各有不同的成本收益比。(1) 微压缩每轮都运行，几乎零成本：它截断旧消息中的 tool_result 块，去除不再需要的冗长命令输出。(2) 自动压缩在 token 数超过阈值时触发：调用 LLM 生成对话摘要，代价高但能大幅缩减上下文。(3) 手动压缩由用户触发，用于明确的'重新开始'场景。分层意味着低成本操作持续运行（保持上下文整洁），而高成本操作很少触发（仅在真正需要时）。"
+      },
+      "ja": {
+        "title": "3層圧縮戦略",
+        "description": "コンテキスト管理は、異なるコスト・効果プロファイルを持つ3つの層を使用します。(1) マイクロコンパクトは毎ターン実行されほぼ無コスト：古いメッセージの tool_result ブロックを切り詰め、不要な冗長出力を除去します。(2) 自動コンパクトはトークン数が閾値を超えると発動：LLM を呼び出して会話の要約を生成し、コストは高いがコンテキストサイズを劇的に削減します。(3) 手動コンパクトはユーザーが明示的に「最初からやり直し」する時に使用します。この階層化により、安価な操作が常に実行され（コンテキストを整頓）、高価な操作はめったに実行されません（本当に必要な時のみ）。"
+      }
+    },
+    {
+      "id": "min-savings-threshold",
+      "title": "MIN_SAVINGS = 20,000 Tokens Before Compressing",
+      "description": "Auto_compact only triggers when the estimated savings (current tokens minus estimated summary size) exceed 20,000 tokens. Compression is not free: the summary itself consumes tokens, plus there's the API call cost to generate it. If the conversation is only 25,000 tokens, compressing might save 5,000 tokens but cost an API call and produce a summary that's less coherent than the original. The 20K threshold ensures compression only happens when the savings meaningfully exceed the overhead.",
+      "alternatives": "A percentage-based threshold (compress when context is 80% full) adapts to different context window sizes but doesn't account for the fixed cost of generating a summary. A fixed threshold of 10K would compress more aggressively but often isn't worth it. The 20K value was chosen empirically: it's the point where compression savings consistently outweigh the quality loss from summarization.",
+      "zh": {
+        "title": "最小节省量 = 20,000 Token 才触发压缩",
+        "description": "自动压缩仅在估算节省量（当前 token 数减去预估摘要大小）超过 20,000 token 时才触发。压缩不是免费的：摘要本身会消耗 token，还有生成摘要的 API 调用成本。如果对话只有 25,000 token，压缩可能节省 5,000 token，但需要一次 API 调用，且产出的摘要可能不如原文连贯。20K 的阈值确保只在节省量明显超过开销时才进行压缩。"
+      },
+      "ja": {
+        "title": "圧縮前に MIN_SAVINGS = 20,000 トークンが必要",
+        "description": "自動コンパクトは推定節約量（現在のトークン数マイナス推定要約サイズ）が20,000トークンを超えた場合にのみ発動します。圧縮は無料ではありません：要約自体がトークンを消費し、さらに生成のための API コール費用がかかります。会話が25,000トークンしかない場合、圧縮で5,000トークン節約できても、API コールが必要で元の会話より一貫性の低い要約になる可能性があります。20K の閾値は、節約量がオーバーヘッドを確実に上回る場合にのみ圧縮を実行することを保証します。"
+      }
+    },
+    {
+      "id": "summary-replaces-all",
+      "title": "Summary Replaces ALL Messages, Not Partial History",
+      "description": "When auto_compact fires, it generates a summary and replaces the ENTIRE message history with that summary. It does not keep the last N messages alongside the summary. This avoids a subtle coherence problem: if you keep recent messages plus a summary of older ones, the model sees two representations of overlapping content. The summary might say 'we decided to use approach X' while a recent message still shows the deliberation process, creating contradictory signals. A clean summary is a single coherent narrative.",
+      "alternatives": "Keeping the last 5-10 messages alongside the summary preserves recent detail and gives the model more to work with. But it creates the overlap problem described above, and makes the total context size less predictable. Some systems use a 'sliding window + summary' approach which works but requires careful tuning of the overlap region.",
+      "zh": {
+        "title": "摘要替换全部消息，而非保留部分历史",
+        "description": "自动压缩触发时，生成摘要并替换全部消息历史，不会在摘要旁保留最近的 N 条消息。这避免了一个微妙的连贯性问题：如果同时保留近期消息和旧消息的摘要，模型会看到重叠内容的两种表示。摘要可能说'我们决定使用方案 X'，而近期消息仍在展示讨论过程，产生矛盾信号。干净的摘要是一个连贯的单一叙述。"
+      },
+      "ja": {
+        "title": "要約が部分的な履歴ではなく全メッセージを置換",
+        "description": "自動コンパクトが発動すると、要約を生成してメッセージ履歴の全体をその要約で置換します。要約と並べて直近 N 件のメッセージを保持することはしません。これにより微妙な一貫性の問題を回避します：直近のメッセージと古いメッセージの要約を併存させると、モデルは重複するコンテンツの2つの表現を見ることになります。要約が「アプローチ X を使うことに決めた」と言う一方で、直近のメッセージにはまだ検討過程が表示されているかもしれず、矛盾するシグナルを生じます。クリーンな要約は単一の一貫した物語です。"
+      }
+    },
+    {
+      "id": "transcript-archival",
+      "title": "Full Conversation Archived to JSONL on Disk",
+      "description": "Even though context is compressed in memory, the full uncompressed conversation is appended to a JSONL file on disk. Every message, every tool call, every result -- nothing is lost. This means compression is a lossy operation on the in-memory context but a lossless operation on the permanent record. Post-hoc analysis (debugging agent behavior, computing token usage, training data extraction) can always work from the complete transcript. The JSONL format is append-only, making it safe for concurrent writes and easy to stream-process.",
+      "alternatives": "Not archiving saves disk space but makes debugging hard -- when the agent makes a mistake, you can't see what it was 'thinking' 200 messages ago because that context was compressed away. Database storage (SQLite) would provide queryability but adds a dependency. JSONL is the simplest format that supports append-only writes and line-by-line processing.",
+      "zh": {
+        "title": "完整对话以 JSONL 格式归档到磁盘",
+        "description": "尽管上下文在内存中被压缩，完整的未压缩对话仍会追加到磁盘上的 JSONL 文件中。每条消息、每次工具调用、每个结果都不会丢失。压缩对内存上下文是有损操作，但对永久记录是无损的。事后分析（调试 agent 行为、计算 token 用量、提取训练数据）始终可以基于完整记录进行。JSONL 格式仅追加写入，对并发写入安全，易于流式处理。"
+      },
+      "ja": {
+        "title": "完全な会話を JSONL としてディスクに保存",
+        "description": "メモリ上でコンテキストが圧縮されても、完全な非圧縮会話はディスク上の JSONL ファイルに追記されます。全てのメッセージ、全てのツール呼び出し、全ての結果――何も失われません。圧縮はインメモリコンテキストに対しては不可逆ですが、永続記録に対しては可逆です。事後分析（エージェントの挙動デバッグ、トークン使用量の計算、学習データの抽出）は常に完全な記録から行えます。JSONL フォーマットは追記専用で、並行書き込みに安全であり行単位の処理が容易です。"
+      }
+    }
+  ]
+}
--- a/web/src/data/annotations/s07.json
+++ b/web/src/data/annotations/s07.json
@ -0,0 +1,47 @@
+{
+  "version": "s07",
+  "decisions": [
+    {
+      "id": "file-based-persistence",
+      "title": "Tasks Stored as JSON Files, Not In-Memory",
+      "description": "Tasks are persisted as JSON files in a .tasks/ directory on the filesystem instead of being held in memory. This has three critical benefits: (1) Tasks survive process crashes -- if the agent dies mid-task, the task board is still on disk when it restarts. (2) Multiple agents can read and write to the same task directory, enabling multi-agent coordination without shared memory. (3) Humans can inspect and manually edit task files for debugging. The filesystem becomes the shared database.",
+      "alternatives": "In-memory storage (like v2's TodoWrite) is simpler and faster but loses state on crash and doesn't work across multiple agent processes. A proper database (SQLite, Redis) would provide ACID guarantees and better concurrency, but adds a dependency and operational complexity. Files are the zero-dependency persistence layer that works everywhere.",
+      "zh": {
+        "title": "任务存储为 JSON 文件，而非内存",
+        "description": "任务以 JSON 文件形式持久化在 .tasks/ 目录中，而非保存在内存里。这有三个关键好处：(1) 任务在进程崩溃后仍然存在——如果 agent 在任务中途崩溃，重启后任务板仍在磁盘上；(2) 多个 agent 可以读写同一任务目录，无需共享内存即可实现多代理协调；(3) 人类可以查看和手动编辑任务文件来调试。文件系统就是共享数据库。"
+      },
+      "ja": {
+        "title": "タスクをメモリではなく JSON ファイルとして保存",
+        "description": "タスクはメモリ内ではなく .tasks/ ディレクトリに JSON ファイルとして永続化されます。3つの重要な利点があります：(1) プロセスのクラッシュ後もタスクが存続する――エージェントがタスク途中でクラッシュしても、再起動時にタスクボードはディスク上に残っています。(2) 複数のエージェントが同じタスクディレクトリを読み書きでき、共有メモリなしにマルチエージェント連携が可能になります。(3) 人間がデバッグのためにタスクファイルを検査・手動編集できます。ファイルシステムが共有データベースになります。"
+      }
+    },
+    {
+      "id": "dependency-graph",
+      "title": "Tasks Have blocks/blockedBy Dependency Fields",
+      "description": "Each task can declare which other tasks it blocks (downstream dependents) and which tasks block it (upstream dependencies). An agent will not start a task that has unresolved blockedBy dependencies. This is essential for multi-agent coordination: when Agent A is writing the database schema and Agent B needs to write queries against it, Agent B's task is blockedBy Agent A's task. Without dependencies, both agents might start simultaneously and Agent B would work against a schema that doesn't exist yet.",
+      "alternatives": "Simple priority ordering (high/medium/low) doesn't capture 'task B literally cannot start until task A finishes.' A centralized coordinator that assigns tasks in order would work but creates a single point of failure and bottleneck. Declarative dependencies let each agent independently determine what it can work on by reading the task files.",
+      "zh": {
+        "title": "任务具有 blocks/blockedBy 依赖字段",
+        "description": "每个任务可以声明它阻塞哪些任务（下游依赖）以及它被哪些任务阻塞（上游依赖）。Agent 不会开始有未解决 blockedBy 依赖的任务。这对多代理协调至关重要：当 Agent A 在编写数据库 schema、Agent B 需要写查询时，Agent B 的任务被 Agent A 的任务阻塞。没有依赖关系，两个 agent 可能同时开始，而 Agent B 会针对一个尚不存在的 schema 工作。"
+      },
+      "ja": {
+        "title": "タスクに blocks/blockedBy 依存関係フィールド",
+        "description": "各タスクは、自分がブロックするタスク（下流の依存先）と、自分をブロックするタスク（上流の依存元）を宣言できます。エージェントは未解決の blockedBy 依存がある タスクを開始しません。これはマルチエージェント連携に不可欠です：エージェント A がデータベーススキーマを書いていてエージェント B がそれに対するクエリを書く必要がある場合、B のタスクは A のタスクにブロックされます。依存関係がなければ両エージェントが同時に開始し、B はまだ存在しないスキーマに対して作業することになります。"
+      }
+    },
+    {
+      "id": "task-replaces-todo",
+      "title": "TaskManager Replaces TodoWrite",
+      "description": "TaskManager is the multi-agent evolution of TodoWrite. Same core concept (a list of items with statuses) but with critical additions: file persistence (survives crashes), dependency tracking (blocks/blockedBy), ownership (which agent is working on what), and multi-process safety. TodoWrite was designed for a single agent tracking its own work in memory. TaskManager is designed for a team of agents coordinating through the filesystem. The API is intentionally similar so the conceptual upgrade path is clear.",
+      "alternatives": "Keeping TodoWrite for single-agent use and adding TaskManager only for multi-agent scenarios would avoid breaking the single-agent experience. But maintaining two systems with overlapping functionality increases complexity. TaskManager is a strict superset of TodoWrite -- a single agent using TaskManager just ignores the multi-agent features.",
+      "zh": {
+        "title": "TaskManager 取代 TodoWrite",
+        "description": "TaskManager 是 TodoWrite 的多代理进化版。核心概念相同（带状态的项目列表），但增加了关键能力：文件持久化（崩溃后存活）、依赖追踪（blocks/blockedBy）、所有权（哪个 agent 在处理什么）、以及多进程安全。TodoWrite 为单 agent 在内存中追踪自身工作而设计。TaskManager 为代理团队通过文件系统协调而设计。API 刻意保持相似，使概念升级路径清晰。"
+      },
+      "ja": {
+        "title": "TaskManager が TodoWrite を置き換え",
+        "description": "TaskManager は TodoWrite のマルチエージェント進化版です。コア概念は同じ（ステータス付きの項目リスト）ですが、重要な追加があります：ファイル永続化（クラッシュ後も存続）、依存関係追跡（blocks/blockedBy）、所有権（どのエージェントが何を担当しているか）、マルチプロセス安全性。TodoWrite は単一エージェントがメモリ内で自身の作業を追跡するために設計されました。TaskManager はエージェントチームがファイルシステムを通じて連携するために設計されています。API は意図的に類似させ、概念的なアップグレードパスを明確にしています。"
+      }
+    }
+  ]
+}
--- a/web/src/data/annotations/s08.json
+++ b/web/src/data/annotations/s08.json
@ -0,0 +1,47 @@
+{
+  "version": "s08",
+  "decisions": [
+    {
+      "id": "notification-bus",
+      "title": "threading.Queue as the Notification Bus",
+      "description": "Background task results are delivered via a threading.Queue instead of direct callbacks. The background thread puts a notification on the queue when its work completes. The main agent loop polls the queue before each LLM call. This decoupling is important: the background thread doesn't need to know anything about the main loop's state or timing. It just drops a message on the queue and moves on. The main loop picks it up at its own pace -- never mid-API-call, never mid-tool-execution. No race conditions, no callback hell.",
+      "alternatives": "Direct callbacks (background thread calls a function in the main thread) would deliver results faster but create thread-safety issues -- the callback might fire while the main thread is in the middle of building a request. Event-driven systems (asyncio, event emitters) work but add complexity. A queue is the simplest thread-safe communication primitive.",
+      "zh": {
+        "title": "用 threading.Queue 作为通知总线",
+        "description": "后台任务结果通过 threading.Queue 传递，而非直接回调。后台线程在工作完成时向队列放入通知，主 agent 循环在每次 LLM 调用前轮询队列。这种解耦很重要：后台线程无需了解主循环的状态或时序，只需往队列放入消息然后继续。主循环按自己的节奏取出消息——永远不会在 API 调用中途或工具执行中途。没有竞争条件，没有回调地狱。"
+      },
+      "ja": {
+        "title": "threading.Queue を通知バスとして使用",
+        "description": "バックグラウンドタスクの結果は直接コールバックではなく threading.Queue を通じて配信されます。バックグラウンドスレッドは作業完了時にキューに通知を投入します。メインのエージェントループは各 LLM 呼び出しの前にキューをポーリングします。この疎結合が重要です：バックグラウンドスレッドはメインループの状態やタイミングを一切知る必要がありません。キューにメッセージを入れて先に進むだけです。メインループは自分のペースで取り出します――API 呼び出しの途中でもツール実行の途中でもありません。レースコンディションもコールバック地獄もありません。"
+      }
+    },
+    {
+      "id": "daemon-threads",
+      "title": "Background Tasks Run as Daemon Threads",
+      "description": "Background task threads are created with daemon=True. In Python, daemon threads are killed automatically when the main thread exits. This prevents a common problem: if the main agent completes its work and exits, but a background thread is still running (waiting on a long API call, stuck in a loop), the process would hang indefinitely. With daemon threads, exit is clean -- the main thread finishes, all daemon threads die, process exits. No zombie processes, no cleanup code needed.",
+      "alternatives": "Non-daemon threads with explicit cleanup (join with timeout, then terminate) give more control over shutdown but require careful lifecycle management. Process-based parallelism (multiprocessing) provides stronger isolation but higher overhead. Daemon threads are the pragmatic choice: minimal code, correct behavior in the common case.",
+      "zh": {
+        "title": "后台任务以守护线程运行",
+        "description": "后台任务线程以 daemon=True 创建。在 Python 中，守护线程在主线程退出时自动被终止。这防止了一个常见问题：如果主 agent 完成工作并退出，但后台线程仍在运行（等待一个长时间 API 调用或陷入循环），进程会无限挂起。使用守护线程，退出是干净的——主线程结束，所有守护线程自动终止，进程退出。没有僵尸进程，不需要清理代码。"
+      },
+      "ja": {
+        "title": "バックグラウンドタスクはデーモンスレッドとして実行",
+        "description": "バックグラウンドタスクのスレッドは daemon=True で作成されます。Python ではデーモンスレッドはメインスレッドの終了時に自動的に終了されます。これにより一般的な問題を防ぎます：メインエージェントが作業を完了して終了しても、バックグラウンドスレッドがまだ実行中（長い API 呼び出しを待機、ループに陥っている）だとプロセスが無限にハングします。デーモンスレッドならクリーンに終了できます――メインスレッドが終了すると全デーモンスレッドが自動終了し、プロセスが終了します。ゾンビプロセスもクリーンアップコードも不要です。"
+      }
+    },
+    {
+      "id": "attachment-format",
+      "title": "Structured Notification Format with Type Tags",
+      "description": "Notifications from background tasks use a structured format: {\"type\": \"attachment\", \"attachment\": {status, result, ...}} instead of plain text strings. The type tag lets the main loop handle different notification types differently: an 'attachment' might be injected into the conversation as a tool_result, while a 'status_update' might just update a progress indicator. Machine-readable notifications also enable programmatic filtering (show only errors, suppress progress updates) and UI rendering (display status as a progress bar, not raw text).",
+      "alternatives": "Plain text notifications are simpler but lose structure. The main loop would have to parse free-form text to determine what happened, which is fragile. A class hierarchy (StatusNotification, ResultNotification, ErrorNotification) is more Pythonic but less portable -- JSON structures work the same way regardless of language or serialization format.",
+      "zh": {
+        "title": "带类型标签的结构化通知格式",
+        "description": "后台任务的通知使用结构化格式：{\"type\": \"attachment\", \"attachment\": {status, result, ...}}，而非纯文本字符串。类型标签让主循环可以区别处理不同通知类型：attachment 可能作为 tool_result 注入对话，而 status_update 可能只更新进度指示器。机器可读的通知还支持程序化过滤（只显示错误、抑制进度更新）和 UI 渲染（将状态显示为进度条而非原始文本）。"
+      },
+      "ja": {
+        "title": "型タグ付き構造化通知フォーマット",
+        "description": "バックグラウンドタスクからの通知は構造化フォーマットを使用します：プレーンテキストではなく {\"type\": \"attachment\", \"attachment\": {status, result, ...}} です。型タグによりメインループは異なる通知タイプを異なる方法で処理できます：attachment は会話に tool_result として注入され、status_update は進捗インジケーターの更新のみを行うかもしれません。機械可読な通知はプログラム的なフィルタリング（エラーのみ表示、進捗更新の抑制）や UI レンダリング（ステータスを生テキストではなくプログレスバーとして表示）も可能にします。"
+      }
+    }
+  ]
+}
--- a/web/src/data/annotations/s09.json
+++ b/web/src/data/annotations/s09.json
@ -0,0 +1,47 @@
+{
+  "version": "s09",
+  "decisions": [
+    {
+      "id": "teammate-vs-subagent",
+      "title": "Persistent Teammates vs One-Shot Subagents",
+      "description": "In v3, subagents are ephemeral: spawn, do one task, return result, die. Their knowledge dies with them. In v8, teammates are persistent threads with identity (name, role) and config files. A teammate can complete task A, then be assigned task B, carrying forward everything it learned. This is the difference between hiring a contractor for one job and having a team member. Persistent teammates accumulate project knowledge, understand established patterns, and don't need to re-read the same files for every task.",
+      "alternatives": "One-shot subagents (v3 style) are simpler and provide perfect context isolation -- no risk of one task's context polluting another. But the re-learning cost is high: every new task starts from zero. A middle ground (subagents with shared memory/knowledge base) was considered but adds complexity without the full benefit of persistent identity and state.",
+      "zh": {
+        "title": "持久化队友 vs 一次性子代理",
+        "description": "在 v3 中，子代理是临时的：创建、执行一个任务、返回结果、销毁。它们的知识随之消亡。在 v8 中，队友是具有身份（名称、角色）和配置文件的持久化线程。队友可以完成任务 A，然后被分配任务 B，并携带之前学到的所有知识。这就是雇一个临时工做一个项目和拥有一个团队成员之间的区别。持久化队友积累项目知识，理解已建立的模式，不需要为每个任务重新阅读相同的文件。"
+      },
+      "ja": {
+        "title": "永続的なチームメイト vs 使い捨てサブエージェント",
+        "description": "v3 ではサブエージェントは一時的です：生成、1つのタスクを実行、結果を返却、消滅。その知識も一緒に消えます。v8 ではチームメイトはアイデンティティ（名前、役割）と設定ファイルを持つ永続的なスレッドです。チームメイトはタスク A を完了した後、学んだ全てを引き継いでタスク B に割り当てられます。これは1つの仕事のために請負業者を雇うことと、チームメンバーを持つことの違いです。永続的なチームメイトはプロジェクトの知識を蓄積し、確立されたパターンを理解し、タスクごとに同じファイルを再読する必要がありません。"
+      }
+    },
+    {
+      "id": "file-based-team-config",
+      "title": "Team Config Persisted to .teams/{name}/config.json",
+      "description": "Team structure (member names, roles, agent IDs) is stored in a JSON config file, not in any agent's memory. Any agent can discover its teammates by reading the config file -- no need for a discovery service or shared memory. If an agent crashes and restarts, it reads the config to find out who else is on the team. This is consistent with the v6 philosophy: the filesystem is the coordination layer. Config files are also human-readable, making it easy to manually add/remove team members or debug team setup issues.",
+      "alternatives": "In-memory team registries are faster but don't survive process restarts and require a central process to maintain. Service discovery (like DNS or a discovery server) is more robust at scale but overkill for a local multi-agent system. File-based config is the simplest approach that works across independent processes.",
+      "zh": {
+        "title": "团队配置持久化到 .teams/{name}/config.json",
+        "description": "团队结构（成员名称、角色、agent ID）存储在 JSON 配置文件中，而非任何 agent 的内存中。任何 agent 都可以通过读取配置文件发现队友——无需发现服务或共享内存。如果 agent 崩溃并重启，它读取配置即可知道团队中还有谁。这与 v6 的理念一致：文件系统就是协调层。配置文件人类可读，便于手动添加或移除团队成员、调试团队配置问题。"
+      },
+      "ja": {
+        "title": "チーム設定を .teams/{name}/config.json に永続化",
+        "description": "チーム構成（メンバー名、役割、エージェント ID）はエージェントのメモリではなく JSON 設定ファイルに保存されます。どのエージェントも設定ファイルを読むことでチームメイトを発見できます――ディスカバリーサービスや共有メモリは不要です。エージェントがクラッシュして再起動した場合、設定を読んで他のチームメンバーを把握します。これは v6 の思想と一貫しています：ファイルシステムが連携レイヤーです。設定ファイルは人間が読めるため、チームメンバーの手動追加・削除やチーム設定問題のデバッグが容易です。"
+      }
+    },
+    {
+      "id": "tool-filtering-by-role",
+      "title": "Teammates Get Subset of Tools, Lead Gets All",
+      "description": "The team lead receives ALL_TOOLS (including TeamCreate, SendMessage, TaskCreate, etc.) while teammates receive TEAMMATE_TOOLS (a reduced set focused on task execution). This enforces a clear separation of concerns: teammates focus on doing work (coding, testing, researching), while the lead focuses on coordination (creating tasks, assigning work, managing communication). Giving teammates coordination tools would let them create their own sub-teams or reassign tasks, undermining the lead's ability to maintain a coherent plan.",
+      "alternatives": "Giving all agents identical tools is simpler and more egalitarian, but in practice leads to coordination chaos -- multiple agents trying to manage each other, creating conflicting task assignments. A permission system (any agent can request elevation) adds flexibility but also complexity. Static role-based filtering is predictable and easy to reason about.",
+      "zh": {
+        "title": "队友获得工具子集，组长获得全部工具",
+        "description": "团队组长获得 ALL_TOOLS（包括 TeamCreate、SendMessage、TaskCreate 等），而队友获得 TEAMMATE_TOOLS（专注于任务执行的精简工具集）。这强制了清晰的职责分离：队友专注于做事（编码、测试、研究），组长专注于协调（创建任务、分配工作、管理沟通）。给队友协调工具会让他们创建自己的子团队或重新分配任务，破坏组长维持连贯计划的能力。"
+      },
+      "ja": {
+        "title": "チームメイトはツールのサブセット、リーダーは全ツール",
+        "description": "チームリーダーは ALL_TOOLS（TeamCreate、SendMessage、TaskCreate など含む）を受け取り、チームメイトは TEAMMATE_TOOLS（タスク実行に特化した縮小セット）を受け取ります。これにより明確な関心の分離が強制されます：チームメイトは作業（コーディング、テスト、調査）に集中し、リーダーは調整（タスク作成、作業割り当て、コミュニケーション管理）に集中します。チームメイトに調整ツールを与えると、独自のサブチーム作成やタスクの再割り当てが可能になり、リーダーの一貫した計画維持能力が損なわれます。"
+      }
+    }
+  ]
+}
--- a/web/src/data/annotations/s10.json
+++ b/web/src/data/annotations/s10.json
@ -0,0 +1,47 @@
+{
+  "version": "s10",
+  "decisions": [
+    {
+      "id": "jsonl-inbox",
+      "title": "JSONL Inbox Files Instead of Shared Memory",
+      "description": "Each teammate has its own inbox file (a JSONL file in the team directory). Sending a message means appending a JSON line to the recipient's inbox file. Reading messages means reading the inbox file and tracking which line was last read. JSONL is append-only by nature, which means concurrent writers don't corrupt each other's data (appends to different file positions). This works across processes without any shared memory, mutex, or IPC mechanism. It's also crash-safe: if the writer crashes mid-append, the worst case is one partial line that the reader can skip.",
+      "alternatives": "Shared memory (Python multiprocessing.Queue) would be faster but doesn't work if agents are separate processes launched independently. A message broker (Redis, RabbitMQ) provides robust pub/sub but adds infrastructure dependencies. Unix domain sockets would work but are harder to debug (no human-readable message log). JSONL files are the simplest approach that provides persistence, cross-process communication, and debuggability.",
+      "zh": {
+        "title": "JSONL 收件箱文件而非共享内存",
+        "description": "每个队友都有自己的收件箱文件（团队目录中的 JSONL 文件）。发送消息意味着向接收者的收件箱文件追加一行 JSON。读取消息意味着读取收件箱文件并追踪上次读到的行。JSONL 天然是仅追加的，这意味着并发写入不会破坏彼此的数据（追加到不同的文件位置）。这在无需共享内存、互斥锁或 IPC 机制的情况下跨进程工作。它也是崩溃安全的：如果写入者在追加中途崩溃，最坏情况是一行不完整的数据，读取者可以跳过。"
+      },
+      "ja": {
+        "title": "共有メモリではなく JSONL インボックスファイル",
+        "description": "各チームメイトはチームディレクトリ内に独自のインボックスファイル（JSONL ファイル）を持ちます。メッセージの送信は受信者のインボックスファイルに JSON 行を追記することです。メッセージの読み取りはインボックスファイルを読んで最後に読んだ行を追跡することです。JSONL は本質的に追記専用で、並行ライターが互いのデータを破壊しません（異なるファイル位置への追記）。共有メモリ、ミューテックス、IPC メカニズムなしにプロセス間で動作します。クラッシュにも安全です：ライターが追記途中でクラッシュしても、最悪の場合は不完全な1行だけでリーダーはスキップできます。"
+      }
+    },
+    {
+      "id": "five-message-types",
+      "title": "Exactly Five Message Types Cover All Coordination Patterns",
+      "description": "The messaging system supports exactly five types: (1) 'message' for point-to-point communication between two agents, (2) 'broadcast' for team-wide announcements, (3) 'shutdown_request' for graceful termination, (4) 'shutdown_response' for acknowledging shutdown, (5) 'plan_approval_response' for the lead to approve or reject a teammate's plan. These five types map to the fundamental coordination patterns: direct communication, broadcast, lifecycle management, and approval workflows. Adding more types (e.g., priority_message, status_update) would increase complexity without enabling new coordination patterns.",
+      "alternatives": "A single generic message type with metadata fields would be more flexible but makes it harder to enforce protocol correctness. Many more types (10+) would provide finer-grained semantics but increase the model's decision burden. Five types is the sweet spot where every type has a clear, distinct purpose and the model can reliably choose the right one.",
+      "zh": {
+        "title": "恰好五种消息类型覆盖所有协调模式",
+        "description": "消息系统恰好支持五种类型：(1) message 用于两个 agent 间的点对点通信；(2) broadcast 用于全团队公告；(3) shutdown_request 用于优雅终止；(4) shutdown_response 用于确认终止；(5) plan_approval_response 用于组长批准或拒绝队友的计划。这五种类型映射到基本协调模式：直接通信、广播、生命周期管理和审批流程。增加更多类型（如 priority_message、status_update）只会增加复杂度而不会启用新的协调模式。"
+      },
+      "ja": {
+        "title": "正確に5つのメッセージタイプで全連携パターンをカバー",
+        "description": "メッセージングシステムは正確に5つのタイプをサポートします：(1) message は2つのエージェント間のポイントツーポイント通信、(2) broadcast はチーム全体への通知、(3) shutdown_request はグレースフルな終了要求、(4) shutdown_response はシャットダウンの確認応答、(5) plan_approval_response はリーダーによるチームメイトの計画の承認・却下。これら5タイプは基本的な連携パターンに対応します：直接通信、ブロードキャスト、ライフサイクル管理、承認ワークフロー。タイプを増やしても（priority_message、status_update など）新たな連携パターンは生まれず、複雑さが増すだけです。"
+      }
+    },
+    {
+      "id": "inbox-before-api-call",
+      "title": "Check Inbox Before Every LLM Call",
+      "description": "Teammates check their inbox file at the top of every agent loop iteration, before calling the LLM API. This ensures maximum responsiveness to incoming messages: a shutdown request is seen within one loop iteration (typically seconds), not after the current task completes (potentially minutes). The inbox check is cheap (read a small file, check if new lines exist) compared to the LLM call (seconds of latency, thousands of tokens). This placement also means incoming messages can influence the next LLM call -- a message saying 'stop working on X, switch to Y' takes effect immediately.",
+      "alternatives": "Checking inbox after each tool execution would be more responsive but adds overhead to every tool call, which is more frequent than LLM calls. A separate watcher thread could monitor the inbox continuously but adds threading complexity. Checking once per LLM call is the pragmatic sweet spot: responsive enough for coordination, cheap enough to not impact performance.",
+      "zh": {
+        "title": "每次 LLM 调用前检查收件箱",
+        "description": "队友在每次 agent 循环迭代的顶部、调用 LLM API 之前检查收件箱文件。这确保了对传入消息的最大响应性：一个终止请求会在一个循环迭代内被看到（通常几秒钟），而非在当前任务完成后（可能数分钟）。收件箱检查成本很低（读取小文件，检查是否有新行），相比 LLM 调用（秒级延迟，数千 token）微不足道。这个位置还意味着传入消息可以影响下一次 LLM 调用——一条'停止 X，转去做 Y'的消息会立即生效。"
+      },
+      "ja": {
+        "title": "毎回の LLM 呼び出し前にインボックスを確認",
+        "description": "チームメイトはエージェントループの各イテレーションの冒頭、LLM API を呼び出す前にインボックスファイルを確認します。これにより受信メッセージへの応答性を最大化します：シャットダウンリクエストは1ループイテレーション以内（通常数秒）で確認され、現在のタスク完了後（数分かかる可能性）ではありません。インボックスの確認は安価で（小さなファイルを読み、新しい行があるか確認）、LLM 呼び出し（秒単位のレイテンシ、数千トークン）と比べて微々たるものです。この配置により受信メッセージが次の LLM 呼び出しに影響できます――「X の作業を止めて Y に切り替えて」というメッセージが即座に有効になります。"
+      }
+    }
+  ]
+}
--- a/web/src/data/annotations/s11.json
+++ b/web/src/data/annotations/s11.json
@ -0,0 +1,47 @@
+{
+  "version": "s11",
+  "decisions": [
+    {
+      "id": "polling-not-events",
+      "title": "Polling for Unclaimed Tasks Instead of Event-Driven Notification",
+      "description": "Autonomous teammates poll the shared task board every ~1 second to find unclaimed tasks, rather than waiting for event-driven notifications. Polling is fundamentally simpler than pub/sub: there's no subscription management, no event routing, no missed-event bugs. With file-based persistence, polling is just 'read the directory listing' -- a cheap operation that works regardless of how many agents are running. The 1-second interval balances responsiveness (new tasks are discovered quickly) against filesystem overhead (not hammering the disk with reads).",
+      "alternatives": "Event-driven notification (file watchers via inotify/fsevents, or a pub/sub channel) would reduce latency from seconds to milliseconds. But file watchers are platform-specific and unreliable across network filesystems. A message broker would work but adds infrastructure. For a system where tasks take minutes to complete, discovering new tasks in 1 second instead of 10 milliseconds makes no practical difference.",
+      "zh": {
+        "title": "轮询未认领任务而非事件驱动通知",
+        "description": "自主队友每隔约 1 秒轮询共享任务板以寻找未认领的任务，而非等待事件驱动的通知。轮询从根本上比发布/订阅更简单：没有订阅管理、没有事件路由、没有事件丢失的 bug。在基于文件的持久化下，轮询就是'读取目录列表'——一个低成本操作，无论有多少 agent 在运行都能正常工作。1 秒的间隔平衡了响应性（新任务被快速发现）和文件系统开销（不会过度读取磁盘）。"
+      },
+      "ja": {
+        "title": "イベント駆動通知ではなくポーリングで未割り当てタスクを発見",
+        "description": "自律的なチームメイトはイベント駆動の通知を待つのではなく、約1秒ごとに共有タスクボードをポーリングして未割り当てタスクを探します。ポーリングはパブ/サブより根本的にシンプルです：サブスクリプション管理、イベントルーティング、イベント欠落バグがありません。ファイルベースの永続化では、ポーリングは「ディレクトリ一覧を読む」だけで、実行中のエージェント数に関係なく動作する安価な操作です。1秒間隔は応答性（新タスクの迅速な発見）とファイルシステムのオーバーヘッド（ディスク読み取りの過負荷回避）のバランスを取っています。"
+      }
+    },
+    {
+      "id": "idle-timeout",
+      "title": "60-Second Idle Timeout Before Self-Termination",
+      "description": "When an autonomous teammate has no tasks to work on and no messages in its inbox, it waits up to 60 seconds before giving up and shutting down. This prevents zombie teammates that wait forever for work that never comes -- a real problem when the lead forgets to send a shutdown request, or when all remaining tasks are blocked on external events. The 60-second window is long enough that a brief gap between task completions and new task creation won't cause premature shutdown, but short enough that unused teammates don't waste resources.",
+      "alternatives": "No timeout (wait forever) risks zombie processes. A very short timeout (5s) causes premature exits when the lead is simply thinking or typing. A heartbeat system (lead periodically pings teammates to keep them alive) works but adds protocol complexity. The 60-second fixed timeout is a good default that balances false-positive exits against resource waste.",
+      "zh": {
+        "title": "空闲 60 秒后自动终止",
+        "description": "当自主队友没有任务可做且收件箱中没有消息时，它最多等待 60 秒后放弃并关闭。这防止了永远等待不会到来的工作的僵尸队友——这在组长忘记发送关闭请求、或所有剩余任务都被外部事件阻塞时是真实存在的问题。60 秒窗口足够长，不会因为任务完成到新任务创建之间的短暂间隔而导致过早关闭；又足够短，不会让闲置队友浪费资源。"
+      },
+      "ja": {
+        "title": "60秒のアイドルタイムアウトで自動終了",
+        "description": "自律的なチームメイトが作業するタスクもインボックスのメッセージもない場合、最大60秒待ってから諦めてシャットダウンします。これにより永遠に来ない仕事を待ち続けるゾンビチームメイトを防ぎます――リーダーがシャットダウンリクエストの送信を忘れたり、残りのタスクが全て外部イベントでブロックされている場合に実際に起こる問題です。60秒のウィンドウはタスク完了から新タスク作成までの短い間隔で早期シャットダウンが起きない十分な長さであり、かつ未使用のチームメイトがリソースを浪費しない十分な短さです。"
+      }
+    },
+    {
+      "id": "identity-after-compression",
+      "title": "Re-Inject Teammate Identity After Context Compression",
+      "description": "When auto_compact compresses the conversation, the resulting summary loses crucial metadata: the teammate's name, which team it belongs to, and its agent_id. Without this information, the teammate can't claim tasks (tasks are owned by name), can't check its inbox (inbox files are keyed by agent_id), and can't identify itself in messages. So after every auto_compact, the system re-injects a structured identity block into the conversation: 'You are [name] on team [team], your agent_id is [id], your inbox is at [path].' This is the minimum context needed for the teammate to remain functional after memory loss.",
+      "alternatives": "Putting identity in the system prompt (which survives compression) would avoid this problem, but violates the cache-friendly static-system-prompt design from v4. Embedding identity in the summary prompt ('when summarizing, always include your name and team') is unreliable -- the LLM might omit it. Explicit post-compression injection is deterministic and guaranteed to work.",
+      "zh": {
+        "title": "上下文压缩后重新注入队友身份",
+        "description": "自动压缩对话时，生成的摘要会丢失关键元数据：队友的名称、所属团队和 agent_id。没有这些信息，队友无法认领任务（任务按名称归属）、无法检查收件箱（收件箱文件以 agent_id 为键）、也无法在消息中表明身份。因此每次自动压缩后，系统会向对话中重新注入一个结构化的身份块：'你是 [team] 团队的 [name]，你的 agent_id 是 [id]，你的收件箱在 [path]。'这是队友在记忆丢失后保持功能所需的最小上下文。"
+      },
+      "ja": {
+        "title": "コンテキスト圧縮後にチームメイトのアイデンティティを再注入",
+        "description": "自動コンパクトが会話を圧縮すると、生成された要約は重要なメタデータを失います：チームメイトの名前、所属チーム、agent_id。この情報がなければチームメイトはタスクを申告できず（タスクは名前で所有）、インボックスを確認できず（インボックスファイルは agent_id をキーとする）、メッセージで自分を識別できません。そのため自動コンパクトの後、システムは構造化されたアイデンティティブロックを会話に再注入します：「あなたは [team] チームの [name] です。agent_id は [id]、インボックスは [path] にあります。」これはメモリ喪失後もチームメイトが機能し続けるために必要な最小限のコンテキストです。"
+      }
+    }
+  ]
+}