mirror of https://github.com/shareAI-lab/learn-claude-code.git
synced 2026-05-04 09:10:13 +00:00

add worktree & up task, teammate etc

This commit is contained in:
parent c6a27ef1d7
commit aea8844bac

54 changed files with 2404 additions and 210 deletions
@@ -1,6 +1,6 @@
 # s01: The Agent Loop
 
-> The entire secret of AI coding agents is a while loop that feeds tool results back to the model until the model decides to stop.
+> The core of a coding agent is a while loop that feeds tool results back to the model until the model decides to stop.
 
 ## The Problem
 
@@ -59,7 +59,8 @@ messages.append({"role": "assistant", "content": response.content})
 ```
 
 4. We check the stop reason. If the model did not call a tool, the loop
-ends. This is the only exit condition.
+ends. In this minimal lesson implementation, this is the only loop exit
+condition.
 
 ```python
 if response.stop_reason != "tool_use":
@@ -126,7 +127,7 @@ This is session 1 -- the starting point. There is no prior session.
 
 ## Design Rationale
 
-This loop is the universal foundation of all LLM-based agents. Production implementations add error handling, token counting, streaming, and retry logic, but the fundamental structure is unchanged. The simplicity is the point: one exit condition (`stop_reason != "tool_use"`) controls the entire flow. Everything else in this course -- tools, planning, compression, teams -- layers on top of this loop without modifying it. Understanding this loop means understanding every agent.
+This loop is the foundation of LLM-based agents. Production implementations add error handling, token counting, streaming, retry logic, permission policy, and lifecycle orchestration, but the core interaction pattern still starts here. The simplicity is the point for this session: in this minimal implementation, one exit condition (`stop_reason != "tool_use"`) controls the flow we need to learn first. Everything else in this course layers on top of this loop. Understanding this loop gives you the base model, not the full production architecture.
 
 ## Try It
 
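For readers skimming this hunk, the loop it describes can be sketched end to end as a runnable toy. Here `call_model` and `run_tool` are hypothetical stand-ins for the real API client and tool dispatch, not the course's actual code; only the exit condition mirrors the lesson:

```python
# Minimal sketch of the s01 agent loop with a fake model.
def call_model(messages):
    # Fake model: requests one tool call, then stops on the next turn.
    if not any(m["role"] == "tool" for m in messages):
        return {"stop_reason": "tool_use", "tool": "echo", "input": "hi"}
    return {"stop_reason": "end_turn", "text": "done"}

def run_tool(name, tool_input):
    return f"{name}: {tool_input}"

def agent_loop(messages):
    while True:
        response = call_model(messages)
        messages.append({"role": "assistant", "content": response})
        if response["stop_reason"] != "tool_use":  # the single exit condition
            return response["text"]
        result = run_tool(response["tool"], response["input"])
        messages.append({"role": "tool", "content": result})

print(agent_loop([{"role": "user", "content": "say hi"}]))  # prints: done
```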
@@ -1,6 +1,6 @@
 # s02: Tools
 
-> A dispatch map routes tool calls to handler functions -- the loop itself does not change at all.
+> A dispatch map routes tool calls to handler functions. The loop stays identical.
 
 ## The Problem
 
@@ -133,7 +133,7 @@ def agent_loop(messages: list):
 
 ## Design Rationale
 
-The dispatch map pattern scales linearly -- adding a tool means adding one handler and one schema entry. The loop never changes. This separation of concerns (loop vs handlers) is why agent frameworks can support dozens of tools without increasing control flow complexity. The pattern also enables independent testing of each handler in isolation, since handlers are pure functions with no coupling to the loop. Any agent that outgrows a dispatch map has a design problem, not a scaling problem.
+The dispatch map scales linearly: add a tool, add a handler, add a schema entry. The loop never changes. Handlers are pure functions, so they test in isolation. Any agent that outgrows a dispatch map has a design problem, not a scaling problem.
 
 ## Try It
 
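The dispatch-map pattern in this hunk can be sketched in a few lines. The handler functions below (`run_read`, `run_bash`) are illustrative stubs, not the course's real implementations; the `TOOL_HANDLERS`-with-lambdas shape follows the style the course uses elsewhere:

```python
# One handler per tool, keyed by name; the loop only ever calls dispatch().
def run_read(path: str) -> str:
    return f"read {path}"

def run_bash(command: str) -> str:
    return f"ran {command}"

TOOL_HANDLERS = {
    "read_file": lambda **kw: run_read(kw["path"]),
    "bash": lambda **kw: run_bash(kw["command"]),
}

def dispatch(name: str, **kwargs) -> str:
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"Error: unknown tool '{name}'"
    return handler(**kwargs)

print(dispatch("read_file", path="a.py"))  # prints: read a.py
```

Adding a third tool touches only the map and its handler; `dispatch` and the loop are untouched.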
@@ -19,9 +19,7 @@ explicitly. The model creates a plan, marks items in_progress as it works,
 and marks them completed when done. A nag reminder injects a nudge if the
 model goes 3+ rounds without updating its todos.
 
-Teaching simplification: the nag threshold of 3 rounds is set low for
-teaching visibility. Production agents typically use a higher threshold
-around 10 to avoid excessive prompting.
+Note: the nag threshold of 3 rounds is low for visibility. Production systems tune higher. From s07, this course switches to the Task board for durable multi-step work; TodoWrite remains available for quick checklists.
 
 ## The Solution
 
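The nag mechanism this hunk tunes can be sketched as a counter plus an injected reminder. The function name and message text below are hypothetical, for illustration only:

```python
NAG_THRESHOLD = 3  # low for teaching visibility; production tunes higher

def maybe_nag(rounds_since_todo_update: int, messages: list) -> int:
    """Inject a reminder once the threshold is crossed; return the new counter."""
    if rounds_since_todo_update >= NAG_THRESHOLD:
        messages.append({
            "role": "user",
            "content": "Reminder: update your todo list to reflect progress.",
        })
        return 0  # reset the counter after nagging
    return rounds_since_todo_update

msgs = []
assert maybe_nag(2, msgs) == 2 and msgs == []      # below threshold: no-op
assert maybe_nag(3, msgs) == 0 and len(msgs) == 1  # at threshold: nag + reset
```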
@@ -14,7 +14,7 @@ does this project use?" might require reading 5 files, but the parent
 agent does not need all 5 file contents in its history -- it just needs
 the answer: "pytest with conftest.py configuration."
 
-The solution is process isolation: spawn a child agent with `messages=[]`.
+In this course, a practical solution is fresh-context isolation: spawn a child agent with `messages=[]`.
 The child explores, reads files, runs commands. When it finishes, only its
 final text response returns to the parent. The child's entire message
 history is discarded.
@@ -137,11 +137,10 @@ def run_subagent(prompt: str) -> str:
 | Context | Single shared | Parent + child isolation |
 | Subagent | None | `run_subagent()` function |
 | Return value | N/A | Summary text only |
-| Todo system | TodoManager | Removed (not needed here) |
 
 ## Design Rationale
 
-Process isolation gives context isolation for free. A fresh `messages[]` means the subagent cannot be confused by the parent's conversation history. The tradeoff is communication overhead -- results must be compressed back to the parent, losing detail. This is the same tradeoff as OS process isolation: safety and cleanliness in exchange for serialization cost. Limiting subagent depth (no recursive spawning) prevents unbounded resource consumption, and a max iteration count ensures runaway children terminate.
+Fresh-context isolation is a practical way to approximate context isolation in this session. A fresh `messages[]` means the subagent starts without the parent's conversation history. The tradeoff is communication overhead -- results must be compressed back to the parent, losing detail. This is a message-history isolation strategy, not OS process isolation. Limiting subagent depth (no recursive spawning) prevents unbounded resource consumption, and a max iteration count ensures runaway children terminate.
 
 ## Try It
 
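The fresh-context pattern in this hunk fits in a few lines. The `agent_loop` below is a stand-in stub (the real one comes from s01/s02); only the shape of `run_subagent` is the point:

```python
def agent_loop(messages: list) -> str:
    # Stand-in for the real loop: returns the model's final text.
    return f"answer to: {messages[0]['content']}"

def run_subagent(prompt: str) -> str:
    child_messages = [{"role": "user", "content": prompt}]  # fresh context
    final_text = agent_loop(child_messages)
    # child_messages is discarded here; only the final text crosses back
    return final_text

print(run_subagent("What test framework does this project use?"))
```

The parent's history never enters `child_messages`, and the child's history never leaves this function.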
@@ -144,7 +144,6 @@ class SkillLoader:
 | System prompt | Static string | + skill descriptions |
 | Knowledge | None | .skills/*.md files |
 | Injection | None | Two-layer (system + result)|
-| Subagent | `run_subagent()` | Removed (different focus) |
 
 ## Design Rationale
 
@@ -162,7 +162,6 @@ def agent_loop(messages):
 | Auto-compact | None | Token threshold trigger |
 | Manual compact | None | `compact` tool |
 | Transcripts | None | Saved to .transcripts/ |
-| Skills | load_skill | Removed (different focus) |
 
 ## Design Rationale
 
@@ -1,28 +1,31 @@
 # s07: Tasks
 
-> Tasks persist as JSON files on the filesystem with a dependency graph, so they survive context compression and can be shared across agents.
+> Tasks are persisted as JSON files with a dependency graph, so state survives context compression and can be shared across agents.
 
-## The Problem
+## Problem
 
-In-memory state like TodoManager (s03) is lost when the context is
-compressed (s06). After auto_compact replaces messages with a summary,
-the todo list is gone. The agent has to reconstruct it from the summary
-text, which is lossy and error-prone.
+In-memory state (for example the TodoManager from s03) is fragile under compression (s06). Once earlier turns are compacted into summaries, in-memory todo state is gone.
 
-This is the critical s06-to-s07 bridge: TodoManager items die with
-compression; file-based tasks don't. Moving state to the filesystem
-makes it compression-proof.
+s06 -> s07 is the key transition:
 
-More fundamentally, in-memory state is invisible to other agents.
-When we eventually build teams (s09+), teammates need a shared task
-board. In-memory data structures are process-local.
+1. Todo list state in memory is conversational and lossy.
+2. Task board state on disk is durable and recoverable.
 
-The solution is to persist tasks as JSON files in `.tasks/`. Each task
-is a separate file with an ID, subject, status, and dependency graph.
-Completing task 1 automatically unblocks task 2 if task 2 has
-`blockedBy: [1]`. The file system becomes the source of truth.
+A second issue is visibility: in-memory structures are process-local, so teammates cannot reliably share that state.
 
-## The Solution
+## When to Use Task vs Todo
+
+From s07 onward, Task is the default. Todo remains for short linear checklists.
+
+## Quick Decision Matrix
+
+| Situation | Prefer | Why |
+|---|---|---|
+| Short, single-session checklist | Todo | Lowest ceremony, fastest capture |
+| Cross-session work, dependencies, or teammates | Task | Durable state, dependency graph, shared visibility |
+| Unsure which one to use | Task | Easier to simplify later than migrate mid-run |
+
+## Solution
 
 ```
 .tasks/
@@ -42,7 +45,7 @@ Dependency resolution:
 
 ## How It Works
 
-1. The TaskManager provides CRUD operations. Each task is a JSON file.
+1. TaskManager provides CRUD with one JSON file per task.
 
 ```python
 class TaskManager:
@@ -61,8 +64,7 @@ class TaskManager:
         return json.dumps(task, indent=2)
 ```
 
-2. When a task is marked completed, `_clear_dependency` removes its ID
-from all other tasks' `blockedBy` lists.
+2. Completing a task clears that dependency from other tasks.
 
 ```python
 def _clear_dependency(self, completed_id: int):
@@ -73,8 +75,7 @@ def _clear_dependency(self, completed_id: int):
         self._save(task)
 ```
 
-3. The `update` method handles status changes and bidirectional dependency
-wiring.
+3. `update` handles status transitions and dependency wiring.
 
 ```python
 def update(self, task_id, status=None,
@@ -94,7 +95,7 @@ def update(self, task_id, status=None,
     self._save(task)
 ```
 
-4. Four task tools are added to the dispatch map.
+4. Task tools are added to the dispatch map.
 
 ```python
 TOOL_HANDLERS = {
@@ -109,8 +110,7 @@ TOOL_HANDLERS = {
 
 ## Key Code
 
-The TaskManager with dependency graph (from `agents/s07_task_system.py`,
-lines 46-123):
+TaskManager with dependency graph (from `agents/s07_task_system.py`, lines 46-123):
 
 ```python
 class TaskManager:
@@ -145,17 +145,20 @@ class TaskManager:
 
 ## What Changed From s06
 
-| Component | Before (s06) | After (s07) |
-|----------------|------------------|----------------------------|
-| Tools | 5 | 8 (+task_create/update/list/get)|
-| State storage | In-memory only | JSON files in .tasks/ |
-| Dependencies | None | blockedBy + blocks graph |
-| Compression | Three-layer | Removed (different focus) |
-| Persistence | Lost on compact | Survives compression |
+| Component | Before (s06) | After (s07) |
+|---|---|---|
+| Tools | 5 | 8 (`task_create/update/list/get`) |
+| State storage | In-memory only | JSON files in `.tasks/` |
+| Dependencies | None | `blockedBy + blocks` graph |
+| Persistence | Lost on compact | Survives compression |
 
 ## Design Rationale
 
-File-based state survives context compression. When the agent's conversation is compacted, in-memory state is lost, but tasks written to disk persist. The dependency graph ensures correct execution order even after context loss. This is the bridge between ephemeral conversation and persistent work -- the agent can forget conversation details but always has the task board to remind it what needs doing. The filesystem as source of truth also enables future multi-agent sharing, since any process can read the same JSON files.
+File-based state survives compaction and process restarts. The dependency graph preserves execution order even when conversation details are forgotten. This turns transient chat context into durable work state.
+
+Durability still needs a write discipline: reload task JSON before each write, validate expected `status/blockedBy`, then persist atomically. Otherwise concurrent writers can overwrite each other.
+
+Course-level implication: s07+ defaults to Task because it better matches long-running and collaborative engineering workflows.
 
 ## Try It
 
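The write discipline named in this hunk (reload, validate, persist atomically) can be sketched as below. The function name and field checks are hypothetical, not the repo's actual code; the atomicity comes from write-to-temp plus `os.replace`:

```python
import json
import os
import tempfile

def update_task_atomic(path: str, expected_status: str, new_status: str) -> bool:
    """Reload the task file, validate it, then replace it atomically."""
    with open(path) as f:
        task = json.load(f)            # reload before every write
    if task.get("status") != expected_status:
        return False                   # someone else changed it; abort
    task["status"] = new_status
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(task, f, indent=2)   # write the full file to a temp path
    os.replace(tmp, path)              # atomic rename over the old file
    return True
```

A failed validation returns `False` instead of clobbering a concurrent writer's update.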
@@ -164,7 +167,7 @@ cd learn-claude-code
 python agents/s07_task_system.py
 ```
 
-Example prompts to try:
+Suggested prompts:
 
 1. `Create 3 tasks: "Setup project", "Write code", "Write tests". Make them depend on each other in order.`
 2. `List all tasks and show the dependency graph`
@@ -168,7 +168,6 @@ class BackgroundManager:
 | Execution | Blocking only | Blocking + background threads|
 | Notification | None | Queue drained per loop |
 | Concurrency | None | Daemon threads |
-| Task system | File-based CRUD | Removed (different focus) |
 
 ## Design Rationale
 
@@ -1,6 +1,6 @@
 # s09: Agent Teams
 
-> Persistent teammates with JSONL inboxes turn isolated agents into a communicating team -- spawn, message, broadcast, and drain.
+> Persistent teammates with JSONL inboxes are one teaching protocol for turning isolated agents into a communicating team -- spawn, message, broadcast, and drain.
 
 ## The Problem
 
@@ -215,7 +215,7 @@ pattern used here is safe for the teaching scenario.
 
 ## Design Rationale
 
-File-based mailboxes (append-only JSONL) provide concurrency-safe inter-agent communication. Append is atomic on most filesystems, avoiding lock contention. The "drain on read" pattern (read all, truncate) gives batch delivery. This is simpler and more robust than shared memory or socket-based IPC for agent coordination. The tradeoff is latency -- messages are only seen at the next poll -- but for LLM-driven agents where each turn takes seconds, polling latency is negligible compared to inference time.
+File-based mailboxes (append-only JSONL) are easy to inspect and reason about in a teaching codebase. The "drain on read" pattern (read all, truncate) gives batch delivery with very little machinery. The tradeoff is latency -- messages are only seen at the next poll -- but for LLM-driven agents where each turn takes seconds, polling latency is acceptable for this course.
 
 ## Try It
 
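The mailbox pattern this hunk discusses (append-only JSONL, drain on read) can be sketched in two small functions. Function names and message fields below are illustrative, not the repo's exact API:

```python
import json
import os

def send(inbox_path: str, sender: str, text: str) -> None:
    with open(inbox_path, "a") as f:          # append is the only write path
        f.write(json.dumps({"from": sender, "text": text}) + "\n")

def drain(inbox_path: str) -> list:
    """Read every pending message, then truncate the inbox."""
    if not os.path.exists(inbox_path):
        return []
    with open(inbox_path) as f:
        messages = [json.loads(line) for line in f if line.strip()]
    open(inbox_path, "w").close()             # truncate after reading
    return messages
```

Each agent polls its own inbox once per loop turn; a drained inbox is simply an empty file.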
@@ -20,10 +20,7 @@ original system prompt identity ("you are alice, role: coder") fades.
 Identity re-injection solves this by inserting an identity block at the
 start of compressed contexts.
 
-Teaching simplification: the token estimation used here is rough
-(characters / 4). Production systems use proper tokenizer libraries.
-The nag threshold of 3 rounds (from s03) is set low for teaching
-visibility; production agents typically use a higher threshold around 10.
+Note: token estimation here uses characters/4 (rough). The nag threshold of 3 rounds is low for teaching visibility.
 
 ## The Solution
 
docs/en/s12-worktree-task-isolation.md (new file, 250 lines)

@@ -0,0 +1,250 @@
# s12: Worktree + Task Isolation

> Isolate by directory, coordinate by task ID -- tasks are the control plane, worktrees are the execution plane, and an event stream makes every lifecycle step observable.

## The Problem

By s11, agents can claim and complete tasks autonomously. But every task runs in one shared directory. Ask two agents to refactor different modules at the same time and you hit three failure modes:

Agent A edits `auth.py`. Agent B edits `auth.py`. Neither knows the other touched it. Unstaged changes collide, task status says "in_progress" but the directory is a mess, and when something breaks there is no way to roll back one agent's work without destroying the other's. The task board tracks _what to do_ but has no opinion about _where to do it_.

The fix is to separate the two concerns. Tasks manage goals. Worktrees manage execution context. Bind them by task ID, and each agent gets its own directory, its own branch, and a clean teardown path.

## The Solution

```
Control Plane (.tasks/)                Execution Plane (.worktrees/)
+----------------------------+        +---------------------------+
| task_1.json                |        | index.json                |
|   id: 1                    |        |   name: "auth-refactor"   |
|   subject: "Auth refactor" |  bind  |   path: ".worktrees/..."  |
|   status: "in_progress"    | <----> |   branch: "wt/auth-..."   |
|   worktree: "auth-refactor"|        |   task_id: 1              |
+----------------------------+        |   status: "active"        |
                                      +---------------------------+
| task_2.json                |        |                           |
|   id: 2                    |  bind  |   name: "ui-login"        |
|   subject: "Login page"    | <----> |   task_id: 2              |
|   worktree: "ui-login"     |        |   status: "active"        |
+----------------------------+        +---------------------------+

+---------------------------+
| events.jsonl (append-only)|
|   worktree.create.before  |
|   worktree.create.after   |
|   worktree.remove.after   |
|   task.completed          |
+---------------------------+
```
Three state layers make this work:

1. **Control plane** (`.tasks/task_*.json`) -- what is assigned, in progress, or done. Key fields: `id`, `subject`, `status`, `owner`, `worktree`.
2. **Execution plane** (`.worktrees/index.json`) -- where commands run and whether the workspace is still valid. Key fields: `name`, `path`, `branch`, `task_id`, `status`.
3. **Runtime state** (in-memory) -- per-turn execution continuity: `current_task`, `current_worktree`, `tool_result`, `error`.

## How It Works

The lifecycle has five steps. Each step is a tool call.

1. **Create a task.** Persist the goal first. The task starts as `pending` with an empty `worktree` field.

```python
task = {
    "id": self._next_id,
    "subject": subject,
    "status": "pending",
    "owner": "",
    "worktree": "",
}
self._save(task)
```

2. **Create a worktree.** Allocate an isolated directory and branch. If you pass `task_id`, the task auto-advances to `in_progress` and the binding is written to both sides.

```python
self._run_git(["worktree", "add", "-b", branch, str(path), base_ref])

entry = {
    "name": name,
    "path": str(path),
    "branch": branch,
    "task_id": task_id,
    "status": "active",
}
idx["worktrees"].append(entry)
self._save_index(idx)

if task_id is not None:
    self.tasks.bind_worktree(task_id, name)
```

3. **Run commands in the worktree.** `worktree_run` sets `cwd` to the worktree path. Edits happen in the isolated directory, not the shared workspace.

```python
r = subprocess.run(
    command,
    shell=True,
    cwd=path,
    capture_output=True,
    text=True,
    timeout=300,
)
```

4. **Observe.** `worktree_status` shows git state inside the isolated context. `worktree_events` queries the append-only event stream.

5. **Close out.** Two choices:
   - `worktree_keep(name)` -- preserve the directory, mark lifecycle as `kept`.
   - `worktree_remove(name, complete_task=True)` -- remove the directory, complete the bound task, unbind, and emit `task.completed`. This is the closeout pattern: one call handles teardown and task completion together.
## State Machines

```
Task:     pending -------> in_progress -------> completed
                  (worktree_create      (worktree_remove
                   with task_id)         with complete_task=true)

Worktree: absent --------> active -----------> removed | kept
                  (worktree_create)     (worktree_remove | worktree_keep)
```
## Key Code

The closeout pattern -- teardown + task completion in one operation (from `agents/s12_worktree_task_isolation.py`):

```python
def remove(self, name: str, force: bool = False, complete_task: bool = False) -> str:
    wt = self._find(name)
    if not wt:
        return f"Error: Unknown worktree '{name}'"

    self.events.emit(
        "worktree.remove.before",
        task={"id": wt.get("task_id")} if wt.get("task_id") is not None else {},
        worktree={"name": name, "path": wt.get("path")},
    )
    try:
        args = ["worktree", "remove"]
        if force:
            args.append("--force")
        args.append(wt["path"])
        self._run_git(args)

        if complete_task and wt.get("task_id") is not None:
            task_id = wt["task_id"]
            self.tasks.update(task_id, status="completed")
            self.tasks.unbind_worktree(task_id)
            self.events.emit("task.completed", task={
                "id": task_id, "status": "completed",
            }, worktree={"name": name})

        idx = self._load_index()
        for item in idx.get("worktrees", []):
            if item.get("name") == name:
                item["status"] = "removed"
                item["removed_at"] = time.time()
        self._save_index(idx)

        self.events.emit(
            "worktree.remove.after",
            task={"id": wt.get("task_id")} if wt.get("task_id") is not None else {},
            worktree={"name": name, "path": wt.get("path"), "status": "removed"},
        )
        return f"Removed worktree '{name}'"
    except Exception as e:
        self.events.emit(
            "worktree.remove.failed",
            worktree={"name": name},
            error=str(e),
        )
        raise
```

The task-side binding (from `agents/s12_worktree_task_isolation.py`):

```python
def bind_worktree(self, task_id: int, worktree: str, owner: str = "") -> str:
    task = self._load(task_id)
    task["worktree"] = worktree
    if task["status"] == "pending":
        task["status"] = "in_progress"
    task["updated_at"] = time.time()
    self._save(task)
```

The dispatch map wiring all tools together:

```python
TOOL_HANDLERS = {
    "bash": lambda **kw: run_bash(kw["command"]),
    "read_file": lambda **kw: run_read(kw["path"], kw.get("limit")),
    "write_file": lambda **kw: run_write(kw["path"], kw["content"]),
    "edit_file": lambda **kw: run_edit(kw["path"], kw["old_text"], kw["new_text"]),
    "task_create": lambda **kw: TASKS.create(kw["subject"], kw.get("description", "")),
    "task_list": lambda **kw: TASKS.list_all(),
    "task_get": lambda **kw: TASKS.get(kw["task_id"]),
    "task_update": lambda **kw: TASKS.update(kw["task_id"], kw.get("status"), kw.get("owner")),
    "task_bind_worktree": lambda **kw: TASKS.bind_worktree(kw["task_id"], kw["worktree"]),
    "worktree_create": lambda **kw: WORKTREES.create(kw["name"], kw.get("task_id")),
    "worktree_list": lambda **kw: WORKTREES.list_all(),
    "worktree_status": lambda **kw: WORKTREES.status(kw["name"]),
    "worktree_run": lambda **kw: WORKTREES.run(kw["name"], kw["command"]),
    "worktree_keep": lambda **kw: WORKTREES.keep(kw["name"]),
    "worktree_remove": lambda **kw: WORKTREES.remove(kw["name"], kw.get("force", False), kw.get("complete_task", False)),
    "worktree_events": lambda **kw: EVENTS.list_recent(kw.get("limit", 20)),
}
```
## Event Stream

Every lifecycle transition emits a before/after/failed triplet to `.worktrees/events.jsonl`. This is an append-only log, not a replacement for task/worktree state files.

Events emitted:

- `worktree.create.before` / `worktree.create.after` / `worktree.create.failed`
- `worktree.remove.before` / `worktree.remove.after` / `worktree.remove.failed`
- `worktree.keep`
- `task.completed` (when `complete_task=true` succeeds)

Payload shape:

```json
{
  "event": "worktree.remove.after",
  "task": {"id": 7, "status": "completed"},
  "worktree": {"name": "auth-refactor", "path": "...", "status": "removed"},
  "ts": 1730000000
}
```

This gives you three things: policy decoupling (audit and notifications stay outside the core flow), failure compensation (`*.failed` records mark partial transitions), and queryability (the `worktree_events` tool reads the log directly).
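The emitter behind that payload shape can be sketched as a small append-only logger. This is a hypothetical reconstruction following the payload fields above, not the repo's exact `EventLog` class:

```python
import json
import time

class EventLog:
    """Append-only JSONL event stream with a tail query."""

    def __init__(self, path: str):
        self.path = path

    def emit(self, event: str, **fields) -> None:
        record = {"event": event, "ts": int(time.time()), **fields}
        with open(self.path, "a") as f:          # append-only, never rewrite
            f.write(json.dumps(record) + "\n")

    def list_recent(self, limit: int = 20) -> list:
        try:
            with open(self.path) as f:
                lines = f.readlines()
        except FileNotFoundError:
            return []
        return [json.loads(line) for line in lines[-limit:]]
```

Because consumers only read the tail, auditing and notifications never touch the write path.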
## What Changed From s11

| Component | Before (s11) | After (s12) |
|--------------------|----------------------------|----------------------------------------------|
| Coordination state | Task board (`owner/status`) | Task board + explicit `worktree` binding |
| Execution scope | Shared directory | Task-scoped isolated directory |
| Recoverability | Task status only | Task status + worktree index |
| Teardown semantics | Task completion | Task completion + explicit keep/remove |
| Lifecycle visibility | Implicit in logs | Explicit events in `.worktrees/events.jsonl` |

## Design Rationale

Separating control plane from execution plane means you can reason about _what to do_ and _where to do it_ independently. A task can exist without a worktree (planning phase). A worktree can exist without a task (ad-hoc exploration). Binding them is an explicit action that writes state to both sides. This composability is the point -- it keeps the system recoverable after crashes. After an interruption, state reconstructs from `.tasks/` + `.worktrees/index.json` on disk. Volatile in-memory session state downgrades into explicit, durable file state. The event stream adds observability without coupling side effects into the critical path: auditing, notifications, and quota checks consume events rather than intercepting state writes.

## Try It

```sh
cd learn-claude-code
python agents/s12_worktree_task_isolation.py
```

Example prompts to try:

1. `Create tasks for backend auth and frontend login page, then list tasks.`
2. `Create worktree "auth-refactor" for task 1, create worktree "ui-login", then bind task 2 to "ui-login".`
3. `Run "git status --short" in worktree "auth-refactor".`
4. `Keep worktree "ui-login", then list worktrees and inspect worktree events.`
5. `Remove worktree "auth-refactor" with complete_task=true, then list tasks/worktrees/events.`