chore: upgrade workflow models to Gemini Flash + Sonnet (#1374)

* chore: replace open-source models with Gemini Flash and Sonnet in workflows

Drop moonshotai/kimi-k2.5 and Haiku from refactor/security workflows.
Lightweight tasks (triage, issue-checker, community-coordinator) now use
google/gemini-3-flash-preview; all other teammates upgraded to Sonnet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: ensure CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 in all workflows

Add the required feature flag export to refactor.sh and security.sh
(discovery.sh already had it). Also update SKILL.md wrapper template
and agent teams reference section to document the requirement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: persist CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS into .spawnrc

All three service scripts now check for ~/.spawnrc and idempotently
append the agent teams feature flag if missing. This ensures every
Claude session on the VM inherits the flag, not just the one launched
by the service script. Also documents the pattern in SKILL.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS to qa-cycle.sh

Complete the coverage — qa-cycle.sh now also exports the agent teams
feature flag and persists it to .spawnrc, matching the other three
service scripts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
L 2026-02-16 23:00:29 -05:00 committed by GitHub
parent 2b87735e3d
commit d452fdea37
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
5 changed files with 68 additions and 10 deletions

@ -123,6 +123,7 @@ SCRIPT_DIR="<REPO_ROOT>/.claude/skills/setup-agent-team"
export TRIGGER_SECRET="<secret-from-step-2>"
export TARGET_SCRIPT="${SCRIPT_DIR}/<target-script>.sh"
export REPO_ROOT="<REPO_ROOT>"
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
export MAX_CONCURRENT=5
export RUN_TIMEOUT_MS=7200000
exec bun run "${SCRIPT_DIR}/trigger-server.ts"
@ -400,6 +401,34 @@ If converting from a looping script, remove the `while true` / `sleep` and keep
## Agent Teams (ref: https://code.claude.com/docs/en/agent-teams)
**Agent teams are experimental and disabled by default.** Every service script and wrapper MUST export:
```bash
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
```
This can also be set in `settings.json`:
```json
{
"env": {
"CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
}
}
```
### .spawnrc persistence
On spawn VMs, `~/.spawnrc` is sourced by every agent launch command. Service scripts automatically inject the flag into `.spawnrc` if it exists, ensuring all Claude sessions on the VM inherit it:
```bash
if [[ -f "${HOME}/.spawnrc" ]]; then
grep -q 'CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS' "${HOME}/.spawnrc" 2>/dev/null || \
printf '\nexport CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1\n' >> "${HOME}/.spawnrc"
fi
```
This is idempotent — it only appends once. All four service scripts (`discovery.sh`, `refactor.sh`, `security.sh`, `qa-cycle.sh`) include this check.
All service scripts use **agent teams**, not subagents. Key differences:
| | Subagents | Agent Teams |
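
Review bot: the sentence "This is idempotent" and the claim that the snippet is "ensuring all Claude sessions on the VM inherit it" are stronger than the `.spawnrc` check supports, because `grep -q 'CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS'` matches any mention of the name, including a commented-out line or an export that sets a different value. In those cases nothing is appended and sessions that source the file do not get `=1`. Either anchor the pattern to the full `export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` line or reword the section to say the check only looks for the name. The section should also say how to undo the change, since the line stays in a file that every agent launch sources.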

@ -66,6 +66,11 @@ if [[ ! -f "${MANIFEST}" ]]; then
fi
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
# Persist into .spawnrc so all Claude sessions on this VM inherit the flag
if [[ -f "${HOME}/.spawnrc" ]]; then
grep -q 'CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS' "${HOME}/.spawnrc" 2>/dev/null || \
printf '\nexport CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1\n' >> "${HOME}/.spawnrc"
fi
get_matrix_summary() {
python3 - "${MANIFEST}" <<'PYEOF'
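
Review bot: the `.spawnrc` block after the export is the same five lines that the SKILL.md text says all four service scripts (`discovery.sh`, `refactor.sh`, `security.sh`, `qa-cycle.sh`) carry. Put it in one function in a file that each script sources, so a later change to the grep pattern or the target path is made once instead of four times.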

@ -63,6 +63,14 @@ run_with_timeout() {
wait "$pid" 2>/dev/null
}
# Enable agent teams (required for team-based workflows)
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
# Persist into .spawnrc so all Claude sessions on this VM inherit the flag
if [[ -f "${HOME}/.spawnrc" ]]; then
grep -q 'CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS' "${HOME}/.spawnrc" 2>/dev/null || \
printf '\nexport CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1\n' >> "${HOME}/.spawnrc"
fi
log "=== Starting QA cycle (reason=${SPAWN_REASON}) ==="
log "Repo root: ${REPO_ROOT}"
log "Timeout: ${CYCLE_TIMEOUT}s"
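
Review bot: the `grep -q ... || printf ... >>` pair in the `.spawnrc` block is a check followed by a write with no lock, so two service scripts that start at the same moment can both miss the line and both append it. The wrapper template in SKILL.md sets `MAX_CONCURRENT=5`, which allows overlapping runs. A duplicate export is harmless, but if exactly-once matters, take a lock around the block (for example `flock` where it is available); otherwise soften the "only appends once" wording in SKILL.md.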

@ -116,6 +116,14 @@ if [[ "${RUN_MODE}" == "refactor" ]]; then
fi
# Launch Claude Code with mode-specific prompt
# Enable agent teams (required for team-based workflows)
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
# Persist into .spawnrc so all Claude sessions on this VM inherit the flag
if [[ -f "${HOME}/.spawnrc" ]]; then
grep -q 'CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS' "${HOME}/.spawnrc" 2>/dev/null || \
printf '\nexport CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1\n' >> "${HOME}/.spawnrc"
fi
log "Launching ${RUN_MODE} cycle..."
PROMPT_FILE=$(mktemp /tmp/refactor-prompt-XXXXXX.md)
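
Review bot: the existing comment `# Launch Claude Code with mode-specific prompt` now sits directly above the agent-teams export and the `.spawnrc` block instead of above the launch code it describes. Move the new block above that comment, or move the comment down to just before `log "Launching ${RUN_MODE} cycle..."`. The same ordering appears in the matching hunk of the security script.
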
@ -155,7 +163,7 @@ Complete within 10 minutes. At 7 min stop new work, at 9 min shutdown teammates,
## Team Structure
1. **issue-fixer** (Sonnet) — Diagnose root cause, implement fix in worktree, run tests, create PR with `Fixes #SPAWN_ISSUE_PLACEHOLDER`
2. **issue-tester** (Haiku) — Review fix for correctness/edge cases, run `bun test` + `bash -n` on modified .sh files, report results
2. **issue-tester** (Sonnet) — Review fix for correctness/edge cases, run `bun test` + `bash -n` on modified .sh files, report results
## Label Management
@ -304,8 +312,8 @@ Refactor team **creates PRs** — security team **reviews and merges** them.
1. **security-auditor** (Sonnet) — Scan .sh for injection/path traversal/credential leaks, .ts for XSS/prototype pollution. Fix HIGH/CRITICAL only, document medium/low.
2. **ux-engineer** (Sonnet) — Test e2e flows, improve error messages, fix UX papercuts, verify README examples.
3. **complexity-hunter** (Haiku) — Find functions >50 lines (bash) / >80 lines (ts). Pick top 2-3, ONE PR. Run tests after refactoring.
4. **test-engineer** (Haiku) — ONE test PR max. Add missing tests, verify shellcheck, run `bun test`, fix failures.
3. **complexity-hunter** (Sonnet) — Find functions >50 lines (bash) / >80 lines (ts). Pick top 2-3, ONE PR. Run tests after refactoring.
4. **test-engineer** (Sonnet) — ONE test PR max. Add missing tests, verify shellcheck, run `bun test`, fix failures.
5. **code-health** (Sonnet) — Proactive codebase health scan. ONE PR max.
Scan for:
@ -347,7 +355,7 @@ Refactor team **creates PRs** — security team **reviews and merges** them.
Leave unreviewed PRs alone. Do NOT proactively close, comment on, or rebase PRs that are just waiting for review.
6. **community-coordinator** (moonshotai/kimi-k2.5)
6. **community-coordinator** (google/gemini-3-flash-preview)
First: `gh issue list --repo OpenRouterTeam/spawn --state open --json number,title,body,labels,createdAt`
**COMPLETELY IGNORE issues labeled `discovery-team`, `cloud-proposal`, or `agent-proposal`** — those are managed by the discovery team. Do NOT comment on them, do NOT change labels, do NOT interact in any way. Filter them out:
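
Review bot: `google/gemini-3-flash-preview` on the community-coordinator line is a preview identifier written as a literal inside the prompt text, and this commit had to edit each teammate line separately to change models. Define the model names once near the top of the script and substitute them into the prompt, for example with a placeholder in the style of the existing `SPAWN_ISSUE_PLACEHOLDER`, so the next swap or a retired preview id is a one-line change.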

@ -151,6 +151,14 @@ done
log "Pre-cycle cleanup done."
# Launch Claude Code with mode-specific prompt
# Enable agent teams (required for team-based workflows)
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
# Persist into .spawnrc so all Claude sessions on this VM inherit the flag
if [[ -f "${HOME}/.spawnrc" ]]; then
grep -q 'CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS' "${HOME}/.spawnrc" 2>/dev/null || \
printf '\nexport CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1\n' >> "${HOME}/.spawnrc"
fi
log "Launching ${RUN_MODE} cycle..."
PROMPT_FILE=$(mktemp /tmp/security-prompt-XXXXXX.md)
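
Review bot: the `.spawnrc` append means a security-cycle run changes the environment of every later Claude session on the VM, and nothing in the block records that the file was modified or lets an operator opt out. Report the append through the existing `log` function and consider gating it behind an environment variable, so a persistent experimental flag in a sourced startup file can be traced to the run that wrote it.
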
@ -336,7 +344,7 @@ If zero PRs, skip to Step 3.
2. TaskCreate per PR
3. Spawn **pr-reviewer** (model=sonnet) per PR, named pr-reviewer-NUMBER
**CRITICAL: Copy the COMPLETE review protocol below into every reviewer's prompt.**
4. Spawn **branch-cleaner** (model=haiku) — see Step 3
4. Spawn **branch-cleaner** (model=sonnet) — see Step 3
### Per-PR Reviewer Protocol
@ -390,14 +398,14 @@ Each pr-reviewer MUST:
## Step 3 — Branch Cleanup
Spawn **branch-cleaner** (model=haiku):
Spawn **branch-cleaner** (model=sonnet):
- List remote branches: \`git branch -r --format='%(refname:short) %(committerdate:unix)'\`
- For each non-main branch: if no open PR + stale >48h → \`git push origin --delete BRANCH\`
- Report summary.
## Step 4 — Stale Issue Re-triage
Spawn **issue-checker** (model=moonshotai/kimi-k2.5):
Spawn **issue-checker** (model=google/gemini-3-flash-preview):
- \`gh issue list --repo OpenRouterTeam/spawn --state open --json number,title,labels,updatedAt,comments\`
- For each issue, fetch full context: \`gh issue view NUMBER --repo OpenRouterTeam/spawn --comments\`
- **STRICT DEDUP — MANDATORY**: Check comments for \`-- security/issue-checker\` OR \`-- security/triage\`. If EITHER sign-off already exists in ANY comment on the issue → **SKIP this issue entirely** (do NOT comment again) UNLESS there are new human comments posted AFTER the last security sign-off comment
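
Review bot: the branch-cleaner model is stated twice, in item 4 of the spawn list (`model=sonnet`, "see Step 3") and again in the `Spawn **branch-cleaner**` line under Step 3, and both had to change together in this commit. Keep the model in one of the two places, since item 4 already points at Step 3, so the two cannot drift apart on the next swap.
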
@ -484,7 +492,7 @@ Cleanup: \`cd ${REPO_ROOT} && git worktree remove ${WORKTREE_BASE} --force && gi
1. **shell-auditor** (Opus) — Scan ALL .sh files for: command injection, credential leaks, path traversal, unsafe eval/source, curl|bash safety, macOS bash 3.x compat, permission issues. Run \`bash -n\` on every file. Classify CRITICAL/HIGH/MEDIUM/LOW.
2. **code-auditor** (Opus) — Scan ALL .ts files for: XSS/injection, prototype pollution, unsafe eval, dependency issues, auth bypass, info disclosure. Run \`bun test\`. Check key files for unexpected content.
3. **drift-detector** (Haiku) — Check for: uncommitted sensitive files (.env, keys), unexpected binaries, unusual permissions, suspicious recent commits (\`git log --oneline -50\`), .gitignore coverage.
3. **drift-detector** (Sonnet) — Check for: uncommitted sensitive files (.env, keys), unexpected binaries, unusual permissions, suspicious recent commits (\`git log --oneline -50\`), .gitignore coverage.
## Issue Filing
@ -538,10 +546,10 @@ log "Hard timeout: ${HARD_TIMEOUT}s"
IDLE_TIMEOUT=600 # 10 minutes of silence = hung
# Run claude in background so we can monitor output activity.
# Triage uses kimi-k2.5 (lightweight safety check); other modes use default (Opus) for team lead.
# Triage uses gemini-3-flash (lightweight safety check); other modes use default (Opus) for team lead.
CLAUDE_MODEL_FLAG=""
if [[ "${RUN_MODE}" == "triage" ]]; then
CLAUDE_MODEL_FLAG="--model moonshotai/kimi-k2.5"
CLAUDE_MODEL_FLAG="--model google/gemini-3-flash-preview"
fi
CLAUDE_PID_FILE=$(mktemp /tmp/claude-pid-XXXXXX)
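
Review bot: the updated comment says `gemini-3-flash` while the flag passes `google/gemini-3-flash-preview`; use the exact id in the comment, or better, hold the id in one variable that both `CLAUDE_MODEL_FLAG` and the issue-checker prompt line use. Because the comment describes triage as a safety check, please also confirm that the run fails closed if the provider rejects the preview id, rather than skipping triage.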