chore: upgrade workflow models to Gemini Flash + Sonnet (#1374)

* chore: replace open-source models with Gemini Flash and Sonnet in workflows

Drop moonshotai/kimi-k2.5 and Haiku from refactor/security workflows.
Lightweight tasks (triage, issue-checker, community-coordinator) now use
google/gemini-3-flash-preview; all other teammates upgraded to Sonnet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: ensure CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 in all workflows

Add the required feature flag export to refactor.sh and security.sh
(discovery.sh already had it). Also update SKILL.md wrapper template
and agent teams reference section to document the requirement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: persist CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS into .spawnrc

All three service scripts now check for ~/.spawnrc and idempotently
append the agent teams feature flag if missing. This ensures every
Claude session on the VM inherits the flag, not just the one launched
by the service script. Also documents the pattern in SKILL.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: add CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS to qa-cycle.sh

Complete the coverage — qa-cycle.sh now also exports the agent teams
feature flag and persists it to .spawnrc, matching the other three
service scripts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
L 2026-02-16 23:00:29 -05:00 committed by GitHub
parent 2b87735e3d
commit d452fdea37
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
5 changed files with 68 additions and 10 deletions

View file

@ -123,6 +123,7 @@ SCRIPT_DIR="<REPO_ROOT>/.claude/skills/setup-agent-team"
export TRIGGER_SECRET="<secret-from-step-2>"
export TARGET_SCRIPT="${SCRIPT_DIR}/<target-script>.sh"
export REPO_ROOT="<REPO_ROOT>"
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
export MAX_CONCURRENT=5
export RUN_TIMEOUT_MS=7200000
exec bun run "${SCRIPT_DIR}/trigger-server.ts"
@ -400,6 +401,34 @@ If converting from a looping script, remove the `while true` / `sleep` and keep
## Agent Teams (ref: https://code.claude.com/docs/en/agent-teams)
**Agent teams are experimental and disabled by default.** Every service script and wrapper MUST export:
```bash
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
```
This can also be set in `settings.json`:
```json
{
"env": {
"CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
}
}
```
### .spawnrc persistence
On spawn VMs, `~/.spawnrc` is sourced by every agent launch command. Service scripts automatically inject the flag into `.spawnrc` if it exists, ensuring all Claude sessions on the VM inherit it:
```bash
if [[ -f "${HOME}/.spawnrc" ]]; then
grep -q 'CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS' "${HOME}/.spawnrc" 2>/dev/null || \
printf '\nexport CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1\n' >> "${HOME}/.spawnrc"
fi
```
This is idempotent — it only appends once. All four service scripts (`discovery.sh`, `refactor.sh`, `security.sh`, `qa-cycle.sh`) include this check.
All service scripts use **agent teams**, not subagents. Key differences:
| | Subagents | Agent Teams |

View file

@ -66,6 +66,11 @@ if [[ ! -f "${MANIFEST}" ]]; then
fi
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
# Persist into .spawnrc so all Claude sessions on this VM inherit the flag
if [[ -f "${HOME}/.spawnrc" ]]; then
grep -q 'CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS' "${HOME}/.spawnrc" 2>/dev/null || \
printf '\nexport CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1\n' >> "${HOME}/.spawnrc"
fi
get_matrix_summary() {
python3 - "${MANIFEST}" <<'PYEOF'

View file

@ -63,6 +63,14 @@ run_with_timeout() {
wait "$pid" 2>/dev/null
}
# Enable agent teams (required for team-based workflows)
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
# Persist into .spawnrc so all Claude sessions on this VM inherit the flag
if [[ -f "${HOME}/.spawnrc" ]]; then
grep -q 'CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS' "${HOME}/.spawnrc" 2>/dev/null || \
printf '\nexport CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1\n' >> "${HOME}/.spawnrc"
fi
log "=== Starting QA cycle (reason=${SPAWN_REASON}) ==="
log "Repo root: ${REPO_ROOT}"
log "Timeout: ${CYCLE_TIMEOUT}s"

View file

@ -116,6 +116,14 @@ if [[ "${RUN_MODE}" == "refactor" ]]; then
fi
# Launch Claude Code with mode-specific prompt
# Enable agent teams (required for team-based workflows)
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
# Persist into .spawnrc so all Claude sessions on this VM inherit the flag
if [[ -f "${HOME}/.spawnrc" ]]; then
grep -q 'CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS' "${HOME}/.spawnrc" 2>/dev/null || \
printf '\nexport CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1\n' >> "${HOME}/.spawnrc"
fi
log "Launching ${RUN_MODE} cycle..."
PROMPT_FILE=$(mktemp /tmp/refactor-prompt-XXXXXX.md)
@ -155,7 +163,7 @@ Complete within 10 minutes. At 7 min stop new work, at 9 min shutdown teammates,
## Team Structure
1. **issue-fixer** (Sonnet) — Diagnose root cause, implement fix in worktree, run tests, create PR with `Fixes #SPAWN_ISSUE_PLACEHOLDER`
2. **issue-tester** (Haiku) — Review fix for correctness/edge cases, run `bun test` + `bash -n` on modified .sh files, report results
2. **issue-tester** (Sonnet) — Review fix for correctness/edge cases, run `bun test` + `bash -n` on modified .sh files, report results
## Label Management
@ -304,8 +312,8 @@ Refactor team **creates PRs** — security team **reviews and merges** them.
1. **security-auditor** (Sonnet) — Scan .sh for injection/path traversal/credential leaks, .ts for XSS/prototype pollution. Fix HIGH/CRITICAL only, document medium/low.
2. **ux-engineer** (Sonnet) — Test e2e flows, improve error messages, fix UX papercuts, verify README examples.
3. **complexity-hunter** (Haiku) — Find functions >50 lines (bash) / >80 lines (ts). Pick top 2-3, ONE PR. Run tests after refactoring.
4. **test-engineer** (Haiku) — ONE test PR max. Add missing tests, verify shellcheck, run `bun test`, fix failures.
3. **complexity-hunter** (Sonnet) — Find functions >50 lines (bash) / >80 lines (ts). Pick top 2-3, ONE PR. Run tests after refactoring.
4. **test-engineer** (Sonnet) — ONE test PR max. Add missing tests, verify shellcheck, run `bun test`, fix failures.
5. **code-health** (Sonnet) — Proactive codebase health scan. ONE PR max.
Scan for:
@ -347,7 +355,7 @@ Refactor team **creates PRs** — security team **reviews and merges** them.
Leave unreviewed PRs alone. Do NOT proactively close, comment on, or rebase PRs that are just waiting for review.
6. **community-coordinator** (moonshotai/kimi-k2.5)
6. **community-coordinator** (google/gemini-3-flash-preview)
First: `gh issue list --repo OpenRouterTeam/spawn --state open --json number,title,body,labels,createdAt`
**COMPLETELY IGNORE issues labeled `discovery-team`, `cloud-proposal`, or `agent-proposal`** — those are managed by the discovery team. Do NOT comment on them, do NOT change labels, do NOT interact in any way. Filter them out:

View file

@ -151,6 +151,14 @@ done
log "Pre-cycle cleanup done."
# Launch Claude Code with mode-specific prompt
# Enable agent teams (required for team-based workflows)
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
# Persist into .spawnrc so all Claude sessions on this VM inherit the flag
if [[ -f "${HOME}/.spawnrc" ]]; then
grep -q 'CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS' "${HOME}/.spawnrc" 2>/dev/null || \
printf '\nexport CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1\n' >> "${HOME}/.spawnrc"
fi
log "Launching ${RUN_MODE} cycle..."
PROMPT_FILE=$(mktemp /tmp/security-prompt-XXXXXX.md)
@ -336,7 +344,7 @@ If zero PRs, skip to Step 3.
2. TaskCreate per PR
3. Spawn **pr-reviewer** (model=sonnet) per PR, named pr-reviewer-NUMBER
**CRITICAL: Copy the COMPLETE review protocol below into every reviewer's prompt.**
4. Spawn **branch-cleaner** (model=haiku) — see Step 3
4. Spawn **branch-cleaner** (model=sonnet) — see Step 3
### Per-PR Reviewer Protocol
@ -390,14 +398,14 @@ Each pr-reviewer MUST:
## Step 3 — Branch Cleanup
Spawn **branch-cleaner** (model=haiku):
Spawn **branch-cleaner** (model=sonnet):
- List remote branches: \`git branch -r --format='%(refname:short) %(committerdate:unix)'\`
- For each non-main branch: if no open PR + stale >48h → \`git push origin --delete BRANCH\`
- Report summary.
## Step 4 — Stale Issue Re-triage
Spawn **issue-checker** (model=moonshotai/kimi-k2.5):
Spawn **issue-checker** (model=google/gemini-3-flash-preview):
- \`gh issue list --repo OpenRouterTeam/spawn --state open --json number,title,labels,updatedAt,comments\`
- For each issue, fetch full context: \`gh issue view NUMBER --repo OpenRouterTeam/spawn --comments\`
- **STRICT DEDUP — MANDATORY**: Check comments for \`-- security/issue-checker\` OR \`-- security/triage\`. If EITHER sign-off already exists in ANY comment on the issue → **SKIP this issue entirely** (do NOT comment again) UNLESS there are new human comments posted AFTER the last security sign-off comment
@ -484,7 +492,7 @@ Cleanup: \`cd ${REPO_ROOT} && git worktree remove ${WORKTREE_BASE} --force && gi
1. **shell-auditor** (Opus) — Scan ALL .sh files for: command injection, credential leaks, path traversal, unsafe eval/source, curl|bash safety, macOS bash 3.x compat, permission issues. Run \`bash -n\` on every file. Classify CRITICAL/HIGH/MEDIUM/LOW.
2. **code-auditor** (Opus) — Scan ALL .ts files for: XSS/injection, prototype pollution, unsafe eval, dependency issues, auth bypass, info disclosure. Run \`bun test\`. Check key files for unexpected content.
3. **drift-detector** (Haiku) — Check for: uncommitted sensitive files (.env, keys), unexpected binaries, unusual permissions, suspicious recent commits (\`git log --oneline -50\`), .gitignore coverage.
3. **drift-detector** (Sonnet) — Check for: uncommitted sensitive files (.env, keys), unexpected binaries, unusual permissions, suspicious recent commits (\`git log --oneline -50\`), .gitignore coverage.
## Issue Filing
@ -538,10 +546,10 @@ log "Hard timeout: ${HARD_TIMEOUT}s"
IDLE_TIMEOUT=600 # 10 minutes of silence = hung
# Run claude in background so we can monitor output activity.
# Triage uses kimi-k2.5 (lightweight safety check); other modes use default (Opus) for team lead.
# Triage uses gemini-3-flash (lightweight safety check); other modes use default (Opus) for team lead.
CLAUDE_MODEL_FLAG=""
if [[ "${RUN_MODE}" == "triage" ]]; then
CLAUDE_MODEL_FLAG="--model moonshotai/kimi-k2.5"
CLAUDE_MODEL_FLAG="--model google/gemini-3-flash-preview"
fi
CLAUDE_PID_FILE=$(mktemp /tmp/claude-pid-XXXXXX)