diff --git a/.claude/skills/setup-agent-team/discovery-team-prompt.md b/.claude/skills/setup-agent-team/discovery-team-prompt.md index 5df46a9a..11dfef58 100644 --- a/.claude/skills/setup-agent-team/discovery-team-prompt.md +++ b/.claude/skills/setup-agent-team/discovery-team-prompt.md @@ -105,7 +105,7 @@ gh api graphql -f query=' Spawn an **implementer** teammate to: 1. Read the proposal issue for cloud/agent details 2. Implement it following CLAUDE.md Shell Script Rules -3. Add test coverage (test/record.sh + test/mock.sh) +3. Add test coverage (`bun test` in `cli/src/__tests__/`) 4. Create PR referencing the proposal issue 5. Label the proposal `ready-for-implementation` 6. Comment on the proposal: "Implementation PR: #NUMBER -- discovery/implementer" diff --git a/.claude/skills/setup-agent-team/security-review-all-prompt.md b/.claude/skills/setup-agent-team/security-review-all-prompt.md index 856e829f..76208bc1 100644 --- a/.claude/skills/setup-agent-team/security-review-all-prompt.md +++ b/.claude/skills/setup-agent-team/security-review-all-prompt.md @@ -73,7 +73,7 @@ Each pr-reviewer MUST: 6. **Security review** of every changed file: - Command injection, credential leaks, path traversal, XSS/injection, unsafe eval/source, curl|bash safety, macOS bash 3.x compat -7. **Test** (in worktree): `bash -n` on .sh files, `bun test` for .ts files, verify source fallback pattern +7. **Test** (in worktree): `bash -n` on .sh files, `bun test` for .ts files 8. **Decision** — Before posting any review, verify it applies to the **current HEAD commit**: - CRITICAL/HIGH found → `gh pr review NUMBER --request-changes` + label `security-review-required` diff --git a/CLAUDE.md b/CLAUDE.md index f80384ce..7232fa84 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -45,7 +45,7 @@ We are currently shipping with **9 curated clouds** (sorted by price): - **Prestige or unbeatable pricing** — must be a well-known brand OR beat our cheapest options - **Must be manually testable** — you need an account and can verify scripts work - **REST API or CLI with SSH/exec** — no proprietary-only access methods -- **Test coverage is mandatory** (see "Mock Test Infrastructure" section below) +- **Test coverage is mandatory** — add unit tests in `cli/src/__tests__/` Steps to add one: 1. Add cloud-specific TypeScript module in `cli/src/{cloud}/` @@ -53,9 +53,7 @@ Steps to add one: 3. Add `"missing"` entries to the matrix for every existing agent 4. Implement at least 2-3 agent scripts to prove the lib works 5. Update the cloud's `README.md` -6. **Add test coverage** (mandatory): - - `test/record.sh` — add to `ALL_RECORDABLE_CLOUDS`, add cases in `get_endpoints()`, `get_auth_env_var()`, `call_api()`, `has_api_error()`, and add a `_live_{cloud}()` function - - `test/mock.sh` — add cases in `_strip_api_base()`, `_validate_body()`, `assert_cloud_api_calls()`, and `setup_env_for_cloud()` +6. **Add test coverage** (mandatory) — add unit tests in `cli/src/__tests__/` **DO NOT add GPU clouds** (CoreWeave, RunPod, etc.). Spawn agents call remote LLM APIs for inference — they need cheap CPU instances with SSH, not expensive GPU VMs. @@ -85,10 +83,9 @@ Check `gh issue list --repo OpenRouterTeam/spawn --state open` for user requests ### 4. Extend tests -`test/run.sh` contains the test harness. When adding a new cloud or agent: -- Add mock functions for the cloud's CLI/API calls -- Add per-script assertions matching the agent's setup steps -- Run `bash test/run.sh` to verify +Tests use Bun's built-in test runner (`bun:test`). When adding a new cloud or agent: +- Add unit tests in `cli/src/__tests__/` with mocked fetch/prompts +- Run `bun test` to verify ## File Structure Convention @@ -117,7 +114,7 @@ spawn/ refactor.yml # Scheduled + issue-triggered refactor workflow manifest.json # The matrix (source of truth) discovery.sh # Run this to trigger one discovery cycle - test/run.sh # Test harness + test/fixtures/ # API response fixtures for testing README.md # User-facing docs CLAUDE.md # This file - contributor guide ``` @@ -194,30 +191,8 @@ All TypeScript code in `cli/src/` MUST use ESM (`import`/`export`): - Test files go in `cli/src/__tests__/` - Run tests with `bun test` - Use `import { describe, it, expect, beforeEach, afterEach, mock, spyOn } from "bun:test"` -- **NEVER run `bash test/mock.sh` locally** — it opens the browser (OAuth flow) and is disruptive. `test/mock.sh` is for CI only. Locally, only run `bun test` and `bash test/run.sh`. - -### Mock Test Infrastructure (MANDATORY for new clouds) - -When adding a new cloud provider, you **MUST** update all of these in `test/mock.sh`: - -| Function | What to add | -|---|---| -| `_strip_api_base()` | URL pattern to strip the cloud's API base (e.g., `https://api.newcloud.com/v1*`) | -| `_validate_body()` | Server creation endpoint + required POST fields | -| `assert_cloud_api_calls()` | Expected API calls (SSH key fetch + server create endpoints) | -| `setup_env_for_cloud()` | Test env vars (API token, server name, plan, region) | - -And in `test/record.sh`: - -| Function | What to add | -|---|---| -| `ALL_RECORDABLE_CLOUDS` | Cloud name to the list | -| `get_endpoints()` | GET-only API endpoints for fixture recording | -| `get_auth_env_var()` | Auth env var name mapping | -| `call_api()` | API function dispatcher | -| `has_api_error()` | Error response detection for the cloud's API | - -Without these, the cloud has **no test coverage**, body validation is missing, mock tests log `UNHANDLED_URL` warnings, and the QA cycle skips the cloud entirely. +- All tests must be pure unit tests with mocked fetch/prompts — **no subprocess spawning** (`execSync`, `spawnSync`, `Bun.spawn`) +- Test fixtures (API response snapshots) go in `test/fixtures/{cloud}/` ## CLI Version Management @@ -309,3 +284,19 @@ refactor.yml — GitHub Actions workflow that POSTs to the trigger server 2. Update `manifest.json` matrix status to `"implemented"` 3. Update the cloud's `README.md` with usage instructions 4. Commit with a descriptive message + +## Filing Issues for Discovered Problems + +When you encounter bugs, stale references, broken functionality, or architectural issues that are **outside the scope of your current task**, file a GitHub issue immediately rather than ignoring them or trying to fix everything at once: + +```bash +gh issue create --repo OpenRouterTeam/spawn --title "bug: " --body "
" +``` + +Examples of when to file: +- Dead code or stale references to files/functions that no longer exist +- Broken features (e.g., `spawn delete` references non-existent shell scripts) +- Security concerns that need separate review +- Architectural debt that would be too large to fix in the current PR + +**Do NOT silently ignore problems.** If you find something weird and won't fix it now, file an issue so it's tracked.