opencode/perf/test-suite.md

# Test Suite Speed

## Goal

Speed up the `packages/opencode` test suite without reducing coverage or hiding failures.

## Benchmark Command

Run from `packages/opencode`:

```sh
bun run bench:test
```

The full-suite benchmark defaults to one measured run. Use repeated runs only after a targeted win:

```sh
BENCH_WARMUPS=1 BENCH_RUNS=3 bun run bench:test
```

To identify slow files, run:

```sh
bun run profile:test
```

Scope it while exploring:

```sh
TEST_PROFILE_GLOB='test/server/**/*.test.ts' bun run profile:test
TEST_PROFILE_LIMIT=20 bun run profile:test
```

## Primary Metric

`METRIC test_suite_seconds=<median wall clock seconds>`

## Secondary Metrics

`test_suite_best_seconds`, `test_suite_worst_seconds`, failures, and noisy spread.

For profiling: `slowest_test_file_seconds` and the slowest file list.

## Files In Scope

`packages/opencode/test/**`, test fixtures, package test scripts, and implementation setup paths only when a benchmarked bottleneck points there.

## Signals To Watch

Repeated setup work, long sleeps/timeouts, serial integration tests, filesystem/database fixture costs, and broad test globs pulling unrelated work.

## Hypothesis Loop

| Hypothesis                                                                                                | Change                                                                                         | Before    | After   | Decision | Notes                                                                                                                            |
| --------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- | --------- | ------- | -------- | -------------------------------------------------------------------------------------------------------------------------------- |
| Repeated full-suite runs are too expensive for discovery                                                  | Switched full-suite benchmark to one run and added per-file profiler                           | ~250s/run | pending | keep     | Bun has no slowest-test reporter in this version; profile files directly.                                                        |
| Plugin install concurrency test spends time spawning more workers than needed to exercise lock contention | Reduced worker counts from 12/10/8 to 6/6/5; kept `holdMs: 30`                                 | 7.800s    | 6.204s  | keep     | Median from 3 targeted runs; still covers concurrent cross-process writes to server, server+tui, and existing json config.       |
| `httpapi-listen` PTY route tests pay for git repositories they do not assert on                           | Removed `git: true` from temp dirs while keeping config setup                                  | 10.554s   | 7.818s  | keep     | Median from 3 targeted runs; HTTP routes, tickets, websocket upgrade, restart, and no-auth paths still pass.                     |
| `workspace.waitForSync` timeout test waits the full production timeout                                    | Added optional timeout parameter defaulting to production timeout; timeout test uses 25ms      | 12.949s   | 8.305s  | keep     | Median from 3 targeted runs; production callers keep the 5000ms default.                                                         |
| `config.test` waits after dependencies even though `.gitignore` is written synchronously                  | Removed obsolete 1000ms sleep from writable `OPENCODE_CONFIG_DIR` test                         | 10.270s   | 9.433s  | keep     | Median from 5 targeted runs because one run was noisy; simpler test and no fixed sleep.                                          |
| SDK parity helpers create git repos for tests that only need files/config/session state                   | Changed `withProject` default to no git; explicit git init test still opts into no-git fixture | 8.011s    | 5.180s  | keep     | Median from 5 targeted runs because first run was cold/noisy.                                                                    |
| Provider plugin filter test waits on plugin dependency readiness setup                                    | Marked local plugin dependencies ready using the existing fixture helper                       | 7.543s    | 6.366s  | keep     | Median from 3 targeted runs; matches neighboring plugin provider test setup.                                                     |
| HTTP provider tests generate local plugins without dependency-ready fixture state                         | Marked generated `.opencode` plugin fixtures dependency-ready                                  | 7.905s    | 2.980s  | keep     | Median from 3 targeted runs; avoids unrelated plugin dependency setup in route tests.                                            |
| TUI plugin lifecycle timeout coverage waits the full production cleanup timeout                           | Added optional runtime dispose timeout override and used 25ms in the timeout test              | 7.330s    | 1.507s  | keep     | Median from 3 targeted runs; production default remains 5000ms.                                                                  |
| Skill tool test initializes git even though it only reads local skill files                               | Removed `git: true` from the temporary directory fixture                                       | 2.320s    | 1.425s  | keep     | Single targeted rerun; still exercises skill discovery, permission request, and bundled file output.                             |
| Prompt shell semantics tests initialize git though they only assert shell/session behavior                | Removed `git: true` from shell-focused prompt fixtures while preserving config setup           | 26.930s   | 23.400s | keep     | Three targeted reruns passed after the change: 23.80s, 23.55s, 23.40s.                                                           |
| Remaining prompt behavior tests mostly do not require repository state                                    | Removed git setup from safe loop/reference/error fixtures; restored shell queue/cancel cases   | 23.400s   | 19.610s | keep     | Safety review found shell runner readiness depends on git-backed setup in several tests; current single rerun passes.            |
| Session processor effect tests do not require repository state                                            | Removed git setup from all processor-effect temp server fixtures                               | 12.500s   | 9.230s  | keep     | Two targeted reruns passed after the change: 9.61s, 9.23s.                                                                       |
| HTTP listen PTY ticket tests restart the same listener topology twice                                     | Folded directory-scoped ticket regression into the broader unsafe-ticket test                  | 7.051s    | 6.170s  | keep     | Two targeted reruns passed after the change: 6.76s, 6.17s; still covers mint failure and successful same-directory upgrade.      |
| File watcher readiness can write before async native subscriptions are active                             | Retried short readiness writes and accepted symlink-realpath HEAD events                       | failed    | 4.62s   | keep     | Three sequential focused watcher runs passed: 4.62s, 4.57s, 4.64s; full suite no longer failed in `watcher.test.ts`.             |
| First provider config/env/filtering block can use Effect-aware instance fixtures                          | Migrated six `tmpdir` + `withTestInstance` cases to `it.instance`                              | 6.06s     | 6.07s   | keep     | Neutral timing, but removes manual config file writes and instance plumbing; use as the pattern for later provider slices.       |
| Custom provider/model config cases can use Effect-aware instance fixtures                                 | Migrated three more config-heavy provider cases to `it.instance`                               | 6.07s     | 6.12s   | keep     | Neutral timing within noise, but continues removing manual config file writes on top of the first provider fixture PR.           |
| Provider env precedence and model lookup cases can use Effect-aware instance fixtures                     | Migrated four more provider lookup/default-model cases to `it.instance`                        | 6.12s     | 6.36s   | keep     | Noisy 5-run median; kept as a small stacked cleanup slice but do not claim speedup from this migration.                          |
| Simple config load cases can use Effect-aware instance fixtures                                           | Migrated JSON, shell, formatter, and lsp config load cases to `it.instance`                    | 14.18s    | 3.93s   | keep     | Three-run medians before/after; removes manual `tmpdir` + `withTestInstance` setup from the first simple config block.           |
| Config template, file include, and simple agent cases can use Effect-aware instance fixtures              | Migrated JSONC, env/file substitution, invalid config, and agent config cases to `it.instance` | 1.87s     | 1.90s   | keep     | Stacked on the first config slice; neutral timing but removes more manual `tmpdir` + instance plumbing.                          |
| Agent option, command, and legacy migration config cases can use Effect-aware instance fixtures           | Migrated agent variant, command, autoshare, and mode migration cases to `it.instance`          | 1.90s     | 1.83s   | keep     | Stacked on the config template slice; small neutral-to-positive timing and less manual setup.                                    |
| Local config update and directory cases can use Effect-aware instance fixtures                            | Migrated local `update` and `directories` cases to `it.instance`                               | 1.77s     | 1.71s   | keep     | Three-run medians; small positive/neutral timing, removes manual instance plumbing, and eliminates one existing unsafe cast.     |
| `.opencode` agent and command file-loading cases can use Effect-aware instance fixtures                   | Migrated singular/plural agent and command markdown fixture cases to `it.instance`             | 7.21s     | 1.87s   | keep     | Parent baseline was noisy (7.42, 7.21, 2.83); after runs were stable at 1.87, 1.98, 1.83. Keep as cleanup with no broad claim.   |
| Legacy tools and permission-order config cases can use Effect-aware instance fixtures                     | Migrated legacy `tools` migration and permission order cases to `it.instance`                  | 1.87s     | 1.87s   | keep     | Neutral timing; removes more manual temp-instance plumbing from legacy config migration coverage.                                |
| Remaining simple config load cases can use Effect-aware instance fixtures                                 | Migrated default config load and legacy TUI-key cases to `it.instance`                         | 7.78s     | 6.39s   | keep     | Single baseline before edit; after median from three sequential reruns (5.76, 6.39, 6.53). Keep as cleanup with cautious timing. |
| Managed settings config cases can use Effect-aware instance fixtures                                      | Migrated managed override and missing-managed-file cases to `it.instance`                      | 2.40s     | 1.76s   | keep     | Single baseline before edit; after median from three sequential reruns (1.75, 1.76, 1.80).                                       |
| Local plugin and subagent config fixtures can use Effect-aware instance fixtures                          | Migrated scoped npm plugin and custom subagent markdown cases to `it.instance`                 | 2.37s     | 1.67s   | keep     | Single baseline before edit; after median from three sequential reruns (1.66, 1.67, 1.67).                                       |
| MCP merge config cases can use Effect-aware instance fixtures                                             | Migrated three MCP merge/override cases to `it.instance`                                       | 1.98s     | 1.95s   | keep     | Neutral timing within noise; removes manual `tmpdir` + `withTestInstance` setup from isolated filesystem-only config cases.      |
| Remaining legacy tools config cases can use Effect-aware instance fixtures                                | Migrated allow/deny legacy `tools` permission cases to `it.instance`                           | 2.65s     | 1.90s   | keep     | Single baseline before edit; after median from three sequential reruns (2.58, 1.90, 1.90).                                       |
| Oversized snapshot batch tests only need to cross the 100-file boundary                                   | Reduced large diff/revert fixture sizes while keeping each case above the batch boundary       | 4.32s     | 3.66s   | keep     | Three affected snapshot tests; after median from three reruns (4.32, 3.66, 3.66) while still crossing the 100-file boundary.     |
| Prompt tests without LLM calls do not need the test LLM server                                            | Added a no-server runner and moved obvious non-LLM prompt/shell cases to it                    | 25.41s    | 21.03s  | keep     | Full prompt file after simplify pass median from three reruns (20.66, 21.03, 21.64); LLM-backed tests stay on original runner.   |
| CLI run subprocess cases can run independently                                                            | Marked `run-process.test.ts` subprocess cases concurrent                                       | 11.87s    | 4.13s   | keep     | Newest-dev single baseline; after median from three reruns (4.13, 4.17, 4.11). Each case has an isolated temp home and LLM port. |
| Snapshot initialization does not need to commit seeded files in the source repo                           | Removed extra `git add`/`commit` from the snapshot test `initialize()` helper                  | 22.22s    | 20.23s  | keep     | Newest-dev single baseline; after median from three reruns (20.23, 22.59, 20.11). Fixture still creates a git repo root commit.  |
| Processor AI SDK tool-call case does not assert git behavior                                              | Removed `git: true` from the non-native tool-call processor test                               | 10.22s    | 9.48s   | keep     | Newest-dev single baseline; full-file after median from three reruns (9.48, 9.60, 9.36); focused case passes in 1.39s.           |

## Profiling Results

Command shape:

```sh
TEST_PROFILE_GLOB='test/<area>/**/*.test.ts' TEST_PROFILE_TOP=15 bun run profile:test
```

Initial slowest files observed during discovery:

| File                                      | Seconds | Scope         |
| ----------------------------------------- | ------: | ------------- |
| `test/config/config.test.ts`              |  23.546 | config        |
| `test/provider/provider.test.ts`          |  18.747 | provider      |
| `test/control-plane/workspace.test.ts`    |  16.447 | control-plane |
| `test/plugin/install-concurrency.test.ts` |  14.804 | plugin        |
| `test/server/httpapi-cors.test.ts`        |  14.620 | server        |
| `test/server/httpapi-listen.test.ts`      |  10.073 | server        |
| `test/server/httpapi-sdk.test.ts`         |   8.661 | server        |
| `test/server/httpapi-provider.test.ts`    |   7.905 | server        |
| `test/cli/tui/plugin-lifecycle.test.ts`   |   7.330 | cli/tui       |
| `test/file/index.test.ts`                 |   7.214 | file          |

This table is historical profiling input, not the current ranking after kept changes.

Targeted 3-run baselines:

| File                                      | Runs                   | Median | Notes                                                                        |
| ----------------------------------------- | ---------------------- | -----: | ---------------------------------------------------------------------------- |
| `test/control-plane/workspace.test.ts`    | 12.949, 12.949, 12.773 | 12.949 | Stable slow target.                                                          |
| `test/server/httpapi-listen.test.ts`      | 10.554, 10.631, 10.479 | 10.554 | Stable slow target; WebSocket/listener lifecycle.                            |
| `test/config/config.test.ts`              | 10.270, 9.042, 10.737  | 10.270 | Large serial file; initial 23s was mixed-scope contention/noise.             |
| `test/server/httpapi-sdk.test.ts`         | 7.600, 8.011, 8.035    |  8.011 | Stable slow target.                                                          |
| `test/plugin/install-concurrency.test.ts` | 7.949, 7.800, 7.712    |  7.800 | Stable slow target; many subprocesses.                                       |
| `test/provider/provider.test.ts`          | 8.323, 7.543, 7.474    |  7.543 | Large serial file.                                                           |
| `test/server/httpapi-cors.test.ts`        | 2.621, 1.682, 1.518    |  1.682 | Not a standalone top target; initial 14s was mixed-scope noise/order effect. |

Full-suite sanity checks:

| Command              |   Result | Notes                                                                                                                                           |
| -------------------- | -------: | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| `bun run bench:test` | 225.069s | Before continuing prompt/session work.                                                                                                          |
| `bun run bench:test` | 186.729s | After prompt, processor, and PTY wins before safety review restores.                                                                            |
| `bun run bench:test` | 202.317s | After restoring prompt shell coverage and SDK VCS parity coverage.                                                                              |
| `bun run bench:test` |   failed | Watcher blocker cleared; current run later failed in focused-passing `tool/skill.test.ts` and prompt shell timeout cases under full-suite load. |

## Dead Ends

| Hypothesis                                                             | Change Tried                                                                             | Before |  After | Decision | Notes                                                                                         |
| ---------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- | -----: | -----: | -------- | --------------------------------------------------------------------------------------------- |
| `file/index.test.ts` pays unnecessary per-test global instance cleanup | Removed `afterEach(disposeAllInstances)` while keeping the explicit disposal test import | 5.262s | 5.089s | discard  | Improvement was within noise and the cleanup is a safety guard for many instance-state tests. |
| Socket reset retry test can shorten its idle-timeout path              | Reduced Bun server idle timeout and tried forced server close                            | 16.46s | failed | discard  | Shorter idle timeout changed the error shape; forced close hung. Keep the real socket reset.  |
| `tool/webfetch` can avoid per-test instance setup                      | Switched local HTTP tests from `it.instance` to `it.live`                                | 1.219s | failed | discard  | Tool execution reads instance-local agent state, so the temp instance is required.            |
| LSP client interop tests can shorten coarse request-handling sleeps    | Reduced fixed post-notification waits from 100ms to 10ms                                 | 4.270s | 4.740s | discard  | First run improved to 3.870s but verification was slower than baseline; not a clear win.      |
| Config content env cases can use Effect-aware instance fixtures        | Migrated two `OPENCODE_CONFIG_CONTENT` token substitution cases to `it.instance`         |  1.95s |  2.06s | discard  | Passing but not neutral-or-better in focused reruns; keep existing explicit env cleanup.      |