opencode/perf/test-suite.md
2026-05-19 18:09:15 -04:00

# Test Suite Speed
## Goal
Speed up the `packages/opencode` test suite without reducing coverage or hiding failures.
## Benchmark Command
Run from `packages/opencode`:
```sh
bun run bench:test
```
The full-suite benchmark defaults to one measured run, because each full run takes roughly 250s. Use repeated runs only after a targeted win:
```sh
BENCH_WARMUPS=1 BENCH_RUNS=3 bun run bench:test
```
To identify slow files, run:
```sh
bun run profile:test
```
Scope it while exploring:
```sh
TEST_PROFILE_GLOB='test/server/**/*.test.ts' bun run profile:test
TEST_PROFILE_LIMIT=20 bun run profile:test
```
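
The `profile:test` script itself is not reproduced in these notes. As a rough sketch of the per-file timing idea only, the loop below times each file sequentially and reports the slowest ones first. The names `profileFiles`, `RunFile`, and `FileTiming` are hypothetical, the default `limit` of 10 is an assumption, and `runFile` is injected so the sketch makes no claim about how the real script launches `bun test`.

```ts
// Hypothetical per-file timing loop. `runFile` is injected so this sketch
// does not assume how the real profiler starts each test file.
type RunFile = (file: string) => Promise<void>

interface FileTiming {
  file: string
  seconds: number
}

async function profileFiles(files: string[], runFile: RunFile, limit = 10): Promise<FileTiming[]> {
  const timings: FileTiming[] = []
  for (const file of files) {
    // Run sequentially so files do not contend with each other for CPU or I/O.
    const start = performance.now()
    await runFile(file)
    timings.push({ file, seconds: (performance.now() - start) / 1000 })
  }
  // Slowest first, trimmed to the requested number of rows.
  return timings.sort((a, b) => b.seconds - a.seconds).slice(0, limit)
}
```

Running files one at a time trades total profiling time for cleaner numbers, which matters here because some early measurements were inflated by mixed-scope contention.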
## Primary Metric
`METRIC test_suite_seconds=<median wall clock seconds>`
## Secondary Metrics
`test_suite_best_seconds`, `test_suite_worst_seconds`, failure count, and run-to-run spread (noise).
For profiling: `slowest_test_file_seconds` and the slowest file list.
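
A minimal sketch of how the median, best, worst, and spread values could be derived from a set of measured runs. The helper names `summarizeRuns` and `formatMetric` are hypothetical, and the three-decimal output format is an assumption rather than the benchmark script's confirmed behavior. The sample input reuses the three targeted `httpapi-listen` runs recorded later in these notes, purely as illustration.

```ts
// Summarize measured wall-clock durations (in seconds) for one benchmark.
interface RunSummary {
  median: number
  best: number
  worst: number
  spread: number // worst minus best; a rough noise indicator
}

function summarizeRuns(seconds: number[]): RunSummary {
  if (seconds.length === 0) throw new Error("need at least one measured run")
  const sorted = [...seconds].sort((a, b) => a - b)
  const mid = Math.floor(sorted.length / 2)
  // Odd count: middle element. Even count: mean of the two middle elements.
  const median = sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2
  const best = sorted[0]
  const worst = sorted[sorted.length - 1]
  return { median, best, worst, spread: worst - best }
}

// Assumed output format: three decimal places after the metric name.
function formatMetric(name: string, value: number): string {
  return `METRIC ${name}=${value.toFixed(3)}`
}

const summary = summarizeRuns([10.554, 10.631, 10.479])
console.log(formatMetric("test_suite_seconds", summary.median))
// prints "METRIC test_suite_seconds=10.554"
```

With a single measured run, the median, best, and worst values coincide and the spread is zero, which is why repeated runs are needed before reading anything into small differences.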
## Files In Scope
`packages/opencode/test/**`, test fixtures, package test scripts, and implementation setup paths only when a benchmarked bottleneck points there.
## Signals To Watch
Repeated setup work, long sleeps/timeouts, serial integration tests, filesystem/database fixture costs, and broad test globs pulling unrelated work.
## Hypothesis Loop
| Hypothesis | Change | Before | After | Decision | Notes |
| --------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- | --------- | ------- | -------- | -------------------------------------------------------------------------------------------------------------------------------- |
| Repeated full-suite runs are too expensive for discovery | Switched full-suite benchmark to one run and added per-file profiler | ~250s/run | pending | keep | Bun has no slowest-test reporter in this version; profile files directly. |
| Plugin install concurrency test spends time spawning more workers than needed to exercise lock contention | Reduced worker counts from 12/10/8 to 6/6/5; kept `holdMs: 30` | 7.800s | 6.204s | keep | Median from 3 targeted runs; still covers concurrent cross-process writes to server, server+tui, and existing json config. |
| `httpapi-listen` PTY route tests pay for git repositories they do not assert on | Removed `git: true` from temp dirs while keeping config setup | 10.554s | 7.818s | keep | Median from 3 targeted runs; HTTP routes, tickets, websocket upgrade, restart, and no-auth paths still pass. |
| `workspace.waitForSync` timeout test waits the full production timeout | Added optional timeout parameter defaulting to production timeout; timeout test uses 25ms | 12.949s | 8.305s | keep | Median from 3 targeted runs; production callers keep the 5000ms default. |
| `config.test` waits after dependencies even though `.gitignore` is written synchronously | Removed obsolete 1000ms sleep from writable `OPENCODE_CONFIG_DIR` test | 10.270s | 9.433s | keep | Median from 5 targeted runs because one run was noisy; simpler test and no fixed sleep. |
| SDK parity helpers create git repos for tests that only need files/config/session state | Changed `withProject` default to no git; explicit git init test still opts into no-git fixture | 8.011s | 5.180s | keep | Median from 5 targeted runs because first run was cold/noisy. |
| Provider plugin filter test waits on plugin dependency readiness setup | Marked local plugin dependencies ready using the existing fixture helper | 7.543s | 6.366s | keep | Median from 3 targeted runs; matches neighboring plugin provider test setup. |
| HTTP provider tests generate local plugins without dependency-ready fixture state | Marked generated `.opencode` plugin fixtures dependency-ready | 7.905s | 2.980s | keep | Median from 3 targeted runs; avoids unrelated plugin dependency setup in route tests. |
| TUI plugin lifecycle timeout coverage waits the full production cleanup timeout | Added optional runtime dispose timeout override and used 25ms in the timeout test | 7.330s | 1.507s | keep | Median from 3 targeted runs; production default remains 5000ms. |
| Skill tool test initializes git even though it only reads local skill files | Removed `git: true` from the temporary directory fixture | 2.320s | 1.425s | keep | Single targeted rerun; still exercises skill discovery, permission request, and bundled file output. |
| Prompt shell semantics tests initialize git though they only assert shell/session behavior | Removed `git: true` from shell-focused prompt fixtures while preserving config setup | 26.930s | 23.400s | keep | Three targeted reruns passed after the change: 23.80s, 23.55s, 23.40s. |
| Remaining prompt behavior tests mostly do not require repository state | Removed git setup from safe loop/reference/error fixtures; restored shell queue/cancel cases | 23.400s | 19.610s | keep | Safety review found shell runner readiness depends on git-backed setup in several tests; current single rerun passes. |
| Session processor effect tests do not require repository state | Removed git setup from all processor-effect temp server fixtures | 12.500s | 9.230s | keep | Two targeted reruns passed after the change: 9.61s, 9.23s. |
| HTTP listen PTY ticket tests restart the same listener topology twice | Folded directory-scoped ticket regression into the broader unsafe-ticket test | 7.051s | 6.170s | keep | Two targeted reruns passed after the change: 6.76s, 6.17s; still covers mint failure and successful same-directory upgrade. |
| File watcher readiness can write before async native subscriptions are active | Retried short readiness writes and accepted symlink-realpath HEAD events | failed | 4.62s | keep | Three sequential focused watcher runs passed: 4.62s, 4.57s, 4.64s; full suite no longer failed in `watcher.test.ts`. |
| First provider config/env/filtering block can use Effect-aware instance fixtures | Migrated six `tmpdir` + `withTestInstance` cases to `it.instance` | 6.06s | 6.07s | keep | Neutral timing, but removes manual config file writes and instance plumbing; use as the pattern for later provider slices. |
| Custom provider/model config cases can use Effect-aware instance fixtures | Migrated three more config-heavy provider cases to `it.instance` | 6.07s | 6.12s | keep | Neutral timing within noise, but continues removing manual config file writes on top of the first provider fixture PR. |
| Provider env precedence and model lookup cases can use Effect-aware instance fixtures | Migrated four more provider lookup/default-model cases to `it.instance` | 6.12s | 6.36s | keep | Noisy 5-run median; kept as a small stacked cleanup slice but do not claim speedup from this migration. |
| Simple config load cases can use Effect-aware instance fixtures | Migrated JSON, shell, formatter, and lsp config load cases to `it.instance` | 14.18s | 3.93s | keep | Three-run medians before/after; removes manual `tmpdir` + `withTestInstance` setup from the first simple config block. |
| Config template, file include, and simple agent cases can use Effect-aware instance fixtures | Migrated JSONC, env/file substitution, invalid config, and agent config cases to `it.instance` | 1.87s | 1.90s | keep | Stacked on the first config slice; neutral timing but removes more manual `tmpdir` + instance plumbing. |
| Agent option, command, and legacy migration config cases can use Effect-aware instance fixtures | Migrated agent variant, command, autoshare, and mode migration cases to `it.instance` | 1.90s | 1.83s | keep | Stacked on the config template slice; small neutral-to-positive timing and less manual setup. |
| Local config update and directory cases can use Effect-aware instance fixtures | Migrated local `update` and `directories` cases to `it.instance` | 1.77s | 1.71s | keep | Three-run medians; small positive/neutral timing, removes manual instance plumbing, and eliminates one existing unsafe cast. |
| `.opencode` agent and command file-loading cases can use Effect-aware instance fixtures | Migrated singular/plural agent and command markdown fixture cases to `it.instance` | 7.21s | 1.87s | keep | Parent baseline was noisy (7.42, 7.21, 2.83); after runs were stable at 1.87, 1.98, 1.83. Keep as cleanup with no broad claim. |
| Legacy tools and permission-order config cases can use Effect-aware instance fixtures | Migrated legacy `tools` migration and permission order cases to `it.instance` | 1.87s | 1.87s | keep | Neutral timing; removes more manual temp-instance plumbing from legacy config migration coverage. |
| Remaining simple config load cases can use Effect-aware instance fixtures | Migrated default config load and legacy TUI-key cases to `it.instance` | 7.78s | 6.39s | keep | Single baseline before edit; after median from three sequential reruns (5.76, 6.39, 6.53). Keep as cleanup with cautious timing. |
| Managed settings config cases can use Effect-aware instance fixtures | Migrated managed override and missing-managed-file cases to `it.instance` | 2.40s | 1.76s | keep | Single baseline before edit; after median from three sequential reruns (1.75, 1.76, 1.80). |
| Local plugin and subagent config fixtures can use Effect-aware instance fixtures | Migrated scoped npm plugin and custom subagent markdown cases to `it.instance` | 2.37s | 1.67s | keep | Single baseline before edit; after median from three sequential reruns (1.66, 1.67, 1.67). |
| MCP merge config cases can use Effect-aware instance fixtures | Migrated three MCP merge/override cases to `it.instance` | 1.98s | 1.95s | keep | Neutral timing within noise; removes manual `tmpdir` + `withTestInstance` setup from isolated filesystem-only config cases. |
| Remaining legacy tools config cases can use Effect-aware instance fixtures | Migrated allow/deny legacy `tools` permission cases to `it.instance` | 2.65s | 1.90s | keep | Single baseline before edit; after median from three sequential reruns (2.58, 1.90, 1.90). |
| Oversized snapshot batch tests only need to cross the 100-file boundary | Reduced large diff/revert fixture sizes while keeping each case above the batch boundary | 4.32s | 3.66s | keep | Three affected snapshot tests; after median from three reruns (4.32, 3.66, 3.66) while still crossing the 100-file boundary. |
| Prompt tests without LLM calls do not need the test LLM server | Added a no-server runner and moved obvious non-LLM prompt/shell cases to it | 25.41s | 21.03s | keep | Full prompt file after simplify pass median from three reruns (20.66, 21.03, 21.64); LLM-backed tests stay on original runner. |
| CLI run subprocess cases can run independently | Marked `run-process.test.ts` subprocess cases concurrent | 11.87s | 4.13s | keep | Newest-dev single baseline; after median from three reruns (4.13, 4.17, 4.11). Each case has an isolated temp home and LLM port. |
| Snapshot initialization does not need to commit seeded files in the source repo | Removed extra `git add`/`commit` from the snapshot test `initialize()` helper | 22.22s | 20.23s | keep | Newest-dev single baseline; after median from three reruns (20.23, 22.59, 20.11). Fixture still creates a git repo root commit. |
| Processor AI SDK tool-call case does not assert git behavior | Removed `git: true` from the non-native tool-call processor test | 10.22s | 9.48s | keep | Newest-dev single baseline; full-file after median from three reruns (9.48, 9.60, 9.36); focused case passes in 1.39s. |
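
Two of the kept changes above (`workspace.waitForSync` and the TUI plugin dispose timeout) share one pattern: an optional timeout parameter that defaults to the production value, so only the timeout test passes a short override. The sketch below illustrates that pattern under stated assumptions. `withTimeout` and `PRODUCTION_TIMEOUT_MS` are hypothetical names, not the actual signatures in `packages/opencode`.

```ts
// Matches the 5000ms production default noted in the table.
const PRODUCTION_TIMEOUT_MS = 5000

// Hypothetical helper: rejects if `work` does not settle within `timeoutMs`.
// Production callers omit the second argument and keep the default.
async function withTimeout<T>(work: Promise<T>, timeoutMs: number = PRODUCTION_TIMEOUT_MS): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs}ms`)), timeoutMs)
  })
  try {
    return await Promise.race([work, timeout])
  } finally {
    // Clear the timer so a fast success does not leave it pending.
    clearTimeout(timer)
  }
}

// Test-side usage: the timeout path is exercised in about 25ms instead of 5000ms.
const neverSettles = new Promise<void>(() => {})
withTimeout(neverSettles, 25).catch((error: Error) => console.log(error.message))
// prints "timed out after 25ms"
```

Keeping the override optional means the timeout branch still runs the same code path as production; only the duration differs.
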
## Profiling Results
Command shape:
```sh
TEST_PROFILE_GLOB='test/<area>/**/*.test.ts' TEST_PROFILE_TOP=15 bun run profile:test
```
Initial slowest files observed during discovery:
| File | Seconds | Scope |
| ----------------------------------------- | ------: | ------------- |
| `test/config/config.test.ts` | 23.546 | config |
| `test/provider/provider.test.ts` | 18.747 | provider |
| `test/control-plane/workspace.test.ts` | 16.447 | control-plane |
| `test/plugin/install-concurrency.test.ts` | 14.804 | plugin |
| `test/server/httpapi-cors.test.ts` | 14.620 | server |
| `test/server/httpapi-listen.test.ts` | 10.073 | server |
| `test/server/httpapi-sdk.test.ts` | 8.661 | server |
| `test/server/httpapi-provider.test.ts` | 7.905 | server |
| `test/cli/tui/plugin-lifecycle.test.ts` | 7.330 | cli/tui |
| `test/file/index.test.ts` | 7.214 | file |
This table is historical profiling input, not the current ranking after kept changes.
Targeted 3-run baselines:
| File | Runs | Median | Notes |
| ----------------------------------------- | ---------------------- | -----: | ---------------------------------------------------------------------------- |
| `test/control-plane/workspace.test.ts` | 12.949, 12.949, 12.773 | 12.949 | Stable slow target. |
| `test/server/httpapi-listen.test.ts` | 10.554, 10.631, 10.479 | 10.554 | Stable slow target; WebSocket/listener lifecycle. |
| `test/config/config.test.ts` | 10.270, 9.042, 10.737 | 10.270 | Large serial file; initial 23s was mixed-scope contention/noise. |
| `test/server/httpapi-sdk.test.ts` | 7.600, 8.011, 8.035 | 8.011 | Stable slow target. |
| `test/plugin/install-concurrency.test.ts` | 7.949, 7.800, 7.712 | 7.800 | Stable slow target; many subprocesses. |
| `test/provider/provider.test.ts` | 8.323, 7.543, 7.474 | 7.543 | Large serial file. |
| `test/server/httpapi-cors.test.ts` | 2.621, 1.682, 1.518 | 1.682 | Not a standalone top target; initial 14s was mixed-scope noise/order effect. |
Full-suite sanity checks:
| Command | Result | Notes |
| -------------------- | -------: | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| `bun run bench:test` | 225.069s | Before continuing prompt/session work. |
| `bun run bench:test` | 186.729s | After prompt, processor, and PTY wins, but before the safety-review restores. |
| `bun run bench:test` | 202.317s | After restoring prompt shell coverage and SDK VCS parity coverage. |
| `bun run bench:test` |   failed | Watcher blocker cleared; the run later failed in `tool/skill.test.ts` (which passes in focused runs) and in prompt shell timeout cases under full-suite load. |
## Dead Ends
| Hypothesis | Change Tried | Before | After | Decision | Notes |
| ---------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- | -----: | -----: | -------- | --------------------------------------------------------------------------------------------- |
| `file/index.test.ts` pays unnecessary per-test global instance cleanup | Removed `afterEach(disposeAllInstances)` while keeping the explicit disposal test import | 5.262s | 5.089s | discard | Improvement was within noise and the cleanup is a safety guard for many instance-state tests. |
| Socket reset retry test can shorten its idle-timeout path | Reduced Bun server idle timeout and tried forced server close | 16.46s | failed | discard | Shorter idle timeout changed the error shape; forced close hung. Keep the real socket reset. |
| `tool/webfetch` can avoid per-test instance setup | Switched local HTTP tests from `it.instance` to `it.live` | 1.219s | failed | discard | Tool execution reads instance-local agent state, so the temp instance is required. |
| LSP client interop tests can shorten coarse request-handling sleeps | Reduced fixed post-notification waits from 100ms to 10ms | 4.270s | 4.740s | discard | First run improved to 3.870s but verification was slower than baseline; not a clear win. |
| Config content env cases can use Effect-aware instance fixtures | Migrated two `OPENCODE_CONFIG_CONTENT` token substitution cases to `it.instance` | 1.95s | 2.06s | discard | Passing but not neutral-or-better in focused reruns; keep existing explicit env cleanup. |
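
Several decisions above hinge on whether a timing change is "within noise". These notes do not define a formal threshold, so the rule below is an assumption for illustration only: a change counts as a clear win when the slowest run after the change is still faster than the fastest run before it. `isClearWin` is a hypothetical helper, not part of the benchmark scripts.

```ts
// Assumed rule, not taken from the benchmark scripts: a change is a clear
// win only if the slowest "after" run beats the fastest "before" run.
function isClearWin(beforeSeconds: number[], afterSeconds: number[]): boolean {
  if (beforeSeconds.length === 0 || afterSeconds.length === 0) {
    throw new Error("need at least one run on each side")
  }
  return Math.max(...afterSeconds) < Math.min(...beforeSeconds)
}

// LSP sleep reduction: baseline 4.270s, first run 3.870s, verification 4.740s.
console.log(isClearWin([4.27], [3.87, 4.74])) // false

// CLI run subprocess concurrency: baseline 11.87s, reruns 4.13s, 4.17s, 4.11s.
console.log(isClearWin([11.87], [4.13, 4.17, 4.11])) // true
```

A rule this strict will reject some real but small improvements; it is meant only to separate unambiguous wins from changes that need more runs or should be kept for non-timing reasons such as simpler fixtures.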