unsloth/tests
Daniel Han 44989ea2cb
ci: deterministic check for studio/frontend dep removals (#5478)
* ci: deterministic check for studio/frontend dep removals

Adds a CI gate that catches the common foot-gun: a dep dropped from
studio/frontend/package.json that something in src/ still imports.

scripts/check_frontend_dep_removal.py
  Diffs package.json against a git base ref, collects every package
  no longer declared, and for each one:
    1. Greps the entire repo for any usage pattern (static / dynamic /
       side-effect imports, require, CSS @import, HTML script/link
       src, new URL(), triple-slash references, template literals,
       bare quoted strings in JS-like files).
    2. Resolves whether the package would still install by BFS'ing
       the dep graph in the new lockfile starting from the new
       package.json's declared deps (so a stale lockfile does not
       give false OK-via-transitive results).
    3. Distinguishes top-level node_modules/<name> from nested copies
       under other packages. Bare src/ imports only resolve to the
       top-level path.
    4. Filters out references to the pip-installed playwright, so
       removing the npm playwright (CI uses the pip one) is reported
       correctly.

  Additional hygiene checks (warnings, fail with --strict):
    - lockfile <root> dep map matches package.json (catches drift).
    - @types/X is not orphaned when X is no longer declared.
    - No src/ import points at a package not declared in any field.
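
A minimal sketch of the reachability walk in step 2, assuming an npm v2/v3 lockfile whose `packages` map is keyed by `node_modules/<name>`. The name `reachable_packages` is hypothetical, and the sketch follows top-level entries only (the paths that bare src/ imports resolve to); the real script may differ.

```python
from __future__ import annotations

from collections import deque

DECLARED_FIELDS = ("dependencies", "devDependencies",
                   "optionalDependencies", "peerDependencies")
EDGE_FIELDS = ("dependencies", "optionalDependencies", "peerDependencies")


def reachable_packages(pkg_json: dict, lock: dict) -> set[str]:
    """BFS over the lockfile, seeded from package.json (not the lock root)."""
    packages = lock.get("packages", {})
    queue = deque()
    for field in DECLARED_FIELDS:
        queue.extend(pkg_json.get(field, {}))  # extends with the dep names
    seen: set[str] = set()
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        entry = packages.get(f"node_modules/{name}")
        if entry is None:
            continue  # no top-level copy in the lockfile
        seen.add(name)
        for field in EDGE_FIELDS:
            queue.extend(entry.get(field, {}))
    return seen
```

Seeding the queue from package.json rather than from the lockfile's root entry is what keeps a stale lockfile from reporting a removed package as still reachable.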

tests/studio/test_frontend_dep_removal.py
  24 deterministic cases. Each patches a copy of the head
  package.json, runs the script, and asserts (exit status,
  reported FAIL list). Covers:
    - Genuinely-breaking removals: next-themes, @xyflow/react,
      @huggingface/hub, dexie, motion, canvas-confetti, recharts,
      node-forge, mammoth, unpdf.
    - Safe-via-transitive removals: katex, clsx, react,
      @radix-ui/react-slot, zustand, tailwind-merge, remark-gfm,
      date-fns, js-yaml, @tauri-apps/api.
    - Mixed multi-removal failing on the unsafe entries only.
    - Non-existent / not-in-base names (no-op).
    - Move from deps to devDeps (not a removal).

.github/workflows/studio-frontend-ci.yml
  Runs the checker on pull_request events against
  origin/${{ github.base_ref }}, plus the edge-case suite.

* scripts: harden frontend dep removal check + adversarial suite

classify() now catches sneaky shapes that an earlier line-only scan
would miss:
  - multi-line `import { a, b } from "pkg"` and the same shape for
    `export { ... } from "pkg"` / `export * from "pkg"` /
    `export type ... from "pkg"`.
  - JSDoc `@import("pkg")` references.
  - Word-boundary fix so `foo` no longer matches `foobar` (subpath gate:
    after the package name we require closing quote or `/`).
  - Negative-lookbehind on `(?<!@)\bimport\b` so CSS `@import "X"` is
    classified as css_import, not side_effect_import.
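
The subpath gate and the negative lookbehind can be illustrated with a small sketch. `classify_line` and the `js_import` label are hypothetical simplifications for illustration, not the script's actual classify().

```python
from __future__ import annotations

import re

QUOTE = "['\"]"
JS_IMPORT = re.compile(r"(?<!@)\bimport\b")  # skips CSS `@import`
CSS_IMPORT = re.compile(r"@import\b")


def classify_line(line: str, pkg: str) -> str | None:
    # Subpath gate: after the package name, require a closing quote
    # or a "/" that starts a subpath.
    name = re.compile(QUOTE + re.escape(pkg) + r"(?:/[^'\"]*)?" + QUOTE)
    if not name.search(line):
        return None
    if CSS_IMPORT.search(line):
        return "css_import"
    if JS_IMPORT.search(line):
        return "js_import"
    return "string_literal"
```

With this gate, `"foobar"` is not a hit for `foo`, while `"foo/sub"` still is.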

find_usage() now feeds an 8-line window (4 above / 4 below the grep
hit) into classify() so multi-line import statements are picked up
even though the initial grep is line-based.
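
A sketch of the windowing idea, assuming the window includes the hit line itself plus `radius` lines on each side. `context_window` and `is_multiline_import` are hypothetical names.

```python
from __future__ import annotations

import re


def context_window(lines: list[str], hit: int, radius: int = 4) -> str:
    """Join the lines around a grep hit so multi-line statements match."""
    start = max(0, hit - radius)
    return "\n".join(lines[start : hit + radius + 1])


def is_multiline_import(window: str, pkg: str) -> bool:
    # [^;] also matches newlines, so the pattern spans line breaks.
    pattern = (r"(?<!@)\bimport\b[^;]*?\bfrom\s*['\"]"
               + re.escape(pkg) + r"(?:/[^'\"]*)?['\"]")
    return re.search(pattern, window) is not None
```

A hit on the closing `} from "pkg"` line is classified as an import only once the window reaches back to the `import` keyword.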

tests/studio/test_frontend_dep_removal.py now exercises three suites:
  - 24 edge cases: subprocess-driven, full-pipeline.
  - 28 classify() unit cases: direct function call against hand-crafted
    snippets. Covers static / side-effect / dynamic / require /
    css_import / html_script / html_link / re_export (4 variants) /
    template_literal / new_url / tsc_triple_slash / jsdoc_import /
    string_literal, plus false-positive guards (substring collision,
    plain-text comments, URL path tails, Python files, markdown).
  - 12 adversarial cases: write synthetic files under
    studio/frontend/src/__dep_check_adversarial__/, run the full
    script, then clean up. Confirms multi-line imports, re-exports,
    JSDoc @import, new URL, dynamic imports all FAIL when the
    underlying package is removed.

Current total: 64 / 64 cases pass.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* scripts: detect bin references in package.json scripts

Catches the last common false-negative: removing a package whose
bin is only referenced through `package.json` scripts (e.g. dropping
typescript while `"build": "tsc -b && vite build"` calls tsc).

Cross-checked the patterns Vercel/Next.js, Vite, and TanStack use
in their own manifests; the bin/scripts pairing is the one
consumer-side pattern dep checkers commonly miss.

How it works:
  - Build a bin-to-package map from each lockfile entry's `bin`
    field. The map is global so a stale lockfile still resolves
    bins from packages about to be pruned.
  - Tokenize each script value, splitting on `&&`, `||`, `;`, `|`.
    Strip env-var assignments and `npx / pnpx / yarn / pnpm / bunx`
    prefixes, plus `./node_modules/.bin/` and `node_modules/.bin/`
    path prefixes. Look up the leading token in the bin map.
  - Hits are reported as `script_bin` and feed the same reachability
    gate as source imports. A bin still installed transitively
    (e.g. vite via @vitejs/plugin-react peer) is OK-via-transitive;
    an orphaned bin is FAIL.
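
The tokenization steps above can be sketched as follows. `script_bins` and `script_bin_refs` are hypothetical names, and the sketch does not handle two-word runners such as `pnpm exec`.

```python
from __future__ import annotations

import re

RUNNERS = {"npx", "pnpx", "yarn", "pnpm", "bunx"}
ENV_ASSIGN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=")
BIN_PREFIXES = ("./node_modules/.bin/", "node_modules/.bin/")


def script_bins(script: str) -> list[str]:
    """Leading command token of each `&&` / `||` / `;` / `|` segment."""
    bins = []
    for command in re.split(r"&&|\|\||;|\|", script):
        tokens = command.split()
        # Drop env-var assignments and package-manager runner prefixes.
        while tokens and (ENV_ASSIGN.match(tokens[0]) or tokens[0] in RUNNERS):
            tokens.pop(0)
        if not tokens:
            continue
        lead = tokens[0]
        for prefix in BIN_PREFIXES:
            if lead.startswith(prefix):
                lead = lead[len(prefix):]
                break
        bins.append(lead)
    return bins


def script_bin_refs(scripts: dict, bin_to_pkg: dict) -> set[str]:
    """Packages whose bin is invoked by any package.json script."""
    return {bin_to_pkg[b] for value in scripts.values()
            for b in script_bins(value) if b in bin_to_pkg}
```

For the example in this commit, `{"build": "tsc -b && vite build"}` with a bin map containing `tsc -> typescript` yields `typescript` as a referenced package.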

Test additions:
  - 5 new edge cases: removing vite, typescript, eslint, @biomejs/biome,
    and (@biomejs/biome + @vitejs/plugin-react) together. Correctly
    flags @biomejs/biome and the combo as FAIL while vite / typescript
    / eslint are kept by peers.
  - 8 new classify() unit cases: TypeScript ambient `declare module`,
    namespace imports, combined default+named, default-as-named,
    re-export default (4 forms), `.then()` dynamic imports without
    await, and TypeScript `import()` in type position.

Current total: 29 edge + 36 classify-unit + 12 adversarial = 77 / 77.

* scripts: detect package.json field references to packages

A survey of package.json patterns in 10+ popular repos (React,
Vue/Svelte/Astro/Next.js, Vite, Storybook, TanStack/Query, Tailwind,
ESLint, TypeScript, Prettier, SvelteKit) showed that several config
fields in package.json itself can reference packages by string. The
checker filtered all of package.json out of the string_literal
fallback, so removing a package referenced only from one of these
fields produced a false negative.

Now covered (new pkg_json_field kind):
  - overrides / resolutions / pnpm.overrides keys
  - pnpm.patchedDependencies keys
  - peerDependenciesMeta keys
  - prettier: "@my/prettier-config" string
  - eslintConfig.extends (string or array)
  - stylelint.extends / stylelint.plugins
  - babel.presets / babel.plugins
  - jest.preset / jest.setupFiles / jest.transform
  - commitlint.extends
  - renovate.extends
  - remarkConfig.plugins
  - any other tool config field whose strings/keys equal the pkg
    name or `pkg/subpath`

False-positive guards (do not flag string values inside):
  - browserslist (browser queries)
  - keywords (free-form strings)
  - engines / engineStrict / packageManager / volta (version pins)
  - files / directories / publishConfig (paths)
  - workspaces (paths/globs)
  - main / module / browser / types / typings / exports / imports /
    bin / man (author-side fields)
  - scripts (already handled separately via scripts_bin_refs)
  - name / version / description / author / repository / homepage etc.

Test additions: new PkgFieldCase suite with 19 cases covering each
tool config field, subpath references, and the 5 false-positive
guards. Combined with the existing 29 edge / 36 classify / 12
adversarial cases, the suite is 96 / 96.
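
A rough sketch of the field scan, assuming a recursive walk over every top-level field that is neither a dependency map nor on the guard list. The names `pkg_json_field_refs` and `_mentions` are hypothetical, and the guard set shown is only a subset of the list above.

```python
from __future__ import annotations

DEP_FIELDS = {"dependencies", "devDependencies",
              "peerDependencies", "optionalDependencies"}
GUARDED_FIELDS = {"browserslist", "keywords", "engines", "files",
                  "workspaces", "scripts", "name", "version",
                  "description", "main", "exports", "bin"}


def _mentions(value, pkg: str) -> bool:
    """True if any string or dict key equals `pkg` or starts with `pkg/`."""
    if isinstance(value, str):
        return value == pkg or value.startswith(pkg + "/")
    if isinstance(value, dict):
        return any(_mentions(k, pkg) or _mentions(v, pkg)
                   for k, v in value.items())
    if isinstance(value, list):
        return any(_mentions(v, pkg) for v in value)
    return False


def pkg_json_field_refs(pkg_json: dict, pkg: str) -> list[str]:
    """Top-level package.json fields that reference `pkg`."""
    skip = DEP_FIELDS | GUARDED_FIELDS
    return [field for field, value in pkg_json.items()
            if field not in skip and _mentions(value, pkg)]
```

Skipping guarded fields such as `keywords` is what prevents a free-form string like `"prettier"` from counting as a usage of the prettier package.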

* scripts: enumerate dead deps in studio/frontend

Adds an opt-in dead-dep enumeration to the existing safety check.
Iterates every package declared in studio/frontend/package.json
(all four dep fields combined) and reports each as one of:

  used               at least one detected reference -- in src/, a
                     config file, package.json scripts (bin), a
                     package.json tool-config field (overrides /
                     prettier / eslintConfig / stylelint / babel /
                     jest / commitlint / renovate / etc.), or
                     tsconfig.compilerOptions.types

  unused             no detected reference anywhere

  type_pkg_kept      @types/X where X is still declared (or X = node,
                     always implicit)

  type_pkg_orphan    @types/X where X is no longer declared --
                     candidate for removal alongside X
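
The four buckets can be sketched as a single decision function. `classify_dep` is a hypothetical name, and the sketch ignores the scoped `@types/foo__bar` encoding.

```python
from __future__ import annotations


def classify_dep(name: str, declared: set[str], referenced: set[str]) -> str:
    """Bucket one declared package for the dead-dep report."""
    if name.startswith("@types/"):
        target = name[len("@types/"):]
        # @types/node is always implicit; others need their runtime package.
        if target == "node" or target in declared:
            return "type_pkg_kept"
        return "type_pkg_orphan"
    return "used" if name in referenced else "unused"
```

Here `declared` is the union of all four dep fields and `referenced` is the set of packages with at least one detected reference.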

Wiring:
  - New CLI flag `--enumerate-dead` (off by default).
  - CI workflow now passes `--enumerate-dead` so the report shows on
    every PR run; the report is informational unless `--strict` is
    also set.
  - With `--strict`, unused / type_pkg_orphan entries fail the run.

Tests:
  - 5 new EnumCase scenarios:
    E01 fake dep with no usage -> reported unused
    E02 fake dep imported by a synthetic src file -> reported used
    E03 fake dep referenced only in overrides -> reported used
    E04 @types/X paired with X (also imported) -> kept
    E05 @types/X without X -> orphan

Running the new flag against the current main reproduces exactly the
11 deps PR #5477 removed, validating the heuristic end to end.

Current total: 29 edge + 36 classify + 12 adversarial + 19 pkg-json
field + 5 enumeration = 101 / 101.

* ci: fetch base ref before running dep removal safety check

actions/checkout uses fetch-depth: 1 by default, so when the
dependency removal check ran `git show origin/main:.../package.json`
the ref wasn't available locally and the script exited 2 with
"could not read base package.json at origin/main:...".

Fetch the single base commit before invoking the check so the
git-show lookup resolves. --depth=1 keeps the extra fetch cheap.

* ci: address bot review on PR 5478

Five issues flagged by the Gemini and Codex review bots:

  * --base-lock argparse arg was defined and advertised in the
    docstring, but main() always read args.head_lock in both branches
    -- the flag did nothing. Dropped the dead arg and the misleading
    docstring line; the lockfile-reachability analysis only needs the
    head lockfile.

  * lock_resolvable() was defined but never called. Removed.

  * read_pkg_file() did not specify an encoding for read_text().
    Added encoding="utf-8" for cross-platform stability.

  * read_pkg_file() returned {} when the path did not exist, so a
    bad --head-lock value silently bypassed the reachability checks
    (false PASS for removals that resolve through npm script bins).
    main() now exits 2 with a clear message when the head lockfile
    is missing, matching the existing behavior for the head pkg.

  * studio-frontend-ci.yml pull_request paths filter only matched
    studio/frontend/** and the workflow file, so PRs that modified
    the checker script or its test could skip this job. Added both
    files to the trigger.

* ci: address 10x reviewer findings on dep removal safety check

Eight P1s and three P2s surfaced across 10 Codex reviewers; this
commit addresses all of them.

P1s:

1. Workflow refspec. In shallow PR checkouts,
   `git fetch --depth=1 origin <base_ref>` may update only FETCH_HEAD
   without creating origin/<base_ref>; the checker then dies with
   `fatal: invalid object name 'origin/main'`. Use the explicit
   refspec `<base>:refs/remotes/origin/<base>` so origin/<base> is
   reliably created.

2. `_deps_of()` was counting optional peer dependencies as reachable.
   npm installs an optional peer only when another package declares
   the same dep, so an optional peer cannot by itself keep a removed
   package in the tree. Skip entries marked `optional: true` in
   `peerDependenciesMeta`.

3. JS-syntactic classifiers (static_import, side_effect_import,
   dynamic_import, require, re_export, jsdoc_import, template_literal,
   tsc_triple_slash, new_url) now gate on file extension. Previously
   only the final string-literal fallback was gated, so a JS-shaped
   string inside a Python fixture or a Markdown code fence triggered
   a false FAIL. Added U37-U40 covering .py / .md / .sh / .yml.

4. HTML `<script src=>` and `<link href=>` patterns now respect a
   package-name boundary so `/node_modules/foo-extra/...` is not
   treated as a usage of `foo`. Added U41-U43.

5. New `find_command_usage()` detects CLI invocations in .sh / .yml
   / .yaml / .ps1 / .bat / Dockerfile* (npx pkg, bunx pkg, pnpm exec
   pkg, yarn dlx pkg, or a bare pkg --flag). Also covers scoped CLI
   packages exposed by their unscoped tail (@biomejs/biome -> biome).

6. `build_bin_to_pkg(head_lock)` was losing the bin -> package map
   for packages the PR correctly removed from the lockfile, so
   `scripts.biome:check` no longer flagged when @biomejs/biome was
   being dropped. Now also read the base lockfile (via `git show` or
   the new `--base-lock` override) and layer its bin map on top for
   any package in the removed set.

7. `--strict` now runs hygiene checks (lockfile sync, @types
   orphans, undeclared imports, dead-deps) on the no-removal path
   too. Previously the early return at "[OK] no dependencies removed"
   skipped them, so `--strict` silently passed on a tree with
   uncommitted lockfile drift or unused deps.

8. Removed `@types/X` packages are now matched against the runtime
   target name `X`: `/// <reference types="X" />`, tsconfig
   compilerOptions.types entries, AND runtime `import "X"` shapes.
   Handles the npm scope encoding (`@types/foo__bar` -> `@foo/bar`).
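
The scope decoding in item 8 follows the DefinitelyTyped naming convention, where a double underscore stands in for the scope separator. A minimal sketch, with the hypothetical helper name `runtime_name_for_types`:

```python
from __future__ import annotations


def runtime_name_for_types(types_pkg: str) -> str | None:
    """Map an @types package to the runtime package it describes."""
    prefix = "@types/"
    if not types_pkg.startswith(prefix):
        return None
    tail = types_pkg[len(prefix):]
    if "__" in tail:
        # @types/foo__bar describes the scoped package @foo/bar.
        scope, name = tail.split("__", 1)
        return f"@{scope}/{name}"
    return tail
```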

P2s:

9. CSS `url(...)` now accepts both quoted and unquoted forms (added
   U44-U45). The previous regex required `/{pkg}/` after a slash,
   missing bare-package urls like `url(katex/fonts/x.woff2)`.

10. `find_imports_without_decl()` now covers all static-import
    shapes: `import "pkg"`, `import Foo from "pkg"`,
    `import { Foo } from "pkg"`, `import type { Foo } from "pkg"`,
    `await import("pkg")`, `require("pkg")`.

11. (Same as #8.) Removed `@types/X` is also linked to runtime
    imports of `X`, not just type-only references.

Test suite expanded from 101 to 110 cases; all pass. Real-world
enumerate-dead still flags the same 11 unused packages on
studio/dep-removal-safety-check (matches PR 5477's removal set).

* ci: address 4x Opus reviewer findings on dep removal check

Three blockers from the parallel Opus review batch:

1. scripts_bin_refs ignored every script that began with a wrapper.
   The original "first non-env token wins" heuristic credited
   cross-env / dotenv / dotenvx / env-cmd as the bin, so a script like
   `cross-env CI=1 biome check` left @biomejs/biome looking unused.
   Rewrote into _next_real_bin(), which peels env prefixes, the
   leading package-manager runner (npx / pnpx / bunx / pnpm exec /
   yarn dlx), and the known wrapper bins (with --/-flag-arg handling)
   before returning the real CLI. shlex tokenization preserves quoted
   env values like `FOO="a b"`.

2. enumerate_dep_usage skipped find_command_usage. The non-enumerate
   path already credited deps used only from CI / Dockerfile / shell
   scripts, but `--enumerate-dead` did not, so packages referenced
   only from a workflow were silently listed as dead. Added the same
   call (gated against @types/* to avoid the unscoped-tail false
   positive).

3. classify multi-line window was ±4 lines. Prettier formats long
   named-import lists one identifier per line, so a 20-import block
   pushed the `import` keyword out of the window and the dep dropped
   to the string-literal fallback (or worse, was missed entirely).
   Widened to ±25 -- still bounded enough to keep false-positives
   negligible, wide enough for the realistic Prettier ceiling.
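
The wrapper peeling described in item 1 can be sketched as below. `next_real_bin` is a hypothetical stand-in for `_next_real_bin()`; it handles the `--` separator but, unlike the description above, does not model per-wrapper flag arguments.

```python
from __future__ import annotations

import re
import shlex

ENV_ASSIGN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=")
RUNNERS = {"npx", "pnpx", "bunx"}
TWO_WORD_RUNNERS = {("pnpm", "exec"), ("yarn", "dlx")}
WRAPPERS = {"cross-env", "dotenv", "dotenvx", "env-cmd"}


def next_real_bin(command: str) -> str | None:
    """Peel env prefixes, runners and wrapper bins; return the real CLI."""
    tokens = shlex.split(command)  # keeps FOO="a b" as a single token
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in WRAPPERS:
            rest = tokens[i + 1:]
            # If the wrapper uses `--`, the real command starts after it.
            i += 2 + rest.index("--") if "--" in rest else 1
        elif ENV_ASSIGN.match(tok) or tok in RUNNERS:
            i += 1
        elif tuple(tokens[i:i + 2]) in TWO_WORD_RUNNERS:
            i += 2
        else:
            return tok
    return None
```

With this peeling, `cross-env CI=1 biome check` resolves to `biome` rather than `cross-env`.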

Tests: added 10 _next_real_bin unit cases + 4 scripts_bin_refs
end-to-end cases (W01-W10 + I01-I04) and a 22-identifier multi-line
import adversarial case (A13). Full suite: 125/125.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-05-16 05:46:22 -07:00
notebooks CI: scope GITHUB_TOKEN permissions, add MLX CI, unblock ~60 skipped tests (#5312) 2026-05-11 03:19:13 -07:00
python studio: skip flash-attn install on Blackwell GPUs (sm_100+) (#5420) 2026-05-14 18:13:50 +04:00
qlora Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" 2025-12-01 07:24:58 -08:00
saving chore: fix typo cleanup across tests and backend strings (#5152) 2026-04-24 12:51:27 +01:00
security security: NOT affected by Mini Shai-Hulud (May-12 wave) -- forward-looking hardening only (#5397) 2026-05-13 04:58:12 -07:00
sh fix(tests/sh): accept pinned tokenizers line after #5359 (#5361) 2026-05-11 02:58:20 -07:00
studio ci: deterministic check for studio/frontend dep removals (#5478) 2026-05-16 05:46:22 -07:00
utils feat: Add cactus QAT scheme support (#4679) 2026-04-15 07:40:03 -07:00
version_compat CI: scope GITHUB_TOKEN permissions, add MLX CI, unblock ~60 skipped tests (#5312) 2026-05-11 03:19:13 -07:00
vllm_compat CI: scope GITHUB_TOKEN permissions, add MLX CI, unblock ~60 skipped tests (#5312) 2026-05-11 03:19:13 -07:00
__init__.py Qwen 3, Bug Fixes (#2445) 2025-04-30 22:38:39 -07:00
_zoo_aggressive_cuda_spoof.py CI: scope GITHUB_TOKEN permissions, add MLX CI, unblock ~60 skipped tests (#5312) 2026-05-11 03:19:13 -07:00
conftest.py tests: drift detector parity with unsloth-zoo (#5421) 2026-05-14 04:50:30 -07:00
run_all.sh fix: add tokenizers to no-torch deps and TORCH_CONSTRAINT for arm64 macOS py313+ (#4748) 2026-04-01 06:12:17 -07:00
test_cli_export_unpacking.py studio: stream export worker output into the export dialog (#4897) 2026-04-14 08:55:43 -07:00
test_gemma4_chat_template.py update gema4 chat templates (#5116) 2026-04-22 09:04:08 -07:00
test_get_model_name.py feat: Add support for OLMo-3 model (#4678) 2026-04-15 07:39:11 -07:00
test_import_fixes_drift.py import_fixes + drift detectors: cover transformers 5.x drift (#5423) 2026-05-14 05:14:21 -07:00
test_loader_glob_skip.py Add unit tests for HfFileSystem glob skip guard (#4854) 2026-04-06 08:54:36 -07:00
test_model_registry.py Revert "[FIX] Vllm guided decoding params (#3662)" 2025-12-01 05:43:45 -08:00
test_multi_image_grpo_chunking.py Multi Image GRPO (#5197) 2026-05-13 04:27:49 -07:00
test_peft_weight_converter_compat.py Patch checkpoint reload init functions to strip unsupported args (#5167) 2026-04-29 02:50:49 -07:00
test_public_api_surface.py tests: public-api surface drift detector (companion to test_import_fixes_drift.py) (#5428) 2026-05-14 19:56:21 -07:00
test_raw_text.py Add Studio PR-time CI: pin enforcement, frontend, backend, wheel smoke (#5298) 2026-05-06 04:41:57 -07:00
test_resolve_model_class.py fix: guard resolve_model_class fallback against unresolvable transformers AutoModel entries (#5155) 2026-04-24 05:59:17 -07:00
test_studio_install_workspace_guard.py studio: security and hardening pass (auth rate-limit, sandbox, path containment, schema validation, headers) (#5375) 2026-05-13 06:12:18 -07:00
test_studio_root_resilience.py install: support STUDIO_HOME / UNSLOTH_STUDIO_HOME for custom install paths (#5190) 2026-05-05 23:17:40 -07:00