unsloth/scripts
Daniel Han ac765d2efb
studio/ci: pre-install lockfile supply-chain audit (npm + cargo) (#5392)

The Mini Shai-Hulud wave that hit @tanstack/* on 2026-05-11 19:20-19:26
UTC (GHSA-g7cv-rxg3-hmpx) pushed 84 malicious versions across 42
packages. Each compromised tarball carried an `optionalDependencies`
entry pointing at a GitHub-hosted prepare script that exfiltrated
GitHub / npm / AWS / Vault / SSH credentials on `npm install` / `npm
ci`. Our current lockfile pins all @tanstack/* packages at pre-malicious
versions, so we were not exposed. But between "Dependabot opens a
security-update PR during a malicious window" and "a compromised
package's postinstall runs on the CI runner", the only defense layer
is the advisory database. `npm audit` and OSV-Scanner are reactive:
there is a window between malicious publication and the GHSA landing.

Add a pre-install lockfile audit that fires on the injection pattern
itself, BEFORE `npm ci` gets a chance to execute lifecycle scripts:

  scripts/lockfile_supply_chain_audit.py

    npm side (studio/frontend/package-lock.json, lockfileVersion 2/3):
      1. every `resolved` URL must point to registry.npmjs.org;
         direct GitHub / git+ / file: refs are the Shai-Hulud vector
      2. every non-bundled entry must carry an `integrity` SHA
      3. raw-text scan for known IOC strings (router_init.js,
         tanstack_runner.js, router_runtime.js, @tanstack/setup,
         the specific TanStack worm commit hash, getsession.org
         exfiltration host, "A Mini Shai-Hulud has Appeared" marker)
      4. nested `node_modules/.../node_modules/` fold-ins are
         transparent -- they ride on the parent tarball's integrity

    cargo side (studio/src-tauri/Cargo.lock):
      5. every `source` must be the crates.io registry
      6. registry crates must have a `checksum`
      7. one allowlist entry: fix-path-env from
         tauri-apps/fix-path-env-rs at pinned SHA c4c45d5. Any other
         non-registry source -- or a bump of that pinned SHA --
         fails the audit until it is reviewed and added to the
         allowlist
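
A minimal sketch of the structural rules above (1, 2, 5, 6, 7), not the
actual contents of scripts/lockfile_supply_chain_audit.py. The function
names, the allowlist shape, and the assumed layout of the git `source`
string (`git+<repo>?<query>#<commit>`) are illustrative assumptions; the
Cargo.lock reader is a deliberately small line parser rather than a full
TOML parser.

```python
import json
import re
from urllib.parse import urlparse

NPM_REGISTRY_HOST = "registry.npmjs.org"
CRATES_IO_SOURCE = "registry+https://github.com/rust-lang/crates.io-index"
# Hypothetical allowlist shape: crate name -> (git repo source, pinned SHA prefix).
ALLOWED_GIT = {
    "fix-path-env": ("git+https://github.com/tauri-apps/fix-path-env-rs", "c4c45d5"),
}
_KEY_VALUE = re.compile(r'^(\w+) = "(.*)"$', re.MULTILINE)


def audit_npm_lock(lock_text):
    """Rules 1-2: registry-only `resolved` URLs, integrity on non-bundled entries."""
    findings = []
    packages = json.loads(lock_text).get("packages", {})
    for path, entry in packages.items():
        if path == "" or entry.get("inBundle"):
            continue  # root project, or shipped inside the parent tarball
        resolved = entry.get("resolved")
        if resolved is not None:
            url = urlparse(resolved)
            if url.scheme != "https" or url.hostname != NPM_REGISTRY_HOST:
                findings.append(f"{path}: non-registry resolved URL {resolved}")
        if not entry.get("integrity"):
            findings.append(f"{path}: missing integrity")
    return findings


def _is_allowlisted(name, source):
    allowed = ALLOWED_GIT.get(name)
    if allowed is None:
        return False
    repo, sha_prefix = allowed
    base, _, commit = source.partition("#")
    # Compare the repo exactly (query string dropped) and the commit by prefix.
    return base.split("?")[0] == repo and commit.startswith(sha_prefix)


def audit_cargo_lock(lock_text):
    """Rules 5-7: crates.io sources with checksums, plus the pinned allowlist."""
    findings = []
    for block in lock_text.split("[[package]]")[1:]:
        pkg = dict(_KEY_VALUE.findall(block))
        name, source = pkg.get("name", "?"), pkg.get("source")
        if source is None:
            continue  # local workspace crate: Cargo records no source for it
        if source == CRATES_IO_SOURCE:
            if "checksum" not in pkg:
                findings.append(f"{name}: registry crate without checksum")
        elif not _is_allowlisted(name, source):
            findings.append(f"{name}: non-registry source {source}")
    return findings
```

Both functions only parse text and return a list of findings; nothing
from the lockfile is fetched or executed.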

Wire into four workflows:

  .github/workflows/security-audit.yml -- new step inside the
    advisory-audit job, immediately before `npm audit` so the
    structural pass and the advisory-DB pass appear together in
    the GitHub step summary.

  .github/workflows/studio-frontend-ci.yml,
  .github/workflows/wheel-smoke.yml,
  .github/workflows/studio-tauri-smoke.yml -- new step immediately
    BEFORE `npm ci`. If a future malicious bump lands in our lockfile,
    the audit refuses and `npm ci` never runs, so no `prepare` /
    `postinstall` from a compromised tarball can execute on the
    runner.
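
In the workflows this gate comes from step ordering: a failed audit
step stops the job before the `npm ci` step starts. The same ordering
can be modeled in Python as a sketch; `run_guarded` and its injectable
`run` parameter are hypothetical names, not part of the audit script.

```python
import subprocess


def run_guarded(audit_cmd, install_cmd, run=subprocess.run):
    """Run the audit first; start the install only if the audit exits 0."""
    audit = run(audit_cmd)
    if audit.returncode != 0:
        # The install command is never spawned, so no lifecycle script
        # from the lockfile's packages gets a chance to execute.
        return audit.returncode
    return run(install_cmd).returncode


# Example call, assuming the audit script needs no arguments:
#   run_guarded(["python", "scripts/lockfile_supply_chain_audit.py"], ["npm", "ci"])
```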

Note on --ignore-scripts: every `npm ci` in our CI is followed directly
by `npm run build` or `tauri build`, both of which depend on package
install scripts (esbuild's native-binary postinstall, etc.). Blanket
--ignore-scripts breaks the build, so the pre-install structural
audit is the practical mitigation. The audit reads lockfiles only;
it never executes anything from them.

Verified:
  - Clean state: 0 findings on the current tree (npm + cargo).
  - Fault injection: synthetic `@tanstack/setup` IOC + non-registry
    `resolved` URL both fire with exit code 1.
  - YAML parses cleanly for all four modified workflows.
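
The fault-injection check can be reproduced in miniature with a sketch
of the raw-text IOC scan (rule 3). The IOC strings are the ones named
in this message; the worm commit hash is omitted because it is not
spelled out here. `scan_iocs`, `exit_code`, and the synthetic lockfile
fragments are illustrative assumptions.

```python
# IOC strings listed in the commit message (commit hash IOC not included).
IOC_STRINGS = (
    "router_init.js",
    "tanstack_runner.js",
    "router_runtime.js",
    "@tanstack/setup",
    "getsession.org",
    "A Mini Shai-Hulud has Appeared",
)


def scan_iocs(lock_text: str) -> list[str]:
    """Return every known IOC string found anywhere in the raw lockfile text."""
    return [ioc for ioc in IOC_STRINGS if ioc in lock_text]


def exit_code(findings: list[str]) -> int:
    """Non-zero exit when anything fired, so the CI step fails."""
    return 1 if findings else 0


# Synthetic fragments: the package name and GitHub ref are made up.
CLEAN = '{"packages": {"node_modules/example": {"version": "1.0.0"}}}'
INJECTED = CLEAN.replace(
    '"version"',
    '"optionalDependencies": {"@tanstack/setup": "github:example/example"}, "version"',
)

if __name__ == "__main__":
    for label, text in (("clean", CLEAN), ("injected", INJECTED)):
        hits = scan_iocs(text)
        print(label, hits, "exit", exit_code(hits))
```

Because the scan is a plain substring match over the file text, it
fires even when an IOC hides in a field the structural checks do not
inspect.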

Refs:
  - https://tanstack.com/blog/npm-supply-chain-compromise-postmortem
  - https://github.com/TanStack/router/issues/7383
  - https://github.com/TanStack/router/security/advisories/GHSA-g7cv-rxg3-hmpx
  - https://www.aikido.dev/blog/mini-shai-hulud-is-back-tanstack-compromised
  - https://www.stepsecurity.io/blog/mini-shai-hulud-is-back-a-self-spreading-supply-chain-attack-hits-the-npm-ecosystem

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-05-11 20:36:52 -07:00
data                            CI: scope GITHUB_TOKEN permissions, add MLX CI, unblock ~60 skipped tests (#5312)  2026-05-11 03:19:13 -07:00
enforce_kwargs_spacing.py       Formatting & bug fixes (#3563)  2025-11-07 06:00:22 -08:00
install_gemma4_mlx.sh           Move gemma4 script (#4994)  2026-04-12 23:41:15 -07:00
install_qwen3_6_mlx.sh          Add qwen3.6 script (#5084)  2026-04-17 01:21:30 -07:00
lockfile_supply_chain_audit.py  studio/ci: pre-install lockfile supply-chain audit (npm + cargo) (#5392)  2026-05-11 20:36:52 -07:00
notebook_to_python.py           CI: scope GITHUB_TOKEN permissions, add MLX CI, unblock ~60 skipped tests (#5312)  2026-05-11 03:19:13 -07:00
notebook_validator.py           CI: scope GITHUB_TOKEN permissions, add MLX CI, unblock ~60 skipped tests (#5312)  2026-05-11 03:19:13 -07:00
run_ruff_format.py              Formatting & bug fixes (#3563)  2025-11-07 06:00:22 -08:00
scan_packages.py                CI: scope GITHUB_TOKEN permissions, add MLX CI, unblock ~60 skipped tests (#5312)  2026-05-11 03:19:13 -07:00
stamp_studio_release.py         Add Studio web update banner and release version display (#5308)  2026-05-11 18:24:01 +04:00