studio/ci: pre-install lockfile supply-chain audit (npm + cargo) (#5392)

* studio/ci: pre-install lockfile supply-chain audit (npm + cargo)

The Mini Shai-Hulud wave that hit @tanstack/* on 2026-05-11 19:20-19:26
UTC (GHSA-g7cv-rxg3-hmpx) pushed 84 malicious versions across 42
packages. Each compromised tarball carried an `optionalDependencies`
entry pointing at a GitHub-hosted prepare script that exfiltrated
GitHub / npm / AWS / Vault / SSH credentials on `npm install` / `npm
ci`. Our current lockfile pins ALL @tanstack/* at pre-malicious
versions so we were not exposed. But if dependabot opens a
security-update PR during a malicious window, the only thing standing
between that PR and a compromised package's install script running on
the CI runner is the advisory DB, and the advisory DB lags. `npm audit`
and OSV-Scanner are reactive: there is a window between malicious
publication and GHSA landing.

Add a pre-install lockfile audit that fires on the injection pattern
itself, BEFORE `npm ci` gets a chance to execute lifecycle scripts:

  scripts/lockfile_supply_chain_audit.py

    npm side (studio/frontend/package-lock.json, lockfileVersion 2/3):
      1. every `resolved` URL must point to registry.npmjs.org;
         direct GitHub / git+ / file: refs are the Shai-Hulud vector
      2. every non-bundled entry must carry an `integrity` SHA
      3. raw-text scan for known IOC strings (router_init.js,
         tanstack_runner.js, router_runtime.js, @tanstack/setup,
         the specific TanStack worm commit hash, getsession.org
         exfiltration host, "A Mini Shai-Hulud has Appeared" marker)
      4. nested `node_modules/.../node_modules/` fold-ins are
         transparent -- they ride on the parent tarball's integrity

    cargo side (studio/src-tauri/Cargo.lock):
      5. every `source` must be the crates.io registry
      6. registry crates must have a `checksum`
      7. one allowlist entry: fix-path-env from
         tauri-apps/fix-path-env-rs at pinned SHA c4c45d5. Any other
         non-registry source -- or a bump of that pinned SHA --
         re-fires the audit until reviewed + appended
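
The structural checks listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script added in this commit: the function names `npm_findings` and `cargo_findings`, the `(package, problem)` tuple shape, and the `ALLOWLIST` set are hypothetical, and both functions take already-parsed lockfile data.

```python
"""Illustrative sketch of the structural lockfile checks (hypothetical names)."""

NPM_REGISTRY = "https://registry.npmjs.org/"
CRATES_IO = "registry+https://github.com/rust-lang/crates.io-index"

# Assumed allowlist shape: exact (crate name, full pinned source string) pairs.
ALLOWLIST = {
    (
        "fix-path-env",
        "git+https://github.com/tauri-apps/fix-path-env-rs#"
        "c4c45d503ea115a839aae718d02f79e7c7f0f673",
    ),
}


def npm_findings(lock: dict) -> list[tuple[str, str]]:
    """Return (package key, problem) pairs for a parsed package-lock.json."""
    out = []
    for key, entry in (lock.get("packages") or {}).items():
        if key == "" or entry.get("link"):
            continue  # project root or workspace symlink: no tarball
        resolved = entry.get("resolved")
        if resolved is None:
            # Nested / bundled fold-ins ride on the parent tarball's integrity.
            nested = "/node_modules/" in key
            if not (nested or entry.get("bundled")) and entry.get("version"):
                out.append((key, "missing-resolved-url"))
            continue
        if not resolved.startswith(NPM_REGISTRY):
            out.append((key, "non-registry-resolved-url"))
        if not entry.get("integrity"):
            out.append((key, "missing-integrity-hash"))
    return out


def cargo_findings(packages: list[dict]) -> list[tuple[str, str]]:
    """Return (crate name, problem) pairs for the [[package]] tables."""
    out = []
    for pkg in packages:
        name = pkg.get("name") or "<unnamed>"
        source = pkg.get("source")
        if source is None:
            continue  # workspace-local crate
        if source == CRATES_IO:
            if not pkg.get("checksum"):
                out.append((name, "missing-cargo-checksum"))
        elif (name, source) not in ALLOWLIST:
            # Any other source, or a bumped SHA, needs a fresh review.
            out.append((name, "non-registry-cargo-source"))
    return out
```

Because the allowlist match is on the full source string, a bump of the pinned SHA is treated the same as an unknown source until the new string is reviewed and added.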

Wire into four workflows:

  .github/workflows/security-audit.yml -- new step inside the
    advisory-audit job, immediately before `npm audit` so the
    structural pass and the advisory-DB pass appear together in
    the GitHub step summary.

  .github/workflows/studio-frontend-ci.yml,
  .github/workflows/wheel-smoke.yml,
  .github/workflows/studio-tauri-smoke.yml -- new step immediately
    BEFORE `npm ci`. If a future malicious bump lands in our lockfile,
    the audit refuses and `npm ci` never runs, so no `prepare` /
    `postinstall` from a compromised tarball can execute on the
    runner.
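
The workflows rely on a failed step stopping the job before `npm ci` starts. For local reproduction, the same fail-closed ordering can be sketched as below; `run_gated` is a hypothetical helper and is not part of this commit.

```python
import subprocess
import sys


def run_gated(audit_cmd: list[str], install_cmd: list[str]) -> int:
    """Run audit_cmd; run install_cmd only if the audit exits 0."""
    audit = subprocess.run(audit_cmd)
    if audit.returncode != 0:
        # Fail closed: the install step never starts, so no lifecycle
        # script from a lockfile entry gets a chance to execute.
        return audit.returncode
    return subprocess.run(install_cmd).returncode


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere; a real invocation
    # would pass the audit script and the install command instead.
    failing_audit = [sys.executable, "-c", "import sys; sys.exit(1)"]
    install = [sys.executable, "-c", "print('install ran')"]
    print("exit code:", run_gated(failing_audit, install))
```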

Note on --ignore-scripts: every npm ci in our CI is followed directly
by `npm run build` or `tauri build`, both of which depend on package
install scripts (esbuild's native-binary postinstall, etc.). Blanket
--ignore-scripts breaks the build, so the pre-install structural
audit is the practical mitigation. The audit reads lockfiles only;
it never executes anything from them.

Verified:
  - Clean state: 0 findings on the current tree (npm + cargo).
  - Fault injection: synthetic `@tanstack/setup` IOC + non-registry
    `resolved` URL both fire with exit code 1.
  - YAML parses cleanly for all four modified workflows.
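
The IOC half of the fault-injection check can be reproduced along these lines. This is a hedged sketch: `scan_iocs`, `synthetic_lockfile`, and the two-entry `IOCS` tuple are hypothetical stand-ins for the script's raw-text pass, and the synthetic lockfile is built in memory with placeholder values rather than written into the repo.

```python
import json

# Two of the IOC strings named in this commit; the real list is longer.
IOCS = ("@tanstack/setup", "router_init.js")


def scan_iocs(raw: str, iocs: tuple[str, ...] = IOCS) -> list[tuple[str, int]]:
    """Return (ioc, 1-based line number) for the first hit of each IOC."""
    hits = []
    lines = raw.splitlines()
    for ioc in iocs:
        for number, line in enumerate(lines, start=1):
            if ioc in line:
                hits.append((ioc, number))
                break
    return hits


def synthetic_lockfile(inject: bool) -> str:
    """Build a tiny lockfileVersion 3 body, optionally with an injected IOC."""
    entry = {
        "version": "1.0.0",
        "resolved": "https://registry.npmjs.org/demo/-/demo-1.0.0.tgz",
        "integrity": "sha512-placeholder",
    }
    if inject:
        # Placeholder reference; only the dependency key matters here.
        entry["optionalDependencies"] = {"@tanstack/setup": "github:example/repo#abc"}
    lock = {"lockfileVersion": 3, "packages": {"": {}, "node_modules/demo": entry}}
    return json.dumps(lock, indent=2)


if __name__ == "__main__":
    print("clean:", scan_iocs(synthetic_lockfile(inject=False)))
    print("dirty:", scan_iocs(synthetic_lockfile(inject=True)))
```

Scanning the raw text rather than the parsed structure means the match does not depend on which field the string was injected into.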

Refs:
  - https://tanstack.com/blog/npm-supply-chain-compromise-postmortem
  - https://github.com/TanStack/router/issues/7383
  - https://github.com/TanStack/router/security/advisories/GHSA-g7cv-rxg3-hmpx
  - https://www.aikido.dev/blog/mini-shai-hulud-is-back-tanstack-compromised
  - https://www.stepsecurity.io/blog/mini-shai-hulud-is-back-a-self-spreading-supply-chain-attack-hits-the-npm-ecosystem

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Daniel Han 2026-05-11 20:36:52 -07:00 committed by GitHub
parent 1794a544b5
commit ac765d2efb
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
5 changed files with 521 additions and 0 deletions


@@ -244,6 +244,27 @@ jobs:
            echo '```'
          } >> "$GITHUB_STEP_SUMMARY"

      # ─────────────────────────────────────────────────────────────
      # Pre-install lockfile supply-chain audit (npm + cargo).
      # Catches structural anomalies (non-registry resolved URLs,
      # missing integrity hashes, known IOC strings) BEFORE `npm
      # audit` or OSV-Scanner consult the advisory DB. The advisory
      # path is reactive -- there is a window between a malicious
      # publication and the GHSA landing. This step fires on the
      # injection pattern itself so it catches the same class of
      # attack the moment the lockfile shape becomes wrong.
      # ─────────────────────────────────────────────────────────────
      - name: Lockfile supply-chain audit (pre-install scan)
        run: |
          python3 scripts/lockfile_supply_chain_audit.py
          {
            echo "## Lockfile supply-chain audit"
            echo
            echo "Scanned: studio/frontend/package-lock.json + studio/src-tauri/Cargo.lock"
            echo
            echo "No structural anomalies or known IOC strings."
          } >> "$GITHUB_STEP_SUMMARY"

      # ─────────────────────────────────────────────────────────────
      # npm: Studio frontend
      # ─────────────────────────────────────────────────────────────


@@ -58,6 +58,14 @@ jobs:
          cache: 'npm'
          cache-dependency-path: studio/frontend/package-lock.json

      # Run the structural lockfile scan BEFORE npm ci. A compromised
      # tarball runs its `prepare` / `postinstall` during `npm ci`,
      # so any catch has to fire upstream of that. The scanner is
      # pure-Python read-only; safe to call ahead of every install.
      - name: Lockfile supply-chain audit (pre-install scan)
        working-directory: ${{ github.workspace }}
        run: python3 scripts/lockfile_supply_chain_audit.py

      - name: Lockfile must agree with package.json (npm ci is strict)
        run: npm ci --no-fund --no-audit


@@ -69,6 +69,9 @@ jobs:
          echo "$out"
          [ "$out" = "tauri-cli 2.10.1" ] || { echo "::error::expected tauri-cli 2.10.1, got $out"; exit 1; }

      - name: Lockfile supply-chain audit (pre-install scan)
        run: python3 scripts/lockfile_supply_chain_audit.py

      - name: Frontend build (npm ci, vite)
        working-directory: studio/frontend
        run: |


@@ -53,6 +53,9 @@ jobs:
        with:
          python-version: '3.12'

      - name: Lockfile supply-chain audit (pre-install scan)
        run: python3 scripts/lockfile_supply_chain_audit.py

      - name: Build frontend
        run: |
          cd studio/frontend


@@ -0,0 +1,486 @@
#!/usr/bin/env python3
# SPDX-License-Identifier: AGPL-3.0-only
# Copyright 2026-present the Unsloth AI Inc. team. All rights reserved.

"""Lockfile supply-chain audit for the Studio frontend and Tauri shell.

Runs BEFORE `npm ci` / `cargo fetch` in CI. Refuses to proceed when a
lockfile contains patterns that indicate the kind of supply-chain
injection seen in the npm Shai-Hulud waves and the cargo
crates.io brand-squat attempts.

What it checks
==============

studio/frontend/package-lock.json (lockfileVersion 2 or 3):

  1. `resolved` URL origin. Every entry must resolve through
     `https://registry.npmjs.org/`. Non-registry references
     (`git+ssh://`, `git+https://`, `github:owner/repo#sha`,
     `file:`, `http://`) are refused -- the TanStack npm incident used
     exactly this vector to land an unaudited GitHub commit hash as
     an optional dependency.

  2. `integrity` field presence. Every entry that has a `resolved`
     URL must carry an `integrity` SHA. Workspace links and bundled /
     nested fold-ins (covered by the parent tarball's integrity) are
     exempt. A missing integrity means the registry can swap the
     tarball after lockfile generation and CI will not notice.

  3. Known IOC strings. A hardcoded set of indicator-of-compromise
     substrings is grepped across the entire lockfile body (file
     names, dependency keys, URLs). The list is updated as new
     campaigns surface. Catching one means the local install was
     about to pull a publicly-known malicious release.

studio/src-tauri/Cargo.lock:

  4. `source` field origin. Every entry with a `source` must point at
     `registry+https://github.com/rust-lang/crates.io-index`, unless
     the exact `(name, source)` pair is listed in
     CARGO_SOURCE_ALLOWLIST. Direct git sources (`git+https://...`)
     and `path+...` for cross-crate paths warrant manual review and
     are flagged.

  5. `checksum` presence. Every registry crate must carry a
     `checksum`.

  6. Known cargo IOC strings. Same idea as (3), separate list
     (currently empty).

Exit codes
==========

  0  no findings, or the opt-out env var
     (UNSLOTH_LOCKFILE_AUDIT_SKIP=1) is set
  1  one or more findings, including a malformed lockfile or a
     missing TOML parser; stderr lists them with file path and line
     number where derivable
  2  command-line usage error (argparse's default exit status)

Operational stance
==================

This scanner only PARSES the lockfiles -- it never executes anything
in them, never resolves anything against the network. Safe to run
ahead of every `npm ci`. The IOC list is short by design; this
complements (not replaces) `npm audit`, OSV-Scanner, and the
advisory-DB pipeline in `.github/workflows/security-audit.yml`. The
shape of the catch is "we refuse to proceed because the lockfile
itself is shaped wrong", which fires before any third-party install
script gets a chance to run on the runner.
"""
from __future__ import annotations
import argparse
import json
import os
import re
import sys
from pathlib import Path
REPO_ROOT = Path(__file__).resolve().parents[1]
# ─────────────────────────────────────────────────────────────────────
# Known IOC strings (case-sensitive substring match).
# ─────────────────────────────────────────────────────────────────────
#
# Keep these short and FACTUAL. Each entry is tied to a public advisory
# and is the literal string an attacker would have to embed for the
# attack to work. Adding speculative or generic patterns here would
# generate false positives on dependency upgrades.
NPM_IOC_STRINGS: tuple[str, ...] = (
# Shai-Hulud TanStack wave -- May 11, 2026 (GHSA-g7cv-rxg3-hmpx).
"router_init.js",
"tanstack_runner.js",
"router_runtime.js",
"@tanstack/setup",
"github:tanstack/router#79ac49eedf774dd4b0cfa308722bc463cfe5885c",
# Exfiltration endpoints observed across both Shai-Hulud waves.
"filev2.getsession.org",
"getsession.org/file/",
# Campaign markers; the worm tarballs print this to stdout on run.
"A Mini Shai-Hulud has Appeared",
)
CARGO_IOC_STRINGS: tuple[str, ...] = (
# Reserved for future cargo-side incidents. Empty by default --
# `source` origin check below catches the structural pattern.
)
# ─────────────────────────────────────────────────────────────────────
# Allowed lockfile origins.
# ─────────────────────────────────────────────────────────────────────
NPM_REGISTRY_PREFIX = "https://registry.npmjs.org/"
# Kept as a tuple so that a reviewed mirror prefix can be appended
# later. Today only the official registry prefix is allowed.
NPM_REGISTRY_PREFIXES_ALLOWED: tuple[str, ...] = (NPM_REGISTRY_PREFIX,)
CARGO_REGISTRY_SOURCE = "registry+https://github.com/rust-lang/crates.io-index"
# ─────────────────────────────────────────────────────────────────────
# Cargo non-registry source allowlist.
# ─────────────────────────────────────────────────────────────────────
#
# Each entry is `(crate_name, exact_source_string)`. The crate must
# match by name AND the source must match the full pinned-SHA string
# verbatim. Bumping the commit SHA forces a re-review here: the
# scanner fires until the new SHA is appended.
#
# Studio's Tauri shell pulls `fix-path-env` directly from
# tauri-apps/fix-path-env-rs because the crate is not published to
# crates.io. The pinned commit (c4c45d5) was reviewed at the time it
# landed; future bumps need explicit approval.
CARGO_SOURCE_ALLOWLIST: tuple[tuple[str, str], ...] = (
(
"fix-path-env",
"git+https://github.com/tauri-apps/fix-path-env-rs#"
"c4c45d503ea115a839aae718d02f79e7c7f0f673",
),
)
# ─────────────────────────────────────────────────────────────────────
# Finding container.
# ─────────────────────────────────────────────────────────────────────
class Finding:
    __slots__ = ("path", "package", "kind", "detail")

    def __init__(self, path: str, package: str, kind: str, detail: str) -> None:
        self.path = path
        self.package = package
        self.kind = kind
        self.detail = detail

    def __str__(self) -> str:
        return (
            f"  [{self.kind}] {self.path}\n"
            f"      package: {self.package}\n"
            f"      detail:  {self.detail}"
        )
# ─────────────────────────────────────────────────────────────────────
# package-lock.json audit.
# ─────────────────────────────────────────────────────────────────────
def audit_npm_lockfile(path: Path) -> list[Finding]:
    findings: list[Finding] = []
    if not path.exists():
        return findings
    raw = path.read_text(encoding = "utf-8")
    try:
        lock = json.loads(raw)
    except json.JSONDecodeError as exc:
        findings.append(
            Finding(
                path = str(path),
                package = "<root>",
                kind = "malformed-lockfile",
                detail = f"could not parse as JSON: {exc}",
            )
        )
        return findings

    lockfile_version = lock.get("lockfileVersion")
    if lockfile_version not in (2, 3):
        findings.append(
            Finding(
                path = str(path),
                package = "<root>",
                kind = "unsupported-lockfile-version",
                detail = (f"only lockfileVersion 2 or 3 audited; got {lockfile_version}"),
            )
        )

    packages = lock.get("packages") or {}
    for key, entry in packages.items():
        # The empty key "" is the project root; workspace entries use
        # keys like "node_modules/foo" or "studio/frontend/sub-pkg".
        # Skip the project root (it has no `resolved`).
        if key == "":
            continue
        if entry.get("link"):
            # Workspace symlink; no tarball to resolve.
            continue
        resolved = entry.get("resolved")
        # Entries living inside another package's `node_modules/`
        # tree are bundled fold-ins -- the parent's tarball ships
        # their source verbatim and the parent's `integrity` covers
        # the whole subtree. npm represents them in lockfileVersion 3
        # as nested entries with no `resolved` and no `integrity` of
        # their own. Treat them as transparent to this audit.
        nested = key.count("/node_modules/") >= 1

        # 1. resolved-URL origin.
        if resolved is None:
            if nested or entry.get("bundled"):
                # Bundled / fold-in entry; covered by parent integrity.
                pass
            elif entry.get("version"):
                # Top-level entry without a resolved URL is suspicious.
                findings.append(
                    Finding(
                        path = str(path),
                        package = key,
                        kind = "missing-resolved-url",
                        detail = (
                            f"version={entry['version']!r} but no `resolved` "
                            "field; lockfile is incomplete"
                        ),
                    )
                )
        else:
            if not any(resolved.startswith(p) for p in NPM_REGISTRY_PREFIXES_ALLOWED):
                findings.append(
                    Finding(
                        path = str(path),
                        package = key,
                        kind = "non-registry-resolved-url",
                        detail = (
                            f"resolved={resolved!r}; only "
                            f"{NPM_REGISTRY_PREFIX} is permitted. Direct "
                            "GitHub / git / file references are the "
                            "Shai-Hulud injection vector."
                        ),
                    )
                )

        # 2. integrity-hash presence.
        if resolved is not None and not entry.get("integrity"):
            findings.append(
                Finding(
                    path = str(path),
                    package = key,
                    kind = "missing-integrity-hash",
                    detail = (
                        "no `integrity` field; npm cannot verify the "
                        "tarball SHA against the registry-published hash"
                    ),
                )
            )

    # 3. Known IOC strings: scan the raw file body so we hit fields the
    #    structural pass above doesn't enumerate (scripts, optional
    #    dependencies, etc.). Cheap and complete.
    for ioc in NPM_IOC_STRINGS:
        if ioc in raw:
            # Best-effort line number lookup.
            line_no = _first_line_containing(raw, ioc)
            findings.append(
                Finding(
                    path = f"{path}:{line_no}" if line_no else str(path),
                    package = "<ioc-match>",
                    kind = "known-ioc-string",
                    detail = (
                        f"matched known IOC substring {ioc!r}; this is "
                        "a public indicator of a recent supply-chain "
                        "compromise. Refuse to install."
                    ),
                )
            )
    return findings
def _first_line_containing(text: str, needle: str) -> int | None:
    for i, line in enumerate(text.splitlines(), start = 1):
        if needle in line:
            return i
    return None
# ─────────────────────────────────────────────────────────────────────
# Cargo.lock audit.
# ─────────────────────────────────────────────────────────────────────
# Cargo.lock is TOML; parse with stdlib tomllib (Python 3.11+). The
# studio's Tauri shell already requires a modern toolchain so this is
# always available where CI runs.
_PACKAGE_HEADER = re.compile(r"^\[\[package\]\]\s*$")
def audit_cargo_lockfile(path: Path) -> list[Finding]:
    findings: list[Finding] = []
    if not path.exists():
        return findings
    raw = path.read_text(encoding = "utf-8")
    try:
        import tomllib  # type: ignore[import-not-found]
    except ImportError:
        # Python <3.11; fall back to a tomli shim if importable.
        try:
            import tomli as tomllib  # type: ignore[no-redef]
        except ImportError:
            findings.append(
                Finding(
                    path = str(path),
                    package = "<root>",
                    kind = "missing-toml-parser",
                    detail = (
                        "Python 3.11+ tomllib or tomli is required to "
                        "parse Cargo.lock; install tomli or upgrade "
                        "Python before re-running this audit"
                    ),
                )
            )
            return findings
    try:
        lock = tomllib.loads(raw)
    except Exception as exc:
        findings.append(
            Finding(
                path = str(path),
                package = "<root>",
                kind = "malformed-lockfile",
                detail = f"could not parse as TOML: {exc}",
            )
        )
        return findings

    for entry in lock.get("package", []):
        name = entry.get("name") or "<unnamed>"
        version = entry.get("version") or "<unversioned>"
        source = entry.get("source")
        # Workspace-local crates have no `source` field; skip them.
        if source is None:
            continue
        if source != CARGO_REGISTRY_SOURCE:
            if (name, source) in CARGO_SOURCE_ALLOWLIST:
                # Pre-approved non-registry source pinned by SHA.
                pass
            else:
                findings.append(
                    Finding(
                        path = str(path),
                        package = f"{name}@{version}",
                        kind = "non-registry-cargo-source",
                        detail = (
                            f"source={source!r}; only "
                            f"{CARGO_REGISTRY_SOURCE!r} is permitted "
                            "by default, and no allowlist entry covers "
                            "this crate. If the source is legitimate, "
                            "add `(name, source)` to "
                            "CARGO_SOURCE_ALLOWLIST after reviewing the "
                            "pinned commit."
                        ),
                    )
                )
        if not entry.get("checksum") and source == CARGO_REGISTRY_SOURCE:
            findings.append(
                Finding(
                    path = str(path),
                    package = f"{name}@{version}",
                    kind = "missing-cargo-checksum",
                    detail = (
                        "registry crate without checksum; cargo cannot "
                        "verify the downloaded source against the "
                        "registry-published SHA"
                    ),
                )
            )

    for ioc in CARGO_IOC_STRINGS:
        if ioc in raw:
            line_no = _first_line_containing(raw, ioc)
            findings.append(
                Finding(
                    path = f"{path}:{line_no}" if line_no else str(path),
                    package = "<ioc-match>",
                    kind = "known-ioc-string",
                    detail = f"matched known IOC substring {ioc!r}",
                )
            )
    return findings
# ─────────────────────────────────────────────────────────────────────
# CLI.
# ─────────────────────────────────────────────────────────────────────
DEFAULT_NPM_LOCKFILES = ("studio/frontend/package-lock.json",)
DEFAULT_CARGO_LOCKFILES = ("studio/src-tauri/Cargo.lock",)
def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(
        description = "Pre-install lockfile supply-chain audit.",
    )
    parser.add_argument(
        "--root",
        default = str(REPO_ROOT),
        help = "Repo root (default: parent of this script).",
    )
    parser.add_argument(
        "--npm-lockfile",
        action = "append",
        default = None,
        help = (
            "Path to a package-lock.json (repeatable). "
            "Default: studio/frontend/package-lock.json."
        ),
    )
    parser.add_argument(
        "--cargo-lockfile",
        action = "append",
        default = None,
        help = (
            "Path to a Cargo.lock (repeatable). "
            "Default: studio/src-tauri/Cargo.lock."
        ),
    )
    args = parser.parse_args(argv)

    if os.environ.get("UNSLOTH_LOCKFILE_AUDIT_SKIP") == "1":
        print(
            "[lockfile-audit] UNSLOTH_LOCKFILE_AUDIT_SKIP=1; "
            "audit skipped (expected only for local triage)",
            flush = True,
        )
        return 0

    root = Path(args.root).resolve()
    npm_paths = [root / p for p in (args.npm_lockfile or DEFAULT_NPM_LOCKFILES)]
    cargo_paths = [root / p for p in (args.cargo_lockfile or DEFAULT_CARGO_LOCKFILES)]

    all_findings: list[Finding] = []
    for p in npm_paths:
        print(f"[lockfile-audit] npm: {p}", flush = True)
        all_findings.extend(audit_npm_lockfile(p))
    for p in cargo_paths:
        print(f"[lockfile-audit] cargo: {p}", flush = True)
        all_findings.extend(audit_cargo_lockfile(p))

    if not all_findings:
        print(
            f"[lockfile-audit] OK: 0 findings across "
            f"{len(npm_paths)} npm + {len(cargo_paths)} cargo lockfile(s)",
            flush = True,
        )
        return 0

    print(
        f"\n[lockfile-audit] FAIL: {len(all_findings)} finding(s):\n",
        file = sys.stderr,
    )
    for f in all_findings:
        print(str(f), file = sys.stderr)
        print(file = sys.stderr)
    print(
        "[lockfile-audit] Refusing to proceed. Each finding above is "
        "either a structural lockfile anomaly or a public indicator-of-"
        "compromise. Investigate before running `npm ci` or `cargo fetch`.",
        file = sys.stderr,
    )
    return 1


if __name__ == "__main__":
    sys.exit(main())