From ac765d2efbcd7410e95ab8b2feade2aaae95b806 Mon Sep 17 00:00:00 2001
From: Daniel Han
Date: Mon, 11 May 2026 20:36:52 -0700
Subject: [PATCH] studio/ci: pre-install lockfile supply-chain audit (npm + cargo) (#5392)

* studio/ci: pre-install lockfile supply-chain audit (npm + cargo)

The Mini Shai-Hulud wave that hit @tanstack/* on 2026-05-11 19:20-19:26
UTC (GHSA-g7cv-rxg3-hmpx) pushed 84 malicious versions across 42
packages. Each compromised tarball carried an `optionalDependencies`
entry pointing at a GitHub-hosted prepare script that exfiltrated
GitHub / npm / AWS / Vault / SSH credentials on `npm install` /
`npm ci`.

Our current lockfile pins ALL @tanstack/* at pre-malicious versions so
we were not exposed, but the only defense layer between "dependabot
opens a security-update PR during a malicious window" and "a
compromised package's postinstall runs on the CI runner" is the
advisory-DB latency. `npm audit` and OSV-Scanner are reactive: there
is a window between malicious publication and GHSA landing.

Add a pre-install lockfile audit that fires on the injection pattern
itself, BEFORE `npm ci` gets a chance to execute lifecycle scripts:

  scripts/lockfile_supply_chain_audit.py

  npm side (studio/frontend/package-lock.json, lockfileVersion 2/3):
    1. every `resolved` URL must point to registry.npmjs.org; direct
       GitHub / git+ / file: refs are the Shai-Hulud vector
    2. every non-bundled entry must carry an `integrity` SHA
    3. raw-text scan for known IOC strings (router_init.js,
       tanstack_runner.js, router_runtime.js, @tanstack/setup, the
       specific TanStack worm commit hash, getsession.org exfiltration
       host, "A Mini Shai-Hulud has Appeared" marker)
    4. nested `node_modules/.../node_modules/` fold-ins are
       transparent -- they ride on the parent tarball's integrity

  cargo side (studio/src-tauri/Cargo.lock):
    5. every `source` must be the crates.io registry
    6. registry crates must have a `checksum`
    7. one allowlist entry: fix-path-env from
       tauri-apps/fix-path-env-rs at pinned SHA c4c45d5. Any other
       non-registry source -- or a bump of that pinned SHA -- re-fires
       the audit until reviewed + appended

Wire into four workflows:

  .github/workflows/security-audit.yml -- new step inside the
    advisory-audit job, immediately before `npm audit` so the
    structural pass and the advisory-DB pass appear together in the
    GitHub step summary.

  .github/workflows/studio-frontend-ci.yml,
  .github/workflows/wheel-smoke.yml,
  .github/workflows/studio-tauri-smoke.yml -- new step immediately
    BEFORE `npm ci`. If a future malicious bump lands in our lockfile,
    the audit refuses and `npm ci` never runs, so no `prepare` /
    `postinstall` from a compromised tarball can execute on the
    runner.

Note on --ignore-scripts: every npm ci in our CI is followed directly
by `npm run build` or `tauri build`, both of which depend on package
install scripts (esbuild's native-binary postinstall, etc.). Blanket
--ignore-scripts breaks the build, so the pre-install structural audit
is the practical mitigation. The audit reads lockfiles only; it never
executes anything from them.

Verified:
  - Clean state: 0 findings on the current tree (npm + cargo).
  - Fault injection: synthetic `@tanstack/setup` IOC + non-registry
    `resolved` URL both fire with exit code 1.
  - YAML parses cleanly for all four modified workflows.
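
A minimal sketch of the fault-injection check from the Verified list,
assuming a simplified stand-in for the real scanner. The helper names
`structural_findings` and `exit_code`, the `left-pad` entry, the
placeholder integrity value and the example git URL are illustrative
and are not taken from the tree:

```python
from __future__ import annotations

import json

# Assumptions for illustration: one allowed prefix and one IOC string.
REGISTRY_PREFIX = "https://registry.npmjs.org/"
IOC_STRINGS = ("@tanstack/setup",)


def structural_findings(raw: str) -> list[str]:
    """Simplified stand-in for the npm pass: origin check plus IOC scan."""
    findings: list[str] = []
    lock = json.loads(raw)
    for key, entry in (lock.get("packages") or {}).items():
        if key == "":
            continue  # project root has no tarball
        resolved = entry.get("resolved")
        if resolved is not None and not resolved.startswith(REGISTRY_PREFIX):
            findings.append(f"non-registry-resolved-url:{key}")
    # Raw-text scan, so an IOC in a key, URL, or script name matches.
    for ioc in IOC_STRINGS:
        if ioc in raw:
            findings.append(f"known-ioc-string:{ioc}")
    return findings


def exit_code(raw: str) -> int:
    """Mirror the scanner's contract: 1 on any finding, else 0."""
    return 1 if structural_findings(raw) else 0


# Hypothetical clean lockfile: one registry-resolved package.
clean = json.dumps({
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "demo"},
        "node_modules/left-pad": {
            "version": "1.3.0",
            "resolved": "https://registry.npmjs.org/left-pad/-/left-pad-1.3.0.tgz",
            "integrity": "sha512-placeholder",
        },
    },
})

# Hypothetical poisoned lockfile: IOC package name plus a git URL.
poisoned = json.dumps({
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "demo"},
        "node_modules/@tanstack/setup": {
            "version": "0.0.0",
            "resolved": "git+https://github.com/example/example.git#0000000",
        },
    },
})

print(exit_code(clean), exit_code(poisoned))
```

The poisoned input trips both the origin check and the raw-text IOC
scan, which is the property the fault-injection bullet describes.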

Refs:
  - https://tanstack.com/blog/npm-supply-chain-compromise-postmortem
  - https://github.com/TanStack/router/issues/7383
  - https://github.com/TanStack/router/security/advisories/GHSA-g7cv-rxg3-hmpx
  - https://www.aikido.dev/blog/mini-shai-hulud-is-back-tanstack-compromised
  - https://www.stepsecurity.io/blog/mini-shai-hulud-is-back-a-self-spreading-supply-chain-attack-hits-the-npm-ecosystem

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
---
 .github/workflows/security-audit.yml     |  21 +
 .github/workflows/studio-frontend-ci.yml |   8 +
 .github/workflows/studio-tauri-smoke.yml |   3 +
 .github/workflows/wheel-smoke.yml        |   3 +
 scripts/lockfile_supply_chain_audit.py   | 486 +++++++++++++++++++++++
 5 files changed, 521 insertions(+)
 create mode 100755 scripts/lockfile_supply_chain_audit.py

diff --git a/.github/workflows/security-audit.yml b/.github/workflows/security-audit.yml
index 0fc8073e7..df0af95df 100644
--- a/.github/workflows/security-audit.yml
+++ b/.github/workflows/security-audit.yml
@@ -244,6 +244,27 @@ jobs:
             echo '```'
           } >> "$GITHUB_STEP_SUMMARY"
 
+      # ─────────────────────────────────────────────────────────────
+      # Pre-install lockfile supply-chain audit (npm + cargo).
+      # Catches structural anomalies (non-registry resolved URLs,
+      # missing integrity hashes, known IOC strings) BEFORE `npm
+      # audit` or OSV-Scanner consult the advisory DB. The advisory
+      # path is reactive -- there is a window between a malicious
+      # publication and the GHSA landing. This step fires on the
+      # injection pattern itself so it catches the same class of
+      # attack the moment the lockfile shape becomes wrong.
+      # ─────────────────────────────────────────────────────────────
+      - name: Lockfile supply-chain audit (pre-install scan)
+        run: |
+          python3 scripts/lockfile_supply_chain_audit.py
+          {
+            echo "## Lockfile supply-chain audit"
+            echo
+            echo "Scanned: studio/frontend/package-lock.json + studio/src-tauri/Cargo.lock"
+            echo
+            echo "No structural anomalies or known IOC strings."
+          } >> "$GITHUB_STEP_SUMMARY"
+
       # ─────────────────────────────────────────────────────────────
       # npm: Studio frontend
       # ─────────────────────────────────────────────────────────────
diff --git a/.github/workflows/studio-frontend-ci.yml b/.github/workflows/studio-frontend-ci.yml
index eb00e297a..bde62c87f 100644
--- a/.github/workflows/studio-frontend-ci.yml
+++ b/.github/workflows/studio-frontend-ci.yml
@@ -58,6 +58,14 @@ jobs:
           cache: 'npm'
           cache-dependency-path: studio/frontend/package-lock.json
 
+      # Run the structural lockfile scan BEFORE npm ci. A compromised
+      # tarball runs its `prepare` / `postinstall` during `npm ci`,
+      # so any catch has to fire upstream of that. The scanner is
+      # pure-Python read-only; safe to call ahead of every install.
+      - name: Lockfile supply-chain audit (pre-install scan)
+        working-directory: ${{ github.workspace }}
+        run: python3 scripts/lockfile_supply_chain_audit.py
+
       - name: Lockfile must agree with package.json (npm ci is strict)
         run: npm ci --no-fund --no-audit
 
diff --git a/.github/workflows/studio-tauri-smoke.yml b/.github/workflows/studio-tauri-smoke.yml
index d517a5f45..23b57d7e0 100644
--- a/.github/workflows/studio-tauri-smoke.yml
+++ b/.github/workflows/studio-tauri-smoke.yml
@@ -69,6 +69,9 @@ jobs:
           echo "$out"
           [ "$out" = "tauri-cli 2.10.1" ] || { echo "::error::expected tauri-cli 2.10.1, got $out"; exit 1; }
 
+      - name: Lockfile supply-chain audit (pre-install scan)
+        run: python3 scripts/lockfile_supply_chain_audit.py
+
       - name: Frontend build (npm ci, vite)
         working-directory: studio/frontend
         run: |
diff --git a/.github/workflows/wheel-smoke.yml b/.github/workflows/wheel-smoke.yml
index 983070ae1..dad867039 100644
--- a/.github/workflows/wheel-smoke.yml
+++ b/.github/workflows/wheel-smoke.yml
@@ -53,6 +53,9 @@ jobs:
         with:
           python-version: '3.12'
 
+      - name: Lockfile supply-chain audit (pre-install scan)
+        run: python3 scripts/lockfile_supply_chain_audit.py
+
       - name: Build frontend
         run: |
           cd studio/frontend
diff --git a/scripts/lockfile_supply_chain_audit.py b/scripts/lockfile_supply_chain_audit.py
new file mode 100755
index 000000000..e52183214
--- /dev/null
+++ b/scripts/lockfile_supply_chain_audit.py
@@ -0,0 +1,486 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: AGPL-3.0-only
+# Copyright 2026-present the Unsloth AI Inc. team. All rights reserved.
+
+"""Lockfile supply-chain audit for the Studio frontend and Tauri shell.
+
+Runs BEFORE `npm ci` / `cargo fetch` in CI. Refuses to proceed when a
+lockfile contains patterns that indicate the kind of supply-chain
+injection seen in the npm Shai-Hulud waves and the cargo
+crates.io brand-squat attempts.
+
+What it checks
+==============
+
+studio/frontend/package-lock.json (lockfileVersion 2 or 3):
+
+  1. `resolved` URL origin. Every entry must resolve through
+     `https://registry.npmjs.org/`. Direct GitHub-hosted dependencies
+     (`git+ssh://`, `git+https://`, `github:owner/repo#sha`,
+     `file:`, `http://`) are refused -- npm's TanStack incident used
+     exactly this vector to land an unaudited GitHub commit hash as
+     an optional dependency.
+
+  2. `integrity` field presence. Every non-workspace entry must carry
+     an `integrity` SHA. A missing integrity means the registry can
+     swap the tarball after lockfile generation and CI will not
+     notice.
+
+  3. Known IOC strings. A hardcoded set of indicator-of-compromise
+     substrings is grepped across the entire lockfile body (file
+     names, dependency keys, URLs). The list is updated as new
+     campaigns surface. Catching one means the local install was
+     about to pull a publicly-known malicious release.
+
+studio/src-tauri/Cargo.lock:
+
+  4. `source` field origin. Every entry with a `source` must point at
+     `registry+https://github.com/rust-lang/crates.io-index`. Direct
+     git sources (`git+https://...`) and `path+...` for cross-crate
+     paths warrant manual review and are flagged.
+
+  5. Known cargo IOC strings. Same idea as (3), separate list.
+
+Exit codes
+==========
+
+  0  no findings, or an opt-out env var (UNSLOTH_LOCKFILE_AUDIT_SKIP=1)
+     is set
+  1  one or more findings (unparseable lockfile included); stderr lists
+     them with file path and line number where derivable
+  2  usage error: argparse rejected the command-line arguments
+
+Operational stance
+==================
+
+This scanner only PARSES the lockfiles -- it never executes anything
+in them, never resolves anything against the network. Safe to run
+ahead of every `npm ci`. The IOC list is short by design; this
+complements (not replaces) `npm audit`, OSV-Scanner, and the
+advisory-DB pipeline in `.github/workflows/security-audit.yml`. The
+shape of the catch is "we refuse to proceed because the lockfile
+itself is shaped wrong", which fires before any third-party install
+script gets a chance to run on the runner.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import re
+import sys
+from pathlib import Path
+
+REPO_ROOT = Path(__file__).resolve().parents[1]
+
+
+# ─────────────────────────────────────────────────────────────────────
+# Known IOC strings (case-sensitive substring match).
+# ─────────────────────────────────────────────────────────────────────
+#
+# Keep these short and FACTUAL. Each entry is tied to a public advisory
+# and is the literal string an attacker would have to embed for the
+# attack to work. Adding speculative or generic patterns here would
+# generate false positives on dependency upgrades.
+NPM_IOC_STRINGS: tuple[str, ...] = (
+    # Shai-Hulud TanStack wave -- May 11, 2026 (GHSA-g7cv-rxg3-hmpx).
+    "router_init.js",
+    "tanstack_runner.js",
+    "router_runtime.js",
+    "@tanstack/setup",
+    "github:tanstack/router#79ac49eedf774dd4b0cfa308722bc463cfe5885c",
+    # Exfiltration endpoints observed across both Shai-Hulud waves.
+    "filev2.getsession.org",
+    "getsession.org/file/",
+    # Campaign markers; the worm tarballs print this to stdout on run.
+    "A Mini Shai-Hulud has Appeared",
+)
+
+CARGO_IOC_STRINGS: tuple[str, ...] = (
+    # Reserved for future cargo-side incidents. Empty by default --
+    # `source` origin check below catches the structural pattern.
+)
+
+
+# ─────────────────────────────────────────────────────────────────────
+# Allowed lockfile origins.
+# ─────────────────────────────────────────────────────────────────────
+NPM_REGISTRY_PREFIX = "https://registry.npmjs.org/"
+
+# Kept as a tuple so a reviewed mirror prefix can be appended later;
+# today only the public npm registry is allowed.
+NPM_REGISTRY_PREFIXES_ALLOWED: tuple[str, ...] = (NPM_REGISTRY_PREFIX,)
+
+CARGO_REGISTRY_SOURCE = "registry+https://github.com/rust-lang/crates.io-index"
+
+
+# ─────────────────────────────────────────────────────────────────────
+# Cargo non-registry source allowlist.
+# ─────────────────────────────────────────────────────────────────────
+#
+# Each entry is `(crate_name, exact_source_string)`. The crate must
+# match by name AND the source must match the full pinned-SHA string
+# verbatim. Bumping the commit SHA forces a re-review here: the
+# scanner fires until the new SHA is appended.
+#
+# Studio's Tauri shell pulls `fix-path-env` directly from
+# tauri-apps/fix-path-env-rs because the crate is not published to
+# crates.io. The pinned commit (c4c45d5) was reviewed at the time it
+# landed; future bumps need explicit approval.
+CARGO_SOURCE_ALLOWLIST: tuple[tuple[str, str], ...] = (
+    (
+        "fix-path-env",
+        "git+https://github.com/tauri-apps/fix-path-env-rs#"
+        "c4c45d503ea115a839aae718d02f79e7c7f0f673",
+    ),
+)
+
+
+# ─────────────────────────────────────────────────────────────────────
+# Finding container.
+# ─────────────────────────────────────────────────────────────────────
+
+
+class Finding:
+    __slots__ = ("path", "package", "kind", "detail")
+
+    def __init__(self, path: str, package: str, kind: str, detail: str) -> None:
+        self.path = path
+        self.package = package
+        self.kind = kind
+        self.detail = detail
+
+    def __str__(self) -> str:
+        return (
+            f" [{self.kind}] {self.path}\n"
+            f" package: {self.package}\n"
+            f" detail: {self.detail}"
+        )
+
+
+# ─────────────────────────────────────────────────────────────────────
+# package-lock.json audit.
+# ─────────────────────────────────────────────────────────────────────
+
+
+def audit_npm_lockfile(path: Path) -> list[Finding]:
+    findings: list[Finding] = []
+    if not path.exists():
+        return findings
+
+    raw = path.read_text(encoding = "utf-8")
+    try:
+        lock = json.loads(raw)
+    except json.JSONDecodeError as exc:
+        findings.append(
+            Finding(
+                path = str(path),
+                package = "",
+                kind = "malformed-lockfile",
+                detail = f"could not parse as JSON: {exc}",
+            )
+        )
+        return findings
+
+    lockfile_version = lock.get("lockfileVersion")
+    if lockfile_version not in (2, 3):
+        findings.append(
+            Finding(
+                path = str(path),
+                package = "",
+                kind = "unsupported-lockfile-version",
+                detail = (f"only lockfileVersion 2 or 3 audited; got {lockfile_version}"),
+            )
+        )
+
+    packages = lock.get("packages") or {}
+    for key, entry in packages.items():
+        # The empty key "" is the project root; workspace entries use
+        # keys like "node_modules/foo" or "studio/frontend/sub-pkg".
+        # Skip the project root (it has no `resolved`).
+        if key == "":
+            continue
+        if entry.get("link"):
+            # Workspace symlink; no tarball to resolve.
+            continue
+
+        resolved = entry.get("resolved")
+        # Entries living inside another package's `node_modules/`
+        # tree are bundled fold-ins -- the parent's tarball ships
+        # their source verbatim and the parent's `integrity` covers
+        # the whole subtree. npm represents them in lockfileVersion 3
+        # as nested entries with no `resolved` and no `integrity` of
+        # their own. Treat them as transparent to this audit.
+        nested = key.count("/node_modules/") >= 1
+
+        # 1. resolved-URL origin.
+        if resolved is None:
+            if nested or entry.get("inBundle") or entry.get("bundled"):
+                # Bundled / fold-in entry; covered by parent integrity.
+                pass
+            elif entry.get("version"):
+                # Top-level entry without a resolved URL is suspicious.
+                findings.append(
+                    Finding(
+                        path = str(path),
+                        package = key,
+                        kind = "missing-resolved-url",
+                        detail = (
+                            f"version={entry['version']!r} but no `resolved` "
+                            "field; lockfile is incomplete"
+                        ),
+                    )
+                )
+        else:
+            if not any(resolved.startswith(p) for p in NPM_REGISTRY_PREFIXES_ALLOWED):
+                findings.append(
+                    Finding(
+                        path = str(path),
+                        package = key,
+                        kind = "non-registry-resolved-url",
+                        detail = (
+                            f"resolved={resolved!r}; only "
+                            f"{NPM_REGISTRY_PREFIX} is permitted. Direct "
+                            "GitHub / git / file references are the "
+                            "Shai-Hulud injection vector."
+                        ),
+                    )
+                )
+
+        # 2. integrity-hash presence.
+        if resolved is not None and not entry.get("integrity"):
+            findings.append(
+                Finding(
+                    path = str(path),
+                    package = key,
+                    kind = "missing-integrity-hash",
+                    detail = (
+                        "no `integrity` field; npm cannot verify the "
+                        "tarball SHA against the registry-published hash"
+                    ),
+                )
+            )
+
+    # 3. Known IOC strings: scan the raw file body so we hit fields the
+    # structural pass above doesn't enumerate (scripts, optional
+    # dependencies, etc.). Cheap and complete.
+    for ioc in NPM_IOC_STRINGS:
+        if ioc in raw:
+            # Best-effort line number lookup.
+            line_no = _first_line_containing(raw, ioc)
+            findings.append(
+                Finding(
+                    path = f"{path}:{line_no}" if line_no else str(path),
+                    package = "",
+                    kind = "known-ioc-string",
+                    detail = (
+                        f"matched known IOC substring {ioc!r}; this is "
+                        "a public indicator of a recent supply-chain "
+                        "compromise. Refuse to install."
+                    ),
+                )
+            )
+
+    return findings
+
+
+def _first_line_containing(text: str, needle: str) -> int | None:
+    for i, line in enumerate(text.splitlines(), start = 1):
+        if needle in line:
+            return i
+    return None
+
+
+# ─────────────────────────────────────────────────────────────────────
+# Cargo.lock audit.
+# ─────────────────────────────────────────────────────────────────────
+
+
+# Cargo.lock is TOML; parse with stdlib tomllib (Python 3.11+). The
+# studio's Tauri shell already requires a modern toolchain so this is
+# always available where CI runs.
+_PACKAGE_HEADER = re.compile(r"^\[\[package\]\]\s*$")
+
+
+def audit_cargo_lockfile(path: Path) -> list[Finding]:
+    findings: list[Finding] = []
+    if not path.exists():
+        return findings
+
+    raw = path.read_text(encoding = "utf-8")
+    try:
+        import tomllib  # type: ignore[import-not-found]
+    except ImportError:
+        # Python <3.11; fall back to a tomli shim if importable.
+        try:
+            import tomli as tomllib  # type: ignore[no-redef]
+        except ImportError:
+            findings.append(
+                Finding(
+                    path = str(path),
+                    package = "",
+                    kind = "missing-toml-parser",
+                    detail = (
+                        "Python 3.11+ tomllib or tomli is required to "
+                        "parse Cargo.lock; install tomli or upgrade "
+                        "Python before re-running this audit"
+                    ),
+                )
+            )
+            return findings
+
+    try:
+        lock = tomllib.loads(raw)
+    except Exception as exc:
+        findings.append(
+            Finding(
+                path = str(path),
+                package = "",
+                kind = "malformed-lockfile",
+                detail = f"could not parse as TOML: {exc}",
+            )
+        )
+        return findings
+
+    for entry in lock.get("package", []):
+        name = entry.get("name") or ""
+        version = entry.get("version") or ""
+        source = entry.get("source")
+        # Workspace-local crates have no `source` field; skip them.
+        if source is None:
+            continue
+        if source != CARGO_REGISTRY_SOURCE:
+            if (name, source) in CARGO_SOURCE_ALLOWLIST:
+                # Pre-approved non-registry source pinned by SHA.
+                pass
+            else:
+                findings.append(
+                    Finding(
+                        path = str(path),
+                        package = f"{name}@{version}",
+                        kind = "non-registry-cargo-source",
+                        detail = (
+                            f"source={source!r}; only "
+                            f"{CARGO_REGISTRY_SOURCE!r} is permitted "
+                            "by default, and no allowlist entry covers "
+                            "this crate. If the source is legitimate, "
+                            "add `(name, source)` to "
+                            "CARGO_SOURCE_ALLOWLIST after reviewing the "
+                            "pinned commit."
+                        ),
+                    )
+                )
+        if not entry.get("checksum") and source == CARGO_REGISTRY_SOURCE:
+            findings.append(
+                Finding(
+                    path = str(path),
+                    package = f"{name}@{version}",
+                    kind = "missing-cargo-checksum",
+                    detail = (
+                        "registry crate without checksum; cargo cannot "
+                        "verify the downloaded source against the "
+                        "registry-published SHA"
+                    ),
+                )
+            )
+
+    for ioc in CARGO_IOC_STRINGS:
+        if ioc in raw:
+            line_no = _first_line_containing(raw, ioc)
+            findings.append(
+                Finding(
+                    path = f"{path}:{line_no}" if line_no else str(path),
+                    package = "",
+                    kind = "known-ioc-string",
+                    detail = f"matched known IOC substring {ioc!r}",
+                )
+            )
+
+    return findings
+
+
+# ─────────────────────────────────────────────────────────────────────
+# CLI.
+# ─────────────────────────────────────────────────────────────────────
+
+
+DEFAULT_NPM_LOCKFILES = ("studio/frontend/package-lock.json",)
+DEFAULT_CARGO_LOCKFILES = ("studio/src-tauri/Cargo.lock",)
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = argparse.ArgumentParser(
+        description = "Pre-install lockfile supply-chain audit.",
+    )
+    parser.add_argument(
+        "--root",
+        default = str(REPO_ROOT),
+        help = "Repo root (default: parent of this script).",
+    )
+    parser.add_argument(
+        "--npm-lockfile",
+        action = "append",
+        default = None,
+        help = (
+            "Path to a package-lock.json (repeatable). "
+            "Default: studio/frontend/package-lock.json."
+        ),
+    )
+    parser.add_argument(
+        "--cargo-lockfile",
+        action = "append",
+        default = None,
+        help = (
+            "Path to a Cargo.lock (repeatable). "
+            "Default: studio/src-tauri/Cargo.lock."
+        ),
+    )
+    args = parser.parse_args(argv)
+
+    if os.environ.get("UNSLOTH_LOCKFILE_AUDIT_SKIP") == "1":
+        print(
+            "[lockfile-audit] UNSLOTH_LOCKFILE_AUDIT_SKIP=1; "
+            "audit skipped (expected only for local triage)",
+            flush = True,
+        )
+        return 0
+
+    root = Path(args.root).resolve()
+    npm_paths = [root / p for p in (args.npm_lockfile or DEFAULT_NPM_LOCKFILES)]
+    cargo_paths = [root / p for p in (args.cargo_lockfile or DEFAULT_CARGO_LOCKFILES)]
+
+    all_findings: list[Finding] = []
+    for p in npm_paths:
+        print(f"[lockfile-audit] npm: {p}", flush = True)
+        all_findings.extend(audit_npm_lockfile(p))
+    for p in cargo_paths:
+        print(f"[lockfile-audit] cargo: {p}", flush = True)
+        all_findings.extend(audit_cargo_lockfile(p))
+
+    if not all_findings:
+        print(
+            f"[lockfile-audit] OK: 0 findings across "
+            f"{len(npm_paths)} npm + {len(cargo_paths)} cargo lockfile(s)",
+            flush = True,
+        )
+        return 0
+
+    print(
+        f"\n[lockfile-audit] FAIL: {len(all_findings)} finding(s):\n",
+        file = sys.stderr,
+    )
+    for f in all_findings:
+        print(str(f), file = sys.stderr)
+        print(file = sys.stderr)
+    print(
+        "[lockfile-audit] Refusing to proceed. Each finding above is "
+        "either a structural lockfile anomaly or a public indicator-of-"
+        "compromise. Investigate before running `npm ci` or `cargo fetch`.",
+        file = sys.stderr,
+    )
+    return 1
+
+
+if __name__ == "__main__":
+    sys.exit(main())
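
The exact-match allowlist semantics used by `audit_cargo_lockfile` can
be sketched in isolation. This is an illustrative reduction, not the
script itself: `flagged_sources`, the sample entries, their version
numbers and the all-zero bumped SHA are assumptions made up for the
example; only the registry source string and the pinned
`fix-path-env` source come from the patch.

```python
from __future__ import annotations

REGISTRY = "registry+https://github.com/rust-lang/crates.io-index"
REPO = "git+https://github.com/tauri-apps/fix-path-env-rs#"
PINNED = REPO + "c4c45d503ea115a839aae718d02f79e7c7f0f673"
BUMPED = REPO + "0" * 40  # hypothetical unreviewed commit

# (crate_name, exact_source_string) pairs, matched verbatim.
ALLOWLIST = (("fix-path-env", PINNED),)


def flagged_sources(entries: list[dict]) -> list[str]:
    """Return name@version for every non-registry, non-allowlisted crate."""
    flagged: list[str] = []
    for entry in entries:
        source = entry.get("source")
        if source is None or source == REGISTRY:
            continue  # workspace-local crate or plain registry crate
        if (entry.get("name"), source) not in ALLOWLIST:
            flagged.append(f"{entry.get('name')}@{entry.get('version')}")
    return flagged


# Hypothetical entries shaped like parsed [[package]] tables.
entries = [
    {"name": "serde", "version": "1.0.0", "source": REGISTRY},
    {"name": "fix-path-env", "version": "0.0.1", "source": PINNED},
    {"name": "fix-path-env", "version": "0.0.2", "source": BUMPED},
    {"name": "studio", "version": "0.1.0"},
]

print(flagged_sources(entries))
```

Because the tuple is compared verbatim, a bump of the pinned SHA is
flagged even though the crate name is unchanged, which is what forces
the re-review described in the allowlist comment.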