unsloth/scripts
Daniel Han fa3840cf6d scripts: harden github_blob_to_raw against substring URL spoofing
CodeQL flagged the check at scripts/notebook_to_python.py:33,
`if "github.com" in url and "/blob/" in url`, as
py/incomplete-url-substring-sanitization: "github.com" can sit
anywhere in the URL, so an attacker-controlled URL like
https://attacker.example.com/github.com/blob/x would be rewritten
to a raw.githubusercontent.com URL and fetched as if it were a
real GitHub blob.
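The flagged pattern can be reproduced in a few lines. This is a minimal sketch: the helper name looks_like_github_blob_substring is hypothetical and only mirrors the quoted condition, not the script's full old code path. It shows the substring test accepting the spoofed URL even though urlparse reports a different host.

```python
from urllib.parse import urlparse


def looks_like_github_blob_substring(url: str) -> bool:
    # Hypothetical reproduction of the flagged condition: both tests are plain
    # substring checks, so "github.com" may appear anywhere (path, query, ...).
    return "github.com" in url and "/blob/" in url


spoofed = "https://attacker.example.com/github.com/blob/x"
print(looks_like_github_blob_substring(spoofed))  # True: the check is fooled
print(urlparse(spoofed).netloc)  # attacker.example.com: the real host
```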

Switch to urllib.parse.urlparse and require parsed.netloc ==
"github.com" exactly, then rebuild the URL with urlunparse on the
parsed components (in the path, only the first /blob/ is replaced
with /). Query strings and fragments now round-trip correctly too,
fixing an incidental bug in the old string-replace path.
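The hardened rewrite described above can be sketched as follows. This is an illustration reconstructed from the commit message, not the script's actual code: returning non-matching URLs unchanged, keeping the original scheme, and the example owner/repo/file paths are all assumptions.

```python
from urllib.parse import urlparse, urlunparse

RAW_HOST = "raw.githubusercontent.com"


def github_blob_to_raw(url: str) -> str:
    """Sketch: rewrite a github.com blob URL to its raw.githubusercontent.com form.

    Assumption: URLs that are not github.com blob URLs are returned unchanged;
    the real script may handle them differently.
    """
    parsed = urlparse(url)
    # Exact host match: "attacker.example.com/github.com/..." has
    # netloc "attacker.example.com" and is rejected here.
    if parsed.netloc != "github.com" or "/blob/" not in parsed.path:
        return url
    # Replace only the first "/blob/", so a later path segment that happens
    # to be named "blob" is left intact.
    new_path = parsed.path.replace("/blob/", "/", 1)
    # Rebuild from components so params, query and fragment are preserved.
    return urlunparse(parsed._replace(netloc=RAW_HOST, path=new_path))


# Illustrative inputs (hypothetical owner/repo/file paths):
print(github_blob_to_raw("https://github.com/o/r/blob/main/a.ipynb?plain=1#L5"))
# https://raw.githubusercontent.com/o/r/main/a.ipynb?plain=1#L5
print(github_blob_to_raw("https://attacker.example.com/github.com/blob/x"))
# https://attacker.example.com/github.com/blob/x  (unchanged)
```

Rebuilding with urlunparse rather than calling str.replace on the whole URL means the host check and the path edit act on separate, parsed fields, so text in the query or fragment can no longer influence either one.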

Closes the high-severity CodeQL alert on PR head 08235625.
2026-05-08 02:43:47 +00:00
data CI(notebooks): cross-repo validator for unslothai/notebooks 2026-05-07 11:42:57 +00:00
enforce_kwargs_spacing.py Formatting & bug fixes (#3563) 2025-11-07 06:00:22 -08:00
install_gemma4_mlx.sh Move gemma4 script (#4994) 2026-04-12 23:41:15 -07:00
install_qwen3_6_mlx.sh Add qwen3.6 script (#5084) 2026-04-17 01:21:30 -07:00
notebook_to_python.py scripts: harden github_blob_to_raw against substring URL spoofing 2026-05-08 02:43:47 +00:00
notebook_validator.py CI(notebooks): cross-repo validator for unslothai/notebooks 2026-05-07 11:42:57 +00:00
run_ruff_format.py Formatting & bug fixes (#3563) 2025-11-07 06:00:22 -08:00
scan_packages.py [pre-commit.ci] auto fixes from pre-commit.com hooks 2026-05-06 23:56:54 +00:00