mirror of
https://github.com/unslothai/unsloth.git
synced 2026-05-17 03:56:07 +00:00
scripts/verify_comment_only_diff.py takes a list of changed files, compares
each one between two git refs, and reports whether its diff is limited to
comments or docstrings.
* .py: parse both revs into AST, strip module / class / function
docstrings, then compare ast.unparse output. Pure Python comments
are discarded by ast.parse by construction, so any post-strip diff
is real code.
* .yml / .yaml: yaml.safe_load both sides and compare the parsed
Python object; if scalar values differ, also strip shell comments
inside any multi-line scalar (i.e. `run: |` script bodies) before
comparing.
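The two checks above can be sketched roughly as follows. This is a minimal illustration and not the script's actual code: the helper names (`normalized_source`, `strip_shell_comments`, `normalize_yaml_value`) are hypothetical, only whole-line `#` comments are stripped from shell bodies, and the YAML side assumes the object has already been produced by `yaml.safe_load`.

```python
import ast

# Node types that can carry a docstring as their first statement.
_DOCSTRING_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def normalized_source(source: str) -> str:
    """Return ast.unparse output with module/class/function docstrings removed.

    Hypothetical helper. Comments never reach the AST, so they are gone already.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, _DOCSTRING_OWNERS):
            continue
        body = node.body
        if (
            body
            and isinstance(body[0], ast.Expr)
            and isinstance(body[0].value, ast.Constant)
            and isinstance(body[0].value.value, str)
        ):
            rest = body[1:]
            # A class or function body must not be empty; a module body may be.
            if not rest and not isinstance(node, ast.Module):
                rest = [ast.Pass()]
            node.body = rest
    return ast.unparse(tree)


def strip_shell_comments(script: str) -> str:
    """Drop whole-line '#' comments (assumption: trailing comments are kept)."""
    kept = [line.rstrip() for line in script.splitlines()
            if not line.strip().startswith("#")]
    return "\n".join(kept)


def normalize_yaml_value(value):
    """Walk a parsed YAML object and clean every multi-line string scalar."""
    if isinstance(value, dict):
        return {key: normalize_yaml_value(item) for key, item in value.items()}
    if isinstance(value, list):
        return [normalize_yaml_value(item) for item in value]
    if isinstance(value, str) and "\n" in value:
        return strip_shell_comments(value)
    return value
```

Under these assumptions, a file counts as comment-only when both revisions produce the same normalized value: `normalized_source(old) == normalized_source(new)` for .py files, and `normalize_yaml_value(old_obj) == normalize_yaml_value(new_obj)` for parsed YAML.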
Exit code is 0 if every file is comment-only, 1 otherwise. The script
also prints a tight diff snippet for any FAIL line so a reviewer can
spot the real code change at a glance.
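One way the reporting step could look, as a hedged sketch: the `PASS` / `FAIL` line format, the function names, and the snippet length limit are assumptions made for illustration. The only behaviour taken from the description above is the exit code (0 when every file is comment-only, 1 otherwise) and a short diff for failing files.

```python
import difflib


def diff_snippet(old: str, new: str, path: str, context: int = 1, max_lines: int = 20) -> str:
    """Build a short unified diff of the normalized text (hypothetical helper)."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
        n=context,      # lines of context around each change
        lineterm="",    # inputs have no trailing newlines, so add none
    )
    return "\n".join(list(lines)[:max_lines])


def report(results: dict) -> int:
    """Print one line per file and return the process exit code.

    `results` maps a path to (normalized_old, normalized_new).
    """
    exit_code = 0
    for path, (old, new) in results.items():
        if old == new:
            print(f"PASS {path}")
        else:
            exit_code = 1
            print(f"FAIL {path}")
            print(diff_snippet(old, new, path))
    return exit_code
```

A caller would typically end with `sys.exit(report(results))` so that CI can gate on the result.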
This is what I used to gate the trim PRs #5418 (this repo) and #640
(unsloth-zoo). Shipping it under scripts/ so any contributor can
deterministically prove a comment / docstring refactor is truly
comment-only, without manually eyeballing every line of a 4000-line
diff.
Usage:
python scripts/verify_comment_only_diff.py [--base REF] [--head REF] path ...
Defaults: --base origin/main, --head HEAD. Paths are repo-relative.
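For orientation, the command line above could be modelled with argparse as shown below. This is a sketch under assumptions: `build_parser` and `read_at_ref` are hypothetical names, requiring at least one path is a guess based on the usage string, and reading file contents through `git show REF:path` is one plausible approach rather than a statement of what the script does.

```python
import argparse
import subprocess


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented usage: [--base REF] [--head REF] path ..."""
    parser = argparse.ArgumentParser(
        prog="verify_comment_only_diff.py",
        description="Check that each file's diff is comments/docstrings only.",
    )
    parser.add_argument("--base", default="origin/main", metavar="REF")
    parser.add_argument("--head", default="HEAD", metavar="REF")
    # Assumption: at least one repo-relative path must be given.
    parser.add_argument("paths", nargs="+", metavar="path")
    return parser


def read_at_ref(ref: str, path: str) -> str:
    """Return the contents of a repo-relative path at a git ref.

    Assumption: run from inside the repository; raises if the path
    does not exist at that ref.
    """
    result = subprocess.run(
        ["git", "show", f"{ref}:{path}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

With this parser, `build_parser().parse_args(["unsloth/save.py"])` would fall back to `origin/main` and `HEAD`; the path in that call is only an example.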
Smoke test against the squash-merged PR #5418 (a real 3-file pure trim):
git diff --name-only 6994d07f~1..6994d07f \
| xargs python scripts/verify_comment_only_diff.py --base 6994d07f~1 --head
| Name |
|---|
| data |
| check_new_install_scripts.py |
| enforce_kwargs_spacing.py |
| install_gemma4_mlx.sh |
| install_qwen3_6_mlx.sh |
| lint_workflow_triggers.py |
| lockfile_supply_chain_audit.py |
| notebook_to_python.py |
| notebook_validator.py |
| run_ruff_format.py |
| scan_npm_packages.py |
| scan_packages.py |
| stamp_studio_release.py |
| verify_comment_only_diff.py |