Related to systematic release workflow failures. The workflow has never
successfully completed from start to finish since validation was added.
Root causes identified and fixed:
1. **GraphQL node_id vs numeric release ID**: The create-release job was
using `gh release view --json id` which returns a GraphQL node_id
(RE_kwDON5nJtM4PmlTt) instead of the numeric database ID (261772525)
needed by the REST API. The validation workflow then failed with 404
when trying to download assets. Fixed by using `gh api` to get the
numeric ID from the releases list endpoint.
2. **Missing binaries in Docker image**: The validation script expects 26
binaries + 3 Windows symlinks in /opt/pulse/bin/, but the Dockerfile
was only copying a subset. Missing binaries included the main pulse
server binary, armv6/386 builds for all agents, and caused immediate
validation failure. Fixed by copying all built binaries from
backend-builder stage.
3. **Assets-only validation fallback broken**: When Docker image pull
times out, the workflow falls back to assets-only validation but was
still calling the validation script without --skip-docker flag,
causing it to fail on the first docker command. Fixed by passing
--skip-docker flag in the fallback path.
4. **Asset download pagination**: The asset download was not using
--paginate, which would cause silent failures once we exceed 30 assets
(currently at 27). Fixed by adding --paginate to gh api call.
All fixes verified locally and address the complete failure chain.
The gh release download command doesn't work with draft releases.
Switch to using curl with GitHub API and authentication token to download assets.
This allows validation to work properly with draft releases.
Related to #695
Added exponential backoff retry logic to handle Docker Hub CDN
propagation delays (2-5 minutes after push).
Validation workflow now:
- Retries Docker image pull up to 10 times
- Uses exponential backoff: 30s, 60s, 120s, 120s...
- Total timeout: ~10 minutes max
- Continues with asset-only validation if image unavailable
This keeps validation enabled (important for quality) while
fixing the race condition that caused consistent failures.
Related to #695
The Python heredoc was not indented, causing YAML parsers to interpret
the Python code as YAML syntax. This caused workflow_dispatch runs to
fail instantly with 'workflow file issue' error before any jobs could start.
The fix indents the heredoc content and changes delimiter from 'PY' to
'EOF' to match standard conventions.
Create GitHub Actions workflow that validates release assets AFTER they're uploaded
to catch issues even if someone manually uploads or modifies assets.
Features:
- Triggers on release created/edited (draft only)
- Downloads all assets from GitHub release
- Re-runs scripts/validate-release.sh on downloaded assets
- On validation failure:
* Deletes all assets from the release
* Sets commit status to failed
* Updates release description with error details
- On validation success:
* Sets commit status to success
* Updates release description with validation summary
This acts as a safety gate to prevent publishing releases with:
- Missing required files
- Checksum mismatches
- Incorrect version strings in binaries
- Corrupted or incomplete uploads