
4826 commits

Author SHA1 Message Date
rcourtman
dc8eaa3ffe Add production-grade Helm chart improvements
High-impact improvements based on Codex recommendations:

1. values.schema.json - JSON schema validation catches config errors at install time
2. helm-docs automation - Auto-generates documentation from values.yaml comments
3. kind smoke tests - Deploys and upgrades chart in real cluster to catch runtime issues
4. ServiceMonitor template - Built-in Prometheus integration for observability
5. Artifact Hub metadata - Changelog, links, and maintainer info for better discoverability

These improvements provide:
- Configuration validation before deployment
- Always up-to-date documentation
- Runtime validation in CI
- First-class monitoring support
- Better user experience on Artifact Hub

Related to #686
2025-11-11 19:52:58 +00:00
rcourtman
3477aa3dae Update Kubernetes docs with GitHub Pages Helm repository
- Replace GHCR OCI instructions with GitHub Pages repository
- Add comprehensive upgrade instructions with examples
- Add rollback procedures
- Add detailed uninstall instructions
- Simplify installation (no authentication required)
2025-11-11 19:40:51 +00:00
rcourtman
b042365652 Add automated version syncing and validation to Helm workflow
- Auto-update Chart.yaml version from release tag or manual input
- Add strict helm lint validation before publishing
- Validate chart templates with multiple configuration scenarios
- Ensures chart quality before publishing to GitHub Pages
2025-11-11 19:40:04 +00:00
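The version sync described in b042365652 amounts to rewriting the top-level `version:` field of Chart.yaml from a release tag. A minimal sketch, assuming tags look like `v4.28.0` and the chart version is the tag without the leading `v`; the function name `sync_chart_version` and the sample chart contents are hypothetical, and the actual workflow may use different tooling.

```python
import re


def sync_chart_version(chart_yaml: str, tag: str) -> str:
    """Return chart_yaml with its top-level `version:` field set from a tag."""
    version = tag.removeprefix("v")  # assumption: tags look like "v4.28.0"
    if not re.fullmatch(r"\d+\.\d+\.\d+(-[0-9A-Za-z.]+)?", version):
        raise ValueError(f"not a semantic version: {tag!r}")
    # Only an unindented `version:` key is replaced, so `appVersion:` and
    # nested keys are left untouched.
    updated, count = re.subn(r"(?m)^version:.*$", f"version: {version}", chart_yaml)
    if count != 1:
        raise ValueError("expected exactly one top-level version field")
    return updated


# Hypothetical chart contents, for illustration only.
chart = "apiVersion: v2\nname: pulse\nversion: 4.27.0\nappVersion: 4.27.0\n"
print(sync_chart_version(chart, "v4.28.0"))
```

Failing loudly on a malformed tag or a missing field keeps a bad version from reaching the publish step.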
rcourtman
3fffd165ea Add Artifact Hub repository metadata for chart discoverability
Enables automatic listing on https://artifacthub.io for improved
Helm chart discovery and provides metadata like screenshots, links,
and maintainer information.
2025-11-11 19:39:19 +00:00
rcourtman
9bc5b754c7 Update README with GitHub Pages Helm repository instructions
Replace GHCR OCI registry instructions with GitHub Pages Helm repo.
Simpler installation without authentication requirements.

Resolves #686
2025-11-11 19:32:28 +00:00
rcourtman
37dc0682ee Update Helm chart version to 4.28.0 for GitHub Pages release 2025-11-11 19:30:52 +00:00
rcourtman
8754974e21 Fix Helm chart releaser to skip existing releases
Use helm-chart- prefix for releases to avoid conflicts with main Pulse releases
2025-11-11 19:28:48 +00:00
rcourtman
b89c4317d0 Add GitHub Pages Helm repository distribution (#686)
GHCR OCI packages cannot be made public through any available mechanism:
- Package doesn't appear in user/repo package lists
- API endpoints return 404
- Workflow tokens lack package visibility permissions
- Manual UI shows no packages to configure
- OCI annotations don't link package to repository

Implementing GitHub Pages Helm repo as canonical distribution method:
- Uses chart-releaser-action to publish to gh-pages branch
- Provides standard 'helm repo add' workflow without authentication
- Maintains OCI push for future use if GHCR resolves visibility issues

Resolves #686
2025-11-11 19:26:18 +00:00
rcourtman
c7895839fb Fix validation: Linux host-agent binaries are in main tarballs
Linux host-agent binaries don't have separate archives - they're included in
the main pulse-v*.tar.gz files. Only macOS and Windows have separate archives.
2025-11-11 19:25:14 +00:00
rcourtman
0a887e9b64 Add OCI annotations to Helm chart to link package to repository
Adding the org.opencontainers.image.source annotation should connect the GHCR
package to the repository, making it visible in the repo's packages section and
allowing proper visibility management.

Related to #686
2025-11-11 19:24:52 +00:00
rcourtman
3ea15b1e79 Update validation script to match new asset list
Removed validation checks for standalone binaries that are no longer
uploaded to GitHub releases. These binaries are only needed in Docker
images for the /download/ endpoint.

Updated required assets list to include all versioned tarballs/zips
instead of standalone binaries.
2025-11-11 17:50:02 +00:00
rcourtman
2fb223ffc5 Improve Helm chart package visibility configuration (related to #686)
Add fallback attempts to set package visibility through multiple API endpoints.
Also adds helpful output message with verification link.
2025-11-11 17:50:02 +00:00
rcourtman
19e86f4560 Reduce release assets by removing duplicates
Removed:
- Individual .sha256 files (checksums.txt already contains all checksums)
- Standalone binaries without version numbers (users should download versioned tarballs/zips)

Standalone binaries are only needed in Docker images for the /download/ endpoint.
GitHub releases should only contain versioned archives for user downloads.

This reduces release assets from ~54 files to ~19 files per release.
2025-11-11 17:26:00 +00:00
rcourtman
31e5d5b3b7 Automatically set Helm chart package visibility to public (related to #686)
The pulse-chart package in GHCR currently requires authentication for pulls
because it defaults to private visibility. This affects all users trying to
install via `helm install oci://ghcr.io/rcourtman/pulse-chart`.

This commit adds a workflow step to automatically set the package to public
after each push, enabling anonymous pulls without requiring `helm registry login`.

Note: The existing package will need one-time manual configuration via GitHub
web UI until the next release triggers this workflow.

Related to discussion #686
2025-11-11 17:19:03 +00:00
rcourtman
b00b176b9a Exclude development/infrastructure changes from release notes
Users don't care about CI/CD improvements, release workflows, build
processes, or testing infrastructure. Only include user-visible changes.

Related to #671
2025-11-11 17:18:50 +00:00
rcourtman
7ad8d8310b Remove commit hashes from LLM-generated release notes
Commit hashes clutter the release notes and aren't useful for end users.
Only include issue references when explicitly mentioned in commits.

Related to #671
2025-11-11 17:11:02 +00:00
rcourtman
583d21bdf9 Fix commit hash linking in release notes
Remove # symbol from commit hash references so GitHub auto-links them.
Format: (abc123) instead of (#abc123)
Issue references still use #: (#123)

Related to #671
2025-11-11 17:03:39 +00:00
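The reference format from 583d21bdf9 can be sketched as a single substitution. This assumes abbreviated commit hashes are 7 to 40 lowercase hex characters and issue numbers are shorter than 7 digits; the helper name `unhash_commit_refs` is hypothetical, and the real release notes come from an LLM prompt rather than a post-processing step.

```python
import re

# Assumption: commit hashes are 7-40 lowercase hex characters, while issue
# numbers are shorter runs of digits and therefore never match.
HASH_REF = re.compile(r"\(#([0-9a-f]{7,40})\)")


def unhash_commit_refs(notes: str) -> str:
    """Rewrite (#583d21b) as (583d21b); leave issue refs like (#671) alone."""
    return HASH_REF.sub(r"(\1)", notes)


print(unhash_commit_refs("Fix linking (#583d21b), related to (#671)"))
```

An all-digit reference of 7 or more characters would be treated as a hash, so the length threshold is a heuristic.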
rcourtman
ef0452f326 Use heredoc to write release notes without bash interpretation
Backticks in GitHub Actions output were still being interpreted even
when assigned to a variable and then echoed to a file. Use heredoc
with single quotes to prevent any bash expansion.

Related to #671
2025-11-11 16:21:22 +00:00
rcourtman
c98ab35da6 Fix release notes backtick command substitution issue
Use --notes-file instead of --notes with variable expansion to prevent
bash from interpreting markdown code blocks as shell commands.

Fixes the error where installation examples like:
  ```bash
  docker pull rcourtman/pulse:v4.29.0
  ```

were being executed as actual commands during release creation.

Related to #671
2025-11-11 16:09:02 +00:00
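The idea behind c98ab35da6 is that notes passed through a file never reach a shell parser, so backticks stay literal. A sketch of the same idea outside bash, assuming the release is created as a draft with `gh release create`; the helper name `release_command` is hypothetical, and the command is only built here, not run.

```python
import tempfile


def release_command(tag: str, notes: str) -> list[str]:
    """Write the notes to a file and build an argv list for `gh`.

    The notes travel as file contents and the command as a list of
    arguments, so no shell ever interprets backticks inside the notes.
    """
    with tempfile.NamedTemporaryFile(
        "w", suffix=".md", delete=False, encoding="utf-8"
    ) as fh:
        fh.write(notes)
    # Assumption: the workflow creates a draft release.
    return ["gh", "release", "create", tag, "--draft", "--notes-file", fh.name]


notes = "```bash\ndocker pull rcourtman/pulse:v4.29.0\n```\n"
print(release_command("v4.29.0", notes))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would invoke `gh` directly, without an intermediate shell.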
rcourtman
16b46fb5f1 Fix release workflow: fetch git tags for changelog generation
The checkout wasn't fetching tags despite fetch-depth: 0.
Explicitly run git fetch --tags --force after checkout.
2025-11-11 15:44:00 +00:00
rcourtman
ffbd4a4632 Fix release workflow: fetch git tags for changelog generation
actions/checkout@v4 does not fetch tags by default, causing the
previous tag lookup to fail and fall back to comparing with the
first commit SHA. Added fetch-depth: 0 to fetch all history including tags.
2025-11-11 15:32:48 +00:00
rcourtman
a8612daecb Simplify previous tag detection for release notes
Just use the latest tag directly instead of trying to exclude the current version.
Since we're generating release notes BEFORE creating the tag, the latest tag
will always be the previous release.
2025-11-11 15:27:37 +00:00
rcourtman
cfcf09da27 Fix release notes generation: properly detect previous tag
The script was failing because git describe --tags --abbrev=0 HEAD^ returns
the current HEAD commit SHA when no tag exists before HEAD, resulting in
comparing HEAD..HEAD which has zero commits.

Now using git tag --sort=-version:refname to get the latest tag (excluding
the version being released) which will properly compare v4.29.0 with v4.28.0.
2025-11-11 15:03:47 +00:00
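The tag selection described in cfcf09da27 can be sketched as a version-aware maximum over all tags except the one being released. This is a simplification that only considers plain `vX.Y.Z` tags and skips pre-releases, which `git tag --sort=-version:refname` would also list; the function name `previous_tag` is hypothetical.

```python
import re
from typing import Optional

STABLE_TAG = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def previous_tag(tags: list[str], releasing: str) -> Optional[str]:
    """Return the highest vX.Y.Z tag other than the one being released."""
    def key(tag: str) -> tuple[int, ...]:
        # Compare numerically so that v4.10.0 sorts above v4.9.0.
        return tuple(int(part) for part in STABLE_TAG.fullmatch(tag).groups())

    stable = [t for t in tags if STABLE_TAG.fullmatch(t) and t != releasing]
    return max(stable, key=key, default=None)


print(previous_tag(["v4.9.0", "v4.28.0", "v4.29.0", "v4.10.1"], "v4.29.0"))
```

Returning `None` when no earlier tag exists makes the first-release case explicit instead of silently comparing HEAD with itself.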
rcourtman
5f8bfa0cc1 Improve error reporting in release notes generation
- Capture script exit code before checking
- Show full error output if script fails
- Prevents silent failures where error is hidden in temp file

Related to #671 (automated release workflow)
2025-11-11 14:38:39 +00:00
rcourtman
a5f2568d66 Remove RELEASE_PROCEDURE.md - will add to CLAUDE.md instead 2025-11-11 14:27:48 +00:00
rcourtman
71dcac56d5 Add comprehensive release procedure documentation
Document the complete automated release process for future reference:
- Step-by-step release workflow trigger
- What each phase does (Docker build, release creation)
- How to review and publish draft releases
- Troubleshooting common issues
- Emergency rollback procedures
- Workflow architecture and design principles

This ensures future AI contexts and maintainers understand the full
release process without needing to reverse-engineer the workflow.

Related to #671 (automated release workflow)
2025-11-11 14:26:27 +00:00
rcourtman
99bc7d0593 Fix release notes extraction in workflow
- Replace sed with awk for more reliable multiline extraction
- Use temp file to capture full script output
- Extract content between separator lines correctly
- Fixes empty release notes in draft releases

Previous issue: sed pattern wasn't matching the separator lines,
resulting in empty RELEASE_NOTES variable.

New approach: Use awk to capture everything between the two separator
lines, handling multiline content properly.

Related to #671 (automated release workflow)
2025-11-11 14:22:46 +00:00
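The extraction that 99bc7d0593 moves from sed to awk captures everything between two separator lines. The same logic in a small sketch; the separator string is not given in the commit, so `=====` below is a placeholder, and `extract_between` is a hypothetical name.

```python
def extract_between(output: str, separator: str) -> str:
    """Return the lines between the first two lines equal to `separator`."""
    captured, inside = [], False
    for line in output.splitlines():
        if line.strip() == separator:
            if inside:
                return "\n".join(captured)
            inside = True  # first separator: start capturing
            continue
        if inside:
            captured.append(line)
    # Raising makes an empty result impossible to mistake for success.
    raise ValueError("expected two separator lines")


script_output = "log line\n=====\n## New Features\n\n- SSE updates\n=====\ndone\n"
print(extract_between(script_output, "====="))
```

Raising when a separator is missing avoids the silent empty-notes failure the commit describes.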
rcourtman
151aaceafc Update release notes template to match established format
- Use exact template format from v4.28.0 and prior releases
- Include all standard sections: New Features, Bug Fixes, Improvements, Breaking Changes
- Add complete installation instructions (systemd, Docker, Manual Binary, Helm)
- Include Downloads section with all artifact types
- Add Notes section for important highlights and upgrade considerations
- Ensure LLM outputs format exactly matching previous releases

Related to #671 (automated release workflow)
2025-11-11 14:05:15 +00:00
rcourtman
c7b64685e0 Add LLM-powered release notes generation
- Create scripts/generate-release-notes.sh to auto-generate release notes from git commits
- Supports both Anthropic Claude and OpenAI APIs
- Uses Claude Haiku 4.5 (claude-haiku-4-5-20251001) for cost efficiency ($1/$5 per million tokens)
- Falls back to OpenAI gpt-4o-mini if Anthropic key not available
- Integrates into release workflow between validation and release creation
- Compares current version with previous git tag to generate changelog
- Outputs categorized, user-friendly release notes with installation instructions

Workflow now automatically:
1. Finds previous release tag
2. Analyzes all commits since last release
3. Generates structured release notes via LLM
4. Uses generated notes for draft release body

Requires ANTHROPIC_API_KEY or OPENAI_API_KEY in GitHub secrets.

Related to #671 (automated release workflow)
2025-11-11 14:01:34 +00:00
rcourtman
2a9e9e01fe Fix duplicate asset upload in release workflow
- Standalone binaries (pulse-sensor-proxy-*, pulse-host-agent-*) were matching both binaries AND .sha256 files
- .sha256 files already uploaded in 'Upload checksums.txt first' step
- gh release upload fails when same asset uploaded twice
- Fix: Use explicit loop to exclude .sha256, .tar.gz, and .zip extensions from standalone binary upload

Error was:
  asset under the same name already exists: [pulse-sensor-proxy-linux-386.sha256 ...]

Related to #671 (automated release workflow)
2025-11-11 13:39:59 +00:00
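The explicit loop from 2a9e9e01fe can be pictured as a filter: match the standalone binary prefixes, then drop anything ending in `.sha256`, `.tar.gz`, or `.zip`. A sketch with a hypothetical function name and illustrative file names; the actual workflow does this in shell.

```python
from fnmatch import fnmatch

# Prefix patterns and excluded extensions as named in the commit message.
PATTERNS = ("pulse-sensor-proxy-*", "pulse-host-agent-*")
EXCLUDED_SUFFIXES = (".sha256", ".tar.gz", ".zip")


def standalone_binaries(filenames: list[str]) -> list[str]:
    """Keep names matching a binary pattern that are not checksums or archives."""
    return [
        name
        for name in filenames
        if any(fnmatch(name, pattern) for pattern in PATTERNS)
        and not name.endswith(EXCLUDED_SUFFIXES)
    ]


files = [
    "pulse-sensor-proxy-linux-386",
    "pulse-sensor-proxy-linux-386.sha256",
    "pulse-host-agent-windows-amd64.zip",  # illustrative archive name
    "checksums.txt",
]
print(standalone_binaries(files))
```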
rcourtman
3a98559e5f Add OCI labels to Docker images and --version flag to docker-agent
- Add OCI image labels to both pulse and pulse-docker-agent images:
  - org.opencontainers.image.title
  - org.opencontainers.image.description
  - org.opencontainers.image.version
  - org.opencontainers.image.created
  - org.opencontainers.image.revision (git sha)
  - org.opencontainers.image.source
  - org.opencontainers.image.url
  - org.opencontainers.image.licenses
- Add --version flag to pulse-docker-agent binary
  - Allows users to verify agent version: pulse-docker-agent --version
  - Outputs: pulse-docker-agent version v4.29.0

Addresses Dev Team 3 findings: CRITICAL-4 (OCI labels) and CRITICAL-5 (--version flag)
Related to #671 (automated release workflow)
2025-11-11 11:52:20 +00:00
rcourtman
475ce68dc2 Fix release workflow job ordering (fixes critical architectural flaw)
- Reorder jobs: build-docker-images FIRST, then create-release
- Previously: release created first, then Docker builds → if Docker fails, release exists without images
- Now: Docker images built first → if Docker fails, no release created
- Add timeout-minutes: 60 to build-docker-images job
- Add timeout-minutes: 30 to create-release job
- Update release notes template to mention Docker images
- create-release job now depends on build-docker-images success

Related to #671 (automated release workflow)
Addresses Dev Team 1 finding: CRITICAL-3 architectural time bomb
2025-11-11 11:51:33 +00:00
rcourtman
9af7a0c46b Add Docker image building to release workflow
Release workflow now builds and pushes Docker images after creating
the draft release:

- Pulse server image (linux/amd64, linux/arm64)
- Docker agent image (linux/amd64, linux/arm64)
- Pushed to both Docker Hub and GHCR
- Tagged with version and 'latest'

Requires DOCKER_USERNAME and DOCKER_PASSWORD secrets to be configured.
2025-11-11 11:39:29 +00:00
rcourtman
1fac7fa0cf Remove unnecessary review summary doc 2025-11-11 11:34:43 +00:00
rcourtman
f9919cc305 Add comprehensive release review summary from 4-team review
Documents findings from independent reviews by 4 development teams:
- Team 1: Architecture & Security
- Team 2: Artifact Integrity
- Team 3: Update Flow Integration
- Team 4: Post-Merge Validation

All 3 critical blockers have been fixed and pushed to main.
Release process is APPROVED FOR PRODUCTION.

High/medium priority improvements documented for follow-up work.
2025-11-11 11:34:03 +00:00
rcourtman
d5e67d8e6b Fix critical release workflow issues identified in review
Addresses 3 critical issues from 4-dev team review:

1. CRITICAL: Fix non-deterministic checksum generation (Dev 2 & 3)
   - Add explicit sorting to checksums.txt generation
   - Prevents the checksum mismatches between builds reported in #671
   - Location: scripts/build-release.sh:348

2. CRITICAL: Fix upload/validation race condition (Dev 1)
   - Change validation trigger from 'release: created' to 'workflow_run'
   - Prevents validation from running while assets still uploading
   - Prevents valid releases from being incorrectly deleted
   - Location: .github/workflows/validate-release-assets.yml:4-8

3. CRITICAL: Fix GitHub token exposure in logs (Dev 1)
   - Replace curl commands with gh CLI
   - Prevents token leakage in workflow logs
   - Location: .github/workflows/validate-release-assets.yml:44, 63

All three were blocking issues that could cause release failures.
Remaining high/medium priority issues to be addressed in follow-up PRs.
2025-11-11 11:32:44 +00:00
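The first fix in d5e67d8e6b makes checksums.txt deterministic by sorting. A sketch of that idea, assuming a `sha256sum`-style line format of digest, two spaces, file name; `write_checksums` is a hypothetical helper, not the code in scripts/build-release.sh.

```python
import hashlib
from pathlib import Path


def write_checksums(release_dir: Path) -> str:
    """Write checksums.txt with one `<sha256>  <name>` line per file, sorted by name."""
    lines = []
    # Directory listing order is filesystem-dependent, so sort explicitly.
    for path in sorted(release_dir.iterdir(), key=lambda p: p.name):
        if path.is_file() and path.name != "checksums.txt":
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.name}")
    content = "\n".join(lines) + "\n"
    (release_dir / "checksums.txt").write_text(content)
    return content
```

With the sort in place, two runs over the same files produce byte-identical output, which is what lets a later validation step compare the file reliably.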
rcourtman
137554009a Bump version to 4.29.0 2025-11-11 10:59:23 +00:00
rcourtman
d472b25a2b Fix validate-release.sh path issues after pushd
The script does pushd into RELEASE_DIR, so tarball paths should not include
the RELEASE_DIR prefix. Also fixed checksum validation glob patterns to
exclude .sha256 files from matching.
2025-11-11 10:54:00 +00:00
rcourtman
8ee3f12efb Fix validation script to check for ./ prefix in tarballs
Tarballs are created with ./bin/pulse paths (relative from inside staging dir)
but validation was looking for bin/pulse paths. Updated all tar -tzf checks
to use correct ./ prefix.
2025-11-11 10:43:26 +00:00
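The mismatch fixed in 8ee3f12efb is between member names stored as `./bin/pulse` and lookups for `bin/pulse`. The commit fixes it by adding the `./` prefix to each check; the sketch below swaps in a different technique, normalizing both sides so either spelling matches. `has_member` is a hypothetical helper.

```python
import tarfile


def has_member(archive: tarfile.TarFile, wanted: str) -> bool:
    """True if `wanted` is in the archive, with or without a leading './'."""
    def normalize(name: str) -> str:
        return name[2:] if name.startswith("./") else name

    # Normalize both the archive's names and the requested name so that
    # "./bin/pulse" and "bin/pulse" compare equal.
    return normalize(wanted) in {normalize(n) for n in archive.getnames()}
```

Usage would look like `with tarfile.open(path) as tar: has_member(tar, "bin/pulse")`, where `path` points at one of the release tarballs.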
rcourtman
741254e20c Fix validate-release.sh to use RELEASE_DIR path prefix
The validation script was looking for tarballs in the current directory
instead of the release/ directory, causing all validations to fail.
Now properly prepends $RELEASE_DIR to all file paths.
2025-11-11 10:32:36 +00:00
rcourtman
f573653d09 Bump version to 4.29.0-rc1 2025-11-11 10:25:03 +00:00
rcourtman
1cb40b5963 Merge post-upload validation gate workflow
- Add GitHub Actions workflow that validates releases after upload
- Re-downloads all assets from GitHub release
- Re-runs validate-release.sh on downloaded assets
- Sets commit status (blocks publish if validation fails)
- Updates release description with validation results

Final safety net: Catches checksum mismatches even after upload
Related to #671
2025-11-11 10:06:40 +00:00
rcourtman
970eb373ac Merge automated release workflow
- Add GitHub Actions workflow for fully automated releases
- Build → validate → create draft → upload assets (checksums.txt first)
- Add --skip-docker flag to validate-release.sh for CI environments
- Workflow ensures checksums.txt cannot drift from binaries
- Manual trigger via workflow_dispatch or automatic on version tags

Eliminates: Manual release process errors, checksum drift issues
Related to #671
2025-11-11 10:06:28 +00:00
rcourtman
93acb6f564 Merge update service refactor with SSE and job queue
- Add job queue system to ensure only one update runs at a time
- Add Server-Sent Events (SSE) for real-time push updates
- Increase rate limit from 20/min to 60/min for update endpoints
- Add unit tests for queue and SSE functionality
- Frontend: Update modal now uses SSE with polling fallback

Eliminates: 429 rate limit errors, duplicate modals, race conditions
Related to #671
2025-11-11 10:06:16 +00:00
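For readers unfamiliar with the wire format behind the SSE change in 93acb6f564: each pushed update is a small text frame ending in a blank line. A sketch of one frame, with a hypothetical event name and payload fields; the project's actual event schema is not shown in this log.

```python
import json


def sse_event(payload: dict, event: str = "message") -> str:
    """Serialize one Server-Sent Events frame: event name, JSON data, blank line."""
    # json.dumps escapes newlines, so the payload always fits one data line.
    data = json.dumps(payload, separators=(",", ":"))
    return f"event: {event}\ndata: {data}\n\n"


# Hypothetical payload, for illustration only.
print(sse_event({"status": "downloading", "progress": 40}, event="update"), end="")
```

The client reads these frames over one long-lived response, which is why the modal no longer needs to poll every few seconds.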
rcourtman
e4ef0f1051 Merge integration test suite for update flows
- Add comprehensive Playwright-based integration tests (60+ tests)
- Add mock GitHub release server for controlled testing
- Add 6 test suites: happy path, bad checksums, rate limiting, network failures, stale releases, frontend validation
- Add GitHub Actions workflow for automated testing
- Test infrastructure will catch v4.28.0-style issues automatically

This establishes the testing baseline for all update system changes.
2025-11-11 10:06:04 +00:00
Claude
0af921dc23 Refactor update service to eliminate polling and race conditions
This commit implements a comprehensive refactoring of the update system
to address race conditions, redundant polling, and rate limiting issues.

Backend changes:
- Add job queue system to ensure only ONE update runs at a time
- Implement Server-Sent Events (SSE) for real-time update progress
- Add rate limiting to /api/updates/status (5-second minimum per client)
- Create SSE broadcaster for push-based status updates
- Integrate job queue with update manager for atomic operations
- Add comprehensive unit tests for queue and SSE components

Frontend changes:
- Update UpdateProgressModal to use SSE as primary mechanism
- Implement automatic fallback to polling when SSE unavailable
- Maintain backward compatibility with existing update flow
- Clean up SSE connections on component unmount

API changes:
- Add new endpoint: GET /api/updates/stream (SSE)
- Enhance /api/updates/status with client-based rate limiting
- Return cached status with appropriate headers when rate limited

Benefits:
- Eliminates 429 rate limit errors during updates
- Only one update job can run at a time (prevents race conditions)
- Real-time updates via SSE reduce unnecessary polling
- Graceful degradation to polling when SSE unavailable
- Better resource utilization and reduced server load

Testing:
- All existing tests pass
- New unit tests for queue and SSE functionality
- Integration tests verify complete update flow
2025-11-11 09:33:05 +00:00
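The client-based rate limiting in 0af921dc23 (a 5-second minimum per client, with cached status returned when limited) can be sketched as below. The class name, the client identifier, and the injectable clock are assumptions for illustration; the real implementation lives in the Go backend and is not shown here.

```python
import time


class StatusThrottle:
    """Serve a cached status to clients that poll more often than min_interval."""

    def __init__(self, min_interval: float = 5.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock  # injectable so tests can control time
        self._last = {}     # client id -> (timestamp, status)

    def get(self, client_id: str, compute_status):
        """Return (status, from_cache) for this client."""
        now = self.clock()
        entry = self._last.get(client_id)
        if entry is not None and now - entry[0] < self.min_interval:
            return entry[1], True  # too soon: reuse the cached status
        status = compute_status()
        self._last[client_id] = (now, status)
        return status, False
```

Returning the cached value instead of an error is what lets an eager poller keep working without triggering 429 responses.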
Claude
2afdca4d30 Add comprehensive integration test suite for update flow
Implements end-to-end testing infrastructure for the Pulse update flow,
validating the entire path from UI to backend with controllable test
scenarios.

## What's Included

### Test Infrastructure
- Mock GitHub release server (Go) with controllable failure modes
- Docker Compose test environment (isolated services)
- Playwright test framework with TypeScript
- 60+ test cases across 6 test suites
- Helper library with 20+ reusable test utilities

### Test Scenarios
1. Happy Path (8 tests)
   - Valid checksums, successful update flow
   - Modal appears exactly once
   - Complete end-to-end validation

2. Bad Checksums (8 tests)
   - Server rejects invalid checksums
   - Error shown ONCE (not twice) - fixes v4.28.0 issue type
   - User-friendly error messages

3. Rate Limiting (9 tests)
   - Multiple rapid requests throttled gracefully
   - Proper rate limit headers
   - Clear error messages

4. Network Failure (10 tests)
   - Exponential backoff retry logic
   - Timeout handling
   - Graceful degradation

5. Stale Release (10 tests)
   - Backend refuses flagged releases
   - Informative error messages
   - Proper rejection logging

6. Frontend Validation (15 tests)
   - UpdateProgressModal appears exactly once
   - No duplicate modals on error
   - User-friendly error messages
   - Proper accessibility attributes

### CI/CD Integration
- GitHub Actions workflow (.github/workflows/test-updates.yml)
- Runs on PRs touching update-related code
- Separate test runs for each scenario
- Regression test to verify v4.28.0 issue prevention
- Automatic artifact uploads

### Documentation
- README.md: Architecture and overview
- QUICK_START.md: Getting started guide
- IMPLEMENTATION_SUMMARY.md: Complete implementation details
- Helper scripts for setup and test execution

## Success Criteria Met

- Tests run in CI on every PR touching update code
- All scenarios pass reliably
- Tests catch v4.28.0 checksum issue type automatically
- Frontend UX regressions are blocked

## Usage

```bash
cd tests/integration
./scripts/setup.sh    # One-time setup
npm test              # Run all tests
```

See QUICK_START.md for detailed instructions.

Addresses requirements from issue for comprehensive update flow testing
with specific focus on preventing duplicate error modals and ensuring
checksum validation works correctly.
2025-11-11 09:31:52 +00:00
Claude
969809156f Add post-upload validation gate for release assets
Create GitHub Actions workflow that validates release assets AFTER they're uploaded
to catch issues even if someone manually uploads or modifies assets.

Features:
- Triggers on release created/edited (draft only)
- Downloads all assets from GitHub release
- Re-runs scripts/validate-release.sh on downloaded assets
- On validation failure:
  * Deletes all assets from the release
  * Sets commit status to failed
  * Updates release description with error details
- On validation success:
  * Sets commit status to success
  * Updates release description with validation summary

This acts as a safety gate to prevent publishing releases with:
- Missing required files
- Checksum mismatches
- Incorrect version strings in binaries
- Corrupted or incomplete uploads
2025-11-11 09:23:06 +00:00
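The core check behind the validation gate in 969809156f is re-hashing downloaded assets against checksums.txt. A sketch of that one step, assuming `sha256sum`-style lines; `verify_checksums` is a hypothetical helper, and scripts/validate-release.sh performs more checks than this (required files, version strings).

```python
import hashlib
from pathlib import Path


def verify_checksums(release_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means every listed file matches."""
    problems = []
    for line in (release_dir / "checksums.txt").read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with '*'
        path = release_dir / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif hashlib.sha256(path.read_bytes()).hexdigest() != expected:
            problems.append(f"checksum mismatch: {name}")
    return problems
```

Collecting every problem before failing gives a complete error summary for the release description rather than only the first mismatch.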
Claude
b3f220f1a1 Add automated release workflow with validation
This commit introduces a comprehensive GitHub Actions workflow for
creating releases, ensuring all artifacts are validated before upload.

Changes:
- Add .github/workflows/release.yml: Manual workflow_dispatch trigger
  that builds, validates, and creates draft releases
- Update scripts/validate-release.sh: Add --skip-docker flag to allow
  validation without Docker image checks

Key features:
- Validation runs BEFORE any assets are uploaded
- If validation fails, no release is created
- checksums.txt and artifacts come from the same build
- No manual steps between validation and upload
- Checksums uploaded first, then all other assets
- Creates draft release for manual review before publishing

The workflow ensures that checksums.txt cannot drift from binaries
by running the entire build-validate-upload pipeline atomically.
2025-11-11 09:22:03 +00:00
rcourtman
e894bc7b1d Fix recurring update issues (related to #671)
This commit addresses three recurring issues with the update system:

1. **Checksum mismatches (v4.27.0, v4.28.0):**
   - Root cause: Release process uploads checksums.txt first, but if artifacts
     are rebuilt after that upload, checksums become stale
   - Fix: Update RELEASE_CHECKLIST.md to REQUIRE running validate-release.sh
     before publishing (step 9, non-negotiable)
   - The validation script exists and catches these errors, but wasn't being
     enforced in the release process

2. **Duplicate error modals:**
   - Root cause: UpdateProgressModal rendered in both App.tsx
     (GlobalUpdateProgressWatcher) and UpdateBanner.tsx
   - Fix: Remove UpdateProgressModal from UpdateBanner.tsx
   - GlobalUpdateProgressWatcher automatically shows the modal when updates
     start, so the banner's modal is redundant

3. **Rate limiting too strict:**
   - Root cause: UpdateProgressModal polls /api/updates/status every 2 seconds
     (30 req/min), but rate limit was 20/min
   - Fix: Increase UpdateEndpoints rate limit from 20/min to 60/min
   - Allows modal to poll without hitting rate limits during updates

These were all manual process errors and configuration issues, not code bugs.
The validation script enforcement prevents future checksum mismatches.
2025-11-11 09:09:30 +00:00
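The rate-limit arithmetic in item 3 of e894bc7b1d is easy to check: a 2-second poll interval is 30 requests per minute, above the old limit of 20 and below the new limit of 60. A small worked example, with a hypothetical function name.

```python
def requests_per_minute(poll_interval_seconds: float) -> float:
    """Requests per minute generated by one client polling at a fixed interval."""
    return 60.0 / poll_interval_seconds


OLD_LIMIT, NEW_LIMIT = 20, 60  # per-minute limits from the commit message
rate = requests_per_minute(2)

print(rate)               # 30.0
print(rate > OLD_LIMIT)   # True: the modal alone exceeded the old limit
print(rate <= NEW_LIMIT)  # True: the new limit leaves headroom
```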