594 commits

Author SHA1 Message Date
rcourtman
bbe11d1e7f Fix discovery test Prometheus metric collision
Remove t.Parallel() from tests that verify global Prometheus gauge values.
When tests run in parallel, they update the same global gauges
(discoveryScanServers, discoveryScanErrors) causing race conditions and
incorrect metric values.

Fixes test failure in workflow run 19281332332:
- TestPerformScanRecordsHistoryAndMetrics expected 2 servers, got 1

Related to release workflow preflight tests.
2025-11-11 23:34:49 +00:00
rcourtman
b41f8a2ac4 Fix backend test failures blocking release workflow
Three categories of fixes:

1. Goroutine leak causing 10-minute timeout:
   - Add defer mon.notificationMgr.Stop() in monitor_memory_test.go
   - Background goroutines from notification manager weren't being stopped

2. Database NULL column scanning errors:
   - Change LastError from string to *string in queue.go
   - Change PayloadBytes from int to *int in queue.go
   - Scanning a SQL NULL into a Go field requires a pointer (or sql.Null*) type

3. SSRF protection blocking test servers:
   - Check allowlist for localhost before rejecting in notifications.go
   - Set PULSE_DATA_DIR to temp directory in tests
   - Add defer nm.Stop() calls to prevent goroutine leaks

Fixes for preflight test failures in workflow run 19280879903.
2025-11-11 23:27:03 +00:00
rcourtman
92a5d74ba9 Add Snap Docker support to install-docker-agent.sh
Snap-installed Docker does not automatically create a docker group,
causing permission denied errors when the pulse-docker service user
tries to access /var/run/docker.sock.

Changes:
- Auto-detect Snap Docker installations
- Create docker group if missing when Snap Docker is detected
- Restart Snap Docker after group creation to refresh socket ACLs
- Add socket access validation before starting the service
- Handle symlinked Docker sockets in systemd unit ReadWritePaths
- Document troubleshooting steps in DOCKER_MONITORING.md
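
The socket validation and symlink handling listed above might look roughly like this sketch. The function names and the use of `readlink -f` are assumptions for illustration, not the contents of install-docker-agent.sh.

```bash
# Hypothetical helpers; names are illustrative, not taken from the installer.

# Print the real path behind a possibly symlinked socket, so the resolved
# path can be listed in the systemd unit's ReadWritePaths.
resolve_socket_path() {
  readlink -f "$1"
}

# Succeed only if the current user can read and write the socket path.
socket_accessible() {
  [ -r "$1" ] && [ -w "$1" ]
}
```

For example, `resolve_socket_path /var/run/docker.sock` prints the path the socket actually lives at, and `socket_accessible` can gate the service start.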
2025-11-11 23:07:29 +00:00
rcourtman
68049e0fb3 Fix pulse-sensor-proxy pvecm errors in LXC containers (related to #600)
When pulse-sensor-proxy runs inside an LXC container on a Proxmox host,
pvecm status fails with "ipcc_send_rec[1] failed: Unknown error -1"
because the container can't access the host's corosync IPC socket.

This caused repeated warnings every few seconds even though the proxy
can function correctly by discovering local host addresses.

Extended the standalone node detection to recognize "ipcc_send_rec"
errors as indicating an LXC container deployment and gracefully fall
back to local address discovery instead of logging warnings.
2025-11-11 23:04:36 +00:00
rcourtman
515987cc8b Fix failing backend tests in preflight checks
Fixes three test failures that were blocking release workflow:

1. TestApplyDockerReportGeneratesUniqueIDsForCollidingHosts:
   - Initialize dockerTokenBindings and dockerMetadataStore in test helper
   - These maps were nil causing panic on first access

2. TestSendGroupedAppriseHTTP & TestSendTestNotificationAppriseHTTP:
   - Configure allowlist to permit localhost (127.0.0.1) for test servers
   - SSRF protection was blocking httptest.NewServer() URLs
   - Tests need to allowlist the test server IP to bypass security checks

Related to workflow fix in 5fa78c3e3.
2025-11-11 23:02:45 +00:00
rcourtman
5fa78c3e36 Fix YAML syntax error in validate-release-assets workflow
The Python heredoc was not indented, causing YAML parsers to interpret
the Python code as YAML syntax. This caused workflow_dispatch runs to
fail instantly with 'workflow file issue' error before any jobs could start.

The fix indents the heredoc content and changes delimiter from 'PY' to
'EOF' to match standard conventions.
2025-11-11 22:54:37 +00:00
rcourtman
9f7c9aea95 Revert "temp: set VERSION to 4.29.0-test for canary release"
This reverts commit 5d391a98b0.
2025-11-11 22:41:32 +00:00
rcourtman
5d391a98b0 temp: set VERSION to 4.29.0-test for canary release
2025-11-11 22:39:03 +00:00
rcourtman
ea6cad10ce Release workflow guardrails (related to #695)
2025-11-11 22:34:00 +00:00
rcourtman
3e90737448 Fix guest agent OS info calls causing OpenBSD VM crashes (related to #692)
Add defensive mitigation to prevent repeated guest-get-osinfo calls that
trigger buggy behavior in QEMU guest agent 9.0.2 on OpenBSD 7.6.

The issue: OpenBSD doesn't have /etc/os-release (Linux convention), and
qemu-ga 9.0.2 appears to spawn excessive helper processes trying to read
this file whenever guest-get-osinfo is called. These helpers don't clean
up properly, eventually exhausting the process table and crashing the VM.

The fix: Track consecutive OS info failures per VM. After 3 failures,
automatically skip future guest-get-osinfo calls for that VM while
continuing to fetch other guest agent data (network interfaces, version).
This prevents triggering the buggy code path while maintaining most guest
agent functionality.

The counter resets on success, so if the guest agent is upgraded or the
issue is resolved, Pulse will automatically resume OS info collection.

Related to #692
2025-11-11 22:27:22 +00:00
rcourtman
135b378820 Fix Windows/macOS host agent downloads for bare metal installs (related to #684)
Bare metal installations couldn't serve Windows host agent downloads because
the Windows and macOS binaries weren't included in the universal tarball. The
download endpoint would return 404 when Windows users tried to install the
host agent from a bare metal Pulse deployment (Proxmox LXC, Debian VM, etc.).

Changes:
- build-release.sh: Copy Windows/macOS host agent binaries into universal tarball
- build-release.sh: Create symlinks for Windows binaries without .exe extension
- validate-release.sh: Add Windows 386 binary and symlink to Docker validation
- validate-release.sh: Add explicit validation that universal tarball contains all Windows/macOS binaries

The universal tarball now matches the Docker image, ensuring both deployment
methods can serve the complete set of downloadable binaries for the /download/
endpoint.
2025-11-11 21:26:33 +00:00
rcourtman
e5dfca6c88 Fix SELinux compatibility in host agent installer
Replace mv with install command to ensure correct SELinux context.
The mv command preserves the user_tmp_t label from /tmp, which
prevents systemd from executing the binary on SELinux systems.

The install command creates a new file with the correct label for
/usr/local/bin. Added automatic restorecon call for SELinux systems
to ensure policy compliance.

Related to #688
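
A minimal sketch of the replacement described above, assuming a binary that was downloaded to a temporary location. The function name is hypothetical and this is not the installer's actual code.

```bash
# Hypothetical helper: place a binary with install(1) instead of mv(1).
install_binary() {
  src=$1
  dst=$2
  # install writes a new file at the destination, so the file does not
  # carry over the label it had in the temporary directory.
  install -m 0755 "$src" "$dst" || return 1
  # On systems that ship restorecon, reset the SELinux context as well.
  # Treated as best-effort here; a real installer may want to report failures.
  if command -v restorecon >/dev/null 2>&1; then
    restorecon "$dst" 2>/dev/null || true
  fi
}
```

A call such as `install_binary /tmp/pulse-host-agent /usr/local/bin/pulse-host-agent` leaves the source file in place, so cleaning up the temporary copy remains a separate step.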
2025-11-11 21:13:33 +00:00
rcourtman
f6570e18b7 Fix checksum extraction to use word boundaries
The grep pattern was too loose and could match filenames like:
- pulse-v4.29.0-linux-amd64.tar.gz (correct)
- pulse-v4.29.0-linux-amd64.tar.gz.sha256 (also matched)

Using grep -w ensures we only match the exact filename as a complete word,
preventing false matches on files with the same prefix.
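
One alternative to a word-boundary grep is to compare the filename field exactly. The sketch below assumes checksums.txt uses the two-column `sha256sum` layout (hash, then filename). The function name is hypothetical and this is not the script's actual code.

```bash
# Hypothetical helper: print the checksum whose filename field equals $2 exactly.
# Filenames containing spaces are not handled in this sketch.
checksum_for() {
  # sha256sum marks binary-mode entries with a leading "*"; strip it before comparing.
  awk -v name="$2" '{ f = $2; sub(/^\*/, "", f) } f == name { print $1; exit }' "$1"
}
```

Because the comparison is a string equality on the whole field, an entry for `pulse-v4.29.0-linux-amd64.tar.gz.sha256` cannot be returned for `pulse-v4.29.0-linux-amd64.tar.gz`.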
2025-11-11 20:41:42 +00:00
rcourtman
34b29610e7 Generate both checksums.txt and .sha256 files for backward compatibility
Following best practices for release format transitions:
- build-release.sh now generates both formats from same sha256sum run
- Workflow uploads both checksums.txt and individual .sha256 files
- Validation ensures both formats exist and match

This provides a safe transition period for users with older install scripts
while maintaining the cleaner checksums.txt format going forward. After 2-3
releases when most users have updated scripts, we can remove .sha256 generation.

Related: Install script already supports both formats (falls back gracefully).
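
Generating both formats from one `sha256sum` run could look like this sketch. The function name and the `*.tar.gz` glob are assumptions, and the per-file `.sha256` layout (a single `sha256sum`-style line) is a guess at the format rather than something the commit specifies.

```bash
# Hypothetical helper: write checksums.txt plus one .sha256 file per archive in $1.
write_checksums() {
  (
    cd "$1" || exit 1
    # One sha256sum pass produces the combined file...
    sha256sum -- *.tar.gz > checksums.txt || exit 1
    # ...and each of its lines is reused verbatim for the per-file format,
    # so the two formats cannot disagree.
    while IFS= read -r line; do
      name=${line#*  }   # drop the hash and the two-space separator
      printf '%s\n' "$line" > "$name.sha256"
    done < checksums.txt
  )
}
```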
2025-11-11 20:31:15 +00:00
rcourtman
d78983cafc Make install script backward compatible with old and new release formats
The install script now tries checksums.txt first (v4.29.0+), then falls back
to individual .sha256 files (v4.28.0 and earlier). This ensures users can
update from any version regardless of which checksum format was used.

This fixes the release format transition issue where changing asset structure
broke updates for users on older versions.
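
The fallback order can be sketched as follows. This illustration reads from a local directory of already-downloaded assets. The function name is hypothetical, and the assumption that each file carries the hash as its first field is mine, not the commit's.

```bash
# Hypothetical helper: print the expected checksum for archive $2 using files in directory $1.
expected_checksum() {
  dir=$1
  archive=$2
  sum=""
  # Newer releases: look for the archive in the combined checksums.txt.
  if [ -f "$dir/checksums.txt" ]; then
    sum=$(awk -v name="$archive" '$2 == name { print $1; exit }' "$dir/checksums.txt")
  fi
  # Older releases: fall back to the per-file .sha256 asset.
  if [ -z "$sum" ] && [ -f "$dir/$archive.sha256" ]; then
    sum=$(awk 'NR == 1 { print $1 }' "$dir/$archive.sha256")
  fi
  # Fail if neither format yielded a checksum.
  [ -n "$sum" ] || return 1
  printf '%s\n' "$sum"
}
```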
2025-11-11 20:19:47 +00:00
rcourtman
b53a21f11f Update install script to use checksums.txt instead of individual .sha256 files
Aligns with release asset reduction changes. The install script now downloads the unified checksums.txt file and extracts the checksum for the specific architecture being installed.
2025-11-11 20:14:13 +00:00
rcourtman
a79db028ff Add manual trigger support to demo server update workflow
Allows manually deploying specific releases to the demo server via workflow_dispatch.
2025-11-11 20:12:24 +00:00
rcourtman
4604563273 Fix demo workflow asset checks to follow redirects
The workflow was failing because GitHub returns 302 redirects for freshly published release assets while the CDN propagates. Adding -L flag to curl commands allows them to follow redirects and properly detect when assets are available.
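
As an illustration of the flag's effect, a check along these lines follows redirects before deciding whether an asset exists. The function name and the exact flag combination are assumptions; only `-L` is stated in the commit.

```bash
# Hypothetical helper: succeed if the URL ultimately resolves to a successful response.
asset_available() {
  # -L     follow 3xx redirects to the final location
  # -f     exit non-zero on HTTP error responses (4xx/5xx)
  # -s -S  hide progress output but still print errors
  # -I     fetch headers only; -o /dev/null discards them
  curl -fsSIL -o /dev/null "$1"
}
```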
2025-11-11 20:09:41 +00:00
rcourtman
036673e783 Add production-grade Helm chart improvements
High-impact improvements based on Codex recommendations:

1. values.schema.json - JSON schema validation catches config errors at install time
2. helm-docs automation - Auto-generates documentation from values.yaml comments
3. kind smoke tests - Deploys and upgrades chart in real cluster to catch runtime issues
4. ServiceMonitor template - Built-in Prometheus integration for observability
5. Artifact Hub metadata - Changelog, links, and maintainer info for better discoverability

These improvements provide:
- Configuration validation before deployment
- Always up-to-date documentation
- Runtime validation in CI
- First-class monitoring support
- Better user experience on Artifact Hub

Related to #686
2025-11-11 19:52:58 +00:00
rcourtman
3a260a3130 Update Kubernetes docs with GitHub Pages Helm repository
- Replace GHCR OCI instructions with GitHub Pages repository
- Add comprehensive upgrade instructions with examples
- Add rollback procedures
- Add detailed uninstall instructions
- Simplify installation (no authentication required)
2025-11-11 19:40:51 +00:00
rcourtman
89d8e52073 Add automated version syncing and validation to Helm workflow
- Auto-update Chart.yaml version from release tag or manual input
- Add strict helm lint validation before publishing
- Validate chart templates with multiple configuration scenarios
- Ensures chart quality before publishing to GitHub Pages
2025-11-11 19:40:04 +00:00
rcourtman
cdf8399beb Add Artifact Hub repository metadata for chart discoverability
Enables automatic listing on https://artifacthub.io for improved
Helm chart discovery and provides metadata like screenshots, links,
and maintainer information.
2025-11-11 19:39:19 +00:00
rcourtman
30773f181a Update README with GitHub Pages Helm repository instructions
Replace GHCR OCI registry instructions with GitHub Pages Helm repo.
Simpler installation without authentication requirements.

Resolves #686
2025-11-11 19:32:28 +00:00
rcourtman
2ab9ffb14a Update Helm chart version to 4.28.0 for GitHub Pages release
2025-11-11 19:30:52 +00:00
rcourtman
6273d57164 Fix Helm chart releaser to skip existing releases
Use helm-chart- prefix for releases to avoid conflicts with main Pulse releases
2025-11-11 19:28:48 +00:00
rcourtman
c667b2dee4 Add GitHub Pages Helm repository distribution (#686)
GHCR OCI packages cannot be made public through any available mechanism:
- Package doesn't appear in user/repo package lists
- API endpoints return 404
- Workflow tokens lack package visibility permissions
- Manual UI shows no packages to configure
- OCI annotations don't link package to repository

Implementing GitHub Pages Helm repo as canonical distribution method:
- Uses chart-releaser-action to publish to gh-pages branch
- Provides standard 'helm repo add' workflow without authentication
- Maintains OCI push for future use if GHCR resolves visibility issues

Resolves #686
2025-11-11 19:26:18 +00:00
rcourtman
0f78b681c8 Fix validation: Linux host-agent binaries are in main tarballs
Linux host-agent binaries don't have separate archives - they're included in
the main pulse-v*.tar.gz files. Only macOS and Windows have separate archives.
2025-11-11 19:25:14 +00:00
rcourtman
f5c5b05075 Add OCI annotations to Helm chart to link package to repository
Adding org.opencontainers.image.source annotation will connect the GHCR package
to the repository, making it visible in the repo's packages section and allowing
proper visibility management.

Related to #686
2025-11-11 19:24:52 +00:00
rcourtman
e54f881eea Update validation script to match new asset list
Removed validation checks for standalone binaries that are no longer
uploaded to GitHub releases. These binaries are only needed in Docker
images for the /download/ endpoint.

Updated required assets list to include all versioned tarballs/zips
instead of standalone binaries.
2025-11-11 17:50:02 +00:00
rcourtman
b6aa1fe592 Improve Helm chart package visibility configuration (related to #686)
Add fallback attempts to set package visibility through multiple API endpoints.
Also adds helpful output message with verification link.
2025-11-11 17:50:02 +00:00
rcourtman
fa8a8f3af3 Reduce release assets by removing duplicates
Removed:
- Individual .sha256 files (checksums.txt already contains all checksums)
- Standalone binaries without version numbers (users should download versioned tarballs/zips)

Standalone binaries are only needed in Docker images for the /download/ endpoint.
GitHub releases should only contain versioned archives for user downloads.

This reduces release assets from ~54 files to ~19 files per release.
2025-11-11 17:26:00 +00:00
rcourtman
8f0a548e3d Automatically set Helm chart package visibility to public (related to #686)
The pulse-chart package in GHCR currently requires authentication for pulls
because it defaults to private visibility. This affects all users trying to
install via `helm install oci://ghcr.io/rcourtman/pulse-chart`.

This commit adds a workflow step to automatically set the package to public
after each push, enabling anonymous pulls without requiring `helm registry login`.

Note: The existing package will need one-time manual configuration via GitHub
web UI until the next release triggers this workflow.

Related to discussion #686
2025-11-11 17:19:03 +00:00
rcourtman
13a469362a Exclude development/infrastructure changes from release notes
Users don't care about CI/CD improvements, release workflows, build
processes, or testing infrastructure. Only include user-visible changes.

Related to #671
2025-11-11 17:18:50 +00:00
rcourtman
93d0eb6b8a Remove commit hashes from LLM-generated release notes
Commit hashes clutter the release notes and aren't useful for end users.
Only include issue references when explicitly mentioned in commits.

Related to #671
2025-11-11 17:11:02 +00:00
rcourtman
6e669b46dc Fix commit hash linking in release notes
Remove # symbol from commit hash references so GitHub auto-links them.
Format: (abc123) instead of (#abc123)
Issue references still use #: (#123)

Related to #671
2025-11-11 17:03:39 +00:00
rcourtman
6eff1e9fa6 Use heredoc to write release notes without bash interpretation
Backticks in GitHub Actions output were still being interpreted even
when assigned to a variable and then echoed to a file. Use heredoc
with single quotes to prevent any bash expansion.

Related to #671
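
The quoting behaviour this relies on can be seen in a small sketch. The function name and sample text are illustrative. With a quoted delimiter, the shell copies the heredoc body verbatim.

```bash
# Hypothetical helper: write sample notes to the file named by $1.
write_notes() {
  # Quoting the delimiter ('EOF') disables command substitution and
  # variable expansion inside the body, so backticks and $ stay literal.
  cat > "$1" <<'EOF'
Pull the image with `docker pull rcourtman/pulse:v4.29.0`.
Literal dollar sign: $HOME
EOF
}
```

With an unquoted `EOF`, the same body would have its backtick span executed and `$HOME` expanded before the file was written.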
2025-11-11 16:21:22 +00:00
rcourtman
5f05718c3e Fix release notes backtick command substitution issue
Use --notes-file instead of --notes with variable expansion to prevent
bash from interpreting markdown code blocks as shell commands.

Fixes the error where installation examples like:
  ```bash
  docker pull rcourtman/pulse:v4.29.0
  ```

were being executed as actual commands during release creation.

Related to #671
2025-11-11 16:09:02 +00:00
rcourtman
8a4e7e9de8 Fix release workflow: fetch git tags for changelog generation
The checkout wasn't fetching tags despite fetch-depth: 0.
Explicitly run git fetch --tags --force after checkout.
2025-11-11 15:44:00 +00:00
rcourtman
89bdb534e0 Fix release workflow: fetch git tags for changelog generation
actions/checkout@v4 does not fetch tags by default, causing the
previous tag lookup to fail and fall back to comparing with the
first commit SHA. Added fetch-depth: 0 to fetch all history including tags.
2025-11-11 15:32:48 +00:00
rcourtman
f444aec82f Simplify previous tag detection for release notes
Just use the latest tag directly instead of trying to exclude the current version.
Since we're generating release notes BEFORE creating the tag, the latest tag
will always be the previous release.
2025-11-11 15:27:37 +00:00
rcourtman
70b321a450 Fix release notes generation: properly detect previous tag
The script was failing because git describe --tags --abbrev=0 HEAD^ returns
the current HEAD commit SHA when no tag exists before HEAD, resulting in
comparing HEAD..HEAD which has zero commits.

Now using git tag --sort=-version:refname to get the latest tag (excluding
the version being released) which will properly compare v4.29.0 with v4.28.0.
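
A sketch of that lookup, assuming tags follow the `vX.Y.Z` pattern. The function name is hypothetical, and the filtering step is one possible way to exclude the version being released.

```bash
# Hypothetical helper: print the newest tag other than the one passed as $1.
previous_tag() {
  # --sort=-version:refname lists tags from highest to lowest version.
  # grep -Fvx drops the line that exactly equals the tag being released.
  git tag --sort=-version:refname | grep -Fvx -- "$1" | head -n 1
}
```

The result can then serve as the lower bound of the commit range handed to the release notes script.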
2025-11-11 15:03:47 +00:00
rcourtman
f78c58c9a7 Improve error reporting in release notes generation
- Capture script exit code before checking
- Show full error output if script fails
- Prevents silent failures where error is hidden in temp file

Related to #671 (automated release workflow)
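
The pattern in the list above, sketched with a hypothetical wrapper (the function name is not from the workflow):

```bash
# Hypothetical wrapper: run a command, keep its output in a temp file,
# and print that output in full if the command fails.
run_and_report() {
  out=$(mktemp) || return 1
  status=0
  # Record the exit code at the point of failure; any later command would overwrite $?.
  "$@" > "$out" 2>&1 || status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed with exit code $status; full output follows:" >&2
    cat "$out" >&2
  fi
  rm -f "$out"
  return "$status"
}
```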
2025-11-11 14:38:39 +00:00
rcourtman
95c6daf80a Remove RELEASE_PROCEDURE.md - will add to CLAUDE.md instead
2025-11-11 14:27:48 +00:00
rcourtman
1f0a348912 Add comprehensive release procedure documentation
Document the complete automated release process for future reference:
- Step-by-step release workflow trigger
- What each phase does (Docker build, release creation)
- How to review and publish draft releases
- Troubleshooting common issues
- Emergency rollback procedures
- Workflow architecture and design principles

This ensures future AI contexts and maintainers understand the full
release process without needing to reverse-engineer the workflow.

Related to #671 (automated release workflow)
2025-11-11 14:26:27 +00:00
rcourtman
ea165e0bcc Fix release notes extraction in workflow
- Replace sed with awk for more reliable multiline extraction
- Use temp file to capture full script output
- Extract content between separator lines correctly
- Fixes empty release notes in draft releases

Previous issue: sed pattern wasn't matching the separator lines,
resulting in empty RELEASE_NOTES variable.

New approach: Use awk to capture everything between the two separator
lines, handling multiline content properly.

Related to #671 (automated release workflow)
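
A minimal sketch of the awk approach, assuming the script output wraps the notes in two identical separator lines. The separator string and function name are placeholders, since the commit does not show them.

```bash
# Hypothetical helper: print the lines between the first and second
# occurrence of the separator line $1 in file $2.
extract_between() {
  awk -v sep="$1" '
    $0 == sep { seen++; next }   # count separator lines, never print them
    seen == 1 { print }          # print only while inside the first pair
  ' "$2"
}
```

Because awk processes the input line by line, blank lines and markdown inside the notes pass through unchanged.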
2025-11-11 14:22:46 +00:00
rcourtman
c27f21f33b Update release notes template to match established format
- Use exact template format from v4.28.0 and prior releases
- Include all standard sections: New Features, Bug Fixes, Improvements, Breaking Changes
- Add complete installation instructions (systemd, Docker, Manual Binary, Helm)
- Include Downloads section with all artifact types
- Add Notes section for important highlights and upgrade considerations
- Ensure LLM outputs format exactly matching previous releases

Related to #671 (automated release workflow)
2025-11-11 14:05:15 +00:00
rcourtman
a7828e2d1e Add LLM-powered release notes generation
- Create scripts/generate-release-notes.sh to auto-generate release notes from git commits
- Supports both Anthropic Claude and OpenAI APIs
- Uses Claude Haiku 4.5 (claude-haiku-4-5-20251001) for cost efficiency ($1/$5 per million tokens)
- Falls back to OpenAI gpt-4o-mini if Anthropic key not available
- Integrates into release workflow between validation and release creation
- Compares current version with previous git tag to generate changelog
- Outputs categorized, user-friendly release notes with installation instructions

Workflow now automatically:
1. Finds previous release tag
2. Analyzes all commits since last release
3. Generates structured release notes via LLM
4. Uses generated notes for draft release body

Requires ANTHROPIC_API_KEY or OPENAI_API_KEY in GitHub secrets.

Related to #671 (automated release workflow)
2025-11-11 14:01:34 +00:00
rcourtman
15c22e34e8 Fix duplicate asset upload in release workflow
- Standalone binaries (pulse-sensor-proxy-*, pulse-host-agent-*) were matching both binaries AND .sha256 files
- .sha256 files already uploaded in 'Upload checksums.txt first' step
- gh release upload fails when same asset uploaded twice
- Fix: Use explicit loop to exclude .sha256, .tar.gz, and .zip extensions from standalone binary upload

Error was:
  asset under the same name already exists: [pulse-sensor-proxy-linux-386.sha256 ...]

Related to #671 (automated release workflow)
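
The explicit loop might look like the following sketch. The directory argument and function name are assumptions; the two filename prefixes and the three excluded extensions come from the commit message.

```bash
# Hypothetical helper: list standalone binaries in directory $1, skipping
# checksum files and archives that share the same filename prefixes.
list_standalone_binaries() {
  for f in "$1"/pulse-sensor-proxy-* "$1"/pulse-host-agent-*; do
    [ -e "$f" ] || continue          # an unmatched glob stays literal; skip it
    case "$f" in
      *.sha256|*.tar.gz|*.zip) continue ;;
    esac
    printf '%s\n' "$f"
  done
}
```

Each printed path could then be handed to `gh release upload` one at a time, so no asset name is submitted twice.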
2025-11-11 13:39:59 +00:00
rcourtman
a8dc2e8e9b Add OCI labels to Docker images and --version flag to docker-agent
- Add OCI image labels to both pulse and pulse-docker-agent images:
  - org.opencontainers.image.title
  - org.opencontainers.image.description
  - org.opencontainers.image.version
  - org.opencontainers.image.created
  - org.opencontainers.image.revision (git sha)
  - org.opencontainers.image.source
  - org.opencontainers.image.url
  - org.opencontainers.image.licenses
- Add --version flag to pulse-docker-agent binary
  - Allows users to verify agent version: pulse-docker-agent --version
  - Outputs: pulse-docker-agent version v4.29.0

Addresses Dev Team 3 findings: CRITICAL-4 (OCI labels) and CRITICAL-5 (--version flag)
Related to #671 (automated release workflow)
2025-11-11 11:52:20 +00:00
rcourtman
1a263ce9d0 Fix release workflow job ordering (fixes critical architectural flaw)
- Reorder jobs: build-docker-images FIRST, then create-release
- Previously: release created first, then Docker builds → if Docker fails, release exists without images
- Now: Docker images built first → if Docker fails, no release created
- Add timeout-minutes: 60 to build-docker-images job
- Add timeout-minutes: 30 to create-release job
- Update release notes template to mention Docker images
- create-release job now depends on build-docker-images success

Related to #671 (automated release workflow)
Addresses Dev Team 1 finding: CRITICAL-3 architectural time bomb
2025-11-11 11:51:33 +00:00