Commit graph

4826 commits

Author SHA1 Message Date
rcourtman
391e3d200b Switch to reliable workflow_dispatch trigger for releases
Tag push triggers in GitHub Actions are unreliable (known issue).
Major projects don't actually use automatic tag triggers - they use
workflow_dispatch or other manual triggers.

Changes:
- Remove tag push trigger
- Use workflow_dispatch with version input
- Workflow validates that annotated tag already exists
- Tag still stores LLM changelog in annotation
- Manual trigger: gh workflow run release.yml -f version=X.Y.Z

This is the pattern that actually works reliably.
2025-11-13 12:24:34 +00:00
rcourtman
44d15d02b4 Add workflow_dispatch fallback for tag-triggered releases
GitHub Actions has a known issue where tag pushes sometimes don't
trigger workflows. Add workflow_dispatch as a backup trigger that
accepts a tag parameter.

This allows manual triggering if automatic tag push trigger fails.
2025-11-13 12:21:11 +00:00
rcourtman
bed32e33a2 Prepare v4.30.0 release 2025-11-13 12:04:45 +00:00
rcourtman
c13360f962 Optimize release workflow for speed
Preflight tests improvements:
- Add npm cache for frontend dependencies (saves ~30-60s)
- Add Go module cache (saves ~20-40s)
- Add Playwright browser cache (saves ~40-60s)
- Remove excessive diagnostic output (saves ~10-20s)
- Total preflight savings: ~2-3 minutes

Docker build improvements:
- Enable Docker layer caching via registry (saves ~2-4 min per build)
- Cache stored in GHCR as :buildcache tags
- Reuses unchanged layers across releases
- First build takes the same time; subsequent builds are much faster
- Total Docker savings: ~4-8 minutes on releases with few changes

Expected total time reduction: 6-11 minutes on typical releases
No functionality sacrificed - all tests and validations remain.
2025-11-13 12:00:36 +00:00
rcourtman
bec2661270 Revert VERSION to 4.29.5 for clean testing 2025-11-13 11:58:51 +00:00
rcourtman
597fe6f5aa Require LLM-written changelogs in tag annotations
Remove GitHub auto-generation fallback. Tags MUST be annotated
with Claude-written release notes.

Why:
- LLMs write semantic, user-focused changelogs
- Filters out dev/internal commits
- Explains features in terms users understand
- GitHub's auto-gen is just raw commit dumps

Workflow now fails fast with clear error if tag lacks annotation.
2025-11-13 11:57:26 +00:00
rcourtman
66e0721739 Support Claude-written changelogs in tag annotations
Workflow now checks for annotated tags and uses the annotation
as release notes. If no annotation exists, falls back to GitHub's
auto-generation.

This allows Claude to write formatted release notes when creating
releases, stored directly in git history as part of the tag.
2025-11-13 11:56:02 +00:00
rcourtman
0bc91737fa Fix heredoc syntax in release workflow
Cannot use GitHub Actions template syntax inside single-quoted heredoc
2025-11-13 11:49:17 +00:00
rcourtman
355efd600a Refactor to tag-driven release workflow with auto-changelog
Major improvements:
- Trigger on tag push (git push origin vX.Y.Z) instead of workflow_dispatch
- Auto-generate release notes using GitHub's API
- Tag is single source of truth (eliminates version/tag mismatch)
- Follows industry standard pattern (Kubernetes, Docker, HashiCorp)
- Also push 'latest' tag to Docker registries
- Simpler workflow: update VERSION → commit → tag → push tag

Breaking change: Manual workflow_dispatch releases no longer supported.
Use: git tag vX.Y.Z && git push origin vX.Y.Z
2025-11-13 11:48:10 +00:00
rcourtman
ac94f54b54 Fix remote sync check in release trigger script
- Replace unreliable git fetch --dry-run check
- Use git rev-parse to compare local and remote commits
- Prevents false warnings about diverged branches
2025-11-13 11:43:36 +00:00
rcourtman
71f337f1f8 Prepare v4.30.0 release 2025-11-13 11:40:54 +00:00
rcourtman
6f85a0fef7 Revert VERSION to 4.29.5 for release testing 2025-11-13 11:38:37 +00:00
rcourtman
dd33891095 Add pre-flight validation script for releases
- Check VERSION file matches before triggering workflow
- Validate working directory is clean
- Confirm on main branch and up to date
- Load release notes from /tmp/release_notes_X.Y.Z.md
- Prevents wasting CI time on misconfigured releases
2025-11-13 11:36:53 +00:00
rcourtman
64c59a3f37 Prepare v4.30.0 release 2025-11-13 11:34:49 +00:00
rcourtman
7c895df1f3 Fix Proxmox 9.x VM status endpoint incompatibility
Proxmox VE 9.x removed support for the "full" parameter in the
/nodes/{node}/qemu/{vmid}/status/current endpoint. When Pulse sent
GetVMStatus() requests with ?full=1, Proxmox responded with:

  API error 400: {"errors":{"full":"property is not defined in schema..."}}

This caused the cluster client to mark ALL endpoints as unhealthy, which
cascaded into multiple failures:
- VM status checks failed
- Guest agent queries were blocked
- Filesystem data collection stopped working
- All Windows VMs showed disk:-1 (unknown) instead of actual disk usage

The fix removes the ?full=1 parameter since Proxmox 9.x returns all data
by default without needing this parameter. This maintains backward
compatibility with older Proxmox versions while fixing the issue in 9.x.

After this fix:
- Cluster endpoints are correctly marked as healthy
- Guest agent queries work properly
- Windows VMs report actual disk usage (e.g., 26% on C:\ drive)
- VM monitoring functions normally on Proxmox 9.x
2025-11-13 11:22:36 +00:00
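
The request shape described in the commit above can be sketched as follows. This is only an illustration, not the Pulse client code: the helper name `vmStatusPath` is hypothetical, and only the endpoint path and the removal of `?full=1` come from the commit message.

```go
package main

import (
	"fmt"
	"net/url"
)

// vmStatusPath (hypothetical helper) builds the path of the VM status
// endpoint named in the commit message. No query string is appended:
// per the commit, Proxmox VE 9.x rejects "?full=1" with an HTTP 400.
func vmStatusPath(node string, vmid int) string {
	return fmt.Sprintf("/nodes/%s/qemu/%d/status/current", url.PathEscape(node), vmid)
}

func main() {
	// "pve1" and 100 are made-up example values.
	fmt.Println(vmStatusPath("pve1", 100))
}
```

Dropping the parameter, rather than sending it conditionally by server version, means the same request goes to every Proxmox release.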
rcourtman
456bf20c54 Simplify Remember Me label text
Remove '30 days' from label - most apps just say 'Remember me' without
specifying the duration.
2025-11-13 10:40:53 +00:00
rcourtman
aaeb5a458e Add Remember Me feature with sliding session expiration (Related to #707)
Implements a "Remember Me" option that allows users to stay logged in
for 30 days instead of the default 24 hours. This addresses the pain
point of frequent re-authentication in LAN-only environments while
maintaining authentication security.

Backend changes:
- Add rememberMe field to login request handling
- Support variable session durations (24h default, 30d with Remember Me)
- Implement sliding session expiration that extends sessions on each
  authenticated request using the original duration
- Store OriginalDuration in session data for proper sliding window
- Update session cookie MaxAge to match session duration

Frontend changes:
- Add "Remember Me for 30 days" checkbox to login form
- Pass rememberMe flag in login request
- Improve UI with clear duration indication

Key features:
- Sessions extend automatically on each request (sliding window)
- Original duration preserved across session extension
- Backward compatible with existing sessions (legacy sessions work)
- Sessions persist across server restarts

This provides a better user experience for LAN deployments without
compromising security by completely disabling authentication.
2025-11-13 10:37:08 +00:00
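
A minimal sketch of the sliding expiration described in the commit above, assuming a simple in-memory session record. Only the `OriginalDuration` field name and the 24h/30d durations come from the commit message; the `session` type, `newSession`, and `touch` are hypothetical names used for illustration.

```go
package main

import (
	"fmt"
	"time"
)

const (
	defaultDuration    = 24 * time.Hour      // default session lifetime
	rememberMeDuration = 30 * 24 * time.Hour // lifetime with Remember Me
)

// session is a hypothetical record; only OriginalDuration is named in the commit.
type session struct {
	ExpiresAt        time.Time
	OriginalDuration time.Duration
}

// newSession picks the lifetime from the rememberMe flag and stores it so
// that later extensions reuse the same window.
func newSession(now time.Time, rememberMe bool) session {
	d := defaultDuration
	if rememberMe {
		d = rememberMeDuration
	}
	return session{ExpiresAt: now.Add(d), OriginalDuration: d}
}

// touch extends a live session by its original duration (sliding window).
// It returns false, and changes nothing, if the session has already expired.
func (s *session) touch(now time.Time) bool {
	if !now.Before(s.ExpiresAt) {
		return false
	}
	d := s.OriginalDuration
	if d == 0 {
		// Assumption: a legacy session without a stored duration gets the default.
		d = defaultDuration
	}
	s.ExpiresAt = now.Add(d)
	return true
}

func main() {
	start := time.Date(2025, 11, 13, 10, 0, 0, 0, time.UTC)
	s := newSession(start, true)
	later := start.Add(10 * 24 * time.Hour) // a request ten days after login
	fmt.Println(s.touch(later), s.ExpiresAt.Sub(later))
}
```

Storing the duration on the session, instead of re-deriving it on each request, is what lets a Remember Me session keep its 30-day window after every extension.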
rcourtman
6a5b8d698b Add critical safety guards to temperature proxy installation
After implementing the health gate, added comprehensive safety measures
to prevent the health checks themselves from becoming a new failure point.

**Problem**: The previous commit added strict health checks, but the checks
themselves could fail in edge cases:
- `pct exec` could hang if container stopped/frozen → installer deadlocks
- systemctl/journalctl might not be available → diagnostics fail
- Container access check could fail for transient reasons
- pvecm error detection was fragile (string matching specific messages)

**Solutions Implemented**:

1. **Timeouts on All External Commands** (install.sh:1596,1618)
   - `timeout 5` on systemctl checks
   - `timeout 10` on pct exec checks
   - Prevents installer from hanging indefinitely

2. **Graceful Degradation** (install.sh:1602-1630)
   - Check for systemctl/pct availability before using
   - Warn if tools missing instead of failing
   - Container check is warning-only (may be transient)
   - Only fail on critical checks: service running, socket exists

3. **Bypass Flag Support** (install.sh:1589-1594)
   - Set `PULSE_SKIP_HEALTH_CHECKS=1` to bypass all checks
   - Documented in error messages for troubleshooting
   - Allows installation in unsupported environments

4. **Flexible Diagnostics** (install.sh:1640-1647)
   - Use journalctl if available, fall back to syslog
   - Conditional tool-specific advice

5. **Broader Error Detection** (ssh.go:582-628)
   - List of 14 standalone indicators (vs 5 hardcoded checks)
   - Case-insensitive matching for localization tolerance
   - Permissive strategy: treat any known pattern as standalone
   - Handles variations: "no cluster", "IPC", "connection refused", etc.

6. **Enhanced Test Coverage** (ssh_test.go:+35 lines)
   - Added 3 new test cases (variation patterns)
   - Tests now cover 8 standalone scenarios + 3 negative cases
   - All tests pass (11/11)

**Impact**:
- Health gate won't block installation in edge cases
- Better user experience on non-standard setups
- Standalone detection handles more error message variations
- Clear escape hatch for troubleshooting (bypass flag)

**Confidence Level**: High
- All tests pass (bash syntax + Go unit tests)
- Graceful fallbacks for every external command
- Only critical checks are hard failures
- Warnings guide users through validation issues

Related to #571
2025-11-13 10:26:46 +00:00
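
The case-insensitive, permissive matching described under "Broader Error Detection" above might look roughly like this. It is a sketch, not the code in ssh.go: the function name `looksStandalone` is hypothetical, and the indicator list is an illustrative subset built from strings quoted in these commit messages rather than the actual list of 14.

```go
package main

import (
	"fmt"
	"strings"
)

// standaloneIndicators is an illustrative subset of patterns, stored in
// lower case so they can be compared against lower-cased command output.
var standaloneIndicators = []string{
	"corosync config does not exist",
	"ipcc_send_rec",
	"unknown error -1",
	"unable to load access control list",
	"no cluster",
	"connection refused",
}

// looksStandalone reports whether pvecm output contains any known indicator.
// Matching is permissive: a single hit is enough to treat the node as standalone.
func looksStandalone(output string) bool {
	lower := strings.ToLower(output)
	for _, indicator := range standaloneIndicators {
		if strings.Contains(lower, indicator) {
			return true
		}
	}
	return false
}

func main() {
	for _, msg := range []string{
		"ipcc_send_rec[1] failed: Unknown error -1", // standalone/LXC pattern
		"Permission denied",                         // genuine error, no fallback
	} {
		fmt.Printf("%q -> %v\n", msg, looksStandalone(msg))
	}
}
```

Lower-casing the output once and keeping the patterns in lower case keeps the comparison cheap. The trade-off of a permissive list is that a genuine failure containing one of these substrings would also be treated as standalone, which is why negative test cases matter here.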
rcourtman
b2dc91ed66 Add comprehensive tests for standalone node detection patterns
Tests validate the error pattern matching logic added in previous commit,
ensuring we correctly identify:

1. **Standalone Node Patterns** (should trigger fallback):
   - Classic: 'Corosync config does not exist'
   - LXC ipcc errors: 'ipcc_send_rec[1] failed: Unknown error -1'
   - Access control errors: 'Unable to load access control list'
   - All patterns from GitHub issue #571

2. **Genuine Errors** (should NOT trigger fallback):
   - Network timeouts
   - Permission denied
   - Command not found

Tests use real error messages from production GitHub issues to prevent
regressions. All 9 test cases pass.

Coverage:
- 6 standalone/LXC error patterns
- 3 genuine error cases (negative testing)
- References issue #571 for traceability

Related to #571
2025-11-13 10:17:57 +00:00
rcourtman
d3875eaae5 Dramatically improve temperature proxy installation robustness
Users were abandoning Pulse due to catastrophic temperature monitoring setup failures. This commit addresses the root causes:

**Problem 1: Silent Failures**
- Installations reported "SUCCESS" even when proxy never started
- UI showed green checkmarks with no temperature data
- Zero feedback when things went wrong

**Problem 2: Missing Diagnostics**
- Service failures logged only in journald
- Users saw "Something going on with the proxy" with no actionable guidance
- No way to troubleshoot from error messages

**Problem 3: Standalone Node Issues**
- Proxy daemon logged continuous pvecm errors as warnings
- "ipcc_send_rec" and "Unknown error -1" messages confused users
- These are expected for non-clustered/LXC setups

**Solutions Implemented:**

1. **Health Gate in install.sh (lines 1588-1629)**
   - Verify service is running after installation
   - Check socket exists on host
   - Confirm socket visible inside container via bind mount
   - Fail loudly with specific diagnostics if any check fails

2. **Actionable Error Messages in install-sensor-proxy.sh (lines 822-877)**
   - When service fails to start: dump full systemctl status + 40 lines of logs
   - When socket missing: show permissions, service status, and remediation command
   - Include common issues checklist (missing user, permission errors, lm-sensors, etc.)
   - Direct link to troubleshooting docs

3. **Better Standalone Node Detection in ssh.go (lines 585-595)**
   - Recognize "Unknown error -1" and "Unable to load access control list" as LXC indicators
   - Log at INFO level (not WARN) since this is expected behavior
   - Clarify message: "using localhost for temperature collection"

**Impact:**
- Eliminates "green checkmark but no temps" scenario
- Users get immediate actionable feedback on failures
- Standalone/LXC installations work silently without error spam
- Reduces support burden from #571 (15+ comments of user frustration)

Related to #571
2025-11-13 10:14:19 +00:00
rcourtman
52ee702187 Revert "Document release notes input requirement"
This reverts commit 5c06597b6e.
2025-11-13 09:41:43 +00:00
rcourtman
526fbdc1d4 Document release notes input requirement 2025-11-13 09:40:21 +00:00
rcourtman
941905c06a Require release notes input for workflow 2025-11-13 09:37:38 +00:00
rcourtman
60de1cade6 Polish release notes fallback 2025-11-13 09:10:43 +00:00
rcourtman
57c58eb7a6 Add deterministic release notes fallback 2025-11-13 00:00:25 +00:00
rcourtman
85217afb1d Improve release notes fallback 2025-11-12 23:40:26 +00:00
rcourtman
e4f7ca04fb Prepare v4.29.6 release 2025-11-12 23:19:20 +00:00
rcourtman
ddac48e640 Ensure agent ID collisions respect token boundaries (Related to #658) 2025-11-12 22:46:56 +00:00
rcourtman
82a2eebb3f Improve update integration diagnostics 2025-11-12 22:27:05 +00:00
rcourtman
ca1ef66d77 Ensure alert cooldown persists reliably
Related to #706
2025-11-12 21:52:24 +00:00
rcourtman
48e5e854c6 Improve auth timeout handling (related to #705) 2025-11-12 21:50:53 +00:00
rcourtman
ca81f70afd Make API update test resilient to stale cookies 2025-11-12 21:32:33 +00:00
rcourtman
accd1c642f Use internal mock host for release assets 2025-11-12 21:23:01 +00:00
rcourtman
8330e1ea05 Fix API update test to send CSRF token 2025-11-12 21:13:46 +00:00
rcourtman
e27d9f7deb Preserve storage backups after partial failures (Related to #704) 2025-11-12 21:10:18 +00:00
rcourtman
6a1a88217f Add release dry run workflow and API update integration test 2025-11-12 21:02:52 +00:00
rcourtman
6979f3f57a Related to discussion #702: harden uninstall cleanup 2025-11-12 19:46:45 +00:00
rcourtman
305a5b17bc Handle Snap Docker home restrictions (Related to #693) 2025-11-12 19:20:04 +00:00
rcourtman
0899e36ad2 Improve sensor proxy cluster validation (Related to #703) 2025-11-12 19:17:45 +00:00
rcourtman
d2247d3ad7 Related to #692: Skip unsupported guest OS info calls 2025-11-12 19:17:09 +00:00
rcourtman
18bf8d2b3a Handle docker validation under errexit
Related to #693
2025-11-12 18:34:53 +00:00
rcourtman
4a0637957c Related to #698: harden installer release detection 2025-11-12 17:56:16 +00:00
rcourtman
798f91792b Improve sensor-proxy release detection (related to #701) 2025-11-12 17:49:20 +00:00
rcourtman
ff900895be Ensure release validation handles published edits (related to #669) 2025-11-12 17:33:30 +00:00
rcourtman
f61b850179 Ensure VM status requests always return meminfo (Related to #694) 2025-11-12 17:30:10 +00:00
rcourtman
c8c406172a Auto-update Helm chart version to 4.29.5 2025-11-12 17:15:02 +00:00
rcourtman
9b3247b4ce Auto-update Helm chart documentation 2025-11-12 17:15:01 +00:00
rcourtman
40de26a826 Skip helm-docs commits during release workflows 2025-11-12 17:14:31 +00:00
rcourtman
77d4229a6d Bump version to 4.29.5 2025-11-12 16:34:22 +00:00
rcourtman
8865916fb6 Fix missing regexp import for path traversal validation 2025-11-12 16:34:16 +00:00