Tag push triggers in GitHub Actions are unreliable (known issue).
Major projects don't actually use automatic tag triggers - they use
workflow_dispatch or other manual triggers.
Changes:
- Remove tag push trigger
- Use workflow_dispatch with version input
- Workflow validates that annotated tag already exists
- Tag still stores LLM changelog in annotation
- Manual trigger: gh workflow run release.yml -f version=X.Y.Z
This is the pattern that actually works reliably.
GitHub Actions has a known issue where tag pushes sometimes don't
trigger workflows. Add workflow_dispatch as a backup trigger that
accepts a tag parameter.
This allows manual triggering if automatic tag push trigger fails.
Preflight tests improvements:
- Add npm cache for frontend dependencies (saves ~30-60s)
- Add Go module cache (saves ~20-40s)
- Add Playwright browser cache (saves ~40-60s)
- Remove excessive diagnostic output (saves ~10-20s)
- Total preflight savings: ~2-3 minutes
Docker build improvements:
- Enable Docker layer caching via registry (saves ~2-4 min per build)
- Cache stored in GHCR as :buildcache tags
- Reuses unchanged layers across releases
- First build same time, subsequent builds much faster
- Total Docker savings: ~4-8 minutes on releases with few changes
Expected total time reduction: 6-11 minutes on typical releases
No functionality sacrificed - all tests and validations remain.
Remove GitHub auto-generation fallback. Tags MUST be annotated
with Claude-written release notes.
Why:
- LLMs write semantic, user-focused changelogs
- Filters out dev/internal commits
- Explains features in terms users understand
- GitHub's auto-gen is just raw commit dumps
Workflow now fails fast with clear error if tag lacks annotation.
Workflow now checks for annotated tags and uses the annotation
as release notes. If no annotation exists, falls back to GitHub's
auto-generation.
This allows Claude to write formatted release notes when creating
releases, stored directly in git history as part of the tag.
Major improvements:
- Trigger on tag push (git push origin vX.Y.Z) instead of workflow_dispatch
- Auto-generate release notes using GitHub's API
- Tag is single source of truth (eliminates version/tag mismatch)
- Follows industry standard pattern (Kubernetes, Docker, HashiCorp)
- Also push 'latest' tag to Docker registries
- Simpler workflow: update VERSION → commit → tag → push tag
Breaking change: Manual workflow_dispatch releases no longer supported.
Use: git tag vX.Y.Z && git push origin vX.Y.Z
- Replace unreliable git fetch --dry-run check
- Use git rev-parse to compare local and remote commits
- Prevents false warnings about diverged branches
- Check VERSION file matches before triggering workflow
- Validate working directory is clean
- Confirm on main branch and up to date
- Load release notes from /tmp/release_notes_X.Y.Z.md
- Prevents wasting CI time on misconfigured releases
Proxmox VE 9.x removed support for the "full" parameter in the
/nodes/{node}/qemu/{vmid}/status/current endpoint. When Pulse sent
GetVMStatus() requests with ?full=1, Proxmox responded with:
API error 400: {"errors":{"full":"property is not defined in schema..."}}
This caused the cluster client to mark ALL endpoints as unhealthy, which
cascaded into multiple failures:
- VM status checks failed
- Guest agent queries were blocked
- Filesystem data collection stopped working
- All Windows VMs showed disk:-1 (unknown) instead of actual disk usage
The fix removes the ?full=1 parameter since Proxmox 9.x returns all data
by default without needing this parameter. This maintains backward
compatibility with older Proxmox versions while fixing the issue in 9.x.
After this fix:
- Cluster endpoints are correctly marked as healthy
- Guest agent queries work properly
- Windows VMs report actual disk usage (e.g., 26% on C:\ drive)
- VM monitoring functions normally on Proxmox 9.x
Implements a "Remember Me" option that allows users to stay logged in
for 30 days instead of the default 24 hours. This addresses the pain
point of frequent re-authentication in LAN-only environments while
maintaining authentication security.
Backend changes:
- Add rememberMe field to login request handling
- Support variable session durations (24h default, 30d with Remember Me)
- Implement sliding session expiration that extends sessions on each
authenticated request using the original duration
- Store OriginalDuration in session data for proper sliding window
- Update session cookie MaxAge to match session duration
Frontend changes:
- Add "Remember Me for 30 days" checkbox to login form
- Pass rememberMe flag in login request
- Improve UI with clear duration indication
Key features:
- Sessions extend automatically on each request (sliding window)
- Original duration preserved across session extension
- Backward compatible with existing sessions (legacy sessions work)
- Sessions persist across server restarts
This provides a better user experience for LAN deployments without
compromising security by completely disabling authentication.
After implementing the health gate, added comprehensive safety measures
to prevent the health checks themselves from becoming a new failure point.
**Problem**: Previous commit added strict health checks but could fail in
edge cases:
- `pct exec` could hang if container stopped/frozen → installer deadlocks
- systemctl/journalctl might not be available → diagnostics fail
- Container access check could fail for transient reasons
- pvecm error detection was fragile (string matching specific messages)
**Solutions Implemented**:
1. **Timeouts on All External Commands** (install.sh:1596,1618)
- `timeout 5` on systemctl checks
- `timeout 10` on pct exec checks
- Prevents installer from hanging indefinitely
2. **Graceful Degradation** (install.sh:1602-1630)
- Check for systemctl/pct availability before using
- Warn if tools missing instead of failing
- Container check is warning-only (may be transient)
- Only fail on critical checks: service running, socket exists
3. **Bypass Flag Support** (install.sh:1589-1594)
- Set `PULSE_SKIP_HEALTH_CHECKS=1` to bypass all checks
- Documented in error messages for troubleshooting
- Allows installation in unsupported environments
4. **Flexible Diagnostics** (install.sh:1640-1647)
- Use journalctl if available, fallback to syslog
- Conditional tool-specific advice
5. **Broader Error Detection** (ssh.go:582-628)
- List of 14 standalone indicators (vs 5 hardcoded checks)
- Case-insensitive matching for localization tolerance
- Permissive strategy: treat any known pattern as standalone
- Handles variations: "no cluster", "IPC", "connection refused", etc.
6. **Enhanced Test Coverage** (ssh_test.go:+35 lines)
- Added 3 new test cases (variation patterns)
- Tests now cover 8 standalone scenarios + 3 negative cases
- All tests pass (11/11)
**Impact**:
- Health gate won't block installation in edge cases
- Better user experience on non-standard setups
- Standalone detection handles more error message variations
- Clear escape hatch for troubleshooting (bypass flag)
**Confidence Level**: High
- All tests pass (bash syntax + Go unit tests)
- Graceful fallbacks for every external command
- Only critical checks are hard failures
- Warnings guide users through validation issues
Related to #571
Tests validate the error pattern matching logic added in previous commit,
ensuring we correctly identify:
1. **Standalone Node Patterns** (should trigger fallback):
- Classic: 'Corosync config does not exist'
- LXC ipcc errors: 'ipcc_send_rec[1] failed: Unknown error -1'
- Access control errors: 'Unable to load access control list'
- All patterns from GitHub issue #571
2. **Genuine Errors** (should NOT trigger fallback):
- Network timeouts
- Permission denied
- Command not found
Tests use real error messages from production GitHub issues to prevent
regressions. All 9 test cases pass.
Coverage:
- 6 standalone/LXC error patterns
- 3 genuine error cases (negative testing)
- References issue #571 for traceability
Related to #571
Users were abandoning Pulse due to catastrophic temperature monitoring setup failures. This commit addresses the root causes:
**Problem 1: Silent Failures**
- Installations reported "SUCCESS" even when proxy never started
- UI showed green checkmarks with no temperature data
- Zero feedback when things went wrong
**Problem 2: Missing Diagnostics**
- Service failures logged only in journald
- Users saw "Something going on with the proxy" with no actionable guidance
- No way to troubleshoot from error messages
**Problem 3: Standalone Node Issues**
- Proxy daemon logged continuous pvecm errors as warnings
- "ipcc_send_rec" and "Unknown error -1" messages confused users
- These are expected for non-clustered/LXC setups
**Solutions Implemented:**
1. **Health Gate in install.sh (lines 1588-1629)**
- Verify service is running after installation
- Check socket exists on host
- Confirm socket visible inside container via bind mount
- Fail loudly with specific diagnostics if any check fails
2. **Actionable Error Messages in install-sensor-proxy.sh (lines 822-877)**
- When service fails to start: dump full systemctl status + 40 lines of logs
- When socket missing: show permissions, service status, and remediation command
- Include common issues checklist (missing user, permission errors, lm-sensors, etc.)
- Direct link to troubleshooting docs
3. **Better Standalone Node Detection in ssh.go (lines 585-595)**
- Recognize "Unknown error -1" and "Unable to load access control list" as LXC indicators
- Log at INFO level (not WARN) since this is expected behavior
- Clarify message: "using localhost for temperature collection"
**Impact:**
- Eliminates "green checkmark but no temps" scenario
- Users get immediate actionable feedback on failures
- Standalone/LXC installations work silently without error spam
- Reduces support burden from #571 (15+ comments of user frustration)
Related to #571