Move the guest-agent file-read of /proc/meminfo earlier in the memory
fallback chain so it runs before RRD, giving real-time MemAvailable that
correctly excludes reclaimable buff/cache on Linux VMs. Also add
VM.GuestAgent.FileRead permission for PVE 9 and fix install.sh to use
comma-separated privilege strings.
The auto-update flow stops the Pulse service before applying updates.
If the update fails, the rollback path restored files but never
restarted the service. Since the main unit was explicitly stopped
(not crashed), systemd's Restart=always didn't rescue it.
Add restart-on-failure guards to both pulse-auto-update.sh and
install.sh so Pulse is always restarted after a failed update attempt.
The backup_existing function copied the entire config directory
(including metrics.db at ~2.5GB) on every upgrade with no cleanup.
On small VMs this filled the disk within a few releases.
The upgrade only swaps the binary; config files are not modified,
so the backup served no practical purpose.
The v4 installer added mount entries for /run/pulse-sensor-proxy to LXC
container configs. After upgrading to v5 and rebooting, /run (tmpfs) is
wiped and the container fails to start. The installer now detects and
removes these stale mp<N> and lxc.mount.entry references automatically
when run on a PVE host, and the upgrade docs include manual fix steps.
On SELinux-enforcing systems (Fedora, RHEL, CentOS), binaries installed to
non-standard locations need proper security contexts for systemd to execute
them. Without this, systemd fails with 'Permission denied' even when the
binary has correct Unix permissions.
Changes:
- Add restore_selinux_contexts() function to both install scripts
- Uses restorecon (preferred) or chcon (fallback) to set bin_t context
- Only runs when SELinux is detected and enforcing
- Called after binary installation, before systemd service start
Alpine uses apk/OpenRC instead of apt/systemd, which the Pulse
LXC installation flow requires. This prevents failed installations.
- Remove Alpine download option from advanced mode
- Add note that Pulse requires Debian-based templates
- Add validation when user selects from template list to catch
Alpine/Gentoo/Arch/Void and fall back to Debian 12 with warning
Related to #915
The unified agent now handles temperature monitoring in v5+, making
pulse-sensor-proxy unnecessary. This commit:
1. Adds INSTALLER_MAJOR_VERSION constant to declare bundled version
2. Skips 'Temperature Monitoring Setup' prompts for v5+ installs
3. Skips sensor proxy installation entirely for v5+
4. Updates help text to mark --proxy as deprecated for v5+
5. Removes outdated sensor proxy instructions from completion message
Fixes the 'pct pull TASK ERROR: failed to open /opt/pulse/bin/pulse-sensor-proxy-linux-amd64'
error reported by users installing v5.0.0-rc.3.
Reported-by: RLSinRFV (GitHub Discussion #845)
Addresses #827
- Added 3-retry logic with 2-second delays between attempts
- Increased timeout from 15s to 30s for slower connections
- Show actual curl error instead of suppressing stderr
- Provide workaround instructions (download manually then run)
- Show the URL being downloaded for easier debugging
The installer was missing:
1. copy_unified_agent_binaries_from_dir() to extract pulse-agent-* binaries
from the release tarball to /opt/pulse/bin/
2. install.sh and install.ps1 in the deploy_agent_scripts() array
This caused /install.sh and /download/pulse-agent to return 404 on fresh
installations and upgrades from pre-4.33.0 versions.
Related to #760, #751
The unified agent system replaced install-host-agent.sh with install.sh.
This commit updates all references:
- Dockerfile: removed COPY for deleted script
- router.go: serve install.sh at /install-host-agent.sh endpoint (backwards compatible)
- build-release.sh: removed copy of deleted script
- validate-release.sh: removed validation of deleted script
- install.sh: updated script list for bare-metal installs
Scripts like install.sh and install-sensor-proxy.sh are now attached
as release assets and downloaded from releases/latest/download/ URLs.
This ensures users always get scripts compatible with their installed
version, even while development continues on main.
Changes:
- build-release.sh: copy install scripts to release directory
- create-release.yml: upload scripts as release assets
- Updated all documentation and code references to use release URLs
- Scripts reference each other via release URLs for consistency
The grep pattern was looking for 'pulse-sensor-proxy' as a standalone
string, but the actual mount line contains paths like:
mp0: /run/pulse-sensor-proxy,mp=/mnt/pulse-proxy,replicate=0
This caused the removal logic to never execute, leaving the old mp
mount in place and preventing the migration to lxc.mount.entry format.
Changed pattern to match either path component:
- /pulse-sensor-proxy (source path)
- /mnt/pulse-proxy (mount point)
Also removed space after colon in pattern to match actual format.
This completes the fix for temperature proxy setup on LXC containers.
The /etc/pve/ directory is a clustered FUSE filesystem (pmxcfs) managed
by Proxmox. Direct modifications using sed -i or echo >> don't work
reliably on this filesystem, and LXC config files contain snapshot
sections that must be preserved.
Changes:
- Use temp file approach: copy config, modify temp, copy back to trigger sync
- Only modify main config section (before first [snapshot] marker)
- Properly handle both mp mount removal and lxc.mount.entry addition
- Apply fix to both install.sh and install-sensor-proxy.sh
This fixes temperature proxy setup failures where the socket mount
entry wasn't being persisted to the container configuration.
Related to #628
After implementing the health gate, added comprehensive safety measures
to prevent the health checks themselves from becoming a new failure point.
**Problem**: Previous commit added strict health checks but could fail in
edge cases:
- `pct exec` could hang if container stopped/frozen → installer deadlocks
- systemctl/journalctl might not be available → diagnostics fail
- Container access check could fail for transient reasons
- pvecm error detection was fragile (string matching specific messages)
**Solutions Implemented**:
1. **Timeouts on All External Commands** (install.sh:1596,1618)
- `timeout 5` on systemctl checks
- `timeout 10` on pct exec checks
- Prevents installer from hanging indefinitely
2. **Graceful Degradation** (install.sh:1602-1630)
- Check for systemctl/pct availability before using
- Warn if tools missing instead of failing
- Container check is warning-only (may be transient)
- Only fail on critical checks: service running, socket exists
3. **Bypass Flag Support** (install.sh:1589-1594)
- Set `PULSE_SKIP_HEALTH_CHECKS=1` to bypass all checks
- Documented in error messages for troubleshooting
- Allows installation in unsupported environments
4. **Flexible Diagnostics** (install.sh:1640-1647)
- Use journalctl if available, fallback to syslog
- Conditional tool-specific advice
5. **Broader Error Detection** (ssh.go:582-628)
- List of 14 standalone indicators (vs 5 hardcoded checks)
- Case-insensitive matching for localization tolerance
- Permissive strategy: treat any known pattern as standalone
- Handles variations: "no cluster", "IPC", "connection refused", etc.
6. **Enhanced Test Coverage** (ssh_test.go:+35 lines)
- Added 3 new test cases (variation patterns)
- Tests now cover 8 standalone scenarios + 3 negative cases
- All tests pass (11/11)
**Impact**:
- Health gate won't block installation in edge cases
- Better user experience on non-standard setups
- Standalone detection handles more error message variations
- Clear escape hatch for troubleshooting (bypass flag)
**Confidence Level**: High
- All tests pass (bash syntax + Go unit tests)
- Graceful fallbacks for every external command
- Only critical checks are hard failures
- Warnings guide users through validation issues
Related to #571
Users were abandoning Pulse due to catastrophic temperature monitoring setup failures. This commit addresses the root causes:
**Problem 1: Silent Failures**
- Installations reported "SUCCESS" even when proxy never started
- UI showed green checkmarks with no temperature data
- Zero feedback when things went wrong
**Problem 2: Missing Diagnostics**
- Service failures logged only in journald
- Users saw "Something going on with the proxy" with no actionable guidance
- No way to troubleshoot from error messages
**Problem 3: Standalone Node Issues**
- Proxy daemon logged continuous pvecm errors as warnings
- "ipcc_send_rec" and "Unknown error -1" messages confused users
- These are expected for non-clustered/LXC setups
**Solutions Implemented:**
1. **Health Gate in install.sh (lines 1588-1629)**
- Verify service is running after installation
- Check socket exists on host
- Confirm socket visible inside container via bind mount
- Fail loudly with specific diagnostics if any check fails
2. **Actionable Error Messages in install-sensor-proxy.sh (lines 822-877)**
- When service fails to start: dump full systemctl status + 40 lines of logs
- When socket missing: show permissions, service status, and remediation command
- Include common issues checklist (missing user, permission errors, lm-sensors, etc.)
- Direct link to troubleshooting docs
3. **Better Standalone Node Detection in ssh.go (lines 585-595)**
- Recognize "Unknown error -1" and "Unable to load access control list" as LXC indicators
- Log at INFO level (not WARN) since this is expected behavior
- Clarify message: "using localhost for temperature collection"
**Impact:**
- Eliminates "green checkmark but no temps" scenario
- Users get immediate actionable feedback on failures
- Standalone/LXC installations work silently without error spam
- Reduces support burden from #571 (15+ comments of user frustration)
Related to #571
The installer was constructing malformed download URLs like:
https://github.com/.../download/location: https://github.com/.../pulse-location: ...
This occurred when the latest GitHub release is a draft:
1. /releases/latest API returns nothing (drafts don't count as "latest")
2. Fallback redirect scraper gets "location: .../releases" (no /tag/)
3. sed regex fails to match but echoes the entire header line
4. That malformed string becomes LATEST_RELEASE, breaking the download URL
Fixed by:
1. Switch both stable and RC channels to use /releases endpoint
2. Filter JSON to get first non-draft (and non-prerelease for stable)
3. Harden redirect scraper to only match when /tag/ is actually present
4. Fall through to v4.5.1 hardcoded fallback if both methods fail
This ensures the installer works correctly when latest release is draft,
during DNS issues, and when GitHub API is unavailable.
The grep pattern was too loose and could match filenames like:
- pulse-v4.29.0-linux-amd64.tar.gz (correct)
- pulse-v4.29.0-linux-amd64.tar.gz.sha256 (also matched)
Using grep -w ensures we only match the exact filename as a complete word,
preventing false matches on files with the same prefix.
The install script now tries checksums.txt first (v4.29.0+), then falls back
to individual .sha256 files (v4.28.0 and earlier). This ensures users can
update from any version regardless of which checksum format was used.
This fixes the release format transition issue where changing asset structure
broke updates for users on older versions.
Aligns with release asset reduction changes. The install script now downloads the unified checksums.txt file and extracts the checksum for the specific architecture being installed.
Related to #681
The variable local_proxy_binary was declared with local scope inside
the BUILD_FROM_SOURCE conditional block but referenced outside of it
during cleanup. This caused "unbound variable" errors on release installs
since the script uses set -u.
Moved the declaration before the conditional block and initialize to empty
string. The cleanup code [[ -f "$local_proxy_binary" ]] already handles
the empty string case safely.
This commit resolves the recurring temperature monitoring failures that have plagued multiple releases:
1. **Fix user mismatch (v4.27.1 regression)**:
- Changed binary default user from 'pulse-sensor' to 'pulse-sensor-proxy'
- Aligns with the user created by install-sensor-proxy.sh (line 389)
- Prevents panic when binary is run outside systemd context
- Systemd unit already uses User=pulse-sensor-proxy, so this makes manual runs work too
2. **Fix standalone node validation (v4.25.0+ regression)**:
- pvecm status exits with code 2 on standalone nodes (not in a cluster)
- This caused validation to fail, rejecting all temperature requests
- Added discoverLocalHostAddresses() helper that discovers actual host IPs/hostnames
- On standalone nodes, cluster membership list is populated with host's own addresses
- Maintains SSRF protection while allowing standalone operation
- Added comprehensive test coverage
3. **Make installer fail loudly on proxy setup failure**:
- Previously, failed proxy installation only printed a warning
- Install script then claimed "Pulse installation complete!" (confusing for users)
- Now exits with clear error message and remediation steps
- Forces operators to fix proxy issues before claiming success
- Users who skip temperature monitoring are unaffected
4. **Add test coverage to prevent future regressions**:
- Added TestDiscoverLocalHostAddresses to verify local address discovery
- Validates no loopback or link-local addresses are returned
- All existing tests pass with new changes
Pattern of failures across releases:
- v4.23.0: Missing proxy binaries in release
- v4.24.0-rc.3: AMD CPU sensor naming (Tctl vs Tdie)
- v4.25.0: Single-node pvecm status exit code
- v4.27.1: User mismatch (pulse-sensor vs pulse-sensor-proxy)
This comprehensive fix addresses the root causes rather than applying another tactical patch.
Related to #571
The 5-second connect timeout was too aggressive for DNS resolution in some
Proxmox LXC environments, causing "Resolving timed out after 5000 milliseconds"
errors when downloading the auto-update script from raw.githubusercontent.com.
Changes:
- Add download_auto_update_script() helper with retry logic
- Increase connect timeout from 5s to 15s for slow DNS
- Increase max time from 15s to 60s for complete transfer
- Retry up to 3 times with incremental backoff (3s, 6s delays)
- Gracefully degrade: installer continues without auto-updates if download fails
- Users can re-run with --enable-auto-updates later when connectivity improves
Updated the Quick Start for Docker section in TEMPERATURE_MONITORING.md to be
more user-friendly and address common setup issues:
- Added clear explanation of why the proxy is needed (containers can't access hardware)
- Provided concrete IP example instead of placeholder
- Showed full docker-compose.yml context with proper YAML structure
- Added sudo to commands where needed
- Updated docker-compose commands to v2 syntax with note about v1
- Expanded verification steps with clearer success indicators
- Added reminder to check container name in verification commands
These improvements should help users who encounter blank temperature displays
due to missing proxy installation or bind mount configuration.
The bare metal installer was not copying pulse-host-agent binaries from
release tarballs into /opt/pulse/bin/, causing 404 errors when users
tried to install the host agent via the download endpoint.
Changes:
- Copy pulse-host-agent binary during initial installation (alongside
pulse-docker-agent)
- Update install_additional_agent_binaries() to fetch and install
cross-platform host agent binaries (linux-amd64, linux-arm64,
linux-armv7, darwin-amd64, darwin-arm64, windows-amd64)
- Match existing pattern used for Docker agent distribution
The build pipeline (build-release.sh and Dockerfile) already correctly
includes host agent binaries in releases and Docker images. This fix
ensures the installer deploys them.
Users on bare metal deployments should rerun install.sh to populate
/opt/pulse/bin/ with the missing host agent binaries. Docker
deployments are unaffected.