Fix persistent temperature monitoring issues for standalone Proxmox nodes (addresses #571)

This commit resolves the recurring temperature monitoring failures that have plagued multiple releases:

1. **Fix user mismatch (v4.27.1 regression)**:
   - Changed binary default user from 'pulse-sensor' to 'pulse-sensor-proxy'
   - Aligns with the user created by install-sensor-proxy.sh (line 389)
   - Prevents panic when binary is run outside systemd context
   - Systemd unit already uses User=pulse-sensor-proxy, so this makes manual runs work too

2. **Fix standalone node validation (v4.25.0+ regression)**:
   - pvecm status exits with code 2 on standalone nodes (not in a cluster)
   - This caused validation to fail, rejecting all temperature requests
   - Added discoverLocalHostAddresses() helper that discovers actual host IPs/hostnames
   - On standalone nodes, cluster membership list is populated with host's own addresses
   - Maintains SSRF protection while allowing standalone operation
   - Added comprehensive test coverage

3. **Make installer fail loudly on proxy setup failure**:
   - Previously, failed proxy installation only printed a warning
   - Install script then claimed "Pulse installation complete!" (confusing for users)
   - Now exits with clear error message and remediation steps
   - Forces operators to fix proxy issues before claiming success
   - Users who skip temperature monitoring are unaffected

4. **Add test coverage to prevent future regressions**:
   - Added TestDiscoverLocalHostAddresses to verify local address discovery
   - Validates no loopback or link-local addresses are returned
   - All existing tests pass with new changes

Pattern of failures across releases:
- v4.23.0: Missing proxy binaries in release
- v4.24.0-rc.3: AMD CPU sensor naming (Tctl vs Tdie)
- v4.25.0: Single-node pvecm status exit code
- v4.27.1: User mismatch (pulse-sensor vs pulse-sensor-proxy)

This comprehensive fix addresses the root causes rather than applying another tactical patch.

Related to #571
This commit is contained in:
rcourtman 2025-11-09 16:53:14 +00:00
parent 62a9f40cc7
commit c9d1671afd
4 changed files with 138 additions and 7 deletions

View file

@ -1427,9 +1427,21 @@ fi'; then
# Clean up temporary binary if it was copied
[[ -f "$local_proxy_binary" ]] && rm -f "$local_proxy_binary"
else
print_warn "Proxy installation failed - temperature monitoring will use fallback method"
print_info "Check logs: /tmp/proxy-install-${CTID}.log"
print_info "Or run manually: curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/install-sensor-proxy.sh | bash -s -- --ctid $CTID"
# Proxy installation failed - this is a fatal error since user opted in
rm -f "$proxy_script"
echo
print_error "Temperature proxy installation failed"
echo
echo "Temperature monitoring was enabled but the proxy setup failed."
echo "Check logs: /tmp/proxy-install-${CTID}.log"
echo
echo "To fix, you can either:"
echo " 1. Fix the issue and run the proxy installer manually:"
echo " curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/install-sensor-proxy.sh | bash -s -- --ctid $CTID"
echo
echo " 2. Re-run the Pulse installer and skip temperature monitoring when prompted"
echo
exit 1
fi
rm -f "$proxy_script"
fi