Dramatically improve temperature proxy installation robustness

Users were abandoning Pulse due to catastrophic temperature monitoring setup failures. This commit addresses the root causes:

**Problem 1: Silent Failures**
- Installations reported "SUCCESS" even when proxy never started
- UI showed green checkmarks with no temperature data
- Zero feedback when things went wrong

**Problem 2: Missing Diagnostics**
- Service failures logged only in journald
- Users saw "Something going on with the proxy" with no actionable guidance
- No way to troubleshoot from error messages

**Problem 3: Standalone Node Issues**
- Proxy daemon logged continuous pvecm errors as warnings
- "ipcc_send_rec" and "Unknown error -1" messages confused users
- These are expected for non-clustered/LXC setups

**Solutions Implemented:**

1. **Health Gate in install.sh (lines 1588-1629)**
   - Verify service is running after installation
   - Check socket exists on host
   - Confirm socket visible inside container via bind mount
   - Fail loudly with specific diagnostics if any check fails

2. **Actionable Error Messages in install-sensor-proxy.sh (lines 822-877)**
   - When service fails to start: dump full systemctl status + 40 lines of logs
   - When socket missing: show permissions, service status, and remediation command
   - Include common issues checklist (missing user, permission errors, lm-sensors, etc.)
   - Direct link to troubleshooting docs

3. **Better Standalone Node Detection in ssh.go (lines 585-595)**
   - Recognize "Unknown error -1" and "Unable to load access control list" as LXC indicators
   - Log at INFO level (not WARN) since this is expected behavior
   - Clarify message: "using localhost for temperature collection"

**Impact:**
- Eliminates "green checkmark but no temps" scenario
- Users get immediate actionable feedback on failures
- Standalone/LXC installations work silently without error spam
- Reduces support burden from #571 (15+ comments of user frustration)

Related to #571
This commit is contained in:
rcourtman 2025-11-13 10:14:19 +00:00
parent 52ee702187
commit d3875eaae5
3 changed files with 93 additions and 9 deletions

View file

@ -1583,7 +1583,50 @@ fi'; then
fi
if bash "$proxy_script" "${proxy_install_args[@]}" 2>&1 | tee /tmp/proxy-install-${CTID}.log; then
print_info "Temperature proxy installed successfully"
print_info "Temperature proxy installation script completed"
# Verify proxy is actually working
echo
print_info "Verifying temperature proxy health..."
local proxy_health_ok=true
# Check 1: Service is running
if ! systemctl is-active --quiet pulse-sensor-proxy 2>/dev/null; then
print_error "✗ Service not running"
proxy_health_ok=false
else
print_info "✓ Service running"
fi
# Check 2: Socket exists
if [[ ! -S /run/pulse-sensor-proxy/pulse-sensor-proxy.sock ]]; then
print_error "✗ Socket not found at /run/pulse-sensor-proxy/pulse-sensor-proxy.sock"
proxy_health_ok=false
else
print_info "✓ Socket exists"
fi
# Check 3: Socket is accessible from container
if ! pct exec $CTID -- test -S /mnt/pulse-proxy/pulse-sensor-proxy.sock 2>/dev/null; then
print_error "✗ Socket not visible inside container at /mnt/pulse-proxy/pulse-sensor-proxy.sock"
print_error " Bind mount may not be configured correctly"
proxy_health_ok=false
else
print_info "✓ Socket accessible from container"
fi
if [[ "$proxy_health_ok" != "true" ]]; then
echo
print_error "Temperature proxy health check failed"
print_error "See diagnostics above and logs: /tmp/proxy-install-${CTID}.log"
print_error ""
print_error "Check: systemctl status pulse-sensor-proxy"
print_error "Check: journalctl -u pulse-sensor-proxy -n 50"
echo
exit 1
fi
print_success "Temperature proxy is healthy and ready"
# Clean up temporary binary if it was copied
[[ -f "$local_proxy_binary" ]] && rm -f "$local_proxy_binary"
else