Pulse/cmd/pulse-sensor-proxy
rcourtman 8e993ea901 Add critical safety guards to temperature proxy installation
After implementing the health gate, added comprehensive safety measures
to prevent the health checks themselves from becoming a new failure point.

**Problem**: The previous commit added strict health checks, but the checks
themselves could fail in edge cases:
- `pct exec` could hang if container stopped/frozen → installer deadlocks
- systemctl/journalctl might not be available → diagnostics fail
- Container access check could fail for transient reasons
- pvecm error detection was fragile (it string-matched a few specific messages)

**Solutions Implemented**:

1. **Timeouts on All External Commands** (install.sh:1596,1618)
   - `timeout 5` on systemctl checks
   - `timeout 10` on pct exec checks
   - Prevents installer from hanging indefinitely

2. **Graceful Degradation** (install.sh:1602-1630)
   - Check for systemctl/pct availability before using them
   - Warn if tools missing instead of failing
   - Container check is warning-only (may be transient)
   - Only fail on critical checks: service running, socket exists

3. **Bypass Flag Support** (install.sh:1589-1594)
   - Set `PULSE_SKIP_HEALTH_CHECKS=1` to bypass all checks
   - Documented in error messages for troubleshooting
   - Allows installation in unsupported environments

4. **Flexible Diagnostics** (install.sh:1640-1647)
   - Use journalctl if available, fall back to syslog
   - Conditional tool-specific advice

5. **Broader Error Detection** (ssh.go:582-628)
   - List of 14 standalone indicators (vs 5 hardcoded checks)
   - Case-insensitive matching for localization tolerance
   - Permissive strategy: treat any known pattern as standalone
   - Handles variations: "no cluster", "IPC", "connection refused", etc.

6. **Enhanced Test Coverage** (ssh_test.go:+35 lines)
   - Added 3 new test cases (variation patterns)
   - Tests now cover 8 standalone scenarios + 3 negative cases
   - All tests pass (11/11)
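
The permissive, case-insensitive matching described in item 5 could look roughly like the sketch below. This is not the code from ssh.go: the function name `looksStandalone` is hypothetical, the indicator list contains only the three substrings named in this message (the real list has 14 entries), and the sample error strings are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// standaloneIndicators is an illustrative subset. Only these three
// substrings are named in the commit message; the remaining entries
// of the real 14-item list are not reproduced here.
var standaloneIndicators = []string{
	"no cluster",
	"ipc",
	"connection refused",
}

// looksStandalone (hypothetical name) reports whether command output
// contains any known standalone indicator. Both sides are lowercased,
// so matching is case-insensitive, and the first hit is enough
// (permissive strategy).
func looksStandalone(output string) bool {
	lower := strings.ToLower(output)
	for _, indicator := range standaloneIndicators {
		if strings.Contains(lower, indicator) {
			return true
		}
	}
	return false
}

func main() {
	// Sample messages are assumptions, not captured pvecm output.
	samples := []string{
		"Error: No Cluster configuration found",
		"IPC failure: Connection Refused",
		"permission denied",
	}
	for _, s := range samples {
		fmt.Printf("%q -> standalone=%v\n", s, looksStandalone(s))
	}
}
```

A substring list like this trades precision for tolerance: a short indicator such as "IPC" can match unrelated text, which is why negative test cases matter alongside the standalone scenarios.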

**Impact**:
- Health gate won't block installation in edge cases
- Better user experience on non-standard setups
- Standalone detection handles more error message variations
- Clear escape hatch for troubleshooting (bypass flag)

**Confidence Level**: High
- All tests pass (bash syntax + Go unit tests)
- Graceful fallbacks for every external command
- Only critical checks are hard failures
- Warnings guide users through validation issues

Related to #571
2025-11-13 10:26:46 +00:00

| File | Last commit | Date |
|------|-------------|------|
| audit.go | Make pulse-sensor-proxy resilient to read-only filesystems | 2025-11-06 00:18:51 +00:00 |
| audit_test.go | Make pulse-sensor-proxy resilient to read-only filesystems | 2025-11-06 00:18:51 +00:00 |
| auth.go | feat(security): Implement range-based rate limiting | 2025-11-07 17:08:45 +00:00 |
| auth_test.go | feat(security): Implement GID authorization enforcement | 2025-11-07 17:09:16 +00:00 |
| capabilities.go | feat(security): Add capability-based authorization | 2025-11-07 17:09:32 +00:00 |
| cleanup.go | feat: add comprehensive node cleanup system | 2025-10-17 18:53:45 +00:00 |
| config.example.yaml | feat(security): Add node allowlist validation to prevent SSRF attacks | 2025-11-07 17:08:28 +00:00 |
| config.go | feat(security): Add node allowlist validation to prevent SSRF attacks | 2025-11-07 17:08:28 +00:00 |
| main.go | Fix persistent temperature monitoring issues for standalone Proxmox nodes (addresses #571) | 2025-11-09 16:53:14 +00:00 |
| main_test.go | feat(security): Add capability-based authorization | 2025-11-07 17:09:32 +00:00 |
| metrics.go | feat(security): Add node allowlist validation to prevent SSRF attacks | 2025-11-07 17:08:28 +00:00 |
| ssh.go | Add critical safety guards to temperature proxy installation | 2025-11-13 10:26:46 +00:00 |
| ssh_test.go | Add critical safety guards to temperature proxy installation | 2025-11-13 10:26:46 +00:00 |
| throttle.go | feat(security): Implement range-based rate limiting | 2025-11-07 17:08:45 +00:00 |
| throttle_test.go | feat(security): Implement range-based rate limiting | 2025-11-07 17:08:45 +00:00 |
| validation.go | Improve sensor proxy cluster validation (Related to #703) | 2025-11-12 19:17:45 +00:00 |
| validation_fuzz_test.go | security: complete Phase 1 sensor proxy hardening | 2025-10-20 15:13:37 +00:00 |
| validation_test.go | Improve sensor proxy cluster validation (Related to #703) | 2025-11-12 19:17:45 +00:00 |