Commit graph

3 commits

Author SHA1 Message Date
rcourtman
c9d1671afd Fix persistent temperature monitoring issues for standalone Proxmox nodes (addresses #571)
This commit resolves the recurring temperature monitoring failures that have plagued multiple releases:

1. **Fix user mismatch (v4.27.1 regression)**:
   - Changed binary default user from 'pulse-sensor' to 'pulse-sensor-proxy'
   - Aligns with the user created by install-sensor-proxy.sh (line 389)
   - Prevents panic when binary is run outside systemd context
   - Systemd unit already uses User=pulse-sensor-proxy, so this makes manual runs work too

2. **Fix standalone node validation (v4.25.0+ regression)**:
   - pvecm status exits with code 2 on standalone nodes (not in a cluster)
   - This caused validation to fail, rejecting all temperature requests
   - Added discoverLocalHostAddresses() helper that discovers actual host IPs/hostnames
   - On standalone nodes, cluster membership list is populated with host's own addresses
   - Maintains SSRF protection while allowing standalone operation
   - Added comprehensive test coverage

3. **Make installer fail loudly on proxy setup failure**:
   - Previously, failed proxy installation only printed a warning
   - Install script then claimed "Pulse installation complete!" (confusing for users)
   - Now exits with clear error message and remediation steps
   - Forces operators to fix proxy issues before claiming success
   - Users who skip temperature monitoring are unaffected

4. **Add test coverage to prevent future regressions**:
   - Added TestDiscoverLocalHostAddresses to verify local address discovery
   - Validates no loopback or link-local addresses are returned
   - All existing tests pass with new changes

Pattern of failures across releases:
- v4.23.0: Missing proxy binaries in release
- v4.24.0-rc.3: AMD CPU sensor naming (Tctl vs Tdie)
- v4.25.0: Single-node pvecm status exit code
- v4.27.1: User mismatch (pulse-sensor vs pulse-sensor-proxy)

This comprehensive fix addresses the root causes rather than applying another tactical patch.

Related to #571
2025-11-09 16:53:14 +00:00
rcourtman
b2e65f7b3e feat(security): Add SSH output limits and improve host key management
Addresses two security vulnerabilities:

1. SSH Output Size Limits:
   - Prevents memory exhaustion from malicious remote nodes
   - Configurable max_ssh_output_bytes (default 1MB)
   - Stream with io.LimitReader to cap output size
   - New metric: pulse_proxy_ssh_output_oversized_total{node}
   - WARN logging for oversized outputs

2. Improved Host Key Management:
   - Seed host keys from Proxmox cluster store (/etc/pve/priv/known_hosts)
   - Falls back to ssh-keyscan only if Proxmox unavailable (with WARN)
   - Fingerprint change detection with ERROR logging
   - require_proxmox_hostkeys option for strict mode
   - New metric: pulse_proxy_hostkey_changes_total{node}
   - Reduces MITM attack surface significantly

Known hosts manager now normalizes entries, reuses existing fingerprints,
and raises typed HostKeyChangeError when fingerprints differ.

Related to security audit 2025-11-07.

Co-authored-by: Codex <codex@openai.com>
2025-11-07 17:09:02 +00:00
rcourtman
524f42cc28 security: complete Phase 1 sensor proxy hardening
Implements comprehensive security hardening for pulse-sensor-proxy:
- Privilege drop from root to unprivileged user (UID 995)
- Hash-chained tamper-evident audit logging with remote forwarding
- Per-UID rate limiting (0.2 QPS, burst 2) with concurrency caps
- Enhanced command validation with 10+ attack pattern tests
- Fuzz testing (7M+ executions, 0 crashes)
- SSH hardening, AppArmor/seccomp profiles, operational runbooks

All 27 Phase 1 tasks complete. Ready for production deployment.
2025-10-20 15:13:37 +00:00