Critical fix for intermittent HTTP endpoint hangs identified by Codex analysis.
## Root Cause
SSH collection via getTemperatureViaSSH() had no timeout, causing HTTP
handlers to block indefinitely when sensors command hung. This held node-level
mutexes and rate limit slots, creating cascading failures where subsequent
requests queued indefinitely.
## Solution
- Thread request context through to SSH execution
- Add exec.CommandContext with 15s timeout (vs 30s HTTP client timeout)
- Create execCommandWithLimitsContext() to wrap SSH commands
- Ensures handlers always release locks and respond within deadline
## Impact
- HTTP temps endpoint now responds in ~70ms consistently
- Temperature data successfully collected and displayed in Pulse
- Eliminates 'context deadline exceeded' errors
- Prevents node gate deadlocks from slow/stuck SSH sessions
Related to Codex session 019a7e99-00fc-7903-afa3-01100baf47c6
This implements HTTP/HTTPS support for pulse-sensor-proxy to enable
temperature monitoring across multiple separate Proxmox instances.
Architecture changes:
- Dual-mode operation: Unix socket (local) + HTTPS (remote)
- Unix socket remains default for security/performance (no breaking change)
- HTTP mode enables temps from external PVE hosts
Backend implementation:
- Add HTTPS server with TLS + Bearer token authentication to sensor-proxy
- Add TemperatureProxyURL and TemperatureProxyToken fields to PVEInstance
- Add HTTP client (internal/tempproxy/http_client.go) for remote proxy calls
- Update temperature collector to prefer HTTP proxy when configured
- Fallback logic: HTTP proxy → Unix socket → direct SSH (if not containerized)
Configuration:
- pulse-sensor-proxy config: http_enabled, http_listen_addr, http_tls_cert/key, http_auth_token
- PVEInstance config: temperature_proxy_url, temperature_proxy_token
- Environment variables: PULSE_SENSOR_PROXY_HTTP_* for all HTTP settings
Security:
- TLS 1.2+ with modern cipher suites
- Constant-time token comparison (timing attack prevention)
- Rate limiting applied to HTTP requests (shared with socket mode)
- Audit logging for all HTTP requests
Next steps:
- Update installer script to support HTTP mode + auto-registration
- Add Pulse API endpoint for proxy registration
- Generate TLS certificates during installation
- Test multi-instance temperature collection
Related to #571 (multi-instance architecture)
This commit resolves the recurring temperature monitoring failures that have plagued multiple releases:
1. **Fix user mismatch (v4.27.1 regression)**:
- Changed binary default user from 'pulse-sensor' to 'pulse-sensor-proxy'
- Aligns with the user created by install-sensor-proxy.sh (line 389)
- Prevents panic when binary is run outside systemd context
- Systemd unit already uses User=pulse-sensor-proxy, so this makes manual runs work too
2. **Fix standalone node validation (v4.25.0+ regression)**:
- pvecm status exits with code 2 on standalone nodes (not in a cluster)
- This caused validation to fail, rejecting all temperature requests
- Added discoverLocalHostAddresses() helper that discovers actual host IPs/hostnames
- On standalone nodes, cluster membership list is populated with host's own addresses
- Maintains SSRF protection while allowing standalone operation
- Added comprehensive test coverage
3. **Make installer fail loudly on proxy setup failure**:
- Previously, failed proxy installation only printed a warning
- Install script then claimed "Pulse installation complete!" (confusing for users)
- Now exits with clear error message and remediation steps
- Forces operators to fix proxy issues before claiming success
- Users who skip temperature monitoring are unaffected
4. **Add test coverage to prevent future regressions**:
- Added TestDiscoverLocalHostAddresses to verify local address discovery
- Validates no loopback or link-local addresses are returned
- All existing tests pass with new changes
Pattern of failures across releases:
- v4.23.0: Missing proxy binaries in release
- v4.24.0-rc.3: AMD CPU sensor naming (Tctl vs Tdie)
- v4.25.0: Single-node pvecm status exit code
- v4.27.1: User mismatch (pulse-sensor vs pulse-sensor-proxy)
This comprehensive fix addresses the root causes rather than applying another tactical patch.
Related to #571
When pulse-sensor-proxy runs inside a container (Docker/LXC), it cannot
complete SSH workflows properly, leading to continuous [preauth] log floods
on the Proxmox host. This happens because the proxy is meant to run on the
host, not inside the container.
Changes:
- Import internal/system for InContainer() detection
- Add startup warning when running in containerized environment
- Point users to docs/TEMPERATURE_MONITORING.md for correct setup
- Allow suppression via PULSE_SENSOR_PROXY_SUPPRESS_CONTAINER_WARNING=true
This catches the misconfiguration early and directs users to supported
installation methods, preventing the SSH spam reported in discussion #628.
Related to #637
The sensor-proxy was failing to start on systems with read-only filesystems
because audit logging required a writable /var/log/pulse/sensor-proxy directory.
Changes:
- Modified newAuditLogger() to automatically fall back to stderr (systemd journal)
if the audit log file cannot be opened
- Removed error return from newAuditLogger() since it now always succeeds
- Added warning logs when fallback mode is used to alert operators
- Updated tests to handle the new signature
- Added better debugging to audit log tests
This allows the sensor-proxy to run on:
- Immutable/read-only root filesystems
- Hardened systems with restricted /var mounts
- Containerized environments with limited write access
Audit events are still captured via systemd journal when file logging is
unavailable, maintaining the security audit trail.
Users can now control logging verbosity through:
- YAML config file: log_level: "debug|info|warn|error"
- Environment variable: PULSE_SENSOR_PROXY_LOG_LEVEL
Default log level is set to "info" instead of debug, reducing verbose output.
Supported levels: trace, debug, info, warn, error, fatal, panic, disabled
Related to #629
Add support for configuring rate limits via config.yaml to allow
administrators to tune the proxy for different deployment sizes.
Changes:
- Add RateLimitConfig struct to config.go with per_peer_interval_ms and per_peer_burst
- Update newRateLimiter() to accept optional RateLimitConfig parameter
- Load rate limit config from YAML and apply overrides to defaults
- Update tests to pass nil for default behavior
- Add comprehensive config.example.yaml with documentation
Configuration examples:
- Small (1-3 nodes): 1000ms interval, burst 5 (default)
- Medium (4-10 nodes): 500ms interval, burst 10
- Large (10+ nodes): 250ms interval, burst 20
Defaults remain conservative (1 req/sec, burst 5) to support most
deployments while allowing customization for larger environments.
Related: #46b8b8d08 (rate limit fix for multi-node support)
RELEASE BLOCKER FIX - Prevents containers from triggering host-level operations.
Added host-only method restrictions:
- RPCEnsureClusterKeys (SSH key distribution)
- RPCRegisterNodes (node registration)
- RPCRequestCleanup (cleanup operations)
Implementation:
- New privilegedMethods map defines host-only methods
- Request handler checks if method is privileged
- If privileged AND caller is from ID-mapped UID range (container), reject
- Host processes (real root, configured UIDs) can still call privileged methods
- Containers can still call get_temperature and get_status
Security impact:
- Prevents compromised containers from:
• Triggering unwanted SSH key distribution to cluster nodes
• Learning about cluster topology via forced registration
• DOS attacks by repeatedly calling key distribution
• Other host-level privileged operations
Without this fix, any container with root could call these methods after
authentication, undermining the security isolation between container and host.
Addresses high-severity finding #2 from security audit.
CRITICAL security fixes for pulse-sensor-proxy:
1. Strengthened hostname validation regex:
- Now requires hostnames to start with alphanumeric character
- Prevents SSH option injection via hostnames starting with '-'
- Pattern: ^[a-zA-Z0-9][a-zA-Z0-9._-]{0,63}$ (1-64 chars total)
- Added IPv4 and IPv6 validation regexes for future use
2. Added validation to vulnerable V1 RPC handlers:
- handleGetTemperature: Now validates node parameter before SSH
- handleRegisterNodes: Now validates discovered cluster nodes
- Previously these handlers passed unsanitized input directly to SSH
3. Defense in depth:
- V2 handlers already had validation (now using improved regex)
- Multiple layers of protection against malicious node identifiers
- Validation prevents container from passing SSH options as hostnames
Without these fixes, a compromised container could potentially inject SSH
options by providing malicious node names, though the 'root@' prefix
provided some mitigation.
Addresses high-severity finding from security audit.
Implements automated cleanup workflow when nodes are deleted from Pulse, removing all monitoring footprint from the host. Changes include a new RPC handler in the sensor proxy for cleanup requests, enhanced node deletion modal with detailed cleanup explanations, and improved SSH key management with proper tagging for atomic updates.
Improvements to pulse-sensor-proxy:
- Fix cluster discovery to use pvecm status for IP addresses instead of node names
- Add standalone node support for non-clustered Proxmox hosts
- Enhanced SSH key push with detailed logging, success/failure tracking, and error reporting
- Add --pulse-server flag to installer for custom Pulse URLs
- Configure www-data group membership for Proxmox IPC access
UI and API cleanup:
- Remove unused "Ensure cluster keys" button from Settings
- Remove /api/diagnostics/temperature-proxy/ensure-cluster-keys endpoint
- Remove EnsureClusterKeys method from tempproxy client
The setup script already handles SSH key distribution during initial configuration,
making the manual refresh button redundant.
The name "temp-proxy" implied a temporary or incomplete implementation. The new name better reflects its purpose as a secure sensor data bridge for containerized Pulse deployments.
Changes:
- Renamed cmd/pulse-temp-proxy/ to cmd/pulse-sensor-proxy/
- Updated all path constants and binary references
- Renamed environment variables: PULSE_TEMP_PROXY_* to PULSE_SENSOR_PROXY_*
- Updated systemd service and service account name
- Updated installation, rotation, and build scripts
- Renamed hardening documentation
- Maintained backward compatibility for key removal during upgrades