Remove all code paths that manipulate config files without Phase 2 locking:
1. Installer: Remove ensure_allowed_nodes_file_reference() call (line 1674)
- Migration now handled exclusively by config migrate-to-file
2. Installer: Make migration failures fatal in update_allowed_nodes()
- Prevents fallback to unsafe Python manipulation
3. Daemon sanitizer: Remove os.WriteFile() call
- Now only sanitizes in-memory copy, doesn't write back to disk
- Logs warning instructing admin to run `config migrate-to-file`
4. Self-heal script: Replace 132 lines of Python with CLI call
- sanitize_allowed_nodes() now calls `config migrate-to-file`
- Eliminates uncoordinated Python-based config rewriting
All config mutations now flow exclusively through Phase 2 CLI with
atomic operations and file locking. No code paths remain that can
create duplicate allowed_nodes blocks.
Addresses Codex review feedback on Phase 2 gaps.
Implements bullet-proof configuration management to completely eliminate
allowed_nodes corruption by design. This builds on Phase 1 (file-only mode)
by replacing all shell/Python config manipulation with proper Go tooling.
**New Features:**
- `pulse-sensor-proxy config validate` - parse and validate config files
- `pulse-sensor-proxy config set-allowed-nodes` - atomic node list updates
- File locking via flock prevents concurrent write races
- Atomic writes (temp file + rename) ensure consistency
- systemd ExecStartPre validation prevents startup with bad config
**Architectural Changes:**
1. Installer now calls config CLI instead of embedded Python/shell scripts
2. All config mutations go through single authoritative writer
3. Deduplication and normalization handled in Go (reuses existing logic)
4. Sanitizer kept as noisy failsafe (warns if corruption still occurs)
**Implementation Details:**
- New cmd/pulse-sensor-proxy/config_cmd.go with cobra commands
- withLockedFile() wrapper ensures exclusive access
- atomicWriteFile() uses temp + rename pattern
- Installer update_allowed_nodes() simplified to CLI calls
- Both systemd service modes include ExecStartPre validation
**Why This Works:**
- Single code path for all writes (no shell/Python divergence)
- File locking serializes self-heal timer + manual installer runs
- Validation gate prevents proxy from starting with corrupt config
- CLI uses same YAML parser as the daemon (guaranteed compatibility)
**Phase 2 Benefits:**
- Corruption impossible by design (not just detected and fixed)
- No more Python dependency for config management
- Atomic operations prevent partial writes
- Clear error messages on validation failures
The defensive sanitizer remains active but now logs loudly if triggered,
allowing us to confirm Phase 2 eliminates corruption in production before
removing the safety net entirely.
This completes the fix for the recurring temperature monitoring outages.
Related to Phase 1 commit 53dec6010
This implements HTTP/HTTPS support for pulse-sensor-proxy to enable
temperature monitoring across multiple separate Proxmox instances.
Architecture changes:
- Dual-mode operation: Unix socket (local) + HTTPS (remote)
- Unix socket remains default for security/performance (no breaking change)
- HTTP mode enables temps from external PVE hosts
Backend implementation:
- Add HTTPS server with TLS + Bearer token authentication to sensor-proxy
- Add TemperatureProxyURL and TemperatureProxyToken fields to PVEInstance
- Add HTTP client (internal/tempproxy/http_client.go) for remote proxy calls
- Update temperature collector to prefer HTTP proxy when configured
- Fallback logic: HTTP proxy → Unix socket → direct SSH (if not containerized)
Configuration:
- pulse-sensor-proxy config: http_enabled, http_listen_addr, http_tls_cert/key, http_auth_token
- PVEInstance config: temperature_proxy_url, temperature_proxy_token
- Environment variables: PULSE_SENSOR_PROXY_HTTP_* for all HTTP settings
Security:
- TLS 1.2+ with modern cipher suites
- Constant-time token comparison (timing attack prevention)
- Rate limiting applied to HTTP requests (shared with socket mode)
- Audit logging for all HTTP requests
Next steps:
- Update installer script to support HTTP mode + auto-registration
- Add Pulse API endpoint for proxy registration
- Generate TLS certificates during installation
- Test multi-instance temperature collection
Related to #571 (multi-instance architecture)
Users can now control logging verbosity through:
- YAML config file: log_level: "debug|info|warn|error"
- Environment variable: PULSE_SENSOR_PROXY_LOG_LEVEL
Default log level is set to "info" instead of debug, reducing verbose output.
Supported levels: trace, debug, info, warn, error, fatal, panic, disabled
Related to #629
Add support for configuring rate limits via config.yaml to allow
administrators to tune the proxy for different deployment sizes.
Changes:
- Add RateLimitConfig struct to config.go with per_peer_interval_ms and per_peer_burst
- Update newRateLimiter() to accept optional RateLimitConfig parameter
- Load rate limit config from YAML and apply overrides to defaults
- Update tests to pass nil for default behavior
- Add comprehensive config.example.yaml with documentation
Configuration examples:
- Small (1-3 nodes): 1000ms interval, burst 5 (default)
- Medium (4-10 nodes): 500ms interval, burst 10
- Large (10+ nodes): 250ms interval, burst 20
Defaults remain conservative (1 req/sec, burst 5) to support most
deployments while allowing customization for larger environments.
Related: #46b8b8d08 (rate limit fix for multi-node support)
The name "temp-proxy" implied a temporary or incomplete implementation. The new name better reflects its purpose as a secure sensor data bridge for containerized Pulse deployments.
Changes:
- Renamed cmd/pulse-temp-proxy/ to cmd/pulse-sensor-proxy/
- Updated all path constants and binary references
- Renamed environment variables: PULSE_TEMP_PROXY_* to PULSE_SENSOR_PROXY_*
- Updated systemd service and service account name
- Updated installation, rotation, and build scripts
- Renamed hardening documentation
- Maintained backward compatibility for key removal during upgrades