11 commits

Author SHA1 Message Date
rcourtman
f9341ae1fc Improve temperature proxy workflow 2025-11-17 14:25:46 +00:00
rcourtman
c957ccd9e6 Add CI build workflow and tighten proxy diagnostics 2025-11-14 13:32:29 +00:00
rcourtman
e4e915c8a1 Fix temperature data intermittency caused by proxy rate limit retries
Root Cause:
classifyError() in tempproxy/client.go returned nil whenever err was
nil, even if respError contained "rate limit exceeded". As a result,
the retry logic treated rate limit errors as retryable and issued
3 retries with exponential backoff (100ms, 200ms, 400ms) for each
rate-limited request.

With multiple nodes polling simultaneously and hitting the proxy's
1 req/sec default rate limit, this created a retry storm:
- 3 nodes polling every 10 seconds
- 1-2 requests rate limited per cycle
- Each rate limit triggered 3 retries
- Result: up to 6 extra requests per cycle, causing temperature data to
  flicker in and out as requests were dropped

Solution:
1. Reordered classifyError() to check respError first before checking
   if err is nil, ensuring rate limit errors are properly classified
2. Added explicit rate limit detection that marks these errors as
   non-retryable
3. Added stub EnableTemperatureMonitoring/DisableTemperatureMonitoring
   methods to Monitor for interface compatibility
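
The reordering in steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the code from tempproxy/client.go: the ErrorKind constants, the ProxyError fields, and the classifyError signature are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrorKind and ProxyError are hypothetical stand-ins for the client's types.
type ErrorKind int

const (
	KindTransport ErrorKind = iota
	KindRateLimit
	KindUnknown
)

type ProxyError struct {
	Kind      ErrorKind
	Retryable bool
	Msg       string
}

func (e *ProxyError) Error() string { return e.Msg }

// classifyError looks at the response-level error string before the
// transport error, so a rate-limited reply with err == nil is still
// classified (and marked non-retryable) instead of being returned as nil.
func classifyError(err error, respError string) *ProxyError {
	if respError != "" {
		if strings.Contains(strings.ToLower(respError), "rate limit") {
			return &ProxyError{Kind: KindRateLimit, Retryable: false, Msg: respError}
		}
		// Assumption: other response-level errors are not retried either.
		return &ProxyError{Kind: KindUnknown, Retryable: false, Msg: respError}
	}
	if err == nil {
		return nil
	}
	// Assumption: plain transport failures are treated as transient.
	return &ProxyError{Kind: KindTransport, Retryable: true, Msg: err.Error()}
}

func main() {
	pe := classifyError(nil, "rate limit exceeded")
	fmt.Println("rate limit retryable:", pe.Retryable)

	pe = classifyError(errors.New("connection refused"), "")
	fmt.Println("transport retryable:", pe.Retryable)

	fmt.Println("no error:", classifyError(nil, "") == nil)
}
```

Checking respError first is the important part: with the old ordering, the early return on err == nil meant the rate limit text was never inspected.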

Impact:
- Rate limit retry attempts reduced from 151 in 10 minutes to 0
- Temperature data now stable for all nodes
- No more flickering temperature displays in dashboard
2025-11-05 10:20:15 +00:00
rcourtman
1e25fa572a security: add resilience and error handling to tempproxy client
Implements comprehensive client-side improvements for production reliability:

1. Context Support with Deadlines:
   - Added callWithContext() for context-aware RPC calls
   - Respects context deadlines and cancellation
   - Prevents goroutine pileup under network issues

2. Exponential Backoff with Jitter:
   - Automatic retry with exponential backoff (100ms → 10s)
   - ±10% jitter to prevent thundering herd
   - Max 3 retries for transient failures
   - Smart retry decision based on error classification

3. Error Classification:
   - ProxyError type with classification (Transport, Auth, SSH, Sensor, Timeout)
   - Retryable vs non-retryable error identification
   - Better error messages for debugging
   - Structured error handling throughout

4. Improved Connection Handling:
   - DialContext for cancellable connections
   - Proper deadline propagation
   - Clean separation of single-attempt vs retry logic
   - Legacy call() method preserved for backwards compatibility
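
A minimal sketch of how points 1 and 2 might fit together: a retry loop with capped exponential backoff, ±10% jitter, and context cancellation. The names backoff, retry, and runDemo are hypothetical and only the numeric limits (100ms base, 10s cap, 3 retries) come from the list above; the actual client code may be structured differently.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

const (
	baseDelay  = 100 * time.Millisecond
	maxDelay   = 10 * time.Second
	maxRetries = 3
)

// backoff returns the wait before retry number attempt (0-based): the base
// delay doubled per attempt, capped at maxDelay, with +/-10% random jitter.
func backoff(attempt int) time.Duration {
	d := baseDelay << uint(attempt)
	if d <= 0 || d > maxDelay { // d <= 0 guards against shift overflow
		d = maxDelay
	}
	jitter := (rand.Float64()*0.2 - 0.1) * float64(d)
	return d + time.Duration(jitter)
}

// retry calls op until it succeeds, fails with a non-retryable error,
// has been retried maxRetries times, or ctx is cancelled.
func retry(ctx context.Context, op func(context.Context) error, retryable func(error) bool) error {
	for attempt := 0; ; attempt++ {
		err := op(ctx)
		if err == nil || !retryable(err) || attempt == maxRetries {
			return err
		}
		t := time.NewTimer(backoff(attempt))
		select {
		case <-t.C:
		case <-ctx.Done():
			t.Stop()
			return ctx.Err() // cancellation short-circuits the loop
		}
	}
}

// runDemo simulates an operation that fails twice and then succeeds.
func runDemo() (calls int, err error) {
	transient := errors.New("transient transport failure")
	op := func(context.Context) error {
		calls++
		if calls < 3 {
			return transient
		}
		return nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	err = retry(ctx, op, func(e error) bool { return errors.Is(e, transient) })
	return calls, err
}

func main() {
	calls, err := runDemo()
	fmt.Println("calls:", calls, "err:", err)
}
```

Keeping the retryable decision in a callback mirrors the separation between error classification and retry logic described in points 3 and 4.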

Security Notes:
- SSH fallback already blocked in containers (temperature.go:69-77)
- Per-client token auth not needed after method-level authz (commit d55112ac4)
- ID-mapped root blocked from privileged methods

Performance:
- No retry on non-retryable errors (auth, sensor failures)
- Context cancellation short-circuits retry loops
- Jitter prevents synchronized retry storms

Addresses Codex findings #4 and #5 from security audit.
2025-10-19 16:37:11 +00:00
rcourtman
123e0f04ca feat: add comprehensive node cleanup system
Implements an automated cleanup workflow that runs when a node is deleted from Pulse and removes all monitoring footprint from the host. Changes include a new RPC handler in the sensor proxy for cleanup requests, an enhanced node deletion modal with detailed cleanup explanations, and improved SSH key management with proper tagging for atomic updates.
2025-10-17 18:53:45 +00:00
rcourtman
65b696f2d6 fix: remove unused log import from tempproxy client
Leftover from removing EnsureClusterKeys() method. Caused compile failure
preventing hot-dev from starting.
2025-10-17 14:15:37 +00:00
rcourtman
f141f7db33 feat: enhance sensor proxy with improved cluster discovery and SSH management
Improvements to pulse-sensor-proxy:
- Fix cluster discovery to use pvecm status for IP addresses instead of node names
- Add standalone node support for non-clustered Proxmox hosts
- Enhanced SSH key push with detailed logging, success/failure tracking, and error reporting
- Add --pulse-server flag to installer for custom Pulse URLs
- Configure www-data group membership for Proxmox IPC access
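
As an illustration of the first item, member IP addresses could be pulled out of `pvecm status` output along these lines. This is a sketch under assumptions: the sample text only approximates the membership table that pvecm prints, and parseMemberIPs is a hypothetical helper, not the function used in pulse-sensor-proxy.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// sample approximates the membership section of `pvecm status`
// (assumed layout: node ID, votes, then the member's address).
const sample = `Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.0.0.1 (local)
0x00000002          1 10.0.0.2
`

// parseMemberIPs returns the third field of every row whose first field
// looks like a hex node ID, keeping only values that parse as IP addresses.
func parseMemberIPs(out string) []string {
	var ips []string
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 3 || !strings.HasPrefix(f[0], "0x") {
			continue // header, separator, or unrelated line
		}
		if ip := net.ParseIP(f[2]); ip != nil {
			ips = append(ips, ip.String())
		}
	}
	return ips
}

func main() {
	fmt.Println(parseMemberIPs(sample))
}
```

On a standalone (non-clustered) host there is no membership table to parse, which is presumably why standalone node support is handled separately.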

UI and API cleanup:
- Remove unused "Ensure cluster keys" button from Settings
- Remove /api/diagnostics/temperature-proxy/ensure-cluster-keys endpoint
- Remove EnsureClusterKeys method from tempproxy client

The setup script already handles SSH key distribution during initial configuration,
making the manual refresh button redundant.
2025-10-17 11:43:26 +00:00
rcourtman
e4c3b06f14 Automate sensor proxy container mount and auth 2025-10-14 12:41:48 +00:00
rcourtman
b952444837 refactor: Rename pulse-temp-proxy to pulse-sensor-proxy
The name "temp-proxy" implied a temporary or incomplete implementation. The new name better reflects its purpose as a secure sensor data bridge for containerized Pulse deployments.

Changes:
- Renamed cmd/pulse-temp-proxy/ to cmd/pulse-sensor-proxy/
- Updated all path constants and binary references
- Renamed environment variables: PULSE_TEMP_PROXY_* to PULSE_SENSOR_PROXY_*
- Updated systemd service and service account name
- Updated installation, rotation, and build scripts
- Renamed hardening documentation
- Maintained backward compatibility for key removal during upgrades
2025-10-13 13:17:05 +00:00
rcourtman
c7bb76c12e fix: Switch proxy socket to directory-level bind mount for stability
Fixes LXC bind mount issue where socket-level mounts break when the
socket is recreated by systemd. Following Codex's recommendation to
bind mount the directory instead of the file.

Changes:
- Socket path: /run/pulse-temp-proxy/pulse-temp-proxy.sock
- Systemd: RuntimeDirectory=pulse-temp-proxy (auto-creates /run/pulse-temp-proxy)
- Systemd: RuntimeDirectoryMode=0770 for group access
- LXC mount: Bind entire /run/pulse-temp-proxy directory
- Install script: Upgrades old socket-level mounts to directory-level
- Install script: Detects and handles bind mount changes

This survives socket recreations and container restarts. The directory
mount persists even when systemd unlinks/recreates the socket file.
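
The server-side half of this layout can be sketched as below: the socket lives in its own directory, and a restart replaces only the socket file while the directory itself stays in place. The helper names and the temporary directory are assumptions for the demo; the real proxy relies on systemd's RuntimeDirectory rather than creating the directory itself.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// listenInDir makes sure dir exists (requested mode 0770, subject to umask),
// removes any stale socket file, and listens on dir/name.
func listenInDir(dir, name string) (net.Listener, error) {
	if err := os.MkdirAll(dir, 0o770); err != nil {
		return nil, err
	}
	path := filepath.Join(dir, name)
	if err := os.Remove(path); err != nil && !os.IsNotExist(err) {
		return nil, err
	}
	return net.Listen("unix", path)
}

// restartDemo starts the listener twice in the same directory and checks
// that a client can connect through the same path after each start.
func restartDemo() error {
	dir, err := os.MkdirTemp("", "proxy-demo")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	for i := 0; i < 2; i++ {
		ln, err := listenInDir(dir, "proxy.sock")
		if err != nil {
			return err
		}
		conn, err := net.Dial("unix", filepath.Join(dir, "proxy.sock"))
		if err != nil {
			ln.Close()
			return err
		}
		conn.Close()
		// Closing a listener created by net.Listen unlinks the socket file,
		// so the next iteration creates a brand-new file in the same directory.
		ln.Close()
	}
	return nil
}

func main() {
	fmt.Println("restart error:", restartDemo())
}
```

A bind mount of the individual socket file stays attached to the file that existed at mount time, whereas a bind mount of the directory sees whatever file currently sits at that name.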

Related to #528
2025-10-12 22:33:53 +00:00
rcourtman
e7bc338891 feat: Implement secure temperature proxy for containerized deployments
Addresses #528

Introduces pulse-temp-proxy architecture to eliminate SSH key exposure in containers:

**Architecture:**
- pulse-temp-proxy runs on Proxmox host (outside LXC/Docker)
- SSH keys stored on host filesystem (/var/lib/pulse-temp-proxy/ssh/)
- Pulse communicates via unix socket (bind-mounted into container)
- Proxy handles cluster discovery, key rollout, and temperature fetching

**Components:**
- cmd/pulse-temp-proxy: Standalone Go binary with unix socket RPC server
- internal/tempproxy: Client library for Pulse backend
- scripts/install-temp-proxy.sh: Idempotent installer for existing deployments
- scripts/pulse-temp-proxy.service: Systemd service for proxy

**Integration:**
- Pulse automatically detects and uses proxy when socket exists
- Falls back to direct SSH for native installations
- Installer automatically configures proxy for new LXC deployments
- Existing LXC users can upgrade by running install-temp-proxy.sh
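
The detect-then-fall-back behaviour in the first two items might look something like this. It is a sketch only: the collector type, the function names, and the socket path used in main are illustrative assumptions, not identifiers or constants from internal/tempproxy.

```go
package main

import (
	"fmt"
	"os"
)

// collector names the transport used to fetch temperatures (hypothetical type).
type collector string

const (
	viaProxy collector = "proxy"
	viaSSH   collector = "ssh"
)

// socketExists reports whether path exists and is a unix socket.
func socketExists(path string) bool {
	fi, err := os.Stat(path)
	return err == nil && fi.Mode()&os.ModeSocket != 0
}

// chooseCollector prefers the proxy when its socket is present and
// otherwise falls back to direct SSH, as on native installations.
func chooseCollector(socketPath string) collector {
	if socketExists(socketPath) {
		return viaProxy
	}
	return viaSSH
}

func main() {
	// Illustrative path; substitute whatever path the deployment mounts.
	fmt.Println(chooseCollector("/run/pulse-temp-proxy/pulse-temp-proxy.sock"))
}
```

Checking the file mode as well as existence avoids treating a leftover regular file or empty mount point as a working proxy.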

**Security improvements:**
- Container compromise no longer exposes SSH keys
- SSH keys never enter container filesystem
- Maintains forced command restrictions
- Transparent to users - no workflow changes

**Documentation:**
- Updated TEMPERATURE_MONITORING.md with new architecture
- Added verification steps and upgrade instructions
- Preserved legacy documentation for native installs
2025-10-12 21:35:35 +00:00