12 commits

Author SHA1 Message Date
rcourtman
35adcf104f docs: add guidance for large deployments (30+ nodes) in rate limit config
Update config.example.yaml with:
- Recommendations for very large deployments (30+ nodes)
- Formula for calculating optimal rate limits based on node count
- Example calculation: 30 nodes with 10s polling = 300ms interval
- Security note about minimum safe intervals

This helps admins properly configure the proxy for enterprise
deployments with dozens of nodes.
2025-10-21 11:27:13 +00:00
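
The commit above mentions a formula for deriving rate limits from node count but does not reproduce it. As a rough sketch, one plausible reading is "polling period divided by node count, rounded down for slack": 10s / 30 nodes is about 333ms, which rounds down to the 300ms in the example. The function name, the 50ms rounding step, and the 100ms floor below are assumptions for illustration, not values taken from config.example.yaml.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical values for illustration; not taken from config.example.yaml.
const (
	roundStep   = 50 * time.Millisecond  // round down to this step to leave slack
	minInterval = 100 * time.Millisecond // assumed lower bound on the interval
)

// perPeerInterval spreads one request per node across a polling period,
// rounds the result down to roundStep, and clamps it to minInterval.
func perPeerInterval(nodes int, poll time.Duration) time.Duration {
	if nodes < 1 {
		nodes = 1
	}
	d := (poll / time.Duration(nodes)).Truncate(roundStep)
	if d < minInterval {
		d = minInterval
	}
	return d
}

func main() {
	// 10s / 30 nodes = 333ms, rounded down to a 50ms step.
	fmt.Println(perPeerInterval(30, 10*time.Second)) // prints 300ms
}
```

The floor stands in for the "minimum safe interval" the commit refers to; the real threshold would come from the project's own documentation.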
rcourtman
44d5f91e92 feat: make pulse-sensor-proxy rate limits configurable
Add support for configuring rate limits via config.yaml to allow
administrators to tune the proxy for different deployment sizes.

Changes:
- Add RateLimitConfig struct to config.go with per_peer_interval_ms and per_peer_burst
- Update newRateLimiter() to accept optional RateLimitConfig parameter
- Load rate limit config from YAML and apply overrides to defaults
- Update tests to pass nil for default behavior
- Add comprehensive config.example.yaml with documentation

Configuration examples:
- Small (1-3 nodes): 1000ms interval, burst 5 (default)
- Medium (4-10 nodes): 500ms interval, burst 10
- Large (10+ nodes): 250ms interval, burst 20

Defaults remain conservative (1 req/sec, burst 5) to support most
deployments while allowing customization for larger environments.

Related: #46b8b8d08 (rate limit fix for multi-node support)
2025-10-21 11:25:21 +00:00
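
To illustrate the "load config and apply overrides to defaults" step described above, here is a minimal sketch. Only the struct name RateLimitConfig and the keys per_peer_interval_ms and per_peer_burst come from the commit message; the Go field names, the limits type, and resolveLimits are hypothetical, and treating zero as "not set" is an assumption of this sketch.

```go
package main

import "fmt"

// RateLimitConfig mirrors the YAML keys named in the commit message.
// A zero field is treated as "not set" (an assumption of this sketch).
type RateLimitConfig struct {
	PerPeerIntervalMs int `yaml:"per_peer_interval_ms"`
	PerPeerBurst      int `yaml:"per_peer_burst"`
}

// limits is a hypothetical holder for the effective settings.
type limits struct {
	IntervalMs int
	Burst      int
}

// resolveLimits starts from the documented defaults (1000ms, burst 5)
// and overrides only the fields that the config sets to a positive value.
// A nil config keeps the defaults, matching "pass nil for default behavior".
func resolveLimits(cfg *RateLimitConfig) limits {
	l := limits{IntervalMs: 1000, Burst: 5}
	if cfg == nil {
		return l
	}
	if cfg.PerPeerIntervalMs > 0 {
		l.IntervalMs = cfg.PerPeerIntervalMs
	}
	if cfg.PerPeerBurst > 0 {
		l.Burst = cfg.PerPeerBurst
	}
	return l
}

func main() {
	fmt.Printf("%+v\n", resolveLimits(nil)) // prints {IntervalMs:1000 Burst:5}
	large := &RateLimitConfig{PerPeerIntervalMs: 250, PerPeerBurst: 20}
	fmt.Printf("%+v\n", resolveLimits(large)) // prints {IntervalMs:250 Burst:20}
}
```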
rcourtman
d856e75018 fix: increase pulse-sensor-proxy rate limits for multi-node support
- Increase rate limit from 1 req/5sec to 1 req/sec (60/min)
- Increase burst from 2 to 5 requests
- Fixes temperature collection failures when monitoring 3+ nodes
- All requests from containerized Pulse use same UID, causing rate limiting
- New limits support 5-10 node deployments comfortably

Resolves issue where adding standalone nodes broke temperature monitoring
for all nodes due to aggressive rate limiting.
2025-10-21 11:21:12 +00:00
rcourtman
524f42cc28 security: complete Phase 1 sensor proxy hardening
Implements comprehensive security hardening for pulse-sensor-proxy:
- Privilege drop from root to unprivileged user (UID 995)
- Hash-chained tamper-evident audit logging with remote forwarding
- Per-UID rate limiting (0.2 QPS, burst 2) with concurrency caps
- Enhanced command validation with 10+ attack pattern tests
- Fuzz testing (7M+ executions, 0 crashes)
- SSH hardening, AppArmor/seccomp profiles, operational runbooks

All 27 Phase 1 tasks complete. Ready for production deployment.
2025-10-20 15:13:37 +00:00
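
For readers unfamiliar with hash-chained audit logs, the idea can be sketched as follows: each entry stores the hash of the previous entry, so altering any record breaks every later link. The entry layout, function names, and the use of SHA-256 here are assumptions for illustration; the commit does not describe the proxy's actual log format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// entry is a hypothetical audit record linked to its predecessor by hash.
type entry struct {
	Msg      string
	PrevHash string
	Hash     string
}

// hashEntry binds a message to the hash of the previous entry.
func hashEntry(prev, msg string) string {
	sum := sha256.Sum256([]byte(prev + "\n" + msg))
	return hex.EncodeToString(sum[:])
}

// appendEntry adds msg to the chain, linking it to the last entry.
func appendEntry(log []entry, msg string) []entry {
	prev := ""
	if len(log) > 0 {
		prev = log[len(log)-1].Hash
	}
	return append(log, entry{Msg: msg, PrevHash: prev, Hash: hashEntry(prev, msg)})
}

// verify returns the index of the first entry that fails the chain check,
// or -1 if the whole log is intact.
func verify(log []entry) int {
	prev := ""
	for i, e := range log {
		if e.PrevHash != prev || e.Hash != hashEntry(e.PrevHash, e.Msg) {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var log []entry
	log = appendEntry(log, "proxy started")
	log = appendEntry(log, "get_temperature uid=101000")
	log = appendEntry(log, "rate limit exceeded uid=101000")
	fmt.Println(verify(log)) // prints -1

	log[1].Msg = "edited after the fact"
	fmt.Println(verify(log)) // prints 1
}
```

Remote forwarding complements the chain: a copy held off-host lets an operator detect truncation of the tail, which a chain alone cannot reveal.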
rcourtman
29f4879cd4 test: add comprehensive security tests and documentation
Implements all remaining Codex recommendations before launch:

1. Privileged Methods Tests:
   - TestPrivilegedMethodsCompleteness ensures all host-side RPCs are protected
   - Will fail if new privileged RPC is added without authorization
   - Verifies read-only methods are NOT in privilegedMethods

2. ID-Mapped Root Detection Tests:
   - TestIDMappedRootDetection covers all boundary conditions
   - Tests UID/GID range detection (both must be in range)
   - Tests multiple ID ranges, edge cases, disabled mode
   - 100% coverage of container identification logic

3. Authorization Tests:
   - TestPrivilegedMethodsBlocked verifies containers can't call privileged RPCs
   - TestIDMappedRootDisabled ensures feature can be disabled
   - Tests both container and host credentials

4. Comprehensive Security Documentation (23 KB):
   - Architecture overview with diagrams
   - Complete authentication & authorization flow
   - Rate limiting details (already implemented: 20/min per peer)
   - SSH security model and forced commands
   - Container isolation mechanisms
   - Monitoring & alerting recommendations
   - Development mode documentation (PULSE_DEV_ALLOW_CONTAINER_SSH)
   - Troubleshooting guide with common issues
   - Incident response procedures

Rate Limiting Status:
- Already implemented in throttle.go (20 req/min, burst 10, max 10 concurrent)
- Per-peer rate limiting at line 328 in main.go
- Per-node concurrency control at line 825 in main.go
- Exceeds Codex's requirements

All tests pass. Documentation covers all security aspects.

Addresses final Codex recommendations for production readiness.
2025-10-19 16:47:13 +00:00
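
The "both must be in range" rule covered by TestIDMappedRootDetection can be illustrated with a short sketch. The idRange type, the function signature, and the 100000+65536 range (a common subordinate ID allocation) are assumptions for illustration rather than the project's actual code or configuration.

```go
package main

import "fmt"

// idRange is a hypothetical subordinate ID range: [Start, Start+Length).
type idRange struct {
	Start, Length uint32
}

func (r idRange) contains(id uint32) bool {
	return id >= r.Start && id-r.Start < r.Length
}

func inAny(ranges []idRange, id uint32) bool {
	for _, r := range ranges {
		if r.contains(id) {
			return true
		}
	}
	return false
}

// isIDMappedRoot reports whether peer credentials look like container root:
// detection must be enabled, and BOTH uid and gid must fall in a mapped range.
func isIDMappedRoot(enabled bool, uid, gid uint32, uidRanges, gidRanges []idRange) bool {
	if !enabled {
		return false
	}
	return inAny(uidRanges, uid) && inAny(gidRanges, gid)
}

func main() {
	ranges := []idRange{{Start: 100000, Length: 65536}}
	fmt.Println(isIDMappedRoot(true, 101000, 101000, ranges, ranges)) // prints true
	fmt.Println(isIDMappedRoot(true, 101000, 0, ranges, ranges))      // prints false
	fmt.Println(isIDMappedRoot(true, 165536, 165536, ranges, ranges)) // prints false
}
```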
rcourtman
1519390f08 security: enhance logging for denied privileged method calls
Improved security audit trail for attempted container privilege escalation:

- Added detailed logging when containers attempt privileged methods
- Logs UID, GID, PID, correlation ID, and method name
- Marked with "SECURITY:" prefix for easy filtering/alerting
- Helps operators detect and investigate compromise attempts

Example log output:
  SECURITY: Container attempted to call privileged method - access denied
  method=ensure_cluster_keys uid=101000 gid=101000 pid=12345

Addresses Codex recommendation for comprehensive logging of denied
privileged RPCs to enable monitoring and alerting on attempted abuse.
2025-10-19 16:40:42 +00:00
rcourtman
026b9c5b77 security: add method-level authorization for privileged RPC methods
RELEASE BLOCKER FIX - Prevents containers from triggering host-level operations.

Added host-only method restrictions:
- RPCEnsureClusterKeys (SSH key distribution)
- RPCRegisterNodes (node registration)
- RPCRequestCleanup (cleanup operations)

Implementation:
- New privilegedMethods map defines host-only methods
- Request handler checks if method is privileged
- If privileged AND caller is from ID-mapped UID range (container), reject
- Host processes (real root, configured UIDs) can still call privileged methods
- Containers can still call get_temperature and get_status

Security impact:
- Prevents compromised containers from:
  • Triggering unwanted SSH key distribution to cluster nodes
  • Learning about cluster topology via forced registration
  • Mounting DoS attacks by repeatedly calling key distribution
  • Invoking other host-level privileged operations

Without this fix, any container with root could call these methods after
authentication, undermining the security isolation between container and host.

Addresses high-severity finding #2 from security audit.
2025-10-19 16:31:50 +00:00
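
The authorization check described above can be sketched as a map lookup in front of the request dispatcher. The map name privilegedMethods and the method names ensure_cluster_keys, get_temperature, and get_status appear in this log; the wire names register_nodes and request_cleanup, the authorize function, and the fromContainer flag are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// privilegedMethods lists host-only RPCs. The last two wire names are
// assumed; the log only gives the constants RPCRegisterNodes and
// RPCRequestCleanup.
var privilegedMethods = map[string]bool{
	"ensure_cluster_keys": true,
	"register_nodes":      true, // assumed wire name
	"request_cleanup":     true, // assumed wire name
}

var errDenied = errors.New("privileged method not allowed from container")

// authorize rejects privileged methods when the caller was identified as
// coming from an ID-mapped (container) UID range. How fromContainer is
// computed is outside this sketch.
func authorize(method string, fromContainer bool) error {
	if privilegedMethods[method] && fromContainer {
		return errDenied
	}
	return nil
}

func main() {
	fmt.Println(authorize("get_temperature", true))      // prints <nil>
	fmt.Println(authorize("ensure_cluster_keys", true))  // prints privileged method not allowed from container
	fmt.Println(authorize("ensure_cluster_keys", false)) // prints <nil>
}
```

Keeping the privileged set in one map makes it straightforward for a completeness test to fail whenever a new host-side RPC is added without being classified.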
rcourtman
3a6a4fd362 security: fix SSH command injection vulnerabilities in pulse-sensor-proxy
CRITICAL security fixes for pulse-sensor-proxy:

1. Strengthened hostname validation regex:
   - Now requires hostnames to start with alphanumeric character
   - Prevents SSH option injection via hostnames starting with '-'
   - Pattern: ^[a-zA-Z0-9][a-zA-Z0-9._-]{0,63}$ (1-64 chars total)
   - Added IPv4 and IPv6 validation regexes for future use

2. Added validation to vulnerable V1 RPC handlers:
   - handleGetTemperature: Now validates node parameter before SSH
   - handleRegisterNodes: Now validates discovered cluster nodes
   - Previously these handlers passed unsanitized input directly to SSH

3. Defense in depth:
   - V2 handlers already had validation (now using improved regex)
   - Multiple layers of protection against malicious node identifiers
   - Validation prevents container from passing SSH options as hostnames

Without these fixes, a compromised container could potentially inject SSH
options by providing malicious node names, though the 'root@' prefix
provided some mitigation.

Addresses high-severity finding from security audit.
2025-10-19 16:28:38 +00:00
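
The hostname pattern quoted above can be exercised directly with Go's regexp package. The pattern is copied from the commit message; the function name validNodeName and the sample inputs are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// hostnameRe is the pattern from the commit message: the first character
// must be alphanumeric, followed by up to 63 characters from [a-zA-Z0-9._-].
var hostnameRe = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9._-]{0,63}$`)

// validNodeName reports whether s is acceptable as a node identifier
// before it is passed to ssh.
func validNodeName(s string) bool {
	return hostnameRe.MatchString(s)
}

func main() {
	fmt.Println(validNodeName("pve1.example.com"))     // prints true
	fmt.Println(validNodeName("-oProxyCommand=touch")) // prints false
	fmt.Println(validNodeName("node name"))            // prints false
	fmt.Println(validNodeName(""))                     // prints false
}
```

A leading '-' is the important case: without the first-character rule, ssh could parse the value as a command-line option instead of a host.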
rcourtman
123e0f04ca feat: add comprehensive node cleanup system
Implements automated cleanup workflow when nodes are deleted from Pulse, removing all monitoring footprint from the host. Changes include a new RPC handler in the sensor proxy for cleanup requests, enhanced node deletion modal with detailed cleanup explanations, and improved SSH key management with proper tagging for atomic updates.
2025-10-17 18:53:45 +00:00
rcourtman
f141f7db33 feat: enhance sensor proxy with improved cluster discovery and SSH management
Improvements to pulse-sensor-proxy:
- Fix cluster discovery to use pvecm status for IP addresses instead of node names
- Add standalone node support for non-clustered Proxmox hosts
- Enhanced SSH key push with detailed logging, success/failure tracking, and error reporting
- Add --pulse-server flag to installer for custom Pulse URLs
- Configure www-data group membership for Proxmox IPC access

UI and API cleanup:
- Remove unused "Ensure cluster keys" button from Settings
- Remove /api/diagnostics/temperature-proxy/ensure-cluster-keys endpoint
- Remove EnsureClusterKeys method from tempproxy client

The setup script already handles SSH key distribution during initial configuration,
making the manual refresh button redundant.
2025-10-17 11:43:26 +00:00
rcourtman
e4c3b06f14 Automate sensor proxy container mount and auth
2025-10-14 12:41:48 +00:00
rcourtman
b952444837 refactor: Rename pulse-temp-proxy to pulse-sensor-proxy
The name "temp-proxy" implied a temporary or incomplete implementation. The new name better reflects its purpose as a secure sensor data bridge for containerized Pulse deployments.

Changes:
- Renamed cmd/pulse-temp-proxy/ to cmd/pulse-sensor-proxy/
- Updated all path constants and binary references
- Renamed environment variables: PULSE_TEMP_PROXY_* to PULSE_SENSOR_PROXY_*
- Updated systemd service and service account name
- Updated installation, rotation, and build scripts
- Renamed hardening documentation
- Maintained backward compatibility for key removal during upgrades
2025-10-13 13:17:05 +00:00