13 KiB
Security Changelog - Pulse Sensor Proxy
Deprecated in v5:
pulse-sensor-proxyis deprecated and not recommended for new deployments. This changelog is retained for existing installations and historical reference.
2025-11-07: Critical Security Hardening
Summary
Comprehensive security audit and hardening of the pulse-sensor-proxy architecture. Four critical vulnerabilities were identified and fixed, significantly improving the security posture against container compromise scenarios.
Security Fixes
1. Read-Only Socket Mount (CRITICAL) ✅ FIXED
Vulnerability: Socket directory was mounted read-write into containers, allowing compromised containers to:
- Unlink the socket and create man-in-the-middle proxies
- Fill
/run/pulse-sensor-proxy/to exhaust tmpfs - Race the proxy service on restart to hijack the socket path
Fix: Changed all socket mounts to read-only (:ro)
- Files Modified:
docker-compose.yml,docs/TEMPERATURE_MONITORING.md - Impact: Breaking change for existing deployments (must update mount to
:ro) - Migration: Change
:/run/pulse-sensor-proxy:rwto:/run/pulse-sensor-proxy:ro
Security Benefit: Compromised containers can no longer tamper with socket infrastructure.
2. Node Allowlist Validation (CRITICAL) ✅ FIXED
Vulnerability: Proxy would SSH to ANY hostname/IP that passed format validation, enabling:
- Internal network reconnaissance via SSH handshakes
- Port scanning using the proxy as a relay
- Resource exhaustion via slow-loris SSH attacks
- Complete bypass of network security controls
Fix: Multi-layer node validation system
- New Files:
cmd/pulse-sensor-proxy/validation.go - Modified Files:
cmd/pulse-sensor-proxy/config.go,cmd/pulse-sensor-proxy/main.go,cmd/pulse-sensor-proxy/metrics.go - Features:
- Configurable
allowed_nodeslist (supports hostnames, IPs, CIDR ranges) - Automatic cluster membership validation on Proxmox hosts
- 5-minute cache of cluster membership to reduce pvecm overhead
strict_node_validationoption for strict vs. permissive modes- Prometheus metric:
pulse_proxy_node_validation_failures_total
- Configurable
Configuration Example:
# Only allow specific nodes
allowed_nodes:
- "pve1"
- "pve2.example.com"
- "192.168.1.0/24"
# Require cluster membership validation
strict_node_validation: true
```text
**Default Behavior:** If `allowed_nodes` is empty and proxy runs on Proxmox host, automatically validates against cluster membership (secure by default).
**Security Benefit:** Eliminates SSRF attack vector completely. Containers can only request temperatures from approved nodes.
---
#### 3. **Read/Write Deadlines (CRITICAL)** ✅ FIXED
**Vulnerability:** No read deadline allowed attackers to:
- Hold connection slots indefinitely by connecting but not sending data
- Starve legitimate requests (4 UIDs could consume all 8 global slots)
- Trivial DoS with minimal resources
**Fix:** Comprehensive deadline management
- **Modified Files:** `cmd/pulse-sensor-proxy/config.go`, `cmd/pulse-sensor-proxy/main.go`, `cmd/pulse-sensor-proxy/metrics.go`
- **Features:**
- Configurable `read_timeout` (default: 5s) and `write_timeout` (default: 10s)
- Read deadline set before request parsing, cleared before handler execution
- Write deadline set before response transmission
- Automatic penalty applied on timeout
- Prometheus metrics: `pulse_proxy_read_timeouts_total`, `pulse_proxy_write_timeouts_total`
**Configuration Example:**
```yaml
read_timeout: 5s # Max time to wait for request
write_timeout: 10s # Max time to send response
Security Benefit: Connection slot exhaustion attacks no longer possible. Slow/stalled clients automatically disconnected.
4. Range-Based Rate Limiting (HIGH PRIORITY) ✅ FIXED
Vulnerability: Rate limiting was per-UID, easily bypassed by:
- Creating multiple users in container (each mapped to unique host UID)
- 100+ subordinate UIDs available in typical ID-mapping (100000-165535)
- Each UID got separate rate limit quota
- Attackers could drive proxy to 100% CPU with parallel requests
Fix: Range-based rate limiting for containers
- Modified Files:
cmd/pulse-sensor-proxy/throttle.go,cmd/pulse-sensor-proxy/main.go,cmd/pulse-sensor-proxy/auth.go,cmd/pulse-sensor-proxy/metrics.go - Features:
- Automatic detection of ID-mapped UID ranges from
/etc/subuidand/etc/subgid - Rate limits applied per-range for container UIDs
- Rate limits applied per-UID for host UIDs (backwards compatible)
- Metrics show
peer="range:100000-165535"orpeer="uid:0"
- Automatic detection of ID-mapped UID ranges from
Technical Details:
identifyPeer()checks if BOTH UID AND GID are in mapped ranges- If in range: all UIDs in that range share rate limits
- If NOT in range: legacy per-UID limiting (for host processes)
Security Benefit: Multi-UID bypass attacks no longer possible. Entire container limited as single entity.
5. GID Authorization Fix (MEDIUM PRIORITY) ✅ FIXED
Vulnerability: allowed_peer_gids populated from config but never checked:
- Created false sense of security for administrators
- GID-based policies silently ignored
- No way to authorize by group membership
Fix: Implemented proper GID authorization
- Modified Files:
cmd/pulse-sensor-proxy/auth.go - New Files:
cmd/pulse-sensor-proxy/auth_test.go - Features:
- Peer authorized if UID OR GID matches allowlist
- Debug logging shows which rule granted access
- Full test coverage
Security Benefit: GID-based policies now actually enforced as administrators expect.
6. SSH Output Size Limits (MEDIUM PRIORITY) ✅ FIXED
Vulnerability: No cap on SSH command output size:
- Malicious remote node could stream gigabytes
- Memory exhaustion possible
- CPU spike during parsing
Fix: Implemented configurable output size limits
- Modified Files:
cmd/pulse-sensor-proxy/config.go,cmd/pulse-sensor-proxy/ssh.go,cmd/pulse-sensor-proxy/metrics.go - New Files:
cmd/pulse-sensor-proxy/ssh_test.go - Features:
max_ssh_output_bytesconfig option (default: 1MB)- Stream with
io.LimitReaderto cap size - Error returned if limit exceeded
- Prometheus metric:
pulse_proxy_ssh_output_oversized_total{node}
Configuration Example:
max_ssh_output_bytes: 1048576 # 1MB default
Security Benefit: Remote nodes cannot exhaust proxy memory or CPU via oversized outputs.
7. Improved Host Key Management (MEDIUM PRIORITY) ✅ FIXED
Vulnerability: Trust-On-First-Use (TOFU) via ssh-keyscan:
- Trusts whatever key remote offers on first contact
- No administrator approval for new fingerprints
- Vulnerable to MITM if container influences routing
- No alerting on fingerprint changes
Fix: Multi-phase host key hardening
- Modified Files:
internal/ssh/knownhosts/manager.go,cmd/pulse-sensor-proxy/ssh.go,cmd/pulse-sensor-proxy/config.go,cmd/pulse-sensor-proxy/metrics.go - New Files:
internal/ssh/knownhosts/manager_test.go - Features:
- Seed host keys from Proxmox cluster store (
/etc/pve/priv/known_hosts) - Falls back to ssh-keyscan only if Proxmox unavailable (with WARN)
- Fingerprint change detection with ERROR logging
require_proxmox_hostkeysconfig option for strict mode- Prometheus metric:
pulse_proxy_hostkey_changes_total{node}
- Seed host keys from Proxmox cluster store (
Configuration Example:
require_proxmox_hostkeys: false # true = strict mode (reject unknown hosts)
Security Benefit: Significantly reduces MITM attack surface. Administrators can detect and respond to fingerprint changes.
8. Capability-Based Authorization (MEDIUM PRIORITY) ✅ FIXED
Vulnerability: Any UID in allowlist could call privileged methods:
- No separation between read-only and admin capabilities
- If another service's UID in list, inherits full host-level control
Fix: Comprehensive capability system
- New Files:
cmd/pulse-sensor-proxy/capabilities.go - Modified Files:
cmd/pulse-sensor-proxy/config.go,cmd/pulse-sensor-proxy/auth.go,cmd/pulse-sensor-proxy/main.go - Features:
- Three capability levels:
read,write,admin - Per-UID capability assignment
- Privileged methods require
admincapability - Backwards compatible with legacy
allowed_peer_uidsformat
- Three capability levels:
Configuration Example:
allowed_peers:
- uid: 0
capabilities: [read, write, admin] # Root gets everything
- uid: 1000
capabilities: [read] # Docker user: read-only
- uid: 1001
capabilities: [read, write] # Temperature access but not key distribution
Security Benefit: Proper least-privilege model. Services can be granted only the capabilities they need.
9. Additional Systemd Hardening (LOW PRIORITY) ✅ FIXED
Gap: Additional systemd hardening directives available but not enabled:
MemoryDenyWriteExecute(prevents RWX memory)RestrictRealtime(denies realtime scheduling)ProtectHostname(hostname protection)ProtectKernelLogs(kernel log protection)SystemCallArchitectures(native only)
Fix: Enhanced systemd unit file
- Modified Files:
scripts/pulse-sensor-proxy.service - Added Directives:
MemoryDenyWriteExecute=trueRestrictRealtime=trueProtectHostname=trueProtectKernelLogs=trueSystemCallArchitectures=native
Security Benefit: Defense in depth. Additional layers to slow/prevent post-compromise exploitation.
Additional Improvements
Enhanced Metrics
New Prometheus metrics for security monitoring:
pulse_proxy_node_validation_failures_total{reason}
pulse_proxy_read_timeouts_total
pulse_proxy_write_timeouts_total
pulse_proxy_rate_limit_hits_total
pulse_proxy_limiter_rejections_total{reason, peer}
pulse_proxy_limiter_penalties_total{reason, peer}
pulse_proxy_global_concurrency_inflight
Better Logging
- Node validation failures logged at WARN with "potential SSRF attempt"
- Read timeouts logged with "slow client or attack"
- All security events include correlation IDs for tracing
- Peer identification shows "range:X-Y" for containers
Configuration Flexibility
All new features have sensible defaults and can be tuned via:
- YAML config file (
/etc/pulse-sensor-proxy/config.yaml) - Environment variables (e.g.,
PULSE_SENSOR_PROXY_READ_TIMEOUT) - Command-line flags
Migration Guide
For Existing Deployments
1. Update Socket Mounts (REQUIRED):
Docker:
# OLD:
- /run/pulse-sensor-proxy:/run/pulse-sensor-proxy:rw
# NEW:
- /run/pulse-sensor-proxy:/run/pulse-sensor-proxy:ro
LXC (Proxmox):
# Mounts created by install script are already correct
# If manually configured, ensure mount is read-only
2. Optional Configuration:
Create /etc/pulse-sensor-proxy/config.yaml:
# Restrict nodes (optional, auto-detects cluster by default)
allowed_nodes:
- "10.0.0.0/24" # Your cluster network
# Adjust timeouts if needed (defaults are good for most)
read_timeout: 5s
write_timeout: 10s
# Tune rate limits if necessary (defaults are reasonable)
rate_limit:
per_peer_interval_ms: 1000
per_peer_burst: 5
3. Update Monitoring:
Add new metrics to your Prometheus alerts:
# Alert on SSRF attempts
- alert: PulseSensorSSRFAttempt
expr: rate(pulse_proxy_node_validation_failures_total[5m]) > 0
# Alert on read timeout attacks
- alert: PulseSensorReadTimeouts
expr: rate(pulse_proxy_read_timeouts_total[5m]) > 1
4. Restart Proxy:
systemctl restart pulse-sensor-proxy
Backwards Compatibility
Preserved:
- Empty
allowed_nodes+ Proxmox host = auto-validate cluster (secure default) - Empty
allowed_nodes+ non-Proxmox = allow all (legacy behavior) - Host UID rate limiting unchanged
- All existing config files continue to work
Breaking Changes:
- Socket mounts MUST be changed to
:ro(security fix) - Containers with multiple users now share rate limits (security fix)
Testing
All fixes include comprehensive tests:
# Run test suite
go test ./cmd/pulse-sensor-proxy -v
# Build binary
go build ./cmd/pulse-sensor-proxy
# Test configuration
./pulse-sensor-proxy --config /etc/pulse-sensor-proxy/config.yaml version
Security Impact Assessment
Before Fixes:
- SSRF: Trivially exploitable, full internal network access
- DoS: 4 UIDs could completely starve service
- Container Bypass: 100+ UIDs available for rate limit bypass
- Socket Tampering: Compromised container could MITM all proxy traffic
After Fixes:
- SSRF: ✅ Eliminated (node validation)
- DoS: ✅ Eliminated (read deadlines)
- Container Bypass: ✅ Eliminated (range-based limiting)
- Socket Tampering: ✅ Eliminated (read-only mount)
Overall Risk Reduction: Critical vulnerabilities eliminated. System now resilient to container compromise scenarios.
References
- Temperature Monitoring Overview:
docs/security/TEMPERATURE_MONITORING.md - Sensor Proxy Hardening: Standardized security controls for legacy deployments.
Credits
Security audit performed by Claude + Codex collaboration.
Issues identified:
- Socket directory tampering (Codex)
- Unrestricted SSRF (Codex)
- Missing read deadline (Codex)
- Multi-UID rate limit bypass (Codex)
All fixes implemented and tested 2025-11-07.
For questions or security concerns, file issues at: https://github.com/rcourtman/Pulse/issues