# Temperature Monitoring Security Guide
This document describes the security architecture of Pulse's temperature monitoring system, which collects sensor data through the host-side `pulse-sensor-proxy` service.
## Table of Contents
- [Architecture Overview](#architecture-overview)
- [Security Boundaries](#security-boundaries)
- [Authentication & Authorization](#authentication--authorization)
- [Rate Limiting](#rate-limiting)
- [SSH Security](#ssh-security)
- [Container Isolation](#container-isolation)
- [Monitoring & Alerting](#monitoring--alerting)
- [Development Mode](#development-mode)
- [Troubleshooting](#troubleshooting)
- [Best Practices](#best-practices)
- [References](#references)
---
## Architecture Overview
```mermaid
graph TD
Container[Pulse Container]
Proxy[pulse-sensor-proxy<br/>Host Service]
Cluster[Cluster Nodes<br/>SSH sensors -j]
Container -->|Unix Socket<br/>Rate Limited| Proxy
Proxy -->|SSH<br/>Forced Command| Cluster
Cluster -->|Temperature JSON| Proxy
Proxy -->|Temperature JSON| Container
style Proxy fill:#e1f5e1
style Container fill:#fff4e1
style Cluster fill:#e1f0ff
```
**Key Principle**: SSH keys never enter containers. All SSH operations are performed by the host-side proxy.
---
## Security Boundaries
### 1. Host ↔ Container Boundary
- **Enforced by**: Method-level authorization + ID-mapped root detection
- **Container CAN**:
- ✅ Call `get_temperature` (read temperature data)
- ✅ Call `get_status` (check proxy health)
- **Container CANNOT**:
- ❌ Call `ensure_cluster_keys` (SSH key distribution)
- ❌ Call `register_nodes` (node discovery)
- ❌ Call `request_cleanup` (cleanup operations)
- ❌ Use direct SSH (blocked by container detection)
### 2. Proxy ↔ Cluster Nodes Boundary
- **Enforced by**: SSH forced commands + IP filtering
- **SSH authorized_keys entry**:
```bash
from="192.168.0.0/24",command="sensors -j",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-ed25519 AAAA... pulse-sensor-proxy
```
- Proxy can ONLY run `sensors -j` on cluster nodes
- IP restrictions prevent lateral movement
### 3. Client ↔ Proxy Boundary
- **Enforced by**: UID-based ACL + adaptive rate limiting
- SO_PEERCRED verifies caller's UID/GID/PID
- Rate limiting (defaults): 1 request per second per peer (burst 5), per-peer concurrency 2, global concurrency 8, 2 s penalty on validation failures
- Per-node guard: only 1 SSH fetch per node at a time
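As a rough illustration of the `SO_PEERCRED` check, the Go sketch below reads the kernel-verified UID/GID/PID of a Unix-socket peer. It is Linux-only, and the helper names `peerCred` and `selfPeerCred` are hypothetical; they are not taken from the proxy's source.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
	"syscall"
)

// peerCred returns the credentials the kernel recorded for the process on
// the other end of a Unix socket. The caller cannot forge these values.
func peerCred(conn *net.UnixConn) (*syscall.Ucred, error) {
	raw, err := conn.SyscallConn()
	if err != nil {
		return nil, err
	}
	var cred *syscall.Ucred
	var credErr error
	if err := raw.Control(func(fd uintptr) {
		cred, credErr = syscall.GetsockoptUcred(int(fd), syscall.SOL_SOCKET, syscall.SO_PEERCRED)
	}); err != nil {
		return nil, err
	}
	return cred, credErr
}

// selfPeerCred (hypothetical demo helper) connects to a throwaway socket and
// returns the credentials that the accepting side observes.
func selfPeerCred() (*syscall.Ucred, error) {
	dir, err := os.MkdirTemp("", "peercred")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	ln, err := net.Listen("unix", filepath.Join(dir, "demo.sock"))
	if err != nil {
		return nil, err
	}
	defer ln.Close()

	client, err := net.Dial("unix", ln.Addr().String())
	if err != nil {
		return nil, err
	}
	defer client.Close()

	server, err := ln.Accept()
	if err != nil {
		return nil, err
	}
	defer server.Close()

	uc, ok := server.(*net.UnixConn)
	if !ok {
		return nil, fmt.Errorf("unexpected connection type %T", server)
	}
	return peerCred(uc)
}

func main() {
	cred, err := selfPeerCred()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("uid=%d gid=%d pid=%d\n", cred.Uid, cred.Gid, cred.Pid)
}
```

Because the kernel fills in these values at connect time, an ACL keyed on them does not depend on anything the client sends in its request.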
---
## Authentication & Authorization
### Authentication (Who can connect?)
**Allowed UIDs**:
- Root (UID 0) - host processes
- Proxy's own UID (pulse-sensor-proxy user)
- Configured UIDs from `/etc/pulse-sensor-proxy/config.yaml`
- ID-mapped root ranges (containers, if enabled)
**ID-Mapped Root Detection**:
- Reads `/etc/subuid` and `/etc/subgid` for UID/GID mapping ranges
- Containers typically use ranges like `100000-165535`
- Both UID AND GID must be in mapped ranges
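The range lookup described above can be sketched as follows. This is a minimal example that assumes the standard `name:start:count` line format of `/etc/subuid` and `/etc/subgid`; the `idRange` type and `parseSubIDRanges` function are illustrative names, not the proxy's actual code.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// idRange covers the IDs start .. start+count-1.
type idRange struct {
	start, count uint32
}

// contains reports whether id falls inside the range.
func (r idRange) contains(id uint32) bool {
	return id >= r.start && id-r.start < r.count
}

// parseSubIDRanges parses subuid/subgid content ("name:start:count" per line),
// skipping blank lines, comments, and malformed entries.
func parseSubIDRanges(content string) []idRange {
	var ranges []idRange
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		parts := strings.Split(line, ":")
		if len(parts) != 3 {
			continue
		}
		start, err1 := strconv.ParseUint(parts[1], 10, 32)
		count, err2 := strconv.ParseUint(parts[2], 10, 32)
		if err1 != nil || err2 != nil || count == 0 {
			continue
		}
		ranges = append(ranges, idRange{start: uint32(start), count: uint32(count)})
	}
	return ranges
}

func main() {
	ranges := parseSubIDRanges("root:100000:65536\n")
	r := ranges[0]
	// 100000 and 165535 are inside the range; 165536 is one past the end.
	fmt.Println(r.contains(100000), r.contains(165535), r.contains(165536))
}
```

With the entry `root:100000:65536`, the mapped range is exactly the `100000-165535` span mentioned above.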
### Authorization (What can they call?)
**Privileged Methods** (host-only):
```go
var privilegedMethods = map[string]bool{
"ensure_cluster_keys": true, // SSH key distribution
"register_nodes": true, // Node registration
"request_cleanup": true, // Cleanup operations
}
```
**Authorization Check**:
```go
if privilegedMethods[method] && isIDMappedRoot(credentials) {
return "method requires host-level privileges"
}
```
**Read-Only Methods** (containers allowed):
- `get_temperature` - Fetch temperature data via proxy
- `get_status` - Check proxy health and version
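The two fragments above can be combined into a small runnable check. This is a sketch under the assumption that the caller's ID-mapped status has already been determined; the `authorize` function and `errHostOnly` variable are hypothetical names used for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errHostOnly mirrors the error text shown in this guide.
var errHostOnly = errors.New("method requires host-level privileges")

// privilegedMethods lists the host-only RPC methods named in this guide.
var privilegedMethods = map[string]bool{
	"ensure_cluster_keys": true,
	"register_nodes":      true,
	"request_cleanup":     true,
}

// authorize rejects privileged methods when the caller is ID-mapped
// (that is, running inside a container). All other calls pass.
func authorize(method string, callerIsIDMapped bool) error {
	if privilegedMethods[method] && callerIsIDMapped {
		return errHostOnly
	}
	return nil
}

func main() {
	for _, m := range []string{"get_temperature", "ensure_cluster_keys"} {
		fmt.Printf("container %-20s -> %v\n", m, authorize(m, true))
		fmt.Printf("host      %-20s -> %v\n", m, authorize(m, false))
	}
}
```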
---
## Rate Limiting
### Per-Peer Limits (commit 46b8b8d)
- **Rate:** 1 request per second (`per_peer_interval_ms = 1000`)
- **Burst:** 5 requests (enough to sweep five nodes per polling window)
- **Per-peer concurrency:** Maximum 2 concurrent RPCs
- **Global concurrency:** 8 simultaneous RPCs across all peers
- **Penalty:** 2 s enforced delay on validation failures (oversized payloads, unauthorized methods)
- **Cleanup:** Peer entries expire after 10 minutes of inactivity
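To make the interaction between rate and burst concrete, here is a minimal token-bucket sketch using the defaults above (one token per 1000 ms, burst of 5). Whether the proxy uses a token bucket internally is an assumption, and the `limiter` type is hypothetical. The sketch omits locking, concurrency caps, penalties, and peer expiry, and it takes the clock as a millisecond argument so the behavior is deterministic.

```go
package main

import "fmt"

// bucket holds the token state for one peer.
type bucket struct {
	tokens float64
	lastMs int64
}

// limiter is a per-peer token bucket. It is not safe for concurrent use.
type limiter struct {
	intervalMs int64   // time needed to earn one token
	burst      float64 // maximum number of stored tokens
	peers      map[uint32]*bucket
}

func newLimiter(intervalMs int64, burst int) *limiter {
	return &limiter{intervalMs: intervalMs, burst: float64(burst), peers: make(map[uint32]*bucket)}
}

// allow reports whether the peer identified by uid may send a request at nowMs.
func (l *limiter) allow(uid uint32, nowMs int64) bool {
	b, ok := l.peers[uid]
	if !ok {
		// New peers start with a full bucket.
		b = &bucket{tokens: l.burst, lastMs: nowMs}
		l.peers[uid] = b
	}
	// Refill in proportion to elapsed time, capped at the burst size.
	b.tokens += float64(nowMs-b.lastMs) / float64(l.intervalMs)
	if b.tokens > l.burst {
		b.tokens = l.burst
	}
	b.lastMs = nowMs
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	l := newLimiter(1000, 5)
	for i := 1; i <= 7; i++ {
		fmt.Printf("t=0ms    request %d allowed=%v\n", i, l.allow(101000, 0))
	}
	fmt.Printf("t=1000ms request 8 allowed=%v\n", l.allow(101000, 1000))
}
```

In this model a peer can sweep five nodes at once, after which it earns back one request per second.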
### Configurable Overrides
Administrators can raise or lower thresholds via `/etc/pulse-sensor-proxy/config.yaml`:
```yaml
rate_limit:
per_peer_interval_ms: 500 # 2 rps
per_peer_burst: 10 # allow 10-node sweep
```
Security guidance:
- Keep `per_peer_interval_ms ≥ 100` in production; lower values expand the attack surface for noisy callers.
- Ensure UID/GID filters stay in place when increasing throughput, and continue to ship audit logs off-host.
- Monitor `pulse_proxy_limiter_penalties_total` alongside `pulse_proxy_limiter_rejects_total` to spot abusive or compromised clients.
### Per-Node Concurrency
- **Limit**: 1 concurrent SSH request per node
- **Purpose**: Prevents SSH connection storms
- **Scope**: Applies to all peers requesting same node
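One way to picture the per-node guard is a set of in-flight node names protected by a mutex. The sketch below assumes that a second request for a busy node is refused immediately; the real proxy may instead queue or block such requests, and the `nodeGate` type is a hypothetical name.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeGate allows at most one in-flight SSH fetch per node.
type nodeGate struct {
	mu   sync.Mutex
	busy map[string]bool
}

func newNodeGate() *nodeGate {
	return &nodeGate{busy: make(map[string]bool)}
}

// tryAcquire marks the node as busy and returns true, or returns false
// if a fetch for that node is already in flight.
func (g *nodeGate) tryAcquire(node string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.busy[node] {
		return false
	}
	g.busy[node] = true
	return true
}

// release marks the node as idle again.
func (g *nodeGate) release(node string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.busy, node)
}

func main() {
	g := newNodeGate()
	fmt.Println(g.tryAcquire("delly")) // first fetch proceeds
	fmt.Println(g.tryAcquire("delly")) // second fetch for the same node is refused
	g.release("delly")
	fmt.Println(g.tryAcquire("delly")) // node is available again
}
```

The guard is keyed by node rather than by peer, so two different clients asking for the same node still produce only one SSH session.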
### Monitoring Rate Limits
```bash
# Check rate limit metrics
curl -s http://127.0.0.1:9127/metrics | grep pulse_proxy_limiter_rejects_total
# Watch for rate limit warnings in logs
journalctl -u pulse-sensor-proxy -f | grep "Rate limit exceeded"
```
---
## SSH Security
### SSH Key Management
**Key Location**: `/var/lib/pulse-sensor-proxy/ssh/id_ed25519`
- **Owner**: `pulse-sensor-proxy:pulse-sensor-proxy`
- **Permissions**: `0600` (read/write for owner only)
- **Type**: Ed25519 (modern, secure)
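The permission requirement can be verified with a short Go check. This is a sketch rather than part of the proxy; `keyModeOK` is a hypothetical helper, and the ownership check (`pulse-sensor-proxy:pulse-sensor-proxy`) is left out for brevity.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
)

// keyModeOK reports whether mode describes a regular file whose permission
// bits are exactly 0600 (read/write for the owner only).
func keyModeOK(mode fs.FileMode) bool {
	return mode.IsRegular() && mode.Perm() == 0o600
}

func main() {
	path := "/var/lib/pulse-sensor-proxy/ssh/id_ed25519"
	info, err := os.Stat(path)
	if err != nil {
		fmt.Println("cannot stat key:", err)
		return
	}
	fmt.Printf("%s mode=%v ok=%v\n", path, info.Mode().Perm(), keyModeOK(info.Mode()))
}
```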
**Key Distribution**:
- Only host processes can trigger distribution (via `ensure_cluster_keys`)
- Containers are blocked from key distribution operations
- Keys are distributed with forced commands and IP restrictions
### Forced Command Restrictions
**On cluster nodes**, the SSH key can ONLY run:
```bash
sensors -j
```
**No other commands possible**:
- ❌ Interactive shell denied (forced command plus `no-pty`)
- ❌ Port forwarding disabled (`no-port-forwarding`)
- ❌ X11 forwarding disabled (`no-X11-forwarding`)
- ❌ Agent forwarding disabled (`no-agent-forwarding`)
### IP Filtering
**Source IP restrictions**:
```bash
from="192.168.0.0/24,10.0.0.0/8"
```
- Automatically detected from cluster node IPs
- Prevents SSH key use from outside the cluster
- Updated during key rotation
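For illustration, the restricted `authorized_keys` line can be assembled from a list of source networks as shown below. The function name `authorizedKeysEntry` is hypothetical, and how the proxy actually generates the entry is not confirmed by this guide.

```go
package main

import (
	"fmt"
	"strings"
)

// authorizedKeysEntry builds an authorized_keys line that pins the key to the
// given source networks and to the forced command `sensors -j`.
func authorizedKeysEntry(sources []string, publicKey, comment string) string {
	opts := []string{
		fmt.Sprintf("from=%q", strings.Join(sources, ",")),
		`command="sensors -j"`,
		"no-port-forwarding",
		"no-X11-forwarding",
		"no-agent-forwarding",
		"no-pty",
	}
	return strings.Join(opts, ",") + " " + publicKey + " " + comment
}

func main() {
	entry := authorizedKeysEntry(
		[]string{"192.168.0.0/24", "10.0.0.0/8"},
		"ssh-ed25519 AAAA...", // placeholder, as elsewhere in this guide
		"pulse-sensor-proxy",
	)
	fmt.Println(entry)
}
```

Keeping the options in one place makes it harder to distribute a key that is missing one of the restrictions.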
---
## Container Isolation
### Fallback SSH Protection
**In containers**, direct SSH is blocked:
```go
if isRunningInContainer() && !devModeAllowSSH {
log.Error().Msg("SECURITY BLOCK: SSH temperature collection disabled in containers")
return &Temperature{Available: false}, nil
}
```
**Container Detection Methods**:
1. Check for `/.dockerenv` file
2. Check `/proc/1/cgroup` for "docker", "lxc", "containerd"
**Bypass**: Only possible with explicit environment variable (see [Development Mode](#development-mode))
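The two detection methods listed above can be sketched as follows. This is an illustrative reimplementation, not the project's source; `cgroupIndicatesContainer` is a hypothetical helper split out so the string matching can be exercised without touching `/proc`.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// cgroupIndicatesContainer reports whether cgroup file content mentions one
// of the container runtimes named in this guide.
func cgroupIndicatesContainer(cgroup string) bool {
	for _, marker := range []string{"docker", "lxc", "containerd"} {
		if strings.Contains(cgroup, marker) {
			return true
		}
	}
	return false
}

// isRunningInContainer applies both checks: the /.dockerenv marker file,
// then the cgroup membership of PID 1.
func isRunningInContainer() bool {
	if _, err := os.Stat("/.dockerenv"); err == nil {
		return true
	}
	data, err := os.ReadFile("/proc/1/cgroup")
	if err != nil {
		return false
	}
	return cgroupIndicatesContainer(string(data))
}

func main() {
	fmt.Println("running in container:", isRunningInContainer())
}
```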
### ID-Mapped Root Detection
**How it works**:
```go
// Check /etc/subuid and /etc/subgid for mapping ranges
// Example /etc/subuid:
// root:100000:65536
func isIDMappedRoot(cred *peerCredentials) bool {
return uidInRange(cred.uid, idMappedUIDRanges) &&
gidInRange(cred.gid, idMappedGIDRanges)
}
```
**Why both UID and GID?**:
- Container root: `uid=100000, gid=100000` → ID-mapped
- Container app user: `uid=101001, gid=101001` → ID-mapped
- Host root: `uid=0, gid=0` → NOT ID-mapped
- Mixed: `uid=100000, gid=50` → NOT ID-mapped (fails check)
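The four cases above can be reproduced with a small table-driven sketch. It assumes a single mapped range of `100000:65536` for both UIDs and GIDs; `inRanges` and `isIDMapped` are stand-ins for the `uidInRange`/`gidInRange` helpers, whose real implementations are not shown in this guide.

```go
package main

import "fmt"

// idRange covers the IDs start .. start+count-1.
type idRange struct {
	start, count uint32
}

// inRanges reports whether id falls inside any of the given ranges.
func inRanges(id uint32, ranges []idRange) bool {
	for _, r := range ranges {
		if id >= r.start && id-r.start < r.count {
			return true
		}
	}
	return false
}

// isIDMapped requires BOTH the UID and the GID to be in mapped ranges.
func isIDMapped(uid, gid uint32, uidRanges, gidRanges []idRange) bool {
	return inRanges(uid, uidRanges) && inRanges(gid, gidRanges)
}

func main() {
	mapped := []idRange{{start: 100000, count: 65536}}
	cases := []struct {
		name     string
		uid, gid uint32
	}{
		{"container root", 100000, 100000},
		{"container app user", 101001, 101001},
		{"host root", 0, 0},
		{"mixed", 100000, 50},
	}
	for _, c := range cases {
		fmt.Printf("%-20s uid=%-6d gid=%-6d id-mapped=%v\n",
			c.name, c.uid, c.gid, isIDMapped(c.uid, c.gid, mapped, mapped))
	}
}
```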
---
## Monitoring & Alerting
### Log Locations
**Proxy logs**:
```bash
journalctl -u pulse-sensor-proxy -f
```
**Backend logs** (inside container):
```bash
journalctl -u pulse-backend -f
```
**Audit rotation**: Use the steps in [operations/audit-log-rotation.md](operations/audit-log-rotation.md) to rotate `/var/log/pulse/sensor-proxy/audit.log`. After each rotation, restart the proxy and confirm temperature pollers are healthy in `/api/monitoring/scheduler/health` (closed breakers, no DLQ entries).
### Security Events to Monitor
#### 1. Privileged Method Denials
```
SECURITY: Container attempted to call privileged method - access denied
method=ensure_cluster_keys uid=101000 gid=101000 pid=12345
```
**Alert on**: Any occurrence (indicates attempted privilege escalation)
#### 2. Rate Limit Violations
```
Rate limit exceeded uid=101000 pid=12345
```
**Alert on**: Sustained violations (>10/minute indicates possible abuse)
#### 3. Authorization Failures
```
Peer authorization failed uid=50000 gid=50000
```
**Alert on**: Repeated failures from same UID (indicates misconfiguration or probing)
#### 4. SSH Fallback Attempts
```
SECURITY BLOCK: SSH temperature collection disabled in containers
```
**Alert on**: Any occurrence (should only happen during misconfigurations)
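When forwarding these events to an alerting pipeline, the `key=value` fields can be pulled out of each line with a few lines of Go. The sketch below assumes the plain-text layout shown in the samples above, which may differ from the proxy's actual log encoding; `parseFields` and `alertOnAnyOccurrence` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// parseFields extracts key=value tokens from a log line and ignores
// free-text words that contain no "=".
func parseFields(line string) map[string]string {
	fields := make(map[string]string)
	for _, tok := range strings.Fields(line) {
		if k, v, ok := strings.Cut(tok, "="); ok && k != "" {
			fields[k] = v
		}
	}
	return fields
}

// alertOnAnyOccurrence reports whether the line is one of the two event
// types this guide says to alert on at the first occurrence.
func alertOnAnyOccurrence(line string) bool {
	return strings.Contains(line, "attempted to call privileged method") ||
		strings.Contains(line, "SECURITY BLOCK:")
}

func main() {
	line := "SECURITY: Container attempted to call privileged method - access denied " +
		"method=ensure_cluster_keys uid=101000 gid=101000 pid=12345"
	f := parseFields(line)
	fmt.Println("alert:", alertOnAnyOccurrence(line), "method:", f["method"], "uid:", f["uid"])
}
```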
### Metrics to Track
```bash
# Rate limit hits
pulse_proxy_rate_limit_hits_total
# RPC requests by method and result
pulse_proxy_rpc_requests_total{method="get_temperature",result="success"}
pulse_proxy_rpc_requests_total{method="ensure_cluster_keys",result="unauthorized"}
# SSH request latency
pulse_proxy_ssh_latency_seconds{node="delly"}
# Queue depth and in-flight RPCs
pulse_proxy_queue_depth
pulse_proxy_global_concurrency_inflight
```
### Recommended Alerts
1. **Privilege Escalation Attempts**:
```
pulse_proxy_rpc_requests_total{result="unauthorized"} > 0
```
2. **Rate Limit Abuse**:
```
rate(pulse_proxy_rate_limit_hits_total[5m]) > 1
```
3. **Proxy Unavailable**:
```
up{job="pulse-sensor-proxy"} == 0
```
4. **Scheduler Drift** (Pulse side – ensures temperature pollers stay healthy):
```
max_over_time(pulse_monitor_poll_queue_depth[5m]) > THRESHOLD
```
Pair with a check of `/api/monitoring/scheduler/health` to confirm temperature instances report `breaker.state == "closed"`.
---
## Development Mode
### SSH Fallback Override
**Purpose**: Allow direct SSH from containers during development/testing
**Environment Variable**:
```bash
export PULSE_DEV_ALLOW_CONTAINER_SSH=true
```
**Security Implications**:
- ⚠️ **NEVER use in production**
- Allows container to use SSH keys if present
- Defeats the security isolation model
- Should only be used in trusted development environments
**Example Usage**:
```bash
# In systemd override for pulse-backend
mkdir -p /etc/systemd/system/pulse-backend.service.d
cat <<'EOF' > /etc/systemd/system/pulse-backend.service.d/dev-ssh.conf
[Service]
Environment=PULSE_DEV_ALLOW_CONTAINER_SSH=true
EOF
systemctl daemon-reload
systemctl restart pulse-backend
```
**Monitoring**:
```bash
# Check if dev mode is active
journalctl -u pulse-backend | grep "dev mode" | tail -1
```
**Disable dev mode**:
```bash
rm /etc/systemd/system/pulse-backend.service.d/dev-ssh.conf
systemctl daemon-reload
systemctl restart pulse-backend
```
---
## Troubleshooting
### "method requires host-level privileges"
**Symptom**: Container gets this error when calling RPC
**Cause**: Container attempted to call privileged method
**Resolution**: This is expected behavior. Only these methods are restricted:
- `ensure_cluster_keys`
- `register_nodes`
- `request_cleanup`
**If host process is blocked**:
1. Check UID is not in ID-mapped range:
```bash
id
cat /etc/subuid /etc/subgid
```
2. Verify proxy's allowed UIDs:
```bash
cat /etc/pulse-sensor-proxy/config.yaml
```
### "Rate limit exceeded"
**Symptom**: Requests failing with rate limit error
**Cause**: Peer exceeded the per-peer rate (default 1 request/second, burst 5) or exhausted per-peer/global concurrency
**Resolution**:
1. Confirm workload is legitimate (look for retry loops or aggressive polling).
2. Allow the limiter to recover: penalty delays clear in ~2 s and idle peers expire after 10 minutes.
3. If sustained higher throughput is required, raise the limits under `rate_limit` in `/etc/pulse-sensor-proxy/config.yaml` (see [Configurable Overrides](#configurable-overrides)).
### Temperature monitoring unavailable
**Symptom**: No temperature data in dashboard
**Diagnosis**:
```bash
# 1. Check proxy is running
systemctl status pulse-sensor-proxy
# 2. Check socket exists
ls -la /run/pulse-sensor-proxy/
# 3. Check socket is accessible in container
ls -la /mnt/pulse-proxy/
# 4. Test proxy from host
curl -s --unix-socket /run/pulse-sensor-proxy/pulse-sensor-proxy.sock \
-X POST -d '{"method":"get_status"}' | jq
# 5. Check SSH connectivity
ssh root@delly "sensors -j"
# 6. Inspect adaptive polling for temperature pollers
curl -s http://localhost:7655/api/monitoring/scheduler/health \
| jq '.instances[] | select(.key | contains("temperature")) | {key, breaker: .breaker.state, deadLetter: .deadLetter.present, lastSuccess: .pollStatus.lastSuccess}'
```
### SSH key not distributed
**Symptom**: Manual `ensure_cluster_keys` call fails
**Check**:
1. Are you calling from host (not container)?
2. Is pvecm available? `command -v pvecm`
3. Can you reach cluster nodes? `pvecm status`
4. Check proxy logs: `journalctl -u pulse-sensor-proxy -f`
---
## Best Practices
### Production Deployments
1. ✅ **Never use dev mode** (`PULSE_DEV_ALLOW_CONTAINER_SSH=true`)
2. ✅ **Monitor security logs** for unauthorized access attempts
3. ✅ **Use IP filtering** on SSH authorized_keys entries
4. ✅ **Rotate SSH keys** periodically (use `ensure_cluster_keys` with rotation)
5. ✅ **Limit allowed_peer_uids** to minimum necessary
6. ✅ **Enable audit logging** for privileged operations
### Development Environments
1. ✅ Use dev mode SSH override if needed (document why)
2. ✅ Test with actual ID-mapped containers
3. ✅ Verify privileged method blocking works
4. ✅ Test rate limiting under load
### Incident Response
**If container compromise suspected**:
1. Check for privileged method attempts:
```bash
journalctl -u pulse-sensor-proxy | grep "SECURITY:"
```
2. Check rate limit violations:
```bash
journalctl -u pulse-sensor-proxy | grep "Rate limit"
```
3. Restart proxy to clear state:
```bash
systemctl restart pulse-sensor-proxy
```
4. Consider rotating SSH keys:
```bash
# From host, call ensure_cluster_keys with new key
```
---
## References
- [Pulse Installation Guide](../README.md)
- [pulse-sensor-proxy Configuration](../cmd/pulse-sensor-proxy/README.md)
- [Security Audit Results](../SECURITY.md)
- [LXC ID Mapping Documentation](https://linuxcontainers.org/lxc/manpages/man5/lxc.container.conf.5.html#lbAJ)
---
**Last Updated**: 2025-10-19
**Security Contact**: File issues at https://github.com/rcourtman/Pulse/issues