rcourtman 07fe382553 docs: update temperature monitoring guide to reflect removed UI button
- Replace references to 'Ensure cluster keys' button with instructions to re-run setup script
- Update troubleshooting section for new cluster nodes
- The setup script already handles SSH key distribution automatically
2025-10-17 11:46:31 +00:00


# Temperature Monitoring
Pulse can display real-time CPU and NVMe temperatures directly in your dashboard, giving you instant visibility into your hardware health.
## Features
- **CPU Package Temperature**: Shows the overall CPU temperature when available
- **Individual Core Temperatures**: Tracks each CPU core
- **NVMe Drive Temperatures**: Monitors NVMe SSD temperatures (visible in the Storage tab's disk list)
- **Color-Coded Display**:
  - Green: < 60°C (normal)
  - Yellow: 60-80°C (warm)
  - Red: > 80°C (hot)
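The color bands amount to a simple threshold check. If you want the same classification in your own scripts (for example, next to a `sensors -j` reading), a minimal shell sketch follows. `temp_color` is a hypothetical helper, and treating exactly 60°C and 80°C as yellow is an assumption about how Pulse handles the boundaries.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the documented bands:
# green below 60°C, yellow from 60°C to 80°C (inclusive, assumed), red above 80°C.
temp_color() {
  # awk compares fractional readings such as 54.5, which bash integer tests reject
  awk -v t="$1" 'BEGIN {
    if (t < 60)       print "green"
    else if (t <= 80) print "yellow"
    else              print "red"
  }'
}

temp_color 54.5   # green
temp_color 72     # yellow
temp_color 85.3   # red
```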
## How It Works
### Secure Architecture (v4.24.0+)
For **containerized deployments** (LXC/Docker), Pulse uses a secure proxy architecture:
1. **pulse-sensor-proxy** runs on the Proxmox host (outside the container)
2. SSH keys are stored on the host filesystem (`/var/lib/pulse-sensor-proxy/ssh/`)
3. Pulse communicates with the proxy over a Unix socket
4. The proxy handles all SSH connections to cluster nodes
**Benefits:**
- SSH keys never enter the container
- Container compromise doesn't expose infrastructure credentials
- Automatically configured during installation
- Transparent to users - no setup changes
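Conceptually, the backend's choice comes down to whether the proxy socket is reachable inside the container. The sketch below only illustrates that decision and is not Pulse's code; it assumes the socket path used in the installation steps below and mirrors the documented fallback to direct SSH. `temperature_transport` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Illustrative only: mimic the choice between the host-side proxy and direct SSH.
temperature_transport() {
  local socket=${1:-/run/pulse-sensor-proxy/pulse-sensor-proxy.sock}
  if [ -S "$socket" ]; then
    # A Unix socket is present (e.g. bind-mounted from the host): use the proxy
    echo "proxy"
  else
    # No socket: fall back to direct SSH, as in the legacy/native architecture
    echo "direct-ssh"
  fi
}

temperature_transport
```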
#### Manual installation (host-side)
When you need to provision the proxy yourself (for example via your own automation), run these steps on the host that runs your Pulse container:
1. **Install the binary**
```bash
curl -L https://github.com/rcourtman/Pulse/releases/download/<TAG>/pulse-sensor-proxy-linux-amd64 \
-o /usr/local/bin/pulse-sensor-proxy
chmod 0755 /usr/local/bin/pulse-sensor-proxy
```
Use the arm64/armv7 artefact if required.
2. **Create the service account if missing**
```bash
id pulse-sensor-proxy >/dev/null 2>&1 || \
useradd --system --user-group --no-create-home --shell /usr/sbin/nologin pulse-sensor-proxy
```
3. **Provision the data directories**
```bash
install -d -o pulse-sensor-proxy -g pulse-sensor-proxy -m 0750 /var/lib/pulse-sensor-proxy
install -d -o pulse-sensor-proxy -g pulse-sensor-proxy -m 0700 /var/lib/pulse-sensor-proxy/ssh
```
4. **(Optional) Add `/etc/pulse-sensor-proxy/config.yaml`**
Only needed if you want explicit subnet/metrics settings; otherwise the proxy auto-detects host CIDRs.
```yaml
allowed_source_subnets:
- 192.168.1.0/24
metrics_address: 0.0.0.0:9127 # use "disabled" to switch metrics off
```
5. **Install the hardened systemd unit**
Copy the unit from `scripts/install-sensor-proxy.sh` or create `/etc/systemd/system/pulse-sensor-proxy.service` with:
```ini
[Unit]
Description=Pulse Temperature Proxy
After=network.target
[Service]
Type=simple
User=pulse-sensor-proxy
Group=pulse-sensor-proxy
WorkingDirectory=/var/lib/pulse-sensor-proxy
ExecStart=/usr/local/bin/pulse-sensor-proxy
Restart=on-failure
RestartSec=5s
RuntimeDirectory=pulse-sensor-proxy
RuntimeDirectoryMode=0775
UMask=0007
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=read-only
ReadWritePaths=/var/lib/pulse-sensor-proxy
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectControlGroups=true
ProtectClock=true
PrivateTmp=true
PrivateDevices=true
ProtectProc=invisible
ProcSubset=pid
LockPersonality=true
RemoveIPC=true
RestrictSUIDSGID=true
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
RestrictNamespaces=true
SystemCallFilter=@system-service
SystemCallErrorNumber=EPERM
CapabilityBoundingSet=
AmbientCapabilities=
KeyringMode=private
LimitNOFILE=1024
StandardOutput=journal
StandardError=journal
SyslogIdentifier=pulse-sensor-proxy
[Install]
WantedBy=multi-user.target
```
6. **Enable the service**
```bash
systemctl daemon-reload
systemctl enable --now pulse-sensor-proxy.service
```
Confirm the socket appears at `/run/pulse-sensor-proxy/pulse-sensor-proxy.sock`.
7. **Expose the socket to Pulse**
- **Proxmox LXC:** append `lxc.mount.entry: /run/pulse-sensor-proxy run/pulse-sensor-proxy none bind,create=dir 0 0` to `/etc/pve/lxc/<CTID>.conf` and restart the container.
- **Docker:** bind mount `/run/pulse-sensor-proxy` into the container (`- /run/pulse-sensor-proxy:/run/pulse-sensor-proxy:rw`).
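If you automate step 7 for Proxmox LXC, appending the mount entry unconditionally adds a duplicate line on every run. The sketch below appends it only once; `ensure_proxy_mount` is a hypothetical helper. It refuses to touch configs that contain snapshot sections, because in Proxmox container configs those `[name]` sections follow the live settings, so a line appended at the end of the file would land in the last snapshot rather than the running config.

```bash
#!/usr/bin/env bash
# Hypothetical helper: add the proxy bind mount to an LXC config exactly once.
ensure_proxy_mount() {
  local conf=$1   # e.g. /etc/pve/lxc/<CTID>.conf
  local entry='lxc.mount.entry: /run/pulse-sensor-proxy run/pulse-sensor-proxy none bind,create=dir 0 0'
  # -x: whole-line match, -F: fixed string
  if grep -qxF "$entry" "$conf"; then
    echo "mount entry already present in $conf"
    return 0
  fi
  # Snapshot sections ([name]) follow the live config; appending would land inside one
  if grep -q '^\[' "$conf"; then
    echo "$conf has snapshot sections; add the entry above the first [section] by hand" >&2
    return 1
  fi
  # Make sure the file ends with a newline before appending
  if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
    printf '\n' >> "$conf"
  fi
  printf '%s\n' "$entry" >> "$conf"
  echo "added mount entry to $conf; restart the container to apply it"
}

# Usage on the Proxmox host: ensure_proxy_mount "/etc/pve/lxc/${CTID}.conf"
```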
After the container restarts, the backend will automatically use the proxy. To refresh SSH keys on cluster nodes (e.g., after adding a new node), SSH to your Proxmox host and re-run the setup script: `curl -fsSL https://get.pulsenode.com/install-proxy.sh | bash -s -- --ctid <your-container-id>`
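Step 1 downloads the amd64 build. The sketch below picks the artefact from `uname -m` instead. It assumes the ARM builds follow the same `pulse-sensor-proxy-linux-<arch>` naming as the amd64 file (with `arm64` and `armv7` suffixes), so check the release page for the exact file names before relying on it; `proxy_artifact_url` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the release download URL for this host's architecture.
# ASSUMPTION: artefacts are named pulse-sensor-proxy-linux-{amd64,arm64,armv7}.
proxy_artifact_url() {
  local tag=$1
  local machine=${2:-$(uname -m)}
  local arch
  case "$machine" in
    x86_64)        arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    armv7l|armv7)  arch=armv7 ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
  echo "https://github.com/rcourtman/Pulse/releases/download/${tag}/pulse-sensor-proxy-linux-${arch}"
}

# Usage (TAG is the release you are installing):
#   curl -fL "$(proxy_artifact_url "$TAG")" -o /usr/local/bin/pulse-sensor-proxy
```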
### Legacy Architecture (Pre-v4.24.0 / Native Installs)
For native (non-containerized) installations, Pulse connects directly via SSH:
1. Pulse uses SSH key authentication (like Ansible, Terraform, etc.)
2. Runs `sensors -j` command to read hardware temperatures
3. SSH key stored in Pulse's home directory
> **Important for native installs:** Run every setup command as the same user account that executes the Pulse service (typically `pulse`). The backend reads the SSH key from that user's home directory.
## Requirements
1. **SSH Key Authentication**: Your Pulse server needs SSH key access to nodes (no passwords)
2. **lm-sensors Package**: Installed on nodes to read hardware sensors
3. **Passwordless root SSH** (Proxmox clusters only): For the proxy architecture, the Proxmox host running Pulse must have passwordless root SSH access to all cluster nodes. This is standard for Proxmox clusters, but hardened environments may need to create an alternate service account.
## Setup (Automatic)
The auto-setup script (Settings → Nodes → Setup Script) will prompt you to configure SSH access for temperature monitoring:
1. Run the auto-setup script on your Proxmox node
2. When prompted for SSH setup, choose "y"
3. Get your Pulse server's public key:
```bash
# On your Pulse server (run as the user running Pulse)
cat ~/.ssh/id_rsa.pub
```
4. Paste the public key when prompted
5. The script will:
   - Add the key to `/root/.ssh/authorized_keys`
   - Install `lm-sensors`
   - Run `sensors-detect --auto`
If the node is part of a Proxmox cluster, the script detects the other members and offers to configure the same SSH/lm-sensors setup on each of them automatically; confirm when prompted to roll it out cluster-wide.
## Setup (Manual)
If you skipped SSH setup during auto-setup, you can configure it manually:
### 1. Generate SSH Key (on Pulse server)
```bash
# Run as the user running Pulse (usually the pulse service account)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
```
### 2. Copy Public Key to Proxmox Nodes
```bash
# Get your public key
cat ~/.ssh/id_rsa.pub
# Add it to each Proxmox node: log in to the node...
ssh root@your-proxmox-node
# ...then run the following commands in that session on the node
mkdir -p /root/.ssh
chmod 700 /root/.ssh
echo "YOUR_PUBLIC_KEY_HERE" >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
```
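The `echo ... >> authorized_keys` step appends unconditionally, so repeating it leaves duplicate entries. A minimal idempotent alternative to run on each node is sketched below; `add_key_once` is a hypothetical helper. It compares whole lines, so a key that is already present with extra options (such as the forced-command prefix described under Command Restrictions) counts as a different entry.

```bash
#!/usr/bin/env bash
# Hypothetical helper: add a public key line to authorized_keys only if it is not already there.
add_key_once() {
  local key=$1
  local file=${2:-/root/.ssh/authorized_keys}
  # Create the directory (mode 700) and the file (mode 600) only when they are missing
  mkdir -p -m 700 "$(dirname "$file")"
  [ -e "$file" ] || (umask 077 && : > "$file")
  # -x: whole-line match, -F: fixed string, so characters in the key are not regex syntax
  if grep -qxF -- "$key" "$file"; then
    echo "key already present"
  else
    printf '%s\n' "$key" >> "$file"
    echo "key added"
  fi
}

# Usage on a node: add_key_once "ssh-rsa AAAA... pulse@your-pulse-server"
```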
### 3. Install lm-sensors (on each Proxmox node)
```bash
apt-get update
apt-get install -y lm-sensors
sensors-detect --auto
```
### 4. Test SSH Connection
From your Pulse server:
```bash
ssh root@your-proxmox-node "sensors -j"
```
You should see JSON output with temperature data.
## How It Works
1. Pulse uses SSH to connect to each node as root
2. Runs `sensors -j` to get temperature data in JSON format
3. Parses CPU temperatures (coretemp/k10temp)
4. Parses NVMe temperatures (nvme-pci-*)
5. Displays CPU temperatures on the overview dashboard and lists NVMe drive temperatures in the Storage tab's disk table when available
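`sensors -j` keys its JSON output by chip name (for example `coretemp-isa-0000` or `nvme-pci-0100`), so steps 3 and 4 amount to a prefix match on those names. The sketch below illustrates that matching rule as listed here; it is not Pulse's parser, and `classify_chip` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a `sensors -j` chip name using the prefixes listed above.
classify_chip() {
  case "$1" in
    coretemp-*|k10temp-*) echo "cpu" ;;    # Intel / AMD CPU sensors
    nvme-pci-*)           echo "nvme" ;;   # NVMe drive sensors
    *)                    echo "other" ;;  # not matched by this rule (e.g. acpitz)
  esac
}

for chip in coretemp-isa-0000 k10temp-pci-00c3 nvme-pci-0100 acpitz-acpi-0; do
  printf '%s -> %s\n' "$chip" "$(classify_chip "$chip")"
done
```

On a node with `jq` installed, `sensors -j | jq -r 'keys[]'` lists the chip names you could feed into it.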
## Troubleshooting
### No Temperature Data Shown
**Check SSH access**:
```bash
# From Pulse server
ssh root@your-proxmox-node "echo test"
```
**Check lm-sensors**:
```bash
# On Proxmox node
sensors -j
```
**Check Pulse logs**:
```bash
journalctl -u pulse -f | grep -i temp
```
### Temperature Shows as Unavailable
- lm-sensors may not be installed
- Node may not have temperature sensors
- SSH key authentication may not be working
### ARM Devices (Raspberry Pi, etc.)
ARM devices typically don't have the same sensor interfaces. Temperature monitoring may not work or may show different sensors (like `thermal_zone0` instead of `coretemp`).
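To see whether an ARM board exposes a temperature at all, read its kernel thermal zone directly. The value in `/sys/class/thermal/thermal_zone0/temp` is in millidegrees Celsius; the hypothetical `millideg_to_c` helper below converts it. This is only a manual check and does not change what Pulse collects.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a sysfs thermal reading (millidegrees Celsius) to °C.
millideg_to_c() {
  awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1000 }'
}

zone=/sys/class/thermal/thermal_zone0
if [ -r "$zone/temp" ]; then
  # "type" names the sensor behind this zone
  echo "$(cat "$zone/type" 2>/dev/null): $(millideg_to_c "$(cat "$zone/temp")") °C"
else
  echo "no thermal_zone0 on this host"
fi
```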
## Security & Architecture
### How Temperature Collection Works
Temperature monitoring uses **SSH key authentication**, the same trusted method used by automation tools like Ansible, Terraform, and SaltStack for managing infrastructure at scale.
**What Happens**:
1. Pulse connects to your node via SSH using a key (no passwords)
2. Runs `sensors -j` to get temperature readings in JSON format
3. Parses the data and displays it in the dashboard
4. Disconnects (entire operation takes <1 second)
**Security Design**:
- ✅ **Key-based authentication** - More secure than passwords, industry standard
- ✅ **Read-only operation** - `sensors` command only reads hardware data
- ✅ **Private key stays on Pulse server** - Never transmitted or exposed
- ✅ **Public key on nodes** - Safe to store, can't be used to gain access
- ✅ **Instantly revocable** - Remove key from authorized_keys to disable
- ✅ **Logged and auditable** - sshd logs every connection (in `/var/log/auth.log`, or in the systemd journal on nodes without a syslog daemon)
### What Pulse Uses SSH For
Pulse reuses the SSH access only for the actions already described in [Setup (Automatic)](#setup-automatic) and [How It Works](#how-it-works): adding the public key during setup (if you opt in) and polling `sensors -j` each cycle. It does nothing else (no extra commands, file changes, or config edits), and revoking the key stops temperature collection immediately.
This is the same key-based trust model that widely used infrastructure automation tools rely on.
### Best Practices
1. **Dedicated key**: Generate a separate SSH key just for Pulse (recommended)
2. **Firewall rules**: Optionally restrict SSH to your Pulse server's IP
3. **Regular monitoring**: Review auth logs if you want extra visibility
4. **Secure your Pulse server**: Keep it updated and behind proper access controls
### Command Restrictions (Default)
Pulse now writes the temperature key with a forced command so the connection can only execute `sensors -j`. Port/X11/agent forwarding and PTY allocation are all disabled automatically when you opt in through the setup script. Re-running the script upgrades older installs to the restricted entry without touching any of your other SSH keys.
```text
# Example entry in /root/.ssh/authorized_keys installed by Pulse
command="sensors -j",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAAB3NzaC1yc2E...
```
You can still manage the entry manually if you prefer, but no extra steps are required for new installations.
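If you do manage the entry by hand, the sketch below builds the same restricted line from a public key. The option list is copied from the example above; the optional `from=` pattern is a standard OpenSSH `authorized_keys` option that rejects the key unless the client connects from a matching address. The setup script is not documented to add `from=`, and `restricted_entry` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a restricted authorized_keys line for a public key.
# $1 = public key line (e.g. contents of id_rsa.pub); $2 = optional source address pattern.
restricted_entry() {
  local pubkey=$1 src=${2:-}
  local opts='command="sensors -j",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty'
  if [ -n "$src" ]; then
    # from= makes sshd reject the key unless the client address matches
    opts="from=\"${src}\",${opts}"
  fi
  printf '%s %s\n' "$opts" "$pubkey"
}

# Usage on the Pulse server, then copy the output to each node:
#   restricted_entry "$(cat ~/.ssh/id_rsa.pub)" 192.168.1.100
```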
## Performance Impact
- Minimal: one SSH connection per node per polling cycle
- Timeout: 5 seconds (non-blocking)
- Falls back gracefully if SSH fails
- No impact if SSH is not configured
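To reproduce one poll by hand with similar limits, bound the SSH call yourself. This sketch uses standard OpenSSH options and coreutils `timeout`; the 5-second value comes from the list above, but the exact options Pulse passes are not documented here, so treat it as an approximation. `poll_node` and its `DRY_RUN` switch are hypothetical.

```bash
#!/usr/bin/env bash
# Approximate one polling call: key auth only, fail fast, give up after 5 seconds.
poll_node() {
  local node=$1
  local cmd=(timeout 5 ssh -o BatchMode=yes -o ConnectTimeout=5 "root@${node}" sensors -j)
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "${cmd[*]}"   # print the command instead of running it
    return 0
  fi
  "${cmd[@]}"
}

# Usage: poll_node pve1 > /tmp/pve1-sensors.json || echo "poll failed (exit $?)"
```

`BatchMode=yes` makes ssh fail instead of prompting for a password, matching the key-only requirement, and `timeout` exits with status 124 when the limit is hit.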
## Container Security Considerations
✅ **Resolved in v4.24.0**
### Secure Proxy Architecture (Current)
As of v4.24.0, containerized deployments use **pulse-sensor-proxy** which eliminates the security concerns:
- **SSH keys stored on host** - Not accessible from container
- **Unix socket communication** - Pulse never touches SSH keys
- **Automatic during installation** - No manual configuration needed
- **No credential exposure on container compromise** - An attacker inside the container cannot read the SSH keys, because they never leave the host
**For new installations:** The proxy is installed automatically during LXC setup. No action required.
**For existing installations (pre-v4.24.0):** Upgrade your deployment to use the proxy:
```bash
# On your Proxmox host
curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/install-sensor-proxy.sh | \
bash -s -- --ctid <your-pulse-container-id>
```
### Legacy Security Concerns (Pre-v4.24.0)
Older versions stored SSH keys inside the container, creating security risks:
- Compromised container = exposed SSH keys
- Even with forced commands, keys could be extracted
- Required manual hardening (key rotation, IP restrictions, etc.)
### Hardening Recommendations (Legacy/Native Installs Only)
#### 1. Key Rotation
Rotate SSH keys periodically (e.g., every 90 days):
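```bash
# On the Pulse server, as the user that runs Pulse
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_new -N ""
# Add ~/.ssh/id_ed25519_new.pub to every node's authorized_keys using your admin access
# (a key restricted to `sensors -j` cannot edit authorized_keys)
# Test connectivity with the new key
ssh -i ~/.ssh/id_ed25519_new root@node "sensors -j"
# Replace the old keypair (both halves), then remove the old public key from every node
mv ~/.ssh/id_ed25519_new ~/.ssh/id_ed25519
mv ~/.ssh/id_ed25519_new.pub ~/.ssh/id_ed25519.pub
```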
#### 2. Secret Mounts (Docker)
Mount SSH keys from secure volumes:
```yaml
version: '3'
services:
  pulse:
    image: rcourtman/pulse:latest
    volumes:
      - pulse-ssh-keys:/home/pulse/.ssh:ro # Read-only
      - pulse-data:/data
volumes:
  pulse-data:
  pulse-ssh-keys:
    driver: local
    driver_opts:
      type: tmpfs # Memory-only: starts empty and is not persisted, so provision the keys before Pulse starts
      device: tmpfs
```
#### 3. Monitoring & Alerts
Enable SSH audit logging on Proxmox nodes:
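```bash
# Install auditd
apt-get install -y auditd
# Record writes and attribute changes under /root/.ssh (e.g. edits to authorized_keys)
auditctl -w /root/.ssh -p wa -k ssh_access
# Rules added with auditctl are lost on reboot; put them in /etc/audit/rules.d/ to persist
# Follow events tagged with the ssh_access key
tail -f /var/log/audit/audit.log | grep ssh_access
```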
#### 4. IP Restrictions
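Force the sensors-only command for root logins that come from your Pulse server's IP by adding a `Match` block to `/etc/ssh/sshd_config`. This applies to *every* root login from that address, and it does not by itself block other addresses. Validate the file with `sshd -t` before reloading sshd.
```ssh
Match User root Address 192.168.1.100
    ForceCommand sensors -j
    PermitOpen none
    AllowAgentForwarding no
    AllowTcpForwarding no
```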
### Verifying Proxy Installation
To check if your deployment is using the secure proxy:
```bash
# On Proxmox host - check proxy service
systemctl status pulse-sensor-proxy
# Check if socket exists
ls -l /run/pulse-sensor-proxy/pulse-sensor-proxy.sock
# View proxy logs
journalctl -u pulse-sensor-proxy -f
```
In the Pulse container, check the logs at startup:
```bash
# Should see: "Temperature proxy detected - using secure host-side bridge"
journalctl -u pulse | grep -i proxy
```
### Disabling Temperature Monitoring
To remove SSH access:
```bash
# On each Proxmox node
sed -i '/pulse@/d' /root/.ssh/authorized_keys
# Or remove just the forced command entry
sed -i '/command="sensors -j"/d' /root/.ssh/authorized_keys
```
Temperature data will stop appearing in the dashboard after the next polling cycle.
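Both `sed -i` commands edit the file in place without keeping a copy. On Proxmox nodes, `/root/.ssh/authorized_keys` is normally a symlink to the cluster-shared `/etc/pve/priv/authorized_keys`, and GNU sed's `-i` replaces a symlink with a regular file unless `--follow-symlinks` is given. The hypothetical helper below passes that flag, keeps a `.bak` copy, and reports how many lines it removed; it matches the forced-command pattern from above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove Pulse's forced-command entries, keeping a backup copy.
revoke_pulse_key() {
  local file=${1:-/root/.ssh/authorized_keys}
  local before after
  before=$(wc -l < "$file")
  # --follow-symlinks (GNU sed) edits the link target instead of replacing the link;
  # -i.bak saves an unmodified copy before editing
  sed --follow-symlinks -i.bak '/command="sensors -j"/d' "$file"
  after=$(wc -l < "$file")
  echo "removed $((before - after)) line(s); original kept as a .bak copy"
}
```

Because the symlinked file is shared across the cluster, removing the entry on one node removes it for every node that links to it.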
## Operations & Troubleshooting
### Managing the Proxy Service
The pulse-sensor-proxy service runs on the Proxmox host (outside the container).
**Service Management:**
```bash
# Check service status
systemctl status pulse-sensor-proxy
# Restart the proxy
systemctl restart pulse-sensor-proxy
# Stop the proxy (disables temperature monitoring)
systemctl stop pulse-sensor-proxy
# Start the proxy
systemctl start pulse-sensor-proxy
# Enable proxy to start on boot
systemctl enable pulse-sensor-proxy
# Disable proxy autostart
systemctl disable pulse-sensor-proxy
```
### Log Locations
**Proxy Logs (on Proxmox host):**
```bash
# Follow proxy logs in real-time
journalctl -u pulse-sensor-proxy -f
# View last 50 lines
journalctl -u pulse-sensor-proxy -n 50
# View logs since last boot
journalctl -u pulse-sensor-proxy -b
# View logs with timestamps
journalctl -u pulse-sensor-proxy --since "1 hour ago"
```
**Pulse Logs (in container):**
```bash
# Check if proxy is being used
journalctl -u pulse | grep -i "proxy\|temperature"
# Should see: "Temperature proxy detected - using secure host-side bridge"
```
### SSH Key Rotation
Rotate SSH keys periodically for security (recommended every 90 days):
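```bash
# 1. On the Proxmox host, back up the current keypair
cd /var/lib/pulse-sensor-proxy/ssh/
cp id_ed25519 id_ed25519.backup
cp id_ed25519.pub id_ed25519.pub.backup
# 2. Generate a new keypair (remove the old files first, otherwise ssh-keygen prompts to overwrite)
rm -f id_ed25519 id_ed25519.pub
ssh-keygen -t ed25519 -f id_ed25519 -N "" -C "pulse-sensor-proxy-rotated"
# The proxy runs as the pulse-sensor-proxy user, so that user must own the new files
chown pulse-sensor-proxy:pulse-sensor-proxy id_ed25519 id_ed25519.pub
# 3. Get the new public key
cat id_ed25519.pub
# 4. Add new key to all cluster nodes
# For each node in your cluster:
ssh root@node1 "echo 'NEW_PUBLIC_KEY_HERE' >> /root/.ssh/authorized_keys"
ssh root@node2 "echo 'NEW_PUBLIC_KEY_HERE' >> /root/.ssh/authorized_keys"
# ... repeat for all nodes
# 5. Restart proxy to use new keys
systemctl restart pulse-sensor-proxy
# 6. Verify temperature data still works in Pulse UI
# 7. Remove the old key from the nodes (after confirming the new key works),
#    matching on the old key material; --follow-symlinks keeps Proxmox's authorized_keys symlink intact
OLD_KEY=$(awk '{print $2}' id_ed25519.pub.backup)
ssh root@node1 "sed --follow-symlinks -i '\#${OLD_KEY}#d' /root/.ssh/authorized_keys"
```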
### Revoking Access When Nodes Leave
When removing a node from your cluster:
```bash
# On the node being removed, remove the proxy's public key
ssh root@old-node "sed -i '/pulse-sensor-proxy/d' /root/.ssh/authorized_keys"
# No restart needed - proxy will fail gracefully for that node
# Temperature monitoring will continue for remaining nodes
```
### Failure Modes
**Proxy Not Running:**
- Symptom: No temperature data in Pulse UI
- Check: `systemctl status pulse-sensor-proxy` on Proxmox host
- Fix: `systemctl start pulse-sensor-proxy`
**Socket Not Accessible in Container:**
- Symptom: Pulse logs show "Temperature proxy not available - using direct SSH"
- Check: `ls -l /run/pulse-sensor-proxy/pulse-sensor-proxy.sock` in container
- Fix: Verify bind mount in LXC config (`/etc/pve/lxc/<CTID>.conf`)
- Should have: `lxc.mount.entry: /run/pulse-sensor-proxy run/pulse-sensor-proxy none bind,create=dir 0 0`
**pvecm Not Available:**
- Symptom: Proxy fails to discover cluster nodes
- Cause: Pulse (and therefore the proxy) runs on a host that is not a Proxmox node
- Fallback: Use legacy direct SSH method (native installation)
**Pulse Running Off-Cluster:**
- Symptom: Proxy discovers local host but not remote cluster nodes
- Limitation: Proxy requires passwordless SSH between cluster nodes
- Solution: Ensure Proxmox host running Pulse has SSH access to all cluster nodes
**Unauthorized Connection Attempts:**
- Symptom: Proxy logs show "Unauthorized connection attempt"
- Cause: Process with non-root UID trying to access socket
- Normal: Only root (UID 0) or proxy's own user can access socket
- Check: Look for suspicious processes trying to access the socket
### Monitoring the Proxy
**Manual Monitoring (v1):**
The proxy service includes systemd restart-on-failure, which handles most issues automatically. For additional monitoring:
```bash
# Check proxy health
systemctl is-active pulse-sensor-proxy && echo "Proxy is running" || echo "Proxy is down"
# Monitor logs for errors
journalctl -u pulse-sensor-proxy --since "1 hour ago" | grep -i error
# Verify socket exists and is accessible
test -S /run/pulse-sensor-proxy/pulse-sensor-proxy.sock && echo "Socket OK" || echo "Socket missing"
```
**Alerting:**
- Rely on systemd's automatic restart (`Restart=on-failure`)
- Monitor via journalctl for persistent failures
- Check Pulse UI for missing temperature data
**Future:** Integration with pulse-watchdog is planned for automated health checks and alerting (see #528).
### Known Limitations
**One Proxy Per Host:**
- Each Proxmox host runs one pulse-sensor-proxy instance
- If multiple Pulse containers run on same host, they share the same proxy
- All containers see the same temperature data from the same cluster
**Requires Proxmox Cluster Membership:**
- Proxy uses `pvecm nodes` to discover cluster members
- Standalone Proxmox nodes work but only monitor that single node
- For standalone nodes, proxy is less useful (direct SSH works fine)
**Passwordless Root SSH Required:**
- Proxy assumes passwordless root SSH between cluster nodes
- Standard for Proxmox clusters, but hardened environments may differ
- Alternative: Create a dedicated service account with sudo access to `sensors`
**No Cross-Cluster Support:**
- Proxy only manages the cluster its host belongs to
- Cannot bridge temperature monitoring across multiple disconnected clusters
- Each cluster needs its own Pulse instance with its own proxy
### Common Issues
**Temperature Data Stops Appearing:**
1. Check proxy service: `systemctl status pulse-sensor-proxy`
2. Check proxy logs: `journalctl -u pulse-sensor-proxy -n 50`
3. Test SSH manually: `ssh root@node "sensors -j"`
4. Verify socket exists: `ls -l /run/pulse-sensor-proxy/pulse-sensor-proxy.sock`
**New Cluster Node Not Showing Temperatures:**
1. Ensure lm-sensors installed: `ssh root@new-node "sensors -j"`
2. Proxy auto-discovers on next poll (may take up to 1 minute)
3. Re-run the setup script to configure SSH keys on the new node: `curl -fsSL https://get.pulsenode.com/install-proxy.sh | bash -s -- --ctid <CTID>`
**Permission Denied Errors:**
1. Verify socket permissions: `ls -l /run/pulse-sensor-proxy/pulse-sensor-proxy.sock`
2. Should be: `srw-rw---- 1 root root`
3. Check which user the Pulse service runs as (an empty `User=` means root): `pct exec <CTID> -- systemctl show -p User pulse`
**Proxy Service Won't Start:**
1. Check logs: `journalctl -u pulse-sensor-proxy -n 50`
2. Verify binary exists: `ls -l /usr/local/bin/pulse-sensor-proxy`
3. Test manually: `/usr/local/bin/pulse-sensor-proxy --version`
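4. Check the data directory exists and is owned by `pulse-sensor-proxy`: `ls -ld /var/lib/pulse-sensor-proxy` (systemd creates the socket directory `/run/pulse-sensor-proxy` itself when the service starts)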
### Getting Help
If temperature monitoring isn't working:
1. **Collect diagnostic info:**
```bash
# On Proxmox host
systemctl status pulse-sensor-proxy
journalctl -u pulse-sensor-proxy -n 100 > /tmp/proxy-logs.txt
ls -la /run/pulse-sensor-proxy/pulse-sensor-proxy.sock
# In Pulse container
journalctl -u pulse -n 100 | grep -i temp > /tmp/pulse-temp-logs.txt
```
2. **Test manually:**
```bash
# On Proxmox host - test SSH to a cluster node
ssh root@cluster-node "sensors -j"
```
3. **Check GitHub Issues:** https://github.com/rcourtman/Pulse/issues
4. **Include in bug report:**
- Pulse version
- Deployment type (LXC/Docker/native)
- Proxy logs
- Pulse logs
- Output of manual SSH test