The name "temp-proxy" implied a temporary or incomplete implementation. The new name better reflects its purpose as a secure sensor data bridge for containerized Pulse deployments. Changes: - Renamed cmd/pulse-temp-proxy/ to cmd/pulse-sensor-proxy/ - Updated all path constants and binary references - Renamed environment variables: PULSE_TEMP_PROXY_* to PULSE_SENSOR_PROXY_* - Updated systemd service and service account name - Updated installation, rotation, and build scripts - Renamed hardening documentation - Maintained backward compatibility for key removal during upgrades
28 KiB
Pulse Sensor Proxy - Security Hardening Guide
Overview
The pulse-sensor-proxy is a host-side service that provides secure temperature monitoring for containerized Pulse deployments. It addresses a critical security concern: SSH keys stored inside LXC containers can be exfiltrated if the container is compromised.
Architecture:
- Host-side proxy runs with minimal privileges on each Proxmox node
- Containerized Pulse communicates via a Unix socket (/run/pulse-sensor-proxy/pulse-sensor-proxy.sock)
- Proxy authenticates containers using Linux SO_PEERCRED (UID/PID verification; see the sketch below)
- SSH keys never leave the host filesystem
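For reference, a peer-credential check of this kind looks roughly like the following in Go (a minimal sketch, not the proxy's actual source; the socket path matches the deployment above, and the policy comment is illustrative):

package main

import (
	"log"
	"net"

	"golang.org/x/sys/unix"
)

// peerCred returns the SO_PEERCRED credentials (PID/UID/GID) of the
// process on the far end of a Unix socket connection.
func peerCred(conn *net.UnixConn) (*unix.Ucred, error) {
	raw, err := conn.SyscallConn()
	if err != nil {
		return nil, err
	}
	var cred *unix.Ucred
	var credErr error
	if err := raw.Control(func(fd uintptr) {
		cred, credErr = unix.GetsockoptUcred(int(fd), unix.SOL_SOCKET, unix.SO_PEERCRED)
	}); err != nil {
		return nil, err
	}
	return cred, credErr
}

func main() {
	ln, err := net.ListenUnix("unix", &net.UnixAddr{
		Name: "/run/pulse-sensor-proxy/pulse-sensor-proxy.sock", Net: "unix",
	})
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.AcceptUnix()
		if err != nil {
			log.Fatal(err)
		}
		cred, err := peerCred(conn)
		if err != nil {
			conn.Close()
			continue
		}
		// A real policy would compare cred.Uid/cred.Pid against the
		// container's expected mapped UID before serving the request.
		log.Printf("peer uid=%d pid=%d", cred.Uid, cred.Pid)
		conn.Close()
	}
}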
Threat Model:
- ✅ Container compromise cannot access SSH keys
- ✅ Container cannot directly SSH to cluster nodes
- ✅ Rate limiting prevents abuse via socket
- ✅ IP restrictions on SSH keys limit lateral movement
- ✅ Audit logging tracks all temperature requests
Prerequisites
- Proxmox VE 7.0+ or Proxmox Backup Server 2.0+
- LXC container running Pulse (unprivileged recommended)
- Root access to Proxmox host(s)
- lm-sensors installed on all nodes
- Cluster SSH access configured (root passwordless SSH between nodes)
Host Hardening
Service Account
The proxy runs as the pulse-sensor-proxy user with these characteristics:
- System account (no login shell: /usr/sbin/nologin)
- No home directory
- Dedicated group: pulse-sensor-proxy
- Owns /var/lib/pulse-sensor-proxy and /run/pulse-sensor-proxy
Verify service account:
# Check user exists
id pulse-sensor-proxy
# Expected output:
# uid=XXX(pulse-sensor-proxy) gid=XXX(pulse-sensor-proxy) groups=XXX(pulse-sensor-proxy)
# Check shell (should be /usr/sbin/nologin)
getent passwd pulse-sensor-proxy | cut -d: -f7
Systemd Unit Security
The systemd unit includes comprehensive hardening directives:
Key security features:
- User=pulse-sensor-proxy / Group=pulse-sensor-proxy - Unprivileged execution
- NoNewPrivileges=true - Prevents privilege escalation
- ProtectSystem=strict - Read-only /usr, /boot, /efi
- ProtectHome=true - Inaccessible /home, /root, /run/user
- PrivateTmp=true - Isolated /tmp and /var/tmp
- SystemCallFilter=@system-service - Restricted syscalls
- CapabilityBoundingSet= - No capabilities granted
- RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6 - Socket restrictions
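Put together, the unit's [Service] section looks approximately like this (an illustrative sketch assembled from the directives above; the unit shipped by the install script is authoritative and may contain additional settings):

[Service]
User=pulse-sensor-proxy
Group=pulse-sensor-proxy
ExecStart=/usr/local/bin/pulse-sensor-proxy
RuntimeDirectory=pulse-sensor-proxy
RuntimeDirectoryMode=0775
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
SystemCallFilter=@system-service
CapabilityBoundingSet=
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6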
Verify systemd security:
# Check service status
systemctl status pulse-sensor-proxy
# Verify user/group
ps aux | grep pulse-sensor-proxy | grep -v grep
# Expected: pulse-sensor-proxy user, not root
# Check systemd security settings
systemctl show pulse-sensor-proxy | grep -E '(User=|NoNewPrivileges|ProtectSystem|CapabilityBoundingSet)'
File Permissions
Critical paths and ownership:
/var/lib/pulse-sensor-proxy/ pulse-sensor-proxy:pulse-sensor-proxy 0750
├── ssh/ pulse-sensor-proxy:pulse-sensor-proxy 0700
│ ├── id_ed25519 pulse-sensor-proxy:pulse-sensor-proxy 0600
│ └── id_ed25519.pub pulse-sensor-proxy:pulse-sensor-proxy 0640
└── ssh.d/ pulse-sensor-proxy:pulse-sensor-proxy 0750
├── next/ pulse-sensor-proxy:pulse-sensor-proxy 0750
└── prev/ pulse-sensor-proxy:pulse-sensor-proxy 0750
/run/pulse-sensor-proxy/ pulse-sensor-proxy:pulse-sensor-proxy 0775
└── pulse-sensor-proxy.sock pulse-sensor-proxy:pulse-sensor-proxy 0777
Verify permissions:
# Check base directory
ls -ld /var/lib/pulse-sensor-proxy/
# Expected: drwxr-x--- pulse-sensor-proxy pulse-sensor-proxy
# Check SSH keys
ls -l /var/lib/pulse-sensor-proxy/ssh/
# Expected:
# -rw------- pulse-sensor-proxy pulse-sensor-proxy id_ed25519
# -rw-r----- pulse-sensor-proxy pulse-sensor-proxy id_ed25519.pub
# Check socket directory (note: 0775 for container access)
ls -ld /run/pulse-sensor-proxy/
# Expected: drwxrwxr-x pulse-sensor-proxy pulse-sensor-proxy
Why 0775 on socket directory?
The socket directory needs 0775 (not 0770) to allow the container's unprivileged UID (e.g., 1001) to traverse into the directory and access the socket. The socket itself is 0777 as access control is enforced via SO_PEERCRED.
LXC Container Requirements
Configuration Summary
| Setting | Value | Purpose |
|---|---|---|
| lxc.idmap | u 0 100000 65536 / g 0 100000 65536 | Unprivileged UID/GID mapping |
| lxc.apparmor.profile | generated or custom | AppArmor confinement |
| lxc.cap.drop | sys_admin (optional) | Drop dangerous capabilities |
| lxc.mount.entry | Directory-level bind mount | Socket access from container |
Sample LXC Configuration
In /etc/pve/lxc/<VMID>.conf:
# Unprivileged container (required)
unprivileged: 1
# AppArmor profile (recommended)
lxc.apparmor.profile: generated
# Drop CAP_SYS_ADMIN if feasible (optional but recommended)
# WARNING: May break some container management operations
lxc.cap.drop: sys_admin
# Bind mount proxy socket directory (REQUIRED)
# Note: Directory-level mount, not socket-level (socket is recreated by systemd)
lxc.mount.entry: /run/pulse-sensor-proxy run/pulse-sensor-proxy none bind,create=dir 0 0
Key points:
- Directory-level mount: Mount the /run/pulse-sensor-proxy directory, not the socket file itself
- Why directory mount? Systemd recreates the socket on restart; socket-level mounts break on recreation
- Mode 0775: Socket directory needs group+other execute permissions for container UID traversal
- Socket 0777: Actual socket is world-writable; security enforced via SO_PEERCRED authentication
Runtime Verification
Check container is unprivileged:
# On host
pct config <VMID> | grep unprivileged
# Expected: unprivileged: 1
# Inside container
cat /proc/self/uid_map
# Expected: 0 100000 65536 (or similar)
# NOT: 0 0 4294967295 (privileged)
Check AppArmor confinement:
# Inside container
cat /proc/self/attr/current
# Expected: lxc-<vmid>_</var/lib/lxc> (enforcing) or similar
# NOT: unconfined
Check namespace isolation:
# Inside container
ls -li /proc/self/ns/
# Each namespace should have a unique inode number, different from host
Check capabilities:
# Inside container
capsh --print | grep Current
# Should show limited capability set
# If lxc.cap.drop: sys_admin is set, CAP_SYS_ADMIN should be absent
Check bind mount:
# Inside container
ls -la /run/pulse-sensor-proxy/
# Expected: pulse-sensor-proxy.sock visible
# Test socket access (requires Pulse to attempt connection)
socat - UNIX-CONNECT:/run/pulse-sensor-proxy/pulse-sensor-proxy.sock
# Should connect (may timeout waiting for input, but connection succeeds)
Key Management
SSH Key Restrictions
All SSH keys deployed to cluster nodes include these restrictions:
command="sensors -j"- Forced command (only sensors allowed)from="<subnets>"- IP address restrictionsno-port-forwarding- Disable port forwardingno-X11-forwarding- Disable X11 forwardingno-agent-forwarding- Disable agent forwardingno-pty- Disable PTY allocation
Example authorized_keys entry:
from="192.168.0.0/24,10.0.0.0/8",command="sensors -j",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-ed25519 AAAA... pulse-sensor-proxy
Configure allowed subnets:
Create /etc/pulse-sensor-proxy/config.yaml:
allowed_source_subnets:
- "192.168.0.0/24" # LAN subnet
- "10.0.0.0/8" # VPN subnet
Or use environment variable:
# In /etc/default/pulse-sensor-proxy (loaded by systemd)
PULSE_SENSOR_PROXY_ALLOWED_SUBNETS="192.168.0.0/24,10.0.0.0/8"
Auto-detection:
If no subnets are configured, the proxy auto-detects host IP addresses and uses them as /32 (IPv4) or /128 (IPv6) CIDRs. This is secure but brittle (breaks if host IP changes). Explicit configuration is recommended.
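In Go terms, the fallback amounts to something like this (a sketch of the idea, not the proxy's exact code):

package main

import (
	"fmt"
	"net"
)

// hostCIDRs mirrors the documented fallback: every non-loopback host
// address becomes a single-host CIDR (/32 for IPv4, /128 for IPv6).
func hostCIDRs() ([]string, error) {
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		return nil, err
	}
	var cidrs []string
	for _, addr := range addrs {
		ipnet, ok := addr.(*net.IPNet)
		if !ok || ipnet.IP.IsLoopback() {
			continue // the real implementation may also skip link-local addresses
		}
		if v4 := ipnet.IP.To4(); v4 != nil {
			cidrs = append(cidrs, v4.String()+"/32")
		} else {
			cidrs = append(cidrs, ipnet.IP.String()+"/128")
		}
	}
	return cidrs, nil
}

func main() {
	cidrs, err := hostCIDRs()
	if err != nil {
		panic(err)
	}
	fmt.Println(cidrs) // e.g. [192.168.0.5/32]
}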
Verify SSH restrictions:
# On any cluster node
grep pulse-sensor-proxy /root/.ssh/authorized_keys
# Expected format:
# from="...",command="sensors -j",no-* ssh-ed25519 AAAA... pulse-sensor-proxy
Key Rotation
Rotation cadence:
- Recommended: Every 90 days
- Minimum: Every 180 days
- After incident: Immediately
Rotation workflow:
The pulse-sensor-proxy-rotate-keys.sh script performs staged rotation with verification:
1. Dry-run (recommended first):

/opt/pulse/scripts/pulse-sensor-proxy-rotate-keys.sh --dry-run

Shows what would happen without making changes.

2. Perform rotation:

/opt/pulse/scripts/pulse-sensor-proxy-rotate-keys.sh

What happens:
- Generates a new Ed25519 keypair in /var/lib/pulse-sensor-proxy/ssh.d/next/
- Pushes the new key to all cluster nodes (via the ensure_cluster_keys RPC)
- Verifies SSH connectivity with the new key on each node
- Atomically swaps keys:
  - Current /ssh/ → /ssh.d/prev/ (backup)
  - Staging /ssh.d/next/ → /ssh/ (active)
- Old keys are preserved in /ssh.d/prev/ for rollback

3. If rotation fails, rollback:

/opt/pulse/scripts/pulse-sensor-proxy-rotate-keys.sh --rollback

Restores the previous keypair from /ssh.d/prev/ and re-pushes it to cluster nodes.
Post-rotation verification:
# Check new key timestamp
stat /var/lib/pulse-sensor-proxy/ssh/id_ed25519
# Verify all nodes have new key
for node in pve1 pve2 pve3; do
echo "=== $node ==="
ssh root@$node "grep pulse-sensor-proxy /root/.ssh/authorized_keys | tail -1"
done
# Test temperature fetch via proxy
echo '{"correlation_id":"test","method":"get_temp","params":{"node":"pve1"}}' | \
  socat - UNIX-CONNECT:/run/pulse-sensor-proxy/pulse-sensor-proxy.sock | jq .
Automated Rotation (Optional)
Create systemd timer:
/etc/systemd/system/pulse-sensor-proxy-key-rotation.service:
[Unit]
Description=Rotate pulse-sensor-proxy SSH keys
After=pulse-sensor-proxy.service
Requires=pulse-sensor-proxy.service
[Service]
Type=oneshot
ExecStart=/opt/pulse/scripts/pulse-sensor-proxy-rotate-keys.sh
StandardOutput=journal
StandardError=journal
/etc/systemd/system/pulse-sensor-proxy-key-rotation.timer:
[Unit]
Description=Rotate pulse-sensor-proxy SSH keys every 90 days
Requires=pulse-sensor-proxy-key-rotation.service
[Timer]
OnCalendar=quarterly
RandomizedDelaySec=1h
Persistent=true
[Install]
WantedBy=timers.target
Enable timer:
systemctl daemon-reload
systemctl enable --now pulse-sensor-proxy-key-rotation.timer
# Check next run
systemctl list-timers pulse-sensor-proxy-key-rotation.timer
Monitoring & Auditing
Metrics Endpoint
The proxy exposes Prometheus metrics on 127.0.0.1:9127 by default.
Available metrics:
- pulse_proxy_rpc_requests_total{method, result} - RPC request counter
- pulse_proxy_rpc_latency_seconds{method} - RPC handler latency histogram
- pulse_proxy_ssh_requests_total{node, result} - SSH request counter per node
- pulse_proxy_ssh_latency_seconds{node} - SSH latency histogram per node
- pulse_proxy_queue_depth - Concurrent RPC requests (gauge)
- pulse_proxy_rate_limit_hits_total - Rejected requests due to rate limiting
- pulse_proxy_build_info{version} - Build metadata
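For orientation, registering and serving counters like these with the Prometheus Go client looks roughly as follows (a sketch using names from the list above; not the proxy's actual wiring):

package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	rpcRequests = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "pulse_proxy_rpc_requests_total",
			Help: "RPC requests by method and result.",
		},
		[]string{"method", "result"},
	)
	queueDepth = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "pulse_proxy_queue_depth",
		Help: "Concurrent RPC requests in flight.",
	})
)

func main() {
	prometheus.MustRegister(rpcRequests, queueDepth)

	// An RPC handler would record each call like this:
	rpcRequests.WithLabelValues("get_temp", "success").Inc()

	// Serve /metrics on the documented default address.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe("127.0.0.1:9127", nil)
}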
Configure metrics address:
In /etc/default/pulse-sensor-proxy:
# Listen on all interfaces (WARNING: exposes metrics externally)
PULSE_SENSOR_PROXY_METRICS_ADDR="0.0.0.0:9127"
# Disable metrics
PULSE_SENSOR_PROXY_METRICS_ADDR="disabled"
Test metrics endpoint:
curl -s http://127.0.0.1:9127/metrics | grep pulse_proxy
Prometheus Integration
Sample scrape configuration:
scrape_configs:
- job_name: 'pulse-sensor-proxy'
static_configs:
- targets:
- 'pve1:9127'
- 'pve2:9127'
- 'pve3:9127'
relabel_configs:
- source_labels: [__address__]
regex: '([^:]+):.+'
target_label: instance
Alert Rules
Recommended Prometheus alerts:
groups:
- name: pulse-sensor-proxy
rules:
# High SSH failure rate
- alert: PulseProxySSHFailureRate
expr: |
rate(pulse_proxy_ssh_requests_total{result="error"}[5m]) > 0.1
for: 5m
labels:
severity: warning
annotations:
summary: "High SSH failure rate on {{ $labels.instance }}"
description: "{{ $value | humanize }} SSH requests/sec failing"
# Rate limiting active
- alert: PulseProxyRateLimiting
expr: |
rate(pulse_proxy_rate_limit_hits_total[5m]) > 0
for: 5m
labels:
severity: warning
annotations:
summary: "Rate limiting active on {{ $labels.instance }}"
description: "Proxy rejecting requests due to rate limits"
# High queue depth
- alert: PulseProxyQueueDepth
expr: pulse_proxy_queue_depth > 5
for: 5m
labels:
severity: warning
annotations:
summary: "High RPC queue depth on {{ $labels.instance }}"
description: "{{ $value }} concurrent requests (threshold: 5)"
# Proxy down
- alert: PulseProxyDown
expr: up{job="pulse-sensor-proxy"} == 0
for: 2m
labels:
severity: critical
annotations:
summary: "Pulse proxy down on {{ $labels.instance }}"
Audit Logging
Log format: All RPC requests are logged with structured fields:
- corr_id - Correlation ID (UUID, tracks request lifecycle)
- uid / pid - Peer credentials from SO_PEERCRED
- method - RPC method called (get_temp, register_nodes, ensure_cluster_keys)
Example log entries:
{"level":"info","corr_id":"a7f3d..","uid":1001,"pid":12345,"method":"get_temp","node":"pve1","msg":"RPC request"}
{"level":"info","corr_id":"a7f3d..","node":"pve1","latency_ms":245,"msg":"Temperature fetch successful"}
Query logs:
# All RPC requests in last hour
journalctl -u pulse-sensor-proxy --since "1 hour ago" -o json | \
jq -r 'select(.corr_id != null) | [.corr_id, .uid, .method, .node] | @tsv'
# Failed SSH requests
journalctl -u pulse-sensor-proxy --since today | grep -E '(SSH.*failed|error)'
# Rate limit hits
journalctl -u pulse-sensor-proxy --since today | grep "rate limit"
# Specific correlation ID
journalctl -u pulse-sensor-proxy | grep "corr_id=a7f3d"
Rate Limiting
Current limits (per peer UID+PID):
- Rate: 20 requests/minute (token bucket with burst)
- Burst: 10 requests
- Concurrency: 10 simultaneous requests
Behavior on limit exceeded:
- Request rejected immediately (no queuing)
- pulse_proxy_rate_limit_hits_total metric incremented
- Log entry: "Rate limit exceeded"
- HTTP-like semantics: similar to 429 Too Many Requests
Adjust limits:
Limits are hardcoded in throttle.go. To adjust, modify and rebuild:
// cmd/pulse-sensor-proxy/throttle.go
const (
requestsPerMin = 20 // Change this
requestBurst = 10 // Change this
maxConcurrent = 10 // Change this
)
Then rebuild and restart:
go build -v ./cmd/pulse-sensor-proxy
systemctl restart pulse-sensor-proxy
Incident Response
Suspected Compromise Checklist
If the proxy or host is suspected compromised:
1. Isolate immediately:

# Stop proxy service
systemctl stop pulse-sensor-proxy
# Block outbound SSH from host (if applicable)
iptables -A OUTPUT -p tcp --dport 22 -j REJECT

2. Rotate all keys:

# Remove compromised keys from all nodes
for node in pve1 pve2 pve3; do
  ssh root@$node "sed -i '/pulse-sensor-proxy/d' /root/.ssh/authorized_keys"
done
# Generate new keys (don't use the rotation script - it may be compromised)
rm -rf /var/lib/pulse-sensor-proxy/ssh*
mkdir -p /var/lib/pulse-sensor-proxy/ssh
ssh-keygen -t ed25519 -N '' -C "pulse-sensor-proxy emergency $(date -u +%Y%m%dT%H%M%SZ)" \
  -f /var/lib/pulse-sensor-proxy/ssh/id_ed25519
chown -R pulse-sensor-proxy:pulse-sensor-proxy /var/lib/pulse-sensor-proxy/ssh
chmod 0700 /var/lib/pulse-sensor-proxy/ssh
chmod 0600 /var/lib/pulse-sensor-proxy/ssh/id_ed25519
chmod 0640 /var/lib/pulse-sensor-proxy/ssh/id_ed25519.pub

3. Audit logs:

# Export all proxy logs
journalctl -u pulse-sensor-proxy --since "7 days ago" > /tmp/proxy-audit-$(date +%s).log
# Look for anomalies:
# - Unusual correlation IDs
# - High rate limit hits
# - Unexpected UIDs/PIDs
# - SSH errors to unexpected nodes

4. Reinstall proxy:

# Re-run installation script
/opt/pulse/scripts/install-temp-proxy.sh
# Verify service status
systemctl status pulse-sensor-proxy

5. Re-push keys:

# Use proxy RPC to push new keys
/opt/pulse/scripts/pulse-sensor-proxy-rotate-keys.sh

6. Verify no persistence mechanisms:

# Check for unexpected systemd units
systemctl list-units --all | grep -i proxy
# Check for unexpected cron jobs
crontab -l -u pulse-sensor-proxy
# Check for unauthorized files in /var/lib/pulse-sensor-proxy
find /var/lib/pulse-sensor-proxy -type f ! -path '*/ssh/*' ! -path '*/ssh.d/*'
Post-Incident Hardening
After an incident, consider:
- Audit all LXC containers for unexpected privilege escalation
- Review bind mounts on all containers (check for unauthorized mounts)
- Enable full syscall auditing (auditd) on the host
- Restrict network access to the proxy metrics endpoint (firewall 127.0.0.1:9127)
- Implement log aggregation (forward journald to a central SIEM)
Testing & Rollout
Development Testing
Before deploying to production, verify the implementation with these safe tests:
1. Build Verification:
# Compile proxy
cd /opt/pulse
go build -v ./cmd/pulse-sensor-proxy
# Verify binary
./pulse-sensor-proxy version
# Expected: pulse-sensor-proxy dev (or version number)
# Check help output
./pulse-sensor-proxy --help
2. Rotation Script Syntax:
# Syntax check
bash -n /opt/pulse/scripts/pulse-sensor-proxy-rotate-keys.sh
# Help output
/opt/pulse/scripts/pulse-sensor-proxy-rotate-keys.sh --help
# Dry-run (requires root and socket)
sudo /opt/pulse/scripts/pulse-sensor-proxy-rotate-keys.sh --dry-run
3. Configuration Validation:
# Test config file parsing
cat > /tmp/test-config.yaml <<EOF
allowed_source_subnets:
- "192.168.0.0/24"
- "10.0.0.0/8"
metrics_address: "127.0.0.1:9127"
EOF
# Validate YAML syntax
python3 -c "import yaml; yaml.safe_load(open('/tmp/test-config.yaml'))"
Production Rollout Checklist
Phase 1: Pre-Deployment (on host)
1. Backup current state:

# Backup systemd unit
cp /etc/systemd/system/pulse-sensor-proxy.service \
  /etc/systemd/system/pulse-sensor-proxy.service.backup
# Backup SSH keys
tar -czf /tmp/pulse-sensor-proxy-keys-backup-$(date +%s).tar.gz \
  /var/lib/pulse-sensor-proxy/ssh/
# Note current service status
systemctl status pulse-sensor-proxy > /tmp/pulse-sensor-proxy-status-before.txt

2. Create service account:

# Run install script or manually create
if ! id -u pulse-sensor-proxy >/dev/null 2>&1; then
  useradd --system --user-group --no-create-home --shell /usr/sbin/nologin pulse-sensor-proxy
fi

3. Update file ownership:

chown -R pulse-sensor-proxy:pulse-sensor-proxy /var/lib/pulse-sensor-proxy/
chmod 0750 /var/lib/pulse-sensor-proxy/
chmod 0700 /var/lib/pulse-sensor-proxy/ssh/
chmod 0600 /var/lib/pulse-sensor-proxy/ssh/id_ed25519
chmod 0640 /var/lib/pulse-sensor-proxy/ssh/id_ed25519.pub
Phase 2: Deploy Hardened Version
1. Build and install binary:

cd /opt/pulse
go build -v -o /tmp/pulse-sensor-proxy ./cmd/pulse-sensor-proxy
# Verify build
/tmp/pulse-sensor-proxy version
# Install
sudo install -m 0755 -o root -g root /tmp/pulse-sensor-proxy /usr/local/bin/pulse-sensor-proxy

2. Install hardened systemd unit:

# Copy hardened unit
sudo cp /opt/pulse/scripts/pulse-sensor-proxy.service /etc/systemd/system/
# Verify syntax
systemd-analyze verify /etc/systemd/system/pulse-sensor-proxy.service
# Reload systemd
sudo systemctl daemon-reload

3. Update RuntimeDirectoryMode for LXC access:

# Ensure socket directory is accessible from container
sudo mkdir -p /etc/systemd/system/pulse-sensor-proxy.service.d/
sudo tee /etc/systemd/system/pulse-sensor-proxy.service.d/lxc-access.conf <<'EOF'
[Service]
RuntimeDirectoryMode=0775
EOF
sudo systemctl daemon-reload
Phase 3: Restart and Verify

1. Restart service:

sudo systemctl restart pulse-sensor-proxy
# Check status
sudo systemctl status pulse-sensor-proxy
2. Verify service user:

ps aux | grep pulse-sensor-proxy | grep -v grep
# Expected: pulse-sensor-proxy user, not root

3. Check socket permissions:

ls -ld /run/pulse-sensor-proxy/
# Expected: drwxrwxr-x pulse-sensor-proxy pulse-sensor-proxy
ls -l /run/pulse-sensor-proxy/pulse-sensor-proxy.sock
# Expected: srwxrwxrwx pulse-sensor-proxy pulse-sensor-proxy

4. Test from container:

# Inside LXC container running Pulse
ls -la /run/pulse-sensor-proxy/
# Should show socket
# Check Pulse logs for connection success
journalctl -u pulse-backend -n 50 | grep -i temperature
Phase 4: End-to-End Validation
1. Test RPC methods:

# On host, test socket connectivity
echo '{"correlation_id":"test-001","method":"register_nodes","params":{}}' | \
  sudo socat - UNIX-CONNECT:/run/pulse-sensor-proxy/pulse-sensor-proxy.sock | jq .
# Should return cluster nodes list

2. Test temperature fetch:

# From container or via socket
echo '{"correlation_id":"test-002","method":"get_temp","params":{"node":"pve1"}}' | \
  socat - UNIX-CONNECT:/run/pulse-sensor-proxy/pulse-sensor-proxy.sock | jq .
# Should return sensors JSON data

3. Verify metrics endpoint:

curl -s http://127.0.0.1:9127/metrics | grep pulse_proxy
# Should show metrics like:
# pulse_proxy_rpc_requests_total{method="get_temp",result="success"} N
# pulse_proxy_queue_depth 0

4. Test SSH key rotation:

# Dry-run first
sudo /opt/pulse/scripts/pulse-sensor-proxy-rotate-keys.sh --dry-run
# Full rotation (if confident)
sudo /opt/pulse/scripts/pulse-sensor-proxy-rotate-keys.sh
# Verify all nodes updated
for node in pve1 pve2 pve3; do
  ssh root@$node "tail -1 /root/.ssh/authorized_keys"
done

5. Audit logging verification:

# Check logs include correlation IDs and peer credentials
sudo journalctl -u pulse-sensor-proxy --since "5 minutes ago" -o json | \
  jq -r 'select(.corr_id != null) | [.corr_id, .uid, .method] | @tsv'
# Should show structured logging with UIDs
Phase 5: Monitoring Setup
1. Configure Prometheus scraping:

# Add to prometheus.yml
scrape_configs:
  - job_name: 'pulse-sensor-proxy'
    static_configs:
      - targets: ['localhost:9127']

2. Import alert rules:

# Copy alert rules from docs to Prometheus alerts directory
# Reload Prometheus configuration

3. Verify alerts fire (optional stress test):

# Generate rate limit hits (test alert)
for i in {1..50}; do
  echo '{"correlation_id":"stress-'$i'","method":"register_nodes","params":{}}' | \
    socat - UNIX-CONNECT:/run/pulse-sensor-proxy/pulse-sensor-proxy.sock &
done
wait
# Check rate limit metric increased
curl -s http://127.0.0.1:9127/metrics | grep rate_limit_hits
Rollback Procedure
If issues occur during rollout:
1. Stop new service:

sudo systemctl stop pulse-sensor-proxy

2. Restore backup:

sudo cp /etc/systemd/system/pulse-sensor-proxy.service.backup \
  /etc/systemd/system/pulse-sensor-proxy.service
sudo systemctl daemon-reload

3. Restore SSH keys (if rotated):

# If rotation was performed and failed
sudo /opt/pulse/scripts/pulse-sensor-proxy-rotate-keys.sh --rollback

4. Restart with old configuration:

sudo systemctl restart pulse-sensor-proxy
sudo systemctl status pulse-sensor-proxy

5. Verify Pulse connectivity:

# Check Pulse can still fetch temperatures
# Monitor Pulse logs
Known Limitations
- No automated unit tests: Code verification relies on build success and manual testing
- Key rotation requires manual trigger: Automated timer setup is optional
- Metrics require Prometheus: No built-in alerting without external monitoring
- LXC bind mount required: Container must have directory-level bind mount configured
- Root required for rotation script: Script needs root to run the ensure_cluster_keys RPC
Future Improvements
- Add Go unit tests for validation, throttling, and metrics logic
- Implement health check endpoint (e.g., /health) separate from metrics
- Add support for TLS on metrics endpoint
- Create automated integration test suite
- Add --check flag to rotation script for pre-flight validation
- Support for multiple LXC containers accessing the same proxy instance
Appendix
Quick Verification Checklist
Host:
- Service running as pulse-sensor-proxy user (not root)
- Keys in /var/lib/pulse-sensor-proxy/ssh/ owned by pulse-sensor-proxy:pulse-sensor-proxy
- Private key permissions: 0600
- Socket directory permissions: 0775 (not 0770)
- Metrics endpoint accessible: curl http://127.0.0.1:9127/metrics
Container:
- Container is unprivileged (unprivileged: 1 in config)
- Bind mount exists: ls /run/pulse-sensor-proxy/pulse-sensor-proxy.sock
- AppArmor enforced: cat /proc/self/attr/current shows confinement
- Pulse can connect to socket (check Pulse logs)
SSH Keys:
- All nodes have the pulse-sensor-proxy key in /root/.ssh/authorized_keys
- Keys include from="..." restrictions
- Keys include the command="sensors -j" forced command
- Keys include no-port-forwarding, no-X11-forwarding, no-agent-forwarding, no-pty
Monitoring:
- Prometheus scraping metrics successfully
- Alerts configured for SSH failures, rate limiting, queue depth
- Logs forwarded to central logging (optional but recommended)
Reference Commands
Service Management:
systemctl status pulse-sensor-proxy # Check service status
systemctl restart pulse-sensor-proxy # Restart service
journalctl -u pulse-sensor-proxy -f # Tail logs
Key Management:
/opt/pulse/scripts/pulse-sensor-proxy-rotate-keys.sh --dry-run # Dry-run rotation
/opt/pulse/scripts/pulse-sensor-proxy-rotate-keys.sh # Perform rotation
/opt/pulse/scripts/pulse-sensor-proxy-rotate-keys.sh --rollback # Rollback
Metrics:
curl http://127.0.0.1:9127/metrics # Fetch all metrics
curl -s http://127.0.0.1:9127/metrics | grep pulse_proxy # Filter proxy metrics
Manual RPC (Testing):
# Using socat (inline JSON)
echo '{"correlation_id":"test","method":"get_temp","params":{"node":"pve1"}}' | \
socat - UNIX-CONNECT:/run/pulse-sensor-proxy/pulse-sensor-proxy.sock
# Using Python (proper JSON-RPC client)
python3 <<'PY'
import json, socket, uuid
payload = {
"correlation_id": str(uuid.uuid4()),
"method": "get_temp",
"params": {"node": "pve1"}
}
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
s.connect("/run/pulse-sensor-proxy/pulse-sensor-proxy.sock")
s.sendall((json.dumps(payload) + "\n").encode())
s.shutdown(socket.SHUT_WR)
print(s.recv(65536).decode())
PY
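The same request in Go, for completeness (a sketch assuming the newline-delimited JSON framing shown above; the correlation ID is a placeholder):

package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net"
)

func main() {
	conn, err := net.Dial("unix", "/run/pulse-sensor-proxy/pulse-sensor-proxy.sock")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	req := map[string]any{
		"correlation_id": "test-go",
		"method":         "get_temp",
		"params":         map[string]string{"node": "pve1"},
	}
	payload, _ := json.Marshal(req)

	// Newline-delimited JSON; half-close signals end of input,
	// matching the Python client's shutdown(SHUT_WR).
	conn.Write(append(payload, '\n'))
	conn.(*net.UnixConn).CloseWrite()

	resp, _ := io.ReadAll(conn)
	fmt.Println(string(resp))
}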
Verification:
# Check service user
ps aux | grep pulse-sensor-proxy | grep -v grep
# Check file ownership
ls -lR /var/lib/pulse-sensor-proxy/
# Check bind mount in container
pct enter <VMID>
ls -la /run/pulse-sensor-proxy/
# Check SSH keys on nodes
for node in pve1 pve2 pve3; do
echo "=== $node ==="
ssh root@$node "grep pulse-sensor-proxy /root/.ssh/authorized_keys"
done
Document Version: 1.0
Last Updated: 2025-10-13
Applies To: pulse-sensor-proxy v1.0+