Pulse/docs/operations/pulse-sensor-proxy-runbook.md
rcourtman 524f42cc28 security: complete Phase 1 sensor proxy hardening
Implements comprehensive security hardening for pulse-sensor-proxy:
- Privilege drop from root to unprivileged user (UID 995)
- Hash-chained tamper-evident audit logging with remote forwarding
- Per-UID rate limiting (0.2 QPS, burst 2) with concurrency caps
- Enhanced command validation with 10+ attack pattern tests
- Fuzz testing (7M+ executions, 0 crashes)
- SSH hardening, AppArmor/seccomp profiles, operational runbooks

All 27 Phase 1 tasks complete. Ready for production deployment.
2025-10-20 15:13:37 +00:00

3 KiB

Pulse Sensor Proxy Runbook

Quick Reference

  • Binary: /opt/pulse/sensor-proxy/bin/pulse-sensor-proxy
  • Unit: pulse-sensor-proxy.service
  • Logs: /var/log/pulse/sensor-proxy/proxy.log
  • Audit trail: /var/log/pulse/sensor-proxy/audit.log (hash chained, forwarded via rsyslog)
  • Metrics: http://127.0.0.1:9456/metrics
  • Limiters: per-UID token bucket (burst 2) + global concurrency (8)

Monitoring Alerts & Response

Rate Limit Hits (pulse_proxy_limiter_rejections_total)

  1. Check audit log entries tagged limiter.rejection for offending UID.
  2. Confirm workload legitimacy; if expected, consider increasing limits via config override.
  3. If malicious, block source process/user and inspect Pulse audit logs.

Penalty Events (pulse_proxy_limiter_penalties_total)

  1. Review corresponding validation failures in audit log (command.validation_failed).
  2. If repeated invalid JSON/unknown methods, inspect caller code for regressions or intrusion attempts.

Audit Log Forwarder Down

  1. journalctl -u rsyslog to confirm transmission errors.
  2. Ensure /etc/pulse/log-forwarding certs valid & remote host reachable.
  3. Forwarding queue stored locally in /var/log/pulse/sensor-proxy/forwarding.log; ship manually if outage exceeds 1 hour.

Proxy Health Endpoint Fails

  1. systemctl status pulse-sensor-proxy
  2. Check /var/log/pulse/sensor-proxy/proxy.log for panic or limiter exhaustion.
  3. Inspect /var/log/pulse/sensor-proxy/audit.log for recent privileged method denials.

Standard Procedures

Restart Proxy Safely

sudo systemctl stop pulse-sensor-proxy
sudo apparmor_parser -r /etc/apparmor.d/pulse-sensor-proxy   # if updating policy
sudo systemctl start pulse-sensor-proxy

Verify: curl -s http://127.0.0.1:9456/metrics | grep pulse_proxy_build_info.

Rotate SSH Keys

  1. Run scripts/secure-sensor-files.sh to regenerate keys (ensure environment locked down).
  2. Use RPC ensure_cluster_keys to distribute new public key.
  3. Confirm nodes accept ssh from proxy host.

Adjust Rate Limits

  1. Update limiter_policy environment overrides (future config).
  2. Restart proxy; monitor limiter metrics to validate new thresholds.
  3. Document change in security runbook.

Incident Handling

  • Unauthorized Command Attempt: audit log shows command.validation_failed and limiter penalties; capture correlation ID, check Pulse side for compromised container.
  • Excessive Temperature Failures: refer to pulse_proxy_ssh_requests_total{result="error"}; validate network ACLs and node health; escalate to Proxmox team if nodes unreachable.
  • Log Tampering Suspected: verify audit hash chain by replaying eventHash values; compare with remote log store (immutable). Trigger security response if mismatch.

Postmortem Checklist

  • Timeline: command audit entries, limiter stats, rsyslog queue depth.
  • Verify AppArmor/seccomp status (aa-status, systemctl show pulse-sensor-proxy -p AppArmorProfile).
  • Ensure firewall ACLs match docs/security/pulse-sensor-proxy-network.md.