Pulse/internal/alerts
rcourtman bb7ca93c18 feat: Add mdadm RAID monitoring support for host agents
Implements comprehensive mdadm RAID array monitoring for Linux hosts
via pulse-host-agent. Arrays are automatically detected and monitored
with real-time status updates, rebuild progress tracking, and automatic
alerting for degraded or failed arrays.

Key changes:

**Backend:**
- Add mdadm package for parsing mdadm --detail output
- Extend host agent report structure with RAID array data
- Integrate mdadm collection into host agent (Linux-only, best-effort)
- Add RAID array processing in monitoring system
- Implement automatic alerting:
  - Critical alerts for degraded arrays or arrays with failed devices
  - Warning alerts for rebuilding/resyncing arrays with progress tracking
  - Auto-clear alerts when arrays return to healthy state
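
The alerting rules above can be sketched in Go as a small classification function. This is an illustration only, not the code in alerts.go: the `Severity`, `Array`, and `classify` names are hypothetical, and the precedence (failed devices first, then rebuild/resync, then degraded) is an assumption, since the commit message does not say which rule wins when an array is both degraded and recovering.

```go
package main

import (
	"fmt"
	"strings"
)

// Severity is a hypothetical alert level used only in this sketch.
type Severity string

const (
	None     Severity = "none"
	Warning  Severity = "warning"
	Critical Severity = "critical"
)

// Array holds the fields the classification needs (names are illustrative).
type Array struct {
	State         string // raw mdadm state, e.g. "clean, degraded, recovering"
	FailedDevices int
}

// classify maps an array to an alert level.
// Assumed precedence: failed devices, then rebuild/resync, then degraded.
func classify(a Array) Severity {
	s := strings.ToLower(a.State)
	rebuilding := strings.Contains(s, "recover") || strings.Contains(s, "resync")
	switch {
	case a.FailedDevices > 0:
		return Critical
	case rebuilding:
		return Warning
	case strings.Contains(s, "degraded"):
		return Critical
	default:
		return None // healthy: any existing alert would be cleared
	}
}

func main() {
	for _, a := range []Array{
		{State: "clean"},
		{State: "clean, degraded"},
		{State: "clean, degraded, recovering"},
		{State: "active, degraded", FailedDevices: 1},
	} {
		fmt.Printf("%-30q -> %s\n", a.State, classify(a))
	}
}
```

Returning `None` for a healthy array is what would let a caller clear a previously raised alert, matching the auto-clear behaviour listed above.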

**Frontend:**
- Add TypeScript types for RAID arrays and devices
- Display RAID arrays in host details drawer with:
  - Array status (clean/degraded/recovering) with color-coded indicators
  - Device counts (active/total/failed/spare)
  - Rebuild progress percentage and speed when applicable
  - Green for healthy, amber for rebuilding, red for degraded

**Documentation:**
- Document mdadm monitoring feature in HOST_AGENT.md
- Explain requirements (Linux, mdadm installed, root access)
- Clarify scope (software RAID only, hardware RAID not supported)

**Testing:**
- Add comprehensive tests for mdadm output parsing
- Test parsing of healthy, degraded, and rebuilding arrays
- Verify proper extraction of device states and rebuild progress
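
As a rough illustration of what such a parsing test exercises, the sketch below extracts the state, device counts, and rebuild progress from `mdadm --detail` style output. The `ArrayDetail` type, the `parseDetail` function, and the sample text are assumptions made for this example; the actual mdadm package in this commit may be structured differently.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// ArrayDetail is a hypothetical subset of the fields in `mdadm --detail` output.
type ArrayDetail struct {
	State          string
	ActiveDevices  int
	FailedDevices  int
	SpareDevices   int
	RebuildPercent float64
}

// parseDetail reads "Key : Value" lines and ignores everything else.
func parseDetail(out string) ArrayDetail {
	var d ArrayDetail
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), " : ")
		if !ok {
			continue // header line or device table row
		}
		key, val = strings.TrimSpace(key), strings.TrimSpace(val)
		switch key {
		case "State":
			d.State = val
		case "Active Devices":
			d.ActiveDevices, _ = strconv.Atoi(val)
		case "Failed Devices":
			d.FailedDevices, _ = strconv.Atoi(val)
		case "Spare Devices":
			d.SpareDevices, _ = strconv.Atoi(val)
		case "Rebuild Status":
			// Value looks like "42% complete"; keep the number before '%'.
			pct, _, _ := strings.Cut(val, "%")
			d.RebuildPercent, _ = strconv.ParseFloat(strings.TrimSpace(pct), 64)
		}
	}
	return d
}

// sampleRebuilding is hand-written sample input, not captured from a real host.
const sampleRebuilding = `/dev/md0:
        Raid Level : raid1
             State : clean, degraded, recovering
    Active Devices : 1
    Failed Devices : 0
     Spare Devices : 1
    Rebuild Status : 42% complete
`

func main() {
	fmt.Printf("%+v\n", parseDetail(sampleRebuilding))
}
```

Parse errors are deliberately ignored here so that an unexpected value leaves the field at zero instead of aborting the whole parse; a stricter parser might surface them.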

All builds pass successfully. RAID monitoring is automatic and best-effort:
if mdadm is not installed or no arrays exist, the host agent continues
reporting other metrics normally.

Related to #676
2025-11-09 16:36:33 +00:00
| File | Last commit | Date |
| --- | --- | --- |
| alerts.go | feat: Add mdadm RAID monitoring support for host agents | 2025-11-09 16:36:33 +00:00 |
| alerts_test.go | Improve backup-age alerts to show VM/CT names in multi-cluster setups (related to #668) | 2025-11-08 18:24:04 +00:00 |
| concurrency_test.go | Fix settings security tab navigation | 2025-10-11 23:29:47 +00:00 |
| history.go | Fix critical alert system concurrency and memory leak issues | 2025-11-07 09:12:28 +00:00 |
| history_concurrency_test.go | Fix settings security tab navigation | 2025-10-11 23:29:47 +00:00 |
| offline_toggle_test.go | feat: finalize swarm service monitoring (#598) | 2025-10-26 09:35:49 +00:00 |
| per_metric_delay_example_test.go | Add configurable SSH port for temperature monitoring | 2025-11-05 20:03:29 +00:00 |
| quiet_hours_test.go | Fix settings security tab navigation | 2025-10-11 23:29:47 +00:00 |
| threshold_update_test.go | Fix settings security tab navigation | 2025-10-11 23:29:47 +00:00 |
| time_threshold_test.go | Fix settings security tab navigation | 2025-10-11 23:29:47 +00:00 |