Pulse/internal/monitoring
rcourtman bb7ca93c18 feat: Add mdadm RAID monitoring support for host agents
Implements comprehensive mdadm RAID array monitoring for Linux hosts
via pulse-host-agent. Arrays are automatically detected and monitored
with real-time status updates, rebuild progress tracking, and automatic
alerting for degraded or failed arrays.

Key changes:

**Backend:**
- Add mdadm package for parsing mdadm --detail output
- Extend host agent report structure with RAID array data
- Integrate mdadm collection into host agent (Linux-only, best-effort)
- Add RAID array processing in monitoring system
- Implement automatic alerting:
  - Critical alerts for degraded arrays or arrays with failed devices
  - Warning alerts for rebuilding/resyncing arrays with progress tracking
  - Auto-clear alerts when arrays return to healthy state
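
The alerting rules above could be sketched roughly as follows. This is an illustrative sketch, not the code in monitor.go: the `RAIDArray` type, the `classify` function, and the `AlertLevel` constants are hypothetical names. The precedence (failed devices first, then rebuild/resync, then degraded) is also an assumption, since the commit message does not say which alert wins when an array is both degraded and rebuilding.

```go
package main

import (
	"fmt"
	"strings"
)

// RAIDArray is a hypothetical, simplified view of one mdadm array.
type RAIDArray struct {
	Device         string
	State          string // e.g. "clean", "clean, degraded", "clean, degraded, recovering"
	FailedDevices  int
	RebuildPercent float64
}

// AlertLevel is a hypothetical severity type for this sketch.
type AlertLevel string

const (
	AlertNone     AlertLevel = ""
	AlertWarning  AlertLevel = "warning"
	AlertCritical AlertLevel = "critical"
)

// classify maps an array to an alert level. Returning AlertNone for a
// healthy array is what would let the caller clear an existing alert.
func classify(a RAIDArray) AlertLevel {
	state := strings.ToLower(a.State)
	rebuilding := strings.Contains(state, "recover") || strings.Contains(state, "resync")
	switch {
	case a.FailedDevices > 0:
		return AlertCritical
	case rebuilding: // assumed to take precedence over "degraded"
		return AlertWarning
	case strings.Contains(state, "degraded"):
		return AlertCritical
	default:
		return AlertNone
	}
}

func main() {
	arrays := []RAIDArray{
		{Device: "/dev/md0", State: "clean"},
		{Device: "/dev/md1", State: "clean, degraded"},
		{Device: "/dev/md2", State: "clean, degraded, recovering", RebuildPercent: 42},
	}
	for _, a := range arrays {
		fmt.Printf("%s -> %q\n", a.Device, classify(a))
	}
}
```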

**Frontend:**
- Add TypeScript types for RAID arrays and devices
- Display RAID arrays in host details drawer with:
  - Array status (clean/degraded/recovering) with color-coded indicators
  - Device counts (active/total/failed/spare)
  - Rebuild progress percentage and speed when applicable
  - Green for healthy, amber for rebuilding, red for degraded

**Documentation:**
- Document mdadm monitoring feature in HOST_AGENT.md
- Explain requirements (Linux, mdadm installed, root access)
- Clarify scope (software RAID only, hardware RAID not supported)

**Testing:**
- Add comprehensive tests for mdadm output parsing
- Test parsing of healthy, degraded, and rebuilding arrays
- Verify proper extraction of device states and rebuild progress
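
For orientation, parsing `mdadm --detail` output might look something like the sketch below. It is not the parser added in this commit: `ArrayDetail` and `parseDetail` are hypothetical names, and the sketch assumes the usual `Key : Value` layout of `mdadm --detail`, reading only a handful of fields.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// ArrayDetail is a hypothetical subset of the fields in `mdadm --detail`.
type ArrayDetail struct {
	Device         string
	Level          string
	State          string
	ActiveDevices  int
	FailedDevices  int
	SpareDevices   int
	RebuildPercent float64
}

// parseDetail extracts a few "Key : Value" fields from the output.
// Unknown or malformed lines are skipped rather than treated as errors.
func parseDetail(out string) ArrayDetail {
	var d ArrayDetail
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// The first line names the array, e.g. "/dev/md0:".
		if d.Device == "" && strings.HasPrefix(line, "/dev/") && strings.HasSuffix(line, ":") {
			d.Device = strings.TrimSuffix(line, ":")
			continue
		}
		key, val, ok := strings.Cut(line, " : ")
		if !ok {
			continue
		}
		key, val = strings.TrimSpace(key), strings.TrimSpace(val)
		switch key {
		case "Raid Level":
			d.Level = val
		case "State":
			d.State = val
		case "Active Devices":
			d.ActiveDevices, _ = strconv.Atoi(val)
		case "Failed Devices":
			d.FailedDevices, _ = strconv.Atoi(val)
		case "Spare Devices":
			d.SpareDevices, _ = strconv.Atoi(val)
		case "Rebuild Status":
			// Value looks like "42% complete".
			pct, _, _ := strings.Cut(val, "%")
			d.RebuildPercent, _ = strconv.ParseFloat(strings.TrimSpace(pct), 64)
		}
	}
	return d
}

func main() {
	sample := `/dev/md0:
        Raid Level : raid1
             State : clean, degraded, recovering
    Active Devices : 1
    Failed Devices : 0
     Spare Devices : 1
    Rebuild Status : 42% complete
`
	fmt.Printf("%+v\n", parseDetail(sample))
}
```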

All builds pass successfully. RAID monitoring is automatic and best-effort:
if mdadm is not installed or no arrays exist, the host agent continues
reporting other metrics normally.

Related to #676
2025-11-09 16:36:33 +00:00
backoff.go feat: implement error handling with circuit breakers and backoff (Phase 2 Task 7) 2025-10-20 15:13:37 +00:00
backoff_test.go test: add comprehensive unit tests for backoff and circuit breaker (Phase 2 Task 9a) 2025-10-20 15:13:38 +00:00
backup_guard_test.go Guard PBS backups from failed polls 2025-11-05 19:26:20 +00:00
ceph.go Fix settings security tab navigation 2025-10-11 23:29:47 +00:00
circuit_breaker.go feat: enhance scheduler health API with rich instance metadata 2025-10-20 15:13:38 +00:00
circuit_breaker_test.go test: add comprehensive unit tests for backoff and circuit breaker (Phase 2 Task 9a) 2025-10-20 15:13:38 +00:00
container_disk_usage.go feat: add professional logging with runtime configuration and performance optimization 2025-10-20 15:13:38 +00:00
diagnostic_snapshots.go Refine Proxmox node memory fallback (#582) 2025-10-22 15:36:26 +00:00
docker_commands.go feat: add docker agent command handling 2025-10-15 19:27:19 +00:00
docker_commands_test.go chore: snapshot current changes 2025-11-02 22:47:55 +00:00
fake_executor_integration.go test: add comprehensive integration test harness for adaptive polling (Phase 2 Task 9c) 2025-10-20 15:13:38 +00:00
fs_filters.go Ignore read-only guest filesystems in disk aggregation 2025-10-14 16:13:53 +00:00
fs_filters_test.go Ignore read-only guest filesystems in disk aggregation 2025-10-14 16:13:53 +00:00
harness_integration.go Surface LXC interface IPs via PVE interfaces API (#596) 2025-10-23 08:07:32 +00:00
helpers_test.go Expand monitoring and discovery test coverage 2025-10-16 08:17:08 +00:00
integration_integration_test.go test: add soak test with runtime instrumentation (Phase 2 Task 9d) 2025-10-20 15:13:38 +00:00
main_test.go Harden setup token flow and enforce encrypted persistence 2025-10-25 16:00:37 +00:00
metrics.go perf: reduce polling allocations and guest metadata load 2025-10-25 13:12:47 +00:00
metrics_history.go Fix settings security tab navigation 2025-10-11 23:29:47 +00:00
metrics_history_concurrency_test.go Fix settings security tab navigation 2025-10-11 23:29:47 +00:00
monitor.go feat: Add mdadm RAID monitoring support for host agents 2025-11-09 16:36:33 +00:00
monitor_docker_test.go Refactor: Code cleanup and localStorage consolidation 2025-11-04 21:50:46 +00:00
monitor_health_test.go feat: enhance scheduler health API with rich instance metadata 2025-10-20 15:13:38 +00:00
monitor_host_agents_test.go perf: reduce polling allocations and guest metadata load 2025-10-25 13:12:47 +00:00
monitor_memory_test.go Fix inflated RAM usage reporting for LXC containers 2025-11-06 00:16:18 +00:00
monitor_pmg_test.go Fix PMG API parameter issues causing 400 errors 2025-11-05 19:28:37 +00:00
monitor_polling.go Fix guest agent disk data regression on Proxmox 8.3+ 2025-11-06 18:42:46 +00:00
monitor_snapshots_test.go Fix inflated RAM usage reporting for LXC containers 2025-11-06 00:16:18 +00:00
monitor_storage_test.go Fix inflated RAM usage reporting for LXC containers 2025-11-06 00:16:18 +00:00
poller.go feat: add professional logging with runtime configuration and performance optimization 2025-10-20 15:13:38 +00:00
ratetracker.go Fix settings security tab navigation 2025-10-11 23:29:47 +00:00
ratetracker_concurrency_test.go Fix settings security tab navigation 2025-10-11 23:29:47 +00:00
reload.go Propagate config updates to settings nodes (#588) 2025-10-22 13:45:13 +00:00
scheduler.go feat: enhance scheduler health API with rich instance metadata 2025-10-20 15:13:38 +00:00
staleness_tracker.go release: prepare v4.25.0 2025-10-22 10:46:18 +00:00
staleness_tracker_test.go test: add comprehensive staleness tracker unit tests (Phase 2 Task 9b) 2025-10-20 15:13:38 +00:00
task_queue.go perf: reduce polling allocations and guest metadata load 2025-10-25 13:12:47 +00:00
temperature.go Improve Docker temperature monitoring documentation for clarity (related to #600) 2025-11-07 15:09:42 +00:00
temperature_service.go Add configurable SSH port for temperature monitoring 2025-11-05 20:03:29 +00:00
temperature_test.go Expand temperature sensor compatibility for SuperIO and AMD CPUs 2025-11-05 18:47:21 +00:00