This change fixes backup-age alert notifications to display VM/CT names
instead of just "VMID XXX" in multi-cluster environments where backups
are stored on PBS.
Changes:
- Store all guests per VMID (not just first match) to handle VMID collisions across clusters
- Persist last-known guest names/types in metadata store for deleted VMs
- Enrich backup correlation with persisted metadata when live inventory is empty
- Update CheckBackups to handle multiple VMID matches intelligently
The fix addresses two scenarios:
1. Multiple PVE clusters with same VMID backing up to one PBS
2. VMs deleted from Proxmox but backups still exist on PBS
Backup-age alerts will now show proper VM/CT names when:
- A unique guest exists with that VMID (live or persisted)
- Multiple guests share a VMID (uses first match, consistent with current behavior)
When truly ambiguous (multiple live VMs, same VMID, no way to determine origin),
the alert gracefully falls back to showing "VMID XXX".
The original fix in c6c0ac63e only handled per-resource overrides when
thresholds were disabled (trigger <= 0 or Disabled=true). It did not
handle global DisableAll* flags (DisableAllStorage, DisableAllNodes,
DisableAllGuests, etc.).
When a user toggled a DisableAll* flag from false to true:
- Check* functions returned early without processing
- Existing active alerts remained in m.activeAlerts map
- Those alerts continued generating webhook notifications
- reevaluateActiveAlertsLocked didn't check DisableAll* flags
This commit fixes the issue by:
1. Updating reevaluateActiveAlertsLocked to check all DisableAll* flags
and resolve alerts for those resource types during config updates
2. Adding alert cleanup to Check* functions before early returns:
- CheckStorage: clears usage and offline alerts
- CheckNode: clears cpu/memory/disk/temperature and offline alerts
- CheckPMG: clears queue/message alerts and offline alerts
- CheckPBS: clears cpu/memory and offline alerts
- CheckHost: calls existing cleanup helpers
3. Adding comprehensive test coverage for DisableAllStorage scenario
Related to #561
Related to #547 and #622
## Samsung SSD Fix (#547)
Samsung 980 and 990 series SSDs have known firmware bugs that cause them to
report incorrect health status (typically FAILED or critical warnings) even
when the drives are actually healthy. This is commonly due to incorrect
temperature threshold reporting in the firmware.
This change adds special handling to detect these drives and skip health
status alerts while still monitoring wearout metrics, which remain reliable.
The fix also clears any existing false alerts for these drives.
Users experiencing these false alerts should update their Samsung SSD firmware
to the latest version from Samsung, which typically resolves the issue.
## Docker Agent CPU Fix (#622)
Addresses issue where Docker container CPU usage shows 0%. The Docker
agent uses ContainerStatsOneShot which typically doesn't populate
PreCPUStats, requiring manual delta tracking between collection cycles.
Changes:
- Fix logic bug where prevContainerCPU was updated before checking if
previous sample existed, causing incorrect delta calculations
- Add comprehensive debug logging showing which calculation method
succeeded (PreCPUStats, system delta, or time-based fallback)
- Add warning after 10 PreCPUStats failures to inform about manual
tracking mode (normal for one-shot stats)
- Add detailed failure logging when CPU calculation cannot complete
Expected behavior: First collection cycle returns 0% (no previous
sample), subsequent cycles show accurate CPU metrics.
Extends the Docker monitoring and alerting system to track writable layer
usage as a percentage of the container's root filesystem. This helps
identify containers with bloated copy-on-write layers before they
consume excessive disk space.
- Add disk threshold to DockerThresholdConfig (default: 85% trigger, 80% clear)
- Evaluate disk alerts for running containers when RootFilesystemBytes > 0
- Include disk metadata (writable layer, total filesystem, block I/O stats)
- Update frontend to display and configure disk thresholds
- Add test coverage for disk usage alert hysteresis
- Document disk monitoring in DOCKER_MONITORING.md
Per-container and per-host overrides apply to disk thresholds the same
way they do for CPU and memory.
- Add comprehensive test coverage for alerts package with 285+ new tests
- Implement ThresholdsTable component with metric thresholds display
- Enhance Alerts page UI with improved layout and metric filtering
- Add frontend component tests for Alerts page and ThresholdsTable
- Set up Vitest testing infrastructure for SolidJS components
- Improve config persistence with better validation
- Expand discovery tests with 333+ test cases
- Update API, configuration, and Docker monitoring documentation