Commit graph

116 commits

Author SHA1 Message Date
rcourtman
96573f4aca feat: enhance AI baseline context visibility and incident timeline improvements
Backend:
- Enhanced buildEnrichedResourceContext to ALWAYS show learned baselines with
  status indicators (normal/elevated/anomaly) instead of only when anomalous
- This makes Pulse Pro's 'moat' (the learned baselines) visible: users can see that the AI
  understands their infrastructure's normal behavior patterns
- Added baseline import to service.go

Frontend (user changes):
- Added incident event type filtering with toggle buttons
- Added resource incident panel to view all incidents for a resource
- Added timeline expand/collapse functionality in alert history
- Added incident note saving with proper incidentId tracking
- Added startedAt parameter for proper incident timeline loading
2025-12-21 00:14:20 +00:00
rcourtman
ae522c9a2b fix: Allow all threshold types (Storage, Temperature, Host Agent) to be set to 0 to disable alerting
- Fixed normalizeStorageDefaults to allow Trigger=0
- Fixed normalizeNodeDefaults (Temperature) to allow Trigger=0
- Added comprehensive tests for all threshold normalization patterns
- Updated an existing test that expected the old behavior

Related to #864
2025-12-20 20:42:23 +00:00
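To illustrate the intended semantics, here is a minimal Go sketch. `Threshold`, `normalizeThreshold`, `alertingEnabled`, and the 5-point default clear margin are all hypothetical and are not the actual normalizeStorageDefaults/normalizeNodeDefaults code. The sketch only shows that an explicit Trigger of 0 must pass through normalization unchanged instead of being replaced by a default.

```go
package main

import "fmt"

// Threshold is a hypothetical trigger/clear pair (percent); Trigger == 0 means "disabled".
type Threshold struct {
	Trigger float64
	Clear   float64
}

// defaultClearMargin is an assumed hysteresis gap, not Pulse's actual value.
const defaultClearMargin = 5.0

// normalizeThreshold fills in a missing Clear value but never replaces an
// explicit Trigger of 0 (the pre-fix behavior described above did).
func normalizeThreshold(t, def Threshold) Threshold {
	switch {
	case t.Trigger < 0:
		// Assumption: negative triggers are invalid and fall back to the default.
		return def
	case t.Trigger == 0:
		// Explicit 0 disables alerting for this metric; keep it as-is.
		return Threshold{}
	}
	if t.Clear <= 0 || t.Clear > t.Trigger {
		t.Clear = t.Trigger - defaultClearMargin
		if t.Clear < 0 {
			t.Clear = 0
		}
	}
	return t
}

// alertingEnabled reports whether a normalized threshold can ever fire.
func alertingEnabled(t Threshold) bool { return t.Trigger > 0 }

func main() {
	def := Threshold{Trigger: 85, Clear: 80}
	for _, in := range []Threshold{{Trigger: 0}, {Trigger: 90}, {Trigger: -1}} {
		out := normalizeThreshold(in, def)
		fmt.Printf("in=%v out=%v enabled=%v\n", in, out, alertingEnabled(out))
	}
}
```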
rcourtman
781442cdd0 test: Add comprehensive tests for Host Agent threshold normalization with Trigger=0. Related to #864
2025-12-20 20:32:59 +00:00
rcourtman
968e0a7b3d fix: reduce syslog flooding by downgrading routine logs to debug level
Addresses issue #861 - syslog flooded on docker host

Many routine operational messages were being logged at INFO level,
causing excessive log volume when monitoring multiple VMs/containers.
These messages are now logged at DEBUG level:

- Guest threshold checking (every guest, every poll cycle)
- Storage threshold checking (every storage, every poll cycle)
- Host agent linking messages
- Filesystem inclusion in disk calculation
- Guest agent disk usage replacement
- Polling start/completion messages
- Alert cleanup and save messages

Users can set LOG_LEVEL=debug to see these messages if needed for
troubleshooting. The default INFO level now produces significantly
less log output.

Also updated documentation in CONFIGURATION.md and DOCKER.md to:
- Clarify what each log level includes
- Add tip about using LOG_LEVEL=warn for minimal logging
2025-12-18 23:27:32 +00:00
rcourtman
5338ab580c Stabilize core E2E tests
- Preserve alerts activation state when saving thresholds
- Use compliant default E2E password and deterministic bootstrap token seeding
- Harden Playwright selectors, waits, and diagnostics gating
2025-12-17 19:36:48 +00:00
rcourtman
cf44352c83 feat: configurable backup freshness thresholds for dashboard indicator
Adds FreshHours and StaleHours settings to control when the dashboard
backup indicator shows green (fresh), amber (stale), or red (critical).

- Backend: Added FreshHours/StaleHours to BackupAlertConfig (default 24/72 hours)
- Frontend: getBackupInfo() now accepts optional thresholds parameter
- Dashboard/GuestRow components use thresholds from alert config
- Settings saved/loaded with alert configuration

Closes #839
2025-12-16 16:36:08 +00:00
rcourtman
3a2a73f9d6 Merge main into ai-features: incorporate latest bugfixes
Resolved conflicts:
- pkg/fsfilters/filters.go: Keep both TrueNAS and EnhanceCP filter fixes
- DockerUnifiedTable.tsx: Use main's resource column overlap fix
2025-12-13 15:18:51 +00:00
rcourtman
9c92bb49df feat(ai): Wire alert history to pattern detector for event tracking
Connect alert system to failure prediction:

1. Add AlertCallback to HistoryManager:
   - OnAlert() method to register callbacks
   - Callbacks invoked when alerts are added
   - Called outside lock to prevent deadlocks

2. Expose OnAlertHistory() on alerts.Manager:
   - Pass-through to HistoryManager.OnAlert()
   - Enables external systems to track alerts

3. Wire pattern detector in router startup:
   - Register callback when pattern detector is created
   - Convert alert types to trackable events
   - Pattern detector now learns from production alerts

Now every alert (memory_warning, cpu_critical, etc.) is recorded as
a historical event for pattern analysis. The AI can predict:
'High memory usage typically occurs every ~3 days (next expected in ~1 day)'

All tests passing.
2025-12-12 14:16:03 +00:00
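The "called outside lock" point matters because a callback that re-enters the manager would otherwise deadlock on a non-reentrant `sync.Mutex`. A minimal sketch of the pattern follows; the `Alert` and `HistoryManager` types and the `OnAlert`/`AddAlert` signatures are simplified assumptions, not Pulse's actual definitions.

```go
package main

import (
	"fmt"
	"sync"
)

type Alert struct {
	ID   string
	Type string
}

type AlertCallback func(Alert)

type HistoryManager struct {
	mu        sync.Mutex
	history   []Alert
	callbacks []AlertCallback
}

// OnAlert registers a callback to run whenever an alert is added.
func (h *HistoryManager) OnAlert(cb AlertCallback) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.callbacks = append(h.callbacks, cb)
}

// AddAlert records the alert, then notifies callbacks without holding mu.
func (h *HistoryManager) AddAlert(a Alert) {
	h.mu.Lock()
	h.history = append(h.history, a)
	// Snapshot the callbacks while locked...
	cbs := make([]AlertCallback, len(h.callbacks))
	copy(cbs, h.callbacks)
	h.mu.Unlock()

	// ...and invoke them unlocked, so a callback may safely call back into h.
	for _, cb := range cbs {
		cb(a)
	}
}

// Len returns the number of recorded alerts.
func (h *HistoryManager) Len() int {
	h.mu.Lock()
	defer h.mu.Unlock()
	return len(h.history)
}

func main() {
	h := &HistoryManager{}
	h.OnAlert(func(a Alert) {
		// This re-entrant call would deadlock if AddAlert still held mu.
		fmt.Printf("pattern detector saw %s (history size %d)\n", a.Type, h.Len())
	})
	h.AddAlert(Alert{ID: "vm100-memory", Type: "memory_warning"})
}
```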
rcourtman
7b1ec9b5f5 Add host alert deduplication with tests
- Modified alerts.go for improved host alert handling
- Added host_dedup_test.go for deduplication test coverage
2025-12-07 12:38:38 +00:00
rcourtman
8948e84fe5 feat: AI features, agent improvements, and host monitoring enhancements
AI Chat Integration:
- Multi-provider support (Anthropic, OpenAI, Ollama)
- Streaming responses with markdown rendering
- Agent command execution for remote troubleshooting
- Context-aware conversations with host/container metadata

Agent Updates:
- Add --enable-proxmox flag for automatic PVE/PBS token setup
- Improve auto-update with semver comparison (prevents downgrades)
- Add updatedFrom tracking to report previous version after update
- Reduce initial update check delay from 30s to 5s
- Add agent version column to Hosts page table

Host Metrics:
- Add DiskIO stats collection (read/write bytes, ops, time)
- Improve disk filtering to exclude Docker overlay mounts
- Add RAID array monitoring via mdadm
- Enhance temperature sensor parsing

Frontend:
- New Agent Version column on Hosts overview table
- Improved node modal with agent-first installation flow
- Add DiskIO display in host drawer
- Better responsive handling for metric bars
2025-12-05 10:37:02 +00:00
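Semver comparison has to be numeric per component: compared as strings, "4.10.0" sorts before "4.2.0". A standard-library-only sketch of a downgrade-safe check follows; `parseVersion`, `compareVersions`, and `shouldUpdate` are hypothetical names, and pre-release suffixes are simply dropped, so a maintained package such as golang.org/x/mod/semver would be the safer choice where pre-release ordering matters.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion accepts "1.2.3" or "v1.2.3". Pre-release and build suffixes
// ("-rc.1", "+build") are simply dropped in this sketch.
func parseVersion(s string) ([3]int, error) {
	var out [3]int
	v := strings.TrimPrefix(strings.TrimSpace(s), "v")
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("invalid version %q", s)
		}
		out[i] = n
	}
	return out, nil
}

// compareVersions returns -1, 0 or 1, comparing each component numerically.
func compareVersions(a, b [3]int) int {
	for i := range a {
		switch {
		case a[i] < b[i]:
			return -1
		case a[i] > b[i]:
			return 1
		}
	}
	return 0
}

// shouldUpdate only accepts a strictly newer version, so an older release
// (for example from a stale mirror) can never cause a downgrade.
func shouldUpdate(current, available string) bool {
	cur, err := parseVersion(current)
	if err != nil {
		return false
	}
	next, err := parseVersion(available)
	if err != nil {
		return false
	}
	return compareVersions(next, cur) > 0
}

func main() {
	fmt.Println(shouldUpdate("4.2.0", "4.10.0"))  // numeric compare: upgrade
	fmt.Println(shouldUpdate("v4.10.0", "4.2.0")) // would be a downgrade
	fmt.Println("lexical:", "4.10.0" > "4.2.0")   // naive string compare gets this wrong
}
```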
rcourtman
c5ab0724f1 fix: Race condition in TestLoadActiveAlerts causing flaky test
ClearActiveAlerts triggers an async save to disk, which can race with
LoadActiveAlerts reading the file. The test now clears the in-memory
map directly without triggering the async save.
2025-12-04 20:41:00 +00:00
rcourtman
d0d989289a Refactor alert system: fix race conditions, memory leaks, and improve code quality
- Rename checkFlapping to checkFlappingLocked to clarify lock contract
- Replace goto statements with structured control flow
- Wire up unused recordAlertFired/recordAlertResolved metric hooks
- Add trackingMapCleanup goroutine to prevent memory leaks from stale entries
- Tighten alert ID validation to alphanumeric + safe punctuation
- Fix history save error handling to properly manage backup lifecycle
- Add auto-migration for deprecated GroupingWindow field
- Refactor 300+ line UpdateConfig into focused helper functions
- Unify duplicate evaluateVMCondition/evaluateContainerCondition
- Add constants for magic numbers (thresholds, timing, flapping)
- Update tests to match new backup behavior
2025-12-02 23:31:36 +00:00
rcourtman
4f824ab148 style: Apply gofmt to 37 files
Standardize code formatting across test files and monitor.go.
No functional changes.
2025-12-02 17:21:48 +00:00
rcourtman
b9db9c140b docs: Add godoc comments to more exported functions
Add missing godoc comments to:
- BuildGuestKey in alerts/alerts.go
- GenerateMockData in mock/generator.go
- NewDockerUpdater, NewAURUpdater in updates/adapter_installsh.go
- NewMockUpdater in updates/mock_updater.go
2025-12-02 16:03:57 +00:00
rcourtman
85487b0058 perf: Cache lowercase RAID state in alerts processing
Compute strings.ToLower(array.State) once and reuse stateLower instead
of calling ToLower three times when checking RAID array status.
2025-12-02 15:35:32 +00:00
rcourtman
b877d4170d test: Add saveHistoryWithRetry tests for alerts package
Add comprehensive tests for the saveHistoryWithRetry function covering:
- Backup file creation from existing history
- Empty history serialization
- Single retry success
- Read-only directory failure with retries
- Concurrent saves with serialization via saveMu
- Snapshot isolation during save

Coverage: saveHistoryWithRetry 58.6% → 86.2%
Coverage: alerts package 87.4% → 87.8%
2025-12-02 12:49:27 +00:00
rcourtman
969f79c2fd test: Add getGuestThresholds tests for alerts package
Add comprehensive tests for the getGuestThresholds function covering:
- Default threshold application
- Guest-specific overrides
- Custom rule filter matching
- Override precedence over custom rules
- Priority-based rule selection
- Disabled rules handling
- Disabled override handling
- DisableConnectivity propagation from overrides and rules
- Legacy CPU threshold conversion
- Legacy ID migration (clustered and standalone VMs)
- Container type support
- All metric thresholds application

Coverage: getGuestThresholds 40.2% → 77.6%
2025-12-02 12:35:07 +00:00
rcourtman
753125d189 test: Add preserveAlertState, checkPMGQuarantineBacklog, LoadActiveAlerts tests
Add comprehensive tests for three low-coverage functions:
- preserveAlertState: nil handling, state preservation from existing alerts,
  ackState fallback, new alert handling
- checkPMGQuarantineBacklog: nil quarantine handling, warning/critical
  thresholds, growth rate alerts, alert updates, virus quarantine
- LoadActiveAlerts: missing file, valid file loading, old alert filtering,
  old acknowledged alert filtering, ack state restoration, invalid JSON,
  duplicate alert handling

Coverage improvements:
- preserveAlertState: 63.6% → 100%
- checkPMGQuarantineBacklog: 12.9% → 100%
- checkQuarantineMetric: 0% → 93.1%
- LoadActiveAlerts: 26.2% → 80.0%
- Alerts package: 83.5% → 86.6%
2025-12-02 12:22:14 +00:00
rcourtman
192a74460e test: Add HistoryManager tests for alerts package
New history_test.go with 24 tests covering GetStats, getFileSize,
AddAlert, GetHistory, GetAllHistory, RemoveAlert, ClearAllHistory,
cleanOldEntries, saveHistory, loadHistory. Alerts package 83.6%→84.0%.
2025-12-02 12:06:53 +00:00
rcourtman
5ff7e20539 test: Add dispatchAlert tests (55.6%→77.8%)
Add TestDispatchAlert with 8 test cases covering:
- Returns false when onAlert callback is nil
- Returns false when alert is nil
- Returns false when activation state is pending
- Returns false when activation state is snoozed
- Returns false for monitor-only alerts
- Dispatches synchronously when async is false
- Dispatches asynchronously when async is true
- Clones alert before dispatch

Alerts package coverage: 83.4%→83.5%
2025-12-02 11:47:24 +00:00
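The guard clauses listed above translate naturally into early returns. In this hypothetical sketch, `Manager`, `ActivationState`, and the `monitorOnly` metadata key are assumptions. Cloning happens before the goroutine starts, so an asynchronous notifier never shares the manager's alert value.

```go
package main

import (
	"fmt"
	"sync"
)

type ActivationState string

const (
	ActivationActive  ActivationState = "active"
	ActivationPending ActivationState = "pending"
	ActivationSnoozed ActivationState = "snoozed"
)

type Alert struct {
	ID       string
	Metadata map[string]any
}

// clone copies the alert and its metadata map so the notifier cannot mutate
// state owned by the alert manager.
func (a *Alert) clone() *Alert {
	c := *a
	if a.Metadata != nil {
		c.Metadata = make(map[string]any, len(a.Metadata))
		for k, v := range a.Metadata {
			c.Metadata[k] = v
		}
	}
	return &c
}

// monitorOnly accepts a bool or the string "true" (hypothetical key name).
func monitorOnly(a *Alert) bool {
	switch v := a.Metadata["monitorOnly"].(type) {
	case bool:
		return v
	case string:
		return v == "true"
	}
	return false
}

type Manager struct {
	state   ActivationState
	onAlert func(*Alert)
	wg      sync.WaitGroup // lets callers wait for async dispatches
}

// dispatchAlert reports whether a notification was handed to onAlert.
func (m *Manager) dispatchAlert(a *Alert, async bool) bool {
	if m.onAlert == nil || a == nil {
		return false
	}
	if m.state == ActivationPending || m.state == ActivationSnoozed {
		return false
	}
	if monitorOnly(a) {
		return false
	}
	c := a.clone() // clone before any goroutine touches it
	if !async {
		m.onAlert(c)
		return true
	}
	m.wg.Add(1)
	go func() {
		defer m.wg.Done()
		m.onAlert(c)
	}()
	return true
}

func main() {
	m := &Manager{state: ActivationActive, onAlert: func(a *Alert) { fmt.Println("notify:", a.ID) }}
	fmt.Println(m.dispatchAlert(&Alert{ID: "vm100-cpu"}, true))
	m.wg.Wait()
	fmt.Println(m.dispatchAlert(&Alert{ID: "vm101-cpu", Metadata: map[string]any{"monitorOnly": true}}, false))
}
```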
rcourtman
3d957403ef test: Add CheckStorage tests (52.4%→92.9%)
Add comprehensive TestCheckStorageComprehensive with 11 test cases covering:
- Returns early when alerts disabled
- DisableAllStorage clears existing usage and offline alerts
- Override with Disabled clears alerts
- Usage threshold checking
- Override threshold applied correctly
- Skips usage check when offline/unavailable/zero usage
- Offline status creates alert after confirmations
- Unavailable status creates alert
- Clears offline alert when back online

Alerts package coverage: 82.4%→83.4%
2025-12-02 11:43:56 +00:00
rcourtman
905d78b6a6 test: Add CheckPMG tests (0%→100%)
Add comprehensive TestCheckPMGComprehensive with 9 test cases covering:
- Returns early when alerts disabled
- DisableAllPMG clears all PMG alert types (queue-total, queue-deferred,
  queue-hold, oldest-message, offline)
- Override with Disabled clears alerts
- DisableAllPMGOffline clears offline alert
- Offline status creates alert after confirmations
- Connection health error triggers offline alert
- Connection health unhealthy triggers offline alert
- Clears offline alert when back online
- Skips metrics when PMG is offline

Alerts package coverage: 81.5%→82.4%
2025-12-02 11:40:53 +00:00
rcourtman
e1cdd6ebdb test: Add CheckPBS tests (0%→98.3%)
Add comprehensive TestCheckPBSComprehensive with 12 test cases covering:
- Returns early when alerts disabled
- DisableAllPBS clears existing CPU/memory/offline alerts
- Override with Disabled clears alerts
- DisableAllPBSOffline clears offline alert
- CPU threshold checking when online
- Memory threshold checking when online
- Skips metrics when PBS is offline
- Override thresholds applied correctly
- Offline status creates alert after confirmations
- Connection health error triggers offline alert
- Connection health unhealthy triggers offline alert
- Clears offline alert when back online

Alerts package coverage: 80.0%→81.5%
2025-12-02 11:38:36 +00:00
rcourtman
bc7fa17b54 test: Add CheckHost tests (49.6%→98.3%)
Add comprehensive TestCheckHostComprehensive with 17 test cases covering:
- Empty host ID early return
- Alerts disabled early return
- DisableAllHosts clears existing alerts
- Override with Disabled clears alerts
- CPU/Memory/Disk threshold nil clears alerts
- RAID degraded/rebuilding/healthy states
- RAID with failed devices triggers critical alert
- RAID resync triggers rebuilding alert
- Existing RAID alert not duplicated (preserves start time)
- Override thresholds applied correctly
- Multiple disks handling
- Offline alert cleared when host comes online
- Tags included in metadata

Alerts package coverage: 78.6%→80.0%
2025-12-02 11:34:20 +00:00
rcourtman
ceb54ba349 test: Add CheckGuest tests (41.4%→97.4%)
Cover all CheckGuest branches:
- Early return when alerts disabled
- Early return when all guests disabled
- VM and Container type handling
- Unsupported guest type returns early
- pulse-no-alerts tag suppresses alerts
- Stopped guest triggers powered-off check
- DisableAllGuestsOffline clears tracking
- Paused guest clears powered-off alert
- Non-running guest clears metric alerts
- Running guest clears powered-off alert
- Disabled thresholds clear existing alerts
- CPU, memory, disk metric checks
- Individual disk checks (mountpoint, device, index fallback)
- Skips disk with zero total or negative usage
- I/O metrics (diskRead, diskWrite, networkIn, networkOut)
- pulse-relaxed tag applies relaxed thresholds

Alerts package coverage: 76.0%→78.6%
2025-12-02 11:27:32 +00:00
rcourtman
dda3d866ec test: Add CheckNode tests (31%→100%)
Cover all CheckNode branches:
- Early return when alerts disabled
- DisableAllNodes clears existing alerts
- DisableNodesOffline clears tracking
- Offline/connection error/failed triggers offline check
- Online node clears offline alert
- Online node triggers metric checks
- Offline node skips metric checks
- Override thresholds applied correctly
- Temperature with package temp and max fallback
- Temperature skipped when unavailable/nil/no threshold
- Memory and disk metric checks

Alerts package coverage: 75.2%→76.0%
2025-12-02 11:24:04 +00:00
rcourtman
42890b70f8 test: Add suppressGuestAlerts and guestHasMonitorOnlyAlerts tests
Coverage improvements:
- suppressGuestAlerts: 37% -> 96.3%
- guestHasMonitorOnlyAlerts: 40% -> 90%

Tests cover:
- No alerts returns false
- Exact ResourceID match clears
- Prefix match (e.g., "vm100/disk1") clears
- All auxiliary maps cleared (pending, recent, suppressed, rateLimit)
- Multiple alerts cleared
- Monitor-only detection via metadata (bool and string types)
2025-12-02 11:14:30 +00:00
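One way to implement the exact-or-prefix match is sketched below. Requiring a `/` after the guest ID is an assumption made here to keep `vm1000` from matching `vm100`; the commit does not say how Pulse handles that case. All names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// belongsTo reports whether resourceID is the guest itself or one of its
// sub-resources such as "vm100/disk1". Requiring the "/" separator keeps
// "vm1000" from matching "vm100".
func belongsTo(resourceID, guestID string) bool {
	return resourceID == guestID || strings.HasPrefix(resourceID, guestID+"/")
}

// suppressGuest removes every active alert that belongs to guestID, plus the
// same alert IDs from any auxiliary tracking maps (pending, recent, ...).
// active maps alert ID -> resource ID; all names here are hypothetical.
func suppressGuest(guestID string, active map[string]string, aux ...map[string]bool) bool {
	removed := false
	for alertID, resourceID := range active {
		if !belongsTo(resourceID, guestID) {
			continue
		}
		delete(active, alertID)
		for _, m := range aux {
			delete(m, alertID)
		}
		removed = true
	}
	return removed
}

func main() {
	active := map[string]string{
		"vm100-cpu":         "vm100",
		"vm100-disk1-usage": "vm100/disk1",
		"vm1000-cpu":        "vm1000",
	}
	pending := map[string]bool{"vm100-cpu": true, "vm1000-cpu": true}
	fmt.Println(suppressGuest("vm100", active, pending), len(active), len(pending))
}
```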
rcourtman
914b1ced2a test: Add applyThresholdOverride tests
Coverage for applyThresholdOverride: 50% -> 93.2%

Tests cover:
- Empty override returns base unchanged
- Disabled/DisableConnectivity flag overrides
- Modern CPU threshold override
- Legacy CPU threshold conversion
- Modern takes precedence over legacy
- Multiple metrics override
- Note override, clearing, trimming
- All legacy metric types (Memory, Disk, etc.)
- Temperature and Usage override
- ensureHysteresisThreshold Clear value filling
2025-12-02 11:08:58 +00:00
rcourtman
3379e90073 test: Add ClearActiveAlerts test with existing alerts
Coverage for ClearActiveAlerts: 16% -> 92%

Tests verify all internal maps are properly cleared when alerts exist:
- activeAlerts, pendingAlerts, recentAlerts
- suppressedUntil, alertRateLimit
- nodeOfflineCount, offlineConfirmations
- dockerOfflineCount, dockerStateConfirm
- ackState, recentlyResolved
2025-12-02 11:01:54 +00:00
rcourtman
e644c38071 test: Add CheckDiskHealth normal path tests
Coverage for CheckDiskHealth: 51% -> 98%

Tests cover:
- Healthy disk (PASSED/OK) creates no alert
- Failed non-Samsung disk creates critical alert
- Alert cleared when disk health recovers
- Low wearout (<10%) creates warning alert
- Wearout alert updates on subsequent checks
- Wearout alert cleared when wearout >= 10%
- Empty/UNKNOWN health creates no alert
2025-12-02 10:59:48 +00:00
rcourtman
eac8ed48c5 test: Add Docker container restart loop alert tests
Coverage for checkDockerContainerRestartLoop: 53.5% -> 95.3%

Tests cover:
- First check initializes tracking without alert
- Stable restart count doesn't alert
- Restarts under threshold (<=3) don't alert
- Restart loop threshold (>3) triggers critical alert
- Recovery after window expires clears alert
- Incremental restart accumulation
- Alert StartTime preservation on updates
2025-12-02 10:54:51 +00:00
rcourtman
e1105d68ca test: Add Docker container health and OOM kill alert tests
Coverage improvements:
- checkDockerContainerHealth: 21.1% -> 94.7%
- checkDockerContainerOOMKill: 19.2% -> 96.2%

Tests cover:
- Health states (healthy, empty, none, starting, unhealthy, degraded)
- Health alert recovery when container becomes healthy
- OOM kill detection (exit code 137)
- OOM alert deduplication (repeated 137 doesn't re-alert)
- OOM alert clearing when container recovers or exits with different code
2025-12-02 10:49:42 +00:00
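Exit code 137 is 128 + 9, meaning the process was terminated by SIGKILL, which is the signal the kernel OOM killer sends. A manual `docker kill` produces the same code by default, so Docker's `State.OOMKilled` flag is a stricter signal where available. The sketch below shows the deduplicate-then-clear behavior the tests describe; `oomTracker` and its actions are hypothetical.

```go
package main

import "fmt"

// oomExitCode is 128 + 9: the process died from SIGKILL, which is what the
// kernel OOM killer sends. Other SIGKILLs (e.g. docker kill) look the same.
const oomExitCode = 137

type oomAction int

const (
	oomNone oomAction = iota
	oomRaise
	oomClear
)

// oomTracker remembers which containers already have an OOM alert so that
// repeated observations of the same exit do not re-alert.
type oomTracker struct {
	alerted map[string]bool
}

// update returns the action to take for a container's latest observed state.
func (o *oomTracker) update(containerID string, running bool, exitCode int) oomAction {
	killed := !running && exitCode == oomExitCode
	switch {
	case killed && !o.alerted[containerID]:
		o.alerted[containerID] = true
		return oomRaise
	case !killed && o.alerted[containerID]:
		// Recovered (running again) or exited with a different code.
		delete(o.alerted, containerID)
		return oomClear
	}
	return oomNone
}

func main() {
	o := &oomTracker{alerted: map[string]bool{}}
	fmt.Println(o.update("web", false, 137) == oomRaise) // first 137: alert
	fmt.Println(o.update("web", false, 137) == oomNone)  // same exit again: no re-alert
	fmt.Println(o.update("web", true, 0) == oomClear)    // container recovered
}
```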
rcourtman
4538b5348d fix: Isolate alerts tests with temp directories to prevent flaky failures
Tests using NewManager() were sharing /etc/pulse/alerts, causing race
conditions when running in parallel. Added newTestManager(t) helper that
creates isolated temp directories for each test.
2025-12-02 10:21:03 +00:00
rcourtman
74fe6b9223 test: Add tests for checkPMGNodeQueues
- 12 test cases covering per-node queue monitoring
- Tests total/deferred/hold queue thresholds at warning/critical levels
- Tests oldest message age per node
- Tests outlier detection, threshold clearing, existing alert updates
- Tests empty nodes and nil QueueStatus handling

Coverage: checkPMGNodeQueues 0%→85.1%
Package coverage: 70.3%→72.3%
2025-12-01 17:16:32 +00:00
rcourtman
f38736eefe test: Add tests for checkZFSPoolHealth
- 14 test cases covering pool state alerts (ONLINE, DEGRADED, FAULTED,
  UNAVAIL), pool error tracking, device-level alerts, state transitions
- Tests pool state clearing on recovery, error count updates,
  SPARE device handling, FAULTED device critical alerts

Coverage: checkZFSPoolHealth 0%→100%
Package coverage: 68.6%→70.3%
2025-12-01 17:14:12 +00:00
rcourtman
b8facf3e8a test: Add tests for checkEscalations and CleanupAlertsForNodes
- checkEscalations: 6 test cases covering disabled/enabled states,
  acknowledged alerts, threshold timing, multi-level escalation
- CleanupAlertsForNodes: 7 test cases covering node removal,
  Docker/PBS alert preservation, empty node handling, nil safety

Coverage: checkEscalations 0%→100%, CleanupAlertsForNodes 0%→92%
Package coverage: 67.6%→68.6%
2025-12-01 17:11:47 +00:00
rcourtman
b1265451a8 test: Add tests for Cleanup and convertLegacyThreshold
- Cleanup: 16 test cases covering auto-acknowledge, TTL cleanup,
  rate limit entries, suppressions, pending alerts, flapping history,
  Docker restart tracking, PMG anomaly trackers, quarantine history
- convertLegacyThreshold: 5 test cases covering nil/zero/negative input,
  default margin, custom margin

Coverage: Cleanup 0%→97.6%, convertLegacyThreshold 0%→83.3%
Package coverage: 65.4%→67.6%
2025-12-01 17:08:45 +00:00
rcourtman
73d2e91ab6 test: Add tests for checkStorageOffline and checkGuestPoweredOff
- checkStorageOffline: 5 test cases covering confirmation polling,
  alert creation, LastSeen updates, disabled storage handling
- checkGuestPoweredOff: 9 test cases covering confirmation polling,
  alert creation, severity overrides, disabled guests, monitorOnly flag

Coverage: checkStorageOffline 0%→100%, checkGuestPoweredOff 0%→100%
Package coverage: 63.4%→65.4%

Note: a pre-existing test failure in the suite, related to /etc/pulse file
access, is not caused by these changes.
2025-12-01 17:05:15 +00:00
rcourtman
bfcd2f95a5 test: Add tests for checkPMGQueueDepths and checkPMGOldestMessage
- checkPMGQueueDepths: 10 test cases covering total/deferred/hold queues
  at warning/critical levels, existing alert updates, threshold clearing
- checkPMGOldestMessage: 8 test cases covering threshold logic,
  multi-node oldest detection, alert creation/updates/clearing

Coverage: checkPMGQueueDepths 0%→88.1%, checkPMGOldestMessage 0%→100%
Package coverage: 61.3%→63.4%
2025-12-01 16:59:59 +00:00
rcourtman
03f0ee01b1 test: Add tests for calculateTrimmedBaseline and createOrUpdateNodeAlert
- calculateTrimmedBaseline: 8 test cases covering sample count thresholds,
  trimmed mean calculation, median fallback logic, odd-length arrays
- createOrUpdateNodeAlert: 2 test cases for new alert creation and
  existing alert updates

Coverage: calculateTrimmedBaseline 0%→96.9%, createOrUpdateNodeAlert 0%→100%
Package coverage: 59.6%→61.3%
2025-12-01 16:56:14 +00:00
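A trimmed mean discards a fixed share of the smallest and largest samples before averaging, so a single spike cannot drag the baseline. The sketch below follows the behaviors named in the commit (minimum sample count, median fallback, odd/even lengths), but the cut-offs of 3 and 10 samples and the 10% trim per end are assumptions, as is the function name.

```go
package main

import (
	"fmt"
	"sort"
)

// Assumed parameters; the commit names the behaviors but not the numbers.
const (
	minSamples     = 3  // below this there is no baseline at all
	trimMinSamples = 10 // below this the median is used instead
	trimPercent    = 10 // trimmed from each end of the sorted samples
)

// trimmedBaseline returns a robust baseline and whether one could be computed.
func trimmedBaseline(samples []float64) (float64, bool) {
	n := len(samples)
	if n < minSamples {
		return 0, false
	}
	sorted := append([]float64(nil), samples...) // do not reorder the caller's slice
	sort.Float64s(sorted)
	if n < trimMinSamples {
		return median(sorted), true
	}
	k := n * trimPercent / 100
	kept := sorted[k : n-k]
	sum := 0.0
	for _, v := range kept {
		sum += v
	}
	return sum / float64(len(kept)), true
}

// median expects sorted input and handles odd and even lengths.
func median(sorted []float64) float64 {
	n := len(sorted)
	if n%2 == 1 {
		return sorted[n/2]
	}
	return (sorted[n/2-1] + sorted[n/2]) / 2
}

func main() {
	// One spike (1000) would drag a plain mean to 112.6.
	cpu := []float64{12, 10, 14, 11, 1000, 13, 15, 17, 16, 18}
	b, ok := trimmedBaseline(cpu)
	fmt.Println(b, ok)
}
```

With ten samples one value is trimmed from each end, so the spike and the minimum are dropped and the baseline is the mean of 11 through 18.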
rcourtman
db102e86e2 test: Add tests for checkPBSOffline and checkPMGOffline functions
checkPBSOffline (0%→100%):
- Override Disabled clears alert and returns
- Override DisableConnectivity clears alert and returns
- Insufficient confirmations waits
- Creates alert after 3 confirmations
- Existing alert updates LastSeen

checkPMGOffline (0%→100%):
- Override Disabled clears alert and returns
- Override DisableConnectivity clears alert and returns
- Insufficient confirmations waits
- Creates alert after 3 confirmations
- Existing alert updates LastSeen

Coverage: 58.6% → 59.6%
2025-12-01 16:50:18 +00:00
rcourtman
e5c82a1720 test: Add tests for checkNodeOffline function
checkNodeOffline (0%→100%):
- Override DisableConnectivity clears alert and returns
- Existing alert updates StartTime
- Insufficient confirmations waits (1, 2 confirmations)
- Creates alert after 3 confirmations with correct metadata
- Alert added to history

Coverage: 58.5% → 58.6%
2025-12-01 16:48:13 +00:00
rcourtman
7e16c39332 test: Add tests for offline alert clearing functions
clearNodeOfflineAlert (0%→100%):
- No alert and no count is no-op
- Resets offline count when node comes online
- Clears existing alert and adds to resolved

clearPBSOfflineAlert (0%→100%):
- No alert and no count is no-op
- Resets offline confirmation count
- Clears existing alert and adds to resolved

clearPMGOfflineAlert (0%→100%):
- No alert and no count is no-op
- Resets offline confirmation count
- Clears existing alert and adds to resolved

Coverage: 57.4% → 58.5%
2025-12-01 16:45:51 +00:00
rcourtman
25cea6866e test: Add tests for alert utility functions
SetMetricHooks (0%→100%):
- Sets all four hooks correctly
- Nil hooks are safely handled
- Properly saves/restores state to avoid test pollution

NotifyExistingAlert (0%→100%):
- Non-existent alert is no-op
- Existing alert dispatches async notification

GetResolvedAlert (0%→100%):
- Returns nil for non-existent alert
- Returns nil for nil resolved entry
- Returns nil when Alert is nil
- Returns cloned alert with resolved time

GetAlertHistory/GetAlertHistorySince (0%→100%):
- Returns history from history manager
- Respects limit parameter
- Zero time returns all history
- Filters by time correctly

ClearAlertHistory (0%→100%):
- Clears all history entries

Coverage: 56.9% → 57.4%
2025-12-01 16:43:04 +00:00
rcourtman
807d47580b test: Add comprehensive tests for HandleDockerHostOffline
HandleDockerHostOffline (0%→100%):
- Empty host ID is no-op
- Disabled alerts is no-op
- DisableAllDockerHostsOffline clears tracking and alert
- Override DisableConnectivity clears tracking and alert
- Existing alert updates LastSeen
- Requires 3 confirmations before alert
- Alert has correct metadata (resourceType, hostId, agentId)

Coverage: 55.6% → 56.9%
2025-12-01 16:39:10 +00:00
rcourtman
5a774d1d7f test: Add comprehensive tests for HandleHostRemoved, ReevaluateGuestAlert
HandleHostRemoved (0%→100%):
- Empty host ID is no-op
- Clears host offline alert and confirmations
- Clears host metric alerts (CPU, memory)
- Clears host disk alerts
- Clears all alert types together

ReevaluateGuestAlert (0%→100%):
- No active alerts is no-op
- Clears alert when threshold disabled (nil)
- Clears alert when trigger is zero
- Clears alert when value below clear threshold
- Clears alert when value below trigger threshold
- Keeps alert when value above both thresholds
- Processes all metric types (7 types)
- Clears pending alert when threshold disabled
- Uses clear equals trigger when clear is zero
- Ignores alerts for different guests

NormalizeMetricTimeThresholds (0%→100%):
- Updated existing test to call public wrapper instead of internal function

Coverage: 54.6% → 55.6%
2025-12-01 16:36:48 +00:00
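The clearing rules above differ from ordinary hysteresis: during polling an alert normally stays active until the value falls below the clear threshold, but on re-evaluation it must also satisfy the trigger. A sketch contrasting the two, with hypothetical types and function names and an assumed inclusive (`>=`) comparison:

```go
package main

import "fmt"

// Threshold is a hypothetical trigger/clear pair for one metric.
type Threshold struct {
	Trigger float64
	Clear   float64
}

// effectiveClear treats a zero Clear as "same as Trigger", as the test list says.
func effectiveClear(th Threshold) float64 {
	if th.Clear <= 0 {
		return th.Trigger
	}
	return th.Clear
}

// stillActiveOnPoll is ordinary hysteresis: once firing, an alert only
// resolves after the value drops below the clear threshold.
func stillActiveOnPoll(value float64, th *Threshold) bool {
	if th == nil || th.Trigger <= 0 {
		return false
	}
	return value >= effectiveClear(*th)
}

// keepAfterConfigChange is stricter: after the user edits thresholds, an
// alert survives only if the value is at or above both trigger and clear.
func keepAfterConfigChange(value float64, th *Threshold) bool {
	if th == nil || th.Trigger <= 0 {
		return false // metric disabled (nil threshold or zero trigger)
	}
	return value >= th.Trigger && value >= effectiveClear(*th)
}

func main() {
	th := &Threshold{Trigger: 80, Clear: 70}
	for _, v := range []float64{85, 75, 65} {
		fmt.Printf("value %.0f: poll keeps=%v, config change keeps=%v\n",
			v, stillActiveOnPoll(v, th), keepAfterConfigChange(v, th))
	}
}
```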
rcourtman
4752d82762 test: Add comprehensive tests for reevaluateActiveAlertsLocked
Tests major branches for alert re-evaluation on config change:
- Empty alerts map (no-op)
- Alert ID with insufficient parts (skipped)
- DisableAllPMG resolves PMG queue alerts
- DisableAllHosts resolves Host alerts via resourceType metadata
- Docker host offline alerts are skipped (not threshold-based)
- DisableAllDockerHosts resolves dockerhost metric alerts
- DisableAllNodes resolves Node alerts via Instance field
- DisableAllStorage resolves Storage alerts
- DisableAllPBS resolves PBS alerts
- DisableAllGuests resolves Guest alerts
- Disabled override resolves alert
- Alert below clear threshold is resolved
- Alert between clear and trigger is resolved on config change

Coverage improvement:
- reevaluateActiveAlertsLocked: 59.8% → 81.2%
- alerts package: 54.0% → 54.6%
2025-12-01 16:31:05 +00:00
rcourtman
c9b70f2ea3 test: Add comprehensive tests for HandleHostOffline function
Tests all branches:
- Empty host ID returns early
- Alerts disabled returns early
- DisableAllHostsOffline clears alert and confirmation tracking
- Override DisableConnectivity clears alert and returns
- Override Disabled clears alert and returns
- Existing alert updates LastSeen only
- Insufficient confirmations waits for more
- Sufficient confirmations (3) creates alert with metadata

Coverage improvement:
- HandleHostOffline: 64.0% → 100%
- alerts package: 53.5% → 54.0%
2025-12-01 16:26:56 +00:00
rcourtman
d8fc2ec558 test: Add comprehensive tests for applyGlobalOfflineSettingsLocked
Tests all 7 disable settings branches:
- DisableAllNodesOffline: clears node-offline-* alerts and nodeOfflineCount
- DisableAllPBSOffline: clears pbs-offline-* alerts and offlineConfirmations
- DisableAllGuestsOffline: clears guest-powered-off-* alerts and offlineConfirmations
- DisableAllDockerHostsOffline: clears docker-host-offline-* alerts and dockerOfflineCount
- DisableAllDockerContainers: clears docker-container-* alerts and tracking state
- DisableAllDockerServices: clears docker-service-* alerts
- No settings enabled: verifies no-op behavior

Coverage improvement:
- applyGlobalOfflineSettingsLocked: 57.1% → 100%
- alerts package: 53.0% → 53.5%
2025-12-01 16:23:45 +00:00
rcourtman
f2c2ff625f test: Add comprehensive tests for alert clearing functions
- TestClearBackupAlertsLocked: tests backup-age alert clearing with nil handling
- TestClearBackupAlerts: tests public wrapper function
- TestClearSnapshotAlertsForInstanceLocked: tests instance-specific snapshot clearing
- TestClearSnapshotAlertsForInstance: tests public wrapper function

Coverage improvement:
- clearBackupAlertsLocked: 75% → 100%
- clearBackupAlerts: 0% → 100%
- clearSnapshotAlertsForInstanceLocked: 50% → 100%
- clearSnapshotAlertsForInstance: 0% → 100%
- alerts package: 52.8% → 53.0%
2025-12-01 16:20:49 +00:00