Commit graph

4826 commits

Author SHA1 Message Date
rcourtman
17e43980c8 test: Add tests for getNodeDisplayName, lookupClusterEndpointLabel, extractSnapshotName
- getNodeDisplayName: 71%→100% (12 tests covering cluster/non-cluster fallbacks)
- lookupClusterEndpointLabel: 77%→85% (14 tests covering endpoint matching)
- extractSnapshotName: 78%→100% (8 tests covering volid parsing)

Monitoring package 46.2%→46.4%
2025-12-01 18:37:27 +00:00
rcourtman
d64f184830 test: Add tests for clampUint64ToInt64 and resetAuthFailures
- clampUint64ToInt64: 67%→100% (5 tests covering boundary conditions)
- resetAuthFailures: 67%→100% (6 tests covering map cleanup logic)

Note: cloneStringFloatMap and cloneStringMap tests already existed.
2025-12-01 18:31:34 +00:00
rcourtman
81317da1c6 test: Add tests for SetBreakerState, RecordResult, ResetQueueDepth, IncInFlight, DecInFlight
- SetBreakerState: 67%→100% (8 tests covering state conversion, retryAt clamping)
- RecordResult: 75%→96% (11 tests covering success/failure paths, staleness)
- ResetQueueDepth: 75%→100% (4 tests covering negative clamping)
- IncInFlight/DecInFlight: 67%→100% (4 tests)

Monitoring package 45.9%→46.2%
2025-12-01 18:27:05 +00:00
rcourtman
2758261581 test: Add tests for recordAuthFailure, shouldSkipProxyHost, recoverFromPanic
- recordAuthFailure: 53%→100% (8 tests covering failure counting, node removal)
- shouldSkipProxyHost: 44%→100% (9 tests covering cooldown logic, state cleanup)
- recoverFromPanic: 50%→100% (7 tests covering various panic value types)

Monitoring package 45.3%→45.9%
2025-12-01 18:20:12 +00:00
rcourtman
7359a9fb4d test: Add tests for mergeNVMeTempsIntoDisks, UpdateQueueSnapshot, UpdateDeadLetterCounts, templateFuncMap
- mergeNVMeTempsIntoDisks: 57%→97% (14 tests covering WWN/serial/path matching, NVMe fallback)
- UpdateQueueSnapshot: 57%→100% (5 tests covering nil, gauges, stale key cleanup)
- UpdateDeadLetterCounts: 52%→100% (7 tests covering aggregation, normalization)
- templateFuncMap: 20%→100% (8 tests covering all template helper functions)

Monitoring 44.6%→45.3%, notifications 45.7%→45.9%
2025-12-01 18:14:43 +00:00
rcourtman
eb02f28f5b test: Add tests for getInstanceConfig, baseIntervalForInstanceType, LoadOIDCConfig
- getInstanceConfig: 33%→100% (nil handling, case-insensitive matching)
- baseIntervalForInstanceType: 50%→100% (all instance types, clamping)
- LoadOIDCConfig: 35%→94% (missing file, read errors, decryption)
2025-12-01 18:06:15 +00:00
rcourtman
c729b8c649 test: Add monitoring tests for parseDurationEnv, parseIntEnv, markFailed, logNodeMemorySource
Add comprehensive test coverage for:
- parseDurationEnv: empty/unset env, valid durations, invalid inputs
- parseIntEnv: empty/unset env, valid integers, invalid inputs
- markFailed: status setting, timestamps, failure reason
- logNodeMemorySource: nil handling, source change detection, log levels

All four functions now at 100% coverage (up from 0-78%).
2025-12-01 17:59:20 +00:00
rcourtman
8198d05993 test: Add tests for SSH knownhosts error and path methods
- HostKeyChangeError.Error(): message formatting
- HostKeyChangeError.Unwrap(): error chain verification
- manager.Path(): path retrieval
- HostFieldMatches(): host matching patterns

ssh/knownhosts coverage: 72.2% → 75.0%
2025-12-01 17:50:13 +00:00
rcourtman
31e8f4b19b test: Add tests for diagnostic snapshot recording and retrieval
- recordGuestSnapshot: nil handling, field setting, map initialization
- GetDiagnosticSnapshots: nil/empty handling, sorting verification

Both functions now at 100% coverage.
2025-12-01 17:48:36 +00:00
rcourtman
1a23a7d9ec test: Add error path tests for config persistence functions
- LoadAPITokens: invalid JSON, empty file, missing file
- LoadEmailConfig: invalid JSON, missing file
- LoadWebhooks: invalid JSON, legacy migration
- LoadNodesConfig: empty arrays, missing fields, corruption recovery
- cleanupOldBackups: non-existent dir, multiple files cleanup
- Bonus: LoadAppriseConfig, LoadAlertConfig, LoadSystemSettings error paths

config package coverage: 46.3% → 48.2%
2025-12-01 17:45:50 +00:00
rcourtman
ed75f2f096 test: Add comprehensive tests for API token management
- Clone: deep copy verification for pointers and slices
- NewAPITokenRecord/NewHashedAPITokenRecord: creation and validation
- Config methods: HasAPITokens, APITokenCount, ActiveAPITokenHashes
- Config methods: HasAPITokenHash, PrimaryAPITokenHash, PrimaryAPITokenHint
- Config methods: ValidateAPIToken, UpsertAPIToken, RemoveAPIToken, SortAPITokens

config package coverage: 43.5% → 46.3%
2025-12-01 17:37:27 +00:00
rcourtman
b444793897 test: Add tests for monitoring and notifications functions
- buildCephClusterModel: 0% → 100% (11 test cases)
- collectContainerRootUsage: 0% → 100% (18 test cases)
- NotificationManager getters/setters: 8 functions now tested

Overall coverage: 45.5% → 45.8%
2025-12-01 17:33:36 +00:00
rcourtman
74fe6b9223 test: Add tests for checkPMGNodeQueues
- 12 test cases covering per-node queue monitoring
- Tests total/deferred/hold queue thresholds at warning/critical levels
- Tests oldest message age per node
- Tests outlier detection, threshold clearing, existing alert updates
- Tests empty nodes and nil QueueStatus handling

Coverage: checkPMGNodeQueues 0%→85.1%
Package coverage: 70.3%→72.3%
2025-12-01 17:16:32 +00:00
rcourtman
f38736eefe test: Add tests for checkZFSPoolHealth
- 14 test cases covering pool state alerts (ONLINE, DEGRADED, FAULTED,
  UNAVAIL), pool error tracking, device-level alerts, state transitions
- Tests pool state clearing on recovery, error count updates,
  SPARE device handling, FAULTED device critical alerts

Coverage: checkZFSPoolHealth 0%→100%
Package coverage: 68.6%→70.3%
2025-12-01 17:14:12 +00:00
rcourtman
b8facf3e8a test: Add tests for checkEscalations and CleanupAlertsForNodes
- checkEscalations: 6 test cases covering disabled/enabled states,
  acknowledged alerts, threshold timing, multi-level escalation
- CleanupAlertsForNodes: 7 test cases covering node removal,
  Docker/PBS alert preservation, empty node handling, nil safety

Coverage: checkEscalations 0%→100%, CleanupAlertsForNodes 0%→92%
Package coverage: 67.6%→68.6%
2025-12-01 17:11:47 +00:00
rcourtman
b1265451a8 test: Add tests for Cleanup and convertLegacyThreshold
- Cleanup: 16 test cases covering auto-acknowledge, TTL cleanup,
  rate limit entries, suppressions, pending alerts, flapping history,
  Docker restart tracking, PMG anomaly trackers, quarantine history
- convertLegacyThreshold: 5 test cases covering nil/zero/negative input,
  default margin, custom margin

Coverage: Cleanup 0%→97.6%, convertLegacyThreshold 0%→83.3%
Package coverage: 65.4%→67.6%
2025-12-01 17:08:45 +00:00
rcourtman
73d2e91ab6 test: Add tests for checkStorageOffline and checkGuestPoweredOff
- checkStorageOffline: 5 test cases covering confirmation polling,
  alert creation, LastSeen updates, disabled storage handling
- checkGuestPoweredOff: 9 test cases covering confirmation polling,
  alert creation, severity overrides, disabled guests, monitorOnly flag

Coverage: checkStorageOffline 0%→100%, checkGuestPoweredOff 0%→100%
Package coverage: 63.4%→65.4%

Note: Pre-existing test failure in suite related to /etc/pulse file
access - not caused by these changes.
2025-12-01 17:05:15 +00:00
rcourtman
bfcd2f95a5 test: Add tests for checkPMGQueueDepths and checkPMGOldestMessage
- checkPMGQueueDepths: 10 test cases covering total/deferred/hold queues
  at warning/critical levels, existing alert updates, threshold clearing
- checkPMGOldestMessage: 8 test cases covering threshold logic,
  multi-node oldest detection, alert creation/updates/clearing

Coverage: checkPMGQueueDepths 0%→88.1%, checkPMGOldestMessage 0%→100%
Package coverage: 61.3%→63.4%
2025-12-01 16:59:59 +00:00
rcourtman
03f0ee01b1 test: Add tests for calculateTrimmedBaseline and createOrUpdateNodeAlert
- calculateTrimmedBaseline: 8 test cases covering sample count thresholds,
  trimmed mean calculation, median fallback logic, odd-length arrays
- createOrUpdateNodeAlert: 2 test cases for new alert creation and
  existing alert updates

Coverage: calculateTrimmedBaseline 0%→96.9%, createOrUpdateNodeAlert 0%→100%
Package coverage: 59.6%→61.3%
2025-12-01 16:56:14 +00:00
rcourtman
db102e86e2 test: Add tests for checkPBSOffline and checkPMGOffline functions
checkPBSOffline (0%→100%):
- Override Disabled clears alert and returns
- Override DisableConnectivity clears alert and returns
- Insufficient confirmations waits
- Creates alert after 3 confirmations
- Existing alert updates LastSeen

checkPMGOffline (0%→100%):
- Override Disabled clears alert and returns
- Override DisableConnectivity clears alert and returns
- Insufficient confirmations waits
- Creates alert after 3 confirmations
- Existing alert updates LastSeen

Coverage: 58.6% → 59.6%
2025-12-01 16:50:18 +00:00
rcourtman
e5c82a1720 test: Add tests for checkNodeOffline function
checkNodeOffline (0%→100%):
- Override DisableConnectivity clears alert and returns
- Existing alert updates StartTime
- Insufficient confirmations waits (1, 2 confirmations)
- Creates alert after 3 confirmations with correct metadata
- Alert added to history

Coverage: 58.5% → 58.6%
2025-12-01 16:48:13 +00:00
rcourtman
7e16c39332 test: Add tests for offline alert clearing functions
clearNodeOfflineAlert (0%→100%):
- No alert and no count is no-op
- Resets offline count when node comes online
- Clears existing alert and adds to resolved

clearPBSOfflineAlert (0%→100%):
- No alert and no count is no-op
- Resets offline confirmation count
- Clears existing alert and adds to resolved

clearPMGOfflineAlert (0%→100%):
- No alert and no count is no-op
- Resets offline confirmation count
- Clears existing alert and adds to resolved

Coverage: 57.4% → 58.5%
2025-12-01 16:45:51 +00:00
rcourtman
25cea6866e test: Add tests for alert utility functions
SetMetricHooks (0%→100%):
- Sets all four hooks correctly
- Nil hooks are safely handled
- Properly saves/restores state to avoid test pollution

NotifyExistingAlert (0%→100%):
- Non-existent alert is no-op
- Existing alert dispatches async notification

GetResolvedAlert (0%→100%):
- Returns nil for non-existent alert
- Returns nil for nil resolved entry
- Returns nil when Alert is nil
- Returns cloned alert with resolved time

GetAlertHistory/GetAlertHistorySince (0%→100%):
- Returns history from history manager
- Respects limit parameter
- Zero time returns all history
- Filters by time correctly

ClearAlertHistory (0%→100%):
- Clears all history entries

Coverage: 56.9% → 57.4%
2025-12-01 16:43:04 +00:00
rcourtman
807d47580b test: Add comprehensive tests for HandleDockerHostOffline
HandleDockerHostOffline (0%→100%):
- Empty host ID is no-op
- Alerts disabled is no-op
- DisableAllDockerHostsOffline clears tracking and alert
- Override DisableConnectivity clears tracking and alert
- Existing alert updates LastSeen
- Requires 3 confirmations before alert
- Alert has correct metadata (resourceType, hostId, agentId)

Coverage: 55.6% → 56.9%
2025-12-01 16:39:10 +00:00
rcourtman
5a774d1d7f test: Add comprehensive tests for HandleHostRemoved, ReevaluateGuestAlert
HandleHostRemoved (0%→100%):
- Empty host ID is no-op
- Clears host offline alert and confirmations
- Clears host metric alerts (CPU, memory)
- Clears host disk alerts
- Clears all alert types together

ReevaluateGuestAlert (0%→100%):
- No active alerts is no-op
- Clears alert when threshold disabled (nil)
- Clears alert when trigger is zero
- Clears alert when value below clear threshold
- Clears alert when value below trigger threshold
- Keeps alert when value above both thresholds
- Processes all metric types (7 types)
- Clears pending alert when threshold disabled
- Uses clear equals trigger when clear is zero
- Ignores alerts for different guests

NormalizeMetricTimeThresholds (0%→100%):
- Updated existing test to call public wrapper instead of internal function

Coverage: 54.6% → 55.6%
2025-12-01 16:36:48 +00:00
rcourtman
4752d82762 test: Add comprehensive tests for reevaluateActiveAlertsLocked
Tests major branches for alert re-evaluation on config change:
- Empty alerts map (no-op)
- Alert ID with insufficient parts (skipped)
- DisableAllPMG resolves PMG queue alerts
- DisableAllHosts resolves Host alerts via resourceType metadata
- Docker host offline alerts are skipped (not threshold-based)
- DisableAllDockerHosts resolves dockerhost metric alerts
- DisableAllNodes resolves Node alerts via Instance field
- DisableAllStorage resolves Storage alerts
- DisableAllPBS resolves PBS alerts
- DisableAllGuests resolves Guest alerts
- Disabled override resolves alert
- Alert below clear threshold is resolved
- Alert between clear and trigger is resolved on config change

Coverage improvement:
- reevaluateActiveAlertsLocked: 59.8% → 81.2%
- alerts package: 54.0% → 54.6%
2025-12-01 16:31:05 +00:00
rcourtman
c9b70f2ea3 test: Add comprehensive tests for HandleHostOffline function
Tests all branches:
- Empty host ID returns early
- Alerts disabled returns early
- DisableAllHostsOffline clears alert and confirmation tracking
- Override DisableConnectivity clears alert and returns
- Override Disabled clears alert and returns
- Existing alert updates LastSeen only
- Insufficient confirmations waits for more
- Sufficient confirmations (3) creates alert with metadata

Coverage improvement:
- HandleHostOffline: 64.0% → 100%
- alerts package: 53.5% → 54.0%
2025-12-01 16:26:56 +00:00
rcourtman
d8fc2ec558 test: Add comprehensive tests for applyGlobalOfflineSettingsLocked
Tests all 7 disable settings branches:
- DisableAllNodesOffline: clears node-offline-* alerts and nodeOfflineCount
- DisableAllPBSOffline: clears pbs-offline-* alerts and offlineConfirmations
- DisableAllGuestsOffline: clears guest-powered-off-* alerts and offlineConfirmations
- DisableAllDockerHostsOffline: clears docker-host-offline-* alerts and dockerOfflineCount
- DisableAllDockerContainers: clears docker-container-* alerts and tracking state
- DisableAllDockerServices: clears docker-service-* alerts
- No settings enabled: verifies no-op behavior

Coverage improvement:
- applyGlobalOfflineSettingsLocked: 57.1% → 100%
- alerts package: 53.0% → 53.5%
2025-12-01 16:23:45 +00:00
rcourtman
f2c2ff625f test: Add comprehensive tests for alert clearing functions
- TestClearBackupAlertsLocked: tests backup-age alert clearing with nil handling
- TestClearBackupAlerts: tests public wrapper function
- TestClearSnapshotAlertsForInstanceLocked: tests instance-specific snapshot clearing
- TestClearSnapshotAlertsForInstance: tests public wrapper function

Coverage improvement:
- clearBackupAlertsLocked: 75% → 100%
- clearBackupAlerts: 0% → 100%
- clearSnapshotAlertsForInstanceLocked: 50% → 100%
- clearSnapshotAlertsForInstance: 0% → 100%
- alerts package: 52.8% → 53.0%
2025-12-01 16:20:49 +00:00
rcourtman
b1f5dac4fc test: Add edge case tests for ClearActiveAlerts and sanitizeAlertKey
- TestClearActiveAlertsEmptyMaps: tests early return when both maps empty
- sanitizeAlertKey: test for input with only slashes/backslashes -> root

Coverage improvement:
- sanitizeAlertKey: 96.4% → 100%
- alerts package: 52.8% (unchanged)
2025-12-01 16:15:50 +00:00
rcourtman
8943bfedd2 test: Add comprehensive tests for AcknowledgeAlert and UnacknowledgeAlert functions
- TestAcknowledgeAlertNotFound: tests error path when alert doesn't exist
- TestUnacknowledgeAlertNotFound: tests error path when alert doesn't exist
- TestUnacknowledgeAlertSuccess: tests successful unacknowledgement with state verification

Coverage improvement:
- AcknowledgeAlert: 92.3% → 100%
- UnacknowledgeAlert: 41.7% → 100%
- alerts package: 52.6% → 52.8%
2025-12-01 16:12:50 +00:00
rcourtman
a3dec60dc6 test: Add edge case tests for matchesDockerIgnoredPrefix function
- skips empty prefix in list
- all empty prefixes returns false
- empty name/id edge cases
- matchesDockerIgnoredPrefix 92.3%→100%
2025-12-01 16:08:21 +00:00
rcourtman
82e793bef7 test: Add comprehensive tests for HandleHostOnline function
- clears offline alert and confirmation tracking
- clears confirmation even without alert
- empty host ID is noop
- HandleHostOnline 90%→100%
2025-12-01 16:06:06 +00:00
rcourtman
da2248fc10 test: Add comprehensive tests for safeCallResolvedCallback function
- calls callback synchronously with alert ID
- calls callback asynchronously
- noop when callback is nil
- recovers from panic in sync and async callbacks
- safeCallResolvedCallback 90%→100%
2025-12-01 16:04:25 +00:00
rcourtman
60ccaf2d62 test: Add comprehensive tests for safeCallEscalateCallback function
- calls callback with alert and level
- noop when callback is nil
- recovers from panic in callback
- clones alert to prevent concurrent modification
- safeCallEscalateCallback 0%→100%
2025-12-01 16:02:19 +00:00
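Note on safeCallEscalateCallback: the bullets above list three properties (no-op on nil callback, panic recovery, cloning the alert before handing it out). The sketch below shows one way to combine them. The alert type, clone method, and safeCallEscalate signature are all hypothetical, and the call is synchronous here for simplicity; the log does not say whether the real one is.

```go
package main

import "fmt"

// alert is a hypothetical, reduced alert type used only for illustration.
type alert struct {
	ID       string
	Metadata map[string]string
}

// clone returns a deep copy so callbacks cannot mutate shared state.
func (a *alert) clone() *alert {
	c := *a
	c.Metadata = make(map[string]string, len(a.Metadata))
	for k, v := range a.Metadata {
		c.Metadata[k] = v
	}
	return &c
}

// safeCallEscalate invokes cb with a copy of a and reports whether a
// panic inside cb was recovered. A nil callback or alert is a no-op.
func safeCallEscalate(cb func(*alert, int), a *alert, level int) (recovered bool) {
	if cb == nil || a == nil {
		return false
	}
	defer func() {
		if r := recover(); r != nil {
			recovered = true
		}
	}()
	cb(a.clone(), level)
	return false
}

func main() {
	a := &alert{ID: "node-offline-pve1", Metadata: map[string]string{"k": "v"}}

	// The callback mutates its argument; the original stays untouched.
	safeCallEscalate(func(c *alert, level int) { c.Metadata["k"] = "changed" }, a, 1)
	fmt.Println("original metadata:", a.Metadata["k"])

	// A panicking callback does not crash the caller.
	fmt.Println("recovered:", safeCallEscalate(func(*alert, int) { panic("boom") }, a, 2))
}
```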
rcourtman
b24f57aa1c test: Add comprehensive tests for cleanupDockerContainerAlerts function
- clears alerts and state tracking not in seen set
- skips alerts from other hosts
- handles empty seen set (clears all)
- cleanupDockerContainerAlerts 75%→100%
2025-12-01 15:59:49 +00:00
rcourtman
e34b939744 test: Add comprehensive tests for HandleDockerHostOnline function
- clears offline alert and tracking
- clears tracking even when no alert exists
- empty host ID is noop
- HandleDockerHostOnline 88.9%→100%
2025-12-01 15:57:44 +00:00
rcourtman
5a9e947f78 test: Add empty ID test for HandleDockerHostRemoved function
- empty host ID is noop (preserves existing alerts/tracking)
- HandleDockerHostRemoved 75%→100%
2025-12-01 15:56:11 +00:00
rcourtman
a73486ca63 test: Add comprehensive tests for cleanupHostDiskAlerts function
- clears alerts not in seen set (stale disk alerts)
- empty host ID is noop
- skips nil alerts without panic
- skips non-matching prefix (non-disk alerts)
- cleanupHostDiskAlerts 76.9%→100%
2025-12-01 15:54:33 +00:00
rcourtman
2cded52b72 test: Add comprehensive tests for clearHostDiskAlerts function
- clears all disk alerts for specified host
- empty hostID is noop
- skips nil alerts without panic
- noop when no matching alerts
- clearHostDiskAlerts 72.7%→100%
2025-12-01 15:52:39 +00:00
rcourtman
ba4a6e0cb4 test: Add comprehensive tests for clearHostMetricAlerts function
- clears specified metrics from host
- defaults to cpu and memory when no metrics specified
- empty hostID is noop
- clearHostMetricAlerts 85.7%→100%
2025-12-01 15:50:59 +00:00
rcourtman
547bc61e44 test: Add comprehensive tests for clearStorageOfflineAlert function
- clears existing offline alert and confirmation
- triggers resolved callback
- noop when no alert exists
- clears offline confirmation even without alert
- clearStorageOfflineAlert 50%→100%
2025-12-01 15:48:32 +00:00
rcourtman
8ecab8defa test: Add comprehensive tests for dockerServiceResourceID function
- with/without host ID (prefix selection)
- derives ID from service name with sanitization
- special chars replaced with dashes
- preserves alphanumeric, underscore, hyphen
- trims leading/trailing dashes/underscores
- truncates long IDs to 32 chars
- fallback to 'service' when empty
- dockerServiceResourceID 23.8%→100%
2025-12-01 15:46:34 +00:00
rcourtman
60aa2d7cde test: Add comprehensive tests for dockerServiceDisplayName function
- returns name when present
- returns trimmed name
- returns truncated ID when name empty (first 12 chars)
- returns full short ID when < 12 chars
- returns 'service' when both empty/whitespace
- prefers name over ID
- dockerServiceDisplayName 33.3%→100%
2025-12-01 15:44:40 +00:00
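Note on dockerServiceDisplayName: the fallback order described above (trimmed name, then the first 12 characters of the ID, then the literal 'service') is small enough to sketch in full. displayName is a hypothetical name and signature; the sketch also assumes IDs are ASCII, as Docker IDs normally are.

```go
package main

import (
	"fmt"
	"strings"
)

// displayName picks a human-readable label for a service.
// Hypothetical signature; behaviour follows the bullets in the log.
func displayName(name, id string) string {
	if n := strings.TrimSpace(name); n != "" {
		return n // name wins over ID
	}
	id = strings.TrimSpace(id)
	if id == "" {
		return "service" // both empty or whitespace
	}
	if len(id) > 12 {
		return id[:12] // truncated ID (ASCII assumed)
	}
	return id // short IDs are returned whole
}

func main() {
	fmt.Println(displayName(" web ", "0123456789abcdef"))
	fmt.Println(displayName("", "0123456789abcdef"))
	fmt.Println(displayName("", "short"))
	fmt.Println(displayName("  ", " "))
}
```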
rcourtman
f7854809bd test: Add comprehensive tests for shouldNotifyAfterCooldown function
- cooldown disabled (0) allows notification
- negative cooldown allows notification
- first notification allowed when never notified
- notification blocked during cooldown period
- notification allowed after cooldown expires
- boundary condition (exact cooldown time)
- shouldNotifyAfterCooldown 42.9%→100%
2025-12-01 15:42:35 +00:00
rcourtman
993835d985 test: Add comprehensive tests for applyRelaxedGuestThresholds function
- nil thresholds get defaults (CPU 95, Memory 92, Disk 95)
- low thresholds raised to minimum relaxed values
- high thresholds unchanged
- clear adjusted when too close to trigger
- original config unchanged (no mutation)
- applyRelaxedGuestThresholds 75%→93.8%
2025-12-01 15:40:19 +00:00
rcourtman
e272d12f11 test: Add comprehensive tests for checkRateLimit function
Cover all branches: disabled rate limit (zero/negative MaxAlertsHour),
under limit, at limit blocking, separate limits per alertID, old entry
cleanup, and mixed old/recent entries. Coverage improved from 71.4% to 100%.
2025-12-01 15:35:19 +00:00
rcourtman
560f22fc03 test: Add comprehensive tests for getMetricTimeThreshold function
Cover all branches: empty config, empty/whitespace metricType, direct
match, canonical key matching (vm→guest, container→guest), fallback
chain (default, _default, wildcard), precedence, and no-match scenarios.
Coverage improved from 77.8% to 100%.
2025-12-01 15:32:33 +00:00
rcourtman
1c36a2433f test: Add comprehensive tests for EmailTemplate function
Cover both branches: single alert template (isSingle=true with 1 alert)
and grouped alert template (isSingle=false or multiple alerts). Tests
verify subject format, HTML structure, and text body generation.
Coverage improved from 66.7% to 100%.
2025-12-01 15:29:31 +00:00
rcourtman
93a01d5373 test: Add comprehensive tests for getBaseTimeThreshold function
Cover all branches: nil/empty TimeThresholds, direct match, canonical key
matching (vm→guest, container→guest), "all" fallback, specific match
precedence, and global threshold fallback. Coverage improved from 85.7% to 100%.
2025-12-01 15:25:18 +00:00