Fix monitoring test panic and goroutine leaks

Two critical fixes to prevent test timeouts:

1. Nil map panic in TestPollPVEInstanceUsesRRDMemUsedFallback:
   - The test monitor did not initialize the nodeLastOnline map
   - pollPVEInstance panicked when it wrote to nodeLastOnline[nodeID]
   - The panic recovery path then tried to acquire a mutex that was already held, causing a deadlock
   - Added nodeLastOnline: make(map[string]time.Time) to the test monitor

2. Alert manager goroutine leak in Docker tests:
   - newTestMonitor() created an alert manager but never stopped it
   - Its background goroutines (escalationChecker, periodicSaveAlerts) kept running
   - Added t.Cleanup(func() { m.alertManager.Stop() }) to the test helper

These fixes resolve the 10+ minute test timeouts in CI workflows.

Related to workflow run 19281508603.
rcourtman 2025-11-11 23:52:24 +00:00
parent bbe11d1e7f
commit 02273e7fcb
2 changed files with 4 additions and 1 deletion


@@ -13,7 +13,7 @@ import (
 func newTestMonitor(t *testing.T) *Monitor {
 	t.Helper()
-	return &Monitor{
+	m := &Monitor{
 		state: models.NewState(),
 		alertManager: alerts.NewManager(),
 		removedDockerHosts: make(map[string]time.Time),
@@ -21,6 +21,8 @@ func newTestMonitor(t *testing.T) *Monitor {
 		dockerTokenBindings: make(map[string]string),
 		dockerMetadataStore: config.NewDockerMetadataStore(t.TempDir()),
 	}
+	t.Cleanup(func() { m.alertManager.Stop() })
+	return m
 }
 func TestApplyDockerReportGeneratesUniqueIDsForCollidingHosts(t *testing.T) {


@@ -191,6 +191,7 @@ func TestPollPVEInstanceUsesRRDMemUsedFallback(t *testing.T) {
 		dlqInsightMap: make(map[string]*dlqInsight),
 		authFailures: make(map[string]int),
 		lastAuthAttempt: make(map[string]time.Time),
+		nodeLastOnline: make(map[string]time.Time),
 	}
 	defer mon.alertManager.Stop()
 	defer mon.notificationMgr.Stop()
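
The cleanup pattern used in newTestMonitor can be sketched on its own. This is an illustrative example under stated assumptions, not the project's alert manager: the `manager` type, `newManager`, and `newTestManager` are hypothetical names, and making `Stop` safe to call twice is a design choice of this sketch rather than a documented property of `alerts.Manager`.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
	"time"
)

// manager is a hypothetical stand-in for an alert manager: it owns a
// background goroutine that runs until Stop is called.
type manager struct {
	stop chan struct{}
	done chan struct{}
	once sync.Once
}

func newManager() *manager {
	m := &manager{stop: make(chan struct{}), done: make(chan struct{})}
	go func() {
		defer close(m.done) // signals that the goroutine has exited
		ticker := time.NewTicker(10 * time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C: // periodic work would go here
			case <-m.stop:
				return
			}
		}
	}()
	return m
}

// Stop signals the goroutine and waits for it to exit.
// Calling it more than once is safe in this sketch.
func (m *manager) Stop() {
	m.once.Do(func() { close(m.stop) })
	<-m.done
}

// newTestManager ties the manager's lifetime to the test, so every
// test that uses the helper stops the goroutine when it finishes.
func newTestManager(t testing.TB) *manager {
	t.Helper()
	m := newManager()
	t.Cleanup(m.Stop)
	return m
}

func main() {
	m := newManager()
	m.Stop()
	select {
	case <-m.done:
		fmt.Println("background goroutine exited")
	default:
		fmt.Println("background goroutine still running")
	}
}
```

Registering the stop call inside the helper means individual tests cannot forget it, whereas a `defer` in each test body has to be repeated in every caller.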