Pulse/internal/notifications
rcourtman 99e5a38534 Fix critical monitoring system issues and add robustness improvements
This commit addresses 9 critical issues identified during the monitoring system audit:

**Race Conditions Fixed:**
- PBS backup pollers: Moved lock earlier to eliminate check-then-act race (lines 7316-7378)
- PVE backup poll timing: Fixed double write to lastPVEBackupPoll with proper synchronization (lines 5927-5977)
- Docker hosts cleanup: Refactored to avoid holding both m.mu and s.mu locks simultaneously (lines 1911-1937)
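
A minimal sketch of the check-then-act fix. It assumes the pollers are tracked in a map of in-flight instances. The `pollerSet` type and its `tryStart`/`finish` methods are hypothetical illustrations, not the code in this commit. The point is that the "already running?" check and the "mark as running" write happen under a single lock acquisition.

```go
package main

import "sync"

// pollerSet tracks which instances currently have a backup poll in flight.
// Hypothetical type for illustration; not Pulse's actual implementation.
type pollerSet struct {
	mu       sync.Mutex
	inFlight map[string]bool
}

func newPollerSet() *pollerSet {
	return &pollerSet{inFlight: make(map[string]bool)}
}

// tryStart performs the check ("is a poll already running?") and the act
// ("mark it running") under one lock acquisition, so two goroutines can
// never both observe "not running" and start duplicate pollers.
func (p *pollerSet) tryStart(instance string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.inFlight[instance] {
		return false
	}
	p.inFlight[instance] = true
	return true
}

// finish clears the in-flight marker once the poll goroutine exits.
func (p *pollerSet) finish(instance string) {
	p.mu.Lock()
	delete(p.inFlight, instance)
	p.mu.Unlock()
}
```

A caller would spawn a poller only when `tryStart` returns true, and `defer` the matching `finish` inside the spawned goroutine.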

**Context Propagation Fixed:**
- Replaced all context.Background() calls with the parent context to preserve the cancellation chain:
  - PBS backup poller (line 7367)
  - PVE backup poller (line 5955)
  - PBS fallback check (line 7154)

**Memory Leak Prevention:**
- Added cleanup for guest metadata cache (10-minute TTL, lines 1942-1957)
- Added cleanup for diagnostic snapshots (1-hour TTL, lines 1959-1987)
- Added cleanup for RRD cache (1-minute TTL, lines 1989-2007)
- All cleanup methods are called on a 10-second ticker (lines 3791-3793)

**Panic Recovery:**
- Added recoverFromPanic helper to log panics with stack traces (lines 1910-1920)
- Protected all critical goroutines:
  - poll (line 4020)
  - taskWorker (line 4200)
  - retryFailedConnections (line 3851)
  - checkMockAlerts (line 8896)
  - pollPVEInstance (line 4886)
  - pollPBSInstance (line 7164)
  - pollPMGInstance (line 7498)
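
A minimal sketch of what such a helper can look like. It assumes logging with the standard `log` package and `runtime/debug.Stack`, and the actual helper may use a different logger and signature. `safeRun` is a hypothetical wrapper used only to exercise it.

```go
package main

import (
	"log"
	"runtime/debug"
)

// recoverFromPanic logs a recovered panic together with the goroutine's
// stack trace. It must itself be the deferred function (see note below).
func recoverFromPanic(goroutine string) {
	if r := recover(); r != nil {
		log.Printf("panic in %s: %v\n%s", goroutine, r, debug.Stack())
	}
}

// safeRun is a hypothetical wrapper: it runs fn and reports whether fn
// completed normally. If fn panics, the panic is logged, and the named
// result keeps its zero value (false).
func safeRun(name string, fn func()) (completed bool) {
	defer recoverFromPanic(name)
	fn()
	return true
}
```

The helper has to be deferred directly, as in `defer recoverFromPanic("poll")`. `recover` only stops a panic when the deferred function itself calls it, so wrapping the call in another closure would not work. The protected goroutine still exits after a panic. It just no longer takes the whole process down.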

**Import Fixes:**
- Added missing sync import to email_enhanced.go
- Added missing os import to queue.go

All fixes maintain proper lock ordering and release locks before calling methods that acquire other locks to prevent deadlocks.
2025-11-07 08:52:37 +00:00
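
A minimal sketch of the lock-release rule from the closing note and the Docker hosts cleanup item. It assumes a monitor guarded by `m.mu` that calls into a state store guarded by `s.mu`. The `monitor`, `state`, and `cleanupRemovedHosts` names are hypothetical.

```go
package main

import "sync"

// state is a hypothetical shared store guarded by its own mutex (s.mu).
type state struct {
	mu    sync.Mutex
	hosts map[string]bool
}

func (s *state) removeHost(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.hosts, id)
}

// monitor is a hypothetical owner of m.mu that also references state.
type monitor struct {
	mu      sync.Mutex
	removed []string // host IDs pending cleanup
	state   *state
}

// cleanupRemovedHosts copies what it needs while holding m.mu, releases it,
// and only then calls into state (which takes s.mu). Because m.mu and s.mu
// are never held together on this path, it cannot deadlock against code
// that acquires the two locks in the opposite order.
func (m *monitor) cleanupRemovedHosts() int {
	m.mu.Lock()
	pending := append([]string(nil), m.removed...)
	m.removed = nil
	m.mu.Unlock()

	for _, id := range pending {
		m.state.removeHost(id)
	}
	return len(pending)
}
```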
| File | Last commit | Date |
|---|---|---|
| concurrency_test.go | Fix settings security tab navigation | 2025-10-11 23:29:47 +00:00 |
| email_enhanced.go | Fix critical monitoring system issues and add robustness improvements | 2025-11-07 08:52:37 +00:00 |
| email_providers.go | Fix settings security tab navigation | 2025-10-11 23:29:47 +00:00 |
| email_template.go | Fix settings security tab navigation | 2025-10-11 23:29:47 +00:00 |
| notifications.go | Implement queue cancellation and atomic DB operations (P1 fixes) | 2025-11-07 08:33:09 +00:00 |
| notifications_test.go | Add test notification functionality for Apprise | 2025-11-05 18:54:18 +00:00 |
| queue.go | Fix critical monitoring system issues and add robustness improvements | 2025-11-07 08:52:37 +00:00 |
| webhook_enhanced.go | Document layered retry semantics (P2 documentation) | 2025-11-07 08:35:00 +00:00 |
| webhook_templates.go | Fix settings security tab navigation | 2025-10-11 23:29:47 +00:00 |