Implement 5 medium/low-priority improvements identified in a systematic review:
UX IMPROVEMENTS:
- Send notifications for existing critical alerts when activating from pending_review state
Previously: critical alerts raised during the observation window never triggered a notification
Now: users are notified about critical alerts that are still active after activation
Implementation: Added NotifyExistingAlert() method and logic in ActivateAlerts()
PERFORMANCE OPTIMIZATIONS:
- Replace per-alert cleanup goroutines with periodic batch cleanup
Prevents spawning thousands of goroutines during alert flapping
recentlyResolved entries are now cleaned up once per minute by a single routine instead of by one goroutine per alert
- Simplify GetActiveAlerts() implementation
Removed the intermediate map copy; the lock is held slightly longer, but the operation is fast
Cleaner code with reduced memory allocation
CONFIGURATION VALIDATION:
- Validate timezone in quiet hours configuration
Invalid timezones now disable quiet hours with an error log instead of a silent fallback
Prevents unexpected behavior when the timezone is mistyped or otherwise invalid
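
Sketch of the timezone check described above, assuming the validation is built on time.LoadLocation from the standard library. The quietHours struct and validateQuietHours function are hypothetical names for the example and may not match the actual configuration types.

```go
package main

import (
	"log"
	"time"
)

// quietHours is a hypothetical, reduced config shape for this sketch.
type quietHours struct {
	Enabled  bool
	Timezone string // IANA name such as "Europe/London"
}

// validateQuietHours returns the config unchanged when the timezone loads,
// and returns it with quiet hours disabled (plus an error log) when it does
// not. Note that time.LoadLocation treats "" and "UTC" as UTC.
func validateQuietHours(q quietHours) quietHours {
	if !q.Enabled {
		return q
	}
	if _, err := time.LoadLocation(q.Timezone); err != nil {
		log.Printf("ERROR quiet hours disabled: invalid timezone %q: %v", q.Timezone, err)
		q.Enabled = false
	}
	return q
}
```

Disabling the feature is the conservative choice here: falling back to another zone silently would shift the quiet window by hours without the user knowing.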
GRACEFUL SHUTDOWN:
- Add 100ms delay in Stop() for background goroutine cleanup
Reduces risk of state corruption during shutdown
Allows escalation checker and periodic save to exit cleanly
Technical details:
- internal/alerts/alerts.go: Added NotifyExistingAlert(), optimized cleanup patterns
- internal/api/alerts.go: Enhanced ActivateAlerts() to notify existing critical alerts
- Removed ~20 lines of goroutine spawning code
- Added periodic cleanup for recentlyResolved map
- All changes preserve backward compatibility
Testing: Verified compilation with 'go build -o /dev/null ./...'
Fix 5 critical bugs identified through systematic code review:
CRITICAL FIXES (prevent service crashes):
- Add panic recovery to all alert callbacks (onAlert, onResolved, onEscalate)
- Clone alerts before passing to escalation callback to prevent data races
- Make clearAlertNoLock callback async to prevent deadlock
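
Sketch of the three fixes above combined: a defer/recover wrapper, a cloned alert, and asynchronous dispatch. The alert struct, safeCall, clone, and dispatchEscalation are hypothetical names for the example and are not the wrappers added in this commit.

```go
package main

import "log"

// alert is a hypothetical, reduced alert type for this sketch.
type alert struct {
	ID       string
	Level    string
	Metadata map[string]string
}

// clone returns a deep copy so a callback never shares the map with the
// manager's own copy of the alert.
func (a *alert) clone() *alert {
	if a == nil {
		return nil
	}
	c := *a
	if a.Metadata != nil {
		c.Metadata = make(map[string]string, len(a.Metadata))
		for k, v := range a.Metadata {
			c.Metadata[k] = v
		}
	}
	return &c
}

// safeCall runs fn and turns a panic into a log line instead of a crash.
// It reports whether fn completed without panicking.
func safeCall(name string, fn func()) (ok bool) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("recovered panic in %s callback: %v", name, r)
			ok = false
		}
	}()
	fn()
	return true
}

// dispatchEscalation hands a private copy of the alert to the callback on
// its own goroutine, so the caller can keep holding its lock without the
// callback re-entering it. The returned channel receives the safeCall result.
func dispatchEscalation(a *alert, onEscalate func(*alert)) <-chan bool {
	snapshot := a.clone()
	result := make(chan bool, 1) // buffered so the goroutine never blocks
	go func() {
		result <- safeCall("onEscalate", func() { onEscalate(snapshot) })
	}()
	return result
}
```

Cloning before the goroutine starts matters: the copy is taken while the caller still holds whatever lock protects the alert, so the callback never reads state that is being mutated.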
HIGH PRIORITY FIXES (prevent memory leaks):
- Add cleanup for stale pendingAlerts entries (deleted resources)
- Add cleanup for dockerRestartTracking (ephemeral containers in CI/CD)
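
Sketch of the stale-entry pruning described above, assuming stale entries are detected by comparing tracked IDs against the set of resources that still exist. The pruneStale helper and the live-set parameter are hypothetical; the actual Cleanup() may use a different criterion such as entry age.

```go
package main

// pruneStale deletes every tracked entry whose resource ID is absent from
// live and returns the number of entries removed. It works for any value
// type, so the same helper could serve a pending-alert map and a
// restart-tracking map.
func pruneStale[V any](tracked map[string]V, live map[string]struct{}) int {
	removed := 0
	for id := range tracked {
		if _, ok := live[id]; !ok {
			delete(tracked, id)
			removed++
		}
	}
	return removed
}
```

The caller is expected to hold the lock that guards the tracked map while pruning.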
MEDIUM PRIORITY FIXES (prevent stuck alerts):
- Validate hysteresis thresholds (ensure clear < trigger)
- Auto-fix invalid configurations with warning logs
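
Sketch of the hysteresis validation described above. The only rule taken from this commit is clear < trigger; the hysteresis struct, the normalizeHysteresis name, the 5-point gap used for the auto-fix, and the choice to leave non-positive triggers untouched are all assumptions for the example.

```go
package main

import "log"

// hysteresis is a hypothetical threshold pair: an alert fires at Trigger
// and clears once the metric drops below Clear.
type hysteresis struct {
	Trigger float64
	Clear   float64
}

// assumedGap is an illustrative margin, not a value taken from the project.
const assumedGap = 5.0

// normalizeHysteresis returns h unchanged when Clear < Trigger. Otherwise it
// logs a warning and moves Clear to assumedGap below Trigger, floored at 0.
// Non-positive triggers are returned as-is (assumed to mean "disabled").
func normalizeHysteresis(metric string, h hysteresis) hysteresis {
	if h.Trigger <= 0 || h.Clear < h.Trigger {
		return h
	}
	fixed := h.Trigger - assumedGap
	if fixed < 0 {
		fixed = 0
	}
	log.Printf("WARN %s: clear threshold %.1f >= trigger %.1f, adjusting clear to %.1f",
		metric, h.Clear, h.Trigger, fixed)
	h.Clear = fixed
	return h
}
```

Without this check, a pair such as trigger 80 and clear 90 produces an alert that can never clear: the metric would have to be above 80 to fire and, at the same time, the clear condition would already be satisfied or never reachable depending on the comparison used.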
Impact:
- Service stability: Malformed webhook URLs or email configs can no longer crash Pulse
- Memory management: Prevents unbounded growth in dynamic environments
- Alert reliability: Prevents alerts that never clear due to invalid thresholds
- Concurrency safety: Eliminates data races in escalation path
Technical details:
- Created safeCallResolvedCallback() and safeCallEscalateCallback() wrappers
- Added ensureValidHysteresis() validation helper
- Extended Cleanup() with pendingAlerts and dockerRestartTracking pruning
- All callbacks now have defer/recover panic handlers with detailed logging
Testing: Verified compilation with 'go build -o /dev/null ./...'
- Add comprehensive test coverage for alerts package with 285+ new tests
- Implement ThresholdsTable component with metric thresholds display
- Enhance Alerts page UI with improved layout and metric filtering
- Add frontend component tests for Alerts page and ThresholdsTable
- Set up Vitest testing infrastructure for SolidJS components
- Improve config persistence with better validation
- Expand discovery tests with 333+ test cases
- Update API, configuration, and Docker monitoring documentation