Backend:
- Enhanced buildEnrichedResourceContext to ALWAYS show learned baselines with
status indicators (normal/elevated/anomaly) instead of only when anomalous
- This makes Pulse Pro's 'moat' visible - users can see the AI understands
their infrastructure's normal behavior patterns
- Added baseline import to service.go
Frontend (user changes):
- Added incident event type filtering with toggle buttons
- Added resource incident panel to view all incidents for a resource
- Added timeline expand/collapse functionality in alert history
- Added incident note saving with proper incidentId tracking
- Added startedAt parameter for proper incident timeline loading
- Fixed normalizeStorageDefaults to allow Trigger=0
- Fixed normalizeNodeDefaults (Temperature) to allow Trigger=0
- Added comprehensive tests for all threshold normalization patterns
- Updated existing test that expected old behavior
Related to #864
Addresses issue #861 - syslog flooded on docker host
Many routine operational messages were being logged at INFO level,
causing excessive log volume when monitoring multiple VMs/containers.
These messages are now logged at DEBUG level:
- Guest threshold checking (every guest, every poll cycle)
- Storage threshold checking (every storage, every poll cycle)
- Host agent linking messages
- Filesystem inclusion in disk calculation
- Guest agent disk usage replacement
- Polling start/completion messages
- Alert cleanup and save messages
Users can set LOG_LEVEL=debug to see these messages if needed for
troubleshooting. The default INFO level now produces significantly
less log output.
Also updated documentation in CONFIGURATION.md and DOCKER.md to:
- Clarify what each log level includes
- Add tip about using LOG_LEVEL=warn for minimal logging
Adds FreshHours and StaleHours settings to control when the dashboard
backup indicator shows green (fresh), amber (stale), or red (critical).
- Backend: Added FreshHours/StaleHours to BackupAlertConfig (default 24/72 hours)
- Frontend: getBackupInfo() now accepts optional thresholds parameter
- Dashboard/GuestRow components use thresholds from alert config
- Settings saved/loaded with alert configuration
Closes#839
Connect alert system to failure prediction:
1. Add AlertCallback to HistoryManager:
- OnAlert() method to register callbacks
- Callbacks invoked when alerts are added
- Called outside lock to prevent deadlocks
2. Expose OnAlertHistory() on alerts.Manager:
- Pass-through to HistoryManager.OnAlert()
- Enables external systems to track alerts
3. Wire pattern detector in router startup:
- Register callback when pattern detector is created
- Convert alert types to trackable events
- Pattern detector now learns from production alerts
Now every alert (memory_warning, cpu_critical, etc.) is recorded as
a historical event for pattern analysis. The AI can predict:
'High memory usage typically occurs every ~3 days (next expected in ~1 day)'
All tests passing.
ClearActiveAlerts triggers an async save to disk, which can race with
LoadActiveAlerts reading the file. The test now clears the in-memory
map directly without triggering the async save.
- Rename checkFlapping to checkFlappingLocked to clarify lock contract
- Replace goto statements with structured control flow
- Wire up unused recordAlertFired/recordAlertResolved metric hooks
- Add trackingMapCleanup goroutine to prevent memory leaks from stale entries
- Tighten alert ID validation to alphanumeric + safe punctuation
- Fix history save error handling to properly manage backup lifecycle
- Add auto-migration for deprecated GroupingWindow field
- Refactor 300+ line UpdateConfig into focused helper functions
- Unify duplicate evaluateVMCondition/evaluateContainerCondition
- Add constants for magic numbers (thresholds, timing, flapping)
- Update tests to match new backup behavior
Add missing godoc comments to:
- BuildGuestKey in alerts/alerts.go
- GenerateMockData in mock/generator.go
- NewDockerUpdater, NewAURUpdater in updates/adapter_installsh.go
- NewMockUpdater in updates/mock_updater.go
Add comprehensive tests for the saveHistoryWithRetry function covering:
- Backup file creation from existing history
- Empty history serialization
- Single retry success
- Read-only directory failure with retries
- Concurrent saves with serialization via saveMu
- Snapshot isolation during save
Coverage: saveHistoryWithRetry 58.6% → 86.2%
Coverage: alerts package 87.4% → 87.8%
Add TestDispatchAlert with 8 test cases covering:
- Returns false when onAlert callback is nil
- Returns false when alert is nil
- Returns false when activation state is pending
- Returns false when activation state is snoozed
- Returns false for monitor-only alerts
- Dispatches synchronously when async is false
- Dispatches asynchronously when async is true
- Clones alert before dispatch
Alerts package coverage: 83.4%→83.5%
Tests using NewManager() were sharing /etc/pulse/alerts, causing race
conditions when running in parallel. Added newTestManager(t) helper that
creates isolated temp directories for each test.
clearNodeOfflineAlert (0%→100%):
- No alert and no count is no-op
- Resets offline count when node comes online
- Clears existing alert and adds to resolved
clearPBSOfflineAlert (0%→100%):
- No alert and no count is no-op
- Resets offline confirmation count
- Clears existing alert and adds to resolved
clearPMGOfflineAlert (0%→100%):
- No alert and no count is no-op
- Resets offline confirmation count
- Clears existing alert and adds to resolved
Coverage: 57.4% → 58.5%
SetMetricHooks (0%→100%):
- Sets all four hooks correctly
- Nil hooks are safely handled
- Properly saves/restores state to avoid test pollution
NotifyExistingAlert (0%→100%):
- Non-existent alert is no-op
- Existing alert dispatches async notification
GetResolvedAlert (0%→100%):
- Returns nil for non-existent alert
- Returns nil for nil resolved entry
- Returns nil when Alert is nil
- Returns cloned alert with resolved time
GetAlertHistory/GetAlertHistorySince (0%→100%):
- Returns history from history manager
- Respects limit parameter
- Zero time returns all history
- Filters by time correctly
ClearAlertHistory (0%→100%):
- Clears all history entries
Coverage: 56.9% → 57.4%
HandleHostRemoved (0%→100%):
- Empty host ID is no-op
- Clears host offline alert and confirmations
- Clears host metric alerts (CPU, memory)
- Clears host disk alerts
- Clears all alert types together
ReevaluateGuestAlert (0%→100%):
- No active alerts is no-op
- Clears alert when threshold disabled (nil)
- Clears alert when trigger is zero
- Clears alert when value below clear threshold
- Clears alert when value below trigger threshold
- Keeps alert when value above both thresholds
- Processes all metric types (7 types)
- Clears pending alert when threshold disabled
- Uses clear equals trigger when clear is zero
- Ignores alerts for different guests
NormalizeMetricTimeThresholds (0%→100%):
- Updated existing test to call public wrapper instead of internal function
Coverage: 54.6% → 55.6%