When Proxmox backs up a powered-off VM, the VM's status briefly
changes (e.g., to "running" for the duration of the backup). This
caused the powered-off alert to be cleared, deleting its ackState
record. When the backup
completed and the alert was recreated, it appeared as a new unacknowledged
alert, generating a new notification.
The fix preserves ackState when alerts are removed, allowing
preserveAlertState to restore the acknowledgement when the same alert
reappears. Stale ackState entries (for alerts that no longer exist) are
cleaned up after 1 hour.
Related to #937
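A minimal sketch of the preservation logic, assuming hypothetical shapes
for Manager and its maps (only the names ackState and preserveAlertState
come from the change itself):

```go
package main

import (
	"fmt"
	"time"
)

// ackEntry records an acknowledgement, plus when its alert was removed
// so stale entries can be pruned later. Hypothetical shape; the real
// struct may differ.
type ackEntry struct {
	ackedBy   string
	removedAt time.Time // zero while the alert is still active
}

type Manager struct {
	active   map[string]bool      // alertID -> currently firing
	ackState map[string]*ackEntry // alertID -> acknowledgement
}

// removeAlert clears the alert but intentionally keeps ackState,
// only stamping removedAt for later cleanup.
func (m *Manager) removeAlert(id string) {
	delete(m.active, id)
	if e, ok := m.ackState[id]; ok {
		e.removedAt = time.Now()
	}
}

// preserveAlertState restores the acknowledgement if the same alert
// reappears, so no duplicate notification is sent.
func (m *Manager) preserveAlertState(id string) (acked bool) {
	m.active[id] = true
	if e, ok := m.ackState[id]; ok {
		e.removedAt = time.Time{} // alert is live again
		return true
	}
	return false
}

// cleanupStaleAcks drops ackState for alerts gone longer than maxAge.
func (m *Manager) cleanupStaleAcks(maxAge time.Duration) {
	for id, e := range m.ackState {
		if !e.removedAt.IsZero() && time.Since(e.removedAt) > maxAge {
			delete(m.ackState, id)
		}
	}
}

func main() {
	m := &Manager{active: map[string]bool{}, ackState: map[string]*ackEntry{}}
	m.active["vm-101-powered-off"] = true
	m.ackState["vm-101-powered-off"] = &ackEntry{ackedBy: "admin"}

	m.removeAlert("vm-101-powered-off") // backup flips the status
	fmt.Println(m.preserveAlertState("vm-101-powered-off")) // true: ack survived
	m.cleanupStaleAcks(time.Hour)
}
```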
- Add registry checker tests (caching, enable/disable, parsing, concurrency)
- Add alert integration tests for update detection and Pro license gating
- Add API handler tests for /api/infra-updates endpoints
- Test cleanup of tracking maps when containers are removed
- Test threshold-based alerting behavior
- Add FeatureUpdateAlerts constant for Pro license gating
- Add feature to all Pro tier feature lists
- Add SetLicenseChecker method to alerts Manager
- Check Pro license in checkDockerContainerImageUpdate before alerting
- Wire license checker from router to alert manager
Free users still see update badges in the UI.
Pro users get proactive alerts after 24h of pending updates.
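A minimal sketch of the gating, assuming a hypothetical LicenseChecker
interface (only SetLicenseChecker, FeatureUpdateAlerts, and
checkDockerContainerImageUpdate are names from the change):

```go
package main

import "fmt"

// LicenseChecker is the dependency injected via SetLicenseChecker.
// The interface shape here is an assumption.
type LicenseChecker interface {
	HasFeature(feature string) bool
}

const FeatureUpdateAlerts = "update-alerts"

type Manager struct {
	license LicenseChecker
}

func (m *Manager) SetLicenseChecker(lc LicenseChecker) { m.license = lc }

// checkDockerContainerImageUpdate gates alerting on the Pro feature.
// The free-tier UI badge path (not shown) is unaffected.
func (m *Manager) checkDockerContainerImageUpdate(container string, updatePending bool) {
	if !updatePending {
		return
	}
	if m.license == nil || !m.license.HasFeature(FeatureUpdateAlerts) {
		return // free tier: badge only, no alert
	}
	fmt.Printf("alert: image update pending for %s\n", container)
}

type proLicense struct{}

func (proLicense) HasFeature(string) bool { return true }

func main() {
	m := &Manager{}
	m.SetLicenseChecker(proLicense{}) // wired from the router at startup
	m.checkDockerContainerImageUpdate("nginx", true)
}
```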
- Add GHCR (GitHub Container Registry) token support for public images
- Clean up dockerUpdateFirstSeen tracking when containers are removed
- Improve UpdateIcon tooltip to show digest info
- Add cursor-help to indicate hoverable tooltip
Acknowledged alerts were still triggering repeated webhook notifications
because the re-notification logic only checked cooldown period, not
acknowledgment status. Now acknowledged alerts are skipped entirely.
Related to #921
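A sketch of the fixed check, with illustrative field names:

```go
package main

import (
	"fmt"
	"time"
)

// Alert carries the two fields the re-notification loop now consults.
type Alert struct {
	ID           string
	Acknowledged bool
	LastNotified time.Time
}

// shouldRenotify previously checked only the cooldown; the fix adds
// the acknowledgment check so acked alerts are skipped entirely.
func shouldRenotify(a Alert, cooldown time.Duration) bool {
	if a.Acknowledged {
		return false // fixed: acked alerts never re-fire webhooks
	}
	return time.Since(a.LastNotified) >= cooldown
}

func main() {
	acked := Alert{ID: "cpu_critical", Acknowledged: true, LastNotified: time.Now().Add(-2 * time.Hour)}
	fresh := Alert{ID: "mem_warning", LastNotified: time.Now().Add(-2 * time.Hour)}
	fmt.Println(shouldRenotify(acked, time.Hour)) // false
	fmt.Println(shouldRenotify(fresh, time.Hour)) // true
}
```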
Backend:
- Enhanced buildEnrichedResourceContext to always show learned baselines with
status indicators (normal/elevated/anomaly), not only when a metric is anomalous
- This makes Pulse Pro's 'moat' visible: users can see the AI understands
their infrastructure's normal behavior patterns
- Added baseline import to service.go
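One plausible shape for the normal/elevated/anomaly classification; the
mean/stddev model and the 2σ/3σ cutoffs are assumptions for illustration,
not Pulse's actual baseline math:

```go
package main

import "fmt"

// classifyBaseline maps a current reading against a learned baseline
// to one of the three status indicators.
func classifyBaseline(current, mean, stddev float64) string {
	if stddev == 0 {
		stddev = 1 // avoid division by zero on flat baselines
	}
	z := (current - mean) / stddev
	switch {
	case z >= 3:
		return "anomaly"
	case z >= 2:
		return "elevated"
	default:
		return "normal"
	}
}

func main() {
	// CPU baseline learned at 35% ± 8%.
	fmt.Println(classifyBaseline(38, 35, 8)) // normal
	fmt.Println(classifyBaseline(55, 35, 8)) // elevated
	fmt.Println(classifyBaseline(70, 35, 8)) // anomaly
}
```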
Frontend (user-facing changes):
- Added incident event type filtering with toggle buttons
- Added resource incident panel to view all incidents for a resource
- Added timeline expand/collapse functionality in alert history
- Added incident note saving with proper incidentId tracking
- Added startedAt parameter for proper incident timeline loading
- Fixed normalizeStorageDefaults to allow Trigger=0 (see the sketch below)
- Fixed normalizeNodeDefaults (Temperature) to allow Trigger=0
- Added comprehensive tests for all threshold normalization patterns
- Updated an existing test that expected the old behavior
Related to #864
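A minimal sketch of the zero-versus-unset distinction that lets
Trigger=0 survive normalization; the pointer representation is an
assumption about how the fix might look:

```go
package main

import "fmt"

// Threshold uses a pointer so "unset" (nil) is distinguishable from an
// explicit 0. Treating the zero value as "unset" was the original bug:
// a user-configured Trigger=0 was overwritten with the default.
type Threshold struct {
	Trigger *float64
}

func normalizeStorageDefaults(t *Threshold, def float64) {
	if t.Trigger == nil { // only fill genuinely missing values
		t.Trigger = &def
	}
}

func main() {
	zero := 0.0
	explicit := Threshold{Trigger: &zero}
	unset := Threshold{}

	normalizeStorageDefaults(&explicit, 85)
	normalizeStorageDefaults(&unset, 85)

	fmt.Println(*explicit.Trigger) // 0 — preserved
	fmt.Println(*unset.Trigger)    // 85 — default applied
}
```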
Addresses issue #861: syslog flooded on Docker host
Many routine operational messages were being logged at INFO level,
causing excessive log volume when monitoring multiple VMs/containers.
These messages are now logged at DEBUG level:
- Guest threshold checking (every guest, every poll cycle)
- Storage threshold checking (every storage, every poll cycle)
- Host agent linking messages
- Filesystem inclusion in disk calculation
- Guest agent disk usage replacement
- Polling start/completion messages
- Alert cleanup and save messages
Users can set LOG_LEVEL=debug to see these messages if needed for
troubleshooting. The default INFO level now produces significantly
less log output.
Also updated documentation in CONFIGURATION.md and DOCKER.md to:
- Clarify what each log level includes
- Add tip about using LOG_LEVEL=warn for minimal logging
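For illustration, a standard-library version of the leveled-logging
setup; Pulse's actual logger may differ:

```go
package main

import (
	"log/slog"
	"os"
)

// parseLevel maps the LOG_LEVEL variable to an slog level.
func parseLevel(s string) slog.Level {
	switch s {
	case "debug":
		return slog.LevelDebug
	case "warn":
		return slog.LevelWarn
	default:
		return slog.LevelInfo // default: routine messages suppressed
	}
}

func main() {
	h := slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{
		Level: parseLevel(os.Getenv("LOG_LEVEL")),
	})
	log := slog.New(h)

	// Demoted messages: visible only with LOG_LEVEL=debug.
	log.Debug("checking guest thresholds", "guest", "vm-101")
	log.Debug("polling cycle complete", "duration_ms", 42)

	// Still at INFO: one-off lifecycle events.
	log.Info("monitor started")
}
```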
Adds FreshHours and StaleHours settings to control when the dashboard
backup indicator shows green (fresh), amber (stale), or red (critical).
- Backend: Added FreshHours/StaleHours to BackupAlertConfig (default 24/72 hours)
- Frontend: getBackupInfo() now accepts optional thresholds parameter
- Dashboard/GuestRow components use thresholds from alert config
- Settings saved/loaded with alert configuration
Closes #839
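A sketch of the resulting classification using the 24/72-hour defaults;
the function shape is illustrative:

```go
package main

import (
	"fmt"
	"time"
)

// backupStatus mirrors the green/amber/red indicator. Only FreshHours/
// StaleHours and their 24/72 defaults come from the change itself.
func backupStatus(lastBackup time.Time, freshHours, staleHours int) string {
	age := time.Since(lastBackup)
	switch {
	case age <= time.Duration(freshHours)*time.Hour:
		return "green" // fresh
	case age <= time.Duration(staleHours)*time.Hour:
		return "amber" // stale
	default:
		return "red" // critical
	}
}

func main() {
	fmt.Println(backupStatus(time.Now().Add(-6*time.Hour), 24, 72))  // green
	fmt.Println(backupStatus(time.Now().Add(-48*time.Hour), 24, 72)) // amber
	fmt.Println(backupStatus(time.Now().Add(-96*time.Hour), 24, 72)) // red
}
```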
Connect alert system to failure prediction:
1. Add AlertCallback to HistoryManager:
- OnAlert() method to register callbacks
- Callbacks invoked when alerts are added
- Called outside lock to prevent deadlocks
2. Expose OnAlertHistory() on alerts.Manager:
- Pass-through to HistoryManager.OnAlert()
- Enables external systems to track alerts
3. Wire pattern detector in router startup:
- Register callback when pattern detector is created
- Convert alert types to trackable events
- Pattern detector now learns from production alerts
Now every alert (memory_warning, cpu_critical, etc.) is recorded as
a historical event for pattern analysis. The AI can predict:
'High memory usage typically occurs every ~3 days (next expected in ~1 day)'
All tests passing.
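A sketch of the callback pattern from steps 1-3, with illustrative
types; the key detail is snapshotting callbacks under the lock and
invoking them after release:

```go
package main

import (
	"fmt"
	"sync"
)

type Alert struct{ Type string }

type HistoryManager struct {
	mu        sync.Mutex
	alerts    []Alert
	callbacks []func(Alert)
}

// OnAlert registers a callback; OnAlertHistory on alerts.Manager would
// simply pass through to this.
func (h *HistoryManager) OnAlert(cb func(Alert)) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.callbacks = append(h.callbacks, cb)
}

// AddAlert stores the alert under the lock, then invokes callbacks
// after releasing it, so a callback that calls back into the manager
// cannot deadlock.
func (h *HistoryManager) AddAlert(a Alert) {
	h.mu.Lock()
	h.alerts = append(h.alerts, a)
	cbs := append([]func(Alert){}, h.callbacks...) // snapshot under lock
	h.mu.Unlock()

	for _, cb := range cbs { // invoked outside the lock
		cb(a)
	}
}

func main() {
	h := &HistoryManager{}
	// The pattern detector registers here at router startup and converts
	// alert types into trackable events.
	h.OnAlert(func(a Alert) { fmt.Println("pattern detector recorded:", a.Type) })
	h.AddAlert(Alert{Type: "memory_warning"})
}
```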
ClearActiveAlerts triggers an async save to disk, which can race with
LoadActiveAlerts reading the file. The test now clears the in-memory
map directly without triggering the async save.
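A sketch of the test-only reset, with illustrative names:

```go
package alerts

import "sync"

type Alert struct{ ID string }

type Manager struct {
	mu           sync.Mutex
	activeAlerts map[string]*Alert
}

// clearActiveAlertsForTest resets the in-memory map directly, avoiding
// ClearActiveAlerts' async disk save that can race a concurrent
// LoadActiveAlerts reading the same file.
func clearActiveAlertsForTest(m *Manager) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.activeAlerts = make(map[string]*Alert)
}
```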
- Rename checkFlapping to checkFlappingLocked to clarify lock contract
- Replace goto statements with structured control flow
- Wire up unused recordAlertFired/recordAlertResolved metric hooks
- Add trackingMapCleanup goroutine to prevent memory leaks from stale entries (see the sketch after this list)
- Tighten alert ID validation to alphanumeric + safe punctuation
- Fix history save error handling to properly manage backup lifecycle
- Add auto-migration for deprecated GroupingWindow field
- Refactor 300+ line UpdateConfig into focused helper functions
- Unify duplicate evaluateVMCondition/evaluateContainerCondition
- Add constants for magic numbers (thresholds, timing, flapping)
- Update tests to match new backup behavior
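A sketch of the trackingMapCleanup goroutine mentioned above, with
illustrative types:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tracker holds first-seen timestamps keyed by container ID, the kind
// of map that leaks when containers disappear.
type tracker struct {
	mu   sync.Mutex
	seen map[string]time.Time
}

// startCleanup launches a goroutine that periodically drops entries
// older than maxAge, bounding the map even if removal events are missed.
func (t *tracker) startCleanup(interval, maxAge time.Duration, stop <-chan struct{}) {
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				t.mu.Lock()
				for id, first := range t.seen {
					if time.Since(first) > maxAge {
						delete(t.seen, id)
					}
				}
				t.mu.Unlock()
			case <-stop:
				return
			}
		}
	}()
}

func main() {
	t := &tracker{seen: map[string]time.Time{"old": time.Now().Add(-48 * time.Hour)}}
	stop := make(chan struct{})
	t.startCleanup(10*time.Millisecond, 24*time.Hour, stop)
	time.Sleep(50 * time.Millisecond)
	t.mu.Lock()
	fmt.Println(len(t.seen)) // 0 — stale entry pruned
	t.mu.Unlock()
	close(stop)
}
```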
Add missing godoc comments to:
- BuildGuestKey in alerts/alerts.go
- GenerateMockData in mock/generator.go
- NewDockerUpdater, NewAURUpdater in updates/adapter_installsh.go
- NewMockUpdater in updates/mock_updater.go
Add comprehensive tests for the saveHistoryWithRetry function covering:
- Backup file creation from existing history
- Empty history serialization
- Single retry success
- Read-only directory failure with retries
- Concurrent saves with serialization via saveMu
- Snapshot isolation during save
Coverage: saveHistoryWithRetry 58.6% → 86.2%
Coverage: alerts package 87.4% → 87.8%
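Reconstructed from those test cases, a sketch of the save path under
test; the retry count, rename-based backup, and file mode are all
assumptions:

```go
package alerts

import (
	"encoding/json"
	"os"
	"sync"
	"time"
)

type Manager struct {
	mu      sync.RWMutex
	saveMu  sync.Mutex // serializes concurrent saves
	history []string
	path    string
}

// saveHistoryWithRetry snapshots under the read lock (isolation),
// backs up the existing file, then retries the write a few times.
func (m *Manager) saveHistoryWithRetry() error {
	m.saveMu.Lock()
	defer m.saveMu.Unlock()

	m.mu.RLock()
	snapshot := append([]string(nil), m.history...) // snapshot isolation
	m.mu.RUnlock()

	if _, err := os.Stat(m.path); err == nil {
		_ = os.Rename(m.path, m.path+".bak") // back up existing history
	}

	data, err := json.Marshal(snapshot)
	if err != nil {
		return err
	}
	for attempt := 0; attempt < 3; attempt++ {
		if err = os.WriteFile(m.path, data, 0o644); err == nil {
			return nil
		}
		time.Sleep(50 * time.Millisecond)
	}
	return err // all retries failed; .bak preserves the old history
}
```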
Add TestDispatchAlert with 8 test cases covering:
- Returns false when onAlert callback is nil
- Returns false when alert is nil
- Returns false when activation state is pending
- Returns false when activation state is snoozed
- Returns false for monitor-only alerts
- Dispatches synchronously when async is false
- Dispatches asynchronously when async is true
- Clones alert before dispatch
Alerts package coverage: 83.4% → 83.5%
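The contract those cases pin down, reconstructed as a sketch with
assumed types:

```go
package main

import "fmt"

type activationState int

const (
	stateActive activationState = iota
	statePending
	stateSnoozed
)

type Alert struct {
	ID          string
	MonitorOnly bool
	State       activationState
}

type Manager struct {
	onAlert func(*Alert)
}

// dispatchAlert returns false for nil callbacks, nil alerts, pending or
// snoozed activation states, and monitor-only alerts; otherwise it
// clones the alert and dispatches sync or async.
func (m *Manager) dispatchAlert(a *Alert, async bool) bool {
	if m.onAlert == nil || a == nil {
		return false
	}
	if a.State == statePending || a.State == stateSnoozed {
		return false
	}
	if a.MonitorOnly {
		return false
	}
	clone := *a // clone before dispatch so callbacks can't mutate the original
	if async {
		go m.onAlert(&clone)
	} else {
		m.onAlert(&clone)
	}
	return true
}

func main() {
	m := &Manager{onAlert: func(a *Alert) { fmt.Println("dispatched:", a.ID) }}
	fmt.Println(m.dispatchAlert(&Alert{ID: "cpu_critical"}, false)) // true
	fmt.Println(m.dispatchAlert(nil, false))                        // false
}
```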
Tests using NewManager() were sharing /etc/pulse/alerts, causing race
conditions when running in parallel. Added newTestManager(t) helper that
creates isolated temp directories for each test.
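A sketch of such a helper, assuming a constructor variant
(NewManagerWithDir is hypothetical) that accepts the data directory:

```go
package alerts

import "testing"

type Manager struct{ dir string }

// NewManagerWithDir is an assumed constructor variant that takes the
// data directory instead of defaulting to /etc/pulse/alerts.
func NewManagerWithDir(dir string) *Manager { return &Manager{dir: dir} }

// newTestManager gives each test an isolated temp directory (cleaned up
// automatically by the testing framework), removing the shared-state
// race between parallel tests.
func newTestManager(t *testing.T) *Manager {
	t.Helper()
	return NewManagerWithDir(t.TempDir())
}
```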