Commit graph

20 commits

Author SHA1 Message Date
rcourtman
9b531c547d Fix recovery notifications silently disabled by config PUT (#1332)
Two fixes for missing recovery/resolved notifications:

1. API config PUT handler now preserves notifyOnResolve when the client
   omits it from the request body. Go decodes a missing bool as false,
   which silently disabled recovery notifications on older clients.

2. CancelAlert now always cleans up the cooldown record even when the
   alert has already left the pending buffer, preventing stale cooldown
   entries from suppressing future alert cycles.
2026-03-09 11:28:28 +00:00
rcourtman
0dd3fc779b Fix alert disable notification suppression
Some checks failed
Build and Test / Secret Scan (push) Has been cancelled
Build and Test / Frontend & Backend (push) Has been cancelled
Core E2E Tests / Playwright Core E2E (push) Has been cancelled
2026-03-07 18:40:08 +00:00
rcourtman
12a5a98117 fix: SSE race conditions, alert user spoofing, and security status oracle
SSE Broadcaster:
- Add per-client mutex to prevent concurrent writes to ResponseWriter
- Fix data race in cleanupLoop reading LastActive without synchronization
- Update LastActive in SendHeartbeat so clients aren't incorrectly pruned
  after 5 minutes of idle heartbeat traffic

Alert Acknowledgements:
- Extract authenticated user from X-Authenticated-User header instead of
  hardcoding 'admin' or trusting request body's User field
- Prevents audit log spoofing and ensures accurate user attribution

Security Status Endpoint:
- Remove ?token= query param validation from public /api/security/status
- Prevents endpoint from acting as a token validity oracle for attackers
- Authentication still works via session cookies and X-API-Token header
2026-02-03 17:40:58 +00:00
rcourtman
289d95374f feat: add multi-tenancy foundation (directory-per-tenant)
Implements Phase 1-2 of multi-tenancy support using a directory-per-tenant
strategy that preserves existing file-based persistence.

Key changes:
- Add MultiTenantPersistence manager for org-scoped config routing
- Add TenantMiddleware for X-Pulse-Org-ID header extraction and context propagation
- Add MultiTenantMonitor for per-tenant monitor lifecycle management
- Refactor handlers (ConfigHandlers, AlertHandlers, AIHandlers, etc.) to be
  context-aware with getConfig(ctx)/getMonitor(ctx) helpers
- Add Organization model for future tenant metadata
- Update server and router to wire multi-tenant components

All handlers maintain backward compatibility via legacy field fallbacks
for single-tenant deployments using the "default" org.
2026-01-22 13:39:06 +00:00
rcourtman
f478046696 refactor(api): Add interfaces to handlers for testability
Extract interfaces from concrete monitor type dependencies:

alerts.go:
- Add AlertManager, ConfigPersistence, AlertMonitor interfaces
- Change AlertHandlers to accept AlertMonitor interface

notifications.go:
- Add NotificationManager, NotificationConfigPersistence interfaces
- Add NotificationMonitor interface
- Change NotificationHandlers to accept NotificationMonitor interface

updates.go:
- Add UpdatesMonitor interface
- Change UpdatesHandlers to accept interface

audit_handlers.go:
- Update to use interface-based injection

profile_suggestions.go:
- Minor interface alignment

Benefits:
- Handlers can now be tested with mock implementations
- Decouples handlers from concrete monitoring.Monitor type
- Works with monitor_wrappers.go added in previous commit
2026-01-19 19:21:46 +00:00
rcourtman
3096ec53b5 fix: preserve alert activation state when saving config. Related to #1096 2026-01-16 14:24:02 +00:00
rcourtman
fa43628cde fix: Alert acknowledge/unacknowledge fails with reverse proxies
Reverse proxies (Traefik, Caddy, nginx) often normalize or reject URLs
containing %2F (encoded slash). Alert IDs contain forward slashes
(e.g., "docker-container-state-docker:abc/def"), causing acknowledge
requests to fail with 400 errors when going through a reverse proxy.

Added new body-based endpoints that accept alert ID in JSON body:
- POST /api/alerts/acknowledge {"id": "..."}
- POST /api/alerts/unacknowledge {"id": "..."}
- POST /api/alerts/clear {"id": "..."}

Updated frontend to use the new endpoints. Legacy path-based endpoints
are preserved for backwards compatibility.

Related to #1026
2026-01-03 20:51:25 +00:00
rcourtman
96573f4aca feat: enhance AI baseline context visibility and incident timeline improvements
Backend:
- Enhanced buildEnrichedResourceContext to ALWAYS show learned baselines with
  status indicators (normal/elevated/anomaly) instead of only when anomalous
- This makes Pulse Pro's 'moat' visible - users can see the AI understands
  their infrastructure's normal behavior patterns
- Added baseline import to service.go

Frontend (user changes):
- Added incident event type filtering with toggle buttons
- Added resource incident panel to view all incidents for a resource
- Added timeline expand/collapse functionality in alert history
- Added incident note saving with proper incidentId tracking
- Added startedAt parameter for proper incident timeline loading
2025-12-21 00:14:20 +00:00
rcourtman
e157aa04ad fix: Allow printable ASCII in alert IDs for acknowledge requests
Reverts overly strict alert ID validation that was rejecting valid
alert IDs containing special characters. Docker host IDs can contain
user-supplied data like hostnames which may include parentheses,
brackets, or other printable ASCII characters.

The previous validation only allowed alphanumeric + limited punctuation,
which caused 400 errors when acknowledging alerts from Docker hosts
with special characters in their identifiers.

Related to #852
2025-12-16 20:10:31 +00:00
rcourtman
d0d989289a Refactor alert system: fix race conditions, memory leaks, and improve code quality
- Rename checkFlapping to checkFlappingLocked to clarify lock contract
- Replace goto statements with structured control flow
- Wire up unused recordAlertFired/recordAlertResolved metric hooks
- Add trackingMapCleanup goroutine to prevent memory leaks from stale entries
- Tighten alert ID validation to alphanumeric + safe punctuation
- Fix history save error handling to properly manage backup lifecycle
- Add auto-migration for deprecated GroupingWindow field
- Refactor 300+ line UpdateConfig into focused helper functions
- Unify duplicate evaluateVMCondition/evaluateContainerCondition
- Add constants for magic numbers (thresholds, timing, flapping)
- Update tests to match new backup behavior
2025-12-02 23:31:36 +00:00
rcourtman
b4d497ce3b security: Add request body size limits to API handlers
Add http.MaxBytesReader to 16 additional handlers to prevent memory
exhaustion attacks via oversized request bodies:

- docker_agents.go: HandleReport (512KB), HandleCommandAck (8KB),
  HandleSetCustomDisplayName (8KB)
- alerts.go: UpdateAlertConfig (64KB), BulkAcknowledgeAlerts (32KB),
  BulkClearAlerts (32KB)
- config_handlers.go: HandleAddNode, HandleTestConnection,
  HandleUpdateNode, HandleTestNodeConfig (32KB each),
  HandleVerifyTemperatureSSH, HandleExportConfig, HandleDiscoverServers,
  HandleSetupScriptURL (8KB each), HandleImportConfig (1MB),
  HandleUpdateMockMode (16KB)
2025-12-02 16:43:13 +00:00
rcourtman
255357d2fe Add recovery notifications and grouping controls 2025-11-21 22:07:00 +00:00
rcourtman
77282bd3a6 Implement Pulse tag overrides and alert clear persistence 2025-10-25 14:28:32 +00:00
rcourtman
5c54685f04 Add API token scopes and standalone host agent
Introduces granular permission scopes for API tokens (docker:report, docker:manage, host-agent:report, monitoring:read/write, settings:read/write) allowing tokens to be restricted to minimum required access. Legacy tokens default to full access until scopes are explicitly configured.

Adds standalone host agent for monitoring Linux, macOS, and Windows servers outside Proxmox/Docker estates. New Servers workspace in UI displays uptime, OS metadata, and capacity metrics from enrolled agents.

Includes comprehensive token management UI overhaul with scope presets, inline editing, and visual scope indicators.
2025-10-23 11:40:31 +00:00
rcourtman
ff4dc49ae4 Update Pulse install flow and related components 2025-10-21 19:58:53 +00:00
rcourtman
ad371bf412 feat: improve alert system performance, UX, and edge case handling
Implement 5 medium/low priority improvements identified in systematic review:

UX IMPROVEMENTS:
- Notify existing critical alerts when activating from pending_review state
  Previously: critical alerts during observation window would never notify
  Now: users receive notifications for active critical alerts after activation
  Implementation: Added NotifyExistingAlert() method and logic in ActivateAlerts()

PERFORMANCE OPTIMIZATIONS:
- Replace per-alert cleanup goroutines with periodic batch cleanup
  Prevents spawning 1000s of goroutines during alert flapping
  recentlyResolved entries now cleaned up once per minute instead of 1 goroutine per alert
- Simplify GetActiveAlerts() implementation
  Removed intermediate map copy, holds lock slightly longer but operation is fast
  Cleaner code with reduced memory allocation

CONFIGURATION VALIDATION:
- Validate timezone in quiet hours configuration
  Invalid timezones now disable quiet hours with error log instead of silent fallback
  Prevents unexpected behavior when timezone is typo'd or invalid

GRACEFUL SHUTDOWN:
- Add 100ms delay in Stop() for background goroutine cleanup
  Reduces risk of state corruption during shutdown
  Allows escalation checker and periodic save to exit cleanly

Technical details:
- internal/alerts/alerts.go: Added NotifyExistingAlert(), optimized cleanup patterns
- internal/api/alerts.go: Enhanced ActivateAlerts() to notify existing critical alerts
- Removed ~20 lines of goroutine spawning code
- Added periodic cleanup for recentlyResolved map
- All changes preserve backward compatibility

Testing: Verified compilation with 'go build -o /dev/null ./...'
2025-10-21 11:05:45 +00:00
rcourtman
85ffe10aed docs: add Mermaid diagrams to improve visual documentation
Enhance documentation with six Mermaid diagrams to better explain
complex system implementations:

- Adaptive polling lifecycle flowchart showing enqueue→execute→feedback
  cycle with scheduler, priority queue, and worker interactions
- Circuit breaker state machine diagram illustrating Closed↔Open↔Half-open
  transitions with triggers and recovery paths
- Temperature proxy architecture diagram highlighting trust boundaries,
  security controls, and data flow between host/container/cluster
- Sensor proxy request flow sequence diagram showing auth, rate limiting,
  validation, and SSH execution pipeline
- Alert webhook pipeline flowchart detailing template resolution, URL
  rendering, HTTP dispatch, and retry logic
- Script library workflow diagram illustrating dev→test→bundle→distribute
  lifecycle emphasizing modular design

These visualizations make it easier for operators and contributors to
understand Pulse's sophisticated architectural patterns.
2025-10-21 10:40:33 +00:00
rcourtman
5927535110 Ref #556: adjust alert history range handling 2025-10-15 18:41:06 +00:00
rcourtman
0a5a4c1a0d Allow printable alert IDs for acknowledgements (#550) 2025-10-14 16:48:22 +00:00
rcourtman
f46ff1792b Fix settings security tab navigation 2025-10-11 23:29:47 +00:00