43 commits

Author SHA1 Message Date
rcourtman
b8a551ce22 Forward-port webhook JSON template escaping 2026-04-01 17:04:40 +01:00
rcourtman
a253016327 Harden Apprise server URL base handling 2026-04-01 15:50:30 +01:00
rcourtman
53f41fdb45 Harden webhook request URL validation 2026-03-29 13:18:40 +01:00
rcourtman
d6536932fc Harden outbound URLs and file-backed storage 2026-03-29 12:47:55 +01:00
rcourtman
778a2577b6 feat: Pulse v6 release 2026-03-18 16:06:30 +00:00
rcourtman
9b531c547d Fix recovery notifications silently disabled by config PUT (#1332)
Two fixes for missing recovery/resolved notifications:

1. API config PUT handler now preserves notifyOnResolve when the client
   omits it from the request body. Go decodes a missing bool as false,
   which silently disabled recovery notifications on older clients.

2. CancelAlert now always cleans up the cooldown record even when the
   alert has already left the pending buffer, preventing stale cooldown
   entries from suppressing future alert cycles.
2026-03-09 11:28:28 +00:00
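The first fix in 9b531c547d turns on a Go decoding detail: a plain `bool` field cannot distinguish "omitted" from an explicit `false`. A minimal sketch of one common way to preserve the old value, assuming a pointer field; the `configUpdate` type and `applyUpdate` helper are hypothetical and not taken from the Pulse source:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// configUpdate is a hypothetical request body. The pointer lets the
// handler tell "field omitted" (nil) apart from an explicit false.
type configUpdate struct {
	NotifyOnResolve *bool `json:"notifyOnResolve"`
}

// applyUpdate returns the new setting, keeping the current value when
// the client left the field out of the request body.
func applyUpdate(current bool, body []byte) (bool, error) {
	var u configUpdate
	if err := json.Unmarshal(body, &u); err != nil {
		return current, err
	}
	if u.NotifyOnResolve == nil {
		return current, nil // omitted: preserve existing setting
	}
	return *u.NotifyOnResolve, nil
}

func main() {
	kept, _ := applyUpdate(true, []byte(`{}`))
	off, _ := applyUpdate(true, []byte(`{"notifyOnResolve":false}`))
	fmt.Println(kept, off) // true false
}
```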
rcourtman
0dd3fc779b Fix alert disable notification suppression
2026-03-07 18:40:08 +00:00
rcourtman
464d3f8486 Fix stale queued notification delivery 2026-03-05 23:46:35 +00:00
rcourtman
eb2397d99a fix(notifications): route escalation notifications to selected channels only (#1259)
Escalation was calling SendAlert() which always sends to all enabled
channels, ignoring the per-level channel selection (email/webhook/all).

Add SendAlertToChannels() that snapshots only the requested channel
configs and uses a distinct "_escalation" queue type so the dequeue
handler skips cooldown writes — preventing interference with the alert
manager's own re-notify cadence.
2026-02-26 20:49:10 +00:00
rcourtman
77bd2e70d9 fix(notifications): add service-specific resolved webhook templates (#1259)
Backport from v6 (88d5865a8). Recovery webhook notifications were using
the firing PayloadTemplate which services like Telegram, Teams, Discord
etc. silently rejected as malformed. Now uses a three-tier template
pipeline matching the firing path:
- Tier 1: Custom user template (if configured)
- Tier 2: Service-specific ResolvedPayloadTemplate (Discord green embed,
  Telegram chat_id+text, Slack header blocks, Teams MessageCard/Adaptive,
  PagerDuty event_action:"resolve", Pushover, Gotify, Mattermost)
- Tier 3: Generic JSON fallback (backward compatible)

Also adds Event, ResolvedAt, ResolvedAtISO fields to WebhookPayloadData.
2026-02-24 23:28:33 +00:00
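The three-tier selection described in 77bd2e70d9 can be sketched as a simple precedence check. The function name, the tier numbers it returns, and the template strings below are illustrative assumptions, not the project's actual templates:

```go
package main

import "fmt"

// genericResolved stands in for the tier-3 generic JSON fallback.
const genericResolved = `{"event":"resolved","alertId":"{{.ID}}"}`

// serviceResolved holds hypothetical tier-2 templates keyed by service.
var serviceResolved = map[string]string{
	"discord":   `{"embeds":[{"title":"Resolved: {{.Message}}","color":3066993}]}`,
	"pagerduty": `{"event_action":"resolve","dedup_key":"{{.ID}}"}`,
}

// pickResolvedTemplate returns the template to render and the tier it
// came from: 1 = custom, 2 = service-specific, 3 = generic fallback.
func pickResolvedTemplate(custom, service string) (string, int) {
	if custom != "" {
		return custom, 1
	}
	if t, ok := serviceResolved[service]; ok {
		return t, 2
	}
	return genericResolved, 3
}

func main() {
	for _, svc := range []string{"discord", "unknown"} {
		_, tier := pickResolvedTemplate("", svc)
		fmt.Println(svc, "-> tier", tier)
	}
}
```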
rcourtman
82ccb662f9 fix(notifications): use service-specific templates for resolved webhooks (#1068)
Recovery notifications for Discord, Slack, Teams, PagerDuty, and other
service webhooks were sending a generic JSON payload that lacked the
required format (e.g. Discord needs `embeds`, Slack needs `blocks`),
causing resolved notifications to silently fail.

- Add `prepareResolvedWebhookData` to build template data with Level="resolved"
- Route resolved webhooks through service-specific templates with full
  URL rendering, Telegram ChatID extraction, and PagerDuty routing_key
- Custom user templates take precedence over built-in service templates
- Return errors on service template failures instead of falling back to
  generic payloads that endpoints would reject
- Fix PagerDuty template to send event_action="resolve" for resolved alerts
2026-02-24 10:49:52 +00:00
rcourtman
8a48acef1d fix: hotfix 5.1.5 — node duplication, alert scrambling, ntfy resolved formatting
- fix(models): filter nodes by instance in UpdateNodesForInstance to prevent
  PVE node duplication across poll cycles (#1214, #1192, #1217)
- fix(alerts): sort GetActiveAlerts output for stable ordering, preventing
  hostname scrambling in frontend (#1218)
- fix(notifications): add ntfy-specific resolved webhook formatting with
  plain-text body and proper headers (#1213)
- fix(frontend): respect "hide Docker update actions" setting in
  DockerFilter Update All button (#1219)
- fix(frontend): add missing v prefix to GitHub release tag URLs (#1195)
- fix(monitoring): reduce disk detection warning from Warn to Debug to
  eliminate log spam for pass-through disks (#1216)
- chore: bump VERSION to 5.1.5
2026-02-08 11:48:22 +00:00
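The stable-ordering fix for GetActiveAlerts in 8a48acef1d matters because Go map iteration order is randomized. A minimal sketch, assuming alerts are held in a map and sorted by ID; the `alert` type and the sort key are assumptions:

```go
package main

import (
	"fmt"
	"sort"
)

// alert is a hypothetical, trimmed-down alert record.
type alert struct {
	ID   string
	Node string
}

// activeAlerts copies the map values into a slice and sorts by ID so
// repeated calls return the same order regardless of map iteration.
func activeAlerts(m map[string]alert) []alert {
	out := make([]alert, 0, len(m))
	for _, a := range m {
		out = append(out, a)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	m := map[string]alert{
		"c": {ID: "c", Node: "pve3"},
		"a": {ID: "a", Node: "pve1"},
		"b": {ID: "b", Node: "pve2"},
	}
	for _, a := range activeAlerts(m) {
		fmt.Println(a.ID, a.Node)
	}
}
```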
rcourtman
b3fa409b74 Allow SMTP auth over unencrypted connections, fix rate limit persistence, sanitize diagnostics export
- Replace Go stdlib smtp.PlainAuth (which refuses credentials without TLS)
  with a custom plainAuth that respects the user's explicit transport choice
- Remove TLS guard from LoginAuth for the same reason
- Add RateLimit field to EmailConfig so the user's configured value is
  persisted instead of being silently overwritten with 60
- Implement actual sanitization in the "Export for GitHub" diagnostics
  button (was previously ignored — both exports produced identical data)

Related to #1189
2026-02-04 15:42:47 +00:00
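The custom plainAuth mentioned in b3fa409b74 could look roughly like the following. This is a sketch against the standard `smtp.Auth` interface, not the project's code; the field names are assumptions. Unlike `smtp.PlainAuth`, it performs no TLS check, so credentials travel in cleartext on an unencrypted connection:

```go
package main

import (
	"errors"
	"fmt"
	"net/smtp"
)

// plainAuth implements smtp.Auth for the PLAIN mechanism without the
// TLS requirement enforced by the stdlib's smtp.PlainAuth.
type plainAuth struct {
	identity, username, password string
}

// Start sends the credentials immediately, ignoring server.TLS.
func (a plainAuth) Start(_ *smtp.ServerInfo) (string, []byte, error) {
	resp := []byte(a.identity + "\x00" + a.username + "\x00" + a.password)
	return "PLAIN", resp, nil
}

// Next rejects any further challenge; PLAIN is a single-step exchange.
func (a plainAuth) Next(_ []byte, more bool) ([]byte, error) {
	if more {
		return nil, errors.New("unexpected server challenge")
	}
	return nil, nil
}

// Compile-time check that plainAuth satisfies smtp.Auth.
var _ smtp.Auth = plainAuth{}

func main() {
	a := plainAuth{username: "user", password: "pass"}
	proto, resp, _ := a.Start(&smtp.ServerInfo{Name: "mail.example", TLS: false})
	fmt.Printf("%s %q\n", proto, resp)
}
```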
rcourtman
05266d9062 Show node display name in alerts instead of raw Proxmox node name
Alerts previously showed the raw Proxmox node name (e.g., "on pve") even
when users configured a display name (e.g., "SPACEX") via Settings or the
host agent --hostname flag. This affected the alert UI, email notifications,
and webhook payloads.

Add NodeDisplayName field to the alert chain: cache display names in the
alert Manager (populated by CheckNode/CheckHost on every poll), resolve
them at alert creation via preserveAlertState, refresh on metric updates,
and enrich at read time in GetActiveAlerts. Update models.Alert, the
syncAlertsToState conversion, email templates, Apprise body text, webhook
payloads, and all frontend rendering paths.

Related to #1188
2026-02-04 14:26:44 +00:00
rcourtman
7c1ebbecd5 fix(security): enhance webhook validation, enforce API scopes, and improve test coverage 2026-02-03 22:41:44 +00:00
rcourtman
81f146dcf0 Security fixes: Prevent Apprise RCE and Webhook DNS Rebinding 2026-02-03 22:00:02 +00:00
rcourtman
b2639ed5a5 Fix security vulnerabilities and critical bugs
- Fix WebSocket CORS bypass by strictly verifying origin
- Fix OIDC refresh token persistence by encrypting at rest
- Fix grouped webhook data mutation by cloning alerts
- Fix host agent uninstall authorization and config fetch logic
- Fix notification queue recovery for stuck sending items
- Fix ignored update history limit parameter
- Fix ineffective break statement in WebSocket write pump
2026-02-03 17:16:27 +00:00
rcourtman
4f40c3d751 fix: resolve critical stability and auth issues
- Fix data race in webhook notifications by removing shared state
- Fix duplicate monitors on config reload by stopping old instances
- Prevent metrics ID deletion on transient startup errors
- Support Bearer auth header for config export/import endpoints
2026-02-03 16:46:27 +00:00
rcourtman
aeca5e39fa Fix multi-tenant persistence and backend stability
- Initialize Alert and Notification managers with tenant-specific data directories

- Add panic recovery to WebSocket safeSend for stability

- Record host metrics to history for sparkline support
2026-02-03 16:24:42 +00:00
rcourtman
d06ed2edb3 refactor: Add testability improvements to core packages
hostagent/commands.go:
- Extract execCommandContext as mockable variable

hostagent/proxmox_setup.go:
- Convert stateFilePath constants to variables (testable)
- Extract runCommand and lookPath as mockable functions
- Add duplicate comment (minor cleanup needed)

notifications/notifications.go:
- Add GetQueueStats() method for interface compliance
- Used by NotificationMonitor interface

updates/manager.go:
- Add AddSSEClient, RemoveSSEClient, GetSSECachedStatus methods
- Enables interface-based SSE client management

pkg/audit/export.go:
- Minor testability improvements

go.mod/go.sum:
- Add stretchr/objx v0.5.2 (test mocking dependency)
2026-01-19 19:25:38 +00:00
rcourtman
17dec929a0 feat: Add mention support for webhook alerts. Related to #1118
Adds a Mention field to webhook configurations that allows users to tag
individuals or groups when alerts are sent. This works with:

- Discord: @everyone, <@USER_ID>, <@&ROLE_ID>
- Microsoft Teams: @General, user email
- Mattermost: @channel, @all, @username

The mention is included in the webhook payload via the {{.Mention}} template
variable. Built-in templates for Discord, Slack, and Teams now conditionally
include mentions when configured.

Backend changes:
- Add Mention field to WebhookConfig struct
- Add Mention field to WebhookPayloadData for template access
- Pass mention through sendGroupedWebhook

Frontend changes:
- Add mention field to Webhook interface
- Add Mention input to webhook configuration form
- Show service-specific help text for mention formats
2026-01-18 15:16:37 +00:00
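The conditional use of the {{.Mention}} variable described in 17dec929a0 can be illustrated with Go's text/template. The `payload` type and the template text are assumptions, and the output here is plain text rather than a real service payload:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// payload is a hypothetical subset of the webhook template data.
type payload struct {
	Mention string
	Message string
}

// contentTmpl prefixes the message with the mention only when one is set.
const contentTmpl = `{{if .Mention}}{{.Mention}} {{end}}{{.Message}}`

// render executes the template against the given payload.
func render(p payload) (string, error) {
	t, err := template.New("content").Parse(contentTmpl)
	if err != nil {
		return "", err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, p); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	with, _ := render(payload{Mention: "@everyone", Message: "CPU high"})
	without, _ := render(payload{Message: "CPU high"})
	fmt.Println(with)    // @everyone CPU high
	fmt.Println(without) // CPU high
}
```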
rcourtman
e6d07c3294 style: remove emojis from log messages
Replaced emoji icons with plain text for cleaner logs and cross-platform compatibility.
2025-12-13 21:29:11 +00:00
rcourtman
611740087c style: fix additional staticcheck warnings
- Lowercase error messages (ST1005)
- Use context.Background() instead of nil (SA1012)
- Fix rand.Intn(1) which always returns 0 (SA4030)
- Remove unnecessary nil check before len() (S1009)
2025-11-27 09:21:11 +00:00
rcourtman
ad998a1e2f style: fix staticcheck style warnings
- Merge variable declaration with assignment (S1021)
- Use unconditional strings.TrimPrefix (S1017)
- Remove unnecessary nil checks around range (S1031)
- Remove unnecessary fmt.Sprintf (S1039)
- Use copy() instead of manual loop (S1001)
- Use time.Until instead of t.Sub(time.Now()) (S1024)
- Use buf.String() instead of string(buf.Bytes()) (S1030)
2025-11-27 09:19:33 +00:00
rcourtman
b370799988 chore: remove more dead code
Remove 330 lines of unreachable code:
- internal/monitoring/temperature_service.go: unused temperature service abstraction
- internal/monitoring/temperature.go: unused NewTemperatureCollector wrapper
- internal/mock/generator.go: unused GenerateAlerts function
- internal/mock/integration.go: unused ToggleMockMode wrapper
- internal/notifications/notifications.go: unused sendEmailWithContent,
  generatePayloadFromTemplate, isPrivateRange172, groupAlerts
- internal/notifications/email_providers.go: unused GetProviderDefaults
2025-11-27 00:10:55 +00:00
rcourtman
255357d2fe Add recovery notifications and grouping controls 2025-11-21 22:07:00 +00:00
rcourtman
11d7f4fd4e Add Apprise test support for notifications
Related to #584
2025-11-20 17:54:20 +00:00
rcourtman
8d320ef56b Fix notification manager deadlock in Stop()
Critical deadlock fix:
- Stop() was holding n.mu lock while calling queue.Stop()
- queue.Stop() waits for worker goroutines to finish
- Worker goroutines call ProcessQueuedNotification() which needs n.mu lock
- This deadlocked: Stop() held n.mu while waiting for workers that needed the same lock

Fix:
- Unlock n.mu before calling queue.Stop()
- Relock after queue shutdown completes
- Workers can now finish and acquire lock as needed

This resolves 30-second test timeouts in notifications package.

Tests now complete in <1s instead of timing out at 30s.
2025-11-11 23:58:18 +00:00
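The unlock-before-wait pattern from 8d320ef56b can be sketched as follows. The `manager` and `queue` types are simplified stand-ins, assuming a single worker that needs the manager lock to finish its last item:

```go
package main

import (
	"fmt"
	"sync"
)

// queue owns a worker goroutine that is signalled via done.
type queue struct {
	wg   sync.WaitGroup
	done chan struct{}
}

// Stop signals the worker and blocks until it has finished.
func (q *queue) Stop() {
	close(q.done)
	q.wg.Wait()
}

type manager struct {
	mu        sync.Mutex
	queue     *queue
	processed int
}

func newManager() *manager {
	m := &manager{queue: &queue{done: make(chan struct{})}}
	m.queue.wg.Add(1)
	go func() {
		defer m.queue.wg.Done()
		<-m.queue.done // wait for the shutdown signal
		m.process()    // the final item still needs m.mu
	}()
	return m
}

// process takes the manager lock, like ProcessQueuedNotification would.
func (m *manager) process() {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.processed++
}

// Stop releases m.mu before waiting on the queue so the worker can
// acquire it, then relocks to finish cleanup.
func (m *manager) Stop() {
	m.mu.Lock()
	q := m.queue
	m.mu.Unlock() // holding the lock across q.Stop() would deadlock

	q.Stop()

	m.mu.Lock()
	defer m.mu.Unlock()
	m.queue = nil
}

func main() {
	m := newManager()
	m.Stop()
	fmt.Println("processed:", m.processed) // processed: 1
}
```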
rcourtman
d7766af799 Fix backend test failures blocking release workflow
Three categories of fixes:

1. Goroutine leak causing 10-minute timeout:
   - Add defer mon.notificationMgr.Stop() in monitor_memory_test.go
   - Background goroutines from notification manager weren't being stopped

2. Database NULL column scanning errors:
   - Change LastError from string to *string in queue.go
   - Change PayloadBytes from int to *int in queue.go
   - SQL NULL values require pointer types in Go

3. SSRF protection blocking test servers:
   - Check allowlist for localhost before rejecting in notifications.go
   - Set PULSE_DATA_DIR to temp directory in tests
   - Add defer nm.Stop() calls to prevent goroutine leaks

Fixes for preflight test failures in workflow run 19280879903.
2025-11-11 23:27:03 +00:00
rcourtman
1b221cca71 feat: Add configurable allowlist for webhook private IP targets (addresses #673)
Allow homelab users to send webhooks to internal services while maintaining security defaults.

Changes:
- Add webhookAllowedPrivateCIDRs field to SystemSettings (persistent config)
- Implement CIDR parsing and validation in NotificationManager
- Convert ValidateWebhookURL to instance method to access allowlist
- Add UI controls in System Settings for configuring trusted CIDR ranges
- Maintain strict security by default (block all private IPs)
- Keep localhost, link-local, and cloud metadata services blocked regardless of allowlist
- Re-validate on both config save and webhook delivery (DNS rebinding protection)
- Add comprehensive tests for CIDR parsing and IP matching

Backend:
- UpdateAllowedPrivateCIDRs() parses comma-separated CIDRs with validation
- Support for bare IPs (auto-converts to /32 or /128)
- Thread-safe allowlist updates with RWMutex
- Logging when allowlist is updated or used
- Validation errors prevent invalid CIDRs from being saved

Frontend:
- New "Webhook Security" section in System Settings
- Input field with examples and helpful placeholder text
- Real-time unsaved changes tracking
- Loads and saves allowlist via system settings API

Security:
- Default behavior unchanged (all private IPs blocked)
- Explicit opt-in required via configuration
- Localhost (127/8) always blocked
- Link-local (169.254/16) always blocked
- Cloud metadata services always blocked
- DNS resolution checked at both save and send time

Testing:
- Tests for CIDR parsing (valid/invalid inputs)
- Tests for IP allowlist matching
- Tests for bare IP address handling
- Tests for security boundaries (localhost, link-local remain blocked)

Related to #673

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 08:31:12 +00:00
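The allowlist parsing described in 1b221cca71 (comma-separated CIDRs, bare IPs promoted to /32 or /128) can be sketched with the standard net package. The function names below are hypothetical and the always-blocked ranges are left out:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseAllowlist parses comma-separated CIDRs, converting bare IPs to
// single-host networks (/32 for IPv4, /128 for IPv6).
func parseAllowlist(s string) ([]*net.IPNet, error) {
	var nets []*net.IPNet
	for _, entry := range strings.Split(s, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		if !strings.Contains(entry, "/") {
			ip := net.ParseIP(entry)
			if ip == nil {
				return nil, fmt.Errorf("invalid IP %q", entry)
			}
			if ip.To4() != nil {
				entry += "/32"
			} else {
				entry += "/128"
			}
		}
		_, n, err := net.ParseCIDR(entry)
		if err != nil {
			return nil, fmt.Errorf("invalid CIDR %q: %w", entry, err)
		}
		nets = append(nets, n)
	}
	return nets, nil
}

// allowed reports whether addr parses as an IP inside any listed network.
func allowed(nets []*net.IPNet, addr string) bool {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false
	}
	for _, n := range nets {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	nets, err := parseAllowlist("192.168.1.0/24, 10.0.0.5")
	if err != nil {
		panic(err)
	}
	fmt.Println(nets[1])                       // 10.0.0.5/32
	fmt.Println(allowed(nets, "192.168.1.7")) // true
	fmt.Println(allowed(nets, "10.0.0.6"))    // false
}
```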
rcourtman
7ee11105f5 Implement queue cancellation and atomic DB operations (P1 fixes)
Queue cancellation mechanism:
- Add CancelByAlertIDs method to mark queued notifications as cancelled when alerts resolve
- Update CancelAlert to cancel queued notifications containing resolved alert IDs
- Skip cancelled notifications in queue processor
- Prevents resolved alerts from triggering notifications after they clear

Atomic DB operations:
- Add IncrementAttemptAndSetStatus to atomically update attempt counter and status
- Replace separate IncrementAttempt + UpdateStatus calls with single atomic operation
- Prevents orphaned queue entries when crashes occur between operations
- Eliminates race condition where rows get stuck in "pending" or "sending" status

These fixes ensure queued notifications are properly cancelled when alerts resolve
and prevent database inconsistencies during crash scenarios.
2025-11-07 08:33:09 +00:00
rcourtman
c6a69e525c Fix critical notification system bugs and security issues
Critical fixes (P0):
- Fix cooldown timing: Mark cooldown only after successful delivery, not before enqueue
- Add os.MkdirAll to queue initialization to prevent silent failures on fresh installs
- Add DNS re-validation at webhook send time to prevent DNS rebinding SSRF attacks
- Add SSRF validation for Apprise HTTP URLs
- Remove secret logging (bot tokens, routing keys) from debug logs
- Implement lastNotified cleanup to prevent unbounded memory growth
- Use shared HTTP client for webhooks to enable TLS connection reuse
- Add fallback to direct sending when queue enqueue fails
- Make queue worker concurrent (5 workers with semaphore) to prevent head-of-line blocking
- Fix webhook rate limiter race condition with separate mutex
- Fix email manager thread safety with mutex on rate limiter
- Fix grouping timer leak by adding stopCleanup signal
- Fix webhook 429 double sleep (use Retry-After OR backoff, not both)

Frontend improvements:
- Add queue/DLQ management API methods (getQueueStats, getDLQ, retryDLQItem, deleteDLQItem)
- Add getNotificationHealth and getWebhookHistory endpoints
- Add Apprise test support to NotificationTestRequest type

Related to notification system audit
2025-11-07 08:29:13 +00:00
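The 429 double-sleep fix in c6a69e525c amounts to choosing one delay source per retry. A minimal sketch, assuming a numeric Retry-After header (the HTTP-date form is not handled) and exponential backoff otherwise; `retryDelay` and `maxShift` are hypothetical names:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// maxShift caps the exponential backoff at base * 64.
const maxShift = 6

// retryDelay returns the server-provided Retry-After delay when it is a
// valid number of seconds, and exponential backoff otherwise. It never
// adds the two together.
func retryDelay(retryAfter string, attempt int, base time.Duration) time.Duration {
	if secs, err := strconv.Atoi(strings.TrimSpace(retryAfter)); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second
	}
	if attempt < 0 {
		attempt = 0
	}
	if attempt > maxShift {
		attempt = maxShift
	}
	return base * time.Duration(1<<uint(attempt))
}

func main() {
	fmt.Println(retryDelay("5", 3, time.Second)) // 5s
	fmt.Println(retryDelay("", 2, time.Second))  // 4s
}
```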
rcourtman
febce91145 Remove internal development documentation files
Remove 4 LLM-generated internal development docs that don't belong in the repository:
- MIGRATION_SCAFFOLDING.md
- NOTIFICATION_AUDIT.md
- NOTIFICATION_QUICK_REFERENCE.md
- NOTIFICATION_SYSTEM_MAP.md

These were internal development notes, not user-facing documentation.
2025-11-07 08:23:19 +00:00
rcourtman
6a48c759e8 Fix critical notification system bugs and security issues
This commit addresses multiple critical issues identified in the notification
system audit conducted with Codex:

**Critical Fixes:**

1. **Queue Retry Logic (Critical #1)**
   - Fixed broken retry/DLQ system where send functions never returned errors
   - Made sendGroupedEmail(), sendGroupedWebhook(), sendGroupedApprise() return errors
   - Made sendWebhookRequest() return errors
   - ProcessQueuedNotification() now properly propagates errors to queue
   - Retry logic and DLQ now function correctly

2. **Attempt Counter Bug (Critical #2)**
   - Fixed double-increment bug in queue processing
   - Separated UpdateStatus() from attempt tracking
   - Added IncrementAttempt() method
   - Notifications now get correct number of retry attempts

3. **Secret Exposure (Critical #3 & #4)**
   - Masked webhook headers and customFields in GET /api/notifications/webhooks
   - Added redactSecretsFromURL() to sanitize webhook URLs in history
   - Truncated/redacted response bodies in webhook history
   - Protected against credential harvesting via API

4. **Email Rate Limiting (Critical #5)**
   - Added emailManager field to NotificationManager
   - Shared EnhancedEmailManager instance across sends
   - Rate limiter now accumulates across multiple emails
   - SMTP rate limits are now enforced correctly

5. **SSRF Protection (High #6)**
   - Added DNS resolution of webhook URLs
   - Added isPrivateIP() check using CIDR ranges
   - Blocks all private IP ranges (10/8, 172.16/12, 192.168/16, 127/8, 169.254/16)
   - Blocks IPv6 private ranges (::1, fe80::/10, fc00::/7)
   - Prevents DNS rebinding attacks
   - Returns error instead of warning for private IPs

**New Features:**

6. **Health Endpoint (High #8)**
   - Added GET /api/notifications/health
   - Returns queue stats (pending, sending, sent, failed, dlq)
   - Shows email/webhook configuration status
   - Provides overall health indicator

**Related to notification system audit**

Files changed:
- internal/notifications/notifications.go: Error returns, rate limiting, SSRF hardening
- internal/notifications/queue.go: Attempt tracking fix
- internal/api/notifications.go: Secret masking, health endpoint
2025-11-06 23:26:03 +00:00
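The isPrivateIP() check from item 5 of 6a48c759e8 can be sketched with the ranges the commit lists. The signature and the `blocked` wrapper are assumptions; the wrapper fails closed on unparsable input:

```go
package main

import (
	"fmt"
	"net"
)

// blockedNets holds the ranges named in the commit message.
var blockedNets = mustParseCIDRs(
	"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
	"127.0.0.0/8", "169.254.0.0/16",
	"::1/128", "fe80::/10", "fc00::/7",
)

// mustParseCIDRs panics on an invalid literal; inputs here are constants.
func mustParseCIDRs(cidrs ...string) []*net.IPNet {
	nets := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		nets = append(nets, n)
	}
	return nets
}

// isPrivateIP reports whether ip falls inside any blocked range.
func isPrivateIP(ip net.IP) bool {
	for _, n := range blockedNets {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

// blocked parses addr and treats anything unparsable as blocked.
func blocked(addr string) bool {
	ip := net.ParseIP(addr)
	return ip == nil || isPrivateIP(ip)
}

func main() {
	for _, a := range []string{"10.1.2.3", "8.8.8.8", "::1", "172.32.0.1"} {
		fmt.Println(a, blocked(a))
	}
}
```

In a webhook path this check would apply to every address a hostname resolves to, at send time as well as at save time, as the later commits describe.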
rcourtman
20099549c6 Add comprehensive release validation to prevent missing artifacts
Adds automated validation script to prevent the pattern of patch
releases caused by missing files/artifacts.

scripts/validate-release.sh validates all 40+ artifacts including:
- Docker image scripts (8 install/uninstall scripts)
- Docker image binaries (17 across all platforms)
- Release tarballs (5 including universal and macOS)
- Standalone binaries (12+)
- Checksums for all distributable assets
- Version embedding in every binary type
- Tarball contents (binaries + scripts + VERSION)
- Binary architectures and file types

The script catches 100% of issues from the last 3 patch releases
(missing scripts, missing install.sh, missing binaries, broken
version embedding).

Updated RELEASE_CHECKLIST.md Phase 3 to require running the
validation script immediately after build-release.sh and before
proceeding to Docker build/publish phases.

Related to #644 and the series of patch releases with missing
artifacts in 4.26.x.
2025-11-06 16:33:49 +00:00
rcourtman
ddc787418b Round float values in webhook payloads to 1 decimal place
Webhook alert payloads now round Value and Threshold fields to 1 decimal
place before template rendering. This eliminates excessive precision in
webhook messages (e.g., 62.27451680630036 becomes 62.3).

The fix is applied in prepareWebhookData() so all webhook templates
benefit automatically, including Google Space webhooks, generic JSON
webhooks, and custom templates.

Related to #619
2025-11-05 19:19:10 +00:00
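The rounding described in ddc787418b can be reproduced with `math.Round`. The helper name is an assumption; the sample value is the one from the commit message:

```go
package main

import (
	"fmt"
	"math"
)

// round1 rounds v to one decimal place.
func round1(v float64) float64 {
	return math.Round(v*10) / 10
}

func main() {
	fmt.Println(round1(62.27451680630036)) // 62.3
}
```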
rcourtman
02864f54dd Add test notification functionality for Apprise
- Add support for testing Apprise notifications via /api/notifications/test endpoint
- Users can now test their Apprise configuration (both CLI and HTTP modes) using method="apprise"
- Added comprehensive unit tests for both CLI and HTTP modes
- Tests verify correct behavior when Apprise is enabled/disabled
- Tests validate that notifications are properly sent through Apprise channels

Related to #584
2025-11-05 18:54:18 +00:00
rcourtman
77282bd3a6 Implement Pulse tag overrides and alert clear persistence 2025-10-25 14:28:32 +00:00
rcourtman
be26f957c0 Add snapshot size alert thresholds (#585) 2025-10-22 13:30:40 +00:00
rcourtman
524f42cc28 security: complete Phase 1 sensor proxy hardening
Implements comprehensive security hardening for pulse-sensor-proxy:
- Privilege drop from root to unprivileged user (UID 995)
- Hash-chained tamper-evident audit logging with remote forwarding
- Per-UID rate limiting (0.2 QPS, burst 2) with concurrency caps
- Enhanced command validation with 10+ attack pattern tests
- Fuzz testing (7M+ executions, 0 crashes)
- SSH hardening, AppArmor/seccomp profiles, operational runbooks

All 27 Phase 1 tasks complete. Ready for production deployment.
2025-10-20 15:13:37 +00:00
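The hash-chained, tamper-evident audit log listed in 524f42cc28 follows a general pattern: each entry's hash covers the previous entry's hash, so editing any record invalidates everything after it. A minimal sketch using SHA-256; the `entry` layout and function names are assumptions, not the sensor proxy's actual format:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// entry is a hypothetical audit record linked to its predecessor.
type entry struct {
	Msg      string
	PrevHash string
	Hash     string
}

// hashEntry derives an entry hash from the previous hash and the message.
func hashEntry(prev, msg string) string {
	sum := sha256.Sum256([]byte(prev + "\n" + msg))
	return hex.EncodeToString(sum[:])
}

// appendEntry adds msg to the log, chaining it to the last entry.
func appendEntry(log []entry, msg string) []entry {
	prev := ""
	if len(log) > 0 {
		prev = log[len(log)-1].Hash
	}
	return append(log, entry{Msg: msg, PrevHash: prev, Hash: hashEntry(prev, msg)})
}

// verify recomputes the chain and reports whether every link matches.
func verify(log []entry) bool {
	prev := ""
	for _, e := range log {
		if e.PrevHash != prev || e.Hash != hashEntry(prev, e.Msg) {
			return false
		}
		prev = e.Hash
	}
	return true
}

func main() {
	var log []entry
	for _, m := range []string{"start", "get_temperature uid=1000", "stop"} {
		log = appendEntry(log, m)
	}
	fmt.Println(verify(log)) // true
	log[1].Msg = "tampered"
	fmt.Println(verify(log)) // false
}
```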
Pulse Automation Bot
80b9d0602a Add Apprise notification integration (#570) 2025-10-18 16:39:39 +00:00
rcourtman
91fecacfef feat: add docker agent command handling 2025-10-15 19:27:19 +00:00
rcourtman
f46ff1792b Fix settings security tab navigation 2025-10-11 23:29:47 +00:00