Commit graph

10 commits

Author SHA1 Message Date
rcourtman
77bd2e70d9 fix(notifications): add service-specific resolved webhook templates (#1259)
Backport from v6 (88d5865a8). Recovery webhook notifications were using
the firing PayloadTemplate which services like Telegram, Teams, Discord
etc. silently rejected as malformed. Now uses a three-tier template
pipeline matching the firing path:
- Tier 1: Custom user template (if configured)
- Tier 2: Service-specific ResolvedPayloadTemplate (Discord green embed,
  Telegram chat_id+text, Slack header blocks, Teams MessageCard/Adaptive,
  PagerDuty event_action:"resolve", Pushover, Gotify, Mattermost)
- Tier 3: Generic JSON fallback (backward compatible)

Also adds Event, ResolvedAt, ResolvedAtISO fields to WebhookPayloadData.
2026-02-24 23:28:33 +00:00
rcourtman
05266d9062 Show node display name in alerts instead of raw Proxmox node name
Alerts previously showed the raw Proxmox node name (e.g., "on pve") even
when users configured a display name (e.g., "SPACEX") via Settings or the
host agent --hostname flag. This affected the alert UI, email notifications,
and webhook payloads.

Add NodeDisplayName field to the alert chain: cache display names in the
alert Manager (populated by CheckNode/CheckHost on every poll), resolve
them at alert creation via preserveAlertState, refresh on metric updates,
and enrich at read time in GetActiveAlerts. Update models.Alert, the
syncAlertsToState conversion, email templates, Apprise body text, webhook
payloads, and all frontend rendering paths.

Related to #1188
2026-02-04 14:26:44 +00:00
rcourtman
bd030c7c87 security: fix webhook SSRF, rate limit spoofing, metrics retention, and url poisoning
- Fix SSRF and rate limit bypass in SendEnhancedWebhook by validating the rendered URL.
- Fix rate limit spoofing in updates API by using secure IP extraction (trusted proxies).
- Fix memory leak in metrics history by correctly clearing fully stale data series.
- Fix public URL poisoning by preventing overwrites when explicitly configured.
2026-02-03 16:58:13 +00:00
rcourtman
17dec929a0 feat: Add mention support for webhook alerts. Related to #1118
Adds a Mention field to webhook configurations that allows users to tag
individuals or groups when alerts are sent. This works with:

- Discord: @everyone, <@USER_ID>, <@&ROLE_ID>
- Microsoft Teams: @General, user email
- Mattermost: @channel, @all, @username

The mention is included in the webhook payload via the {{.Mention}} template
variable. Built-in templates for Discord, Slack, and Teams now conditionally
include mentions when configured.

Backend changes:
- Add Mention field to WebhookConfig struct
- Add Mention field to WebhookPayloadData for template access
- Pass mention through sendGroupedWebhook

Frontend changes:
- Add mention field to Webhook interface
- Add Mention input to webhook configuration form
- Show service-specific help text for mention formats
2026-01-18 15:16:37 +00:00
rcourtman
a82a345cd6 Improve table column widths and sparkline visibility 2025-11-09 23:36:52 +00:00
rcourtman
1b221cca71 feat: Add configurable allowlist for webhook private IP targets (addresses #673)
Allow homelab users to send webhooks to internal services while maintaining security defaults.

Changes:
- Add webhookAllowedPrivateCIDRs field to SystemSettings (persistent config)
- Implement CIDR parsing and validation in NotificationManager
- Convert ValidateWebhookURL to instance method to access allowlist
- Add UI controls in System Settings for configuring trusted CIDR ranges
- Maintain strict security by default (block all private IPs)
- Keep localhost, link-local, and cloud metadata services blocked regardless of allowlist
- Re-validate on both config save and webhook delivery (DNS rebinding protection)
- Add comprehensive tests for CIDR parsing and IP matching

Backend:
- UpdateAllowedPrivateCIDRs() parses comma-separated CIDRs with validation
- Support for bare IPs (auto-converts to /32 or /128)
- Thread-safe allowlist updates with RWMutex
- Logging when allowlist is updated or used
- Validation errors prevent invalid CIDRs from being saved

Frontend:
- New "Webhook Security" section in System Settings
- Input field with examples and helpful placeholder text
- Real-time unsaved changes tracking
- Loads and saves allowlist via system settings API

Security:
- Default behavior unchanged (all private IPs blocked)
- Explicit opt-in required via configuration
- Localhost (127/8) always blocked
- Link-local (169.254/16) always blocked
- Cloud metadata services always blocked
- DNS resolution checked at both save and send time

Testing:
- Tests for CIDR parsing (valid/invalid inputs)
- Tests for IP allowlist matching
- Tests for bare IP address handling
- Tests for security boundaries (localhost, link-local remain blocked)

Related to #673

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 08:31:12 +00:00
rcourtman
b70dc3d00d Document layered retry semantics (P2 documentation)
Add documentation to explain how transport-level and queue-level retries interact:
- Email: MaxRetries (transport) * MaxAttempts (queue) = total SMTP attempts
- Webhooks: RetryCount (transport) * MaxAttempts (queue) = total HTTP attempts
- Example: 3 * 3 = 9 total delivery attempts for a single notification

This clarifies the multiplicative retry behavior and helps operators understand
the actual retry counts when using the persistent queue.
2025-11-07 08:35:00 +00:00
rcourtman
c6a69e525c Fix critical notification system bugs and security issues
Critical fixes (P0):
- Fix cooldown timing: Mark cooldown only after successful delivery, not before enqueue
- Add os.MkdirAll to queue initialization to prevent silent failures on fresh installs
- Add DNS re-validation at webhook send time to prevent DNS rebinding SSRF attacks
- Add SSRF validation for Apprise HTTP URLs
- Remove secret logging (bot tokens, routing keys) from debug logs
- Implement lastNotified cleanup to prevent unbounded memory growth
- Use shared HTTP client for webhooks to enable TLS connection reuse
- Add fallback to direct sending when queue enqueue fails
- Make queue worker concurrent (5 workers with semaphore) to prevent head-of-line blocking
- Fix webhook rate limiter race condition with separate mutex
- Fix email manager thread safety with mutex on rate limiter
- Fix grouping timer leak by adding stopCleanup signal
- Fix webhook 429 double sleep (use Retry-After OR backoff, not both)

Frontend improvements:
- Add queue/DLQ management API methods (getQueueStats, getDLQ, retryDLQItem, deleteDLQItem)
- Add getNotificationHealth and getWebhookHistory endpoints
- Add Apprise test support to NotificationTestRequest type

Related to notification system audit
2025-11-07 08:29:13 +00:00
rcourtman
91fecacfef feat: add docker agent command handling 2025-10-15 19:27:19 +00:00
rcourtman
f46ff1792b Fix settings security tab navigation 2025-10-11 23:29:47 +00:00