Commit graph

113 commits

Author SHA1 Message Date
rcourtman
edc5a99d9b Block unspecified webhook addresses 2026-04-05 23:35:54 +01:00
rcourtman
dcc4747215 Harden alert history and tenant storage paths 2026-03-31 09:23:03 +01:00
rcourtman
a7326d7047 Tighten redirect and download path handling 2026-03-31 09:17:52 +01:00
rcourtman
66b448d63b Harden outbound SSO and webhook URL handling 2026-03-31 09:06:06 +01:00
rcourtman
916636d481 Fix SMTP transport test fixtures for secure envelopes 2026-03-28 15:17:32 +00:00
rcourtman
29c4d6b5a7 Harden SMTP message and envelope handling 2026-03-28 11:07:21 +00:00
rcourtman
b5757c38fd Harden security handlers and apprise execution 2026-03-28 11:03:16 +00:00
rcourtman
0a7b93a842 Include mentions in resolved webhook templates (#1118) 2026-03-26 00:32:07 +00:00
rcourtman
6c03706b6f Harden JSON webhook templates for live alerts (#1367) 2026-03-25 23:25:14 +00:00
rcourtman
e46239d8ac Preserve queued recovery notifications on alert cancellation (#1350) 2026-03-25 13:18:33 +00:00
rcourtman
9b531c547d Fix recovery notifications silently disabled by config PUT (#1332)
Two fixes for missing recovery/resolved notifications:

1. API config PUT handler now preserves notifyOnResolve when the client
   omits it from the request body. Go decodes a missing bool as false,
   which silently disabled recovery notifications on older clients.

2. CancelAlert now always cleans up the cooldown record even when the
   alert has already left the pending buffer, preventing stale cooldown
   entries from suppressing future alert cycles.
2026-03-09 11:28:28 +00:00
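The first fix in 9b531c547d hinges on Go decoding an absent JSON bool as false. One common way to tell "omitted" from "explicitly false" is a pointer field, sketched below. The type and function names (`alertConfig`, `configUpdate`, `applyUpdate`) are hypothetical and are not taken from the Pulse source; only the `notifyOnResolve` field name comes from the commit message.

```go
package main

import "encoding/json"

// alertConfig is a hypothetical stand-in for the stored configuration.
type alertConfig struct {
	NotifyOnResolve bool
}

// configUpdate mirrors a PUT body. The pointer distinguishes a field the
// client omitted (nil) from an explicit false.
type configUpdate struct {
	NotifyOnResolve *bool `json:"notifyOnResolve"`
}

// applyUpdate decodes body and overwrites NotifyOnResolve only when the
// client actually sent the field; otherwise the stored value is preserved.
func applyUpdate(current alertConfig, body []byte) (alertConfig, error) {
	var upd configUpdate
	if err := json.Unmarshal(body, &upd); err != nil {
		return current, err
	}
	if upd.NotifyOnResolve != nil {
		current.NotifyOnResolve = *upd.NotifyOnResolve
	}
	return current, nil
}
```

With a stored value of true, a body of `{}` leaves the setting enabled, while `{"notifyOnResolve":false}` disables it.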
rcourtman
0dd3fc779b Fix alert disable notification suppression
2026-03-07 18:40:08 +00:00
rcourtman
464d3f8486 Fix stale queued notification delivery 2026-03-05 23:46:35 +00:00
rcourtman
a3fcaafbdb test(notifications): remove queue stats race with background processor 2026-03-03 21:37:27 +00:00
rcourtman
eb2397d99a fix(notifications): route escalation notifications to selected channels only (#1259)
Escalation was calling SendAlert() which always sends to all enabled
channels, ignoring the per-level channel selection (email/webhook/all).

Add SendAlertToChannels() that snapshots only the requested channel
configs and uses a distinct "_escalation" queue type so the dequeue
handler skips cooldown writes — preventing interference with the alert
manager's own re-notify cadence.
2026-02-26 20:49:10 +00:00
rcourtman
77bd2e70d9 fix(notifications): add service-specific resolved webhook templates (#1259)
Backport from v6 (88d5865a8). Recovery webhook notifications were using
the firing PayloadTemplate which services like Telegram, Teams, Discord
etc. silently rejected as malformed. Now uses a three-tier template
pipeline matching the firing path:
- Tier 1: Custom user template (if configured)
- Tier 2: Service-specific ResolvedPayloadTemplate (Discord green embed,
  Telegram chat_id+text, Slack header blocks, Teams MessageCard/Adaptive,
  PagerDuty event_action:"resolve", Pushover, Gotify, Mattermost)
- Tier 3: Generic JSON fallback (backward compatible)

Also adds Event, ResolvedAt, ResolvedAtISO fields to WebhookPayloadData.
2026-02-24 23:28:33 +00:00
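The three-tier selection described in 77bd2e70d9 can be sketched roughly as follows. This is an illustration only: `pickResolvedTemplate`, its parameters, and the placeholder generic payload are assumptions, not the actual Pulse implementation.

```go
package main

// genericResolvedTemplate is a placeholder for the generic JSON fallback.
// Its content is illustrative, not the real Pulse payload.
const genericResolvedTemplate = `{"event":"resolved"}`

// pickResolvedTemplate returns the template to render for a resolved alert
// together with the tier it came from:
//   1: custom user template, if configured
//   2: service-specific resolved template, if one exists for the service
//   3: generic JSON fallback
func pickResolvedTemplate(custom, service string, serviceTemplates map[string]string) (string, int) {
	if custom != "" {
		return custom, 1
	}
	if tmpl, ok := serviceTemplates[service]; ok && tmpl != "" {
		return tmpl, 2
	}
	return genericResolvedTemplate, 3
}
```

Checking the custom template first keeps user overrides authoritative, and the generic fallback keeps webhooks for unknown services working as before.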
rcourtman
82ccb662f9 fix(notifications): use service-specific templates for resolved webhooks (#1068)
Recovery notifications for Discord, Slack, Teams, PagerDuty, and other
service webhooks were sending a generic JSON payload that lacked the
required format (e.g. Discord needs `embeds`, Slack needs `blocks`),
causing resolved notifications to silently fail.

- Add `prepareResolvedWebhookData` to build template data with Level="resolved"
- Route resolved webhooks through service-specific templates with full
  URL rendering, Telegram ChatID extraction, and PagerDuty routing_key
- Custom user templates take precedence over built-in service templates
- Return errors on service template failures instead of falling back to
  generic payloads that endpoints would reject
- Fix PagerDuty template to send event_action="resolve" for resolved alerts
2026-02-24 10:49:52 +00:00
rcourtman
8a48acef1d fix: hotfix 5.1.5 — node duplication, alert scrambling, ntfy resolved formatting
- fix(models): filter nodes by instance in UpdateNodesForInstance to prevent
  PVE node duplication across poll cycles (#1214, #1192, #1217)
- fix(alerts): sort GetActiveAlerts output for stable ordering, preventing
  hostname scrambling in frontend (#1218)
- fix(notifications): add ntfy-specific resolved webhook formatting with
  plain-text body and proper headers (#1213)
- fix(frontend): respect "hide Docker update actions" setting in
  DockerFilter Update All button (#1219)
- fix(frontend): add missing v prefix to GitHub release tag URLs (#1195)
- fix(monitoring): reduce disk detection warning from Warn to Debug to
  eliminate log spam for pass-through disks (#1216)
- chore: bump VERSION to 5.1.5
2026-02-08 11:48:22 +00:00
rcourtman
b3fa409b74 Allow SMTP auth over unencrypted connections, fix rate limit persistence, sanitize diagnostics export
- Replace Go stdlib smtp.PlainAuth (which refuses credentials without TLS)
  with a custom plainAuth that respects the user's explicit transport choice
- Remove TLS guard from LoginAuth for the same reason
- Add RateLimit field to EmailConfig so the user's configured value is
  persisted instead of being silently overwritten with 60
- Implement actual sanitization in the "Export for GitHub" diagnostics
  button (was previously ignored — both exports produced identical data)

Related to #1189
2026-02-04 15:42:47 +00:00
rcourtman
05266d9062 Show node display name in alerts instead of raw Proxmox node name
Alerts previously showed the raw Proxmox node name (e.g., "on pve") even
when users configured a display name (e.g., "SPACEX") via Settings or the
host agent --hostname flag. This affected the alert UI, email notifications,
and webhook payloads.

Add NodeDisplayName field to the alert chain: cache display names in the
alert Manager (populated by CheckNode/CheckHost on every poll), resolve
them at alert creation via preserveAlertState, refresh on metric updates,
and enrich at read time in GetActiveAlerts. Update models.Alert, the
syncAlertsToState conversion, email templates, Apprise body text, webhook
payloads, and all frontend rendering paths.

Related to #1188
2026-02-04 14:26:44 +00:00
rcourtman
b9eee668e5 test: expand security regression coverage 2026-02-04 10:28:41 +00:00
rcourtman
7c1ebbecd5 fix(security): enhance webhook validation, enforce API scopes, and improve test coverage 2026-02-03 22:41:44 +00:00
rcourtman
a3436fbde5 fix(tests): disable email in concurrency test to prevent CI timeouts
The TestNotificationManagerEmailConfigConcurrency test was causing CI
failures by triggering 1000+ email send attempts to a non-existent SMTP
server, each with retries and delays. This test verifies concurrent config
updates don't cause races, not actual email delivery. Disabling email
eliminates the network operations that were causing 60+ second test runs
and occasional CI failures.
2026-02-03 22:39:59 +00:00
rcourtman
81f146dcf0 Security fixes: Prevent Apprise RCE and Webhook DNS Rebinding 2026-02-03 22:00:02 +00:00
rcourtman
beae4c860c fix: address 6 security and reliability issues
Security fixes:
- Auto-register now requires settings:write scope for API tokens
- X-Forwarded-For in auto-register only trusted from verified proxies
- Public URL capture requires authentication (no loopback bypass)
- Lockout reset now uses RequireAdmin for session users

Reliability fixes:
- Docker stop command expiration clears PendingUninstall flag
- Cancelled notifications get completed_at set and are cleaned up
2026-02-03 17:32:44 +00:00
rcourtman
b2639ed5a5 Fix security vulnerabilities and critical bugs
- Fix WebSocket CORS bypass by strictly verifying origin
- Fix OIDC refresh token persistence by encrypting at rest
- Fix grouped webhook data mutation by cloning alerts
- Fix host agent uninstall authorization and config fetch logic
- Fix notification queue recovery for stuck sending items
- Fix ignored update history limit parameter
- Fix ineffective break statement in WebSocket write pump
2026-02-03 17:16:27 +00:00
rcourtman
bd030c7c87 security: fix webhook SSRF, rate limit spoofing, metrics retention, and url poisoning
- Fix SSRF and rate limit bypass in SendEnhancedWebhook by validating the rendered URL.
- Fix rate limit spoofing in updates API by using secure IP extraction (trusted proxies).
- Fix memory leak in metrics history by correctly clearing fully stale data series.
- Fix public URL poisoning by preventing overwrites when explicitly configured.
2026-02-03 16:58:13 +00:00
rcourtman
4f40c3d751 fix: resolve critical stability and auth issues
- Fix data race in webhook notifications by removing shared state
- Fix duplicate monitors on config reload by stopping old instances
- Prevent metrics ID deletion on transient startup errors
- Support Bearer auth header for config export/import endpoints
2026-02-03 16:46:27 +00:00
rcourtman
aeca5e39fa Fix multi-tenant persistence and backend stability
- Initialize Alert and Notification managers with tenant-specific data directories
- Add panic recovery to WebSocket safeSend for stability
- Record host metrics to history for sparkline support
2026-02-03 16:24:42 +00:00
rcourtman
3b347b6548 fix: harden SQLite against I/O contention causing persistent lock errors
- Move all SQLite pragmas from db.Exec() to DSN parameters so every
  connection the pool creates gets busy_timeout and other settings.
  Previously only the first connection had these applied.
- Set MaxOpenConns(1) on audit, RBAC, and notification databases
  (metrics already had this). Fixes potential for multiple connections
  where new ones lack busy_timeout.
- Increase busy_timeout from 5s to 30s across all databases to
  tolerate disk I/O pressure during backup windows.
- Fix nested query deadlocks in GetRoles(), GetUserAssignments(), and
  CancelByAlertIDs() that would deadlock with MaxOpenConns(1).
- Fix circuit breaker retryInterval not resetting on recovery, which
  caused the next trip to start at 5-minute backoff instead of 5s.

Related to #1156
2026-02-02 17:29:14 +00:00
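To illustrate the first point of 3b347b6548: a pragma issued through db.Exec() runs on a single pooled connection, whereas a pragma carried in the DSN is applied whenever a connection is opened. The sketch below only builds such a DSN. It assumes the `_pragma=name(value)` query form used by the modernc.org/sqlite driver; `sqliteDSN` is a hypothetical helper, and the exact parameters Pulse sets are not shown in the commit message.

```go
package main

import "fmt"

// sqliteDSN builds a connection string that carries busy_timeout as a DSN
// parameter, so every connection opened from it applies the setting.
// Assumption: the "_pragma=name(value)" form of the modernc.org/sqlite driver.
func sqliteDSN(path string, busyTimeoutMs int) string {
	return fmt.Sprintf("file:%s?_pragma=busy_timeout(%d)", path, busyTimeoutMs)
}
```

The result would be passed to `sql.Open("sqlite", dsn)`, followed by `db.SetMaxOpenConns(1)` to match the single-connection setting the commit describes.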
rcourtman
b611b2219c fix: negotiate SMTP auth mechanism from server capabilities. Related to #1165
Instead of hardcoding PLAIN auth or switching on provider name, query
the server's EHLO response for advertised AUTH mechanisms and pick the
best one (PLAIN preferred, LOGIN as fallback). This properly handles
Microsoft 365 which only advertises LOGIN, and any future server with
non-standard auth support.

Also adds TLS safety check to LOGIN auth (matching PlainAuth behavior)
and moves auth negotiation into each send method so it happens after
the connection and STARTTLS upgrade, when capabilities are accurate.
2026-02-02 11:36:00 +00:00
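The selection rule in b611b2219c (PLAIN preferred, LOGIN as fallback) can be sketched as a pure function over the mechanism list the server advertises. `pickAuthMechanism` is a hypothetical name, not the function used in Pulse.

```go
package main

import "strings"

// pickAuthMechanism chooses from the space-separated mechanism list a server
// advertises in its EHLO response (for example "LOGIN XOAUTH2").
// PLAIN is preferred, LOGIN is the fallback, and "" means neither is offered.
func pickAuthMechanism(advertised string) string {
	hasLogin := false
	for _, m := range strings.Fields(strings.ToUpper(advertised)) {
		switch m {
		case "PLAIN":
			return "PLAIN"
		case "LOGIN":
			hasLogin = true
		}
	}
	if hasLogin {
		return "LOGIN"
	}
	return ""
}
```

With Go's net/smtp, the advertised list is available after EHLO as the second return value of `(*smtp.Client).Extension("AUTH")`. Calling it after the STARTTLS upgrade matters because servers may advertise different mechanisms once the connection is encrypted.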
rcourtman
98a235578e fix: add SMTP LOGIN auth for Microsoft 365 email. Related to #1165
Microsoft 365 advertises AUTH LOGIN but not AUTH PLAIN, causing
"504 5.7.4 Unrecognized authentication type" for users with valid
credentials. Add a loginAuth implementation and use it automatically
when the Microsoft 365 / Outlook provider is selected.
2026-02-02 11:30:46 +00:00
rcourtman
7f7edfceb4 test: expand backend coverage 2026-01-25 21:08:44 +00:00
rcourtman
c44cb5af5b fix: use pure Go SQLite driver for arm64 compatibility
Switch from mattn/go-sqlite3 (CGO) to modernc.org/sqlite (pure Go)
for auth, audit, and notification queue storage. This enables SQLite
functionality on arm64 Docker images which are built with CGO_ENABLED=0.

Related to #1140
2026-01-21 18:58:23 +00:00
rcourtman
7ce1355bba fix(test): disable email in TestSendResolvedAlert to avoid retry delays 2026-01-20 18:29:29 +00:00
rcourtman
96b7370f7b test: improve coverage for API, AI, Alerts, and Frontend Utils
- Add comprehensive tests for internal/api/config_handlers.go (Phases 1-3)
- Improve test coverage for AI tools, chat service, and session management
- Enhance alert and notification tests (ResolvedAlert, Webhook)
- Add frontend unit tests for utils (searchHistory, tagColors, temperature, url)
- Add proximity client API tests
2026-01-20 15:52:39 +00:00
rcourtman
d06ed2edb3 refactor: Add testability improvements to core packages
hostagent/commands.go:
- Extract execCommandContext as mockable variable

hostagent/proxmox_setup.go:
- Convert stateFilePath constants to variables (testable)
- Extract runCommand and lookPath as mockable functions
- Add duplicate comment (minor cleanup needed)

notifications/notifications.go:
- Add GetQueueStats() method for interface compliance
- Used by NotificationMonitor interface

updates/manager.go:
- Add AddSSEClient, RemoveSSEClient, GetSSECachedStatus methods
- Enables interface-based SSE client management

pkg/audit/export.go:
- Minor testability improvements

go.mod/go.sum:
- Add stretchr/objx v0.5.2 (test mocking dependency)
2026-01-19 19:25:38 +00:00
rcourtman
17dec929a0 feat: Add mention support for webhook alerts. Related to #1118
Adds a Mention field to webhook configurations that allows users to tag
individuals or groups when alerts are sent. This works with:

- Discord: @everyone, <@USER_ID>, <@&ROLE_ID>
- Microsoft Teams: @General, user email
- Mattermost: @channel, @all, @username

The mention is included in the webhook payload via the {{.Mention}} template
variable. Built-in templates for Discord, Slack, and Teams now conditionally
include mentions when configured.

Backend changes:
- Add Mention field to WebhookConfig struct
- Add Mention field to WebhookPayloadData for template access
- Pass mention through sendGroupedWebhook

Frontend changes:
- Add mention field to Webhook interface
- Add Mention input to webhook configuration form
- Show service-specific help text for mention formats
2026-01-18 15:16:37 +00:00
rcourtman
8eabd266fc fix(frontend): extend kiosk mode to Docker and Hosts pages
Kiosk mode (?kiosk=1) now hides the filter panel on all main views:
- Proxmox dashboard (already supported)
- Docker hosts page (added)
- Hosts overview page (added)

This ensures a clean display when using token auth for dashboard/kiosk
displays without the search and filter controls visible.

Follow-up fix for #1055
2026-01-11 12:16:20 +00:00
rcourtman
d197955272 feat(notifications): add Mattermost webhook template with rich formatting
Add a dedicated Mattermost webhook template that uses Markdown formatting
in the text field. Unlike Slack (which supports blocks), Mattermost only
renders the "text" field, so this template includes:

- Emoji indicators for alert severity (🚨 critical, ⚠️ warning, ℹ️ info)
- Bold resource name and node
- Markdown table with all alert details
- Link to view alert in Pulse

This provides much more context than the previous Slack template's
fallback text which only showed "Pulse Alert: Critical - <HOSTNAME>".

Addresses #1084
2026-01-11 12:00:39 +00:00
rcourtman
b75728922c feat: add demo AI findings for mock mode
When MOCK_ENABLED=true, Pulse now injects realistic AI patrol
findings to showcase the AI features without requiring actual
LLM API calls. This enables the demo instance to demonstrate:

- Critical/warning/info findings with realistic content
- Patrol run history
- Actionable recommendations

Also includes refinements to dismissal logic from earlier work:
- Only 'not_an_issue' creates permanent suppression
- 'expected_behavior' and 'will_fix_later' just acknowledge
2025-12-22 17:16:26 +00:00
rcourtman
e6d07c3294 style: remove emojis from log messages
Replaced emoji icons with plain text for cleaner logs and cross-platform compatibility.
2025-12-13 21:29:11 +00:00
rcourtman
4f824ab148 style: Apply gofmt to 37 files
Standardize code formatting across test files and monitor.go.
No functional changes.
2025-12-02 17:21:48 +00:00
rcourtman
629645a2a0 test: Add UpdateStatus not found test for notifications package 2025-12-02 14:26:17 +00:00
rcourtman
89b624c731 test: Add NewNotificationQueue invalid path test for notifications package 2025-12-02 14:17:05 +00:00
rcourtman
fca712430e test: Add singleAlertTemplate type coverage tests
Cover io type (formats as "I/O") and custom type (uses titleCase)
branches that were previously untested in the email template.
2025-12-02 13:45:49 +00:00
rcourtman
f4397b1512 test: Add ValidateWebhookURL edge case tests for notifications package
Cover empty URL, invalid scheme, missing hostname, cloud metadata
endpoints, loopback variants, and IPv6 link-local addresses.
2025-12-02 13:41:34 +00:00
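The edge cases listed in f4397b1512 suggest checks along the following lines. This is a minimal sketch using only the standard library; `checkWebhookURL` is a hypothetical function and does not reproduce the real ValidateWebhookURL, whose exact rules are not shown here.

```go
package main

import (
	"errors"
	"net"
	"net/url"
)

// checkWebhookURL rejects the URL classes named in the commit message:
// empty input, non-HTTP schemes, missing hostnames, loopback hosts, and
// link-local addresses (which include the 169.254.169.254 metadata endpoint).
func checkWebhookURL(raw string) error {
	if raw == "" {
		return errors.New("empty URL")
	}
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return errors.New("scheme must be http or https")
	}
	host := u.Hostname() // strips the port and IPv6 brackets
	if host == "" {
		return errors.New("missing hostname")
	}
	if host == "localhost" {
		return errors.New("loopback host not allowed")
	}
	if ip := net.ParseIP(host); ip != nil {
		if ip.IsLoopback() || ip.IsLinkLocalUnicast() {
			return errors.New("loopback or link-local address not allowed")
		}
	}
	return nil
}
```

The sketch inspects only literal IP addresses and the name `localhost`. A hostname that resolves to an internal address would pass, so a real validator also needs to check the resolved addresses at connection time.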
rcourtman
b49a014737 test: Add sendResolvedEmail tests for notifications package
Add comprehensive tests for the sendResolvedEmail function covering:
- Empty alert list (returns error)
- Nil alert list (returns error)
- All nil alerts (returns error from content builder)
- Single alert (exercises email sending path)
- Multiple alerts (tests grouped notification)
- Mixed nil and valid alerts (filters correctly)
- Zero resolved time (handles gracefully)

Also improves buildResolvedNotificationContent coverage as a collateral
benefit since sendResolvedEmail calls it internally.

Coverage: sendResolvedEmail 0% → 100%
Coverage: buildResolvedNotificationContent → 100%
Coverage: notifications package 58.3% → 58.6%
2025-12-02 13:14:22 +00:00
rcourtman
d5acf4be32 test: Add performCleanup tests for notifications queue
Add 4 tests covering the performCleanup function:
- cleanup removes old completed entries (>7 days)
- cleanup removes old DLQ entries (>30 days)
- cleanup removes old audit logs (>30 days)
- cleanup with empty database (no panic)

performCleanup coverage: 0% → 87.0%
Notifications package: 57.3% → 58.3%
2025-12-02 12:16:55 +00:00
rcourtman
7104f76f06 test: Add GetQueue, addWebhookDelivery, GetWebhookHistory tests
Tests for NotificationManager accessor and helper functions.
Covers queue retrieval, webhook delivery tracking, history trimming
to max 100 entries, and copy-on-read semantics. Notifications 56.6%→57.3%.
2025-12-02 12:10:29 +00:00