Commit graph

2367 commits

Author SHA1 Message Date
rcourtman
0ddbf37c59 feat(auth): add policy evaluator and SQLite auth manager for RBAC
- Add policy evaluator for fine-grained access control
- Implement SQLite-backed auth manager for user/role persistence
- Support role-based permissions evaluation
2026-01-12 15:20:49 +00:00
rcourtman
d0ba203203 feat(audit): add comprehensive audit logging system
- Add SQLite-backed audit logger for persistent audit trails
- Implement cryptographic signing for tamper detection
- Add audit log export functionality
- Add webhook notifications for audit events
2026-01-12 15:20:33 +00:00
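The audit commit mentions cryptographic signing for tamper detection but does not name the scheme. The sketch below assumes HMAC-SHA256 over a fixed field order. `AuditEvent`, `signEvent` and `verifyEvent` are hypothetical names, not Pulse's actual API.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// AuditEvent is a hypothetical, simplified audit record.
type AuditEvent struct {
	ID        string
	Timestamp int64 // Unix seconds
	User      string
	Action    string
}

// canonical serializes the fields in a fixed order so the same event always
// yields the same bytes. Assumption: fields never contain "|"; a real format
// would need escaping or length prefixes.
func canonical(e AuditEvent) []byte {
	return []byte(fmt.Sprintf("%s|%d|%s|%s", e.ID, e.Timestamp, e.User, e.Action))
}

// signEvent returns a hex-encoded HMAC-SHA256 tag over the event.
func signEvent(key []byte, e AuditEvent) string {
	mac := hmac.New(sha256.New, key)
	mac.Write(canonical(e))
	return hex.EncodeToString(mac.Sum(nil))
}

// verifyEvent recomputes the tag and compares it in constant time.
func verifyEvent(key []byte, e AuditEvent, sig string) bool {
	expected, err := hex.DecodeString(sig)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(canonical(e))
	return hmac.Equal(mac.Sum(nil), expected)
}
```

Any change to a signed field changes the tag, so an edited row no longer verifies against its stored signature.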
rcourtman
3febd3266e feat(ai): add approval store and dry-run simulator for AI Auto-Fix
- Add approval store for tracking AI-suggested changes
- Implement SQLite-backed persistence for approvals
- Add dry-run simulator for testing AI fixes safely
- Support simulated execution with rollback capability
2026-01-12 15:20:16 +00:00
rcourtman
97701297c4 feat(sso): add SAML 2.0 and multi-provider SSO support
- Add SAML 2.0 Service Provider implementation using crewjam/saml
- Support IdP metadata from URL or raw XML
- Add multi-provider SSO configuration model
- Implement provider management API (CRUD operations)
- Add provider connection testing endpoint
- Add IdP metadata preview endpoint
- Add SSOProvidersPanel component for settings UI
- Support attribute-based role mapping (groups → Pulse roles)

API endpoints:
- GET/POST /api/security/sso/providers - List/create providers
- GET/PUT/DELETE /api/security/sso/providers/{id} - Provider CRUD
- POST /api/security/sso/providers/test - Test connection
- POST /api/security/sso/providers/metadata/preview - Preview metadata
- /api/saml/{id}/login, /acs, /metadata, /logout, /slo - SAML endpoints
2026-01-12 15:19:59 +00:00
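A minimal sketch of attribute-based role mapping, assuming a plain group-to-role lookup table and a fallback role when no group matches. The function, the table and the role names are illustrative, not Pulse's configuration format.

```go
package main

import "sort"

// mapGroupsToRoles returns the distinct Pulse roles granted by the user's
// IdP groups. If no group is mapped, defaultRole is used (when non-empty).
func mapGroupsToRoles(groups []string, mapping map[string]string, defaultRole string) []string {
	seen := make(map[string]bool)
	var roles []string
	for _, g := range groups {
		if role, ok := mapping[g]; ok && !seen[role] {
			seen[role] = true
			roles = append(roles, role)
		}
	}
	if len(roles) == 0 && defaultRole != "" {
		roles = append(roles, defaultRole)
	}
	sort.Strings(roles) // deterministic order for storage and comparison
	return roles
}
```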
rcourtman
1dda538265 fix(models): extend namespace disambiguation to SyncGuestBackupTimes (#1095)
The previous commit fixed namespace disambiguation for backup alerts,
but the Overview display uses SyncGuestBackupTimes to populate backup
timestamps on VMs/Containers. This commit extends the same namespace
matching logic to that function.

Also tightened the matching algorithm to use suffix matching instead
of substring matching, preventing false positives like "pve" matching
"pve-nat".
2026-01-12 15:11:59 +00:00
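The difference between substring and suffix matching is easiest to see in code. This sketch assumes a namespace matches when it equals the PVE instance name or appears at its end after a separator; the exact rule Pulse uses may differ, and `namespaceMatchesInstance` is a hypothetical name.

```go
package main

import "strings"

// namespaceMatchesInstance reports whether a PBS backup namespace refers to
// the given PVE instance name. Assumed rule for illustration: exact match
// (case-insensitive), or the instance name ends with the namespace on a
// separator boundary.
func namespaceMatchesInstance(namespace, instance string) bool {
	ns := strings.ToLower(strings.TrimSpace(namespace))
	inst := strings.ToLower(strings.TrimSpace(instance))
	if ns == "" || inst == "" {
		return false
	}
	if ns == inst {
		return true
	}
	for _, sep := range []string{"-", "_", ".", "/"} {
		if strings.HasSuffix(inst, sep+ns) {
			return true
		}
	}
	return false
}
```

With substring matching, `strings.Contains("pve-nat", "pve")` is true, which is the false positive the commit describes; the suffix check rejects it while still matching a "nat" namespace to "pve-nat".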
rcourtman
a88edd7c8f fix(alerts): disambiguate PBS backups using namespace for multi-PVE setups (#1095)
When multiple PVE instances have VMs with overlapping VMIDs, PBS backups
were being matched to the wrong VM because the code would just use the
first matching guest. Now when a PBS backup has a namespace, it attempts
to match that namespace to the PVE instance name to find the correct VM.

This helps users who have separate PBS instances backing up different
PVE clusters with namespaces like "pve1", "nat", etc.
2026-01-12 14:55:17 +00:00
rcourtman
4090d98160 fix(docker): calculate Used memory from Total-Free in Docker-in-LXC
When running Docker inside an LXC container, gopsutil can read the
Total memory (from cgroup limits) and Free memory correctly, but
returns 0 for Used memory. This caused the display to show "0B / 7GB"
even though memory was being used.

Added a fallback that calculates Used = Total - Free when Used is 0
but Total and Free are valid. This completes the fallback chain for
Docker-in-LXC memory reporting.

Fixes #1075
2026-01-12 14:04:38 +00:00
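A sketch of the Used = Total - Free fallback, using a hypothetical `MemorySnapshot` whose fields mirror the Total/Used/Free values gopsutil reports. The guard conditions (Free non-zero and not larger than Total) are assumptions about what "valid" means here.

```go
package main

// MemorySnapshot is a hypothetical subset of the memory stats the agent reads.
type MemorySnapshot struct {
	Total uint64
	Used  uint64
	Free  uint64
}

// fillUsed derives Used from Total - Free only when Used was reported as 0
// and Total/Free look plausible; otherwise the reported values are kept.
func fillUsed(m MemorySnapshot) MemorySnapshot {
	if m.Used == 0 && m.Total > 0 && m.Free > 0 && m.Free <= m.Total {
		m.Used = m.Total - m.Free
	}
	return m
}
```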
rcourtman
b2a6cd0fa3 fix(agent): add FreeBSD platform support to agent download and UI (#1051)
- Add freebsd-amd64 and freebsd-arm64 to normalizeUnifiedAgentArch()
  so the download endpoint serves FreeBSD binaries when requested
- Add FreeBSD/pfSense/OPNsense platform option to agent setup UI
  with note about bash installation requirement
- Add FreeBSD test cases to unified_agent_test.go

Fixes installation on pfSense/OPNsense where users were getting 404
errors because the backend didn't recognize the freebsd-amd64 arch
parameter from install.sh.
2026-01-11 23:51:12 +00:00
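One way the download endpoint's arch check could look. `normalizeArch` and `supportedArchs` are hypothetical stand-ins for normalizeUnifiedAgentArch(); only the two FreeBSD entries come from the commit, and the Linux entries are assumed from the `pulse-agent-linux-amd64` naming used elsewhere in this log.

```go
package main

import "strings"

// supportedArchs lists the os-arch pairs this sketch can serve.
var supportedArchs = map[string]bool{
	"linux-amd64":   true,
	"linux-arm64":   true,
	"freebsd-amd64": true, // added by the commit
	"freebsd-arm64": true, // added by the commit
}

// normalizeArch lowercases and trims the requested arch and returns ok=true
// only if a binary exists for it; a caller would answer 404 otherwise.
func normalizeArch(arch string) (string, bool) {
	a := strings.ToLower(strings.TrimSpace(arch))
	if supportedArchs[a] {
		return a, true
	}
	return "", false
}
```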
rcourtman
f527e6ebd0 docs: fix Kubernetes DaemonSet deployment guide
Fixes #1091 - addresses all three documentation issues reported:

1. Binary path: Changed from /usr/local/bin/pulse-agent (which doesn't
   exist in the main image) to /opt/pulse/bin/pulse-agent-linux-amd64

2. PULSE_AGENT_ID: Added to example and documented why it's required
   for DaemonSets (prevents token conflicts when all pods share one
   API token)

3. Resource visibility flags: Added PULSE_KUBE_INCLUDE_ALL_PODS and
   PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS to example, with explanation
   of the default behavior (show only problematic resources)

Also added tolerations, resource requests/limits, and ARM64 note.
2026-01-11 21:43:23 +00:00
rcourtman
607a303b77 fix(frontend): add Mattermost to webhook service type dropdown
The Mattermost webhook template was added to the backend (d1979552)
but the frontend service dropdown wasn't updated, so users couldn't
select Mattermost as a service type.

Adds mattermost to:
- serviceName mapping
- service selection array
- service description ternary

Fixes #1084
2026-01-11 16:44:39 +00:00
rcourtman
0267d1aa8a Auto-update Helm chart version to 5.0.15
2026-01-11 15:34:14 +00:00
rcourtman
1beb823552 Auto-update Helm chart documentation
2026-01-11 15:34:12 +00:00
rcourtman
66dff95355 chore: bump version to 5.0.15
2026-01-11 13:46:44 +00:00
rcourtman
8eabd266fc fix(frontend): extend kiosk mode to Docker and Hosts pages
Kiosk mode (?kiosk=1) now hides the filter panel on all main views:
- Proxmox dashboard (already supported)
- Docker hosts page (added)
- Hosts overview page (added)

This gives dashboard/kiosk displays that use token auth a clean view,
without the search and filter controls.

Follow-up fix for #1055
2026-01-11 12:16:20 +00:00
rcourtman
d197955272 feat(notifications): add Mattermost webhook template with rich formatting
Add a dedicated Mattermost webhook template that uses Markdown formatting
in the text field. Unlike Slack (which supports blocks), Mattermost only
renders the "text" field, so this template includes:

- Emoji indicators for alert severity (🚨 critical, ⚠️ warning, ℹ️ info)
- Bold resource name and node
- Markdown table with all alert details
- Link to view alert in Pulse

This provides much more context than the previous Slack template's
fallback text, which only showed "Pulse Alert: Critical - <HOSTNAME>".

Addresses #1084
2026-01-11 12:00:39 +00:00
rcourtman
802ce68e0c fix(frontend): persist kiosk mode in sessionStorage across navigation
Previously, kiosk mode (?kiosk=1) was only read from URL params on each
render. When navigating to different sections or refreshing, the kiosk
param was lost and the filter panel would reappear.

Now kiosk mode is persisted to sessionStorage when detected from URL,
similar to how API tokens are handled. This makes it survive:
- Navigation between dashboard sections (Docker, Ceph, etc.)
- Page refreshes
- The URL cleanup that removes the token parameter

To exit kiosk mode, users can either:
- Close the browser tab (clears sessionStorage)
- Navigate to the URL with ?kiosk=false

Fixes follow-up bug reported in #1055
2026-01-11 11:57:22 +00:00
rcourtman
80444a9022 fix(monitor): use cluster quorum status instead of endpoint count for health
Previously, when some cluster endpoints were unreachable (e.g., backup
nodes intentionally offline), the cluster was marked as "degraded" even
though the Proxmox cluster itself was healthy and had quorum.

Now the connection health check queries the Proxmox cluster's actual
quorum status. A cluster is only marked "degraded" if it has lost
quorum (not enough votes for consensus), which is the actual indicator
of cluster instability.

This means:
- Cluster with quorum + some nodes offline = "healthy"
- Cluster without quorum = "degraded" (warning)
- All endpoints down = "error"

Fixes #1085
2026-01-11 11:54:02 +00:00
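The three health rules above reduce to a small classifier. The sketch assumes the caller already knows how many endpoints answered and whether the cluster reports quorum (Proxmox exposes this, for example, as the `quorate` flag in cluster status); the type and function names are illustrative.

```go
package main

// ClusterHealth values mirror the three states described in the commit.
type ClusterHealth string

const (
	Healthy  ClusterHealth = "healthy"
	Degraded ClusterHealth = "degraded"
	Error    ClusterHealth = "error"
)

// clusterHealth classifies a cluster from the number of reachable endpoints
// and its quorum flag. Offline nodes alone do not degrade a quorate cluster.
func clusterHealth(reachableEndpoints int, quorate bool) ClusterHealth {
	switch {
	case reachableEndpoints == 0:
		return Error
	case !quorate:
		return Degraded
	default:
		return Healthy
	}
}
```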
rcourtman
9389194d2e fix(frontend): use Show components for reactive license check in AgentProfilesPanel
SolidJS component functions run only once, so early returns based on signals don't
re-render when those signals change. The license check spinner was
getting stuck because checkingLicense() was evaluated once at mount
time, and even though setCheckingLicense(false) was called after the
API response, the component didn't re-render.

Converted early returns to nested <Show> components which properly track
signal changes and update the UI when checkingLicense becomes false.

Fixes #1076
2026-01-11 08:50:14 +00:00
rcourtman
9cd79daa68 fix(hostagent): prevent data mixing when multiple nodes share hostname
When multiple PVE nodes have the same hostname (e.g., both named "pve"),
auto-linking would incorrectly link all host agents to the first matching
node, causing temperature and sensor data to be mixed/duplicated.

Changes:
- findLinkedProxmoxEntity now detects hostname collisions and refuses
  to auto-link, logging a warning instead
- Added manual link API endpoint (POST /api/agents/host/link) so users
  can explicitly link agents to the correct nodes
- Added State.LinkHostAgentToNode for bidirectional manual linking

Fixes #1081
2026-01-10 23:12:51 +00:00
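The collision rule amounts to "auto-link only on exactly one match". `Node` and `findAutoLinkTarget` are hypothetical simplifications of findLinkedProxmoxEntity, and case-insensitive hostname comparison is an assumption.

```go
package main

import "strings"

// Node is a hypothetical, minimal view of a Proxmox node.
type Node struct {
	ID       string
	Hostname string
}

// findAutoLinkTarget returns the node whose hostname matches the agent's.
// If zero or several nodes match, it returns ok=false so the caller can log
// a warning and leave linking to the manual link endpoint.
func findAutoLinkTarget(agentHostname string, nodes []Node) (Node, bool) {
	var match Node
	count := 0
	for _, n := range nodes {
		if strings.EqualFold(n.Hostname, agentHostname) {
			match = n
			count++
		}
	}
	if count != 1 {
		return Node{}, false
	}
	return match, true
}
```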
rcourtman
a978693cb1 fix(dockeragent): use TotalMemoryBytes fallback for memory.Total
The previous fix (4ff9e58c) added a fallback for TotalMemoryBytes in
the agent when Docker's info.MemTotal returns 0 in LXC environments.
However, the server was not using TotalMemoryBytes to populate the
memory.Total field - it only used gopsutil's Memory.TotalBytes.

When gopsutil also fails to read memory in the LXC container, the
frontend would see memory.Total=0 and wouldn't fall back to
totalMemoryBytes, because JavaScript's nullish coalescing operator (??)
only falls through on null/undefined, not on 0.

This fix ensures the server uses TotalMemoryBytes as a fallback for
memory.Total when gopsutil returns 0, providing a complete fix chain:
1. Agent: Falls back to gopsutil when Docker returns 0
2. Server: Falls back to TotalMemoryBytes when gopsutil returns 0

Fixes #1075
2026-01-10 22:45:41 +00:00
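The two-step fallback chain, sketched with hypothetical helpers. The key point is that 0 is treated as "missing", which JavaScript's `??` does not do. Types are simplified to `uint64` (Docker's API reports MemTotal as a signed integer).

```go
package main

// firstNonZero returns the first candidate greater than zero, or 0 if none is.
// Unlike JavaScript's ??, which only skips null/undefined, 0 counts as missing.
func firstNonZero(candidates ...uint64) uint64 {
	for _, c := range candidates {
		if c > 0 {
			return c
		}
	}
	return 0
}

// agentTotal: Docker's info.MemTotal first, then the gopsutil reading.
func agentTotal(dockerMemTotal, gopsutilTotal uint64) uint64 {
	return firstNonZero(dockerMemTotal, gopsutilTotal)
}

// serverTotal: gopsutil's Memory.TotalBytes first, then the agent's TotalMemoryBytes.
func serverTotal(gopsutilTotalBytes, totalMemoryBytes uint64) uint64 {
	return firstNonZero(gopsutilTotalBytes, totalMemoryBytes)
}
```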
rcourtman
55f5f071ed fix: replace hallucinated upgrade URLs with correct pulserelay.pro
Previous LLM sessions incorrectly inserted fake URLs (pulse.sh/pro and
yourpulse.io/pro) for the Pro upgrade links. Neither domain exists.

Replaced all 34 instances with the correct URL: https://pulserelay.pro/

Fixes #1077
2026-01-10 22:45:40 +00:00
rcourtman
d6554a0d87 fix(frontend): expand agent tables to full width in full-width mode
Tables in Settings → Agents were not expanding to fill the container
width when full-width mode was enabled. Added `w-full` class to all
tables (Managed Agents, Kubernetes Clusters, and removed host tables)
so they properly expand in full-width layouts.

Fixes #1080
2026-01-10 22:45:40 +00:00
rcourtman
135e5aaf0a Auto-update Helm chart version to 5.0.14
2026-01-10 19:35:05 +00:00
rcourtman
eb1eff5c92 Auto-update Helm chart documentation
2026-01-10 19:35:03 +00:00
rcourtman
2a0909e9aa chore: bump version to 5.0.14
2026-01-10 18:48:40 +00:00
rcourtman
4ff9e58c23 fix(dockeragent): use fallback memory reading in Docker-on-LXC
When Docker daemon runs inside an LXC container, it may report 0 for
MemTotal because it can't read the cgroup memory limits correctly.
This caused the UI to show "0B / 7GB" and trigger false alerts with
overflow percentages (214748364799.6%).

The fix checks if Docker's info.MemTotal is 0 and falls back to
gopsutil's /proc/meminfo reading (snapshot.Memory.TotalBytes) which
works correctly in LXC environments.

Fixes #1075
2026-01-10 18:41:41 +00:00
rcourtman
5d4d2ffefc fix(api): add missing Pro features to license features endpoint
The /api/license/features endpoint was only returning AI and agent
profile features, but was missing Team & Compliance features:
- sso (basic SSO/OIDC)
- advanced_sso (SAML, multi-provider)
- rbac (role-based access control)
- audit_logging (enterprise audit logs)
- advanced_reporting (PDF/CSV reports)

This caused Pro users to see "Upgrade to Pro" buttons on SSO, Roles,
and Audit Log panels even though their license included these features.

Fixes #1077
2026-01-10 18:38:12 +00:00
rcourtman
543ae8b417 fix(frontend): show update availability for Docker deployments
Previously, Docker environments would skip update checks entirely and
always show "running latest version". Now Docker users will see when
a new version is available (though the update mechanism is still
docker pull, not the automatic updater).

Fixes #1074
2026-01-10 15:48:32 +00:00
rcourtman
657d59ad64 fix(frontend): extract and use token from URL query parameter
When visiting Pulse with ?token=xxx, the frontend now:
1. Extracts the token from the URL
2. Stores it in sessionStorage for subsequent requests
3. Removes it from the URL for security (browser history)

This enables kiosk/dashboard mode via URL tokens without needing
cookie persistence.

Fixes #1055
2026-01-10 15:44:26 +00:00
rcourtman
b7f5cfde1c fix: apply subnet preference for cluster nodes in fallback path
When cluster node validation fails (because cluster-reported IPs are on
an internal network unreachable from Pulse), the fallback path was not
applying subnet preference logic. This caused Pulse to continue trying
to connect to internal cluster IPs instead of management network IPs.

Now the fallback path queries node network interfaces via the initial
connection and sets IPOverride to an IP on the same network as the
original connection, just like the validated node path does.

Fixes #929
2026-01-10 15:40:48 +00:00
rcourtman
1f4f0472b0 fix: use configured memory (MaxMem) instead of balloon for VM total
Previously, when memory ballooning was active on a VM, Pulse would use
the balloon value as the total memory instead of the configured MaxMem.
This caused confusing displays where a 4GB VM with 1GB balloon would
show "94% (966MB/1GB)" instead of "24% (966MB/4GB)".

The balloon value is still tracked in memory.balloon for the frontend's
yellow balloon marker visualization, but no longer replaces the total.

Fixes #1070
2026-01-10 15:37:45 +00:00
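The arithmetic behind the example, as a sketch with a hypothetical `VMMemory` type: usage is divided by the configured MaxMem, while the balloon value is kept only for display. Units are binary (MiB/GiB), labeled MB/GB as in the commit.

```go
package main

import "fmt"

// VMMemory is a hypothetical subset of the memory fields tracked per VM.
type VMMemory struct {
	Used    uint64 // bytes in use
	MaxMem  uint64 // configured memory
	Balloon uint64 // current balloon target, display only
}

// usagePercent computes usage against the configured MaxMem, never the balloon.
func usagePercent(m VMMemory) float64 {
	if m.MaxMem == 0 {
		return 0
	}
	return float64(m.Used) / float64(m.MaxMem) * 100
}

// label renders the value in the "24% (966MB/4GB)" style, using binary units.
func label(m VMMemory) string {
	return fmt.Sprintf("%.0f%% (%dMB/%dGB)", usagePercent(m), m.Used>>20, m.MaxMem>>30)
}
```

Dividing the same 966 MiB by the 1 GiB balloon instead gives about 94%, the misleading figure the commit describes.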
rcourtman
1816e2dbb8 fix(agent): use dataset used capacity for RAIDZ pools instead of zpool alloc
For RAIDZ pools, zpool ALLOC includes parity overhead, but users expect
to see actual data usage. When RAIDZ is detected, the dataset's Used value
(from statfs) is now used instead, matching the existing fix for total capacity.

Fixes the second part of #1052 where used capacity was inflated.
2026-01-10 15:25:28 +00:00
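A sketch of the RAIDZ choice, assuming statfs-style block counters for the dataset. `statfsUsed` and `poolUsed` are hypothetical; the real agent may use a different block-size field or detection logic.

```go
package main

// statfsUsed computes used bytes from statfs-style counters:
// (total blocks - free blocks) * block size.
func statfsUsed(blocks, bfree, bsize uint64) uint64 {
	if bfree > blocks {
		return 0 // inconsistent counters; report nothing rather than underflow
	}
	return (blocks - bfree) * bsize
}

// poolUsed picks the value shown to users. For RAIDZ, zpool ALLOC includes
// parity, so the dataset figure is preferred when it is available.
func poolUsed(isRAIDZ bool, zpoolAlloc, datasetUsed uint64) uint64 {
	if isRAIDZ && datasetUsed > 0 {
		return datasetUsed
	}
	return zpoolAlloc
}
```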
rcourtman
80729408c1 docs: add RBAC endpoints, OIDC group mapping, and update Pro terminology
- Add RBAC/role management endpoints to API.md
- Document OIDC group-to-role mapping feature in OIDC.md
- Add missing config files to CONFIGURATION.md (audit.db, AI files)
- Add OIDC_GROUP_ROLE_MAPPINGS env var documentation
- Fix "enterprise" -> "Pro" terminology in TROUBLESHOOTING.md
- Refocus TEMPERATURE_MONITORING.md on agent method, collapse legacy proxy docs
2026-01-10 13:59:50 +00:00
rcourtman
a970a6e5ee fix(lint): prefix unused err variable with underscore
2026-01-10 12:55:02 +00:00
rcourtman
4e11022425 refactor(ui): remove user-facing 'enterprise' terminology
- Replace 'enterprise authentication' with 'team authentication'
- Replace 'Enterprise Insights' with 'Advanced Insights'
- Deprecate isEnterprise() in favor of isPro() and hasFeature()
- Update Settings.tsx to use isPro() for badge visibility
2026-01-10 12:55:02 +00:00
rcourtman
2246aee35f chore: replace 'enterprise' terminology with 'Pro' in hot-dev docs
2026-01-10 12:55:02 +00:00
rcourtman
668cdf3393 feat(license): add audit_logging, advanced_sso, advanced_reporting to Pro tier
Major changes:
- Add audit_logging, advanced_sso, advanced_reporting features to Pro tier
- Persist session username for RBAC authorization after restart
- Add hot-dev auto-detection for pulse-pro binary (enables SQLite audit logging)

Frontend improvements:
- Replace isEnterprise() with hasFeature() for granular feature gating
- Update AuditLogPanel, OIDCPanel, RolesPanel, UserAssignmentsPanel, AISettings
- Update AuditWebhookPanel to use hasFeature('audit_logging')

Backend changes:
- Session store now persists and restores username field
- Update CreateSession/CreateOIDCSession to accept username parameter
- GetSessionUsername falls back to persisted username after restart

Testing:
- Update license_test.go to reflect Pro tier feature changes
- Update session tests for new username parameter
2026-01-10 12:55:02 +00:00
rcourtman
cba2e8609d Auto-update Helm chart version to 5.0.13
2026-01-10 08:47:24 +00:00
rcourtman
773efb4b19 Auto-update Helm chart documentation
2026-01-10 08:47:22 +00:00
rcourtman
9a59c4459b fix(workflow): build frontend before building backend in demo deployment
2026-01-10 00:41:00 +00:00
rcourtman
486ee29bc8 chore: bump version to 5.0.13 and fix test mocks
2026-01-10 00:27:11 +00:00
rcourtman
07b4765b8d fix: respect quiet hours for recovery notifications (#1068)
Recovery notifications were bypassing the quiet hours check, causing
users to receive recovery alerts during their configured quiet hours
window even though the original "down" alerts were suppressed.

- Add ShouldSuppressResolvedNotification() to alert manager
- Check quiet hours before sending recovery notifications in monitor
- Recovery notifications now follow same suppression rules as alerts
2026-01-09 21:47:36 +00:00
rcourtman
2a8f55d719 feat(enterprise): add Advanced Reporting and Audit Webhooks integration
This commit adds enterprise-grade reporting and audit capabilities:

Reporting:
- Refactored metrics store from internal/ to pkg/ for enterprise access
- Added pkg/reporting with shared interfaces for report generation
- Created API endpoint: GET /api/admin/reports/generate
- New ReportingPanel.tsx for PDF/CSV report configuration

Audit Webhooks:
- Extended pkg/audit with webhook URL management interface
- Added API endpoint: GET/POST /api/admin/webhooks/audit
- New AuditWebhookPanel.tsx for webhook configuration
- Updated Settings.tsx with Reporting and Webhooks tabs

Server Hardening:
- Enterprise hooks now execute outside mutex with panic recovery
- Removed dbPath from metrics Stats API to prevent path disclosure
- Added storage metrics persistence to polling loop

Documentation:
- Updated README.md feature table
- Updated docs/API.md with new endpoints
- Updated docs/PULSE_PRO.md with feature descriptions
- Updated docs/WEBHOOKS.md with audit webhooks section
2026-01-09 21:31:49 +00:00
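The "hooks outside the mutex with panic recovery" pattern, sketched with a hypothetical `Server` and `Hook` type: the hook list is copied under the lock, the lock is released, and each hook runs inside its own `recover`.

```go
package main

import (
	"log"
	"sync"
)

// Hook is a hypothetical callback registered by an extension.
type Hook func(event string)

type Server struct {
	mu    sync.Mutex
	hooks []Hook
}

// Register adds a hook under the lock.
func (s *Server) Register(h Hook) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.hooks = append(s.hooks, h)
}

// Fire snapshots the hooks under the lock, releases it, then runs each hook
// with its own recover. A hook never runs while the lock is held, and a
// panicking hook is logged and counted instead of crashing the caller.
func (s *Server) Fire(event string) (panics int) {
	s.mu.Lock()
	hooks := append([]Hook(nil), s.hooks...)
	s.mu.Unlock()

	for _, h := range hooks {
		func() {
			defer func() {
				if r := recover(); r != nil {
					log.Printf("hook panicked: %v", r)
					panics++
				}
			}()
			h(event)
		}()
	}
	return panics
}
```

Running hooks after unlocking also means a hook that calls back into the server (here, `Register`) cannot deadlock on the same mutex.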
rcourtman
92c150e979 feat(rbac): add OIDC group mapping tests and audit logging for RBAC actions
2026-01-09 19:25:33 +00:00
rcourtman
6ed1fdf806 feat(rbac): implement RBAC UI, OIDC group mapping, and API standard auth
- Added Roles and Users settings panels
- Implemented OIDC group-to-role mappings in config and auth flow
- Standardized API token context handling via pkg/auth
- Added Pulse Pro branding and upgrade banners to RBAC features
- Cleanup: Removed empty code blocks and fixed lint errors
2026-01-09 19:16:34 +00:00
rcourtman
3e2824a7ff feat: remove Enterprise badges, simplify Pro upgrade prompts
- Replace barrel import in AuditLogPanel.tsx to fix ad-blocker crash
- Remove all Enterprise/Pro badges from nav and feature headers
- Simplify upgrade CTAs to clean 'Upgrade to Pro' links
- Update docs: PULSE_PRO.md, API.md, README.md, SECURITY.md
- Align terminology: single Pro tier, no separate Enterprise tier

Also includes prior refactoring:
- Move auth package to pkg/auth for enterprise reuse
- Export server functions for testability
- Stabilize CLI tests
2026-01-09 16:51:08 +00:00
rcourtman
22059210f7 fix(frontend): remove unused import and variable to satisfy hooks
2026-01-09 14:46:15 +00:00
rcourtman
5c4399d69f feat(agent): add DisableCeph toggle, report_ip remote config, and improved IP detection (#929)
2026-01-09 14:45:29 +00:00
rcourtman
6019e3e77e fix: normalize custom OpenAI-compatible API URLs (#1067)
Users providing base URLs like "https://openrouter.ai/api/v1" were
getting HTML error responses because the client used the URL directly
without appending "/chat/completions".

- Normalize baseURL in NewOpenAIClient to ensure it ends with /chat/completions
- Fix modelsEndpoint() to derive /models from the normalized baseURL
- Add tests for URL normalization with various endpoint formats
2026-01-09 09:13:36 +00:00
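A sketch of the URL normalization, assuming a trailing slash should be tolerated and an already-complete URL left unchanged. `normalizeChatURL` and `modelsURL` are hypothetical names for the behavior described for NewOpenAIClient and modelsEndpoint().

```go
package main

import "strings"

// normalizeChatURL makes sure an OpenAI-compatible base URL ends with
// /chat/completions, with or without a trailing slash on the input.
func normalizeChatURL(base string) string {
	u := strings.TrimRight(strings.TrimSpace(base), "/")
	if strings.HasSuffix(u, "/chat/completions") {
		return u
	}
	return u + "/chat/completions"
}

// modelsURL derives the /models endpoint from the normalized chat URL.
func modelsURL(chatURL string) string {
	return strings.TrimSuffix(chatURL, "/chat/completions") + "/models"
}
```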
rcourtman
020553a12d fix: use flexible subnet matching instead of fixed /24
The previous implementation assumed /24 subnets, which failed for
larger networks (e.g., /16 or /20). Now uses progressive subnet
matching that tries /24, /20, and /16 to handle various network sizes.

Example: if the connection IP is 10.1.1.5 and a node has 10.1.2.6,
they are now correctly identified as being on the same network (both
fall within 10.1.0.0/20).
2026-01-08 23:24:50 +00:00
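Progressive matching can be expressed with Go's `net/netip`: compare the two addresses' masked prefixes at /24, then /20, then /16. `sameNetwork` is a hypothetical helper, and restricting it to IPv4 is an assumption.

```go
package main

import "net/netip"

// sameNetwork reports whether two IPv4 addresses share a /24, /20 or /16
// prefix, trying the narrowest first, and returns the prefix length that
// matched (0 if none did).
func sameNetwork(a, b string) (int, bool) {
	ipA, errA := netip.ParseAddr(a)
	ipB, errB := netip.ParseAddr(b)
	if errA != nil || errB != nil || !ipA.Is4() || !ipB.Is4() {
		return 0, false
	}
	for _, bits := range []int{24, 20, 16} {
		pA, _ := ipA.Prefix(bits) // masked network, e.g. 10.1.0.0/20
		pB, _ := ipB.Prefix(bits)
		if pA == pB {
			return bits, true
		}
	}
	return 0, false
}
```

The trade-off is that /16 is permissive (any two 10.1.x.x addresses count as one network), which is why the narrowest prefix is tried first.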