Commit graph

206 commits

Author SHA1 Message Date
rcourtman
c664204b59 feat: add OIDC logout URL support and improve UX
Enhancements for OIDC authentication based on user feedback from issue #327:

1. Add OIDC logout URL support
   - New OIDC_LOGOUT_URL environment variable
   - UI field in OIDC settings panel for logout URL configuration
   - Properly redirects to IdP logout endpoint (e.g., Authentik end-session)
   - Stored in config and returned via security status API

2. Fix redirect URL help text in UI
   - Handle empty defaultRedirect string properly
   - Improved help text when PUBLIC_URL is not set
   - Clarify when auto-detection vs manual config is needed

3. Documentation improvements
   - Add note about using https:// in PUBLIC_URL/OIDC_REDIRECT_URL when behind a TLS-terminating proxy
   - Document OIDC_LOGOUT_URL environment variable
   - Clarify X-Forwarded-Proto header behavior in OIDC docs
   - Add better guidance for Authentik users on HTTPS setup

4. Frontend improvements
   - Add HS256 signature algorithm error message in Login component
   - Display OIDC logout URL when available

These changes address the remaining OIDC UX issues reported by users,
particularly around logout functionality and reverse proxy configuration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 10:59:22 +00:00
rcourtman
2b4b6a08e1 fix: resolve OIDC authentication issues with DISABLE_AUTH and improve UX
Fixes multiple OIDC authentication issues reported in GitHub issue #327:

1. Fix DISABLE_AUTH=true disabling OIDC sessions
   - Reorder authentication checks to validate proxy auth and OIDC sessions
     before checking DISABLE_AUTH flag
   - Allows OIDC to function even when basic auth is disabled

2. Fix missing username display for OIDC users
   - Add GetSessionUsername() function to look up username from session ID
   - Set X-Authenticated-User header for OIDC authenticated requests
   - Update security status endpoint to return oidcUsername field
   - Display OIDC username in UI header alongside logout button

3. Fix missing logout button for OIDC users
   - Set hasAuth(true) when OIDC session is detected in frontend
   - Update security status endpoint to return OIDC info even when
     DISABLE_AUTH=true
   - Properly initialize WebSocket and load user preferences for OIDC sessions

4. Add documentation for Authentik HS256/RS256 issue
   - Document requirement for RSA signing key in Authentik
   - Add troubleshooting entry for signature algorithm mismatch
   - Provide clear resolution steps in CONFIGURATION.md and OIDC.md

All changes maintain backward compatibility and follow defensive security
practices. X-Forwarded-Proto header handling was verified to be correct.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 10:53:19 +00:00
rcourtman
4160d2e68b feat: add SSH key removal option to Quick Setup and fix node deletion
- Enhanced Quick Setup script to detect existing SSH configuration
  - Offers Keep/Remove/Skip options when SSH key already exists
  - Provides clean removal of SSH key from authorized_keys
  - Shows manual removal instructions for lm-sensors package
- Fixed ConfigWatcher panic on double-close during shutdown
- Fixed node deletion to allow removing the last node
  - Added SaveNodesConfigAllowEmpty method for explicit admin actions
  - Fixed deleted node host extraction before removal
- Display Quick Setup command after copying to clipboard
- Improved node name matching for temperature data
  - Handles .lan suffix variations between config and WebSocket state
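The node-name matching change can be sketched as normalizing both names before comparing them. This is a minimal illustration, assuming the only differences are letter case and an optional ".lan" suffix. normalizeNodeName and nodeNamesMatch are hypothetical helpers, not Pulse's actual functions.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeNodeName lowercases a node name and strips an optional ".lan"
// suffix, so "pve1.lan" (config) and "pve1" (WebSocket state) compare equal.
// Hypothetical helper for illustration.
func normalizeNodeName(name string) string {
	name = strings.ToLower(strings.TrimSpace(name))
	return strings.TrimSuffix(name, ".lan")
}

// nodeNamesMatch reports whether two names refer to the same node after
// normalization.
func nodeNamesMatch(a, b string) bool {
	return normalizeNodeName(a) == normalizeNodeName(b)
}

func main() {
	fmt.Println(nodeNamesMatch("pve1.lan", "pve1")) // true
	fmt.Println(nodeNamesMatch("pve1.lan", "pve2")) // false
}
```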

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-01 10:25:56 +00:00
rcourtman
d0f049d373 refactor: improve setup script output professionalism
- Remove excessive emojis and decorative elements
- Use clear, concise language throughout
- Simplify progress indicators to simple checkmarks
- Remove unnecessary tips and verbose explanations
- Improve error message clarity
- Use proper capitalization (not ALL CAPS for emphasis)
- Clean up temperature monitoring prompt to be more direct

The setup script now presents a more professional, enterprise-ready
appearance while maintaining all functionality.
2025-10-01 08:14:49 +00:00
rcourtman
fdf0e0b958 feat: automate SSH key generation and embedding in setup scripts
- Add getOrGenerateSSHKey() function that automatically generates SSH keypair if needed
- Embed SSH public key directly in setup scripts (no manual copy/paste required)
- Simplify temperature monitoring setup - user just types 'y' and it's done
- Improves UX: removes manual steps for SSH key setup

Changes:
- internal/api/config_handlers.go: Add SSH key generation and auto-embedding
- frontend-modern/src/components/Settings/NodeModal.tsx: Remove dead setupCode modal code
- Setup script now includes embedded SSH_PUBLIC_KEY variable

User workflow before:
1. Run setup script
2. Prompted to run commands on Pulse server
3. Copy SSH public key manually
4. Paste into setup script
5. Done

User workflow now:
1. Run setup script
2. Type 'y' for temperature monitoring
3. Done (SSH key automatically installed)
2025-10-01 08:10:48 +00:00
rcourtman
6bfaa8b79a fix: OIDC redirect URL now respects X-Forwarded-Proto header
Addresses #327 - Users behind reverse proxies (Traefik, nginx, etc.) were
experiencing redirect loop issues because the redirect URL was being built
with http:// instead of https:// when X-Forwarded-Proto was set.

Changes:
- Build OIDC redirect URL dynamically from each request instead of at startup
- Respect X-Forwarded-Proto and X-Forwarded-Host headers from reverse proxies
- Update UI help text to clarify auto-detection behavior
- Add debug logging to show how redirect URL is constructed

When redirect URL is not explicitly configured, Pulse now builds it from
the incoming request headers, properly detecting HTTPS when behind a proxy.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-30 21:06:20 +00:00
rcourtman
edb8702e77 fix CI errors: remove unused imports and format Go code
addresses unused TypeScript variables and gofmt formatting issues
2025-09-30 19:59:55 +00:00
rcourtman
fd52a7add1 improve oidc error logging and documentation
addresses #327

- added detailed logging when ID token verification fails
- added better error messages for common OIDC issues
- updated docs with Authentik-specific configuration
- added troubleshooting section for redirect loops and invalid_id_token errors

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-30 19:52:55 +00:00
rcourtman
745c2b4c6b rebalance temperature monitoring messaging - reassuring but honest
Changed from scary warnings to a confident, reassuring tone:

Before:
- "⚠️ IMPORTANT: This grants SSH access..."
- Emphasized risks and compromise scenarios
- Made users feel unsafe enabling the feature

After:
- "Works just like Ansible, Saltstack, etc."
- Emphasizes this is industry-standard approach
- Compares to trusted automation tools
- Focuses on what it does, not what could go wrong
- Still transparent about security model
- Removes duplicate/contradictory sections

The feature is secure and follows best practices. The messaging should
reflect confidence in the design while still being transparent.

Users should feel good about enabling it, not scared.
2025-09-30 19:16:30 +00:00
rcourtman
d5bd6c7676 improve SSH setup security messaging for temperature monitoring
- Make it clear SSH setup is OPTIONAL
- Explain security model upfront before user commits
- Detail exactly what access is being granted (root SSH, sensors only)
- Warn users to only proceed if they trust Pulse server
- Better differentiate public vs private keys
- Show exactly where the key is stored
- Explain how to revoke access
- Add comprehensive security documentation
- Include advanced option for command restrictions in authorized_keys
- Add risk assessment and best practices

This ensures users make informed decisions about SSH access to their
critical Proxmox infrastructure.
2025-09-30 19:13:23 +00:00
rcourtman
d78c388cd0 add SSH key setup to auto-setup script for temperature monitoring
- Prompts user to set up SSH access during auto-setup
- Guides user to paste their Pulse server's public key
- Adds key to /root/.ssh/authorized_keys
- Installs lm-sensors automatically
- Runs sensors-detect --auto for proper sensor detection
- Optional: user can skip and set up later manually
- Includes validation of SSH key format
- Shows clear instructions for manual setup if skipped

This ensures temperature monitoring works out-of-the-box for users
who run the auto-setup script.
2025-09-30 19:10:45 +00:00
rcourtman
f7842a0892 improve: add comprehensive debug logging for OIDC troubleshooting
Added detailed debug-level logs throughout the OIDC flow:
- Provider initialization (issuer, endpoints, scopes)
- Login flow tracking (client ID, redirect URL)
- Token exchange success/failure details
- Claims extraction (username, email, groups)
- Access control checks (why restrictions passed/failed)

Enhanced error logs to include issuer URL and actual error details in
audit events instead of generic "failed" messages.

Updated docs with Debug Logging section showing example output and
troubleshooting guidance for common issues like group restrictions.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-30 18:45:30 +00:00
rcourtman
552173b262 fix: improve alert system robustness and security
Addresses multiple issues identified during comprehensive alert system audit:

1. Fix ZFS device loop lock issue
   - Moved lock acquisition outside the loop in checkZFSPoolHealth
   - Changed clearAlert to clearAlertNoLock when lock already held
   - Prevents multiple lock acquisitions in the same iteration

2. Add alert deduplication on restore
   - Prevents duplicate alerts after service restart
   - Tracks seen alert IDs during LoadActiveAlerts
   - Logs warnings for any duplicates found

3. Add API input validation
   - validateAlertID function prevents DoS attacks
   - Limit alert ID length to 500 characters
   - Whitelist allowed characters (alphanumeric, -, _, :, /, .)
   - Cap history limit parameter at 10,000 records
   - Applied validation to acknowledge, unacknowledge, and clear endpoints

4. Add panic recovery to goroutines
   - All SaveActiveAlerts goroutines now have defer/recover
   - Cleanup goroutines protected from panics
   - Contextual error logging for each goroutine type

5. Document lock ordering
   - Added comprehensive documentation for Manager mutexes
   - Explains m.mu and resolvedMutex relationship
   - Clarifies acquisition rules to prevent deadlocks
   - Inline comments for resolvedMutex field

These fixes improve stability, security, data integrity, and maintainability
of the alert system without breaking API compatibility.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-30 15:35:39 +00:00
Pulse Monitor
e0e5528fe3 feat: add demo mode with read-only protection
Adds DEMO_MODE environment variable that blocks all write operations
while allowing full read/view functionality. Includes banner notification
in UI when demo mode is active.

Addresses need for safe public demo instances.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-30 14:46:20 +00:00
rcourtman
5470d2350b Add runtime mock toggles and auth-safe dev assets
2025-09-30 10:02:26 +00:00
rcourtman
013431a139 chore: tidy repo formatting and linting
2025-09-29 20:19:18 +00:00
rcourtman
e72d12d86e Refine security settings UI and credential rotation flow
2025-09-29 17:42:10 +00:00
rcourtman
3d78c0a9fa Improve security settings UX and fix alerts typing
2025-09-29 15:52:03 +00:00
rcourtman
9852ef9047 Align dev ports and improve auto-register UX
2025-09-29 15:05:59 +00:00
rcourtman
645c793f82 feat: add OIDC single sign-on
2025-09-29 10:22:27 +00:00
rcourtman
6f4771ae2d feat: unify styling and improve cluster detection
2025-09-28 18:46:52 +00:00
Pulse Monitor
e386a83778 cleanup: remove legacy POLLING_INTERVAL env variable (addresses #447)
PVE polling is hardcoded to 10s because the Proxmox cluster/resources endpoint only updates every 10s internally. Faster polling intervals were wasteful and provided no benefit.

Removed:
- POLLING_INTERVAL env variable and all references
- pollingInterval from config structs and API responses
- UI settings for polling interval (already removed)
- Dynamic polling interval updates via SIGHUP
- Legacy persistence code for saving polling settings

The monitoring loop now uses a hardcoded 10s interval matching Proxmox's update frequency.
2025-09-11 12:33:44 +00:00
Pulse Monitor
bd0c817056 fix: handle trailing slashes in node URLs (addresses #428)
Strip trailing slashes and paths from URLs before parsing host:port
to prevent "invalid port number" errors when users add nodes with
URLs like https://192.168.xxx.xxx:8006/
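The URL handling can be sketched with net/url, which separates host, port, and path so a trailing slash never ends up in the port. This sketch assumes a missing scheme should default to https and a missing port to 8006, the Proxmox VE default. hostPortFromNodeURL is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// hostPortFromNodeURL extracts "host:port" from a user-supplied node URL,
// ignoring any trailing slash or path. defaultPort is used when the URL has
// no explicit port.
func hostPortFromNodeURL(raw, defaultPort string) (string, error) {
	raw = strings.TrimSpace(raw)
	if !strings.Contains(raw, "://") {
		raw = "https://" + raw // assumed default scheme
	}
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	host := u.Hostname() // strips port and IPv6 brackets
	if host == "" {
		return "", fmt.Errorf("no host in %q", raw)
	}
	port := u.Port()
	if port == "" {
		port = defaultPort
	}
	// JoinHostPort re-adds brackets for IPv6 literals.
	return net.JoinHostPort(host, port), nil
}

func main() {
	hp, err := hostPortFromNodeURL("https://192.168.1.10:8006/", "8006")
	if err != nil {
		panic(err)
	}
	fmt.Println(hp) // 192.168.1.10:8006
}
```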
2025-09-11 12:27:24 +00:00
Pulse Monitor
d4f87a6230 feat: add physical disk diagnostics to help troubleshoot missing disks (addresses #429)
- Add comprehensive disk detection diagnostics to /api/diagnostics
- Shows which nodes return disks and which don't
- Provides specific error messages and API responses
- Includes targeted recommendations based on failure type
- Helps users provide better debugging info when reporting issues
2025-09-11 07:28:15 +00:00
Pulse Monitor
2dbd9c4e36 feat: implement per-resource-type alert delays
- VMs/Containers default to 10 seconds
- Nodes default to 15 seconds
- Storage defaults to 30 seconds
- PBS servers default to 30 seconds

This allows more appropriate delays for different resource types instead of a single global delay that doesn't fit all use cases. Storage and PBS can have longer delays since they're less critical and more prone to transient spikes during operations.
2025-09-10 19:45:42 +00:00
Pulse Monitor
b270ef7501 fix: improve PBS alert threshold persistence when updating nodes (addresses #440)
The issue was that PBS monitoring uses name-based IDs (pbs-<name>) while
the config system uses index-based IDs (pbs-0, pbs-1). When updating PBS
node configuration, the alert overrides were already being preserved but
the ID mismatch wasn't properly documented. Added explicit logging to
track PBS override preservation using the correct monitoring ID.
2025-09-10 17:11:53 +00:00
Pulse Monitor
ceb9939295 fix: ensure cluster endpoints include port number (addresses #428)
When detecting Proxmox cluster nodes, the Host field was being set to just the node name without a port. This caused validation to fail with "invalid Port number" error when qdevices were running.

Now cluster endpoints properly include the port (8006) in the Host field, allowing clusters with qdevices to be added successfully.
2025-09-10 17:01:11 +00:00
Pulse Monitor
0aeeb3da0d fix: resolve alert acknowledgment timeout issue (addresses #438)
The alert acknowledgment endpoints were hanging because GetState() was called
synchronously to broadcast updates via WebSocket, which could take significant
time with many nodes/guests. This caused the HTTP response to timeout, showing
an error to users even though the alert was successfully acknowledged.

Fixed by:
- Sending HTTP response immediately after acknowledging the alert
- Moving WebSocket broadcast to a goroutine to avoid blocking
- Applied fix to all alert endpoints (acknowledge, unacknowledge, clear, bulk ops)

This resolves the issue where users saw 'Failed to acknowledge alert' errors
but the alert was actually acknowledged (disappeared on refresh).
2025-09-10 15:49:12 +00:00
Pulse Monitor
ce6a76a0f9 fix: preserve PBS alert thresholds when updating node configuration (addresses #440)
When updating PBS nodes through the node configuration UI, alert thresholds
were being reset to defaults. This was because alert overrides are stored
separately from node configuration and weren't being preserved during node updates.

The fix ensures that when a node is updated, the alert configuration (including
any custom threshold overrides) is reloaded and preserved. This applies to both
PBS and PVE nodes to ensure consistent behavior.
2025-09-10 15:12:43 +00:00
Pulse Monitor
670bf4665d fix: improve cluster detection reliability on first add (addresses #437)
- Add retry logic with delays to detectPVECluster function to handle API permission propagation
- Periodically re-check standalone nodes to detect if they're actually part of a cluster
- Increase timeout from 3 to 5 seconds for cluster detection attempts
- Skip retries for definitively standalone nodes (501 Not Implemented responses)

This addresses the issue where adding a PVE cluster doesn't detect it properly on first attempt,
requiring deletion and re-adding to work correctly. The retry mechanism gives time for
API permissions to fully propagate in Proxmox.
2025-09-10 14:39:01 +00:00
Pulse Monitor
25ed6172a0 refactor: improve service name detection compatibility (addresses #430)
- Use centralized detectServiceName() function instead of duplicate logic
- Automatically detect whether system uses 'pulse' or 'pulse-backend' service
- Improves compatibility between official and community installer scripts
- Reduces confusion when users mix installation methods
2025-09-10 12:33:20 +00:00
Pulse Monitor
820ee6499d fix: improve cluster detection to handle qdevice configurations (addresses #428)
- Add API validation for cluster nodes to filter out qdevice VMs
- Only include nodes with working Proxmox APIs in cluster endpoints
- Prevent connection failures when cluster has non-Proxmox participants
- Add detailed logging for cluster node validation process

This resolves issues where Proxmox clusters using corosync qdevice
(external quorum device) would fail to connect because Pulse tried
to connect to the qdevice VM which has no Proxmox API.
2025-09-07 21:19:30 +00:00
Pulse Monitor
95987141b9 fix: improve guest URL validation and error handling (addresses #427)
- Add client-side URL validation with instant feedback
- Show validation errors inline below URL input fields
- Prevent saving when URLs have validation errors
- Improve error message extraction in API client
- Handle incomplete URLs like 'https://emby.' gracefully
- Backend already had validation, now frontend shows it properly
2025-09-07 14:27:03 +00:00
Pulse Monitor
eab4c07986 fix: improve error handling for guest URL saving (addresses #427)
- Add more specific error messages when metadata save fails
- Better handling of permission and disk space errors
- This should help diagnose why guest URLs fail to save in some cases
- The atomic write operation was already in place but errors weren't clear
2025-09-07 14:05:22 +00:00
Pulse Monitor
e4e4f515c7 fix: resolve VM disk monitoring issues (addresses #414, #416, #425)
- Always query guest agent for running VMs instead of only when disk is 0
- Add duplicate mount point detection to prevent inflated disk totals
- Show allocated disk size as fallback when guest agent unavailable
- Add comprehensive logging for guest agent disk queries
- Include diagnostic script for troubleshooting VM disk issues
2025-09-06 19:59:25 +00:00
Pulse Monitor
5325ef481e fix: comprehensive VM disk usage reporting improvements (addresses #414, #416, #348, #367, #425)
- Always query guest agent for running VMs (cluster/resources API always returns 0)
- Show allocated disk size when guest agent unavailable (instead of misleading 0%)
- Fix duplicate mount point counting issue (#425)
- Add comprehensive logging for guest agent queries
- Include diagnostic script for troubleshooting VM disk issues
- Update both monitor.go and monitor_optimized.go for consistency
2025-09-06 19:52:11 +00:00
Pulse Monitor
dda66c4cd3 security: fix path traversal and malformed token handling vulnerabilities
- Prevent path traversal attacks by cleaning and validating URL paths
- Use secure token comparison to prevent timing attacks
- Return appropriate HTTP status codes for different attack vectors
- Add comprehensive logging for security events
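Both techniques have standard-library building blocks in Go: crypto/subtle for constant-time comparison and path.Clean for normalizing paths. Here is a minimal sketch with tokensEqual and cleanRequestPath as hypothetical helper names.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"path"
	"strings"
)

// tokensEqual compares two secrets without short-circuiting on the first
// differing byte. ConstantTimeCompare still returns early when the lengths
// differ, so the token length itself is not hidden.
func tokensEqual(provided, expected string) bool {
	return subtle.ConstantTimeCompare([]byte(provided), []byte(expected)) == 1
}

// cleanRequestPath rejects paths containing NUL bytes or ".." segments and
// returns a cleaned, rooted version of anything else.
func cleanRequestPath(p string) (string, bool) {
	if strings.ContainsRune(p, 0) {
		return "", false
	}
	for _, seg := range strings.Split(p, "/") {
		if seg == ".." {
			return "", false
		}
	}
	return path.Clean("/" + p), true
}

func main() {
	fmt.Println(tokensEqual("s3cret", "s3cret"))    // true
	fmt.Println(cleanRequestPath("/assets//app.js")) // /assets/app.js true
	_, ok := cleanRequestPath("/../etc/passwd")
	fmt.Println(ok) // false
}
```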
2025-09-06 12:38:46 +00:00
Pulse Monitor
776fec7018 fix: properly handle PBS connection timeouts with granular timeout settings
The real issue was not the overall timeout duration, but that DNS resolution and the TLS handshake could hang indefinitely. Added specific timeouts for:
- DNS resolution/connection: 10 seconds
- TLS handshake: 10 seconds
- Response headers: 10 seconds

This prevents the connection from hanging on DNS lookup (like with pve-backup.lan) or during TLS negotiation, which was causing the 'context deadline exceeded' errors. (addresses #424)
2025-09-06 10:07:10 +00:00
Pulse Monitor
f7b8b0dc7f fix: increase PBS timeout to prevent 'context deadline exceeded' errors (addresses #424)
PBS servers can be slow to respond, especially under load or over slower connections. Increased the timeout from 10 seconds to 30 seconds specifically for PBS version checks during diagnostics.
2025-09-06 10:03:55 +00:00
Pulse Monitor
2eb7589747 fix: prevent rate limiting on essential real-time endpoints (addresses #419)
The /api/state and /api/guests/metadata endpoints are now excluded from
rate limiting as they are polled frequently by the UI for real-time updates.
This prevents the "Loading..." issue when users with multiple nodes access
the application.

- Added skip list in UniversalRateLimitMiddleware for real-time endpoints
- Removed duplicate rate limiting logic from router's ServeHTTP
- Consolidated all rate limiting into the universal middleware
2025-09-04 20:22:56 +00:00
Pulse Monitor
e66b74ee65 fix: allow password changes when behind proxy Basic Auth (addresses #407)
The password change endpoint now handles both scenarios:
- Direct auth: Uses Authorization header when it contains Pulse credentials
- Proxy auth: Uses currentPassword from JSON body when behind proxy Basic Auth
- Prevents proxy auth from interfering with Pulse's own authentication
- Maintains security by always requiring current password verification
2025-09-04 19:42:49 +00:00
Pulse Monitor
635d7c06f4 fix: resolve webhook JSON parsing errors for all services
- Fixed SendEnhancedWebhook to use service-specific payload generation
- Test webhooks now properly skip template-syntax headers
- ntfy and other plain text services correctly skip JSON validation
- Prevents 'invalid character' errors when testing webhooks
- All webhook payload generation now respects service type
2025-09-04 18:55:23 +00:00
Pulse Monitor
4574dad237 fix: resolve 404 error when updating or deleting webhooks
- Fixed webhook ID extraction in UpdateWebhook and DeleteWebhook handlers
- The previous code expected 5 URL parts, but the path had only 2 after prefix stripping
- Now correctly extracts webhook ID from /api/notifications/webhooks/{id}
- Resolves frontend error when saving webhook changes
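Matching the full prefix and taking what remains avoids counting path segments altogether. Here is a sketch with a hypothetical webhookIDFromPath helper.

```go
package main

import (
	"fmt"
	"strings"
)

const webhooksPrefix = "/api/notifications/webhooks/"

// webhookIDFromPath extracts {id} from /api/notifications/webhooks/{id},
// rejecting a missing ID or extra path segments.
func webhookIDFromPath(p string) (string, bool) {
	if !strings.HasPrefix(p, webhooksPrefix) {
		return "", false
	}
	id := strings.TrimSuffix(strings.TrimPrefix(p, webhooksPrefix), "/")
	if id == "" || strings.Contains(id, "/") {
		return "", false
	}
	return id, true
}

func main() {
	fmt.Println(webhookIDFromPath("/api/notifications/webhooks/abc123")) // abc123 true
}
```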
2025-09-04 18:24:02 +00:00
Pulse Monitor
83862eb817 fix: set Online status for mock cluster endpoints
Mock cluster endpoints were showing grey dots because the Online field
wasn't being set based on the node status. Now properly reflects the
node's online/offline status with green/grey indicators.
2025-09-04 15:29:42 +00:00
Pulse Monitor
0a66bffb58 fix: alert acknowledgment routing path mismatch (addresses #380)
The acknowledge/unacknowledge/clear endpoints were returning 404 due to incorrect path trimming in HandleAlerts. The router was registered with /api/alerts/ but the handler was trimming /api/alerts, causing path parsing to be off by one character.
2025-09-04 13:06:15 +00:00
Pulse Monitor
69598d62f6 enhance: improve mock data realism and alert system
- Add dynamic metric fluctuations for VMs and containers in mock data
- Fix alert acknowledgment to dim instead of hide alerts
- Implement unacknowledge functionality with backend persistence
- Simplify alert UI to single-click toggle (remove selection system)
- Add proper hysteresis for alert resolution when metrics drop
- Fix SVG icon boundaries in alert displays
- Add webhook disable toggles for testing without notifications
- Fix frontend directory duplication issue (addresses frontend-modern recreation)
- Improve alert sorting to show most recent first
- Make mock system generate realistic metric changes for proper alert lifecycle
2025-09-02 21:11:01 +00:00
Pulse Monitor
21d784164a fix: tag indicators now only show for guests that actually have tags
- Added ToFrontend() method to StateSnapshot for proper data conversion
- Modified /api/state endpoint to use frontend-formatted data
- Enhanced WebSocket store to handle tag data transformation consistently
- Ensures tags are properly converted between backend strings and frontend arrays
2025-08-31 18:01:47 +00:00
Pulse Monitor
426f4b274e fix: prevent mock mode from wiping production node configuration
Mock mode was inadvertently clearing real node configuration when toggling.
Added protection to prevent SaveNodesConfig from modifying nodes.enc when
in mock mode. Mock and production data are now completely separated.
2025-08-31 16:48:52 +00:00
Pulse Monitor
68801366d3 fix: properly handle alert IDs with special characters in acknowledge/clear endpoints (addresses #380)
Alert IDs like 'pve1:qemu/101-cpu' contain slashes which were breaking the URL path parsing.
Fixed by finding the /acknowledge or /clear suffix and extracting everything before it,
rather than trying to split by slashes.
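Suffix matching keeps slashes inside the ID intact. This sketch assumes the action is always the last path segment. parseAlertAction is hypothetical, and unacknowledge is included because a later commit added that endpoint.

```go
package main

import (
	"fmt"
	"strings"
)

const alertsPrefix = "/api/alerts/"

// parseAlertAction splits /api/alerts/{id}/{action}, where id may itself
// contain slashes, by matching a known action suffix instead of splitting.
func parseAlertAction(p string) (id, action string, ok bool) {
	if !strings.HasPrefix(p, alertsPrefix) {
		return "", "", false
	}
	rest := strings.TrimPrefix(p, alertsPrefix)
	for _, a := range []string{"acknowledge", "unacknowledge", "clear"} {
		suffix := "/" + a // leading slash keeps "acknowledge" from matching "unacknowledge"
		if strings.HasSuffix(rest, suffix) {
			id = strings.TrimSuffix(rest, suffix)
			if id == "" {
				return "", "", false
			}
			return id, a, true
		}
	}
	return "", "", false
}

func main() {
	fmt.Println(parseAlertAction("/api/alerts/pve1:qemu/101-cpu/acknowledge")) // pve1:qemu/101-cpu acknowledge true
}
```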
2025-08-31 16:24:08 +00:00
Pulse Monitor
903581f66d fix: alert acknowledgement URL parsing (addresses #380)
The acknowledge and clear alert endpoints were incorrectly parsing the alert ID from the URL path, causing 404 errors. Fixed the path extraction logic to properly handle the /api/alerts/{id}/acknowledge pattern.
2025-08-31 16:16:36 +00:00