Enhancements for OIDC authentication based on user feedback from issue #327:
1. Add OIDC logout URL support
- New OIDC_LOGOUT_URL environment variable
- UI field in OIDC settings panel for logout URL configuration
- Properly redirects to IdP logout endpoint (e.g., Authentik end-session)
- Stored in config and returned via security status API
2. Fix redirect URL help text in UI
- Handle empty defaultRedirect string properly
- Improved help text when PUBLIC_URL is not set
- Clarify when auto-detection vs manual config is needed
3. Documentation improvements
- Add note about using https:// in PUBLIC_URL/OIDC_REDIRECT_URL when behind TLS proxy
- Document OIDC_LOGOUT_URL environment variable
- Clarify X-Forwarded-Proto header behavior in OIDC docs
- Add better guidance for Authentik users on HTTPS setup
4. Frontend improvements
- Add HS256 signature algorithm error message in Login component
- Display OIDC logout URL when available
These changes address the remaining OIDC UX issues reported by users,
particularly around logout functionality and reverse proxy configuration.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
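The logout behavior described above can be sketched as follows. The function and handler names, cookie name, and fallback path are illustrative, not Pulse's actual identifiers:

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// logoutRedirectTarget picks where to send the browser after the local
// session is cleared: the configured IdP end-session endpoint (e.g.
// Authentik's) when OIDC_LOGOUT_URL is set, otherwise the local login page.
func logoutRedirectTarget(oidcLogoutURL string) string {
	if oidcLogoutURL != "" {
		return oidcLogoutURL
	}
	return "/login"
}

func logoutHandler(w http.ResponseWriter, r *http.Request) {
	// Expire the local session cookie (cookie name is illustrative).
	http.SetCookie(w, &http.Cookie{Name: "pulse_session", MaxAge: -1, Path: "/"})
	http.Redirect(w, r, logoutRedirectTarget(os.Getenv("OIDC_LOGOUT_URL")), http.StatusFound)
}

func main() {
	http.HandleFunc("/api/logout", logoutHandler)
	fmt.Println(logoutRedirectTarget("https://auth.example.com/application/o/pulse/end-session/"))
}
```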
Fixes multiple OIDC authentication issues reported in GitHub issue #327:
1. Fix DISABLE_AUTH=true disabling OIDC sessions
- Reorder authentication checks to validate proxy auth and OIDC sessions
before checking DISABLE_AUTH flag
- Allows OIDC to function even when basic auth is disabled
2. Fix missing username display for OIDC users
- Add GetSessionUsername() function to look up username from session ID
- Set X-Authenticated-User header for OIDC authenticated requests
- Update security status endpoint to return oidcUsername field
- Display OIDC username in UI header alongside logout button
3. Fix missing logout button for OIDC users
- Set hasAuth(true) when OIDC session is detected in frontend
- Update security status endpoint to return OIDC info even when
DISABLE_AUTH=true
- Properly initialize WebSocket and load user preferences for OIDC sessions
4. Add documentation for Authentik HS256/RS256 issue
- Document requirement for RSA signing key in Authentik
- Add troubleshooting entry for signature algorithm mismatch
- Provide clear resolution steps in CONFIGURATION.md and OIDC.md
All changes maintain backward compatibility and follow defensive security
practices. X-Forwarded-Proto header handling was verified to be correct.
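The reordering in item 1 can be illustrated with a small decision function. This is a sketch of the check order only; the real middleware inspects headers and a session store rather than booleans:

```go
package main

import "fmt"

// authDecision sketches the corrected ordering: proxy auth and OIDC
// sessions are validated before the DISABLE_AUTH flag is consulted,
// so OIDC keeps working even when basic auth is disabled.
func authDecision(hasProxyAuth, hasOIDCSession, disableAuth bool) string {
	switch {
	case hasProxyAuth:
		return "proxy"
	case hasOIDCSession:
		return "oidc"
	case disableAuth:
		return "anonymous" // only reached when no stronger auth applies
	default:
		return "login-required"
	}
}

func main() {
	// With the old ordering, DISABLE_AUTH=true short-circuited this
	// request to "anonymous" and dropped the OIDC session.
	fmt.Println(authDecision(false, true, true)) // prints "oidc"
}
```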
- Enhanced Quick Setup script to detect existing SSH configuration
- Offers Keep/Remove/Skip options when SSH key already exists
- Provides clean removal of SSH key from authorized_keys
- Shows manual removal instructions for lm-sensors package
- Fixed ConfigWatcher panic on double-close during shutdown
- Fixed node deletion to allow removing the last node
- Added SaveNodesConfigAllowEmpty method for explicit admin actions
- Fixed deleted node host extraction before removal
- Display Quick Setup command after copying to clipboard
- Improved node name matching for temperature data
- Handles .lan suffix variations between config and WebSocket state
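For the ConfigWatcher double-close panic: closing an already-closed channel panics in Go, and guarding the close with `sync.Once` is one common fix. A minimal sketch, assuming the watcher signals shutdown via a `done` channel (Pulse's actual watcher may differ):

```go
package main

import (
	"fmt"
	"sync"
)

// ConfigWatcher sketch: shutdown paths can reach Close twice, and a
// second close(w.done) would panic. sync.Once makes Close idempotent.
type ConfigWatcher struct {
	done      chan struct{}
	closeOnce sync.Once
}

func NewConfigWatcher() *ConfigWatcher {
	return &ConfigWatcher{done: make(chan struct{})}
}

// Close is idempotent: the second and later calls are no-ops.
func (w *ConfigWatcher) Close() {
	w.closeOnce.Do(func() { close(w.done) })
}

func main() {
	w := NewConfigWatcher()
	w.Close()
	w.Close() // previously a double-close panic; now safe
	fmt.Println("closed twice without panic")
}
```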
- Remove excessive emojis and decorative elements
- Use clear, concise language throughout
- Simplify progress indicators to simple checkmarks
- Remove unnecessary tips and verbose explanations
- Improve error message clarity
- Use proper capitalization (not ALL CAPS for emphasis)
- Clean up temperature monitoring prompt to be more direct
The setup script now presents a more professional, enterprise-ready
appearance while maintaining all functionality.
- Add getOrGenerateSSHKey() function that automatically generates SSH keypair if needed
- Embed SSH public key directly in setup scripts (no manual copy/paste required)
- Simplify temperature monitoring setup: the user just types 'y' and it's done
- Improves UX: removes manual steps for SSH key setup
Changes:
- internal/api/config_handlers.go: Add SSH key generation and auto-embedding
- frontend-modern/src/components/Settings/NodeModal.tsx: Remove dead setupCode modal code
- Setup script now includes embedded SSH_PUBLIC_KEY variable
User workflow before:
1. Run setup script
2. Prompted to run commands on Pulse server
3. Copy SSH public key manually
4. Paste into setup script
5. Done
User workflow now:
1. Run setup script
2. Type 'y' for temperature monitoring
3. Done (SSH key automatically installed)
Addresses #327 - Users behind reverse proxies (Traefik, nginx, etc.) were
experiencing redirect loop issues because the redirect URL was being built
with http:// instead of https:// when X-Forwarded-Proto was set.
Changes:
- Build OIDC redirect URL dynamically from each request instead of at startup
- Respect X-Forwarded-Proto and X-Forwarded-Host headers from reverse proxies
- Update UI help text to clarify auto-detection behavior
- Add debug logging to show how redirect URL is constructed
When redirect URL is not explicitly configured, Pulse now builds it from
the incoming request headers, properly detecting HTTPS when behind a proxy.
Addresses #327
- Added detailed logging when ID token verification fails
- Added better error messages for common OIDC issues
- Updated docs with Authentik-specific configuration
- Added troubleshooting section for redirect loops and invalid_id_token errors
Changed from scary warnings to confident, reassuring tone:
Before:
- "⚠️ IMPORTANT: This grants SSH access..."
- Emphasized risks and compromise scenarios
- Made users feel unsafe enabling the feature
After:
- "Works just like Ansible, Saltstack, etc."
- Emphasizes this is industry-standard approach
- Compares to trusted automation tools
- Focuses on what it does, not what could go wrong
- Still transparent about security model
- Removes duplicate/contradictory sections
The feature is secure and follows best practices. The messaging should
reflect confidence in the design while still being transparent.
Users should feel good about enabling it, not scared.
- Make it clear SSH setup is OPTIONAL
- Explain security model upfront before user commits
- Detail exactly what access is being granted (root SSH, sensors only)
- Warn users to only proceed if they trust Pulse server
- Better differentiate public vs private keys
- Show exactly where the key is stored
- Explain how to revoke access
- Add comprehensive security documentation
- Include advanced option for command restrictions in authorized_keys
- Add risk assessment and best practices
This ensures users make informed decisions about SSH access to their
critical Proxmox infrastructure.
- Prompts user to set up SSH access during auto-setup
- Guides user to paste their Pulse server's public key
- Adds key to /root/.ssh/authorized_keys
- Installs lm-sensors automatically
- Runs sensors-detect --auto for proper sensor detection
- Optional: user can skip and set up later manually
- Includes validation of SSH key format
- Shows clear instructions for manual setup if skipped
This ensures temperature monitoring works out-of-the-box for users
who run the auto-setup script.
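The key-format validation mentioned above happens in the shell setup script; the same idea expressed in Go (function name and accepted key types are illustrative):

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// looksLikeSSHPublicKey is a cheap sanity check on a pasted key line:
// a known algorithm prefix, a decodable base64 blob, optional comment.
func looksLikeSSHPublicKey(line string) bool {
	fields := strings.Fields(strings.TrimSpace(line))
	if len(fields) < 2 {
		return false
	}
	switch fields[0] {
	case "ssh-ed25519", "ssh-rsa",
		"ecdsa-sha2-nistp256", "ecdsa-sha2-nistp384", "ecdsa-sha2-nistp521":
		// recognized key type
	default:
		return false
	}
	_, err := base64.StdEncoding.DecodeString(fields[1])
	return err == nil
}

func main() {
	fmt.Println(looksLikeSSHPublicKey("ssh-ed25519 AAAA root@pulse")) // true
	fmt.Println(looksLikeSSHPublicKey("not a valid key"))             // false
}
```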
Added detailed debug-level logs throughout the OIDC flow:
- Provider initialization (issuer, endpoints, scopes)
- Login flow tracking (client ID, redirect URL)
- Token exchange success/failure details
- Claims extraction (username, email, groups)
- Access control checks (why restrictions passed/failed)
Enhanced error logs to include issuer URL and actual error details in
audit events instead of generic "failed" messages.
Updated docs with Debug Logging section showing example output and
troubleshooting guidance for common issues like group restrictions.
Addresses multiple issues identified during comprehensive alert system audit:
1. Fix ZFS device loop lock issue
- Moved lock acquisition outside loop in checkZFSPoolHealth
- Changed clearAlert to clearAlertNoLock when lock already held
- Prevents multiple lock acquisitions in same iteration
2. Add alert deduplication on restore
- Prevents duplicate alerts after service restart
- Tracks seen alert IDs during LoadActiveAlerts
- Logs warnings for any duplicates found
3. Add API input validation
- validateAlertID function prevents DoS attacks
- Limit alert ID length to 500 characters
- Whitelist allowed characters (alphanumeric, -, _, :, /, .)
- Cap history limit parameter at 10,000 records
- Applied validation to acknowledge, unacknowledge, and clear endpoints
4. Add panic recovery to goroutines
- All SaveActiveAlerts goroutines now have defer/recover
- Cleanup goroutines protected from panics
- Contextual error logging for each goroutine type
5. Document lock ordering
- Added comprehensive documentation for Manager mutexes
- Explains m.mu and resolvedMutex relationship
- Clarifies acquisition rules to prevent deadlocks
- Inline comments for resolvedMutex field
These fixes improve stability, security, data integrity, and maintainability
of the alert system without breaking API compatibility.
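The validation in item 3 can be sketched as a small function consistent with the limits described (length cap of 500, whitelisted characters); this is illustrative, not Pulse's exact code:

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// alertIDPattern whitelists alphanumerics plus '-', '_', ':', '/', '.',
// matching the characters described above.
var alertIDPattern = regexp.MustCompile(`^[A-Za-z0-9._:/-]+$`)

// validateAlertID rejects empty, oversized, or hostile IDs before they
// reach lookups or logs.
func validateAlertID(id string) error {
	if id == "" {
		return errors.New("alert ID is empty")
	}
	if len(id) > 500 {
		return errors.New("alert ID exceeds 500 characters")
	}
	if !alertIDPattern.MatchString(id) {
		return errors.New("alert ID contains invalid characters")
	}
	return nil
}

func main() {
	fmt.Println(validateAlertID("pve1:qemu/101-cpu")) // a typical ID passes
	fmt.Println(validateAlertID("rm -rf /<script>"))  // rejected
}
```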
Adds DEMO_MODE environment variable that blocks all write operations
while allowing full read/view functionality. Includes banner notification
in UI when demo mode is active.
Addresses need for safe public demo instances.
PVE polling is hardcoded to 10s since Proxmox cluster/resources endpoint only updates every 10s internally. Setting faster polling intervals was wasteful and provided no benefit.
Removed:
- POLLING_INTERVAL env variable and all references
- pollingInterval from config structs and API responses
- UI settings for polling interval (already removed)
- Dynamic polling interval updates via SIGHUP
- Legacy persistence code for saving polling settings
The monitoring loop now uses a hardcoded 10s interval matching Proxmox's update frequency.
Strip trailing slashes and paths from URLs before parsing host:port
to prevent "invalid port number" errors when users add nodes with
URLs like https://192.168.xxx.xxx:8006/
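One way to implement this is to trim the trailing slash and let `url.Parse` separate the host:port from any path, rather than splitting the raw string. Function name and example address are illustrative:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeHostPort extracts host:port from a user-supplied node URL,
// tolerating trailing slashes and appended paths.
func normalizeHostPort(raw string) (string, error) {
	u, err := url.Parse(strings.TrimRight(raw, "/"))
	if err != nil {
		return "", err
	}
	if u.Host == "" {
		return "", fmt.Errorf("no host found in %q", raw)
	}
	// u.Host is host:port with any path component already excluded.
	return u.Host, nil
}

func main() {
	host, err := normalizeHostPort("https://10.0.0.5:8006/")
	if err != nil {
		panic(err)
	}
	fmt.Println(host) // trailing slash no longer breaks port parsing
}
```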
- Add comprehensive disk detection diagnostics to /api/diagnostics
- Shows which nodes return disks and which don't
- Provides specific error messages and API responses
- Includes targeted recommendations based on failure type
- Helps users provide better debugging info when reporting issues
- VMs/Containers default to 10 seconds
- Nodes default to 15 seconds
- Storage defaults to 30 seconds
- PBS servers default to 30 seconds
This allows more appropriate delays for different resource types instead of a single global delay that doesn't fit all use cases. Storage and PBS can have longer delays since they're less critical and more prone to transient spikes during operations.
The issue was that PBS monitoring uses name-based IDs (pbs-<name>) while
the config system uses index-based IDs (pbs-0, pbs-1). When updating PBS
node configuration, the alert overrides were already being preserved but
the ID mismatch wasn't properly documented. Added explicit logging to
track PBS override preservation using the correct monitoring ID.
When detecting Proxmox cluster nodes, the Host field was being set to just the node name without a port. This caused validation to fail with "invalid Port number" error when qdevices were running.
Now cluster endpoints properly include the port (8006) in the Host field, allowing clusters with qdevices to be added successfully.
The alert acknowledgment endpoints were hanging because GetState() was called
synchronously to broadcast updates via WebSocket, which could take significant
time with many nodes/guests. This caused the HTTP response to timeout, showing
an error to users even though the alert was successfully acknowledged.
Fixed by:
- Sending HTTP response immediately after acknowledging the alert
- Moving WebSocket broadcast to a goroutine to avoid blocking
- Applied fix to all alert endpoints (acknowledge, unacknowledge, clear, bulk ops)
This resolves the issue where users saw 'Failed to acknowledge alert' errors
but the alert was actually acknowledged (disappeared on refresh).
When updating PBS nodes through the node configuration UI, alert thresholds
were being reset to defaults. This was because alert overrides are stored
separately from node configuration and weren't being preserved during node updates.
The fix ensures that when a node is updated, the alert configuration (including
any custom threshold overrides) is reloaded and preserved. This applies to both
PBS and PVE nodes to ensure consistent behavior.
- Add retry logic with delays to detectPVECluster function to handle API permission propagation
- Periodically re-check standalone nodes to detect if they're actually part of a cluster
- Increase timeout from 3 to 5 seconds for cluster detection attempts
- Skip retries for definitively standalone nodes (501 Not Implemented errors)
This addresses the issue where adding a PVE cluster doesn't detect it properly on first attempt,
requiring deletion and re-adding to work correctly. The retry mechanism gives time for
API permissions to fully propagate in Proxmox.
- Use centralized detectServiceName() function instead of duplicate logic
- Automatically detect whether system uses 'pulse' or 'pulse-backend' service
- Improves compatibility between official and community installer scripts
- Reduces confusion when users mix installation methods
- Add API validation for cluster nodes to filter out qdevice VMs
- Only include nodes with working Proxmox APIs in cluster endpoints
- Prevent connection failures when cluster has non-Proxmox participants
- Add detailed logging for cluster node validation process
This resolves issues where Proxmox clusters using corosync qdevice
(external quorum device) would fail to connect because Pulse tried
to connect to the qdevice VM which has no Proxmox API.
- Add client-side URL validation with instant feedback
- Show validation errors inline below URL input fields
- Prevent saving when URLs have validation errors
- Improve error message extraction in API client
- Handle incomplete URLs like 'https://emby.' gracefully
- Backend already had validation, now frontend shows it properly
- Add more specific error messages when metadata save fails
- Better handling of permission and disk space errors
- This should help diagnose why guest URLs fail to save in some cases
- The atomic write operation was already in place but errors weren't clear
- Always query guest agent for running VMs instead of only when disk is 0
- Add duplicate mount point detection to prevent inflated disk totals
- Show allocated disk size as fallback when guest agent unavailable
- Add comprehensive logging for guest agent disk queries
- Include diagnostic script for troubleshooting VM disk issues
- Always query guest agent for running VMs (cluster/resources API always returns 0)
- Show allocated disk size when guest agent unavailable (instead of misleading 0%)
- Fix duplicate mount point counting issue (#425)
- Add comprehensive logging for guest agent queries
- Include diagnostic script for troubleshooting VM disk issues
- Update both monitor.go and monitor_optimized.go for consistency
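The duplicate-mount-point fix reduces to deduplicating by mount point before summing. A sketch of the idea (type and function names are illustrative, not Pulse's exact aggregation code):

```go
package main

import "fmt"

// fsInfo mirrors the fields needed from a guest-agent filesystem entry.
type fsInfo struct {
	Mountpoint string
	UsedBytes  uint64
	TotalBytes uint64
}

// sumFilesystems totals disk usage while counting each mount point once,
// so bind mounts and other duplicates no longer inflate the totals.
func sumFilesystems(fs []fsInfo) (used, total uint64) {
	seen := make(map[string]bool)
	for _, f := range fs {
		if seen[f.Mountpoint] {
			continue // duplicate mount point: already counted
		}
		seen[f.Mountpoint] = true
		used += f.UsedBytes
		total += f.TotalBytes
	}
	return used, total
}

func main() {
	used, total := sumFilesystems([]fsInfo{
		{"/", 10, 100},
		{"/data", 5, 50},
		{"/data", 5, 50}, // duplicate reported by the agent
	})
	fmt.Println(used, total) // prints 15 150
}
```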
- Prevent path traversal attacks by cleaning and validating URL paths
- Use secure token comparison to prevent timing attacks
- Return appropriate HTTP status codes for different attack vectors
- Add comprehensive logging for security events
The real issue was not the overall timeout duration, but that DNS resolution and TLS handshake could hang indefinitely. Added specific timeouts for:
- DNS resolution/connection: 10 seconds
- TLS handshake: 10 seconds
- Response headers: 10 seconds
This prevents the connection from hanging on DNS lookup (like with pve-backup.lan) or during TLS negotiation, which was causing the 'context deadline exceeded' errors. (addresses #424)
PBS servers can be slow to respond, especially under load or over slower connections. Increased the timeout from 10 seconds to 30 seconds specifically for PBS version checks during diagnostics.
The /api/state and /api/guests/metadata endpoints are now excluded from
rate limiting as they are polled frequently by the UI for real-time updates.
This prevents the "Loading..." issue when users with multiple nodes access
the application.
- Added skip list in UniversalRateLimitMiddleware for real-time endpoints
- Removed duplicate rate limiting logic from router's ServeHTTP
- Consolidated all rate limiting into the universal middleware
The password change endpoint now handles both scenarios:
- Direct auth: Uses Authorization header when it contains Pulse credentials
- Proxy auth: Uses currentPassword from JSON body when behind proxy Basic Auth
- Prevents proxy auth from interfering with Pulse's own authentication
- Maintains security by always requiring current password verification
- Fixed SendEnhancedWebhook to use service-specific payload generation
- Test webhooks now properly skip template-syntax headers
- ntfy and other plain text services correctly skip JSON validation
- Prevents 'invalid character' errors when testing webhooks
- All webhook payload generation now respects service type
- Fixed webhook ID extraction in UpdateWebhook and DeleteWebhook handlers
- Previous code expected 5 URL parts but path only had 2 after prefix stripping
- Now correctly extracts webhook ID from /api/notifications/webhooks/{id}
- Resolves frontend error when saving webhook changes
Mock cluster endpoints were showing grey dots because the Online field
wasn't being set based on the node status. Now properly reflects the
node's online/offline status with green/grey indicators.
The acknowledge/unacknowledge/clear endpoints were returning 404 due to incorrect path trimming in HandleAlerts. The router was registered with /api/alerts/ but the handler was trimming /api/alerts, causing path parsing to be off by one character.
- Add dynamic metric fluctuations for VMs and containers in mock data
- Fix alert acknowledgment to dim instead of hide alerts
- Implement unacknowledge functionality with backend persistence
- Simplify alert UI to single-click toggle (remove selection system)
- Add proper hysteresis for alert resolution when metrics drop
- Fix SVG icon boundaries in alert displays
- Add webhook disable toggles for testing without notifications
- Fix frontend directory duplication issue (addresses frontend-modern recreation)
- Improve alert sorting to show most recent first
- Make mock system generate realistic metric changes for proper alert lifecycle
- Added ToFrontend() method to StateSnapshot for proper data conversion
- Modified /api/state endpoint to use frontend-formatted data
- Enhanced WebSocket store to handle tag data transformation consistently
- Ensures tags are properly converted between backend strings and frontend arrays
Mock mode was inadvertently clearing real node configuration when toggling.
Added protection to prevent SaveNodesConfig from modifying nodes.enc when
in mock mode. Mock and production data are now completely separated.
Alert IDs like 'pve1:qemu/101-cpu' contain slashes, which broke the URL path parsing.
Fixed by finding the /acknowledge or /clear suffix and extracting everything before it,
rather than splitting the path on slashes.
The acknowledge and clear alert endpoints were incorrectly parsing the alert ID from the URL path, causing 404 errors. Fixed the path extraction logic to properly handle the /api/alerts/{id}/acknowledge pattern.
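The suffix-based extraction from the fixes above can be sketched as follows (function name is illustrative):

```go
package main

import (
	"fmt"
	"strings"
)

// alertIDFromPath extracts the alert ID from paths such as
// /api/alerts/pve1:qemu/101-cpu/acknowledge. IDs may themselves contain
// slashes, so we locate the known action suffix and take everything
// before it instead of splitting the path on "/".
func alertIDFromPath(p string) (id, action string, ok bool) {
	p = strings.TrimPrefix(p, "/api/alerts/")
	for _, suffix := range []string{"/unacknowledge", "/acknowledge", "/clear"} {
		if strings.HasSuffix(p, suffix) {
			return strings.TrimSuffix(p, suffix), strings.TrimPrefix(suffix, "/"), true
		}
	}
	return "", "", false
}

func main() {
	id, action, ok := alertIDFromPath("/api/alerts/pve1:qemu/101-cpu/acknowledge")
	fmt.Println(id, action, ok) // prints pve1:qemu/101-cpu acknowledge true
}
```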