When updating PBS nodes through the node configuration UI, alert thresholds
were being reset to defaults. This was because alert overrides are stored
separately from node configuration and weren't being preserved during node updates.
The fix ensures that when a node is updated, the alert configuration (including
any custom threshold overrides) is reloaded and preserved. This applies to both
PBS and PVE nodes to ensure consistent behavior.
- handle PBS node status endpoint permission errors gracefully (returns nil instead of error for 403s)
- add required cf and timeframe parameters to RRD endpoint calls
- properly handle nil nodeStatus returns in monitor.go
These API calls now fail silently because PBS API tokens often lack the required permissions for these endpoints, which is expected behavior.
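A minimal sketch of the 403 handling described above. The `statusFromResponse` helper and the `NodeStatus` type are hypothetical names used for illustration; they are not taken from the Pulse source.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// NodeStatus is a hypothetical stand-in for the PBS node status payload.
type NodeStatus struct {
	CPU float64
}

// ErrUnexpectedStatus wraps any response that is neither 200 nor 403.
var ErrUnexpectedStatus = errors.New("unexpected HTTP status")

// statusFromResponse maps an HTTP status code to a result.
// A 403 yields (nil, nil): the token lacks permission, which is expected,
// so the caller sees "no data" rather than an error.
func statusFromResponse(code int, decoded *NodeStatus) (*NodeStatus, error) {
	switch code {
	case http.StatusOK:
		return decoded, nil
	case http.StatusForbidden:
		return nil, nil
	default:
		return nil, fmt.Errorf("%w: %d", ErrUnexpectedStatus, code)
	}
}

func main() {
	status, err := statusFromResponse(http.StatusForbidden, nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Callers must check for nil before dereferencing the status.
	if status == nil {
		fmt.Println("node status unavailable; skipping")
		return
	}
	fmt.Println("cpu:", status.CPU)
}
```

The nil check in the caller corresponds to the "properly handle nil nodeStatus returns" item in the list.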
- Add retry logic with delays to detectPVECluster function to handle API permission propagation
- Periodically re-check standalone nodes to detect if they're actually part of a cluster
- Increase timeout from 3 to 5 seconds for cluster detection attempts
- Skip retries for definitively standalone nodes (501 not implemented errors)
This addresses the issue where a newly added PVE cluster was not detected on the
first attempt and had to be deleted and re-added before it worked. The retry
mechanism gives API permissions time to fully propagate in Proxmox.
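The retry behavior can be sketched roughly as follows. This is not the actual detectPVECluster code: `detectWithRetry`, the `probe` callback, and both sentinel errors are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical sentinel errors; the real code inspects the Proxmox API response.
var (
	errNotImplemented = errors.New("501 not implemented")
	errTransient      = errors.New("permission not yet propagated")
)

// detectWithRetry calls probe up to attempts times and sleeps for delay
// between failed tries. A 501 means the node is definitively standalone,
// so it returns immediately without retrying.
func detectWithRetry(probe func() (bool, error), attempts int, delay time.Duration) (bool, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		isCluster, err := probe()
		if err == nil {
			return isCluster, nil
		}
		if errors.Is(err, errNotImplemented) {
			return false, nil
		}
		lastErr = err
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return false, lastErr
}

func main() {
	calls := 0
	probe := func() (bool, error) {
		calls++
		if calls < 3 {
			return false, errTransient // the first two attempts fail
		}
		return true, nil
	}
	isCluster, err := detectWithRetry(probe, 3, 10*time.Millisecond)
	fmt.Println(isCluster, err, calls)
}
```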
Instead of making individual API calls for each guest's metadata,
load all metadata once at the Dashboard level and pass it down as props.
This reduces hundreds of HTTP requests to just one when dealing with
large deployments.
With 800 guests, this changes from 800 individual requests to 1 batch request.
The README previously claimed that alerts fire when 'VMs go down', but only node-down detection is currently implemented. Updated it to state accurately that alerts cover nodes, not individual VMs/containers.
The cluster client was incorrectly marking nodes as unhealthy when encountering
VM-specific QEMU guest agent errors. This caused storage and backup operations
to fail with "no healthy nodes available" even though the nodes were actually
accessible.
Changes:
- Added broader detection for guest agent errors in executeWithFailover
- Updated recovery logic to ignore VM-specific errors when recovering nodes
- Guest agent errors no longer affect node health status
This fixes the issue where users with clusters would see storage and backup
operations fail after any VM without a guest agent was queried.
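A rough sketch of the idea, assuming guest agent failures can be recognized by their message text. The substring match, the `nodeHealth` type, and `recordResult` are illustrative assumptions, not the logic in executeWithFailover.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// nodeHealth is a hypothetical per-node health record.
type nodeHealth struct {
	Healthy bool
}

// Sample errors for the demonstration below.
var (
	errAgent = errors.New("QEMU guest agent is not running")
	errConn  = errors.New("connection refused")
)

// isGuestAgentError reports whether err looks like a VM-specific guest agent
// failure. The substring match is an assumption, not Pulse's actual list.
func isGuestAgentError(err error) bool {
	if err == nil {
		return false
	}
	return strings.Contains(strings.ToLower(err.Error()), "guest agent")
}

// recordResult updates node health after a request. Guest agent errors say
// nothing about the node itself, so they leave the health flag unchanged.
func recordResult(n *nodeHealth, err error) {
	switch {
	case err == nil:
		n.Healthy = true
	case isGuestAgentError(err):
		// VM-specific problem: keep the node's current status.
	default:
		n.Healthy = false
	}
}

func main() {
	node := &nodeHealth{Healthy: true}
	recordResult(node, errAgent)
	fmt.Println("after guest agent error:", node.Healthy)
	recordResult(node, errConn)
	fmt.Println("after connection error:", node.Healthy)
}
```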
- Creates ~/.pulse marker file after successful install/update
- Addresses vhsdream's request in PR #7519
- Helps Community Scripts track that Pulse has been installed
- Improves compatibility between installation methods
- Use centralized detectServiceName() function instead of duplicate logic
- Automatically detect whether system uses 'pulse' or 'pulse-backend' service
- Improves compatibility between official and community installer scripts
- Reduces confusion when users mix installation methods
- Calculate memory as (Total - Available) instead of raw Used value
- Excludes buffer/cache memory that Linux can reclaim when needed
- Prevents false alerts from Linux cache usage
- Falls back to traditional calculation on older Proxmox versions
- VMs already use FreeMem from guest agent when available
- Memory usage will appear lower but more accurate (e.g., 56% instead of 84%)
- Users may need to adjust alert thresholds accordingly
- Add Available field to MemoryStatus struct to capture memory available for allocation
- Update node memory calculation to use Available memory when present
- This excludes reclaimable cache/buffers from used memory calculation
- Provides more accurate memory pressure indication, avoiding false alerts
- Falls back to traditional used memory if Available field is missing (older Proxmox versions)
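The calculation with its fallback can be sketched like this. Only the Available field is named in the notes above; the other struct fields and the helper names are assumptions for the example.

```go
package main

import "fmt"

// MemoryStatus mirrors the fields relevant to the calculation.
// Only Available is named in the notes; Total and Used are assumed.
type MemoryStatus struct {
	Total     uint64
	Used      uint64
	Available uint64 // 0 when the API omits the field (older Proxmox versions)
}

// usedMemory returns Total - Available when Available is present and sane,
// and falls back to the raw Used value otherwise.
func usedMemory(m MemoryStatus) uint64 {
	if m.Available > 0 && m.Available <= m.Total {
		return m.Total - m.Available
	}
	return m.Used
}

// usagePercent converts the used value into a percentage of Total.
func usagePercent(m MemoryStatus) float64 {
	if m.Total == 0 {
		return 0
	}
	return float64(usedMemory(m)) * 100 / float64(m.Total)
}

func main() {
	// Raw Used counts cache as used (84%); Available excludes it (56%).
	withAvailable := MemoryStatus{Total: 100, Used: 84, Available: 44}
	legacy := MemoryStatus{Total: 100, Used: 84}
	fmt.Printf("with Available: %.0f%%\n", usagePercent(withAvailable))
	fmt.Printf("fallback:       %.0f%%\n", usagePercent(legacy))
}
```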
When a threshold is set to 100%, it now effectively disables alerts for that metric.
This allows users to turn off specific alerts without disabling all alerts for a resource.
Also clears any existing alerts when threshold is changed to 100%.
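As an illustration of this rule, here is a small sketch. The `alertState` type, the `>=` comparison, and the clear-on-recovery behavior are assumptions; only the treatment of a 100% threshold comes from the description above.

```go
package main

import "fmt"

// alertState tracks active alerts by metric name (hypothetical structure).
type alertState struct {
	active map[string]bool
}

// evaluate applies threshold to value and reports whether an alert is active.
// A threshold of 100 or more disables the metric and clears any existing alert.
func (s *alertState) evaluate(metric string, value, threshold float64) bool {
	if threshold >= 100 {
		delete(s.active, metric)
		return false
	}
	if value >= threshold {
		s.active[metric] = true
		return true
	}
	delete(s.active, metric)
	return false
}

func main() {
	s := &alertState{active: make(map[string]bool)}
	fmt.Println(s.evaluate("cpu", 95, 90))  // alert raised at a 90% threshold
	fmt.Println(s.evaluate("cpu", 95, 100)) // threshold 100 disables the alert
	fmt.Println(len(s.active))              // the existing alert was cleared
}
```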
- Changed logic to always query guest agent when available, not just when disk is 0
- This fixes an issue where Proxmox returns incorrect non-zero values from cluster/resources
- Guest agent data is now preferred over cluster/resources data for all running VMs
- Improved logging to show when we're replacing cluster data with guest agent data
This should resolve the issue reported by FaboulousSan where VMs were showing
host disk space instead of actual VM disk usage.
- Add comprehensive filtering for network filesystems (NFS, CIFS, SMB, FUSE, 9p)
- Skip Docker volumes, snap mounts, and other special mountpoints
- Add detailed logging to track which filesystems are included/excluded
- Add sanity check to detect when the reported disk size is far larger than the allocated size
- Improve logging with GB values and more context for debugging
This should prevent Pulse from accidentally including host disk space or
network shares when calculating VM disk usage. Users can use the existing
diagnostics system in the UI to troubleshoot VM disk issues.
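A filter along these lines might look like the sketch below. The exact filesystem type strings and mountpoint prefixes are assumptions chosen for the example, not the list Pulse uses.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldSkipFilesystem reports whether a guest filesystem should be left out
// of the VM disk total. The type names and path prefixes are illustrative.
func shouldSkipFilesystem(fsType, mountpoint string) bool {
	t := strings.ToLower(fsType)
	switch t {
	case "nfs", "nfs4", "cifs", "smb", "smbfs", "9p":
		return true // network or host-shared filesystems
	}
	if strings.HasPrefix(t, "fuse") {
		return true // covers "fuse" and variants such as "fuse.sshfs"
	}
	for _, prefix := range []string{"/var/lib/docker", "/snap"} {
		if mountpoint == prefix || strings.HasPrefix(mountpoint, prefix+"/") {
			return true // Docker volumes and snap mounts
		}
	}
	return false
}

func main() {
	fmt.Println(shouldSkipFilesystem("ext4", "/"))
	fmt.Println(shouldSkipFilesystem("nfs4", "/mnt/share"))
	fmt.Println(shouldSkipFilesystem("squashfs", "/snap/core/123"))
}
```

Matching on `prefix+"/"` rather than the bare prefix keeps unrelated paths such as `/snapshots` in the total.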
The disk monitoring backend was working, but the frontend wasn't updating because the WebSocket store was missing the handler for physicalDisks data. Also added the physicalDisks count to broadcast logging for better debugging.
- Added DiskHealthSummary widget to dashboard showing:
  - Total disk health status overview
  - Healthy/failing/low-life disk counts
  - Average SSD life remaining with visual bar
  - Distribution of disks across nodes
- Added disk count badges to node selector in storage tab
  - Shows disk counts next to storage pools count per node
- Webhook notifications automatically trigger for disk alerts via existing system
- Dashboard widget highlights issues with color-coded status indicators
- Added disk polling to monitoring cycle using Proxmox API
- Created CheckDiskHealth() alert manager for failing drives and low SSD life
- Added PhysicalDisk model to state with proper serialization
- Implemented DiskList component with health indicators and SSD wearout bars
- Added Physical Disks tab to Storage page with toggle between pools and disks
- Added ZFS health badges to storage cards for degraded/failed pools
- Alerts trigger when health != PASSED or when SSD wearout < 10%
- Frontend displays disk model, type, temperature, and usage information
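The alert condition can be sketched as follows. The fields on `PhysicalDisk`, the use of a negative wearout value for "unknown", and the `diskAlertReason` helper are assumptions for illustration; this is not the CheckDiskHealth() implementation.

```go
package main

import "fmt"

// PhysicalDisk holds the two values the alert rule needs (assumed fields).
type PhysicalDisk struct {
	Health  string // SMART status, e.g. "PASSED"
	Wearout int    // SSD life remaining in percent; negative when unknown
}

// diskAlertReason returns why a disk should alert, or "" when it is fine.
func diskAlertReason(d PhysicalDisk) string {
	if d.Health != "PASSED" {
		return "health"
	}
	if d.Wearout >= 0 && d.Wearout < 10 {
		return "wearout"
	}
	return ""
}

func main() {
	disks := []PhysicalDisk{
		{Health: "PASSED", Wearout: 87},
		{Health: "FAILED", Wearout: 50},
		{Health: "PASSED", Wearout: 4},
		{Health: "PASSED", Wearout: -1}, // e.g. a spinning disk with no wear data
	}
	for _, d := range disks {
		fmt.Printf("%+v -> %q\n", d, diskAlertReason(d))
	}
}
```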
- Add API validation for cluster nodes to filter out qdevice VMs
- Only include nodes with working Proxmox APIs in cluster endpoints
- Prevent connection failures when cluster has non-Proxmox participants
- Add detailed logging for cluster node validation process
This resolves issues where Proxmox clusters using corosync qdevice
(external quorum device) would fail to connect because Pulse tried
to connect to the qdevice VM which has no Proxmox API.
- Add client-side URL validation with instant feedback
- Show validation errors inline below URL input fields
- Prevent saving when URLs have validation errors
- Improve error message extraction in API client
- Handle incomplete URLs like 'https://emby.' gracefully
- Backend already had validation; the frontend now surfaces its errors properly
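For reference, a backend-style check that rejects incomplete URLs such as 'https://emby.' could look like the sketch below. The rules (http/https only, no trailing dot in the host) and the name `validateGuestURL` are assumptions for the example; the actual validation may differ.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// validateGuestURL applies a few illustrative rules to a guest URL.
// Rejecting a trailing dot is a deliberate simplification: it catches
// half-typed hosts such as "emby." at the cost of refusing rooted FQDNs.
func validateGuestURL(raw string) error {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return fmt.Errorf("invalid URL: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return errors.New("URL must start with http:// or https://")
	}
	host := u.Hostname()
	if host == "" || strings.HasSuffix(host, ".") {
		return errors.New("URL has an incomplete host name")
	}
	return nil
}

func main() {
	for _, raw := range []string{"https://emby.", "https://emby.local:8096", "emby.local"} {
		fmt.Printf("%-26s %v\n", raw, validateGuestURL(raw))
	}
}
```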
- Add more specific error messages when metadata save fails
- Better handling of permission and disk space errors
- This should help diagnose why guest URLs fail to save in some cases
- The atomic write operation was already in place but errors weren't clear
- Skip auth check entirely in App.tsx for development
- Add .env.dev file with DISABLE_AUTH=true and PULSE_MOCK_MODE=true
- Update hot-dev.sh to load .env.dev environment variables
- This ensures the app loads immediately without auth issues
- WebSocket and API now work without authentication in dev mode
- Changed initial isLoading state to false to prevent infinite loading
- Initialize WebSocket store immediately on component mount
- Added error handling and debug logging to identify issues
- Added 10-second timeout fallback for auth checks
- The auth check was hanging, preventing the app from ever loading
- Fixed Resource interface to properly define all used properties
- Added proper optional chaining for potentially undefined values
- Fixed displayValue to always return a number type
- Properly handle undefined thresholds and defaults in event handlers
- Fixed input value handling to work with strict TypeScript checks
- Fixed PBS alert toggle not responding in thresholds settings
- PBS servers now use connectivity toggle like nodes instead of disabled toggle
- Added support for disableConnectivity flag on PBS instances in backend
- Fixed PBS ID format mismatch between frontend and backend
- PBS offline alerts now properly respect the disableConnectivity setting
- Prevents spam alerts by checking disableConnectivity flag for PBS offline alerts
- Always query guest agent for running VMs instead of only when disk is 0
- Add duplicate mount point detection to prevent inflated disk totals
- Show allocated disk size as fallback when guest agent unavailable
- Add comprehensive logging for guest agent disk queries
- Include diagnostic script for troubleshooting VM disk issues
- Always query guest agent for running VMs (cluster/resources API always returns 0)
- Show allocated disk size when guest agent unavailable (instead of misleading 0%)
- Fix duplicate mount point counting issue (#425)
- Add comprehensive logging for guest agent queries
- Include diagnostic script for troubleshooting VM disk issues
- Update both monitor.go and monitor_optimized.go for consistency
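The duplicate mount point handling and the allocated-size fallback can be sketched together. The `Filesystem` type and both helper functions are hypothetical; they show one way to avoid counting a mount point twice, under the assumption that duplicates share the same mountpoint string.

```go
package main

import "fmt"

// Filesystem is a hypothetical slice of the guest agent's fsinfo result.
type Filesystem struct {
	Mountpoint string
	TotalBytes uint64
	UsedBytes  uint64
}

// aggregateDisk sums filesystems, counting each mount point only once so
// that duplicate entries do not inflate the totals.
func aggregateDisk(filesystems []Filesystem) (total, used uint64) {
	seen := make(map[string]bool)
	for _, fs := range filesystems {
		if seen[fs.Mountpoint] {
			continue
		}
		seen[fs.Mountpoint] = true
		total += fs.TotalBytes
		used += fs.UsedBytes
	}
	return total, used
}

// diskTotals prefers guest agent data. Without it, the allocated size is
// returned and usageKnown is false so the UI can avoid showing 0%.
func diskTotals(allocated uint64, filesystems []Filesystem) (total, used uint64, usageKnown bool) {
	if len(filesystems) == 0 {
		return allocated, 0, false
	}
	total, used = aggregateDisk(filesystems)
	return total, used, true
}

func main() {
	fsList := []Filesystem{
		{Mountpoint: "/", TotalBytes: 100, UsedBytes: 40},
		{Mountpoint: "/", TotalBytes: 100, UsedBytes: 40}, // duplicate entry
		{Mountpoint: "/data", TotalBytes: 200, UsedBytes: 50},
	}
	fmt.Println(diskTotals(500, fsList))
	fmt.Println(diskTotals(500, nil))
}
```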
- Prevent path traversal attacks by cleaning and validating URL paths
- Use secure token comparison to prevent timing attacks
- Return appropriate HTTP status codes for different attack vectors
- Add comprehensive logging for security events
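Both protections can be illustrated with the Go standard library. The helper names `cleanPath` and `tokensEqual` are assumptions, and hashing before comparison is one common way to hide the token length; the actual implementation may differ.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
	"path"
	"strings"
)

// cleanPath rejects any path containing a ".." segment and returns the
// normalized form of everything else.
func cleanPath(p string) (string, bool) {
	if !strings.HasPrefix(p, "/") {
		return "", false
	}
	for _, seg := range strings.Split(p, "/") {
		if seg == ".." {
			return "", false
		}
	}
	return path.Clean(p), true
}

// tokensEqual compares two tokens in constant time. Hashing first gives both
// inputs the same length, so the comparison does not leak the token length.
func tokensEqual(got, want string) bool {
	g := sha256.Sum256([]byte(got))
	w := sha256.Sum256([]byte(want))
	return subtle.ConstantTimeCompare(g[:], w[:]) == 1
}

func main() {
	fmt.Println(cleanPath("/api//state/"))
	fmt.Println(cleanPath("/api/../etc/passwd"))
	fmt.Println(tokensEqual("secret-token", "secret-token"))
	fmt.Println(tokensEqual("secret-token", "secret-tokem"))
}
```

A plain `==` on strings can return early at the first differing byte, which is the timing signal the constant-time comparison avoids.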
The parallel optimization introduced in commit 634e0dd37 accidentally removed
all guest agent filesystem fetching logic from the optimized monitor code.
This caused VMs with guest agents to show no disk stats after v4.12.1.
Added back the guest agent fetching logic to pollVMsWithNodesOptimized:
- Fetches filesystem info when VM disk stats are 0
- Aggregates disk usage from all valid filesystems
- Skips special filesystems and Windows System Reserved partitions
- Uses guest agent data when available to show accurate disk usage
This restores disk stats display for VMs with working QEMU guest agents.
- Implement proper API integration with list and detail endpoints
- Add ZFS pool and device status conversion
- Enable by default with PULSE_DISABLE_ZFS_MONITORING opt-out
- Test with real Proxmox nodes and verify functionality
- Add comprehensive error handling and logging
- Document feature configuration and requirements
The feature now properly:
- Fetches ZFS pool status from Proxmox API
- Detects degraded/faulted pools and devices
- Tracks read/write/checksum errors
- Generates appropriate alerts
- Displays issues in the Storage tab UI
Tested and verified working with real Proxmox clusters.
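The detection rules listed above can be sketched as a single pass over a pool and its devices. The `ZFSPool` and `ZFSDevice` types, the state strings, and the message wording are assumptions for the example rather than the structures used in the codebase.

```go
package main

import "fmt"

// ZFSDevice and ZFSPool are hypothetical models of the pool status data.
type ZFSDevice struct {
	Name           string
	State          string
	ReadErrors     uint64
	WriteErrors    uint64
	ChecksumErrors uint64
}

type ZFSPool struct {
	Name    string
	State   string
	Devices []ZFSDevice
}

// unhealthy reports whether a state string is degraded or faulted.
func unhealthy(state string) bool {
	return state == "DEGRADED" || state == "FAULTED"
}

// poolIssues lists every condition that should raise an alert for the pool.
func poolIssues(p ZFSPool) []string {
	var issues []string
	if unhealthy(p.State) {
		issues = append(issues, fmt.Sprintf("pool %s is %s", p.Name, p.State))
	}
	for _, d := range p.Devices {
		if unhealthy(d.State) {
			issues = append(issues, fmt.Sprintf("device %s is %s", d.Name, d.State))
		}
		if d.ReadErrors+d.WriteErrors+d.ChecksumErrors > 0 {
			issues = append(issues, fmt.Sprintf("device %s has read/write/checksum errors", d.Name))
		}
	}
	return issues
}

func main() {
	pool := ZFSPool{
		Name:  "tank",
		State: "DEGRADED",
		Devices: []ZFSDevice{
			{Name: "sda", State: "ONLINE", ChecksumErrors: 3},
			{Name: "sdb", State: "FAULTED"},
		},
	}
	for _, issue := range poolIssues(pool) {
		fmt.Println(issue)
	}
}
```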
- Added debug mode: localStorage.setItem('debug-pmg', 'true')
- Robust VMID=0 detection handles string and number types
- Debug logging shows exactly what's happening with PMG backups
- Created test suite that verifies all PMG backup scenarios
- All test cases pass including PBS 'ct' type with VMID='0'
Users experiencing issues can enable debug mode to help diagnose:
1. Open browser console
2. Run: localStorage.setItem('debug-pmg', 'true')
3. Reload page and check for [PMG Debug] messages
4. Share debug output if still showing as LXC
Test results:
✓ PBS PMG backup (ct type with VMID 0) → Host
✓ PBS PMG backup (ct type with numeric VMID 0) → Host
✓ Storage PMG backup (host type) → Host
✓ Storage PMG backup (lxc type with VMID 0) → Host
✓ Regular LXC backup → LXC
- Handle VMID as both string and number types consistently
- Check for both 'ct' and 'lxc' backup types (PBS uses 'ct')
- Check for both 'vm' and 'qemu' backup types for consistency
- Always check VMID=0 first before checking backup type
- PBS stores PMG backups as 'ct' type with VMID='0' (string)
This should properly identify all PMG host config backups regardless
of whether they come from PBS or regular storage, and regardless
of whether VMID is a string or number.
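The classification order can be sketched as follows. The real check runs in the frontend; this version is written in Go purely for illustration, and `isVMIDZero`, `classifyBackup`, and the "Unknown" label are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isVMIDZero reports whether vmid represents 0, whether it arrives as a
// string ("0") or as a number.
func isVMIDZero(vmid interface{}) bool {
	switch v := vmid.(type) {
	case string:
		n, err := strconv.Atoi(strings.TrimSpace(v))
		return err == nil && n == 0
	case int:
		return v == 0
	case float64: // encoding/json decodes JSON numbers into float64
		return v == 0
	default:
		return false
	}
}

// classifyBackup checks VMID 0 first, then falls back to the backup type,
// accepting both naming variants ('ct'/'lxc' and 'vm'/'qemu').
func classifyBackup(backupType string, vmid interface{}) string {
	if isVMIDZero(vmid) {
		return "Host"
	}
	switch backupType {
	case "host":
		return "Host"
	case "ct", "lxc":
		return "LXC"
	case "vm", "qemu":
		return "VM"
	}
	return "Unknown"
}

func main() {
	fmt.Println(classifyBackup("ct", "0"))   // PBS PMG backup, string VMID
	fmt.Println(classifyBackup("lxc", 0))    // storage PMG backup, numeric VMID
	fmt.Println(classifyBackup("lxc", 101))  // regular container backup
	fmt.Println(classifyBackup("qemu", 100)) // regular VM backup
}
```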
Added console.log statements to understand why PMG backups with VMID=0
are still showing as LXC in v4.14.0. This will help identify:
- What data type vmid is (string vs number)
- What backup type is being sent
- Whether the checks are being triggered
- Add PULSE_ENABLE_ZFS_MONITORING env var (disabled by default)
- Fix API field mapping (health vs state, cksum vs checksum)
- Add proper API endpoint structures for list and detail
- Mark feature as experimental due to API complexity
- Simplify conversion to handle basic health status only
This is a safer approach until we can fully test with real Proxmox nodes.
- Add ZFS pool status data structures to models
- Implement ZFS pool data collection via Proxmox API
- Add ZFS pool health alerts for degraded/faulted states
- Add ZFS device error detection and alerting
- Display ZFS pool status in Storage tab when issues detected
- Add mock data generation for testing ZFS monitoring
- Alert on read/write/checksum errors for pools and devices