mirror of
https://github.com/rcourtman/Pulse.git
synced 2026-04-29 12:00:13 +00:00
ZFS Pool Monitoring
Pulse v4.15.0+ includes automatic ZFS pool health monitoring for Proxmox VE nodes.
Features
- Automatic Detection: Detects ZFS storage and monitors associated pools
- Health Status: Monitors pool state (ONLINE, DEGRADED, FAULTED, UNAVAIL)
- Error Tracking: Tracks read, write, and checksum errors
- Device Monitoring: Monitors individual devices within pools
- Alert Generation: Creates alerts for degraded pools and device errors
- Frontend Display: Shows ZFS issues inline with storage information
Requirements
Proxmox Permissions
The Pulse user needs Sys.Audit permission on /nodes/{node}/disks to access ZFS information:
# Grant permission for ZFS monitoring (already included in standard Pulse role)
pveum acl modify /nodes -user pulse-monitor@pam -role PVEAuditor
API Endpoints Used
- /nodes/{node}/disks/zfs - Lists ZFS pools
- /nodes/{node}/disks/zfs/{pool} - Gets detailed pool status
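To see what these endpoints return outside of Pulse, they can be queried directly. The following is a minimal sketch, assuming the standard Proxmox VE API base path (/api2/json on port 8006) and API-token authentication. The host name, the PVE_TOKEN environment variable, and the helper names zfs_api_url and fetch_zfs are illustrative and not part of Pulse.

```shell
# Build the Proxmox VE API URL for the ZFS pool list, or for a single
# pool's detail when a pool name is passed as the third argument.
zfs_api_url() {
  zu_host=$1
  zu_node=$2
  zu_pool=${3:-}
  zu_url="https://${zu_host}:8006/api2/json/nodes/${zu_node}/disks/zfs"
  if [ -n "$zu_pool" ]; then
    zu_url="${zu_url}/${zu_pool}"
  fi
  printf '%s\n' "$zu_url"
}

# Fetch the endpoint with curl. PVE_TOKEN is an assumed environment
# variable holding "user@realm!tokenid=secret" for a token that has
# Sys.Audit on the node.
fetch_zfs() {
  curl -fsS -H "Authorization: PVEAPIToken=${PVE_TOKEN}" "$(zfs_api_url "$@")"
}

# Example (not executed here):
#   fetch_zfs pve1.example.com pve1          # list pools on node pve1
#   fetch_zfs pve1.example.com pve1 rpool    # detailed status of rpool
```

On hosts with a self-signed certificate, curl also needs the CA passed via --cacert. On the Proxmox node itself, pvesh get /nodes/{node}/disks/zfs should reach the same data without a token.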
Configuration
ZFS monitoring is enabled by default in Pulse v4.15.0+.
Disabling ZFS Monitoring
If you want to disable ZFS monitoring (e.g., for performance reasons):
# Add to /opt/pulse/.env or environment
PULSE_DISABLE_ZFS_MONITORING=true
Alert Types
Pool State Alerts
- Warning: Pool is DEGRADED
- Critical: Pool is FAULTED or UNAVAIL
Error Alerts
- Warning: Any read/write/checksum errors detected
- Alerts include error counts and affected devices
Device Alerts
- Warning: Device has errors but is ONLINE
- Critical: Device is FAULTED or UNAVAIL
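The severity rules above can be summarized as a small lookup. This is a sketch of the documented rules only, not Pulse's actual implementation: the function name zfs_alert_level and the "ok" label are hypothetical, and treating a DEGRADED device as a warning is an assumption, since the rules above state that severity only for pools.

```shell
# Map a pool or device state plus a total error count to an alert level.
# Usage: zfs_alert_level STATE [ERROR_COUNT]
zfs_alert_level() {
  za_state=$1
  za_errors=${2:-0}
  case "$za_state" in
    FAULTED|UNAVAIL)
      echo critical
      ;;
    DEGRADED)
      # Assumption: DEGRADED devices are handled like DEGRADED pools.
      echo warning
      ;;
    *)
      # ONLINE (or any other state): any read/write/checksum error
      # still raises a warning.
      if [ "$za_errors" -gt 0 ]; then
        echo warning
      else
        echo ok
      fi
      ;;
  esac
}
```

For example, zfs_alert_level ONLINE 12 prints warning, which matches the "device has errors but is ONLINE" case.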
Frontend Display
ZFS issues appear in the Storage tab:
- Yellow warning bar for degraded pools
- Red error counts for devices with issues
- Detailed device status for troubleshooting
Performance Impact
- Adds 2 API calls per node with ZFS storage
- Typically adds less than 1 second to each polling cycle
- Only queries nodes that have ZFS storage
Troubleshooting
No ZFS Data Appearing
- Check permissions: pveum user permissions pulse-monitor@pam
- Verify ZFS pools exist: zpool list
- Check logs: grep ZFS /opt/pulse/pulse.log (raise log level to debug via Settings → System → Logging if you need more context, then switch back to info).
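When comparing what the node reports against what Pulse shows, it can help to reduce the zpool output to just the pools that are not healthy. The helper below is a sketch that assumes it is run on the Proxmox node and fed the tab-separated output of zpool list -H -o name,health. The function name unhealthy_pools is illustrative.

```shell
# Read "name<TAB>health" lines on stdin (the format produced by
# `zpool list -H -o name,health`) and print every pool whose health
# is anything other than ONLINE.
unhealthy_pools() {
  while IFS="$(printf '\t')" read -r up_name up_health; do
    # Skip blank lines.
    [ -n "$up_name" ] || continue
    if [ "$up_health" != "ONLINE" ]; then
      printf '%s %s\n' "$up_name" "$up_health"
    fi
  done
}

# Example (run on the node, not executed here):
#   zpool list -H -o name,health | unhealthy_pools
```

If this prints a pool that Pulse does not flag, the permission and log checks above are the next places to look.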
Permission Denied Errors
Grant the required permission:
pveum acl modify /nodes -user pulse-monitor@pam -role PVEAuditor
High API Load
Disable ZFS monitoring if not needed:
echo "PULSE_DISABLE_ZFS_MONITORING=true" >> /opt/pulse/.env
systemctl restart pulse
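Running the echo command more than once appends duplicate lines to the env file. A possible guard is sketched below. The helper name ensure_env_flag is hypothetical, and the sketch only checks whether the key is already present. It does not rewrite an existing line whose value differs.

```shell
# Append KEY=VALUE to an env file only if no line already sets KEY.
# An existing line is left as-is, even if its value differs.
# Prints "added" or "unchanged" so the caller can decide on a restart.
ensure_env_flag() {
  ef_file=$1
  ef_key=$2
  ef_value=$3
  if [ -f "$ef_file" ] && grep -q "^${ef_key}=" "$ef_file"; then
    echo unchanged
  else
    printf '%s=%s\n' "$ef_key" "$ef_value" >> "$ef_file"
    echo added
  fi
}

# Example (not executed here):
#   if [ "$(ensure_env_flag /opt/pulse/.env PULSE_DISABLE_ZFS_MONITORING true)" = added ]; then
#     systemctl restart pulse
#   fi
```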
Example Alert
Alert: ZFS pool 'rpool' is DEGRADED
Node: pve1
Pool: rpool
State: DEGRADED
Errors: 12 read, 0 write, 3 checksum
Device sdb2: DEGRADED with 12 read errors
This helps administrators identify failing drives before complete failure occurs.