# ZFS Pool Monitoring

Pulse v4.15.0+ includes automatic ZFS pool health monitoring for Proxmox VE nodes.

## Features

- **Automatic Detection**: Detects ZFS storage and monitors associated pools
- **Health Status**: Monitors pool state (ONLINE, DEGRADED, FAULTED, UNAVAIL)
- **Error Tracking**: Tracks read, write, and checksum errors
- **Device Monitoring**: Monitors individual devices within pools
- **Alert Generation**: Creates alerts for degraded pools and device errors
- **Frontend Display**: Shows ZFS issues inline with storage information

## Requirements

### Proxmox Permissions
The Pulse user needs `Sys.Audit` permission on `/nodes/{node}/disks` to access ZFS information:

```bash
# Grant permission for ZFS monitoring (already included in standard Pulse role)
pveum acl modify /nodes -user pulse-monitor@pam -role PVEAuditor
```

### API Endpoints Used
- `/nodes/{node}/disks/zfs` - Lists ZFS pools
- `/nodes/{node}/disks/zfs/{pool}` - Gets detailed pool status

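To see what these endpoints return, you can query them directly on a Proxmox VE node with `pvesh`. The following is a minimal sketch: the helper names `zfs_list_path` and `zfs_pool_path` are hypothetical, and `pve1` and `rpool` are placeholder values.

```bash
# Hypothetical helpers that build the two API paths listed above.
zfs_list_path() {
  # $1: node name
  printf '/nodes/%s/disks/zfs\n' "$1"
}

zfs_pool_path() {
  # $1: node name, $2: pool name
  printf '/nodes/%s/disks/zfs/%s\n' "$1" "$2"
}

# Usage on a Proxmox VE node (placeholder node and pool names):
#   pvesh get "$(zfs_list_path pve1)" --output-format json
#   pvesh get "$(zfs_pool_path pve1 rpool)" --output-format json
```

If both calls succeed for the Pulse user, the required permission is in place.
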
## Configuration

ZFS monitoring is **enabled by default** in Pulse v4.15.0+.

### Disabling ZFS Monitoring
If you want to disable ZFS monitoring (e.g., for performance reasons):

```bash
# Add to /opt/pulse/.env or environment
PULSE_DISABLE_ZFS_MONITORING=true
```

## Alert Types

### Pool State Alerts
- **Warning**: Pool is DEGRADED
- **Critical**: Pool is FAULTED or UNAVAIL

### Error Alerts
- **Warning**: Any read/write/checksum errors detected
- Alerts include error counts and affected devices

### Device Alerts
- **Warning**: Device has errors but is ONLINE
- **Critical**: Device is FAULTED or UNAVAIL

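The rules above can be sketched as a single mapping from state and error count to alert level. This is an illustration only, not Pulse's actual implementation: `zfs_alert_level` is a hypothetical helper, it assumes a DEGRADED device is treated like a DEGRADED pool, and it returns `unknown` for any state not listed above.

```bash
# Hypothetical helper: map a pool or device state to an alert level.
# $1: state (ONLINE, DEGRADED, FAULTED, UNAVAIL)
# $2: total read + write + checksum errors (defaults to 0)
zfs_alert_level() {
  local state="$1" errors="${2:-0}"
  case "$state" in
    FAULTED|UNAVAIL) echo "critical" ;;
    DEGRADED)        echo "warning" ;;
    ONLINE)
      # An ONLINE pool or device still warns if any errors were counted
      if [ "$errors" -gt 0 ]; then echo "warning"; else echo "none"; fi ;;
    *)               echo "unknown" ;;
  esac
}

# Usage:
#   zfs_alert_level DEGRADED 15
#   zfs_alert_level ONLINE 0
```
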
## Frontend Display

ZFS issues appear in the Storage tab:
- Yellow warning bar for degraded pools
- Red error counts for devices with issues
- Detailed device status for troubleshooting

## Performance Impact

- Adds 2 API calls per node with ZFS storage
- Typically adds less than 1 second to each polling cycle
- Only queries nodes that have ZFS storage

## Troubleshooting

### No ZFS Data Appearing
1. Check permissions: `pveum user permissions pulse-monitor@pam`
2. Verify ZFS pools exist: `zpool list`
3. Check logs: `grep ZFS /opt/pulse/pulse.log` (raise log level to `debug` via **Settings → System → Logging** if you need more context, then switch back to `info`).

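When comparing what Pulse shows against what ZFS itself reports, it can help to list only the pools that are not healthy. The sketch below filters the tab-separated output of `zpool list -H -o name,health`; `unhealthy_pools` is a hypothetical helper name.

```bash
# Hypothetical helper: read "name<TAB>health" lines on stdin and print
# only the pools whose health is not ONLINE.
unhealthy_pools() {
  awk -F '\t' '$2 != "ONLINE" { print $1 ": " $2 }'
}

# Usage on the Proxmox node (-H omits headers and separates fields with tabs):
#   zpool list -H -o name,health | unhealthy_pools
```

Any pool printed here should also show up as a ZFS issue in Pulse once monitoring is working.
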
### Permission Denied Errors
Grant the required permission:
```bash
pveum acl modify /nodes -user pulse-monitor@pam -role PVEAuditor
```

### High API Load
Disable ZFS monitoring if not needed:
```bash
echo "PULSE_DISABLE_ZFS_MONITORING=true" >> /opt/pulse/.env
systemctl restart pulse
```

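Appending with `>>` adds a new line each time it runs, so repeating the command leaves duplicate entries in the file. A minimal sketch of an idempotent alternative follows, assuming GNU `sed` and simple values without `|` or `&` characters; `set_env_var` is a hypothetical helper, not part of Pulse.

```bash
# Hypothetical helper: set KEY=VALUE in an env file without creating
# duplicate entries when it is run more than once.
# $1: env file path, $2: key, $3: value
set_env_var() {
  local file="$1" key="$2" value="$3"
  touch "$file"
  if grep -q "^${key}=" "$file"; then
    # Key already present: rewrite the existing line in place
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    # Key missing: append a new line
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Usage:
#   set_env_var /opt/pulse/.env PULSE_DISABLE_ZFS_MONITORING true
#   systemctl restart pulse
```
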
## Example Alert

```
Alert: ZFS pool 'rpool' is DEGRADED
Node: pve1
Pool: rpool
State: DEGRADED
Errors: 12 read, 0 write, 3 checksum
Device sdb2: DEGRADED with 12 read errors
```

This helps administrators identify failing drives before complete failure occurs.