diff --git a/docs/MIGRATION.md b/docs/MIGRATION.md index 3ea0b7d52..938c754ed 100644 --- a/docs/MIGRATION.md +++ b/docs/MIGRATION.md @@ -1,117 +1,57 @@ -# Migrating Pulse +# 🚚 Migrating Pulse -**Updated for Pulse v4.24.0** +**Updated for Pulse v4.24.0+** -## Quick Migration Guide +## πŸš€ Quick Migration Guide -### ❌ DON'T: Copy files directly -Never copy `/etc/pulse` or `/var/lib/pulse` directories between systems: -- The encryption key is tied to the files -- Credentials may be exposed -- Configuration may not work on different systems +### ❌ DON'T: Copy Files +Never copy `/etc/pulse` or `/var/lib/pulse` manually. Encryption keys and credentials will break. ### βœ… DO: Use Export/Import -#### Exporting (Old Server) -1. Open Pulse web interface -2. Go to **Settings** β†’ **Configuration Management** -3. Click **Export Configuration** -4. Enter a strong passphrase (you'll need this for import!) -5. Save the downloaded file securely +#### 1. Export (Old Server) +1. Go to **Settings β†’ Configuration Management**. +2. Click **Export Configuration**. +3. Enter a strong passphrase and save the `.enc` file. -#### Importing (New Server) -1. Install fresh Pulse instance -2. Open Pulse web interface -3. Go to **Settings** β†’ **Configuration Management** -4. Click **Import Configuration** -5. Select your exported file -6. Enter the same passphrase -7. Click Import -8. **Post-migration verification (v4.24.0+)**: - - Check scheduler health: `curl -s http://localhost:7655/api/monitoring/scheduler/health | jq` - - Verify adaptive polling status: **Settings β†’ System β†’ Monitoring** - - Confirm all nodes are connected and polling correctly +#### 2. Import (New Server) +1. Install a fresh Pulse instance. +2. Go to **Settings β†’ Configuration Management**. +3. Click **Import Configuration** and upload your file. +4. Enter the passphrase. 
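+A small sketch can make the export conventions concrete. The helper names below are illustrative and not part of Pulse; the dated `pulse-backup-YYYY-MM-DD.enc` naming and the 12-character passphrase minimum come from this guide:

```shell
#!/usr/bin/env bash
# Sketch only: pre-export helpers. Function names are illustrative, not Pulse APIs.

# Suggest a dated filename for the downloaded export, e.g. pulse-backup-2024-01-15.enc
export_filename() {
  printf 'pulse-backup-%s.enc\n' "$(date +%F)"
}

# Check a passphrase meets the 12-character minimum before using it in the UI.
passphrase_ok() {
  [ "${#1}" -ge 12 ]
}
```

+Usage: `export_filename` prints today's suggested name; `passphrase_ok "my passphrase"` returns non-zero for anything shorter than 12 characters.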
-## What Gets Migrated +## πŸ“¦ What Gets Migrated -βœ… **Included:** -- All PVE/PBS nodes and credentials -- Alert settings and thresholds -- Email configuration -- Webhook configurations -- System settings -- Guest metadata (custom URLs, notes) +| Included βœ… | Not Included ❌ | +| :--- | :--- | +| Nodes & Credentials | Historical Metrics | +| Alert Settings | Alert History | +| Email & Webhooks | Auth Settings (Passwords/Tokens) | +| System Settings | Update Rollback History | +| Guest Metadata | | -❌ **Not Included:** -- Historical metrics data -- Alert history -- Authentication settings (passwords, API tokens) -- **Updates rollback history** (v4.24.0+) -- Each instance should configure its own authentication -- **Note:** Updates rollback data isn't transferred and must be rebuilt by running one successful update cycle on the new host - -## Common Scenarios +## πŸ”„ Common Scenarios ### Moving to New Hardware -1. Export from old server -2. Shut down old Pulse instance -3. Install Pulse on new hardware -4. Import configuration -5. Verify all nodes are connected +Export from old β†’ Install new β†’ Import. -### Docker to Systemd (or vice versa) -The export/import process works across all installation methods: -- Docker β†’ Systemd βœ… -- Systemd β†’ Docker βœ… -- Docker β†’ LXC βœ… - -### Backup Strategy -**Weekly Backups:** -1. Export configuration weekly -2. Store exports with date: `pulse-backup-2024-01-15.enc` -3. Keep last 4 backups -4. Store passphrase securely (password manager) +### Docker ↔ Systemd ↔ LXC +The export file works across all installation methods. You can migrate from Docker to LXC or vice versa seamlessly. ### Disaster Recovery -1. Install Pulse: `curl -sL https://github.com/rcourtman/Pulse/releases/latest/download/install.sh | bash` -2. Import latest backup -3. System restored in under 5 minutes! +1. Install Pulse: `curl -sL https://github.com/rcourtman/Pulse/releases/latest/download/install.sh | bash` +2. Import your latest backup. +3. 
Restored in < 5 minutes. -## Security Notes +## πŸ”’ Security -- **Passphrase Protection**: Exports are encrypted with PBKDF2 (100,000 iterations) -- **Safe to Store**: Encrypted exports can be stored in cloud backups -- **Minimum 12 characters**: Use a strong passphrase -- **Password Manager**: Store your passphrase securely -- **Rollback History**: Updates rollback data isn't included in exports; rebuild by running one successful update on the new host +* **Encryption**: Exports are encrypted with PBKDF2 (100k iterations). +* **Storage**: Encrypted exports are safe to keep in cloud backups; store the passphrase itself in a password manager. +* **Passphrase**: Use a strong, unique passphrase (min 12 chars). -## Troubleshooting +## πŸ”§ Troubleshooting -**"Invalid passphrase" error** -- Ensure you're using the exact same passphrase -- Check for extra spaces or capitalization - -**Missing nodes after import** -- Verify the export was taken after adding the nodes -- Check Settings to ensure nodes are listed - -**Connection errors after import** -- Node IPs may have changed -- Update node addresses in Settings - -**Logging issues after migration (v4.24.0+)** -- If you lose logs after migration, ensure the runtime logging configuration persisted -- Toggle **Settings β†’ System β†’ Logging** to your desired level -- Check environment variables: `LOG_LEVEL`, `LOG_FORMAT` -- Verify log file rotation settings are correct - -## Pro Tips - -1. **Test imports**: Try importing on a test instance first -2. **Document changes**: Note any manual configs not in Pulse -3. **Version matching**: Best to import into same or newer Pulse version -4. **Network access**: Ensure new server can reach all nodes - ---- - -*Remember: Export/Import is the ONLY supported migration method. Direct file copying is not supported and may result in data loss.* \ No newline at end of file +* **"Invalid passphrase"**: Ensure exact match (case-sensitive). +* **Missing Nodes**: Verify the export was taken after the nodes were added.
+* **Connection Errors**: Update node IPs in Settings if they changed. +* **Logging**: Re-configure log levels in **Settings β†’ System β†’ Logging** if needed. \ No newline at end of file diff --git a/docs/PROXY_CONTROL_PLANE.md b/docs/PROXY_CONTROL_PLANE.md new file mode 100644 index 000000000..d18d3ba38 --- /dev/null +++ b/docs/PROXY_CONTROL_PLANE.md @@ -0,0 +1,40 @@ +# πŸ“‘ Proxy Control Plane + +The Control Plane synchronizes `pulse-sensor-proxy` instances with the Pulse server, ensuring they trust the correct nodes without manual configuration. + +## πŸ—οΈ Architecture + +```mermaid +graph LR + Pulse[Pulse Server] -- HTTPS /api/temperature-proxy --> Proxy[Sensor Proxy] + Proxy -- SSH --> Nodes[Cluster Nodes] +``` + +1. **Registration**: The proxy registers with Pulse on startup/install. +2. **Sync**: The proxy periodically fetches the "Authorized Nodes" list from Pulse. +3. **Validation**: The proxy only executes commands on nodes authorized by Pulse. + +## πŸ”„ Workflow + +1. **Install**: `install-sensor-proxy.sh` calls `/api/temperature-proxy/register`. +2. **Token Exchange**: Pulse returns a `ctrl_token` which the proxy saves to `/etc/pulse-sensor-proxy/.pulse-control-token`. +3. **Polling**: The proxy polls `/api/temperature-proxy/authorized-nodes` every 60s (configurable). +4. **Update**: If the node list changes (e.g., a new node is added to Pulse), the proxy updates its internal allowlist automatically. + +## βš™οΈ Configuration + +The proxy configuration in `/etc/pulse-sensor-proxy/config.yaml` handles the sync: + +```yaml +pulse_control_plane: + url: https://pulse.example.com:7655 + token_file: /etc/pulse-sensor-proxy/.pulse-control-token + refresh_interval: 60s +``` + +## πŸ›‘οΈ Security + +* **Tokens**: The `ctrl_token` is unique per proxy instance. +* **Least Privilege**: The proxy only knows about nodes explicitly added to Pulse. +* **Fallback**: If the control plane is unreachable, the proxy uses its last known good configuration. 
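+For manual debugging, the sync endpoint can be queried with the same token the proxy stores. The endpoint path, `X-Proxy-Token` header, token file location, and response shape below come from this document and the design notes it replaces; the Pulse URL is a placeholder, and the `node_names` parser is an illustration only (the proxy itself decodes the JSON natively):

```shell
#!/usr/bin/env bash
# Sketch: manually fetch the allowlist a proxy syncs. The endpoint, header, and
# token path are documented; the text parsing below is illustrative only.

PULSE_URL="${PULSE_URL:-https://pulse.example.com:7655}"
TOKEN_FILE="${TOKEN_FILE:-/etc/pulse-sensor-proxy/.pulse-control-token}"

fetch_authorized_nodes() {
  curl -fsS -H "X-Proxy-Token: $(cat "$TOKEN_FILE")" \
    "${PULSE_URL}/api/temperature-proxy/authorized-nodes"
}

# Print one node name per line from a response such as:
# {"nodes":[{"name":"delly","ip":"192.168.0.5"}],"hash":"sha256:...","refresh_interval":60}
node_names() {
  grep -o '"name"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/.*"\(.*\)"$/\1/'
}

# On a registered proxy host: fetch_authorized_nodes | node_names
```

+If the call returns an empty list or an auth error, re-run the installer's registration step to refresh the `ctrl_token`.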
+ diff --git a/docs/SCRIPT_LIBRARY.md b/docs/SCRIPT_LIBRARY.md new file mode 100644 index 000000000..99b5d1f2a --- /dev/null +++ b/docs/SCRIPT_LIBRARY.md @@ -0,0 +1,63 @@ +# πŸ“œ Script Library Guide + +This guide explains the shared Bash modules in `scripts/lib/` used for building installer scripts. + +## πŸ“‚ Structure + +| File | Purpose | +| :--- | :--- | +| `common.sh` | Logging, error handling, retry helpers, temp dirs. | +| `http.sh` | Curl/wget wrappers, GitHub release helpers. | +| `systemd.sh` | Systemd unit management helpers. | + +**Conventions:** +* **Namespaces:** Functions are exported as `module::function` (e.g., `common::run`). +* **Bundling:** `make bundle-scripts` inlines modules for distribution. +* **Compatibility:** Targets Bash 5 on Debian 11+ and Ubuntu LTS. + +## 🦴 Script Skeleton + +```bash +#!/usr/bin/env bash +set -euo pipefail + +LIB_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/lib" && pwd)" +# shellcheck source=../../scripts/lib/common.sh +source "${LIB_DIR}/common.sh" +# shellcheck source=../../scripts/lib/systemd.sh +source "${LIB_DIR}/systemd.sh" + +common::init "$@" +common::require_command curl tar + +main() { + common::log_info "Starting installer..." + common::temp_dir WORKDIR --prefix pulse- + + http::download --url "${URL}" --output "${WORKDIR}/pulse.tar.gz" + + systemd::create_service /etc/systemd/system/pulse.service <<'UNIT' +[Unit] +Description=Pulse Monitoring +UNIT + + systemd::enable_and_start pulse.service +} + +main "$@" +``` + +## πŸ› οΈ Best Practices + +* **Logging:** Use `common::log_info`, `common::log_warn`, etc. They respect `PULSE_LOG_LEVEL`. +* **Dry Run:** Wrap mutating commands in `common::run` to support `--dry-run`. +* **Testing:** Use `scripts/tests/run.sh` for linting and `scripts/tests/integration/` for scenarios. + +## πŸ“¦ Bundling + +1. Update `scripts/bundle.manifest`. +2. Run `make bundle-scripts`. +3. Verify `dist/` artifacts. + +**Note:** Never edit bundled artifacts manually. 
Always rebuild from source. + diff --git a/docs/ZFS_MONITORING.md b/docs/ZFS_MONITORING.md new file mode 100644 index 000000000..fbc99247e --- /dev/null +++ b/docs/ZFS_MONITORING.md @@ -0,0 +1,42 @@ +# πŸ’Ύ ZFS Pool Monitoring + +Pulse automatically detects and monitors ZFS pools on your Proxmox nodes. + +## πŸš€ Features + +* **Auto-Detection**: No configuration needed. +* **Health Status**: Tracks `ONLINE`, `DEGRADED`, and `FAULTED` states. +* **Error Tracking**: Monitors read, write, and checksum errors. +* **Alerts**: Notifies you of degraded pools or failing devices. + +## βš™οΈ Requirements + +The Pulse user needs `Sys.Audit` permission on `/nodes/{node}/disks` (included in the standard Pulse role). + +```bash +# Grant permission manually if needed +pveum acl modify /nodes -user pulse-monitor@pam -role PVEAuditor +``` + +## πŸ”§ Configuration + +ZFS monitoring is **enabled by default**. To disable it: + +```bash +# Add to /opt/pulse/.env +PULSE_DISABLE_ZFS_MONITORING=true +``` + +## 🚨 Alerts + +| Severity | Condition | +| :--- | :--- | +| **Warning** | Pool `DEGRADED` or any read/write/checksum errors. | +| **Critical** | Pool `FAULTED` or `UNAVAIL`. | + +## πŸ” Troubleshooting + +**No ZFS Data?** +1. Check permissions: `pveum user permissions pulse-monitor@pam`. +2. Verify pools exist: `zpool list`. +3. Check logs: `grep ZFS /opt/pulse/pulse.log`. diff --git a/docs/script-library-guide.md b/docs/script-library-guide.md deleted file mode 100644 index 4874dd108..000000000 --- a/docs/script-library-guide.md +++ /dev/null @@ -1,144 +0,0 @@ -# Pulse Script Library Guide - -This guide expands on `scripts/lib/README.md` and explains how the shared Bash -modules fit together when you are building or refactoring installer scripts. 
- ---- - -## Library Structure - -``` -scripts/ - lib/ - common.sh # Logging, error handling, retry helpers, temp dirs - http.sh # Curl/wget wrappers, GitHub release helpers - systemd.sh # Systemd unit management helpers - README.md # API-level reference -``` - -Key conventions: - -- **Namespaces:** Exported functions are declared as `module::function` (for - example `common::run`, `systemd::create_service`). Avoid referencing private - helpers (`module::__helper`) from other modules. -- **Development vs Bundled mode:** During local development scripts source - modules from `scripts/lib`. Bundled artifacts produced by - `make bundle-scripts` contain the modules inline, so the source guards remain - but resolve to no-ops. -- **Compatibility:** The library targets Bash 5 but must run on Debian 11+ - (Pulse LXC), Ubuntu LTS, and minimal container images. Stick to POSIX shell - built-ins or guarded GNU extensions. - ---- - -## Recommended Script Skeleton - -```bash -#!/usr/bin/env bash -set -euo pipefail - -LIB_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/lib" && pwd)" -# shellcheck source=../../scripts/lib/common.sh -source "${LIB_DIR}/common.sh" -# shellcheck source=../../scripts/lib/systemd.sh -source "${LIB_DIR}/systemd.sh" - -common::init "$@" -common::require_command curl tar - -main() { - common::log_info "Starting installer..." - common::temp_dir WORKDIR --prefix pulse- - download_payload - install_service -} - -download_payload() { - http::download --url "${PULSE_DOWNLOAD_URL}" --output "${WORKDIR}/pulse.tar.gz" -} - -install_service() { - systemd::create_service /etc/systemd/system/pulse.service <<'UNIT' -[Unit] -Description=Pulse Monitoring -After=network-online.target -UNIT - - systemd::enable_and_start pulse.service -} - -main "$@" -``` - -**Why this layout works** - -- `common::init` centralises logging/traps and stores the original CLI args so - you can re-exec under sudo if required (`common::ensure_root`). 
-- `common::temp_dir` registers a cleanup handler automatically to keep `/tmp` - tidy. -- `systemd::create_service` respects `--dry-run` flags and prevents partial - writes on failure. - ---- - -## Logging and Dry-Run Practices - -- Respect `PULSE_LOG_LEVEL` and `PULSE_DEBUG`β€”they are already wired into - `common::log_*`. -- Wrap mutating commands in `common::run` or `common::run_capture` to inherit - retry/backoff logic and `--dry-run` behaviour. -- Provide meaningful `--label` values on long-running steps to improve CI log - readability. - -Example: - -```bash -common::run --label "Extract Pulse binary" \ - -- tar -xzf "${ARCHIVE}" -C "${TARGET_DIR}" --strip-components=1 -``` - -When invoked with `--dry-run`, the command prints the operation instead of -executing and exits successfullyβ€”keep this in mind when writing tests. - ---- - -## Testing Strategy - -- **Smoke tests:** `scripts/tests/run.sh` lints scripts, validates manifests, and - exercises bundle generation. -- **Integration tests:** Place scenario-specific scripts under - `scripts/tests/integration/`. They should run quickly (<30s) and clean up - after themselves. -- **Manual verification:** For destructive operations (e.g., provisioning an LXC - container), run the script with `--dry-run` to confirm the steps before - executing against real infrastructure. - -When adding new library helpers, accompany them with unit coverage using -`bats` (found under `testing-tools/bats`) or an integration script that covers -the happy path and a failure case. - ---- - -## Bundling Checklist - -1. Update `scripts/bundle.manifest` with any newly created scripts. -2. Run `make bundle-scripts` (or `./scripts/bundle.sh`) to regenerate `dist/*`. -3. Inspect the diff to ensure only intentional changes appear. -4. Re-run `scripts/tests/run.sh` to catch lint and shellcheck regressions. - -Bundled files embed provenance metadata (timestamp + manifest path). 
Do not edit -bundled artifacts by handβ€”always rebuild from sources. - ---- - -## When to Extend the Library - -- You need to reuse logic across two or more scripts. -- A helper hides platform-specific differences (e.g., `systemctl` vs `service` - on legacy systems). -- The code is complex enough that centralised unit tests provide value. - -Document new functions in `scripts/lib/README.md` and update this guide if usage -patterns change. Keeping these references in sync helps future contributors -avoid copy/paste or undocumented conventions. - diff --git a/docs/temperature-proxy-control-plane.md b/docs/temperature-proxy-control-plane.md deleted file mode 100644 index ce7bbd4b4..000000000 --- a/docs/temperature-proxy-control-plane.md +++ /dev/null @@ -1,120 +0,0 @@ -# Pulse Temperature Proxy – Control Plane Sync - -## Goals - -1. Make `pulse-sensor-proxy` trust Pulse itself instead of scraping `pvecm`/editing `/etc/pve`. -2. Ensure host installers always create a pulse-proxy registration, regardless of socket vs HTTP mode. -3. Keep backwards compatibility: existing `allowed_nodes` entries remain a fallback cache, but the runtime source of truth is Pulse. - -## Overview - -``` -β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” HTTPS / Unix socket β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” -β”‚ Pulse server (LXC) β”‚ <═════════════════════════════> β”‚ pulse-sensor-proxy β”‚ -β”‚ β”‚ /api/... β”‚ (Proxmox host) β”‚ -β”‚ - Stores nodes β”‚ β”‚ - Collects temps β”‚ -β”‚ - Issues proxy tokenβ”‚ β”‚ - Validates node β”‚ -β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ via synced list β”‚ - β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ -``` - -1. Installer registers the proxy using `/api/temperature-proxy/register`. - - Response now includes `ctrl_token`, `instance_id`, and `allowed_nodes`. - - Pulse persists `{instance_id, ctrl_token, last_seen, allowed_nodes_cache}`. -2. 
Proxy writes: - ```yaml - pulse_control_plane: - url: https://pulse.example.com:7655 - token_file: /etc/pulse-sensor-proxy/.pulse-control-token - refresh_interval: 60s - ``` -3. Proxy boot sequence: - - Load cached `allowed_nodes` from YAML (fallback only). - - If `pulse_control_plane` configured, fetch `/api/temperature-proxy/authorized-nodes`. - - Replace in-memory allowlist atomically, log version/hash. - - Retry based on exponential backoff; stay on cached list if control plane unreachable. - -## API Changes (Pulse) - -1. **Extend existing registration endpoint** - - Request: `{hostname, proxy_url, kind}` (`kind` = `socket` or `http`). - - Response: `{success, token, ctrl_token, pve_instance, allowed_nodes, refresh_interval}`. - - Persist `ctrl_token` (or reuse `TemperatureProxyToken` field if `proxy_url` empty). -2. **New endpoint** `/api/temperature-proxy/authorized-nodes` - - Auth: `X-Proxy-Token: ` or `Authorization: Bearer`. - - Response: - ```json - { - "nodes": [ - {"name": "delly", "ip": "192.168.0.5"}, - {"name": "minipc", "ip": "192.168.0.134"} - ], - "hash": "sha256:...", - "refresh_interval": 60, - "updated_at": "2025-11-15T20:47:00Z" - } - ``` - - Uses Pulse config (`nodes.enc` + cluster endpoints) to build list. - - Derives `ip` from cluster endpoints or stored host value; duplicates removed. - - Logs when proxies pull list (metrics + last_seen). -3. **Persistence** - - `config.PVEInstance` already has `TemperatureProxyURL`/`Token`. Add `TemperatureProxyControlToken` or reuse existing field when URL empty. - - Add `LastProxyPull`, `LastAllowlistHash`. -4. **Access control** - - Router should treat `/api/temperature-proxy/authorized-nodes` as public but requiring proxy token (bypasses user auth). - - Rate limit per proxy (maybe 12/min). - -## Proxy Changes - -1. 
**Config additions** - ```yaml - pulse_control_plane: - url: https://pulse.lan:7655 - token_file: /etc/pulse-sensor-proxy/.pulse-control-token - refresh_interval: 60s # default - insecure_skip_verify: false - ``` -2. **Startup** - - Read token from `token_file`. - - Launch goroutine: `syncAllowlist(ctx)` loops: - 1. GET `/api/temperature-proxy/authorized-nodes`. - 2. Validate response (non-empty, verify hash changes). - 3. Replace `nodeValidator` allowlist in thread-safe way. - 4. Write new snapshot to `allowed_nodes_cache` (optional). - 5. Sleep `refresh_interval` (server-provided). - - If call fails: log warning, keep last known list, use fallback allowlist when empty. -3. **NodeValidator** - - Keep ability to parse static `allowed_nodes`. - - Add `SetAuthorizedNodes([]string)` to update hosts + CIDRs. - - When `hasAllowlist == false` but control-plane sync enabled, we never fall back to cluster detection. - - Provide metrics: last sync success timestamp, number of nodes, etc. - -## Installer Changes - -1. Host install path (`install.sh` invoking `install-sensor-proxy.sh`) - - Always pass `--pulse-server http://:`. - - If `--pulse-server` not supplied manually, `install-sensor-proxy.sh` fetches from `PULSE_SERVER` env. -2. `install-sensor-proxy.sh` - - After downloading binary, run registration: - ``` - ctrl_token=$(register_with_pulse "$PULSE_SERVER" "$SHORT_HOSTNAME" "$PROXY_URL" "$MODE") - echo "$ctrl_token" > /etc/pulse-sensor-proxy/.pulse-control-token - ``` - - Append control-plane block to config if not present. - - After install, call new authorized-nodes endpoint once to prime the cache. - - Continue merging `allowed_nodes` for fallback, but treat as `# Legacy fallback`. -3. Provide migration flag `--legacy-allowlist` to skip control plane (for air-gapped hosts). - -## Migration Plan - -1. Ship allowlist merge fix (already done locally) so reruns stop causing YAML errors. -2. 
Release intermediate version where installer accepts `--pulse-server` and registers proxies; proxy ignores new config fields until next release. -3. Release proxy with control-plane sync; ensure it tolerates missing control block (for older installs). -4. Update docs + UI to show last proxy sync state (diagnostics tab). - -## Open Questions / TODO - -- Decide whether ctrl_token reuses `TemperatureProxyToken` (rename field) or is separate. -- How to handle multiple Pulse servers controlling the same host (future?). For now, one ctrl token per PVE instance. -- Should HTTP-mode proxies reuse the same sync endpoint (yes). - diff --git a/docs/zfs-monitoring.md b/docs/zfs-monitoring.md deleted file mode 100644 index ca02d416b..000000000 --- a/docs/zfs-monitoring.md +++ /dev/null @@ -1,98 +0,0 @@ -# ZFS Pool Monitoring - -Pulse v4.15.0+ includes automatic ZFS pool health monitoring for Proxmox VE nodes. - -## Features - -- **Automatic Detection**: Detects ZFS storage and monitors associated pools -- **Health Status**: Monitors pool state (ONLINE, DEGRADED, FAULTED) -- **Error Tracking**: Tracks read, write, and checksum errors -- **Device Monitoring**: Monitors individual devices within pools -- **Alert Generation**: Creates alerts for degraded pools and device errors -- **Frontend Display**: Shows ZFS issues inline with storage information - -## Requirements - -### Proxmox Permissions -The Pulse user needs `Sys.Audit` permission on `/nodes/{node}/disks` to access ZFS information: - -```bash -# Grant permission for ZFS monitoring (already included in standard Pulse role) -pveum acl modify /nodes -user pulse-monitor@pam -role PVEAuditor -``` - -### API Endpoints Used -- `/nodes/{node}/disks/zfs` - Lists ZFS pools -- `/nodes/{node}/disks/zfs/{pool}` - Gets detailed pool status - -## Configuration - -ZFS monitoring is **enabled by default** in Pulse v4.15.0+. 
- -### Disabling ZFS Monitoring -If you want to disable ZFS monitoring (e.g., for performance reasons): - -```bash -# Add to /opt/pulse/.env or environment -PULSE_DISABLE_ZFS_MONITORING=true -``` - -## Alert Types - -### Pool State Alerts -- **Warning**: Pool is DEGRADED -- **Critical**: Pool is FAULTED or UNAVAIL - -### Error Alerts -- **Warning**: Any read/write/checksum errors detected -- Alerts include error counts and affected devices - -### Device Alerts -- **Warning**: Device has errors but is ONLINE -- **Critical**: Device is FAULTED or UNAVAIL - -## Frontend Display - -ZFS issues appear in the Storage tab: -- Yellow warning bar for degraded pools -- Red error counts for devices with issues -- Detailed device status for troubleshooting - -## Performance Impact - -- Adds 2 API calls per node with ZFS storage -- Typically adds <1 second to polling cycle -- Only queries nodes that have ZFS storage - -## Troubleshooting - -### No ZFS Data Appearing -1. Check permissions: `pveum user permissions pulse-monitor@pam` -2. Verify ZFS pools exist: `zpool list` -3. Check logs: `grep ZFS /opt/pulse/pulse.log` (raise log level to `debug` via **Settings β†’ System β†’ Logging** if you need more context, then switch back to `info`). - -### Permission Denied Errors -Grant the required permission: -```bash -pveum acl modify /nodes -user pulse-monitor@pam -role PVEAuditor -``` - -### High API Load -Disable ZFS monitoring if not needed: -```bash -echo "PULSE_DISABLE_ZFS_MONITORING=true" >> /opt/pulse/.env -systemctl restart pulse -``` - -## Example Alert - -``` -Alert: ZFS pool 'rpool' is DEGRADED -Node: pve1 -Pool: rpool -State: DEGRADED -Errors: 12 read, 0 write, 3 checksum -Device sdb2: DEGRADED with 12 read errors -``` - -This helps administrators identify failing drives before complete failure occurs.