Mirror of https://github.com/rcourtman/Pulse.git (synced 2026-05-12 05:45:27 +00:00)

Commit 53bfa2b2fd: Refactor remaining docs and standardize naming
Parent: 6bb11270a1
7 changed files with 183 additions and 460 deletions
@@ -1,117 +1,57 @@
-# Migrating Pulse
+# 🚚 Migrating Pulse

-**Updated for Pulse v4.24.0**
+**Updated for Pulse v4.24.0+**

-## Quick Migration Guide
+## 🚀 Quick Migration Guide

-### ❌ DON'T: Copy files directly
-Never copy `/etc/pulse` or `/var/lib/pulse` directories between systems:
-- The encryption key is tied to the files
-- Credentials may be exposed
-- Configuration may not work on different systems
+### ❌ DON'T: Copy Files
+Never copy `/etc/pulse` or `/var/lib/pulse` manually. Encryption keys and credentials will break.

 ### ✅ DO: Use Export/Import

-#### Exporting (Old Server)
-1. Open Pulse web interface
-2. Go to **Settings** → **Configuration Management**
-3. Click **Export Configuration**
-4. Enter a strong passphrase (you'll need this for import!)
-5. Save the downloaded file securely
+#### 1. Export (Old Server)
+1. Go to **Settings → Configuration Management**.
+2. Click **Export Configuration**.
+3. Enter a strong passphrase and save the `.enc` file.

-#### Importing (New Server)
-1. Install fresh Pulse instance
-2. Open Pulse web interface
-3. Go to **Settings** → **Configuration Management**
-4. Click **Import Configuration**
-5. Select your exported file
-6. Enter the same passphrase
-7. Click Import
-8. **Post-migration verification (v4.24.0+)**:
-   - Check scheduler health: `curl -s http://localhost:7655/api/monitoring/scheduler/health | jq`
-   - Verify adaptive polling status: **Settings → System → Monitoring**
-   - Confirm all nodes are connected and polling correctly
+#### 2. Import (New Server)
+1. Install a fresh Pulse instance.
+2. Go to **Settings → Configuration Management**.
+3. Click **Import Configuration** and upload your file.
+4. Enter the passphrase.

-## What Gets Migrated
+## 📦 What Gets Migrated

-✅ **Included:**
-- All PVE/PBS nodes and credentials
-- Alert settings and thresholds
-- Email configuration
-- Webhook configurations
-- System settings
-- Guest metadata (custom URLs, notes)
+| Included ✅ | Not Included ❌ |
+| :--- | :--- |
+| Nodes & Credentials | Historical Metrics |
+| Alert Settings | Alert History |
+| Email & Webhooks | Auth Settings (Passwords/Tokens) |
+| System Settings | Update Rollback History |
+| Guest Metadata | |

-❌ **Not Included:**
-- Historical metrics data
-- Alert history
-- Authentication settings (passwords, API tokens)
-  - Each instance should configure its own authentication
-- **Updates rollback history** (v4.24.0+)
-- **Note:** Updates rollback data isn't transferred and must be rebuilt by running one successful update cycle on the new host

-## Common Scenarios
+## 🔄 Common Scenarios

 ### Moving to New Hardware
-1. Export from old server
-2. Shut down old Pulse instance
-3. Install Pulse on new hardware
-4. Import configuration
-5. Verify all nodes are connected
+Export from old → Install new → Import.

-### Docker to Systemd (or vice versa)
-The export/import process works across all installation methods:
-- Docker → Systemd ✅
-- Systemd → Docker ✅
-- Docker → LXC ✅
-
-### Backup Strategy
-**Weekly Backups:**
-1. Export configuration weekly
-2. Store exports with date: `pulse-backup-2024-01-15.enc`
-3. Keep last 4 backups
-4. Store passphrase securely (password manager)
+### Docker ↔ Systemd ↔ LXC
+The export file works across all installation methods. You can migrate from Docker to LXC or vice versa seamlessly.

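The weekly-backup routine removed above (dated exports, keep the last 4) is easy to script. A minimal sketch: the export `.enc` file itself still comes from the Pulse UI or API, so this only handles naming and rotation, in a temp directory standing in for your real backup location (GNU `head`/`xargs` assumed).

```shell
#!/usr/bin/env bash
# Rotate dated Pulse config exports: keep only the 4 newest .enc files.
set -euo pipefail

BACKUP_DIR="$(mktemp -d)"   # stand-in for your real backup directory

# Simulate five weekly exports named as the doc suggests.
for d in 2024-01-01 2024-01-08 2024-01-15 2024-01-22 2024-01-29; do
  touch "${BACKUP_DIR}/pulse-backup-${d}.enc"
done

# YYYY-MM-DD names sort chronologically, so drop all but the last 4.
ls "${BACKUP_DIR}"/pulse-backup-*.enc | sort | head -n -4 | xargs -r rm --
ls "${BACKUP_DIR}"
```

Run this from cron after downloading each weekly export; the passphrase still belongs in a password manager, not next to the files.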
 ### Disaster Recovery
-1. Install Pulse: `curl -sL https://github.com/rcourtman/Pulse/releases/latest/download/install.sh | bash`
-2. Import latest backup
-3. System restored in under 5 minutes!
+1. Install Pulse: `curl -sL https://github.com/rcourtman/Pulse/releases/latest/download/install.sh | bash`
+2. Import your latest backup.
+3. Restored in < 5 minutes.

-## Security Notes
+## 🔒 Security

-- **Passphrase Protection**: Exports are encrypted with PBKDF2 (100,000 iterations)
-- **Safe to Store**: Encrypted exports can be stored in cloud backups
-- **Minimum 12 characters**: Use a strong passphrase
-- **Password Manager**: Store your passphrase securely
-- **Rollback History**: Updates rollback data isn't included in exports; rebuild by running one successful update on the new host
+* **Encryption**: Exports are encrypted with PBKDF2 (100k iterations).
+* **Storage**: Safe to store in cloud backups or password managers.
+* **Passphrase**: Use a strong, unique passphrase (min 12 chars).

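To see what PBKDF2-based passphrase encryption looks like in general, here is an illustrative roundtrip with `openssl`. This is not Pulse's actual export format (cipher choice and container layout here are assumptions), so it will not decrypt a real `.enc` export; it only demonstrates the scheme the security notes describe.

```shell
#!/usr/bin/env bash
# Illustration only: encrypt/decrypt with a PBKDF2-derived key (100k
# iterations), mirroring the parameters the doc mentions. Pulse's real
# export container differs.
set -euo pipefail

WORK="$(mktemp -d)"
PASS='correct-horse-battery-staple'   # >= 12 chars, as advised

echo 'nodes: [pve1, pve2]' > "${WORK}/config.txt"

# Encrypt: salt + PBKDF2 key derivation + AES-256.
openssl enc -aes-256-cbc -pbkdf2 -iter 100000 -salt \
  -pass "pass:${PASS}" -in "${WORK}/config.txt" -out "${WORK}/config.enc"

# Decrypt with the same passphrase.
openssl enc -d -aes-256-cbc -pbkdf2 -iter 100000 \
  -pass "pass:${PASS}" -in "${WORK}/config.enc" -out "${WORK}/roundtrip.txt"
```

The `config.enc` output is safe to park in cloud storage; without the passphrase the iteration count makes brute force expensive.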
-## Troubleshooting
+## 🔧 Troubleshooting

-**"Invalid passphrase" error**
-- Ensure you're using the exact same passphrase
-- Check for extra spaces or capitalization
-
-**Missing nodes after import**
-- Verify the export was taken after adding the nodes
-- Check Settings to ensure nodes are listed
-
-**Connection errors after import**
-- Node IPs may have changed
-- Update node addresses in Settings
-
-**Logging issues after migration (v4.24.0+)**
-- If you lose logs after migration, ensure the runtime logging configuration persisted
-- Toggle **Settings → System → Logging** to your desired level
-- Check environment variables: `LOG_LEVEL`, `LOG_FORMAT`
-- Verify log file rotation settings are correct
-
-## Pro Tips
-1. **Test imports**: Try importing on a test instance first
-2. **Document changes**: Note any manual configs not in Pulse
-3. **Version matching**: Best to import into same or newer Pulse version
-4. **Network access**: Ensure new server can reach all nodes
-
----
-
-*Remember: Export/Import is the ONLY supported migration method. Direct file copying is not supported and may result in data loss.*
+* **"Invalid passphrase"**: Ensure exact match (case-sensitive).
+* **Missing Nodes**: Verify export date.
+* **Connection Errors**: Update node IPs in Settings if they changed.
+* **Logging**: Re-configure log levels in **Settings → System → Logging** if needed.
docs/PROXY_CONTROL_PLANE.md (new file, 40 lines)
@@ -0,0 +1,40 @@
# 📡 Proxy Control Plane

The Control Plane synchronizes `pulse-sensor-proxy` instances with the Pulse server, ensuring they trust the correct nodes without manual configuration.

## 🏗️ Architecture

```mermaid
graph LR
    Pulse[Pulse Server] -- HTTPS /api/temperature-proxy --> Proxy[Sensor Proxy]
    Proxy -- SSH --> Nodes[Cluster Nodes]
```

1. **Registration**: The proxy registers with Pulse on startup/install.
2. **Sync**: The proxy periodically fetches the "Authorized Nodes" list from Pulse.
3. **Validation**: The proxy only executes commands on nodes authorized by Pulse.
## 🔄 Workflow

1. **Install**: `install-sensor-proxy.sh` calls `/api/temperature-proxy/register`.
2. **Token Exchange**: Pulse returns a `ctrl_token` which the proxy saves to `/etc/pulse-sensor-proxy/.pulse-control-token`.
3. **Polling**: The proxy polls `/api/temperature-proxy/authorized-nodes` every 60s (configurable).
4. **Update**: If the node list changes (e.g., a new node is added to Pulse), the proxy updates its internal allowlist automatically.
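The "update on change" step usually hinges on noticing that the fetched list differs from the current one. A minimal sketch in shell (the proxy itself is Go, and hashing the sorted list is an assumed scheme, not its documented one), using the node names from this doc's examples:

```shell
#!/usr/bin/env bash
# Sketch: detect allowlist changes by hashing the sorted node list.
set -euo pipefail

allowlist_hash() {
  # Sorting first makes the hash order-independent.
  printf '%s\n' "$@" | sort | sha256sum | cut -d' ' -f1
}

current="$(allowlist_hash delly minipc)"
incoming="$(allowlist_hash delly minipc pve3)"   # a node was added in Pulse

if [ "${incoming}" != "${current}" ]; then
  CHANGED=yes   # here the real proxy would atomically swap its allowlist
else
  CHANGED=no
fi
echo "changed=${CHANGED}"
```

Comparing hashes rather than whole lists keeps the "did anything change?" check cheap on every 60s poll.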
## ⚙️ Configuration

The proxy configuration in `/etc/pulse-sensor-proxy/config.yaml` handles the sync:

```yaml
pulse_control_plane:
  url: https://pulse.example.com:7655
  token_file: /etc/pulse-sensor-proxy/.pulse-control-token
  refresh_interval: 60s
```
## 🛡️ Security

* **Tokens**: The `ctrl_token` is unique per proxy instance.
* **Least Privilege**: The proxy only knows about nodes explicitly added to Pulse.
* **Fallback**: If the control plane is unreachable, the proxy uses its last known good configuration.
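Since the `ctrl_token` is a credential, the token file should be owner-readable only. A small sketch of writing it that way, with a temp directory standing in for `/etc/pulse-sensor-proxy` (the exact permissions the installer sets are an assumption; owner-only is the conservative choice):

```shell
#!/usr/bin/env bash
# Sketch: store the per-instance ctrl_token with owner-only permissions,
# mirroring /etc/pulse-sensor-proxy/.pulse-control-token (temp dir here).
set -euo pipefail

DIR="$(mktemp -d)"
TOKEN_FILE="${DIR}/.pulse-control-token"

umask 077                                   # new files: owner-only
printf '%s\n' 'example-ctrl-token' > "${TOKEN_FILE}"
chmod 600 "${TOKEN_FILE}"                   # explicit, in case umask differs

token="$(cat "${TOKEN_FILE}")"
echo "token length: ${#token}"
```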
docs/SCRIPT_LIBRARY.md (new file, 63 lines)
@@ -0,0 +1,63 @@
# 📜 Script Library Guide

This guide explains the shared Bash modules in `scripts/lib/` used for building installer scripts.

## 📂 Structure

| File | Purpose |
| :--- | :--- |
| `common.sh` | Logging, error handling, retry helpers, temp dirs. |
| `http.sh` | Curl/wget wrappers, GitHub release helpers. |
| `systemd.sh` | Systemd unit management helpers. |

**Conventions:**

* **Namespaces:** Functions are exported as `module::function` (e.g., `common::run`).
* **Bundling:** `make bundle-scripts` inlines modules for distribution.
* **Compatibility:** Targets Bash 5 on Debian 11+ and Ubuntu LTS.
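Bash happily accepts `::` in function names, which is all the namespace convention relies on. A toy module in that style (these stub bodies are illustrative; the real implementations live in `scripts/lib/common.sh`):

```shell
#!/usr/bin/env bash
# Toy module demonstrating the `module::function` naming convention.
# Stubs only; not the real common.sh implementations.
set -euo pipefail

common::log_info() { printf '[INFO] %s\n' "$*"; }
common::log_warn() { printf '[WARN] %s\n' "$*" >&2; }

# "Private" helper by convention: double underscore, not for use
# from other modules.
common::__timestamp() { date -u +%Y-%m-%dT%H:%M:%SZ; }

msg="$(common::log_info "hello from common")"
echo "${msg}"
```

The prefix makes collisions between modules impossible and keeps call sites self-documenting (`common::run` vs a bare `run`).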
## 🦴 Script Skeleton

```bash
#!/usr/bin/env bash
set -euo pipefail

LIB_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/lib" && pwd)"
# shellcheck source=../../scripts/lib/common.sh
source "${LIB_DIR}/common.sh"
# shellcheck source=../../scripts/lib/http.sh
source "${LIB_DIR}/http.sh"
# shellcheck source=../../scripts/lib/systemd.sh
source "${LIB_DIR}/systemd.sh"

common::init "$@"
common::require_command curl tar

main() {
  common::log_info "Starting installer..."
  common::temp_dir WORKDIR --prefix pulse-

  http::download --url "${URL}" --output "${WORKDIR}/pulse.tar.gz"

  systemd::create_service /etc/systemd/system/pulse.service <<'UNIT'
[Unit]
Description=Pulse Monitoring
UNIT

  systemd::enable_and_start pulse.service
}

main "$@"
```
## 🛠️ Best Practices

* **Logging:** Use `common::log_info`, `common::log_warn`, etc. They respect `PULSE_LOG_LEVEL`.
* **Dry Run:** Wrap mutating commands in `common::run` to support `--dry-run`.
* **Testing:** Use `scripts/tests/run.sh` for linting and `scripts/tests/integration/` for scenarios.
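The essence of the dry-run convention can be sketched in a few lines. This is a minimal stand-in for what `common::run` does with `--dry-run` (the real helper also layers in retries, labels, and backoff, which are omitted here):

```shell
#!/usr/bin/env bash
# Minimal dry-run wrapper in the spirit of common::run (retries omitted).
set -euo pipefail

DRY_RUN="${DRY_RUN:-0}"

run() {
  if [ "${DRY_RUN}" = 1 ]; then
    echo "DRY-RUN: $*"        # print the operation instead of executing
    return 0
  fi
  "$@"
}

DRY_RUN=1
preview="$(run rm -rf /some/important/dir)"   # printed, never executed
DRY_RUN=0
actual="$(run echo executed)"                 # actually runs
echo "${preview} / ${actual}"
```

Because the wrapper exits 0 in dry-run mode, a full installer run under `--dry-run` prints its complete plan and succeeds, which is what makes it usable in tests.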
## 📦 Bundling

1. Update `scripts/bundle.manifest`.
2. Run `make bundle-scripts`.
3. Verify `dist/` artifacts.

**Note:** Never edit bundled artifacts manually. Always rebuild from source.
docs/ZFS_MONITORING.md (new file, 42 lines)
@@ -0,0 +1,42 @@
# 💾 ZFS Pool Monitoring

Pulse automatically detects and monitors ZFS pools on your Proxmox nodes.

## 🚀 Features

* **Auto-Detection**: No configuration needed.
* **Health Status**: Tracks `ONLINE`, `DEGRADED`, and `FAULTED` states.
* **Error Tracking**: Monitors read, write, and checksum errors.
* **Alerts**: Notifies you of degraded pools or failing devices.
## ⚙️ Requirements

The Pulse user needs `Sys.Audit` permission on `/nodes/{node}/disks` (included in the standard Pulse role).

```bash
# Grant permission manually if needed
pveum acl modify /nodes -user pulse-monitor@pam -role PVEAuditor
```
## 🔧 Configuration

ZFS monitoring is **enabled by default**. To disable it:

```bash
# Add to /opt/pulse/.env
PULSE_DISABLE_ZFS_MONITORING=true
```
## 🚨 Alerts

| Severity | Condition |
| :--- | :--- |
| **Warning** | Pool `DEGRADED` or any read/write/checksum errors. |
| **Critical** | Pool `FAULTED` or `UNAVAIL`. |
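The table above can be restated as a small decision function. This is a sketch of the mapping only (Pulse's actual implementation is in Go, not shell):

```shell
#!/usr/bin/env bash
# Map pool state + error counters to an alert severity, per the table.
set -euo pipefail

zfs_severity() {
  local state="$1" read_err="$2" write_err="$3" cksum_err="$4"
  case "${state}" in
    FAULTED|UNAVAIL) echo critical; return ;;   # pool unusable
    DEGRADED)        echo warning;  return ;;   # redundancy lost
  esac
  # Healthy state, but any accumulated errors still warrant a warning.
  if [ $((read_err + write_err + cksum_err)) -gt 0 ]; then
    echo warning
  else
    echo ok
  fi
}

zfs_severity ONLINE 0 0 0      # → ok
zfs_severity ONLINE 12 0 3     # → warning (errors on a healthy pool)
zfs_severity FAULTED 0 0 0     # → critical
```

Note that state takes precedence over counters: a `DEGRADED` pool is a warning even with zero errors, and `FAULTED`/`UNAVAIL` is always critical.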
## 🔍 Troubleshooting

**No ZFS Data?**

1. Check permissions: `pveum user permissions pulse-monitor@pam`.
2. Verify pools exist: `zpool list`.
3. Check logs: `grep ZFS /opt/pulse/pulse.log`.
@@ -1,144 +0,0 @@
# Pulse Script Library Guide

This guide expands on `scripts/lib/README.md` and explains how the shared Bash modules fit together when you are building or refactoring installer scripts.

---
## Library Structure

```
scripts/
  lib/
    common.sh    # Logging, error handling, retry helpers, temp dirs
    http.sh      # Curl/wget wrappers, GitHub release helpers
    systemd.sh   # Systemd unit management helpers
    README.md    # API-level reference
```

Key conventions:

- **Namespaces:** Exported functions are declared as `module::function` (for example `common::run`, `systemd::create_service`). Avoid referencing private helpers (`module::__helper`) from other modules.
- **Development vs Bundled mode:** During local development scripts source modules from `scripts/lib`. Bundled artifacts produced by `make bundle-scripts` contain the modules inline, so the source guards remain but resolve to no-ops.
- **Compatibility:** The library targets Bash 5 but must run on Debian 11+ (Pulse LXC), Ubuntu LTS, and minimal container images. Stick to POSIX shell built-ins or guarded GNU extensions.

---
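The "source guards ... resolve to no-ops" convention is the standard shell include-guard idiom: a sentinel variable makes a second `source` of the module return immediately. A sketch (the guard variable name here is illustrative, not the library's actual one):

```shell
#!/usr/bin/env bash
# Include-guard idiom: sourcing the same module twice is harmless.
set -euo pipefail

COUNT=0
mod="$(mktemp)"
cat > "${mod}" <<'EOF'
# common.sh-style guard: a second source returns before the body runs.
if [ -n "${PULSE_COMMON_LOADED:-}" ]; then return 0; fi
PULSE_COMMON_LOADED=1
COUNT=$((COUNT + 1))          # module body runs exactly once
EOF

source "${mod}"
source "${mod}"               # no-op thanks to the guard
echo "module body ran ${COUNT} time(s)"
```

This is what lets bundled scripts keep their `source` lines verbatim: once the modules are inlined, the guards simply short-circuit.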
## Recommended Script Skeleton

```bash
#!/usr/bin/env bash
set -euo pipefail

LIB_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/lib" && pwd)"
# shellcheck source=../../scripts/lib/common.sh
source "${LIB_DIR}/common.sh"
# shellcheck source=../../scripts/lib/http.sh
source "${LIB_DIR}/http.sh"
# shellcheck source=../../scripts/lib/systemd.sh
source "${LIB_DIR}/systemd.sh"

common::init "$@"
common::require_command curl tar

main() {
  common::log_info "Starting installer..."
  common::temp_dir WORKDIR --prefix pulse-
  download_payload
  install_service
}

download_payload() {
  http::download --url "${PULSE_DOWNLOAD_URL}" --output "${WORKDIR}/pulse.tar.gz"
}

install_service() {
  systemd::create_service /etc/systemd/system/pulse.service <<'UNIT'
[Unit]
Description=Pulse Monitoring
After=network-online.target
UNIT

  systemd::enable_and_start pulse.service
}

main "$@"
```

**Why this layout works**

- `common::init` centralises logging/traps and stores the original CLI args so you can re-exec under sudo if required (`common::ensure_root`).
- `common::temp_dir` registers a cleanup handler automatically to keep `/tmp` tidy.
- `systemd::create_service` respects `--dry-run` flags and prevents partial writes on failure.

---
## Logging and Dry-Run Practices

- Respect `PULSE_LOG_LEVEL` and `PULSE_DEBUG`—they are already wired into `common::log_*`.
- Wrap mutating commands in `common::run` or `common::run_capture` to inherit retry/backoff logic and `--dry-run` behaviour.
- Provide meaningful `--label` values on long-running steps to improve CI log readability.

Example:

```bash
common::run --label "Extract Pulse binary" \
  -- tar -xzf "${ARCHIVE}" -C "${TARGET_DIR}" --strip-components=1
```

When invoked with `--dry-run`, the command prints the operation instead of executing and exits successfully—keep this in mind when writing tests.

---
## Testing Strategy

- **Smoke tests:** `scripts/tests/run.sh` lints scripts, validates manifests, and exercises bundle generation.
- **Integration tests:** Place scenario-specific scripts under `scripts/tests/integration/`. They should run quickly (<30s) and clean up after themselves.
- **Manual verification:** For destructive operations (e.g., provisioning an LXC container), run the script with `--dry-run` to confirm the steps before executing against real infrastructure.

When adding new library helpers, accompany them with unit coverage using `bats` (found under `testing-tools/bats`) or an integration script that covers the happy path and a failure case.

---
## Bundling Checklist

1. Update `scripts/bundle.manifest` with any newly created scripts.
2. Run `make bundle-scripts` (or `./scripts/bundle.sh`) to regenerate `dist/*`.
3. Inspect the diff to ensure only intentional changes appear.
4. Re-run `scripts/tests/run.sh` to catch lint and shellcheck regressions.

Bundled files embed provenance metadata (timestamp + manifest path). Do not edit bundled artifacts by hand—always rebuild from sources.

---
## When to Extend the Library

- You need to reuse logic across two or more scripts.
- A helper hides platform-specific differences (e.g., `systemctl` vs `service` on legacy systems).
- The code is complex enough that centralised unit tests provide value.

Document new functions in `scripts/lib/README.md` and update this guide if usage patterns change. Keeping these references in sync helps future contributors avoid copy/paste or undocumented conventions.
@@ -1,120 +0,0 @@
# Pulse Temperature Proxy – Control Plane Sync

## Goals

1. Make `pulse-sensor-proxy` trust Pulse itself instead of scraping `pvecm`/editing `/etc/pve`.
2. Ensure host installers always create a pulse-proxy registration, regardless of socket vs HTTP mode.
3. Keep backwards compatibility: existing `allowed_nodes` entries remain a fallback cache, but the runtime source of truth is Pulse.
## Overview

```
┌─────────────────────┐      HTTPS / Unix socket      ┌─────────────────────┐
│ Pulse server (LXC)  │ <═══════════════════════════> │ pulse-sensor-proxy  │
│                     │          /api/...             │ (Proxmox host)      │
│ - Stores nodes      │                               │ - Collects temps    │
│ - Issues proxy token│                               │ - Validates node    │
└─────────────────────┘                               │   via synced list   │
                                                      └─────────────────────┘
```

1. Installer registers the proxy using `/api/temperature-proxy/register`.
   - Response now includes `ctrl_token`, `instance_id`, and `allowed_nodes`.
   - Pulse persists `{instance_id, ctrl_token, last_seen, allowed_nodes_cache}`.
2. Proxy writes:
   ```yaml
   pulse_control_plane:
     url: https://pulse.example.com:7655
     token_file: /etc/pulse-sensor-proxy/.pulse-control-token
     refresh_interval: 60s
   ```
3. Proxy boot sequence:
   - Load cached `allowed_nodes` from YAML (fallback only).
   - If `pulse_control_plane` configured, fetch `/api/temperature-proxy/authorized-nodes`.
   - Replace in-memory allowlist atomically, log version/hash.
   - Retry based on exponential backoff; stay on cached list if control plane unreachable.
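The "retry based on exponential backoff" step in the boot sequence can be sketched as follows. The proxy itself is Go; this shell version stubs the fetch (failing twice, then succeeding) and stubs the sleep so it runs instantly:

```shell
#!/usr/bin/env bash
# Sketch of the boot-time fetch with exponential backoff.
# fetch_allowed_nodes is a stand-in that fails twice, then succeeds.
set -euo pipefail

ATTEMPTS=0
fetch_allowed_nodes() {
  ATTEMPTS=$((ATTEMPTS + 1))
  [ "${ATTEMPTS}" -ge 3 ]      # simulate: first two calls fail
}

delay=1
for try in 1 2 3 4 5; do
  if fetch_allowed_nodes; then
    echo "synced on attempt ${try}"
    break
  fi
  sleep 0                      # real code: sleep "${delay}"
  delay=$((delay * 2))         # 1s, 2s, 4s, ...
done
```

While the retries are in progress the proxy keeps serving from the cached `allowed_nodes`, so a temporarily unreachable control plane never blanks the allowlist.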
## API Changes (Pulse)

1. **Extend existing registration endpoint**
   - Request: `{hostname, proxy_url, kind}` (`kind` = `socket` or `http`).
   - Response: `{success, token, ctrl_token, pve_instance, allowed_nodes, refresh_interval}`.
   - Persist `ctrl_token` (or reuse `TemperatureProxyToken` field if `proxy_url` empty).
2. **New endpoint** `/api/temperature-proxy/authorized-nodes`
   - Auth: `X-Proxy-Token: <ctrl_token>` or `Authorization: Bearer`.
   - Response:
     ```json
     {
       "nodes": [
         {"name": "delly", "ip": "192.168.0.5"},
         {"name": "minipc", "ip": "192.168.0.134"}
       ],
       "hash": "sha256:...",
       "refresh_interval": 60,
       "updated_at": "2025-11-15T20:47:00Z"
     }
     ```
   - Uses Pulse config (`nodes.enc` + cluster endpoints) to build list.
   - Derives `ip` from cluster endpoints or stored host value; duplicates removed.
   - Logs when proxies pull list (metrics + last_seen).
3. **Persistence**
   - `config.PVEInstance` already has `TemperatureProxyURL`/`Token`. Add `TemperatureProxyControlToken` or reuse existing field when URL empty.
   - Add `LastProxyPull`, `LastAllowlistHash`.
4. **Access control**
   - Router should treat `/api/temperature-proxy/authorized-nodes` as public but requiring proxy token (bypasses user auth).
   - Rate limit per proxy (maybe 12/min).
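A consumer of the response above mainly needs the node names. A sketch against the example payload, using deliberately naive `grep`/`sed` extraction to keep it dependency-free; real code should use `jq` or a proper JSON parser rather than pattern matching:

```shell
#!/usr/bin/env bash
# Extract node names from an authorized-nodes payload.
# Naive text extraction for illustration only; use a JSON parser
# in real code.
set -euo pipefail

payload='{"nodes":[{"name":"delly","ip":"192.168.0.5"},{"name":"minipc","ip":"192.168.0.134"}],"refresh_interval":60}'

names="$(printf '%s' "${payload}" \
  | grep -o '"name":"[^"]*"' \
  | sed 's/.*:"//; s/"//')"

echo "${names}"
```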
## Proxy Changes

1. **Config additions**
   ```yaml
   pulse_control_plane:
     url: https://pulse.lan:7655
     token_file: /etc/pulse-sensor-proxy/.pulse-control-token
     refresh_interval: 60s  # default
     insecure_skip_verify: false
   ```
2. **Startup**
   - Read token from `token_file`.
   - Launch goroutine: `syncAllowlist(ctx)` loops:
     1. GET `/api/temperature-proxy/authorized-nodes`.
     2. Validate response (non-empty, verify hash changes).
     3. Replace `nodeValidator` allowlist in thread-safe way.
     4. Write new snapshot to `allowed_nodes_cache` (optional).
     5. Sleep `refresh_interval` (server-provided).
   - If call fails: log warning, keep last known list, use fallback allowlist when empty.
3. **NodeValidator**
   - Keep ability to parse static `allowed_nodes`.
   - Add `SetAuthorizedNodes([]string)` to update hosts + CIDRs.
   - When `hasAllowlist == false` but control-plane sync enabled, we never fall back to cluster detection.
   - Provide metrics: last sync success timestamp, number of nodes, etc.
## Installer Changes

1. Host install path (`install.sh` invoking `install-sensor-proxy.sh`)
   - Always pass `--pulse-server http://<container-ip>:<port>`.
   - If `--pulse-server` not supplied manually, `install-sensor-proxy.sh` fetches from `PULSE_SERVER` env.
2. `install-sensor-proxy.sh`
   - After downloading binary, run registration:
     ```
     ctrl_token=$(register_with_pulse "$PULSE_SERVER" "$SHORT_HOSTNAME" "$PROXY_URL" "$MODE")
     echo "$ctrl_token" > /etc/pulse-sensor-proxy/.pulse-control-token
     ```
   - Append control-plane block to config if not present.
   - After install, call new authorized-nodes endpoint once to prime the cache.
   - Continue merging `allowed_nodes` for fallback, but treat as `# Legacy fallback`.
3. Provide migration flag `--legacy-allowlist` to skip control plane (for air-gapped hosts).
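The "append control-plane block to config if not present" step needs to be rerun-safe, since the goals call out that installer reruns previously corrupted YAML. A sketch of the idempotent append (a temp file stands in for the real config path; the guard pattern is an assumption about how the installer detects the block):

```shell
#!/usr/bin/env bash
# Idempotent sketch: add the pulse_control_plane block only once,
# so rerunning the installer never duplicates it.
set -euo pipefail

CONFIG="$(mktemp)"            # stand-in for the proxy's config.yaml
echo 'allowed_nodes: []   # Legacy fallback' > "${CONFIG}"

append_control_plane() {
  if grep -q '^pulse_control_plane:' "${CONFIG}"; then
    return 0                  # already configured; rerun is a no-op
  fi
  cat >> "${CONFIG}" <<'EOF'
pulse_control_plane:
  url: https://pulse.example.com:7655
  token_file: /etc/pulse-sensor-proxy/.pulse-control-token
  refresh_interval: 60s
EOF
}

append_control_plane
append_control_plane          # second run: block is not duplicated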
## Migration Plan

1. Ship allowlist merge fix (already done locally) so reruns stop causing YAML errors.
2. Release intermediate version where installer accepts `--pulse-server` and registers proxies; proxy ignores new config fields until next release.
3. Release proxy with control-plane sync; ensure it tolerates missing control block (for older installs).
4. Update docs + UI to show last proxy sync state (diagnostics tab).
## Open Questions / TODO

- Decide whether `ctrl_token` reuses `TemperatureProxyToken` (rename field) or is separate.
- How to handle multiple Pulse servers controlling the same host (future?). For now, one ctrl token per PVE instance.
- Should HTTP-mode proxies reuse the same sync endpoint (yes).
@@ -1,98 +0,0 @@
# ZFS Pool Monitoring

Pulse v4.15.0+ includes automatic ZFS pool health monitoring for Proxmox VE nodes.

## Features

- **Automatic Detection**: Detects ZFS storage and monitors associated pools
- **Health Status**: Monitors pool state (ONLINE, DEGRADED, FAULTED)
- **Error Tracking**: Tracks read, write, and checksum errors
- **Device Monitoring**: Monitors individual devices within pools
- **Alert Generation**: Creates alerts for degraded pools and device errors
- **Frontend Display**: Shows ZFS issues inline with storage information
## Requirements

### Proxmox Permissions

The Pulse user needs `Sys.Audit` permission on `/nodes/{node}/disks` to access ZFS information:

```bash
# Grant permission for ZFS monitoring (already included in standard Pulse role)
pveum acl modify /nodes -user pulse-monitor@pam -role PVEAuditor
```

### API Endpoints Used

- `/nodes/{node}/disks/zfs` - Lists ZFS pools
- `/nodes/{node}/disks/zfs/{pool}` - Gets detailed pool status
## Configuration

ZFS monitoring is **enabled by default** in Pulse v4.15.0+.

### Disabling ZFS Monitoring

If you want to disable ZFS monitoring (e.g., for performance reasons):

```bash
# Add to /opt/pulse/.env or environment
PULSE_DISABLE_ZFS_MONITORING=true
```
## Alert Types

### Pool State Alerts

- **Warning**: Pool is DEGRADED
- **Critical**: Pool is FAULTED or UNAVAIL

### Error Alerts

- **Warning**: Any read/write/checksum errors detected
- Alerts include error counts and affected devices

### Device Alerts

- **Warning**: Device has errors but is ONLINE
- **Critical**: Device is FAULTED or UNAVAIL
## Frontend Display

ZFS issues appear in the Storage tab:

- Yellow warning bar for degraded pools
- Red error counts for devices with issues
- Detailed device status for troubleshooting

## Performance Impact

- Adds 2 API calls per node with ZFS storage
- Typically adds <1 second to polling cycle
- Only queries nodes that have ZFS storage
## Troubleshooting

### No ZFS Data Appearing

1. Check permissions: `pveum user permissions pulse-monitor@pam`
2. Verify ZFS pools exist: `zpool list`
3. Check logs: `grep ZFS /opt/pulse/pulse.log` (raise log level to `debug` via **Settings → System → Logging** if you need more context, then switch back to `info`).

### Permission Denied Errors

Grant the required permission:

```bash
pveum acl modify /nodes -user pulse-monitor@pam -role PVEAuditor
```

### High API Load

Disable ZFS monitoring if not needed:

```bash
echo "PULSE_DISABLE_ZFS_MONITORING=true" >> /opt/pulse/.env
systemctl restart pulse
```
## Example Alert

```
Alert: ZFS pool 'rpool' is DEGRADED
Node: pve1
Pool: rpool
State: DEGRADED
Errors: 12 read, 0 write, 3 checksum
Device sdb2: DEGRADED with 12 read errors
```

This helps administrators identify failing drives before complete failure occurs.