Refactor remaining docs and standardize naming

courtmanr@gmail.com 2025-11-25 00:28:33 +00:00
parent 6bb11270a1
commit 53bfa2b2fd
7 changed files with 183 additions and 460 deletions


@@ -1,117 +1,57 @@
# 🚚 Migrating Pulse
**Updated for Pulse v4.24.0+**
## 🚀 Quick Migration Guide
### ❌ DON'T: Copy Files
Never copy `/etc/pulse` or `/var/lib/pulse` between systems. The encryption key is tied to the host, credentials may be exposed, and the configuration will break.
### ✅ DO: Use Export/Import
#### 1. Export (Old Server)
1. Go to **Settings → Configuration Management**.
2. Click **Export Configuration**.
3. Enter a strong passphrase (you'll need it for import) and save the downloaded `.enc` file securely.
#### 2. Import (New Server)
1. Install a fresh Pulse instance.
2. Go to **Settings → Configuration Management**.
3. Click **Import Configuration** and upload your exported file.
4. Enter the same passphrase and click **Import**.
5. **Verify (v4.24.0+)**:
   - Check scheduler health: `curl -s http://localhost:7655/api/monitoring/scheduler/health | jq`
   - Confirm adaptive polling under **Settings → System → Monitoring** and that all nodes are connected and polling.
## 📦 What Gets Migrated
| Included ✅ | Not Included ❌ |
| :--- | :--- |
| Nodes & Credentials | Historical Metrics |
| Alert Settings | Alert History |
| Email & Webhooks | Auth Settings (Passwords/Tokens) |
| System Settings | Update Rollback History |
| Guest Metadata | |
**Notes:** Each instance should configure its own authentication. Update rollback history (v4.24.0+) isn't transferred; rebuild it by running one successful update cycle on the new host.
## 🔄 Common Scenarios
### Moving to New Hardware
Export from the old server → shut it down → install Pulse on the new hardware → import → verify all nodes reconnect.
### Docker ↔ Systemd ↔ LXC
The export file works across all installation methods: Docker → Systemd, Systemd → Docker, and Docker → LXC all migrate seamlessly.
### Backup Strategy
1. Export the configuration weekly and date the file: `pulse-backup-2024-01-15.enc`.
2. Keep the last 4 backups.
3. Store the passphrase securely (password manager).
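The retention step above can be scripted. A minimal sketch, assuming exports land in a single directory using the dated naming shown above (the function name is illustrative, not part of Pulse):

```shell
# Sketch: keep only the newest N dated config exports, delete the rest.
prune_backups() {
  local dir="$1" keep="${2:-4}"
  # Dated names sort lexicographically, so a reverse sort is newest-first.
  ls -1 "${dir}"/pulse-backup-*.enc 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while read -r old; do rm -f "${old}"; done
}
```

For example, `prune_backups /srv/backups 4` (path illustrative) removes everything except the four most recent exports.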
### Disaster Recovery
1. Install Pulse: `curl -sL https://github.com/rcourtman/Pulse/releases/latest/download/install.sh | bash`
2. Import your latest backup.
3. Restored in < 5 minutes.
## 🔒 Security
* **Encryption**: Exports are encrypted with PBKDF2 (100,000 iterations).
* **Storage**: Encrypted exports are safe to keep in cloud backups.
* **Passphrase**: Use a strong, unique passphrase (minimum 12 characters) and store it in a password manager.
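Conceptually, this protection is passphrase-based encryption with PBKDF2 key stretching. A sketch with `openssl` (illustrative only; Pulse's actual export format is internal and not interchangeable with this):

```shell
# Illustrative only: encrypt/decrypt a file with a PBKDF2-stretched passphrase
# (100,000 iterations), similar in spirit to how exports are protected.
encrypt_file() {
  openssl enc -aes-256-cbc -pbkdf2 -iter 100000 -salt \
    -in "$1" -out "$2" -pass "pass:$3"
}
decrypt_file() {
  openssl enc -d -aes-256-cbc -pbkdf2 -iter 100000 \
    -in "$1" -out "$2" -pass "pass:$3"
}
```

Because the key is derived from the passphrase, losing the passphrase means the export is unrecoverable, which is why a password manager is recommended.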
## 🔧 Troubleshooting
* **"Invalid passphrase"**: Ensure the passphrase matches exactly (case-sensitive, no extra spaces).
* **Missing nodes**: Verify the export was taken after the nodes were added, and check Settings to confirm they're listed.
* **Connection errors**: Node IPs may have changed; update node addresses in Settings.
* **Logging issues (v4.24.0+)**: If logs stop after migration, re-apply the level under **Settings → System → Logging**, check the `LOG_LEVEL` and `LOG_FORMAT` environment variables, and verify log rotation settings.
## 💡 Pro Tips
1. **Test imports**: Try importing on a test instance first.
2. **Document changes**: Note any manual configs not stored in Pulse.
3. **Version matching**: Import into the same or a newer Pulse version.
4. **Network access**: Ensure the new server can reach all nodes.
---
*Export/Import is the ONLY supported migration method. Direct file copying is unsupported and may result in data loss.*


@@ -0,0 +1,40 @@
# 📡 Proxy Control Plane
The Control Plane synchronizes `pulse-sensor-proxy` instances with the Pulse server, ensuring they trust the correct nodes without manual configuration.
## 🏗️ Architecture
```mermaid
graph LR
Pulse[Pulse Server] -- HTTPS /api/temperature-proxy --> Proxy[Sensor Proxy]
Proxy -- SSH --> Nodes[Cluster Nodes]
```
1. **Registration**: The proxy registers with Pulse on startup/install.
2. **Sync**: The proxy periodically fetches the "Authorized Nodes" list from Pulse.
3. **Validation**: The proxy only executes commands on nodes authorized by Pulse.
## 🔄 Workflow
1. **Install**: `install-sensor-proxy.sh` calls `/api/temperature-proxy/register`.
2. **Token Exchange**: Pulse returns a `ctrl_token` which the proxy saves to `/etc/pulse-sensor-proxy/.pulse-control-token`.
3. **Polling**: The proxy polls `/api/temperature-proxy/authorized-nodes` every 60s (configurable).
4. **Update**: If the node list changes (e.g., a new node is added to Pulse), the proxy updates its internal allowlist automatically.
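The poll in step 3 can be reproduced manually. A sketch assuming the token file from step 2 and an `X-Proxy-Token` auth header (verify the header name against your Pulse version):

```shell
# Sketch: issue the same request the proxy sends on each poll.
# Assumes X-Proxy-Token auth; the header may differ in your version.
fetch_authorized_nodes() {
  local pulse_url="$1" token_file="$2"
  local token
  token="$(cat "${token_file}")"
  curl -fsS -H "X-Proxy-Token: ${token}" \
    "${pulse_url}/api/temperature-proxy/authorized-nodes"
}
```

On success this returns the JSON node list the proxy uses to rebuild its allowlist.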
## ⚙️ Configuration
The proxy configuration in `/etc/pulse-sensor-proxy/config.yaml` handles the sync:
```yaml
pulse_control_plane:
url: https://pulse.example.com:7655
token_file: /etc/pulse-sensor-proxy/.pulse-control-token
refresh_interval: 60s
```
## 🛡️ Security
* **Tokens**: The `ctrl_token` is unique per proxy instance.
* **Least Privilege**: The proxy only knows about nodes explicitly added to Pulse.
* **Fallback**: If the control plane is unreachable, the proxy uses its last known good configuration.

docs/SCRIPT_LIBRARY.md Normal file

@@ -0,0 +1,63 @@
# 📜 Script Library Guide
This guide explains the shared Bash modules in `scripts/lib/` used for building installer scripts.
## 📂 Structure
| File | Purpose |
| :--- | :--- |
| `common.sh` | Logging, error handling, retry helpers, temp dirs. |
| `http.sh` | Curl/wget wrappers, GitHub release helpers. |
| `systemd.sh` | Systemd unit management helpers. |
**Conventions:**
* **Namespaces:** Functions are exported as `module::function` (e.g., `common::run`).
* **Bundling:** `make bundle-scripts` inlines modules for distribution.
* **Compatibility:** Targets Bash 5 on Debian 11+ and Ubuntu LTS.
## 🦴 Script Skeleton
```bash
#!/usr/bin/env bash
set -euo pipefail
LIB_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/lib" && pwd)"
# shellcheck source=../../scripts/lib/common.sh
source "${LIB_DIR}/common.sh"
# shellcheck source=../../scripts/lib/systemd.sh
source "${LIB_DIR}/systemd.sh"
# shellcheck source=../../scripts/lib/http.sh
source "${LIB_DIR}/http.sh"
common::init "$@"
common::require_command curl tar
main() {
common::log_info "Starting installer..."
common::temp_dir WORKDIR --prefix pulse-
http::download --url "${URL}" --output "${WORKDIR}/pulse.tar.gz"
systemd::create_service /etc/systemd/system/pulse.service <<'UNIT'
[Unit]
Description=Pulse Monitoring
UNIT
systemd::enable_and_start pulse.service
}
main "$@"
```
## 🛠️ Best Practices
* **Logging:** Use `common::log_info`, `common::log_warn`, etc. They respect `PULSE_LOG_LEVEL`.
* **Dry Run:** Wrap mutating commands in `common::run` to support `--dry-run`.
* **Testing:** Use `scripts/tests/run.sh` for linting and `scripts/tests/integration/` for scenarios.
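The dry-run behaviour of `common::run` can be pictured with a hypothetical standalone stand-in (the real helper lives in `scripts/lib/common.sh` and does more, e.g. retry/backoff):

```shell
# Hypothetical stand-in for common::run: print the command under dry-run,
# otherwise execute it. Usage: run --label "step name" -- cmd args...
DRY_RUN="${DRY_RUN:-false}"
run() {
  local label=""
  while [ "$1" != "--" ]; do
    case "$1" in
      --label) label="$2"; shift 2 ;;
      *) shift ;;
    esac
  done
  shift  # drop the "--" separator
  if [ "${DRY_RUN}" = "true" ]; then
    echo "[dry-run] ${label}: $*"
  else
    "$@"
  fi
}
```

With `DRY_RUN=true`, `run --label "Extract" -- tar -xzf pulse.tar.gz` prints the step instead of executing it, which is what makes installer tests safe to run.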
## 📦 Bundling
1. Update `scripts/bundle.manifest`.
2. Run `make bundle-scripts`.
3. Verify `dist/` artifacts.
**Note:** Never edit bundled artifacts manually. Always rebuild from source.

docs/ZFS_MONITORING.md Normal file

@@ -0,0 +1,42 @@
# 💾 ZFS Pool Monitoring
Pulse automatically detects and monitors ZFS pools on your Proxmox nodes.
## 🚀 Features
* **Auto-Detection**: No configuration needed.
* **Health Status**: Tracks `ONLINE`, `DEGRADED`, and `FAULTED` states.
* **Error Tracking**: Monitors read, write, and checksum errors.
* **Alerts**: Notifies you of degraded pools or failing devices.
## ⚙️ Requirements
The Pulse user needs `Sys.Audit` permission on `/nodes/{node}/disks` (included in the standard Pulse role).
```bash
# Grant permission manually if needed
pveum acl modify /nodes -user pulse-monitor@pam -role PVEAuditor
```
## 🔧 Configuration
ZFS monitoring is **enabled by default**. To disable it:
```bash
# Add to /opt/pulse/.env
PULSE_DISABLE_ZFS_MONITORING=true
```
## 🚨 Alerts
| Severity | Condition |
| :--- | :--- |
| **Warning** | Pool `DEGRADED` or any read/write/checksum errors. |
| **Critical** | Pool `FAULTED` or `UNAVAIL`. |
## 🔍 Troubleshooting
**No ZFS Data?**
1. Check permissions: `pveum user permissions pulse-monitor@pam`.
2. Verify pools exist: `zpool list`.
3. Check logs: `grep ZFS /opt/pulse/pulse.log`.
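To see locally which pools would trip these alerts, a small sketch that flags any pool whose health isn't `ONLINE` (assumes `zpool` is installed on the node; the function name is illustrative):

```shell
# Sketch: print an alert line for every pool not in the ONLINE state.
# Input format: "name<TAB>health", as produced by: zpool list -H -o name,health
check_pools() {
  while IFS=$'\t' read -r name health; do
    if [ "${health}" != "ONLINE" ]; then
      echo "ALERT: pool ${name} is ${health}"
    fi
  done
}
# Example invocation (requires zpool):
if command -v zpool >/dev/null; then
  zpool list -H -o name,health | check_pools
fi
```

Empty output means every pool is `ONLINE`, matching a quiet Pulse dashboard.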


@@ -1,144 +0,0 @@
# Pulse Script Library Guide
This guide expands on `scripts/lib/README.md` and explains how the shared Bash
modules fit together when you are building or refactoring installer scripts.
---
## Library Structure
```
scripts/
lib/
common.sh # Logging, error handling, retry helpers, temp dirs
http.sh # Curl/wget wrappers, GitHub release helpers
systemd.sh # Systemd unit management helpers
README.md # API-level reference
```
Key conventions:
- **Namespaces:** Exported functions are declared as `module::function` (for
example `common::run`, `systemd::create_service`). Avoid referencing private
helpers (`module::__helper`) from other modules.
- **Development vs Bundled mode:** During local development scripts source
modules from `scripts/lib`. Bundled artifacts produced by
`make bundle-scripts` contain the modules inline, so the source guards remain
but resolve to no-ops.
- **Compatibility:** The library targets Bash 5 but must run on Debian 11+
(Pulse LXC), Ubuntu LTS, and minimal container images. Stick to POSIX shell
built-ins or guarded GNU extensions.
---
## Recommended Script Skeleton
```bash
#!/usr/bin/env bash
set -euo pipefail
LIB_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/lib" && pwd)"
# shellcheck source=../../scripts/lib/common.sh
source "${LIB_DIR}/common.sh"
# shellcheck source=../../scripts/lib/systemd.sh
source "${LIB_DIR}/systemd.sh"
# shellcheck source=../../scripts/lib/http.sh
source "${LIB_DIR}/http.sh"
common::init "$@"
common::require_command curl tar
main() {
common::log_info "Starting installer..."
common::temp_dir WORKDIR --prefix pulse-
download_payload
install_service
}
download_payload() {
http::download --url "${PULSE_DOWNLOAD_URL}" --output "${WORKDIR}/pulse.tar.gz"
}
install_service() {
systemd::create_service /etc/systemd/system/pulse.service <<'UNIT'
[Unit]
Description=Pulse Monitoring
After=network-online.target
UNIT
systemd::enable_and_start pulse.service
}
main "$@"
```
**Why this layout works**
- `common::init` centralises logging/traps and stores the original CLI args so
you can re-exec under sudo if required (`common::ensure_root`).
- `common::temp_dir` registers a cleanup handler automatically to keep `/tmp`
tidy.
- `systemd::create_service` respects `--dry-run` flags and prevents partial
writes on failure.
---
## Logging and Dry-Run Practices
- Respect `PULSE_LOG_LEVEL` and `PULSE_DEBUG`—they are already wired into
`common::log_*`.
- Wrap mutating commands in `common::run` or `common::run_capture` to inherit
retry/backoff logic and `--dry-run` behaviour.
- Provide meaningful `--label` values on long-running steps to improve CI log
readability.
Example:
```bash
common::run --label "Extract Pulse binary" \
-- tar -xzf "${ARCHIVE}" -C "${TARGET_DIR}" --strip-components=1
```
When invoked with `--dry-run`, the command prints the operation instead of
executing and exits successfully—keep this in mind when writing tests.
---
## Testing Strategy
- **Smoke tests:** `scripts/tests/run.sh` lints scripts, validates manifests, and
exercises bundle generation.
- **Integration tests:** Place scenario-specific scripts under
`scripts/tests/integration/`. They should run quickly (<30s) and clean up
after themselves.
- **Manual verification:** For destructive operations (e.g., provisioning an LXC
container), run the script with `--dry-run` to confirm the steps before
executing against real infrastructure.
When adding new library helpers, accompany them with unit coverage using
`bats` (found under `testing-tools/bats`) or an integration script that covers
the happy path and a failure case.
---
## Bundling Checklist
1. Update `scripts/bundle.manifest` with any newly created scripts.
2. Run `make bundle-scripts` (or `./scripts/bundle.sh`) to regenerate `dist/*`.
3. Inspect the diff to ensure only intentional changes appear.
4. Re-run `scripts/tests/run.sh` to catch lint and shellcheck regressions.
Bundled files embed provenance metadata (timestamp + manifest path). Do not edit
bundled artifacts by hand—always rebuild from sources.
---
## When to Extend the Library
- You need to reuse logic across two or more scripts.
- A helper hides platform-specific differences (e.g., `systemctl` vs `service`
on legacy systems).
- The code is complex enough that centralised unit tests provide value.
Document new functions in `scripts/lib/README.md` and update this guide if usage
patterns change. Keeping these references in sync helps future contributors
avoid copy/paste or undocumented conventions.


@@ -1,120 +0,0 @@
# Pulse Temperature Proxy Control Plane Sync
## Goals
1. Make `pulse-sensor-proxy` trust Pulse itself instead of scraping `pvecm`/editing `/etc/pve`.
2. Ensure host installers always create a pulse-proxy registration, regardless of socket vs HTTP mode.
3. Keep backwards compatibility: existing `allowed_nodes` entries remain a fallback cache, but the runtime source of truth is Pulse.
## Overview
```
┌─────────────────────┐ HTTPS / Unix socket ┌─────────────────────┐
│ Pulse server (LXC) │ <═════════════════════════════> │ pulse-sensor-proxy │
│ │ /api/... │ (Proxmox host) │
│ - Stores nodes │ │ - Collects temps │
│ - Issues proxy token│ │ - Validates node │
└─────────────────────┘ │ via synced list │
└─────────────────────┘
```
1. Installer registers the proxy using `/api/temperature-proxy/register`.
- Response now includes `ctrl_token`, `instance_id`, and `allowed_nodes`.
- Pulse persists `{instance_id, ctrl_token, last_seen, allowed_nodes_cache}`.
2. Proxy writes:
```yaml
pulse_control_plane:
url: https://pulse.example.com:7655
token_file: /etc/pulse-sensor-proxy/.pulse-control-token
refresh_interval: 60s
```
3. Proxy boot sequence:
- Load cached `allowed_nodes` from YAML (fallback only).
- If `pulse_control_plane` configured, fetch `/api/temperature-proxy/authorized-nodes`.
- Replace in-memory allowlist atomically, log version/hash.
- Retry based on exponential backoff; stay on cached list if control plane unreachable.
## API Changes (Pulse)
1. **Extend existing registration endpoint**
- Request: `{hostname, proxy_url, kind}` (`kind` = `socket` or `http`).
- Response: `{success, token, ctrl_token, pve_instance, allowed_nodes, refresh_interval}`.
- Persist `ctrl_token` (or reuse `TemperatureProxyToken` field if `proxy_url` empty).
2. **New endpoint** `/api/temperature-proxy/authorized-nodes`
- Auth: `X-Proxy-Token: <ctrl_token>` or `Authorization: Bearer`.
- Response:
```json
{
"nodes": [
{"name": "delly", "ip": "192.168.0.5"},
{"name": "minipc", "ip": "192.168.0.134"}
],
"hash": "sha256:...",
"refresh_interval": 60,
"updated_at": "2025-11-15T20:47:00Z"
}
```
- Uses Pulse config (`nodes.enc` + cluster endpoints) to build list.
- Derives `ip` from cluster endpoints or stored host value; duplicates removed.
- Logs when proxies pull list (metrics + last_seen).
3. **Persistence**
- `config.PVEInstance` already has `TemperatureProxyURL`/`Token`. Add `TemperatureProxyControlToken` or reuse existing field when URL empty.
- Add `LastProxyPull`, `LastAllowlistHash`.
4. **Access control**
- Router should treat `/api/temperature-proxy/authorized-nodes` as public but requiring proxy token (bypasses user auth).
- Rate limit per proxy (maybe 12/min).
## Proxy Changes
1. **Config additions**
```yaml
pulse_control_plane:
url: https://pulse.lan:7655
token_file: /etc/pulse-sensor-proxy/.pulse-control-token
refresh_interval: 60s # default
insecure_skip_verify: false
```
2. **Startup**
- Read token from `token_file`.
- Launch goroutine: `syncAllowlist(ctx)` loops:
1. GET `/api/temperature-proxy/authorized-nodes`.
2. Validate response (non-empty, verify hash changes).
3. Replace `nodeValidator` allowlist in thread-safe way.
4. Write new snapshot to `allowed_nodes_cache` (optional).
5. Sleep `refresh_interval` (server-provided).
- If call fails: log warning, keep last known list, use fallback allowlist when empty.
3. **NodeValidator**
- Keep ability to parse static `allowed_nodes`.
- Add `SetAuthorizedNodes([]string)` to update hosts + CIDRs.
- When `hasAllowlist == false` but control-plane sync enabled, we never fall back to cluster detection.
- Provide metrics: last sync success timestamp, number of nodes, etc.
## Installer Changes
1. Host install path (`install.sh` invoking `install-sensor-proxy.sh`)
- Always pass `--pulse-server http://<container-ip>:<port>`.
- If `--pulse-server` not supplied manually, `install-sensor-proxy.sh` fetches from `PULSE_SERVER` env.
2. `install-sensor-proxy.sh`
- After downloading binary, run registration:
```
ctrl_token=$(register_with_pulse "$PULSE_SERVER" "$SHORT_HOSTNAME" "$PROXY_URL" "$MODE")
echo "$ctrl_token" > /etc/pulse-sensor-proxy/.pulse-control-token
```
- Append control-plane block to config if not present.
- After install, call new authorized-nodes endpoint once to prime the cache.
- Continue merging `allowed_nodes` for fallback, but treat as `# Legacy fallback`.
3. Provide migration flag `--legacy-allowlist` to skip control plane (for air-gapped hosts).
## Migration Plan
1. Ship allowlist merge fix (already done locally) so reruns stop causing YAML errors.
2. Release intermediate version where installer accepts `--pulse-server` and registers proxies; proxy ignores new config fields until next release.
3. Release proxy with control-plane sync; ensure it tolerates missing control block (for older installs).
4. Update docs + UI to show last proxy sync state (diagnostics tab).
## Open Questions / TODO
- Decide whether ctrl_token reuses `TemperatureProxyToken` (rename field) or is separate.
- How to handle multiple Pulse servers controlling the same host (future?). For now, one ctrl token per PVE instance.
- Should HTTP-mode proxies reuse the same sync endpoint (yes).


@@ -1,98 +0,0 @@
# ZFS Pool Monitoring
Pulse v4.15.0+ includes automatic ZFS pool health monitoring for Proxmox VE nodes.
## Features
- **Automatic Detection**: Detects ZFS storage and monitors associated pools
- **Health Status**: Monitors pool state (ONLINE, DEGRADED, FAULTED)
- **Error Tracking**: Tracks read, write, and checksum errors
- **Device Monitoring**: Monitors individual devices within pools
- **Alert Generation**: Creates alerts for degraded pools and device errors
- **Frontend Display**: Shows ZFS issues inline with storage information
## Requirements
### Proxmox Permissions
The Pulse user needs `Sys.Audit` permission on `/nodes/{node}/disks` to access ZFS information:
```bash
# Grant permission for ZFS monitoring (already included in standard Pulse role)
pveum acl modify /nodes -user pulse-monitor@pam -role PVEAuditor
```
### API Endpoints Used
- `/nodes/{node}/disks/zfs` - Lists ZFS pools
- `/nodes/{node}/disks/zfs/{pool}` - Gets detailed pool status
## Configuration
ZFS monitoring is **enabled by default** in Pulse v4.15.0+.
### Disabling ZFS Monitoring
If you want to disable ZFS monitoring (e.g., for performance reasons):
```bash
# Add to /opt/pulse/.env or environment
PULSE_DISABLE_ZFS_MONITORING=true
```
## Alert Types
### Pool State Alerts
- **Warning**: Pool is DEGRADED
- **Critical**: Pool is FAULTED or UNAVAIL
### Error Alerts
- **Warning**: Any read/write/checksum errors detected
- Alerts include error counts and affected devices
### Device Alerts
- **Warning**: Device has errors but is ONLINE
- **Critical**: Device is FAULTED or UNAVAIL
## Frontend Display
ZFS issues appear in the Storage tab:
- Yellow warning bar for degraded pools
- Red error counts for devices with issues
- Detailed device status for troubleshooting
## Performance Impact
- Adds 2 API calls per node with ZFS storage
- Typically adds <1 second to polling cycle
- Only queries nodes that have ZFS storage
## Troubleshooting
### No ZFS Data Appearing
1. Check permissions: `pveum user permissions pulse-monitor@pam`
2. Verify ZFS pools exist: `zpool list`
3. Check logs: `grep ZFS /opt/pulse/pulse.log` (raise log level to `debug` via **Settings → System → Logging** if you need more context, then switch back to `info`).
### Permission Denied Errors
Grant the required permission:
```bash
pveum acl modify /nodes -user pulse-monitor@pam -role PVEAuditor
```
### High API Load
Disable ZFS monitoring if not needed:
```bash
echo "PULSE_DISABLE_ZFS_MONITORING=true" >> /opt/pulse/.env
systemctl restart pulse
```
## Example Alert
```
Alert: ZFS pool 'rpool' is DEGRADED
Node: pve1
Pool: rpool
State: DEGRADED
Errors: 12 read, 0 write, 3 checksum
Device sdb2: DEGRADED with 12 read errors
```
This helps administrators identify failing drives before complete failure occurs.