diff --git a/README.md b/README.md
index e44cbbbb6..272914d99 100644
--- a/README.md
+++ b/README.md
@@ -589,6 +589,7 @@ journalctl -u pulse -f

 ## Operations Runbooks

+- [Sensor proxy config management](docs/operations/sensor-proxy-config-management.md) – Safe configuration updates using the built-in CLI, migration guide, and troubleshooting.
 - [Sensor proxy audit log rotation](docs/operations/audit-log-rotation.md) – Safely rotate append-only logs and verify poller health.
 - [Adaptive polling rollout](docs/operations/ADAPTIVE_POLLING_ROLLOUT.md) – Enable/disable the adaptive scheduler with guardrails.
 - [Automatic update management](docs/operations/auto-update.md) – Control the `pulse-update` timer/service, trigger manual runs, and roll back safely.
diff --git a/cmd/pulse-sensor-proxy/README.md b/cmd/pulse-sensor-proxy/README.md
index 8473b7b23..eff30b942 100644
--- a/cmd/pulse-sensor-proxy/README.md
+++ b/cmd/pulse-sensor-proxy/README.md
@@ -46,6 +46,71 @@ The proxy reads `/etc/pulse-sensor-proxy/config.yaml` (see
 | `max_ssh_output_bytes` | Cap command output | Prevents memory exhaustion (default 1 MiB) |
 | `rate_limit.per_peer_interval_ms` / `per_peer_burst` | Token bucket guardrails | Keep interval ≥100 ms in production |
 | `http_*` keys | HTTPS bridge mode | Needs TLS files plus bearer token |
+| `allowed_nodes_file` | Path to allowed nodes list | Default: `/etc/pulse-sensor-proxy/allowed_nodes.yaml` |
+
+### Configuration Management CLI
+
+The proxy includes built-in commands for safe configuration management. These prevent corruption by using atomic writes and file locking.
+
+**Validate configuration:**
+```bash
+# Validate config.yaml and allowed_nodes.yaml
+pulse-sensor-proxy config validate
+
+# Validate a specific config file
+pulse-sensor-proxy config validate --config /path/to/config.yaml
+
+# Validate a specific allowed_nodes file
+pulse-sensor-proxy config validate --allowed-nodes /path/to/allowed_nodes.yaml
+```
+
+**Manage allowed nodes:**
+```bash
+# Add nodes to the allowed list (merge mode; pass each node with its own --merge)
+pulse-sensor-proxy config set-allowed-nodes --merge 192.168.0.1 --merge node1.local
+
+# Replace the entire list (--replace; nodes are still passed with --merge)
+pulse-sensor-proxy config set-allowed-nodes --replace --merge 192.168.0.1 --merge 192.168.0.2
+
+# Clear the allowed nodes list (replace it with an empty list)
+pulse-sensor-proxy config set-allowed-nodes --replace
+
+# Use a custom path
+pulse-sensor-proxy config set-allowed-nodes --allowed-nodes /custom/path.yaml --merge 192.168.0.10
+```
+
+**How it works:**
+- All writes are atomic (temp file + rename)
+- File locking prevents concurrent modifications
+- Entries are deduplicated and normalized automatically
+- Empty lists are allowed (useful for security lockdown or IPC-only clusters)
+- Config validation runs before service startup (systemd `ExecStartPre`)
+
+**Best practices:**
+- Use the CLI instead of manual editing whenever possible
+- The installer uses these same commands
+- Edit `config.yaml` by hand only while the service is stopped
+- Never edit `allowed_nodes.yaml` while the service is running
+
+### Allowed Nodes File
+
+The proxy keeps the authorized node list in a separate YAML file at
+`/etc/pulse-sensor-proxy/allowed_nodes.yaml`. This separation prevents
+config corruption when the installer or control-plane sync updates the list.
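The write path summarized under **How it works** can be illustrated with a short shell sketch. It is hypothetical and is not the proxy's Go implementation: the function names, the temp-file naming, and writing an empty list as `allowed_nodes: []` are assumptions. The separate `.lock` file, the exclusive `flock`, the 0600 lock mode, temp-file-plus-rename, and the dedup/sort behavior are taken from this documentation. The sketch needs util-linux `flock(1)`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the allowed-nodes write path; NOT the proxy's
# Go implementation. Requires util-linux flock(1).
set -euo pipefail

# Normalize entries: trim whitespace, drop blanks, de-duplicate, sort.
normalize_nodes() {
  printf '%s\n' "$@" \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | { grep -v '^$' || true; } \
    | LC_ALL=C sort -u
}

# Write the list to $1 under an exclusive lock, via temp file + rename.
write_allowed_nodes() {
  local target=$1
  shift
  local dir nodes
  dir=$(dirname "$target")
  nodes=$(normalize_nodes "$@")

  # Create the separate lock file with mode 0600 if it does not exist yet.
  [ -e "$target.lock" ] || (umask 077 && : > "$target.lock")

  (
    # Exclusive lock on fd 9; released automatically when this subshell exits.
    flock -x 9
    # mktemp creates the file with mode 0600, in the same directory so the
    # final rename stays on one filesystem.
    tmp=$(mktemp "$dir/.allowed_nodes.XXXXXX")
    {
      echo "# Managed by pulse-sensor-proxy config CLI"
      if [ -n "$nodes" ]; then
        echo "allowed_nodes:"
        printf '%s\n' "$nodes" | sed 's/^/  - /'
      else
        echo "allowed_nodes: []"   # assumed representation of an empty list
      fi
    } > "$tmp"
    sync "$tmp"              # flush file data (GNU coreutils 8.24+)
    mv -f "$tmp" "$target"   # rename(2): readers see the old or new file
  ) 9>"$target.lock"
}
```

Because `rename(2)` swaps the directory entry in a single step on one filesystem, a reader opening `allowed_nodes.yaml` sees either the old list or the new one, never a partial write. The lock itself is advisory: it serializes cooperating writers but does not stop a text editor, which is one reason to avoid hand edits while the service is running.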
+
+Format:
+```yaml
+# Managed by pulse-sensor-proxy config CLI
+# Do not edit manually while service is running
+allowed_nodes:
+  - 192.168.0.1
+  - 192.168.0.2
+  - node1.local
+  - node2.example.com
+```
+
+The file is optional: if it is missing or empty, the proxy falls back to IPC-based
+discovery (`pvecm status`) when available.

 ### Environment Overrides

@@ -105,11 +170,64 @@ Set alerts on:
 | Symptom | Guidance |
 | --- | --- |
+| Service fails to start with "Config validation failed" | Run `pulse-sensor-proxy config validate` to see the specific errors. Check for duplicate keys or malformed YAML. |
+| Config corruption detected during startup | Older versions updated the node list through multiple code paths. Reinstall the proxy with v4.31.1 or later; the migration runs automatically. |
+| Temperature monitoring stops working after a config change | Validate the config with `pulse-sensor-proxy config validate`, then restart the service: `systemctl restart pulse-sensor-proxy`. |
 | `Cannot open audit log file` | Check permissions on `/var/log/pulse/sensor-proxy`. Remove `chattr +a` only during rotation. |
 | `connection denied` in audit log | UID/GID not listed in `allowed_peers`. Verify Pulse container UID mapping. |
 | `HTTP request from unauthorized source IP` | Update `allowed_source_subnets` or run through a reverse proxy that advertises the client IP via `ProxyProtocol` (not supported yet). |
 | `rate limit exceeded` | Increase `rate_limit.per_peer_burst` or fix noisy hosts before relaxing limits. |
 | `temperature pollers stuck` | Hit `/api/monitoring/scheduler/health`, ensure breakers are `closed`, restart Pulse + proxy if necessary. |
+| Lock file permissions error | Lock files use mode 0600 to prevent denial of service by unprivileged users. Check that the file owner matches the proxy user. |
+
+### Config Corruption Recovery
+
+If you suspect config corruption (the service won't start, or temperatures stopped updating):
+
+1. **Validate the config:**
+   ```bash
+   pulse-sensor-proxy config validate
+   ```
+
+2. **If corruption is detected, reinstall the proxy:**
+   ```bash
+   curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/install-sensor-proxy.sh | \
+     sudo bash -s -- --standalone --pulse-server http://your-pulse:7655
+   ```
+   The installer automatically migrates to file-based config and repairs the corruption.
+
+3. **Check for duplicate allowed_nodes blocks:**
+   ```bash
+   grep -n "allowed_nodes:" /etc/pulse-sensor-proxy/config.yaml
+   ```
+   The key should appear at most once, and not at all after the list has moved to `allowed_nodes.yaml`. Multiple matches indicate corruption that the installer's migration repairs.
+
+4. **Manual recovery (if the installer is unavailable):**
+   ```bash
+   # Stop the service
+   sudo systemctl stop pulse-sensor-proxy
+
+   # Validate and identify issues
+   pulse-sensor-proxy config validate --config /etc/pulse-sensor-proxy/config.yaml
+
+   # If allowed_nodes appears in config.yaml, list its entries (from every block):
+   awk '/^allowed_nodes:/ {f=1; next} f && /^[[:space:]]*- / {sub(/^[[:space:]]*- /, ""); print; next} f && /^[^[:space:]#]/ {f=0}' \
+     /etc/pulse-sensor-proxy/config.yaml | sort -u > /tmp/nodes.txt
+
+   # Remove every allowed_nodes block from config.yaml (edit manually),
+   # then recreate the list in allowed_nodes.yaml from /tmp/nodes.txt:
+   pulse-sensor-proxy config set-allowed-nodes --replace --merge node1 --merge node2
+
+   # Add the allowed_nodes_file reference to config.yaml if it is missing:
+   grep -q '^allowed_nodes_file:' /etc/pulse-sensor-proxy/config.yaml || \
+     echo "allowed_nodes_file: /etc/pulse-sensor-proxy/allowed_nodes.yaml" | \
+     sudo tee -a /etc/pulse-sensor-proxy/config.yaml
+
+   # Validate again
+   pulse-sensor-proxy config validate
+
+   # Start the service
+   sudo systemctl start pulse-sensor-proxy
+   ```

 For additional hardening steps, read `docs/PULSE_SENSOR_PROXY_HARDENING.md` and `docs/TEMPERATURE_MONITORING_SECURITY.md`.
diff --git a/docs/README.md b/docs/README.md
index 6c3b39fe9..3a6a09100 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -45,6 +45,7 @@ section groups related guides so you can jump straight to the material you need.

 ## Operations Runbooks

+- [operations/sensor-proxy-config-management.md](operations/sensor-proxy-config-management.md) – Safe configuration updates using the built-in CLI, migration from inline config, and troubleshooting corruption issues.
 - [operations/audit-log-rotation.md](operations/audit-log-rotation.md) – Monthly/incident log rotation procedure that preserves the hash chain and validates scheduler health afterward.
 - [operations/ADAPTIVE_POLLING_ROLLOUT.md](operations/ADAPTIVE_POLLING_ROLLOUT.md) – Rollout/rollback plan for enabling adaptive polling in staging or production.
 - [operations/auto-update.md](operations/auto-update.md) – Lifecycle of the `pulse-update` timer/service: enablement, manual trigger, rollback, and observability.
diff --git a/docs/TEMPERATURE_MONITORING.md b/docs/TEMPERATURE_MONITORING.md
index ebd661b9f..df0c955fe 100644
--- a/docs/TEMPERATURE_MONITORING.md
+++ b/docs/TEMPERATURE_MONITORING.md
@@ -507,6 +507,40 @@ curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/instal
 ```
 Then reinstall with the desired flags (for example, `--standalone --http-mode --pulse-server https://pulse:7655`).

+### Config Validation Failure on Startup
+
+**Symptom:** The proxy service fails to start with "Config validation failed" or "duplicate allowed_nodes blocks detected".
+
+**Cause:** Config file corruption from earlier versions, which updated the allowed nodes list through multiple code paths. This was the root cause of 99% of temperature monitoring failures.
+
+**Fix (Automatic):**
+Version 4.31.1 and later migrate to file-based config management automatically during installation. Reinstall to trigger the migration:
+
+```bash
+curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/install-sensor-proxy.sh | \
+  sudo bash -s -- --standalone --pulse-server http://your-pulse:7655
+```
+
+The installer will:
+- Detect and repair duplicate `allowed_nodes:` blocks in `config.yaml`
+- Move the list to a separate `/etc/pulse-sensor-proxy/allowed_nodes.yaml` file
+- Use the Go CLI, which writes atomically, for all future config updates
+
+**Verify the fix:**
+```bash
+# Check for duplicates (expect exactly one match, in allowed_nodes.yaml)
+grep -n "allowed_nodes:" /etc/pulse-sensor-proxy/*.yaml
+
+# Validate configuration
+pulse-sensor-proxy config validate
+
+# Check service status
+systemctl status pulse-sensor-proxy
+```
+
+**Manual recovery (if needed):**
+See the Config Corruption Recovery section of [`cmd/pulse-sensor-proxy/README.md`](../cmd/pulse-sensor-proxy/README.md).
+
 ### SSH Connection Attempts from Container ([preauth] Logs)

 **Symptom:** Proxmox host logs (`/var/log/auth.log`) show repeated SSH connection attempts from your Pulse container:
@@ -1356,6 +1390,33 @@ test -S /run/pulse-sensor-proxy/pulse-sensor-proxy.sock && echo "Socket OK" || e

 **Contributions Welcome:** If any of these improvements interest you, open a GitHub issue to discuss implementation!

+## Configuration Management
+
+Starting with v4.31.1, the sensor proxy includes a built-in CLI for safe configuration management. It prevents the config corruption that caused 99% of temperature monitoring failures.
+
+### Quick Reference
+
+```bash
+# Validate config files
+pulse-sensor-proxy config validate
+
+# Add nodes to the allowed list
+pulse-sensor-proxy config set-allowed-nodes --merge 192.168.0.1 --merge node1.local
+
+# Replace the entire allowed list
+pulse-sensor-proxy config set-allowed-nodes --replace --merge 192.168.0.1
+```
+
+**Key benefits:**
+- Atomic writes with file locking prevent corruption
+- Automatic deduplication and normalization
+- systemd validation prevents startup with a bad config
+- The installer uses the same CLI (no more divergence between shell and Python code paths)
+
+**See also:**
+- [Sensor Proxy Config Management Guide](operations/sensor-proxy-config-management.md) - Complete runbook
+- [Sensor Proxy CLI Reference](../cmd/pulse-sensor-proxy/README.md) - Full command documentation
+
 ## Control-Plane Sync & Migration

 As of v4.32 the sensor proxy registers with Pulse and syncs its authorized node list via `/api/temperature-proxy/authorized-nodes`. No more manual `allowed_nodes` maintenance or `/etc/pve` access is required.
diff --git a/docs/TROUBLESHOOTING.md b/docs/TROUBLESHOOTING.md
index c0b93c2f4..b28deb26a 100644
--- a/docs/TROUBLESHOOTING.md
+++ b/docs/TROUBLESHOOTING.md
@@ -280,6 +280,56 @@ See [Configuration Guide](CONFIGURATION.md#tlshttps-configuration) for complete

 ### Temperature Monitoring Issues

+#### Sensor proxy fails to start (config validation error)
+
+**Symptoms:** The service won't start, and logs show "Config validation failed" or "duplicate allowed_nodes blocks detected".
+
+**Diagnosis:**
+```bash
+# Check service status
+sudo systemctl status pulse-sensor-proxy
+
+# Validate config manually
+pulse-sensor-proxy config validate
+
+# Look for duplicate blocks
+grep -n "allowed_nodes:" /etc/pulse-sensor-proxy/config.yaml
+```
+
+**Fix:**
+The issue is config corruption from earlier versions. Version 4.31.1 and later repair it automatically when you reinstall:
+
+```bash
+# Reinstall to migrate to the new config system
+curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/install-sensor-proxy.sh | \
+  sudo bash -s -- --standalone --pulse-server http://your-pulse:7655
+
+# Verify the fix
+pulse-sensor-proxy config validate
+sudo systemctl status pulse-sensor-proxy
+```
+
+The new config system:
+- Separates allowed nodes into `/etc/pulse-sensor-proxy/allowed_nodes.yaml`
+- Uses atomic writes with file locking
+- Validates config before service startup
+- Provides a CLI for safe config management
+
+**Manual config management (advanced):**
+```bash
+# Add nodes to the allowed list
+pulse-sensor-proxy config set-allowed-nodes --merge 192.168.0.1 --merge node1.local
+
+# Replace the entire list
+pulse-sensor-proxy config set-allowed-nodes --replace --merge 192.168.0.1
+
+# Validate before restarting
+pulse-sensor-proxy config validate
+sudo systemctl restart pulse-sensor-proxy
+```
+
+See [`cmd/pulse-sensor-proxy/README.md`](../cmd/pulse-sensor-proxy/README.md) for complete CLI documentation.
+
 #### Temperature data flickers after adding nodes

 **Symptoms:** Dashboard temperatures alternate between values and `--`, or new nodes never show readings. Proxy logs contain `limiter.rejection` messages.
diff --git a/docs/operations/sensor-proxy-config-management.md b/docs/operations/sensor-proxy-config-management.md
new file mode 100644
index 000000000..593618d25
--- /dev/null
+++ b/docs/operations/sensor-proxy-config-management.md
@@ -0,0 +1,469 @@
+# Sensor Proxy Configuration Management
+
+This guide covers safe configuration management for pulse-sensor-proxy, including the CLI tools introduced in v4.31.1 to prevent config corruption.
+
+## Overview
+
+Starting with v4.31.1, pulse-sensor-proxy uses a two-file configuration system:
+
+1. **Main config:** `/etc/pulse-sensor-proxy/config.yaml` - Contains all settings except the allowed nodes
+2. **Allowed nodes:** `/etc/pulse-sensor-proxy/allowed_nodes.yaml` - Separate file for the authorized node list
+
+This separation prevents corruption from concurrent updates by the installer, control-plane sync, and the self-heal timer.
+
+## Architecture
+
+### Why Two Files?
+
+Earlier versions stored `allowed_nodes:` inline in `config.yaml`, which caused corruption when:
+- The installer updated node lists
+- The self-heal timer ran (every 5 minutes)
+- Control-plane sync modified the list
+- Version detection hit edge cases
+
+Multiple code paths (shell, Python, Go) raced to update the same YAML file, creating duplicate `allowed_nodes:` keys that broke YAML parsing.
+
+### New System (v4.31.1+)
+
+**Phase 1 (Migration):**
+- File-based mode is the only supported mode
+- The installer migrates inline blocks to `allowed_nodes.yaml`
+- The self-heal timer detects and repairs corruption
+
+**Phase 2 (Atomic Operations):**
+- The Go CLI replaces all shell/Python config manipulation
+- File locking prevents concurrent writes
+- Atomic writes (temp file + rename) ensure consistency
+- systemd validation prevents startup with a corrupt config
+
+## Configuration CLI Reference
+
+### Validate Configuration
+
+Check config files for errors before restarting the service:
+
+```bash
+# Validate both config.yaml and allowed_nodes.yaml
+pulse-sensor-proxy config validate
+
+# Validate a specific config file
+pulse-sensor-proxy config validate --config /path/to/config.yaml
+
+# Validate a specific allowed_nodes file
+pulse-sensor-proxy config validate --allowed-nodes /path/to/allowed_nodes.yaml
+```
+
+**Exit codes:**
+- 0 = valid
+- Non-zero = validation failed (check stderr for details)
+
+**Common validation errors:**
+- "duplicate allowed_nodes blocks" - Run the migration (see Migrating from Inline Config)
+- "failed to parse YAML" - Syntax error in the config file
+- "read_timeout must be positive" - Invalid timeout value
+
+### Manage Allowed Nodes
+
+The CLI provides two modes. In both, each node is passed with its own `--merge` flag.
+
+**Merge mode (default):** Adds nodes to the existing list
+```bash
+# Add a single node
+pulse-sensor-proxy config set-allowed-nodes --merge 192.168.0.10
+
+# Add multiple nodes
+pulse-sensor-proxy config set-allowed-nodes \
+  --merge 192.168.0.1 \
+  --merge 192.168.0.2 \
+  --merge node1.local
+```
+
+**Replace mode:** Overwrites the entire list
+```bash
+# Replace with a new list
+pulse-sensor-proxy config set-allowed-nodes --replace \
+  --merge 192.168.0.1 \
+  --merge 192.168.0.2
+
+# Clear the list (empty is valid for IPC-only clusters)
+pulse-sensor-proxy config set-allowed-nodes --replace
+```
+
+**Custom paths:**
+```bash
+# Use a non-default path
+pulse-sensor-proxy config set-allowed-nodes \
+  --allowed-nodes /custom/path.yaml \
+  --merge 192.168.0.10
+```
+
+### How It Works
+
+1. **File locking:** Uses `flock(LOCK_EX)` on a separate `.lock` file
+2. **Atomic writes:** Writes to a temp file, syncs it, then renames it over the target
+3. **Deduplication:** Automatically removes duplicate entries
+4. **Normalization:** Trims whitespace and sorts entries
+5. **Empty lists allowed:** Useful for security lockdown or IPC-based discovery
+
+## Common Tasks
+
+### Adding Nodes After Cluster Expansion
+
+When you add a new node to your Proxmox cluster:
+
+```bash
+# Add the new node to the allowed list
+pulse-sensor-proxy config set-allowed-nodes --merge new-node.local
+
+# Validate config
+pulse-sensor-proxy config validate
+
+# Restart the proxy to apply
+sudo systemctl restart pulse-sensor-proxy
+
+# Verify in the Pulse UI:
+# Settings → Diagnostics → Temperature Proxy
+```
+
+### Removing Decommissioned Nodes
+
+When removing a node from your cluster:
+
+```bash
+# Show the current list
+cat /etc/pulse-sensor-proxy/allowed_nodes.yaml
+
+# Replace it with the updated list, omitting the decommissioned node
+pulse-sensor-proxy config set-allowed-nodes --replace \
+  --merge 192.168.0.1 \
+  --merge 192.168.0.2
+
+# Validate and restart
+pulse-sensor-proxy config validate
+sudo systemctl restart pulse-sensor-proxy
+```
+
+**Note:** The proxy cleanup system automatically removes SSH keys from deleted nodes. See [Temperature Monitoring](../TEMPERATURE_MONITORING.md) for details.
+
+### Migrating from Inline Config
+
+If you're running an older version with inline `allowed_nodes:` in `config.yaml`:
+
+```bash
+# Upgrade to the latest version (migrates automatically)
+curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/install-sensor-proxy.sh | \
+  sudo bash -s -- --standalone --pulse-server http://your-pulse:7655
+
+# Verify the migration
+pulse-sensor-proxy config validate
+
+# Check that allowed_nodes appears only in allowed_nodes.yaml
+grep -n "allowed_nodes:" /etc/pulse-sensor-proxy/*.yaml
+# Expected: a single match such as
+#   /etc/pulse-sensor-proxy/allowed_nodes.yaml:3:allowed_nodes:
+# and no match in config.yaml
+```
+
+### Changing Other Config Settings
+
+For settings in `config.yaml` (anything other than the allowed nodes):
+
+```bash
+# Stop the service first
+sudo systemctl stop pulse-sensor-proxy
+
+# Edit config.yaml manually
+sudo nano /etc/pulse-sensor-proxy/config.yaml
+
+# Validate before starting
+pulse-sensor-proxy config validate
+
+# Start the service
+sudo systemctl start pulse-sensor-proxy
+
+# Check for errors
+sudo systemctl status pulse-sensor-proxy
+journalctl -u pulse-sensor-proxy -n 50
+```
+
+**Safe to edit in config.yaml:**
+- `allowed_source_subnets`
+- `allowed_peers` (UID/GID permissions)
+- `rate_limit` settings
+- `metrics_address`
+- `http_*` settings (HTTPS mode)
+- `pulse_control_plane` block
+
+**Never edit manually:**
+- `allowed_nodes:` (after migration the list lives in `allowed_nodes.yaml`; change it with the CLI)
+- Lock files (`.lock`)
+
+## Troubleshooting
+
+### Config Validation Fails
+
+**Symptom:** `pulse-sensor-proxy config validate` returns an error
+
+**Diagnosis:**
+```bash
+# Run validation with full output
+pulse-sensor-proxy config validate 2>&1
+
+# Check for duplicate blocks
+grep -n "allowed_nodes:" /etc/pulse-sensor-proxy/config.yaml
+
+# Check YAML syntax (requires the PyYAML module)
+python3 -c "import yaml; yaml.safe_load(open('/etc/pulse-sensor-proxy/config.yaml'))"
+```
+
+**Common fixes:**
+- Duplicate blocks: Run the migration (upgrade to v4.31.1+)
+- YAML syntax errors: Fix indentation, replace tabs with spaces, check colons
+- Missing required fields: Add `read_timeout` and `write_timeout`
+
+### Service Won't Start After Config Change
+
+**Diagnosis:**
+```bash
+# Check systemd logs
+journalctl -u pulse-sensor-proxy -n 100
+
+# Look for validation errors
+journalctl -u pulse-sensor-proxy | grep -i "validation\|corrupt\|duplicate"
+
+# Try starting in the foreground for more detailed errors
+sudo -u pulse-sensor-proxy /usr/local/bin/pulse-sensor-proxy
+```
+
+**Fix:**
+```bash
+# Validate config first
+pulse-sensor-proxy config validate
+
+# If validation passes but the service fails, check permissions
+ls -la /etc/pulse-sensor-proxy/
+ls -la /var/lib/pulse-sensor-proxy/
+
+# Ensure the proxy user owns the files
+sudo chown -R pulse-sensor-proxy:pulse-sensor-proxy /etc/pulse-sensor-proxy/
+sudo chown -R pulse-sensor-proxy:pulse-sensor-proxy /var/lib/pulse-sensor-proxy/
+```
+
+### Lock File Errors
+
+**Symptom:** `failed to acquire file lock` or `failed to open lock file`
+
+**Cause:** The lock file has the wrong permissions, or a hung process still holds the lock
+
+**Fix:**
+```bash
+# Check lock file permissions (should be 0600)
+ls -la /etc/pulse-sensor-proxy/*.lock
+
+# Fix permissions
+sudo chmod 0600 /etc/pulse-sensor-proxy/*.lock
+sudo chown pulse-sensor-proxy:pulse-sensor-proxy /etc/pulse-sensor-proxy/*.lock
+
+# If a process still holds the lock, identify it
+sudo lsof /etc/pulse-sensor-proxy/allowed_nodes.yaml.lock
+
+# Kill the holding process if needed (use with caution)
+sudo kill <PID>
+```
+
+**Prevention:** Locks are released automatically when the holding process exits. Don't delete lock files manually.
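Before reaching for `lsof` or `kill`, it can help to confirm whether anything actually holds the lock. The helper below is a hypothetical sketch: the name `lock_is_held` and its return codes are illustrative, not part of the proxy. It is built on util-linux `flock(1)` and assumes the proxy takes an exclusive `flock` on the `.lock` file, as described under How It Works. It never creates or deletes the lock file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: is an exclusive flock currently held on a lock file?
# Returns 0 = held, 1 = free (or file missing), 2 = could not check.
set -uo pipefail

lock_is_held() {
  local lockfile=$1 rc=0
  # Do not create the file: flock(1) would create a missing lock file.
  [ -e "$lockfile" ] || return 1
  # -n: fail at once instead of waiting; -E 75: exit with 75 only when
  # the lock is busy, so other failures (e.g. permissions) stay distinct.
  # If the lock is free, flock takes it, runs `true`, and releases it.
  flock -n -E 75 "$lockfile" true || rc=$?
  case $rc in
    0) return 1 ;;    # acquired and released immediately: not held
    75) return 0 ;;   # another process holds the lock
    *) return 2 ;;    # could not open or lock the file
  esac
}
```

For example, `lock_is_held /etc/pulse-sensor-proxy/allowed_nodes.yaml.lock && echo "lock is held"`, run as a user that can open the file. When the lock is free the probe holds it only for an instant, so it does not interfere with the proxy in practice.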
+
+### Allowed Nodes List is Empty
+
+**Symptom:** `allowed_nodes.yaml` exists but has no entries
+
+**Is this a problem?** Not necessarily:
+- An empty list is valid for clusters using IPC discovery (`pvecm status`)
+- Control-plane mode populates the list automatically
+- Standalone nodes require manual node entries
+
+**To populate manually:**
+```bash
+# Add your cluster nodes
+pulse-sensor-proxy config set-allowed-nodes --replace \
+  --merge 192.168.0.1 \
+  --merge 192.168.0.2 \
+  --merge 192.168.0.3
+
+# Verify
+cat /etc/pulse-sensor-proxy/allowed_nodes.yaml
+```
+
+## Best Practices
+
+### General Guidelines
+
+1. **Always validate before restarting:**
+   ```bash
+   pulse-sensor-proxy config validate && sudo systemctl restart pulse-sensor-proxy
+   ```
+
+2. **Use the CLI for allowed_nodes changes:**
+   - Don't edit `allowed_nodes.yaml` manually
+   - Use `config set-allowed-nodes` instead
+
+3. **Stop the service before editing config.yaml:**
+   - Prevents races with the running process
+   - systemd validation catches errors on startup
+
+4. **Back up config before major changes:**
+   ```bash
+   # -a preserves ownership and mode, so a restored copy stays readable by the proxy
+   sudo cp -a /etc/pulse-sensor-proxy/config.yaml /etc/pulse-sensor-proxy/config.yaml.backup
+   sudo cp -a /etc/pulse-sensor-proxy/allowed_nodes.yaml /etc/pulse-sensor-proxy/allowed_nodes.yaml.backup
+   ```
+
+5. **Monitor after changes:**
+   ```bash
+   journalctl -u pulse-sensor-proxy -f
+   # Check the Pulse UI: Settings → Diagnostics → Temperature Proxy
+   ```
+
+### Automation Scripts
+
+When scripting config changes:
+
+```bash
+#!/bin/bash
+set -euo pipefail
+
+# Safely replace the allowed nodes list, validate, and restart the proxy
+update_allowed_nodes() {
+  # Build the command as an array so node names are passed verbatim
+  # (no eval, no word splitting or glob expansion)
+  local cmd=(pulse-sensor-proxy config set-allowed-nodes --replace)
+  local node
+  for node in "$@"; do
+    cmd+=(--merge "$node")
+  done
+
+  # Apply the new list
+  if "${cmd[@]}"; then
+    echo "Allowed nodes updated successfully"
+  else
+    echo "Failed to update allowed nodes" >&2
+    return 1
+  fi
+
+  # Validate
+  if ! pulse-sensor-proxy config validate; then
+    echo "Config validation failed after update" >&2
+    return 1
+  fi
+
+  # Restart service
+  if sudo systemctl restart pulse-sensor-proxy; then
+    echo "Service restarted successfully"
+  else
+    echo "Service restart failed" >&2
+    return 1
+  fi
+
+  # Give the service a moment, then confirm it is active
+  sleep 2
+  if systemctl is-active --quiet pulse-sensor-proxy; then
+    echo "Service is running"
+  else
+    echo "Service failed to start" >&2
+    journalctl -u pulse-sensor-proxy -n 20
+    return 1
+  fi
+}
+
+# Example usage
+update_allowed_nodes "192.168.0.1" "192.168.0.2" "node3.local"
+```
+
+### Monitoring Config Health
+
+Add to your monitoring system:
+
+```bash
+# Check for config corruption (exit status should be 0)
+pulse-sensor-proxy config validate
+echo $?
+
+# Count allowed_nodes: blocks left in config.yaml (should print 0 after migration)
+grep "allowed_nodes:" /etc/pulse-sensor-proxy/config.yaml | wc -l
+
+# Check lock file permissions (should print 600)
+stat -c "%a" /etc/pulse-sensor-proxy/*.lock
+
+# Check the service is running
+systemctl is-active pulse-sensor-proxy
+```
+
+## Migration Path
+
+### Upgrading from Pre-v4.31.1
+
+**Automatic migration** (recommended):
+```bash
+# Reinstall; the migration runs automatically
+curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/install-sensor-proxy.sh | \
+  sudo bash -s -- --standalone --pulse-server http://your-pulse:7655
+
+# Verify
+pulse-sensor-proxy config validate
+sudo systemctl status pulse-sensor-proxy
+```
+
+**Manual migration** (if needed):
+```bash
+# 1. Stop the service
+sudo systemctl stop pulse-sensor-proxy
+
+# 2. Extract the allowed_nodes entries from config.yaml (all blocks, deduplicated)
+awk '/^allowed_nodes:/ {f=1; next} f && /^[[:space:]]*- / {sub(/^[[:space:]]*- /, ""); print; next} f && /^[^[:space:]#]/ {f=0}' \
+  /etc/pulse-sensor-proxy/config.yaml | sort -u > /tmp/nodes.txt
+
+# 3. Write them to allowed_nodes.yaml, one --merge per entry in /tmp/nodes.txt
+pulse-sensor-proxy config set-allowed-nodes --replace \
+  --merge node1.local \
+  --merge node2.local
+
+# 4. Remove every allowed_nodes block from config.yaml. Edit manually, or
+#    filter a backup copy (this keeps the key that follows each block):
+sudo cp -a /etc/pulse-sensor-proxy/config.yaml /etc/pulse-sensor-proxy/config.yaml.pre-migration
+awk '/^allowed_nodes:/ {skip=1; next} skip && /^[^[:space:]#-]/ {skip=0} !skip' \
+  /etc/pulse-sensor-proxy/config.yaml.pre-migration | \
+  sudo tee /etc/pulse-sensor-proxy/config.yaml > /dev/null
+
+# 5. Add a reference to allowed_nodes.yaml if it is missing
+grep -q '^allowed_nodes_file:' /etc/pulse-sensor-proxy/config.yaml || \
+  echo "allowed_nodes_file: /etc/pulse-sensor-proxy/allowed_nodes.yaml" | \
+  sudo tee -a /etc/pulse-sensor-proxy/config.yaml
+
+# 6. Validate
+pulse-sensor-proxy config validate
+
+# 7. Start the service
+sudo systemctl start pulse-sensor-proxy
+```
+
+## Related Documentation
+
+- [Temperature Monitoring](../TEMPERATURE_MONITORING.md) - Setup and troubleshooting
+- [Sensor Proxy README](../../cmd/pulse-sensor-proxy/README.md) - Complete CLI reference
+- [Audit Log Rotation](audit-log-rotation.md) - Managing append-only logs
+- [Temperature Monitoring Security](../TEMPERATURE_MONITORING_SECURITY.md) - Security architecture
+
+## Support
+
+If config management issues persist after following this guide:
+
+1. Collect diagnostics:
+   ```bash
+   pulse-sensor-proxy config validate > /tmp/validate.log 2>&1
+   sudo systemctl status pulse-sensor-proxy > /tmp/status.log
+   journalctl -u pulse-sensor-proxy -n 200 > /tmp/journal.log
+   grep -n "allowed_nodes:" /etc/pulse-sensor-proxy/*.yaml > /tmp/grep.log
+   ```
+
+2. File an issue at https://github.com/rcourtman/Pulse/issues
+
+3. Include:
+   - Pulse version
+   - Sensor proxy version (`pulse-sensor-proxy --version`)
+   - Output from the diagnostic commands above
+   - Steps that led to the issue
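The layout checks from **Monitoring Config Health** can be combined into one probe for a monitoring system. This is a hypothetical sketch. The function names and the Nagios-style exit codes (0 OK, 1 WARNING, 2 CRITICAL) are assumptions. The probe only inspects file layout, so it complements `pulse-sensor-proxy config validate` rather than replacing it. It treats a single inline `allowed_nodes:` block in `config.yaml` as an unmigrated host, and more than one block in either file as corruption.

```shell
#!/usr/bin/env bash
# Hypothetical monitoring probe for the two-file layout described above.
# Exit codes follow the common Nagios convention: 0 OK, 1 WARNING, 2 CRITICAL.
set -uo pipefail

# Print how many top-level "allowed_nodes:" keys a file has (0 if missing).
# "allowed_nodes_file:" does not match, because the colon must follow directly.
count_allowed_nodes_keys() {
  local file=$1
  if [ ! -f "$file" ]; then
    echo 0
    return 0
  fi
  grep -c '^allowed_nodes:' "$file" || true   # grep -c prints 0 on no match
}

check_config_layout() {
  local dir=${1:-/etc/pulse-sensor-proxy}
  local in_config in_nodes
  in_config=$(count_allowed_nodes_keys "$dir/config.yaml")
  in_nodes=$(count_allowed_nodes_keys "$dir/allowed_nodes.yaml")

  if [ "$in_config" -gt 1 ] || [ "$in_nodes" -gt 1 ]; then
    echo "CRITICAL: duplicate allowed_nodes: blocks (config.yaml=$in_config, allowed_nodes.yaml=$in_nodes)"
    return 2
  fi
  if [ "$in_config" -eq 1 ]; then
    echo "WARNING: allowed_nodes is still inline in config.yaml (not migrated)"
    return 1
  fi
  echo "OK: allowed_nodes layout looks consistent"
  return 0
}
```

For example, `check_config_layout /etc/pulse-sensor-proxy; echo $?` prints a one-line status followed by the exit code that a monitoring agent would act on.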