mirror of
https://github.com/rcourtman/Pulse.git
synced 2026-04-28 19:41:17 +00:00
469 lines
13 KiB
Markdown
469 lines
13 KiB
Markdown
# Sensor Proxy Configuration Management
|
|
|
|
This guide covers safe configuration management for pulse-sensor-proxy, including the new CLI tools introduced in v4.31.1+ to prevent config corruption.
|
|
|
|
## Overview
|
|
|
|
Starting with v4.31.1, pulse-sensor-proxy uses a two-file configuration system:
|
|
|
|
1. **Main config:** `/etc/pulse-sensor-proxy/config.yaml` - Contains all settings except allowed nodes
|
|
2. **Allowed nodes:** `/etc/pulse-sensor-proxy/allowed_nodes.yaml` - Separate file for the authorized node list
|
|
|
|
This separation prevents corruption from concurrent updates by the installer, control-plane sync, and self-heal timer.
|
|
|
|
## Architecture
|
|
|
|
### Why Two Files?
|
|
|
|
Earlier versions stored `allowed_nodes:` inline in `config.yaml`, causing corruption when:
|
|
- The installer updated node lists
|
|
- The self-heal timer ran (every 5 minutes)
|
|
- Control-plane sync modified the list
|
|
- Version detection had edge cases
|
|
|
|
Multiple code paths (shell, Python, Go) would race to update the same YAML file, creating duplicate `allowed_nodes:` keys that broke YAML parsing.
|
|
|
|
### New System (v4.31.1+)
|
|
|
|
**Phase 1 (Migration):**
|
|
- Force file-based mode exclusively
|
|
- Installer migrates inline blocks to `allowed_nodes.yaml`
|
|
- Self-heal timer includes corruption detection and repair
|
|
|
|
**Phase 2 (Atomic Operations):**
|
|
- Go CLI replaces all shell/Python config manipulation
|
|
- File locking prevents concurrent writes
|
|
- Atomic writes (temp file + rename) ensure consistency
|
|
- systemd validation prevents startup with corrupt config
|
|
|
|
## Configuration CLI Reference
|
|
|
|
### Validate Configuration
|
|
|
|
Check config files for errors before restarting the service:
|
|
|
|
```bash
|
|
# Validate both config.yaml and allowed_nodes.yaml
|
|
pulse-sensor-proxy config validate
|
|
|
|
# Validate specific config file
|
|
pulse-sensor-proxy config validate --config /path/to/config.yaml
|
|
|
|
# Validate specific allowed_nodes file
|
|
pulse-sensor-proxy config validate --allowed-nodes /path/to/allowed_nodes.yaml
|
|
```
|
|
|
|
**Exit codes:**
|
|
- 0 = valid
|
|
- Non-zero = validation failed (check stderr for details)
|
|
|
|
**Common validation errors:**
|
|
- "duplicate allowed_nodes blocks" - Run migration (see below)
|
|
- "failed to parse YAML" - Syntax error in config file
|
|
- "read_timeout must be positive" - Invalid timeout value
|
|
|
|
### Manage Allowed Nodes
|
|
|
|
The CLI provides two modes:
|
|
|
|
**Merge mode (default):** Adds nodes to existing list
|
|
```bash
|
|
# Add single node
|
|
pulse-sensor-proxy config set-allowed-nodes --merge 192.168.0.10
|
|
|
|
# Add multiple nodes
|
|
pulse-sensor-proxy config set-allowed-nodes \
|
|
--merge 192.168.0.1 \
|
|
--merge 192.168.0.2 \
|
|
--merge node1.local
|
|
```
|
|
|
|
**Replace mode:** Overwrites entire list
|
|
```bash
|
|
# Replace with new list
|
|
pulse-sensor-proxy config set-allowed-nodes --replace \
|
|
--merge 192.168.0.1 \
|
|
--merge 192.168.0.2
|
|
|
|
# Clear the list (empty is valid for IPC-only clusters)
|
|
pulse-sensor-proxy config set-allowed-nodes --replace
|
|
```
|
|
|
|
**Custom paths:**
|
|
```bash
|
|
# Use non-default path
|
|
pulse-sensor-proxy config set-allowed-nodes \
|
|
--allowed-nodes /custom/path.yaml \
|
|
--merge 192.168.0.10
|
|
```
|
|
|
|
### How It Works
|
|
|
|
1. **File locking:** Uses `flock(LOCK_EX)` on separate `.lock` file
|
|
2. **Atomic writes:** Writes to temp file, syncs, then renames
|
|
3. **Deduplication:** Automatically removes duplicate entries
|
|
4. **Normalization:** Trims whitespace, sorts entries
|
|
5. **Empty lists allowed:** Useful for security lockdown or IPC-based discovery
|
|
|
|
## Common Tasks
|
|
|
|
### Adding Nodes After Cluster Expansion
|
|
|
|
When you add a new node to your Proxmox cluster:
|
|
|
|
```bash
|
|
# Add the new node to allowed list
|
|
pulse-sensor-proxy config set-allowed-nodes --merge new-node.local
|
|
|
|
# Validate config
|
|
pulse-sensor-proxy config validate
|
|
|
|
# Restart proxy to apply
|
|
sudo systemctl restart pulse-sensor-proxy
|
|
|
|
# Verify in Pulse UI
|
|
# Check Settings → Diagnostics → Temperature Proxy
|
|
```
|
|
|
|
### Removing Decommissioned Nodes
|
|
|
|
When removing a node from your cluster:
|
|
|
|
```bash
|
|
# Get current list
|
|
cat /etc/pulse-sensor-proxy/allowed_nodes.yaml
|
|
|
|
# Replace with updated list (without old node)
|
|
pulse-sensor-proxy config set-allowed-nodes --replace \
|
|
--merge 192.168.0.1 \
|
|
--merge 192.168.0.2
|
|
# (omit the decommissioned node)
|
|
|
|
# Validate and restart
|
|
pulse-sensor-proxy config validate
|
|
sudo systemctl restart pulse-sensor-proxy
|
|
```
|
|
|
|
**Note:** The proxy cleanup system automatically removes SSH keys from deleted nodes. See temperature monitoring docs for details.
|
|
|
|
### Migrating from Inline Config
|
|
|
|
If you're running an older version with inline `allowed_nodes:` in config.yaml:
|
|
|
|
```bash
|
|
# Upgrade to latest version (auto-migrates)
|
|
curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/install-sensor-proxy.sh | \
|
|
sudo bash -s -- --standalone --pulse-server http://your-pulse:7655
|
|
|
|
# Verify migration
|
|
pulse-sensor-proxy config validate
|
|
|
|
# Check that allowed_nodes only appears in allowed_nodes.yaml
|
|
grep -n "allowed_nodes:" /etc/pulse-sensor-proxy/*.yaml
|
|
# Should show: allowed_nodes.yaml:3:allowed_nodes:
|
|
# Should NOT show duplicate entries in config.yaml
|
|
```
|
|
|
|
### Changing Other Config Settings
|
|
|
|
For settings in `config.yaml` (not allowed_nodes):
|
|
|
|
```bash
|
|
# Stop the service first
|
|
sudo systemctl stop pulse-sensor-proxy
|
|
|
|
# Edit config.yaml manually
|
|
sudo nano /etc/pulse-sensor-proxy/config.yaml
|
|
|
|
# Validate before starting
|
|
pulse-sensor-proxy config validate
|
|
|
|
# Start service
|
|
sudo systemctl start pulse-sensor-proxy
|
|
|
|
# Check for errors
|
|
sudo systemctl status pulse-sensor-proxy
|
|
journalctl -u pulse-sensor-proxy -n 50
|
|
```
|
|
|
|
**Safe to edit in config.yaml:**
|
|
- `allowed_source_subnets`
|
|
- `allowed_peers` (UID/GID permissions)
|
|
- `rate_limit` settings
|
|
- `metrics_address`
|
|
- `http_*` settings (HTTPS mode)
|
|
- `pulse_control_plane` block
|
|
|
|
**Never edit manually:**
|
|
- `allowed_nodes:` (use CLI instead, or it will be in allowed_nodes.yaml anyway)
|
|
- Lock files (`.lock`)
|
|
|
|
## Troubleshooting
|
|
|
|
### Config Validation Fails
|
|
|
|
**Symptom:** `pulse-sensor-proxy config validate` returns error
|
|
|
|
**Diagnosis:**
|
|
```bash
|
|
# Run validation with full output
|
|
pulse-sensor-proxy config validate 2>&1
|
|
|
|
# Check for duplicate blocks
|
|
grep -n "allowed_nodes:" /etc/pulse-sensor-proxy/config.yaml
|
|
|
|
# Check YAML syntax
|
|
python3 -c "import yaml; yaml.safe_load(open('/etc/pulse-sensor-proxy/config.yaml'))"
|
|
```
|
|
|
|
**Common fixes:**
|
|
- Duplicate blocks: Run migration (upgrade to v4.31.1+)
|
|
- YAML syntax errors: Fix indentation, remove tabs, check colons
|
|
- Missing required fields: Add `read_timeout`, `write_timeout`
|
|
|
|
### Service Won't Start After Config Change
|
|
|
|
**Diagnosis:**
|
|
```bash
|
|
# Check systemd logs
|
|
journalctl -u pulse-sensor-proxy -n 100
|
|
|
|
# Look for validation errors
|
|
journalctl -u pulse-sensor-proxy | grep -i "validation\|corrupt\|duplicate"
|
|
|
|
# Try starting in foreground for better errors
|
|
sudo -u pulse-sensor-proxy /opt/pulse/sensor-proxy/bin/pulse-sensor-proxy # legacy installs: /usr/local/bin/pulse-sensor-proxy
|
|
```
|
|
|
|
**Fix:**
|
|
```bash
|
|
# Validate config first
|
|
pulse-sensor-proxy config validate
|
|
|
|
# If validation passes but service fails, check permissions
|
|
ls -la /etc/pulse-sensor-proxy/
|
|
ls -la /var/lib/pulse-sensor-proxy/
|
|
|
|
# Ensure proxy user owns files
|
|
sudo chown -R pulse-sensor-proxy:pulse-sensor-proxy /etc/pulse-sensor-proxy/
|
|
sudo chown -R pulse-sensor-proxy:pulse-sensor-proxy /var/lib/pulse-sensor-proxy/
|
|
```
|
|
|
|
### Lock File Errors
|
|
|
|
**Symptom:** `failed to acquire file lock` or `failed to open lock file`
|
|
|
|
**Cause:** Lock file has wrong permissions or process holds stale lock
|
|
|
|
**Fix:**
|
|
```bash
|
|
# Check lock file permissions (should be 0600)
|
|
ls -la /etc/pulse-sensor-proxy/*.lock
|
|
|
|
# Fix permissions
|
|
sudo chmod 0600 /etc/pulse-sensor-proxy/*.lock
|
|
sudo chown pulse-sensor-proxy:pulse-sensor-proxy /etc/pulse-sensor-proxy/*.lock
|
|
|
|
# If stale lock, identify holder
|
|
sudo lsof /etc/pulse-sensor-proxy/allowed_nodes.yaml.lock
|
|
|
|
# Kill stale process if needed (use with caution)
|
|
sudo kill <PID>
|
|
```
|
|
|
|
**Prevention:** Locks are automatically released when process exits. Don't manually delete lock files.
|
|
|
|
### Allowed Nodes List is Empty
|
|
|
|
**Symptom:** allowed_nodes.yaml exists but has no entries
|
|
|
|
**Is this a problem?** Not necessarily:
|
|
- Empty list is valid for clusters using IPC discovery (pvecm status)
|
|
- Control-plane mode populates the list automatically
|
|
- Standalone nodes require manual node entries
|
|
|
|
**To populate manually:**
|
|
```bash
|
|
# Add your cluster nodes
|
|
pulse-sensor-proxy config set-allowed-nodes --replace \
|
|
--merge 192.168.0.1 \
|
|
--merge 192.168.0.2 \
|
|
--merge 192.168.0.3
|
|
|
|
# Verify
|
|
cat /etc/pulse-sensor-proxy/allowed_nodes.yaml
|
|
```
|
|
|
|
## Best Practices
|
|
|
|
### General Guidelines
|
|
|
|
1. **Always validate before restarting:**
|
|
```bash
|
|
pulse-sensor-proxy config validate && sudo systemctl restart pulse-sensor-proxy
|
|
```
|
|
|
|
2. **Use the CLI for allowed_nodes changes:**
|
|
- Don't edit `allowed_nodes.yaml` manually
|
|
- Use `config set-allowed-nodes` instead
|
|
|
|
3. **Stop service before editing config.yaml:**
|
|
- Prevents race conditions with running process
|
|
- systemd validation will catch errors on startup
|
|
|
|
4. **Back up config before major changes:**
|
|
```bash
|
|
sudo cp /etc/pulse-sensor-proxy/config.yaml /etc/pulse-sensor-proxy/config.yaml.backup
|
|
sudo cp /etc/pulse-sensor-proxy/allowed_nodes.yaml /etc/pulse-sensor-proxy/allowed_nodes.yaml.backup
|
|
```
|
|
|
|
5. **Monitor after changes:**
|
|
```bash
|
|
journalctl -u pulse-sensor-proxy -f
|
|
# Check Pulse UI: Settings → Diagnostics → Temperature Proxy
|
|
```
|
|
|
|
### Automation Scripts
|
|
|
|
When scripting config changes:
|
|
|
|
```bash
|
|
#!/bin/bash
|
|
set -euo pipefail
|
|
|
|
# Function to safely update allowed nodes
|
|
update_allowed_nodes() {
|
|
local nodes=("$@")
|
|
|
|
# Build command
|
|
local cmd="pulse-sensor-proxy config set-allowed-nodes --replace"
|
|
for node in "${nodes[@]}"; do
|
|
cmd="$cmd --merge $node"
|
|
done
|
|
|
|
# Execute with validation
|
|
if eval "$cmd"; then
|
|
echo "Allowed nodes updated successfully"
|
|
else
|
|
echo "Failed to update allowed nodes" >&2
|
|
return 1
|
|
fi
|
|
|
|
# Validate
|
|
if ! pulse-sensor-proxy config validate; then
|
|
echo "Config validation failed after update" >&2
|
|
return 1
|
|
fi
|
|
|
|
# Restart service
|
|
if sudo systemctl restart pulse-sensor-proxy; then
|
|
echo "Service restarted successfully"
|
|
else
|
|
echo "Service restart failed" >&2
|
|
return 1
|
|
fi
|
|
|
|
# Wait for service to be active
|
|
sleep 2
|
|
if systemctl is-active --quiet pulse-sensor-proxy; then
|
|
echo "Service is running"
|
|
else
|
|
echo "Service failed to start" >&2
|
|
journalctl -u pulse-sensor-proxy -n 20
|
|
return 1
|
|
fi
|
|
}
|
|
|
|
# Example usage
|
|
update_allowed_nodes "192.168.0.1" "192.168.0.2" "node3.local"
|
|
```
|
|
|
|
### Monitoring Config Health
|
|
|
|
Add to your monitoring system:
|
|
|
|
```bash
|
|
# Check for config corruption (should return 0)
|
|
pulse-sensor-proxy config validate
|
|
echo $?
|
|
|
|
# Check for duplicate blocks (should be empty)
|
|
grep "allowed_nodes:" /etc/pulse-sensor-proxy/config.yaml | wc -l
|
|
|
|
# Check lock file permissions (should be 0600)
|
|
stat -c "%a" /etc/pulse-sensor-proxy/*.lock
|
|
|
|
# Check service is running
|
|
systemctl is-active pulse-sensor-proxy
|
|
```
|
|
|
|
## Migration Path
|
|
|
|
### Upgrading from Pre-v4.31.1
|
|
|
|
**Automatic migration** (recommended):
|
|
```bash
|
|
# Simply reinstall - migration runs automatically
|
|
curl -fsSL https://raw.githubusercontent.com/rcourtman/Pulse/main/scripts/install-sensor-proxy.sh | \
|
|
sudo bash -s -- --standalone --pulse-server http://your-pulse:7655
|
|
|
|
# Verify
|
|
pulse-sensor-proxy config validate
|
|
sudo systemctl status pulse-sensor-proxy
|
|
```
|
|
|
|
**Manual migration** (if needed):
|
|
```bash
|
|
# 1. Stop service
|
|
sudo systemctl stop pulse-sensor-proxy
|
|
|
|
# 2. Extract allowed_nodes from config.yaml
|
|
grep -A 100 "^allowed_nodes:" /etc/pulse-sensor-proxy/config.yaml > /tmp/nodes.txt
|
|
|
|
# 3. Parse and add to allowed_nodes.yaml
|
|
# (Example for simple list - adjust for your format)
|
|
pulse-sensor-proxy config set-allowed-nodes --replace \
|
|
--merge node1.local \
|
|
--merge node2.local
|
|
|
|
# 4. Remove allowed_nodes from config.yaml
|
|
# Edit manually or use sed:
|
|
sudo sed -i '/^allowed_nodes:/,/^[a-z_]/d' /etc/pulse-sensor-proxy/config.yaml
|
|
|
|
# 5. Add reference to allowed_nodes.yaml
|
|
echo "allowed_nodes_file: /etc/pulse-sensor-proxy/allowed_nodes.yaml" | \
|
|
sudo tee -a /etc/pulse-sensor-proxy/config.yaml
|
|
|
|
# 6. Validate
|
|
pulse-sensor-proxy config validate
|
|
|
|
# 7. Start service
|
|
sudo systemctl start pulse-sensor-proxy
|
|
```
|
|
|
|
## Related Documentation
|
|
|
|
- [Temperature Monitoring](../TEMPERATURE_MONITORING.md) - Setup and troubleshooting
|
|
- [Sensor Proxy README](/opt/pulse/cmd/pulse-sensor-proxy/README.md) - Complete CLI reference
|
|
- [Audit Log Rotation](audit-log-rotation.md) - Managing append-only logs
|
|
- [Temperature Monitoring Security](../TEMPERATURE_MONITORING_SECURITY.md) - Security architecture
|
|
|
|
## Support
|
|
|
|
If config management issues persist after following this guide:
|
|
|
|
1. Collect diagnostics:
|
|
```bash
|
|
pulse-sensor-proxy config validate 2>&1 > /tmp/validate.log
|
|
sudo systemctl status pulse-sensor-proxy > /tmp/status.log
|
|
journalctl -u pulse-sensor-proxy -n 200 > /tmp/journal.log
|
|
grep -n "allowed_nodes:" /etc/pulse-sensor-proxy/*.yaml > /tmp/grep.log
|
|
```
|
|
|
|
2. File an issue at https://github.com/rcourtman/Pulse/issues
|
|
|
|
3. Include:
|
|
- Pulse version
|
|
- Sensor proxy version (`pulse-sensor-proxy --version`)
|
|
- Output from diagnostic commands above
|
|
- Steps that led to the issue
|