Improves configuration handling and system settings APIs to support
v4.24.0 features including runtime logging controls, adaptive polling
configuration, and enhanced config export/persistence.
Changes:
- Add config override system for discovery service
- Enhance system settings API with runtime logging controls
- Improve config persistence and export functionality
- Update security setup handling
- Refine monitoring and discovery service integration
These changes provide the backend support for the configuration
features documented in the v4.24.0 release.
Resolves two remaining TODOs from the codebase audit.
## 1. PBS/PMG Test Harness Stubs
**Location:** internal/monitoring/harness_integration.go:149-151
**Changes:**
- Added PBS client stub registration: `monitor.pbsClients[inst.Name] = &pbs.Client{}`
- Added PMG client stub registration: `monitor.pmgClients[inst.Name] = &pmg.Client{}`
- Added imports for pkg/pbs and pkg/pmg
**Purpose:**
Enables integration test scenarios to include PBS and PMG instance types
alongside existing PVE support. Stubs allow the scheduler to register and
execute tasks for these instance types during integration testing.
**Testing:**
✅ TestAdaptiveSchedulerIntegration passes (55.5s)
✅ Integration test harness now supports all three instance types
## 2. HTTP Config URL Fetch
**Location:** cmd/pulse/config.go:226-261
**Problem:**
`PULSE_INIT_CONFIG_URL` was recognized but not implemented, returning a
"URL import not yet implemented" error.
**Implementation:**
- URL validation (http/https schemes only)
- HTTP client with a 15-second timeout
- Status code validation (2xx required)
- Empty response detection
- Base64 decoding with fallback to raw data
- Matches existing env-var behavior for `PULSE_INIT_CONFIG_DATA` (see the sketch below)
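A minimal sketch of that flow (function name and error strings are assumptions; the scheme check, 15-second timeout, 2xx check, empty-body check, and base64 fallback follow the list above):
```go
package config

import (
	"encoding/base64"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strings"
	"time"
)

// fetchConfigFromURL sketches the fetch path; the real code lives in
// cmd/pulse/config.go and its names may differ.
func fetchConfigFromURL(rawURL string) ([]byte, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, fmt.Errorf("invalid config URL: %w", err)
	}
	// Scheme validation: only http/https, so file:// and friends are rejected.
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, fmt.Errorf("unsupported URL scheme %q", u.Scheme)
	}
	client := &http.Client{Timeout: 15 * time.Second} // avoid hanging on dead servers
	resp, err := client.Get(u.String())
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 { // 2xx required
		return nil, fmt.Errorf("config server returned status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if len(body) == 0 {
		return nil, errors.New("config server returned an empty response")
	}
	// Base64 decode with fallback to the raw bytes, matching the
	// existing PULSE_INIT_CONFIG_DATA handling.
	if decoded, err := base64.StdEncoding.DecodeString(strings.TrimSpace(string(body))); err == nil {
		return decoded, nil
	}
	return body, nil
}
```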
**Security:**
- Both HTTP and HTTPS supported (HTTPS recommended for production)
- URL scheme validation prevents file:// or other protocols
- Timeout prevents hanging on unresponsive servers
**Usage:**
```bash
export PULSE_INIT_CONFIG_URL="https://config-server/encrypted-config"
export PULSE_INIT_CONFIG_PASSPHRASE="secret"
pulse config auto-import
```
**Testing:**
✅ Code compiles cleanly
✅ Follows same pattern as existing PULSE_INIT_CONFIG_DATA handling
## Impact
- Completes integration test infrastructure for all instance types
- Enables automated config distribution via HTTP(S) for container deployments
- Removes last TODOs from codebase (no TODO/FIXME remaining in Go files)
Fixes `panic: assignment to entry in nil map` in PMG polling tests.
**Problem:**
Tests were manually creating Monitor structs without initializing internal
maps like pollStatusMap, causing nil map panics when recordTaskResult()
tried to update task status.
**Root Cause:**
- TestPollPMGInstancePopulatesState (line 90)
- TestPollPMGInstanceRecordsAuthFailures (line 189)
Both created a Monitor with only partial field initialization, missing:
- pollStatusMap
- dlqInsightMap
- instanceInfoCache
- Other internal state maps
**Solution:**
Changed both tests to use New() constructor which properly initializes all
maps and internal state (monitor.go:1541). This ensures tests match production
initialization and will automatically pick up any future map additions.
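The failure mode and fix in miniature, with the Monitor reduced to a single map for illustration (the real struct has several):
```go
package main

import "fmt"

type Monitor struct {
	pollStatusMap map[string]string // one of several internal maps
}

// New mirrors the production constructor: every map is initialized,
// so callers never hit a nil-map assignment.
func New() *Monitor {
	return &Monitor{pollStatusMap: make(map[string]string)}
}

func main() {
	bad := &Monitor{} // partial literal: pollStatusMap is nil
	_ = bad           // bad.pollStatusMap["x"] = "y" would panic here
	good := New()
	good.pollStatusMap["pmg-main"] = "ok" // safe: constructor made the map
	fmt.Println(good.pollStatusMap)
}
```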
**Tests:**
✅ TestPollPMGInstancePopulatesState - now passes
✅ TestPollPMGInstanceRecordsAuthFailures - now passes
✅ All monitoring tests pass (0.125s)
Follows best practice: use constructors instead of manual struct creation
to maintain initialization invariants.
Add comprehensive instance-level diagnostics to /api/monitoring/scheduler/health
**New Response Structure:**
Enhanced "instances" array with per-instance details:
- Instance metadata: displayName, type, connection URL
- Poll status: last success/error timestamps, error messages, error category
- Circuit breaker: state, timestamps, failure counts, retry windows
- Dead letter: present flag, reason, attempt history, retry schedule
**Implementation:**
Data structures:
- instanceInfo: cache of display names, URLs, types
- pollStatus: tracks successes/errors with timestamps and categories
- dlqInsight: DLQ entry metadata (reason, attempts, schedule)
- circuitBreaker: enhanced with stateSince, lastTransition
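A sketch of how those structures might be shaped (field names and JSON tags are assumptions based on the bullets above):
```go
package monitoring

import "time"

// Illustrative shapes only: JSON field names are assumed from the
// bullets above, not copied from the implementation.
type pollStatus struct {
	LastSuccess   time.Time  `json:"lastSuccess"`
	LastError     *time.Time `json:"lastError,omitempty"`
	ErrorMessage  string     `json:"errorMessage,omitempty"`
	ErrorCategory string     `json:"errorCategory,omitempty"` // e.g. "auth", "timeout"
}

type dlqInsight struct {
	Present   bool       `json:"present"`
	Reason    string     `json:"reason,omitempty"`
	Attempts  int        `json:"attempts"`
	NextRetry *time.Time `json:"nextRetry,omitempty"`
}
```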
Tracking logic:
- buildInstanceInfoCache: populate metadata from config on startup
- recordTaskResult: track poll outcomes, error details, categories
- sendToDeadLetter: capture DLQ insights (reason, timestamps)
- circuitBreaker: record state transitions with timestamps
**Backward Compatible:**
- Existing fields (deadLetter, breakers, staleness) unchanged
- New "instances" array is additive
- Old clients can ignore new fields
**Testing:**
- Unit test: TestSchedulerHealth_EnhancedResponse validates all fields
- Integration tests: still passing (55s)
- All error tracking and breaker history verified
**Operator Benefits:**
- Diagnose issues without log digging
- See error messages directly in API
- Understand breaker states and retry schedules
- Track DLQ entries with full context
- Single API call for complete instance health view
Example: quickly identify a "401 unauthorized" error on a specific PBS
instance, see that it's in the DLQ after 5 retries, and know when the
next retry is scheduled.
Part of Phase 2 follow-up work to improve observability.
Implements a structured logging package with LOG_LEVEL/LOG_FORMAT env support, debug-level guards for hot paths, enriched error messages with actionable context, and stack-trace capture for production debugging. Improves observability and reduces logging overhead in high-frequency polling loops.
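A minimal sketch of env-driven setup using the standard library's log/slog; the actual logging package may differ, but the LOG_LEVEL/LOG_FORMAT handling and hot-path guard follow the description above:
```go
package logging

import (
	"context"
	"log/slog"
	"os"
	"strings"
)

// New builds a logger from LOG_LEVEL and LOG_FORMAT.
func New() *slog.Logger {
	level := slog.LevelInfo
	switch strings.ToLower(os.Getenv("LOG_LEVEL")) {
	case "debug":
		level = slog.LevelDebug
	case "warn":
		level = slog.LevelWarn
	case "error":
		level = slog.LevelError
	}
	opts := &slog.HandlerOptions{Level: level}
	var h slog.Handler = slog.NewTextHandler(os.Stderr, opts)
	if strings.ToLower(os.Getenv("LOG_FORMAT")) == "json" {
		h = slog.NewJSONHandler(os.Stderr, opts)
	}
	return slog.New(h)
}

// logPoll shows the debug-level guard for hot paths: argument
// construction is skipped entirely when debug is disabled.
func logPoll(log *slog.Logger, instance string) {
	if log.Enabled(context.Background(), slog.LevelDebug) {
		log.Debug("poll tick", "instance", instance)
	}
}
```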
Task 8 of 10 complete. Exposes read-only scheduler health data including:
- Queue depth and distribution by instance type
- Dead-letter queue inspection (top 25 tasks with error details)
- Circuit breaker states (instance-level)
- Staleness scores per instance
New API endpoint:
GET /api/monitoring/scheduler/health (requires authentication)
New snapshot methods:
- StalenessTracker.Snapshot() - exports all staleness data
- TaskQueue.Snapshot() - queue depth & per-type distribution
- TaskQueue.PeekAll() - dead-letter task inspection
- circuitBreaker.State() - exports state, failures, retryAt
- Monitor.SchedulerHealth() - aggregates all health data
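A simplified sketch of the queue snapshot (types reduced; the real TaskQueue wraps the min-heap introduced in the scheduling work):
```go
package monitoring

import "sync"

// Snapshot takes a consistent read-only copy under the lock, so the
// health endpoint never mutates scheduler state or blocks workers long.
type TaskQueue struct {
	mu    sync.Mutex
	tasks []Task
}

type Task struct{ InstanceType string }

type QueueSnapshot struct {
	Depth  int            `json:"depth"`
	ByType map[string]int `json:"byType"`
}

func (q *TaskQueue) Snapshot() QueueSnapshot {
	q.mu.Lock()
	defer q.mu.Unlock()
	snap := QueueSnapshot{Depth: len(q.tasks), ByType: make(map[string]int)}
	for _, t := range q.tasks {
		snap.ByType[t.InstanceType]++
	}
	return snap
}
```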
Documentation updated with API spec, field descriptions, and usage examples.
Replaces immediate polling with queue-based scheduling:
- TaskQueue with min-heap (container/heap) for NextRun-ordered execution
- Worker goroutines that block on WaitNext() until tasks are due
- Tasks only execute when NextRun <= now, respecting adaptive intervals
- Automatic rescheduling after execution via scheduler.BuildPlan
- Queue depth tracking for backpressure-aware interval adjustments
- Upsert semantics for updating scheduled tasks without duplicates
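A self-contained sketch of the NextRun-ordered min-heap using container/heap (names simplified from the real implementation):
```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

type Task struct {
	Instance string
	NextRun  time.Time
}

// taskHeap orders tasks by NextRun, earliest first.
type taskHeap []*Task

func (h taskHeap) Len() int           { return len(h) }
func (h taskHeap) Less(i, j int) bool { return h[i].NextRun.Before(h[j].NextRun) }
func (h taskHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *taskHeap) Push(x any)        { *h = append(*h, x.(*Task)) }
func (h *taskHeap) Pop() any {
	old := *h
	n := len(old)
	t := old[n-1]
	*h = old[:n-1]
	return t
}

func main() {
	now := time.Now()
	h := &taskHeap{}
	heap.Init(h)
	heap.Push(h, &Task{"pbs-1", now.Add(30 * time.Second)})
	heap.Push(h, &Task{"pve-1", now.Add(5 * time.Second)})
	// Workers block on WaitNext() until the head task is due; here we
	// just show that the earliest NextRun pops first.
	next := heap.Pop(h).(*Task)
	fmt.Println(next.Instance) // pve-1
}
```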
Task 6 of 10 complete (60%). Ready for error/backoff policies.
Confirms adaptive scheduling logic is fully operational:
- EMA smoothing (alpha=0.6) to prevent interval oscillations
- Staleness-based interpolation between min/max intervals
- Error penalty (0.6x per error) for faster recovery detection
- Queue depth stretch (0.1x per task) for backpressure handling
- ±5% jitter to prevent thundering herd effects
- Per-instance state tracking for smooth transitions
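A sketch of how those factors might combine into one interval computation; the constants match the listed values, but the exact formula and the interpolation direction (higher staleness yields a shorter interval) are assumptions:
```go
package scheduler

import (
	"math/rand"
	"time"
)

func nextInterval(prev, min, max time.Duration, staleness float64, errs, queueDepth int) time.Duration {
	// Staleness-based interpolation between min and max intervals.
	target := float64(min) + float64(max-min)*(1-staleness)
	// Error penalty: 0.6x per consecutive error, for faster recovery detection.
	for i := 0; i < errs; i++ {
		target *= 0.6
	}
	// Queue-depth stretch: +0.1x per queued task (backpressure).
	target *= 1 + 0.1*float64(queueDepth)
	// EMA smoothing (alpha = 0.6) to damp interval oscillations.
	smoothed := 0.6*target + 0.4*float64(prev)
	// ±5% jitter to avoid thundering-herd polling.
	jittered := smoothed * (0.95 + 0.1*rand.Float64())
	out := time.Duration(jittered)
	if out < min {
		out = min
	}
	if out > max {
		out = max
	}
	return out
}
```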
Task 5 of 10 complete. Scheduler foundation ready for queue-based execution.
Adds freshness metadata tracking for all monitored instances:
- StalenessTracker with per-instance last success/error/mutation timestamps
- Change detection via SHA-1 hashing to catch data mutations
- Normalized staleness scoring (0-1 scale) based on age vs maxStale
- Integration with PollMetrics for authoritative last-success data
- Wired into all poll functions (PVE/PBS/PMG) via UpdateSuccess/UpdateError
- Connected to scheduler as StalenessSource implementation
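A sketch of the scoring and hashing described above (function names are illustrative):
```go
package monitoring

import (
	"crypto/sha1"
	"encoding/hex"
	"time"
)

// stalenessScore normalizes age against maxStale: 0 means just
// refreshed, 1 means at or beyond the staleness budget.
func stalenessScore(lastSuccess time.Time, maxStale time.Duration) float64 {
	age := time.Since(lastSuccess)
	if age <= 0 {
		return 0
	}
	if age >= maxStale {
		return 1
	}
	return float64(age) / float64(maxStale)
}

// changeHash fingerprints polled data so mutations are detected even
// when timestamps alone would not change.
func changeHash(data []byte) string {
	sum := sha1.Sum(data)
	return hex.EncodeToString(sum[:])
}
```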
Task 4 of 10 complete. Ready for adaptive interval logic.
Track minimum and maximum CPU temperatures since monitoring started.
This provides better insight into temperature trends and cooling
adequacy over time.
Changes:
- Backend: Add CPUMin, CPUMaxRecord, MinRecorded, MaxRecorded fields
to Temperature model
- Backend: Implement min/max tracking logic in monitoring cycle that
preserves values across polling cycles
- Backend: Initialize min/max on first reading, update on extremes
- Frontend: Update Temperature TypeScript interface with new fields
- Frontend: Display min/max range in NodeCard tooltip (e.g., "52°C
(48-67°C since monitoring started)")
- Frontend: Rebuild dist assets
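A sketch of the backend tracking logic, assuming MinRecorded/MaxRecorded are timestamps of when each extreme was observed (the commit does not spell out their types):
```go
package monitoring

import "time"

// Field names follow the bullets above.
type Temperature struct {
	CPU          float64
	CPUMin       float64
	CPUMaxRecord float64
	MinRecorded  time.Time
	MaxRecorded  time.Time
}

// updateMinMax seeds both extremes on the first reading, then only
// moves them outward; values persist across polling cycles because
// they live on the retained Temperature model (reset on restart).
func updateMinMax(t *Temperature, current float64, now time.Time) {
	if t.MinRecorded.IsZero() || current < t.CPUMin {
		t.CPUMin, t.MinRecorded = current, now
	}
	if t.MaxRecorded.IsZero() || current > t.CPUMaxRecord {
		t.CPUMaxRecord, t.MaxRecorded = current, now
	}
}
```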
Temperature display now shows:
- Current temperature with color coding (green/yellow/red)
- Tooltip with full min-max range and context
- Min/max tracked in memory (reset on Pulse restart)
Example tooltip: "CPU: 52°C (48-67°C since monitoring started)"
- Fix script input handling to work with the standard `curl | bash` pattern by prioritizing /dev/tty
- Add Raspberry Pi temperature sensor support (cpu_thermal chip and generic temp sensors)
- Add comprehensive documentation for turnkey standalone node setup
- Fix printf formatting error in setup script
Implements automatic temperature monitoring setup for standalone
Proxmox/Pimox nodes without manual SSH key configuration.
Changes:
- Add /api/system/proxy-public-key endpoint to expose proxy's SSH public key
- Setup script now detects standalone nodes (non-cluster)
- Auto-fetches and installs proxy SSH key with forced commands
- Add Raspberry Pi temperature support via cpu_thermal and /sys/class/thermal
- Enhance setup script with better error handling for lm-sensors installation
- Add RPi detection to skip lm-sensors and use native thermal interface
Security:
- Public key endpoint is safe (public keys are meant to be public)
- All installed keys use a forced `command="sensors -j"` with full restrictions
- No shell access, port forwarding, or other SSH features enabled
Implements an automated cleanup workflow when nodes are deleted from Pulse, removing all monitoring footprint from the host. Changes include a new RPC handler in the sensor proxy for cleanup requests, an enhanced node deletion modal with detailed cleanup explanations, and improved SSH key management with proper tagging for atomic updates.
Improvements to pulse-sensor-proxy:
- Fix cluster discovery to use `pvecm status` for IP addresses instead of node names
- Add standalone node support for non-clustered Proxmox hosts
- Enhanced SSH key push with detailed logging, success/failure tracking, and error reporting
- Add --pulse-server flag to installer for custom Pulse URLs
- Configure www-data group membership for Proxmox IPC access
UI and API cleanup:
- Remove unused "Ensure cluster keys" button from Settings
- Remove /api/diagnostics/temperature-proxy/ensure-cluster-keys endpoint
- Remove EnsureClusterKeys method from tempproxy client
The setup script already handles SSH key distribution during initial configuration,
making the manual refresh button redundant.
Addresses #101
v4.23.0 introduced a regression where systems with only NVMe temperatures
(no CPU sensor) would display "No CPU sensor" in the UI. This was caused
by the Available flag being set to true when NVMe temps existed, even
without CPU data, triggering the error message in the frontend.
Backend changes:
- Add HasCPU and HasNVMe boolean fields to Temperature model
- Extend CPU sensor detection to support more chip types: zenpower,
k8temp, acpitz, it87 (case-insensitive matching)
- HasCPU is set based on CPU chip detection (coretemp, k10temp, etc.),
not value thresholds
- This prevents false negatives when sensors report 0°C during resets
- CPU temperature values now accepted even when 0 (checked with !IsNaN
instead of > 0)
- extractTempInput returns NaN instead of 0 when no data is found
- Available flag means "any temperature data exists" for backward compatibility
- Update mock generator to properly set the new flags
- Add unit tests for NVMe-only and 0°C scenarios to prevent regression
- Removed amd_energy from CPU chip list (power sensor, not temperature)
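A sketch of the detection rules (helper names are illustrative; the chip list and !IsNaN check come from the bullets above):
```go
package monitoring

import (
	"math"
	"strings"
)

var cpuChips = []string{"coretemp", "k10temp", "zenpower", "k8temp", "acpitz", "it87"}

// isCPUChip does case-insensitive matching against known CPU chips;
// HasCPU follows chip detection, never value thresholds.
func isCPUChip(name string) bool {
	lower := strings.ToLower(name)
	for _, chip := range cpuChips {
		if strings.Contains(lower, chip) {
			return true
		}
	}
	return false
}

// acceptCPUTemp mirrors the !IsNaN check: NaN means "no data", so a
// genuine 0°C reading during a sensor reset is still accepted.
func acceptCPUTemp(value float64) bool {
	return !math.IsNaN(value)
}
```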
Frontend changes:
- Add hasCPU and hasNVMe optional fields to Temperature interface
- Update NodeSummaryTable to check hasCPU flag with fallback to available
for backward compatibility with older API responses
- Update NodeCard temperature display logic with same fallback pattern
- Systems with only NVMe temps now show "-" instead of error message
- Fallback ensures UI works with both old and new API responses
Testing:
- All unit tests pass including NVMe-only and 0°C test cases
- Fix prevents false "no CPU sensor" errors when sensors temporarily report 0°C
- Fix eliminates false "no CPU sensor" errors for NVMe-only systems
Addresses #528
Introduces pulse-temp-proxy architecture to eliminate SSH key exposure in containers:
**Architecture:**
- pulse-temp-proxy runs on Proxmox host (outside LXC/Docker)
- SSH keys stored on host filesystem (/var/lib/pulse-temp-proxy/ssh/)
- Pulse communicates via unix socket (bind-mounted into container)
- Proxy handles cluster discovery, key rollout, and temperature fetching
**Components:**
- cmd/pulse-temp-proxy: Standalone Go binary with unix socket RPC server
- internal/tempproxy: Client library for Pulse backend
- scripts/install-temp-proxy.sh: Idempotent installer for existing deployments
- scripts/pulse-temp-proxy.service: Systemd service for proxy
**Integration:**
- Pulse automatically detects and uses proxy when socket exists
- Falls back to direct SSH for native installations
- Installer automatically configures proxy for new LXC deployments
- Existing LXC users can upgrade by running install-temp-proxy.sh
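A sketch of the detection decision; the socket path and return values are placeholders, not the real internal/tempproxy API:
```go
package monitoring

import "os"

// temperatureSource picks proxy RPC when the bind-mounted unix socket
// exists, otherwise falls back to direct SSH for native installs.
func temperatureSource(socketPath string) string {
	if fi, err := os.Stat(socketPath); err == nil && fi.Mode()&os.ModeSocket != 0 {
		return "proxy" // RPC over the unix socket to the host-side proxy
	}
	return "ssh" // native install: keep direct SSH
}
```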
**Security improvements:**
- Container compromise no longer exposes SSH keys
- SSH keys never enter container filesystem
- Maintains forced command restrictions
- Transparent to users - no workflow changes
**Documentation:**
- Updated TEMPERATURE_MONITORING.md with new architecture
- Added verification steps and upgrade instructions
- Preserved legacy documentation for native installs
The PUT /api/config/nodes/{id} endpoint was corrupting node configurations
when making partial updates (e.g., updating just monitorPhysicalDisks):
- Authentication fields (tokenName, tokenValue, password) were being cleared
when updating unrelated settings
- Name field was being blanked when not included in the request
- Monitor* boolean fields were defaulting to false
Changes:
- Only update name field if explicitly provided in request
- Only switch authentication method when auth fields are explicitly provided
- Preserve existing auth credentials on non-auth updates
- Applied fix to all node types (PVE, PBS, PMG)
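A sketch of the only-if-provided pattern behind the fix, using pointer fields to distinguish "absent from the request" from "zero value" (types are illustrative):
```go
package api

type NodeConfig struct {
	Name                 string
	TokenName            string
	TokenValue           string
	Password             string
	MonitorPhysicalDisks bool
}

type nodeUpdateRequest struct {
	Name                 *string `json:"name,omitempty"`
	TokenName            *string `json:"tokenName,omitempty"`
	TokenValue           *string `json:"tokenValue,omitempty"`
	Password             *string `json:"password,omitempty"`
	MonitorPhysicalDisks *bool   `json:"monitorPhysicalDisks,omitempty"`
}

func applyUpdate(existing *NodeConfig, req nodeUpdateRequest) {
	if req.Name != nil { // only update the name when explicitly sent
		existing.Name = *req.Name
	}
	// Only switch auth method when auth fields are explicitly provided;
	// otherwise stored credentials are preserved untouched.
	if req.TokenName != nil && req.TokenValue != nil {
		existing.TokenName, existing.TokenValue = *req.TokenName, *req.TokenValue
	} else if req.Password != nil {
		existing.Password = *req.Password
	}
	if req.MonitorPhysicalDisks != nil {
		existing.MonitorPhysicalDisks = *req.MonitorPhysicalDisks
	}
}
```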
Also enables physical disk monitoring by default (opt-out instead of opt-in)
and preserves disk data between polling intervals.