Related to #630
When using the efficient polling path (cluster/resources endpoint), guest
agent calls to GetVMFSInfo were made without retry logic. This could cause
transient "Guest details unavailable" errors during initialization when the
guest agent wasn't immediately ready to respond.
The traditional polling path already used retryGuestAgentCall for filesystem
info queries, providing resilience against transient timeouts. This commit
applies the same retry logic to the efficient polling path for consistency.
Changes:
- Wrap GetVMFSInfo call in efficient polling with retryGuestAgentCall
- Use configured guestAgentFSInfoTimeout and guestAgentRetries settings
- Ensure consistent behavior between the traditional and efficient polling paths
This should resolve the transient initialization issue reported in #630 where
guest details were unavailable until after a reinstall/restart.
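A minimal sketch of the retry shape, assuming a generic helper; the real retryGuestAgentCall in the monitoring code may differ in signature, and the wrapped call stands in for GetVMFSInfo:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Assumed shape of retryGuestAgentCall: each attempt gets its own timeout,
// and transient failures are retried up to the configured count.
func retryGuestAgentCall[T any](ctx context.Context, timeout time.Duration,
	retries int, call func(context.Context) (T, error)) (T, error) {
	var zero T
	var lastErr error
	for attempt := 0; attempt <= retries; attempt++ {
		callCtx, cancel := context.WithTimeout(ctx, timeout)
		result, err := call(callCtx)
		cancel()
		if err == nil {
			return result, nil
		}
		lastErr = err
	}
	return zero, fmt.Errorf("guest agent call failed after %d attempts: %w",
		retries+1, lastErr)
}
```

In the efficient polling path, the GetVMFSInfo call is then wrapped the same way the traditional path already wraps it, using the configured guestAgentFSInfoTimeout and guestAgentRetries values.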
Related to discussion #577
When backups are stored on shared storage accessible from multiple nodes,
the backup polling code was incorrectly assigning the backup to whichever
node it was discovered on during the scan, rather than the node where the
VM/container actually resides.
This fix:
- Builds a lookup map of VMID -> actual node at the start of backup polling
- Uses this map to assign the correct node for guest backups (VMID > 0)
- Preserves existing behavior for host backups (VMID == 0)
- Falls back to the queried node if the guest is not found in the map
This ensures the NODE column accurately reflects which node hosts each
guest, matching the information displayed on the main page.
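A sketch of the lookup under simplified types; the real structs in Pulse carry many more fields:

```go
package main

// Simplified stand-ins for the guest and backup records.
type guest struct {
	VMID int
	Node string
}

type storageBackup struct {
	VMID int
	Node string // node the backup was discovered on during the scan
}

func assignBackupNodes(guests []guest, backups []storageBackup) {
	// Build the VMID -> actual node map once at the start of polling.
	nodeByVMID := make(map[int]string, len(guests))
	for _, g := range guests {
		nodeByVMID[g.VMID] = g.Node
	}
	for i := range backups {
		if backups[i].VMID == 0 {
			continue // host backup: existing behavior preserved
		}
		if node, ok := nodeByVMID[backups[i].VMID]; ok {
			backups[i].Node = node // node that actually hosts the guest
		}
		// Not found in the map: keep the queried node as the fallback.
	}
}
```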
Related to #614
Corrects three issues with PMG monitoring:
1. Remove unsupported timeframe parameter from GetMailStatistics
- PMG API /statistics/mail does not accept a timeframe parameter
- Previously sent "timeframe=day", causing a 400 error
- The API returns current-day statistics by default
2. Fix GetMailCount timespan parameter to use seconds
- Changed from 24 (hours) to 86400 (seconds)
- The PMG API expects timespan in seconds, not hours
- Previously sent "timespan=24", causing a 400 error
3. Update function signature and tests
- Renamed GetMailCount parameter from timespanHours to timespanSeconds
- Updated test expectations to match corrected API calls
- Tests verify parameters are sent correctly
These changes align the PMG client with actual PMG API requirements,
fixing the data population issues reported in v4.25.0.
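Illustrative query construction after the fix; the /statistics/mail path is from the commit, while the mailcount path and plumbing below are assumptions about the PMG API layout:

```go
package main

import (
	"net/url"
	"strconv"
)

// /statistics/mail takes no timeframe parameter; it returns the current
// day's statistics by default, so the path is used bare.
const mailStatsPath = "/api2/json/statistics/mail"

func mailCountQuery(timespanSeconds int) string {
	params := url.Values{}
	// PMG expects timespan in seconds: 86400 covers the last 24 hours.
	params.Set("timespan", strconv.Itoa(timespanSeconds))
	// NOTE: this mailcount path is an assumption about the PMG API layout.
	return "/api2/json/statistics/mailcount?" + params.Encode()
}
```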
Related to #613
When all PBS datastore queries fail (e.g., due to network issues or PBS
downtime), the system was clearing all backups and showing an empty list.
This adds the same preservation logic that exists for PVE storage backups.
Changes:
- Add shouldPreservePBSBackups() helper function
- Track datastore query success/failure counts in pollPBSBackups()
- Preserve existing backups when all datastore queries fail
- Add comprehensive unit tests for PBS backup preservation logic
This ensures users can still see their backup history even during
temporary connectivity issues with PBS, matching the behavior already
implemented for PVE storage backups.
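A minimal sketch of the preservation rule, assuming the helper reduces to counts like these; the shipped helper may take richer inputs:

```go
package main

// shouldPreservePBSBackups keeps the previous snapshot only when every
// datastore query failed and there is prior backup data worth preserving.
func shouldPreservePBSBackups(successCount, failureCount, existingBackups int) bool {
	return successCount == 0 && failureCount > 0 && existingBackups > 0
}
```

When it returns true, pollPBSBackups keeps the existing backup list instead of replacing it with an empty one.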
This change modifies the `clusterEndpointEffectiveURL` function to prioritize
IP addresses over hostnames when building cluster endpoint URLs. This eliminates
excessive DNS lookups that can overwhelm DNS servers (e.g., pi-hole), which was
causing hundreds of thousands of unnecessary DNS queries.
When Pulse communicates with Proxmox cluster nodes, it will now:
1. First try to use the IP address from ClusterEndpoint.IP
2. Fall back to ClusterEndpoint.Host only if IP is not available
This is a minimal, backwards-compatible change that maintains existing
functionality while dramatically reducing DNS traffic for clusters where
node IPs are already known and stored.
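A sketch of the IP-first selection; the field names match the commit, while the scheme and default Proxmox port 8006 are simplifications:

```go
package main

import "fmt"

// Trimmed to the two fields relevant here.
type ClusterEndpoint struct {
	IP   string
	Host string
}

func clusterEndpointEffectiveURL(ep ClusterEndpoint) string {
	host := ep.Host
	if ep.IP != "" {
		host = ep.IP // no DNS lookup needed when the IP is already stored
	}
	return fmt.Sprintf("https://%s:8006", host)
}
```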
Related to #620
Related to discussion #615
Add optional GuestURL field to PVE instances and cluster endpoints,
allowing users to specify a separate guest-accessible URL for web UI
navigation that differs from the internal management URL.
Backend changes:
- Add GuestURL field to PVEInstance and ClusterEndpoint structs
- Add GuestURL field to Node model
- Update cluster auto-discovery to preserve existing GuestURL values
- Update node creation logic to populate GuestURL from config
- Update API handlers to accept and persist GuestURL field
Frontend changes:
- Add GuestURL input field to NodeModal for configuration
- Update NodeGroupHeader and NodeSummaryTable to use GuestURL for navigation
- Add GuestURL to Node and PVENodeConfig TypeScript interfaces
When GuestURL is configured, it will be used for navigation links
instead of the Host URL, allowing users to access PVE hosts through
a reverse proxy or different domain while maintaining internal API
connections.
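A sketch of the URL choice, with the Node model trimmed to the two relevant fields and a hypothetical helper:

```go
package main

// Trimmed Node model; GuestURL and Host are the fields named above.
type Node struct {
	Host     string // internal management URL used for API connections
	GuestURL string // optional user-facing URL, e.g. behind a reverse proxy
}

// navigationURL picks the link target for the web UI.
func navigationURL(n Node) string {
	if n.GuestURL != "" {
		return n.GuestURL
	}
	return n.Host
}
```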
Implemented comprehensive state preservation to prevent temporary dropouts:
1. Node Grace Period (60s):
- Track last-online timestamp for each Proxmox node
- Preserve online status during grace period to prevent flapping
- Applied to all node status checks throughout codebase
2. Efficient Polling Preservation:
- Detect when cluster/resources returns empty arrays
- Preserve previous VMs/containers if resources were present before
- Handle cluster health check failures gracefully
3. Traditional Polling Preservation:
- Update preservation logic for per-node VM/container polling
- Trigger preservation when zero resources are returned, regardless of node response
- Fix issue where nodes responding with empty data bypassed preservation
Root cause: Intermittent Proxmox cluster health failures ("no healthy nodes
available") caused both efficient and traditional polling to return empty
arrays, immediately clearing all VMs/containers from state.
Changes:
- internal/monitoring/monitor.go: Added node grace period, efficient polling preservation
- internal/monitoring/monitor_polling.go: Fixed traditional polling preservation logic
Fixes frequent UI flickering where vmCount/containerCount would briefly drop to zero.
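A sketch of the grace-period check, assuming an in-memory last-online map; the real logic lives in internal/monitoring/monitor.go:

```go
package main

import (
	"sync"
	"time"
)

const nodeGracePeriod = 60 * time.Second

type nodeHealth struct {
	mu         sync.Mutex
	lastOnline map[string]time.Time
}

func newNodeHealth() *nodeHealth {
	return &nodeHealth{lastOnline: make(map[string]time.Time)}
}

func (h *nodeHealth) effectiveOnline(node string, reportedOnline bool) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	now := time.Now()
	if reportedOnline {
		h.lastOnline[node] = now
		return true
	}
	// Seen recently: keep reporting online to prevent status flapping.
	return now.Sub(h.lastOnline[node]) < nodeGracePeriod
}
```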
This commit implements per-node temperature monitoring control and fixes a critical
bug where partial node updates were destroying existing configuration.
Backend changes:
- Add TemperatureMonitoringEnabled field (*bool) to PVEInstance, PBSInstance, and PMGInstance
- Update monitor.go to check per-node temperature setting with global fallback
- Convert all NodeConfigRequest boolean fields to *bool pointers
- Add nil checks in HandleUpdateNode to prevent overwriting unmodified fields
- Fix critical bug where partial updates zeroed out MonitorVMs, MonitorContainers, etc.
- Update NodeResponse, NodeFrontend, and StateSnapshot to include temperature setting
- Fix HandleAddNode and test connection handlers to use pointer-based boolean fields
Frontend changes:
- Add temperatureMonitoringEnabled to Node interface and config types
- Create per-node temperature monitoring toggle handler with optimistic updates
- Update NodeModal to wire up per-node temperature toggle
- Add isTemperatureMonitoringEnabled helper to check effective monitoring state
- Update ConfiguredNodeTables to show/hide temperature badge based on monitoring state
- Update NodeSummaryTable to conditionally show temperature column
- Pass globalTemperatureMonitoringEnabled prop through component tree
The critical bug fix ensures that when updating a single field (like temperature
monitoring), the backend only modifies that specific field instead of zeroing out
all other boolean configuration fields.
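A sketch of the pointer-based merge that fixes the bug, with the structs trimmed to the relevant booleans:

```go
package main

// Trimmed config; nil TemperatureMonitoringEnabled means "use the global setting".
type PVEInstance struct {
	MonitorVMs                   bool
	MonitorContainers            bool
	TemperatureMonitoringEnabled *bool
}

// All request booleans are pointers so "absent" and "false" are distinct.
type NodeConfigRequest struct {
	MonitorVMs                   *bool `json:"monitorVMs,omitempty"`
	MonitorContainers            *bool `json:"monitorContainers,omitempty"`
	TemperatureMonitoringEnabled *bool `json:"temperatureMonitoringEnabled,omitempty"`
}

func applyUpdate(cfg *PVEInstance, req NodeConfigRequest) {
	// Only fields the client actually sent (non-nil) are applied; fields
	// absent from a partial update keep their existing values.
	if req.MonitorVMs != nil {
		cfg.MonitorVMs = *req.MonitorVMs
	}
	if req.MonitorContainers != nil {
		cfg.MonitorContainers = *req.MonitorContainers
	}
	if req.TemperatureMonitoringEnabled != nil {
		cfg.TemperatureMonitoringEnabled = req.TemperatureMonitoringEnabled
	}
}

// temperatureEnabled resolves the per-node setting with global fallback.
func temperatureEnabled(inst PVEInstance, globalEnabled bool) bool {
	if inst.TemperatureMonitoringEnabled != nil {
		return *inst.TemperatureMonitoringEnabled
	}
	return globalEnabled
}
```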
Root Cause:
The classifyError() function in tempproxy/client.go was returning nil
when err was nil, even if respError contained "rate limit exceeded".
This caused the retry logic to treat rate limit errors as retryable,
triggering 3 retries with exponential backoff (100ms, 200ms, 400ms)
for each rate-limited request.
With multiple nodes polling simultaneously and hitting the proxy's
1 req/sec default rate limit, this created a retry storm:
- 3 nodes polling every 10 seconds
- 1-2 requests rate limited per cycle
- Each rate limit triggered 3 retries
- Result: 6+ extra requests per cycle, causing temperature data to
flicker in and out as requests were dropped
Solution:
1. Reordered classifyError() to check respError first before checking
if err is nil, ensuring rate limit errors are properly classified
2. Added explicit rate limit detection that marks these errors as
non-retryable
3. Added stub EnableTemperatureMonitoring/DisableTemperatureMonitoring
methods to Monitor for interface compatibility
Impact:
- Rate limit retry attempts reduced from 151 in 10 minutes to 0
- Temperature data now stable for all nodes
- No more flickering temperature displays in dashboard
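A sketch of the reordered check; the classification type and respError convention are assumptions, but the check order mirrors the fix:

```go
package main

import "strings"

type errClass int

const (
	errNone errClass = iota
	errRetryable
	errFatal // non-retryable, e.g. rate limiting
)

func classifyError(err error, respError string) errClass {
	// Check the response-level error first: a rate-limited request can
	// return err == nil while respError is set.
	if strings.Contains(respError, "rate limit exceeded") {
		return errFatal // retrying would only amplify the storm
	}
	if err == nil && respError == "" {
		return errNone
	}
	return errRetryable
}
```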
This change addresses intermittent "Guest details unavailable" and "Disk stats
unavailable" errors affecting users with large VM deployments (50+ VMs) or
high-load Proxmox environments.
Changes:
- Increased default guest agent timeouts (3-5s → 10-15s) to better handle
environments under load
- Added automatic retry logic (1 retry by default) for transient timeout failures
- Made all timeouts and retry count configurable via environment variables:
* GUEST_AGENT_FSINFO_TIMEOUT (default: 15s)
* GUEST_AGENT_NETWORK_TIMEOUT (default: 10s)
* GUEST_AGENT_OSINFO_TIMEOUT (default: 10s)
* GUEST_AGENT_VERSION_TIMEOUT (default: 10s)
* GUEST_AGENT_RETRIES (default: 1)
- Added comprehensive documentation in VM_DISK_MONITORING.md with configuration
examples for different deployment scenarios
These improvements allow Pulse to gracefully handle intermittent API timeouts
without immediately displaying errors, while remaining configurable for
different network conditions and environment sizes.
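A sketch of the environment-variable plumbing, assuming helpers like these; the variable names and defaults are from the list above, but whether values parse as Go durations ("15s") or bare seconds is not confirmed here:

```go
package main

import (
	"os"
	"strconv"
	"time"
)

func durationFromEnv(key string, def time.Duration) time.Duration {
	if v := os.Getenv(key); v != "" {
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return def
}

func intFromEnv(key string, def int) int {
	if v := os.Getenv(key); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return def
}

var (
	guestAgentFSInfoTimeout  = durationFromEnv("GUEST_AGENT_FSINFO_TIMEOUT", 15*time.Second)
	guestAgentNetworkTimeout = durationFromEnv("GUEST_AGENT_NETWORK_TIMEOUT", 10*time.Second)
	guestAgentOSInfoTimeout  = durationFromEnv("GUEST_AGENT_OSINFO_TIMEOUT", 10*time.Second)
	guestAgentVersionTimeout = durationFromEnv("GUEST_AGENT_VERSION_TIMEOUT", 10*time.Second)
	guestAgentRetries        = intFromEnv("GUEST_AGENT_RETRIES", 1)
)
```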
Fixes: https://github.com/rcourtman/Pulse/discussions/592
- Add Access-Control-Expose-Headers to allow frontend to read X-CSRF-Token response header
- Implement proactive CSRF token issuance on GET requests when session exists but CSRF cookie is missing
- Ensures frontend always has valid CSRF token before making POST requests
- Fixes 403 Forbidden errors when toggling system settings
This resolves CSRF validation failures that occurred when CSRF tokens expired or were missing while valid sessions existed.
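A sketch of the proactive issuance as net/http middleware; the header names are from the bullets above, while the cookie names and session check are hypothetical stand-ins:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"net/http"
)

func issueCSRF(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodGet && hasValidSession(r) {
			if _, err := r.Cookie("pulse_csrf"); err != nil { // cookie missing
				token := newCSRFToken()
				http.SetCookie(w, &http.Cookie{Name: "pulse_csrf", Value: token, Path: "/"})
				// Expose the header so the SPA can read it from the response.
				w.Header().Set("Access-Control-Expose-Headers", "X-CSRF-Token")
				w.Header().Set("X-CSRF-Token", token)
			}
		}
		next.ServeHTTP(w, r)
	})
}

func hasValidSession(r *http.Request) bool {
	_, err := r.Cookie("pulse_session") // hypothetical session cookie name
	return err == nil
}

func newCSRFToken() string {
	b := make([]byte, 32)
	rand.Read(b) // crypto/rand; error ignored for brevity in this sketch
	return hex.EncodeToString(b)
}
```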
This commit addresses multiple issues in the Docker/host agent removal flow:
Agent Stop Fix:
- Add systemctl stop command after agent acknowledgement to prevent systemd restart
- Previous behavior: agent disabled but systemd immediately restarted it (Restart=always)
- New behavior: agent disables itself, sends ack, then stops systemd service completely
UX Improvements:
- Add real-time elapsed time counter during removal wait
- Show progress indicators prominently (no longer hidden in dropdown)
- Display expected time range (30-60 seconds) and last heartbeat
- Auto-show timeout warning after 2 minutes with actionable "Force remove" button
- Add contextual help explaining what's happening at each stage
Security Enhancement:
- Automatically revoke API tokens when removing Docker/host agents
- Previous behavior: tokens remained valid after agent removal
- New behavior: tokens are revoked and persisted immediately on removal
- Prevents removed agents from re-authenticating with old credentials
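A sketch of the disable, ack, stop ordering on the agent side; the disable persistence and ack transport are stubbed, and the unit name is a guess at the agent's systemd service:

```go
package main

import "os/exec"

func handleRemoval(disable func() error, sendAck func() error) error {
	// 1. Persist the disabled state so a restart cannot re-enroll.
	if err := disable(); err != nil {
		return err
	}
	// 2. Acknowledge the removal to the Pulse server first...
	if err := sendAck(); err != nil {
		return err
	}
	// 3. ...then stop the unit so Restart=always cannot bring it back.
	return exec.Command("systemctl", "stop", "pulse-agent.service").Run()
}
```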
Extends Docker container monitoring with comprehensive disk and storage information:
- Writable layer size and root filesystem usage displayed in new Disk column
- Block I/O statistics (read/write bytes totals) shown in container drawer
- Mount metadata including type, source, destination, mode, and driver details
- Configurable via --collect-disk flag (enabled by default, can be disabled for large fleets)
Also fixes the config watcher to consistently use the production auth config path instead of following PULSE_DATA_DIR in mock mode.
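A sketch of the size collection via the Docker Go SDK (import paths vary across SDK versions); passing Size: true asks the daemon to compute the writable-layer and root filesystem sizes per container:

```go
package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/api/types/container"
	"github.com/docker/docker/client"
)

func collectDiskSizes(ctx context.Context, collectDisk bool) error {
	if !collectDisk { // --collect-disk can be disabled for large fleets
		return nil
	}
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		return err
	}
	defer cli.Close()
	// Size: true makes dockerd fill in SizeRw (writable layer) and SizeRootFs.
	containers, err := cli.ContainerList(ctx, container.ListOptions{All: true, Size: true})
	if err != nil {
		return err
	}
	for _, c := range containers {
		fmt.Printf("%s writable=%dB rootfs=%dB\n", c.ID[:12], c.SizeRw, c.SizeRootFs)
	}
	return nil
}
```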
Windows Host Agent Enhancements:
- Implement native Windows service support using golang.org/x/sys/windows/svc
- Add Windows Event Log integration for troubleshooting
- Create professional PowerShell installation/uninstallation scripts
- Add process termination and retry logic to handle Windows file locking
- Register uninstall endpoint at /uninstall-host-agent.ps1
Host Agent UI Improvements:
- Add expandable drawer to Hosts page (click row to view details)
- Display system info, network interfaces, disks, and temperatures in cards
- Replace status badges with subtle colored indicators
- Remove redundant master-detail sidebar layout
- Add search filtering for hosts
Technical Details:
- service_windows.go: Windows service lifecycle management with graceful shutdown
- service_stub.go: Cross-platform compatibility for non-Windows builds
- install-host-agent.ps1: Full Windows installation with validation
- uninstall-host-agent.ps1: Clean removal with process termination and retries
- HostsOverview.tsx: Expandable row pattern matching Docker/Proxmox pages
Files Added:
- cmd/pulse-host-agent/service_windows.go
- cmd/pulse-host-agent/service_stub.go
- scripts/install-host-agent.ps1
- scripts/uninstall-host-agent.ps1
- frontend-modern/src/components/Hosts/HostsOverview.tsx
- frontend-modern/src/components/Hosts/HostsFilter.tsx
The Windows service now starts reliably with automatic restart on failure,
and the uninstall script handles file locking gracefully without requiring reboots.
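A condensed sketch of the service handler pattern that service_windows.go uses via golang.org/x/sys/windows/svc; the registered service name and runAgent hook are assumptions:

```go
//go:build windows

package main

import "golang.org/x/sys/windows/svc"

type agentService struct{}

func (s *agentService) Execute(args []string, req <-chan svc.ChangeRequest,
	status chan<- svc.Status) (svcSpecificEC bool, exitCode uint32) {
	status <- svc.Status{State: svc.StartPending}
	stop := make(chan struct{})
	go runAgent(stop) // the agent's normal polling loop
	status <- svc.Status{State: svc.Running, Accepts: svc.AcceptStop | svc.AcceptShutdown}
	for c := range req {
		switch c.Cmd {
		case svc.Interrogate:
			status <- c.CurrentStatus
		case svc.Stop, svc.Shutdown:
			status <- svc.Status{State: svc.StopPending}
			close(stop) // graceful shutdown
			return false, 0
		}
	}
	return false, 0
}

func runAgent(stop <-chan struct{}) { <-stop } // stand-in for the real loop

func main() {
	// svc.Run blocks until the Service Control Manager stops the service.
	_ = svc.Run("PulseHostAgent", &agentService{})
}
```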
Improves configuration handling and system settings APIs to support
v4.24.0 features including runtime logging controls, adaptive polling
configuration, and enhanced config export/persistence.
Changes:
- Add config override system for discovery service
- Enhance system settings API with runtime logging controls
- Improve config persistence and export functionality
- Update security setup handling
- Refine monitoring and discovery service integration
These changes provide the backend support for the configuration
features documented in the v4.24.0 release.
Add comprehensive instance-level diagnostics to /api/monitoring/scheduler/health
**New Response Structure:**
Enhanced "instances" array with per-instance details:
- Instance metadata: displayName, type, connection URL
- Poll status: last success/error timestamps, error messages, error category
- Circuit breaker: state, timestamps, failure counts, retry windows
- Dead letter: present flag, reason, attempt history, retry schedule
**Implementation:**
Data structures:
- instanceInfo: cache of display names, URLs, types
- pollStatus: tracks successes/errors with timestamps and categories
- dlqInsight: DLQ entry metadata (reason, attempts, schedule)
- circuitBreaker: enhanced with stateSince, lastTransition
Tracking logic:
- buildInstanceInfoCache: populate metadata from config on startup
- recordTaskResult: track poll outcomes, error details, categories
- sendToDeadLetter: capture DLQ insights (reason, timestamps)
- circuitBreaker: record state transitions with timestamps
**Backward Compatible:**
- Existing fields (deadLetter, breakers, staleness) unchanged
- New "instances" array is additive
- Old clients can ignore new fields
**Testing:**
- Unit test: TestSchedulerHealth_EnhancedResponse validates all fields
- Integration tests: still passing (55s)
- All error tracking and breaker history verified
**Operator Benefits:**
- Diagnose issues without log digging
- See error messages directly in API
- Understand breaker states and retry schedules
- Track DLQ entries with full context
- Single API call for complete instance health view
Example: quickly identify a "401 unauthorized" error on a specific PBS
instance, see that it is in the DLQ after 5 retries, and know when the
next retry is scheduled.
Part of Phase 2 follow-up work to improve observability.
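Illustrative Go shapes for the new per-instance entries; the field names follow the text above, while the JSON tags and exact types are assumptions:

```go
package main

import "time"

type InstanceHealth struct {
	DisplayName    string        `json:"displayName"`
	Type           string        `json:"type"` // pve | pbs | pmg
	URL            string        `json:"url"`
	Poll           PollStatus    `json:"poll"`
	CircuitBreaker BreakerStatus `json:"circuitBreaker"`
	DeadLetter     *DLQInsight   `json:"deadLetter,omitempty"` // nil = not in DLQ
}

type PollStatus struct {
	LastSuccess   time.Time `json:"lastSuccess"`
	LastError     time.Time `json:"lastError"`
	LastErrorMsg  string    `json:"lastErrorMessage"`
	ErrorCategory string    `json:"errorCategory"`
}

type BreakerStatus struct {
	State          string    `json:"state"` // closed | open | half-open
	StateSince     time.Time `json:"stateSince"`
	LastTransition time.Time `json:"lastTransition"`
	Failures       int       `json:"failures"`
	RetryAt        time.Time `json:"retryAt"`
}

type DLQInsight struct {
	Reason    string    `json:"reason"`
	Attempts  int       `json:"attempts"`
	NextRetry time.Time `json:"nextRetry"`
}
```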
Task 8 of 10 complete. Exposes read-only scheduler health data including:
- Queue depth and distribution by instance type
- Dead-letter queue inspection (top 25 tasks with error details)
- Circuit breaker states (instance-level)
- Staleness scores per instance
New API endpoint:
GET /api/monitoring/scheduler/health (requires authentication)
New snapshot methods:
- StalenessTracker.Snapshot() - exports all staleness data
- TaskQueue.Snapshot() - queue depth & per-type distribution
- TaskQueue.PeekAll() - dead-letter task inspection
- circuitBreaker.State() - exports state, failures, retryAt
- Monitor.SchedulerHealth() - aggregates all health data
Documentation updated with API spec, field descriptions, and usage examples.
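A sketch of the read-only snapshot pattern behind TaskQueue.Snapshot(), with the queue internals simplified:

```go
package main

import "sync"

type queuedTask struct {
	InstanceType string // pve | pbs | pmg
}

type TaskQueue struct {
	mu    sync.Mutex
	tasks []queuedTask
}

type QueueSnapshot struct {
	Depth  int            `json:"depth"`
	ByType map[string]int `json:"byType"`
}

// Snapshot copies the counters under the lock so readers never block
// or observe the queue mid-mutation.
func (q *TaskQueue) Snapshot() QueueSnapshot {
	q.mu.Lock()
	defer q.mu.Unlock()
	snap := QueueSnapshot{Depth: len(q.tasks), ByType: make(map[string]int)}
	for _, t := range q.tasks {
		snap.ByType[t.InstanceType]++
	}
	return snap
}
```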
Replaces immediate polling with queue-based scheduling:
- TaskQueue with min-heap (container/heap) for NextRun-ordered execution
- Worker goroutines that block on WaitNext() until tasks are due
- Tasks only execute when NextRun <= now, respecting adaptive intervals
- Automatic rescheduling after execution via scheduler.BuildPlan
- Queue depth tracking for backpressure-aware interval adjustments
- Upsert semantics for updating scheduled tasks without duplicates
Task 6 of 10 complete (60%). Ready for error/backoff policies.
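A condensed sketch of the NextRun-ordered min-heap via container/heap; the real TaskQueue adds upsert semantics, depth tracking, and channel-based worker wake-ups instead of sleeping:

```go
package main

import (
	"container/heap"
	"time"
)

type task struct {
	Instance string
	NextRun  time.Time
}

// taskHeap orders tasks so the earliest NextRun is always at the root.
type taskHeap []*task

func (h taskHeap) Len() int           { return len(h) }
func (h taskHeap) Less(i, j int) bool { return h[i].NextRun.Before(h[j].NextRun) }
func (h taskHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *taskHeap) Push(x any)        { *h = append(*h, x.(*task)) }
func (h *taskHeap) Pop() any {
	old := *h
	t := old[len(old)-1]
	*h = old[:len(old)-1]
	return t
}

// waitNext blocks until the earliest task is due, then pops it; assumes a
// non-empty heap, and the real WaitNext also wakes early on upserts.
func waitNext(h *taskHeap) *task {
	next := (*h)[0]
	if d := time.Until(next.NextRun); d > 0 {
		time.Sleep(d)
	}
	return heap.Pop(h).(*task)
}
```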
Adds freshness metadata tracking for all monitored instances:
- StalenessTracker with per-instance last success/error/mutation timestamps
- SHA1-based change hashing to detect data mutations
- Normalized staleness scoring (0-1 scale) based on age vs maxStale
- Integration with PollMetrics for authoritative last-success data
- Wired into all poll functions (PVE/PBS/PMG) via UpdateSuccess/UpdateError
- Connected to scheduler as StalenessSource implementation
Task 4 of 10 complete. Ready for adaptive interval logic.
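A sketch of the normalized score and change hash; the maxStale semantics come from the description above, and the exact field layout is assumed:

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"math"
	"time"
)

// stalenessScore maps age onto the 0-1 scale: 0 = fresh,
// 1 = at or beyond maxStale.
func stalenessScore(lastSuccess time.Time, maxStale time.Duration) float64 {
	age := time.Since(lastSuccess)
	if age <= 0 {
		return 0
	}
	return math.Min(float64(age)/float64(maxStale), 1)
}

// changeHash fingerprints a serialized snapshot so mutations can be
// detected without retaining the payload itself.
func changeHash(payload []byte) string {
	sum := sha1.Sum(payload)
	return hex.EncodeToString(sum[:])
}
```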
Track minimum and maximum CPU temperatures since monitoring started.
This provides better insight into temperature trends and cooling
adequacy over time.
Changes:
- Backend: Add CPUMin, CPUMaxRecord, MinRecorded, MaxRecorded fields
to Temperature model
- Backend: Implement min/max tracking logic in monitoring cycle that
preserves values across polling cycles
- Backend: Initialize min/max on first reading, update on extremes
- Frontend: Update Temperature TypeScript interface with new fields
- Frontend: Display min/max range in NodeCard tooltip (e.g., "52°C
(48-67°C since monitoring started)")
- Frontend: Rebuild dist assets
Temperature display now shows:
- Current temperature with color coding (green/yellow/red)
- Tooltip with full min-max range and context
- Min/max tracked in-memory (resets on Pulse restart)
Example tooltip: "CPU: 52°C (48-67°C since monitoring started)"
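A sketch of the in-memory min/max update; field names follow the Temperature model above, and the extremes reset on restart because nothing is persisted:

```go
package main

import "time"

type Temperature struct {
	CPU          float64
	CPUMin       float64
	CPUMaxRecord float64
	MinRecorded  time.Time
	MaxRecorded  time.Time
}

// updateMinMax carries extremes forward across polling cycles and only
// moves them when a new reading sets a record.
func updateMinMax(prev *Temperature, current float64) Temperature {
	now := time.Now()
	t := Temperature{CPU: current}
	if prev == nil { // first reading: both extremes start at the current value
		t.CPUMin, t.CPUMaxRecord = current, current
		t.MinRecorded, t.MaxRecorded = now, now
		return t
	}
	t.CPUMin, t.MinRecorded = prev.CPUMin, prev.MinRecorded
	t.CPUMaxRecord, t.MaxRecorded = prev.CPUMaxRecord, prev.MaxRecorded
	if current < t.CPUMin {
		t.CPUMin, t.MinRecorded = current, now
	}
	if current > t.CPUMaxRecord {
		t.CPUMaxRecord, t.MaxRecorded = current, now
	}
	return t
}
```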
Improvements to pulse-sensor-proxy:
- Fix cluster discovery to use pvecm status for IP addresses instead of node names
- Add standalone node support for non-clustered Proxmox hosts
- Enhance SSH key push with detailed logging, success/failure tracking, and error reporting
- Add --pulse-server flag to installer for custom Pulse URLs
- Configure www-data group membership for Proxmox IPC access
UI and API cleanup:
- Remove unused "Ensure cluster keys" button from Settings
- Remove /api/diagnostics/temperature-proxy/ensure-cluster-keys endpoint
- Remove EnsureClusterKeys method from tempproxy client
The setup script already handles SSH key distribution during initial configuration,
making the manual refresh button redundant.
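A heavily hedged sketch of the IP discovery: pvecm status is the command named above, but this parsing of its "Membership information" table is illustrative, not the shipped proxy code:

```go
package main

import (
	"os/exec"
	"regexp"
	"strings"
)

func clusterMemberIPs() ([]string, error) {
	out, err := exec.Command("pvecm", "status").Output()
	if err != nil {
		return nil, err
	}
	ipRe := regexp.MustCompile(`\b\d{1,3}(?:\.\d{1,3}){3}\b`)
	var ips []string
	inMembers := false
	for _, line := range strings.Split(string(out), "\n") {
		if strings.HasPrefix(line, "Membership information") {
			inMembers = true
			continue
		}
		if inMembers {
			if ip := ipRe.FindString(line); ip != "" {
				ips = append(ips, ip) // corosync ring address, usually an IP
			}
		}
	}
	return ips, nil
}
```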