Pulse

vrr/Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-04-28 11:30:15 +00:00

Author	SHA1	Message	Date
rcourtman	9e339957c6	fix: Update runtime config when toggling Docker update actions setting The DisableDockerUpdateActions setting was being saved to disk but not updated in h.config, causing the UI toggle to appear to revert on page refresh since the API returned the stale runtime value. Related to #1023	2026-01-03 11:14:17 +00:00
rcourtman	e42cbe38f0	test: Improve discovery and Docker agent test coverage	2025-12-29 23:37:10 +00:00
rcourtman	88d419dd5b	feat(ai): Add enriched context with historical trends and predictions Phase 1 of Pulse AI differentiation: - Create internal/ai/context package with types, trends, builder, formatter - Implement linear regression for trend computation (growing/declining/stable/volatile) - Add storage capacity predictions (predicts days until 90% and 100%) - Wire MetricsHistory from monitor to patrol service - Update patrol to use buildEnrichedContext instead of basic summary - Update patrol prompt to reference trend indicators and predictions This gives the AI awareness of historical patterns, enabling it to: - Identify resources with concerning growth rates - Predict capacity exhaustion before it happens - Distinguish between stable high usage vs growing problems - Provide more actionable, time-aware insights All tests passing. Falls back to basic summary if metrics history unavailable.	2025-12-12 09:45:57 +00:00
rcourtman	ca229240eb	Add unit tests for discovery config_override utility functions Tests for parseCIDRs, parseCIDRMap, environmentFromOverride, shouldPruneContainerNetworks, isLikelyContainerPhase, filterPhasesForEnvironment, and ApplyConfigToProfile. Coverage increased from 29.0% to 51.0%.	2025-11-30 00:50:07 +00:00
rcourtman	305aab88df	Fix discovery test Prometheus metric collision Remove t.Parallel() from tests that verify global Prometheus gauge values. When tests run in parallel, they update the same global gauges (discoveryScanServers, discoveryScanErrors) causing race conditions and incorrect metric values. Fixes test failure in workflow run 19281332332: - TestPerformScanRecordsHistoryAndMetrics expected 2 servers, got 1 Related to release workflow preflight tests.	2025-11-11 23:34:49 +00:00
rcourtman	6ca4d9b750	Fix P1/P2 infrastructure issues: panic recovery and optimizations This commit addresses 4 P1 important issues and 1 P2 optimization in infrastructure components: P1-1: Missing Panic Recovery in Discovery Service (service.go:172-195, 499-542) - Problem: No panic recovery in Start(), ForceRefresh(), SetSubnet() goroutines - Impact: Silent service death if scan panics, broken discovery with no monitoring - Fix: - Wrapped initial scan goroutine with defer/recover (lines 172-182) - Wrapped scanLoop goroutine with defer/recover (lines 185-195) - Wrapped ForceRefresh scan with defer/recover (lines 499-509) - Wrapped SetSubnet scan with defer/recover (lines 532-542) - All log panics with stack traces for debugging P1-2: Missing Panic Recovery in Config Watcher Callback (watcher.go:546-556) - Problem: User-provided onMockReload callback could panic and crash watcher - Impact: Panicking callback kills watcher goroutine, no config updates - Fix: Wrapped callback invocation with defer/recover and stack trace logging P1-3: Session Store Stop() Using Send Instead of Close (session_store.go:16-84) - Problem: Stop() used channel send which blocks if nobody reads - Impact: Stop() hangs if backgroundWorker already exited - Fix: - Added sync.Once field stopOnce (line 22) - Changed Stop() to use close() within stopOnce.Do() (lines 80-84) - Prevents double-close panic and ensures all readers are signaled P2-1: Backup Cleanup Inefficient O(n²) Sort (persistence.go:1424-1427) - Problem: Bubble sort used to sort backups by modification time - Impact: Inefficient for large backup counts (>100 files) - Fix: - Replaced bubble sort with sort.Slice() using O(n log n) algorithm - Added "sort" import (line 9) - Maintains same oldest-first ordering for deletion logic All fixes add defensive programming without changing external behavior. Panic recovery ensures services continue operating even with bugs, while optimization reduces cleanup time for backup-heavy environments.	2025-11-07 09:55:22 +00:00
rcourtman	ba6d934204	Fix critical P0 infrastructure concurrency issues This commit addresses 3 critical P0 race conditions and resource leaks in core infrastructure: P0-1: Discovery Service Goroutine Leak (service.go:468, 488) - Problem: ForceRefresh() and SetSubnet() spawned unbounded goroutines without checking if scan already in progress - Impact: Rapid API calls create goroutine explosion, resource exhaustion - Fix: - ForceRefresh: Check isScanning before spawning goroutine (lines 470-476) - SetSubnet: Check isScanning, defer scan if already running (lines 491-504) - Both now log when skipping to aid debugging P0-2: Config Persistence Unlock/Relock Race (persistence.go:1177-1206) - Problem: LoadNodesConfig() unlocked RLock, called SaveNodesConfig (acquires Lock), then relocked - Impact: Another goroutine could modify config between unlock/relock, causing migrated data loss - Fix: - Copy instance slices while holding RLock to ensure consistency (lines 1189-1194) - Release lock, save copies, then return without relocking (lines 1196-1205) - Prevents TOCTOU vulnerability where migrations could be overwritten P0-3: Config Watcher Channel Close Race (watcher.go:19-178) - Problem: Stop() used select-check-close pattern vulnerable to concurrent calls - Impact: Multiple Stop() calls panic on double-close - Fix: - Added sync.Once field stopOnce to ConfigWatcher struct (line 26) - Changed Stop() to use stopOnce.Do() ensuring single execution (lines 175-178) - Removed racy select-based guard All fixes maintain backwards compatibility and add defensive logging for operational visibility.	2025-11-07 09:49:55 +00:00
rcourtman	6eb1a10d9b	Refactor: Code cleanup and localStorage consolidation This commit includes comprehensive codebase cleanup and refactoring: ## Code Cleanup - Remove dead TypeScript code (types/monitoring.ts - 194 lines duplicate) - Remove unused Go functions (GetClusterNodes, MigratePassword, GetClusterHealthInfo) - Clean up commented-out code blocks across multiple files - Remove unused TypeScript exports (helpTextClass, private tag color helpers) - Delete obsolete test files and components ## localStorage Consolidation - Centralize all storage keys into STORAGE_KEYS constant - Update 5 files to use centralized keys: * utils/apiClient.ts (AUTH, LEGACY_TOKEN) * components/Dashboard/Dashboard.tsx (GUEST_METADATA) * components/Docker/DockerHosts.tsx (DOCKER_METADATA) * App.tsx (PLATFORMS_SEEN) * stores/updates.ts (UPDATES) - Benefits: Single source of truth, prevents typos, better maintainability ## Previous Work Committed - Docker monitoring improvements and disk metrics - Security enhancements and setup fixes - API refactoring and cleanup - Documentation updates - Build system improvements ## Testing - All frontend tests pass (29 tests) - All Go tests pass (15 packages) - Production build successful - Zero breaking changes Total: 186 files changed, 5825 insertions(+), 11602 deletions(-)	2025-11-04 21:50:46 +00:00
rcourtman	5c4be1921c	chore: snapshot current changes	2025-11-02 22:47:55 +00:00
rcourtman	4eb8bed9b5	Fix initial setup caching and container discovery defaults	2025-10-22 07:34:32 +00:00
rcourtman	56c6c0cc0c	feat: improve discovery with progress tracking, validation, and structured errors Significantly enhanced network discovery feature to eliminate false positives, provide real-time progress updates, and better error reporting. Key improvements: - Require positive Proxmox identification (version data, auth headers, or certificates) instead of reporting any service on ports 8006/8007 - Add real-time progress tracking with phase/target counts and completion percentage - Implement structured error reporting with IP, phase, type, and timestamp details - Fix TLS timeout handling to prevent hangs on unresponsive hosts - Expose progress and structured errors via WebSocket for UI consumption - Reduce log verbosity by moving discovery logs to debug level - Fix duplicate IP counting to ensure progress reaches 100% Breaking changes: None (backward compatible with legacy API methods)	2025-10-20 22:29:30 +00:00
rcourtman	5ebb32ce10	feat: enhance runtime configuration and system settings management Improves configuration handling and system settings APIs to support v4.24.0 features including runtime logging controls, adaptive polling configuration, and enhanced config export/persistence. Changes: - Add config override system for discovery service - Enhance system settings API with runtime logging controls - Improve config persistence and export functionality - Update security setup handling - Refine monitoring and discovery service integration These changes provide the backend support for the configuration features documented in the v4.24.0 release.	2025-10-20 17:41:19 +00:00
rcourtman	c91b7874ac	docs: comprehensive v4.24.0 documentation audit and updates Complete documentation overhaul for Pulse v4.24.0 release covering all new features and operational procedures. Documentation Updates (19 files): P0 Release-Critical: - Operations: Rewrote ADAPTIVE_POLLING_ROLLOUT.md as GA operations runbook - Operations: Updated ADAPTIVE_POLLING_MANAGEMENT_ENDPOINTS.md with DEFERRED status - Operations: Enhanced audit-log-rotation.md with scheduler health checks - Security: Updated proxy hardening docs with rate limit defaults - Docker: Added runtime logging and rollback procedures P1 Deployment & Integration: - KUBERNETES.md: Runtime logging config, adaptive polling, post-upgrade verification - PORT_CONFIGURATION.md: Service naming, change tracking via update history - REVERSE_PROXY.md: Rate limit headers, error pass-through, v4.24.0 verification - PROXY_AUTH.md, OIDC.md, WEBHOOKS.md: Runtime logging integration - TROUBLESHOOTING.md, VM_DISK_MONITORING.md, zfs-monitoring.md: Updated workflows Features Documented: - X-RateLimit-* headers for all API responses - Updates rollback workflow (UI & CLI) - Scheduler health API with rich metadata - Runtime logging configuration (no restart required) - Adaptive polling (GA, enabled by default) - Enhanced audit logging - Circuit breakers and dead-letter queue Supporting Changes: - Discovery service enhancements - Config handlers updates - Sensor proxy installer improvements Total Changes: 1,626 insertions(+), 622 deletions(-) Files Modified: 24 (19 docs, 5 code) All documentation is production-ready for v4.24.0 release.	2025-10-20 17:20:13 +00:00
rcourtman	b640347a78	fix: improve discovery performance and reliability Discovery Fixes: - Always update cache even when scan finds no servers (prevents stale data) - Remove automatic re-add of deleted nodes to discovery (was causing confusion) - Optimize Docker subnet scanning from 762 IPs to 254 IPs (3x faster) - Add getHostSubnetFromGateway() to detect host network from container Frontend Type Fixes: - Fix ThresholdsTable editScope type errors - Fix SnapshotAlertConfig index signature - Remove unused variable in Settings.tsx These changes make discovery faster, more reliable, and fix the issue where deleted nodes would persist in the discovery cache or immediately reappear.	2025-10-18 22:59:40 +00:00
rcourtman	f46ff1792b	Fix settings security tab navigation	2025-10-11 23:29:47 +00:00

15 commits
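
Commits 6ca4d9b750 and ba6d934204 above describe two recurring fixes: wrapping goroutines in defer/recover so a panic is logged with a stack trace instead of killing the service, and having Stop() call close() inside a sync.Once so repeated calls neither block nor double-close. The sketch below illustrates that general pattern only; it is not code from the Pulse repository. The names worker, newWorker, safeGo and stopped are hypothetical and chosen for this example, while stopOnce mirrors the field name mentioned in the commit messages.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"sync"
)

// worker is a hypothetical background service used only for illustration.
type worker struct {
	stopCh   chan struct{}
	stopOnce sync.Once // guards close(stopCh) so Stop is safe to call repeatedly
	wg       sync.WaitGroup
}

func newWorker() *worker {
	return &worker{stopCh: make(chan struct{})}
}

// safeGo runs fn in its own goroutine. A panic inside fn is recovered and
// printed with a stack trace, so it cannot take down the whole process.
func (w *worker) safeGo(name string, fn func()) {
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		defer func() {
			if r := recover(); r != nil {
				fmt.Printf("recovered panic in %s: %v\n%s", name, r, debug.Stack())
			}
		}()
		fn()
	}()
}

// Stop closes stopCh exactly once. Closing (rather than sending) never blocks
// and wakes every goroutine that is waiting on the channel.
func (w *worker) Stop() {
	w.stopOnce.Do(func() { close(w.stopCh) })
}

// stopped reports whether Stop has been called.
func (w *worker) stopped() bool {
	select {
	case <-w.stopCh:
		return true
	default:
		return false
	}
}

func main() {
	w := newWorker()
	w.safeGo("scan", func() { panic("simulated scan failure") })
	w.safeGo("loop", func() { <-w.stopCh }) // exits when Stop closes the channel
	w.Stop()
	w.Stop() // second call is a no-op instead of a double-close panic
	w.wg.Wait()
	fmt.Println("stopped:", w.stopped())
}
```

A channel send in Stop() would hang if no goroutine were left to receive it, which is the failure mode P1-3 describes; close() avoids that, and sync.Once removes the need for a racy "is it already closed?" check.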