This commit addresses four P1 (important) issues and one P2 optimization in infrastructure components:
**P1-1: Missing Panic Recovery in Discovery Service** (service.go:172-195, 499-542)
- **Problem**: No panic recovery in Start(), ForceRefresh(), SetSubnet() goroutines
- **Impact**: Silent service death if scan panics, broken discovery with no monitoring
- **Fix**:
- Wrapped initial scan goroutine with defer/recover (lines 172-182)
- Wrapped scanLoop goroutine with defer/recover (lines 185-195)
- Wrapped ForceRefresh scan with defer/recover (lines 499-509)
- Wrapped SetSubnet scan with defer/recover (lines 532-542)
- All recovery handlers log the panic with a stack trace for debugging
**P1-2: Missing Panic Recovery in Config Watcher Callback** (watcher.go:546-556)
- **Problem**: User-provided onMockReload callback could panic and crash watcher
- **Impact**: Panicking callback kills watcher goroutine, no config updates
- **Fix**: Wrapped callback invocation with defer/recover and stack trace logging
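The same pattern applied to a user-supplied callback looks roughly like this (hypothetical helper name; watcher.go inlines the equivalent around onMockReload):

```go
package main

import (
	"fmt"
	"log"
	"runtime/debug"
)

// safeInvoke runs a user-supplied callback and recovers any panic so
// the watcher loop that calls it keeps running.
func safeInvoke(cb func()) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("config reload callback panicked: %v\n%s", r, debug.Stack())
		}
	}()
	cb()
}

func main() {
	for i := 0; i < 3; i++ { // stand-in for the watcher's event loop
		safeInvoke(func() { panic("buggy user callback") })
	}
	fmt.Println("watcher survived all callback panics")
}
```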
**P1-3: Session Store Stop() Using Send Instead of Close** (session_store.go:16-84)
- **Problem**: Stop() used channel send which blocks if nobody reads
- **Impact**: Stop() hangs if backgroundWorker already exited
- **Fix**:
- Added sync.Once field stopOnce (line 22)
- Changed Stop() to use close() within stopOnce.Do() (lines 80-84)
- Prevents double-close panic and ensures all readers are signaled
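The close-within-Once pattern can be sketched as follows; field names are illustrative, not session_store.go's:

```go
package main

import (
	"fmt"
	"sync"
)

// Store mirrors the fix: Stop closes the channel (close never blocks
// and wakes every reader, unlike a send) inside a sync.Once (which
// makes repeated Stop calls safe against double-close panics).
type Store struct {
	stopCh   chan struct{}
	stopOnce sync.Once
}

func NewStore() *Store { return &Store{stopCh: make(chan struct{})} }

func (s *Store) Stop() {
	s.stopOnce.Do(func() { close(s.stopCh) })
}

func main() {
	s := NewStore()
	s.Stop()
	s.Stop()   // safe: no double-close panic, no blocking send
	<-s.stopCh // receivers are woken even if the worker already exited
	fmt.Println("stopped cleanly")
}
```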
**P2-1: Backup Cleanup Inefficient O(n²) Sort** (persistence.go:1424-1427)
- **Problem**: Bubble sort used to sort backups by modification time
- **Impact**: Inefficient for large backup counts (>100 files)
- **Fix**:
- Replaced bubble sort with sort.Slice(), which runs in O(n log n)
- Added "sort" import (line 9)
- Maintains same oldest-first ordering for deletion logic
All fixes add defensive programming without changing external behavior. Panic recovery ensures services continue operating even when bugs occur, while the sort optimization reduces cleanup time for backup-heavy environments.
This commit addresses 3 critical P0 race conditions and resource leaks in core infrastructure:
**P0-1: Discovery Service Goroutine Leak** (service.go:468, 488)
- **Problem**: ForceRefresh() and SetSubnet() spawned unbounded goroutines without checking if scan already in progress
- **Impact**: Rapid API calls create a goroutine explosion and resource exhaustion
- **Fix**:
- ForceRefresh: Check isScanning before spawning goroutine (lines 470-476)
- SetSubnet: Check isScanning, defer scan if already running (lines 491-504)
- Both now log when skipping to aid debugging
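The isScanning guard can be sketched with an atomic flag so exactly one caller wins the race to start a scan; field and method names here are assumptions:

```go
package main

import (
	"fmt"
	"log"
	"sync/atomic"
)

// scanner sketches the guard: at most one scan goroutine runs at a
// time, so rapid ForceRefresh/SetSubnet calls cannot pile up
// goroutines.
type scanner struct {
	isScanning atomic.Bool // requires Go 1.19+
}

// tryStartScan reports whether it actually spawned a scan.
func (s *scanner) tryStartScan(scan func()) bool {
	// CompareAndSwap succeeds for exactly one concurrent caller.
	if !s.isScanning.CompareAndSwap(false, true) {
		log.Println("scan already in progress; skipping")
		return false
	}
	go func() {
		defer s.isScanning.Store(false)
		scan()
	}()
	return true
}

func main() {
	s := &scanner{}
	started := make(chan struct{})
	release := make(chan struct{})
	first := s.tryStartScan(func() { close(started); <-release })
	<-started
	second := s.tryStartScan(func() {}) // skipped: scan in flight
	close(release)
	fmt.Println(first, second) // prints "true false"
}
```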
**P0-2: Config Persistence Unlock/Relock Race** (persistence.go:1177-1206)
- **Problem**: LoadNodesConfig() unlocked RLock, called SaveNodesConfig (acquires Lock), then relocked
- **Impact**: Another goroutine could modify config between unlock/relock, causing migrated data loss
- **Fix**:
- Copy instance slices while holding RLock to ensure consistency (lines 1189-1194)
- Release lock, save copies, then return without relocking (lines 1196-1205)
- Prevents TOCTOU vulnerability where migrations could be overwritten
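The copy-then-save pattern can be sketched as below: snapshot the data while holding the read lock, release it, and persist the snapshot. Because the lock is never re-acquired, no writer can slip in between an unlock and a relock and have its changes clobbered. Names are illustrative, not persistence.go's:

```go
package main

import (
	"fmt"
	"sync"
)

// nodesConfig stands in for the guarded configuration state.
type nodesConfig struct {
	mu        sync.RWMutex
	instances []string
}

// saveMigrated copies the slice under RLock, then saves lock-free.
func (c *nodesConfig) saveMigrated(save func([]string) error) error {
	c.mu.RLock()
	snapshot := make([]string, len(c.instances))
	copy(snapshot, c.instances) // consistent view taken under the lock
	c.mu.RUnlock()
	return save(snapshot) // operates on the private copy only
}

func main() {
	c := &nodesConfig{instances: []string{"pve1", "pve2"}}
	_ = c.saveMigrated(func(nodes []string) error {
		fmt.Println("saving", nodes)
		return nil
	})
}
```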
**P0-3: Config Watcher Channel Close Race** (watcher.go:19-178)
- **Problem**: Stop() used select-check-close pattern vulnerable to concurrent calls
- **Impact**: Multiple Stop() calls panic on double-close
- **Fix**:
- Added sync.Once field stopOnce to ConfigWatcher struct (line 26)
- Changed Stop() to use stopOnce.Do() ensuring single execution (lines 175-178)
- Removed racy select-based guard
All fixes maintain backwards compatibility and add defensive logging for operational visibility.
Significantly enhanced the network discovery feature to eliminate false positives,
provide real-time progress updates, and improve error reporting.
Key improvements:
- Require positive Proxmox identification (version data, auth headers, or certificates)
instead of reporting any service on ports 8006/8007
- Add real-time progress tracking with phase/target counts and completion percentage
- Implement structured error reporting with IP, phase, type, and timestamp details
- Fix TLS timeout handling to prevent hangs on unresponsive hosts
- Expose progress and structured errors via WebSocket for UI consumption
- Reduce log verbosity by moving discovery logs to debug level
- Fix duplicate IP counting to ensure progress reaches 100%
Breaking changes: None (backward compatible with legacy API methods)
Improves configuration handling and system settings APIs to support
v4.24.0 features including runtime logging controls, adaptive polling
configuration, and enhanced config export/persistence.
Changes:
- Add config override system for discovery service
- Enhance system settings API with runtime logging controls
- Improve config persistence and export functionality
- Update security setup handling
- Refine monitoring and discovery service integration
These changes provide the backend support for the configuration
features documented in the v4.24.0 release.
Discovery Fixes:
- Always update cache even when scan finds no servers (prevents stale data)
- Remove automatic re-add of deleted nodes to discovery (was causing confusion)
- Optimize Docker subnet scanning from 762 IPs to 254 IPs (3x faster)
- Add getHostSubnetFromGateway() to detect host network from container
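The subnet derivation can be sketched as below: given the gateway address, mask it down to a /24, which yields the 254-host scan range. Sketch only; the real getHostSubnetFromGateway() also has to discover the gateway address itself (e.g. from the container's routing table), which is omitted here.

```go
package main

import (
	"fmt"
	"net"
)

// subnetFromGateway derives a /24 scan range from the host gateway,
// e.g. "192.168.1.1" -> "192.168.1.0/24" (254 usable hosts).
func subnetFromGateway(gateway string) (string, error) {
	ip := net.ParseIP(gateway)
	if ip == nil || ip.To4() == nil {
		return "", fmt.Errorf("not an IPv4 address: %q", gateway)
	}
	network := ip.Mask(net.CIDRMask(24, 32)) // zero the host octet
	return network.String() + "/24", nil
}

func main() {
	cidr, err := subnetFromGateway("192.168.1.1")
	if err != nil {
		panic(err)
	}
	fmt.Println(cidr) // prints "192.168.1.0/24"
}
```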
Frontend Type Fixes:
- Fix ThresholdsTable editScope type errors
- Fix SnapshotAlertConfig index signature
- Remove unused variable in Settings.tsx
These changes make discovery faster, more reliable, and fix the issue where
deleted nodes would persist in the discovery cache or immediately reappear.