Cover RPM field handling (numeric, string, SSD, N/A, null, invalid),
invalid JSON error path, and unexpected type fallbacks for both
wearout and RPM fields.
Coverage: 50% → 95.5%
Test error handling for password authentication user format validation:
- Missing realm separator (no @)
- Empty user string
- Multiple @ symbols
Improves NewClient coverage from 74.2% to 83.9%.
Test error handling for JSON parsing edge cases:
- Invalid JSON syntax
- Unsupported field types (bool, array)
- Unparseable string values for total-bytes and used-bytes
Improves coverage from 83.3% to 94.4%.
- Test jobid fallback when id field is missing
- Test jobnum field takes precedence over ID parsing
- Test last_sync_duration and duration fields
- Test last-sync-duration fallback format
- Test next_sync and next-sync fallback formats
Coverage: 79.7% → 100%
Add 4 new test cases covering previously untested branches:
- Float zero exactly (0.0)
- Float negative zero (-0.0)
- Only escaped quotes becoming empty after trimming
- Quoted whitespace becoming empty after trimming
Coverage improved from 95.8% to 100%.
The error pattern `/storage/` only matched storage content endpoints
(`/storage/{name}/content`) but not the main storage list endpoint
(`/nodes/{node}/storage`).
This caused storage timeout errors like:
Get ".../nodes/pve-100-224/storage": context deadline exceeded
to incorrectly mark cluster nodes as unhealthy, even though the timeout
was due to a slow cross-node storage query, not actual node connectivity
issues.
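A minimal sketch of the widened path check (the helper name and its placement
are assumptions, not the actual Pulse code):

```go
package cluster

import "strings"

// isStorageTimeoutPath is an illustrative sketch of the widened pattern: both
// the content endpoint (/storage/{name}/content) and the node storage list
// (/nodes/{node}/storage) now count as storage requests, so a timeout there is
// attributed to a slow storage query rather than to node health.
func isStorageTimeoutPath(path string) bool {
	return strings.Contains(path, "/storage/") || strings.HasSuffix(path, "/storage")
}
```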
Fixes #754
Add 13 new test cases covering previously untested branches:
- float32 timestamp with a valid value (using a smaller value to stay within float32 precision)
- float32/float64 zero and negative values
- json.Number zero and negative values
- int32 and uint32 timestamp handling
- Invalid date format strings (no matching layout)
- Partial date strings
- Unsupported types (bool, slice)
Coverage improved from 93.8% to 100%.
Add 6 new test cases covering previously untested branches:
- float64 at MaxUint64 boundary (clamping behavior)
- float64 exceeding MaxUint64 (overflow protection)
- String with quoted "null" value
- String with quoted empty value ("")
- String with single quoted empty value ('')
- Invalid float parsing in scientific notation
Coverage improved from 92.3% to 97.4%.
Previously, recovery of unhealthy nodes was only triggered when ALL nodes
were unhealthy. This left individual degraded nodes stuck indefinitely,
since operations kept succeeding on the healthy nodes and never triggered
the recovery path.
Now recovery is attempted whenever any unhealthy nodes exist, allowing
clusters to recover individual nodes over time.
Also added:
- Panic-safe unlock/lock pattern using an anonymous function (sketched below)
- Refresh of both healthy and cooling endpoints after recovery
- Updated timestamp for accurate cooldown checks
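A minimal sketch of the panic-safe unlock/lock pattern, using illustrative
type, field, and method names rather than the actual Pulse code:

```go
package cluster

import (
	"context"
	"sync"
	"time"
)

type endpoint struct {
	healthy     bool
	lastChecked time.Time
}

type ClusterClient struct {
	mu sync.Mutex
}

// tryRecover assumes the caller already holds c.mu. The lock is released only
// around the slow connectivity check, and the deferred Lock runs even if the
// check panics, so the caller never observes an unlocked mutex.
func (c *ClusterClient) tryRecover(ctx context.Context, ep *endpoint, check func(context.Context) error) {
	var err error
	func() {
		c.mu.Unlock()
		defer c.mu.Lock()
		err = check(ctx)
	}()
	if err == nil {
		ep.healthy = true
		ep.lastChecked = time.Now() // keeps cooldown checks accurate
	}
}
```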
Related to #754
Covers both Proxmox API formats for the VM agent field, plus related handling:
- Integer format (older versions): direct int value
- Object format (Proxmox 8.3+): {enabled, available} fields
- Preference order: available > enabled > 0
- Invalid input handling defaults to 0
- Integration with VMStatus struct
Tests for Proxmox client authentication error handling:
- authHTTPError.Error: message formatting based on status code
(401/403 include status in message, others don't)
- shouldFallbackToForm: determines when to retry with form encoding
(triggers on 400/415, not on auth errors or server errors)
16 test cases covering all code paths.
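A hedged sketch of the behavior these tests exercise (shapes follow the
description above; the real signatures in Pulse may differ):

```go
package proxmox

import (
	"fmt"
	"net/http"
)

// authHTTPError carries the HTTP status and server message from a failed
// authentication request.
type authHTTPError struct {
	status  int
	message string
}

// Error includes the status code only for 401/403 responses; other codes
// return the message as-is.
func (e *authHTTPError) Error() string {
	if e.status == http.StatusUnauthorized || e.status == http.StatusForbidden {
		return fmt.Sprintf("authentication failed (status %d): %s", e.status, e.message)
	}
	return e.message
}

// shouldFallbackToForm retries with form encoding only when the server
// rejects the request encoding (400/415), never for auth or server errors.
func shouldFallbackToForm(status int) bool {
	return status == http.StatusBadRequest || status == http.StatusUnsupportedMediaType
}
```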
Tests were added by ADA run #97, but the commit was missed.
Covers: RaidZ types, log/cache/spare devices, nested mirrors,
ConvertToModelZFSPool, and struct field tests.
Test coverage for error detection and retry logic:
- extractStatusCode: 13 test cases for HTTP status code extraction
- isTransientRateLimitError: 17 test cases for rate limit detection
- isNotImplementedError: 14 test cases for 501 error detection
- isVMSpecificError: 16 test cases for VM-scoped errors
- calculateRateLimitBackoff: backoff timing verification
- isAuthError: 12 test cases for authentication errors
Coverage 35.5% → 37.3%
Comprehensive test coverage for JSON parsing helpers used in
replication job status parsing: stringFromAny, intFromAny,
boolFromAny, floatFromAny, parseReplicationTime, parseDurationSeconds,
parseHHMMSSToSeconds, and parseReplicationJob.
Coverage increased from 22.6% to 35.5%.
The previous fix (6db4ee7a) cleared stale error messages but didn't mark
endpoints as healthy again after successful operations. This caused
clusters to remain in "degraded" state permanently once any endpoint had
a temporary issue, even if all endpoints were actually working.
The fix now marks endpoints healthy in clearEndpointError() after
successful operations, ensuring degraded clusters recover automatically.
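A minimal sketch of the recovery path described above, with assumed field
names:

```go
package cluster

import "sync"

// Field names here are assumptions; only the shape matters for the sketch.
type ClusterClient struct {
	mu        sync.Mutex
	lastError map[string]error
	healthy   map[string]bool
}

// clearEndpointError now drops the cached error AND marks the endpoint
// healthy again, so a degraded cluster recovers automatically after the next
// successful operation.
func (c *ClusterClient) clearEndpointError(host string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.lastError, host)
	c.healthy[host] = true
}
```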
Related to #659
Previously, errors stored in ClusterClient.lastError were only cleared
during initial health checks or when recovering unhealthy nodes. This
caused stale error messages to persist in the UI even after the
underlying issues were resolved.
The fix clears cached errors in two places:
1. After passing connectivity test in getHealthyClient()
2. After successful operation in executeWithFailover()
This ensures that once an endpoint starts working again, any previous
error messages are cleared from the UI without requiring a restart.
Related to #659, #754
Proxmox VE 9.x removed support for the "full" parameter in the
/nodes/{node}/qemu/{vmid}/status/current endpoint. When Pulse sent
GetVMStatus() requests with ?full=1, Proxmox responded with:
API error 400: {"errors":{"full":"property is not defined in schema..."}}
This caused the cluster client to mark ALL endpoints as unhealthy, which
cascaded into multiple failures:
- VM status checks failed
- Guest agent queries were blocked
- Filesystem data collection stopped working
- All Windows VMs showed disk:-1 (unknown) instead of actual disk usage
The fix removes the ?full=1 parameter, since Proxmox 9.x returns all data
by default. This maintains backward compatibility with older Proxmox
versions while fixing the issue in 9.x.
After this fix:
- Cluster endpoints are correctly marked as healthy
- Guest agent queries work properly
- Windows VMs report actual disk usage (e.g., 26% on C:\ drive)
- VM monitoring functions normally on Proxmox 9.x
Proxmox VE 9.x removed support for the 'ds' parameter in RRD endpoints
(/nodes/{node}/rrddata and /nodes/{node}/lxc/{vmid}/rrddata). When Pulse
sent RRD requests with ds=memused,memavailable,etc., Proxmox responded with:
API error 400: {"errors":{"ds":"property is not defined in schema..."}}
This caused cluster nodes to be repeatedly marked unhealthy, which cascaded
into storage polling failures showing 'All cluster endpoints are unhealthy'
even though the nodes were actually healthy and reachable.
Changes:
- Added check in cluster_client.go executeWithFailover to recognize the ds
parameter error as a capability issue rather than node health failure
(sketched below)
- Nodes with this error no longer get marked unhealthy
- Storage polling and other operations now succeed even when RRD calls fail
- The RRD data will be unavailable but core monitoring continues
This fix maintains backward compatibility with older Proxmox versions while
gracefully handling the API change in Proxmox 9.x.
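A minimal sketch of the capability check described above (the helper name and
error-string matching are assumptions):

```go
package cluster

import "strings"

// isDSParamUnsupported reports whether an error is Proxmox VE 9.x rejecting
// the legacy "ds" parameter on an RRD request. That is an API capability gap,
// not a sign the node is down, so failover should not mark it unhealthy.
func isDSParamUnsupported(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	return strings.Contains(msg, `"ds"`) &&
		strings.Contains(msg, "property is not defined in schema")
}
```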
Updated the Quick Start for Docker section in TEMPERATURE_MONITORING.md to be
more user-friendly and address common setup issues:
- Added clear explanation of why the proxy is needed (containers can't access hardware)
- Provided concrete IP example instead of placeholder
- Showed full docker-compose.yml context with proper YAML structure
- Added sudo to commands where needed
- Updated docker-compose commands to v2 syntax with note about v1
- Expanded verification steps with clearer success indicators
- Added reminder to check container name in verification commands
These improvements should help users who encounter blank temperature displays
due to missing proxy installation or bind mount configuration.
Related to #656
Windows guest agents can return multiple directory mountpoints (C:\, C:\Users,
C:\Windows) all on the same physical drive. When the QEMU guest agent omits
disk[] metadata, commit 5325ef481 falls back to using the mountpoint string
as the disk identifier. This causes every Windows directory to be treated as
a separate disk, accumulating to inflated totals (e.g., 1TB reported for a
250GB drive).
Root cause:
The fallback logic in pkg/proxmox/client.go:1585-1594 assigns fs.Disk =
fs.Mountpoint when disk[] is missing. On Windows, every directory path is
unique, so the deduplication guard in
internal/monitoring/monitor_polling.go:619-635 never triggers, causing all
directories to be summed.
Changes:
- Detect Windows-style mountpoints (drive letter + colon + backslash)
- Normalize to drive root when disk[] is missing (e.g., C:\Users → C:)
- Preserve existing behavior for Linux/BSD and VMs with disk[] metadata
- Add debug logging for synthesized Windows drive identifiers
This fix maintains backward compatibility with commit 5325ef481 while
preventing the Windows directory accumulation issue. LXC containers are
unaffected as they use a different code path.
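A minimal sketch of the normalization, assuming an illustrative helper name:

```go
package proxmox

import "regexp"

// windowsDriveRe matches Windows-style mountpoints such as `C:\` or `C:\Users`.
var windowsDriveRe = regexp.MustCompile(`^([A-Za-z]):\\`)

// normalizeDiskID collapses any path on a Windows drive to the drive root, so
// C:\, C:\Users and C:\Windows all deduplicate to a single disk identifier
// when disk[] metadata is missing. Linux/BSD mountpoints pass through
// unchanged.
func normalizeDiskID(mountpoint string) string {
	if m := windowsDriveRe.FindStringSubmatch(mountpoint); m != nil {
		return m[1] + ":"
	}
	return mountpoint
}
```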
Related to #630
Proxmox 8.3+ changed the VM status API to return the `agent` field as an
object ({"enabled":1,"available":1}) instead of an integer (0 or 1). This
caused Pulse to incorrectly treat VMs as having no guest agent, resulting
in missing disk usage data (disk:-1) even when the guest agent was running
and functional.
The issue manifested as:
- VMs showing "Guest details unavailable" or missing disk data
- Pulse logs showing no "Guest agent enabled, querying filesystem info" messages
- `pvesh get /nodes/<node>/qemu/<vmid>/agent/get-fsinfo` working correctly
from the command line, confirming the agent was functional
Root cause:
The VMStatus struct defined `Agent` as an int field. When Proxmox 8.3+ sent
the new object format, JSON unmarshaling silently left the field at zero,
causing Pulse to skip all guest agent queries.
Changes:
- Created VMAgentField type with custom UnmarshalJSON to handle both formats:
* Legacy (Proxmox <8.3): integer (0 or 1)
* Modern (Proxmox 8.3+): object {"enabled":N,"available":N}
- Updated VMStatus.Agent from `int` to `VMAgentField`
- Updated all references to `detailedStatus.Agent` to use `.Agent.Value`
- The unmarshaler prioritizes the "available" field over "enabled" to ensure
we only query when the agent is actually responding
This fix maintains backward compatibility with older Proxmox versions while
supporting the new format introduced in Proxmox 8.3+.
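A minimal sketch of the custom unmarshaling described above (the exact Pulse
implementation may differ):

```go
package proxmox

import "encoding/json"

// VMAgentField accepts both the legacy integer form (0/1) and the
// Proxmox 8.3+ object form {"enabled":N,"available":N}, preferring
// "available" so we only query agents that actually respond.
type VMAgentField struct {
	Value int
}

func (a *VMAgentField) UnmarshalJSON(data []byte) error {
	// Legacy format (Proxmox <8.3): plain integer.
	var n int
	if err := json.Unmarshal(data, &n); err == nil {
		a.Value = n
		return nil
	}
	// Modern format (Proxmox 8.3+): object with enabled/available fields.
	var obj struct {
		Enabled   int `json:"enabled"`
		Available int `json:"available"`
	}
	if err := json.Unmarshal(data, &obj); err != nil {
		a.Value = 0 // unrecognized shape: treat as "no agent"
		return nil
	}
	if obj.Available != 0 {
		a.Value = obj.Available
		return nil
	}
	a.Value = obj.Enabled
	return nil
}
```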
Related to #553
## Problem
LXC containers showed inflated memory usage (e.g., 90%+ when actual usage was 50-60%,
96% when actual was 61%) because the code used the raw `mem` value from Proxmox's
`/cluster/resources` API endpoint. This value comes from cgroup `memory.current` which
includes reclaimable cache and buffers, making memory appear nearly full even when
plenty is available.
## Root Cause
- **Nodes**: Had sophisticated cache-aware memory calculation with RRD fallbacks
- **VMs (qemu)**: Had detailed memory calculation using guest agent meminfo
- **LXCs**: Naively used `res.Mem` directly without any cache-aware correction
The Proxmox cluster resources API's `mem` field for LXCs includes cache/buffers
(from cgroup memory accounting), which should be excluded for accurate "used" memory.
## Solution
Implement cache-aware memory calculation for LXC containers by:
1. Adding `GetLXCRRDData()` method to fetch RRD metrics for LXC containers from
`/nodes/{node}/lxc/{vmid}/rrddata`
2. Using RRD `memavailable` to calculate actual used memory (total - available)
3. Falling back to RRD `memused` if `memavailable` is not available
4. Only using cluster resources `mem` value as last resort
This matches the approach already used for nodes and VMs, providing consistent
cache-aware memory reporting across all resource types.
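A minimal sketch of this fallback chain, with assumed names and a simplified
RRD point shape:

```go
package monitoring

// rrdPoint is a simplified stand-in for a guest RRD sample.
type rrdPoint struct {
	MemTotal     float64
	MemAvailable float64
	MemUsed      float64
}

// computeLXCMemUsed prefers RRD "memavailable", then RRD "memused", and only
// then the raw cluster-resources value, which includes reclaimable cache and
// therefore overstates usage.
func computeLXCMemUsed(rrd *rrdPoint, clusterMem uint64) uint64 {
	if rrd != nil && rrd.MemTotal > 0 && rrd.MemAvailable > 0 {
		return uint64(rrd.MemTotal - rrd.MemAvailable) // cache-aware "used"
	}
	if rrd != nil && rrd.MemUsed > 0 {
		return uint64(rrd.MemUsed)
	}
	return clusterMem // last resort: cgroup value, includes cache/buffers
}
```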
## Changes
- Added `GuestRRDPoint` type and `GetLXCRRDData()` method to pkg/proxmox
- Added `GetLXCRRDData()` to ClusterClient for cluster-aware operations
- Modified LXC memory calculation in `pollPVEInstance()` to use RRD data when available
- Added guest memory snapshot recording for LXC containers
- Updated test stubs to implement the new interface method
## Testing
- Code compiles successfully
- Follows the same proven pattern used for nodes and VMs
- Includes diagnostic snapshot recording for troubleshooting
Related to #405
Enhances error reporting and logging when all cluster endpoints are
unhealthy, making it easier to diagnose connectivity issues.
Changes:
1. Enhanced error messages in cluster_client.go:
- Error now includes list of unreachable endpoints
- Added detailed logging when no healthy endpoints available
- Log at WARN level (not DEBUG) when cluster health check fails
- Better context in recovery attempts with start/completion summaries
2. Improved storage polling resilience in monitor_polling.go:
- Better error context when cluster storage polling fails
- Specific guidance for "no healthy nodes available" scenario
- Storage polling continues with direct node queries even if the
cluster-wide query fails (this already worked, but is now logged more
clearly)
3. Better recovery logging:
- Log when recovery attempts start with list of unhealthy endpoints
- Log individual recovery failures at DEBUG level
- Log recovery summary (success/failure counts)
- Track throttled endpoints separately for clearer diagnostics
These changes help users understand:
- Which specific endpoints are unreachable
- Whether it's a network/connectivity issue vs. API issue
- That Pulse will continue trying to recover endpoints automatically
- That storage monitoring continues via direct node queries
The root issue is that Pulse's internal health tracking can mark all
endpoints unhealthy when they're unreachable from the Pulse server,
even if Proxmox reports them as "online" in cluster status. Better
logging helps diagnose these network connectivity issues.
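A minimal sketch of the enriched "no healthy endpoints" error described in
change 1 above (names are assumptions):

```go
package cluster

import (
	"fmt"
	"strings"
)

// noHealthyEndpointsError lists the unreachable endpoints so users can see
// exactly which hosts Pulse could not reach, rather than a generic failure.
func noHealthyEndpointsError(unreachable []string) error {
	return fmt.Errorf("no healthy cluster endpoints available; unreachable: %s",
		strings.Join(unreachable, ", "))
}
```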