Commit graph

93 commits

Author SHA1 Message Date
rcourtman
4f824ab148 style: Apply gofmt to 37 files
Standardize code formatting across test files and monitor.go.
No functional changes.
2025-12-02 17:21:48 +00:00
rcourtman
c05817f9de docs: Add godoc comments to exported functions
Add missing godoc comments to:
- NewRateLimiter and Allow in ratelimit.go
- SnapshotSyncStatus in temperature_proxy.go
- NewClient and GetVersion in pkg/pmg/client.go
2025-12-02 15:58:59 +00:00
rcourtman
c812720f25 test: Add Disk UnmarshalJSON RPM and error path tests
Cover RPM field handling (numeric, string, SSD, N/A, null, invalid),
invalid JSON error path, and unexpected type fallbacks for both
wearout and RPM fields.
Coverage: 50% → 95.5%
2025-12-02 02:23:44 +00:00
rcourtman
618fc084f1 test: Add invalid user format tests for NewClient
Test error handling for password authentication user format validation:
- Missing realm separator (no @)
- Empty user string
- Multiple @ symbols

Improves NewClient coverage from 74.2% to 83.9%.
2025-12-02 01:25:11 +00:00
rcourtman
de33653dc2 test: Add invalid value tests for VMFileSystem.UnmarshalJSON
Test error handling for JSON parsing edge cases:
- Invalid JSON syntax
- Unsupported field types (bool, array)
- Unparseable string values for total-bytes and used-bytes

Improves coverage from 83.3% to 94.4%.
2025-12-02 01:22:42 +00:00
rcourtman
79afff8ba2 test: Add invalid value tests for MemoryStatus.UnmarshalJSON
Test error handling for JSON parsing edge cases:
- Invalid JSON syntax
- Unsupported field types (bool, array, object)
- Unparseable string values

Improves coverage from 70.0% to 83.3%.
2025-12-02 01:20:15 +00:00
rcourtman
22d9e2795c test: Add permanent failure test for ClusterClient.GetNodes
Tests the error logging path when all endpoints fail with auth error
(83.3% to 91.7% coverage).
2025-12-02 01:05:48 +00:00
rcourtman
5bbf7de1a3 test: Add JSON decode error test for Client.GetNodes
Tests the error path when server returns invalid JSON (87.5% to 100%).
2025-12-02 01:03:30 +00:00
rcourtman
490fd9a810 test: Add edge cases for parseReplicationJob fields
- Test jobid fallback when id field is missing
- Test jobnum field takes precedence over ID parsing
- Test last_sync_duration and duration fields
- Test last-sync-duration fallback format
- Test next_sync and next-sync fallback formats

Coverage: 79.7% → 100%
2025-12-02 00:24:40 +00:00
rcourtman
29e01f8ff5 test: Add edge case for coerceUint64 ParseUint error branch
String 'abc' without .eE characters triggers ParseUint error path.
Coverage: 97.4% to 100%.
2025-12-01 23:44:04 +00:00
rcourtman
e2172b16de test: Add edge case test for isNotImplementedError fallback branch
Tab character triggers extractStatusCode fallback path (regex \s+ matches
tab but ' 501' substring check doesn't). Coverage: 87.5% to 100%.
2025-12-01 23:18:45 +00:00
rcourtman
2afc7f0c41 test: Add edge case tests for parseWearoutValue function
Add 4 new test cases covering previously untested branches:
- Float zero exactly (0.0)
- Float negative zero (-0.0)
- Only escaped quotes becoming empty after trimming
- Quoted whitespace becoming empty after trimming

Coverage improved from 95.8% to 100%.
2025-12-01 23:02:18 +00:00
rcourtman
be892f5e07 fix: match storage timeout errors without trailing slash
The error pattern `/storage/` only matched storage content endpoints
(`/storage/{name}/content`) but not the main storage list endpoint
(`/nodes/{node}/storage`).

This caused storage timeout errors like:
  Get ".../nodes/pve-100-224/storage": context deadline exceeded

to incorrectly mark cluster nodes as unhealthy, even though the timeout
was due to a slow cross-node storage query, not actual node connectivity
issues.

Fixes #754
2025-12-01 22:48:01 +00:00
rcourtman
9097b507fd test: Add edge case tests for parseReplicationTime function
Add 13 new test cases covering previously untested branches:
- float32 timestamp with valid value (using smaller value for precision)
- float32/float64 zero and negative values
- json.Number zero and negative values
- int32 and uint32 timestamp handling
- Invalid date format strings (no matching layout)
- Partial date strings
- Unsupported types (bool, slice)

Coverage improved from 93.8% to 100%.
2025-12-01 22:44:23 +00:00
rcourtman
18472f1668 test: Add float32 NaN/Inf tests for intFromAny and floatFromAny
Add 6 test cases covering float32 special values:
- intFromAny: float32 NaN, +Inf, -Inf (all return 0, false)
- floatFromAny: float32 NaN, +Inf, -Inf (all return 0, false)

Coverage improved:
- intFromAny: 96.7% -> 100%
- floatFromAny: 95.0% -> 100%
2025-12-01 22:40:08 +00:00
rcourtman
1e9fbdfdcc test: Add edge case tests for coerceUint64 function
Add 6 new test cases covering previously untested branches:
- float64 at MaxUint64 boundary (clamping behavior)
- float64 exceeding MaxUint64 (overflow protection)
- String with quoted "null" value
- String with quoted empty value ("")
- String with single quoted empty value ('')
- Invalid float parsing in scientific notation

Coverage improved from 92.3% to 97.4%.
2025-12-01 22:36:03 +00:00
rcourtman
05b9c3ab2d test: Add tests for CPUInfo.GetMHzString method
Add 11 test cases covering:
- Nil MHz returns empty string
- String MHz returned as-is
- Empty string handling
- Float64 formatted without decimals
- Float64 zero handling
- Float64 rounding for large values
- Int formatting
- Int zero handling
- Default formatting for other types (int64, bool, slice)

Coverage: GetMHzString 0% -> 100%
2025-12-01 22:29:30 +00:00
rcourtman
1f748e8670 fix: recover unhealthy cluster nodes even when some nodes are healthy
Previously, recovery of unhealthy nodes only triggered when ALL nodes
were unhealthy. This caused individual degraded nodes to stay degraded
forever since operations would succeed on healthy nodes and never
trigger the recovery path.

Now recovery is attempted whenever any unhealthy nodes exist, allowing
clusters to recover individual nodes over time.

Also added:
- Panic-safe unlock/lock pattern using anonymous function
- Refresh of both healthy and cooling endpoints after recovery
- Updated timestamp for accurate cooldown checks

Related to #754
2025-12-01 21:47:26 +00:00
rcourtman
d9331570f5 test: Add tests for VMAgentField JSON unmarshaling
Covers both Proxmox API formats:
- Integer format (older versions): direct int value
- Object format (Proxmox 8.3+): {enabled, available} fields
- Preference order: available > enabled > 0
- Invalid input handling defaults to 0
- Integration with VMStatus struct
2025-12-01 21:40:47 +00:00
rcourtman
32333cdbbe test: Add tests for authHTTPError.Error and shouldFallbackToForm
Tests for Proxmox client authentication error handling:
- authHTTPError.Error: message formatting based on status code
  (401/403 include status in message, others don't)
- shouldFallbackToForm: determines when to retry with form encoding
  (triggers on 400/415, not on auth errors or server errors)

16 test cases covering all code paths.
2025-12-01 13:39:50 +00:00
rcourtman
42eec54d6e Add unit tests for parseWearoutValue and clampWearoutConsumed functions
52 test cases covering:
- Empty/whitespace input
- Simple numeric strings and quoted values
- Percentage symbols and N/A variants
- Float values with truncation
- Messy SMART data with digit extraction fallback
- Clamping behavior for unknown, normal, and out-of-range values
2025-12-01 09:18:04 +00:00
rcourtman
f9122d736e Add unit tests for parseUint64Flexible function
32 test cases covering all code paths:
- nil, uint64, int, int64, float64 type handling
- json.Number parsing (delegates to string branch)
- String parsing: empty, decimal, hex (0x/0X), float notation, scientific
- Negative value handling (returns 0 for numeric types)
- Error cases: invalid strings, unsupported types
2025-12-01 09:11:02 +00:00
rcourtman
37550bff6d Add unit tests for ZFS device conversion functions
Tests added by ADA run #97 but commit was missed.
Covers: RaidZ types, log/cache/spare devices, nested mirrors,
ConvertToModelZFSPool, and struct field tests.
2025-12-01 09:03:48 +00:00
rcourtman
68c0e79b21 Add unit tests for cloneProfile and clonePhase functions in discovery
Add comprehensive tests for the cloneProfile and clonePhase utility
functions in pkg/discovery/discovery.go. Tests verify deep copying
behavior for all fields including subnets, metadata, warnings, extra
targets, and phases to ensure mutations don't affect original objects.
2025-12-01 01:51:54 +00:00
rcourtman
6c18849f79 Add unit tests for cluster_client utility functions
Test coverage for error detection and retry logic:
- extractStatusCode: 13 test cases for HTTP status code extraction
- isTransientRateLimitError: 17 test cases for rate limit detection
- isNotImplementedError: 14 test cases for 501 error detection
- isVMSpecificError: 16 test cases for VM-scoped errors
- calculateRateLimitBackoff: backoff timing verification
- isAuthError: 12 test cases for authentication errors

Coverage 35.5% → 37.3%
2025-12-01 00:24:21 +00:00
rcourtman
dc76294ce1 Add unit tests for discovery package utility functions
Test coverage for pure utility functions: friendlyPhaseName,
defaultProductsForPort, cloneHeader, copyMetadata, and
ensurePolicyDefaults.
2025-11-30 16:05:11 +00:00
rcourtman
efafbe8e31 Add unit tests for PMG flexible JSON type parsers
Tests for flexibleFloat and flexibleInt custom JSON unmarshalers that
handle PMG API responses where numeric values may arrive as numbers,
strings, or nulls. 64 test cases covering:

- JSON numbers (integers, floats, scientific notation, negatives)
- String values (numeric strings, empty, whitespace, "null")
- JSON null values
- Error cases (invalid strings, arrays, objects, booleans)
- Boundary values (max/min float64)
- Real PMG response patterns (mail stats, queue status)
- Struct embedding behavior
2025-11-30 03:04:12 +00:00
rcourtman
92c2d198b1 Add unit tests for Proxmox replication utility functions
Comprehensive test coverage for JSON parsing helpers used in
replication job status parsing: stringFromAny, intFromAny,
boolFromAny, floatFromAny, parseReplicationTime, parseDurationSeconds,
parseHHMMSSToSeconds, and parseReplicationJob.

Coverage increased from 22.6% to 35.5%.
2025-11-30 02:35:11 +00:00
rcourtman
316161f989 Add unit tests for coerceUint64 and FlexInt.UnmarshalJSON
45 test cases covering:
- FlexInt: integer/float/string parsing, truncation behavior, error cases
- coerceUint64: nil, float64 (including NaN/Inf), int/int32/int64,
  uint32/uint64, json.Number, string parsing (whitespace, null, quotes,
  commas, scientific notation), unsupported types

Coverage: 20.5% -> 22.6%
2025-11-30 02:17:52 +00:00
rcourtman
69de7c25ce Fix cluster degraded status not recovering after transient failures
The previous fix (6db4ee7a) cleared stale error messages but didn't mark
endpoints as healthy again after successful operations. This caused
clusters to remain in "degraded" state permanently once any endpoint had
a temporary issue, even if all endpoints were actually working.

The fix now marks endpoints healthy in clearEndpointError() after
successful operations, ensuring degraded clusters recover automatically.

Related to #659
2025-11-29 19:04:11 +00:00
rcourtman
2eea0335a2 Extract filesystem filtering logic into pkg/fsfilters
Move the inline filesystem skip logic from pollVMsAndContainersEfficient
into a reusable ShouldSkipFilesystem function. This consolidates filtering
for virtual filesystems (tmpfs, cgroup, etc.), network mounts (nfs, cifs,
fuse), and special mountpoints (/dev, /proc, /snap, etc.) into one tested
location.

Reduces cyclomatic complexity of pollVMsAndContainersEfficient and adds
28 test cases covering virtual fs types, network mounts, special mounts,
Windows paths, and edge cases.
2025-11-29 16:38:08 +00:00
rcourtman
1b5528356b fix: clear stale errors after successful cluster operations
Previously, errors stored in ClusterClient.lastError were only cleared
during initial health checks or when recovering unhealthy nodes. This
caused stale error messages to persist in the UI even after the
underlying issues were resolved.

The fix clears cached errors in two places:
1. After passing connectivity test in getHealthyClient()
2. After successful operation in executeWithFailover()

This ensures that once an endpoint starts working again, any previous
error messages are cleared from the UI without requiring a restart.

Related to #659, #754
2025-11-27 16:22:16 +00:00
rcourtman
ad998a1e2f style: fix staticcheck style warnings
- Merge variable declaration with assignment (S1021)
- Use unconditional strings.TrimPrefix (S1017)
- Remove unnecessary nil checks around range (S1031)
- Remove unnecessary fmt.Sprintf (S1039)
- Use copy() instead of manual loop (S1001)
- Use time.Until instead of t.Sub(time.Now()) (S1024)
- Use buf.String() instead of string(buf.Bytes()) (S1030)
2025-11-27 09:19:33 +00:00
rcourtman
bc9e89696b chore: fix staticcheck U1000 unused code warnings
- Remove unused ipv6Regex from validation.go
- Suppress unused recordAlertFired/recordAlertResolved hooks (kept for future use)
- Remove unused apiLimiter rate limiter
- Remove unused stopOnce fields from csrf_store.go and session_store.go
- Remove unused lastBroadcast field from hub.go
- Remove unused lastUsedIndex field from cluster_client.go
2025-11-27 09:12:17 +00:00
rcourtman
8276ae837e chore: cleanup proxmox IsAuthError and remove stray comment
- Make IsAuthError unexported (isAuthError) since it's only used internally
- Remove stray '// test comment' from docker_metadata.go
2025-11-27 08:59:01 +00:00
rcourtman
3045aa16fb chore: remove unused phaseError type from discovery 2025-11-27 08:47:13 +00:00
rcourtman
c439a83fba chore: remove additional dead code
Remove 241 lines of unreachable code across internal and pkg:
- internal/crypto/crypto.go: unused NewCryptoManager wrapper
- internal/monitoring/scheduler.go: unused fixedIntervalSelector type
- internal/ssh/knownhosts/manager.go: unused hostKeyExists function
- internal/updates/manager.go: unused getLatestRelease wrapper
- internal/updates/updater.go: unused GetAll method
- pkg/discovery/discovery.go: unused scanWorker and runPhase (legacy compat)
- pkg/proxmox/client.go: unused post, getTaskStatus, waitForTaskCompletion, getTaskLog
- pkg/proxmox/cluster_client.go: unused markUnhealthy wrapper
2025-11-27 05:13:26 +00:00
rcourtman
01f7d81d38 style: fix gofmt formatting inconsistencies
Run gofmt -w to fix tab/space inconsistencies across 33 files.
2025-11-26 23:44:36 +00:00
rcourtman
c9ce98d679 test: add unit tests for pkg/discovery/envdetect
Add 20 tests covering:
- Environment enum String() method
- DefaultScanPolicy default values
- ScanPolicy struct fields
- SubnetPhase struct fields
- EnvironmentProfile struct fields
- parseHexIP little-endian IP parsing
- tryCommonSubnets fallback subnet list
- profileWithWarning helper
- addFallbackSubnets with confidence handling
2025-11-26 14:25:17 +00:00
rcourtman
981b41eeb8 test: add unit tests for pkg/agents/docker 2025-11-26 14:22:58 +00:00
rcourtman
d57c2a5a6c test: add unit tests for pkg/agents/host 2025-11-26 14:16:57 +00:00
rcourtman
f6e59b59c3 test: add unit tests for pkg/pbs 2025-11-26 14:15:49 +00:00
rcourtman
cd66266236 test: add unit tests for pkg/tlsutil 2025-11-26 14:14:34 +00:00
rcourtman
ea335546fc feat: improve legacy agent detection and migration UX
Add seamless migration path from legacy agents to unified agent:

- Add AgentType field to report payloads (unified vs legacy detection)
- Update server to detect legacy agents by type instead of version
- Add UI banner showing upgrade command when legacy agents are detected
- Add deprecation notice to install-host-agent.ps1
- Create install-docker-agent.sh stub that redirects to unified installer

Legacy agents (pulse-host-agent, pulse-docker-agent) now show a "Legacy"
badge in the UI with a one-click copy command to upgrade to the unified
agent.
2025-11-25 23:26:22 +00:00
courtmanr@gmail.com
ac9384aee4 Fix PBS discovery causing auth failures
Increases confidence score for PBS when receiving 401/403 responses to avoid unnecessary probing of other endpoints that trigger auth failure logs.

Fixes #741
2025-11-23 00:08:54 +00:00
rcourtman
b28828a822 Handle VM guest agent errors without marking nodes unhealthy (related to #736) 2025-11-21 17:34:25 +00:00
rcourtman
2207642fa9 Related to #727: normalize persisted Proxmox hosts 2025-11-20 19:58:05 +00:00
rcourtman
766cbe573e Handle missing storage on cluster nodes 2025-11-18 15:57:29 +00:00
rcourtman
7c895df1f3 Fix Proxmox 9.x VM status endpoint incompatibility
Proxmox VE 9.x removed support for the "full" parameter in the
/nodes/{node}/qemu/{vmid}/status/current endpoint. When Pulse sent
GetVMStatus() requests with ?full=1, Proxmox responded with:

  API error 400: {"errors":{"full":"property is not defined in schema..."}}

This caused the cluster client to mark ALL endpoints as unhealthy, which
cascaded into multiple failures:
- VM status checks failed
- Guest agent queries were blocked
- Filesystem data collection stopped working
- All Windows VMs showed disk:-1 (unknown) instead of actual disk usage

The fix removes the ?full=1 parameter since Proxmox 9.x returns all data
by default without needing this parameter. This maintains backward
compatibility with older Proxmox versions while fixing the issue in 9.x.

After this fix:
- Cluster endpoints are correctly marked as healthy
- Guest agent queries work properly
- Windows VMs report actual disk usage (e.g., 26% on C:\ drive)
- VM monitoring functions normally on Proxmox 9.x
2025-11-13 11:22:36 +00:00
rcourtman
f61b850179 Ensure VM status requests always return meminfo (Related to #694) 2025-11-12 17:30:10 +00:00