Pulse/internal/monitoring
rcourtman 59a97f2e3e Fix storage disappearing after upgrade by preserving TLS validation
Fixes #657

Between v4.25.0 and v4.26.4, commit 72865ff62 changed cluster endpoint
resolution to prefer IP addresses over hostnames to reduce DNS lookups
(refs #620). However, this caused TLS certificate validation to fail for
installations with VerifySSL=true, because Proxmox certificates typically
contain hostnames (e.g., pve01.example.com), not IP addresses.

When all cluster endpoints failed TLS validation during the initial health
check, the ClusterClient marked all nodes as unhealthy. Subsequent calls
to GetAllStorage() would fail with "no healthy nodes available in cluster",
causing storage data to disappear from the UI despite the cluster being
fully operational.

**Root Cause:**
The IP-first approach breaks TLS hostname verification when both of the
following hold:
- VerifySSL is enabled (common for production environments)
- Certificates are issued with hostnames, not IPs (standard practice)

In that case x509 certificate validation fails (e.g., "certificate is valid
for pve01.example.com, not 10.0.0.44").
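
The failure mode above can be reproduced in isolation with Go's standard library. The following is a minimal sketch, not code from this commit: `newHostnameOnlyCert` is a hypothetical helper that builds a throwaway self-signed certificate whose only SAN is a DNS name, and the hostname and IP are the example values used in this message. The exact error text Go produces may differ from the message quoted above.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newHostnameOnlyCert (hypothetical helper) creates a self-signed certificate
// that lists the given hostname as its only SAN and contains no IP SANs.
func newHostnameOnlyCert(hostname string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: hostname},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     []string{hostname},
	}
	// Self-signed: the template acts as its own parent.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := newHostnameOnlyCert("pve01.example.com")
	if err != nil {
		panic(err)
	}
	// Connecting by hostname: the name matches the DNS SAN, so err is nil.
	fmt.Println("by hostname:", cert.VerifyHostname("pve01.example.com"))
	// Connecting by IP: the certificate has no IP SANs, so err is non-nil.
	fmt.Println("by IP:      ", cert.VerifyHostname("10.0.0.44"))
}
```

This only exercises the name-matching step; a real TLS handshake with VerifySSL enabled also checks the certificate chain.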

**Solution:**
Choose between hostname and IP based on whether TLS hostname verification is
required:

1. When TLS hostname verification is required (VerifySSL=true AND no
   fingerprint override), prefer hostname to ensure certificate CN/SAN
   validation succeeds.

2. When TLS verification is bypassed (VerifySSL=false OR fingerprint
   provided), prefer IP to reduce DNS lookups.

This approach:
- Fixes the regression for users with VerifySSL enabled
- Preserves the DNS optimization for self-signed/fingerprint configs
- Maintains backwards compatibility with v4.25.0 behavior
- Does not compromise TLS security
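
The selection rule in the Solution section can be sketched as follows. This is an illustration under stated assumptions rather than the code in monitor.go: the `endpoint` type, `requiresHostnameVerification`, and `pickAddress` are hypothetical names, and the fallback to the other address when the preferred one is empty is an assumption that the commit message does not state.

```go
package main

import (
	"fmt"
	"strings"
)

// endpoint is a hypothetical stand-in for a cluster node's address data.
type endpoint struct {
	Host string // hostname, e.g. "pve01.example.com"
	IP   string // IP address, e.g. "10.0.0.44"
}

// requiresHostnameVerification reports whether the TLS handshake will check
// the certificate against the address we dial: VerifySSL is on and no
// fingerprint override is configured.
func requiresHostnameVerification(verifySSL bool, fingerprint string) bool {
	return verifySSL && strings.TrimSpace(fingerprint) == ""
}

// pickAddress prefers the hostname when hostname verification applies and the
// IP otherwise. Falling back to the other field when the preferred one is
// empty is an assumption made for this sketch.
func pickAddress(ep endpoint, verifySSL bool, fingerprint string) string {
	preferred, fallback := ep.IP, ep.Host
	if requiresHostnameVerification(verifySSL, fingerprint) {
		preferred, fallback = ep.Host, ep.IP
	}
	if preferred != "" {
		return preferred
	}
	return fallback
}

func main() {
	ep := endpoint{Host: "pve01.example.com", IP: "10.0.0.44"}
	fmt.Println(pickAddress(ep, true, ""))      // pve01.example.com
	fmt.Println(pickAddress(ep, true, "AA:BB")) // 10.0.0.44
	fmt.Println(pickAddress(ep, false, ""))     // 10.0.0.44
}
```

Keeping the decision in one small predicate means the DNS optimization from #620 and the TLS requirement are each expressed in a single place.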

**Testing:**
Users reported that rolling back to v4.25.0 fixed their storage visibility.
This fix should restore storage for v4.26.4+ while maintaining the DNS
optimization for appropriate scenarios.
2025-11-07 15:36:52 +00:00
backoff.go feat: implement error handling with circuit breakers and backoff (Phase 2 Task 7) 2025-10-20 15:13:37 +00:00
backoff_test.go test: add comprehensive unit tests for backoff and circuit breaker (Phase 2 Task 9a) 2025-10-20 15:13:38 +00:00
backup_guard_test.go Guard PBS backups from failed polls 2025-11-05 19:26:20 +00:00
ceph.go Fix settings security tab navigation 2025-10-11 23:29:47 +00:00
circuit_breaker.go feat: enhance scheduler health API with rich instance metadata 2025-10-20 15:13:38 +00:00
circuit_breaker_test.go test: add comprehensive unit tests for backoff and circuit breaker (Phase 2 Task 9a) 2025-10-20 15:13:38 +00:00
container_disk_usage.go feat: add professional logging with runtime configuration and performance optimization 2025-10-20 15:13:38 +00:00
diagnostic_snapshots.go Refine Proxmox node memory fallback (#582) 2025-10-22 15:36:26 +00:00
docker_commands.go feat: add docker agent command handling 2025-10-15 19:27:19 +00:00
docker_commands_test.go chore: snapshot current changes 2025-11-02 22:47:55 +00:00
fake_executor_integration.go test: add comprehensive integration test harness for adaptive polling (Phase 2 Task 9c) 2025-10-20 15:13:38 +00:00
fs_filters.go Ignore read-only guest filesystems in disk aggregation 2025-10-14 16:13:53 +00:00
fs_filters_test.go Ignore read-only guest filesystems in disk aggregation 2025-10-14 16:13:53 +00:00
harness_integration.go Surface LXC interface IPs via PVE interfaces API (#596) 2025-10-23 08:07:32 +00:00
helpers_test.go Expand monitoring and discovery test coverage 2025-10-16 08:17:08 +00:00
integration_integration_test.go test: add soak test with runtime instrumentation (Phase 2 Task 9d) 2025-10-20 15:13:38 +00:00
main_test.go Harden setup token flow and enforce encrypted persistence 2025-10-25 16:00:37 +00:00
metrics.go perf: reduce polling allocations and guest metadata load 2025-10-25 13:12:47 +00:00
metrics_history.go Fix settings security tab navigation 2025-10-11 23:29:47 +00:00
metrics_history_concurrency_test.go Fix settings security tab navigation 2025-10-11 23:29:47 +00:00
monitor.go Fix storage disappearing after upgrade by preserving TLS validation 2025-11-07 15:36:52 +00:00
monitor_docker_test.go Refactor: Code cleanup and localStorage consolidation 2025-11-04 21:50:46 +00:00
monitor_health_test.go feat: enhance scheduler health API with rich instance metadata 2025-10-20 15:13:38 +00:00
monitor_host_agents_test.go perf: reduce polling allocations and guest metadata load 2025-10-25 13:12:47 +00:00
monitor_memory_test.go Fix inflated RAM usage reporting for LXC containers 2025-11-06 00:16:18 +00:00
monitor_pmg_test.go Fix PMG API parameter issues causing 400 errors 2025-11-05 19:28:37 +00:00
monitor_polling.go Fix guest agent disk data regression on Proxmox 8.3+ 2025-11-06 18:42:46 +00:00
monitor_snapshots_test.go Fix inflated RAM usage reporting for LXC containers 2025-11-06 00:16:18 +00:00
monitor_storage_test.go Fix inflated RAM usage reporting for LXC containers 2025-11-06 00:16:18 +00:00
poller.go feat: add professional logging with runtime configuration and performance optimization 2025-10-20 15:13:38 +00:00
ratetracker.go Fix settings security tab navigation 2025-10-11 23:29:47 +00:00
ratetracker_concurrency_test.go Fix settings security tab navigation 2025-10-11 23:29:47 +00:00
reload.go Propagate config updates to settings nodes (#588) 2025-10-22 13:45:13 +00:00
scheduler.go feat: enhance scheduler health API with rich instance metadata 2025-10-20 15:13:38 +00:00
staleness_tracker.go release: prepare v4.25.0 2025-10-22 10:46:18 +00:00
staleness_tracker_test.go test: add comprehensive staleness tracker unit tests (Phase 2 Task 9b) 2025-10-20 15:13:38 +00:00
task_queue.go perf: reduce polling allocations and guest metadata load 2025-10-25 13:12:47 +00:00
temperature.go Improve Docker temperature monitoring documentation for clarity (related to #600) 2025-11-07 15:09:42 +00:00
temperature_service.go Add configurable SSH port for temperature monitoring 2025-11-05 20:03:29 +00:00
temperature_test.go Expand temperature sensor compatibility for SuperIO and AMD CPUs 2025-11-05 18:47:21 +00:00