Pulse/pkg
rcourtman 23691d5b41 Improve cluster health diagnostics and error messaging
Related to #405

Enhances error reporting and logging when all cluster endpoints are
unhealthy, making it easier to diagnose connectivity issues.

Changes:

1. Enhanced error messages in cluster_client.go:
   - Error now includes list of unreachable endpoints
   - Added detailed logging when no healthy endpoints available
   - Log at WARN level (not DEBUG) when cluster health check fails
   - Better context in recovery attempts with start/completion summaries

2. Improved storage polling resilience in monitor_polling.go:
   - Better error context when cluster storage polling fails
   - Specific guidance for "no healthy nodes available" scenario
   - Storage polling continues with direct node queries even if the
     cluster-wide query fails (this already worked; the logs now make it explicit)

3. Better recovery logging:
   - Log when recovery attempts start with list of unhealthy endpoints
   - Log individual recovery failures at DEBUG level
   - Log recovery summary (success/failure counts)
   - Track throttled endpoints separately for clearer diagnostics

These changes help users understand:
- Which specific endpoints are unreachable
- Whether the failure is a network/connectivity issue or an API issue
- That Pulse will continue trying to recover endpoints automatically
- That storage monitoring continues via direct node queries

The root issue is that Pulse's internal health tracking can mark all
endpoints unhealthy when they're unreachable from the Pulse server,
even if Proxmox reports them as "online" in cluster status. Better
logging helps diagnose these network connectivity issues.
2025-11-05 19:44:29 +00:00
agents Refactor: Code cleanup and localStorage consolidation 2025-11-04 21:50:46 +00:00
discovery Update Pulse install flow and related components 2025-10-21 19:58:53 +00:00
pbs Fix settings security tab navigation 2025-10-11 23:29:47 +00:00
pmg Fix PMG API parameter issues causing 400 errors 2025-11-05 19:28:37 +00:00
proxmox Improve cluster health diagnostics and error messaging 2025-11-05 19:44:29 +00:00
tlsutil Add DNS caching to reduce excessive DNS queries 2025-11-05 18:25:38 +00:00