vrr/Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-05-22 03:02:35 +00:00

Author	SHA1	Message	Date
rcourtman	4c4fd3a99b	Fix demo server update workflow race condition Add asset availability check before updating demo server. The workflow now waits up to 5 minutes for checksums.txt and the linux-amd64 tarball to be available before attempting the update. This prevents the install script from failing when the release is published before all assets finish uploading. Resolves demo server downtime during releases.	2025-11-11 01:17:58 +00:00
rcourtman	a0f551bea2	Bump version to 4.28.0	2025-11-11 00:28:23 +00:00
rcourtman	accecdb50b	Make api_tokens.json authoritative source for API tokens (fixes #685 ) This is the proper architectural fix for #685. The previous commit was a bandaid that prevented unnecessary .env writes. This commit addresses the root cause: dual-source-of-truth for API tokens (.env vs api_tokens.json). Changes: 1. Startup Migration (config.go:896-951): - When loading config, if API_TOKEN/API_TOKENS exist in .env but not in api_tokens.json, automatically migrate them - Migrated tokens are named "Migrated from .env (prefix)" for clarity - Logs a deprecation warning: API_TOKEN/API_TOKENS in .env are deprecated - Leaves .env untouched (safe for existing deployments) 2. Config Watcher Changes (watcher.go:338-424): - Only load tokens from .env if api_tokens.json is EMPTY - Once api_tokens.json has records, it becomes the authoritative source - .env changes no longer trigger token overwrites when api_tokens.json exists - Logs debug message when ignoring env tokens Result: - Existing deployments: env tokens automatically migrated to api_tokens.json - UI-created tokens: never overwritten by .env changes - Dark mode toggle: no longer triggers token reload from .env - Backward compatible: fresh installs with API_TOKEN in .env still work - Migration path: users can safely keep API_TOKEN in .env, it will be ignored Future improvement: Add UI warning when API_TOKEN/API_TOKENS still present in .env, prompting users to rotate tokens via the UI.	2025-11-11 00:17:40 +00:00
rcourtman	5d99fc2f2d	Fix dark mode toggle wiping API tokens (related to #685 ) Root cause: SaveSystemSettings calls updateEnvFile which rewrites .env on any setting change, triggering the config watcher. The watcher sees API_TOKEN in .env and replaces all UI-created tokens with "Environment token" records, wiping out host-agent scoped tokens. Fix: updateEnvFile now compares the new content with existing content and skips the write if nothing changed. Since dark mode (and other UI settings) are stored in system.json, not .env, toggling theme no longer triggers unnecessary .env rewrites. This prevents the config watcher from being triggered unnecessarily and preserves UI-created API tokens when changing cosmetic settings. Future improvement: Deprecate API_TOKEN/API_TOKENS from .env entirely and make api_tokens.json the single source of truth (requires migration logic).	2025-11-11 00:11:41 +00:00
rcourtman	bb6ea3b23c	Fix offline alert state not displaying in thresholds UI (related to #683 ) When disabling offline alerts for VMs/containers, the setting was being persisted correctly and honored by the alert system, but the UI always showed "Warn" instead of the actual saved state. Root cause: When reconstructing the overrides list from backend config, the guest override mapping was copying poweredOffSeverity but omitting disableConnectivity, causing ResourceTable to fall back to global defaults. Fix: Add disableConnectivity field to guest override reconstruction in Alerts.tsx (line 676), matching the pattern already used for Docker containers.	2025-11-10 20:32:04 +00:00
rcourtman	14ac4bbb8b	Add Proxmox LXC instructions to bootstrap token UI Users were confused about how to access the bootstrap token in Proxmox LXC containers. They were trying to use the Proxmox web console instead of 'pct enter' from the Proxmox host. This adds explicit instructions in the FirstRunSetup UI that show: - pct enter <ctid> for interactive access - pct exec <ctid> -- cat /etc/pulse/.bootstrap_token for direct retrieval - Clear indication that commands should be run from Proxmox host The instructions only display when the deployment is not Docker and the bootstrap token path is /etc/pulse/.bootstrap_token (indicating LXC). Fixes #681	2025-11-10 12:20:41 +00:00
rcourtman	9fcf0b35e8	Remove RELEASE_CHECKLIST.md from repository This file should remain local only (gitignored) and not be tracked in the repository.	2025-11-10 12:08:21 +00:00
rcourtman	ed0c86c953	Simplify hostname reference in release checklist Update docker-builder hostname from "delly.lan" to "delly" for consistency with other references.	2025-11-10 12:05:08 +00:00
rcourtman	438d3b6b7b	Fix unbound variable error in temperature proxy installation Related to #681 The variable local_proxy_binary was declared with local scope inside the BUILD_FROM_SOURCE conditional block but referenced outside of it during cleanup. This caused "unbound variable" errors on release installs since the script uses set -u. Moved the declaration before the conditional block and initialize to empty string. The cleanup code [[ -f "$local_proxy_binary" ]] already handles the empty string case safely.	2025-11-10 11:37:31 +00:00
rcourtman	999e598e44	Improve bootstrap token instructions for all container types Updated FirstRunSetup to show generic container commands that work across different orchestration platforms: - Use <container-name> placeholder instead of hardcoded "pulse" - Add kubectl exec example for Kubernetes/Helm deployments - Clarify "From container host" applies to Docker, Podman, etc. This ensures the instructions work for Docker Compose, Swarm, Helm, and any other container orchestrator where the container might have a different name.	2025-11-09 23:48:43 +00:00
rcourtman	b29a830046	Fix bootstrap-token command to use correct env var and default path The bootstrap-token CLI command had two bugs: 1. Used PULSE_DATA_PATH instead of PULSE_DATA_DIR (typo) 2. Used /var/lib/pulse as fallback instead of /etc/pulse This caused the command to look in the wrong location for non-Docker deployments. Fixed to match config.Load() logic: - Check PULSE_DATA_DIR env var first - Fall back to /data for Docker, /etc/pulse otherwise	2025-11-09 23:46:41 +00:00
rcourtman	df185985eb	Fix bootstrap token path display for Docker deployments (related to #680 ) The first-run setup UI was displaying incorrect bootstrap token paths for Docker deployments. It showed `/etc/pulse/.bootstrap_token` regardless of deployment type, but Docker containers use `/data/.bootstrap_token` by default (via PULSE_DATA_DIR env var). Changes: - Extended `/api/security/status` endpoint to include `bootstrapTokenPath` and `isDocker` fields when a bootstrap token is active - Updated FirstRunSetup component to fetch and display the correct path dynamically based on actual deployment configuration - For Docker deployments, UI now shows both `docker exec` command and in-container command - Falls back to showing both standard and Docker paths if API data unavailable (backward compatibility) This fix ensures users always see the correct command for their specific deployment, including custom PULSE_DATA_DIR configurations.	2025-11-09 23:41:55 +00:00
rcourtman	a82a345cd6	Improve table column widths and sparkline visibility	2025-11-09 23:36:52 +00:00
rcourtman	6f4cbf3a52	docs: update README	2025-11-09 23:20:19 +00:00
rcourtman	9ab72c236c	docs: update README	2025-11-09 23:02:15 +00:00
rcourtman	aa427678ea	docs: update README	2025-11-09 22:54:04 +00:00
rcourtman	c909e36c91	docs: add specific monthly costs to sponsorship section	2025-11-09 22:41:19 +00:00
rcourtman	4c6f565855	fix: sparkline canvas wrapper display mode for flex layout Change sparkline wrapper from inline-block to block w-full to properly fill flex parent container. Inline-block was preventing the canvas from calculating the correct width when width={0} (auto-size mode).	2025-11-09 22:35:32 +00:00
rcourtman	886368ec44	feat: add sparklines view mode for metrics visualization Add comprehensive sparkline chart support as an alternative to progress bars for CPU, Memory, and Disk metrics across all tables. Features: - Toggle between bars/trends view modes (persisted to localStorage) - 30-second sampling with 2-hour retention window using ring buffer - Canvas-based rendering with shared requestAnimationFrame for efficiency - Hover tooltips showing exact values and timestamps - Threshold reference lines (warning/critical) for context - localStorage persistence survives page refreshes (12-hour max age) - Dynamic width adaptation to column size - Namespaced resource IDs prevent collisions - Lifecycle cleanup prevents memory leaks Performance optimizations: - Decoupled sampling from WebSocket handler (6x reduction in recording) - O(1) ring buffer insertions (no array cloning) - Batched canvas rendering (single rAF for all sparklines) - Debounced localStorage writes - Automatic pruning of removed resources UI improvements: - Consistent radio toggle styling matching other filters - Fixed column widths prevent layout shift during toggle - Fixed row heights prevent vertical size changes - Sparklines fill available column width proportionally	2025-11-09 22:31:35 +00:00
rcourtman	f34ba0fda3	docs: add sponsor badge to header	2025-11-09 22:30:32 +00:00
rcourtman	0089a9ed52	docs: improve sponsorship visibility	2025-11-09 22:27:30 +00:00
rcourtman	752518a830	Remove accidental files	2025-11-09 22:21:19 +00:00
rcourtman	293e2b12ca	docs: update README	2025-11-09 22:19:25 +00:00
rcourtman	75bfc51a7d	Center logo and title in README header	2025-11-09 21:14:45 +00:00
rcourtman	e00065d81c	Fix logo alignment in README header	2025-11-09 21:14:14 +00:00
rcourtman	459b6f3271	Add logo to README header	2025-11-09 21:13:34 +00:00
rcourtman	425ea00ba2	Fix upgrade path when DISABLE_AUTH detected but no credentials exist (fixes #678 ) Users upgrading from v4.25 (where DISABLE_AUTH actually disabled auth) to v4.27.1 (where DISABLE_AUTH is ignored but triggers a deprecation warning) were stuck in a catch-22: - They had no credentials (old version had auth disabled) - DISABLE_AUTH detection incorrectly required authentication - Setup wizard returned 401, preventing first credential creation - Could not complete setup to create credentials and remove flag Root cause: When DISABLE_AUTH was detected, the code set forceRequested=true which triggered the authentication requirement even when authConfigured=false. Fix: Only require authentication when credentials actually exist. When no auth is configured, allow the bootstrap token flow regardless of whether DISABLE_AUTH is detected. This lets users upgrade from legacy DISABLE_AUTH deployments by using the bootstrap token to create their first credentials, then removing the flag.	2025-11-09 20:33:58 +00:00
rcourtman	078248770e	Fix Docker host custom display name not showing in main Docker tab RESOURCE column (related to #662 ) The previous fix (a1ba915ca) correctly added customDisplayName to the WebSocket payload and made it persist in Settings, but the main Docker tab's RESOURCE column still showed the default name. DockerUnifiedTable had four locations that built display names but ignored customDisplayName: - DockerHostGroupHeader (RESOURCE column header) - line 549 - containerMatchesToken (search/filter logic) - line 391 - serviceMatchesToken (search/filter logic) - line 472 - sortedHosts (host sorting logic) - lines 1879-1880 All four now prioritize customDisplayName first, matching the pattern used in DockerHostSummaryTable and Settings (customDisplayName || displayName || hostname || id). This ensures custom Docker host names display consistently across the entire UI.	2025-11-09 18:03:38 +00:00
rcourtman	c9d1671afd	Fix persistent temperature monitoring issues for standalone Proxmox nodes (addresses #571 ) This commit resolves the recurring temperature monitoring failures that have plagued multiple releases: 1. Fix user mismatch (v4.27.1 regression): - Changed binary default user from 'pulse-sensor' to 'pulse-sensor-proxy' - Aligns with the user created by install-sensor-proxy.sh (line 389) - Prevents panic when binary is run outside systemd context - Systemd unit already uses User=pulse-sensor-proxy, so this makes manual runs work too 2. Fix standalone node validation (v4.25.0+ regression): - pvecm status exits with code 2 on standalone nodes (not in a cluster) - This caused validation to fail, rejecting all temperature requests - Added discoverLocalHostAddresses() helper that discovers actual host IPs/hostnames - On standalone nodes, cluster membership list is populated with host's own addresses - Maintains SSRF protection while allowing standalone operation - Added comprehensive test coverage 3. Make installer fail loudly on proxy setup failure: - Previously, failed proxy installation only printed a warning - Install script then claimed "Pulse installation complete!" (confusing for users) - Now exits with clear error message and remediation steps - Forces operators to fix proxy issues before claiming success - Users who skip temperature monitoring are unaffected 4. Add test coverage to prevent future regressions: - Added TestDiscoverLocalHostAddresses to verify local address discovery - Validates no loopback or link-local addresses are returned - All existing tests pass with new changes Pattern of failures across releases: - v4.23.0: Missing proxy binaries in release - v4.24.0-rc.3: AMD CPU sensor naming (Tctl vs Tdie) - v4.25.0: Single-node pvecm status exit code - v4.27.1: User mismatch (pulse-sensor vs pulse-sensor-proxy) This comprehensive fix addresses the root causes rather than applying another tactical patch. Related to #571	2025-11-09 16:53:14 +00:00
rcourtman	62a9f40cc7	Fix diagnostics incorrectly warning about /run mount in Docker (related to #600 ) The diagnostic code was warning ALL deployments using /run/pulse-sensor-proxy socket path to "remove and re-add" their configuration to use /mnt/pulse-proxy instead. This was incorrect for Docker deployments where /run is the correct and documented mount path (see docker-compose.yml line 15). The warning was only meant for LXC containers where the managed mount at /mnt/pulse-proxy is preferred over a legacy hand-crafted /run mount. Fix: Only show the warning in non-Docker environments (check PULSE_DOCKER env). Docker deployments correctly use /run/pulse-sensor-proxy per compose file. Impact: Docker users were seeing confusing diagnostic warnings telling them to reconfigure a correct setup.	2025-11-09 16:49:49 +00:00
rcourtman	5bac91a664	Fix pulse-sensor-proxy configuration not applied in LXC containers (related to #600 ) This fixes two bugs that prevented temperature monitoring from working after running install-sensor-proxy.sh on LXC deployments: 1. CRITICAL: Pulse service not restarted after systemd override - The installer wrote PULSE_SENSOR_PROXY_SOCKET env var to systemd drop-in and ran daemon-reload, but never restarted Pulse service - Running Pulse instances continued using old environment variables - Temperatures wouldn't work until manual Pulse restart - Now: Automatically restart Pulse if running after writing override 2. Added guard to check if Pulse service exists before configuring - Installer would write systemd override even if Pulse not installed - Left orphaned drop-in files that confused users - Now: Check if pulse.service exists, warn and skip if not found 3. MINOR: Fix inconsistent Docker mount instructions - docker-compose.yml showed :ro (read-only) mount - Installer output showed :rw (read-write) mount - Changed installer to match compose file (:ro is correct and secure) Impact: Users in #600 reported "socketFound=false" even after running installer successfully. This was because Pulse never picked up the new socket path without a restart.	2025-11-09 16:44:08 +00:00
rcourtman	bb7ca93c18	feat: Add mdadm RAID monitoring support for host agents Implements comprehensive mdadm RAID array monitoring for Linux hosts via pulse-host-agent. Arrays are automatically detected and monitored with real-time status updates, rebuild progress tracking, and automatic alerting for degraded or failed arrays. Key changes: Backend: - Add mdadm package for parsing mdadm --detail output - Extend host agent report structure with RAID array data - Integrate mdadm collection into host agent (Linux-only, best-effort) - Add RAID array processing in monitoring system - Implement automatic alerting: - Critical alerts for degraded arrays or arrays with failed devices - Warning alerts for rebuilding/resyncing arrays with progress tracking - Auto-clear alerts when arrays return to healthy state Frontend: - Add TypeScript types for RAID arrays and devices - Display RAID arrays in host details drawer with: - Array status (clean/degraded/recovering) with color-coded indicators - Device counts (active/total/failed/spare) - Rebuild progress percentage and speed when applicable - Green for healthy, amber for rebuilding, red for degraded Documentation: - Document mdadm monitoring feature in HOST_AGENT.md - Explain requirements (Linux, mdadm installed, root access) - Clarify scope (software RAID only, hardware RAID not supported) Testing: - Add comprehensive tests for mdadm output parsing - Test parsing of healthy, degraded, and rebuilding arrays - Verify proper extraction of device states and rebuild progress All builds pass successfully. RAID monitoring is automatic and best-effort - if mdadm is not installed or no arrays exist, host agent continues reporting other metrics normally. Related to #676	2025-11-09 16:36:33 +00:00
rcourtman	23ce2c6d11	Add support for Windows 32-bit (windows-386) architecture (related to #674 ) Adds build support for 32-bit Windows (windows-386) for pulse-host-agent. Changes: - Add windows-386 build to Dockerfile host-agent build section - Add windows-386 binary copy and symlink to Dockerfile - Add windows-386 build to build-release.sh - Add windows-386 zip package to release artifacts - Include windows-386 binary in standalone binary copies This enables pulse-host-agent to run on 32-bit Windows systems, which are still relevant in legacy/industrial monitoring environments through late 2025.	2025-11-09 08:57:30 +00:00
rcourtman	188944019a	docs: Add webhook private IP allowlist configuration guide Document the new webhook security feature that allows homelab users to configure trusted private IP ranges for webhook targets. Includes: - Overview of default security behavior - Step-by-step configuration instructions - Security considerations and best practices - Example CIDR configurations - Troubleshooting guidance for common error messages Related to #673	2025-11-09 08:36:15 +00:00
rcourtman	4834dea05b	Add support for linux-386 and linux-armv6 architectures (related to #674 ) Adds build support for 32-bit x86 (i386/i686) and ARMv6 (older Raspberry Pi models) architectures across all agents and install scripts. Changes: - Add linux-386 and linux-armv6 to build-release.sh builds array - Update Dockerfile to build docker-agent, host-agent, and sensor-proxy for new architectures - Update all install scripts to detect and handle i386/i686 and armv6l architectures - Add architecture normalization in router download endpoints - Update update manager architecture mapping - Update validate-release.sh to expect 24 binaries (was 18) This enables Pulse agents to run on older/legacy hardware including 32-bit x86 systems and Raspberry Pi Zero/Zero W devices.	2025-11-09 08:35:24 +00:00
rcourtman	1b221cca71	feat: Add configurable allowlist for webhook private IP targets (addresses #673 ) Allow homelab users to send webhooks to internal services while maintaining security defaults. Changes: - Add webhookAllowedPrivateCIDRs field to SystemSettings (persistent config) - Implement CIDR parsing and validation in NotificationManager - Convert ValidateWebhookURL to instance method to access allowlist - Add UI controls in System Settings for configuring trusted CIDR ranges - Maintain strict security by default (block all private IPs) - Keep localhost, link-local, and cloud metadata services blocked regardless of allowlist - Re-validate on both config save and webhook delivery (DNS rebinding protection) - Add comprehensive tests for CIDR parsing and IP matching Backend: - UpdateAllowedPrivateCIDRs() parses comma-separated CIDRs with validation - Support for bare IPs (auto-converts to /32 or /128) - Thread-safe allowlist updates with RWMutex - Logging when allowlist is updated or used - Validation errors prevent invalid CIDRs from being saved Frontend: - New "Webhook Security" section in System Settings - Input field with examples and helpful placeholder text - Real-time unsaved changes tracking - Loads and saves allowlist via system settings API Security: - Default behavior unchanged (all private IPs blocked) - Explicit opt-in required via configuration - Localhost (127/8) always blocked - Link-local (169.254/16) always blocked - Cloud metadata services always blocked - DNS resolution checked at both save and send time Testing: - Tests for CIDR parsing (valid/invalid inputs) - Tests for IP allowlist matching - Tests for bare IP address handling - Tests for security boundaries (localhost, link-local remain blocked) Related to #673 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-09 08:31:12 +00:00
rcourtman	6bb53eaadb	Surface update errors to UI for better user feedback (related to #671 ) User ZaDarkSide reported that when updates fail, the UI shows a loading spinner indefinitely with no feedback about what went wrong. Users had to check backend logs to understand failures like "checksum verification failed". The infrastructure was already in place: - UpdateStatus struct had an Error field - Frontend already renders error details when present - But updateStatus() never populated the Error field Changes: - Modified updateStatus() to accept optional error parameter - Added sanitizeError() to cap error message length (500 chars max) - Updated all error cases in ApplyUpdate() to pass error details: - Temp directory creation failures - Download failures - Checksum verification failures (most common user complaint) - Extraction failures - Backup creation failures - Apply update failures - Also updated CheckForUpdates() error cases Now when updates fail, users immediately see the error message in the UI's red error panel instead of being stuck on a loading spinner. Security: Errors are only shown to authenticated admin users with update permissions. Error messages are capped at 500 chars to prevent extremely long output. Current error messages don't contain sensitive data (mainly HTTP status codes, file paths, checksum mismatches).	2025-11-09 08:23:04 +00:00
rcourtman	cb682ed369	chore: bump version to v4.27.1	2025-11-09 08:20:25 +00:00
rcourtman	c46b52be89	Improve release notes template with detailed installation methods Updated template to hybrid format combining best of v4.27.0 and v4.25.0: Benefits from detailed format (v4.25.0): - 4 complete installation methods (Quick/Docker/Binary/Helm) - Copy-pasteable commands for each method - Explicit Downloads section listing what's available - Better for new users and SEO Benefits from simple format (v4.27.0): - Consistent section ordering - Clean, scannable structure - Breaking Changes section always present Changes descriptions now require context and user impact, not just one-liners. This helps users understand if a change affects them without clicking through to issues. Based on Codex analysis that detailed format serves more user types better: new users, quick upgrades, search indexing, and professional appearance.	2025-11-09 00:31:54 +00:00
rcourtman	19620faa30	Make release checklist resilient to artifact count changes Removed hardcoded '31 assets' requirement. Instead, checklist now says: - Compare with recent successful releases (v4.26.5, v4.27.0) - Investigate if count differs significantly - Trust the build script output, not a magic number This prevents checklist from becoming outdated if build script adds/removes artifacts. AI can adapt to changes rather than failing on incorrect validation. Philosophy: Define what good looks like (matches recent releases) rather than hardcoding specific numbers that will inevitably change.	2025-11-09 00:30:03 +00:00
rcourtman	89992625ae	Redesign release checklist as requirements, not commands Changed philosophy from 'follow these exact commands' to 'ensure these outcomes are true'. This allows AI to be intelligent about HOW to accomplish goals rather than blindly following steps. Key changes: - Focus on WHAT must be true, not HOW to make it true - Explain WHY each requirement matters - Document critical constraints (checksums.txt ordering, asset count) - Provide troubleshooting guidance instead of rigid procedures - Trust AI to figure out optimal execution path This approach ensures consistent, reliable releases while allowing flexibility in execution methods.	2025-11-09 00:28:04 +00:00
rcourtman	0271c1e78d	Fix release checklist to upload checksums.txt FIRST (fixes #671 race condition) User ZaDarkSide reported that checksums.txt was being uploaded last, causing update failures for users who check immediately after release. The auto-updater downloads checksums.txt first, but if it's not available yet, the update fails with 'no checksum file found'. Changed upload order to: 1. checksums.txt (FIRST - critical for auto-updates) 2. tarballs, zips, helm chart 3. install.sh 4. SHA256 files This prevents the race condition where fast users get update failures.	2025-11-09 00:25:42 +00:00
rcourtman	ed459a9ef4	Update release checklist template to include all artifact types Fixed gh release create commands to upload ALL artifacts: - .tar.gz (Linux/macOS tarballs) - .zip (Windows packages) - .tgz (Helm chart) - .sh (install script) - .txt (checksums) - .sha256 (all checksum files) Updated verification step to check for ~30 assets instead of 4. This fixes incomplete releases that were missing Windows packages, checksums, and install scripts.	2025-11-09 00:22:42 +00:00
rcourtman	5401a4f265	chore: bump version to v4.27.0	2025-11-08 23:48:53 +00:00
rcourtman	334b8c727f	Fix SMART temperature collection on smartctl 7.4+ (related to #672 ) Fixes two critical bugs in refresh_smart_cache() that prevented SMART temperature collection from working: 1. Invalid smartctl parameter: Changed -n standby,after to -n standby The 'after' parameter is not valid in smartctl 7.4 and causes: "INVALID ARGUMENT TO -n: standby,after" Valid syntax is standby[,STATUS[,STATUS2]] where STATUS must be numeric. 2. Broken process detection: Replaced exec -a with lock file approach The original exec -a pulse-sensor-wrapper-refresh bash line replaced the subshell with a new bash process that had no script to run, causing the function to exit immediately without collecting any SMART data. New approach uses a lock file ($CACHE_DIR/smart-refresh.lock) with trap-based cleanup to prevent concurrent refresh operations. Credits to @ZaDarkSide for identifying these issues in PR #672.	2025-11-08 23:40:43 +00:00
rcourtman	de10ec949e	Fix CRITICAL bug: UpdateProgressModal polling never started (fixes #671 ) ROOT CAUSE: The onMount hook checked props.isOpen, but onMount only runs ONCE when the component first mounts. Since UpdateProgressModal mounts when the app loads (before the user clicks "Apply Update"), props.isOpen is false at mount time, so polling never initializes. When the user later clicks "Apply Update" and props.isOpen becomes true, onMount doesn't re-run, leaving the modal in a broken state with no polling, no restart detection, and no auto-reload - exactly what users reported (stuck for 30+ mins). SOLUTION: Changed from onMount to createEffect watching props.isOpen. Now: - Polling starts immediately when the modal opens (user clicks "Apply Update") - Polling stops when the modal closes (cleanup) - The entire update flow works as designed This was the ACTUAL bug - the previous commits (global watcher, fallback polling) were helpful additions but didn't fix the root cause.	2025-11-08 23:26:55 +00:00
rcourtman	c004c4517f	Bulletproof the update auto-refresh with fallback mechanisms (related to #671 ) After the initial fix, added multiple layers of reliability to ensure updates ALWAYS auto-refresh, even in edge cases: 1. Fallback polling: GlobalUpdateProgressWatcher now polls /api/updates/status every 5 seconds as a safety net in case WebSocket events are dropped, missed, or the tab connects mid-update. This ensures tabs that join late or have WebSocket issues still detect in-progress updates. 2. Manual reload button: Added "Reload Now" button in UpdateProgressModal that appears after 5+ health check attempts during restart. Gives users an escape hatch if auto-reload is delayed (slow DNS, reverse proxy issues, etc.). 3. Already protected: Modal close button only shows when update is complete, preventing users from accidentally closing it mid-update. These changes address all failure modes identified: - Tabs without WebSocket: covered by polling fallback - Tabs joining mid-update: covered by polling fallback - Health check delays: covered by manual reload button - User accidentally closing modal: already prevented The combination of WebSocket events (primary), polling (fallback), health checks (restart detection), and manual reload (escape hatch) should make this bulletproof.	2025-11-08 23:19:51 +00:00
rcourtman	706822ed58	Fix updater auto-refresh for all open tabs (related to #671 ) Problem: When an update was triggered, only the tab that clicked "Apply Update" would show the progress modal and auto-refresh after completion. Other open tabs would remain on the old version indefinitely. Root cause: The UpdateProgressModal was only shown when explicitly opened via the UpdateBanner component. WebSocket already broadcasts update:progress events, but no global listener existed to show the modal in all tabs. Solution: Added GlobalUpdateProgressWatcher component in App.tsx that: - Listens to WebSocket updateProgress events globally (in all tabs) - Filters to only real update-in-progress states (downloading, verifying, extracting, installing, restarting) to avoid false positives from routine update checks - Auto-opens the progress modal when an update starts - Allows manual dismissal after update completes - Works independently of UpdateBanner visibility (e.g., when banner is dismissed) The modal's existing health-check and auto-reload logic handles the page refresh once the backend is healthy again.	2025-11-08 23:15:50 +00:00
rcourtman	6bf32f98d6	Fix storage/disk/backup disappearing for clusters with VerifySSL enabled Related to #670, #657 The fix in v4.26.5 (commit `59a97f2e3`) attempted to resolve storage disappearing by preferring hostnames over IPs when TLS hostname verification is required (VerifySSL=true and no fingerprint). However, that fix was ineffective because the cluster discovery code was populating BOTH the Host and IP fields with the IP address. Root Cause: In internal/api/config_handlers.go, the detectPVECluster function was setting: - endpoint.Host = schemePrefix + clusterNode.IP (when IP was available) - endpoint.IP = clusterNode.IP This meant both fields contained the same IP address. When the monitoring code tried to prefer endpoint.Host for TLS validation (internal/monitoring/monitor.go: 361-368), it was still getting an IP, causing certificate validation to fail with "certificate is valid for pve01.example.com, not 10.0.0.44". Solution: Separate the Host and IP fields properly during cluster discovery: - endpoint.Host = hostname (e.g., "https://pve01:8006") for TLS validation - endpoint.IP = IP address (e.g., "10.0.0.44") for DNS-free connections The existing logic in clusterEndpointEffectiveURL() can now correctly choose between them based on TLS requirements. Impact: Users with VerifySSL=true who upgraded to v4.26.1-v4.26.5 and lost storage visibility should now see storage, VM disks, and backups again after this fix.	2025-11-08 23:07:49 +00:00
rcourtman	a39beca464	Fix install.sh auto-update download timeout on slow DNS networks (related to #669 ) The 5-second connect timeout was too aggressive for DNS resolution in some Proxmox LXC environments, causing "Resolving timed out after 5000 milliseconds" errors when downloading the auto-update script from raw.githubusercontent.com. Changes: - Add download_auto_update_script() helper with retry logic - Increase connect timeout from 5s to 15s for slow DNS - Increase max time from 15s to 60s for complete transfer - Retry up to 3 times with incremental backoff (3s, 6s delays) - Gracefully degrade: installer continues without auto-updates if download fails - Users can re-run with --enable-auto-updates later when connectivity improves	2025-11-08 18:50:18 +00:00
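
The retry behaviour described in commit a39beca464 (up to 3 attempts, with 3s and 6s delays between them, and the installer continuing if every attempt fails) can be sketched roughly as follows. This is an illustrative Python sketch, not the code in install.sh: the function name download_with_retry, the fetch callable, and the injectable sleep parameter are all hypothetical, and the curl timeout settings from the commit are not modelled here.

```python
import time
from typing import Callable, Optional


def download_with_retry(
    fetch: Callable[[], bytes],
    attempts: int = 3,
    backoff_step: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call fetch() up to `attempts` times with incremental backoff.

    Hypothetical helper: waits backoff_step * attempt seconds after each
    failed attempt except the last (3s, then 6s with the defaults).
    Returns None when every attempt fails so the caller can carry on
    without the optional download.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError:
            # Network and DNS failures surface as OSError subclasses.
            if attempt < attempts:
                sleep(backoff_step * attempt)
    return None
```

Returning None instead of raising mirrors the "gracefully degrade" point in the commit message: the caller decides whether a missing auto-update script is fatal.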

4826 commits