Add asset availability check before updating demo server. The workflow now waits
up to 5 minutes for checksums.txt and the linux-amd64 tarball to be available
before attempting the update. This prevents the install script from failing when
the release is published before all assets finish uploading.
Resolves demo server downtime during releases.
This is the proper architectural fix for #685. The previous commit was a
band-aid that prevented unnecessary .env writes. This commit addresses the
root cause: a dual source of truth for API tokens (.env vs api_tokens.json).
Changes:
1. Startup Migration (config.go:896-951):
- When loading config, if API_TOKEN/API_TOKENS exist in .env but not in
api_tokens.json, automatically migrate them
- Migrated tokens are named "Migrated from .env (prefix)" for clarity
- Logs a deprecation warning: API_TOKEN/API_TOKENS in .env are deprecated
- Leaves .env untouched (safe for existing deployments)
2. Config Watcher Changes (watcher.go:338-424):
- Only load tokens from .env if api_tokens.json is EMPTY
- Once api_tokens.json has records, it becomes the authoritative source
- .env changes no longer trigger token overwrites when api_tokens.json exists
- Logs debug message when ignoring env tokens
Result:
- Existing deployments: env tokens automatically migrated to api_tokens.json
- UI-created tokens: never overwritten by .env changes
- Dark mode toggle: no longer triggers token reload from .env
- Backward compatible: fresh installs with API_TOKEN in .env still work
- Migration path: users can safely keep API_TOKEN in .env; it is ignored once api_tokens.json has records
Future improvement: Add UI warning when API_TOKEN/API_TOKENS still present
in .env, prompting users to rotate tokens via the UI.
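The startup migration can be sketched roughly as below. This is an illustration under assumptions, not the code in config.go: `TokenRecord`, `migrateEnvTokens`, and `prefix` are hypothetical names, tokens are compared by raw value for simplicity, and the 6-character prefix length is a guess.

```go
package main

// TokenRecord is a hypothetical stand-in for an entry in api_tokens.json.
type TokenRecord struct {
	Name  string
	Value string
}

// migrateEnvTokens appends every env token that is not already persisted and
// returns the merged list plus the number of tokens migrated. The persisted
// slice is copied, never modified in place.
func migrateEnvTokens(persisted []TokenRecord, envTokens []string) ([]TokenRecord, int) {
	known := make(map[string]bool, len(persisted))
	for _, r := range persisted {
		known[r.Value] = true
	}
	out := append([]TokenRecord(nil), persisted...)
	added := 0
	for _, tok := range envTokens {
		if tok == "" || known[tok] {
			continue // nothing to migrate, or already in api_tokens.json
		}
		known[tok] = true
		out = append(out, TokenRecord{
			Name:  "Migrated from .env (" + prefix(tok) + ")",
			Value: tok,
		})
		added++
	}
	return out, added
}

// prefix returns a short hint used in the record name (assumed length: 6).
func prefix(tok string) string {
	if len(tok) > 6 {
		return tok[:6]
	}
	return tok
}
```

Running the migration twice is harmless in this sketch: on the second pass every env token is already known, so nothing is added.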
Root cause: SaveSystemSettings calls updateEnvFile which rewrites .env on
any setting change, triggering the config watcher. The watcher sees API_TOKEN
in .env and replaces all UI-created tokens with "Environment token" records,
wiping out host-agent scoped tokens.
Fix: updateEnvFile now compares the new content with existing content and
skips the write if nothing changed. Since dark mode (and other UI settings)
are stored in system.json, not .env, toggling theme no longer triggers
unnecessary .env rewrites.
This prevents the config watcher from being triggered unnecessarily and
preserves UI-created API tokens when changing cosmetic settings.
Future improvement: Deprecate API_TOKEN/API_TOKENS from .env entirely and
make api_tokens.json the single source of truth (requires migration logic).
When disabling offline alerts for VMs/containers, the setting was being persisted
correctly and honored by the alert system, but the UI always showed "Warn" instead
of the actual saved state.
Root cause: When reconstructing the overrides list from backend config, the guest
override mapping was copying poweredOffSeverity but omitting disableConnectivity,
causing ResourceTable to fall back to global defaults.
Fix: Add disableConnectivity field to guest override reconstruction in Alerts.tsx
(line 676), matching the pattern already used for Docker containers.
Users were confused about how to access the bootstrap token in Proxmox
LXC containers. They were trying to use the Proxmox web console instead
of 'pct enter' from the Proxmox host.
This adds explicit instructions in the FirstRunSetup UI that show:
- pct enter <ctid> for interactive access
- pct exec <ctid> -- cat /etc/pulse/.bootstrap_token for direct retrieval
- Clear indication that commands should be run from Proxmox host
The instructions only display when the deployment is not Docker and the
bootstrap token path is /etc/pulse/.bootstrap_token (indicating LXC).
Fixes #681
Related to #681
The variable local_proxy_binary was declared with local scope inside
the BUILD_FROM_SOURCE conditional block but referenced outside of it
during cleanup. This caused "unbound variable" errors on release installs
since the script uses set -u.
Moved the declaration before the conditional block and initialized it to an
empty string. The cleanup code [[ -f "$local_proxy_binary" ]] already handles
the empty string case safely.
Updated FirstRunSetup to show generic container commands that work
across different orchestration platforms:
- Use <container-name> placeholder instead of hardcoded "pulse"
- Add kubectl exec example for Kubernetes/Helm deployments
- Clarify "From container host" applies to Docker, Podman, etc.
This ensures the instructions work for Docker Compose, Swarm, Helm,
and any other container orchestrator where the container might have
a different name.
The bootstrap-token CLI command had two bugs:
1. Used PULSE_DATA_PATH instead of PULSE_DATA_DIR (typo)
2. Used /var/lib/pulse as fallback instead of /etc/pulse
This caused the command to look in the wrong location for non-Docker
deployments. Fixed to match config.Load() logic:
- Check PULSE_DATA_DIR env var first
- Fall back to /data for Docker, /etc/pulse otherwise
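The lookup order above, as a small sketch. `resolveDataDir` and `bootstrapTokenPath` are hypothetical names; the environment lookup is injected (os.Getenv in practice) so the logic can be tested without touching the process environment.

```go
package main

import "path"

// resolveDataDir follows the order described above: an explicit
// PULSE_DATA_DIR wins, then /data for Docker, then /etc/pulse.
func resolveDataDir(getenv func(string) string, isDocker bool) string {
	if dir := getenv("PULSE_DATA_DIR"); dir != "" {
		return dir
	}
	if isDocker {
		return "/data"
	}
	return "/etc/pulse"
}

// bootstrapTokenPath returns the token file location inside the data dir.
func bootstrapTokenPath(dataDir string) string {
	return path.Join(dataDir, ".bootstrap_token")
}
```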
The first-run setup UI was displaying incorrect bootstrap token paths for
Docker deployments. It showed `/etc/pulse/.bootstrap_token` regardless of
deployment type, but Docker containers use `/data/.bootstrap_token` by
default (via PULSE_DATA_DIR env var).
Changes:
- Extended `/api/security/status` endpoint to include `bootstrapTokenPath`
and `isDocker` fields when a bootstrap token is active
- Updated FirstRunSetup component to fetch and display the correct path
dynamically based on actual deployment configuration
- For Docker deployments, UI now shows both `docker exec` command and
in-container command
- Falls back to showing both standard and Docker paths if API data
unavailable (backward compatibility)
This fix ensures users always see the correct command for their specific
deployment, including custom PULSE_DATA_DIR configurations.
Change sparkline wrapper from inline-block to block w-full to properly
fill flex parent container. Inline-block was preventing the canvas from
calculating the correct width when width={0} (auto-size mode).
Add comprehensive sparkline chart support as an alternative to progress bars
for CPU, Memory, and Disk metrics across all tables.
Features:
- Toggle between bars/trends view modes (persisted to localStorage)
- 30-second sampling with 2-hour retention window using ring buffer
- Canvas-based rendering with shared requestAnimationFrame for efficiency
- Hover tooltips showing exact values and timestamps
- Threshold reference lines (warning/critical) for context
- localStorage persistence survives page refreshes (12-hour max age)
- Dynamic width adaptation to column size
- Namespaced resource IDs prevent collisions
- Lifecycle cleanup prevents memory leaks
Performance optimizations:
- Decoupled sampling from WebSocket handler (6x reduction in recording)
- O(1) ring buffer insertions (no array cloning)
- Batched canvas rendering (single rAF for all sparklines)
- Debounced localStorage writes
- Automatic pruning of removed resources
UI improvements:
- Consistent radio toggle styling matching other filters
- Fixed column widths prevent layout shift during toggle
- Fixed row heights prevent vertical size changes
- Sparklines fill available column width proportionally
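For reference, the O(1) ring buffer idea looks roughly like this. The frontend is TypeScript; the sketch is in Go purely to illustrate the data structure, and `Ring`, `NewRing`, `Push`, and `Values` are hypothetical names. At 30-second sampling, a 2-hour window needs 7200 / 30 = 240 slots.

```go
package main

// Ring is a fixed-capacity buffer that overwrites its oldest sample when full.
type Ring struct {
	buf   []float64
	start int // index of the oldest sample
	size  int // number of valid samples
}

// NewRing allocates a ring with the given capacity (240 for a 2-hour window
// sampled every 30 seconds).
func NewRing(capacity int) *Ring {
	return &Ring{buf: make([]float64, capacity)}
}

// Push stores v in O(1), evicting the oldest sample once the buffer is full.
// No slice is cloned or shifted.
func (r *Ring) Push(v float64) {
	if len(r.buf) == 0 {
		return
	}
	end := (r.start + r.size) % len(r.buf)
	r.buf[end] = v
	if r.size < len(r.buf) {
		r.size++
	} else {
		r.start = (r.start + 1) % len(r.buf)
	}
}

// Values returns the samples ordered from oldest to newest.
func (r *Ring) Values() []float64 {
	out := make([]float64, r.size)
	for i := range out {
		out[i] = r.buf[(r.start+i)%len(r.buf)]
	}
	return out
}
```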
Users upgrading from v4.25 (where DISABLE_AUTH actually disabled auth) to
v4.27.1 (where DISABLE_AUTH is ignored but triggers a deprecation warning)
were stuck in a catch-22:
- They had no credentials (old version had auth disabled)
- DISABLE_AUTH detection incorrectly required authentication
- Setup wizard returned 401, preventing first credential creation
- Could not complete setup to create credentials and remove flag
Root cause: When DISABLE_AUTH was detected, the code set forceRequested=true
which triggered the authentication requirement even when authConfigured=false.
Fix: Only require authentication when credentials actually exist. When no
auth is configured, allow the bootstrap token flow regardless of whether
DISABLE_AUTH is detected.
This lets users upgrade from legacy DISABLE_AUTH deployments by using the
bootstrap token to create their first credentials, then removing the flag.
The previous fix (a1ba915ca) correctly added customDisplayName to the WebSocket
payload and made it persist in Settings, but the main Docker tab's RESOURCE
column still showed the default name.
DockerUnifiedTable had four locations that built display names but ignored
customDisplayName:
- DockerHostGroupHeader (RESOURCE column header) - line 549
- containerMatchesToken (search/filter logic) - line 391
- serviceMatchesToken (search/filter logic) - line 472
- sortedHosts (host sorting logic) - lines 1879-1880
All four now prioritize customDisplayName first, matching the pattern used in
DockerHostSummaryTable and Settings (customDisplayName || displayName ||
hostname || id).
This ensures custom Docker host names display consistently across the entire UI.
This commit resolves the recurring temperature monitoring failures that have plagued multiple releases:
1. **Fix user mismatch (v4.27.1 regression)**:
- Changed binary default user from 'pulse-sensor' to 'pulse-sensor-proxy'
- Aligns with the user created by install-sensor-proxy.sh (line 389)
- Prevents panic when binary is run outside systemd context
- Systemd unit already uses User=pulse-sensor-proxy, so this makes manual runs work too
2. **Fix standalone node validation (v4.25.0+ regression)**:
- pvecm status exits with code 2 on standalone nodes (not in a cluster)
- This caused validation to fail, rejecting all temperature requests
- Added discoverLocalHostAddresses() helper that discovers actual host IPs/hostnames
- On standalone nodes, cluster membership list is populated with host's own addresses
- Maintains SSRF protection while allowing standalone operation
- Added comprehensive test coverage
3. **Make installer fail loudly on proxy setup failure**:
- Previously, failed proxy installation only printed a warning
- Install script then claimed "Pulse installation complete!" (confusing for users)
- Now exits with clear error message and remediation steps
- Forces operators to fix proxy issues before claiming success
- Users who skip temperature monitoring are unaffected
4. **Add test coverage to prevent future regressions**:
- Added TestDiscoverLocalHostAddresses to verify local address discovery
- Validates no loopback or link-local addresses are returned
- All existing tests pass with new changes
Pattern of failures across releases:
- v4.23.0: Missing proxy binaries in release
- v4.24.0-rc.3: AMD CPU sensor naming (Tctl vs Tdie)
- v4.25.0: Single-node pvecm status exit code
- v4.27.1: User mismatch (pulse-sensor vs pulse-sensor-proxy)
This comprehensive fix addresses the root causes rather than applying another tactical patch.
Related to #571
The diagnostic code was warning ALL deployments using /run/pulse-sensor-proxy
socket path to "remove and re-add" their configuration to use /mnt/pulse-proxy
instead. This was incorrect for Docker deployments where /run is the correct
and documented mount path (see docker-compose.yml line 15).
The warning was only meant for LXC containers where the managed mount at
/mnt/pulse-proxy is preferred over a legacy hand-crafted /run mount.
Fix: Only show the warning in non-Docker environments (check PULSE_DOCKER env).
Docker deployments correctly use /run/pulse-sensor-proxy per compose file.
Impact: Docker users were seeing confusing diagnostic warnings telling them
to reconfigure a correct setup.
This fixes three issues in install-sensor-proxy.sh, the most serious of which
prevented temperature monitoring from working on LXC deployments after the
installer ran:
1. CRITICAL: Pulse service not restarted after systemd override
- The installer wrote PULSE_SENSOR_PROXY_SOCKET env var to systemd
drop-in and ran daemon-reload, but never restarted Pulse service
- Running Pulse instances continued using old environment variables
- Temperatures wouldn't work until manual Pulse restart
- Now: Automatically restart Pulse if running after writing override
2. Added guard to check if Pulse service exists before configuring
- Installer would write systemd override even if Pulse not installed
- Left orphaned drop-in files that confused users
- Now: Check if pulse.service exists, warn and skip if not found
3. MINOR: Fix inconsistent Docker mount instructions
- docker-compose.yml showed :ro (read-only) mount
- Installer output showed :rw (read-write) mount
- Changed installer to match compose file (:ro is correct and secure)
Impact: Users in #600 reported "socketFound=false" even after running
installer successfully. This was because Pulse never picked up the new
socket path without a restart.
Implements comprehensive mdadm RAID array monitoring for Linux hosts
via pulse-host-agent. Arrays are automatically detected and monitored
with real-time status updates, rebuild progress tracking, and automatic
alerting for degraded or failed arrays.
Key changes:
**Backend:**
- Add mdadm package for parsing mdadm --detail output
- Extend host agent report structure with RAID array data
- Integrate mdadm collection into host agent (Linux-only, best-effort)
- Add RAID array processing in monitoring system
- Implement automatic alerting:
- Critical alerts for degraded arrays or arrays with failed devices
- Warning alerts for rebuilding/resyncing arrays with progress tracking
- Auto-clear alerts when arrays return to healthy state
**Frontend:**
- Add TypeScript types for RAID arrays and devices
- Display RAID arrays in host details drawer with:
- Array status (clean/degraded/recovering) with color-coded indicators
- Device counts (active/total/failed/spare)
- Rebuild progress percentage and speed when applicable
- Green for healthy, amber for rebuilding, red for degraded
**Documentation:**
- Document mdadm monitoring feature in HOST_AGENT.md
- Explain requirements (Linux, mdadm installed, root access)
- Clarify scope (software RAID only, hardware RAID not supported)
**Testing:**
- Add comprehensive tests for mdadm output parsing
- Test parsing of healthy, degraded, and rebuilding arrays
- Verify proper extraction of device states and rebuild progress
All builds pass successfully. RAID monitoring is automatic and best-effort:
if mdadm is not installed or no arrays exist, the host agent continues
reporting other metrics normally.
Related to #676
Adds build support for 32-bit Windows (windows-386) for pulse-host-agent.
Changes:
- Add windows-386 build to Dockerfile host-agent build section
- Add windows-386 binary copy and symlink to Dockerfile
- Add windows-386 build to build-release.sh
- Add windows-386 zip package to release artifacts
- Include windows-386 binary in standalone binary copies
This enables pulse-host-agent to run on 32-bit Windows systems, which are still relevant in legacy/industrial monitoring environments through late 2025.
Document the new webhook security feature that allows homelab users to configure
trusted private IP ranges for webhook targets.
Includes:
- Overview of default security behavior
- Step-by-step configuration instructions
- Security considerations and best practices
- Example CIDR configurations
- Troubleshooting guidance for common error messages
Related to #673
Adds build support for 32-bit x86 (i386/i686) and ARMv6 (older Raspberry Pi models) architectures across all agents and install scripts.
Changes:
- Add linux-386 and linux-armv6 to build-release.sh builds array
- Update Dockerfile to build docker-agent, host-agent, and sensor-proxy for new architectures
- Update all install scripts to detect and handle i386/i686 and armv6l architectures
- Add architecture normalization in router download endpoints
- Update update manager architecture mapping
- Update validate-release.sh to expect 24 binaries (was 18)
This enables Pulse agents to run on older/legacy hardware including 32-bit x86 systems and Raspberry Pi Zero/Zero W devices.
Allow homelab users to send webhooks to internal services while maintaining security defaults.
Changes:
- Add webhookAllowedPrivateCIDRs field to SystemSettings (persistent config)
- Implement CIDR parsing and validation in NotificationManager
- Convert ValidateWebhookURL to instance method to access allowlist
- Add UI controls in System Settings for configuring trusted CIDR ranges
- Maintain strict security by default (block all private IPs)
- Keep localhost, link-local, and cloud metadata services blocked regardless of allowlist
- Re-validate on both config save and webhook delivery (DNS rebinding protection)
- Add comprehensive tests for CIDR parsing and IP matching
Backend:
- UpdateAllowedPrivateCIDRs() parses comma-separated CIDRs with validation
- Support for bare IPs (auto-converts to /32 or /128)
- Thread-safe allowlist updates with RWMutex
- Logging when allowlist is updated or used
- Validation errors prevent invalid CIDRs from being saved
Frontend:
- New "Webhook Security" section in System Settings
- Input field with examples and helpful placeholder text
- Real-time unsaved changes tracking
- Loads and saves allowlist via system settings API
Security:
- Default behavior unchanged (all private IPs blocked)
- Explicit opt-in required via configuration
- Localhost (127/8) always blocked
- Link-local (169.254/16) always blocked
- Cloud metadata services always blocked
- DNS resolution checked at both save and send time
Testing:
- Tests for CIDR parsing (valid/invalid inputs)
- Tests for IP allowlist matching
- Tests for bare IP address handling
- Tests for security boundaries (localhost, link-local remain blocked)
Related to #673
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
User ZaDarkSide reported that when updates fail, the UI shows a loading
spinner indefinitely with no feedback about what went wrong. Users had to
check backend logs to understand failures like "checksum verification failed".
The infrastructure was already in place:
- UpdateStatus struct had an Error field
- Frontend already renders error details when present
- But updateStatus() never populated the Error field
Changes:
- Modified updateStatus() to accept optional error parameter
- Added sanitizeError() to cap error message length (500 chars max)
- Updated all error cases in ApplyUpdate() to pass error details:
- Temp directory creation failures
- Download failures
- Checksum verification failures (most common user complaint)
- Extraction failures
- Backup creation failures
- Apply update failures
- Also updated CheckForUpdates() error cases
Now when updates fail, users immediately see the error message in the UI's
red error panel instead of being stuck on a loading spinner.
Security: Errors are only shown to authenticated admin users with update
permissions. Error messages are capped at 500 chars to prevent extremely
long output. Current error messages don't contain sensitive data (mainly
HTTP status codes, file paths, checksum mismatches).
Updated template to a hybrid format combining the best of v4.27.0 and v4.25.0:
Benefits from detailed format (v4.25.0):
- 4 complete installation methods (Quick/Docker/Binary/Helm)
- Copy-pasteable commands for each method
- Explicit Downloads section listing what's available
- Better for new users and SEO
Benefits from simple format (v4.27.0):
- Consistent section ordering
- Clean, scannable structure
- Breaking Changes section always present
Change descriptions now require context and user impact, not just
one-liners. This helps users understand whether a change affects them without
clicking through to issues.
Based on a Codex analysis finding that the detailed format better serves more
user types: new users, quick upgrades, search indexing, and professional appearance.
Removed hardcoded '31 assets' requirement. Instead, checklist now says:
- Compare with recent successful releases (v4.26.5, v4.27.0)
- Investigate if count differs significantly
- Trust the build script output, not a magic number
This prevents the checklist from becoming outdated if the build script adds or
removes artifacts. AI can adapt to changes rather than failing on incorrect validation.
Philosophy: Define what good looks like (matches recent releases) rather than
hardcoding specific numbers that will inevitably change.
Changed philosophy from 'follow these exact commands' to 'ensure these
outcomes are true'. This allows AI to be intelligent about HOW to
accomplish goals rather than blindly following steps.
Key changes:
- Focus on WHAT must be true, not HOW to make it true
- Explain WHY each requirement matters
- Document critical constraints (checksums.txt ordering, asset count)
- Provide troubleshooting guidance instead of rigid procedures
- Trust AI to figure out optimal execution path
This approach ensures consistent, reliable releases while allowing
flexibility in execution methods.
User ZaDarkSide reported that checksums.txt was being uploaded last,
causing update failures for users who check immediately after release.
The auto-updater downloads checksums.txt first, but if it's not available
yet, the update fails with 'no checksum file found'.
Changed upload order to:
1. checksums.txt (FIRST - critical for auto-updates)
2. tarballs, zips, helm chart
3. install.sh
4. SHA256 files
This prevents the race condition where fast users get update failures.
Fixes two critical bugs in refresh_smart_cache() that prevented SMART
temperature collection from working:
1. Invalid smartctl parameter: Changed -n standby,after to -n standby
The 'after' parameter is not valid in smartctl 7.4 and causes:
"INVALID ARGUMENT TO -n: standby,after"
Valid syntax is standby[,STATUS[,STATUS2]] where STATUS must be numeric.
2. Broken process detection: Replaced exec -a with lock file approach
The original exec -a pulse-sensor-wrapper-refresh bash line replaced
the subshell with a new bash process that had no script to run, causing
the function to exit immediately without collecting any SMART data.
New approach uses a lock file ($CACHE_DIR/smart-refresh.lock) with
trap-based cleanup to prevent concurrent refresh operations.
Credits to @ZaDarkSide for identifying these issues in PR #672.
ROOT CAUSE: The onMount hook checked props.isOpen, but onMount only runs ONCE
when the component first mounts. Since UpdateProgressModal mounts when the app
loads (before the user clicks "Apply Update"), props.isOpen is false at mount
time, so polling never initializes.
When the user later clicks "Apply Update" and props.isOpen becomes true, onMount
doesn't re-run, leaving the modal in a broken state with no polling, no restart
detection, and no auto-reload - exactly what users reported (stuck for 30+ mins).
SOLUTION: Changed from onMount to createEffect watching props.isOpen. Now:
- Polling starts immediately when the modal opens (user clicks "Apply Update")
- Polling stops when the modal closes (cleanup)
- The entire update flow works as designed
This was the ACTUAL bug - the previous commits (global watcher, fallback polling)
were helpful additions but didn't fix the root cause.
After the initial fix, added multiple layers of reliability to ensure updates
ALWAYS auto-refresh, even in edge cases:
1. Fallback polling: GlobalUpdateProgressWatcher now polls /api/updates/status
every 5 seconds as a safety net in case WebSocket events are dropped, missed,
or the tab connects mid-update. This ensures tabs that join late or have
WebSocket issues still detect in-progress updates.
2. Manual reload button: Added "Reload Now" button in UpdateProgressModal that
appears after 5+ health check attempts during restart. Gives users an escape
hatch if auto-reload is delayed (slow DNS, reverse proxy issues, etc.).
3. Already protected: Modal close button only shows when update is complete,
preventing users from accidentally closing it mid-update.
These changes address all failure modes identified:
- Tabs without WebSocket: covered by polling fallback
- Tabs joining mid-update: covered by polling fallback
- Health check delays: covered by manual reload button
- User accidentally closing modal: already prevented
The combination of WebSocket events (primary), polling (fallback), health checks
(restart detection), and manual reload (escape hatch) should make this bulletproof.
Problem: When an update was triggered, only the tab that clicked "Apply Update"
would show the progress modal and auto-refresh after completion. Other open tabs
would remain on the old version indefinitely.
Root cause: The UpdateProgressModal was only shown when explicitly opened via the
UpdateBanner component. WebSocket already broadcasts update:progress events, but
no global listener existed to show the modal in all tabs.
Solution: Added GlobalUpdateProgressWatcher component in App.tsx that:
- Listens to WebSocket updateProgress events globally (in all tabs)
- Filters to only real update-in-progress states (downloading, verifying, extracting,
installing, restarting) to avoid false positives from routine update checks
- Auto-opens the progress modal when an update starts
- Allows manual dismissal after update completes
- Works independently of UpdateBanner visibility (e.g., when banner is dismissed)
The modal's existing health-check and auto-reload logic handles the page refresh
once the backend is healthy again.
Related to #670, #657
The fix in v4.26.5 (commit 59a97f2e3) attempted to resolve storage disappearing
by preferring hostnames over IPs when TLS hostname verification is required
(VerifySSL=true and no fingerprint). However, that fix was ineffective because
the cluster discovery code was populating BOTH the Host and IP fields with the
IP address.
**Root Cause:**
In internal/api/config_handlers.go, the detectPVECluster function was setting:
- endpoint.Host = schemePrefix + clusterNode.IP (when IP was available)
- endpoint.IP = clusterNode.IP
This meant both fields contained the same IP address. When the monitoring code
tried to prefer endpoint.Host for TLS validation
(internal/monitoring/monitor.go:361-368), it was still getting an IP, causing
certificate validation to fail with "certificate is valid for pve01.example.com,
not 10.0.0.44".
**Solution:**
Separate the Host and IP fields properly during cluster discovery:
- endpoint.Host = hostname (e.g., "https://pve01:8006") for TLS validation
- endpoint.IP = IP address (e.g., "10.0.0.44") for DNS-free connections
The existing logic in clusterEndpointEffectiveURL() can now correctly choose
between them based on TLS requirements.
**Impact:**
Users with VerifySSL=true who upgraded to v4.26.1-v4.26.5 and lost storage
visibility should now see storage, VM disks, and backups again after this fix.
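How the two fields are meant to be chosen between, as a sketch. This is not the body of clusterEndpointEffectiveURL(); `ClusterEndpoint` and `effectiveURL` are hypothetical, and port 8006 is taken from the example above and hardcoded here as an assumption.

```go
package main

// ClusterEndpoint is a hypothetical, trimmed-down view of a discovered node.
type ClusterEndpoint struct {
	Host string // hostname URL, e.g. "https://pve01:8006"
	IP   string // bare address, e.g. "10.0.0.44"
}

// effectiveURL picks the address to dial. When the certificate must be
// validated by hostname (VerifySSL on, no pinned fingerprint) the hostname
// is preferred; otherwise the IP avoids a DNS lookup.
func effectiveURL(ep ClusterEndpoint, verifySSL bool, fingerprint string) string {
	needsHostname := verifySSL && fingerprint == ""
	if needsHostname && ep.Host != "" {
		return ep.Host
	}
	if ep.IP != "" {
		return "https://" + ep.IP + ":8006" // assumed port, see lead-in
	}
	return ep.Host
}
```

With both fields holding the IP, as before this fix, the first branch could never yield a hostname, which is why the earlier change had no effect.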
The 5-second connect timeout was too aggressive for DNS resolution in some
Proxmox LXC environments, causing "Resolving timed out after 5000 milliseconds"
errors when downloading the auto-update script from raw.githubusercontent.com.
Changes:
- Add download_auto_update_script() helper with retry logic
- Increase connect timeout from 5s to 15s for slow DNS
- Increase max time from 15s to 60s for complete transfer
- Retry up to 3 times with incremental backoff (3s, 6s delays)
- Gracefully degrade: installer continues without auto-updates if download fails
- Users can re-run with --enable-auto-updates later when connectivity improves