Commit graph

69 commits

Author SHA1 Message Date
rcourtman
0fe196b5b1 Preflight disk space before Pulse updates
Some checks failed
Build and Test / Secret Scan (push) Waiting to run
Build and Test / Frontend & Backend (push) Waiting to run
Core E2E Tests / Playwright Core E2E (push) Has been cancelled
Update Integration Tests / Update Flow Integration Tests (push) Has been cancelled
2026-04-15 20:57:46 +01:00
rcourtman
324f3be1c8 Move v5 maintenance flow onto release/5.1 2026-04-14 19:22:12 +01:00
rcourtman
efb840deae Fix installer universal bundle fallback 2026-04-13 14:13:11 +01:00
rcourtman
2ad288c091 Fix streamed installer entrypoint 2026-04-12 22:30:58 +01:00
rcourtman
a2ecd4198a Validate local archive architecture 2026-04-05 23:35:54 +01:00
rcourtman
690cc94e17 Support local archives for Proxmox installs 2026-04-05 23:35:54 +01:00
rcourtman
0f2982ce3f Fix auto-update leaving Pulse stopped in LXC installs (#1323) 2026-03-25 23:49:25 +00:00
rcourtman
572520ebc6 Promote guest-agent /proc/meminfo fallback for accurate VM memory (#1270)
Move the guest-agent file-read of /proc/meminfo earlier in the memory
fallback chain so it runs before RRD, giving real-time MemAvailable that
correctly excludes reclaimable buff/cache on Linux VMs. Also add
VM.GuestAgent.FileRead permission for PVE 9 and fix install.sh to use
comma-separated privilege strings.
2026-03-09 10:04:28 +00:00
rcourtman
c0b3a0e665 Restart Pulse service after failed auto-update (#1323)
The auto-update flow stops the Pulse service before applying updates.
If the update fails, the rollback path restored files but never
restarted the service. Since the main unit was explicitly stopped
(not crashed), systemd's Restart=always didn't rescue it.

Add restart-on-failure guards to both pulse-auto-update.sh and
install.sh so Pulse is always restarted after a failed update attempt.
2026-03-07 10:46:19 +00:00
rcourtman
c7e0aa63d2 fix(installer): make proxmox auto-register token rotation deterministic 2026-03-03 19:50:33 +00:00
rcourtman
f5365809b3 fix(installer): remove config backup that filled disk on upgrades
The backup_existing function copied the entire config directory
(including metrics.db at ~2.5GB) on every upgrade with no cleanup.
On small VMs this filled the disk within a few releases.

The upgrade only swaps the binary; config files are not modified,
so the backup served no practical purpose.
2026-03-01 23:20:08 +00:00
rcourtman
54a1ace2c5 fix(installer): remove stale sensor-proxy mount entries that prevent LXC start after reboot (#1280)
The v4 installer added mount entries for /run/pulse-sensor-proxy to LXC
container configs. After upgrading to v5 and rebooting, /run (tmpfs) is
wiped and the container fails to start. The installer now detects and
removes these stale mp<N> and lxc.mount.entry references automatically
when run on a PVE host, and the upgrade docs include manual fix steps.
2026-02-22 10:52:12 +00:00
rcourtman
8ffec7c124 Support: Fix audit log permissions and improve UI empty state
- Updated server installer to ensure correct permissions for audit directory
- Added descriptive empty state and filter clearing to AuditLogPanel
2026-01-22 16:41:18 +00:00
rcourtman
098f645b34 chore: update Docker configs and installer
- Dockerfile: remove sensor proxy build target
- docker-compose.yml: remove proxy service configuration
- install.sh: simplify installer without proxy
- updates/manager.go: minor updates
2026-01-21 12:03:52 +00:00
rcourtman
724362504e fix: Add SELinux context restoration for Fedora/RHEL systems. Related to #996
On SELinux-enforcing systems (Fedora, RHEL, CentOS), binaries installed to
non-standard locations need proper security contexts for systemd to execute
them. Without this, systemd fails with 'Permission denied' even when the
binary has correct Unix permissions.

Changes:
- Add restore_selinux_contexts() function to both install scripts
- Uses restorecon (preferred) or chcon (fallback) to set bin_t context
- Only runs when SELinux is detected and enforcing
- Called after binary installation, before systemd service start
2025-12-31 23:12:53 +00:00
rcourtman
5cd6224997 fix(install): align bootstrap token path and data directory config (fixes #962) 2025-12-30 00:28:16 +00:00
rcourtman
cba4418be3 fix: Remove Alpine from LXC template options
Alpine uses apk/OpenRC instead of apt/systemd, which the Pulse
LXC installation flow requires. This prevents failed installations.

- Remove Alpine download option from advanced mode
- Add note that Pulse requires Debian-based templates
- Add validation when user selects from template list to catch
  Alpine/Gentoo/Arch/Void and fall back to Debian 12 with warning

Related to #915
2025-12-25 23:58:24 +00:00
rcourtman
b562580764 fix: Skip deprecated pulse-sensor-proxy for v5+ installations
The unified agent now handles temperature monitoring in v5+, making
pulse-sensor-proxy unnecessary. This commit:

1. Adds INSTALLER_MAJOR_VERSION constant to declare bundled version
2. Skips 'Temperature Monitoring Setup' prompts for v5+ installs
3. Skips sensor proxy installation entirely for v5+
4. Updates help text to mark --proxy as deprecated for v5+
5. Removes outdated sensor proxy instructions from completion message

Fixes the 'pct pull TASK ERROR: failed to open /opt/pulse/bin/pulse-sensor-proxy-linux-amd64'
error reported by users installing v5.0.0-rc.3.

Reported-by: RLSinRFV (GitHub Discussion #845)
2025-12-17 12:20:03 +00:00
rcourtman
9e847e2a02 fix(install): add retry logic and better error output for script download
Addresses #827

- Added 3-retry logic with 2-second delays between attempts
- Increased timeout from 15s to 30s for slower connections
- Show actual curl error instead of suppressing stderr
- Provide workaround instructions (download manually then run)
- Show the URL being downloaded for easier debugging
2025-12-14 21:38:59 +00:00
rcourtman
c6167f08e6 fix: deploy unified agent scripts and binaries during installation
The installer was missing:
1. copy_unified_agent_binaries_from_dir() to extract pulse-agent-* binaries
   from the release tarball to /opt/pulse/bin/
2. install.sh and install.ps1 in the deploy_agent_scripts() array

This caused /install.sh and /download/pulse-agent to return 404 on fresh
installations and upgrades from pre-4.33.0 versions.

Related to #760, #751
2025-11-26 20:35:52 +00:00
rcourtman
f3e85a7455 fix: remove references to deleted install-host-agent.sh script
The unified agent system replaced install-host-agent.sh with install.sh.
This commit updates all references:
- Dockerfile: removed COPY for deleted script
- router.go: serve install.sh at /install-host-agent.sh endpoint (backwards compatible)
- build-release.sh: removed copy of deleted script
- validate-release.sh: removed validation of deleted script
- install.sh: updated script list for bare-metal installs
2025-11-26 09:57:06 +00:00
rcourtman
6853a0ffd1 feat: serve install scripts from GitHub releases instead of main branch
Scripts like install.sh and install-sensor-proxy.sh are now attached
as release assets and downloaded from releases/latest/download/ URLs.
This ensures users always get scripts compatible with their installed
version, even while development continues on main.

Changes:
- build-release.sh: copy install scripts to release directory
- create-release.yml: upload scripts as release assets
- Updated all documentation and code references to use release URLs
- Scripts reference each other via release URLs for consistency
2025-11-26 08:59:59 +00:00
rcourtman
d0d7a3dcbd Fix mp mount detection pattern for pulse-sensor-proxy
The grep pattern was looking for 'pulse-sensor-proxy' as a standalone
string, but the actual mount line contains paths like:
  mp0: /run/pulse-sensor-proxy,mp=/mnt/pulse-proxy,replicate=0

This caused the removal logic to never execute, leaving the old mp
mount in place and preventing the migration to lxc.mount.entry format.

Changed pattern to match either path component:
- /pulse-sensor-proxy (source path)
- /mnt/pulse-proxy (mount point)

Also removed space after colon in pattern to match actual format.

This completes the fix for temperature proxy setup on LXC containers.
2025-11-22 22:34:26 +00:00
rcourtman
3858397f76 Fix LXC config modification for Proxmox pmxcfs filesystem
The /etc/pve/ directory is a clustered FUSE filesystem (pmxcfs) managed
by Proxmox. Direct modifications using sed -i or echo >> don't work
reliably on this filesystem, and LXC config files contain snapshot
sections that must be preserved.

Changes:
- Use temp file approach: copy config, modify temp, copy back to trigger sync
- Only modify main config section (before first [snapshot] marker)
- Properly handle both mp mount removal and lxc.mount.entry addition
- Apply fix to both install.sh and install-sensor-proxy.sh

This fixes temperature proxy setup failures where the socket mount
entry wasn't being persisted to the container configuration.

Related to #628
2025-11-22 22:19:00 +00:00
rcourtman
3b85436c0f Related to #738: make pulse proxy mount migration-safe 2025-11-21 21:29:14 +00:00
rcourtman
28c0d3d39c Harden release validation for host agent downloads (related to #735) 2025-11-21 10:47:53 +00:00
rcourtman
a03f8115b6 Improve installer temperature proxy and backup polling 2025-11-18 18:42:33 +00:00
rcourtman
50f8b76921 Fix auto-registration token parsing and hostname 2025-11-18 09:10:03 +00:00
rcourtman
13daa61d1d Harden turnkey install and proxy auto-registration 2025-11-18 00:24:50 +00:00
rcourtman
fb1e44300b Refresh container sensor proxy installer 2025-11-17 22:41:17 +00:00
rcourtman
3fe6b4fe9b Improve temp proxy install UX 2025-11-17 22:30:32 +00:00
rcourtman
8cc725e8e0 Fix proxy install summary and allowed_nodes cleanup 2025-11-17 14:38:01 +00:00
rcourtman
f9341ae1fc Improve temperature proxy workflow 2025-11-17 14:25:46 +00:00
rcourtman
0d70063642 Fix proxy installer dedupe 2025-11-15 22:04:36 +00:00
rcourtman
47d5c14aef Improve temperature proxy control-plane flow 2025-11-15 21:49:51 +00:00
rcourtman
e69572d6f0 Ensure installer resets config ownership 2025-11-15 18:00:49 +00:00
rcourtman
07d69684ac Fix install channel detection when unset 2025-11-15 17:11:23 +00:00
rcourtman
6a5b8d698b Add critical safety guards to temperature proxy installation
After implementing the health gate, added comprehensive safety measures
to prevent the health checks themselves from becoming a new failure point.

**Problem**: Previous commit added strict health checks but could fail in
edge cases:
- `pct exec` could hang if container stopped/frozen → installer deadlocks
- systemctl/journalctl might not be available → diagnostics fail
- Container access check could fail for transient reasons
- pvecm error detection was fragile (string matching specific messages)

**Solutions Implemented**:

1. **Timeouts on All External Commands** (install.sh:1596,1618)
   - `timeout 5` on systemctl checks
   - `timeout 10` on pct exec checks
   - Prevents installer from hanging indefinitely

2. **Graceful Degradation** (install.sh:1602-1630)
   - Check for systemctl/pct availability before using
   - Warn if tools missing instead of failing
   - Container check is warning-only (may be transient)
   - Only fail on critical checks: service running, socket exists

3. **Bypass Flag Support** (install.sh:1589-1594)
   - Set `PULSE_SKIP_HEALTH_CHECKS=1` to bypass all checks
   - Documented in error messages for troubleshooting
   - Allows installation in unsupported environments

4. **Flexible Diagnostics** (install.sh:1640-1647)
   - Use journalctl if available, fallback to syslog
   - Conditional tool-specific advice

5. **Broader Error Detection** (ssh.go:582-628)
   - List of 14 standalone indicators (vs 5 hardcoded checks)
   - Case-insensitive matching for localization tolerance
   - Permissive strategy: treat any known pattern as standalone
   - Handles variations: "no cluster", "IPC", "connection refused", etc.

6. **Enhanced Test Coverage** (ssh_test.go:+35 lines)
   - Added 3 new test cases (variation patterns)
   - Tests now cover 8 standalone scenarios + 3 negative cases
   - All tests pass (11/11)

**Impact**:
- Health gate won't block installation in edge cases
- Better user experience on non-standard setups
- Standalone detection handles more error message variations
- Clear escape hatch for troubleshooting (bypass flag)

**Confidence Level**: High
- All tests pass (bash syntax + Go unit tests)
- Graceful fallbacks for every external command
- Only critical checks are hard failures
- Warnings guide users through validation issues

Related to #571
2025-11-13 10:26:46 +00:00
rcourtman
d3875eaae5 Dramatically improve temperature proxy installation robustness
Users were abandoning Pulse due to catastrophic temperature monitoring setup failures. This commit addresses the root causes:

**Problem 1: Silent Failures**
- Installations reported "SUCCESS" even when proxy never started
- UI showed green checkmarks with no temperature data
- Zero feedback when things went wrong

**Problem 2: Missing Diagnostics**
- Service failures logged only in journald
- Users saw "Something going on with the proxy" with no actionable guidance
- No way to troubleshoot from error messages

**Problem 3: Standalone Node Issues**
- Proxy daemon logged continuous pvecm errors as warnings
- "ipcc_send_rec" and "Unknown error -1" messages confused users
- These are expected for non-clustered/LXC setups

**Solutions Implemented:**

1. **Health Gate in install.sh (lines 1588-1629)**
   - Verify service is running after installation
   - Check socket exists on host
   - Confirm socket visible inside container via bind mount
   - Fail loudly with specific diagnostics if any check fails

2. **Actionable Error Messages in install-sensor-proxy.sh (lines 822-877)**
   - When service fails to start: dump full systemctl status + 40 lines of logs
   - When socket missing: show permissions, service status, and remediation command
   - Include common issues checklist (missing user, permission errors, lm-sensors, etc.)
   - Direct link to troubleshooting docs

3. **Better Standalone Node Detection in ssh.go (lines 585-595)**
   - Recognize "Unknown error -1" and "Unable to load access control list" as LXC indicators
   - Log at INFO level (not WARN) since this is expected behavior
   - Clarify message: "using localhost for temperature collection"

**Impact:**
- Eliminates "green checkmark but no temps" scenario
- Users get immediate actionable feedback on failures
- Standalone/LXC installations work silently without error spam
- Reduces support burden from #571 (15+ comments of user frustration)

Related to #571
2025-11-13 10:14:19 +00:00
rcourtman
6979f3f57a Related to discussion #702: harden uninstall cleanup 2025-11-12 19:46:45 +00:00
rcourtman
4a0637957c Related to #698: harden installer release detection 2025-11-12 17:56:16 +00:00
rcourtman
0415c15d31 Fix install.sh malformed download URL when latest release is draft (related to #669)
The installer was constructing malformed download URLs like:
  https://github.com/.../download/location: https://github.com/.../pulse-location: ...

This occurred when the latest GitHub release is a draft:
1. /releases/latest API returns nothing (drafts don't count as "latest")
2. Fallback redirect scraper gets "location: .../releases" (no /tag/)
3. sed regex fails to match but echoes the entire header line
4. That malformed string becomes LATEST_RELEASE, breaking the download URL

Fixed by:
1. Switch both stable and RC channels to use /releases endpoint
2. Filter JSON to get first non-draft (and non-prerelease for stable)
3. Harden redirect scraper to only match when /tag/ is actually present
4. Fall through to v4.5.1 hardcoded fallback if both methods fail

This ensures the installer works correctly when latest release is draft,
during DNS issues, and when GitHub API is unavailable.
2025-11-12 13:11:11 +00:00
rcourtman
777b27b6f6 Fix checksum extraction to use word boundaries
The grep pattern was too loose and could match filenames like:
- pulse-v4.29.0-linux-amd64.tar.gz (correct)
- pulse-v4.29.0-linux-amd64.tar.gz.sha256 (also matched)

Using grep -w ensures we only match the exact filename as a complete word,
preventing false matches on files with the same prefix.
2025-11-11 20:41:42 +00:00
rcourtman
cf5535878d Make install script backward compatible with old and new release formats
The install script now tries checksums.txt first (v4.29.0+), then falls back
to individual .sha256 files (v4.28.0 and earlier). This ensures users can
update from any version regardless of which checksum format was used.

This fixes the release format transition issue where changing asset structure
broke updates for users on older versions.
2025-11-11 20:19:47 +00:00
rcourtman
b8279b0003 Update install script to use checksums.txt instead of individual .sha256 files
Aligns with release asset reduction changes. The install script now downloads the unified checksums.txt file and extracts the checksum for the specific architecture being installed.
2025-11-11 20:14:13 +00:00
rcourtman
438d3b6b7b Fix unbound variable error in temperature proxy installation
Related to #681

The variable local_proxy_binary was declared with local scope inside
the BUILD_FROM_SOURCE conditional block but referenced outside of it
during cleanup. This caused "unbound variable" errors on release installs
since the script uses set -u.

Moved the declaration before the conditional block and initialize to empty
string. The cleanup code [[ -f "$local_proxy_binary" ]] already handles
the empty string case safely.
2025-11-10 11:37:31 +00:00
rcourtman
c9d1671afd Fix persistent temperature monitoring issues for standalone Proxmox nodes (addresses #571)
This commit resolves the recurring temperature monitoring failures that have plagued multiple releases:

1. **Fix user mismatch (v4.27.1 regression)**:
   - Changed binary default user from 'pulse-sensor' to 'pulse-sensor-proxy'
   - Aligns with the user created by install-sensor-proxy.sh (line 389)
   - Prevents panic when binary is run outside systemd context
   - Systemd unit already uses User=pulse-sensor-proxy, so this makes manual runs work too

2. **Fix standalone node validation (v4.25.0+ regression)**:
   - pvecm status exits with code 2 on standalone nodes (not in a cluster)
   - This caused validation to fail, rejecting all temperature requests
   - Added discoverLocalHostAddresses() helper that discovers actual host IPs/hostnames
   - On standalone nodes, cluster membership list is populated with host's own addresses
   - Maintains SSRF protection while allowing standalone operation
   - Added comprehensive test coverage

3. **Make installer fail loudly on proxy setup failure**:
   - Previously, failed proxy installation only printed a warning
   - Install script then claimed "Pulse installation complete!" (confusing for users)
   - Now exits with clear error message and remediation steps
   - Forces operators to fix proxy issues before claiming success
   - Users who skip temperature monitoring are unaffected

4. **Add test coverage to prevent future regressions**:
   - Added TestDiscoverLocalHostAddresses to verify local address discovery
   - Validates no loopback or link-local addresses are returned
   - All existing tests pass with new changes

Pattern of failures across releases:
- v4.23.0: Missing proxy binaries in release
- v4.24.0-rc.3: AMD CPU sensor naming (Tctl vs Tdie)
- v4.25.0: Single-node pvecm status exit code
- v4.27.1: User mismatch (pulse-sensor vs pulse-sensor-proxy)

This comprehensive fix addresses the root causes rather than applying another tactical patch.

Related to #571
2025-11-09 16:53:14 +00:00
rcourtman
a39beca464 Fix install.sh auto-update download timeout on slow DNS networks (related to #669)
The 5-second connect timeout was too aggressive for DNS resolution in some
Proxmox LXC environments, causing "Resolving timed out after 5000 milliseconds"
errors when downloading the auto-update script from raw.githubusercontent.com.

Changes:
- Add download_auto_update_script() helper with retry logic
- Increase connect timeout from 5s to 15s for slow DNS
- Increase max time from 15s to 60s for complete transfer
- Retry up to 3 times with incremental backoff (3s, 6s delays)
- Gracefully degrade: installer continues without auto-updates if download fails
- Users can re-run with --enable-auto-updates later when connectivity improves
2025-11-08 18:50:18 +00:00
rcourtman
48fabdd827 Improve Docker temperature monitoring documentation for clarity (related to #600)
Updated the Quick Start for Docker section in TEMPERATURE_MONITORING.md to be
more user-friendly and address common setup issues:

- Added clear explanation of why the proxy is needed (containers can't access hardware)
- Provided concrete IP example instead of placeholder
- Showed full docker-compose.yml context with proper YAML structure
- Added sudo to commands where needed
- Updated docker-compose commands to v2 syntax with note about v1
- Expanded verification steps with clearer success indicators
- Added reminder to check container name in verification commands

These improvements should help users who encounter blank temperature displays
due to missing proxy installation or bind mount configuration.
2025-11-07 15:09:42 +00:00
rcourtman
50cf34a2da Fix install.sh to deploy host agent binaries (related to #651)
The bare metal installer was not copying pulse-host-agent binaries from
release tarballs into /opt/pulse/bin/, causing 404 errors when users
tried to install the host agent via the download endpoint.

Changes:
- Copy pulse-host-agent binary during initial installation (alongside
  pulse-docker-agent)
- Update install_additional_agent_binaries() to fetch and install
  cross-platform host agent binaries (linux-amd64, linux-arm64,
  linux-armv7, darwin-amd64, darwin-arm64, windows-amd64)
- Match existing pattern used for Docker agent distribution

The build pipeline (build-release.sh and Dockerfile) already correctly
includes host agent binaries in releases and Docker images. This fix
ensures the installer deploys them.

Users on bare metal deployments should rerun install.sh to populate
/opt/pulse/bin/ with the missing host agent binaries. Docker
deployments are unaffected.
2025-11-07 11:19:47 +00:00