Pulse

vrr/Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-05-16 19:50:10 +00:00

Author	SHA1	Message	Date
rcourtman	cdb692c8fd	Refactor to tag-driven release workflow with auto-changelog Major improvements: - Trigger on tag push (git push origin vX.Y.Z) instead of workflow_dispatch - Auto-generate release notes using GitHub's API - Tag is single source of truth (eliminates version/tag mismatch) - Follows industry standard pattern (Kubernetes, Docker, HashiCorp) - Also push 'latest' tag to Docker registries - Simpler workflow: update VERSION → commit → tag → push tag Breaking change: Manual workflow_dispatch releases no longer supported. Use: git tag vX.Y.Z && git push origin vX.Y.Z	2025-11-13 11:48:10 +00:00
rcourtman	e822ab7ae1	Fix remote sync check in release trigger script - Replace unreliable git fetch --dry-run check - Use git rev-parse to compare local and remote commits - Prevents false warnings about diverged branches	2025-11-13 11:43:36 +00:00
rcourtman	0739b3ade1	Prepare v4.30.0 release	2025-11-13 11:40:54 +00:00
rcourtman	6a647c3274	Revert VERSION to 4.29.5 for release testing	2025-11-13 11:38:37 +00:00
rcourtman	d400befd35	Add pre-flight validation script for releases - Check VERSION file matches before triggering workflow - Validate working directory is clean - Confirm on main branch and up to date - Load release notes from /tmp/release_notes_X.Y.Z.md - Prevents wasting CI time on misconfigured releases	2025-11-13 11:36:53 +00:00
rcourtman	332f92b696	Prepare v4.30.0 release	2025-11-13 11:34:49 +00:00
rcourtman	67b3b2ff84	Fix Proxmox 9.x VM status endpoint incompatibility Proxmox VE 9.x removed support for the "full" parameter in the /nodes/{node}/qemu/{vmid}/status/current endpoint. When Pulse sent GetVMStatus() requests with ?full=1, Proxmox responded with: API error 400: {"errors":{"full":"property is not defined in schema..."}} This caused the cluster client to mark ALL endpoints as unhealthy, which cascaded into multiple failures: - VM status checks failed - Guest agent queries were blocked - Filesystem data collection stopped working - All Windows VMs showed disk:-1 (unknown) instead of actual disk usage The fix removes the ?full=1 parameter since Proxmox 9.x returns all data by default without needing this parameter. This maintains backward compatibility with older Proxmox versions while fixing the issue in 9.x. After this fix: - Cluster endpoints are correctly marked as healthy - Guest agent queries work properly - Windows VMs report actual disk usage (e.g., 26% on C:\ drive) - VM monitoring functions normally on Proxmox 9.x	2025-11-13 11:22:36 +00:00
rcourtman	c1d076db0c	Simplify Remember Me label text Remove '30 days' from label - most apps just say 'Remember me' without specifying the duration.	2025-11-13 10:40:53 +00:00
rcourtman	56a7579c99	Add Remember Me feature with sliding session expiration (Related to #707) Implements a "Remember Me" option that allows users to stay logged in for 30 days instead of the default 24 hours. This addresses the pain point of frequent re-authentication in LAN-only environments while maintaining authentication security. Backend changes: - Add rememberMe field to login request handling - Support variable session durations (24h default, 30d with Remember Me) - Implement sliding session expiration that extends sessions on each authenticated request using the original duration - Store OriginalDuration in session data for proper sliding window - Update session cookie MaxAge to match session duration Frontend changes: - Add "Remember Me for 30 days" checkbox to login form - Pass rememberMe flag in login request - Improve UI with clear duration indication Key features: - Sessions extend automatically on each request (sliding window) - Original duration preserved across session extension - Backward compatible with existing sessions (legacy sessions work) - Sessions persist across server restarts This provides a better user experience for LAN deployments without compromising security by completely disabling authentication.	2025-11-13 10:37:08 +00:00
rcourtman	8e993ea901	Add critical safety guards to temperature proxy installation After implementing the health gate, added comprehensive safety measures to prevent the health checks themselves from becoming a new failure point. Problem: Previous commit added strict health checks but could fail in edge cases: - `pct exec` could hang if container stopped/frozen → installer deadlocks - systemctl/journalctl might not be available → diagnostics fail - Container access check could fail for transient reasons - pvecm error detection was fragile (string matching specific messages) Solutions Implemented: 1. Timeouts on All External Commands (install.sh:1596,1618) - `timeout 5` on systemctl checks - `timeout 10` on pct exec checks - Prevents installer from hanging indefinitely 2. Graceful Degradation (install.sh:1602-1630) - Check for systemctl/pct availability before using - Warn if tools missing instead of failing - Container check is warning-only (may be transient) - Only fail on critical checks: service running, socket exists 3. Bypass Flag Support (install.sh:1589-1594) - Set `PULSE_SKIP_HEALTH_CHECKS=1` to bypass all checks - Documented in error messages for troubleshooting - Allows installation in unsupported environments 4. Flexible Diagnostics (install.sh:1640-1647) - Use journalctl if available, fallback to syslog - Conditional tool-specific advice 5. Broader Error Detection (ssh.go:582-628) - List of 14 standalone indicators (vs 5 hardcoded checks) - Case-insensitive matching for localization tolerance - Permissive strategy: treat any known pattern as standalone - Handles variations: "no cluster", "IPC", "connection refused", etc. 6. Enhanced Test Coverage (ssh_test.go:+35 lines) - Added 3 new test cases (variation patterns) - Tests now cover 8 standalone scenarios + 3 negative cases - All tests pass (11/11) Impact: - Health gate won't block installation in edge cases - Better user experience on non-standard setups - Standalone detection handles more error message variations - Clear escape hatch for troubleshooting (bypass flag) Confidence Level: High - All tests pass (bash syntax + Go unit tests) - Graceful fallbacks for every external command - Only critical checks are hard failures - Warnings guide users through validation issues Related to #571	2025-11-13 10:26:46 +00:00
rcourtman	4f2e9055cd	Add comprehensive tests for standalone node detection patterns Tests validate the error pattern matching logic added in previous commit, ensuring we correctly identify: 1. Standalone Node Patterns (should trigger fallback): - Classic: 'Corosync config does not exist' - LXC ipcc errors: 'ipcc_send_rec[1] failed: Unknown error -1' - Access control errors: 'Unable to load access control list' - All patterns from GitHub issue #571 2. Genuine Errors (should NOT trigger fallback): - Network timeouts - Permission denied - Command not found Tests use real error messages from production GitHub issues to prevent regressions. All 9 test cases pass. Coverage: - 6 standalone/LXC error patterns - 3 genuine error cases (negative testing) - References issue #571 for traceability Related to #571	2025-11-13 10:17:57 +00:00
rcourtman	1dd0e74133	Dramatically improve temperature proxy installation robustness Users were abandoning Pulse due to catastrophic temperature monitoring setup failures. This commit addresses the root causes: Problem 1: Silent Failures - Installations reported "SUCCESS" even when proxy never started - UI showed green checkmarks with no temperature data - Zero feedback when things went wrong Problem 2: Missing Diagnostics - Service failures logged only in journald - Users saw "Something going on with the proxy" with no actionable guidance - No way to troubleshoot from error messages Problem 3: Standalone Node Issues - Proxy daemon logged continuous pvecm errors as warnings - "ipcc_send_rec" and "Unknown error -1" messages confused users - These are expected for non-clustered/LXC setups Solutions Implemented: 1. Health Gate in install.sh (lines 1588-1629) - Verify service is running after installation - Check socket exists on host - Confirm socket visible inside container via bind mount - Fail loudly with specific diagnostics if any check fails 2. Actionable Error Messages in install-sensor-proxy.sh (lines 822-877) - When service fails to start: dump full systemctl status + 40 lines of logs - When socket missing: show permissions, service status, and remediation command - Include common issues checklist (missing user, permission errors, lm-sensors, etc.) - Direct link to troubleshooting docs 3. Better Standalone Node Detection in ssh.go (lines 585-595) - Recognize "Unknown error -1" and "Unable to load access control list" as LXC indicators - Log at INFO level (not WARN) since this is expected behavior - Clarify message: "using localhost for temperature collection" Impact: - Eliminates "green checkmark but no temps" scenario - Users get immediate actionable feedback on failures - Standalone/LXC installations work silently without error spam - Reduces support burden from #571 (15+ comments of user frustration) Related to #571	2025-11-13 10:14:19 +00:00
rcourtman	1cfc6da2d9	Revert "Document release notes input requirement" This reverts commit `5c06597b6e`.	2025-11-13 09:41:43 +00:00
rcourtman	5c06597b6e	Document release notes input requirement	2025-11-13 09:40:21 +00:00
rcourtman	1f3723a7ad	Require release notes input for workflow	2025-11-13 09:37:38 +00:00
rcourtman	221c41b2b0	Polish release notes fallback	2025-11-13 09:10:43 +00:00
rcourtman	8330664ed2	Add deterministic release notes fallback	2025-11-13 00:00:25 +00:00
rcourtman	a6a8f0a8ef	Improve release notes fallback	2025-11-12 23:40:26 +00:00
rcourtman	d4b7a61e9c	Prepare v4.29.6 release	2025-11-12 23:19:20 +00:00
rcourtman	1166576b21	Ensure agent ID collisions respect token boundaries (Related to #658)	2025-11-12 22:46:56 +00:00
rcourtman	bb55144637	Improve update integration diagnostics	2025-11-12 22:27:05 +00:00
rcourtman	88d0aeebdf	Ensure alert cooldown persists reliably Related to #706	2025-11-12 21:52:24 +00:00
rcourtman	3fa7356ff3	Improve auth timeout handling (related to #705)	2025-11-12 21:50:53 +00:00
rcourtman	4f950de29d	Make API update test resilient to stale cookies	2025-11-12 21:32:33 +00:00
rcourtman	e3952a073c	Use internal mock host for release assets	2025-11-12 21:23:01 +00:00
rcourtman	8b2d64b583	Fix API update test to send CSRF token	2025-11-12 21:13:46 +00:00
rcourtman	98adb5e08e	Preserve storage backups after partial failures (Related to #704)	2025-11-12 21:10:18 +00:00
rcourtman	3b079eeddb	Add release dry run workflow and API update integration test	2025-11-12 21:02:52 +00:00
rcourtman	874f41b75c	Related to discussion #702: harden uninstall cleanup	2025-11-12 19:46:45 +00:00
rcourtman	92e155ed3f	Handle Snap Docker home restrictions (Related to #693)	2025-11-12 19:20:04 +00:00
rcourtman	653ce04bc1	Improve sensor proxy cluster validation (Related to #703)	2025-11-12 19:17:45 +00:00
rcourtman	923293fbf7	Related to #692: Skip unsupported guest OS info calls	2025-11-12 19:17:09 +00:00
rcourtman	b829d86d2c	Handle docker validation under errexit Related to #693	2025-11-12 18:34:53 +00:00
rcourtman	b3e8e43736	Related to #698: harden installer release detection	2025-11-12 17:56:16 +00:00
rcourtman	9dd6357328	Improve sensor-proxy release detection (related to #701)	2025-11-12 17:49:20 +00:00
rcourtman	429aa075af	Ensure release validation handles published edits (related to #669)	2025-11-12 17:33:30 +00:00
rcourtman	357a7fe920	Ensure VM status requests always return meminfo (Related to #694)	2025-11-12 17:30:10 +00:00
rcourtman	29a2fa8bdb	Auto-update Helm chart version to 4.29.5	2025-11-12 17:15:02 +00:00
rcourtman	9abc10265b	Auto-update Helm chart documentation	2025-11-12 17:15:01 +00:00
rcourtman	70d6f911b5	Skip helm-docs commits during release workflows	2025-11-12 17:14:31 +00:00
rcourtman	39548dbbe2	Bump version to 4.29.5	2025-11-12 16:34:22 +00:00
rcourtman	ff5de0147b	Fix missing regexp import for path traversal validation	2025-11-12 16:34:16 +00:00
rcourtman	be40faf67f	Bump version to 4.29.4 - includes security fix	2025-11-12 16:27:49 +00:00
rcourtman	946e2e455f	Security: Fix path traversal vulnerability in host-agent download endpoint CRITICAL SECURITY FIX: The /download/pulse-host-agent endpoint was directly concatenating user-supplied platform and arch query parameters into file paths without validation, allowing path traversal attacks. An attacker could request: /download/pulse-host-agent?platform=../../etc/passwd to read arbitrary files from the container filesystem. Fix: Add input validation to only allow alphanumeric characters and hyphens in platform/arch parameters before using them in file paths. Related: Codex security audit identified this during pre-release review	2025-11-12 16:27:11 +00:00
rcourtman	775bf99a96	Bump version to 4.29.3 for final release test	2025-11-12 16:18:51 +00:00
rcourtman	66d30c56eb	Fix draft release tag creation Draft releases created without --target get 'untagged-...' slugs instead of the proper tag name. This breaks all download URLs since installers expect /download/vX.Y.Z/... but assets are under /download/untagged-.../ Add --target parameter to gh release create to ensure the tag is created properly even for draft releases.	2025-11-12 16:18:22 +00:00
rcourtman	0c98d0b801	Bump version to 4.29.2	2025-11-12 15:47:33 +00:00
rcourtman	ba6d019c5b	Fix eventual consistency issue with release API lookup The releases REST API endpoint is eventually consistent for draft releases. Immediately after gh release create, the new release may not appear in the listing yet, causing the release_id lookup to return empty and fail validation. Add retry loop (10 attempts, 2s intervals) to wait for the release to appear in the API before extracting the ID. Also add validation to ensure we got a valid release_id before proceeding. This fixes the immediate validation failure with 'Release metadata is missing'.	2025-11-12 15:47:21 +00:00
rcourtman	bf8622eeaa	Bump version to 4.29.1 for release workflow test	2025-11-12 15:08:41 +00:00
rcourtman	e3890c2925	Fix release workflow to complete successfully end-to-end Related to systematic release workflow failures. The workflow has never successfully completed from start to finish since validation was added. Root causes identified and fixed: 1. GraphQL node_id vs numeric release ID: The create-release job was using `gh release view --json id` which returns a GraphQL node_id (RE_kwDON5nJtM4PmlTt) instead of the numeric database ID (261772525) needed by the REST API. The validation workflow then failed with 404 when trying to download assets. Fixed by using `gh api` to get the numeric ID from the releases list endpoint. 2. Missing binaries in Docker image: The validation script expects 26 binaries + 3 Windows symlinks in /opt/pulse/bin/, but the Dockerfile was only copying a subset. Missing binaries included the main pulse server binary, armv6/386 builds for all agents, and caused immediate validation failure. Fixed by copying all built binaries from backend-builder stage. 3. Assets-only validation fallback broken: When Docker image pull times out, the workflow falls back to assets-only validation but was still calling the validation script without --skip-docker flag, causing it to fail on the first docker command. Fixed by passing --skip-docker flag in the fallback path. 4. Asset download pagination: The asset download was not using --paginate, which would cause silent failures once we exceed 30 assets (currently at 27). Fixed by adding --paginate to gh api call. All fixes verified locally and address the complete failure chain.	2025-11-12 14:59:16 +00:00
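
Commit 946e2e455f describes the path traversal fix only in words: allow nothing but alphanumeric characters and hyphens in the platform and arch parameters before they reach a file path. A minimal Go sketch of that kind of check follows. The function name `agentBinaryPath`, the `pulse-host-agent-<platform>-<arch>` file naming, and the use of `/opt/pulse/bin` as the download directory are assumptions for illustration, not the repository's actual code.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
)

// safeParam accepts only ASCII letters, digits, and hyphens, matching the
// rule stated in the commit message. Dots and slashes never match, so
// "../" sequences are rejected before any path is built.
var safeParam = regexp.MustCompile(`^[a-zA-Z0-9-]+$`)

// agentBinaryPath is a hypothetical helper: it validates both parameters and
// only then joins them into a path under binDir. The file naming scheme is
// an assumption made for this example.
func agentBinaryPath(binDir, platform, arch string) (string, error) {
	if !safeParam.MatchString(platform) {
		return "", fmt.Errorf("invalid platform parameter %q", platform)
	}
	if !safeParam.MatchString(arch) {
		return "", fmt.Errorf("invalid arch parameter %q", arch)
	}
	name := fmt.Sprintf("pulse-host-agent-%s-%s", platform, arch)
	return filepath.Join(binDir, name), nil
}

func main() {
	// One well-formed request and the traversal attempt quoted in the commit.
	for _, platform := range []string{"linux", "../../etc/passwd"} {
		path, err := agentBinaryPath("/opt/pulse/bin", platform, "amd64")
		fmt.Printf("platform=%q path=%q err=%v\n", platform, path, err)
	}
}
```

Validating against an allowlist before joining, rather than cleaning the path afterwards, keeps the check independent of how the final path is assembled.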

668 commits
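
The sliding session expiration from commit 56a7579c99 (24 hours by default, 30 days with Remember Me, extended on each authenticated request using the original duration) can be sketched in Go roughly as below. Only the `OriginalDuration` field name comes from the commit message. The `Session` type, `NewSession`, `Touch`, and the fallback to 24 hours for legacy sessions without a stored duration are assumptions for illustration.

```go
package main

import (
	"fmt"
	"time"
)

const (
	defaultDuration  = 24 * time.Hour      // default session lifetime
	rememberDuration = 30 * 24 * time.Hour // lifetime with Remember Me
)

// Session is a hypothetical session record. OriginalDuration is stored so
// that every extension reuses the lifetime chosen at login.
type Session struct {
	ExpiresAt        time.Time
	OriginalDuration time.Duration
}

// NewSession creates a session whose lifetime depends on the rememberMe flag.
func NewSession(now time.Time, rememberMe bool) *Session {
	d := defaultDuration
	if rememberMe {
		d = rememberDuration
	}
	return &Session{ExpiresAt: now.Add(d), OriginalDuration: d}
}

// Touch is called on each authenticated request. It reports false for an
// expired session and otherwise slides the expiry forward by the original
// duration.
func (s *Session) Touch(now time.Time) bool {
	if now.After(s.ExpiresAt) {
		return false
	}
	d := s.OriginalDuration
	if d == 0 {
		// Assumption: legacy sessions without a stored duration
		// fall back to the default lifetime.
		d = defaultDuration
	}
	s.ExpiresAt = now.Add(d)
	return true
}

func main() {
	start := time.Date(2025, 11, 13, 10, 0, 0, 0, time.UTC)
	s := NewSession(start, true)

	// A request ten days later is still valid and pushes the expiry
	// another full 30 days out from that request.
	later := start.Add(10 * 24 * time.Hour)
	fmt.Println(s.Touch(later), s.ExpiresAt.Sub(later))
}
```

Storing the duration on the session, instead of recomputing it from a flag, is what lets the window slide without silently shortening a 30-day session to 24 hours.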