Commit graph

4826 commits

Author SHA1 Message Date
rcourtman
5898cb81be Fix update modal hanging indefinitely after completion (related to #628)
When updates complete quickly, the status API may return 'completed' before
the frontend detects the 'restarting' phase. This left users staring at a
frozen modal with no feedback, requiring manual page refresh.

Changes:
- When status is 'completed', immediately check /api/health
- If backend is healthy, reload the page to get new version
- If health check fails, assume restart in progress and start health polling
- Ensures the page always reloads to the new version automatically

This fixes the UX issue reported in discussion #628 where the update modal
appeared frozen indefinitely despite successful update completion.
2025-11-07 08:11:52 +00:00
rcourtman
b5ef239973 Add container detection warning to pulse-sensor-proxy startup (related to #628)
When pulse-sensor-proxy runs inside a container (Docker/LXC), it cannot
complete SSH workflows properly, leading to continuous [preauth] log floods
on the Proxmox host. This happens because the proxy is meant to run on the
host, not inside the container.

Changes:
- Import internal/system for InContainer() detection
- Add startup warning when running in containerized environment
- Point users to docs/TEMPERATURE_MONITORING.md for correct setup
- Allow suppression via PULSE_SENSOR_PROXY_SUPPRESS_CONTAINER_WARNING=true

This catches the misconfiguration early and directs users to supported
installation methods, preventing the SSH spam reported in discussion #628.
2025-11-06 23:41:29 +00:00
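A minimal Go sketch of the startup warning logic this commit describes. The detection heuristic here (checking /.dockerenv and PID 1's cgroup) is an assumption about what internal/system.InContainer() might do, not its actual implementation; the suppression env var name comes from the commit message.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// inContainer is a hypothetical heuristic of the kind
// internal/system.InContainer() might use: Docker leaves /.dockerenv
// behind, and container runtimes show up in PID 1's cgroup path.
func inContainer() bool {
	if _, err := os.Stat("/.dockerenv"); err == nil {
		return true
	}
	data, err := os.ReadFile("/proc/1/cgroup")
	if err != nil {
		return false
	}
	s := string(data)
	return strings.Contains(s, "docker") || strings.Contains(s, "lxc")
}

// shouldWarn applies the suppression rule from the commit message.
func shouldWarn(inContainer bool, suppressEnv string) bool {
	return inContainer && suppressEnv != "true"
}

func main() {
	if shouldWarn(inContainer(), os.Getenv("PULSE_SENSOR_PROXY_SUPPRESS_CONTAINER_WARNING")) {
		fmt.Println("WARNING: pulse-sensor-proxy appears to be running inside a container;")
		fmt.Println("see docs/TEMPERATURE_MONITORING.md for the supported host-side setup.")
	}
}
```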
rcourtman
6a48c759e8 Fix critical notification system bugs and security issues
This commit addresses multiple critical issues identified in the notification
system audit conducted with Codex:

**Critical Fixes:**

1. **Queue Retry Logic (Critical #1)**
   - Fixed broken retry/DLQ system where send functions never returned errors
   - Made sendGroupedEmail(), sendGroupedWebhook(), sendGroupedApprise() return errors
   - Made sendWebhookRequest() return errors
   - ProcessQueuedNotification() now properly propagates errors to queue
   - Retry logic and DLQ now function correctly

2. **Attempt Counter Bug (Critical #2)**
   - Fixed double-increment bug in queue processing
   - Separated UpdateStatus() from attempt tracking
   - Added IncrementAttempt() method
   - Notifications now get correct number of retry attempts

3. **Secret Exposure (Critical #3 & #4)**
   - Masked webhook headers and customFields in GET /api/notifications/webhooks
   - Added redactSecretsFromURL() to sanitize webhook URLs in history
   - Truncated/redacted response bodies in webhook history
   - Protected against credential harvesting via API

4. **Email Rate Limiting (Critical #5)**
   - Added emailManager field to NotificationManager
   - Shared EnhancedEmailManager instance across sends
   - Rate limiter now accumulates across multiple emails
   - SMTP rate limits are now enforced correctly

5. **SSRF Protection (High #6)**
   - Added DNS resolution of webhook URLs
   - Added isPrivateIP() check using CIDR ranges
   - Blocks all private IP ranges (10/8, 172.16/12, 192.168/16, 127/8, 169.254/16)
   - Blocks IPv6 private ranges (::1, fe80::/10, fc00::/7)
   - Prevents DNS rebinding attacks
   - Returns error instead of warning for private IPs

**New Features:**

6. **Health Endpoint (High #8)**
   - Added GET /api/notifications/health
   - Returns queue stats (pending, sending, sent, failed, dlq)
   - Shows email/webhook configuration status
   - Provides overall health indicator

**Related to notification system audit**

Files changed:
- internal/notifications/notifications.go: Error returns, rate limiting, SSRF hardening
- internal/notifications/queue.go: Attempt tracking fix
- internal/api/notifications.go: Secret masking, health endpoint
2025-11-06 23:26:03 +00:00
rcourtman
3eafd00c88 Fix Helm chart workflow 403 errors by granting write permissions
The publish-helm-chart workflow was failing with 403 errors when attempting
to upload Helm chart assets to GitHub releases. This was caused by the workflow
having only 'contents: read' permission. Changed to 'contents: write' to allow
the 'gh release upload' command to succeed.
2025-11-06 22:50:08 +00:00
rcourtman
fa7ca00250 Fix duplicate checksum in build-release.sh
The checksum generation was including pulse-host-agent-v*-darwin-arm64.tar.gz
twice: once from the *.tar.gz pattern and once from the pulse-host-agent-*
pattern. Fixed by using extglob to exclude .tar.gz and .sha256 files from
the agent binary patterns since tarballs are already matched separately.
2025-11-06 22:19:16 +00:00
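The principle behind this fix, sketched in Go rather than the script's bash/extglob: when several glob patterns can match the same file, build the checksum list through a set so nothing is hashed twice. This is an illustration of the idea, not build-release.sh's actual code.

```go
package main

import "fmt"

// dedupe merges the match lists of overlapping patterns, keeping the
// first occurrence of each file so checksums are generated exactly once.
func dedupe(patternMatches ...[]string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, matches := range patternMatches {
		for _, f := range matches {
			if !seen[f] {
				seen[f] = true
				out = append(out, f)
			}
		}
	}
	return out
}

func main() {
	// The darwin-arm64 tarball matched both *.tar.gz and pulse-host-agent-*.
	tarballs := []string{"pulse-host-agent-v1-darwin-arm64.tar.gz"}
	agents := []string{"pulse-host-agent-v1-darwin-arm64.tar.gz", "pulse-host-agent-v1-linux-amd64"}
	fmt.Println(dedupe(tarballs, agents))
}
```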
rcourtman
b356ba0fec Bump version to 4.26.4
Version alignment for upcoming release including:
- Layout and table overflow fixes (related to #643)
- Webhook alert persistence fix
- Docker host row dimming fix
- Agent installation script deployment fix (related to #644)
- Guest agent disk data regression fix
- Config backup/restore fixes (related to #646)
- Bootstrap token UX improvements
2025-11-06 22:14:45 +00:00
rcourtman
4f9ba7a285 Allow layout to expand on wide displays (related to #643)
Changed .pulse-shell from fixed 95rem cap to fluid clamp(95rem, 92vw, 120rem)
to match standard monitoring dashboard behavior (Proxmox, Grafana, Portainer).

On laptops/small screens: unchanged (capped at 1520px)
On 1080p displays: expands to ~1766px usable width
On 4K/ultrawide: expands up to 1920px max for readability

Added back 2xl column widths (totaling ~1720px) that properly fit within
the expanded shell, giving wide-display users more breathing room while
maintaining proportional scaling across all breakpoints.

Changed files:
- index.css: Update .pulse-shell max-width to use clamp()
- Dashboard.tsx: Add 2xl column widths calculated for expanded shell
- GuestRow.tsx: Add matching 2xl column widths
2025-11-06 21:51:17 +00:00
rcourtman
68caf5592b Fix Proxmox dashboard table overflow on wide displays (related to #643)
Removed 2xl: width overrides that caused the table to exceed container width.
At ≥1536px viewport, the 2xl breakpoint expanded table columns to ~1528px
total width while .pulse-shell container provides only ~1416px usable space,
forcing Net In/Net Out columns off-screen and requiring horizontal scroll.

Table now caps at xl: breakpoint widths (~1266px) which fit comfortably within
the container at all viewport sizes. Net In/Net Out columns are now visible
without scrolling on 1080p, 4K, and all wide displays.

Changed files:
- Dashboard.tsx: Remove 2xl: width classes from all table header columns
- GuestRow.tsx: Remove 2xl: width classes from all table cell columns
2025-11-06 21:36:30 +00:00
rcourtman
4891f06e76 Fix webhook alerts persisting when DisableAll* flags are enabled
The original fix in c6c0ac63e only handled per-resource overrides when
thresholds were disabled (trigger <= 0 or Disabled=true). It did not
handle global DisableAll* flags (DisableAllStorage, DisableAllNodes,
DisableAllGuests, etc.).

When a user toggled a DisableAll* flag from false to true:
- Check* functions returned early without processing
- Existing active alerts remained in m.activeAlerts map
- Those alerts continued generating webhook notifications
- reevaluateActiveAlertsLocked didn't check DisableAll* flags

This commit fixes the issue by:

1. Updating reevaluateActiveAlertsLocked to check all DisableAll* flags
   and resolve alerts for those resource types during config updates

2. Adding alert cleanup to Check* functions before early returns:
   - CheckStorage: clears usage and offline alerts
   - CheckNode: clears cpu/memory/disk/temperature and offline alerts
   - CheckPMG: clears queue/message alerts and offline alerts
   - CheckPBS: clears cpu/memory and offline alerts
   - CheckHost: calls existing cleanup helpers

3. Adding comprehensive test coverage for DisableAllStorage scenario

Related to #561
2025-11-06 21:17:56 +00:00
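The core of fix #1 above can be sketched as a sweep over the active-alert map: when a DisableAll* flag is set, alerts of that resource type are resolved instead of being left to keep firing webhooks. Types and names here are pared-down stand-ins, not Pulse's actual structs.

```go
package main

import "fmt"

// Alert is a minimal stand-in for Pulse's active-alert record.
type Alert struct {
	ID           string
	ResourceType string // "storage", "node", "guest", ...
}

// resolveDisabled sketches what reevaluateActiveAlertsLocked now does on
// config updates: drop every active alert whose resource type has a
// DisableAll* flag enabled, returning them so callers can emit resolutions.
func resolveDisabled(active map[string]Alert, disabled map[string]bool) []Alert {
	var resolved []Alert
	for id, a := range active {
		if disabled[a.ResourceType] {
			resolved = append(resolved, a)
			delete(active, id) // deleting during range is safe in Go
		}
	}
	return resolved
}

func main() {
	active := map[string]Alert{
		"s1": {ID: "s1", ResourceType: "storage"},
		"n1": {ID: "n1", ResourceType: "node"},
	}
	resolved := resolveDisabled(active, map[string]bool{"storage": true})
	fmt.Printf("resolved=%d remaining=%d\n", len(resolved), len(active))
}
```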
rcourtman
2c3768341a Fix Docker host row dimming for degraded status
Docker hosts with 'degraded' status were incorrectly appearing dimmed
(opacity-60) in the summary table, making them visually identical to
offline hosts. This was confusing because degraded hosts are still
actively reporting - they just have unhealthy containers or >35% of
containers not running.

The isHostOnline function now treats 'degraded' as an online status,
so these rows maintain full opacity. The status badge already provides
visual indication of the degraded state.
2025-11-06 19:11:17 +00:00
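The rule is small enough to state as code — here rendered as a Go sketch of the frontend's isHostOnline (the actual function is TypeScript, and the status strings are assumptions from the commit message):

```go
package main

import "fmt"

// isHostOnline treats 'degraded' as online: degraded hosts are still
// actively reporting, so their rows keep full opacity and only the
// status badge signals the degraded state.
func isHostOnline(status string) bool {
	switch status {
	case "online", "degraded":
		return true
	default:
		return false
	}
}

func main() {
	for _, s := range []string{"online", "degraded", "offline"} {
		fmt.Printf("%s -> online=%v\n", s, isHostOnline(s))
	}
}
```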
rcourtman
586ab3a740 Fix install.sh to deploy all agent installation scripts (related to #644)
Root cause: v4.26.3 tarball and Docker image contained all 8 agent scripts,
but install.sh only copied install-docker-agent.sh to /opt/pulse/scripts/.
Users upgrading via install.sh ended up with missing scripts, causing 404s
when trying to add hosts via the UI.

Changes:
- Add deploy_agent_scripts() function to systematically deploy all scripts
- Deploy all 8 scripts: install-{docker,container,host}-agent.{sh,ps1},
  uninstall-host-agent.{sh,ps1}, install-sensor-proxy.sh, install-docker.sh
- Apply to both main installation and rollback/recovery code paths

This ensures bare-metal installations have feature parity with Docker deployments.
2025-11-06 18:59:32 +00:00
rcourtman
1a78dcbba2 Fix guest agent disk data regression on Proxmox 8.3+
Related to #630

Proxmox 8.3+ changed the VM status API to return the `agent` field as an
object ({"enabled":1,"available":1}) instead of an integer (0 or 1). This
caused Pulse to incorrectly treat VMs as having no guest agent, resulting
in missing disk usage data (disk:-1) even when the guest agent was running
and functional.

The issue manifested as:
- VMs showing "Guest details unavailable" or missing disk data
- Pulse logs showing no "Guest agent enabled, querying filesystem info" messages
- `pvesh get /nodes/<node>/qemu/<vmid>/agent/get-fsinfo` working correctly
  from the command line, confirming the agent was functional

Root cause:
The VMStatus struct defined `Agent` as an int field. When Proxmox 8.3+ sent
the new object format, JSON unmarshaling silently left the field at zero,
causing Pulse to skip all guest agent queries.

Changes:
- Created VMAgentField type with custom UnmarshalJSON to handle both formats:
  * Legacy (Proxmox <8.3): integer (0 or 1)
  * Modern (Proxmox 8.3+): object {"enabled":N,"available":N}
- Updated VMStatus.Agent from `int` to `VMAgentField`
- Updated all references to `detailedStatus.Agent` to use `.Agent.Value`
- The unmarshaler prioritizes the "available" field over "enabled" to ensure
  we only query when the agent is actually responding

This fix maintains backward compatibility with older Proxmox versions while
supporting the new format introduced in Proxmox 8.3+.
2025-11-06 18:42:46 +00:00
rcourtman
7ed9203e4b Fix config backup/restore failures (related to #646)
Addresses two issues preventing configuration backup/restore:

1. Export passphrase validation mismatch: UI only validated 12+ char
   requirement when using custom passphrase, but backend always enforced
   it. Users with shorter login passwords saw unexplained failures.
   - Frontend now validates all passphrases meet 12-char minimum
   - Clear error message suggests custom passphrase if login password too short

2. Import data parsing failed silently: Frontend sent `exportData.data`
   which was undefined for legacy/CLI backups (raw base64 strings).
   Backend rejected these with no logs.
   - Frontend now handles both formats: {status, data} and raw strings
   - Backend logs validation failures for easier troubleshooting

Related to #646 where user reported "error after entering password" with
no container logs. These changes ensure proper validation feedback and
make the backup system resilient to different export formats.
2025-11-06 17:53:54 +00:00
rcourtman
b50dba577f Fix demo server workflow verification by adding authentication
The workflow was failing because /api/state requires authentication,
but the verification step was making an unauthenticated request.

Changes:
- Authenticate with demo/demo credentials before checking node count
- Use jq for cleaner JSON parsing instead of grep/cut
- Check total node count from API response instead of regex pattern matching

Related to user report about demo server not updating to 4.26.3.
The demo server was actually updated successfully, but the workflow
marked itself as failed due to the verification check failing.
2025-11-06 17:44:46 +00:00
rcourtman
ead325942e Add bootstrap token display to install.sh completion message
Enhances discoverability for non-Docker installations (bare metal, LXC)
by displaying the bootstrap token prominently at the end of install.sh.

Changes:
- Add ASCII box display matching Docker startup format
- Show token value and file location
- Include usage instructions for first-time setup
- Only display if .bootstrap_token file exists
- Note that the token file is auto-removed after setup, matching actual behavior

With this change, bootstrap token is now prominently displayed across
all installation methods:
- Docker: startup logs (commit 731eb586)
- Bare metal/LXC: install.sh completion (this commit)
- CLI: pulse bootstrap-token command (commit 731eb586)

Related to #645
2025-11-06 17:35:28 +00:00
rcourtman
a1dc451ed4 Document alert reliability features and DLQ API
Add comprehensive documentation for new alert system reliability features:

**API Documentation (docs/API.md):**
- Dead Letter Queue (DLQ) API endpoints
  - GET /api/notifications/dlq - Retrieve failed notifications
  - GET /api/notifications/queue/stats - Queue statistics
  - POST /api/notifications/dlq/retry - Retry DLQ items
  - POST /api/notifications/dlq/delete - Delete DLQ items
- Prometheus metrics endpoint documentation
  - 18 metrics covering alerts, notifications, and queue health
  - Example Prometheus configuration
  - Example PromQL queries for common monitoring scenarios

**Configuration Documentation (docs/CONFIGURATION.md):**
- Alert TTL configuration
  - maxAlertAgeDays, maxAcknowledgedAgeDays, autoAcknowledgeAfterHours
- Flapping detection configuration
  - flappingEnabled, flappingWindowSeconds, flappingThreshold, flappingCooldownMinutes
- Usage examples and common scenarios
- Best practices for preventing notification storms

All new features are fully documented with examples and default values.
2025-11-06 17:34:05 +00:00
rcourtman
dd1d222ad0 Improve bootstrap token UX for easier discovery
The bootstrap token security requirement was added proactively but
lacked discoverability, causing user friction during first-run setup.
These improvements make the token easier to find while maintaining
the security benefit.

Improvements:
- Display bootstrap token prominently in startup logs with ASCII box
  (previously: single line log message)
- Add `pulse bootstrap-token` CLI command to display token on demand
  (Docker: docker exec <container> /app/pulse bootstrap-token)
- Improve error messages in quick-setup API to show exact commands
  for retrieving token when missing or invalid
- Error messages now include both Docker and bare metal examples

User experience improvements:
- Token visible in `docker logs` output immediately
- Clear instructions printed with token
- Helpful error messages if token is wrong/missing
- CLI helper for operators who need to retrieve token later

Security unchanged:
- Bootstrap token still required for first-run setup
- Token still auto-deleted after successful setup
- No bypass mechanism added

Related to discussion about bootstrap token UX friction.
2025-11-06 17:29:49 +00:00
rcourtman
80acc5ae72 chore: bump version to 4.26.3 2025-11-06 16:56:19 +00:00
rcourtman
f9ca2c0e68 Add hashpw utility for generating password hashes
Simple CLI utility to generate bcrypt password hashes for admin users.

Usage: hashpw <password>

This utility helps administrators generate properly hashed passwords
for use in configuration files or manual user setup.
2025-11-06 16:46:56 +00:00
rcourtman
c8e0281953 Add comprehensive alert system reliability improvements
This commit implements critical reliability features to prevent data loss
and improve alert system robustness:

**Persistent Notification Queue:**
- SQLite-backed queue with WAL journaling for crash recovery
- Dead Letter Queue (DLQ) for notifications that exhaust retries
- Exponential backoff retry logic (100ms → 200ms → 400ms)
- Full audit trail for all notification delivery attempts
- New file: internal/notifications/queue.go (661 lines)

**DLQ Management API:**
- GET /api/notifications/dlq - Retrieve DLQ items
- GET /api/notifications/queue/stats - Queue statistics
- POST /api/notifications/dlq/retry - Retry failed notifications
- POST /api/notifications/dlq/delete - Delete DLQ items
- New file: internal/api/notification_queue.go (145 lines)

**Prometheus Metrics:**
- 18 comprehensive metrics for alerts and notifications
- Metric hooks integrated via function pointers to avoid import cycles
- /metrics endpoint exposed for Prometheus scraping
- New file: internal/metrics/alert_metrics.go (193 lines)

**Alert History Reliability:**
- Exponential backoff retry for history saves (3 attempts)
- Automatic backup restoration on write failure
- Modified: internal/alerts/history.go

**Flapping Detection:**
- Detects and suppresses rapidly oscillating alerts
- Configurable window (default: 5 minutes)
- Configurable threshold (default: 5 state changes)
- Configurable cooldown (default: 15 minutes)
- Automatic cleanup of inactive flapping history

**Alert TTL & Auto-Cleanup:**
- MaxAlertAgeDays: Auto-cleanup old alerts (default: 7 days)
- MaxAcknowledgedAgeDays: Faster cleanup for acked alerts (default: 1 day)
- AutoAcknowledgeAfterHours: Auto-ack long-running alerts (default: 24 hours)
- Prevents memory leaks from long-running alerts

**WebSocket Broadcast Sequencer:**
- Channel-based sequencing ensures ordered message delivery
- 100ms coalescing window for rapid state updates
- Prevents race conditions in WebSocket broadcasts
- Modified: internal/websocket/hub.go

**Configuration Fields Added:**
- FlappingEnabled, FlappingWindowSeconds, FlappingThreshold, FlappingCooldownMinutes
- MaxAlertAgeDays, MaxAcknowledgedAgeDays, AutoAcknowledgeAfterHours

All features are production-ready and build successfully.
2025-11-06 16:46:30 +00:00
rcourtman
47748230f4 Fix first-run setup 401 error by adding bootstrap token unlock screen (related to #639)
After the security hardening that introduced bootstrap token protection,
the first-run setup flow was broken because FirstRunSetup.tsx didn't
prompt users for the token. This caused a 401 "Bootstrap setup token
required" error during initial admin account creation.

Changes:
- Add dedicated unlock screen before the setup wizard
- Display instructions for retrieving token from host
- Include bootstrap token in quick-setup API request headers and body
- Only require unlock for first-run setup (skip in force mode)

The unlock screen follows the documented flow in README.md and ensures
only users with host access can configure an unconfigured instance.

Related to #639
2025-11-06 16:45:51 +00:00
rcourtman
20099549c6 Add comprehensive release validation to prevent missing artifacts
Adds automated validation script to prevent the pattern of patch
releases caused by missing files/artifacts.

scripts/validate-release.sh validates all 40+ artifacts including:
- Docker image scripts (8 install/uninstall scripts)
- Docker image binaries (17 across all platforms)
- Release tarballs (5 including universal and macOS)
- Standalone binaries (12+)
- Checksums for all distributable assets
- Version embedding in every binary type
- Tarball contents (binaries + scripts + VERSION)
- Binary architectures and file types

The script catches 100% of issues from the last 3 patch releases
(missing scripts, missing install.sh, missing binaries, broken
version embedding).

Updated RELEASE_CHECKLIST.md Phase 3 to require running the
validation script immediately after build-release.sh and before
proceeding to Docker build/publish phases.

Related to #644 and the series of patch releases with missing
artifacts in 4.26.x.
2025-11-06 16:33:49 +00:00
rcourtman
035d872269 Add missing install/uninstall scripts to Docker image and release builds (related to #644)
The Dockerfile and build-release.sh were missing several installer and uninstaller
scripts that the router expects to serve via HTTP endpoints:
- install-container-agent.sh
- install-host-agent.ps1
- uninstall-host-agent.sh
- uninstall-host-agent.ps1

This caused 404 errors when users attempted to add Docker/Podman hosts or use the
PowerShell installer, as reported in #644.

Changes:
- Dockerfile: Added missing scripts to /opt/pulse/scripts/ with proper permissions
- build-release.sh: Added missing scripts to both per-platform and universal tarballs
  to ensure bare-metal deployments serve the same endpoints as Docker deployments
2025-11-06 16:01:40 +00:00
rcourtman
40abcd1237 Fix empty space below backup chart by matching container and SVG heights
The chart container was set to min-h-[12rem] (192px) on desktop while the SVG
was hardcoded to 128px, creating 64px of unwanted empty space. Changed container
to fixed h-32 (128px) to match the SVG height.
2025-11-06 15:34:20 +00:00
rcourtman
615cb129df Fix checksum verification failure in install.sh (related to #642)
The .sha256 files generated during release builds contained only the hash,
but sha256sum -c expects the format "hash  filename". This caused all
install.sh updates to fail with "Checksum verification failed" even when
the checksum was correct.

Root cause: build-release.sh line 289 was using awk to extract only field 1
(the hash), discarding the filename that sha256sum -c needs.

Fix: Remove the awk filter to preserve the full sha256sum output format.

This affected the demo server update workflow and user installations.
2025-11-06 15:28:05 +00:00
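The format `sha256sum -c` requires can be demonstrated in a few lines of Go: the hex digest, two spaces, then the filename. Emitting only the hash — what the old awk filter left behind — fails verification even when the hash itself is correct. This is an illustration of the file format, not build-release.sh's code.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// writeSHA256Line produces one line in the format sha256sum -c expects:
// "<64 hex chars><two spaces><filename>\n".
func writeSHA256Line(data []byte, filename string) string {
	sum := sha256.Sum256(data)
	return fmt.Sprintf("%x  %s\n", sum, filename)
}

func main() {
	fmt.Print(writeSHA256Line([]byte("hello\n"), "pulse-v4.26.2-linux-amd64.tar.gz"))
}
```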
rcourtman
a8fa834d24 Fix critical truncation bug preventing data readability on touch devices (related to #643)
Removed CSS truncate from key identifier columns (container names, service names,
guest names, host names, image names) that were making data inaccessible on mobile/
touch devices where title tooltips don't work.

Users can now read full identifiers via horizontal scroll (already implemented via
ScrollableTable component). Data should always be readable without requiring additional
UI affordances.

Changed files:
- DockerUnifiedTable: Remove truncate from container/service names and images
- GuestRow: Remove truncate from guest names
- HostsOverview: Remove truncate from host display names and hostnames

Column resizing remains on backlog as optional enhancement; users should not need
a drag handle just to read the contents.
2025-11-06 15:00:36 +00:00
rcourtman
57e2f9428e chore: bump version to 4.26.2 2025-11-06 14:33:08 +00:00
rcourtman
becda56897 Fix critical rollback download URL bug and doc inconsistencies
Issues found during systematic audit after #642:

1. CRITICAL BUG - Rollback downloads were completely broken:
   - Code constructed: pulse-linux-amd64 (no version, no .tar.gz)
   - Actual asset name: pulse-v4.26.1-linux-amd64.tar.gz
   - This would cause 404 errors on all rollback attempts
   - Fixed: Construct correct tarball URL with version
   - Added: Extract tarball after download to get binary

2. TEMPERATURE_MONITORING.md referenced non-existent v4.27.0:
   - Changed to use /latest/download/ for future-proof docs

3. API.md example had wrong filename format:
   - Changed pulse-linux-amd64.tar.gz to pulse-v4.30.0-linux-amd64.tar.gz
   - Ensures example matches actual release asset naming

The rollback bug would have affected any user attempting to roll back
to a previous version via the UI or API.
2025-11-06 14:25:32 +00:00
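The corrected URL construction boils down to embedding the version and tarball suffix in the asset name. A sketch, with OWNER/REPO as placeholders (the repository path is not given in this log):

```go
package main

import "fmt"

// rollbackAssetURL builds the versioned tarball URL. The broken code
// produced a bare "pulse-linux-amd64" (no version, no .tar.gz), which
// matched no release asset and 404'd every rollback.
func rollbackAssetURL(version, goos, goarch string) string {
	return fmt.Sprintf(
		"https://github.com/OWNER/REPO/releases/download/v%s/pulse-v%s-%s-%s.tar.gz",
		version, version, goos, goarch)
}

func main() {
	// Matches the actual asset naming cited in the commit:
	// pulse-v4.26.1-linux-amd64.tar.gz
	fmt.Println(rollbackAssetURL("4.26.1", "linux", "amd64"))
}
```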
rcourtman
fd3a72606f Add standalone host-agent binaries to releases
Issue: HOST_AGENT.md documented downloading pulse-host-agent binaries
from GitHub releases, but those assets didn't exist. Only tarballs were
available, making manual installation unnecessarily complex.

Changes:
- Copy standalone host-agent binaries (all architectures) to release/
  directory alongside sensor-proxy binaries
- Include host-agent binaries in checksum generation
- Update HOST_AGENT.md to clarify available architectures
- Retroactively uploaded missing binaries to v4.26.1

This enables air-gapped and manual installations without requiring an
already-running Pulse server to download from.
2025-11-06 14:20:59 +00:00
rcourtman
e4378602c1 Fix install.sh missing from GitHub releases (addresses #642)
Root cause: install.sh was not being copied to the release directory
during build-release.sh execution, so it was never uploaded as a
release asset. This caused the download URL to return "Not Found",
which bash attempted to execute as a command.

Changes:
- Copy install.sh to release/ directory in build-release.sh
- Include install.sh in checksums generation

Note: RELEASE_CHECKLIST.md also updated locally to verify install.sh
presence in Phase 3 and Phase 5, but that file is gitignored.
2025-11-06 14:10:46 +00:00
rcourtman
fa3b0db243 Improve static asset caching for hashed files
Hashed static assets (e.g., index-BXHytNQV.js, index-TvhSzimt.css) are
now cached for 1 year with immutable flag since content hash changes
when files change.

Benefits:
- Faster page loads on subsequent visits
- Reduced server bandwidth
- Better user experience on demo and production instances

Only index.html and non-hashed assets remain uncached to ensure
users always get the latest version.
2025-11-06 13:54:26 +00:00
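The caching policy splits on whether the filename carries a content hash. A sketch — the regex for Vite-style hashed names is an assumption, not Pulse's actual pattern:

```go
package main

import (
	"fmt"
	"regexp"
)

// hashedAsset matches names like index-BXHytNQV.js / index-TvhSzimt.css.
var hashedAsset = regexp.MustCompile(`-[A-Za-z0-9_-]{8}\.(js|css)$`)

// cacheControlFor returns the Cache-Control value for a request path:
// hashed assets are immutable for a year (the hash changes when content
// does), while index.html and friends stay uncached so users always get
// the latest version.
func cacheControlFor(path string) string {
	if hashedAsset.MatchString(path) {
		return "public, max-age=31536000, immutable"
	}
	return "no-cache"
}

func main() {
	for _, p := range []string{"/assets/index-BXHytNQV.js", "/index.html"} {
		fmt.Println(p, "->", cacheControlFor(p))
	}
}
```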
rcourtman
a9d2209edd Fix demo mode to allow authentication endpoints
Demo mode now permits login/logout and OIDC authentication endpoints
while still blocking all modification requests. This allows demo
instances to require authentication while remaining read-only.

Authentication endpoints are read-only operations that verify
credentials and issue session tokens without modifying any state.
All POST/PUT/DELETE/PATCH operations remain blocked.
2025-11-06 13:48:28 +00:00
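The demo-mode gate can be sketched as a method/path allowlist; the endpoint paths below are assumptions, not Pulse's exact routes. Authentication POSTs pass because they only verify credentials and issue a session token; every other mutating method stays blocked.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// demoAllowed reports whether a request is permitted in demo mode:
// reads always, auth endpoints even for POST, everything else read-only.
func demoAllowed(method, path string) bool {
	if method == http.MethodGet || method == http.MethodHead {
		return true
	}
	for _, p := range []string{"/api/login", "/api/logout", "/api/oidc/"} {
		if strings.HasPrefix(path, p) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(demoAllowed("POST", "/api/login"))  // auth: allowed
	fmt.Println(demoAllowed("POST", "/api/config")) // mutation: blocked
}
```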
rcourtman
aea9586145 Update demo credentials to demo/demo 2025-11-06 13:46:57 +00:00
rcourtman
1340ad5f77 docs: Add demo login credentials to README
The demo server at demo.pulserelay.pro now requires authentication.
Login credentials are demo / changeme.
2025-11-06 13:43:11 +00:00
rcourtman
497bdb625e Fix version embedding in Docker builds
The Docker build was only setting internal/dockeragent.Version but not
main.Version, causing the pulse binary to show "dev" instead of the
actual version. Now matches build-release.sh ldflags pattern.

Related to v4.26.1 release
2025-11-06 12:47:02 +00:00
rcourtman
6192e166f2 chore: prepare release v4.26.1 2025-11-06 12:13:56 +00:00
rcourtman
fdcec85931 Fix critical version embedding issues for 4.26 release
Addresses the root cause of issue #631 (infinite Docker agent restart loop)
and prevents similar issues with host-agent and sensor-proxy.

Changes:
- Set dockeragent.Version default to "dev" instead of hardcoded version
- Add version embedding to server build in Dockerfile
- Add version embedding to host-agent builds (all platforms)
- Add version embedding to sensor-proxy builds (all platforms)

This ensures:
1. Server's /api/agent/version endpoint returns correct v4.26.0
2. Downloaded agent binaries have matching embedded versions
3. Dev builds skip auto-update (Version="dev")
4. No version mismatch triggers infinite restart loops

Related to #631
2025-11-06 11:42:52 +00:00
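The "dev" default plus link-time override pattern looks like this (the package path in the comment is illustrative, not Pulse's actual layout):

```go
package main

import "fmt"

// Version defaults to "dev" so locally-built binaries skip auto-update;
// release builds override it at link time, e.g.:
//
//	go build -ldflags "-X main.Version=4.26.1" ./cmd/pulse
//
// Forgetting the -X flag (as the Docker build did) leaves the binary
// reporting "dev" and breaks version-match checks.
var Version = "dev"

func main() {
	fmt.Println("pulse", Version)
}
```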
rcourtman
c638a8c28c Fix checksum verification failure during installation
Related to #639

Users reported "Failed to download checksum for Pulse release" errors
during installation. The root cause was a mismatch between what the
build system generates and what the installer expects:

- install.sh downloads individual .sha256 files (e.g., pulse-v4.25.0-linux-amd64.tar.gz.sha256)
- build-release.sh only created a single checksums.txt file

This commit updates build-release.sh to generate both:
1. Individual .sha256 files for each asset (required by install.sh)
2. Combined checksums.txt for manual verification and signing

This maintains backwards compatibility with the installer while keeping
the aggregated checksums.txt for power users and GPG signing.
2025-11-06 11:21:49 +00:00
rcourtman
20854256c3 Fix VM migration issue where custom alert thresholds are lost
Resolves #641

## Problem
When a VM migrates between Proxmox nodes, Pulse was treating it as a new
resource and discarding custom alert threshold overrides. This occurred
because guest IDs included the node name (e.g., `instance-node-VMID`),
causing the ID to change when the VM moved to a different node.

Users reported that after migrating a VM, previously disabled alerts
(e.g., memory threshold set to 0) would resume firing.

## Root Cause
Guest IDs were constructed as:
- Standalone: `node-VMID`
- Cluster: `instance-node-VMID`

When a VM migrated from node1 to node2, the ID changed from
`instance-node1-100` to `instance-node2-100`, causing:
- Alert threshold overrides to be orphaned (keyed by old ID)
- Guest metadata (custom URLs, descriptions) to be orphaned
- Active alerts to reference the wrong resource ID

## Solution
Changed guest ID format to be stable across node migrations:
- New format: `instance-VMID` (for both standalone and cluster)
- Retains uniqueness across instances while being node-independent
- Allows VMs to migrate freely without losing configuration

## Implementation

### Backend Changes
1. **Guest ID Construction** (`monitor_polling.go`):
   - Simplified to always use `instance-VMID` format
   - Removed node from the ID construction logic

2. **Alert Override Migration** (`alerts.go`):
   - Added lazy migration in `getGuestThresholds()`
   - Detects legacy ID formats and migrates to new format
   - Preserves user configurations automatically

3. **Guest Metadata Migration** (`guest_metadata.go`):
   - Added `GetWithLegacyMigration()` helper method
   - Called during VM/container polling to migrate metadata
   - Preserves custom URLs and descriptions

4. **Active Alerts Migration** (`alerts.go`):
   - Added migration logic in `LoadActiveAlerts()`
   - Translates legacy alert resource IDs to new format
   - Preserves alert acknowledgments across restarts

### Frontend Changes
5. **ID Construction Updates**:
   - `ThresholdsTable.tsx`: Updated fallback from `instance-node-vmid` to `instance-vmid`
   - `Dashboard.tsx`: Simplified guest ID construction
   - `GuestRow.tsx`: Updated `buildGuestId()` helper

## Migration Strategy
- **Lazy Migration**: Configs are migrated as guests are discovered
- **Backwards Compatible**: Old IDs are detected and automatically converted
- **Zero Downtime**: No manual intervention required
- **Persisted**: Migrated configs are saved on next config write cycle

## Testing Recommendations
After deployment:
1. Verify existing alert overrides still apply
2. Test VM migration - confirm thresholds persist
3. Check guest metadata (custom URLs) survive migration
4. Verify active alerts maintain acknowledgment state

## Related
- Addresses similar issues with guest metadata and active alert tracking
- Lays groundwork for any future guest-specific configuration features
- Aligns with project philosophy: correctness and UX over implementation complexity
2025-11-06 10:27:15 +00:00
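The lazy-migration lookup described in the implementation section can be sketched as a two-key fallback; the helper names and the map-of-strings store here are illustrations, not Pulse's actual types.

```go
package main

import "fmt"

// guestID builds the new node-independent ID: instance-VMID.
func guestID(instance string, vmid int) string {
	return fmt.Sprintf("%s-%d", instance, vmid)
}

// legacyGuestID builds the pre-fix ID that embedded the node name.
func legacyGuestID(instance, node string, vmid int) string {
	return fmt.Sprintf("%s-%s-%d", instance, node, vmid)
}

// lookupThresholds sketches the lazy migration in getGuestThresholds():
// if nothing is stored under the new ID, fall back to the legacy ID and
// move the entry so future lookups (and the next config write) use the
// migration-stable key.
func lookupThresholds(overrides map[string]string, instance, node string, vmid int) (string, bool) {
	newID := guestID(instance, vmid)
	if v, ok := overrides[newID]; ok {
		return v, true
	}
	oldID := legacyGuestID(instance, node, vmid)
	if v, ok := overrides[oldID]; ok {
		overrides[newID] = v // migrate in place
		delete(overrides, oldID)
		return v, true
	}
	return "", false
}

func main() {
	overrides := map[string]string{"pve-node1-100": "memory:0"}
	v, _ := lookupThresholds(overrides, "pve", "node1", 100)
	fmt.Println(v, "now stored as:", overrides["pve-100"])
}
```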
rcourtman
dfe960deb4 Fix container SSH detection and improve troubleshooting for issue #617
Related to #617

This fixes a misconfiguration scenario where Docker containers could
attempt direct SSH connections (producing [preauth] log spam) instead
of using the sensor proxy.

Changes:
- Fix container detection to check PULSE_DOCKER=true in addition to
  system.InContainer() heuristics (both temperature.go and config_handlers.go)
- Upgrade temperature collection log from Error to Warn with actionable
  guidance about mounting the proxy socket
- Add Info log when dev mode override is active so operators understand
  the security posture
- Add troubleshooting section to docs for SSH [preauth] logs from containers

The container detection was inconsistent - monitor.go checked both flags
but temperature.go and config_handlers.go only checked InContainer().
Now all locations consistently check PULSE_DOCKER || InContainer().
2025-11-06 09:57:53 +00:00
rcourtman
12dc8693c4 Add NVIDIA GPU temperature monitoring support (nouveau driver)
- Add nouveau chip recognition to temperature parser
- Implement parseNouveauGPUTemps() for NVIDIA GPU temps via nouveau driver
- Map "GPU core" sensor to edge temperature field
- Supports systems using open-source nouveau driver

This complements the AMD GPU support added previously. Systems using
the nouveau driver will now see NVIDIA GPU temperatures in the
dashboard. For proprietary nvidia driver users, GPU temps are not
available via lm-sensors and would require nvidia-smi integration.
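
A simplified sketch of the "GPU core" mapping: scan lm-sensors style output for the nouveau label and return it as the edge temperature. The real parser handles more chips, labels, and units than this:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseNouveauGPUTemp looks for a nouveau "GPU core:" line in sensors output
// and maps its value to the edge temperature field. Sketch only; the actual
// parseNouveauGPUTemps in Pulse differs in detail.
func parseNouveauGPUTemp(output string) (edge float64, ok bool) {
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 3 || fields[0] != "GPU" || fields[1] != "core:" {
			continue
		}
		// Values look like "+45.0°C"; strip the sign and unit.
		val := strings.TrimSuffix(strings.TrimPrefix(fields[2], "+"), "°C")
		if t, err := strconv.ParseFloat(val, 64); err == nil {
			return t, true
		}
	}
	return 0, false
}

func main() {
	out := "nouveau-pci-0100\nAdapter: PCI adapter\nGPU core:     +45.0°C  (high = +95.0°C)\n"
	fmt.Println(parseNouveauGPUTemp(out))
}
```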
2025-11-06 00:24:42 +00:00
rcourtman
d62259ffa7 Add AMD GPU temperature monitoring support
Related to #600

- Add GPU field to Temperature model with edge, junction, and mem sensors
- Add amdgpu chip recognition to temperature parser
- Implement parseGPUTemps() to extract AMD GPU temperature data
- Update frontend TypeScript types to include GPU temperatures
- Display GPU temps in node table tooltip alongside CPU temps
- Set hasGPU flag when GPU data is available

This enables temperature monitoring for AMD GPUs (amdgpu sensors)
that was previously being collected via SSH but silently discarded
during parsing.
2025-11-06 00:19:04 +00:00
rcourtman
5b89b2371a Make pulse-sensor-proxy resilient to read-only filesystems
Related to #637

The sensor-proxy was failing to start on systems with read-only filesystems
because audit logging required a writable /var/log/pulse/sensor-proxy directory.

Changes:
- Modified newAuditLogger() to automatically fall back to stderr (systemd journal)
  if the audit log file cannot be opened
- Removed error return from newAuditLogger() since it now always succeeds
- Added warning logs when fallback mode is used to alert operators
- Updated tests to handle the new signature
- Added better debugging to audit log tests

This allows the sensor-proxy to run on:
- Immutable/read-only root filesystems
- Hardened systems with restricted /var mounts
- Containerized environments with limited write access

Audit events are still captured via systemd journal when file logging is
unavailable, maintaining the security audit trail.
2025-11-06 00:18:51 +00:00
rcourtman
af55362009 Fix inflated RAM usage reporting for LXC containers
Related to #553

## Problem

LXC containers showed inflated memory usage (e.g., 90%+ when actual usage was 50-60%,
96% when actual was 61%) because the code used the raw `mem` value from Proxmox's
`/cluster/resources` API endpoint. This value comes from cgroup `memory.current` which
includes reclaimable cache and buffers, making memory appear nearly full even when
plenty is available.

## Root Cause

- **Nodes**: Had sophisticated cache-aware memory calculation with RRD fallbacks
- **VMs (qemu)**: Had detailed memory calculation using guest agent meminfo
- **LXCs**: Naively used `res.Mem` directly without any cache-aware correction

The Proxmox cluster resources API's `mem` field for LXCs includes cache/buffers
(from cgroup memory accounting), which should be excluded for accurate "used" memory.

## Solution

Implement cache-aware memory calculation for LXC containers by:

1. Adding `GetLXCRRDData()` method to fetch RRD metrics for LXC containers from
   `/nodes/{node}/lxc/{vmid}/rrddata`
2. Using RRD `memavailable` to calculate actual used memory (total - available)
3. Falling back to RRD `memused` if `memavailable` is not available
4. Only using cluster resources `mem` value as last resort

This matches the approach already used for nodes and VMs, providing consistent
cache-aware memory reporting across all resource types.

## Changes

- Added `GuestRRDPoint` type and `GetLXCRRDData()` method to pkg/proxmox
- Added `GetLXCRRDData()` to ClusterClient for cluster-aware operations
- Modified LXC memory calculation in `pollPVEInstance()` to use RRD data when available
- Added guest memory snapshot recording for LXC containers
- Updated test stubs to implement the new interface method

## Testing

- Code compiles successfully
- Follows the same proven pattern used for nodes and VMs
- Includes diagnostic snapshot recording for troubleshooting
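
The three-step fallback chain can be sketched as follows (values in bytes; a nil pointer stands in for an absent RRD field; names are illustrative, not the exact Pulse code):

```go
package main

import "fmt"

// lxcUsedMemory implements the priority order described above:
// RRD memavailable, then RRD memused, then the raw cluster resources value.
func lxcUsedMemory(total uint64, rrdAvailable, rrdUsed *uint64, clusterMem uint64) uint64 {
	switch {
	case rrdAvailable != nil && *rrdAvailable <= total:
		// Preferred: cache-aware "used" = total - available.
		return total - *rrdAvailable
	case rrdUsed != nil:
		return *rrdUsed
	default:
		// Last resort: raw cgroup value, which includes reclaimable cache.
		return clusterMem
	}
}

func main() {
	avail := uint64(2 << 30) // RRD says 2 GiB is available
	total := uint64(4 << 30)
	// Raw cluster mem (3 GiB) would report 75%; cache-aware used is 2 GiB.
	fmt.Println(lxcUsedMemory(total, &avail, nil, 3<<30))
}
```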
2025-11-06 00:16:18 +00:00
rcourtman
88ad986877 Revert "Hide Settings tab when authentication is not configured"
This reverts commit d5a1e3d07729bad61743e8645a636e2545e11038.
2025-11-05 23:21:34 +00:00
rcourtman
7936808193 Add custom display name support for Docker hosts
This lets users assign custom display names to Docker hosts, mirroring the
existing functionality for Proxmox nodes. It addresses the issue where
multiple Docker hosts with identical hostnames but different IPs/domains
cannot be easily distinguished in the UI.

Backend changes:
- Add CustomDisplayName field to DockerHost model (internal/models/models.go:201)
- Update UpsertDockerHost to preserve custom display names across updates (internal/models/models.go:1110-1113)
- Add SetDockerHostCustomDisplayName method to State for updating names (internal/models/models.go:1221-1235)
- Add SetDockerHostCustomDisplayName method to Monitor (internal/monitoring/monitor.go:1070-1088)
- Add HandleSetCustomDisplayName API handler (internal/api/docker_agents.go:385-426)
- Route /api/agents/docker/hosts/{id}/display-name PUT requests (internal/api/docker_agents.go:117-120)

Frontend changes:
- Add customDisplayName field to DockerHost TypeScript interface (frontend-modern/src/types/api.ts:136)
- Add MonitoringAPI.setDockerHostDisplayName method (frontend-modern/src/api/monitoring.ts:151-187)
- Update getDisplayName function to prioritize custom names (frontend-modern/src/components/Settings/DockerAgents.tsx:84-89)
- Add inline editing UI with save/cancel buttons in Docker Agents settings (frontend-modern/src/components/Settings/DockerAgents.tsx:1349-1413)
- Update sorting to use custom display names (frontend-modern/src/components/Docker/DockerHosts.tsx:58-59)
- Update DockerHostSummaryTable to display custom names (frontend-modern/src/components/Docker/DockerHostSummaryTable.tsx:40-42, 87, 120, 254)

Users can now click the edit icon next to any Docker host name in Settings > Docker Agents
to set a custom display name. The custom name will be preserved across agent reconnections
and takes priority over the hostname reported by the agent.

Related to #623
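
The preservation logic in UpsertDockerHost has roughly this shape (a sketch with simplified types; the real State method holds more fields and locking):

```go
package main

import "fmt"

// DockerHost is a pared-down stand-in for the real model.
type DockerHost struct {
	ID                string
	Hostname          string
	CustomDisplayName string
}

// upsertDockerHost replaces the stored record with the incoming agent report,
// except the user-set CustomDisplayName, which is carried over so it survives
// agent reconnections.
func upsertDockerHost(existing map[string]DockerHost, incoming DockerHost) DockerHost {
	if prev, ok := existing[incoming.ID]; ok && incoming.CustomDisplayName == "" {
		incoming.CustomDisplayName = prev.CustomDisplayName
	}
	existing[incoming.ID] = incoming
	return incoming
}

func main() {
	hosts := map[string]DockerHost{}
	upsertDockerHost(hosts, DockerHost{ID: "h1", Hostname: "docker", CustomDisplayName: "Backup NAS"})
	// The agent reconnects and reports only its hostname; the custom name survives.
	h := upsertDockerHost(hosts, DockerHost{ID: "h1", Hostname: "docker"})
	fmt.Println(h.CustomDisplayName)
}
```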
2025-11-05 23:18:03 +00:00
rcourtman
0647a76c55 Fix temperature monitoring SSH key availability in containerized setup flow
Addresses issue #635 where users encounter "can't find the SSH key" errors
when enabling temperature monitoring during automated PVE setup with Pulse
running in Docker.

Root cause:
- Setup script embeds SSH keys at generation time (when downloaded)
- For containerized Pulse, keys are empty until pulse-sensor-proxy is installed
- Script auto-installs proxy, but didn't refresh keys after installation
- This caused temperature monitoring setup to fail with confusing errors

Changes:
1. After successful proxy installation, immediately fetch and populate the
   proxy's SSH public key (lines 4068-4080)
2. Update bash variables SSH_SENSORS_PUBLIC_KEY and SSH_SENSORS_KEY_ENTRY
   so temperature monitoring setup can proceed in the same script run
3. Improve error messaging when keys aren't available (lines 4424-4453):
   - Clear explanation of containerized Pulse requirements
   - Step-by-step instructions for container restart and verification
   - Separate guidance for bare-metal vs containerized deployments

Flow improvements:
- Initial run: Proxy installs → keys fetched → temp monitoring configures
- Rerun after container restart: Keys fetched at script start → works
- Both scenarios now handled correctly

Related to #635
2025-11-05 23:11:45 +00:00
rcourtman
3d1c910daa Hide Settings tab when authentication is not configured
Related to #636

When authentication is not configured (hasAuth() returns false), the
Settings tab is now automatically hidden from the web interface. This
provides a cleaner monitoring-only view for unauthenticated deployments
where users only need to check the health of their environment.

The Settings icon beside the Alerts tab will only appear when
authentication is properly configured via PULSE_AUTH_USER/PASS,
API tokens, proxy auth, or OIDC.

Changes:
- Modified utilityTabs in App.tsx to conditionally include Settings
  based on hasAuth() signal
- Updated CONFIGURATION.md to document this UI behavior
2025-11-05 23:10:20 +00:00
rcourtman
8ca31003a0 docs: document TLS certificate file permissions for HTTPS setup
Add comprehensive documentation for HTTPS/TLS configuration including:
- File ownership and permission requirements (pulse user)
- Common troubleshooting steps for startup failures
- Complete setup examples for systemd and Docker
- Validation commands for certificate/key verification

Related to discussion #634
2025-11-05 23:08:02 +00:00
rcourtman
d28cfed3c7 Improve temperature monitoring setup messaging for containerized deployments
When Pulse is running in a container and the SSH key is not available,
provide clearer guidance about the pulse-sensor-proxy requirement and
include documentation link for Docker deployments.

This helps users understand that containerized Pulse needs the host-side
sensor proxy to access temperature data from Proxmox hosts.
2025-11-05 23:05:47 +00:00