Pulse

vrr/Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-04-28 19:41:17 +00:00

Author	SHA1	Message	Date
rcourtman	4120d75359	Surface shared cluster-only storage in alerts (#1341)	2026-03-30 19:25:54 +01:00
rcourtman	03b620a429	Parallelize legacy Proxmox VM guest-agent polling (#1319)	2026-03-27 16:20:48 +00:00
rcourtman	0abbb8ba92	Rotate legacy guest-agent VM priority across polls (#1319)	2026-03-27 16:17:48 +00:00
rcourtman	01f916dcb5	Use linked host-agent disk data for guest fallback (#1319)	2026-03-27 15:56:20 +00:00
rcourtman	d4242d9a13	Fix ZFS pool attachment in storage frontend (discussion #1351)	2026-03-27 14:59:52 +00:00
rcourtman	2a4432048a	Continue guest-agent polling after transient status failures (#1319)	2026-03-27 14:50:28 +00:00
rcourtman	01e4227ec7	Preserve cached guest metadata in legacy PVE VM poll (#1319)	2026-03-27 14:35:40 +00:00
rcourtman	e508bc3380	Prefer sane VM free-mem fallback over false full-usage samples (#1319)	2026-03-27 13:55:07 +00:00
rcourtman	8fc41f774c	Keep normalized Windows guest disks in efficient VM polling (#1319)	2026-03-27 13:51:55 +00:00
rcourtman	627181566a	Allow SSH temperature fallback when host agent lacks SMART	2026-03-26 22:40:43 +00:00
rcourtman	ae6b663e95	Attach ZFS pools for dataset-backed storages	2026-03-26 22:29:32 +00:00
rcourtman	92e6075ee4	Fix ZFS pool matching for local-zfs storages	2026-03-26 09:09:17 +00:00
rcourtman	e9bbc35bae	Stabilize repeated low-trust VM memory fallbacks (#1319)	2026-03-26 00:23:29 +00:00
rcourtman	2196327769	Preserve VM guest metadata across transient agent gaps (#1319)	2026-03-26 00:12:19 +00:00
rcourtman	4ad7e51875	Prefer linked host disk metrics for v5 Proxmox nodes	2026-03-25 16:54:00 +00:00
rcourtman	7dab977d91	Add split memory bar showing Used | Cache | Free segments (#1302) Show reclaimable buff/cache as a distinct amber segment between used (green) and free (gray) in the memory bar. This explains why Pulse's memory percentage differs from Proxmox: Pulse reports cache-aware usage (MemAvailable) while Proxmox includes cache as used (Total-Free). Backend: add Cache field to Memory model, derived from MemInfo (Available - Free). Only uses MemInfo.Free (not FreeMem fallback) to avoid inflating cache by the balloon gap on ballooned VMs. Frontend: StackedMemoryBar renders three segments with tooltip breakdown. Tooltip Free accounts for balloon limit when active. Percentage label and alerts remain cache-aware (unchanged).	2026-03-10 10:16:14 +00:00
rcourtman	abbd0df609	Fix disk metric spikes when guest agent intermittently fails (#1319) Carry forward previous cycle's disk data when the QEMU guest agent times out or errors, instead of falling back to Proxmox cluster/resources which always reports 0 for VM disk usage. Applied to both polling paths (pollVMsAndContainersEfficient and pollVMsWithNodes) with safety guards against uint64 underflow and permanent-failure exclusions.	2026-03-09 18:23:15 +00:00
rcourtman	572520ebc6	Promote guest-agent /proc/meminfo fallback for accurate VM memory (#1270) Move the guest-agent file-read of /proc/meminfo earlier in the memory fallback chain so it runs before RRD, giving real-time MemAvailable that correctly excludes reclaimable buff/cache on Linux VMs. Also add VM.GuestAgent.FileRead permission for PVE 9 and fix install.sh to use comma-separated privilege strings.	2026-03-09 10:04:28 +00:00
rcourtman	ff1bbe2fb8	Guard per-VM guest agent calls with timeout and panic recovery (#1319) A broken or hung qemu-agent on one VM could stall the entire polling loop, preventing higher-VMID VMs from being detected. Wrap all guest agent work in a 10s per-VM budget with panic recovery, and add a 2s timeout to GetVMStatus in the efficient poller to match the legacy path.	2026-03-07 22:30:18 +00:00
rcourtman	499ab812e3	Fix post-release regressions and lock v5 to single-tenant runtime	2026-03-05 23:46:35 +00:00
rcourtman	a4571f580b	fix(monitoring): harden VM memory selection and flag repeated VM usage	2026-03-03 16:19:17 +00:00
rcourtman	60bdc9a101	fix(memory): skip meminfo-derived when balloon lacks cache metrics (#1302) When the balloon driver reports Free but not Buffers or Cached, the meminfo-derived fallback computed memAvailable = Free alone, counting all reclaimable page cache as used memory. This caused Linux VMs to show wildly inflated usage (e.g. 93% when actual is 21%). Now meminfo-derived requires at least one cache metric (Buffers > 0 or Cached > 0) before trusting the value. When missing, the code falls through to RRD/guest-agent/Total-Used fallbacks which provide accurate cache-aware data. Both efficient and traditional polling paths are now consistent.	2026-03-02 11:48:18 +00:00
rcourtman	32746e2d2a	fix(monitoring): use RRD memavailable fallback when PVE node cache metrics missing (#1270) When Proxmox /nodes/{node}/status returns only total/used/free without available/buffers/cached, EffectiveAvailable() returns Free (non-zero), causing the RRD fallback gate to be skipped. This results in inflated node memory where cache/buffers are counted as "used." Widen the RRD fallback condition from requiring effectiveAvailable == 0 to triggering whenever missingCacheMetrics is true. Add negative caching for failed RRD lookups (2-minute backoff) to avoid repeated retries.	2026-02-21 22:47:20 +00:00
rcourtman	0ae2806f18	fix(memory): add guest agent /proc/meminfo fallback to avoid VM memory inflation (#1270) Proxmox status.Mem includes page cache as "used" memory, inflating reported VM usage. The existing fallbacks (balloon meminfo, RRD, linked host agent) were frequently unavailable, causing most VMs to fall through to the inflated status-mem source. Adds a new last-resort fallback that reads /proc/meminfo via the QEMU guest agent file-read endpoint to get accurate MemAvailable. Results are cached (60s positive, 5min negative backoff for unsupported VMs). Also fixes: RRD memavailable fallback missing from traditional polling path, cache key collisions in multi-PVE setups, FreeMem underflow guard inconsistency, and integer overflow in kB-to-bytes conversion.	2026-02-20 13:31:52 +00:00
rcourtman	fb7582c7e4	fix(memory): use linked Pulse host agent memory to avoid VM inflation (#1270) When no guest agent MemInfo or RRD data is available, prefer the linked Pulse host agent's memory (read from /proc/meminfo via gopsutil, which excludes page cache) over Proxmox's status.Mem (total - free, inflated by reclaimable cache). Applied to both efficient and traditional polling paths. Diagnostic fields added to VMMemoryRaw for visibility.	2026-02-19 19:04:19 +00:00
rcourtman	d4ff967815	fix: scope shared storage aggregation to per-instance to prevent cross-instance merging The shared storage deduplication key was just the storage name, causing storages with the same name from different Proxmox instances (or PVE + PBS) to be incorrectly merged into a single entry. This made one random host appear to have all storages from all instances. Include the instance name in the aggregation key so shared storage is only merged within the same Proxmox cluster/instance. Fixes #1246	2026-02-11 09:18:09 +00:00
rcourtman	902bdd92c2	fix: prefer status-mem over status-freemem for VM memory calculation Proxmox's FreeMem field reports free memory relative to the balloon's guest-visible total (total_mem), not relative to MaxMem. When ballooning is active and the VM's memory has been reduced, subtracting FreeMem from MaxMem produces wildly inflated usage (e.g. 97% when actual usage is 20%). Proxmox's Mem field is already calculated as (total_mem - free_mem), giving the correct used bytes regardless of balloon state. Swap the priority so Mem is checked before FreeMem. Related to #1185	2026-02-04 12:08:33 +00:00
rcourtman	19a67dd4f3	Update core infrastructure components Config: - AI configuration improvements - API tokens handling - Persistence layer updates Host Agent: - Command execution improvements - Better test coverage Infrastructure Discovery: - Service improvements - Enhanced test coverage Models: - State snapshot updates - Model improvements Monitoring: - Polling improvements - Guest config handling - Storage config support WebSocket: - Hub tenant test updates Service Discovery: - New service discovery module	2026-01-28 16:52:35 +00:00
rcourtman	2e0da42a81	chore: reliability and maintenance improvements Host agent: - Add SHA256 checksum verification for downloaded binaries - Verify checksum file matches expected bundle filename WebSocket: - Add write failure tracking with graceful disconnection - Increase write deadline to 30s for large state payloads - Better handling for slow clients (Raspberry Pi, slow networks) Monitoring: - Remove unused temperature proxy imports - Add monitor polling improvements - Expand test coverage Other: - Update package.json dependencies - Fix generate-release-notes.sh path handling - Minor reporting engine cleanup	2026-01-22 00:45:04 +00:00
rcourtman	ebc29b4fdb	feat: show pending apt updates for Proxmox nodes (#1083) - Add PendingUpdates and PendingUpdatesCheckedAt fields to Node model - Add GetNodePendingUpdates method to Proxmox client (calls /nodes/{node}/apt/update) - Add 30-minute polling cache to avoid excessive API calls - Add pendingUpdates to frontend Node type - Add color-coded badge in NodeSummaryTable (yellow: 1-9, orange: 10+) - Update test stubs for interface compliance Requires Sys.Audit permission on Proxmox API token to read apt updates.	2026-01-21 10:53:36 +00:00
rcourtman	103eb9c3e0	feat(monitoring): auto-detect Docker inside LXC containers Adds automatic Docker detection for Proxmox LXC containers: - New HasDocker and DockerCheckedAt fields on Container model - Docker socket check via connected agents on first run, restart, or start - Parallel checking with timeouts for efficiency - Caches results and only re-checks after state transitions This enables the AI to know which LXC containers are Docker hosts for better infrastructure guidance.	2026-01-17 14:42:52 +00:00
rcourtman	1f4f0472b0	fix: use configured memory (MaxMem) instead of balloon for VM total Previously, when memory ballooning was active on a VM, Pulse would use the balloon value as the total memory instead of the configured MaxMem. This caused confusing displays where a 4GB VM with 1GB balloon would show "94% (966MB/1GB)" instead of "24% (966MB/4GB)". The balloon value is still tracked in memory.balloon for the frontend's yellow balloon marker visualization, but no longer replaces the total. Fixes #1070	2026-01-10 15:37:45 +00:00
rcourtman	2a8f55d719	feat(enterprise): add Advanced Reporting and Audit Webhooks integration This commit adds enterprise-grade reporting and audit capabilities: Reporting: - Refactored metrics store from internal/ to pkg/ for enterprise access - Added pkg/reporting with shared interfaces for report generation - Created API endpoint: GET /api/admin/reports/generate - New ReportingPanel.tsx for PDF/CSV report configuration Audit Webhooks: - Extended pkg/audit with webhook URL management interface - Added API endpoint: GET/POST /api/admin/webhooks/audit - New AuditWebhookPanel.tsx for webhook configuration - Updated Settings.tsx with Reporting and Webhooks tabs Server Hardening: - Enterprise hooks now execute outside mutex with panic recovery - Removed dbPath from metrics Stats API to prevent path disclosure - Added storage metrics persistence to polling loop Documentation: - Updated README.md feature table - Updated docs/API.md with new endpoints - Updated docs/PULSE_PRO.md with feature descriptions - Updated docs/WEBHOOKS.md with audit webhooks section	2026-01-09 21:31:49 +00:00
rcourtman	3e2824a7ff	feat: remove Enterprise badges, simplify Pro upgrade prompts - Replace barrel import in AuditLogPanel.tsx to fix ad-blocker crash - Remove all Enterprise/Pro badges from nav and feature headers - Simplify upgrade CTAs to clean 'Upgrade to Pro' links - Update docs: PULSE_PRO.md, API.md, README.md, SECURITY.md - Align terminology: single Pro tier, no separate Enterprise tier Also includes prior refactoring: - Move auth package to pkg/auth for enterprise reuse - Export server functions for testability - Stabilize CLI tests	2026-01-09 16:51:08 +00:00
rcourtman	5c4399d69f	feat(agent): add DisableCeph toggle, report_ip remote config, and improved IP detection (#929)	2026-01-09 14:45:29 +00:00
rcourtman	568aac6bd0	fix: multiple triage fixes for stability and correctness 1. Use correct mutex (diagMu) in cleanupDiagnosticSnapshots to prevent "concurrent map iteration and map write" panics (Fixes #1063) 2. Use cluster name for storage instance comparison in UpdateStorageForInstance to prevent storage duplication in clustered Proxmox setups (Fixes #1062) 3. Fix KUBECONFIG unbound variable error in install.sh by using ${KUBECONFIG:-} default parameter expansion (Fixes #1065)	2026-01-08 22:54:33 +00:00
rcourtman	06ebaf50b2	fix: use consistent ID for shared storage to prevent duplication (#1049) Shared storage was duplicating across polling cycles because the ID included the node name of whichever node reported it first. When a different node reported first on the next cycle, a new ID was created. This fix updates the shared storage aggregation to use a consistent ID format (instance-cluster-storageName) that doesn't include the node name. Closes #1049. Thanks to @siccous for the report and initial investigation.	2026-01-08 21:29:24 +00:00
rcourtman	9cfcdbb247	fix: Use per-node shared flag for storage deduplication The storage deduplication logic only checked cluster config's Shared flag, but this required the cluster config API call to succeed. When the per-node storage API already returns shared=1 (as the user verified), we should use that directly. Now we check three sources for shared storage detection: 1. Per-node API shared flag (storage.Shared) 2. Cluster config shared flag (if available) 3. Storage type heuristics (NFS, RBD, PBS, etc.) Related to #1049	2026-01-07 10:16:23 +00:00
rcourtman	96d06da0d7	fix: Deduplicate shared storages (NFS, RBD, PBS, etc) in cluster view Shared storages were appearing multiple times (once per node) because the deduplication logic only checked the Proxmox `Shared` flag. Many storage types are inherently cluster-wide but don't set this flag: - RBD (Ceph block storage) - CephFS - PBS (Proxmox Backup Server) - GlusterFS - NFS - CIFS/SMB - iSCSI Now we detect shared storage based on both the Shared flag AND the storage type. Inherently shared storage types are deduplicated and shown once with a "cluster" node designation. Related to #1049	2026-01-06 17:44:52 +00:00
rcourtman	ed78509f92	Fix flaky tests and improve coverage across alerts, api, and config packages - Fix deadlock and race conditions in internal/alerts - Add comprehensive error path tests for internal/config - Fix 401 handling in internal/api - Fix Docker Swarm task filtering test logic	2026-01-03 18:36:17 +00:00
rcourtman	800fab10c2	fix: Use LinkedNodeID for temperature matching to fix duplicate hostname bug When two Proxmox nodes have the same hostname (e.g., 'px1' on different IPs), the getHostAgentTemperature function was matching by hostname alone, causing both nodes to show temperature from whichever host agent appeared first. The fix: - Added getHostAgentTemperatureByID that first tries matching by LinkedNodeID (the unique node ID) before falling back to hostname matching - Updated the caller to pass modelNode.ID for precise matching - Maintains backwards compatibility for setups where linking hasn't occurred Related to #891	2025-12-25 10:00:19 +00:00
rcourtman	968e0a7b3d	fix: reduce syslog flooding by downgrading routine logs to debug level Addresses issue #861 - syslog flooded on docker host Many routine operational messages were being logged at INFO level, causing excessive log volume when monitoring multiple VMs/containers. These messages are now logged at DEBUG level: - Guest threshold checking (every guest, every poll cycle) - Storage threshold checking (every storage, every poll cycle) - Host agent linking messages - Filesystem inclusion in disk calculation - Guest agent disk usage replacement - Polling start/completion messages - Alert cleanup and save messages Users can set LOG_LEVEL=debug to see these messages if needed for troubleshooting. The default INFO level now produces significantly less log output. Also updated documentation in CONFIGURATION.md and DOCKER.md to: - Clarify what each log level includes - Add tip about using LOG_LEVEL=warn for minimal logging	2025-12-18 23:27:32 +00:00
rcourtman	c91307be94	fix: guest URL icon now appears/disappears immediately after AI sets/removes it The issue was a SolidJS reactivity problem in the Dashboard component. When guestMetadata signal was accessed inside a For loop callback and assigned to a plain variable, SolidJS lost reactive tracking. Changed from: const metadata = guestMetadata()[guestId] || ... customUrl={metadata?.customUrl} To: const getMetadata = () => guestMetadata()[guestId] || ... customUrl={getMetadata()?.customUrl} This ensures SolidJS properly tracks the signal dependency when the getter function is called directly in JSX props.	2025-12-18 14:42:47 +00:00
rcourtman	397871629c	fix: cluster-aware guest deduplication and multi-agent token binding - Add cluster-aware guest ID generation (clusterName-VMID instead of instanceName-VMID) to prevent duplicate VMs/containers when multiple cluster nodes are monitored - Add cluster deduplication at registration time - when a node is added that belongs to an already-configured cluster, merge as endpoint instead of creating duplicate - Add startup consolidation to automatically merge duplicate cluster instances - Change host agent token binding from agent GUID to hostname, allowing: - Multiple host agents to share a token (each bound by hostname) - Agent reinstalls on same host without token conflicts - Remove 12-character password minimum requirement - Remove emoji from auto-registration success message - Fix grouped view node lookup to support both cluster-aware node IDs (clusterName-nodeName) and legacy guest grouping keys (instance-nodeName) Fixes duplicate guests appearing when agents are installed on multiple cluster nodes. Also improves multi-agent UX by allowing shared tokens.	2025-12-14 10:16:17 +00:00
rcourtman	c7361362b3	fix: Robust OCI container detection with state persistence Backend: - Seed OCI classification from previous state so containers never 'downgrade' to LXC if config fetching intermittently fails - Prevent type regression in recordGuestSnapshot when OCI was previously detected - Move metrics zeroing before snapshot recording for cleaner flow Frontend: - Add isOCIContainer() memo that checks both type and isOci flag - Use isOCI helper in Dashboard.tsx for AI context building - Include oci-container type in useResources container conversion - Preserve isOci and osTemplate fields through legacy conversion This ensures OCI containers retain their classification even when Proxmox API permissions or transient errors prevent config reads.	2025-12-12 20:06:39 +00:00
rcourtman	fa13919987	fix(ai-chat): Display messages chronologically in AI chatbot - Add 'content' type to StreamDisplayEvent for tracking text chunks - Track content events in streamEvents array for chronological display - Update render to use Switch/Match for cleaner conditional rendering - Interleave thinking, tool calls, and content as they stream in - Add fallback for old messages without streamEvents for backwards compat Previously, tool/command outputs stayed at top while AI text responses accumulated at the bottom. Now all events appear in order like a normal chatbot.	2025-12-11 23:02:59 +00:00
rcourtman	927ac76bad	feat: AI integration, Docker metrics, RAID display, and infrastructure improvements - Add Claude OAuth authentication support with hybrid API key/OAuth flow - Implement Docker container historical metrics in backend and charts API - Add CEPH cluster data collection and new Ceph page - Enhance RAID status display with detailed tooltips and visual indicators - Fix host deduplication logic with Docker bridge IP filtering - Fix NVMe temperature collection in host agent - Add comprehensive test coverage for new features - Improve frontend sparklines and metrics history handling - Fix navigation issues and frontend reload loops	2025-12-09 09:29:27 +00:00
rcourtman	bcd7b550d4	AI Problem Solver implementation and various fixes - Implement 'Show Problems Only' toggle combining degraded status, high CPU/memory alerts, and needs backup filters - Add 'Investigate with AI' button to filter bar for problematic guests - Fix dashboard column sizing inconsistencies between bars and sparklines view modes - Fix PBS backups display and polling - Refine AI prompt for general-purpose usage - Fix frontend flickering and reload loops during initial load - Integrate persistent SQLite metrics store with Monitor - Fortify AI command routing with improved validation and logging - Fix CSRF token handling for note deletion - Debug and fix AI command execution issues - Various AI reliability improvements and command safety enhancements	2025-12-06 23:46:08 +00:00
rcourtman	8948e84fe5	feat: AI features, agent improvements, and host monitoring enhancements AI Chat Integration: - Multi-provider support (Anthropic, OpenAI, Ollama) - Streaming responses with markdown rendering - Agent command execution for remote troubleshooting - Context-aware conversations with host/container metadata Agent Updates: - Add --enable-proxmox flag for automatic PVE/PBS token setup - Improve auto-update with semver comparison (prevents downgrades) - Add updatedFrom tracking to report previous version after update - Reduce initial update check delay from 30s to 5s - Add agent version column to Hosts page table Host Metrics: - Add DiskIO stats collection (read/write bytes, ops, time) - Improve disk filtering to exclude Docker overlay mounts - Add RAID array monitoring via mdadm - Enhanced temperature sensor parsing Frontend: - New Agent Version column on Hosts overview table - Improved node modal with agent-first installation flow - Add DiskIO display in host drawer - Better responsive handling for metric bars	2025-12-05 10:37:02 +00:00
rcourtman	0bc58f678e	perf: Cache err.Error() in storage timeout error handling Cache err.Error() result in two locations: - monitor.go: storage query retry logic (2x calls to 1) - monitor_polling.go: storage timeout handling (2x calls to 1)	2025-12-02 15:39:37 +00:00
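
Several of the memory commits above (for example 7dab977d91) turn on the difference between cache-aware usage (Total - MemAvailable) and the Total - Free figure that counts page cache as used. The following is a minimal Go sketch of that arithmetic, not the code from the repository: the MemSegments type and the splitMemory and percent helpers are hypothetical names introduced for illustration only, and the underflow clamps are an assumption modeled on the "uint64 underflow" guards the log mentions.

```go
package main

import "fmt"

// MemSegments is a hypothetical type for illustration; it is not Pulse's Memory model.
type MemSegments struct {
	Used  uint64 // total - available (cache-aware usage)
	Cache uint64 // available - free (reclaimable buff/cache)
	Free  uint64 // free as reported by meminfo
}

// splitMemory derives the three bar segments from total, MemAvailable and MemFree.
// Inputs are clamped so the unsigned subtractions cannot underflow.
func splitMemory(total, available, free uint64) MemSegments {
	if available > total {
		available = total
	}
	if free > available {
		free = available
	}
	return MemSegments{
		Used:  total - available,
		Cache: available - free,
		Free:  free,
	}
}

// percent returns part as a percentage of total, or 0 when total is 0.
func percent(part, total uint64) float64 {
	if total == 0 {
		return 0
	}
	return float64(part) / float64(total) * 100
}

func main() {
	const gib = 1 << 30
	// Example VM: 8 GiB total, 6 GiB available, 2 GiB free (so 4 GiB is cache).
	seg := splitMemory(8*gib, 6*gib, 2*gib)
	fmt.Printf("cache-aware used: %.0f%%\n", percent(seg.Used, 8*gib))       // 25%
	fmt.Printf("total-free used:  %.0f%%\n", percent(8*gib-seg.Free, 8*gib)) // 75%
}
```

With these example numbers the same VM reads 25% under the cache-aware calculation and 75% under Total - Free, which is the kind of gap the commit messages describe.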


73 commits
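
The shared-storage commits in this log (96d06da0d7, 9cfcdbb247, 06ebaf50b2, d4ff967815) describe two ideas: treat a storage as shared if its Shared flag is set or its type is inherently cluster-wide, and give shared storage an ID of the form instance-cluster-storageName that leaves out the node name. The sketch below illustrates that approach in Go under stated assumptions. The Storage struct, isShared, and storageID are hypothetical names, the lowercase type strings are assumed spellings of the types listed in the commit message, and the per-node ID format used for non-shared storage is invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed spellings of the storage types the log calls inherently shared.
var inherentlyShared = map[string]bool{
	"rbd": true, "cephfs": true, "pbs": true, "glusterfs": true,
	"nfs": true, "cifs": true, "iscsi": true,
}

// Storage is a hypothetical, simplified record of one per-node storage report.
type Storage struct {
	Instance, Cluster, Node, Name, Type string
	Shared                              bool
}

// isShared combines the reported flag with the type heuristic.
func isShared(s Storage) bool {
	return s.Shared || inherentlyShared[strings.ToLower(s.Type)]
}

// storageID omits the node name for shared storage, so the ID stays the
// same no matter which node reports the storage first.
func storageID(s Storage) string {
	if isShared(s) {
		return fmt.Sprintf("%s-%s-%s", s.Instance, s.Cluster, s.Name)
	}
	// Hypothetical per-node format for local storage.
	return fmt.Sprintf("%s-%s-%s", s.Instance, s.Node, s.Name)
}

func main() {
	reports := []Storage{
		{Instance: "pve-a", Cluster: "lab", Node: "node1", Name: "backups", Type: "nfs"},
		{Instance: "pve-a", Cluster: "lab", Node: "node2", Name: "backups", Type: "nfs"},
		{Instance: "pve-a", Cluster: "lab", Node: "node1", Name: "local", Type: "dir"},
		{Instance: "pve-a", Cluster: "lab", Node: "node2", Name: "local", Type: "dir"},
	}
	seen := map[string]bool{}
	for _, s := range reports {
		id := storageID(s)
		if seen[id] {
			continue // already recorded via another node's report
		}
		seen[id] = true
		fmt.Println(id)
	}
}
```

In this example the NFS storage reported by both nodes collapses into one entry, while each node's local storage keeps its own ID. Because the instance name is part of the key, two instances that happen to use the same storage name are not merged.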