Pulse

vrr/Pulse

Mirror of https://github.com/rcourtman/Pulse.git, synced 2026-04-28 11:30:15 +00:00

Author	SHA1	Message	Date
rcourtman	177ae5f6da	Tighten integer and allocation bounds for CodeQL	2026-03-31 09:50:11 +01:00
rcourtman	9155480bbd	Use explicit integer bounds in Proxmox parsing	2026-03-31 09:43:04 +01:00
rcourtman	33efdc3fb5	Normalize outbound client and update URLs	2026-03-31 09:31:56 +01:00
rcourtman	e93c8b40ae	Fix CodeQL integer and audit findings	2026-03-28 13:33:48 +00:00
rcourtman	e306c0a461	Tolerate partial guest network address payloads (#1319)	2026-03-27 17:09:09 +00:00
rcourtman	81b0a567ce	Harden guest network interface parsing (#1319)	2026-03-27 17:05:34 +00:00
rcourtman	2ed4253573	Accept object-style single guest fsinfo results (#1319)	2026-03-27 16:33:41 +00:00
rcourtman	d11e3d8f2d	Use Ceph monmap and mgrmap counts in cluster summaries (#1319)	2026-03-27 16:23:57 +00:00
rcourtman	3d27c8f006	Accept object-style guest fsinfo disk metadata (#1319)	2026-03-27 15:24:40 +00:00
rcourtman	fcfa0c2903	Skip malformed guest fsinfo entries (#1319)	2026-03-27 15:23:13 +00:00
rcourtman	d4242d9a13	Fix ZFS pool attachment in storage frontend (discussion #1351)	2026-03-27 14:59:52 +00:00
rcourtman	b5629fb1df	Normalize Windows volume GUID fsinfo mountpoints (#1319)	2026-03-27 14:04:58 +00:00
rcourtman	b05d2b0489	Handle Windows fsinfo name fallback for guest disks (#1319)	2026-03-27 11:39:22 +00:00
rcourtman	1f332bee52	Support privileged fsinfo totals for guest disks (#1319)	2026-03-27 11:18:53 +00:00
rcourtman	1885bd02c0	Fix Proxmox tag color parsing (#1348)	2026-03-25 10:40:31 +00:00
rcourtman	3a02dd171b	fix(proxmox): add GetClusterOptions to ClusterClient for tag colour fetch	2026-03-15 19:51:20 +00:00
rcourtman	caff845c1a	fix(ui): use Proxmox tag colours from datacenter config Pulse was generating tag colours from a hash of the tag name instead of using the colours configured in Proxmox. Now polls /cluster/options once per PVE instance and merges the tag-style colour map into state, which the frontend uses as the first-priority colour source for tag badges. Falls back to the existing special-tag and hash-based colours when Proxmox hasn't set a custom colour for a tag.	2026-03-15 19:49:46 +00:00
rcourtman	0ae2806f18	fix(memory): add guest agent /proc/meminfo fallback to avoid VM memory inflation (#1270) Proxmox status.Mem includes page cache as "used" memory, inflating reported VM usage. The existing fallbacks (balloon meminfo, RRD, linked host agent) were frequently unavailable, causing most VMs to fall through to the inflated status-mem source. Adds a new last-resort fallback that reads /proc/meminfo via the QEMU guest agent file-read endpoint to get accurate MemAvailable. Results are cached (60s positive, 5min negative backoff for unsupported VMs). Also fixes: RRD memavailable fallback missing from traditional polling path, cache key collisions in multi-PVE setups, FreeMem underflow guard inconsistency, and integer overflow in kB-to-bytes conversion.	2026-02-20 13:31:52 +00:00
rcourtman	a54d71117b	fix(proxmox): prevent guest agent errors from marking endpoints unhealthy Backport of v6 commits a87c9950 and 347d7db1. Part 1 (a87c9950): Wrap the four guest agent c.get() errors with fmt.Errorf("guest agent ...: %w", err) so isVMSpecificError() correctly scopes them to the VM rather than the cluster endpoint. Part 2 (347d7db1): Replace the 20+ pattern blocklist in executeWithFailover with an allowlist via isEndpointConnectivityError(). Only true TCP/DNS/TLS failures mark an endpoint unhealthy. Any HTTP response from Proxmox — including 500 — proves the node is reachable and returns the error without affecting endpoint health.	2026-02-18 12:59:20 +00:00
rcourtman	efa916ee2a	fix(memory): correct memory reporting for Linux VMs and FreeBSD ZFS ARC Linux VM page cache (#1270): QEMU VM memory now falls back to Proxmox RRD's memavailable metric (which excludes reclaimable page cache) when the qemu-guest-agent doesn't provide MemInfo.Available. Previously the fallback was detailedStatus.Mem (total - MemFree), inflating usage to 80%+ on VMs with normal Linux page cache. Mirrors the existing LXC rrd-memavailable path. FreeBSD ZFS ARC (#1264, #1051): The host agent now reads kstat.zfs.misc.arcstats.size via SysctlRaw on FreeBSD and subtracts the ARC size from reported memory usage. ZFS ARC is reclaimable under memory pressure (like Linux SReclaimable) but gopsutil counts it as wired/non-reclaimable, causing false 90%+ memory alerts on TrueNAS and FreeBSD hosts. Build-tagged so it compiles cleanly on all platforms. Fixes #1270 Fixes #1264 Fixes #1051 (cherry picked from commit 94502f83ff9ffc6da28aaadc946a2f7d8b4e9bac)	2026-02-18 12:56:53 +00:00
rcourtman	815c990e85	fix(proxmox): avoid 403 on apt update checks	2026-02-09 20:28:09 +00:00
rcourtman	13a6f7750c	Minor updates to main and proxmox client	2026-01-28 16:52:50 +00:00
rcourtman	ebc29b4fdb	feat: show pending apt updates for Proxmox nodes (#1083) - Add PendingUpdates and PendingUpdatesCheckedAt fields to Node model - Add GetNodePendingUpdates method to Proxmox client (calls /nodes/{node}/apt/update) - Add 30-minute polling cache to avoid excessive API calls - Add pendingUpdates to frontend Node type - Add color-coded badge in NodeSummaryTable (yellow: 1-9, orange: 10+) - Update test stubs for interface compliance Requires Sys.Audit permission on Proxmox API token to read apt updates.	2026-01-21 10:53:36 +00:00
rcourtman	96b7370f7b	test: improve coverage for API, AI, Alerts, and Frontend Utils - Add comprehensive tests for internal/api/config_handlers.go (Phases 1-3) - Improve test coverage for AI tools, chat service, and session management - Enhance alert and notification tests (ResolvedAlert, Webhook) - Add frontend unit tests for utils (searchHistory, tagColors, temperature, url) - Add proximity client API tests	2026-01-20 15:52:39 +00:00
rcourtman	a6a8efaa65	test: Add comprehensive test coverage across packages New test files with expanded coverage: API tests: - ai_handler_test.go: AI handler unit tests with mocking - agent_profiles_tools_test.go: Profile management tests - alerts_endpoints_test.go: Alert API endpoint tests - alerts_test.go: Updated for interface changes - audit_handlers_test.go: Audit handler tests - frontend_embed_test.go: Frontend embedding tests - metadata_handlers_test.go, metadata_provider_test.go: Metadata tests - notifications_test.go: Updated for interface changes - profile_suggestions_test.go: Profile suggestion tests - saml_service_test.go: SAML authentication tests - sensor_proxy_gate_test.go: Sensor proxy tests - updates_test.go: Updated for interface changes Agent tests: - dockeragent/signature_test.go: Docker agent signature tests - hostagent/agent_metrics_test.go: Host agent metrics tests - hostagent/commands_test.go: Command execution tests - hostagent/network_helpers_test.go: Network helper tests - hostagent/proxmox_setup_test.go: Updated setup tests - kubernetesagent/_test.go: Kubernetes agent tests Core package tests: - monitoring/kubernetes_agents_test.go, reload_test.go - remoteconfig/client_test.go, signature_test.go - sensors/collector_test.go - updates/adapter_installsh__test.go: Install adapter tests - updates/manager__test.go: Update manager tests - websocket/hub__test.go: WebSocket hub tests Library tests: - pkg/audit/export_test.go: Audit export tests - pkg/metrics/store_test.go: Metrics store tests - pkg/proxmox/_test.go: Proxmox client tests - pkg/reporting/reporting_test.go: Reporting tests - pkg/server/_test.go: Server tests - pkg/tlsutil/extra_test.go: TLS utility tests Total: ~8000 lines of new test code	2026-01-19 19:26:18 +00:00
rcourtman	80444a9022	fix(monitor): use cluster quorum status instead of endpoint count for health Previously, when some cluster endpoints were unreachable (e.g., backup nodes intentionally offline), the cluster was marked as "degraded" even though the Proxmox cluster itself was healthy and had quorum. Now the connection health check queries the Proxmox cluster's actual quorum status. A cluster is only marked "degraded" if it has lost quorum (not enough votes for consensus), which is the actual indicator of cluster instability. This means: - Cluster with quorum + some nodes offline = "healthy" - Cluster without quorum = "degraded" (warning) - All endpoints down = "error" Fixes #1085	2026-01-11 11:54:02 +00:00
rcourtman	bd1df9f942	feat: automatic subnet preference for cluster node discovery When discovering cluster nodes, Pulse now automatically prefers IPs on the same subnet as the initial connection. This fixes the common issue where Pulse used internal cluster network IPs (e.g., 172.x.x.x) instead of management network IPs (e.g., 10.x.x.x). How it works: 1. Extract subnet from initial connection URL (assumes /24 for IPv4) 2. For each discovered node, query /nodes/{node}/network for all IPs 3. If cluster-reported IP is on a different subnet, find an IP on the preferred subnet and set it as IPOverride 4. Manual IPOverride settings are preserved and take precedence This eliminates the need for manual IPOverride configuration in most multi-network Proxmox setups. Refs #929, #1066	2026-01-08 23:12:30 +00:00
rcourtman	d0191d136f	fix: Add configurable poll timeout and handle external Ceph storage Changes: 1. Add MAX_POLL_TIMEOUT env var for large Proxmox clusters that need more than 3 minutes for polling (default: 3m, minimum: 30s) 2. Handle external Ceph storage gracefully - don't mark nodes unhealthy when Proxmox returns 'binary not installed' (e.g., for Ceph not managed by Proxmox) Related to #965	2026-01-05 23:34:33 +00:00
rcourtman	45d4d68127	fix: Add debug logging and response format handling for replication status - Add comprehensive debug logging to diagnose replication status fetch failures - Handle both array and single-object response formats from Proxmox API - Log raw response body for easier debugging - Log success/failure for each enrichment step This helps diagnose issue #992 where replication last/next sync times aren't showing. The logging will reveal if the API call is failing, returning empty data, or returning data in an unexpected format. Related to #992	2026-01-04 15:01:32 +00:00
rcourtman	4cd3e53c3e	test: add regression tests for missing frontend fields Ensures that LinkedHostAgentId, CommandsEnabled, IsLegacy, and LinkedNodeId are correctly propagated to the frontend. This prevents regressions of the bugs fixed for #952 and #971.	2026-01-02 20:45:35 +00:00
rcourtman	3fdf753a5b	Enhance devcontainer and CI workflows - Add persistent volume mounts for Go/npm caches (faster rebuilds) - Add shell config with helpful aliases and custom prompt - Add comprehensive devcontainer documentation - Add pre-commit hooks for Go formatting and linting - Use go-version-file in CI workflows instead of hardcoded versions - Simplify docker compose commands with --wait flag - Add gitignore entries for devcontainer auth files 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-01 22:29:15 +00:00
rcourtman	567a4ad147	fix(replication): fetch status from per-node endpoint The /cluster/replication endpoint only returns job configuration (guest, schedule, source, target), not status data (last_sync, next_sync, duration, fail_count, state). This fix enriches each replication job with status from the per-node endpoint /nodes/{node}/replication/{id}/status to get timing and state data needed for proper UI display. Added integration tests to verify: - Status endpoint is called and data is merged correctly - Graceful handling when status endpoint fails Fixes #992	2025-12-31 23:58:06 +00:00
rcourtman	3fd20340d1	fix: increase PBS storage content timeout to 60s PBS storage content queries with encrypted backups can take 10-20+ seconds to enumerate. The previous 30s timeout was causing intermittent failures when polling backup data from PBS storage configured in PVE. This increases the timeout to 60s to accommodate slow PBS backends while still preventing indefinite hangs on unavailable NFS/network storage.	2025-12-26 00:21:17 +00:00
rcourtman	e0dc6695fc	fix: Per-node TLS fingerprints for cluster peers (TOFU) When a PVE cluster has unique self-signed certificates on each node, Pulse would mark secondary nodes as unhealthy because only the primary node's fingerprint was used for all connections. Now, during cluster discovery, Pulse captures each node's TLS fingerprint and uses it when connecting to that specific node. This enables "Trust On First Use" (TOFU) for clusters with unique per-node certs. Changes: - Add Fingerprint field to ClusterEndpoint config - Add FetchFingerprint() to tlsutil for capturing node certs - validateNodeAPI() now captures and returns fingerprints during discovery - NewClusterClient() accepts endpointFingerprints map for per-node certs - All client creation paths use per-endpoint fingerprints when available Related to #879	2025-12-24 10:05:03 +00:00
rcourtman	969fa0e509	test: add unit tests for AI, Kubernetes agent, and clients	2025-12-17 12:47:36 +00:00
rcourtman	a115af6906	feat: Improve cluster endpoint error messages for users - Add sanitizeEndpointError() to transform raw Go errors into user-friendly messages - Transform 'context deadline exceeded' into helpful messages mentioning possible causes - Storage timeout errors now suggest checking PBS/NFS/Ceph backend connectivity - Connection refused, certificate errors, and auth errors get actionable hints - Apply sanitization everywhere cluster endpoint lastError is stored - Add comprehensive tests for all error transformations	2025-12-16 21:50:02 +00:00
rcourtman	fa13919987	fix(ai-chat): Display messages chronologically in AI chatbot - Add 'content' type to StreamDisplayEvent for tracking text chunks - Track content events in streamEvents array for chronological display - Update render to use Switch/Match for cleaner conditional rendering - Interleave thinking, tool calls, and content as they stream in - Add fallback for old messages without streamEvents for backwards compat Previously, tool/command outputs stayed at top while AI text responses accumulated at the bottom. Now all events appear in order like a normal chatbot.	2025-12-11 23:02:59 +00:00
rcourtman	8948e84fe5	feat: AI features, agent improvements, and host monitoring enhancements AI Chat Integration: - Multi-provider support (Anthropic, OpenAI, Ollama) - Streaming responses with markdown rendering - Agent command execution for remote troubleshooting - Context-aware conversations with host/container metadata Agent Updates: - Add --enable-proxmox flag for automatic PVE/PBS token setup - Improve auto-update with semver comparison (prevents downgrades) - Add updatedFrom tracking to report previous version after update - Reduce initial update check delay from 30s to 5s - Add agent version column to Hosts page table Host Metrics: - Add DiskIO stats collection (read/write bytes, ops, time) - Improve disk filtering to exclude Docker overlay mounts - Add RAID array monitoring via mdadm - Enhanced temperature sensor parsing Frontend: - New Agent Version column on Hosts overview table - Improved node modal with agent-first installation flow - Add DiskIO display in host drawer - Better responsive handling for metric bars	2025-12-05 10:37:02 +00:00
rcourtman	4f824ab148	style: Apply gofmt to 37 files Standardize code formatting across test files and monitor.go. No functional changes.	2025-12-02 17:21:48 +00:00
rcourtman	c812720f25	test: Add Disk UnmarshalJSON RPM and error path tests Cover RPM field handling (numeric, string, SSD, N/A, null, invalid), invalid JSON error path, and unexpected type fallbacks for both wearout and RPM fields. Coverage: 50% → 95.5%	2025-12-02 02:23:44 +00:00
rcourtman	618fc084f1	test: Add invalid user format tests for NewClient Test error handling for password authentication user format validation: - Missing realm separator (no @) - Empty user string - Multiple @ symbols Improves NewClient coverage from 74.2% to 83.9%.	2025-12-02 01:25:11 +00:00
rcourtman	de33653dc2	test: Add invalid value tests for VMFileSystem.UnmarshalJSON Test error handling for JSON parsing edge cases: - Invalid JSON syntax - Unsupported field types (bool, array) - Unparseable string values for total-bytes and used-bytes Improves coverage from 83.3% to 94.4%.	2025-12-02 01:22:42 +00:00
rcourtman	79afff8ba2	test: Add invalid value tests for MemoryStatus.UnmarshalJSON Test error handling for JSON parsing edge cases: - Invalid JSON syntax - Unsupported field types (bool, array, object) - Unparseable string values Improves coverage from 70.0% to 83.3%.	2025-12-02 01:20:15 +00:00
rcourtman	22d9e2795c	test: Add permanent failure test for ClusterClient.GetNodes Tests the error logging path when all endpoints fail with auth error (83.3% to 91.7% coverage).	2025-12-02 01:05:48 +00:00
rcourtman	5bbf7de1a3	test: Add JSON decode error test for Client.GetNodes Tests the error path when server returns invalid JSON (87.5% to 100%).	2025-12-02 01:03:30 +00:00
rcourtman	490fd9a810	test: Add edge cases for parseReplicationJob fields - Test jobid fallback when id field is missing - Test jobnum field takes precedence over ID parsing - Test last_sync_duration and duration fields - Test last-sync-duration fallback format - Test next_sync and next-sync fallback formats Coverage: 79.7% → 100%	2025-12-02 00:24:40 +00:00
rcourtman	29e01f8ff5	test: Add edge case for coerceUint64 ParseUint error branch String 'abc' without .eE characters triggers ParseUint error path. Coverage: 97.4% to 100%.	2025-12-01 23:44:04 +00:00
rcourtman	e2172b16de	test: Add edge case test for isNotImplementedError fallback branch Tab character triggers extractStatusCode fallback path (regex \s+ matches tab but ' 501' substring check doesn't). Coverage: 87.5% to 100%.	2025-12-01 23:18:45 +00:00
rcourtman	2afc7f0c41	test: Add edge case tests for parseWearoutValue function Add 4 new test cases covering previously untested branches: - Float zero exactly (0.0) - Float negative zero (-0.0) - Only escaped quotes becoming empty after trimming - Quoted whitespace becoming empty after trimming Coverage improved from 95.8% to 100%.	2025-12-01 23:02:18 +00:00
rcourtman	be892f5e07	fix: match storage timeout errors without trailing slash The error pattern `/storage/` only matched storage content endpoints (`/storage/{name}/content`) but not the main storage list endpoint (`/nodes/{node}/storage`). This caused storage timeout errors like: Get ".../nodes/pve-100-224/storage": context deadline exceeded to incorrectly mark cluster nodes as unhealthy, even though the timeout was due to a slow cross-node storage query, not actual node connectivity issues. Fixes #754	2025-12-01 22:48:01 +00:00


98 commits
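Commit bd1df9f942 describes how cluster discovery prefers node IPs on the same subnet as the initial connection. A minimal Go sketch of that selection rule, assuming a /24 for IPv4 as the commit states and a list of interface IPs already fetched from /nodes/{node}/network. The helper names preferredSubnet and pickNodeIP are hypothetical and are not taken from the Pulse source.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// preferredSubnet (hypothetical) returns the /24 network containing the host
// of the initial connection URL, or nil for hostnames and IPv6 addresses,
// which this sketch does not handle.
func preferredSubnet(rawURL string) *net.IPNet {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil
	}
	ip := net.ParseIP(u.Hostname()).To4()
	if ip == nil {
		return nil
	}
	mask := net.CIDRMask(24, 32) // /24 assumption stated in the commit
	return &net.IPNet{IP: ip.Mask(mask), Mask: mask}
}

// pickNodeIP (hypothetical) returns the address to connect to: a manual
// override always wins, a cluster-reported IP already on the preferred subnet
// is kept, otherwise the first interface IP on that subnet is chosen. If
// nothing matches, the cluster-reported IP is returned unchanged.
func pickNodeIP(subnet *net.IPNet, manualOverride, clusterIP string, interfaceIPs []string) string {
	if manualOverride != "" {
		return manualOverride
	}
	if subnet == nil {
		return clusterIP
	}
	if ip := net.ParseIP(clusterIP); ip != nil && subnet.Contains(ip) {
		return clusterIP
	}
	for _, candidate := range interfaceIPs {
		if ip := net.ParseIP(candidate); ip != nil && subnet.Contains(ip) {
			return candidate
		}
	}
	return clusterIP
}

func main() {
	subnet := preferredSubnet("https://10.0.0.5:8006")
	fmt.Println(subnet) // 10.0.0.0/24
	// The cluster reports an internal 172.16.x address; the management IP wins.
	fmt.Println(pickNodeIP(subnet, "", "172.16.0.12", []string{"172.16.0.12", "10.0.0.12"})) // 10.0.0.12
}
```

Returning nil for hostnames keeps the sketch conservative: without a literal IPv4 address there is no subnet to prefer, so the cluster-reported IP is used as-is.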
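Commit 0ae2806f18 reads MemAvailable from /proc/meminfo through the QEMU guest agent and mentions fixing an integer overflow in the kB-to-bytes conversion. The Go sketch here shows one way to parse that value and guard the multiplication. It assumes the standard `MemAvailable: <n> kB` line format; memAvailableBytes is a hypothetical name, and the guest agent file-read call and result caching are left out.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// memAvailableBytes (hypothetical) finds the MemAvailable line in
// /proc/meminfo text and converts its kB value to bytes, refusing values that
// would overflow uint64 instead of silently wrapping.
func memAvailableBytes(meminfo string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		// Expected shape: "MemAvailable:    4096000 kB"
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 || fields[0] != "MemAvailable:" {
			continue
		}
		kb, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return 0, fmt.Errorf("parse MemAvailable: %w", err)
		}
		// Check the bound before multiplying so kb*1024 cannot wrap.
		if kb > math.MaxUint64/1024 {
			return 0, errors.New("MemAvailable too large to convert to bytes")
		}
		return kb * 1024, nil
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, errors.New("MemAvailable not found")
}

func main() {
	sample := "MemTotal:        8048576 kB\nMemFree:          512000 kB\nMemAvailable:    4096000 kB\n"
	n, err := memAvailableBytes(sample)
	fmt.Println(n, err) // 4194304000 <nil>
}
```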
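Commit 80444a9022 derives cluster health from quorum rather than from the count of reachable endpoints. A minimal Go sketch of the three rules it lists, assuming the /cluster/status response is wrapped in a `data` array containing an entry with `"type": "cluster"` and a numeric `quorate` flag. quorateFromStatus and clusterHealth are hypothetical names.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// statusEntry holds the two fields this sketch reads from each item of the
// /cluster/status response. Assumption: the "cluster" item carries a numeric
// quorate flag (1 = quorate).
type statusEntry struct {
	Type    string `json:"type"`
	Quorate int    `json:"quorate"`
}

// quorateFromStatus (hypothetical) decodes a {"data": [...]} envelope and
// reports whether the cluster entry says it has quorum.
func quorateFromStatus(body []byte) (bool, error) {
	var resp struct {
		Data []statusEntry `json:"data"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return false, err
	}
	for _, entry := range resp.Data {
		if entry.Type == "cluster" {
			return entry.Quorate == 1, nil
		}
	}
	return false, errors.New("no cluster entry in status response")
}

// clusterHealth (hypothetical) applies the rules from the commit message:
// no reachable endpoints is "error", lost quorum is "degraded", and a quorate
// cluster is "healthy" even if some nodes are offline.
func clusterHealth(reachableEndpoints int, quorate bool) string {
	switch {
	case reachableEndpoints == 0:
		return "error"
	case !quorate:
		return "degraded"
	default:
		return "healthy"
	}
}

func main() {
	body := []byte(`{"data":[{"type":"cluster","name":"lab","quorate":1},{"type":"node","name":"pve1","online":1}]}`)
	quorate, err := quorateFromStatus(body)
	if err != nil {
		panic(err)
	}
	// An offline backup node does not matter while quorum holds.
	fmt.Println(clusterHealth(2, quorate)) // healthy
}
```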
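Commit 29e01f8ff5 notes that coerceUint64 sends strings without `.`, `e`, or `E` to the ParseUint path, so `abc` fails there. The Go sketch reproduces that branching and adds an explicit range check before the float-to-integer conversion, in the spirit of the later "explicit integer bounds" commits. parseUintLenient is a hypothetical name; this is an illustration of the described branching, not the actual coerceUint64 body.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseUintLenient (hypothetical) mirrors the branching described for
// coerceUint64: strings containing '.', 'e' or 'E' are parsed as floats,
// anything else goes to strconv.ParseUint.
func parseUintLenient(s string) (uint64, error) {
	s = strings.TrimSpace(s)
	if strings.ContainsAny(s, ".eE") {
		f, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return 0, err
		}
		// Explicit bounds before converting. As a float64, math.MaxUint64
		// rounds up to 2^64, so this rejects exactly the values that overflow.
		if math.IsNaN(f) || f < 0 || f >= math.MaxUint64 {
			return 0, errors.New("value outside uint64 range")
		}
		return uint64(f), nil // fractional part is truncated
	}
	return strconv.ParseUint(s, 10, 64)
}

func main() {
	for _, in := range []string{"42", "1.5e3", "abc", "-1.0"} {
		v, err := parseUintLenient(in)
		fmt.Println(in, v, err)
	}
}
```

Checking the range in float64 space before the conversion matters because converting an out-of-range float64 to uint64 in Go yields an implementation-dependent value rather than an error.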