Pulse

vrr/Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-04-28 03:20:11 +00:00

Author	SHA1	Message	Date
rcourtman	768b6d8b7a	fix(frontend): resolve npm audit advisories in lockfile	2026-03-02 23:59:34 +00:00
rcourtman	71a7249fd7	chore: bump version to 5.1.17	2026-03-02 17:51:50 +00:00
rcourtman	510ec999ab	fix(api): store TLS fingerprint during auto-registration (#1303 ) The legacy auto-register endpoint captured TLS fingerprints via FetchFingerprint() but never persisted them to the node config. Nodes with self-signed certs registered via the agent would fail with "x509: certificate signed by unknown authority" on subsequent polls. Store the fingerprint in all add/update paths for both PVE and PBS, guard updates against empty-fingerprint clobber when FetchFingerprint fails, and pass the fingerprint to cluster detection configs.	2026-03-02 14:07:18 +00:00
rcourtman	10a4e994b6	fix(api): return 404 from undismiss endpoint for invalid finding IDs (#1300 ) HandleUndismissFinding now checks both patrol and unified stores before returning. Returns 404 with error message when the finding is not found or not dismissed, instead of silently returning success.	2026-03-02 11:48:23 +00:00
rcourtman	60bdc9a101	fix(memory): skip meminfo-derived when balloon lacks cache metrics (#1302 ) When the balloon driver reports Free but not Buffers or Cached, the meminfo-derived fallback computed memAvailable = Free alone, counting all reclaimable page cache as used memory. This caused Linux VMs to show wildly inflated usage (e.g. 93% when actual is 21%). Now meminfo-derived requires at least one cache metric (Buffers > 0 or Cached > 0) before trusting the value. When missing, the code falls through to RRD/guest-agent/Total-Used fallbacks which provide accurate cache-aware data. Both efficient and traditional polling paths are now consistent.	2026-03-02 11:48:18 +00:00
rcourtman	9f8f372f7c	chore: add .mcp.json to gitignore	2026-03-01 23:33:58 +00:00
rcourtman	d43dfbc490	feat(ui): add host removal action to hosts table Add an actions menu to the hosts overview with a "Remove host from Pulse" button. Includes permission checks (requires settings:write scope), confirmation handling, and a security regression test for the delete endpoint scope enforcement.	2026-03-01 23:28:33 +00:00
rcourtman	5bd0563283	test(providers): update Ollama integration tests for timeout parameter	2026-03-01 23:28:16 +00:00
rcourtman	f5365809b3	fix(installer): remove config backup that filled disk on upgrades The backup_existing function copied the entire config directory (including metrics.db at ~2.5GB) on every upgrade with no cleanup. On small VMs this filled the disk within a few releases. The upgrade only swaps the binary; config files are not modified, so the backup served no practical purpose.	2026-03-01 23:20:08 +00:00
rcourtman	0c78fab337	Auto-update Helm chart documentation	2026-03-01 23:15:53 +00:00
rcourtman	fa48369dbb	chore(release): bump version to 5.1.16	2026-03-01 22:40:55 +00:00
rcourtman	d46b5fc84b	fix(ai): route OpenRouter slash-delimited models to OpenAI provider (#1296) createProviderForModel() only handled "provider:model" colon format. Models like "google/gemini-2.5-flash" or "google/gemini-2.0-flash:free" (OpenRouter format) failed because the colon split produced invalid provider names. Now uses config.ParseModelString() which correctly detects slash-delimited models as OpenRouter (routed via OpenAI-compatible API).	2026-03-01 22:29:45 +00:00
rcourtman	2fcddecf80	feat(api): add POST /api/ai/patrol/undismiss endpoint to revert suppressed findings (#1300 ) The Undismiss() method existed on FindingsStore but was never exposed via the API. Users who dismissed findings as "not_an_issue" had no way to revert them. - Add HandleUndismissFinding handler and route - Add Undismiss() to UnifiedStore for parity with FindingsStore - Also remove matching explicit suppression rules on undismiss	2026-03-01 22:29:36 +00:00
rcourtman	027fd9932c	fix(proxmox): make monitor reload synchronous after auto-registration (#1303 ) Auto-register was running the monitor reload in a background goroutine, so the HTTP response was sent before the poller picked up the new node. If reload failed or was slow, the node appeared in Settings > Proxmox (reads config from disk) but not on the main Proxmox tab (reads from active polling state). Changed both auto-register paths to reload synchronously, matching the manual add path (HandleAddNode).	2026-03-01 21:04:20 +00:00
rcourtman	d852964696	fix(ai): record patrol and QuickAnalysis token usage in cost store for budget enforcement Patrol runs, evaluation passes, and QuickAnalysis calls were consuming LLM tokens without recording them in the cost store. This made the cost_budget_usd_30d budget setting ineffective since enforceBudget() never saw patrol spend. - Add RecordUsage() to ai.Service for thread-safe cost recording - Add recordPatrolUsage() helper to PatrolService, called on both success and error paths for main patrol and evaluation pass - Record QuickAnalysis token usage in cost store - Return partial PatrolResponse (with token counts) on error instead of nil, so callers can always record consumed tokens - Propagate partial response through chat_service_adapter on error	2026-03-01 19:19:47 +00:00
rcourtman	b1ff7e006f	fix(ui): show PULSE_PUBLIC_URL value in settings and expand node tables to full width (#1305 , #1304 ) Expose PublicURL from runtime config in the system settings API response so the frontend displays the actual value instead of the placeholder when the env var is set. Add w-full to PVE, PBS, and PMG node tables so they expand to fill the container in full-width mode.	2026-03-01 14:42:30 +00:00
rcourtman	c575c7e295	fix(patrol): rename wearout JSON field to ssd_life_remaining_pct (#1300 ) The AI also receives disk data via tool calls (pulse_metrics type="disks"), not just the patrol context table. The raw JSON field "wearout" was ambiguous — rename to "ssd_life_remaining_pct" so the field name itself communicates that 100 = healthy.	2026-02-27 23:12:27 +00:00
rcourtman	3006f51b60	fix(patrol): clarify wearout semantics so AI knows 100% = healthy (#1300 ) The patrol context table header said "Wearout" and the tool returned a raw "wearout" JSON field with no indication that 100 = full life remaining. The AI interpreted "wearout: 100" as fully worn out and raised false "100% Disk Wearout" findings on healthy NVMe drives. Rename the patrol table column to "SSD Life Remaining (100%=new)" and update the data type comment to clarify the semantics.	2026-02-27 23:05:02 +00:00
rcourtman	aae6035e66	fix(docs): audit and fix agent docs vs install script discrepancies (#1299 ) - Split configuration table into "Installer flags" and "Agent-only flags" so users know which flags work with `curl \| bash` vs the binary directly - Add missing --cacert and --env flags to installer docs - Fix --disable-auto-update example (install script doesn't accept it; use --env PULSE_DISABLE_AUTO_UPDATE=true instead) - Add --disable-docker/kubernetes/proxmox and --proxmox-type to install.sh show_help() - Fix --enable-docker=false in CENTRALIZED_MANAGEMENT.md	2026-02-27 21:20:54 +00:00
rcourtman	29a6335905	fix(docs): correct remaining --enable-*=false flags in agent docs (#1299 ) All --enable-docker=false, --enable-kubernetes=false, --enable-proxmox=false references replaced with --disable-docker, --disable-kubernetes, --disable-proxmox.	2026-02-27 21:14:05 +00:00
rcourtman	0bc9445eb8	fix(docs): correct --enable-host=false to --disable-host in agent docs (#1299 ) The installer uses --disable-host as a separate flag, not --enable-host=false.	2026-02-27 20:41:32 +00:00
rcourtman	b1d58fc8aa	fix(installer): avoid "No space left on device" on QNAP by writing binary to persistent storage On QNAP, /usr/local/bin is a tiny RAM disk. The installer was downloading the binary then mv'ing it there, which failed when the RAM disk was full. The QNAP-specific logic that copies to the persistent data volume only ran after that mv. Move QNAP detection before the download step so INSTALL_DIR points to the persistent data volume (e.g. /share/CACHEDEV1_DATA/.pulse-agent) directly. The wrapper script still attempts to copy to /usr/local/bin at boot but falls back to running from persistent storage if that fails. Also fixes: - pkill -f pattern in wrapper could match and kill the wrapper itself (path contains "pulse-agent"); switched to pkill -x for exact match - Upgrade detection now checks /usr/local/bin for legacy QNAP installs - Uninstall cleans up /usr/local/bin runtime copy	2026-02-27 20:41:32 +00:00
rcourtman	538b3c3bdb	Auto-update Helm chart documentation	2026-02-27 15:20:57 +00:00
rcourtman	7530b66254	fix(setup): escape printf %s in Sprintf template to fix format verb count (#1297 ) The printf '%s\n' calls in shell code within the Go Sprintf template were being counted as format verbs, causing a build failure (10 verbs but 9 args). Using %%s produces literal %s in the output.	2026-02-27 14:44:41 +00:00
rcourtman	2f059e650e	chore(release): bump version to 5.1.15	2026-02-27 14:29:10 +00:00
rcourtman	8298852483	feat(installer): add QNAP QTS/QuTS hero agent support (#1253 ) QNAP wipes /etc/init.d on every reboot, so the agent needs persistent storage on a data volume and autorun.sh boot persistence via the flash config partition. Adds detection, install (with watchdog wrapper), and clean uninstall paths. Flash config mount/umount is fail-safe via subshell isolation to prevent leaving the partition mounted on write errors.	2026-02-27 14:19:40 +00:00
rcourtman	62225e0c12	fix(alerts): scope orphaned backup detection per PVE instance to prevent false positives (#1286 ) The previous hasLiveInventory guard was a single boolean — if any PVE instance had at least one live guest, orphan detection ran for all instances. In multi-instance clusters with staggered polling, backups from instances whose VMs hadn't been polled yet appeared orphaned, producing false positive alerts with 0m duration. Replace the global boolean with a per-instance map. PVE storage backups now only run orphan detection when their specific instance has live inventory. PBS/PMG backups (which span instances) retain the "any instance has live guests" check.	2026-02-27 13:32:15 +00:00
rcourtman	4c7a79cecb	fix(setup): preserve SSH authorized_keys symlink on Proxmox and fix key entry quoting (#1297 ) The PVE setup script had three bugs in the temperature monitoring SSH key setup: - Nested double quotes in SSH_SENSORS_KEY_ENTRY broke the bash string, causing "No such file or directory" errors for the key options - The grep/mv pattern to update authorized_keys destroyed the symlink that Proxmox maintains from /root/.ssh/authorized_keys to /etc/pve/priv/ - The uninstall path grepped for "# pulse-managed-key" but keys were tagged "# pulse-sensors", so uninstall never cleaned up sensor keys Fixes: resolve symlinks with readlink -f before operating, create temp files in /tmp with mv-then-cp fallback for cross-device moves, escape inner quotes, and broaden the uninstall filter to match all pulse-prefixed keys.	2026-02-27 13:23:03 +00:00
rcourtman	9aee8fa293	fix(ui): add Pro badge to Reporting tab and reduce patrol trigger log noise (#1285 , #1258 ) Show "Pro" badge on the Reporting settings tab so users know upfront that advanced reporting requires a Pro license, rather than discovering it after filling out the form. Downgrade patrol trigger queue-full and rejection messages from Warn to Debug — these are normal rate-limiting behavior, not actionable warnings.	2026-02-26 21:09:13 +00:00
rcourtman	af712006c9	fix(ai): allow Gemini and other models via OpenRouter without false provider warning (#1296 ) Model name detection used substring matching (.includes('gemini')) which falsely required Gemini provider config for OpenRouter model IDs like "google/gemini-2.5-flash". Now only known provider prefixes are treated as explicit delimiters, slash-containing names route to OpenAI (OpenRouter convention), and colons in model names (e.g. "llama3.2:latest") are no longer misinterpreted as provider prefixes.	2026-02-26 20:49:10 +00:00
rcourtman	fa519cd8ce	fix(alerts): prevent false positive orphaned backup alerts during startup race (#1286 ) Backup polling goroutines can snapshot state before VM/container polling populates the guest inventory. When guestsByVMID is empty, every backup appears orphaned. Gate orphan detection on hasLiveInventory (at least one guest with non-empty ResourceID) and preserve existing orphan alerts when inventory becomes unavailable.	2026-02-26 20:49:10 +00:00
rcourtman	eb2397d99a	fix(notifications): route escalation notifications to selected channels only (#1259 ) Escalation was calling SendAlert() which always sends to all enabled channels, ignoring the per-level channel selection (email/webhook/all). Add SendAlertToChannels() that snapshots only the requested channel configs and uses a distinct "_escalation" queue type so the dequeue handler skips cooldown writes — preventing interference with the alert manager's own re-notify cadence.	2026-02-26 20:49:10 +00:00
rcourtman	c213e0ce30	Auto-update Helm chart documentation	2026-02-25 00:14:54 +00:00
rcourtman	a5fb155b88	chore(release): bump version to 5.1.14	2026-02-24 23:39:42 +00:00
rcourtman	77bd2e70d9	fix(notifications): add service-specific resolved webhook templates (#1259 ) Backport from v6 (`88d5865a8`). Recovery webhook notifications were using the firing PayloadTemplate which services like Telegram, Teams, Discord etc. silently rejected as malformed. Now uses a three-tier template pipeline matching the firing path: - Tier 1: Custom user template (if configured) - Tier 2: Service-specific ResolvedPayloadTemplate (Discord green embed, Telegram chat_id+text, Slack header blocks, Teams MessageCard/Adaptive, PagerDuty event_action:"resolve", Pushover, Gotify, Mattermost) - Tier 3: Generic JSON fallback (backward compatible) Also adds Event, ResolvedAt, ResolvedAtISO fields to WebhookPayloadData.	2026-02-24 23:28:33 +00:00
rcourtman	6221be7311	fix(docker): serialize batch container updates per host (#1289 ) The backend only allows one command per host at a time. The "Update All" button was firing requests in parallel chunks, causing the second container per host to fail with 400. Group targets by host and process them sequentially within each host while still updating different hosts in parallel.	2026-02-24 23:16:22 +00:00
rcourtman	24f5b1cb31	fix(patrol): cap per-run tokens and reset patrol session history	2026-02-24 11:29:47 +00:00
rcourtman	82ccb662f9	fix(notifications): use service-specific templates for resolved webhooks (#1068 ) Recovery notifications for Discord, Slack, Teams, PagerDuty, and other service webhooks were sending a generic JSON payload that lacked the required format (e.g. Discord needs `embeds`, Slack needs `blocks`), causing resolved notifications to silently fail. - Add `prepareResolvedWebhookData` to build template data with Level="resolved" - Route resolved webhooks through service-specific templates with full URL rendering, Telegram ChatID extraction, and PagerDuty routing_key - Custom user templates take precedence over built-in service templates - Return errors on service template failures instead of falling back to generic payloads that endpoints would reject - Fix PagerDuty template to send event_action="resolve" for resolved alerts	2026-02-24 10:49:52 +00:00
rcourtman	4dc09a1240	feat(alerts): add dedicated backup-orphaned alert type (#1286 ) Fire a warning alert immediately when a backup's guest no longer exists in inventory, without requiring age thresholds to be breached. The existing alertOrphaned toggle and ignoreVMIDs UI control this feature with no frontend changes needed.	2026-02-24 09:07:43 +00:00
rcourtman	ffc14c7507	fix(docker): stop CPU bars flickering for idle containers (#1288 ) The isRunning prop used a `cpuPercent > 0` gate that treated idle containers (0% CPU) as not-running, causing the bar to flip between a percentage and an em-dash on every poll cycle. Remove the value guard so visibility depends only on container running state, matching how memory, disk, and restart columns already behave.	2026-02-23 22:05:18 +00:00
rcourtman	cac5be2ca1	chore(frontend-modern): remove stale pnpm lockfile	2026-02-23 11:15:37 +00:00
rcourtman	5457b04608	fix(ai): deduplicate Docker host 3-way chain in mention picker (#1252 ) Replace first-match-only logic in upsertMentionResource with a union-merge algorithm that collects all matching keys, merges losers into a canonical winner, and re-points aliases. This fixes the case where a host agent bridges a VM and a DockerHost but only the first alias match was merged, leaving a duplicate entry in the picker.	2026-02-22 15:15:14 +00:00
rcourtman	2140efce36	Auto-update Helm chart documentation	2026-02-22 12:43:12 +00:00
rcourtman	180c8738b4	chore(release): bump version to 5.1.13	2026-02-22 12:01:38 +00:00
rcourtman	54a1ace2c5	fix(installer): remove stale sensor-proxy mount entries that prevent LXC start after reboot (#1280 ) The v4 installer added mount entries for /run/pulse-sensor-proxy to LXC container configs. After upgrading to v5 and rebooting, /run (tmpfs) is wiped and the container fails to start. The installer now detects and removes these stale mp<N> and lxc.mount.entry references automatically when run on a PVE host, and the upgrade docs include manual fix steps.	2026-02-22 10:52:12 +00:00
rcourtman	f9654f5b7a	Merge pull request #1279 from muratoda/feature/use-locale-aware-time-format Change last refresh time display format to system locale	2026-02-21 23:11:48 +00:00
rcourtman	32746e2d2a	fix(monitoring): use RRD memavailable fallback when PVE node cache metrics missing (#1270 ) When Proxmox /nodes/{node}/status returns only total/used/free without available/buffers/cached, EffectiveAvailable() returns Free (non-zero), causing the RRD fallback gate to be skipped. This results in inflated node memory where cache/buffers are counted as "used." Widen the RRD fallback condition from requiring effectiveAvailable == 0 to triggering whenever missingCacheMetrics is true. Add negative caching for failed RRD lookups (2-minute backoff) to avoid repeated retries.	2026-02-21 22:47:20 +00:00
rcourtman	1170da6a57	fix(ai): serialize linkedVmId/linkedContainerId and harden mention status (#1252 ) HostFrontend was missing LinkedVmId and LinkedContainerId fields, so the frontend dedup aliases for VM/container agents resolved to undefined and never matched. Also add .trim() to getStatusColor and default host agent status to 'online' to fix grey status dots.	2026-02-21 22:00:43 +00:00
rcourtman	b445f8d8fa	fix(agent): preserve user-configured host URL during agent re-registration (#1283 ) When an agent re-registers with the same token, the DHCP matching case would overwrite the Host field with the agent's local IP — even if the user had edited it to a public URL or different IP. Now agent source re-registrations always preserve the existing host, while non-agent DHCP updates still work. Adds 5 regression tests covering hostname preservation, public-IP preservation, agent DHCP, non-agent DHCP, and PBS parity.	2026-02-21 12:46:02 +00:00
rcourtman	50e476c942	fix(ai): fix mention status colors and dedup for docker/VM/LXC agents (#1252) Three fixes for remaining mention autocomplete issues: - Status dots now correctly show green/red/yellow for online/offline/degraded statuses (previously only handled running/stopped/paused) - Docker hosts merge with their host agent via agentId cross-reference - VMs and LXC containers merge with host agents running inside them via linkedVmId/linkedContainerId backend ID aliases	2026-02-20 22:53:52 +00:00
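
Note on commit 7530b66254: the failure it describes comes from how Go's fmt package counts format verbs. A literal `%s` intended for the shell is treated as a verb unless it is written as `%%s`. The following is a minimal sketch of that behaviour, not the project's actual template; `renderSetupScript` and the example URL are hypothetical names used only for illustration.

```go
package main

import "fmt"

// renderSetupScript is a hypothetical stand-in for a Go template that embeds
// shell code. The shell's own printf format must be written as %%s so that
// fmt.Sprintf emits a literal %s instead of consuming an argument for it.
func renderSetupScript(pulseURL string) string {
	// One real verb (%s for pulseURL) and one escaped verb (%%s for the shell).
	return fmt.Sprintf("printf '%%s\\n' \"Connecting to %s\"", pulseURL)
}

func main() {
	fmt.Println(renderSetupScript("https://pulse.example"))
}
```

Without the doubled percent sign, the template would contain two verbs but receive only one argument, which is the kind of verb/argument mismatch the commit message reports.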


3416 commits
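
Note on commit 60bdc9a101: the guard it describes can be sketched as below. This is an illustration under stated assumptions rather than the repository's code: the `BalloonInfo` type, the `meminfoAvailable` function, and the formula Free + Buffers + Cached are assumptions inferred from the commit message, and the byte values are invented so that the percentages match the 93% and 21% figures quoted there.

```go
package main

import "fmt"

// BalloonInfo is a hypothetical stand-in for the memory fields a balloon
// driver may report. Zero means the field was not reported.
type BalloonInfo struct {
	Total, Free, Buffers, Cached uint64
}

// meminfoAvailable returns the derived available memory and whether it can be
// trusted. Assumption: available = Free + Buffers + Cached. When neither cache
// metric is present the value is rejected so the caller can fall through to
// another source (RRD, guest agent, Total-Used).
func meminfoAvailable(b BalloonInfo) (uint64, bool) {
	if b.Buffers == 0 && b.Cached == 0 {
		return 0, false
	}
	return b.Free + b.Buffers + b.Cached, true
}

// usedPercent reports used memory as an integer percentage of total.
func usedPercent(total, available uint64) uint64 {
	return (total - available) * 100 / total
}

func main() {
	withCache := BalloonInfo{Total: 1000, Free: 70, Cached: 720}
	noCache := BalloonInfo{Total: 1000, Free: 70}

	if avail, ok := meminfoAvailable(withCache); ok {
		fmt.Printf("cache-aware usage: %d%%\n", usedPercent(withCache.Total, avail))
	}
	if _, ok := meminfoAvailable(noCache); !ok {
		// Trusting Free alone would count all page cache as used memory.
		fmt.Printf("rejected; Free-only usage would be %d%%\n",
			usedPercent(noCache.Total, noCache.Free))
	}
}
```

Returning a boolean alongside the value keeps the "not trustworthy" case distinct from a genuine reading of zero available memory, which is what lets the caller move on to the next fallback.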