Pulse

vrr/Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-05-27 00:06:13 +00:00

Author	SHA1	Message	Date
rcourtman	d9f1f7accd	feat(ai): add real-time anomaly detection endpoint Add /api/ai/intelligence/anomalies endpoint that compares live metrics against learned baselines to surface deviations - all deterministic (no LLM required). Backend: - Add AnomalyReport struct with severity classification - Add CheckResourceAnomalies method to baseline store - Add HandleGetAnomalies API handler - Add GetStateProvider getter to AI service Frontend: - Add AnomalyReport and AnomaliesResponse types - Add getAnomalies API function - Add AnomalySeverity type This is the first step toward surfacing deterministic intelligence directly in the UI without requiring LLM interaction.	2025-12-21 10:52:54 +00:00
rcourtman	417a523d85	feat(ai): add unified Intelligence orchestrator - Create Intelligence struct that aggregates all AI subsystems - Add /api/ai/intelligence endpoint for system-wide and per-resource insights - Wire Intelligence into PatrolService as a facade (not replacement) - Add TypeScript types and API client for frontend - Add unit tests for Intelligence orchestrator - Fix pre-existing test failures using diagnostic commands instead of actionable ones The Intelligence orchestrator provides: - System-wide health scoring (A-F grades) - Aggregated findings, predictions, correlations - Per-resource context generation for AI prompts - Learning progress tracking This unifies access to AI subsystems without replacing existing code paths.	2025-12-21 10:32:02 +00:00
rcourtman	57c828e934	fix: disable encryption key deletion to prevent key loss bug IMPORTANT: This disables the encryption key deletion during migration. Previously, when migrating from /etc/pulse to a new data directory, the code would DELETE the original key after copying it. This was causing mysterious key loss bugs in dev environments. Changes: - Commented out the os.Remove() call that deletes the encryption key - Keep both copies of the key for safety (old location is just unused) - Updated test to skip when production key exists (test isolation issue) The old key at /etc/pulse will now be preserved even after migration. This is safe because: 1. The new key location is checked first 2. Having a backup is better than risking data loss 3. Users can manually clean up the old key if desired	2025-12-21 00:27:16 +00:00
rcourtman	c97c4287a4	debug: add critical logging for encryption key deletion bug Added extensive logging to crypto.go to trace when the encryption key migration code runs and when it deletes the key. This is to diagnose a recurring bug where the encryption key mysteriously disappears. The logs will show: - When migration is being considered (dataDir != /etc/pulse) - When migration is skipped (dataDir == /etc/pulse) - CRITICAL log when key is about to be deleted - CRITICAL log when key has been deleted This will help identify whether it's the Go code or something external deleting the key.	2025-12-21 00:25:05 +00:00
rcourtman	96573f4aca	feat: enhance AI baseline context visibility and incident timeline improvements Backend: - Enhanced buildEnrichedResourceContext to ALWAYS show learned baselines with status indicators (normal/elevated/anomaly) instead of only when anomalous - This makes Pulse Pro's 'moat' visible - users can see the AI understands their infrastructure's normal behavior patterns - Added baseline import to service.go Frontend (user changes): - Added incident event type filtering with toggle buttons - Added resource incident panel to view all incidents for a resource - Added timeline expand/collapse functionality in alert history - Added incident note saving with proper incidentId tracking - Added startedAt parameter for proper incident timeline loading	2025-12-21 00:14:20 +00:00
rcourtman	5173fc3162	fix: normalize guest ID fallbacks to canonical instance:node:vmid format Multiple frontend components were using - as a fallback when guest.id was falsy. This format drops the node component, which is critical for clustered setups where the same VMID can exist on different nodes. Changes: - GuestDrawer.tsx: Updated guestId() and handleAskAI() to use canonical format - GuestRow.tsx: Updated buildGuestId() to use canonical format - Dashboard.tsx: Updated handleGuestRowClick() and guest rendering loop, also fixed legacy metadata fallback to use consistent keying - ThresholdsTable.tsx: Updated guestsGroupedByNode() to use canonical format Backend changes: - Removed temporary debug logging added during investigation - Added alert history section to AI buildEnrichedResourceContext() function The backend generates VM/Container IDs in instance:node:vmid format (e.g., delly:delly:101) via makeGuestID(). This format is now consistently used across all frontend fallbacks to prevent AI context, metadata, overrides, and metrics from colliding or desyncing in clustered environments.	2025-12-20 22:11:35 +00:00
rcourtman	ae522c9a2b	fix: Allow all threshold types (Storage, Temperature, Host Agent) to be set to 0 to disable alerting - Fixed normalizeStorageDefaults to allow Trigger=0 - Fixed normalizeNodeDefaults (Temperature) to allow Trigger=0 - Added comprehensive tests for all threshold normalization patterns - Updated existing test that expected old behavior Related to #864	2025-12-20 20:42:23 +00:00
rcourtman	781442cdd0	test: Add comprehensive tests for Host Agent threshold normalization with Trigger=0. Related to #864	2025-12-20 20:32:59 +00:00
rcourtman	db5e79bb37	fix: Allow Host Agent thresholds to be set to 0 to disable alerting. Related to #864	2025-12-20 20:25:20 +00:00
rcourtman	3c3f560c4b	Fix login re-auth with stale sessions and hot-dev encryption safety - Login.tsx: Use apiClient.fetch with skipAuth to avoid auth loops - router.go: Skip CSRF validation for /api/login endpoint - hot-dev.sh: Detect encrypted files before generating new key to prevent data loss	2025-12-20 13:45:11 +00:00
rcourtman	d8fd3865e1	chore: remove accidentally committed metrics.db and add .db to gitignore - Remove internal/monitoring/metrics.db (SQLite test artifact) - Add .db, .sqlite, .sqlite3 patterns to .gitignore	2025-12-20 11:55:48 +00:00
rcourtman	41e075b9ec	fix(updates): Add RSS/Atom feed fallback for GitHub rate limits When the GitHub API returns 403 (rate limited), Pulse now falls back to parsing the releases.atom feed which doesn't count against API rate limits. This ensures users can still check for updates even when rate limited. The feed parser: - Extracts version tags from Atom feed entries - Filters prereleases for stable channel users - Returns the first matching release Fixes #840	2025-12-20 10:54:14 +00:00
rcourtman	b6140cd6e8	feat(oidc): Add refresh token support for long-lived sessions When offline_access scope is configured, Pulse now stores and uses OIDC refresh tokens to automatically extend sessions. Sessions remain valid as long as the IdP allows token refresh (typically 30-90 days). Changes: - Store OIDC tokens (refresh token, expiry, issuer) alongside sessions - Automatically refresh tokens when access token nears expiry - Invalidate session if IdP revokes access (forces re-login) - Add background token refresh with concurrency protection - Persist OIDC tokens across restarts Related to #854	2025-12-20 10:45:46 +00:00
rcourtman	17498d7581	fix: reload HideLocalLogin immediately after settings change. Related to #857 When 'Hide local login form' was toggled in Settings, the change was saved to disk but not applied to the in-memory config until restart. Now reloadSystemSettings() also updates config.HideLocalLogin so the setting takes effect immediately.	2025-12-20 00:01:49 +00:00
rcourtman	7f05d87809	fix: add missing HandleLicenseFeatures method and related changes - Add HandleLicenseFeatures handler that was missing from license_handlers.go - Add /api/license/features route to router - Update AI service and metadata provider - Update frontend license API and components - Fix CI build failure caused by tests referencing unimplemented method	2025-12-19 22:59:52 +00:00
rcourtman	65e38fac91	test: improve test coverage for AI, license, config, and monitoring packages New test files: - internal/ai/providers/gemini_test.go: Comprehensive Gemini provider tests - internal/api/ai_intelligence_handlers_test.go: AI intelligence endpoint tests - internal/api/ai_patrol_handlers_test.go: AI patrol endpoint tests - internal/api/license_handlers_test.go: License API handler tests - internal/api/security_oidc_response_test.go: OIDC response formatting tests - internal/config/ai_config_test.go: AI configuration function tests - internal/config/persistence_ai_test.go: AI config persistence tests - internal/config/persistence_extended_test.go: Extended persistence tests - internal/license/persistence_test.go: License persistence tests - internal/license/pubkey_test.go: Public key handling tests - internal/monitoring/host_agent_temps_test.go: Temperature processing tests Enhanced existing files: - internal/api/updates_test.go: Added update handler tests - internal/license/license_test.go: Added Service method tests Coverage improvements: - ai/providers: 57.3% -> 73.0% (+15.7%) - license: 78.3% -> 85.9% (+7.6%) - config: 49.7% -> 53.9% (+4.2%) - monitoring: 49.8% -> 50.8% (+1.0%) - api: 28.4% -> 29.8% (+1.4%)	2025-12-19 22:49:30 +00:00
rcourtman	a1f811cb9e	test(ai): improve AI package test coverage from 59.7% to 69.5% Add comprehensive tests for: - alert_triggered.go: analysis functions (92%+ coverage) - patrol_history_persistence.go: all store methods (100%) - patrol.go: helper functions and getters (100%) - findings.go: Add edge cases, severity escalation (100%) - Export functions: all config/detector constructors (100%) New test files created: - patrol_history_persistence_test.go - exports_test.go - service_extended_test.go - service_remediation_test.go - service_tools_test.go - mock_test.go Also add coverage.html to .gitignore to exclude generated coverage reports.	2025-12-19 21:53:06 +00:00
rcourtman	1d64b4c31a	fix: show Removed Docker Hosts section in UI for re-enrollment The 'Removed Docker Hosts' section was not appearing in Settings -> Agents even when hosts were blocked from re-enrolling. This prevented users from using the 'Allow re-enroll' button to unblock their Docker agents. Root cause: The WebSocket store was missing: 1. The 'removedDockerHosts' property in its initial state 2. A handler to process removedDockerHosts data from WebSocket messages This meant the backend was correctly sending the data, but the frontend was completely ignoring it. Changes: - Add removedDockerHosts to WebSocket store initial state and message handler - Add removedDockerHosts to App.tsx fallback state for consistency - Add missing BroadcastState call after AllowDockerHostReenroll succeeds Also includes previous fixes from this session: - Add PULSE_AGENT_URL as alias for PULSE_AGENT_CONNECT_URL (config.go) - Add runtime Docker/Podman auto-detection in pulse-agent (main.go) Fixes issue reported by darthrater78 in discussion #845	2025-12-19 17:57:04 +00:00
rcourtman	1230099d3d	fix(test): resolve flaky concurrent temperature collection test	2025-12-19 17:09:57 +00:00
rcourtman	3a9df35ae1	fix(ai): improve patrol timing accuracy and status reporting	2025-12-19 17:04:14 +00:00
rcourtman	4d1138793d	feat(license): add initial license implementation structure to fix build	2025-12-19 17:01:57 +00:00
rcourtman	13af682ce1	fix(config): add PULSE_AGENT_CONNECT_URL and improve Docker detection - Add AgentConnectURL config option to override public URL for agents - Improve install.sh to diagnose docker detection failures - Update router to prioritize AgentConnectURL for agent install commands	2025-12-19 16:43:14 +00:00
rcourtman	a93148105f	fix: exclude WebSocket from rate limiting to prevent UI lockout The /ws endpoint was rate limited to 30 connections/minute. After prolonged use with WebSocket reconnections (network hiccups, browser tab throttling, etc.), users with many Docker containers would hit this limit and get stuck with a 'Connecting...' UI. WebSocket connections are already authenticated via session/API token and reconnections are normal behavior, so rate limiting is not needed. Fixes #859 (second report about WebSocket rate limiting after hours of use).	2025-12-19 14:51:52 +00:00
rcourtman	16f143d925	fix: respect X-Forwarded-Proto header for hasHTTPS in /api/security/status Fixes issue where /api/security/status reports hasHTTPS=false when accessed via HTTPS through a reverse proxy like Caddy. Resolves feedback from discussion #845 (clar2242).	2025-12-19 14:40:23 +00:00
rcourtman	968e0a7b3d	fix: reduce syslog flooding by downgrading routine logs to debug level Addresses issue #861 - syslog flooded on docker host Many routine operational messages were being logged at INFO level, causing excessive log volume when monitoring multiple VMs/containers. These messages are now logged at DEBUG level: - Guest threshold checking (every guest, every poll cycle) - Storage threshold checking (every storage, every poll cycle) - Host agent linking messages - Filesystem inclusion in disk calculation - Guest agent disk usage replacement - Polling start/completion messages - Alert cleanup and save messages Users can set LOG_LEVEL=debug to see these messages if needed for troubleshooting. The default INFO level now produces significantly less log output. Also updated documentation in CONFIGURATION.md and DOCKER.md to: - Clarify what each log level includes - Add tip about using LOG_LEVEL=warn for minimal logging	2025-12-18 23:27:32 +00:00
rcourtman	8400976e80	fix: wait for async save in guest metadata test The TestGuestMetadataStore_GetWithLegacyMigration_ClusteredMatchesNodeFormat test was flaky because it triggered an async save in GetWithLegacyMigration but didn't wait for it to complete. When the test ended, t.TempDir() tried to clean up while the goroutine was still writing, causing 'directory not empty' errors on CI. Added time.Sleep(100ms) to wait for the async save, matching the pattern used in other similar tests in the same file.	2025-12-18 22:48:15 +00:00
rcourtman	0d11da74e2	refactor(ui): standardize URL editing with shared UrlEditPopover component - Create reusable UrlEditPopover component with fixed positioning - Add createUrlEditState hook for managing editing state - Update DockerHostSummaryTable to use new popover - Update DockerUnifiedTable (containers & services) to use new popover - Update GuestRow (Proxmox VMs/containers) to use new popover - Update HostsOverview (Proxmox hosts) to use new popover - Add Docker host metadata API for custom URLs - Consistent styling with save, delete, cancel buttons and keyboard shortcuts	2025-12-18 22:22:55 +00:00
rcourtman	65829983b5	v5: gate legacy sensor-proxy and prune dev docs	2025-12-18 21:51:25 +00:00
rcourtman	0d6aaff253	fix: AI Patrol frequency not obeying settings Fixes #858 The patrol interval setting was not being properly applied due to: 1. ReconfigurePatrol() was setting the deprecated QuickCheckInterval field instead of the preferred Interval field 2. SetConfig() was comparing raw field values instead of using GetInterval() to compare effective intervals, causing change detection to fail 3. The API response was missing interval_ms, preventing the frontend from displaying the correct interval Changes: - Update StartPatrol() and ReconfigurePatrol() to use the Interval field - Fix SetConfig() to use GetInterval() for interval comparison - Add IntervalMs to PatrolStatusResponse and include it in the API response	2025-12-18 21:33:50 +00:00
rcourtman	2b48b0a459	feat: add --kube-include-all-deployments flag for Kubernetes agent Adds IncludeAllDeployments option to show all deployments, not just problem ones (where replicas don't match desired). This provides parity with the existing --kube-include-all-pods flag. - Add IncludeAllDeployments to kubernetesagent.Config - Add --kube-include-all-deployments flag and PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS env var - Update collectDeployments to respect the new flag - Add test for IncludeAllDeployments functionality - Update UNIFIED_AGENT.md documentation Addresses feedback from PR #855	2025-12-18 20:58:30 +00:00
rcourtman	90799f4771	fix: correct pod/deployment filtering logic and fix test helper calls - Remove unused sets import from kubernetesagent - Fix inverted filtering logic: keep problem pods/deployments, skip healthy ones - Fix test helper calls: use slice literals instead of undefined makeNamespaceSet	2025-12-18 16:59:37 +00:00
rcourtman	b05791a3e5	fix: remove unused sets import in kubernetesagent	2025-12-18 16:42:51 +00:00
rcourtman	fdb2a07f56	fix(agent): find zpool binary on TrueNAS SCALE (#718) Enhanced zpool binary lookup to try common paths when exec.LookPath fails. This fixes issue #718 where TrueNAS SCALE reports inflated storage because the agent runs with a restricted PATH that doesn't include /usr/sbin. Changes: - Added findZpool() helper that tries common paths like /usr/sbin/zpool, /sbin/zpool, /usr/local/sbin/zpool for TrueNAS/FreeBSD/Linux systems - Added commonZpoolPaths variable listing typical zpool locations - Added tests for the new findZpool function This ensures zpool list is used for accurate pool-level capacity instead of falling back to dataset-level summation.	2025-12-18 16:23:56 +00:00
rcourtman	0182cc8310	feat(thresholds): add collapsible accordion sections and UX improvements - Add CollapsibleSection component with animated expand/collapse - Wrap all 6 resource sections (Nodes, VMs, PBS, Storage, Backups, Snapshots) with accordion UI - Add section icons and resource counts in headers - Add expand all / collapse all buttons for quick navigation - Make help banner dismissible with localStorage persistence - Add Ctrl/Cmd+F keyboard shortcut to focus search - Add keyboard shortcut hint badge on search input - Add icons to tab navigation for quick identification - Improve mobile tab labels with shorter text on small screens - Create reusable components: ThresholdBadge, ResourceCard, GlobalDefaultsRow - Create useCollapsedSections hook with localStorage persistence - Default less-used sections (Storage, Backups, Snapshots, PBS) to collapsed	2025-12-18 15:47:44 +00:00
rcourtman	c91307be94	fix: guest URL icon now appears/disappears immediately after AI sets/removes it The issue was a SolidJS reactivity problem in the Dashboard component. When guestMetadata signal was accessed inside a For loop callback and assigned to a plain variable, SolidJS lost reactive tracking. Changed from: const metadata = guestMetadata()[guestId] || ... customUrl={metadata?.customUrl} To: const getMetadata = () => guestMetadata()[guestId] || ... customUrl={getMetadata()?.customUrl} This ensures SolidJS properly tracks the signal dependency when the getter function is called directly in JSX props.	2025-12-18 14:42:47 +00:00
rcourtman	5c9bbf33b6	Merge pull request #856 from BTLzdravtech/wildcard Adds wildcard support for kube namespace filtering.	2025-12-18 11:30:18 +00:00
rcourtman	cf57cfcb03	Merge pull request #855 from BTLzdravtech/main Fixes inverted boolean logic in isProblemPod/isProblemDeployment checks and improves init container exit code handling.	2025-12-18 11:30:11 +00:00
Tomas Hruska	a419b6237a	support wildcards --kube-include-namespace/--kube-exclude-namespace	2025-12-18 00:00:30 +01:00
Tomas Hruska	69d693f346	Fix kubernetes logic and init containers detection	2025-12-17 23:47:03 +01:00
rcourtman	210a6f7cc0	monitoring: keep host IDs stable via token+hostname binding	2025-12-17 20:16:27 +00:00
rcourtman	ebc3474647	hostagent: avoid identity collisions with MAC fallback (Related to #836)	2025-12-17 20:09:55 +00:00
rcourtman	3623395549	test(api): allow printable alert IDs for acknowledge (Related to #852)	2025-12-17 20:09:51 +00:00
rcourtman	5338ab580c	Stabilize core E2E tests - Preserve alerts activation state when saving thresholds - Use compliant default E2E password and deterministic bootstrap token seeding - Harden Playwright selectors, waits, and diagnostics gating	2025-12-17 19:36:48 +00:00
rcourtman	54fc259221	fix(ai): improve AI settings UX with validation and smart fallbacks Backend: - Add smart provider fallback when selected model's provider isn't configured - Automatically switch to a model from a configured provider instead of failing - Log warning when fallback occurs for visibility Frontend (AISettings.tsx): - Add helper functions to check if model's provider is configured - Group model dropdown: configured providers first, unconfigured marked with ⚠️ - Add inline warning when selecting model from unconfigured provider - Validate on save that model's provider is configured (or being added) - Warn before clearing last configured provider (would disable AI) - Warn before clearing provider that current model uses - Add patrol interval validation (must be 0 or >= 10 minutes) - Show red border + inline error for invalid patrol intervals 1-9 - Update patrol interval hint: '(0=off, 10+ to enable)' These changes prevent confusing '500 Internal Server Error' and 'AI is not enabled or configured' errors when model/provider mismatch.	2025-12-17 18:30:19 +00:00
rcourtman	c4b893e257	Fix agent download serving wrong architecture binary When a specific architecture is requested (e.g., linux-arm64), don't fall back to the generic pulse-agent binary if the requested arch isn't found. This was causing ARM64 machines to receive x86-64 binaries that can't run. Now returns 404 with helpful error message if requested architecture binary is not available.	2025-12-17 17:22:51 +00:00
rcourtman	30f01771ac	Add meaningful tests for host agent and exec websocket	2025-12-17 17:02:01 +00:00
rcourtman	ab480ca489	fix: Prevent orphaned encrypted data when encryption key is deleted - crypto.go: Add runtime validation to Encrypt() that verifies the key file still exists on disk before encrypting. If the key was deleted while Pulse is running, encryption now fails with a clear error instead of creating orphaned data that can never be decrypted. - hot-dev.sh: Auto-generate encryption key for production data directory (/etc/pulse) when HOT_DEV_USE_PROD_DATA=true and key is missing. This prevents startup failures and ensures encrypted data can be created. - Added test TestEncryptRefusesAfterKeyDeleted to verify the protection works.	2025-12-17 17:00:53 +00:00
rcourtman	d663ba4342	hostagent: avoid host ID collisions and prefer LAN IP	2025-12-17 16:29:59 +00:00
rcourtman	71e1b5dc86	test: expand AI provider test coverage with HTTP mocks	2025-12-17 15:53:56 +00:00
rcourtman	47dfa5d703	test: expand cmd and agent update coverage	2025-12-17 13:28:17 +00:00
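
Commit 5173fc3162 above describes the canonical guest ID as instance:node:vmid (e.g., delly:delly:101), generated by makeGuestID(). The following is a minimal sketch of that format in Go, not the repository's code: the function signature is an assumption (only the name and the output format come from the commit message), and the node name "minipc" is hypothetical. It shows why keeping the node component stops the same VMID on two nodes from colliding.

```go
package main

import "fmt"

// makeGuestID builds an ID in the instance:node:vmid format described in
// commit 5173fc3162. The parameter list is an assumption for illustration;
// the real function in the repository may take different arguments.
func makeGuestID(instance, node string, vmid int) string {
	return fmt.Sprintf("%s:%s:%d", instance, node, vmid)
}

func main() {
	// Example taken from the commit message.
	a := makeGuestID("delly", "delly", 101)

	// Same instance and VMID on a second node ("minipc" is a hypothetical name).
	b := makeGuestID("delly", "minipc", 101)

	fmt.Println(a)
	fmt.Println(b)

	// The node component keeps the two guests distinct.
	fmt.Println(a != b)
}
```

Any fallback that omits the node would map both guests above to the same key, which is the metadata and metrics collision the commit message describes for clustered setups.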

940 commits