Commit graph

44 commits

Author SHA1 Message Date
rcourtman
d5b4850715 Harden AI session storage paths 2026-03-28 13:50:55 +00:00
rcourtman
c12394c17f Route patrol investigations through patrol model (#1360) 2026-03-26 09:16:38 +00:00
rcourtman
4ba888b450 Fix Pulse Assistant startup for legacy OpenAI-compatible configs (#1339) 2026-03-25 23:54:17 +00:00
rcourtman
ae2edbde20 fix(ai): complete wiring on first-time configure; guard Ollama fallback
Three follow-up fixes:

1. RestartAIChat() now performs the full post-start wiring (MCP providers,
   patrol adapter, investigation orchestrator) when the service starts for
   the first time via Restart(). Previously these were only wired via
   StartAIChat(), leaving first-time configure with a partially wired service.

2. The Ollama→OpenAI-compatible fallback in createProviderForModel is now
   guarded by !strings.HasPrefix(modelStr, "ollama:") so explicit
   "ollama:llama3" models are never silently rerouted to a different provider.

3. Windows install script registration check now uses the $Hostname override
   (if set) instead of always looking up $env:COMPUTERNAME, so post-install
   verification works correctly when a custom hostname is specified.
2026-03-13 12:06:08 +00:00
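A minimal sketch of the guarded fallback from item 2 of the commit above, assuming a simplified provider-selection function; the names, signature, and configuration fields are illustrative, not Pulse's actual createProviderForModel.

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative only: bare model names default to Ollama, and the fallback to a
// custom OpenAI-compatible endpoint is skipped for explicit "ollama:" models.
func chooseProvider(modelStr, customBaseURL string, ollamaConfigured bool) (string, error) {
	provider := "ollama" // bare names like "qwen3-omni" route to Ollama
	if i := strings.Index(modelStr, ":"); i > 0 {
		provider = modelStr[:i]
	}

	if provider == "ollama" && !ollamaConfigured {
		// Guard: never silently reroute an explicitly requested Ollama model.
		if !strings.HasPrefix(modelStr, "ollama:") && customBaseURL != "" {
			return "openai-compatible", nil
		}
		return "", fmt.Errorf("ollama is not configured for model %q", modelStr)
	}
	return provider, nil
}

func main() {
	fmt.Println(chooseProvider("qwen3-omni", "http://lmstudio:1234/v1", false))    // openai-compatible
	fmt.Println(chooseProvider("ollama:llama3", "http://lmstudio:1234/v1", false)) // error: not rerouted
}
```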
rcourtman
e137f3fbf7 fix(ai): start chat service on first-time configure without restart
When Pulse starts before AI is configured, legacyService is nil.
Saving AI settings called Restart() which bailed immediately on the
nil check, leaving the service unstarted (503 on /api/ai/sessions)
until a full process restart.

Merged the nil and !IsRunning checks so first-time configure now
starts the service inline, same as the already-handled stopped case.

Also: bare model names that ParseModelString routes to Ollama (e.g.
"qwen3-omni") now fall back to a configured custom OpenAI base URL
when Ollama is not explicitly configured — handles manually-typed
model names on self-hosted OpenAI-compatible endpoints.

Fixes #1339, #1296
2026-03-13 11:13:27 +00:00
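A compact sketch of the merged start condition described in the commit above; the manager and service types are placeholders rather than Pulse's actual code.

```go
package main

import "fmt"

// Placeholder service type for illustration.
type chatService struct{ running bool }

func (s *chatService) IsRunning() bool { return s != nil && s.running }

type manager struct{ legacyService *chatService }

// Restart no longer bails on a nil service: first-time configure and an
// already-stopped service both take the same inline-start path.
func (m *manager) Restart() {
	if m.legacyService == nil || !m.legacyService.IsRunning() {
		m.legacyService = &chatService{running: true} // start and wire inline
		return
	}
	// ...stop and restart the already-running service...
}

func main() {
	m := &manager{} // AI not yet configured: legacyService is nil
	m.Restart()
	fmt.Println(m.legacyService.IsRunning()) // true: no full process restart needed
}
```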
rcourtman
82c615b3b9 Filter virtual disks from SMART checks to prevent false positives (#1329)
ZFS zvols (zd*), device-mapper, virtio disks, and other virtual block
devices don't support SMART and were being reported as FAILED. Use lsblk
JSON metadata to filter by device prefix, transport, subsystem, and
vendor/model. Also treat missing smart_status as unknown rather than
failed, and ignore UNKNOWN health in Patrol/AI signals.
2026-03-08 22:16:24 +00:00
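An illustrative filter along the lines the commit above describes, keyed on the fields it mentions (device-name prefix and transport from lsblk's JSON output); the prefix list and exact checks here are assumptions, not the shipped implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// Sketch: skip block devices that cannot support SMART so they are never
// reported as FAILED. Pulse also consults subsystem and vendor/model fields.
func isVirtualDisk(name, tran string) bool {
	virtualPrefixes := []string{"zd", "dm-", "vd", "loop", "md", "nbd"}
	for _, p := range virtualPrefixes {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	// No transport usually indicates a synthetic device rather than SATA/NVMe/USB.
	return tran == ""
}

func main() {
	fmt.Println(isVirtualDisk("zd16", ""))    // true: ZFS zvol, skip SMART
	fmt.Println(isVirtualDisk("sda", "sata")) // false: real disk, keep SMART check
}
```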
rcourtman
d46b5fc84b fix(ai): route OpenRouter slash-delimited models to OpenAI provider (#1296)
createProviderForModel() only handled "provider:model" colon format.
Models like "google/gemini-2.5-flash" or "google/gemini-2.0-flash:free"
(OpenRouter format) failed because the colon split produced invalid
provider names.

Now uses config.ParseModelString() which correctly detects slash-
delimited models as OpenRouter (routed via OpenAI-compatible API).
2026-03-01 22:29:45 +00:00
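A small sketch of the detection order described above; this is not config.ParseModelString itself, and the function name and return values are simplified placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// Slash-delimited IDs are OpenRouter models, so any ":suffix" (e.g. ":free")
// is part of the model ID rather than a provider prefix.
func parseModel(s string) (provider, model string) {
	if strings.Contains(s, "/") {
		return "openrouter", s // routed via the OpenAI-compatible API
	}
	if i := strings.Index(s, ":"); i > 0 {
		return s[:i], s[i+1:]
	}
	return "ollama", s // bare names default to Ollama
}

func main() {
	fmt.Println(parseModel("google/gemini-2.0-flash:free")) // openrouter google/gemini-2.0-flash:free
	fmt.Println(parseModel("anthropic:claude-sonnet"))      // anthropic claude-sonnet
}
```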
rcourtman
d852964696 fix(ai): record patrol and QuickAnalysis token usage in cost store for budget enforcement
Patrol runs, evaluation passes, and QuickAnalysis calls were consuming
LLM tokens without recording them in the cost store. This made the
cost_budget_usd_30d budget setting ineffective since enforceBudget()
never saw patrol spend.

- Add RecordUsage() to ai.Service for thread-safe cost recording
- Add recordPatrolUsage() helper to PatrolService, called on both
  success and error paths for main patrol and evaluation pass
- Record QuickAnalysis token usage in cost store
- Return partial PatrolResponse (with token counts) on error instead
  of nil, so callers can always record consumed tokens
- Propagate partial response through chat_service_adapter on error
2026-03-01 19:19:47 +00:00
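A minimal sketch of thread-safe usage recording in the spirit of the RecordUsage() helper above; the struct, fields, and method signatures are assumptions, not the real ai.Service or cost store API.

```go
package main

import (
	"fmt"
	"sync"
)

// Illustrative cost store: patrol, evaluation, and QuickAnalysis paths all
// record spend here, including on error paths, so budget checks see it.
type costStore struct {
	mu       sync.Mutex
	tokens   int64
	usdSpent float64
}

func (c *costStore) RecordUsage(inputTokens, outputTokens int64, usd float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.tokens += inputTokens + outputTokens
	c.usdSpent += usd
}

func (c *costStore) OverBudget(budgetUSD float64) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return budgetUSD > 0 && c.usdSpent >= budgetUSD
}

func main() {
	s := &costStore{}
	s.RecordUsage(1200, 300, 0.004) // a patrol run records spend even when it errors
	fmt.Println(s.OverBudget(0.002))
}
```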
rcourtman
24f5b1cb31 fix(patrol): cap per-run tokens and reset patrol session history 2026-02-24 11:29:47 +00:00
rcourtman
8bb89c4031 test: add memory regression coverage for AI stores 2026-02-04 19:56:12 +00:00
rcourtman
8720708e70 fix: address AI patrol concurrency and streaming issues
- HIGH: Create per-request AgenticLoop instead of sharing one across
  concurrent sessions. This prevents race conditions where ExecuteStream
  calls would overwrite each other's FSM, knowledge accumulator, and
  other session-specific state.

- MEDIUM: TriggerManager.GetStatus now recomputes adaptive interval after
  pruning old events. Previously, currentInterval could remain stuck in
  busy/quiet mode after events aged out of the window.

- MEDIUM: Patrol stream phases are now broadcast to subscribers. Fixed
  setStreamPhase() to emit phase events and SubscribeToStream() to send
  phase events to late joiners. UI was stuck on 'Starting patrol...'
  because phase events were never emitted.

- LOW: Fixed TriggerStatus.CurrentInterval JSON serialization. Changed
  from time.Duration (serializes as nanoseconds) to int64 milliseconds
  to match the 'current_interval_ms' tag.
2026-02-03 14:39:00 +00:00
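A sketch of the serialization fix in the last bullet above: a time.Duration field marshals as nanoseconds, so the status struct exposes an int64 millisecond value to match the `current_interval_ms` tag. The struct and helper names here are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

type triggerStatus struct {
	CurrentIntervalMs int64 `json:"current_interval_ms"` // was time.Duration (nanoseconds)
}

func statusFor(interval time.Duration) triggerStatus {
	return triggerStatus{CurrentIntervalMs: interval.Milliseconds()}
}

func main() {
	b, _ := json.Marshal(statusFor(90 * time.Second))
	fmt.Println(string(b)) // {"current_interval_ms":90000}
}
```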
rcourtman
a55ae78715 Revert "Add config option to disable tools for OpenAI-compatible endpoints"
This reverts commit 81229f206f.
2026-02-03 13:26:26 +00:00
rcourtman
81229f206f Add config option to disable tools for OpenAI-compatible endpoints
Some local LLM servers (LM Studio, llama.cpp) expose OpenAI-compatible
APIs but don't support function calling. When tools are sent to these
models, they output raw control tokens instead of proper responses.

This change adds:
- openai_tools_disabled config field in AIConfig
- AreToolsDisabledForProvider() method to check at runtime
- API support to get/set the new setting
- Tests for the new functionality

When enabled and using a custom OpenAI base URL, the chat service will
skip sending tools to the model, allowing basic chat functionality to
work even with models that don't support function calling.

Fixes #1154
2026-02-03 13:21:44 +00:00
rcourtman
900e05025a Fix OpenAI-compatible endpoint support for chat
Two issues fixed:

1. Custom base URL wasn't being passed to the OpenAI client in
   createProviderForModel() - requests went to api.openai.com instead
   of the configured endpoint (e.g., LM Studio, llama.cpp)

2. Tool schemas were missing the "properties" field when tools had no
   parameters. OpenAI API requires "properties" to always be present
   as an object, even if empty.

Fixes #1154
2026-02-03 12:03:06 +00:00
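A sketch of the schema fix in point 2 above: the OpenAI tools API expects `parameters.properties` to be present as an object even for zero-argument tools, so an empty map is emitted instead of omitting the field. The type names are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type toolParameters struct {
	Type       string                 `json:"type"`
	Properties map[string]interface{} `json:"properties"` // never nil or omitted
	Required   []string               `json:"required,omitempty"`
}

func emptyParams() toolParameters {
	return toolParameters{
		Type:       "object",
		Properties: map[string]interface{}{}, // empty object, not null
	}
}

func main() {
	b, _ := json.Marshal(emptyParams())
	fmt.Println(string(b)) // {"type":"object","properties":{}}
}
```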
rcourtman
d1f76982ec fix: finding drawer actions (notes persist, acknowledge visual, discuss context)
- Sync UserNote, AcknowledgedAt, SnoozedUntil, DismissedReason, Suppressed,
  and TimesRaised from ai.Finding to unified store in both callback and
  startup sync paths. Mirror note writes to unified store immediately.
- Dim acknowledged findings (opacity-60), add "Acknowledged" badge, hide
  acknowledge button once acknowledged, sort below unacknowledged in
  severity mode.
- Pass finding_id through frontend chat API → backend ChatRequest →
  ExecuteRequest. Look up full finding from unified store (mutex-guarded)
  and prepend structured context to the prompt.
2026-02-02 15:18:51 +00:00
rcourtman
7946a2a9c1 test(ai/chat): add agentic loop formatting tests 2026-02-02 11:15:31 +00:00
rcourtman
20f1a9ee7f test(ai/chat): add tests for service utilities and knowledge extraction 2026-02-02 11:15:02 +00:00
rcourtman
fa1b74792e docs: add comprehensive deep-dive documentation for AI subsystems
Adds detailed architecture documentation for Pulse Patrol and Pulse Assistant. Updates AI.md and PULSE_PRO.md. Also includes additional tests.
2026-02-02 10:29:07 +00:00
rcourtman
71e00ee7df fix(ai): filter DeepSeek DSML internal function-call format from responses 2026-02-01 18:07:41 +00:00
rcourtman
9d83e4e1d1 fix(ai): fix ollama streaming timeouts and ensure consistent tool call responses 2026-02-01 16:28:24 +00:00
rcourtman
78e9086a19 fix(ai): minor chat service and agentic loop refinements 2026-02-01 10:12:49 +00:00
rcourtman
81ec5c525a feat(ai): parallelize tool execution and refine knowledge extraction
- Implement parallel execution for read-only tools in agentic loop
- Optimize negative marker summaries to be more informative
- Fix memory percentage scaling in query tools
- Add derived memory stats (avg/max) to extraction logic
- Add explicit fresh data intent detection to bypass knowledge gate
- Update associated tests
2026-02-01 00:12:36 +00:00
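A small sketch of the parallel read-only execution idea from the first bullet above, using a plain WaitGroup; the tool-call and result types are stand-ins, not the agentic loop's real interfaces.

```go
package main

import (
	"fmt"
	"sync"
)

type toolCall struct {
	Name     string
	ReadOnly bool
}

// Read-only tool calls in a turn run concurrently; anything with side effects
// stays sequential.
func runTools(calls []toolCall, run func(toolCall) string) []string {
	results := make([]string, len(calls))
	var wg sync.WaitGroup
	for i, c := range calls {
		if !c.ReadOnly {
			results[i] = run(c)
			continue
		}
		wg.Add(1)
		go func(i int, c toolCall) {
			defer wg.Done()
			results[i] = run(c)
		}(i, c)
	}
	wg.Wait()
	return results
}

func main() {
	calls := []toolCall{{"query.health", true}, {"metrics.performance", true}}
	fmt.Println(runTools(calls, func(c toolCall) string { return "ok:" + c.Name }))
}
```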
rcourtman
50ecd62ae6 feat(ai): add extractors for all tool actions and fix predict/extract key mismatches
Major expansion of the knowledge extraction system with extractors for
every tool action, plus critical bug fixes discovered during live testing:

Extractors added:
- query: topology, health, search, list, config (with secondary cached key)
- storage: pools, disk_health, raid, backups, backup_tasks, ceph, ceph_details,
  snapshots, replication, pbs_jobs, resource_disks
- metrics: performance, baselines, physical disks, temperatures
- alerts: list, findings (with resource_id in key to prevent collisions)
- docker: services, updates, swarm, tasks
- kubernetes: clusters, nodes, pods, deployments
- pmg: status, mail_stats, queues, spam
- patrol: get_findings
- exec/read: file reads and command execution

Bug fixes from live testing:
- Config predict key mismatch: model omits node from input, added secondary
  config:{resource_id}:cached key for reliable gate matching
- Baselines predict key mismatch: predict returned baseline:{id} but extractor
  stores baselines:queried; simplified predict to always return marker key
- Query:get composite ID: resource.ID contained node prefix (delly:minipc:100),
  now prefers VMID (int) over composite string
- Alerts findings wrong JSON field: struct expected "findings" but API returns
  "active" with separate "counts" object
- Finding key collision: two findings with same key but different resource_id
  overwrote each other; added resource_id to fact key
- Backup tasks vmid type: API returns int but struct had string, causing
  json.Unmarshal failure; also fixed status case-sensitivity (OK vs ok)

Tests: 91+ tests including 38 roundtrip cases, 5 gate flow tests, negative
marker test. All pass.
2026-01-31 22:24:54 +00:00
rcourtman
82ddeac454 refactor(ai): refine agentic loop compaction and knowledge accumulation
- Inject wrap-up nudges/escalations after token/turn thresholds are met
- Update compaction logic to include key accumulated facts in summaries
- Refine knowledge extraction and accumulation tests
- Update main entry point for revised AI configuration
2026-01-31 19:33:43 +00:00
rcourtman
c5717d1a10 feat(ai): add knowledge accumulation and enhance agentic loop
- Introduce KnowledgeAccumulator to persist facts across turns
- Enhance AgenticLoop to support knowledge injection and final text summaries
- Update chat service to wire up knowledge components
- Frontend updates to support enhanced chat capabilities
2026-01-31 16:22:50 +00:00
rcourtman
95a0d7a6bd feat(backend): implement AI Patrol, Investigation, and system-wide refactors 2026-01-30 19:02:14 +00:00
rcourtman
e85ec858fd fix(ai): discovery transient error handling, agentic loop detection, and read-only classification
- Discovery: classify transient errors (429, timeout, connection refused, etc.)
  and return IsError:true so models stop retrying rate-limited calls
- Agentic loop: detect identical tool calls repeated >3 times and block with
  LOOP_DETECTED error, forcing the model to try a different approach
- OpenAI provider: skip tool_choice for DeepSeek Reasoner which doesn't support it
- Read-only classifier: fix curl -I case sensitivity (uppercase flags lowered),
  add iostat/vmstat/mpstat/sar/lxc-ls/lxc-info/nc -z to allowlist,
  fix 2>&1 false positive in input redirect detection
2026-01-29 18:29:54 +00:00
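A minimal sketch of the loop guard described above: identical tool calls (same tool name and arguments) are counted, and the fourth repetition is blocked with a LOOP_DETECTED error so the model must change approach. The key format and error text are assumptions; only the >3 threshold comes from the commit.

```go
package main

import (
	"errors"
	"fmt"
)

type loopGuard struct {
	counts map[string]int
}

var errLoopDetected = errors.New("LOOP_DETECTED: identical tool call repeated too many times")

func (g *loopGuard) check(tool, argsJSON string) error {
	if g.counts == nil {
		g.counts = map[string]int{}
	}
	key := tool + "\x00" + argsJSON
	g.counts[key]++
	if g.counts[key] > 3 {
		return errLoopDetected
	}
	return nil
}

func main() {
	g := &loopGuard{}
	for i := 0; i < 5; i++ {
		fmt.Println(i+1, g.check("query", `{"action":"health"}`)) // calls 4 and 5 are blocked
	}
}
```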
rcourtman
a1fd2c4ddc fix(ai): skip orphaned tool calls when pruning messages
When pruning older messages to fit context limits, we may cut off
a user message that preceded an assistant message with tool calls.
This leaves an orphaned tool call sequence at the start.

Extend pruneMessagesForModel to:
- Skip leading assistant messages with tool calls
- Also skip their following tool results
- Ensure a clean message sequence for all providers
2026-01-29 08:19:55 +00:00
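A sketch of the pruning rule described above: after older messages are cut to fit the context window, a leading assistant message that carries tool calls is dropped together with the tool results that follow it, so the remaining sequence starts cleanly. The message shape is simplified for illustration.

```go
package main

import "fmt"

type msg struct {
	Role         string // "user", "assistant", "tool"
	HasToolCalls bool
}

func dropOrphanedToolCalls(msgs []msg) []msg {
	i := 0
	for i < len(msgs) {
		if msgs[i].Role == "assistant" && msgs[i].HasToolCalls {
			i++ // skip the orphaned assistant tool-call message
			for i < len(msgs) && msgs[i].Role == "tool" {
				i++ // and the tool results that answer it
			}
			continue
		}
		break
	}
	return msgs[i:]
}

func main() {
	pruned := []msg{{Role: "assistant", HasToolCalls: true}, {Role: "tool"}, {Role: "tool"}, {Role: "user"}}
	fmt.Println(len(dropOrphanedToolCalls(pruned))) // 1: only the user message remains
}
```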
rcourtman
1b1c9bb2a3 refactor(ai): convert patrol to agentic tool-based execution
- Replace output-parsing approach with tool-based finding creation
- PatrolService now uses runAIAnalysis with proper scope handling
- Add tool event streaming (tool_start, tool_end) to patrol events
- Expose GetExecutor() on chat.Service for patrol integration
- Remove regex-based finding extraction in favor of patrol tools

The patrol now uses the same agentic loop as chat, with the LLM calling
patrol_report_finding to create findings rather than outputting JSON
that gets parsed. This is more reliable and consistent with the tool model.
2026-01-28 23:18:58 +00:00
rcourtman
9c2f8a3284 refactor(ai): remove obsolete tool and chat files
Remove files that were consolidated into other modules:
- chat/patrol.go, patrol_test.go → moved to chat/service.go
- tools_infrastructure.go → merged into tools_storage.go
- tools_intelligence.go → merged into tools_metrics.go
- tools_patrol.go → merged into tools_alerts.go
- tools_profiles.go, tools_profiles_test.go → removed (unused)

Update related test file references.
2026-01-28 21:30:24 +00:00
rcourtman
badbad4464 refactor(ai): integrate patrol execution into chat service
- Add ExecutePatrolStream method to chat.Service for patrol-specific execution
- Create chat_service_adapter.go to bridge chat.Service to ai.ChatServiceProvider
- Remove standalone patrol.go and patrol_test.go from chat package
- Add PatrolRequest/PatrolResponse types to chat service
- Add context injection for recent message context

This allows patrol to use an isolated agentic loop with its own system prompt
while leveraging the common chat infrastructure.
2026-01-28 21:21:41 +00:00
rcourtman
279d4e7ec3 Add context prefetching and metrics to chat service
Chat service improvements for better performance and observability:

Context Prefetching:
- Pre-load resource context when user mentions containers/nodes
- Reduces latency for follow-up queries
- Smart caching with TTL-based invalidation

Metrics Collection:
- Track tool execution counts and durations
- FSM state transition metrics
- Recovery success/failure rates
- Telemetry for safety blocks

Service Updates:
- Better session management
- Improved error handling
- Cleaner test organization
2026-01-28 16:50:46 +00:00
rcourtman
b2e0ae3fdb Add ExecutionIntent classification and NonInteractiveOnly enforcement
Implement safety layers for command execution:

ExecutionIntent classifies commands as:
- ObservationOnly: Pure read (status, logs, metrics)
- SideEffects: May change state (restart, write, delete)

NonInteractiveOnly enforces safe command forms:
- Blocks interactive commands (vim, top without -b, etc.)
- Blocks unbounded streaming (tail -f without limit)
- Suggests safe alternatives in error messages

Add phantom execution detection:
- Catches when model claims actions without using tools
- Skips check when tools actually succeeded (fixes false positives)

Includes comprehensive tests for:
- Intent classification accuracy
- Interactive command blocking
- Strict resolution validation
2026-01-28 16:49:00 +00:00
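A sketch of the NonInteractiveOnly check described above: interactive commands and unbounded streaming forms are rejected with a hint toward a safe alternative. The command lists and messages here are a few examples, not the full classifier.

```go
package main

import (
	"fmt"
	"strings"
)

func checkNonInteractive(cmd string) error {
	fields := strings.Fields(cmd)
	if len(fields) == 0 {
		return nil
	}
	switch fields[0] {
	case "vim", "nano", "less":
		return fmt.Errorf("%s is interactive; read the file with cat or head instead", fields[0])
	case "top":
		if !strings.Contains(cmd, "-b") {
			return fmt.Errorf("top is interactive; use 'top -b -n 1' for one batch sample")
		}
	case "tail":
		if strings.Contains(cmd, "-f") {
			return fmt.Errorf("tail -f streams forever; use 'tail -n 200' for a bounded read")
		}
	}
	return nil
}

func main() {
	fmt.Println(checkNonInteractive("tail -f /var/log/syslog"))   // blocked with a suggestion
	fmt.Println(checkNonInteractive("tail -n 200 /var/log/syslog")) // nil: allowed
}
```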
rcourtman
6e739cea5c Add resolved context and routing provenance tracking
Implement ResolvedContext to track pinned resources during chat sessions:
- ResolvedTarget captures resource ID, type, node, and provenance info
- Provenance tracking records how targets were resolved (user mention,
  tool result, or implicit context)
- Session maintains pinned targets that persist across conversation turns

Add routing contract tests to verify:
- Commands routed to correct container vs host targets
- Provenance properly recorded for different resolution methods
- Context maintained across multi-turn conversations

This provides audit trail for which resources were accessed and how
they were identified, supporting safety verification and debugging.
2026-01-28 16:48:25 +00:00
rcourtman
6a0ba8d1a4 Add FSM workflow guardrails for AI assistant safety
Implement a state machine that enforces structural safety guarantees:
- RESOLVING: Initial state, must discover resources before writing
- READING: Read tools allowed after discovery
- WRITING: Transitions to VERIFYING after any write operation
- VERIFYING: Must perform read verification before next write

This prevents:
- Write operations without resource discovery
- Consecutive writes without verification
- Final answers without post-write verification

The FSM is enforced at the tool execution layer, providing defense-in-depth
that doesn't rely on prompt instructions alone.
2026-01-28 16:47:54 +00:00
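A minimal sketch of the workflow FSM described above: writes are only legal after resources have been discovered, and every write must be followed by a read verification before the next write. The state names mirror the commit; the transition API is an assumption, and the WRITING state is collapsed into the write step for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

type state int

const (
	resolving state = iota // initial: must discover resources first
	reading                // reads allowed after discovery
	verifying              // a write happened; must verify before the next write
)

type fsm struct{ s state }

func (f *fsm) onRead() {
	if f.s == resolving || f.s == verifying {
		f.s = reading // discovery done, or post-write verification satisfied
	}
}

func (f *fsm) onWrite() error {
	switch f.s {
	case reading:
		f.s = verifying // write performed; next write requires verification
		return nil
	case resolving:
		return errors.New("write blocked: discover resources first")
	default:
		return errors.New("write blocked: verify the previous write first")
	}
}

func main() {
	f := &fsm{}
	fmt.Println(f.onWrite()) // blocked: still resolving
	f.onRead()
	fmt.Println(f.onWrite()) // nil: write allowed after discovery
	fmt.Println(f.onWrite()) // blocked: consecutive writes need verification
}
```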
rcourtman
7f7edfceb4 test: expand backend coverage 2026-01-25 21:08:44 +00:00
rcourtman
27f1a11acb feat: add AI Intelligence system with investigation and forecasting
Major new AI capabilities for infrastructure monitoring:

Investigation System:
- Autonomous finding investigation with configurable autonomy levels
- Investigation orchestrator with rate limiting and guardrails
- Safety checks for read-only mode enforcement
- Chat-based investigation with approval workflows

Forecasting & Remediation:
- Trend forecasting for resource capacity planning
- Remediation engine for generating fix proposals
- Circuit breaker for AI operation protection

Unified Findings:
- Unified store bridging alerts and AI findings
- Correlation and root cause analysis
- Incident coordinator with metrics recording

New Frontend:
- AI Intelligence page with patrol controls
- Investigation drawer for finding details
- Unified findings panel with actions

Supporting Infrastructure:
- Learning store for user preference tracking
- Proxmox event ingestion and correlation
- Enhanced patrol with investigation triggers
2026-01-24 22:41:43 +00:00
rcourtman
37e7aebc98 feat: enhance AI patrol with streaming and improved findings
- Add streaming support to patrol operations
- Improve finding detection and reporting
- Enhance agentic chat capabilities
- Add alert integration improvements
2026-01-22 22:30:35 +00:00
rcourtman
422efdde61 Restore UI improvements and refine Docker/Hosts display
- Restore 'mini' mode for StackedDiskBar.
- Restore layout fixes (fixed table layout, mobile columns) for Docker and Hosts tables.
- Remove 'Ask AI' and AI context selection features.
- Docker: Use compact 'Cube' icon for Podman pods to prevent name obstruction.
- Docker: Show concise image names (strip registry URL).
- Backend: Include pending fixes for AI providers.
2026-01-22 18:03:35 +00:00
rcourtman
798f6a8deb Refactor: Update AI tools and tests for multi-tenancy
- Refactored tool execution to handle tenant-scoped contexts
- Added new tests for infrastructure, control, and kubernetes tools
- Improved test coverage for agentic chat and approval store
2026-01-22 16:43:08 +00:00
rcourtman
ecc31730f6 Remove OpenCode references 2026-01-20 16:56:41 +00:00
rcourtman
b57b4a7c3c Tighten AI chat routing and context display 2026-01-20 16:30:55 +00:00
rcourtman
96b7370f7b test: improve coverage for API, AI, Alerts, and Frontend Utils
- Add comprehensive tests for internal/api/config_handlers.go (Phases 1-3)
- Improve test coverage for AI tools, chat service, and session management
- Enhance alert and notification tests (ResolvedAlert, Webhook)
- Add frontend unit tests for utils (searchHistory, tagColors, temperature, url)
- Add proximity client API tests
2026-01-20 15:52:39 +00:00
rcourtman
5ff4f97a0d feat(ai): Add native chat service with streaming and tool execution
Replace the OpenCode sidecar with a native chat service that handles:
- Real-time streaming responses from AI providers
- Multi-turn conversation sessions with history
- Tool execution with automatic function calling
- Agentic workflows for autonomous task completion
- Patrol integration for automated health analysis

The chat service directly communicates with AI providers using the
new StreamingProvider interface, eliminating the need for an external
sidecar process. Sessions are managed in-memory with configurable
history limits.

Key components:
- service.go: Main chat service with provider integration
- session.go: Session management and message history
- agentic.go: Agentic loop for autonomous tool execution
- patrol.go: Patrol-specific chat context and analysis
- tools.go: Tool execution bridge to tools package
- types.go: Chat message and event type definitions
2026-01-19 19:12:04 +00:00