Commit graph

777 commits

Author SHA1 Message Date
rcourtman
771f2a3e41 Add recent changes to intelligence summary 2026-03-19 00:18:28 +00:00
rcourtman
12e705df1e Expose recent changes in resource intelligence 2026-03-19 00:16:14 +00:00
rcourtman
261d4e8586 Expose adapter provenance counts in facet summaries 2026-03-18 23:40:59 +00:00
rcourtman
74e5301688 Expose provenance counts in facet summaries 2026-03-18 23:34:04 +00:00
rcourtman
e1071d2ae0 Preserve related resources in all timeline changes 2026-03-18 23:21:48 +00:00
rcourtman
72fcfd01be Expose grouped timeline kind counts 2026-03-18 23:16:35 +00:00
rcourtman
6d45ce222a Classify incident anomalies 2026-03-18 23:08:04 +00:00
rcourtman
9e99193efd Classify canonical restart changes 2026-03-18 22:49:18 +00:00
rcourtman
7e6a85548f Preserve relationship endpoints in change timelines 2026-03-18 22:11:22 +00:00
rcourtman
ac4d566d69 Cap unified audit list limits 2026-03-18 21:02:52 +00:00
rcourtman
d07c567b46 Wire tenant resource provider at startup 2026-03-18 20:43:38 +00:00
rcourtman
d6ad348dfa Validate source-adapter filters 2026-03-18 20:39:00 +00:00
rcourtman
fc3cf0624b Index resource source-adapter filters 2026-03-18 20:36:32 +00:00
rcourtman
9c43a48ff0 Harden unified resource timeline filters 2026-03-18 20:29:30 +00:00
rcourtman
33b169334c Pin graph provenance in API proofs 2026-03-18 20:19:04 +00:00
rcourtman
181dba0548 Propagate unified facet counts 2026-03-18 18:36:49 +00:00
rcourtman
dc63f86648 Add unified resource facet bundle endpoint 2026-03-18 18:28:57 +00:00
rcourtman
19a5aace70 Expose resource facets and timeline 2026-03-18 17:48:36 +00:00
rcourtman
fae55976a5 Expose unified audit history 2026-03-18 17:44:21 +00:00
rcourtman
f0520bc5e3 Persist unified resource timeline changes 2026-03-18 17:09:30 +00:00
rcourtman
778a2577b6 feat: Pulse v6 release 2026-03-18 16:06:30 +00:00
rcourtman
8a43a964b6 fix(ai): wire patrol circuit breaker on first-time configure 2026-03-13 12:10:14 +00:00
rcourtman
ae2edbde20 fix(ai): complete wiring on first-time configure; guard Ollama fallback
Three follow-up fixes:

1. RestartAIChat() now performs the full post-start wiring (MCP providers,
   patrol adapter, investigation orchestrator) when the service starts for
   the first time via Restart(). Previously these were only wired via
   StartAIChat(), leaving first-time configure with a partially wired service.

2. The Ollama→OpenAI-compatible fallback in createProviderForModel is now
   guarded by !strings.HasPrefix(modelStr, "ollama:") so explicit
   "ollama:llama3" models are never silently rerouted to a different provider.

3. Windows install script registration check now uses the $Hostname override
   (if set) instead of always looking up $env:COMPUTERNAME, so post-install
   verification works correctly when a custom hostname is specified.
2026-03-13 12:06:08 +00:00
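The prefix guard described in item 2 of the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `chooseProvider`, its parameters, and the returned provider labels are hypothetical, and it assumes the model string has already been routed to Ollama by parsing.

```go
package main

import "strings"

// chooseProvider is a hypothetical helper. It decides whether a model that
// parsing routed to Ollama should instead use a configured OpenAI-compatible
// endpoint. An explicit "ollama:" prefix always keeps the model on Ollama.
func chooseProvider(modelStr string, ollamaConfigured bool, openAIBaseURL string) string {
	explicitOllama := strings.HasPrefix(modelStr, "ollama:")
	if !explicitOllama && !ollamaConfigured && openAIBaseURL != "" {
		// Bare model name, no Ollama configured, custom base URL present:
		// fall back to the OpenAI-compatible endpoint.
		return "openai-compatible"
	}
	return "ollama"
}
```

With this shape, `"ollama:llama3"` is never rerouted, while a bare name such as `"qwen3-omni"` falls back only when Ollama is not configured and a base URL is set.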
rcourtman
e137f3fbf7 fix(ai): start chat service on first-time configure without restart
When Pulse starts before AI is configured, legacyService is nil.
Saving AI settings called Restart() which bailed immediately on the
nil check, leaving the service unstarted (503 on /api/ai/sessions)
until a full process restart.

Merged the nil and !IsRunning checks so first-time configure now
starts the service inline, same as the already-handled stopped case.

Also: bare model names that ParseModelString routes to Ollama (e.g.
"qwen3-omni") now fall back to a configured custom OpenAI base URL
when Ollama is not explicitly configured — handles manually-typed
model names on self-hosted OpenAI-compatible endpoints.

Fixes #1339, #1296
2026-03-13 11:13:27 +00:00
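The merged nil and not-running check from the commit above might look something like the sketch below. The types `chatService` and `aiHandler` and their fields are hypothetical stand-ins, and the real start-up work is reduced to setting a flag.

```go
package main

// chatService is a hypothetical stand-in for the chat service.
type chatService struct {
	running  bool
	restarts int
}

// IsRunning reports whether the service has been started.
func (s *chatService) IsRunning() bool { return s.running }

// aiHandler is a hypothetical owner of the service; svc is nil until AI is
// configured for the first time.
type aiHandler struct {
	svc *chatService
}

// Restart starts the service inline when it is nil (never configured) or
// stopped, instead of bailing out on nil. A running service is restarted.
func (h *aiHandler) Restart() {
	if h.svc == nil || !h.svc.IsRunning() {
		if h.svc == nil {
			h.svc = &chatService{}
		}
		h.svc.running = true
		return
	}
	h.svc.restarts++
}
```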
rcourtman
1a582ccc35 fix(diagnostics): honor PVE fingerprint in diagnostics probe 2026-03-10 22:46:12 +00:00
rcourtman
a4b0771974 Prevent removed host agents from resurrecting via in-flight reports (#1331)
Host agents removed from the UI would reappear on the next report cycle
because there was no rejection mechanism — unlike Docker agents which
already had resurrection prevention. Mirror the Docker agent pattern:

- Track removed host IDs in a `removedHosts` map with 24hr TTL
- Persist removal records in `State.RemovedHosts` for frontend display
- Reject reports from removed hosts in `ApplyHostReport()`
- Add `AllowHostReenroll()` + API route to clear the block
- Show removed host agents in the Settings UI with "Allow re-enroll"
- Sync removed-agent maps from state on startup for all agent types
- Fix mock integration snapshot missing `RemovedDockerHosts` field
2026-03-09 17:52:34 +00:00
rcourtman
9b531c547d Fix recovery notifications silently disabled by config PUT (#1332)
Two fixes for missing recovery/resolved notifications:

1. API config PUT handler now preserves notifyOnResolve when the client
   omits it from the request body. Go decodes a missing bool as false,
   which silently disabled recovery notifications on older clients.

2. CancelAlert now always cleans up the cooldown record even when the
   alert has already left the pending buffer, preventing stale cooldown
   entries from suppressing future alert cycles.
2026-03-09 11:28:28 +00:00
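The first fix above relies on a general property of Go's `encoding/json`: a field missing from the request body leaves a plain `bool` at its zero value, `false`. One common way to tell "omitted" from "explicitly false" is a pointer field. The sketch below shows that technique; `alertConfig`, `configUpdate`, and `applyUpdate` are hypothetical names, and the actual handler may preserve the value differently.

```go
package main

import "encoding/json"

// alertConfig is a hypothetical stored configuration.
type alertConfig struct {
	NotifyOnResolve bool
}

// configUpdate is a hypothetical request body. The pointer stays nil when
// the client omits the field, which a plain bool cannot express.
type configUpdate struct {
	NotifyOnResolve *bool `json:"notifyOnResolve"`
}

// applyUpdate decodes body and overwrites NotifyOnResolve only when the
// client actually sent it; otherwise the stored value is preserved.
func applyUpdate(current alertConfig, body []byte) (alertConfig, error) {
	var upd configUpdate
	if err := json.Unmarshal(body, &upd); err != nil {
		return current, err
	}
	if upd.NotifyOnResolve != nil {
		current.NotifyOnResolve = *upd.NotifyOnResolve
	}
	return current, nil
}
```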
rcourtman
572520ebc6 Promote guest-agent /proc/meminfo fallback for accurate VM memory (#1270)
Move the guest-agent file-read of /proc/meminfo earlier in the memory
fallback chain so it runs before RRD, giving real-time MemAvailable that
correctly excludes reclaimable buff/cache on Linux VMs. Also add
VM.GuestAgent.FileRead permission for PVE 9 and fix install.sh to use
comma-separated privilege strings.
2026-03-09 10:04:28 +00:00
rcourtman
45b5c8a861 Restore previous license on persistence failure instead of clearing it
If license save fails, the in-memory license was being cleared, which
could drop a valid existing license. Now snapshots the current license
before activation and restores it if persistence fails.
2026-03-08 11:49:26 +00:00
rcourtman
fe0706f614 Fix cluster double-registration invalidating Proxmox credentials (#1319)
Two nodes in the same PVE cluster generated identical Proxmox API token
names, so the second node's setup rotated the shared token and broke the
first node. Include the hostname in the token name so each node gets its
own token. Also refresh the stored cluster credential on the server when
a new endpoint merges into an existing cluster entry.
2026-03-07 22:36:01 +00:00
rcourtman
0dd3fc779b Fix alert disable notification suppression
2026-03-07 18:40:08 +00:00
rcourtman
d6e8bffaeb pulse/license upgrade safety hardening 2026-03-07 15:13:09 +00:00
rcourtman
a6f6f66078 Improve auto-register auth errors and setup token grace window (#1319)
The /api/auto-register endpoint returned a generic "Invalid or expired
setup code" for all auth failures, making cluster registration issues
impossible to diagnose. Now returns specific errors for expired tokens,
wrong scope, invalid API tokens, etc.

Also extend the setup token grace window to /api/auto-register so
multiple cluster nodes can register with the same token within the
1-minute grace period after first use.
2026-03-07 13:39:26 +00:00
rcourtman
ddecf6d00c Guard legacyMonitor typed-nil and add OIDC refresh panic recovery
Normalize SystemSettingsMonitor interface assignments via reflect to
prevent typed-nil-in-interface (same class as #1324 fix). Also add
defer/recover to the background OIDC token refresh goroutine so a
panic there cannot take down the process.
2026-03-07 10:21:07 +00:00
rcourtman
23a9fa70da Fix nil pointer crash when saving settings (#1324)
SystemSettingsHandler.mtMonitor was an interface field. A nil
*MultiTenantMonitor stored in it became a non-nil interface
(Go typed-nil-in-interface), bypassing the nil guard in getMonitor()
and panicking on every settings save in single-tenant mode.

Change mtMonitor to concrete *monitoring.MultiTenantMonitor so nil
checks work correctly. Also resolve getMonitor() once per request
instead of repeated calls to eliminate a TOCTOU race.
2026-03-07 10:21:07 +00:00
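The typed-nil-in-interface behavior behind the crash above is a general Go property and can be reproduced in a few lines. The `Monitor` interface and the empty `MultiTenantMonitor` struct below are simplified stand-ins, not the project's types.

```go
package main

// Monitor is a hypothetical interface, standing in for the type the handler
// field originally had.
type Monitor interface {
	Name() string
}

// MultiTenantMonitor is a simplified stand-in for the concrete type.
type MultiTenantMonitor struct{}

func (m *MultiTenantMonitor) Name() string { return "multi-tenant" }

// interfaceIsNil checks the interface value. It returns false for a nil
// *MultiTenantMonitor, because the interface still carries a type.
func interfaceIsNil(m Monitor) bool {
	return m == nil
}

// pointerIsNil checks the concrete pointer, which behaves as expected.
func pointerIsNil(m *MultiTenantMonitor) bool {
	return m == nil
}
```

Passing a nil `*MultiTenantMonitor` to `interfaceIsNil` yields `false`, while `pointerIsNil` yields `true`. Storing the field as a concrete pointer, as the commit does, keeps the nil guard meaningful.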
rcourtman
89577fe533 Fix OIDC token refresh bypass and guard AISettingsHandler nil path
The applyAuthContextHeaders early-return in CheckAuth skipped the OIDC
token refresh block, causing long-lived OIDC sessions to expire instead
of auto-refreshing. Move the refresh trigger into extractAndStoreAuthContext
so it fires at the middleware level before CheckAuth's early return.

Also add a nil guard on mtPersistence in AISettingsHandler.GetAIService
for non-default org paths, preventing a potential panic if background
code carries a non-default org context in v5 single-tenant mode.
2026-03-06 11:05:01 +00:00
rcourtman
743ef17b79 Fix AI and config profile handlers broken in v5 single-tenant mode
The single-tenant lockdown (499ab812e) set mtPersistence to nil but
only patched AISettingsHandler with a legacy fallback. AIHandler (chat
service) and ConfigProfileHandler were missed, so AI features (Patrol,
Chat) failed with "chat service not available" and config profiles
would panic on nil dereference. Wire legacy persistence into both
handlers and add the same fallback to ProfileSuggestionHandler.

Fixes #1322
2026-03-06 11:05:01 +00:00
rcourtman
6618db7799 Fix v5 single-tenant router test setup 2026-03-05 23:58:11 +00:00
rcourtman
499ab812e3 Fix post-release regressions and lock v5 to single-tenant runtime 2026-03-05 23:46:35 +00:00
rcourtman
10872c8ca8 fix(patrol): remove noisy per-alert log when patrol is disabled (#1258)
The alert callback logged at Info level for every alert regardless of
whether patrol was enabled. TriggerPatrolForAlert already has an
enabled/running guard and its own debug logging.
2026-03-05 10:01:43 +00:00
rcourtman
72be883f4e fix(proxmox): prevent broken TLS config on auto-register fingerprint failure (#1303)
When FetchFingerprint fails during agent auto-registration, set verifySSL
based on whether a fingerprint was captured rather than hardcoding true.
Also heal already-broken nodes (verifySSL=true with empty fingerprint) on
legacy re-register to prevent permanent connection failures with self-signed
Proxmox certs.
2026-03-05 10:01:43 +00:00
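Deriving `verifySSL` from the captured fingerprint, rather than hardcoding it, might be sketched as below. `nodeTLS` and `tlsForNewNode` are hypothetical names, and an empty string is assumed to mean that the fingerprint fetch failed.

```go
package main

// nodeTLS is a hypothetical subset of a node's TLS settings.
type nodeTLS struct {
	VerifySSL   bool
	Fingerprint string
}

// tlsForNewNode enables verification only when a fingerprint was captured,
// so a node is never stored with verification on and nothing to pin against.
// An empty fingerprint is assumed to mean the fetch failed.
func tlsForNewNode(fingerprint string) nodeTLS {
	return nodeTLS{
		VerifySSL:   fingerprint != "",
		Fingerprint: fingerprint,
	}
}
```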
rcourtman
8818a740e2 fix(proxmox): prevent setup-script token drift and add lifecycle integration tests (#1312) 2026-03-03 20:11:01 +00:00
rcourtman
a8e562034e fix(ai): restore dismissed patrol findings and add regression tests 2026-03-03 19:53:55 +00:00
rcourtman
b38488f2da fix(proxmox): stabilize pulse monitor token lifecycle 2026-03-03 10:57:19 +00:00
rcourtman
510ec999ab fix(api): store TLS fingerprint during auto-registration (#1303)
The legacy auto-register endpoint captured TLS fingerprints via
FetchFingerprint() but never persisted them to the node config. Nodes
with self-signed certs registered via the agent would fail with
"x509: certificate signed by unknown authority" on subsequent polls.

Store the fingerprint in all add/update paths for both PVE and PBS,
guard updates against empty-fingerprint clobber when FetchFingerprint
fails, and pass the fingerprint to cluster detection configs.
2026-03-02 14:07:18 +00:00
rcourtman
10a4e994b6 fix(api): return 404 from undismiss endpoint for invalid finding IDs (#1300)
HandleUndismissFinding now checks both patrol and unified stores
before returning. Returns 404 with error message when the finding
is not found or not dismissed, instead of silently returning success.
2026-03-02 11:48:23 +00:00
rcourtman
d43dfbc490 feat(ui): add host removal action to hosts table
Add an actions menu to the hosts overview with a "Remove host from
Pulse" button. Includes permission checks (requires settings:write
scope), confirmation handling, and a security regression test for
the delete endpoint scope enforcement.
2026-03-01 23:28:33 +00:00
rcourtman
2fcddecf80 feat(api): add POST /api/ai/patrol/undismiss endpoint to revert suppressed findings (#1300)
The Undismiss() method existed on FindingsStore but was never exposed
via the API. Users who dismissed findings as "not_an_issue" had no way
to revert them.

- Add HandleUndismissFinding handler and route
- Add Undismiss() to UnifiedStore for parity with FindingsStore
- Also remove matching explicit suppression rules on undismiss
2026-03-01 22:29:36 +00:00
rcourtman
027fd9932c fix(proxmox): make monitor reload synchronous after auto-registration (#1303)
Auto-register was running the monitor reload in a background goroutine,
so the HTTP response was sent before the poller picked up the new node.
If reload failed or was slow, the node appeared in Settings > Proxmox
(reads config from disk) but not on the main Proxmox tab (reads from
active polling state).

Changed both auto-register paths to reload synchronously, matching the
manual add path (HandleAddNode).
2026-03-01 21:04:20 +00:00
rcourtman
d852964696 fix(ai): record patrol and QuickAnalysis token usage in cost store for budget enforcement
Patrol runs, evaluation passes, and QuickAnalysis calls were consuming
LLM tokens without recording them in the cost store. This made the
cost_budget_usd_30d budget setting ineffective since enforceBudget()
never saw patrol spend.

- Add RecordUsage() to ai.Service for thread-safe cost recording
- Add recordPatrolUsage() helper to PatrolService, called on both
  success and error paths for main patrol and evaluation pass
- Record QuickAnalysis token usage in cost store
- Return partial PatrolResponse (with token counts) on error instead
  of nil, so callers can always record consumed tokens
- Propagate partial response through chat_service_adapter on error
2026-03-01 19:19:47 +00:00