2726 commits

Author SHA1 Message Date
rcourtman
e5eb15918e Sanitize LLM control tokens from OpenAI-compatible responses
Some local models (llama.cpp, LM Studio) output internal control tokens
like <|channel|>, <|constrain|>, <|message|> instead of using proper
function calling. These tokens leak into the UI creating a poor UX.

This adds sanitization to strip these control tokens from both streaming
and non-streaming responses before they reach the user.
2026-02-03 13:12:17 +00:00
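
A minimal sketch of the kind of sanitization the commit above describes. The regular expression and the function name stripControlTokens are assumptions for illustration; the commit message only names <|channel|>, <|constrain|>, and <|message|> as examples and does not show the actual pattern. This sketch also does not handle a token split across two streaming chunks.

```go
package main

import (
	"fmt"
	"regexp"
)

// controlToken matches tokens of the form <|name|>, such as <|channel|>.
// The exact token set handled by the real code is not shown in the commit;
// this pattern is an assumption.
var controlToken = regexp.MustCompile(`<\|[A-Za-z_]+\|>`)

// stripControlTokens removes every matching control token from s and
// leaves all other text untouched.
func stripControlTokens(s string) string {
	return controlToken.ReplaceAllString(s, "")
}

func main() {
	fmt.Println(stripControlTokens("<|channel|>final<|message|>Hello")) // finalHello
}
```

Text that merely contains angle brackets or pipes, such as "a < b | c", does not match the pattern and passes through unchanged.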
rcourtman
71f80c8a99 Fix: alert resolution now records incident timeline during quiet hours
- Fixed early return in handleAlertResolved that skipped incident recording
  when quiet hours suppressed recovery notifications
- Added Host Agent alert delay configuration (backend + UI)
- Host Agents now have dedicated time threshold settings like other resource types

Related to #1179
2026-02-03 12:49:41 +00:00
rcourtman
174ac481c8 Add Windows uninstall command to UI
Update the Uninstall agent section to display both Linux/macOS and
Windows uninstall commands with clear platform labels.

Related to #1176
2026-02-03 12:04:46 +00:00
rcourtman
c2de5f7f4c Fix: add Windows uninstall command support for unified agent
The UI only showed a bash uninstall command which doesn't work on Windows.
Added PULSE_UNINSTALL env var support to install.ps1 and updated the UI
to display platform-specific uninstall commands for both Linux/macOS and
Windows.

Related to #1176
2026-02-03 12:03:06 +00:00
rcourtman
900e05025a Fix OpenAI-compatible endpoint support for chat
Two issues fixed:

1. Custom base URL wasn't being passed to the OpenAI client in
   createProviderForModel() - requests went to api.openai.com instead
   of the configured endpoint (e.g., LM Studio, llama.cpp)

2. Tool schemas were missing the "properties" field when tools had no
   parameters. OpenAI API requires "properties" to always be present
   as an object, even if empty.

Fixes #1154
2026-02-03 12:03:06 +00:00
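
The second issue in the commit above can be illustrated with a short sketch. The type toolParameters and the helper marshalParams are hypothetical names, not taken from the Pulse source; the point is only that a nil map must be replaced by an empty one and that the field must not carry omitempty, so "properties" is always emitted as an object.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toolParameters is a hypothetical JSON-schema wrapper for a tool's
// parameters. Properties deliberately has no omitempty tag.
type toolParameters struct {
	Type       string                 `json:"type"`
	Properties map[string]interface{} `json:"properties"`
}

// marshalParams serializes the schema, substituting an empty map for nil
// so the output contains "properties":{} rather than "properties":null.
func marshalParams(props map[string]interface{}) string {
	if props == nil {
		props = map[string]interface{}{}
	}
	b, err := json.Marshal(toolParameters{Type: "object", Properties: props})
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(marshalParams(nil)) // {"type":"object","properties":{}}
}
```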
rcourtman
1be9e6a024 Enhance Kiosk Mode: auto-enable logic and magic link generation
This improves the UX for setting up unattended displays by:

1. Automatically enabling visual Kiosk mode when a token has monitoring-only scope (unless explicitly disabled).
2. Providing a ready-to-use 'Magic Link' (with ?token=...&kiosk=1) upon token creation.
2026-02-03 12:03:06 +00:00
rcourtman
35eedcb5ac Fix: metrics store tier fallback for mock mode sparklines
When querying short time ranges (1h, 6h), the metrics store only looked
in TierRaw and TierMinute which were empty in mock mode. The seeded data
was stored in TierHourly and TierDaily.

Updated tierFallbacks to include coarser tiers as fallbacks:
- TierRaw now falls back to TierMinute, then TierHourly
- TierMinute now falls back to TierRaw, then TierHourly

This ensures sparkline data is available in mock/demo mode where
historical data is seeded into coarser tiers.
2026-02-03 12:03:06 +00:00
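
The fallback order described in the commit above can be sketched as follows. The tier names and tierFallbacks come from the commit message; the Tier type, the pickTier helper, and the hasData callback are assumptions made for illustration and may not match the real metrics store.

```go
package main

import "fmt"

// Tier identifies a metrics resolution tier. The concrete type is an assumption.
type Tier int

const (
	TierRaw Tier = iota
	TierMinute
	TierHourly
	TierDaily
)

// tierFallbacks lists, per preferred tier, the tiers to try next in order,
// following the order given in the commit message.
var tierFallbacks = map[Tier][]Tier{
	TierRaw:    {TierMinute, TierHourly},
	TierMinute: {TierRaw, TierHourly},
}

// pickTier returns the first tier that has data, trying the preferred tier
// first and then its fallbacks. The bool is false if none has data.
func pickTier(preferred Tier, hasData func(Tier) bool) (Tier, bool) {
	candidates := append([]Tier{preferred}, tierFallbacks[preferred]...)
	for _, t := range candidates {
		if hasData(t) {
			return t, true
		}
	}
	return preferred, false
}

func main() {
	// Mock mode: only the coarse tiers are seeded.
	seeded := map[Tier]bool{TierHourly: true, TierDaily: true}
	t, ok := pickTier(TierRaw, func(t Tier) bool { return seeded[t] })
	fmt.Println(t == TierHourly, ok) // true true
}
```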
rcourtman
8495878553 Fix: improve mock metrics sampler startup performance
- Reduce minimum seed duration from 7 days to 1 hour for faster startup
  on resource-constrained systems (like demo server 1GB droplet)
- Reduce sleep times from 200ms to 50ms between resource processing
- Add diagnostic logging throughout mock metrics seeding to help debug
  issues where sparklines show no data
- Add progress logging for nodes, VMs, containers, storage, docker hosts
2026-02-03 12:03:06 +00:00
rcourtman
7cc3f77097 Auto-update Helm chart version to 5.1.0-rc.1 2026-02-03 00:55:38 +00:00
rcourtman
a61f1b387a Fix: data race in Docker detection test mock — add mutex for concurrent calls 2026-02-03 00:12:16 +00:00
rcourtman
445c5c0587 Fix: remove install-sensor-proxy.sh from release workflow (script was removed) 2026-02-03 00:08:19 +00:00
rcourtman
ed5ab5eebf Fix: flaky metrics fallback test — use WriteBatchSync for deterministic writes 2026-02-02 23:32:28 +00:00
rcourtman
df0d90fb69 Fix: regenerate package-lock.json for ESLint v9 upgrade 2026-02-02 23:25:21 +00:00
rcourtman
6ff5ca94c3 Bump version to 5.1.0-rc.1 2026-02-02 23:22:04 +00:00
rcourtman
744eeb0270 Chore: clean up staged changes for release
- Remove standalone pulse-assistant architecture doc (content lives in CLAUDE.md)
- Add CountdownTimer component for patrol schedule display
- Rewrite patrol handler test to focus on interval persistence
- Extract MockStateProvider to shared test file
2026-02-02 23:17:40 +00:00
rcourtman
c8483f8116 Fix: PBS backup verification status not updating after cache populated
The PBS backup snapshot cache only compared BackupCount and LastBackup
timestamp to decide whether to re-fetch. When PBS verify jobs complete,
neither field changes — only the Verification field on individual
snapshots changes — so the cache served stale data indefinitely.

Add a 10-minute TTL per backup group so verification status changes are
picked up periodically. Also add panic recovery to PBS and PVE backup
goroutines, and use runtimeCtx for PBS backup polling to respect
monitor shutdown.

Closes #1174
2026-02-02 23:12:26 +00:00
rcourtman
02946d45ec test: expand api handler coverage 2026-02-02 23:01:29 +00:00
rcourtman
eed80e2883 Fix: patrol interval not applied — omitempty caused preset to persist across reloads
The "Every" dropdown on the Patrol page was not being respected. Setting
15 min would show "Runs every 6 hours" and the countdown timer was wrong.

Root cause: PatrolSchedulePreset and PatrolIntervalMinutes had omitempty
JSON tags. When the API handler cleared the preset to "", json.Marshal
dropped the field. On reload, NewDefaultAIConfig() re-introduced "6hr"
as the preset, which took priority over the user's custom minutes.

Additional fixes in the same area:
- Track nextScheduledAt explicitly in the patrol loop so next_patrol_at
  reflects the actual ticker schedule, not a stale lastPatrol + interval
  calculation that diverges when the interval changes mid-cycle.
- Refetch patrol status in the frontend after an interval change so the
  countdown timer updates immediately.
- Seed lastPatrol from persisted run history on startup so the header
  countdown timer appears immediately after a backend restart.
2026-02-02 22:53:24 +00:00
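
The root cause described in the commit above can be reproduced in isolation. The struct names and JSON keys below are hypothetical; only the omitempty behavior and the "6hr" default come from the commit message. The sketch saves a config whose preset was cleared to "", then reloads it on top of defaults.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// withOmit mirrors the buggy shape: empty values are dropped on save.
type withOmit struct {
	Preset  string `json:"patrol_schedule_preset,omitempty"`
	Minutes int    `json:"patrol_interval_minutes,omitempty"`
}

// withoutOmit mirrors the fixed shape: empty values are written explicitly.
type withoutOmit struct {
	Preset  string `json:"patrol_schedule_preset"`
	Minutes int    `json:"patrol_interval_minutes"`
}

// reloadWithOmit saves a cleared preset, then loads it over a default of "6hr".
func reloadWithOmit() string {
	saved, _ := json.Marshal(withOmit{Preset: "", Minutes: 15})
	cfg := withOmit{Preset: "6hr"} // stands in for the defaults applied on load
	_ = json.Unmarshal(saved, &cfg)
	return cfg.Preset
}

// reloadWithoutOmit does the same with the explicit-empty encoding.
func reloadWithoutOmit() string {
	saved, _ := json.Marshal(withoutOmit{Preset: "", Minutes: 15})
	cfg := withoutOmit{Preset: "6hr"}
	_ = json.Unmarshal(saved, &cfg)
	return cfg.Preset
}

func main() {
	fmt.Printf("%q %q\n", reloadWithOmit(), reloadWithoutOmit()) // "6hr" ""
}
```

With omitempty the cleared field never reaches disk, so the default survives the reload; without it the empty string overwrites the default as intended.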
rcourtman
bfa648ddd5 Test: expand api feature test coverage
Add tests for AI intelligence, Docker/K8s agents, log redaction, and general router helper functions.
2026-02-02 22:02:22 +00:00
rcourtman
43d7fffeef Test: add coverage for auth and security handlers
Add additional tests for OIDC, SAML, and tenant middleware to improve coverage of security-critical paths.
2026-02-02 22:02:11 +00:00
rcourtman
97a985efb8 Test: improve frontend embedding coverage
Enhance tests for frontend embedding to cover filesystem overrides, dev proxy configuration, and SPA header handling.
2026-02-02 22:01:46 +00:00
rcourtman
eb2d07e48f Chore: enhance core api and metrics testability
Refactor Router to allow HTTP client injection for install script proxying. Add tests for unified agent install mechanism and additional metrics store coverage.
2026-02-02 22:01:36 +00:00
rcourtman
2a7f231649 chore(test): add tests for service discovery tools adapter 2026-02-02 21:54:27 +00:00
rcourtman
a2cfda0936 fix(test): remove flaky content type test in eval 2026-02-02 19:26:24 +00:00
rcourtman
36eb381c26 test(ai): add validation tests for file tools 2026-02-02 19:24:11 +00:00
rcourtman
9b304f8a78 test(ai): comprehensive eval coverage (~71%) including scenarios, overrides, and error cases 2026-02-02 19:18:19 +00:00
rcourtman
abc8900d4c test(ai): add patrol assertions tests, coverage now 53.3% 2026-02-02 19:11:39 +00:00
rcourtman
aa4d728963 test(ai): add patrol quality logic tests, coverage now 42.5% 2026-02-02 19:10:45 +00:00
rcourtman
469c687860 test(ai): improve eval package coverage to 40% 2026-02-02 19:09:13 +00:00
rcourtman
e60a11116b test(api): comprehensively improve test coverage for Security, Infra, and Features 2026-02-02 18:59:44 +00:00
rcourtman
5ce54345ba fix: setup wizard shows no error feedback on validation failure
The ToastContainer was only rendered inside the authenticated app shell,
making all toast notifications invisible during the first-run setup wizard.
Users clicking "Create Account" with an invalid password saw no feedback
at all — just silent 400 errors in the browser console.

- Move ToastContainer outside the needsAuth conditional so it renders
  unconditionally, including during setup
- Add client-side password length validation (>= 12 chars) in SecurityStep
  to catch the most common case before hitting the server
- Fix WelcomeStep to check response.ok after bootstrap token validation
  so the wizard won't advance with an invalid token

Related to #1173
2026-02-02 18:56:04 +00:00
rcourtman
454448b796 fix: deadlock in offline alert recovery notifications
The quiet hours fix (07b4765b) added ShouldSuppressResolvedNotification()
to handleAlertResolved, which acquires m.mu.RLock(). Five clear*OfflineAlert
functions call the resolved callback synchronously while holding m.mu.Lock().
Go's RWMutex is not reentrant, so this deadlocks permanently when any
node/PBS/PMG/storage/guest comes back online after being offline.

The deadlock prevents recovery notifications from being sent and freezes
the monitoring goroutine, cascading to block all subsequent polling.

Fix: change the five affected functions to fire the resolved callback
asynchronously (matching the pattern already used by clearAlertNoLock),
so it runs after m.mu is released.

Related to #1068
2026-02-02 18:17:27 +00:00
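
A simplified sketch of the fix pattern described in the commit above. The manager type and its fields are hypothetical stand-ins for the real alert manager; the sketch only shows why firing the callback in a goroutine avoids taking RLock while the same goroutine still holds Lock.

```go
package main

import (
	"fmt"
	"sync"
)

// manager is a hypothetical stand-in for the alert manager.
type manager struct {
	mu         sync.RWMutex
	quiet      bool
	alerts     map[string]bool
	onResolved func(id string)
}

// shouldSuppress takes the read lock, as the commit says
// ShouldSuppressResolvedNotification does.
func (m *manager) shouldSuppress() bool {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.quiet
}

// clearOfflineAlert mutates state under the write lock. Calling the callback
// synchronously here would deadlock, because the callback needs RLock.
// Starting it in a goroutine lets it proceed once Unlock has run.
func (m *manager) clearOfflineAlert(id string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.alerts, id)
	if cb := m.onResolved; cb != nil {
		go cb(id)
	}
}

// resolveOnce clears one alert and waits for the resolved callback to fire.
func resolveOnce(id string) string {
	done := make(chan string, 1)
	m := &manager{alerts: map[string]bool{id: true}}
	m.onResolved = func(resolved string) {
		if !m.shouldSuppress() {
			done <- resolved
		}
	}
	m.clearOfflineAlert(id)
	return <-done
}

func main() {
	fmt.Println(resolveOnce("node-1")) // node-1
}
```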
rcourtman
3b347b6548 fix: harden SQLite against I/O contention causing persistent lock errors
- Move all SQLite pragmas from db.Exec() to DSN parameters so every
  connection the pool creates gets busy_timeout and other settings.
  Previously only the first connection had these applied.
- Set MaxOpenConns(1) on audit, RBAC, and notification databases
  (metrics already had this). Fixes potential for multiple connections
  where new ones lack busy_timeout.
- Increase busy_timeout from 5s to 30s across all databases to
  tolerate disk I/O pressure during backup windows.
- Fix nested query deadlocks in GetRoles(), GetUserAssignments(), and
  CancelByAlertIDs() that would deadlock with MaxOpenConns(1).
- Fix circuit breaker retryInterval not resetting on recovery, which
  caused the next trip to start at 5-minute backoff instead of 5s.

Related to #1156
2026-02-02 17:29:14 +00:00
rcourtman
ded1593048 fix(alerts): host disk threshold overrides not persisting after save
Two bugs prevented per-volume host disk thresholds from working:

1. The override parsing effect in Alerts.tsx had no handler for
   host:*/disk:* override keys, so they were silently dropped when
   reloading config — making overrides appear to not save.

2. The frontend sanitized mountpoints with underscores while the
   backend used hyphens, so even if parsing worked, the backend
   alert evaluator would never match the saved keys.

Also adds migration to normalize any orphaned underscore-style keys
on config load, and includes hostDisk in the disabled-state change
detection.

Related to #1103
2026-02-02 16:41:29 +00:00
rcourtman
6b237db923 fix(alerts): hide misleading Backup/Snapshot inputs in Global Defaults card, preserve per-resource backup config on threshold edit
The Global Defaults card in VMs & Containers rendered number inputs for
Backup and Snapshot columns. These wrote to guestDefaults which has no
backup/snapshot fields, so values were silently lost on save — appearing
to "reset to 0." Filter these special toggle columns out of the Global
Defaults card since backup/snapshot thresholds are configured in the
dedicated Backups/Snapshots sections.

Also fix saveEdit not preserving backup/snapshot in the raw override
config (hysteresisThresholds), which caused per-resource backup overrides
to be silently dropped when editing other thresholds on the same resource.

Related to #1126
2026-02-02 16:01:12 +00:00
rcourtman
dbf603d7e2 fix: only clear findingId after successful chat send
sendMessage now returns a boolean so handleSubmit only clears the
finding context when the request actually succeeded. Failed sends
preserve the findingId for retries.
2026-02-02 15:20:11 +00:00
rcourtman
d1f76982ec fix: finding drawer actions (notes persist, acknowledge visual, discuss context)
- Sync UserNote, AcknowledgedAt, SnoozedUntil, DismissedReason, Suppressed,
  and TimesRaised from ai.Finding to unified store in both callback and
  startup sync paths. Mirror note writes to unified store immediately.
- Dim acknowledged findings (opacity-60), add "Acknowledged" badge, hide
  acknowledge button once acknowledged, sort below unacknowledged in
  severity mode.
- Pass finding_id through frontend chat API → backend ChatRequest →
  ExecuteRequest. Look up full finding from unified store (mutex-guarded)
  and prepend structured context to the prompt.
2026-02-02 15:18:51 +00:00
rcourtman
7444bd0468 fix(alerts): guest alerts misclassified as node alerts when threshold disabled (#1145)
In single-node setups, guest alerts had Instance == Node, causing
reevaluateActiveAlertsLocked to evaluate them against NodeDefaults
instead of GuestDefaults. Setting guest memory threshold to 0 (disabled)
wouldn't clear existing guest alerts because they were being kept alive
by the still-enabled node memory threshold.

- Add resourceID colon check to distinguish guest IDs (instance:node:vmid)
  from node IDs (instance-node) in reevaluateActiveAlertsLocked
- Clear stale alerts in checkMetric when threshold is nil or disabled
- Skip hysteresis validation for disabled thresholds (Trigger <= 0)
- Fix frontend tooltip: "0" not "-1" disables a threshold
2026-02-02 15:17:53 +00:00
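
The colon check from the first bullet of the commit above can be sketched in a few lines. The ID shapes (instance:node:vmid for guests, instance-node for nodes) come from the commit message; the function name isGuestResourceID and the sample IDs are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// isGuestResourceID reports whether id looks like a guest ID
// (instance:node:vmid) rather than a node ID (instance-node).
// It relies only on the presence of a colon, so it still works in
// single-node setups where the instance and node names are identical.
func isGuestResourceID(id string) bool {
	return strings.Contains(id, ":")
}

func main() {
	fmt.Println(isGuestResourceID("pve:pve:101")) // true
	fmt.Println(isGuestResourceID("pve-pve"))     // false
}
```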
rcourtman
712e5846ec test(ai): add unit tests for discovery adapter
- Add comprehensive tests for DiscoveryMCPAdapter in internal/ai/tools/discovery_adapter_test.go
- Validate strict delegation to DiscoverySource and data transformation
2026-02-02 15:04:45 +00:00
rcourtman
5959cd9d7f test(ai): add unit tests for eval runner
- Add unit tests for internal/ai/eval package
- Validate configuration, retry logic, and custom SSE parsing
- Enables coverage for eval framework without requiring live Pulse server
2026-02-02 14:54:01 +00:00
rcourtman
e86c25c771 test(api): increase coverage for discovery and chat adapter
- Add comprehensive tests for discovery_handlers.go (~75% coverage)
- Add tests for chat_service_adapter.go (previously 0% coverage)
- Fix missing API key issues in chat adapter tests by using ollama model configuration
2026-02-02 14:53:52 +00:00
rcourtman
36ff16cd85 chore(test): fix test asset dependencies
- Add ensure_test_assets.sh script to generate dummy frontend assets for testing
- Update Makefile to run asset generation before tests
2026-02-02 14:53:41 +00:00
rcourtman
347a2572da chore: upgrade eslint to v9 to fix security vulnerability
- Updates eslint to v9.20.0 to resolve Dependabot alert #50
- Migrates config to flat format (eslint.config.js)
- Updates typescript-eslint and eslint-plugin-solid
- Fixes lint error in UnifiedBackups.tsx
2026-02-02 14:17:53 +00:00
rcourtman
78cb794640 fix: add --hostname flag to agent installer scripts. Related to #1169
The agent binary supported --hostname but the installer scripts
didn't accept or forward it, causing "[ERROR] Unknown argument".
2026-02-02 14:08:28 +00:00
rcourtman
b611b2219c fix: negotiate SMTP auth mechanism from server capabilities. Related to #1165
Instead of hardcoding PLAIN auth or switching on provider name, query
the server's EHLO response for advertised AUTH mechanisms and pick the
best one (PLAIN preferred, LOGIN as fallback). This properly handles
Microsoft 365 which only advertises LOGIN, and any future server with
non-standard auth support.

Also adds TLS safety check to LOGIN auth (matching PlainAuth behavior)
and moves auth negotiation into each send method so it happens after
the connection and STARTTLS upgrade, when capabilities are accurate.
2026-02-02 11:36:00 +00:00
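
The selection step described in the commit above might look roughly like this. The function name pickAuthMechanism is hypothetical. Its input is assumed to be the space-separated parameter string that (*smtp.Client).Extension("AUTH") from net/smtp returns after EHLO; the preference order (PLAIN first, LOGIN as fallback) comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// pickAuthMechanism chooses an AUTH mechanism from the server's advertised
// list. It returns "" when neither supported mechanism is offered.
func pickAuthMechanism(advertised string) string {
	offered := map[string]bool{}
	for _, m := range strings.Fields(advertised) {
		offered[strings.ToUpper(m)] = true
	}
	for _, preferred := range []string{"PLAIN", "LOGIN"} {
		if offered[preferred] {
			return preferred
		}
	}
	return ""
}

func main() {
	fmt.Println(pickAuthMechanism("LOGIN XOAUTH2")) // LOGIN
	fmt.Println(pickAuthMechanism("LOGIN PLAIN"))   // PLAIN
}
```

Calling this after the STARTTLS upgrade matters because servers can advertise a different AUTH list before and after the connection is encrypted.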
rcourtman
98a235578e fix: add SMTP LOGIN auth for Microsoft 365 email. Related to #1165
Microsoft 365 advertises AUTH LOGIN but not AUTH PLAIN, causing
"504 5.7.4 Unrecognized authentication type" for users with valid
credentials. Add a loginAuth implementation and use it automatically
when the Microsoft 365 / Outlook provider is selected.
2026-02-02 11:30:46 +00:00
rcourtman
7946a2a9c1 test(ai/chat): add agentic loop formatting tests 2026-02-02 11:15:31 +00:00
rcourtman
3a9c321c50 test(ai): add patrol alert review, resource state, and scoped run tests 2026-02-02 11:15:19 +00:00
rcourtman
4f866c411c test(ai): add patrol AI analysis and intelligence tests 2026-02-02 11:15:07 +00:00
rcourtman
20f1a9ee7f test(ai/chat): add tests for service utilities and knowledge extraction 2026-02-02 11:15:02 +00:00