Commit graph

908 commits

Author SHA1 Message Date
rcourtman
fdb2a07f56 fix(agent): find zpool binary on TrueNAS SCALE (#718)
Enhanced zpool binary lookup to try common paths when exec.LookPath fails.
This fixes issue #718 where TrueNAS SCALE reports inflated storage because
the agent runs with a restricted PATH that doesn't include /usr/sbin.

Changes:
- Added findZpool() helper that tries common paths like /usr/sbin/zpool,
  /sbin/zpool, /usr/local/sbin/zpool for TrueNAS/FreeBSD/Linux systems
- Added commonZpoolPaths variable listing typical zpool locations
- Added tests for the new findZpool function

This ensures zpool list is used for accurate pool-level capacity instead
of falling back to dataset-level summation.
2025-12-18 16:23:56 +00:00
rcourtman
0182cc8310 feat(thresholds): add collapsible accordion sections and UX improvements
- Add CollapsibleSection component with animated expand/collapse
- Wrap all 6 resource sections (Nodes, VMs, PBS, Storage, Backups, Snapshots) with accordion UI
- Add section icons and resource counts in headers
- Add expand all / collapse all buttons for quick navigation
- Make help banner dismissible with localStorage persistence
- Add Ctrl/Cmd+F keyboard shortcut to focus search
- Add keyboard shortcut hint badge on search input
- Add icons to tab navigation for quick identification
- Improve mobile tab labels with shorter text on small screens
- Create reusable components: ThresholdBadge, ResourceCard, GlobalDefaultsRow
- Create useCollapsedSections hook with localStorage persistence
- Default less-used sections (Storage, Backups, Snapshots, PBS) to collapsed
2025-12-18 15:47:44 +00:00
rcourtman
c91307be94 fix: guest URL icon now appears/disappears immediately after AI sets/removes it
The issue was a SolidJS reactivity problem in the Dashboard component.
When guestMetadata signal was accessed inside a For loop callback and
assigned to a plain variable, SolidJS lost reactive tracking.

Changed from:
  const metadata = guestMetadata()[guestId] || ...
  customUrl={metadata?.customUrl}

To:
  const getMetadata = () => guestMetadata()[guestId] || ...
  customUrl={getMetadata()?.customUrl}

This ensures SolidJS properly tracks the signal dependency when the
getter function is called directly in JSX props.
2025-12-18 14:42:47 +00:00
rcourtman
5c9bbf33b6 Merge pull request #856 from BTLzdravtech/wildcard
Adds wildcard support for kube namespace filtering.
2025-12-18 11:30:18 +00:00
rcourtman
cf57cfcb03 Merge pull request #855 from BTLzdravtech/main
Fixes inverted boolean logic in isProblemPod/isProblemDeployment checks and improves init container exit code handling.
2025-12-18 11:30:11 +00:00
Tomas Hruska
a419b6237a support wildcards --kube-include-namespace/--kube-exclude-namespace 2025-12-18 00:00:30 +01:00
Tomas Hruska
69d693f346 Fix kubernetes logic and init containers detection 2025-12-17 23:47:03 +01:00
rcourtman
210a6f7cc0 monitoring: keep host IDs stable via token+hostname binding 2025-12-17 20:16:27 +00:00
rcourtman
ebc3474647 hostagent: avoid identity collisions with MAC fallback (Related to #836) 2025-12-17 20:09:55 +00:00
rcourtman
3623395549 test(api): allow printable alert IDs for acknowledge (Related to #852) 2025-12-17 20:09:51 +00:00
rcourtman
5338ab580c Stabilize core E2E tests
- Preserve alerts activation state when saving thresholds
- Use compliant default E2E password and deterministic bootstrap token seeding
- Harden Playwright selectors, waits, and diagnostics gating
2025-12-17 19:36:48 +00:00
rcourtman
54fc259221 fix(ai): improve AI settings UX with validation and smart fallbacks
Backend:
- Add smart provider fallback when selected model's provider isn't configured
- Automatically switch to a model from a configured provider instead of failing
- Log warning when fallback occurs for visibility

Frontend (AISettings.tsx):
- Add helper functions to check if model's provider is configured
- Group model dropdown: configured providers first, unconfigured marked with ⚠️
- Add inline warning when selecting model from unconfigured provider
- Validate on save that model's provider is configured (or being added)
- Warn before clearing last configured provider (would disable AI)
- Warn before clearing provider that current model uses
- Add patrol interval validation (must be 0 or >= 10 minutes)
- Show red border + inline error for invalid patrol intervals 1-9
- Update patrol interval hint: '(0=off, 10+ to enable)'

These changes prevent confusing '500 Internal Server Error' and
'AI is not enabled or configured' errors when model/provider mismatch.
2025-12-17 18:30:19 +00:00
rcourtman
c4b893e257 Fix agent download serving wrong architecture binary
When a specific architecture is requested (e.g., linux-arm64), don't fall
back to the generic pulse-agent binary if the requested arch isn't found.
This was causing ARM64 machines to receive x86-64 binaries that can't run.

Now returns 404 with helpful error message if requested architecture
binary is not available.
2025-12-17 17:22:51 +00:00
rcourtman
30f01771ac Add meaningful tests for host agent and exec websocket 2025-12-17 17:02:01 +00:00
rcourtman
ab480ca489 fix: Prevent orphaned encrypted data when encryption key is deleted
- crypto.go: Add runtime validation to Encrypt() that verifies the key file
  still exists on disk before encrypting. If the key was deleted while Pulse
  is running, encryption now fails with a clear error instead of creating
  orphaned data that can never be decrypted.

- hot-dev.sh: Auto-generate encryption key for production data directory
  (/etc/pulse) when HOT_DEV_USE_PROD_DATA=true and key is missing. This
  prevents startup failures and ensures encrypted data can be created.

- Added test TestEncryptRefusesAfterKeyDeleted to verify the protection works.
2025-12-17 17:00:53 +00:00
rcourtman
d663ba4342 hostagent: avoid host ID collisions and prefer LAN IP 2025-12-17 16:29:59 +00:00
rcourtman
71e1b5dc86 test: expand AI provider test coverage with HTTP mocks 2025-12-17 15:53:56 +00:00
rcourtman
47dfa5d703 test: expand cmd and agent update coverage 2025-12-17 13:28:17 +00:00
rcourtman
0ee6e50c8b fix(config): avoid deadlock saving empty nodes config 2025-12-17 13:28:06 +00:00
rcourtman
969fa0e509 test: add unit tests for AI, Kubernetes agent, and clients 2025-12-17 12:47:36 +00:00
rcourtman
67bde72c93 Improve test coverage 2025-12-17 12:00:59 +00:00
rcourtman
667119269d feat: Add tool/function calling support to Ollama provider
Fixes issue where Ollama users get 'I'm a large language model, I can't do XYZ'
responses when trying to use the AI assistant. The problem was that the
Ollama provider was not passing tool definitions to the API.

Changes:
- Add Tools field to ollamaRequest struct
- Add ollamaTool, ollamaToolFunction, ollamaToolCall structs
- Convert tools from ChatRequest to Ollama format in Chat()
- Parse tool_calls from Ollama response
- Set StopReason to 'tool_use' when model requests tool execution
- Handle tool results in multi-turn conversations

Requires Ollama v0.3.0+ and a tool-capable model (llama3.1+, mistral-nemo, etc.)

Closes: Discussion #845 comment by misterlegend
2025-12-17 11:54:32 +00:00
rcourtman
07c993bfe8 fix: backup matching uses instance+VMID to prevent cross-instance collisions
Previously, SyncGuestBackupTimes matched backups to guests using only VMID.
This caused newly created containers to incorrectly show old backup times
from different containers on other Proxmox instances that happened to have
the same VMID.

Now uses composite key (instance+VMID) for PVE storage backups to ensure
proper isolation. PBS backups still use VMID matching (since they aggregate
from multiple sources) but only as a fallback.

Fixes issue where ollama LXC showed 'last backup 3 months ago' despite
being created yesterday.
2025-12-16 22:19:19 +00:00
rcourtman
df48961d04 test: add regression test for OIDC env vars with nil config (#853)
Adds TestOIDCEnvVarsWithNilConfig to catch the case where OIDC_* env
vars were silently ignored when no oidc.enc file existed. This documents
the proper pattern of initializing OIDCConfig before calling MergeFromEnv.
2025-12-16 21:02:47 +00:00
rcourtman
5591b3006f fix: OIDC env vars ignored when no oidc.enc file exists
When OIDC_* environment variables were set but no oidc.enc config file
existed, cfg.OIDC was nil and MergeFromEnv would silently return without
applying the env vars (due to nil receiver check).

Fix: Initialize cfg.OIDC to default values before merging env vars if
it's nil. This ensures OIDC can be configured purely through environment
variables without requiring a pre-existing config file.

Related to #853
2025-12-16 20:25:56 +00:00
rcourtman
e157aa04ad fix: Allow printable ASCII in alert IDs for acknowledge requests
Reverts overly strict alert ID validation that was rejecting valid
alert IDs containing special characters. Docker host IDs can contain
user-supplied data like hostnames which may include parentheses,
brackets, or other printable ASCII characters.

The previous validation only allowed alphanumeric + limited punctuation,
which caused 400 errors when acknowledging alerts from Docker hosts
with special characters in their identifiers.

Related to #852
2025-12-16 20:10:31 +00:00
rcourtman
cf44352c83 feat: configurable backup freshness thresholds for dashboard indicator
Adds FreshHours and StaleHours settings to control when the dashboard
backup indicator shows green (fresh), amber (stale), or red (critical).

- Backend: Added FreshHours/StaleHours to BackupAlertConfig (default 24/72 hours)
- Frontend: getBackupInfo() now accepts optional thresholds parameter
- Dashboard/GuestRow components use thresholds from alert config
- Settings saved/loaded with alert configuration

Closes #839
2025-12-16 16:36:08 +00:00
rcourtman
b79d04f734 Add comprehensive AI test coverage
- Add integration tests for Ollama provider (17 tests against real API)
- Add unit tests for baseline, correlation, patterns, memory, knowledge, cost packages
- Add context formatter and builder tests
- Add factory tests for provider initialization
- Add Makefile targets: test-integration, test-all
- Clean up test theatre (removed struct field tests)

Integration tests require Ollama at OLLAMA_URL (default: 192.168.0.124:11434)
Run with: make test-integration
2025-12-16 12:33:06 +00:00
rcourtman
dc156d097c fix: Retry-After header now uses actual limiter window
Previously the Retry-After header was hardcoded to "60" seconds
regardless of the rate limiter's actual window duration. Now uses
the limiter's configured window (e.g., 600 seconds for recovery
endpoints, 300 for exports).

Related to #579
2025-12-16 10:07:16 +00:00
rcourtman
47ced7c97e feat(ui): make AI settings page more compact and user-friendly
- Replace verbose info banner with streamlined layout
- Add collapsible 'Advanced Model Selection' accordion for Chat/Patrol models
- Make AI Patrol Settings section collapsible with inline summary badges
- Compact Cost Controls into single-row inline layout
- Reduce form spacing for tighter presentation
- Remove unused formHelpText import

Also includes:
- OpenAI provider fixes for max_tokens parameters
- Security setup CSRF and 401 fixes
- Minor UI tweaks
2025-12-16 09:20:09 +00:00
rcourtman
fb01d87b00 fix: strip provider prefix from all AI provider models and instant URL refresh
Backend fixes:
- Strip provider prefix (anthropic:, openai:, deepseek:, ollama:) in all
  provider Chat methods and constructors for robust handling
- Models are now correctly parsed regardless of caller format

Frontend fixes:
- Tool cards now persist in AI chat after approval execution by adding
  to streamEvents array
- Dashboard now listens for pulse:metadata-changed custom event
- AI Chat emits this event when set_resource_url tool completes
- Guest URL icons now update instantly when AI sets them
2025-12-15 19:18:09 +00:00
rcourtman
3d6da91ac0 feat(ai): improve AI settings first-time setup UX
- Add setup modal that appears when enabling AI without configured provider
- Modal allows selecting provider (Anthropic, OpenAI, DeepSeek, Ollama)
- Enter API key/URL and enable AI in one smooth flow
- Reorder backend to apply API keys before enabled check
- Fix Ollama to strip 'ollama:' prefix from model names
- Simplify backend error message for unconfigured providers
2025-12-15 18:59:19 +00:00
rcourtman
0e8c8d51ca fix(ai): add fallback default model when Ollama model is empty
When model is not explicitly set in config or request, fall back to
llama3 to prevent 'model is required' errors from Ollama.
2025-12-15 16:59:51 +00:00
rcourtman
8687d69242 fix(ai): normalize Ollama base URL to prevent 405 errors
Users sometimes enter URLs with trailing slashes or include the /api path:
- http://host:11434/  -> would become http://host:11434//api/chat
- http://host:11434/api -> would become http://host:11434/api/api/chat

Now we strip trailing slashes and /api suffix during client initialization.

Fixes #847
2025-12-15 16:51:52 +00:00
rcourtman
f18bf62bd3 fix(ai): use configured provider's default model when no model set
When a user configures only Ollama (or any single provider) via the
multi-provider UI without explicitly selecting a model, GetModel() now
returns that provider's default model instead of falling back to the
legacy Provider field which defaults to "anthropic".

This fixes "API key is required for anthropic" errors when enabling AI
with only Ollama configured.

Related to #847
2025-12-15 11:18:05 +00:00
rcourtman
fed0a9d89b fix(ai): allow enabling AI when any provider is configured
The enable validation was using the legacy single-provider model which
checked settings.Provider and settings.APIKey. Users configuring Ollama
via the new multi-provider UI (setting ollama_base_url) couldn't enable
AI because settings.Provider defaulted to "anthropic" which required an
API key.

Now checks GetConfiguredProviders() first - if any provider is configured
(Anthropic, OpenAI, DeepSeek, or Ollama), AI can be enabled.

Related to #847
2025-12-15 09:43:17 +00:00
rcourtman
204219ab7f fix(agent): use /etc/machine-id in LXC containers to avoid ID collisions
LXC containers share the host's /sys/class/dmi/id/product_uuid, which
causes gopsutil to return identical HostIDs for all LXC containers on
the same physical host. This results in agent ID collisions where
multiple LXC containers appear as a single host in Pulse.

The fix detects LXC containers and prefers /etc/machine-id (which is
unique per container) over gopsutil's HostID.

Related to #773
2025-12-14 23:05:32 +00:00
rcourtman
d12ab31703 feat(docker-agent): add payload size logging for debugging body-too-large errors
Related to #823

- Log payload size (in KB and bytes) at debug level
- Warn when payload approaches 400KB (512KB limit)
- Helps diagnose 'request body too large' errors
2025-12-14 21:10:06 +00:00
rcourtman
2e06f6b966 feat: auto-detect platforms during agent install and allow multi-host tokens
- Install script now auto-detects Docker, Kubernetes, and Proxmox
- Platform monitoring is enabled automatically when detected
- Users can override with --disable-* or --enable-* flags
- Allow same token to register multiple hosts (one per hostname)
- Update tests to reflect new multi-host token behavior
- Improve CompleteStep and UnifiedAgents UI components
- Update UNIFIED_AGENT.md documentation
2025-12-14 16:21:59 +00:00
rcourtman
397871629c fix: cluster-aware guest deduplication and multi-agent token binding
- Add cluster-aware guest ID generation (clusterName-VMID instead of instanceName-VMID)
  to prevent duplicate VMs/containers when multiple cluster nodes are monitored

- Add cluster deduplication at registration time - when a node is added that belongs
  to an already-configured cluster, merge as endpoint instead of creating duplicate

- Add startup consolidation to automatically merge duplicate cluster instances

- Change host agent token binding from agent GUID to hostname, allowing:
  - Multiple host agents to share a token (each bound by hostname)
  - Agent reinstalls on same host without token conflicts

- Remove 12-character password minimum requirement

- Remove emoji from auto-registration success message

- Fix grouped view node lookup to support both cluster-aware node IDs
  (clusterName-nodeName) and legacy guest grouping keys (instance-nodeName)

Fixes duplicate guests appearing when agents are installed on multiple
cluster nodes. Also improves multi-agent UX by allowing shared tokens.
2025-12-14 10:16:17 +00:00
rcourtman
5e2939b6bd feat: link host agents to PVE nodes by hostname to prevent duplication
When a host agent registers, it now searches for a PVE node with a
matching hostname and links them together. Similarly, when PVE nodes
are discovered, they check for existing host agents with matching hostnames.

This prevents the confusion of seeing duplicate entries when users install
agents on PVE cluster nodes that were already discovered via the cluster API.

- Added LinkedHostAgentID field to Node struct
- Added LinkedNodeID/LinkedVMID/LinkedContainerID fields to Host struct
- Added findLinkedProxmoxEntity() to match by hostname (with domain stripping)
- Updated UpdateNodesForInstance() to preserve and auto-set links
2025-12-13 23:14:00 +00:00
rcourtman
e6d07c3294 style: remove emojis from log messages
Replaced emoji icons with plain text for cleaner logs and cross-platform compatibility.
2025-12-13 21:29:11 +00:00
rcourtman
a4cfc09061 chore: reduce minimum password length to 1
Allows users to choose their own password security level instead of
enforcing a 12-character minimum. Users are adults.
2025-12-13 21:29:00 +00:00
rcourtman
1dc573e209 feat: support token query parameter for WebSocket authentication
Allows WebSocket connections to authenticate via ?token= query parameter
when headers cannot be sent (browser WebSocket limitations).
2025-12-13 21:28:50 +00:00
rcourtman
7acff2215c style: remove emojis from AI context formatting and prompts
Replaced emoji indicators with text equivalents for better cross-platform
compatibility and cleaner LLM prompts.
2025-12-13 21:26:49 +00:00
rcourtman
8919281718 fix: clear agents that connected during unauthenticated setup window
When no auth is configured (fresh install), CheckAuth allows all requests.
This creates a race condition where existing agents from a previous setup
can report data before the wizard completes security configuration.

This fix clears all host agents and docker hosts when /api/security/quick-setup
is called, ensuring the wizard shows a clean state after security is configured.

Added:
- State.ClearAllHosts() - removes all host agents
- State.ClearAllDockerHosts() - removes all docker hosts
- Monitor.ClearUnauthenticatedAgents() - clears both and resets token bindings
- Call to ClearUnauthenticatedAgents() in handleQuickSecuritySetupFixed()
2025-12-13 21:22:04 +00:00
rcourtman
3a2a73f9d6 Merge main into ai-features: incorporate latest bugfixes
Resolved conflicts:
- pkg/fsfilters/filters.go: Keep both TrueNAS and EnhanceCP filter fixes
- DockerUnifiedTable.tsx: Use main's resource column overlap fix
2025-12-13 15:18:51 +00:00
rcourtman
5c4069cdbf feat: add persistent metrics history API endpoint
- Add GET /api/metrics-store/history endpoint for querying SQLite-backed metrics
- Support flexible time ranges: 1h, 6h, 12h, 24h, 7d, 30d, 90d
- Return aggregated data with min/max values for longer time ranges
- Add TypeScript types and ChartsAPI.getMetricsHistory() client method

This enables frontend charts to visualize long-term trends using the
tiered retention system (raw → minute → hourly → daily averages).
2025-12-13 14:18:16 +00:00
rcourtman
97f2bfa1ed feat: add configurable metrics retention settings
- Add MetricsRetentionRawHours, MetricsRetentionMinuteHours, MetricsRetentionHourlyDays, MetricsRetentionDailyDays to SystemSettings
- Wire settings from system.json through Config to metrics store initialization
- Set sensible defaults: Raw=2h, Minute=24h, Hourly=7d, Daily=90d
- Log active retention values on startup for transparency

Users can now customize how long metrics are stored at each aggregation tier.
2025-12-13 14:14:07 +00:00
rcourtman
26802cd7bf feat(backend): Implement remaining TODOs
1. resources/store.go: Implement sorting in Query.Execute()
   - Added sortResources function with support for common fields
   - Supports: name, type, status, cpu, memory, disk, last_seen
   - Both ascending and descending order supported

2. ai/service.go: Implement hasAgentForTarget properly
   - Now maps target to specific agent based on hostname/node
   - Uses ResourceProvider lookup for container→host mapping
   - Supports cluster peer routing for Proxmox clusters
   - Properly handles single-agent vs multi-agent scenarios
2025-12-13 13:21:23 +00:00