Commit graph

118 commits

Author SHA1 Message Date
rcourtman
ae522c9a2b fix: Allow all threshold types (Storage, Temperature, Host Agent) to be set to 0 to disable alerting
- Fixed normalizeStorageDefaults to allow Trigger=0
- Fixed normalizeNodeDefaults (Temperature) to allow Trigger=0
- Added comprehensive tests for all threshold normalization patterns
- Updated existing test that expected old behavior
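
A minimal sketch of the normalization rule described above, assuming a simple trigger/clear pair. The `Threshold` type, the `normalizeThreshold` name, and the 5-point hysteresis gap are illustrative assumptions, not the project's actual code.

```go
package main

import "fmt"

// Threshold is a hypothetical trigger/clear pair, not the project's actual type.
type Threshold struct {
	Trigger float64
	Clear   float64
}

// normalizeThreshold keeps an explicit Trigger of 0 (alerting disabled) and
// falls back to the default only for negative, invalid values.
func normalizeThreshold(t, def Threshold) Threshold {
	switch {
	case t.Trigger < 0:
		return def
	case t.Trigger == 0:
		return Threshold{} // disabled: nothing to trigger or clear
	}
	if t.Clear <= 0 || t.Clear > t.Trigger {
		t.Clear = t.Trigger - 5 // assumed hysteresis gap of 5 points
		if t.Clear < 0 {
			t.Clear = 0
		}
	}
	return t
}

func main() {
	def := Threshold{Trigger: 85, Clear: 80}
	fmt.Println(normalizeThreshold(Threshold{Trigger: 0}, def))
	fmt.Println(normalizeThreshold(Threshold{Trigger: 90}, def))
}
```

The point of the sketch is the middle case: a zero trigger is returned as-is instead of being replaced by the default.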

Related to #864
2025-12-20 20:42:23 +00:00
rcourtman
db5e79bb37 fix: Allow Host Agent thresholds to be set to 0 to disable alerting. Related to #864 2025-12-20 20:25:20 +00:00
rcourtman
65e38fac91 test: improve test coverage for AI, license, config, and monitoring packages
New test files:
- internal/ai/providers/gemini_test.go: Comprehensive Gemini provider tests
- internal/api/ai_intelligence_handlers_test.go: AI intelligence endpoint tests
- internal/api/ai_patrol_handlers_test.go: AI patrol endpoint tests
- internal/api/license_handlers_test.go: License API handler tests
- internal/api/security_oidc_response_test.go: OIDC response formatting tests
- internal/config/ai_config_test.go: AI configuration function tests
- internal/config/persistence_ai_test.go: AI config persistence tests
- internal/config/persistence_extended_test.go: Extended persistence tests
- internal/license/persistence_test.go: License persistence tests
- internal/license/pubkey_test.go: Public key handling tests
- internal/monitoring/host_agent_temps_test.go: Temperature processing tests

Enhanced existing files:
- internal/api/updates_test.go: Added update handler tests
- internal/license/license_test.go: Added Service method tests

Coverage improvements:
- ai/providers: 57.3% -> 73.0% (+15.7%)
- license: 78.3% -> 85.9% (+7.6%)
- config: 49.7% -> 53.9% (+4.2%)
- monitoring: 49.8% -> 50.8% (+1.0%)
- api: 28.4% -> 29.8% (+1.4%)
2025-12-19 22:49:30 +00:00
rcourtman
1d64b4c31a fix: show Removed Docker Hosts section in UI for re-enrollment
The 'Removed Docker Hosts' section was not appearing in Settings -> Agents
even when hosts were blocked from re-enrolling. This prevented users from
using the 'Allow re-enroll' button to unblock their Docker agents.

Root cause: The WebSocket store was missing:
1. The 'removedDockerHosts' property in its initial state
2. A handler to process removedDockerHosts data from WebSocket messages

This meant the backend was correctly sending the data, but the frontend
was completely ignoring it.

Changes:
- Add removedDockerHosts to WebSocket store initial state and message handler
- Add removedDockerHosts to App.tsx fallback state for consistency
- Add missing BroadcastState call after AllowDockerHostReenroll succeeds

Also includes previous fixes from this session:
- Add PULSE_AGENT_URL as alias for PULSE_AGENT_CONNECT_URL (config.go)
- Add runtime Docker/Podman auto-detection in pulse-agent (main.go)

Fixes issue reported by darthrater78 in discussion #845
2025-12-19 17:57:04 +00:00
rcourtman
13af682ce1 fix(config): add PULSE_AGENT_CONNECT_URL and improve Docker detection
- Add AgentConnectURL config option to override public URL for agents
- Improve install.sh to diagnose docker detection failures
- Update router to prioritize AgentConnectURL for agent install commands
2025-12-19 16:43:14 +00:00
rcourtman
8400976e80 fix: wait for async save in guest metadata test
The TestGuestMetadataStore_GetWithLegacyMigration_ClusteredMatchesNodeFormat
test was flaky because it triggered an async save in GetWithLegacyMigration
but didn't wait for it to complete. When the test ended, t.TempDir() tried
to clean up while the goroutine was still writing, causing 'directory not
empty' errors on CI.

Added a 100ms time.Sleep to wait for the async save, matching the pattern
used in other similar tests in the same file.
2025-12-18 22:48:15 +00:00
rcourtman
0d11da74e2 refactor(ui): standardize URL editing with shared UrlEditPopover component
- Create reusable UrlEditPopover component with fixed positioning
- Add createUrlEditState hook for managing editing state
- Update DockerHostSummaryTable to use new popover
- Update DockerUnifiedTable (containers & services) to use new popover
- Update GuestRow (Proxmox VMs/containers) to use new popover
- Update HostsOverview (Proxmox hosts) to use new popover
- Add Docker host metadata API for custom URLs
- Consistent styling with save, delete, cancel buttons and keyboard shortcuts
2025-12-18 22:22:55 +00:00
rcourtman
65829983b5 v5: gate legacy sensor-proxy and prune dev docs 2025-12-18 21:51:25 +00:00
rcourtman
c91307be94 fix: guest URL icon now appears/disappears immediately after AI sets/removes it
The issue was a SolidJS reactivity problem in the Dashboard component.
When the guestMetadata signal was accessed inside a For loop callback and
assigned to a plain variable, SolidJS lost reactive tracking.

Changed from:
  const metadata = guestMetadata()[guestId] || ...
  customUrl={metadata?.customUrl}

To:
  const getMetadata = () => guestMetadata()[guestId] || ...
  customUrl={getMetadata()?.customUrl}

This ensures SolidJS properly tracks the signal dependency when the
getter function is called directly in JSX props.
2025-12-18 14:42:47 +00:00
rcourtman
0ee6e50c8b fix(config): avoid deadlock saving empty nodes config 2025-12-17 13:28:06 +00:00
rcourtman
df48961d04 test: add regression test for OIDC env vars with nil config (#853)
Adds TestOIDCEnvVarsWithNilConfig to catch the case where OIDC_* env
vars were silently ignored when no oidc.enc file existed. This documents
the proper pattern of initializing OIDCConfig before calling MergeFromEnv.
2025-12-16 21:02:47 +00:00
rcourtman
5591b3006f fix: OIDC env vars ignored when no oidc.enc file exists
When OIDC_* environment variables were set but no oidc.enc config file
existed, cfg.OIDC was nil and MergeFromEnv would silently return without
applying the env vars (due to nil receiver check).

Fix: Initialize cfg.OIDC to default values before merging env vars if
it's nil. This ensures OIDC can be configured purely through environment
variables without requiring a pre-existing config file.
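
The nil-receiver behaviour and the fix can be sketched as follows. The struct fields, the `applyOIDCEnv` helper, and the specific variable names `OIDC_ISSUER_URL` and `OIDC_CLIENT_ID` are assumptions for illustration; an empty struct stands in for the real default values, and the environment is passed as a map to keep the example self-contained.

```go
package main

import "fmt"

// OIDCConfig is a trimmed-down stand-in; the real struct has more fields.
type OIDCConfig struct {
	IssuerURL string
	ClientID  string
}

// MergeFromEnv overlays environment values. The nil-receiver guard mirrors
// the check described above, which made the merge a silent no-op.
func (c *OIDCConfig) MergeFromEnv(env map[string]string) {
	if c == nil {
		return
	}
	if v := env["OIDC_ISSUER_URL"]; v != "" { // variable name is an assumption
		c.IssuerURL = v
	}
	if v := env["OIDC_CLIENT_ID"]; v != "" { // variable name is an assumption
		c.ClientID = v
	}
}

// Config holds only the field relevant to this sketch.
type Config struct {
	OIDC *OIDCConfig
}

// applyOIDCEnv initializes cfg.OIDC before merging so env-only setups work.
func applyOIDCEnv(cfg *Config, env map[string]string) {
	if cfg.OIDC == nil {
		cfg.OIDC = &OIDCConfig{}
	}
	cfg.OIDC.MergeFromEnv(env)
}

func main() {
	cfg := &Config{} // no oidc.enc was loaded, so OIDC starts out nil
	applyOIDCEnv(cfg, map[string]string{"OIDC_ISSUER_URL": "https://idp.example.com"})
	fmt.Println(cfg.OIDC.IssuerURL)
}
```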

Related to #853
2025-12-16 20:25:56 +00:00
rcourtman
cf44352c83 feat: configurable backup freshness thresholds for dashboard indicator
Adds FreshHours and StaleHours settings to control when the dashboard
backup indicator shows green (fresh), amber (stale), or red (critical).

- Backend: Added FreshHours/StaleHours to BackupAlertConfig (default 24/72 hours)
- Frontend: getBackupInfo() now accepts optional thresholds parameter
- Dashboard/GuestRow components use thresholds from alert config
- Settings saved/loaded with alert configuration
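
A rough sketch of the three-way classification, written in Go for consistency with the backend even though the commit implements it in the frontend's getBackupInfo(). The function name, the use of hours as a float, and the inclusive boundaries are assumptions.

```go
package main

import "fmt"

// backupStatus classifies a backup by age. Boundaries are treated as
// inclusive here, which is an assumption rather than confirmed behavior.
func backupStatus(ageHours float64, freshHours, staleHours int) string {
	switch {
	case ageHours <= float64(freshHours):
		return "fresh" // green
	case ageHours <= float64(staleHours):
		return "stale" // amber
	default:
		return "critical" // red
	}
}

func main() {
	// 24 and 72 are the defaults named in the commit message.
	for _, age := range []float64{6, 48, 100} {
		fmt.Printf("%.0fh old: %s\n", age, backupStatus(age, 24, 72))
	}
}
```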

Closes #839
2025-12-16 16:36:08 +00:00
rcourtman
f18bf62bd3 fix(ai): use configured provider's default model when no model set
When a user configures only Ollama (or any single provider) via the
multi-provider UI without explicitly selecting a model, GetModel() now
returns that provider's default model instead of falling back to the
legacy Provider field, which defaults to "anthropic".

This fixes "API key is required for anthropic" errors when enabling AI
with only Ollama configured.

Related to #847
2025-12-15 11:18:05 +00:00
rcourtman
e6d07c3294 style: remove emojis from log messages
Replaced emoji icons with plain text for cleaner logs and cross-platform compatibility.
2025-12-13 21:29:11 +00:00
rcourtman
97f2bfa1ed feat: add configurable metrics retention settings
- Add MetricsRetentionRawHours, MetricsRetentionMinuteHours, MetricsRetentionHourlyDays, MetricsRetentionDailyDays to SystemSettings
- Wire settings from system.json through Config to metrics store initialization
- Set sensible defaults: Raw=2h, Minute=24h, Hourly=7d, Daily=90d
- Log active retention values on startup for transparency

Users can now customize how long metrics are stored at each aggregation tier.
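
One way the defaults could be applied, as a sketch. The field names come from the commit message; the `withRetentionDefaults` helper and the rule that non-positive values mean "unset" are assumptions.

```go
package main

import "fmt"

// SystemSettings shows only the retention fields named in the commit.
type SystemSettings struct {
	MetricsRetentionRawHours    int
	MetricsRetentionMinuteHours int
	MetricsRetentionHourlyDays  int
	MetricsRetentionDailyDays   int
}

// withRetentionDefaults fills unset (non-positive) tiers with the documented
// defaults: Raw=2h, Minute=24h, Hourly=7d, Daily=90d.
func withRetentionDefaults(s SystemSettings) SystemSettings {
	if s.MetricsRetentionRawHours <= 0 {
		s.MetricsRetentionRawHours = 2
	}
	if s.MetricsRetentionMinuteHours <= 0 {
		s.MetricsRetentionMinuteHours = 24
	}
	if s.MetricsRetentionHourlyDays <= 0 {
		s.MetricsRetentionHourlyDays = 7
	}
	if s.MetricsRetentionDailyDays <= 0 {
		s.MetricsRetentionDailyDays = 90
	}
	return s
}

func main() {
	// Only the daily tier is customized; the rest fall back to defaults.
	s := withRetentionDefaults(SystemSettings{MetricsRetentionDailyDays: 365})
	fmt.Printf("%+v\n", s)
}
```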
2025-12-13 14:14:07 +00:00
rcourtman
a259b67348 feat: add Kubernetes platform support 2025-12-12 21:31:11 +00:00
rcourtman
d36ad0945f feat(settings): Add separate Auto-Fix Model setting for remediation
Add configurable model specifically for automatic remediation actions:

Backend (internal/config/ai.go):
- Add AutoFixModel field to AIConfig
- Add GetAutoFixModel() getter with fallback chain:
  AutoFixModel -> PatrolModel -> Model

Frontend (AISettings.tsx, types/ai.ts):
- Add auto_fix_model to AISettings types
- Add Auto-Fix Model dropdown (only shows when patrol_auto_fix enabled)
- Falls back to patrol model if not set

API (ai_handlers.go):
- Add auto_fix_model to response and update request
- Handle saving/loading the new field

Rationale:
- Auto-fix takes real actions, may warrant a more capable model
- Patrol observation can use cheaper models for cost savings
- Gives users granular control over model costs vs reliability
- Model hierarchy: Chat > AutoFix > Patrol > Default
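
The fallback chain AutoFixModel -> PatrolModel -> Model can be sketched like this. Only the three fields involved are shown, the model IDs in `main` are placeholders, and trimming whitespace before the emptiness check is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// AIConfig shows only the fields that take part in the fallback chain.
type AIConfig struct {
	Model        string
	PatrolModel  string
	AutoFixModel string
}

// GetAutoFixModel returns the first non-empty entry in the chain
// AutoFixModel -> PatrolModel -> Model.
func (c *AIConfig) GetAutoFixModel() string {
	for _, m := range []string{c.AutoFixModel, c.PatrolModel, c.Model} {
		if strings.TrimSpace(m) != "" {
			return m
		}
	}
	return ""
}

func main() {
	// Placeholder model IDs, not real model names.
	cfg := &AIConfig{Model: "provider:default", PatrolModel: "provider:cheap"}
	fmt.Println(cfg.GetAutoFixModel()) // falls back to the patrol model
}
```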
2025-12-12 14:35:28 +00:00
rcourtman
54a3c3c47d Persist AI cost budget and allow history reset 2025-12-12 12:10:58 +00:00
rcourtman
88d419dd5b feat(ai): Add enriched context with historical trends and predictions
Phase 1 of Pulse AI differentiation:

- Create internal/ai/context package with types, trends, builder, formatter
- Implement linear regression for trend computation (growing/declining/stable/volatile)
- Add storage capacity predictions (predicts days until 90% and 100%)
- Wire MetricsHistory from monitor to patrol service
- Update patrol to use buildEnrichedContext instead of basic summary
- Update patrol prompt to reference trend indicators and predictions

This gives the AI awareness of historical patterns, enabling it to:
- Identify resources with concerning growth rates
- Predict capacity exhaustion before it happens
- Distinguish between stable high usage vs growing problems
- Provide more actionable, time-aware insights

All tests passing. Falls back to basic summary if metrics history unavailable.
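
To illustrate the trend and prediction idea, here is a small least-squares sketch. It is not the code from internal/ai/context: the `Point` type, function names, and tolerance parameter are assumptions, and the "volatile" classification is left out because the commit does not say how it is detected.

```go
package main

import "fmt"

// Point is one usage sample: time in days and usage in percent.
type Point struct {
	Day float64
	Pct float64
}

// linearFit returns the least-squares slope and intercept for the samples.
// ok is false when there are too few points or no spread in time.
func linearFit(pts []Point) (slope, intercept float64, ok bool) {
	if len(pts) < 2 {
		return 0, 0, false
	}
	n := float64(len(pts))
	var sx, sy, sxx, sxy float64
	for _, p := range pts {
		sx += p.Day
		sy += p.Pct
		sxx += p.Day * p.Day
		sxy += p.Day * p.Pct
	}
	den := n*sxx - sx*sx
	if den == 0 {
		return 0, 0, false
	}
	slope = (n*sxy - sx*sy) / den
	intercept = (sy - slope*sx) / n
	return slope, intercept, true
}

// classify maps a slope (percent per day) to a trend label using a tolerance.
func classify(slope, tolerance float64) string {
	switch {
	case slope > tolerance:
		return "growing"
	case slope < -tolerance:
		return "declining"
	default:
		return "stable"
	}
}

// daysUntil estimates days from `now` until the fitted line reaches target.
// ok is false when usage is not growing.
func daysUntil(slope, intercept, now, target float64) (days float64, ok bool) {
	if slope <= 0 {
		return 0, false
	}
	days = (target-intercept)/slope - now
	if days < 0 {
		days = 0 // target already reached
	}
	return days, true
}

func main() {
	pts := []Point{{0, 50}, {1, 52}, {2, 54}, {3, 56}}
	slope, intercept, _ := linearFit(pts)
	d90, _ := daysUntil(slope, intercept, 3, 90)
	fmt.Println(classify(slope, 0.1), d90)
}
```

With a steady 2 points per day of growth from 50%, the fitted line reaches 90% on day 20, which is 17 days after the last sample.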
2025-12-12 09:45:57 +00:00
rcourtman
d078f5f0f6 fix: Ollama should only show as configured when URL is explicitly set
Previously Ollama always showed as 'Available' even if not set up.
Now it only shows as configured when user has entered an OllamaBaseURL.
2025-12-11 17:12:01 +00:00
rcourtman
e842f523b7 feat: Implement multi-provider AI support
Backend:
- Add per-provider API key fields to AIConfig (AnthropicAPIKey, OpenAIAPIKey, DeepSeekAPIKey, OllamaBaseURL, OpenAIBaseURL)
- Add NewForProvider() and NewForModel() factory functions for multi-provider instantiation
- Update ListModels() to aggregate models from all configured providers with provider:model format
- Update Execute/ExecuteStream to dynamically create provider based on selected model
- Update TestConnection to use multi-provider aware provider creation
- Add helper functions: HasProvider(), GetConfiguredProviders(), GetAPIKeyForProvider(), GetBaseURLForProvider(), ParseModelString(), FormatModelString()

Frontend:
- Remove legacy single-provider UI (provider grid, single API key input, single base URL)
- Add accordion-style UI for configuring all providers independently
- Add model grouping by provider in selectors using optgroup
- Update AIChat model dropdown with grouped provider sections
- Add helper functions for parsing provider from model ID and grouping models

API:
- Add multi-provider fields to AISettingsResponse and AISettingsUpdateRequest
- Add /api/ai/models endpoint for dynamic model listing
- Update settings handlers for per-provider credential management
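
A sketch of the provider:model format handled by ParseModelString() and FormatModelString(). The signatures and the known-provider check are assumptions; the commit only names the helpers and the format. The provider set mirrors the per-provider config fields listed above.

```go
package main

import (
	"fmt"
	"strings"
)

// knownProviders mirrors the providers named in the commit message.
var knownProviders = map[string]bool{
	"anthropic": true,
	"openai":    true,
	"deepseek":  true,
	"ollama":    true,
}

// FormatModelString joins provider and model as "provider:model".
func FormatModelString(provider, model string) string {
	return provider + ":" + model
}

// ParseModelString splits on the first colon only, so model names that
// contain colons themselves survive. An unknown prefix is treated as part
// of the model name and the provider is returned empty.
func ParseModelString(s string) (provider, model string) {
	if p, m, ok := strings.Cut(s, ":"); ok && knownProviders[p] {
		return p, m
	}
	return "", s
}

func main() {
	id := FormatModelString("ollama", "example:8b") // placeholder model name
	p, m := ParseModelString(id)
	fmt.Println(p, m)
}
```

Splitting on the first colon matters for tagged local models, where the model name itself carries a `name:tag` suffix.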
2025-12-11 16:00:45 +00:00
rcourtman
1e3fdb6f63 feat(ai): Enhanced AI patrol system with alert triggers and history persistence
- Add alert-triggered AI analysis for real-time incident response
- Implement patrol history persistence across restarts
- Add patrol schedule configuration UI in AI Settings
- Enhance AIChat with patrol status and manual trigger controls
- Add resource store improvements for AI context building
- Expand Alerts page with AI-powered analysis integration
- Add Vite proxy config for AI API endpoints
- Support both Anthropic and OpenAI providers with streaming
2025-12-10 21:08:22 +00:00
rcourtman
ae7b66ecff refactor(ai): Remove over-engineered URL discovery service
Keep only the simple AI-powered approach:
- set_resource_url tool lets AI save discovered URLs
- Users ask AI directly: 'Find URLs for my containers'
- AI uses its intelligence to discover and set URLs

Removed:
- URLDiscoveryService (rigid port scanning)
- Bulk discovery API endpoints
- Frontend discovery button

The AI itself is smart enough to iterate through resources
and discover URLs when asked.
2025-12-10 08:35:24 +00:00
rcourtman
c8adbb7ae5 Add AI monitoring enhancements and host metadata features
- Add host metadata API for custom URL editing on hosts page
- Enhance AI routing with unified resource provider lookup
- Add encryption key watcher script for debugging key issues
- Improve AI service with better command timeout handling
- Update dev environment workflow with key monitoring docs
- Fix resource store deduplication logic
2025-12-09 16:27:46 +00:00
rcourtman
927ac76bad feat: AI integration, Docker metrics, RAID display, and infrastructure improvements
- Add Claude OAuth authentication support with hybrid API key/OAuth flow
- Implement Docker container historical metrics in backend and charts API
- Add Ceph cluster data collection and new Ceph page
- Enhance RAID status display with detailed tooltips and visual indicators
- Fix host deduplication logic with Docker bridge IP filtering
- Fix NVMe temperature collection in host agent
- Add comprehensive test coverage for new features
- Improve frontend sparklines and metrics history handling
- Fix navigation issues and frontend reload loops
2025-12-09 09:29:27 +00:00
rcourtman
bcd7b550d4 AI Problem Solver implementation and various fixes
- Implement 'Show Problems Only' toggle combining degraded status, high CPU/memory alerts, and needs backup filters
- Add 'Investigate with AI' button to filter bar for problematic guests
- Fix dashboard column sizing inconsistencies between bars and sparklines view modes
- Fix PBS backups display and polling
- Refine AI prompt for general-purpose usage
- Fix frontend flickering and reload loops during initial load
- Integrate persistent SQLite metrics store with Monitor
- Fortify AI command routing with improved validation and logging
- Fix CSRF token handling for note deletion
- Debug and fix AI command execution issues
- Various AI reliability improvements and command safety enhancements
2025-12-06 23:46:08 +00:00
rcourtman
8948e84fe5 feat: AI features, agent improvements, and host monitoring enhancements
AI Chat Integration:
- Multi-provider support (Anthropic, OpenAI, Ollama)
- Streaming responses with markdown rendering
- Agent command execution for remote troubleshooting
- Context-aware conversations with host/container metadata

Agent Updates:
- Add --enable-proxmox flag for automatic PVE/PBS token setup
- Improve auto-update with semver comparison (prevents downgrades)
- Add updatedFrom tracking to report previous version after update
- Reduce initial update check delay from 30s to 5s
- Add agent version column to Hosts page table
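
The semver comparison in the Agent Updates list could look roughly like the sketch below. Function names are hypothetical, and pre-release and build suffixes are stripped rather than ordered, which is a simplification of full semver precedence.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "v1.2.3" or "1.2.3" into its three numeric parts.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	v = strings.TrimPrefix(strings.TrimSpace(v), "v")
	// Drop any pre-release or build suffix; this sketch does not order them.
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, fmt.Errorf("invalid version %q", v)
		}
		out[i] = n
	}
	return out, nil
}

// shouldUpdate is true only when candidate is strictly newer than current,
// so equal or older versions never trigger an update (no downgrades).
func shouldUpdate(current, candidate string) bool {
	cur, err1 := parseVersion(current)
	cand, err2 := parseVersion(candidate)
	if err1 != nil || err2 != nil {
		return false // refuse to act on versions that cannot be parsed
	}
	for i := range cur {
		if cand[i] != cur[i] {
			return cand[i] > cur[i]
		}
	}
	return false
}

func main() {
	fmt.Println(shouldUpdate("v4.9.0", "v4.10.0")) // numeric, not lexical, comparison
	fmt.Println(shouldUpdate("v5.0.1", "v5.0.0"))
}
```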

Host Metrics:
- Add DiskIO stats collection (read/write bytes, ops, time)
- Improve disk filtering to exclude Docker overlay mounts
- Add RAID array monitoring via mdadm
- Enhanced temperature sensor parsing

Frontend:
- New Agent Version column on Hosts overview table
- Improved node modal with agent-first installation flow
- Add DiskIO display in host drawer
- Better responsive handling for metric bars
2025-12-05 10:37:02 +00:00
rcourtman
53d7776d6b wip: AI chat integration with multi-provider support
- Add AI service with Anthropic, OpenAI, and Ollama providers
- Add AI chat UI component with streaming responses
- Add AI settings page for configuration
- Add agent exec framework for command execution
- Add API endpoints for AI chat and configuration
2025-12-04 20:16:53 +00:00
rcourtman
4f824ab148 style: Apply gofmt to 37 files
Standardize code formatting across test files and monitor.go.
No functional changes.
2025-12-02 17:21:48 +00:00
rcourtman
322573157e refactor: Use zerolog instead of fmt.Printf in config export
Replace raw fmt.Printf calls with structured zerolog logging for
consistency with the rest of the codebase. This improves log
formatting and enables proper log level filtering.
2025-12-02 16:27:54 +00:00
rcourtman
884c85c2ab chore: Remove debug logging that exposed config JSON
Removed two DEBUG log statements that were logging full nodes config
JSON at Info level. This was verbose and potentially exposed sensitive
configuration data (credentials, tokens) in logs.
2025-12-02 15:32:02 +00:00
rcourtman
a2c8661787 test: Add SaveNodesConfigAllowEmpty test and document deadlock
Add test for SaveNodesConfigAllowEmpty which permits explicit
deletion of all nodes. Document deadlock bug in saveNodesConfig
where empty config protection tries to call LoadNodesConfig
while holding write lock.
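
The deadlock pattern can be sketched as follows. Go's sync.RWMutex is not reentrant, so a goroutine that holds the write lock and then calls a method that takes the read lock blocks forever. The `Store` type and method names here are hypothetical, and the unlocked-helper approach is one common way to avoid the problem, not necessarily the fix applied in the project.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Store is a hypothetical in-memory stand-in for the nodes config persistence.
type Store struct {
	mu    sync.RWMutex
	nodes []string
}

// Load is the public, locking accessor.
func (s *Store) Load() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.loadLocked()
}

// loadLocked assumes the caller already holds s.mu.
func (s *Store) loadLocked() []string {
	return append([]string(nil), s.nodes...)
}

// Save refuses to wipe existing nodes unless allowEmpty is set.
func (s *Store) Save(nodes []string, allowEmpty bool) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	// Calling s.Load() here would deadlock: RLock cannot be acquired while
	// this goroutine holds the write lock. Use the unlocked helper instead.
	if len(nodes) == 0 && !allowEmpty && len(s.loadLocked()) > 0 {
		return errors.New("refusing to replace existing nodes with an empty config")
	}
	s.nodes = append([]string(nil), nodes...)
	return nil
}

func main() {
	s := &Store{}
	fmt.Println(s.Save([]string{"pve1"}, false))
	fmt.Println(s.Save(nil, false)) // blocked by empty-config protection
	fmt.Println(s.Save(nil, true))  // explicit deletion of all nodes
}
```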
2025-12-02 02:54:49 +00:00
rcourtman
c11c700b63 test: Add LoadNodesConfig PBS/PMG migration tests
Test PBS and PMG token clearing and host normalization migrations.
Coverage: 61.4% → 76.3%
2025-12-02 02:46:47 +00:00
rcourtman
d370010a4f test: Add LoadEmailConfig encrypted round-trip test
Test Save+Load with encryption enabled to cover success path
after decryption. Coverage: 58.8% → 88.2%
2025-12-02 02:43:49 +00:00
rcourtman
36f7f84a02 test: Add LoadWebhooks migration and error path tests
Cover unencrypted .enc file migration fallback and legacy file
with invalid JSON graceful handling.
Coverage: 67.9% → 75.0%
2025-12-02 02:39:04 +00:00
rcourtman
1b9af1c18e test: Add SaveAlertConfig host defaults normalization tests
Cover nil HostDefaults initialization (CPU/Memory/Disk) and clear
threshold computation when clear=0 but trigger is set.
Coverage: 68.2% → 75.3%
2025-12-02 02:37:01 +00:00
rcourtman
c26fb97181 test: Add StageFile transaction tests
Cover success path, already committed error, replacing existing staged
file, and empty basename edge case.
2025-12-02 02:34:20 +00:00
rcourtman
a89028e753 test: Add ImportConfig error path tests
Cover empty passphrase, invalid base64, wrong passphrase decryption
failure, and invalid JSON content error paths.
Coverage: 66.7% → 73.7%
2025-12-02 02:32:06 +00:00
rcourtman
36d6279107 test: Add DiscoveryConfig UnmarshalJSON tests
Cover invalid JSON error path, modern field parsing, legacy field
parsing, and empty object default handling.
Coverage: 60% → 88.9%
2025-12-02 02:29:38 +00:00
rcourtman
eb02f28f5b test: Add tests for getInstanceConfig, baseIntervalForInstanceType, LoadOIDCConfig
- getInstanceConfig: 33%→100% (nil handling, case-insensitive matching)
- baseIntervalForInstanceType: 50%→100% (all instance types, clamping)
- LoadOIDCConfig: 35%→94% (file not exist, read errors, decryption)
2025-12-01 18:06:15 +00:00
rcourtman
1a23a7d9ec test: Add error path tests for config persistence functions
- LoadAPITokens: invalid JSON, empty file, file not exist
- LoadEmailConfig: invalid JSON, file not exist
- LoadWebhooks: invalid JSON, legacy migration
- LoadNodesConfig: empty arrays, missing fields, corruption recovery
- cleanupOldBackups: non-existent dir, multiple files cleanup
- Bonus: LoadAppriseConfig, LoadAlertConfig, LoadSystemSettings error paths

config package coverage: 46.3% → 48.2%
2025-12-01 17:45:50 +00:00
rcourtman
ed75f2f096 test: Add comprehensive tests for API token management
- Clone: deep copy verification for pointers and slices
- NewAPITokenRecord/NewHashedAPITokenRecord: creation and validation
- Config methods: HasAPITokens, APITokenCount, ActiveAPITokenHashes
- Config methods: HasAPITokenHash, PrimaryAPITokenHash, PrimaryAPITokenHint
- Config methods: ValidateAPIToken, UpsertAPIToken, RemoveAPIToken, SortAPITokens

config package coverage: 43.5% → 46.3%
2025-12-01 17:37:27 +00:00
rcourtman
4aa53c6ce6 test: Add comprehensive tests for CreateProxmoxConfigFromFields function
Cover all branches: user without realm gets @pam appended, user with
realm unchanged, token auth skips user modification, empty user handling,
and fingerprint/verifySSL preservation. Coverage improved from 66.7% to 100%.
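
The user normalization branches listed above can be sketched like this. The helper name and signature are hypothetical, and returning an empty user unchanged is an assumption about what "empty user handling" means.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeUser appends the @pam realm when password auth is used and the
// user has no realm. Token auth and empty users are left untouched.
func normalizeUser(user string, tokenAuth bool) string {
	if tokenAuth || user == "" || strings.Contains(user, "@") {
		return user
	}
	return user + "@pam"
}

func main() {
	fmt.Println(normalizeUser("root", false))
	fmt.Println(normalizeUser("admin@pve", false))
}
```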
2025-12-01 15:22:50 +00:00
rcourtman
ed4a229c8b test: Add LoadAPITokens error path tests
- Test nonexistent file returns empty slice (not error)
- Test empty file returns empty slice
- Test invalid JSON returns error
- Improves LoadAPITokens coverage from 80% to 93.3%
2025-12-01 15:00:56 +00:00
rcourtman
59970afc65 test: Add HasScope edge case tests for API tokens
- Test empty scope always returns true
- Test explicit wildcard scope in list grants any scope
- Improves coverage from 85.7% to 100%
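
The two edge cases above, sketched under assumptions: the wildcard is taken to be the literal "*", the function is shown as a free function rather than a method, the scope names are placeholders, and a token with no granted scopes is treated as granting nothing, which may differ from the real HasScope.

```go
package main

import "fmt"

// hasScope reports whether the granted scopes cover the required one.
func hasScope(granted []string, required string) bool {
	if required == "" {
		return true // an empty requirement is always satisfied
	}
	for _, s := range granted {
		if s == "*" || s == required { // "*" as the wildcard is an assumption
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasScope([]string{"example:read"}, ""))
	fmt.Println(hasScope([]string{"example:read", "*"}, "example:write"))
	fmt.Println(hasScope([]string{"example:read"}, "example:write"))
}
```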
2025-12-01 14:51:58 +00:00
rcourtman
d548287105 test: Add unit tests for api_tokens.go pure functions
Add comprehensive tests for tokenPrefix, tokenSuffix, normalizeScopes,
and IsKnownScope functions. Coverage increased 42.7% -> 43.3%.
2025-12-01 12:32:37 +00:00
rcourtman
5d61864495 Add unit tests for importTransaction (internal/config)
24 test cases covering transaction lifecycle: staging, commit, rollback, and cleanup.
Tests include atomic commit behavior, backup/restore on failure, directory creation,
permission handling, and idempotent cleanup.

First test file for import_transaction.go. Coverage 42.4% → 42.7%.
2025-11-30 21:04:41 +00:00
rcourtman
33e3744f08 Add unit tests for GuestMetadataStore (internal/config)
22 test cases covering CRUD operations (Get, GetAll, Set, Delete,
ReplaceAll), persistence (Load, save with atomic write, directory
creation), legacy ID migration (GetWithLegacyMigration for clustered
and standalone formats), round-trip verification, and concurrency.

First test file for guest_metadata.go, complementing the similar
docker_metadata_test.go added in the previous run.
2025-11-30 20:48:05 +00:00
rcourtman
0223a52f82 Fix cluster proxy token collision - store per-node control tokens
When multiple cluster nodes register sensor-proxy, each registration
was overwriting the previous node's control token on the shared
PVEInstance. This caused "Proxy token not recognized" errors on all
but the last-registered node.

Changes:
- Add TemperatureProxyControlToken field to ClusterEndpoint struct
- Store control tokens per-endpoint for cluster registrations
- Check both instance-level and endpoint-level tokens when validating
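
A sketch of per-endpoint token storage and the two-level check. `PVEInstance`, `ClusterEndpoint`, and `TemperatureProxyControlToken` are named in the commit; the `ClusterEndpoints` and `NodeName` fields, the method name, and the constant-time comparison are assumptions.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// ClusterEndpoint carries its own control token so nodes no longer overwrite
// each other's registration.
type ClusterEndpoint struct {
	NodeName                     string
	TemperatureProxyControlToken string
}

// PVEInstance keeps the instance-level token for standalone registrations.
type PVEInstance struct {
	TemperatureProxyControlToken string
	ClusterEndpoints             []ClusterEndpoint
}

// tokenEqual compares in constant time and never matches an empty token.
func tokenEqual(stored, presented string) bool {
	if stored == "" || presented == "" {
		return false
	}
	return subtle.ConstantTimeCompare([]byte(stored), []byte(presented)) == 1
}

// validControlToken checks the instance-level token first, then every
// endpoint-level token.
func (p *PVEInstance) validControlToken(presented string) bool {
	if tokenEqual(p.TemperatureProxyControlToken, presented) {
		return true
	}
	for _, ep := range p.ClusterEndpoints {
		if tokenEqual(ep.TemperatureProxyControlToken, presented) {
			return true
		}
	}
	return false
}

func main() {
	inst := &PVEInstance{ClusterEndpoints: []ClusterEndpoint{
		{NodeName: "node-a", TemperatureProxyControlToken: "token-a"},
		{NodeName: "node-b", TemperatureProxyControlToken: "token-b"},
	}}
	// Both nodes stay valid, not just the last one registered.
	fmt.Println(inst.validControlToken("token-a"), inst.validControlToken("token-b"))
}
```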

Related to #738
2025-11-30 20:37:58 +00:00