4826 commits

Author SHA1 Message Date
rcourtman
37f5e12dc2 test: add encryption keys to remaining cmd/pulse tests
TestConfigImportCmd and TestConfigAutoImportCmd need encryption keys
in CI where /etc/pulse/.encryption.key doesn't exist.
2026-01-04 18:43:40 +00:00
rcourtman
a9d37eed8d test: fix TestLoad_ReadErrors encryption key 2026-01-04 18:24:39 +00:00
rcourtman
821783eef7 test: fix tests that create .enc files without encryption keys
Tests were failing in CI because they created nodes.enc files without
valid encryption keys, triggering the crypto safety check.

Added createTestEncryptionKey helper and fixed:
- TestLoad_MockEnv (config_load_test.go)
- Multiple tests in commands_test.go that create nodes.enc
2026-01-04 18:15:08 +00:00
rcourtman
f2be9b60f0 test: fix TestLoad_Errors to provide valid encryption key
Test was creating .enc files without a valid encryption key, which
triggers the crypto safety check that prevents generating new keys
when encrypted data exists.
2026-01-04 18:02:39 +00:00
rcourtman
d71b6bd756 fix: Allow qm/pct reboot/shutdown commands with approval
The blocked patterns for 'reboot' and 'shutdown' were too broad,
matching anywhere in the command string. This caused legitimate
Proxmox VM control commands like 'qm reboot 201' to be blocked
instead of requiring approval.

Fix by anchoring these patterns to only match bare system commands
(^reboot, ^shutdown, etc.) while allowing qm/pct variants through
the RequireApproval path.

Related to #1024
2026-01-04 17:57:51 +00:00
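The anchoring described in this commit can be illustrated with a minimal Go sketch. The `^reboot` and `^shutdown` anchors come from the commit message; the trailing `\b`, the `blockedPatterns` variable, and the `isBlocked` helper are assumptions made for illustration and are not taken from the Pulse source.

```go
package main

import (
	"fmt"
	"regexp"
)

// blockedPatterns is a hypothetical blocklist. Anchoring with ^ means a
// pattern only matches when the command itself starts with the bare word.
// The \b word boundary is an addition of this sketch.
var blockedPatterns = []*regexp.Regexp{
	regexp.MustCompile(`^reboot\b`),
	regexp.MustCompile(`^shutdown\b`),
}

// isBlocked reports whether cmd begins with a bare reboot/shutdown command.
func isBlocked(cmd string) bool {
	for _, re := range blockedPatterns {
		if re.MatchString(cmd) {
			return true
		}
	}
	return false
}

func main() {
	cmds := []string{"reboot", "shutdown -h now", "qm reboot 201", "pct shutdown 105"}
	for _, cmd := range cmds {
		fmt.Printf("%q blocked=%v\n", cmd, isBlocked(cmd))
	}
}
```

With unanchored patterns, `qm reboot 201` would match `reboot` anywhere in the string; with the anchor it falls through to whatever approval path handles it.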
rcourtman
301b2fd050 test: fix config tests failing in CI when /etc/pulse doesn't exist
Tests were calling Load() without setting PULSE_DATA_DIR, causing them
to try to create /etc/pulse which fails in CI environments.

- Skip TestLoad_Defaults if /etc/pulse doesn't exist
- Add PULSE_DATA_DIR to tests that were missing it
2026-01-04 17:50:57 +00:00
rcourtman
7a1e3e9b4e Improve test coverage for cmd/pulse-sensor-proxy 2026-01-04 16:10:34 +00:00
rcourtman
f77025fb2f test: fix flaky tests with nonexistent path assertions
Tests using /nonexistent/... paths fail in sandboxed environments
where they return 'permission denied' instead of 'not exists'.
Use /tmp/... paths instead which reliably return 'not exists'.
2026-01-04 15:38:30 +00:00
rcourtman
121adbf00a chore: bump version to 5.0.11 2026-01-04 15:27:58 +00:00
rcourtman
45d4d68127 fix: Add debug logging and response format handling for replication status
- Add comprehensive debug logging to diagnose replication status fetch failures
- Handle both array and single-object response formats from Proxmox API
- Log raw response body for easier debugging
- Log success/failure for each enrichment step

This helps diagnose issue #992 where replication last/next sync times aren't
showing. The logging will reveal if the API call is failing, returning empty
data, or returning data in an unexpected format.

Related to #992
2026-01-04 15:01:32 +00:00
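Accepting both an array and a single object, as this commit describes, might look like the sketch below. The `replicationStatus` struct and its field names are assumptions for illustration only; the actual Proxmox response fields are not given in the commit message.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// replicationStatus is a hypothetical shape; the field names are assumed.
type replicationStatus struct {
	ID       string `json:"id"`
	LastSync int64  `json:"last_sync"`
}

// decodeOneOrMany accepts either a JSON array of objects or a single JSON
// object and always returns a slice.
func decodeOneOrMany(raw []byte) ([]replicationStatus, error) {
	trimmed := bytes.TrimSpace(raw)
	if len(trimmed) == 0 {
		return nil, nil
	}
	if trimmed[0] == '[' {
		var many []replicationStatus
		if err := json.Unmarshal(trimmed, &many); err != nil {
			return nil, err
		}
		return many, nil
	}
	var one replicationStatus
	if err := json.Unmarshal(trimmed, &one); err != nil {
		return nil, err
	}
	return []replicationStatus{one}, nil
}

func main() {
	for _, body := range []string{
		`[{"id":"100-0","last_sync":1}, {"id":"101-0","last_sync":2}]`,
		`{"id":"100-0","last_sync":1}`,
	} {
		items, err := decodeOneOrMany([]byte(body))
		fmt.Println(len(items), err)
	}
}
```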
rcourtman
43b5fad12c fix: Add main host URL as fallback for remote cluster access
When a Proxmox cluster is discovered, Pulse now includes the user-provided
main host URL as a fallback endpoint. This handles scenarios where Proxmox
reports internal IPs that aren't reachable from Pulse's network (e.g.,
monitoring a remote cluster across different networks).

Previously, if all cluster endpoint IPs were unreachable, the connection
would fail with no fallback. Now the ClusterClient will fall back to the
main host URL, allowing Proxmox to route API calls internally.

Related to #1028
2026-01-04 14:54:03 +00:00
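The fallback behaviour described above can be sketched as follows. This is not the ClusterClient implementation; `withFallback`, `firstReachable`, and the example addresses are hypothetical, and reachability is modelled by a caller-supplied function rather than a real network probe.

```go
package main

import (
	"errors"
	"fmt"
)

// withFallback returns the discovered endpoints followed by mainHost,
// unless mainHost is already among them.
func withFallback(discovered []string, mainHost string) []string {
	out := make([]string, 0, len(discovered)+1)
	out = append(out, discovered...)
	for _, e := range discovered {
		if e == mainHost {
			return out
		}
	}
	return append(out, mainHost)
}

// firstReachable returns the first endpoint accepted by reachable.
func firstReachable(endpoints []string, reachable func(string) bool) (string, error) {
	for _, e := range endpoints {
		if reachable(e) {
			return e, nil
		}
	}
	return "", errors.New("no reachable endpoint")
}

func main() {
	// Internal IPs reported by the cluster, unreachable from the monitor.
	discovered := []string{"https://10.0.0.1:8006", "https://10.0.0.2:8006"}
	mainHost := "https://pve.example.com:8006" // user-provided URL (example value)

	endpoints := withFallback(discovered, mainHost)
	chosen, err := firstReachable(endpoints, func(e string) bool { return e == mainHost })
	fmt.Println(chosen, err)
}
```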
rcourtman
504f26c6f5 test(ai): improve coverage for patrol service
- Added TestPatrolService_RunPatrol_FullCoverage to test main patrol loop
- Added TestPatrolService_StartStop for lifecycle coverage
- Added TestPatrolService_Setters_Coverage for configuration setters
- Added TestPatrol_RunHeuristicAnalysis_Coverage for heuristic integration
- Mocked provider and state for deterministic AI patrol testing
- Addressed 0% coverage in internal/ai/patrol.go
2026-01-04 14:03:58 +00:00
rcourtman
90cce6d51b test(monitoring): fix failing snapshot tests and improve coverage
- Fix TestMonitor_PollGuestSnapshots_Coverage by correctly initializing State ID fields
- Improve PBS client to handle alternative datastore metric fields (total-space, etc.)
- Add comprehensive test coverage for PBS polling, auth failures, and datastore metrics
- Add various coverage tests for monitoring, alerts, and metadata handling
- Refactor Monitor to support better testing of client creation and auth handling
2026-01-04 10:29:40 +00:00
rcourtman
5d4e911298 feat: improve test coverage for pulse-sensor-proxy 2026-01-03 21:42:19 +00:00
rcourtman
fd7e80ae17 fix: Add clear warning when Docker token is already in use
When a Docker agent tries to register with a token that's already bound
to another agent, the error was logged generically as "Failed to send
docker report". Users had to dig into logs to understand the issue.

Now logs a prominent error message:
"DOCKER REGISTRATION FAILED: This API token is already used by another
Docker agent. Each Docker host requires its own unique token. Generate
a new token in Pulse Settings > Agents and reinstall with the new token."

Related to #1027
2026-01-03 20:56:04 +00:00
rcourtman
22e1cc5613 test(agent): achieve 95% coverage for pulse-agent 2026-01-03 20:52:42 +00:00
rcourtman
fa43628cde fix: Alert acknowledge/unacknowledge fails with reverse proxies
Reverse proxies (Traefik, Caddy, nginx) often normalize or reject URLs
containing %2F (encoded slash). Alert IDs contain forward slashes
(e.g., "docker-container-state-docker:abc/def"), causing acknowledge
requests to fail with 400 errors when going through a reverse proxy.

Added new body-based endpoints that accept alert ID in JSON body:
- POST /api/alerts/acknowledge {"id": "..."}
- POST /api/alerts/unacknowledge {"id": "..."}
- POST /api/alerts/clear {"id": "..."}

Updated frontend to use the new endpoints. Legacy path-based endpoints
are preserved for backwards compatibility.

Related to #1026
2026-01-03 20:51:25 +00:00
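To illustrate why moving the ID into the body sidesteps the proxy problem, the sketch below escapes the example alert ID as a path segment and then serializes the same ID as JSON. The `{"id": "..."}` body shape and the example ID come from the commit message; the `ackRequest` type and `encodeAckBody` helper are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// ackRequest mirrors the body shape quoted in the commit: {"id": "..."}.
type ackRequest struct {
	ID string `json:"id"`
}

// encodeAckBody serializes the alert ID into a JSON request body.
func encodeAckBody(alertID string) ([]byte, error) {
	return json.Marshal(ackRequest{ID: alertID})
}

func main() {
	id := "docker-container-state-docker:abc/def"

	// As a path segment the slash has to be percent-encoded, which is the
	// form some reverse proxies normalize or reject.
	fmt.Println("path segment:", url.PathEscape(id))

	// In a JSON body the slash travels unescaped.
	body, err := encodeAckBody(id)
	if err != nil {
		panic(err)
	}
	fmt.Println("json body:   ", string(body))
}
```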
rcourtman
adba448419 fix(pbs): correct API paths and achieve >95% test coverage 2026-01-03 20:45:36 +00:00
rcourtman
b039b79e4a fix: Physical disk temps showing 0°C when using host agent SMART data
The mergeNVMeTempsIntoDisks and mergeHostAgentSMARTIntoDisks functions
require nodes to have LinkedHostAgentID populated to match disks with
host agent SMART data. However, the code was passing the local modelNodes
variable which doesn't have this field set - the linking happens inside
UpdateNodesForInstance which modifies the state's copy, not the local var.

Fixed by using currentState.Nodes (from GetSnapshot()) instead of
modelNodes/modelNodesCopy in both the skip-poll path and the background
goroutine. The state snapshot contains nodes with LinkedHostAgentID
already populated, allowing proper SMART data merging.

Related to #1014
2026-01-03 19:20:31 +00:00
rcourtman
abccbcafb6 fix: Container update command incorrectly removes Docker host and revokes token
When a container update command completed successfully, the server was
incorrectly returning shouldRemove=true, which caused the Docker host to
be removed and its API token revoked. This caused 401 Unauthorized errors
for subsequent agent reports.

The fix ensures shouldRemove is only true for "stop" commands, not for
"update_container" or "check_updates" commands.

Related to #1020
2026-01-03 19:05:18 +00:00
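The corrected rule can be stated as a one-line predicate. This is a sketch, not the server code: the command type strings are taken from the commit message, while the `shouldRemoveHost` function and its `succeeded` parameter are assumptions.

```go
package main

import "fmt"

// shouldRemoveHost is a hypothetical helper: only a successfully completed
// "stop" command leads to removal of the Docker host.
func shouldRemoveHost(commandType string, succeeded bool) bool {
	return succeeded && commandType == "stop"
}

func main() {
	for _, ct := range []string{"stop", "update_container", "check_updates"} {
		fmt.Printf("%s -> shouldRemove=%v\n", ct, shouldRemoveHost(ct, true))
	}
}
```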
rcourtman
233278a9d2 Add Docker Swarm frontend components 2026-01-03 18:52:38 +00:00
rcourtman
ed78509f92 Fix flaky tests and improve coverage across alerts, api, and config packages
- Fix deadlock and race conditions in internal/alerts
- Add comprehensive error path tests for internal/config
- Fix 401 handling in internal/api
- Fix Docker Swarm task filtering test logic
2026-01-03 18:36:17 +00:00
rcourtman
08661cca8e fix: Add anchor target for "Manage linked agents" link
The link in the agents list banner pointed to #linked-agents but no
element had that ID, so clicking it did nothing.

Related to #1021
2026-01-03 11:33:08 +00:00
rcourtman
a47c7803bb fix: Preserve configured runtime preference during report collection
When collecting reports, the runtime re-detection was passing RuntimeAuto
instead of the user's configured preference. This caused podman to switch
back to docker on systems like CoreOS where podman provides a docker-
compatible socket at /var/run/docker.sock.

Now the current runtime (set at init from user's --docker-runtime flag)
is passed as the preference, preventing spurious runtime switching.

Related to #1022
2026-01-03 11:30:25 +00:00
rcourtman
9e339957c6 fix: Update runtime config when toggling Docker update actions setting
The DisableDockerUpdateActions setting was being saved to disk but not
updated in h.config, causing the UI toggle to appear to revert on page
refresh since the API returned the stale runtime value.

Related to #1023
2026-01-03 11:14:17 +00:00
rcourtman
fbbefa4546 Improve tests for internal/alerts package
- Fix TestSaveHistoryWithRetry_WriteError to be robust when run as root
- Add TestOnAlert to history_test.go
- Add pmg_anomaly_test.go for PMG anomaly detection coverage
- Add cleanup_test.go for tracking map cleanup coverage
- Extend filter_evaluation_test.go to cover all guest threshold logic
2026-01-02 23:47:16 +00:00
rcourtman
3b48c4acbb Auto-update Helm chart version to 5.0.10 2026-01-02 21:30:25 +00:00
rcourtman
e19c202ff3 Auto-update Helm chart documentation 2026-01-02 21:30:23 +00:00
rcourtman
87ca7c92e0 docs: update example in dev-deploy-agent script 2026-01-02 21:08:42 +00:00
rcourtman
0b3cb71fd1 fix(alerts): use pbsDefaults instead of nodeDefaults for PBS instances 2026-01-02 20:46:53 +00:00
rcourtman
4cd3e53c3e test: add regression tests for missing frontend fields
Ensures that LinkedHostAgentId, CommandsEnabled, IsLegacy, and LinkedNodeId
are correctly propagated to the frontend. This prevents regressions of the
bugs fixed for #952 and #971.
2026-01-02 20:45:35 +00:00
rcourtman
a0e5f22983 chore: bump version to 5.0.10 2026-01-02 20:17:09 +00:00
rcourtman
118574e491 fix: expose linkedHostAgentId and commandsEnabled to frontend
Related to #952 and #971

Both issues were caused by the backend not sending required fields to the
frontend in the ToFrontend() converters:

Issue #971 (Agent required badge):
- NodeFrontend was missing LinkedHostAgentId field
- Frontend couldn't identify linked host agents, so it fell back to showing
  'Agent required' instead of 'Via agent'

Issue #952 (AI Commands toggle stuck):
- HostFrontend was missing CommandsEnabled field
- Frontend couldn't see the actual commandsEnabled state from the backend,
  causing the optimistic UI to never receive confirmation that the state
  had actually changed

Also added IsLegacy and LinkedNodeId to HostFrontend for completeness.
2026-01-02 20:04:20 +00:00
rcourtman
31c704c7a7 refactor: fix lint issues in internal/ai package
- Remove redundant nil checks before len() calls
- Mark unused parameters with underscore
- Convert if/else chains to switch statements for cleaner code
- Add test assertions to resolve unused write warnings in patrol_test.go
2026-01-02 19:53:01 +00:00
rcourtman
7ec012a2e1 feat(pro): expose update_alerts feature and add AI-powered update risk assessment
- Expose FeatureUpdateAlerts in /api/license/features endpoint (was hidden)
- Add 'Update Alerts' label to frontend Pro License panel
- Add AI-powered update risk assessment for Docker container updates
  - Classifies containers by type (auth, web server, database, etc.)
  - Provides context-aware recommendations for update timing
  - Time-based urgency escalation (warning >7d, critical >14d)
- Handle edge cases: nil alerts, empty metadata, float64 pendingHours
- Fix switch case ordering to properly route docker-container-update alerts
- Add comprehensive tests for update analysis (15 new test functions)
2026-01-02 19:21:17 +00:00
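The time-based escalation and the float64 edge case listed above might be sketched as below. The 7-day and 14-day thresholds and the `pendingHours` key come from the commit message; the function names, the "info" default level, and the metadata map type are assumptions.

```go
package main

import "fmt"

// pendingHoursFrom reads pendingHours from alert metadata. Numbers decoded
// from JSON into map[string]any arrive as float64, so that case is handled
// explicitly. A nil or empty map yields (0, false).
func pendingHoursFrom(meta map[string]any) (float64, bool) {
	switch v := meta["pendingHours"].(type) {
	case float64:
		return v, true
	case int:
		return float64(v), true
	default:
		return 0, false
	}
}

// classifyUrgency escalates by age: warning after 7 days, critical after 14.
// The "info" level for younger updates is an assumption of this sketch.
func classifyUrgency(pendingHours float64) string {
	days := pendingHours / 24
	switch {
	case days > 14:
		return "critical"
	case days > 7:
		return "warning"
	default:
		return "info"
	}
}

func main() {
	meta := map[string]any{"pendingHours": 200.0}
	if h, ok := pendingHoursFrom(meta); ok {
		fmt.Println(classifyUrgency(h))
	}
	_, ok := pendingHoursFrom(nil)
	fmt.Println("nil metadata has value:", ok)
}
```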
rcourtman
c577a7d142 chore: update logo to vector SVG 2026-01-02 18:06:27 +00:00
rcourtman
3637184c63 fix(alerts): decouple PBS custom threshold detection from Node defaults. Related to #1017 2026-01-02 17:46:01 +00:00
rcourtman
b94b6f89d4 Fix ThresholdsTable tests: correct mocking and assertions for resource rendering and filtering 2026-01-02 17:40:23 +00:00
rcourtman
9f0a5d54aa fix(alerts): prevent PBS thresholds from falling back to Node defaults. Related to #1017 2026-01-02 17:04:15 +00:00
rcourtman
9bdbf2616c chore(tests): remove unused test code and redundant test cases
- Remove unused findAlertByID helper and its min dependency from update_alerts_test.go
- Remove redundant negative zero test case from utility_test.go (-0.0 == 0.0 in Go)
2026-01-02 16:11:09 +00:00
rcourtman
0b0b503919 feat: Enable update checks for Docker environments. Related to #1016 2026-01-02 14:22:40 +00:00
rcourtman
180cddb55b refactor: use license package constants for Pro features in AI service 2026-01-02 14:11:56 +00:00
rcourtman
f9ea0fbb5a fix(pro): add error tracking to patrol history store
- Add lastSaveError, lastSaveTime, onSaveError fields to PatrolRunHistoryStore
- Add GetPersistenceStatus() and SetOnSaveError() methods
- Consistent with findings store and cost store error handling
2026-01-02 14:01:32 +00:00
rcourtman
f71c6a6cce fix(pro): add error tracking to cost store and fix race condition
- Add lastSaveError, lastSaveTime, onSaveError fields to cost.Store
- Add GetPersistenceStatus() method to check persistence health
- Add SetOnSaveError() callback for error notifications
- Rename scheduleSave to scheduleSaveLocked for clarity
- Document that scheduleSaveLocked must be called with lock held
- Add tests for new error tracking functionality
2026-01-02 13:59:26 +00:00
rcourtman
c2de1b256b fix(pro): add cleanup goroutine for alert analyzer memory leak
- Add Start/Stop lifecycle methods to AlertTriggeredAnalyzer
- Periodic cleanup of lastAnalyzed map every 30 minutes
- Prevents memory growth from stale cooldown entries
- Document that ai package feature constants are aliases of license constants
- Call Start() in StartPatrol and Stop() in StopPatrol
- Add tests for Start/Stop lifecycle
2026-01-02 13:12:24 +00:00
rcourtman
3029cce172 fix(patrol): address multiple issues in patrol service
- Add missing KubernetesChecked field to persistence (data was being lost)
- Fix Duration field to properly convert between ms and nanoseconds
- Add automatic cleanup of stale stream subscribers (memory leak fix)
- Add error tracking for findings persistence with callback support
- Add GetPersistenceStatus() and SetOnSaveError() methods
- Add tests for new error tracking functionality
2026-01-02 12:45:00 +00:00
rcourtman
3e6ebd593c fix(alerts): resolve mapping and formatting issues for disk temperature thresholds (#1013) 2026-01-02 11:27:48 +00:00
rcourtman
773376fa5d docs: add deep dive summaries for notifications, discovery, and agent exec 2026-01-02 11:18:28 +00:00
rcourtman
d71754743c docs: Add PULSE_DISABLE_DOCKER_UPDATE_ACTIONS documentation
- Add to DOCKER.md configuration table and new 'Disabling Update Features' section
- Add to CONFIGURATION.md monitoring overrides table
- Clarify difference between disabling update detection vs hiding buttons
2026-01-02 10:35:04 +00:00
rcourtman
60220ee161 feat: Add server-wide control to disable Docker update actions
Implements PULSE_DISABLE_DOCKER_UPDATE_ACTIONS environment variable and
Settings UI toggle to hide Docker container update buttons while still
allowing update detection. This addresses requests for a 'read-only' mode
in production environments.

Backend:
- Add DisableDockerUpdateActions to SystemSettings and Config structs
- Add environment variable parsing with EnvOverrides tracking
- Expose setting in GET/POST /api/config/system endpoints
- Block update API with 403 when disabled (defense-in-depth)

Frontend:
- Add disableDockerUpdateActions to SystemConfig type
- Create systemSettings store for reactive access to server config
- Add Docker Settings card in Settings → Agents tab with toggle
- Show env lock badge when set via environment variable

UpdateButton improvements:
- Properly handle loading state (disabled + visual indicator)
- Use Solid.js Show components for proper reactivity
- Show read-only UpdateBadge when updates disabled
- Show interactive button when updates enabled

Closes discussion #982
2026-01-02 10:29:43 +00:00