Commit graph

260 commits

Author SHA1 Message Date
rcourtman
ee0e89871d fix: reduce metrics memory 86x by reverting buffer and adding LTTB downsampling
The in-memory metrics buffer was changed from 1000 to 86400 points per
metric to support 30-day sparklines, but this pre-allocated ~18 MB per
guest (7 slices × 86400 × 32 bytes). With 50 guests that's 920 MB —
explaining why users needed to double their LXC memory after upgrading
to 5.1.0.

- Revert in-memory buffer to 1000 points / 24h retention
- Remove eager slice pre-allocation (use append growth instead)
- Add LTTB (Largest Triangle Three Buckets) downsampling algorithm
- Chart endpoints now use a two-tier strategy: in-memory for ranges
  ≤ 2h, SQLite persistent store + LTTB for longer ranges
- Reduce frontend ring buffer from 86400 to 2000 points

Related to #1190
2026-02-04 19:49:52 +00:00
rcourtman
9d4d392026 fix: host network sparklines showing cumulative bytes instead of rates
Host network sparklines were displaying wildly incorrect values (e.g., 147 GB/s
for an idle Raspberry Pi) because cumulative byte counters (total bytes since
boot) were being stored directly instead of being converted to rates.

Changes:
- monitor.go: Use RateTracker to calculate network rates for hosts, matching
  the existing pattern used for VMs and containers. Only record network
  metrics when we have enough samples to calculate valid rates.
- router.go: Remove network metrics from live fallback for hosts since we
  can't calculate rates from a single snapshot. Better to show nothing than
  misleading cumulative totals.

The fix follows the established codebase pattern where:
1. Agent reports cumulative RXBytes/TXBytes
2. RateTracker compares consecutive samples to calculate bytes/second
3. Rates are stored in metrics history for sparkline display
2026-02-04 16:11:04 +00:00
rcourtman
cffb91f9ea Pre-populate node display name cache before guest polling
Guest polling (CheckGuest) runs before CheckNode in each poll cycle,
so the display name cache was empty when the first guest alert was
created. This caused the initial notification to use the raw Proxmox
node name. Fix by seeding the cache from modelNodes (which are already
available) before guest polling starts.

Related to #1188
2026-02-04 14:29:49 +00:00
rcourtman
05266d9062 Show node display name in alerts instead of raw Proxmox node name
Alerts previously showed the raw Proxmox node name (e.g., "on pve") even
when users configured a display name (e.g., "SPACEX") via Settings or the
host agent --hostname flag. This affected the alert UI, email notifications,
and webhook payloads.

Add NodeDisplayName field to the alert chain: cache display names in the
alert Manager (populated by CheckNode/CheckHost on every poll), resolve
them at alert creation via preserveAlertState, refresh on metric updates,
and enrich at read time in GetActiveAlerts. Update models.Alert, the
syncAlertsToState conversion, email templates, Apprise body text, webhook
payloads, and all frontend rendering paths.

Related to #1188
2026-02-04 14:26:44 +00:00
rcourtman
5c18748742 Add SMART disk lifecycle monitoring with historical charts
Expand the smartctl collector to capture detailed SMART attributes (SATA
and NVMe), propagate them through the full data pipeline, persist them
as time-series metrics, and display them in an interactive disk detail
drawer with historical sparkline charts.

Backend: add SMARTAttributes struct, writeSMARTMetrics for persistent
storage, "disk" resource type in metrics API with live fallback.
Frontend: enhanced DiskList with Power-On column and SMART warnings,
new DiskDetail drawer matching NodeDrawer styling patterns, generic
HistoryChart metric support with proper tooltip formatting.
2026-02-04 13:35:40 +00:00
rcourtman
902bdd92c2 fix: prefer status-mem over status-freemem for VM memory calculation
Proxmox's FreeMem field reports free memory relative to the balloon's
guest-visible total (total_mem), not relative to MaxMem. When ballooning
is active and the VM's memory has been reduced, subtracting FreeMem from
MaxMem produces wildly inflated usage (e.g. 97% when actual usage is 20%).

Proxmox's Mem field is already calculated as (total_mem - free_mem),
giving the correct used bytes regardless of balloon state. Swap the
priority so Mem is checked before FreeMem.

Related to #1185
2026-02-04 12:08:33 +00:00
rcourtman
5a990dd554 Fix sparkline data inconsistency and support 30d range 2026-02-03 22:39:50 +00:00
rcourtman
2ebe65bbc5 security: add scope checks to AI Patrol and agent profile endpoints
- AI Patrol mutation endpoints (acknowledge, dismiss, suppress, snooze, resolve,
  findings/note, suppressions/*) now require ai:execute scope to prevent
  low-privilege tokens from blinding patrol by hiding/suppressing findings

- Agent profile admin endpoints (/api/admin/profiles/*) now require
  settings:write scope to prevent low-privilege tokens from modifying
  fleet-wide agent behavior
2026-02-03 19:29:56 +00:00
rcourtman
1733bea15c feat(ui): show backup permission warnings on Backups page
When PVE backup polling detects permission errors (403/401/permission
denied), track them per instance and surface them via the scheduler
health endpoint.

The Backups page now fetches instance warnings and displays a banner
when backup permission issues are detected, telling users exactly how
to fix the problem.

Related to #1139
2026-02-03 19:27:10 +00:00
rcourtman
c7f4030c29 fix(monitoring): prevent memory leak from stale metrics history and rate tracker entries
MetricsHistory.Cleanup() was defined but never called, and even if called,
it only removed old data points without deleting map entries for deleted
containers/VMs. Each stale entry leaked ~224KB (7 pre-allocated slices).

Changes:
- Call metricsHistory.Cleanup() and rateTracker.Cleanup() in maintenance loop
- Delete map entries entirely when all data points have expired
- Return nil instead of empty slice in cleanupMetrics() to release backing arrays
- Add Cleanup() method to RateTracker with 24-hour stale threshold
- Add debug logging to track cleanup activity

Related to #1153
2026-02-03 17:16:06 +00:00
rcourtman
4f40c3d751 fix: resolve critical stability and auth issues
- Fix data race in webhook notifications by removing shared state
- Fix duplicate monitors on config reload by stopping old instances
- Prevent metrics ID deletion on transient startup errors
- Support Bearer auth header for config export/import endpoints
2026-02-03 16:46:27 +00:00
rcourtman
aeca5e39fa Fix multi-tenant persistence and backend stability
- Initialize Alert and Notification managers with tenant-specific data directories

- Add panic recovery to WebSocket safeSend for stability

- Record host metrics to history for sparkline support
2026-02-03 16:24:42 +00:00
rcourtman
71f80c8a99 Fix: alert resolution now records incident timeline during quiet hours
- Fixed early return in handleAlertResolved that skipped incident recording
  when quiet hours suppressed recovery notifications
- Added Host Agent alert delay configuration (backend + UI)
- Host Agents now have dedicated time threshold settings like other resource types

Related to #1179
2026-02-03 12:49:41 +00:00
rcourtman
c8483f8116 Fix: PBS backup verification status not updating after cache populated
The PBS backup snapshot cache only compared BackupCount and LastBackup
timestamp to decide whether to re-fetch. When PBS verify jobs complete,
neither field changes — only the Verification field on individual
snapshots changes — so the cache served stale data indefinitely.

Add a 10-minute TTL per backup group so verification status changes are
picked up periodically. Also add panic recovery to PBS and PVE backup
goroutines, and use runtimeCtx for PBS backup polling to respect
monitor shutdown.

Closes #1174
2026-02-02 23:12:26 +00:00
rcourtman
95a0d7a6bd feat(backend): implement AI Patrol, Investigation, and system-wide refactors 2026-01-30 19:02:14 +00:00
rcourtman
70dbb495ad fix: address triage issues #1149, #1153, #1162, #1163
- #1163: Add node badges to storage resources in threshold tables
  (ResourceTable.tsx, ResourceCard.tsx)
- #1162: Fix PBS backup alerts showing datastore as node name
  (alerts.go - use "Unknown" for orphaned backups)
- #1153: Fix memory leaks in tracking maps
  - Add max 48 sample limit for pmgQuarantineHistory
  - Add max 10 entry limit for flappingHistory
  - Add cleanup for dockerUpdateFirstSeen
  - Add cleanupTrackingMaps() for auth, polling, and circuit breaker maps

Note: #1149 fix (chat sessions null check) is in AISettings.tsx
which has other pending changes - will be committed separately.
2026-01-26 22:21:10 +00:00
rcourtman
1e77763870 feat: improve monitoring and temperature handling
Temperature Monitoring:
- Enhance temperature collection and processing
- Add temperature tests

Monitor Improvements:
- Improve monitor reload handling
- Add reload tests

Test Coverage:
- Add Ceph monitoring tests
- Add Docker commands tests
- Add host agent temperature tests
- Add extra coverage tests
2026-01-24 22:43:31 +00:00
rcourtman
4c19fa3c1b fix: resolve btrfs disk summing (#1158), podman disable flag (#1151), and diagnostics path (#1155) 2026-01-23 19:24:38 +00:00
rcourtman
8963d69764 feat: add metrics store point limiting and mock improvements
- Add point limiting to metrics queries
- Improve mock metrics history for testing
- Add monitor enhancements
2026-01-22 22:29:56 +00:00
rcourtman
2e0da42a81 chore: reliability and maintenance improvements
Host agent:
- Add SHA256 checksum verification for downloaded binaries
- Verify checksum file matches expected bundle filename

WebSocket:
- Add write failure tracking with graceful disconnection
- Increase write deadline to 30s for large state payloads
- Better handling for slow clients (Raspberry Pi, slow networks)

Monitoring:
- Remove unused temperature proxy imports
- Add monitor polling improvements
- Expand test coverage

Other:
- Update package.json dependencies
- Fix generate-release-notes.sh path handling
- Minor reporting engine cleanup
2026-01-22 00:45:04 +00:00
rcourtman
7049f5b43c refactor: simplify temperature monitoring after sensor proxy removal
Remove proxy-related temperature code paths:
- temperature.go: remove proxy client integration and fallback logic
- config.go: remove SensorProxyEnabled and related config fields
- monitor.go: remove proxy client initialization and state

Temperature monitoring now relies solely on the unified agent approach.
2026-01-21 12:00:28 +00:00
rcourtman
ebc29b4fdb feat: show pending apt updates for Proxmox nodes (#1083)
- Add PendingUpdates and PendingUpdatesCheckedAt fields to Node model
- Add GetNodePendingUpdates method to Proxmox client (calls /nodes/{node}/apt/update)
- Add 30-minute polling cache to avoid excessive API calls
- Add pendingUpdates to frontend Node type
- Add color-coded badge in NodeSummaryTable (yellow: 1-9, orange: 10+)
- Update test stubs for interface compliance

Requires Sys.Audit permission on Proxmox API token to read apt updates.
2026-01-21 10:53:36 +00:00
rcourtman
204a9fe084 perf: Cache agent profiles to prevent disk I/O on every report. Related to #1094
GetHostAgentConfig was loading profiles and assignments from disk on
every agent report (every 10-30 seconds per host). With multiple hosts,
this caused disk I/O contention that eventually led to request timeouts.

Added in-memory caching with 60-second TTL:
- Fast path reads from cache without locks when valid
- Double-checked locking pattern for cache refresh
- Cache auto-invalidates after TTL, no manual invalidation needed
2026-01-17 22:31:02 +00:00
rcourtman
103eb9c3e0 feat(monitoring): auto-detect Docker inside LXC containers
Adds automatic Docker detection for Proxmox LXC containers:
- New HasDocker and DockerCheckedAt fields on Container model
- Docker socket check via connected agents on first run, restart, or start
- Parallel checking with timeouts for efficiency
- Caches results and only re-checks after state transitions

This enables the AI to know which LXC containers are Docker hosts
for better infrastructure guidance.
2026-01-17 14:42:52 +00:00
rcourtman
035436ad6e fix: add mutex to prevent concurrent map writes in Docker agent CPU tracking
The agent was crashing with 'fatal error: concurrent map writes' when
handleCheckUpdatesCommand spawned a goroutine that called collectOnce
concurrently with the main collection loop. Both code paths access
a.prevContainerCPU without synchronization.

Added a.cpuMu mutex to protect all accesses to prevContainerCPU in:
- pruneStaleCPUSamples()
- collectContainer() delete operation
- calculateContainerCPUPercent()

Related to #1063
2026-01-15 21:10:55 +00:00
rcourtman
9b49d3171d feat(pbs): add datastore exclusion to reduce PBS log noise
Users with removable/unmounted datastores (e.g., external HDDs for
offline backup) experienced excessive PBS log entries because Pulse
was querying all datastores including unavailable ones.

Added `excludeDatastores` field to PBS node configuration that accepts
patterns to exclude specific datastores from monitoring:
- Exact names: "exthdd1500gb"
- Prefix patterns: "ext*"
- Suffix patterns: "*hdd"
- Contains patterns: "*removable*"

Pattern matching is case-insensitive.

Fixes #1105
2026-01-14 12:26:18 +00:00
rcourtman
d389345153 fix(hosts): calculate Used memory from Total-Free for host agents in LXC
The previous fix (4090d981) addressed memory reporting for Docker agents
running in LXC containers, but the same issue also affects host agents.
When gopsutil runs inside an LXC, it can read Total and Free memory
from cgroup limits, but reports 0 for Used memory.

Added the same Total - Free fallback calculation to the host agent
processing path, which populates the Hosts tab.

Fixes #1075
2026-01-12 21:19:40 +00:00
rcourtman
4090d98160 fix(docker): calculate Used memory from Total-Free in Docker-in-LXC
When running Docker inside an LXC container, gopsutil can read the
Total memory (from cgroup limits) and Free memory correctly, but
returns 0 for Used memory. This caused the display to show "0B / 7GB"
even though memory was being used.

Added a fallback that calculates Used = Total - Free when Used is 0
but Total and Free are valid. This completes the fallback chain for
Docker-in-LXC memory reporting.

Fixes #1075
2026-01-12 14:04:38 +00:00
rcourtman
80444a9022 fix(monitor): use cluster quorum status instead of endpoint count for health
Previously, when some cluster endpoints were unreachable (e.g., backup
nodes intentionally offline), the cluster was marked as "degraded" even
though the Proxmox cluster itself was healthy and had quorum.

Now the connection health check queries the Proxmox cluster's actual
quorum status. A cluster is only marked "degraded" if it has lost
quorum (not enough votes for consensus), which is the actual indicator
of cluster instability.

This means:
- Cluster with quorum + some nodes offline = "healthy"
- Cluster without quorum = "degraded" (warning)
- All endpoints down = "error"

Fixes #1085
2026-01-11 11:54:02 +00:00
rcourtman
9cd79daa68 fix(hostagent): prevent data mixing when multiple nodes share hostname
When multiple PVE nodes have the same hostname (e.g., both named "pve"),
auto-linking would incorrectly link all host agents to the first matching
node, causing temperature and sensor data to be mixed/duplicated.

Changes:
- findLinkedProxmoxEntity now detects hostname collisions and refuses
  to auto-link, logging a warning instead
- Added manual link API endpoint (POST /api/agents/host/link) so users
  can explicitly link agents to the correct nodes
- Added State.LinkHostAgentToNode for bidirectional manual linking

Fixes #1081
2026-01-10 23:12:51 +00:00
rcourtman
a978693cb1 fix(dockeragent): use TotalMemoryBytes fallback for memory.Total
The previous fix (4ff9e58c) added a fallback for TotalMemoryBytes in
the agent when Docker's info.MemTotal returns 0 in LXC environments.
However, the server was not using TotalMemoryBytes to populate the
memory.Total field - it only used gopsutil's Memory.TotalBytes.

When gopsutil also fails to read memory in the LXC container, the
frontend would see memory.Total=0 and wouldn't fall back to
totalMemoryBytes due to JavaScript's nullish coalescing (??) only
triggering on null/undefined, not on 0.

This fix ensures the server uses TotalMemoryBytes as a fallback for
memory.Total when gopsutil returns 0, providing a complete fix chain:
1. Agent: Falls back to gopsutil when Docker returns 0
2. Server: Falls back to TotalMemoryBytes when gopsutil returns 0

Fixes #1075
2026-01-10 22:45:41 +00:00
rcourtman
1f4f0472b0 fix: use configured memory (MaxMem) instead of balloon for VM total
Previously, when memory ballooning was active on a VM, Pulse would use
the balloon value as the total memory instead of the configured MaxMem.
This caused confusing displays where a 4GB VM with 1GB balloon would
show "94% (966MB/1GB)" instead of "24% (966MB/4GB)".

The balloon value is still tracked in memory.balloon for the frontend's
yellow balloon marker visualization, but no longer replaces the total.

Fixes #1070
2026-01-10 15:37:45 +00:00
rcourtman
07b4765b8d fix: respect quiet hours for recovery notifications (#1068)
Recovery notifications were bypassing the quiet hours check, causing
users to receive recovery alerts during their configured quiet hours
window even though the original "down" alerts were suppressed.

- Add ShouldSuppressResolvedNotification() to alert manager
- Check quiet hours before sending recovery notifications in monitor
- Recovery notifications now follow same suppression rules as alerts
2026-01-09 21:47:36 +00:00
rcourtman
2a8f55d719 feat(enterprise): add Advanced Reporting and Audit Webhooks integration
This commit adds enterprise-grade reporting and audit capabilities:

Reporting:
- Refactored metrics store from internal/ to pkg/ for enterprise access
- Added pkg/reporting with shared interfaces for report generation
- Created API endpoint: GET /api/admin/reports/generate
- New ReportingPanel.tsx for PDF/CSV report configuration

Audit Webhooks:
- Extended pkg/audit with webhook URL management interface
- Added API endpoint: GET/POST /api/admin/webhooks/audit
- New AuditWebhookPanel.tsx for webhook configuration
- Updated Settings.tsx with Reporting and Webhooks tabs

Server Hardening:
- Enterprise hooks now execute outside mutex with panic recovery
- Removed dbPath from metrics Stats API to prevent path disclosure
- Added storage metrics persistence to polling loop

Documentation:
- Updated README.md feature table
- Updated docs/API.md with new endpoints
- Updated docs/PULSE_PRO.md with feature descriptions
- Updated docs/WEBHOOKS.md with audit webhooks section
2026-01-09 21:31:49 +00:00
rcourtman
d5c93fd226 fix: add cluster endpoint IP override and Windows agent download support
1. Add IPOverride field to ClusterEndpoint struct
   - Allows users to specify a custom IP that takes precedence over auto-discovered IPs
   - Fixes #929 and #1066 where Pulse used internal cluster IPs instead of management IPs
   - Added EffectiveIP() method to cleanly handle the override logic

2. Update connection code to use EffectiveIP()
   - monitor.go: Use override when building endpoint URLs
   - temperature_proxy.go: Use override for proxy connections

3. Add bare Windows EXE files to GitHub releases
   - Fixes #1064 where LXC/barebone installs couldn't download Windows agents
   - Modified build-release.sh to copy EXEs alongside ZIPs
   - Added EXEs to checksum generation
2026-01-08 23:04:25 +00:00
rcourtman
568aac6bd0 fix: multiple triage fixes for stability and correctness
1. Use correct mutex (diagMu) in cleanupDiagnosticSnapshots to prevent
   "concurrent map iteration and map write" panics (Fixes #1063)

2. Use cluster name for storage instance comparison in UpdateStorageForInstance
   to prevent storage duplication in clustered Proxmox setups (Fixes #1062)

3. Fix KUBECONFIG unbound variable error in install.sh by using ${KUBECONFIG:-}
   default parameter expansion (Fixes #1065)
2026-01-08 22:54:33 +00:00
rcourtman
7db6b3e47d feat: Add AI chat session sync across devices
Implements server-side persistence for AI chat sessions, allowing users
to continue conversations across devices and browser sessions. Related
to #1059.

Backend:
- Add chat session CRUD API endpoints (GET/PUT/DELETE)
- Add persistence layer with per-user session storage
- Support session cleanup for old sessions (90 days)
- Multi-user support via auth context

Frontend:
- Rewrite aiChat store with server sync (debounced)
- Add session management UI (new conversation, switch, delete)
- Local storage as fallback/cache
- Initialize sync on app startup when AI is enabled
2026-01-08 10:47:45 +00:00
rcourtman
d0191d136f fix: Add configurable poll timeout and handle external Ceph storage
Changes:
1. Add MAX_POLL_TIMEOUT env var for large Proxmox clusters that need
   more than 3 minutes for polling (default: 3m, minimum: 30s)
2. Handle external Ceph storage gracefully - don't mark nodes unhealthy
   when Proxmox returns 'binary not installed' (e.g., for Ceph not
   managed by Proxmox)

Related to #965
2026-01-05 23:34:33 +00:00
rcourtman
43b5fad12c fix: Add main host URL as fallback for remote cluster access
When a Proxmox cluster is discovered, Pulse now includes the user-provided
main host URL as a fallback endpoint. This handles scenarios where Proxmox
reports internal IPs that aren't reachable from Pulse's network (e.g.,
monitoring a remote cluster across different networks).

Previously, if all cluster endpoint IPs were unreachable, the connection
would fail with no fallback. Now the ClusterClient will fall back to the
main host URL, allowing Proxmox to route API calls internally.

Related to #1028
2026-01-04 14:54:03 +00:00
rcourtman
90cce6d51b test(monitoring): fix failing snapshot tests and improve coverage
- Fix TestMonitor_PollGuestSnapshots_Coverage by correctly initializing State ID fields
- Improve PBS client to handle alternative datastore metric fields (total-space, etc.)
- Add comprehensive test coverage for PBS polling, auth failures, and datastore metrics
- Add various coverage tests for monitoring, alerts, and metadata handling
- Refactor Monitor to support better testing of client creation and auth handling
2026-01-04 10:29:40 +00:00
rcourtman
b039b79e4a fix: Physical disk temps showing 0°C when using host agent SMART data
The mergeNVMeTempsIntoDisks and mergeHostAgentSMARTIntoDisks functions
require nodes to have LinkedHostAgentID populated to match disks with
host agent SMART data. However, the code was passing the local modelNodes
variable which doesn't have this field set - the linking happens inside
UpdateNodesForInstance which modifies the state's copy, not the local var.

Fixed by using currentState.Nodes (from GetSnapshot()) instead of
modelNodes/modelNodesCopy in both the skip-poll path and the background
goroutine. The state snapshot contains nodes with LinkedHostAgentID
already populated, allowing proper SMART data merging.

Related to #1014
2026-01-03 19:20:31 +00:00
rcourtman
ed78509f92 Fix flaky tests and improve coverage across alerts, api, and config packages
- Fix deadlock and race conditions in internal/alerts
- Add comprehensive error path tests for internal/config
- Fix 401 handling in internal/api
- Fix Docker Swarm task filtering test logic
2026-01-03 18:36:17 +00:00
rcourtman
4ed03f23c2 fix: use Instance field for backup/snapshot state sync instead of ID prefix
This resolves issues where snapshots/backups persist after deletion if the
Instance field didn't match the ID prefix (due to case changes, name changes, etc).

Now consistent with how VMs, Containers, Storage, etc. are filtered.

Also adds Instance field to BackupTask model for completeness.

Addresses #1009 (refs #991)
2026-01-01 23:22:38 +00:00
rcourtman
a4c3295c1a fix: Ensure AI commands toggle remains stable vs agent reports. Related to #952 2026-01-01 15:30:27 +00:00
rcourtman
3b201b4a88 fix: Data race in pollGuestSnapshots accessing state without proper lock
pollGuestSnapshots was reading m.state.VMs and m.state.Containers while
only holding the Monitor's mutex (m.mu), not the State's internal mutex.
This caused a data race where VMs/containers could be modified by another
goroutine while being read, leading to stale or missing snapshot data.

Symptoms: Deleted snapshots persisting in UI, new snapshots not appearing,
only fixable by service restart.

Fix: Use GetSnapshot() which properly acquires State's mutex and returns
a consistent copy of the data.

Related to #991
2026-01-01 14:39:10 +00:00
rcourtman
065a59316f fix(alerts): respect per-guest backup and snapshot overrides (fixes #961) 2025-12-30 00:28:05 +00:00
rcourtman
5ad1f5e847 feat: Merge linked host agent SMART temps into Physical Disks
When a host agent is running on a Proxmox node (linked host agent),
merge the agent's SMART disk temperature data into the Physical Disks
view for that node. This allows disk temps collected by pulse-agent
to populate the Physical Disks page without requiring Proxmox SMART
monitoring to be enabled.

Matching is done by WWN (most reliable), serial number, or device path.

Closes part of issue #909 (follow-up from MichiFr)
2025-12-29 15:39:20 +00:00
rcourtman
fd1f94babf fix: AI Commands toggle now updates immediately in UI. Related to #952
Previously, toggling AI Commands in the Agents view would show a pending state
and wait for the agent to confirm the change (up to 2 minutes). If the agent
was slow to report or the WebSocket update was missed, the toggle would appear
stuck.

Now, UpdateHostAgentConfig also updates the Host model in state immediately,
providing instant UI feedback. The agent will still receive the config on its
next report, but users see the change right away.

Added SetHostCommandsEnabled function to models.State for this purpose.
2025-12-29 13:56:29 +00:00
rcourtman
32111c7837 feat: Add --report-ip flag for multi-NIC systems (issue #945)
Allows specifying which IP address the agent should report, useful for:
- Multi-homed systems with separate management networks
- Systems with private monitoring interfaces
- VPN/overlay network scenarios

Usage:
  pulse-agent --report-ip 192.168.1.100
  PULSE_REPORT_IP=192.168.1.100 pulse-agent
2025-12-29 09:28:28 +00:00
rcourtman
b50872b686 feat: Implement unified update detection system (Phase 1)
Docker container image update detection with full stack implementation:

Backend:
- Add internal/updatedetection package with types, store, registry checker, manager
- Add registry checking to Docker agent (internal/dockeragent/registry.go)
- Add ImageDigest and UpdateStatus fields to container reports
- Add /api/infra-updates API endpoints for querying updates
- Integrate with alert system - fires after 24h of pending updates

Frontend:
- Add UpdateBadge and UpdateIcon components for update indicators
- Add updateStatus to DockerContainer TypeScript interface
- Display blue update badges in Docker unified table image column
- Add 'has:update' search filter support

Features:
- Registry digest comparison for Docker Hub, GHCR, private registries
- Auth token handling for Docker Hub public images
- Caching with 6h TTL (15min for errors)
- Configurable alert delay via UpdateAlertDelayHours (default: 24h)
- Alert metadata includes digests, pending time, image info
2025-12-27 17:58:38 +00:00