Commit graph

67 commits

Author SHA1 Message Date
rcourtman
d4242d9a13 Fix ZFS pool attachment in storage frontend (discussion #1351) 2026-03-27 14:59:52 +00:00
rcourtman
42a84fc5ca Fix reload-driven PVE host linking consistency (#1269) 2026-03-26 09:01:23 +00:00
rcourtman
40249947ed Fix template backup orphan detection race (#1352) 2026-03-25 10:36:33 +00:00
rcourtman
2fe22c3308 fix(backups): prevent template backups from being flagged as orphaned
Some checks failed
Build and Test / Secret Scan (push) Failing after 5s
Build and Test / Frontend & Backend (push) Failing after 1m8s
Core E2E Tests / Playwright Core E2E (push) Failing after 4m38s
Proxmox VM/LXC templates are intentionally excluded from the monitored
guest list, but their backup files exist on storage. The orphan-detection
logic was firing for every template backup because the VMID was never
in the guest lookup maps.

Fix: track template VMID→node pairs in State.templateVMIDs (unexported,
not serialised to API/frontend) during the resources poll loop, expose
via StateSnapshot.TemplateVMIDs, and use in both buildGuestLookups() and
the storage backup node-resolution map so orphan detection treats template
backups as valid. Also preserves the template map through the cluster
health grace-period path (zero-resource preservation), the partial-node
grace-period path, and clears it on instance removal.

Closes #1352
2026-03-17 09:04:22 +00:00
rcourtman
caff845c1a fix(ui): use Proxmox tag colours from datacenter config
Pulse was generating tag colours from a hash of the tag name instead
of using the colours configured in Proxmox. Now polls /cluster/options
once per PVE instance and merges the tag-style colour map into state,
which the frontend uses as the first-priority colour source for tag
badges. Falls back to the existing special-tag and hash-based colours
when Proxmox hasn't set a custom colour for a tag.
2026-03-15 19:49:46 +00:00
rcourtman
7dab977d91 Add split memory bar showing Used | Cache | Free segments (#1302)
Show reclaimable buff/cache as a distinct amber segment between used
(green) and free (gray) in the memory bar. This explains why Pulse's
memory percentage differs from Proxmox: Pulse reports cache-aware
usage (MemAvailable) while Proxmox includes cache as used (Total-Free).

Backend: add Cache field to Memory model, derived from MemInfo
(Available - Free). Only uses MemInfo.Free (not FreeMem fallback) to
avoid inflating cache by the balloon gap on ballooned VMs.

Frontend: StackedMemoryBar renders three segments with tooltip
breakdown. Tooltip Free accounts for balloon limit when active.
Percentage label and alerts remain cache-aware (unchanged).
2026-03-10 10:16:14 +00:00
rcourtman
a4b0771974 Prevent removed host agents from resurrecting via in-flight reports (#1331)
Host agents removed from the UI would reappear on the next report cycle
because there was no rejection mechanism — unlike Docker agents which
already had resurrection prevention. Mirror the Docker agent pattern:

- Track removed host IDs in a `removedHosts` map with 24hr TTL
- Persist removal records in `State.RemovedHosts` for frontend display
- Reject reports from removed hosts in `ApplyHostReport()`
- Add `AllowHostReenroll()` + API route to clear the block
- Show removed host agents in the Settings UI with "Allow re-enroll"
- Sync removed-agent maps from state on startup for all agent types
- Fix mock integration snapshot missing `RemovedDockerHosts` field
2026-03-09 17:52:34 +00:00
rcourtman
499ab812e3 Fix post-release regressions and lock v5 to single-tenant runtime 2026-03-05 23:46:35 +00:00
rcourtman
a4571f580b fix(monitoring): harden VM memory selection and flag repeated VM usage 2026-03-03 16:19:17 +00:00
rcourtman
8c7d507ea4 fix(alerts): make --disk-exclude suppress Proxmox SSD wear/health alerts (#1142)
The --disk-exclude agent flag only filtered local metric collection but
had no effect on server-side Proxmox disk health and SSD wearout alerts,
which poll the Proxmox API directly. Users excluding disks (e.g.
--disk-exclude sda) still received alerts for those disks.

Agent now sends its DiskExclude patterns in each report. The server
stores them on the Host model and consults them during Proxmox disk
polling — excluded disks get a synthetic healthy status passed to
CheckDiskHealth so any existing alerts clear immediately.

Also adds FreeBSD pseudo-filesystem types (fdescfs, devfs, linprocfs,
linsysfs) to the virtual FS filter and /var/run/ to special mount
prefixes, fixing false disk-full alerts on FreeBSD for fdescfs mounts.
2026-02-20 13:31:52 +00:00
rcourtman
2735204638 fix: skip ambiguous shared-storage backups when VMID exists on multiple instances
When two standalone (non-clustered) PVE hosts share the same storage (NFS,
etc.), both instances see the same backup files during polling. Each instance
creates its own StorageBackup entry, causing guests with the same VMID on
different hosts to incorrectly show each other's backups.

Detect shared-storage duplicates by checking if the same volid appears across
multiple instances. When it does AND the VMID is ambiguous (exists on multiple
instances), skip the backup in SyncGuestBackupTimes rather than guessing which
instance owns it. This uses the same ambiguity pattern already applied to PBS
backups.

Fixes #1177
2026-02-11 11:07:28 +00:00
rcourtman
c92ccc122e fix(state): deduplicate PVE nodes and AI mention resources (#1217, #1214)
Backend: nodes with the same logical identity (cluster+name) are merged
using a health-weighted preference, preserving host-agent links across
node-ID churn.

Frontend: extract buildMentionResources() with alias-based dedup so
docker hosts and standalone host agents sharing an ID/hostname appear
once in the @ mention autocomplete.
2026-02-09 22:19:55 +00:00
rcourtman
8a48acef1d fix: hotfix 5.1.5 — node duplication, alert scrambling, ntfy resolved formatting
- fix(models): filter nodes by instance in UpdateNodesForInstance to prevent
  PVE node duplication across poll cycles (#1214, #1192, #1217)
- fix(alerts): sort GetActiveAlerts output for stable ordering, preventing
  hostname scrambling in frontend (#1218)
- fix(notifications): add ntfy-specific resolved webhook formatting with
  plain-text body and proper headers (#1213)
- fix(frontend): respect "hide Docker update actions" setting in
  DockerFilter Update All button (#1219)
- fix(frontend): add missing v prefix to GitHub release tag URLs (#1195)
- fix(monitoring): reduce disk detection warning from Warn to Debug to
  eliminate log spam for pass-through disks (#1216)
- chore: bump VERSION to 5.1.5
2026-02-08 11:48:22 +00:00
rcourtman
05266d9062 Show node display name in alerts instead of raw Proxmox node name
Alerts previously showed the raw Proxmox node name (e.g., "on pve") even
when users configured a display name (e.g., "SPACEX") via Settings or the
host agent --hostname flag. This affected the alert UI, email notifications,
and webhook payloads.

Add NodeDisplayName field to the alert chain: cache display names in the
alert Manager (populated by CheckNode/CheckHost on every poll), resolve
them at alert creation via preserveAlertState, refresh on metric updates,
and enrich at read time in GetActiveAlerts. Update models.Alert, the
syncAlertsToState conversion, email templates, Apprise body text, webhook
payloads, and all frontend rendering paths.

Related to #1188
2026-02-04 14:26:44 +00:00
rcourtman
5c18748742 Add SMART disk lifecycle monitoring with historical charts
Expand the smartctl collector to capture detailed SMART attributes (SATA
and NVMe), propagate them through the full data pipeline, persist them
as time-series metrics, and display them in an interactive disk detail
drawer with historical sparkline charts.

Backend: add SMARTAttributes struct, writeSMARTMetrics for persistent
storage, "disk" resource type in metrics API with live fallback.
Frontend: enhanced DiskList with Power-On column and SMART warnings,
new DiskDetail drawer matching NodeDrawer styling patterns, generic
HistoryChart metric support with proper tooltip formatting.
2026-02-04 13:35:40 +00:00
rcourtman
dcfa8cf0ba fix: prevent false PBS backup indicators when VMIDs collide across PVE instances (#1177)
When namespace matching fails, the VMID-only fallback now checks whether
the VMID appears on multiple PVE instances. If ambiguous, the fallback
is skipped — preventing backups from being falsely attributed to the
wrong guest. Unique VMIDs still fall back as before.
2026-02-04 10:11:35 +00:00
rcourtman
19a67dd4f3 Update core infrastructure components
Config:
- AI configuration improvements
- API tokens handling
- Persistence layer updates

Host Agent:
- Command execution improvements
- Better test coverage

Infrastructure Discovery:
- Service improvements
- Enhanced test coverage

Models:
- State snapshot updates
- Model improvements

Monitoring:
- Polling improvements
- Guest config handling
- Storage config support

WebSocket:
- Hub tenant test updates

Service Discovery:
- New service discovery module
2026-01-28 16:52:35 +00:00
rcourtman
ebc29b4fdb feat: show pending apt updates for Proxmox nodes (#1083)
- Add PendingUpdates and PendingUpdatesCheckedAt fields to Node model
- Add GetNodePendingUpdates method to Proxmox client (calls /nodes/{node}/apt/update)
- Add 30-minute polling cache to avoid excessive API calls
- Add pendingUpdates to frontend Node type
- Add color-coded badge in NodeSummaryTable (yellow: 1-9, orange: 10+)
- Update test stubs for interface compliance

Requires Sys.Audit permission on Proxmox API token to read apt updates.
2026-01-21 10:53:36 +00:00
rcourtman
103eb9c3e0 feat(monitoring): auto-detect Docker inside LXC containers
Adds automatic Docker detection for Proxmox LXC containers:
- New HasDocker and DockerCheckedAt fields on Container model
- Docker socket check via connected agents on first run, restart, or start
- Parallel checking with timeouts for efficiency
- Caches results and only re-checks after state transitions

This enables the AI to know which LXC containers are Docker hosts
for better infrastructure guidance.
2026-01-17 14:42:52 +00:00
rcourtman
1dda538265 fix(models): extend namespace disambiguation to SyncGuestBackupTimes (#1095)
The previous commit fixed namespace disambiguation for backup alerts,
but the Overview display uses SyncGuestBackupTimes to populate backup
timestamps on VMs/Containers. This commit extends the same namespace
matching logic to that function.

Also tightened the matching algorithm to use suffix matching instead
of substring matching, preventing false positives like "pve" matching
"pve-nat".
2026-01-12 15:11:59 +00:00
rcourtman
9cd79daa68 fix(hostagent): prevent data mixing when multiple nodes share hostname
When multiple PVE nodes have the same hostname (e.g., both named "pve"),
auto-linking would incorrectly link all host agents to the first matching
node, causing temperature and sensor data to be mixed/duplicated.

Changes:
- findLinkedProxmoxEntity now detects hostname collisions and refuses
  to auto-link, logging a warning instead
- Added manual link API endpoint (POST /api/agents/host/link) so users
  can explicitly link agents to the correct nodes
- Added State.LinkHostAgentToNode for bidirectional manual linking

Fixes #1081
2026-01-10 23:12:51 +00:00
rcourtman
4ed03f23c2 fix: use Instance field for backup/snapshot state sync instead of ID prefix
This resolves issues where snapshots/backups persist after deletion if the
Instance field didn't match the ID prefix (due to case changes, name changes, etc).

Now consistent with how VMs, Containers, Storage, etc. are filtered.

Also adds Instance field to BackupTask model for completeness.

Addresses #1009 (refs #991)
2026-01-01 23:22:38 +00:00
rcourtman
3fdf753a5b Enhance devcontainer and CI workflows
- Add persistent volume mounts for Go/npm caches (faster rebuilds)
- Add shell config with helpful aliases and custom prompt
- Add comprehensive devcontainer documentation
- Add pre-commit hooks for Go formatting and linting
- Use go-version-file in CI workflows instead of hardcoded versions
- Simplify docker compose commands with --wait flag
- Add gitignore entries for devcontainer auth files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 22:29:15 +00:00
rcourtman
fd1f94babf fix: AI Commands toggle now updates immediately in UI. Related to #952
Previously, toggling AI Commands in the Agents view would show a pending state
and wait for the agent to confirm the change (up to 2 minutes). If the agent
was slow to report or the WebSocket update was missed, the toggle would appear
stuck.

Now, UpdateHostAgentConfig also updates the Host model in state immediately,
providing instant UI feedback. The agent will still receive the config on its
next report, but users see the change right away.

Added SetHostCommandsEnabled function to models.State for this purpose.
2025-12-29 13:56:29 +00:00
rcourtman
32111c7837 feat: Add --report-ip flag for multi-NIC systems (issue #945)
Allows specifying which IP address the agent should report, useful for:
- Multi-homed systems with separate management networks
- Systems with private monitoring interfaces
- VPN/overlay network scenarios

Usage:
  pulse-agent --report-ip 192.168.1.100
  PULSE_REPORT_IP=192.168.1.100 pulse-agent
2025-12-29 09:28:28 +00:00
rcourtman
b50872b686 feat: Implement unified update detection system (Phase 1)
Docker container image update detection with full stack implementation:

Backend:
- Add internal/updatedetection package with types, store, registry checker, manager
- Add registry checking to Docker agent (internal/dockeragent/registry.go)
- Add ImageDigest and UpdateStatus fields to container reports
- Add /api/infra-updates API endpoints for querying updates
- Integrate with alert system - fires after 24h of pending updates

Frontend:
- Add UpdateBadge and UpdateIcon components for update indicators
- Add updateStatus to DockerContainer TypeScript interface
- Display blue update badges in Docker unified table image column
- Add 'has:update' search filter support

Features:
- Registry digest comparison for Docker Hub, GHCR, private registries
- Auth token handling for Docker Hub public images
- Caching with 6h TTL (15min for errors)
- Configurable alert delay via UpdateAlertDelayHours (default: 24h)
- Alert metadata includes digests, pending time, image info
2025-12-27 17:58:38 +00:00
rcourtman
b27b76ae46 feat: implement agent self-unregistration and UI improvements
- Add DELETE /api/agents/unregister endpoint for agent self-unregistration
- Agent now unregisters itself from Pulse server when uninstalled
- Add clarifying note in UnifiedAgents explaining linked agents behavior
- Linked agents are managed via their PVE node but this is now explained in UI
- Add LastSeen field to HostAgent model for better agent status tracking
2025-12-26 23:20:55 +00:00
rcourtman
4a7306f6b8 fix: Auto-clear stale LinkedHostAgentID references during node updates
When nodes are updated, now validates that LinkedHostAgentID points to
an existing host agent. References to deleted host agents are automatically
cleared, fixing the 'Agent' tag persistence for users who removed agent
entries before commit c394d24.

Related to #920
2025-12-26 19:45:31 +00:00
rcourtman
cf577e715f fix: Clear node host agent link when agent is removed
When a host agent is deleted via the UI, the LinkedHostAgentID on any
PVE nodes that were linked to it was not being cleared. This caused
the "Agent" tag to persist in the UI after uninstalling the agent.

Related to #920
2025-12-26 17:52:32 +00:00
rcourtman
8f9d5c1120 feat: Agent collects S.M.A.R.T. disk data via smartctl. Related to #907
- Add smartctl package to collect disk temperature and health data
- Add SMART field to agent Sensors struct
- Host agent now runs smartctl to collect disk temps when available
- Backend processes agent SMART data for temperature display
- Graceful fallback when smartctl not installed
2025-12-25 11:37:53 +00:00
rcourtman
598285d3d2 feat: Agent reports CommandsEnabled status to server. Related to #903
- Add CommandsEnabled field to AgentInfo in pkg/agents/host/report.go
- Agent now reports whether AI command execution is enabled
- Server stores and exposes this via Host model
- Frontend can now show which agents have commands enabled
- This provides visibility before implementing remote configuration
2025-12-25 07:55:22 +00:00
rcourtman
e4732af0f5 fix: use configured Guest URLs for PVE/PBS/PMG navigation (#870)
- Fix PVE nodes: buildNodeUrl in ProxmoxNodesSection.tsx now prioritizes
  guestURL over host (was ignoring guestURL entirely)
- Add PBS support: GuestURL field added to PBSInstance config, model,
  and API handlers
- Add PMG support: GuestURL field added to PMGInstance config, model,
  and API handlers
- Update NodeSummaryTable to use guestURL for PBS nodes
- Frontend types updated for PBS/PMG guestURL support

The Guest URL setting in node configuration now works correctly across
all node types. When set, it takes priority over the Host URL when
clicking on node names to navigate to the Proxmox/PBS/PMG web UI.

Closes #870
2025-12-22 22:05:25 +00:00
rcourtman
07c993bfe8 fix: backup matching uses instance+VMID to prevent cross-instance collisions
Previously, SyncGuestBackupTimes matched backups to guests using only VMID.
This caused newly created containers to incorrectly show old backup times
from different containers on other Proxmox instances that happened to have
the same VMID.

Now uses composite key (instance+VMID) for PVE storage backups to ensure
proper isolation. PBS backups still use VMID matching (since they aggregate
from multiple sources) but only as a fallback.

Fixes issue where ollama LXC showed 'last backup 3 months ago' despite
being created yesterday.
2025-12-16 22:19:19 +00:00
rcourtman
397871629c fix: cluster-aware guest deduplication and multi-agent token binding
- Add cluster-aware guest ID generation (clusterName-VMID instead of instanceName-VMID)
  to prevent duplicate VMs/containers when multiple cluster nodes are monitored

- Add cluster deduplication at registration time - when a node is added that belongs
  to an already-configured cluster, merge as endpoint instead of creating duplicate

- Add startup consolidation to automatically merge duplicate cluster instances

- Change host agent token binding from agent GUID to hostname, allowing:
  - Multiple host agents to share a token (each bound by hostname)
  - Agent reinstalls on same host without token conflicts

- Remove 12-character password minimum requirement

- Remove emoji from auto-registration success message

- Fix grouped view node lookup to support both cluster-aware node IDs
  (clusterName-nodeName) and legacy guest grouping keys (instance-nodeName)

Fixes duplicate guests appearing when agents are installed on multiple
cluster nodes. Also improves multi-agent UX by allowing shared tokens.
2025-12-14 10:16:17 +00:00
rcourtman
5e2939b6bd feat: link host agents to PVE nodes by hostname to prevent duplication
When a host agent registers, it now searches for a PVE node with a
matching hostname and links them together. Similarly, when PVE nodes
are discovered, they check for existing host agents with matching hostnames.

This prevents the confusion of seeing duplicate entries when users install
agents on PVE cluster nodes that were already discovered via the cluster API.

- Added LinkedHostAgentID field to Node struct
- Added LinkedNodeID/LinkedVMID/LinkedContainerID fields to Host struct
- Added findLinkedProxmoxEntity() to match by hostname (with domain stripping)
- Updated UpdateNodesForInstance() to preserve and auto-set links
2025-12-13 23:14:00 +00:00
rcourtman
8919281718 fix: clear agents that connected during unauthenticated setup window
When no auth is configured (fresh install), CheckAuth allows all requests.
This creates a race condition where existing agents from a previous setup
can report data before the wizard completes security configuration.

This fix clears all host agents and docker hosts when /api/security/quick-setup
is called, ensuring the wizard shows a clean state after security is configured.

Added:
- State.ClearAllHosts() - removes all host agents
- State.ClearAllDockerHosts() - removes all docker hosts
- Monitor.ClearUnauthenticatedAgents() - clears both and resets token bindings
- Call to ClearUnauthenticatedAgents() in handleQuickSecuritySetupFixed()
2025-12-13 21:22:04 +00:00
rcourtman
a259b67348 feat: add Kubernetes platform support 2025-12-12 21:31:11 +00:00
rcourtman
e55df08dab feat: Add Proxmox 9.1+ OCI container support
- Backend: Add IsOCI and OSTemplate fields to Container model
- Backend: Add extractContainerOSTemplate() and isOCITemplate() detection functions
- Backend: Detect OCI containers via ostemplate config and set type to 'oci'
- Frontend: Add isOci and osTemplate to Container interface
- Frontend: Add 'oci-container' to ResourceType with distinct purple badge
- Frontend: Update Dashboard filters to include OCI containers with LXC
- Tests: Add comprehensive unit tests for OCI detection logic

OCI containers are detected by checking the ostemplate for patterns like:
- oci: prefix (e.g., oci:docker.io/library/alpine:latest)
- docker: prefix (e.g., docker:nginx:latest)
- Known registry URLs (docker.io, ghcr.io, gcr.io, quay.io, etc.)
- Local templates with oci- or oci_ filename patterns
2025-12-12 17:51:43 +00:00
rcourtman
927ac76bad feat: AI integration, Docker metrics, RAID display, and infrastructure improvements
- Add Claude OAuth authentication support with hybrid API key/OAuth flow
- Implement Docker container historical metrics in backend and charts API
- Add CEPH cluster data collection and new Ceph page
- Enhance RAID status display with detailed tooltips and visual indicators
- Fix host deduplication logic with Docker bridge IP filtering
- Fix NVMe temperature collection in host agent
- Add comprehensive test coverage for new features
- Improve frontend sparklines and metrics history handling
- Fix navigation issues and frontend reload loops
2025-12-09 09:29:27 +00:00
rcourtman
472a86dcdb feat: Add OS type display for LXC containers
- Extract ostype from LXC container config (debian, ubuntu, alpine, etc.)
- Map ostype values to human-readable names (e.g., "debian" -> "Debian")
- Add OSName field to Container model and ContainerFrontend
- Add icons for NixOS, openSUSE, and Gentoo in frontend
- LXC containers now show OS icons alongside VMs in the dashboard

Supported LXC OS types: alpine, archlinux, centos, debian, devuan,
fedora, gentoo, nixos, opensuse, ubuntu, unmanaged
2025-12-05 12:43:32 +00:00
rcourtman
8948e84fe5 feat: AI features, agent improvements, and host monitoring enhancements
AI Chat Integration:
- Multi-provider support (Anthropic, OpenAI, Ollama)
- Streaming responses with markdown rendering
- Agent command execution for remote troubleshooting
- Context-aware conversations with host/container metadata

Agent Updates:
- Add --enable-proxmox flag for automatic PVE/PBS token setup
- Improve auto-update with semver comparison (prevents downgrades)
- Add updatedFrom tracking to report previous version after update
- Reduce initial update check delay from 30s to 5s
- Add agent version column to Hosts page table

Host Metrics:
- Add DiskIO stats collection (read/write bytes, ops, time)
- Improve disk filtering to exclude Docker overlay mounts
- Add RAID array monitoring via mdadm
- Enhanced temperature sensor parsing

Frontend:
- New Agent Version column on Hosts overview table
- Improved node modal with agent-first installation flow
- Add DiskIO display in host drawer
- Better responsive handling for metric bars
2025-12-05 10:37:02 +00:00
rcourtman
49f71015c8 Fix backup indicator being reset when VMs/Containers are re-polled
UpdateVMsForInstance and UpdateContainersForInstance were replacing
guest data without preserving the LastBackup field that was populated
by SyncGuestBackupTimes. This caused backup indicators to always show
"no backup found" since the LastBackup would be wiped every time
guests were polled (which happens more frequently than backup polling).

Now both functions preserve LastBackup from existing data when the
incoming guest data has a zero value.

Related to #762
2025-12-02 00:12:31 +00:00
rcourtman
8361042ada Fix backup status indicator not showing for guests
The backup status indicator feature was incomplete - it added the UI
component but never populated VM/Container LastBackup from actual
backup data. Now SyncGuestBackupTimes() is called after storage
backups and PBS backups are polled, matching each guest's VMID to
its most recent backup timestamp.

Fixes #786
2025-11-30 22:13:46 +00:00
courtmanr@gmail.com
1716774e71 feat: adaptive node table layout, guest row fixes, and legacy agent detection
- Implemented adaptive layout for NodeSummaryTable with responsive columns and sticky name column.
- Fixed GuestRow background display issues.
- Added IsLegacy field to Host and DockerHost models to flag legacy agents (version < 1.0.0).
- Updated monitor to populate IsLegacy based on agent version.
2025-11-25 17:19:36 +00:00
rcourtman
bb7ca93c18 feat: Add mdadm RAID monitoring support for host agents
Implements comprehensive mdadm RAID array monitoring for Linux hosts
via pulse-host-agent. Arrays are automatically detected and monitored
with real-time status updates, rebuild progress tracking, and automatic
alerting for degraded or failed arrays.

Key changes:

**Backend:**
- Add mdadm package for parsing mdadm --detail output
- Extend host agent report structure with RAID array data
- Integrate mdadm collection into host agent (Linux-only, best-effort)
- Add RAID array processing in monitoring system
- Implement automatic alerting:
  - Critical alerts for degraded arrays or arrays with failed devices
  - Warning alerts for rebuilding/resyncing arrays with progress tracking
  - Auto-clear alerts when arrays return to healthy state

**Frontend:**
- Add TypeScript types for RAID arrays and devices
- Display RAID arrays in host details drawer with:
  - Array status (clean/degraded/recovering) with color-coded indicators
  - Device counts (active/total/failed/spare)
  - Rebuild progress percentage and speed when applicable
  - Green for healthy, amber for rebuilding, red for degraded

**Documentation:**
- Document mdadm monitoring feature in HOST_AGENT.md
- Explain requirements (Linux, mdadm installed, root access)
- Clarify scope (software RAID only, hardware RAID not supported)

**Testing:**
- Add comprehensive tests for mdadm output parsing
- Test parsing of healthy, degraded, and rebuilding arrays
- Verify proper extraction of device states and rebuild progress

All builds pass successfully. RAID monitoring is automatic and best-effort
- if mdadm is not installed or no arrays exist, host agent continues
reporting other metrics normally.

Related to #676
2025-11-09 16:36:33 +00:00
rcourtman
7ee252bd84 Fix Docker host display bug when multiple agents share API tokens (related to #658)
Root cause: findMatchingDockerHost() was matching hosts by token ID alone,
causing multiple Docker agents using the same API token to overwrite each
other in state. This resulted in only N visible hosts (where N = number of
unique tokens) instead of all M agents, with hosts "rotating" as each agent
reported every 10 seconds.

Example: 4 agents using 2 tokens would show only 2 hosts, rotating between
agents 1↔2 (token A) and agents 3↔4 (token B).

Fix: Remove token-only matching from findMatchingDockerHost(). Hosts should
only match by:
1. Agent ID (unique per agent)
2. Machine ID + hostname combination (with optional token validation)
3. Machine ID or hostname alone (only for tokenless agents)

This allows multiple agents to share the same API token without colliding.

Additional fix: UpsertDockerHost() now preserves Hidden, PendingUninstall,
and Command fields from existing hosts, preventing these flags from being
reset to defaults on every agent report.
2025-11-07 13:46:35 +00:00
rcourtman
2a79d57f73 Add SMART temperature collection for physical disks (related to #652)
Extends temperature monitoring to collect SMART temps for SATA/SAS disks,
addressing issue #652 where physical disk temperatures showed as empty.

Architecture:
- Deploys pulse-sensor-wrapper.sh as SSH forced command on Proxmox nodes
- Wrapper collects both CPU/GPU temps (sensors -j) and disk temps (smartctl)
- Implements 30-min cache with background refresh to avoid performance impact
- Uses smartctl -n standby,after to skip sleeping drives without waking them
- Returns unified JSON: {sensors: {...}, smart: [...]}

Backend changes:
- Add DiskTemp model with device, serial, WWN, temperature, lastUpdated
- Extend Temperature model with SMART []DiskTemp field and HasSMART flag
- Add WWN field to PhysicalDisk for reliable disk matching
- Update parseSensorsJSON to handle both legacy and new wrapper formats
- Rewrite mergeNVMeTempsIntoDisks to match SMART temps by WWN → serial → devpath
- Preserve legacy NVMe temperature support for backward compatibility

Performance considerations:
- SMART data cached for 30 minutes per node to avoid excessive smartctl calls
- Background refresh prevents blocking temperature requests
- Respects drive standby state to avoid spinning up idle arrays
- Staggered disk scanning with 0.1s delay to avoid saturating SATA controllers

Install script:
- Deploys wrapper to /usr/local/bin/pulse-sensor-wrapper.sh
- Updates SSH forced command from "sensors -j" to wrapper script
- Backward compatible - falls back to direct sensors output if wrapper missing

Testing note:
- Requires real hardware with smartmontools installed for full functionality
- Empty smart array returned gracefully when smartctl unavailable
- Legacy sensor-only nodes continue working without changes
2025-11-07 11:46:57 +00:00
rcourtman
d62259ffa7 Add AMD GPU temperature monitoring support
Related to #600

- Add GPU field to Temperature model with edge, junction, and mem sensors
- Add amdgpu chip recognition to temperature parser
- Implement parseGPUTemps() to extract AMD GPU temperature data
- Update frontend TypeScript types to include GPU temperatures
- Display GPU temps in node table tooltip alongside CPU temps
- Set hasGPU flag when GPU data is available

This enables temperature monitoring for AMD GPUs (amdgpu sensors)
that was previously being collected via SSH but silently discarded
during parsing.
2025-11-06 00:19:04 +00:00
rcourtman
7936808193 Add custom display name support for Docker hosts
This implements the ability for users to assign custom display names to Docker hosts,
similar to the existing functionality for Proxmox nodes. This addresses the issue where
multiple Docker hosts with identical hostnames but different IPs/domains cannot be
easily distinguished in the UI.

Backend changes:
- Add CustomDisplayName field to DockerHost model (internal/models/models.go:201)
- Update UpsertDockerHost to preserve custom display names across updates (internal/models/models.go:1110-1113)
- Add SetDockerHostCustomDisplayName method to State for updating names (internal/models/models.go:1221-1235)
- Add SetDockerHostCustomDisplayName method to Monitor (internal/monitoring/monitor.go:1070-1088)
- Add HandleSetCustomDisplayName API handler (internal/api/docker_agents.go:385-426)
- Route /api/agents/docker/hosts/{id}/display-name PUT requests (internal/api/docker_agents.go:117-120)

Frontend changes:
- Add customDisplayName field to DockerHost TypeScript interface (frontend-modern/src/types/api.ts:136)
- Add MonitoringAPI.setDockerHostDisplayName method (frontend-modern/src/api/monitoring.ts:151-187)
- Update getDisplayName function to prioritize custom names (frontend-modern/src/components/Settings/DockerAgents.tsx:84-89)
- Add inline editing UI with save/cancel buttons in Docker Agents settings (frontend-modern/src/components/Settings/DockerAgents.tsx:1349-1413)
- Update sorting to use custom display names (frontend-modern/src/components/Docker/DockerHosts.tsx:58-59)
- Update DockerHostSummaryTable to display custom names (frontend-modern/src/components/Docker/DockerHostSummaryTable.tsx:40-42, 87, 120, 254)

Users can now click the edit icon next to any Docker host name in Settings > Docker Agents
to set a custom display name. The custom name will be preserved across agent reconnections
and takes priority over the hostname reported by the agent.

Related to #623
2025-11-05 23:18:03 +00:00
rcourtman
b1831d7b3e Add guest URL support for PVE hosts
Related to discussion #615

Add optional GuestURL field to PVE instances and cluster endpoints,
allowing users to specify a separate guest-accessible URL for web UI
navigation that differs from the internal management URL.

Backend changes:
- Add GuestURL field to PVEInstance and ClusterEndpoint structs
- Add GuestURL field to Node model
- Update cluster auto-discovery to preserve existing GuestURL values
- Update node creation logic to populate GuestURL from config
- Update API handlers to accept and persist GuestURL field

Frontend changes:
- Add GuestURL input field to NodeModal for configuration
- Update NodeGroupHeader and NodeSummaryTable to use GuestURL for navigation
- Add GuestURL to Node and PVENodeConfig TypeScript interfaces

When GuestURL is configured, it will be used for navigation links
instead of the Host URL, allowing users to access PVE hosts through
a reverse proxy or different domain while maintaining internal API
connections.
2025-11-05 19:06:08 +00:00