20 commits

Author SHA1 Message Date
rcourtman
ac83074fc2 fix(hostmetrics): skip network mounts before usage probe (#1313) 2026-03-03 20:27:41 +00:00
rcourtman
6c720b7aea fix(freebsd): use golang.org/x/sys/unix.SysctlRaw instead of syscall.SysctlRaw
syscall.SysctlRaw is Darwin-only in Go's standard library; FreeBSD
requires the equivalent from golang.org/x/sys/unix. This fixes the
Docker cross-compilation build failure for the freebsd/amd64 target.

(cherry picked from commit 5fe16c75a075b817f90b7192d8270a7bd6677017)
2026-02-18 13:00:02 +00:00
rcourtman
efa916ee2a fix(memory): correct memory reporting for Linux VMs and FreeBSD ZFS ARC
Linux VM page cache (#1270): QEMU VM memory now falls back to Proxmox
RRD's memavailable metric (which excludes reclaimable page cache) when
the qemu-guest-agent doesn't provide MemInfo.Available. Previously the
fallback was detailedStatus.Mem (total - MemFree), inflating usage to
80%+ on VMs with normal Linux page cache. Mirrors the existing LXC
rrd-memavailable path.
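
The fallback order described above can be sketched as follows. This is a minimal illustration, not the project's code: memUsed and its parameters are hypothetical names, and a value of 0 is assumed to mean "source not available".

```go
package main

import "fmt"

// memUsed picks a "used" figure for a VM, preferring sources that exclude
// reclaimable page cache. All names are hypothetical; 0 means "not reported".
func memUsed(total, agentAvailable, rrdAvailable, memFree uint64) uint64 {
	avail := memFree // last resort: treats page cache as used memory
	switch {
	case agentAvailable > 0:
		avail = agentAvailable // guest agent's MemInfo.Available
	case rrdAvailable > 0:
		avail = rrdAvailable // RRD memavailable fallback
	}
	if avail > total {
		return 0 // inconsistent reading; avoid unsigned underflow
	}
	return total - avail
}

func main() {
	const gib = 1 << 30
	// No guest-agent value; RRD reports 6 GiB available, only 1 GiB strictly free.
	fmt.Println(memUsed(8*gib, 0, 6*gib, 1*gib) / gib) // 2
	// With neither source, usage is inflated by the page cache.
	fmt.Println(memUsed(8*gib, 0, 0, 1*gib) / gib) // 7
}
```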

FreeBSD ZFS ARC (#1264, #1051): The host agent now reads
kstat.zfs.misc.arcstats.size via SysctlRaw on FreeBSD and subtracts
the ARC size from reported memory usage. ZFS ARC is reclaimable under
memory pressure (like Linux SReclaimable) but gopsutil counts it as
wired/non-reclaimable, causing false 90%+ memory alerts on TrueNAS
and FreeBSD hosts. Build-tagged so it compiles cleanly on all platforms.
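
The arithmetic of the ARC adjustment might look like the sketch below. It covers only the subtraction; reading kstat.zfs.misc.arcstats.size is left out, subtractARC is a hypothetical name, and the guard for an implausibly large ARC value is an assumed policy.

```go
package main

import "fmt"

// subtractARC removes the ZFS ARC size from a "used memory" figure.
// arcSize stands in for the value of kstat.zfs.misc.arcstats.size;
// how it is read is outside this sketch.
func subtractARC(used, arcSize uint64) uint64 {
	if arcSize >= used {
		// Implausible reading: keep the original figure (assumed policy).
		return used
	}
	return used - arcSize
}

func main() {
	const gib = 1 << 30
	total, used, arc := uint64(32*gib), uint64(30*gib), uint64(20*gib)
	adj := subtractARC(used, arc)
	fmt.Printf("%.0f%% -> %.0f%%\n",
		100*float64(used)/float64(total),
		100*float64(adj)/float64(total)) // 94% -> 31%
}
```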

Fixes #1270
Fixes #1264
Fixes #1051

(cherry picked from commit 94502f83ff9ffc6da28aaadc946a2f7d8b4e9bac)
2026-02-18 12:56:53 +00:00
rcourtman
049a3e424c Add memory regression tests for agent and scheduler 2026-02-04 19:33:29 +00:00
rcourtman
f0a356c016 fix: ZFS pool usage now includes zvols and all pool consumers
The previous reconciliation logic (issue #1052) used per-dataset statfs
values for Total and Used. On Proxmox systems, statfs on a mounted
dataset (e.g. rpool/ROOT/pve-1) only reports that dataset's own usage,
completely missing zvols (VM disk images) and other datasets. This caused
storage bars to show ~0% usage (a few GB of OS files) when the pool
actually had terabytes of VM data allocated.

Fix: derive usable pool capacity from the ratio of dataset Free (usable
pool-available from statfs) to zpool Free (raw pool-available from zpool
list). This ratio converts raw zpool Size to usable total, and Used is
computed as Total - Free. This captures all pool consumers including
zvols, handles RAIDZ parity overhead and mirrors uniformly, and produces
correct usage percentages.
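
A minimal sketch of the ratio calculation described above, assuming the three inputs are already available in bytes. usablePool is a hypothetical name and the guards are illustrative, not taken from the actual change.

```go
package main

import "fmt"

// usablePool converts raw zpool Size into usable capacity using the ratio
// datasetFree / zpoolFree, then derives Used as Total - Free.
// Hypothetical helper; inputs are in bytes.
func usablePool(zpoolSize, zpoolFree, datasetFree uint64) (total, used uint64, ok bool) {
	if zpoolFree == 0 {
		return 0, 0, false // ratio undefined on a completely full pool
	}
	total = uint64(float64(zpoolSize) * float64(datasetFree) / float64(zpoolFree))
	if total < datasetFree {
		return 0, 0, false // inconsistent inputs
	}
	return total, total - datasetFree, true
}

func main() {
	const tib = 1 << 40
	// Assumed RAIDZ1 example: 12 TiB raw size, 6 TiB raw free, 4 TiB usable free.
	total, used, ok := usablePool(12*tib, 6*tib, 4*tib)
	fmt.Println(total/tib, used/tib, ok) // 8 4 true
}
```

Because Used is derived from the pool-wide figures rather than from one dataset's statfs values, space held by zvols and sibling datasets is counted.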

Verified with tests for RAIDZ, mirrors, and both with zvols present.
2026-01-29 12:08:38 +00:00
rcourtman
824d65830c Add debug logging to ZFS disk collection for diagnostics
Adds zerolog debug statements throughout the ZFS collection pipeline
(collector.go and zfs.go) to trace partition discovery, dataset
collection, zpool stats fetching, and pool summarization. This will
help diagnose issues like empty storage bars on mirror-vdev pools.
2026-01-28 17:30:53 +00:00
rcourtman
61bb582d82 fix: disk-exclude now works with device paths and disk I/O
- Add MatchesDiskExclude() to check both device path and mountpoint
- Add MatchesDeviceExclude() for device-only matching
- Update collectDisks to check device in addition to mountpoint
- Update collectDiskIO to respect disk exclusions
- Patterns like /dev/sda, sda, or /mnt/backup all work now
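
As a rough illustration of matching an exclusion against both the device and the mountpoint, the sketch below accepts a full device path, a bare device name, or a mountpoint. matchesDiskExclude here is a hypothetical stand-in, not the MatchesDiskExclude() added by this commit, and wildcard handling is deliberately omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesDiskExclude reports whether any exclusion equals the device path,
// the device name without "/dev/", or the mountpoint.
// Hypothetical sketch: exact matches only, no wildcards.
func matchesDiskExclude(device, mountpoint string, excludes []string) bool {
	short := strings.TrimPrefix(device, "/dev/")
	for _, pat := range excludes {
		if pat == "" {
			continue // ignore empty entries
		}
		if pat == device || pat == short || pat == mountpoint {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(matchesDiskExclude("/dev/sda", "/", []string{"sda"}))              // true
	fmt.Println(matchesDiskExclude("/dev/sdb1", "/mnt/backup", []string{"/mnt/backup"})) // true
	fmt.Println(matchesDiskExclude("/dev/sdc", "/data", []string{"sda"}))          // false
}
```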

Related to #1142
2026-01-21 19:03:05 +00:00
rcourtman
1816e2dbb8 fix(agent): use dataset used capacity for RAIDZ pools instead of zpool alloc
For RAIDZ pools, zpool ALLOC includes parity overhead, but users expect
to see actual data usage. Now using dataset Used value (from statfs)
when RAIDZ is detected, matching the existing fix for total capacity.

Fixes the second part of #1052 where used capacity was inflated.
2026-01-10 15:25:28 +00:00
rcourtman
49272bd48c fix: Show usable RAIDZ capacity instead of raw pool size
For RAIDZ/mirror pools, zpool list SIZE reports raw capacity (sum of
all disks), but users expect usable capacity (accounting for parity).
The dataset stats from statfs give the correct usable capacity.

Now uses dataset Total when it's smaller than zpool Size, indicating
RAIDZ/mirror overhead.

Related to #1052
2026-01-08 09:38:18 +00:00
rcourtman
3fdf753a5b Enhance devcontainer and CI workflows
- Add persistent volume mounts for Go/npm caches (faster rebuilds)
- Add shell config with helpful aliases and custom prompt
- Add comprehensive devcontainer documentation
- Add pre-commit hooks for Go formatting and linting
- Use go-version-file in CI workflows instead of hardcoded versions
- Simplify docker compose commands with --wait flag
- Add gitignore entries for devcontainer auth files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 22:29:15 +00:00
rcourtman
4ce1d551e4 fix: Deduplicate disks by device+total to fix Synology storage overcounting. Related to #953
Synology NAS creates multiple shared folders (e.g., /volume1/docker, /volume1/photos)
that are all mount points on the same underlying BTRFS volume. Each reported the same
16TB total, causing Pulse to show 64TB+ instead of 16TB.

The fix tracks device+total combinations and only counts each unique pair once.
When duplicates are found, the shallowest mountpoint (e.g., /volume1) is preferred.
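
The deduplication rule can be sketched like this. The Disk type, dedupe, and depth are hypothetical names for illustration; the real collector's types may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Disk is a hypothetical, simplified view of one mounted filesystem.
type Disk struct {
	Device     string
	Mountpoint string
	Total      uint64
}

// depth counts path separators, so "/volume1" is shallower than "/volume1/docker".
func depth(mp string) int {
	return strings.Count(strings.TrimRight(mp, "/"), "/")
}

// dedupe keeps one entry per (device, total) pair, preferring the
// shallowest mountpoint when duplicates are found.
func dedupe(disks []Disk) []Disk {
	type key struct {
		device string
		total  uint64
	}
	index := map[key]int{}
	var out []Disk
	for _, d := range disks {
		k := key{d.Device, d.Total}
		if i, seen := index[k]; seen {
			if depth(d.Mountpoint) < depth(out[i].Mountpoint) {
				out[i] = d
			}
			continue
		}
		index[k] = len(out)
		out = append(out, d)
	}
	return out
}

func main() {
	disks := []Disk{
		{"/dev/mapper/vol1", "/volume1/docker", 16},
		{"/dev/mapper/vol1", "/volume1", 16},
		{"/dev/mapper/vol1", "/volume1/photos", 16},
	}
	fmt.Println(dedupe(disks)) // [{/dev/mapper/vol1 /volume1 16}]
}
```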

Added a unit test to verify the deduplication works correctly.
2025-12-29 14:03:32 +00:00
rcourtman
c1422882bd feat: Add disk exclusion filter for host agent. Closes #896
Users can now exclude specific mount points from disk monitoring:
- Via CLI: --disk-exclude /mnt/backup --disk-exclude '/media/*'
- Via env: PULSE_DISK_EXCLUDE=/mnt/backup,*pbs*

Patterns support:
- Exact paths: /mnt/backup
- Prefix patterns: /mnt/ext*
- Contains patterns: *pbs*

This addresses the common case where external disks or
PBS datastores are being monitored but shouldn't be.
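
One way the three pattern kinds could be implemented is sketched below. parseExcludes and matchPattern are hypothetical helpers; only the pattern forms and the comma-separated PULSE_DISK_EXCLUDE format come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// parseExcludes splits a comma-separated value such as the one in
// PULSE_DISK_EXCLUDE into trimmed, non-empty patterns.
func parseExcludes(env string) []string {
	var out []string
	for _, p := range strings.Split(env, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// matchPattern handles exact paths, prefix patterns ("/mnt/ext*"),
// and contains patterns ("*pbs*"). Hypothetical sketch.
func matchPattern(pattern, path string) bool {
	switch {
	case len(pattern) > 2 && strings.HasPrefix(pattern, "*") && strings.HasSuffix(pattern, "*"):
		return strings.Contains(path, pattern[1:len(pattern)-1])
	case strings.HasSuffix(pattern, "*"):
		return strings.HasPrefix(path, strings.TrimSuffix(pattern, "*"))
	default:
		return pattern == path
	}
}

func main() {
	for _, pat := range parseExcludes("/mnt/backup, *pbs*") {
		fmt.Println(pat, matchPattern(pat, "/mnt/pbs-datastore"))
	}
}
```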
2025-12-25 12:04:40 +00:00
rcourtman
fdb2a07f56 fix(agent): find zpool binary on TrueNAS SCALE (#718)
Enhanced zpool binary lookup to try common paths when exec.LookPath fails.
This fixes issue #718 where TrueNAS SCALE reports inflated storage because
the agent runs with a restricted PATH that doesn't include /usr/sbin.

Changes:
- Added findZpool() helper that tries common paths like /usr/sbin/zpool,
  /sbin/zpool, /usr/local/sbin/zpool for TrueNAS/FreeBSD/Linux systems
- Added commonZpoolPaths variable listing typical zpool locations
- Added tests for the new findZpool function

This ensures zpool list is used for accurate pool-level capacity instead
of falling back to dataset-level summation.
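
A sketch of the lookup order, assuming the helper tries PATH first and then the listed locations. The function signature (with injectable lookPath and exists arguments) and errZpoolNotFound are assumptions made here so the logic can be exercised without a real zpool binary; they are not taken from the commit.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// commonZpoolPaths lists the fallback locations named in the commit message.
var commonZpoolPaths = []string{"/usr/sbin/zpool", "/sbin/zpool", "/usr/local/sbin/zpool"}

// errZpoolNotFound is a hypothetical sentinel error for this sketch.
var errZpoolNotFound = errors.New("zpool binary not found")

// findZpool tries PATH first, then the common locations.
func findZpool(lookPath func(string) (string, error), exists func(string) bool) (string, error) {
	if p, err := lookPath("zpool"); err == nil {
		return p, nil
	}
	for _, p := range commonZpoolPaths {
		if exists(p) {
			return p, nil
		}
	}
	return "", errZpoolNotFound
}

// fileExists reports whether p exists and is not a directory.
func fileExists(p string) bool {
	info, err := os.Stat(p)
	return err == nil && !info.IsDir()
}

func main() {
	p, err := findZpool(exec.LookPath, fileExists)
	fmt.Println(p, err) // result depends on the host
}
```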
2025-12-18 16:23:56 +00:00
rcourtman
8948e84fe5 feat: AI features, agent improvements, and host monitoring enhancements
AI Chat Integration:
- Multi-provider support (Anthropic, OpenAI, Ollama)
- Streaming responses with markdown rendering
- Agent command execution for remote troubleshooting
- Context-aware conversations with host/container metadata

Agent Updates:
- Add --enable-proxmox flag for automatic PVE/PBS token setup
- Improve auto-update with semver comparison (prevents downgrades)
- Add updatedFrom tracking to report previous version after update
- Reduce initial update check delay from 30s to 5s
- Add agent version column to Hosts page table
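
The downgrade guard mentioned in the list above can be illustrated with a numeric comparison of version components. This is a simplified sketch: parseVersion and isNewer are hypothetical names, only plain "vMAJOR.MINOR.PATCH" strings are handled, and pre-release or build suffixes are treated as unparseable.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion parses "v1.2.3" or "1.2.3" into numeric components.
// Hypothetical sketch: no pre-release or build metadata support.
func parseVersion(v string) (out [3]int, ok bool) {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, false
		}
		out[i] = n
	}
	return out, true
}

// isNewer reports whether candidate is strictly newer than current.
// Unparseable versions never trigger an update.
func isNewer(current, candidate string) bool {
	cur, ok1 := parseVersion(current)
	cand, ok2 := parseVersion(candidate)
	if !ok1 || !ok2 {
		return false
	}
	for i := range cur {
		if cand[i] != cur[i] {
			return cand[i] > cur[i]
		}
	}
	return false
}

func main() {
	fmt.Println(isNewer("v4.9.0", "v4.10.0")) // true
	fmt.Println(isNewer("v4.10.0", "v4.9.0")) // false
}
```

Comparing components as integers matters here: a plain string comparison would rank "4.9.0" above "4.10.0".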

Host Metrics:
- Add DiskIO stats collection (read/write bytes, ops, time)
- Improve disk filtering to exclude Docker overlay mounts
- Add RAID array monitoring via mdadm
- Enhanced temperature sensor parsing

Frontend:
- New Agent Version column on Hosts overview table
- Improved node modal with agent-first installation flow
- Add DiskIO display in host drawer
- Better responsive handling for metric bars
2025-12-05 10:37:02 +00:00
rcourtman
da51449392 fix: Exclude TrueNAS Docker overlay mounts from disk stats
Host agent was including Docker overlay2 mounts from TrueNAS SCALE's
.ix-apps directory in disk totals. These mounts inherit the ZFS pool's
AVAIL space, causing massively inflated storage numbers (e.g., 173 TB
per container overlay instead of actual usage).

Changes:
- Add /mnt/.ix-apps/docker/ to container overlay path exclusions
- Use ShouldSkipFilesystem() in host agent disk collection (was only
  using ShouldIgnoreReadOnlyFilesystem() which missed container paths)
- Add test cases for TrueNAS overlay paths

Related to #718
2025-12-04 03:03:04 +00:00
rcourtman
e0ccd2b8db Add unit tests for ZFS storage utility functions (hostmetrics)
65 test cases covering 8 functions:
- parseZpoolList: zpool command output parsing (15 cases)
- uniqueZFSPools: pool name deduplication (7 cases)
- bestZFSMountpoints: mountpoint selection logic (8 cases)
- zfsMountpointScore: mountpoint scoring algorithm (7 cases)
- zfsPoolFromDevice: pool name extraction (6 cases)
- calculatePercent: percentage calculation (7 cases)
- clampPercent: value clamping (8 cases)
- bestZFSPoolDatasets: dataset selection (7 cases)

First comprehensive unit test coverage for internal/hostmetrics
package ZFS utilities.
2025-11-30 12:50:58 +00:00
courtmanr@gmail.com
85461618fd Fix ZFS storage reporting on TrueNAS SCALE (#718)
- Refactor collector to support mocking
- Fix ZFS detection to support 'fuse.zfs' and case-insensitivity
- Add regression tests for ZFS dataset deduplication
2025-11-22 23:53:39 +00:00
rcourtman
45a8cf68ac fix(hostmetrics): dedupe ZFS pools for usable storage
Related to #718
2025-11-18 23:38:11 +00:00
rcourtman
2e1ef44ecd Filter read-only filesystems from host agent disk metrics (related to #690)
Squashfs snap mounts on Ubuntu (and similar read-only filesystems like
erofs on Home Assistant OS) always report near-full usage and trigger
false disk alerts. The filter logic existed in Proxmox monitoring but
wasn't applied to host agents.

Changes:
- Extract read-only filesystem filter to shared pkg/fsfilters package
- Apply filter in hostmetrics.collectDisks() for host/docker agents
- Apply filter in monitor.ApplyHostReport() for backward compatibility
- Convert internal/monitoring/fs_filters.go to wrapper functions

This prevents squashfs, erofs, iso9660, cdfs, udf, cramfs, romfs, and
saturated overlay filesystems from generating alerts. Filtering happens
at both collection time (agents) and ingestion time (server) to ensure
older agents don't cause false alerts until they're updated.
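
A minimal sketch of a shared filter that both the agent (collection time) and the server (ingestion time) could call. The names below are hypothetical, not the pkg/fsfilters API; the type list comes from the message above, and the saturated-overlay case is not covered.

```go
package main

import (
	"fmt"
	"strings"
)

// readOnlyFSTypes holds the filesystem types named in the commit message.
var readOnlyFSTypes = map[string]bool{
	"squashfs": true, "erofs": true, "iso9660": true, "cdfs": true,
	"udf": true, "cramfs": true, "romfs": true,
}

// shouldIgnoreReadOnly reports whether a filesystem type should be skipped.
// Hypothetical helper; matching is case-insensitive.
func shouldIgnoreReadOnly(fsType string) bool {
	return readOnlyFSTypes[strings.ToLower(strings.TrimSpace(fsType))]
}

// Mount is a hypothetical, simplified disk entry.
type Mount struct {
	Mountpoint string
	FSType     string
}

// filterMounts drops ignored filesystems; the same call can run at
// collection time and again at ingestion time.
func filterMounts(in []Mount) []Mount {
	var out []Mount
	for _, m := range in {
		if !shouldIgnoreReadOnly(m.FSType) {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	mounts := []Mount{{"/", "ext4"}, {"/snap/core/1", "squashfs"}}
	fmt.Println(filterMounts(mounts)) // [{/ ext4}]
}
```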
2025-11-12 09:47:02 +00:00
rcourtman
f2acdd59af Normalize docker agent version handling 2025-10-28 08:42:58 +00:00