74 commits

Author SHA1 Message Date
rcourtman
32333cdbbe test: Add tests for authHTTPError.Error and shouldFallbackToForm
Tests for Proxmox client authentication error handling:
- authHTTPError.Error: message formatting based on status code
  (401/403 include status in message, others don't)
- shouldFallbackToForm: determines when to retry with form encoding
  (triggers on 400/415, not on auth errors or server errors)

16 test cases covering all code paths.
2025-12-01 13:39:50 +00:00
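
The rules listed in 32333cdbbe can be sketched roughly as follows. Only the names `authHTTPError` and `shouldFallbackToForm` and the status-code rules come from the commit message; the struct fields and the message wording are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// authHTTPError is a hypothetical reconstruction: the field names and the
// exact message text are assumed, not taken from the repository.
type authHTTPError struct {
	status int
	body   string
}

// Error includes the status code only for 401 and 403 responses.
func (e *authHTTPError) Error() string {
	if e.status == http.StatusUnauthorized || e.status == http.StatusForbidden {
		return fmt.Sprintf("authentication failed (status %d): %s", e.status, e.body)
	}
	return fmt.Sprintf("authentication failed: %s", e.body)
}

// shouldFallbackToForm reports whether a retry with form encoding makes
// sense: only 400 and 415 qualify, auth and server errors do not.
func shouldFallbackToForm(err error) bool {
	var authErr *authHTTPError
	if !errors.As(err, &authErr) {
		return false
	}
	return authErr.status == http.StatusBadRequest ||
		authErr.status == http.StatusUnsupportedMediaType
}

func main() {
	for _, code := range []int{400, 401, 415, 500} {
		err := &authHTTPError{status: code, body: "rejected"}
		fmt.Printf("%d: fallback=%v msg=%q\n", code, shouldFallbackToForm(err), err.Error())
	}
}
```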
rcourtman
42eec54d6e Add unit tests for parseWearoutValue and clampWearoutConsumed functions
52 test cases covering:
- Empty/whitespace input
- Simple numeric strings and quoted values
- Percentage symbols and N/A variants
- Float values with truncation
- Messy SMART data with digit extraction fallback
- Clamping behavior for unknown, normal, and out-of-range values
2025-12-01 09:18:04 +00:00
rcourtman
f9122d736e Add unit tests for parseUint64Flexible function
32 test cases covering all code paths:
- nil, uint64, int, int64, float64 type handling
- json.Number parsing (delegates to string branch)
- String parsing: empty, decimal, hex (0x/0X), float notation, scientific
- Negative value handling (returns 0 for numeric types)
- Error cases: invalid strings, unsupported types
2025-12-01 09:11:02 +00:00
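
A minimal sketch of the kind of parser the f9122d736e test list implies. The function name matches the commit, but the signature, the handling of empty strings, and the clamping of out-of-range floats are assumptions, not the repository's code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// floatToUint64 truncates toward zero; negatives and NaN become 0 and
// values beyond the uint64 range are clamped (an assumed policy).
func floatToUint64(f float64) uint64 {
	if math.IsNaN(f) || f <= 0 {
		return 0
	}
	if f >= math.MaxUint64 {
		return math.MaxUint64
	}
	return uint64(f)
}

// parseUint64Flexible accepts the value kinds listed in the commit message.
func parseUint64Flexible(v interface{}) (uint64, error) {
	switch t := v.(type) {
	case nil:
		return 0, nil
	case uint64:
		return t, nil
	case int:
		if t < 0 {
			return 0, nil // negative numeric input maps to 0
		}
		return uint64(t), nil
	case int64:
		if t < 0 {
			return 0, nil
		}
		return uint64(t), nil
	case float64:
		return floatToUint64(t), nil
	case json.Number:
		return parseUint64Flexible(t.String()) // delegate to the string branch
	case string:
		s := strings.TrimSpace(t)
		if s == "" {
			return 0, nil // assumed: empty input is treated as zero
		}
		if strings.HasPrefix(s, "0x") || strings.HasPrefix(s, "0X") {
			return strconv.ParseUint(s[2:], 16, 64)
		}
		if u, err := strconv.ParseUint(s, 10, 64); err == nil {
			return u, nil
		}
		// Float and scientific notation, e.g. "12.9" or "1e3".
		f, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return 0, fmt.Errorf("cannot parse %q as uint64: %w", s, err)
		}
		return floatToUint64(f), nil
	default:
		return 0, fmt.Errorf("unsupported type %T", v)
	}
}

func main() {
	for _, in := range []interface{}{nil, 42, -7, "0x1F", "1e3", json.Number("99"), "abc"} {
		v, err := parseUint64Flexible(in)
		fmt.Printf("%#v -> %d (err=%v)\n", in, v, err)
	}
}
```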
rcourtman
37550bff6d Add unit tests for ZFS device conversion functions
Tests added by ADA run #97 but commit was missed.
Covers: RaidZ types, log/cache/spare devices, nested mirrors,
ConvertToModelZFSPool, and struct field tests.
2025-12-01 09:03:48 +00:00
rcourtman
68c0e79b21 Add unit tests for cloneProfile and clonePhase functions in discovery
Add comprehensive tests for the cloneProfile and clonePhase utility
functions in pkg/discovery/discovery.go. Tests verify deep copying
behavior for all fields including subnets, metadata, warnings, extra
targets, and phases to ensure mutations don't affect original objects.
2025-12-01 01:51:54 +00:00
rcourtman
6c18849f79 Add unit tests for cluster_client utility functions
Test coverage for error detection and retry logic:
- extractStatusCode: 13 test cases for HTTP status code extraction
- isTransientRateLimitError: 17 test cases for rate limit detection
- isNotImplementedError: 14 test cases for 501 error detection
- isVMSpecificError: 16 test cases for VM-scoped errors
- calculateRateLimitBackoff: backoff timing verification
- isAuthError: 12 test cases for authentication errors

Coverage 35.5% → 37.3%
2025-12-01 00:24:21 +00:00
rcourtman
dc76294ce1 Add unit tests for discovery package utility functions
Test coverage for pure utility functions: friendlyPhaseName,
defaultProductsForPort, cloneHeader, copyMetadata, and
ensurePolicyDefaults.
2025-11-30 16:05:11 +00:00
rcourtman
efafbe8e31 Add unit tests for PMG flexible JSON type parsers
Tests for flexibleFloat and flexibleInt custom JSON unmarshalers that
handle PMG API responses where numeric values may arrive as numbers,
strings, or nulls. 64 test cases covering:

- JSON numbers (integers, floats, scientific notation, negatives)
- String values (numeric strings, empty, whitespace, "null")
- JSON null values
- Error cases (invalid strings, arrays, objects, booleans)
- Boundary values (max/min float64)
- Real PMG response patterns (mail stats, queue status)
- Struct embedding behavior
2025-11-30 03:04:12 +00:00
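
The behaviour described in efafbe8e31 could look something like the sketch below. The type name `flexibleFloat` comes from the commit; the decision to map null, empty, and "null" strings to zero, and the `mailStats` struct with its field names, are assumptions for illustration only.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// flexibleFloat decodes JSON numbers, numeric strings, and nulls.
type flexibleFloat float64

func (f *flexibleFloat) UnmarshalJSON(data []byte) error {
	trimmed := bytes.TrimSpace(data)
	if len(trimmed) == 0 || bytes.Equal(trimmed, []byte("null")) {
		*f = 0 // assumed: null decodes to zero
		return nil
	}
	if trimmed[0] == '"' {
		var s string
		if err := json.Unmarshal(trimmed, &s); err != nil {
			return err
		}
		s = strings.TrimSpace(s)
		if s == "" || strings.EqualFold(s, "null") {
			*f = 0 // assumed: empty and "null" strings decode to zero
			return nil
		}
		v, err := strconv.ParseFloat(s, 64)
		if err != nil {
			return fmt.Errorf("flexibleFloat: invalid string %q: %w", s, err)
		}
		*f = flexibleFloat(v)
		return nil
	}
	// Anything else must be a plain JSON number; arrays, objects, and
	// booleans fail here.
	var v float64
	if err := json.Unmarshal(trimmed, &v); err != nil {
		return fmt.Errorf("flexibleFloat: %w", err)
	}
	*f = flexibleFloat(v)
	return nil
}

// mailStats is a hypothetical payload shape, not the real PMG response.
type mailStats struct {
	Count flexibleFloat `json:"count"`
	Spam  flexibleFloat `json:"spam"`
	Virus flexibleFloat `json:"virus"`
}

func decodeMailStats(raw string) (mailStats, error) {
	var s mailStats
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

func main() {
	s, err := decodeMailStats(`{"count": 12, "spam": "3.5", "virus": null}`)
	fmt.Println(s.Count, s.Spam, s.Virus, err)
}
```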
rcourtman
92c2d198b1 Add unit tests for Proxmox replication utility functions
Comprehensive test coverage for JSON parsing helpers used in
replication job status parsing: stringFromAny, intFromAny,
boolFromAny, floatFromAny, parseReplicationTime, parseDurationSeconds,
parseHHMMSSToSeconds, and parseReplicationJob.

Coverage increased from 22.6% to 35.5%.
2025-11-30 02:35:11 +00:00
rcourtman
316161f989 Add unit tests for coerceUint64 and FlexInt.UnmarshalJSON
45 test cases covering:
- FlexInt: integer/float/string parsing, truncation behavior, error cases
- coerceUint64: nil, float64 (including NaN/Inf), int/int32/int64,
  uint32/uint64, json.Number, string parsing (whitespace, null, quotes,
  commas, scientific notation), unsupported types

Coverage: 20.5% -> 22.6%
2025-11-30 02:17:52 +00:00
rcourtman
69de7c25ce Fix cluster degraded status not recovering after transient failures
The previous fix (6db4ee7a) cleared stale error messages but didn't mark
endpoints as healthy again after successful operations. This caused
clusters to remain in "degraded" state permanently once any endpoint had
a temporary issue, even if all endpoints were actually working.

The fix now marks endpoints healthy in clearEndpointError() after
successful operations, ensuring degraded clusters recover automatically.

Related to #659
2025-11-29 19:04:11 +00:00
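
To illustrate the recovery behaviour described in 69de7c25ce, here is a small sketch. Only `clearEndpointError` is named in the commit; the `clusterClient` struct, its fields, and the other methods are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

// clusterClient is a simplified, hypothetical model of per-endpoint health.
type clusterClient struct {
	mu         sync.Mutex
	nodeHealth map[string]bool
	lastError  map[string]string
}

func newClusterClient(endpoints ...string) *clusterClient {
	c := &clusterClient{
		nodeHealth: make(map[string]bool),
		lastError:  make(map[string]string),
	}
	for _, ep := range endpoints {
		c.nodeHealth[ep] = true
	}
	return c
}

// markUnhealthy records a failure for one endpoint.
func (c *clusterClient) markUnhealthy(endpoint, reason string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.nodeHealth[endpoint] = false
	c.lastError[endpoint] = reason
}

// clearEndpointError is called after a successful operation. It drops the
// stale message and also restores the healthy flag, which is the step the
// earlier fix left out.
func (c *clusterClient) clearEndpointError(endpoint string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.lastError, endpoint)
	c.nodeHealth[endpoint] = true
}

// degraded reports whether any endpoint is currently marked unhealthy.
func (c *clusterClient) degraded() bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, healthy := range c.nodeHealth {
		if !healthy {
			return true
		}
	}
	return false
}

func main() {
	c := newClusterClient("pve1", "pve2")
	c.markUnhealthy("pve1", "timeout")
	fmt.Println("degraded after failure:", c.degraded())
	c.clearEndpointError("pve1")
	fmt.Println("degraded after success:", c.degraded())
}
```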
rcourtman
2eea0335a2 Extract filesystem filtering logic into pkg/fsfilters
Move the inline filesystem skip logic from pollVMsAndContainersEfficient
into a reusable ShouldSkipFilesystem function. This consolidates filtering
for virtual filesystems (tmpfs, cgroup, etc.), network mounts (nfs, cifs,
fuse), and special mountpoints (/dev, /proc, /snap, etc.) into one tested
location.

Reduces cyclomatic complexity of pollVMsAndContainersEfficient and adds
28 test cases covering virtual fs types, network mounts, special mounts,
Windows paths, and edge cases.
2025-11-29 16:38:08 +00:00
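
A rough sketch of the three filter categories 2eea0335a2 consolidates. The real `ShouldSkipFilesystem` signature is not shown in the log, so the two-argument form, the lowercase name, and the exact type and prefix lists below are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Illustrative lists only; the package's real lists may differ.
var virtualFSTypes = map[string]bool{
	"tmpfs": true, "devtmpfs": true, "cgroup": true, "cgroup2": true,
	"sysfs": true, "proc": true,
}

var networkFSPrefixes = []string{"nfs", "cifs", "fuse"}

var specialMountPrefixes = []string{"/dev", "/proc", "/sys", "/run", "/snap"}

// shouldSkipFilesystem reports whether a filesystem should be left out of
// disk usage totals.
func shouldSkipFilesystem(fsType, mountpoint string) bool {
	ft := strings.ToLower(strings.TrimSpace(fsType))
	if virtualFSTypes[ft] {
		return true
	}
	for _, p := range networkFSPrefixes {
		if strings.HasPrefix(ft, p) {
			return true
		}
	}
	for _, p := range specialMountPrefixes {
		// Match the prefix itself or a path below it, so /device is kept.
		if mountpoint == p || strings.HasPrefix(mountpoint, p+"/") {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldSkipFilesystem("ext4", "/"))
	fmt.Println(shouldSkipFilesystem("tmpfs", "/tmp"))
	fmt.Println(shouldSkipFilesystem("nfs4", "/mnt/share"))
	fmt.Println(shouldSkipFilesystem("ext4", "/snap/core/1"))
}
```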
rcourtman
1b5528356b fix: clear stale errors after successful cluster operations
Previously, errors stored in ClusterClient.lastError were only cleared
during initial health checks or when recovering unhealthy nodes. This
caused stale error messages to persist in the UI even after the
underlying issues were resolved.

The fix clears cached errors in two places:
1. After passing connectivity test in getHealthyClient()
2. After successful operation in executeWithFailover()

This ensures that once an endpoint starts working again, any previous
error messages are cleared from the UI without requiring a restart.

Related to #659, #754
2025-11-27 16:22:16 +00:00
rcourtman
ad998a1e2f style: fix staticcheck style warnings
- Merge variable declaration with assignment (S1021)
- Use unconditional strings.TrimPrefix (S1017)
- Remove unnecessary nil checks around range (S1031)
- Remove unnecessary fmt.Sprintf (S1039)
- Use copy() instead of manual loop (S1001)
- Use time.Until instead of t.Sub(time.Now()) (S1024)
- Use buf.String() instead of string(buf.Bytes()) (S1030)
2025-11-27 09:19:33 +00:00
rcourtman
bc9e89696b chore: fix staticcheck U1000 unused code warnings
- Remove unused ipv6Regex from validation.go
- Suppress unused recordAlertFired/recordAlertResolved hooks (kept for future use)
- Remove unused apiLimiter rate limiter
- Remove unused stopOnce fields from csrf_store.go and session_store.go
- Remove unused lastBroadcast field from hub.go
- Remove unused lastUsedIndex field from cluster_client.go
2025-11-27 09:12:17 +00:00
rcourtman
8276ae837e chore: cleanup proxmox IsAuthError and remove stray comment
- Make IsAuthError unexported (isAuthError) since it's only used internally
- Remove stray '// test comment' from docker_metadata.go
2025-11-27 08:59:01 +00:00
rcourtman
3045aa16fb chore: remove unused phaseError type from discovery 2025-11-27 08:47:13 +00:00
rcourtman
c439a83fba chore: remove additional dead code
Remove 241 lines of unreachable code across internal and pkg:
- internal/crypto/crypto.go: unused NewCryptoManager wrapper
- internal/monitoring/scheduler.go: unused fixedIntervalSelector type
- internal/ssh/knownhosts/manager.go: unused hostKeyExists function
- internal/updates/manager.go: unused getLatestRelease wrapper
- internal/updates/updater.go: unused GetAll method
- pkg/discovery/discovery.go: unused scanWorker and runPhase (legacy compat)
- pkg/proxmox/client.go: unused post, getTaskStatus, waitForTaskCompletion, getTaskLog
- pkg/proxmox/cluster_client.go: unused markUnhealthy wrapper
2025-11-27 05:13:26 +00:00
rcourtman
01f7d81d38 style: fix gofmt formatting inconsistencies
Run gofmt -w to fix tab/space inconsistencies across 33 files.
2025-11-26 23:44:36 +00:00
rcourtman
c9ce98d679 test: add unit tests for pkg/discovery/envdetect
Add 20 tests covering:
- Environment enum String() method
- DefaultScanPolicy default values
- ScanPolicy struct fields
- SubnetPhase struct fields
- EnvironmentProfile struct fields
- parseHexIP little-endian IP parsing
- tryCommonSubnets fallback subnet list
- profileWithWarning helper
- addFallbackSubnets with confidence handling
2025-11-26 14:25:17 +00:00
rcourtman
981b41eeb8 test: add unit tests for pkg/agents/docker 2025-11-26 14:22:58 +00:00
rcourtman
d57c2a5a6c test: add unit tests for pkg/agents/host 2025-11-26 14:16:57 +00:00
rcourtman
f6e59b59c3 test: add unit tests for pkg/pbs 2025-11-26 14:15:49 +00:00
rcourtman
cd66266236 test: add unit tests for pkg/tlsutil 2025-11-26 14:14:34 +00:00
rcourtman
ea335546fc feat: improve legacy agent detection and migration UX
Add seamless migration path from legacy agents to unified agent:

- Add AgentType field to report payloads (unified vs legacy detection)
- Update server to detect legacy agents by type instead of version
- Add UI banner showing upgrade command when legacy agents are detected
- Add deprecation notice to install-host-agent.ps1
- Create install-docker-agent.sh stub that redirects to unified installer

Legacy agents (pulse-host-agent, pulse-docker-agent) now show a "Legacy"
badge in the UI with a one-click copy command to upgrade to the unified
agent.
2025-11-25 23:26:22 +00:00
courtmanr@gmail.com
ac9384aee4 Fix PBS discovery causing auth failures
Increases confidence score for PBS when receiving 401/403 responses to avoid unnecessary probing of other endpoints that trigger auth failure logs.

Fixes #741
2025-11-23 00:08:54 +00:00
rcourtman
b28828a822 Handle VM guest agent errors without marking nodes unhealthy (related to #736) 2025-11-21 17:34:25 +00:00
rcourtman
2207642fa9 Related to #727: normalize persisted Proxmox hosts 2025-11-20 19:58:05 +00:00
rcourtman
766cbe573e Handle missing storage on cluster nodes 2025-11-18 15:57:29 +00:00
rcourtman
7c895df1f3 Fix Proxmox 9.x VM status endpoint incompatibility
Proxmox VE 9.x removed support for the "full" parameter in the
/nodes/{node}/qemu/{vmid}/status/current endpoint. When Pulse sent
GetVMStatus() requests with ?full=1, Proxmox responded with:

  API error 400: {"errors":{"full":"property is not defined in schema..."}}

This caused the cluster client to mark ALL endpoints as unhealthy, which
cascaded into multiple failures:
- VM status checks failed
- Guest agent queries were blocked
- Filesystem data collection stopped working
- All Windows VMs showed disk:-1 (unknown) instead of actual disk usage

The fix removes the ?full=1 parameter since Proxmox 9.x returns all data
by default without needing this parameter. This maintains backward
compatibility with older Proxmox versions while fixing the issue in 9.x.

After this fix:
- Cluster endpoints are correctly marked as healthy
- Guest agent queries work properly
- Windows VMs report actual disk usage (e.g., 26% on C:\ drive)
- VM monitoring functions normally on Proxmox 9.x
2025-11-13 11:22:36 +00:00
rcourtman
f61b850179 Ensure VM status requests always return meminfo (Related to #694) 2025-11-12 17:30:10 +00:00
rcourtman
2e1ef44ecd Filter read-only filesystems from host agent disk metrics (related to #690)
Squashfs snap mounts on Ubuntu (and similar read-only filesystems like
erofs on Home Assistant OS) always report near-full usage and trigger
false disk alerts. The filter logic existed in Proxmox monitoring but
wasn't applied to host agents.

Changes:
- Extract read-only filesystem filter to shared pkg/fsfilters package
- Apply filter in hostmetrics.collectDisks() for host/docker agents
- Apply filter in monitor.ApplyHostReport() for backward compatibility
- Convert internal/monitoring/fs_filters.go to wrapper functions

This prevents squashfs, erofs, iso9660, cdfs, udf, cramfs, romfs, and
saturated overlay filesystems from generating alerts. Filtering happens
at both collection time (agents) and ingestion time (server) to ensure
older agents don't cause false alerts until they're updated.
2025-11-12 09:47:02 +00:00
rcourtman
bb7ca93c18 feat: Add mdadm RAID monitoring support for host agents
Implements comprehensive mdadm RAID array monitoring for Linux hosts
via pulse-host-agent. Arrays are automatically detected and monitored
with real-time status updates, rebuild progress tracking, and automatic
alerting for degraded or failed arrays.

Key changes:

**Backend:**
- Add mdadm package for parsing mdadm --detail output
- Extend host agent report structure with RAID array data
- Integrate mdadm collection into host agent (Linux-only, best-effort)
- Add RAID array processing in monitoring system
- Implement automatic alerting:
  - Critical alerts for degraded arrays or arrays with failed devices
  - Warning alerts for rebuilding/resyncing arrays with progress tracking
  - Auto-clear alerts when arrays return to healthy state

**Frontend:**
- Add TypeScript types for RAID arrays and devices
- Display RAID arrays in host details drawer with:
  - Array status (clean/degraded/recovering) with color-coded indicators
  - Device counts (active/total/failed/spare)
  - Rebuild progress percentage and speed when applicable
  - Green for healthy, amber for rebuilding, red for degraded

**Documentation:**
- Document mdadm monitoring feature in HOST_AGENT.md
- Explain requirements (Linux, mdadm installed, root access)
- Clarify scope (software RAID only, hardware RAID not supported)

**Testing:**
- Add comprehensive tests for mdadm output parsing
- Test parsing of healthy, degraded, and rebuilding arrays
- Verify proper extraction of device states and rebuild progress

All builds pass successfully. RAID monitoring is automatic and best-effort:
if mdadm is not installed or no arrays exist, the host agent continues
reporting other metrics normally.

Related to #676
2025-11-09 16:36:33 +00:00
rcourtman
a406fe42d8 Fix Proxmox 9.x RRD parameter incompatibility causing cluster health issues
Proxmox VE 9.x removed support for the 'ds' parameter in RRD endpoints
(/nodes/{node}/rrddata and /nodes/{node}/lxc/{vmid}/rrddata). When Pulse
sent RRD requests with ds=memused,memavailable,etc., Proxmox responded with:

  API error 400: {"errors":{"ds":"property is not defined in schema..."}}

This caused cluster nodes to be repeatedly marked unhealthy, which cascaded
into storage polling failures showing 'All cluster endpoints are unhealthy'
even though the nodes were actually healthy and reachable.

Changes:
- Added check in cluster_client.go executeWithFailover to recognize the ds
  parameter error as a capability issue rather than node health failure
- Nodes with this error no longer get marked unhealthy
- Storage polling and other operations now succeed even when RRD calls fail
- The RRD data will be unavailable but core monitoring continues

This fix maintains backward compatibility with older Proxmox versions while
gracefully handling the API change in Proxmox 9.x.
2025-11-08 12:06:08 +00:00
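
The distinction a406fe42d8 draws between a capability error and a node health failure might be sketched like this. The matching strings are taken from the error quoted in the commit; the function names and the `apiError` type are hypothetical, and the real check in `executeWithFailover` may match more narrowly.

```go
package main

import (
	"fmt"
	"strings"
)

// apiError is a hypothetical error type carrying the raw API message.
type apiError string

func (e apiError) Error() string { return string(e) }

// isUnsupportedParamError reports whether the API rejected a request
// parameter it no longer knows, as Proxmox 9.x does for 'ds'.
func isUnsupportedParamError(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	return strings.Contains(msg, "API error 400") &&
		strings.Contains(msg, "property is not defined in schema")
}

// classify decides how a failover loop could treat an error:
// "ok" (no error), "capability" (keep the node healthy), or "unhealthy".
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case isUnsupportedParamError(err):
		return "capability"
	default:
		return "unhealthy"
	}
}

func main() {
	errs := []error{
		nil,
		apiError(`API error 400: {"errors":{"ds":"property is not defined in schema..."}}`),
		apiError("dial tcp: connection refused"),
	}
	for _, err := range errs {
		fmt.Printf("%v -> %s\n", err, classify(err))
	}
}
```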
rcourtman
48fabdd827 Improve Docker temperature monitoring documentation for clarity (related to #600)
Updated the Quick Start for Docker section in TEMPERATURE_MONITORING.md to be
more user-friendly and address common setup issues:

- Added clear explanation of why the proxy is needed (containers can't access hardware)
- Provided concrete IP example instead of placeholder
- Showed full docker-compose.yml context with proper YAML structure
- Added sudo to commands where needed
- Updated docker-compose commands to v2 syntax with note about v1
- Expanded verification steps with clearer success indicators
- Added reminder to check container name in verification commands

These improvements should help users who encounter blank temperature displays
due to missing proxy installation or bind mount configuration.
2025-11-07 15:09:42 +00:00
rcourtman
9199892115 Fix Windows VM disk accumulation bug by normalizing drive letters
Related to #656

Windows guest agents can return multiple directory mountpoints (C:\, C:\Users,
C:\Windows) all on the same physical drive. When the QEMU guest agent omits
disk[] metadata, commit 5325ef481 falls back to using the mountpoint string
as the disk identifier. This causes every Windows directory to be treated as
a separate disk, accumulating to inflated totals (e.g., 1TB reported for a
250GB drive).

Root cause:
The fallback logic in pkg/proxmox/client.go:1585-1594 assigns fs.Disk =
fs.Mountpoint when disk[] is missing. On Windows, every directory path is
unique, so the deduplication guard in
internal/monitoring/monitor_polling.go:619-635 never triggers, causing all
directories to be summed.

Changes:
- Detect Windows-style mountpoints (drive letter + colon + backslash)
- Normalize to drive root when disk[] is missing (e.g., C:\Users → C:)
- Preserve existing behavior for Linux/BSD and VMs with disk[] metadata
- Add debug logging for synthesized Windows drive identifiers

This fix maintains backward compatibility with commit 5325ef481 while
preventing the Windows directory accumulation issue. LXC containers are
unaffected as they use a different code path.
2025-11-07 12:27:11 +00:00
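
The normalization described in 9199892115 can be illustrated with a short sketch. All names here (`windowsDriveRoot`, `guestFS`, `diskKey`, `totalBytes`) are hypothetical, and uppercasing the drive letter is an added assumption; only the C:\Users to C: mapping and the deduplication idea come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// windowsDriveRoot detects "letter + colon + backslash" mountpoints and
// returns the drive root, e.g. C:\Users becomes C:.
func windowsDriveRoot(mountpoint string) (string, bool) {
	if len(mountpoint) < 3 {
		return "", false
	}
	letter := mountpoint[0]
	isLetter := (letter >= 'a' && letter <= 'z') || (letter >= 'A' && letter <= 'Z')
	if !isLetter || mountpoint[1] != ':' || mountpoint[2] != '\\' {
		return "", false
	}
	return strings.ToUpper(mountpoint[:2]), true
}

// guestFS is a simplified stand-in for a guest agent filesystem entry.
type guestFS struct {
	Mountpoint string
	Disk       string // empty when the agent omits disk[] metadata
	TotalBytes uint64
}

// diskKey picks the identifier used for deduplication.
func diskKey(fs guestFS) string {
	if fs.Disk != "" {
		return fs.Disk
	}
	if root, ok := windowsDriveRoot(fs.Mountpoint); ok {
		return root
	}
	return fs.Mountpoint
}

// totalBytes sums each distinct disk once.
func totalBytes(filesystems []guestFS) uint64 {
	seen := make(map[string]bool)
	var total uint64
	for _, fs := range filesystems {
		key := diskKey(fs)
		if seen[key] {
			continue
		}
		seen[key] = true
		total += fs.TotalBytes
	}
	return total
}

func main() {
	windows := []guestFS{
		{Mountpoint: `C:\`, TotalBytes: 250},
		{Mountpoint: `C:\Users`, TotalBytes: 250},
		{Mountpoint: `C:\Windows`, TotalBytes: 250},
	}
	fmt.Println("total:", totalBytes(windows))
}
```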
rcourtman
1a78dcbba2 Fix guest agent disk data regression on Proxmox 8.3+
Related to #630

Proxmox 8.3+ changed the VM status API to return the `agent` field as an
object ({"enabled":1,"available":1}) instead of an integer (0 or 1). This
caused Pulse to incorrectly treat VMs as having no guest agent, resulting
in missing disk usage data (disk:-1) even when the guest agent was running
and functional.

The issue manifested as:
- VMs showing "Guest details unavailable" or missing disk data
- Pulse logs showing no "Guest agent enabled, querying filesystem info" messages
- `pvesh get /nodes/<node>/qemu/<vmid>/agent/get-fsinfo` working correctly
  from the command line, confirming the agent was functional

Root cause:
The VMStatus struct defined `Agent` as an int field. When Proxmox 8.3+ sent
the new object format, JSON unmarshaling silently left the field at zero,
causing Pulse to skip all guest agent queries.

Changes:
- Created VMAgentField type with custom UnmarshalJSON to handle both formats:
  * Legacy (Proxmox <8.3): integer (0 or 1)
  * Modern (Proxmox 8.3+): object {"enabled":N,"available":N}
- Updated VMStatus.Agent from `int` to `VMAgentField`
- Updated all references to `detailedStatus.Agent` to use `.Agent.Value`
- The unmarshaler prioritizes the "available" field over "enabled" to ensure
  we only query when the agent is actually responding

This fix maintains backward compatibility with older Proxmox versions while
supporting the new format introduced in Proxmox 8.3+.
2025-11-06 18:42:46 +00:00
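
A minimal sketch of a dual-format unmarshaler in the spirit of 1a78dcbba2. The names `VMAgentField` and `.Value` come from the commit; the exact precedence logic (use "available" when present, otherwise "enabled") is one reading of the message, and the `vmStatus` wrapper is hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// VMAgentField accepts either the legacy integer or the newer object form.
type VMAgentField struct {
	Value int
}

func (a *VMAgentField) UnmarshalJSON(data []byte) error {
	trimmed := bytes.TrimSpace(data)
	if len(trimmed) == 0 || bytes.Equal(trimmed, []byte("null")) {
		a.Value = 0
		return nil
	}
	if trimmed[0] == '{' {
		// Modern format: {"enabled":N,"available":N}.
		var obj struct {
			Enabled   *int `json:"enabled"`
			Available *int `json:"available"`
		}
		if err := json.Unmarshal(trimmed, &obj); err != nil {
			return fmt.Errorf("agent object: %w", err)
		}
		switch {
		case obj.Available != nil:
			a.Value = *obj.Available // prefer "available" when present
		case obj.Enabled != nil:
			a.Value = *obj.Enabled
		default:
			a.Value = 0
		}
		return nil
	}
	// Legacy format: a bare integer (0 or 1).
	var n int
	if err := json.Unmarshal(trimmed, &n); err != nil {
		return fmt.Errorf("agent integer: %w", err)
	}
	a.Value = n
	return nil
}

// vmStatus is a hypothetical, trimmed-down status payload.
type vmStatus struct {
	Agent VMAgentField `json:"agent"`
}

func decodeAgent(raw string) (int, error) {
	var st vmStatus
	if err := json.Unmarshal([]byte(raw), &st); err != nil {
		return 0, err
	}
	return st.Agent.Value, nil
}

func main() {
	for _, raw := range []string{
		`{"agent":1}`,
		`{"agent":{"enabled":1,"available":1}}`,
		`{"agent":{"enabled":1,"available":0}}`,
	} {
		v, err := decodeAgent(raw)
		fmt.Println(raw, "->", v, err)
	}
}
```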
rcourtman
af55362009 Fix inflated RAM usage reporting for LXC containers
Related to #553

## Problem

LXC containers showed inflated memory usage (e.g., 90%+ when actual usage was 50-60%,
96% when actual was 61%) because the code used the raw `mem` value from Proxmox's
`/cluster/resources` API endpoint. This value comes from cgroup `memory.current` which
includes reclaimable cache and buffers, making memory appear nearly full even when
plenty is available.

## Root Cause

- **Nodes**: Had sophisticated cache-aware memory calculation with RRD fallbacks
- **VMs (qemu)**: Had detailed memory calculation using guest agent meminfo
- **LXCs**: Naively used `res.Mem` directly without any cache-aware correction

The Proxmox cluster resources API's `mem` field for LXCs includes cache/buffers
(from cgroup memory accounting), which should be excluded for accurate "used" memory.

## Solution

Implement cache-aware memory calculation for LXC containers by:

1. Adding `GetLXCRRDData()` method to fetch RRD metrics for LXC containers from
   `/nodes/{node}/lxc/{vmid}/rrddata`
2. Using RRD `memavailable` to calculate actual used memory (total - available)
3. Falling back to RRD `memused` if `memavailable` is not available
4. Only using cluster resources `mem` value as last resort

This matches the approach already used for nodes and VMs, providing consistent
cache-aware memory reporting across all resource types.

## Changes

- Added `GuestRRDPoint` type and `GetLXCRRDData()` method to pkg/proxmox
- Added `GetLXCRRDData()` to ClusterClient for cluster-aware operations
- Modified LXC memory calculation in `pollPVEInstance()` to use RRD data when available
- Added guest memory snapshot recording for LXC containers
- Updated test stubs to implement the new interface method

## Testing

- Code compiles successfully
- Follows the same proven pattern used for nodes and VMs
- Includes diagnostic snapshot recording for troubleshooting
2025-11-06 00:16:18 +00:00
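
The fallback order described in af55362009 (RRD `memavailable`, then RRD `memused`, then the cluster resources `mem` value) could be sketched as below. The `rrdPoint` struct, the function name, and the source labels are hypothetical; the real `GuestRRDPoint` type is not shown in the log.

```go
package main

import "fmt"

// rrdPoint is a hypothetical stand-in for one RRD sample; nil pointers
// mean the metric was not reported.
type rrdPoint struct {
	MemAvailable *float64
	MemUsed      *float64
}

// f64 is a small helper for building samples.
func f64(v float64) *float64 { return &v }

// lxcMemoryUsed returns the used-memory estimate and which source it came
// from, trying the cache-aware sources first.
func lxcMemoryUsed(total, clusterMem uint64, rrd *rrdPoint) (uint64, string) {
	if rrd != nil && total > 0 {
		if rrd.MemAvailable != nil && *rrd.MemAvailable >= 0 &&
			uint64(*rrd.MemAvailable) <= total {
			// used = total - available excludes reclaimable cache.
			return total - uint64(*rrd.MemAvailable), "rrd-memavailable"
		}
		if rrd.MemUsed != nil && *rrd.MemUsed >= 0 &&
			uint64(*rrd.MemUsed) <= total {
			return uint64(*rrd.MemUsed), "rrd-memused"
		}
	}
	// Last resort: the raw cgroup-based value, which includes cache.
	return clusterMem, "cluster-resources"
}

func main() {
	used, src := lxcMemoryUsed(1000, 960, &rrdPoint{MemAvailable: f64(390)})
	fmt.Printf("used=%d source=%s\n", used, src)
}
```

With a total of 1000 units, a raw `mem` of 960, and 390 units available, the sketch reports 610 used rather than 960, which mirrors the 96% versus 61% example in the commit message.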
rcourtman
23691d5b41 Improve cluster health diagnostics and error messaging
Related to #405

Enhances error reporting and logging when all cluster endpoints are
unhealthy, making it easier to diagnose connectivity issues.

Changes:

1. Enhanced error messages in cluster_client.go:
   - Error now includes list of unreachable endpoints
   - Added detailed logging when no healthy endpoints available
   - Log at WARN level (not DEBUG) when cluster health check fails
   - Better context in recovery attempts with start/completion summaries

2. Improved storage polling resilience in monitor_polling.go:
   - Better error context when cluster storage polling fails
   - Specific guidance for "no healthy nodes available" scenario
   - Storage polling continues with direct node queries even if
     cluster-wide query fails (already worked, but now clearer)

3. Better recovery logging:
   - Log when recovery attempts start with list of unhealthy endpoints
   - Log individual recovery failures at DEBUG level
   - Log recovery summary (success/failure counts)
   - Track throttled endpoints separately for clearer diagnostics

These changes help users understand:
- Which specific endpoints are unreachable
- Whether it's a network/connectivity issue vs. API issue
- That Pulse will continue trying to recover endpoints automatically
- That storage monitoring continues via direct node queries

The root issue is that Pulse's internal health tracking can mark all
endpoints unhealthy when they're unreachable from the Pulse server,
even if Proxmox reports them as "online" in cluster status. Better
logging helps diagnose these network connectivity issues.
2025-11-05 19:44:29 +00:00
rcourtman
4c1d7a2797 Fix PMG API parameter issues causing 400 errors
Related to #614

Corrects three issues with PMG monitoring:

1. Remove unsupported timeframe parameter from GetMailStatistics
   - PMG API /statistics/mail does not accept timeframe parameter
   - Previously sent "timeframe=day" causing 400 error
   - API returns current day statistics by default

2. Fix GetMailCount timespan parameter to use seconds
   - Changed from 24 (hours) to 86400 (seconds)
   - PMG API expects timespan in seconds, not hours
   - Previously sent "timespan=24" causing 400 error

3. Update function signature and tests
   - Renamed GetMailCount parameter from timespanHours to timespanSeconds
   - Updated test expectations to match corrected API calls
   - Tests verify parameters are sent correctly

These changes align the PMG client with actual PMG API requirements,
fixing the data population issues reported in v4.25.0.
2025-11-05 19:28:37 +00:00
rcourtman
c93581e1aa Add DNS caching to reduce excessive DNS queries
Related to #608

Implements DNS caching using rs/dnscache to dramatically reduce DNS query
volume for frequently accessed Proxmox hosts. Users were reporting 260,000+
DNS queries in 37 hours for the same hostnames.

Changes:
- Added rs/dnscache dependency for DNS resolution caching
- Created pkg/tlsutil/dnscache.go with DNS cache wrapper
- Updated HTTP client creation to use cached DNS resolver
- Added DNSCacheTimeout configuration option (default: 5 minutes)
- Made DNS cache timeout configurable via:
  - system.json: dnsCacheTimeout field (seconds)
  - Environment variable: DNS_CACHE_TIMEOUT (duration string)
- DNS cache periodically refreshes to prevent stale entries

Benefits:
- Reduces DNS query load on local DNS servers by ~99%
- Reduces network traffic and DNS query log volume
- Maintains fresh DNS entries through periodic refresh
- Configurable timeout for different network environments

Default behavior: 5-minute cache timeout with automatic refresh
2025-11-05 18:25:38 +00:00
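
c93581e1aa names two configuration sources and a default for the cache timeout. A sketch of how they might be combined follows; the precedence (environment variable over system.json over the default) and the function name are assumptions, since the commit message does not state the order.

```go
package main

import (
	"fmt"
	"time"
)

// defaultDNSCacheTimeout matches the 5-minute default stated in the commit.
const defaultDNSCacheTimeout = 5 * time.Minute

// resolveDNSCacheTimeout picks a timeout from, in assumed order:
// the DNS_CACHE_TIMEOUT value (a duration string such as "10m"),
// the dnsCacheTimeout field from system.json (seconds), then the default.
func resolveDNSCacheTimeout(envValue string, fileSeconds int) time.Duration {
	if envValue != "" {
		if d, err := time.ParseDuration(envValue); err == nil && d > 0 {
			return d
		}
		// Invalid or non-positive values fall through to the next source.
	}
	if fileSeconds > 0 {
		return time.Duration(fileSeconds) * time.Second
	}
	return defaultDNSCacheTimeout
}

func main() {
	fmt.Println(resolveDNSCacheTimeout("", 0))
	fmt.Println(resolveDNSCacheTimeout("10m", 60))
	fmt.Println(resolveDNSCacheTimeout("", 120))
	fmt.Println(resolveDNSCacheTimeout("bogus", 0))
}
```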
rcourtman
6eb1a10d9b Refactor: Code cleanup and localStorage consolidation
This commit includes comprehensive codebase cleanup and refactoring:

## Code Cleanup
- Remove dead TypeScript code (types/monitoring.ts - 194 lines duplicate)
- Remove unused Go functions (GetClusterNodes, MigratePassword, GetClusterHealthInfo)
- Clean up commented-out code blocks across multiple files
- Remove unused TypeScript exports (helpTextClass, private tag color helpers)
- Delete obsolete test files and components

## localStorage Consolidation
- Centralize all storage keys into STORAGE_KEYS constant
- Update 5 files to use centralized keys:
  * utils/apiClient.ts (AUTH, LEGACY_TOKEN)
  * components/Dashboard/Dashboard.tsx (GUEST_METADATA)
  * components/Docker/DockerHosts.tsx (DOCKER_METADATA)
  * App.tsx (PLATFORMS_SEEN)
  * stores/updates.ts (UPDATES)
- Benefits: Single source of truth, prevents typos, better maintainability

## Previous Work Committed
- Docker monitoring improvements and disk metrics
- Security enhancements and setup fixes
- API refactoring and cleanup
- Documentation updates
- Build system improvements

## Testing
- All frontend tests pass (29 tests)
- All Go tests pass (15 packages)
- Production build successful
- Zero breaking changes

Total: 186 files changed, 5825 insertions(+), 11602 deletions(-)
2025-11-04 21:50:46 +00:00
rcourtman
32392d1212 Add disk metrics, block I/O, and mount details to Docker monitoring
Extends Docker container monitoring with comprehensive disk and storage information:
- Writable layer size and root filesystem usage displayed in new Disk column
- Block I/O statistics (read/write bytes totals) shown in container drawer
- Mount metadata including type, source, destination, mode, and driver details
- Configurable via --collect-disk flag (enabled by default, can be disabled for large fleets)

Also fixes config watcher to consistently use production auth config path instead of following PULSE_DATA_DIR when in mock mode.
2025-10-29 12:05:36 +00:00
rcourtman
f2acdd59af Normalize docker agent version handling 2025-10-28 08:42:58 +00:00
rcourtman
68ce8e7520 feat: finalize swarm service monitoring (#598) 2025-10-26 09:35:49 +00:00
rcourtman
5c54685f04 Add API token scopes and standalone host agent
Introduces granular permission scopes for API tokens (docker:report, docker:manage, host-agent:report, monitoring:read/write, settings:read/write) allowing tokens to be restricted to minimum required access. Legacy tokens default to full access until scopes are explicitly configured.

Adds standalone host agent for monitoring Linux, macOS, and Windows servers outside Proxmox/Docker estates. New Servers workspace in UI displays uptime, OS metadata, and capacity metrics from enrolled agents.

Includes comprehensive token management UI overhaul with scope presets, inline editing, and visual scope indicators.
2025-10-23 11:40:31 +00:00
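
The scope rule in 5c54685f04, including the legacy default, might be checked along these lines. The scope strings come from the commit message; the function name and the representation of a token's scopes as a string slice are assumptions.

```go
package main

import "fmt"

// tokenAllows reports whether a token may perform an action that needs the
// given scope. A token with no configured scopes is treated as a legacy
// token with full access, as the commit message describes.
func tokenAllows(scopes []string, required string) bool {
	if len(scopes) == 0 {
		return true // legacy token: full access until scopes are configured
	}
	for _, s := range scopes {
		if s == required {
			return true
		}
	}
	return false
}

func main() {
	agentToken := []string{"docker:report", "host-agent:report"}
	fmt.Println(tokenAllows(agentToken, "docker:report"))
	fmt.Println(tokenAllows(agentToken, "docker:manage"))
	fmt.Println(tokenAllows(nil, "docker:manage"))
}
```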
rcourtman
a885fb5472 Surface LXC interface IPs via PVE interfaces API (#596) 2025-10-23 08:07:32 +00:00
rcourtman
b95c01066e Capture dynamic LXC IP metrics (#596) 2025-10-23 07:50:45 +00:00
rcourtman
be85459db2 Add LXC config metadata for guest drawers (#596) 2025-10-23 07:30:32 +00:00
rcourtman
aac3dacd63 Improve LXC guest metrics visibility (#596) 2025-10-22 22:24:33 +00:00