Commit graph

9 commits

Author SHA1 Message Date
rcourtman
ebc29b4fdb feat: show pending apt updates for Proxmox nodes (#1083)
- Add PendingUpdates and PendingUpdatesCheckedAt fields to Node model
- Add GetNodePendingUpdates method to Proxmox client (calls /nodes/{node}/apt/update)
- Add 30-minute polling cache to avoid excessive API calls
- Add pendingUpdates to frontend Node type
- Add color-coded badge in NodeSummaryTable (yellow: 1-9, orange: 10+)
- Update test stubs for interface compliance

Requires Sys.Audit permission on Proxmox API token to read apt updates.
2026-01-21 10:53:36 +00:00
rcourtman
754e9d1abd Fix monitoring test panic and goroutine leaks
Two critical fixes to prevent test timeouts:

1. Nil map panic in TestPollPVEInstanceUsesRRDMemUsedFallback:
   - Test monitor was missing nodeLastOnline map initialization
   - Panic occurred when pollPVEInstance tried to update nodeLastOnline[nodeID]
   - Caused deadlock when panic recovery tried to acquire already-held mutex
   - Added nodeLastOnline: make(map[string]time.Time) to test monitor

2. Alert manager goroutine leak in Docker tests:
   - newTestMonitor() created alert manager but never stopped it
   - Background goroutines (escalationChecker, periodicSaveAlerts) kept running
   - Added t.Cleanup(func() { m.alertManager.Stop() }) to test helper

These fixes resolve the 10+ minute test timeouts in CI workflows.

Related to workflow run 19281508603.
2025-11-11 23:52:24 +00:00
rcourtman
d7766af799 Fix backend test failures blocking release workflow
Three categories of fixes:

1. Goroutine leak causing 10-minute timeout:
   - Add defer mon.notificationMgr.Stop() in monitor_memory_test.go
   - Background goroutines from notification manager weren't being stopped

2. Database NULL column scanning errors:
   - Change LastError from string to *string in queue.go
   - Change PayloadBytes from int to *int in queue.go
   - SQL NULL values require pointer types in Go

3. SSRF protection blocking test servers:
   - Check allowlist for localhost before rejecting in notifications.go
   - Set PULSE_DATA_DIR to temp directory in tests
   - Add defer nm.Stop() calls to prevent goroutine leaks

Fixes for preflight test failures in workflow run 19280879903.
2025-11-11 23:27:03 +00:00
rcourtman
af55362009 Fix inflated RAM usage reporting for LXC containers
Related to #553

## Problem

LXC containers showed inflated memory usage (e.g., 90%+ when actual usage was 50-60%,
96% when actual was 61%) because the code used the raw `mem` value from Proxmox's
`/cluster/resources` API endpoint. This value comes from cgroup `memory.current` which
includes reclaimable cache and buffers, making memory appear nearly full even when
plenty is available.

## Root Cause

- **Nodes**: Had sophisticated cache-aware memory calculation with RRD fallbacks
- **VMs (qemu)**: Had detailed memory calculation using guest agent meminfo
- **LXCs**: Naively used `res.Mem` directly without any cache-aware correction

The Proxmox cluster resources API's `mem` field for LXCs includes cache/buffers
(from cgroup memory accounting), which should be excluded for accurate "used" memory.

## Solution

Implement cache-aware memory calculation for LXC containers by:

1. Adding `GetLXCRRDData()` method to fetch RRD metrics for LXC containers from
   `/nodes/{node}/lxc/{vmid}/rrddata`
2. Using RRD `memavailable` to calculate actual used memory (total - available)
3. Falling back to RRD `memused` if `memavailable` is not available
4. Only using cluster resources `mem` value as last resort

This matches the approach already used for nodes and VMs, providing consistent
cache-aware memory reporting across all resource types.

## Changes

- Added `GuestRRDPoint` type and `GetLXCRRDData()` method to pkg/proxmox
- Added `GetLXCRRDData()` to ClusterClient for cluster-aware operations
- Modified LXC memory calculation in `pollPVEInstance()` to use RRD data when available
- Added guest memory snapshot recording for LXC containers
- Updated test stubs to implement the new interface method

## Testing

- Code compiles successfully
- Follows the same proven pattern used for nodes and VMs
- Includes diagnostic snapshot recording for troubleshooting
2025-11-06 00:16:18 +00:00
rcourtman
a885fb5472 Surface LXC interface IPs via PVE interfaces API (#596) 2025-10-23 08:07:32 +00:00
rcourtman
b95c01066e Capture dynamic LXC IP metrics (#596) 2025-10-23 07:50:45 +00:00
rcourtman
be85459db2 Add LXC config metadata for guest drawers (#596) 2025-10-23 07:30:32 +00:00
rcourtman
20ff56aceb Add coverage for PVE memused fallback #553 2025-10-22 17:14:12 +00:00
rcourtman
7ae393c8ec Refine Proxmox node memory fallback (#582) 2025-10-22 15:36:26 +00:00