Commit graph

6 commits

Author SHA1 Message Date
rcourtman
c7f4030c29 fix(monitoring): prevent memory leak from stale metrics history and rate tracker entries
MetricsHistory.Cleanup() was defined but never called, and even if called,
it only removed old data points without deleting map entries for deleted
containers/VMs. Each stale entry leaked ~224KB (7 pre-allocated slices).

Changes:
- Call metricsHistory.Cleanup() and rateTracker.Cleanup() in maintenance loop
- Delete map entries entirely when all data points have expired
- Return nil instead of empty slice in cleanupMetrics() to release backing arrays
- Add Cleanup() method to RateTracker with 24-hour stale threshold
- Add debug logging to track cleanup activity

Related to #1153
2026-02-03 17:16:06 +00:00
rcourtman
bd030c7c87 security: fix webhook SSRF, rate limit spoofing, metrics retention, and url poisoning
- Fix SSRF and rate limit bypass in SendEnhancedWebhook by validating the rendered URL.
- Fix rate limit spoofing in updates API by using secure IP extraction (trusted proxies).
- Fix memory leak in metrics history by correctly clearing fully stale data series.
- Fix public URL poisoning by preventing overwrites when explicitly configured.
2026-02-03 16:58:13 +00:00
rcourtman
4f824ab148 style: Apply gofmt to 37 files
Standardize code formatting across test files and monitor.go.
No functional changes.
2025-12-02 17:21:48 +00:00
rcourtman
ccd7851426 test: Add tests for lookupClusterEndpointLabel, GetNodeMetrics, stateDetails
- lookupClusterEndpointLabel: nil instance, no match, Host vs IP fallback,
  case-insensitive NodeName, whitespace handling (84.6% -> 100%)
- GetNodeMetrics: unknown metric types, empty nodeID, zero duration,
  empty metrics data, disk type (94.1% -> 100%)
- stateDetails: all breaker states including unknown/invalid state,
  retryAt calculations, timestamps (90% -> 100%)
2025-12-01 20:07:22 +00:00
rcourtman
63491e2672 test: Add tests for GetGuestMetrics, mergeSnapshot, getDockerCommandPayload
- GetGuestMetrics: all metric types, duration filtering, guest not found,
  unknown metric type, empty data (81% -> 100%)
- mergeSnapshot: timestamp merging for LastSuccess/LastError/LastMutated,
  ChangeHash preservation (86.7% -> 100%)
- getDockerCommandPayload: empty hostID, whitespace normalization, host
  not found, expired command cleanup, queued->dispatched marking,
  already-dispatched preservation (91.3% -> 100%)
2025-12-01 19:58:43 +00:00
rcourtman
d65a2fc87a Add unit tests for MetricsHistory (internal/monitoring)
58 test cases covering:
- NewMetricsHistory: constructor and initialization (4 cases)
- appendMetric: time-series append with retention and max limits (5 cases)
- cleanupMetrics: old point removal with cutoff time (5 cases)
- AddGuestMetric: all 7 metric types + unknown type handling (8 cases)
- AddNodeMetric: cpu/memory/disk + unknown type (4 cases)
- AddStorageMetric: usage/used/total/avail + unknown type (5 cases)
- GetGuestMetrics: duration filtering, nonexistent, zero duration (6 cases)
- GetNodeMetrics: duration filtering and edge cases (5 cases)
- GetAllGuestMetrics: multi-metric retrieval (3 cases)
- GetAllStorageMetrics: storage metric retrieval (2 cases)
- Cleanup: retention enforcement across all metric types (1 case)
- Multiple guests isolation: verify metrics don't leak (1 case)
- MaxDataPoints enforcement: oldest points dropped (1 case)
- RetentionTime enforcement: old points removed on append (1 case)

First comprehensive unit test file for metrics_history.go core
functions. Existing concurrency test in separate file.
2025-11-30 21:34:07 +00:00