Commit graph

11 commits

rcourtman
3fdf753a5b Enhance devcontainer and CI workflows
- Add persistent volume mounts for Go/npm caches (faster rebuilds)
- Add shell config with helpful aliases and custom prompt
- Add comprehensive devcontainer documentation
- Add pre-commit hooks for Go formatting and linting
- Use go-version-file in CI workflows instead of hardcoded versions
- Simplify docker compose commands with --wait flag
- Add gitignore entries for devcontainer auth files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 22:29:15 +00:00
rcourtman
78c3434061 fix: include VMID in AI context to prevent incorrect references
The LLM was confusing VMIDs because they weren't included in the
context. Now the formatted context shows:

  ### Container: ollama (VMID 200) on minipc

This prevents the AI from referencing the wrong VMID when generating
findings and recommendations.
2025-12-21 23:13:47 +00:00
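The heading format above could be produced by a helper roughly like the following Go sketch (the function name and signature are illustrative assumptions, not taken from the repository):

```go
package main

import "fmt"

// formatGuestHeading builds a context heading that embeds the VMID,
// so the model cannot confuse two guests that share a name.
// Hypothetical helper; the real formatter in the repo may differ.
func formatGuestHeading(kind, name string, vmid int, node string) string {
	return fmt.Sprintf("### %s: %s (VMID %d) on %s", kind, name, vmid, node)
}

func main() {
	fmt.Println(formatGuestHeading("Container", "ollama", 200, "minipc"))
	// → ### Container: ollama (VMID 200) on minipc
}
```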
rcourtman
9c58bfa127 perf: reduce MetricSamples from 100 to 24 points
100 samples were producing 326k+ input tokens, which is expensive.
24 samples (hourly resolution) still provides good pattern visibility
while significantly reducing token cost.

Estimated reduction: ~75% fewer metric tokens.
2025-12-21 22:56:19 +00:00
rcourtman
c15f260280 feat: increase MetricSamples to 100 points (~15 min resolution)
Modern LLMs have 100k+ token contexts. 100 samples over 24h gives
~15 minute resolution while adding minimal token overhead.

This lets the LLM see fine-grained patterns, short spikes, and
accurately distinguish anomalies from normal behavior.
2025-12-21 22:25:54 +00:00
rcourtman
d23f1c78de fix: increase MetricSamples to 24 points for hourly resolution
12 samples were too coarse (2-hour intervals could miss spikes).
24 samples gives ~hourly resolution while still being compact.
2025-12-21 22:24:02 +00:00
rcourtman
5877ce00c3 fix: use 24h window for MetricSamples (matches in-memory retention)
The in-memory MetricsHistory only retains 24 hours of data, not 7 days.
Changed computeGuestMetricSamples to use trendWindow24h instead of
trendWindow7d, and reduced sample count from 24 to 12 points.

This ensures the LLM actually receives metric samples in the context,
which wasn't happening before because the 7-day query returned empty data.
2025-12-21 22:19:40 +00:00
rcourtman
f6b1414ed6 debug: add logging to verify MetricSamples population for LLM context
2025-12-21 22:14:54 +00:00
rcourtman
2928fad643 feat(ai): pass raw metric samples to LLM for pattern interpretation
Instead of relying on pre-computed trend heuristics (which can be misleading
for edge cases like step changes vs continuous growth), we now pass downsampled
raw data points to the LLM so it can interpret patterns directly.

Changes:
- Add MetricSamples field to ResourceContext
- Add DownsampleMetrics() to reduce data points for LLM consumption
- Add formatMetricSamples() to format data compactly (e.g., 'Disk: 26→26→31%')
- Add computeGuestMetricSamples() to gather 7-day sampled history
- Populate MetricSamples for VMs and containers during context build
- Add History section to formatted context output

The LLM now sees actual patterns like 'stable for 6 days then jumped' rather
than just '45.8%/day growth rate' - allowing for much more nuanced interpretation.

This approach:
- Leverages LLM's pattern recognition instead of hard-coded heuristics
- Provides 7 days of data (~24 samples) for context on normal behavior
- Uses minimal tokens due to compact formatting with deduplication
- Is more future-proof as LLMs improve

Example output:
  **History (7d sampled, oldest→newest)**: Disk: 26→26→26→26→26→31%

Refs: Frigate disk usage false positive investigation
2025-12-21 21:09:24 +00:00
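The downsampling and compact formatting described above could be sketched like this in Go (function names and the even-spacing strategy are assumptions standing in for the repo's `DownsampleMetrics` and `formatMetricSamples`):

```go
package main

import (
	"fmt"
	"strings"
)

// downsample picks n evenly spaced points from a longer series.
// A hypothetical stand-in for the DownsampleMetrics helper.
func downsample(points []float64, n int) []float64 {
	if len(points) <= n {
		return points
	}
	out := make([]float64, 0, n)
	step := float64(len(points)-1) / float64(n-1)
	for i := 0; i < n; i++ {
		out = append(out, points[int(float64(i)*step+0.5)])
	}
	return out
}

// formatSamples renders values compactly, e.g. "Disk: 26→26→31%".
func formatSamples(label string, samples []float64) string {
	parts := make([]string, len(samples))
	for i, v := range samples {
		parts[i] = fmt.Sprintf("%.0f", v)
	}
	return fmt.Sprintf("%s: %s%%", label, strings.Join(parts, "→"))
}

func main() {
	raw := []float64{26, 26, 26, 26, 26, 31}
	fmt.Println(formatSamples("Disk", downsample(raw, 6)))
	// → Disk: 26→26→26→26→26→31%
}
```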
rcourtman
6aefeca979 feat: Enhance OCI container display and AI context
- Frontend: Add ociImage memo to extract clean image name from osTemplate
- Frontend: Show OCI image name in type badge tooltip
- Frontend: Display OCI image in OS column when no guest agent info available
- Frontend: Include ociImage in AI context data for selected OCI containers
- Backend: Differentiate OCI containers as 'oci_container' type in AI context
- Backend: Add Metadata field to ResourceContext for extensibility
- Backend: Include oci_image in container metadata for AI analysis
- Backend: Update section heading to 'LXC/OCI Containers' in AI context

This follows Docker container patterns to avoid duplicating work.
2025-12-12 18:00:09 +00:00
rcourtman
5a77fab633 feat(ai): Add baseline learning and anomaly detection (Phase 2)
Phase 2 of Pulse AI differentiation:

- Create internal/ai/baseline package for learned baselines
- Implement statistical baseline learning with mean, stddev, percentiles
- Add z-score based anomaly detection with severity classification
  (low, medium, high, critical based on standard deviations)
- Integrate baseline provider into context builder
- Wire baseline store into patrol service with adapters
- Add anomaly enrichment to resource contexts

Key features:
- Learn computes a baseline from historical metric data points
- IsAnomaly and CheckAnomaly detect deviations from normal
- Persists baselines to disk as JSON for durability
- Formatted anomaly descriptions for AI consumption
  Example: 'Memory is high above normal (85.2% vs typical 42.1% ± 8.3%)'

The baseline store still needs to be initialized and triggered to learn
from metrics history; the next step is adding the learning loop.

All tests passing.
2025-12-12 11:26:31 +00:00
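The z-score severity classification described above could look roughly like this in Go (the type, method names, and the exact standard-deviation thresholds are illustrative assumptions, not taken from `internal/ai/baseline`):

```go
package main

import (
	"fmt"
	"math"
)

// Baseline holds learned statistics for one metric.
// A sketch of what the baseline package might store.
type Baseline struct {
	Mean   float64
	StdDev float64
}

// Severity classifies how far a value sits from the baseline, measured
// in standard deviations. Thresholds here are illustrative.
func (b Baseline) Severity(value float64) string {
	if b.StdDev == 0 {
		return "none"
	}
	z := math.Abs(value-b.Mean) / b.StdDev
	switch {
	case z >= 5:
		return "critical"
	case z >= 4:
		return "high"
	case z >= 3:
		return "medium"
	case z >= 2:
		return "low"
	default:
		return "none"
	}
}

func main() {
	b := Baseline{Mean: 42.1, StdDev: 8.3}
	fmt.Printf("severity=%s (85.2%% vs typical %.1f%% ± %.1f%%)\n",
		b.Severity(85.2), b.Mean, b.StdDev)
}
```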
rcourtman
88d419dd5b feat(ai): Add enriched context with historical trends and predictions
Phase 1 of Pulse AI differentiation:

- Create internal/ai/context package with types, trends, builder, formatter
- Implement linear regression for trend computation (growing/declining/stable/volatile)
- Add storage capacity predictions (predicts days until 90% and 100%)
- Wire MetricsHistory from monitor to patrol service
- Update patrol to use buildEnrichedContext instead of basic summary
- Update patrol prompt to reference trend indicators and predictions

This gives the AI awareness of historical patterns, enabling it to:
- Identify resources with concerning growth rates
- Predict capacity exhaustion before it happens
- Distinguish stable high usage from growing problems
- Provide more actionable, time-aware insights

All tests passing. Falls back to basic summary if metrics history unavailable.
2025-12-12 09:45:57 +00:00