
12 commits

Author SHA1 Message Date
rcourtman
f45f7401c0 Make metrics Flush wait for queued writes 2026-03-25 14:14:00 +00:00
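The subject line only says that Flush now waits for queued writes. A common Go pattern for this, shown here as a sketch and not as the store's actual code, counts queued batches with a sync.WaitGroup and makes Flush call Wait. The writeQueue type and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// writeQueue is a hypothetical async writer: Enqueue hands a batch to a
// background goroutine, and Flush blocks until every queued batch is written.
type writeQueue struct {
	ch      chan []string
	pending sync.WaitGroup
	mu      sync.Mutex
	written []string // stands in for rows persisted to SQLite
}

func newWriteQueue() *writeQueue {
	q := &writeQueue{ch: make(chan []string, 16)}
	go q.run()
	return q
}

// run drains the queue and marks each batch done only after it is stored.
func (q *writeQueue) run() {
	for batch := range q.ch {
		q.mu.Lock()
		q.written = append(q.written, batch...)
		q.mu.Unlock()
		q.pending.Done()
	}
}

// Enqueue counts the batch as pending before sending it, so a Flush that
// starts after Enqueue returns always waits for this batch.
func (q *writeQueue) Enqueue(batch []string) {
	q.pending.Add(1)
	q.ch <- batch
}

// Flush waits for all queued batches instead of returning immediately.
func (q *writeQueue) Flush() { q.pending.Wait() }

func (q *writeQueue) Written() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.written)
}

func main() {
	q := newWriteQueue()
	q.Enqueue([]string{"cpu", "mem"})
	q.Enqueue([]string{"disk"})
	q.Flush()
	fmt.Println(q.Written()) // 3
}
```

A WaitGroup assumes each Enqueue happens before the Flush meant to cover it; if writers and Flush can race, a mutex-guarded counter with sync.Cond is the safer choice.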
rcourtman
2acf2e9ef9 Reduce metrics store transaction churn (#1124) 2026-03-25 12:06:28 +00:00
rcourtman
9d8f8b45b5 fix(docker,metrics): preserve container metadata on update and reduce DB writes
Docker container URL preserved on update (#1054): container updates
recreate the container with a new runtime ID. The agent now includes
{oldContainerId, newContainerId} in the completion ACK payload; the
server uses this to copy persisted metadata (custom URLs, descriptions,
tags) to the new ID so nothing is lost. Migration is a copy, not a move,
so rollback scenarios still find metadata under the original ID.

Reduce metrics.db write amplification (#1124): add a UNIQUE index on
(resource_type, resource_id, metric_type, timestamp, tier) so rollup
reprocessing after a failed checkpoint uses INSERT OR IGNORE instead of
creating duplicate rows. Existing duplicates are deduplicated once on
startup if the index creation would otherwise fail. Also sets
wal_autocheckpoint(500) so the WAL is checkpointed every 500 pages instead
of SQLite's default 1000, preventing unbounded WAL growth.

Fixes #1054
Fixes #1124
2026-02-18 12:56:46 +00:00
rcourtman
c2d2e7de0e fix: run retention before auto-vacuum migration to reduce VACUUM cost
VACUUM creates a full copy of the database. Running retention first
deletes stale data (5GB → ~60MB live), so the VACUUM copies far less
data: startup is faster and much less temporary disk space is needed.
2026-02-11 13:37:29 +00:00
rcourtman
06a9f9694f fix: migrate existing metrics databases to incremental auto-vacuum
The auto_vacuum(INCREMENTAL) pragma from the previous commit only takes
effect on new databases. SQLite requires a full VACUUM to restructure
existing files when switching from NONE to INCREMENTAL. Without this,
users upgrading from bloated 5GB+ databases would never reclaim space.

Adds a one-time migration on startup that detects the current auto_vacuum
mode and runs VACUUM to convert if needed. Subsequent startups skip the
migration since the mode is already INCREMENTAL.
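The detection step can be sketched as follows. SQLite reports `PRAGMA auto_vacuum` as 0 (NONE), 1 (FULL), or 2 (INCREMENTAL), and switching an existing file requires setting the pragma and then running VACUUM. The sqlExecer interface, fakeDB, and migrateToIncrementalVacuum are hypothetical stand-ins so the logic runs without a SQLite driver.

```go
package main

import "fmt"

// SQLite reports PRAGMA auto_vacuum as 0 (NONE), 1 (FULL) or 2 (INCREMENTAL).
const autoVacuumIncremental = 2

// sqlExecer is a hypothetical minimal view of the database handle.
type sqlExecer interface {
	QueryInt(query string) (int, error)
	Exec(query string) error
}

// migrateToIncrementalVacuum converts an existing database to incremental
// auto-vacuum. The pragma alone only affects new databases; the VACUUM that
// follows rebuilds the file in the new mode. It reports whether it ran.
func migrateToIncrementalVacuum(db sqlExecer) (bool, error) {
	mode, err := db.QueryInt("PRAGMA auto_vacuum")
	if err != nil {
		return false, err
	}
	if mode == autoVacuumIncremental {
		return false, nil // already converted: later startups skip the VACUUM
	}
	for _, stmt := range []string{"PRAGMA auto_vacuum = INCREMENTAL", "VACUUM"} {
		if err := db.Exec(stmt); err != nil {
			return false, fmt.Errorf("%s: %w", stmt, err)
		}
	}
	return true, nil
}

// fakeDB reports a fixed mode and records executed statements.
type fakeDB struct {
	mode     int
	executed []string
}

func (f *fakeDB) QueryInt(string) (int, error) { return f.mode, nil }

func (f *fakeDB) Exec(q string) error {
	f.executed = append(f.executed, q)
	return nil
}

func main() {
	db := &fakeDB{mode: 0}
	ran, err := migrateToIncrementalVacuum(db)
	fmt.Println(ran, err, len(db.executed))
}
```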
2026-02-11 13:35:03 +00:00
rcourtman
284bdd7ade fix: prevent metrics.db bloat with automatic vacuum and WAL checkpointing
The metrics database could grow to 5GB+ for modest setups because:
1. Retention deletes rows hourly but SQLite never reclaims the space
2. WAL file grows unbounded without explicit checkpointing
3. No cleanup runs on startup, so restarts accumulate stale data

Fixes:
- Enable auto_vacuum=INCREMENTAL so deleted pages can be reclaimed
- Run incremental_vacuum after each retention cleanup
- Force WAL checkpoint(TRUNCATE) after deletes to prevent WAL bloat
- Run retention on startup to clean stale data immediately

Expected DB size for a 50-resource setup drops from 5GB+ to ~60-70MB.

Ref: GitHub Discussion #1231
2026-02-10 23:13:32 +00:00
rcourtman
35eedcb5ac Fix: metrics store tier fallback for mock mode sparklines
When querying short time ranges (1h, 6h), the metrics store only looked
in TierRaw and TierMinute, which were empty in mock mode. The seeded data
was stored in TierHourly and TierDaily.

Updated tierFallbacks to include coarser tiers as fallbacks:
- TierRaw now falls back to TierMinute, then TierHourly
- TierMinute now falls back to TierRaw, then TierHourly

This ensures sparkline data is available in mock/demo mode where
historical data is seeded into coarser tiers.
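The fallback walk can be sketched as below. The Tier constant names and the tierFallbacks entries follow the commit message; the string values, the queryWithFallback function, and the lookup callback are hypothetical.

```go
package main

import "fmt"

type Tier string

const (
	TierRaw    Tier = "raw"
	TierMinute Tier = "minute"
	TierHourly Tier = "hourly"
	TierDaily  Tier = "daily"
)

// tierFallbacks lists the tiers to try, in order, after the primary tier.
var tierFallbacks = map[Tier][]Tier{
	TierRaw:    {TierMinute, TierHourly},
	TierMinute: {TierRaw, TierHourly},
}

// queryWithFallback returns points from the first tier that has any. lookup
// is a hypothetical stand-in for the store's per-tier query.
func queryWithFallback(primary Tier, lookup func(Tier) []float64) (Tier, []float64) {
	for _, t := range append([]Tier{primary}, tierFallbacks[primary]...) {
		if pts := lookup(t); len(pts) > 0 {
			return t, pts
		}
	}
	return primary, nil
}

func main() {
	// Mock mode: only the coarse tiers were seeded.
	seeded := map[Tier][]float64{TierHourly: {0.4, 0.6}, TierDaily: {0.5}}
	tier, pts := queryWithFallback(TierRaw, func(t Tier) []float64 { return seeded[t] })
	fmt.Println(tier, len(pts)) // hourly 2
}
```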
2026-02-03 12:03:06 +00:00
rcourtman
3b347b6548 fix: harden SQLite against I/O contention causing persistent lock errors
- Move all SQLite pragmas from db.Exec() to DSN parameters so every
  connection the pool creates gets busy_timeout and other settings.
  Previously only the first connection had these applied.
- Set MaxOpenConns(1) on audit, RBAC, and notification databases
  (metrics already had this), removing the chance of extra pool
  connections being opened without busy_timeout.
- Increase busy_timeout from 5s to 30s across all databases to
  tolerate disk I/O pressure during backup windows.
- Fix nested queries in GetRoles(), GetUserAssignments(), and
  CancelByAlertIDs() that would deadlock with MaxOpenConns(1).
- Fix circuit breaker retryInterval not resetting on recovery, which
  caused the next trip to start at 5-minute backoff instead of 5s.
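The last bullet describes a retry interval that must reset when the breaker recovers. A minimal sketch, assuming the interval doubles per consecutive trip between the 5s and 5-minute values named above; the doubling, the breaker type, and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

const (
	baseRetry = 5 * time.Second
	maxRetry  = 5 * time.Minute
)

// breaker is a hypothetical circuit breaker whose retry interval doubles on
// each consecutive trip, capped at maxRetry.
type breaker struct {
	retryInterval time.Duration
}

func newBreaker() *breaker { return &breaker{retryInterval: baseRetry} }

// trip returns how long to wait before the next attempt and backs off further.
func (b *breaker) trip() time.Duration {
	wait := b.retryInterval
	b.retryInterval *= 2
	if b.retryInterval > maxRetry {
		b.retryInterval = maxRetry
	}
	return wait
}

// onSuccess resets the interval. Without this reset, the next trip would
// reuse the last (up to 5-minute) backoff instead of starting at 5s.
func (b *breaker) onSuccess() { b.retryInterval = baseRetry }

func main() {
	b := newBreaker()
	for i := 0; i < 8; i++ {
		b.trip()
	}
	fmt.Println(b.trip()) // 5m0s
	b.onSuccess()
	fmt.Println(b.trip()) // 5s
}
```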

Related to #1156
2026-02-02 17:29:14 +00:00
rcourtman
8963d69764 feat: add metrics store point limiting and mock improvements
- Add point limiting to metrics queries
- Improve mock metrics history for testing
- Add monitor enhancements
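The message does not say how point limiting works. One common approach, shown here only as an assumed strategy, is to average consecutive buckets until at most maxPoints remain. The Point type and limitPoints function are hypothetical.

```go
package main

import "fmt"

// Point is a hypothetical time-series sample.
type Point struct {
	Timestamp int64
	Value     float64
}

// limitPoints downsamples pts to at most maxPoints by averaging consecutive
// buckets. Bucket averaging is an assumption, not the documented method.
func limitPoints(pts []Point, maxPoints int) []Point {
	if maxPoints <= 0 || len(pts) <= maxPoints {
		return pts
	}
	out := make([]Point, 0, maxPoints)
	for i := 0; i < maxPoints; i++ {
		// Integer bounds spread len(pts) points across maxPoints buckets;
		// every bucket is non-empty because len(pts) > maxPoints here.
		start := i * len(pts) / maxPoints
		end := (i + 1) * len(pts) / maxPoints
		var sum float64
		for _, p := range pts[start:end] {
			sum += p.Value
		}
		out = append(out, Point{
			Timestamp: pts[start].Timestamp, // bucket start time
			Value:     sum / float64(end-start),
		})
	}
	return out
}

func main() {
	pts := make([]Point, 10)
	for i := range pts {
		pts[i] = Point{Timestamp: int64(i * 60), Value: float64(i)}
	}
	for _, p := range limitPoints(pts, 5) {
		fmt.Println(p.Timestamp, p.Value)
	}
}
```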
2026-01-22 22:29:56 +00:00
rcourtman
c75972d57c Fix mock metrics history and guest drawer controls 2026-01-22 09:39:53 +00:00
rcourtman
c8b6cbfc6d feat(pro): long-term metrics history (30d/90d)
- Add FeatureLongTermMetrics license feature for Pro tier
- Implement tiered storage in metrics store (raw, minute, hourly, daily)
- Add covering index for unified history query performance
- Seed mock data for 90 days with appropriate aggregation tiers
- Update PULSE_PRO.md to document the feature
- 7-day history remains free, 30d/90d requires Pro license
2026-01-22 00:42:41 +00:00
rcourtman
2a8f55d719 feat(enterprise): add Advanced Reporting and Audit Webhooks integration
This commit adds enterprise-grade reporting and audit capabilities:

Reporting:
- Refactored metrics store from internal/ to pkg/ for enterprise access
- Added pkg/reporting with shared interfaces for report generation
- Created API endpoint: GET /api/admin/reports/generate
- New ReportingPanel.tsx for PDF/CSV report configuration

Audit Webhooks:
- Extended pkg/audit with webhook URL management interface
- Added API endpoint: GET/POST /api/admin/webhooks/audit
- New AuditWebhookPanel.tsx for webhook configuration
- Updated Settings.tsx with Reporting and Webhooks tabs

Server Hardening:
- Enterprise hooks now execute outside the mutex, with panic recovery
- Removed dbPath from the metrics Stats API to prevent path disclosure
- Added storage metrics persistence to the polling loop
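Running hooks outside the mutex with panic recovery is a standard Go pattern: snapshot the hook list under the lock, release it, then call each hook behind a deferred recover. The hookRegistry type, safeCall, and the string event argument are hypothetical; this is a sketch, not the project's hook API.

```go
package main

import (
	"fmt"
	"log"
	"sync"
)

// hookRegistry is a hypothetical holder for enterprise hooks.
type hookRegistry struct {
	mu    sync.Mutex
	hooks []func(event string)
}

func (r *hookRegistry) Register(h func(event string)) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.hooks = append(r.hooks, h)
}

// Fire copies the hook list under the lock and runs hooks after releasing it,
// so a slow hook does not hold the lock and a hook that calls Register does
// not deadlock. It returns how many hooks panicked.
func (r *hookRegistry) Fire(event string) (panics int) {
	r.mu.Lock()
	snapshot := append([]func(string){}, r.hooks...)
	r.mu.Unlock()

	for _, h := range snapshot {
		if safeCall(h, event) {
			panics++
		}
	}
	return panics
}

// safeCall runs h, recovering and logging any panic; it reports whether h panicked.
func safeCall(h func(string), event string) (panicked bool) {
	defer func() {
		if rec := recover(); rec != nil {
			log.Printf("hook panicked on %q: %v", event, rec)
			panicked = true
		}
	}()
	h(event)
	return false
}

func main() {
	var r hookRegistry
	delivered := 0
	r.Register(func(string) { panic("bad webhook config") })
	r.Register(func(string) { delivered++ })
	panics := r.Fire("user.login")
	fmt.Println(panics, delivered) // 1 1
}
```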

Documentation:
- Updated README.md feature table
- Updated docs/API.md with new endpoints
- Updated docs/PULSE_PRO.md with feature descriptions
- Updated docs/WEBHOOKS.md with audit webhooks section
2026-01-09 21:31:49 +00:00
Renamed from internal/metrics/store.go