vrr/Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-05-01 21:10:13 +00:00

📉 Adaptive Polling

Pulse uses an adaptive scheduler that adjusts each instance's polling interval based on its health and activity.

🧠 Architecture

Scheduler: Calculates intervals based on health/staleness.
Priority Queue: Min-heap keyed by NextRun.
Circuit Breaker: Prevents hot loops on failing instances.
Backoff: Exponential retry delays (5s to 5m).
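
The priority queue described above can be sketched with Go's `container/heap`. This is a minimal illustration and not Pulse's actual scheduler code: the `task` type, its `Instance` field, and the instance names are hypothetical, and only the `NextRun` key comes from the description.

```go
package main

import (
	"container/heap"
	"time"
)

// task is a hypothetical scheduled poll. Only NextRun is named in the text above.
type task struct {
	Instance string
	NextRun  time.Time
}

// taskQueue implements heap.Interface as a min-heap ordered by NextRun.
type taskQueue []*task

func (q taskQueue) Len() int           { return len(q) }
func (q taskQueue) Less(i, j int) bool { return q[i].NextRun.Before(q[j].NextRun) }
func (q taskQueue) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }
func (q *taskQueue) Push(x any)        { *q = append(*q, x.(*task)) }

func (q *taskQueue) Pop() any {
	old := *q
	last := old[len(old)-1]
	old[len(old)-1] = nil // drop the reference so it can be collected
	*q = old[:len(old)-1]
	return last
}

// exampleOrder queues three polls out of order and returns the instance
// names in the order they would be popped (earliest NextRun first).
func exampleOrder() []string {
	base := time.Unix(0, 0)
	q := &taskQueue{}
	heap.Push(q, &task{Instance: "pve-a", NextRun: base.Add(30 * time.Second)})
	heap.Push(q, &task{Instance: "pve-b", NextRun: base.Add(5 * time.Second)})
	heap.Push(q, &task{Instance: "pve-c", NextRun: base.Add(10 * time.Second)})

	var order []string
	for q.Len() > 0 {
		order = append(order, heap.Pop(q).(*task).Instance)
	}
	return order
}
```

Keying the heap by `NextRun` means the next poll that is due is always at the root, so a scheduler loop only has to inspect one element to decide how long to sleep.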

⚙️ Configuration

Adaptive polling is disabled by default.

UI

There is currently no dedicated UI for adaptive polling in v5.

Environment Variables

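| Variable | Default | Description |
| --- | --- | --- |
| `ADAPTIVE_POLLING_ENABLED` | `false` | Enables or disables adaptive polling. |
| `ADAPTIVE_POLLING_BASE_INTERVAL` | `10s` | Poll interval for healthy instances. |
| `ADAPTIVE_POLLING_MIN_INTERVAL` | `5s` | Shortest interval, used for active or busy instances. |
| `ADAPTIVE_POLLING_MAX_INTERVAL` | `5m` | Longest interval, used for idle instances and backoff. |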
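
To make the defaults concrete, here is a small sketch of how a Go program could read these four variables and fall back to the documented defaults. It assumes the interval values are Go duration strings (which `10s` and `5m` suggest) and that the boolean accepts anything `strconv.ParseBool` understands. The `pollingConfig` type and helper names are hypothetical, not Pulse's own.

```go
package main

import (
	"os"
	"strconv"
	"time"
)

// pollingConfig mirrors the four documented variables. The struct itself is
// illustrative and is not Pulse's internal type.
type pollingConfig struct {
	Enabled      bool
	BaseInterval time.Duration
	MinInterval  time.Duration
	MaxInterval  time.Duration
}

// lookupFunc matches os.LookupEnv so tests can supply their own values.
type lookupFunc func(key string) (string, bool)

// durationOr returns the parsed value of key, or def when the variable is
// unset or is not a positive Go duration string such as "10s" or "5m".
func durationOr(lookup lookupFunc, key string, def time.Duration) time.Duration {
	if raw, ok := lookup(key); ok {
		if d, err := time.ParseDuration(raw); err == nil && d > 0 {
			return d
		}
	}
	return def
}

// loadPollingConfig applies the documented defaults to anything left unset.
func loadPollingConfig(lookup lookupFunc) pollingConfig {
	cfg := pollingConfig{
		BaseInterval: durationOr(lookup, "ADAPTIVE_POLLING_BASE_INTERVAL", 10*time.Second),
		MinInterval:  durationOr(lookup, "ADAPTIVE_POLLING_MIN_INTERVAL", 5*time.Second),
		MaxInterval:  durationOr(lookup, "ADAPTIVE_POLLING_MAX_INTERVAL", 5*time.Minute),
	}
	if raw, ok := lookup("ADAPTIVE_POLLING_ENABLED"); ok {
		if enabled, err := strconv.ParseBool(raw); err == nil {
			cfg.Enabled = enabled
		}
	}
	return cfg
}

// loadFromEnvironment reads the real process environment.
func loadFromEnvironment() pollingConfig {
	return loadPollingConfig(os.LookupEnv)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the parsing logic testable without touching the process environment.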

system.json

You can also set `adaptivePollingEnabled` (and the related interval fields) in `system.json`, then restart Pulse for the change to take effect.

📊 Metrics

Metrics are exposed on port 9091 at `/metrics`.

| Metric | Type | Description |
| --- | --- | --- |
| `pulse_monitor_poll_total` | Counter | Total poll attempts. |
| `pulse_monitor_poll_duration_seconds` | Histogram | Poll latency. |
| `pulse_monitor_poll_staleness_seconds` | Gauge | Age since last success. |
| `pulse_monitor_poll_queue_depth` | Gauge | Queue size. |
| `pulse_monitor_poll_errors_total` | Counter | Error counts by category. |
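
For quick checks without a full monitoring stack, the scraped text can be parsed directly. The sketch below assumes the endpoint serves the Prometheus text exposition format, which the metric types suggest but this page does not state. The helper `sampleValues` is hypothetical, and any label names in your output will depend on what Pulse actually emits.

```go
package main

import (
	"bufio"
	"strconv"
	"strings"
)

// sampleValues returns the value of every sample of the named metric, keyed
// by its raw label string ("" when the sample has no labels). It is a
// simplified parser: comment lines are skipped and timestamps are ignored.
func sampleValues(exposition, name string) map[string]float64 {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || !strings.HasPrefix(line, name) {
			continue
		}
		rest := strings.TrimPrefix(line, name)
		labels := ""
		switch {
		case strings.HasPrefix(rest, "{"):
			end := strings.LastIndex(rest, "}")
			if end < 0 {
				continue // malformed label set
			}
			labels = rest[1:end]
			rest = rest[end+1:]
		case rest == "" || (rest[0] != ' ' && rest[0] != '\t'):
			continue // a longer metric name that only shares the prefix
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		if v, err := strconv.ParseFloat(fields[0], 64); err == nil {
			out[labels] = v
		}
	}
	return out
}
```

Calling `sampleValues(body, "pulse_monitor_poll_queue_depth")` on a scraped response body returns the current queue depth under the empty label key if the gauge is unlabelled.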

⚡ Circuit Breaker

| State | Trigger | Recovery |
| --- | --- | --- |
| Closed | Normal operation. | n/a |
| Open | ≥3 failures. | Backoff (max 5m). |
| Half-open | Retry window elapsed. | Success = Closed; Fail = Open. |
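
The state table can be read as a small state machine. The following is a rough sketch under stated assumptions, not Pulse's implementation: it assumes the failure count is consecutive, that backoff starts at 5s and doubles on each trip up to the 5m cap, and that a success resets everything. All type and function names are hypothetical.

```go
package main

import "time"

type breakerState int

const (
	stateClosed breakerState = iota
	stateOpen
	stateHalfOpen
)

const (
	failureThreshold = 3               // "≥3 failures" from the table
	initialBackoff   = 5 * time.Second // lower bound of the documented backoff range
	maxBackoff       = 5 * time.Minute // upper bound of the documented backoff range
)

type breaker struct {
	state    breakerState
	failures int // assumed to count consecutive failures
	backoff  time.Duration
	retryAt  time.Time
}

// allow reports whether a poll may run at now. An open breaker moves to
// half-open once its retry window has elapsed.
func (b *breaker) allow(now time.Time) bool {
	if b.state == stateOpen {
		if now.Before(b.retryAt) {
			return false
		}
		b.state = stateHalfOpen
	}
	return true
}

// recordSuccess closes the breaker and resets counters and backoff.
func (b *breaker) recordSuccess() { *b = breaker{} }

// recordFailure opens the breaker on the third failure, or immediately when
// the failed poll was a half-open probe.
func (b *breaker) recordFailure(now time.Time) {
	b.failures++
	if b.state != stateHalfOpen && b.failures < failureThreshold {
		return
	}
	if b.backoff == 0 {
		b.backoff = initialBackoff
	} else if b.backoff *= 2; b.backoff > maxBackoff {
		b.backoff = maxBackoff
	}
	b.state = stateOpen
	b.retryAt = now.Add(b.backoff)
}

// simulateFailures feeds n failures in a row, waiting out each retry window,
// and returns the resulting state and current backoff.
func simulateFailures(n int) (breakerState, time.Duration) {
	var b breaker
	now := time.Unix(0, 0)
	for i := 0; i < n; i++ {
		if !b.allow(now) {
			now = b.retryAt // jump to the end of the retry window
			b.allow(now)    // Open -> Half-open
		}
		b.recordFailure(now)
	}
	return b.state, b.backoff
}
```

Under these assumptions, the third failure opens the breaker with a 5s window, each failed half-open probe doubles the window, and the window stops growing once it reaches 5m.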

Dead Letter Queue: After 5 transient failures or 1 permanent failure, a task moves to the dead-letter queue (DLQ), which uses a 30m retry interval.
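
As an illustration of the dead-letter rule, the sketch below counts failures per task and reports when the documented thresholds are reached. It assumes transient failures are counted cumulatively per task and leaves out how an error is classified as transient or permanent, since neither is described here. All names are hypothetical.

```go
package main

import "time"

type failureKind int

const (
	transient failureKind = iota
	permanent
)

const (
	maxTransientFailures = 5                // documented threshold
	maxPermanentFailures = 1                // documented threshold
	dlqRetryInterval     = 30 * time.Minute // documented DLQ retry interval
)

// taskFailures tracks failures for a single task.
type taskFailures struct {
	transient int
	permanent int
}

// record counts one failure and reports whether the task should now move
// to the dead-letter queue.
func (f *taskFailures) record(kind failureKind) bool {
	if kind == permanent {
		f.permanent++
	} else {
		f.transient++
	}
	return f.permanent >= maxPermanentFailures || f.transient >= maxTransientFailures
}

// nextDLQRetry returns when a task dead-lettered at movedAt is next retried.
func nextDLQRetry(movedAt time.Time) time.Time {
	return movedAt.Add(dlqRetryInterval)
}
```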

🩺 Health API

`GET /api/monitoring/scheduler/health` (authentication required)

Returns:

Queue depth & breakdown.
Dead-letter tasks.
Circuit breaker states.
Per-instance staleness.
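
A client for this endpoint might look like the sketch below. Several details are assumptions rather than facts from this page: the token is sent in an `X-API-Token` header, the response is a JSON object, and the base URL is whatever address your Pulse instance listens on. Because the response schema is not documented here, the body is decoded into raw top-level fields instead of a typed struct.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

const schedulerHealthPath = "/api/monitoring/scheduler/health"

// newHealthRequest builds the GET request for the scheduler health endpoint.
func newHealthRequest(baseURL, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, strings.TrimRight(baseURL, "/")+schedulerHealthPath, nil)
	if err != nil {
		return nil, err
	}
	// ASSUMPTION: token auth via this header; adjust to your auth setup.
	req.Header.Set("X-API-Token", token)
	return req, nil
}

// fetchSchedulerHealth calls the endpoint and returns the top-level JSON
// fields undecoded, since their exact shape is not specified on this page.
func fetchSchedulerHealth(client *http.Client, baseURL, token string) (map[string]json.RawMessage, error) {
	req, err := newHealthRequest(baseURL, token)
	if err != nil {
		return nil, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("scheduler health: unexpected status %s", resp.Status)
	}
	var fields map[string]json.RawMessage
	if err := json.NewDecoder(resp.Body).Decode(&fields); err != nil {
		return nil, fmt.Errorf("scheduler health: decode response: %w", err)
	}
	return fields, nil
}
```

Pass an `*http.Client` with a timeout set, for example `&http.Client{Timeout: 10 * time.Second}`, so a stalled server cannot block the caller indefinitely.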
