Pulse

vrr/Pulse

Fork 0

mirror of https://github.com/rcourtman/Pulse.git synced 2026-04-28 19:41:17 +00:00

Commit graph

Author	SHA1	Message	Date
rcourtman	9b1709a05b	feat: enhance scheduler health API with rich instance metadata Add comprehensive instance-level diagnostics to /api/monitoring/scheduler/health New Response Structure: Enhanced "instances" array with per-instance details: - Instance metadata: displayName, type, connection URL - Poll status: last success/error timestamps, error messages, error category - Circuit breaker: state, timestamps, failure counts, retry windows - Dead letter: present flag, reason, attempt history, retry schedule Implementation: Data structures: - instanceInfo: cache of display names, URLs, types - pollStatus: tracks successes/errors with timestamps and categories - dlqInsight: DLQ entry metadata (reason, attempts, schedule) - circuitBreaker: enhanced with stateSince, lastTransition Tracking logic: - buildInstanceInfoCache: populate metadata from config on startup - recordTaskResult: track poll outcomes, error details, categories - sendToDeadLetter: capture DLQ insights (reason, timestamps) - circuitBreaker: record state transitions with timestamps Backward Compatible: - Existing fields (deadLetter, breakers, staleness) unchanged - New "instances" array is additive - Old clients can ignore new fields Testing: - Unit test: TestSchedulerHealth_EnhancedResponse validates all fields - Integration tests: still passing (55s) - All error tracking and breaker history verified Operator Benefits: - Diagnose issues without log digging - See error messages directly in API - Understand breaker states and retry schedules - Track DLQ entries with full context - Single API call for complete instance health view Example: Quickly identify "401 unauthorized" on specific PBS instance, see it's in DLQ after 5 retries, and know when next retry scheduled. Part of Phase 2 follow-up work to improve observability.	2025-10-20 15:13:38 +00:00

Author

SHA1

Message

Date

rcourtman

9b1709a05b

feat: enhance scheduler health API with rich instance metadata

Add comprehensive instance-level diagnostics to /api/monitoring/scheduler/health

**New Response Structure:**

Enhanced "instances" array with per-instance details:
- Instance metadata: displayName, type, connection URL
- Poll status: last success/error timestamps, error messages, error category
- Circuit breaker: state, timestamps, failure counts, retry windows
- Dead letter: present flag, reason, attempt history, retry schedule

**Implementation:**

Data structures:
- instanceInfo: cache of display names, URLs, types
- pollStatus: tracks successes/errors with timestamps and categories
- dlqInsight: DLQ entry metadata (reason, attempts, schedule)
- circuitBreaker: enhanced with stateSince, lastTransition

Tracking logic:
- buildInstanceInfoCache: populate metadata from config on startup
- recordTaskResult: track poll outcomes, error details, categories
- sendToDeadLetter: capture DLQ insights (reason, timestamps)
- circuitBreaker: record state transitions with timestamps

**Backward Compatible:**
- Existing fields (deadLetter, breakers, staleness) unchanged
- New "instances" array is additive
- Old clients can ignore new fields

**Testing:**
- Unit test: TestSchedulerHealth_EnhancedResponse validates all fields
- Integration tests: still passing (55s)
- All error tracking and breaker history verified

**Operator Benefits:**
- Diagnose issues without log digging
- See error messages directly in API
- Understand breaker states and retry schedules
- Track DLQ entries with full context
- Single API call for complete instance health view

Example: Quickly identify "401 unauthorized" on specific PBS instance,
see it's in DLQ after 5 retries, and know when next retry scheduled.

Part of Phase 2 follow-up work to improve observability.

2025-10-20 15:13:38 +00:00

1 commit