mirror of
https://github.com/rcourtman/Pulse.git
synced 2026-05-08 09:53:25 +00:00
Upgrade diagnostics infrastructure from 5/10 to 8/10 production readiness with enhanced metrics, logging, and request correlation capabilities.

**Request Correlation**
- Wire request IDs through context in middleware
- Return X-Request-ID header in all API responses
- Enable downstream log correlation across request lifecycle

**HTTP/API Metrics** (18 new Prometheus metrics)
- pulse_http_request_duration_seconds - API latency histogram
- pulse_http_requests_total - request counter by method/route/status
- pulse_http_request_errors_total - error counter by type
- Path normalization to control label cardinality

**Per-Node Poll Metrics**
- pulse_monitor_node_poll_duration_seconds - per-node timing
- pulse_monitor_node_poll_total - success/error counts per node
- pulse_monitor_node_poll_errors_total - error breakdown per node
- pulse_monitor_node_poll_last_success_timestamp - freshness tracking
- pulse_monitor_node_poll_staleness_seconds - age since last success
- Enables multi-node hotspot identification

**Scheduler Health Metrics**
- pulse_scheduler_queue_due_soon - ready queue depth
- pulse_scheduler_queue_depth - by instance type
- pulse_scheduler_queue_wait_seconds - time in queue histogram
- pulse_scheduler_dead_letter_depth - failed task tracking
- pulse_scheduler_breaker_state - circuit breaker state
- pulse_scheduler_breaker_failure_count - consecutive failures
- pulse_scheduler_breaker_retry_seconds - time until retry
- Enable alerting on DLQ spikes, breaker opens, queue backlogs

**Diagnostics Endpoint Caching**
- pulse_diagnostics_cache_hits_total - cache performance
- pulse_diagnostics_cache_misses_total - cache misses
- pulse_diagnostics_refresh_duration_seconds - probe timing
- 45-second TTL prevents thundering herd on /api/diagnostics
- Thread-safe with RWMutex
- X-Diagnostics-Cached-At header shows cache freshness

**Debug Log Performance**
- Gate high-frequency debug logs behind IsLevelEnabled() checks
- Reduces CPU waste in production when debug disabled
- Covers scheduler loops, poll cycles, API handlers

**Persistent Logging**
- File logging with automatic rotation
- LOG_FILE, LOG_MAX_SIZE, LOG_MAX_AGE, LOG_COMPRESS env vars
- MultiWriter sends logs to both stderr and file
- Gzip compression support for rotated logs

Files modified:
- internal/api/diagnostics.go (caching layer)
- internal/api/middleware.go (request IDs, HTTP metrics)
- internal/api/http_metrics.go (NEW - HTTP metric definitions)
- internal/logging/logging.go (file logging with rotation)
- internal/monitoring/metrics.go (node + scheduler metrics)
- internal/monitoring/monitor.go (instrumentation, debug gating)

Impact: Dramatically improved production troubleshooting with per-node visibility, scheduler health metrics, persistent logs, and cached diagnostics. Fast incident response now possible for multi-node deployments.

| Name |
|---|
| alerts.go |
| alerts_test.go |
| auth.go |
| config_handlers.go |
| csrf_store.go |
| demo_middleware.go |
| diagnostics.go |
| DO_NOT_EDIT_FRONTEND_HERE.md |
| docker_agents.go |
| frontend_embed.go |
| guest_metadata.go |
| http_metrics.go |
| middleware.go |
| notifications.go |
| oidc_handlers.go |
| oidc_service.go |
| rate_limit_config.go |
| rate_limit_config_test.go |
| ratelimit.go |
| README.md |
| recovery_tokens.go |
| router.go |
| router_integration_test.go |
| security.go |
| security_oidc.go |
| security_setup_fix.go |
| security_tokens.go |
| session_store.go |
| settings.go |
| system_settings.go |
| types.go |
| updates.go |

Internal API Package
This directory contains the API server implementation for Pulse.
Important Note About frontend-modern/
The frontend-modern/ subdirectory that appears here is:
- AUTO-GENERATED during builds
- NOT the source code - just a build artifact
- IN .gitignore - never committed
- REQUIRED BY GO - The embed directive needs it here
Frontend Development Location
👉 Edit frontend files at: /opt/pulse/frontend-modern/src/
Why This Structure?
Go's //go:embed directive has limitations:
- Cannot use ../ paths to access parent directories
- Cannot follow symbolic links
- Must embed files within the Go module
This is a known Go limitation and our structure works around it.
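
To make the path restriction concrete, here is a small sketch that checks candidate paths against roughly the same rules the Go documentation gives for //go:embed patterns (relative, slash-separated, no "." or ".." elements). The helper `embeddable` is hypothetical and is not part of Pulse or the Go toolchain; it leans on the standard library's `fs.ValidPath`, which enforces similar constraints for `fs.FS` paths, so treat it as an approximation rather than the compiler's exact check.

```go
package main

import (
	"fmt"
	"io/fs"
)

// embeddable is a hypothetical helper that approximates the path rules for
// //go:embed patterns. fs.ValidPath rejects rooted paths, empty elements,
// and "." or ".." elements; it accepts the bare root ".", which an embed
// pattern may not be, so that case is excluded explicitly.
func embeddable(pattern string) bool {
	return pattern != "." && fs.ValidPath(pattern)
}

func main() {
	candidates := []string{
		"frontend-modern",                // inside the package directory
		"../frontend-modern",             // reaches into the parent directory
		"/opt/pulse/frontend-modern/src", // absolute path
	}
	for _, p := range candidates {
		fmt.Printf("%-32s embeddable=%v\n", p, embeddable(p))
	}
}
```

Only the first candidate passes, which is why the build output has to be copied into this directory before the Go build runs.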