Commit graph

20 commits

Author SHA1 Message Date
rcourtman
778a2577b6 feat: Pulse v6 release 2026-03-18 16:06:30 +00:00
rcourtman
0ca6001bad docs: update documentation after sensor proxy deprecation
Update docs to reflect the simplified temperature monitoring architecture:
- Remove references to pulse-sensor-proxy throughout
- Update TEMPERATURE_MONITORING.md to focus on unified agent approach
- Update CONFIGURATION.md, DEPLOYMENT_MODELS.md, FAQ.md
- Remove SECURITY_CHANGELOG.md (proxy-specific security notes)
- Clarify current recommended setup in various guides
2026-01-21 12:00:59 +00:00
rcourtman
ee63d438cc docs: standardize markdown syntax and remove deprecated sensor-proxy docs 2026-01-20 09:43:49 +00:00
rcourtman
80729408c1 docs: add RBAC endpoints, OIDC group mapping, and update Pro terminology
- Add RBAC/role management endpoints to API.md
- Document OIDC group-to-role mapping feature in OIDC.md
- Add missing config files to CONFIGURATION.md (audit.db, AI files)
- Add OIDC_GROUP_ROLE_MAPPINGS env var documentation
- Fix "enterprise" -> "Pro" terminology in TROUBLESHOOTING.md
- Refocus TEMPERATURE_MONITORING.md on agent method, collapse legacy proxy docs
2026-01-10 13:59:50 +00:00
rcourtman
73c5128a87 feat(audit): Add audit log API endpoints and UI with signature verification
- Add GET /api/audit endpoint for listing events with filters
- Add GET /api/audit/:id/verify endpoint for signature verification
- Add AuditLogPanel UI component with filtering and verification
- Update docs with audit API documentation
- Add localStorage utils for persisting UI state
- Update gitignore patterns
2026-01-08 19:19:57 +00:00
rcourtman
3f0808e9f9 docs: comprehensive core and Pro documentation overhaul
- Major updates to README.md and docs/README.md for Pulse v5
- Added technical deep-dives for Pulse Pro (docs/PULSE_PRO.md) and AI Patrol (docs/AI.md)
- Updated Prometheus metrics documentation and Helm schema for metrics separation
- Refreshed security, installation, and deployment documentation for unified agent models
- Cleaned up legacy summary files
2026-01-07 17:38:27 +00:00
rcourtman
9cfcdbb247 fix: Use per-node shared flag for storage deduplication
The storage deduplication logic only checked cluster config's Shared
flag, but this required the cluster config API call to succeed. When
the per-node storage API already returns shared=1 (as the user
verified), we should use that directly.

Now we check three sources for shared storage detection:
1. Per-node API shared flag (storage.Shared)
2. Cluster config shared flag (if available)
3. Storage type heuristics (NFS, RBD, PBS, etc.)

Related to #1049
2026-01-07 10:16:23 +00:00
rcourtman
65829983b5 v5: gate legacy sensor-proxy and prune dev docs 2025-12-18 21:51:25 +00:00
rcourtman
2b48b0a459 feat: add --kube-include-all-deployments flag for Kubernetes agent
Adds IncludeAllDeployments option to show all deployments, not just
problem ones (where replicas don't match desired). This provides parity
with the existing --kube-include-all-pods flag.

- Add IncludeAllDeployments to kubernetesagent.Config
- Add --kube-include-all-deployments flag and PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS env var
- Update collectDeployments to respect the new flag
- Add test for IncludeAllDeployments functionality
- Update UNIFIED_AGENT.md documentation

Addresses feedback from PR #855
2025-12-18 20:58:30 +00:00
courtmanr@gmail.com
a8378b9e0c Refactor agent and troubleshooting docs to be modern and concise 2025-11-25 00:18:10 +00:00
rcourtman
e39c6a3660 docs(sensor-proxy): comprehensive config management documentation
Adds complete documentation for the new sensor-proxy config management CLI
implemented in Phase 2. Addresses user-facing aspects of the corruption fix.

**New Documentation:**
- docs/operations/sensor-proxy-config-management.md (469 lines)
  - Complete operations runbook for config management
  - Full CLI reference with examples
  - Migration guide from inline config
  - Architecture explanation
  - Common operational tasks
  - Troubleshooting guide
  - Best practices and automation

**Updated Documentation:**
- cmd/pulse-sensor-proxy/README.md
  - Configuration Management CLI section
  - Allowed Nodes File format
  - Enhanced troubleshooting
  - Config corruption recovery

- docs/TEMPERATURE_MONITORING.md
  - Config validation failure troubleshooting
  - Configuration Management quick reference
  - Cross-links to detailed docs

- docs/TROUBLESHOOTING.md
  - Sensor proxy config validation errors
  - Comprehensive diagnosis steps
  - Automatic and manual recovery

- README.md & docs/README.md
  - Added new runbook to operations index
  - Positioned for discoverability

**Coverage:**
- Both CLI commands fully documented
- Phase 1 & Phase 2 architecture explained
- Migration path from pre-v4.31.1
- Config corruption recovery procedures
- Safe config editing practices
- Automation examples
- Troubleshooting all failure modes

**Documentation Quality:**
- Cross-linked from 5 different documents
- Clear examples for common use cases
- Target audience: system administrators
- Follows project documentation style
- Production-ready

This completes the sensor-proxy config corruption fix by providing users
with comprehensive guidance for the new config management system.

Related to Phase 2 commits 3dc073a28, 804a638ea, 131666bc1
2025-11-19 10:01:33 +00:00
rcourtman
2850f20dad docs: add auto-update troubleshooting 2025-11-14 01:07:32 +00:00
rcourtman
8ca31003a0 docs: document TLS certificate file permissions for HTTPS setup
Add comprehensive documentation for HTTPS/TLS configuration including:
- File ownership and permission requirements (pulse user)
- Common troubleshooting steps for startup failures
- Complete setup examples for systemd and Docker
- Validation commands for certificate/key verification

Related to discussion #634
2025-11-05 23:08:02 +00:00
rcourtman
6eb1a10d9b Refactor: Code cleanup and localStorage consolidation
This commit includes comprehensive codebase cleanup and refactoring:

## Code Cleanup
- Remove dead TypeScript code (types/monitoring.ts - 194 lines duplicate)
- Remove unused Go functions (GetClusterNodes, MigratePassword, GetClusterHealthInfo)
- Clean up commented-out code blocks across multiple files
- Remove unused TypeScript exports (helpTextClass, private tag color helpers)
- Delete obsolete test files and components

## localStorage Consolidation
- Centralize all storage keys into STORAGE_KEYS constant
- Update 5 files to use centralized keys:
  * utils/apiClient.ts (AUTH, LEGACY_TOKEN)
  * components/Dashboard/Dashboard.tsx (GUEST_METADATA)
  * components/Docker/DockerHosts.tsx (DOCKER_METADATA)
  * App.tsx (PLATFORMS_SEEN)
  * stores/updates.ts (UPDATES)
- Benefits: Single source of truth, prevents typos, better maintainability

## Previous Work Committed
- Docker monitoring improvements and disk metrics
- Security enhancements and setup fixes
- API refactoring and cleanup
- Documentation updates
- Build system improvements

## Testing
- All frontend tests pass (29 tests)
- All Go tests pass (15 packages)
- Production build successful
- Zero breaking changes

Total: 186 files changed, 5825 insertions(+), 11602 deletions(-)
2025-11-04 21:50:46 +00:00
rcourtman
e0396c1362 docs: update documentation for diagnostics improvements
Add comprehensive operator documentation for the new observability features
introduced in the previous commit.

**New Documentation:**
- docs/monitoring/PROMETHEUS_METRICS.md - Complete reference for all 18 new
  Prometheus metrics with alert suggestions

**Updated Documentation:**
- docs/API.md - Document X-Request-ID and X-Diagnostics-Cached-At headers,
  explain diagnostics endpoint caching behavior
- docs/TROUBLESHOOTING.md - Add section on correlating API calls with logs
  using request IDs
- docs/operations/ADAPTIVE_POLLING_ROLLOUT.md - Update monitoring checklists
  with new per-node and scheduler metrics
- docs/CONFIGURATION.md - Clarify LOG_FILE dual-output behavior and rotation
  defaults

These updates ensure operators understand:
- How to set up monitoring/alerting for new metrics
- How to configure file logging with rotation
- How to troubleshoot using request correlation
- What metrics are available for dashboards

Related to: 495e6c794 (feat: comprehensive diagnostics improvements)
2025-10-21 12:45:19 +00:00
rcourtman
ddc9a7a068 docs: comprehensive documentation for rate limit fix and configurability
Document the pulse-sensor-proxy rate limiting bug fix and new
configurability across all relevant documentation:

TEMPERATURE_MONITORING.md:
- Added 'Rate Limiting & Scaling' section with symptom diagnosis
- Included sizing table for 1-3, 4-10, 10-20, and 30+ node deployments
- Provided tuning formula: interval_ms = polling_interval / node_count

TROUBLESHOOTING.md:
- Added 'Temperature data flickers after adding nodes' section
- Step-by-step diagnosis using limiter metrics and scheduler health
- Quick fix with config example

CONFIGURATION.md:
- Added pulse-sensor-proxy/config.yaml reference section
- Documented rate_limit.per_peer_interval_ms and per_peer_burst fields
- Included defaults and example override

pulse-sensor-proxy-runbook.md:
- Updated quick reference with new defaults (1 req/sec, burst 5)
- Added 'Rate Limit Tuning' procedure with 4 deployment profiles
- Included validation steps and monitoring commands

TEMPERATURE_MONITORING_SECURITY.md:
- Updated rate limiting section with new defaults
- Added configurable overrides guidance
- Documented security considerations for production deployments

Related commits:
- 46b8b8d08: Initial rate limit fix (hardcoded defaults)
- ca534e2b6: Made rate limits configurable via YAML
- e244da837: Added guidance for large deployments (30+ nodes)
2025-10-21 11:36:07 +00:00
rcourtman
c91b7874ac docs: comprehensive v4.24.0 documentation audit and updates
Complete documentation overhaul for Pulse v4.24.0 release covering all new
features and operational procedures.

Documentation Updates (19 files):

P0 Release-Critical:
- Operations: Rewrote ADAPTIVE_POLLING_ROLLOUT.md as GA operations runbook
- Operations: Updated ADAPTIVE_POLLING_MANAGEMENT_ENDPOINTS.md with DEFERRED status
- Operations: Enhanced audit-log-rotation.md with scheduler health checks
- Security: Updated proxy hardening docs with rate limit defaults
- Docker: Added runtime logging and rollback procedures

P1 Deployment & Integration:
- KUBERNETES.md: Runtime logging config, adaptive polling, post-upgrade verification
- PORT_CONFIGURATION.md: Service naming, change tracking via update history
- REVERSE_PROXY.md: Rate limit headers, error pass-through, v4.24.0 verification
- PROXY_AUTH.md, OIDC.md, WEBHOOKS.md: Runtime logging integration
- TROUBLESHOOTING.md, VM_DISK_MONITORING.md, zfs-monitoring.md: Updated workflows

Features Documented:
- X-RateLimit-* headers for all API responses
- Updates rollback workflow (UI & CLI)
- Scheduler health API with rich metadata
- Runtime logging configuration (no restart required)
- Adaptive polling (GA, enabled by default)
- Enhanced audit logging
- Circuit breakers and dead-letter queue

Supporting Changes:
- Discovery service enhancements
- Config handlers updates
- Sensor proxy installer improvements

Total Changes: 1,626 insertions(+), 622 deletions(-)
Files Modified: 24 (19 docs, 5 code)

All documentation is production-ready for v4.24.0 release.
2025-10-20 17:20:13 +00:00
rcourtman
3a4fc044ea Add guest agent caching and update doc hints (refs #560) 2025-10-16 08:15:49 +00:00
rcourtman
156fd34c50 Update Proxmox guest agent permissions docs and tooling (refs #548) 2025-10-14 10:21:52 +00:00
rcourtman
f46ff1792b Fix settings security tab navigation 2025-10-11 23:29:47 +00:00