Document the pulse-sensor-proxy rate limiting bug fix and new
configurability across all relevant documentation:
TEMPERATURE_MONITORING.md:
- Added 'Rate Limiting & Scaling' section with symptom diagnosis
- Included sizing table for 1-3, 4-10, 10-20, and 30+ node deployments
- Provided tuning formula: interval_ms = polling_interval / node_count
TROUBLESHOOTING.md:
- Added 'Temperature data flickers after adding nodes' section
- Step-by-step diagnosis using limiter metrics and scheduler health
- Quick fix with config example
CONFIGURATION.md:
- Added pulse-sensor-proxy/config.yaml reference section
- Documented rate_limit.per_peer_interval_ms and per_peer_burst fields
- Included defaults and example override
pulse-sensor-proxy-runbook.md:
- Updated quick reference with new defaults (1 req/sec, burst 5)
- Added 'Rate Limit Tuning' procedure with 4 deployment profiles
- Included validation steps and monitoring commands
TEMPERATURE_MONITORING_SECURITY.md:
- Updated rate limiting section with new defaults
- Added configurable overrides guidance
- Documented security considerations for production deployments
Related commits:
- 46b8b8d08: Initial rate limit fix (hardcoded defaults)
- ca534e2b6: Made rate limits configurable via YAML
- e244da837: Added guidance for large deployments (30+ nodes)
The previous diagrams were too complex and overwhelming. Simplified
all diagrams to show core concepts clearly:
- Adaptive polling: reduced to basic scheduler→queue→workers flow
- Temperature proxy: simplified to 3-box trust boundary view
- Sensor proxy sequence: simplified to essential request flow
- Webhook pipeline: reduced to template→send→retry flow
- Script library: simplified to code→test→bundle→dist flow
Fixed parsing error in temperature proxy diagram (parentheses in
edge label causing render failure).
Diagrams should clarify architecture, not recreate implementation.
Implements all remaining Codex recommendations before launch:
1. Privileged Methods Tests:
- TestPrivilegedMethodsCompleteness ensures all host-side RPCs are protected
- Will fail if new privileged RPC is added without authorization
- Verifies read-only methods are NOT in privilegedMethods
2. ID-Mapped Root Detection Tests:
- TestIDMappedRootDetection covers all boundary conditions
- Tests UID/GID range detection (both must be in range)
- Tests multiple ID ranges, edge cases, disabled mode
- 100% coverage of container identification logic
3. Authorization Tests:
- TestPrivilegedMethodsBlocked verifies containers can't call privileged RPCs
- TestIDMappedRootDisabled ensures feature can be disabled
- Tests both container and host credentials
4. Comprehensive Security Documentation (23 KB):
- Architecture overview with diagrams
- Complete authentication & authorization flow
- Rate limiting details (already implemented: 20/min per peer)
- SSH security model and forced commands
- Container isolation mechanisms
- Monitoring & alerting recommendations
- Development mode documentation (PULSE_DEV_ALLOW_CONTAINER_SSH)
- Troubleshooting guide with common issues
- Incident response procedures
Rate Limiting Status:
- Already implemented in throttle.go (20 req/min, burst 10, max 10 concurrent)
- Per-peer rate limiting at line 328 in main.go
- Per-node concurrency control at line 825 in main.go
- Exceeds Codex's requirements
All tests pass. Documentation covers all security aspects.
Addresses final Codex recommendations for production readiness.