Patrol runs, evaluation passes, and QuickAnalysis calls were consuming
LLM tokens without recording them in the cost store. This made the
cost_budget_usd_30d budget setting ineffective since enforceBudget()
never saw patrol spend.
- Add RecordUsage() to ai.Service for thread-safe cost recording
- Add recordPatrolUsage() helper to PatrolService, called on both
success and error paths for main patrol and evaluation pass
- Record QuickAnalysis token usage in cost store
- Return partial PatrolResponse (with token counts) on error instead
of nil, so callers can always record consumed tokens
- Propagate partial response through chat_service_adapter on error
Send an SSE comment immediately when a client connects to the patrol
stream endpoint. This flushes HTTP headers so clients receive the
200 response right away, rather than blocking until the first event.
This fixes eval tests where the stream connection would time out
waiting for headers while patrol was still initializing.
- Add ExecutePatrolStream method to chat.Service for patrol-specific execution
- Create chat_service_adapter.go to bridge chat.Service to ai.ChatServiceProvider
- Remove standalone patrol.go and patrol_test.go from chat package
- Add PatrolRequest/PatrolResponse types to chat service
- Add context injection for recent message context
This allows patrol to use an isolated agentic loop with its own system prompt
while leveraging the common chat infrastructure.