cmd/eval/main.go:
- Fix fmt.Errorf format string lint warning (use %s instead of bare string)
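For context, a minimal sketch of the kind of change this lint fix describes. The helper name wrapMessage is hypothetical and is not taken from cmd/eval/main.go. Passing a message variable directly as the format string means any "%" in it is interpreted as a verb, so the message goes through "%s" instead:

```go
package main

import "fmt"

// wrapMessage is a hypothetical helper used only for illustration.
// The flagged form would be fmt.Errorf(msg), where msg is treated
// as a format string. Routing it through "%s" keeps it literal.
func wrapMessage(msg string) error {
	return fmt.Errorf("%s", msg)
}

func main() {
	// The percent sign survives because msg is an argument, not the format.
	fmt.Println(wrapMessage("patrol reached 100% of nodes"))
}
```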
internal/logging/logging_test.go:
- Update tests to account for LogBroadcaster wrapper in baseWriter
- Use string representation checks instead of direct pointer comparison
- Verify both the underlying writer and broadcaster are present
Refactor patrol eval runner to use a dual approach:
1. Poll GET /api/ai/patrol/status until Running=false (primary signal)
2. Best-effort SSE stream connection for tool event visibility
Changes:
- Add status polling loop with configurable timeout
- Make SSE stream optional (may not connect in time)
- Add Completed flag to PatrolRunResult
- Improve assertion error messages
- Add new scenarios and assertions
This is more reliable than relying solely on the SSE stream, which
may time out waiting for headers during slow patrol initialization.
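The polling half of this approach could look roughly like the sketch below. It is an illustration under assumptions, not the runner's actual code: the names waitForPatrol, fetchRunning, patrolStatus and runDemo are hypothetical, the JSON field name is assumed to be "running", and the server is a local httptest stand-in that reports Running=true for the first two polls. The best-effort SSE connection is left out.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// patrolStatus models only the Running flag; the real payload is assumed.
type patrolStatus struct {
	Running bool `json:"running"`
}

// fetchRunning performs one GET against the status endpoint.
func fetchRunning(ctx context.Context, client *http.Client, baseURL string) (bool, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+"/api/ai/patrol/status", nil)
	if err != nil {
		return false, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var st patrolStatus
	if err := json.NewDecoder(resp.Body).Decode(&st); err != nil {
		return false, err
	}
	return st.Running, nil
}

// waitForPatrol polls until Running is false or the context expires.
// Failed polls are treated as "not finished yet" and retried.
func waitForPatrol(ctx context.Context, client *http.Client, baseURL string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		running, err := fetchRunning(ctx, client, baseURL)
		if err == nil && !running {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("patrol still running: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

// runDemo polls a fake server and returns how many polls it received.
func runDemo() (int32, error) {
	var count int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		n := atomic.AddInt32(&count, 1)
		json.NewEncoder(w).Encode(patrolStatus{Running: n < 3})
	}))
	defer srv.Close()

	// The timeout plays the role of the configurable polling timeout.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	err := waitForPatrol(ctx, srv.Client(), srv.URL, 10*time.Millisecond)
	return atomic.LoadInt32(&count), err
}

func main() {
	polls, err := runDemo()
	fmt.Println("polls:", polls, "err:", err)
}
```

Treating the status endpoint as the primary signal means a missed or late stream connection costs only visibility into tool events, not the completion result.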
Add comprehensive patrol evaluation framework:
- patrol.go: Runner for patrol scenarios with streaming support
- patrol_assertions.go: Assertions for tool usage, findings, timing
- patrol_scenarios.go: Scenarios for basic, investigation, finding quality
- eval_test.go: Unit tests for patrol eval runner
Scenarios:
- patrol-basic: Verifies patrol completes with tools and findings
- patrol-investigation: Ensures investigation before reporting
- patrol-finding-quality: Validates finding structure and evidence
Run with: go run ./cmd/eval -scenario patrol
- Add retry logic for transient failures (phantom, stream, empty response)
- Add environment variable overrides for infrastructure naming
- Add JSON report output per scenario
- Expand assertions with new validation types
- Add more comprehensive test scenarios
- Add docs/EVAL.md with usage documentation
The eval harness now handles flaky AI responses better and writes a
JSON report per scenario to help with debugging.
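As an illustration of the retry behaviour listed above, here is a minimal sketch. The names withRetry, flakyScenario and errTransient are hypothetical, and the single sentinel error stands in for the phantom, stream and empty-response cases; how the harness actually classifies them is not shown in this log.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient marks failures worth retrying (an assumption for this sketch).
var errTransient = errors.New("transient failure")

// withRetry calls run until it succeeds, fails with a non-transient error,
// or maxAttempts is reached. It returns the number of attempts made.
func withRetry(maxAttempts int, delay time.Duration, run func(attempt int) error) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = run(attempt)
		if err == nil || !errors.Is(err, errTransient) {
			return attempt, err
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
		}
	}
	return maxAttempts, fmt.Errorf("still failing after %d attempts: %w", maxAttempts, err)
}

// flakyScenario simulates a scenario that fails transiently a fixed
// number of times before succeeding.
func flakyScenario(failures int) func(int) error {
	return func(attempt int) error {
		if attempt <= failures {
			return fmt.Errorf("empty response on attempt %d: %w", attempt, errTransient)
		}
		return nil
	}
}

func main() {
	attempts, err := withRetry(3, 10*time.Millisecond, flakyScenario(2))
	fmt.Println("attempts:", attempts, "err:", err)
}
```

Non-transient errors return immediately, so genuine assertion failures are not hidden by the retries.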