Pulse

vrr/Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-04-28 19:41:17 +00:00

Author	SHA1	Message	Date
rcourtman	a2cfda0936	fix(test): remove flaky content type test in eval	2026-02-02 19:26:24 +00:00
rcourtman	9b304f8a78	test(ai): comprehensive eval coverage (~71%) including scenarios, overrides, and error cases	2026-02-02 19:18:19 +00:00
rcourtman	abc8900d4c	test(ai): add patrol assertions tests, coverage now 53.3%	2026-02-02 19:11:39 +00:00
rcourtman	aa4d728963	test(ai): add patrol quality logic tests, coverage now 42.5%	2026-02-02 19:10:45 +00:00
rcourtman	469c687860	test(ai): improve eval package coverage to 40%	2026-02-02 19:09:13 +00:00
rcourtman	5959cd9d7f	test(ai): add unit tests for eval runner	2026-02-02 14:54:01 +00:00
    - Add unit tests for internal/ai/eval package
    - Validate configuration, retry logic, and custom SSE parsing
    - Enables coverage for eval framework without requiring live Pulse server
rcourtman	9b0fb527f5	feat(patrol): implement patrol findings, evaluation, and investigation logic	2026-01-31 16:23:08 +00:00
    - Add core Patrol system for automated investigations
    - Implement findings management and deduplication logic
    - Add evaluation framework (patrol_eval) with quality assertions and scenarios
    - Add patrol-specific tools and executor integration
    - Add E2E test matrix script
rcourtman	95a0d7a6bd	feat(backend): implement AI Patrol, Investigation, and system-wide refactors	2026-01-30 19:02:14 +00:00
rcourtman	0e880f3c89	feat(eval): improve patrol eval with polling-based completion	2026-01-29 08:20:39 +00:00
    Refactor patrol eval runner to use a dual approach:
    1. Poll GET /api/ai/patrol/status until Running=false (primary signal)
    2. Best-effort SSE stream connection for tool event visibility
    Changes:
    - Add status polling loop with configurable timeout
    - Make SSE stream optional (may not connect in time)
    - Add Completed flag to PatrolRunResult
    - Improve assertion error messages
    - Add new scenarios and assertions
    This is more reliable than relying solely on SSE stream which may timeout waiting for headers during slow patrol initialization.
rcourtman	c409e7a05e	feat(eval): add patrol-specific eval scenarios and assertions	2026-01-28 23:19:11 +00:00
    Add comprehensive patrol evaluation framework:
    - patrol.go: Runner for patrol scenarios with streaming support
    - patrol_assertions.go: Assertions for tool usage, findings, timing
    - patrol_scenarios.go: Scenarios for basic, investigation, finding quality
    - eval_test.go: Unit tests for patrol eval runner
    Scenarios:
    - patrol-basic: Verifies patrol completes with tools and findings
    - patrol-investigation: Ensures investigation before reporting
    - patrol-finding-quality: Validates finding structure and evidence
    Run with: go run ./cmd/eval -scenario patrol
rcourtman	44fecc37c0	feat(eval): enhance AI eval harness with retries and reporting	2026-01-28 21:24:12 +00:00
    - Add retry logic for transient failures (phantom, stream, empty response)
    - Add environment variable overrides for infrastructure naming
    - Add JSON report output per scenario
    - Expand assertions with new validation types
    - Add more comprehensive test scenarios
    - Add docs/EVAL.md with usage documentation
    The eval harness now better handles flaky AI responses and provides detailed reports for debugging.
rcourtman	a04d41ce2c	Add end-to-end evaluation framework for AI assistant testing	2026-01-28 16:49:24 +00:00
    Implement comprehensive eval framework for testing Pulse Assistant:
    Core components:
    - Runner: Executes scenarios against live API with SSE stream parsing
    - Assertions: Reusable checks (tool usage, content, duration, errors)
    - Scenarios: Multi-step test workflows with configurable assertions
    Basic scenarios:
    - QuickSmokeTest: Minimal functionality verification
    - ReadOnlyInfrastructure: List, logs, status operations
    - RoutingValidation: Command routing to correct targets
    - LogTailing: Bounded log commands complete properly
    - Discovery: Infrastructure discovery capabilities
    Advanced scenarios:
    - TroubleshootingScenario: Multi-step investigation workflow
    - DeepDiveScenario: Thorough single-service investigation
    - ConfigInspectionScenario: Reading configuration files
    - ResourceAnalysisScenario: Cross-container resource comparison
    - MultiNodeScenario: Operations across Proxmox nodes
    - DockerInDockerScenario: Docker containers inside LXCs
    - ContextChainScenario: Context retention across turns
    Usage: go test ./internal/ai/eval -live -run TestQuickSmokeTest

12 commits