# ADR-101: Testing Strategy & Fidelity Verification

| Field       | Value                                           |
|-------------|------------------------------------------------|
| **Status**  | Accepted                                        |
| **Date**    | 2026-03-14                                      |
| **Authors** | ruvnet                                          |
| **Series**  | ADR-093 (DeepAgents Rust Conversion)            |

## Context

DeepAgents has extensive test coverage:

- **Unit tests** — 80+ test files across `libs/deepagents/tests/unit_tests/`
- **Integration tests** — Cross-module tests in `tests/integration_tests/`
- **Eval tests** — LLM-powered behavioral tests in `tests/evals/`
- **CLI tests** — 40+ test files in `libs/cli/tests/`
- **Smoke tests** — System prompt validation

100% fidelity requires that the Rust implementation passes equivalent tests producing identical results.

## Decision

### Test Categories

#### 1. Unit Tests (Port from Python)

Each Python unit test file maps to a Rust test module:

| Python Test | Rust Test | Tests |
|---|---|---|
| `test_protocol.py` | `backends/protocol_test.rs` | FileInfo, GrepMatch, WriteResult, EditResult structs |
| `test_state_backend.py` | `backends/state_test.rs` | StateBackend CRUD, ls, grep, glob |
| `test_filesystem_backend.py` | `backends/filesystem_test.rs` | FilesystemBackend with real files |
| `test_filesystem_backend_async.py` | `backends/filesystem_async_test.rs` | Async variants |
| `test_local_shell_backend.py` | `backends/local_shell_test.rs` | Execute, timeout, output truncation |
| `test_composite_backend.py` | `backends/composite_test.rs` | Path routing, result remapping |
| `test_sandbox_backend.py` | `backends/sandbox_test.rs` | BaseSandbox command templates |
| `test_state_backend_async.py` | `backends/state_async_test.rs` | Async StateBackend |
| `test_store_backend.py` | `backends/store_test.rs` | StoreBackend persistence |
| `test_utils.py` | `backends/utils_test.rs` | format_content_with_line_numbers, perform_string_replacement |
| `test_file_system_tools.py` | `tools/filesystem_test.rs` | ls, read, write, edit, glob, grep tools |
| `test_file_system_tools_async.py` | `tools/filesystem_async_test.rs` | Async tool variants |
| `test_local_shell.py` | `tools/execute_test.rs` | Execute tool behavior |
| `test_middleware.py` | `middleware/pipeline_test.rs` | Middleware ordering, state injection |
| `test_middleware_async.py` | `middleware/pipeline_async_test.rs` | Async middleware |
| `test_subagents.py` | `subagents/task_test.rs` | Task tool, state isolation |
| `test_memory_middleware.py` | `middleware/memory_test.rs` | AGENTS.md loading |
| `test_skills_middleware.py` | `middleware/skills_test.rs` | SKILL.md parsing, validation |
| `test_summarization_middleware.py` | `middleware/summarization_test.rs` | Auto-compact trigger |
| `test_compact_tool.py` | `middleware/compact_tool_test.rs` | compact_conversation tool |
| `test_tool_schemas.py` | `tools/schema_test.rs` | Tool parameter schemas |
| `test_models.py` | `core/models_test.rs` | resolve_model, model_matches_spec |
| `test_end_to_end.py` | `core/e2e_test.rs` | Full agent invocation |
| `test_version.py` | `core/version_test.rs` | Version constant |

#### 2. Cross-Language Fidelity Tests

Golden-file tests that verify Rust output matches Python output exactly:

```rust
#[cfg(test)]
mod fidelity_tests {
    /// Test that format_content_with_line_numbers produces identical output.
    #[test]
    fn test_line_number_formatting_matches_python() {
        let lines = vec!["hello", "world", ""];
        let result = format_content_with_line_numbers(&lines, 1);
        // Must match Python's exact output character-for-character
        assert_eq!(result, "     1\thello\n     2\tworld\n     3\t");
    }

    /// Test that grep_raw produces identical GrepMatch structs.
    #[test]
    fn test_grep_matches_python_format() {
        let backend = FilesystemBackend::new(tmp_dir, false);
        // Write test file, grep, compare with Python golden output
    }

    /// Test that edit with replace_all=false rejects multiple occurrences.
    #[test]
    fn test_edit_uniqueness_check() {
        let backend = StateBackend::new(state_with_file("a\na\n"));
        let result = backend.edit("/test.txt", "a", "b", false);
        assert!(result.error.is_some());
        // Error message must match Python's exact wording
    }

    /// Test that CompositeBackend routes identically.
    #[test]
    fn test_composite_routing_matches_python() {
        // Same path inputs → same backend selection → same path stripping
    }
}
```

#### 3. Property-Based Tests

```rust
use proptest::prelude::*;

proptest! {
    /// Any valid path resolves consistently between backends.
    #[test]
    fn path_resolution_consistent(path in "[a-z/]+") {
        let fs = FilesystemBackend::new(tmp, true);
        let resolved = fs.resolve_path(&path);
        // Verify: no path traversal, within root, deterministic
    }

    /// String replacement is idempotent for unique matches.
    #[test]
    fn edit_idempotent_unique(
        content in ".*",
        old in ".+",
        new in ".*",
    ) {
        // If old appears exactly once, edit succeeds
        // If old appears 0 or 2+ times, edit fails with correct error
    }

    /// Skill name validation matches spec exactly.
    #[test]
    fn skill_name_validation(name in "[a-z0-9-]{0,100}") {
        // validate_skill_name produces same result as Python
    }
}
```

#### 4. Integration Tests

```rust
// tests/integration/

/// Full agent creation and invocation with mock LLM.
#[tokio::test]
async fn test_create_deep_agent_with_defaults() {
    let agent = create_deep_agent(DeepAgentConfig {
        model: MockChatModel::new(),
        ..Default::default()
    });

    let result = agent.invoke(AgentState::from_messages(vec![
        HumanMessage::new("Hello"),
    ])).await;

    assert!(result.messages.last().unwrap().is_ai());
}

/// SubAgent task tool with parallel invocation.
#[tokio::test]
async fn test_parallel_subagent_invocation() {
    // Verify that two task tool calls execute concurrently
    // and both return results to the parent agent
}

/// Middleware pipeline ordering.
#[tokio::test]
async fn test_middleware_execution_order() {
    // Verify: TodoList → Memory → Skills → Filesystem →
    //         SubAgent → Summarization → PromptCaching → PatchToolCalls
}

/// Session persistence and resume.
#[tokio::test]
async fn test_session_round_trip() {
    // Create session → checkpoint → resume → verify state
}
```

#### 5. CLI Tests

```rust
// tests/cli/

/// CLI argument parsing matches Python's argparse behavior.
#[test]
fn test_cli_args() {
    let cli = Cli::try_parse_from(["deep", "--model", "openai:gpt-5", "-a", "myagent"]).unwrap();
    assert_eq!(cli.model.unwrap(), "openai:gpt-5");
    assert_eq!(cli.agent.unwrap(), "myagent");
}

/// Non-interactive mode produces same output format.
#[tokio::test]
async fn test_headless_mode() {
    // Run with --headless, verify JSON output matches Python
}
```

### Test Infrastructure

```rust
/// Mock ChatModel for deterministic testing.
/// Python: tests/unit_tests/chat_model.py
pub struct MockChatModel {
    responses: Vec<AIMessage>,
    call_count: AtomicUsize,
}

impl ChatModel for MockChatModel {
    fn invoke(&self, messages: &[Message]) -> AIMessage {
        let idx = self.call_count.fetch_add(1, Ordering::SeqCst);
        self.responses[idx].clone()
    }
}

/// Temporary directory helper for filesystem tests.
pub struct TempBackend {
    dir: tempfile::TempDir,
    backend: FilesystemBackend,
}

impl TempBackend {
    pub fn new() -> Self {
        let dir = tempfile::tempdir().unwrap();
        let backend = FilesystemBackend::new(dir.path(), false);
        Self { dir, backend }
    }
}
```

### Coverage Targets

| Category | Target | Method |
|---|---|---|
| Backend protocol | 100% | Unit tests per method |
| Tool implementations | 100% | Golden-file fidelity tests |
| Middleware pipeline | 100% | Integration + ordering tests |
| State isolation | 100% | Property tests |
| Skill validation | 100% | Exhaustive + property tests |
| CLI args | 100% | Clap derive tests |
| Session persistence | 100% | Round-trip serialization |
| Error messages | 100% | Exact string matching |

## Consequences

- 80+ Python test files ported to Rust with identical assertions
- Golden-file tests guarantee character-for-character output fidelity
- Property-based tests catch edge cases not covered by Python suite
- MockChatModel enables deterministic agent testing without LLM calls
- CI runs both Python and Rust test suites to verify behavioral parity