Ruview/docs/qe-reports/00-qe-queen-summary.md
ruv 924c32547e fix: ADR-080 P0 security + CI remediation from QE analysis
Address all 5 P0 issues from QE analysis (55/100 score):

- P0-1: Rate limiter bypass — validate X-Forwarded-For against trusted proxy list
- P0-2: Exception detail leak — generic 500 messages, exception_type gated by dev mode
- P0-3: WebSocket JWT in URL (CWE-598) — first-message auth pattern replaces query param
- P0-4: Rust tests not in CI — add rust-tests job gating docker-build and notify
- P0-5: WebSocket path mismatch — use WS_PATH constant instead of hardcoded /ws/sensing

Includes ADR-080 remediation plan and 9 QE reports (4,914 lines).
Firmware validated on ESP32-S3 (COM8): CSI collecting, calibration OK.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-04-06 16:12:13 -04:00

20 KiB

QE Queen Summary Report -- wifi-densepose

Date: 2026-04-05
Fleet ID: fleet-02558e91
Orchestrator: QE Queen Coordinator (ADR-001)
Domains Activated: test-generation, coverage-analysis, quality-assessment, security-compliance, defect-intelligence


1. Project Scope and Quality Posture Overview

1.1 Codebase Dimensions

Language / Layer Files Lines of Code Purpose
Rust (.rs) 379 153,139 Core workspace -- 19 crates (16 in workspace, 3 excluded/auxiliary)
Python (.py) 105 38,656 v1 implementation -- API, services, sensing, hardware, middleware
C/H (firmware) 48 9,445 ESP32 CSI node firmware -- collectors, OTA, WASM runtime
TypeScript/TSX (mobile) 48 7,571 React Native mobile app -- screens, stores, services
JavaScript (UI) ~117 25,798 Web observatory UI, components, utilities
Markdown (docs) ~79+ 70,539 79 ADRs, user guides, research, witness logs
Total ~776 ~305,148

1.2 Architecture Summary

The project implements WiFi-based human pose estimation using Channel State Information (CSI). It is structured as a multi-language, multi-platform system:

  • Rust workspace (v0.3.0): 16 crates in workspace plus wifi-densepose-wasm-edge (excluded for wasm32 target) and ruv-neural (auxiliary). Covers signal processing (RuvSense with 14 modules), neural inference (ONNX/PyTorch/Candle), mass casualty assessment (MAT), cross-viewpoint fusion (RuVector v2.0.4), hardware TDM protocol, and web APIs.
  • Python v1: Original implementation with 12 source modules covering API endpoints, CSI extraction, pose services, sensing, database, and middleware.
  • ESP32 firmware: C code for real WiFi CSI collection, edge processing, OTA updates, mmWave sensor integration, WASM runtime, and swarm bridging.
  • Mobile UI: React Native app with pose visualization, MAT screens, vitals monitoring, and RSSI scanning.
  • Web observatory: Three.js-based visualization for RF sensing, phase constellations, and subcarrier manifolds.

1.3 Governance and Process Maturity

Indicator Status Details
Architecture Decision Records Strong 79 ADRs documented in docs/adr/
CI/CD pipelines Strong 8 GitHub Actions workflows (CI, CD, security scan, firmware CI, QEMU, desktop release, verify pipeline, submodules)
Security scanning Strong Dedicated security-scan.yml with Bandit, Semgrep, Safety; runs daily on schedule
Deterministic verification Strong SHA-256 proof pipeline (v1/data/proof/verify.py) with witness bundles (ADR-028)
Code formatting Moderate Black/Flake8 enforced for Python in CI; no rustfmt.toml found for Rust
Type checking Moderate MyPy configured in CI for Python; Rust has native type safety
Dependency management Strong Workspace-level Cargo.toml with pinned versions; requirements.txt for Python

2. Test Pyramid Health

2.1 Overall Test Inventory

Test Layer Rust Python Mobile (TS) Firmware (C) Total
Unit tests 2,618 #[test] 322 functions / 15 files 202 test cases / 25 files 0 3,142
Integration tests 16 files / 7 crates 132 functions / 11 files 0 0 148+ functions
E2E tests 0 8 functions / 1 file 0 0 8 functions
Performance tests 0 26 functions / 2 files 0 0 26 functions
Fuzz tests 0 0 0 3 files (harnesses) 3 harnesses
Subtotal ~2,634 ~488 ~202 3 ~3,327

2.2 Test Pyramid Shape Analysis

Ideal Pyramid          Actual Shape          Assessment
                                            
    /\                    /\                  
   /E2E\                / 8 \                E2E: CRITICALLY THIN
  /------\              /----\               
 / Integ. \            / 148  \              Integration: THIN
/----------\          /--------\             
/   Unit    \        /  3,142   \            Unit: HEALTHY base
--------------      --------------          

Pyramid Ratio (unit : integration : e2e):

  • Actual: 394 : 19 : 1
  • Healthy target: 70 : 20 : 10 (percentage)
  • Actual percentage: 95.3% : 4.5% : 0.2%

Verdict: The pyramid is severely bottom-heavy. Unit tests are plentiful (good), but integration and E2E layers are dangerously thin relative to the project's complexity. For a multi-crate, multi-service system with hardware integration, the integration layer should be 3-4x larger, and E2E should be 10-20x larger.

2.3 Rust Test Distribution by Crate

Crate Source Lines Test Count Tests per 1K LOC Integration Tests Assessment
wifi-densepose-wasm-edge 28,888 643 22.3 3 files Good
wifi-densepose-signal 16,194 370 22.8 1 file Good
ruv-neural ~558 (test-only) 364 N/A 1 file Test-only crate
wifi-densepose-train 10,562 299 28.3 6 files Strong
wifi-densepose-sensing-server 17,825 274 15.4 3 files Moderate
wifi-densepose-mat 19,572 159 8.1 1 file Needs improvement
wifi-densepose-wifiscan 5,779 150 26.0 0 Unit only
wifi-densepose-hardware 4,005 106 26.5 0 Unit only
wifi-densepose-ruvector 4,629 106 22.9 0 Unit only
wifi-densepose-vitals 1,863 52 27.9 0 Unit only
wifi-densepose-desktop 3,309 39 11.8 1 file Thin
wifi-densepose-core 2,596 28 10.8 0 Thin for core crate
wifi-densepose-nn 2,959 23 7.8 0 Needs improvement
wifi-densepose-cli 1,317 5 3.8 0 Critically thin
wifi-densepose-wasm 1,805 0 0.0 0 ZERO tests
wifi-densepose-api 1 (stub) 0 N/A 0 Stub only
wifi-densepose-config 1 (stub) 0 N/A 0 Stub only
wifi-densepose-db 1 (stub) 0 N/A 0 Stub only

2.4 Python Test Coverage by Module

Source Module Source Lines Has Unit Tests Has Integration Tests Assessment
api (13 files) 3,694 No Yes (test_api_endpoints, test_rate_limiting) Partial
services (7 files) 3,038 No Yes (test_inference_pipeline) Partial
sensing (6 files) 2,117 Yes (test_sensing) Yes (test_streaming_pipeline) Moderate
tasks (3 files) 1,977 No No ZERO coverage
middleware (4 files) 1,798 No No ZERO coverage
database (5 files) 1,715 No No ZERO coverage
commands (3 files) 1,161 No No ZERO coverage
core (4 files) 1,117 No (tests focus on CSI extractor from hardware/) No ZERO coverage
config (3 files) 923 No No ZERO coverage
hardware (3 files) 755 Yes (test_csi_extractor, test_esp32_binary_parser) Yes (test_hardware_integration) Good
models (3 files) 578 No No ZERO coverage
testing (3 files) 500 No No ZERO coverage

Key finding: Python unit tests concentrate heavily on CSI extraction and processing (the hardware layer). 11 of 12 source modules have zero dedicated unit test files. The 322 unit test functions map almost entirely to hardware/csi_extractor.py and related signal processing code.

2.5 Mobile UI Test Coverage

The mobile UI has 25 test files with 202 test cases, covering:

  • Stores: poseStore (21), matStore (18), settingsStore (13) -- good state management coverage
  • Components: SignalBar, GaugeArc, ConnectionBanner, SparklineChart, OccupancyGrid, StatusDot, HudOverlay -- 7 components tested
  • Hooks: useServerReachability, useRssiScanner, usePoseStream -- 3 hooks tested
  • Services: api (14), ws (7), simulation (10), rssi (6) -- good service layer coverage
  • Screens: MAT (4), Live (4), Vitals (5), Zones (6), Settings (6) -- all main screens tested
  • Utils: ringBuffer (20), urlValidator (13), colorMap (9) -- thorough utility testing

Assessment: Mobile testing is the strongest layer relative to its codebase size. Good breadth across stores, components, services, and screens.

2.6 Firmware Test Coverage

Test Type Count Coverage
Fuzz harnesses 3 fuzz_csi_serialize.c, fuzz_edge_enqueue.c, fuzz_nvs_config.c
Unit tests 0 No structured unit testing framework
Integration tests 0 No automated hardware-in-the-loop tests

Assessment: The firmware has fuzz testing (a positive for security-critical embedded code), but lacks structured unit tests. The 9,445 lines of C code for a safety-relevant embedded system (disaster survivor detection via MAT) warrant stronger test coverage.


3. Cross-Cutting Quality Concerns

3.1 Code Complexity and Maintainability

Metric Value Threshold Status
AQE quality score 37/100 >70 FAIL
Cyclomatic complexity (avg) 24.09 <15 FAIL
Maintainability index 24.35 >50 FAIL
Security score 85/100 >80 PASS

Large file risk (>500 lines in Rust src/):

File Lines Risk
sensing-server/src/main.rs 4,846 Monolith risk -- nearly 10x the 500-line guideline
sensing-server/src/training_api.rs 1,946 High complexity
wasm/src/mat.rs 1,673 Hard to test, 0 tests in crate
train/src/metrics.rs 1,664 Complex math, needs exhaustive testing
signal/src/ruvsense/pose_tracker.rs 1,523 Critical path, well-tested
mat/src/integration/csi_receiver.rs 1,401 Integration boundary
mat/src/integration/hardware_adapter.rs 1,360 Hardware boundary, audit needed

24 Rust source files exceed 500 lines, violating the project's own CLAUDE.md guideline.

3.2 Error Handling Quality (Rust)

Pattern Count Assessment
Result<> returns 450 Good -- idiomatic error handling in use
.unwrap() calls 720 HIGH RISK -- 720 potential panic points in production code
.expect() calls 35 Acceptable -- provides context on failure
panic!() calls 1 Good -- minimal explicit panics
unsafe blocks 340 NEEDS AUDIT -- high count for an application-level project

Critical concern: The 720 .unwrap() calls represent potential runtime panics. In a system processing real-time WiFi CSI data for pose estimation (and mass casualty assessment), an unwrap failure could crash the entire pipeline. Each call should be reviewed and converted to proper error propagation with ? operator or explicit error handling.

The 340 unsafe blocks are high for a project that is not a systems-level library. These need a focused audit to verify memory safety invariants are upheld, especially in signal processing and hardware interaction code.

3.3 Security Posture

Check Result Details
Hardcoded secrets in Python 0 found Clean
SQL injection risk (f-string SQL) 0 found Clean -- likely using parameterized queries
Python eval() usage 2 calls Safe -- both are PyTorch model.eval() (inference mode), not Python eval
Firmware buffer overflow risk 0 strcpy/sprintf Clean -- uses safe string functions
CI security scanning Active Bandit, Semgrep, Safety in dedicated workflow, runs daily
Dependency scanning Active Safety checks in CI

Security assessment: GOOD. The project follows secure coding practices. The dedicated security-scan workflow with daily scheduling is a strong indicator of security maturity. No critical vulnerabilities detected in static analysis patterns.

3.4 Documentation Quality

Metric Value Assessment
Rust /// doc comments 11,965 Strong
Rust //! module docs 3,512 Strong
Rust pub fn with docs 1,781 / 3,912 (45.5%) Moderate -- 54.5% of public functions lack doc comments
Python functions with docstrings ~543 / ~801 (67.8%) Good
Python classes with docstrings ~121 / ~150 (80.7%) Strong
ADRs 79 Excellent governance
TODO/FIXME markers 1 (Python), 0 (Rust) Clean -- no deferred technical debt markers

3.5 CI/CD Pipeline Coverage

Workflow Trigger Scope
ci.yml Push/PR to main, develop, feature/* Python quality (Black, Flake8, MyPy), security (Bandit, Safety)
cd.yml (deployment) Production deployment
security-scan.yml Push/PR + daily cron SAST with Bandit, Semgrep; dependency scanning with Safety
firmware-ci.yml Push/PR ESP32 firmware build verification
firmware-qemu.yml Push/PR ESP32 QEMU emulation tests
desktop-release.yml Release Desktop application packaging
verify-pipeline.yml Push/PR Deterministic proof verification
update-submodules.yml Manual/scheduled Git submodule sync

Gap: No CI workflow runs cargo test --workspace for the Rust codebase. The 2,618+ Rust tests appear to run only locally. This is a significant gap -- the largest and most critical codebase has no automated CI test execution.


4. Recommendations Matrix

# Recommendation Priority Effort Impact Domain
R1 Add Rust workspace tests to CI -- Create a GitHub Actions workflow that runs cargo test --workspace --no-default-features. The 2,618 Rust tests are the project's primary safety net but run only locally. CRITICAL Low (1-2 days) Very High CI/CD
R2 Reduce .unwrap() calls -- Audit and convert the 720 .unwrap() calls in Rust production code to proper ? error propagation. Prioritize crates in the real-time pipeline: signal, mat, hardware, sensing-server. CRITICAL High (2-3 weeks) Very High Reliability
R3 Audit unsafe blocks -- Review all 340 unsafe blocks. Document safety invariants for each. Consider using unsafe_code lint to flag new additions. CRITICAL Medium (1-2 weeks) High Security
R4 Add Python unit tests for untested modules -- 11 of 12 Python source modules have zero unit tests. Priority targets: api/ (3,694 LOC), services/ (3,038 LOC), database/ (1,715 LOC), middleware/ (1,798 LOC). HIGH Medium (2-3 weeks) High Coverage
R5 Add integration tests for 7 Rust crates -- wifi-densepose-core, wifi-densepose-hardware, wifi-densepose-nn, wifi-densepose-ruvector, wifi-densepose-vitals, wifi-densepose-wifiscan, wifi-densepose-cli have unit tests but no integration test directory. HIGH Medium (2 weeks) High Coverage
R6 Break up sensing-server/src/main.rs (4,846 lines) -- Extract route handlers, middleware, and configuration into separate modules. This single file is nearly 10x the project's 500-line guideline. HIGH Medium (1 week) Medium Maintainability
R7 Add E2E tests -- Only 1 E2E test file exists (test_healthcare_scenario.py with 8 tests). For a system with REST API, WebSocket streaming, hardware integration, and mobile clients, E2E coverage is critically insufficient. HIGH High (3-4 weeks) Very High Coverage
R8 Add tests to wifi-densepose-wasm (1,805 LOC, 0 tests) -- This crate contains MAT WebAssembly bindings used in browser deployment. Zero test coverage for a user-facing interface is unacceptable. HIGH Low (3-5 days) Medium Coverage
R9 Add firmware unit tests -- Adopt a C unit test framework (Unity, CMock, or CTest) for the 9,445 lines of ESP32 firmware. The fuzz harnesses are a good start but do not substitute for structured unit tests. MEDIUM Medium (2 weeks) Medium Coverage
R10 Improve Rust public API documentation -- 54.5% of pub fn declarations lack doc comments. Add #![warn(missing_docs)] to crate lib.rs files to enforce documentation. MEDIUM Medium (1-2 weeks) Medium Documentation
R11 Add rustfmt.toml -- No Rust formatting configuration found. Add workspace-level rustfmt.toml and enforce in CI with cargo fmt --check. LOW Low (1 day) Low Consistency
R12 Reduce cyclomatic complexity -- Average complexity of 24.09 is well above the 15 threshold. Target the 24 files over 500 lines for refactoring. MEDIUM High (3-4 weeks) High Maintainability

5. Overall Quality Score

5.1 Scoring Methodology

Weighted scoring across 8 dimensions, each rated 0-100:

Dimension Weight Score Weighted Rationale
Unit test coverage 20% 68 13.6 3,142 unit tests is strong for Rust/mobile, but Python modules severely undertested
Integration test coverage 15% 32 4.8 Only 7 of 19 Rust crates have integration tests; Python integration tests exist but skip core modules
E2E test coverage 10% 8 0.8 1 E2E file with 8 tests for a multi-platform system is critically insufficient
Security posture 15% 82 12.3 Strong CI security scanning, clean code patterns, daily Bandit/Semgrep/Safety; offset by 340 unsafe blocks needing audit
Code quality / complexity 15% 35 5.3 AQE score 37/100, 720 unwraps, 24 oversized files, high cyclomatic complexity
CI/CD maturity 10% 55 5.5 8 workflows is good breadth, but missing Rust test execution in CI is a major gap
Documentation 10% 78 7.8 79 ADRs, strong docstrings in Python, moderate Rust doc coverage, witness bundles
Architecture governance 5% 90 4.5 Exemplary ADR practice, DDD bounded contexts, deterministic verification pipeline
Total 100% 54.6

5.2 Final Verdict

+---------------------------------------------------------------+
|              QE QUEEN ORCHESTRATION COMPLETE                   |
+---------------------------------------------------------------+
|  Project: wifi-densepose (WiFi CSI Pose Estimation)            |
|  Total Codebase: ~305K lines across 5 languages                |
|  Total Tests: 3,327 (2,618 Rust + 488 Python + 202 Mobile     |
|               + 3 firmware fuzz + 16 Rust integration files)   |
|  Fleet ID: fleet-02558e91                                      |
|  Domains Analyzed: 5                                           |
|  Duration: ~120s                                               |
|  Status: COMPLETED                                             |
|                                                                |
|  OVERALL QUALITY SCORE: 55 / 100                               |
|  GRADE: C+                                                     |
|  RELEASE READINESS: NOT READY (quality gate FAILED)            |
+---------------------------------------------------------------+

5.3 Summary Assessment

Strengths:

  • Exceptional architecture governance with 79 ADRs and deterministic verification (witness bundles)
  • Strong Rust unit test count (2,618) with good distribution across signal processing and training crates
  • Mature security CI pipeline with daily scheduled scanning (Bandit, Semgrep, Safety)
  • Mobile UI has the best test-to-code ratio in the entire project
  • No hardcoded secrets, no unsafe string operations in firmware, clean security patterns

Critical Gaps:

  • Rust tests do not run in CI -- the 2,618 tests are only a local safety net
  • 720 .unwrap() calls create panic risk in production signal processing pipelines
  • 340 unsafe blocks need formal audit with documented safety invariants
  • 11 of 12 Python source modules have zero unit tests
  • Only 8 E2E test functions for a multi-platform, multi-service system
  • sensing-server/main.rs at 4,846 lines is a monolith risk

Path to Release Readiness (target: 75/100):

  1. Add Rust CI workflow (+10 points to CI maturity)
  2. Add Python unit tests for top 4 untested modules (+8 points to unit coverage)
  3. Audit and reduce .unwrap() count by 50% (+5 points to code quality)
  4. Add 5+ E2E test scenarios (+4 points to E2E coverage)
  5. Add integration tests to core, hardware, nn crates (+5 points to integration coverage)

Report generated by QE Queen Coordinator (fleet-02558e91)
Learnings stored: queen-orchestration-full-qe-2026-04-05 in namespace learning
AQE v3 quality assessment saved to: .agentic-qe/results/quality/2026-04-05T11-02-19_assessment.json