mirror of https://github.com/block/goose.git, synced 2026-04-28 11:39:43 +00:00
This PR introduces goose-self-test.yaml, a meta-testing recipe that enables goose to validate its own capabilities through first-person integration testing.
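
For reviewers who want to try it locally, an invocation along these lines should work (the recipe path and parameter values here are illustrative; check the goose CLI docs for the exact flags on your version):

```sh
goose run --recipe goose-self-test.yaml --params test_phases=basic --params test_depth=quick
```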
309 lines
10 KiB
YAML
version: 1.0.0
title: Goose Self-Testing Integration Suite
description: A comprehensive meta-testing recipe where goose tests its own capabilities using its own tools - true first-person integration testing
author:
  contact: goose-self-test

activities:
  - Initialize test workspace and logging infrastructure
  - Test file operations (create, read, update, delete, undo)
  - Validate shell command execution and error handling
  - Analyze code structure and parsing capabilities
  - Test extension discovery and management
  - Create and orchestrate subagents for meta-testing
  - Generate and execute test recipes
  - Test error boundaries and security controls
  - Measure performance and resource usage
  - Generate comprehensive test report

parameters:
  - key: test_phases
    input_type: string
    requirement: optional
    default: "all"
    description: "Which test phases to run: all, basic, extensions, subagents, recipes, advanced"

  - key: test_depth
    input_type: string
    requirement: optional
    default: "standard"
    description: "Testing depth: quick (smoke tests), standard (normal coverage), deep (exhaustive)"

  - key: workspace_dir
    input_type: string
    requirement: optional
    default: "./gooseselftest"
    description: "Directory for test artifacts and results"

  - key: parallel_tests
    input_type: string
    requirement: optional
    default: "true"
    description: "Run independent tests in parallel where possible"

  - key: cleanup_after
    input_type: string
    requirement: optional
    default: "true"
    description: "Clean up test artifacts after completion"

instructions: |
  You are testing yourself - a running goose instance validating its own capabilities through meta-testing.
  This is true first-person integration testing where you use your own tools to test your own functionality.

  ## Understanding First-Person Integration Testing
  This is a crucial distinction - as a running goose instance, you are testing yourself using your own capabilities.
  This is meta-testing in the truest sense: not unit tests or external test harnesses, but you using your tools
  to validate your own functionality from within your active session. You can only test what you can observe and
  control from inside your running instance - your tools, your behaviors, your error handling, your consistency.

  ## Core Testing Philosophy
  - You ARE the system under test AND the tester
  - Use your tools to create test scenarios, then validate the results
  - Test both success and failure paths
  - Document everything meticulously
  - Handle errors gracefully - a test failure shouldn't stop the suite

  ## Test Execution Framework

  ### Phase 1: Environment Setup & Basic Tool Validation
  Create a structured test workspace and validate core developer tools:
  - File operations (CRUD + undo)
  - Shell command execution
  - Code analysis capabilities
  - Error handling and recovery

  ### Phase 2: Extension System Testing
  Test dynamic extension management:
  - Discover available extensions
  - Enable/disable extensions
  - Test extension interactions
  - Verify isolation between extensions

  ### Phase 3: Subagent Testing (Meta-Recursion)
  Create subagents to test yourself recursively:
  - Basic subagent creation and execution
  - Parallel subagent orchestration
  - Sequential subagent chains
  - Recursive depth testing (subagent creating subagent)
  - Test return_last_only optimization

  ### Phase 4: Advanced Self-Testing
  Push boundaries and test limits:
  - Intentionally trigger errors
  - Test timeout scenarios
  - Validate security controls
  - Measure performance metrics
  - Test resource constraints

  ### Phase 5: Report Generation
  Compile comprehensive test results:
  - Aggregate all test outcomes
  - Calculate success metrics
  - Document failures and issues
  - Generate recommendations

  ## Success Criteria
  - Phase success: ≥80% of tests pass
  - Suite success: all phases complete and critical features work
  - Each test logs: setup → execute → validate → result → cleanup
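
  The 80% phase gate is plain integer arithmetic; as an illustrative sketch (the counts here are made up):

  ```sh
  # compute the phase pass rate and compare against the 80% threshold
  passed=9
  total=10
  pct=$(( passed * 100 / total ))
  if [ "$pct" -ge 80 ]; then
    echo "phase: PASS ($pct%)"
  else
    echo "phase: FAIL ($pct%)"
  fi
  ```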

extensions:
  - type: builtin
    name: developer
    display_name: Developer
    timeout: 600
    bundled: true
    description: Core tool for file operations, shell commands, and code analysis

prompt: |
  Execute the Goose Self-Testing Integration Suite in {{ workspace_dir }}.
  Test phases: {{ test_phases }}, Depth: {{ test_depth }}, Parallel: {{ parallel_tests }}

  ## 🚀 INITIALIZATION
  Create test workspace: {{ workspace_dir }}/ for all test artifacts and reports.

  Track your progress using the todo extension. Start with:
  - [ ] Initialize test workspace
  - [ ] Set up logging infrastructure
  - [ ] Begin Phase 1 testing

  {% if test_phases == "all" or "basic" in test_phases %}
  ## 📝 PHASE 1: Basic Tool Validation

  ### File Operations Testing
  1. Create test files with various content types (.txt, .py, .md, .json)
  2. Test str_replace on each file type
  3. Test insert operations at different line positions
  4. Test undo functionality
  5. Verify file deletion and recreation
  6. Test with special characters and Unicode
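
  A minimal end-to-end shape for one file-operation check, sketched in plain shell (file names are illustrative):

  ```sh
  # create -> edit -> verify -> delete
  printf 'hello world\n' > scratch.txt
  sed 's/hello/goodbye/' scratch.txt > scratch_edited.txt
  cat scratch_edited.txt        # expect: goodbye world
  rm scratch.txt scratch_edited.txt && echo "cleaned up"
  ```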

  ### Shell Command Testing
  Test a comprehensive shell workflow: command chaining (mkdir test && cd test && echo "test" > file.txt),
  error handling (false || echo "handled"), and environment variables (export VAR=test && echo $VAR).
  Verify both success and failure paths work correctly.
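
  Put together, that workflow runs end to end as:

  ```sh
  mkdir -p test && cd test && echo "test" > file.txt   # chaining (-p keeps this idempotent)
  false || echo "handled"                              # failure path
  export VAR=test && echo "$VAR"                       # environment variable
  cat file.txt                                         # verify the chained write
  ```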

  ### Code Analysis Testing
  1. Create sample code files in Python, JavaScript, and Go
  2. Analyze each file for structure
  3. Test directory-wide analysis
  4. Test symbol focus and call graphs
  5. Verify LOC, function, and class counting

  Log results to: {{ workspace_dir }}/phase1_basic_tools.md
  {% endif %}

  {% if test_phases == "all" or "extensions" in test_phases %}
  ## 🔧 PHASE 2: Extension System Testing

  ### Todo Extension Testing (Built-in)
  1. Create initial todos and verify they persist
  2. Update todos and confirm changes are retained
  3. Clear todos and verify a clean state

  ### Dynamic Extension Management
  1. Use platform__search_available_extensions to discover available extensions
  2. Document all available extensions
  3. Test enabling and disabling dynamic extensions (if any are available)
  4. Verify isolation between enabled extensions

  Log results to: {{ workspace_dir }}/phase2_extensions.md
  {% endif %}

  {% if test_phases == "all" or "subagents" in test_phases %}
  ## 🤖 PHASE 3: Subagent Meta-Testing

  ### Basic Subagent Test
  Create a simple subagent task:
  ```
  Task: "Create a file called subagent_test.txt with 'Hello from subagent'"
  ```

  ### Parallel Subagent Test
  {% if parallel_tests == "true" %}
  Create 3 parallel subagents:
  1. Count files in the current directory
  2. Get the current timestamp
  3. Create a test file

  Use execution_mode: "parallel" and verify all complete.
  {% endif %}

  ### Sequential Chain Test
  Create dependent subagents:
  1. First: Create a Python file
  2. Second: Analyze the created file
  3. Third: Run the Python file

  ### Recursive Depth Test
  {% if test_depth == "deep" %}
  Create a subagent that creates another subagent (test the depth limit).
  Monitor for resource constraints and context window limits.
  {% endif %}

  ### Return Last Only Test
  Create subagents with return_last_only=true and verify condensed output.

  Log results to: {{ workspace_dir }}/phase3_subagents.md
  {% endif %}

  {% if test_phases == "all" or "advanced" in test_phases %}
  ## 🔬 PHASE 4: Advanced Testing

  ### Error Boundary Testing
  1. Create a file with an invalid path (should fail gracefully)
  2. Run a non-existent shell command
  3. Try to analyze a binary file
  4. Test with extremely long filenames
  5. Test nested directory creation beyond limits
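
  For instance, a missing command should be caught and logged while the suite moves on:

  ```sh
  # a non-existent command must fail without stopping the suite
  if ! definitely_not_a_real_command 2>/dev/null; then
    echo "handled missing command"
  fi
  echo "suite continues"
  ```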

  ### Performance Measurement
  {% if test_depth == "deep" %}
  1. Create and analyze a large file (>1MB)
  2. Run multiple parallel operations
  3. Track execution times for each operation
  4. Monitor token usage if accessible
  {% endif %}

  ### Security Validation
  1. Test input with special shell characters: $(echo test)
  2. Attempt directory traversal: ../../../etc/passwd
  3. Test with harmful Unicode characters
  4. Verify command injection prevention
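
  One concrete probe: input passed through proper quoting must stay inert rather than being executed:

  ```sh
  # single quotes store the text literally; double quotes on expansion
  # prevent command substitution from ever running
  input='$(echo injected)'
  printf '%s\n' "$input"
  ```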

  Log results to: {{ workspace_dir }}/phase4_advanced.md
  {% endif %}

  ## 📊 PHASE 5: Final Report Generation

  Create TWO reports:

  ### 1. Detailed Report at {{ workspace_dir }}/detailed_report.md
  Include all test details, logs, and technical information.

  ### 2. Executive Summary (REQUIRED - Display in Terminal)

  **IMPORTANT**: At the very end, generate and display a concise summary directly in the terminal:

  ```
  ========================================
  GOOSE SELF-TEST SUMMARY
  ========================================

  ✅ OVERALL RESULT: [PASS/FAIL]

  📊 Quick Stats:
  • Tests Run: [X]
  • Passed: [X] ([%])
  • Failed: [X] ([%])
  • Duration: [X minutes]

  ✅ Working Features:
  • File operations: [✓/✗]
  • Shell commands: [✓/✗]
  • Code analysis: [✓/✗]
  • Extensions: [✓/✗]
  • Subagents: [✓/✗]

  ⚠️ Issues Found:
  • [Issue 1 - brief description]
  • [Issue 2 - brief description]

  💡 Key Insights:
  • [Most important finding]
  • [Performance observation]
  • [Recommendation]

  📁 Full report: {{ workspace_dir }}/detailed_report.md
  ========================================
  ```

  This summary should be:
  - **Concise**: Under 30 lines
  - **Visual**: Use emojis and formatting for clarity
  - **Actionable**: Clear pass/fail status
  - **Informative**: Key findings at a glance

  Always end with this summary so users immediately see the results without digging through files.

  {% if cleanup_after == "true" %}
  ## 🧹 CLEANUP
  After report generation:
  1. Archive results to {{ workspace_dir }}/archive/
  2. Remove temporary test artifacts
  3. Keep only the final report and logs
  {% endif %}

  ## 🎯 META-TESTING NOTES
  Remember: You are testing yourself. This is recursive validation where:
  - Success means your tools work as expected
  - Failure reveals areas needing attention
  - The ability to complete this test IS itself a test
  - Document everything - your future self (or another goose) will thank you

  Use your todo extension to track progress throughout.
  Handle errors gracefully - a failed test shouldn't crash the suite.
  Be thorough but efficient based on the test_depth parameter.

  This is true first-person integration testing. Execute with precision and document with clarity.