mirror of https://github.com/Skyvern-AI/skyvern.git synced 2026-04-28 19:50:42 +00:00

Add QA report persistence to /qa skill (#SKY-8305) (#5059 )

2026-03-11 16:04:16 -07:00

16 KiB

Raw Blame History

name	description
qa	QA test your code changes by reading your git diff, choosing the right validation path for frontend/browser and backend changes, and reporting pass/fail with evidence.

QA — Validate Frontend and Backend Changes

Read the diff, classify what changed, and run the right validation path: browser QA for frontend/browser changes, API validation for backend surface changes, repo-native validation for backend-internal changes, and both for mixed changes.

You changed code. This skill is diff-driven first: it reads what changed, understands the affected behavior, and validates that behavior with the right tools. It is not a generic website crawler, and it should not invent random API checks that are unrelated to the diff.

Quick Start

/qa                              # Diff-based: choose the right validation path automatically
/qa http://localhost:3000        # Same, explicit frontend URL
/qa -- validate the workflow filters API

How It Works

Read the code changes from git diff
Read the changed files to understand behavior, routes, schemas, and UI
Classify the diff as frontend/browser, backend API, backend-internal, or mixed
Run the right validation flow
Report pass/fail with concrete evidence

Step 1: Understand the Changes

Get the diff

# What files changed?
git diff --name-only HEAD~1     # vs last commit (if changes are committed)
git diff --name-only            # vs working tree (if uncommitted)

# Full diff for context
git diff HEAD~1                 # or git diff for uncommitted

Pick whichever diff has content. If both are empty, there is nothing diff-driven to QA.

Read the changed files

Read the full contents of every changed file that affects behavior:

Frontend files: .tsx, .jsx, .ts, .js, .css, .html
Backend/API files: routes, controllers, request/response schemas, serializers, handlers
Backend-internal files: services, workers, business logic, validators, data-layer code
Tests that changed alongside the implementation

Look for:

route paths and page entry points
component names, visible text, forms, buttons, error states
API endpoints, request params, response fields, auth requirements
validation logic, branching behavior, feature flags, empty states
tests that describe the expected behavior

Step 2: Classify the Diff

Mode	Trigger	Primary validation
Frontend/browser	UI/routes/components/styles changed	Browser QA against the dev server
Backend API	Route handlers, request/response schemas, or externally visible API behavior changed	Start backend locally and run targeted API requests
Backend-internal	Services/workers/business logic changed without public API surface changes	Repo-native fast checks plus targeted tests
Mixed	Frontend/browser and backend changed together	Backend validation first, then frontend/browser QA

Use these rules:

If both frontend/browser and backend changed, treat it as Mixed.
If only backend internals changed, do not invent unrelated browser tests or random API calls.
If a backend change might affect the public contract, inspect routes, schemas, and tests before choosing backend-internal.
If the diff is mostly documentation or comments, keep QA lightweight and report that no behavioral validation was warranted.

Step 3: Choose the Validation Strategy

Frontend/browser mode

Use browser automation against the dev server. Validate the specific UI changes plus 1-2 adjacent regression checks.

Backend API mode

Use the repo's documented local startup and auth instructions, start the backend if needed, identify the changed endpoint(s), and run targeted HTTP requests to validate the changed contract.

Backend-internal mode

Run the repo's fast verification commands first, then targeted unit/integration/scenario tests for the changed logic. Only start the backend and do live API calls if the change affects exposed behavior.

Mixed mode

Validate the backend first, then run frontend/browser QA against the flow that depends on it. If the backend contract is broken, frontend results are not trustworthy.

Step 4A: Frontend/Browser QA

Find the dev server

If the user provided a URL, use it. Otherwise auto-detect common local ports:

5173, 3000, 3001, 8080, 8000, 4200

If none respond, start the most direct repo-documented local command for the changed surface. If the diff needs both frontend and backend running together and the repo provides a combined frontend/backend dev script, prefer that. Only ask the user to start something manually if the repo has no documented command or startup fails.

Connect to a browser

Try these in order:

Option A: Local browser (fastest)

skyvern_browser_session_create(local=true, headless=false, timeout=15)

Use local=true so the browser can reach localhost.

Option B: Local browser via tunnel

If local session creation fails because the MCP server is remote, the cloud browser cannot reach localhost. Tell the user to run:

# Terminal 1: Launch a local browser with CDP exposed
skyvern browser serve --port 9222

# Terminal 2: Tunnel it to the internet
ngrok http 9222

Then connect:

skyvern_browser_session_connect(cdp_url="wss://<ngrok-subdomain>.ngrok-free.app/devtools/browser/<id>")

The user can get the browser ID from the skyvern browser serve output or by calling the ngrok URL's /json endpoint.

Option C: Cloud browser

skyvern_browser_session_create(timeout=15)

Only works for publicly reachable URLs. localhost URLs will not work here.

Generate frontend/browser test cases

For each changed frontend file, create targeted checks. Examples:

Test 1: Settings page renders the new "Retry failed run" button
  - Navigate to /settings/runs
  - Assert: button with text "Retry failed run" exists
  - Click it
  - Assert: success toast appears

Test 2: Adjacent regression
  - Verify the existing "Delete run" action still works or is still visible

Be specific. Do not write "verify the page works."

Run the frontend/browser tests

For each test case:

skyvern_navigate(url="http://localhost:<port>/<route>")

Health gate after navigation:

skyvern_evaluate(expression="(() => {
  const errors = [];
  const body = document.body?.innerText || '';
  if (body.includes('Something went wrong')) errors.push('error_message');
  if (body.includes('Cannot read properties')) errors.push('js_error_in_ui');
  if (/\\bundefined\\b/.test(body) && !/\\bif\\b|\\btypeof\\b|\\bdocument|tutorial|example/i.test(body) && body.length < 5000) errors.push('undefined_text');
  if (body.includes('connection refused')) errors.push('connection_refused');
  if (/sign.?in|log.?in|auth/i.test(window.location.pathname)) errors.push('auth_redirect');
  if (document.querySelector('[role=\"alert\"]')) errors.push('alert_element');
  if (!document.querySelector('main, [role=\"main\"], nav, header, h1, h2, [class*=\"layout\" i], [class*=\"page\" i], [class*=\"app\" i]'))
    errors.push('blank_page');
  return JSON.stringify({ pass: errors.length === 0, errors });
})()")

Prefer deterministic DOM assertions:

skyvern_evaluate(expression="!!document.querySelector('button')")
skyvern_evaluate(expression="document.querySelector('h1')?.textContent?.trim()")
skyvern_evaluate(expression="window.location.pathname")

Use interaction tools when needed:

skyvern_act(prompt="Click the 'Retry failed run' button")
skyvern_act(prompt="Fill the email field with 'test@example.com' and click Submit")
skyvern_validate(prompt="The page shows the success toast and the form is no longer loading")
skyvern_screenshot()

Also check for failed network requests once per page:

skyvern_evaluate(expression="(() => {
  const entries = performance.getEntriesByType('resource').filter(e => e.responseStatus >= 400);
  return JSON.stringify({ failed: entries.map(e => ({ url: e.name, status: e.responseStatus })).slice(0, 5) });
})()")

Step 4B: Backend API QA

Gather repo-local context first

Before starting the server or sending requests, read the repo's local instructions:

README, AGENTS.md, CLAUDE.md, Makefile, package.json, pyproject.toml
existing test files for the changed endpoints
any docs that describe auth, local ports, or startup commands

Do not guess the startup command if the repo already documents one.

Start the backend if needed

If the backend is not already responding on the expected local port:

Start it with the most direct repo-documented local command for the changed surface. If the validation needs both frontend and backend and the repo documents a combined dev environment command, prefer that over inventing separate startup steps.
Wait for readiness
Confirm the API is reachable before sending validation requests

If the repo requires background processes, start them in the background and keep notes on how you did it.

Identify the changed API surface

Use the diff to answer:

Which endpoints changed?
Which request parameters, headers, or bodies changed?
Which response fields or status codes changed?
Did auth requirements change?
Is there a create/update/delete side effect that needs follow-up verification?

Do not stop at the route file. Read the full handler, schema, and any changed tests.

Generate backend API test cases

For each changed endpoint, create targeted checks:

Happy path with valid parameters
Empty result or not-found path where applicable
Invalid input or validation error path
Combined filters or sorting behavior when relevant
Follow-up read after mutation endpoints

Examples:

Test 1: GET /api/runs returns the new field in the response body
Test 2: GET /api/runs?status=missing returns an empty list, not a 500
Test 3: POST /api/runs rejects invalid payload with a 4xx validation error
Test 4: PATCH /api/runs/:id updates the record and a follow-up GET shows the change

Execute the API requests

Use the repo's documented auth scheme and local base URL. Use curl, the repo SDK, or a small one-off client if that is clearer than shell quoting. Prefer simple, inspectable commands.

Examples:

curl -sS -H "Authorization: Bearer <token>" \
  "http://localhost:<port>/api/..."

curl -sS -X POST \
  -H "Content-Type: application/json" \
  -H "<auth-header>: <token>" \
  -d '{"example":"value"}' \
  "http://localhost:<port>/api/..."

Capture:

request being tested
status code
response body snippet or parsed result
whether the changed field/behavior is present

If the endpoint is authenticated and you cannot obtain local credentials from repo docs, say so clearly and stop rather than faking coverage.

Step 4C: Backend-Internal QA

If the diff is backend-only but does not change an exposed endpoint or UI flow:

Run the repo's fastest compile/type/lint checks for the changed files
Run targeted unit/integration/scenario tests that cover the changed logic
Add live API calls only if the internal change affects exposed behavior

Examples of appropriate checks:

compile/type-check the changed files
targeted pytest, npm test, go test, or equivalent
scenario/integration tests around changed services or workflows

Examples of inappropriate checks:

hitting an unrelated health endpoint and calling it "validated"
browsing the UI when no frontend behavior changed
calling random APIs just because the backend changed

Step 5: Report Results

## QA Report

### Validation Mode
- Mode: Backend API
- Scope: `routes/runs.py`, `schemas/run_response.py`

### Changes Tested
- Added `retryable` field to run responses
- Updated `status` filter handling

### Results
| # | Test | Result | Evidence |
|---|------|--------|----------|
| 1 | GET /api/runs returns `retryable` for valid runs | PASS | HTTP 200, field present in response |
| 2 | GET /api/runs?status=missing returns empty list | PASS | HTTP 200, `[]` |
| 3 | GET /api/runs?status=invalid returns validation error | PASS | HTTP 422 |
| 4 | Frontend runs page still renders filter state | PASS | screenshot_3 |

### Issues Found
1. `retryable` is missing from one branch of the response serializer.

### Verdict
3/4 tests passed. 1 issue found.

Report the evidence that actually matters:

screenshots for frontend/browser results
status codes and response snippets for backend API results
command + failing assertion for unit/integration tests

Step 6: Post Evidence to PR

After generating the QA report, persist it to the pull request as a sticky comment so the evidence survives beyond the conversation.

Check for an open PR

PR_NUMBER=$(gh pr view --json number -q '.number' 2>/dev/null)

If no PR exists for the current branch:

Save the full report markdown to .qa/latest-report.md in the project root (create the directory if needed).
Tell the user: "No open PR found for this branch. QA report saved to .qa/latest-report.md. Run /qa again after creating a PR to post it."
Stop here — do not attempt to create a PR.

Post or update the sticky comment

Use a hidden HTML marker to make the comment idempotent across multiple runs:

# Prepare the comment body with the hidden marker
COMMENT_BODY="<!-- skyvern-qa-report -->
## QA Report — $(git rev-parse --short HEAD) — $(date -u +%Y-%m-%dT%H:%M:%SZ)

<the full report markdown from Step 5>
"

# Find an existing QA comment on the PR
EXISTING_COMMENT_ID=$(gh api "repos/{owner}/{repo}/issues/${PR_NUMBER}/comments" \
  --jq '.[] | select(.body | test("skyvern-qa-report")) | .id' \
  2>/dev/null | head -1)

if [ -n "$EXISTING_COMMENT_ID" ]; then
  # Update the existing comment in place
  gh api "repos/{owner}/{repo}/issues/comments/${EXISTING_COMMENT_ID}" \
    -X PATCH -f body="$COMMENT_BODY"
else
  # Create a new comment
  gh pr comment "$PR_NUMBER" --body "$COMMENT_BODY"
fi

Screenshot handling

Screenshots taken during QA (via skyvern_screenshot()) are saved locally for the agent's verification. They are not uploaded to the PR comment because GitHub's API does not support image uploads in issue comments. The text report describes what was observed.

If the user asks to preserve screenshots, save them to .qa/screenshots/ and tell the user the local path. Do not include local file paths in the PR comment — they are meaningless to other reviewers.

Rules

Always include the  marker so repeated runs update the same comment instead of creating duplicates.
Include the short commit hash and UTC timestamp in the comment header.
Do not create a PR just to post a QA report — that is the user's decision.
If gh is not available or not authenticated, fall back to saving the report locally and tell the user.

Error Handling

Problem	Action
No git diff found	Ask what behavior to validate, then fall back to explore mode
Frontend dev server not running	Start the most direct repo-documented local command for the changed surface; prefer a combined dev command only when the validation needs both frontend and backend; only ask the user if no documented command exists or startup fails
Backend server not running	Start the most direct repo-documented local command for the changed surface; prefer a combined dev environment command only when the validation needs both sides
Cannot identify changed endpoint	Read changed routes, schemas, and tests before proceeding
Auth required but no local creds available	Report the blocker clearly; do not fake coverage
Component does not render	Capture screenshot and specific UI error
API returns unexpected 5xx	Save request/response evidence and report the regression

Session Cleanup

Always close browser sessions when done:

skyvern_browser_session_close()

If you started local servers or background processes, leave the user a clear note about what is still running.

Fallback: Explore Mode

If there is no useful diff, fall back to explicit exploration:

Ask what behavior should be validated
If it is a frontend flow, use browser QA
If it is a backend/API flow, run targeted local API checks
Report findings with the same evidence standard

The primary mode is still diff-driven. Always try to understand the code changes first.

16 KiB Raw Blame History