mirror of https://github.com/Skyvern-AI/skyvern.git
synced 2026-04-30 20:50:05 +00:00
983 lines · 40 KiB · Python
"""MCP prompt skills for Skyvern workflow design, debugging, data extraction, and QA testing.
|
|
|
|
These prompts are registered with @mcp.prompt() and injected into LLM conversations
|
|
to guide Claude through Skyvern automation tasks. Each prompt teaches Claude to act
|
|
as the USER — designing workflows and interpreting results — while Skyvern's AI
|
|
handles the actual browser navigation.
|
|
|
|
Milestone context:
|
|
M3: Expose MCP functionality as Claude Skills (this module)
|
|
M4: Expose Skyvern Workflow Copilot via MCP + Skills (these prompts power it)
|
|
"""

from typing import Annotated

from pydantic import Field

# ---------------------------------------------------------------------------
# build_workflow
# ---------------------------------------------------------------------------

BUILD_WORKFLOW_CONTENT = """\
# Build a Skyvern Workflow

## How Skyvern Workflows Work

You create a workflow definition (blocks + parameters). Skyvern executes it in a cloud browser.
Each block runs in order, sharing the same browser session. The key differentiator is
**navigation blocks**: you describe a goal in natural language and Skyvern's AI navigates the site
autonomously — clicking, filling forms, handling popups, and retrying on failure. You do NOT
need to specify selectors, XPaths, or step-by-step instructions.

Workflows are versioned, parameterized, and reusable. Once a workflow works, you can re-run it
with different inputs forever. Think of each workflow as a saved automation skill.

Tools you will use: skyvern_workflow_create, skyvern_workflow_run, skyvern_workflow_status,
skyvern_workflow_update, skyvern_block_schema, skyvern_block_validate.

---

## Design the Workflow

### Default to navigation blocks

**navigation** is the right choice for most steps. You give it a URL and a navigation_goal.
Skyvern figures out the navigation at runtime.

GOOD navigation_goal — describes the GOAL and what "done" looks like:
"Search for '{{product_name}}' in the search bar, click the first result, and add it to cart.
Done when the cart icon shows 1 item."

BAD navigation_goal — describes HOW to do it (Skyvern already knows):
"Find the input element with id='search', type the product name, press Enter, wait for
results to load, find the first anchor tag in the results div, click it..."

### Block type decision tree

1. **navigation** (default) — AI-powered browser actions. Use when the step involves browsing,
   clicking, filling forms, or any multi-action sequence. Skyvern handles element finding,
   popup dismissal, scrolling, and retries.
2. **extraction** — structured data extraction. Use when you need JSON output from a page
   (prices, tables, lists). Requires data_extraction_goal and data_schema.
3. **for_loop** — iterate over a list. Use when you need to repeat blocks for each item
   in a parameter list (e.g., process each URL, each product).
4. **conditional** — branch based on conditions. Use when workflow logic should diverge
   based on data from a previous block or a Jinja2 expression.
5. **goto_url** — simple navigation without any actions. Use to jump to a known URL.
6. **login** — authenticate with stored credentials. Use for sites that require login.
7. **code** — run Python for data transformation between blocks.
8. **action** — single focused action on the current page (e.g., click one button).

Use skyvern_block_schema() to see full schemas and examples for any block type.
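
For orientation, an illustrative two-block definition might look like this. The field names here
are an assumption based on the conventions above; confirm the exact schema with
skyvern_block_schema() before submitting:

{
  "workflow_definition": {
    "parameters": [
      {"parameter_type": "workflow", "key": "query", "workflow_parameter_type": "string"}
    ],
    "blocks": [
      {"block_type": "navigation", "label": "search",
       "navigation_goal": "Search for '{{query}}'. Done when results load."},
      {"block_type": "extraction", "label": "extract_results",
       "data_extraction_goal": "Extract product names and prices from the results.",
       "data_schema": {"type": "array", "items": {"type": "object"}}}
    ]
  }
}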

### Engine selection for workflow blocks

Task-based blocks (navigation, extraction, action, login, file_download) default to engine 1.0 (`skyvern-1.0`).
Omit the `engine` field unless you need 2.0. Non-task blocks (for_loop, conditional, code, wait, etc.)
do not have an engine field — do not set one.

Use engine 2.0 (`"engine": "skyvern-2.0"`) on a **navigation** block when:
- The block's goal requires dynamic planning — discovering what to do at runtime, conditional
  branching, or looping over unknown items on the page.
- Example: "Navigate through a multi-step insurance quote wizard, handling dynamic questions
  based on previous answers, then extract the final quote."

Keep engine 1.0 (default, omit field) when:
- The path is known upfront — all fields, values, and actions are specified in the prompt.
- A long prompt with many form fields is still 1.0. Complexity means dynamic planning, not field count.
- Example: "Fill in SSN, first name, last name, select 'Sole Proprietor', click Continue."

When in doubt, split into multiple 1.0 blocks rather than using one 2.0 block — it's cheaper and
gives you per-block observability. Only navigation blocks support engine 2.0.

### Model selection for text_prompt blocks

Default to Skyvern Optimized for text_prompt blocks by omitting both `model` and `llm_key`.

If the user explicitly asks for a specific model, use the public `model` field:
`"model": {"model_name": "<one of the values returned by /models>"}`

Do NOT invent internal `llm_key` strings like `ANTHROPIC_CLAUDE_3_5_SONNET`.
Only use `llm_key` when the user explicitly provides an exact internal key and wants that advanced override.

### One block per logical step

Split workflows into small, focused blocks. Each block should do ONE thing.

GOOD (3 blocks):
Block 1 (navigation): "Go to the search page and search for '{{query}}'. Done when results load."
Block 2 (navigation): "Click the first result and add it to cart. Done when cart shows 1 item."
Block 3 (extraction): "Extract the product name, price, and availability from the cart page."

BAD (1 block):
Block 1 (navigation): "Search for the product, click the first result, add to cart, go to checkout,
fill in shipping, enter payment, and submit the order."

### Common workflow shapes

**Search + Extract**: goto_url -> navigation (search) -> extraction (results)
**Multi-page form**: navigation (page 1) -> navigation (page 2) -> navigation (page 3) -> extraction (confirmation)
**Login + Action**: login (authenticate) -> navigation (do work) -> extraction (results)
**Batch processing**: for_loop over URLs -> navigation (process each) -> extraction (gather data)

---

## Parameterize for Reuse

Every workflow should accept parameters so it can be re-run with different inputs.

### Declaring parameters

Parameters are declared in the workflow_definition.parameters array:
{"parameter_type": "workflow", "key": "company_name", "workflow_parameter_type": "string"}

Supported types: string, integer, float, boolean, json, file_url.

### Referencing parameters

Use {{parameter_key}} in any block field — prompts, URLs, data schemas, goal descriptions.
Skyvern substitutes values at runtime.

### What to parameterize

- Input data: names, addresses, search queries, product IDs
- URLs: if the target URL varies between runs
- Credentials: use the login block + credential_id parameter

### What NOT to parameterize

- Navigation instructions: these are the block prompts themselves
- Block structure: if you need different flows, create separate workflows
- Static site URLs: if the URL is always the same, hardcode it

---

## Test via Skyvern's Feedback Loop

Do NOT try to get the workflow perfect on the first attempt. Use this iteration loop:

### Step 1: Create and run

Call skyvern_workflow_create with your definition, then skyvern_workflow_run with test parameters.

### Step 2: Check status

Poll with skyvern_workflow_status using the run_id. The response tells you:
- Which block succeeded or failed
- The failure_reason with what Skyvern saw on the page
- Step count and timing
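
A typical first iteration, sketched with indicative arguments (check each tool's input schema
for the exact parameter names):

skyvern_workflow_create(<definition>)        -> returns workflow_id
skyvern_workflow_run(workflow_id, <params>)  -> returns run_id
skyvern_workflow_status(run_id)              -> status, failure_reason, output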

### Step 3: Fix and re-run

Based on the error feedback:
- **"Prompt too vague"** — add specificity about what "done" looks like. Example: change
  "Fill in the form" to "Fill in the form with company name '{{name}}'. Done when the
  confirmation page shows 'Application Submitted'."
- **"Element not found"** — add a navigation hint. Example: "Look for the search bar in the
  top navigation area" or "The form is inside an iframe."
- **"Wrong page"** — add a URL check or split into smaller blocks so each one starts on the
  right page.
- **"Timeout"** — the page may be slow. Increase max_retries on the block or add a wait block.

Call skyvern_workflow_update with the fixed definition, then skyvern_workflow_run again.

### Step 4: Verify output

When the run succeeds, check the output field in skyvern_workflow_status. For extraction blocks,
verify the JSON matches your data_schema.

---

## Quick Feasibility Check (Optional)

If you are unsure whether Skyvern can handle a particular site, use skyvern_run_task as a probe
BEFORE building a full workflow.

skyvern_run_task is a one-shot autonomous agent. Give it a URL and a prompt — it navigates
the site, takes actions, and reports results. No workflow definition needed.

If it succeeds: turn the approach into a multi-block workflow for reuse.
If it struggles: read the failure_reason, refine the prompt, and try again. Two or three
iterations usually reveal whether the site is automatable and what prompt phrasing works.

Do NOT open a manual browser session (skyvern_browser_session_create) to explore the site before
building a workflow. That approach bypasses Skyvern's AI and wastes time. skyvern_run_task
gives you the same insight faster because Skyvern navigates the site for you.

---

## Prompt Refinement Tips

When a navigation block fails, refine the navigation_goal:
1. Use skyvern_run_task first to identify what Skyvern sees on the page.
2. Add specificity: reference exact labels visible on the page.
3. Describe what "done" looks like so Skyvern knows when to stop.

Example workflow with clear goals:
Block 1 (login): Authenticate with stored credentials.
Block 2 (navigation): "Navigate to the settings page and open 'Notification Preferences'.
Uncheck 'Marketing Emails' and check 'Security Alerts'. Click 'Save Changes'.
Done when a success banner appears."
Block 3 (extraction): "Extract the confirmation message."

---

## Pre-Flight Checklist

Before calling skyvern_workflow_create, verify:
1. Each block has a clear, single-responsibility goal (not a multi-page mega-prompt).
2. Navigation block goals describe WHAT to achieve, not HOW to click.
3. Every variable input uses {{parameter_key}} and is declared in parameters.
4. Extraction blocks include a data_schema with the expected JSON structure.
5. The block order matches the actual site flow (blocks share one browser session).
6. Login is handled by a login block (not embedded in a navigation goal).
7. You have test parameter values ready for the first skyvern_workflow_run call.
8. Validate blocks with skyvern_block_validate before submitting the full definition.
"""


# ---------------------------------------------------------------------------
# debug_automation
# ---------------------------------------------------------------------------

DEBUG_AUTOMATION_CONTENT = """\
# Debugging Skyvern Automations

When a workflow run or task fails, follow this structured process: read the error, diagnose the pattern,
fix and re-run. Do NOT open a manual browser session to explore — Skyvern already tells you what went wrong.

## Step 1: Read the Error

Call skyvern_workflow_status with the workflow_run_id to get structured failure info.

Key fields to examine:
- **status**: "failed", "terminated", or "timed_out"
- **failure_reason**: which block failed and why
- **screenshot**: what the page looked like when the failure occurred
- **extracted_information**: any partial data the AI did extract before failing

If this was an interactive session failure (not a workflow), call skyvern_screenshot to see the current page state,
then check the last tool response for error details.

Record three things before proceeding:
1. Which block (by label) failed
2. The error type (timeout, element not found, wrong page, extraction empty, auth required)
3. What the AI reported seeing on the page

## Step 2: Diagnose the Pattern

Skyvern failures fall into predictable categories. Match the error to a pattern and apply the standard fix.

### Timeout (block exceeded max_steps or wall time)
- Cause: prompt is too vague, so the AI explores without converging.
- Fix: add specificity about what "done" looks like. Instead of "fill out the form", write "fill out the form and
  click the blue Submit button. Done when you see a confirmation message containing a reference number."
- Also check: is max_steps too low? Default is reasonable, but complex forms may need more.

### Element Not Found (AI could not locate the target element)
- Cause: label mismatch (the button says "Continue" but the prompt says "Next"), or element loads asynchronously.
- Fix: update the prompt to use the exact label visible on the page. If the element loads after a delay, add
  "wait for the page to fully load before acting" to the prompt.
- If you know the exact label: switch to an action block for a single precise interaction.

### Wrong Page (block started on an unexpected page)
- Cause: the previous block did not complete its page transition. The current block assumed it would land on page B
  but it is still on page A.
- Fix: update the previous block's prompt to explicitly include the page transition. Add "click Continue and wait
  until the next page loads" instead of just "click Continue". Alternatively, add a goto_url block between them.

### Extraction Empty (extraction returned null or empty object)
- Cause: data loads dynamically (AJAX, infinite scroll) and was not present when the AI read the page. Or the
  extraction prompt does not match the page structure.
- Fix: add "wait for the data table to fully load" to the prompt. If data requires scrolling, add a navigation
  block that scrolls first. If the prompt is wrong, update it to describe the data using labels visible on the page.

### Auth Failure (redirected to login page)
- Cause: workflow does not handle authentication, or session cookies expired.
- Fix: add a login block at the start of the workflow, or use a browser_profile that has saved credentials.

### Stuck / Hanging (run stays "running" indefinitely)
- Action: call skyvern_workflow_cancel to stop the run. Then investigate: is the page showing a CAPTCHA, a
  modal dialog, or an unexpected redirect? Check the last screenshot from skyvern_workflow_status.

### Rate Limited or Blocked (403, CAPTCHA, "unusual traffic" message)
- Cause: the target site detected automation.
- Fix: add a proxy (residential or ISP) to the workflow's proxy_location parameter. Reduce request frequency
  by adding wait blocks between actions. If CAPTCHA persists, report to the user — this may require manual
  intervention or a CAPTCHA-solving integration.

## Step 3: Fix and Re-run

Use skyvern_workflow_update to modify the failing block. Do NOT delete the workflow and recreate it.

Fixing playbook:
1. Update the failing block's prompt to address the diagnosed issue. Be specific: add exact labels, describe
   what "done" looks like, mention elements to wait for.
2. If the navigation_goal is too vague for a complex form, make it more explicit — reference exact field labels,
   describe the form layout, and specify what "done" looks like.
3. Re-run with skyvern_workflow_run using the same parameters as the failed run.
4. Poll skyvern_workflow_status until the run completes. Check whether the previously failing block now passes.
5. If it still fails with the same error: refine the prompt further. If it fails with a NEW error, restart
   diagnosis from Step 1.
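
The loop, sketched with indicative arguments (check each tool's input schema for the exact names):

skyvern_workflow_update(workflow_id, <definition with the revised block prompt>)
skyvern_workflow_run(workflow_id, <same parameters as the failed run>)
skyvern_workflow_status(workflow_run_id)  -> confirm the previously failing block passes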

## Step 4: Escalation

Not every failure can be fixed by prompt refinement. Know when to escalate.

**Use action blocks for single-step precision** when:
- A navigation block does too much — you only need one specific click or input
- The page has multiple similar-looking elements and the AI picks the wrong one
- The step involves a single focused action (e.g., click one button, toggle one checkbox)

**Open a manual session (last resort)** when:
- You cannot determine from error output what the page looks like
- The site has unusual UI patterns not described in any error message
- Use: skyvern_browser_session_create, skyvern_navigate to the failing URL, skyvern_screenshot to see the page

**Report to the user** when:
- CAPTCHA blocks persist even with proxy rotation
- The site requires 2FA or hardware authentication
- Rate limiting cannot be avoided with proxies and delays
- The site's terms of service explicitly prohibit automation
"""


# ---------------------------------------------------------------------------
# extract_data
# ---------------------------------------------------------------------------

EXTRACT_DATA_CONTENT = """\
# Data Extraction with Skyvern

You design the data schema and describe what to extract. Skyvern's AI finds the elements, \
parses the page, and returns structured JSON. You never write selectors or scraping code.

## Schema Design

Always provide a `data_extraction_schema` (JSON Schema) so Skyvern returns typed, validated output.

- Use `"type": "object"` for a single record (profile, summary, confirmation).
- Use `"type": "array"` with `"items": { "type": "object", ... }` for lists (search results, table rows).
- Mark critical fields as `"required"` so missing data surfaces as an error rather than null.
- Choose descriptive property names that reflect the data, not the page layout \
(`order_date` not `col_3`, `company_name` not `first_bold_text`).
- Nest objects when the data is naturally hierarchical:
  `{ "seller": { "name": "...", "rating": 4.5 }, "price": { "amount": 29.99, "currency": "USD" } }`

## Writing Extraction Prompts

Describe WHAT to extract, not WHERE it is on the page. Skyvern's AI locates the data.

Good: "Extract all product names, prices, and star ratings from the search results"
Bad: "Get the text from each div.product-card > span.price"

When the page has multiple similar sections, specify which one:
"Extract order details from the table under 'Recent Orders', not 'Recommended Products'"

For simple extractions on a page you already navigated to, use `skyvern_extract` with a schema. \
For extraction combined with navigation (log in, then go to dashboard, then extract), use \
`skyvern_run_task` with a `data_extraction_schema` -- it handles the full flow in one call.
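
For example (argument names are indicative; check each tool's input schema):

skyvern_extract(prompt="Extract all product names and prices from the search results",
                data_extraction_schema={"type": "array", "items": {"type": "object"}})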

## Multi-Page Extraction

For paginated results, build a workflow with a `for_loop` block that iterates over page numbers \
or "next" clicks. Each iteration uses an extraction block to pull that page's data.

For infinite-scroll pages, use `skyvern_run_task` with a prompt like \
"scroll to load all results, then extract every item" -- Skyvern handles the scrolling.

For detail-page drilling (list page -> click each item -> extract details), build a workflow: \
extraction block to get the list of links, then a `for_loop` block that visits each link and \
extracts the detail fields.

## Validate Results

After extraction, check the returned data before using it:
- Verify record count matches expectations (e.g., "I expected ~50 results but got 3").
- Check for null or empty fields that should have values.
- If the data looks wrong, refine the extraction prompt (be more specific about which section \
or what the data looks like), not the schema.
- Use `skyvern_validate` for page-level assertions before extracting \
("Is this the search results page?" / "Are there at least 10 results visible?").

## Caching Considerations

Workflows created via MCP default to Code 2.0 (code_version=2, run_with="code").

### What this means for workflow design

- **First run**: The AI agent runs all blocks, recording actions. A cached script is generated afterward.
- **Subsequent runs**: The script replays deterministically — 10-100x faster, no LLM costs.
- **AI fallback**: If the script encounters an element it cannot find, it falls back to the AI agent \
for that step. The fallback episode is recorded and used to improve the script.

### Design for cacheability

1. Use stable selectors: navigation goals that reference exact field labels cache better than vague \
descriptions. "Fill in the 'Company Name' field" caches better than "fill in the first text box."
2. Avoid dynamic page content in goals: if a page shows different content each time, the cached script \
may need frequent AI fallbacks. Consider splitting dynamic sections into separate blocks.
3. Parameterize all variable data: cached scripts substitute parameters at runtime. Hardcoded values \
in navigation_goal become part of the script literally.

### Overriding execution mode at run time

Pass `run_with="agent"` to `skyvern_workflow_run` to force AI execution for a specific run without \
changing the workflow definition. This is useful for:
- First runs when no script exists yet (the system handles this automatically)
- Debugging: comparing AI behavior vs script behavior
- Sites that changed layout since the last successful script run
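
For example (argument values are placeholders; other arguments elided):

skyvern_workflow_run(workflow_id=<id>, parameters={...}, run_with="agent")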
"""


# ---------------------------------------------------------------------------
# Prompt functions
# ---------------------------------------------------------------------------


def build_workflow(
    task_description: Annotated[
        str,
        Field(description="What the workflow should automate, e.g. 'Fill out a tax form on irs.gov'"),
    ] = "",
) -> str:
    """Guide for building a Skyvern workflow. Invoke this prompt when a user asks to create,
    design, or build a browser automation workflow. The guide covers block selection, prompt
    writing, parameterization, testing, and iteration."""
    if task_description:
        return f"{BUILD_WORKFLOW_CONTENT}\n---\n\nUser's automation goal:\n```\n{task_description}\n```\n"
    return BUILD_WORKFLOW_CONTENT


def debug_automation(
    error_or_symptom: Annotated[
        str,
        Field(description="The error message or symptom to diagnose, e.g. 'Timeout after 30s on login page'"),
    ] = "",
) -> str:
    """Diagnose and fix a failing Skyvern workflow or task.

    Guides you through reading Skyvern's structured error output, matching the failure to a known pattern,
    and fixing the workflow by updating block prompts — without manually exploring in a browser.
    """
    parts = [DEBUG_AUTOMATION_CONTENT]
    if error_or_symptom:
        parts.append(
            f"\n---\n\nThe user reports this error or symptom:\n```\n{error_or_symptom}\n```\n\n"
            "Start at Step 1: call skyvern_workflow_status (or check the last tool response) to get the full error "
            "details. Then match it to a pattern in Step 2 and apply the fix from Step 3."
        )
    return "\n".join(parts)


def extract_data(
    target_description: Annotated[
        str,
        Field(description="What data to extract, e.g. 'Product prices and ratings from Amazon search results'"),
    ] = "",
) -> str:
    """Guide for extracting structured data from websites using Skyvern.

    Covers schema design, writing extraction prompts, multi-page extraction patterns, and result
    validation. Call this prompt before building an extraction workflow or writing extraction calls.
    """
    suffix = ""
    if target_description:
        suffix = (
            f"\n\nApply the above methodology to extract:\n```\n{target_description}\n```\n"
            "Design a JSON Schema for the output, choose the right tool "
            "(skyvern_extract for current page, skyvern_run_task for navigate-then-extract), "
            "and validate the results."
        )
    return EXTRACT_DATA_CONTENT + suffix


# ---------------------------------------------------------------------------
# qa_test
# ---------------------------------------------------------------------------

# NOTE: This content is maintained in three places — keep all in sync:
# 1. skyvern/cli/skills/qa/SKILL.md (bundled with pip package — canonical)
# 2. .claude/skills/qa/SKILL.md (project-local copy for this repo)
# 3. skyvern/cli/mcp_tools/prompts.py (QA_TEST_CONTENT — this file)
QA_TEST_CONTENT = """\
# QA — Validate Frontend and Backend Changes

Read the diff, classify what changed, and run the right validation path: browser QA for frontend/browser
changes, API validation for backend surface changes, repo-native validation for backend-internal changes,
and both for mixed changes.

This is NOT a generic crawler. It should validate the changed behavior with the right tools.

---

## Step 1: Understand the Changes

### Get the diff

```bash
# What files changed?
git diff --name-only HEAD~1 # vs last commit (if changes are committed)
git diff --name-only # vs working tree (if uncommitted)

# Full diff for context
git diff HEAD~1 # or git diff for uncommitted
```

Pick whichever diff has content. If both are empty, there is nothing diff-driven to QA.

### Read the changed files

Read the full contents of every changed file that affects behavior:

- Frontend files: `.tsx`, `.jsx`, `.ts`, `.js`, `.css`, `.html`
- Backend/API files: routes, controllers, request/response schemas, serializers, handlers
- Backend-internal files: services, workers, validators, business logic, data-layer code
- Tests that changed alongside the implementation

Look for routes, UI text, forms, API parameters, response fields, validation logic, error states,
and any tests that describe the intended behavior.

### Classify the diff

| Mode | Trigger | Primary validation |
|------|---------|--------------------|
| Frontend/browser | UI/routes/components/styles changed | Browser QA against the dev server |
| Backend API | Route handlers, request/response schemas, or externally visible API behavior changed | Start backend locally and run targeted API requests |
| Backend-internal | Services/workers/business logic changed without public API surface changes | Repo-native fast checks plus targeted tests |
| Mixed | Frontend/browser and backend changed together | Backend validation first, then frontend/browser QA |

Rules:

- If both frontend/browser and backend changed, use `Mixed`.
- If only backend internals changed, do not invent unrelated browser tests or random API calls.
- If a backend change might affect the public contract, inspect routes, schemas, and tests before treating it as internal-only.
- If the diff is mostly documentation or comments, keep QA lightweight and report that no behavioral validation was warranted.

---

## Step 2: Choose the Validation Strategy

### Frontend/browser mode

Use browser automation against the dev server. Validate the specific UI changes plus 1-2 nearby regressions.

### Backend API mode

Read the repo's local instructions first, start the backend if needed, identify the changed
endpoint(s), and run targeted HTTP requests to validate the changed contract.
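
For example, if the diff changed a retry endpoint, a targeted check could look like this (the
endpoint path, port, and payload are hypothetical; take the real values from the repo and the diff):

```bash
# Hypothetical endpoint and port; substitute the repo's documented values.
curl -sS -X POST 'http://localhost:8000/api/v1/runs/123/retry' -H 'Content-Type: application/json' -d '{"reason": "qa"}' -w ' %{http_code}'
```

Assert that the status code and response body match the changed contract, and include at least one
negative case (e.g., an invalid payload returns a 4xx).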
|
|
|
|
### Backend-internal mode
|
|
|
|
Run the repo's fast verification commands first, then targeted unit/integration/scenario tests.
|
|
Only do live API calls if the internal change affects exposed behavior.
|
|
|
|
### Mixed mode
|
|
|
|
Validate the backend first, then run frontend/browser QA against the flow that depends on it.
|
|
If the backend contract is broken, frontend results are not trustworthy.
|
|
|
|
---
|
|
|
|
## Step 3A: Frontend/Browser QA
|
|
|
|
### Find the dev server
|
|
|
|
If a URL was provided, use it. Otherwise try common local ports:
|
|
|
|
```text
|
|
5173, 3000, 3001, 8080, 8000, 4200
|
|
```
|
|
|
|
If none respond, start the most direct repo-documented local command for the
|
|
changed surface. If the diff needs both frontend and backend running together
|
|
and the repo provides a combined frontend/backend dev script, prefer that.
|
|
Only ask the user to start something manually if the repo has no documented
|
|
command or startup fails.
|
|
|
|
### Connect to a browser
|
|
|
|
Try these in order:
|
|
|
|
#### Option A: Local browser (fastest)
|
|
```text
|
|
skyvern_browser_session_create(local=true, headless=false, timeout=15)
|
|
```
|
|
Use `local=true` so the browser can reach `localhost`.
|
|
|
|
#### Option B: Local browser via tunnel
|
|
|
|
If local session creation fails because the MCP server is remote, the cloud browser cannot reach
|
|
`localhost`. Tell the user to run:
|
|
|
|
```bash
|
|
# Terminal 1: Launch a local browser with CDP exposed
|
|
skyvern browser serve --port 9222
|
|
|
|
# Terminal 2: Tunnel it to the internet
|
|
ngrok http 9222
|
|
```
|
|
|
|
Then connect:
|
|
|
|
```text
|
|
skyvern_browser_session_connect(cdp_url="wss://<ngrok-subdomain>.ngrok-free.app/devtools/browser/<id>")
|
|
```
|
|
|
|
The user can get the browser ID from the `skyvern browser serve` output or by calling the
|
|
ngrok URL's `/json` endpoint.

#### Option C: Cloud browser
```text
skyvern_browser_session_create(timeout=15)
```
Only works for publicly reachable URLs. `localhost` URLs will not work here.

### Generate frontend/browser test cases

For each changed frontend file, write targeted checks. Example:

```text
Test 1: Settings page renders the new "Retry failed run" button
- Navigate to /settings/runs
- Assert: button exists
- Click it
- Assert: success toast appears

Test 2: Adjacent regression
- Verify the existing "Delete run" action still works or is still visible
```

### Run the frontend/browser tests

Navigate:

```text
skyvern_navigate(url="http://localhost:<port>/<route>")
```

Health gate:

```text
skyvern_evaluate(expression="(() => {
  const errors = [];
  const body = document.body?.innerText || '';
  if (body.includes('Something went wrong')) errors.push('error_message');
  if (body.includes('Cannot read properties')) errors.push('js_error_in_ui');
  if (/\\bundefined\\b/.test(body) && !/\\bif\\b|\\btypeof\\b|\\bdocument|tutorial|example/i.test(body) && body.length < 5000) errors.push('undefined_text');
  if (body.includes('connection refused')) errors.push('connection_refused');
  if (/sign.?in|log.?in|auth/i.test(window.location.pathname)) errors.push('auth_redirect');
  if (document.querySelector('[role=\\"alert\\"]')) errors.push('alert_element');
  if (!document.querySelector('main, [role=\\"main\\"], nav, header, h1, h2, [class*=\\"layout\\" i], [class*=\\"page\\" i], [class*=\\"app\\" i]'))
    errors.push('blank_page');
  return JSON.stringify({ pass: errors.length === 0, errors });
})()")
```

Prefer deterministic DOM assertions:

```text
skyvern_evaluate(expression="!!document.querySelector('button')")
skyvern_evaluate(expression="document.querySelector('h1')?.textContent?.trim()")
skyvern_evaluate(expression="window.location.pathname")
```

Use interaction tools when needed:

```text
skyvern_act(prompt="Click the 'Retry failed run' button")
skyvern_validate(prompt="The page shows the success toast and the form is no longer loading")
skyvern_screenshot()
```

Also check for failed network requests once per page:

```text
skyvern_evaluate(expression="(() => {
  const entries = performance.getEntriesByType('resource').filter(e => e.responseStatus >= 400);
  return JSON.stringify({ failed: entries.map(e => ({ url: e.name, status: e.responseStatus })).slice(0, 5) });
})()")
```

---

## Step 3B: Backend API QA

### Gather repo-local context first

Before starting the server or sending requests, read the repo's local instructions:

- `README`, `AGENTS.md`, `CLAUDE.md`, `Makefile`, `package.json`, `pyproject.toml`
- changed tests for the affected endpoints
- docs that describe auth, local ports, or startup commands

Do not guess the startup command if the repo already documents one.

### Start the backend if needed

If the backend is not responding on the expected local port:

1. Start it with the most direct repo-documented local command for the changed
   surface. If the validation needs both frontend and backend and the repo
   documents a combined dev environment command, prefer that over inventing
   separate startup steps.
2. Wait for readiness
3. Confirm the API is reachable before sending requests
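
The readiness wait can be a simple poll loop rather than a fixed sleep. A minimal sketch, assuming a repo-documented health URL (any HTTP response, even an error status, counts as "the server is up"):

```python
# Sketch: poll a health URL until the server answers instead of sleeping blindly.
# The URL, timeout, and interval are assumptions; use whatever the repo documents.
import time
import urllib.error
import urllib.request

def wait_ready(url: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            urllib.request.urlopen(url, timeout=2)
            return True
        except urllib.error.HTTPError:
            return True  # server answered, even if with an error status
        except (urllib.error.URLError, OSError):
            time.sleep(interval)  # not up yet; retry until the deadline
    return False
```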

### Identify the changed API surface

Use the diff to answer:

- Which endpoints changed?
- Which request parameters, headers, or bodies changed?
- Which response fields or status codes changed?
- Did auth requirements change?
- Is there a mutation side effect that needs follow-up verification?

Read the full handler, schema, and any changed tests.

### Generate backend API test cases

For each changed endpoint, create targeted checks:

- Happy path with valid parameters
- Empty result or not-found path where applicable
- Invalid input or validation error path
- Combined filters or sorting behavior when relevant
- Follow-up read after mutation endpoints

Example:

```text
Test 1: GET /api/runs returns the new field in the response body
Test 2: GET /api/runs?status=missing returns an empty list, not a 500
Test 3: POST /api/runs rejects invalid payload with a 4xx validation error
Test 4: PATCH /api/runs/:id updates the record and a follow-up GET shows the change
```

### Execute the API requests

Use the repo's documented auth scheme and local base URL. Use `curl`, the repo SDK, or a small
one-off client if that is clearer than shell quoting.

Examples:

```bash
curl -sS -H "Authorization: Bearer <token>" \\
  "http://localhost:<port>/api/..."

curl -sS -X POST \\
  -H "Content-Type: application/json" \\
  -H "<auth-header>: <token>" \\
  -d '{"example":"value"}' \\
  "http://localhost:<port>/api/..."
```

Capture:

- request being tested
- status code
- response body snippet or parsed result
- whether the changed field/behavior is present
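
One lightweight way to keep this capture consistent across checks is a small record per request. The field names below are illustrative, not a required schema:

```python
# Illustrative evidence record for one API check; fields are not a required schema.
evidence = {
    "request": "GET /api/runs?status=missing",
    "status": 200,
    "body_snippet": "[]",
    "changed_behavior_present": True,
}
```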

If auth is required and local credentials are unavailable from repo docs, say so clearly and stop rather than faking coverage.

---

## Step 3C: Backend-Internal QA

If the diff is backend-only but does not change an exposed endpoint or UI flow:

1. Run the repo's fastest compile/type/lint checks for the changed files
2. Run targeted unit/integration/scenario tests that cover the changed logic
3. Add live API calls only if the internal change affects exposed behavior

Good checks:

- compile/type-check the changed files
- targeted `pytest`, `npm test`, `go test`, or equivalent
- scenario/integration tests around changed services or workflows

Bad checks:

- hitting an unrelated health endpoint and calling it validated
- browsing the UI when no frontend behavior changed
- calling random APIs just because the backend changed
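
The good checks above can be driven from one small helper that records pass/fail per command for the report. A sketch; the command shown is a stand-in for the repo's real test command:

```python
# Sketch: run one targeted check and capture the outcome for the report.
# The command below is a stand-in; use e.g. ["pytest", "tests/test_runs.py", "-q"].
import subprocess
import sys

def run_check(cmd: list[str]) -> dict:
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "cmd": " ".join(cmd),
        "passed": proc.returncode == 0,
        "output_tail": (proc.stdout + proc.stderr)[-500:],  # last output as evidence
    }

result = run_check([sys.executable, "-c", "print('ok')"])
```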

---

## Step 4: Report Results

```markdown
## QA Report

### Validation Mode
- Mode: Backend API
- Scope: `routes/runs.py`, `schemas/run_response.py`

### Results
| # | Test | Result | Evidence |
|---|------|--------|----------|
| 1 | GET /api/runs returns `retryable` for valid runs | FAIL | HTTP 200, field absent in one serializer branch |
| 2 | GET /api/runs?status=missing returns empty list | PASS | HTTP 200, `[]` |
| 3 | GET /api/runs?status=invalid returns validation error | PASS | HTTP 422 |
| 4 | Frontend runs page still renders filter state | PASS | screenshot_3 |

### Issues Found
1. `retryable` is missing from one serializer branch.

### Verdict
3/4 tests passed. 1 issue found.
```

Use the evidence that actually matters:

- screenshots for frontend/browser results
- status codes and response snippets for backend API results
- command + failing assertion for unit/integration tests

---

## Step 5: Post Evidence to PR

After generating the QA report, persist it to the pull request as a sticky comment so the
evidence survives beyond the conversation.

### Check for an open PR

```bash
PR_NUMBER=$(gh pr view --json number -q '.number' 2>/dev/null)
```

If no PR exists for the current branch:

1. Save the full report markdown to `.qa/latest-report.md` in the project root (create the directory if needed).
2. Tell the user: "No open PR found for this branch. QA report saved to `.qa/latest-report.md`. Run /qa again after creating a PR to post it."
3. Stop here — do not attempt to create a PR.

### Post or update the sticky comment

Use a hidden HTML marker to make the comment idempotent across multiple runs:

```bash
# Prepare the comment body with the hidden marker
COMMENT_BODY="<!-- skyvern-qa-report -->
## QA Report — $(git rev-parse --short HEAD) — $(date -u +%Y-%m-%dT%H:%M:%SZ)

<the full report markdown from Step 4>
"

# Find an existing QA comment on the PR
EXISTING_COMMENT_ID=$(gh api "repos/{owner}/{repo}/issues/${PR_NUMBER}/comments" \\
  --jq '.[] | select(.body | test("skyvern-qa-report")) | .id' \\
  2>/dev/null | head -1)

if [ -n "$EXISTING_COMMENT_ID" ]; then
  # Update the existing comment in place
  gh api "repos/{owner}/{repo}/issues/comments/${EXISTING_COMMENT_ID}" \\
    -X PATCH -f body="$COMMENT_BODY"
else
  # Create a new comment
  gh pr comment "$PR_NUMBER" --body "$COMMENT_BODY"
fi
```

### Screenshot handling

Screenshots taken during QA (via `skyvern_screenshot()`) are saved locally for the agent's
verification. They are not uploaded to the PR comment because GitHub's API does not support
image uploads in issue comments. The text report describes what was observed.

If the user asks to preserve screenshots, save them to `.qa/screenshots/` and tell the user
the local path. Do not include local file paths in the PR comment — they are meaningless to
other reviewers.

### Rules

- Always include the `<!-- skyvern-qa-report -->` marker so repeated runs update the same comment instead of creating duplicates.
- Include the short commit hash and UTC timestamp in the comment header.
- Do not create a PR just to post a QA report — that is the user's decision.
- If `gh` is not available or not authenticated, fall back to saving the report locally and tell the user.

---

## Tool Selection

| What you need | Tool | Speed |
|---------------|------|-------|
| Check element exists, text, count, URL | `skyvern_evaluate` | ~10ms |
| Click, type, fill forms, multi-step interaction | `skyvern_act` | 5-30s |
| Visual check | `skyvern_validate` | 15-50s |
| Screenshot | `skyvern_screenshot` | ~1s |
| Wait for async content | `skyvern_wait` | varies |

Default to `skyvern_evaluate` for frontend/browser assertions. For backend API validation, prefer simple
HTTP requests and deterministic response checks over browser-based inference.

---

## Error Handling

| Problem | Action |
|---------|--------|
| No git diff found | Ask what behavior to validate, then fall back to explore mode |
| Frontend dev server not running | Start the most direct repo-documented local command for the changed surface; prefer a combined dev command only when the validation needs both frontend and backend; only ask the user if no documented command exists or startup fails |
| Backend server not running | Start the most direct repo-documented local command for the changed surface; prefer a combined dev environment command only when the validation needs both sides |
| Cannot identify changed endpoint | Read changed routes, schemas, and tests before proceeding |
| Auth required but no local creds available | Report the blocker clearly; do not fake coverage |
| Component doesn't render | Capture screenshot and specific UI error |
| API returns unexpected 5xx | Save request/response evidence and report the regression |

## Session Cleanup

Always close browser sessions when done:

```text
skyvern_browser_session_close()
```

If you started local servers or background processes, leave a clear note about what is still running.

---

## Fallback: Explore Mode

If there is no useful diff:

1. Ask what behavior should be validated
2. Use browser QA for frontend flows
3. Use targeted API checks for backend flows
4. Report findings with the same evidence standard

The primary mode is still **diff-driven**. Always try to understand the code changes first.
"""


def qa_test(
    url: Annotated[
        str,
        Field(description="Frontend or API base URL to test against, e.g. 'http://localhost:3000'"),
    ] = "",
    context: Annotated[
        str,
        Field(description="What to focus on, e.g. 'test the checkout flow' or 'validate the workflow filters API'"),
    ] = "",
) -> str:
    """QA test recent code changes with the right validation path.

    Reads the developer's git diff to understand what changed, classifies the work as
    frontend/browser, backend API, backend-internal, or mixed, and then runs targeted validation.
    Frontend checks use Skyvern browser tools; backend validation uses repo-native startup,
    HTTP requests, and targeted tests where appropriate.
    """
    parts = [QA_TEST_CONTENT]
    if url or context:
        parts.append("\n---\n")
        if url:
            parts.append(f"Target URL: `{url}`\n")
        if context:
            parts.append(
                f"Focus area: {context}\n\n"
                "Start at Step 1: run `git diff` to understand the changes, choose the correct validation "
                "mode, and focus the resulting checks on the area described above."
            )
    return "\n".join(parts)
|