Bundle QA skill in pip package and auto-install during setup (#4985)

Marc Kelechava 2026-03-04 17:23:50 -08:00 committed by GitHub
parent bdd1a26361
commit 58d9b980d2
7 changed files with 394 additions and 25 deletions

View file

@ -159,14 +159,17 @@ skyvern block validate --file block.json # Validate a block definition
Register Skyvern's MCP server with your AI coding tool in one command:
```bash
skyvern setup claude-code # Register with Claude Code (global)
skyvern setup claude-code # Register with Claude Code + install skills (/qa, etc.)
skyvern setup claude-code --project # Register with Claude Code (project-level)
skyvern setup claude-code --skip-skills # MCP only, no skills
skyvern setup claude-desktop # Register with Claude Desktop
skyvern setup cursor # Register with Cursor
skyvern setup windsurf # Register with Windsurf
skyvern setup codex # Register with Codex
```
`skyvern setup claude-code` writes the MCP config **and** copies bundled skills (including `/qa`) into your project's `.claude/skills/` directory. Use `--skip-skills` to opt out.
### Other
```bash
@ -190,30 +193,34 @@ The CLI and MCP server share the same underlying logic. The CLI is for humans an
## Skills
Skills are bundled reference markdown files that teach AI coding tools how to use Skyvern. They are **not** the same as MCP tools — they are documentation that an AI agent can load to learn the CLI and API.
Skills are bundled markdown files that teach AI coding tools how to use Skyvern. They ship with `pip install skyvern` and are **automatically installed** when you run `skyvern setup claude-code`.
| Skill | What it does |
|-------|-------------|
| **qa** | Reads your `git diff`, generates browser tests, runs them against your dev server, reports pass/fail with screenshots. Invoke with `/qa` in Claude Code. |
| **skyvern** | CLI reference covering all browser automation, workflow, and credential commands. |
| **testing** | Smoke tests for verifying Skyvern deployments. |
```bash
skyvern skill list # List available skills
skyvern skill show skyvern # Render a skill in the terminal
skyvern skill path skyvern # Print the absolute path to a skill file
skyvern skill path # Print the skills directory
skyvern skill copy --output ./docs # Copy all skills to a local directory
skyvern skill copy skyvern -o . # Copy a single skill
skyvern skill show qa # Render a skill in the terminal
skyvern skill copy --output .claude/skills # Copy all skills manually
skyvern skill copy qa -o .claude/skills # Copy a single skill
```
<Accordion title="Loading skills into AI tools">
Skills are plain markdown files. You can load them into any AI coding tool that supports custom instructions:
Skills are plain markdown files. Load them into any AI coding tool that supports custom instructions:
**Claude Code** — add the skill path as a custom instructions file or use `skyvern setup claude-code` which configures MCP (the richer integration path).
**Claude Code** — run `skyvern setup claude-code`, which registers the MCP server **and** installs skills into `.claude/skills/`. The `/qa` skill is immediately available.
**Codex** — copy the skill into your project's `.codex/skills/` directory:
**Codex** — copy skills into your project's `.codex/skills/` directory:
```bash
skyvern skill copy skyvern -o .codex/skills/
skyvern skill copy -o .codex/skills/
```
**Any tool** — point your tool at the file path returned by `skyvern skill path skyvern`.
**Any tool** — point your tool at the file path returned by `skyvern skill path qa`.
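For example, if your tool takes pasted instructions rather than a file path, you can print the skill contents directly (a minimal sketch using only the commands shown above):
```bash
# Dump the qa skill so it can be pasted into a custom-instructions field
cat "$(skyvern skill path qa)"
```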
</Accordion>

View file

@ -267,6 +267,33 @@ This is especially powerful during development: make a code change, then ask you
Always use `--api-key` when exposing your browser via tunnel. Without it, anyone with the ngrok URL has full browser control.
</Warning>
## QA your frontend changes
If you install the Skyvern CLI, `skyvern setup claude-code` registers the MCP server **and** installs Claude Code skills — including `/qa`, which reads your git diff, generates browser tests, and runs them against your dev server.
```bash
pip install skyvern
skyvern setup claude-code
```
Then make a frontend change and type `/qa` in Claude Code. It will:
1. Read your `git diff` and the full source of each changed file
2. Generate targeted test cases (plus regression tests for adjacent behavior)
3. Open a browser against your running dev server
4. Run the tests — navigate, click, fill forms, check the DOM
5. Report pass/fail with screenshots
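You can also point `/qa` at a specific URL or a specific behavior; the invocation forms mirror the bundled skill's quick start:
```
/qa                           # Diff-based: test what you changed
/qa http://localhost:3000     # Same, with an explicit dev server URL
/qa -- test the checkout flow # Targeted: test specific behavior
```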
For localhost testing, start a local browser first:
```bash
skyvern browser serve --tunnel
```
<Tip>
The `/qa` skill uses `skyvern_evaluate` for fast DOM assertions (~10ms each) and `skyvern_act` for natural-language interactions. Tests are generated fresh from each diff — no test files to maintain.
</Tip>
## Troubleshooting
<Accordion title="Invalid API key or 401 errors">

View file

@ -458,8 +458,10 @@ def extract_data(
# qa_test
# ---------------------------------------------------------------------------
# NOTE: This content is also maintained in .claude/skills/qa/SKILL.md
# for the /qa Claude Code skill. Keep both in sync when updating.
# NOTE: This content is maintained in three places — keep all in sync:
# 1. skyvern/cli/skills/qa/SKILL.md (bundled with pip package — canonical)
# 2. .claude/skills/qa/SKILL.md (project-local copy for this repo)
# 3. skyvern/cli/mcp_tools/prompts.py (QA_TEST_CONTENT — this file)
QA_TEST_CONTENT = """\
# QA — Test Frontend Changes in a Real Browser

View file

@ -5,6 +5,7 @@ from __future__ import annotations
import json
import os
import platform
import shutil
import sys
from pathlib import Path
from urllib.parse import urlparse
@ -14,6 +15,7 @@ from dotenv import load_dotenv
from rich.syntax import Syntax
from skyvern.cli.console import console
from skyvern.cli.skill_commands import get_skill_dirs
from skyvern.utils.env_paths import resolve_backend_env_path
# NOTE: skyvern/cli/mcp.py has older setup_*_config() helpers called from
@ -251,6 +253,47 @@ def _run_setup(
    _upsert_mcp_config(config_path, tool_name, entry, dry_run=dry_run, yes=yes)


def _install_skills(project_dir: Path, dry_run: bool = False) -> None:
    """Install bundled skills into a project's .claude/skills/ directory.

    Skips skills that already exist at the destination (non-destructive).
    """
    skills_dst = project_dir / ".claude" / "skills"
    dirs = get_skill_dirs()
    if not dirs:
        return

    installed: list[str] = []
    skipped: list[str] = []
    failed: list[str] = []
    ignore = shutil.ignore_patterns("__pycache__", "*.pyc")

    for d in dirs:
        target = skills_dst / d.name
        if target.exists():
            skipped.append(d.name)
            continue
        if not dry_run:
            try:
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copytree(d, target, ignore=ignore)
            except OSError as e:
                console.print(f"[yellow]Warning: failed to install skill '{d.name}': {e}[/yellow]")
                failed.append(d.name)
                continue
        installed.append(d.name)

    if installed:
        names = ", ".join(installed)
        if dry_run:
            console.print(f"\n[yellow]Dry run — would install skills: {names}[/yellow]")
        else:
            console.print(f"\n[green]Installed skills to {skills_dst}: {names}[/green]")
            if "qa" in installed:
                console.print("[bold]Tip:[/bold] Make a frontend change and type /qa to test it in a real browser.")
    if skipped:
        console.print(f"[dim]Skills already installed: {', '.join(skipped)}[/dim]")
# ---------------------------------------------------------------------------
# Commands
# ---------------------------------------------------------------------------
@ -288,11 +331,15 @@ def setup_claude_code(
    use_python_path: bool = _python_path_opt,
    url: str | None = _url_opt,
    project: bool = typer.Option(False, "--project", help="Write to .mcp.json in current dir instead of global config"),
    skip_skills: bool = typer.Option(False, "--skip-skills", help="Don't install Claude Code skills (e.g. /qa)"),
) -> None:
    """Register Skyvern MCP with Claude Code (remote by default)."""
    """Register Skyvern MCP with Claude Code and install skills (remote by default)."""
    config_path = Path.cwd() / ".mcp.json" if project else _claude_code_global_config_path()
    _run_setup("Claude Code", config_path, api_key, dry_run, yes, local, use_python_path, url)
    if not skip_skills:
        _install_skills(Path.cwd(), dry_run=dry_run)
@setup_app.command("cursor")
def setup_cursor(

View file

@ -19,7 +19,7 @@ SKILLS_DIR = Path(__file__).parent / "skills"
_FRONTMATTER_RE = re.compile(r"^---\n(.*?)\n---", re.DOTALL)
def _get_skill_dirs() -> list[Path]:
def get_skill_dirs() -> list[Path]:
"""Return sorted list of skill directories (those containing SKILL.md)."""
if not SKILLS_DIR.exists():
return []
@ -60,7 +60,7 @@ def _extract_description(skill_md: Path) -> str:
@skill_app.command("list")
def skill_list() -> None:
"""List all bundled skills."""
dirs = _get_skill_dirs()
dirs = get_skill_dirs()
if not dirs:
console.print("[red]No skills found in package. Re-install skyvern.[/red]")
raise typer.Exit(code=1)
@ -120,7 +120,7 @@ def skill_copy(
        shutil.copytree(src, target, dirs_exist_ok=overwrite, ignore=_ignore)
        console.print(f"[green]Copied skill '{name}' to {target.resolve()}[/green]")
    else:
        dirs = _get_skill_dirs()
        dirs = get_skill_dirs()
        if not dirs:
            console.print("[red]No skills found in package. Re-install skyvern.[/red]")
            raise typer.Exit(code=1)

View file

@ -1,21 +1,29 @@
# Skyvern Skills Package
AI-powered browser automation skill for coding agents. Bundled with `pip install skyvern`.
AI-powered browser automation skills for coding agents. Bundled with `pip install skyvern`.
## Quick Start
```bash
pip install skyvern
export SKYVERN_API_KEY="YOUR_KEY" # get one at https://app.skyvern.com
# Set up MCP + install skills in one step:
skyvern setup claude-code
```
The skill teaches CLI commands via `skyvern <command>` invocations. For richer
AI-coding-tool integration, run `skyvern setup claude-code --project` to enable
MCP (Model Context Protocol) with auto-tool-calling.
`skyvern setup claude-code` registers the Skyvern MCP server and installs these
skills into your project's `.claude/skills/` directory automatically.
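After setup you should see one directory per bundled skill (names follow the Structure section below; exact contents may vary by release):
```bash
ls .claude/skills
# qa  skyvern  testing
```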
## What's Included
A single `skyvern` skill covering all browser automation capabilities:
### qa
QA test your frontend changes in a real browser. Reads your `git diff`, generates
targeted browser tests, runs them against your local dev server, and reports
pass/fail with screenshots. Invoke with `/qa` in Claude Code.
### skyvern
CLI reference covering all browser automation capabilities:
- Browser session lifecycle (create, navigate, close)
- AI actions: act, extract, validate, screenshot
@ -26,19 +34,27 @@ A single `skyvern` skill covering all browser automation capabilities:
- Block schema discovery and validation
- Debugging with screenshot + validate loops
### testing
Smoke-test skill for verifying Skyvern deployments.
## Structure
```
qa/
SKILL.md Diff-driven frontend QA testing
skyvern/
SKILL.md Main skill file (CLI-first, all capabilities)
references/ 17 deep-dive reference files
examples/ Workflow JSON examples
testing/
SKILL.md Deployment smoke testing
```
## Install to a Project
## Manual Install
If you prefer to install skills without running setup:
```bash
# Copy skill files to your project
skyvern skill copy --output .claude/skills
skyvern skill copy --output .agents/skills
```

View file

@ -0,0 +1,270 @@
---
name: qa
description: "QA test your frontend changes in a real browser. Reads your git diff, generates targeted browser tests, runs them against your local dev server, and reports pass/fail with screenshots."
---
# QA — Test Your Frontend Changes in a Real Browser
<!-- NOTE: This content is maintained in three places — keep all in sync:
1. skyvern/cli/skills/qa/SKILL.md (bundled with pip package — canonical)
2. .claude/skills/qa/SKILL.md (project-local copy for this repo)
3. skyvern/cli/mcp_tools/prompts.py (QA_TEST_CONTENT for the MCP prompt) -->
You changed some code. This skill reads your diff, understands what UI was affected, opens
a real browser against your running dev server, and tests that your changes actually work.
## Quick Start
```
/qa # Diff-based: test what you changed
/qa http://localhost:3000 # Same, explicit URL
/qa -- test the checkout flow # Targeted: test specific behavior
```
## How It Works
1. **Read your code changes** (`git diff`) to understand what you modified
2. **Read the changed files** to understand the UI: routes, components, props, text
3. **Generate targeted test cases** based on what the code actually does
4. **Open a browser** against your running dev server
5. **Run the tests** — navigate, interact, assert, screenshot
6. **Report** pass/fail with evidence
This is NOT a generic website crawler. It tests YOUR changes specifically.
---
## Step 1: Understand the Changes
### Get the diff
```bash
# What files changed?
git diff --name-only HEAD~1 # vs last commit (if changes are committed)
git diff --name-only # vs working tree (if uncommitted)
# Full diff for context
git diff HEAD~1 # or git diff for uncommitted
```
Pick whichever diff has content. If both are empty, there's nothing to QA.
### Read the changed files
For every changed frontend file (`.tsx`, `.jsx`, `.ts`, `.js`, `.css`, `.html`):
- Read the FULL file (not just the diff) to understand the component
- Look for: route paths, component names, text labels, form fields, button labels,
API endpoints called, conditional rendering, error states
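One way to list only the changed frontend files (a minimal sketch; use the same diff range you picked above):
```bash
# Uncommitted changes; insert HEAD~1 before the -- for the last-commit case
git diff --name-only -- '*.tsx' '*.jsx' '*.ts' '*.js' '*.css' '*.html'
```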
### Classify the changes
| Change Type | What to Test |
|-------------|-------------|
| New component/page | Navigate to it, verify it renders, interact with its elements |
| Modified component | Navigate to it, verify the specific change works (new button, new text, new behavior) |
| Styling changes | Navigate, screenshot, verify layout isn't broken |
| API integration | Navigate, trigger the action, verify the API call works (check network, verify UI updates) |
| Form changes | Fill the form, submit, verify validation and success states |
| Route changes | Navigate to old and new routes, verify routing works |
| Shared component (used in many places) | Test 2-3 pages that use it |
| Bug fix | Reproduce the original bug scenario, verify it's fixed |
### Generate test cases
For each changed file, write specific test cases. Example:
If the diff shows changes to `LoginForm.tsx` adding a "Forgot password" link:
```
Test 1: Login page renders the new "Forgot password" link
- Navigate to /login
- Assert: link with text "Forgot password" exists
- Click it
- Assert: navigated to /forgot-password (or modal appeared)
Test 2: Login form still works (regression)
- Navigate to /login
- Verify email and password inputs exist
- Submit empty form, verify validation errors appear
```
**Be specific.** Don't write "verify the page works." Write "verify the 'Forgot password'
link navigates to /forgot-password."
---
## Step 2: Find the Dev Server
If the user provided a URL, use it. Otherwise, auto-detect:
```
# Try common dev server ports
# 5173 (Vite), 3000 (Next/CRA), 3001, 8080, 8000, 4200 (Angular)
```
Navigate to each until one responds. If none respond, tell the user:
"Start your dev server first, then run `/qa` again."
---
## Step 3: Connect to a Browser
```
skyvern_browser_session_create(local=true, headless=false, timeout=15)
```
Use `local=true` so it can reach `localhost`. Use `headless=false` so the user can watch.
If local fails, fall back to cloud (warn that the URL must be publicly accessible):
```
skyvern_browser_session_create(timeout=15)
```
---
## Step 4: Run the Tests
For each test case generated in Step 1:
### Navigate
```
skyvern_navigate(url="http://localhost:<port>/<route>")
```
### Health gate (after every navigate, ~10ms)
```
skyvern_evaluate(expression="(() => {
const errors = [];
const body = document.body?.innerText || '';
if (body.includes('Something went wrong')) errors.push('error_message');
if (body.includes('Cannot read properties')) errors.push('js_error_in_ui');
if (/\bundefined\b/.test(body) && !/\bif\b|\btypeof\b|\bdocument|tutorial|example/i.test(body) && body.length < 5000) errors.push('undefined_text');
if (body.includes('connection refused')) errors.push('connection_refused');
if (/sign.?in|log.?in|auth/i.test(window.location.pathname)) errors.push('auth_redirect');
if (document.querySelector('[role=\"alert\"]')) errors.push('alert_element');
if (!document.querySelector('main, [role=\"main\"], nav, header, h1, h2, [class*=\"layout\" i], [class*=\"page\" i], [class*=\"app\" i]'))
errors.push('blank_page');
return JSON.stringify({ pass: errors.length === 0, errors });
})()")
```
### Assert with DOM queries (prefer `skyvern_evaluate` — fast, deterministic)
```
# Element exists
skyvern_evaluate(expression="!!document.querySelector('a[href=\"/forgot-password\"]')")
# Text content
skyvern_evaluate(expression="document.querySelector('h1')?.textContent?.trim()")
# Element count
skyvern_evaluate(expression="document.querySelectorAll('.card').length")
# URL after navigation
skyvern_evaluate(expression="window.location.pathname")
```
### Interact (use `skyvern_act` for natural language actions)
```
skyvern_act(prompt="Click the 'Forgot password' link")
skyvern_act(prompt="Fill the email field with 'test@example.com' and click Submit")
skyvern_act(prompt="Open the dropdown menu and select 'Settings'")
```
### Visual checks (use `skyvern_validate` only when DOM queries aren't enough)
```
skyvern_validate(prompt="The login form shows email and password fields with a blue Submit button")
```
### Screenshot (after every significant action)
```
skyvern_screenshot()
```
### Failed network requests (once per page)
```
skyvern_evaluate(expression="(() => {
const entries = performance.getEntriesByType('resource').filter(e => e.responseStatus >= 400);
return JSON.stringify({ failed: entries.map(e => ({ url: e.name, status: e.responseStatus })).slice(0, 5) });
})()")
```
---
## Step 5: Report Results
```markdown
## QA Report
### Changes Tested
Files: `LoginForm.tsx`, `ForgotPassword.tsx`
Diff summary: Added "Forgot password" link to login form, new /forgot-password page
### Results
| # | Test | Result | Screenshot |
|---|------|--------|------------|
| 1 | Login page renders "Forgot password" link | PASS | screenshot_1 |
| 2 | Clicking link navigates to /forgot-password | PASS | screenshot_2 |
| 3 | Forgot password page renders form | PASS | screenshot_3 |
| 4 | Login form still works (regression) | PASS | screenshot_4 |
| 5 | Empty form shows validation errors | FAIL | screenshot_5 |
### Issues Found
1. **Empty login form submits without validation** — Submitting with no email/password
doesn't show error messages. The form submits and the page reloads.
Expected: validation errors. Screenshot: screenshot_5
### Network
- No failed requests detected
### Verdict
4/5 tests passed. 1 issue found: missing form validation on empty submit.
```
---
## Tool Selection
| What you need | Tool | Speed |
|---------------|------|-------|
| Check element exists, text, count, URL | `skyvern_evaluate` | ~10ms |
| Click, type, fill forms, multi-step interaction | `skyvern_act` | 5-30s |
| "Does this look right?" visual check | `skyvern_validate` | 15-50s |
| Get structured data from a page | `skyvern_extract` | 15-50s |
| Screenshot | `skyvern_screenshot` | ~1s |
| Wait for async content | `skyvern_wait` | varies |
**Default to `skyvern_evaluate` for assertions.** Only use `skyvern_validate` when you
can't express the check as a DOM query (visual layout, "does this look like a dashboard").
---
## Error Handling
| Problem | Action |
|---------|--------|
| No git diff found | Ask user what they want to test, fall back to explore mode |
| Dev server not running | Tell user to start it. Suggest common commands (npm run dev, etc.) |
| Auth redirect on page | Report it. Ask if they want to provide credentials or skip that route. |
| Component doesn't render | Screenshot + check console. Report with the specific error. |
| Session create fails | Try cloud fallback. Warn about URL accessibility. |
## Session Cleanup
ALWAYS close the session when done, even if errors occurred:
```
skyvern_browser_session_close()
```
---
## Fallback: Explore Mode
If there's no git diff (user just wants a general QA pass), fall back to exploring:
1. Navigate to the root URL
2. Extract nav links and page structure with `skyvern_extract`
3. Visit each major route, health gate + screenshot
4. Test interactive elements (forms, buttons, links)
5. Report findings
But the primary mode is **diff-driven**. The agent should always try to read the code
changes first.