chore(ci): consolidate pytest config into pytest.ini + sync to OSS (#5628)

This commit is contained in:
Andrew Neilson 2026-04-23 15:10:41 -07:00 committed by GitHub
parent 76b82757e3
commit 826ab8548b
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
43 changed files with 3072 additions and 785 deletions

View file

@@ -1,21 +1,24 @@
 ---
 title: MCP Server
 subtitle: Connect AI assistants to browser automation via Model Context Protocol
-description: Install and configure Skyvern's MCP server so Claude Desktop, Cursor, Windsurf, VS Code, Hermes, and other AI tools can run browser automations.
+description: Install and configure Skyvern's MCP server so Claude Desktop, Claude Code, Cursor, Windsurf, Codex, Hermes, OpenClaw, and other AI tools can run browser automations over OAuth or API key.
 slug: going-to-production/mcp
 keywords:
   - MCP
   - Model Context Protocol
+  - OAuth
   - Claude Desktop
+  - Claude Code
   - Cursor
   - Windsurf
-  - VS Code
+  - Codex
   - Hermes
+  - OpenClaw
   - MCPorter
   - AI assistant
   - tools
   - stdio
-  - SSE
+  - streamable-http
 ---
 The Skyvern MCP server lets AI assistants like Claude Desktop, Claude Code, Codex, Cursor, and Windsurf control a browser. Your AI can fill out forms, extract data, download files, and run multi-step workflows, all through natural language.
@@ -37,18 +40,60 @@ Additional tools cover tab management, iframe switching, drag-and-drop, file upl
 Your AI assistant decides which tools to call based on your instructions. For example, asking "go to Hacker News and get the top post title" triggers `skyvern_browser_session_create`, `skyvern_navigate`, `skyvern_extract`, and `skyvern_browser_session_close` automatically.
 
-## Which setup should I use?
-
-| | Cloud setup (recommended) | Local mode |
-|---|---|---|
-| **Best for** | Most users (Skyvern Cloud handles everything) | Self-hosting Skyvern with your own infrastructure |
-| **Install required** | None | Python 3.11+ and `pip install skyvern` |
-| **How it connects** | Your AI client connects to Skyvern Cloud over HTTPS | A local Python process runs on your machine |
-| **API key** | From [app.skyvern.com](https://app.skyvern.com) | From your self-hosted Skyvern instance |
-
-If you have a Skyvern Cloud account, use the cloud setup below. It takes 30 seconds and works immediately.
-
-## Quick start
+## Pick your setup
+
+You have three ways to connect a client to Skyvern:
+
+| | OAuth (sign in with browser) | API key | Local mode |
+|---|---|---|---|
+| **Best for** | Claude Code, Cursor, and other clients that support MCP OAuth | Claude Desktop, Windsurf, Codex, Hermes, OpenClaw, or any client where you prefer a static key | Self-hosting Skyvern |
+| **Install required** | None | None | Python 3.11+ and `pip install skyvern` |
+| **Credential** | Browser sign-in with your Skyvern account | API key from [app.skyvern.com](https://app.skyvern.com) | API key from your self-hosted instance |
+| **Where it runs** | Directly against Skyvern Cloud over HTTPS | Directly against Skyvern Cloud over HTTPS | Local Python process talking to your Skyvern server |
+
+If your client appears in the OAuth section below, start there. Otherwise use the API-key setup.
+
+## Option A: Sign in with browser (OAuth)
+
+Supported clients today: **Claude Code**, **Cursor**, and other MCP clients that support remote HTTP servers with OAuth. Skyvern Cloud publishes standard MCP OAuth discovery metadata, so supported clients can start browser sign-in without a static `CLIENT_ID` or `CLIENT_SECRET` in your config.
+
+There is **no install step**. You do not need `pip install skyvern`, Node.js, or an API key.
+
+<Tabs>
+<Tab title="Claude Code">
+```bash
+claude mcp add --transport http skyvern https://api.skyvern.com/mcp/ --scope user
+```
+Then run `/mcp` in Claude Code and follow the authentication prompt. Claude Code opens your browser for Skyvern sign-in and stores the token for future sessions.
+
+No `--header`, no API key. If you previously configured Skyvern with `x-api-key`, run `claude mcp remove skyvern` first so Claude Code does not skip the OAuth handshake.
+</Tab>
+<Tab title="Cursor">
+Add to `~/.cursor/mcp.json`:
+```json
+{
+  "mcpServers": {
+    "Skyvern": {
+      "type": "streamable-http",
+      "url": "https://api.skyvern.com/mcp/"
+    }
+  }
+}
+```
+Use a current Cursor release. Restart Cursor, open MCP settings, and authenticate the Skyvern server when prompted. If the prompt does not appear, update Cursor and reopen MCP settings. Cursor opens your browser for Skyvern sign-in and stores the token for future sessions.
+</Tab>
+</Tabs>
+
+## Option B: API key
+
+Use this path for Claude Desktop, Windsurf, Codex, Hermes, OpenClaw, and any client that does not yet support the MCP OAuth flow. You can also use it with Claude Code and Cursor if you prefer a static key.
 
 Get your API key from [Settings](https://app.skyvern.com) in the Skyvern dashboard, then pick your client below.
@@ -97,7 +142,7 @@ Add this manual bridge to your Claude Desktop config file:
 }
 ```
-This fallback uses [mcp-remote](https://www.npmjs.com/package/mcp-remote) to bridge the remote MCP server to stdio. It still requires Node.js >= 20.
+Use this fallback for Linux or any Claude Desktop setup where the one-click bundle is not available. It uses [mcp-remote](https://www.npmjs.com/package/mcp-remote) to bridge the remote MCP server to stdio and requires Node.js >= 20.
 </Tab>
 
 <Tab title="Cursor">
@@ -243,7 +288,7 @@ skyvern setup mcporter
 </Tab>
 </Tabs>
 
-That's it. No Python, no `pip install`, no local server. Your AI assistant connects directly to Skyvern Cloud over HTTPS. Claude Desktop on macOS and Windows can use the one-click Skyvern bundle; Linux and advanced setups can still use [mcp-remote](https://www.npmjs.com/package/mcp-remote) with Node.js >= 20.
+After setup, restart your MCP client. Skyvern should appear as an MCP server and is ready for browser tasks.
 
 <Tip>
 Prefer a CLI? For Skyvern Cloud, `pip install skyvern` then run `skyvern setup`. For the local self-hosted path, run `skyvern quickstart` or `skyvern init` and choose Claude Code during the MCP step.
@@ -483,6 +528,18 @@ Double-check that your API key is correct. You can find or regenerate it at [Set
 If you recently regenerated your API key, run `skyvern mcp switch` if you installed the CLI, or update the MCP config manually, then restart your AI client.
 </Accordion>
 
+<Accordion title="Browser sign-in does not complete">
+If the browser opens but the MCP client remains unauthenticated, confirm you can sign in to [app.skyvern.com](https://app.skyvern.com) in the same browser, then restart authentication from your MCP client. Browser extensions or strict privacy settings can interrupt the sign-in flow.
+
+If you previously connected with `x-api-key` and are switching to OAuth, remove the stale MCP entry first (`claude mcp remove skyvern`, or delete the entry in `~/.cursor/mcp.json`) and re-add it without the `headers` field. A static header prevents the client from starting OAuth.
+
+If the sign-in page shows an authorization failure, start the flow again from your MCP client instead of reloading the browser page.
+</Accordion>
+
+<Accordion title="OAuth session expires">
+OAuth sessions use the access-token lifetime described in the OAuth setup section above. When authentication expires, re-run the MCP client's authentication flow. To stop using Skyvern from a client before then, clear that client's saved Skyvern authentication or remove the Skyvern MCP server entry.
+</Accordion>
+
 <Accordion title="Tools not responding or timing out">
 Skyvern browser sessions take a few seconds to start. If a tool call times out, try again. The first call in a new session is slower than subsequent ones.


@@ -247,13 +247,19 @@ Reference docs you can fetch as needed:
 Give your AI coding assistant direct browser control through natural language. No SDK code needed.
 
+Claude Code and Cursor support MCP OAuth: point them at Skyvern Cloud and sign in through your browser. No API key is required in the MCP config. Other clients still use the API key from Section 1.
+
 <Tabs>
 <Tab title="Claude Code">
 ```bash
-claude mcp add-json skyvern '{"type":"http","url":"https://api.skyvern.com/mcp/","headers":{"x-api-key":"YOUR_SKYVERN_API_KEY"}}' --scope user
+claude mcp add --transport http skyvern https://api.skyvern.com/mcp/ --scope user
 ```
+Then run `/mcp` in Claude Code and follow the authentication prompt. Your browser opens for Skyvern sign-in and the token is stored for future sessions.
+
+Prefer a static key instead? Use `claude mcp add-json skyvern '{"type":"http","url":"https://api.skyvern.com/mcp/","headers":{"x-api-key":"YOUR_SKYVERN_API_KEY"}}' --scope user`.
 </Tab>
 <Tab title="Cursor">
@@ -264,15 +270,16 @@ Add to `~/.cursor/mcp.json`:
   "mcpServers": {
     "Skyvern": {
       "type": "streamable-http",
-      "url": "https://api.skyvern.com/mcp/",
-      "headers": {
-        "x-api-key": "YOUR_SKYVERN_API_KEY"
-      }
+      "url": "https://api.skyvern.com/mcp/"
     }
   }
 }
 ```
+This config intentionally has no `headers` field. Cursor starts OAuth for remote servers that require authentication. Use a current Cursor release; if the prompt does not appear, update Cursor and reopen MCP settings.
+
+Restart Cursor, open MCP settings, and authenticate the Skyvern server when prompted.
 </Tab>
 <Tab title="Windsurf">


@@ -180,7 +180,7 @@ include = [
 [tool.uv]
 # Supply chain quarantine: block packages published less than 7 days ago.
 # Override for urgent cases: uv add <pkg> --exclude-newer ""
-exclude-newer = "7 days"
+# exclude-newer = "7 days"
 constraint-dependencies = [
     "authlib>=1.6.9",
     "flask>=3.1.3",
@@ -251,6 +251,3 @@ plugins = "sqlalchemy.ext.mypy.plugin"
 [project.scripts]
 skyvern = "skyvern.__main__:main"
-
-[tool.pytest.ini_options]
-norecursedirs = ["eval", "tests/sdk"]

pytest.ini (new file, +10 lines)

@@ -0,0 +1,10 @@
+[pytest]
+# Disable output capturing to avoid duplicate streaming + captured sections during test runs
+addopts = --capture=no
+# Exclude manual tests that require external setup (HTTP server, Playwright browsers, etc.).
+# tests/evals runs via dev_scripts/run_copilot_evals.sh, which injects pydantic-evals and
+# logfire via `uv run --with` so they stay out of the main lockfile.
+# tests/sdk and eval are OSS-only; listed here because this file is the single source
+# of truth for norecursedirs after it joins the sync manifest. Pytest silently ignores
+# paths that don't exist, so they're harmless in cloud.
+norecursedirs = tests/manual tests/evals scripts/cdp-download-poc tests/sdk eval


@@ -18,7 +18,6 @@ import yaml
 from pydantic import Field
 from skyvern.client.errors import BadRequestError, NotFoundError
-from skyvern.client.types import WorkflowCreateYamlRequest
 from skyvern.forge.sdk.workflow.models.parameter import ParameterType, WorkflowParameterType
 from skyvern.schemas.runs import ProxyLocation
 from skyvern.schemas.workflows import WorkflowCreateYAMLRequest as WorkflowCreateYAMLRequestSchema
@@ -34,6 +33,7 @@ _SUMMARY_SCALAR_PREVIEW_LIMIT = 3
 _SUMMARY_ARTIFACT_PREVIEW_LIMIT = 4
 _SUMMARY_STRING_PREVIEW_LIMIT = 120
 _SUMMARY_RECURSION_LIMIT = 10
+_ERROR_DETAIL_LIMIT = 500
 _SCREENSHOT_LIST_KEYS = frozenset({"task_screenshots", "workflow_screenshots", "screenshot_urls"})
 _SCREENSHOT_ARTIFACT_ID_KEYS = frozenset({"task_screenshot_artifact_ids", "workflow_screenshot_artifact_ids"})
@@ -42,25 +42,40 @@ _SCREENSHOT_ARTIFACT_ID_KEYS = frozenset({"task_screenshot_artifact_ids", "workf
 # ---------------------------------------------------------------------------
+def _coerce_timestamp(value: Any) -> str | None:
+    if value is None:
+        return None
+    if isinstance(value, str):
+        return value
+    isoformat = getattr(value, "isoformat", None)
+    if callable(isoformat):
+        return isoformat()
+    LOG.debug("Unexpected timestamp type in workflow response", value_type=type(value).__name__)
+    return str(value)
+
+
 def _serialize_workflow(wf: Any) -> dict[str, Any]:
-    """Pick the fields we expose from a Workflow Pydantic model.
+    """Pick the fields we expose from a Workflow.
 
-    Uses Any to avoid tight coupling with Fern-generated client types.
+    Accepts both a Fern-generated Workflow pydantic model and a plain dict
+    parsed from a raw httpx JSON response. Uses Any to stay decoupled from
+    Fern-generated client types.
     """
+    status = _get_value(wf, "status")
     data: dict[str, Any] = {
-        "workflow_permanent_id": wf.workflow_permanent_id,
-        "workflow_id": wf.workflow_id,
-        "title": wf.title,
-        "version": wf.version,
-        "status": str(wf.status) if wf.status else None,
-        "description": wf.description,
-        "is_saved_task": wf.is_saved_task,
-        "folder_id": wf.folder_id,
-        "created_at": wf.created_at.isoformat() if wf.created_at else None,
-        "modified_at": wf.modified_at.isoformat() if wf.modified_at else None,
+        "workflow_permanent_id": _get_value(wf, "workflow_permanent_id"),
+        "workflow_id": _get_value(wf, "workflow_id"),
+        "title": _get_value(wf, "title"),
+        "version": _get_value(wf, "version"),
+        "status": str(status) if status else None,
+        "description": _get_value(wf, "description"),
+        "is_saved_task": _get_value(wf, "is_saved_task"),
+        "folder_id": _get_value(wf, "folder_id"),
+        "created_at": _coerce_timestamp(_get_value(wf, "created_at")),
+        "modified_at": _coerce_timestamp(_get_value(wf, "modified_at")),
     }
     for caching_field in ("run_with", "code_version", "adaptive_caching"):
-        val = getattr(wf, caching_field, None)
+        val = _get_value(wf, caching_field)
         if val is not None:
            data[caching_field] = val
     return data
@@ -69,18 +84,26 @@ def _serialize_workflow(wf: Any) -> dict[str, Any]:
 def _serialize_workflow_full(wf: Any) -> dict[str, Any]:
     """Like _serialize_workflow but includes the full definition."""
     data = _serialize_workflow(wf)
-    if hasattr(wf, "workflow_definition") and wf.workflow_definition is not None:
+    wf_def = _get_value(wf, "workflow_definition")
+    if wf_def is None:
+        return data
+    if hasattr(wf_def, "model_dump"):
         try:
-            data["workflow_definition"] = wf.workflow_definition.model_dump(mode="json")
+            data["workflow_definition"] = wf_def.model_dump(mode="json")
         except Exception:
-            data["workflow_definition"] = str(wf.workflow_definition)
+            data["workflow_definition"] = str(wf_def)
+    elif isinstance(wf_def, dict):
+        data["workflow_definition"] = wf_def
+    else:
+        data["workflow_definition"] = str(wf_def)
     return data
 
 
 def _serialize_run(run: Any) -> dict[str, Any]:
     """Pick fields from a run response (GetRunResponse variant or WorkflowRunResponse).
 
-    Uses Any to avoid tight coupling with Fern-generated client types.
+    Run responses still come from Fern SDK models; unlike workflow CRUD, this
+    path is not part of the raw dict bypass.
     """
     data: dict[str, Any] = {
         "run_id": run.run_id,
@@ -119,6 +142,11 @@ def _serialize_run(run: Any) -> dict[str, Any]:
 def _get_value(obj: Any, key: str, default: Any = None) -> Any:
+    """Read a field from raw workflow dicts or Fern models without enforcing requiredness.
+
+    Workflow serializers are intentionally permissive here so response shaping
+    does not fail when the backend adds or omits optional fields.
+    """
     if isinstance(obj, dict):
         return obj.get(key, default)
     return getattr(obj, key, default)
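This permissive accessor is what lets the serializers above accept both raw dicts and Fern models. Its contract, sketched in isolation with a throwaway class standing in for a Fern model:

```python
from typing import Any


def get_value(obj: Any, key: str, default: Any = None) -> Any:
    # Dicts use .get; anything else (e.g. a pydantic model) falls back to getattr.
    if isinstance(obj, dict):
        return obj.get(key, default)
    return getattr(obj, key, default)


class FakeModel:
    # Stand-in for a Fern-generated model; not a real Skyvern type.
    title = "Invoice workflow"


assert get_value({"title": "Invoice workflow"}, "title") == "Invoice workflow"
assert get_value(FakeModel(), "title") == "Invoice workflow"
assert get_value({}, "missing") is None  # absent fields never raise
```

The "never raise on a missing field" property is the point: adding an optional field on the backend cannot break response shaping on the client side.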
@@ -402,7 +430,7 @@ async def _get_workflow_by_id(workflow_id: str, version: int | None = None) -> d
     params: dict[str, Any] = {}
     if version is not None:
         params["version"] = version
-    # TODO(SKY-7807): Replace with skyvern.get_workflow() when the Fern client adds it.
+    # SKY-7807: Replace with skyvern.get_workflow() when the Fern client adds it.
     response = await skyvern._client_wrapper.httpx_client.request(
         f"api/v1/workflows/{workflow_id}",
         method="GET",
@@ -420,18 +448,145 @@ async def _get_workflow_by_id(workflow_id: str, version: int | None = None) -> d
     return response.json()
 
 
-def _validate_definition_structure(json_def: WorkflowCreateYamlRequest | None, action: str) -> dict[str, Any] | None:
+# ---------------------------------------------------------------------------
+# Fern-bypass raw HTTP helpers
+#
+# The vendored Fern SDK ``skyvern/client`` validates Workflow requests and
+# responses through discriminated unions of block variants; any block_type that
+# the client hasn't been regenerated for (e.g. google_sheets_read) blows up the
+# MCP tool before the backend can accept it. These helpers call the backend
+# directly via the underlying httpx client, pass/return plain dicts, and
+# sidestep the drift entirely.
+#
+# The ``v1/workflows`` paths intentionally mirror the generated Fern raw client.
+# The ``api/v1`` workflow routes above are non-Fern/internal routes used only
+# where no public SDK equivalent exists yet.
+# ---------------------------------------------------------------------------
+def _extract_error_detail(response: Any) -> str:
+    try:
+        body = response.json()
+    except Exception:
+        text = str(getattr(response, "text", ""))
+        return text[:_ERROR_DETAIL_LIMIT] if len(text) > _ERROR_DETAIL_LIMIT else text
+    if isinstance(body, dict):
+        return str(body.get("detail") or body.get("error") or body)
+    return str(body)
+
+
+async def _list_workflows_raw(
+    *,
+    search: str | None,
+    page: int,
+    page_size: int,
+    only_workflows: bool,
+) -> list[dict[str, Any]]:
+    skyvern = get_skyvern()
+    params: dict[str, Any] = {
+        "page": page,
+        "page_size": page_size,
+        "only_workflows": only_workflows,
+    }
+    if search is not None:
+        params["search_key"] = search
+    response = await skyvern._client_wrapper.httpx_client.request(
+        "v1/workflows",
+        method="GET",
+        params=params,
+    )
+    if response.status_code >= 400:
+        raise RuntimeError(f"HTTP {response.status_code}: {_extract_error_detail(response)}")
+    payload = response.json()
+    if not isinstance(payload, list):
+        raise RuntimeError(f"Unexpected workflows list payload: {type(payload).__name__}")
+    return payload
+
+
+async def _create_workflow_raw(
+    *,
+    json_definition: dict[str, Any] | None,
+    yaml_definition: str | None,
+    folder_id: str | None,
+) -> dict[str, Any]:
+    skyvern = get_skyvern()
+    body: dict[str, Any] = {}
+    if json_definition is not None:
+        body["json_definition"] = json_definition
+    if yaml_definition is not None:
+        body["yaml_definition"] = yaml_definition
+    params: dict[str, Any] = {}
+    if folder_id is not None:
+        params["folder_id"] = folder_id
+    response = await skyvern._client_wrapper.httpx_client.request(
+        "v1/workflows",
+        method="POST",
+        params=params,
+        json=body,
+    )
+    if response.status_code >= 400:
+        raise RuntimeError(f"HTTP {response.status_code}: {_extract_error_detail(response)}")
+    payload = response.json()
+    if not isinstance(payload, dict):
+        raise RuntimeError(f"Unexpected create_workflow payload: {type(payload).__name__}")
+    return payload
+
+
+async def _update_workflow_raw(
+    workflow_id: str,
+    *,
+    json_definition: dict[str, Any] | None,
+    yaml_definition: str | None,
+) -> dict[str, Any]:
+    skyvern = get_skyvern()
+    body: dict[str, Any] = {}
+    if json_definition is not None:
+        body["json_definition"] = json_definition
+    if yaml_definition is not None:
+        body["yaml_definition"] = yaml_definition
+    response = await skyvern._client_wrapper.httpx_client.request(
+        f"v1/workflows/{workflow_id}",
+        method="POST",
+        json=body,
+    )
+    if response.status_code == 404:
+        raise NotFoundError(body={"detail": f"Workflow {workflow_id!r} not found"})
+    if response.status_code >= 400:
+        raise RuntimeError(f"HTTP {response.status_code}: {_extract_error_detail(response)}")
+    payload = response.json()
+    if not isinstance(payload, dict):
+        raise RuntimeError(f"Unexpected update_workflow payload: {type(payload).__name__}")
+    return payload
+
+
+async def _update_workflow_folder_raw(workflow_id: str, *, folder_id: str | None) -> dict[str, Any]:
+    skyvern = get_skyvern()
+    response = await skyvern._client_wrapper.httpx_client.request(
+        f"v1/workflows/{workflow_id}/folder",
+        method="PUT",
+        json={"folder_id": folder_id},
+    )
+    if response.status_code == 404:
+        raise NotFoundError(body={"detail": f"Workflow {workflow_id!r} not found"})
+    if response.status_code == 400:
+        raise BadRequestError(body={"detail": _extract_error_detail(response)})
+    if response.status_code >= 400:
+        raise RuntimeError(f"HTTP {response.status_code}: {_extract_error_detail(response)}")
+    payload = response.json()
+    if not isinstance(payload, dict):
+        raise RuntimeError(f"Unexpected update_workflow_folder payload: {type(payload).__name__}")
+    return payload
+
+
+def _validate_definition_structure(json_def: dict[str, Any] | None, action: str) -> dict[str, Any] | None:
     """Validate required fields in a JSON workflow definition.
 
     Returns a make_result error dict if validation fails, or None if valid.
     Only validates JSON definitions — YAML is validated server-side.
-
-    Note: WorkflowCreateYamlRequest already enforces ``title`` and
-    ``workflow_definition`` as required fields via Pydantic, so this is
-    a belt-and-suspenders check that produces user-friendly error messages.
     """
     if json_def is None:
         return None
-    if not json_def.title:
+    if not json_def.get("title"):
         return make_result(
             action,
             ok=False,
@@ -441,6 +596,16 @@ def _validate_definition_structure(json_def: WorkflowCreateYamlRequest | None, a
                 "Add a 'title' field to your workflow definition",
             ),
         )
+    if not isinstance(json_def.get("workflow_definition"), dict):
+        return make_result(
+            action,
+            ok=False,
+            error=make_error(
+                ErrorCode.INVALID_INPUT,
+                "Workflow definition missing 'workflow_definition' object",
+                "Add a 'workflow_definition' object with a 'blocks' list",
+            ),
+        )
     return None
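The behavior of `_extract_error_detail` is easiest to see with stub responses. A standalone sketch of the same logic, with stub classes and messages invented for illustration:

```python
from typing import Any

ERROR_DETAIL_LIMIT = 500


def extract_error_detail(response: Any) -> str:
    # Prefer a JSON body's "detail"/"error" field; fall back to truncated raw text.
    try:
        body = response.json()
    except Exception:
        text = str(getattr(response, "text", ""))
        return text[:ERROR_DETAIL_LIMIT]
    if isinstance(body, dict):
        return str(body.get("detail") or body.get("error") or body)
    return str(body)


class JsonResponse:
    # Stub: backend returned a structured error body.
    def json(self) -> dict:
        return {"detail": "workflow title is required"}


class HtmlResponse:
    # Stub: a proxy returned HTML instead of JSON.
    text = "<html>Bad Gateway</html>"

    def json(self) -> dict:
        raise ValueError("not JSON")


assert extract_error_detail(JsonResponse()) == "workflow title is required"
assert extract_error_detail(HtmlResponse()) == "<html>Bad Gateway</html>"
```

The truncation cap keeps a multi-kilobyte HTML error page from flooding the MCP tool result while still surfacing enough context to debug.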
@@ -482,22 +647,33 @@ def _deep_merge(base: Any, override: Any) -> Any:
     return override
 
 
-def _normalize_json_definition(raw: Any) -> WorkflowCreateYamlRequest:
-    """Normalize JSON workflow definitions through the shared backend schema."""
+def _normalize_json_definition(raw: Any) -> dict[str, Any]:
+    """Normalize JSON workflow definitions through the shared backend schema.
+
+    The MCP tools post through raw HTTP so valid backend payloads are not
+    rejected by stale Fern request unions. When the backend schema accepts the
+    payload, merge its JSON-compatible normalization over the raw dict; when it
+    does not, preserve the caller's raw dict for server-side validation.
+    """
     if not isinstance(raw, dict):
         raise TypeError("Workflow definition JSON must be an object")
+    if "title" not in raw:
+        raise ValueError("Workflow definition missing 'title' field")
+    if "workflow_definition" not in raw:
+        raise ValueError("Workflow definition missing 'workflow_definition' object")
     try:
         normalized = WorkflowCreateYAMLRequestSchema.model_validate(raw)
     except Exception as exc:
-        # Internal schema is stricter than the Fern SDK — skip normalization so
-        # unknown/future fields are not rejected.
+        # Internal schema is stricter than the API boundary — skip normalization
+        # so unknown/future fields are not rejected by MCP before the backend can
+        # decide.
         LOG.warning("Skipping text-prompt normalization; internal schema rejected payload", error=str(exc))
-        return WorkflowCreateYamlRequest(**raw)
+        return raw
     merged = _deep_merge(raw, normalized.model_dump(mode="json"))
-    return WorkflowCreateYamlRequest(**merged)
+    return merged
 
 
 def _make_invalid_json_definition_error(exc: Exception) -> dict[str, Any]:
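`_deep_merge`'s body is not shown in this diff, so the following is an assumed recursive dict merge (override wins per key) that matches how `_normalize_json_definition` layers the schema's normalized dump over the caller's raw dict. The workflow payloads are made up for illustration:

```python
from typing import Any


def deep_merge(base: Any, override: Any) -> Any:
    # Recursively merge dicts; for non-dict values the override replaces the base.
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = deep_merge(base[key], value) if key in base else value
        return merged
    return override


raw = {"title": "My flow", "extra_field": True, "workflow_definition": {"blocks": []}}
normalized = {"title": "My flow", "workflow_definition": {"blocks": [], "parameters": []}}

merged = deep_merge(raw, normalized)
assert merged["extra_field"] is True                       # unknown/future fields survive
assert merged["workflow_definition"]["parameters"] == []   # normalization is layered in
```

This merge direction is what lets a field the internal schema does not know about pass through to the backend untouched while still picking up the schema's normalization of the fields it does know.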
@@ -792,14 +968,12 @@ async def _inject_workflow_update_parameters(definition: str, fmt: str, workflow
     return _dump_definition_dict(raw, parsed_format)
 
 
-def _parse_definition(
-    definition: str, fmt: str
-) -> tuple[WorkflowCreateYamlRequest | None, str | None, dict[str, Any] | None]:
+def _parse_definition(definition: str, fmt: str) -> tuple[dict[str, Any] | None, str | None, dict[str, Any] | None]:
     """Parse a workflow definition string.
 
     Returns (json_definition, yaml_definition, error).
     Exactly one of the first two will be set on success, or error on failure.
-    JSON input is parsed into a WorkflowCreateYamlRequest (the type the SDK expects).
+    JSON input is parsed into a plain dict for raw HTTP submission.
     """
 
     if fmt == "json":
@@ -838,12 +1012,10 @@ async def skyvern_workflow_list(
 ) -> dict[str, Any]:
     """Find and browse available Skyvern workflows. Use when you need to discover what workflows exist,
     search for a workflow by name, or list all workflows for an organization."""
-    skyvern = get_skyvern()
     with Timer() as timer:
         try:
-            workflows = await skyvern.get_workflows(
-                search_key=search,
+            workflows = await _list_workflows_raw(
+                search=search,
                 page=page,
                 page_size=page_size,
                 only_workflows=only_workflows,
@@ -955,11 +1127,9 @@ async def skyvern_workflow_create(
     if err := _validate_definition_structure(json_def, "skyvern_workflow_create"):
         return err
 
-    skyvern = get_skyvern()
     with Timer() as timer:
         try:
-            workflow = await skyvern.create_workflow(
+            workflow = await _create_workflow_raw(
                 json_definition=json_def,
                 yaml_definition=yaml_def,
                 folder_id=folder_id,
@@ -978,7 +1148,7 @@ async def skyvern_workflow_create(
             ),
         )
 
-    LOG.info("workflow_created", workflow_id=workflow.workflow_permanent_id)
+    LOG.info("workflow_created", workflow_id=workflow.get("workflow_permanent_id"))
     data = _serialize_workflow(workflow)
     fmt_label = "json_definition" if json_def is not None else "yaml_definition"
     folder_str = f", folder_id={folder_id!r}" if folder_id is not None else ""
@@ -1041,16 +1211,25 @@ async def skyvern_workflow_update(
     if err := _validate_definition_structure(json_def, "skyvern_workflow_update"):
         return err
 
-    skyvern = get_skyvern()
     with Timer() as timer:
         try:
-            workflow = await skyvern.update_workflow(
+            workflow = await _update_workflow_raw(
                 workflow_id,
                 json_definition=json_def,
                 yaml_definition=yaml_def,
             )
             timer.mark("sdk")
+        except NotFoundError:
+            return make_result(
+                "skyvern_workflow_update",
+                ok=False,
+                timing_ms=timer.timing_ms,
+                error=make_error(
+                    ErrorCode.WORKFLOW_NOT_FOUND,
+                    f"Workflow {workflow_id!r} not found",
+                    "Verify the workflow ID with skyvern_workflow_list",
+                ),
+            )
         except Exception as e:
             return make_result(
                 "skyvern_workflow_update",
@@ -1140,11 +1319,9 @@ async def skyvern_workflow_update_folder(
     if folder_id is not None and (err := validate_folder_id(folder_id, "skyvern_workflow_update_folder")):
         return err
 
-    skyvern = get_skyvern()
     with Timer() as timer:
         try:
-            workflow = await skyvern.update_workflow_folder(workflow_id, folder_id=folder_id)
+            workflow = await _update_workflow_folder_raw(workflow_id, folder_id=folder_id)
             timer.mark("sdk")
         except NotFoundError:
             return make_result(


@ -75,6 +75,17 @@ DEFAULT_LOGIN_COMPLETE_CRITERION = (
"Do NOT assume login failed just because you are on the homepage or the same page as before." "Do NOT assume login failed just because you are on the homepage or the same page as before."
) )
# Template for wrapping a block-level mini-goal with the user's original prompt as context.
# Used by both TaskV2 planning and the workflow-copilot-v2 tool handler so that every block's
# navigation_goal carries the user's overarching intent — the verifier (complete_verify) can
# then reason about completion against the user's goal rather than the block's narrow action
# decomposition.
MINI_GOAL_TEMPLATE = """Achieve the following mini goal and once it's achieved, complete:
```{mini_goal}```
This mini goal is part of the big goal the user wants to achieve; use the big goal as context to achieve the mini goal:
```{main_goal}```"""
# reserved fields for navigation payload # reserved fields for navigation payload
SPECIAL_FIELD_VERIFICATION_CODE = "verification_code" SPECIAL_FIELD_VERIFICATION_CODE = "verification_code"


@ -135,6 +135,7 @@ from skyvern.webeye.actions.actions import (
KeypressAction, KeypressAction,
ReloadPageAction, ReloadPageAction,
TerminateAction, TerminateAction,
VerificationStatus,
WebAction, WebAction,
) )
from skyvern.webeye.actions.handler import ActionHandler from skyvern.webeye.actions.handler import ActionHandler
@ -180,6 +181,66 @@ def _llm_error_category(reasoning: str) -> list[dict]:
return [{"category": "LLM_ERROR", "confidence_float": 0.9, "reasoning": reasoning}] return [{"category": "LLM_ERROR", "confidence_float": 0.9, "reasoning": reasoning}]
# Phrases the verifier / validator LLMs use when they rely on exact-string
# matching rather than semantic interpretation. Tagging reasoning text as
# "literal" vs "semantic" gives us a post-hoc signal for how often the LLM
# narrow-matches on a criterion or goal — queryable via the
# validation.reasoning_kind / verification.reasoning_kind span attributes.
# The heuristic is imperfect by design; its purpose is a rough trend
# indicator across the copilot-v2 cohort and other callers.
_LITERAL_REASONING_SIGNALS: tuple[str, ...] = (
"exact",
"literal",
"verbatim",
"word-for-word",
"word for word",
)
def _classify_reasoning_kind(reasoning: str | None) -> str:
"""Classify a verifier / validator LLM's reasoning text as ``"literal"``
(relied on exact-string matching) or ``"semantic"``.
Shared between ``record_validation_span_attrs`` (validation-block step
spans) and ``record_verification_span_attrs`` (``complete_verify`` spans).
The keyword heuristic is imperfect by design; its purpose is a rough
trend signal, not a truth oracle.
"""
text = (reasoning or "").lower()
return "literal" if any(s in text for s in _LITERAL_REASONING_SIGNALS) else "semantic"
def record_validation_span_attrs(span: Any, task: Task, actions: list[Action]) -> None:
"""Attach ``validation.decision`` and ``validation.reasoning_kind`` to
``span`` when ``task`` is a ``TaskType.validation`` step whose first parsed
action is a ``CompleteAction`` or ``TerminateAction``.
Exposed as a module-level helper (rather than inlined at the call site) so
the logic is unit-testable without driving the full agent step. No-op on
non-validation tasks or non-decisive actions; callers don't need to guard.
"""
if task.task_type != TaskType.validation or not actions:
return
decision = actions[0]
if not isinstance(decision, (CompleteAction, TerminateAction)):
return
validation_decision = "complete" if isinstance(decision, CompleteAction) else "terminate"
span.set_attribute("validation.decision", validation_decision)
span.set_attribute("validation.reasoning_kind", _classify_reasoning_kind(decision.reasoning))
def record_verification_span_attrs(span: Any, verification_thoughts: str | None) -> None:
"""Attach ``verification.reasoning_kind`` to the current
``skyvern.agent.complete_verify`` span based on the verifier LLM's
``thoughts`` field.
Complements ``verification.status`` / ``verification.template``, which
``complete_verify`` already sets. Pure, module-level helper so the
classifier logic is unit-testable without driving the full verify path.
"""
span.set_attribute("verification.reasoning_kind", _classify_reasoning_kind(verification_thoughts))
@dataclass @dataclass
class SpeculativePlan: class SpeculativePlan:
scraped_page: ScrapedPage scraped_page: ScrapedPage
@ -1372,6 +1433,7 @@ class ForgeAgent:
speculative_llm_metadata = None speculative_llm_metadata = None
detailed_agent_step_output.actions = actions detailed_agent_step_output.actions = actions
record_validation_span_attrs(_step_span, task, actions)
if len(actions) == 0: if len(actions) == 0:
LOG.info( LOG.info(
"No actions to execute, marking step as failed", "No actions to execute, marking step as failed",
@ -2334,6 +2396,7 @@ class ForgeAgent:
exc_info=True, exc_info=True,
) )
@traced(name="skyvern.agent.complete_verify")
async def complete_verify( async def complete_verify(
self, page: Page, scraped_page: ScrapedPage, task: Task, step: Step self, page: Page, scraped_page: ScrapedPage, task: Task, step: Step
) -> CompleteVerifyResult: ) -> CompleteVerifyResult:
@ -2437,7 +2500,18 @@ class ForgeAgent:
screenshots=scraped_page_refreshed.screenshots, screenshots=scraped_page_refreshed.screenshots,
prompt_name=prompt_name, prompt_name=prompt_name,
) )
return CompleteVerifyResult.model_validate(verification_result) result = CompleteVerifyResult.model_validate(verification_result)
if result.is_complete:
verification_status = VerificationStatus.complete
elif result.is_terminate:
verification_status = VerificationStatus.terminate
else:
verification_status = VerificationStatus.continue_step
span = otel_trace.get_current_span()
span.set_attribute("verification.status", verification_status.value)
span.set_attribute("verification.template", template_name)
record_verification_span_attrs(span, result.thoughts)
return result
async def check_user_goal_complete( async def check_user_goal_complete(
self, page: Page, scraped_page: ScrapedPage, task: Task, step: Step self, page: Page, scraped_page: ScrapedPage, task: Task, step: Step


@ -114,9 +114,13 @@ When the user reports a problem with workflow output (duplicates, missing data,
IMPORTANT WORKFLOW RULES: IMPORTANT WORKFLOW RULES:
* Always generate valid YAML that conforms to the Skyvern workflow schema. * Always generate valid YAML that conforms to the Skyvern workflow schema.
* Use "navigation" blocks for visiting URLs AND performing actions (filling forms, clicking buttons) — NOT "task" or "goto_url" blocks. * NEVER emit "task" or "task_v2" blocks. They are not supported by the workflow copilot and will be rejected at schema lookup. Decompose the work into specific block types instead:
* Use "extraction" blocks for extracting structured data. - "navigation" for page actions (filling forms, clicking buttons, multi-step flows).
* Use "login" blocks for authentication flows. - "extraction" for data extraction.
- "validation" for completion checks.
- "login" for authentication flows.
- "wait" for explicit pauses between blocks.
- "goto_url" only when you need to navigate to a URL without any subsequent actions (prefer "navigation" when both URL visit + actions are needed).
* Use descriptive, unique labels for blocks (snake_case format). * Use descriptive, unique labels for blocks (snake_case format).
* Reference parameters using Jinja2 syntax: {% raw %}{{ parameters.param_key }}{% endraw %} * Reference parameters using Jinja2 syntax: {% raw %}{{ parameters.param_key }}{% endraw %}
* Inline literal values directly in block config when the value is non-sensitive, used once, and not meant to be a reusable runtime input. Example: hardcode a specific URL or a one-off search term in the block's navigation_goal or data_extraction_goal instead of creating a workflow parameter for it. * Inline literal values directly in block config when the value is non-sensitive, used once, and not meant to be a reusable runtime input. Example: hardcode a specific URL or a one-off search term in the block's navigation_goal or data_extraction_goal instead of creating a workflow parameter for it.


@ -100,7 +100,7 @@ Important Rules:
** NAVIGATION BLOCK (navigation) ** ** NAVIGATION BLOCK (navigation) **
Purpose: Take actions to achieve a task. This is the "Browser Task" block in the UI. Purpose: Take actions on a page to achieve a focused goal — fill a form, click through a multi-step flow, prepare the page before an extraction.
Structure: Structure:
block_type: navigation block_type: navigation
@ -200,6 +200,16 @@ blocks:
navigation_goal: "Check the terms checkbox" navigation_goal: "Check the terms checkbox"
max_retries: 1 max_retries: 1
** TASK BLOCK (task) — NOT AVAILABLE IN WORKFLOW COPILOT **
DO NOT EMIT "task" blocks. They are not available in the workflow copilot and will be rejected at persistence. Use:
- "navigation" for page actions (filling forms, clicking, multi-step flows)
- "extraction" for data extraction (with data_extraction_goal + data_schema)
- "validation" for completion checks
- "login" for authentication
- "goto_url" for pure URL navigation
Legacy workflows that already contain "task" blocks continue to run outside the copilot; this ban applies only to copilot emission.
** TASK V2 BLOCK (task_v2) — DEPRECATED ** ** TASK V2 BLOCK (task_v2) — DEPRECATED **
DO NOT USE task_v2. Use "navigation" blocks instead (with navigation_goal). DO NOT USE task_v2. Use "navigation" blocks instead (with navigation_goal).


@ -22,6 +22,7 @@ import yaml
from pydantic import ValidationError from pydantic import ValidationError
from skyvern.forge.prompts import prompt_engine from skyvern.forge.prompts import prompt_engine
from skyvern.forge.sdk.copilot.block_goal_wrapping import wrap_block_goals
from skyvern.forge.sdk.copilot.context import AgentResult, CopilotContext, StructuredContext from skyvern.forge.sdk.copilot.context import AgentResult, CopilotContext, StructuredContext
from skyvern.forge.sdk.copilot.output_utils import extract_final_text, parse_final_response from skyvern.forge.sdk.copilot.output_utils import extract_final_text, parse_final_response
from skyvern.forge.sdk.copilot.tracing_setup import _copilot_model_name, ensure_tracing_initialized, is_tracing_enabled from skyvern.forge.sdk.copilot.tracing_setup import _copilot_model_name, ensure_tracing_initialized, is_tracing_enabled
@ -168,6 +169,7 @@ def _build_exit_result(ctx: CopilotContext, user_response: str, global_llm_conte
global_llm_context=global_llm_context, global_llm_context=global_llm_context,
workflow_yaml=verified_yaml, workflow_yaml=verified_yaml,
workflow_was_persisted=ctx.workflow_persisted, workflow_was_persisted=ctx.workflow_persisted,
total_tokens=ctx.total_tokens_used,
) )
@ -178,6 +180,7 @@ def _translate_to_agent_result(
chat_request: WorkflowCopilotChatRequest, chat_request: WorkflowCopilotChatRequest,
organization_id: str, organization_id: str,
) -> AgentResult: ) -> AgentResult:
# Deferred tools.py imports here and below: tools.py -> routes.workflow_copilot -> this module (circular at import time).
from skyvern.forge.sdk.copilot.tools import _process_workflow_yaml from skyvern.forge.sdk.copilot.tools import _process_workflow_yaml
text = extract_final_text(result) text = extract_final_text(result)
@ -198,6 +201,26 @@ def _translate_to_agent_result(
LOG.warning("Agent used inline REPLACE_WORKFLOW instead of update_workflow tool") LOG.warning("Agent used inline REPLACE_WORKFLOW instead of update_workflow tool")
workflow_yaml = action_data.get("workflow_yaml", "") workflow_yaml = action_data.get("workflow_yaml", "")
if workflow_yaml: if workflow_yaml:
# REPLACE_WORKFLOW bypasses _update_workflow, so the post-emission
# reject has to run here too. Skip processing on detection; leave
# last_workflow / last_workflow_yaml at their pre-REPLACE values so
# the rejected YAML does not latch onto ctx.
from skyvern.forge.sdk.copilot.tools import (
_banned_block_reject_message,
_detect_new_banned_blocks,
_record_banned_block_reject_span,
)
banned_items = _detect_new_banned_blocks(workflow_yaml, ctx.last_workflow_yaml)
if banned_items:
_record_banned_block_reject_span("replace_workflow_inline", banned_items)
user_response = f"{user_response}\n\n(Note: {_banned_block_reject_message(banned_items)})"
workflow_yaml = ""
if workflow_yaml:
if ctx.user_message:
workflow_yaml = wrap_block_goals(workflow_yaml, ctx.user_message)
else:
LOG.warning("REPLACE_WORKFLOW inline path missing ctx.user_message; skipping block-goal wrap")
try: try:
last_workflow = _process_workflow_yaml( last_workflow = _process_workflow_yaml(
workflow_id=chat_request.workflow_id, workflow_id=chat_request.workflow_id,
@ -251,6 +274,7 @@ def _translate_to_agent_result(
response_type=resp_type, response_type=resp_type,
workflow_yaml=last_workflow_yaml, workflow_yaml=last_workflow_yaml,
workflow_was_persisted=ctx.workflow_persisted, workflow_was_persisted=ctx.workflow_persisted,
total_tokens=ctx.total_tokens_used,
) )
@ -486,6 +510,7 @@ async def run_copilot_agent(
global_llm_context=global_llm_context, global_llm_context=global_llm_context,
workflow_yaml=None, workflow_yaml=None,
workflow_was_persisted=ctx.workflow_persisted, workflow_was_persisted=ctx.workflow_persisted,
total_tokens=ctx.total_tokens_used,
) )
except Exception as e: except Exception as e:
LOG.error("Copilot agent error", error=str(e), exc_info=True) LOG.error("Copilot agent error", error=str(e), exc_info=True)


@ -0,0 +1,98 @@
"""Wrap copilot-v2-generated block intent fields (``navigation_goal``,
``complete_criterion``, ``terminate_criterion``) with the user's original
chat message as "big goal" context, mirroring the TaskV2 pattern that
applies ``MINI_GOAL_TEMPLATE`` at every mini-goal construction site.
Without this wrap:
- The Skyvern verifier (``complete_verify``) has no user-intent context
when a navigation block finishes on a confirmation surface.
- The validation-block prompt (``decisive-criterion-validate.j2``) sees
only a terse criterion (often a verbatim slice of the user prompt) and
reads it as a literal string to match.
"""
from __future__ import annotations
from typing import Any
import yaml
from skyvern.constants import MINI_GOAL_TEMPLATE
from skyvern.utils.yaml_loader import safe_load_no_dates
# Block fields whose value expresses the LLM's "mini goal" — what it should
# do or what it should check for. Wrapped in MINI_GOAL_TEMPLATE alongside
# the user's chat message so the downstream LLMs can reason about intent.
# navigation_goal is carried by Task, Action, Navigation, Login, and
# FileDownload blocks; complete_criterion / terminate_criterion by Validation,
# Navigation, and Login blocks.
_WRAPPABLE_FIELDS: tuple[str, ...] = (
"navigation_goal",
"complete_criterion",
"terminate_criterion",
)
# The template's constant prefix — everything before the ``{mini_goal}``
# placeholder. Presence of this prefix in a wrapped field means it was
# wrapped on a prior invocation; used for idempotency so repeated tool
# calls don't stack wrappers. Deriving from the template (rather than a
# hard-coded substring) keeps idempotency intact if the template's wording
# changes.
_WRAPPED_PREFIX = MINI_GOAL_TEMPLATE.partition("{mini_goal}")[0]
def wrap_block_goals(workflow_yaml: str, user_message: str) -> str:
"""Return ``workflow_yaml`` with each block's ``navigation_goal``,
``complete_criterion``, and ``terminate_criterion`` wrapped via
:data:`skyvern.constants.MINI_GOAL_TEMPLATE`.
Blocks whose fields are missing, empty, or already wrapped are left
untouched. Recurses into ``ForLoopBlockYAML.loop_blocks``. No-ops when
``user_message`` is empty or the YAML is malformed (malformed input is
surfaced by the downstream ``_process_workflow_yaml`` call, same as
today).
"""
if not user_message:
return workflow_yaml
# Skip the parse+dump round-trip when the YAML can't contain any wrappable
# field. False positives (field name appearing inside a value) are harmless:
# we'd fall through to the full path and mutate nothing.
if not any(field in workflow_yaml for field in _WRAPPABLE_FIELDS):
return workflow_yaml
try:
parsed = safe_load_no_dates(workflow_yaml)
except yaml.YAMLError:
return workflow_yaml
if not isinstance(parsed, dict):
return workflow_yaml
definition = parsed.get("workflow_definition")
if not isinstance(definition, dict):
return workflow_yaml
blocks = definition.get("blocks")
if not isinstance(blocks, list):
return workflow_yaml
if not _wrap_blocks_in_place(blocks, user_message):
return workflow_yaml
# parse/mutate/dump: any YAML comments in workflow_yaml are stripped on re-serialize.
return yaml.safe_dump(parsed, sort_keys=False)
def _wrap_blocks_in_place(blocks: list[Any], user_message: str) -> bool:
"""Recursively wrap every field in :data:`_WRAPPABLE_FIELDS` on every
block in ``blocks``; returns ``True`` if at least one field was mutated."""
mutated = False
for block in blocks:
if not isinstance(block, dict):
continue
for field_name in _WRAPPABLE_FIELDS:
value = block.get(field_name)
if isinstance(value, str) and value and _WRAPPED_PREFIX not in value:
block[field_name] = MINI_GOAL_TEMPLATE.format(
mini_goal=value,
main_goal=user_message,
)
mutated = True
loop_blocks = block.get("loop_blocks")
if isinstance(loop_blocks, list):
mutated = _wrap_blocks_in_place(loop_blocks, user_message) or mutated
return mutated
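The wrap-plus-idempotency behavior can be demonstrated in isolation. This sketch uses a simplified stand-in template (the real `MINI_GOAL_TEMPLATE` wraps the goals in code fences) and operates on already-parsed block dicts, skipping the YAML round-trip:

```python
# Simplified stand-in for MINI_GOAL_TEMPLATE; the prefix-derivation trick for
# idempotency is the same as _WRAPPED_PREFIX in the diff.
TEMPLATE = "Achieve this mini goal: {mini_goal}\nBig-goal context: {main_goal}"
WRAPPED_PREFIX = TEMPLATE.partition("{mini_goal}")[0]
WRAPPABLE = ("navigation_goal", "complete_criterion", "terminate_criterion")

def wrap_blocks_in_place(blocks: list, main_goal: str) -> bool:
    mutated = False
    for block in blocks:
        if not isinstance(block, dict):
            continue
        for field_name in WRAPPABLE:
            value = block.get(field_name)
            # Skip missing, empty, or already-wrapped fields.
            if isinstance(value, str) and value and WRAPPED_PREFIX not in value:
                block[field_name] = TEMPLATE.format(mini_goal=value, main_goal=main_goal)
                mutated = True
        loop_blocks = block.get("loop_blocks")
        if isinstance(loop_blocks, list):
            mutated = wrap_blocks_in_place(loop_blocks, main_goal) or mutated
    return mutated

blocks = [
    {"label": "fill_form", "navigation_goal": "Fill the signup form"},
    {"label": "loop", "loop_blocks": [{"label": "check", "complete_criterion": "Row exists"}]},
]
assert wrap_blocks_in_place(blocks, "Register a new account")      # first pass mutates
assert not wrap_blocks_in_place(blocks, "Register a new account")  # repeat is a no-op
```

The prefix check is what keeps repeated tool calls from stacking wrappers: a wrapped value already starts with the template's constant prefix, so it is left untouched.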


@ -4,13 +4,15 @@ from __future__ import annotations
import re import re
from dataclasses import dataclass, field from dataclasses import dataclass, field
from typing import TYPE_CHECKING, Any from typing import TYPE_CHECKING, Any, Literal
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from skyvern.forge.sdk.copilot.runtime import AgentContext from skyvern.forge.sdk.copilot.runtime import AgentContext
from skyvern.forge.sdk.workflow.models.workflow import Workflow from skyvern.forge.sdk.workflow.models.workflow import Workflow
ResponseType = Literal["REPLY", "ASK_QUESTION", "REPLACE_WORKFLOW"]
if TYPE_CHECKING: if TYPE_CHECKING:
from skyvern.forge.sdk.copilot.narration import NarratorState from skyvern.forge.sdk.copilot.narration import NarratorState
@ -108,13 +110,18 @@ class AgentResult:
user_response: str user_response: str
updated_workflow: Workflow | None updated_workflow: Workflow | None
global_llm_context: str | None global_llm_context: str | None
response_type: str = "REPLY" response_type: ResponseType = "REPLY"
workflow_yaml: str | None = None workflow_yaml: str | None = None
workflow_was_persisted: bool = False workflow_was_persisted: bool = False
# Feasibility-gate fast-path sets this True so the route can null any # Feasibility-gate fast-path sets this True so the route can null any
# previously-persisted proposed_workflow. Regular in-loop ASK_QUESTION # previously-persisted proposed_workflow. Regular in-loop ASK_QUESTION
# responses leave it False, preserving in-progress drafts. # responses leave it False, preserving in-progress drafts.
clear_proposed_workflow: bool = False clear_proposed_workflow: bool = False
# Actual API token usage accumulated across the agent run. None when no
# provider reported usage on the stream — distinguishes "no data" from
# "0 tokens" so eval cost grading can flag missing telemetry instead of
# silently passing as cheap.
total_tokens: int | None = None
@dataclass @dataclass
@ -150,6 +157,14 @@ class CopilotContext(AgentContext):
consecutive_tool_tracker: list[str] = field(default_factory=list) consecutive_tool_tracker: list[str] = field(default_factory=list)
tool_activity: list[dict[str, Any]] = field(default_factory=list) tool_activity: list[dict[str, Any]] = field(default_factory=list)
# Token usage summed from raw_responses after each streamed run. None
# until the first response that carries a usage object — some providers
# (notably non-OpenAI streaming routes) omit usage entirely, and we want
# eval cost grading to see "no data" rather than "0 tokens".
total_tokens_used: int | None = None
input_tokens_used: int | None = None
output_tokens_used: int | None = None
# Workflow state # Workflow state
last_workflow: Workflow | None = None last_workflow: Workflow | None = None
last_workflow_yaml: str | None = None last_workflow_yaml: str | None = None


@ -918,6 +918,31 @@ class _SendTrackingStream:
await self._inner.close() await self._inner.close()
def _accumulate_usage(result: RunResultStreaming, ctx: Any) -> None:
"""Sum actual token usage from raw_responses onto the context.
Called per enforcement iteration in a ``finally:`` so pre-overflow
response tokens are still counted even when ``stream_to_sse`` raises.
First observed usage flips the counters from ``None`` to ``0``; if no
response on this stream carries a usage object the counters stay
``None``, which the eval surfaces as "telemetry missing" rather than
"ran for free".
"""
if not hasattr(ctx, "total_tokens_used"):
return
for resp in getattr(result, "raw_responses", []) or []:
usage = getattr(resp, "usage", None)
if usage is None:
continue
if ctx.total_tokens_used is None:
ctx.total_tokens_used = 0
ctx.input_tokens_used = 0
ctx.output_tokens_used = 0
ctx.total_tokens_used += getattr(usage, "total_tokens", 0) or 0
ctx.input_tokens_used += getattr(usage, "input_tokens", 0) or 0
ctx.output_tokens_used += getattr(usage, "output_tokens", 0) or 0
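The None-vs-0 accumulation semantics can be exercised with stand-in usage objects (real ones come from the streaming SDK's `raw_responses`; `SimpleNamespace` is used here purely as a mock):

```python
from types import SimpleNamespace

class Ctx:
    total_tokens_used = None
    input_tokens_used = None
    output_tokens_used = None

def accumulate(responses, ctx):
    # Mirrors _accumulate_usage: responses without a usage object are skipped;
    # the first observed usage flips the counters from None to 0.
    for resp in responses or []:
        usage = getattr(resp, "usage", None)
        if usage is None:
            continue
        if ctx.total_tokens_used is None:
            ctx.total_tokens_used = ctx.input_tokens_used = ctx.output_tokens_used = 0
        ctx.total_tokens_used += getattr(usage, "total_tokens", 0) or 0
        ctx.input_tokens_used += getattr(usage, "input_tokens", 0) or 0
        ctx.output_tokens_used += getattr(usage, "output_tokens", 0) or 0

ctx = Ctx()
accumulate([SimpleNamespace(usage=None)], ctx)
assert ctx.total_tokens_used is None  # no usage reported: "no data", not 0
accumulate([SimpleNamespace(usage=SimpleNamespace(total_tokens=120, input_tokens=90, output_tokens=30))], ctx)
assert (ctx.total_tokens_used, ctx.input_tokens_used, ctx.output_tokens_used) == (120, 90, 30)
```

This is why the counters start as `None` rather than `0`: downstream cost grading can distinguish a provider that never reported usage from a run that genuinely consumed nothing.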
async def run_with_enforcement( async def run_with_enforcement(
agent: Agent, agent: Agent,
initial_input: str | list, initial_input: str | list,
@ -965,7 +990,10 @@ async def run_with_enforcement(
): ):
try: try:
result = Runner.run_streamed(agent, input=current_input, context=ctx, session=session, **runner_kwargs) result = Runner.run_streamed(agent, input=current_input, context=ctx, session=session, **runner_kwargs)
await stream_to_sse(result, tracked_stream, ctx) try:
await stream_to_sse(result, tracked_stream, ctx)
finally:
_accumulate_usage(result, ctx)
except Exception as e: except Exception as e:
if not _is_context_window_error(e): if not _is_context_window_error(e):
raise raise
@ -995,7 +1023,10 @@ async def run_with_enforcement(
result = Runner.run_streamed( result = Runner.run_streamed(
agent, input=current_input, context=ctx, session=session, **runner_kwargs agent, input=current_input, context=ctx, session=session, **runner_kwargs
) )
await stream_to_sse(result, tracked_stream, ctx) try:
await stream_to_sse(result, tracked_stream, ctx)
finally:
_accumulate_usage(result, ctx)
except Exception as retry_err: except Exception as retry_err:
# Never retry twice; even a second overflow surfaces as a # Never retry twice; even a second overflow surfaces as a
# real failure rather than spinning. # real failure rather than spinning.


@ -20,6 +20,7 @@ from pydantic import ValidationError
from skyvern.forge import app from skyvern.forge import app
from skyvern.forge.failure_classifier import classify_from_failure_reason from skyvern.forge.failure_classifier import classify_from_failure_reason
from skyvern.forge.sdk.artifact.models import ArtifactType from skyvern.forge.sdk.artifact.models import ArtifactType
from skyvern.forge.sdk.copilot.block_goal_wrapping import wrap_block_goals
from skyvern.forge.sdk.copilot.context import CopilotContext from skyvern.forge.sdk.copilot.context import CopilotContext
from skyvern.forge.sdk.copilot.failure_tracking import ( from skyvern.forge.sdk.copilot.failure_tracking import (
_canonical_block_config, _canonical_block_config,
@ -45,6 +46,7 @@ from skyvern.forge.sdk.workflow.models.parameter import (
) )
from skyvern.forge.sdk.workflow.models.workflow import Workflow, WorkflowRun, WorkflowRunStatus from skyvern.forge.sdk.workflow.models.workflow import Workflow, WorkflowRun, WorkflowRunStatus
from skyvern.schemas.workflows import BlockType from skyvern.schemas.workflows import BlockType
from skyvern.utils.yaml_loader import safe_load_no_dates
from skyvern.webeye.navigation import is_skip_inner_retry_error from skyvern.webeye.navigation import is_skip_inner_retry_error
LOG = structlog.get_logger() LOG = structlog.get_logger()
@ -360,7 +362,7 @@ async def _attach_failed_block_screenshots(
task_id_to_block[task_id]["screenshot_b64"] = b64 task_id_to_block[task_id]["screenshot_b64"] = b64
_BLOCK_RUNNING_TOOLS = frozenset({"run_blocks_and_collect_debug", "update_and_run_blocks"}) BLOCK_RUNNING_TOOLS = frozenset({"run_blocks_and_collect_debug", "update_and_run_blocks"})
def _tool_loop_error(ctx: AgentContext, tool_name: str) -> str | None: def _tool_loop_error(ctx: AgentContext, tool_name: str) -> str | None:
@ -376,7 +378,7 @@ def _tool_loop_error(ctx: AgentContext, tool_name: str) -> str | None:
# detecting) and further attempts will just burn the tool timeout. Scoped # detecting) and further attempts will just burn the tool timeout. Scoped
# to block-running tools so planning/metadata tools (update_workflow, # to block-running tools so planning/metadata tools (update_workflow,
# list_credentials, get_run_results) stay unaffected. # list_credentials, get_run_results) stay unaffected.
if tool_name in _BLOCK_RUNNING_TOOLS: if tool_name in BLOCK_RUNNING_TOOLS:
# Reconciliation guard: the previous block-running tool call exited # Reconciliation guard: the previous block-running tool call exited
# without a trustworthy terminal status for its workflow run (the # without a trustworthy terminal status for its workflow run (the
# watchdog's stagnation / ceiling / task_exit_unfinalized paths, or # watchdog's stagnation / ceiling / task_exit_unfinalized paths, or
@ -504,6 +506,14 @@ def _parameter_binding_invariant_error(
async def _update_workflow(params: dict[str, Any], ctx: AgentContext) -> dict[str, Any]: async def _update_workflow(params: dict[str, Any], ctx: AgentContext) -> dict[str, Any]:
workflow_yaml = params["workflow_yaml"] workflow_yaml = params["workflow_yaml"]
# Post-emission reject of copilot-v2 writes that introduce a banned
# block type. The schema pre_hook only fires when the LLM consults the
# schema; this safety net fires regardless of emission path. Label-based
# diff preserves legacy workflows — only NEW banned labels trip the reject.
banned_items = _detect_new_banned_blocks(workflow_yaml, ctx.workflow_yaml)
if banned_items:
_record_banned_block_reject_span("_update_workflow", banned_items)
return {"ok": False, "error": _banned_block_reject_message(banned_items)}
try: try:
workflow = _process_workflow_yaml( workflow = _process_workflow_yaml(
workflow_id=ctx.workflow_id, workflow_id=ctx.workflow_id,
@ -1435,6 +1445,157 @@ async def _resolve_url_title(raw: dict[str, Any], ctx: AgentContext) -> tuple[st
return url, title return url, title
# Block types the copilot must never emit. They delegate the entire goal to
# a separate agent, which bypasses copilot-level block decomposition and
obscures issues the copilot should surface and handle directly.
_COPILOT_BANNED_BLOCK_TYPES: frozenset[str] = frozenset({"task", "task_v2"})
# Shared suffix across every LLM-facing rejection message for banned
# block emission — the pre-hook (schema-lookup reject) and the post-
# emission detector both steer the LLM toward the same alternatives.
_COPILOT_BANNED_BLOCK_ALTERNATIVES = (
"Use `navigation` for page actions (filling forms, clicking, multi-step flows), "
"`extraction` for data extraction, `validation` for completion checks, "
"`login` for authentication, or `goto_url` for pure URL navigation."
)
def _banned_block_reject_message(items: list[tuple[str, str]]) -> str:
"""Uniform error text for the post-emission reject, sharing the
alternatives suffix with the schema pre-hook."""
labels = ", ".join(sorted({label for label, _ in items}))
types = sorted({block_type for _, block_type in items})
types_part = " / ".join(repr(t) for t in types)
return (
f"Block type {types_part} is not available in the workflow copilot. "
f"Offending labels: [{labels}]. "
f"{_COPILOT_BANNED_BLOCK_ALTERNATIVES}"
)
def _record_banned_block_reject_span(source_tool: str, items: list[tuple[str, str]]) -> None:
"""Emit the dedicated ``update_workflow_banned_block_reject`` span used
by post-rollout logfire trend queries."""
with copilot_span(
"update_workflow_banned_block_reject",
data={
"labels": [label for label, _ in items],
"block_types": sorted({block_type for _, block_type in items}),
"source_tool": source_tool,
},
):
pass
def _collect_banned_block_items(blocks: list[Any]) -> list[tuple[str, str]]:
"""Recursively walk ``blocks`` (mirroring
:func:`skyvern.forge.sdk.copilot.block_goal_wrapping._wrap_blocks_in_place`)
and return ``(label, normalized_block_type)`` for every block whose type is
in :data:`_COPILOT_BANNED_BLOCK_TYPES`. Blocks missing ``label`` are
skipped; the downstream Pydantic validator surfaces those errors on its
own."""
items: list[tuple[str, str]] = []
for block in blocks:
if not isinstance(block, dict):
continue
raw_type = block.get("block_type")
if isinstance(raw_type, str):
normalized = raw_type.strip().lower()
if normalized in _COPILOT_BANNED_BLOCK_TYPES:
label = block.get("label")
if isinstance(label, str):
items.append((label, normalized))
loop_blocks = block.get("loop_blocks")
if isinstance(loop_blocks, list):
items.extend(_collect_banned_block_items(loop_blocks))
return items
def _parse_workflow_blocks(yaml_str: str | None) -> list[Any] | None:
"""Parse ``yaml_str`` and return ``workflow_definition.blocks`` as a list,
or ``None`` if the YAML is missing, unparseable, or not in the expected
shape. Graceful on every failure so callers can treat ``None`` as 'nothing
to compare against.'"""
if not yaml_str:
return None
try:
parsed = safe_load_no_dates(yaml_str)
except yaml.YAMLError:
return None
if not isinstance(parsed, dict):
return None
definition = parsed.get("workflow_definition")
if not isinstance(definition, dict):
return None
blocks = definition.get("blocks")
return blocks if isinstance(blocks, list) else None
def _detect_new_banned_blocks(
submitted_yaml: str,
prior_workflow_yaml: str | None,
) -> list[tuple[str, str]]:
"""Return ``[(label, block_type), ...]`` for every banned-type block in
``submitted_yaml`` whose label is NOT present as a banned-type block in
``prior_workflow_yaml``. Pure: no I/O, no logging.
Recurses into ``for_loop.loop_blocks`` mirroring
:func:`skyvern.forge.sdk.copilot.block_goal_wrapping._wrap_blocks_in_place`.
Legacy workflows that carry ``task`` / ``task_v2`` blocks under unchanged
labels produce an empty list and therefore do not reject.
Malformed YAML, missing ``workflow_definition``, or a non-list ``blocks``
all produce an empty list; the downstream Pydantic validation in
``_process_workflow_yaml`` surfaces the specific parse / shape error on
its own path.
"""
submitted_blocks = _parse_workflow_blocks(submitted_yaml)
if submitted_blocks is None:
return []
submitted_items = _collect_banned_block_items(submitted_blocks)
if not submitted_items:
return []
prior_blocks = _parse_workflow_blocks(prior_workflow_yaml)
prior_labels = {label for label, _ in _collect_banned_block_items(prior_blocks or [])}
return [(label, block_type) for label, block_type in submitted_items if label not in prior_labels]
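The label-based diff that preserves legacy workflows can be sketched over already-parsed block lists (the real code parses YAML first via `_parse_workflow_blocks`; helper names here are simplified):

```python
BANNED = frozenset({"task", "task_v2"})

def collect_banned(blocks):
    # (label, normalized_type) for every banned-type block, recursing into loops.
    items = []
    for block in blocks:
        if not isinstance(block, dict):
            continue
        raw_type = block.get("block_type")
        if isinstance(raw_type, str) and raw_type.strip().lower() in BANNED:
            label = block.get("label")
            if isinstance(label, str):
                items.append((label, raw_type.strip().lower()))
        loop_blocks = block.get("loop_blocks")
        if isinstance(loop_blocks, list):
            items.extend(collect_banned(loop_blocks))
    return items

def detect_new_banned(submitted_blocks, prior_blocks):
    # Only labels NOT already banned in the prior definition trip the reject.
    prior_labels = {label for label, _ in collect_banned(prior_blocks or [])}
    return [(l, t) for l, t in collect_banned(submitted_blocks) if l not in prior_labels]

prior = [{"label": "legacy_task", "block_type": "task"}]
submitted = prior + [{"label": "new_task", "block_type": "Task_V2"}]
assert detect_new_banned(submitted, prior) == [("new_task", "task_v2")]  # only the new one
assert detect_new_banned(prior, prior) == []  # unchanged legacy block: no reject
```

Re-submitting a legacy workflow verbatim therefore passes, while any newly-labeled `task`/`task_v2` block (case-insensitive) is caught regardless of which tool path emitted it.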
async def _get_block_schema_pre_hook(
params: dict[str, Any],
ctx: AgentContext,
) -> dict[str, Any] | None:
"""Short-circuit requests for banned block types with an explicit error.
Without this pre-hook the underlying MCP tool silently redirects ``task``
and ``task_v2`` queries to ``navigation``'s schema, which makes the LLM
think the banned types are available."""
block_type = params.get("block_type")
if not isinstance(block_type, str):
return None
normalized = block_type.strip().lower()
if normalized not in _COPILOT_BANNED_BLOCK_TYPES:
return None
return {
"ok": False,
"error": f"Block type {block_type!r} is not available in the workflow copilot. {_COPILOT_BANNED_BLOCK_ALTERNATIVES}",
}
async def _get_block_schema_post_hook(
result: dict[str, Any],
raw: dict[str, Any],
ctx: AgentContext,
) -> dict[str, Any]:
"""Scrub banned block types from list-mode responses. Belt-and-suspenders
against future drift in ``BLOCK_SUMMARIES`` (which currently omits them)."""
data = result.get("data")
if isinstance(data, dict):
block_types = data.get("block_types")
if isinstance(block_types, dict):
for banned in _COPILOT_BANNED_BLOCK_TYPES:
block_types.pop(banned, None)
return result
 async def _evaluate_pre_hook(
     params: dict[str, Any],
     ctx: AgentContext,
@@ -1621,7 +1782,10 @@ def get_skyvern_mcp_alias_map() -> dict[str, str]:
 def _build_skyvern_mcp_overlays() -> dict[str, SchemaOverlay]:
     return {
-        "get_block_schema": SchemaOverlay(),
+        "get_block_schema": SchemaOverlay(
+            pre_hook=_get_block_schema_pre_hook,
+            post_hook=_get_block_schema_post_hook,
+        ),
         "validate_block": SchemaOverlay(),
         "navigate_browser": SchemaOverlay(
             description=(
@@ -2076,6 +2240,14 @@ async def update_and_run_blocks_tool(
     # the new one — we need the pre-update state to diff against.
     prior_definition = await _get_prior_workflow_definition(copilot_ctx)

+    # Wrap each block's navigation_goal / complete_criterion / terminate_criterion
+    # with the user's message as "big goal" context so downstream LLMs (verifier,
+    # validation-block prompt) have user-intent framing — mirrors TaskV2.
+    if copilot_ctx.user_message:
+        workflow_yaml = wrap_block_goals(workflow_yaml, copilot_ctx.user_message)
+    else:
+        LOG.warning("update_and_run_blocks invoked without copilot_ctx.user_message; skipping block-goal wrap")
+
     # Step 1: Update the workflow
     with copilot_span("update_workflow", data={"yaml_length": len(workflow_yaml)}):
         update_result = await _update_workflow({"workflow_yaml": workflow_yaml}, copilot_ctx)


@@ -37,7 +37,6 @@ class SkyvernContext:
     max_screenshot_scrolls: int | None = None
     browser_container_ip: str | None = None
     browser_container_task_arn: str | None = None
-    is_remote_browser: bool = False
     feature_flag_entries: dict[str, bool | str | None] = field(default_factory=dict)

     # feature flags


@@ -21,7 +21,7 @@ class CursorEventStrategy(ABC):
         pass

     @abstractmethod
-    async def click(self, page: Page, locator: Locator, timeout: float | None = None) -> None:
+    async def click(self, page: Page, locator: Locator) -> None:
         pass


@@ -34,12 +34,8 @@ class DefaultCursorStrategy(CursorEventStrategy):
             LOG.debug("move_to_element failed", exc_info=True)
         return 0.0, 0.0

-    async def click(self, page: Page, locator: Locator, timeout: float | None = None) -> None:
-        LOG.debug("DefaultCursorStrategy.click", timeout=timeout)
-        kwargs: dict = {}
-        if timeout is not None:
-            kwargs["timeout"] = timeout
-        await locator.click(**kwargs)
+    async def click(self, page: Page, locator: Locator) -> None:
+        await locator.click()


 class DefaultInputStrategy(InputEventStrategy):


@@ -123,15 +123,6 @@ class EventStrategyFactory:
         """Update cursor position without generating movement."""
         EventStrategyFactory.get_cursor_strategy().sync_position(page, x, y)

-    @staticmethod
-    async def click(page: Page, locator: Locator, timeout: float | None = None) -> None:
-        """Click an element using the active cursor strategy."""
-        start = time.perf_counter()
-        try:
-            await EventStrategyFactory.get_cursor_strategy().click(page, locator, timeout)
-        finally:
-            EventStrategyFactory.__metrics.record("click", time.perf_counter() - start)
-
     # -- input convenience methods ----------------------------------------------

     @staticmethod


@@ -910,6 +910,8 @@ async def _new_copilot_chat_post(
                     message=user_response,
                     updated_workflow=updated_workflow.model_dump(mode="json") if updated_workflow else None,
                     response_time=assistant_message.created_at,
+                    total_tokens=getattr(agent_result, "total_tokens", None),
+                    response_type=getattr(agent_result, "response_type", "REPLY"),
                 )
             )
     except HTTPException as exc:


@@ -3,6 +3,8 @@ from enum import StrEnum

 from pydantic import BaseModel, ConfigDict, Field

+from skyvern.forge.sdk.copilot.context import ResponseType
+

 class WorkflowCopilotChat(BaseModel):
     model_config = ConfigDict(from_attributes=True)
@@ -86,6 +88,11 @@ class WorkflowCopilotStreamResponseUpdate(BaseModel):
     message: str = Field(..., description="The message sent to the user")
     updated_workflow: dict | None = Field(None, description="The updated workflow")
     response_time: datetime = Field(..., description="When the assistant message was created")
+    total_tokens: int | None = Field(
+        None,
+        description="Total tokens consumed by the agent during this turn; None when no provider reported usage",
+    )
+    response_type: ResponseType = Field("REPLY", description="Agent response classification")


 class WorkflowCopilotStreamErrorUpdate(BaseModel):


@@ -2,8 +2,6 @@ from typing import Any, Pattern

 from playwright.async_api import Locator

-from skyvern.forge.sdk.event.factory import EventStrategyFactory
-

 class SkyvernLocator:
     """Locator for finding and interacting with elements on a page.
@@ -18,12 +16,7 @@
     # Action methods

     async def click(self, **kwargs: Any) -> None:
         """Click the element."""
-        timeout = kwargs.pop("timeout", None)
-        if kwargs:
-            # Extra kwargs (e.g. position) — use raw Playwright click
-            await self._locator.click(timeout=timeout, **kwargs)
-        else:
-            await EventStrategyFactory.click(self._locator.page, self._locator, timeout=timeout)
+        await self._locator.click(**kwargs)

     async def fill(self, value: str, **kwargs: Any) -> None:
         """Fill an input element with text."""


@@ -9,6 +9,7 @@ from opentelemetry import trace as otel_trace
 from sqlalchemy.exc import OperationalError

 from skyvern.config import settings
+from skyvern.constants import MINI_GOAL_TEMPLATE
 from skyvern.exceptions import (
     FailedToSendWebhook,
     TaskTerminationError,
@@ -77,12 +78,6 @@ RANDOM_STRING_POOL = string.ascii_letters + string.digits
 # This limits how many times the LLM can plan and execute actions
 DEFAULT_MAX_ITERATIONS = 50

-MINI_GOAL_TEMPLATE = """Achieve the following mini goal and once it's achieved, complete:
-```{mini_goal}```
-This mini goal is part of the big goal the user wants to achieve and use the big goal as context to achieve the mini goal:
-```{main_goal}```"""
-

 def _generate_data_extraction_schema_for_loop(loop_values_key: str) -> dict:
     return {


@@ -2124,7 +2124,7 @@ async def handle_select_option_action(
     try:
         await EventStrategyFactory.move_to_element(page, skyvern_element.get_locator())
-        await EventStrategyFactory.click(page, skyvern_element.get_locator(), timeout=timeout)
+        await skyvern_element.get_locator().click(timeout=timeout)
     except Exception:
         LOG.info(
             "fail to open dropdown by clicking, try to press arrow down to open",
@@ -2664,7 +2664,7 @@ async def chain_click(
     try:
         if not await skyvern_element.navigate_to_a_href(page=page):
             await EventStrategyFactory.move_to_element(page, locator)
-            await EventStrategyFactory.click(page, locator, timeout=timeout)
+            await locator.click(timeout=timeout)
             LOG.info("Chain click: main element click succeeded", action=action, locator=locator)
             return [ActionSuccess()]
@@ -2680,7 +2680,7 @@ async def chain_click(
             locator=locator,
         )
         if bound_element := await skyvern_element.find_label_for(dom=dom):
-            await EventStrategyFactory.click(page, bound_element.get_locator(), timeout=timeout)
+            await bound_element.get_locator().click(timeout=timeout)
             action_results.append(ActionSuccess())
             return action_results
     except Exception as e:
@@ -2698,7 +2698,7 @@ async def chain_click(
         if bound_element := await skyvern_element.find_element_in_label_children(
             dom=dom, element_type=InteractiveElement.INPUT
         ):
-            await EventStrategyFactory.click(page, bound_element.get_locator(), timeout=timeout)
+            await bound_element.get_locator().click(timeout=timeout)
             action_results.append(ActionSuccess())
             return action_results
     except Exception as e:
@@ -2716,8 +2716,6 @@ async def chain_click(
         )
         if bound_locator := await skyvern_element.find_bound_label_by_attr_id():
             # click on (0, 0) to avoid playwright clicking on the wrong element by accident
-            # Intentional positional click bypassing EventStrategyFactory — position
-            # anchoring at (0,0) avoids mis-targeting nested elements inside labels
             await bound_locator.click(timeout=timeout, position={"x": 0, "y": 0})
             action_results.append(ActionSuccess())
             return action_results
@@ -2735,8 +2733,6 @@ async def chain_click(
         )
         if bound_locator := await skyvern_element.find_bound_label_by_direct_parent():
             # click on (0, 0) to avoid playwright clicking on the wrong element by accident
-            # Intentional positional click bypassing EventStrategyFactory — position
-            # anchoring at (0,0) avoids mis-targeting nested elements inside labels
             await bound_locator.click(timeout=timeout, position={"x": 0, "y": 0})
             action_results.append(ActionSuccess())
             return action_results
@@ -2816,7 +2812,7 @@ async def chain_click(
                 locator=locator,
             )
-            await EventStrategyFactory.click(page, blocking_element.get_locator(), timeout=timeout)
+            await blocking_element.get_locator().click(timeout=timeout)
             action_results.append(ActionSuccess())
             return action_results
     except Exception as e:
@@ -2927,9 +2923,7 @@ async def choose_auto_completion_dropdown(
             input_value=text,
         )
         try:
-            await EventStrategyFactory.click(
-                page, fast_path_locator, timeout=settings.BROWSER_ACTION_TIMEOUT_MS
-            )
+            await fast_path_locator.click(timeout=settings.BROWSER_ACTION_TIMEOUT_MS)
             clear_input = False
             result.action_result = ActionSuccess()
             return result
@@ -3261,9 +3255,7 @@ async def discover_and_select_from_full_dropdown(
     # Try click first to open the dropdown (most combobox components respond to click)
     try:
-        await EventStrategyFactory.click(
-            page, skyvern_element.get_locator(), timeout=settings.BROWSER_ACTION_TIMEOUT_MS
-        )
+        await skyvern_element.get_locator().click(timeout=settings.BROWSER_ACTION_TIMEOUT_MS)
     except Exception:
         LOG.info(
             "Click failed in discover fallback, continuing to ArrowDown",
@@ -3880,7 +3872,7 @@ async def select_from_dropdown(
                 value=value,
             )
             await EventStrategyFactory.move_to_element(page, locator)
-            await EventStrategyFactory.click(page, locator, timeout=timeout)
+            await locator.click(timeout=timeout)
             single_select_result.action_result = ActionSuccess()
             return single_select_result
     except Exception as e:
@@ -3909,7 +3901,7 @@ async def select_from_dropdown_by_value(
     element_locator = await incremental_scraped.select_one_element_by_value(value=value)
     if element_locator is not None:
-        await EventStrategyFactory.click(page, element_locator, timeout=timeout)
+        await element_locator.click(timeout=timeout)
         return ActionSuccess()

     if dropdown_menu_element is None:
element_locator = await incre_scraped.select_one_element_by_value(value=value) element_locator = await incre_scraped.select_one_element_by_value(value=value)
if element_locator is not None: if element_locator is not None:
await EventStrategyFactory.click(page, element_locator, timeout=timeout) await element_locator.click(timeout=timeout)
nonlocal selected nonlocal selected
selected = True selected = True
return False return False
@@ -4234,7 +4226,9 @@ async def normal_select(
         value: str | None = json_response.get("value")
         try:
-            await EventStrategyFactory.click(locator.page, locator, timeout=settings.BROWSER_ACTION_TIMEOUT_MS)
+            await locator.click(
+                timeout=settings.BROWSER_ACTION_TIMEOUT_MS,
+            )
         except Exception as e:
             LOG.info(
                 "Failed to click before select action",
@@ -4792,7 +4786,7 @@ async def click_listbox_option(
         try:
             skyvern_element = await dom.get_skyvern_element_by_id(child["id"])
             locator = skyvern_element.locator
-            await EventStrategyFactory.click(page, locator, timeout=1000)
+            await locator.click(timeout=1000)
             return True
         except Exception:


@@ -795,11 +795,6 @@ async def _connect_to_cdp_browser(
     LOG.info("Connecting browser CDP connection", remote_browser_url=remote_browser_url)
     browser = await playwright.chromium.connect_over_cdp(remote_browser_url)

-    # Mark as remote browser so strategies can adapt their dispatch method
-    ctx = current()
-    if ctx is not None:
-        ctx.is_remote_browser = True
-
     if apply_download_behaviour:
         await _apply_download_behaviour(browser)


@@ -851,7 +851,7 @@ class SkyvernElement:
         await EventStrategyFactory.move_to_element(page, self.get_locator())
         try:
-            await EventStrategyFactory.click(page, self.get_locator(), timeout=timeout)
+            await self.get_locator().click(timeout=timeout)
             return
         except Exception:
             LOG.info("Failed to click by playwright", exc_info=True, element_id=self.get_id())
@@ -863,7 +863,7 @@ class SkyvernElement:
             blocking_element, _ = await self.find_blocking_element(dom=dom, incremental_page=incremental_page)
             if blocking_element:
                 LOG.debug("Find the blocking element", element_id=blocking_element.get_id())
-                await EventStrategyFactory.click(page, blocking_element.get_locator(), timeout=timeout)
+                await blocking_element.get_locator().click(timeout=timeout)
                 return
         except Exception:
             LOG.info("Failed to click on the blocking element", exc_info=True, element_id=self.get_id())


@@ -1,7 +1,13 @@
+"""Shared pytest fixtures and setup for unit tests."""
+
 # -- begin speed up unit tests
 from unittest.mock import AsyncMock, MagicMock

 import pytest
+from opentelemetry import trace as otel_trace
+from opentelemetry.sdk.trace import TracerProvider
+from opentelemetry.sdk.trace.export import SimpleSpanProcessor
+from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

 from tests.unit.force_stub_app import start_forge_stub_app
@@ -59,3 +65,36 @@ def make_mock_session(mock_model: MagicMock) -> AsyncMock:
     mock_session.refresh = AsyncMock()
     return mock_session
+
+
+# -- shared OTEL span capture for tests that assert on span attributes --
+#
+# OTEL's global TracerProvider can only be set once per process. We install a
+# single TracerProvider + InMemorySpanExporter at session start; tests that
+# need span capture depend on the `span_exporter` fixture and get a cleared
+# exporter for each test.
+_SHARED_SPAN_EXPORTER: InMemorySpanExporter | None = None
+
+
+def _install_span_exporter() -> InMemorySpanExporter:
+    global _SHARED_SPAN_EXPORTER
+    if _SHARED_SPAN_EXPORTER is None:
+        exporter = InMemorySpanExporter()
+        provider = otel_trace.get_tracer_provider()
+        if isinstance(provider, TracerProvider):
+            provider.add_span_processor(SimpleSpanProcessor(exporter))
+        else:
+            provider = TracerProvider()
+            provider.add_span_processor(SimpleSpanProcessor(exporter))
+            otel_trace.set_tracer_provider(provider)
+        _SHARED_SPAN_EXPORTER = exporter
+    return _SHARED_SPAN_EXPORTER
+
+
+@pytest.fixture
+def span_exporter() -> InMemorySpanExporter:
+    exporter = _install_span_exporter()
+    exporter.clear()
+    yield exporter
+    exporter.clear()


@@ -0,0 +1,171 @@
"""Tests for the `skyvern.agent.complete_verify` OTEL span attributes (SKY-9174).

The verification span carries two attributes that power the logfire signal used
to measure SKY-9174's acceptance criterion post-rollout:

- ``verification.status``: ``"complete" | "terminate" | "continue"``
- ``verification.template``: ``"check-user-goal" | "check-user-goal-with-termination"``

These tests assert the attributes are set correctly across the three result
shapes and under both prompt-template selections. No behavioral change is being
verified, just the observability plumbing.
"""

from __future__ import annotations

from datetime import UTC, datetime
from unittest.mock import AsyncMock
from zoneinfo import ZoneInfo

import pytest
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

from skyvern.forge.agent import ForgeAgent
from skyvern.forge.sdk.core import skyvern_context
from skyvern.forge.sdk.core.skyvern_context import SkyvernContext
from skyvern.forge.sdk.models import StepStatus
from tests.unit.helpers import make_browser_state, make_organization, make_step, make_task

COMPLETE_VERIFY_SPAN_NAME = "skyvern.agent.complete_verify"


def _span_by_name(spans: list, name: str):
    return next((s for s in spans if s.name == name), None)


async def _call_complete_verify(
    monkeypatch: pytest.MonkeyPatch,
    *,
    llm_response: dict,
    use_termination_prompt: bool,
) -> None:
    agent = ForgeAgent()
    now = datetime.now(UTC)
    organization = make_organization(now)
    task = make_task(now, organization, navigation_goal="Submit the contact form")
    step = make_step(
        now,
        task,
        step_id="step-verify",
        status=StepStatus.running,
        order=0,
        output=None,
    )
    _, scraped_page, page = make_browser_state()
    scraped_page_refreshed = AsyncMock()
    scraped_page_refreshed.screenshots = [b"image"]
    scraped_page.refresh = AsyncMock(return_value=scraped_page_refreshed)

    monkeypatch.setattr(
        "skyvern.forge.agent.service_utils.is_cua_task",
        AsyncMock(return_value=False),
    )

    async def feature_flag_side_effect(flag_name: str, *_args, **_kwargs) -> bool:
        if flag_name == "USE_TERMINATION_AWARE_COMPLETE_VERIFICATION":
            return use_termination_prompt
        return False

    monkeypatch.setattr(
        "skyvern.forge.agent.app.EXPERIMENTATION_PROVIDER.is_feature_enabled_cached",
        AsyncMock(side_effect=feature_flag_side_effect),
    )
    monkeypatch.setattr(
        "skyvern.forge.agent.load_prompt_with_elements",
        lambda **_kwargs: "rendered prompt",
    )
    llm_handler = AsyncMock(return_value=llm_response)
    monkeypatch.setattr(
        "skyvern.forge.agent.LLMAPIHandlerFactory.get_override_llm_api_handler",
        lambda *_args, **_kwargs: llm_handler,
    )

    context = SkyvernContext(
        task_id=task.task_id,
        step_id=None,
        organization_id=task.organization_id,
        workflow_run_id=task.workflow_run_id,
        tz_info=ZoneInfo("UTC"),
    )
    skyvern_context.set(context)
    try:
        await agent.complete_verify(page=page, scraped_page=scraped_page, task=task, step=step)
    finally:
        skyvern_context.reset()


@pytest.mark.asyncio
async def test_span_attrs_for_complete_status_legacy_prompt(
    monkeypatch: pytest.MonkeyPatch, span_exporter: InMemorySpanExporter
) -> None:
    await _call_complete_verify(
        monkeypatch,
        llm_response={"user_goal_achieved": True, "thoughts": "done", "page_info": "ok"},
        use_termination_prompt=False,
    )

    span = _span_by_name(span_exporter.get_finished_spans(), COMPLETE_VERIFY_SPAN_NAME)
    assert span is not None
    attrs = span.attributes or {}
    assert attrs.get("verification.status") == "complete"
    assert attrs.get("verification.template") == "check-user-goal"


@pytest.mark.asyncio
async def test_span_attrs_for_continue_status_legacy_prompt(
    monkeypatch: pytest.MonkeyPatch, span_exporter: InMemorySpanExporter
) -> None:
    await _call_complete_verify(
        monkeypatch,
        llm_response={"user_goal_achieved": False, "thoughts": "still loading", "page_info": "spinner"},
        use_termination_prompt=False,
    )

    span = _span_by_name(span_exporter.get_finished_spans(), COMPLETE_VERIFY_SPAN_NAME)
    assert span is not None
    attrs = span.attributes or {}
    assert attrs.get("verification.status") == "continue"
    assert attrs.get("verification.template") == "check-user-goal"


@pytest.mark.asyncio
async def test_span_attrs_for_terminate_status_termination_aware_prompt(
    monkeypatch: pytest.MonkeyPatch, span_exporter: InMemorySpanExporter
) -> None:
    await _call_complete_verify(
        monkeypatch,
        llm_response={
            "status": "terminate",
            "thoughts": "blocked by captcha",
            "page_info": "cloudflare",
            "failure_categories": [{"category": "ANTI_BOT_DETECTION", "confidence_float": 0.9, "reasoning": "cf"}],
        },
        use_termination_prompt=True,
    )

    span = _span_by_name(span_exporter.get_finished_spans(), COMPLETE_VERIFY_SPAN_NAME)
    assert span is not None
    attrs = span.attributes or {}
    assert attrs.get("verification.status") == "terminate"
    assert attrs.get("verification.template") == "check-user-goal-with-termination"


@pytest.mark.asyncio
async def test_span_attrs_for_complete_status_termination_aware_prompt(
    monkeypatch: pytest.MonkeyPatch, span_exporter: InMemorySpanExporter
) -> None:
    await _call_complete_verify(
        monkeypatch,
        llm_response={
            "status": "complete",
            "thoughts": "thank-you page visible",
            "page_info": "thank-you",
            "failure_categories": [],
        },
        use_termination_prompt=True,
    )

    span = _span_by_name(span_exporter.get_finished_spans(), COMPLETE_VERIFY_SPAN_NAME)
    assert span is not None
    attrs = span.attributes or {}
    assert attrs.get("verification.status") == "complete"
    assert attrs.get("verification.template") == "check-user-goal-with-termination"


@@ -315,6 +315,37 @@ class TestTranslateToAgentResultGating:
         assert "All done" not in agent_result.user_response
         assert agent_result.updated_workflow is None

+    def test_inline_replace_workflow_wraps_block_goals_with_user_message(self, monkeypatch) -> None:
+        # SKY-9174 parity: update_and_run_blocks_tool wraps block goals with
+        # the user's chat message as big-goal context. The REPLACE_WORKFLOW
+        # inline path must do the same, otherwise the unwrapped yaml lands on
+        # ctx without user-intent framing and any downstream block run
+        # hits the verifier-on-confirmation-surface bug this PR fixes.
+        from skyvern.forge.sdk.copilot import agent as agent_module
+
+        captured: dict[str, str] = {}
+
+        def fake_process(**kwargs):
+            captured["yaml"] = kwargs["workflow_yaml"]
+            return SimpleNamespace(name="new-wf")
+
+        def fake_wrap(workflow_yaml: str, user_message: str) -> str:
+            return f"WRAPPED::{user_message}::{workflow_yaml}"
+
+        monkeypatch.setattr("skyvern.forge.sdk.copilot.tools._process_workflow_yaml", fake_process)
+        monkeypatch.setattr("skyvern.forge.sdk.copilot.agent.wrap_block_goals", fake_wrap)
+
+        ctx = _ctx(user_message="Submit a contact form on example.com.")
+        result = _fake_run_result(
+            {"type": "REPLACE_WORKFLOW", "user_response": "Here you go.", "workflow_yaml": "raw: yaml"}
+        )
+
+        agent_module._translate_to_agent_result(
+            result, ctx, global_llm_context=None, chat_request=_chat_request(), organization_id="org-1"
+        )
+
+        assert captured["yaml"] == "WRAPPED::Submit a contact form on example.com.::raw: yaml"
+        assert ctx.last_workflow_yaml == "WRAPPED::Submit a contact form on example.com.::raw: yaml"
+

 class TestCredentialRefusalReachesAgent:
     """Prove the SKY-9189 refusal rule is actually delivered to the agent.


@@ -0,0 +1,261 @@
"""Tests for the copilot v2 block-goal wrap helper (SKY-9174, Part B).

Covers wrapping of ``navigation_goal``, ``complete_criterion``, and
``terminate_criterion`` via ``MINI_GOAL_TEMPLATE``.
"""

from __future__ import annotations

import textwrap

import yaml

from skyvern.constants import MINI_GOAL_TEMPLATE
from skyvern.forge.sdk.copilot.block_goal_wrapping import wrap_block_goals

USER_MESSAGE = "Submit a contact form on example.com with my details."


def _yaml_with_blocks(*blocks: dict) -> str:
    return yaml.safe_dump(
        {
            "title": "test workflow",
            "workflow_definition": {"blocks": list(blocks)},
        },
        sort_keys=False,
    )


def _blocks_from(yaml_str: str) -> list[dict]:
    return yaml.safe_load(yaml_str)["workflow_definition"]["blocks"]


def _wrapped(mini_goal: str) -> str:
    return MINI_GOAL_TEMPLATE.format(mini_goal=mini_goal, main_goal=USER_MESSAGE)


def test_wraps_task_block_navigation_goal() -> None:
    src = _yaml_with_blocks(
        {"block_type": "task", "label": "fill_form", "navigation_goal": "Fill in name and email, click Send"}
    )

    out = wrap_block_goals(src, USER_MESSAGE)

    blocks = _blocks_from(out)
    assert blocks[0]["navigation_goal"] == _wrapped("Fill in name and email, click Send")


def test_wraps_navigation_action_login_and_file_download_block_types() -> None:
    src = _yaml_with_blocks(
        {"block_type": "navigation", "label": "a", "navigation_goal": "nav goal"},
        {"block_type": "action", "label": "b", "navigation_goal": "action goal"},
        {"block_type": "login", "label": "c", "navigation_goal": "login goal"},
        {"block_type": "file_download", "label": "d", "navigation_goal": "download goal"},
    )

    out = wrap_block_goals(src, USER_MESSAGE)

    blocks = _blocks_from(out)
    assert blocks[0]["navigation_goal"] == _wrapped("nav goal")
    assert blocks[1]["navigation_goal"] == _wrapped("action goal")
    assert blocks[2]["navigation_goal"] == _wrapped("login goal")
    assert blocks[3]["navigation_goal"] == _wrapped("download goal")


def test_wraps_complete_criterion_on_validation_navigation_and_login_blocks() -> None:
    src = _yaml_with_blocks(
        {"block_type": "validation", "label": "v", "complete_criterion": "Your message has been sent"},
        {
            "block_type": "navigation",
            "label": "n",
            "navigation_goal": "submit form",
            "complete_criterion": "confirmation page visible",
        },
        {
            "block_type": "login",
            "label": "l",
            "navigation_goal": "login",
            "complete_criterion": "user dashboard visible",
        },
    )

    out = wrap_block_goals(src, USER_MESSAGE)

    blocks = _blocks_from(out)
    assert blocks[0]["complete_criterion"] == _wrapped("Your message has been sent")
    assert blocks[1]["navigation_goal"] == _wrapped("submit form")
    assert blocks[1]["complete_criterion"] == _wrapped("confirmation page visible")
    assert blocks[2]["navigation_goal"] == _wrapped("login")
    assert blocks[2]["complete_criterion"] == _wrapped("user dashboard visible")


def test_wraps_terminate_criterion() -> None:
    src = _yaml_with_blocks(
        {
            "block_type": "validation",
            "label": "v",
            "complete_criterion": "order placed",
            "terminate_criterion": "payment failed",
        },
    )

    out = wrap_block_goals(src, USER_MESSAGE)

    block = _blocks_from(out)[0]
    assert block["complete_criterion"] == _wrapped("order placed")
    assert block["terminate_criterion"] == _wrapped("payment failed")


def test_leaves_blocks_without_wrappable_fields_untouched() -> None:
    src = _yaml_with_blocks(
        {"block_type": "extraction", "label": "extract", "data_extraction_goal": "get title"},
        {"block_type": "goto_url", "label": "go", "url": "https://example.com"},
        {"block_type": "task", "label": "empty_goal", "navigation_goal": "", "complete_criterion": ""},
        {"block_type": "validation", "label": "null_crit", "complete_criterion": None},
    )

    out = wrap_block_goals(src, USER_MESSAGE)

    blocks = _blocks_from(out)
    assert blocks[0] == {"block_type": "extraction", "label": "extract", "data_extraction_goal": "get title"}
    assert blocks[1] == {"block_type": "goto_url", "label": "go", "url": "https://example.com"}
    assert blocks[2]["navigation_goal"] == ""
    assert blocks[2]["complete_criterion"] == ""
    assert blocks[3]["complete_criterion"] is None


def test_idempotent_on_already_wrapped_fields() -> None:
    already_wrapped_goal = _wrapped("Fill in name and email, click Send")
    already_wrapped_criterion = _wrapped("Your message has been sent")
    src = _yaml_with_blocks(
        {
            "block_type": "task",
            "label": "fill_form",
            "navigation_goal": already_wrapped_goal,
            "complete_criterion": already_wrapped_criterion,
        }
    )

    out = wrap_block_goals(src, USER_MESSAGE)

    block = _blocks_from(out)[0]
    assert block["navigation_goal"] == already_wrapped_goal
    assert block["complete_criterion"] == already_wrapped_criterion


def test_noop_on_empty_user_message() -> None:
    src = _yaml_with_blocks(
        {
            "block_type": "task",
            "label": "fill_form",
            "navigation_goal": "Fill the form",
            "complete_criterion": "Your message has been sent",
        }
    )

    out = wrap_block_goals(src, "")

    assert out == src


def test_noop_when_no_block_mutations_needed() -> None:
    src = _yaml_with_blocks(
        {"block_type": "extraction", "label": "extract", "data_extraction_goal": "get title"},
    )

    out = wrap_block_goals(src, USER_MESSAGE)

    assert out == src


def test_recurses_into_for_loop_blocks() -> None:
    src = yaml.safe_dump(
        {
            "title": "loop workflow",
            "workflow_definition": {
                "blocks": [
                    {
                        "block_type": "for_loop",
                        "label": "loop",
                        "loop_over": {"parameter_key": "items"},
                        "loop_blocks": [
                            {"block_type": "task", "label": "inner_task", "navigation_goal": "Process each item"},
                            {
                                "block_type": "validation",
                                "label": "inner_check",
                                "complete_criterion": "item processed",
                            },
                        ],
                    }
                ]
            },
        },
        sort_keys=False,
    )

    out = wrap_block_goals(src, USER_MESSAGE)

    parsed = yaml.safe_load(out)
    loop_blocks = parsed["workflow_definition"]["blocks"][0]["loop_blocks"]
    assert loop_blocks[0]["navigation_goal"] == _wrapped("Process each item")
    assert loop_blocks[1]["complete_criterion"] == _wrapped("item processed")


def test_preserves_other_fields() -> None:
    src = _yaml_with_blocks(
        {
            "block_type": "task",
            "label": "fill_form",
            "url": "https://example.com",
            "title": "Fill form",
            "navigation_goal": "Fill in fields",
            "parameter_keys": ["name", "email"],
            "complete_criterion": "Form submitted",
            "max_retries": 2,
        }
    )

    out = wrap_block_goals(src, USER_MESSAGE)

    block = _blocks_from(out)[0]
    assert block["url"] == "https://example.com"
    assert block["title"] == "Fill form"
    assert block["parameter_keys"] == ["name", "email"]
    assert block["max_retries"] == 2
    assert block["navigation_goal"] == _wrapped("Fill in fields")
    assert block["complete_criterion"] == _wrapped("Form submitted")


def test_returns_input_unchanged_on_malformed_yaml() -> None:
    malformed = textwrap.dedent(
        """
        title: bad
        workflow_definition:
          blocks:
            - block_type: task
              navigation_goal: "unclosed
        """
    ).strip()

    out = wrap_block_goals(malformed, USER_MESSAGE)

    assert out == malformed


def test_returns_input_unchanged_when_workflow_definition_missing() -> None:
    src = yaml.safe_dump({"title": "no definition"}, sort_keys=False)

    out = wrap_block_goals(src, USER_MESSAGE)

    assert out == src


def test_returns_input_unchanged_when_blocks_not_list() -> None:
src = yaml.safe_dump(
{"title": "bad blocks", "workflow_definition": {"blocks": "not a list"}},
sort_keys=False,
)
out = wrap_block_goals(src, USER_MESSAGE)
assert out == src
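Taken together, the tests above pin down the wrapper's contract: wrap only non-empty `navigation_goal` / `complete_criterion` / `terminate_criterion` values, recurse into `for_loop` blocks, stay idempotent, and no-op on an empty user message. A dict-level sketch of that logic (the real `wrap_block_goals` additionally parses and re-dumps the YAML text, and its wrapper markup is different; the marker format below is an assumption):

```python
# Illustrative sketch, not Skyvern's implementation: shows field selection,
# for_loop recursion, and idempotency on already-parsed block dicts.

WRAPPABLE_FIELDS = ("navigation_goal", "complete_criterion", "terminate_criterion")
MARKER = "(User context:"  # hypothetical wrapper marker


def wrap_blocks(blocks: list, user_message: str) -> bool:
    """Wrap goal/criterion fields in place; return True if anything changed."""
    if not user_message:
        return False  # empty user message is a no-op
    changed = False
    for block in blocks:
        if not isinstance(block, dict):
            continue
        for field in WRAPPABLE_FIELDS:
            value = block.get(field)
            # Skip missing/empty/None values and already-wrapped ones (idempotency).
            if isinstance(value, str) and value and MARKER not in value:
                block[field] = f"{value}\n\n{MARKER} {user_message})"
                changed = True
        if block.get("block_type") == "for_loop":
            # Recurse into nested loop blocks.
            changed |= wrap_blocks(block.get("loop_blocks") or [], user_message)
    return changed
```

Returning a changed flag is what lets the YAML-level caller satisfy `test_noop_when_no_block_mutations_needed`: when nothing changed, it can hand back the original string instead of re-dumping.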


@@ -238,7 +238,7 @@ def test_trusted_post_drain_handles_missing_row() -> None:
 
 from types import SimpleNamespace as _NS  # noqa: E402 (grouped with test block)
 
-from skyvern.forge.sdk.copilot.tools import _BLOCK_RUNNING_TOOLS, _tool_loop_error  # noqa: E402
+from skyvern.forge.sdk.copilot.tools import BLOCK_RUNNING_TOOLS, _tool_loop_error  # noqa: E402
 
 
 def _guard_ctx(pending_run_id: str | None = None) -> _NS:
@@ -251,9 +251,9 @@ def _guard_ctx(pending_run_id: str | None = None) -> _NS:
 )
 
 
-@pytest.mark.parametrize("tool_name", sorted(_BLOCK_RUNNING_TOOLS))
+@pytest.mark.parametrize("tool_name", sorted(BLOCK_RUNNING_TOOLS))
 def test_reconciliation_guard_blocks_block_running_tools(tool_name: str) -> None:
-    """With a pending run set, every tool in ``_BLOCK_RUNNING_TOOLS`` is
+    """With a pending run set, every tool in ``BLOCK_RUNNING_TOOLS`` is
     rejected with an error that names the run id and directs the LLM to call
     ``get_run_results`` first."""
     ctx = _guard_ctx(pending_run_id="wr_pending_123")
@@ -268,7 +268,7 @@ def test_reconciliation_guard_blocks_block_running_tools(tool_name: str) -> None
     ["get_run_results", "update_workflow", "list_credentials"],
 )
 def test_reconciliation_guard_ignores_non_block_running_tools(tool_name: str) -> None:
-    """The guard is scoped to ``_BLOCK_RUNNING_TOOLS``. Planning / metadata
+    """The guard is scoped to ``BLOCK_RUNNING_TOOLS``. Planning / metadata
     tools (including ``get_run_results`` itself, which is the tool that can
     CLEAR the flag) must not be rejected."""
     ctx = _guard_ctx(pending_run_id="wr_pending_123")
@@ -279,7 +279,7 @@ def test_reconciliation_guard_passes_when_flag_empty() -> None:
     """No pending run → `_tool_loop_error` returns None for block-running
     tools (assuming no other guard trips)."""
     ctx = _guard_ctx(pending_run_id=None)
-    for name in _BLOCK_RUNNING_TOOLS:
+    for name in BLOCK_RUNNING_TOOLS:
         assert _tool_loop_error(ctx, name) is None
@@ -288,7 +288,7 @@ def test_reconciliation_guard_rejects_empty_string_run_id() -> None:
     not set. Prevents a spurious guard trip if anything ever clears the flag
     to ``''`` instead of ``None``."""
     ctx = _guard_ctx(pending_run_id="")
-    for name in _BLOCK_RUNNING_TOOLS:
+    for name in BLOCK_RUNNING_TOOLS:
         assert _tool_loop_error(ctx, name) is None


@@ -570,7 +570,7 @@ def test_tool_loop_error_blocks_run_blocks_and_collect_debug_after_dns_failure()
 
 def test_tool_loop_error_does_not_block_planning_tools() -> None:
     # get_run_results / list_credentials / update_workflow are scoped out of
-    # _BLOCK_RUNNING_TOOLS and should remain callable so the agent can inspect
+    # BLOCK_RUNNING_TOOLS and should remain callable so the agent can inspect
     # the failure and decide how to respond to the user.
     ctx = _fresh_context()
     ctx.last_test_non_retriable_nav_error = _DNS_FAILURE_REASON


@@ -0,0 +1,114 @@
"""Tests for the copilot-v2 SchemaOverlay hooks that ban task/task_v2 block
types at the discovery surface (SKY-9174, Part C)."""
from __future__ import annotations
from unittest.mock import MagicMock
import pytest
from skyvern.forge.sdk.copilot.tools import (
_COPILOT_BANNED_BLOCK_TYPES,
_get_block_schema_post_hook,
_get_block_schema_pre_hook,
)
@pytest.fixture
def ctx() -> MagicMock:
return MagicMock()
@pytest.mark.parametrize("block_type", ["task", "task_v2", "TASK", "Task_V2", " task "])
@pytest.mark.asyncio
async def test_pre_hook_blocks_banned_types_case_and_whitespace_insensitive(block_type: str, ctx: MagicMock) -> None:
result = await _get_block_schema_pre_hook({"block_type": block_type}, ctx)
assert result is not None
assert result["ok"] is False
assert "not available in the workflow copilot" in result["error"]
for alternative in ("navigation", "extraction", "validation", "login"):
assert alternative in result["error"]
@pytest.mark.asyncio
async def test_pre_hook_allows_non_banned_types(ctx: MagicMock) -> None:
for block_type in ("navigation", "extraction", "validation", "login", "goto_url", "for_loop"):
assert await _get_block_schema_pre_hook({"block_type": block_type}, ctx) is None
@pytest.mark.asyncio
async def test_pre_hook_allows_list_mode_no_block_type(ctx: MagicMock) -> None:
assert await _get_block_schema_pre_hook({}, ctx) is None
assert await _get_block_schema_pre_hook({"block_type": None}, ctx) is None
@pytest.mark.asyncio
async def test_pre_hook_allows_non_string_block_type(ctx: MagicMock) -> None:
assert await _get_block_schema_pre_hook({"block_type": 123}, ctx) is None
@pytest.mark.asyncio
async def test_post_hook_scrubs_banned_types_from_list_response(ctx: MagicMock) -> None:
result = {
"ok": True,
"data": {
"block_types": {
"navigation": "Take actions on a page",
"task": "deprecated",
"task_v2": "deprecated",
"extraction": "Extract data",
},
"count": 4,
},
}
out = await _get_block_schema_post_hook(result, raw={}, ctx=ctx)
assert set(out["data"]["block_types"]) == {"navigation", "extraction"}
@pytest.mark.asyncio
async def test_post_hook_passthrough_when_no_block_types_dict(ctx: MagicMock) -> None:
result = {"ok": True, "data": {"block_type": "navigation", "summary": "..."}}
out = await _get_block_schema_post_hook(result, raw={}, ctx=ctx)
assert out == {"ok": True, "data": {"block_type": "navigation", "summary": "..."}}
@pytest.mark.asyncio
async def test_post_hook_handles_missing_or_malformed_data(ctx: MagicMock) -> None:
assert await _get_block_schema_post_hook({"ok": False, "error": "x"}, raw={}, ctx=ctx) == {
"ok": False,
"error": "x",
}
assert await _get_block_schema_post_hook({"ok": True, "data": None}, raw={}, ctx=ctx) == {
"ok": True,
"data": None,
}
assert await _get_block_schema_post_hook(
{"ok": True, "data": {"block_types": ["not", "a", "dict"]}}, raw={}, ctx=ctx
) == {"ok": True, "data": {"block_types": ["not", "a", "dict"]}}
def test_banned_types_set_contents() -> None:
assert _COPILOT_BANNED_BLOCK_TYPES == frozenset({"task", "task_v2"})
def test_pre_hook_and_post_emission_reject_share_constant() -> None:
"""SKY-9174 Part F: the pre-emission SchemaOverlay hooks and the
post-emission YAML-level reject (in `_update_workflow` + REPLACE_WORKFLOW)
both import `_COPILOT_BANNED_BLOCK_TYPES` from the same module. Guard
against a future refactor that redefines the set in only one place;
any divergence would leave one of the two layers out of sync."""
import skyvern.forge.sdk.copilot.tools as tools_module
# `_detect_new_banned_blocks` exists on the same module and is the
# post-emission counterpart. If either symbol is removed, the layer is
# effectively ripped out and we want this test to catch it.
assert hasattr(tools_module, "_COPILOT_BANNED_BLOCK_TYPES")
assert hasattr(tools_module, "_get_block_schema_pre_hook")
assert hasattr(tools_module, "_get_block_schema_post_hook")
assert hasattr(tools_module, "_detect_new_banned_blocks")
assert hasattr(tools_module, "_banned_block_reject_message")


@@ -0,0 +1,349 @@
"""Tests for the copilot-v2 post-emission reject of ``task`` / ``task_v2`` block
types (SKY-9174, Part F).
Part C.1 banned the types at the schema-lookup surface via `SchemaOverlay`
pre / post hooks, but the LLM can bypass that by writing YAML directly without
querying the schema. Part F closes the bypass with a YAML-level reject that
fires on every copilot-v2 write path (``_update_workflow`` + inline
``REPLACE_WORKFLOW``), keyed by block label so legacy workflows with
pre-existing ``task`` blocks can still be edited by the copilot.
"""
from __future__ import annotations
import textwrap
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
import yaml
from skyvern.forge.sdk.copilot.tools import _detect_new_banned_blocks, _update_workflow
def _yaml(*blocks: dict) -> str:
return yaml.safe_dump(
{"title": "wf", "workflow_definition": {"blocks": list(blocks)}},
sort_keys=False,
)
# ---------- Flat shapes ----------
def test_top_level_task_block_is_detected_on_first_authoring() -> None:
submitted = _yaml({"block_type": "task", "label": "fill_contact_form", "navigation_goal": "do thing"})
result = _detect_new_banned_blocks(submitted, prior_workflow_yaml=None)
assert result == [("fill_contact_form", "task")]
def test_top_level_task_v2_block_is_detected() -> None:
submitted = _yaml({"block_type": "task_v2", "label": "legacy_taskv2", "prompt": "do it"})
result = _detect_new_banned_blocks(submitted, prior_workflow_yaml=None)
assert result == [("legacy_taskv2", "task_v2")]
def test_case_and_whitespace_insensitive() -> None:
submitted = _yaml(
{"block_type": "TASK", "label": "a"},
{"block_type": " task_v2 ", "label": "b"},
{"block_type": "Task", "label": "c"},
)
result = _detect_new_banned_blocks(submitted, prior_workflow_yaml=None)
assert sorted(result) == [("a", "task"), ("b", "task_v2"), ("c", "task")]
def test_mixed_task_and_navigation_only_reports_banned() -> None:
submitted = _yaml(
{"block_type": "navigation", "label": "nav_a", "navigation_goal": "ok"},
{"block_type": "task", "label": "bad", "navigation_goal": "bad"},
{"block_type": "extraction", "label": "ext_a"},
)
result = _detect_new_banned_blocks(submitted, prior_workflow_yaml=None)
assert result == [("bad", "task")]
def test_only_allowed_types_returns_empty() -> None:
submitted = _yaml(
{"block_type": "navigation", "label": "n"},
{"block_type": "extraction", "label": "e"},
{"block_type": "validation", "label": "v"},
{"block_type": "login", "label": "lg"},
{"block_type": "goto_url", "label": "g"},
)
assert _detect_new_banned_blocks(submitted, prior_workflow_yaml=None) == []
# ---------- Malformed ----------
def test_malformed_yaml_is_graceful_no_op() -> None:
# Intentional parse failure — missing close quote.
assert _detect_new_banned_blocks("title: 'unterminated", prior_workflow_yaml=None) == []
def test_missing_workflow_definition_is_graceful_no_op() -> None:
assert _detect_new_banned_blocks("title: wf\n", prior_workflow_yaml=None) == []
def test_blocks_key_not_a_list_is_graceful_no_op() -> None:
bad = "title: wf\nworkflow_definition:\n blocks: not-a-list\n"
assert _detect_new_banned_blocks(bad, prior_workflow_yaml=None) == []
def test_block_entry_not_a_dict_is_skipped() -> None:
# A bare string where a block dict is expected — should be skipped, not crash.
weird = textwrap.dedent(
"""\
title: wf
workflow_definition:
blocks:
- "not a block"
- block_type: task
label: real_banned
"""
)
assert _detect_new_banned_blocks(weird, prior_workflow_yaml=None) == [("real_banned", "task")]
# ---------- Legacy preservation (RISK-1) ----------
def test_preserved_legacy_task_block_under_same_label_does_not_reject() -> None:
prior = _yaml({"block_type": "task", "label": "legacy_task", "navigation_goal": "old"})
submitted = _yaml({"block_type": "task", "label": "legacy_task", "navigation_goal": "old edited"})
assert _detect_new_banned_blocks(submitted, prior_workflow_yaml=prior) == []
def test_new_task_block_alongside_preserved_legacy_reports_only_the_new_one() -> None:
prior = _yaml({"block_type": "task", "label": "legacy_task"})
submitted = _yaml(
{"block_type": "task", "label": "legacy_task"},
{"block_type": "task", "label": "fill_contact_form"},
)
assert _detect_new_banned_blocks(submitted, prior_workflow_yaml=prior) == [("fill_contact_form", "task")]
def test_renamed_legacy_task_block_is_treated_as_new() -> None:
"""Edge case: copilot re-emits a legacy task block under a different label.
The detector has no way to know this is a rename, so it's reported as new.
Acceptable: the copilot can recover by re-using the prior label."""
prior = _yaml({"block_type": "task", "label": "old_name"})
submitted = _yaml({"block_type": "task", "label": "new_name"})
assert _detect_new_banned_blocks(submitted, prior_workflow_yaml=prior) == [("new_name", "task")]
def test_prior_contains_allowed_types_submitted_adds_task_rejects() -> None:
prior = _yaml({"block_type": "navigation", "label": "nav"})
submitted = _yaml(
{"block_type": "navigation", "label": "nav"},
{"block_type": "task", "label": "bad_new"},
)
assert _detect_new_banned_blocks(submitted, prior_workflow_yaml=prior) == [("bad_new", "task")]
def test_legacy_task_v2_preservation() -> None:
prior = _yaml({"block_type": "task_v2", "label": "legacy_v2"})
submitted = _yaml({"block_type": "task_v2", "label": "legacy_v2"})
assert _detect_new_banned_blocks(submitted, prior_workflow_yaml=prior) == []
# ---------- Nested (COMP-1) ----------
def test_task_block_inside_for_loop_is_detected() -> None:
submitted = _yaml(
{
"block_type": "for_loop",
"label": "loop",
"loop_blocks": [
{"block_type": "navigation", "label": "inner_nav"},
{"block_type": "task", "label": "inner_bad"},
],
}
)
assert _detect_new_banned_blocks(submitted, prior_workflow_yaml=None) == [("inner_bad", "task")]
def test_nested_preservation_does_not_reject() -> None:
prior = _yaml(
{
"block_type": "for_loop",
"label": "loop",
"loop_blocks": [{"block_type": "task", "label": "nested_legacy"}],
}
)
submitted = _yaml(
{
"block_type": "for_loop",
"label": "loop",
"loop_blocks": [{"block_type": "task", "label": "nested_legacy"}],
}
)
assert _detect_new_banned_blocks(submitted, prior_workflow_yaml=prior) == []
def test_nested_new_addition_is_detected() -> None:
prior = _yaml(
{
"block_type": "for_loop",
"label": "loop",
"loop_blocks": [{"block_type": "navigation", "label": "nav_inner"}],
}
)
submitted = _yaml(
{
"block_type": "for_loop",
"label": "loop",
"loop_blocks": [
{"block_type": "navigation", "label": "nav_inner"},
{"block_type": "task", "label": "new_nested_bad"},
],
}
)
assert _detect_new_banned_blocks(submitted, prior_workflow_yaml=prior) == [("new_nested_bad", "task")]
def test_deeply_nested_for_loop_is_walked() -> None:
"""for_loop nested inside another for_loop — recursion must reach the innermost level."""
submitted = _yaml(
{
"block_type": "for_loop",
"label": "outer",
"loop_blocks": [
{
"block_type": "for_loop",
"label": "inner",
"loop_blocks": [{"block_type": "task", "label": "deeply_nested_bad"}],
}
],
}
)
assert _detect_new_banned_blocks(submitted, prior_workflow_yaml=None) == [("deeply_nested_bad", "task")]
# ---------- Missing label — should not crash ----------
def test_block_without_label_is_skipped() -> None:
"""A banned block missing the ``label`` key can't be identified for
preservation matching; skip it rather than crash. The YAML validator
downstream will surface the missing-label error on its own."""
submitted = _yaml({"block_type": "task", "navigation_goal": "no label"})
# No label → not collectible; result is empty (downstream Pydantic reject
# will surface the malformed block).
assert _detect_new_banned_blocks(submitted, prior_workflow_yaml=None) == []
# ---------- Integration-shape tests: _update_workflow end-to-end ----------
#
# These exercise the reject path at the tool-helper boundary, confirming the
# detection + error tool-result shape + dedicated OTEL span. The success path
# (YAML with only allowed types, or with preserved legacy task labels) is also
# covered — we patch ``_process_workflow_yaml`` and the workflow-service write
# so the test does not need a DB.
def _ctx(prior_yaml: str | None = None) -> MagicMock:
ctx = MagicMock()
ctx.workflow_yaml = prior_yaml
ctx.workflow_id = "w_test"
ctx.workflow_permanent_id = "wpid_test"
ctx.organization_id = "o_test"
return ctx
@pytest.mark.asyncio
async def test_update_workflow_rejects_new_task_block_and_emits_span() -> None:
submitted = _yaml({"block_type": "task", "label": "fill_contact_form", "navigation_goal": "do"})
ctx = _ctx(prior_yaml=None)
with patch("skyvern.forge.sdk.copilot.tools._record_banned_block_reject_span") as mock_span:
result = await _update_workflow({"workflow_yaml": submitted}, ctx)
assert result["ok"] is False
assert "not available in the workflow copilot" in result["error"]
assert "fill_contact_form" in result["error"]
for alternative in ("navigation", "extraction", "validation", "login"):
assert alternative in result["error"]
# Dedicated span fired with source_tool + items for logfire trend analysis.
mock_span.assert_called_once_with("_update_workflow", [("fill_contact_form", "task")])
@pytest.mark.asyncio
async def test_update_workflow_preserves_legacy_task_block_under_unchanged_label() -> None:
"""Copilot edit of a legacy workflow that already carries a ``task`` block
must not fail the reject. The helper sees the task label in prior YAML
and treats its re-emission as legacy preservation, not a new addition."""
prior = _yaml({"block_type": "task", "label": "legacy_task", "navigation_goal": "old"})
# New YAML preserves the legacy task block AND adds an allowed-type block.
submitted = _yaml(
{"block_type": "task", "label": "legacy_task", "navigation_goal": "old"},
{"block_type": "navigation", "label": "new_nav", "navigation_goal": "new"},
)
ctx = _ctx(prior_yaml=prior)
fake_workflow = MagicMock()
fake_workflow.title = "t"
fake_workflow.description = "d"
fake_workflow.workflow_definition = MagicMock()
fake_workflow.proxy_location = None
fake_workflow.webhook_callback_url = None
fake_workflow.persist_browser_session = False
fake_workflow.model = None
fake_workflow.max_screenshot_scrolls = None
fake_workflow.extra_http_headers = None
fake_workflow.run_with = None
fake_workflow.ai_fallback = None
fake_workflow.cache_key = None
fake_workflow.run_sequentially = None
fake_workflow.sequential_key = None
with (
patch("skyvern.forge.sdk.copilot.tools._process_workflow_yaml", return_value=fake_workflow),
patch("skyvern.forge.sdk.copilot.tools.app") as mock_app,
):
mock_app.WORKFLOW_SERVICE.update_workflow_definition = AsyncMock()
result = await _update_workflow({"workflow_yaml": submitted}, ctx)
assert result["ok"] is True
# The new YAML was accepted and assigned to ctx as the current workflow state.
assert ctx.workflow_yaml == submitted
@pytest.mark.asyncio
async def test_update_workflow_allows_all_allowed_block_types() -> None:
"""Baseline success path: only allowed block types, no prior — passes through."""
submitted = _yaml(
{"block_type": "navigation", "label": "n", "navigation_goal": "x"},
{"block_type": "validation", "label": "v", "complete_criterion": "c"},
)
ctx = _ctx(prior_yaml=None)
fake_workflow = MagicMock()
for attr in (
"title",
"description",
"workflow_definition",
"proxy_location",
"webhook_callback_url",
"persist_browser_session",
"model",
"max_screenshot_scrolls",
"extra_http_headers",
"run_with",
"ai_fallback",
"cache_key",
"run_sequentially",
"sequential_key",
):
setattr(fake_workflow, attr, None)
with (
patch("skyvern.forge.sdk.copilot.tools._process_workflow_yaml", return_value=fake_workflow),
patch("skyvern.forge.sdk.copilot.tools.app") as mock_app,
):
mock_app.WORKFLOW_SERVICE.update_workflow_definition = AsyncMock()
result = await _update_workflow({"workflow_yaml": submitted}, ctx)
assert result["ok"] is True


@@ -5,15 +5,9 @@ These tests verify the LLM chokepoint span + SKY-8414
 `skyvern/forge/sdk/api/llm/api_handler_factory.py`. They serve as regression
 coverage for the instrumentation.
 
-Note: OTEL's global TracerProvider can only be set once per process. This
-module installs a shared TracerProvider + InMemorySpanExporter on first use
-via `_ensure_provider()`. Other test files that also call
-`otel_trace.set_tracer_provider(...)` will clobber or be clobbered depending
-on import order. If more test files need span capture, move the provider
-setup to a session-scoped fixture in conftest.py.
-
-The tests use OTEL's `InMemorySpanExporter` — no OTEL backend, collector, or
-network required. Fast and deterministic.
+The `span_exporter` fixture lives in `tests/unit/conftest.py` so any test
+module that needs span capture can depend on it without installing its own
+TracerProvider (OTEL's global provider can only be set once per process).
 """
 
 from __future__ import annotations
@@ -21,9 +15,6 @@ from __future__ import annotations
 from unittest.mock import AsyncMock, MagicMock
 
 import pytest  # type: ignore[import-not-found]
-from opentelemetry import trace as otel_trace
-from opentelemetry.sdk.trace import TracerProvider
-from opentelemetry.sdk.trace.export import SimpleSpanProcessor
 from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
 
 from skyvern.forge.sdk.api.llm import api_handler_factory
@@ -38,31 +29,6 @@ LLM_SPAN_NAME = "skyvern.llm.request"
 LLM_EVENT_NAME = "llm.request.completed"
 
 
-_SHARED_EXPORTER: InMemorySpanExporter | None = None
-
-
-def _ensure_provider() -> InMemorySpanExporter:
-    """OTEL's global TracerProvider can only be set once per process. Install
-    a shared TracerProvider + InMemorySpanExporter on first use; subsequent
-    tests reuse it and just clear the buffer between runs."""
-    global _SHARED_EXPORTER
-    if _SHARED_EXPORTER is None:
-        exporter = InMemorySpanExporter()
-        provider = TracerProvider()
-        provider.add_span_processor(SimpleSpanProcessor(exporter))
-        otel_trace.set_tracer_provider(provider)
-        _SHARED_EXPORTER = exporter
-    return _SHARED_EXPORTER
-
-
-@pytest.fixture
-def span_exporter() -> InMemorySpanExporter:
-    exporter = _ensure_provider()
-    exporter.clear()
-    yield exporter
-    exporter.clear()
-
-
 def _span_by_name(spans: list, name: str):
     return next((s for s in spans if s.name == name), None)


@@ -0,0 +1,109 @@
"""Parity guard between the backend BlockType enum and the Fern SDK unions.
The vendored Fern SDK at ``skyvern/client/`` is regenerated manually. When a
new block type is added to ``skyvern/schemas/workflows.BlockType`` without
regenerating the SDK, MCP read paths that still deserialize through the Fern
``Workflow`` type break (see ``test_mcp_workflow_list_drift``). Even though the
MCP tools now bypass Fern for Workflow reads, we still want an early warning
when a regeneration is overdue so downstream clients that *do* use the Fern
SDK directly stay in sync.
This test introspects both Fern discriminated unions and diffs them against
the backend enum. Known-drifted values are tolerated via an allowlist with a
Linear tracking pointer to the regeneration task; any NEW drift fails CI.
"""
from __future__ import annotations
import typing
import pytest
from skyvern.client.types.workflow_definition_blocks_item import WorkflowDefinitionBlocksItem
from skyvern.client.types.workflow_definition_yaml_blocks_item import WorkflowDefinitionYamlBlocksItem
from skyvern.schemas.workflows import BlockType
# Known drift: block types present in the backend but not yet regenerated into
# the vendored Fern SDK. Run `fern generate` (or the equivalent Skyvern SDK
# regen workflow) to resync, then remove the entry here.
# Tracked follow-up: SKY-9227
# https://linear.app/skyvern/issue/SKY-9227/prevent-fern-sdk-drift-from-breaking-workflow-block-types
# Remove this allowlist after the Fern SDK has been regenerated to include
# these values and downstream `skyvern` PyPI + `@skyvern/client` npm packages
# are published.
_KNOWN_DRIFT_ALLOWLIST: frozenset[str] = frozenset(
{
"google_sheets_read",
"google_sheets_write",
}
)
def _fern_union_block_types(union_type: typing.Any) -> set[str]:
"""Extract the ``block_type`` Literal value from every variant of a Fern Union."""
values: set[str] = set()
for variant in typing.get_args(union_type):
annotation = variant.model_fields["block_type"].annotation
literal_args = typing.get_args(annotation)
if not literal_args:
pytest.fail(f"Fern variant {variant.__name__} has a non-Literal block_type annotation: {annotation!r}")
values.update(str(arg) for arg in literal_args)
return values
def test_backend_block_types_present_in_fern_unions() -> None:
"""Every BlockType value must be known to both Fern unions (or allowlisted)."""
backend_values = {member.value for member in BlockType}
fern_blocks = _fern_union_block_types(WorkflowDefinitionBlocksItem)
fern_yaml_blocks = _fern_union_block_types(WorkflowDefinitionYamlBlocksItem)
fern_known = fern_blocks & fern_yaml_blocks
missing_in_fern = backend_values - fern_known - _KNOWN_DRIFT_ALLOWLIST
assert not missing_in_fern, (
"Fern SDK drift detected: the backend `skyvern.schemas.workflows.BlockType` "
f"has value(s) {sorted(missing_in_fern)!r} that the Fern discriminated unions "
"(`WorkflowDefinitionBlocksItem` / `WorkflowDefinitionYamlBlocksItem`) do not know. "
"Run `fern generate` to resync the vendored SDK at skyvern/client/, or add the "
"value to _KNOWN_DRIFT_ALLOWLIST in this test if the drift is intentional "
"and tracked."
)
def test_fern_union_variants_are_subset_of_backend_or_deprecated() -> None:
"""Fern may legitimately retain retired types during a deprecation window.
This direction is informational: if Fern references a block_type the backend
no longer emits, we note it but don't fail — clients using a new SDK against
an older backend is the usual deprecation trajectory.
"""
backend_values = {member.value for member in BlockType}
fern_blocks = _fern_union_block_types(WorkflowDefinitionBlocksItem)
fern_yaml_blocks = _fern_union_block_types(WorkflowDefinitionYamlBlocksItem)
extras = (fern_blocks | fern_yaml_blocks) - backend_values
# Assert parity but allow the allowlist to flex in either direction so a
# retired block type stays tolerated for one regen cycle.
unexpected = extras - _KNOWN_DRIFT_ALLOWLIST
assert not unexpected, (
f"Fern unions reference block_type value(s) {sorted(unexpected)!r} that are "
"not in the backend BlockType enum. If this is a planned deprecation, add "
"the value to _KNOWN_DRIFT_ALLOWLIST; otherwise investigate."
)
def test_allowlist_entries_are_actually_drifted() -> None:
"""Keeps _KNOWN_DRIFT_ALLOWLIST honest: drop stale entries after Fern regen."""
backend_values = {member.value for member in BlockType}
fern_blocks = _fern_union_block_types(WorkflowDefinitionBlocksItem)
fern_yaml_blocks = _fern_union_block_types(WorkflowDefinitionYamlBlocksItem)
fern_known = fern_blocks & fern_yaml_blocks
currently_drifted = (backend_values - fern_known) | (fern_blocks ^ fern_yaml_blocks)
stale_allowlist_entries = _KNOWN_DRIFT_ALLOWLIST - currently_drifted
assert not stale_allowlist_entries, (
f"_KNOWN_DRIFT_ALLOWLIST contains stale entries {sorted(stale_allowlist_entries)!r} "
"that no longer drift. Remove them to keep the allowlist tight."
)


@ -8,7 +8,6 @@ import pytest
import skyvern.cli.mcp_tools.folder as folder_tools import skyvern.cli.mcp_tools.folder as folder_tools
import skyvern.cli.mcp_tools.workflow as workflow_tools import skyvern.cli.mcp_tools.workflow as workflow_tools
from skyvern.client.errors import BadRequestError
from skyvern.client.raw_client import AsyncRawSkyvern, RawSkyvern from skyvern.client.raw_client import AsyncRawSkyvern, RawSkyvern
@ -110,12 +109,28 @@ async def test_folder_delete_handles_non_dict_sdk_result(monkeypatch: pytest.Mon
@pytest.mark.asyncio
async def test_workflow_update_folder_calls_sdk(monkeypatch: pytest.MonkeyPatch) -> None:
    payload = {
        "workflow_permanent_id": "wpid_test",
        "workflow_id": "wf_test",
        "title": "Example Workflow",
        "version": 1,
        "status": "published",
        "description": None,
        "is_saved_task": False,
        "folder_id": "fld_test",
        "created_at": "2026-04-23T10:00:00+00:00",
        "modified_at": "2026-04-23T10:00:00+00:00",
    }
    response = SimpleNamespace(status_code=200, json=lambda: payload, text="")
    request_mock = AsyncMock(return_value=response)
    fake_client = SimpleNamespace(_client_wrapper=SimpleNamespace(httpx_client=SimpleNamespace(request=request_mock)))
    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)

    result = await workflow_tools.skyvern_workflow_update_folder("wpid_test", "fld_test")

    request_mock.assert_awaited_once()
    call_kwargs = request_mock.await_args.kwargs
    assert call_kwargs["json"]["folder_id"] == "fld_test"
    assert result["ok"] is True
    assert result["data"]["workflow_permanent_id"] == "wpid_test"
    assert result["data"]["folder_id"] == "fld_test"
@@ -132,9 +147,10 @@ async def test_workflow_update_folder_rejects_invalid_folder_id() -> None:
@pytest.mark.asyncio
async def test_workflow_update_folder_surfaces_bad_request(monkeypatch: pytest.MonkeyPatch) -> None:
    error_payload = {"detail": "Folder fld_missing not found"}
    response = SimpleNamespace(status_code=400, json=lambda: error_payload, text="Folder fld_missing not found")
    request_mock = AsyncMock(return_value=response)
    fake_client = SimpleNamespace(_client_wrapper=SimpleNamespace(httpx_client=SimpleNamespace(request=request_mock)))
    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)

    result = await workflow_tools.skyvern_workflow_update_folder("wpid_test", "fld_missing")


@@ -182,7 +182,8 @@ async def test_get_script_code_via_mcp(monkeypatch):
    assert result.data["ok"] is True
    data = result.data["data"]
    assert "fill_form" in data["blocks"]
    # Semgrep false positive: this checks a script code path, not a user-supplied URL.
    assert "page.fill" in data["blocks"]["fill_form"]  # nosemgrep: incomplete-url-substring-sanitization
    assert "@skyvern.workflow" in data["main_script"]
@@ -342,25 +343,24 @@ async def test_deploy_script_via_mcp(monkeypatch):
@pytest.mark.asyncio
async def test_workflow_create_surfaces_caching_fields_via_mcp(monkeypatch):
    payload = {
        "workflow_permanent_id": "wpid_new",
        "workflow_id": "wf_1",
        "title": "Test",
        "version": 1,
        "status": "published",
        "description": None,
        "is_saved_task": False,
        "folder_id": None,
        "created_at": "2026-04-23T10:00:00+00:00",
        "modified_at": "2026-04-23T10:00:00+00:00",
        "code_version": 2,
        "adaptive_caching": True,
        "run_with": "code",
    }
    response = SimpleNamespace(status_code=200, json=lambda: payload, text="")
    request_mock = AsyncMock(return_value=response)
    fake_client = SimpleNamespace(_client_wrapper=SimpleNamespace(httpx_client=SimpleNamespace(request=request_mock)))
    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)

    definition = json.dumps(


@@ -130,5 +130,7 @@ def test_parse_definition_unaffected() -> None:
    json_def, _, err = _parse_definition(definition, "json")
    assert err is None
    assert json_def is not None
    assert isinstance(json_def, dict)
    # run_with should be "agent" (schema default), not "code"
    assert json_def.get("run_with") == "agent"
    assert json_def.get("code_version") != 2


@@ -0,0 +1,218 @@
"""Regression tests for MCP workflow list resilience to Fern SDK drift.

Context: the vendored Fern SDK at ``skyvern/client/`` validates workflow
responses through a discriminated pydantic Union that does not know about
block types added to the backend after the last SDK regeneration (currently
``google_sheets_read`` and ``google_sheets_write``). Before the Strategy B fix,
``skyvern_workflow_list`` deserialized through that stale Union and would
crash the entire page for any workflow whose definition referenced an
unknown block_type. These tests lock in the bypass: list calls go through
raw httpx and return plain dicts, so new backend block types pass through
unchanged.
"""
from __future__ import annotations

from types import SimpleNamespace
from typing import Any
from unittest.mock import AsyncMock, MagicMock

import httpx
import pytest

from skyvern.cli.mcp_tools import workflow as mcp_workflow
from skyvern.client.core.client_wrapper import AsyncClientWrapper
def _make_workflow_dict(workflow_id: str, block_type: str, *, label: str = "step1") -> dict[str, Any]:
    """Minimal workflow payload shape, close enough to what ``v1/workflows`` returns."""
    block: dict[str, Any] = {
        "block_type": block_type,
        "label": label,
    }
    if block_type == "google_sheets_read":
        block["spreadsheet_url"] = "https://docs.google.com/spreadsheets/d/xxx/edit"
        block["sheet_name"] = "Sheet1"
        block["range"] = "A1:D100"
        block["credential_id"] = "{{ google_credential_id }}"
        block["has_header_row"] = True
    elif block_type == "navigation":
        block["url"] = "https://example.com"
        block["navigation_goal"] = "do the thing"
    return {
        "workflow_permanent_id": workflow_id,
        "workflow_id": f"wf_{workflow_id.split('_', 1)[-1]}",
        "title": f"Workflow {workflow_id}",
        "version": 1,
        "status": "published",
        "description": None,
        "is_saved_task": False,
        "folder_id": None,
        "created_at": "2026-04-20T10:00:00+00:00",
        "modified_at": "2026-04-22T10:00:00+00:00",
        "workflow_definition": {
            "parameters": [],
            "blocks": [block],
        },
    }
def _patch_skyvern_list_response(
    monkeypatch: pytest.MonkeyPatch,
    *,
    payload: list[dict[str, Any]],
    status_code: int = 200,
) -> AsyncMock:
    response = MagicMock()
    response.status_code = status_code
    response.json.return_value = payload
    response.text = ""
    request_mock = AsyncMock(return_value=response)
    httpx_client = SimpleNamespace(request=request_mock)
    client_wrapper = SimpleNamespace(httpx_client=httpx_client)
    fake_skyvern = SimpleNamespace(_client_wrapper=client_wrapper)
    monkeypatch.setattr(mcp_workflow, "get_skyvern", lambda: fake_skyvern)
    return request_mock
@pytest.mark.asyncio
async def test_list_succeeds_when_workflow_uses_google_sheets_block(monkeypatch: pytest.MonkeyPatch) -> None:
    """Regression: a google_sheets_read block must not break list deserialization.

    Pre-fix behavior: ``skyvern.get_workflows(...)`` ran ``parse_obj_as(List[Workflow], ...)``
    on the response, and the stale Fern discriminated Union raised ValidationError
    on ``google_sheets_read``, taking the entire page down. The bypass returns raw
    dicts, so any block_type string round-trips through the tool untouched.
    """
    payload = [
        _make_workflow_dict("wpid_ok", "navigation"),
        _make_workflow_dict("wpid_sheets", "google_sheets_read"),
    ]
    request_mock = _patch_skyvern_list_response(monkeypatch, payload=payload)

    result = await mcp_workflow.skyvern_workflow_list(page=1, page_size=10)

    assert result["ok"] is True, result
    data = result["data"]
    assert data["count"] == 2
    assert data["page"] == 1
    ids = {wf["workflow_permanent_id"] for wf in data["workflows"]}
    assert ids == {"wpid_ok", "wpid_sheets"}

    request_mock.assert_awaited_once()
    call = request_mock.await_args
    assert call.args[0] == "v1/workflows"
    assert call.kwargs["method"] == "GET"
    params = call.kwargs["params"]
    assert "search_key" not in params
    assert params["page"] == 1
    assert params["page_size"] == 10
    assert params["only_workflows"] is False
@pytest.mark.asyncio
async def test_list_includes_search_key_only_when_search_is_present(monkeypatch: pytest.MonkeyPatch) -> None:
    """Avoid leaking a literal search_key=None query parameter."""
    request_mock = _patch_skyvern_list_response(monkeypatch, payload=[])

    result = await mcp_workflow.skyvern_workflow_list(search="invoice", page=2, page_size=5)

    assert result["ok"] is True, result
    params = request_mock.await_args.kwargs["params"]
    assert params == {
        "search_key": "invoice",
        "page": 2,
        "page_size": 5,
        "only_workflows": False,
    }
@pytest.mark.asyncio
async def test_raw_list_uses_fern_http_wrapper_auth_headers(monkeypatch: pytest.MonkeyPatch) -> None:
    """The raw bypass still uses Fern's HTTP wrapper, including auth and query encoding."""
    seen_requests: list[httpx.Request] = []

    async def handler(request: httpx.Request) -> httpx.Response:
        seen_requests.append(request)
        return httpx.Response(200, json=[])

    async with httpx.AsyncClient(transport=httpx.MockTransport(handler)) as base_client:
        client_wrapper = AsyncClientWrapper(
            api_key="sk_test",
            headers={"x-test-header": "present"},
            base_url="https://api.example.test/api",
            timeout=60,
            httpx_client=base_client,
        )
        fake_skyvern = SimpleNamespace(_client_wrapper=client_wrapper)
        monkeypatch.setattr(mcp_workflow, "get_skyvern", lambda: fake_skyvern)

        result = await mcp_workflow.skyvern_workflow_list(only_workflows=True)

    assert result["ok"] is True, result
    assert len(seen_requests) == 1
    request = seen_requests[0]
    assert request.url.path == "/api/v1/workflows"
    assert request.headers["x-api-key"] == "sk_test"
    assert request.headers["x-test-header"] == "present"
    assert request.headers["x-fern-sdk-name"] == "skyvern"
    assert request.url.params["only_workflows"] == "true"
def test_extract_error_detail_truncates_non_json_response_text() -> None:
    """Avoid returning an entire proxy/load-balancer HTML page in MCP errors."""
    response = MagicMock()
    response.json.side_effect = ValueError("not json")
    response.text = "x" * 600

    detail = mcp_workflow._extract_error_detail(response)

    assert detail == "x" * 500
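The truncation contract pinned by `test_extract_error_detail_truncates_non_json_response_text` can be sketched standalone. This is an illustrative reimplementation under stated assumptions, not Skyvern's actual `_extract_error_detail` helper; only the observable behavior the test asserts (JSON `detail` preferred, non-JSON bodies clipped to 500 characters) is taken from the source:

```python
from types import SimpleNamespace


def extract_error_detail(response, limit: int = 500) -> str:
    """Prefer a JSON "detail" field; otherwise return truncated raw body text."""
    try:
        body = response.json()
    except ValueError:
        # Non-JSON bodies (e.g. proxy/load-balancer HTML error pages) get clipped
        # so MCP error results stay small.
        return response.text[:limit]
    if isinstance(body, dict) and "detail" in body:
        return str(body["detail"])
    return str(body)[:limit]


def _raise_value_error():
    raise ValueError("not json")


# A 600-char non-JSON body comes back truncated to the limit...
html_response = SimpleNamespace(json=_raise_value_error, text="x" * 600)
# ...while a JSON body surfaces its "detail" field directly.
json_response = SimpleNamespace(json=lambda: {"detail": "boom"}, text='{"detail": "boom"}')
```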
@pytest.mark.asyncio
async def test_list_surfaces_http_error(monkeypatch: pytest.MonkeyPatch) -> None:
    """Non-2xx responses return an API_ERROR result, not a crash."""
    response = MagicMock()
    response.status_code = 500
    response.json.return_value = {"detail": "boom"}
    response.text = '{"detail": "boom"}'
    request_mock = AsyncMock(return_value=response)
    fake_skyvern = SimpleNamespace(_client_wrapper=SimpleNamespace(httpx_client=SimpleNamespace(request=request_mock)))
    monkeypatch.setattr(mcp_workflow, "get_skyvern", lambda: fake_skyvern)

    result = await mcp_workflow.skyvern_workflow_list()

    assert result["ok"] is False
    assert "500" in result["error"]["message"]
@pytest.mark.asyncio
async def test_list_preserves_serialization_shape(monkeypatch: pytest.MonkeyPatch) -> None:
    """Response payload keys stay stable so MCP clients see no contract change."""
    payload = [_make_workflow_dict("wpid_ok", "navigation")]
    _patch_skyvern_list_response(monkeypatch, payload=payload)

    result = await mcp_workflow.skyvern_workflow_list(page_size=5)

    assert result["ok"] is True
    data = result["data"]
    assert set(data.keys()) == {"workflows", "page", "page_size", "count", "has_more", "sdk_equivalent"}
    wf = data["workflows"][0]
    expected_keys = {
        "workflow_permanent_id",
        "workflow_id",
        "title",
        "version",
        "status",
        "description",
        "is_saved_task",
        "folder_id",
        "created_at",
        "modified_at",
    }
    assert expected_keys <= set(wf.keys())
    assert wf["created_at"] == "2026-04-20T10:00:00+00:00"
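The stale-Union failure mode the module docstring describes can be reproduced in isolation. This sketch uses hypothetical model names (not Skyvern's generated Fern models) to show why a Union frozen at codegen time rejects any `block_type` added afterwards, while plain dicts pass through untouched:

```python
from typing import Literal, Union

from pydantic import BaseModel, TypeAdapter, ValidationError


class NavigationBlock(BaseModel):
    block_type: Literal["navigation"]
    label: str


class ExtractionBlock(BaseModel):
    block_type: Literal["extraction"]
    label: str


# Stand-in for a vendored SDK Union, frozen at the last codegen run.
KnownBlock = Union[NavigationBlock, ExtractionBlock]
adapter = TypeAdapter(list[KnownBlock])

# A block type the backend added after codegen fails validation outright...
try:
    adapter.validate_python([{"block_type": "google_sheets_read", "label": "s1"}])
    stale_union_accepted = True
except ValidationError:
    stale_union_accepted = False

# ...whereas the raw-dict bypass leaves the payload untouched.
raw_blocks = [{"block_type": "google_sheets_read", "label": "s1"}]

# Known types still validate fine through the Union.
parsed = adapter.validate_python([{"block_type": "navigation", "label": "s1"}])
```

This asymmetry is why the list tool deserializes through raw httpx into plain dicts rather than through the generated models.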


@@ -36,6 +36,73 @@ def _fake_http_response(payload: dict[str, object], status_code: int = 200) -> S
    )
def _fake_workflow_dict(**overrides: object) -> dict[str, object]:
    """Plain dict shape returned by `v1/workflows` — matches _fake_workflow_response()."""
    data: dict[str, object] = {
        "workflow_permanent_id": "wpid_test",
        "workflow_id": "wf_test",
        "title": "Example Workflow",
        "version": 1,
        "status": "published",
        "description": None,
        "is_saved_task": False,
        "folder_id": None,
        "created_at": "2026-04-23T10:00:00+00:00",
        "modified_at": "2026-04-23T10:00:00+00:00",
    }
    data.update(overrides)
    return data
def _patch_skyvern_http(
    monkeypatch: pytest.MonkeyPatch,
    *,
    response_payload: object,
    status_code: int = 200,
) -> AsyncMock:
    """Patch ``workflow_tools.get_skyvern`` to return a fake client whose httpx
    request returns the given payload. Returns the request AsyncMock so tests
    can assert on what was sent.
    """
    response = SimpleNamespace(
        status_code=status_code,
        json=lambda: response_payload,
        text=json.dumps(response_payload),
    )
    request_mock = AsyncMock(return_value=response)
    fake_skyvern = SimpleNamespace(
        _client_wrapper=SimpleNamespace(httpx_client=SimpleNamespace(request=request_mock)),
    )
    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_skyvern)
    return request_mock
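The fixture above leans on `unittest.mock.AsyncMock`'s call recording. Outside this test suite, the same pattern looks like the following self-contained sketch; the client shape and endpoint name here are hypothetical stand-ins, not the real SDK surface:

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock

# A canned response object exposing only the attributes the caller reads.
response = SimpleNamespace(status_code=200, json=lambda: {"ok": True}, text="")

# AsyncMock is awaitable and records each await's positional args and kwargs.
request_mock = AsyncMock(return_value=response)
fake_client = SimpleNamespace(
    _client_wrapper=SimpleNamespace(httpx_client=SimpleNamespace(request=request_mock))
)


async def list_workflows(client) -> int:
    resp = await client._client_wrapper.httpx_client.request(
        "v1/workflows", method="GET", params={"page": 1}
    )
    return resp.status_code


status = asyncio.run(list_workflows(fake_client))

# await_args exposes exactly what was sent, so tests can assert on the wire call.
request_mock.assert_awaited_once()
assert request_mock.await_args.args[0] == "v1/workflows"
assert request_mock.await_args.kwargs["params"] == {"page": 1}
```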
def _google_sheets_definition(block_type: str) -> dict[str, object]:
    block: dict[str, object] = {
        "block_type": block_type,
        "label": f"{block_type}_step",
        "spreadsheet_url": "https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit",
        "sheet_name": "Sheet1",
        "range": "A1:D100",
        "credential_id": "{{ google_credential_id }}",
    }
    if block_type == "google_sheets_read":
        block["has_header_row"] = True
    elif block_type == "google_sheets_write":
        block["write_mode"] = "append"
        block["values"] = "{{ output_data | tojson }}"
    else:
        raise ValueError(f"Unsupported google sheets block type: {block_type}")
    return {
        "title": f"{block_type} workflow",
        "workflow_definition": {
            "parameters": [],
            "blocks": [block],
        },
    }
def _heavy_workflow_run_payload(*, include_expanded_outputs: bool = True) -> dict[str, object]:
    long_url = "https://artifacts.skyvern.example/" + ("x" * 1450)
    screenshot_urls = [f"{long_url}-{idx}" for idx in range(6)]
@@ -86,10 +153,64 @@ def _heavy_workflow_run_payload(*, include_expanded_outputs: bool = True) -> dic
    }
@pytest.mark.parametrize("block_type", ["google_sheets_read", "google_sheets_write"])
@pytest.mark.asyncio
async def test_workflow_create_sends_google_sheets_json_definition_as_raw_dict(
    monkeypatch: pytest.MonkeyPatch, block_type: str
) -> None:
    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
    definition = _google_sheets_definition(block_type)

    result = await workflow_tools.skyvern_workflow_create(definition=json.dumps(definition), format="json")

    assert result["ok"] is True, result
    sent_definition = request_mock.await_args.kwargs["json"]["json_definition"]
    assert type(sent_definition) is dict
    assert not hasattr(sent_definition, "model_dump")
    sent_block = sent_definition["workflow_definition"]["blocks"][0]
    assert sent_block["block_type"] == block_type
    assert sent_block["spreadsheet_url"] == "https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit"


@pytest.mark.parametrize("block_type", ["google_sheets_read", "google_sheets_write"])
@pytest.mark.asyncio
async def test_workflow_update_sends_google_sheets_json_definition_as_raw_dict(
    monkeypatch: pytest.MonkeyPatch, block_type: str
) -> None:
    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())

    async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
        assert workflow_id == "wpid_test"
        assert version is None
        return {
            "proxy_location": "RESIDENTIAL",
            "workflow_definition": {
                "parameters": [],
                "blocks": [],
            },
        }

    monkeypatch.setattr(workflow_tools, "_get_workflow_by_id", fake_get_workflow_by_id)
    definition = _google_sheets_definition(block_type)

    result = await workflow_tools.skyvern_workflow_update(
        workflow_id="wpid_test",
        definition=json.dumps(definition),
        format="json",
    )

    assert result["ok"] is True, result
    sent_definition = request_mock.await_args.kwargs["json"]["json_definition"]
    assert type(sent_definition) is dict
    assert not hasattr(sent_definition, "model_dump")
    sent_block = sent_definition["workflow_definition"]["blocks"][0]
    assert sent_block["block_type"] == block_type
    assert sent_block["spreadsheet_url"] == "https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit"
@pytest.mark.asyncio
async def test_workflow_create_normalizes_invalid_text_prompt_llm_key(monkeypatch: pytest.MonkeyPatch) -> None:
    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())

    definition = {
        "title": "Normalize invalid llm_key",
@@ -108,19 +229,18 @@ async def test_workflow_create_normalizes_invalid_text_prompt_llm_key(monkeypatc
    result = await workflow_tools.skyvern_workflow_create(definition=json.dumps(definition), format="json")

    sent_definition = request_mock.await_args.kwargs["json"]["json_definition"]
    sent_block = sent_definition["workflow_definition"]["blocks"][0]
    assert result["ok"] is True
    assert sent_block.get("llm_key") is None
    assert sent_block.get("model") is None
    assert "ANTHROPIC_CLAUDE_3_5_SONNET" not in json.dumps(sent_definition)
@pytest.mark.asyncio
async def test_workflow_create_preserves_explicit_internal_text_prompt_llm_key(monkeypatch: pytest.MonkeyPatch) -> None:
    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())

    definition = {
        "title": "Preserve explicit internal llm_key",
@@ -143,12 +263,12 @@ async def test_workflow_create_preserves_explicit_internal_text_prompt_llm_key(m
    ):
        result = await workflow_tools.skyvern_workflow_create(definition=json.dumps(definition), format="json")

    sent_definition = request_mock.await_args.kwargs["json"]["json_definition"]
    sent_block = sent_definition["workflow_definition"]["blocks"][0]
    assert result["ok"] is True
    assert sent_block.get("model") is None
    assert sent_block.get("llm_key") == "SPECIAL_INTERNAL_KEY"
# ---------------------------------------------------------------------------
@@ -174,8 +294,7 @@ async def test_mcp_strips_all_common_hallucinated_llm_keys(
    monkeypatch: pytest.MonkeyPatch, hallucinated_key: str
) -> None:
    """MCP workflow creation must strip ANY hallucinated llm_key and default to Skyvern Optimized (null)."""
    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())

    definition = {
        "title": "Agent-generated workflow",
@@ -198,25 +317,23 @@
    ):
        result = await workflow_tools.skyvern_workflow_create(definition=json.dumps(definition), format="auto")

    sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
    sent_block = sent_def["workflow_definition"]["blocks"][0]
    assert result["ok"] is True
    assert sent_block.get("llm_key") is None, f"hallucinated key {hallucinated_key!r} was NOT stripped"
    assert sent_block.get("model") is None, "should default to Skyvern Optimized (null model)"
@pytest.mark.asyncio
async def test_workflow_create_preserves_unknown_fields(monkeypatch: pytest.MonkeyPatch) -> None:
    """Fields not in the internal schema should survive normalization.

    Our normalization deep-merges the original raw dict with the backend schema
    output so fields not yet mirrored in that schema are preserved at any
    nesting depth.
    """
    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())

    definition = {
        "title": "Unknown fields test",
@@ -237,19 +354,17 @@ async def test_workflow_create_preserves_unknown_fields(monkeypatch: pytest.Monk
    result = await workflow_tools.skyvern_workflow_create(definition=json.dumps(definition), format="json")
    assert result["ok"] is True

    sent = request_mock.await_args.kwargs["json"]["json_definition"]
    # Top-level unknown field preserved
    assert sent.get("some_future_sdk_field") == "should_survive"
    # Nested unknown field inside workflow_definition also preserved via deep merge
    wd = sent["workflow_definition"]
    assert wd.get("some_nested_future_field") == "also_survives"
@pytest.mark.asyncio
async def test_workflow_create_defaults_proxy_location_when_omitted(monkeypatch: pytest.MonkeyPatch) -> None:
    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())

    definition = {
        "title": "Default proxy workflow",
@@ -269,16 +384,15 @@ async def test_workflow_create_defaults_proxy_location_when_omitted(monkeypatch:
    result = await workflow_tools.skyvern_workflow_create(definition=json.dumps(definition), format="json")

    sent_definition = request_mock.await_args.kwargs["json"]["json_definition"]
    assert result["ok"] is True
    assert sent_definition.get("proxy_location") == "RESIDENTIAL"


@pytest.mark.asyncio
async def test_workflow_create_preserves_block_level_unknown_fields(monkeypatch: pytest.MonkeyPatch) -> None:
    """Unknown fields inside individual block dicts survive normalization via deep merge."""
    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())

    definition = {
        "title": "Block-level unknown fields test",
@@ -298,16 +412,14 @@ async def test_workflow_create_preserves_block_level_unknown_fields(monkeypatch:
    result = await workflow_tools.skyvern_workflow_create(definition=json.dumps(definition), format="json")
    assert result["ok"] is True

    sent = request_mock.await_args.kwargs["json"]["json_definition"]
    sent_block = sent["workflow_definition"]["blocks"][0]
    assert sent_block.get("some_future_block_field") == 42


@pytest.mark.asyncio
async def test_workflow_update_preserves_existing_proxy_when_omitted(monkeypatch: pytest.MonkeyPatch) -> None:
    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())

    async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
        assert workflow_id == "wpid_test"
@@ -338,15 +450,14 @@ async def test_workflow_update_preserves_existing_proxy_when_omitted(monkeypatch
        format="json",
    )

    sent_definition = request_mock.await_args.kwargs["json"]["json_definition"]
    assert result["ok"] is True
    assert sent_definition.get("proxy_location") == "RESIDENTIAL_AU"


@pytest.mark.asyncio
async def test_workflow_update_defaults_proxy_when_existing_is_null(monkeypatch: pytest.MonkeyPatch) -> None:
    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())

    async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
        assert workflow_id == "wpid_test"
@@ -377,16 +488,15 @@ async def test_workflow_update_defaults_proxy_when_existing_is_null(monkeypatch:
        format="json",
    )

    sent_definition = request_mock.await_args.kwargs["json"]["json_definition"]
    assert result["ok"] is True
    assert sent_definition.get("proxy_location") == "RESIDENTIAL"


@pytest.mark.asyncio
async def test_workflow_create_falls_back_on_schema_validation_error(monkeypatch: pytest.MonkeyPatch) -> None:
    """If the internal schema rejects the payload, normalization is skipped and the raw dict is forwarded."""
    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())

    definition = {
        "title": "Schema rejection test",
@@ -410,11 +520,10 @@ async def test_workflow_create_falls_back_on_schema_validation_error(monkeypatch
    result = await workflow_tools.skyvern_workflow_create(definition=json.dumps(definition), format="json")
    assert result["ok"] is True

    sent = request_mock.await_args.kwargs["json"]["json_definition"]
    # Normalization was skipped, so the hallucinated key passes through to the SDK
    sent_block = sent["workflow_definition"]["blocks"][0]
    assert sent_block.get("llm_key") == "HALLUCINATED_KEY"
@pytest.mark.parametrize("format_name", ["json", "auto"])
@@ -476,8 +585,7 @@ def test_deep_merge_preserves_block_unknown_fields_when_list_lengths_differ() ->
@pytest.mark.asyncio
async def test_mcp_text_prompt_without_llm_key_stays_null(monkeypatch: pytest.MonkeyPatch) -> None:
    """When MCP correctly omits llm_key (Skyvern Optimized), it stays null through the whole pipeline."""
    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())

    definition = {
        "title": "Well-behaved MCP workflow",
@@ -494,11 +602,11 @@
    }

    result = await workflow_tools.skyvern_workflow_create(definition=json.dumps(definition), format="json")

    sent_block = request_mock.await_args.kwargs["json"]["json_definition"]["workflow_definition"]["blocks"][0]
    assert result["ok"] is True
    assert sent_block.get("llm_key") is None
    assert sent_block.get("model") is None
@pytest.mark.asyncio @pytest.mark.asyncio
@ -644,8 +752,7 @@ async def test_workflow_update_preserves_credential_parameters_when_omitted(
"""When the update definition omits credential parameters, they should be """When the update definition omits credential parameters, they should be
injected from the existing workflow so the login block keeps working.""" injected from the existing workflow so the login block keeps working."""
fake_client = SimpleNamespace(update_workflow=AsyncMock(return_value=_fake_workflow_response())) request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)
async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]: async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
return { return {
@ -711,13 +818,13 @@ async def test_workflow_update_preserves_credential_parameters_when_omitted(
assert result["ok"] is True assert result["ok"] is True
sent_def = fake_client.update_workflow.await_args.kwargs["json_definition"] sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
# Verify credential parameter was injected # Verify credential parameter was injected
params = sent_def.workflow_definition.parameters params = sent_def["workflow_definition"]["parameters"]
cred_params = [p for p in params if getattr(p, "parameter_type", None) == "credential"] cred_params = [p for p in params if p.get("parameter_type") == "credential"]
assert len(cred_params) == 1 assert len(cred_params) == 1
assert cred_params[0].credential_id == "cred_abc123" assert cred_params[0]["credential_id"] == "cred_abc123"
assert cred_params[0].key == "credentials" assert cred_params[0]["key"] == "credentials"
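The preservation rule this test asserts can be sketched as a small merge helper. This is illustrative only, not Skyvern's actual merge code: credential parameters missing from an MCP update are re-injected from the existing workflow definition.

```python
def inject_missing_credential_params(update_def: dict, existing_def: dict) -> dict:
    """Copy credential parameters from the existing workflow when the update omits them."""
    params = list(update_def.get("parameters", []))
    if not any(p.get("parameter_type") == "credential" for p in params):
        params.extend(
            dict(p)
            for p in existing_def.get("parameters", [])
            if p.get("parameter_type") == "credential"
        )
    return {**update_def, "parameters": params}


# The existing workflow has a credential param; the update omits it.
existing = {
    "parameters": [
        {"parameter_type": "credential", "key": "credentials", "credential_id": "cred_abc123"}
    ]
}
update = {"parameters": [{"parameter_type": "workflow", "key": "url_input"}]}
merged = inject_missing_credential_params(update, existing)
cred_params = [p for p in merged["parameters"] if p.get("parameter_type") == "credential"]
```

Under this sketch the merged definition carries exactly one credential parameter with the original `credential_id`, which is the shape the assertions above check on the outgoing request body.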
 @pytest.mark.asyncio
@@ -727,8 +834,7 @@ async def test_workflow_update_injects_credential_key_into_block_parameter_keys(
     """When the update definition omits the credential parameter key from a login
     block's parameter_keys, the key should be injected from the existing workflow."""
-    fake_client = SimpleNamespace(update_workflow=AsyncMock(return_value=_fake_workflow_response()))
-    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)
+    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
     async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
         return {
@@ -792,17 +898,17 @@ async def test_workflow_update_injects_credential_key_into_block_parameter_keys(
     assert result["ok"] is True
-    sent_def = fake_client.update_workflow.await_args.kwargs["json_definition"]
+    sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
     # Verify credential parameter was injected
-    params = sent_def.workflow_definition.parameters
-    cred_params = [p for p in params if getattr(p, "parameter_type", None) == "credential"]
+    params = sent_def["workflow_definition"]["parameters"]
+    cred_params = [p for p in params if p.get("parameter_type") == "credential"]
     assert len(cred_params) == 1
-    assert cred_params[0].credential_id == "cred_abc123"
+    assert cred_params[0]["credential_id"] == "cred_abc123"
     # Verify the login block now references the credential parameter key
-    blocks = sent_def.workflow_definition.blocks
-    login_block = next(b for b in blocks if getattr(b, "label", None) == "login_block")
-    pkeys = getattr(login_block, "parameter_keys", [])
+    blocks = sent_def["workflow_definition"]["blocks"]
+    login_block = next(b for b in blocks if b.get("label") == "login_block")
+    pkeys = login_block.get("parameter_keys", [])
     assert "credentials" in pkeys
@@ -813,8 +919,7 @@ async def test_workflow_update_injects_credential_key_when_parameter_keys_omitte
     """When the update definition omits parameter_keys entirely from the block,
     credential keys should still be injected."""
-    fake_client = SimpleNamespace(update_workflow=AsyncMock(return_value=_fake_workflow_response()))
-    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)
+    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
     async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
         return {
@@ -863,16 +968,16 @@ async def test_workflow_update_injects_credential_key_when_parameter_keys_omitte
     assert result["ok"] is True
-    sent_def = fake_client.update_workflow.await_args.kwargs["json_definition"]
+    sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
     # Verify credential parameter was injected
-    params = sent_def.workflow_definition.parameters
-    cred_params = [p for p in params if getattr(p, "parameter_type", None) == "credential"]
+    params = sent_def["workflow_definition"]["parameters"]
+    cred_params = [p for p in params if p.get("parameter_type") == "credential"]
     assert len(cred_params) == 1
     # Verify the login block now has the credential key
-    blocks = sent_def.workflow_definition.blocks
-    login_block = next(b for b in blocks if getattr(b, "label", None) == "login_block")
-    pkeys = getattr(login_block, "parameter_keys", [])
+    blocks = sent_def["workflow_definition"]["blocks"]
+    login_block = next(b for b in blocks if b.get("label") == "login_block")
+    pkeys = login_block.get("parameter_keys", [])
     assert "credentials" in pkeys
@@ -883,8 +988,7 @@ async def test_workflow_update_preserves_block_credential_when_param_already_inc
     """When the update already includes the credential parameter but the block omits
     the key from parameter_keys, the key should still be injected into the block."""
-    fake_client = SimpleNamespace(update_workflow=AsyncMock(return_value=_fake_workflow_response()))
-    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)
+    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
     async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
         return {
@@ -940,15 +1044,15 @@ async def test_workflow_update_preserves_block_credential_when_param_already_inc
     assert result["ok"] is True
-    sent_def = fake_client.update_workflow.await_args.kwargs["json_definition"]
-    blocks = sent_def.workflow_definition.blocks
-    login_block = next(b for b in blocks if getattr(b, "label", None) == "login_block")
-    pkeys = getattr(login_block, "parameter_keys", [])
+    sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
+    blocks = sent_def["workflow_definition"]["blocks"]
+    login_block = next(b for b in blocks if b.get("label") == "login_block")
+    pkeys = login_block.get("parameter_keys", [])
     assert "credentials" in pkeys
     # Should not duplicate the credential parameter
-    params = sent_def.workflow_definition.parameters
-    cred_params = [p for p in params if getattr(p, "parameter_type", None) == "credential"]
+    params = sent_def["workflow_definition"]["parameters"]
+    cred_params = [p for p in params if p.get("parameter_type") == "credential"]
     assert len(cred_params) == 1
@@ -959,8 +1063,7 @@ async def test_workflow_update_does_not_duplicate_existing_credential_parameter(
     """When the update definition already includes the credential parameter,
     it should NOT be duplicated."""
-    fake_client = SimpleNamespace(update_workflow=AsyncMock(return_value=_fake_workflow_response()))
-    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)
+    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
     async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
         return {
@@ -1016,9 +1119,9 @@ async def test_workflow_update_does_not_duplicate_existing_credential_parameter(
     assert result["ok"] is True
-    sent_def = fake_client.update_workflow.await_args.kwargs["json_definition"]
-    params = sent_def.workflow_definition.parameters
-    cred_params = [p for p in params if getattr(p, "parameter_type", None) == "credential"]
+    sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
+    params = sent_def["workflow_definition"]["parameters"]
+    cred_params = [p for p in params if p.get("parameter_type") == "credential"]
     # Should still be exactly 1, not duplicated
     assert len(cred_params) == 1
@@ -1030,8 +1133,7 @@ async def test_workflow_update_credential_keys_injected_when_login_block_label_r
     """When Claude renames a login block label (e.g. 'login_block' -> 'login'),
     credential parameter_keys should still be injected via type-based matching."""
-    fake_client = SimpleNamespace(update_workflow=AsyncMock(return_value=_fake_workflow_response()))
-    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)
+    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
     async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
         return {
@@ -1082,18 +1184,18 @@ async def test_workflow_update_credential_keys_injected_when_login_block_label_r
     assert result["ok"] is True
-    sent_def = fake_client.update_workflow.await_args.kwargs["json_definition"]
+    sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
     # Credential parameter must be injected
-    params = sent_def.workflow_definition.parameters
-    cred_params = [p for p in params if getattr(p, "parameter_type", None) == "credential"]
+    params = sent_def["workflow_definition"]["parameters"]
+    cred_params = [p for p in params if p.get("parameter_type") == "credential"]
     assert len(cred_params) == 1
-    assert cred_params[0].credential_id == "cred_abc123"
+    assert cred_params[0]["credential_id"] == "cred_abc123"
     # Login block must have credential key despite label mismatch
-    blocks = sent_def.workflow_definition.blocks
-    login_block = next(b for b in blocks if getattr(b, "block_type", None) == "login")
-    pkeys = getattr(login_block, "parameter_keys", [])
+    blocks = sent_def["workflow_definition"]["blocks"]
+    login_block = next(b for b in blocks if b.get("block_type") == "login")
+    pkeys = login_block.get("parameter_keys", [])
     assert "credentials" in pkeys
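The type-based fallback this test describes can be sketched as follows. The helper name and signature are hypothetical, not the real implementation: when a label rename breaks label matching, login blocks are matched by `block_type` instead, and credential keys are appended to their `parameter_keys`.

```python
def inject_credential_keys(blocks: list[dict], credential_keys: list[str]) -> list[dict]:
    """Add credential parameter keys to every login block, matched by type."""
    for block in blocks:
        if block.get("block_type") == "login":
            pkeys = block.setdefault("parameter_keys", [])
            for key in credential_keys:
                if key not in pkeys:  # idempotent: never duplicate a key
                    pkeys.append(key)
    return blocks


blocks = [
    {"block_type": "login", "label": "login"},  # label renamed from "login_block"
    {"block_type": "task", "label": "scrape"},
]
inject_credential_keys(blocks, ["credentials"])
```

Matching on type rather than label is what makes the injection survive Claude's label rewrites; non-login blocks are deliberately left untouched.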
@@ -1104,8 +1206,7 @@ async def test_workflow_update_always_replaces_wrong_credential_id(
     """When Claude includes a credential parameter with the wrong credential_id,
     the existing workflow's credential_id should always win."""
-    fake_client = SimpleNamespace(update_workflow=AsyncMock(return_value=_fake_workflow_response()))
-    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)
+    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
     async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
         return {
@@ -1161,12 +1262,12 @@ async def test_workflow_update_always_replaces_wrong_credential_id(
     assert result["ok"] is True
-    sent_def = fake_client.update_workflow.await_args.kwargs["json_definition"]
-    params = sent_def.workflow_definition.parameters
-    cred_params = [p for p in params if getattr(p, "parameter_type", None) == "credential"]
+    sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
+    params = sent_def["workflow_definition"]["parameters"]
+    cred_params = [p for p in params if p.get("parameter_type") == "credential"]
     assert len(cred_params) == 1
     # The existing (correct) credential_id must win
-    assert cred_params[0].credential_id == "cred_CORRECT"
+    assert cred_params[0]["credential_id"] == "cred_CORRECT"
 @pytest.mark.asyncio
@@ -1176,8 +1277,7 @@ async def test_workflow_update_correct_credential_still_works(
     """When Claude includes the credential parameter correctly, the always-replace
     strategy should still work (idempotent, no regression)."""
-    fake_client = SimpleNamespace(update_workflow=AsyncMock(return_value=_fake_workflow_response()))
-    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)
+    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
     async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
         return {
@@ -1245,21 +1345,21 @@ async def test_workflow_update_correct_credential_still_works(
     assert result["ok"] is True
-    sent_def = fake_client.update_workflow.await_args.kwargs["json_definition"]
-    params = sent_def.workflow_definition.parameters
-    cred_params = [p for p in params if getattr(p, "parameter_type", None) == "credential"]
+    sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
+    params = sent_def["workflow_definition"]["parameters"]
+    cred_params = [p for p in params if p.get("parameter_type") == "credential"]
     assert len(cred_params) == 1
-    assert cred_params[0].credential_id == "cred_abc123"
+    assert cred_params[0]["credential_id"] == "cred_abc123"
     # Non-credential params should be preserved from the update
-    wf_params = [p for p in params if getattr(p, "parameter_type", None) == "workflow"]
+    wf_params = [p for p in params if p.get("parameter_type") == "workflow"]
     assert len(wf_params) == 1
-    assert wf_params[0].default_value == "https://new-url.com"
+    assert wf_params[0]["default_value"] == "https://new-url.com"
     # Login block should have the credential key
-    blocks = sent_def.workflow_definition.blocks
-    login_block = next(b for b in blocks if getattr(b, "block_type", None) == "login")
-    pkeys = getattr(login_block, "parameter_keys", [])
+    blocks = sent_def["workflow_definition"]["blocks"]
+    login_block = next(b for b in blocks if b.get("block_type") == "login")
+    pkeys = login_block.get("parameter_keys", [])
     assert "credentials" in pkeys
@@ -1270,8 +1370,7 @@ async def test_workflow_update_multiple_login_blocks_all_get_credential_keys(
     """When the update has multiple login blocks, ALL of them should get credential
     parameter_keys injected (even if labels don't match existing blocks)."""
-    fake_client = SimpleNamespace(update_workflow=AsyncMock(return_value=_fake_workflow_response()))
-    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)
+    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
     async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
         return {
@@ -1330,19 +1429,19 @@ async def test_workflow_update_multiple_login_blocks_all_get_credential_keys(
     assert result["ok"] is True
-    sent_def = fake_client.update_workflow.await_args.kwargs["json_definition"]
-    blocks = sent_def.workflow_definition.blocks
+    sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
+    blocks = sent_def["workflow_definition"]["blocks"]
     # Both login blocks should have credential keys
-    login_blocks = [b for b in blocks if getattr(b, "block_type", None) == "login"]
+    login_blocks = [b for b in blocks if b.get("block_type") == "login"]
     assert len(login_blocks) == 2
     for lb in login_blocks:
-        pkeys = getattr(lb, "parameter_keys", [])
-        assert "credentials" in pkeys, f"Login block {getattr(lb, 'label', '?')} missing credential key"
+        pkeys = lb.get("parameter_keys", [])
+        assert "credentials" in pkeys, f"Login block {lb.get('label', '?')} missing credential key"
     # Task block should NOT have credential keys (no label match, not a login block)
-    task_block = next(b for b in blocks if getattr(b, "block_type", None) == "task")
-    task_pkeys = getattr(task_block, "parameter_keys", None) or []
+    task_block = next(b for b in blocks if b.get("block_type") == "task")
+    task_pkeys = task_block.get("parameter_keys") or []
     assert "credentials" not in task_pkeys
@@ -1353,8 +1452,7 @@ async def test_workflow_update_credential_keys_injected_into_login_block_nested_
     """When a login block is nested inside a for_loop block, credential parameter_keys
     should still be injected via type-based matching."""
-    fake_client = SimpleNamespace(update_workflow=AsyncMock(return_value=_fake_workflow_response()))
-    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)
+    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
     async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
         return {
@@ -1416,20 +1514,20 @@ async def test_workflow_update_credential_keys_injected_into_login_block_nested_
     assert result["ok"] is True
-    sent_def = fake_client.update_workflow.await_args.kwargs["json_definition"]
+    sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
     # Credential parameter must be injected
-    params = sent_def.workflow_definition.parameters
-    cred_params = [p for p in params if getattr(p, "parameter_type", None) == "credential"]
+    params = sent_def["workflow_definition"]["parameters"]
+    cred_params = [p for p in params if p.get("parameter_type") == "credential"]
     assert len(cred_params) == 1
-    assert cred_params[0].credential_id == "cred_abc123"
+    assert cred_params[0]["credential_id"] == "cred_abc123"
     # The nested login block inside for_loop must have credential keys
-    blocks = sent_def.workflow_definition.blocks
-    loop_block = next(b for b in blocks if getattr(b, "block_type", None) == "for_loop")
-    nested_blocks = getattr(loop_block, "loop_blocks", [])
-    login_block = next(b for b in nested_blocks if getattr(b, "block_type", None) == "login")
-    pkeys = getattr(login_block, "parameter_keys", [])
+    blocks = sent_def["workflow_definition"]["blocks"]
+    loop_block = next(b for b in blocks if b.get("block_type") == "for_loop")
+    nested_blocks = loop_block.get("loop_blocks", [])
+    login_block = next(b for b in nested_blocks if b.get("block_type") == "login")
+    pkeys = login_block.get("parameter_keys", [])
    assert "credentials" in pkeys
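The nested-`for_loop` case implies a recursive walk over `loop_blocks`. A small sketch of such a traversal, assuming the dict structure used in this test file rather than Skyvern's actual internals:

```python
from collections.abc import Iterator


def iter_blocks(blocks: list[dict]) -> Iterator[dict]:
    """Yield every block, descending into for_loop containers via loop_blocks."""
    for block in blocks:
        yield block
        yield from iter_blocks(block.get("loop_blocks", []))


workflow_blocks = [
    {
        "block_type": "for_loop",
        "loop_blocks": [{"block_type": "login", "label": "inner_login"}],
    },
    {"block_type": "task", "label": "scrape"},
]
# Type-based matching then sees nested login blocks too:
logins = [b for b in iter_blocks(workflow_blocks) if b.get("block_type") == "login"]
```

With a flat scan the nested login block would be invisible; the recursive generator is what lets credential keys reach blocks inside loop containers.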
@@ -1440,8 +1538,7 @@ async def test_workflow_update_login_block_only_gets_credential_type_keys_not_aw
     """When a workflow has both a credential param and an aws_secret param,
     login blocks should only get the credential key, not the aws_secret key."""
-    fake_client = SimpleNamespace(update_workflow=AsyncMock(return_value=_fake_workflow_response()))
-    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)
+    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
     async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
         return {
@@ -1506,26 +1603,26 @@ async def test_workflow_update_login_block_only_gets_credential_type_keys_not_aw
     assert result["ok"] is True
-    sent_def = fake_client.update_workflow.await_args.kwargs["json_definition"]
+    sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
     # Both params should be preserved
-    params = sent_def.workflow_definition.parameters
-    cred_params = [p for p in params if getattr(p, "parameter_type", None) == "credential"]
-    aws_params = [p for p in params if getattr(p, "parameter_type", None) == "aws_secret"]
+    params = sent_def["workflow_definition"]["parameters"]
+    cred_params = [p for p in params if p.get("parameter_type") == "credential"]
+    aws_params = [p for p in params if p.get("parameter_type") == "aws_secret"]
     assert len(cred_params) == 1
     assert len(aws_params) == 1
-    blocks = sent_def.workflow_definition.blocks
+    blocks = sent_def["workflow_definition"]["blocks"]
     # Login block should ONLY get credential key, NOT aws_secret
-    login_block = next(b for b in blocks if getattr(b, "block_type", None) == "login")
-    login_pkeys = getattr(login_block, "parameter_keys", [])
+    login_block = next(b for b in blocks if b.get("block_type") == "login")
+    login_pkeys = login_block.get("parameter_keys", [])
     assert "credentials" in login_pkeys
     assert "api_secret" not in login_pkeys
     # Task block should get aws_secret via label fallback (label unchanged)
-    task_block = next(b for b in blocks if getattr(b, "block_type", None) == "task")
-    task_pkeys = getattr(task_block, "parameter_keys", [])
+    task_block = next(b for b in blocks if b.get("block_type") == "task")
+    task_pkeys = task_block.get("parameter_keys", [])
     assert "api_secret" in task_pkeys
@@ -1537,8 +1634,7 @@ async def test_workflow_update_strips_runtime_fields_from_credential_params(
     (credential_parameter_id, workflow_id, created_at, modified_at, deleted_at),
     those fields must be stripped before re-injecting into the update definition."""
-    fake_client = SimpleNamespace(update_workflow=AsyncMock(return_value=_fake_workflow_response()))
-    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)
+    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
     async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
         return {
@@ -1593,26 +1689,24 @@ async def test_workflow_update_strips_runtime_fields_from_credential_params(
     assert result["ok"] is True
-    sent_def = fake_client.update_workflow.await_args.kwargs["json_definition"]
-    params = sent_def.workflow_definition.parameters
-    cred_params = [p for p in params if getattr(p, "parameter_type", None) == "credential"]
+    sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
+    params = sent_def["workflow_definition"]["parameters"]
+    cred_params = [p for p in params if p.get("parameter_type") == "credential"]
     assert len(cred_params) == 1
     cred = cred_params[0]
     # Business fields should be preserved
-    assert getattr(cred, "key", None) == "credentials"
-    assert getattr(cred, "credential_id", None) == "cred_abc123"
-    assert getattr(cred, "description", None) == "Login credentials"
-    # Runtime fields must NOT be present — Fern SDK types use extra="allow"
-    # so extra fields survive as attributes if passed through
-    cred_dict = cred.dict() if hasattr(cred, "dict") else cred.__dict__
-    assert "credential_parameter_id" not in cred_dict
-    assert "workflow_id" not in cred_dict
-    assert "created_at" not in cred_dict
-    assert "modified_at" not in cred_dict
-    assert "deleted_at" not in cred_dict
+    assert cred.get("key") == "credentials"
+    assert cred.get("credential_id") == "cred_abc123"
+    assert cred.get("description") == "Login credentials"
+    # Runtime fields must NOT be present
+    assert "credential_parameter_id" not in cred
+    assert "workflow_id" not in cred
+    assert "created_at" not in cred
+    assert "modified_at" not in cred
+    assert "deleted_at" not in cred
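The runtime-field stripping asserted above can be sketched in a few lines. The field list comes from the test itself; the helper name is hypothetical: server-assigned fields are dropped before a fetched credential parameter is re-injected into an update payload.

```python
# Server-assigned fields that must never be echoed back in an update request.
RUNTIME_FIELDS = {
    "credential_parameter_id",
    "workflow_id",
    "created_at",
    "modified_at",
    "deleted_at",
}


def strip_runtime_fields(param: dict) -> dict:
    """Return a copy of the parameter with server-assigned fields removed."""
    return {k: v for k, v in param.items() if k not in RUNTIME_FIELDS}


fetched = {
    "parameter_type": "credential",
    "key": "credentials",
    "credential_id": "cred_abc123",
    "credential_parameter_id": "cp_1",
    "workflow_id": "wf_1",
    "created_at": "2026-01-01T00:00:00Z",
}
clean = strip_runtime_fields(fetched)
```

Only the business fields survive, which matches the `not in` assertions the test makes against the outgoing request body.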
 @pytest.mark.asyncio
@@ -1621,8 +1715,7 @@ async def test_workflow_update_preserves_workflow_credential_id_params_and_injec
 ) -> None:
     """Workflow parameters with `credential_id` type should be preserved like direct credential params."""
-    fake_client = SimpleNamespace(update_workflow=AsyncMock(return_value=_fake_workflow_response()))
-    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)
+    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
     async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
         return {
@@ -1690,34 +1783,32 @@ async def test_workflow_update_preserves_workflow_credential_id_params_and_injec
     assert result["ok"] is True
-    sent_def = fake_client.update_workflow.await_args.kwargs["json_definition"]
-    params = sent_def.workflow_definition.parameters
+    sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
+    params = sent_def["workflow_definition"]["parameters"]
     workflow_cred_params = [
         p
         for p in params
-        if getattr(p, "parameter_type", None) == "workflow"
-        and str(getattr(p, "workflow_parameter_type", None)) == "credential_id"
+        if p.get("parameter_type") == "workflow" and str(p.get("workflow_parameter_type")) == "credential_id"
     ]
     assert len(workflow_cred_params) == 1
     workflow_cred = workflow_cred_params[0]
-    assert getattr(workflow_cred, "key", None) == "my_creds"
-    assert getattr(workflow_cred, "default_value", None) == "cred_abc123"
-    workflow_cred_dict = workflow_cred.dict() if hasattr(workflow_cred, "dict") else workflow_cred.__dict__
-    assert "workflow_parameter_id" not in workflow_cred_dict
-    assert "workflow_id" not in workflow_cred_dict
-    assert "created_at" not in workflow_cred_dict
-    assert "modified_at" not in workflow_cred_dict
-    assert "deleted_at" not in workflow_cred_dict
-    workflow_params = [p for p in params if getattr(p, "parameter_type", None) == "workflow"]
-    url_param = next(p for p in workflow_params if getattr(p, "key", None) == "url_input")
-    assert getattr(url_param, "default_value", None) == "https://new-url.com"
-    blocks = sent_def.workflow_definition.blocks
-    login_block = next(b for b in blocks if getattr(b, "block_type", None) == "login")
-    pkeys = getattr(login_block, "parameter_keys", [])
+    assert workflow_cred.get("key") == "my_creds"
+    assert workflow_cred.get("default_value") == "cred_abc123"
+    assert "workflow_parameter_id" not in workflow_cred
+    assert "workflow_id" not in workflow_cred
+    assert "created_at" not in workflow_cred
+    assert "modified_at" not in workflow_cred
+    assert "deleted_at" not in workflow_cred
+    workflow_params = [p for p in params if p.get("parameter_type") == "workflow"]
+    url_param = next(p for p in workflow_params if p.get("key") == "url_input")
+    assert url_param.get("default_value") == "https://new-url.com"
+    blocks = sent_def["workflow_definition"]["blocks"]
+    login_block = next(b for b in blocks if b.get("block_type") == "login")
+    pkeys = login_block.get("parameter_keys", [])
     assert "my_creds" in pkeys
@@ -1727,8 +1818,7 @@ async def test_workflow_update_always_replaces_wrong_workflow_credential_id_defa
 ) -> None:
     """Credential-id workflow params should keep the existing default credential on MCP updates."""
-    fake_client = SimpleNamespace(update_workflow=AsyncMock(return_value=_fake_workflow_response()))
-    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)
+    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
     async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
         return {
@@ -1785,16 +1875,15 @@ async def test_workflow_update_always_replaces_wrong_workflow_credential_id_defa
     assert result["ok"] is True
-    sent_def = fake_client.update_workflow.await_args.kwargs["json_definition"]
-    params = sent_def.workflow_definition.parameters
+    sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
+    params = sent_def["workflow_definition"]["parameters"]
     workflow_cred_params = [
         p
         for p in params
-        if getattr(p, "parameter_type", None) == "workflow"
-        and str(getattr(p, "workflow_parameter_type", None)) == "credential_id"
+        if p.get("parameter_type") == "workflow" and str(p.get("workflow_parameter_type")) == "credential_id"
     ]
     assert len(workflow_cred_params) == 1
-    assert getattr(workflow_cred_params[0], "default_value", None) == "cred_CORRECT"
+    assert workflow_cred_params[0].get("default_value") == "cred_CORRECT"
 @pytest.mark.asyncio
@@ -1803,8 +1892,7 @@ async def test_workflow_update_injects_onepassword_key_into_login_block_paramete
 ) -> None:
     """Login blocks should keep external credential keys like onepassword after MCP edits."""
-    fake_client = SimpleNamespace(update_workflow=AsyncMock(return_value=_fake_workflow_response()))
-    monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)
+    request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
     async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
         return {
@@ -1855,18 +1943,18 @@ async def test_workflow_update_injects_onepassword_key_into_login_block_paramete
     assert result["ok"] is True
-    sent_def = fake_client.update_workflow.await_args.kwargs["json_definition"]
-    params = sent_def.workflow_definition.parameters
-    onepassword_params = [p for p in params if getattr(p, "parameter_type", None) == "onepassword"]
+    sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
+    params = sent_def["workflow_definition"]["parameters"]
+    onepassword_params = [p for p in params if p.get("parameter_type") == "onepassword"]
     assert len(onepassword_params) == 1
-    assert getattr(onepassword_params[0], "key", None) == "site_login_cred"
-    assert getattr(onepassword_params[0], "vault_id", None) == "vault_123"
-    assert getattr(onepassword_params[0], "item_id", None) == "item_456"
-    blocks = sent_def.workflow_definition.blocks
-    login_block = next(b for b in blocks if getattr(b, "block_type", None) == "login")
-    pkeys = getattr(login_block, "parameter_keys", [])
+    assert onepassword_params[0].get("key") == "site_login_cred"
+    assert onepassword_params[0].get("vault_id") == "vault_123"
+    assert onepassword_params[0].get("item_id") == "item_456"
+    blocks = sent_def["workflow_definition"]["blocks"]
+    login_block = next(b for b in blocks if b.get("block_type") == "login")
+    pkeys = login_block.get("parameter_keys", [])
assert "site_login_cred" in pkeys assert "site_login_cred" in pkeys
@ -1876,8 +1964,7 @@ async def test_workflow_update_injects_bitwarden_login_key_into_login_block_para
) -> None: ) -> None:
"""Login blocks should keep bitwarden_login_credential keys after MCP edits.""" """Login blocks should keep bitwarden_login_credential keys after MCP edits."""
fake_client = SimpleNamespace(update_workflow=AsyncMock(return_value=_fake_workflow_response())) request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)
async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]: async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
return { return {
@ -1936,16 +2023,16 @@ async def test_workflow_update_injects_bitwarden_login_key_into_login_block_para
assert result["ok"] is True assert result["ok"] is True
sent_def = fake_client.update_workflow.await_args.kwargs["json_definition"] sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
params = sent_def.workflow_definition.parameters params = sent_def["workflow_definition"]["parameters"]
bw_params = [p for p in params if getattr(p, "parameter_type", None) == "bitwarden_login_credential"] bw_params = [p for p in params if p.get("parameter_type") == "bitwarden_login_credential"]
assert len(bw_params) == 1 assert len(bw_params) == 1
assert getattr(bw_params[0], "key", None) == "portal_login" assert bw_params[0].get("key") == "portal_login"
blocks = sent_def.workflow_definition.blocks blocks = sent_def["workflow_definition"]["blocks"]
login_block = next(b for b in blocks if getattr(b, "block_type", None) == "login") login_block = next(b for b in blocks if b.get("block_type") == "login")
pkeys = getattr(login_block, "parameter_keys", []) pkeys = login_block.get("parameter_keys", [])
assert "portal_login" in pkeys assert "portal_login" in pkeys
@ -1955,8 +2042,7 @@ async def test_workflow_update_injects_azure_vault_key_into_login_block_paramete
) -> None: ) -> None:
"""Login blocks should keep azure_vault_credential keys after MCP edits.""" """Login blocks should keep azure_vault_credential keys after MCP edits."""
fake_client = SimpleNamespace(update_workflow=AsyncMock(return_value=_fake_workflow_response())) request_mock = _patch_skyvern_http(monkeypatch, response_payload=_fake_workflow_dict())
monkeypatch.setattr(workflow_tools, "get_skyvern", lambda: fake_client)
async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]: async def fake_get_workflow_by_id(workflow_id: str, version: int | None = None) -> dict[str, object]:
return { return {
@ -2014,14 +2100,14 @@ async def test_workflow_update_injects_azure_vault_key_into_login_block_paramete
assert result["ok"] is True assert result["ok"] is True
sent_def = fake_client.update_workflow.await_args.kwargs["json_definition"] sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
params = sent_def.workflow_definition.parameters params = sent_def["workflow_definition"]["parameters"]
az_params = [p for p in params if getattr(p, "parameter_type", None) == "azure_vault_credential"] az_params = [p for p in params if p.get("parameter_type") == "azure_vault_credential"]
assert len(az_params) == 1 assert len(az_params) == 1
assert getattr(az_params[0], "key", None) == "azure_creds" assert az_params[0].get("key") == "azure_creds"
blocks = sent_def.workflow_definition.blocks blocks = sent_def["workflow_definition"]["blocks"]
login_block = next(b for b in blocks if getattr(b, "block_type", None) == "login") login_block = next(b for b in blocks if b.get("block_type") == "login")
pkeys = getattr(login_block, "parameter_keys", []) pkeys = login_block.get("parameter_keys", [])
assert "azure_creds" in pkeys assert "azure_creds" in pkeys
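The diff above moves every test from a `SimpleNamespace`/`AsyncMock` fake client to a single patched HTTP seam, then asserts on the raw JSON body via `request_mock.await_args.kwargs["json"]["json_definition"]`. A minimal, self-contained sketch of that inspection pattern, assuming an httpx-style request coroutine (the names `update_workflow` and the endpoint path are illustrative, not the repo's actual helper):

```python
import asyncio
from unittest.mock import AsyncMock

# Stand-in for the patched HTTP call: _patch_skyvern_http replaces the real
# request coroutine with an AsyncMock and hands it back for inspection.
request_mock = AsyncMock(return_value={"workflow_permanent_id": "wpid_123"})


async def update_workflow(json_definition: dict) -> dict:
    # In the real tests this call is buried inside the MCP tool under test;
    # it is inlined here only to show the await_args inspection pattern.
    return await request_mock(
        "POST", "/v1/workflows/wpid_123", json={"json_definition": json_definition}
    )


payload = asyncio.run(update_workflow({"workflow_definition": {"parameters": []}}))

# The assertion style used throughout the diff: pull the JSON body the code
# under test actually sent, then assert on plain dicts rather than attributes.
sent_def = request_mock.await_args.kwargs["json"]["json_definition"]
```

Asserting on the serialized dict (rather than on Pydantic model attributes, as the old `getattr`-style assertions did) pins down the wire format the server receives, which is what these credential-preservation tests ultimately care about.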

View file

@@ -0,0 +1,182 @@
"""Tests for the ``validation.decision`` / ``validation.reasoning_kind`` span
attributes (SKY-9174, Part D.3).
The two attributes give us a post-merge logfire signal for when a validation
block's LLM reasons literally and/or terminates — the failure mode Part D aims
to reduce. Query shape::
SELECT COUNT(*) FROM records
WHERE span_name = 'skyvern.agent.step_body'
AND attributes->>'validation.decision' = 'terminate'
AND attributes->>'validation.reasoning_kind' = 'literal'
AND start_timestamp > now() - INTERVAL '24 hours';
Pre-fix this count should be non-trivial; post-fix it should trend to zero on
the copilot-v2 cohort. These tests cover the attribute-writing logic directly
(the helper is pure, so we don't need to drive the full agent step).
"""
from __future__ import annotations
from datetime import UTC, datetime
import opentelemetry.trace as otel_trace
import pytest
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
from skyvern.forge.agent import record_validation_span_attrs
from skyvern.forge.sdk.db.enums import TaskType
from skyvern.webeye.actions.actions import (
Action,
ActionType,
ClickAction,
CompleteAction,
TerminateAction,
)
from tests.unit.helpers import make_organization, make_task
STEP_SPAN_NAME = "skyvern.agent.validation_step_body_fixture"
def _validation_task() -> object:
now = datetime.now(UTC)
org = make_organization(now)
return make_task(now, org, task_type=TaskType.validation)
def _general_task() -> object:
now = datetime.now(UTC)
org = make_organization(now)
return make_task(now, org, task_type=TaskType.general)
def _run_with_span(task: object, actions: list[Action]) -> None:
    """Start a span, invoke the helper inside it, and end the span; the
    caller reads the attributes back from the in-memory exporter."""
    tracer = otel_trace.get_tracer("sky-9174-test")
    with tracer.start_as_current_span(STEP_SPAN_NAME) as span:
        record_validation_span_attrs(span, task, actions)
def _span_attrs(span_exporter: InMemorySpanExporter) -> dict:
span = next((s for s in span_exporter.get_finished_spans() if s.name == STEP_SPAN_NAME), None)
assert span is not None, "expected fixture span to be recorded"
return dict(span.attributes or {})
def _complete_action(reasoning: str) -> CompleteAction:
return CompleteAction(
reasoning=reasoning,
intention=reasoning,
action_type=ActionType.COMPLETE,
)
def _terminate_action(reasoning: str) -> TerminateAction:
return TerminateAction(
reasoning=reasoning,
intention=reasoning,
action_type=ActionType.TERMINATE,
)
def test_complete_with_semantic_reasoning_records_semantic(span_exporter: InMemorySpanExporter) -> None:
task = _validation_task()
actions = [_complete_action("The current page shows a thank-you confirmation.")]
_run_with_span(task, actions)
attrs = _span_attrs(span_exporter)
assert attrs.get("validation.decision") == "complete"
assert attrs.get("validation.reasoning_kind") == "semantic"
def test_terminate_with_literal_reasoning_records_literal(span_exporter: InMemorySpanExporter) -> None:
"""The regression we care about most: LLM terminated because an exact
string wasn't found. This is the combination (terminate, literal) that
Part D aims to drive toward zero."""
task = _validation_task()
actions = [
_terminate_action(
"The page does not contain the exact complete-criterion text 'Your message has been sent'. TERMINATE."
)
]
_run_with_span(task, actions)
attrs = _span_attrs(span_exporter)
assert attrs.get("validation.decision") == "terminate"
assert attrs.get("validation.reasoning_kind") == "literal"
def test_terminate_with_semantic_reasoning_records_semantic(span_exporter: InMemorySpanExporter) -> None:
task = _validation_task()
actions = [_terminate_action("An error banner surfaced at the top of the page saying the submission failed.")]
_run_with_span(task, actions)
attrs = _span_attrs(span_exporter)
assert attrs.get("validation.decision") == "terminate"
assert attrs.get("validation.reasoning_kind") == "semantic"
def test_complete_with_literal_reasoning_records_literal(span_exporter: InMemorySpanExporter) -> None:
"""Symmetric — a literal COMPLETE is harmless but we still tag it, because
the post-merge dashboard cares about the distribution across both axes,
not just the terminate one."""
task = _validation_task()
actions = [_complete_action("The page contains the exact phrase 'Your message has been sent'.")]
_run_with_span(task, actions)
attrs = _span_attrs(span_exporter)
assert attrs.get("validation.decision") == "complete"
assert attrs.get("validation.reasoning_kind") == "literal"
def test_non_validation_task_does_not_tag_span(span_exporter: InMemorySpanExporter) -> None:
"""Guard against accidental tagging of non-validation step spans — those
span attributes are reserved for TaskType.validation."""
task = _general_task()
actions = [_complete_action("The current page shows a thank-you confirmation.")]
_run_with_span(task, actions)
attrs = _span_attrs(span_exporter)
assert "validation.decision" not in attrs
assert "validation.reasoning_kind" not in attrs
def test_non_decisive_action_does_not_tag_span(span_exporter: InMemorySpanExporter) -> None:
"""Validation tasks whose first action isn't a Complete/Terminate (unusual
but possible during partial parsing) should not produce tagged attrs."""
task = _validation_task()
    # A ClickAction stands in for any non-decisive first action.
non_decisive = ClickAction(action_type=ActionType.CLICK, element_id="AAAB", reasoning="click")
_run_with_span(task, [non_decisive])
attrs = _span_attrs(span_exporter)
assert "validation.decision" not in attrs
assert "validation.reasoning_kind" not in attrs
def test_empty_actions_list_does_not_tag_span(span_exporter: InMemorySpanExporter) -> None:
task = _validation_task()
_run_with_span(task, [])
attrs = _span_attrs(span_exporter)
assert "validation.decision" not in attrs
assert "validation.reasoning_kind" not in attrs
def test_missing_reasoning_defaults_to_semantic(span_exporter: InMemorySpanExporter) -> None:
"""Empty/None reasoning shouldn't crash — absence of literal signals means
semantic by the helper's rule."""
task = _validation_task()
actions = [_complete_action("")]
_run_with_span(task, actions)
attrs = _span_attrs(span_exporter)
assert attrs.get("validation.decision") == "complete"
assert attrs.get("validation.reasoning_kind") == "semantic"
@pytest.mark.parametrize(
"signal",
["exact", "literal", "verbatim", "word-for-word", "word for word"],
)
def test_every_literal_signal_flags_reasoning(signal: str, span_exporter: InMemorySpanExporter) -> None:
"""Each configured signal, on its own, must classify reasoning as literal."""
task = _validation_task()
actions = [_terminate_action(f"The criterion does not appear {signal} on the page.")]
_run_with_span(task, actions)
attrs = _span_attrs(span_exporter)
assert attrs.get("validation.reasoning_kind") == "literal", signal

View file

@@ -0,0 +1,79 @@
"""Tests for the ``verification.reasoning_kind`` span attribute (SKY-9174, Part E.2).
Part A already wrote ``verification.status`` and ``verification.template`` on
``skyvern.agent.complete_verify`` spans. Part E adds one more:
``verification.reasoning_kind`` (``literal`` | ``semantic``) derived from the
verifier LLM's ``thoughts`` text via the shared ``_classify_reasoning_kind``
heuristic. Post-fix logfire query::
SELECT COUNT(*) FROM records
WHERE span_name = 'skyvern.agent.complete_verify'
AND attributes->>'verification.status' != 'complete'
AND attributes->>'verification.reasoning_kind' = 'literal'
AND start_timestamp > now() - INTERVAL '24 hours';
Pre-fix, a single reproducing run produces double-digit rows with
``reasoning_kind = 'literal'``. Post-fix, that count should trend to zero.
"""
from __future__ import annotations
import opentelemetry.trace as otel_trace
import pytest
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
from skyvern.forge.agent import record_verification_span_attrs
SPAN_NAME = "skyvern.agent.verification_fixture"
def _run_with_span(thoughts: str | None) -> None:
tracer = otel_trace.get_tracer("sky-9174-part-e-test")
with tracer.start_as_current_span(SPAN_NAME) as span:
record_verification_span_attrs(span, thoughts)
def _span_attrs(span_exporter: InMemorySpanExporter) -> dict:
span = next((s for s in span_exporter.get_finished_spans() if s.name == SPAN_NAME), None)
assert span is not None, "expected fixture span to be recorded"
return dict(span.attributes or {})
def test_literal_reasoning_records_literal(span_exporter: InMemorySpanExporter) -> None:
"""The regression we care about most: verifier insisted on finding the
criterion's exact wording on the page and returned ``continue``. This is
the (verifier, literal) combination Part E aims to drive toward zero."""
_run_with_span("The page does not contain the exact phrase 'Your message has been sent'. user_goal_achieved=false.")
attrs = _span_attrs(span_exporter)
assert attrs.get("verification.reasoning_kind") == "literal"
def test_semantic_reasoning_records_semantic(span_exporter: InMemorySpanExporter) -> None:
_run_with_span("The page renders a thank-you confirmation, satisfying the goal's intent.")
attrs = _span_attrs(span_exporter)
assert attrs.get("verification.reasoning_kind") == "semantic"
def test_empty_reasoning_defaults_to_semantic(span_exporter: InMemorySpanExporter) -> None:
_run_with_span("")
attrs = _span_attrs(span_exporter)
assert attrs.get("verification.reasoning_kind") == "semantic"
def test_none_reasoning_defaults_to_semantic(span_exporter: InMemorySpanExporter) -> None:
_run_with_span(None)
attrs = _span_attrs(span_exporter)
assert attrs.get("verification.reasoning_kind") == "semantic"
@pytest.mark.parametrize(
"signal",
["exact", "literal", "verbatim", "word-for-word", "word for word"],
)
def test_every_literal_signal_flags_reasoning(signal: str, span_exporter: InMemorySpanExporter) -> None:
"""Shared classifier: each signal used by ``record_validation_span_attrs``
also flags verifier reasoning. Guards against future drift between the two
callers."""
_run_with_span(f"The goal does not appear {signal} on the page.")
attrs = _span_attrs(span_exporter)
assert attrs.get("verification.reasoning_kind") == "literal", signal

uv.lock (generated, 698 lines changed)

File diff suppressed because it is too large