diff --git a/docs/changelog.mdx b/docs/changelog.mdx index 1277c039b..4a5664c5c 100644 --- a/docs/changelog.mdx +++ b/docs/changelog.mdx @@ -13,6 +13,7 @@ keywords:

TIMELINE

+ Week of Apr 21, 2026 Week of Apr 14, 2026 v1.0.22 — Feb 26, 2026 v1.0.21 — Feb 24, 2026 @@ -26,7 +27,24 @@ keywords:

Changelog

New features, improvements, and fixes in Skyvern

- + + +## Improvements + +- **Copilot Live Status Narration** — The workflow editor copilot now displays a short live status line while it works, so you can see what the agent is doing instead of waiting in silence. ([#5567](https://github.com/Skyvern-AI/skyvern/pull/5567)) + +- **Renamed Browser Profile Reset Action** — The API endpoint for clearing a persisted browser session has been renamed from `/browser_session/refresh` to `/browser_session/reset_profile` for clarity. The old path remains available as a deprecated alias. ([#5572](https://github.com/Skyvern-AI/skyvern/pull/5572)) + +- **Fewer Screenshots for API Runs** — Screenshot capture frequency has been reduced for API-triggered runs, lowering storage overhead on long-running automations without affecting interactive sessions. ([#5565](https://github.com/Skyvern-AI/skyvern/pull/5565)) + +## Bug fixes + +- Fixed the workflow copilot incorrectly proposing a workflow it had just failed to build. The copilot now only surfaces an Accept/Reject panel when the last test actually passed. ([#5574](https://github.com/Skyvern-AI/skyvern/pull/5574)) +- Fixed 2FA entry failing on pages with custom overlay input components where the real input field is hidden. The agent now types digit keys directly when no interactable input exists. ([#5568](https://github.com/Skyvern-AI/skyvern/pull/5568)) + + + + ## New features @@ -34,12 +52,26 @@ keywords: - **Workflow-Level Error Code Mapping** — Workflows now support an `error_code_mapping` field that all blocks inherit automatically. Block-level mappings still take precedence, giving you a single place to define default error handling without repeating it on every block. ([#5506](https://github.com/Skyvern-AI/skyvern/pull/5506)) +- **AI Summarize for Workflow Run Outputs** — A magic-wand "Summarize with AI" button now appears next to block and workflow run output sections, generating a plain-language summary of raw JSON output on demand. ([#5556](https://github.com/Skyvern-AI/skyvern/pull/5556)) + +- **Reset Browser Profile API** — A new `POST /workflows/{workflow_permanent_id}/browser_session/reset_profile` endpoint clears a corrupted persisted browser session so the next run starts from a fresh browser state. ([#5544](https://github.com/Skyvern-AI/skyvern/pull/5544)) + +- **MCP OAuth for Remote Clients** — MCP clients (e.g. Claude Code, Codex) can now complete browser-based OAuth to authenticate with Skyvern without getting stuck in redirect chains. ([#5558](https://github.com/Skyvern-AI/skyvern/pull/5558)) + +- **OpenClaw MCP Setup** — `skyvern setup openclaw` and `skyvern mcp switch` now support OpenClaw as an MCP client target for both remote and local stdio configurations. ([#5536](https://github.com/Skyvern-AI/skyvern/pull/5536)) + ## Improvements - **Smarter Extraction Cache Keys** — The extraction cache now normalizes session-volatile tokens (random UUIDs, CSRF values, nonce query params, timestamps) before hashing cache keys. Semantically identical page visits now share cached extraction results, improving cache hit rates across workflow re-runs. ([#5504](https://github.com/Skyvern-AI/skyvern/pull/5504)) - **Reduced Rate-Limit Errors on Extraction Prompts** — Extraction prompts are now capped in size before being sent to the LLM. Oversized `previous_extracted_information`, `extracted_information_schema`, and element trees are truncated rather than sent whole, significantly reducing `429 RESOURCE_EXHAUSTED` errors on high-volume workflows. 
([#5502](https://github.com/Skyvern-AI/skyvern/pull/5502)) +- **Cross-Run Extraction Cache** — Extraction results are now cached in Redis across workflow runs, scoped to the workflow. Recurring scheduled workflows skip redundant LLM calls when the page content hasn't changed. ([#5562](https://github.com/Skyvern-AI/skyvern/pull/5562)) + +- **Copilot Survives Tab Switches and Reconnects** — The workflow copilot agent now runs to completion even if the browser tab is closed or the connection drops mid-request, and the AI reply is persisted so it appears when you reconnect. ([#5560](https://github.com/Skyvern-AI/skyvern/pull/5560)) + +- **Copilot Fails Fast on Unresolvable URLs** — The workflow copilot now immediately surfaces the real error (e.g. DNS resolution failure) and stops retrying when a URL cannot be reached, rather than reporting success. ([#5563](https://github.com/Skyvern-AI/skyvern/pull/5563)) + ## Bug fixes - Fixed CSV parsing failing on files whose first header row exceeded ~1 MB, causing valid wide-format CSVs to be rejected with "Could not determine delimiter". ([#5510](https://github.com/Skyvern-AI/skyvern/pull/5510)) @@ -48,7 +80,11 @@ keywords: - Fixed workflow file-download blocks losing their configured `download_suffix` at task finalization, causing downloaded files to be persisted without the expected filename suffix. ([#5499](https://github.com/Skyvern-AI/skyvern/pull/5499)) - Fixed cached scripts for non-ATS workflows incorrectly generating `fill_form()` calls, which defeated the performance benefits of caching by delegating back to AI at runtime. ([#5497](https://github.com/Skyvern-AI/skyvern/pull/5497)) - Fixed the pagination navigator on the Workflow Past Runs and Run History pages being right-aligned instead of centered. - +- Fixed run recordings missing an end timestamp and broken scrubbing. Recordings are now finalized with a stream-copy remux so players correctly show duration and support seeking. ([#5551](https://github.com/Skyvern-AI/skyvern/pull/5551)) +- Fixed tag input fields (e.g. skill set fields on job application forms) not selecting values from the dropdown. These fields are now correctly treated as auto-completion inputs. ([#5557](https://github.com/Skyvern-AI/skyvern/pull/5557)) +- Fixed per-step video sync spawning a redundant ffmpeg process on every step, causing CPU spikes and log noise. ([#5559](https://github.com/Skyvern-AI/skyvern/pull/5559)) +- Fixed workflow parameter edit and delete buttons being hidden when a parameter name was too long. +- Fixed webhook delivery failing silently when a stored webhook URL had a leading or trailing space. ([#5550](https://github.com/Skyvern-AI/skyvern/pull/5550)) diff --git a/fern/docs/changelog.mdx b/fern/docs/changelog.mdx index d9edd71b2..3454ac81c 100644 --- a/fern/docs/changelog.mdx +++ b/fern/docs/changelog.mdx @@ -7,6 +7,7 @@ mode: custom

TIMELINE

+ Week of Apr 21, 2026 Week of Apr 14, 2026 Week of Mar 30, 2026 v1.0.22 — Feb 26, 2026 @@ -21,7 +22,24 @@ mode: custom

Changelog

New features, improvements, and fixes in Skyvern

- + + +## Improvements + +- **Copilot Live Status Narration** — The workflow editor copilot now displays a short live status line while it works, so you can see what the agent is doing instead of waiting in silence. ([#5567](https://github.com/Skyvern-AI/skyvern/pull/5567)) + +- **Renamed Browser Profile Reset Action** — The API endpoint for clearing a persisted browser session has been renamed from `/browser_session/refresh` to `/browser_session/reset_profile` for clarity. The old path remains available as a deprecated alias. ([#5572](https://github.com/Skyvern-AI/skyvern/pull/5572)) + +- **Fewer Screenshots for API Runs** — Screenshot capture frequency has been reduced for API-triggered runs, lowering storage overhead on long-running automations without affecting interactive sessions. ([#5565](https://github.com/Skyvern-AI/skyvern/pull/5565)) + +## Bug fixes + +- Fixed the workflow copilot incorrectly proposing a workflow it had just failed to build. The copilot now only surfaces an Accept/Reject panel when the last test actually passed. ([#5574](https://github.com/Skyvern-AI/skyvern/pull/5574)) +- Fixed 2FA entry failing on pages with custom overlay input components where the real input field is hidden. The agent now types digit keys directly when no interactable input exists. ([#5568](https://github.com/Skyvern-AI/skyvern/pull/5568)) + + + + ## New features @@ -29,12 +47,26 @@ mode: custom - **Workflow-Level Error Code Mapping** — Workflows now support an `error_code_mapping` field that all blocks inherit automatically. Block-level mappings still take precedence, giving you a single place to define default error handling without repeating it on every block. ([#5506](https://github.com/Skyvern-AI/skyvern/pull/5506)) +- **AI Summarize for Workflow Run Outputs** — A magic-wand "Summarize with AI" button now appears next to block and workflow run output sections, generating a plain-language summary of raw JSON output on demand. ([#5556](https://github.com/Skyvern-AI/skyvern/pull/5556)) + +- **Reset Browser Profile API** — A new `POST /workflows/{workflow_permanent_id}/browser_session/reset_profile` endpoint clears a corrupted persisted browser session so the next run starts from a fresh browser state. ([#5544](https://github.com/Skyvern-AI/skyvern/pull/5544)) + +- **MCP OAuth for Remote Clients** — MCP clients (e.g. Claude Code, Codex) can now complete browser-based OAuth to authenticate with Skyvern without getting stuck in redirect chains. ([#5558](https://github.com/Skyvern-AI/skyvern/pull/5558)) + +- **OpenClaw MCP Setup** — `skyvern setup openclaw` and `skyvern mcp switch` now support OpenClaw as an MCP client target for both remote and local stdio configurations. ([#5536](https://github.com/Skyvern-AI/skyvern/pull/5536)) + ## Improvements - **Smarter Extraction Cache Keys** — The extraction cache now normalizes session-volatile tokens (random UUIDs, CSRF values, nonce query params, timestamps) before hashing cache keys. Semantically identical page visits now share cached extraction results, improving cache hit rates across workflow re-runs. ([#5504](https://github.com/Skyvern-AI/skyvern/pull/5504)) - **Reduced Rate-Limit Errors on Extraction Prompts** — Extraction prompts are now capped in size before being sent to the LLM. Oversized `previous_extracted_information`, `extracted_information_schema`, and element trees are truncated rather than sent whole, significantly reducing `429 RESOURCE_EXHAUSTED` errors on high-volume workflows. 
([#5502](https://github.com/Skyvern-AI/skyvern/pull/5502)) +- **Cross-Run Extraction Cache** — Extraction results are now cached in Redis across workflow runs, scoped to the workflow. Recurring scheduled workflows skip redundant LLM calls when the page content hasn't changed. ([#5562](https://github.com/Skyvern-AI/skyvern/pull/5562)) + +- **Copilot Survives Tab Switches and Reconnects** — The workflow copilot agent now runs to completion even if the browser tab is closed or the connection drops mid-request, and the AI reply is persisted so it appears when you reconnect. ([#5560](https://github.com/Skyvern-AI/skyvern/pull/5560)) + +- **Copilot Fails Fast on Unresolvable URLs** — The workflow copilot now immediately surfaces the real error (e.g. DNS resolution failure) and stops retrying when a URL cannot be reached, rather than reporting success. ([#5563](https://github.com/Skyvern-AI/skyvern/pull/5563)) + ## Bug fixes - Fixed CSV parsing failing on files whose first header row exceeded ~1 MB, causing valid wide-format CSVs to be rejected with "Could not determine delimiter". ([#5510](https://github.com/Skyvern-AI/skyvern/pull/5510)) @@ -43,6 +75,11 @@ mode: custom - Fixed workflow file-download blocks losing their configured `download_suffix` at task finalization, causing downloaded files to be persisted without the expected filename suffix. ([#5499](https://github.com/Skyvern-AI/skyvern/pull/5499)) - Fixed cached scripts for non-ATS workflows incorrectly generating `fill_form()` calls, which defeated the performance benefits of caching by delegating back to AI at runtime. ([#5497](https://github.com/Skyvern-AI/skyvern/pull/5497)) - Fixed the pagination navigator on the Workflow Past Runs and Run History pages being right-aligned instead of centered. +- Fixed run recordings missing an end timestamp and broken scrubbing. Recordings are now finalized with a stream-copy remux so players correctly show duration and support seeking. ([#5551](https://github.com/Skyvern-AI/skyvern/pull/5551)) +- Fixed tag input fields (e.g. skill set fields on job application forms) not selecting values from the dropdown. These fields are now correctly treated as auto-completion inputs. ([#5557](https://github.com/Skyvern-AI/skyvern/pull/5557)) +- Fixed per-step video sync spawning a redundant ffmpeg process on every step, causing CPU spikes and log noise. ([#5559](https://github.com/Skyvern-AI/skyvern/pull/5559)) +- Fixed workflow parameter edit and delete buttons being hidden when a parameter name was too long. +- Fixed webhook delivery failing silently when a stored webhook URL had a leading or trailing space. 
([#5550](https://github.com/Skyvern-AI/skyvern/pull/5550)) diff --git a/pyproject.toml b/pyproject.toml index 023bf6368..0a7d35958 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -59,6 +59,8 @@ dependencies = [ "tiktoken>=0.9.0", "anthropic>=0.50.0,<0.89", "google-cloud-aiplatform>=1.90.0,<2", + "google-auth>=2.49.2,<3", + "google-auth-oauthlib>=1.3.1,<2", "onepassword-sdk==0.4.0", "types-boto3[full]>=1.38.31,<2", "lark>=1.2.2,<2", diff --git a/skyvern-frontend/src/api/types.ts b/skyvern-frontend/src/api/types.ts index c6a15e9f2..9cf5b98c8 100644 --- a/skyvern-frontend/src/api/types.ts +++ b/skyvern-frontend/src/api/types.ts @@ -147,6 +147,12 @@ export type Task = { application: string | null; }; +export type FailureCategory = { + category: string; + confidence_float: number; + reasoning: string; +}; + export type TaskApiResponse = { request: CreateTaskRequest; task_id: string; @@ -157,6 +163,7 @@ export type TaskApiResponse = { screenshot_url: string | null; recording_url: string | null; failure_reason: string | null; + failure_category: Array<FailureCategory> | null; webhook_failure_reason: string | null; errors: Array<Record<string, unknown>>; max_steps_per_run: number | null; @@ -483,6 +490,7 @@ export type WorkflowRunStatusApiResponse = { recording_url: string | null; outputs: Record<string, unknown> | null; failure_reason: string | null; + failure_category: Array<FailureCategory> | null; webhook_failure_reason: string | null; downloaded_file_urls: Array<string> | null; total_steps: number | null; @@ -513,6 +521,7 @@ export type WorkflowRunStatusApiResponseWithWorkflow = { recording_url: string | null; outputs: Record<string, unknown> | null; failure_reason: string | null; + failure_category: Array<FailureCategory> | null; webhook_failure_reason: string | null; downloaded_file_urls: Array<string> | null; total_steps: number | null; diff --git a/skyvern-frontend/src/components/FailureCategoryBadge.tsx b/skyvern-frontend/src/components/FailureCategoryBadge.tsx new file mode 100644 index 000000000..c10af3995 --- /dev/null +++ b/skyvern-frontend/src/components/FailureCategoryBadge.tsx @@ -0,0 +1,27 @@ +import type { FailureCategory } from "@/api/types"; +import { Badge } from "@/components/ui/badge"; + +function formatCategoryLabel(category: string): string { + return category + .split("_") + .map((word) => word.charAt(0).toUpperCase() + word.slice(1).toLowerCase()) + .join(" "); +} + +type Props = { + failureCategory: Array<FailureCategory> | null; +}; + +function FailureCategoryBadge({ failureCategory }: Props) { + const primary = failureCategory?.[0]; + if (!primary) { + return null; + } + return ( + <Badge> + {formatCategoryLabel(primary.category)} + </Badge> + ); +} + +export { FailureCategoryBadge }; diff --git a/skyvern-frontend/src/routes/tasks/detail/TaskDetails.tsx b/skyvern-frontend/src/routes/tasks/detail/TaskDetails.tsx index e60d71f22..65a9d66b6 100644 --- a/skyvern-frontend/src/routes/tasks/detail/TaskDetails.tsx +++ b/skyvern-frontend/src/routes/tasks/detail/TaskDetails.tsx @@ -7,6 +7,7 @@ import { TaskApiResponse, WorkflowRunStatusApiResponse, } from "@/api/types"; +import { FailureCategoryBadge } from "@/components/FailureCategoryBadge"; import { Status404 } from "@/components/Status404"; import { StatusBadge } from "@/components/StatusBadge"; import { SwitchBarNavigation } from "@/components/SwitchBarNavigation"; @@ -177,7 +178,10 @@ function TaskDetails() { task?.status === Status.TimedOut; const failureReason = showFailureReason ? (
-          {failureReasonTitle}
+          <div>
+            {failureReasonTitle}
+            <FailureCategoryBadge failureCategory={workflowRun.failure_category} />
+          </div>
           {workflowRun.failure_reason}
{matchedTips} {shouldShowFinallyNote && ( diff --git a/skyvern-frontend/src/routes/workflows/editor/nodes/FileParserNode/FileParserNode.tsx b/skyvern-frontend/src/routes/workflows/editor/nodes/FileParserNode/FileParserNode.tsx index b850d413a..860bd99e7 100644 --- a/skyvern-frontend/src/routes/workflows/editor/nodes/FileParserNode/FileParserNode.tsx +++ b/skyvern-frontend/src/routes/workflows/editor/nodes/FileParserNode/FileParserNode.tsx @@ -24,6 +24,7 @@ import { ModelSelector } from "@/components/ModelSelector"; import { useRecordingStore } from "@/store/useRecordingStore"; const FILE_TYPE_OPTIONS: Array<{ value: FileParserFileType; label: string }> = [ + { value: "auto_detect", label: "Auto detect" }, { value: "csv", label: "CSV" }, { value: "excel", label: "Excel" }, { value: "pdf", label: "PDF" }, @@ -78,7 +79,9 @@ function FileParserNode({ id, data }: NodeProps) { function handleFileUrlChange(value: string) { const detected = detectFileTypeFromUrl(value); - if (detected) { + const currentType = data.fileType; + const canInfer = detected && (!currentType || currentType === detected); + if (canInfer) { update({ fileUrl: value, fileType: detected }); } else { update({ fileUrl: value }); diff --git a/skyvern-frontend/src/routes/workflows/editor/nodes/FileParserNode/types.ts b/skyvern-frontend/src/routes/workflows/editor/nodes/FileParserNode/types.ts index 71d146259..026f0d01c 100644 --- a/skyvern-frontend/src/routes/workflows/editor/nodes/FileParserNode/types.ts +++ b/skyvern-frontend/src/routes/workflows/editor/nodes/FileParserNode/types.ts @@ -6,7 +6,13 @@ import { WorkflowModel, } from "@/routes/workflows/types/workflowTypes"; -export type FileParserFileType = "csv" | "excel" | "pdf" | "image" | "docx"; +export type FileParserFileType = + | "auto_detect" + | "csv" + | "excel" + | "pdf" + | "image" + | "docx"; export type FileParserNodeData = NodeBaseData & { fileUrl: string; @@ -22,7 +28,7 @@ export const fileParserNodeDefaultData: FileParserNodeData = { editable: true, label: "", fileUrl: "", - fileType: "csv", + fileType: "auto_detect", continueOnFailure: false, jsonSchema: "null", model: null, diff --git a/skyvern-frontend/src/routes/workflows/editor/workflowEditorUtils.ts b/skyvern-frontend/src/routes/workflows/editor/workflowEditorUtils.ts index 69b66d950..5d5fbccdc 100644 --- a/skyvern-frontend/src/routes/workflows/editor/workflowEditorUtils.ts +++ b/skyvern-frontend/src/routes/workflows/editor/workflowEditorUtils.ts @@ -864,7 +864,7 @@ function convertToNode( data: { ...commonData, fileUrl: block.file_url, - fileType: block.file_type ?? "csv", + fileType: block.file_type ?? "auto_detect", jsonSchema: JSON.stringify(block.json_schema, null, 2), model: block.model, }, diff --git a/skyvern-frontend/src/routes/workflows/types/workflowTypes.ts b/skyvern-frontend/src/routes/workflows/types/workflowTypes.ts index db6e8ddbf..29ec08a65 100644 --- a/skyvern-frontend/src/routes/workflows/types/workflowTypes.ts +++ b/skyvern-frontend/src/routes/workflows/types/workflowTypes.ts @@ -410,7 +410,7 @@ export type SendEmailBlock = WorkflowBlockBase & { export type FileURLParserBlock = WorkflowBlockBase & { block_type: "file_url_parser"; file_url: string; - file_type: "csv" | "excel" | "pdf" | "image" | "docx"; + file_type: "auto_detect" | "csv" | "excel" | "pdf" | "image" | "docx"; json_schema: Record<string, unknown> | null; }; diff --git a/skyvern-frontend/src/routes/workflows/types/workflowYamlTypes.ts b/skyvern-frontend/src/routes/workflows/types/workflowYamlTypes.ts index 6ae6dd798..c02240627 100644 --- a/skyvern-frontend/src/routes/workflows/types/workflowYamlTypes.ts +++ b/skyvern-frontend/src/routes/workflows/types/workflowYamlTypes.ts @@ -352,7 +352,7 @@ export type SendEmailBlockYAML = BlockYAMLBase & { export type FileUrlParserBlockYAML = BlockYAMLBase & { block_type: "file_url_parser"; file_url: string; - file_type: "csv" | "excel" | "pdf" | "image" | "docx"; + file_type: "auto_detect" | "csv" | "excel" | "pdf" | "image" | "docx"; json_schema?: Record<string, unknown> | null; }; diff --git a/skyvern-frontend/src/types/fetch-to-curl.d.ts b/skyvern-frontend/src/types/fetch-to-curl.d.ts new file mode 100644 index 000000000..f58290465 --- /dev/null +++ b/skyvern-frontend/src/types/fetch-to-curl.d.ts @@ -0,0 +1,6 @@ +declare module "fetch-to-curl" { + // eslint-disable-next-line @typescript-eslint/no-explicit-any + function fetchToCurl(requestInfo: any, requestInit?: any): string; + export default fetchToCurl; + export { fetchToCurl }; +} diff --git a/skyvern-frontend/src/util/apiCommands.ts b/skyvern-frontend/src/util/apiCommands.ts index 9dd9e46c9..f7e40da1f 100644 --- a/skyvern-frontend/src/util/apiCommands.ts +++ b/skyvern-frontend/src/util/apiCommands.ts @@ -1,4 +1,4 @@ -import fetchToCurl from "fetch-to-curl"; +import { fetchToCurl } from "fetch-to-curl"; export interface ApiCommandOptions { method: string; diff --git a/skyvern/cli/core/mcp_http_auth.py b/skyvern/cli/core/mcp_http_auth.py index 233f941c2..08f963516 100644 --- a/skyvern/cli/core/mcp_http_auth.py +++ b/skyvern/cli/core/mcp_http_auth.py @@ -77,15 +77,24 @@ class _OAuthResolution: def _canonical_mcp_url(base_url: str | None = None) -> str: + """Return the canonical MCP resource URI with no trailing slash. + + MCP clients canonicalize the resource URI without a trailing slash for + RFC 8707 audience / RFC 9728 protected-resource comparison. Advertising + ``.../mcp/`` caused Claude's MCP SDK to reject discovery with + ``Protected resource .../mcp/ does not match expected .../mcp``. The + audience / resource-claim validators below normalize both sides before + comparing so tokens issued against either form still validate.
+ """ candidate = (base_url or settings.SKYVERN_BASE_URL or _DEFAULT_REMOTE_BASE_URL).strip() if not candidate: candidate = _DEFAULT_REMOTE_BASE_URL parts = urlsplit(candidate) path = (parts.path or "").rstrip("/") if path.endswith("/mcp"): - canonical_path = f"{path}/" + canonical_path = path else: - canonical_path = f"{path}/mcp/" if path else "/mcp/" + canonical_path = f"{path}/mcp" if path else "/mcp" return urlunsplit((parts.scheme, parts.netloc, canonical_path, "", "")) @@ -151,25 +160,49 @@ def _validate_token_issuer(payload: dict[str, object], expected_issuer: str) -> raise HTTPException(status_code=401, detail="Token issuer is not valid for this MCP resource") +def _normalize_resource(resource: str) -> str: + """Strip a trailing slash so audience / resource comparisons are slash-agnostic. + + Centralized here so ``_validate_token_audience``, + ``_validate_token_resource_claims``, and any future validator apply the + same normalization rule to both sides of the comparison. + """ + return resource.rstrip("/") + + def _validate_token_audience(payload: dict[str, object], expected_resource: str) -> None: audience = payload.get("aud") if isinstance(audience, str): audiences = [audience] elif isinstance(audience, list): + # RFC 7519 permits `aud` as string-or-array-of-strings; silently drop + # any non-string items in the array rather than rejecting. This is + # intentionally more permissive than `_validate_token_resource_claims` + # (where the single `resource` claim must be a string outright): + # a malformed array element here shouldn't poison an otherwise valid + # audience list. audiences = [item for item in audience if isinstance(item, str)] else: audiences = [] - if expected_resource not in audiences: + # Slash-agnostic compare: tokens whose `aud` was minted against either + # `.../mcp` or `.../mcp/` validate against either expected form. + expected_norm = _normalize_resource(expected_resource) + if not any(_normalize_resource(a) == expected_norm for a in audiences): raise HTTPException(status_code=401, detail="Token audience is not valid for this MCP resource") def _validate_token_resource_claims(payload: dict[str, object], expected_resource: str) -> None: + expected_norm = _normalize_resource(expected_resource) for key in _RESOURCE_CLAIM_KEYS: claim_value = payload.get(key) if claim_value is None: continue - if claim_value != expected_resource: + # A non-string `resource` claim is a malformed token, not a value + # mismatch — use a distinct error so the cause is obvious in logs. 
+ if not isinstance(claim_value, str): + raise HTTPException(status_code=401, detail="Token resource claim must be a string") + if _normalize_resource(claim_value) != expected_norm: raise HTTPException(status_code=401, detail="Token resource is not valid for this MCP resource") diff --git a/skyvern/cli/mcp_tools/blocks.py b/skyvern/cli/mcp_tools/blocks.py index 7c0ee0d14..9dea9e570 100644 --- a/skyvern/cli/mcp_tools/blocks.py +++ b/skyvern/cli/mcp_tools/blocks.py @@ -28,6 +28,8 @@ from skyvern.schemas.workflows import ( FileParserBlockYAML, FileUploadBlockYAML, ForLoopBlockYAML, + GoogleSheetsReadBlockYAML, + GoogleSheetsWriteBlockYAML, HttpRequestBlockYAML, HumanInteractionBlockYAML, LoginBlockYAML, @@ -78,6 +80,8 @@ BLOCK_TYPE_MAP: dict[str, type[BlockYAML]] = { BlockType.HUMAN_INTERACTION.value: HumanInteractionBlockYAML, BlockType.PRINT_PAGE.value: PrintPageBlockYAML, BlockType.WORKFLOW_TRIGGER.value: WorkflowTriggerBlockYAML, + BlockType.GOOGLE_SHEETS_READ.value: GoogleSheetsReadBlockYAML, + BlockType.GOOGLE_SHEETS_WRITE.value: GoogleSheetsWriteBlockYAML, } # --------------------------------------------------------------------------- @@ -107,6 +111,8 @@ BLOCK_SUMMARIES: dict[str, str] = { "human_interaction": "Pause workflow for human approval via email", "print_page": "Print the current page to PDF", "workflow_trigger": "Trigger another workflow by permanent ID, with optional payload and wait-for-completion", + "google_sheets_read": "Read rows from a Google Sheet as structured data (list of dicts)", + "google_sheets_write": "Write rows to a Google Sheet (append new rows or update existing cells)", } # --------------------------------------------------------------------------- @@ -211,6 +217,25 @@ BLOCK_EXAMPLES: dict[str, dict[str, Any]] = { "payload": {"url": "{{ some_parameter }}"}, "wait_for_completion": True, }, + "google_sheets_read": { + "block_type": "google_sheets_read", + "label": "read_sheet_data", + "spreadsheet_url": "https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit", + "sheet_name": "Sheet1", + "range": "A1:D100", + "credential_id": "{{ google_credential_id }}", + "has_header_row": True, + }, + "google_sheets_write": { + "block_type": "google_sheets_write", + "label": "write_sheet_data", + "spreadsheet_url": "https://docs.google.com/spreadsheets/d/SPREADSHEET_ID/edit", + "sheet_name": "Sheet1", + "range": "A1", + "credential_id": "{{ google_credential_id }}", + "write_mode": "append", + "values": "{{ output_data | tojson }}", + }, } # --------------------------------------------------------------------------- diff --git a/skyvern/exceptions.py b/skyvern/exceptions.py index 1c539a798..770fd8fd5 100644 --- a/skyvern/exceptions.py +++ b/skyvern/exceptions.py @@ -433,6 +433,16 @@ class ScrapingFailedBlankPage(ScrapingFailed): super().__init__(reason="It's a blank page. Please ensure there is a non-blank page for Skyvern to work with.") +class MissingStarterUrl(SkyvernException): + def __init__(self, block_label: str | None = None) -> None: + self.block_label = block_label + location = f"block '{block_label}'" if block_label else "the first browser block" + super().__init__( + f"{location} has no starting URL set. The first browser block must have a URL to navigate to. " + "Set a URL on the block, or reference a workflow parameter (e.g. '{{ starting_url }}')." 
+ ) + + class WorkflowRunContextNotInitialized(SkyvernException): def __init__(self, workflow_run_id: str) -> None: super().__init__(f"WorkflowRunContext not initialized for workflow run {workflow_run_id}") diff --git a/skyvern/forge/agent.py b/skyvern/forge/agent.py index 0337be12c..4a00b2c09 100644 --- a/skyvern/forge/agent.py +++ b/skyvern/forge/agent.py @@ -94,7 +94,7 @@ from skyvern.forge.sdk.schemas.files import FileInfo from skyvern.forge.sdk.schemas.organizations import Organization from skyvern.forge.sdk.schemas.tasks import Task, TaskRequest, TaskResponse, TaskStatus from skyvern.forge.sdk.schemas.totp_codes import OTPType -from skyvern.forge.sdk.trace import traced +from skyvern.forge.sdk.trace import apply_context_attrs, traced from skyvern.forge.sdk.workflow.context_manager import WorkflowRunContext from skyvern.forge.sdk.workflow.models.block import ( ActionBlock, @@ -423,7 +423,12 @@ class ForgeAgent: operations = await app.AGENT_FUNCTION.generate_async_operations(organization, task, page) self.async_operation_pool.add_operations(task.task_id, operations) - @traced(name="skyvern.agent.execute_step") + # Span name intentionally differs from the method: this is the outer wrapper + # that handles cancellation checks, status updates, retries, and cleanup around + # the inner step body. "step_orchestration" reads better on the trace dashboard + # than "execute_step", which was easy to confuse with the inner span below. + # code.function still carries the qualname for grep. + @traced(name="skyvern.agent.step_orchestration", role="wrapper") async def execute_step( self, organization: Organization, @@ -1084,7 +1089,12 @@ class ForgeAgent: ) return True - @traced(name="skyvern.agent.step") + # Span name intentionally differs from the method: this is the inner body + # that actually runs a step (scrape, LLM call, action execution). It sits + # under the "step_orchestration" span above. "step_body" vs the old + # "skyvern.agent.step" makes the parent/child relationship obvious at a glance + # on the dashboard. code.function still carries the qualname for grep. + @traced(name="skyvern.agent.step_body", role="wrapper") async def agent_step( self, task: Task, @@ -2546,26 +2556,34 @@ class ForgeAgent: await skyvern_frame.safe_scroll_to_x_y(x, y) LOG.debug("Scrolled back to the original x, y position of the page after taking screenshot", x=x, y=y) _ctx = skyvern_context.current() - if step and not step.is_speculative and _ctx and _ctx.use_artifact_bundling: - ids = app.ARTIFACT_MANAGER.accumulate_screenshot_to_step_archive( - step=step, - screenshots=[screenshot], - artifact_type=ArtifactType.SCREENSHOT_ACTION, - ) - if ids: - screenshot_artifact_id = ids[0] - else: - screenshot_request = await app.ARTIFACT_MANAGER.prepare_llm_artifact( - data=screenshot, - artifact_type=ArtifactType.SCREENSHOT_ACTION, - step=step, - ) - if screenshot_request: - artifacts.append(screenshot_request) - for artifact_data in screenshot_request.artifacts: - if artifact_data.artifact_model.artifact_type == ArtifactType.SCREENSHOT_ACTION: - screenshot_artifact_id = artifact_data.artifact_model.artifact_id - break + # Derive bundled from the actual branch condition, not just the context flag — + # step=None and speculative steps force the non-bundled path. 
+ _bundled = bool(step and not step.is_speculative and _ctx and _ctx.use_artifact_bundling) + _tracer = otel_trace.get_tracer("skyvern") + with _tracer.start_as_current_span("skyvern.agent.artifact.screenshot_action") as _ss_art_span: + apply_context_attrs(_ss_art_span) + _ss_art_span.set_attribute("screenshot_bytes", len(screenshot)) + _ss_art_span.set_attribute("bundled", _bundled) + if _bundled: + ids = app.ARTIFACT_MANAGER.accumulate_screenshot_to_step_archive( + step=step, + screenshots=[screenshot], + artifact_type=ArtifactType.SCREENSHOT_ACTION, + ) + if ids: + screenshot_artifact_id = ids[0] + else: + screenshot_request = await app.ARTIFACT_MANAGER.prepare_llm_artifact( + data=screenshot, + artifact_type=ArtifactType.SCREENSHOT_ACTION, + step=step, + ) + if screenshot_request: + artifacts.append(screenshot_request) + for artifact_data in screenshot_request.artifacts: + if artifact_data.artifact_model.artifact_type == ArtifactType.SCREENSHOT_ACTION: + screenshot_artifact_id = artifact_data.artifact_model.artifact_id + break except Exception: LOG.error( "Failed to record screenshot after action", @@ -2576,24 +2594,38 @@ class ForgeAgent: skyvern_frame = await SkyvernFrame.create_instance(frame=working_page) html = await skyvern_frame.get_content() _ctx = skyvern_context.current() - if step and not step.is_speculative and _ctx and _ctx.use_artifact_bundling: - app.ARTIFACT_MANAGER.accumulate_action_html_to_archive(step=step, html_action=html.encode("utf-8")) - else: - artifacts.append( - await app.ARTIFACT_MANAGER.prepare_llm_artifact( - data=html.encode(), - artifact_type=ArtifactType.HTML_ACTION, - step=step, + # Encode once to fix the html_bytes char-vs-byte mismatch and avoid a + # second pass through html.encode() in the chosen branch. + html_bytes = html.encode("utf-8") + _bundled = bool(step and not step.is_speculative and _ctx and _ctx.use_artifact_bundling) + _tracer = otel_trace.get_tracer("skyvern") + with _tracer.start_as_current_span("skyvern.agent.artifact.html_action") as _html_art_span: + apply_context_attrs(_html_art_span) + _html_art_span.set_attribute("html_bytes", len(html_bytes)) + _html_art_span.set_attribute("bundled", _bundled) + if _bundled: + app.ARTIFACT_MANAGER.accumulate_action_html_to_archive(step=step, html_action=html_bytes) + else: + artifacts.append( + await app.ARTIFACT_MANAGER.prepare_llm_artifact( + data=html_bytes, + artifact_type=ArtifactType.HTML_ACTION, + step=step, + ) ) - ) except Exception: LOG.exception("Failed to record html after action") if artifacts: - try: - await app.ARTIFACT_MANAGER.bulk_create_artifacts(artifacts) - except Exception: - LOG.warning("Failed to bulk create artifacts after action", exc_info=True) + _tracer = otel_trace.get_tracer("skyvern") + with _tracer.start_as_current_span("skyvern.agent.artifact.bulk_create") as _bulk_span: + apply_context_attrs(_bulk_span) + # Count underlying artifacts (main + screenshots per request), not request wrappers. 
+ _bulk_span.set_attribute("artifact_count", sum(len(a.artifacts) for a in artifacts if a is not None)) + try: + await app.ARTIFACT_MANAGER.bulk_create_artifacts(artifacts) + except Exception: + LOG.warning("Failed to bulk create artifacts after action", exc_info=True) if screenshot_artifact_id and action.action_id and action.organization_id: action.screenshot_artifact_id = screenshot_artifact_id @@ -2608,19 +2640,22 @@ class ForgeAgent: artifact_id=screenshot_artifact_id, ) else: - try: - await app.DATABASE.artifacts.update_action_screenshot_artifact_id( - organization_id=action.organization_id, - action_id=action.action_id, - screenshot_artifact_id=screenshot_artifact_id, - ) - except Exception: - LOG.warning( - "Failed to update action with screenshot artifact id", - action_id=action.action_id, - screenshot_artifact_id=screenshot_artifact_id, - exc_info=True, - ) + _tracer = otel_trace.get_tracer("skyvern") + with _tracer.start_as_current_span("skyvern.agent.artifact.update_action_screenshot_fk") as _fk_span: + apply_context_attrs(_fk_span) + try: + await app.DATABASE.artifacts.update_action_screenshot_artifact_id( + organization_id=action.organization_id, + action_id=action.action_id, + screenshot_artifact_id=screenshot_artifact_id, + ) + except Exception: + LOG.warning( + "Failed to update action with screenshot artifact id", + action_id=action.action_id, + screenshot_artifact_id=screenshot_artifact_id, + exc_info=True, + ) async def initialize_execution_state( self, @@ -2705,7 +2740,7 @@ class ForgeAgent: scroll=scroll, ) - @traced(name="skyvern.agent.scrape_and_prompt") + @traced(name="skyvern.agent.scrape_and_prompt", role="wrapper") async def build_and_record_step_prompt( self, task: Task, @@ -3339,6 +3374,14 @@ class ForgeAgent: # Map template to prompt_name for logging/caching guards prompt_name = EXTRACT_ACTION_PROMPT_NAME if template == EXTRACT_ACTION_TEMPLATE else template + + # Tag the span so we can explain the 26x p95/p50 variance — prompt size + # is almost always the reason prompt_build gets slow. + _prompt_build_span = otel_trace.get_current_span() + _prompt_build_span.set_attribute("prompt_name", prompt_name) + _prompt_build_span.set_attribute("prompt_tokens", count_tokens(full_prompt)) + _prompt_build_span.set_attribute("use_caching", bool(use_caching)) + return full_prompt, use_caching, prompt_name async def _get_prompt_caching_settings(self, context: SkyvernContext) -> dict[str, bool]: @@ -3700,37 +3743,42 @@ class ForgeAgent: LOG.exception("Failed to take screenshot before sending task response") if task.organization_id: - try: - # Keep both finalize and save inside a single timeout budget so a hung - # finalize call cannot block persistence forever; accept the trade-off - # that a very slow finalize on many files could crowd out save. - async with asyncio.timeout(SAVE_DOWNLOADED_FILES_TIMEOUT): - if download_suffix and list_files_before is not None: - await self._finalize_downloaded_files_for_task( - task, + _tracer = otel_trace.get_tracer("skyvern") + with _tracer.start_as_current_span("skyvern.agent.cleanup.save_downloaded_files") as _cl_save_span: + apply_context_attrs(_cl_save_span) + try: + # Keep both finalize and save inside a single timeout budget so a hung + # finalize call cannot block persistence forever; accept the trade-off + # that a very slow finalize on many files could crowd out save. 
+ async with asyncio.timeout(SAVE_DOWNLOADED_FILES_TIMEOUT): + if download_suffix and list_files_before is not None: + await self._finalize_downloaded_files_for_task( + task, + organization_id=task.organization_id, + download_suffix=download_suffix, + list_files_before=list_files_before, + randomize_if_missing=False, + ) + context = skyvern_context.current() + await app.STORAGE.save_downloaded_files( organization_id=task.organization_id, - download_suffix=download_suffix, - list_files_before=list_files_before, - randomize_if_missing=False, + run_id=context.run_id + if context and context.run_id + else task.workflow_run_id or task.task_id, ) - context = skyvern_context.current() - await app.STORAGE.save_downloaded_files( - organization_id=task.organization_id, - run_id=context.run_id if context and context.run_id else task.workflow_run_id or task.task_id, + except asyncio.TimeoutError: + LOG.warning( + "Timeout to save downloaded files", + task_id=task.task_id, + workflow_run_id=task.workflow_run_id, + ) + except Exception: + LOG.warning( + "Failed to save downloaded files", + exc_info=True, + task_id=task.task_id, + workflow_run_id=task.workflow_run_id, ) - except asyncio.TimeoutError: - LOG.warning( - "Timeout to save downloaded files", - task_id=task.task_id, - workflow_run_id=task.workflow_run_id, - ) - except Exception: - LOG.warning( - "Failed to save downloaded files", - exc_info=True, - task_id=task.task_id, - workflow_run_id=task.workflow_run_id, - ) # if it's a task block from workflow run, # we don't need to close the browser, save browser artifacts, or call webhook @@ -3744,15 +3792,22 @@ class ForgeAgent: await self.async_operation_pool.remove_task(task.task_id) - await self.cleanup_browser_and_create_artifacts( - close_browser_on_completion, last_step, task, browser_session_id=browser_session_id - ) + _tracer = otel_trace.get_tracer("skyvern") + with _tracer.start_as_current_span("skyvern.agent.cleanup.browser_and_artifacts") as _cl_br_span: + apply_context_attrs(_cl_br_span) + await self.cleanup_browser_and_create_artifacts( + close_browser_on_completion, last_step, task, browser_session_id=browser_session_id + ) # Wait for all tasks to complete before generating the links for the artifacts - await app.ARTIFACT_MANAGER.wait_for_upload_aiotasks([task.task_id]) + with _tracer.start_as_current_span("skyvern.agent.cleanup.wait_for_upload") as _cl_wait_span: + apply_context_attrs(_cl_wait_span) + await app.ARTIFACT_MANAGER.wait_for_upload_aiotasks([task.task_id]) if need_call_webhook: - await self.execute_task_webhook(task=task, api_key=api_key) + with _tracer.start_as_current_span("skyvern.agent.cleanup.webhook") as _cl_wh_span: + apply_context_attrs(_cl_wh_span) + await self.execute_task_webhook(task=task, api_key=api_key) async def execute_task_webhook( self, diff --git a/skyvern/forge/agent_functions.py b/skyvern/forge/agent_functions.py index 5be783810..2e28a35de 100644 --- a/skyvern/forge/agent_functions.py +++ b/skyvern/forge/agent_functions.py @@ -606,6 +606,96 @@ class AgentFunction: Cloud override provides actual solving; OSS base is a no-op.""" return False + async def get_google_sheets_credentials( + self, + organization_id: str, + credential_id: str, + ) -> str | None: + """Get a Google Sheets access token for the given credential. + + Returns None in OSS. Cloud override uses the OAuth service to + decrypt the stored refresh token and exchange it for an access token. 
+ """ + return None + + async def get_google_workspace_credentials( + self, + organization_id: str, + credential_id: str, + required_scopes: list[str] | None = None, + ) -> object | None: + """OSS no-op; cloud override returns a refreshed google.oauth2.credentials.Credentials or None.""" + return None + + async def ensure_sheet_tab( + self, + *, + access_token: str, + spreadsheet_id: str, + title: str, + ) -> int | None: + """Ensure a sheet tab with the given title exists in the spreadsheet. + + Returns the sheet_id of the newly created tab, or None if the caller + should fall back to its own lookup (e.g. a concurrent creator won the + race). OSS base is a no-op that returns None; cloud override calls the + Sheets v4 batchUpdate addSheet endpoint. + """ + return None + + async def google_sheets_values_get( + self, + *, + access_token: str, + spreadsheet_id: str, + ranges: str, + fields: str | None = None, + ) -> dict[str, Any] | None: + """Read ranges from a spreadsheet via spreadsheets.get. OSS no-op.""" + return None + + async def google_sheets_values_append( + self, + *, + access_token: str, + spreadsheet_id: str, + range_: str, + values: list[list[Any]], + ) -> dict[str, Any] | None: + """Append rows via spreadsheets.values.append. OSS no-op.""" + return None + + async def google_sheets_values_update( + self, + *, + access_token: str, + spreadsheet_id: str, + range_: str, + values: list[list[Any]], + ) -> dict[str, Any] | None: + """Update rows via spreadsheets.values.update. OSS no-op.""" + return None + + async def google_sheets_batch_update( + self, + *, + access_token: str, + spreadsheet_id: str, + requests: list[dict[str, Any]], + ) -> dict[str, Any] | None: + """Apply a batchUpdate to a spreadsheet. OSS no-op.""" + return None + + async def google_sheets_get_sheet_id( + self, + *, + access_token: str, + spreadsheet_id: str, + sheet_title: str, + ) -> int | None: + """Resolve a tab title to its numeric sheetId. OSS no-op.""" + return None + async def generate_async_operations( self, organization: Organization, @@ -621,7 +711,7 @@ class AgentFunction: ) -> CleanupElementTreeFunc: MAX_ELEMENT_CNT = settings.SVG_MAX_PARSING_ELEMENT_CNT - @traced() + @traced(name="skyvern.agent.cleanup_element_tree") async def cleanup_element_tree_func(frame: Page | Frame, url: str, element_tree: list[dict]) -> list[dict]: """ Remove rect and attribute.unique_id from the elements. diff --git a/skyvern/forge/sdk/api/llm/api_handler_factory.py b/skyvern/forge/sdk/api/llm/api_handler_factory.py index 8b3dddf70..004a6c1b8 100644 --- a/skyvern/forge/sdk/api/llm/api_handler_factory.py +++ b/skyvern/forge/sdk/api/llm/api_handler_factory.py @@ -49,7 +49,7 @@ from skyvern.forge.sdk.core.skyvern_context import SkyvernContext from skyvern.forge.sdk.models import SpeculativeLLMMetadata, Step from skyvern.forge.sdk.schemas.ai_suggestions import AISuggestion from skyvern.forge.sdk.schemas.task_v2 import TaskV2, Thought -from skyvern.forge.sdk.trace import traced +from skyvern.forge.sdk.trace import apply_context_attrs, traced from skyvern.utils.image_resizer import Resolution, get_resize_target_dimension, resize_screenshots LOG = structlog.get_logger() @@ -590,7 +590,9 @@ class LLMAPIHandlerFactory: _llm_span.set_attribute("llm_key", llm_key) _llm_span.set_attribute("llm_model", main_model_group) _llm_span.set_attribute("prompt_name", prompt_name) - _llm_span.set_attribute("handler_type", "router") + # handler_type distinguishes the three LLM entry points that share + # the skyvern.llm.request span name. 
Dashboards filter on this attr. + _llm_span.set_attribute("handler_type", "router_with_fallback") if parameters is None: parameters = LLMAPIHandlerFactory.get_api_parameters(llm_config) @@ -647,24 +649,32 @@ class LLMAPIHandlerFactory: ) llm_prompt_value = prompt - if should_persist_llm_artifacts: - if _should_bundle: - _bundle_prompt = llm_prompt_value.encode("utf-8") - if screenshots and step: - app.ARTIFACT_MANAGER.accumulate_screenshot_to_step_archive( - step=step, - screenshots=screenshots, - artifact_type=ArtifactType.SCREENSHOT_LLM, + # Pre-request artifact persistence cluster. Covers prompt + screenshot + # staging before the LLM call. The hot cost is inside the non-bundled + # branch (prepare_llm_artifact → S3 upload). + _tracer = otel_trace.get_tracer("skyvern") + with _tracer.start_as_current_span("skyvern.llm.artifact.pre_request") as _pre_span: + apply_context_attrs(_pre_span) + _pre_span.set_attribute("bundled", bool(_should_bundle)) + _pre_span.set_attribute("persist", bool(should_persist_llm_artifacts)) + if should_persist_llm_artifacts: + if _should_bundle: + _bundle_prompt = llm_prompt_value.encode("utf-8") + if screenshots and step: + app.ARTIFACT_MANAGER.accumulate_screenshot_to_step_archive( + step=step, + screenshots=screenshots, + artifact_type=ArtifactType.SCREENSHOT_LLM, + ) + else: + artifacts.append( + await app.ARTIFACT_MANAGER.prepare_llm_artifact( + data=llm_prompt_value.encode("utf-8"), + artifact_type=ArtifactType.LLM_PROMPT, + screenshots=screenshots, + **artifact_targets, + ) ) - else: - artifacts.append( - await app.ARTIFACT_MANAGER.prepare_llm_artifact( - data=llm_prompt_value.encode("utf-8"), - artifact_type=ArtifactType.LLM_PROMPT, - screenshots=screenshots, - **artifact_targets, - ) - ) # Build messages and apply caching in one step messages = await llm_messages_builder(prompt, screenshots, llm_config.add_assistant_prefix) @@ -845,7 +855,7 @@ class LLMAPIHandlerFactory: raise LLMProviderErrorRetryableTask(llm_key, cause=e) from e except litellm.exceptions.ContextWindowExceededError as e: duration_seconds = time.perf_counter() - start_time - _llm_span.set_attribute("status", "error") + _llm_span.set_attribute("status", "context_exceeded") LOG.exception( "Context window exceeded", llm_key=llm_key, @@ -887,6 +897,17 @@ class LLMAPIHandlerFactory: duration_seconds=_duration, ) raise LLMProviderError(llm_key) from None + except litellm.exceptions.RateLimitError as e: + duration_seconds = time.perf_counter() - start_time + _llm_span.set_attribute("status", "rate_limited") + LOG.warning( + "LLM request rate limited", + llm_key=llm_key, + model=main_model_group, + prompt_name=prompt_name, + duration_seconds=duration_seconds, + ) + raise LLMProviderError(llm_key, cause=e) from e except Exception as e: duration_seconds = time.perf_counter() - start_time _llm_span.set_attribute("status", "error") @@ -1066,24 +1087,35 @@ class LLMAPIHandlerFactory: return parsed_response finally: - try: - await app.ARTIFACT_MANAGER.bulk_create_artifacts(artifacts) - except Exception: - LOG.error("Failed to persist artifacts", exc_info=True) - if _should_bundle and should_persist_llm_artifacts and step: - _ctx = skyvern_context.current() - app.ARTIFACT_MANAGER.accumulate_llm_call_to_archive( - step=step, - workflow_run_id=_ctx.workflow_run_id if _ctx else None, - workflow_run_block_id=_ctx.parent_workflow_run_block_id if _ctx else None, - run_id=_ctx.run_id if _ctx else None, - hashed_href_map=_bundle_hashed_href_map, - prompt=_bundle_prompt, - request=_bundle_request, - 
response=_bundle_response, - parsed_response=_bundle_parsed, - rendered_response=_bundle_rendered, + # Post-response artifact persistence cluster. bulk_create_artifacts + # does the S3 uploads + DB rows for every LLM artifact queued during + # this request; the bundled-path archive accumulate is fast in-memory. + _tracer = otel_trace.get_tracer("skyvern") + with _tracer.start_as_current_span("skyvern.llm.artifact.post_response") as _post_span: + apply_context_attrs(_post_span) + _post_span.set_attribute("bundled", bool(_should_bundle)) + # Count underlying artifacts (main + screenshots per request), not request wrappers. + _post_span.set_attribute( + "artifact_count", sum(len(a.artifacts) for a in artifacts if a is not None) ) + try: + await app.ARTIFACT_MANAGER.bulk_create_artifacts(artifacts) + except Exception: + LOG.error("Failed to persist artifacts", exc_info=True) + if _should_bundle and should_persist_llm_artifacts and step: + _ctx = skyvern_context.current() + app.ARTIFACT_MANAGER.accumulate_llm_call_to_archive( + step=step, + workflow_run_id=_ctx.workflow_run_id if _ctx else None, + workflow_run_block_id=_ctx.parent_workflow_run_block_id if _ctx else None, + run_id=_ctx.run_id if _ctx else None, + hashed_href_map=_bundle_hashed_href_map, + prompt=_bundle_prompt, + request=_bundle_request, + response=_bundle_response, + parsed_response=_bundle_parsed, + rendered_response=_bundle_rendered, + ) llm_api_handler_with_router_and_fallback.llm_key = llm_key # type: ignore[attr-defined] LLMAPIHandlerFactory._router_handler_cache[llm_key] = llm_api_handler_with_router_and_fallback @@ -1135,7 +1167,9 @@ class LLMAPIHandlerFactory: _assert_step_thought_exclusive(step, thought) start_time = time.perf_counter() _llm_span = otel_trace.get_current_span() - _llm_span.set_attribute("handler_type", "default") + # handler_type distinguishes the three LLM entry points that share + # the skyvern.llm.request span name. Dashboards filter on this attr. + _llm_span.set_attribute("handler_type", "single_handler") _llm_span.set_attribute("llm_key", llm_key) _llm_span.set_attribute("llm_model", llm_config.model_name) _llm_span.set_attribute("prompt_name", prompt_name) @@ -1202,24 +1236,32 @@ class LLMAPIHandlerFactory: ) llm_prompt_value = prompt - if should_persist_llm_artifacts: - if _should_bundle: - _bundle_prompt = llm_prompt_value.encode("utf-8") - if screenshots and step: - app.ARTIFACT_MANAGER.accumulate_screenshot_to_step_archive( - step=step, - screenshots=screenshots, - artifact_type=ArtifactType.SCREENSHOT_LLM, + # Pre-request artifact persistence cluster. Covers prompt + screenshot + # staging before the LLM call. The hot cost is inside the non-bundled + # branch (prepare_llm_artifact → S3 upload). 
+ _tracer = otel_trace.get_tracer("skyvern") + with _tracer.start_as_current_span("skyvern.llm.artifact.pre_request") as _pre_span: + apply_context_attrs(_pre_span) + _pre_span.set_attribute("bundled", bool(_should_bundle)) + _pre_span.set_attribute("persist", bool(should_persist_llm_artifacts)) + if should_persist_llm_artifacts: + if _should_bundle: + _bundle_prompt = llm_prompt_value.encode("utf-8") + if screenshots and step: + app.ARTIFACT_MANAGER.accumulate_screenshot_to_step_archive( + step=step, + screenshots=screenshots, + artifact_type=ArtifactType.SCREENSHOT_LLM, + ) + else: + artifacts.append( + await app.ARTIFACT_MANAGER.prepare_llm_artifact( + data=llm_prompt_value.encode("utf-8"), + artifact_type=ArtifactType.LLM_PROMPT, + screenshots=screenshots, + **artifact_targets, + ) ) - else: - artifacts.append( - await app.ARTIFACT_MANAGER.prepare_llm_artifact( - data=llm_prompt_value.encode("utf-8"), - artifact_type=ArtifactType.LLM_PROMPT, - screenshots=screenshots, - **artifact_targets, - ) - ) if not llm_config.supports_vision: screenshots = None @@ -1355,7 +1397,7 @@ class LLMAPIHandlerFactory: raise LLMProviderErrorRetryableTask(llm_key, cause=e) from e except litellm.exceptions.ContextWindowExceededError as e: duration_seconds = time.perf_counter() - start_time - _llm_span.set_attribute("status", "error") + _llm_span.set_attribute("status", "context_exceeded") LOG.exception( "Context window exceeded", llm_key=llm_key, @@ -1388,6 +1430,17 @@ class LLMAPIHandlerFactory: duration=t_llm_cancelled - t_llm_request, ) raise LLMProviderError(llm_key) from None + except litellm.exceptions.RateLimitError as e: + duration_seconds = time.perf_counter() - start_time + _llm_span.set_attribute("status", "rate_limited") + LOG.warning( + "LLM request rate limited", + llm_key=llm_key, + model=model_name, + prompt_name=prompt_name, + duration_seconds=duration_seconds, + ) + raise LLMProviderError(llm_key, cause=e) from e except Exception as e: duration_seconds = time.perf_counter() - start_time _llm_span.set_attribute("status", "error") @@ -1565,24 +1618,35 @@ class LLMAPIHandlerFactory: return parsed_response finally: - try: - await app.ARTIFACT_MANAGER.bulk_create_artifacts(artifacts) - except Exception: - LOG.error("Failed to persist artifacts", exc_info=True) - if _should_bundle and should_persist_llm_artifacts and step: - _ctx = skyvern_context.current() - app.ARTIFACT_MANAGER.accumulate_llm_call_to_archive( - step=step, - workflow_run_id=_ctx.workflow_run_id if _ctx else None, - workflow_run_block_id=_ctx.parent_workflow_run_block_id if _ctx else None, - run_id=_ctx.run_id if _ctx else None, - hashed_href_map=_bundle_hashed_href_map, - prompt=_bundle_prompt, - request=_bundle_request, - response=_bundle_response, - parsed_response=_bundle_parsed, - rendered_response=_bundle_rendered, + # Post-response artifact persistence cluster. bulk_create_artifacts + # does the S3 uploads + DB rows for every LLM artifact queued during + # this request; the bundled-path archive accumulate is fast in-memory. + _tracer = otel_trace.get_tracer("skyvern") + with _tracer.start_as_current_span("skyvern.llm.artifact.post_response") as _post_span: + apply_context_attrs(_post_span) + _post_span.set_attribute("bundled", bool(_should_bundle)) + # Count underlying artifacts (main + screenshots per request), not request wrappers. 
+ _post_span.set_attribute( + "artifact_count", sum(len(a.artifacts) for a in artifacts if a is not None) ) + try: + await app.ARTIFACT_MANAGER.bulk_create_artifacts(artifacts) + except Exception: + LOG.error("Failed to persist artifacts", exc_info=True) + if _should_bundle and should_persist_llm_artifacts and step: + _ctx = skyvern_context.current() + app.ARTIFACT_MANAGER.accumulate_llm_call_to_archive( + step=step, + workflow_run_id=_ctx.workflow_run_id if _ctx else None, + workflow_run_block_id=_ctx.parent_workflow_run_block_id if _ctx else None, + run_id=_ctx.run_id if _ctx else None, + hashed_href_map=_bundle_hashed_href_map, + prompt=_bundle_prompt, + request=_bundle_request, + response=_bundle_response, + parsed_response=_bundle_parsed, + rendered_response=_bundle_rendered, + ) llm_api_handler.llm_key = llm_key # type: ignore[attr-defined] return llm_api_handler @@ -1702,7 +1766,9 @@ class LLMCaller: _llm_span.set_attribute("llm_key", self.llm_key) _llm_span.set_attribute("llm_model", self.llm_config.model_name) _llm_span.set_attribute("prompt_name", prompt_name or "") - _llm_span.set_attribute("handler_type", "llm_caller") + # handler_type distinguishes the three LLM entry points that share + # the skyvern.llm.request span name. Dashboards filter on this attr. + _llm_span.set_attribute("handler_type", "direct_call") active_parameters = self.base_parameters or {} if parameters is None: parameters = LLMAPIHandlerFactory.get_api_parameters(self.llm_config) @@ -1759,27 +1825,43 @@ class LLMCaller: tool["display_height_px"] = target_dimension["height"] if "display_width_px" in tool: tool["display_width_px"] = target_dimension["width"] - screenshots = resize_screenshots(screenshots, target_dimension) + _tracer = otel_trace.get_tracer("skyvern") + with _tracer.start_as_current_span("skyvern.llm.screenshot_resize") as _resize_span: + apply_context_attrs(_resize_span) + _resize_span.set_attribute("input_count", len(screenshots)) + _resize_span.set_attribute("input_bytes", sum(len(s) for s in screenshots)) + _resize_span.set_attribute("target_width", target_dimension["width"]) + _resize_span.set_attribute("target_height", target_dimension["height"]) + screenshots = resize_screenshots(screenshots, target_dimension) + _resize_span.set_attribute("output_bytes", sum(len(s) for s in screenshots)) llm_prompt_value = prompt or "" - if prompt and should_persist_llm_artifacts: - if _should_bundle: - _bundle_prompt = prompt.encode("utf-8") - if screenshots and step: - app.ARTIFACT_MANAGER.accumulate_screenshot_to_step_archive( - step=step, - screenshots=screenshots, - artifact_type=ArtifactType.SCREENSHOT_LLM, + # Pre-request artifact persistence cluster. Covers prompt + screenshot + # staging before the LLM call. The hot cost is inside the non-bundled + # branch (prepare_llm_artifact → S3 upload). 
+ _tracer = otel_trace.get_tracer("skyvern") + with _tracer.start_as_current_span("skyvern.llm.artifact.pre_request") as _pre_span: + apply_context_attrs(_pre_span) + _pre_span.set_attribute("bundled", bool(_should_bundle)) + _pre_span.set_attribute("persist", bool(prompt and should_persist_llm_artifacts)) + if prompt and should_persist_llm_artifacts: + if _should_bundle: + _bundle_prompt = prompt.encode("utf-8") + if screenshots and step: + app.ARTIFACT_MANAGER.accumulate_screenshot_to_step_archive( + step=step, + screenshots=screenshots, + artifact_type=ArtifactType.SCREENSHOT_LLM, + ) + else: + artifacts.append( + await app.ARTIFACT_MANAGER.prepare_llm_artifact( + data=prompt.encode("utf-8"), + artifact_type=ArtifactType.LLM_PROMPT, + screenshots=screenshots, + **artifact_targets, + ) ) - else: - artifacts.append( - await app.ARTIFACT_MANAGER.prepare_llm_artifact( - data=prompt.encode("utf-8"), - artifact_type=ArtifactType.LLM_PROMPT, - screenshots=screenshots, - **artifact_targets, - ) - ) if not self.llm_config.supports_vision: screenshots = None @@ -1838,7 +1920,7 @@ class LLMCaller: _llm_span.set_attribute("status", "error") raise LLMProviderErrorRetryableTask(self.llm_key, cause=e) from e except litellm.exceptions.ContextWindowExceededError as e: - _llm_span.set_attribute("status", "error") + _llm_span.set_attribute("status", "context_exceeded") LOG.exception( "Context window exceeded", llm_key=self.llm_key, @@ -1867,6 +1949,10 @@ class LLMCaller: duration=t_llm_cancelled - t_llm_request, ) raise LLMProviderError(self.llm_key) from None + except litellm.exceptions.RateLimitError as e: + _llm_span.set_attribute("status", "rate_limited") + LOG.warning("LLM request rate limited", llm_key=self.llm_key) + raise LLMProviderError(self.llm_key, cause=e) from e except Exception as e: _llm_span.set_attribute("status", "error") LOG.exception("LLM request failed unexpectedly", llm_key=self.llm_key) @@ -2005,31 +2091,40 @@ class LLMCaller: return parsed_response finally: - try: - await app.ARTIFACT_MANAGER.bulk_create_artifacts(artifacts) - except Exception: - LOG.error("Failed to persist artifacts", exc_info=True) - if _should_bundle and should_persist_llm_artifacts and step: - _ctx = skyvern_context.current() - app.ARTIFACT_MANAGER.accumulate_llm_call_to_archive( - step=step, - workflow_run_id=_ctx.workflow_run_id if _ctx else None, - workflow_run_block_id=_ctx.parent_workflow_run_block_id if _ctx else None, - run_id=_ctx.run_id if _ctx else None, - hashed_href_map=_bundle_hashed_href_map, - prompt=_bundle_prompt, - request=_bundle_request, - response=_bundle_response, - parsed_response=_bundle_parsed, - rendered_response=_bundle_rendered, - ) + # Post-response artifact persistence cluster. bulk_create_artifacts does + # the S3 uploads + DB rows for every LLM artifact queued during this + # request; the bundled-path archive accumulate is fast in-memory. + _tracer = otel_trace.get_tracer("skyvern") + with _tracer.start_as_current_span("skyvern.llm.artifact.post_response") as _post_span: + apply_context_attrs(_post_span) + _post_span.set_attribute("bundled", bool(_should_bundle)) + # Count underlying artifacts (main + screenshots per request), not request wrappers. 
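+                # Mirrors the factory-path artifact_count attribute so the two entry points stay comparable.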
+ _post_span.set_attribute("artifact_count", sum(len(a.artifacts) for a in artifacts if a is not None)) + try: + await app.ARTIFACT_MANAGER.bulk_create_artifacts(artifacts) + except Exception: + LOG.error("Failed to persist artifacts", exc_info=True) + if _should_bundle and should_persist_llm_artifacts and step: + _ctx = skyvern_context.current() + app.ARTIFACT_MANAGER.accumulate_llm_call_to_archive( + step=step, + workflow_run_id=_ctx.workflow_run_id if _ctx else None, + workflow_run_block_id=_ctx.parent_workflow_run_block_id if _ctx else None, + run_id=_ctx.run_id if _ctx else None, + hashed_href_map=_bundle_hashed_href_map, + prompt=_bundle_prompt, + request=_bundle_request, + response=_bundle_response, + parsed_response=_bundle_parsed, + rendered_response=_bundle_rendered, + ) def get_screenshot_resize_target_dimension(self, window_dimension: Resolution | None) -> Resolution: if window_dimension and window_dimension != self.browser_window_dimension: return get_resize_target_dimension(window_dimension) return self.screenshot_resize_target_dimension - @traced() + @traced(name="skyvern.llm.dispatch") async def _dispatch_llm_call( self, messages: list[dict[str, Any]], diff --git a/skyvern/forge/sdk/copilot/agent.py b/skyvern/forge/sdk/copilot/agent.py index 4e3cbe841..f128925c0 100644 --- a/skyvern/forge/sdk/copilot/agent.py +++ b/skyvern/forge/sdk/copilot/agent.py @@ -187,10 +187,14 @@ def _translate_to_agent_result( action_data = parse_final_response(text) user_response = action_data.get("user_response") or "Done." + resp_type = action_data.get("type", "REPLY") + if resp_type not in ("REPLY", "ASK_QUESTION", "REPLACE_WORKFLOW"): + resp_type = "REPLY" + last_workflow = ctx.last_workflow last_workflow_yaml = ctx.last_workflow_yaml - if action_data.get("type") == "REPLACE_WORKFLOW": + if resp_type == "REPLACE_WORKFLOW": LOG.warning("Agent used inline REPLACE_WORKFLOW instead of update_workflow tool") workflow_yaml = action_data.get("workflow_yaml", "") if workflow_yaml: @@ -201,6 +205,7 @@ def _translate_to_agent_result( organization_id=organization_id, workflow_yaml=workflow_yaml, ) + last_workflow_yaml = workflow_yaml except (yaml.YAMLError, ValidationError, BaseWorkflowHTTPException) as e: LOG.warning("Failed to process final workflow YAML", error=str(e)) user_response = ( @@ -209,22 +214,24 @@ def _translate_to_agent_result( f"Please ask me to fix it.)" ) - # Inline REPLACE_WORKFLOW bypasses _update_workflow so ctx.last_workflow - # is still whatever the tool layer last saw. Writing the REPLACE_WORKFLOW - # result onto ctx ensures _verified_workflow_or_none sees the right - # candidate — though it still gates on last_test_ok, so an untested - # REPLACE is rejected too. - if action_data.get("type") == "REPLACE_WORKFLOW" and last_workflow is not ctx.last_workflow: + # Inline REPLACE_WORKFLOW bypasses _update_workflow, so ctx.last_workflow + # is whatever the tool layer last saw. Write the REPLACE candidate onto + # ctx and invalidate any prior passing test: the REPLACE yaml itself was + # never run, so a leftover ``last_test_ok is True`` from an earlier tested + # (but different) yaml must not promote this untested one. 
+ if resp_type == "REPLACE_WORKFLOW" and last_workflow is not ctx.last_workflow: ctx.last_workflow = last_workflow ctx.last_workflow_yaml = last_workflow_yaml + ctx.last_test_ok = None - user_response = _rewrite_failed_test_response(str(user_response), ctx) + # ASK_QUESTION replies carry a specific clarifying question — often the + # "stop and ask" unblocker the system prompt now requires when the agent + # cannot test. The generic rewrite would replace it with a vague + # "Could you share more context", so skip it for ASK_QUESTION. + if resp_type != "ASK_QUESTION": + user_response = _rewrite_failed_test_response(str(user_response), ctx) last_workflow, last_workflow_yaml = _verified_workflow_or_none(ctx) - resp_type = action_data.get("type", "REPLY") - if resp_type not in ("REPLY", "ASK_QUESTION", "REPLACE_WORKFLOW"): - resp_type = "REPLY" - llm_context_raw = action_data.get("global_llm_context") structured = StructuredContext.from_json_str(global_llm_context) if isinstance(llm_context_raw, dict): diff --git a/skyvern/forge/sdk/copilot/context.py b/skyvern/forge/sdk/copilot/context.py index 2e65182f0..4ddf0d615 100644 --- a/skyvern/forge/sdk/copilot/context.py +++ b/skyvern/forge/sdk/copilot/context.py @@ -160,6 +160,15 @@ class CopilotContext(AgentContext): failed_test_nudge_count: int = 0 explore_without_workflow_nudge_count: int = 0 last_failed_workflow_yaml: str | None = None + # Set when a block-running tool timed out and the run's true outcome + # could not be reconciled (post-drain row was ``canceled``, non-final, or + # unreadable). Blocks further block-running tool calls until the LLM + # calls ``get_run_results(workflow_run_id=)`` AND that read returns + # a status in ``_TRUSTED_POST_DRAIN_STATUSES``. Turn-scoped by + # construction — ``CopilotContext`` is re-created per agent turn — so + # this guards auto-retry WITHIN a turn but not cross-turn "user says + # retry" requests. + pending_reconciliation_run_id: str | None = None # Consecutive test runs whose data-producing blocks completed with no # meaningful output (missing, empty, or all-null fields). Resets when a # run produces real data. Used to escalate when the agent is stuck diff --git a/skyvern/forge/sdk/copilot/tools.py b/skyvern/forge/sdk/copilot/tools.py index aae464fb4..287747b51 100644 --- a/skyvern/forge/sdk/copilot/tools.py +++ b/skyvern/forge/sdk/copilot/tools.py @@ -41,7 +41,7 @@ from skyvern.forge.sdk.workflow.models.parameter import ( WorkflowParameter, WorkflowParameterType, ) -from skyvern.forge.sdk.workflow.models.workflow import Workflow, WorkflowRunStatus +from skyvern.forge.sdk.workflow.models.workflow import Workflow, WorkflowRun, WorkflowRunStatus from skyvern.webeye.navigation import is_skip_inner_retry_error LOG = structlog.get_logger() @@ -55,7 +55,7 @@ _FAILED_BLOCK_STATUSES: frozenset[str] = frozenset( } ) _DATA_PRODUCING_BLOCK_TYPES = frozenset({"EXTRACTION", "TEXT_PROMPT"}) -RUN_BLOCKS_DEBUG_TIMEOUT_SECONDS = 180 +RUN_BLOCKS_DEBUG_TIMEOUT_SECONDS = 300 # Detached cleanup tasks held here so the garbage collector does not drop them # while they still have work to do, and so the "task exception was never @@ -114,6 +114,81 @@ def _log_detached_cleanup_failure(task: asyncio.Task) -> None: LOG.warning("Detached cancel fallback failed", exc_info=exc) +async def _safe_read_workflow_run( + workflow_run_id: str, + organization_id: str, + *, + context: str, +) -> WorkflowRun | None: + """Read a workflow_runs row, logging-and-returning-None on failure. 
+ + The ``context`` string distinguishes call sites in logs (e.g. + ``"pre-cancel"`` vs ``"post-drain"``) so a failure is attributable to + the specific phase of the timeout branch it fired from. + """ + try: + return await app.DATABASE.workflow_runs.get_workflow_run( + workflow_run_id=workflow_run_id, + organization_id=organization_id, + ) + except Exception: + LOG.warning( + "Workflow run re-read failed", + workflow_run_id=workflow_run_id, + context=context, + exc_info=True, + ) + return None + + +def _trusted_post_drain_status(run: WorkflowRun | None) -> str | None: + """Return the run's status if it is one we can trust after the cancel + helper has run; otherwise ``None``. + + ``canceled`` is deliberately rejected because at post-drain read time we + can't tell a legitimate ``canceled`` (written by + ``_finalize_workflow_run_status`` when a block/user canceled the run) + apart from a synthetic ``canceled`` (written by the cancel helper's + fallback). Callers that need to distinguish those cases must read the row + BEFORE the cancel helper runs. + """ + if run is None: + return None + if WorkflowRunStatus(run.status).is_final_excluding_canceled(): + return run.status + return None + + +def _maybe_clear_reconciliation_flag(copilot_ctx: Any, result: Any) -> None: + """Clear ``pending_reconciliation_run_id`` iff ``result`` proves the + pending run has reached a trustworthy-final status. + + Called from ``get_run_results_tool`` after ``_get_run_results`` returns. + Requires (a) a pending run_id on the ctx, (b) a matching resolved run_id + in ``result.data`` (so a ``workflow_run_id=None`` call that resolves to + a different run does NOT clear), and (c) the resolved ``overall_status`` + passes ``is_final_excluding_canceled`` (so an ambiguous ``canceled`` + does NOT clear). + """ + pending_run_id = getattr(copilot_ctx, "pending_reconciliation_run_id", None) + if not isinstance(pending_run_id, str) or not pending_run_id: + return + if not isinstance(result, dict): + return + data = result.get("data") + if not isinstance(data, dict): + return + resolved_run_id = data.get("workflow_run_id") + resolved_status = data.get("overall_status") + if ( + isinstance(resolved_run_id, str) + and resolved_run_id == pending_run_id + and isinstance(resolved_status, str) + and WorkflowRunStatus(resolved_status).is_final_excluding_canceled() + ): + copilot_ctx.pending_reconciliation_run_id = None + + # Streak threshold at which the copilot hard-aborts a tool call because the # same action sequence has repeated run-over-run with no intervening success. # The streak counter is incremented in ``update_repeated_failure_state`` AFTER @@ -240,6 +315,23 @@ def _tool_loop_error(ctx: AgentContext, tool_name: str) -> str | None: # to block-running tools so planning/metadata tools (update_workflow, # list_credentials, get_run_results) stay unaffected. if tool_name in _BLOCK_RUNNING_TOOLS: + # Reconciliation guard: a previous block-running tool call timed out + # without reconciling the run's outcome (post-drain status was + # ``canceled``, non-final, or unreadable). Block further block-running + # calls until ``get_run_results`` clears the flag. Prevents the LLM + # from auto-retrying a mutation block whose side effects may have + # landed. + pending_run_id = getattr(ctx, "pending_reconciliation_run_id", None) + if isinstance(pending_run_id, str) and pending_run_id: + return ( + f"The previous block-running tool call timed out and run " + f"{pending_run_id} was not reconciled to a trustworthy terminal " + f'status. 
Call `get_run_results(workflow_run_id="{pending_run_id}")` ' + f"first, report the result to the user, then await user input " + f"before running more blocks. This guard prevents duplicate " + f"side effects on live sites." + ) + streak_raw = getattr(ctx, "repeated_action_fingerprint_streak_count", 0) streak = streak_raw if isinstance(streak_raw, int) else 0 if streak >= REPEATED_ACTION_STREAK_ABORT_AT: @@ -861,26 +953,78 @@ async def _run_blocks_and_collect_debug( break if final_status is None: - await _cancel_run_task_if_not_final(run_task, workflow_run.workflow_run_id) - timeout_msg = ( - f"Block execution timed out after {max_poll}s. " - f"Run ID: {workflow_run.workflow_run_id}. " - f"The task was likely stuck repeating failing actions. " - f"Consider: simplifying the navigation_goal, using a more specific URL, " - f"adding a dismiss-popup step, or concluding the site is not automatable." + # Read the row BEFORE the cancel helper runs. The poll interval is + # 2 s, so a legitimate self-finalize (``canceled`` from a user/block + # cancel, or any other terminal) can land in the narrow gap between + # the last poll and entering this branch. Trusting the pre-cancel + # status preserves that signal without ambiguity — nothing we do + # could have written it. Any post-cancel re-read has to exclude + # ``canceled`` because the helper itself writes synthetic + # ``canceled`` (see ``WorkflowRunStatus.is_final_excluding_canceled``). + pre_cancel_run = await _safe_read_workflow_run( + workflow_run.workflow_run_id, ctx.organization_id, context="pre-cancel" ) - if ctx.browser_session_id: - try: - browser_state = await app.PERSISTENT_SESSIONS_MANAGER.get_browser_state( - session_id=ctx.browser_session_id, - organization_id=ctx.organization_id, + + if pre_cancel_run is not None and WorkflowRunStatus(pre_cancel_run.status).is_final(): + final_status = pre_cancel_run.status + run = pre_cancel_run + else: + await _cancel_run_task_if_not_final(run_task, workflow_run.workflow_run_id) + + # The cancel helper waits for ``execute_workflow``'s shielded + # ``_finalize_workflow_run_status`` to run, which writes the real + # terminal status when the run completed during the drain window. + # Re-read once before declaring timeout so a run that finished + # in those last few seconds is not mis-reported as a failure the + # LLM will retry against a live site. + run = await _safe_read_workflow_run( + workflow_run.workflow_run_id, ctx.organization_id, context="post-drain" + ) + trusted = _trusted_post_drain_status(run) + if trusted is not None: + final_status = trusted + else: + # Three distinct cases produce a ``None`` from + # ``_trusted_post_drain_status`` — DB read failed, row + # non-final, or ``canceled`` (legitimate vs synthetic + # indistinguishable at this read). Body differs so logs + # are attributable; the LLM-facing instruction is + # identical: outcome is uncertain, reconcile via + # ``get_run_results`` before re-invoking block-running + # tools. The reconciliation guard below enforces that. + if run is None: + timeout_body = ( + f"Block execution exceeded the {max_poll}s tool budget and the workflow " + f"run row could not be re-read after cancellation (see 'Workflow run " + f"re-read failed' log)." + ) + elif run.status == WorkflowRunStatus.canceled.value: + timeout_body = ( + f"Block execution was cancelled after exceeding the {max_poll}s tool " + f"budget. Side effects of any actions already taken may have landed." 
+ ) + else: + timeout_body = ( + f"Block execution exceeded the {max_poll}s tool budget and the workflow " + f"run did not reach a terminal status within the cancellation grace " + f"window (last observed status: {run.status})." + ) + timeout_msg = ( + f"{timeout_body} Run ID: {workflow_run.workflow_run_id}. " + f"Outcome is uncertain. Do NOT re-invoke block-running tools in this " + f"session without first calling `get_run_results` with this " + f"workflow_run_id and reporting the result to the user." ) - if browser_state: - page = await browser_state.get_or_create_page() - timeout_msg += f" Browser was on: {page.url}" - except Exception: - pass - return {"ok": False, "error": timeout_msg} + current_url, _ = await _fallback_page_info(ctx) + if current_url: + timeout_msg += f" Browser was on: {current_url}" + # Turn-scoped reconciliation guard. Clearing requires an + # explicit ``get_run_results`` read that resolves to this + # run_id with a status that passes + # ``WorkflowRunStatus.is_final_excluding_canceled`` (see + # clearing logic in ``get_run_results_tool``). + ctx.pending_reconciliation_run_id = workflow_run.workflow_run_id + return {"ok": False, "error": timeout_msg} except asyncio.CancelledError: # The SDK's @function_tool(timeout=...) cancelled us mid-poll. Shield # the cleanup so the parent cancellation can't interrupt it mid-await. @@ -1691,6 +1835,8 @@ async def get_run_results_tool( if workflow_run_id: params["workflow_run_id"] = workflow_run_id result = await _get_run_results(params, copilot_ctx) + _maybe_clear_reconciliation_flag(copilot_ctx, result) + sanitized = sanitize_tool_result_for_llm("get_run_results", result) return json.dumps(sanitized) diff --git a/skyvern/forge/sdk/core/skyvern_context.py b/skyvern/forge/sdk/core/skyvern_context.py index feb6bad37..1aa332b6e 100644 --- a/skyvern/forge/sdk/core/skyvern_context.py +++ b/skyvern/forge/sdk/core/skyvern_context.py @@ -37,6 +37,7 @@ class SkyvernContext: max_screenshot_scrolls: int | None = None browser_container_ip: str | None = None browser_container_task_arn: str | None = None + is_remote_browser: bool = False feature_flag_entries: dict[str, bool | str | None] = field(default_factory=dict) # feature flags diff --git a/skyvern/forge/sdk/event/base.py b/skyvern/forge/sdk/event/base.py index a7767ca2d..e52834ed6 100644 --- a/skyvern/forge/sdk/event/base.py +++ b/skyvern/forge/sdk/event/base.py @@ -21,7 +21,7 @@ class CursorEventStrategy(ABC): pass @abstractmethod - async def click(self, page: Page, locator: Locator) -> None: + async def click(self, page: Page, locator: Locator, timeout: float | None = None) -> None: pass diff --git a/skyvern/forge/sdk/event/default.py b/skyvern/forge/sdk/event/default.py index 5a3b5f007..a0d4e2705 100644 --- a/skyvern/forge/sdk/event/default.py +++ b/skyvern/forge/sdk/event/default.py @@ -34,8 +34,12 @@ class DefaultCursorStrategy(CursorEventStrategy): LOG.debug("move_to_element failed", exc_info=True) return 0.0, 0.0 - async def click(self, page: Page, locator: Locator) -> None: - await locator.click() + async def click(self, page: Page, locator: Locator, timeout: float | None = None) -> None: + LOG.debug("DefaultCursorStrategy.click", timeout=timeout) + kwargs: dict = {} + if timeout is not None: + kwargs["timeout"] = timeout + await locator.click(**kwargs) class DefaultInputStrategy(InputEventStrategy): diff --git a/skyvern/forge/sdk/event/factory.py b/skyvern/forge/sdk/event/factory.py index 2e38cecc5..83ab61988 100644 --- a/skyvern/forge/sdk/event/factory.py +++ 
b/skyvern/forge/sdk/event/factory.py @@ -123,6 +123,15 @@ class EventStrategyFactory: """Update cursor position without generating movement.""" EventStrategyFactory.get_cursor_strategy().sync_position(page, x, y) + @staticmethod + async def click(page: Page, locator: Locator, timeout: float | None = None) -> None: + """Click an element using the active cursor strategy.""" + start = time.perf_counter() + try: + await EventStrategyFactory.get_cursor_strategy().click(page, locator, timeout) + finally: + EventStrategyFactory.__metrics.record("click", time.perf_counter() - start) + # -- input convenience methods ---------------------------------------------- @staticmethod diff --git a/skyvern/forge/sdk/routes/workflow_copilot.py b/skyvern/forge/sdk/routes/workflow_copilot.py index 601105f34..b8d88db88 100644 --- a/skyvern/forge/sdk/routes/workflow_copilot.py +++ b/skyvern/forge/sdk/routes/workflow_copilot.py @@ -858,7 +858,8 @@ async def _new_copilot_chat_post( # panel state below stays gated on auto_accept — the frontend # applies proposals via applyWorkflowUpdate when auto-accept is # on. - if _should_restore_persisted_workflow(chat.auto_accept, agent_result): + restored = _should_restore_persisted_workflow(chat.auto_accept, agent_result) + if restored: await _restore_workflow_definition(original_workflow, organization.organization_id) if chat.auto_accept is not True: @@ -871,11 +872,16 @@ async def _new_copilot_chat_post( workflow_copilot_chat_id=chat.workflow_copilot_chat_id, proposed_workflow=proposed_data, ) - elif getattr(agent_result, "clear_proposed_workflow", False): - # Feasibility-gate fast-path returned ASK_QUESTION. Null - # any previously-persisted proposed_workflow so a page - # reload does not resurrect a stale draft alongside the - # new clarification question. + elif ( + restored or getattr(agent_result, "clear_proposed_workflow", False) + ) and chat.proposed_workflow is not None: + # Null any previously-persisted proposed_workflow so a + # page reload does not resurrect a stale Accept/Reject + # card next to an assistant message that just explained + # why no verified proposal is available. Covers: + # * feasibility-gate fast-path clarifications, and + # * SKY-9143 strict-gate turns where a mid-turn draft was + # rolled back (``restored=True``). 
await app.DATABASE.workflow_params.update_workflow_copilot_chat( organization_id=chat.organization_id, workflow_copilot_chat_id=chat.workflow_copilot_chat_id, diff --git a/skyvern/forge/sdk/schemas/tasks.py b/skyvern/forge/sdk/schemas/tasks.py index 5e4d9c796..68670bd31 100644 --- a/skyvern/forge/sdk/schemas/tasks.py +++ b/skyvern/forge/sdk/schemas/tasks.py @@ -357,6 +357,7 @@ class Task(TaskBase): finished_at=self.finished_at, extracted_information=self.extracted_information, failure_reason=failure_reason or self.failure_reason, + failure_category=self.failure_category, webhook_failure_reason=self.webhook_failure_reason, action_screenshot_urls=action_screenshot_urls, screenshot_url=screenshot_url, @@ -387,6 +388,7 @@ class TaskResponse(BaseModel): downloaded_files: list[FileInfo] | None = None downloaded_file_urls: list[str] | None = None failure_reason: str | None = None + failure_category: list[dict[str, Any]] | None = None webhook_failure_reason: str | None = None errors: list[dict[str, Any]] = [] max_steps_per_run: int | None = None diff --git a/skyvern/forge/sdk/trace/__init__.py b/skyvern/forge/sdk/trace/__init__.py index 07bff8a28..882a055b2 100644 --- a/skyvern/forge/sdk/trace/__init__.py +++ b/skyvern/forge/sdk/trace/__init__.py @@ -1,9 +1,11 @@ import asyncio from functools import wraps -from typing import Any, Callable +from typing import Any, Callable, Literal from opentelemetry import trace +SpanRole = Literal["wrapper"] + # Context fields to auto-attach to every span. Deliberately minimal — each # attribute is paid for in storage and index cardinality, so only IDs we # actively query on during profiling / Milestone 2 aggregations belong here. @@ -56,7 +58,11 @@ def apply_context_attrs(span: Any) -> None: span.set_attribute(attr, str(value)) -def traced(name: str | None = None, tags: list[str] | None = None) -> Callable: +def traced( + name: str | None = None, + tags: list[str] | None = None, + role: SpanRole | None = None, +) -> Callable: """Decorator that creates an OTEL span. No-op without SDK installed. Every span is tagged with: @@ -73,6 +79,11 @@ def traced(name: str | None = None, tags: list[str] | None = None) -> Callable: Args: name: Span name. If not provided, uses func.__qualname__. tags: Tags to add as a span attribute. + role: Optional span role. Set to "wrapper" on spans whose duration is + dominated by the work of their children (e.g. `agent.step`, + `workflow.execute`). Dashboards filter these out with + `skyvern.span.role != 'wrapper'` so leaf-time composition (pie, + stacked bar) isn't double-counted via nesting. 
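+
+    Example (decorator usage; mirrors the block-execution change in this diff):
+
+        @traced(name="skyvern.block.execute", role="wrapper")
+        async def execute_safe(self, workflow_run_id: str, **kwargs: dict) -> BlockResult:
+            ...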
""" def decorator(func: Callable) -> Callable: @@ -87,6 +98,8 @@ def traced(name: str | None = None, tags: list[str] | None = None) -> Callable: with trace.get_tracer("skyvern").start_as_current_span(span_name) as span: span.set_attribute("code.function", code_function) span.set_attribute("code.namespace", code_namespace) + if role is not None: + span.set_attribute("skyvern.span.role", role) apply_context_attrs(span) if tags: span.set_attribute("tags", tags) @@ -105,6 +118,8 @@ def traced(name: str | None = None, tags: list[str] | None = None) -> Callable: with trace.get_tracer("skyvern").start_as_current_span(span_name) as span: span.set_attribute("code.function", code_function) span.set_attribute("code.namespace", code_namespace) + if role is not None: + span.set_attribute("skyvern.span.role", role) apply_context_attrs(span) if tags: span.set_attribute("tags", tags) diff --git a/skyvern/forge/sdk/workflow/models/_jinja.py b/skyvern/forge/sdk/workflow/models/_jinja.py new file mode 100644 index 000000000..a27069058 --- /dev/null +++ b/skyvern/forge/sdk/workflow/models/_jinja.py @@ -0,0 +1,37 @@ +"""Standalone Jinja environment shared by block modules to avoid import cycles.""" + +from __future__ import annotations + +import json +from typing import Any + +from jinja2.sandbox import SandboxedEnvironment + +# Sentinel marker for native JSON type injection via | json filter. +_JSON_TYPE_MARKER = "__SKYVERN_RAW_JSON__" + + +def _json_type_filter(value: Any) -> str: + """Jinja filter that marks a value for native JSON type injection. + + Usage in templates: {{ some_bool | json }} + + The filter serializes the value to JSON and wraps it with sentinel markers. + When _render_templates_in_json() detects these markers, it unwraps and + parses the JSON to get the native typed value (bool, int, list, etc.). + + Uses default=str to handle non-JSON-serializable types (datetime, Enum, etc.) + """ + return f"{_JSON_TYPE_MARKER}{json.dumps(value, default=str)}{_JSON_TYPE_MARKER}" + + +def _json_finalize(value: Any) -> Any: + """Jinja finalize hook: JSON-serialize dict/list values so `{{ var }}` yields valid JSON + instead of Python repr. 
Strings (including `| tojson` output) pass through unchanged.""" + if isinstance(value, (dict, list)): + return json.dumps(value, default=str) + return value + + +jinja_json_finalize_env = SandboxedEnvironment(finalize=_json_finalize) +jinja_json_finalize_env.filters["json"] = _json_type_filter diff --git a/skyvern/forge/sdk/workflow/models/block.py b/skyvern/forge/sdk/workflow/models/block.py index 273809da7..6419304dc 100644 --- a/skyvern/forge/sdk/workflow/models/block.py +++ b/skyvern/forge/sdk/workflow/models/block.py @@ -31,6 +31,7 @@ from charset_normalizer import from_bytes from email_validator import EmailNotValidError, validate_email from jinja2 import StrictUndefined from jinja2.sandbox import SandboxedEnvironment +from opentelemetry import trace as otel_trace from playwright.async_api import Page from pydantic import BaseModel, Field, model_validator @@ -47,6 +48,7 @@ from skyvern.exceptions import ( DownloadFileMaxSizeExceeded, MissingBrowserState, MissingBrowserStatePage, + MissingStarterUrl, PDFParsingError, TaskNotFound, UnexpectedTaskStatus, @@ -101,6 +103,10 @@ from skyvern.forge.sdk.workflow.loop_download_filter import ( filter_downloaded_files_for_current_iteration, to_downloaded_file_signature, ) +from skyvern.forge.sdk.workflow.models._jinja import ( + _JSON_TYPE_MARKER, + _json_type_filter, +) from skyvern.forge.sdk.workflow.models.parameter import ( PARAMETER_TYPE, AWSSecretParameter, @@ -219,24 +225,6 @@ else: # Date format used for the built-in {{current_date}} reserved parameter. CURRENT_DATE_FORMAT = "%Y-%m-%d" -# Sentinel marker for native JSON type injection via | json filter. -_JSON_TYPE_MARKER = "__SKYVERN_RAW_JSON__" - - -def _json_type_filter(value: Any) -> str: - """Jinja filter that marks a value for native JSON type injection. - - Usage in templates: {{ some_bool | json }} - - The filter serializes the value to JSON and wraps it with sentinel markers. - When _render_templates_in_json() detects these markers, it unwraps and - parses the JSON to get the native typed value (bool, int, list, etc.). - - Uses default=str to handle non-JSON-serializable types (datetime, Enum, etc.) - """ - return f"{_JSON_TYPE_MARKER}{json.dumps(value, default=str)}{_JSON_TYPE_MARKER}" - - jinja_sandbox_env.filters["json"] = _json_type_filter @@ -414,6 +402,7 @@ class Block(BaseModel, abc.ABC): workflow_run_context: WorkflowRunContext, *, force_include_secrets: bool = False, + env: SandboxedEnvironment | None = None, ) -> str: """ Format a template string using the workflow run context. @@ -431,7 +420,7 @@ class Block(BaseModel, abc.ABC): BlockType.HTTP_REQUEST, ] - template = jinja_sandbox_env.from_string(potential_template) + template = (env or jinja_sandbox_env).from_string(potential_template) block_reference_data: dict[str, Any] = workflow_run_context.get_block_metadata(self.label) template_data = workflow_run_context.values.copy() @@ -601,7 +590,7 @@ class Block(BaseModel, abc.ABC): """Return block-level error codes for unexpected failures. Override in subclasses.""" return [] - @traced(name="skyvern.block.execute") + @traced(name="skyvern.block.execute", role="wrapper") async def execute_safe( self, workflow_run_id: str, @@ -612,6 +601,10 @@ class Block(BaseModel, abc.ABC): current_index: int | None = None, **kwargs: dict, ) -> BlockResult: + # block_type slices the 303s p95 by block kind — task/for_loop/code/extraction + # have wildly different latency profiles. Set early so it's present even if + # execute_safe raises before any child work. 
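+        # get_current_span() returns the skyvern.block.execute span the @traced decorator above just opened.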
+ otel_trace.get_current_span().set_attribute("block_type", self.block_type.value) workflow_run_block_id = None engine: RunEngine | None = None try: @@ -1077,6 +1070,24 @@ class BaseTaskBlock(Block): ) raise e + # Validate starter URL before downstream scraping on a blank page + if not (self.url and self.url.strip()) and working_page.url in ("about:blank", "", ":"): + missing_url_exc = MissingStarterUrl(block_label=self.label) + LOG.warning( + "First browser block has no starter URL", + task_id=task.task_id, + workflow_run_id=workflow_run_id, + block_label=self.label, + ) + await self._handle_task_failure_with_error_detection( + task=task, + step=step, + browser_state=browser_state, + failure_reason=str(missing_url_exc), + organization_id=workflow_run.organization_id, + ) + raise missing_url_exc + try: # add screenshot artifact for the first task screenshot = await browser_state.take_fullpage_screenshot() @@ -7174,6 +7185,12 @@ def get_all_blocks(blocks: list[BlockTypeVar]) -> list[BlockTypeVar]: return all_blocks +# Late import: google_sheets_blocks imports Block from this module, so top-level import would cycle. +from skyvern.forge.sdk.workflow.models.google_sheets_blocks import ( # noqa: E402 + GoogleSheetsReadBlock, + GoogleSheetsWriteBlock, +) + BlockSubclasses = Union[ ConditionalBlock, ForLoopBlock, @@ -7199,6 +7216,8 @@ BlockSubclasses = Union[ HttpRequestBlock, PrintPageBlock, WorkflowTriggerBlock, + GoogleSheetsReadBlock, + GoogleSheetsWriteBlock, ] BlockTypeVar = Annotated[BlockSubclasses, Field(discriminator="block_type")] diff --git a/skyvern/forge/sdk/workflow/models/google_sheets_blocks.py b/skyvern/forge/sdk/workflow/models/google_sheets_blocks.py new file mode 100644 index 000000000..f07e65879 --- /dev/null +++ b/skyvern/forge/sdk/workflow/models/google_sheets_blocks.py @@ -0,0 +1,893 @@ +import json +import re +from dataclasses import dataclass +from typing import Any, Literal, cast + +import structlog + +from skyvern.forge import app +from skyvern.forge.sdk.workflow.context_manager import WorkflowRunContext +from skyvern.forge.sdk.workflow.models._jinja import jinja_json_finalize_env +from skyvern.forge.sdk.workflow.models.block import Block +from skyvern.forge.sdk.workflow.models.parameter import PARAMETER_TYPE +from skyvern.schemas.google_sheets import ( + MAX_COLUMN_INDEX, + GoogleSheetsAPIError, + a1_to_grid_range, + build_a1, + column_letters_to_index, + extract_a1_sheet_prefix, + extract_spreadsheet_id, + leading_column_offset, +) +from skyvern.schemas.workflows import BlockResult, BlockStatus, BlockType + +LOG = structlog.get_logger() + + +def _disambiguate_header(header: list[str]) -> list[str]: + """Rename empty/duplicate header cells so dict(zip(header, row)) does not drop columns.""" + counts: dict[str, int] = {} + reserved = {h.strip() for h in header if h and h.strip()} + disambiguated: list[str] = [] + for idx, raw in enumerate(header): + name = raw.strip() if raw else "" + if not name: + candidate = f"col_{idx + 1}" + suffix = 2 + while candidate in reserved or candidate in disambiguated: + candidate = f"col_{idx + 1}_{suffix}" + suffix += 1 + disambiguated.append(candidate) + reserved.add(candidate) + continue + if counts.get(name, 0) == 0: + counts[name] = 1 + disambiguated.append(name) + continue + counts[name] += 1 + candidate = f"{name}_{counts[name]}" + while candidate in reserved or candidate in disambiguated: + counts[name] += 1 + candidate = f"{name}_{counts[name]}" + disambiguated.append(candidate) + reserved.add(candidate) + return disambiguated 
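+
+
+# Illustrative behavior of _disambiguate_header (example values chosen for this
+# comment only): ["name", "", "name"] -> ["name", "col_2", "name_2"].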
+ + +class GoogleSheetsReadBlock(Block): + block_type: Literal[BlockType.GOOGLE_SHEETS_READ] = BlockType.GOOGLE_SHEETS_READ # type: ignore + + spreadsheet_url: str + sheet_name: str | None = None + range: str | None = None + credential_id: str | None = None + has_header_row: bool = True + parameters: list[PARAMETER_TYPE] = [] + + def get_all_parameters(self, workflow_run_id: str) -> list[PARAMETER_TYPE]: + return self.parameters + + def _render_templates(self, workflow_run_context: WorkflowRunContext) -> None: + if self.spreadsheet_url: + self.spreadsheet_url = self.format_block_parameter_template_from_workflow_run_context( + self.spreadsheet_url, workflow_run_context + ) + if self.sheet_name: + self.sheet_name = self.format_block_parameter_template_from_workflow_run_context( + self.sheet_name, workflow_run_context + ) + if self.range: + self.range = self.format_block_parameter_template_from_workflow_run_context( + self.range, workflow_run_context + ) + if self.credential_id: + self.credential_id = self.format_block_parameter_template_from_workflow_run_context( + self.credential_id, workflow_run_context + ) + + async def execute( + self, + workflow_run_id: str, + workflow_run_block_id: str, + organization_id: str | None = None, + browser_session_id: str | None = None, + **kwargs: Any, + ) -> BlockResult: + workflow_run_context = self.get_workflow_run_context(workflow_run_id) + + try: + self._render_templates(workflow_run_context) + except Exception as e: + return await self.build_block_result( + success=False, + failure_reason=f"Failed to format jinja template: {str(e)}", + output_parameter_value=None, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + if not self.credential_id: + return await self.build_block_result( + success=False, + failure_reason="Google credential_id is required", + output_parameter_value=None, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + try: + spreadsheet_id = extract_spreadsheet_id(self.spreadsheet_url) + except ValueError: + return await self.build_block_result( + success=False, + failure_reason=f"Could not resolve spreadsheet id from: {self.spreadsheet_url}", + output_parameter_value=None, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + a1 = build_a1(self.sheet_name, self.range) + if not a1: + return await self.build_block_result( + success=False, + failure_reason="Either sheet_name or range must be provided", + output_parameter_value=None, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + effective_org_id = organization_id or workflow_run_context.organization_id + if not effective_org_id: + return await self.build_block_result( + success=False, + failure_reason="organization_id is required to load Google Sheets credentials", + output_parameter_value=None, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + access_token = await app.AGENT_FUNCTION.get_google_sheets_credentials( + organization_id=effective_org_id, + credential_id=self.credential_id, + ) + if not access_token: + return await self.build_block_result( + success=False, + failure_reason="Reconnect the Google account: no valid access token", + output_parameter_value=None, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + 
organization_id=organization_id, + ) + + fields = ( + "spreadsheetId,sheets(" + "properties(sheetId,title,index)," + "merges," + "data(startRow,startColumn,rowData(values(" + "userEnteredValue,userEnteredFormat,formattedValue,note,hyperlink" + ")))" + ")" + ) + try: + payload = await app.AGENT_FUNCTION.google_sheets_values_get( + access_token=access_token, + spreadsheet_id=spreadsheet_id, + ranges=a1, + fields=fields, + ) + except GoogleSheetsAPIError as e: + failure_reason = _failure_reason_from_sheets_error("read", e) + error_data = {"status_code": e.status, "code": e.code, "error": e.message} + await self.record_output_parameter_value(workflow_run_context, workflow_run_id, error_data) + return await self.build_block_result( + success=False, + failure_reason=failure_reason, + output_parameter_value=error_data, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + except Exception as e: + error_data = {"error": str(e), "error_type": "unknown"} + await self.record_output_parameter_value(workflow_run_context, workflow_run_id, error_data) + return await self.build_block_result( + success=False, + failure_reason=f"Google Sheets read failed: {str(e)}", + output_parameter_value=error_data, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + if payload is None: + error_data = {"error": "Google Sheets read returned no payload"} + await self.record_output_parameter_value(workflow_run_context, workflow_run_id, error_data) + return await self.build_block_result( + success=False, + failure_reason="Google Sheets runtime is not available in this build", + output_parameter_value=error_data, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + sheets = payload.get("sheets") or [] + # spreadsheets.get returns every sheet object even when ranges= is set, so + # selecting [0] would silently grab the first tab instead of the requested one. 
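+        # Resolve by title instead, falling back to the first sheet only when no title was requested.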
+ target_sheet_title = self.sheet_name or extract_a1_sheet_prefix(a1) + sheet_block: dict[str, Any] = {} + if target_sheet_title: + for candidate in sheets: + candidate_title = (candidate.get("properties") or {}).get("title") + if candidate_title == target_sheet_title: + sheet_block = candidate + break + if not sheet_block: + sheet_block = sheets[0] if sheets else {} + properties = sheet_block.get("properties") or {} + data_blocks = sheet_block.get("data") or [] + first_data = data_blocks[0] if data_blocks else {} + row_data = first_data.get("rowData") or [] + + cells: list[list[dict[str, Any]]] = [] + values: list[list[Any]] = [] + for row in row_data: + row_values = row.get("values") or [] + cells.append([dict(cell) for cell in row_values]) + values.append([cell.get("formattedValue", "") for cell in row_values]) + + start_row = int(first_data.get("startRow", 0)) + start_column = int(first_data.get("startColumn", 0)) + row_count = len(cells) + col_count = max((len(r) for r in cells), default=0) + end_row_exclusive = start_row + row_count + end_col_exclusive = start_column + col_count + + merges: list[dict[str, int]] = [] + for merge in sheet_block.get("merges") or []: + m_start_row = int(merge.get("startRowIndex", 0)) + m_end_row = int(merge.get("endRowIndex", 0)) + m_start_col = int(merge.get("startColumnIndex", 0)) + m_end_col = int(merge.get("endColumnIndex", 0)) + # Clip to the intersection so a merge that begins outside the read window + # does not produce negative offsets when a downstream rich write replays it. + clipped_start_row = max(m_start_row, start_row) + clipped_end_row = min(m_end_row, end_row_exclusive) + clipped_start_col = max(m_start_col, start_column) + clipped_end_col = min(m_end_col, end_col_exclusive) + if clipped_start_row >= clipped_end_row or clipped_start_col >= clipped_end_col: + continue + merges.append( + { + "start_row_index": clipped_start_row - start_row, + "end_row_index": clipped_end_row - start_row, + "start_column_index": clipped_start_col - start_column, + "end_column_index": clipped_end_col - start_column, + } + ) + + rows: list[dict[str, Any]] | None = None + if self.has_header_row and values: + header = _disambiguate_header([str(h) for h in values[0]]) + rows = [dict(zip(header, row)) for row in values[1:]] + + output_data: dict[str, Any] = { + "spreadsheet_id": spreadsheet_id, + "range": a1, + "sheet_id": properties.get("sheetId"), + "sheet_title": properties.get("title"), + "start_row": start_row, + "start_column": start_column, + "values": values, + "rows": rows, + "cells": cells, + "merges": merges, + } + await self.record_output_parameter_value(workflow_run_context, workflow_run_id, output_data) + return await self.build_block_result( + success=True, + failure_reason=None, + output_parameter_value=output_data, + status=BlockStatus.completed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + +@dataclass(frozen=True) +class RichSheetsInput: + cells: list[list[dict[str, Any]]] + merges: list[dict[str, int]] + sheet_id: int | None + sheet_title: str | None + + +def _failure_reason_from_sheets_error(action: str, exc: GoogleSheetsAPIError) -> str: + if exc.status == 403 and exc.code == "reconnect_required": + return f"Reconnect the Google account: {exc.message}" + if exc.status == 429: + return f"Google Sheets rate limit on {action}: {exc.message}" + return f"Google Sheets {action} failed (HTTP {exc.status}): {exc.message}" + + +def _try_rich_sheets_input(parsed: Any) -> RichSheetsInput | None: + if not 
isinstance(parsed, dict): + return None + cells = parsed.get("cells") + if not isinstance(cells, list) or not cells: + return None + if not all(isinstance(row, list) for row in cells): + return None + merges_in = parsed.get("merges") or [] + merges: list[dict[str, int]] = [] + for m in merges_in: + if not isinstance(m, dict): + continue + try: + merges.append( + { + "start_row_index": int(m.get("start_row_index", 0)), + "end_row_index": int(m.get("end_row_index", 0)), + "start_column_index": int(m.get("start_column_index", 0)), + "end_column_index": int(m.get("end_column_index", 0)), + } + ) + except (TypeError, ValueError): + continue + sheet_id_raw = parsed.get("sheet_id") + sheet_title_raw = parsed.get("sheet_title") + return RichSheetsInput( + cells=cells, + merges=merges, + sheet_id=int(sheet_id_raw) if isinstance(sheet_id_raw, int) else None, + sheet_title=str(sheet_title_raw) if isinstance(sheet_title_raw, str) else None, + ) + + +class GoogleSheetsWriteBlock(Block): + block_type: Literal[BlockType.GOOGLE_SHEETS_WRITE] = BlockType.GOOGLE_SHEETS_WRITE # type: ignore + + spreadsheet_url: str + sheet_name: str | None = None + range: str | None = None + credential_id: str | None = None + write_mode: Literal["append", "update"] = "append" + values: str = "" + column_mapping: dict[str, str] | None = None + create_sheet_if_missing: bool = False + parameters: list[PARAMETER_TYPE] = [] + + def get_all_parameters(self, workflow_run_id: str) -> list[PARAMETER_TYPE]: + return self.parameters + + def _render_templates(self, workflow_run_context: WorkflowRunContext) -> None: + if self.spreadsheet_url: + self.spreadsheet_url = self.format_block_parameter_template_from_workflow_run_context( + self.spreadsheet_url, workflow_run_context + ) + if self.sheet_name: + self.sheet_name = self.format_block_parameter_template_from_workflow_run_context( + self.sheet_name, workflow_run_context + ) + if self.range: + self.range = self.format_block_parameter_template_from_workflow_run_context( + self.range, workflow_run_context + ) + if self.credential_id: + self.credential_id = self.format_block_parameter_template_from_workflow_run_context( + self.credential_id, workflow_run_context + ) + if self.values: + self.values = self.format_block_parameter_template_from_workflow_run_context( + self.values, workflow_run_context, env=jinja_json_finalize_env + ) + + def _coerce_values(self, raw: Any, *, column_offset: int = 0) -> list[list[Any]]: + if isinstance(raw, dict): + if isinstance(raw.get("values"), list) and isinstance(raw.get("rows"), list): + LOG.warning("Google Sheets write payload has both 'values' and 'rows'; using 'values'") + for key in ("values", "rows"): + inner = raw.get(key) + if isinstance(inner, list): + raw = inner + break + if not isinstance(raw, list): + raise ValueError("Google Sheets write expects a JSON array of rows") + if not raw: + return [] + if all(isinstance(row, list) for row in raw): + return cast(list[list[Any]], raw) + if all(isinstance(row, dict) for row in raw): + if not self.column_mapping: + raise ValueError("column_mapping is required when writing a list of objects to Google Sheets") + indexed: list[tuple[int, str]] = [] + seen_columns: set[int] = set() + for field_key, target in self.column_mapping.items(): + target_str = str(target).strip().upper() + # str.isalpha accepts non-ASCII letters; restrict to A-Z so we never silently + # map "Ω" or "Α" to a column index. 
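+                # (The second quoted letter above is a Greek capital alpha, a homoglyph of Latin "A".)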
+ if not target_str or not re.fullmatch(r"[A-Z]+", target_str): + raise ValueError(f"column_mapping target must be a column letter (A, B, ... ZZ), got: {target!r}") + col_index = column_letters_to_index(target_str) + if col_index > MAX_COLUMN_INDEX: + raise ValueError(f"column_mapping target {target!r} exceeds the Google Sheets column limit (ZZZ)") + if col_index in seen_columns: + raise ValueError(f"column_mapping has duplicate destination column: {target!r}") + pos = col_index - column_offset + if pos < 0: + raise ValueError( + f"column_mapping target {target!r} falls before the range start column; " + f"widen the range or remap this field" + ) + seen_columns.add(col_index) + indexed.append((pos, field_key)) + width = max(pos for pos, _ in indexed) + 1 + coerced: list[list[Any]] = [] + for row in raw: + padded: list[Any] = [None] * width + for pos, field_key in indexed: + padded[pos] = row.get(field_key) + coerced.append(padded) + return coerced + raise ValueError("Google Sheets write expects rows to be all lists or all objects") + + def _parse_values_or_raise(self) -> Any: + """Parse `self.values` JSON text into raw Python data. + + Returns raw parsed JSON so the caller can rich-detect or coerce as needed. + Callers performing a flat write must pass the result through `_coerce_values`. + """ + if not self.values: + return [] + snippet = self.values[:200] + try: + return json.loads(self.values) + except (ValueError, json.JSONDecodeError) as e: + stripped = self.values.lstrip() + hint = "" + if stripped.startswith("{'") or stripped.startswith("[{'") or stripped.startswith("['"): + hint = ( + " Looks like a Python dict/list repr - wrap your template with | tojson " + "(e.g. {{ block_1.output | tojson }})." + ) + raise ValueError(f"{str(e)}.{hint} Rendered values: {snippet!r}") from e + + async def execute( + self, + workflow_run_id: str, + workflow_run_block_id: str, + organization_id: str | None = None, + browser_session_id: str | None = None, + **kwargs: Any, + ) -> BlockResult: + workflow_run_context = self.get_workflow_run_context(workflow_run_id) + + try: + self._render_templates(workflow_run_context) + except Exception as e: + return await self.build_block_result( + success=False, + failure_reason=f"Failed to format jinja template: {str(e)}", + output_parameter_value=None, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + if not self.credential_id: + return await self.build_block_result( + success=False, + failure_reason="Google credential_id is required", + output_parameter_value=None, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + try: + spreadsheet_id = extract_spreadsheet_id(self.spreadsheet_url) + except ValueError: + return await self.build_block_result( + success=False, + failure_reason=f"Could not resolve spreadsheet id from: {self.spreadsheet_url}", + output_parameter_value=None, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + a1 = build_a1(self.sheet_name, self.range) + if not a1: + return await self.build_block_result( + success=False, + failure_reason="Either sheet_name or range must be provided", + output_parameter_value=None, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + try: + parsed_values: Any = self._parse_values_or_raise() + except ValueError as e: + return await 
self.build_block_result( + success=False, + failure_reason=f"Invalid values payload: {str(e)}", + output_parameter_value=None, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + effective_org_id = organization_id or workflow_run_context.organization_id + if not effective_org_id: + return await self.build_block_result( + success=False, + failure_reason="organization_id is required to load Google Sheets credentials", + output_parameter_value=None, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + access_token = await app.AGENT_FUNCTION.get_google_sheets_credentials( + organization_id=effective_org_id, + credential_id=self.credential_id, + ) + if not access_token: + return await self.build_block_result( + success=False, + failure_reason="Reconnect the Google account: no valid access token", + output_parameter_value=None, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + created_sheet_id: int | None = None + target_sheet_title = self.sheet_name or extract_a1_sheet_prefix(a1) + if self.create_sheet_if_missing and target_sheet_title: + try: + existing_sheet_id = await self._resolve_sheet_id( + spreadsheet_id=spreadsheet_id, + access_token=access_token, + sheet_title=target_sheet_title, + ) + except GoogleSheetsAPIError as e: + failure_reason = _failure_reason_from_sheets_error("lookup", e) + error_data = {"status_code": e.status, "code": e.code, "error": e.message} + await self.record_output_parameter_value(workflow_run_context, workflow_run_id, error_data) + return await self.build_block_result( + success=False, + failure_reason=failure_reason, + output_parameter_value=error_data, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + if existing_sheet_id is None: + try: + created_sheet_id = await app.AGENT_FUNCTION.ensure_sheet_tab( + access_token=access_token, + spreadsheet_id=spreadsheet_id, + title=target_sheet_title, + ) + except Exception as e: + return await self.build_block_result( + success=False, + failure_reason=f"Failed to create sheet '{target_sheet_title}': {str(e)}", + output_parameter_value=None, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + else: + created_sheet_id = existing_sheet_id + + rich = _try_rich_sheets_input(parsed_values) + if rich is not None: + return await self._execute_rich( + spreadsheet_id=spreadsheet_id, + a1=a1, + access_token=access_token, + workflow_run_context=workflow_run_context, + workflow_run_id=workflow_run_id, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + rich=rich, + known_sheet_id=created_sheet_id, + ) + + try: + rows = self._coerce_values(parsed_values, column_offset=leading_column_offset(a1)) + except ValueError as e: + snippet = self.values[:200] if self.values else "" + extra = "" + if isinstance(parsed_values, dict): + extra = ( + " Rendered a single object instead of a list; reference a list field " + "(e.g. {{ block_1.output.rows | tojson }}) or wrap the object in an array." 
+ ) + return await self.build_block_result( + success=False, + failure_reason=f"Invalid values payload: {str(e)}.{extra} Rendered values: {snippet!r}", + output_parameter_value=None, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + try: + if self.write_mode == "append": + payload = await app.AGENT_FUNCTION.google_sheets_values_append( + access_token=access_token, + spreadsheet_id=spreadsheet_id, + range_=a1, + values=rows, + ) + else: + payload = await app.AGENT_FUNCTION.google_sheets_values_update( + access_token=access_token, + spreadsheet_id=spreadsheet_id, + range_=a1, + values=rows, + ) + except GoogleSheetsAPIError as e: + failure_reason = _failure_reason_from_sheets_error("write", e) + error_data = {"status_code": e.status, "code": e.code, "error": e.message} + await self.record_output_parameter_value(workflow_run_context, workflow_run_id, error_data) + return await self.build_block_result( + success=False, + failure_reason=failure_reason, + output_parameter_value=error_data, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + except Exception as e: + error_data = {"error": str(e), "error_type": "unknown"} + await self.record_output_parameter_value(workflow_run_context, workflow_run_id, error_data) + return await self.build_block_result( + success=False, + failure_reason=f"Google Sheets write failed: {str(e)}", + output_parameter_value=error_data, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + if payload is None: + error_data = {"error": "Google Sheets write returned no payload"} + await self.record_output_parameter_value(workflow_run_context, workflow_run_id, error_data) + return await self.build_block_result( + success=False, + failure_reason="Google Sheets runtime is not available in this build", + output_parameter_value=error_data, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + output_data: dict[str, Any] = { + "spreadsheet_id": spreadsheet_id, + "write_mode": self.write_mode, + "rows_written": len(rows), + "response": payload, + } + await self.record_output_parameter_value(workflow_run_context, workflow_run_id, output_data) + return await self.build_block_result( + success=True, + failure_reason=None, + output_parameter_value=output_data, + status=BlockStatus.completed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + async def _execute_rich( + self, + *, + spreadsheet_id: str, + a1: str, + access_token: str, + workflow_run_context: WorkflowRunContext, + workflow_run_id: str, + workflow_run_block_id: str, + organization_id: str | None, + rich: RichSheetsInput, + known_sheet_id: int | None = None, + ) -> BlockResult: + sheet_id: int | None = known_sheet_id + try: + if sheet_id is None and self.sheet_name: + sheet_id = await self._resolve_sheet_id( + spreadsheet_id=spreadsheet_id, + access_token=access_token, + sheet_title=self.sheet_name, + ) + if sheet_id is None: + # Honor an explicit sheet prefix in the configured A1 (e.g. "'Target'!B2:C3") + # before falling back to the rich payload's sheet metadata. 
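+                # (Assumes extract_a1_sheet_prefix returns the unquoted tab title, e.g. Target.)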
+ a1_prefix = extract_a1_sheet_prefix(a1) + if a1_prefix: + sheet_id = await self._resolve_sheet_id( + spreadsheet_id=spreadsheet_id, + access_token=access_token, + sheet_title=a1_prefix, + ) + if sheet_id is None and rich.sheet_title: + # rich.sheet_id is local to the source spreadsheet; resolve by title against the destination. + sheet_id = await self._resolve_sheet_id( + spreadsheet_id=spreadsheet_id, + access_token=access_token, + sheet_title=rich.sheet_title, + ) + except GoogleSheetsAPIError as e: + failure_reason = _failure_reason_from_sheets_error("lookup", e) + error_data = {"status_code": e.status, "code": e.code, "error": e.message} + await self.record_output_parameter_value(workflow_run_context, workflow_run_id, error_data) + return await self.build_block_result( + success=False, + failure_reason=failure_reason, + output_parameter_value=error_data, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + if sheet_id is None: + return await self.build_block_result( + success=False, + failure_reason=( + "Could not resolve sheet_id for batchUpdate; " + "set sheet_name on the Write block or include sheet_title in the input" + ), + output_parameter_value=None, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + fields_mask = "userEnteredValue,userEnteredFormat,note,hyperlink" + requests: list[dict[str, Any]] = [] + + if self.write_mode == "append": + append_col_offset = leading_column_offset(a1) + padded_cells: list[list[dict[str, Any]]] + if append_col_offset: + padded_cells = [[{}] * append_col_offset + row for row in rich.cells] + else: + padded_cells = list(rich.cells) + rows_payload = [{"values": row} for row in padded_cells] + requests.append( + { + "appendCells": { + "sheetId": sheet_id, + "rows": rows_payload, + "fields": fields_mask, + } + } + ) + merge_origin_row: int | None = None + merge_origin_col = append_col_offset + else: + rows_payload = [{"values": row} for row in rich.cells] + try: + grid_range = a1_to_grid_range(a1, sheet_id) + except ValueError as e: + return await self.build_block_result( + success=False, + failure_reason=f"Update mode requires a fully-qualified A1 range: {e}", + output_parameter_value=None, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + requests.append( + { + "updateCells": { + "range": grid_range, + "rows": rows_payload, + "fields": fields_mask, + } + } + ) + merge_origin_row = grid_range["startRowIndex"] + merge_origin_col = grid_range["startColumnIndex"] + + for merge in rich.merges: + if self.write_mode == "append": + # Append mode appends after the last table row; we'd need the response's + # updatedRange to shift merges correctly. Skip for now. 
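+ # (This path issues a batchUpdate appendCells request, whose reply carries
+ # no range information; only the values.append endpoint reports an
+ # updates.updatedRange that the merge ranges could be offset against.)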
+ continue + row_offset = merge_origin_row or 0 + requests.append( + { + "mergeCells": { + "range": { + "sheetId": sheet_id, + "startRowIndex": merge["start_row_index"] + row_offset, + "endRowIndex": merge["end_row_index"] + row_offset, + "startColumnIndex": merge["start_column_index"] + merge_origin_col, + "endColumnIndex": merge["end_column_index"] + merge_origin_col, + }, + "mergeType": "MERGE_ALL", + } + } + ) + + try: + payload = await app.AGENT_FUNCTION.google_sheets_batch_update( + access_token=access_token, + spreadsheet_id=spreadsheet_id, + requests=requests, + ) + except GoogleSheetsAPIError as e: + failure_reason = _failure_reason_from_sheets_error("batchUpdate", e) + error_data = {"status_code": e.status, "code": e.code, "error": e.message} + await self.record_output_parameter_value(workflow_run_context, workflow_run_id, error_data) + return await self.build_block_result( + success=False, + failure_reason=failure_reason, + output_parameter_value=error_data, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + except Exception as e: + error_data = {"error": str(e), "error_type": "unknown"} + await self.record_output_parameter_value(workflow_run_context, workflow_run_id, error_data) + return await self.build_block_result( + success=False, + failure_reason=f"Google Sheets batchUpdate failed: {str(e)}", + output_parameter_value=error_data, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + if payload is None: + error_data = {"error": "Google Sheets batchUpdate returned no payload"} + await self.record_output_parameter_value(workflow_run_context, workflow_run_id, error_data) + return await self.build_block_result( + success=False, + failure_reason="Google Sheets runtime is not available in this build", + output_parameter_value=error_data, + status=BlockStatus.failed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + output_data: dict[str, Any] = { + "spreadsheet_id": spreadsheet_id, + "write_mode": self.write_mode, + "rows_written": len(rich.cells), + "response": payload, + } + await self.record_output_parameter_value(workflow_run_context, workflow_run_id, output_data) + return await self.build_block_result( + success=True, + failure_reason=None, + output_parameter_value=output_data, + status=BlockStatus.completed, + workflow_run_block_id=workflow_run_block_id, + organization_id=organization_id, + ) + + async def _resolve_sheet_id( + self, + *, + spreadsheet_id: str, + access_token: str, + sheet_title: str | None, + ) -> int | None: + if not sheet_title: + return None + return await app.AGENT_FUNCTION.google_sheets_get_sheet_id( + access_token=access_token, + spreadsheet_id=spreadsheet_id, + sheet_title=sheet_title, + ) diff --git a/skyvern/forge/sdk/workflow/models/workflow.py b/skyvern/forge/sdk/workflow/models/workflow.py index cddff396b..e636369b4 100644 --- a/skyvern/forge/sdk/workflow/models/workflow.py +++ b/skyvern/forge/sdk/workflow/models/workflow.py @@ -157,6 +157,17 @@ class WorkflowRunStatus(StrEnum): WorkflowRunStatus.completed, ] + def is_final_excluding_canceled(self) -> bool: + """Like :meth:`is_final` but excludes ``canceled``. + + For callers that can't distinguish a legitimate user/block cancel from + a synthetic ``canceled`` written as a last-resort fallback — e.g. the + copilot tool reading the row AFTER ``mark_workflow_run_as_canceled_if_not_final`` + has run. 
Callers that want to trust a legitimate ``canceled`` must read + the row BEFORE invoking any cancel helper. + """ + return self.is_final() and self is not WorkflowRunStatus.canceled + class WorkflowRun(BaseModel): workflow_run_id: str @@ -244,6 +255,7 @@ class WorkflowRunResponseBase(BaseModel): workflow_run_id: str status: WorkflowRunStatus failure_reason: str | None = None + failure_category: list[dict[str, Any]] | None = None proxy_location: ProxyLocationInput = None webhook_callback_url: str | None = None webhook_failure_reason: str | None = None diff --git a/skyvern/forge/sdk/workflow/service.py b/skyvern/forge/sdk/workflow/service.py index 29bab3583..ee79d1365 100644 --- a/skyvern/forge/sdk/workflow/service.py +++ b/skyvern/forge/sdk/workflow/service.py @@ -915,7 +915,7 @@ class WorkflowService: return None - @traced(name="skyvern.workflow.execute") + @traced(name="skyvern.workflow.execute", role="wrapper") async def execute_workflow( self, workflow_run_id: str, @@ -924,6 +924,7 @@ class WorkflowService: block_labels: list[str] | None = None, block_outputs: dict[str, Any] | None = None, browser_session_id: str | None = None, + need_call_webhook: bool = True, ) -> WorkflowRun: """Execute a workflow.""" organization_id = organization.organization_id @@ -1014,6 +1015,7 @@ class WorkflowService: api_key=api_key, browser_session_id=browser_session_id, close_browser_on_completion=close_browser_on_completion, + need_call_webhook=need_call_webhook, ) return workflow_run @@ -1069,6 +1071,7 @@ class WorkflowService: api_key=api_key, browser_session_id=browser_session_id, close_browser_on_completion=close_browser_on_completion, + need_call_webhook=need_call_webhook, ) return workflow_run # Start background task to periodically renew the browser session @@ -1256,6 +1259,7 @@ class WorkflowService: api_key=api_key, browser_session_id=browser_session_id, close_browser_on_completion=close_browser_on_completion, + need_call_webhook=need_call_webhook, ) return workflow_run @@ -4526,6 +4530,7 @@ class WorkflowService: workflow_run_id=workflow_run_id, status=workflow_run.status, failure_reason=workflow_run.failure_reason, + failure_category=workflow_run.failure_category, proxy_location=workflow_run.proxy_location, webhook_callback_url=workflow_run.webhook_callback_url, webhook_failure_reason=workflow_run.webhook_failure_reason, diff --git a/skyvern/forge/sdk/workflow/workflow_definition_converter.py b/skyvern/forge/sdk/workflow/workflow_definition_converter.py index bc030de55..7e0d893c9 100644 --- a/skyvern/forge/sdk/workflow/workflow_definition_converter.py +++ b/skyvern/forge/sdk/workflow/workflow_definition_converter.py @@ -58,6 +58,10 @@ from skyvern.forge.sdk.workflow.models.block import ( WaitBlock, WorkflowTriggerBlock, ) +from skyvern.forge.sdk.workflow.models.google_sheets_blocks import ( + GoogleSheetsReadBlock, + GoogleSheetsWriteBlock, +) from skyvern.forge.sdk.workflow.models.parameter import ( PARAMETER_TYPE, RESERVED_PARAMETER_KEYS, @@ -747,6 +751,31 @@ def block_yaml_to_block( use_parent_browser_session=block_yaml.use_parent_browser_session, parameters=workflow_trigger_block_parameters, ) + elif block_yaml.block_type == BlockType.GOOGLE_SHEETS_READ: + google_sheets_read_parameters = _resolve_block_parameters(block_yaml, parameters) + return GoogleSheetsReadBlock( + **base_kwargs, + spreadsheet_url=block_yaml.spreadsheet_url, + sheet_name=block_yaml.sheet_name, + range=block_yaml.range, + credential_id=block_yaml.credential_id, + has_header_row=block_yaml.has_header_row, + 
parameters=google_sheets_read_parameters, + ) + elif block_yaml.block_type == BlockType.GOOGLE_SHEETS_WRITE: + google_sheets_write_parameters = _resolve_block_parameters(block_yaml, parameters) + return GoogleSheetsWriteBlock( + **base_kwargs, + spreadsheet_url=block_yaml.spreadsheet_url, + sheet_name=block_yaml.sheet_name, + range=block_yaml.range, + credential_id=block_yaml.credential_id, + write_mode=block_yaml.write_mode, + values=block_yaml.values, + column_mapping=block_yaml.column_mapping, + create_sheet_if_missing=block_yaml.create_sheet_if_missing, + parameters=google_sheets_write_parameters, + ) raise ValueError(f"Invalid block type {block_yaml.block_type}") diff --git a/skyvern/library/skyvern_locator.py b/skyvern/library/skyvern_locator.py index 8409ddd0f..fc8593288 100644 --- a/skyvern/library/skyvern_locator.py +++ b/skyvern/library/skyvern_locator.py @@ -2,6 +2,8 @@ from typing import Any, Pattern from playwright.async_api import Locator +from skyvern.forge.sdk.event.factory import EventStrategyFactory + class SkyvernLocator: """Locator for finding and interacting with elements on a page. @@ -16,7 +18,12 @@ class SkyvernLocator: # Action methods async def click(self, **kwargs: Any) -> None: """Click the element.""" - await self._locator.click(**kwargs) + timeout = kwargs.pop("timeout", None) + if kwargs: + # Extra kwargs (e.g. position) — use raw Playwright click + await self._locator.click(timeout=timeout, **kwargs) + else: + await EventStrategyFactory.click(self._locator.page, self._locator, timeout=timeout) async def fill(self, value: str, **kwargs: Any) -> None: """Fill an input element with text.""" diff --git a/skyvern/schemas/google_sheets.py b/skyvern/schemas/google_sheets.py new file mode 100644 index 000000000..9d57a5a9c --- /dev/null +++ b/skyvern/schemas/google_sheets.py @@ -0,0 +1,128 @@ +import re + +_SPREADSHEET_URL_RE = re.compile(r"/spreadsheets(?:/u/\d+)?/d/([a-zA-Z0-9-_]+)") +_BARE_ID_RE = re.compile(r"^[a-zA-Z0-9-_]{20,}$") +_A1_CELL_RE = re.compile(r"^([A-Z]+)(\d+)$") +_A1_COLUMN_ONLY_RE = re.compile(r"^([A-Z]+)$") + +MAX_COLUMN_INDEX = 18277 + + +class GoogleSheetsAPIError(RuntimeError): + """Raised when the Drive or Sheets API returns a non-2xx response.""" + + def __init__(self, status: int, code: str | None, message: str) -> None: + super().__init__(message) + self.status = status + self.code = code + self.message = message + + +def extract_spreadsheet_id(url_or_id: str) -> str: + if not url_or_id: + raise ValueError("Empty spreadsheet reference") + match = _SPREADSHEET_URL_RE.search(url_or_id) + if match: + return match.group(1) + if _BARE_ID_RE.match(url_or_id): + return url_or_id + raise ValueError(f"Could not extract spreadsheet id from: {url_or_id}") + + +def quote_sheet_name(name: str) -> str: + return "'" + name.replace("'", "''") + "'" + + +def build_a1(sheet_name: str | None, cell_range: str | None) -> str | None: + quoted = quote_sheet_name(sheet_name) if sheet_name else None + if quoted and cell_range: + return f"{quoted}!{cell_range}" + if quoted: + return quoted + if cell_range: + return cell_range + return None + + +def column_letters_to_index(letters: str) -> int: + index = 0 + for ch in letters: + index = index * 26 + (ord(ch) - ord("A") + 1) + return index - 1 + + +def strip_a1_sheet_prefix(a1: str) -> str: + if a1.startswith("'"): + idx = 1 + while idx < len(a1): + if a1[idx] == "'": + if idx + 1 < len(a1) and a1[idx + 1] == "'": + idx += 2 + continue + break + idx += 1 + if idx < len(a1) and idx + 1 < len(a1) and a1[idx + 1] == 
"!": + return a1[idx + 2 :] + if "!" in a1: + return a1.rsplit("!", 1)[1] + return a1 + + +def extract_a1_sheet_prefix(a1: str | None) -> str | None: + """Return the unquoted sheet title from an A1 string (e.g. ``'Target'!B2:C3`` -> ``Target``).""" + if not a1: + return None + if a1.startswith("'"): + idx = 1 + while idx < len(a1): + if a1[idx] == "'": + if idx + 1 < len(a1) and a1[idx + 1] == "'": + idx += 2 + continue + break + idx += 1 + if idx < len(a1) and idx + 1 < len(a1) and a1[idx + 1] == "!": + return a1[1:idx].replace("''", "'") or None + return None + if "!" in a1: + sheet_part = a1.partition("!")[0] + return sheet_part or None + return None + + +def leading_column_offset(a1: str | None) -> int: + if not a1: + return 0 + body = strip_a1_sheet_prefix(a1) + start = body.split(":", 1)[0].upper() + cell_match = _A1_CELL_RE.match(start) + if cell_match: + return column_letters_to_index(cell_match.group(1)) + col_match = _A1_COLUMN_ONLY_RE.match(start) + if col_match: + return column_letters_to_index(col_match.group(1)) + return 0 + + +def a1_to_grid_range(a1: str, sheet_id: int) -> dict[str, int]: + body = strip_a1_sheet_prefix(a1) + parts = body.split(":") + if len(parts) == 1: + parts = [parts[0], parts[0]] + elif len(parts) != 2: + raise ValueError(f"Unsupported A1 range: {a1!r}") + start_match = _A1_CELL_RE.match(parts[0].upper()) + end_match = _A1_CELL_RE.match(parts[1].upper()) + if not start_match or not end_match: + raise ValueError(f"Only fully qualified A1 ranges are supported, got: {a1!r}") + start_col = column_letters_to_index(start_match.group(1)) + start_row = int(start_match.group(2)) - 1 + end_col = column_letters_to_index(end_match.group(1)) + end_row = int(end_match.group(2)) - 1 + return { + "sheetId": sheet_id, + "startRowIndex": start_row, + "endRowIndex": end_row + 1, + "startColumnIndex": start_col, + "endColumnIndex": end_col + 1, + } diff --git a/skyvern/schemas/workflows.py b/skyvern/schemas/workflows.py index 941fa7d12..9d903071f 100644 --- a/skyvern/schemas/workflows.py +++ b/skyvern/schemas/workflows.py @@ -418,6 +418,8 @@ class BlockType(StrEnum): HUMAN_INTERACTION = "human_interaction" PRINT_PAGE = "print_page" WORKFLOW_TRIGGER = "workflow_trigger" + GOOGLE_SHEETS_READ = "google_sheets_read" + GOOGLE_SHEETS_WRITE = "google_sheets_write" class BlockStatus(StrEnum): @@ -1029,6 +1031,29 @@ class WorkflowTriggerBlockYAML(BlockYAML): parameter_keys: list[str] | None = None +class GoogleSheetsReadBlockYAML(BlockYAML): + block_type: Literal[BlockType.GOOGLE_SHEETS_READ] = BlockType.GOOGLE_SHEETS_READ # type: ignore + spreadsheet_url: str + sheet_name: str | None = None + range: str | None = None + credential_id: str | None = None + has_header_row: bool = True + parameter_keys: list[str] | None = None + + +class GoogleSheetsWriteBlockYAML(BlockYAML): + block_type: Literal[BlockType.GOOGLE_SHEETS_WRITE] = BlockType.GOOGLE_SHEETS_WRITE # type: ignore + spreadsheet_url: str + sheet_name: str | None = None + range: str | None = None + credential_id: str | None = None + write_mode: Literal["append", "update"] = "append" + values: str = "" + column_mapping: dict[str, str] | None = None + create_sheet_if_missing: bool = False + parameter_keys: list[str] | None = None + + PARAMETER_YAML_SUBCLASSES = ( AWSSecretParameterYAML | BitwardenLoginCredentialParameterYAML @@ -1068,6 +1093,8 @@ BLOCK_YAML_SUBCLASSES = ( | ConditionalBlockYAML | PrintPageBlockYAML | WorkflowTriggerBlockYAML + | GoogleSheetsReadBlockYAML + | GoogleSheetsWriteBlockYAML ) BLOCK_YAML_TYPES = 
Annotated[BLOCK_YAML_SUBCLASSES, Field(discriminator="block_type")] diff --git a/skyvern/services/task_v2_service.py b/skyvern/services/task_v2_service.py index e845c4bc9..48b4af586 100644 --- a/skyvern/services/task_v2_service.py +++ b/skyvern/services/task_v2_service.py @@ -455,7 +455,7 @@ async def initialize_task_v2_metadata( return task_v2 -@traced() +@traced(name="skyvern.task_v2.run") async def run_task_v2( organization: Organization, task_v2_id: str, diff --git a/skyvern/webeye/actions/handler.py b/skyvern/webeye/actions/handler.py index 320cd869e..dbdd8df20 100644 --- a/skyvern/webeye/actions/handler.py +++ b/skyvern/webeye/actions/handler.py @@ -77,7 +77,7 @@ from skyvern.forge.sdk.models import Step from skyvern.forge.sdk.schemas.tasks import Task from skyvern.forge.sdk.services.bitwarden import BitwardenConstants from skyvern.forge.sdk.services.credentials import AzureVaultConstants, OnePasswordConstants -from skyvern.forge.sdk.trace import traced +from skyvern.forge.sdk.trace import apply_context_attrs, traced from skyvern.services import service_utils from skyvern.services.action_service import get_action_history from skyvern.utils.prompt_engine import ( @@ -392,7 +392,7 @@ class ActionHandler: cls._teardown_action_types[action_type] = handler @staticmethod - @traced(name="skyvern.agent.action") + @traced(name="skyvern.agent.action", role="wrapper") async def handle_action( scraped_page: ScrapedPage, task: Task, @@ -411,14 +411,21 @@ class ActionHandler: trigger_download_action = ( isinstance(action, (SelectOptionAction, ClickAction, DownloadFileAction)) and action.download ) + # triggers_download splits the bimodal distribution: non-download actions + # finish in ~1s while download actions can burn up to BROWSER_DOWNLOAD_MAX_WAIT_TIME + # (120s) polling for the file. Explains the 36s p95 on this wrapper. 
+ _action_span = otel_trace.get_current_span()  # wrapper span opened by @traced above
+ _action_span.set_attribute("triggers_download", trigger_download_action) + _tracer = otel_trace.get_tracer("skyvern") if not trigger_download_action: - results = await ActionHandler._handle_action( - scraped_page=scraped_page, - task=task, - step=step, - page=page, - action=action, - ) + with _tracer.start_as_current_span("skyvern.agent.action.handle_inner") as _hi_span: + apply_context_attrs(_hi_span) + results = await ActionHandler._handle_action( + scraped_page=scraped_page, + task=task, + step=step, + page=page, + action=action, + ) persisted_action = await app.DATABASE.workflow_params.create_action(action=action) action.action_id = persisted_action.action_id return results @@ -449,45 +456,57 @@ class ActionHandler: download_triggered = False try: - results = await ActionHandler._handle_action( - scraped_page=scraped_page, - task=task, - step=step, - page=page, - action=action, - ) + with _tracer.start_as_current_span("skyvern.agent.action.handle_inner") as _hi_span: + apply_context_attrs(_hi_span) + results = await ActionHandler._handle_action( + scraped_page=scraped_page, + task=task, + step=step, + page=page, + action=action, + ) if not results: return results - try: - LOG.info( - "Checking if there is any new files after click", - download_dir=download_dir, - ) - async with asyncio.timeout(task.download_timeout or BROWSER_DOWNLOAD_MAX_WAIT_TIME): - while True: - list_files_after = list_files_in_directory(download_dir) - if task.browser_session_id: - files_in_browser_session = await app.STORAGE.list_downloaded_files_in_browser_session( - organization_id=task.organization_id, browser_session_id=task.browser_session_id - ) - list_files_after = list_files_after + files_in_browser_session + _download_timeout = task.download_timeout or BROWSER_DOWNLOAD_MAX_WAIT_TIME + with _tracer.start_as_current_span("skyvern.agent.action.download_wait") as _dl_wait_span: + apply_context_attrs(_dl_wait_span) + _dl_wait_span.set_attribute("timeout_seconds", _download_timeout) + _poll_iterations = 0 + try: + LOG.info( + "Checking if there is any new files after click", + download_dir=download_dir, + ) + async with asyncio.timeout(_download_timeout): + while True: + _poll_iterations += 1 + list_files_after = list_files_in_directory(download_dir) + if task.browser_session_id: + files_in_browser_session = await app.STORAGE.list_downloaded_files_in_browser_session( + organization_id=task.organization_id, + browser_session_id=task.browser_session_id, + ) + list_files_after = list_files_after + files_in_browser_session - if len(list_files_after) > len(list_files_before): - LOG.info( - "Found new files in download directory after action", - num_downloaded_files_after=len(list_files_after), - download_dir=download_dir, - workflow_run_id=task.workflow_run_id, - ) - download_triggered = True - break - await asyncio.sleep(1) + if len(list_files_after) > len(list_files_before): + LOG.info( + "Found new files in download directory after action", + num_downloaded_files_after=len(list_files_after), + download_dir=download_dir, + workflow_run_id=task.workflow_run_id, + ) + download_triggered = True + break + await asyncio.sleep(1) - except asyncio.TimeoutError: - LOG.warning( - "No file to download after action", - workflow_run_id=task.workflow_run_id, - ) + except asyncio.TimeoutError: + LOG.warning( + "No file to download after action", + workflow_run_id=task.workflow_run_id, + ) + finally: + _dl_wait_span.set_attribute("download_triggered", download_triggered) + _dl_wait_span.set_attribute("poll_iterations", 
_poll_iterations) if not download_triggered: results[-1].download_triggered = False @@ -701,7 +720,7 @@ def check_for_invalid_web_action( return [] -@traced() +@traced(name="skyvern.agent.action.solve_captcha") async def handle_solve_captcha_action( action: actions.SolveCaptchaAction, page: Page, @@ -717,7 +736,7 @@ async def handle_solve_captcha_action( return [ActionSuccess()] -@traced() +@traced(name="skyvern.agent.action.click") async def handle_click_action( action: actions.ClickAction, page: Page, @@ -880,7 +899,7 @@ async def handle_click_action( return results -@traced() +@traced(name="skyvern.agent.action.click_dropdown_sequential") async def handle_sequential_click_for_dropdown( action: actions.ClickAction, action_history: list[ActionResult], @@ -1006,7 +1025,7 @@ async def handle_sequential_click_for_dropdown( ) -@traced() +@traced(name="skyvern.agent.action.click_to_download") async def handle_click_to_download_file_action( action: actions.ClickAction, page: Page, @@ -1145,7 +1164,7 @@ async def _handle_multi_field_totp_sequence( return None # Success -@traced() +@traced(name="skyvern.agent.action.input_text") async def handle_input_text_action( action: actions.InputTextAction, page: Page, @@ -1596,7 +1615,7 @@ async def handle_input_text_action( await skyvern_element.press_key("Tab") -@traced() +@traced(name="skyvern.agent.action.upload_file") async def handle_upload_file_action( action: actions.UploadFileAction, page: Page, @@ -1682,7 +1701,7 @@ async def handle_upload_file_action( # This function is deprecated in 'extract-actions' prompt. Downloads are handled by the click action handler now. # Currently, it's only used for the download action triggered by the code. -@traced() +@traced(name="skyvern.agent.action.download_file") async def handle_download_file_action( action: actions.DownloadFileAction, page: Page, @@ -1742,7 +1761,7 @@ async def handle_download_file_action( return [ActionFailure(e)] -@traced() +@traced(name="skyvern.agent.action.null") async def handle_null_action( action: actions.NullAction, page: Page, @@ -1753,7 +1772,7 @@ async def handle_null_action( return [ActionSuccess()] -@traced() +@traced(name="skyvern.agent.action.select_option") async def handle_select_option_action( action: actions.SelectOptionAction, page: Page, @@ -2029,7 +2048,7 @@ async def handle_select_option_action( try: await EventStrategyFactory.move_to_element(page, skyvern_element.get_locator()) - await skyvern_element.get_locator().click(timeout=timeout) + await EventStrategyFactory.click(page, skyvern_element.get_locator(), timeout=timeout) except Exception: LOG.info( "fail to open dropdown by clicking, try to press arrow down to open", @@ -2074,7 +2093,7 @@ async def handle_select_option_action( await incremental_scraped.stop_listen_dom_increment() -@traced() +@traced(name="skyvern.agent.action.checkbox") async def handle_checkbox_action( action: actions.CheckboxAction, page: Page, @@ -2103,7 +2122,7 @@ async def handle_checkbox_action( return [ActionSuccess()] -@traced() +@traced(name="skyvern.agent.action.wait") async def handle_wait_action( action: actions.WaitAction, page: Page, @@ -2115,7 +2134,7 @@ async def handle_wait_action( return [ActionFailure(exception=Exception("Wait action is treated as a failure"))] -@traced() +@traced(name="skyvern.agent.action.hover") async def handle_hover_action( action: actions.HoverAction, page: Page, @@ -2154,7 +2173,7 @@ async def handle_hover_action( return [ActionFailure(FailToHover(skyvern_element.get_id(), msg=str(exc)))] -@traced() 
+@traced(name="skyvern.agent.action.terminate") async def handle_terminate_action( action: actions.TerminateAction, page: Page, @@ -2186,58 +2205,68 @@ async def handle_complete_action( task: Task, step: Step, ) -> list[ActionResult]: - if not action.verified and task.navigation_goal: - LOG.info( - "CompleteAction hasn't been verified, going to verify the user goal", + # verification_path separates the three distinct runtime paths that + # roll up to this span — explains the 175x p95/p50 ratio. + _span = otel_trace.get_current_span() + if action.verified or not task.navigation_goal: + _span.set_attribute("verification_path", "already_verified") + return [ActionSuccess()] + + LOG.info( + "CompleteAction hasn't been verified, going to verify the user goal", + workflow_run_id=task.workflow_run_id, + ) + try: + verification_result = await app.agent.complete_verify(page, scraped_page, task, step) + except Exception as e: + _span.set_attribute("verification_path", "needs_llm_error") + LOG.exception( + "Failed to verify the complete action", workflow_run_id=task.workflow_run_id, ) - try: - verification_result = await app.agent.complete_verify(page, scraped_page, task, step) - except Exception as e: - LOG.exception( - "Failed to verify the complete action", - workflow_run_id=task.workflow_run_id, - ) - return [ActionFailure(exception=e)] + return [ActionFailure(exception=e)] - # Check if we should terminate instead of complete - # Note: This requires the USE_TERMINATION_AWARE_COMPLETE_VERIFICATION experiment to be enabled - if verification_result.is_terminate: - LOG.warning( - "CompleteAction verification determined task should terminate instead (termination-aware experiment)", - workflow_run_id=task.workflow_run_id, - thoughts=verification_result.thoughts, - status=verification_result.status if verification_result.status else "legacy", - ) - # Create a TerminateAction and execute it - terminate_action = actions.TerminateAction( - reasoning=verification_result.thoughts, - organization_id=action.organization_id, - workflow_run_id=action.workflow_run_id, - task_id=action.task_id, - step_id=action.step_id, - step_order=action.step_order, - action_order=action.action_order, - ) - results = await handle_terminate_action(terminate_action, page, scraped_page, task, step) - action.action_type = ActionType.TERMINATE - action.reasoning = terminate_action.reasoning - action.errors = terminate_action.errors - return results - - if not verification_result.is_complete: - return [ActionFailure(exception=IllegitComplete(data={"error": verification_result.thoughts}))] - - LOG.info( - "CompleteAction has been verified successfully", + # Check if we should terminate instead of complete + # Note: This requires the USE_TERMINATION_AWARE_COMPLETE_VERIFICATION experiment to be enabled + if verification_result.is_terminate: + _span.set_attribute("verification_path", "terminate_requested") + LOG.warning( + "CompleteAction verification determined task should terminate instead (termination-aware experiment)", workflow_run_id=task.workflow_run_id, + thoughts=verification_result.thoughts, + status=verification_result.status if verification_result.status else "legacy", ) - action.verified = True + # Create a TerminateAction and execute it + terminate_action = actions.TerminateAction( + reasoning=verification_result.thoughts, + organization_id=action.organization_id, + workflow_run_id=action.workflow_run_id, + task_id=action.task_id, + step_id=action.step_id, + step_order=action.step_order, + action_order=action.action_order, + 
) + results = await handle_terminate_action(terminate_action, page, scraped_page, task, step) + action.action_type = ActionType.TERMINATE + action.reasoning = terminate_action.reasoning + action.errors = terminate_action.errors + return results + + if not verification_result.is_complete: + _span.set_attribute("verification_path", "needs_llm_rejected") + return [ActionFailure(exception=IllegitComplete(data={"error": verification_result.thoughts}))] + + _span.set_attribute("verification_path", "needs_llm_verified") + LOG.info( + "CompleteAction has been verified successfully", + workflow_run_id=task.workflow_run_id, + ) + action.verified = True return [ActionSuccess()] -@traced() +@traced(name="skyvern.agent.action.extract") async def handle_extract_action( action: actions.ExtractAction, page: Page, @@ -2259,7 +2288,7 @@ async def handle_extract_action( return [ActionFailure(exception=Exception("No data extraction goal"))] -@traced() +@traced(name="skyvern.agent.action.scroll") async def handle_scroll_action( action: actions.ScrollAction, page: Page, @@ -2373,7 +2402,7 @@ async def handle_scroll_action( return [ActionSuccess()] -@traced() +@traced(name="skyvern.agent.action.keypress") async def handle_keypress_action( action: actions.KeypressAction, page: Page, @@ -2385,7 +2414,7 @@ async def handle_keypress_action( return [ActionSuccess()] -@traced() +@traced(name="skyvern.agent.action.move") async def handle_move_action( action: actions.MoveAction, page: Page, @@ -2397,7 +2426,7 @@ async def handle_move_action( return [ActionSuccess()] -@traced() +@traced(name="skyvern.agent.action.drag") async def handle_drag_action( action: actions.DragAction, page: Page, @@ -2409,7 +2438,7 @@ async def handle_drag_action( return [ActionSuccess()] -@traced() +@traced(name="skyvern.agent.action.verification_code") async def handle_verification_code_action( action: actions.VerificationCodeAction, page: Page, @@ -2426,7 +2455,7 @@ async def handle_verification_code_action( return [ActionSuccess()] -@traced() +@traced(name="skyvern.agent.action.left_mouse") async def handle_left_mouse_action( action: actions.LeftMouseAction, page: Page, @@ -2438,7 +2467,7 @@ async def handle_left_mouse_action( return [ActionSuccess()] -@traced() +@traced(name="skyvern.agent.action.goto_url") async def handle_goto_url_action( action: actions.GotoUrlAction, page: Page, @@ -2450,7 +2479,7 @@ async def handle_goto_url_action( return [ActionSuccess()] -@traced() +@traced(name="skyvern.agent.action.close_page") async def handle_close_page_action( action: actions.ClosePageAction, page: Page, @@ -2559,7 +2588,7 @@ async def chain_click( try: if not await skyvern_element.navigate_to_a_href(page=page): await EventStrategyFactory.move_to_element(page, locator) - await locator.click(timeout=timeout) + await EventStrategyFactory.click(page, locator, timeout=timeout) LOG.info("Chain click: main element click succeeded", action=action, locator=locator) return [ActionSuccess()] @@ -2575,7 +2604,7 @@ async def chain_click( locator=locator, ) if bound_element := await skyvern_element.find_label_for(dom=dom): - await bound_element.get_locator().click(timeout=timeout) + await EventStrategyFactory.click(page, bound_element.get_locator(), timeout=timeout) action_results.append(ActionSuccess()) return action_results except Exception as e: @@ -2593,7 +2622,7 @@ async def chain_click( if bound_element := await skyvern_element.find_element_in_label_children( dom=dom, element_type=InteractiveElement.INPUT ): - await 
bound_element.get_locator().click(timeout=timeout) + await EventStrategyFactory.click(page, bound_element.get_locator(), timeout=timeout) action_results.append(ActionSuccess()) return action_results except Exception as e: @@ -2611,6 +2640,8 @@ async def chain_click( ) if bound_locator := await skyvern_element.find_bound_label_by_attr_id(): # click on (0, 0) to avoid playwright clicking on the wrong element by accident + # Intentional positional click bypassing EventStrategyFactory — position + # anchoring at (0,0) avoids mis-targeting nested elements inside labels await bound_locator.click(timeout=timeout, position={"x": 0, "y": 0}) action_results.append(ActionSuccess()) return action_results @@ -2628,6 +2659,8 @@ async def chain_click( ) if bound_locator := await skyvern_element.find_bound_label_by_direct_parent(): # click on (0, 0) to avoid playwright clicking on the wrong element by accident + # Intentional positional click bypassing EventStrategyFactory — position + # anchoring at (0,0) avoids mis-targeting nested elements inside labels await bound_locator.click(timeout=timeout, position={"x": 0, "y": 0}) action_results.append(ActionSuccess()) return action_results @@ -2707,7 +2740,7 @@ async def chain_click( locator=locator, ) - await blocking_element.get_locator().click(timeout=timeout) + await EventStrategyFactory.click(page, blocking_element.get_locator(), timeout=timeout) action_results.append(ActionSuccess()) return action_results except Exception as e: @@ -2732,7 +2765,7 @@ async def chain_click( return [ActionFailure(WrongElementToUploadFile(action.element_id))] -@traced() +@traced(name="skyvern.agent.dropdown.auto_completion") async def choose_auto_completion_dropdown( context: InputOrSelectContext, page: Page, @@ -2818,7 +2851,9 @@ async def choose_auto_completion_dropdown( input_value=text, ) try: - await fast_path_locator.click(timeout=settings.BROWSER_ACTION_TIMEOUT_MS) + await EventStrategyFactory.click( + page, fast_path_locator, timeout=settings.BROWSER_ACTION_TIMEOUT_MS + ) clear_input = False result.action_result = ActionSuccess() return result @@ -3122,7 +3157,7 @@ async def input_or_auto_complete_input( return None -@traced() +@traced(name="skyvern.agent.dropdown.discover_and_select") async def discover_and_select_from_full_dropdown( context: InputOrSelectContext, page: Page, @@ -3150,7 +3185,9 @@ async def discover_and_select_from_full_dropdown( # Try click first to open the dropdown (most combobox components respond to click) try: - await skyvern_element.get_locator().click(timeout=settings.BROWSER_ACTION_TIMEOUT_MS) + await EventStrategyFactory.click( + page, skyvern_element.get_locator(), timeout=settings.BROWSER_ACTION_TIMEOUT_MS + ) except Exception: LOG.info( "Click failed in discover fallback, continuing to ArrowDown", @@ -3307,7 +3344,7 @@ async def discover_and_select_from_full_dropdown( await incremental_scraped.stop_listen_dom_increment() -@traced() +@traced(name="skyvern.agent.dropdown.select_sequential") async def sequentially_select_from_dropdown( action: SelectOptionAction, input_or_select_context: InputOrSelectContext, @@ -3487,7 +3524,7 @@ class CustomSelectPromptOptions(BaseModel): target_value: str | None = None -@traced() +@traced(name="skyvern.agent.dropdown.select_emerging") async def select_from_emerging_elements( current_element_id: str, options: CustomSelectPromptOptions, @@ -3580,7 +3617,7 @@ async def select_from_emerging_elements( return ActionSuccess() -@traced() +@traced(name="skyvern.agent.dropdown.select") async def 
select_from_dropdown( context: InputOrSelectContext, page: Page, @@ -3767,7 +3804,7 @@ async def select_from_dropdown( value=value, ) await EventStrategyFactory.move_to_element(page, locator) - await locator.click(timeout=timeout) + await EventStrategyFactory.click(page, locator, timeout=timeout) single_select_result.action_result = ActionSuccess() return single_select_result except Exception as e: @@ -3775,7 +3812,7 @@ async def select_from_dropdown( return single_select_result -@traced() +@traced(name="skyvern.agent.dropdown.select_by_value") async def select_from_dropdown_by_value( value: str, page: Page, @@ -3796,7 +3833,7 @@ async def select_from_dropdown_by_value( element_locator = await incremental_scraped.select_one_element_by_value(value=value) if element_locator is not None: - await element_locator.click(timeout=timeout) + await EventStrategyFactory.click(page, element_locator, timeout=timeout) return ActionSuccess() if dropdown_menu_element is None: @@ -3832,7 +3869,7 @@ async def select_from_dropdown_by_value( element_locator = await incre_scraped.select_one_element_by_value(value=value) if element_locator is not None: - await element_locator.click(timeout=timeout) + await EventStrategyFactory.click(page, element_locator, timeout=timeout) nonlocal selected selected = True return False @@ -4002,7 +4039,7 @@ async def try_to_find_potential_scrollable_element( return skyvern_element -@traced() +@traced(name="skyvern.agent.dropdown.scroll_load_options") async def scroll_down_to_load_all_options( scrollable_element: SkyvernElement, page: Page, @@ -4121,9 +4158,7 @@ async def normal_select( value: str | None = json_response.get("value") try: - await locator.click( - timeout=settings.BROWSER_ACTION_TIMEOUT_MS, - ) + await EventStrategyFactory.click(locator.page, locator, timeout=settings.BROWSER_ACTION_TIMEOUT_MS) except Exception as e: LOG.info( "Failed to click before select action", @@ -4603,7 +4638,31 @@ async def extract_information_for_navigation_goal( # dict / list / str — the `extract-information` prompt uses # `force_dict=False`, so root `type: array` or scalar schemas are valid # return shapes (matches ``ScrapeResult.scraped_data``). + # TEMPORARY INSTRUMENTATION (SKY-8992): the dual-write block below appears + # to never populate Redis in production despite the code being deployed + # and the cloud override verified. Log the gate inputs every call so we + # can see which guard is closing the block. Revert after root-cause is + # identified. + LOG.info( + "extract_information cache store gate", + task_id=task.task_id, + workflow_run_id=task.workflow_run_id, + workflow_permanent_id=task.workflow_permanent_id, + cache_key_present=cache_key is not None, + json_response_type=type(json_response).__name__, + json_response_is_cacheable=isinstance(json_response, (dict, list, str)), + cache_path="handler", + ) if cache_key is not None and isinstance(json_response, (dict, list, str)): + # TEMPORARY INSTRUMENTATION (SKY-8992): confirm the dual-write block is entered. 
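+ # (cache_key is safe to log verbatim here: the guard above only enters
+ # this block when it is non-None, so this record can be joined with the
+ # gate log's cache_key_present field on a per-call basis.)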
+ LOG.info( + "extract_information cache store block entered", + task_id=task.task_id, + workflow_run_id=task.workflow_run_id, + workflow_permanent_id=task.workflow_permanent_id, + cache_key=cache_key, + cache_path="handler", + ) try: extraction_cache.store(task.workflow_run_id, cache_key, json_response) except Exception: @@ -4657,7 +4716,7 @@ async def click_listbox_option( try: skyvern_element = await dom.get_skyvern_element_by_id(child["id"]) locator = skyvern_element.locator - await locator.click(timeout=1000) + await EventStrategyFactory.click(page, locator, timeout=1000) return True except Exception: diff --git a/skyvern/webeye/browser_factory.py b/skyvern/webeye/browser_factory.py index 913f75c73..15bafaedb 100644 --- a/skyvern/webeye/browser_factory.py +++ b/skyvern/webeye/browser_factory.py @@ -795,6 +795,11 @@ async def _connect_to_cdp_browser( LOG.info("Connecting browser CDP connection", remote_browser_url=remote_browser_url) browser = await playwright.chromium.connect_over_cdp(remote_browser_url) + # Mark as remote browser so strategies can adapt their dispatch method + ctx = current() + if ctx is not None: + ctx.is_remote_browser = True + if apply_download_behaviour: await _apply_download_behaviour(browser) diff --git a/skyvern/webeye/scraper/scraper.py b/skyvern/webeye/scraper/scraper.py index 7e87fc0d8..777e04f98 100644 --- a/skyvern/webeye/scraper/scraper.py +++ b/skyvern/webeye/scraper/scraper.py @@ -138,7 +138,6 @@ def build_element_dict( return id_to_css_dict, id_to_element_dict, id_to_frame_dict, id_to_element_hash, hash_to_element_ids -@traced(name="skyvern.agent.scrape_retry") async def scrape_website( browser_state: BrowserState, url: str, @@ -360,7 +359,7 @@ async def scrape_web_unsafe( LOG.warning("Failed to get current x, y position of the page", exc_info=True) _tracer = otel_trace.get_tracer("skyvern") - with _tracer.start_as_current_span("skyvern.agent.screenshot") as _ss_span: + with _tracer.start_as_current_span("skyvern.browser.scrape_screenshot") as _ss_span: apply_context_attrs(_ss_span) # Hardcoded since this is an inline span, not a @traced method. # Update if scrape_web_unsafe is renamed. @@ -542,6 +541,9 @@ async def get_interactable_element_tree( context = skyvern_context.ensure_context() frames = await get_all_children_frames(page) frames = await filter_frames(frames, scrape_exclude) + # Iframe-heavy pages dominate the 1.8s p95 here; frame_count lets us + # slice element_tree latency by page complexity. 
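+ # (frame_count is taken after filter_frames, so excluded frames do not
+ # inflate the attribute.)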
+ otel_trace.get_current_span().set_attribute("frame_count", len(frames)) for frame in frames: frame_index = context.frame_index_map.get(frame, None) @@ -580,7 +582,7 @@ class IncrementalScrapePage(ElementTreeBuilder): return True return False - @traced() + @traced(name="skyvern.agent.incremental_element_tree") async def get_incremental_element_tree( self, cleanup_element_tree: CleanupElementTreeFunc, diff --git a/skyvern/webeye/utils/dom.py b/skyvern/webeye/utils/dom.py index 6fdf7d8cf..86d53dfd1 100644 --- a/skyvern/webeye/utils/dom.py +++ b/skyvern/webeye/utils/dom.py @@ -851,7 +851,7 @@ class SkyvernElement: await EventStrategyFactory.move_to_element(page, self.get_locator()) try: - await self.get_locator().click(timeout=timeout) + await EventStrategyFactory.click(page, self.get_locator(), timeout=timeout) return except Exception: LOG.info("Failed to click by playwright", exc_info=True, element_id=self.get_id()) @@ -863,7 +863,7 @@ class SkyvernElement: blocking_element, _ = await self.find_blocking_element(dom=dom, incremental_page=incremental_page) if blocking_element: LOG.debug("Find the blocking element", element_id=blocking_element.get_id()) - await blocking_element.get_locator().click(timeout=timeout) + await EventStrategyFactory.click(page, blocking_element.get_locator(), timeout=timeout) return except Exception: LOG.info("Failed to click on the blocking element", exc_info=True, element_id=self.get_id()) diff --git a/skyvern/webeye/utils/page.py b/skyvern/webeye/utils/page.py index 5e3d30bd2..34084837b 100644 --- a/skyvern/webeye/utils/page.py +++ b/skyvern/webeye/utils/page.py @@ -7,6 +7,7 @@ from io import BytesIO from typing import Any import structlog +from opentelemetry import trace as otel_trace from PIL import Image from playwright._impl._errors import Error as PlaywrightError from playwright._impl._errors import TimeoutError @@ -15,7 +16,7 @@ from playwright.async_api import ElementHandle, Frame, Page from skyvern.constants import PAGE_CONTENT_TIMEOUT, SKYVERN_DIR from skyvern.exceptions import FailedToTakeScreenshot from skyvern.forge.sdk.settings_manager import SettingsManager -from skyvern.forge.sdk.trace import traced +from skyvern.forge.sdk.trace import apply_context_attrs, traced LOG = structlog.get_logger() @@ -282,7 +283,7 @@ class SkyvernFrame: return await SkyvernFrame.evaluate(frame=frame, expression="() => document.location.href") @staticmethod - @traced() + @traced(name="skyvern.browser.scrolling_screenshot") async def take_scrolling_screenshot( page: Page, file_path: str | None = None, @@ -356,7 +357,7 @@ class SkyvernFrame: await skyvern_frame.safe_scroll_to_x_y(x, y) @staticmethod - @traced() + @traced(name="skyvern.browser.split_screenshots") async def take_split_screenshots( page: Page, url: str | None = None, @@ -552,7 +553,7 @@ class SkyvernFrame: js_script = "() => removeAllUniqueIds()" await self.evaluate(frame=self.frame, expression=js_script) - @traced() + @traced(name="skyvern.browser.element_tree_from_body") async def build_tree_from_body( self, frame_name: str | None, @@ -569,7 +570,7 @@ class SkyvernFrame: arg=[frame_name, frame_index, must_included_tags], ) - @traced() + @traced(name="skyvern.browser.incremental_element_tree") async def get_incremental_element_tree( self, wait_until_finished: bool = True, @@ -580,7 +581,7 @@ class SkyvernFrame: frame=self.frame, expression=js_script, timeout_ms=timeout_ms, arg=[wait_until_finished] ) - @traced() + @traced(name="skyvern.browser.element_tree_from_element") async def build_tree_from_element( 
self, starter: ElementHandle, @@ -595,11 +596,20 @@ class SkyvernFrame: @traced(name="skyvern.browser.wait_for_animation") async def safe_wait_for_animation_end(self, before_wait_sec: float = 0, timeout_ms: float = 3000) -> None: + # Separates the fast finished-quickly path from the timeout/error paths + # that burn the full timeout budget — explains the 124x p95/p50 ratio. + _span = otel_trace.get_current_span() try: await asyncio.sleep(before_wait_sec) await self.frame.wait_for_load_state("load", timeout=timeout_ms) await self.wait_for_animation_end(timeout_ms=timeout_ms) + _span.set_attribute("animation_result", "finished") + except (TimeoutError, asyncio.TimeoutError): + _span.set_attribute("animation_result", "timeout") + LOG.debug("Timed out waiting for animation end, but ignore it", exc_info=True) + return except Exception: + _span.set_attribute("animation_result", "error") LOG.debug("Failed to wait for animation end, but ignore it", exc_info=True) return @@ -615,6 +625,7 @@ class SkyvernFrame: return await asyncio.sleep(0.1) + @traced(name="skyvern.browser.page_ready", role="wrapper") async def wait_for_page_ready( self, network_idle_timeout_ms: float = 3000, @@ -635,70 +646,84 @@ class SkyvernFrame: before attempting to interact with elements. """ total_start_time = time.time() + _tracer = otel_trace.get_tracer("skyvern") # 1. Wait for loading indicators to disappear (longest timeout first) loading_indicator_duration_ms = 0.0 step_start_time = time.time() loading_indicator_result = "success" - try: - await self._wait_for_loading_indicators_gone(timeout_ms=loading_indicator_timeout_ms) - except (TimeoutError, asyncio.TimeoutError): - loading_indicator_result = "timeout" - LOG.warning("Loading indicator timeout - some indicators may still be present, proceeding") - except Exception: - loading_indicator_result = "error" - LOG.warning("Failed to check loading indicators, proceeding", exc_info=True) - finally: - loading_indicator_duration_ms = (time.time() - step_start_time) * 1000 - LOG.info( - "page_readiness_check", - step="loading_indicators", - result=loading_indicator_result, - duration_ms=loading_indicator_duration_ms, - timeout_ms=loading_indicator_timeout_ms, - ) + with _tracer.start_as_current_span("skyvern.browser.page_ready.loading_indicators") as _li_span: + apply_context_attrs(_li_span) + _li_span.set_attribute("timeout_ms", loading_indicator_timeout_ms) + try: + await self._wait_for_loading_indicators_gone(timeout_ms=loading_indicator_timeout_ms) + except (TimeoutError, asyncio.TimeoutError): + loading_indicator_result = "timeout" + LOG.warning("Loading indicator timeout - some indicators may still be present, proceeding") + except Exception: + loading_indicator_result = "error" + LOG.warning("Failed to check loading indicators, proceeding", exc_info=True) + finally: + loading_indicator_duration_ms = (time.time() - step_start_time) * 1000 + _li_span.set_attribute("result", loading_indicator_result) + LOG.info( + "page_readiness_check", + step="loading_indicators", + result=loading_indicator_result, + duration_ms=loading_indicator_duration_ms, + timeout_ms=loading_indicator_timeout_ms, + ) # 2. 
Wait for network idle (with short timeout - some pages never go idle) network_idle_duration_ms = 0.0 step_start_time = time.time() network_idle_result = "success" - try: - await self.frame.wait_for_load_state("networkidle", timeout=network_idle_timeout_ms) - except (TimeoutError, asyncio.TimeoutError): - network_idle_result = "timeout" - LOG.warning("Network idle timeout - page may have constant activity, proceeding") - finally: - network_idle_duration_ms = (time.time() - step_start_time) * 1000 - LOG.info( - "page_readiness_check", - step="network_idle", - result=network_idle_result, - duration_ms=network_idle_duration_ms, - timeout_ms=network_idle_timeout_ms, - ) + with _tracer.start_as_current_span("skyvern.browser.page_ready.network_idle") as _ni_span: + apply_context_attrs(_ni_span) + _ni_span.set_attribute("timeout_ms", network_idle_timeout_ms) + try: + await self.frame.wait_for_load_state("networkidle", timeout=network_idle_timeout_ms) + except (TimeoutError, asyncio.TimeoutError): + network_idle_result = "timeout" + LOG.warning("Network idle timeout - page may have constant activity, proceeding") + finally: + network_idle_duration_ms = (time.time() - step_start_time) * 1000 + _ni_span.set_attribute("result", network_idle_result) + LOG.info( + "page_readiness_check", + step="network_idle", + result=network_idle_result, + duration_ms=network_idle_duration_ms, + timeout_ms=network_idle_timeout_ms, + ) # 3. Wait for DOM to stabilize dom_stability_duration_ms = 0.0 step_start_time = time.time() dom_stability_result = "success" - try: - await self._wait_for_dom_stable(stable_ms=dom_stable_ms, timeout_ms=dom_stability_timeout_ms) - except (TimeoutError, asyncio.TimeoutError): - dom_stability_result = "timeout" - LOG.warning("DOM stability timeout - DOM may still be changing, proceeding") - except Exception: - dom_stability_result = "error" - LOG.warning("Failed to check DOM stability, proceeding", exc_info=True) - finally: - dom_stability_duration_ms = (time.time() - step_start_time) * 1000 - LOG.info( - "page_readiness_check", - step="dom_stability", - result=dom_stability_result, - duration_ms=dom_stability_duration_ms, - timeout_ms=dom_stability_timeout_ms, - stable_ms=dom_stable_ms, - ) + with _tracer.start_as_current_span("skyvern.browser.page_ready.dom_stability") as _ds_span: + apply_context_attrs(_ds_span) + _ds_span.set_attribute("timeout_ms", dom_stability_timeout_ms) + _ds_span.set_attribute("stable_ms", dom_stable_ms) + try: + await self._wait_for_dom_stable(stable_ms=dom_stable_ms, timeout_ms=dom_stability_timeout_ms) + except (TimeoutError, asyncio.TimeoutError): + dom_stability_result = "timeout" + LOG.warning("DOM stability timeout - DOM may still be changing, proceeding") + except Exception: + dom_stability_result = "error" + LOG.warning("Failed to check DOM stability, proceeding", exc_info=True) + finally: + dom_stability_duration_ms = (time.time() - step_start_time) * 1000 + _ds_span.set_attribute("result", dom_stability_result) + LOG.info( + "page_readiness_check", + step="dom_stability", + result=dom_stability_result, + duration_ms=dom_stability_duration_ms, + timeout_ms=dom_stability_timeout_ms, + stable_ms=dom_stable_ms, + ) # Log total page readiness check duration total_duration_ms = (time.time() - total_start_time) * 1000 diff --git a/tests/unit/embedded/conftest.py b/tests/unit/embedded/conftest.py index 6e43a7ee1..2514ef460 100644 --- a/tests/unit/embedded/conftest.py +++ b/tests/unit/embedded/conftest.py @@ -37,3 +37,25 @@ def _restore_settings_fixture(): # 
type: ignore[no-untyped-def] # pytest fixtu os.environ.pop("SKYVERN_API_KEY", None) else: os.environ["SKYVERN_API_KEY"] = prev_api_key + + +@pytest.fixture(autouse=True) +def _reset_mcp_session_contextvar(): # type: ignore[no-untyped-def] # pytest fixture + """Reset the mcp_session ContextVar in pytest's main context before each test. + + pytest-asyncio runs the async browser fixture and the test body in separate + asyncio tasks. Each task gets its own copy of the parent ContextVars context + at creation, so ContextVar.set() in the fixture does NOT propagate to the + test task. The fixture relies on the _global_session module-level fallback + in get_current_session() to bridge the two, but that fallback only fires + when _current_session.get() returns None. Any other test that leaves a + populated SessionState in the pytest main context poisons every later + async test task and defeats the fallback. + """ + from skyvern.cli.core import session_manager + + token = session_manager._current_session.set(None) + try: + yield + finally: + session_manager._current_session.reset(token) diff --git a/tests/unit/test_copilot_agent_helpers.py b/tests/unit/test_copilot_agent_helpers.py index 3b6df82e0..75b466c9f 100644 --- a/tests/unit/test_copilot_agent_helpers.py +++ b/tests/unit/test_copilot_agent_helpers.py @@ -2,6 +2,8 @@ from __future__ import annotations +import json +from types import SimpleNamespace from unittest.mock import MagicMock @@ -189,3 +191,126 @@ class TestShouldRestorePersistedWorkflow: r = self._result(persisted=False, updated_workflow=None) assert _should_restore_persisted_workflow(True, r) is False assert _should_restore_persisted_workflow(False, r) is False + + +def _fake_run_result(payload: dict) -> SimpleNamespace: + """Minimal shim for ``RunResultStreaming`` — extract_final_text reads ``final_output``.""" + return SimpleNamespace(final_output=json.dumps(payload), new_items=[]) + + +def _chat_request() -> SimpleNamespace: + return SimpleNamespace( + workflow_id="wf-1", + workflow_permanent_id="wfp-1", + workflow_copilot_chat_id="chat-1", + ) + + +class TestTranslateToAgentResultGating: + """Covers the three SKY-9143 invariants that live in _translate_to_agent_result.""" + + def test_inline_replace_workflow_resets_test_ok_after_prior_pass(self, monkeypatch) -> None: + # A prior run_blocks test passed for the old workflow (ctx.last_test_ok=True, + # ctx.last_workflow=old_wf). The agent then emits inline REPLACE_WORKFLOW + # with a different yaml. The translate helper must invalidate the prior + # test result so _verified_workflow_or_none rejects the untested REPLACE. + from skyvern.forge.sdk.copilot import agent as agent_module + + old_wf = SimpleNamespace(name="old") + new_wf = SimpleNamespace(name="new-from-replace") + monkeypatch.setattr( + "skyvern.forge.sdk.copilot.tools._process_workflow_yaml", + lambda **kwargs: new_wf, + ) + + ctx = _ctx(last_workflow=old_wf, last_workflow_yaml="old: yaml", last_test_ok=True) + result = _fake_run_result( + {"type": "REPLACE_WORKFLOW", "user_response": "Here you go.", "workflow_yaml": "new: yaml"} + ) + agent_result = agent_module._translate_to_agent_result( + result, ctx, global_llm_context=None, chat_request=_chat_request(), organization_id="org-1" + ) + + assert ctx.last_test_ok is None + assert ctx.last_workflow is new_wf + # The REPLACE yaml itself (not the stale snapshot) must land on ctx; + # otherwise a future code path that reads last_workflow_yaml would + # see a string that no longer matches last_workflow. 
+ assert ctx.last_workflow_yaml == "new: yaml" + assert agent_result.updated_workflow is None + assert agent_result.workflow_yaml is None + assert agent_result.response_type == "REPLACE_WORKFLOW" + + def test_inline_replace_with_invalid_yaml_keeps_prior_pass(self, monkeypatch) -> None: + # _process_workflow_yaml raising on a malformed REPLACE must leave + # ctx untouched — no spurious last_test_ok reset, no workflow swap — + # so a prior tested workflow remains available. + import yaml as yaml_mod + + from skyvern.forge.sdk.copilot import agent as agent_module + + tested_wf = SimpleNamespace(name="tested") + + def boom(**kwargs): + raise yaml_mod.YAMLError("mangled yaml") + + monkeypatch.setattr("skyvern.forge.sdk.copilot.tools._process_workflow_yaml", boom) + + ctx = _ctx(last_workflow=tested_wf, last_workflow_yaml="tested: yaml", last_test_ok=True) + result = _fake_run_result( + {"type": "REPLACE_WORKFLOW", "user_response": "here", "workflow_yaml": "::: not yaml"} + ) + agent_result = agent_module._translate_to_agent_result( + result, ctx, global_llm_context=None, chat_request=_chat_request(), organization_id="org-1" + ) + + assert ctx.last_workflow is tested_wf + assert ctx.last_workflow_yaml == "tested: yaml" + assert ctx.last_test_ok is True + assert agent_result.updated_workflow is tested_wf + assert "validation error" in agent_result.user_response.lower() + + def test_ask_question_preserves_model_specific_question(self) -> None: + # The new prompt instructs the model to stop and ASK_QUESTION when it + # cannot test an edit. Row-3 of _rewrite_failed_test_response would + # clobber that specific unblocker with "Could you share more context"; + # the resp_type==ASK_QUESTION guard must skip the rewrite. + from skyvern.forge.sdk.copilot import agent as agent_module + + ctx = _ctx( + last_update_block_count=1, + last_test_ok=None, + last_workflow=SimpleNamespace(name="drafted"), + last_workflow_yaml="drafted: yaml", + ) + specific_question = "I need credentials for site.example — can you link one in Settings?" + result = _fake_run_result({"type": "ASK_QUESTION", "user_response": specific_question}) + agent_result = agent_module._translate_to_agent_result( + result, ctx, global_llm_context=None, chat_request=_chat_request(), organization_id="org-1" + ) + + assert agent_result.user_response == specific_question + # Even ASK_QUESTION must obey the strict gate — no verified workflow this turn. + assert agent_result.updated_workflow is None + assert agent_result.response_type == "ASK_QUESTION" + + def test_reply_still_rewrites_after_failed_test(self) -> None: + # Guard rail for the above: a plain REPLY after a failed test must + # still flow through the "test failed" rewrite so we don't regress + # the original SKY-9143 behavior. 
+ from skyvern.forge.sdk.copilot import agent as agent_module + + ctx = _ctx( + last_update_block_count=2, + last_test_ok=False, + last_test_failure_reason="Failed to navigate to url https://bad.example.", + last_failure_category_top="NAVIGATION_FAILURE", + ) + result = _fake_run_result({"type": "REPLY", "user_response": "All done — your workflow is ready."}) + agent_result = agent_module._translate_to_agent_result( + result, ctx, global_llm_context=None, chat_request=_chat_request(), organization_id="org-1" + ) + + assert "test failed" in agent_result.user_response.lower() + assert "All done" not in agent_result.user_response + assert agent_result.updated_workflow is None diff --git a/tests/unit/test_copilot_cancel_helpers.py b/tests/unit/test_copilot_cancel_helpers.py index ac12898d3..8ae451c1d 100644 --- a/tests/unit/test_copilot_cancel_helpers.py +++ b/tests/unit/test_copilot_cancel_helpers.py @@ -7,16 +7,25 @@ Covers: cancel, but because the row is terminal the real helper would be a no-op. - An SDK-realistic ``asyncio.wait_for`` timeout around the tool coroutine does not leave ``run_task`` running in the background. +- ``_trusted_post_drain_status`` accepts post-drain rows that landed in a + non-ambiguous terminal state (``completed``/``failed``/``terminated``/ + ``timed_out``) and rejects ``canceled``. At post-drain read time a + ``canceled`` row can't be told apart from the synthetic ``canceled`` that + ``mark_workflow_run_as_canceled_if_not_final`` writes as a last-resort + fallback; callsites that want the legitimate-``canceled`` signal read the + row BEFORE invoking the cancel helper. """ from __future__ import annotations import asyncio +from types import SimpleNamespace from typing import Any import pytest -from skyvern.forge.sdk.copilot.tools import _cancel_run_task_if_not_final +from skyvern.forge.sdk.copilot.tools import _cancel_run_task_if_not_final, _trusted_post_drain_status +from skyvern.forge.sdk.workflow.models.workflow import WorkflowRunStatus class _FakeService: @@ -146,3 +155,238 @@ async def test_sdk_style_wait_for_timeout_does_not_leak_background_work( assert "run_task" in run_task_ref assert run_task_ref["run_task"].cancelled() or run_task_ref["run_task"].done() assert service.mark_calls == ["wr_sdk"] + + +# --------------------------------------------------------------------------- +# _trusted_post_drain_status +# --------------------------------------------------------------------------- +# +# After the poll loop exhausts its budget, ``_cancel_run_task_if_not_final`` +# waits briefly for ``execute_workflow``'s shielded finalize, then writes +# synthetic ``canceled`` as a last-resort fallback when nothing else finalized +# the row. At post-drain read time a row with status ``canceled`` could be +# either that synthetic fallback or a legitimate cancel that the drain +# restored — the two are indistinguishable on disk. ``_trusted_post_drain_status`` +# therefore only accepts the unambiguous terminal statuses; the callsite +# reads the row BEFORE the cancel helper runs to catch any legitimate +# ``canceled`` self-finalize. 
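+#
+# For orientation, a minimal sketch of a helper consistent with the tests
+# below (reconstructed from their invariants, so everything beyond the
+# accepted-status set is an assumption, not the real implementation):
+#
+#     _TRUSTED = {
+#         WorkflowRunStatus.completed.value,
+#         WorkflowRunStatus.failed.value,
+#         WorkflowRunStatus.terminated.value,
+#         WorkflowRunStatus.timed_out.value,
+#     }
+#
+#     def _trusted_post_drain_status(run):
+#         if run is None:  # DB read returned no row
+#             return None
+#         return run.status if run.status in _TRUSTED else None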
+ + +def _run_with_status(status: str) -> SimpleNamespace: + return SimpleNamespace(status=status) + + +@pytest.mark.parametrize( + "status", + [ + WorkflowRunStatus.completed.value, + WorkflowRunStatus.failed.value, + WorkflowRunStatus.terminated.value, + WorkflowRunStatus.timed_out.value, + ], +) +def test_trusted_post_drain_accepts_real_terminal_statuses(status: str) -> None: + """Statuses reachable via ``_finalize_workflow_run_status`` — the drain's + real finalize step — let the tool fall through into the normal result + path so ``run_ok`` / ``failure_reason`` land correctly for the LLM.""" + run = _run_with_status(status) + assert _trusted_post_drain_status(run) == status + + +def test_trusted_post_drain_rejects_canceled() -> None: + """``canceled`` at post-drain read time is ambiguous: it could be the + synthetic fallback that ``mark_workflow_run_as_canceled_if_not_final`` + writes when the run was genuinely stuck, OR a legitimate cancel that + ``_finalize_workflow_run_status`` restored. The helper rejects both; the + callsite handles the legitimate case with a pre-cancel read instead.""" + run = _run_with_status(WorkflowRunStatus.canceled.value) + assert _trusted_post_drain_status(run) is None + + +@pytest.mark.parametrize( + "status", + [ + WorkflowRunStatus.created.value, + WorkflowRunStatus.queued.value, + WorkflowRunStatus.running.value, + WorkflowRunStatus.paused.value, + ], +) +def test_trusted_post_drain_rejects_non_final_statuses(status: str) -> None: + """A non-final post-drain status means the drain did not manage to + reconcile the row within its grace window. Emit the timeout error.""" + run = _run_with_status(status) + assert _trusted_post_drain_status(run) is None + + +def test_trusted_post_drain_handles_missing_row() -> None: + """A missing row (DB read returned None) is treated the same as a + non-final row: no trusted status, caller emits the timeout error.""" + assert _trusted_post_drain_status(None) is None + + +# --------------------------------------------------------------------------- +# Reconciliation guard via ``_tool_loop_error`` +# --------------------------------------------------------------------------- +# +# When a block-running tool times out and the post-drain read doesn't +# reconcile the row to a trustworthy-final status, the tool sets +# ``pending_reconciliation_run_id`` on ``CopilotContext``. Until a +# ``get_run_results`` call observes that run in a trustworthy-final status, +# ``_tool_loop_error`` rejects further block-running tool calls so the LLM +# cannot re-invoke a mutation block whose side effects may have landed. 
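+#
+# The guard branch exercised here, sketched from its observable contract
+# (the real _tool_loop_error carries additional guards, and its exact
+# wording is assumed rather than quoted):
+#
+#     if tool_name in _BLOCK_RUNNING_TOOLS and ctx.pending_reconciliation_run_id:
+#         return (
+#             f"Run {ctx.pending_reconciliation_run_id} is awaiting reconciliation; "
+#             "call get_run_results before running more blocks."
+#         )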
+ +from types import SimpleNamespace as _NS # noqa: E402 (grouped with test block) + +from skyvern.forge.sdk.copilot.tools import _BLOCK_RUNNING_TOOLS, _tool_loop_error # noqa: E402 + + +def _guard_ctx(pending_run_id: str | None = None) -> _NS: + """Minimal ctx stub for ``_tool_loop_error``: only the fields it reads.""" + return _NS( + consecutive_tool_tracker=[], + repeated_action_fingerprint_streak_count=0, + last_test_non_retriable_nav_error=None, + pending_reconciliation_run_id=pending_run_id, + ) + + +@pytest.mark.parametrize("tool_name", sorted(_BLOCK_RUNNING_TOOLS)) +def test_reconciliation_guard_blocks_block_running_tools(tool_name: str) -> None: + """With a pending run set, every tool in ``_BLOCK_RUNNING_TOOLS`` is + rejected with an error that names the run id and directs the LLM to call + ``get_run_results`` first.""" + ctx = _guard_ctx(pending_run_id="wr_pending_123") + err = _tool_loop_error(ctx, tool_name) + assert isinstance(err, str) + assert "wr_pending_123" in err + assert "get_run_results" in err + + +@pytest.mark.parametrize( + "tool_name", + ["get_run_results", "update_workflow", "list_credentials"], +) +def test_reconciliation_guard_ignores_non_block_running_tools(tool_name: str) -> None: + """The guard is scoped to ``_BLOCK_RUNNING_TOOLS``. Planning / metadata + tools (including ``get_run_results`` itself, which is the tool that can + CLEAR the flag) must not be rejected.""" + ctx = _guard_ctx(pending_run_id="wr_pending_123") + assert _tool_loop_error(ctx, tool_name) is None + + +def test_reconciliation_guard_passes_when_flag_empty() -> None: + """No pending run → `_tool_loop_error` returns None for block-running + tools (assuming no other guard trips).""" + ctx = _guard_ctx(pending_run_id=None) + for name in _BLOCK_RUNNING_TOOLS: + assert _tool_loop_error(ctx, name) is None + + +def test_reconciliation_guard_rejects_empty_string_run_id() -> None: + """An empty string is treated the same as None — the flag is considered + not set. Prevents a spurious guard trip if anything ever clears the flag + to ``''`` instead of ``None``.""" + ctx = _guard_ctx(pending_run_id="") + for name in _BLOCK_RUNNING_TOOLS: + assert _tool_loop_error(ctx, name) is None + + +# --------------------------------------------------------------------------- +# Clearing the reconciliation guard: ``_maybe_clear_reconciliation_flag`` +# --------------------------------------------------------------------------- +# +# ``get_run_results_tool`` calls this helper after ``_get_run_results`` +# returns. Testing the helper directly keeps the tests targeted and avoids +# the Agents-SDK ``@function_tool`` wrapper machinery. + + +from skyvern.forge.sdk.copilot.tools import _maybe_clear_reconciliation_flag # noqa: E402 + + +def _ctx_with_pending(run_id: str | None) -> _NS: + return _NS(pending_reconciliation_run_id=run_id) + + +def test_reconciliation_flag_clears_on_matching_trusted_final_status() -> None: + """Same run_id AND a trustworthy-final status (``completed``/``failed``/ + ``terminated``/``timed_out``) — the flag clears so the LLM can resume.""" + ctx = _ctx_with_pending("wr_match") + result = { + "ok": True, + "data": { + "workflow_run_id": "wr_match", + "overall_status": WorkflowRunStatus.completed.value, + }, + } + _maybe_clear_reconciliation_flag(ctx, result) + assert ctx.pending_reconciliation_run_id is None + + +def test_reconciliation_flag_does_not_clear_on_matching_canceled() -> None: + """The canceled case is the whole reason the guard exists. 
Even an + explicit read of the pending run that returns ``canceled`` must keep + the guard set — the LLM should report to the user and await input.""" + ctx = _ctx_with_pending("wr_match") + result = { + "ok": True, + "data": { + "workflow_run_id": "wr_match", + "overall_status": WorkflowRunStatus.canceled.value, + }, + } + _maybe_clear_reconciliation_flag(ctx, result) + assert ctx.pending_reconciliation_run_id == "wr_match" + + +def test_reconciliation_flag_does_not_clear_on_non_matching_run_id() -> None: + """``get_run_results(workflow_run_id=None)`` resolves internally to the + most-recent finished run. If that resolves to a different run than the + one the guard is pending on, the flag must NOT clear.""" + ctx = _ctx_with_pending("wr_match") + result = { + "ok": True, + "data": { + "workflow_run_id": "wr_other", + "overall_status": WorkflowRunStatus.completed.value, + }, + } + _maybe_clear_reconciliation_flag(ctx, result) + assert ctx.pending_reconciliation_run_id == "wr_match" + + +@pytest.mark.parametrize( + "status", + [ + WorkflowRunStatus.running.value, + WorkflowRunStatus.queued.value, + WorkflowRunStatus.paused.value, + ], +) +def test_reconciliation_flag_does_not_clear_on_non_final_status(status: str) -> None: + """A matching run_id but a non-final status (e.g. ``running``) leaves + the flag set — the run hasn't reached a trustworthy outcome yet.""" + ctx = _ctx_with_pending("wr_match") + result = { + "ok": True, + "data": {"workflow_run_id": "wr_match", "overall_status": status}, + } + _maybe_clear_reconciliation_flag(ctx, result) + assert ctx.pending_reconciliation_run_id == "wr_match" + + +def test_reconciliation_flag_noop_when_unset() -> None: + """If no reconciliation is pending, the helper must be a silent no-op — + even when the result shape is unexpected.""" + ctx = _ctx_with_pending(None) + _maybe_clear_reconciliation_flag(ctx, {"ok": False, "error": "x"}) + assert ctx.pending_reconciliation_run_id is None + + +def test_reconciliation_flag_noop_on_malformed_result() -> None: + """A result without a `data` dict (e.g. an error envelope) must leave + the flag untouched — we only clear when we can affirmatively see a + trusted status for the pending run_id.""" + ctx = _ctx_with_pending("wr_match") + _maybe_clear_reconciliation_flag(ctx, {"ok": False, "error": "run not found"}) + assert ctx.pending_reconciliation_run_id == "wr_match" diff --git a/tests/unit/test_mcp_http_auth.py b/tests/unit/test_mcp_http_auth.py index a677d341b..35bb33af0 100644 --- a/tests/unit/test_mcp_http_auth.py +++ b/tests/unit/test_mcp_http_auth.py @@ -305,10 +305,13 @@ async def test_validate_mcp_api_key_retries_transient_failure_without_negative_c def test_profile_to_mcp_url_normalizes_base_variants() -> None: - assert mcp_http_auth._canonical_mcp_url("https://api.skyvern.com") == "https://api.skyvern.com/mcp/" - assert mcp_http_auth._canonical_mcp_url("https://api.skyvern.com/") == "https://api.skyvern.com/mcp/" - assert mcp_http_auth._canonical_mcp_url("https://api.skyvern.com/mcp") == "https://api.skyvern.com/mcp/" - assert mcp_http_auth._canonical_mcp_url("https://api.skyvern.com/mcp/") == "https://api.skyvern.com/mcp/" + # Canonical form has no trailing slash so the advertised MCP resource URI + # matches what clients send during RFC 8707 audience / RFC 9728 + # protected-resource comparison. 
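+    # One normalization consistent with all four asserts below (a sketch;
+    # the real helper may parse the URL more carefully):
+    #     base = raw.rstrip("/")
+    #     return base if base.endswith("/mcp") else base + "/mcp"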
+ assert mcp_http_auth._canonical_mcp_url("https://api.skyvern.com") == "https://api.skyvern.com/mcp" + assert mcp_http_auth._canonical_mcp_url("https://api.skyvern.com/") == "https://api.skyvern.com/mcp" + assert mcp_http_auth._canonical_mcp_url("https://api.skyvern.com/mcp") == "https://api.skyvern.com/mcp" + assert mcp_http_auth._canonical_mcp_url("https://api.skyvern.com/mcp/") == "https://api.skyvern.com/mcp" def test_resource_metadata_url_normalizes_base_variants() -> None: @@ -337,6 +340,87 @@ def test_validate_token_audience_accepts_matching_url() -> None: ) +def test_validate_token_audience_tolerates_trailing_slash_mismatch() -> None: + # Token audience minted against the slashed form must still validate when + # the canonical (slashless) expected_resource is used, and vice versa. + mcp_http_auth._validate_token_audience( + {"aud": ["https://api.skyvern.com/mcp/"]}, + "https://api.skyvern.com/mcp", + ) + mcp_http_auth._validate_token_audience( + {"aud": ["https://api.skyvern.com/mcp"]}, + "https://api.skyvern.com/mcp/", + ) + + +def test_validate_token_resource_claim_tolerates_trailing_slash_mismatch() -> None: + # Same normalization applies to the RFC 8707 `resource` claim. + mcp_http_auth._validate_token_resource_claims( + {"resource": "https://api.skyvern.com/mcp/"}, + "https://api.skyvern.com/mcp", + ) + mcp_http_auth._validate_token_resource_claims( + {"resource": "https://api.skyvern.com/mcp"}, + "https://api.skyvern.com/mcp/", + ) + + +def test_validate_token_audience_rejects_missing_aud() -> None: + # Payload without any `aud` key at all must reject — the `any(...)` check + # on an empty audience list cannot match the expected resource. + with pytest.raises(HTTPException, match="Token audience is not valid for this MCP resource"): + mcp_http_auth._validate_token_audience({}, "https://api.skyvern.com/mcp") + + +def test_validate_token_audience_rejects_none_aud() -> None: + with pytest.raises(HTTPException, match="Token audience is not valid for this MCP resource"): + mcp_http_auth._validate_token_audience({"aud": None}, "https://api.skyvern.com/mcp") + + +def test_validate_token_audience_rejects_empty_list_aud() -> None: + with pytest.raises(HTTPException, match="Token audience is not valid for this MCP resource"): + mcp_http_auth._validate_token_audience({"aud": []}, "https://api.skyvern.com/mcp") + + +def test_validate_token_audience_filters_non_string_list_items() -> None: + # Non-string items inside the `aud` array are silently dropped (per the + # asymmetry documented in _validate_token_audience); with only garbage in + # the list, there is nothing to match against the expected resource. + with pytest.raises(HTTPException, match="Token audience is not valid for this MCP resource"): + mcp_http_auth._validate_token_audience({"aud": [42, None, {}]}, "https://api.skyvern.com/mcp") + + +def test_validate_token_audience_rejects_different_path_despite_normalization() -> None: + # Guards against a future refactor broadening rstrip normalization into a + # prefix / startswith check. `/mcp-other/` is not a slash-variant of + # `/mcp` and must be rejected. + with pytest.raises(HTTPException, match="Token audience is not valid for this MCP resource"): + mcp_http_auth._validate_token_audience( + {"aud": ["https://api.skyvern.com/mcp-other/"]}, + "https://api.skyvern.com/mcp", + ) + + +def test_validate_token_resource_claim_rejects_different_path_despite_normalization() -> None: + # Same boundary guard for the `resource` claim. 
+ with pytest.raises(HTTPException, match="Token resource is not valid for this MCP resource"): + mcp_http_auth._validate_token_resource_claims( + {"resource": "https://api.skyvern.com/mcp-other/"}, + "https://api.skyvern.com/mcp", + ) + + +def test_validate_token_resource_claim_rejects_non_string_claim() -> None: + # Explicit type guard: a non-string `resource` claim is a malformed token, + # not a slash-variant of the expected value, and gets its own error detail + # so the cause is obvious in logs. + with pytest.raises(HTTPException, match="Token resource claim must be a string"): + mcp_http_auth._validate_token_resource_claims( + {"resource": 42}, + "https://api.skyvern.com/mcp", + ) + + def test_looks_like_jwt_rejects_dotted_opaque_token() -> None: assert mcp_http_auth._looks_like_jwt("opaque.with.dots") is False diff --git a/tests/unit/test_mcp_session_lifecycle.py b/tests/unit/test_mcp_session_lifecycle.py index a0e2f0b18..735a8e043 100644 --- a/tests/unit/test_mcp_session_lifecycle.py +++ b/tests/unit/test_mcp_session_lifecycle.py @@ -14,6 +14,10 @@ from skyvern.cli.mcp_tools import session as mcp_session @pytest.fixture(autouse=True) def _reset_singletons() -> None: + # Must leave _current_session as None in the pytest main context; any populated + # SessionState written here would be inherited by every later async test task + # via contextvars.copy_context() and would short-circuit the _global_session + # fallback in get_current_session(). client_mod._skyvern_instance.set(None) client_mod._api_key_override.set(None) client_mod._global_skyvern_instance = None @@ -22,7 +26,6 @@ def _reset_singletons() -> None: session_manager._current_session.set(None) session_manager._global_session = None session_manager.set_stateless_http_mode(False) - mcp_session.set_current_session(mcp_session.SessionState()) def test_get_skyvern_reuses_global_instance_across_contexts(monkeypatch: pytest.MonkeyPatch) -> None: diff --git a/tests/unit/test_missing_starter_url.py b/tests/unit/test_missing_starter_url.py new file mode 100644 index 000000000..a835558df --- /dev/null +++ b/tests/unit/test_missing_starter_url.py @@ -0,0 +1,218 @@ +"""Tests for when the first browser block has no starter URL + +Covers: +- MissingStarterUrl exception message formatting. +- The fail-early branch in TaskBlock.execute() for a first task block with no URL. +- The negative case: URL=None but the page already navigated (e.g. browser profile + loaded a homepage) must NOT raise. 
+""" + +from __future__ import annotations + +from contextlib import contextmanager +from datetime import datetime, timezone +from types import SimpleNamespace +from typing import Any, Iterator +from unittest.mock import AsyncMock, MagicMock, patch + +import pytest + +from skyvern.exceptions import MissingStarterUrl +from skyvern.forge.sdk.schemas.tasks import TaskStatus +from skyvern.forge.sdk.workflow.context_manager import WorkflowRunContext +from skyvern.forge.sdk.workflow.models.block import TaskBlock +from skyvern.forge.sdk.workflow.models.parameter import OutputParameter, ParameterType + + +def _output_parameter(key: str = "task_output") -> OutputParameter: + now = datetime.now(timezone.utc) + return OutputParameter( + parameter_type=ParameterType.OUTPUT, + key=key, + description="test output", + output_parameter_id="op_missing_starter_url_test", + workflow_id="w_missing_starter_url_test", + created_at=now, + modified_at=now, + ) + + +def _workflow_run_context() -> WorkflowRunContext: + return WorkflowRunContext( + workflow_title="test", + workflow_id="w_missing_starter_url_test", + workflow_permanent_id="wpid_missing_starter_url_test", + workflow_run_id="wr_missing_starter_url_test", + aws_client=MagicMock(), + ) + + +@contextmanager +def _mock_block_execute_deps(working_page_url: str) -> Iterator[dict[str, Any]]: + """Patch the app-level singletons used by TaskBlock.execute() for a first-task + scenario and hand the test back the mocks it needs to assert on.""" + + workflow_run = SimpleNamespace( + workflow_run_id="wr_missing_starter_url_test", + workflow_permanent_id="wpid_missing_starter_url_test", + organization_id="o_test", + browser_profile_id=None, + browser_address=None, + ) + + working_page = SimpleNamespace(url=working_page_url) + browser_state = MagicMock() + browser_state.get_working_page = AsyncMock(return_value=working_page) + browser_state.take_fullpage_screenshot = AsyncMock(return_value=None) + browser_state.navigate_to_url = AsyncMock() + + browser_manager = MagicMock() + browser_manager.get_or_create_for_workflow_run = AsyncMock(return_value=browser_state) + + workflow_service = MagicMock() + workflow_service.get_workflow_run = AsyncMock(return_value=workflow_run) + workflow_service.get_workflow_by_permanent_id = AsyncMock( + return_value=MagicMock(workflow_id="w_missing_starter_url_test") + ) + + organization = SimpleNamespace(organization_id="o_test") + organizations_db = MagicMock() + organizations_db.get_organization = AsyncMock(return_value=organization) + + tasks_db = MagicMock() + tasks_db.update_task = AsyncMock() + tasks_db.get_last_task_for_workflow_run = AsyncMock(return_value=None) + + observer_db = MagicMock() + observer_db.update_workflow_run_block = AsyncMock(return_value=MagicMock()) + + database = MagicMock() + database.tasks = tasks_db + database.organizations = organizations_db + database.observer = observer_db + + task = SimpleNamespace(task_id="tsk_test", status=TaskStatus.failed) + step = SimpleNamespace(step_id="stp_test") + + agent = MagicMock() + agent.create_task_and_step_from_block = AsyncMock(return_value=(task, step)) + agent.execute_step = AsyncMock() + + with ( + patch("skyvern.forge.sdk.workflow.models.block.app") as mock_app, + patch( + "skyvern.forge.sdk.workflow.models.block.Block.get_workflow_run_context", + return_value=_workflow_run_context(), + ), + patch( + "skyvern.forge.sdk.workflow.models.block.capture_block_download_baseline", + new=AsyncMock(), + ), + ): + mock_app.BROWSER_MANAGER = browser_manager + 
mock_app.WORKFLOW_SERVICE = workflow_service + mock_app.DATABASE = database + mock_app.agent = agent + + yield { + "task": task, + "tasks_db": tasks_db, + "agent": agent, + "browser_state": browser_state, + } + + +def test_missing_starter_url_message_uses_block_label() -> None: + exc = MissingStarterUrl(block_label="open_vendor_url") + assert "open_vendor_url" in str(exc) + assert "starting URL" in str(exc) + assert "workflow parameter" in str(exc) + + +def test_missing_starter_url_message_without_label() -> None: + exc = MissingStarterUrl() + assert "first browser block" in str(exc) + assert "starting URL" in str(exc) + + +@pytest.mark.asyncio +@pytest.mark.parametrize("blank_url", ["about:blank", "", ":"]) +async def test_execute_fails_early_when_first_block_has_no_url(blank_url: str) -> None: + """A first browser block with no URL landing on any blank-page marker + (``about:blank``, empty string, or the rare ``":"`` Playwright reports for + brand-new pages) should raise MissingStarterUrl before scraping starts, + instead of the confusing downstream ScrapingFailedBlankPage.""" + + block = TaskBlock( + label="open_vendor_url", + output_parameter=_output_parameter(), + title="Open vendor URL", + url=None, + ) + + with _mock_block_execute_deps(working_page_url=blank_url) as deps: + with pytest.raises(MissingStarterUrl) as excinfo: + await block.execute( + workflow_run_id="wr_missing_starter_url_test", + workflow_run_block_id="wrb_test", + organization_id="o_test", + ) + + assert "open_vendor_url" in str(excinfo.value) + deps["tasks_db"].update_task.assert_any_call( + deps["task"].task_id, + status=TaskStatus.failed, + organization_id="o_test", + failure_reason=str(excinfo.value), + ) + + +@pytest.mark.asyncio +async def test_execute_does_not_raise_when_profile_loaded_a_page() -> None: + """If the browser session/profile navigated the page away from about:blank before + the first task starts, the block has a meaningful page to scrape — the missing-URL + check must NOT fire.""" + + block = TaskBlock( + label="use_existing_session", + output_parameter=_output_parameter(), + title="Use existing session", + url=None, + ) + + with _mock_block_execute_deps(working_page_url="https://example.com/dashboard") as deps: + # execute_step is mocked to no-op; we rely on the task-status lookup below to + # short-circuit the rest of the block runner. + deps["tasks_db"].get_task = AsyncMock( + return_value=SimpleNamespace( + task_id=deps["task"].task_id, + status=TaskStatus.completed, + failure_reason=None, + ) + ) + # The extra downstream services the block touches for a completed task — stub + # enough of them to let execute() return without crashing. + with ( + patch( + "skyvern.forge.sdk.workflow.models.block.app.STORAGE", + new=MagicMock(get_downloaded_files=AsyncMock(return_value=[])), + create=True, + ), + patch( + "skyvern.forge.sdk.workflow.models.block.app.ARTIFACT_MANAGER", + new=MagicMock(create_workflow_run_block_artifact=AsyncMock()), + create=True, + ), + ): + try: + await block.execute( + workflow_run_id="wr_missing_starter_url_test", + workflow_run_block_id="wrb_test", + organization_id="o_test", + ) + except MissingStarterUrl: + pytest.fail("MissingStarterUrl should not be raised when the page has already navigated") + except Exception: + # Other downstream failures (e.g. artifact lookup) are fine — we only + # care that MissingStarterUrl is NOT raised in this configuration. 
+ pass diff --git a/tests/unit/test_workflow_copilot_route.py b/tests/unit/test_workflow_copilot_route.py index c6db45a6d..38b80ab40 100644 --- a/tests/unit/test_workflow_copilot_route.py +++ b/tests/unit/test_workflow_copilot_route.py @@ -255,6 +255,89 @@ async def test_flag_on_mid_stream_disconnect_restores_when_persisted_and_not_aut restore_mock.assert_not_awaited() +@pytest.mark.asyncio +@pytest.mark.parametrize( + ("auto_accept", "workflow_was_persisted", "has_valid_proposal", "prior_proposal", "expect_clear_call"), + [ + # SKY-9143: auto_accept=False + gated-out turn (no proposal) + mid-turn + # persisted draft => we restore AND clear the stale proposed_workflow + # card so reload doesn't resurrect a card the assistant just invalidated. + (False, True, False, {"workflow_id": "stale"}, True), + # Restore fires but there's nothing stale to clear: skip the no-op DB write. + (False, True, False, None, False), + # auto_accept=False chat-only turn (no persisted draft, no proposal) + # must NOT clear a prior turn's proposed_workflow. + (False, False, False, {"workflow_id": "stale"}, False), + # auto_accept=False turn with a valid proposal sets the new card (not clear). + (False, True, True, {"workflow_id": "stale"}, False), + # auto_accept=True never writes proposed_workflow at all. + (True, True, False, {"workflow_id": "stale"}, False), + (True, True, True, {"workflow_id": "stale"}, False), + ], +) +async def test_proposed_workflow_cleared_on_restore( + monkeypatch: pytest.MonkeyPatch, + auto_accept: bool, + workflow_was_persisted: bool, + has_valid_proposal: bool, + prior_proposal: dict | None, + expect_clear_call: bool, +) -> None: + monkeypatch.setattr(settings, "ENABLE_WORKFLOW_COPILOT_V2", True) + + captured = _install_fake_create(monkeypatch) + + chat = SimpleNamespace( + workflow_copilot_chat_id="chat-1", + workflow_permanent_id="wpid-1", + organization_id="org-1", + proposed_workflow=prior_proposal, + auto_accept=auto_accept, + ) + original_workflow = SimpleNamespace( + workflow_id="wf-canonical", + title="Original", + description="Original description", + workflow_definition=None, + ) + proposal = MagicMock(spec=["model_dump"]) if has_valid_proposal else None + if proposal is not None: + proposal.model_dump.return_value = {"workflow_id": "wf-canonical"} + agent_result = SimpleNamespace( + user_response="done", + updated_workflow=proposal, + global_llm_context=None, + workflow_yaml=None, + workflow_was_persisted=workflow_was_persisted, + clear_proposed_workflow=False, + ) + + _setup_new_copilot_mocks(monkeypatch, chat, original_workflow, agent_result) + + request = MagicMock() + request.headers = {"x-api-key": "sk-test-key"} + organization = SimpleNamespace(organization_id="org-1") + + response = await workflow_copilot_chat_post(request, _make_chat_request(), organization) + assert response is captured["sentinel"] + + stream = MagicMock() + stream.send = AsyncMock(return_value=True) + stream.is_disconnected = AsyncMock(return_value=False) + + handler = captured["handler"] + assert callable(handler) + await handler(stream) + + update_calls = app.DATABASE.workflow_params.update_workflow_copilot_chat.await_args_list + clear_calls = [c for c in update_calls if c.kwargs.get("proposed_workflow") is None] + + if expect_clear_call: + assert len(clear_calls) == 1, f"expected a proposed_workflow=None clear, got {update_calls!r}" + else: + assert not clear_calls, f"did not expect a clear call, got {update_calls!r}" + + @pytest.mark.asyncio async def 
test_env_on_short_circuits_posthog(monkeypatch: pytest.MonkeyPatch) -> None: """Env var True must skip the PostHog check entirely and route to v2.""" diff --git a/tests/unit/worker_activity_import_helpers.py b/tests/unit/worker_activity_import_helpers.py deleted file mode 100644 index f66762ba5..000000000 --- a/tests/unit/worker_activity_import_helpers.py +++ /dev/null @@ -1,86 +0,0 @@ -import importlib -import sys -from types import ModuleType, SimpleNamespace - -import pytest - - -def _install_temporal_activity_stubs(monkeypatch: pytest.MonkeyPatch) -> None: - temporalio_package = ModuleType("temporalio") - temporalio_package.__path__ = [] - temporalio_activity = ModuleType("temporalio.activity") - temporalio_activity.defn = lambda fn: fn - temporalio_package.activity = temporalio_activity - - monkeypatch.setitem(sys.modules, "temporalio", temporalio_package) - monkeypatch.setitem(sys.modules, "temporalio.activity", temporalio_activity) - - -def import_cron_worker_activities(monkeypatch: pytest.MonkeyPatch): - cloud_package = ModuleType("cloud") - cloud_package.__path__ = [] - cloud_clients = ModuleType("cloud.clients") - cloud_yescaptcha = ModuleType("cloud.clients.yescaptcha") - cloud_yescaptcha_client = ModuleType("cloud.clients.yescaptcha.client") - cloud_config = ModuleType("cloud.config") - cloud_db = ModuleType("cloud.db") - cloud_agent_db = ModuleType("cloud.db.cloud_agent_db") - cloud_tasks = ModuleType("cloud.tasks") - - class YescaptchaClient: - def __init__(self, client_key: str) -> None: - self.client_key = client_key - - async def update_stuck_tasks_to_timed_out() -> None: - return None - - async def update_stuck_workflow_runs_to_timed_out() -> None: - return None - - cloud_yescaptcha_client.YescaptchaClient = YescaptchaClient - cloud_config.settings = SimpleNamespace( - ENABLE_YESCAPTCHA_BALANCE_ALERT=False, - YESCAPTCHA_API_KEY="", - ) - cloud_agent_db.cloud_db = SimpleNamespace() - cloud_tasks.update_stuck_tasks_to_timed_out = update_stuck_tasks_to_timed_out - cloud_tasks.update_stuck_workflow_runs_to_timed_out = update_stuck_workflow_runs_to_timed_out - - monkeypatch.setitem(sys.modules, "cloud", cloud_package) - monkeypatch.setitem(sys.modules, "cloud.clients", cloud_clients) - monkeypatch.setitem(sys.modules, "cloud.clients.yescaptcha", cloud_yescaptcha) - monkeypatch.setitem(sys.modules, "cloud.clients.yescaptcha.client", cloud_yescaptcha_client) - monkeypatch.setitem(sys.modules, "cloud.config", cloud_config) - monkeypatch.setitem(sys.modules, "cloud.db", cloud_db) - monkeypatch.setitem(sys.modules, "cloud.db.cloud_agent_db", cloud_agent_db) - monkeypatch.setitem(sys.modules, "cloud.tasks", cloud_tasks) - _install_temporal_activity_stubs(monkeypatch) - - sys.modules.pop("workers.cron_worker.activities", None) - return importlib.import_module("workers.cron_worker.activities") - - -def import_temporal_v2_worker_activities(monkeypatch: pytest.MonkeyPatch): - cloud_package = ModuleType("cloud") - cloud_package.__path__ = [] - cloud_services = ModuleType("cloud.services") - data_scrubber_module = ModuleType("cloud.services.data_scrubber_service") - worker_utils_module = ModuleType("workers.worker_utils") - - class DataScrubber: - pass - - async def activity_teardown() -> None: - return None - - data_scrubber_module.DataScrubber = DataScrubber - worker_utils_module.activity_teardown = activity_teardown - - monkeypatch.setitem(sys.modules, "cloud", cloud_package) - monkeypatch.setitem(sys.modules, "cloud.services", cloud_services) - monkeypatch.setitem(sys.modules, 
"cloud.services.data_scrubber_service", data_scrubber_module) - monkeypatch.setitem(sys.modules, "workers.worker_utils", worker_utils_module) - _install_temporal_activity_stubs(monkeypatch) - - sys.modules.pop("workers.temporal_v2_worker.activities", None) - return importlib.import_module("workers.temporal_v2_worker.activities") diff --git a/uv.lock b/uv.lock index 8717fa448..cac03b239 100644 --- a/uv.lock +++ b/uv.lock @@ -14,7 +14,7 @@ resolution-markers = [ ] [options] -exclude-newer = "2026-04-10T02:43:47.326099Z" +exclude-newer = "2026-04-14T19:31:07.927062Z" exclude-newer-span = "P7D" [manifest] @@ -5606,6 +5606,8 @@ dependencies = [ { name = "fastapi" }, { name = "fastmcp" }, { name = "filetype" }, + { name = "google-auth" }, + { name = "google-auth-oauthlib" }, { name = "google-cloud-aiplatform" }, { name = "greenlet", version = "3.0.3", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version < '3.12'" }, { name = "greenlet", version = "3.4.0", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.12'" }, @@ -5731,6 +5733,8 @@ requires-dist = [ { name = "fastapi", specifier = ">=0.121.0,<0.136" }, { name = "fastmcp", specifier = ">=3.2.0,<4" }, { name = "filetype", specifier = ">=1.2.0,<2" }, + { name = "google-auth", specifier = ">=2.49.2,<3" }, + { name = "google-auth-oauthlib", specifier = ">=1.3.1,<2" }, { name = "google-cloud-aiplatform", specifier = ">=1.90.0,<2" }, { name = "greenlet", marker = "python_full_version == '3.11.*'", specifier = "==3.0.3" }, { name = "greenlet", marker = "python_full_version >= '3.12' and python_full_version < '3.14'", specifier = ">3.0.3" },