mirror of
https://github.com/Skyvern-AI/skyvern.git
synced 2026-04-26 10:41:14 +00:00
1057 lines
36 KiB
Text
SKYVERN WORKFLOW YAML KNOWLEDGE BASE
|
|
|
|
This document provides comprehensive information about Skyvern Workflow YAML structure and blocks. Use this to understand how to construct, modify, and validate workflow definitions.
|
|
|
|
** WORKFLOW STRUCTURE OVERVIEW **
|
|
|
|
A Skyvern workflow is defined in YAML format with the following top-level structure for a workflow definition (embedded under workflow_definition in full specs):
|
|
|
|
title: "<workflow title>"
description: "<optional description>"
workflow_definition:
  version: 2  # IMPORTANT: Always use version 2
  blocks: []
  parameters: []
webhook_callback_url: "<optional_https_url>"  # Optional: Webhook URL to receive workflow run updates
|
|
|
|
Key Concepts:
|
|
- Workflows consist of sequential or conditional blocks that represent specific tasks
|
|
- Each block has a unique label for identification and navigation
|
|
- Blocks can reference workflow parameters using Jinja2 templating
|
|
- Block execution is defined by next_block_label on every non-terminal block
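As a minimal sketch of these concepts, the fragment below chains two blocks: the first names the second in next_block_label, and the second ends the flow with null. The labels, URL, goal text, and the search_query parameter are illustrative assumptions, not taken from a real workflow.

```yaml
workflow_definition:
  version: 2
  blocks:
    - block_type: goto_url
      label: open_home_page            # hypothetical label
      next_block_label: search_site    # hands control to the block below
      url: "https://example.com"
    - block_type: navigation
      label: search_site               # hypothetical label
      next_block_label: null           # terminal block: ends the flow
      navigation_goal: "Search for {{ search_query }} and open the first result"
      parameter_keys:
        - search_query                 # must match a key in the parameters list
  parameters:
    - parameter_type: workflow
      key: search_query                # hypothetical parameter
      workflow_parameter_type: string
```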
|
|
|
|
** WORKFLOW PARAMETERS **
|
|
|
|
Parameters provide input values and credentials to workflows. They are defined in the "parameters" list.
|
|
|
|
Common Parameter Types:
|
|
|
|
* WORKFLOW PARAMETERS (user inputs)
|
|
parameter_type: workflow
|
|
key: <unique_key>
|
|
workflow_parameter_type: <string|integer|float|boolean|json|file_url|credential_id>
|
|
description: <optional description>
|
|
default_value: <optional default>
|
|
|
|
Define exactly one parameters list under workflow_definition.
|
|
|
|
Example:
|
|
workflow_definition:
|
|
version: 2
|
|
blocks: []
|
|
parameters:
|
|
- parameter_type: workflow
|
|
key: search_query
|
|
workflow_parameter_type: string
|
|
description: "Search term to use"
|
|
default_value: "example"
|
|
|
|
* OUTPUT PARAMETERS (block outputs)
|
|
parameter_type: output
|
|
key: <unique_key>
|
|
description: <optional description>
|
|
|
|
* CREDENTIAL PARAMETERS
|
|
parameter_type: workflow
|
|
workflow_parameter_type: credential_id
|
|
key: <unique_key>
|
|
default_value: <credential_id>
|
|
|
|
Using Parameters in Blocks:
|
|
- Reference using Jinja2: {{ param_key }}
|
|
- List parameter_keys in blocks that use them
|
|
- Parameters are resolved before block execution. ALL PARAMETER KEYS REFERENCED IN BLOCKS MUST FIRST BE DEFINED IN THE WORKFLOW PARAMETERS LIST
|
|
|
|
Example:
|
|
workflow_definition:
  version: 2
  blocks:
    - label: block_1
      block_type: navigation
      url: https://news.ycombinator.com/
      navigation_goal: "Give me top {{ topics_count }} news items"
      parameter_keys:
        - topics_count  # list every workflow parameter the block references
      next_block_label: null
  parameters:
    - key: topics_count
      description: null
      parameter_type: workflow
      workflow_parameter_type: integer
      default_value: "3"
|
|
|
|
** COMMON BLOCK FIELDS **
|
|
|
|
All blocks inherit these base fields:
|
|
|
|
block_type: <type> # Required: Defines the block type
|
|
label: <unique_label> # Required: Unique identifier for this block
|
|
next_block_label: <label|null> # Required: Label of next block; use null only for terminal blocks
|
|
continue_on_failure: false # Optional: Continue workflow if block fails
|
|
next_loop_on_failure: false # Optional: Continue to next loop iteration on failure (for loop blocks only)
|
|
model: {} # Optional: Override model settings for this block
|
|
|
|
Important Rules:
|
|
- Labels must be unique within a workflow
|
|
- Labels cannot be empty or contain only whitespace
|
|
- next_block_label is required for all non-terminal blocks
|
|
- Use next_block_label for explicit flow control
|
|
- Set next_block_label to null to mark the end of a flow
|
|
- continue_on_failure allows graceful error handling
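The rules above can be sketched with an optional first step that is allowed to fail. This is a hedged illustration: the labels, URL, and goal text are assumptions, and only the base fields described in this section are used.

```yaml
blocks:
  - block_type: action
    label: dismiss_cookie_banner      # hypothetical label, unique in the workflow
    next_block_label: open_pricing    # explicit flow control to the next block
    continue_on_failure: true         # a missing banner should not stop the workflow
    navigation_goal: "Click 'Accept all' on the cookie banner"
  - block_type: goto_url
    label: open_pricing               # hypothetical label
    next_block_label: null            # null marks the end of the flow
    url: "https://example.com/pricing"
```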
|
|
|
|
** NAVIGATION BLOCK (navigation) **
|
|
|
|
Purpose: Take actions on a page to achieve a focused goal — fill a form, click through a multi-step flow, prepare the page before an extraction.
|
|
|
|
Structure:
|
|
block_type: navigation
|
|
label: <unique_label>
|
|
url: <starting_url> # Optional: URL to navigate to; omit to continue on current page
|
|
title: <block_title> # Required: The title of the block
|
|
navigation_goal: <action_description> # Required: What actions to perform
|
|
error_code_mapping: {} # Optional: Map errors to custom codes
|
|
max_retries: 0 # Optional: Number of retry attempts
|
|
max_steps_per_run: null # Optional: Limit steps per execution
|
|
parameter_keys: [] # Optional: Parameters used in this block
|
|
complete_on_download: false # Optional: Complete when file downloads
|
|
download_suffix: null # Optional: Downloaded file name
|
|
totp_verification_url: null # Optional: TOTP verification URL
|
|
disable_cache: false # Optional: Disable caching
|
|
complete_criterion: null # Optional: Condition to mark complete
|
|
terminate_criterion: null # Optional: Condition to terminate
|
|
complete_verification: true # Optional: Verify completion
|
|
include_action_history_in_verification: false # Optional: Include history in verification
|
|
|
|
Use Cases:
|
|
- Fill out forms on websites
|
|
- Navigate complex multi-step processes
|
|
- Prepare the page for a subsequent extraction block
|
|
- Execute focused browser tasks with clear completion criteria
|
|
|
|
Example:
|
|
workflow_definition:
|
|
version: 2
|
|
blocks:
|
|
- block_type: navigation
|
|
label: search_and_open
|
|
next_block_label: null
|
|
url: "https://example.com/search"
|
|
navigation_goal: "Search for {{ query }} and click the first result"
|
|
parameter_keys:
|
|
- query
|
|
max_retries: 2
|
|
parameters:
|
|
- parameter_type: workflow
|
|
key: query
|
|
workflow_parameter_type: string
|
|
|
|
** URL BLOCK (goto_url) **
|
|
|
|
Purpose: Navigate directly to a URL without any additional instructions.
|
|
|
|
Structure:
|
|
block_type: goto_url
|
|
label: <unique_label>
|
|
url: <target_url> # Required: URL to navigate to
|
|
error_code_mapping: {} # Optional: Custom error codes
|
|
max_retries: 0 # Optional: Retry attempts
|
|
parameter_keys: [] # Optional: Parameters used
|
|
|
|
Use Cases:
|
|
- Jump to a known page before other blocks
|
|
- Reset the browser state to a specific URL
|
|
- Split URL navigation from subsequent actions
|
|
|
|
Example:
|
|
blocks:
|
|
- block_type: goto_url
|
|
label: open_cart
|
|
next_block_label: null
|
|
url: "https://example.com/cart"
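A goto_url block can also take a templated URL. The sketch below assumes a hypothetical order_id workflow parameter and an example.com path; because the URL references a workflow parameter, the key is listed in parameter_keys and defined in the parameters list.

```yaml
workflow_definition:
  version: 2
  blocks:
    - block_type: goto_url
      label: open_order_page              # hypothetical label
      next_block_label: null
      url: "https://example.com/orders/{{ order_id }}"
      parameter_keys:
        - order_id
      max_retries: 1
  parameters:
    - parameter_type: workflow
      key: order_id                       # hypothetical parameter
      workflow_parameter_type: string
```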
|
|
|
|
** ACTION BLOCK (action) **
|
|
|
|
Purpose: Perform a single focused action on the current page without data extraction.
|
|
|
|
Structure:
|
|
block_type: action
|
|
label: <unique_label>
|
|
navigation_goal: <action_description> # Required: Single action to perform
|
|
url: <starting_url> # Optional: URL to start from
|
|
error_code_mapping: {} # Optional: Custom error codes
|
|
max_retries: 0 # Optional: Retry attempts
|
|
parameter_keys: [] # Optional: Parameters used
|
|
complete_on_download: false # Optional: Complete on download
|
|
download_suffix: null # Optional: Download file name
|
|
totp_verification_url: null # Optional: TOTP verification URL
|
|
totp_identifier: null # Optional: TOTP identifier
|
|
disable_cache: false # Optional: Disable cache
|
|
|
|
Use Cases:
|
|
- Click a specific button or link
|
|
- Fill a single field or selection
|
|
- Trigger a download with one action
|
|
|
|
Example:
|
|
blocks:
|
|
- block_type: action
|
|
label: accept_terms
|
|
next_block_label: null
|
|
url: "https://example.com/checkout"
|
|
navigation_goal: "Check the terms checkbox"
|
|
max_retries: 1
|
|
|
|
** TASK BLOCK (task) — NOT AVAILABLE IN WORKFLOW COPILOT **
|
|
|
|
DO NOT EMIT "task" blocks. They are not available in the workflow copilot and will be rejected at persistence. Use:
|
|
- "navigation" for page actions (filling forms, clicking, multi-step flows)
|
|
- "extraction" for data extraction (with data_extraction_goal + data_schema)
|
|
- "validation" for completion checks
|
|
- "login" for authentication
|
|
- "goto_url" for pure URL navigation
|
|
Legacy workflows that already contain "task" blocks continue to run outside the copilot; this ban applies only to copilot emission.
|
|
|
|
** TASK V2 BLOCK (task_v2) — DEPRECATED **
|
|
|
|
DO NOT USE task_v2. Use "navigation" blocks instead (with navigation_goal).
|
|
Use "extraction" blocks for data extraction (with data_extraction_goal + data_schema).
|
|
The task_v2 block type exists only for backward compatibility with existing workflows.
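As a rough sketch of the replacement described above, a goal that a single task or task_v2 block might once have covered ("open the order history and extract the orders") can be split into a navigation block followed by an extraction block. The labels, URL, goals, and schema fields are illustrative assumptions.

```yaml
blocks:
  - block_type: navigation
    label: open_order_history         # hypothetical label
    next_block_label: extract_orders
    url: "https://shop.example.com/account"
    navigation_goal: "Open the order history page"
  - block_type: extraction
    label: extract_orders             # hypothetical label
    next_block_label: null
    data_extraction_goal: "Extract every order with its order number and total"
    data_schema:
      type: array
      items:
        type: object
        properties:
          order_number: {type: string}
          total: {type: number}
```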
|
|
|
|
** FOR LOOP BLOCK (for_loop) **
|
|
|
|
Purpose: Iterate over a list of values and run a sequence of blocks for each item.
|
|
|
|
Structure:
|
|
block_type: for_loop
|
|
label: <unique_label>
|
|
loop_blocks: [] # Required: Blocks to run for each iteration
|
|
loop_over_parameter_key: <param_key> # Optional: Use ONLY for workflow parameters defined in parameters section
|
|
loop_variable_reference: <block_label> # Optional: Use to reference output from a previous block
|
|
complete_if_empty: false # Optional: Complete successfully when list is empty
|
|
|
|
Important Notes:
|
|
- Provide either loop_over_parameter_key or loop_variable_reference
|
|
- Loop blocks must use next_block_label to chain within the loop
|
|
- Each iteration exposes {{ current_value }}, {{ current_item }}, and {{ current_index }}
|
|
|
|
CRITICAL DISTINCTION:
|
|
* loop_over_parameter_key: ONLY for workflow parameters defined in the parameters section
|
|
- Use when looping over a static list or user-provided input
|
|
- Must reference a key from the workflow parameters list
|
|
- Example: loop_over_parameter_key: urls (where "urls" is in parameters)
|
|
|
|
* loop_variable_reference: For referencing output from a PREVIOUS BLOCK
|
|
- Use when looping over data extracted or generated by a previous block
|
|
- Reference the block's label directly (e.g., "extract_rows")
|
|
- System automatically tries: block_label.extracted_information, block_label.results, etc.
|
|
- Example: loop_variable_reference: extract_rows (where "extract_rows" is a previous extraction block)
|
|
|
|
Example 1 - Loop over workflow parameter:
|
|
parameters:
|
|
- parameter_type: workflow
|
|
key: urls
|
|
workflow_parameter_type: json
|
|
default_value:
|
|
- "https://example.com/a"
|
|
- "https://example.com/b"
|
|
blocks:
|
|
- block_type: for_loop
|
|
label: visit_urls
|
|
next_block_label: null
|
|
loop_over_parameter_key: urls # References the workflow parameter "urls"
|
|
loop_blocks:
|
|
- block_type: goto_url
|
|
label: open_url
|
|
next_block_label: null
|
|
url: "{{ current_value }}"
|
|
|
|
Example 2 - Loop over extraction block output:
|
|
blocks:
|
|
- block_type: extraction
|
|
label: extract_products
|
|
next_block_label: process_each_product
|
|
data_extraction_goal: "Extract all products from the table"
|
|
data_schema:
|
|
type: array
|
|
items:
|
|
type: object
|
|
properties:
|
|
name: {type: string}
|
|
price: {type: number}
|
|
- block_type: for_loop
|
|
label: process_each_product
|
|
next_block_label: null
|
|
loop_variable_reference: extract_products # References the extraction block's output
|
|
loop_blocks:
|
|
- block_type: navigation
|
|
label: add_to_cart
|
|
next_block_label: null
|
|
navigation_goal: "Add {{ current_value.name }} to cart"
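The note about chaining inside a loop can be sketched as follows. This fragment assumes an earlier extraction block labeled extract_products whose items each carry a "url" field; that field, the labels, and the goal text are assumptions made for illustration.

```yaml
blocks:
  - block_type: for_loop
    label: review_each_product          # hypothetical label
    next_block_label: null
    loop_variable_reference: extract_products   # label of an earlier extraction block
    complete_if_empty: true             # an empty list completes successfully
    loop_blocks:
      - block_type: goto_url
        label: open_product_page        # hypothetical label
        next_block_label: read_reviews  # chains to the next block inside the loop
        url: "{{ current_value.url }}"  # assumes each item has a "url" field
      - block_type: extraction
        label: read_reviews             # hypothetical label
        next_block_label: null          # last block of each iteration
        data_extraction_goal: "Extract the review count and average rating"
```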
|
|
|
|
** CONDITIONAL BLOCK (conditional) **
|
|
|
|
Purpose: Branch to different blocks based on ordered conditions.
|
|
|
|
Structure:
|
|
block_type: conditional
|
|
label: <unique_label>
|
|
branch_conditions: [] # Required: Ordered list of branch conditions
|
|
|
|
Branch Condition Structure:
|
|
criteria:                                  # Optional for default branch
  criteria_type: <jinja2_template|prompt>  # Optional: inferred when omitted
  expression: <expression>                 # Required when criteria present
  description: <optional description>
next_block_label: <label|null>             # Optional: Label to execute when matched
description: <optional description>
is_default: false                          # Must be set to true when criteria is omitted
|
|
|
|
Important Notes:
|
|
- At least one branch is required
|
|
- Branches are evaluated in order; first match wins
|
|
- Only one default branch is allowed
|
|
- Branches without criteria must set is_default: true
|
|
|
|
Example:
|
|
blocks:
|
|
- block_type: conditional
|
|
label: route_by_status
|
|
next_block_label: null
|
|
branch_conditions:
|
|
- criteria:
|
|
criteria_type: jinja2_template
|
|
expression: "{{ account_status == 'active' }}"
|
|
next_block_label: handle_active
|
|
description: "Active accounts"
|
|
is_default: false
|
|
- is_default: true
|
|
next_block_label: handle_inactive
|
|
description: "Fallback for all other states"
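The structure also lists prompt as a criteria_type. The sketch below assumes that a prompt expression is written as a natural-language condition, which the source does not spell out; the labels and the condition text are hypothetical.

```yaml
blocks:
  - block_type: conditional
    label: route_by_page_state          # hypothetical label
    next_block_label: null
    branch_conditions:
      - criteria:
          criteria_type: prompt
          # Assumption: prompt criteria are phrased in natural language
          expression: "The page shows a login form"
        next_block_label: sign_in_again       # hypothetical label
        description: "Session expired"
        is_default: false
      - is_default: true                      # exactly one default branch, no criteria
        next_block_label: continue_checkout   # hypothetical label
        description: "Session still valid"
```

Branches are evaluated in order, so the default branch is placed last.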
|
|
|
|
** LOGIN BLOCK (login) **
|
|
|
|
Purpose: Handle authentication flows including username/password and TOTP/2FA.
|
|
|
|
Structure:
|
|
block_type: login
|
|
label: <unique_label>
|
|
url: <login_page_url> # Optional: Login page URL
|
|
title: <block_title> # Required: The title of the block
|
|
navigation_goal: null # Optional: Additional navigation after login
|
|
error_code_mapping: {} # Optional: Custom error codes
|
|
max_retries: 0 # Optional: Retry attempts
|
|
max_steps_per_run: null # Optional: Step limit
|
|
parameter_keys: [] # Required: Should include credential parameters
|
|
complete_criterion: null # Optional: Completion condition
|
|
terminate_criterion: null # Optional: Termination condition
|
|
complete_verification: true # Optional: Verify successful login
|
|
|
|
Use Cases:
|
|
- Login to websites with username/password
|
|
- Handle 2FA/TOTP authentication
|
|
- Manage credential-protected workflows
|
|
- Session initialization
|
|
|
|
Important Notes:
|
|
- Credentials should be stored as parameters (credential, bitwarden_login_credential, etc.)
|
|
- TOTP is automatically handled if the credential parameter has TOTP configured
|
|
|
|
Example:
|
|
workflow_definition:
|
|
version: 2
|
|
blocks:
|
|
- block_type: login
|
|
label: login_to_portal
|
|
next_block_label: null
|
|
url: "https://portal.example.com/login"
|
|
parameter_keys:
|
|
- my_credentials # This must match a 'key' in the parameters list below
|
|
complete_criterion: "Current URL is 'https://portal.example.com/dashboard'"
|
|
max_retries: 2
|
|
parameters:
|
|
- parameter_type: workflow
|
|
workflow_parameter_type: credential_id
|
|
key: my_credentials
|
|
default_value: "cred_uuid_here"
|
|
|
|
** VALIDATION BLOCK (validation) **
|
|
|
|
Purpose: Validate workflow state and decide whether to continue or terminate.
|
|
|
|
Structure:
|
|
block_type: validation
|
|
label: <unique_label>
|
|
complete_criterion: <success_condition> # Optional: Condition for success
|
|
terminate_criterion: <termination_condition> # Optional: Condition to stop workflow
|
|
error_code_mapping: {} # Optional: Map errors to custom codes
|
|
parameter_keys: [] # Optional: Parameters used in this block
|
|
disable_cache: false # Optional: Disable caching
|
|
|
|
Use Cases:
|
|
- Confirm a successful navigation or submission
|
|
- Stop the workflow when an error state appears
|
|
- Validate content on the page before extracting data
|
|
|
|
Example:
|
|
blocks:
|
|
- block_type: validation
|
|
label: verify_submission
|
|
next_block_label: null
|
|
complete_criterion: "Page contains 'Thank you for your submission'"
|
|
terminate_criterion: "Page contains 'Error' or 'Try again'"
|
|
|
|
** WAIT BLOCK (wait) **
|
|
|
|
Purpose: Pause workflow execution for a specified duration.
|
|
|
|
Structure:
|
|
block_type: wait
|
|
label: <unique_label>
|
|
wait_sec: <seconds> # Required: Number of seconds to wait
|
|
|
|
Example:
|
|
blocks:
|
|
- block_type: wait
|
|
label: wait_for_processing
|
|
next_block_label: check_results
|
|
wait_sec: 30
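The example above hands off to a check_results block that is not shown. One way to complete the chain, sketched under the assumption that the page displays the quoted status messages, is a validation block:

```yaml
blocks:
  - block_type: wait
    label: wait_for_processing
    next_block_label: check_results
    wait_sec: 30
  - block_type: validation
    label: check_results
    next_block_label: null
    # Both page messages below are hypothetical
    complete_criterion: "Page shows 'Processing complete'"
    terminate_criterion: "Page shows 'Processing failed'"
```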
|
|
|
|
** EXTRACTION BLOCK (extraction) **
|
|
|
|
Purpose: Extract structured data from the current page without navigation.
|
|
|
|
Structure:
|
|
block_type: extraction
|
|
label: <unique_label>
|
|
title: <block_title> # Required: The title of the block
|
|
data_extraction_goal: <what_to_extract> # Required: Description of data to extract
|
|
data_schema: <json_schema> # Optional: Structure of extracted data
|
|
url: <page_url> # Optional: URL to navigate to first
|
|
max_retries: 0 # Optional: Retry attempts
|
|
max_steps_per_run: null # Optional: Step limit
|
|
parameter_keys: [] # Optional: Parameters used
|
|
disable_cache: false # Optional: Disable cache
|
|
Use Cases:
|
|
- Extract structured data after other blocks
|
|
- Parse tables, lists, or forms
|
|
- Collect multiple data points from a page
|
|
- Data mining from web pages
|
|
|
|
Data Schema Formats:
|
|
* JSON Schema object:
|
|
data_schema:
|
|
type: object
|
|
properties:
|
|
field1: {type: string}
|
|
field2: {type: number}
|
|
|
|
* JSON Schema array:
|
|
data_schema:
|
|
type: array
|
|
items:
|
|
type: object
|
|
properties:
|
|
name: {type: string}
|
|
|
|
* String format (for simple extractions):
|
|
data_schema: "csv_string"
|
|
|
|
Example:
|
|
blocks:
|
|
- block_type: extraction
|
|
label: extract_product_list
|
|
next_block_label: null
|
|
data_extraction_goal: "Extract all products with their names, prices, and stock status"
|
|
data_schema:
|
|
type: array
|
|
items:
|
|
type: object
|
|
properties:
|
|
product_name: {type: string}
|
|
price: {type: number}
|
|
in_stock: {type: boolean}
|
|
rating: {type: number}
|
|
max_retries: 1
|
|
|
|
** FILE DOWNLOAD BLOCK (file_download) **
|
|
|
|
Purpose: Download files from a website. This is the "File Download Block" in the UI.
|
|
|
|
Structure:
|
|
block_type: file_download
|
|
label: <unique_label>
|
|
navigation_goal: <download_instruction> # Required: How to trigger the download
|
|
url: <starting_url> # Optional: URL to navigate to first
|
|
title: <block_title> # Optional: Title for the block
|
|
engine: skyvern-1.0 # Optional: Run engine
|
|
error_code_mapping: {} # Optional: Map errors to custom codes
|
|
max_retries: 0 # Optional: Number of retry attempts
|
|
max_steps_per_run: null # Optional: Limit steps per execution
|
|
parameter_keys: [] # Optional: Parameters used in this block
|
|
download_suffix: null # Optional: Full filename for the download
|
|
totp_verification_url: null # Optional: TOTP verification URL
|
|
totp_identifier: null # Optional: TOTP identifier
|
|
disable_cache: false # Optional: Disable caching
|
|
download_timeout: null # Optional: Download timeout in seconds
|
|
|
|
Use Cases:
|
|
- Download invoices, receipts, or reports from a portal
|
|
- Export data as CSV or PDF from a dashboard
|
|
- Trigger file downloads that require navigation steps
|
|
|
|
Example:
|
|
blocks:
|
|
- block_type: file_download
|
|
label: download_report
|
|
next_block_label: null
|
|
url: "https://portal.example.com/reports"
|
|
navigation_goal: "Open the latest report and click Download as PDF"
|
|
download_suffix: "latest_report.pdf"
|
|
|
|
** CLOUD STORAGE BLOCK (file_upload) **
|
|
|
|
Purpose: Upload files to storage. This is the "Cloud Storage Block" in the UI.
|
|
|
|
Structure:
|
|
block_type: file_upload
|
|
label: <unique_label>
|
|
storage_type: <s3|azure> # Optional: Storage backend (default: s3)
|
|
s3_bucket: <bucket_name> # Optional: S3 bucket name
|
|
aws_access_key_id: <access_key_id> # Optional: AWS access key id
|
|
aws_secret_access_key: <secret_access_key> # Optional: AWS secret access key
|
|
region_name: <aws_region> # Optional: AWS region
|
|
azure_storage_account_name: <account_name> # Optional: Azure storage account
|
|
azure_storage_account_key: <account_key> # Optional: Azure storage account key
|
|
azure_blob_container_name: <container_name> # Optional: Azure blob container
|
|
azure_folder_path: <folder_path> # Optional: Azure folder path
|
|
path: <local_or_workspace_path> # Optional: File path to upload
|
|
|
|
Use Cases:
|
|
- Upload downloaded artifacts to S3
|
|
- Publish files to Azure Blob Storage
|
|
- Persist workflow outputs to storage
|
|
|
|
Example:
|
|
blocks:
|
|
- block_type: file_upload
|
|
label: upload_report
|
|
next_block_label: null
|
|
storage_type: s3
|
|
s3_bucket: "my-reports"
|
|
region_name: "us-west-2"
|
|
path: "/tmp/latest_report.pdf"
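An Azure variant of the same upload might look like the sketch below. It uses only the fields listed in the structure above; the container name, folder path, and label are assumptions, and the account values are left as placeholders because credentials should not be hardcoded.

```yaml
blocks:
  - block_type: file_upload
    label: upload_report_to_azure               # hypothetical label
    next_block_label: null
    storage_type: azure
    azure_storage_account_name: "<account_name>"   # placeholder
    azure_storage_account_key: "<account_key>"     # placeholder, never a literal key
    azure_blob_container_name: "reports"        # hypothetical container
    azure_folder_path: "daily"                  # hypothetical folder
    path: "/tmp/latest_report.pdf"
```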
|
|
|
|
** FILE PARSER BLOCK (file_url_parser) **
|
|
|
|
Purpose: Parse PDFs, CSVs, Excel files, images, and DOCX files. This is the "File Parser Block" in the UI.
|
|
|
|
Structure:
|
|
block_type: file_url_parser
|
|
label: <unique_label>
|
|
file_url: <https_url_or_file_url> # Required: URL to the file
|
|
file_type: <auto_detect|csv|excel|pdf|image|docx> # Optional: defaults to auto_detect (infer from URL/content)
|
|
json_schema: <json_schema> # Optional: Structure of parsed output
|
|
|
|
Use Cases:
|
|
- Parse a PDF invoice into structured fields
|
|
- Extract rows from a CSV file
|
|
- Read Excel sheets for downstream processing
|
|
|
|
Example:
|
|
blocks:
|
|
- block_type: file_url_parser
|
|
label: parse_invoice
|
|
next_block_label: null
|
|
file_url: "https://example.com/invoice.pdf"
|
|
file_type: pdf
|
|
json_schema:
|
|
type: object
|
|
properties:
|
|
invoice_id: {type: string}
|
|
total: {type: number}
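A parsed file can feed a for_loop through loop_variable_reference. The sketch below assumes the parser's output for a CSV is a list of row objects and that each row has an "email" column; neither detail is confirmed by this document, and the labels and URLs are hypothetical.

```yaml
blocks:
  - block_type: file_url_parser
    label: parse_contacts               # hypothetical label
    next_block_label: submit_each_contact
    file_url: "https://example.com/contacts.csv"
    file_type: csv
  - block_type: for_loop
    label: submit_each_contact          # hypothetical label
    next_block_label: null
    loop_variable_reference: parse_contacts   # output of the parser block above
    complete_if_empty: true
    loop_blocks:
      - block_type: navigation
        label: fill_contact_form        # hypothetical label
        next_block_label: null
        url: "https://example.com/contact-form"
        # Assumes each parsed row exposes an "email" field
        navigation_goal: "Enter {{ current_value.email }} in the email field and submit"
```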
|
|
|
|
** SEND EMAIL BLOCK (send_email) **
|
|
|
|
Purpose: Send email notifications. This is the "Send Email Block" in the UI.
|
|
|
|
Structure:
|
|
block_type: send_email
|
|
label: <unique_label>
|
|
smtp_host_secret_parameter_key: <param_key> # Required: Secret parameter key for SMTP host
|
|
smtp_port_secret_parameter_key: <param_key> # Required: Secret parameter key for SMTP port
|
|
smtp_username_secret_parameter_key: <param_key> # Required: Secret parameter key for SMTP username
|
|
smtp_password_secret_parameter_key: <param_key> # Required: Secret parameter key for SMTP password
|
|
sender: <email_address> # Required: Sender email address
|
|
recipients: [<email_address>] # Required: Recipient list
|
|
subject: <subject_line> # Required: Email subject
|
|
body: <email_body> # Required: Email body
|
|
file_attachments: [<file_path>] # Optional: Local file paths to attach
|
|
|
|
Use Cases:
|
|
- Notify a team when a workflow completes
|
|
- Send extracted data summaries to stakeholders
|
|
- Email reports or attachments
|
|
|
|
Example:
|
|
blocks:
|
|
- block_type: send_email
|
|
label: notify_ops
|
|
next_block_label: null
|
|
smtp_host_secret_parameter_key: smtp_host
|
|
smtp_port_secret_parameter_key: smtp_port
|
|
smtp_username_secret_parameter_key: smtp_user
|
|
smtp_password_secret_parameter_key: smtp_pass
|
|
sender: "automation@example.com"
|
|
recipients: ["ops@example.com"]
|
|
subject: "Daily report ready"
|
|
body: "The latest report is ready for review."
|
|
file_attachments: ["/tmp/latest_report.pdf"]
|
|
|
|
** TEXT PROMPT BLOCK (text_prompt) **
|
|
|
|
Purpose: Process text with LLM. This is the "Text Prompt Block" in the UI.
|
|
|
|
Structure:
|
|
block_type: text_prompt
|
|
label: <unique_label>
|
|
llm_key: <llm_key> # Optional: Model key override
|
|
prompt: <text_prompt> # Required: Prompt to run
|
|
parameter_keys: [] # Optional: Parameters used in this block
|
|
json_schema: <json_schema> # Optional: Structured output schema
|
|
|
|
Use Cases:
|
|
- Summarize extracted text or documents
|
|
- Normalize free-form content into a schema
|
|
- Generate classifications or tags
|
|
|
|
Example:
|
|
workflow_definition:
|
|
version: 2
|
|
blocks:
|
|
- block_type: text_prompt
|
|
label: summarize_notes
|
|
next_block_label: null
|
|
prompt: "Summarize these notes: {{ notes }}"
|
|
json_schema:
|
|
type: object
|
|
properties:
|
|
summary: {type: string}
|
|
action_items: {type: array, items: {type: string}}
|
|
parameter_keys: [notes]
|
|
parameters:
|
|
- parameter_type: workflow
|
|
key: notes
|
|
workflow_parameter_type: string
|
|
|
|
** HTTP REQUEST BLOCK (http_request) **
|
|
|
|
Purpose: Make HTTP API calls. This is the "HTTP Request Block" in the UI.
|
|
|
|
Structure:
|
|
block_type: http_request
|
|
label: <unique_label>
|
|
method: <GET|POST|PUT|PATCH|DELETE> # Optional: HTTP method (default: GET)
|
|
url: <https_url> # Optional: Target URL
|
|
headers: {} # Optional: HTTP headers
|
|
body: {} # Optional: JSON body (MUST be a dict, not a string)
|
|
files: {} # Optional: Multipart files mapping
|
|
timeout: 30 # Optional: Timeout in seconds
|
|
follow_redirects: true # Optional: Follow redirects
|
|
parameter_keys: [] # Optional: Workflow parameters used (block outputs don't need to be listed)
|
|
|
|
Use Cases:
|
|
- Call third-party APIs for enrichment
|
|
- Post data to internal services
|
|
- Upload files via multipart requests
|
|
- Send results from previous blocks to webhooks
|
|
|
|
Example 1 - Using workflow parameters:
|
|
workflow_definition:
|
|
version: 2
|
|
blocks:
|
|
- block_type: http_request
|
|
label: lookup_customer
|
|
next_block_label: null
|
|
method: "POST"
|
|
url: "https://api.example.com/customers/search"
|
|
headers:
|
|
Authorization: "Bearer {{ api_token }}"
|
|
body:
|
|
email: "{{ customer_email }}"
|
|
parameter_keys: [api_token, customer_email]
|
|
parameters:
|
|
- parameter_type: workflow
|
|
key: api_token
|
|
workflow_parameter_type: string
|
|
- parameter_type: workflow
|
|
key: customer_email
|
|
workflow_parameter_type: string
|
|
|
|
Example 2 - Sending block output to webhook (no parameter_keys needed):
|
|
workflow_definition:
|
|
version: 2
|
|
parameters: []
|
|
blocks:
|
|
- block_type: navigation
|
|
label: get_data
|
|
next_block_label: send_webhook
|
|
navigation_goal: "Get top 3 hacker news items"
|
|
url: "https://news.ycombinator.com"
|
|
- block_type: http_request
|
|
label: send_webhook
|
|
next_block_label: null
|
|
method: POST
|
|
url: "https://example.com/webhook"
|
|
body:
|
|
data: "{{ get_data.output }}"
|
|
parameter_keys: []
|
|
|
|
** PARAMETER TEMPLATING **
|
|
|
|
All string fields in blocks support Jinja2 templating to reference parameters and block outputs.
|
|
|
|
There are TWO types of references:
|
|
|
|
1. WORKFLOW PARAMETERS - Reference input parameters defined in the parameters section
|
|
Syntax: {{ param_key }}
|
|
Requires: parameter_keys list must include the param_key
|
|
|
|
2. BLOCK OUTPUTS - Reference output from a previous block by its label
|
|
Syntax: {{ block_label.output }}
|
|
Requires: NOTHING - block outputs are automatically available, no parameter_keys needed
|
|
|
|
IMPORTANT: Block outputs use the block's label directly (e.g., {{ block_1.output }}).
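The two reference types can appear in the same field. In this sketch, max_price is a hypothetical workflow parameter and extract_prices is a hypothetical block label; only the former is listed in parameter_keys.

```yaml
workflow_definition:
  version: 2
  blocks:
    - block_type: extraction
      label: extract_prices               # hypothetical label
      next_block_label: summarize_prices
      url: "https://example.com/products"
      data_extraction_goal: "Extract each product name and price"
    - block_type: text_prompt
      label: summarize_prices             # hypothetical label
      next_block_label: null
      prompt: "List the products cheaper than {{ max_price }}: {{ extract_prices.output }}"
      parameter_keys:
        - max_price      # workflow parameter: must be listed
                         # extract_prices is a block label: not listed
  parameters:
    - parameter_type: workflow
      key: max_price
      workflow_parameter_type: float
```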
|
|
|
|
Examples - Workflow Parameters:
|
|
|
|
* In URL:
|
|
url: "https://example.com/search?q={{ search_term }}"
|
|
parameter_keys: [search_term]
|
|
|
|
* In goals:
|
|
navigation_goal: "Search for {{ product_name }} and filter by {{ category }}"
|
|
parameter_keys: [product_name, category]
|
|
|
|
Examples - Block Outputs (no parameter_keys needed):
|
|
|
|
* Send previous block output to webhook:
|
|
blocks:
|
|
- block_type: navigation
|
|
label: get_data
|
|
navigation_goal: "Get top 3 hacker news items"
|
|
...
|
|
- block_type: http_request
|
|
label: send_webhook
|
|
method: POST
|
|
url: "https://example.com/webhook"
|
|
body:
|
|
data: "{{ get_data.output }}"
|
|
parameter_keys: [] # Empty - block outputs don't need to be declared
|
|
|
|
* Use extraction output in next block:
|
|
blocks:
|
|
- block_type: extraction
|
|
label: extract_items
|
|
...
|
|
- block_type: navigation
|
|
label: process_items
|
|
navigation_goal: "Process these items: {{ extract_items.output }}"
|
|
|
|
* In schemas (as descriptions):
|
|
data_schema:
|
|
type: object
|
|
properties:
|
|
query_result:
|
|
type: string
|
|
description: "Result for query: {{ query }}"
|
|
|
|
** ERROR HANDLING AND RETRIES **
|
|
|
|
Error Code Mapping:
|
|
Map internal errors to custom error codes for easier handling:
|
|
|
|
error_code_mapping:
|
|
"ElementNotFound": "ELEMENT_MISSING"
|
|
"TimeoutError": "PAGE_TIMEOUT"
|
|
"NavigationFailed": "NAV_ERROR"
|
|
|
|
Retry Configuration:
|
|
max_retries: 3 # Block will retry up to 3 times on failure
|
|
|
|
Conditional Continuation:
|
|
continue_on_failure: true # Workflow continues even if block fails
|
|
|
|
Loop Continuation:
|
|
next_loop_on_failure: true # Skip to next iteration in loops
|
|
|
|
Completion Criteria:
|
|
complete_criterion: "URL contains '/success'" # Condition for success
|
|
terminate_criterion: "Element with text 'Error' exists" # Condition to stop
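These settings are typically combined on one block. The sketch below is an illustration only: the label, URL, goal, and criteria text are assumptions, and the error name is taken from the mapping example above.

```yaml
blocks:
  - block_type: navigation
    label: submit_order                 # hypothetical label
    next_block_label: null
    url: "https://shop.example.com/checkout"
    navigation_goal: "Click 'Place order' and wait for the confirmation page"
    max_retries: 2                      # retry up to 2 times on failure
    continue_on_failure: false          # a failed order should stop the workflow
    complete_criterion: "URL contains '/order-confirmed'"
    terminate_criterion: "Page shows 'Payment declined'"
    error_code_mapping:
      "TimeoutError": "PAGE_TIMEOUT"
```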
|
|
|
|
** WORKFLOW EXECUTION FLOW **
|
|
|
|
Sequential Execution:
|
|
blocks:
  - block_type: goto_url
    label: step1
    next_block_label: step2
    url: "https://example.com/start"
  - block_type: extraction
    label: step2
    next_block_label: step3
    data_extraction_goal: "Extract the details shown on the page"
  - block_type: navigation
    label: step3
    next_block_label: null
    navigation_goal: "Complete the final step on the page"
# Executes: step1 → step2 → step3
|
|
|
|
Explicit Flow Control (Skip blocks):
|
|
blocks:
  - block_type: goto_url
    label: login
    next_block_label: extract_data
    url: "https://app.example.com/login"
  - block_type: navigation
    label: handle_error
    next_block_label: null
    navigation_goal: "Handle the error state if it appears"
  - block_type: extraction
    label: extract_data
    next_block_label: null
    data_extraction_goal: "Extract the data shown on the page"
# Executes: login → extract_data (skips handle_error)
|
|
|
|
Error Recovery Flow:
|
|
blocks:
  - block_type: navigation
    label: primary_task
    next_block_label: verify_result
    continue_on_failure: true
    navigation_goal: "Attempt the primary task on the page"
  - block_type: validation
    label: verify_result
    next_block_label: null
    complete_criterion: "The page shows that the primary task succeeded"
|
|
|
|
** BEST PRACTICES **
|
|
|
|
* Naming Conventions:
|
|
- Use descriptive labels: "login_to_portal" not "step1"
|
|
- Use snake_case for labels and parameter keys
|
|
- Use snake_case for data schema field names (e.g., product_name, total_price)
|
|
- Make labels unique and meaningful
|
|
|
|
* Goal Writing:
|
|
- Be specific: "Click the blue 'Submit' button" vs "Submit the form"
|
|
- Include context: "After clicking Search, wait for results to load"
|
|
- Natural language: Write as you would instruct a human
|
|
|
|
* Parameter Usage:
|
|
- Always list parameter_keys when using parameters in a block
|
|
- Validate parameter types match usage
|
|
- Provide default values for optional parameters
|
|
|
|
* Error Handling:
|
|
- Set appropriate max_retries for flaky operations
|
|
- Use complete_criterion for validation
|
|
- Map errors to meaningful codes for debugging
|
|
|
|
* Data Extraction:
|
|
- Always provide data_schema for structured extraction
|
|
- Use specific extraction goals
|
|
- Handle arrays vs objects appropriately
|
|
|
|
* Performance:
|
|
- Use disable_cache: true for dynamic content
|
|
- Set max_steps_per_run to prevent infinite loops
|
|
- Use navigation blocks for actions and extraction blocks for data extraction
|
|
|
|
* Security:
|
|
- Never hardcode credentials in workflows
|
|
- Use credential parameters for sensitive data
|
|
- Use AWS secrets or vault integrations
|
|
|
|
** COMMON PATTERNS **
|
|
|
|
Pattern 1: Login → Navigate → Extract
|
|
workflow_definition:
|
|
version: 2
|
|
blocks:
|
|
- block_type: login
|
|
label: authenticate
|
|
next_block_label: go_to_reports
|
|
url: "https://app.example.com/login"
|
|
parameter_keys: [my_credentials]
|
|
- block_type: navigation
|
|
label: go_to_reports
|
|
next_block_label: get_report_data
|
|
navigation_goal: "Navigate to Reports section"
|
|
- block_type: extraction
|
|
label: get_report_data
|
|
next_block_label: null
|
|
data_extraction_goal: "Extract all report entries"
|
|
data_schema:
|
|
type: array
|
|
items: {type: object}
|
|
parameters:
|
|
- parameter_type: workflow
|
|
workflow_parameter_type: credential_id
|
|
key: my_credentials
|
|
default_value: "uuid"
|
|
- parameter_type: output
|
|
key: extracted_data
|
|
|
|
Pattern 2: Search with Dynamic Input
|
|
workflow_definition:
  version: 2
  blocks:
    - block_type: navigation
      label: search_and_extract
      next_block_label: null
      url: "https://example.com"
      navigation_goal: "Search for '{{ search_query }}' and extract the first 10 results with titles and URLs"
      parameter_keys: [search_query]
  parameters:
    - parameter_type: workflow
      key: search_query
      workflow_parameter_type: string
|
|
|
|
Pattern 3: Multi-Step Form Filling
workflow_definition:
  version: 2
  blocks:
    - block_type: goto_url
      label: open_form
      next_block_label: fill_personal_info
      url: "https://forms.example.com/application"
    - block_type: navigation
      label: fill_personal_info
      next_block_label: fill_address
      navigation_goal: "Fill in name as {{ name }}, email as {{ email }}"
      parameter_keys: [name, email]
    - block_type: navigation
      label: fill_address
      next_block_label: submit
      navigation_goal: "Fill in address as {{ address }}, city as {{ city }}, and zip as {{ zip }}, then click Continue"
      parameter_keys: [address, city, zip]
    - block_type: navigation
      label: submit
      next_block_label: null
      navigation_goal: "Review information and click Submit"
  parameters:
    - parameter_type: workflow
      key: name
      workflow_parameter_type: string
    - parameter_type: workflow
      key: email
      workflow_parameter_type: string
    - parameter_type: workflow
      key: address
      workflow_parameter_type: string
    - parameter_type: workflow
      key: city
      workflow_parameter_type: string
    - parameter_type: workflow
      key: zip
      workflow_parameter_type: string

Pattern 4: Conditional Extraction
workflow_definition:
  version: 2
  blocks:
    - block_type: navigation
      label: search_product
      next_block_label: check_availability
      navigation_goal: "Search for {{ product }}"
      parameter_keys: [product]
    - block_type: extraction
      label: check_availability
      next_block_label: add_to_cart
      data_extraction_goal: "Check if product is in stock"
      data_schema:
        type: object
        properties:
          in_stock: {type: boolean}
    - block_type: navigation
      label: add_to_cart
      next_block_label: null
      navigation_goal: "If product is in stock, add to cart"
  parameters:
    - parameter_type: workflow
      key: product
      workflow_parameter_type: string

** VALIDATION RULES **

Workflow-Level:
- All block labels must be unique
- Parameters referenced in blocks must be defined
- next_block_label must point to existing block labels or be null
- The last block in execution flow should have next_block_label: null
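
The workflow-level rules above can be checked mechanically. The following is a minimal Python sketch, not Skyvern's own validator: it assumes the workflow_definition has already been parsed into a plain dict, and the names check_workflow and JINJA_REF are hypothetical. The Jinja2 scan is deliberately naive; it only recognizes bare {{ name }} references and would also flag references to values that are not declared in the parameters list.

```python
import re
from collections import Counter
from typing import Any

# Matches bare Jinja2 references such as {{ name }}. Filters, attribute
# access, and expressions are ignored in this sketch.
JINJA_REF = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def check_workflow(definition: dict[str, Any]) -> list[str]:
    """Return workflow-level rule violations (an empty list means none found)."""
    errors: list[str] = []
    blocks = definition.get("blocks") or []
    parameters = definition.get("parameters") or []

    # Rule: all block labels must be unique.
    label_counts = Counter(block.get("label") for block in blocks)
    for label, count in label_counts.items():
        if count > 1:
            errors.append(f"duplicate block label: {label!r}")

    defined_keys = {param.get("key") for param in parameters}
    for block in blocks:
        label = block.get("label")

        # Rule: next_block_label must name an existing block or be null.
        target = block.get("next_block_label")
        if target is not None and target not in label_counts:
            errors.append(f"{label!r}: next_block_label {target!r} does not exist")

        # Rule: parameters referenced in blocks must be defined.
        referenced = set(block.get("parameter_keys") or [])
        for value in block.values():
            if isinstance(value, str):
                referenced.update(JINJA_REF.findall(value))
        for key in sorted(referenced - defined_keys):
            errors.append(f"{label!r}: parameter {key!r} is not defined")

    # Rule: the flow should end in a block whose next_block_label is null.
    if blocks and all(block.get("next_block_label") is not None for block in blocks):
        errors.append("no terminal block: one block should have next_block_label: null")
    return errors


# Hypothetical workflow with an undefined parameter and a dangling label.
sample = {
    "blocks": [
        {"label": "search", "next_block_label": "extract",
         "navigation_goal": "Search for {{ query }}"},
        {"label": "extract", "next_block_label": "missing_block"},
    ],
    "parameters": [],
}
for problem in check_workflow(sample):
    print(problem)
```

Returning a list of messages rather than raising on the first problem lets a caller report every violation in one pass.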

Block-Level:
- label is required and cannot be empty
- block_type must be a valid type
- For navigation blocks: navigation_goal is required
- For extraction blocks: data_extraction_goal is required
- For action blocks: navigation_goal is required
- For login blocks: parameter_keys should include credentials
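
As an illustration of the block-level rules, here is a small Python sketch under stated assumptions: blocks are plain dicts, check_block is a hypothetical helper, and KNOWN_BLOCK_TYPES lists only the block types named in this section, so it would need to be extended with the full set of block types before real use.

```python
from typing import Any

# Assumption: only the block types that appear in this section. The full
# list of valid types should be substituted here.
KNOWN_BLOCK_TYPES = {"login", "navigation", "extraction", "action", "goto_url"}

# Field that must be present and non-empty for each block type.
REQUIRED_GOAL_FIELD = {
    "navigation": "navigation_goal",
    "extraction": "data_extraction_goal",
    "action": "navigation_goal",
}


def check_block(block: dict[str, Any], credential_keys: set[str]) -> list[str]:
    """Return block-level rule violations for a single block."""
    errors: list[str] = []

    label = block.get("label")
    if not isinstance(label, str) or not label.strip():
        errors.append("label is required and cannot be empty")

    block_type = block.get("block_type")
    if not isinstance(block_type, str) or block_type not in KNOWN_BLOCK_TYPES:
        errors.append(f"{label!r}: unknown block_type {block_type!r}")
        return errors  # type-specific checks are meaningless for unknown types

    goal_field = REQUIRED_GOAL_FIELD.get(block_type)
    if goal_field and not block.get(goal_field):
        errors.append(f"{label!r}: {goal_field} is required for {block_type} blocks")

    if block_type == "login":
        keys = set(block.get("parameter_keys") or [])
        if not keys & credential_keys:
            errors.append(f"{label!r}: parameter_keys should include a credential parameter")
    return errors


# Usage: derive credential keys from the parameters list, then check a block.
parameters = [
    {"parameter_type": "workflow", "workflow_parameter_type": "credential_id",
     "key": "my_credentials"},
]
credential_keys = {
    param["key"] for param in parameters
    if param.get("workflow_parameter_type") == "credential_id"
}
login_block = {"block_type": "login", "label": "authenticate",
               "parameter_keys": ["my_credentials"]}
print(check_block(login_block, credential_keys))
```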

Parameter-Level:
- key must be unique across parameters
- key cannot contain whitespace
- parameter_type must be valid
- Referenced keys (like source_parameter_key) must exist
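
The parameter-level rules can be sketched the same way. This Python example is illustrative only: check_parameters is a hypothetical helper, and KNOWN_PARAMETER_TYPES contains just the two parameter_type values used in this section, which is an assumption that should be widened to cover every parameter type in use.

```python
from collections import Counter
from typing import Any

# Assumption: only the parameter_type values that appear in this section.
KNOWN_PARAMETER_TYPES = {"workflow", "output"}


def check_parameters(parameters: list[dict[str, Any]]) -> list[str]:
    """Return parameter-level rule violations for a parameters list."""
    errors: list[str] = []
    counts = Counter(
        param.get("key") for param in parameters
        if isinstance(param.get("key"), str)
    )

    # Rule: key must be unique across parameters.
    for key, count in counts.items():
        if count > 1:
            errors.append(f"duplicate parameter key: {key!r}")

    for index, param in enumerate(parameters):
        key = param.get("key")
        if not isinstance(key, str) or not key:
            # Implied by the rules above: a parameter needs a usable key.
            errors.append(f"parameter #{index}: key is required")
        elif any(ch.isspace() for ch in key):
            errors.append(f"{key!r}: key cannot contain whitespace")

        # Rule: parameter_type must be valid.
        parameter_type = param.get("parameter_type")
        if parameter_type not in KNOWN_PARAMETER_TYPES:
            errors.append(f"{key!r}: unknown parameter_type {parameter_type!r}")

        # Rule: referenced keys such as source_parameter_key must exist.
        source = param.get("source_parameter_key")
        if source is not None and source not in counts:
            errors.append(f"{key!r}: source_parameter_key {source!r} does not exist")
    return errors


# Hypothetical parameters list with a whitespace key and a duplicate key.
sample_parameters = [
    {"parameter_type": "workflow", "key": "search query"},
    {"parameter_type": "workflow", "key": "city"},
    {"parameter_type": "workflow", "key": "city"},
]
for problem in check_parameters(sample_parameters):
    print(problem)
```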

** COMPLETE WORKFLOW EXAMPLE **

title: E-commerce Product Search and Purchase
description: Search for a product, extract details, and add to cart
workflow_definition:
  version: 2
  blocks:
    - block_type: login
      label: login_to_store
      next_block_label: search_and_filter
      url: "https://shop.example.com/login"
      parameter_keys:
        - account_creds
      complete_criterion: "URL contains '/dashboard'"

    - block_type: navigation
      label: search_and_filter
      next_block_label: get_product_info
      url: "https://shop.example.com/search"
      navigation_goal: "Search for {{ product_name }} and filter results by price under ${{ max_price }}"
      parameter_keys:
        - product_name
        - max_price
      max_retries: 2

    - block_type: extraction
      label: get_product_info
      next_block_label: add_to_cart
      data_extraction_goal: "Extract product name, price, rating, and availability"
      data_schema:
        type: object
        properties:
          name: {type: string}
          price: {type: number}
          rating: {type: number}
          available: {type: boolean}

    - block_type: navigation
      label: add_to_cart
      next_block_label: null
      navigation_goal: "Click on the first available product and add it to cart"
      max_retries: 3
  parameters:
    - parameter_type: workflow
      key: product_name
      workflow_parameter_type: string
      description: "Product to search for"
    - parameter_type: workflow
      key: max_price
      workflow_parameter_type: float
      description: "Maximum price willing to pay"
    - parameter_type: workflow
      workflow_parameter_type: credential_id
      key: account_creds
      default_value: "cred_12345"
    - parameter_type: output
      key: product_details
      description: "Extracted product information"
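
To see how next_block_label chains the blocks of this example together, the following Python sketch walks the chain and returns the labels in execution order. It is a simplified illustration: execution_order is a hypothetical helper, it assumes the first block in the list is the entry point, and it handles only linear flows (no branching blocks).

```python
from typing import Any


def execution_order(blocks: list[dict[str, Any]]) -> list[str]:
    """Follow next_block_label from the first block until it is null."""
    by_label = {block["label"]: block for block in blocks}
    order: list[str] = []
    # Assumption: the first block in the list is where execution starts.
    current = blocks[0]["label"] if blocks else None
    while current is not None:
        if current in order:
            raise ValueError(f"cycle detected at block {current!r}")
        if current not in by_label:
            raise ValueError(f"unknown block label {current!r}")
        order.append(current)
        current = by_label[current].get("next_block_label")
    return order


# Labels and links taken from the complete workflow example.
example_blocks = [
    {"label": "login_to_store", "next_block_label": "search_and_filter"},
    {"label": "search_and_filter", "next_block_label": "get_product_info"},
    {"label": "get_product_info", "next_block_label": "add_to_cart"},
    {"label": "add_to_cart", "next_block_label": None},
]
print(execution_order(example_blocks))
# ['login_to_store', 'search_and_filter', 'get_product_info', 'add_to_cart']
```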

END OF KNOWLEDGE BASE