Skyvern/skyvern/forge/prompts/skyvern/workflow_knowledge_base.txt

1057 lines
36 KiB
Text

SKYVERN WORKFLOW YAML KNOWLEDGE BASE
This document provides comprehensive information about Skyvern Workflow YAML structure and blocks. Use this to understand how to construct, modify, and validate workflow definitions.
** WORKFLOW STRUCTURE OVERVIEW **
A Skyvern workflow is defined in YAML format with the following top-level structure
for a workflow definition (embedded under workflow_definition in full specs):
title: "<workflow title>"
description: "<optional description>"
workflow_definition:
version: 2 # IMPORTANT: Always use version 2
blocks: []
parameters: []
webhook_callback_url: "<optional_https_url>" # Optional: Webhook URL to receive workflow run updates
Key Concepts:
- Workflows consist of sequential or conditional blocks that represent specific tasks
- Each block has a unique label for identification and navigation
- Blocks can reference workflow parameters using Jinja2 templating
- Block execution is defined by next_block_label on every non-terminal block
** WORKFLOW PARAMETERS **
Parameters provide input values and credentials to workflows. They are defined in the "parameters" list.
Common Parameter Types:
* WORKFLOW PARAMETERS (user inputs)
parameter_type: workflow
key: <unique_key>
workflow_parameter_type: <string|integer|float|boolean|json|file_url|credential_id>
description: <optional description>
default_value: <optional default>
Define exactly one parameters list under workflow_definition.
Example:
workflow_definition:
version: 2
blocks: []
parameters:
- parameter_type: workflow
key: search_query
workflow_parameter_type: string
description: "Search term to use"
default_value: "example"
* OUTPUT PARAMETERS (block outputs)
parameter_type: output
key: <unique_key>
description: <optional description>
* CREDENTIAL PARAMETERS
parameter_type: workflow
workflow_parameter_type: credential_id
key: <unique_key>
default_value: <credential_id>
Using Parameters in Blocks:
- Reference using Jinja2: {{ param_key }}
- List parameter_keys in blocks that use them
- Parameters are resolved before block execution. ALL PARAMETER KEYS REFERENCED IN BLOCKS MUST FIRST BE DEFINED IN THE WORKFLOW PARAMETERS LIST
Example:
workflow_definition:
version: 2
blocks:
- label: block_1
block_type: navigation
url: https://news.ycombinator.com/
navigation_goal: "Give me top {{topics_count}} news items"
next_block_label: null
parameters:
- key: topics_count
description: null
parameter_type: workflow
workflow_parameter_type: integer
default_value: "3"
** COMMON BLOCK FIELDS **
All blocks inherit these base fields:
block_type: <type> # Required: Defines the block type
label: <unique_label> # Required: Unique identifier for this block
next_block_label: <label|null> # Required: Label of next block; use null only for terminal blocks
continue_on_failure: false # Optional: Continue workflow if block fails
next_loop_on_failure: false # Optional: Continue to next loop iteration on failure (for loop blocks only)
model: {} # Optional: Override model settings for this block
Important Rules:
- Labels must be unique within a workflow
- Labels cannot be empty or contain only whitespace
- next_block_label is required for all non-terminal blocks
- Use next_block_label for explicit flow control
- Set next_block_label to null to mark the end of a flow
- continue_on_failure allows graceful error handling
** NAVIGATION BLOCK (navigation) **
Purpose: Take actions on a page to achieve a focused goal — fill a form, click through a multi-step flow, prepare the page before an extraction.
Structure:
block_type: navigation
label: <unique_label>
url: <starting_url> # Optional: URL to navigate to; omit to continue on current page
title: str # Required: The title of the block
navigation_goal: <action_description> # Required: What actions to perform
error_code_mapping: {} # Optional: Map errors to custom codes
max_retries: 0 # Optional: Number of retry attempts
max_steps_per_run: null # Optional: Limit steps per execution
parameter_keys: [] # Optional: Parameters used in this block
complete_on_download: false # Optional: Complete when file downloads
download_suffix: null # Optional: Downloaded file name
totp_verification_url: null # Optional: TOTP verification URL
disable_cache: false # Optional: Disable caching
complete_criterion: null # Optional: Condition to mark complete
terminate_criterion: null # Optional: Condition to terminate
complete_verification: true # Optional: Verify completion
include_action_history_in_verification: false # Optional: Include history in verification
Use Cases:
- Fill out forms on websites
- Navigate complex multi-step processes
- Prepare the page for a subsequent extraction block
- Execute focused browser tasks with clear completion criteria
Example:
workflow_definition:
version: 2
blocks:
- block_type: navigation
label: search_and_open
next_block_label: null
url: "https://example.com/search"
navigation_goal: "Search for {{ query }} and click the first result"
parameter_keys:
- query
max_retries: 2
parameters:
- parameter_type: workflow
key: query
workflow_parameter_type: string
** URL BLOCK (goto_url) **
Purpose: Navigate directly to a URL without any additional instructions.
Structure:
block_type: goto_url
label: <unique_label>
url: <target_url> # Required: URL to navigate to
error_code_mapping: {} # Optional: Custom error codes
max_retries: 0 # Optional: Retry attempts
parameter_keys: [] # Optional: Parameters used
Use Cases:
- Jump to a known page before other blocks
- Reset the browser state to a specific URL
- Split URL navigation from subsequent actions
Example:
blocks:
- block_type: goto_url
label: open_cart
next_block_label: null
url: "https://example.com/cart"
** ACTION BLOCK (action) **
Purpose: Perform a single focused action on the current page without data extraction.
Structure:
block_type: action
label: <unique_label>
navigation_goal: <action_description> # Required: Single action to perform
url: <starting_url> # Optional: URL to start from
error_code_mapping: {} # Optional: Custom error codes
max_retries: 0 # Optional: Retry attempts
parameter_keys: [] # Optional: Parameters used
complete_on_download: false # Optional: Complete on download
download_suffix: null # Optional: Download file name
totp_verification_url: null # Optional: TOTP verification URL
totp_identifier: null # Optional: TOTP identifier
disable_cache: false # Optional: Disable cache
Use Cases:
- Click a specific button or link
- Fill a single field or selection
- Trigger a download with one action
Example:
blocks:
- block_type: action
label: accept_terms
next_block_label: null
url: "https://example.com/checkout"
navigation_goal: "Check the terms checkbox"
max_retries: 1
** TASK BLOCK (task) — NOT AVAILABLE IN WORKFLOW COPILOT **
DO NOT EMIT "task" blocks. They are not available in the workflow copilot and will be rejected at persistence. Use:
- "navigation" for page actions (filling forms, clicking, multi-step flows)
- "extraction" for data extraction (with data_extraction_goal + data_schema)
- "validation" for completion checks
- "login" for authentication
- "goto_url" for pure URL navigation
Legacy workflows that already contain "task" blocks continue to run outside the copilot; this ban applies only to copilot emission.
** TASK V2 BLOCK (task_v2) — DEPRECATED **
DO NOT USE task_v2. Use "navigation" blocks instead (with navigation_goal).
Use "extraction" blocks for data extraction (with data_extraction_goal + data_schema).
The task_v2 block type exists only for backward compatibility with existing workflows.
** FOR LOOP BLOCK (for_loop) **
Purpose: Iterate over a list of values and run a sequence of blocks for each item.
Structure:
block_type: for_loop
label: <unique_label>
loop_blocks: [] # Required: Blocks to run for each iteration
loop_over_parameter_key: <param_key> # Optional: Use ONLY for workflow parameters defined in parameters section
loop_variable_reference: <block_label> # Optional: Use to reference output from a previous block
complete_if_empty: false # Optional: Complete successfully when list is empty
Important Notes:
- Provide either loop_over_parameter_key or loop_variable_reference
- Loop blocks must use next_block_label to chain within the loop
- Each iteration exposes {{ current_value }}, {{ current_item }}, and {{ current_index }}
CRITICAL DISTINCTION:
* loop_over_parameter_key: ONLY for workflow parameters defined in the parameters section
- Use when looping over a static list or user-provided input
- Must reference a key from the workflow parameters list
- Example: loop_over_parameter_key: urls (where "urls" is in parameters)
* loop_variable_reference: For referencing output from a PREVIOUS BLOCK
- Use when looping over data extracted or generated by a previous block
- Reference the block's label directly (e.g., "extract_rows")
- System automatically tries: block_label.extracted_information, block_label.results, etc.
- Example: loop_variable_reference: extract_rows (where "extract_rows" is a previous extraction block)
Example 1 - Loop over workflow parameter:
parameters:
- parameter_type: workflow
key: urls
workflow_parameter_type: json
default_value:
- "https://example.com/a"
- "https://example.com/b"
blocks:
- block_type: for_loop
label: visit_urls
next_block_label: null
loop_over_parameter_key: urls # References the workflow parameter "urls"
loop_blocks:
- block_type: goto_url
label: open_url
next_block_label: null
url: "{{ current_value }}"
Example 2 - Loop over extraction block output:
blocks:
- block_type: extraction
label: extract_products
next_block_label: process_each_product
data_extraction_goal: "Extract all products from the table"
data_schema:
type: array
items:
type: object
properties:
name: {type: string}
price: {type: number}
- block_type: for_loop
label: process_each_product
next_block_label: null
loop_variable_reference: extract_products # References the extraction block's output
loop_blocks:
- block_type: navigation
label: add_to_cart
next_block_label: null
navigation_goal: "Add {{ current_value.name }} to cart"
** CONDITIONAL BLOCK (conditional) **
Purpose: Branch to different blocks based on ordered conditions.
Structure:
block_type: conditional
label: <unique_label>
branch_conditions: [] # Required: Ordered list of branch conditions
Branch Condition Structure:
criteria: # Optional for default branch
criteria_type: <jinja2_template|prompt> # Optional: inferred when omitted
expression: <expression> # Required when criteria present
description: <optional description>
next_block_label: <label|null> # Optional: Label to execute when matched
description: <optional description>
is_default: false # Required when criteria is omitted
Important Notes:
- At least one branch is required
- Branches are evaluated in order; first match wins
- Only one default branch is allowed
- Branches without criteria must set is_default: true
Example:
blocks:
- block_type: conditional
label: route_by_status
next_block_label: null
branch_conditions:
- criteria:
criteria_type: jinja2_template
expression: "{{ account_status == 'active' }}"
next_block_label: handle_active
description: "Active accounts"
is_default: false
- is_default: true
next_block_label: handle_inactive
description: "Fallback for all other states"
** LOGIN BLOCK (login) **
Purpose: Handle authentication flows including username/password and TOTP/2FA.
Structure:
block_type: login
label: <unique_label>
url: <login_page_url> # Optional: Login page URL
title: str # Required: The title of the block
navigation_goal: null # Optional: Additional navigation after login
error_code_mapping: {} # Optional: Custom error codes
max_retries: 0 # Optional: Retry attempts
max_steps_per_run: null # Optional: Step limit
parameter_keys: [] # Required: Should include credential parameters
complete_criterion: null # Optional: Completion condition
terminate_criterion: null # Optional: Termination condition
complete_verification: true # Optional: Verify successful login
Use Cases:
- Login to websites with username/password
- Handle 2FA/TOTP authentication
- Manage credential-protected workflows
- Session initialization
Important Notes:
- Credentials should be stored as parameters (credential, bitwarden_login_credential, etc.)
- TOTP is automatically handled if the credential parameter has TOTP configured
Example:
workflow_definition:
version: 2
blocks:
- block_type: login
label: login_to_portal
next_block_label: null
url: "https://portal.example.com/login"
parameter_keys:
- my_credentials # This must match a 'key' from the parameters list above
complete_criterion: "Current URL is 'https://portal.example.com/dashboard'"
max_retries: 2
parameters:
- parameter_type: workflow
workflow_parameter_type: credential_id
key: my_credentials
default_value: "cred_uuid_here"
** VALIDATION BLOCK (validation) **
Purpose: Validate workflow state and decide whether to continue or terminate.
Structure:
block_type: validation
label: <unique_label>
complete_criterion: <success_condition> # Optional: Condition for success
terminate_criterion: <termination_condition> # Optional: Condition to stop workflow
error_code_mapping: {} # Optional: Map errors to custom codes
parameter_keys: [] # Optional: Parameters used in this block
disable_cache: false # Optional: Disable caching
Use Cases:
- Confirm a successful navigation or submission
- Stop the workflow when an error state appears
- Validate content on the page before extracting data
Example:
blocks:
- block_type: validation
label: verify_submission
next_block_label: null
complete_criterion: "Page contains 'Thank you for your submission'"
terminate_criterion: "Page contains 'Error' or 'Try again'"
** WAIT BLOCK (wait) **
Purpose: Pause workflow execution for a specified duration.
Structure:
block_type: wait
label: <unique_label>
wait_sec: <seconds> # Required: Number of seconds to wait
Example:
blocks:
- block_type: wait
label: wait_for_processing
next_block_label: check_results
wait_sec: 30
** EXTRACTION BLOCK (extraction) **
Purpose: Extract structured data from the current page without navigation.
Structure:
block_type: extraction
label: <unique_label>
title: str # Required: The title of the block
data_extraction_goal: <what_to_extract> # Required: Description of data to extract
data_schema: <json_schema> # Optional: Structure of extracted data
url: <page_url> # Optional: URL to navigate to first
max_retries: 0 # Optional: Retry attempts
max_steps_per_run: null # Optional: Step limit
parameter_keys: [] # Optional: Parameters used
disable_cache: false # Optional: Disable cache
Use Cases:
- Extract structured data after other blocks
- Parse tables, lists, or forms
- Collect multiple data points from a page
- Data mining from web pages
Data Schema Formats:
* JSON Schema object:
data_schema:
type: object
properties:
field1: {type: string}
field2: {type: number}
* JSON Schema array:
data_schema:
type: array
items:
type: object
properties:
name: {type: string}
* String format (for simple extractions):
data_schema: "csv_string"
Example:
blocks:
- block_type: extraction
label: extract_product_list
next_block_label: null
data_extraction_goal: "Extract all products with their names, prices, and stock status"
data_schema:
type: array
items:
type: object
properties:
product_name: {type: string}
price: {type: number}
in_stock: {type: boolean}
rating: {type: number}
max_retries: 1
** FILE DOWNLOAD BLOCK (file_download) **
Purpose: Download files from a website. This is the "File Download Block" in the UI.
Structure:
block_type: file_download
label: <unique_label>
navigation_goal: <download_instruction> # Required: How to trigger the download
url: <starting_url> # Optional: URL to navigate to first
title: str # Optional: Title for the block
engine: skyvern-1.0 # Optional: Run engine
error_code_mapping: {} # Optional: Map errors to custom codes
max_retries: 0 # Optional: Number of retry attempts
max_steps_per_run: null # Optional: Limit steps per execution
parameter_keys: [] # Optional: Parameters used in this block
download_suffix: null # Optional: Full filename for the download
totp_verification_url: null # Optional: TOTP verification URL
totp_identifier: null # Optional: TOTP identifier
disable_cache: false # Optional: Disable caching
download_timeout: null # Optional: Download timeout in seconds
Use Cases:
- Download invoices, receipts, or reports from a portal
- Export data as CSV or PDF from a dashboard
- Trigger file downloads that require navigation steps
Example:
blocks:
- block_type: file_download
label: download_report
next_block_label: null
url: "https://portal.example.com/reports"
navigation_goal: "Open the latest report and click Download as PDF"
download_suffix: "latest_report.pdf"
** CLOUD STORAGE BLOCK (file_upload) **
Purpose: Upload files to storage. This is the "Cloud Storage Block" in the UI.
Structure:
block_type: file_upload
label: <unique_label>
storage_type: <s3|azure> # Optional: Storage backend (default: s3)
s3_bucket: <bucket_name> # Optional: S3 bucket name
aws_access_key_id: <access_key_id> # Optional: AWS access key id
aws_secret_access_key: <secret_access_key> # Optional: AWS secret access key
region_name: <aws_region> # Optional: AWS region
azure_storage_account_name: <account_name> # Optional: Azure storage account
azure_storage_account_key: <account_key> # Optional: Azure storage account key
azure_blob_container_name: <container_name> # Optional: Azure blob container
azure_folder_path: <folder_path> # Optional: Azure folder path
path: <local_or_workspace_path> # Optional: File path to upload
Use Cases:
- Upload downloaded artifacts to S3
- Publish files to Azure Blob Storage
- Persist workflow outputs to storage
Example:
blocks:
- block_type: file_upload
label: upload_report
next_block_label: null
storage_type: s3
s3_bucket: "my-reports"
region_name: "us-west-2"
path: "/tmp/latest_report.pdf"
** FILE PARSER BLOCK (file_url_parser) **
Purpose: Parse PDFs, CSVs, Excel files, images, and DOCX files. This is the "File Parser Block" in the UI.
Structure:
block_type: file_url_parser
label: <unique_label>
file_url: <https_url_or_file_url> # Required: URL to the file
file_type: <auto_detect|csv|excel|pdf|image|docx> # Optional: defaults to auto_detect (infer from URL/content)
json_schema: <json_schema> # Optional: Structure of parsed output
Use Cases:
- Parse a PDF invoice into structured fields
- Extract rows from a CSV file
- Read Excel sheets for downstream processing
Example:
blocks:
- block_type: file_url_parser
label: parse_invoice
next_block_label: null
file_url: "https://example.com/invoice.pdf"
file_type: pdf
json_schema:
type: object
properties:
invoice_id: {type: string}
total: {type: number}
** SEND EMAIL BLOCK (send_email) **
Purpose: Send email notifications. This is the "Send Email Block" in the UI.
Structure:
block_type: send_email
label: <unique_label>
smtp_host_secret_parameter_key: <param_key> # Required: Secret parameter key for SMTP host
smtp_port_secret_parameter_key: <param_key> # Required: Secret parameter key for SMTP port
smtp_username_secret_parameter_key: <param_key> # Required: Secret parameter key for SMTP username
smtp_password_secret_parameter_key: <param_key> # Required: Secret parameter key for SMTP password
sender: <email_address> # Required: Sender email address
recipients: [<email_address>] # Required: Recipient list
subject: <subject_line> # Required: Email subject
body: <email_body> # Required: Email body
file_attachments: [<file_path>] # Optional: Local file paths to attach
Use Cases:
- Notify a team when a workflow completes
- Send extracted data summaries to stakeholders
- Email reports or attachments
Example:
blocks:
- block_type: send_email
label: notify_ops
next_block_label: null
smtp_host_secret_parameter_key: smtp_host
smtp_port_secret_parameter_key: smtp_port
smtp_username_secret_parameter_key: smtp_user
smtp_password_secret_parameter_key: smtp_pass
sender: "automation@example.com"
recipients: ["ops@example.com"]
subject: "Daily report ready"
body: "The latest report is ready for review."
file_attachments: ["/tmp/latest_report.pdf"]
** TEXT PROMPT BLOCK (text_prompt) **
Purpose: Process text with LLM. This is the "Text Prompt Block" in the UI.
Structure:
block_type: text_prompt
label: <unique_label>
llm_key: <llm_key> # Optional: Model key override
prompt: <text_prompt> # Required: Prompt to run
parameter_keys: [] # Optional: Parameters used in this block
json_schema: <json_schema> # Optional: Structured output schema
Use Cases:
- Summarize extracted text or documents
- Normalize free-form content into a schema
- Generate classifications or tags
Example:
workflow_definition:
version: 2
blocks:
- block_type: text_prompt
label: summarize_notes
next_block_label: null
prompt: "Summarize these notes: {{ notes }}"
json_schema:
type: object
properties:
summary: {type: string}
action_items: {type: array, items: {type: string}}
parameter_keys: [notes]
parameters:
- parameter_type: workflow
key: notes
workflow_parameter_type: string
** HTTP REQUEST BLOCK (http_request) **
Purpose: Make HTTP API calls. This is the "HTTP Request Block" in the UI.
Structure:
block_type: http_request
label: <unique_label>
method: <GET|POST|PUT|PATCH|DELETE> # Optional: HTTP method (default: GET)
url: <https_url> # Optional: Target URL
headers: {} # Optional: HTTP headers
body: {} # Optional: JSON body (MUST be a dict, not a string)
files: {} # Optional: Multipart files mapping
timeout: 30 # Optional: Timeout in seconds
follow_redirects: true # Optional: Follow redirects
parameter_keys: [] # Optional: Workflow parameters used (block outputs don't need to be listed)
Use Cases:
- Call third-party APIs for enrichment
- Post data to internal services
- Upload files via multipart requests
- Send results from previous blocks to webhooks
Example 1 - Using workflow parameters:
workflow_definition:
version: 2
blocks:
- block_type: http_request
label: lookup_customer
next_block_label: null
method: "POST"
url: "https://api.example.com/customers/search"
headers:
Authorization: "Bearer {{ api_token }}"
body:
email: "{{ customer_email }}"
parameter_keys: [api_token, customer_email]
parameters:
- parameter_type: workflow
key: api_token
workflow_parameter_type: string
- parameter_type: workflow
key: customer_email
workflow_parameter_type: string
Example 2 - Sending block output to webhook (no parameter_keys needed):
workflow_definition:
version: 2
parameters: []
blocks:
- block_type: navigation
label: get_data
next_block_label: send_webhook
navigation_goal: "Get top 3 hacker news items"
url: "https://news.ycombinator.com"
- block_type: http_request
label: send_webhook
next_block_label: null
method: POST
url: "http://example.com/webhook"
body:
data: "{{ get_data.output }}"
parameter_keys: []
** PARAMETER TEMPLATING **
All string fields in blocks support Jinja2 templating to reference parameters and block outputs.
There are TWO types of references:
1. WORKFLOW PARAMETERS - Reference input parameters defined in the parameters section
Syntax: {{ param_key }}
Requires: parameter_keys list must include the param_key
2. BLOCK OUTPUTS - Reference output from a previous block by its label
Syntax: {{ block_label.output }}
Requires: NOTHING - block outputs are automatically available, no parameter_keys needed
IMPORTANT: Block outputs use the block's label directly (e.g., {{ block_1.output }}).
Examples - Workflow Parameters:
* In URL:
url: "https://example.com/search?q={{ search_term }}"
parameter_keys: [search_term]
* In goals:
navigation_goal: "Search for {{ product_name }} and filter by {{ category }}"
parameter_keys: [product_name, category]
Examples - Block Outputs (no parameter_keys needed):
* Send previous block output to webhook:
blocks:
- block_type: navigation
label: get_data
navigation_goal: "Get top 3 hacker news items"
...
- block_type: http_request
label: send_webhook
method: POST
url: "http://example.com/webhook"
body:
data: "{{ get_data.output }}"
parameter_keys: [] # Empty - block outputs don't need to be declared
* Use extraction output in next block:
blocks:
- block_type: extraction
label: extract_items
...
- block_type: navigation
label: process_items
navigation_goal: "Process these items: {{ extract_items.output }}"
* In schemas (as descriptions):
data_schema:
type: object
properties:
query_result:
type: string
description: "Result for query: {{ query }}"
** ERROR HANDLING AND RETRIES **
Error Code Mapping:
Map internal errors to custom error codes for easier handling:
error_code_mapping:
"ElementNotFound": "ELEMENT_MISSING"
"TimeoutError": "PAGE_TIMEOUT"
"NavigationFailed": "NAV_ERROR"
Retry Configuration:
max_retries: 3 # Block will retry up to 3 times on failure
Conditional Continuation:
continue_on_failure: true # Workflow continues even if block fails
Loop Continuation:
next_loop_on_failure: true # Skip to next iteration in loops
Completion Criteria:
complete_criterion: "URL contains '/success'" # Condition for success
terminate_criterion: "Element with text 'Error' exists" # Condition to stop
** WORKFLOW EXECUTION FLOW **
Sequential Execution:
blocks:
- block_type: goto_url
label: step1
next_block_label: step2
url: "https://example.com/start"
- block_type: extraction
label: step2
next_block_label: step3
- block_type: navigation
label: step3
next_block_label: null
navigation_goal: "Complete the final step on the page"
# Executes: step1 → step2 → step3
Explicit Flow Control (Skip blocks):
blocks:
- block_type: goto_url
label: login
next_block_label: extract_data
url: "https://app.example.com/login"
- block_type: navigation
label: handle_error
next_block_label: null
navigation_goal: "Handle the error state if it appears"
- block_type: extraction
label: extract_data
next_block_label: null
# Executes: login → extract_data (skips handle_error)
Error Recovery Flow:
blocks:
- block_type: navigation
label: primary_task
next_block_label: verify_result
continue_on_failure: true
navigation_goal: "Attempt the primary task on the page"
- block_type: validation
label: verify_result
next_block_label: null
** BEST PRACTICES **
* Naming Conventions:
- Use descriptive labels: "login_to_portal" not "step1"
- Use snake_case for labels and parameter keys
- Use snake_case for data schema field names (e.g., product_name, total_price)
- Make labels unique and meaningful
* Goal Writing:
- Be specific: "Click the blue 'Submit' button" vs "Submit the form"
- Include context: "After clicking Search, wait for results to load"
- Natural language: Write as you would instruct a human
* Parameter Usage:
- Always list parameter_keys when using parameters in a block
- Validate parameter types match usage
- Provide default values for optional parameters
* Error Handling:
- Set appropriate max_retries for flaky operations
- Use complete_criterion for validation
- Map errors to meaningful codes for debugging
* Data Extraction:
- Always provide data_schema for structured extraction
- Use specific extraction goals
- Handle arrays vs objects appropriately
* Performance:
- Use disable_cache: true for dynamic content
- Set max_steps_per_run to prevent infinite loops
- Use navigation blocks for actions and extraction blocks for data extraction
* Security:
- Never hardcode credentials in workflows
- Use credential parameters for sensitive data
- Use AWS secrets or vault integrations
** COMMON PATTERNS **
Pattern 1: Login → Navigate → Extract
workflow_definition:
version: 2
blocks:
- block_type: login
label: authenticate
next_block_label: go_to_reports
url: "https://app.example.com/login"
parameter_keys: [my_credentials]
- block_type: navigation
label: go_to_reports
next_block_label: get_report_data
navigation_goal: "Navigate to Reports section"
- block_type: extraction
label: get_report_data
next_block_label: null
data_extraction_goal: "Extract all report entries"
data_schema:
type: array
items: {type: object}
parameters:
- parameter_type: workflow
workflow_parameter_type: credential_id
key: my_credentials
default_value: "uuid"
- parameter_type: output
key: extracted_data
Pattern 2: Search with Dynamic Input
workflow_definition:
version: 2
blocks:
- block_type: navigation
label: search_and_extract
next_block_label: null
url: "https://example.com"
navigation_goal: "Search for '{{ search_query }}' and extract the first 10 results with titles and URLs"
parameters:
- parameter_type: workflow
key: search_query
workflow_parameter_type: string
Pattern 3: Multi-Step Form Filling
workflow_definition:
version: 2
blocks:
- block_type: goto_url
label: open_form
next_block_label: fill_personal_info
url: "https://forms.example.com/application"
- block_type: navigation
label: fill_personal_info
next_block_label: fill_address
navigation_goal: "Fill in name as {{ name }}, email as {{ email }}"
parameter_keys: [name, email]
- block_type: navigation
label: fill_address
next_block_label: submit
navigation_goal: "Fill in address fields and click Continue"
parameter_keys: [address, city, zip]
- block_type: navigation
label: submit
next_block_label: null
navigation_goal: "Review information and click Submit"
parameters:
- parameter_type: workflow
key: name
workflow_parameter_type: string
- parameter_type: workflow
key: email
workflow_parameter_type: string
- parameter_type: workflow
key: address
workflow_parameter_type: string
- parameter_type: workflow
key: city
workflow_parameter_type: string
- parameter_type: workflow
key: zip
workflow_parameter_type: string
Pattern 4: Conditional Extraction
workflow_definition:
version: 2
blocks:
- block_type: navigation
label: search_product
next_block_label: check_availability
navigation_goal: "Search for {{ product }}"
- block_type: extraction
label: check_availability
next_block_label: add_to_cart
data_extraction_goal: "Check if product is in stock"
data_schema:
type: object
properties:
in_stock: {type: boolean}
- block_type: navigation
label: add_to_cart
next_block_label: null
navigation_goal: "If product is in stock, add to cart"
parameters: []
** VALIDATION RULES **
Workflow-Level:
- All block labels must be unique
- Parameters referenced in blocks must be defined
- next_block_label must point to existing block labels or be null
- The last block in execution flow should have next_block_label: null
Block-Level:
- label is required and cannot be empty
- block_type must be a valid type
- For navigation blocks: navigation_goal is required
- For extraction blocks: data_extraction_goal is required
- For action blocks: navigation_goal is required
- For login blocks: parameter_keys should include credentials
Parameter-Level:
- key must be unique across parameters
- key cannot contain whitespace
- parameter_type must be valid
- Referenced keys (like source_parameter_key) must exist
** COMPLETE WORKFLOW EXAMPLE **
title: E-commerce Product Search and Purchase
description: Search for a product, extract details, and add to cart
workflow_definition:
version: 2
blocks:
- block_type: login
label: login_to_store
next_block_label: search_and_filter
url: "https://shop.example.com/login"
parameter_keys:
- account_creds
complete_criterion: "URL contains '/dashboard'"
- block_type: navigation
label: search_and_filter
next_block_label: get_product_info
url: "https://shop.example.com/search"
navigation_goal: "Search for {{ product_name }} and filter results by price under ${{ max_price }}"
parameter_keys:
- product_name
- max_price
max_retries: 2
- block_type: extraction
label: get_product_info
next_block_label: add_to_cart
data_extraction_goal: "Extract product name, price, rating, and availability"
data_schema:
type: object
properties:
name: {type: string}
price: {type: number}
rating: {type: number}
available: {type: boolean}
- block_type: navigation
label: add_to_cart
next_block_label: null
navigation_goal: "Click on the first available product and add it to cart"
max_retries: 3
parameters:
- parameter_type: workflow
key: product_name
workflow_parameter_type: string
description: "Product to search for"
- parameter_type: workflow
key: max_price
workflow_parameter_type: float
description: "Maximum price willing to pay"
- parameter_type: workflow
workflow_parameter_type: credential_id
key: account_creds
default_value: "cred_12345"
- parameter_type: output
key: product_details
description: "Extracted product information"
END OF KNOWLEDGE BASE