Streamlit + Readme update: copy to cURL (#22)

Suchintan 2024-03-04 12:41:38 -05:00 committed by GitHub
parent 0495552b11
commit 3bf56717c9
6 changed files with 79 additions and 34 deletions


@ -27,8 +27,35 @@
<img src="images/geico_shu_recording_cropped.gif"/>
</p>
Want to see more examples of Skyvern in action? Click [here](#real-world-examples-of-skyvern)!
Traditional approaches to browser automations required writing custom scripts for websites, often relying on DOM parsing and XPath-based interactions which would break whenever the website layouts changed.
Instead of only relying on code-defined XPath interactions, Skyvern adds computer vision and LLMs to the mix to parse items in the viewport in real-time, create a plan for interaction and interact with them.
This approach gives us a few advantages:
1. Skyvern can operate on websites it's never seen before, as it's able to map visual elements to the actions necessary to complete a workflow, without any customized code
1. Skyvern is resistant to website layout changes, as there are no pre-determined XPaths or other selectors our system is looking for while trying to navigate
1. Skyvern leverages LLMs to reason through interactions to ensure we can cover complex situations. Examples include:
1. If you wanted to get an auto insurance quote from Geico, the answer to a common question “Were you eligible to drive at 18?” could be inferred from the driver receiving their license at age 16
1. If you were doing competitor analysis, it's understanding that an Arnold Palmer 22 oz can at 7/11 is almost definitely the same product as a 23 oz can at Gopuff (even though the sizes are slightly different, which could be a rounding error!)
Want to see examples of Skyvern in action? Jump to [#real-world-examples-of-skyvern](#real-world-examples-of-skyvern)
# How it works
Skyvern was inspired by the Task-Driven autonomous agent design popularized by [BabyAGI](https://github.com/yoheinakajima/babyagi) and [AutoGPT](https://github.com/Significant-Gravitas/AutoGPT) -- with one major bonus: we give Skyvern the ability to interact with websites using browser automation libraries like [Playwright](https://playwright.dev/).
<picture>
<source media="(prefers-color-scheme: dark)" srcset="images/skyvern-system-diagram-dark.png" />
<img src="images/skyvern-system-diagram-light.png" />
</picture>
<!-- TODO (suchintan):
Expand the diagram above to go deeper into how:
1. We draw bounding boxes
2. We parse the HTML + extract the image to generate an interactable element map
-->
# Quickstart
This quickstart guide will walk you through getting Skyvern up and running on your local machine.
@ -72,20 +99,26 @@ pre-commit install
## Running your first automation
### Executing tasks (UI)
Once you have the UI running, you can start an automation by filling out the fields shown in the UI and clicking "Execute"
# How it works
Skyvern was inspired by the Task-Driven autonomous agent design popularized by [BabyAGI](https://github.com/yoheinakajima/babyagi) and [AutoGPT](https://github.com/Significant-Gravitas/AutoGPT) -- with one major difference: we give Skyvern the ability to interact with websites using browser automation libraries like [Playwright](https://playwright.dev/).
<p align="center">
<img src="images/skyvern_visualizer_run_task.png"/>
</p>
<picture>
<source media="(prefers-color-scheme: dark)" srcset="images/skyvern-system-diagram-dark.png"/>
<img src="images/skyvern-system-diagram-light.png"/>
</picture>
### Executing tasks (cURL)
```
curl -X POST -H 'Content-Type: application/json' -H 'x-api-key: {Your local API key}' -d '{
"url": "https://www.geico.com",
"webhook_callback_url": "",
"navigation_goal": "Navigate through the website until you generate an auto insurance quote. Do not generate a home insurance quote. If this page contains an auto insurance quote, consider the goal achieved",
"data_extraction_goal": "Extract all quote information in JSON format including the premium amount, the timeframe for the quote.",
"navigation_payload": "{Your data here}",
"proxy_location": "NONE"
}' http://0.0.0.0:8000/api/v1/tasks
```
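For callers who prefer Python over a shell, the same request can be sketched with the standard library. This is a minimal sketch, not the project's client: `build_task_request` is a hypothetical helper, the base URL is copied from the cURL example above, and the API key and `navigation_payload` contents are placeholder assumptions. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: same local server address as in the cURL example above.
BASE_URL = "http://0.0.0.0:8000/api/v1"


def build_task_request(api_key: str, navigation_payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a task."""
    body = {
        "url": "https://www.geico.com",
        "webhook_callback_url": "",
        "navigation_goal": (
            "Navigate through the website until you generate an auto insurance quote. "
            "Do not generate a home insurance quote. "
            "If this page contains an auto insurance quote, consider the goal achieved"
        ),
        "data_extraction_goal": (
            "Extract all quote information in JSON format including the premium amount, "
            "the timeframe for the quote."
        ),
        "navigation_payload": navigation_payload,  # placeholder data, supply your own
        "proxy_location": "NONE",
    }
    return urllib.request.Request(
        f"{BASE_URL}/tasks",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


request = build_task_request("YOUR_LOCAL_API_KEY", {"name": "Jane Doe"})

# To submit it against a running local server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before anything goes over the network.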
<!-- > TODO (suchintan):
Expand the diagram above to go deeper into how:
1. We draw bounding boxes
2. We parse the HTML + extract the image to generate an interactable element map
-->
# Real-world examples of Skyvern
<!-- > TODO (suchintan):
@ -123,18 +156,6 @@ More extensive documentation can be found on our [documentation website](https:/
Our focus is bringing stability to browser-based workflows. We leverage LLMs to create an AI Agent capable of interacting with websites like you or I would — all via a simple API call.
Traditional approaches required writing custom scripts for websites, often relying on DOM parsing and XPath-based interactions which would break whenever the website layouts changed.
Skyvern operates like a human — increasing reliability by not relying on fragile scripts, instead relying on computer vision to parse items in the viewport and interact with them the way a human would.
This approach gives us a few advantages:
1. Skyvern can operate on websites it's never seen before, as it's able to map visual elements to the actions necessary to complete a workflow, without any customized code
1. Skyvern is resistant to website layout changes, as there are no pre-determined XPaths or other selectors our system is looking for while trying to navigate
1. Skyvern is able to circumvent or navigate through many bot detection methods as many of them rely on allowing people to access the websites
1. Skyvern leverages LLMs to reason through interactions to ensure we can cover complex situations. Examples include:
1. If you wanted to get an auto insurance quote from Geico, the answer to a common question “Were you eligible to drive at 18?” could be inferred from the driver receiving their license at age 16
1. If you were doing competitor analysis, it's understanding that an Arnold Palmer 22 oz can at 7/11 is almost definitely the same product as a 23 oz can at Gopuff (even though the sizes are slightly different, which could be a rounding error!)
# Feature Roadmap

Binary file not shown (image, 238 KiB).


@ -43,6 +43,8 @@ asyncache = "^0.3.1"
orjson = "^3.9.10"
structlog = "^23.2.0"
plotly = "^5.18.0"
clipboard = "^0.0.4"
curlify = "^2.2.1"
[tool.poetry.group.dev.dependencies]
@ -66,6 +68,7 @@ notebook = "^7.0.6"
freezegun = "^1.2.2"
snoop = "^0.4.3"
rich = {extras = ["jupyter"], version = "^13.7.0"}
clipboard = "^0.0.4"
[build-system]


@ -1,7 +1,9 @@
import json
from typing import Any
import curlify
import requests
from requests import PreparedRequest
from skyvern.forge.sdk.schemas.tasks import TaskRequest
@ -11,7 +13,7 @@ class SkyvernClient:
self.base_url = base_url
self.credentials = credentials
def create_task(self, task_request_body: TaskRequest) -> str | None:
def generate_curl_params(self, task_request_body: TaskRequest) -> tuple[str, dict[str, Any], dict[str, str]]:
url = f"{self.base_url}/tasks"
payload = task_request_body.model_dump()
headers = {
@ -19,11 +21,23 @@ class SkyvernClient:
"x-api-key": self.credentials,
}
return url, payload, headers
def create_task(self, task_request_body: TaskRequest) -> str | None:
url, payload, headers = self.generate_curl_params(task_request_body)
response = requests.post(url, headers=headers, data=json.dumps(payload))
if "task_id" not in response.json():
return None
return response.json()["task_id"]
def copy_curl(self, task_request_body: TaskRequest) -> str:
url, payload, headers = self.generate_curl_params(task_request_body)
req = requests.Request("POST", url, headers=headers, data=json.dumps(payload, indent=4))
return curlify.to_curl(req.prepare())
def get_task(self, task_id: str) -> dict[str, Any] | None:
"""Get a task by id."""
url = f"{self.base_url}/internal/tasks/{task_id}"
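The `copy_curl` method in this hunk delegates the string rendering to `curlify`. To illustrate the idea only, here is a minimal standard-library sketch of turning a URL, headers, and JSON payload into a shell-safe cURL command. `to_curl_sketch` is a hypothetical name, this is not how `curlify` is implemented, and the URL, API key, and payload values are placeholders.

```python
import json
import shlex


def to_curl_sketch(url: str, headers: dict[str, str], payload: dict) -> str:
    """Render a POST request as a copy-pasteable cURL command (illustrative only)."""
    parts = ["curl", "-X", "POST"]
    for name, value in headers.items():
        parts += ["-H", f"{name}: {value}"]
    # Pretty-print the body so the copied command stays readable.
    parts += ["-d", json.dumps(payload, indent=4), url]
    # shlex.quote protects spaces, quotes, and newlines in every argument.
    return " ".join(shlex.quote(part) for part in parts)


command = to_curl_sketch(
    "http://0.0.0.0:8000/api/v1/tasks",
    {"Content-Type": "application/json", "x-api-key": "YOUR_LOCAL_API_KEY"},
    {"url": "https://www.geico.com", "proxy_location": "NONE"},
)
print(command)
```

Quoting each argument individually keeps the command valid even when goals or payload values contain apostrophes or line breaks.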


@ -1,16 +1,11 @@
from pydantic import BaseModel
from skyvern.forge.sdk.schemas.tasks import TaskRequest
class SampleData(BaseModel):
class SampleTaskRequest(TaskRequest):
name: str
url: str
navigation_goal: str
data_extraction_goal: str
navigation_payload: dict
extracted_information_schema: dict
geico_sample_data = SampleData(
geico_sample_data = SampleTaskRequest(
name="Geico",
url="https://www.geico.com",
navigation_goal="Navigate through the website until you generate an auto insurance quote. Do not generate a home insurance quote. If this page contains an auto insurance quote, consider the goal achieved",


@ -1,3 +1,4 @@
import clipboard
import pandas as pd
import streamlit as st
@ -104,6 +105,11 @@ st.markdown("# **:dragon: Skyvern :dragon:**")
st.markdown(f"### **{select_env} - {select_org}**")
execute_tab, visualizer_tab = st.tabs(["Execute", "Visualizer"])
def copy_curl_to_clipboard(task_request_body: TaskRequest) -> None:
clipboard.copy(client.copy_curl(task_request_body=task_request_body))
with execute_tab:
example_tabs = st.tabs([supported_example.name for supported_example in supported_examples])
@ -111,8 +117,14 @@ with execute_tab:
with example_tab:
create_column, explanation_column = st.columns([1, 2])
with create_column:
run_task, copy_curl = st.columns([3, 1])
task_request_body = supported_examples[i]
copy_curl.button(
"Copy cURL", on_click=copy_curl_to_clipboard, kwargs={"task_request_body": task_request_body}
)
with st.form("task_form"):
st.markdown("## Run a task")
run_task.markdown("## Run a task")
example = supported_examples[i]
# Create all the fields to create a TaskRequest object
st_url = st.text_input("URL*", value=example.url, key="url")
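A note on callbacks created per example tab: assuming the "Copy cURL" button is built inside a loop over the examples (the loop itself is outside this hunk), each callback should capture its own request when it is defined. Python closures look up loop variables when they are called, not when they are created. The plain-Python sketch below shows the difference; `describe` and the example names are hypothetical. Streamlit's `st.button` accepts `args` and `kwargs` for `on_click`, which binds values early in the same way as `functools.partial`.

```python
from functools import partial

# Hypothetical stand-ins for the supported examples.
examples = ["example_a", "example_b", "example_c"]


def describe(name: str) -> str:
    """Stand-in for a callback that copies a cURL command for one example."""
    return f"copy cURL for {name}"


late_bound = []
bound_now = []
for name in examples:
    # The lambda reads `name` only when called, after the loop has finished.
    late_bound.append(lambda: describe(name))
    # partial stores the current value of `name` immediately.
    bound_now.append(partial(describe, name))

late_results = [callback() for callback in late_bound]
early_results = [callback() for callback in bound_now]
print(late_results)
print(early_results)
```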