talemate/docs/runpod.md
veguAI 611f77a730
Prep 0.16.0 (#40)
2023-12-08 22:57:44 +02:00


RunPod integration

RunPod lets you quickly set up and run text-generation-webui instances remotely on powerful GPUs. If you want to run significantly larger models (such as 70B parameters) at reasonable speeds, this is probably the best way to do it.

Create / grab your RunPod API key and add it to the Talemate config

You can manage your RunPod API keys at https://www.runpod.io/console/user/settings

Add the key to your Talemate config file (config.yaml):

runpod:
    api_key: <your api key>

Then restart Talemate.
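
A quick sanity check can catch a missing or placeholder key before you restart. The sketch below is not part of Talemate: `has_runpod_key` is a hypothetical helper that inspects an already parsed config mapping (for example, what a YAML parser returns for config.yaml) and assumes only the `runpod` / `api_key` layout shown above.

```python
from collections.abc import Mapping


def has_runpod_key(config: Mapping) -> bool:
    """Return True if config["runpod"]["api_key"] holds a usable-looking value.

    Hypothetical helper, not Talemate code. It only checks the shape of the
    config; it does not verify the key with RunPod.
    """
    section = config.get("runpod")
    if not isinstance(section, Mapping):
        return False
    key = section.get("api_key")
    if not isinstance(key, str):
        return False
    key = key.strip()
    # Reject empty values and an unedited "<your api key>" placeholder.
    return bool(key) and not (key.startswith("<") and key.endswith(">"))
```

Passing it the parsed config returns False when the `runpod` section is absent, the key is blank, or the placeholder was never replaced.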

Create a RunPod instance

Community Cloud

The community cloud pods are cheaper and there are generally more GPUs available. However, they do not support persistent storage, so you will have to download your model and data every time you deploy a pod.

Secure Cloud

The secure cloud pods are more expensive and there are generally fewer GPUs available, but they do support persistent storage.

Persistent volumes are very convenient but optional for our purposes. They are not free: you pay for the storage you use.

Deploy pod

For Talemate's purposes it does not matter which cloud you choose. What matters is that the pod deploys a text-generation-webui instance, which you ensure by choosing the right template.

Pick the GPU you want to use (for 70B models you want at least 48GB of VRAM) and click Deploy. Then select a template and deploy.
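
As a rough illustration of the VRAM figure, the weights alone take about parameters × bits-per-weight / 8 bytes. The sketch below is back-of-the-envelope arithmetic, not a measurement: the 4-bit quantization level is an assumption, and the estimate ignores the context cache and runtime overhead, which need memory on top of the weights.

```python
def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# 70B parameters at an assumed 4 bits per weight
print(weight_size_gb(70, 4))   # 35.0
# The same model at 16 bits per weight
print(weight_size_gb(70, 16))  # 140.0
```

Under these assumptions a 4-bit 70B model leaves some headroom on a 48GB card, while the 16-bit weights would not fit on a single one.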

When choosing the template for your pod, choose the RunPod TheBloke LLMs template. This template is pre-configured with all the dependencies needed to run text-generation-webui. There are other text-generation-webui templates, but they are usually out of date; I have found this one to be consistently good.

⚠️ The name of your pod is important: it is how Talemate finds the pod. Talemate will only find pods that have thebloke llms or textgen in their name (case insensitive).
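
The naming rule can be sketched as a case-insensitive substring test. This illustrates the rule as described here and is not Talemate's actual implementation; `is_discoverable` and `NAME_MARKERS` are hypothetical names.

```python
# Substrings a pod name must contain, per the rule described above.
NAME_MARKERS = ("thebloke llms", "textgen")


def is_discoverable(pod_name: str) -> bool:
    """True if the pod name contains one of the markers, ignoring case."""
    lowered = pod_name.lower()
    return any(marker in lowered for marker in NAME_MARKERS)


print(is_discoverable("RunPod TheBloke LLMs"))  # True
print(is_discoverable("my-TextGen-pod"))        # True
print(is_discoverable("llama-70b"))             # False
```

If you rename a pod, keeping one of these substrings somewhere in the name is enough under this rule.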

Once your pod is deployed, has finished setup, and is running, the client will appear automatically in the Talemate client list. You can then use it as you would a locally hosted text-generation-webui instance.

RunPod client

Connecting to the text-generation-webui UI

To manage your text-generation-webui instance, click the Connect button in your RunPod pod dashboard at https://www.runpod.io/console/pods and in the popup click on Connect to HTTP Service [Port 7860] to open the text-generation-webui UI. Then just download and load your model as you normally would.

⚠️ Always check your pod status on the RunPod dashboard

Talemate is not a suitable or reliable way to determine whether your pod is currently running. Always check the RunPod dashboard to see whether it is.

While your pod is running it will be eating up your credits, so make sure to stop it when you're not using it.