talemate/docs/user-guide/integrations/runpod.md
veguAI 95a17197ba
0.26.0 (#133)
* implement manually disabling and enabling clients

* relock

* fix warning spam

* start moving stuff around

* move more stuff

* start separating world state manager into more managable submodules

* character title

* scroll home to top always

* finish separating character state editor into components

* fix defered nav to character sections

* separate components for pin and contextdb managing

* fix issue with context character filter search

* fix world state manage ux state reset issues

* wsm menu refactor
allow updating character image from wsm
cover image layout fixes

* remove debug spam

* fix client deletion / disabling rubber banding issue

* deactivate / activate / delete characters through wsm

* reload character instead

* fix koboldcpp client jiggle arguments

* save scene title

* fix deferred nav

* fix issue where blanking a character detail would bug out

* some layout changes

* character import copies cover image

* remove debug message

* character import via wsm

* deactivate imported characters

* images nav option placeholder

* start move towards new world state templating system

* prompt tweak

* add templates/world-state/*.yaml

* switch to new world state template system in manager

* template editor progress

* more wsm template changes

* template applicator component

* template applicate added to attributes and details

* selective template application

* fix issue with template editing

* attribute and detail templates dont require instructions

* adjust character attributes and details template applicator integration

* add gpt-4o

add gpt-4o-2024-05-13

* autocomplete prompt and postprocessing tweaks

* prompt tweaks

* fix issue where saving a new scene could cause recent config changes to revert

* only download punkt if its not downloaded yet

* working character attribute templates

* character detail generate working
move template generate logic to worldstate.templates

* character creator first steps

* support contextual generate when character doesn't exist

* move talemate wsm templates to their own dir, add supports_spice and supports_style flags

* wsm character creator progress

* character creator progress

* character creator progress and wire up image creation in character editor

* templating progress

* contextual generate generation options

* ux tweaks

* wirte up writing style and spice to generation

* wire spice / writing style to detail generation

* notify when spice is applied

* tweaks to generation spice notifications

* add some help / information to template editor

* fix some issues with detail and attribute generation

* some context gen tweaks

* character gen tweaks

* character color changer

* link to templates form gen option ux

* gen options for dialogue example genrate

* ctrl click to max spice level

* unify spice application notification into a component for reuse

* improvements to example dialogue generation

* some refinements to character editor

* remove some old cruft from scene schema

* wsm scene editor progress

* relock

* relock

* debug message cleanup

* fix issue with tab selection when loading a scene

* scene editor progress

* centralized generation options

* pass generation settings through to character creator

* save changes from wsm view

* scene settings
save copy

* refactor world entry / states editor

* fix issue with applying non-character world state templates

* layout fixes

* allow updating of scene cover image

* move history manager to world editor

* add phi-3 base template

* dialogue cleanup improvements

* refactor scoped game-engine api

* separate legacy creator functions to own file

* remove cruft

* some cleanup and fixes

* add photo style

* remove noisy log message

* better handling of active scene

* some fixes to pin editor

* don't enforce height

* active scene context fixes

* fix intro and scene description generration

* tweak preset for scene direction and summarization tasks

* ensure memory db is open

* update frontend dependencies

* update frontend dependencies

* fix issue with prompt query_memory function returning None

* typo

* default  world state templates

* new scene creation fixes
remove legacy creator ux

* scene export

* fix scene loading from upload

* add claude 3.5 sonnet

* fix automatic client selection when the current client is disabled

* remove cruft

* agent modal extended to support multiple config panels
visual agent prompt prefixes and suffixes addeed

* fix issue with world state template group saving

* resolve attribute name issue `copy`

* RequestInput: fix form validation and keystroke submit

* support chara load from json files also refactor character loading to load.py

* implement simple act-as feature using tab to cycle through active characters in the scene

* docs progress

* tts settings tweaks

* fix issue with loading older talemate scenes

* docs progress

* fix issue with config validation on new installs

* some tweaks for agent setting modals

* default template changed to alpaca

* docs dependencies

* gemma2 template

* nemotron4 template

* docs

* docs

* docs

* change prompt template section to autocomplete

* fix agent config not loading for some agents

* allow deletion of player character

* fix some oddities with scene outline commit

* automatically active player characters and create player characters with the correct actor class

* also set the first npc created as immediately acitve

* add has_active_npcs property and re-emit message history when scene outline is updated.

* indicate when visualizer is busy in the scene tools

* check for busy instead

* prompt tweaks for movie script type dialogue format

* gemma2 prompt fixed

* scene message colors updated

* act as narrator

* move to _old

* scene message appearance tweaks

* fix rubberbanding when editing text field in agent configs

* fix autocompletionm when acting as different character or narrator

* disable autocomplete during command execution

* remove autocomplete button from scene tools

* docs

* relock

* docs

* docs

* improve context pins in dialogue context

* better approximate token count

* fix pin condition editing

* fix issue where scene save as would lose long term memory entries

* immediately clean message history when loading a new scene

* docs

* ensure intro text has formatting markers

* narrator messages written by the player can now be deleted.

* scene editor

* move docs around

* start character editor docs

* more character editor docs:

* fix some ux bugs

* fix template group deletrion not removing the file

* docs

* typos

* docs

* relock

* docs

* notify image generation errors

* linting

* gh pages workflow

* use poetry

* dont use poetry

* link to docs site

* set site_url

* add trailing slash

* fix image paths

* re-add tabbyai link

* fix image generation error triggering incorrectly

* fix intro formatting incosistencies

* remove cruft

* add time passed label to history view

* date adjustments

* tests

* add gpt-4o-mini

* fix links

* remove hard ntlk requirement for voice generation chunking

ntlk error handling

fix typo

* docs

* fix issdue with dupe character card intro text

* disable character forms while templates are being applied.

* failure during context generate no longer locks ux

* refactor client and agent status display in system bar

* llama 3.1 8b claude

* fix format

* adjustments to automcomplete dialogue instructions

* add mistral nemo

* debug info

* fix system agent status getting stuck

* readme

* readme

* fix autocomplete responses when they are framed by quotes
2024-07-26 21:43:06 +03:00

3.4 KiB

!!! note These instructions have not been updated in a while and RunPod has changed somewhat. I will update this as soon as I can. The general idea should still be the same.

RunPod allows you to quickly set up and run text-generation-webui instances on powerful GPUs, remotely. If you want to run the significantly larger models (like 70B parameters) with reasonable speeds, this is probably the best way to do it.

Get a RunPod API key and add it to the talemate config

You can manage your RunPod api keys at https://www.runpod.io/console/user/settings

Once you have your key you can open the settings in talemate.

Open settings

Then click the APPLICATION tab and then the RUNPOD category. Here you can add your RunPod API key.

Runpod settings

Setting the runpod api key requires a restart of the backend, so make sure to save your changes and restart the backend.

Create a RunPod instance

Community Cloud

The community cloud pods are cheaper and there are generally more GPUs available. They do however not support persistent storage and you will have to download your model and data every time you deploy a pod.

Secure Cloud

The secure cloud pods are more expensive and there are generally fewer GPUs available, but they do support persistent storage.

Peristent volumes are super convenient, but optional for our purposes and are not free and you will have to pay for the storage you use.

Deploy pod

For us it does not matter which cloud you choose. The only thing that matters is that it deploys a text-generation-webui instance, and you ensure that by choosing the right template.

Pick the GPU you want to use, for 70B models you want at least 48GB of VRAM and click Deploy, then select a template and deploy.

When choosing the template for your pod, choose the RunPod TheBloke LLMs template. This template is pre-configured with all the dependencies needed to run text-generation-webui. There are other text-generation-webui templates, but they are usually out of date and this one i found to be consistently good.

!!! warning The name of your pod is important and ensures that Talemate will be able to find it. Talemate will only be able to find pods that have the word thebloke llms or textgen in their name. (case insensitive)

Once your pod is deployed and has finished setup and is running, the client will automatically appear in the Talemate client list, making it available for you to use like you would use a locally hosted text-generation-webui instance.

RunPod client

Connecting to the text-generation-webui UI

To manage your text-generation-webui instance, click the Connect button in your RunPod pod dashboard at https://www.runpod.io/console/pods and in the popup click on Connect to HTTP Service [Port 7860] to open the text-generation-webui UI. Then just download and load your model as you normally would.

⚠️ Always check your pod status on the RunPod dashboard

Talemate is not a suitable or reliable way for you to determine whether your pod is currently running or not. Always check the runpod dashboard to see if your pod is running or not.

While your pod us running it will be eating up your credits, so make sure to stop it when you're not using it.