mirror of
https://github.com/vegu-ai/talemate.git
synced 2025-09-02 02:19:12 +00:00
61 lines
No EOL
2.3 KiB
Markdown
# ChromaDB

Talemate uses ChromaDB to maintain long-term memory. The default embeddings are fast, but not particularly accurate. If you want more accurate retrieval, you can switch to the instructor embeddings or the OpenAI embeddings; see below for instructions on how to enable each.

In my testing so far, instructor-xl has proved to be the most accurate (even more so than OpenAI).

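To make the choice between backends concrete, here is a small sketch of how a config-driven backend selection might be validated. This is a hypothetical helper of my own, not Talemate's actual code; the three values mirror the options discussed on this page.

```python
# Hypothetical sketch (not Talemate's actual code): validate which of the
# three embedding backends described here is named in the chromadb config
# section before ChromaDB is initialized.
VALID_EMBEDDINGS = {"default", "instructor", "openai"}

def resolve_embeddings(config: dict) -> str:
    """Return the embeddings backend from the chromadb config section,
    falling back to the fast default when nothing is specified."""
    backend = config.get("chromadb", {}).get("embeddings", "default")
    if backend not in VALID_EMBEDDINGS:
        raise ValueError(f"unknown embeddings backend: {backend!r}")
    return backend

print(resolve_embeddings({"chromadb": {"embeddings": "instructor"}}))  # instructor
print(resolve_embeddings({}))  # default
```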
## Local instructor embeddings

If you want ChromaDB to use the more accurate (but much slower) instructor embeddings, add the following to `config.yaml`:

**Note**: The `xl` model takes a while to load even with CUDA. Expect about a minute of loading time on the first scene you load.

```yaml
chromadb:
  embeddings: instructor
  instructor_device: cpu
  instructor_model: hkunlp/instructor-xl
```

### Instructor embedding models

- `hkunlp/instructor-base` (smallest / fastest)
- `hkunlp/instructor-large`
- `hkunlp/instructor-xl` (largest / slowest) - requires about 5GB of memory

You will need to restart the backend for this change to take effect.

**NOTE** - The first time you enable this, the instructor model you selected needs to be downloaded. This may take a while, and the Talemate backend will be unresponsive during that time.

Once the download is finished, if Talemate is still unresponsive, try reloading the front-end to reconnect. If all else fails, restart the backend as well. I'll try to make this more robust in the future.

### GPU support

If you want to use the instructor embeddings with GPU support, you will need to install PyTorch with CUDA support.

To do this on Windows, run `install-pytorch-cuda.bat` from the project directory. Then change the device in your config to `cuda`:

```yaml
chromadb:
  embeddings: instructor
  instructor_device: cuda
  instructor_model: hkunlp/instructor-xl
```
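Whether `cuda` is a sensible value depends on your machine. A quick way to sanity-check this before editing the config is to look for the NVIDIA driver tooling on your PATH. The helper below is a sketch of my own using only the standard library; the authoritative check is PyTorch's `torch.cuda.is_available()` once CUDA-enabled PyTorch is installed.

```python
import shutil

def suggest_instructor_device() -> str:
    """Suggest a value for instructor_device: 'cuda' when the NVIDIA
    driver tools (nvidia-smi) appear to be installed, otherwise 'cpu'.
    This is only a heuristic; torch.cuda.is_available() is authoritative."""
    return "cuda" if shutil.which("nvidia-smi") else "cpu"

print(suggest_instructor_device())
```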
## OpenAI embeddings

First, make sure your OpenAI API key is specified in `config.yaml`:

```yaml
openai:
  api_key: <your-key-here>
```

Then add the following to `config.yaml` for ChromaDB:

```yaml
chromadb:
  embeddings: openai
```

**Note**: As with everything OpenAI, using this isn't free, though it is far cheaper than their text completion endpoints. Also, if you send very explicit content they may flag or ban your key (reportedly they usually send warnings first), so keep that in mind and always monitor your usage on their dashboard.
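To get a rough feel for the cost, here is a back-of-the-envelope sketch. Both numbers are assumptions of mine, not from this page: English text averages roughly 0.75 words per token, and OpenAI's embedding pricing has historically been on the order of $0.0001 per 1K tokens — check their current pricing page before relying on this.

```python
# Rough cost estimate for embedding scene text with OpenAI.
# ASSUMPTIONS (not from the Talemate docs): ~0.75 words per token and an
# illustrative price of $0.0001 per 1K tokens; verify against current pricing.
PRICE_PER_1K_TOKENS = 0.0001
WORDS_PER_TOKEN = 0.75

def estimated_cost(word_count: int) -> float:
    """Estimate the dollar cost of embedding `word_count` words."""
    tokens = word_count / WORDS_PER_TOKEN
    return tokens / 1000 * PRICE_PER_1K_TOKENS

# Embedding 100,000 words of scene history:
print(f"${estimated_cost(100_000):.4f}")  # → $0.0133
```

Even a very large scene history embeds for pennies under these assumptions, which is why the note above calls it cheap relative to text completion.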