# Model Conversion Example

This directory contains scripts and code to help in the process of converting HuggingFace PyTorch models to GGUF format.

The motivation for this example is that model conversion is often an iterative process: the original model is inspected, converted, updates are made to llama.cpp, the model is converted again, and so on. Once the model has been converted it needs to be verified against the original model, then optionally quantized, and in some cases the perplexity of the quantized model checked. Finally, the model/models need to be uploaded to the ggml-org on Hugging Face. This tool/example tries to help with this process.

## Overview
The idea is that the Makefile targets and scripts here can be used in the development/conversion process, assisting with things like:
- inspect/run the original model to figure out how it works
- convert the original model to GGUF format
- inspect/run the converted model
- verify the logits produced by the original model and the converted model
- quantize the model to GGUF format
- run perplexity evaluation to verify that the quantized model is performing as expected
- upload the model to HuggingFace to make it available for others
## Setup

### Create virtual python environment
```console
$ python3.11 -m venv venv
$ source venv/bin/activate
(venv) $ pip install -r requirements.txt
```
## Causal Language Model Conversion
This section describes the steps to convert a causal language model to GGUF and to verify that the conversion was successful.

### Download the original model
First, clone the original model to some local directory:
```console
$ mkdir models && cd models
$ git clone https://huggingface.co/user/model_name
$ cd model_name
$ git lfs install
$ git lfs pull
```
### Set the `MODEL_PATH`
The path to the downloaded model can be provided in two ways:

**Option 1: Environment variable (recommended for iterative development)**
```bash
export MODEL_PATH=~/work/ai/models/some_model
```

**Option 2: Command line argument (for one-off tasks)**
```bash
make causal-convert-model MODEL_PATH=~/work/ai/models/some_model
```
Command line arguments take precedence over environment variables when both are provided.
In cases where the transformers implementation for the model has not been released yet, it is possible to set the environment variable `UNRELEASED_MODEL_NAME`, which will cause the transformers implementation to be loaded explicitly instead of using `AutoModelForCausalLM`:
```bash
export UNRELEASED_MODEL_NAME=SomeNewModel
```
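Roughly speaking, the loading logic can be thought of as the following sketch. This is only an illustration of the idea, not the actual script code; the module/class naming scheme shown here is an assumption:
```python
import importlib
import os

from transformers import AutoModelForCausalLM

model_path = os.environ["MODEL_PATH"]
unreleased = os.environ.get("UNRELEASED_MODEL_NAME")

if unreleased:
    # Assumption: the unreleased model ships its implementation as
    # transformers.models.<name>.modeling_<name>.<Name>ForCausalLM
    module_name = f"transformers.models.{unreleased.lower()}.modeling_{unreleased.lower()}"
    model_class = getattr(importlib.import_module(module_name), f"{unreleased}ForCausalLM")
    model = model_class.from_pretrained(model_path)
else:
    # Released models can rely on the auto class resolving the architecture
    model = AutoModelForCausalLM.from_pretrained(model_path)
```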
### Inspecting the original tensors
```console
# Using environment variable
(venv) $ make causal-inspect-original-model

# Or using command line argument
(venv) $ make causal-inspect-original-model MODEL_PATH=~/work/ai/models/some_model
```
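If you prefer to poke at the tensors directly from a Python shell, something along these lines works as well. This is a minimal sketch assuming the model loads with `AutoModelForCausalLM`; it is not the same output as the make target:
```python
import os

import torch
from transformers import AutoModelForCausalLM

model_path = os.environ["MODEL_PATH"]  # same variable as used by the make targets
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float32)

# Print every parameter name together with its shape and dtype
for name, param in model.named_parameters():
    print(f"{name:60s} {tuple(param.shape)} {param.dtype}")
```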
### Running the original model
This is mainly to verify that the original model works, and to compare its output with the output from the converted model.
```console
# Using environment variable
(venv) $ make causal-run-original-model

# Or using command line argument
(venv) $ make causal-run-original-model MODEL_PATH=~/work/ai/models/some_model
```
This command will save two files to the `data` directory: one is a binary file containing logits which will be used for comparison with the converted model later, and the other is a text file which allows for manual visual inspection.
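Conceptually the logits dump is produced by something like the following. This is only a sketch of the idea; the actual prompt and file names used by the scripts may differ:
```python
import os

import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = os.environ["MODEL_PATH"]
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

prompt = "Hello, world!"  # assumption: the scripts define their own prompt
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Logits for the last token position, stored as raw float32 for later comparison
logits = outputs.logits[0, -1, :].float().numpy().astype(np.float32)
logits.tofile("data/original-logits.bin")       # binary file for programmatic comparison
np.savetxt("data/original-logits.txt", logits)  # text file for manual inspection
```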
### Model conversion
After updates have been made to gguf-py to add support for the new model, the model can be converted to GGUF format using the following command:
```console
# Using environment variable
(venv) $ make causal-convert-model

# Or using command line argument
(venv) $ make causal-convert-model MODEL_PATH=~/work/ai/models/some_model
```
### Inspecting the converted model
The converted model can be inspected using the following command:
```console
(venv) $ make inspect-converted-model
```

### Running the converted model
```console
(venv) $ make run-converted-model
```
### Model logits verification
The following target will run the original model and the converted model and compare the logits:
```console
(venv) $ make causal-verify-logits
```
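The core of such a comparison can be sketched as follows. The file names are placeholders, not the names the scripts actually use:
```python
import numpy as np

# Load the raw float32 logits dumped by the original and the converted model
original = np.fromfile("data/original-logits.bin", dtype=np.float32)
converted = np.fromfile("data/converted-logits.bin", dtype=np.float32)

diff = np.abs(original - converted)
print(f"max abs diff:  {diff.max():.6f}")
print(f"mean abs diff: {diff.mean():.6f}")
# Agreement on the most likely next token is another quick sanity check
print("argmax match:", original.argmax() == converted.argmax())
```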
### Quantizing the model
The causal model can be quantized to GGUF format using the following command:
```console
(venv) $ make causal-quantize-Q8_0
Quantized model saved to: /path/to/quantized/model-Q8_0.gguf
Export the quantized model path to QUANTIZED_MODEL variable in your environment
```
This will show the path to the quantized model in the terminal, which can then be used to set the `QUANTIZED_MODEL` environment variable:
```bash
export QUANTIZED_MODEL=/path/to/quantized/model-Q8_0.gguf
```
Then the quantized model can be run using the following command:
```console
(venv) $ make causal-run-quantized-model
```
## Embedding Language Model Conversion

### Download the original model
```console
$ mkdir models && cd models
$ git clone https://huggingface.co/user/model_name
$ cd model_name
$ git lfs install
$ git lfs pull
```
The path to the embedding model can be provided in two ways:
**Option 1: Environment variable (recommended for iterative development)**
```bash
export EMBEDDING_MODEL_PATH=~/path/to/embedding_model
```

**Option 2: Command line argument (for one-off tasks)**
```bash
make embedding-convert-model EMBEDDING_MODEL_PATH=~/path/to/embedding_model
```
Command line arguments take precedence over environment variables when both are provided.
### Running the original model
This is mainly to verify that the original model works and to compare the output with the output from the converted model.
```console
# Using environment variable
(venv) $ make embedding-run-original-model

# Or using command line argument
(venv) $ make embedding-run-original-model EMBEDDING_MODEL_PATH=~/path/to/embedding_model
```
This command will save two files to the `data` directory: one is a binary file containing logits which will be used for comparison with the converted model, and the other is a text file which allows for manual visual inspection.
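For an embedding model the dumped values are typically the token embeddings from the final hidden layer rather than next-token logits. A minimal sketch, again with a placeholder prompt and placeholder file names:
```python
import os

import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer

model_path = os.environ["EMBEDDING_MODEL_PATH"]
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModel.from_pretrained(model_path)

inputs = tokenizer("Hello, world!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Per-token embeddings from the last hidden layer, flattened to float32
embeddings = outputs.last_hidden_state[0].float().numpy().astype(np.float32)
embeddings.tofile("data/embedding-original.bin")
np.savetxt("data/embedding-original.txt", embeddings)
```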
### Model conversion
After updates have been made to gguf-py to add support for the new model, the model can be converted to GGUF format using the following command:
```console
(venv) $ make embedding-convert-model
```

### Run the converted model
```console
(venv) $ make embedding-run-converted-model
```
### Model logits verification
The following target will run the original model and the converted model (which was done manually in the previous steps) and compare the logits:
```console
(venv) $ make embedding-verify-logits
```
### llama-server verification
To verify that the converted model works with llama-server, the following command can be used:
```console
(venv) $ make embedding-start-embedding-server
```
Then open another terminal and set the `EMBEDDING_MODEL_PATH` environment variable, as it will not be inherited by the new terminal:
```console
(venv) $ make embedding-curl-embedding-endpoint
```
This will call the embedding endpoint and the output will be piped into the same verification script as used by the `embedding-verify-logits` target.
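For reference, the curl target does roughly the equivalent of the following request. This is a sketch only; it assumes llama-server is listening on its default address/port and uses the `/embedding` endpoint, which may differ from what the Makefile actually calls:
```python
import json
import urllib.request

# Assumption: llama-server started by the make target listens on 127.0.0.1:8080
request = urllib.request.Request(
    "http://127.0.0.1:8080/embedding",
    data=json.dumps({"content": "Hello, world!"}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    result = json.loads(response.read())

print(json.dumps(result, indent=2))
```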
The causal model can also be used to produce embeddings and this can be verified using the following commands:
```console
(venv) $ make causal-start-embedding-server
```
Then open another terminal and set the `MODEL_PATH` environment variable, as it will not be inherited by the new terminal:
```console
(venv) $ make casual-curl-embedding-endpoint
```
### Quantizing the model
The embedding model can be quantized to GGUF format using the following command:
```console
(venv) $ make embedding-quantize-Q8_0
Quantized model saved to: /path/to/quantized/model-Q8_0.gguf
Export the quantized model path to QUANTIZED_EMBEDDING_MODEL variable in your environment
```
This will show the path to the quantized model in the terminal, which can then be used to set the `QUANTIZED_EMBEDDING_MODEL` environment variable:
```bash
export QUANTIZED_EMBEDDING_MODEL=/path/to/quantized/model-Q8_0.gguf
```
Then the quantized model can be run using the following command:
```console
(venv) $ make embedding-run-quantized-model
```
## Perplexity Evaluation

### Simple perplexity evaluation
This allows running the perplexity evaluation without having to generate a token/logits file first:
```console
(venv) $ make perplexity-run QUANTIZED_MODEL=~/path/to/quantized/model.gguf
```
This will use the wikitext dataset to run the perplexity evaluation and output the perplexity score to the terminal. This value can then be compared with the perplexity score of the unquantized model.
### Full perplexity evaluation
First use the converted, non-quantized, model to generate the perplexity evaluation dataset using the following command:
```console
$ make perplexity-data-gen CONVERTED_MODEL=~/path/to/converted/model.gguf
```
This will generate a file in the `data` directory, named after the model and with a `.kld` suffix, which contains the tokens and the logits for the wikitext dataset.
After the dataset has been generated, the perplexity evaluation can be run using the quantized model:
```console
$ make perplexity-run-full QUANTIZED_MODEL=~/path/to/quantized/model-Qxx.gguf LOGITS_FILE=data/model.gguf.ppl
```
📝 Note: The `LOGITS_FILE` generated by the previous command can be very large, so make sure you have enough disk space available.
## HuggingFace utilities
The following targets are useful for creating collections and model repositories on Hugging Face in the ggml-org organization. These can be used when preparing a release to script the process for new model releases.

For the following targets a `HF_TOKEN` environment variable is required.

📝 Note: Don't forget to log out from Hugging Face after running these commands, otherwise you might have issues pulling/cloning repositories as the token will still be in use:
```console
$ huggingface-cli logout
$ unset HF_TOKEN
```
### Create a new Hugging Face Model (model repository)
This will create a new model repository on Hugging Face with the specified model name.
```console
(venv) $ make hf-create-model MODEL_NAME='TestModel' NAMESPACE="danbev"
Repository ID: danbev/TestModel-GGUF
Repository created: https://huggingface.co/danbev/TestModel-GGUF
```
Note that we append a `-GGUF` suffix to the model name to ensure a consistent naming convention for GGUF models.
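Under the hood this amounts to a repository creation call against the Hugging Face Hub. A minimal sketch using `huggingface_hub` (the exact arguments used by the target are an assumption):
```python
import os

from huggingface_hub import create_repo

# Assumption: HF_TOKEN is set, and the -GGUF suffix is appended by the caller
repo_id = "danbev/TestModel-GGUF"
url = create_repo(repo_id, token=os.environ["HF_TOKEN"], repo_type="model")
print(f"Repository created: {url}")
```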
### Upload a GGUF model to model repository
The following target uploads a model to an existing Hugging Face model repository.
```console
(venv) $ make hf-upload-gguf-to-model MODEL_PATH=dummy-model1.gguf REPO_ID=danbev/TestModel-GGUF
📤 Uploading dummy-model1.gguf to danbev/TestModel-GGUF/dummy-model1.gguf
✅ Upload successful!
🔗 File available at: https://huggingface.co/danbev/TestModel-GGUF/blob/main/dummy-model1.gguf
```
This command can also be used to update an existing model file in a repository.
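The equivalent direct call with `huggingface_hub` would look roughly like this (the repository and file names are just the ones from the example above):
```python
import os

from huggingface_hub import HfApi

api = HfApi(token=os.environ["HF_TOKEN"])
api.upload_file(
    path_or_fileobj="dummy-model1.gguf",  # local GGUF file to upload
    path_in_repo="dummy-model1.gguf",     # destination file name in the repo
    repo_id="danbev/TestModel-GGUF",
)
```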
### Create a new Collection
```console
(venv) $ make hf-new-collection NAME=TestCollection DESCRIPTION="Collection for testing scripts" NAMESPACE=danbev
🚀 Creating Hugging Face Collection
Title: TestCollection
Description: Collection for testing scripts
Namespace: danbev
Private: False
✅ Authenticated as: danbev
📚 Creating collection: 'TestCollection'...
✅ Collection created successfully!
📋 Collection slug: danbev/testcollection-68930fcf73eb3fc200b9956d
🔗 Collection URL: https://huggingface.co/collections/danbev/testcollection-68930fcf73eb3fc200b9956d
🎉 Collection created successfully!
Use this slug to add models: danbev/testcollection-68930fcf73eb3fc200b9956d
```
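A minimal sketch of the underlying `huggingface_hub` call (the actual script may print more detail, as in the output above):
```python
import os

from huggingface_hub import create_collection

collection = create_collection(
    title="TestCollection",
    namespace="danbev",
    description="Collection for testing scripts",
    private=False,
    token=os.environ["HF_TOKEN"],
)
print(f"Collection slug: {collection.slug}")
```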
### Add model to a Collection
```console
(venv) $ make hf-add-model-to-collection COLLECTION=danbev/testcollection-68930fcf73eb3fc200b9956d MODEL=danbev/TestModel-GGUF
✅ Authenticated as: danbev
🔍 Checking if model exists: danbev/TestModel-GGUF
✅ Model found: danbev/TestModel-GGUF
📚 Adding model to collection...
✅ Model added to collection successfully!
🔗 Collection URL: https://huggingface.co/collections/danbev/testcollection-68930fcf73eb3fc200b9956d
🎉 Model added successfully!
```
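And the corresponding `huggingface_hub` call, sketched with the collection slug and model from the example above:
```python
import os

from huggingface_hub import add_collection_item

add_collection_item(
    collection_slug="danbev/testcollection-68930fcf73eb3fc200b9956d",
    item_id="danbev/TestModel-GGUF",
    item_type="model",
    token=os.environ["HF_TOKEN"],
)
```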