mirror of https://github.com/kvcache-ai/ktransformers.git, synced 2025-09-06 20:49:55 +00:00
# API

- [OpenAI ChatCompletion](#openai-chatcompletion)
- [Ollama ChatCompletion](#ollama-chatcompletion)
- [OpenAI Assistant](#openai-assistant)

## OpenAI ChatCompletion

```bash
POST /v1/chat/completions
```

Generate responses based on the selected model.

### Parameters

- `messages`: An array of `message` objects representing the conversation history. A `message` can come from the user or the model (assistant) and includes:
  - `role`: Either `user` or `assistant`, indicating the author of the message.
  - `content`: The text of the message.
- `model`: The name of the selected model.
- `stream`: Either `true` or `false`. Indicates whether to use a streaming response. If `true`, model inference results are returned as an HTTP event stream.

### Response

- Streaming response: An event stream in which each event contains a `chat.completion.chunk`. `chunk.choices[0].delta.content` holds the incremental output returned by the model on each event.
- Non-streaming response: Not supported yet.

### Example

```bash
curl -X 'POST' \
  'http://localhost:9112/v1/chat/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "messages": [
    {
      "content": "tell a joke",
      "role": "user"
    }
  ],
  "model": "Meta-Llama-3-8B-Instruct",
  "stream": true
}'
```

```bash
data:{"id":"c30445e8-1061-4149-a101-39b8222e79e1","object":"chat.completion.chunk","created":1720511671,"model":"not implmented","system_fingerprint":"not implmented","usage":null,"choices":[{"index":0,"delta":{"content":"Why ","role":"assistant","name":null},"logprobs":null,"finish_reason":null}]}

data:{"id":"c30445e8-1061-4149-a101-39b8222e79e1","object":"chat.completion.chunk","created":1720511671,"model":"not implmented","system_fingerprint":"not implmented","usage":null,"choices":[{"index":0,"delta":{"content":"","role":"assistant","name":null},"logprobs":null,"finish_reason":null}]}

data:{"id":"c30445e8-1061-4149-a101-39b8222e79e1","object":"chat.completion.chunk","created":1720511671,"model":"not implmented","system_fingerprint":"not implmented","usage":null,"choices":[{"index":0,"delta":{"content":"couldn't ","role":"assistant","name":null},"logprobs":null,"finish_reason":null}]}

...

data:{"id":"c30445e8-1061-4149-a101-39b8222e79e1","object":"chat.completion.chunk","created":1720511671,"model":"not implmented","system_fingerprint":"not implmented","usage":null,"choices":[{"index":0,"delta":{"content":"two-tired!","role":"assistant","name":null},"logprobs":null,"finish_reason":null}]}

event: done
data: [DONE]
```
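
Each `data:` line of the event stream carries one `chat.completion.chunk` as JSON, so a client can reassemble the full reply by concatenating the `delta.content` fields until `data: [DONE]`. A minimal sketch in Python (pure parsing, no HTTP; `collect_stream` is a hypothetical helper name, not part of the API):

```python
import json

def collect_stream(lines):
    """Reassemble the model's reply from raw SSE lines like the ones above.

    `lines` is an iterable of strings ("data:{...}", "event: ...", or blank);
    returns the concatenated incremental content.
    """
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank lines and "event:" lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)
```

Feeding it the lines of the example stream above would yield the complete joke text.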

## Ollama ChatCompletion

```bash
POST /api/generate
```

Generate responses using the selected model.

### Parameters

- `prompt`: A string containing the input prompt.
- `model`: The name of the selected model.
- `stream`: Either `true` or `false`. Indicates whether to use a streaming response. If `true`, model inference results are returned as an HTTP event stream.

### Response

- Streaming response: A stream of JSON objects, one per line.
  - `response`: The incremental output of the model completion.
  - `done`: Whether inference has finished.
- Non-streaming response: Not supported yet.

### Example

```bash
curl -X 'POST' \
  'http://localhost:9112/api/generate' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "Meta-Llama-3-8B-Instruct",
  "prompt": "tell me a joke",
  "stream": true
}'
```

```bash
{"model":"Meta-Llama-3-8B-Instruct","created_at":"2024-07-09 08:13:11.686513","response":"I'll ","done":false}
{"model":"Meta-Llama-3-8B-Instruct","created_at":"2024-07-09 08:13:11.729214","response":"give ","done":false}

...

{"model":"Meta-Llama-3-8B-Instruct","created_at":"2024-07-09 08:13:33.955475","response":"for","done":false}
{"model":"Meta-Llama-3-8B-Instruct","created_at":"2024-07-09 08:13:33.956795","response":"","done":true}
```
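
Since each line of this stream is a standalone JSON object, the full completion can be rebuilt by concatenating the `response` fields until `done` is `true`. A minimal sketch (pure parsing, no HTTP; `collect_ndjson` is a hypothetical helper name, not part of the API):

```python
import json

def collect_ndjson(lines):
    """Concatenate the incremental `response` fields until `done` is true."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between JSON objects
        obj = json.loads(line)
        parts.append(obj.get("response", ""))
        if obj.get("done"):
            break  # the final object carries done=true and an empty response
    return "".join(parts)
```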