mirror of
https://github.com/Alishahryar1/free-claude-code.git
synced 2026-04-28 11:30:03 +00:00
Gate NIM thinking params behind NIM_ENABLE_THINKING env var
Mistral models reject chat_template_kwargs, causing 400 errors. Make thinking params (chat_template_kwargs, reasoning_budget) opt-in via the NIM_ENABLE_THINKING env var (default false) so only models that need them (kimi, nemotron) receive them.
parent: ab0d6aca14
commit: b75f47b62d
6 changed files with 49 additions and 8 deletions
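The opt-in gate described in the commit message might look roughly like this. This is a minimal sketch, not the repo's actual code: the function names, the shape of `chat_template_kwargs`, and the `reasoning_budget` value are all illustrative assumptions; only the env var name and the default-off behavior come from the commit.

```python
import os

def nim_thinking_enabled() -> bool:
    """Opt-in gate: thinking params are sent only when NIM_ENABLE_THINKING
    is explicitly truthy. Default is off, so models such as Mistral that
    reject chat_template_kwargs with a 400 never receive them."""
    return os.getenv("NIM_ENABLE_THINKING", "false").strip().lower() in ("1", "true", "yes")

def build_nim_request_params(base_params: dict) -> dict:
    """Attach thinking-related params only when the gate is enabled.
    The kwargs shape and budget value below are assumed for illustration."""
    params = dict(base_params)
    if nim_thinking_enabled():
        params["chat_template_kwargs"] = {"thinking": True}  # assumed shape
        params["reasoning_budget"] = 8192  # illustrative value
    return params
```

With the variable unset or `false`, requests go out with only the base params, which is why Mistral stops seeing the rejected fields.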
````
@@ -73,6 +73,9 @@ MODEL_OPUS="nvidia_nim/z-ai/glm4.7"
MODEL_SONNET="nvidia_nim/moonshotai/kimi-k2-thinking"
MODEL_HAIKU="nvidia_nim/stepfun-ai/step-3.5-flash"
MODEL="nvidia_nim/z-ai/glm4.7" # fallback

# Enable for thinking models (kimi, nemotron). Leave false for others (e.g. Mistral).
NIM_ENABLE_THINKING=true
```
</details>
````
```
@@ -437,7 +440,8 @@ Configure via `WHISPER_DEVICE` (`cpu` | `cuda` | `nvidia_nim`) and `WHISPER_MODE
| `MODEL_OPUS` | Model for Claude Opus requests (falls back to `MODEL`) | `nvidia_nim/z-ai/glm4.7` |
| `MODEL_SONNET` | Model for Claude Sonnet requests (falls back to `MODEL`) | `open_router/arcee-ai/trinity-large-preview:free` |
| `MODEL_HAIKU` | Model for Claude Haiku requests (falls back to `MODEL`) | `open_router/stepfun/step-3.5-flash:free` |
| `NVIDIA_NIM_API_KEY` | NVIDIA API key | required for NIM |
| `NIM_ENABLE_THINKING` | Send `chat_template_kwargs` + `reasoning_budget` on NIM requests. Enable for thinking models (kimi, nemotron); leave `false` for others (e.g. Mistral) | `false` |
| `OPENROUTER_API_KEY` | OpenRouter API key | required for OpenRouter |
| `LM_STUDIO_BASE_URL` | LM Studio server URL | `http://localhost:1234/v1` |
| `LLAMACPP_BASE_URL` | llama.cpp server URL | `http://localhost:8080/v1` |
```