## Summary
* add native DeepSeek provider support via the shared OpenAI-compatible
provider base
* allow `deepseek/...` model prefixes in config validation
* add `DEEPSEEK_API_KEY` and `DEEPSEEK_BASE_URL` settings
* add DeepSeek entries to `.env.example` and `config/env.example`
* implement `DeepSeekProvider` and register it in provider dependencies
* add a DeepSeek request builder with DeepSeek-specific thinking payload
handling
* preserve Anthropic thinking blocks as `reasoning_content` for
DeepSeek-compatible continuation flows
* update `claude-pick` to discover DeepSeek models from the DeepSeek API
* document DeepSeek usage in `README.md`
* add tests for config validation, provider dependency wiring, request
building, and streaming behavior
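A rough sketch of what the provider and the `reasoning_content` passthrough could look like on top of a shared OpenAI-compatible base; apart from `DeepSeekProvider`, the `deepseek/` prefix, and the two env vars, every name below is illustrative rather than taken from the actual code.
```python
# Illustrative sketch only: `OpenAICompatibleProvider` and the helper names are
# assumptions; DeepSeekProvider, DEEPSEEK_API_KEY, DEEPSEEK_BASE_URL, and the
# "deepseek/" prefix are the pieces introduced by this change.
import os
from typing import Any


class OpenAICompatibleProvider:  # stand-in for the shared base class
    def __init__(self, api_key: str, base_url: str) -> None:
        self.api_key = api_key
        self.base_url = base_url


class DeepSeekProvider(OpenAICompatibleProvider):
    """DeepSeek exposes an OpenAI-compatible /chat/completions endpoint."""

    def __init__(self) -> None:
        super().__init__(
            api_key=os.environ["DEEPSEEK_API_KEY"],
            base_url=os.environ.get("DEEPSEEK_BASE_URL", "https://api.deepseek.com"),
        )

    @staticmethod
    def strip_prefix(model: str) -> str:
        # "deepseek/deepseek-reasoner" -> "deepseek-reasoner"
        return model.removeprefix("deepseek/")


def anthropic_thinking_to_message(thinking_text: str, answer_text: str) -> dict[str, Any]:
    # Preserve an Anthropic thinking block as `reasoning_content` so a
    # DeepSeek-style continuation request can replay it alongside the answer.
    return {
        "role": "assistant",
        "content": answer_text,
        "reasoning_content": thinking_text,
    }
```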
## Motivation
DeepSeek exposes an OpenAI-compatible API and can be used directly
without routing through OpenRouter. This lets users spend their existing
DeepSeek balance through the proxy while keeping the same Claude Code
workflow and per-model provider mapping.
## Example
```dotenv
DEEPSEEK_API_KEY="sk-..."
DEEPSEEK_BASE_URL="https://api.deepseek.com"
MODEL_OPUS="deepseek/deepseek-reasoner"
MODEL_SONNET="deepseek/deepseek-chat"
MODEL_HAIKU="deepseek/deepseek-chat"
MODEL="deepseek/deepseek-chat"
```
---------
Co-authored-by: Alishahryar1 <alishahryar2@gmail.com>
Mistral models reject chat_template_kwargs, causing 400 errors. Make
thinking params (chat_template_kwargs, reasoning_budget) opt-in via
NIM_ENABLE_THINKING env var (default false) so only models that need it
(kimi, nemotron) receive them.
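As a rough illustration of the opt-in (the helper name, call site, and placeholder budget value are assumptions; only `NIM_ENABLE_THINKING`, `chat_template_kwargs`, and `reasoning_budget` come from the change itself):
```python
# Sketch of the opt-in behaviour; payload keys come from the description above,
# everything else is illustrative.
import os
from typing import Any


def apply_nim_thinking_params(payload: dict[str, Any]) -> dict[str, Any]:
    """Attach thinking params only when NIM_ENABLE_THINKING is truthy.

    Mistral models return HTTP 400 on chat_template_kwargs, so the default
    is to send nothing; kimi/nemotron deployments opt in via the env var.
    """
    if os.environ.get("NIM_ENABLE_THINKING", "false").lower() != "true":
        return payload
    payload["chat_template_kwargs"] = {"thinking": True}
    payload["reasoning_budget"] = 1024  # placeholder value, not the real default
    return payload
```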
## Summary
Added NVIDIA NIM as a second transcription option (alongside local
Whisper). This lets you transcribe voice notes using NVIDIA's cloud API
instead of running Whisper locally.
## What changed
- **Transcription**: Now supports two backends:
- Local Whisper: Free, runs on your GPU/CPU (existing)
- NVIDIA NIM: Cloud API via Riva gRPC (new)
- **Supported models**: 8 NVIDIA NIM models added (Parakeet variants for
different languages, Whisper Large V3)
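A minimal sketch of how the backend switch could be wired; the `TRANSCRIPTION_BACKEND` setting name and the function names are purely illustrative assumptions:
```python
# Sketch of selecting between the two transcription backends.
import os


def transcribe(audio_path: str) -> str:
    backend = os.environ.get("TRANSCRIPTION_BACKEND", "whisper")
    if backend == "nim":
        return transcribe_with_nim(audio_path)  # NVIDIA NIM via Riva gRPC
    return transcribe_with_local_whisper(audio_path)  # free, local GPU/CPU


def transcribe_with_nim(audio_path: str) -> str:
    raise NotImplementedError("Riva gRPC call elided in this sketch")


def transcribe_with_local_whisper(audio_path: str) -> str:
    raise NotImplementedError("local Whisper call elided in this sketch")
```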
---------
Co-authored-by: Alishahryar1 <alishahryar2@gmail.com>
- `max_concurrency` is now always an `int` (default 5) — `None`/unlimited
is no longer a valid state; omitting the env var uses the default
- `GlobalRateLimiter`: semaphore is always created; `concurrency_slot()`
no longer has None guards; log message always includes concurrency
- `ProviderConfig.max_concurrency`: `int = 5` (was `int | None = None`)
- `Settings.provider_max_concurrency`: `int = Field(default=5, ...)` —
setting env var to an invalid value (e.g. empty string) raises
- `.env.example`: uncommented `PROVIDER_MAX_CONCURRENCY=5`
- README: updated config table default from `—` to `5`
- Tests: removed `test_concurrency_slot_noop_when_not_configured`;
updated mock settings to use `5` instead of `None`
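A condensed sketch of the always-on semaphore described above; `max_concurrency` and `concurrency_slot()` come from the change, while the class layout and logging are assumptions:
```python
# Sketch only: the semaphore is always created because max_concurrency is
# always an int now, so concurrency_slot() needs no None guard.
import asyncio
import contextlib
import logging

logger = logging.getLogger(__name__)


class GlobalRateLimiter:
    def __init__(self, max_concurrency: int = 5) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrency)
        # Log message always includes the concurrency value.
        logger.info("provider concurrency limited to %d slots", max_concurrency)

    @contextlib.asynccontextmanager
    async def concurrency_slot(self):
        async with self._semaphore:
            yield
```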
https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
The flag was unnecessary: running claude-pick implies wanting the picker.
Remove MODEL_PICKER from claude-pick and README, restore .env.example
to upstream.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `claude-pick` bash script: reads PROVIDER_TYPE from .env, fetches
available models (NVIDIA NIM, OpenRouter, LM Studio), and launches Claude
with the selected model via fzf. Falls back to direct launch when
MODEL_PICKER=false.
- Add MODEL_PICKER=false flag to .env.example.
- Document setup in README (fzf install, alias, fixed-model alias pattern).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Updated transcription logic to use Hugging Face's Whisper models instead of faster-whisper.
- Introduced new model mapping and pipeline loading functions.
- Adjusted tests to reflect changes in the transcription process.
- Updated documentation in README, .env.example, and settings to align with the new implementation.
- Ensured compatibility with CUDA 13 and removed unnecessary dependencies.
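A rough sketch of the new pipeline loading, assuming the `transformers` ASR pipeline; the mapping contents and function name are illustrative, not the actual implementation:
```python
# Sketch of loading a Hugging Face Whisper pipeline in place of faster-whisper.
from transformers import pipeline

# Map short model names used in settings to Hugging Face model ids
# (mapping contents are an assumption).
WHISPER_MODEL_MAP = {
    "large-v3": "openai/whisper-large-v3",
    "small": "openai/whisper-small",
}


def load_whisper_pipeline(model_name: str, device: str):
    """Return an automatic-speech-recognition pipeline on cpu or cuda."""
    return pipeline(
        task="automatic-speech-recognition",
        model=WHISPER_MODEL_MAP[model_name],
        device=device,  # "cpu" or "cuda"
    )
```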
- Validate whisper_device in Settings and _get_local_model
- Reject 'auto' with clear ValueError/ValidationError
- Update docs in config, .env.example, README
- Add tests for invalid device and valid cpu/cuda
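Roughly, the validation could look like this pydantic-style sketch (assuming pydantic v2 with pydantic-settings; the validator name and default are illustrative):
```python
# Sketch of rejecting 'auto' and anything other than cpu/cuda in Settings.
from pydantic import field_validator
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    whisper_device: str = "cpu"  # default is an assumption

    @field_validator("whisper_device")
    @classmethod
    def _check_whisper_device(cls, value: str) -> str:
        if value not in {"cpu", "cuda"}:
            # 'auto' is rejected explicitly with a clear error.
            raise ValueError("whisper_device must be 'cpu' or 'cuda'")
        return value
```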
Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
- Remove _cuda_failed_models and inference-time CPU fallback
- auto: try CUDA only, fail fast on RuntimeError (no CPU fallback)
- cpu/cuda: use device directly, fail fast on errors
- Update docs in config, .env.example, README
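A tiny sketch of the fail-fast device resolution, with illustrative names:
```python
# Sketch: no _cuda_failed_models bookkeeping and no inference-time CPU fallback.
def resolve_device(configured: str) -> str:
    """Map the configured whisper device to the device actually used."""
    if configured == "auto":
        # 'auto' tries CUDA only; a RuntimeError during loading propagates
        # to the caller instead of silently retrying on CPU.
        return "cuda"
    # 'cpu' and 'cuda' are used directly, also failing fast on errors.
    return configured
```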
Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>