Commit graph

150 commits

Author SHA1 Message Date
Alishahryar1
48b085950a Warn on inherited auth token
2026-04-24 00:42:33 -07:00
Alishahryar1
6f3d762a4f Revert "Add per-model thinking toggles"
This reverts commit 1f12a33dd7.
2026-04-24 00:26:15 -07:00
Alishahryar1
9c28af7cf1 Fix auth token dotenv precedence 2026-04-24 00:25:31 -07:00
Alishahryar1
1f12a33dd7 Add per-model thinking toggles 2026-04-24 00:14:49 -07:00
Ali Khokhar
462a9430bb
Add local live smoke test suite (#148)
## Summary
- add an opt-in local `smoke/` pytest suite for API, auth, providers,
CLI, IDE-shaped requests, messaging, voice, tools, and thinking stream
contracts
- keep smoke tests out of normal CI collection with `testpaths =
["tests"]`
- write sanitized smoke artifacts under `.smoke-results/`

## Verification
- `uv run ruff format`
- `uv run ruff check`
- `uv run ty check`
- `uv run ty check smoke`
- `FCC_LIVE_SMOKE=1 FCC_SMOKE_TARGETS=all FCC_SMOKE_RUN_VOICE=1 uv run
pytest smoke -n 0 -m live -s --tb=short` -> 17 passed, 9 skipped
- `uv run pytest` -> 904 passed

## Notes
- Skipped live checks require local credentials/tools/services, such as
provider models, Telegram/Discord targets, voice backend, or Claude CLI.
- `claude-pick` smoke was intentionally removed.
2026-04-23 19:06:09 -07:00
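A minimal sketch of how the opt-in gating could be wired as a pytest hook; only the `FCC_LIVE_SMOKE` variable and the `live` marker come from the PR description, the hook body is assumed:

```python
# smoke/conftest.py (illustrative sketch, not the PR's actual code)
import os

import pytest


def pytest_collection_modifyitems(config, items):
    # Live smoke tests stay skipped unless the user explicitly opts in.
    if os.environ.get("FCC_LIVE_SMOKE") != "1":
        skip_live = pytest.mark.skip(reason="set FCC_LIVE_SMOKE=1 to run live smoke tests")
        for item in items:
            if "live" in item.keywords:
                item.add_marker(skip_live)
```

With `testpaths = ["tests"]`, a plain `uv run pytest` never collects `smoke/` at all; a gate like the above would only apply to explicit `pytest smoke` invocations.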
Alishahryar1
55131019e1 Sync config defaults and proxy docs
2026-04-22 17:34:00 -07:00
Anuj Nitin Bharambe
4fdf7e8b7e
Fix: Exclude chat_template for Mistral tokenizers in NVIDIA NIM (#130) (#131)
Fixes #130. This PR updates the NVIDIA NIM provider to omit
`chat_template_kwargs` and `chat_template` when using a Mistral
tokenizer model. This resolves the 400 Bad Request error returned by the
API.

Co-authored-by: Alishahryar1 <alishahryar2@gmail.com>
2026-04-22 17:16:45 -07:00
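A sketch of the exclusion this PR describes; the two omitted keys come from the PR text, the function shape and tokenizer check are assumed:

```python
# Illustrative sketch of omitting template params for Mistral tokenizer models.
def nim_request_body(body: dict, is_mistral_tokenizer: bool) -> dict:
    if is_mistral_tokenizer:
        # These keys trigger a 400 Bad Request from the NIM API for
        # Mistral tokenizer models, so they are dropped before sending.
        body.pop("chat_template_kwargs", None)
        body.pop("chat_template", None)
    return body
```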
Wang Ji
4afca05318
bug: nvidia did not support reasoning_budget parameter (#126)

Fixes #127.

---------

Co-authored-by: u011436427 <u011436427@noreply.gitcode.com>
Co-authored-by: Alishahryar1 <alishahryar2@gmail.com>
2026-04-22 17:06:46 -07:00
arssing
2fe15bd2cd
feat: add proxy support for httpx clients (#125)
Add proxy support for providers based on
[doc](https://www.python-httpx.org/advanced/proxies/):

- Add per-provider proxy support (HTTP and SOCKS5) for all 4 providers:
nvidia_nim, open_router, lmstudio, llamacpp
- Each provider gets its own env var (NVIDIA_NIM_PROXY,
OPENROUTER_PROXY, LMSTUDIO_PROXY, LLAMACPP_PROXY) for independent proxy
configuration

---------

Co-authored-by: Alishahryar1 <alishahryar2@gmail.com>
2026-04-22 17:06:16 -07:00
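Per the linked httpx docs, a per-provider client could pick up its proxy roughly like this; the helper is hypothetical, only the env var names come from the PR (recent httpx versions take a `proxy=` argument, and `socks5://` URLs need the `httpx[socks]` extra):

```python
# Illustrative only: one client per provider, each with its own proxy env var.
import os

import httpx


def make_provider_client(proxy_env_var: str) -> httpx.AsyncClient:
    # e.g. proxy_env_var = "NVIDIA_NIM_PROXY"; unset means a direct connection.
    proxy = os.environ.get(proxy_env_var)  # "http://host:port" or "socks5://host:port"
    return httpx.AsyncClient(proxy=proxy)
```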
Pavel Yurchenko
e719e4aed2
feat: deepseek api support (#118)
## Summary

* add native DeepSeek provider support via the shared OpenAI-compatible
provider base
* allow `deepseek/...` model prefixes in config validation
* add `DEEPSEEK_API_KEY` and `DEEPSEEK_BASE_URL` settings
* add DeepSeek entries to `.env.example` and `config/env.example`
* implement `DeepSeekProvider` and register it in provider dependencies
* add a DeepSeek request builder with DeepSeek-specific thinking payload
handling
* preserve Anthropic thinking blocks as `reasoning_content` for
DeepSeek-compatible continuation flows
* update `claude-pick` to discover DeepSeek models from the DeepSeek API
* document DeepSeek usage in `README.md`
* add tests for config validation, provider dependency wiring, request
building, and streaming behavior

## Motivation

DeepSeek exposes an OpenAI-compatible API and can be used directly
without routing through OpenRouter. This lets users spend their existing
DeepSeek balance through the proxy while keeping the same Claude Code
workflow and per-model provider mapping.

## Example

```dotenv
DEEPSEEK_API_KEY="sk-..."
DEEPSEEK_BASE_URL="https://api.deepseek.com"

MODEL_OPUS="deepseek/deepseek-reasoner"
MODEL_SONNET="deepseek/deepseek-chat"
MODEL_HAIKU="deepseek/deepseek-chat"
MODEL="deepseek/deepseek-chat"

---------

Co-authored-by: Alishahryar1 <alishahryar2@gmail.com>
2026-04-22 17:06:01 -07:00
Alishahryar1
835d0454e8 Fixes for issues 113 and 116 2026-04-18 16:32:31 -07:00
Alishahryar1
ec904c6e0c lint
2026-03-27 21:49:04 -07:00
Alishahryar1
6dd07d9b6b fix: update test_build_request_body to use enable_thinking=True 2026-03-27 21:48:21 -07:00
Alishahryar1
b75f47b62d Gate NIM thinking params behind NIM_ENABLE_THINKING env var
Mistral models reject chat_template_kwargs, causing 400 errors. Make
thinking params (chat_template_kwargs, reasoning_budget) opt-in via
NIM_ENABLE_THINKING env var (default false) so only models that need it
(kimi, nemotron) receive them.
2026-03-27 21:44:36 -07:00
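A condensed sketch of the gating described above; the env var and parameter names come from this and the NIM commits below, the function shape is assumed:

```python
# Illustrative sketch: thinking params are opt-in so Mistral models never see them.
import os


def nim_thinking_params(max_tokens: int) -> dict:
    # Default off: Mistral models reject chat_template_kwargs with a 400.
    if os.environ.get("NIM_ENABLE_THINKING", "").lower() not in {"1", "true"}:
        return {}
    # kimi reads "thinking", nemotron reads "enable_thinking" (see the
    # chat_template_kwargs commit further down).
    return {
        "chat_template_kwargs": {"enable_thinking": True, "thinking": True},
        "reasoning_budget": max_tokens,
    }
```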
th-ch
f703a0e403
Implement optional authentication (Anthropic style) (#80)
2026-03-27 11:11:47 -07:00
Alishahryar1
2fad4dd4c9 Support both kimi (thinking) and nemotron (enable_thinking) in chat_template_kwargs
2026-03-26 12:34:12 -07:00
Alishahryar1
f9e7f65f4c Fix NVIDIA NIM reasoning params for updated API
Replace dropped params (thinking, reasoning_split, include_reasoning,
return_tokens_as_token_ids, reasoning_effort) with the new API format:
chat_template_kwargs.enable_thinking=True and reasoning_budget=max_tokens.
2026-03-26 12:25:04 -07:00
Yuval Dinodia
00038209b2
fix: remove unsupported include_stop_str_in_output NIM param (#95)
2026-03-23 11:38:13 -07:00
Alishahryar1
55945df1d2 removed logging utils 2026-03-11 07:24:50 -07:00
Alishahryar1
5a36a32836 feat: add llama.cpp provider for local anthropic messages API 2026-03-08 10:38:25 -07:00
Alishahryar1
1aedf4763c fix(providers): map httpx exceptions natively and remove type ignores 2026-03-08 08:33:34 -07:00
Alishahryar1
87d8ce1196 feat(lmstudio): route natively to Anthropic /v1/messages endpoint
- Rewrites LMStudioProvider to inherit from BaseProvider
- Passes requests natively to /v1/messages using httpx instead of AsyncOpenAI
- Auto-translates internal ThinkingConfig to Anthropic schema
- Updates .env.example with model routing instructions
- Adjusts test suite for new native integration
2026-03-08 08:17:05 -07:00
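The native routing this commit describes amounts to forwarding the Anthropic Messages payload over httpx instead of translating it through AsyncOpenAI; a minimal non-streaming sketch, with the provider's actual translation and streaming logic omitted:

```python
# Illustrative sketch only; the real provider streams and maps ThinkingConfig.
import httpx


async def post_messages(base_url: str, payload: dict) -> httpx.Response:
    async with httpx.AsyncClient(base_url=base_url) as client:
        # Pass the Anthropic-schema body straight to the local server.
        return await client.post("/v1/messages", json=payload)
```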
Ali Khokhar
884ddd77af
Add tests for fcc-init entrypoint (cli/entrypoints.py) (#77)
2026-03-07 08:27:11 -08:00
Alishahryar1
2e8b22fa9d Removed root insert hack from conftest 2026-03-01 21:57:25 -08:00
Alishahryar1
a7d88d5cbd Updated README with per-model mapping, fixed test .env isolation 2026-03-01 21:52:35 -08:00
Ali Khokhar
0b324e0421
Per claude model mapping (#66) 2026-03-01 21:32:23 -08:00
Ali Khokhar
fae8a2a044
Remove over-engineering: drop tree_queue setter, _set_connected(), fix cancel_all() TOCTOU (#63)

- Remove tree_queue property setter (backward-compat hack; all callers
already migrated to replace_tree_queue()); keep property getter only
- Update 2 remaining tests that still used direct assignment to use
replace_tree_queue()
- Remove _set_connected() 1-line wrapper on DiscordPlatform; assign
_connected directly
- Fix cancel_all() TOCTOU: hold self._lock for the full loop so newly
created trees cannot slip through between the snapshot and cancellation

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-03-01 12:34:00 -08:00
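The TOCTOU fix in the last bullet comes down to holding the lock across both the snapshot and the cancellation; a minimal sketch, with the class shape assumed:

```python
# Illustrative sketch of the cancel_all() fix; only the lock-scope idea is from the PR.
import asyncio


class TreeRegistry:
    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._trees: dict[str, asyncio.Task] = {}

    async def cancel_all(self) -> None:
        # Hold the lock for the full loop so a tree created concurrently
        # cannot slip in between the snapshot and the cancellation.
        async with self._lock:
            for task in self._trees.values():
                task.cancel()
            self._trees.clear()
```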
Alishahryar1
35a2760f6e Fixed encapsulation violations 2026-03-01 04:28:22 -08:00
Alishahryar1
302ee28585 Removed dead code 2026-03-01 04:21:06 -08:00
Alishahryar1
34757511a0 Improve deterministic error surfacing across stream and API 2026-03-01 01:32:52 -08:00
Alishahryar1
7f2612d2df Added optimization logging 2026-03-01 01:02:59 -08:00
Ali Khokhar
aee9f0ad93
Add code review fix plan covering 11 issues across modularity, encapsulation, performance, and dead code (#62) 2026-03-01 00:45:33 -08:00
Alishahryar1
744eec2772 Major cleanup with GLM-5 2026-02-28 09:10:21 -08:00
Mauro Druwel
de70700dde
feat: Use NVIDIA NIM ASR for audio transcription (#53)
## Summary
Added NVIDIA NIM as a second transcription option (alongside local
Whisper). This lets you transcribe voice notes using NVIDIA's cloud API
instead of running Whisper locally.

## What changed

- **Transcription**: now supports two backends:

  - Local Whisper: Free, runs on your GPU/CPU (existing)
  - NVIDIA NIM: Cloud API via Riva gRPC (new)

- **Supported models**: 8 NVIDIA NIM models added (Parakeet variants for
different languages, Whisper Large V3)

---------

Co-authored-by: Alishahryar1 <alishahryar2@gmail.com>
2026-02-28 08:48:59 -08:00
Alishahryar1
a74ec74271 Major refactor done with minimax m2.5 2026-02-28 04:36:29 -08:00
Alishahryar1
79a1ae0c54 minor refactor using minimax m2.5 2026-02-27 20:44:39 -08:00
Ali Khokhar
c4d8681000
Backup/before cleanup 20260222 230402 (#58) 2026-02-27 19:50:21 -08:00
Alishahryar1
d6a0e1a401 Provider inferred from model name using prefix 2026-02-19 20:53:02 -08:00
Alishahryar1
21959b6189 lint 2026-02-19 20:40:05 -08:00
Alishahryar1
0c8d59e33e Removed deprecated modules and updated imports 2026-02-19 20:38:11 -08:00
Alishahryar1
2b0495dd08 moved text.py to common utils for providers 2026-02-19 20:32:45 -08:00
Alishahryar1
2c1158f62f removed a test 2026-02-19 20:06:15 -08:00
Claude
45b7e4cafd
Make PROVIDER_MAX_CONCURRENCY required with default of 5
- `max_concurrency` is now always an `int` (default 5) — `None`/unlimited
  is no longer a valid state; omitting the env var uses the default
- `GlobalRateLimiter`: semaphore is always created; `concurrency_slot()`
  no longer has None guards; log message always includes concurrency
- `ProviderConfig.max_concurrency`: `int = 5` (was `int | None = None`)
- `Settings.provider_max_concurrency`: `int = Field(default=5, ...)` —
  setting env var to an invalid value (e.g. empty string) raises
- `.env.example`: uncommented `PROVIDER_MAX_CONCURRENCY=5`
- README: updated config table default from `—` to `5`
- Tests: removed `test_concurrency_slot_noop_when_not_configured`;
  updated mock settings to use `5` instead of `None`

https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
2026-02-19 14:39:42 +00:00
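A sketch of the settings side of this change, in pydantic-settings style; the field name and default come from the commit, the class wiring is assumed:

```python
# Illustrative sketch: PROVIDER_MAX_CONCURRENCY is always an int with default 5.
from pydantic import Field
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Omitting the env var yields 5; an invalid value such as an empty
    # string fails validation at startup instead of meaning "unlimited".
    provider_max_concurrency: int = Field(default=5)
```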
Claude
99f99fce90
Remove max_cli_sessions — CLI session pool is now unbounded
The max_sessions cap in CLISessionManager was the only thing enforcing
a limit on concurrent CLI processes. Now that provider concurrency is
controlled at the streaming layer (PROVIDER_MAX_CONCURRENCY semaphore),
the CLI session pool cap is redundant and removed entirely.

Changes:
- cli/manager.py: remove max_sessions param, cap check, _cleanup_idle_sessions_unlocked, max_sessions from get_stats()
- config/settings.py: remove max_cli_sessions field
- api/app.py: remove max_sessions=settings.max_cli_sessions from CLISessionManager constructor
- messaging/handler.py: remove "Waiting for slot" status check; stats display no longer shows Max CLI
- .env.example: remove MAX_CLI_SESSIONS line
- tests/cli/test_cli.py: remove max_sessions args and assertion from manager tests
- tests/cli/test_cli_manager_edge_cases.py: remove two tests for cap/cleanup behavior
- tests/api/test_app_lifespan_and_errors.py: remove max_cli_sessions from all SimpleNamespace settings
- tests/config/test_config.py: remove max_cli_sessions isinstance assertion
- tests/conftest.py: remove max_sessions from mock stats
- tests/messaging/test_handler.py: merge slot/capacity tests into single new-conversation test; remove Max CLI assertion from stats test
- tests/messaging/test_handler_markdown_and_status_edges.py: remove "Waiting for slot" assertion; drop max_sessions from all stats mocks

https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
2026-02-19 14:31:47 +00:00
Claude
afaf50a972
Add queue-level concurrency limit to provider streaming
Adds max_concurrency cap to GlobalRateLimiter using asyncio.Semaphore.
A request now waits for a concurrency slot before the sliding window rate
limit check, so at most N streams are open to the provider simultaneously,
even when the rate window would allow more.

Changes:
- providers/rate_limit.py: max_concurrency param, _concurrency_sem, concurrency_slot() asynccontextmanager
- providers/openai_compat.py: pass max_concurrency to limiter; wrap execute_with_retry + stream iteration in concurrency_slot()
- providers/base.py: max_concurrency field on ProviderConfig
- config/settings.py: provider_max_concurrency setting (PROVIDER_MAX_CONCURRENCY env var, default None = unlimited)
- api/dependencies.py: pass provider_max_concurrency into all three provider ProviderConfig instantiations
- .env.example: document PROVIDER_MAX_CONCURRENCY (commented out)
- tests/providers/test_provider_rate_limit.py: 5 new tests covering concurrency limit enforcement, slot release on exception, noop when unconfigured
- tests/api/test_dependencies.py: add provider_max_concurrency=None to mock settings helper

https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
2026-02-19 14:23:21 +00:00
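A condensed sketch of the mechanism this commit adds; the names follow the commit message, the sliding-window logic around it is omitted (the default shown reflects the later commit above that made the cap a required int of 5):

```python
# Illustrative sketch of the concurrency cap in GlobalRateLimiter.
import asyncio
from contextlib import asynccontextmanager


class GlobalRateLimiter:
    def __init__(self, max_concurrency: int = 5) -> None:
        self._concurrency_sem = asyncio.Semaphore(max_concurrency)

    @asynccontextmanager
    async def concurrency_slot(self):
        # Acquired before the sliding-window check, so at most N provider
        # streams are open at once; the slot is released even on exception.
        async with self._concurrency_sem:
            yield
```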
Shantanu Suryawanshi
24a5e4d968 Fixing SSE stream 2026-02-18 21:31:28 -05:00
Alishahryar1
e7ac85264f Improved optimizations to decrease llm calls further and increase throughput 2026-02-18 17:54:41 -08:00
Alishahryar1
593fb55954 Added fix for large replies being truncated entirely leaving no response text 2026-02-18 17:39:38 -08:00
Alishahryar1
16fa9d90cd Add message_thread_id support across messaging components
- Introduced message_thread_id to the IncomingMessage model for handling forum topic IDs in Telegram.
- Updated messaging platforms (Discord and Telegram) to accept and process message_thread_id in send_message methods.
- Modified message handlers to utilize message_thread_id when sending messages.
- Enhanced test cases to validate the integration of message_thread_id in message handling.

This change improves support for forum supergroups in Telegram and enhances message management across platforms.
2026-02-18 16:10:57 -08:00
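A sketch of the shape this change implies; the field and method names come from the commit, the models are otherwise assumed:

```python
# Illustrative sketch: threading the Telegram forum topic ID through messaging.
from dataclasses import dataclass


@dataclass
class IncomingMessage:
    text: str
    message_thread_id: int | None = None  # Telegram forum topic ID, if any


async def send_message(chat_id: int, text: str,
                       message_thread_id: int | None = None) -> None:
    # Both platforms accept message_thread_id; replies sent with it land
    # in the same forum topic the incoming message came from.
    ...
```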
Alishahryar1
2220880671 Add voice note cancellation feature during transcription
- Implemented functionality to cancel pending voice transcriptions when a user replies with the /clear command.
- Updated the Telegram and Discord platform classes to manage pending voice messages, including registration and cancellation logic.
- Enhanced the message handler to delete associated messages and notify users when a voice note is cancelled.
- Added tests to ensure the cancellation feature works as expected during transcription.
2026-02-18 06:36:42 -08:00
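A sketch of the registration/cancellation bookkeeping this commit describes; the class and method names are hypothetical, only the /clear trigger and the pending-registry idea come from the commit:

```python
# Illustrative sketch of cancelling an in-flight transcription on /clear.
import asyncio


class PendingVoiceRegistry:
    def __init__(self) -> None:
        self._pending: dict[int, asyncio.Task] = {}  # message ID -> task

    def register(self, message_id: int, task: asyncio.Task) -> None:
        self._pending[message_id] = task

    def cancel(self, message_id: int) -> bool:
        # Invoked when the user replies /clear while transcription is pending.
        task = self._pending.pop(message_id, None)
        if task is None:
            return False
        task.cancel()
        return True
```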