Commit graph

57 commits

Author SHA1 Message Date
Alishahryar1
5a36a32836 feat: add llama.cpp provider for local anthropic messages API 2026-03-08 10:38:25 -07:00
Alishahryar1
87d8ce1196 feat(lmstudio): route natively to Anthropic /v1/messages endpoint
- Rewrites LMStudioProvider to inherit from BaseProvider
- Passes requests natively to /v1/messages using httpx instead of AsyncOpenAI
- Auto-translates internal ThinkingConfig to Anthropic schema
- Updates .env.example with model routing instructions
- Adjusts test suite for new native integration
2026-03-08 08:17:05 -07:00
Alishahryar1
49075b7fa5 Fixed default models 2026-03-01 21:34:01 -08:00
Alishahryar1
ac499cf585 Increased read timeout 2026-03-01 21:33:32 -08:00
Alishahryar1
feba0d456a Updated .env.example 2026-03-01 21:33:17 -08:00
Ali Khokhar
0b324e0421
Per claude model mapping (#66) 2026-03-01 21:32:23 -08:00
Mauro Druwel
de70700dde
feat: Use NVIDIA NIM ASR for audio transcription (#53)
## Summary
Added NVIDIA NIM as a second transcription option ( alongside local
Whisper). This lets you transcribe voice notes using NVIDIA's cloud API
instead of running Whisper locally.

## What changed

- **Transcription**: Now supports the two backends

  - Local Whisper: Free, runs on your GPU/CPU (existing)
  - NVIDIA NIM: Cloud API via Riva gRPC (new)

- **Supported models**: 8 NVIDIA NIM models added (Parakeet variants for
different languages, Whisper Large V3)

---------

Co-authored-by: Alishahryar1 <alishahryar2@gmail.com>
2026-02-28 08:48:59 -08:00
Ali Khokhar
c4d8681000
Backup/before cleanup 20260222 230402 (#58) 2026-02-27 19:50:21 -08:00
Alishahryar1
d6a0e1a401 Provider inferred from model name using prefix 2026-02-19 20:53:02 -08:00
Alishahryar1
2ad64cc97a quoted string vars in env example 2026-02-19 20:27:28 -08:00
Claude
45b7e4cafd
Make PROVIDER_MAX_CONCURRENCY required with default of 5
- `max_concurrency` is now always an `int` (default 5) — `None`/unlimited
  is no longer a valid state; omitting the env var uses the default
- `GlobalRateLimiter`: semaphore is always created; `concurrency_slot()`
  no longer has None guards; log message always includes concurrency
- `ProviderConfig.max_concurrency`: `int = 5` (was `int | None = None`)
- `Settings.provider_max_concurrency`: `int = Field(default=5, ...)` —
  setting env var to an invalid value (e.g. empty string) raises
- `.env.example`: uncommented `PROVIDER_MAX_CONCURRENCY=5`
- README: updated config table default from `—` to `5`
- Tests: removed `test_concurrency_slot_noop_when_not_configured`;
  updated mock settings to use `5` instead of `None`

https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
2026-02-19 14:39:42 +00:00
Claude
99f99fce90
Remove max_cli_sessions — CLI session pool is now unbounded
The max_sessions cap in CLISessionManager was the only thing enforcing
a limit on concurrent CLI processes. Now that provider concurrency is
controlled at the streaming layer (PROVIDER_MAX_CONCURRENCY semaphore),
the CLI session pool cap is redundant and removed entirely.

Changes:
- cli/manager.py: remove max_sessions param, cap check, _cleanup_idle_sessions_unlocked, max_sessions from get_stats()
- config/settings.py: remove max_cli_sessions field
- api/app.py: remove max_sessions=settings.max_cli_sessions from CLISessionManager constructor
- messaging/handler.py: remove "Waiting for slot" status check; stats display no longer shows Max CLI
- .env.example: remove MAX_CLI_SESSIONS line
- tests/cli/test_cli.py: remove max_sessions args and assertion from manager tests
- tests/cli/test_cli_manager_edge_cases.py: remove two tests for cap/cleanup behavior
- tests/api/test_app_lifespan_and_errors.py: remove max_cli_sessions from all SimpleNamespace settings
- tests/config/test_config.py: remove max_cli_sessions isinstance assertion
- tests/conftest.py: remove max_sessions from mock stats
- tests/messaging/test_handler.py: merge slot/capacity tests into single new-conversation test; remove Max CLI assertion from stats test
- tests/messaging/test_handler_markdown_and_status_edges.py: remove "Waiting for slot" assertion; drop max_sessions from all stats mocks

https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
2026-02-19 14:31:47 +00:00
Claude
afaf50a972
Add queue-level concurrency limit to provider streaming
Adds max_concurrency cap to GlobalRateLimiter using asyncio.Semaphore.
A request now waits for a concurrency slot before the sliding window rate
limit check, so at most N streams are open to the provider simultaneously,
even when the rate window would allow more.

Changes:
- providers/rate_limit.py: max_concurrency param, _concurrency_sem, concurrency_slot() asynccontextmanager
- providers/openai_compat.py: pass max_concurrency to limiter; wrap execute_with_retry + stream iteration in concurrency_slot()
- providers/base.py: max_concurrency field on ProviderConfig
- config/settings.py: provider_max_concurrency setting (PROVIDER_MAX_CONCURRENCY env var, default None = unlimited)
- api/dependencies.py: pass provider_max_concurrency into all three provider ProviderConfig instantiations
- .env.example: document PROVIDER_MAX_CONCURRENCY (commented out)
- tests/providers/test_provider_rate_limit.py: 5 new tests covering concurrency limit enforcement, slot release on exception, noop when unconfigured
- tests/api/test_dependencies.py: add provider_max_concurrency=None to mock settings helper

https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
2026-02-19 14:23:21 +00:00
Alishahryar1
4aff0b910f provider type quoted 2026-02-18 19:54:30 -08:00
Alishahryar1
cf1284b784 default voice note enabled set to false 2026-02-18 19:54:13 -08:00
Alishahryar1
416f259c8b Reordered env example 2026-02-18 19:53:30 -08:00
Alishahryar1
c35ecba9d8 Update Whisper model configuration to use 'base' as the default model ID 2026-02-18 19:36:58 -08:00
Alishahryar1
8807f58267 decreased default message rate limit 2026-02-18 13:35:09 -08:00
Alishahryar1
75e066f17f Refactor voice note transcription to use Hugging Face transformers Whisper pipeline
- Updated transcription logic to utilize Hugging Face's Whisper models instead of faster-whisper.
- Introduced new model mapping and pipeline loading functions.
- Adjusted tests to reflect changes in the transcription process.
- Updated documentation in README, .env.example, and settings to align with the new implementation.
- Ensured compatibility with CUDA 13 and removed unnecessary dependencies.
2026-02-18 06:18:28 -08:00
Cursor Agent
db646ef2db Remove auto support for whisper_device; only cpu and cuda allowed
- Validate whisper_device in Settings and _get_local_model
- Reject 'auto' with clear ValueError/ValidationError
- Update docs in config, .env.example, README
- Add tests for invalid device and valid cpu/cuda

Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
2026-02-18 13:38:59 +00:00
Cursor Agent
eabe8db2e8 Remove CPU fallbacks for voice note transcribe; auto/cuda/cpu fail fast
- Remove _cuda_failed_models and inference-time CPU fallback
- auto: try CUDA only, fail fast on RuntimeError (no CPU fallback)
- cpu/cuda: use device directly, fail fast on errors
- Update docs in config, .env.example, README

Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
2026-02-18 13:37:23 +00:00
Alishahryar1
d668f6e476 Add voice note transcription feature
- Introduced voice note handling for Discord and Telegram platforms.
- Added configuration options for voice note functionality in settings.py and .env.example.
- Updated README to include voice note instructions and configuration details.
- Implemented audio attachment processing and transcription using faster-whisper.
- Enabled voice note support through message handlers in both platforms.
2026-02-16 20:14:59 -08:00
Alishahryar1
0eb999a0c1 lint 2026-02-16 02:23:57 -08:00
Alishahryar1
01852e1638 Add configurable HTTP timeouts for provider API requests
Updated the README to include new timeout settings. Implemented these timeouts in the provider classes and added corresponding tests to ensure they are correctly passed to the client. Also included environment variable support for the new settings.
2026-02-16 01:40:15 -08:00
Alishahryar1
6511542bfe Implement Discord bot support and update README for messaging platform changes 2026-02-16 00:08:09 -08:00
Alishahryar1
b53a1b20c5 lint 2026-02-15 23:55:57 -08:00
Alishahryar1
b6f870ffab Added space in example env 2026-02-15 23:53:42 -08:00
Alishahryar1
b83be84313 Add LM Studio provider support
- Introduced `LMStudioProvider` to the provider system.
- Added a new fixture `lmstudio_provider` in `conftest.py` for testing.
- Updated `get_provider` function to handle `lmstudio` as a valid provider type.
- Enhanced README and `.env.example` to include LM Studio configuration details.
- Updated settings to accommodate LM Studio's base URL and provider type.
- Added tests to verify the functionality of the LM Studio provider.
2026-02-15 19:41:03 -08:00
Alishahryar1
45d4dc25d2 Removed nim setting from .env 2026-02-15 11:17:09 -08:00
Alishahryar1
054f9869b7 Refactor rate limiting configuration to use unified provider settings
- Replaced NVIDIA NIM and OpenRouter specific rate limit settings with a generic provider rate limit in settings, tests, and environment files.
- Updated README.md to reflect the new provider rate limit configuration.
- Adjusted tests to validate the new provider rate limit attributes.
2026-02-15 11:03:59 -08:00
Alishahryar1
e5a096049d feat: add OpenRouter support and configuration options
- Introduced OpenRouter as a new provider option in settings and environment configuration.
- Updated README.md to include instructions for using OpenRouter.
- Enhanced the message converter to support reasoning content for OpenRouter.
- Added tests for OpenRouter provider functionality and message conversion.
- Updated dependencies to include OpenRouterProvider.
2026-02-15 10:50:53 -08:00
Alishahryar1
95039bab1f Updated .env.example 2026-02-15 04:04:23 -08:00
Alishahryar1
6102583026 Major Refactor Part 2 with kimi-k2.5 in claude code 2026-02-05 16:09:16 -08:00
Alishahryar1
9e6552389c Updated default model 2026-02-05 11:10:45 -08:00
Ali Khokhar
e6972e3fba
Increase NVIDIA NIM rate limit from 20 to 40 2026-02-03 19:57:44 -08:00
Alishahryar1
81d41fb6d5 Added mocking for suggestion mode and file path extraction 2026-02-03 19:24:18 -08:00
Alishahryar1
ec86f5bda6 Update code to always log everything to server.log removing server_debug.jsonl 2026-02-03 19:13:04 -08:00
Alishahryar1
7045e0ed44 add retry for telegram message edits with tests and updated rate telegram limit 2026-01-30 15:41:24 -08:00
Alishahryar1
0e06307366 added 2 new optimization for mocking title and quota 2026-01-30 03:20:34 -08:00
Alishahryar1
9a1f178622 updated env example 2026-01-30 00:47:22 -08:00
Alishahryar1
eb19a69afd Updated example env 2026-01-30 00:46:27 -08:00
Alishahryar1
d958544c3d Migrated from telethon to telegram bot api 2026-01-30 00:38:39 -08:00
Alishahryar1
fb845384f0 fixed server start message on telegram 2026-01-30 00:02:39 -08:00
Alishahryar1
2a84823825 added rate limiter for telegram to prevent flood wait 2026-01-29 23:25:55 -08:00
Alishahryar1
82ab6d64e1 Revert "added messaging app rate limiter params"
This reverts commit 8ee1da7140.
2026-01-29 23:14:59 -08:00
Alishahryar1
116fe6cba1 Revert "updated env example"
This reverts commit 0761cbe907.
2026-01-29 23:14:49 -08:00
Alishahryar1
fd6b32113d Revert "updated env example"
This reverts commit 4f4552b960.
2026-01-29 23:14:41 -08:00
Alishahryar1
4f4552b960 updated env example 2026-01-29 22:44:12 -08:00
Alishahryar1
0761cbe907 updated env example 2026-01-29 22:42:15 -08:00
Alishahryar1
8ee1da7140 added messaging app rate limiter params 2026-01-29 22:38:26 -08:00