Replace dropped params (`thinking`, `reasoning_split`, `include_reasoning`,
`return_tokens_as_token_ids`, `reasoning_effort`) with the new API format:
`chat_template_kwargs.enable_thinking=True` and `reasoning_budget=max_tokens`.
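For illustration, a request under the new format might look like this (the
surrounding payload keys are assumed; only the two new fields come from this
change):

```python
payload = {
    "model": "my-local-model",  # placeholder
    "messages": [{"role": "user", "content": "hi"}],
    "max_tokens": 1024,
    # New-style thinking controls replacing the dropped params:
    "chat_template_kwargs": {"enable_thinking": True},
    "reasoning_budget": 1024,  # mirrors max_tokens per the mapping above
}
```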
- Rewrites LMStudioProvider to inherit from BaseProvider
- Passes requests natively to /v1/messages using httpx instead of AsyncOpenAI
- Auto-translates internal ThinkingConfig to Anthropic schema (sketched below)
- Updates .env.example with model routing instructions
- Adjusts test suite for new native integration
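Roughly, the native path looks like this minimal sketch (the function name
and ThinkingConfig fields are assumed; httpx and the /v1/messages endpoint
come from the change above):

```python
from dataclasses import dataclass

import httpx


@dataclass
class ThinkingConfig:
    """Stand-in for the internal config; field names are assumed."""
    enabled: bool
    budget_tokens: int


async def forward_to_messages(
    base_url: str, request: dict, thinking: ThinkingConfig | None
) -> httpx.Response:
    # Translate the internal thinking config into Anthropic's schema.
    if thinking is not None and thinking.enabled:
        request["thinking"] = {
            "type": "enabled",
            "budget_tokens": thinking.budget_tokens,
        }
    async with httpx.AsyncClient(base_url=base_url) as client:
        # Native passthrough: the Anthropic-shaped body is sent as-is.
        return await client.post("/v1/messages", json=request)
```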
- `max_concurrency` is now always an `int` (default 5) — `None`/unlimited
is no longer a valid state; omitting the env var uses the default
- `GlobalRateLimiter`: semaphore is always created; `concurrency_slot()`
no longer has None guards; log message always includes concurrency
- `ProviderConfig.max_concurrency`: `int = 5` (was `int | None = None`)
- `Settings.provider_max_concurrency`: `int = Field(default=5, ...)` —
  setting the env var to an invalid value (e.g. an empty string) now raises a
  validation error at startup (sketched below)
- `.env.example`: uncommented `PROVIDER_MAX_CONCURRENCY=5`
- README: updated config table default from `—` to `5`
- Tests: removed `test_concurrency_slot_noop_when_not_configured`;
updated mock settings to use `5` instead of `None`
https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
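A minimal sketch of the setting after this change (the elided `Field`
arguments are not reproduced; the env-var mapping is pydantic-settings'
default field-name behavior):

```python
from pydantic import Field
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Reads PROVIDER_MAX_CONCURRENCY from the environment; omitting the
    # variable falls back to 5.
    provider_max_concurrency: int = Field(default=5)


# PROVIDER_MAX_CONCURRENCY="" (or any non-integer) now fails fast:
# Settings() raises a pydantic ValidationError instead of silently
# running unlimited.
```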
Adds max_concurrency cap to GlobalRateLimiter using asyncio.Semaphore.
A request now waits for a concurrency slot before the sliding-window rate
limit check, so at most N streams are open to the provider simultaneously,
even when the rate window would allow more (see the sketch below).
Changes:
- providers/rate_limit.py: max_concurrency param, _concurrency_sem, concurrency_slot() asynccontextmanager
- providers/openai_compat.py: pass max_concurrency to limiter; wrap execute_with_retry + stream iteration in concurrency_slot()
- providers/base.py: max_concurrency field on ProviderConfig
- config/settings.py: provider_max_concurrency setting (PROVIDER_MAX_CONCURRENCY env var, default None = unlimited)
- api/dependencies.py: pass provider_max_concurrency into all three provider ProviderConfig instantiations
- .env.example: document PROVIDER_MAX_CONCURRENCY (commented out)
- tests/providers/test_provider_rate_limit.py: 5 new tests covering concurrency limit enforcement, slot release on exception, noop when unconfigured
- tests/api/test_dependencies.py: add provider_max_concurrency=None to mock settings helper
https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
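A sketch of the mechanism as introduced here (class internals are assumed and
the sliding-window check itself is elided; this is the `int | None` shape
that the later change above simplifies to a plain `int`):

```python
import asyncio
from contextlib import asynccontextmanager


class GlobalRateLimiter:
    def __init__(self, max_concurrency: int | None = None) -> None:
        # None = unlimited: no semaphore, and the slot is a no-op.
        self._concurrency_sem = (
            asyncio.Semaphore(max_concurrency) if max_concurrency else None
        )

    @asynccontextmanager
    async def concurrency_slot(self):
        if self._concurrency_sem is None:
            yield  # unconfigured: pass straight through
            return
        # Acquired before the sliding-window check; released on exit even
        # if the wrapped stream raises, so slots cannot leak.
        async with self._concurrency_sem:
            yield
```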
- Add MessageTree.set_current_task() method
- Update tree_processor to use set_current_task instead of _current_task
- Move nim_settings out of ProviderConfig, pass only to NvidiaNimProvider
- Update api/dependencies and all tests
Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
- Create providers/openai_compat.py with shared streaming logic
- Refactor NvidiaNimProvider, OpenRouterProvider, LMStudioProvider to extend it
- OpenRouter overrides _handle_extra_reasoning for reasoning_details
- Update test patches to providers.openai_compat
Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
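The extension point works along these lines (method signature and delta shape
are assumed; the hook name and `reasoning_details` key come from the commit):

```python
class OpenAICompatProvider:
    """Shared streaming base (sketch)."""

    def _handle_extra_reasoning(self, delta: dict) -> str | None:
        # Default: no provider-specific reasoning fields.
        return None


class OpenRouterProvider(OpenAICompatProvider):
    def _handle_extra_reasoning(self, delta: dict) -> str | None:
        # OpenRouter streams structured reasoning under `reasoning_details`.
        details = delta.get("reasoning_details") or []
        return "".join(d.get("text", "") for d in details) or None
```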
- Create providers/common/ with sse_builder, message_converter, think_parser,
heuristic_tool_parser, error_mapping
- Update nvidia_nim/utils and errors to re-export from common for backward compat
- Update all provider clients and tests to import from providers.common
- Remove duplicated files from nvidia_nim/utils/
Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
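The backward-compat shims are presumably one-liners of this shape (module
path and re-export style are illustrative):

```python
# providers/nvidia_nim/utils/sse_builder.py — compat shim; the
# implementation now lives in providers.common.
from providers.common.sse_builder import *  # noqa: F401,F403
```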
Implemented new timeout settings in the provider classes, with environment variable support, and added corresponding tests to ensure the timeouts are correctly passed to the client. Updated the README to document the new settings.
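The message doesn't name the settings; as a hedged sketch, assuming
httpx-style timeouts (which both httpx and the OpenAI SDK clients accept) and
hypothetical env-var names:

```python
import httpx

connect_timeout = 10.0  # e.g. PROVIDER_CONNECT_TIMEOUT (hypothetical)
read_timeout = 300.0    # e.g. PROVIDER_READ_TIMEOUT — generous for streaming

client = httpx.AsyncClient(
    timeout=httpx.Timeout(
        connect=connect_timeout, read=read_timeout, write=30.0, pool=30.0
    ),
)
```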
- Introduced `LMStudioProvider` to the provider system.
- Added a new fixture `lmstudio_provider` in `conftest.py` for testing.
- Updated `get_provider` function to handle `lmstudio` as a valid provider type.
- Enhanced README and `.env.example` to include LM Studio configuration details.
- Updated settings to accommodate LM Studio's base URL and provider type.
- Added tests to verify the functionality of the LM Studio provider.
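A sketch of the dispatch in `get_provider` (the real signature is assumed;
the provider classes are stubbed here to keep the snippet self-contained):

```python
class LMStudioProvider:  # stubs standing in for the real classes
    def __init__(self, config): ...

class OpenRouterProvider:
    def __init__(self, config): ...

class NvidiaNimProvider:
    def __init__(self, config): ...


def get_provider(provider_type: str, config):
    providers = {
        "lmstudio": LMStudioProvider,
        "openrouter": OpenRouterProvider,
        "nvidia_nim": NvidiaNimProvider,
    }
    try:
        return providers[provider_type](config)
    except KeyError:
        raise ValueError(f"unknown provider type: {provider_type!r}") from None
```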
- Introduced OpenRouter as a new provider option in settings and environment configuration.
- Updated README.md to include instructions for using OpenRouter.
- Enhanced the message converter to support reasoning content for OpenRouter.
- Added tests for OpenRouter provider functionality and message conversion.
- Updated dependencies to include OpenRouterProvider.
- Added request ID context to logging in FastAPI routes and NVIDIA NIM provider.
- Improved logging format to include context variables for better traceability.
- Updated message handling in Telegram and Claude handlers to log message previews.
- Enhanced error logging in NVIDIA NIM provider with request ID for easier debugging.
- Added logging for tree repository actions to track tree and node registrations.
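A common way to thread a request ID through log records, matching the
description above (names are illustrative, not necessarily the project's):

```python
import contextvars
import logging

request_id_var: contextvars.ContextVar[str] = contextvars.ContextVar(
    "request_id", default="-"
)


class RequestIdFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Stamp every record with the current request's ID.
        record.request_id = request_id_var.get()
        return True


handler = logging.StreamHandler()
handler.addFilter(RequestIdFilter())
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s [%(request_id)s] %(message)s")
)

# In a FastAPI route, set it once per request:
# request_id_var.set(request.headers.get("x-request-id", "-"))
```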
- Added a step to fail the CI if any '# type: ignore' comments are found in Python files.
- Refactored tests to use mocking for better isolation and reliability.
- Updated type hints and casting in several files to improve type safety.
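The guard can be as simple as this (a Python sketch; the actual CI step may
just use grep):

```python
import pathlib
import sys

# Scans the whole tree; a real CI step would likely exclude vendored dirs.
offenders = [
    str(p)
    for p in pathlib.Path(".").rglob("*.py")
    if "# type: ignore" in p.read_text(encoding="utf-8", errors="ignore")
]
if offenders:
    print("found '# type: ignore' in:", *offenders, sep="\n  ")
    sys.exit(1)
```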
- Assistant messages: build content string in block order (thinking+text interleaved)
- User messages: emit text before tool results when order is text→tool_result
- Response: add extract_think_content_interleaved() to preserve <think>...</think> order
- Add tests and docs for context preservation bugs
Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
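`extract_think_content_interleaved()` presumably behaves like this sketch
(the return type is assumed):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def extract_think_content_interleaved(text: str) -> list[tuple[str, str]]:
    """Split text into ("thinking" | "text", content) pairs, preserving
    the original order of <think> blocks relative to plain text."""
    parts: list[tuple[str, str]] = []
    pos = 0
    for m in THINK_RE.finditer(text):
        if m.start() > pos:
            parts.append(("text", text[pos:m.start()]))
        parts.append(("thinking", m.group(1)))
        pos = m.end()
    if pos < len(text):
        parts.append(("text", text[pos:]))
    return parts
```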
ContentBlockManager already declares task_arg_buffer, task_args_emitted,
tool_ids. Remove defensive getattr/isinstance checks from _process_tool_call
and _flush_task_arg_buffers.
Update test_subagent_interception to set task_arg_buffer, task_args_emitted,
tool_ids on mock so it behaves like real ContentBlockManager.
Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
Remove a thin wrapper that was only used in tests. Tests now import
extract_text_from_content directly from utils.text.
Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>