Commit graph

86 commits

Author SHA1 Message Date
Alishahryar1
2fad4dd4c9 Support both kimi (thinking) and nemotron (enable_thinking) in chat_template_kwargs
2026-03-26 12:34:12 -07:00
Alishahryar1
f9e7f65f4c Fix NVIDIA NIM reasoning params for updated API
Replace dropped params (thinking, reasoning_split, include_reasoning,
return_tokens_as_token_ids, reasoning_effort) with the new API format:
chat_template_kwargs.enable_thinking=True and reasoning_budget=max_tokens.
2026-03-26 12:25:04 -07:00
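The two commits above imply a request shape for the updated NIM reasoning API. A minimal sketch, assuming the payload layout beyond `chat_template_kwargs`, `thinking`/`enable_thinking`, and `reasoning_budget` (which come from the commit messages); the model-family detection heuristic and function name are illustrative, not from the repository:

```python
def build_reasoning_payload(model: str, messages: list[dict], max_tokens: int) -> dict:
    """Enable reasoning via the updated NIM API format.

    Kimi-style models read `thinking`; Nemotron-style models read
    `enable_thinking` -- the commits above set the key per model family.
    """
    # Hypothetical family detection; the real routing logic is not shown here.
    kwargs = {"thinking": True} if "kimi" in model.lower() else {"enable_thinking": True}
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "chat_template_kwargs": kwargs,
        "reasoning_budget": max_tokens,  # budget tied to max_tokens, per the commit
    }
```

For example, a Nemotron model would get `chat_template_kwargs={"enable_thinking": True}` while a Kimi model gets `{"thinking": True}`.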
Yuval Dinodia
00038209b2 fix: remove unsupported include_stop_str_in_output NIM param (#95)
2026-03-23 11:38:13 -07:00
Alishahryar1
55945df1d2 removed logging utils 2026-03-11 07:24:50 -07:00
Alishahryar1
5a36a32836 feat: add llama.cpp provider for local anthropic messages API 2026-03-08 10:38:25 -07:00
Alishahryar1
1aedf4763c fix(providers): map httpx exceptions natively and remove type ignores 2026-03-08 08:33:34 -07:00
Alishahryar1
87d8ce1196 feat(lmstudio): route natively to Anthropic /v1/messages endpoint
- Rewrites LMStudioProvider to inherit from BaseProvider
- Passes requests natively to /v1/messages using httpx instead of AsyncOpenAI
- Auto-translates internal ThinkingConfig to Anthropic schema
- Updates .env.example with model routing instructions
- Adjusts test suite for new native integration
2026-03-08 08:17:05 -07:00
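The LM Studio commit above mentions auto-translating an internal `ThinkingConfig` into the Anthropic schema. A hedged sketch of that translation: the internal dataclass fields are assumptions, while the output shape (`{"type": "enabled", "budget_tokens": ...}`) follows Anthropic's published Messages API:

```python
from dataclasses import dataclass

@dataclass
class ThinkingConfig:
    # Hypothetical internal config; the repository's actual fields may differ.
    enabled: bool
    budget_tokens: int = 1024

def to_anthropic_thinking(cfg: ThinkingConfig) -> dict:
    """Map the internal config onto the Anthropic /v1/messages `thinking` field."""
    if not cfg.enabled:
        return {"type": "disabled"}
    return {"type": "enabled", "budget_tokens": cfg.budget_tokens}
```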
Ali Khokhar
f57598fee3 Move nim_settings from shared base class to NvidiaNimProvider (#78)
2026-03-07 22:34:45 -08:00
Alishahryar1
34757511a0 Improve deterministic error surfacing across stream and API 2026-03-01 01:32:52 -08:00
Ali Khokhar
aee9f0ad93 Add code review fix plan covering 11 issues across modularity, encapsulation, performance, and dead code (#62) 2026-03-01 00:45:33 -08:00
Alishahryar1
744eec2772 Major cleanup with GLM-5 2026-02-28 09:10:21 -08:00
Alishahryar1
a74ec74271 Major refactor done with minimax m2.5 2026-02-28 04:36:29 -08:00
Alishahryar1
79a1ae0c54 minor refactor using minimax m2.5 2026-02-27 20:44:39 -08:00
Ali Khokhar
c4d8681000 Backup/before cleanup 20260222 230402 (#58) 2026-02-27 19:50:21 -08:00
Alishahryar1
2b0495dd08 moved text.py to common utils for providers 2026-02-19 20:32:45 -08:00
Claude
45b7e4cafd Make PROVIDER_MAX_CONCURRENCY required with default of 5
- `max_concurrency` is now always an `int` (default 5) — `None`/unlimited
  is no longer a valid state; omitting the env var uses the default
- `GlobalRateLimiter`: semaphore is always created; `concurrency_slot()`
  no longer has None guards; log message always includes concurrency
- `ProviderConfig.max_concurrency`: `int = 5` (was `int | None = None`)
- `Settings.provider_max_concurrency`: `int = Field(default=5, ...)` —
  setting env var to an invalid value (e.g. empty string) raises
- `.env.example`: uncommented `PROVIDER_MAX_CONCURRENCY=5`
- README: updated config table default from `—` to `5`
- Tests: removed `test_concurrency_slot_noop_when_not_configured`;
  updated mock settings to use `5` instead of `None`

https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
2026-02-19 14:39:42 +00:00
Claude
afaf50a972 Add queue-level concurrency limit to provider streaming
Adds max_concurrency cap to GlobalRateLimiter using asyncio.Semaphore.
A request now waits for a concurrency slot before the sliding window rate
limit check, so at most N streams are open to the provider simultaneously,
even when the rate window would allow more.

Changes:
- providers/rate_limit.py: max_concurrency param, _concurrency_sem, concurrency_slot() asynccontextmanager
- providers/openai_compat.py: pass max_concurrency to limiter; wrap execute_with_retry + stream iteration in concurrency_slot()
- providers/base.py: max_concurrency field on ProviderConfig
- config/settings.py: provider_max_concurrency setting (PROVIDER_MAX_CONCURRENCY env var, default None = unlimited)
- api/dependencies.py: pass provider_max_concurrency into all three provider ProviderConfig instantiations
- .env.example: document PROVIDER_MAX_CONCURRENCY (commented out)
- tests/providers/test_provider_rate_limit.py: 5 new tests covering concurrency limit enforcement, slot release on exception, noop when unconfigured
- tests/api/test_dependencies.py: add provider_max_concurrency=None to mock settings helper

https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
2026-02-19 14:23:21 +00:00
Shantanu Suryawanshi
24a5e4d968 Fix SSE stream 2026-02-18 21:31:28 -05:00
Alishahryar1
b05d0d2703 new linter rules and fixes 2026-02-18 04:13:41 -08:00
Cursor Agent
bfc781e0ed Phase 4-6: Dead code removal, performance, minor fixes
Phase 4:
- Remove legacy SessionRecord, _sessions, _msg_to_session from SessionStore
- Fix hardcoded provider in root endpoint (use settings.provider_type)
- Update session store tests

Phase 5:
- Use list-based string accumulation in ThinkingSegment, TextSegment, ToolCallSegment
- Cache MAX_MESSAGE_LOG_ENTRIES_PER_CHAT at SessionStore init
- Use iterative DFS in MessageTree.get_descendants

Phase 6:
- Add comment for abstract async generator workaround in BaseProvider
- Rename TELEGRAM_EDIT log tags to PLATFORM_EDIT in handler

Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
2026-02-17 02:01:01 +00:00
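The Phase 5 item "iterative DFS in MessageTree.get_descendants" can be sketched as below. The `Node` shape is an assumption; the point is replacing recursion with an explicit stack so deep trees cannot hit Python's recursion limit:

```python
class Node:
    def __init__(self, node_id: str) -> None:
        self.id = node_id
        self.children: list["Node"] = []

def get_descendants(root: Node) -> list[Node]:
    """Collect all descendants in preorder using an explicit stack."""
    out: list[Node] = []
    # Reversing before pushing keeps left-to-right child order on pop.
    stack = list(reversed(root.children))
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(reversed(node.children))
    return out
```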
Cursor Agent
72b7e34999 Phase 3: Fix encapsulation violations
- Add MessageTree.set_current_task() method
- Update tree_processor to use set_current_task instead of _current_task
- Move nim_settings out of ProviderConfig, pass only to NvidiaNimProvider
- Update api/dependencies and all tests

Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
2026-02-17 01:58:51 +00:00
Cursor Agent
bab86a2687 Phase 2: Extract OpenAICompatibleProvider base class
- Create providers/openai_compat.py with shared streaming logic
- Refactor NvidiaNimProvider, OpenRouterProvider, LMStudioProvider to extend it
- OpenRouter overrides _handle_extra_reasoning for reasoning_details
- Update test patches to providers.openai_compat

Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
2026-02-17 01:57:34 +00:00
Cursor Agent
d3f5f8877f Phase 1: Extract shared provider utils into providers/common/
- Create providers/common/ with sse_builder, message_converter, think_parser,
  heuristic_tool_parser, error_mapping
- Update nvidia_nim/utils and errors to re-export from common for backward compat
- Update all provider clients and tests to import from providers.common
- Remove duplicated files from nvidia_nim/utils/

Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
2026-02-17 01:55:38 +00:00
Alishahryar1
01852e1638 Add configurable HTTP timeouts for provider API requests
Updated the README to include new timeout settings. Implemented these timeouts in the provider classes and added corresponding tests to ensure they are correctly passed to the client. Also included environment variable support for the new settings.
2026-02-16 01:40:15 -08:00
Alishahryar1
1d52e5d3bb Updated thinking keys 2026-02-15 23:49:54 -08:00
Alishahryar1
539854fe7b Refactor done using GLM-5 2026-02-15 21:58:03 -08:00
Alishahryar1
b83be84313 Add LM Studio provider support
- Introduced `LMStudioProvider` to the provider system.
- Added a new fixture `lmstudio_provider` in `conftest.py` for testing.
- Updated `get_provider` function to handle `lmstudio` as a valid provider type.
- Enhanced README and `.env.example` to include LM Studio configuration details.
- Updated settings to accommodate LM Studio's base URL and provider type.
- Added tests to verify the functionality of the LM Studio provider.
2026-02-15 19:41:03 -08:00
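The `get_provider` dispatch this commit extends can be sketched as a registry keyed by provider type. The class and key names are modeled on commit messages elsewhere in this log, not taken from the code:

```python
class BaseProvider: ...
class NvidiaNimProvider(BaseProvider): ...
class OpenRouterProvider(BaseProvider): ...
class LMStudioProvider(BaseProvider): ...

# Hypothetical registry; the repository's construction likely passes config too.
_PROVIDERS: dict[str, type[BaseProvider]] = {
    "nvidia_nim": NvidiaNimProvider,
    "openrouter": OpenRouterProvider,
    "lmstudio": LMStudioProvider,
}

def get_provider(provider_type: str) -> BaseProvider:
    try:
        return _PROVIDERS[provider_type]()
    except KeyError:
        raise ValueError(f"Unsupported provider type: {provider_type!r}") from None
```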
Alishahryar1
30ea67bcd4 Updated OpenRouter default max tokens 2026-02-15 11:37:53 -08:00
Alishahryar1
e5a096049d feat: add OpenRouter support and configuration options
- Introduced OpenRouter as a new provider option in settings and environment configuration.
- Updated README.md to include instructions for using OpenRouter.
- Enhanced the message converter to support reasoning content for OpenRouter.
- Added tests for OpenRouter provider functionality and message conversion.
- Updated dependencies to include OpenRouterProvider.
2026-02-15 10:50:53 -08:00
Alishahryar1
7dfcad2a4c Enhance logging and error handling across multiple modules
- Added request ID context to logging in FastAPI routes and NVIDIA NIM provider.
- Improved logging format to include context variables for better traceability.
- Updated message handling in Telegram and Claude handlers to log message previews.
- Enhanced error logging in NVIDIA NIM provider with request ID for easier debugging.
- Added logging for tree repository actions to track tree and node registrations.
2026-02-15 02:01:57 -08:00
Alishahryar1
0d292cd578 ci: enhance type checking in workflow and improve test coverage
- Added a step to fail the CI if any '# type: ignore' comments are found in Python files.
- Refactored tests to use mocking for better isolation and reliability.
- Updated type hints and casting in several files to improve type safety.
2026-02-14 23:01:11 -08:00
Alishahryar1
952a2351ec always enabled thinking 2026-02-14 19:46:29 -08:00
Alishahryar1
9be9943401 Improved test coverage 2026-02-14 19:17:19 -08:00
Alishahryar1
96747f2216 Updated token counting and removed non streaming support 2026-02-14 19:10:09 -08:00
Cursor Agent
ec71a2232c fix: preserve interleaved thinking, tool calls, and text in Anthropic↔NIM conversion
- Assistant messages: build content string in block order (thinking+text interleaved)
- User messages: emit text before tool results when order is text→tool_result
- Response: add extract_think_content_interleaved() to preserve <think>...</think> order
- Add tests and docs for context preservation bugs

Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
2026-02-15 02:08:35 +00:00
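The `extract_think_content_interleaved()` helper named above can be sketched as a splitter that preserves the original ordering of `<think>...</think>` blocks and surrounding text. The segment tuple shape is an assumption:

```python
import re

_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def extract_think_content_interleaved(text: str) -> list[tuple[str, str]]:
    """Split a response into ("thinking" | "text", content) segments in order."""
    segments: list[tuple[str, str]] = []
    pos = 0
    for m in _THINK.finditer(text):
        if m.start() > pos:                 # plain text before this think block
            segments.append(("text", text[pos:m.start()]))
        segments.append(("thinking", m.group(1)))
        pos = m.end()
    if pos < len(text):                     # trailing plain text
        segments.append(("text", text[pos:]))
    return segments
```

Preserving interleaving (rather than concatenating all thinking first) is what keeps the Anthropic↔NIM round trip faithful to block order.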
Cursor Agent
0bab393c05 Phase 6: Remove dynamic attribute creation in NIM client
ContentBlockManager already declares task_arg_buffer, task_args_emitted,
tool_ids. Remove defensive getattr/isinstance checks from _process_tool_call
and _flush_task_arg_buffers.

Update test_subagent_interception to set task_arg_buffer, task_args_emitted,
tool_ids on mock so it behaves like real ContentBlockManager.

Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
2026-02-15 01:39:43 +00:00
Cursor Agent
f80c7ce42e Remove _extract_text_from_content wrapper from logging_utils
Thin wrapper only used in tests. Tests now import extract_text_from_content
directly from utils.text.

Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
2026-02-15 01:32:04 +00:00
Cursor Agent
212b2b29b2 Remove dead code: extract_reasoning_from_delta
Exported but never used. NIM client uses getattr(delta, 'reasoning_content')
directly.

Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
2026-02-15 01:31:43 +00:00
Alishahryar1
4b95429c32 fixed NIM_INTERCEPT for chunked tool calls 2026-02-14 03:25:32 -08:00
Alishahryar1
85eed8a1bc updated heuristic tool parser for orphaned toolcall tags 2026-02-14 00:16:54 -08:00
Alishahryar1
8647aa52c5 Revert "force enabled thinking"
This reverts commit eaaaab2c42.
2026-02-13 22:59:54 -08:00
Alishahryar1
eaaaab2c42 force enabled thinking 2026-02-13 21:52:54 -08:00
Alishahryar1
665e24e2db Migrated from token bucket rate limiter to sliding window rate limiter 2026-02-13 19:05:16 -08:00
Alishahryar1
fab66edcd3 Fixed rate limiting issues 2026-02-13 17:40:19 -08:00
Alishahryar1
6102583026 Major Refactor Part 2 with kimi-k2.5 in claude code 2026-02-05 16:09:16 -08:00
Alishahryar1
fcbe204f44 Major refactor done with kimi-k2.5 in claude code 2026-02-05 10:51:33 -08:00
Alishahryar1
58f556f8bd Fixed orphan </think> tags 2026-02-03 22:04:06 -08:00
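One way to handle the orphan `</think>` case this commit fixes (a closing tag with no opener, as when the opening tag is emitted by the chat template before generation) is to treat everything before the first `</think>` as thinking. This is a hedged sketch of the idea; the repository's actual fix may differ:

```python
def split_orphan_think(text: str) -> tuple[str, str]:
    """Return (thinking, visible_text) for a response with an orphaned </think>."""
    if "<think>" not in text and "</think>" in text:
        thinking, _, rest = text.partition("</think>")
        return thinking.strip(), rest.lstrip()
    return "", text  # no orphan: leave the text untouched
```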
Alishahryar1
ec86f5bda6 Update code to always log everything to server.log removing server_debug.jsonl 2026-02-03 19:13:04 -08:00
Alishahryar1
55ac01c716 fixed exception handling during open content block 2026-01-31 16:45:34 -08:00
Alishahryar1
707e8aec2b fixed subagents 2026-01-31 15:57:37 -08:00