The max_sessions cap in CLISessionManager was the only thing enforcing
a limit on concurrent CLI processes. Now that provider concurrency is
controlled at the streaming layer (PROVIDER_MAX_CONCURRENCY semaphore),
the CLI session pool cap is redundant and removed entirely.
Changes:
- cli/manager.py: remove max_sessions param, cap check, _cleanup_idle_sessions_unlocked, max_sessions from get_stats()
- config/settings.py: remove max_cli_sessions field
- api/app.py: remove max_sessions=settings.max_cli_sessions from CLISessionManager constructor
- messaging/handler.py: remove "Waiting for slot" status check; stats display no longer shows Max CLI
- .env.example: remove MAX_CLI_SESSIONS line
- tests/cli/test_cli.py: remove max_sessions args and assertion from manager tests
- tests/cli/test_cli_manager_edge_cases.py: remove two tests for cap/cleanup behavior
- tests/api/test_app_lifespan_and_errors.py: remove max_cli_sessions from all SimpleNamespace settings
- tests/config/test_config.py: remove max_cli_sessions isinstance assertion
- tests/conftest.py: remove max_sessions from mock stats
- tests/messaging/test_handler.py: merge slot/capacity tests into single new-conversation test; remove Max CLI assertion from stats test
- tests/messaging/test_handler_markdown_and_status_edges.py: remove "Waiting for slot" assertion; drop max_sessions from all stats mocks
https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
Adds max_concurrency cap to GlobalRateLimiter using asyncio.Semaphore.
A request now waits for a concurrency slot before the sliding window rate
limit check, so at most N streams are open to the provider simultaneously,
even when the rate window would allow more.
Changes:
- providers/rate_limit.py: max_concurrency param, _concurrency_sem, concurrency_slot() asynccontextmanager
- providers/openai_compat.py: pass max_concurrency to limiter; wrap execute_with_retry + stream iteration in concurrency_slot()
- providers/base.py: max_concurrency field on ProviderConfig
- config/settings.py: provider_max_concurrency setting (PROVIDER_MAX_CONCURRENCY env var, default None = unlimited)
- api/dependencies.py: pass provider_max_concurrency into all three provider ProviderConfig instantiations
- .env.example: document PROVIDER_MAX_CONCURRENCY (commented out)
- tests/providers/test_provider_rate_limit.py: 5 new tests covering concurrency limit enforcement, slot release on exception, noop when unconfigured
- tests/api/test_dependencies.py: add provider_max_concurrency=None to mock settings helper
https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
When NVIDIA_NIM_API_KEY or OPENROUTER_API_KEY is empty or not set,
the proxy forwarded requests without a valid Authorization header,
causing providers to return 403 with 'Header of type authorization
was missing'.
Now fail fast with HTTP 503 and a clear message telling users to add
the key to .env, with links to obtain keys.
Fixes#29
Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
- Add MessageTree.set_current_task() method
- Update tree_processor to use set_current_task instead of _current_task
- Move nim_settings out of ProviderConfig, pass only to NvidiaNimProvider
- Update api/dependencies and all tests
Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
- Introduced `plans_directory` parameter in `CLISessionManager`, `CLISession`, and updated related methods to handle plan files.
- Updated `api/app.py` to pass the plans directory to the CLI session manager.
- Added a new test in `test_cli.py` to verify that the `--settings plansDirectory` argument is included when starting a task with a specified plans directory.
Updated the README to include new timeout settings. Implemented these timeouts in the provider classes and added corresponding tests to ensure they are correctly passed to the client. Also included environment variable support for the new settings.
- Introduced `LMStudioProvider` to the provider system.
- Added a new fixture `lmstudio_provider` in `conftest.py` for testing.
- Updated `get_provider` function to handle `lmstudio` as a valid provider type.
- Enhanced README and `.env.example` to include LM Studio configuration details.
- Updated settings to accommodate LM Studio's base URL and provider type.
- Added tests to verify the functionality of the LM Studio provider.
- Replaced NVIDIA NIM and OpenRouter specific rate limit settings with a generic provider rate limit in settings, tests, and environment files.
- Updated README.md to reflect the new provider rate limit configuration.
- Adjusted tests to validate the new provider rate limit attributes.
- Introduced OpenRouter as a new provider option in settings and environment configuration.
- Updated README.md to include instructions for using OpenRouter.
- Enhanced the message converter to support reasoning content for OpenRouter.
- Added tests for OpenRouter provider functionality and message conversion.
- Updated dependencies to include OpenRouterProvider.
- Added request ID context to logging in FastAPI routes and NVIDIA NIM provider.
- Improved logging format to include context variables for better traceability.
- Updated message handling in Telegram and Claude handlers to log message previews.
- Enhanced error logging in NVIDIA NIM provider with request ID for easier debugging.
- Added logging for tree repository actions to track tree and node registrations.
- Added a step to fail the CI if any '# type: ignore' comments are found in Python files.
- Refactored tests to use mocking for better isolation and reliability.
- Updated type hints and casting in several files to improve type safety.