Commit graph

13 commits

Author SHA1 Message Date
Ali Khokhar
aee9f0ad93
Add code review fix plan covering 11 issues across modularity, encapsulation, performance, and dead code (#62) 2026-03-01 00:45:33 -08:00
Alishahryar1
79a1ae0c54 minor refactor using minimax m2.5 2026-02-27 20:44:39 -08:00
Claude
45b7e4cafd
Make PROVIDER_MAX_CONCURRENCY required with default of 5
- `max_concurrency` is now always an `int` (default 5) — `None`/unlimited
  is no longer a valid state; omitting the env var uses the default
- `GlobalRateLimiter`: semaphore is always created; `concurrency_slot()`
  no longer has None guards; log message always includes concurrency
- `ProviderConfig.max_concurrency`: `int = 5` (was `int | None = None`)
- `Settings.provider_max_concurrency`: `int = Field(default=5, ...)`;
  setting the env var to an invalid value (e.g. an empty string) raises
  an error
- `.env.example`: uncommented `PROVIDER_MAX_CONCURRENCY=5`
- README: updated config table default from `—` to `5`
- Tests: removed `test_concurrency_slot_noop_when_not_configured`;
  updated mock settings to use `5` instead of `None`
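
The behaviour described in the list above (an always-`int` setting that defaults to 5 and rejects invalid values) can be illustrated with a small standard-library sketch. This is not the project's `Settings` class: `load_max_concurrency` and `DEFAULT_MAX_CONCURRENCY` are hypothetical names, and raising `ValueError` is an assumption, since the commit message does not say which exception is raised.

```python
import os
from collections.abc import Mapping

# Default taken from the commit message; the constant name is hypothetical.
DEFAULT_MAX_CONCURRENCY = 5


def load_max_concurrency(env: Mapping[str, str] = os.environ) -> int:
    """Return PROVIDER_MAX_CONCURRENCY as an int, or the default when unset."""
    raw = env.get("PROVIDER_MAX_CONCURRENCY")
    if raw is None:
        # Omitting the variable falls back to the default; None is never returned.
        return DEFAULT_MAX_CONCURRENCY
    try:
        return int(raw)
    except ValueError:
        # An empty or non-numeric value is rejected instead of meaning "unlimited".
        raise ValueError(
            f"PROVIDER_MAX_CONCURRENCY must be an integer, got {raw!r}"
        ) from None
```

With this sketch, `load_max_concurrency({})` returns the default, while `load_max_concurrency({"PROVIDER_MAX_CONCURRENCY": ""})` raises instead of silently disabling the limit.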

https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
2026-02-19 14:39:42 +00:00
Claude
afaf50a972
Add queue-level concurrency limit to provider streaming
Adds max_concurrency cap to GlobalRateLimiter using asyncio.Semaphore.
A request now waits for a concurrency slot before the sliding window rate
limit check, so at most N streams are open to the provider simultaneously,
even when the rate window would allow more.

Changes:
- providers/rate_limit.py: max_concurrency param, _concurrency_sem, concurrency_slot() asynccontextmanager
- providers/openai_compat.py: pass max_concurrency to limiter; wrap execute_with_retry + stream iteration in concurrency_slot()
- providers/base.py: max_concurrency field on ProviderConfig
- config/settings.py: provider_max_concurrency setting (PROVIDER_MAX_CONCURRENCY env var, default None = unlimited)
- api/dependencies.py: pass provider_max_concurrency into all three provider ProviderConfig instantiations
- .env.example: document PROVIDER_MAX_CONCURRENCY (commented out)
- tests/providers/test_provider_rate_limit.py: 5 new tests covering concurrency limit enforcement, slot release on exception, noop when unconfigured
- tests/api/test_dependencies.py: add provider_max_concurrency=None to mock settings helper
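
The concurrency cap described in this commit can be sketched roughly as follows. Only `asyncio.Semaphore`, `_concurrency_sem`, and `concurrency_slot()` come from the commit message; `ConcurrencyLimiter`, `fake_stream`, and `run_demo` are hypothetical names, and the `asyncio.sleep` call is a placeholder for the rate-limit check and the provider stream. It is an illustration under those assumptions, not the code in `providers/rate_limit.py`.

```python
import asyncio
from contextlib import asynccontextmanager


class ConcurrencyLimiter:
    """Hypothetical stand-in for the concurrency part of GlobalRateLimiter."""

    def __init__(self, max_concurrency: int) -> None:
        self._concurrency_sem = asyncio.Semaphore(max_concurrency)

    @asynccontextmanager
    async def concurrency_slot(self):
        # Wait until a slot is free; the slot is released on exit,
        # including when the body raises.
        async with self._concurrency_sem:
            yield


async def fake_stream(limiter: ConcurrencyLimiter, state: dict) -> None:
    async with limiter.concurrency_slot():
        state["open"] += 1
        state["peak"] = max(state["peak"], state["open"])
        # Placeholder for the rate-limit check and the streamed response.
        await asyncio.sleep(0.01)
        state["open"] -= 1


async def run_demo(n_requests: int = 6, max_concurrency: int = 2) -> int:
    """Start n_requests at once and return the peak number of open streams."""
    limiter = ConcurrencyLimiter(max_concurrency)
    state = {"open": 0, "peak": 0}
    await asyncio.gather(*(fake_stream(limiter, state) for _ in range(n_requests)))
    return state["peak"]
```

Running `asyncio.run(run_demo())` starts six requests at once, but the returned peak never exceeds the configured limit, because each request holds its slot for the whole lifetime of its stream.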

https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
2026-02-19 14:23:21 +00:00
Alishahryar1
b05d0d2703 new linter rules and fixes 2026-02-18 04:13:41 -08:00
Alishahryar1
539854fe7b Refactor done using GLM-5 2026-02-15 21:58:03 -08:00
Alishahryar1
0d292cd578 ci: enhance type checking in workflow and improve test coverage
- Added a step to fail the CI if any '# type: ignore' comments are found in Python files.
- Refactored tests to use mocking for better isolation and reliability.
- Updated type hints and casting in several files to improve type safety.
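
One way a check like the first item could be written is sketched below. The actual workflow step is not shown in this log and may well be a shell one-liner; `find_type_ignores` and `main` are hypothetical names, and the marker string is built in two parts only so that the script does not flag itself.

```python
from pathlib import Path

# Split so this file does not contain the literal marker it searches for.
MARKER = "# type:" + " ignore"


def find_type_ignores(root: Path) -> list[tuple[Path, int]]:
    """Return (file, line number) for every marker found in *.py files under root."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if MARKER in line:
                hits.append((path, lineno))
    return hits


def main(root: str = ".") -> int:
    """Print each hit and return a non-zero exit code if any were found."""
    hits = find_type_ignores(Path(root))
    for path, lineno in hits:
        print(f"{path}:{lineno}: found '{MARKER}'")
    return 1 if hits else 0
```

A CI step could pass the return value of `main()` to `sys.exit`, so that any hit fails the job.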
2026-02-14 23:01:11 -08:00
Alishahryar1
665e24e2db Migrated from token bucket rate limiter to sliding window rate limiter 2026-02-13 19:05:16 -08:00
Alishahryar1
fab66edcd3 Fixed rate limiting issues 2026-02-13 17:40:19 -08:00
Alishahryar1
fcbe204f44 Major refactor done with kimi-k2.5 in claude code 2026-02-05 10:51:33 -08:00
Alishahryar1
746089ecea Migrated provider ratelimiter to aiolimiter 2026-01-29 23:34:41 -08:00
Alishahryar1
b2b5a5a1d4 Fixed streaming bugs and added better logging 2026-01-29 21:40:20 -08:00
Alishahryar1
7a25c967b0 moved rate limiting to the api level 2026-01-29 18:59:25 -08:00