Commit graph

50 commits

Author SHA1 Message Date
Alishahryar1
ac2c37f613 Use canonical FCC server log path
2026-05-16 11:51:45 -07:00
Alishahryar1
a728994e29 Update default config and workspace paths 2026-05-16 11:36:53 -07:00
Siarhei Krotau
972bc1661c
feat(providers): add Z.ai Coding Plan provider (#440)
2026-05-13 12:02:15 -07:00
Rizki Kotet
32e2e4d755
feat(opencode): integrate OpenCode Zen provider and API key support (#426)
2026-05-12 08:44:54 -07:00
Alishahryar1
2637824a3f fix(openai): close async client with supported method 2026-05-10 22:30:23 -07:00
Alishahryar1
29e7714337 feat(logging): structured TRACE events and end-to-end request correlation
Add core/trace.py with trace_event, traced_async_stream, and payload snapshots.
Merge TRACE fields into JSON logs; promote claude_session_id, http path/method.
Instrument API, messaging/CLI, and OpenAI-compat/native provider paths.
Harden log sink with enqueue and stdlib intercept re-entrancy guard.
Document behavior in .env.example and README; extend tests.
2026-05-10 18:24:48 -07:00
Nox
1e97dff214
fix: handle disallowed special tokens in tiktoken encoder (#382)
Co-authored-by: Alishahryar1 <alishahryar2@gmail.com>
2026-05-10 17:30:24 -07:00
Alishahryar1
f9fc614563 Allow unauthenticated root probes 2026-05-10 16:03:19 -07:00
Alishahryar1
e386a3c8aa Improve admin UI setup flow 2026-05-10 15:57:56 -07:00
Alishahryar1
8ee72968ed Initial admin impl 2026-05-10 01:21:16 -07:00
Alishahryar1
5294661aa4 feat: add Wafer provider 2026-05-08 23:43:16 -07:00
Alishahryar1
d3a3b37e9e Filter OpenRouter model variants by thinking support
2026-04-30 22:01:36 -07:00
Alishahryar1
db3c9521b1 Add no-thinking model picker variants
2026-04-30 21:27:23 -07:00
Alishahryar1
72b34ad57c Added claude-code native model picker 2026-04-30 20:34:35 -07:00
Alishahryar1
d9040ce901 Report startup validation failures without tracebacks 2026-04-30 00:43:43 -07:00
Alishahryar1
85232a3ccb Log startup model validation failures clearly 2026-04-30 00:37:32 -07:00
Alishahryar1
eb5516e53b Validate configured models at startup 2026-04-30 00:33:45 -07:00
Alishahryar1
6297b48f81 feat(deepseek): use native Anthropic Messages transport
- Point DeepSeek at api.deepseek.com/anthropic with x-api-key headers
- Native request builder, DeepSeek-specific thinking/block sanitization
- Drop deepseek from OpenAI-chat server-tool preflight; update tests and docs
- Default smoke model deepseek-v4-pro; re-export dump_raw_messages_request
2026-04-26 12:03:21 -07:00
Alishahryar1
f3a7528d49 Major refactor: API, providers, messaging, and Anthropic protocol
Consolidates the incremental refactor work into a single change set: modular web tools (api/web_tools), native Anthropic request building and SSE block policy, OpenAI conversion and error handling, provider transports and rate limiting, messaging handler and tree queue, safe logging, smoke tests, and broad test coverage.
2026-04-26 03:01:14 -07:00
Wang Ji
b525217633
[feat] ollama method support (#129)
Support using Ollama the same way as LM Studio

---------

Co-authored-by: Alishahryar1 <alishahryar2@gmail.com>
Co-authored-by: u011436427 <u011436427@noreply.gitcode.com>
2026-04-25 22:06:36 -07:00
Alishahryar1
f29e693dc5 Add per-model thinking toggles 2026-04-25 20:51:07 -07:00
Alishahryar1
40951c145a refactor: drop legacy title-generation detection copy
Remove new-conversation-topic heuristic; keep sentence-case and JSON session
title patterns. Update unit and smoke E2E payloads accordingly.
2026-04-25 00:45:22 -07:00
Alishahryar1
080ebefc7b fix: detect Claude Code 2.1+ session title requests for optimization skip
Expand is_title_generation_request to match sentence-case/JSON title prompts
in addition to legacy new-conversation-topic copy. Add unit test for the
current session-title system text shape.
2026-04-25 00:44:25 -07:00
Alishahryar1
b926f60f64 feat: Anthropic web server tools, provider metadata, messaging hardening
- Add local web_search/web_fetch SSE handling and optional tool schemas
- Extend HeuristicToolParser for JSON-style WebFetch/WebSearch text
- Consolidate provider defaults, ids, and exception typing; stream contracts
- Messaging: typed options, voice config injection, platform contract cleanup
- Tests for web server tools, converters, parsers, contracts; ignore debug-*.log
2026-04-24 23:01:14 -07:00
Alishahryar1
0e3b2c24b4 refactor: remove OpenRouter rollback, shims, and redundant layers
- OpenRouter: native Anthropic only; remove chat_request and OPENROUTER_TRANSPORT
- Drop OpenAICompatibleProvider alias, api.request_utils, voice_pipeline facade
- Simplify OpenRouter SSE, generic reasoning in conversion, messaging dispatch
- Shared markdown table helpers; API optimization response helper; contract guards
- Restore PLAN.md; update docs and tests
2026-04-24 21:08:38 -07:00
Alishahryar1
26b8a29537 Architecture refactor: core anthropic, runtime, smoke tiers, remove providers.common 2026-04-24 20:03:14 -07:00
Alishahryar1
66ef23072c Refactor provider routing and smoke coverage 2026-04-24 19:34:34 -07:00
Alishahryar1
48b085950a Warn on inherited auth token
2026-04-24 00:42:33 -07:00
Alishahryar1
6f3d762a4f Revert "Add per-model thinking toggles"
This reverts commit 1f12a33dd7.
2026-04-24 00:26:15 -07:00
Alishahryar1
1f12a33dd7 Add per-model thinking toggles 2026-04-24 00:14:49 -07:00
arssing
2fe15bd2cd
feat: add proxy support for httpx clients (#125)
Add proxy support for providers based on
[doc](https://www.python-httpx.org/advanced/proxies/):

- Add per-provider proxy support (HTTP and SOCKS5) for all 4 providers:
nvidia_nim, open_router, lmstudio, llamacpp
- Each provider gets its own env var (NVIDIA_NIM_PROXY,
OPENROUTER_PROXY, LMSTUDIO_PROXY, LLAMACPP_PROXY) for independent proxy
configuration

---------

Co-authored-by: Alishahryar1 <alishahryar2@gmail.com>
2026-04-22 17:06:16 -07:00
Pavel Yurchenko
e719e4aed2
feat: deepseek api support (#118)
## Summary

* add native DeepSeek provider support via the shared OpenAI-compatible
provider base
* allow `deepseek/...` model prefixes in config validation
* add `DEEPSEEK_API_KEY` and `DEEPSEEK_BASE_URL` settings
* add DeepSeek entries to `.env.example` and `config/env.example`
* implement `DeepSeekProvider` and register it in provider dependencies
* add a DeepSeek request builder with DeepSeek-specific thinking payload
handling
* preserve Anthropic thinking blocks as `reasoning_content` for
DeepSeek-compatible continuation flows
* update `claude-pick` to discover DeepSeek models from the DeepSeek API
* document DeepSeek usage in `README.md`
* add tests for config validation, provider dependency wiring, request
building, and streaming behavior

## Motivation

DeepSeek exposes an OpenAI-compatible API and can be used directly
without routing through OpenRouter. This lets users spend their existing
DeepSeek balance through the proxy while keeping the same Claude Code
workflow and per-model provider mapping.

## Example

```dotenv
DEEPSEEK_API_KEY="sk-..."
DEEPSEEK_BASE_URL="https://api.deepseek.com"

MODEL_OPUS="deepseek/deepseek-reasoner"
MODEL_SONNET="deepseek/deepseek-chat"
MODEL_HAIKU="deepseek/deepseek-chat"
MODEL="deepseek/deepseek-chat"
```

---------

Co-authored-by: Alishahryar1 <alishahryar2@gmail.com>
2026-04-22 17:06:01 -07:00
Alishahryar1
835d0454e8 Fixes for issues 113 and 116 2026-04-18 16:32:31 -07:00
th-ch
f703a0e403
Implement optional authentication (Anthropic style) (#80)
2026-03-27 11:11:47 -07:00
Alishahryar1
87d8ce1196 feat(lmstudio): route natively to Anthropic /v1/messages endpoint
- Rewrites LMStudioProvider to inherit from BaseProvider
- Passes requests natively to /v1/messages using httpx instead of AsyncOpenAI
- Auto-translates internal ThinkingConfig to Anthropic schema
- Updates .env.example with model routing instructions
- Adjusts test suite for new native integration
2026-03-08 08:17:05 -07:00
Alishahryar1
a7d88d5cbd Updated README with per-model mapping, fixed test .env isolation 2026-03-01 21:52:35 -08:00
Ali Khokhar
0b324e0421
Per claude model mapping (#66) 2026-03-01 21:32:23 -08:00
Alishahryar1
34757511a0 Improve deterministic error surfacing across stream and API 2026-03-01 01:32:52 -08:00
Alishahryar1
7f2612d2df Added optimization logging 2026-03-01 01:02:59 -08:00
Ali Khokhar
c4d8681000
Backup/before cleanup 20260222 230402 (#58) 2026-02-27 19:50:21 -08:00
Alishahryar1
d6a0e1a401 Provider inferred from model name using prefix 2026-02-19 20:53:02 -08:00
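The prefix-based routing named in this commit title could look roughly like the sketch below. It is an illustration only: the helper name `split_model`, the `KNOWN_PROVIDERS` set, and the error wording are assumptions, and the provider ids are borrowed from names that appear elsewhere in this log rather than from the commit itself.

```python
# Illustrative provider ids; the real registry at this commit is not shown in the log.
KNOWN_PROVIDERS = {"nvidia_nim", "open_router", "lmstudio", "llamacpp"}


def split_model(model: str) -> tuple[str, str]:
    """Split 'provider/model-name' on the first slash and validate the prefix."""
    provider, sep, name = model.partition("/")
    if not sep or not name:
        raise ValueError(f"model {model!r} must look like '<provider>/<model-name>'")
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider prefix {provider!r} in {model!r}")
    return provider, name
```

Splitting on the first slash only keeps model names that themselves contain slashes intact, for example `open_router/vendor/model`.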
Alishahryar1
21959b6189 lint 2026-02-19 20:40:05 -08:00
Alishahryar1
0c8d59e33e Removed deprecated modules and updated imports 2026-02-19 20:38:11 -08:00
Claude
45b7e4cafd
Make PROVIDER_MAX_CONCURRENCY required with default of 5
- `max_concurrency` is now always an `int` (default 5) — `None`/unlimited
  is no longer a valid state; omitting the env var uses the default
- `GlobalRateLimiter`: semaphore is always created; `concurrency_slot()`
  no longer has None guards; log message always includes concurrency
- `ProviderConfig.max_concurrency`: `int = 5` (was `int | None = None`)
- `Settings.provider_max_concurrency`: `int = Field(default=5, ...)` —
  setting env var to an invalid value (e.g. empty string) raises
- `.env.example`: uncommented `PROVIDER_MAX_CONCURRENCY=5`
- README: updated config table default from `—` to `5`
- Tests: removed `test_concurrency_slot_noop_when_not_configured`;
  updated mock settings to use `5` instead of `None`

https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
2026-02-19 14:39:42 +00:00
Claude
99f99fce90
Remove max_cli_sessions — CLI session pool is now unbounded
The max_sessions cap in CLISessionManager was the only thing enforcing
a limit on concurrent CLI processes. Now that provider concurrency is
controlled at the streaming layer (PROVIDER_MAX_CONCURRENCY semaphore),
the CLI session pool cap is redundant and removed entirely.

Changes:
- cli/manager.py: remove max_sessions param, cap check, _cleanup_idle_sessions_unlocked, max_sessions from get_stats()
- config/settings.py: remove max_cli_sessions field
- api/app.py: remove max_sessions=settings.max_cli_sessions from CLISessionManager constructor
- messaging/handler.py: remove "Waiting for slot" status check; stats display no longer shows Max CLI
- .env.example: remove MAX_CLI_SESSIONS line
- tests/cli/test_cli.py: remove max_sessions args and assertion from manager tests
- tests/cli/test_cli_manager_edge_cases.py: remove two tests for cap/cleanup behavior
- tests/api/test_app_lifespan_and_errors.py: remove max_cli_sessions from all SimpleNamespace settings
- tests/config/test_config.py: remove max_cli_sessions isinstance assertion
- tests/conftest.py: remove max_sessions from mock stats
- tests/messaging/test_handler.py: merge slot/capacity tests into single new-conversation test; remove Max CLI assertion from stats test
- tests/messaging/test_handler_markdown_and_status_edges.py: remove "Waiting for slot" assertion; drop max_sessions from all stats mocks

https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
2026-02-19 14:31:47 +00:00
Claude
afaf50a972
Add queue-level concurrency limit to provider streaming
Adds max_concurrency cap to GlobalRateLimiter using asyncio.Semaphore.
A request now waits for a concurrency slot before the sliding window rate
limit check, so at most N streams are open to the provider simultaneously,
even when the rate window would allow more.

Changes:
- providers/rate_limit.py: max_concurrency param, _concurrency_sem, concurrency_slot() asynccontextmanager
- providers/openai_compat.py: pass max_concurrency to limiter; wrap execute_with_retry + stream iteration in concurrency_slot()
- providers/base.py: max_concurrency field on ProviderConfig
- config/settings.py: provider_max_concurrency setting (PROVIDER_MAX_CONCURRENCY env var, default None = unlimited)
- api/dependencies.py: pass provider_max_concurrency into all three provider ProviderConfig instantiations
- .env.example: document PROVIDER_MAX_CONCURRENCY (commented out)
- tests/providers/test_provider_rate_limit.py: 5 new tests covering concurrency limit enforcement, slot release on exception, noop when unconfigured
- tests/api/test_dependencies.py: add provider_max_concurrency=None to mock settings helper

https://claude.ai/code/session_014mrF1WMNgmNjtPBuoQHsbg
2026-02-19 14:23:21 +00:00
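A minimal sketch of the concurrency cap this commit describes, assuming a semaphore-backed async context manager. The class name `ConcurrencyLimiter`, the `active`/`peak` counters, and the demo helpers are hypothetical and exist only to make the behavior observable; the project's `GlobalRateLimiter` also applies a sliding-window rate limit, which is omitted here.

```python
import asyncio
from contextlib import asynccontextmanager


class ConcurrencyLimiter:
    """Hypothetical stand-in for the limiter described above, not the project's class."""

    def __init__(self, max_concurrency: int) -> None:
        self._sem = asyncio.Semaphore(max_concurrency)
        self.active = 0  # streams currently holding a slot
        self.peak = 0    # highest number of simultaneous holders seen

    @asynccontextmanager
    async def concurrency_slot(self):
        # Wait for a free slot; the slot is released even if the body raises.
        async with self._sem:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                yield
            finally:
                self.active -= 1


async def fake_stream(limiter: ConcurrencyLimiter) -> None:
    # Simulates one provider stream that stays open for a short time.
    async with limiter.concurrency_slot():
        await asyncio.sleep(0.01)


async def run_demo(n_requests: int = 10, max_concurrency: int = 3) -> int:
    limiter = ConcurrencyLimiter(max_concurrency)
    await asyncio.gather(*(fake_stream(limiter) for _ in range(n_requests)))
    return limiter.peak
```

Calling `asyncio.run(run_demo())` starts ten simulated streams, but the recorded peak never exceeds the configured cap.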
Alishahryar1
e7ac85264f Improved optimizations to decrease llm calls further and increase throughput 2026-02-18 17:54:41 -08:00
Alishahryar1
b05d0d2703 new linter rules and fixes 2026-02-18 04:13:41 -08:00
Cursor Agent
e9beb28897 fix: validate API keys at provider init to prevent 403 'authorization missing'
When NVIDIA_NIM_API_KEY or OPENROUTER_API_KEY is empty or not set,
the proxy forwarded requests without a valid Authorization header,
causing providers to return 403 with 'Header of type authorization
was missing'.

Now fail fast with HTTP 503 and a clear message telling users to add
the key to .env, with links to obtain keys.

Fixes #29

Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
2026-02-17 07:33:56 +00:00
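The fail-fast check described in this commit might be sketched as follows. `ProviderAuthError` and `require_api_key` are hypothetical names, and the message text is illustrative; the only details taken from the commit are the empty-or-unset condition, the 503 status, and the pointer to `.env`.

```python
from typing import Optional


class ProviderAuthError(Exception):
    """Hypothetical error carrying the HTTP status the API layer should return."""

    def __init__(self, message: str, status_code: int = 503) -> None:
        super().__init__(message)
        self.status_code = status_code


def require_api_key(env_var: str, value: Optional[str]) -> str:
    """Return the stripped key, or raise before any upstream request is made."""
    key = (value or "").strip()
    if not key:
        raise ProviderAuthError(f"{env_var} is not set. Add it to your .env file.")
    return key
```

Running the check at provider initialization means a missing key surfaces as a clear local error instead of an opaque 403 from the upstream provider.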
Cursor Agent
4b4f87515d Phase 7: Directory restructuring (messaging/ and tests/)
- Create messaging/platforms/ (base, discord, telegram, factory)
- Create messaging/rendering/ (discord_markdown, telegram_markdown)
- Create messaging/trees/ (data, repository, processor, queue_manager)
- Organize tests/ into api/, providers/, messaging/, cli/, config/
- Add backward-compatible re-exports at old locations
- Update handler.py and test_messaging_factory.py imports
- Fix Telegram type hints for TELEGRAM_AVAILABLE=False case
- Fix Python 3 except syntax in discord_markdown

Co-authored-by: Ali Khokhar <alishahryar2@gmail.com>
2026-02-17 02:25:42 +00:00