The edit prediction, web search and completions endpoints in Cloud all
use tokens called LlmApiToken. These were independently created, cached,
and refreshed in three places: the cloud language model provider, the
edit prediction store, and the cloud web search provider. Each held its
own LlmApiToken instance, meaning three separate requests to get these
tokens at startup / login and three redundant refreshes whenever the
server signaled a token update was needed.
We already had a global singleton reacting to the refresh signals:
RefreshLlmTokenListener. It now holds a single LlmApiToken that all
three services use, performs the refresh itself, and emits
RefreshLlmTokenEvent only after the token is fresh. That event is used
by the language model provider to re-fetch models after a refresh. The
singleton is accessed only through `LlmApiToken::global()`.
I have tested this manually, and it token acquisition and usage appear
to be working fine.
Edit: I've tested it with a long running session, and refresh seems to
be working fine too.
Release Notes:
- N/A
---------
Co-authored-by: Marshall Bowers <git@maxdeviant.com>
Emit client-side organization changed events through
`RefreshLlmTokenListener` so it produces the same `RefreshLlmTokenEvent`
used for server-pushed `UserUpdated` messages.
This keeps token refresh fan-out in one place.
Closes CLO-383.
Release Notes:
- N/A
---------
Co-authored-by: Tom Houlé <tom@tomhoule.com>
This is already expected on the cloud side. This lets us know under
which organization the user is logged in when requesting an llm_api
token.
Closes CLO-337
Release Notes:
- N/A
This is a staff only toggle for now, since the consequences of
activating it are not obvious and quite dire (tokens costs 6 times
more).
Also, persist thinking, thinking effort and fast mode in DbThread so the
thinking mode toggle and thinking effort are persisted.
Release Notes:
- Agent: The thinking mode toggle and thinking effort are now persisted
when selecting a thread from history.
Before you mark this PR as ready for review, make sure that you have:
- [x] Added a solid test coverage and/or screenshots from doing manual
testing
- [x] Done a self-review taking into account security and performance
aspects
- [x] Aligned any UI changes with the [UI
checklist](https://github.com/zed-industries/zed/blob/main/CONTRIBUTING.md#uiux-checklist)
Release Notes:
- N/A
---------
Co-authored-by: Zed Zippy <234243425+zed-zippy[bot]@users.noreply.github.com>
This PR makes it so the user gets signed out upon receiving an
Unauthorized response when acquiring an LLM token.
This is a re-landing of #49661.
Closes CLO-324.
Release Notes:
- N/A
### Description
Related Discussions: #44499, #35742, #31851
Display cost multiplier for GitHub Copilot models in the model selectors
(Both in Chat Panel and Inline Assistant)
<img width="436" height="800" alt="image"
src="https://github.com/user-attachments/assets/c9ebd8fa-4d55-4be8-b3e1-f46dbf1f0145"
/>
### Some technical notes
Although this PR's primary intent is to show the cost multiplier for
GitHub Copilot models alone, I have included some necessary plumbing to
allow specifying costs for other providers in future. I have introduced
an enum called `LanguageModelCostInfo` for showing cost in different
ways for different models. Now, this enum is used in `LanguageModel`
trait to get the cost info.
For now to begin with, in `LanguageModelCostInfo`, I have specified two
ways of pricing: Request-based (1 Agent request - GitHub Copilot uses
this) and Token-based (1M Input tokens / 1M Output tokens). I had
initially thought about adding a `Free` type, especially for Ollama but
didn't do it after realizing that Ollama has paid plans. Right now, only
the Request-based pricing is implemented and used for Copilot models.
Feel free to suggest changes on how to improve this design better.
Release Notes:
- Show cost multiplier for GitHub Copilot models
---------
Co-authored-by: Danilo Leal <daniloleal09@gmail.com>
Add StreamEnded variant so the client can distinguish between a stream
that the cloud ran to completion versus one that was interrupted (see
CLO-258). **That logic is to be added in a follow up PR**.
Add an Unknown fallback with #[serde(other)] for forward-compatible
deserialization of future variants.
The client advertises support via a new
x-zed-client-supports-stream-ended-request-completion-status header. The
server will only send the new variant if that header is passed. Both
StreamEnded and Unknown are silently ignored at the event mapping layer
(from_completion_request_status returns Ok(None)).
Part of CLO-264 and CLO-266; cloud-side changes to follow.
Release Notes:
- N/A
---------
Co-authored-by: Marshall Bowers <git@maxdeviant.com>
TODO:
- [x] Review code
- [x] Decide whether to keep ignored API tests
Release Notes:
- Fixed a bug where cancelling a thread mid-thought would cause further
anthropic requests to fail
- Fixed a bug where the model configured on a thread would not be
persisted alongside that thread
This PR updates the model selector to highlight the latest models that
are available through the Zed provider:
<img width="388" height="477" alt="Screenshot 2026-02-06 at 1 46 41 PM"
src="https://github.com/user-attachments/assets/70760399-ecf6-46e3-80a7-cb998216c192"
/>
Closes CLO-205.
Release Notes:
- Added a "Latest" indicator to highlight the latest models available
through the Zed provider.
This PR adds a new `supported_effort_levels` method to the
`LanguageModel` trait.
This can be used to retrieve the list of effort levels that the model
supports, which will eventually be used to drive the UI for selecting
the thinking effort.
Right now this list will only be populated for Cloud models.
Release Notes:
- N/A
This PR adds support for refreshing LLM tokens that are "outdated"—that
is, that are missing some required claims.
Release Notes:
- Fixed some instances of authentication errors with the Zed API that
could be resolved automatically by refreshing the token.
This PR adds a new `needs_llm_token_refresh` helper method for checking
if the LLM token needs to be refreshed.
We were duplicating the check for the `x-zed-expired-token` header in a
number of spots, and it will be gaining an additional case soon.
Release Notes:
- N/A
The rate limiter's semaphore guard was being held for the entire
duration of a turn, including during tool execution. This caused
deadlocks when subagents tried to acquire permits while parent requests
were waiting for them to complete.
## The Problem
In `run_turn_internal`, the stream (which contains the `RateLimitGuard`
holding the semaphore permit) was kept alive throughout the entire loop
iteration - including during **tool execution**:
1. Parent request acquires permit
2. Parent starts streaming, consumes response
3. Parent starts executing tools (subagents)
4. **Stream/guard still held** while tools execute
5. Subagents try to acquire permits → blocked because parent still holds
permit
6. Deadlock if all permits are held by parents waiting for subagent
children
## The Fix
Two changes were made:
1. **Drop the stream early**: Added an explicit `drop(events)` after the
stream is fully consumed but before tool execution begins. This releases
the rate limit permit so subagents can acquire it.
2. **Removed the `bypass_rate_limit` workaround**: Since the root cause
is now fixed, the bypass mechanism is no longer needed.
Note: no release notes because subagents are still feature-flagged, and
this rate limiting change isn't actually observable without them.
Release Notes:
- N/A
This PR adds a thinking toggle for controlling whether to use thinking
for a model in the Zed provider:
<img width="645" height="142" alt="Screenshot 2026-01-22 at 12 34 01 PM"
src="https://github.com/user-attachments/assets/9aa543fe-e708-4840-8b38-1a6fbcb78388"
/>
Previously we would create separate "Thinking" variants of the models
that supported thinking in the model selector.
This only applies to Anthropic models in the Zed provider, currently.
This is gated behind the `cloud-thinking-toggle` feature flag.
Release Notes:
- N/A
---------
Co-authored-by: Neel <neel@zed.dev>
## Problem
When subagents use the `edit_file` tool, it creates an `EditAgent` that
makes its own model request to get the edit instructions. These "nested"
requests compete with the parent subagent conversation requests for rate
limiter permits.
The rate limiter uses a semaphore with a limit of 4 concurrent requests
per model instance. When multiple subagents run in parallel:
1. 3 subagents each hold 1 permit for their ongoing conversation streams
(3 permits used)
2. When all 3 try to use `edit_file` simultaneously, their edit agents
need permits too
3. Only 1 edit agent can get the 4th permit; the other 2 block waiting
4. The blocked edit agents can't complete, so their parent subagent
conversations can't complete
5. The parent conversations hold their permits, so the blocked edit
agents stay blocked
6. **Deadlock**
## Solution
Added a `bypass_rate_limit` field to `LanguageModelRequest`. When set to
`true`, the request skips the rate limiter semaphore entirely. The
`EditAgent` sets this flag because its requests are already "part of" a
rate-limited parent request.
(No release notes because subagents are still feature-flagged.)
Release Notes:
- N/A
---------
Co-authored-by: Zed Zippy <234243425+zed-zippy[bot]@users.noreply.github.com>
This PR removes the code for the legacy plans.
No more users will be on this plan as of January 17th, so it's fine to
land these changes now (as they won't be released until the 21st).
Closes CLO-76.
Release Notes:
- N/A
This feature cost $15.
Up -> Tokens we're sending to the model
Down -> Tokens we've received from the model.
<img width="377" height="69" alt="Screenshot 2026-01-14 at 12 31 01 PM"
src="https://github.com/user-attachments/assets/fc15824f-de5d-466b-8cc1-329f3c1940bb"
/>
Release Notes:
- Changed the display of tokens for OpenAI models to reflect the
input/output limits.
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
## Problem
When users paste or drag large images into the agent panel, the encoded
payload can exceed upstream provider limits (e.g., Anthropic's 5MB
per-image limit), causing API errors.
## Solution
Enforce a default 5MB limit on encoded PNG bytes in
`LanguageModelImage::from_image`:
1. Apply existing Anthropic dimension limits first (1568px max in either
dimension)
2. Iteratively downscale by ~15% per pass until the encoded PNG is under
5MB
3. Return `None` if the image can't be shrunk within 8 passes
(fail-safe)
The limit is enforced at the `LanguageModelImage` conversion layer,
which is the choke point for all image ingestion paths (agent panel
paste/drag, file mentions, text threads, etc.).
## Future Work
The 5MB limit is a conservative default. Provider-specific limits can be
introduced later by adding a `from_image_with_constraints` API.
## Testing
Added a regression test that:
1. Generates a noisy 4096x4096 PNG (guaranteed >5MB)
2. Converts it via `LanguageModelImage::from_image`
3. Asserts the result is ≤5MB and was actually downscaled
---
**Note:** This PR builds on #45312 (prompt store fail-open fix). Please
merge that first.
cc @rtfeldman
---------
Co-authored-by: Zed Zippy <234243425+zed-zippy[bot]@users.noreply.github.com>
- Edit prediction providers can now be configured through the settings
UI
- Cleaned up the status bar menu to only show _configured_ providers
- Added to the status bar icon button tooltip the name of the active
provider
- Only display the data collection functionality under "Privacy" for the
Zed models
- Moved the Codestral edit prediction provider out of the Mistral
section in the agent panel into the settings UI
- Refined and improved UI and states for configuring GitHub Copilot as
both an agent and edit prediction provider
#### Todos before merge:
- [x] UI: Unify with settings UI style and tidy it all up
- [x] Unify Copilot modal `impl`s to use separate window
- [x] Remove stop light icons from GitHub modal
- [x] Make dismiss events work on GitHub modal
- [ ] Investigate workarounds to tell if Copilot authenticated even when
LSP not running
Release Notes:
- settings_ui: Added a section for configuring edit prediction providers
under AI > Edit Predictions, including Codestral and GitHub Copilot.
Once you've updated you can use the following link to open it:
zed://settings/edit_predictions.providers
---------
Co-authored-by: Ben Kunkle <ben@zed.dev>
This PR simplifies error and event handling by removing the
`Ok(LanguageModelCompletionEvent::Status(CompletionRequestStatus::Failed)))`
state from the stream returned by `LanguageModel::stream_completion()`,
by changing it into an `Err(LanguageModelCompletionError)`. This was
done by collapsing the valid `CompletionRequestStatus` values into
`LanguageModelCompletionEvent`.
Release Notes:
- N/A
---------
Co-authored-by: Michael Benfield <mbenfield@zed.dev>
This PR makes the description in the callout that display general errors
in the agent panel be rendered as markdown. This allow us to pass URLs
to these error strings that will be clickable, improving the overall
interaction with them. Here's an example:
<img width="500" height="396" alt="Screenshot 2025-11-14 at 11 43@2x"
src="https://github.com/user-attachments/assets/f4fc629a-6314-4da1-8c19-b60e1a09653b"
/>
Release Notes:
- agent: Improved the interaction with errors by allowing links to be
clickable.
Automatically retry the agent's LLM completion requests when the
provider returns 429 Too Many Requests. Uses the Retry-After header to
determine the retry delay if it is available.
Many providers are frequently overloaded or have low rate limits. These
providers are essentially unusable without automatic retries.
Tested with Cerebras configured via openai_compatible.
Related: #31531
Release Notes:
- Added automatic retries for OpenAI-compatible LLM providers
---------
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
Since we removed the filtering step during context gathering, we want
the model to perform more targeted searches. This PR tweaks search tool
schema allowing the model to search within syntax nodes such as `impl`
blocks or methods.
This is what the query schema looks like now:
```rust
/// Search for relevant code by path, syntax hierarchy, and content.
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
pub struct SearchToolQuery {
/// 1. A glob pattern to match file paths in the codebase to search in.
pub glob: String,
/// 2. Regular expressions to match syntax nodes **by their first line** and hierarchy.
///
/// Subsequent regexes match nodes within the full content of the nodes matched by the previous regexes.
///
/// Example: Searching for a `User` class
/// ["class\s+User"]
///
/// Example: Searching for a `get_full_name` method under a `User` class
/// ["class\s+User", "def\sget_full_name"]
///
/// Skip this field to match on content alone.
#[schemars(length(max = 3))]
#[serde(default)]
pub syntax_node: Vec<String>,
/// 3. An optional regular expression to match the final content that should appear in the results.
///
/// - Content will be matched within all lines of the matched syntax nodes.
/// - If syntax node regexes are provided, this field can be skipped to include as much of the node itself as possible.
/// - If no syntax node regexes are provided, the content will be matched within the entire file.
pub content: Option<String>,
}
```
We'll need to keep refining this, but the core implementation is ready.
Release Notes:
- N/A
---------
Co-authored-by: Ben <ben@zed.dev>
Co-authored-by: Max <max@zed.dev>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
I noticed we had some typos that were getting through CI, but it looks
like the new version of `typos` catches them. So I updated it and fixed
them.
Release Notes:
- N/A
We've been considering removing workspace-hack for a couple reasons:
- Lukas ran into a situation where its build script seemed to be causing
spurious rebuilds. This seems more likely to be a cargo bug than an
issue with workspace-hack itself (given that it has an empty build
script), but we don't necessarily want to take the time to hunt that
down right now.
- Marshall mentioned hakari interacts poorly with automated crate
updates (in our case provided by rennovate) because you'd need to have
`cargo hakari generate && cargo hakari manage-deps` after their changes
and we prefer to not have actions that make commits.
Currently removing workspace-hack causes our workspace to grow from
~1700 to ~2000 crates being built (depending on platform), which is
mainly a problem when you're building the whole workspace or running
tests across the the normal and remote binaries (which is where
feature-unification nets us the most sharing). It doesn't impact
incremental times noticeably when you're just iterating on `-p zed`, and
we'll hopefully get these savings back in the future when
rust-lang/cargo#14774 (which re-implements the functionality of hakari)
is finished.
Release Notes:
- N/A
Release Notes:
- Added Codestral edit predictions provider which can be enabled by adding an API key in the Mistral section of agent settings.

## Config
Get API key from https://console.mistral.ai/codestral and add it in the Mistral section of the agent settings.
```
"features": {
"edit_prediction_provider": "codestral"
},
"edit_predictions": {
"codestral": {
"model": "codestral-latest",
"max_tokens": 150
}
},
```
---------
Co-authored-by: Michael Sloan <michael@zed.dev>
Closes#5355
Release Notes:
- Fixed rendering glitches with files with more than 16 million lines
(that occured due to floating number rounding errors).
---------
Co-authored-by: Smit Barmase <heysmitbarmase@gmail.com>
Co-Authored-By: Ben K <ben@zed.dev>
Co-Authored-By: Anthony <anthony@zed.dev>
Co-Authored-By: Mikayla <mikayla@zed.dev>
Release Notes:
- settings: Major internal changes to settings. The primary user-facing
effect is that some settings which did not make sense in project
settings files are no-longer read from there. (For example the inline
blame settings)
---------
Co-authored-by: Ben Kunkle <ben@zed.dev>
Co-authored-by: Mikayla Maki <mikayla.c.maki@gmail.com>
Co-authored-by: Anthony <anthony@zed.dev>
In zed logs you can see these logs of lmstudio connection refused.
Currently zed connects to lmstudio by default as there is no credential
mechanism to check if the user has enabled lmstudio previously or not
like we do with other providers using api keys.
This pr removes the below annoying log and makes the zed logs less
polluted.
```
2025-09-01T02:11:33+05:30 ERROR [language_models] Other(error sending request for url (http://localhost:1234/api/v0/models)
Caused by:
0: client error (Connect)
1: tcp connect error: Connection refused (os error 61)
2: Connection refused (os error 61))
```
Release Notes:
- N/A
---------
Signed-off-by: Umesh Yadav <git@umesh.dev>
This PR fixes the backwards compatibility of the new `Plan` variants.
We can't add new variants to the wire representation, as old clients
won't be able to understand them.
Release Notes:
- N/A