Self-Review Checklist:
- [x] I've reviewed my own diff for quality, security, and reliability
- [x] Unsafe blocks (if any) have justifying comments
- [x] The content is consistent with the [UI/UX
checklist](https://github.com/zed-industries/zed/blob/main/CONTRIBUTING.md#uiux-checklist)
- [x] Tests cover the new/changed behavior
- [x] Performance impact has been considered and is acceptable
Release Notes:
- N/A
#56472 broke Copilot chat:
> Failed to connect to API: 400 Bad Request {"message":"cache_control:
Extra inputs are not permitted"}
This PR keeps the legacy caching approach for Copilot.
Release Notes:
- N/A
Switches the Anthropic provider from hand-stamping `cache_control` onto
the last message content block over to Anthropic's top-level automatic
prompt caching, paired with explicit long-TTL (1h) anchors on the last
tool definition and on the system prompt.
The prefix order `tools` → `system` → `messages` satisfies Anthropic's
requirement that longer TTLs appear earlier in the prefix, so the static
prefix is cached for 1h (surviving idle gaps longer than the 5-minute
default) while the rapidly-changing conversation tail uses the
free-to-refresh 5-minute TTL via the top-level automatic breakpoint.
Three of the four available cache breakpoints are used (last tool,
system, automatic conversation), leaving one in reserve.
As a side benefit, this fixes a latent issue where the previous stamping
loop could place `cache_control` on a `Thinking` content block, which
the Anthropic API does not allow. Automatic caching is documented to
walk past ineligible blocks (including thinking) when selecting its
breakpoint, so we now delegate that responsibility to the server.
The new shape we send (when caching is enabled):
```json
{
"tools": [{ "...": "...", "cache_control": {"type": "ephemeral", "ttl": "1h"} }],
"system": [
{"type": "text", "text": "...", "cache_control": {"type": "ephemeral", "ttl": "1h"}}
],
"messages": [ /* no per-block cache_control */ ],
"cache_control": {"type": "ephemeral"}
}
```
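A minimal Rust sketch of how the three anchors described above might be placed. The types here (`CacheTtl`, `CacheControl`, `Tool`, `SystemBlock`, `Request`) and the helpers `apply_caching` and `breakpoints_used` are hypothetical stand-ins for illustration, not the `anthropic` crate's actual definitions.

```rust
/// Hypothetical TTL marker; the 5-minute default is expressed by omitting `ttl`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CacheTtl {
    OneHour,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CacheControl {
    pub ttl: Option<CacheTtl>,
}

#[derive(Debug, Default)]
pub struct Tool {
    pub name: String,
    pub cache_control: Option<CacheControl>,
}

#[derive(Debug, Default)]
pub struct SystemBlock {
    pub text: String,
    pub cache_control: Option<CacheControl>,
}

#[derive(Debug, Default)]
pub struct Request {
    pub tools: Vec<Tool>,
    pub system: Vec<SystemBlock>,
    /// Messages carry no per-block `cache_control` in the new shape.
    pub messages: Vec<String>,
    /// Top-level automatic breakpoint for the conversation tail.
    pub cache_control: Option<CacheControl>,
}

/// Stamp the 1h anchors on the last tool and the last system block,
/// then enable the top-level automatic breakpoint (default TTL).
pub fn apply_caching(request: &mut Request) {
    let long_lived = CacheControl { ttl: Some(CacheTtl::OneHour) };
    if let Some(tool) = request.tools.last_mut() {
        tool.cache_control = Some(long_lived);
    }
    if let Some(block) = request.system.last_mut() {
        block.cache_control = Some(long_lived);
    }
    request.cache_control = Some(CacheControl { ttl: None });
}

/// Count how many of the four available breakpoints the request uses.
pub fn breakpoints_used(request: &Request) -> usize {
    let tools = request.tools.iter().filter(|t| t.cache_control.is_some()).count();
    let system = request.system.iter().filter(|b| b.cache_control.is_some()).count();
    tools + system + usize::from(request.cache_control.is_some())
}
```

With two tools and one system block, only the last tool and the system block receive the 1h anchor, and `breakpoints_used` reports three.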
Release Notes:
- Improved Anthropic prompt cache utilization, reducing latency and cost
for ongoing conversations
---------
Co-authored-by: Martin Ye <martinye022@gmail.com>
The most compelling reason to make this change is that we don't have to
ship a new Zed binary when Anthropic releases a new model.
Self-Review Checklist:
- [x] I've reviewed my own diff for quality, security, and reliability
- [x] Unsafe blocks (if any) have justifying comments
- [x] The content is consistent with the [UI/UX
checklist](https://github.com/zed-industries/zed/blob/main/CONTRIBUTING.md#uiux-checklist)
- [x] Tests cover the new/changed behavior
- [x] Performance impact has been considered and is acceptable
Release Notes:
- anthropic: Dynamically fetch available models from Anthropic API
---------
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
Self-Review Checklist:
- [x] I've reviewed my own diff for quality, security, and reliability
- [x] Unsafe blocks (if any) have justifying comments
- [x] The content is consistent with the [UI/UX
checklist](https://github.com/zed-industries/zed/blob/main/CONTRIBUTING.md#uiux-checklist)
- [x] Tests cover the new/changed behavior
- [x] Performance impact has been considered and is acceptable
Closes #ISSUE
Release Notes:
- N/A
No, sadly, the title is not a typo. See
https://www.githubstatus.com/incidents/zsg1lk7w13cf for the context.
I'll read with joy and popcorn through that root cause analysis.
It makes literally zero sense what happened here, but for some completely
bonkers reason GitHub completely messed up the merge queue with
https://github.com/zed-industries/zed/pull/54632.
I have no idea how it happened. It makes literally zero sense. A PR
going into the merge queue should have the same LoC when getting out of
it. GitHub obviously does not check this. GitHub causes extra work with
a feature that is supposed to save time.
Thanks, I guess.
Release Notes:
- N/A
---------
Co-authored-by: Danilo Leal <daniloleal09@gmail.com>
This PR brings back the button to filter remote branches when accessing
the title bar's branch picker with the mouse. It was unintentionally
removed when we introduced the new worktree picker.
Release Notes:
- N/A
Starting with Claude Opus 4.7, Anthropic omits thinking content from
responses by default; callers must pass `display: "summarized"` to keep
seeing thinking summaries. Without opting in, the agent UI shows a long
pause with no visible thinking, and users get no progress indication
during extended reasoning.
This extends the adaptive-thinking wire type with an optional `display`
field and requests `Summarized` from every call site that builds an
adaptive thinking request (direct Anthropic, Copilot Chat proxy, Zed
Cloud, and Bedrock).
## Notes
- Applied at the adaptive-thinking layer rather than special-casing Opus
4.7. The `display` parameter is accepted by every
adaptive-thinking-capable model, and the previous behavior (visible
summaries) is what users already see on Opus 4.6 / Sonnet 4.6, so there
is no behavior change for those models.
Release Notes:
- Restored thinking summaries for Claude Opus 4.7.
Drop the `count_tokens` API and related implementations across
providers, and remove the unused `tiktoken-rs` dependency.
I was going to update the dependency because they finally released a fix
we needed. But then I realized we only used this API in one place, the
Rules library. And for most models it would have been wildly incorrect
because we use tiktoken, i.e. OpenAI tokenizers, for almost every model,
which is going to give incorrect results.
Given that, I just removed these because the difference in how we get
these has caused plenty of confusion in the past.
Self-Review Checklist:
- [x] I've reviewed my own diff for quality, security, and reliability
- [x] Unsafe blocks (if any) have justifying comments
- [x] The content is consistent with the [UI/UX
checklist](https://github.com/zed-industries/zed/blob/main/CONTRIBUTING.md#uiux-checklist)
- [x] Tests cover the new/changed behavior
- [x] Performance impact has been considered and is acceptable
Release Notes:
- N/A
<img width="767" height="428" alt="Screenshot 2026-04-16 at 11 29 13 AM"
src="https://github.com/user-attachments/assets/e8b450fa-aefc-4dec-a286-b211bd492011"
/>
Add Claude Opus 4.7 (`claude-opus-4-7`) to the anthropic, bedrock, and
opencode provider crates.
Key specs:
- 1M token context window
- 128k max output tokens
- Adaptive thinking support
- AWS Bedrock cross-region inference (global, US, EU, AU)
Release Notes:
- Added Claude Opus 4.7 as an available language model
PR #51946 broke `Model::Custom` thinking behavior: `mode()`,
`supports_thinking()`, and `supports_adaptive_thinking()` all inferred
capabilities from hardcoded built-in model lists, so any `Custom`
variant always fell back to `Default` regardless of its configured
`mode` field.
### Fixes
- **`Model::mode()`** — `Custom` now short-circuits to `mode.clone()`
before the built-in inference logic
- **`Model::supports_thinking()`** — `Custom` returns `true` when `mode`
is `Thinking { .. }` or `AdaptiveThinking`
- **`Model::supports_adaptive_thinking()`** — `Custom` returns `true`
when `mode` is `AdaptiveThinking`
Built-in model behavior is unchanged.
### Tests
Three regression tests added covering the three `Custom` mode cases:
explicit `Thinking`, `AdaptiveThinking`, and `Default` (which must
disable both flags).
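The three fixes can be sketched roughly as follows. This is a simplified illustration, not the crate's code: the `BuiltIn` variant and its boolean fields are hypothetical stand-ins for the hardcoded built-in model lists, and the `budget_tokens` field on `Thinking` is an assumption.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ModelMode {
    Default,
    // `budget_tokens` is an assumed field, shown only for illustration.
    Thinking { budget_tokens: Option<u32> },
    AdaptiveThinking,
}

#[derive(Debug, Clone)]
pub enum Model {
    // Hypothetical stand-in for all built-in model variants.
    BuiltIn { supports_thinking: bool, supports_adaptive: bool },
    Custom { name: String, mode: ModelMode },
}

impl Model {
    pub fn mode(&self) -> ModelMode {
        match self {
            // `Custom` short-circuits before any built-in inference.
            Model::Custom { mode, .. } => mode.clone(),
            Model::BuiltIn { supports_adaptive: true, .. } => ModelMode::AdaptiveThinking,
            Model::BuiltIn { supports_thinking: true, .. } => {
                ModelMode::Thinking { budget_tokens: None }
            }
            Model::BuiltIn { .. } => ModelMode::Default,
        }
    }

    pub fn supports_thinking(&self) -> bool {
        match self {
            Model::Custom { mode, .. } => {
                matches!(mode, ModelMode::Thinking { .. } | ModelMode::AdaptiveThinking)
            }
            Model::BuiltIn { supports_thinking, .. } => *supports_thinking,
        }
    }

    pub fn supports_adaptive_thinking(&self) -> bool {
        match self {
            Model::Custom { mode, .. } => matches!(mode, ModelMode::AdaptiveThinking),
            Model::BuiltIn { supports_adaptive, .. } => *supports_adaptive,
        }
    }
}
```

A `Custom` model configured with `ModelMode::Default` reports `false` for both capability checks, which is the third regression case.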
Self-Review Checklist:
- [x] I've reviewed my own diff for quality, security, and reliability
- [ ] Unsafe blocks (if any) have justifying comments
- [x] The content is consistent with the [UI/UX
checklist](https://github.com/zed-industries/zed/blob/main/CONTRIBUTING.md#uiux-checklist)
- [x] Tests cover the new/changed behavior
- [x] Performance impact has been considered and is acceptable
Release Notes:
- Fixed custom Anthropic models losing their configured
thinking/adaptive-thinking mode after the thinking-toggle refactor
(#51946)
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
- `language_model` no longer depends on provider-specific crates such as
`anthropic` and `open_ai` (inverted dependency)
- `language_model_core` was extracted from `language_model`; it contains
the types that the provider-specific crates convert to and from.
- `gpui::SharedString` has been extracted into its own crate (still
exposed by `gpui`), so `language_model_core` and provider API crates
don't have to depend on `gpui`.
- Removes some unnecessary `&'static str` | `SharedString` -> `String`
-> `SharedString` conversions across the codebase.
- Extracts the core logic of the cloud `LanguageModelProvider` into its
own crate with simpler dependencies.
Release Notes:
- N/A
---------
Co-authored-by: John Tur <john-tur@outlook.com>
The `CONTEXT_1M_BETA_HEADER` (`context-1m-2025-08-07`) is deprecated for
Sonnet 4 and 4.5. This removes the constant from the anthropic crate and
the match arm in `beta_headers()` that sent it for
`ClaudeSonnet4_5_1mContext`.
Note: The bedrock crate still has its own copy of this constant, used
when the user-configurable `allow_extended_context` setting is enabled.
That may warrant a separate cleanup.
Closes AI-114
Release Notes:
- N/A
This adds support for the thinking toggle + reasoning effort for the
Anthropic provider
Release Notes:
- anthropic: Added support for selecting reasoning effort
---------
Co-authored-by: Marshall Bowers <git@maxdeviant.com>
Before you mark this PR as ready for review, make sure that you have:
- [x] Added a solid test coverage and/or screenshots from doing manual
testing
- [x] Done a self-review taking into account security and performance
aspects
- [x] Aligned any UI changes with the [UI
checklist](https://github.com/zed-industries/zed/blob/main/CONTRIBUTING.md#uiux-checklist)
Release Notes:
- Updated our BYOK integration to support the new 1M context windows for
Opus and Sonnet.
This will help with test times (in some cases), as nextest cannot figure
out whether a given rdep is actually a live edge of the build graph.
Closes #ISSUE
Before you mark this PR as ready for review, make sure that you have:
- [ ] Added a solid test coverage and/or screenshots from doing manual
testing
- [ ] Done a self-review taking into account security and performance
aspects
- [ ] Aligned any UI changes with the [UI
checklist](https://github.com/zed-industries/zed/blob/main/CONTRIBUTING.md#uiux-checklist)
Release Notes:
- N/A
This is a staff-only toggle for now, since the consequences of
activating it are not obvious and quite dire (tokens cost 6 times
more).
Also, persist thinking, thinking effort and fast mode in DbThread so the
thinking mode toggle and thinking effort are persisted.
Release Notes:
- Agent: The thinking mode toggle and thinking effort are now persisted
when selecting a thread from history.
Before you mark this PR as ready for review, make sure that you have:
- [x] Added a solid test coverage and/or screenshots from doing manual
testing
- [x] Done a self-review taking into account security and performance
aspects
- [x] Aligned any UI changes with the [UI
checklist](https://github.com/zed-industries/zed/blob/main/CONTRIBUTING.md#uiux-checklist)
Release Notes:
- N/A
---------
Co-authored-by: Zed Zippy <234243425+zed-zippy[bot]@users.noreply.github.com>
The issue I ran into was that responses from Anthropic-compatible
providers, like Kimi for Coding, have no space after `data:`. This
change adds a quick check so that those providers work as well.
Before, the request resolved but did not show any output:
<img width="50%" alt="CleanShot 2026-01-28 at 12 50 31@2x"
src="https://github.com/user-attachments/assets/c3c8fe27-348e-4b21-a5f1-25bcc82f3774"
/>
Now it returns the proper result:
<img width="50%" alt="CleanShot 2026-01-28 at 12 56 30@2x"
src="https://github.com/user-attachments/assets/4e524c1e-78ab-4956-bd65-a919d46adc59"
/>
Normal Anthropic models still work as expected:
<img width="50%" alt="CleanShot 2026-01-28 at 12 58 37@2x"
src="https://github.com/user-attachments/assets/5a2906aa-1183-45b6-939b-01a6830f3385"
/>
Config to test
```json
"language_models": {
"anthropic": {
"api_url": "https://api.kimi.com/coding",
"available_models": [
{
"name": "kimi-for-coding",
"display_name": "Kimi 2.5 Coding",
"max_tokens": 262144,
"max_output_tokens": 32768,
},
],
},
}
```
TLDR:
- Accepts SSE `data:{...}` lines (no space) emitted by some alternative
Anthropic providers, in addition to the standard `data: {...}` format.
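A minimal sketch of the kind of check this describes, assuming a hypothetical helper named `sse_data_payload`; the actual function in the crate may be shaped differently. It strips the `data:` field name and then at most one leading space, so both spellings yield the same payload.

```rust
/// Return the payload of an SSE `data` line, or `None` for any other line.
/// Accepts both `data: {...}` and `data:{...}`.
pub fn sse_data_payload(line: &str) -> Option<&str> {
    let rest = line.strip_prefix("data:")?;
    // Drop a single optional space after the colon; keep any further whitespace.
    Some(rest.strip_prefix(' ').unwrap_or(rest))
}
```

Removing only one space keeps the helper in line with how the server-sent events format treats the character after the colon.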
Release Notes:
- Fixed Anthropic streaming for alternative providers by accepting SSE `data:{...}` (no space) lines.
---------
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
<img width="435" height="211" alt="Screenshot 2026-02-17 at 1 32 48 PM"
src="https://github.com/user-attachments/assets/136c188d-5001-4526-961e-9f7faccc5f7a"
/>
Add support for the new Claude Sonnet 4.6 model across the anthropic,
bedrock, and language_models crates. Includes base, thinking, and 1M
context variants.
Closes AI-39
Release Notes:
- Added BYOK support for Claude Sonnet 4.6
TODO:
- [x] Review code
- [x] Decide whether to keep ignored API tests
Release Notes:
- Fixed a bug where cancelling a thread mid-thought would cause further
Anthropic requests to fail
- Fixed a bug where the model configured on a thread would not be
persisted alongside that thread
<img width="588" height="485" alt="Screenshot 2026-02-05 at 1 29 10 PM"
src="https://github.com/user-attachments/assets/f3d36c8b-b371-4226-af60-bdc2c6b34009"
/>
<img width="586" height="468" alt="Screenshot 2026-02-05 at 1 30 15 PM"
src="https://github.com/user-attachments/assets/878e91ad-948c-4b35-a37b-f5a8db7e0b3f"
/>
This adds Claude Opus 4.6 as a new Anthropic model, along with 1M
context window variants for both Opus 4.6 and Sonnet 4.5.
## Opus 4.6
Adds `ClaudeOpus4_6` and `ClaudeOpus4_6Thinking` with the same
properties as other Claude 4+ models (200k context, 8192 max output
tokens, fine-grained tool streaming beta header).
## 1M context variants
Adds 1M context window variants for Sonnet 4.5 and Opus 4.6. These are
identical to their base models except:
- Context window is 1,000,000 tokens instead of 200,000
- They send the `context-1m-2025-08-07` beta header
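The two differences listed above can be sketched like this. It is an illustration only: the variant name `ClaudeOpus4_6_1mContext` and the helper `is_1m_context` are assumptions, and the real `Model` enum has many more variants and may return other beta headers as well.

```rust
const CONTEXT_1M_BETA_HEADER: &str = "context-1m-2025-08-07";

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[allow(non_camel_case_types)]
pub enum Model {
    ClaudeSonnet4_5,
    ClaudeSonnet4_5_1mContext,
    ClaudeOpus4_6,
    // Assumed name for the Opus 4.6 1M variant.
    ClaudeOpus4_6_1mContext,
}

impl Model {
    /// Hypothetical helper: true for the 1M context variants.
    pub fn is_1m_context(&self) -> bool {
        matches!(
            self,
            Model::ClaudeSonnet4_5_1mContext | Model::ClaudeOpus4_6_1mContext
        )
    }

    /// Context window in tokens.
    pub fn max_token_count(&self) -> u64 {
        if self.is_1m_context() { 1_000_000 } else { 200_000 }
    }

    /// Only the 1M-context beta header is modeled in this sketch.
    pub fn beta_headers(&self) -> Vec<&'static str> {
        if self.is_1m_context() {
            vec![CONTEXT_1M_BETA_HEADER]
        } else {
            Vec::new()
        }
    }
}
```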
Release Notes:
- Added Claude Opus 4.6
- Claude Opus 4.6 and Sonnet 4.5 BYOK models now offer variants with
context windows of 1 million tokens (with different pricing)
Closes #38533
<img width="807" height="425" alt="Screenshot 2025-12-16 at 2 32 21 PM"
src="https://github.com/user-attachments/assets/6ebb915c-91d3-4158-a2b9-9fe17d301dd6"
/>
Release Notes:
- Use up-to-date token counts from LLM responses when reporting tokens
used per thread
---------
Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
This PR partially implements a knowledge distillation data pipeline.
`zeta distill` gets a dataset of chronologically ordered commits and
generates synthetic predictions with a teacher model (one-shot Claude
Sonnet).
`zeta distill --batches cache.db` enables the Message Batches API. On
the first run, this command collects all LLM requests and uploads them
to Anthropic as a batch. On subsequent runs, it checks the batch
status. If the batch is ready, it downloads the results and puts them
into the local cache.
Release Notes:
- N/A
---------
Co-authored-by: Piotr Osiewicz <24362066+osiewicz@users.noreply.github.com>
Co-authored-by: Ben Kunkle <ben@zed.dev>
Adds support for Opus 4.5
- [x] BYOK
- [x] Amazon Bedrock
Release Notes:
- Added support for Opus 4.5
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
We've been considering removing workspace-hack for a couple reasons:
- Lukas ran into a situation where its build script seemed to be causing
spurious rebuilds. This seems more likely to be a cargo bug than an
issue with workspace-hack itself (given that it has an empty build
script), but we don't necessarily want to take the time to hunt that
down right now.
- Marshall mentioned hakari interacts poorly with automated crate
updates (in our case provided by Renovate) because you'd need to run
`cargo hakari generate && cargo hakari manage-deps` after their changes
and we prefer to not have actions that make commits.
Currently removing workspace-hack causes our workspace to grow from
~1700 to ~2000 crates being built (depending on platform), which is
mainly a problem when you're building the whole workspace or running
tests across the normal and remote binaries (which is where
feature-unification nets us the most sharing). It doesn't impact
incremental times noticeably when you're just iterating on `-p zed`, and
we'll hopefully get these savings back in the future when
rust-lang/cargo#14774 (which re-implements the functionality of hakari)
is finished.
Release Notes:
- N/A
Co-Authored-By: Ben K <ben@zed.dev>
Co-Authored-By: Anthony <anthony@zed.dev>
Co-Authored-By: Mikayla <mikayla@zed.dev>
Release Notes:
- settings: Major internal changes to settings. The primary user-facing
effect is that some settings which did not make sense in project
settings files are no longer read from there (for example, the inline
blame settings).
---------
Co-authored-by: Ben Kunkle <ben@zed.dev>
Co-authored-by: Mikayla Maki <mikayla.c.maki@gmail.com>
Co-authored-by: Anthony <anthony@zed.dev>
Closes #37289
The current implementation has a problem. The **`from_id` method** in
the Anthropic crate works well for predefined models, but not for custom
models that are defined in the settings. This is because it falls back to
using default beta headers, which are incorrect for custom models.
The issue is that the model instance for custom models lives within the
`language_models` provider, so I've updated the **`stream_completion`**
method to explicitly accept beta headers from its caller. Now, the beta
headers are passed from the `language_models` provider all the way to
`anthropic.stream_completion`, which resolves the issue.
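As a rough sketch of the caller-side half of this change, the provider could compute the header value itself and hand it down. The helper `beta_header_value` below is hypothetical and its name and signature are assumptions; it merges a model's default beta headers with the `extra_beta_headers` from settings into one comma-separated value.

```rust
/// Merge default and user-configured beta headers into a single
/// comma-separated header value, skipping blanks and duplicates.
/// Returns `None` when there is nothing to send.
pub fn beta_header_value(
    default_headers: &[&str],
    extra_beta_headers: &[String],
) -> Option<String> {
    let mut headers: Vec<&str> = Vec::new();
    let all = default_headers
        .iter()
        .copied()
        .chain(extra_beta_headers.iter().map(String::as_str));
    for header in all {
        let header = header.trim();
        if !header.is_empty() && !headers.contains(&header) {
            headers.push(header);
        }
    }
    if headers.is_empty() {
        None
    } else {
        Some(headers.join(","))
    }
}
```

Because the caller owns the custom model definition, it is the only place where the settings-defined headers are known, which is why the value is computed there and passed in.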
Release Notes:
- Fixed a bug where extra_beta_headers defined in settings for Anthropic
custom models were being ignored.
---------
Signed-off-by: Umesh Yadav <git@umesh.dev>
This prevents the common footgun of copy/pasting an API key
starting/ending with extra newlines, which would lead to a "bad request"
error.
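A minimal sketch of the idea, assuming a hypothetical helper named `normalize_api_key`; the actual change may live in the UI input handling rather than a standalone function.

```rust
/// Strip surrounding whitespace (including newlines) from a pasted API key.
/// Returns `None` if nothing is left after trimming.
pub fn normalize_api_key(pasted: &str) -> Option<String> {
    let key = pasted.trim();
    if key.is_empty() {
        None
    } else {
        Some(key.to_string())
    }
}
```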
Closes #37038
Release Notes:
- agent: Support pasting language model API keys that contain newlines.
* Updates to `zed_llm_client-0.8.5`, which adds support for `retry_after`
when Anthropic provides it.
* Distinguishes upstream provider errors and rate limits from errors
that originate from Zed's servers
* Moves `LanguageModelCompletionError::BadInputJson` to
`LanguageModelCompletionEvent::ToolUseJsonParseError`. While arguably
this is an error case, the logic in thread is cleaner with this move.
There is also precedent for inclusion of errors in the event type -
`CompletionRequestStatus::Failed` is how cloud errors arrive.
* Updates `PROVIDER_ID` / `PROVIDER_NAME` constants to use proper types
instead of `&str`, since they can be constructed in a const fashion.
* Removes use of `CLIENT_SUPPORTS_EXA_WEB_SEARCH_PROVIDER_HEADER_NAME`
as the server no longer reads this header and just defaults to that
behavior.
Release notes for this are covered by #33275
Release Notes:
- N/A
---------
Co-authored-by: Richard Feldman <oss@rtfeldman.com>
Co-authored-by: Richard <richard@zed.dev>
This PR is in preparation for doing automatic retries for certain
errors, e.g. Overloaded. It doesn't change behavior yet (aside from some
granularity of error messages shown to the user), but rather mostly
changes some error handling to be exhaustive enum matches instead of
`anyhow` downcasts, and leaves some comments for where the behavior
change will be in a future PR.
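To illustrate the difference from `anyhow` downcasts, here is a small sketch with a hypothetical `CompletionError` enum; the variant names and messages are assumptions, not the real error type. Because the `match` is exhaustive, adding a new variant becomes a compile error until every handler deals with it.

```rust
use std::time::Duration;

#[derive(Debug)]
pub enum CompletionError {
    Overloaded,
    RateLimitExceeded { retry_after: Option<Duration> },
    BadRequest { message: String },
    ServerError { status: u16 },
}

/// Map each error variant to a user-facing message. No wildcard arm is
/// used, so the compiler flags any variant that is not handled.
pub fn user_facing_message(error: &CompletionError) -> String {
    match error {
        // A future change could schedule an automatic retry here.
        CompletionError::Overloaded => {
            "The provider is overloaded. Try again shortly.".to_string()
        }
        CompletionError::RateLimitExceeded { retry_after: Some(delay) } => {
            format!("Rate limit exceeded. Retry in {}s.", delay.as_secs())
        }
        CompletionError::RateLimitExceeded { retry_after: None } => {
            "Rate limit exceeded.".to_string()
        }
        CompletionError::BadRequest { message } => format!("Bad request: {message}"),
        CompletionError::ServerError { status } => format!("Server error (HTTP {status})."),
    }
}
```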
Release Notes:
- N/A
Previously we were using a mix of `u32` and `usize`, e.g. `max_tokens:
usize, max_output_tokens: Option<u32>` in the same `struct`.
Although [tiktoken](https://github.com/openai/tiktoken) uses `usize`,
token counts should be consistent across targets (e.g. the same model
doesn't suddenly get a smaller context window if you're compiling for
wasm32), and these token counts could end up getting serialized using a
binary protocol, so `usize` is not the right choice for token counts.
I chose to standardize on `u64` over `u32` because we don't store many
of them (so the extra size should be insignificant) and future models
may exceed `u32::MAX` tokens.
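A short sketch of what the standardized shape might look like. The struct name `ModelLimits` and the method `remaining_input_budget` are hypothetical; only the `max_tokens` and `max_output_tokens` field names come from the description above.

```rust
/// Token limits with a fixed-width integer type, so values are the same
/// on every target and safe to serialize in a binary protocol.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelLimits {
    pub max_tokens: u64,
    pub max_output_tokens: Option<u64>,
}

impl ModelLimits {
    /// Tokens still available for input after reserving room for output.
    /// Saturating arithmetic avoids underflow on inconsistent inputs.
    pub fn remaining_input_budget(&self, used_tokens: u64) -> u64 {
        let reserved = self.max_output_tokens.unwrap_or(0);
        self.max_tokens
            .saturating_sub(reserved)
            .saturating_sub(used_tokens)
    }
}
```

A context window larger than `u32::MAX` is representable here without any change to the type, which is the scenario the choice of `u64` anticipates.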
Release Notes:
- N/A
This PR reorders the `Model` variants in the `anthropic` crate in
descending order.
Newer/more powerful models at the top -> older/less powerful models at
the bottom.
Release Notes:
- N/A
Bubbles up rate limit information so that we can retry after a certain
duration if needed higher up in the stack.
Also caps the number of concurrent evals, which should further reduce rate limiting.
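A sketch of how a caller higher up the stack might use the bubbled-up duration. The `RequestError` enum, the `retry_with_rate_limit` helper, and the 1-second fallback are all assumptions for illustration; the sleep function is injected so the logic stays independent of any particular async runtime or timer.

```rust
use std::time::Duration;

#[derive(Debug)]
pub enum RequestError {
    RateLimited { retry_after: Option<Duration> },
    Fatal(String),
}

/// Run `attempt` up to `max_attempts` times (always at least once).
/// On a rate limit, wait for the server-provided duration, or a
/// fallback when none was given, then try again.
pub fn retry_with_rate_limit<T>(
    max_attempts: usize,
    mut attempt: impl FnMut() -> Result<T, RequestError>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, RequestError> {
    let fallback = Duration::from_secs(1);
    let mut attempts_made = 0;
    loop {
        attempts_made += 1;
        match attempt() {
            Ok(value) => return Ok(value),
            Err(RequestError::RateLimited { retry_after }) if attempts_made < max_attempts => {
                sleep(retry_after.unwrap_or(fallback));
            }
            // Fatal errors, or rate limits after the last attempt, bubble up.
            Err(error) => return Err(error),
        }
    }
}
```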
Release Notes:
- N/A