mirror of
https://github.com/unslothai/unsloth.git
synced 2026-05-20 00:51:36 +00:00
1250 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
d2e25ee131
|
studio/frontend: drop unused dependencies, move type pkg to devDeps (#5477)
* studio/frontend: drop unused dependencies, move type pkg to devDeps
Removes 11 declared deps that are not imported anywhere in src/, the
Tauri config, src-tauri Rust, backend, scripts, CI workflows, or
sibling workspaces. Moves @types/canvas-confetti to devDependencies
since it ships TypeScript types only.
Removed from dependencies:
@assistant-ui/react-markdown (no imports; not a peer of any used pkg)
@assistant-ui/react-streamdown (no imports; not a peer of any used pkg)
@langchain/core (no imports anywhere)
@streamdown/cjk (no imports; not a peer of streamdown)
@radix-ui/react-checkbox (re-exported by the radix-ui umbrella;
no direct imports)
@radix-ui/react-label (same)
@radix-ui/react-select (same)
@radix-ui/react-separator (same)
date-fns (already a direct dep of react-day-picker)
remark-gfm (already a direct dep of streamdown)
Removed from devDependencies:
playwright (CI installs the pip playwright; the
npm one is unused)
Moved to devDependencies:
@types/canvas-confetti (TypeScript types only; not a runtime dep)
Verified with npm install + npm run build (tsc -b && vite build),
clean exit, dist/ produced. Live unsloth studio launch returns 200
on /, on the main JS / CSS bundles, and on /api/health.
* studio/frontend: keep @radix-ui packages (per maintainer)
Maintainer asked to keep the four @radix-ui packages this PR was
originally dropping:
@radix-ui/react-checkbox ^1.3.3
@radix-ui/react-label ^2.1.8
@radix-ui/react-select ^2.2.6
@radix-ui/react-separator ^1.1.8
Restored to dependencies and refreshed the lockfile. Build still
green (1044 packages, vite build 2.1s, same dist contents).
|
||
|
|
e775f941a4
|
tests/openai: patch httpx.AsyncClient ctor so delete tests hit mock (#5469)
delete_openai_container intentionally creates a fresh httpx.AsyncClient
per call (see external_provider docstring: shared pool produced false
'deleted: true' responses while the container survived). The existing
_mock_http_client only swapped the shared module-level _http_client, so
the four delete tests bypassed the mock entirely and hit the real OpenAI
API, returning 401 Unauthorized on Python 3.10 / 3.12 / 3.13.
Extend the helper to also monkey-patch httpx.AsyncClient itself to a
factory that injects the test's MockTransport into any freshly
constructed client. List/create paths still use the shared client and
pass unchanged.
Verified locally: pytest tests/test_openai_container_crud.py -> 8 passed.
|
||
|
|
ba0cae1aff
|
Stop: drop Ollama API key, clean up code execution UI (#5464)
* chat: drop Ollama API key, clean up code execution UI
* studio/chat: fix undefined candidateId + keyboard a11y on container list
- Auto-bind effect referenced `candidateId`, which is not declared in
this scope (only `candidate` is) — would fail the TS/Next build.
Use `candidate.id` to match the variable that's actually defined.
- Container list items get `role="button"` when `canActivate` is true
but had no keyboard activation. Add `onKeyDown` for Enter/Space and
`tabIndex={0}` so the row is focusable and activatable from the
keyboard, matching the existing onClick behavior.
* studio/chat: restore declarations dropped by the main merge
The
|
||
|
|
2de99a23d8
|
studio/install: strip top-level dir from repaired symlink target (#5467)
The repair in 5465 returned the full archive entry name (e.g.
"llama-b9165/libggml-rpc.0.11.1.dylib") but safe_link_target joins the
return value with target.parent (which already lives under
base/llama-b9165). That doubled the prefix to
base/llama-b9165/llama-b9165/libggml-rpc.0.11.1.dylib, the resolved path
never existed, and extract_tar_safely still raised 'tar archive
contained unresolved link entries'.
Strip the top-level dir before returning so the linkname is relative to
target.parent, mirroring how unmangled symlinks are stored in the tar
(basename-only relative to the symlink).
Verified end-to-end against the upstream b9165 tarball: extraction
succeeds and every symlink resolves to an existing file.
|
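A minimal sketch of the prefix-doubling problem and the fix, using plain path strings. The helper name strip_top_level_dir and the "base" directory are illustrative assumptions, not the names used in the installer, and the sketch only covers symlinks that sit directly under the top-level dir.

```python
import posixpath


def strip_top_level_dir(entry_name: str) -> str:
    """Drop the first path component of an archive entry name (hypothetical helper)."""
    _, _, rest = entry_name.partition("/")
    return rest or entry_name


# Where the symlink is being extracted, and what the 5465 repair returned.
target = "base/llama-b9165/libggml-rpc.0.dylib"
repaired_entry = "llama-b9165/libggml-rpc.0.11.1.dylib"
parent = posixpath.dirname(target)

# Joining the full entry name with target.parent doubles the prefix.
doubled = posixpath.join(parent, repaired_entry)

# Stripping the top-level dir first yields a path that can exist.
resolved = posixpath.join(parent, strip_top_level_dir(repaired_entry))

print(doubled)
print(resolved)
```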
||
|
|
a70bf02bb8
|
studio/chat: OpenAI container picker delete reliability (#5466)
* studio/chat: fix OpenAI container delete UX (expired filter, TTL cap, idempotent 404, refresh-on-error)
- Filter status="expired" from /containers/list so the picker only
shows usable containers. OpenAI keeps expired entries in the list
indefinitely, which made delete look broken.
- Cap ttl_minutes at 20 (backend Field + frontend TTL_MAX + persistence
clamp). OpenAI's actual hard limit is 20; the prior 10080 cap caused
integer_above_max_value rejections on create.
- Treat 404 on delete as idempotent success in the frontend client so
already-gone containers don't surface a scary error toast.
- Run refresh() in finally for onCreate/onDelete so the picker stays
in sync with OpenAI even when the call errors.
- Add route-level test for the expired filter.
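The three rules above (expired filter, TTL cap, idempotent 404) can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, the lower TTL bound of 1 is assumed, and the real frontend client is not written in Python.

```python
TTL_MAX_MINUTES = 20  # cap stated in the commit message


def usable_containers(containers):
    """Hide entries that OpenAI still lists with status="expired"."""
    return [c for c in containers if c.get("status") != "expired"]


def clamp_ttl(ttl_minutes):
    """Clamp a requested TTL into the accepted range (lower bound of 1 is assumed)."""
    return max(1, min(int(ttl_minutes), TTL_MAX_MINUTES))


def delete_succeeded(status_code):
    """Treat 404 as success: the container is already gone."""
    return status_code == 404 or 200 <= status_code < 300


listing = [
    {"id": "cntr_a", "status": "running"},
    {"id": "cntr_b", "status": "expired"},
]
print([c["id"] for c in usable_containers(listing)])
print(clamp_ttl(10080))
```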
* studio/chat: add diagnostic logging for OpenAI /containers DELETE
Trace what arrives at /external/openai/containers/delete (subject,
container_id, base_url) and what we send to OpenAI (URL, presence
of Authorization, value of OpenAI-Beta) plus the full response
status + body (capped at 300 chars). Helps confirm whether the
beta header is on the wire and whether OpenAI's response actually
reports deleted=true, when users report the delete "not taking".
No secrets are logged — Authorization is reported as a boolean.
* studio/chat: log raw /containers list response from OpenAI
Sibling to the delete diagnostics. After a confirmed delete
(deleted=true on the wire), we want to see whether the very next
list call returns the just-deleted id — that distinguishes
"OpenAI eventually-consistent list" from "frontend stale state".
Logs each entry's id + status only; no names, no timestamps.
* studio/chat: fingerprint decrypted API key for container CRUD
Logs kind (sk-proj-/sk-/other), length, and last-4 chars only —
never the full secret. Lets us compare what the backend actually
uses against the key the user expects, since the same DELETE
request shape can produce different results across keys
(project-scoped containers: list is permissive but delete requires
the owning project's key).
* studio/chat: use fresh httpx client for /v1/containers DELETE
Same key, same headers, same URL via the shared _http_client
returned deleted=true but the container persisted in subsequent
list calls. A fresh httpx.AsyncClient with the identical request
shape (verified with a standalone reproducer) deleted the same
container cleanly. Suspect connection-pool state from earlier
chat-completion streams interferes at the edge — switching to a
per-call client side-steps it entirely. Scoped to delete only;
list/create keep using the shared pool until we can confirm the
same fix is needed there.
* studio/chat: log OpenAI response headers on container DELETE
Adds cf-ray / x-request-id / openai-organization / openai-project /
openai-processing-ms to the delete-response diagnostic line. Lets
us cross-reference a failing delete against OpenAI support (or
against a working standalone reproducer) using the unique
request-id and edge node.
* studio/chat: client-side tombstone for just-deleted OpenAI containers
OpenAI's /v1/containers DELETE returns {"deleted": true} but the
list endpoint can keep returning the same container for several
minutes (replica lag or in-use silent no-op — undocumented per
developers.openai.com/api/docs/guides/tools-shell). Our backend
sends the correct DELETE with OpenAI-Beta: containers=v1 and a
standalone reproducer shows the same behavior, so the right fix
is UI-side rather than waiting on OpenAI.
After a successful delete, the id goes into a per-component
tombstone map with a 5-minute expiry. visibleContainers (now the
single chokepoint feeding sortedContainers, auto-bind, and the
all-containers list) filters those ids out. A 30s sweep clears
expired tombstones so the picker recovers automatically if OpenAI
eventually catches up (or the container's TTL elapses).
* studio/chat: tombstones live for the page lifetime; drop API key fingerprint log
- Tombstones change from Map<id, expiry> to Set<id>: once tombstoned,
the id stays hidden from the picker until page reload. OpenAI's list
can keep returning a deleted id for an undocumented and variable
amount of time; automatically un-tombstoning after a fixed window
surfaces it again and creates more confusion than it solves. The
container's own TTL eventually expires the entry on OpenAI's side,
and the expired-status filter at the backend list route hides it
anyway.
- Remove the periodic sweep effect (dead code without expiries).
- Remove the api-key fingerprint log added during debugging — it
served its purpose (confirmed parity) and isn't needed long-term.
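The page-lifetime tombstone idea can be sketched as a set of hidden ids applied to every list response. The class and method names below are hypothetical, and the actual picker is frontend code rather than Python; this only illustrates the filtering rule.

```python
class ContainerPicker:
    """Keeps deleted ids hidden for as long as this object lives."""

    def __init__(self):
        self._tombstones = set()  # a set of ids, with no expiry

    def mark_deleted(self, container_id):
        self._tombstones.add(container_id)

    def visible(self, listed):
        # Single filter point: every consumer reads from this result.
        return [c for c in listed if c["id"] not in self._tombstones]


picker = ContainerPicker()
picker.mark_deleted("cntr_old")

# The list endpoint may keep returning the deleted id for a while.
stale_listing = [{"id": "cntr_old"}, {"id": "cntr_new"}]
print(picker.visible(stale_listing))
```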
|
||
|
|
4f59c8e539
|
studio/install: repair upstream llama.cpp prebuilt mangled symlinks (#5465)
The macos-arm64 prebuilt tarball for llama.cpp b9165 and b9169 ships
symlinks whose linkname is missing both the directory separator AND the
leading character of the target basename:
llama-b9165/libggml-rpc.0.dylib -> llama-b9165ibggml-rpc.0.11.1.dylib
extract_tar_safely correctly classified those as unresolved and made
install.sh fall back to source-build, which Mac CI then fails as a hard
error (Studio must use the prebuilt llama-bNNNN-bin-macos-arm64 on Apple
Silicon).
Add _try_repair_missing_slash inside safe_link_target: when a linkname
starts with the member's top-level dir but no following slash, search
the archive for an entry under that dir whose name ends with the mangled
suffix. Accept only when the suffix uniquely identifies a real archive
entry, so legitimate archives are untouched.
Verified against /tmp/llama-b9165.tar.gz: all 18 link entries repair to
real files in the archive.
|
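A rough sketch of the suffix-matching repair described above, operating on plain name strings instead of tarfile members. The signature is an assumption for illustration and is not the installer's _try_repair_missing_slash; like the 5465 version, it returns the full archive entry name.

```python
from typing import Optional, Sequence


def try_repair_missing_slash(
    member_name: str, linkname: str, archive_names: Sequence[str]
) -> Optional[str]:
    """Return the unique archive entry a mangled linkname points at, else None."""
    top_dir = member_name.split("/", 1)[0]
    # Only linknames that start with the top-level dir but lack the slash qualify.
    if not linkname.startswith(top_dir) or linkname.startswith(top_dir + "/"):
        return None
    mangled_suffix = linkname[len(top_dir):]
    if not mangled_suffix:
        return None
    matches = [
        name
        for name in archive_names
        if name.startswith(top_dir + "/") and name.endswith(mangled_suffix)
    ]
    # Accept only an unambiguous match so well-formed archives are untouched.
    return matches[0] if len(matches) == 1 else None


names = [
    "llama-b9165/libggml-rpc.0.dylib",
    "llama-b9165/libggml-rpc.0.11.1.dylib",
]
repaired = try_repair_missing_slash(
    names[0], "llama-b9165ibggml-rpc.0.11.1.dylib", names
)
print(repaired)
```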
||
|
|
2622b79606
|
studio/chat: built-in code execution for OpenAI + Anthropic (#5461)
* studio/chat: built-in code execution for Anthropic Claude 4.x
Wire Anthropic's server-side code_execution_20250825 tool to the
existing Code pill in the composer. Pill lights up only for Claude
Opus/Sonnet/Haiku 4.x models that the docs list as compatible; pairs
independently with Search. Backend appends the tool entry plus the
code-execution-2025-08-25 beta header, and translates the SSE
server_tool_use / *_tool_result blocks (bash + text_editor sub-tools)
into the _toolEvent shape the frontend renderer consumes. File
uploads via the Files API are a deliberate follow-up.
* studio/chat: enable code execution pill in in-thread composer too
thread.tsx renders its own composer with a separate CodeToolsToggle
that was still gated on supportsTools only, so the pill stayed
disabled inside an active thread even after picking Anthropic 4.x.
Surface the capability through the runtime store
(supportsBuiltinCodeExecution, set from chat-page alongside
supportsBuiltinWebSearch) and read it in the toggle.
* studio/chat: built-in code execution for OpenAI cloud gpt-5.5
Extend the Code pill to OpenAI cloud's gpt-5.5 / gpt-5.5-pro via the
shell tool on /v1/responses. Per-thread container reuse: capture the
container_id from each response on a synthetic container_ready event,
persist it onto the ThreadRecord, and pass it back as
environment.type="container_reference" on follow-up turns so the
model sees filesystem state from prior turns until OpenAI's idle
expiry. Stale ids surface a container_invalidated event that clears
the thread record so the next turn falls back to container_auto.
Gated strictly on OpenAI cloud (api.openai.com base URL) — Ollama,
llama.cpp, vLLM, and custom OpenAI-compat presets won't see the
shell tool entry even when their providerType collapses to "openai".
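One way to express the "OpenAI cloud only" gate is a hostname comparison on the configured base URL. The function name is hypothetical and the real check may differ; this is only a sketch of the rule stated above.

```python
from urllib.parse import urlparse


def is_openai_cloud(base_url: str) -> bool:
    """True only when the base URL points at api.openai.com."""
    host = urlparse(base_url).hostname or ""
    return host == "api.openai.com"


print(is_openai_cloud("https://api.openai.com/v1"))
print(is_openai_cloud("http://localhost:11434/v1"))
```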
* studio/chat: OpenAI shell-tool container management UI
Side-panel section (settings sheet → Code Execution) for managing
OpenAI's shell-tool containers per thread. Three controls:
- New-container idle timeout (provider-level default, pre-fills the
create dialog and is used by the lazy-create path on a thread's
first turn when set to a non-default value).
- Active container picker for the active thread — pick any existing
container or stay on "Auto-create per thread".
- Inline create form (name + idle TTL) and per-row delete actions.
Three new backend endpoints under /api/inference/external/openai/
containers/{list,create,delete} proxy to OpenAI /v1/containers using
the encrypted API key. All three reject non-cloud base URLs up front
so the picker stays scoped to api.openai.com.
Deleting a container clears all thread bindings pointing at it; the
next turn falls back to auto-create.
* studio/chat: inherit container across threads + styled active picker
New threads on the same OpenAI provider now default to the most
recently used container instead of "Auto-create per thread" — both
in the chat-adapter (so a send works even if the side panel was
never opened) and in the side panel itself (auto-binds the active
thread when the dropdown loads on a thread that has no container).
Picker is visually emphasized with an accent panel and the
currently-active row in the list below is highlighted with the same
accent so the two views stay in sync.
* studio/chat: friendly English-word names for auto-created containers
Replaces the "chat-<thread-id-slug>" auto-name with a random
English-word + short hex suffix (e.g. "kestrel-3f9c"). Applies only
to the chat-adapter's lazy-create path; the OpenAI container_auto
path stays unnamed (only fires when no custom TTL is set).
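A name of the "kestrel-3f9c" shape can be produced from a word list plus two random bytes rendered as hex. The word list below is a made-up placeholder, and the real generator lives in the frontend, so treat this purely as an illustration of the format.

```python
import secrets

# Hypothetical word list; the real one is not shown in the commit message.
WORDS = ("kestrel", "willow", "harbor", "ember")


def friendly_container_name() -> str:
    """Random English word plus a 4-character hex suffix."""
    return f"{secrets.choice(WORDS)}-{secrets.token_hex(2)}"


name = friendly_container_name()
print(name)
```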
* studio/chat: always pre-create OpenAI containers via frontend
Drops the TTL-based gate on the chat-adapter's lazy-create path so
every code-execution container the user ever sees in the picker has
a friendly English-word name. The backend's container_auto fallback
stays as a safety net (used only if the POST /v1/containers call
fails); in practice that branch should be rare.
* studio/chat: send OpenAI-Beta header for /v1/containers CRUD
Without OpenAI-Beta: containers=v1, OpenAI returns 200
{"deleted": true} for DELETE /v1/containers/{id} but does not
actually remove the container. The list call then keeps returning it,
making it look like Studio's "Delete container" button is broken.
Verified 2026-05-15 against api.openai.com: DELETE with the beta
header returns 200 and removes the container; the same DELETE without
the header returns the same 200 deleted:true body but the container
stays alive.
- Add _container_headers() that merges OpenAI-Beta on top of the
shared auth headers; route list / create / delete through it.
- Verify the DELETE response body reports {"deleted": true}; raise
httpx.HTTPError otherwise so the route surfaces a 5xx instead of
silently reporting success on a silent no-op.
- Add tests covering header propagation and the deleted-flag guard
(true, false, missing key, non-JSON body, 4xx passthrough).
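The header merge and the deleted-flag guard can be sketched without any HTTP client. ContainerDeleteError is a hypothetical stand-in for the httpx.HTTPError mentioned above, and the function names are assumptions rather than the backend's own.

```python
import json


class ContainerDeleteError(RuntimeError):
    """Hypothetical stand-in for the httpx.HTTPError raised in the backend."""


def container_headers(auth_headers: dict) -> dict:
    """Merge the beta header on top of the shared auth headers."""
    return {**auth_headers, "OpenAI-Beta": "containers=v1"}


def ensure_deleted(body_text: str) -> None:
    """Raise unless the response body reports {"deleted": true}."""
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError as exc:
        raise ContainerDeleteError("non-JSON delete response") from exc
    if not isinstance(payload, dict) or payload.get("deleted") is not True:
        raise ContainerDeleteError("delete response did not report deleted=true")


print(container_headers({"Authorization": "Bearer <key>"}))
ensure_deleted('{"deleted": true}')  # passes silently
```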
* studio/chat: surface unpersisted-thread picker no-op as a toast
The "Active for this thread" container picker uses
db.threads.update(activeThreadId, ...), which silently returns 0 rows
affected when the thread record isn't yet in IndexedDB. That happens
on a brand-new thread where the user toggles code execution on and
opens settings before sending the first message — the chat adapter
only materializes the thread row on first send. The picker would
appear to ignore the user's selection and snap back to "Auto-create
per thread".
- onPick now awaits the update and toasts an actionable hint
("Send a message first to pin a container to this thread.") when
the update affected zero rows.
- Auto-bind effect comment clarifies why it stays best-effort silent.
The auto-bind effect itself is unchanged: it's a heuristic that
should not nag the user when it can't apply.
* studio/chat: let user pick OpenAI container before first send
Previously the picker silently no-op'd until the user sent the first
message, because Dexie's ThreadRecord is only materialized inside the
runtime-provider's `initialize` hook (assistant-ui's first-message
callback). That kept users from binding a thread to an existing
OpenAI container up front; they had to either send a message and
risk the chat adapter auto-creating one, or accept the cross-thread
inheritance default.
- Export `ensureThreadRecord` from runtime-provider so other surfaces
can materialize the row idempotently.
- In OpenAICodeExecSection.onPick, await ensureThreadRecord before
the update, with modelType="base" (the settings sheet that hosts
this section is only rendered in single-thread mode).
Behaviour after this commit:
- New thread + user picks a container in the sidebar → thread row is
created with that container_id; first send uses it, no auto-create.
- New thread + user does nothing → row still absent; first send goes
through the existing inherit/lazy-create path as before.
- The auto-bind effect remains silent best-effort: it does not
eagerly create the thread row, so it cannot pre-empt the user's
pick on a fresh thread.
* studio/chat: drop "Auto-create per thread" option, default to latest
The dropdown previously offered "Auto-create per thread" as an
explicit value (null in storage), with the chat-adapter then
inheriting from the most recent container at send-time. That made
the picker display disagree with what the backend would actually do:
the picker said "auto", but the backend was reusing an existing
container.
Behaviour after this commit, when code execution is enabled on an
OpenAI cloud provider:
- Containers list non-empty: dropdown defaults to the container with
the latest lastActiveAt, eagerly bound via ensureThreadRecord +
db.threads.update so the bind survives even when the thread row
has not been materialized by the chat adapter yet. User can pick
any other container in the list.
- Containers list empty: render a disabled placeholder "(none yet —
will be created on first send)". The chat-adapter's lazy-create
path (chat-adapter.ts:1040-1082) mints the first container on
first send and writes it back to the thread; the next refresh
surfaces it in the picker.
Expiration mid-operation is unchanged: the existing
container_invalidated _toolEvent clears the thread's stored id and
the next turn re-creates.
* studio/chat: fix picker stuck on "Selecting most recent…" + manual-create binding
Two follow-up fixes to the picker rework in
|
||
|
|
a9b8c9a221
|
Studio: make API key optional for local providers (llama.cpp/vLLM/Ollama) (#5457)
* make API key optional for local providers (llama.cpp/vLLM/Ollama)
* chore: reduce comments
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
||
|
|
920920592e
|
Polish/cloud to providers (#5450)
* polish: update provider dropdown and rename cloud
* fix: tighten custom provider fallback handling
* fix: external provider fallback typing
* studio: wire the chat Search button to OpenAI's built-in web_search tool
When the active model is an OpenAI external provider and the user
clicks the existing Search pill in the composer, the chat-completion
request now carries the unified enable_tools shorthand:
enable_tools: true
enabled_tools: ["web_search"]
The backend's stream_chat_completion threads enabled_tools through
to _stream_openai_responses, which translates it into the Responses
API tool schema:
body["tools"] = [{"type": "web_search"}]
per the OpenAI Responses tool spec
(https://developers.openai.com/api/docs/guides/tools). OpenAI then
runs the search server-side before the model replies; the search-
informed answer streams back through the existing
response.output_text.delta path. web_search_call lifecycle events
are silently ignored for now — sources / status indicators are
follow-up scope.
Frontend:
- provider-capabilities.ts: new providerSupportsBuiltinWebSearch()
helper. Returns true only for `openai` today; Anthropic
(web_search_20250305), Gemini grounded-search, and OpenRouter
variants can be added later with matching backend translation.
- chat-page.tsx: both model-switch paths (the onChange handler and
the inferenceParams.checkpoint useEffect) set supportsTools to
match the new helper, and force toolsEnabled=false on every
external switch so the Search toggle is opt-in by default.
- chat-adapter.ts: external branch adds enable_tools +
enabled_tools=["web_search"] to the request body when the
toggle is on AND the active provider supports built-in
web-search. Local-model branch is unchanged — it continues to
route the same shorthand through our local tool runtime.
Backend:
- routes/inference.py: forwards payload.enabled_tools to
stream_chat_completion at the proxy site (line 1599).
- external_provider.py: stream_chat_completion gains an
enabled_tools parameter; _stream_openai_responses appends
{"type": "web_search"} to body["tools"] when the list contains
"web_search". Other tools (file_search, code_interpreter,
image_generation, computer_use_preview) are easy follow-ups in
the same block.
Reuses the existing pydantic ChatCompletionRequest.enabled_tools
field, so no schema migrations.
* studio/backend: surface OpenAI server-side web_search in the chat UI
When the user has the chat Search button toggled on and OpenAI's
/v1/responses invokes the built-in web_search tool, _stream_openai_responses
now translates the tool's lifecycle events and citation annotations
into the same _toolEvent shape that local-tool calls use. The result:
the chat UI shows a web_search tool-call card mid-stream, then lists
the cited sources at the end of the message — identical to how local
web_search renders.
SSE event translation:
- response.output_item.added with item.type=web_search_call ->
emit _toolEvent tool_start. Carries item.action.query as args
when OpenAI ships it on the added event.
- response.output_item.done with item.type=web_search_call ->
backfill the query if it only arrives on the done variant. The
existing reasoning branch on the same event is preserved as an
if/elif under a shared isinstance guard.
- response.output_text.annotation.added with type=url_citation ->
collect into the most-recent web_search_call.citations list.
- response.output_text.delta with inline annotations[] (older
API variant) -> same collection path, so both wire shapes work.
- response.completed -> emit _toolEvent tool_end per call with
citations formatted as
Title: <title>\nURL: <url>\nSnippet: <snippet>
blocks joined by `\n---\n`. The frontend's
parseSourcesFromResult already lifts this format into source
content parts at end-of-stream.
- response.incomplete -> close out web_search cards with whatever
citations had landed, so a truncated response does not leave a
perpetually "running" tool card in the UI.
Both reasoning and web_search work simultaneously on the same turn —
the body sends `reasoning: {effort, summary}` and `tools: [{type:
"web_search"}]` independently, and the SSE handler tracks them
through separate channels.
Diagnostic: finally-block logger now reports per stream
web_search_requested - whether the client asked for it
web_search_invocations - how many calls OpenAI actually made
citations - total URLs cited
queries - the search queries the model issued
reasoning_emitted - whether <think> content was streamed
so reports of "I clicked Search and nothing happened" can be triaged
from the backend log without browser devtools.
* studio/backend: fix empty query + per-card '(no sources cited)' on OpenAI web_search
Two display bugs on the OpenAI Responses web_search → chat-UI bridge:
1. Tool cards showed "Searching for ''" — query missing.
OpenAI's response.output_item.added for web_search_call does not
reliably populate action.query across API versions; the canonical
place is output_item.done. The previous code emitted tool_start
at added with empty args and tried to backfill at done, but the
frontend's _toolEvent: tool_start is a one-shot push (no update
mechanism), so the args stayed empty.
Fix: defer both tool_start *and* a placeholder tool_end emission
to output_item.done, where action.query is guaranteed populated.
added now just initialises tracking. Frontend then renders one
card per call with the right "Searching for: <query>" label.
2. Every card showed "(no sources cited)".
The previous code tried to attribute url_citation annotations
to individual web_search_call invocations, but OpenAI's
annotations carry no link back to a specific search call —
they're just URLs the model cited from the aggregated search
pool. With N invocations and M annotations, the previous logic
bucketed all M into the last call and stamped "(no sources
cited)" on the rest.
Fix: collect citations into a single shared all_url_citations
list, dedup by URL. At response.completed (and
response.incomplete) overwrite the *last* web_search_call's
tool_end result with the aggregated Title:/URL:/Snippet:
blocks. The frontend's parseSourcesFromResult already flatMaps
every web_search result, so one non-empty result is enough to
surface the full source-pill set at the message tail. Other
tool cards get an empty result string (no '(no sources)' text).
Diagnostic log unchanged in shape; total_citations now reads
len(all_url_citations) directly.
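The aggregate-and-dedup step can be sketched as follows. The dictionary keys (title, url, snippet) are assumed from the Title:/URL:/Snippet: format named in the message, and the function names are hypothetical.

```python
def dedup_citations(citations):
    """Keep the first citation seen for each URL, preserving order."""
    seen = set()
    unique = []
    for citation in citations:
        url = citation.get("url")
        if not url or url in seen:
            continue
        seen.add(url)
        unique.append(citation)
    return unique


def format_sources(citations):
    """Render citations as Title/URL/Snippet blocks joined by a --- line."""
    blocks = [
        f"Title: {c.get('title', '')}\nURL: {c['url']}\nSnippet: {c.get('snippet', '')}"
        for c in dedup_citations(citations)
    ]
    return "\n---\n".join(blocks)


all_url_citations = [
    {"title": "A", "url": "https://a.example", "snippet": "s1"},
    {"title": "A again", "url": "https://a.example"},
    {"title": "B", "url": "https://b.example"},
]
print(format_sources(all_url_citations))
```

In this sketch the formatted string would be attached to the last web_search call's tool_end result, while earlier calls get an empty string.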
* studio/chat: split Code and Search pill gates so external models cannot enable Code
The previous wire-up set supportsTools=true for OpenAI external
models to light up the Search pill, but supportsTools also gates the
Code pill, so Code became clickable for OpenAI even though external
providers have no local code execution.
Separate the two gates so each pill reflects what's actually
available:
- chat-runtime-store: new `supportsBuiltinWebSearch: boolean` flag.
Distinct from supportsTools — that one still means "runtime has a
local tool sandbox" (Code, python, our DuckDuckGo web_search).
This one means "the active external provider exposes a server-side
web_search tool we can opt into" (OpenAI's /v1/responses today).
- chat-page model-switch (both code paths): for external models,
supportsTools is now forced to false (no local Code path) and
supportsBuiltinWebSearch follows providerSupportsBuiltinWebSearch.
Local-model paths are unaffected — they only set supportsTools.
- shared-composer: Search pill gates on
`searchDisabled = !modelLoaded || !(supportsTools ||
supportsBuiltinWebSearch)`. Code pill gates on
`codeDisabled = !modelLoaded || !supportsTools` — strictly the
local runtime, so external models keep Code greyed out.
A `toolsDisabled = codeDisabled` alias is left in place for any
later-touched call site that may still reference the old name.
No backend changes — chat-adapter already calls
providerSupportsBuiltinWebSearch directly, independent of the store
flags, so the request shape and the backend translation are
unchanged.
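The two gates reduce to a pair of boolean expressions. The sketch below restates them in Python with snake_case names; the real code is the frontend expressions quoted above.

```python
def search_disabled(model_loaded, supports_tools, supports_builtin_web_search):
    """Search needs a loaded model and either a local or a built-in search path."""
    return not model_loaded or not (supports_tools or supports_builtin_web_search)


def code_disabled(model_loaded, supports_tools):
    """Code needs a loaded model and the local tool runtime."""
    return not model_loaded or not supports_tools


# External OpenAI model: no local sandbox, but server-side web_search exists.
print(search_disabled(True, False, True))
print(code_disabled(True, False))
```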
* studio/chat: default external reasoning effort to medium, not the carry-over
When switching to an external model with reasoning support, the effort
dropdown was inheriting whatever value the user had set on a prior
model — frequently "xhigh" left over from a previous Opus/gpt-5
session. That meant every fresh OpenAI/Anthropic selection started at
Extra High, burning tokens unintentionally.
Both model-switch sites in chat-page (the useEffect on
inferenceParams.checkpoint and the onChange callback) now pick
"medium" whenever the new model's level list contains it, instead of
the clamped carry-over. The clamp still fires as a fallback for the
narrow case where a model doesn't expose medium (e.g.
gpt-5.3-chat-latest which only has medium anyway, so no change there). Users can
still pick another level explicitly via the Think dropdown.
* studio/chat: also light the Search pill in the welcome-screen composer
There are two composers in the chat feature. shared-composer.tsx
renders inside an active thread, and assistant-ui/thread.tsx has its
own WebSearchToggle / CodeToolsToggle that ship the welcome-screen
"Send a message…" composer (visible before the first user message).
The previous fix split supportsTools and supportsBuiltinWebSearch in
shared-composer but never touched the welcome-screen toggles in
thread.tsx — they both still gated on supportsTools alone, so the
Search pill stayed greyed on the welcome screen even for OpenAI
external models that legitimately support web_search server-side.
Mirror the shared-composer rule in WebSearchToggle:
disabled = !modelLoaded || !(supportsTools || supportsBuiltinWebSearch)
CodeToolsToggle is left as-is — its current
`disabled = !(modelLoaded && supportsTools)` is correct: external
models have no local code-execution sandbox, so Code stays greyed
when supportsTools=false (which is what chat-page now writes for
external selections).
* studio/backend: wire Anthropic server-side web_search end-to-end
Mirrors the OpenAI web_search integration for Anthropic's
web_search_20250305 tool. When the user toggles Search on with an
Anthropic model selected, the request now carries the documented
tool entry:
tools: [{type: "web_search_20250305", name: "web_search",
max_uses: 5}]
on /v1/messages, and the SSE translation surfaces tool cards +
source pills in the chat UI exactly the same way as OpenAI.
stream_chat_completion now forwards enabled_tools into the
Anthropic branch (was only doing this for the OpenAI Responses
branch). _stream_anthropic gains an enabled_tools parameter and
the web_search request-body block plus three additional event
handlers:
- content_block_start with type=server_tool_use, name=web_search:
start tracking a new call. id becomes the tool_call_id.
- content_block_delta with type=input_json_delta inside a
server_tool_use block: buffer the partial_json so we can read
out the search query when the block closes.
- content_block_start with type=web_search_tool_result: capture
the per-call result list (urls + titles) that Anthropic ships
inline.
- content_block_stop: closes whichever block we're inside —
* server_tool_use -> emit _toolEvent: tool_start with the
parsed query as args.
* web_search_tool_result -> emit _toolEvent: tool_end with
Title:/URL: blocks the frontend's parseSourcesFromResult
lifts into source pills.
* thinking block -> existing </think> close.
Unlike OpenAI we get per-call results directly, so no aggregated-
last-call fallback is needed — each tool card carries its own
citations.
Diagnostic log on stream completion now reports
web_search_requested / invocations / total_results / queries,
matching the OpenAI shape.
Frontend providerSupportsBuiltinWebSearch returns true for
'anthropic' as well, so the Search pill lights up on Claude
models the same way it does on OpenAI. The existing chat-adapter
external branch already sends enabled_tools=['web_search'] based
on this helper — no adapter changes needed.
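A rough sketch of the block-tracking logic described above, for readers who want to see the state machine in one place. It is not the actual `_stream_anthropic` code: the function name, the simplified output dicts, and the exact event field names (`content_block`, `tool_use_id`, `partial_json`) are assumptions based on the event types named in this commit.

```python
import json


def translate_anthropic_events(events):
    """Turn Anthropic-style SSE event dicts into simplified tool events.

    Hypothetical helper: the real handler yields _toolEvent frames; here we
    just collect plain dicts so the control flow is easy to follow.
    """
    open_blocks = {}  # block index -> tracking state for the block in flight
    out = []
    for ev in events:
        kind = ev.get("type")
        idx = ev.get("index")
        if kind == "content_block_start":
            block = ev.get("content_block") or {}
            if block.get("type") == "server_tool_use" and block.get("name") == "web_search":
                # Start tracking a new call; its id becomes the tool_call_id.
                open_blocks[idx] = {"kind": "call", "id": block.get("id"), "json": ""}
            elif block.get("type") == "web_search_tool_result":
                # Results arrive inline on the start event.
                open_blocks[idx] = {
                    "kind": "result",
                    "id": block.get("tool_use_id"),
                    "results": block.get("content"),
                }
        elif kind == "content_block_delta":
            state = open_blocks.get(idx)
            delta = ev.get("delta") or {}
            if state and state["kind"] == "call" and delta.get("type") == "input_json_delta":
                # Buffer partial JSON until the block closes.
                state["json"] += delta.get("partial_json", "")
        elif kind == "content_block_stop":
            state = open_blocks.pop(idx, None)
            if state is None:
                continue
            if state["kind"] == "call":
                try:
                    args = json.loads(state["json"] or "{}")
                except json.JSONDecodeError:
                    args = {}
                out.append({"type": "tool_start", "tool_call_id": state["id"], "args": args})
            else:
                # An error payload may be a dict instead of a list; treat it as empty.
                results = state["results"] if isinstance(state["results"], list) else []
                text = "\n---\n".join(
                    f"Title: {r.get('title', '')}\nURL: {r.get('url', '')}"
                    for r in results
                    if isinstance(r, dict)
                )
                out.append({"type": "tool_end", "tool_call_id": state["id"], "result": text})
    return out
```

Keying the state by block index keeps the call and its result independent, which is why each tool card can carry its own citations.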
* studio: wire OpenRouter built-in web search via :online model suffix
OpenRouter exposes a universal "add web search to any model" shortcut:
append `:online` to the model id and the gateway runs the search
server-side, streaming citations back as annotations on text deltas.
Documented at https://openrouter.ai/docs/features/web-search
Hook the existing Search toggle into that path:
Backend (external_provider.py, default OAI-compat branch):
- When provider_type == 'openrouter' and enabled_tools contains
'web_search', rewrite body['model']:
openai/gpt-4o -> openai/gpt-4o:online
anthropic/claude-sonnet-4-5:free -> anthropic/claude-sonnet-4-5:online
Any existing `:variant` (`:free`, `:nitro`, etc.) is replaced —
OpenRouter variants are mutually exclusive.
- `openrouter/free` is skipped: it's a meta-router and `:online` is
not a valid suffix on it (the gateway 400s).
- A one-line INFO log fires whenever the rewrite happens so the
diagnostic backend log shows exactly which model id the request
was promoted to.
Frontend (provider-capabilities.ts):
- providerSupportsBuiltinWebSearch now returns true for 'openrouter'
alongside 'openai' and 'anthropic'. The Search pill lights up and
the existing chat-adapter external branch already forwards
enabled_tools=['web_search'] based on this helper — no adapter
changes needed.
No new SSE event handling: OpenRouter does not emit a separate
web_search_call event the way OpenAI/Anthropic do. Citations come
back as text annotations via the existing reasoning_details path
the adapter already parses, so source data flows through without
extra translation. A per-call tool-card UX ("Searching for: …")
would require synthesizing one client-side; deferred to a follow-up
if the bare-citation flow feels too minimal.
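The model-id rewrite described in this commit can be sketched as follows. The function name is hypothetical, and the sketch assumes a variant is always whatever follows the first colon in the id.

```python
def promote_to_online(model_id: str) -> str:
    """Return the OpenRouter model id with the :online variant applied."""
    # The meta-router is skipped: per the commit, :online is not valid on it.
    if model_id == "openrouter/free":
        return model_id
    # Drop any existing :variant (":free", ":nitro", ...) before appending.
    base = model_id.split(":", 1)[0]
    return f"{base}:online"
```

For example, `promote_to_online("anthropic/claude-sonnet-4-5:free")` yields `anthropic/claude-sonnet-4-5:online`, matching the second mapping listed above.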
* studio: wire Mistral built-in web search connector
Same shape as OpenAI's web_search tool, lives on
/v1/chat/completions instead of /v1/responses. When the chat
Search pill is toggled on with a Mistral model selected, the
backend now appends
{"type": "web_search"}
to body["tools"] before the request goes out. Idempotent —
won't double-append if a future call site adds it first. Models
in the registry allowlist that don't support the connector
(codestral, devstral, ministral, mistral-tiny) will surface a
400 from upstream; the existing default-path error log captures
it. Mistral's docs:
https://docs.mistral.ai/capabilities/agents/connectors/websearch
Frontend providerSupportsBuiltinWebSearch returns true for
'mistral' now, alongside openai / anthropic / openrouter. The
Search pill lights up for Mistral models and the existing
adapter branch already sends enabled_tools=['web_search'] off
this helper — no adapter changes.
No SSE translation yet — Mistral streams citations inline as
text annotations or `references` in the final assistant content,
not as a separate web_search_call event. Citations flow through
to the message body as text; a per-call tool-card UX with
"Searching for: …" indicators is a follow-up if needed.
* studio/backend: fix OpenRouter web_search to use plugins shape + synthesize tool card
Two changes against the actual OpenRouter docs at
https://openrouter.ai/docs/guides/features/plugins/web-search:
Request shape:
The previous commit appended :online to the model id, which works on
concrete model ids but is rejected on meta-routers like openrouter/free,
and that's exactly the model the user was testing with, so neither
the request rewrite nor the diagnostic log fired. Switch to the
universal plugins shape:
body["plugins"] = [{"id": "web"}]
Per the docs this is "exactly equivalent" to :online but works on
every model id including openrouter/free and openrouter/auto. No
model suffix manipulation, idempotent if added twice.
Tool-card synthesis:
OpenRouter doesn't emit a structured web_search_call event the way
OpenAI/Anthropic do — citations come back only as `annotations` of
type=url_citation on delta/message objects. To match the chat-UI
tool-card UX the user expects ("Searching for: …" indicator,
source pills at message tail), synthesize the events client-side
in the default OAI-compat stream loop:
- On stream open (after the 200 status check): yield a synthetic
_toolEvent: tool_start with tool_name=web_search, fixed id
"openrouter_web_search". The chat-UI then renders the running
tool card before any text streams.
- During the SSE loop: scan every chunk's choices[].delta and
choices[].message for `annotations: [{type: "url_citation",
url_citation: {url, title, content}}]` entries. Dedup by URL
into a citations list. Handles both the nested-url_citation
shape OpenRouter documents and the flat-on-annotation shape
some upstreams ship.
- On [DONE] (or stream-close without [DONE]): emit synthetic
tool_end carrying the citations as
Title: …\nURL: …\nSnippet: …\n---\n…
blocks the existing parseSourcesFromResult lifts into source
pills at message tail.
Diagnostic log on completion now also reports
web_search_requested + citation count alongside the existing
chosen-model / event-count telemetry.
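A minimal sketch of the three pieces above (idempotent plugin injection, annotation scanning with URL dedup, and result formatting). All function names are hypothetical, and mapping the annotation's `content` field to the Snippet line is an assumption drawn from the shapes quoted in this commit.

```python
def enable_web_plugin(body):
    """Add the OpenRouter web plugin to the request body, at most once."""
    plugins = body.setdefault("plugins", [])
    if not any(isinstance(p, dict) and p.get("id") == "web" for p in plugins):
        plugins.append({"id": "web"})
    return body


def collect_citations(chunk, citations, seen_urls):
    """Scan one parsed SSE chunk and append new url_citation entries."""
    for choice in chunk.get("choices") or []:
        for holder in (choice.get("delta"), choice.get("message")):
            if not isinstance(holder, dict):
                continue
            for ann in holder.get("annotations") or []:
                if not isinstance(ann, dict) or ann.get("type") != "url_citation":
                    continue
                nested = ann.get("url_citation")
                # Nested shape if present, otherwise fields sit flat on the annotation.
                data = nested if isinstance(nested, dict) else ann
                url = data.get("url")
                if not url or url in seen_urls:
                    continue
                seen_urls.add(url)
                citations.append({
                    "url": url,
                    "title": data.get("title") or "",
                    "snippet": data.get("content") or "",
                })


def format_citations(citations):
    """Render citations as Title/URL/Snippet blocks joined by a --- line."""
    return "\n---\n".join(
        f"Title: {c['title']}\nURL: {c['url']}\nSnippet: {c['snippet']}"
        for c in citations
    )
```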
* studio: drop Mistral built-in web_search — connector lives on Agents API only
Mistral's web_search is exclusively on /v1/agents + /v1/conversations;
sending it on /v1/chat/completions returns
"WebSearchTool connector is not supported". Wiring it would require a
dedicated Agents streaming path. Remove from the frontend capability map
and revert the chat-completions tool injection.
* studio: wire Kimi $web_search builtin via two-call round-trip
Kimi's $web_search lives on /v1/chat/completions but requires a client
round-trip per https://platform.kimi.ai/docs/guide/use-web-search:
the first call returns tool_calls with function.arguments populated;
the caller echoes those arguments back as a role=tool message; the
second call streams the final answer with search results incorporated.
The docs also mandate thinking=disabled while the builtin is active.
Backend: new _stream_kimi_web_search helper dispatched from
stream_chat_completion when provider_type=='kimi' and 'web_search' in
enabled_tools. Buffers tool_calls across deltas, falls back to a plain
stream if the model declines to search, and synthesizes tool_start
(with parsed query) / tool_end (with any url_citation annotations) so
the chat UI's web-search card behaves the same as other providers.
Frontend: kimi added to providerSupportsBuiltinWebSearch so the Search
pill lights up in the composer.
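The buffering and echo steps of the round-trip can be sketched like this. It is an illustration only: the helper names are hypothetical, and the exact message fields (`tool_call_id`, `name`, echoing the raw arguments string as `content`) are assumptions that follow the usual OpenAI-compatible tool-call shape rather than anything confirmed by this commit.

```python
def merge_tool_call_deltas(deltas):
    """Reassemble streamed tool_calls fragments, keyed by their index."""
    calls = {}
    for delta in deltas:
        for frag in delta.get("tool_calls") or []:
            call = calls.setdefault(
                frag.get("index", 0), {"id": "", "name": "", "arguments": ""}
            )
            call["id"] = frag.get("id") or call["id"]
            fn = frag.get("function") or {}
            call["name"] = fn.get("name") or call["name"]
            # Arguments arrive as string fragments; concatenate in order.
            call["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


def build_second_call_messages(messages, calls):
    """Return a new message list that echoes each call's arguments back."""
    assistant = {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": c["id"],
                "type": "function",
                "function": {"name": c["name"], "arguments": c["arguments"]},
            }
            for c in calls
        ],
    }
    tool_messages = [
        {"role": "tool", "tool_call_id": c["id"], "name": c["name"], "content": c["arguments"]}
        for c in calls
    ]
    # Build a fresh list so the caller's history is left untouched.
    return [*messages, assistant, *tool_messages]
```

If `merge_tool_call_deltas` returns an empty list, the model declined to search and the caller can fall back to a plain stream.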
* studio/chat: mutual exclusion of Think + Search on Kimi composer
Kimi's $web_search builtin requires thinking=disabled per
https://platform.kimi.ai/docs/guide/use-web-search, so the two states
cannot coexist. Make the pills mutually exclusive in both composers
(shared and welcome-screen): clicking Search turns Think off; clicking
Think back on turns Search off. Default Think to on when a Kimi model
is selected — k2.6/k2.5 ship with thinking enabled out of the box.
* studio/chat: fix wrong provider var name in onChange branch
selectedProvider, not provider — TS2304 in tsc -b.
* studio/backend: add diagnostics to Kimi $web_search round-trip
Log the actual function.arguments from the first call (so we can see
the model's search query) and the second call's usage.prompt_tokens +
any annotation type names that came through. prompt_tokens spiking
above the input message length is direct proof the server injected
search results into context. annotation_types lets us learn the shape
Kimi uses for citations if/when they emit any.
* studio: per-provider defaults — Anthropic xhigh + Search on, OpenAI high + Search on, Opus 4.7 gains max
Anthropic: Think effort defaults to the highest level the model
supports (xhigh on 4.6/4.7, high on 4.5) and Search starts on, since
the web_search_20250305 tool returns structured citations end-to-end.
OpenAI: Think effort defaults to 'high' (the gpt-5.x reasoning sweet
spot for /v1/responses + web_search) and Search starts on.
Opus 4.7: 'max' added as an effort level above 'xhigh' in both
backend (_ANTHROPIC_THINKING_SPECS) and frontend (ANTHROPIC_REASONING_MODELS).
Kimi diagnostics: emit tool_end immediately after tool_start so the
web-search card transitions to 'complete' before the second-call
answer streams, log first-call args + second-call usage/prompt_tokens
+ any annotation type names, request stream_options.include_usage so
the second call exposes usage in SSE.
* studio/backend: harden Kimi fallback path with HTTPError handler + manual aiter_lines loop
Addresses PR review feedback (#5443): the no-search fallback streaming
path was using `async for line in response.aiter_lines()` and had no
`httpx.HTTPError` guard around the POST. Switch to the manual
__anext__ loop pattern used elsewhere in this module (avoids the
Python 3.13 + httpcore 1.0.x GeneratorExit propagation issue) and wrap
the whole request in a try/except so network failures surface as a
proper SSE error frame instead of a raw traceback.
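The shape of the hardened loop might look roughly like the sketch below. To keep it runnable without httpx, `TransportError` stands in for `httpx.HTTPError` and `open_stream` is a hypothetical callable returning an async line iterator; neither name comes from the module itself.

```python
import asyncio
import json


class TransportError(Exception):
    """Stand-in for httpx.HTTPError so the sketch has no third-party deps."""


async def stream_lines_safely(open_stream):
    """Yield lines from an async iterator, converting failures to SSE frames."""
    try:
        lines = open_stream().__aiter__()
        while True:
            # Manual __anext__ loop instead of `async for`.
            try:
                line = await lines.__anext__()
            except StopAsyncIteration:
                break
            yield line
    except TransportError as exc:
        # Surface the failure as an SSE error frame rather than a traceback.
        yield "data: " + json.dumps({"error": {"message": str(exc)}}) + "\n\n"
```

Consumers see either the upstream lines or a single trailing error frame, so the chat UI can render the failure inline.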
* feat: prompt caching frontend for openai/anthropic
* studio/chat: route vLLM provider to /v1/chat/completions, not /v1/responses
vLLM's /v1/responses rebuilds messages through the loaded model's chat
template, which 400s on strict-alternation templates like Gemma 3
("Conversation roles must alternate user/assistant/..."). Stop collapsing
vllm -> openai in the frontend so the backend sees the real provider type
and falls through to the standard chat-completions path. Register vllm as
a hidden entry in PROVIDER_REGISTRY so supports_vision and provider-create
validation work without surfacing it in the cloud-provider dropdown.
* studio/chat: wire prompt caching for OpenAI and Anthropic external providers
Backend half of the prompt_caching toggle that already exists in the chat
settings panel. Scoped to OpenAI cloud (/v1/responses) and Anthropic
(/v1/messages); every other provider plumbs the flag as a no-op.
- Anthropic: attach cache_control={type:ephemeral} to the system block so
the static prefix is reused across turns. Without the marker Anthropic
caches nothing, so this is the only way to make the toggle do real work
on /v1/messages.
- OpenAI: opt into prompt_cache_retention="24h" — same price as the
default in_memory policy per the OpenAI docs, but the cache survives
~24 hours of idle instead of ~5-10 minutes. The model picker is
registry-scoped to gpt-5.x / o3 / gpt-4.5, all of which accept the
parameter (gpt-5.5+ already defaults to "24h" so it's a no-op there).
- Treats `enable_prompt_caching=None` as enabled to match the frontend
default for both providers; pass `false` explicitly to opt out.
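As an illustration of the Anthropic half, here is one way the system-block marker and the None-means-enabled rule could be written. The function name is hypothetical, and placing the marker on the last system block is an assumption.

```python
def mark_system_for_caching(body, enable_prompt_caching=None):
    """Return a body whose system prompt carries an ephemeral cache marker.

    None is treated as enabled (the frontend default); only an explicit
    False opts out.
    """
    if enable_prompt_caching is False or not body.get("system"):
        return body
    system = body["system"]
    if isinstance(system, str):
        # Promote a plain string to block form so it can hold cache_control.
        blocks = [{"type": "text", "text": system}]
    else:
        blocks = [dict(block) for block in system]
    blocks[-1]["cache_control"] = {"type": "ephemeral"}
    return {**body, "system": blocks}
```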
* studio/chat: log cache token counts on OpenAI and Anthropic stream completion
Surface cache usage in the existing "stream complete" info logs so
prompt-caching behavior can be verified by tailing the studio backend
log instead of opening the provider dashboard.
- Anthropic: latch usage from message_start (input + cache_creation +
cache_read counts) and message_delta (output_tokens), then include in
the per-request summary. cache_read_input_tokens > 0 confirms the
cache_control marker on the system block is doing its job.
- OpenAI Responses: latch usage from response.completed and
response.incomplete, extract usage.input_tokens_details.cached_tokens
(the /v1/responses field name, not prompt_tokens_details). A non-zero
value on turn N proves prompt_cache_retention="24h" let the prefix
hit the cache instead of being recomputed.
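A small sketch of the usage latching described above. The helper names are hypothetical; the field names are the ones this commit mentions, and nesting the OpenAI usage under `response` on the terminal event is an assumption.

```python
INPUT_KEYS = ("input_tokens", "cache_creation_input_tokens", "cache_read_input_tokens")


def anthropic_usage_summary(events):
    """Latch input-side counts from message_start and output from message_delta."""
    summary = {key: 0 for key in (*INPUT_KEYS, "output_tokens")}
    for ev in events:
        if ev.get("type") == "message_start":
            usage = (ev.get("message") or {}).get("usage") or {}
            for key in INPUT_KEYS:
                summary[key] = usage.get(key) or 0
        elif ev.get("type") == "message_delta":
            out = (ev.get("usage") or {}).get("output_tokens")
            if out is not None:
                summary["output_tokens"] = out
    return summary


def openai_cached_tokens(event):
    """Return cached_tokens from a terminal Responses event, else None."""
    if event.get("type") not in ("response.completed", "response.incomplete"):
        return None
    usage = (event.get("response") or {}).get("usage") or {}
    return (usage.get("input_tokens_details") or {}).get("cached_tokens", 0)
```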
* studio/backend: strip temperature/top_p for Claude 4.7 family
Anthropic Opus 4.7 removed temperature, top_p, and top_k as a launch
breaking change ("Sampling parameters removed" in the 4.7 release notes
at https://platform.claude.com/docs/en/about-claude/models/whats-new-claude-4-7).
Setting any of them to a non-default value returns 400
"<param> is deprecated for this model". The existing guard only handled
top_k; temperature was still being sent unconditionally and is now
breaking opus-4-7 requests.
Rename _ANTHROPIC_TOP_K_DEPRECATED to _ANTHROPIC_4_7_SAMPLING_REMOVED to
reflect the broader scope, omit temperature from the base body on 4.7,
and skip the thinking-mode temperature=1 override on 4.7 (still applied
on 4.5/4.6 where it's required). Existing thinking_translation tests
target 4.5/4.6 / mock the wire so they're unaffected.
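The guard could be sketched as below. The prefix tuple and function name are hypothetical stand-ins for `_ANTHROPIC_4_7_SAMPLING_REMOVED` and the real body builder; the actual matching rule in the backend may differ.

```python
# Hypothetical prefix list; assumes affected model ids start with this string.
SAMPLING_REMOVED_PREFIXES = ("claude-opus-4-7",)


def build_anthropic_body(model, messages, temperature=None, top_p=None,
                         top_k=None, thinking=None):
    """Build a request body, omitting sampling params where they were removed."""
    body = {"model": model, "messages": messages}
    sampling_removed = model.startswith(SAMPLING_REMOVED_PREFIXES)
    if not sampling_removed:
        for name, value in (("temperature", temperature), ("top_p", top_p), ("top_k", top_k)):
            if value is not None:
                body[name] = value
    if thinking is not None:
        body["thinking"] = thinking
        if not sampling_removed:
            # Thinking-mode override, applied only where sampling params exist.
            body["temperature"] = 1
    return body
```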
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* studio/chat: anchor Anthropic prompt cache on the latest message too
A system-only cache_control marker is a no-op when the system prompt is
empty or shorter than Anthropic's ~1024-token cache floor — caching
silently does nothing (both cache_creation and cache_read return 0).
Add a second cache_control breakpoint on the final block of the latest
conversation message so the entire prefix (system + prior turns + new
user turn) becomes eligible for caching. On turn N+1, Anthropic
rehydrates everything up through turn N's marker instead of recomputing
it. Up to 4 breakpoints are allowed per request; we use at most 2
(system + tail). Tail rebuild avoids mutating the caller's content list
so an image-bearing turn still slots cleanly into the cached prefix.
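One possible way to add the tail breakpoint without mutating the caller's data is sketched here. The function name is hypothetical, and the sketch assumes message content is either a string or a list of block dicts.

```python
def add_tail_cache_breakpoint(messages):
    """Return a new message list with cache_control on the last block."""
    if not messages:
        return messages
    last = messages[-1]
    content = last.get("content")
    if isinstance(content, str) and content:
        blocks = [{"type": "text", "text": content}]
    elif isinstance(content, list) and content and isinstance(content[-1], dict):
        # Shallow-copy the list so the caller's content list is not modified.
        blocks = list(content)
    else:
        return messages
    # Replace (not mutate) the final block with a marked copy.
    blocks[-1] = {**blocks[-1], "cache_control": {"type": "ephemeral"}}
    return [*messages[:-1], {**last, "content": blocks}]
```

Earlier blocks, such as an image block, are reused as-is; only the final block is replaced with a marked copy.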
* studio/chat: gate vLLM reasoning toggle on provider config
Add a "This server runs a reasoning model" checkbox on the vLLM
provider config. When off (default), the chat Think pill stays
hidden and no enable_thinking ever reaches vLLM. When on, the
pill renders, per-turn state flows through the existing
enable_thinking plumbing, and the backend proxy lifts it onto
chat_template_kwargs.enable_thinking so vLLM's Jinja template
honours it.
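The lifting step might be written along these lines. The function and the `is_reasoning_server` parameter are hypothetical names for the provider-config checkbox; only `chat_template_kwargs.enable_thinking` comes from the commit text.

```python
def lift_enable_thinking(body, is_reasoning_server):
    """Move a top-level enable_thinking flag into chat_template_kwargs."""
    body = dict(body)  # copy so the caller's dict is untouched
    flag = body.pop("enable_thinking", None)
    if is_reasoning_server and flag is not None:
        kwargs = dict(body.get("chat_template_kwargs") or {})
        kwargs["enable_thinking"] = bool(flag)
        body["chat_template_kwargs"] = kwargs
    return body
```

When the checkbox is off, the flag is simply dropped, so no enable_thinking value reaches the server.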
* chore: clean vLLM reasoning-toggle comments
* studio/chat: gate prompt_cache_retention to actual OpenAI cloud requests
Addresses Codex P1 review on _stream_openai_responses. The frontend
only sends enable_prompt_caching for the openai/anthropic UI provider
types, so ollama/llama.cpp/"custom" requests reach this helper with
the flag as None. The previous `is not False` check treated None as
enabled and injected prompt_cache_retention="24h" into every request
including those bound for non-OpenAI servers, which would 400 on
servers that implement /v1/responses but not the retention parameter.
Match the public OpenAI host (api.openai.com) on the client base_url
before adding the field so it only lands on actual OpenAI cloud
requests. Studio's openai picker is already registry-scoped to
gpt-5.x / o3 / gpt-4.5, all of which accept the parameter.
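A minimal sketch of the host gate, assuming the check is a simple hostname comparison on the client base_url. The function names are hypothetical.

```python
from urllib.parse import urlparse


def is_openai_cloud(base_url):
    """True only when the base URL points at the public OpenAI host."""
    return (urlparse(base_url).hostname or "").lower() == "api.openai.com"


def maybe_add_retention(body, base_url, enable_prompt_caching=None):
    """Add prompt_cache_retention only for real OpenAI cloud requests."""
    # None still counts as enabled, but the host check now gates the field.
    if enable_prompt_caching is False or not is_openai_cloud(base_url):
        return body
    return {**body, "prompt_cache_retention": "24h"}
```

With this gate, a request to a local server such as `http://localhost:11434/v1` passes through unchanged even when the flag is None.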
---------
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
||
|
|
4999753514
|
Studio: o3 reasoning summary payload (#5426)
* fix: o3 reasoning summary payload
* fix: omit reasoning.summary for o3 in enable_thinking branch
---------
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
|
||
|
|
3f8c672636
|
studio/chat: built-in web search for OpenAI, Anthropic, OpenRouter, Kimi (#5443)
* studio: wire the chat Search button to OpenAI's built-in web_search tool
When the active model is an OpenAI external provider and the user
clicks the existing Search pill in the composer, the chat-completion
request now carries the unified enable_tools shorthand:
enable_tools: true
enabled_tools: ["web_search"]
The backend's stream_chat_completion threads enabled_tools through
to _stream_openai_responses, which translates it into the Responses
API tool schema:
body["tools"] = [{"type": "web_search"}]
per the OpenAI Responses tool spec
(https://developers.openai.com/api/docs/guides/tools). OpenAI then
runs the search server-side before the model replies; the search-
informed answer streams back through the existing
response.output_text.delta path. web_search_call lifecycle events
are silently ignored for now — sources / status indicators are
follow-up scope.
Frontend:
- provider-capabilities.ts: new providerSupportsBuiltinWebSearch()
helper. Returns true only for `openai` today; Anthropic
(web_search_20250305), Gemini grounded-search, and OpenRouter
variants can be added later with matching backend translation.
- chat-page.tsx: both model-switch paths (the onChange handler and
the inferenceParams.checkpoint useEffect) set supportsTools to
match the new helper, and force toolsEnabled=false on every
external switch so the Search toggle is opt-in by default.
- chat-adapter.ts: external branch adds enable_tools +
enabled_tools=["web_search"] to the request body when the
toggle is on AND the active provider supports built-in
web-search. Local-model branch is unchanged — it continues to
route the same shorthand through our local tool runtime.
Backend:
- routes/inference.py: forwards payload.enabled_tools to
stream_chat_completion at the proxy site (line 1599).
- external_provider.py: stream_chat_completion gains an
enabled_tools parameter; _stream_openai_responses appends
{"type": "web_search"} to body["tools"] when the list contains
"web_search". Other tools (file_search, code_interpreter,
image_generation, computer_use_preview) are easy follow-ups in
the same block.
Reuses the existing pydantic ChatCompletionRequest.enabled_tools
field, so no schema migrations.
* studio/backend: surface OpenAI server-side web_search in the chat UI
When the user has the chat Search button toggled on and OpenAI's
/v1/responses invokes the built-in web_search tool, _stream_openai_responses
now translates the tool's lifecycle events and citation annotations
into the same _toolEvent shape that local-tool calls use. The result:
the chat UI shows a web_search tool-call card mid-stream, then lists
the cited sources at the end of the message — identical to how local
web_search renders.
SSE event translation:
- response.output_item.added with item.type=web_search_call ->
emit _toolEvent tool_start. Carries item.action.query as args
when OpenAI ships it on the added event.
- response.output_item.done with item.type=web_search_call ->
backfill the query if it only arrives on the done variant. The
existing reasoning branch on the same event is preserved as an
if/elif under a shared isinstance guard.
- response.output_text.annotation.added with type=url_citation ->
collect into the most-recent web_search_call.citations list.
- response.output_text.delta with inline annotations[] (older
API variant) -> same collection path, so both wire shapes work.
- response.completed -> emit _toolEvent tool_end per call with
citations formatted as
Title: <title>\nURL: <url>\nSnippet: <snippet>
blocks joined by `\n---\n`. The frontend's
parseSourcesFromResult already lifts this format into source
content parts at end-of-stream.
- response.incomplete -> close out web_search cards with whatever
citations had landed, so a truncated response does not leave a
perpetually "running" tool card in the UI.
Both reasoning and web_search work simultaneously on the same turn —
the body sends `reasoning: {effort, summary}` and `tools: [{type:
"web_search"}]` independently, and the SSE handler tracks them
through separate channels.
Diagnostic: finally-block logger now reports per stream
web_search_requested - whether the client asked for it
web_search_invocations - how many calls OpenAI actually made
citations - total URLs cited
queries - the search queries the model issued
reasoning_emitted - whether <think> content was streamed
so reports of "I clicked Search and nothing happened" can be triaged
from the backend log without browser devtools.
* studio/backend: fix empty query + per-card '(no sources cited)' on OpenAI web_search
Two display bugs on the OpenAI Responses web_search → chat-UI bridge:
1. Tool cards showed "Searching for ''" — query missing.
OpenAI's response.output_item.added for web_search_call does not
reliably populate action.query across API versions; the canonical
place is output_item.done. The previous code emitted tool_start
at added with empty args and tried to backfill at done, but the
frontend's _toolEvent: tool_start is a one-shot push (no update
mechanism), so the args stayed empty.
Fix: defer both tool_start *and* a placeholder tool_end emission
to output_item.done, where action.query is guaranteed populated.
added now just initialises tracking. Frontend then renders one
card per call with the right "Searching for: <query>" label.
2. Every card showed "(no sources cited)".
The previous code tried to attribute url_citation annotations
to individual web_search_call invocations, but OpenAI's
annotations carry no link back to a specific search call —
they're just URLs the model cited from the aggregated search
pool. With N invocations and M annotations, the previous logic
bucketed all M into the last call and stamped "(no sources
cited)" on the rest.
Fix: collect citations into a single shared all_url_citations
list, dedup by URL. At response.completed (and
response.incomplete) overwrite the *last* web_search_call's
tool_end result with the aggregated Title:/URL:/Snippet:
blocks. The frontend's parseSourcesFromResult already flatMaps
every web_search result, so one non-empty result is enough to
surface the full source-pill set at the message tail. Other
tool cards get an empty result string (no '(no sources)' text).
Diagnostic log unchanged in shape; total_citations now reads
len(all_url_citations) directly.
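The aggregation rule from fix 2 can be sketched as follows. The function name is hypothetical, and reading a `snippet` key off each annotation is an assumption made only to fill the Snippet line.

```python
def finalize_web_search_results(call_ids, annotations):
    """Map each web_search call id to its tool_end result string.

    All deduplicated citations go to the last call; earlier calls get "".
    """
    seen, blocks = set(), []
    for ann in annotations:
        url = ann.get("url")
        if not url or url in seen:
            continue
        seen.add(url)
        blocks.append(
            f"Title: {ann.get('title', '')}\nURL: {url}\nSnippet: {ann.get('snippet', '')}"
        )
    results = {call_id: "" for call_id in call_ids}
    if call_ids:
        results[call_ids[-1]] = "\n---\n".join(blocks)
    return results
```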
* studio/chat: split Code and Search pill gates so external models cannot enable Code
The previous wire-up set supportsTools=true for OpenAI external
models to light up the Search pill, but supportsTools also gates the
Code pill, so Code became clickable for OpenAI even though external
providers have no local code execution.
Separate the two gates so each pill reflects what's actually
available:
- chat-runtime-store: new `supportsBuiltinWebSearch: boolean` flag.
Distinct from supportsTools — that one still means "runtime has a
local tool sandbox" (Code, python, our DuckDuckGo web_search).
This one means "the active external provider exposes a server-side
web_search tool we can opt into" (OpenAI's /v1/responses today).
- chat-page model-switch (both code paths): for external models,
supportsTools is now forced to false (no local Code path) and
supportsBuiltinWebSearch follows providerSupportsBuiltinWebSearch.
Local-model paths are unaffected — they only set supportsTools.
- shared-composer: Search pill gates on
`searchDisabled = !modelLoaded || !(supportsTools ||
supportsBuiltinWebSearch)`. Code pill gates on
`codeDisabled = !modelLoaded || !supportsTools` — strictly the
local runtime, so external models keep Code greyed out.
A `toolsDisabled = codeDisabled` alias is left in place for any
later-touched call site that may still reference the old name.
No backend changes — chat-adapter already calls
providerSupportsBuiltinWebSearch directly, independent of the store
flags, so the request shape and the backend translation are
unchanged.
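The two gates reduce to a pair of boolean expressions. The sketch below restates them in Python for readability; the real logic lives in the TypeScript composer, and the function name is hypothetical.

```python
def pill_gates(model_loaded, supports_tools, supports_builtin_web_search):
    """Compute the disabled state of the Search and Code pills."""
    search_disabled = not model_loaded or not (
        supports_tools or supports_builtin_web_search
    )
    code_disabled = not model_loaded or not supports_tools
    return {"searchDisabled": search_disabled, "codeDisabled": code_disabled}
```

For an external OpenAI model (`supports_tools=False`, `supports_builtin_web_search=True`) this leaves Search enabled and Code greyed out.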
* studio/chat: default external reasoning effort to medium, not the carry-over
When switching to an external model with reasoning support, the effort
dropdown was inheriting whatever value the user had set on a prior
model — frequently "xhigh" left over from a previous Opus/gpt-5
session. That meant every fresh OpenAI/Anthropic selection started at
Extra High, burning tokens unintentionally.
Both model-switch sites in chat-page (the useEffect on
inferenceParams.checkpoint and the onChange callback) now pick
"medium" whenever the new model's level list contains it, instead of
the clamped carry-over. The clamp still fires as a fallback for the
narrow case where a model doesn't expose medium (gpt-5.3-chat-latest
exposes only medium, so nothing changes there). Users can
still pick another level explicitly via the Think dropdown.
* studio/chat: also light the Search pill in the welcome-screen composer
There are two composers in the chat feature. shared-composer.tsx
renders inside an active thread, and assistant-ui/thread.tsx has its
own WebSearchToggle / CodeToolsToggle that ship the welcome-screen
"Send a message…" composer (visible before the first user message).
The previous fix split supportsTools and supportsBuiltinWebSearch in
shared-composer but never touched the welcome-screen toggles in
thread.tsx — they both still gated on supportsTools alone, so the
Search pill stayed greyed on the welcome screen even for OpenAI
external models that legitimately support web_search server-side.
Mirror the shared-composer rule in WebSearchToggle:
disabled = !modelLoaded || !(supportsTools || supportsBuiltinWebSearch)
CodeToolsToggle is left as-is — its current
`disabled = !(modelLoaded && supportsTools)` is correct: external
models have no local code-execution sandbox, so Code stays greyed
when supportsTools=false (which is what chat-page now writes for
external selections).
* studio/backend: wire Anthropic server-side web_search end-to-end
Mirrors the OpenAI web_search integration for Anthropic's
web_search_20250305 tool. When the user toggles Search on with an
Anthropic model selected, the request now carries the documented
tool entry:
tools: [{type: "web_search_20250305", name: "web_search",
max_uses: 5}]
on /v1/messages, and the SSE translation surfaces tool cards +
source pills in the chat UI exactly the same way as OpenAI.
stream_chat_completion now forwards enabled_tools into the
Anthropic branch (was only doing this for the OpenAI Responses
branch). _stream_anthropic gains an enabled_tools parameter and
the web_search request-body block plus three additional event
handlers:
- content_block_start with type=server_tool_use, name=web_search:
start tracking a new call. id becomes the tool_call_id.
- content_block_delta with type=input_json_delta inside a
server_tool_use block: buffer the partial_json so we can read
out the search query when the block closes.
- content_block_start with type=web_search_tool_result: capture
the per-call result list (urls + titles) that Anthropic ships
inline.
- content_block_stop: closes whichever block we're inside —
* server_tool_use -> emit _toolEvent: tool_start with the
parsed query as args.
* web_search_tool_result -> emit _toolEvent: tool_end with
Title:/URL: blocks the frontend's parseSourcesFromResult
lifts into source pills.
* thinking block -> existing </think> close.
Unlike OpenAI we get per-call results directly, so no aggregated-
last-call fallback is needed — each tool card carries its own
citations.
Diagnostic log on stream completion now reports
web_search_requested / invocations / total_results / queries,
matching the OpenAI shape.
Frontend providerSupportsBuiltinWebSearch returns true for
'anthropic' as well, so the Search pill lights up on Claude
models the same way it does on OpenAI. The existing chat-adapter
external branch already sends enabled_tools=['web_search'] based
on this helper — no adapter changes needed.
* studio: wire OpenRouter built-in web search via :online model suffix
OpenRouter exposes a universal "add web search to any model" shortcut:
append `:online` to the model id and the gateway runs the search
server-side, streaming citations back as annotations on text deltas.
Documented at https://openrouter.ai/docs/features/web-search
Hook the existing Search toggle into that path:
Backend (external_provider.py, default OAI-compat branch):
- When provider_type == 'openrouter' and enabled_tools contains
'web_search', rewrite body['model']:
openai/gpt-4o -> openai/gpt-4o:online
anthropic/claude-sonnet-4-5:free -> anthropic/claude-sonnet-4-5:online
Any existing `:variant` (`:free`, `:nitro`, etc.) is replaced —
OpenRouter variants are mutually exclusive.
- `openrouter/free` is skipped: it's a meta-router and `:online` is
not a valid suffix on it (the gateway 400s).
- A one-line INFO log fires whenever the rewrite happens so the
diagnostic backend log shows exactly which model id the request
was promoted to.
Frontend (provider-capabilities.ts):
- providerSupportsBuiltinWebSearch now returns true for 'openrouter'
alongside 'openai' and 'anthropic'. The Search pill lights up and
the existing chat-adapter external branch already forwards
enabled_tools=['web_search'] based on this helper — no adapter
changes needed.
No new SSE event handling: OpenRouter does not emit a separate
web_search_call event the way OpenAI/Anthropic do. Citations come
back as text annotations via the existing reasoning_details path
the adapter already parses, so source data flows through without
extra translation. A per-call tool-card UX ("Searching for: …")
would require synthesizing one client-side; deferred to a follow-up
if the bare-citation flow feels too minimal.
* studio: wire Mistral built-in web search connector
Same shape as OpenAI's web_search tool, lives on
/v1/chat/completions instead of /v1/responses. When the chat
Search pill is toggled on with a Mistral model selected, the
backend now appends
{"type": "web_search"}
to body["tools"] before the request goes out. Idempotent —
won't double-append if a future call site adds it first. Models
in the registry allowlist that don't support the connector
(codestral, devstral, ministral, mistral-tiny) will surface a
400 from upstream; the existing default-path error log captures
it. Mistral's docs:
https://docs.mistral.ai/capabilities/agents/connectors/websearch
Frontend providerSupportsBuiltinWebSearch returns true for
'mistral' now, alongside openai / anthropic / openrouter. The
Search pill lights up for Mistral models and the existing
adapter branch already sends enabled_tools=['web_search'] off
this helper — no adapter changes.
No SSE translation yet — Mistral streams citations inline as
text annotations or `references` in the final assistant content,
not as a separate web_search_call event. Citations flow through
to the message body as text; a per-call tool-card UX with
"Searching for: …" indicators is a follow-up if needed.
* studio/backend: fix OpenRouter web_search to use plugins shape + synthesize tool card
Two changes against the actual OpenRouter docs at
https://openrouter.ai/docs/guides/features/plugins/web-search:
Request shape:
The previous commit appended :online to the model id, which works on
concrete model ids but is rejected on meta-routers like openrouter/free,
and that's exactly the model the user was testing with, so neither
the request rewrite nor the diagnostic log fired. Switch to the
universal plugins shape:
body["plugins"] = [{"id": "web"}]
Per the docs this is "exactly equivalent" to :online but works on
every model id including openrouter/free and openrouter/auto. No
model suffix manipulation, idempotent if added twice.
Tool-card synthesis:
OpenRouter doesn't emit a structured web_search_call event the way
OpenAI/Anthropic do — citations come back only as `annotations` of
type=url_citation on delta/message objects. To match the chat-UI
tool-card UX the user expects ("Searching for: …" indicator,
source pills at message tail), synthesize the events client-side
in the default OAI-compat stream loop:
- On stream open (after the 200 status check): yield a synthetic
_toolEvent: tool_start with tool_name=web_search, fixed id
"openrouter_web_search". The chat-UI then renders the running
tool card before any text streams.
- During the SSE loop: scan every chunk's choices[].delta and
choices[].message for `annotations: [{type: "url_citation",
url_citation: {url, title, content}}]` entries. Dedup by URL
into a citations list. Handles both the nested-url_citation
shape OpenRouter documents and the flat-on-annotation shape
some upstreams ship.
- On [DONE] (or stream-close without [DONE]): emit synthetic
tool_end carrying the citations as
Title: …\nURL: …\nSnippet: …\n---\n…
blocks the existing parseSourcesFromResult lifts into source
pills at message tail.
Diagnostic log on completion now also reports
web_search_requested + citation count alongside the existing
chosen-model / event-count telemetry.
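The citation scan and tool_end payload described above could look roughly like the sketch below. The function names are hypothetical and the real stream loop is not shown in this log; only the annotation shapes and the Title/URL/Snippet block layout come from the commit message.

```python
def collect_citations(chunk: dict, seen: dict) -> None:
    """Record url_citation annotations from one parsed SSE chunk, deduped by URL."""
    for choice in chunk.get("choices") or []:
        for key in ("delta", "message"):
            part = choice.get(key) or {}
            for annotation in part.get("annotations") or []:
                if annotation.get("type") != "url_citation":
                    continue
                # Nested shape keeps the fields under "url_citation";
                # the flat shape puts them directly on the annotation.
                cite = annotation.get("url_citation") or annotation
                url = cite.get("url")
                if url and url not in seen:
                    seen[url] = {
                        "title": cite.get("title") or "",
                        "content": cite.get("content") or "",
                    }


def format_tool_result(seen: dict) -> str:
    """Render citations as Title/URL/Snippet blocks separated by '---' lines."""
    blocks = [
        f"Title: {cite['title']}\nURL: {url}\nSnippet: {cite['content']}"
        for url, cite in seen.items()
    ]
    return "\n---\n".join(blocks)
```

Keying the accumulator by URL gives the dedup for free, and dict insertion order keeps sources in the order they first appeared in the stream.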
* studio: drop Mistral built-in web_search — connector lives on Agents API only
Mistral's web_search is exclusively on /v1/agents + /v1/conversations;
sending it on /v1/chat/completions returns
"WebSearchTool connector is not supported". Wiring it would require a
dedicated Agents streaming path. Remove from the frontend capability map
and revert the chat-completions tool injection.
* studio: wire Kimi $web_search builtin via two-call round-trip
Kimi's $web_search lives on /v1/chat/completions but requires a client
round-trip per https://platform.kimi.ai/docs/guide/use-web-search:
the first call returns tool_calls with function.arguments populated;
the caller echoes those arguments back as a role=tool message; the
second call streams the final answer with search results incorporated.
The docs also mandate thinking=disabled while the builtin is active.
Backend: new _stream_kimi_web_search helper dispatched from
stream_chat_completion when provider_type=='kimi' and 'web_search' in
enabled_tools. Buffers tool_calls across deltas, falls back to a plain
stream if the model declines to search, and synthesizes tool_start
(with parsed query) / tool_end (with any url_citation annotations) so
the chat UI's web-search card behaves the same as other providers.
Frontend: kimi added to providerSupportsBuiltinWebSearch so the Search
pill lights up in the composer.
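To illustrate the "buffers tool_calls across deltas" step, here is a hedged sketch of how streamed tool-call fragments might be merged and then echoed back as a role=tool message. The helper names and the exact echo-message fields are assumptions based on the OpenAI-compatible streaming shape, not the actual `_stream_kimi_web_search` code.

```python
import json


def merge_tool_call_deltas(buffer: dict, delta: dict) -> None:
    """Accumulate streamed tool_call fragments, keyed by their index."""
    for fragment in delta.get("tool_calls") or []:
        index = fragment.get("index", 0)
        slot = buffer.setdefault(index, {"id": None, "name": "", "arguments": ""})
        if fragment.get("id"):
            slot["id"] = fragment["id"]
        function = fragment.get("function") or {}
        if function.get("name"):
            slot["name"] = function["name"]
        # function.arguments arrives as string pieces that must be concatenated.
        slot["arguments"] += function.get("arguments") or ""


def build_tool_echo(slot: dict) -> dict:
    """Echo the model's own arguments back as the tool result (assumed shape)."""
    return {
        "role": "tool",
        "tool_call_id": slot["id"],
        "name": slot["name"],
        "content": slot["arguments"],
    }


def parsed_query(slot: dict):
    """Best-effort extraction of a search query for the tool_start card."""
    try:
        return json.loads(slot["arguments"]).get("query")
    except (ValueError, AttributeError):
        return None
```

If the buffer is still empty when the first call finishes, the model declined to search and the caller can fall back to a plain stream.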
* studio/chat: mutual exclusion of Think + Search on Kimi composer
Kimi's $web_search builtin requires thinking=disabled per
https://platform.kimi.ai/docs/guide/use-web-search, so the two states
cannot coexist. Make the pills mutually exclusive in both composers
(shared and welcome-screen): clicking Search turns Think off; clicking
Think back on turns Search off. Default Think to on when a Kimi model
is selected — k2.6/k2.5 ship with thinking enabled out of the box.
* studio/chat: fix wrong provider var name in onChange branch
selectedProvider, not provider — TS2304 in tsc -b.
* studio/backend: add diagnostics to Kimi $web_search round-trip
Log the actual function.arguments from the first call (so we can see
the model's search query) and the second call's usage.prompt_tokens +
any annotation type names that came through. prompt_tokens spiking
above the input message length is direct proof the server injected
search results into context. annotation_types lets us learn the shape
Kimi uses for citations if/when they emit any.
* studio: per-provider defaults — Anthropic xhigh + Search on, OpenAI high + Search on, Opus 4.7 gains max
Anthropic: Think effort defaults to the highest level the model
supports (xhigh on 4.6/4.7, high on 4.5) and Search starts on, since
the web_search_20250305 tool returns structured citations end-to-end.
OpenAI: Think effort defaults to 'high' (the gpt-5.x reasoning sweet
spot for /v1/responses + web_search) and Search starts on.
Opus 4.7: 'max' added as an effort level above 'xhigh' in both
backend (_ANTHROPIC_THINKING_SPECS) and frontend (ANTHROPIC_REASONING_MODELS).
Kimi diagnostics: emit tool_end immediately after tool_start so the
web-search card transitions to 'complete' before the second-call
answer streams, log first-call args + second-call usage/prompt_tokens
+ any annotation type names, request stream_options.include_usage so
the second call exposes usage in SSE.
* studio/backend: harden Kimi fallback path with HTTPError handler + manual aiter_lines loop
Addresses PR review feedback (#5443): the no-search fallback streaming
path was using `async for response.aiter_lines()` and had no
`httpx.HTTPError` guard around the POST. Switch to the manual
__anext__ loop pattern used elsewhere in this module (avoids the
Python 3.13 + httpcore 1.0.x GeneratorExit propagation issue) and wrap
the whole request in a try/except so network failures surface as a
proper SSE error frame instead of a raw traceback.
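The manual `__anext__` loop with an error guard can be sketched generically as below. This is an illustration only: `ConnectionError` stands in for `httpx.HTTPError`, the error-frame JSON shape is an assumption, and `relay_lines` is a hypothetical name.

```python
import json


async def relay_lines(lines):
    """Relay an async line iterator, turning a network failure into an SSE error frame."""
    iterator = lines.__aiter__()
    try:
        while True:
            try:
                # Manual stepping instead of `async for`, so cleanup order stays
                # under our control in the finally block below.
                line = await iterator.__anext__()
            except StopAsyncIteration:
                break
            except ConnectionError as exc:  # stand-in for httpx.HTTPError
                yield "data: " + json.dumps({"error": {"message": str(exc)}})
                break
            yield line
    finally:
        # Close the upstream iterator explicitly if it supports it.
        aclose = getattr(iterator, "aclose", None)
        if aclose is not None:
            await aclose()
```

A consumer sees ordinary lines until the failure, then one final `data: {"error": ...}` line rather than an exception.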
|
||
|
|
30f6280835
|
studio/frontend: drop unused next dependency (#5438)
The frontend is a Vite SPA wrapped by Tauri and served by FastAPI's StaticFiles in web mode. Nothing in src imports from next/, no next.config exists, and no script invokes the Next.js server. The package was dead weight in node_modules and was being flagged by SCA scanners under CVE-2026-44578 (Next.js SSRF via WebSocket upgrade) despite the vulnerable code path never being reachable.
next-themes is unrelated and stays; its only peers are react and react-dom.
Verified with npm install + npm run build (tsc -b && vite build), clean exit, dist/ produced as before. |
||
|
|
762657afd2
|
studio/mlx: lower per-element grad clip default from 5.0 to 1.0 (#5440)
Studio's MLX training worker explicitly pinned ``max_grad_value=5.0``
into the ``MLXTrainingConfig`` so it would override the zoo default
regardless. The 5.0 threshold was effectively no protection -- per-
element transformer gradients in steady state are 1e-3..1e-1, so
|g_i| > 5 basically never fires even on spike batches, mixed-precision
overflow, or RL gradient bursts.
Switch to 1.0:
- matches the universal LLM clip_grad_norm=1.0 baseline (HF Trainer
/ TRL / PEFT / AutoTrain) while staying on MLX's fast per-element
``tree_map(mx.clip)`` path (no global reduction)
- actually catches outliers without distorting Adam's normalised
updates (typical post-warmup |g_i| << 1.0)
- lines up with the new MLXTrainingConfig default in
unslothai/unsloth-zoo so Studio doesn't silently disagree with
what zoo ships
No UI change; the TODO to expose grad clipping in Studio settings
remains. Existing trained runs are unaffected: only newly-spawned
training workers pick up the tighter clip.
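For contrast, a small pure-Python sketch of the two clipping styles discussed above (per-element value clipping versus global-norm clipping). It uses plain lists rather than MLX arrays, and both function names are illustrative.

```python
import math


def clip_by_value(grads, limit):
    """Clamp every gradient element independently to [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]


def clip_by_global_norm(grads, max_norm):
    """Rescale the whole gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grads]
```

With gradients `[0.01, -0.05, 3.0]`, a value limit of 5.0 changes nothing, while a limit of 1.0 clamps only the outlier and leaves the small elements untouched. Norm clipping at 1.0 instead shrinks every element, but it needs the global reduction that the per-element path avoids.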
|
||
|
|
bbd0ba0c25
|
studio/mmproj: skip unwanted GGUF values via seek instead of read (#5431)
The previous _skip_gguf_value walked past discarded values with f.read(n), which allocates and immediately drops a Python bytes object. For weight GGUFs that carry tokenizer.ggml.tokens (~150K unicode strings) this wasted ~10 MB of allocation per cold call.
Switch the discard path to f.seek(n, 1). The kernel never has to copy the bytes into userspace and Python never allocates. Truncation is now detected on the next read attempt rather than inline (an out-of-range seek on a regular file is legal and the next read returns short).
Measured on real downloaded GGUFs (Qwen3.5-4B IQ2_XXS 1.52 GB, bartowski Qwen3.5-4B IQ2_M 1.70 GB, Qwen3.5-4B-MTP IQ2_M 1.94 GB):
before: 142 ms cold per weight, ~11 MB read
after: 90 ms cold per weight, ~4 MB read
Mmproj reads are unaffected (no tokenizer to skip). Cached re-reads remain ~50 microseconds.
All 161 in-tree backend tests + 85 isolated sandbox tests pass. |
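A minimal sketch of the seek-based skip for a single GGUF string value, assuming the usual layout of a little-endian uint64 length followed by that many bytes. The helper names are illustrative and do not mirror the real `_skip_gguf_value`.

```python
import io
import struct


def _read_exact(f, n: int) -> bytes:
    """Read exactly n bytes or raise EOFError on a short read."""
    data = f.read(n)
    if len(data) < n:
        raise EOFError(f"wanted {n} bytes, got {len(data)}")
    return data


def skip_gguf_string(f) -> None:
    """Skip a length-prefixed string without materialising its bytes."""
    (length,) = struct.unpack("<Q", _read_exact(f, 8))
    # Relative seek: no bytes object is allocated. Seeking past EOF is legal,
    # so truncation only shows up on the next read.
    f.seek(length, 1)


def read_gguf_string(f) -> str:
    """Read a length-prefixed UTF-8 string."""
    (length,) = struct.unpack("<Q", _read_exact(f, 8))
    return _read_exact(f, length).decode("utf-8")
```

On a truncated file `skip_gguf_string` returns normally, and the following `read_gguf_string` raises `EOFError`. This matches the "detected on the next read attempt" behaviour noted in the commit.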
||
|
|
63c6750532
|
fix(studio/mmproj): block cross-family projectors in flat local GGUF dirs (#5347) (#5350)
* fix(studio/mmproj): block cross-family projectors in flat local GGUF dirs (#5347)
When a flat local GGUF directory holds several unrelated models with their own mmproj siblings, detect_mmproj_file() returned the first projector it walked into. For the layout reported in #5347 (Qwen weights + a Gemma mmproj in the same dir) that meant llama-server was launched with --mmproj pointing at the Gemma projector, which fails to load and surfaces as a confusing crash.
Disambiguation rules:
- Drop candidates whose family token (qwen/gemma/llama/mistral/phi/...) disagrees with the model's family. Candidates with no recognised family token (e.g. the HF-convention 'mmproj-F16.gguf') are kept.
- Among same-family candidates, prefer the one whose stem shares the longest prefix with the model (Qwen3.5-9B mmproj beats Qwen3.5-35B mmproj for a Qwen3.5-9B model).
- If every candidate is dropped, return None; that is better than attaching a wrong projector and getting a server-launch failure.
Tests cover the cross-family block, multi-candidate prefix tie-break, HF-convention 'mmproj-F16.gguf', unrecognised families, and the existing search_root walk.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* studio/mmproj: word-bounded family match, expanded token list, launcher guard
Tighten the family-token detector to match only on word boundaries so substring collisions stop tagging false families: phi no longer matches sapphire, yi no longer matches yip, mimo no longer matches mimosa, and mistral does not bleed into ministral/magistral/devstral. Pick the token whose first occurrence is leftmost in the filename rather than the first hit in tuple order, so merge models disambiguate predictably (llama-phi tags llama; phi-llama tags phi).
Expand _MODEL_FAMILY_TOKENS with the families an audit of the unsloth HF org turned up that the previous list missed: devstral, ministral, magistral (Mistral-derivative naming), nemotron, kimi, nanonets, cosmos, mimo, apriel, lfm. Without these, a flat local GGUF directory containing one of these weights plus an unrelated renamed projector still hit the original #5347 failure.
Add mmproj_matches_model_family() and call it at the llama-server launch site in core/inference/llama_cpp.py. detect_mmproj_file already drops cross-family candidates at discovery time, but mmproj_path can also reach the launcher via config injection or future overrides; this guard keeps those paths from silently loading a known-wrong projector.
Tests: 12 new cases covering substring rejection, leftmost-position selection, new family tokens, a new flat-dir Nemotron + Gemma rejection case, and the launcher-level guard. All 21 detect_mmproj_file tests and the existing 106 llama_cpp tests pass.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* studio/mmproj: pair via GGUF general.* metadata, not just filenames
Real Unsloth vision GGUFs carry rich identity metadata that has been ignored by the discovery path. Every projector under the unsloth org has general.type='mmproj' plus general.base_model.0.repo_url pointing at the same upstream HF repo as its weight, and the equivalent basename, base_model.0.name, and base_model.0.organization fields. A flat-dir mismatch is therefore decidable from the headers alone, no matter how the user has renamed the files.
Add utils/models/gguf_metadata.py with read_gguf_general_metadata(): a fast (~30 ms) header walk that pulls only the general.* string fields and skips everything else, cached by (resolved path, mtime_ns, size). Mirrors the parser shape already used by LlamaCppBackend._read_gguf_metadata so the format handling is consistent.
is_mmproj_by_metadata() returns True/False/None from general.type, and pairing_score() returns 100 for an exact base_model URL match, 80 for basename plus organization match, 60 for basename only, -1 for definitive metadata disagreement, and 0 when neither side has enough metadata to decide.
Rewire detect_mmproj_file() to a two-stage selector:
1. Detect projectors via metadata (general.type) when present, else fall back to the filename substring heuristic. This recovers headerless projectors AND projectors whose name does not contain 'mmproj' but whose header advertises one.
2. Score each candidate against the weight via pairing_score. Drop candidates with score -1 (definitive metadata disagreement). For candidates with score 0 (no usable metadata) fall back to the existing filename family-token check, dropping recognised-family mismatches.
Pick the survivor with the highest (score, longest_prefix, -len(stem)) tuple, so a metadata URL match always wins over a filename-prefix match.
Tests: 16 new cases. tests/test_gguf_metadata.py covers the parser (missing file, non-GGUF, string extraction, walking past arrays and uint32s, cache invalidation by mtime/size) and the score helpers. tests/test_detect_mmproj_file.py adds end-to-end cases that synthesise real on-disk GGUF headers: URL match wins over a longer-prefix sibling, URL mismatch returns None even when filenames match, a projector named 'vision-projector.gguf' is still discovered via general.type, and a 100-score header match outranks a near-perfect filename prefix on a headerless candidate.
All 75 tests across detect_mmproj_file, gguf_metadata, llama_cpp load progress, cached gguf routes, trained model scan, and vision cache pass.
* studio/mmproj: shorten comments and docstrings across the #5347 changes
Trim verbose explanations to one-line statements of intent. The behaviour is unchanged: 161 tests across detect_mmproj_file, gguf_metadata, llama_cpp_load_progress (+ matrix), llama_server_args, llama_cpp_cache_aware_disk_check, trained_model_scan, and vision_cache all pass.
* studio/mmproj: shorten remaining detect_mmproj_file body comments
Trim the docstring and the dir-walking block comments inside detect_mmproj_file to one-liners. Behaviour unchanged; 44 mmproj + gguf_metadata + llama_cpp_load_progress tests pass.
* studio/mmproj: cap gguf_metadata cache below ceiling on every insert
The eviction branch popped exactly one entry when len >= max, so the cache size could only converge to the cap when entries were added slowly enough for natural growth. After a sandbox sim that reduced the cap mid-run, len stayed above the cap because each insert popped one and added one. Switch to a while loop so we evict until len is strictly below the cap before inserting. Steady-state behaviour at the default 4096 ceiling is unchanged.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com> |
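The word-bounded, leftmost-match family detection described in this commit can be illustrated with the sketch below. The token tuple is a shortened stand-in for `_MODEL_FAMILY_TOKENS`, `detect_family` is a hypothetical name, and treating "word boundary" as "not adjacent to another letter" is an assumption made so that names like `Qwen3.5` still match.

```python
import re

# Shortened, illustrative token list; the real one is longer.
FAMILY_TOKENS = ("qwen", "gemma", "llama", "mistral", "ministral", "phi", "yi", "mimo")


def detect_family(filename: str):
    """Return the family token whose first bounded occurrence is leftmost, or None."""
    name = filename.lower()
    best = None  # (position, token)
    for token in FAMILY_TOKENS:
        # Reject matches glued to other letters: 'phi' inside 'sapphire', 'yi' inside 'yip'.
        match = re.search(rf"(?<![a-z]){re.escape(token)}(?![a-z])", name)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), token)
    return best[1] if best else None
```

Choosing by position rather than by tuple order is what makes `llama-phi` and `phi-llama` resolve differently. A filename with no recognised token returns `None`, which the discovery rules above treat as "keep the candidate".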
||
|
|
79adfd9c71
|
studio: skip flash-attn install on Blackwell GPUs (sm_100+) (#5420)
* studio: skip flash-attn install on Blackwell GPUs (sm_100+)
Dao-AILab does not publish prebuilt flash-attn wheels for sm_100, sm_120, or sm_121, and the older-arch wheels fail to load on Blackwell. Add a shared has_blackwell_gpu() helper and gate both the install-time (install_python_stack._ensure_flash_attn) and runtime (worker._ensure_flash_attn_for_long_context) paths on it. Detection uses nvidia-smi --query-gpu=compute_cap, which works on Linux and Windows.
* test: stub has_blackwell_gpu in pre-existing runtime flash-attn tests
prefers_prebuilt_wheel and falls_back_to_pypi exercise the install paths that the Blackwell guard now short-circuits. Make them explicit about non-Blackwell so they pass on real Blackwell hosts.
* studio: cache has_blackwell_gpu, skip Blackwell warning under NO_TORCH
- Wrap has_blackwell_gpu in functools.lru_cache so repeated calls in a single process avoid redundant nvidia-smi spawns. Tests clear the cache via setup_method/teardown_method.
- In _ensure_flash_attn, run the NO_TORCH short-circuit before the Blackwell check so GGUF-only users (who never install torch anyway) do not see a Blackwell warning. Blackwell check still runs above the IS_WINDOWS / IS_MACOS gates so Blackwell-on-Windows users still see the explicit reason rather than a silent OS skip.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* test: add has_blackwell_gpu to mlx worker test wheel_utils stub
test_mlx_training_worker_config loads worker.py against a hand-rolled utils.wheel_utils stub. Adding has_blackwell_gpu to the stub symbol list so worker's import line resolves.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> |
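A hedged sketch of what a cached Blackwell check along these lines might look like. Only the `has_blackwell_gpu` name, the `lru_cache` wrapping, and the `nvidia-smi --query-gpu=compute_cap` query come from the commit message; the parsing helper, the `--format` flag choice, and the failure handling are assumptions.

```python
import functools
import subprocess


def is_blackwell_compute_cap(output: str) -> bool:
    """True if any line of compute_cap output reports major version >= 10 (sm_100+)."""
    for line in output.splitlines():
        try:
            major = int(line.strip().split(".")[0])
        except ValueError:
            continue  # blank lines, headers, or 'N/A'
        if major >= 10:
            return True
    return False


@functools.lru_cache(maxsize=1)
def has_blackwell_gpu() -> bool:
    """Query nvidia-smi once per process; treat any failure as 'not Blackwell'."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and is_blackwell_compute_cap(result.stdout)
```

Keeping the parsing in a pure function makes it easy to unit-test without a GPU, and tests that exercise the cached wrapper can reset it with `has_blackwell_gpu.cache_clear()`.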
||
|
|
000ca89301
|
Studio: Passing batch size for eval (#5168)
* add eval batch size
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com> |
||
|
|
4192fe6ebe
|
studio: drop unused max_grad_value schema + route plumbing (#5424)
* studio: drop unused max_grad_value schema + route plumbing
The MLX worker hardcodes max_grad_value to 5.0 after PR #5340. The schema field, frontend payload type, route forwarder, and start_training kwarg threading were all left in place as a transitional buffer for old clients. The field is now genuinely unused everywhere except inside the MLX worker, so the schema, route forwarder, and config-build entries can go. Pydantic still tolerates older clients that send max_grad_value because TrainingStartRequest's model_config defaults to extra=ignore.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> |
||
|
|
a932294627
|
MLX training support for Studio on Apple Silicon (#5340)
* mlx fixes
* Fix studio integration, local dataset files, chat templates without the torch gpu imports
* pass grad norm in mlx worker
* fix(studio): pass MLX grad clipping settings
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* mlx: update grad value
* fix(mlx): address ci and clipping review
* fix backward compatibility and CI tests
* unsloth local is mlx function
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* dont reference runtime
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* studio mlx: hardcode value clipping, drop max_grad_value from frontend
Simplifies the MLX grad-clipping plumbing now that we are standardising on elementwise value clipping at [-5, 5] for the compiled MLX path and norm clipping disabled. The MLX worker no longer reads max_grad_norm / max_grad_value from the request; both are pinned in one place. Frontend stops sending the field at all, and the TypeScript request type drops it to match. Non-MLX (CUDA/AMD/Intel) is untouched and continues to pick up HF TrainingArguments' default max_grad_norm = 1.0.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com> |
||
|
|
9a0d6f80cb
|
studio: API external provider support for chat (OpenAI, Mistral, Gemini, Cohere, Anthropic, OpenRouter, DeepSeek, custom providers) (#4706)
* studio: add external provider support for chat inference Adds the ability to connect to OpenAI, Mistral, Google, Cohere, Together, Fireworks, and Perplexity from the Studio chat interface. - Provider configs stored in SQLite (no API keys persisted) - RSA-2048 key pair generated at startup for client-side key encryption - httpx proxy client streams SSE responses in OpenAI-compatible format - New /api/providers routes: registry, CRUD, test, models - /v1/chat/completions routes to external provider when provider fields present - Integration test suite covering CRUD, connection, model listing, and inference - Frontend spec doc with full API contract * remove frontend spec doc from branch * fix auth fixture: handle forced password change on fresh install * fix tests: default port 8000, allow 400 for no-model-loaded * fix: update Cohere models to current (command-r retired Sept 2025) * feat: add OpenRouter as 8th provider * feat: add native Anthropic provider with Messages API translation * fix: correct Anthropic base URL and drop top_p (conflicts with temperature) * feat: add DeepSeek provider (deepseek-chat, deepseek-reasoner) * feat: rename google -> gemini, refresh model list to 2.5 series * feat: remove together, fireworks, perplexity providers * feat: multimodal image support for external providers - Add _build_external_messages() that preserves image_url parts for vision-capable providers instead of stripping them - Update _proxy_to_external_provider() to use new helper - Translate image_url content parts to Anthropic native image format in _stream_anthropic() - Add TestVisionInference pytest class (1x1 PNG smoke test) * test: use sloth photo URL for vision test, add Anthropic remote URL support * fix: update Mistral model to mistral-small-2506 * update mistral default model to mistral-large-2512 * fix gemini vision test: download image as base64 data URI instead of remote URL * add gemini-3-flash-preview as default gemini model * fix gemini truncated reply 
(max_tokens 16->64) and suppress GeneratorExit on client disconnect * increase vision test max_tokens to 215 * fix GeneratorExit: aclose stream generator before closing httpx client * fix httpcore GeneratorExit: explicitly aclose aiter_lines before response closes * fix duplicate [DONE] and suppress httpcore RuntimeError on Python 3.13 asyncgen cleanup * fix: call response.aclose() before lines_gen.aclose() to prevent httpcore RuntimeError on Python 3.13 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Potential fix for code scanning alert no. 36: Clear-text logging of sensitive information Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * review: add comments for manual iteration rationale, mask password in test print, clarify Anthropic URL/models support * perf: use shared module-level httpx client for connection pooling across requests * studio: add API provider UI and integrate wiring (#4737) * feat: expose external models in selector and chat settings * feat(chat): wire external providers to backend + RSA key flow - Fetch registry/configs; create/update/delete saved providers - Encrypt API keys (Web Crypto RSA-OAEP) for test/models/chat - External model selection + chat payload (provider_id/type, external_model, encrypted key, optional base URL) - Local storage for keys + provider list; small UX/copy and guardrails * add missing providers-api.ts file by Imagineer99 * fix: address PR review comments — system prompt visibility, retry loop, test logging * feat(studio): encrypt external provider API keys at rest in localStorage API keys for external providers (OpenAI, Mistral, etc.) were stored as plaintext in localStorage, vulnerable to browser extensions and XSS. 
Add password-derived AES-256-GCM encryption: on login the user's password is used via PBKDF2 (100k iterations, SHA-256) to derive an in-memory encryption key. API keys are encrypted before writing to localStorage and decrypted on read. The derived key is never persisted — cleared on logout, re-derived on next login. Legacy plaintext keys are transparently migrated on first access. Password changes re-encrypt all stored keys. No backend changes required — the existing RSA-OAEP transit encryption is unaffected. * fix: cast PBKDF2 salt to BufferSource for strict TypeScript lib types * fix: persist session password in sessionStorage to survive page refreshes * feat(studio): preserve image parts in external provider chat requests toOpenAIMessage() now returns multimodal content arrays (OpenAI vision format) when messages contain images, instead of always flattening to plain text. This enables vision-capable external providers (OpenAI, Gemini, Anthropic, etc.) to receive user images. The backend already handles image_url content parts in _build_external_messages(). 
* studio: fix external models selectable in chat-only mode (#4779) * fix: external models selectable in chat-only mode * fix: model selector tabs default to active model kind * Studio: API external provider registry + curated catalogs (HF/OpenRouter) and chat UX (#4787) * fix: external models selectable in chat-only mode * fix: model selector tabs default to active model kind * feat(studio): expand provider registry, curated catalogs, and chat UX - Add Hugging Face, Kimi, Qwen; remove Cohere; reorder registry - model_list_mode curated for HF/OpenRouter; lightweight /models check - API returns default models for curated providers; expose model_list_mode - Frontend: provider logos in model picker, providerType on external models - Chat providers dialog: curated vs remote flows, motion polish - Thread: LayoutGroup + composer motion alignment with app easing * fix(studio): disable Anthropic tool-calling flag and preselect curated defaults * feat(studio): add external provider logos and ApiProviderLogo helper * Studio: Polish API Providers dialog (#4899) * fix: lower verbage in API providers page * fix: fix(studio): tune API Providers dialog width with rem-based responsive caps * feat: add custom provider support (#4902) * fix: replace crypto.subtle with node-forge for HTTP compatibility crypto.subtle is only available in secure contexts (HTTPS/localhost), which breaks provider API key encryption when Studio is accessed over plain HTTP on remote GPU VMs. Switch to node-forge for RSA-OAEP and AES-256-GCM operations — same algorithms, works on any origin. * fix: store provider API keys as plaintext in localStorage Drop AES-256-GCM at-rest encryption for provider API keys. The session-password-derived encryption broke on auto-login via refresh token (password never captured), causing keys to silently vanish. API keys are still RSA-encrypted in transit via node-forge. 
At-rest encryption in localStorage added no real security since the decryption key also had to live client-side. Removes crypto-storage.ts, session password plumbing, and reEncryptAllKeys. * fix: use max_completion_tokens for OpenAI provider Newer OpenAI models (gpt-4o, gpt-5.x) reject the max_tokens param and require max_completion_tokens instead. Other providers still use max_tokens. * fix: skip empty assistant messages in external provider requests Some providers (Mistral) reject assistant messages with empty content. Filter them out when building the message list for external providers. * Update model-selector.tsx * Update model-selector.tsx * Update model-selector.tsx * Update chat-adapter.ts * Update chat-adapter.ts * Update chat-page.tsx * Update chat-settings-sheet.tsx * Update chat-settings-sheet.tsx * Update chat-settings-sheet.tsx * Update chat-providers-dialog.tsx * feat: polish providers settings form UI * style: polish provider row icon sizing and alignment * style: stabilize provider layout * style: add provider API key visibility toggle * fix: add provider render on empty list * studio/frontend: sync package-lock.json with package.json npm ci was failing because node-forge and @types/node-forge were declared in package.json but missing from the lockfile. Ran npm install to regenerate. * studio/backend: fix backend CI failures for providers router - test_desktop_auth: include providers_router in the routes stub so studio.backend.main imports cleanly under the monkeypatched module - test_providers_api: skip the whole module when STUDIO_TEST_PASSWORD is unset (it is an integration test against a live Studio server, same shape as the already-ignored test_studio_api.py) * studio/chat: drive ChatSettingsPanel from a per-provider capability map Replace the binary isExternalModel toggle in the sampling section with a provider-aware capability map. 
Each external provider type advertises which of top_k / min_p / repetition_penalty / presence_penalty its chat-completions API actually accepts, so the panel only renders the knobs that map onto the active provider's request body. Anthropic now exposes top_k; DeepSeek hides presence_penalty (deprecated in their docs); OpenRouter and custom providers continue to show every knob (OpenRouter drops unsupported server-side, custom assumes OpenAI-compat or a permissive vLLM/Ollama backend). Local models are unaffected — null capabilities means 'show everything'. chat-adapter.ts now forwards top_k / presence_penalty to the external proxy only when the active provider's capabilities permit it, so the request body matches what the UI shows. * studio/backend: forward top_k to Anthropic; filter OpenAI model list Two paired changes so the frontend capability map has matching backend behaviour: 1. ExternalProviderClient.stream_chat_completion now accepts top_k and forwards it to the Anthropic Messages body. OpenAI-compat providers (which all reject unknown sampling params) still receive only the fields they document. The proxy route in routes/inference.py passes payload.top_k through, so a UI request with top_k actually reaches Anthropic instead of being silently dropped at the boundary. 2. PROVIDER_REGISTRY['openai'] gains a model_id_allowlist regex that scopes the /models picker to current-gen ids (gpt-5.5 / gpt-5.4 / gpt-5.3 / gpt-4.5 / o3 families). The remote /v1/models listing otherwise returns dozens of historical snapshots, fine-tunes and non-chat models (embeddings, TTS, image, moderation) that we never want in the chat UI. default_models is refreshed to match. * studio/chat: relax presence_penalty to optional on OpenAIChatCompletionsRequest Followup to 1fbf445a — chat-adapter now omits presence_penalty for providers that do not accept it (Anthropic / DeepSeek), but the request type still required it as a non-optional number, breaking tsc. 
The backend pydantic model already defaults presence_penalty to 0, so making it optional client-side matches reality. * studio/backend: route OpenAI traffic through /v1/responses OpenAI's new flagship models (gpt-5.x) return 404 'This is not a chat model' on /v1/chat/completions and are only reachable via /v1/responses. Add a dedicated _stream_openai_responses path in ExternalProviderClient that: - Translates outbound messages into the Responses shape: system messages are folded into the top-level 'instructions' field, user/assistant messages become {role, content} items with input_text / input_image content parts (data URLs and https URLs both pass through). - Drops presence_penalty / top_k / frequency_penalty, none of which the Responses contract accepts. - Translates inbound SSE events back into OpenAI Chat Completions chunks so the frontend keeps a single SSE shape: response.output_text.delta -> delta chunk with content response.completed -> chunk with finish_reason='stop' response.incomplete -> chunk with finish_reason='length' response.failed / error -> propagated error SSE line Stream terminates with data: [DONE] (Responses emits this verbatim). stream_chat_completion dispatches all provider_type='openai' calls to this path; other OpenAI-compatible providers (mistral, gemini, etc.) continue to use /v1/chat/completions. Frontend provider-capabilities map updated to hide presence_penalty for OpenAI in the chat settings panel, matching the new request contract. Includes unit coverage in tests/test_openai_responses_translation.py exercising the request body translation, image-part rewriting, and SSE-to-chat-completions translation via httpx.MockTransport. 
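The inbound event mapping listed above (Responses events to Chat Completions chunks) could be sketched as a single pure function. The chunk layout is a simplified assumption (real chunks also carry id, model and timestamps), and `translate_responses_event` is a hypothetical name rather than the method in `ExternalProviderClient`.

```python
def translate_responses_event(event: dict):
    """Map one parsed Responses SSE event to a chat-completions-style chunk.

    Returns None for event types that have no chat-completions equivalent.
    """
    kind = event.get("type")

    def chunk(delta: dict, finish_reason):
        return {"choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}]}

    if kind == "response.output_text.delta":
        return chunk({"content": event.get("delta", "")}, None)
    if kind == "response.completed":
        return chunk({}, "stop")
    if kind == "response.incomplete":
        return chunk({}, "length")
    if kind in ("response.failed", "error"):
        # Assumed error envelope; the original event is passed through untouched.
        return {"error": {"type": kind, "detail": event}}
    return None
```

Because the function has no I/O, it can be tested directly with dict fixtures, separately from the transport-level coverage the commit mentions.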
* studio/chat: clamp external max_tokens to 32k to stay within provider caps The chat settings slider already capped maxTokens at 32768 for external models, but a value persisted from a prior local-model session (where the cap can be 128k+) was sent verbatim to the provider — Claude Opus returns 'max_tokens: 131072 > 128000' on requests like that, and other providers have stricter limits still. Expose EXTERNAL_MAX_OUTPUT_TOKENS from provider-capabilities (32k) and use it both for the slider max and as the clamp inside chat-adapter's external-request body. 32k sits below the tightest declared output limit across the providers we ship and well above what a typical chat reply needs; the local-model path is unaffected. * studio: drop temperature/top_p for OpenAI reasoning models gpt-5.x / o3 / gpt-4.5 are reasoning-class models served via /v1/responses, and reject temperature and top_p with 'Unsupported parameter' 400s. The OpenAI registry allowlist already scopes the picker to those families, so neither knob ever applies on this branch. - external_provider._stream_openai_responses no longer puts temperature or top_p in the request body (kept on the method signature for API symmetry with the other stream methods). - ProviderCapabilities gains temperature/topP flags; OpenAI sets both to false. ChatSettingsPanel hides the sliders for OpenAI so the user does not see inert controls. - chat-adapter omits temperature/top_p from the external request body when the active provider does not advertise them. - OpenAIChatCompletionsRequest type marks both as optional, matching the new chat-adapter shape. - test_responses_request_body_uses_input_and_instructions: assertions flipped to confirm temperature / top_p are absent from the body. * studio: stop forwarding top_k to Anthropic Claude 4.x (Opus / Sonnet / Haiku 4.x) returns 400 'top_k is deprecated for this model' on any request that includes top_k. 
It was always optional on the older 3.x line, so dropping it unconditionally for every Anthropic call is the simplest path — no per-model gate to maintain. - external_provider._stream_anthropic no longer adds top_k to the Messages body (kept on the method signature for API symmetry). - provider-capabilities sets anthropic.topK = false so the chat settings panel hides the Top K slider for Anthropic providers and chat-adapter does not send top_k in the external request. * studio: gate Anthropic top_k drop to Claude 4.7 only Previous commit ( |
||
|
|
b95b055b4a
|
studio: comment out training_args.bin torch.load fallback (#5419)
torch.load defaults to weights_only=True since torch 2.6, which rejects
the pickled TrainingArguments dataclass that HF Trainer saves to
training_args.bin. Studio ships on torch 2.9 / 2.10 so this fallback
was already failing on every call, getting swallowed by the surrounding
try/except, and falling through to the existing adapter_config.json /
config.json / directory-name paths that already produce the answer.
In get_base_model_from_lora the path is also reachable via the
GET /loras/{lora_path:path}/base-model route on user-supplied paths
(including third-party LoRAs pulled from HF), so "fixing" it with
weights_only=False would re-introduce a pickle deserialization sink
on remote-supplied input.
Comment both blocks out and leave a TODO so the intent is preserved
for whoever wants to re-enable this with proper safe_globals or a
trust check.
|
||
|
|
1c2a86f84a
|
Studio: vary empty chat sloth mascot by local time of day (#5354)
* feat: vary empty chat sloth mascot by local time of day
* fix: compute welcome mascot after mount to avoid hydration mismatch
* tweak: sloth love to sloth shy image
---------
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> |
||
|
|
d1725a31aa
|
style: unify thinking trace icon with Think toggle icon (#5407)
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> |
||
|
|
6e8bf4d51b
|
studio: fix training page regressions from the security hardening pass (#5409)
* studio: allow huggingface.co and datasets-server.huggingface.co in CSP connect-src
The security hardening pass (
|
||
|
|
0881a7a5d7
|
studio: security and hardening pass (auth rate-limit, sandbox, path containment, schema validation, headers) (#5375)
* studio: contain export and dataset paths under their configured roots
resolve_under_root and resolve_dataset_path previously returned absolute
paths unchanged, so an authenticated client could supply
save_directory="/tmp/escape" (or any other absolute path) and have the
exporter drop adapter files anywhere the server user could write. This
turned up during a recent audit pass where an authenticated POST to
/api/export/export/lora with save_directory="/tmp/lora_escape_test"
returned 200 and wrote adapter_model.safetensors, adapter_config.json,
and tokenizer files under /tmp.
The fix is two-layered:
storage_roots.py adds an _assert_contained(resolved, root) helper that
runs after path resolution and rejects any result whose realpath does
not sit under realpath(root). resolve_under_root now rejects '..'
segments and null bytes outright, and only accepts absolute inputs when
they are already inside the configured root (internal call sites that
re-resolve a stored absolute path stay idempotent;
worker.py:resolve_output_dir(output_dir) etc. continue to work).
resolve_dataset_path picks up the same containment rule, scoped to the
three dataset roots.
models/export.py adds field_validator("save_directory", mode="before")
to ExportCommonOptions and ExportGGUFRequest so bad input fails fast at
422 with a clear message rather than a 500 deep inside the resolver.
The validator rejects empty/whitespace, null bytes, control chars,
strings longer than 255 chars, absolute paths, and '..' segments.
routes/export.py:_export_details now returns os.path.relpath(output_path,
exports_root()) so the Export Complete dialog and /api/models/loras no
longer leak the absolute install prefix to the UI; the basename is
used as a last-resort fallback.
Verified end to end:
- POST /api/export/export/lora {"save_directory":"/tmp/foo"} -> 422
"save_directory must be a name or relative path under the export
root; absolute paths are rejected". /tmp/foo is not created.
- "../../etc/escape" -> 422 "may not contain '..' segments".
- save_directory="my_subdir" -> still accepted (400 only because the
test had no checkpoint loaded yet, not because of validation).
- Internal idempotent re-resolve via resolve_export_dir(absolute path
that is already under exports_root) returns the same path unchanged.
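The two-layer containment rule described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in storage_roots.py: the function names mirror the commit message, but the signatures, error messages, and the use of os.path.commonpath are choices made for this example.

```python
import os


def assert_contained(resolved: str, root: str) -> str:
    """Return the real path of `resolved`, or raise if it escapes `root`."""
    real_root = os.path.realpath(root)
    real_path = os.path.realpath(resolved)
    # commonpath compares whole path components, so /srv/exports_evil is not
    # mistaken for a child of /srv/exports the way a startswith() check would.
    if os.path.commonpath([real_root, real_path]) != real_root:
        raise ValueError(f"path escapes configured root: {resolved!r}")
    return real_path


def resolve_under_root(user_path: str, root: str) -> str:
    """Resolve a client-supplied path against `root` with containment."""
    if not user_path or not user_path.strip():
        raise ValueError("path must not be empty")
    if "\x00" in user_path:
        raise ValueError("path must not contain null bytes")
    if ".." in user_path.replace("\\", "/").split("/"):
        raise ValueError("path may not contain '..' segments")
    # os.path.join discards `root` when user_path is absolute, so an absolute
    # input only survives the check below if it already sits inside the root.
    return assert_contained(os.path.join(root, user_path), root)
```

Because the check runs on real paths, a symlink inside the root that points outside it is rejected as well, and re-resolving an absolute path that is already under the root stays idempotent.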
* studio/sandbox: harden bash + python tool execution
The sandboxed Bash and Python tool channels in Chat ran with a thin
preexec hook (PR_SET_NO_NEW_PRIVS + RLIMIT_FSIZE only). Bash had a
small word blocklist; Python had an AST safety pass aimed at
signal-tampering and shell-escape primitives. An audit pass showed
several gaps that a tool-calling model could trigger inadvertently:
- bash curl/wget/nc reached AWS IMDSv2 and returned live STS
credentials for the instance role.
- python "import socket; s.connect((169.254.169.254, 80))"
reached the same endpoint regardless of the bash blocklist.
- "cat /etc/passwd" was blocked at the bash side (because "passwd"
is in the blocklist), but "open('/etc/passwd').read()" in Python
happily returned its contents.
- "chr(115)+chr(117)+chr(100)+chr(111)" style dynamic-arg
construction slipped through the AST shell-escape check.
- The supervisor used proc.kill() on timeout, which only signals
the immediate pid; bash-backgrounded children survived. A fork
bomb could spawn for the full 300s timeout window.
- Session work directories under ~/studio_sandbox/<id>/ were
created with default umask (0o755), so any other UID on the host
could enumerate them.
- session_id sanitisation used a one-shot str.replace("..",""),
which is non-iterative and a small footgun.
This commit takes a conservative middle path: the sandbox still
runs as the Studio UID with no namespace tricks where the kernel
disallows them, but every chokepoint is tightened.
_sandbox_preexec now:
- calls os.setsid() so children share a process group; the
supervisor uses os.killpg(SIGKILL) on timeout/cancel so
backgrounded children die with the parent (new _kill_process_tree
helper, wired into _cancel_watcher and both _bash_exec /
_python_exec timeout branches).
- calls os.umask(0o077) so files the child writes default to 0o600.
- applies PR_SET_PDEATHSIG=SIGKILL so an orphaned child dies if
Studio exits.
- best-effort unshare(CLONE_NEWNET) for a private network namespace
(failure is logged and swallowed; defense-in-depth is still in
place via the bash blocklist and the AST checker below).
- sets RLIMIT_NPROC=10000 (tunable via UNSLOTH_STUDIO_SANDBOX_NPROC),
RLIMIT_AS=8GB, RLIMIT_CPU=300, RLIMIT_NOFILE=1024. The 10k NPROC
figure is chosen to sit well above the ~500 LWPs a healthy Studio
+ llama-server combination already uses while still capping a
runaway fork bomb. NPROC counts LWPs per real UID, so a lower
figure (e.g. 256) starves legitimate bash forks
("bash: fork: retry: Resource temporarily unavailable").
_get_workdir:
- rejects session_id that doesn't match [A-Za-z0-9_-]{1,64};
non-matching values bucket into a shared "_invalid" dir.
- chmod 0o700 on both the workdir and on ~/studio_sandbox/ so
other UIDs cannot read another session's contents.
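The process-group handling that _sandbox_preexec and _kill_process_tree add in this commit can be sketched as below. This is a POSIX-only illustration with hypothetical function names; it covers only setsid, umask, and killpg, and leaves out the rlimits, PR_SET_PDEATHSIG, and the network-namespace step.

```python
import os
import signal
import subprocess
import sys


def sandbox_preexec() -> None:
    """Runs in the child between fork() and exec() (POSIX only)."""
    os.setsid()      # new session, so the child leads its own process group
    os.umask(0o077)  # files the child creates default to 0o600


def kill_process_tree(proc: subprocess.Popen) -> None:
    """SIGKILL the child's whole process group, including background jobs."""
    try:
        os.killpg(proc.pid, signal.SIGKILL)  # pgid == pid after setsid()
    except ProcessLookupError:
        pass  # the group has already exited


def run_with_timeout(argv: list, timeout: float) -> int:
    """Run argv in its own process group; kill the group on timeout."""
    proc = subprocess.Popen(argv, preexec_fn=sandbox_preexec)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        kill_process_tree(proc)
        return proc.wait()
```

Signalling the group rather than the single pid is what lets backgrounded children die together with the shell that started them.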
_BLOCKED_COMMANDS_COMMON gains: doas, pkexec, halt, poweroff, curl,
wget, nc, ncat, netcat, socat, ssh, scp, sftp, rsync, eval, source.
The intent is to keep general bash usage working (echo, ls, pipes,
loops, for, head, etc.) while denying the obvious egress and
escalation paths.
The AST checker (_check_signal_escape_patterns) is split into the
existing shell/signal/loop checks plus a new narrow IO denylist:
- Always flag non-literal args to anything in _SHELL_EXEC_FUNCS,
not just _STRING_SHELL_FUNCS. Closes the dynamic-arg bypass.
- Reject calls to socket.create_connection, socket.socket().connect,
urllib.request.urlopen, http.client.HTTP*Connection, requests.*,
httpx.* whose literal host argument is in a cloud-metadata
denylist (169.254.169.254 + 169.254.* + 100.64.*, plus the
GCP/Alibaba/ECS metadata hostnames and IPv6 link-local). Public
hosts (example.com, huggingface.co, ...) still work. Dynamic
hosts cannot be statically blocked; mitigated by the bash
blocklist + the netns where the kernel allows it.
- Reject literal open("/etc/passwd"), /etc/shadow, /etc/sudoers,
/etc/ssh/*, and /proc/<pid>/environ. Other files
(/etc/os-release, /etc/hostname, /tmp/*, user dirs) still work.
The _check_code_safety summariser is updated to include the new
network_calls and sensitive_file_reads buckets in its error string.
Regression-checked: echo, sleep, ls /tmp, for loops, piped helpers
(echo a | tr a A), urllib.request.urlopen("http://example.com"),
socket.getaddrinfo("example.com",80), open("/etc/os-release"),
open("/tmp/...","w") all still succeed. curl, wget, nc, ssh, rm,
socket.create_connection(("169.254.169.254",80)),
open("/etc/passwd"), open("/proc/self/environ") all correctly
blocked.
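A narrow literal-only denylist of the kind described for the AST checker might look like the sketch below. It is an assumption-laden illustration, not _check_signal_escape_patterns itself: the helper names, the exact network list, and the single hostname shown are picked for the example, and, as the commit notes, dynamically built hosts are out of reach for a static pass like this.

```python
import ast
import ipaddress
from urllib.parse import urlsplit

# Illustrative denylists; the real lists in the commit are longer.
_BLOCKED_NETS = [
    ipaddress.ip_network(n)
    for n in ("169.254.0.0/16", "100.64.0.0/16", "fe80::/10")
]
_BLOCKED_HOSTS = {"metadata.google.internal"}
_SENSITIVE_FILES = {"/etc/passwd", "/etc/shadow", "/etc/sudoers"}


def _literal_host(text: str) -> str:
    """Extract a host from a URL literal, or treat the literal as a host."""
    if "://" in text:
        try:
            return (urlsplit(text).hostname or "").lower()
        except ValueError:
            return ""
    return text.strip("[]").lower()


def _is_blocked_host(host: str) -> bool:
    if host in _BLOCKED_HOSTS:
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; ordinary hostnames are allowed
    return any(addr in net for net in _BLOCKED_NETS)


def find_violations(source: str) -> list:
    """Flag calls whose string-literal arguments hit either denylist."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        literals = [
            n.value
            for arg in node.args
            for n in ast.walk(arg)
            if isinstance(n, ast.Constant) and isinstance(n.value, str)
        ]
        is_open = isinstance(node.func, ast.Name) and node.func.id == "open"
        for text in literals:
            host = _literal_host(text)
            if is_open and text in _SENSITIVE_FILES:
                problems.append(f"sensitive file read: {text}")
            elif _is_blocked_host(host):
                problems.append(f"metadata host: {host}")
    return problems
```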
* studio: rate-limit login, rotate refresh tokens, add logout, security headers, gate bootstrap injection
A pass over the auth surface found a cluster of related issues that this
commit closes together.
Login (routes/auth.py):
- Add an in-memory per-IP login rate limiter. Five failed POSTs to
/api/auth/login inside a 60s window produce 429 with Retry-After.
A successful login clears the bucket. Previously 30 wrong passwords
in under one second were answered with 30x 401, which combined with
the (now fixed) admin-username leak from /api/auth/status made
brute-force trivial against a small password.
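An in-memory per-IP limiter with the behaviour described (five failures per 60s window, then 429 with Retry-After, bucket cleared on success) could be sketched like this. The class and constant names are hypothetical, and the injectable clock exists only to make the example testable.

```python
import math
import time
from collections import defaultdict, deque

MAX_FAILS = 5          # failed attempts allowed per window
WINDOW_SECONDS = 60.0  # sliding window length


class LoginRateLimiter:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._failures = defaultdict(deque)  # ip -> timestamps of failures

    def retry_after(self, ip: str) -> int:
        """Seconds the caller must wait, or 0 if an attempt is allowed."""
        now = self._clock()
        bucket = self._failures[ip]
        # Forget failures that have aged out of the window.
        while bucket and now - bucket[0] >= WINDOW_SECONDS:
            bucket.popleft()
        if len(bucket) < MAX_FAILS:
            return 0
        return max(1, math.ceil(WINDOW_SECONDS - (now - bucket[0])))

    def record_failure(self, ip: str) -> None:
        self._failures[ip].append(self._clock())

    def record_success(self, ip: str) -> None:
        self._failures.pop(ip, None)  # a successful login clears the bucket
```

A route handler would call retry_after first and answer 429 with that value in the Retry-After header when it is non-zero; otherwise it checks the password and records the outcome.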
Logout (routes/auth.py):
- New POST /api/auth/logout returns 204 and calls
storage.revoke_user_refresh_tokens(subject) so the refresh token
is no longer valid. Previously POST /api/auth/logout returned 405
and there was no way to invalidate refresh tokens short of
changing the password. Frontend session.ts already calls
clearAuthTokens() to drop localStorage; the new endpoint lets the
client also tell the server to revoke server-side state.
Refresh-token rotation (routes/auth.py + auth/storage.py):
- New storage.consume_refresh_token(token) atomically validates +
deletes a refresh token, returning (username, is_desktop). The
/api/auth/refresh handler now mints both a new access AND a new
refresh token; the supplied token becomes invalid. Replaying a
consumed refresh returns 401 "Invalid or expired refresh token".
The previous refresh_access_token helper is left in place for
callers that intentionally want the non-rotating shape; nothing
in the route layer uses it now.
/api/auth/status no longer leaks default_username (models/auth.py +
routes/auth.py):
- AuthStatusResponse.default_username becomes Optional[str] with a
None default; the handler always returns None. The frontend already
hardcodes HIDDEN_LOGIN_USERNAME = "unsloth" (auth-form.tsx:82), so
no UI change is required.
window.__UNSLOTH_BOOTSTRAP__ no longer auto-injects (main.py):
- _inject_bootstrap is now opt-in via the
UNSLOTH_STUDIO_INJECT_BOOTSTRAP env var. The previous default
(inject whenever requires_password_change is true) embedded the
plaintext bootstrap password into the first-boot HTML for any
caller that hit /, /change-password, or any unknown SPA path.
Browser extensions and any XSS payload on the page could read it
trivially. With the new gate the bootstrap password lives only in
the auth/.bootstrap_password file (mode 0o600) where it has always
been; users typing it into a current-password field is the right
UX. routes/auth.py:change_password also clears
app.state.bootstrap_password defensively.
Security headers + server fingerprint (main.py + run.py):
- New SecurityHeadersMiddleware adds Content-Security-Policy,
X-Frame-Options: DENY, X-Content-Type-Options: nosniff,
Referrer-Policy: no-referrer,
Permissions-Policy: camera=(), microphone=(), geolocation=(),
interest-cohort=(), and stamps server: unsloth-studio so the
generic uvicorn banner no longer fingerprints the stack. The
uvicorn.Config gains server_header=False so it stops emitting its
own Server header.
/api/health minimisation (main.py):
- Unauthenticated GET /api/health returns just
{"status":"healthy","timestamp":...} so load-balancer liveness
probes keep working without leaking version, device_type,
chat_only, desktop_protocol_version, or studio_root_id to
arbitrary callers. A request that presents a valid Bearer token
still gets the full diagnostic payload so internal launchers and
sibling-Studio detection (which compares studio_root_id) keep
working.
Verification:
- 30 wrong-password POSTs to /api/auth/login -> first 5 = 401, 6th
through 30th = 429.
- POST /api/auth/logout with a fresh token -> 204. The matching
refresh token then fails 401.
- Login -> R1; /api/auth/refresh with R1 -> new access + R2 (R2 !=
R1); /api/auth/refresh with R1 again -> 401; /api/auth/refresh
with R2 -> still succeeds once and rotates again.
- curl /api/auth/status -> default_username: null.
- curl http://127.0.0.1/ does not contain __UNSLOTH_BOOTSTRAP__.
- curl -I / shows CSP, X-Frame-Options: DENY,
X-Content-Type-Options: nosniff, Referrer-Policy: no-referrer,
Permissions-Policy, and server: unsloth-studio.
- curl /api/health unauthenticated -> {status, timestamp} only.
curl with Authorization: Bearer <valid> -> full payload.
- Existing /api/system, /api/models/list, /api/train/status,
/api/inference/status, /api/auth/api-keys, login flow, SPA root
all still return 200 after the changes (regression smoke).
* studio: add SecurityHeadersMiddleware, MaxBodyMiddleware, /recipes redirect, gate _inject_bootstrap, minimise /api/health
This commit lands the main.py-side changes that share a single
middleware-registration spot. They are kept together because every
change here is either (a) a top-level middleware definition that has
to be added next to LoggingMiddleware, or (b) a route handler at the
same file-level.
SecurityHeadersMiddleware (Content-Security-Policy, X-Frame-Options:
DENY, X-Content-Type-Options: nosniff, Referrer-Policy: no-referrer,
Permissions-Policy, server: unsloth-studio). The previous responses
emitted no CSP, no XFO, no Referrer-Policy and were stamped
server: uvicorn.
MaxBodyMiddleware rejects POST/PUT/PATCH on the inference / dataset /
data-recipe / train / export prefixes when Content-Length exceeds
UNSLOTH_STUDIO_MAX_BODY_MB (default 100). The audit hit this by
attaching a 50 MB plain-text file to a chat message and watching
Studio base64-encode it into the JSON body; uvicorn has no enforced
cap so the only previous guard was the per-file 50 MB ceiling that
data-recipe upload routes already enforce. The new middleware extends
that ceiling to the OpenAI-compat path that the Chat attachments
flow through. Verified: a 200 MB JSON POST to /v1/chat/completions
returns HTTP 413 "Request body too large (209,715,264 bytes; max
104,857,600)". A small valid request continues to reach the handler.
_inject_bootstrap is gated behind UNSLOTH_STUDIO_INJECT_BOOTSTRAP.
The previous default was to inline window.__UNSLOTH_BOOTSTRAP__ =
{username, password} into the first-boot HTML whenever
requires_password_change was true, which exposed the plaintext
bootstrap password to any browser extension, page script, or LAN
caller on -H 0.0.0.0. The bootstrap password remains in the on-disk
.bootstrap_password file (mode 0o600) where it has always lived;
users typing it into a current-password field is the right UX.
/api/health unauthenticated returns {"status":"healthy","timestamp":
...} only; the previous payload (version, device_type, chat_only,
desktop_protocol_version, supports_desktop_auth, studio_root_id,
native_path_leases_supported) is preserved for callers that present
a valid Bearer token, so internal launchers and sibling-Studio
detection (which compares studio_root_id) keep working.
/recipes -> /data-recipes 308 redirect. The Data Recipes page lives
at /data-recipes; users typing /recipes hit the SPA catch-all and
saw "Not Found". The redirect also preserves any tail path, so
/recipes/<rest> -> /data-recipes/<rest>.
Verified end to end with curl: CSP / XFO / X-Content-Type-Options /
Referrer-Policy / Permissions-Policy all present on /, server header
is now unsloth-studio (uvicorn's own banner is suppressed via
server_header=False in run.py from the auth-batch commit). Following
the /recipes redirect lands on the SPA HTML.
* studio: bound TrainingStartRequest hyperparameters at the schema level
POST /api/train/start accepted any value for learning_rate, batch_size,
max_steps, max_seq_length, warmup_steps, warmup_ratio, num_epochs,
save_steps, weight_decay, gradient_accumulation_steps, lora_r,
lora_alpha and lora_dropout, including -1, 0, 1e9, and non-numeric
strings like 'abc' or 'two' (which silently coerce to 0 in the
trainer). Probing showed the API returning 200 to learning_rate=-1
and batch_size=0; only max_steps had any partial clamping.
This commit adds field_validator on every numeric hyperparameter.
Bounds are chosen wide enough to span realistic single-host
configurations (B200 with 180 GB of memory comfortably fits the
upper end) while rejecting the values that always produce broken
training:
- learning_rate: parses str/float, requires 0 < lr < 1.0. Non-numeric
input raises with "learning_rate must be parseable as float (got
'abc')" instead of silently coercing to 0.
- batch_size: [1, 1024].
- gradient_accumulation_steps: [1, 4096].
- num_epochs: [1, 1000].
- max_steps: [1, 1_000_000].
- max_seq_length: [1, 131072].
- warmup_steps: [0, max_steps].
- warmup_ratio: [0.0, 1.0].
- save_steps: [0, 1_000_000].
- weight_decay: [0, 10] (typical 0..0.1).
- lora_r: [1, 512].
- lora_alpha: [1, 1024].
- lora_dropout: [0.0, 1.0).
Each validator names the offending field in its ValueError message
so the 422 response body identifies which input is bad. The
learning_rate validator returns its result as str (the schema field
type is str("2e-4") for backwards compatibility) so existing call
sites that float() the value continue to work.
Verified:
- learning_rate=-1 -> 422 "learning_rate must be > 0 (got -1.0);
typical range is 1e-6 .. 1e-3".
- learning_rate='abc' -> 422 "must be parseable as float".
- batch_size=-1 / 0 / 999999 -> 422 "batch_size must be in [1, 1024]".
- batch_size='two' -> 422 (pydantic int parser).
- max_steps=0 / -5 -> 422 "must be a positive int".
- max_seq_length=200000 -> 422 "must be in [1, 131072]".
- warmup_ratio=2.5 -> 422 "must be in [0.0, 1.0]".
- lora_dropout=1.5 -> 422 "must be in [0.0, 1.0)".
- Valid request with learning_rate='2e-4', batch_size=1, max_steps=5
passes validation and the training run starts as normal.
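The validator bodies can be pictured as plain functions like the ones below. In the commit they are pydantic field_validator methods on TrainingStartRequest; this sketch drops pydantic so it runs on the standard library alone, and both the function names and the error wording are approximations.

```python
def validate_learning_rate(value) -> str:
    """Accept str/float input, require 0 < lr < 1.0, and return it as str."""
    try:
        lr = float(value)
    except (TypeError, ValueError):
        raise ValueError(
            f"learning_rate must be parseable as float (got {value!r})"
        ) from None
    if not lr > 0:  # also rejects NaN, which fails every comparison
        raise ValueError(f"learning_rate must be > 0 (got {lr})")
    if not lr < 1.0:
        raise ValueError(f"learning_rate must be < 1.0 (got {lr})")
    return str(value)  # the schema field stays a string


def validate_int_range(name: str, value, low: int, high: int) -> int:
    """Require an int in the closed interval [low, high]."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{name} must be an int (got {value!r})")
    if not low <= value <= high:
        raise ValueError(f"{name} must be in [{low}, {high}] (got {value})")
    return value
```

Naming the field in every message is what lets the 422 body point at the offending input.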
* studio: redact image-decode errors, clean checkpoint dirs on cancel, tolerate Stop-button + tool-result message shapes
Three small fixes that fall under "do not let the audit findings
become user-visible papercuts".
routes/inference.py - image-decode error redaction (the audit hit
this with a 0-byte / malformed / wrong-extension image upload). The
three image-normalise sites previously raised HTTPException(400,
detail=f"Failed to process image: {e}"). When PIL raised
UnidentifiedImageError(io.BytesIO(raw)) the message string included
"<_io.BytesIO object at 0x7e40a5d7bf60>", leaking both the Python
class name (confirming the PIL/io stack) and a heap address (mildly
useful for ASLR-bypass chaining if another memory-corruption bug is
ever found). Each site now catches UnidentifiedImageError and
returns the generic "Unsupported or corrupt image format"; the
fall-through generic except returns "Failed to process image". No
exception-repr is interpolated into a response body anywhere along
these paths.
core/training/training.py - checkpoint cleanup on cancel. When a
user clicks Cancel Training, the trainer flips _cancel_requested=True
and the supervisor force-terminates the subprocess. The trainer
writes checkpoint-<step> directories under output_dir every
save_steps; previously these survived the cancel and accumulated on
disk (the audit recorded ~67 MB stuck after a 200-step cancel with
save_steps=20). New helper _cleanup_cancelled_checkpoints(output_dir)
globs checkpoint-<int> entries and removes them. It is gated by a
realpath containment check against outputs_root() so it cannot
accidentally rmtree anything outside the configured outputs root.
force_terminate() invokes the helper after the subprocess join when
_cancel_requested is true. Stop-and-Save runs are unaffected because
that path keeps _cancel_requested=False.
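A containment-gated cleanup along these lines is sketched below. The function name follows the commit, but the signature (the outputs root is passed in explicitly here), the return value, and the symlink check are assumptions made for the example.

```python
import os
import re
import shutil

_CHECKPOINT_RE = re.compile(r"checkpoint-\d+")


def cleanup_cancelled_checkpoints(output_dir: str, outputs_root: str) -> list:
    """Remove checkpoint-<int> dirs under output_dir; return removed names."""
    real_root = os.path.realpath(outputs_root)
    real_out = os.path.realpath(output_dir)
    # Refuse to touch anything that resolves outside the configured root.
    if os.path.commonpath([real_root, real_out]) != real_root:
        return []
    if not os.path.isdir(real_out):
        return []
    removed = []
    for entry in sorted(os.listdir(real_out)):
        path = os.path.join(real_out, entry)
        is_checkpoint = _CHECKPOINT_RE.fullmatch(entry) is not None
        if is_checkpoint and os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
            removed.append(entry)
    return removed
```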
models/inference.py - chat message shape tolerance. Two related
frontend interactions used to crash the request validator:
- After the Stop button truncates a generation, the frontend
retained {role:"assistant", content:""} in the conversation
history and replayed it on the next send. ChatMessage previously
required role="assistant" to have non-empty content or tool_calls,
so the next message returned 422 and the thread was permanently
broken. The validator now normalises empty assistant content to
None so the request round-trips and the trailing empty turn can
be ignored downstream.
- The frontend's second-round tool POST drops the streamed
tool_call_id, hitting the strict-spec check "role=tool requires
tool_call_id". The validator now synthesises an opaque id
(call_<8 hex>) when missing, so the request reaches the handler
and the model's final summarising response gets generated. The
proper fix lives in the frontend (carry the streamed id through
the second POST) and will follow.
Verified end to end with curl: HTTP 400 (model not loaded) on both
the empty-assistant history shape and the tool-result-without-id
shape, instead of HTTP 422 from the schema validator.
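The two tolerance rules can be illustrated on plain dicts. This is only a sketch: the real logic lives in a pydantic validator on ChatMessage, and normalise_message is a hypothetical name.

```python
import secrets


def normalise_message(msg: dict) -> dict:
    """Apply the two tolerance rules to a copy of a chat message dict."""
    out = dict(msg)
    # Post-Stop sentinel: empty assistant content becomes None.
    if out.get("role") == "assistant" and out.get("content") in ("", []):
        out["content"] = None
    # Tool results that lost their id get an opaque call_<8 hex> id.
    if out.get("role") == "tool" and not out.get("tool_call_id"):
        out["tool_call_id"] = "call_" + secrets.token_hex(4)
    return out
```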
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* studio: tighten code comments from security-hardening pass
Trim verbose docstrings and inline finding references added in the
previous commits in this branch. Functionality unchanged.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* studio: await get_current_subject in /api/health and make refresh-token consumption atomic
The /api/health auth probe called get_current_subject(creds) without
awaiting it. The coroutine object is truthy, so any caller presenting a
Bearer header (valid or not) received the full diagnostic payload
including version, device_type, studio_root_id, etc. Await the coroutine
and treat HTTPException as 'fall back to the minimal liveness payload'.
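The missing-await bug is easy to reproduce in isolation. In the sketch below, AuthError stands in for FastAPI's HTTPException and the token check is a placeholder; only the truthiness of the un-awaited coroutine object is the point.

```python
import asyncio


class AuthError(Exception):
    """Stand-in for the HTTPException the real dependency raises."""


async def get_current_subject(token: str) -> str:
    if token != "valid-token":  # placeholder check for the example
        raise AuthError("invalid token")
    return "unsloth"


async def is_authenticated_buggy(token: str) -> bool:
    subject = get_current_subject(token)  # coroutine object, never run
    authed = bool(subject)                # always True, token never checked
    subject.close()                       # silence the "never awaited" warning
    return authed


async def is_authenticated_fixed(token: str) -> bool:
    try:
        return bool(await get_current_subject(token))
    except AuthError:
        return False  # fall back to the minimal liveness payload
```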
consume_refresh_token did SELECT then DELETE WHERE id under default
autocommit isolation. Two concurrent POST /api/auth/refresh requests
could both win the SELECT before either DELETE ran, defeating
single-use refresh-token rotation. Replace with a single
DELETE ... WHERE token_hash = ? AND expires_at >= ? RETURNING ...
statement so the validate-and-delete lands as one atomic op under
SQLite's write lock (3.45.1 supports RETURNING; min was 3.35).
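A single-statement consume using DELETE ... RETURNING can be sketched with the standard sqlite3 module. The table layout, the SHA-256 hashing, and the helper names are assumptions made for this example, and the statement needs an SQLite library at 3.35 or newer.

```python
import hashlib
import sqlite3
import time

# Hypothetical schema for the example.
SCHEMA = """
CREATE TABLE refresh_tokens (
    token_hash TEXT PRIMARY KEY,
    username   TEXT NOT NULL,
    is_desktop INTEGER NOT NULL,
    expires_at INTEGER NOT NULL
)
"""


def _hash(token: str) -> str:
    return hashlib.sha256(token.encode()).hexdigest()


def store_refresh_token(conn, token, username, is_desktop, expires_at):
    with conn:  # commits on success
        conn.execute(
            "INSERT INTO refresh_tokens VALUES (?, ?, ?, ?)",
            (_hash(token), username, int(is_desktop), expires_at),
        )


def consume_refresh_token(conn, token):
    """Validate and delete in one statement; None if missing or expired."""
    with conn:
        rows = conn.execute(
            "DELETE FROM refresh_tokens "
            "WHERE token_hash = ? AND expires_at >= ? "
            "RETURNING username, is_desktop",
            (_hash(token), int(time.time())),
        ).fetchall()
    return (rows[0][0], bool(rows[0][1])) if rows else None
```

Since the row is removed by the same statement that reads it, a second caller presenting the same token finds nothing to delete and gets None.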
* studio: enforce body cap on chunked uploads and drop unsafe-inline from script-src
MaxBodyMiddleware previously only inspected the declared Content-Length
header; clients omitting it or sending Transfer-Encoding: chunked
bypassed the cap and could still drive an OOM via the downstream
JSON / file readers on /v1/chat/completions, /api/inference, /api/data-recipe,
/api/datasets, /api/train, /api/export. Rewrite as a raw ASGI middleware
that drains and counts http.request frames, replies 413 once the running
total exceeds UNSLOTH_STUDIO_MAX_BODY_MB before invoking the FastAPI
handler, and replays the buffered body to downstream so route code that
calls request.json() / await request.body() works unchanged.
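The drain, count, and replay approach can be sketched as a raw ASGI callable. This is not the code in main.py: the constructor arguments and the 413 body are assumptions, and a production version would also need to consider how much it is willing to buffer in memory.

```python
class MaxBodyMiddleware:
    """Raw ASGI middleware: count request-body frames, then replay them."""

    def __init__(self, app, max_bytes, protected_prefixes):
        self.app = app
        self.max_bytes = max_bytes
        self.prefixes = tuple(protected_prefixes)

    async def __call__(self, scope, receive, send):
        guarded = (
            scope["type"] == "http"
            and scope["method"] in {"POST", "PUT", "PATCH"}
            and scope["path"].startswith(self.prefixes)
        )
        if not guarded:
            await self.app(scope, receive, send)
            return

        chunks, total = [], 0
        while True:
            message = await receive()
            if message["type"] != "http.request":
                return  # client disconnected before the body completed
            chunk = message.get("body", b"")
            total += len(chunk)
            if total > self.max_bytes:
                await self._reject(send)  # handler is never invoked
                return
            chunks.append(chunk)
            if not message.get("more_body", False):
                break

        body, replayed = b"".join(chunks), False

        async def replay():
            nonlocal replayed
            if replayed:
                return await receive()  # e.g. a later http.disconnect
            replayed = True
            return {"type": "http.request", "body": body, "more_body": False}

        await self.app(scope, replay, send)

    async def _reject(self, send):
        payload = b"Request body too large"
        headers = [
            (b"content-type", b"text/plain"),
            (b"content-length", str(len(payload)).encode()),
        ]
        await send({"type": "http.response.start", "status": 413, "headers": headers})
        await send({"type": "http.response.body", "body": payload})
```

Counting frames instead of trusting Content-Length is what closes the chunked-upload gap, because the running total is enforced whether or not the header is present.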
CSP previously included 'unsafe-inline' on script-src, which defeats the
main XSS protection. The frontend bundle does not need inline scripts;
the only inline <script> the backend ever emits is _inject_bootstrap,
which is opt-in via UNSLOTH_STUDIO_INJECT_BOOTSTRAP. Drop 'unsafe-inline'
from script-src by default; when _inject_bootstrap fires, generate a
per-response nonce, embed it on the inlined <script>, and have
SecurityHeadersMiddleware splice 'nonce-XXX' into the CSP for that one
response (the internal x-internal-script-nonce header is popped before
the response leaves the server). 'unsafe-inline' stays on style-src for
Vite-injected styles.
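The per-response nonce idea can be shown with two small helpers. The directive values below are placeholders rather than Studio's actual policy, and the function names are hypothetical.

```python
import secrets


def build_csp(nonce=None) -> str:
    """Build a CSP header value; script-src only allows a nonce if given."""
    script_src = "script-src 'self'"
    if nonce:
        script_src += f" 'nonce-{nonce}'"
    return "; ".join(
        ["default-src 'self'", script_src, "style-src 'self' 'unsafe-inline'"]
    )


def inline_script(body: str, nonce: str) -> str:
    """Render an inline <script> carrying the same nonce as the header."""
    return f'<script nonce="{nonce}">{body}</script>'


def new_nonce() -> str:
    return secrets.token_urlsafe(16)  # fresh and unpredictable per response
```

An inline script only executes when its nonce attribute matches the one in that response's header, so markup injected elsewhere in the page stays blocked.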
* studio: drop empty assistant sentinel before passthrough
ChatMessage._validate_role_shape normalises role="assistant", content=""
(the post-Stop sentinel emitted by the frontend) to content=None so the
in-process path can drop it via _extract_content_parts. The passthrough
path then ran m.model_dump(exclude_none=True), which strips the now-None
content key entirely, sending {"role":"assistant"} to llama-server / the
OpenAI-compat backend. That fails upstream and leaves the user without a
recoverable Stop->resume.
Add _drop_empty_assistant_sentinels and call it at both passthrough
message origins: _openai_messages_for_passthrough (covers
/v1/chat/completions and the Responses API which routes through it) and
the anthropic_messages_to_openai output before
_anthropic_passthrough_*. Assistant messages that carry only tool_calls
(no content) are preserved.
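A filter with the behaviour described here might look like the following. It works on plain dicts and the function name simply echoes the commit; the real helper may be shaped differently.

```python
def drop_empty_assistant_sentinels(messages: list) -> list:
    """Drop assistant turns with no content and no tool_calls."""
    kept = []
    for msg in messages:
        # Covers a missing key, None, empty string, and empty list.
        empty = msg.get("content") in (None, "", [])
        if msg.get("role") == "assistant" and empty and not msg.get("tool_calls"):
            continue  # post-Stop sentinel: nothing useful to forward
        kept.append(msg)
    return kept
```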
* studio/tests: cover audit-fix surfaces and rebase pre-existing tests
Adds and updates pytest coverage for the four bot-flagged audit fixes
landed earlier in this branch and rebases two pre-existing tests that
were broken by the relaxed-validator and /api/health auth-gate changes.
studio/backend/tests/test_middleware.py (new)
MaxBodyMiddleware: small protected, large declared, unprotected
passthrough, chunked-upload-over-cap rejection (the regression for
the original Content-Length-only gap), and chunked-under-cap replay.
SecurityHeadersMiddleware: script-src no longer carries
'unsafe-inline', style-src still does, default headers
(XFO/XCTO/Referrer-Policy/Permissions-Policy/server), and the
internal x-internal-script-nonce header is consumed by the
middleware and converted to 'nonce-XXX' in the CSP.
/api/health: no auth -> minimal, invalid Bearer -> minimal
(the await regression), valid Bearer -> full diagnostic payload.
studio/backend/tests/test_desktop_auth.py
consume_refresh_token: second-call returns None, expired returns
None, and a 64-thread concurrent pile-up against the same hash
produces exactly one successful consumer (regression for the
SELECT-then-DELETE race).
test_health_response_reports_desktop_capability_fields: rebase
against the new health_check(request) signature by going through
TestClient with a real bearer instead of asyncio.run-ing the
handler directly.
studio/backend/tests/test_openai_tool_passthrough.py
Pin the new ChatMessage tolerance: assistant without content or
tool_calls is tolerated (normalises content -> None), empty-string
and empty-list assistant content normalise to None, and a missing
/ empty tool_call_id on role='tool' is synthesised as call_<hex>
rather than raising. Tests for _drop_empty_assistant_sentinels
cover the three drop shapes (empty string, empty list, missing
content key), preservation of assistant text and tool_calls-only
messages, and end-to-end through
_openai_messages_for_passthrough.
studio/backend/main.py
SecurityHeadersMiddleware.dispatch used response.headers.pop(...)
for the nonce-header handoff; Starlette's MutableHeaders has no
pop. Read-then-del so the internal handoff header is still
stripped before the response leaves the server.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* studio/tests: rebase three more pre-existing CI tests against this branch
CI on PR #5375 was red on three tests that were tuned for behaviour
predating this branch. Updates each so the assertions match what the
audit fixes intentionally changed; no production code touched.
studio/backend/tests/test_trained_model_scan.py
test_scan_trained_models_includes_lora_and_full_finetune_outputs
passed an absolute tmp_path through scan_trained_models, which now
runs resolve_output_dir / _assert_contained against outputs_root().
Repoint outputs_root() at tmp_path via monkeypatch so the fixture
dirs land under the configured root and the realpath containment
check passes.
tests/test_studio_install_workspace_guard.py
test_health_endpoint_exposes_studio_root_id_not_raw_path read
the first 1500 bytes after @app.get("/api/health") and asserted on
the studio_root_id literal. The handler grew (unauth short-circuit
+ await dependency gate) and the literal slid past the byte window.
Replace the fixed window with a slice up to the next top-level
@app.* decorator so the test surveys the whole handler regardless
of size.
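Slicing up to the next top-level decorator, instead of a fixed byte window, can be sketched like this. The helper name is hypothetical and the regex assumes route decorators start at column 0 with `@app.`.

```python
import re

NEXT_ROUTE = re.compile(r"^@app\.", re.MULTILINE)


def handler_source(module_text: str, decorator: str) -> str:
    """Return the text from `decorator` up to the next top-level @app.* decorator."""
    start = module_text.index(decorator)
    # Start searching just past the decorator so it cannot match itself.
    nxt = NEXT_ROUTE.search(module_text, start + len(decorator))
    end = nxt.start() if nxt else len(module_text)
    return module_text[start:end]
```

The slice grows and shrinks with the handler, so an assertion on a literal inside it no longer depends on how long the handler is.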
tests/studio/studio_api_smoke.py
The "login burst (5x wrong pw) -> 401 each" assertion was tagged
"When/if we add one, this assertion updates in the same PR." We
added the per-IP rate-limit in routes/auth.py
(_LOGIN_MAX_FAILS=5/60s) but missed the assertion update. Rewrite
the burst probe to observe the new invariant: at least one 401,
eventual transition to 429, and Retry-After present on the 429.
Adds a small _login_with_headers helper since the existing login()
helper drops response headers.
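The new burst invariant can be expressed as a small checker over the observed responses. This is a sketch only; the function name and the `(status, headers)` tuple shape are assumptions, not the helpers in studio_api_smoke.py.

```python
def check_login_burst(responses):
    """Validate an ordered list of (status_code, headers_dict) from wrong-password logins.

    Returns the index of the first 429 response.
    """
    statuses = [status for status, _ in responses]
    assert 401 in statuses, "expected at least one 401 before the limiter engages"
    assert 429 in statuses, "expected the limiter to answer 429 eventually"
    for status, headers in responses:
        if status == 429:
            names = {name.lower() for name in headers}
            assert "retry-after" in names, "429 without Retry-After"
    return statuses.index(429)
```

Header names are compared case-insensitively, so the check does not depend on how the HTTP client normalises them.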
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ci(studio-ui): set UNSLOTH_STUDIO_INJECT_BOOTSTRAP=1 for Playwright Studios
The Chat UI Playwright test drives the first-boot change-password
form, which (per playwright_chat_ui.py step "1. Change-password
through the UI") pre-seeds the hidden current_password field from
window.__UNSLOTH_BOOTSTRAP__. That global is only emitted when the
backend's _inject_bootstrap path fires, which since the security
pass on this branch is gated behind UNSLOTH_STUDIO_INJECT_BOOTSTRAP
and defaults to off. Without the global, the React form's
current_password validator never satisfies, the submit button stays
disabled, and the composer.wait_for() probe times out on
/change-password.
Re-enable injection only for the CI Studios that drive the chat UI
across linux/mac/windows. Production deployments are unaffected: the
env var has to be explicitly opted into, and the on-disk
auth/.bootstrap_password remains the source of truth for human users
typing the password in by hand.
Covers all eight Studio launch sites: the primary chat-ui boot and
the "extra UI tests" boot for each of the three OSes, plus the
pipeTransport JSON-crash retry relaunches in the macOS workflow that
re-spawn Studio mid-job.
A follow-up frontend PR will add a visible current_password input so
the form satisfies its own validator without needing the bootstrap
auto-fill at all; once that lands this CI knob can come back out.
* studio/sandbox: drop unshare(CLONE_NEWNET); add trusted-host allowlist; block sandbox file uploads; raise CPU rlimit default to 600 s
CLONE_NEWNET inside _sandbox_preexec silently killed every outbound
HTTP request from sandboxed Python whenever the kernel allowed
unprivileged user namespaces. requests.get('https://huggingface.co'),
urllib.request.urlopen('https://en.wikipedia.org/wiki/...'),
socket.connect(('arxiv.org', 443)) all failed despite the AST visitor
intending to allow them. The bash blocklist (curl / wget / nc / ssh /
scp / sftp / rsync / socat / eval / source) plus the AST-level
metadata-host denylist still carry the network policy after this
change; CLONE_NEWNET was redundant with both.
Add _TRUSTED_PUBLIC_HOST_LITERALS + _TRUSTED_PUBLIC_HOST_SUFFIXES
(~100 informational hosts: Wikipedia language subdomains, Wikimedia,
Wikidata, Google search, Bing, DuckDuckGo, HuggingFace, GitHub,
raw.githubusercontent.com, arXiv, StackOverflow / Stack Exchange,
MDN, docs.python.org, PyTorch / TensorFlow / NumPy / pandas docs,
pypi / files.pythonhosted.org / npmjs / crates.io, ReadTheDocs,
Britannica, BBC / Reuters / Nature / Science, NASA / CDC /
NIH / WHO open data, api.weather.gov). The visitor now blocks
literal hosts that are neither metadata nor trusted with a short
LLM-readable string so the model can retry with an allowed source
instead of choking on a multi-line error.
Block upload-shape calls regardless of host: requests.post / put /
patch / delete / request with files= or data=open(...) /
data=bytes_literal; httpx equivalents; urllib.request.urlopen /
Request with data=...; HuggingFace upload_file / upload_folder /
upload_large_folder / create_commit (module-level FQ paths AND
method-name match on any receiver). Message: "Blocked: file upload
disallowed in sandbox".
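One of the upload shapes listed above, `requests.<method>(..., files=...)`, can be detected with a short AST walk. This is an illustrative sketch under the assumption that the call is written against the bare `requests` name; the real visitor covers more shapes (httpx, urllib, HuggingFace uploads) than shown here.

```python
import ast

UPLOAD_METHODS = {"post", "put", "patch", "delete", "request"}


def find_upload_calls(source: str):
    """Return line numbers of requests.<method>(..., files=...) calls in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and func.attr in UPLOAD_METHODS
            and isinstance(func.value, ast.Name)
            and func.value.id == "requests"
            and any(kw.arg == "files" for kw in node.keywords)
        ):
            hits.append(node.lineno)
    return sorted(hits)
```

Because the check looks at call shape rather than destination, it flags the upload regardless of which host the URL names.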
Bump UNSLOTH_STUDIO_SANDBOX_CPU_S default 300 -> 600 s so long
agentic chains that span multiple tool calls don't get SIGXCPU'd
mid-stride. Env-var override path is unchanged.
Host normalisation now strips trailing dot, userinfo @, and explicit
port before allowlist / denylist comparison so trailing-DNS-dot,
userinfo-smuggling, and explicit-:443 URLs are decided correctly.
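The normalisation step can be sketched as below. The function names and the tiny literal / suffix sets are assumptions for the example; they stand in for the much larger trusted-host tables.

```python
TRUSTED_LITERALS = {"huggingface.co", "arxiv.org"}   # illustrative subset
TRUSTED_SUFFIXES = (".wikipedia.org",)               # illustrative subset


def normalise_host(netloc: str) -> str:
    """Reduce a URL authority such as 'user@Host.:443' to a bare lowercase hostname."""
    host = netloc.rsplit("@", 1)[-1]      # drop userinfo, keep what follows the last '@'
    if host.startswith("["):              # bracketed IPv6 literal, e.g. [::1]:443
        host = host[1:].split("]", 1)[0]
    elif host.count(":") == 1:            # host:port
        host = host.split(":", 1)[0]
    return host.rstrip(".").lower()       # drop the trailing DNS dot


def is_trusted(netloc: str) -> bool:
    host = normalise_host(netloc)
    return host in TRUSTED_LITERALS or host.endswith(TRUSTED_SUFFIXES)
```

Comparing only after normalisation means `huggingface.co@169.254.169.254` is judged by the host the request would actually reach, not by the trusted-looking userinfo in front of it.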
* studio: raise default request-body cap from 100 MB to 500 MB
UNSLOTH_STUDIO_MAX_BODY_MB default goes 100 -> 500 to comfortably
cover vision + audio + multi-recipe-batch JSON payloads. The
MaxBodyMiddleware stream-counting logic from this branch's earlier
|
||
|
|
ef9f672fe8
|
security: NOT affected by Mini Shai-Hulud (May-12 wave) -- forward-looking hardening only (#5397)
* scripts/scan_*: add Mini Shai-Hulud May-12 IOC strings and pin-blocklists
Append the May-12 2026 wave indicators (git-tanstack.com,
transformers.pyz, /tmp/transformers.pyz, "With Love TeamPCP", "We've
been online over 2 hours") to all three scanner IOC tables, add
BLOCKED_NPM_VERSIONS (42 TanStack pkgs, 4 opensearch versions, 3
squawk pkgs) in scan_npm_packages.py and
lockfile_supply_chain_audit.py (kept byte-identical), add
BLOCKED_PYPI_VERSIONS (guardrails-ai 0.10.1, mistralai 2.4.6,
lightning 2.6.2/2.6.3) plus RE_MAY12_IOC wiring across
check_py_file/check_shell_file/check_workflow_file in
scan_packages.py. The npm orchestrator and the lockfile auditor now
short-circuit on a blocked entry before fetching the tarball, and the
PyPI download pipeline drops blocked specs before pip download is
invoked.
* tests/security: regression suite for supply-chain scanners
Adds offline fixture corpus and pytest coverage for scan_npm_packages,
scan_packages, and lockfile_supply_chain_audit so future IOC-table
drift surfaces at PR time. Pytest scope narrowed to tests/security so
GPU smoke tests are not picked up by default.
* ci(security-audit): drop continue-on-error on pip-scan and npm-scan jobs
Promote three harden-runner blocks to egress-policy: block with
per-job allowlists. Add tests-security job running pytest
tests/security as a hard gate.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* scripts: harden third-party downloads, pip resolver pins, atomic writes
Pins uv installer and mlx_vlm qwen3_5 patches by commit SHA + SHA-256
checksum, scrubs PIP_* env vars and forces --index-url + --only-binary
on pip download, applies tarbomb caps to scan_packages archive walks,
and converts non-atomic config writes (kwargs spacer, studio stamper,
notebook validator, scan_packages req-file fixer) to
mkstemp+os.replace.
Also adds host allowlist to notebook_to_python downloader, threads an
--allow-shell flag through its shell=True emission with reviewer
warning comments, locks both MLX installer scripts to set -euo
pipefail, and extends CODEOWNERS so colab snapshot data files require
notebook-owner review.
* ci(workflows): harden release-desktop / smoke / notebooks workflows
Pin dtolnay/rust-toolchain to a 40-char SHA, scope release-desktop
permissions to read at workflow level with job-level write only on the
build job, append --ignore-scripts to every npm ci / npm install in
studio-frontend-ci / wheel-smoke / studio-tauri-smoke /
release-desktop, validate client_payload.ref shape via an
env-var-isolated regex on every notebooks-ci job, and add
step-security/harden-runner in audit mode as the first step of
release-desktop and mlx-ci.
* scripts: promote silent scanner failures to non-zero exit codes
scan_packages now returns 2 on pip-download failure and emits a
CRITICAL archive_corrupted finding on truncated wheels/sdists.
notebook_to_python exits 1 on per-notebook failures;
notebook_validator wraps the stash/pop in try/finally; lockfile audit
rejects bare UNSLOTH_LOCKFILE_AUDIT_SKIP=1 with a loud GitHub Actions
warning.
* Add npm cooldown + new-install-script gate + Dependabot cooldown
Pins min-release-age=7 (npm 11.10+) in repo-root and studio/frontend
.npmrc, adds scripts/check_new_install_scripts.py to fail PRs that add
a postinstall dep, ships a new security-audit job for npm audit
signatures plus the diff, and extends .github/dependabot.yml with
cooldown stanzas. Pin @tanstack/react-router to 1.169.9 per
GHSA-g7cv-rxg3-hmpx; lockfile regen deferred until that release lands
on npm. tests/security gains 4 new tests; full suite 26/26 green.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ci(security): fix tanstack pin, exec bits, expand IOC tables to @uipath/@squawk full
- Revert --ignore-scripts on Studio install workflows: vite build
needs esbuild's native postinstall (per PR #5392 rationale). Keep
--ignore-scripts on security-audit.yml's standalone npm audit job.
- Pin @tanstack/react-router to the actual published 1.169.2 (was a
forward-looking 1.169.9 that does not exist on npm; broke npm ci).
- Drop redundant repo-root .npmrc; studio/frontend/.npmrc covers the
only npm project today (root cooldown re-instate via dependabot.yml).
- Restore exec bits on 7 files my filesystem stripped during
cherry-pick.
- Expand BLOCKED_NPM_VERSIONS with full safedep.io + Aikido
enumeration: 22 @squawk/* packages with 5 versions each (110 entries;
previously 3 entries with 1 version each), and 66 @uipath/* packages
(entirely missing before). Mirror in
scripts/lockfile_supply_chain_audit.py.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* tests/security: suppress CodeQL py/incomplete-url-substring-sanitization
The two flagged 'X' in Y assertions are NOT URL sanitization checks.
They verify our scanner WROTE a known IOC literal into its stdout /
Finding.evidence, which is the opposite of an attack surface --
matching the scanner's output is precisely what catches the worm.
Inline lgtm[] suppression with a 4-line rationale comment above each.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* scripts/scan_*: expand IOC tables with Aikido full 169-pkg enumeration
Per Aikido 2026-05-12 disclosure (373 malicious package-version
entries across 169 npm package names), add to BLOCKED_NPM_VERSIONS:
- @mistralai/* npm scope (3 packages, 9 versions) -- separate from the
PyPI mistralai package already in BLOCKED_PYPI_VERSIONS
- @tallyui/* (10 packages, 30 entries)
- @beproduct/nestjs-auth (18 versions 0.1.2..0.1.19)
- @draftlab/* + @draftauth/* (5 packages)
- @taskflow-corp/cli, @tolka/cli, @ml-toolkit-ts/*, @mesadev/*,
@dirigible-ai/sdk, @supersurkhet/*
- 10 unscoped packages (safe-action, ts-dna, cross-stitch,
cmux-agent-mcp, agentwork-cli, git-branch-selector, wot-api,
git-git-git, nextmove-mcp, ml-toolkit-ts)
Also add to KNOWN_IOC_STRINGS / NPM_IOC_STRINGS:
- router_init.js SHA-256
ab4fcadaec49c03278063dd269ea5eef82d24f2124a8e15d7b90f2fa8601266c
- tanstack_runner.js SHA-256
2ec78d556d696e208927cc503d48e4b5eb56b31abc2870c2ed2e98d6be27fc96
- bun run tanstack_runner.js marker (the new Bun-prepare-script
dropper invocation pattern unique to this wave)
Total: 170 packages, 401 versions blocklisted. Studio lockfile still
scans clean (0 findings, 0 hard errors).
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* scripts/scan_*: web-verification additions (@tanstack/setup, intercom-client)
Two findings from cross-checking BLOCKED_NPM_VERSIONS /
KNOWN_IOC_STRINGS against GHSA-g7cv-rxg3-hmpx + Aikido + safedep.io +
Socket + Semgrep.
- Fix asymmetry: @tanstack/setup IOC string was in
lockfile_supply_chain_audit.py's NPM_IOC_STRINGS but missing from
scan_npm_packages.py's KNOWN_IOC_STRINGS. The literal is the malicious
optional-dependency name used by the May-12 TanStack wave; no
legitimate npm package of this name exists.
- Add intercom-client@7.0.4: the npm counterpart of the lightning
2.6.2/2.6.3 PyPI compromise (Apr-30 wave). Same threat actor
(TeamPCP). Confirmed by Semgrep, Aikido, OX Security, Resecurity,
Kodem. Safe version is 7.0.3 and earlier.
Total BLOCKED_NPM_VERSIONS: 171 packages / 402 versions. Both files
remain byte-identical. Studio lockfile still scans clean.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ci(security): add workflow-trigger lint refusing pull_request_target + cache-poisoning vectors
The two patterns that together powered GHSA-g7cv-rxg3-hmpx (TanStack
Mini Shai-Hulud) are now gated at PR time:
1. pull_request_target -- the worm chain started with a fork PR that
ran in the base-repo context. Every workflow in this repo today uses
'pull_request' (safe); the lint refuses any new pull_request_target
additions outright. workflow_run is restricted, allowed only with an
explicit allow-comment.
2. Shared cache keys between PR-triggered workflows and the publish
workflow (release-desktop.yml). The TanStack attack chain poisoned a
shared Actions cache from a fork PR; the legitimate release workflow
then restored the poisoned cache. The lint refuses any cache key that
appears in both a PR-triggered workflow and a workflow_dispatch-only /
publish workflow.
Current tree is clean: 0 pull_request_target, 0 workflow_run, 0
PR-publish cache-key collisions across all 24 workflows. The lint
locks that invariant in place.
Files:
+ scripts/lint_workflow_triggers.py (~200 LOC, stdlib + PyYAML)
+ tests/security/test_lint_workflow_triggers.py (5 tests covering
current-tree pass, pull_request_target reject, workflow_run
restricted, justified workflow_run accept, cache-key collision reject)
~ .github/workflows/security-audit.yml: new workflow-trigger-lint job,
no continue-on-error, harden-runner block-mode, PyYAML only runtime
dep.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* security: fix tests-security CI job + CodeQL false-positives
Two CI failures on the prior push:
1. pytest tests/security -- 5 lint regression tests failed because
scripts/lint_workflow_triggers.py imports PyYAML which is not in the
bare runner's Python env. Added pyyaml==6.0.2 to the pip install step
alongside pytest. (29 scanner tests already passed.)
2. CodeQL py/incomplete-url-substring-sanitization fired on two test
assertions that check the scanner WROTE the IOC literal to its own
stdout/stderr. The rule pattern-matches on `"<host>" in <var>` and
cannot distinguish a URL sanitizer from a regression-test evidence
check. Previous `# lgtm[...]` inline suppressions were detached from
the operator when pre-commit reformatted the assert across multiple
lines. Rebuilt the IOC literals at runtime (`"git-tanstack." + "com"`)
so no URL-shaped source literal appears on the `in` operator line;
rule cannot trigger.
Verified locally: `pytest tests/security -v` -> 34 passed in 2.70s.
* security(studio): defensive .npmrc cooldown aliases + save-exact
Two additions to studio/frontend/.npmrc to harden the existing
`min-release-age=7` (Mini Shai-Hulud defence):
1. `minimum-release-age=10080` (minutes) -- defensive alias for the
same 7-day floor. Some npm versions / wrappers consult one key but not
the other; setting both prevents a single upstream setting-name parse
change from silently disabling the cooldown. The two keys MUST agree
(do not let them drift).
2. `save-exact=true` -- refuses to write back `^x.y.z` ranges into
package.json when a maintainer runs `npm install <pkg>` locally. Does
NOT rewrite already-present ranges; stops NEW carets from creeping
into the manifest as patch-version footguns.
Verified: pytest tests/security -> 34 passed in 2.63s.
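The "two keys MUST agree" rule is simple arithmetic (7 days x 24 h x 60 min = 10080 min) and could be guarded by a check along these lines. The helper name is hypothetical and the parser only handles plain `key=value` lines, which is an assumption about the file's shape.

```python
def cooldown_keys_agree(npmrc_text: str) -> bool:
    """Check that min-release-age (days) and minimum-release-age (minutes) match."""
    settings = {}
    for raw in npmrc_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    days = int(settings["min-release-age"])
    minutes = int(settings["minimum-release-age"])
    return days * 24 * 60 == minutes
```

A check like this would fail as soon as one key is edited without the other.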
* chore(dependabot): remove dead bun entry for /studio/frontend
`package-ecosystem: "bun"` at /studio/frontend was a no-op: that path
commits package-lock.json, not bun.lock / bun.lockb, so Dependabot's
bun ecosystem silently skipped it. The actual behaviour is unchanged
-- the npm entry below the cargo block already owns npm_and_yarn
security advisories for /studio/frontend with
`open-pull-requests-limit: 0` (version-update PRs suppressed, security
PRs flow through).
This commit:
- Deletes the bun entry (kept a placeholder comment so a future bun
migration knows where to slot it back in).
- Rewrites the npm /studio/frontend entry comment to explain the real
intent: lockfile is the authoritative pin, .npmrc `min-release-age=7`
already blocks fresh tarballs at install time, dependabot only needs
to surface security advisories.
No functional change: same set of dependabot PRs as before (zero
version updates, security advisories grouped weekly with cooldown).
Verified: pytest tests/security -> 34 passed in 2.67s; YAML parses
cleanly via PyYAML.
* fix(dependabot): drop unsupported semver-* cooldown keys on github-actions
Dependabot's validator rejected the config with:
The property '#/updates/0/cooldown/semver-minor-days' is not supported
for the package ecosystem 'github-actions'.
The property '#/updates/0/cooldown/semver-patch-days' is not supported
for the package ecosystem 'github-actions'.
The `semver-minor-days` / `semver-patch-days` cooldown knobs are only
valid for semver-aware ecosystems (npm, cargo, etc.). The
github-actions ecosystem pins via git tags / SHAs, not semver, so only
`default-days` is honored. Pre-existing bug on main; surfaced on this
PR because the prior commit re-validated the file.
Behaviour: github-actions PRs now respect the 7-day cooldown floor
(was already the intent), without the no-op semver bands.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
||
|
|
5205bc0ed6
|
Studio: pin GPU at 95% headroom and warn on silent CPU fallback (#5323)
* Studio: pin GPU at 95% headroom and warn on silent CPU fallback
Two related runtime-side fixes for unslothai/unsloth#5106 ("model
loaded fully on RAM instead of VRAM"):
1. GPU pin threshold bump 0.90 -> 0.95
-------------------------------------
``_select_gpus`` and the auto-ctx pin loop in ``start_llama_server``
used a ``pool * 0.90`` threshold to decide whether the model fits on
GPU. Models that needed 91-94% of free VRAM were classified as "does
not fit", so Studio set ``gpu_indices = None`` and shipped
``--fit on`` to llama-server without ``-ngl``. The unsloth
llama.cpp fork's ``--fit on`` then ran with its default
``--fit-target 1024`` (1 GiB margin per device, an upstream default
inherited from ggml-org#18679). On a tight fit where compute
buffers + CUDA context push the projected free below the 1 GiB
target, the fork's fit logic shaves layer weights off the GPU --
slow inference for users whose models would have loaded comfortably
with ``-ngl -1``.
The classic reproducer from #5106 (noahterbest's log):
GGUF size: 20.8 GB, est. KV cache: 0.1 GB, context: 4096,
GPUs free: [(0, 22805)], selected: None, fit: True
20.8 GiB on a 22.27 GiB free RTX 4090 is 94% utilization. The model
fits (1.4 GiB headroom), but the 0.90 threshold kicks it to fit
mode. Bumping to 0.95 keeps these in the fits-on-GPU branch and
emits ``-ngl -1`` directly. The fork's ``--fit on`` still serves as
the safety net for the genuinely-too-large case.
The auto-ctx fallback also re-checks fit at 4096 before handing off
to ``--fit on``: a 20.8 GiB model with a 131072 native context fails
the auto loop at native ctx, falls back to ``min(4096, ctx)``, but
its weights + 4096 KV pin to the GPU comfortably. Without the
re-check we still emitted ``--fit on``.
``_fit_context_to_vram``'s 0.90 budget for context binary search is
intentionally left tighter than the pin fraction. That routine
chooses the slider value, where over-promising would OOM at runtime.
``_select_gpus`` decides whether to pin at all, where being
conservative pushes layers to CPU.
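The effect of the threshold bump on the #5106 reproducer can be checked with a few lines of arithmetic. This is a sketch: the function name is hypothetical, and treating the check as "weights + KV cache against a fraction of free VRAM" is an assumption about how the comparison is made.

```python
def fits_on_gpu(model_gib: float, kv_gib: float, free_mib: int, pin_fraction: float) -> bool:
    """True if weights + KV cache fit inside pin_fraction of the free VRAM pool."""
    pool_gib = free_mib / 1024          # 22805 MiB is about 22.27 GiB
    return model_gib + kv_gib <= pool_gib * pin_fraction


# Reproducer numbers from the log above: 20.8 GiB weights, 0.1 GiB KV, 22805 MiB free.
old = fits_on_gpu(20.8, 0.1, 22805, 0.90)   # budget is about 20.04 GiB
new = fits_on_gpu(20.8, 0.1, 22805, 0.95)   # budget is about 21.16 GiB
```

With these inputs the 0.90 budget rejects the model and the 0.95 budget accepts it, while a model that needs more than about 21.16 GiB still falls through to `--fit on`.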
2. Belt-and-suspenders: warn on silent CPU fallback
---------------------------------------------------
After ``_wait_for_health`` succeeds, scan llama-server's stdout for
``model buffer size`` lines. If Studio detected GPUs and intended
GPU use but only CPU buffers were allocated, log a structured
warning citing #5106. Markers cover CUDA / ROCm / Metal / Vulkan /
OpenCL / SYCL backends. New ``_gpu_offload_active: Optional[bool]``
field surfaces the result for any future API consumer.
This catches runtime-load failures the install-time fix cannot
cover (cudart bundle pairing PR #5322 is the install-side
companion): user overriding ``--fit-target``, uncommon driver +
toolkit configurations, future regressions in the install path.
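The tri-state classification described above might be sketched as follows. The function name, the regex, and the sample log lines in the usage note are assumptions for illustration; only the `model buffer size` phrase and the backend marker list come from the description.

```python
import re
from typing import Iterable, Optional

GPU_MARKERS = ("CUDA", "ROCm", "Metal", "Vulkan", "OpenCL", "SYCL")
BUFFER_LINE = re.compile(r"(\S+)\s+model buffer size")


def classify_gpu_offload(log_lines: Iterable[str], gpus_detected: bool) -> Optional[bool]:
    """True: a GPU buffer was allocated. False: CPU buffers only. None: nothing to conclude."""
    if not gpus_detected:
        return None
    # The token just before "model buffer size" names the backend buffer type.
    backends = [m.group(1) for line in log_lines if (m := BUFFER_LINE.search(line))]
    if not backends:
        return None
    return any(marker.lower() in b.lower() for b in backends for marker in GPU_MARKERS)
```

A line such as `load_tensors:  CUDA0 model buffer size = 19000.00 MiB` would classify as True, while a log containing only `CPU_Mapped model buffer size` lines would classify as False and trigger the warning.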
Tests: 10 new cases in studio/backend/tests/test_llama_cpp_context_fit.py:
* TestTightFitPinsToGPU x3: noahterbest's exact reproducer (auto and
explicit ctx pins to GPU at 94%); guard against threshold over-
broadening (genuine overflow still falls back to ``--fit on``).
* TestClassifyGpuOffload x7: CUDA / ROCm / Metal buffer markers
return True; CPU-only buffer lines return False; absent buffer
lines or no GPUs detected return None (no warning).
25 context-fit tests pass (15 baseline + 10 new). 511 tests total
across the affected test files. No regressions.
Refs #5106
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Trim comments to be more succinct
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
||
|
|
0a54d001ec
|
Harden Tauri release flow (#5341)
* Harden Tauri backend preflight and startup
Require managed Studio root IDs to match before attaching to existing backends, close the concurrent backend-start window, and tighten frontend Tauri detection to Tauri-specific signals.
* Add Tauri backend manageability guards
Gate desktop backend compatibility on explicit manageability fields, add external-conflict handling for unsafe backend states, and protect update/repair paths from mutating active non-owned Studio backends. Track Tauri-owned backends with local owner metadata for verified orphan cleanup only.
* Split Tauri preflight probes into modules
Move preflight types, version checks, managed install probing, and backend probing into focused submodules while preserving behavior and keeping implementation files under the release-readiness size target.
* Use desktop-specific Tauri updater channel
Point the desktop updater at a same-repo desktop-latest manifest and publish that channel from non-draft desktop releases after validating the Tauri-generated latest.json.
* Add Linux desktop update policy
* Add owned backend lifecycle guards
* Adopt verified desktop-owned backends
* Validate desktop backend readiness
* Trim Tauri release hardening code
* Require desktop backend 2026.5.3
* Handle desktop backend edge cases
* Fail stalled desktop backend startup
* Fix desktop update edge cases
* Avoid secret-gating adopted watchdog
* Fix desktop update comparison guards
* Automate desktop release versioning
* Serialize desktop release workflow
* tests: follow preflight.rs split into preflight/{backend,managed,types,version}.rs
PR #5341 splits studio/src-tauri/src/preflight.rs into a directory of
submodules. The cmd.env_remove("UNSLOTH_STUDIO_HOME") + STUDIO_HOME
calls now live in preflight/managed.rs instead of preflight.rs, so
test_tauri_preflight_scrubs_studio_home_env counted zero matches in
the old single-file location and failed with "assert 0 >= 2".
Read whichever shape is on disk: preflight.rs at the old path plus
every *.rs under preflight/ (current PR has 2 occurrences in
preflight/managed.rs). The guard intent is unchanged: at least 2
env_remove calls covering run_cli_probe and probe_cli_capability,
plus the single commands.rs scrub in check_install_status. Verified
locally: pytest tests/test_studio_install_workspace_guard.py::test_tauri_preflight_scrubs_studio_home_env passes.
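Reading "whichever shape is on disk" can be sketched as below. The helper name is hypothetical; it counts a needle across the old single file and every `*.rs` under the new directory, so the guard keeps working before and after the split.

```python
from pathlib import Path


def count_in_preflight(src_dir: Path, needle: str) -> int:
    """Count `needle` across preflight.rs (old layout) and preflight/**/*.rs (new layout)."""
    files = []
    single = src_dir / "preflight.rs"
    if single.is_file():
        files.append(single)
    split_dir = src_dir / "preflight"
    if split_dir.is_dir():
        files.extend(sorted(split_dir.rglob("*.rs")))
    return sum(f.read_text(encoding="utf-8").count(needle) for f in files)
```

The test can then assert `count_in_preflight(...) >= 2` without caring which file the calls live in.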
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Avoid browser Tauri hostname detection
* Restore shutdown flag after failed stop
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
||
|
|
23cebfaf98
|
Add Studio web update banner and release version display (#5308)
* Add Studio web update and release version display
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Show package version in Studio settings
* Break training unload guard barrel cycle
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
|
||
|
|
379f5a5aa6
|
Studio: add torch's pip nvidia DLL dirs to PATH on Windows (#5324)
* Studio: add torch's pip nvidia DLL dirs to PATH on Windows
Studio's install_python_stack bundles torch with matching CUDA
wheels (nvidia-cuda-runtime-cu13, nvidia-cublas-cu13, etc.) which
ship cudart64_X.dll, cublas64_X.dll, and cublasLt64_X.dll under
the prefix's Lib/site-packages/nvidia/<pkg>/(bin|Library/bin)/
tree. The Linux runtime env block in start_llama_server already
pulls the equivalent nvidia/cu*/lib paths into LD_LIBRARY_PATH,
but the Windows block did not do this, so the prebuilt
llama-server.exe could not resolve cudart64_X.dll at runtime
unless the user had a matching system CUDA toolkit on PATH. That
is the root cause of the Windows reports in
unslothai/unsloth#5106 ("GPU detected but model loaded entirely
on RAM/CPU"), and matches Roland's repeated workaround in that
issue: install matching CUDA toolkit version.
Brings the Windows env block in line with the Linux pattern:
* New LlamaCppBackend._windows_pip_nvidia_dll_dirs resolver
globs <prefix>/Lib/site-packages/nvidia/<pkg>/bin and
<prefix>/Lib/site-packages/nvidia/<pkg>/Library/bin. Both
layouts are seen in the wild across cuda_runtime / cublas /
cudnn / nvjitlink wheels.
* The Windows env block now extends path_dirs with the
resolver's output before falling back to CUDA_PATH/bin, so
pip-installed wheels are the canonical source (mirroring the
Linux LD_LIBRARY_PATH ordering). System CUDA toolkit remains a
valid fallback.
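The resolver described in the bullets above can be approximated with pathlib. This is a sketch, not the `LlamaCppBackend` method itself: the function name and return type are assumptions, and it only covers the two layouts named here.

```python
from pathlib import Path
from typing import List


def pip_nvidia_dll_dirs(prefix: Path) -> List[str]:
    """Collect nvidia/<pkg>/bin and nvidia/<pkg>/Library/bin directories under prefix."""
    root = prefix / "Lib" / "site-packages" / "nvidia"
    if not root.is_dir():
        return []                      # no nvidia wheels, or prefix missing
    found = []
    for pkg in sorted(root.iterdir()):
        if not pkg.is_dir():
            continue                   # skip stray files next to the packages
        for candidate in (pkg / "bin", pkg / "Library" / "bin"):
            if candidate.is_dir():
                found.append(str(candidate))
    return found
```

The returned directories would be placed ahead of `CUDA_PATH/bin` when building PATH, so the pip-installed DLLs are found first and a system toolkit remains the fallback.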
Tests: 7 new cases in
studio/backend/tests/test_llama_cpp_windows_nvidia_path.py:
* empty resolver when no nvidia wheels installed
* nvidia/<pkg>/bin layout resolved
* nvidia/<pkg>/Library/bin layout resolved
* mixed bin and Library/bin layouts both resolved
* unrelated site-packages contents not walked
* non-directory entries skipped
* missing prefix does not raise
110 backend tests pass. No regressions.
Refs #5106
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Studio: also scan torch/lib in Windows pip nvidia DLL resolver
PyTorch's Windows CUDA wheels frequently bundle cudart64_X.dll and
cublas64_X.dll directly under Lib/site-packages/torch/lib/ instead of
shipping separate nvidia-cuda-runtime-cuXX / nvidia-cublas-cuXX wheels.
On those installs _windows_pip_nvidia_dll_dirs previously returned
nothing useful, and llama-server.exe fell back to needing a system CUDA
toolkit on PATH -- the original #5106 failure mode.
The install-side equivalent python_runtime_dirs in
install_llama_prebuilt.py already treats torch/lib as a Python runtime
DLL source for the same reason. Bring the runtime resolver in parity
so torch-bundled-CUDA installs find their cudart at llama-server start.
Updates the existing test that codified the bug (asserted torch/lib was
excluded), and adds three new cases: pickup, combined-with-nvidia, and
the must-be-a-directory guard.
* Studio: cover cu13 bin/x86_64 layout in Windows DLL resolver
Three follow-ups from a 12-reviewer batch over
|
||
|
|
e346193ae8
|
Studio: download paired cudart bundle on Windows CUDA installs (#5322)
* Studio: download paired cudart bundle on Windows CUDA installs
Upstream ggml-org/llama.cpp publishes Windows CUDA in two archives
that the release notes explicitly say are both required:
llama-<tag>-bin-win-cuda-X.Y-x64.zip (binaries + ggml DLLs)
cudart-llama-bin-win-cuda-X.Y-x64.zip (cudart64, cublas64, cublasLt64)
Studio's installer was downloading only the first one. The
``runtime_name`` / ``runtime_url`` fields on AssetChoice existed but
were never populated, and ``install_from_archives`` only handled
``choice.url``. With the cudart DLLs missing from
``install_dir/build/bin/Release``, the prebuilt binary's LoadLibrary
calls only resolved at runtime when the user happened to have a
version-matched system CUDA toolkit on PATH. That is the underlying
cause for the Windows reports in #5106 ("GPU detected but model
loaded entirely on RAM"): the prebuilt's CUDA backend silently fails
to load and llama-server falls back to CPU regardless of ``-ngl`` or
``--fit on``.
Wires the pairing through end to end:
* ``windows_cuda_attempts`` and ``published_windows_cuda_attempts``
look up the matching ``cudart-llama-bin-win-cuda-X.Y-x64.zip``
asset URL alongside the main archive and store it as
``runtime_url`` / ``runtime_name`` on the AssetChoice. We only
pair when the selected main archive is the binary archive
(``llama-...zip``) so the legacy cudart-only naming path is
unaffected.
* ``apply_approved_hashes`` resolves the runtime archive's hash from
the approved manifest. If the manifest does not list the runtime
archive, the pairing is dropped rather than installing without
checksum coverage. Preserves the supply-chain guarantee for
published bundles; upstream installs with no manifest are
unaffected (same risk surface as the existing main-archive
download).
* ``install_from_archives`` now downloads the runtime archive into a
separate temp dir and runs ``copy_globs`` against both source dirs.
Separate dirs avoid the "ambiguous archive layout" guard tripping
on shared filenames like LICENSE.txt, while the second
``copy_globs`` overlay drops the cudart DLLs into the same
``install_dir/build/bin/Release`` directory as the main binary.
Adds a ``runtime_sha256`` field on AssetChoice to carry the
verified hash through to the download step, alongside the existing
``runtime_name`` / ``runtime_url`` slots.
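The "drop the pairing rather than install unverified" rule can be illustrated with a small sketch. Everything here is an assumption made for illustration: `Choice` is a hypothetical stand-in for AssetChoice carrying only the fields discussed above, `apply_runtime_hash` is not the real `apply_approved_hashes`, and the manifest is assumed to be a plain mapping from archive name to SHA-256 digest.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Choice:
    """Hypothetical stand-in for AssetChoice (runtime fields only)."""
    name: str
    url: str
    runtime_name: Optional[str] = None
    runtime_url: Optional[str] = None
    runtime_sha256: Optional[str] = None


def apply_runtime_hash(choice, approved):
    """Attach the runtime archive's approved hash, or drop the pairing.

    `approved` maps archive file name -> sha256 hex digest (assumed shape).
    """
    if choice.runtime_name is None:
        return choice  # nothing was paired, nothing to verify
    runtime_hash = approved.get(choice.runtime_name)
    if runtime_hash is None:
        # No checksum coverage for the runtime archive: install without it
        # instead of downloading an unverified file.
        return replace(choice, runtime_name=None, runtime_url=None,
                       runtime_sha256=None)
    return replace(choice, runtime_sha256=runtime_hash)
```

Clearing all three runtime fields together keeps later steps from seeing a half-populated pairing.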
Tests: 5 new cases in tests/studio/install/test_selection_logic.py:
* upstream pairing populates runtime_url / runtime_name
* graceful degrade when cudart asset is absent in the release
* legacy cudart-only naming path does not self-pair
* apply_approved_hashes threads runtime_sha256 when the manifest
lists it
* apply_approved_hashes drops the pair when the runtime hash is
missing rather than installing without verification
130 install tests pass (125 baseline + 5 new). No regressions.
Refs #5106
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Trim comments to be more succinct
* Studio: refresh installs that pre-date the paired cudart bundle
expected_install_fingerprint did not hash the new runtime_name /
runtime_sha256 fields, and runtime_payload_health_groups for windows-
cuda only checked llama.dll / ggml-cuda.dll. The combination meant that
an install made before this PR -- the exact installs reporting #5106 --
would still match the post-PR choice: same main asset name + sha, same
llama.dll, same ggml-cuda.dll, missing cudart64_*.dll, but
existing_install_matches_choice returned True and the cudart download
path in install_from_archives never ran. Fresh installs got the fix;
existing affected installs did not.
This commit:
* Adds runtime_asset and runtime_sha256 to the fingerprint payload so
any change to (or first introduction of) the cudart pair invalidates
pre-existing installs.
* Refactors write_prebuilt_metadata to call expected_install_fingerprint
so the recorded fingerprint cannot drift from the expected one when
new keys are added.
* Extends runtime_payload_health_groups for windows-cuda to require
cudart64_*.dll and cublas64_*.dll *only when the choice carries a
paired runtime archive*. Gating on choice.runtime_name keeps the
no-pair fallback path (manifest missing cudart hash, upstream
without paired bundle) from looping on reinstall.
New tests:
* test_existing_install_matches_plan_windows_cuda_paired_requires_cudart
-- paired choice rejects installs missing cudart / cublas.
* test_existing_install_matches_plan_windows_cuda_unpaired_skips_cudart_check
-- unpaired choice still accepts legacy cudart-less installs.
* test_existing_install_fingerprint_changes_when_cudart_pair_added
-- direct fingerprint mismatch between the legacy and paired choice.
Refs #5106
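Why adding the runtime fields to the fingerprint invalidates older installs can be seen in a minimal sketch. The payload keys and the function name `install_fingerprint` are assumptions for illustration; the real `expected_install_fingerprint` may hash additional fields or encode them differently.

```python
import hashlib
import json


def install_fingerprint(asset, asset_sha256, runtime_asset=None, runtime_sha256=None):
    """Hash the fields that identify an install (hypothetical payload shape)."""
    payload = {
        "asset": asset,
        "asset_sha256": asset_sha256,
        "runtime_asset": runtime_asset,
        "runtime_sha256": runtime_sha256,
    }
    # sort_keys makes the encoding independent of dict insertion order.
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


legacy = install_fingerprint("llama.zip", "aa")
paired = install_fingerprint("llama.zip", "aa", "cudart.zip", "bb")
print(legacy != paired)
```

With the same main asset name and hash, the two fingerprints still differ once a runtime archive is paired, which is what forces a reinstall.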
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Studio: tighten paired Windows CUDA install gates
Three follow-ups from a 12-reviewer batch over
|
||
|
|
6d4e6f2514
|
CI: scope GITHUB_TOKEN permissions, add MLX CI, unblock ~60 skipped tests (#5312)
* CI: scope GITHUB_TOKEN permissions and unblock ~60 skipped tests
permissions:
- All five PR-time workflows (backend, frontend, inference smoke, tauri,
wheel) now declare permissions: contents: read at the workflow level,
matching CodeQL's default-permissions guidance and the existing pattern
in release-desktop.yml. None of these workflows write to the repo.
skipped tests:
- Repo tests (CPU) job now installs node 22 and uv, which unblocks
~60 tests that were silently skipping on CI:
- 9 tests in tests/studio/test_chat_preset_builtin_invariants.py
skipped on "node not available". Fixed in this commit; an obsolete
"unsloth_repo/" prefix in WORKDIR was also pointing the source-file
existence check at a path that no longer exists.
- tests/python/test_e2e_no_torch_sandbox.py (47), test_studio_import_no_torch.py
(29), test_tokenizers_and_torch_constraint.py (most of 42) all spawn
fresh uv venvs and self-skip when uv is missing.
- Three test_tokenizers_and_torch_constraint.py cases are deselected
because they expose a real bug in studio/backend/requirements/no-torch-runtime.txt:
the unpinned tokenizers line resolves to 0.23.1, which transformers
rejects with "tokenizers>=0.22.0,<=0.23.0 is required". Tracked
separately as a no-torch install regression.
Locally: 760 passed, 1 skipped, 23 deselected (was 694 / 67 / 23).
* CI: add MLX CI workflow for the Studio dispatch matrix
Mirrors the three files documented in tests/studio/README.md (PR #5307)
into a dedicated workflow so MLX dispatch failures show up as their own
check on PRs rather than getting buried inside Backend CI:
- test_hardware_dispatch_matrix.py 7-profile parametrized matrix
+ 2 dispatch-priority canaries
- test_is_mlx_dispatch_gate.py AST + runtime guard on
unsloth._IS_MLX
- test_mlx_training_worker_behaviors.py worker.py contract checks
Triggers on pull_request when any of unsloth/__init__.py,
studio/backend/utils/hardware.py, studio/backend/core/training/worker.py,
or any of the three test files are touched. Runs on a Linux+CPU runner
with hardware spoofs; no Apple Silicon, real GPU, or real MLX install
required. Locally validated: 36 passed in 0.41s.
permissions: contents: read at the workflow level (matching the rest of
the PR-time CI surface).
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ci(mlx): fix path filter that pointed at a non-existent file
The MLX CI workflow listed ``studio/backend/utils/hardware.py`` as a
path filter, but no such file exists. The actual layout is
studio/backend/utils/hardware/
__init__.py
amd.py
hardware.py
nvidia.py
vram_estimation.py
so the filter as written would never match. A reviewer modifying
``hardware/hardware.py`` (where ``detect_hardware``, ``DeviceType``,
and ``IS_ROCM`` actually live) would not trigger MLX CI, which
defeats the point of the focused PR gate.
Replace the broken filter with ``studio/backend/utils/hardware/**``
so any change in the hardware probe directory triggers MLX CI, and
add three sibling triggers that each materially affect dispatch:
- ``unsloth/_gpu_init.py``
Hosts ``from .models import *`` and the ``from .trainer import *``
chain. The trainer.py circular-import fix that landed in
``
|
||
|
|
1c91f49d83
|
fix: unblock 4 tests deselected/skipped in #5312 (real bugs) (#5359)
* fix: unblock 4 tests deselected/skipped in #5312 (real bugs)
PR #5312 surfaced two real regressions by turning previously-silent skips into explicit `--deselect` / `pytest.skip(...)` blocks. Both were left as follow-ups rather than fixed in that PR. This PR fixes the underlying bugs so the suppressions can be dropped.
1. studio/backend/requirements/no-torch-runtime.txt: pin tokenizers
Installing with `--no-deps -r no-torch-runtime.txt` (the path install.sh takes for the no-torch / GGUF-only mode) resolves transformers to 5.3.0 and tokenizers to the latest available (0.23.1). transformers 5.3.0 requires `tokenizers>=0.22.0,<=0.23.0`, so `from transformers import AutoConfig` then fails at import time:
ImportError: tokenizers>=0.22.0,<=0.23.0 is required for a normal functioning of this module, but found tokenizers==0.23.1.
Pin `tokenizers>=0.22.0,<=0.23.0` to match the constraint embedded inside every transformers version in the allowed window (4.56.0..5.3.0). Verified locally: a fresh `uv venv` + `uv pip install --no-deps -r no-torch-runtime.txt` followed by `from transformers import AutoConfig` now succeeds.
Unblocks 3 deselected cases in studio-backend-ci.yml:
- TestE2ETokenizersFix::test_autoconfig_works_with_no_torch_runtime (parametrized py 3.12 + 3.13 -> 2 cases)
- TestE2EFullNoTorchSandbox::test_autoconfig_succeeds
2. unsloth/models/rl.py: defensive wrapper for _patch_trl_rl_trainers
_patch_trl_rl_trainers has many internal `try: ... except: ... return` branches, but several paths (notably inspect.getsource on the thin wrappers TRL 1.x leaves in trl.trainer for trainers that moved to trl.experimental) can still propagate exceptions.
The umbrella patch_trl_rl_trainers() ring-fences each call with try/except + warning_once, but direct callers (the CI shim in consolidated-tests-ci.yml, downstream tools, end-user scripts) used to see the raw exception, which forced #5312's CI heredoc to ring-fence with:
except Exception as e:
# TRL 1.x renames break the patch helper internally; we
# accept that here and skip rather than fail the cell.
pytest.skip(f"_patch_trl_rl_trainers raised: ...")
Rename the existing implementation to _patch_trl_rl_trainers_impl and make _patch_trl_rl_trainers a thin wrapper that catches any uncaught exception and routes it through logger.info, matching the umbrella wrapper's behaviour. Power users who want the raw raising behaviour for their own diagnostics can still call _patch_trl_rl_trainers_impl directly.
Adds tests/python/test_patch_trl_rl_trainers_defensive.py to lock the contract: the wrapper must never raise, and it must delegate to the impl on the happy path.
Unblocks 1 skip in consolidated-tests-ci.yml's test_compile_sft_trainer_patch.
Follow-up for #5312 once this lands: drop the two `--deselect` lines in studio-backend-ci.yml's repo-cpu-tests step and drop the `except Exception ... pytest.skip(f"_patch_trl_rl_trainers raised: ")` block in consolidated-tests-ci.yml's test_compile_sft_trainer_patch.
* chore: tighten comments and docstrings in the new code
Drop verbose justifications down to one or two lines per site. The PR description carries the full context; in-file comments only need to point at the WHY.
* chore(no-torch-runtime): drop redundant lower bound on tokenizers
tokenizers 0.23.0 was never published to PyPI (versions go 0.22.2 -> 0.23.1), so `tokenizers<=0.23.0` resolves to 0.22.2 in practice, the same version the explicit >=0.22.0,<=0.23.0 pin resolved to. Verified on Python 3.12 and 3.13.
|
||
|
|
b364080225
|
fix(gh_client): fail fast on 401/403 auth errors instead of retrying forever (#5325) (#5329)
* fix(gh_client): fail fast on 401/403 auth errors instead of retrying forever (#5325)
Fixes #5325. The Studio data-recipe GitHub Crawler swallows 401 Unauthorized (and 403 Forbidden without rate-limit headers) into the generic "network error" retry path, so a job with a stale or wrong-scoped GitHub token spins indefinitely emitting "Retry." lines until the user cancels.
Changes:
- Add GitHubAuthError. Raised on 401, and on 403 unless the response carries a clear rate-limit signal (Retry-After header for secondary limits, or X-RateLimit-Remaining: 0 for primary limits).
- Track which token source resolved at construction time: explicit argument (recipe-level field), GH_TOKEN, or GITHUB_TOKEN. Surfaced in the error message so the user knows which credential to rotate.
- Insert the auth-failure check before the existing 403/429 rate-limit branch in both .graphql() and .rest() so auth failures bypass the sleep-and-retry loop and abort the recipe immediately. Genuine rate limiting still retries via the existing path.
requests.RequestException handling is unchanged because GitHubAuthError does not inherit from it.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* style: apply black formatting per pre-commit.ci
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix GitHub auth failure handling
Preserve GitHub token source through the repo seed scraper and fail fast on non-rate-limit auth errors while keeping genuine rate-limit retries.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Wasim Yousef Said <wasimysdev@gmail.com>
|
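The distinction this commit draws between auth failures and rate limiting can be written as a small pure function. This is a sketch under assumptions: `classify_github_error` is a hypothetical name, `headers` is assumed to be a plain dict, and the real client raises GitHubAuthError rather than returning a label.

```python
def classify_github_error(status, headers):
    """Return "auth", "rate_limit", or "other" for an HTTP response.

    Hypothetical helper; `headers` is a plain dict of response headers.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    if status == 401:
        return "auth"
    # Retry-After signals a secondary limit; a remaining count of 0 a primary one.
    rate_limited = (
        "retry-after" in lowered
        or lowered.get("x-ratelimit-remaining") == "0"
    )
    if status == 403:
        return "rate_limit" if rate_limited else "auth"
    if status == 429:
        return "rate_limit"
    return "other"
```

Only the "rate_limit" label would feed the sleep-and-retry loop; an "auth" result would abort immediately so a stale token cannot spin forever.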
||
|
|
c57a97958a
|
Studio: stop truncating long log lines as suspected base64 (#5335)
* Studio: stop truncating long log lines as suspected base64
filter_sensitive_data carried a heuristic from the original Studio import that truncated any string >100 chars containing ',' or '/' to value[:20] + '...'. The block was dormant until #5246 wired filter_sensitive_data into the structlog processor chain to redact native-path leases. Once active, the heuristic ate normal log lines - llama_cpp_backend's GGUF size summary, mmproj selection, the full llama-server command line, and any traceback containing a path - all rendered as a 20-char prefix, defeating debugging of llama-server exceptions and GPU selection.
Drop the base64 truncation. No call site in the codebase logs raw base64; if one ever does, it should truncate at the source rather than in a global filter. Native-path lease redaction added by #5246 is preserved.
* Studio: regression test for filter_sensitive_data truncation
Pins two properties in studio/backend/loggers/handlers.py:
1. Long log messages with ',' or '/' (the GGUF size summary, mmproj selection, full llama-server command, exception tracebacks) flow through filter_sensitive_data unchanged. Exercises the exact call sites that regressed when #5246 wired the processor in.
2. Native-path lease redaction still fires for both the inline native_path_lease=... regex form and the nativePathLease dict-key form, so a future cleanup of the truncation logic can't quietly strip #5246's redaction along with it.
|
||
|
|
d1f9ab659f
|
fix: harden Studio IME composer sends (#5327)
* fix: harden Studio IME composer sends
* fix: address IME composer review feedback
|
||
|
|
b65a7450ca
|
Studio: Dark theme refactor, right sidebar redesign, and chat UI polish (#5150)
* Dark theme refactor, right sidebar redesign, and chat UI polish
- Dark theme refactor
- Redesign right sidebar
- Further left sidebar adjustments
- Wider chat and content area; layout tweaks for chat content
- Rounded corners across elements for consistency
- Show chat message menu icons on menu-area hover, not only on message hover
- Assistant message menu icons now always visible; user messages keep on-hover
- Redesigned copy icon used consistently across chat blocks and messages
- Redesigned trash icon, applied consistently
- Unified icon sizing and style with the sidebar
- Adjusted icon colors across chat
- Fix on-hover background design for chat icons
- Fix tooltip from 'more' button staying visible after clicking elsewhere
- Adjust position and design of generation speed info text below messages
- Adjust design of token speed info popup
- Adjust sidebar scrollbar to cover recent chats only
* Recents sidebar rename, UI/theme refactor, layout and chat polish
UI & Theme:
- Dark theme refactor
- Consistent rounded corners across elements
- CSS polish and cleanup
- Remove unused logo image assets
Recents sidebar:
- Add 'more' button for options menu
- Support renaming conversations and training runs
- Confirmation dialog before deleting chats
- Add optional display_name column to training_runs (idempotent ALTER TABLE) so renaming doesn't lose model_name/dataset_name from the run config
- New PATCH /api/train/runs/{run_id} endpoint accepts { display_name: string | null }; empty/whitespace clears the override
- Sidebar shows display_name ?? model_name and exposes Rename in the row's More menu, mirroring the chat rename flow
- Cache last list response in localStorage and hydrate from it on mount, so recents paint instantly on F5 / route revisit; cached items are shape-validated and dropped if malformed
- Optimistic updates on rename and delete (apply locally + cache before background refresh)
- Visible toast on rename/delete failure instead of swallowed errors
Layout:
- Redesigned right sidebar
- Further left sidebar adjustments
- Updated chat content layout; chat and content area slightly widened
- Sidebar scrollbar covers recent chats only
Icons:
- Redesigned copy icon, unified across chat blocks and messages
- Redesigned trash icon to match
- Consistent icon sizing and style across chat and sidebar
- Adjusted icon colors across chat
- Fix icon on-hover background design
Chat messages:
- Menu icons now appear on hover over the menu area, not just the message
- Assistant message menu icons always visible; user messages keep on-hover (next/previous response stays visible for edited prompts)
- Repositioned and restyled generation speed info text below messages
- Restyled token generation speed popup
Tooltips:
- Removed tooltip on hover for previous/next assistant response icons
- Unified tooltip design across sidebars and chat
- Removed tooltip animations (also fixes related lag)
Model & Chat Template config:
- Merged Chat Template config into Model Configuration section
- Added revert-to-original for chat template
- Fix Chat Template config disappearing on page refresh until model reload
Performance & scroll:
- Removed chatbox movement animations across pages/navigation (fixes related UI lag)
- Fix scroll flicker at end of streaming when a code block is the final element
- Additional chat scroll improvements
Bug fixes:
- Fix 'more' button tooltip remaining visible after clicking elsewhere
* Remove sidebar localStorage cache and optimistic updates
Drops the localStorage hydration and optimistic rename/delete logic from the recents sidebar; reverts to fetching fresh on mount.
* Fix missing cn import in shared-composer (regression from merge)
* chore(sidebar): import sidebar deps from feature indexes
Re-export deleteChatItem / renameChatItem / useChatSidebarItems / SidebarItem / useChatSearchStore / ChatSearchDialog from @/features/chat, and removeTrainingUnloadGuard from @/features/training. Switch app-sidebar.tsx to consume them via the public feature indexes instead of deep paths, clearing the no-restricted-imports eslint errors. No behavior or UX change.
* fix(studio/frontend): reload training Recents sidebar after F5 refresh
The Recents sidebar showed empty after a hard refresh. The hook's inFlightRef dedup guard collided with React StrictMode's double-mount in dev: the second mount's fetch returned silently with no error, no retry, and no toast — leaving the sidebar empty until navigation.
Replace skip-if-busy dedup with abort-previous via a hook-level AbortController. This also fixes a latent race where a slow poll could resurrect a just-deleted row by clobbering the optimistic update.
Changes (all in use-training-history-sidebar.ts):
- fetchRuns aborts any in-flight request before starting a new one; post-await signal.aborted check drops stale responses.
- Optimistic helpers (applyRunUpdate, removeRun) abort in-flight fetches so they don't depend on caller discipline to invalidate stale data.
- Initial load gets bounded retry-with-backoff (500ms / 1.5s / 3.5s) and surfaces a sonner toast with a Retry action on final failure.
- Failure toast auto-dismisses on any successful load (initial retry, Retry click, or polling recovery).
- Polling pauses while the tab is hidden and catches up on visible, avoiding wasted requests during long training runs.
- Both effects own their teardown explicitly (abort + clear timer).
* Apply unified tooltip design and behavior across remaining pages for consistency
* UI polish: spacing, tooltip on source icons, letter spacing, smaller icons, consistent edit icon
- Adjust tiny spacing between elements around the UI for subtle polish
- Redesign tooltip on source icons for web search / tool use, consistent with the new design
- Adjust chat text letter spacing
- Smaller icon sizes
- Replace 'edit message' icon in chat with the new Rename icon used in Recents for consistency
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Adjust CSS for right sidebar
* Fix scrollbar UI compatibility across browsers
* fix: preserve chat preset settings on model load
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix(studio): remove duplicate chat template status field
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* chore: remove creative preset assumption
* fix(studio): align speculative decoding default
* fix(studio/chat): snap numeric param inputs to step grid
- Type a value in any param input (Temperature, Top K, Max Tokens, etc.)
now clamps to [min, max] and snaps to the slider's step grid, killing
off-grid values like 1.051234 and FP residue from slider drags.
- Branch picker chevrons share the action bar's 32px height + 10px radius
via a new .aui-branch-chevron-btn utility; hover area aligns visually
while staying narrower than the sibling icon buttons.
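The clamp-and-snap rule for typed parameter values can be worked through in a few lines. The Studio frontend itself is TypeScript; this Python sketch only illustrates the arithmetic, and both the function name `snap_to_step` and the choice to measure the grid from the minimum are assumptions.

```python
from decimal import Decimal


def snap_to_step(value, minimum, maximum, step):
    """Clamp to [minimum, maximum], then snap to the nearest grid point.

    The grid is assumed to start at `minimum`. Decimal arithmetic avoids the
    binary floating-point residue that plain float multiplication can leave.
    """
    lo = Decimal(str(minimum))
    hi = Decimal(str(maximum))
    st = Decimal(str(step))
    clamped = min(max(Decimal(str(value)), lo), hi)
    steps = ((clamped - lo) / st).to_integral_value()  # nearest whole step count
    return float(min(lo + steps * st, hi))


print(snap_to_step(1.051234, 0, 2, 0.05))  # → 1.05
print(snap_to_step(5, 0, 2, 0.05))         # → 2.0
```

The final `min` keeps the result inside the range when the maximum does not fall exactly on the grid.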
* fix(studio/chat): keep training-run polls converging and drop dead preset code
- Keep training-run polls converging when responses outrun the 5s interval
(don't unconditionally abort prior in-flight; skip if one is still pending,
mutation race still guarded).
- Drop dead Creative/Precise preset code paths (remove 'builtin-fixed' source
variant + unreachable branches).
* fix(studio): training-run cards show custom name + model + dataset
- Training-run cards now display custom display_name + model + dataset,
with cross-view sync on rename/delete.
- Enhance clarity of borders and colors in dark theme on export etc.
* fix(studio): match active state green to unsloth brand color
* fix(studio): preserve can_resume on training rename
* fix(studio): keep GGUF chat template override distinct
* fix(studio): treat audio input models as multimodal
* fix(studio): cancel numeric draft on Escape
* fix(studio): use default speculative mode on toggle
* fix(studio): detect GGUF audio VLM input models
* fix(studio): address final PR review findings
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix(studio): refresh sidebar/history when a new training run starts so it appears without a manual reload
* fix: API and svg
* fix(studio/sidebar): align run rename dirty check with displayed baseline
* fix(studio/sidebar): use leading-tight on account block to prevent descender clipping with truncate
---------
Co-authored-by: sneakr <hauzin@hotmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: shine1i <wasimysdev@gmail.com>
|
||
|
|
4ab096970d
|
Studio: API settings overflow with long Colab URLs (#5286)
* fix: API settings overflow with long Colab URLs
* fix: gentle wrapping for API usage snippets
---------
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
|
||
|
|
848ede3d57
|
[studio]: Fix tool reasoning trace in UI (#5314)
* fix thought for 1 second issue
* gemini suggestion
|
||
|
|
fac2dc09b0
|
fix: restore API and Help menu labels (#5310)
|
||
|
|
0c803242ef
|
feat(studio): add Continued Pretraining (CPT) as a training method (#4677)
* feat(studio): add Continued Pretraining (CPT) support
Implements CPT as a first-class training method in Unsloth Studio, resolving feature request #4565.
Changes:
- frontend/src/types/training.ts: add 'cpt' to TrainingMethod union
- frontend/src/lib/vram.ts: add 'cpt' to VramTrainingMethod (fp16 footprint)
- frontend/src/features/export/constants.ts: add CPT to METHOD_LABELS
- frontend/src/features/training/api/mappers.ts: map 'cpt' -> 'Continued Pretraining', force packing=true and train_on_completions=false for CPT payloads
- frontend/src/features/studio/sections/model-section.tsx: add 'Continued Pretraining' option (purple dot) to Method selector; update tooltip
- frontend/src/features/onboarding/.../model-selection-step.tsx: add CPT to onboarding wizard method dropdown
- backend/models/training.py: update training_type field description
- backend/core/training/worker.py: detect is_cpt flag, force packing=True, train_on_completions=False, pass is_cpt to _train_worker
- backend/core/training/trainer.py: _train_worker reads is_cpt kwarg, forces packing on, skips train_on_responses_only for raw-text pretraining
CPT behaviour:
- Full model weights (no LoRA adapters), same as Full Finetuning
- Sequence packing always enabled for GPU efficiency
- Trains on every token (no chat-format masking)
- VRAM estimated at fp16 (2.0 bytes/param)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update mappers.ts
* Add CPT raw dataset support and UI fixes
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add missing training methods module
* Handle invalid raw-text rows and expose raw in onboarding
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Etherll <61019402+Etherll@users.noreply.github.com>
Co-authored-by: Etherll <mrmrmidessam@gmail.com>
|
||
|
|
d65149795b
|
feat(studio): MLX training tab on Apple Silicon (LoRA / full FT, VLM, export) (#5265)
* Add Apple Silicon MLX routing
- Rewrite __init__.py: detect MLX on macOS arm64 before any torch imports
- Extract original GPU init to _gpu_init.py (unchanged)
- MLX path imports FastMLXModel from unsloth_zoo, skips all GPU code
- GPU path unchanged: from ._gpu_init import *
* mlx with studio
* updating temporary install.sh
* adding t_v5 path
* fixing vision training
* adding chat
* minor
* Adding export and fixing training issues, inference with lora adaptors
* fix: MLX worker pass load_in_4bit, override is_vlm based on dataset, streaming for VLM
* Merge mlx-apple-silicon into main
* update install.sh to point to main branch
* fix: export returns 3 values (success, message, output_path) matching upstream worker
* fix(mlx): show training-process peak memory in Studio UI, not system-wide
Studio UI was showing ~95 GB during MLX training because get_gpu_utilization
read "In use system memory" from IORegistry's AGXAccelerator — system-wide
GPU memory across all processes (training + backend + browser + Display).
Now the trainer's mx.get_peak_memory() value is forwarded through the
progress event and surfaced via /api/train/hardware while training is
active. Falls back to the system-wide reading when training is not running.
* fix(mlx): make is_bfloat16_supported() detect M1/M2 (no native bf16)
M1 and M2 chips emulate bf16 in software on the GPU, causing 40-70%
slower prefill compared to native fp16. M3+ have native bf16 (macOS
Sonoma+ MPSGraph). Replaces the always-True stub with chip-aware
detection via mx.device_info().
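A minimal sketch of the chip-generation rule described above, assuming the device name is available as a string such as "Apple M2 Pro". The helper name `supports_native_bf16` and the name-parsing approach are hypothetical; the fields actually read from `mx.device_info()` are not shown in this log.

```python
import re


def supports_native_bf16(chip_name: str) -> bool:
    """Hypothetical helper: True only for Apple M3 or newer.

    Follows the rule stated in the commit message (M1/M2 emulate bf16,
    M3+ have it natively). Unknown names are treated conservatively.
    """
    match = re.search(r"\bM(\d+)\b", chip_name)
    if match is None:
        # Not an Apple M-series name: assume no native bf16.
        return False
    return int(match.group(1)) >= 3
```

Returning False for unrecognized names keeps the fallback on fp16, which the commit message describes as the faster path on older chips.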
* feat(mlx): wire training_type="Full Finetuning" through MLX worker
Compute use_lora from the UI's training_type before loading the model,
pass full_finetuning=not use_lora to FastMLXModel.from_pretrained, and
let the existing 'if use_lora' branch skip get_peft_model. Matches the
GPU worker's flow.
* fix(mlx): pass save_method='merged_16bit' from Studio's export page
Previously the MLX path called save_pretrained_merged() with no
save_method, which fell through to a no-op that didn't actually fuse
LoRA into the base. Now Studio's "Merged Model" export properly
fuses LoRA + dequantizes any 4-bit base to bf16, matching the GPU
behavior for the same UI option.
* fix(studio): pass private to MLX push, return 3-tuples consistently
- MLX push_to_hub branch now forwards private=private (matches GPU)
- Existing 2-tuple early-returns ('repo_id+token required', 'PEFT model
needed') were tripping the route's 3-tuple unpack. Added a None
output_path so the unpack always succeeds.
* studio wirings
* Merge pull request #5 from Manan17/feat/quant_config
studio wirings
* fix(mlx): wire train_on_completions for VLM via per-template lookup
Mirror the GPU worker: stop excluding VLMs and stop hardcoding
template detection. Look up the model in MODEL_TO_TEMPLATE_MAPPER and
fetch the per-template instruction/response markers from
TEMPLATE_TO_RESPONSES_MAPPER. The frontend already force-disables
train_on_completions for vision+image and audio cases, so backend
just trusts the flag.
* wire in lora rslora, init lora weights, random_state
* loftq studio error message fix
* handle unknown optim and lr scheduler
* Merge pull request #6 from Manan17/update/peftkwargs
Update/peftkwargs
* feat(mlx): pass finetune_language/attention/mlp/vision flags to FastMLXModel
Studio's four UI checkboxes now actually flow through to MLX get_peft_model
(which was just updated in unsloth-zoo to honor them). Also drops the
incorrect train_projector wiring that tied projector LoRA to the
attn/mlp flags — those are language-side toggles, not projector toggles.
Co-Authored-By: Manan17 <shahmanan170602@gmail.com>
* feat(mlx,ux): auto-imply finetune_language_layers when user picks attn/mlp
UI guardrail. The four checkboxes (vision/language/attention/MLP) carry
"scope × module-type" semantics that aren't obvious — picking just
"Attention modules" + "MLP modules" without "Language layers" naturally
reads as "fine-tune attn/mlp" but our backend reads it as "fine-tune
attn/mlp modules in *no* tower" → empty target_modules → zero
trainable params → crash inside value_and_grad.
If user selected attn or mlp module types but no layer scope, default
to language scope. Power users can still explicitly choose
language=False, vision=True if they want vision-only fine-tuning of
attn/mlp.
Co-Authored-By: Manan17 <shahmanan170602@gmail.com>
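The guardrail above can be sketched as a small pure function. The name `resolve_layer_scope` and its signature are hypothetical; only the rule itself (module types picked with no layer scope defaults to language scope) comes from the commit message.

```python
from typing import Tuple


def resolve_layer_scope(
    vision: bool, language: bool, attention: bool, mlp: bool
) -> Tuple[bool, bool]:
    """Hypothetical helper returning the effective (vision, language) scope."""
    if (attention or mlp) and not (vision or language):
        # Module types chosen but no tower selected: default to language
        # so target_modules is not empty.
        language = True
    return vision, language
```

An explicit `vision=True, language=False` selection passes through unchanged, which preserves the vision-only case the message mentions for power users.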
* fix(mlx): wire top_k, repetition_penalty, and VLM top_p through to mlx-lm/mlx-vlm
Inference UI sliders for top_k and repetition_penalty had no effect on
MLX, and VLM top_p was also silently dropped. Plus a latent pre-existing
bug: mlx_vlm.generate_step expects temperature= (long form), but we
were passing temp= which silently fell into **kwargs — every VLM chat
was effectively greedy regardless of the temperature slider.
Text path (_generate_text):
- make_sampler now receives top_k in addition to temp/top_p
- make_logits_processors built and forwarded when repetition_penalty is
non-trivial (skip when 0.0/1.0 to avoid pointless overhead)
VLM path (_generate_vlm):
- Pass top_p, top_k, repetition_penalty as kwargs (mlx_vlm.stream_generate
forwards them to generate_step's sampler/logits_processor builders)
- Rename temp= → temperature= so it's actually consumed
Verified end-to-end with a smoke test on Qwen2.5-0.5B-Instruct (text) and
Qwen2.5-VL-3B-Instruct (VLM): each of {greedy, top_p=0.5, top_k=10,
rep_pen=1.5} now produces a distinct output, proving the parameters
reach the sampler.
Co-Authored-By: Manan17 <shahmanan170602@gmail.com>
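A rough sketch of the keyword wiring described above, written without any mlx imports so it runs anywhere. `build_sampling_kwargs` is a hypothetical helper; it illustrates only the two points from the message: the long-form `temperature` key, and skipping the repetition penalty when it is 0.0 or 1.0.

```python
from typing import Any, Dict


def build_sampling_kwargs(
    temperature: float, top_p: float, top_k: int, repetition_penalty: float
) -> Dict[str, Any]:
    """Hypothetical helper: collect sampling controls under the long-form keys."""
    kwargs: Dict[str, Any] = {
        "temperature": temperature,  # not "temp", which was silently ignored
        "top_p": top_p,
        "top_k": top_k,
    }
    if repetition_penalty not in (0.0, 1.0):
        # Only forward a penalty that actually changes the logits.
        kwargs["repetition_penalty"] = repetition_penalty
    return kwargs
```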
* feat(mlx): map format_type to MLX save_method, reuse local save dir for hub push
- export_merged_model: format_type="4-bit (FP4)" → save_method="merged_4bit"
(was hardcoded merged_16bit, ignoring the UI choice).
- Both export_merged_model and export_base_model now pass save_directory=
to push_to_hub_merged so it reuses the just-written local folder
instead of re-saving under a relative "username/model" directory.
Co-Authored-By: Manan17 <shahmanan170602@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* restore install
* fix(mlx): restore FastVisionModel as a distinct class
unsloth/__init__.py was assigning `FastVisionModel = FastLanguageModel`
right after defining `class FastVisionModel(FastLanguageModel)` with a
`for_training` static method. The alias erased the class binding, so
the documented `FastVisionModel.for_training(model)` call from upstream
Unsloth's VLM notebooks raised `AttributeError` on MLX.
Remove the offending alias. `FastVisionModel` is now a real subclass of
`FastLanguageModel` again — inherits `from_pretrained` /
`get_peft_model` / `for_inference`, exposes `for_training` as a no-op
pass-through (no-op because MLX doesn't have a train/eval mode flag;
the call exists purely for GPU/MLX notebook parity).
Verified end-to-end: Qwen3-VL-2B + LaTeX_OCR LoRA + vision LoRA via
FastVisionModel.from_pretrained → get_peft_model → for_training →
MLXTrainer.train() runs 10 steps cleanly (loss 1.10 → 0.12, no NaNs,
peak 5.89 GB).
Studio's path (FastLanguageModel.from_pretrained for any repo,
auto-detect VLM in the loader) is unaffected. Tier-1 review finding #8.
* Studio: harden MLX training and export, restore GPU init guards
Studio export
- Restore Tuple[bool, str, Optional[str]] contract on export_merged_model,
export_base_model, export_gguf, and export_lora_adapter, populating
output_path on successful local saves so routes/worker/CLI/frontend
details.output_path is non-empty again.
- Lift the GPU save_method assignment out of the local-save branch so
Hub-only merged exports (save_directory='', push_to_hub=True) no longer
hit UnboundLocalError on the push branch.
- For MLX merged and base hub-only export, stage to a tempfile.TemporaryDirectory
before push_to_hub_merged instead of passing save_directory=''.
- Source _IS_MLX from unsloth instead of recomputing the platform check
(single source of truth, also enforces mlx-package availability).
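The export contract above might look roughly like the following. This is a sketch under stated assumptions: `export_merged`, `save_fn`, and `push_fn` are hypothetical stand-ins for the real Studio export method and the model's save/push calls, and the message strings are invented for illustration.

```python
import tempfile
from typing import Callable, Optional, Tuple

ExportResult = Tuple[bool, str, Optional[str]]


def export_merged(
    save_directory: str,
    push_to_hub: bool,
    repo_id: Optional[str],
    token: Optional[str],
    format_type: str,
    save_fn: Callable[[str, str], None],
    push_fn: Callable[[str, str, str], None],
) -> ExportResult:
    """Hypothetical export wrapper that always returns a 3-tuple."""
    if push_to_hub and not (repo_id and token):
        # Early returns carry a None output_path so callers can always unpack.
        return False, "repo_id and token are required for Hub push", None

    # Assigned before either branch, so a Hub-only export never sees it unbound.
    save_method = "merged_4bit" if format_type == "4-bit (FP4)" else "merged_16bit"

    output_path: Optional[str] = None
    if save_directory:
        save_fn(save_directory, save_method)
        output_path = save_directory

    if push_to_hub:
        if save_directory:
            # Reuse the folder that was just written.
            push_fn(repo_id, save_directory, save_method)
        else:
            # Hub-only: stage in a temporary directory instead of passing ''.
            with tempfile.TemporaryDirectory() as staging:
                push_fn(repo_id, staging, save_method)

    return True, "Export complete", output_path
```

Injecting the save and push callables keeps the sketch testable without a model; the real methods call into the model object directly.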
Studio MLX training/inference
- Pass token=hf_token into FastMLXModel.from_pretrained for gated/private
models, matching the inference path.
- Strip hf_token and wandb_token from wandb.init(config=...) so secrets
do not leak into the W&B run config.
- Replace load_from_disk(local_datasets[0]) with the existing
UnslothTrainer._resolve_local_files / _loader_for_files helpers so
uploaded JSON/JSONL/CSV/Parquet files train through the normal datasets
loader (load_from_disk still used for HF save_to_disk directories).
- Make the dataset slice helper inclusive at the end and treat 0 as a real
index instead of "unset", matching the GPU and embedding paths.
- Add a status_message -> message alias inside _send so the existing parent
pump (training.py) renders MLX status updates instead of blanks.
- Forward min_p through generate_chat_response into _generate_text /
_generate_vlm and into make_sampler / vlm_kwargs so the sampling control
is no longer a no-op on MLX.
- Wrap unsloth_zoo.mlx_loader / mlx_trainer imports with a clearer
ImportError pointing users at install.sh for Apple Silicon.
- Exit the MLX stop-polling thread on EOFError/OSError instead of
busy-looping when the queue/pipe is permanently closed (one-line
why-safe rationale inline).
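Two of the items above, secret stripping and the inclusive slice, can be sketched as follows. Both helper names are hypothetical, and the exact slice semantics of the real helper are an assumption: here `None` means "unset", while 0 is an ordinary index.

```python
from typing import Any, Dict, List, Optional, Sequence

SECRET_KEYS = {"hf_token", "wandb_token"}


def strip_secrets(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the run config without credential fields."""
    return {key: value for key, value in config.items() if key not in SECRET_KEYS}


def slice_inclusive(
    rows: Sequence[Any], start: Optional[int], end: Optional[int]
) -> List[Any]:
    """Hypothetical slice helper: `end` is inclusive and 0 is a real index."""
    first = 0 if start is None else start
    last = len(rows) - 1 if end is None else end
    return list(rows[first : last + 1])
```

Testing against `None` rather than truthiness is what lets `end=0` select exactly the first row instead of being read as "no limit".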
Studio frontend
- ParamsSection subscribes to platform deviceType via the Zustand hook so
the gradient checkpointing dropdown re-renders after the async device
fetch completes.
Studio hardware
- get_gpu_utilization MLX branch now reads _read_apple_gpu_stats once and
derives VRAM totals from psutil, removing the second ioreg subprocess
per utilization poll.
Unsloth core
- Restore the os.geteuid() == 0 guard around the CUDA ldconfig recovery
that was lost when GPU initialization moved into _gpu_init.py, plus the
non-root manual-fix warning branch. Non-root CUDA users no longer shell
out to ldconfig at import time.
- Load dataprep/raw_text via importlib so the MLX import path no longer
pulls torch in through dataprep/__init__.py -> synthetic.py.
- FastVisionModel.from_pretrained overrides the inherited delegator only
to inject text_only=False; this is an extension, not a duplication, and
is needed so VLM checkpoint loads keep the vision tower.
- Wrap the MLX-branch unsloth_zoo import with a clearer ImportError.
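Loading a single module file without running its package's `__init__.py` can be done with the standard `importlib.util` machinery. The sketch below shows the general technique only; `load_module_from_file` is a hypothetical name and the actual call site in unsloth may differ.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_module_from_file(name: str, path: Path) -> ModuleType:
    """Load one .py file as a module, bypassing the parent package's __init__."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name} from {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so the module can be found during its own import.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module
```

Because the package `__init__.py` never runs, any heavy import it performs (torch, in the case described above) is not triggered.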
* Studio: regression tests for MLX training/export and GPU init ldconfig guard
tests/python/test_gpu_init_ldconfig_guard.py asserts the geteuid root
check still wraps the ldconfig recovery and the non-root branch warns
bnb users; AST + source-text inspection so the test runs without torch.
tests/studio/test_export_output_path_contract.py covers the
Tuple[bool, str, Optional[str]] return contract on every export method,
the output_path assignment after successful local save, the Hub-only
GPU save_method binding fix, the MLX hub-only TemporaryDirectory
staging, and the single-source `_IS_MLX` import from unsloth.
tests/studio/test_mlx_training_worker_behaviors.py covers token
forwarding to FastMLXModel.from_pretrained, wandb config secret
stripping, file-aware local dataset loading, status_message ->
message aliasing, inclusive slice semantics, EOFError/OSError stop
thread exit, and the friendly mlx_loader / mlx_trainer ImportError.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix(mlx): cap inference memory + release wired on unload + tame worker pre-pin
Three memory-hardening fixes for Studio's MLX path:
1. Inference applies the same Metal caps as the trainer.
load_model previously only called set_wired_limit(100% of recommended)
with no upper memory_limit, leaving large VLM checkpoints unbounded
during the loader allocation. Add _configure_memory_limits() that sets
memory_limit to 85% of recommended and wired_limit to min(recommended,
memory_limit) — matching MLXTrainer's defaults so behavior is the same
whether the user trains or just runs inference.
2. unload_model releases pinned memory back to the OS — but only when
the cache is empty. Without this, pinned wired bytes stayed allocated
to MLX after the model was gone, starving other apps. The release is
guarded on `not self.models` so unloading one of several cached
models doesn't un-pin weights still in use.
3. Worker pre-cap is conservative instead of aggressive.
The previous pre-pin set_wired_limit(100% of recommended) competed
with MLXTrainer's later more conservative cap. Replace with the same
85%-memory / min(rec, memory) pair that the trainer applies later
(idempotent re-apply). Bounds the model load + LoRA setup window
without over-pinning.
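The cap arithmetic above is simple enough to state as a pure function. `compute_memory_limits` is a hypothetical helper; applying the results to MLX is left to the caller and is not shown here.

```python
from typing import Tuple


def compute_memory_limits(recommended_bytes: int, percent: int = 85) -> Tuple[int, int]:
    """Return (memory_limit, wired_limit) from the recommended working-set size.

    memory_limit is a percentage of the recommended size; wired_limit is
    min(recommended, memory_limit), as described in the commit message.
    """
    memory_limit = recommended_bytes * percent // 100  # integer math, no float rounding
    wired_limit = min(recommended_bytes, memory_limit)
    return memory_limit, wired_limit
```

With any percentage at or below 100 the wired limit equals the memory limit; the `min` only matters if a caller raises the percentage above 100.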
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* tests/studio: regression tests for the _IS_MLX dispatch gate
Two gates drive every MLX-vs-CUDA dispatch decision in Studio:
1. unsloth._IS_MLX in unsloth/__init__.py — evaluated once at import
time, read by Studio worker code to choose the GPU vs MLX trainer
and inference paths. Defined as
Darwin AND arm64 AND find_spec("mlx") is not None.
2. utils.hardware.detect_hardware() — runtime probe with priority
CUDA > XPU > MLX > CPU. The MLX branch is reached only when both
CUDA and XPU are unavailable and the host is Apple Silicon and
mlx is importable.
Neither gate had a direct test. Adds tests/studio/test_is_mlx_dispatch_gate.py
with six tests:
test_is_mlx_gate_uses_three_required_predicates
AST-walks unsloth/__init__.py and asserts the _IS_MLX assignment
is a BoolOp(And) of platform.system()=="Darwin",
platform.machine()=="arm64", and find_spec("mlx") is not None.
Catches accidental rewrites that drop a predicate.
test_is_mlx_gate_true_on_apple_silicon_with_mlx_present
Spoofs platform to Darwin/arm64, injects a fake mlx module so
find_spec returns a real ModuleSpec, re-evaluates the gate
expression. Verifies it flips True under the exact conditions
Studio expects.
test_is_mlx_gate_false_when_mlx_missing
Spoofs Apple Silicon but with mlx absent. Verifies the gate stays
False (so a Mac without mlx installed does not pretend to have
MLX support).
test_is_mlx_gate_false_on_non_apple_silicon
Canary on the actual Linux+CUDA / AMD / Intel test host: the gate
must remain False regardless of whether mlx happens to be
importable. Protects existing GPU users from accidental MLX
hijack when MLX support evolves.
test_detect_hardware_picks_mlx_when_only_apple_silicon_available
Forces torch.cuda and torch.xpu off, spoofs Apple Silicon, injects
fake mlx and mlx.core. detect_hardware() must return DeviceType.MLX.
test_detect_hardware_picks_cuda_on_real_host
Canary: on a real CUDA host detect_hardware() must return
DeviceType.CUDA. Protects against the MLX branch shadowing CUDA
dispatch on NVIDIA / AMD ROCm hosts.
Uses the same monkeypatch.setitem(sys.modules, ...) fake-mlx pattern as
the existing test_mlx_inference_backend.py — no new test infrastructure,
no real mlx install required.
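For orientation, the two gates described in this commit can be sketched as below. The three predicates and the CUDA > XPU > MLX > CPU priority come from the message; the function names, the injectable parameters, and the string return values are hypothetical simplifications (the real code returns `DeviceType` members).

```python
import importlib.util
import platform
from typing import Callable, Optional


def is_mlx_host(
    system: Optional[str] = None,
    machine: Optional[str] = None,
    find_spec: Callable[[str], object] = importlib.util.find_spec,
) -> bool:
    """Darwin AND arm64 AND mlx importable; arguments allow spoofing in tests."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64" and find_spec("mlx") is not None


def pick_device(cuda: bool, xpu: bool, mlx: bool) -> str:
    """Priority order: CUDA > XPU > MLX > CPU."""
    if cuda:
        return "cuda"
    if xpu:
        return "xpu"
    if mlx:
        return "mlx"
    return "cpu"
```

The `and` chain short-circuits, so `find_spec` is never consulted on a non-Apple host.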
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add AGPL-3.0 SPDX header to Studio MLX regression tests
Four Studio MLX test files shipped without an SPDX-License-Identifier:
studio/backend/tests/test_mlx_training_worker_config.py
tests/studio/test_mlx_training_worker_behaviors.py
tests/studio/test_export_output_path_contract.py
tests/studio/test_is_mlx_dispatch_gate.py
They sit in or alongside studio/backend/, which is governed by
studio/LICENSE.AGPL-3.0, and exercise AGPL Studio code. Add the same
"# SPDX-License-Identifier: AGPL-3.0-only" header that's already on
test_mlx_inference_backend.py so the license declaration matches
the code under test rather than defaulting to the repo-root
Apache-2.0.
* Wrap MLX submodule imports with friendly install hint
The _IS_MLX block at the top of unsloth/__init__.py already catches the
missing-package case with a friendly install hint, but the follow-up
"from unsloth_zoo.mlx_trainer import ..." and "from unsloth_zoo.mlx_loader import ..."
lines run unguarded. An Apple Silicon user who has unsloth-zoo installed
but on an older version (e.g. the current PyPI release, before the MLX
modules ship) sees a raw ImportError on the submodule rather than the
hint that points at install.sh.
Wrap the two submodule imports in the same try/except shape so the
friendly install message fires whether the package is missing entirely
or just predates the MLX submodules. No-op once both packages release
together; smooths the transitional window where unsloth/main has merged
but unsloth-zoo on PyPI has not.
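The try/except shape described above might look like this minimal sketch. `import_with_hint` and the hint wording are hypothetical; the real code wraps specific `from ... import ...` statements rather than going through a helper.

```python
import importlib
from types import ModuleType


def import_with_hint(module_name: str, hint: str) -> ModuleType:
    """Import a module, re-raising ImportError with an install hint attached."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Covers both a missing package and a package that predates the submodule.
        raise ImportError(f"{hint} (original error: {exc})") from exc
```

Chaining with `from exc` keeps the original traceback available for debugging while the user-facing message leads with the hint.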
---------
Co-authored-by: DoubleMathew <mmathew23@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
|
||
|
|
7de1f4c513
|
Route CPU-only Linux x86_64 to ggml-org/llama.cpp prebuilts (#5302)
* Route CPU-only Linux x86_64 to ggml-org/llama.cpp prebuilts
setup.sh hard-coded _HELPER_RELEASE_REPO=unslothai/llama.cpp for every
non-Darwin host. unslothai/llama.cpp only publishes Linux CUDA bundles
(app-*-linux-x64-cuda*.tar.gz), so a CPU-only Linux host walked ~30
releases looking for a non-existent app-*-linux-x64-cpu asset, exited
the prebuilt planner with "no compatible Linux prebuilt asset was
found", and fell through to a source build. Free CI runners
(ubuntu-latest with no GPU) hit this on every install, and anyone
running Studio on a Linux laptop without an NVIDIA GPU paid the
~3 minute cmake+make cost on first install.
ggml-org publishes llama-<tag>-bin-ubuntu-x64.tar.gz on every release
and install_llama_prebuilt.py already knows how to fetch it: when
called with --published-repo ggml-org/llama.cpp, the Linux x86_64 +
not has_usable_nvidia branch in direct_upstream_release_plan picks up
that asset directly. The fix is purely on the routing side.
Tighten the gate so a Linux host routes to ggml-org only when it is
x86_64 and has no GPU detection tool installed (nvidia-smi, rocminfo,
amd-smi, hipconfig, hipinfo). Everything else stays on the current
path:
- macOS: already on ggml-org, unchanged
- Windows: already on ggml-org via setup.ps1, unchanged
- Linux CUDA: nvidia-smi present -> unslothai/llama.cpp, unchanged
- Linux ROCm: rocminfo / amd-smi / hipconfig / hipinfo present
-> unslothai/llama.cpp -> source build with HIP,
unchanged
- Linux Intel / Vulkan / SYCL: no NVIDIA / AMD tools, hits the new
ggml-org route, gets upstream CPU asset (same as
today's source-build CPU output, ~3 min faster)
- Linux arm64 / s390x: not x86_64 -> unslothai/llama.cpp ->
source build, unchanged
* Tighten routing comment in studio/setup.sh
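The routing gate is implemented in shell in studio/setup.sh; the Python sketch below only restates the decision table above for the macOS and Linux cases that script handles. `pick_release_repo` is a hypothetical name, and `which` is injectable so the logic can be exercised without the tools installed.

```python
import shutil
from typing import Callable, Optional

GPU_TOOLS = ("nvidia-smi", "rocminfo", "amd-smi", "hipconfig", "hipinfo")


def pick_release_repo(
    system: str,
    machine: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Choose the llama.cpp prebuilt source for a macOS or Linux host."""
    if system == "Darwin":
        return "ggml-org/llama.cpp"
    has_gpu_tool = any(which(tool) is not None for tool in GPU_TOOLS)
    if system == "Linux" and machine == "x86_64" and not has_gpu_tool:
        # CPU-only x86_64 Linux: upstream publishes a matching CPU asset.
        return "ggml-org/llama.cpp"
    # CUDA, ROCm, and non-x86_64 hosts stay on the existing path.
    return "unslothai/llama.cpp"
```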
|
||
|
|
7be10852cb
|
install: support STUDIO_HOME / UNSLOTH_STUDIO_HOME for custom install paths (#5190)
* install: support STUDIO_HOME / UNSLOTH_STUDIO_HOME for custom install paths Currently install.sh and install.ps1 hardcode all install paths off $HOME / $env:USERPROFILE with no env-var fallback. This blocks workspace-isolated installs (CI sandboxes, per-PR test environments, multi-tenant boxes) unless the entire HOME / USERPROFILE is faked, which also relocates ~/.gitconfig, ~/.ssh, and other unrelated state. Add an opt-in env-var override that does only what is needed. Resolution priority (highest first): 1. HOME / USERPROFILE explicitly redirected vs the password-database default. Detected via getent (Linux), dscl (macOS), or [Environment]::GetFolderPath (Windows). Best-effort: when the detection mechanism is unavailable the check is skipped and we fall through to step 2. 2. UNSLOTH_STUDIO_HOME, if set. 3. STUDIO_HOME, if set (alias for convenience; the variable name already matches the internal var install.sh sets). 4. Default: legacy $HOME/.unsloth/studio (or $USERPROFILE\.unsloth\studio on Windows). Identical to today's behavior when no env var is set. When an env var override fires: * DATA_DIR is nested inside ($STUDIO_HOME/share, or $StudioHome\share on Windows) so the runtime launcher and shortcuts find studio.conf in the same place install-time wrote it. * The unsloth CLI shim lands at $STUDIO_HOME/bin/unsloth (Unix) or $StudioHome\bin\unsloth.exe (Windows). On Windows the shim already lives under $StudioHome; the change only redirects DATA_DIR and skips the persistent registry PATH update. * Persistent shell PATH modifications are skipped (no .bashrc / .zshrc / .profile append on Unix; no Add-ToUserPath on Windows). Caller is expected to invoke via absolute path or add the bin dir to PATH explicitly. Avoids polluting the user's profile with a workspace-scoped path that may be deleted. The Unix launcher script is the only piece that must read DATA_DIR at runtime (it sources studio.conf from there). 
The hardcoded DATA_DIR inside the LAUNCHER_EOF heredoc is replaced with an @@DATA_DIR@@ placeholder substituted via sed at install time, using the same approach the script already uses for other install-time substitutions. Default path behavior is unchanged: when no env var is set and HOME is not redirected, install.sh / install.ps1 produce exactly the same file layout as today. Test scenarios verified locally on install.sh: * Default (no env vars) -> $HOME/.unsloth/studio (legacy) * HOME=/tmp/x -> /tmp/x/.unsloth/studio * UNSLOTH_STUDIO_HOME=/tmp/y -> /tmp/y as STUDIO_HOME root * STUDIO_HOME=/tmp/z (alias) -> /tmp/z as STUDIO_HOME root * HOME redirect + env var (HOME wins) -> install follows HOME * Unwritable override -> exits with clear ERROR message * install: priority change -- env vars now win over HOME redirect Flip the resolution order so explicit env vars take precedence over HOME / USERPROFILE redirection. New priority (highest first): 1. UNSLOTH_STUDIO_HOME, if set. 2. STUDIO_HOME, if set. 3. HOME / USERPROFILE explicitly redirected. 4. Default. Rationale: the env vars are explicit single-purpose signals (the user typed UNSLOTH_STUDIO_HOME=... specifically to redirect Studio). HOME redirection is broader and incidental -- the user may have redirected HOME for unrelated reasons (workspace tools, container builds) without wanting Studio to follow it. When both are set, the more specific signal should win. When only HOME is redirected (no env var), behavior is unchanged from the previous commit: install follows $HOME. * install: address review feedback (sed escape, downstream propagation, edge cases) Fixes from gemini-code-assist + chatgpt-codex-connector + reviewer.py 20-parallel run on the open PR. install.sh: * Escape sed replacement metacharacters before substituting @@DATA_DIR@@. Two-stage escape: ' -> '\'' for safe single-quote shell embedding, then \, &, | for sed replacement string + chosen delimiter. 
Heredoc switched to single-quoted DATA_DIR='@@DATA_DIR@@' so we only need single-quote escaping at runtime. Verified end-to-end with paths containing & and | (the sed delimiter). * Pass UNSLOTH_STUDIO_HOME into both setup.sh invocations (--local and PyPI paths) so the downstream install resolves the same Studio root install.sh picked. * macOS .app stub: replace hardcoded exec "$HOME/.local/share/unsloth/launch-studio.sh" with exec "$_css_data_dir/launch-studio.sh" so the .app launches the resolved launcher even in env-override mode. * Use mkdir -p -- and cd -- when validating the env override so paths starting with - cannot be misread as flags. install.ps1: * Drop .Guid from [guid]::NewGuid().Guid: the property does not exist; the probe filename was always identical and not unique. Default ToString() on System.Guid produces the canonical UUID string we want. * Guard LOCALAPPDATA before Join-Path to avoid aborting the installer in service / CI contexts where LOCALAPPDATA is unset (Join-Path under $ErrorActionPreference='Stop' would otherwise throw). Computed once into $defaultDataDir; both 'profile' and 'default' branches reuse it. * Set $env:UNSLOTH_STUDIO_HOME for the duration of the 'unsloth studio setup' subprocess so studio/setup.ps1 and unsloth_cli see the same install root install.ps1 picked. Restored in a finally block. studio/setup.sh: * Honor UNSLOTH_STUDIO_HOME / STUDIO_HOME (alias) when resolving STUDIO_HOME, VENV_DIR, VENV_T5_*_DIR. Falls back to the legacy $HOME/.unsloth/studio when no override is set. studio/setup.ps1: * Same change in PowerShell: honor $env:UNSLOTH_STUDIO_HOME / $env:STUDIO_HOME for $StudioHome / $VenvDir resolution. unsloth_cli/commands/studio.py: * Replace the module-level constant STUDIO_HOME = Path.home() / ".unsloth" / "studio" with a resolver that honors UNSLOTH_STUDIO_HOME / STUDIO_HOME before falling through to the legacy default. Same precedence the installers use. 
Verified locally: 6 install.sh scenarios still produce correct paths (default, HOME redirect, env var, alias, both, bad override). New sed-escape unit tests pass for paths containing & and |. Python resolver matches priority: UNSLOTH_STUDIO_HOME > STUDIO_HOME > default. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * install.sh: portable sed (no -i.bak) per gemini review feedback GNU sed -i.bak vs BSD/macOS sed -i.bak vs BusyBox sed have subtly different semantics. Use the POSIX-portable redirect-then-mv pattern instead. Functionally identical, runs everywhere. * studio: persist UNSLOTH_STUDIO_HOME so fresh shells find custom installs Without this, a custom-root install (UNSLOTH_STUDIO_HOME=/work/studio bash install.sh --local) only worked in the same shell that ran the installer. Closing the terminal and reopening lost the env var, the PATH was deliberately not persisted, and the Python CLI fell back to ~/.unsloth/studio. Result: 'Studio not set up' or quietly operating on a stale legacy install. Three persistence layers, all backwards-compatible (default installs emit zero changes): 1. Unix studio.conf install.sh now writes 'export UNSLOTH_STUDIO_HOME=...' next to UNSLOTH_EXE in studio.conf when in env-override mode. The launcher sources studio.conf at startup so the exec'd binary gets the var. Default installs do not write this line; studio.conf stays byte-identical to before. 2. Windows launch-studio.ps1 install.ps1 prepends '$env:UNSLOTH_STUDIO_HOME = ...' to the generated launcher when in env-override mode. Default installs produce the same launcher content as before. 3. Python sys.prefix inference storage_roots.studio_root() and unsloth_cli/commands/studio.py now infer the install root from sys.prefix when no env var is set (Path(sys.prefix).parent for unsloth_studio venvs). Catches direct invocations of <STUDIO_HOME>/bin/unsloth that bypass the launcher entirely. 
unsloth_cli/commands/studio.py also re-exports the resolved UNSLOTH_STUDIO_HOME via os.environ.setdefault so child processes (setup script, backend run.py) inherit it.
Backend storage roots (storage_roots.studio_root, cache_root) now respect the env var via the shared resolver. run.py PID file, transformers_version.py T5 venvs, and model_config.py vision-check venv all switch to studio_root() so custom installs are self-contained.
studio/setup.ps1: T5 sidecar venvs now resolve under $StudioHome (was $env:USERPROFILE\.unsloth\studio\.venv_t5_*).
studio/setup.sh + studio/setup.ps1: llama.cpp build dir nests under $STUDIO_HOME / $StudioHome when env-override is active, otherwise keeps the legacy ~/.unsloth/llama.cpp.
Verified locally:
  * studio.conf write block: env-override mode emits the export line; default mode does not (byte-identical to today).
  * PowerShell heredoc interpolation: correct output for both modes.
  * studio_root() resolver: default, UNSLOTH_STUDIO_HOME, STUDIO_HOME alias, and sys.prefix-based inference all return correct paths.
  * cache_root() now derives from studio_root().
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* install: tilde expansion + macOS .app stub safe-quoting
Two fixes from running a 25-scenario simulation sweep against install.sh across path edge cases (spaces, apostrophes, ampersands, pipes, backslashes, dollar signs, Unicode, trailing slash, relative paths).
1. UNSLOTH_STUDIO_HOME=~/foo was landing as literal '~/foo' (env vars are not subject to tilde expansion). Added a POSIX-portable case block in install.sh, install.ps1, studio/setup.sh, studio/setup.ps1 that expands a leading ~ or ~/ to $HOME / $env:USERPROFILE. The prefix-removal pattern is single-quoted ('${var#'~/'}') so the shell does not tilde-expand the pattern back to $HOME/ before matching -- a subtle dash/bash gotcha.
2. macOS .app stub used an unquoted heredoc ('<< STUB_EOF'), so any $VAR / backtick / etc in the path would expand at .app launch time. Switched to single-quoted heredoc ('<< 'STUB_EOF'') with a placeholder + sed substitution + single-quoted shell embedding, matching the @@DATA_DIR@@ pattern already used for launch-studio.sh.
Verified: 25/25 simulation scenarios pass on Linux dash + bash, including paths with $VAR, &, |, \\, ', spaces, and Unicode. End-to-end install in env-mode + fresh-shell launcher invocation confirmed: studio binds to /api/health from a clean env, and sys.prefix-based inference correctly returns the workspace root.
* install: stop accidentally treating default installs as env-override
Reviewer.py 20-runs cycle 1 found a unanimous P1 regression: a default 'unsloth studio update' relocates llama.cpp from ~/.unsloth/llama.cpp to ~/.unsloth/studio/llama.cpp, because the CLI was re-exporting UNSLOTH_STUDIO_HOME unconditionally and install.sh / install.ps1 were passing it into setup.{sh,ps1} unconditionally. The setup scripts treated the var's mere presence as "env-override mode" and relocated the llama.cpp build dir away from the legacy path, breaking the runtime backend's _find_llama_server_binary lookup on default installs.
Fixes:
  * unsloth_cli/commands/studio.py: _resolve_studio_home now returns (path, is_custom). Re-export only when is_custom -- a real env override or a sys.prefix inference that resolves to a non-legacy path. Default installs leave UNSLOTH_STUDIO_HOME unset.
  * install.sh: gate UNSLOTH_STUDIO_HOME on $_STUDIO_HOME_REDIRECT == env before calling setup.sh. Use 'env $VARS bash setup.sh' so the var is set only for the subprocess, never leaked.
  * install.ps1: gate $env:UNSLOTH_STUDIO_HOME on $StudioRedirectMode -eq 'env' before invoking 'unsloth studio setup'. Restore prior value in finally block (unset if it wasn't set).
  * studio/setup.sh + setup.ps1: decide llama.cpp install root from the resolved $STUDIO_HOME (not from env-var presence). If the resolved path equals the legacy default ($HOME/.unsloth/studio), fall back to ~/.unsloth/llama.cpp. This makes setup robust against a stale UNSLOTH_STUDIO_HOME inherited from a parent process that happens to point at the legacy default.
  * studio/backend/core/inference/llama_cpp.py:
    - _find_llama_server_binary() now searches studio_root() / llama.cpp AND the legacy ~/.unsloth/llama.cpp (de-duped). Custom-root installs become discoverable; default installs unaffected.
    - kill_orphaned_servers ownership allowlist also includes studio_root() / llama.cpp so custom-root processes are cleanable.
Verified locally:
  * 25/25 sim scenarios still pass (path edge cases unchanged).
  * setup.sh unit test: default-mode lands UNSLOTH_HOME at $HOME/.unsloth; env-mode lands at $STUDIO_HOME.
  * Python CLI unit test: default-mode returns is_custom=False and does NOT setdefault UNSLOTH_STUDIO_HOME; env-mode sets is_custom=True.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* install: || exit 1 on STUDIO_HOME subshell (dash set -e gap)
Gemini review feedback: in dash, set -e does not trigger on subshell failures inside variable assignments. If 'cd -- "$_override" && pwd' fails, STUDIO_HOME stays empty and DATA_DIR collapses to /share. Add explicit '|| exit 1' on both install.sh:187 and setup.sh:413.
* install.sh: argv-safe setup invocation for paths with spaces
Cycle 2 reviewer.py 20-runs found a unanimous P1: passing the env-var through 'env $_STUDIO_ENV_FOR_SETUP' word-splits on whitespace, so a custom root like '/tmp/Unsloth Studio' becomes 'UNSLOTH_STUDIO_HOME= /tmp/Unsloth' followed by env trying to exec 'Studio'. Replaced with a tiny helper that prepends the env-var directly to the argv (no string-form intermediary), so spaces are preserved as a single argument.
Default-mode invocation skips the env-var entirely.
Verified: 'UNSLOTH_STUDIO_HOME=/tmp/test space/studio' now reaches setup.sh as a single value.
* studio: tighten sys.prefix inference + Tauri env handling + llama.cpp env
Cycle 3 reviewer.py findings (3 P1s converging):
  * sys.prefix inference too broad: a developer venv named 'unsloth_studio' was being treated as a custom Studio root. Narrow with an installer-sentinel check (presence of share/studio.conf or bin/unsloth shim inside the parent dir) in both unsloth_cli/commands/studio.py and studio/backend/utils/paths/storage_roots.py.
  * Tauri studio/src-tauri/src/process.rs::find_unsloth_binary() hardcoded ~/.unsloth/studio. Honor UNSLOTH_STUDIO_HOME / STUDIO_HOME (in that priority order) before falling back to legacy.
  * unsloth-zoo's GGUF export binds LLAMA_CPP_DEFAULT_DIR at import time from UNSLOTH_LLAMA_CPP_PATH. For env-override installs, persist UNSLOTH_LLAMA_CPP_PATH alongside UNSLOTH_STUDIO_HOME in studio.conf (Unix), in the generated PowerShell launcher (Windows), and via os.environ.setdefault in the Python CLI when running on a custom root, so GGUF export uses the custom-root llama.cpp build instead of the legacy ~/.unsloth/llama.cpp.
Default behaviour unchanged: no env vars are written to studio.conf in default mode, no LLAMA_CPP_PATH is set, and the dev-venv inference falls through to legacy when no installer sentinels are present.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* studio: desktop_auth env-aware + legacy-root llama.cpp consistency
- desktop_auth.rs: honor UNSLOTH_STUDIO_HOME / STUDIO_HOME for the .desktop_secret path so Tauri desktop login works against custom-root installs instead of always reading ~/.unsloth/studio/auth/.
- install.sh / install.ps1 / unsloth_cli/commands/studio.py: when an env override resolves to the legacy default ($HOME/.unsloth/studio), set UNSLOTH_LLAMA_CPP_PATH to ~/.unsloth/llama.cpp (matching setup.sh / setup.ps1's legacy-equality branch). Previously the persisted value pointed at $STUDIO_HOME/llama.cpp, which was a non-existent location and broke unsloth-zoo's import-time GGUF binding for that edge case.
* studio: tauri studio_root helper + marker-file persistence + ~ expansion
Address cycle-5 reviewer findings:
- Add studio/src-tauri/src/studio_root.rs: shared resolver with UNSLOTH_STUDIO_HOME / STUDIO_HOME (priority order), tilde expansion (~, ~/..., ~\...), installer-written marker fallback, then ~/.unsloth/studio. 5 unit tests cover the expansion paths.
- Tauri lookups now go through the shared resolver:
  - process.rs::find_unsloth_binary
  - desktop_auth.rs::desktop_secret_path
  - main.rs::setup_logging (tauri.log under custom root)
  - commands.rs::open_logs_dir (opens custom root dir)
  - install.rs work_dir uses parent of resolved root (avoids creating a stray ~/.unsloth on a custom-root install)
- install.sh / install.ps1 (env-mode only): write ~/.unsloth/studio-home marker so the desktop app launched from Finder/Start Menu (no shell env inheritance) still resolves the custom root.
- install.sh / install.ps1 non-interactive completion: when StudioRedirectMode=env, print the absolute custom-root shim path since the persistent rc/registry PATH update is intentionally skipped in env-override mode.
- unsloth_cli/commands/studio.py: replace setdefault() with truthy-check so a blank UNSLOTH_STUDIO_HOME / UNSLOTH_LLAMA_CPP_PATH in the parent env doesn't suppress the inferred custom root.
40/40 cargo test --bins pass.
* studio: validate marker file + write in --tauri mode + propagate to subprocess
Cycle-6 reviewer follow-ups:
- studio_root.rs marker resolver now validates the persisted path before using it.
  A stale ~/.unsloth/studio-home pointing at a deleted/moved workspace is ignored (resolution falls back to the legacy default rather than hijacking it). Validation accepts share/studio.conf sentinel or bin/unsloth shim. Trailing newline strip uses trim_end_matches(['\n','\r']) so paths whose content legitimately has leading/trailing spaces survive.
- install.sh / install.ps1: marker write moved out of the launcher generation path so it runs before the Tauri-mode early exit. Both shell-launcher and Tauri-installed env-mode roots now persist the marker. Removed the duplicate marker write that was previously inside install.ps1's $studioHomeExport block.
- studio/src-tauri/src/install.rs: pass UNSLOTH_STUDIO_HOME to the installer subprocess (when not already in scope) so app-initiated repair / update flows reach the same root the running app uses.
cargo test --bins -- --test-threads=1: 44/44 pass (4 new tests for marker validation: sentinel accepted, bin shim accepted, empty dir rejected, missing path rejected).
* studio: fix Tauri legacy-fallback regression + stale marker cleanup
Cycle-7 reviewer follow-ups (regression I introduced in cycle 6):
- studio_root.rs: add StudioRootSource enum + resolve_studio_root_with_source(). Lets callers distinguish a real custom override (Env / Marker) from the legacy fallback (Default).
- studio/src-tauri/src/install.rs: only forward UNSLOTH_STUDIO_HOME to the installer subprocess when the resolution source is Env or Marker. The Default fallback must NOT be passed -- install.sh / install.ps1 treat any non-empty UNSLOTH_STUDIO_HOME as env-override mode and would relocate DATA_DIR to $STUDIO_HOME/share and _LOCAL_BIN to $STUDIO_HOME/bin (regressing default Tauri repair / update flows from the legacy ~/.local/share/unsloth and ~/.local/bin).
- install.sh / install.ps1: clear stale marker on default / HOME-redirect installs.
  A user who first installed with UNSLOTH_STUDIO_HOME=/work/studio then later reinstalls without env vars no longer has the desktop app hijacked by ~/.unsloth/studio-home pointing at the old custom root.
- install.sh / install.ps1: when env mode wins over a redirected HOME / USERPROFILE, write the marker into the OS-reported real profile home (getent / dscl on Unix; [Environment]::GetFolderPath on Windows) so a later desktop launch from the user's normal session still finds it. Falls back to the current HOME / USERPROFILE.
cargo test --bins -- --test-threads=1: 45/45 pass (1 new for the source enum invariants).
* install: scrub stale marker from real-home on HOME-redirect cleanup
Cycle-8 reviewer follow-up: the previous cleanup branch only removed \$HOME/.unsloth/studio-home, leaving a stale marker in the real password-database home after a prior env-mode install. A later default install with redirected HOME / USERPROFILE would still see the desktop app resolving the old custom root.
- install.sh: compute the real password-database home (via getent / dscl) unconditionally, and scrub markers from BOTH \$HOME and the real-home in the default / HOME-redirect cleanup branch.
- install.ps1: build a profile-candidate list (current USERPROFILE + OS-reported real profile) and remove markers from EVERY candidate in the default / profile-redirect cleanup branch.
bash -n + cleanup smoke verified.
* revert: drop Tauri env-var support + marker file mechanism
Keep this PR scoped to shell installer + Python backend env-var support. Tauri desktop integration with custom Studio roots is deferred to a separate, focused PR.
Reverts to pre-PR state:
- studio/src-tauri/src/process.rs (find_unsloth_binary)
- studio/src-tauri/src/desktop_auth.rs (auth_secret_path)
- studio/src-tauri/src/main.rs (setup_logging tauri.log path)
- studio/src-tauri/src/commands.rs (open_logs_dir)
- studio/src-tauri/src/install.rs (work_dir + subprocess env)
- studio/src-tauri/src/studio_root.rs DELETED
Removes from install.sh / install.ps1:
- ~/.unsloth/studio-home marker write/read/cleanup
- HOME-redirect-aware marker location logic
What this PR keeps (the original scope):
- install.sh / install.ps1: UNSLOTH_STUDIO_HOME / STUDIO_HOME env-var resolver with HOME-redirect detection, tilde expansion, legacy fallback. Default installs are byte-identical to pre-PR.
- studio/setup.sh / studio/setup.ps1: legacy-equality llama.cpp path.
- studio.conf / launcher persists UNSLOTH_STUDIO_HOME + UNSLOTH_LLAMA_CPP_PATH for fresh shells (env-mode only).
- unsloth_cli/commands/studio.py: env > sys.prefix sentinel > legacy resolver, conditional re-export.
- studio/backend/utils/paths/storage_roots.py: same resolver.
- Backend modules use storage_roots (run.py, model_config.py, transformers_version.py, llama_cpp.py).
cargo test --bins -- --test-threads=1: 34/34 pass (pre-PR baseline).
bash -n install.sh: clean.
* install: cycle-10 fixes (default launcher, --tauri guard, env-mode shortcuts, win PATH)
- install.sh launcher: default and HOME-redirect installs keep the legacy DATA_DIR=\"\$HOME/.local/share/unsloth\" runtime form so a later shell with a different \$HOME still resolves DATA_DIR. Only env-mode bakes the resolved absolute path. Restores byte-identical default behavior.
- install.sh / install.ps1: fail fast when --tauri is combined with UNSLOTH_STUDIO_HOME / STUDIO_HOME. The desktop app still resolves the legacy ~/.unsloth/studio root, so a custom-root --tauri install would yield a desktop app that cannot find its binary or auth secret. Print the right alternative.
- install.sh / install.ps1: skip persistent desktop / Start-Menu shortcuts in env-override mode. Workspace-scoped installs would otherwise leave launchers pointing at a path the user may delete. Default and HOME/profile-redirect installs keep the shortcut.
- install.ps1: re-prepend env-override \$ShimDir AFTER Refresh-SessionPath. Refresh rebuilds PATH as Machine > User > current \$env:Path, so a previously-installed legacy User PATH entry would otherwise win precedence over the current-session env-override shim.
bash -n install.sh, pwsh parser install.ps1 + setup.ps1: clean.
cargo test --bins -- --test-threads=1: 34/34 (Tauri unchanged).
* install: cycle-11 fixes (env-mode launcher writes, --tauri legacy passthrough, run.py llama path)
- install.sh / install.ps1: env-mode no longer skips the entire create_studio_shortcuts / New-StudioShortcuts function. Move the early-return INSIDE those functions, just before the persistent desktop / Start-Menu shortcut creation. The runtime launcher (launch-studio.sh / launch-studio.ps1), studio.conf with UNSLOTH_STUDIO_HOME / UNSLOTH_LLAMA_CPP_PATH exports, and the icon ARE always written so env-mode shims can resolve via fresh shells.
- install.sh / install.ps1: --tauri guard passes through when the override resolves to the legacy default ($HOME/.unsloth/studio / %USERPROFILE%\.unsloth\studio). The desktop app already uses that path, so explicit-equality is a supported edge case (matches the llama.cpp legacy-equality branch).
- studio/backend/run.py: when launched directly (bypassing the unsloth CLI), set UNSLOTH_STUDIO_HOME and UNSLOTH_LLAMA_CPP_PATH before the rest of import chain runs so unsloth-zoo's import-time LLAMA_CPP_DEFAULT_DIR binding picks up the custom-root build. Only set when STUDIO_ROOT is a real custom override; legacy default installs leave them unset.
bash -n install.sh, pwsh parser install.ps1: clean.
python ast parse studio/backend/run.py: clean.
cargo test --bins -- --test-threads=1: 34/34 pass (Tauri unchanged).
* install: cycle-12 fixes (--tauri trailing slash + main.py uvicorn env)
- install.sh / install.ps1 --tauri legacy passthrough: strip trailing separators before comparing the override to the legacy default. Previously UNSLOTH_STUDIO_HOME=\"\$HOME/.unsloth/studio/\" (with trailing slash) was rejected even though it resolves to the supported legacy root.
- studio/backend/main.py: when launched directly via \`uvicorn main:app\` from a custom-root venv (bypassing both unsloth_cli and run.py), export UNSLOTH_STUDIO_HOME and UNSLOTH_LLAMA_CPP_PATH before any unsloth-zoo import so its import-time LLAMA_CPP_DEFAULT_DIR binding picks up the custom-root build. Only sets when STUDIO_ROOT is a real custom override.
bash -n install.sh, pwsh parser install.ps1, python ast main.py: clean.
Smoke probe: UNSLOTH_STUDIO_HOME=\$HOME/.unsloth/studio/ install.sh --tauri no longer exits with the unsupported-custom-root error.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* install.ps1: skip CWD-relative venv migration in env-override mode
The legacy ~/unsloth_studio venv migration path on Windows reads %USERPROFILE%\unsloth_studio\Scripts\python.exe (a fixed home-relative path). Under env-override mode this would Move-Item the user's pre-existing default-install venv into $StudioHome\unsloth_studio, breaking the default install and contaminating the workspace root.
Gate the migration on $StudioRedirectMode -ne 'env' so workspace-scoped installs leave the user's default-install venv untouched.
No Linux equivalent: install.sh migrates from \$STUDIO_HOME/.venv which is already env-mode-aware (points at the workspace root, not \$HOME).
* install: cycle-14 fixes (Tauri env scrub + setup.ps1 missing-root error)
Tauri does not honor UNSLOTH_STUDIO_HOME / STUDIO_HOME / UNSLOTH_LLAMA_CPP_PATH yet -- the desktop app's Rust paths use the legacy ~/.unsloth/studio root.
If the user's shell has these env vars set, spawned Python subprocesses would diverge from the Rust paths (custom-root Python <-> legacy-root Rust). Scrub the three env vars at all Tauri subprocess spawn sites:
- process.rs: backend launch
- desktop_auth.rs: provision-desktop-auth subprocess
- install.rs: install.sh / install.ps1 invoked from the desktop app (also prevents the --tauri guard from rejecting an inherited override).
setup.ps1: when UNSLOTH_STUDIO_HOME points at a non-existent directory, 'Resolve-Path -LiteralPath' threw a confusing PSObject error under $ErrorActionPreference = "Stop". Test-Path the override first and emit a friendly "run install.ps1 to create the install root" message instead.
* install: cycle-15 fixes (preserve UNSLOTH_LLAMA_CPP_PATH + add update.rs scrub)
UNSLOTH_LLAMA_CPP_PATH is a pre-existing custom-llama.cpp-directory override the Python backend (studio/backend/core/inference/llama_cpp.py) and unsloth-zoo intentionally support. It is unrelated to the Studio install root. Cycle 14 over-scrubbed it from the Tauri spawn sites, regressing desktop GGUF/llama.cpp workflows for users who set it in their shell.
- process.rs / desktop_auth.rs / install.rs: stop scrubbing UNSLOTH_LLAMA_CPP_PATH; only scrub UNSLOTH_STUDIO_HOME and STUDIO_HOME.
- update.rs: missed Tauri spawn site -- add the same UNSLOTH_STUDIO_HOME / STUDIO_HOME scrub so 'unsloth studio update' from the desktop app updates the legacy-root install Tauri actually manages.
Verified: cargo test --bins -- --test-threads=1 -> 34/34 pass.
* install.sh: document apostrophe-escape derivation inline
The shell quoting at install.sh:642 / 659 / 679 / 680 / 823 has been flagged as broken across multiple review cycles, but every end-to-end verification (DATA_DIR=\"a b's&c|d\$e\" -> generated launcher -> source -> recovered exact input) passes. The proposed "8 backslash" fix would double the escape and actually break what currently works.
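The round trip described above (value, single-quoted form, recovered value) can be modelled in a few lines of Python. This is only a sketch: `sh_single_quote` is a hypothetical helper, not code from install.sh, and `shlex.split` stands in for a POSIX shell's word parsing rather than running a real shell.

```python
import shlex


def sh_single_quote(value: str) -> str:
    """Wrap `value` in single quotes for a POSIX shell.

    Each embedded apostrophe becomes '\\'' : close the quoted run,
    emit a backslash-escaped apostrophe, reopen the quoted run.
    """
    return "'" + value.replace("'", "'\\''") + "'"


# Same shape of test value as the one quoted in the commit message.
sample = "a b's&c|d$e"
quoted = sh_single_quote(sample)
print(quoted)

# shlex.split follows POSIX quoting rules, so it should hand back the
# original string as a single word.
recovered = shlex.split(quoted)
print(recovered == [sample])
```

Only apostrophes need rewriting here because everything else, including `&`, `|`, `$`, and spaces, is literal inside single quotes.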
Strengthen the inline comments to spell out the derivation:
- shell pattern \"s/'/'\\\\''/g\" passes \"s/'/'\\''/g\" to sed (\\\\ -> \\)
- sed replacement '\\'' yields close-quote / escaped-quote / open-quote
- stage 2 (\\, &, |) only needed where the value is then sed-replaced into a launcher template via s|@@DATA_DIR@@|VALUE|g studio.conf is written via printf, not sed, so it only needs stage 1.
No behavior change, only inline doc to head off future false positives.
* install/setup .ps1: use -LiteralPath for $StudioHome-derived paths
Pre-PR, $StudioHome was hardcoded to %USERPROFILE%\.unsloth\studio -- no wildcard characters possible. The PR introduces UNSLOTH_STUDIO_HOME / STUDIO_HOME, so $StudioHome (and every path derived from it: $VenvDir, $VenvPyExe, $UnslothExe, $UnslothHome, $LlamaCppDir, $VenvT5_*, etc.) can now contain bracket characters that PowerShell would interpret as wildcards.
Reproducer (from cycle 17 review 20):
  pwsh> Test-Path 'studio[abc]/Scripts/python.exe'
  False
  pwsh> Test-Path -LiteralPath 'studio[abc]/Scripts/python.exe'
  True
Switch the relevant Test-Path / Remove-Item / New-Item / Move-Item calls in install.ps1 and studio/setup.ps1 to -LiteralPath. Sites where the path is fixed (the shim under %LOCALAPPDATA%\Microsoft\WindowsApps, $RepoRoot from -PSCommandPath) keep the wildcard-aware form.
* install/setup .ps1: fix New-Item -LiteralPath regression from cycle 17
Cycle 17 added -LiteralPath to all $StudioHome-derived path operations, but New-Item has no -LiteralPath parameter (verified pwsh 7.6 syntax: "New-Item [-Path] <string[]> [-ItemType <string>] ..."). Every directory-creation site would throw "A parameter cannot be found that matches parameter name 'LiteralPath'" at runtime, blocking T5 sidecar setup, llama.cpp parent creation, and StudioHome creation.
Likewise, "Split-Path -LiteralPath $X -Parent" cannot mix LiteralPath with -Parent (separate parameter sets). The default LiteralPath mode already returns the parent.
Switch to [System.IO.Directory]::CreateDirectory($X), which natively takes a literal path, and drop the trailing -Parent on Split-Path.
Verified end-to-end on a bracketed path "/tmp/...[abc]":
- CreateDirectory: created
- Test-Path -LiteralPath: detects
- nested CreateDirectory(Split-Path -LiteralPath ...): works
* install/setup .ps1: extend -LiteralPath sweep to remaining \$StudioHome paths
Cycle 17/18 missed several wildcard-aware operations on user-controlled \$StudioHome-derived paths. Reviewers identified remaining sites:
install.ps1:
- \$UnslothExePath (Test-Path / Resolve-Path) at the shortcut creator
- \$VenvDir (Get-ChildItem) at the no-torch-runtime resolver
- \$ShimDir (New-Item Directory -- replaced with .NET CreateDirectory)
- \$ShimExe (Test-Path / Remove-Item / re-prepend guards) -- the shim lives at \$StudioHome\\bin\\unsloth.exe in env-override mode, so it inherits bracket sensitivity from \$StudioHome.
- \$UnslothExe (Copy-Item fallback) when HardLink fails.
studio/setup.ps1:
- \$LlamaServerBin (Test-Path) at the prebuilt-bundle / source-build validation gates (3 sites). \$LlamaServerBin lives under \$BuildDir under \$LlamaCppDir under \$UnslothHome under \$StudioHome.
New-Item HardLink keeps -Path because creating a non-existent target with brackets succeeds (verified via direct pwsh smoke test).
* install: cycle-20 fixes (more setup.ps1 -LiteralPath + shell-quote launch hints)
setup.ps1: extend -LiteralPath sweep to remaining \$BuildDir-derived paths that the cycle-19 commit missed:
- \$CmakeCacheFile (Test-Path + Select-String -Path)
- \$buildTmp (10 Test-Path / Remove-Item sites in source-build cleanup)
- \$QuantizeBin (Test-Path)
- \$altBin (Test-Path)
These all live under \$BuildDir -> \$LlamaCppDir -> \$UnslothHome -> \$StudioHome, which is now user-controlled via UNSLOTH_STUDIO_HOME. Bracket characters in the override would silently skip rebuild detection or leave stale build artifacts.
install.sh: shell-quote the launch-instruction substep lines for env-override mode. UNSLOTH_STUDIO_HOME values containing spaces or apostrophes (e.g. "/tmp/O'Brien Studio") would print copy-paste-unsafe commands -- the install succeeded but the printed launch instructions split at the space. Now wraps with the canonical '\\''-style escape so the printed lines parse with bash -n.
Verified end-to-end:
- printed shim line: '/tmp/O'\''Brien Studio/bin/unsloth' studio ...
- bash -n on the printed line passes.
* install.ps1: -LiteralPath for macOS-stub-launcher \$appDir-derived paths
The shortcut/launcher generator at install.ps1:418-693 writes the stub launcher, .vbs, and icon under \$appDir = \$StudioDataDir, which in env-override mode is \$StudioHome\share. Cycle 17/19/20 missed the following wildcard-aware ops on these paths:
- Test-Path \$appDir (with New-Item Directory swap to .NET CreateDirectory)
- Set-Content -Path \$launcherVbs (for the WSH .vbs stub)
- Test-Path / Copy-Item \$bundledIcon (bundled icon copy)
- Test-Path / Remove-Item \$iconPath (icon header validation)
In env-override mode \$StudioHome can contain bracket characters; without -LiteralPath the .vbs write fails outright and the icon validation can either skip a present icon or fail to delete a malformed one. (The COM shortcut creation downstream returns early in env-override mode, so its path values don't need this treatment.)
* install: don't override pre-existing UNSLOTH_LLAMA_CPP_PATH in launchers
Cycle 14/15 established UNSLOTH_LLAMA_CPP_PATH as a pre-existing custom-llama.cpp-directory override the Python backend and unsloth-zoo intentionally support, independent of the Studio install root. The launchers (studio.conf sourced by Unix launch-studio.sh, and the PowerShell launch-studio.ps1) were unconditionally re-exporting it, which silently overrides a user's pre-existing value when they invoke the launcher from a shell where UNSLOTH_LLAMA_CPP_PATH is already set.
Make the assignment conditional in both launchers:
install.sh studio.conf:
  if [ -z "\${UNSLOTH_LLAMA_CPP_PATH:-}" ]; then
    export UNSLOTH_LLAMA_CPP_PATH='...'
  fi
install.ps1 launch-studio.ps1:
  if (-not \$env:UNSLOTH_LLAMA_CPP_PATH) {
    \$env:UNSLOTH_LLAMA_CPP_PATH = '...'
  }
UNSLOTH_STUDIO_HOME stays unconditional: the launcher is bound to a specific install, so its STUDIO_HOME must always match that install.
* install.sh: harden --tauri legacy resolver against CDPATH and symlinks
Reviewer cycle 23 (inst 19) noted that the bare \`cd -- ... && pwd\` form in the --tauri legacy comparison can echo a CDPATH-prefixed path when the user has CDPATH set in their environment, contaminating the resolved absolute path used in the legacy-equality check.
Switch to \`CDPATH= cd -P -- ... && pwd -P\` so:
- CDPATH= clears the cd-prefix-echo behavior
- -P / pwd -P resolves any symlinks to a canonical path
No behavior change for users without CDPATH set; correctness fix for users who have it set in their shell.
* install + llama_cpp backend: cycle-24 hardening
Three real findings from cycle 24 reviewers:
1. install.sh:231 + studio/setup.sh:413 -- main \$STUDIO_HOME resolvers used the same bare \`cd -- ... && pwd\` form that cycle 23 only fixed for the --tauri guard. Switch both to: \$(CDPATH= cd -P -- "\$override" && pwd -P) so relative custom-root values don't get CDPATH-prefixed or have the cd-on-CDPATH stdout newline contaminate the captured value.
2. install.sh --tauri legacy root used logical \$HOME/.unsloth/studio while the override side was canonicalized via pwd -P. A symlinked \$HOME (e.g. /home/alice -> /u/alice) made the comparison fail even when both sides pointed at the same directory. Canonicalize the legacy side too when the dir exists.
3. studio/backend/core/inference/llama_cpp.py:_find_llama_server_binary searched \$STUDIO_HOME/llama.cpp first then ~/.unsloth/llama.cpp in default-mode installs.
   setup.sh / setup.ps1 only install llama.cpp under \$STUDIO_HOME/llama.cpp in env-override mode; in default mode it always lives at ~/.unsloth/llama.cpp. The post-PR search would pick up a stale partial install at ~/.unsloth/studio/llama.cpp over the real legacy binary. Mirror setup's legacy-equality check: when studio_root() resolves equal to ~/.unsloth/studio, search ONLY the legacy ~/.unsloth/llama.cpp. Otherwise (env-override custom root), search custom first, legacy fallback.
* install + setup: canonicalize legacy-equality comparison sites
Cycle 24 made \$STUDIO_HOME canonical via 'CDPATH= cd -P -- ... && pwd -P', but the legacy-equality comparison sites still used the bare logical "\$HOME/.unsloth/studio" string. With a symlinked \$HOME (e.g. /home/alice -> /u/alice), the comparison fails even when both sides point at the same dir, and llama.cpp ends up under a custom-root path the Python backend's legacy comparison cannot find.
Reviewer cycle 25 inst 2 reproduced this with HOME=/tmp/link -> /tmp/real and UNSLOTH_STUDIO_HOME=\$HOME/.unsloth/studio: setup.sh resolves UNSLOTH_HOME to /tmp/real/.unsloth/studio while the backend search resolves both physically equal and looks at /tmp/link/.unsloth/llama.cpp.
Canonicalize the legacy side at all four sites:
- install.sh:695 (create_studio_shortcuts llama.cpp path)
- studio/setup.sh:577 (UNSLOTH_HOME selection)
- install.ps1:462 (launcher UNSLOTH_LLAMA_CPP_PATH path)
- studio/setup.ps1:1829 (UnslothHome selection)
Apply CDPATH= cd -P -- ... && pwd -P (Unix) or Resolve-Path -LiteralPath (Windows) when the legacy dir exists. unsloth_cli/commands/studio.py already does this via Path.resolve().
* llama_cpp: gate _kill_orphaned_servers studio-root allowlist on env-override
Cycle 24 fixed _find_llama_server_binary to only search \$STUDIO_HOME/llama.cpp when STUDIO_HOME is a real env override (not the legacy default), but the symmetric _kill_orphaned_servers allowlist still appended _sr() / "llama.cpp" unconditionally.
In default mode _sr() resolves to ~/.unsloth/studio, so ~/.unsloth/studio/llama.cpp would be treated as a Studio-owned install root for the orphan-kill scan even though the default installer does not own that path. A llama-server process running there from a different tool or a stale partial install would be killed.
Apply the same legacy-equality check used in _find_llama_server_binary and the install/setup scripts: only add _sr()/"llama.cpp" to the allowlist when STUDIO_HOME != legacy default.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* setup.sh + setup.ps1: canonicalize both sides of legacy-equality check
Proactive audit pass found one real asymmetry the cycle-by-cycle review process had not yet flagged:
- install.sh:704 / install.ps1:469 are gated on env-mode and only run when STUDIO_HOME has already been canonicalized (cycle 24). Symmetric.
- studio/setup.sh:577 / studio/setup.ps1:1829 run UNCONDITIONALLY, including in default mode. In default mode STUDIO_HOME is set to the bare logical \$HOME/.unsloth/studio (setup.sh:416) or Join-Path \$env:USERPROFILE ".unsloth\\studio" (setup.ps1:1480). Cycle 25 canonicalized only the legacy side, creating an asymmetry under symlinked \$HOME / junctioned %USERPROFILE%.
Result of the asymmetry: a default-mode install on a host with \$HOME=/tmp/link -> /tmp/real treats the legacy default as a custom root, putting llama.cpp at \$STUDIO_HOME/llama.cpp instead of ~/.unsloth/llama.cpp -- and the Python backend's _find_llama_server_binary (which uses .resolve() on both sides) then can't find the install.
Fix: canonicalize STUDIO_HOME on the fly at the comparison site, in both setup.sh and setup.ps1. Symmetric with the now-canonicalized legacy side from cycle 25, regardless of which mode set STUDIO_HOME.
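The symmetric comparison can be illustrated in Python, where `Path.resolve()` plays the role that `pwd -P` / `Resolve-Path` play in the scripts. This is a sketch under assumptions: `is_legacy_root` and `llama_cpp_search_dirs` are hypothetical names, not the backend's functions, and the search order simply follows the rule stated in the message (legacy only when the root equals the legacy default, otherwise custom first with legacy as fallback).

```python
from pathlib import Path


def is_legacy_root(studio_home, home):
    """True when `studio_home` is physically the legacy default root.

    Both sides are canonicalized, so a symlinked home directory does
    not make an equal pair of directories compare as different.
    """
    legacy = Path(home) / ".unsloth" / "studio"
    return Path(studio_home).resolve() == legacy.resolve()


def llama_cpp_search_dirs(studio_home, home):
    """Candidate llama.cpp directories, in search order (illustrative)."""
    legacy_llama = Path(home) / ".unsloth" / "llama.cpp"
    if is_legacy_root(studio_home, home):
        # Default installs keep llama.cpp at the legacy location only.
        return [legacy_llama]
    # Custom root: prefer its own build, fall back to the legacy one.
    return [Path(studio_home) / "llama.cpp", legacy_llama]
```

Comparing the raw strings instead of the resolved paths would reproduce the failure mode described above: with HOME pointing at a symlink, the two spellings of the same directory differ textually even though they resolve to one location.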
The other two comparison sites (install.sh:704, install.ps1:469) are already symmetric because they only run when STUDIO_HOME comes from the env-override resolution path that already does pwd -P / Resolve-Path. unsloth_cli/commands/studio.py + studio/backend/run.py + main.py + llama_cpp.py already use .resolve() on both sides -- symmetric.
* install.ps1: env-override resolution uses .NET API for literal paths
Gemini code-review (review 4177641398, commit |
||
|
|
858ba9ba20
|
Fix Studio chat history and attachments with newer assistant-ui (#5296)
Pass Studio history, dictation, and attachment adapters directly into useLocalRuntime instead of relying on assistant-ui's unstable_Provider ordering, which fixes blank chat threads on reload and broken image upload / drag-drop on fresh PyPI and curl installs that resolved @assistant-ui/react to the newer _RuntimeBinder path.
Also pins @assistant-ui/react, @assistant-ui/react-markdown, @assistant-ui/react-streamdown, and assistant-stream to exact versions in package.json so future installs cannot silently re-float onto a newer pre-1.0 release. The lockfile alone only fixes resolution for the install that consumes it -- a future bun add / npm install <other-pkg> rewrites the lockfile and is free to drift carets within their range, which is exactly the path that pulled @assistant-ui/react from 0.12.19 to 0.12.28 and broke 2026.5.1.
Adds studio/frontend/package-lock.json so npm fallback / fresh installs have deterministic resolution.
Tests:
- bun run typecheck
- npm ci on a clean tree (1083 packages)
- npm run build (bundle no longer contains the unstable_Provider Studio call site; only assistant-ui internals reference unstable_Provider) |
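Why a caret range can drift from 0.12.19 to 0.12.28 while an exact pin cannot is easy to check with a small model of npm's caret rule. The sketch below is illustrative only: `caret_allows` is a hypothetical helper, it handles plain `major.minor.patch` strings without prerelease tags, and the assumption that the earlier declaration was a caret range based at 0.12.19 is inferred from the message rather than read from package.json.

```python
def parse(version):
    """Split 'major.minor.patch' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def caret_allows(base, candidate):
    """Approximate npm's ^base rule for plain x.y.z versions.

    A caret range permits changes that do not modify the left-most
    non-zero component, so ^0.12.19 means >=0.12.19 and <0.13.0.
    """
    b, c = parse(base), parse(candidate)
    if c < b:
        return False
    if b[0] > 0:
        return c[0] == b[0]      # ^1.2.3 -> any 1.x.y at or above base
    if b[1] > 0:
        return c[:2] == b[:2]    # ^0.12.19 -> any 0.12.z at or above base
    return c == b                # ^0.0.z admits only that exact version


def exact_allows(base, candidate):
    """An exact pin admits only the identical version."""
    return parse(base) == parse(candidate)
```

Under this model `caret_allows("0.12.19", "0.12.28")` holds while `exact_allows("0.12.19", "0.12.28")` does not, which is the difference the pinning change relies on.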
||
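To make the caret-drift point in the entry above concrete, here is a small Python sketch of npm-style caret matching for plain MAJOR.MINOR.PATCH versions. The function names are hypothetical, and the spec `^0.12.19` is an assumed illustration: the commit message gives the two resolved versions but not the original range string. Pre-release tags and other range operators are deliberately not handled.

```python
def parse(version: str) -> tuple:
    # Split "MAJOR.MINOR.PATCH" into a tuple of ints for ordered comparison.
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_upper_bound(base: tuple) -> tuple:
    # A caret range allows changes that do not modify the left-most
    # non-zero component, so 0.x ranges stay within the same minor.
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def caret_allows(spec: str, candidate: str) -> bool:
    # True when `candidate` falls inside the caret range written as `spec`.
    base = parse(spec.lstrip("^"))
    return base <= parse(candidate) < caret_upper_bound(base)


def exact_allows(spec: str, candidate: str) -> bool:
    # An exact pin accepts only the identical version.
    return parse(spec) == parse(candidate)


print(caret_allows("^0.12.19", "0.12.28"))  # True: the range may float here
print(caret_allows("^0.12.19", "0.13.0"))   # False: next minor is excluded
print(exact_allows("0.12.19", "0.12.28"))   # False: an exact pin cannot drift
```

Under these assumptions a caret range on a 0.x package still admits every later patch release in the same minor line, which is the drift an exact pin rules out.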
|
|
832f48c41a
|
Chore/help svg (#5283)
* fix: developer to api
* fix: help svg and Unsloth text
* svg fix
---------
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> |
||
|
|
d8a0bebbc0
|
Studio: help svg replacement and Unsloth sidebar text (#5282)
* fix: developer to api
* fix: help svg and Unsloth text
---------
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> |
||
|
|
d741cc928b
|
fix: developer to api (#5281) | ||
|
|
19f305238e
|
Studio: Preserve chat history during autosave (#5278)
* fix: chat recents reopening after new chat
* fix: optimize chat delete pruning query |
||
|
|
09505fcc6e
|
Update VRAM estimator to cater to broader model configs (#5175)
* Update VRAM estimator to cater to broader model configs
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix attn backend check, better support for MoE etc
* Studio: tighten VRAM estimator structured-shape and attention paths
- Conservative attention fallback: when resolve_attention_implementation fails, charge the quadratic non-flash activation path instead of silently keeping the optimistic flash_attention_2 default.
- Resolve attention on a shallow config copy so _set_attn_impl does not mutate the cached config returned by _load_config_for_gpu_estimate.
- Use getattr for AutoModelForCausalLM._model_mapping to avoid raising on private-attribute renames in transformers.
- Treat sdpa as O(n) linear attention; PyTorch SDPA dispatches to flash or memory-efficient backends, only eager needs the quadratic term.
- Per-layer activation accounting: structured archs (head_dim, layer_types, attention_k_eq_v, num_kv_shared_layers, double-wide MLP) now flow into compute_activation_bytes via _text_linear_dims, instead of using the legacy hidden_size//num_attention_heads KV/MLP shape.
- Exclude MLA configs (q_lora_rank set) from the structured-shape path so q_lora low-rank projection formulas keep applying when head_dim is also present.
- _build_text_module_elements emits a single MLA self_attn aggregate using _compute_attn_elements when q_lora_rank is set, avoiding the ~10% overcount that fed into _compute_skipped_quantizable_elements.
- Restrict _module_path_matches to known text-tower prefixes so VLM skip names like vision_tower.model.layers.<i>.self_attn.q_proj no longer falsely shadow the text alias model.layers.<i>.self_attn.q_proj.
- Pick up enable_moe_block from the config and add the per-layer dense MLP alongside the MoE experts in compute_total_params and compute_lora_params (Gemma4-style parallel dense + MoE block).
- Single-pass structured layer accounting in _compute_layer_elements, removing the duplicate _text_linear_dims walks.
- Drop the now-zero (activations - activations_computed) shard term in VramBreakdown.min_gpu_vram and the stale comment that referred to it.
- attention_implementation typed as Optional[str] to match call sites that pass None.
- Inline rationale comments on DOUBLE_QUANT_4BIT_FACTOR and NON_FLASH_ATTENTION_FACTOR pointing at VRAM_ESTIMATION.md.
* Studio: extend parallel-MoE accounting + non-prefix dense layer support
- Apply enable_moe_block / moe_has_dense_mlp symmetrically: activation per-layer MLP size in _layer_qkv_mlp_sizes now adds the parallel dense MLP for MoE layers, matching the weight and LoRA accounting added in the prior commit. Skip-quantizable mapping in _build_text_module_elements now registers both mlp.experts and per-projection mlp.{name} entries for MoE layers when the parallel dense block is present, so an llm_int8_skip_modules entry like "model.layers.N.mlp" covers both.
- Track dense layer indices as a tuple (dense_layer_indices) extracted from first_k_dense_replace or decoder_sparse_step + mlp_only_layers, and dispatch dense-vs-MoE accounting through _is_dense_mlp_layer. The prior count-based path silently mis-bucketed layers when mlp_only_layers was non-prefix (e.g. [3, 5] on an 8-layer model). num_dense_layers is derived from len(dense_layer_indices) for backward compatibility.
- Drop the redundant ">0" check in _is_kv_shared_layer so configs with num_kv_shared_layers == num_hidden_layers (every layer shared) are correctly recognized as shared.
- Refresh VRAM_ESTIMATION.md section 5 to note that sdpa joins flash_attention_2 in the linear activation path; refresh the VramBreakdown.activations_computed comment now that the activation floor is gone.
* Studio: Gemma4 PLE accounting, flex_attention, KV-share guard restore
- Add flex_attention to LINEAR_ATTENTION_IMPLS. Unsloth's resolve_attention_implementation returns "flex_attention" when HAS_FLASH_ATTENTION is False and the model class supports flex; PyTorch FlexAttention is a memory-efficient kernel, not a quadratic eager attention path. Without this, activation estimates over-charge ~36x.
- Restore the `> 0` guard in _is_kv_shared_layer. Transformers Gemma4 (modeling_gemma4.py:1031, modular_gemma4.py:863, :926) uses `layer_idx >= first_kv_shared_layer_idx > 0`, so configs that mark every layer as KV-shared raise on construction. Reverting the unconditional acceptance avoids producing a detailed estimate for a shape the actual model code rejects.
- Extend the parallel dense MLP path (`enable_moe_block`) in _build_text_module_elements: when the arch is non-structured, use arch.intermediate_size for the dense gate/up/down dims instead of _text_linear_dims (which returns moe_intermediate_size via _get_mlp_size). Prior code under-counted skipped quantizable elements for the parallel dense block by up to 8x on GLM-style configs.
- Add Gemma4 per-layer-input (PLE) module accounting: per_layer_model_projection (one global Linear) plus per-layer per_layer_input_gate and per_layer_projection are added to the quantizable text-linear total in _compute_layer_elements; post_per_layer_input_norm and per_layer_projection_norm flow into the non-quantizable bucket. compute_lora_params adds the same three Linear modules to the all-linear total. References: transformers_versions/5.7.0/.../gemma4/modular_gemma4.py:1077-1083, :1247-1253.
- VRAM_ESTIMATION.md section 5 now lists flex_attention alongside sdpa and flash_attention_2 as linear-memory backends.
* Studio: shared-expert variants, mlp_layer_types dispatch, PLE skip, all-linear str, deepcopy resolver
Five targeted estimator corrections:
- _compute_dense_layer_indices now reads `mlp_layer_types` ahead of `first_k_dense_replace` / `decoder_sparse_step`. Transformers Exaone-MoE, Laguna, Hy_v3, GLM-MoE-DSA, GLM4-MoE-Lite, Ernie4_5_VL_MoE etc. ship the per-position list and may omit the prefix-style fields entirely.
- _build_text_module_elements registers per_layer_input_gate / per_layer_projection (per layer) and per_layer_model_projection (global) in the canonical element map and alias map. The PLE element count was added to total_quantizable in a prior commit but skip-module matching against names like model.layers.0.per_layer_input_gate produced 0-byte delta. Layer aggregate text.layers.<i> now sums all layer modules so prefix skip names cover the PLE pieces too.
- _targets_all_linear coerces a bare string `"all-linear"` to `["all-linear"]` before set comparison; the previous set comprehension iterated chars. PEFT LoraConfig.target_modules accepts the bare-string convention.
- ModelArchConfig gains `shared_expert_intermediate_size`. extract_arch_config reads `n_shared_experts` / `num_shared_experts` aliases and infers `n_shared_experts=1` when only `shared_expert_intermediate_size` is set. _compute_moe_mlp_elements and the structured + non-structured LoRA paths size the shared expert with its own intermediate (Qwen3.5-MoE: 512 vs routed moe_intermediate_size).
- _determine_attention_impl_for_gpu_estimate uses copy.deepcopy so the resolver does not mutate nested text_config on the cached source. PreTrainedConfig._attn_implementation setter walks `sub_configs` and the prior shallow copy still touched the inner objects.
* Studio: extend MoE/PLE/KV-share accounting to activation and skip-alias paths
Five activation-path corrections plus two LoRA / skip-alias corrections so that shared-expert, per-layer-input, and KV-shared-layer support is symmetric across weights, LoRA, skip-quantizable, and activation paths.
- _layer_qkv_mlp_sizes: include shared-expert FFN in mlp_size (live shared expert per token alongside routed experts) and keep K/V activation memory for KV-shared layers; only the WEIGHT path uses has_k/has_v from _layer_attention_dims.
- _per_layer_activation_bytes / compute_activation_bytes: account for per_layer_input_gate (hd-sized) and per_layer_projection (pli-sized) per layer plus the global per_layer_model_projection [B,S,L,PLI] tensor when hidden_size_per_layer_input is set.
- _build_text_module_elements: split mlp.experts into routed and mlp.shared_expert canonical entries; register layers.<i>.experts alias for Gemma4 enable_moe_block layouts and mlp.shared_experts (plural) alias for Exaone-MoE / Laguna / GLM4-MoE-Lite shared-expert variants.
- _compute_moe_mlp_elements: split into _compute_routed_moe_elements and _compute_shared_moe_elements; only count shared_expert_gate (hd->1 Linear per shared expert) when shared_expert_intermediate_size is set, which is the Qwen2-MoE / Qwen3.5-MoE discriminator. Other shared-expert families (Exaone-MoE, HY-V3, GLM4-MoE-Lite, Laguna) lack the gate.
- compute_lora_params: when target_modules='all-linear' bare keyword, drop routed and shared MoE expert LoRA contributions. PEFT's all-linear targets nn.Linear only; Unsloth's get_moe_target_parameters expands MoE expert nn.Parameter LoRA only when target_modules contains explicit gate_proj/up_proj/down_proj/gate_up_proj names.
- _per_layer_input_lora_params: thread target_modules through and add the per-PLE-module contribution when the corresponding name appears, not only under all-linear.
* Studio: top-k MoE activations, ERNIE list configs, suffix skips, multimodal full bytes
Six estimator corrections aligning the detailed accounting paths with real training behavior:
- _layer_qkv_mlp_sizes scales the MoE-layer mlp_size by num_experts_per_tok so the active routed-expert intermediate tensors are charged for activations. Adds num_experts_per_tok to ModelArchConfig and extracts it from num_experts_per_tok / top_k_experts (Gemma4 alias) in extract_arch_config.
- compute_lora_params splits routed and shared MoE LoRA contributions so that bare target_modules='all-linear' zeroes routed (nn.Parameter expert tensors, which Unsloth's get_moe_target_parameters does NOT enable for the bare keyword) but keeps shared-expert LoRA (regular nn.Linear MLPs that Unsloth's get_peft_regex DOES match).
- extract_arch_config gains a _first_scalar helper for ERNIE-style moe_intermediate_size = [routed, shared] lists, plus moe_num_experts and moe_num_shared_experts attribute aliases. When moe_intermediate_size is a pair and shared_expert_intermediate_size is unset, the second element is treated as the shared-expert intermediate.
- estimate_required_model_memory_gb's detailed branch retains max(0, model_size_bytes - compute_total_params(arch) * 2) on top of the arch-derived breakdown.model_weights so multimodal models (vision/audio towers) and partially-modeled families (Gemma3n AltUp/Laurel etc.) do not silently drop bytes that the safetensors total includes.
- _module_path_matches accepts a tail-only match when the skip entry is shorter than the alias path. Transformers' BNB quantizer suffix-matches short skip entries like ['q_proj'] / ['lm_head'] against full module paths; the previous len(skip) < len(alias) early-return missed those.
- _per_layer_input_lora_params drops the all_linear branch and only counts PLE LoRA when the user explicitly names per_layer_input_gate / per_layer_projection / per_layer_model_projection. Unsloth's get_peft_regex requires module names to contain a component tag (mlp/attn/...); PLE module names lack any tag, so all-linear training does not attach LoRA to them.
* Studio: full-FT extra optimizer/gradient inflation, MoE top-k aliases, ERNIE position dispatch, sibling experts aggregate
When the safetensors total exceeds the text-arch fp16 estimate (multimodal vision/audio towers, partially-modeled families), only inflate the model weights line for adapter methods but extend optimizer + gradient bytes under full fine-tuning, where the extra params are trainable.
DBRX exposes top-k routing as moe_top_k and Hunyuan-V1-MoE as moe_topk; neither is aliased to num_experts_per_tok via attribute_map, so probe both when extracting arch config.
ERNIE 4.5 MoE / VL MoE configs declare MoE layers via moe_layer_start_index / moe_layer_end_index / moe_layer_interval (with -1 meaning the last layer); add the position-style dispatch alongside the existing mlp_layer_types / first_k_dense_replace / decoder_sparse_step paths.
When moe_has_dense_mlp is set (Gemma4 enable_moe_block) the routed experts live as a sibling of self.mlp at layers.<i>.experts in the actual model layout; keep the layer mlp aggregate to the dense path and add a separate experts aggregate so a skip module model.layers.<i>.mlp does not collapse the routed experts as well.
* Studio: extend MoE family extraction (Llama4 / DBRX / Hunyuan / ERNIE) and align dense vs routed MLP widths
- Llama4: pick up `config.moe_layers` (auto-populated from interleave_moe_layer_step) so dense layer indices reflect the actual is_moe_layer dispatch.
- Llama4: add a separate `dense_intermediate_size` derived from `intermediate_size_mlp` (used for the dense feed_forward path) and keep `intermediate_size` for the routed/shared expert width. Auto-attach one shared expert per MoE layer when the dense-vs-MoE width split is present.
- DBRX: walk the `ffn_config` sub-config when extracting MoE attrs (moe_num_experts / moe_top_k / ffn_hidden_size). Without this DBRX is misclassified as a dense arch.
- Hunyuan: normalize layer-wise `moe_topk` (and the canonical `num_experts_per_tok` lookup it shadows via attribute_map) through a worst-case scalar so the int(...) cast cannot crash on list values.
- ERNIE 4.5 MoE: switch the start/end/interval dispatch to the model's `(layer_idx + 1) % interval == 0` modulo gate so MoE layers match the decoder when interval > 1.
- ERNIE 4.5 VL MoE: drop the heuristic that read `moe_intermediate_size[1]` as the shared expert width; in VL configs [1] is the vision-routed width and shared experts are sized from [0].
- estimate_fp16_model_size_bytes: prefer the larger of config-derived and local-weight bytes so the multimodal extra_bytes correction can fire for local VLM directories.
* Add tests for VRAM estimator extensions
* Studio: trim verbose comments in VRAM estimator
Collapse multi-paragraph rationale blocks to 1-3 lines stating the single load-bearing fact. Fix one inverted "fall through ... last" comment whose claim disagreed with the surrounding code.
* Consolidate added tests into existing test_vram_estimation.py and test_gpu_selection.py
Move Llama4 / DBRX / ERNIE arch-extraction tests into test_vram_estimation.py as TestLlama4ArchExtraction / TestDbrxFfnConfigExtraction / TestErniePhaseModuloDispatch / TestErnieVlSharedExpertWidth classes. Move estimate_fp16_model_size_bytes prefer-larger-of-config-or-local tests into test_gpu_selection.py as TestEstimateFp16ModelSizeBytesPrefersLocalWeights. Drop one redundant Llama4 num_dense_layers assertion already covered by the moe_layers dispatch test.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> |
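One of the corrections listed in this entry, the bare-string `"all-linear"` handling, is a common Python pitfall and can be sketched in isolation. The two functions below are hypothetical reconstructions of that class of bug and its fix, not the estimator's actual `_targets_all_linear`. The only detail taken from the commit message is that a set comprehension over a bare string iterates its characters.

```python
def targets_all_linear_buggy(target_modules) -> bool:
    # Hypothetical reconstruction: iterating a bare string yields single
    # characters, so the resulting set never equals {"all-linear"}.
    return {module for module in target_modules} == {"all-linear"}


def targets_all_linear_fixed(target_modules) -> bool:
    # Coerce a bare string to a one-element list before building the set.
    if isinstance(target_modules, str):
        target_modules = [target_modules]
    return set(target_modules or []) == {"all-linear"}


print(targets_all_linear_buggy("all-linear"))    # False: compared a set of chars
print(targets_all_linear_buggy(["all-linear"]))  # True
print(targets_all_linear_fixed("all-linear"))    # True
print(targets_all_linear_fixed(["q_proj"]))      # False
```

The coercion keeps the list form working unchanged while making the bare-string form, which the commit notes PEFT accepts, behave the same way.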