unsloth/tests/studio/test_cancel_atomicity.py
Daniel Han eb8b0dee2e
Studio: make stop button actually stop generation (#5069)
* Studio: make stop button actually stop generation

The UI stop button routes through assistant-ui's cancelRun, which aborts
the frontend fetch. Four issues combined to let llama-server keep decoding
long after the user clicked stop:

1. request.is_disconnected() does not fire reliably behind proxies
   (e.g. Colab) that don't propagate fetch aborts.
2. llama-server defaults n_predict to n_ctx when max_tokens is not sent,
   so a cancelled request keeps producing tokens up to 262144 (see the
   sketch after this list).
3. The httpx.Client pool keeps connections alive (TCP keep-alive), so
   even a cleanly closed stream reuses the same connection and
   llama-server's liveness poll never sees a disconnect.
4. No explicit backend route to cancel - every cancel path relied on
   is_disconnected.
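
A minimal sketch of the payload defaults this commit adds for issue 2
(payload construction is illustrative; the real wiring lives in the
studio backend's llama-server client):

    # Honoured as defaults only -- user-supplied values win.
    payload.setdefault("max_tokens", 4096)           # else n_predict = n_ctx (up to 262144)
    payload.setdefault("t_max_predict_ms", 120_000)  # 2-minute wall-clock backstop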

Changes:
- Add POST /api/inference/cancel keyed by session_id/completion_id, with
  a registry populated for the lifetime of each streaming response
  (sketched below).
- Have the frontend (chat-adapter.ts) POST /inference/cancel on
  AbortController abort, alongside the existing fetch teardown.
- Send max_tokens=4096 + t_max_predict_ms=120000 as defaults on every
  outbound chat completion to llama-server; user-supplied overrides are
  still honoured.
- Disable httpx keep-alive on the streaming client so connection close
  reaches llama-server and its 1s liveness check fires.

No behaviour changes for non-streaming paths or for existing callers
that already pass max_tokens/session_id.
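
A minimal sketch of the cancel route and registry described above (hedged:
handler names match the PR, but exact signatures live in
studio/backend/routes/inference.py):

    _CANCEL_REGISTRY: dict[str, threading.Event] = {}  # later widened to set[Event]
    _CANCEL_LOCK = threading.Lock()

    @router.post("/api/inference/cancel")
    async def cancel_inference(request: Request) -> dict:
        body = await request.json()
        cancelled = 0
        with _CANCEL_LOCK:
            for key in (body.get("session_id"), body.get("completion_id")):
                event = _CANCEL_REGISTRY.get(key) if key else None
                if event is not None and not event.is_set():
                    event.set()
                    cancelled += 1
        return {"cancelled": cancelled}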

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: harden stop-button cancel path and scope cancel route

- Require at least one identifier for /api/inference/cancel so a missing
  thread id cannot silently cancel every in-flight generation.
- Scope /cancel to a dedicated studio_router so it is not exposed under
  the /v1 OpenAI-compat prefix as a surprise endpoint.
- Store a set of cancel events per key in _CANCEL_REGISTRY so concurrent
  requests on the same session_id do not overwrite each other, and
  deduplicate in _cancel_by_keys so the cancelled count reflects unique
  requests.
- Always send session_id with chat completions (not only when tools are
  enabled) so non-tool GGUF streams register under it and are reachable
  from /cancel.
- Register the non-GGUF stream_chunks path in the cancel registry too,
  so transformers-based stop-button works behind proxies that swallow
  fetch aborts.
- Only apply the 2-minute t_max_predict_ms wall-clock cap when the
  caller did not pass max_tokens, so legitimate long generations on
  slow CPU / macOS / Windows installs are not silently truncated.
- Remove the abort listener on normal stream completion so reused
  AbortSignals cannot fire a spurious cancel POST after the fact.

* studio: close cancel-race and stale-cancel gaps in stop path

- Register the cancel tracker before returning StreamingResponse so a
  stop POST that arrives during prefill / warmup / proxy buffering
  finds an entry in _CANCEL_REGISTRY. Cleanup now runs via a Starlette
  BackgroundTask instead of a finally inside the async generator body.
- Add a per-run cancel_id on the frontend (crypto.randomUUID) and in
  ChatCompletionRequest so /api/inference/cancel matches one specific
  generation. Removes the stale-cancel bug where pressing stop then
  starting a new run in the same thread would cancel the retry.
- Apply t_max_predict_ms unconditionally in all three llama-server
  payload builders (previously gated on max_tokens=None, which made it
  dead code for UI callers that always send params.maxTokens). Raise
  the default to 10 minutes so slow CPU / macOS / Windows installs are
  not cut off mid-generation.
- Make _cancel_by_keys refuse empty input (return 0) so a future
  internal caller cannot accidentally mass-cancel every in-flight
  request.
- Accept cancel_id (primary), session_id, and completion_id on the
  /api/inference/cancel route. Unify the three streaming sites on the
  same _cancel_keys / _tracker variable names.
- Annotate _CANCEL_REGISTRY as dict[str, set[threading.Event]].
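
The registry shape at this point, as a hedged sketch (the real
_TrackedCancel in inference.py carries more bookkeeping):

    _CANCEL_REGISTRY: dict[str, set[threading.Event]] = {}
    _CANCEL_LOCK = threading.Lock()

    class _TrackedCancel:
        # Registers cancel_event under every key for the stream's lifetime.
        def __init__(self, event, cancel_id, session_id, completion_id = None):
            self.event = event
            self.keys = [k for k in (cancel_id, session_id, completion_id) if k]

        def __enter__(self):
            with _CANCEL_LOCK:
                for key in self.keys:
                    _CANCEL_REGISTRY.setdefault(key, set()).add(self.event)
            return self

        def __exit__(self, *exc):
            # Idempotent: a second call finds the keys already gone.
            with _CANCEL_LOCK:
                for key in self.keys:
                    events = _CANCEL_REGISTRY.get(key)
                    if events is not None:
                        events.discard(self.event)
                        if not events:
                            del _CANCEL_REGISTRY[key]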

* Add review tests for PR #5069

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: harden stop-button cancel semantics and wall-clock cap

- Make /inference/cancel match cancel_id EXCLUSIVELY when supplied.
  Previously the handler iterated ('cancel_id','session_id','completion_id')
  and unioned matches, so a stale cancel POST carrying {cancel_id:old,
  session_id:thr} would still cancel a later run on the same thread via
  the shared session_id. cancel_id is now a per-run exclusive key;
  session_id / completion_id are only used as fallbacks when cancel_id
  is absent.

- Close the early-cancel race. If /inference/cancel lands before the
  streaming handler reaches _TrackedCancel.__enter__() (stop clicked
  during prefill / warmup / proxy buffering), the cancel was silently
   dropped. Stash unmatched cancel_ids in _PENDING_CANCELS with a 30 s
   TTL; _TrackedCancel.__enter__() now replays any matching pending
   cancel by set()-ing the event immediately after registration
   (sketched after this list).

- Make the t_max_predict_ms = _DEFAULT_T_MAX_PREDICT_MS default
  conditional on `max_tokens is None` at all three llama-server payload
  sites. The cap is a safety net for callers who leave max_tokens unset
  (otherwise llama-server defaults n_predict to n_ctx, up to 262144).
  Callers who set an explicit max_tokens are already self-limiting, and
  legitimate long generations on slow CPU / macOS / Windows installs
  must not be silently truncated at 10 minutes.

- Guard each StreamingResponse return with try/except BaseException so
  _tracker.__exit__ runs even if StreamingResponse construction or any
  preceding statement raises between _tracker.__enter__() and the
  BackgroundTask attachment. Prevents a registry leak on that narrow
  window.
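
A hedged sketch of the pending-cancel stash (names match the registry
primitives; the prune helper's exact locking may differ in inference.py):

    _PENDING_CANCELS: dict[str, float] = {}  # cancel_id -> stash timestamp
    _PENDING_CANCEL_TTL_S = 30.0

    def _prune_pending() -> None:
        # Drop stashed cancel_ids older than the TTL; called with
        # _CANCEL_LOCK held by the cancel paths.
        cutoff = time.monotonic() - _PENDING_CANCEL_TTL_S
        for cid, ts in list(_PENDING_CANCELS.items()):
            if ts < cutoff:
                del _PENDING_CANCELS[cid]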

* studio: close TOCTOU race and restore wall-clock backstop on UI path

- Close TOCTOU race in the pending-cancel mechanism. The previous fix
  split cancel_inference's work (cancel_by_keys + remember_pending_cancel)
  and _TrackedCancel.__enter__'s (register + consume_pending) across
  four separate lock acquisitions. Under contention a cancel POST could
  acquire-then-release the lock, find the registry empty, and stash its
  cancel_id only AFTER __enter__ had already registered and consumed an
  empty pending map -- silently dropping the cancel. Both call sites
  now do their work inside a single _CANCEL_LOCK critical section, via
  the new atomic helper _cancel_by_cancel_id_or_stash() and an inlined
  consume-pending step in __enter__ (sketched after this list).
  Reproduced the race under forced interleaving pre-fix; 0/2000 drops
  post-fix under parallel stress.

- Apply t_max_predict_ms UNCONDITIONALLY at all three llama-server
  payload sites. The previous iteration gated the cap on
  `max_tokens is None`, which turned out to be dead code on the
  primary Studio UI path: chat-adapter.ts sets
  maxTokens=loadResp.context_length after every model load, so every
  chat request carries an explicit max_tokens and the wall-clock
  safety net never fired. The cap's original purpose is to bound
  stuck decodes regardless of the token budget; it must always apply.

- Raise _DEFAULT_T_MAX_PREDICT_MS from 10 minutes to 1 hour. 10
  minutes was too aggressive for legitimate slow-CPU chat responses
  (a 4096-token reply at 2 tok/s takes ~34 min); 1 hour accommodates
  that and still catches genuine zombie decodes.

- Prune _PENDING_CANCELS inside _cancel_by_keys as well, so stashed
  entries expire proportionally to overall cancel traffic rather than
  only to cancel_id-specific POSTs.
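
The atomic helper, as a hedged sketch (return-value semantics inferred
from the review tests in tests/studio/test_cancel_atomicity.py):

    def _cancel_by_cancel_id_or_stash(cancel_id: str) -> int:
        # One lock acquisition covers signal-or-stash, so a concurrent
        # _TrackedCancel.__enter__ can never slip between the two steps.
        with _CANCEL_LOCK:
            events = _CANCEL_REGISTRY.get(cancel_id)
            if events:
                for event in set(events):
                    event.set()
                return len(events)
            _prune_pending()
            _PENDING_CANCELS[cancel_id] = time.monotonic()
            return 0

    # ... and the inlined consume-pending step inside __enter__'s single
    # _CANCEL_LOCK section (assuming the tracker stores its cancel_id):
    #     if _PENDING_CANCELS.pop(self.cancel_id, None) is not None:
    #         self.event.set()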

* studio: trim verbose comments and docstrings in cancel path

* studio/llama_cpp: drop upstream PR hashes from benchmark comment

* Add review tests for Studio stop button

* Consolidate review tests for Studio stop button

* Align cancel-route test with exclusive cancel_id semantics

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: move cancel cleanup to generator finally; drop dead helper

- Move _tracker.__exit__ from Starlette BackgroundTask into each
  streaming generator's finally block. Starlette skips the background
  callback when stream_response raises (OSError / ClientDisconnect),
  which leaked _CANCEL_REGISTRY entries on abrupt disconnect.
- Check cancel_event.is_set() at the top of each GGUF while loop so a
  pending-replay cancel falls through to final_chunk + [DONE] instead
  of propagating GeneratorExit out of _stream_with_retry (both patterns
  are sketched after this list).
- Remove unused _remember_pending_cancel; _cancel_by_cancel_id_or_stash
  superseded it.
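
A hedged sketch of the shape both fixes produce (generator and chunk
names are illustrative, not the exact ones in inference.py):

    async def _gguf_stream():
        try:
            while True:
                if cancel_event.is_set():
                    # Pending-replay cancel: end the SSE stream cleanly.
                    yield final_chunk
                    yield "data: [DONE]\n\n"
                    return
                yield await _next_chunk()
        finally:
            # Runs on normal completion, exceptions, and GeneratorExit
            # from abrupt disconnects -- unlike a BackgroundTask, which
            # Starlette skips when stream_response raises.
            _tracker.__exit__(None, None, None)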

* Add review tests for Studio stop-button

* studio: wire audio-input stream into cancel registry

- Register cancel_event with _TrackedCancel on the audio-input streaming
  path so POST /api/inference/cancel can stop whisper / audio-input GGUF
  runs. Previously the registry stayed empty on this branch, so the stop
  button returned {"cancelled":0} and the decode ran to completion.
- Apply the same finally-based cleanup and pre-iteration cancel-event
  check used on the other three streaming paths.
- Update the _CANCEL_REGISTRY block comment to list cancel_id as the
  primary key (was stale "session_id preferred").

* Consolidate review tests for Studio stop-button cancel flow

- Merge the 6 behavioral tests from test_stream_cleanup_on_disconnect.py
  (finally cleanup on normal/exception/aclose, pre-set cancel_event
  pattern, and its regressions) into test_stream_cancel_registration_timing.py,
  which is the PR's existing file covering the same area.
- Extend structural invariants to include audio_input_stream alongside the
  three GGUF / Unsloth streaming generators: no _tracker.__enter__ inside
  the async gen body, cleanup via try/finally, no background= on
  StreamingResponse.
- Delete test_stream_cleanup_on_disconnect.py (now empty).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: make cancel-via-POST interrupt Unsloth and audio-input streams

Close two remaining gaps in the stop-button cancellation wiring:

- stream_chunks (Unsloth path): add a top-of-loop cancel_event check and
  call backend.reset_generation_state() so cancel POSTs flush GPU state
  and close the SSE cleanly instead of relying on request.is_disconnected
  (which does not fire through proxies like Colab's).
- audio_input_stream: run the synchronous audio_input_generate() via
  asyncio.to_thread so blocking whisper chunks do not freeze the event
  loop, matching the pattern already used by the GGUF streaming paths.
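
A hedged sketch of the offload pattern (assuming `gen` is the synchronous
audio_input_generate() iterator; the sentinel idiom is illustrative):

    _DONE = object()
    while not cancel_event.is_set():
        # next() may block on a whisper decode; run it off the event
        # loop so cancel POSTs and other requests stay responsive.
        chunk = await asyncio.to_thread(next, gen, _DONE)
        if chunk is _DONE:
            break
        yield chunk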

* Add review tests for Studio stop-button cancel flow

* Consolidate review tests for Studio stop-button cancel flow

- Delete standalone test_cancel_registry.py at repo root: tests duplicated
  test_cancel_atomicity.py / test_cancel_id_wiring.py and re-implemented
  registry primitives inline (scaffolding).
- Extend tests/studio/test_stream_cancel_registration_timing.py with
  regression guards for the iter-1 cancel-loop fixes:
    structural: each streaming generator checks cancel_event in its loop;
                audio_input_stream offloads next() via asyncio.to_thread;
                stream_chunks cancel branch calls reset_generation_state().
    runtime:    Unsloth loop breaks on external cancel and resets state;
                audio loop stays responsive under blocking next();
                both loops emit zero tokens on pre-set cancel (replay path).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* studio: extend stop-path to passthrough streams; tighten wall-clock cap

- Lower _DEFAULT_T_MAX_PREDICT_MS from 1 hour to 10 minutes so the
  wall-clock backstop actually bounds runaway decodes when cancel
  signaling fails.
- Wire _TrackedCancel and cancel_event.is_set() into
  _openai_passthrough_stream and _anthropic_passthrough_stream and
  disable httpx keepalive so stop requests from /v1 and /v1/messages
  tool-calling clients reach llama-server (keepalive sketch after this
  list).
- Apply t_max_predict_ms to the tool-passthrough request body so the
  backstop covers passthrough paths as well.
- Symmetric pre-registration stash for session_id/completion_id
  cancels (_cancel_by_keys_or_stash) so early cancels by those keys
  replay on later registration like cancel_id.
- Drop dead except BaseException guards around StreamingResponse()
  at four streaming sites; cleanup lives in the generator's finally.
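
Disabling keepalive, as a hedged sketch (httpx exposes this through
connection-pool limits; the PR's exact client construction may differ):

    client = httpx.AsyncClient(
        # Zero keep-alive connections: closing the stream closes the TCP
        # socket, so llama-server's 1s liveness poll sees the disconnect.
        limits = httpx.Limits(max_keepalive_connections = 0),
        timeout = httpx.Timeout(None),
    )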

* studio: harden cancel registry against ghost-cancel and leak paths

- Revert the session_id/completion_id stash in the fallback cancel
  helper. session_id is thread-scoped and reused across runs, so
  stashing it on an unmatched POST would fire cancel_event for the
  user's next unrelated request via _TrackedCancel.__enter__.
  cancel_id remains the only per-run unique key that gets stashed.
- Default max_tokens to _DEFAULT_MAX_TOKENS in the tool-passthrough
  body. Mirror the direct GGUF path so OpenAI/Anthropic passthrough
  callers who omit max_tokens get the same zombie-decode cap instead
  of relying on the wall-clock backstop alone.
- Wrap _openai_passthrough_stream setup with an outer try/except
  BaseException. The inner except httpx.RequestError does not catch
  asyncio.CancelledError at await client.send, which would otherwise
  leave _tracker registered in _CANCEL_REGISTRY indefinitely.
- Frontend stop POST uses plain fetch + manual Authorization header
  instead of authFetch. A 401 on the cancel POST no longer refreshes
  tokens or redirects the user to the login page mid-stop.

* Add review tests for Studio stop-button cancel flow

* studio: trim comments on stop-button review changes

Collapse multi-paragraph rationale blocks on the cancel registry,
_openai_passthrough_stream, and the frontend onAbortCancel handler
into one-line explanations of why the non-obvious behaviour exists.
Drop authFetch import that became unused when the cancel POST
switched to plain fetch.

* Consolidate review tests for Studio stop-button cancel flow

Move review-added tests out of test_cancel_dispatch_edges.py into the
existing PR test files that already cover the same areas:
- backend registry fan-out / exclusivity / idempotency / falsy-keys
  edge cases moved into tests/studio/test_cancel_atomicity.py
- frontend plain-fetch (not authFetch) + manual Authorization header
  moved into tests/studio/test_cancel_id_wiring.py
Delete the now-empty test_cancel_dispatch_edges.py.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: stop default-capping responses at 4096 tokens (follow-up to #5069) (#5174)

* Studio: stop default-capping responses at 4096 tokens

Follow-up to #5069. The 4096 default introduced for runaway-decode
defense silently truncates any caller that omits max_tokens. The
Studio chat UI sets params.maxTokens = loadResp.context_length after
a GGUF load, so it is unaffected, but every other consumer is exposed:

- OpenAI-API direct callers (/v1/chat/completions, /v1/responses,
  /v1/messages, /v1/completions) where the OpenAI default is
  effectively unlimited per response. langchain, llama-index, raw
  curl, and the openai SDK all rely on that.
- Reasoning models. Qwen3 / gpt-oss reasoning traces routinely exceed
  4096 tokens before the model emits a single visible content token.
  The user sees the trace cut off mid-thought.
- Long-form generation ("write a chapter", "produce a full SVG").

Reproduced on this branch: gemma-4-E2B-it-GGUF Q8_0, prompt asking
for a 10000-word story, no max_tokens in the request:

    finish_reason: stop  (misleading -- should be 'length')
    content_chars: 19772
    content_tail: ...'a comforting, yet immense, pressure.\n\n*"'

Body ended mid-sentence on a stray opening quote, right at the 4096
token mark.

After this patch the same request returns 38357 chars ending with
'...held in a perfect, dynamic equilibrium.' -- a natural stop, not
a truncation.

Implementation: rename the constant to _DEFAULT_MAX_TOKENS_FLOOR and
set it to 32768. Each call site now uses the model's effective
context length when known, falling back to the floor:

    default_cap = self._effective_context_length or _DEFAULT_MAX_TOKENS_FLOOR

The 10-minute t_max_predict_ms wall-clock backstop from #5069 is
preserved as the second line of defense.

Plumbed _build_passthrough_payload + _build_openai_passthrough_body
through the routes layer so the Anthropic and OpenAI passthrough
paths also respect the model's context length.
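
A hedged sketch of the per-call-site pattern (the default_cap line is
quoted from the patch; the surrounding payload wiring is illustrative):

    default_cap = self._effective_context_length or _DEFAULT_MAX_TOKENS_FLOOR
    payload["max_tokens"] = (
        request.max_tokens if request.max_tokens is not None else default_cap
    )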

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Studio: cancel passthrough streams during llama-server prefill + route through apiUrl for Tauri

Three reviewer-flagged correctness gaps in the stop-button mechanism.

1) `_openai_passthrough_stream` could not honor cancel during prefill.
   The cancel check ran inside the `async for raw_line in lines_iter`
   body, so a cancel POST that arrived before llama-server emitted the
   first SSE line was unobservable until prefill completed. With a long
   prompt under proxy/Colab conditions -- the exact target scenario for
   this PR -- that left the model decoding for a long time after the
   user clicked Stop. Add an asyncio watcher task that closes `resp` as
   soon as `cancel_event` is set, raising in `aiter_lines` so the
   generator can exit (see the sketch after this list). The watcher
   polls a threading.Event because the cancel registry is keyed by
   threading.Event for the synchronous /cancel handler.

2) `_anthropic_passthrough_stream` had the same blocking-prefill pattern.
   Same fix.

3) The frontend's stop-button cancel POST used a bare relative
   `fetch("/api/inference/cancel", ...)`, which targets the webview
   origin in Tauri production builds (where the backend is at
   `http://127.0.0.1:8888`). Route through the existing `apiUrl()`
   helper from `lib/api-base.ts` to match every other Studio call.
   Browser/dev builds get the empty base, so behavior is unchanged
   there.
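
A hedged sketch of the watcher (task wiring and the `stream_done` flag
are illustrative; `resp` is the httpx streaming response):

    async def _watch_cancel() -> None:
        # cancel_event is a threading.Event (the /cancel handler is
        # synchronous), so poll it rather than awaiting it.
        while not cancel_event.is_set():
            if stream_done.is_set():
                return  # stream finished normally; stand down
            await asyncio.sleep(0.05)
        await resp.aclose()  # aiter_lines() raises; the generator unwinds

    watcher = asyncio.create_task(_watch_cancel())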

Verified via temp/pr_simulation/sim_5069_prefill_cancel.py: cancel
during prefill terminates within ~250ms on both passthrough paths
(was 145s+ on the Anthropic path before this change), and the standard
non-passthrough chat path still cancels with no regression.

* Studio: log cancel-body parse errors instead of silently swallowing

Reviewer-flagged defensive logging gap. The bare `except Exception: pass`
in `cancel_inference` would mask malformed payloads that hint at a buggy
client or a transport issue. Log at debug so future investigation isn't
left guessing whether `body={}` came from a missing body or a parse
failure. Behavior is unchanged: an unparseable body still falls through
to the empty-dict path and the cancel call returns `{"cancelled": 0}`.

* Studio: Anthropic passthrough cancel parity with OpenAI passthrough

Two reviewer-flagged consistency gaps in the cancel surface for
/v1/messages.

1) Anthropic passthrough did not register cancel_id, so a per-run cancel
   POST (the cleanest Studio-style cancel path) silently missed when
   the route hit `_anthropic_passthrough_stream`. The OpenAI passthrough
   has registered (cancel_id, session_id, completion_id) since this PR
   was first opened; mirror that here. Also add `cancel_id` to
   `AnthropicMessagesRequest` so the route handler can plumb it through.

2) The cancel handler's fallback key list checked only completion_id
   and session_id, never message_id. Anthropic clients that send their
   native `id` (returned in the SSE message_start event) for cancel had
   no way to hit the registry. Add message_id to the fallback list.

Verified via temp/pr_simulation/sim_5069_prefill_cancel.py: P2 now
cancels by cancel_id in 137ms (was hanging pre-fix), and the new P2b
case cancels by message_id in 77ms. P1 (OpenAI) and P3 (standard chat)
still pass with no regression.

---------

Co-authored-by: danielhanchen <michaelhan2050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
2026-04-24 10:09:25 -07:00


"""
TOCTOU atomicity guards for the cancel path.
Structural: cancel_inference, _cancel_by_cancel_id_or_stash, and
_TrackedCancel.__enter__ must each use a single _CANCEL_LOCK critical
section over lookup + stash / register + consume-pending.
Behavioral: parallel cancel-POST vs __enter__ must never drop a cancel.
"""
from __future__ import annotations
import ast
import random
import threading
from pathlib import Path

SOURCE_PATH = (
    Path(__file__).resolve().parents[2]
    / "studio"
    / "backend"
    / "routes"
    / "inference.py"
)
_SRC = SOURCE_PATH.read_text()
_TREE = ast.parse(_SRC)


def _find_function(name: str) -> ast.FunctionDef | ast.AsyncFunctionDef:
    for node in ast.walk(_TREE):
        if (
            isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
            and node.name == name
        ):
            return node
    raise AssertionError(f"function {name!r} not found")


def _find_class(name: str) -> ast.ClassDef:
    for node in ast.walk(_TREE):
        if isinstance(node, ast.ClassDef) and node.name == name:
            return node
    raise AssertionError(f"class {name!r} not found")


def _count_with_cancel_lock_blocks(node: ast.AST) -> int:
    n = 0
    for sub in ast.walk(node):
        if not isinstance(sub, ast.With):
            continue
        for item in sub.items:
            ctx = item.context_expr
            if isinstance(ctx, ast.Name) and ctx.id == "_CANCEL_LOCK":
                n += 1
                break
    return n


def test_cancel_by_cancel_id_or_stash_is_single_lock_critical_section():
    fn = _find_function("_cancel_by_cancel_id_or_stash")
    assert _count_with_cancel_lock_blocks(fn) == 1, (
        "_cancel_by_cancel_id_or_stash must use exactly one `with "
        "_CANCEL_LOCK:` block; splitting into two acquisitions reopens "
        "the TOCTOU race with _TrackedCancel.__enter__"
    )
    src = ast.unparse(fn)
    assert "_CANCEL_REGISTRY.get(cancel_id)" in src
    assert "_PENDING_CANCELS[cancel_id]" in src


def test_tracked_cancel_enter_registers_and_consumes_pending_under_one_lock():
    cls = _find_class("_TrackedCancel")
    enter = None
    for n in cls.body:
        if isinstance(n, ast.FunctionDef) and n.name == "__enter__":
            enter = n
            break
    assert enter is not None
    assert _count_with_cancel_lock_blocks(enter) == 1, (
        "_TrackedCancel.__enter__ must acquire _CANCEL_LOCK exactly once. "
        "A second acquisition for consume-pending lets a concurrent "
        "cancel POST stash its id after consume has already seen an "
        "empty map, silently dropping the cancel"
    )
    with_block = None
    for sub in ast.walk(enter):
        if isinstance(sub, ast.With) and any(
            isinstance(i.context_expr, ast.Name)
            and i.context_expr.id == "_CANCEL_LOCK"
            for i in sub.items
        ):
            with_block = sub
            break
    assert with_block is not None
    block_src = "\n".join(ast.unparse(s) for s in with_block.body)
    assert "_CANCEL_REGISTRY.setdefault" in block_src
    assert "_PENDING_CANCELS.pop" in block_src, (
        "__enter__ critical section must consume from _PENDING_CANCELS "
        "inside the same lock, not a later re-acquisition"
    )


def test_cancel_inference_uses_atomic_helper_for_cancel_id_path():
    fn = _find_function("cancel_inference")
    src = ast.unparse(fn)
    assert "_cancel_by_cancel_id_or_stash" in src
    # The pre-fix two-step idiom must be gone.
    assert "_remember_pending_cancel(cancel_id)" not in src, (
        "two-step _cancel_by_keys + _remember_pending_cancel produced "
        "the TOCTOU race and must not reappear"
    )


_WANTED = {
    "_CANCEL_REGISTRY",
    "_CANCEL_LOCK",
    "_PENDING_CANCELS",
    "_PENDING_CANCEL_TTL_S",
    "_prune_pending",
    "_remember_pending_cancel",
    "_TrackedCancel",
    "_cancel_by_keys",
    "_cancel_by_cancel_id_or_stash",
}


def _load_registry_module():
    # Extract just the registry primitives from inference.py and exec them
    # in isolation, so the behavioral tests exercise the real code without
    # importing the full route module (and its FastAPI dependencies).
    chunks = []
    for n in _TREE.body:
        seg = ast.get_source_segment(_SRC, n)
        if seg is None:
            continue
        if isinstance(n, (ast.FunctionDef, ast.ClassDef)) and n.name in _WANTED:
            chunks.append(seg)
        elif isinstance(n, ast.Assign):
            names = [t.id for t in n.targets if isinstance(t, ast.Name)]
            if any(name in _WANTED for name in names):
                chunks.append(seg)
        elif (
            isinstance(n, ast.AnnAssign)
            and isinstance(n.target, ast.Name)
            and n.target.id in _WANTED
        ):
            chunks.append(seg)
    mod = {}
    exec(
        "import threading, time\nfrom typing import Optional\n" + "\n\n".join(chunks),
        mod,
    )
    return mod


def test_parallel_cancel_vs_register_never_drops():
    m = _load_registry_module()
    trials = 500
    dropped = 0
    for i in range(trials):
        m["_CANCEL_REGISTRY"].clear()
        m["_PENDING_CANCELS"].clear()
        cid = f"cid-{i}"
        ev = threading.Event()
        tracker = m["_TrackedCancel"](ev, cid, "thread")
        start = threading.Event()

        def do_cancel():
            start.wait()
            m["_cancel_by_cancel_id_or_stash"](cid)

        def do_enter():
            start.wait()
            tracker.__enter__()

        threads = [
            threading.Thread(target = do_cancel),
            threading.Thread(target = do_enter),
        ]
        random.shuffle(threads)
        for t in threads:
            t.start()
        start.set()
        for t in threads:
            t.join(timeout = 5.0)
            assert not t.is_alive()
        if not ev.is_set():
            dropped += 1
        tracker.__exit__(None, None, None)
    assert dropped == 0, (
        f"TOCTOU regression: {dropped}/{trials} parallel trials silently "
        f"dropped the cancel"
    )


def test_cancel_before_register_replays_atomically():
    m = _load_registry_module()
    cid = "early-cid"
    ev = threading.Event()
    tracker = m["_TrackedCancel"](ev, cid, "thread-x")
    # Cancel lands before registration: it must be stashed, not dropped.
    assert m["_cancel_by_cancel_id_or_stash"](cid) == 0
    assert cid in m["_PENDING_CANCELS"]
    # Registration replays the stashed cancel and consumes the stash entry.
    tracker.__enter__()
    assert ev.is_set()
    assert cid not in m["_PENDING_CANCELS"]
    tracker.__exit__(None, None, None)


def test_cancel_after_register_signals_without_stash():
    m = _load_registry_module()
    cid = "post-cid"
    ev = threading.Event()
    tracker = m["_TrackedCancel"](ev, cid, "thread-y")
    tracker.__enter__()
    assert m["_cancel_by_cancel_id_or_stash"](cid) == 1
    assert ev.is_set()
    assert cid not in m["_PENDING_CANCELS"]
    tracker.__exit__(None, None, None)


def test_cancel_by_keys_tolerates_empty_and_falsy_keys():
    m = _load_registry_module()
    m["_CANCEL_REGISTRY"].clear()
    m["_PENDING_CANCELS"].clear()
    assert m["_cancel_by_keys"]([]) == 0
    assert m["_cancel_by_keys"](["", None, "unknown"]) == 0
    # Non-stashing fallback must never leak into _PENDING_CANCELS.
    assert m["_PENDING_CANCELS"] == {}


def test_cancel_by_keys_fans_out_to_all_streams_on_same_session():
    # Compare mode and other flows launch concurrent streams under a
    # shared session_id; a single session cancel POST must hit all of them.
    m = _load_registry_module()
    m["_CANCEL_REGISTRY"].clear()
    m["_PENDING_CANCELS"].clear()
    session = "shared-thread"
    ev_a = threading.Event()
    ev_b = threading.Event()
    tracker_a = m["_TrackedCancel"](ev_a, "cancel-a", session, "chatcmpl-a")
    tracker_b = m["_TrackedCancel"](ev_b, "cancel-b", session, "chatcmpl-b")
    tracker_a.__enter__()
    tracker_b.__enter__()
    try:
        assert m["_cancel_by_keys"]([session]) == 2
        assert ev_a.is_set() and ev_b.is_set()
    finally:
        tracker_a.__exit__(None, None, None)
        tracker_b.__exit__(None, None, None)
    assert session not in m["_CANCEL_REGISTRY"]


def test_cancel_by_cancel_id_is_exclusive_to_single_run():
    # cancel_id is per-run unique; cancelling run A must not touch run B
    # even when both share a session_id.
    m = _load_registry_module()
    m["_CANCEL_REGISTRY"].clear()
    m["_PENDING_CANCELS"].clear()
    session = "shared-thread-2"
    ev_a = threading.Event()
    ev_b = threading.Event()
    tracker_a = m["_TrackedCancel"](ev_a, "cancel-only-a", session, "chatcmpl-a")
    tracker_b = m["_TrackedCancel"](ev_b, "cancel-only-b", session, "chatcmpl-b")
    tracker_a.__enter__()
    tracker_b.__enter__()
    try:
        assert m["_cancel_by_cancel_id_or_stash"]("cancel-only-a") == 1
        assert ev_a.is_set()
        assert not ev_b.is_set()
    finally:
        tracker_a.__exit__(None, None, None)
        tracker_b.__exit__(None, None, None)


def test_tracked_cancel_exit_is_idempotent():
    # Outer except BaseException + the generator's finally may both call
    # __exit__ under certain race combos; must not raise.
    m = _load_registry_module()
    m["_CANCEL_REGISTRY"].clear()
    m["_PENDING_CANCELS"].clear()
    ev = threading.Event()
    tracker = m["_TrackedCancel"](ev, "cid", "sess", "chatcmpl-x")
    tracker.__enter__()
    tracker.__exit__(None, None, None)
    tracker.__exit__(None, None, None)
    tracker.__exit__(None, None, None)
    assert not m["_CANCEL_REGISTRY"]