Change the default `ftype` in `llama_model_quantize_params` from
`LLAMA_FTYPE_MOSTLY_Q5_1` to `LLAMA_FTYPE_MOSTLY_Q8_0`.
In case some external program naively uses the default quantization
params, we should probably default to a known-good type like Q8_0 rather
than Q5_1, which is rather old.
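For reference, a minimal sketch of such an external caller using the public llama.h API (file names are placeholders); because it never sets `ftype`, it now picks up Q8_0 instead of Q5_1:
```cpp
// Sketch of a caller that relies on the library defaults. The quantization
// type comes from llama_model_quantize_default_params(), whose default ftype
// this change moves from Q5_1 to Q8_0.
#include "llama.h"

int main(void) {
    llama_model_quantize_params params = llama_model_quantize_default_params();
    // params.ftype is deliberately left untouched here.
    // Input/output file names are illustrative placeholders.
    uint32_t rc = llama_model_quantize("model-f16.gguf", "model-quant.gguf", &params);
    return rc == 0 ? 0 : 1;
}
```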
* opt arc770 for Q4_0
* add for Q4_0
* update the script
* add help script for Windows
* update guide
* fix format issue
* convert line endings from DOS to Unix to fix the format issue
* fix missing -sm parameter
* switch ubuntu-latest to ubuntu-slim
* Fix the path for upload so CI doesn't fail
* Update .github/workflows/build-and-test-snapdragon.yml
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Use -slim image for key check and consistent naming for artifact dir
Signed-off-by: Max Krasnyansky <maxk@qti.qualcomm.com>
* Remove check-secret extra job
* move the QDC key check into the Run QDC jobs step specifically
* add a preceding step to check the secret for QDC jobs
---------
Signed-off-by: Max Krasnyansky <maxk@qti.qualcomm.com>
Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* ggml-webgpu: add tile flash attention fallback
* ggml-webgpu: add new fields and discard usage of mnk for tile version
* ggml-webgpu: modify the vec path to discard the mnk parameter
* ggml-webgpu: enable flash attention vec and tile versions for the browser
* ggml-webgpu: staging KV for flash attention tile version
* formatting
* turn on subgroup uniformity check
* remove Q_TILE as it is always 1 for vec path
* move row_max and exp_sum into local registers
* make different bindings with the same underlying buffer have the same usage flags
* move path selection into the shader library and have the host consume a single flash-attn decision object.
* turn off skip_validation and address buffer overlapping when nwg==1
* formatting
* merge bindings when KV buffers overlap
This change implements the third requested change in issue 20429.
Because defaults.sampling contains the reasoning budget token count and
the reasoning budget message, it's not necessary to assign them to
struct variables.
* hexagon: restore HTP_OPMASK_QUEUE
* hexagon: honor OPMASK_SKIP_COMPUTE in hmx-matmul
* hex-prof: restore op profiling
* hex-prof: enable PMU
* hexagon: simplify and improve op-queuing with full profiling support
Add separate profile descriptors.
* hexagon: remove opsync and rename opmask into opstage
opsync is no longer needed since the profiler is fully async now.
opmask name was confusing and opstage is more accurate.
* hexagon: refactor opbatch queue handling
* hexagon: add iface hooks for enabling profiler from the host
Also move all the PMU setup stuff out of the hex-utils since it's not intended for normal use.
* hexagon: make profiler mode configurable
On older devices getting PMU counters is expensive so it's now optional.
* hexagon: add support for setting profiler pmu events from env
* hexagon: simplify profiler output (no need to print buffs, etc)
* hexagon: simplify pmu counter formatting
* hexagon: add a simple profile post-proc tool
* hex-prof: add support for reading logs from stdin
* hexagon: document GGML_HEXAGON_PROFILE
* hex-prof: update default width for dims field
* hex-prof: fix linter warnings and errors
* Update ggml/src/ggml-hexagon/htp/htp-ops.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update scripts/snapdragon/ggml-hexagon-profile.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Trivikram Reddy <tamarnat@qti.qualcomm.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Add the tests that we want to run on external CI
* remove extra files
* Fix Python issues, remove the deadlock on CI
* remove unnecessary changes
* use an override in ty.toml
* fix pre-commit and try tests with the secret in the external repo, not upstream
* skip if key is unavailable
* Fix feedback
* switch hexagon to snapdragon
* cleanup
* fix secrets
* remove the copyrights at the top of the files
When testing Claude Code against llama.cpp, I noticed that only
n_past 18577 was used even when the context was 60k or more. The log
in llama-server says:
```
slot update_slots: id 3 | task 10342 | old: ... ; cch= | defa0;You are
slot update_slots: id 3 | task 10342 | new: ... ; cch= | 1c8b4;
```
I observed that the cch value changed every time. Reading up on this,
the x-anthropic-billing-header system message seems to be specially
handled inside the Anthropic API. I could remove it, but a meaningful
string is sometimes included at the end. So instead, I just replace
the changing cch checksum with fffff.
I'm treating this as an Anthropic message-body API detail. I think this
is the right way to do this, but by all means please correct me!
It's always 5 hexadecimal characters, but I've written the replacement
defensively in case they change the protocol.
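For reference, here is a minimal sketch of the masking idea (not the actual server code; the exact marker handling is my assumption based on the log above):
```cpp
// Replace the per-request "cch=" checksum with a constant so otherwise-identical
// Anthropic system messages keep a common prefix for prompt caching.
#include <regex>
#include <string>

static std::string mask_cch_checksum(const std::string & text) {
    // Defensive: match any run of hex digits after "cch=" rather than exactly
    // five characters, in case the protocol changes.
    static const std::regex cch_re("(cch=)[0-9a-fA-F]+");
    return std::regex_replace(text, cch_re, "$1fffff");
}
```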
This is very similar to 'load params' followed by 'Export JSON', but it names the JSON immediately, following the same convention used for the music download. This is very useful if you intend to always download the params for every track you download, as it saves you from having to rename one of the files.
* model-conversion : fix mmproj output file name [no ci]
This commit updates the convert-model.sh script to properly handle
mmproj output files.
The motivation for this is that currently the mmproj file uses the same
name as the original model, which causes the original model to
be overwritten and no mmproj-<model_name>.gguf to be created.
* model-conversion : use MODEL_NAME [no ci]
* gitignore: add AGENTS.local
Assisted-by: llama.cpp:local pi
Signed-off-by: Georgi Gerganov <ggerganov@gmail.com>
* gitignore: rename AGENTS.local to AGENTS.local.md
Assisted-by: llama.cpp:local pi
Signed-off-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Signed-off-by: Georgi Gerganov <ggerganov@gmail.com>
Fixes #22237: the find_library(MATH_LIBRARY m) result was being
discarded and the target was linked against the literal 'm' string.
That prevented users from overriding the math library (e.g. for AMD AOCL)
via CMake variables. Now the discovered MATH_LIBRARY is used directly.
* upgrade oneAPI to 2025.3.3
* update
* separate SYCL CI and support a release binary package for Ubuntu 24
* add dependency
* remove wrong copy lines
* add missing line
* remove other tasks to test the release for SYCL
* rm more for the test release
* fix file name
* correct the error when running
* support build for fp32/fp16
* rm ubuntu-24-sycl-fp16 as a duplicate
* refactor build setting
* update guide for the Ubuntu 24 release package, restore the release.yml for other backends
* use docker instead to install oneAPI
* use a downloaded installation package to replace docker
* use wget to download and install oneAPI, replacing the apt command
* enable ccache for oneAPI installation
* fix format error
* enable cache for oneAPI installation
* update guide
* Update .github/workflows/release.yml
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update .github/workflows/release.yml
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update .github/workflows/build-sycl.yml
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update .github/workflows/release.yml
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* sycl : fused MoE mul_mat_vec_q for TG
Create an MMVQ kernel so ggml_sycl_mul_mat_id can consolidate
n_experts_used matmuls in a single kernel launch. The kernel
also reads expert IDs directly, removing a per-call host sync.
This is similar to the CUDA backend's ggml_cuda_mul_mat_vec_q*
paths.
All types supported in the current MMVQ are supported here as well:
Q2_K, Q3_K, Q4_K, Q5_K, Q6_K, Q4_0, Q4_1, Q5_0, Q5_1, Q8_0
It will fall back to the existing per-expert path when src0 has been rewritten
by opt_for_reorder(), and for any shape the fused path doesn't handle.
test-backend-ops passes for supported type/shape combos.
Benchmark: Qwen3-Next-35B-A3B Q4_K_M on Intel Arc B70 (SYCL0),
baseline 707c0b7a6, 16k context, -fa 0.
```
build/bin/llama-bench -hf unsloth/Qwen3.5-35B-A3B-GGUF:Q4_K_M \
    -p 1024 -n 128 -d 16384 -ngl 99 -fa 0 -ub 2048 -r 2 -dev SYCL0
```
Before (3 runs on 707c0b7a6):
| test | run 1 | run 2 | run 3 |
| --------------- | ----------------:| ----------------:| ----------------:|
| pp1024 @ d16384 | 533.26 ± 4.87 | 535.20 ± 2.78 | 524.27 ± 3.10 |
| tg128 @ d16384 | 33.47 ± 0.02 | 33.31 ± 0.02 | 33.17 ± 0.05 |
After (3 runs on 707c0b7a6 + this patch):
| test | run 1 | run 2 | run 3 |
| --------------- | ----------------:| ----------------:| ----------------:|
| pp1024 @ d16384 | 534.06 ± 0.97 | 531.95 ± 0.02 | 520.94 ± 20.10 |
| tg128 @ d16384 | 45.85 ± 0.21 | 45.95 ± 0.45 | 46.22 ± 0.12 |
disclosure: Claude wrote it, but I reviewed and understand the implementation
(albeit my C is a little rusty).
* sycl: also support nvfp4 and mxfp4 expert types
* sycl: terser comments/nested dispatch in response to review
* sycl: more comment cleanup in mmvq.cpp/hpp
---------
Co-authored-by: Debian <aaron@openllmi.net.bots.is>
* shader(im2col): implement the im2col shader
* shader(im2col): clean the formatting issues
* shader(im2col): clean the editorconfig checker warning
* fix(shader): address the workgroup issues of im2col and conv2d
In #11362 HIP graphs were disabled by default because, at the time, their performance impact was negative. Due to improvements in ROCm and in our usage and construction of graphs, this is no longer true, so let's change the default.
* Only run webgpu CI on my fork
* Implement set_tensor_async
* Implement synchronize api
* Implement event creation and deletion API (see the usage sketch after this list)
* Cleanup
* Cleanup
* Comment out jobs for local CI run
* Add webgpu only workflow
* Delete .github/workflows/build-webgpu.yml
* Cleanup
* Cleanup
* Update API with function handlers
* Run clang-format
* Replace one-shot buffer with a direct queue.WriteBuffer using the buffer context
* fused rms_norm_mul + mul
* Add GGML_WEBGPU_DISABLE_FUSION to allow disabling kernel fusion.
* Decouple num_fused_ops from webgpu_context; misc cleanup
* Fix eps handling and remove disable_fusion.
* Fix code to not use C++20 initializers.
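For context on the set_tensor_async / synchronize / event commits above, here is a rough usage sketch against the public ggml-backend.h API; the WebGPU-internal implementation (e.g. the queue.WriteBuffer path) is not shown, and the signatures are assumed from the current headers:
```cpp
// Sketch of how a caller exercises the async upload and event APIs that the
// WebGPU backend now implements. Assumes the public ggml-backend.h entry points.
#include "ggml-backend.h"

static void upload_async_sketch(ggml_backend_t backend, struct ggml_tensor * t,
                                const void * host_data, size_t nbytes) {
    // Asynchronous copy: may return before the data has reached the device.
    ggml_backend_tensor_set_async(backend, t, host_data, 0, nbytes);

    // Wait on an event recorded after the copy instead of doing a full
    // ggml_backend_synchronize(backend).
    ggml_backend_event_t ev = ggml_backend_event_new(ggml_backend_get_device(backend));
    ggml_backend_event_record(ev, backend);
    ggml_backend_event_synchronize(ev);
    ggml_backend_event_free(ev);
}
```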