Concedo
376aaf258c
Merge branch 'upstream' into concedo_experimental
2026-04-04 23:56:02 +08:00
Dan Hoffman
9c699074c9
server: Fix undefined timing measurement errors in server context (#21201)
...
Co-authored-by: Dan Hoffman <dhoffman@cyket.net>
2026-04-04 22:11:19 +08:00
Concedo
2e4f94822e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-self-hosted.yml
# .github/workflows/docker.yml
# ci/run.sh
# docs/build.md
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# src/llama-vocab.cpp
# tests/test-chat.cpp
# tests/test-jinja.cpp
# tools/cli/README.md
# tools/completion/README.md
# tools/server/README.md
2026-04-04 14:27:23 +08:00
Yes You Can Have Your Own
50e0ad08fb
server: save and clear idle slots on new task (--clear-idle) (#20993)
...
* server: clear idle slots KV from VRAM (LLAMA_KV_KEEP_ONLY_ACTIVE)
* server: move idle slot KV clearing to slot release
The save "cost" is now paid by the finishing request.
* server: add --kv-clear-idle flag, enable by default
* server: skip clearing last idle slot, clear on launch
* server: test --no-kv-clear-idle flag
* server: simplify on-release clearing loop
* server: remove on-release KV clearing, keep launch-only
* cont : clean-up
* tests: update log strings after --clear-idle rename
* tests: use debug tags instead of log message matching
* test: fix Windows CI by dropping temp log file unlink
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-04-03 19:02:27 +02:00
Concedo
34ad53e950
merged support for gemma4. the e2b, e4b and 26b work, the 31b does not
2026-04-03 11:07:46 +08:00
Concedo
5dee1a1cbb
Merge commit 'fbd441c379' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# AGENTS.md
# ci/run.sh
# docs/build.md
# embd_res/templates/LFM2.5-Instruct.jinja
# ggml/CMakeLists.txt
# ggml/src/ggml-cuda/fattn.cu
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/hvx-div.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/unary-ops.c
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/dequantize.hpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/flash_attn.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
# scripts/hip/gcn-cdna-vgpr-check.py
# scripts/sync-ggml.last
# tests/test-chat.cpp
2026-04-03 01:06:02 +08:00
Roger Chen
d6dac92bfd
Ignore Transfer-Encoding header. (#20269)
2026-04-02 10:41:19 +02:00
Aleksander Grygier
12dbf1da95
server: Bypass API Key validation for WebUI static bundle assets (#21269)
...
* fix: Bypass API Key validation for static bundle assets
* refactor: All bypassed routes in `public_endpoints`
* test: Update static assets API Key test
2026-04-01 21:32:15 +02:00
Concedo
31aa072da1
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/release.yml
# .gitignore
# examples/batched/batched.cpp
# examples/debug/debug.cpp
# examples/eval-callback/eval-callback.cpp
# examples/idle/idle.cpp
# examples/lookahead/lookahead.cpp
# examples/lookup/lookup-create.cpp
# examples/lookup/lookup-stats.cpp
# examples/lookup/lookup.cpp
# examples/parallel/parallel.cpp
# examples/passkey/passkey.cpp
# examples/retrieval/retrieval.cpp
# examples/save-load-state/save-load-state.cpp
# examples/speculative-simple/speculative-simple.cpp
# examples/speculative/speculative.cpp
# examples/training/finetune.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-sycl/fattn-tile.hpp
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/cpy.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py
# ggml/src/ggml-webgpu/wgsl-shaders/rope.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/soft_max.wgsl
# scripts/sync-ggml.last
# tests/export-graph-ops.cpp
# tests/test-chat.cpp
# tests/test-state-restore-fragmented.cpp
# tests/test-thread-safety.cpp
# tools/batched-bench/batched-bench.cpp
# tools/cli/cli.cpp
# tools/cvector-generator/cvector-generator.cpp
# tools/export-lora/export-lora.cpp
# tools/imatrix/imatrix.cpp
# tools/perplexity/perplexity.cpp
# tools/results/results.cpp
# tools/server/CMakeLists.txt
2026-04-01 10:54:13 +08:00
Aleksander Grygier
0fcb3760b2
fix: Use lower-case proxy headers naming (#21235)
2026-03-31 17:47:46 +02:00
Xuan-Son Nguyen
4a00bbfed6
server: (webui) no more gzip compression (#21073)
...
* webui: no more gzip
* try changing a small line
* Revert "try changing a small line"
This reverts commit 0d7a3531593d87b724d404c8727a96becab3ab07.
* fix lint
* fix test
* rebuild
* split into html/css/js
* lint
* chore: update webui build output
* chore: Update git hooks script
* server: update webui build output
* chore: Update pre-commit hook
* refactor: Cleanup
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-31 15:44:26 +02:00
Adrien Gallouët
41361c8599
common : move up common_init() and fix Windows UTF-8 logs (#21176)
...
The build info is now only for debug, so we avoid the duplicate
with `--version`.
The UTF-8 setup at the beginning is needed to avoid logging
garbage on Windows.
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-31 12:53:41 +02:00
SATISH K C
fcc2d598c8
fix: include API key in CORS proxy requests for MCP connections (#21193)
...
* fix: include API key in CORS proxy requests for MCP connections
When llama-server is started with --api-key-file and --webui-mcp-proxy,
the /cors-proxy endpoint requires authentication. The WebUI was not
including the Authorization header in proxy requests, causing MCP
connections to fail with 401.
Inject getAuthHeaders() into requestInit when useProxy is true so the
proxy request carries the Bearer token alongside the forwarded target
headers.
Fixes #21167
* fix: simplify headers assignment based on reviewer suggestion
Apply buildProxiedHeaders only when useProxy is true, pass headers
directly to the transport otherwise.
2026-03-31 10:52:34 +02:00
Piotr Wilkin (ilintar)
4453e77561
server/webui: cleanup dual representation approach, simplify to openai-compat (#21090)
...
* server/webui: cleanup dual representation approach, simplify to openai-compat
* feat: Fix regression for Agentic Loop UI
* chore: update webui build output
* refactor: Post-review code improvements
* chore: update webui build output
* refactor: Cleanup
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-31 10:42:06 +02:00
Concedo
a3a5897d93
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/intel.Dockerfile
# .github/workflows/python-type-check.yml
# embd_res/templates/Qwen3.5-4B.jinja
# examples/model-conversion/scripts/causal/compare-logits.py
# examples/model-conversion/scripts/utils/check-nmse.py
# examples/model-conversion/scripts/utils/compare_tokens.py
# examples/model-conversion/scripts/utils/semantic_check.py
# examples/sycl/build.sh
# examples/sycl/run-llama2.sh
# ggml/src/ggml-hexagon/htp/flash-attn-ops.c
# ggml/src/ggml-hexagon/htp/hex-dma.h
# ggml/src/ggml-hexagon/htp/rope-ops.c
# scripts/gen-unicode-data.py
# tests/test-chat.cpp
2026-03-30 21:41:19 +08:00
Concedo
42ad89cd86
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/cann.Dockerfile
# .devops/cpu.Dockerfile
# .devops/llama-cli-cann.Dockerfile
# .devops/nix/package.nix
# .github/workflows/build-android.yml
# .github/workflows/build-cann.yml
# .github/workflows/build-msys.yml
# .github/workflows/docker.yml
# .github/workflows/editorconfig.yml
# .github/workflows/gguf-publish.yml
# .github/workflows/python-lint.yml
# .github/workflows/release.yml
# CMakeLists.txt
# docs/backend/CANN.md
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# ggml/src/ggml-rpc/ggml-rpc.cpp
# scripts/sync_vendor.py
# tests/test-chat-auto-parser.cpp
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-reasoning-budget.cpp
# tools/cli/cli.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
2026-03-30 20:45:38 +08:00
Aleksander Grygier
389c7d4955
webui: Fix branching logic on edit message (#21175)
...
* fix: Branching logic + small refactor
* chore: update webui build output
2026-03-30 14:40:50 +02:00
Sigbjørn Skjæret
e2eb39e81c
ci : bump ty to 0.0.26 (#21156)
...
* fix incorrect type ignore comments
* bump ty to 0.0.26
2026-03-30 09:29:15 +02:00
Xuan-Son Nguyen
abf9a62161
server: wrap headers for mcp proxy (#21072)
...
* server: wrap headers for mcp proxy
* Update tools/server/server-cors-proxy.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix build
* chore: update webui build output
* chore: update webui build output
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-30 08:59:16 +02:00
BlueMöhre
968189729f
WebUI: Replace illegal nested button elements (#21026)
...
* remove/replace nested button elements
* map rest props to outer element
* solve TODO
* chore: update webui build output
2026-03-28 17:57:59 +01:00
Georgi Gerganov
edfb440a2f
server : fix processing of multiple back-to-back mtmd chunks (#21107)
2026-03-28 16:27:36 +02:00
Adrien Gallouët
3d66da1809
ci : gracefully shut down the server (#21110)
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-28 14:49:57 +01:00
Woof Dog
82b703f8bc
Document custom default webui preferences in server README (#19771)
2026-03-28 14:19:16 +01:00
Aleksander Grygier
51a84efc53
webui: Conversation forking + branching improvements (#21021)
...
* refactor: Make `DialogConfirmation` extensible with children slot
* feat: Add conversation forking logic
* feat: Conversation forking UI
* feat: Update delete/edit dialogs and logic for forks
* refactor: Improve Chat Sidebar UX and add MCP Servers entry
* refactor: Cleanup
* feat: Update message in place when editing leaf nodes
* chore: Cleanup
* chore: Cleanup
* chore: Cleanup
* chore: Cleanup
* chore: Cleanup
* chore: Cleanup
* refactor: Post-review improvements
* chore: update webui build output
* test: Update Storybook test
* chore: update webui build output
* chore: update webui build output
2026-03-28 13:38:15 +01:00
Adrien Gallouët
b0f0dd3e51
vendor : update cpp-httplib to 0.40.0 (#21100)
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-28 08:59:44 +01:00
Sigbjørn Skjæret
c46758d28f
cli : add /glob command (#21084)
...
* add /glob command
* output error when max files reached
* support globbing outside curdir
2026-03-28 02:33:04 +01:00
Adrien Gallouët
5c1a7b8355
server : add custom socket options to disable SO_REUSEPORT (#21056)
...
* server : add custom socket options to disable SO_REUSEPORT
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Add --reuse-port
$ strace -e trace=setsockopt,bind build/bin/llama-server -lv 2 --reuse-port
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_REUSEPORT, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
$ strace -e trace=setsockopt,bind build/bin/llama-server -lv 2
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Update tools/server/README.md (llama-gen-docs)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Fix windows
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-28 01:12:43 +01:00
Aldehir Rojas
59d840209a
common : inhibit lazy grammar sampler while reasoning is active (#20970)
...
* common : inhibit grammar while reasoning budget is active
* cont : update force_pos in accept
* cont : fix tests
* cont : tweak should apply logic
* cont : return early not using grammar sampler
* Add tests
* cont : prevent backend sampling when reasoning budget enabled
* cont : fix typo
---------
Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>
2026-03-27 18:30:40 +01:00
Concedo
3ec6381123
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-self-hosted.yml
# .github/workflows/build.yml
# .github/workflows/copilot-setup-steps.yml
# .github/workflows/gguf-publish.yml
# ci/run.sh
# docs/backend/OPENVINO.md
# examples/llama.android/lib/src/main/cpp/ai_chat.cpp
# ggml/src/ggml-sycl/add-id.cpp
# requirements/requirements-pydantic.txt
# tests/test-gguf.cpp
# tests/test-jinja.cpp
# tests/test-llama-archs.cpp
# tools/gguf-split/README.md
# tools/llama-bench/llama-bench.cpp
2026-03-28 01:18:20 +08:00
Kusha Gharahi
ff934e29bc
server: Introduce LLAMA_BUILD_WEBUI build flag to allow disabling the embedded web ui (#20158)
...
* introduce LLAMA_SERVER_NO_WEBUI
* LLAMA_SERVER_NO_WEBUI → LLAMA_BUILD_WEBUI
* LLAMA_BUILD_WEBUI ON by default not based on LLAMA_STANDALONE
* MIssed this
* Add useWebUi to package.nix
2026-03-27 17:25:55 +01:00
Aleksander Grygier
e6f6770515
webui: Improve Chat Messages initial scroll + auto-scroll logic + add lazy loading with transitions to content blocks (#20999)
...
* refactor: Always use agentic content renderer for Assistant Message
* feat: Improve initial scroll + auto-scroll logic + implement fade in action for content blocks
* chore: update webui build output
2026-03-27 17:01:36 +01:00
AN Long
48cda24c11
server: remove the verbose_prompt parameter (#21059)
...
* server: respect the verbose_prompt parameter
* Revert "server: respect the verbose_prompt parameter"
This reverts commit 8ed885cf375b2c8ba641c661f3667df70b9797f4.
* Remove --verbose-prompt parameter from llama-server
* Using set_examples instead of set_excludes
2026-03-27 13:36:13 +02:00
Xuan-Son Nguyen
20197b6fe3
server: add built-in tools backend support (#20898)
...
* wip: server_tools
* refactor
* displayName -> display_name
* snake_case everywhere
* rm redundant field
* change arg to --tools all
* add readme mention
* llama-gen-docs
2026-03-27 10:07:11 +01:00
Pascal
d0fa2c9fbb
Send reasoning content back to the model across turns via the reasoning_content API field (#21036)
...
* webui: send reasoning_content back to model in context
Preserve assistant reasoning across turns by extracting it from
internal tags and sending it as a separate reasoning_content field
in the API payload. The server and Jinja templates handle native
formatting (e.g. <think> tags for Qwen, GLM, DeepSeek...).
Adds "Exclude reasoning from context" toggle in Settings > Developer
(off by default, so reasoning is preserved). Includes unit tests.
* webui: add syncable parameter for excludeReasoningFromContext
* chore: update webui build output
2026-03-27 08:17:35 +01:00
Concedo
c00fe0af5a
Merge commit '9f102a1407' into concedo_experimental
...
# Conflicts:
# .devops/intel.Dockerfile
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/011-bug-results.yml
# .github/pull_request_template.md
# CODEOWNERS
# README.md
# common/CMakeLists.txt
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/binary-ops.c
# ggml/src/ggml-hexagon/htp/hex-dma.c
# ggml/src/ggml-hexagon/htp/hex-dma.h
# ggml/src/ggml-hexagon/htp/hex-dump.h
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hexagon/htp/ssm-conv.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# scripts/snapdragon/adb/run-bench.sh
# scripts/sync_vendor.py
# tests/test-backend-ops.cpp
# tools/llama-bench/llama-bench.cpp
2026-03-25 23:45:41 +08:00
Concedo
8a6c41dc5c
Merge commit '841bc203e2' into concedo_experimental
...
# Conflicts:
# .github/workflows/ai-issues.yml
# embd_res/templates/HuggingFaceTB-SmolLM3-3B.jinja
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-openvino/ggml-openvino.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-chat-auto-parser.cpp
# tests/test-jinja.cpp
# tools/cli/README.md
# tools/completion/README.md
# tools/server/README.md
2026-03-25 22:49:53 +08:00
Aleksander Grygier
69e0ecef06
webui: Fix editing assistant message without branching (#20944)
...
* fix: Editing assistant response without branching
* chore: update webui build output
2026-03-25 12:47:33 +02:00
Pascal
062cca58fc
Add SLEEPING status to the WebUI model selector (#20949)
...
* webui: handle sleeping model status, fix favourite -> favorite
* Update tools/server/webui/src/lib/components/app/models/ModelsSelectorOption.svelte
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* Update tools/server/webui/src/lib/components/app/models/ModelsSelectorOption.svelte
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* webui: fix optional event parameter in sleeping model onclick
* typo
* webui: restore orange sleeping indicator dot with hover unload
* chore: update webui build output
* webui: move stopPropagation into ActionIcon onclick, remove svelte-ignore
* chore: update webui build output
* webui: fix favourite -> favorite (UK -> US spelling) everywhere
Address review feedback from WhyNotHugo
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-25 11:02:32 +01:00
BlueMöhre
a94fdb090a
WebUI: fix edit msg form textarea height (#20830)
...
* autoresize textarea on mount
* allow textarea to grow to same height as rendered messages
* add UI build file
2026-03-24 13:17:45 +01:00
Adrien Gallouët
8c7957ca33
common : add standard Hugging Face cache support (#20775)
...
* common : add standard Hugging Face cache support
- Use HF API to find all files
- Migrate all manifests to hugging face cache at startup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Check with the quant tag
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Cleanup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Improve error handling and report API errors
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Restore common_cached_model_info and align mmproj filtering
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Prefer main when getting cached ref
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use cached files when HF API fails
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use final_path..
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Check all inputs
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 07:30:33 +01:00
Aleksander Grygier
11fb11b901
webui: Improve chat form positioning (#20901)
2026-03-23 14:30:55 +01:00
Eric Zhang
841bc203e2
docs : rerun llama-gen-docs to include new CLI args (#20892)
2026-03-23 12:33:38 +01:00
Xuan-Son Nguyen
31a5cf4c3f
server: use httplib dynamic threads (#20817)
...
* server: use httplib dynamic threads
* change to n_threads_http + 1024
2026-03-23 12:22:46 +01:00
Pascal
c44a932cf4
webui: fix --webui-config-file settings not applied on load (#20823)
...
* webui: fix --webui-config-file settings not applied on load
* chore: update webui build output
2026-03-23 11:25:35 +01:00
Xuan-Son Nguyen
49bfddeca1
server: allow router to report child instances sleep status (#20849)
...
* server: allow router to report child instances sleep status
* refactor
* move sleeping to state
* nits
2026-03-22 18:33:52 +01:00
Concedo
ef854f002e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/python-type-check.yml
# AGENTS.md
# CONTRIBUTING.md
# examples/model-conversion/scripts/embedding/run-original-model.py
# examples/model-conversion/scripts/utils/compare_tokens.py
# examples/pydantic_models_to_grammar.py
# ggml/src/ggml-rpc/ggml-rpc.cpp
# pyrightconfig.json
# scripts/compare-llama-bench.py
# scripts/jinja/jinja-tester.py
# scripts/server-bench.py
# tests/test-grammar-integration.cpp
# tests/test-grammar-parser.cpp
# tests/test-llama-grammar.cpp
# tests/test-tokenizer-random.py
# tools/cli/README.md
# tools/completion/README.md
# tools/llama-bench/llama-bench.cpp
# tools/server/README.md
2026-03-22 23:39:13 +08:00
Evgeny Kurnevsky
81bc4d3ddc
server: fix Host header (#20843)
...
It should include port when it's not default.
2026-03-22 22:29:22 +08:00
ddh0
3306dbaef7
misc : prefer ggml-org models in docs and examples (#20827)
...
* misc : prefer ggml-org models in docs and examples
Prefer referring to known-good quantizations under ggml-org rather than
3rd-party uploaders.
* remove accidentally committed file
2026-03-21 22:00:26 +01:00
Sigbjørn Skjæret
29b28a9824
ci : switch from pyright to ty (#20826)
...
* type fixes
* switch to ty
* tweak rules
* tweak more rules
* more tweaks
* final tweak
* use common import-not-found rule
2026-03-21 08:54:34 +01:00
Concedo
6054bacadd
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/ai-issues.yml
# CONTRIBUTING.md
# docs/autoparser.md
# docs/ops.md
# docs/ops/Metal.csv
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/hex-dma.h
# ggml/src/ggml-hexagon/htp/hex-utils.h
# ggml/src/ggml-hexagon/htp/htp-ctx.h
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/htp_iface.idl
# ggml/src/ggml-hexagon/htp/hvx-base.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-hip/CMakeLists.txt
# models/templates/Apriel-1.6-15b-Thinker-fixed.jinja
# models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja
# models/templates/deepseek-ai-DeepSeek-V3.1.jinja
# models/templates/llama-cpp-deepseek-r1.jinja
# models/templates/meetkai-functionary-medium-v3.1.jinja
# scripts/fetch_server_test_models.py
# scripts/snapdragon/adb/run-cli.sh
# scripts/snapdragon/adb/run-completion.sh
# scripts/snapdragon/adb/run-mtmd.sh
# scripts/snapdragon/adb/run-tool.sh
# tests/test-chat-auto-parser.cpp
# tests/test-chat-peg-parser.cpp
# tests/test-chat.cpp
# tools/cli/cli.cpp
# tools/server/README.md
2026-03-21 12:06:01 +08:00