Commit graph

1116 commits

Author SHA1 Message Date
Concedo
7b393fa487 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	AUTHORS
#	ci/run.sh
#	docs/backend/SYCL.md
#	docs/build.md
#	docs/multimodal/minicpmo2.6.md
#	docs/multimodal/minicpmo4.0.md
#	docs/multimodal/minicpmv2.5.md
#	docs/multimodal/minicpmv2.6.md
#	docs/multimodal/minicpmv4.0.md
#	docs/multimodal/minicpmv4.5.md
#	docs/ops.md
#	docs/ops/SYCL.csv
#	docs/speculative.md
#	examples/deprecation-warning/README.md
#	examples/deprecation-warning/deprecation-warning.cpp
#	examples/model-conversion/Makefile
#	examples/model-conversion/scripts/causal/convert-model.sh
#	ggml/include/ggml-cann.h
#	ggml/src/ggml-cann/acl_tensor.cpp
#	ggml/src/ggml-cann/acl_tensor.h
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/aclnn_ops.h
#	ggml/src/ggml-cann/common.h
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-metal/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/concat.cl
#	ggml/src/ggml-opencl/kernels/repeat.cl
#	ggml/src/ggml-opencl/kernels/scale.cl
#	ggml/src/ggml-opencl/kernels/tanh.cl
#	ggml/src/ggml-sycl/CMakeLists.txt
#	ggml/src/ggml-sycl/dpct/helper.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/outprod.cpp
#	ggml/src/ggml-sycl/rope.cpp
#	ggml/src/ggml-sycl/wkv.cpp
#	src/llama-vocab.cpp
#	tests/test-autorelease.cpp
#	tests/test-backend-ops.cpp
#	tools/cvector-generator/pca.hpp
#	tools/export-lora/export-lora.cpp
#	tools/perplexity/README.md
2026-02-03 19:00:42 +08:00
Alexey Dubrov
1efb5f7ae1
vocab: add Falcon-H1-Tiny-Coder FIM tokens (#19249) 2026-02-03 08:31:01 +02:00
Georgi Gerganov
6fdddb4987
metal : support virtual devices (#18919)
* metal : support virtual devices

* cont : manage buffer type context memory

* metal : add events

* cont : implement cpy_tensor_async
2026-02-02 14:29:44 +02:00
Christian Kastner
7a4ca3cbd9
docs : Minor cleanups (#19252)
* Update old URLs to github.com/ggml-org/

* Bump copyrights
2026-02-02 08:38:55 +02:00
Concedo
ddce19db72 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/nix/package-gguf-py.nix
#	.devops/nix/scope.nix
#	common/CMakeLists.txt
#	docs/backend/SYCL.md
#	examples/lookahead/lookahead.cpp
#	examples/lookup/lookup.cpp
#	examples/sycl/run-llama2.sh
#	examples/sycl/win-run-llama2.bat
#	examples/sycl/win-test.bat
#	ggml/src/ggml-hexagon/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/flash-attn-ops.c
#	ggml/src/ggml-hexagon/htp/hvx-dump.h
#	ggml/src/ggml-hexagon/htp/hvx-reduce.h
#	ggml/src/ggml-hexagon/htp/matmul-ops.c
#	ggml/src/ggml-hexagon/htp/softmax-ops.c
#	ggml/src/ggml-hexagon/htp/unary-ops.c
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/cvt.cl
#	scripts/sync-ggml.last
2026-02-01 22:35:25 +08:00
Daniel Bevenius
f3bc98890c
memory : clarify comments for r_l and s_l tensors [no ci] (#19203)
This commit updates the comments in state_write_data to clarify that it
is handling the R and S tensors and not Key and Value tensors.
2026-01-30 15:18:41 +01:00
Concedo
a6efa9d182 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	tests/test-backend-ops.cpp
2026-01-30 20:37:37 +08:00
Daniel Bevenius
83bcdf7217
memory : remove unused tmp_buf (#19199)
This commit removes the unused tmp_buf variable from llama-kv-cache.cpp
and llama-memory-recurrent.cpp.

The tmp_buf variable was declared but never used but since it has a
non-trivial constructor/desctuctor we don't get an unused variable
warning about it.
2026-01-30 10:37:06 +01:00
Concedo
8d173f50c2 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	docs/backend/SYCL.md
#	docs/backend/snapdragon/CMakeUserPresets.json
#	docs/backend/snapdragon/README.md
#	docs/backend/snapdragon/developer.md
#	docs/ops.md
#	docs/ops/SYCL.csv
#	embd_res/templates/upstage-Solar-Open-100B.jinja
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-hexagon/CMakeLists.txt
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-sycl/element_wise.cpp
#	ggml/src/ggml-sycl/element_wise.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/flash_attn.wgsl
#	tests/test-chat.cpp
2026-01-30 15:32:59 +08:00
Georgi Gerganov
4fdbc1e4db
cuda : fix nkvo, offload and cuda graph node properties matching (#19165)
* cuda : fix nkvo

* cont : more robust cuda graph node property matching

* cont : restore pre-leafs implementation

* cont : comments + static_assert
2026-01-29 18:45:30 +02:00
Concedo
46cd17c17e Merge commit '88d23ad515' into concedo_experimental
# Conflicts:
#	CODEOWNERS
#	docs/build.md
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-zendnn/CMakeLists.txt
#	tests/test-chat-template.cpp
2026-01-29 22:25:56 +08:00
Georgi Gerganov
c5c64f72ac
llama : disable Direct IO by default (#19109)
* llama : disable Direct IO by default

* cont : override mmap if supported
2026-01-28 09:11:13 +02:00
Daniel Bevenius
eef375ce16
sampling : remove sampling branching in output_reserve (#18811)
* sampling : remove sampling branching in output_reserve

This commit updates output_reserve in llama-context.cpp to always
allocate sampling buffers regardless of whether sampling is needed for
the current batch.

The motivation for this is to avoid reallocations and branching based on
the sampling requirements of the batch.
2026-01-28 05:59:30 +01:00
Concedo
f6ece6fd37 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/check-vendor.yml
#	.github/workflows/close-issue.yml
#	.github/workflows/editorconfig.yml
#	.github/workflows/gguf-publish.yml
#	.github/workflows/labeler.yml
#	.github/workflows/pre-tokenizer-hashes.yml
#	.github/workflows/python-check-requirements.yml
#	.github/workflows/python-lint.yml
#	.github/workflows/python-type-check.yml
#	.github/workflows/server.yml
#	.github/workflows/update-ops-docs.yml
#	README.md
#	docs/build.md
#	examples/model-conversion/scripts/utils/perplexity-gen.sh
#	examples/model-conversion/scripts/utils/perplexity-run-simple.sh
#	examples/model-conversion/scripts/utils/perplexity-run.sh
#	examples/model-conversion/scripts/utils/quantize.sh
#	examples/model-conversion/scripts/utils/run-embedding-server.sh
#	ggml/src/ggml-cpu/ggml-cpu.c
#	ggml/src/ggml-hexagon/htp/flash-attn-ops.c
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-opencl/kernels/cvt.cl
#	ggml/src/ggml-opencl/kernels/mul_mv_q6_k_f32.cl
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	scripts/compare-llama-bench.py
#	tests/test-backend-ops.cpp
#	tests/test-gguf.cpp
#	tools/cli/README.md
#	tools/completion/README.md
#	tools/server/README.md
2026-01-27 23:06:13 +08:00
Georgi Gerganov
8f80d1b254
graph : fix nkvo offload with FA (#19105) 2026-01-26 20:18:34 +02:00
Georgi Gerganov
56f3ebf38e
model : add correct type for GLM 4.7 Flash (#19106) 2026-01-26 11:24:30 +02:00
Georgi Gerganov
d9c6ce46f7
kv-cache : support V-less cache (#19067)
* kv-cache : support V-less cache

* cuda : better check for V_is_K_view

* cuda : improve V_is_K_view check

* graph : add comments

* hparams : refactor
2026-01-25 15:48:56 +02:00
Georgi Gerganov
080b161995
completion : fix prompt cache for recurrent models (#19045) 2026-01-25 09:12:50 +02:00
Jakkala Mahesh
24bc238303
llama: fix integer type consistency in split helpers (#18894)
* llama: fix integer type consistency in split helpers

* llama: apply minor style fixes

* llama: remove trailing whitespace
2026-01-25 09:10:52 +02:00
Johannes Gäßler
e9fd8dcab4
llama-fit-params: keep explicit --ctx-size 0 (#19070) 2026-01-24 22:13:08 +01:00
Georgi Gerganov
557515be1e
graph : utilize ggml_build_forward_select() to avoid reallocations (#18898)
* graph : avoid branches between embedding and token inputs

* models : make deepstack graphs (e.g. Qwen3 VL) have constant topology

* ci : enable -DGGML_SCHED_NO_REALLOC=ON for server CI

* cont : pad token embeddings to n_embd_inp
2026-01-23 18:22:34 +02:00
Concedo
e8e7c357c9 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build-cache.yml
#	.github/workflows/build-cmake-pkg.yml
#	.github/workflows/build-linux-cross.yml
#	.github/workflows/build.yml
#	.github/workflows/check-vendor.yml
#	.github/workflows/close-issue.yml
#	.github/workflows/copilot-setup-steps.yml
#	.github/workflows/docker.yml
#	.github/workflows/editorconfig.yml
#	.github/workflows/gguf-publish.yml
#	.github/workflows/labeler.yml
#	.github/workflows/pre-tokenizer-hashes.yml
#	.github/workflows/python-check-requirements.yml
#	.github/workflows/python-lint.yml
#	.github/workflows/python-type-check.yml
#	.github/workflows/release.yml
#	.github/workflows/server-webui.yml
#	.github/workflows/server.yml
#	.github/workflows/update-ops-docs.yml
#	.github/workflows/winget.yml
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-zdnn/ggml-zdnn.cpp
#	requirements/requirements-tool_bench.txt
#	src/CMakeLists.txt
#	src/llama-quant.cpp
#	tests/test-backend-ops.cpp
#	tests/test-chat.cpp
#	tools/cli/cli.cpp
#	tools/server/README.md
2026-01-23 14:27:04 +08:00
Concedo
5c6cc02985 remove clblast, part 2 2026-01-23 14:09:46 +08:00
Georgi Gerganov
a5eaa1d6a3
mla : make the V tensor a view of K (#18986)
* mla : pass V as a view of K to the FA op

* cuda : adjust mla logic to new layout

* kv-cache : fix rope shift

* tests : remove comment

* cuda : fix reusable_cutoff

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-01-22 22:09:01 +02:00
Georgi Gerganov
0e4ebeb057
quant : manual overrides of tensor types take precedence (#18952) 2026-01-22 16:17:06 +02:00
Daniel Bevenius
9da3dcd753
llama : clarify nemotron-h.cpp comment about RoPE [no ci] (#18997)
This commit removes the mention of RoPE in the comment for the Q and K
computation as RoPE is not applied.
2026-01-21 18:31:34 +01:00
Concedo
4984c9bc16 Merge commit '12a4a47e6a' into concedo_experimental
# Conflicts:
#	ci/run.sh
#	examples/model-conversion/scripts/causal/run-converted-model-embeddings-logits.sh
#	examples/model-conversion/scripts/causal/run-converted-model.sh
#	examples/model-conversion/scripts/embedding/run-converted-model.sh
#	ggml/src/ggml-cann/ggml-cann.cpp
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-zdnn/ggml-zdnn.cpp
#	ggml/src/ggml-zendnn/ggml-zendnn.cpp
#	tests/CMakeLists.txt
#	tests/test-chat-parser.cpp
#	tests/test-chat-peg-parser.cpp
#	tests/test-chat.cpp
#	tools/cli/cli.cpp
2026-01-21 21:00:44 +08:00
Tarek Dakhran
ad8d85bd94
memory : add llama_memory_hybrid_iswa (#18601)
Some checks failed
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Has been cancelled
Python check requirements.txt / check-requirements (push) Has been cancelled
Python Type-Check / pyright type-check (push) Has been cancelled
* memory : add llama_memory_hybrid_iswa

* Update src/llama-memory-hybrid-iswa.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-01-21 14:30:23 +02:00
Piotr Wilkin (ilintar)
12a4a47e6a
Fix GLM 4.7 Lite MoE gating func (#18980)
* Fix GLM 4.7 MoE gating func

* Update src/models/deepseek2.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/llama-model.cpp

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2026-01-21 12:35:20 +01:00
Julius Tischbein
287a33017b
llama : Extend fallback, fix fileno for dio file, exclude case that mmap uses dio file (#18887) 2026-01-18 18:35:57 +02:00
Concedo
7f618454ff Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/labeler.yml
#	CODEOWNERS
#	docs/backend/OPENCL.md
#	docs/ops.md
#	docs/ops/CANN.csv
#	docs/ops/WebGPU.csv
#	ggml/src/ggml-blas/CMakeLists.txt
#	ggml/src/ggml-opencl/kernels/mul_mv_q6_k.cl
#	ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	ggml/src/ggml-webgpu/wgsl-shaders/cpy.tmpl.wgsl
#	ggml/src/ggml-webgpu/wgsl-shaders/set_rows.wgsl
#	tests/test-backend-ops.cpp
2026-01-18 23:24:29 +08:00
Concedo
7b4517c2fe embeddings memory usage regression fix 2026-01-18 16:26:52 +08:00
Georgi Gerganov
2fbde785bc
kv-cache : optimize KQ mask construction (#18842)
* kv-cache : optimize KQ mask construction

* cont : add explanation + improve

* cont : fix
2026-01-17 15:42:42 +02:00
Concedo
0d43bdc46d Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/batched/batched.cpp
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	src/llama-context.cpp
#	tools/cli/README.md
#	tools/completion/README.md
#	tools/server/README.md
2026-01-17 00:41:28 +08:00
Concedo
22af5f1250 Merge commit '2a13180100' into concedo_experimental
# Conflicts:
#	.devops/cann.Dockerfile
#	.devops/cpu.Dockerfile
#	.devops/cuda-new.Dockerfile
#	.devops/cuda.Dockerfile
#	.devops/intel.Dockerfile
#	.devops/llama-cli-cann.Dockerfile
#	.devops/musa.Dockerfile
#	.devops/nix/package.nix
#	.devops/rocm.Dockerfile
#	.devops/s390x.Dockerfile
#	.devops/vulkan.Dockerfile
#	.github/workflows/build-cmake-pkg.yml
#	.github/workflows/build-linux-cross.yml
#	.github/workflows/build.yml
#	.github/workflows/copilot-setup-steps.yml
#	.github/workflows/release.yml
#	.github/workflows/server-webui.yml
#	.github/workflows/server.yml
#	CMakeLists.txt
#	README.md
#	build-xcframework.sh
#	ci/run.sh
#	cmake/common.cmake
#	common/CMakeLists.txt
#	docs/backend/hexagon/CMakeUserPresets.json
#	docs/backend/hexagon/README.md
#	docs/build-riscv64-spacemit.md
#	docs/build.md
#	examples/debug/debug.cpp
#	examples/eval-callback/CMakeLists.txt
#	examples/eval-callback/eval-callback.cpp
#	examples/llama.android/lib/build.gradle.kts
#	examples/sycl/build.sh
#	examples/sycl/win-build-sycl.bat
#	ggml/src/ggml-hexagon/ggml-hexagon.cpp
#	ggml/src/ggml-hexagon/htp/CMakeLists.txt
#	ggml/src/ggml-hexagon/htp/act-ops.c
#	ggml/src/ggml-hexagon/htp/binary-ops.c
#	ggml/src/ggml-hexagon/htp/flash-attn-ops.c
#	ggml/src/ggml-hexagon/htp/get-rows-ops.c
#	ggml/src/ggml-hexagon/htp/hex-dma.c
#	ggml/src/ggml-hexagon/htp/hex-dma.h
#	ggml/src/ggml-hexagon/htp/htp-ctx.h
#	ggml/src/ggml-hexagon/htp/htp-msg.h
#	ggml/src/ggml-hexagon/htp/htp-ops.h
#	ggml/src/ggml-hexagon/htp/hvx-utils.h
#	ggml/src/ggml-hexagon/htp/main.c
#	ggml/src/ggml-hexagon/htp/matmul-ops.c
#	ggml/src/ggml-hexagon/htp/rope-ops.c
#	ggml/src/ggml-hexagon/htp/set-rows-ops.c
#	ggml/src/ggml-hexagon/htp/softmax-ops.c
#	ggml/src/ggml-hexagon/htp/unary-ops.c
#	ggml/src/ggml-hexagon/htp/worker-pool.c
#	scripts/debug-test.sh
#	scripts/serve-static.js
#	scripts/snapdragon/adb/run-bench.sh
#	scripts/snapdragon/adb/run-cli.sh
#	scripts/snapdragon/adb/run-mtmd.sh
#	scripts/snapdragon/adb/run-tool.sh
#	scripts/tool_bench.py
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tools/mtmd/clip.cpp
2026-01-16 21:52:01 +08:00
Concedo
af7811dbe1 Merge commit '3e4bb29666' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	ci/run.sh
#	cmake/common.cmake
#	examples/eval-callback/CMakeLists.txt
#	examples/model-conversion/scripts/causal/modelcard.template
#	ggml/src/ggml-cuda/fattn.cu
#	ggml/src/ggml-metal/CMakeLists.txt
#	src/CMakeLists.txt
#	tests/CMakeLists.txt
#	tests/test-arg-parser.cpp
2026-01-16 17:55:22 +08:00
Georgi Gerganov
be8e3d9515
context : do not reserve scheduler for warmups (#18867) 2026-01-15 19:35:57 +02:00
ddh0
13f1e4a9ca
llama : add adaptive-p sampler (#17927)
* initial commit for branch

* simplify constants

* add params to `struct common_params_sampling`, add reference to PR

* explicitly clamp `min_target` and `max_target` to `[0.0, 1.0]`

* add args, rename `queue_size` -> `window_size`

* improved comments

* minor

* remove old unused code from algorithm

* minor

* add power law case to `common_sampler_init`, add sampler name mappings

* clarify behaviour when `window_size = 0`

* add missing enums

* remove `target_range` param, make `target == 1` no-op, cleanup code

* oops, straggler

* add missing parameters in `server-task.cpp`

* copy from author

ref:
https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069

* remove old debug log, style nit

* fix compiler warning, add commented-out logging per token

* re-write + change parameters + simplify

* oops forgot args.cpp

* fix leftover `window_size`

* add missing values to `common_params_sampling::print()`

* with logging

* does this fix it?

* no, but does this?

* update default decay

* optimize

* fix bad merge

my git skills are lacking

* silence `missing initializer for member`

* update default decay to 0.9

* fix logging

* format (double)

* add power law to the new `samplers` vector

* log sampler init values

* improve logging messages in llama_sampler_power_law

* remove extraneous logging

* simplify target computation

last commit with debug logging!

* remove debug logging, explicitly clamp params at init

* add `use_power_law` flag + logic, minor cleanup

* update `power-law` -> `adaptive-p`

* fix cold start EMA

- `ctx->weighted_sum` is now initialized and reset to `target / (1.0f -
clamped_decay)`
- `ctx->total_weight` is now initialized and reset to `1.0f / (1.0f -
clamped_decay)`

this fixes a "cold start" problem with the moving average

* update `SHARPNESS` constant to `10.0f`

* minor style fixes

no functional changes

* minor style fixes cont.

* update `llama_sampler_adaptive_p_i` for backend sampling (ref: #17004)

* separate into `apply` + `accept` functions

* `pending_token_idx`: switch from `llama_token` to `int32`

functionally identical (`llama.h` has `typedef int32_t llama_token;`),
but its more correct now

* don't transform logits <= -1e9f

* fix masking in backend top-p, min-p

* address review comments

* typo in comments `RND` -> `RNG`

* add docs

* add recommended values in completion docs

* address PR feedback

* remove trailing whitespace (for CI `editorconfig`)

* add to adaptive-p to `common_sampler_types_from_chars`
2026-01-15 19:16:29 +02:00
Georgi Gerganov
39173bcacb
context : reserve new scheduler when graph topology changes (#18547)
Some checks failed
Python Type-Check / pyright type-check (push) Waiting to run
Copilot Setup Steps / copilot-setup-steps (push) Has been cancelled
Check Pre-Tokenizer Hashes / pre-tokenizer-hashes (push) Has been cancelled
Python check requirements.txt / check-requirements (push) Has been cancelled
* context : reserve new scheduler when graph topology changes

* cont : fix

* cont : fix reserve

* cont : reserve only when changes occur + timing

* context : add comments

* llama : reserve on sampler changes

* common : allow null common_sampler

* server : task declares needs (embd, logits, sampling)

* server : do not init sampler if not needed

* llama : fix need_reserve when unsetting a sampler

* server : consolidate slot reset/clear logic
2026-01-15 16:39:17 +02:00
Xuan-Son Nguyen
a7e6ddb8bd
lora: make sure model keep track of associated adapters (#18490)
* lora: make sure model keep track of associated adapters

* deprecate llama_adapter_lora_free

* minor : std::unordered_set over std::set

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-01-15 10:24:28 +01:00
Sigbjørn Skjæret
2a13180100
model-loader : support bool array sliding window pattern (#18850) 2026-01-15 10:12:46 +01:00
Junwon Hwang
8fb7175576
model : clean up and fix EXAONE-MoE configuration (#18840)
* Fix mismatch of EXAONE-MoE configuration

* ensure gating func is set, cleanup

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-14 19:38:21 +01:00
Aman Gupta
47f9612492
llama-model: fix unfortunate typo (#18832) 2026-01-14 17:55:15 +08:00
Daniel Benjaminsson
d34aa07193
mmap: add Haiku support by skipping RLIMIT_MEMLOCK check (#18819)
Haiku OS does not support RLIMIT_MEMLOCK, similar to visionOS/tvOS.
Skip the resource limit check on Haiku to allow mlock functionality
to work without compile errors.

Tested on Haiku with NVIDIA RTX 3080 Ti using Vulkan backend.
2026-01-14 09:11:05 +02:00
ddh0
6e36299b47
llama : print_info alignment fix (#18708)
* fix text spacing in print_info

* align all
2026-01-14 00:05:11 +01:00
Junwon Hwang
60591f01d4
model : add EXAONE MoE (#18543)
* Add EXAONE MoE implementations

Co-authored-by: Junwon Hwang <nuclear1221@gmail.com>

* Address PR feedback

* Address PR feedback

* [WIP] Add MTP for EXAONE-MoE

* Address PR feedback

* Address PR feedback

* Address PR feedback

* Address PR feedback

* Address PR feedback

* Address PR feedback

* Address PR feedback

---------

Co-authored-by: LG-AI-EXAONE <exaonemodels@lgresearch.ai>
2026-01-13 23:28:38 +01:00
Georgi Gerganov
e4832e3ae4
vocab : fix attribute overrides for harmony (#18806)
* vocab : fix attribute overrides for harmony

* cont : add warning log
2026-01-13 17:40:13 +02:00
Concedo
7d2c1c4f46 note: clip_is_mrope was moved to mtmd_decode_use_mrope upstream and no longer syncs since https://github.com/ggml-org/llama.cpp/pull/18793
Merge commit 'c1e79e610f' into concedo_experimental

# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/release.yml
#	CMakeLists.txt
#	CONTRIBUTING.md
#	MIT_LICENSE_GGML_SDCPP_LLAMACPP_ONLY.md
#	README.md
#	SECURITY.md
#	ci/run.sh
#	common/CMakeLists.txt
#	common/arg.cpp
#	docs/ops.md
#	docs/ops/BLAS.csv
#	docs/ops/zDNN.csv
#	docs/preset.md
#	examples/batched/batched.cpp
#	examples/debug/debug.cpp
#	ggml/src/ggml-blas/CMakeLists.txt
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	licenses/LICENSE-curl
#	licenses/LICENSE-httplib
#	scripts/pr2wt.sh
#	scripts/sync_vendor.py
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tools/cli/README.md
#	tools/completion/README.md
#	tools/llama-bench/llama-bench.cpp
#	tools/server/README.md
#	vendor/cpp-httplib/LICENSE
2026-01-13 23:31:14 +08:00
Concedo
0dc18c668c Merge commit 'a61c8bc3bf' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-opencl/CMakeLists.txt
#	ggml/src/ggml-opencl/ggml-opencl.cpp
#	ggml/src/ggml-webgpu/ggml-webgpu.cpp
#	scripts/pr2wt.sh
#	src/llama-model.cpp
#	tools/CMakeLists.txt
#	tools/mtmd/CMakeLists.txt
#	tools/mtmd/clip.cpp
#	tools/mtmd/clip.h
2026-01-13 23:06:50 +08:00
Ruben Ortlam
960e5e3b46
llama-mmap: fix direct-io loading fallback EOF exception (#18801)
Some checks failed
Python Type-Check / pyright type-check (push) Has been cancelled
2026-01-13 15:57:07 +01:00