Concedo
f8ee5d9e25
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# src/llama-kv-cache.cpp
# tests/test-backend-ops.cpp
2025-08-25 01:53:26 +08:00
Georgi Gerganov
b730706a49
kv-cache : support layer reuse ( #15504 )
...
* kv-cache : support layer reuse
ggml-ci
* cont : update comments [no ci]
2025-08-24 13:07:07 +03:00
Concedo
90a6cb5b6c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# scripts/compare-llama-bench.py
# tests/test-chat-template.cpp
2025-08-23 23:50:12 +08:00
Concedo
a67218cf93
Revert "cherry pick seed OSS impl for early merge"
...
This reverts commit b87d49c018.
2025-08-23 23:49:29 +08:00
Piotr Wilkin (ilintar)
b1afcab804
model : add support for Seed-OSS ( #15490 )
...
* First draft
* Fix linter errors
* Added missing sinks nullptr
* Don't forget the llama-arch!
* We're through to the generation stage.
* Fix post-attention norm
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Fix RoPE type
* Fix tensor name and reorder llm_types
* Update gguf-py/gguf/constants.py
Remove nonexistent FFN_POST_NORM tensor
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-model.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Add basic chat template
* Add chat template tests
* Remake chat template test
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-chat.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Reorder llm type descriptions
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-08-23 15:21:52 +02:00
Concedo
b87d49c018
cherry pick seed OSS impl for early merge
2025-08-23 18:02:04 +08:00
Concedo
4828d0e148
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/vulkan.Dockerfile
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-webgpu/CMakeLists.txt
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py
# ggml/src/ggml-webgpu/wgsl-shaders/memset.wgsl
# tests/test-backend-ops.cpp
# tests/test-opt.cpp
2025-08-23 17:49:24 +08:00
LaffeyNyaa
21dc4ddaf2
chat : fix debug build assertion in trim function ( #15520 )
2025-08-23 10:38:30 +02:00
Concedo
8b8396c30c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# docs/build-s390x.md
# examples/llama.vim
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/common.h
# scripts/compare-llama-bench.py
# src/CMakeLists.txt
# tests/test-backend-ops.cpp
# tools/llama-bench/README.md
# tools/llama-bench/llama-bench.cpp
# tools/server/README.md
2025-08-23 11:35:28 +08:00
Georgi Gerganov
9ebebef62f
llama : remove KV cache defragmentation logic ( #15473 )
...
ggml-ci
2025-08-22 12:22:13 +03:00
Tarek Dakhran
e288693669
readme : model : mtmd : lfm2 improvements ( #15476 )
...
* Support untied embeddings
* Increase number of image tokens to 1024
* Add LFM2-VL to readme
* Actually use untied embeddings
2025-08-22 09:29:08 +02:00
Georgi Gerganov
cd36b5e5c7
llama : remove deprecated llama_kv_self API ( #15472 )
...
ggml-ci
2025-08-21 19:13:45 +03:00
Georgi Gerganov
3f196be84b
graph : remove build_attn_with_sinks overload ( #15469 )
...
ggml-ci
2025-08-21 18:44:45 +03:00
Georgi Gerganov
715a6db02c
kv-cache : drop the "unified" prefix ( #15467 )
...
* kv-cache : drop the "unified" prefix
ggml-ci
* cont : fix comment [no ci]
2025-08-21 17:00:33 +03:00
Concedo
1c41c38a6a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/cuda.Dockerfile
# CODEOWNERS
# README.md
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-opencl/ggml-opencl.cpp
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/test-chat.cpp
# tools/batched-bench/batched-bench.cpp
# tools/mtmd/clip.h
2025-08-20 20:34:45 +08:00
Georgi Gerganov
9ef6b0b835
model : add gpt-oss type strings ( #15424 )
2025-08-19 19:58:28 +03:00
Georgi Gerganov
9d262f4bad
server : remove swa_full warning ( #15399 )
2025-08-19 08:45:26 +03:00
Sigbjørn Skjæret
baa9255a45
llama : merge conts and reshapes and remove unnecessary cont ( #15380 )
...
* remove unnecessary conts and merge reshapes
* restore necessary conts
* merge more conts and reshapes
* merge even more conts and reshapes
2025-08-18 19:30:17 +02:00
Concedo
d876898476
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/cpu.Dockerfile
# .devops/cuda.Dockerfile
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/011-bug-results.yml
# .github/labeler.yml
# .github/workflows/build.yml
# .github/workflows/release.yml
# CODEOWNERS
# README.md
# docs/build-s390x.md
# docs/ops.md
# examples/eval-callback/eval-callback.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/transpose.cl
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tests/test-opt.cpp
2025-08-16 12:39:25 +08:00
Daniel Bevenius
7a0de96045
llama : add 18-layer model type for Gemma 3-270m ( #15319 )
...
This commit adds support for the 18-layer model type in the Gemma3
series, which is the size of the Gemma3-270m model.
The motivation for this commit is that this was the only change required for
Gemma3-270m to be converted to GGUF format and used with llama.cpp.
Once the model has been converted and uploaded to Huggingface it can be
used like this:
```console
$ ./build/bin/llama-cli -hf ggml-org/gemma-3-270m-GGUF:Q8_0
```
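For reference, a minimal sketch of the preceding conversion step, assuming a local checkout of the Gemma3-270m weights; the input directory and output file name here are placeholders:
```console
# convert a local Hugging Face checkpoint to GGUF at Q8_0 (paths are placeholders)
$ python convert_hf_to_gguf.py ./gemma-3-270m --outtype q8_0 --outfile gemma-3-270m-Q8_0.gguf
```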
2025-08-14 17:56:26 +02:00
Aldehir Rojas
b204a5a234
gpt-oss: implement harmony parsing ( #15181 )
...
* model : add harmony parser for gpt-oss
* gpt-oss : fix grammar trigger from causing empty stack
* gpt-oss: tweak the grammar trigger again
* gpt-oss : add support for recipient in role header
* gpt-oss : fix ungrouped tool calls in grammar
* gpt-oss : loosen function name matching during parse
* gpt-oss : clean up workarounds
* gpt-oss : add template tests
* gpt-oss : simulate thinking and tool call tags
* gpt-oss : undo think tags when reasoning_format is none
* gpt-oss : set special tokens back to user defined
* gpt-oss : update openai-gpt-oss template
* server : filter out harmony thought messages
* gpt-oss : simplify parsing
2025-08-14 17:23:11 +03:00
Concedo
7ac0102ed3
hope i didnt break anything
2025-08-14 21:42:24 +08:00
Concedo
d5876024ec
Merge commit 'f4586ee598' into concedo_experimental
...
# Conflicts:
# README.md
# docs/multimodal/minicpmo2.6.md
# docs/multimodal/minicpmv2.6.md
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/add.cl
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tools/perplexity/perplexity.cpp
# tools/server/README.md
2025-08-14 21:29:52 +08:00
Georgi Gerganov
d32e03f449
server : add SWA checkpoints ( #15293 )
...
* server : add SWA checkpoints
ggml-ci
* cont : server clean-up
* server : handle state restore fails
* llama : add extended llama_state_seq_ API
* server : do not make checkpoints if --swa-full
ggml-ci
* llama : remove flags value for NONE
* server : configure number of SWA checkpoints with CLI arg
ggml-ci
* args : fix scope of new argument
2025-08-14 14:59:50 +03:00
kallewoof
810b9fc8b9
perplexity : provide a helpful hint for has_cpl case in split_equal error. ( #15304 )
...
When attempting to run llama-perplexity on certain tasks that have coupled sequences, there is a cryptic error that does not tell you what to do, which is to set the -kvu flag. This adds a hint about that fact.
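A minimal sketch of the kind of invocation this hint targets, assuming a HellaSwag-style evaluation with coupled sequences; the model path and data file are placeholders, and the relevant part is the -kvu flag:
```console
# coupled-sequence tasks need a unified KV cache, hence -kvu
$ ./build/bin/llama-perplexity -m model.gguf --hellaswag -f hellaswag_val.txt -kvu
```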
2025-08-14 14:03:30 +03:00
Jonathan Graehl
5cdb27e091
finetune: SGD optimizer, more CLI args ( #13873 )
...
* examples/finetune -opt SGD (stochastic gradient descent) memory opt
add unit tested GGML_OPT_OPTIMIZER_SGD to ggml - avoids allocating
m, v tensors.
support finetune.cpp arg -opt SGD (or sgd). (default adamw as before)
llama 3.2-1b-F32 result: observed 11gb gpu ram (41 sec/epoch)
when using SGD instead of 19gb (55 sec/epoch) using adamw.
(wikipedia 100 lines finetune)
(
using the same GPU memory, adamw can only do before OOM 512
batch/context, reaching:
train: [███████▉] data=0000140/0000140 loss=0.02575±0.00099 acc=99.52±0.03% t=00:00:47 ETA=00:00:00
val: [███████▉] data=0000008/0000008 loss=4.76565±0.28810 acc=41.46±0.77% t=00:00:00 ETA=00:00:00
SGD is superior, though it converges slower, with max before OOM 1728
batch/context (esp see the better validation perf):
train: [███████▉] data=0000039/0000039 loss=0.00371±0.00010 acc=99.96±0.01% t=00:00:41 ETA=00:00:00
val: [███████▉] data=0000003/0000003 loss=5.11406±0.76034 acc=48.01±0.69% t=00:00:01 ETA=00:00:00
)
note: when finetuning long enough (or w/ enough -lr),
validation accuracy *eventually* drops ('catastrophic forgetting')
-lr-half (halflife) option useful for SGD to avoid oscillation or
super slow underdamped learning (makes setting -lr more forgiving).
terminal -lr for now is set by -lr-halvings, i.e. if you want at most
1/8 the initial -lr you set -lr-halvings 3.
note: objective loss not directly comparable between adamw, sgd? -
check perplexity or accuracy or consider relative improvements
for convergence
new finetune args -wd 1e-9 to enable weight decay in sgd or adamw,
and max -epochs N (default 2 as before)
cache (1 - wd*alpha) in 'adamw' opt struct -
no noticeable perf benefit, disabled (still done
for new SGD though)
since opt. memory is pre-allocated, the ggml_opt_get_optimizer_params
would probably be able to change between SGD and AdamW with each epoch
but would need to use adamw for the first (unconfirmed - no cmdline arg
to set such a policy yet)
test-opt checks adamw as before and now sgd (except for a few disabled
tests for sgd only; probably just needs logging values and adding
alternate reference values); tolerance on the 'regression'
test is broader for sgd (so we don't need many more epochs)
* Vulkan: Implement GGML_OP_OPT_STEP_SGD
* tests: Fix OPT_STEP_SGD test-backend-ops
* SGD op param store weight-decay and not 1-alpha*wd
* minor + cosmetic changes
* fix vulkan sgd
* try CI fix
---------
Co-authored-by: 0cc4m <picard12@live.de>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
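As a rough usage sketch of the flags described above, assuming the finetune example builds as llama-finetune and that the model path, data file and learning rate are placeholders:
```console
# SGD instead of AdamW, weight decay on, -lr halved down to at most 1/8 of its initial value
$ ./build/bin/llama-finetune -m model-F32.gguf -f train.txt -opt sgd -lr 1e-4 -lr-halvings 3 -wd 1e-9 -epochs 2
```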
2025-08-14 12:03:57 +02:00
Georgi Gerganov
228f724d9c
kv-cache : fix seq_rm with seq_id == -1 ( #15226 )
...
* kv-cache : fix seq_rm with seq_id == -1
ggml-ci
* cont : iterate over streams
ggml-ci
2025-08-11 13:58:24 +03:00
Daniel Bevenius
cd3069dfcb
kv-cache : log (debug) all streams in find_slot ( #15176 )
...
This commit updates `llama_kv_cache_unified::find_slot` to log
information for all streams when debug is enabled.
The motivation for this change is that currently, if a non-unified
kv-cache is used, then only one stream will be logged because the
code uses `seq_to_stream[1]`.
2025-08-11 11:21:19 +02:00
Concedo
d5b5e79035
should fix vulkan bsod
2025-08-08 10:57:50 +08:00
Xuan-Son Nguyen
50aa938901
convert : support non-mxfp4 HF model ( #15153 )
...
* convert : support non-mxfp4 HF model
* rm redundant check
* disable debug check
2025-08-07 23:26:03 +02:00
Concedo
8a71eb03c0
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# ggml/cmake/ggml-config.cmake.in
# ggml/src/ggml-cann/CMakeLists.txt
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cuda/fattn.cu
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# requirements/requirements-convert_hf_to_gguf.txt
# scripts/compare-llama-bench.py
# tests/test-chat-template.cpp
# tests/test-chat.cpp
# tools/llama-bench/llama-bench.cpp
2025-08-07 21:23:09 +08:00
Concedo
338b1fe97e
readjusted mistral and oai template, fixed compile issue on termux, updated lite, show generated token ids in debug mode
2025-08-07 21:14:48 +08:00
Sigbjørn Skjæret
65c797c4fa
chat : fix yandex chat template ( #15116 )
2025-08-06 13:26:49 +02:00
stevenkuang
25726898e8
chat : fix hunyuan auto-detection ( #15114 )
...
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
2025-08-06 11:48:30 +02:00
Concedo
6eea7b88d2
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
2025-08-06 10:51:29 +08:00
Georgi Gerganov
fd1234cb46
llama : add gpt-oss ( #15091 )
...
* oai moe
* compat with new checkpoint
* add attn sink impl
* add rope scaling yarn
* logits match with latest transformers code
* wip chat template
* rm trailing space
* use ggml_scale_bias
* rm redundant is_swa_all
* convert interleaved gate_up
* graph : fix activation function to match reference (#7 )
* vocab : handle o200k_harmony special tokens
* ggml : add attention sinks support (#1 )
* llama : add attn sinks
* ggml : add attn sinks
* cuda : add attn sinks
* vulkan : add support for sinks in softmax
remove unnecessary return
* ggml : add fused swiglu_oai op (#11 )
* ggml : add fused swiglu_oai op
* Update ggml/src/ggml-cpu/ops.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* update CUDA impl
* cont : metal impl
* add vulkan impl
* test-backend-ops : more test cases, clean up
* llama : remove unfused impl
* remove extra lines
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
* repack mxfp4 upon conversion
* clean up a bit
* enable thinking
* add quick hack to render only some special tokens
* fix bf16 conversion
* remove vocab hack
* webui ok
* support chat parsing for gpt-oss
* fix webui
* direct mapping mxfp4, FINALLY
* force using mxfp4
* properly use lazy tensor
* ggml : add mxfp4
ggml : use e8m0 conversion instead of powf
Co-authored-by: Diego Devesa <slarengh@gmail.com>
change kvalues_mxfp4 table to match e2m1 (#6 )
metal : remove quantization for now (not used)
cuda : fix disabled CUDA graphs due to ffn moe bias
vulkan : add support for mxfp4
cont : add cm2 dequant
* ggml : add ggml_add_id (#13 )
* ggml : add ggml_add_id
* add cuda impl
* llama : add weight support check for add_id
* perf opt
* add vulkan impl
* rename cuda files
* add metal impl
* allow in-place ggml_add_id
* llama : keep biases on CPU with --cpu-moe
* llama : fix compile error
ggml-ci
* cuda : add fallback for __nv_cvt_e8m0_to_bf16raw
ggml-ci
* cleanup
ggml-ci
* sycl : fix supports_op for MXFP4
ggml-ci
* fix Unknown reasoning format
* ggml-cpu : fix AVX build
ggml-ci
* fix hip build
ggml-ci
* cuda : add mxfp4 dequantization support for cuBLAS
ggml-ci
* ggml-cpu : fix mxfp4 fallback definitions for some architectures
ggml-ci
* cuda : fix version required for __nv_cvt_e8m0_to_bf16raw
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: slaren <slarengh@gmail.com>
2025-08-05 22:10:36 +03:00
Concedo
8bffbd9ce5
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-sycl/ggml-sycl.cpp
2025-08-06 00:44:19 +08:00
Juk Armstrong
c81de6e107
Fix glm4moe bug ( #15088 )
2025-08-05 13:56:44 +01:00
Concedo
7590a0ea39
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# ggml/CMakeLists.txt
# ggml/cmake/ggml-config.cmake.in
# ggml/src/CMakeLists.txt
# models/templates/README.md
# tools/imatrix/imatrix.cpp
2025-08-05 19:24:29 +08:00
compilade
ee3a9fcf88
context : fix index overflow on huge outputs ( #15080 )
...
* context : fix overflow when re-ordering huge outputs
* context : fix logits size overflow for huge batches
2025-08-05 11:27:45 +02:00
Sam
ef0144c087
model: support GLM 4.5 family of models ( #14939 )
...
* model: Add GLM 4.5 (#14921 )
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Merge in PR suggestions
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* model: Add GLM 4.5 family of models (#14921 )
1. Updated tensor_mapping.py with NextN tensor mappings
- Added proper tensor mappings for all NextN/MTP tensors in /Users/samm/git/llama.cpp/gguf-py/gguf/tensor_mapping.py
- Added mappings for: eh_proj, embed_tokens, enorm, hnorm, shared_head.head, shared_head.norm
2. Added num_nextn_predict_layers configuration
- Added LLM_KV_NUM_NEXTN_PREDICT_LAYERS constant to llama-arch.h and llama-arch.cpp
- Added num_nextn_predict_layers field to llama_hparams struct
- Updated GLM4_MOE parameter loading in llama-model.cpp to read this parameter
- Modified tensor loading logic to conditionally load NextN tensors based on num_nextn_predict_layers
- Added GGUF writer support in gguf_writer.py with add_num_nextn_predict_layers() method
- Updated conversion script to extract and write this parameter from HuggingFace config
3. Added FIM tokens for GLM4_MOE
- Added GLM-4.5's FIM tokens to llama-vocab.cpp:
- <|code_prefix|> for FIM_PRE
- <|code_suffix|> for FIM_SUF
- <|code_middle|> for FIM_MID
4. Removed manual NextN tensor handling
- Removed the special-case handling in convert_hf_to_gguf.py that manually mapped NextN tensors
- NextN tensors are now handled automatically through the proper tensor mapping system
* glm 4.5 update tensors names
* model: glm 4.5 apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* model: glm 4.5 apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* model: glm 4.5 apply suggestions from code review
* Apply suggestions from code review
* patch broken chat template
* typings fix
* add TENSOR_SKIP flag
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* Update src/llama-model-loader.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-08-04 20:29:25 +02:00
Concedo
8bd0a560f0
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# requirements/requirements-convert_hf_to_gguf_update.txt
# scripts/compare-llama-bench.py
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tools/imatrix/README.md
# tools/imatrix/imatrix.cpp
# tools/llama-bench/llama-bench.cpp
2025-08-04 22:42:02 +08:00
Concedo
d37529c0cd
add sanitize flag
2025-08-04 22:19:23 +08:00
compilade
11a3811164
memory : handle kv_unified for hybrid models ( #15050 )
2025-08-03 21:43:07 +02:00
Csaba Kecskemeti
97366dc6ab
vocab : JetBrains Mellum pre-tokenizer ( #15045 )
2025-08-03 21:38:18 +02:00
Daniel Bevenius
4fdea540bd
kv-cache : skip alignment of n_stream in kv-cache log msg [no ci] ( #15040 )
...
This commit removes the right alignment of the `n_stream` value in the
log message in the `llama_kv_cache_unified` constructor.
The motivation for this change is to enhance the readability of the log
message. Currently the output looks like this:
```console
llama_kv_cache_unified: size = 2048.00 MiB ( 4096 cells, 32 layers, 1/ 1 seqs), K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
```
Notice that the `n_stream` value is right aligned, which makes it a
little harder to read.
With the change in this commit the output will look like this:
```console
llama_kv_cache_unified: size = 2048.00 MiB ( 4096 cells, 32 layers, 1/1 seqs), K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
```
2025-08-02 17:14:57 +03:00
Georgi Gerganov
a4569c41fd
llama : enable LLAMA_SET_ROWS=1 by default ( #14959 )
...
ggml-ci
2025-08-02 17:14:21 +03:00
Douglas Hanley
339bd0268c
model : support Qwen3-Embedding ( #15023 )
2025-08-02 10:44:50 +02:00
Concedo
f430916a71
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# docs/backend/CANN.md
# docs/multimodal/minicpmo2.6.md
# docs/multimodal/minicpmv2.5.md
# docs/multimodal/minicpmv2.6.md
# examples/speculative-simple/speculative-simple.cpp
# ggml/cmake/ggml-config.cmake.in
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/repack.cpp
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/add.cl
# ggml/src/ggml-opencl/kernels/mul.cl
# scripts/compare-commits.sh
# scripts/compare-llama-bench.py
# scripts/sync-ggml.last
# tools/server/README.md
2025-08-02 10:25:10 +08:00
Concedo
b04362f831
Merge commit '00131d6eaf' into concedo_experimental
...
# Conflicts:
# docs/ops.md
# examples/save-load-state/save-load-state.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-sycl/cpy.cpp
# ggml/src/ggml-sycl/cpy.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/set_rows.cpp
# scripts/server-bench.py
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-thread-safety.cpp
# tools/llama-bench/llama-bench.cpp
2025-08-02 10:15:39 +08:00