Concedo
d20e60ddd5
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# docs/build.md
# examples/batched/batched.cpp
# examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp
# examples/deprecation-warning/deprecation-warning.cpp
# examples/eval-callback/eval-callback.cpp
# examples/gen-docs/gen-docs.cpp
# examples/gguf-hash/gguf-hash.cpp
# examples/gguf/gguf.cpp
# examples/lookahead/lookahead.cpp
# examples/lookup/lookup-create.cpp
# examples/lookup/lookup-merge.cpp
# examples/lookup/lookup-stats.cpp
# examples/lookup/lookup.cpp
# examples/parallel/parallel.cpp
# examples/passkey/passkey.cpp
# examples/retrieval/retrieval.cpp
# examples/save-load-state/save-load-state.cpp
# examples/simple-chat/simple-chat.cpp
# examples/simple/simple.cpp
# examples/speculative-simple/speculative-simple.cpp
# examples/speculative/speculative.cpp
# examples/sycl/ls-sycl-device.cpp
# examples/training/finetune.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/amx/common.h
# ggml/src/ggml-cpu/kleidiai/kernels.cpp
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-opencl/kernels/gemv_noshuffle_general_q8_0_f32.cl
# ggml/src/ggml-opencl/kernels/transpose.cl
# ggml/src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl
# scripts/get-wikitext-2.sh
# tests/test-backend-ops.cpp
# tools/batched-bench/batched-bench.cpp
# tools/cvector-generator/cvector-generator.cpp
# tools/export-lora/export-lora.cpp
# tools/imatrix/imatrix.cpp
# tools/llama-bench/llama-bench.cpp
# tools/perplexity/perplexity.cpp
# tools/rpc/rpc-server.cpp
# tools/tokenize/tokenize.cpp
2026-03-06 21:19:49 +08:00
ddh0
c99909dd0b
impl : use 6 digits for tensor dims (#20094)
...
Many models have vocabulary sizes, and thus tensor shapes, with more
than 5 digits (e.g. Gemma 3's vocab size is 262,208).
I already fixed this in one place but missed it for
`llama_format_tensor_shape` until now. Oops.
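The fix above can be illustrated with a minimal sketch (the function name and field width mirror the commit's intent; the exact implementation in `llama.cpp` may differ): right-align each dimension in a 6-character field so a 262,208-entry vocab dimension no longer overflows the column.

```cpp
#include <cstdio>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: format tensor dimensions right-aligned in 6-character
// fields, so shapes with 6-digit dims (e.g. large vocab sizes) stay aligned.
static std::string format_tensor_shape(const std::vector<int64_t> & dims) {
    std::string out;
    char buf[32];
    for (size_t i = 0; i < dims.size(); ++i) {
        // "%6lld" pads shorter dims with spaces; 262208 fills the field exactly
        snprintf(buf, sizeof(buf), "%s%6lld", i ? ", " : "", (long long) dims[i]);
        out += buf;
    }
    return out;
}
```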
2026-03-04 09:53:38 +01:00
Concedo
d06700687f
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/rocm.Dockerfile
# .github/workflows/release.yml
# CMakeLists.txt
# ggml/src/ggml-cuda/common.cuh
# scripts/sync_vendor.py
# tests/test-chat.cpp
2026-02-22 09:33:13 +08:00
ddh0
492bc31978
quantize : add --dry-run option (#19526)
...
* clean slate for branch
* use 6 characters for tensor dims
* add --dry-run to llama-quantize
* use 6 characters for tensor dims (cont.)
* no need to re-calculate ggml_nbytes for tensor
* fix indent
* show model and quant BPW when quant completes
* add example to --help
* new function `tensor_requires_imatrix`, add courtesy warning about imatrix
* missing __func__, move imatrix flag set
* logic error
* fixup tensor_requires_imatrix
* add missing `GGML_TYPE`s
* simplify and rename `tensor_type_requires_imatrix`
* simplify for style
* add back Q2_K edge case for imatrix
* guard ftype imatrix warning
* comment ref #12557
* remove per @compilade
* remove unused `params` parameter
* move `bool dry_run` per GG
* move `bool dry_run` per GG
* Update src/llama-quant.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-quant.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-quant.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
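The dry-run idea in this commit can be sketched as follows (a hypothetical simplification, not the actual `llama-quant.cpp` code; the `tensor_info` struct and `quantize_dry_run` function are illustrative names): walk the tensor list and total the post-quantization byte counts without ever opening the output file, so the user can see the resulting model size and BPW up front.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch of a quantization dry run: estimate the output size
// from each tensor's element count and the target type's bits-per-weight,
// skipping the write pass entirely.
struct tensor_info {
    std::string name;
    int64_t     n_elements;
    double      bpw;        // bits per weight of the target quant type
};

static int64_t quantize_dry_run(const std::vector<tensor_info> & tensors) {
    int64_t total_bytes = 0;
    for (const auto & t : tensors) {
        total_bytes += (int64_t)(t.n_elements * t.bpw / 8.0);
    }
    return total_bytes;   // report this without touching the output file
}
```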
2026-02-20 09:20:16 +01:00
Concedo
c93c4c5505
Merge commit '4a4f7e6550' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/011-bug-results.yml
# CODEOWNERS
# README.md
# ci/run.sh
# docs/development/HOWTO-add-model.md
# grammars/README.md
# src/llama-context.cpp
# src/llama.cpp
# tools/CMakeLists.txt
# tools/completion/README.md
# tools/llama-bench/README.md
2025-12-17 14:30:39 +08:00
Johannes Gäßler
b1f3a6e5db
llama: automatically set parameters not set by the user in a way that maximizes GPU utilization (#16653)
...
* llama: automatically fit args to free memory
llama-fit-params tool
* fix CI
* hints for bug reports, ensure no reallocation
* fix segfault with Vulkan
* add llama-fit-params to CI
* fix CI
* fix CI
* fix CI
* minor adjustments
* fix assignment of 1 dense layer
* fix logger not being reset on model load failure
* remove --n-gpu-layer hint on model load failure
* fix llama-fit-params verbosity
* fix edge case
* fix typo [no ci]
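The core fitting idea behind `llama-fit-params` can be sketched roughly as follows (a hypothetical simplification: the real tool measures allocation sizes per backend, while `fit_n_gpu_layers` and its per-layer estimate here are illustrative): given the free device memory and an estimated per-layer footprint, pick the largest layer count that still fits.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch: choose an --n-gpu-layers value from free VRAM and an
// estimated per-layer memory cost, capped at the model's total layer count.
static int fit_n_gpu_layers(int64_t free_vram, int64_t bytes_per_layer, int n_layers_total) {
    if (bytes_per_layer <= 0) {
        return 0;   // avoid division by zero on a bogus estimate
    }
    int fit = (int)(free_vram / bytes_per_layer);
    return std::min(fit, n_layers_total);
}
```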
2025-12-15 09:24:59 +01:00
Concedo
5248838a05
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/cann.Dockerfile
# .github/workflows/build.yml
# .github/workflows/release.yml
# .gitignore
# README.md
# common/CMakeLists.txt
# docs/ops.md
# docs/ops/Vulkan.csv
# examples/eval-callback/eval-callback.cpp
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/arch/x86/repack.cpp
# ggml/src/ggml-cpu/kleidiai/kernels.cpp
# scripts/sync-ggml.last
# src/llama-grammar.cpp
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tools/server/CMakeLists.txt
2025-11-22 18:26:13 +08:00
Georgi Gerganov
196f5083ef
common : more accurate sampling timing (#17382)
...
* common : more accurate sampling timing
* eval-callback : minor fixes
* cont : add time_meas impl
* cont : fix log msg [no ci]
* cont : fix multiple definitions of time_meas
* llama-cli : exclude chat template init from time measurement
* cont : print percentage of unaccounted time
* cont : do not reset timings
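The `time_meas` helper mentioned above is, in spirit, an RAII scope timer. A minimal sketch of that pattern (the `scoped_timer` name and layout are illustrative, not the actual `common` implementation): the constructor records the start time and the destructor adds the elapsed time to an external accumulator, so every exit path of the scope is measured exactly once.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical sketch of an RAII timer: accumulates the scope's elapsed
// microseconds into a caller-owned counter on destruction.
struct scoped_timer {
    int64_t & acc_us;
    std::chrono::steady_clock::time_point t0;

    explicit scoped_timer(int64_t & acc)
        : acc_us(acc), t0(std::chrono::steady_clock::now()) {}

    ~scoped_timer() {
        acc_us += std::chrono::duration_cast<std::chrono::microseconds>(
                      std::chrono::steady_clock::now() - t0).count();
    }
};
```

Typical use: declare `scoped_timer t(t_sampling_us);` at the top of the sampling call so the accumulated total stays accurate even on early returns.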
2025-11-20 13:40:10 +02:00
Concedo
60e9f285c3
extend log
2025-06-26 18:52:44 +08:00
Johannes Gäßler
53ff6b9b9f
GGUF: C++ refactor, backend support, misc fixes (#11030)
...
* GGUF: C++ refactor, backend support, misc fixes
remove ggml_tensor.backend
update CODEOWNERS [no ci]
remove gguf_get_data from API
revise GGUF API data types
2025-01-07 18:01:58 +01:00
Georgi Gerganov
f66f582927
llama : refactor src/llama.cpp (#10902)
...
* llama : scatter llama.cpp into multiple modules (wip)
* llama : control-vector -> adapter
* llama : arch
* llama : mmap
ggml-ci
* ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
* llama : arch (cont)
ggml-ci
* llama : chat
ggml-ci
* llama : model
ggml-ci
* llama : hparams
ggml-ci
* llama : adapter
ggml-ci
* examples : fix
ggml-ci
* rebase
ggml-ci
* minor
* llama : kv cache
ggml-ci
* llama : impl
ggml-ci
* llama : batch
ggml-ci
* cont
ggml-ci
* llama : context
ggml-ci
* minor
* llama : context (cont)
ggml-ci
* llama : model loader
ggml-ci
* common : update lora
ggml-ci
* llama : quant
ggml-ci
* llama : quant (cont)
ggml-ci
* minor [no ci]
2025-01-03 10:18:53 +02:00