Concedo
c93c4c5505
Merge commit '4a4f7e6550' into concedo_experimental
# Conflicts:
# .github/ISSUE_TEMPLATE/011-bug-results.yml
# CODEOWNERS
# README.md
# ci/run.sh
# docs/development/HOWTO-add-model.md
# grammars/README.md
# src/llama-context.cpp
# src/llama.cpp
# tools/CMakeLists.txt
# tools/completion/README.md
# tools/llama-bench/README.md
2025-12-17 14:30:39 +08:00
Johannes Gäßler
b1f3a6e5db
llama: automatically set parameters not set by the user so as to maximize GPU utilization (#16653)
* llama: automatically fit args to free memory
llama-fit-params tool
* fix CI
* hints for bug reports, ensure no reallocation
* fix segfault with Vulkan
* add llama-fit-params to CI
* fix CI
* fix CI
* fix CI
* minor adjustments
* fix assignment of 1 dense layer
* fix logger not being reset on model load failure
* remove --n-gpu-layer hint on model load failure
* fix llama-fit-params verbosity
* fix edge case
* fix typo [no ci]
2025-12-15 09:24:59 +01:00
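The fitting commit above offloads as much of the model as fits in free VRAM. A minimal sketch of that idea, assuming a flat per-layer memory cost and a fixed reserve for non-layer buffers; the function name and cost model here are illustrative, not the actual llama.cpp fitting logic:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch: pick the largest number of transformer layers that fits
// in the free VRAM budget, after setting aside a reserve for non-layer buffers
// (KV cache, compute graph, etc.).
static int fit_gpu_layers(int64_t free_bytes, int64_t bytes_per_layer,
                          int n_layers, int64_t reserve_bytes) {
    const int64_t budget = free_bytes - reserve_bytes;
    if (budget <= 0 || bytes_per_layer <= 0) {
        return 0; // nothing fits: fall back to CPU-only
    }
    // never offload more layers than the model actually has
    return (int) std::min<int64_t>(budget / bytes_per_layer, (int64_t) n_layers);
}
```

Estimating `free_bytes` up front, rather than reallocating on failure, matches the commit's "ensure no reallocation" note.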
Concedo
5248838a05
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .devops/cann.Dockerfile
# .github/workflows/build.yml
# .github/workflows/release.yml
# .gitignore
# README.md
# common/CMakeLists.txt
# docs/ops.md
# docs/ops/Vulkan.csv
# examples/eval-callback/eval-callback.cpp
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/arch/x86/repack.cpp
# ggml/src/ggml-cpu/kleidiai/kernels.cpp
# scripts/sync-ggml.last
# src/llama-grammar.cpp
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tools/server/CMakeLists.txt
2025-11-22 18:26:13 +08:00
Georgi Gerganov
196f5083ef
common : more accurate sampling timing (#17382)
* common : more accurate sampling timing
* eval-callback : minor fixes
* cont : add time_meas impl
* cont : fix log msg [no ci]
* cont : fix multiple definitions of time_meas
* llama-cli : exclude chat template init from time measurement
* cont : print percentage of unaccounted time
* cont : do not reset timings
2025-11-20 13:40:10 +02:00
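The `time_meas` helper added in this commit is a scoped timing utility: time spent between construction and destruction is added to an accumulator, so wrapping each sampling call yields an accurate per-stage total. A generic RAII sketch of that pattern, with illustrative names rather than the actual llama.cpp implementation:

```cpp
#include <chrono>
#include <cstdint>

// Accumulates wall-clock time spent in a scope into a caller-owned counter
// (microseconds). Construction records the scope entry time; destruction adds
// the elapsed time to the accumulator, so nested or repeated scopes sum up.
struct scoped_timer {
    int64_t & acc_us;                               // accumulator to add into
    std::chrono::steady_clock::time_point t_start;  // scope entry time

    explicit scoped_timer(int64_t & acc)
        : acc_us(acc), t_start(std::chrono::steady_clock::now()) {}

    ~scoped_timer() {
        const auto t_end = std::chrono::steady_clock::now();
        acc_us += std::chrono::duration_cast<std::chrono::microseconds>(t_end - t_start).count();
    }
};
```

Measuring with a monotonic clock at the call site, instead of timing a whole loop and dividing, is what makes it possible to report the "percentage of unaccounted time" mentioned in the commit.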
Concedo
60e9f285c3
extend log
2025-06-26 18:52:44 +08:00
Johannes Gäßler
53ff6b9b9f
GGUF: C++ refactor, backend support, misc fixes (#11030)
* GGUF: C++ refactor, backend support, misc fixes
remove ggml_tensor.backend
update CODEOWNERS [no ci]
remove gguf_get_data from API
revise GGUF API data types
2025-01-07 18:01:58 +01:00
Georgi Gerganov
f66f582927
llama : refactor src/llama.cpp (#10902)
* llama : scatter llama.cpp into multiple modules (wip)
* llama : control-vector -> adapter
* llama : arch
* llama : mmap
ggml-ci
* ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
* llama : arch (cont)
ggml-ci
* llama : chat
ggml-ci
* llama : model
ggml-ci
* llama : hparams
ggml-ci
* llama : adapter
ggml-ci
* examples : fix
ggml-ci
* rebase
ggml-ci
* minor
* llama : kv cache
ggml-ci
* llama : impl
ggml-ci
* llama : batch
ggml-ci
* cont
ggml-ci
* llama : context
ggml-ci
* minor
* llama : context (cont)
ggml-ci
* llama : model loader
ggml-ci
* common : update lora
ggml-ci
* llama : quant
ggml-ci
* llama : quant (cont)
ggml-ci
* minor [no ci]
2025-01-03 10:18:53 +02:00