Concedo
03cec02a3d
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/release.yml
# .github/workflows/winget.yml
# CODEOWNERS
# README.md
# ci/run.sh
# docs/build.md
# docs/ops.md
# docs/ops/Vulkan.csv
# ggml/CMakeLists.txt
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# scripts/sync_vendor.py
# src/CMakeLists.txt
# tests/test-json-schema-to-grammar.cpp
# tests/test-quantize-stats.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
2025-12-03 18:56:31 +08:00
Concedo
83269df91b
Merge commit '649495c9d9' into concedo_experimental
...
# Conflicts:
# CONTRIBUTING.md
# SECURITY.md
# docs/backend/SYCL.md
# examples/sycl/run-llama2.sh
# examples/sycl/run-llama3.sh
# examples/sycl/win-run-llama2.bat
# examples/sycl/win-run-llama3.bat
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/cpy.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-backend-ops.cpp
# tests/test-json-schema-to-grammar.cpp
# tools/server/CMakeLists.txt
2025-12-03 18:43:46 +08:00
Daniel Bevenius
7f3a72a8ed
ggml : remove redundant n_copies check when setting input/output (#17612)
...
This commit removes a redundant check for sched->n_copies > 1 when
setting input and output flags on tensor copies in
ggml_backend_sched_split_graph.
The motivation for this change is to clarify the code, as the outer if
statement already performs this check.
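A schematic before/after sketch of this cleanup, using toy stand-in types rather than the actual ggml source:

```cpp
// Schematic sketch of the cleanup (toy types, not the actual ggml source):
// the inner guard repeated the outer one, so it was removed without
// changing behavior.
#include <cstdio>

struct sched_state {
    int n_copies; // number of tensor copies kept for pipeline parallelism
};

static void set_copy_flags(const sched_state & sched) {
    if (sched.n_copies > 1) {
        // before the cleanup, a second `if (sched.n_copies > 1)` wrapped
        // this block, which the outer condition already guarantees
        std::printf("marking %d copies as input/output\n", sched.n_copies);
    }
}

int main() {
    set_copy_flags({4});
    return 0;
}
```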
2025-12-02 12:52:45 +01:00
Georgi Gerganov
90c72a614a
ggml : extend the GGML_SCHED_NO_REALLOC debug logic of the scheduler (#17617)
2025-12-01 12:49:33 +02:00
Concedo
bf5efcf86d
Merge commit 'd82b7a7c1d' into concedo_experimental
...
# Conflicts:
# ci/run.sh
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cuda/common.cuh
# tests/CMakeLists.txt
2025-11-30 15:43:11 +08:00
Diego Devesa
e072b2052e
ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched (#17276)
...
* ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched
Enabled in ggml-ci for testing.
* llama : update worst-case graph for unified cache
* ci : disable op offload in some tests
* fix spelling
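The bullets above describe the option at a high level; the general pattern is a debug switch that turns a silent buffer reallocation into a hard failure, so CI catches graphs whose worst-case reservation was too small. A minimal generic sketch, assuming a hypothetical EXAMPLE_SCHED_NO_REALLOC environment variable (not ggml's actual mechanism):

```cpp
// Generic sketch of the technique (hypothetical names, not ggml's code):
// a debug option that turns buffer growth into a hard failure.
#include <cstddef>
#include <cstdio>
#include <cstdlib>

struct buffer_pool {
    size_t reserved; // bytes reserved up front for the worst-case graph
};

static bool no_realloc_enabled() {
    const char * v = std::getenv("EXAMPLE_SCHED_NO_REALLOC"); // hypothetical
    return v != nullptr && *v != '0';
}

static void ensure_capacity(buffer_pool & pool, size_t needed) {
    if (needed <= pool.reserved) {
        return; // the up-front reservation was sufficient
    }
    if (no_realloc_enabled()) {
        std::fprintf(stderr, "fatal: need %zu bytes, reserved %zu\n", needed, pool.reserved);
        std::abort(); // fail loudly so CI flags the undersized reservation
    }
    pool.reserved = needed; // normal mode: grow silently
}

int main() {
    buffer_pool pool{1024};
    ensure_capacity(pool, 512);  // fits the reservation
    ensure_capacity(pool, 4096); // grows, or aborts if the option is set
    std::printf("reserved: %zu bytes\n", pool.reserved);
    return 0;
}
```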
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-28 17:33:23 +02:00
LostRuins Concedo
3fe0e39b62
Merge commit '4dca015b7e' into concedo_experimental
...
# Conflicts:
# .github/copilot-instructions.md
# README.md
# docs/ops.md
# docs/ops/CPU.csv
# docs/ops/CUDA.csv
# docs/ops/Vulkan.csv
# ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp
# src/CMakeLists.txt
# tests/test-backend-ops.cpp
2025-11-16 18:33:58 +08:00
Diego Devesa
dd091e52f8
sched : fix reserve ignoring user tensor assignments (#17232)
2025-11-13 13:14:02 +01:00
Concedo
9720aa6224
change an assert to optional testing https://github.com/LostRuins/koboldcpp/issues/1821
2025-11-02 10:30:04 +08:00
Concedo
b120e107f9
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .clang-tidy
# .devops/musa.Dockerfile
# .github/workflows/build-linux-cross.yml
# .github/workflows/build.yml
# .github/workflows/docker.yml
# .gitignore
# CODEOWNERS
# CONTRIBUTING.md
# README.md
# build-xcframework.sh
# ci/README-MUSA.md
# ci/run.sh
# common/CMakeLists.txt
# docs/docker.md
# examples/CMakeLists.txt
# examples/eval-callback/CMakeLists.txt
# examples/model-conversion/Makefile
# examples/model-conversion/README.md
# examples/model-conversion/logits.cpp
# examples/model-conversion/scripts/causal/compare-logits.py
# examples/model-conversion/scripts/causal/run-org-model.py
# examples/model-conversion/scripts/embedding/compare-embeddings-logits.sh
# examples/model-conversion/scripts/embedding/run-converted-model.sh
# examples/model-conversion/scripts/embedding/run-original-model.py
# examples/model-conversion/scripts/utils/check-nmse.py
# examples/model-conversion/scripts/utils/inspect-org-model.py
# examples/model-conversion/scripts/utils/semantic_check.py
# ggml/CMakeLists.txt
# ggml/include/ggml-zdnn.h
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/set_rows.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/set_rows.cpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-zdnn/ggml-zdnn.cpp
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-quantize-perf.cpp
# tests/test-tokenizers-repo.sh
# tools/perplexity/perplexity.cpp
# tools/server/tests/README.md
2025-09-27 17:09:14 +08:00
Johannes Gäßler
e789095502
llama: print memory breakdown on exit (#15860)
...
* llama: print memory breakdown on exit
2025-09-24 16:53:48 +02:00
Concedo
c7a1eec4e4
try to solve ttscpp oom regression
2025-09-24 17:45:28 +08:00
Concedo
0dc6b9f418
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/amx/amx.cpp
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.tmpl.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/set_rows.wgsl
# ggml/src/ggml-zdnn/ggml-zdnn.cpp
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tools/llama-bench/README.md
# tools/llama-bench/llama-bench.cpp
2025-09-21 11:38:47 +08:00
Jeff Bolz
c0b45097c3
rename optimize_graph to graph_optimize (#16082)
2025-09-18 13:46:17 -05:00
Concedo
6463f5c26b
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/release.yml
# CONTRIBUTING.md
# docs/backend/CANN.md
# examples/eval-callback/eval-callback.cpp
# examples/model-conversion/requirements.txt
# examples/model-conversion/scripts/causal/run-org-model.py
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-zdnn/ggml-zdnn.cpp
# models/templates/README.md
# requirements/requirements-convert_hf_to_gguf.txt
# requirements/requirements-convert_legacy_llama.txt
# requirements/requirements-tool_bench.txt
# tests/.gitignore
# tests/test-backend-ops.cpp
# tests/test-chat-parser.cpp
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-tokenizer-random.py
2025-09-11 22:34:45 +08:00
Jeff Bolz
e68aa10d8f
vulkan: sort graph to allow more parallel execution (#15850)
...
* vulkan: sort graph to allow more parallel execution
Add a backend proc to allow the backend to modify the graph. The
Vulkan implementation looks at which nodes depend on each other
and greedily reorders them to group together nodes that don't
depend on each other. It only reorders the nodes; it doesn't change
the contents of any of them (a toy sketch of this grouping follows below).
With #15489, this reduces the number of synchronizations needed.
* call optimize_graph per-split
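A toy sketch of the greedy grouping described above, with hypothetical types (not the ggml-vulkan code): nodes are partitioned into "waves" whose members do not depend on one another, so each wave can run concurrently with one synchronization between waves.

```cpp
// Toy sketch: greedily partition a dependency graph into "waves" of
// mutually independent nodes (hypothetical types, not ggml-vulkan code).
#include <cstdio>
#include <utility>
#include <vector>

struct Node {
    int id;
    std::vector<int> deps; // indices of nodes whose outputs this node reads
};

// Each wave contains only nodes whose dependencies were satisfied by
// earlier waves, so one barrier per wave suffices.
static std::vector<std::vector<int>> reorder_into_waves(const std::vector<Node> & nodes) {
    std::vector<std::vector<int>> waves;
    std::vector<bool> done(nodes.size(), false);
    size_t remaining = nodes.size();
    while (remaining > 0) {
        std::vector<int> wave;
        for (size_t i = 0; i < nodes.size(); ++i) {
            if (done[i]) continue;
            bool ready = true;
            for (int d : nodes[i].deps) {
                if (!done[d]) { ready = false; break; }
            }
            if (ready) wave.push_back((int) i);
        }
        if (wave.empty()) break;           // cyclic input; a valid graph is a DAG
        for (int i : wave) done[i] = true; // commit the wave after the scan
        remaining -= wave.size();
        waves.push_back(std::move(wave));
    }
    return waves;
}

int main() {
    // nodes 0 and 1 are independent; 2 reads 0; 3 reads 1 -> two waves
    std::vector<Node> graph = { {0, {}}, {1, {}}, {2, {0}}, {3, {1}} };
    const auto waves = reorder_into_waves(graph);
    std::printf("%zu barriers instead of %zu\n", waves.size(), graph.size());
    return 0;
}
```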
2025-09-09 02:10:07 +08:00
Concedo
2562129271
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# ci/run.sh
# docs/backend/CANN.md
# examples/speculative/speculative.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/flash_attn_f16.cl
# ggml/src/ggml-opencl/kernels/flash_attn_f32.cl
# ggml/src/ggml-opencl/kernels/flash_attn_f32_f16.cl
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/gguf.cpp
# src/llama-context.cpp
# tests/test-sampling.cpp
# tools/server/README.md
2025-09-03 17:16:42 +08:00
Johannes Gäßler
5d804a4938
ggml-backend: raise GGML_MAX_SPLIT_INPUTS (#15722)
2025-09-01 16:14:55 -07:00
Concedo
7e35954695
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# docs/build.md
# docs/function-calling.md
# examples/eval-callback/eval-callback.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/kleidiai/kernels.cpp
# ggml/src/ggml-cpu/kleidiai/kernels.h
# ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
# scripts/compare-llama-bench.py
# scripts/server-bench.py
# scripts/tool_bench.py
# tests/test-chat.cpp
# tools/batched-bench/batched-bench.cpp
# tools/llama-bench/llama-bench.cpp
# tools/server/README.md
2025-08-31 23:33:36 +08:00
Diego Devesa
9777032dcc
llama : separate compute buffer reserve from fattn check (#15696)
...
Exposes ggml_backend_sched_split_graph() to allow splitting the graph without allocating compute buffers and uses it to split the graph for the automatic Flash Attention check.
2025-08-31 15:49:03 +02:00
Johannes Gäßler
e81b8e4b7f
llama: use FA + max. GPU layers by default (#15434)
...
* llama: use max. GPU layers by default, auto -fa
* ggml-backend: abort instead of segfault
2025-08-30 16:32:10 +02:00
Concedo
8b8396c30c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# docs/build-s390x.md
# examples/llama.vim
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/common.h
# scripts/compare-llama-bench.py
# src/CMakeLists.txt
# tests/test-backend-ops.cpp
# tools/llama-bench/README.md
# tools/llama-bench/llama-bench.cpp
# tools/server/README.md
2025-08-23 11:35:28 +08:00
Concedo
257992d6b8
possibly unstable, needs testing for fa
2025-08-22 17:35:32 +08:00
Diego Devesa
54a241f505
sched : fix possible use of wrong ids tensor when offloading moe prompt processing (#15488)
2025-08-21 23:09:32 +02:00
Diego Devesa
5682a3745f
sched : copy only the used experts when offloading prompt processing (#15346)
2025-08-21 01:35:28 +02:00
Concedo
8a71eb03c0
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# ggml/cmake/ggml-config.cmake.in
# ggml/src/ggml-cann/CMakeLists.txt
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cuda/fattn.cu
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# requirements/requirements-convert_hf_to_gguf.txt
# scripts/compare-llama-bench.py
# tests/test-chat-template.cpp
# tests/test-chat.cpp
# tools/llama-bench/llama-bench.cpp
2025-08-07 21:23:09 +08:00
Diego Devesa
0d8831543c
ggml : fix fallback to CPU for unsupported ops (#15118)
2025-08-06 14:37:35 +02:00
Concedo
0fcfbdb93c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/musa.Dockerfile
# .github/workflows/build.yml
# .github/workflows/close-issue.yml
# ci/README.md
# docs/build.md
# docs/docker.md
# ggml/CMakeLists.txt
# ggml/cmake/ggml-config.cmake.in
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cuda/fattn-wmma-f16.cu
# ggml/src/ggml-musa/CMakeLists.txt
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tools/imatrix/README.md
# tools/imatrix/imatrix.cpp
2025-07-25 19:53:13 +08:00
Diego Devesa
c12bbde372
sched : fix multiple evaluations of the same graph with pipeline parallelism (#14855)
...
ggml-ci
2025-07-25 11:07:26 +03:00
Concedo
30675b0798
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CODEOWNERS
# docs/build.md
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tools/imatrix/README.md
# tools/imatrix/imatrix.cpp
2025-07-20 22:47:31 +08:00
Georgi Gerganov
bf9087f59a
metal : fuse add, mul + add tests (#14596)
...
ggml-ci
2025-07-18 20:37:26 +03:00
Concedo
cdda9d16e0
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/tools.sh
# build-xcframework.sh
# ci/run.sh
# examples/Miku.sh
# examples/chat-13B.sh
# examples/chat-persistent.sh
# examples/chat-vicuna.sh
# examples/chat.sh
# examples/jeopardy/jeopardy.sh
# examples/reason-act.sh
# examples/server-llama2-13B.sh
# examples/sycl/build.sh
# examples/sycl/run-llama2.sh
# examples/sycl/run-llama3.sh
# examples/ts-type-to-grammar.sh
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/element_wise.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# scripts/apple/validate-apps.sh
# scripts/apple/validate-ios.sh
# scripts/apple/validate-macos.sh
# scripts/apple/validate-tvos.sh
# scripts/apple/validate-visionos.sh
# scripts/check-requirements.sh
# scripts/ci-run.sh
# scripts/compare-commits.sh
# scripts/debug-test.sh
# scripts/gen-authors.sh
# scripts/get-hellaswag.sh
# scripts/get-pg.sh
# scripts/get-wikitext-103.sh
# scripts/get-wikitext-2.sh
# scripts/get-winogrande.sh
# scripts/hf.sh
# scripts/qnt-all.sh
# scripts/run-all-perf.sh
# scripts/run-all-ppl.sh
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.sh
# scripts/tool_bench.sh
# tests/test-backend-ops.cpp
# tests/test-lora-conversion-inference.sh
# tests/test-tokenizer-0.sh
# tools/server/README.md
2025-06-30 20:38:44 +08:00
Jeff Bolz
bd9c981d72
vulkan: Add fusion support for RMS_NORM+MUL (#14366)
...
* vulkan: Add fusion support for RMS_NORM+MUL
- Add a use_count to ggml_tensor, so we can detect if an output is used more than once.
- Change the ggml-vulkan rms_norm shader to optionally multiply by another tensor.
- Add detection logic and basic fusion logic in ggml-vulkan.
- Add some testing support for fusion. Rather than computing one node at a time, allow
for computing the whole graph and just testing one node's results. Add rms_norm_mul tests
and enable a llama test.
* extract some common fusion logic
* fix -Winconsistent-missing-override
* move ggml_can_fuse to a common function
* build fix
* C and C++ versions of can_fuse
* move use count to the graph to avoid data races and double increments when used in multiple threads
* use hash table lookup to find node index
* change use_counts to be indexed by hash table slot
* minimize hash lookups
style fixes
* last node doesn't need single use.
fix type.
handle mul operands being swapped.
* remove redundant parameter
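A toy sketch of the fusion precondition described in the bullets above (hypothetical types, not the ggml-vulkan implementation): a NORM node can be fused with the following MUL only when the MUL is the sole consumer of the NORM output, which is what the use_count tracks.

```cpp
// Illustrative sketch (toy types, not the ggml-vulkan implementation):
// detect a NORM node whose only consumer is the immediately following MUL,
// the condition under which the two can be fused into one kernel.
#include <cstdio>
#include <vector>

enum class Op { NORM, MUL, ADD };

struct TNode {
    Op  op;
    int src0      = -1; // index of first input node, -1 if a leaf
    int use_count = 0;  // how many later nodes read this node's output
};

static bool can_fuse_norm_mul(const std::vector<TNode> & nodes, size_t i) {
    if (i + 1 >= nodes.size()) return false;
    const TNode & a = nodes[i];
    const TNode & b = nodes[i + 1];
    return a.op == Op::NORM &&
           b.op == Op::MUL  &&
           b.src0 == (int) i && // the MUL consumes the NORM output...
           a.use_count == 1;    // ...and nothing else does
}

int main() {
    std::vector<TNode> g(2);
    g[0] = {Op::NORM, -1, 1};
    g[1] = {Op::MUL,   0, 1};
    std::printf("fusible: %s\n", can_fuse_norm_mul(g, 0) ? "yes" : "no");
    return 0;
}
```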
---------
Co-authored-by: slaren <slarengh@gmail.com>
2025-06-29 09:43:36 +02:00
Concedo
b08dca65ed
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# common/CMakeLists.txt
# common/arg.cpp
# common/chat.cpp
# examples/parallel/README.md
# examples/parallel/parallel.cpp
# ggml/cmake/common.cmake
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/rope.cpp
# models/ggml-vocab-bert-bge.gguf.inp
# models/ggml-vocab-bert-bge.gguf.out
# models/ggml-vocab-command-r.gguf.inp
# models/ggml-vocab-command-r.gguf.out
# models/ggml-vocab-deepseek-coder.gguf.inp
# models/ggml-vocab-deepseek-coder.gguf.out
# models/ggml-vocab-deepseek-llm.gguf.inp
# models/ggml-vocab-deepseek-llm.gguf.out
# models/ggml-vocab-falcon.gguf.inp
# models/ggml-vocab-falcon.gguf.out
# models/ggml-vocab-gpt-2.gguf.inp
# models/ggml-vocab-gpt-2.gguf.out
# models/ggml-vocab-llama-bpe.gguf.inp
# models/ggml-vocab-llama-bpe.gguf.out
# models/ggml-vocab-llama-spm.gguf.inp
# models/ggml-vocab-llama-spm.gguf.out
# models/ggml-vocab-mpt.gguf.inp
# models/ggml-vocab-mpt.gguf.out
# models/ggml-vocab-phi-3.gguf.inp
# models/ggml-vocab-phi-3.gguf.out
# models/ggml-vocab-qwen2.gguf.inp
# models/ggml-vocab-qwen2.gguf.out
# models/ggml-vocab-refact.gguf.inp
# models/ggml-vocab-refact.gguf.out
# models/ggml-vocab-starcoder.gguf.inp
# models/ggml-vocab-starcoder.gguf.out
# requirements/requirements-gguf_editor_gui.txt
# tests/CMakeLists.txt
# tests/test-chat.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
# tools/mtmd/CMakeLists.txt
# tools/run/run.cpp
# tools/server/CMakeLists.txt
2025-05-31 13:04:21 +08:00
Diego Devesa
b47ab7b8e9
sched : avoid changing cur_copy when a graph is already allocated (#13922)
2025-05-30 18:56:19 +02:00
Concedo
8c701d7ded
Merge commit '72b090da2c' into concedo_experimental
...
# Conflicts:
# docs/backend/CANN.md
# docs/function-calling.md
# examples/embedding/embedding.cpp
# examples/retrieval/retrieval.cpp
# ggml/src/ggml-cann/CMakeLists.txt
# ggml/src/ggml-cann/Doxyfile
# ggml/src/ggml-cann/acl_tensor.cpp
# ggml/src/ggml-cann/acl_tensor.h
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-sycl/binbcast.cpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/concat.cpp
# ggml/src/ggml-sycl/conv.cpp
# ggml/src/ggml-sycl/cpy.cpp
# ggml/src/ggml-sycl/dmmv.cpp
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/getrows.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/gla.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/norm.cpp
# ggml/src/ggml-sycl/outprod.cpp
# ggml/src/ggml-sycl/rope.cpp
# ggml/src/ggml-sycl/softmax.cpp
# ggml/src/ggml-sycl/tsembd.cpp
# ggml/src/ggml-sycl/wkv.cpp
# scripts/compare-commits.sh
# tests/test-chat.cpp
# tests/test-sampling.cpp
2025-05-28 00:28:41 +08:00
Diego Devesa
952f3953c1
ggml : allow CUDA graphs when using pipeline parallelism (#13814)
2025-05-27 13:05:18 +02:00
Concedo
21e31e255b
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# README.md
# build-xcframework.sh
# common/CMakeLists.txt
# examples/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-metal/ggml-metal.m
# ggml/src/ggml-metal/ggml-metal.metal
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/backend.hpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# scripts/compare-llama-bench.py
# src/CMakeLists.txt
# src/llama-model.cpp
# src/llama.cpp
# tests/test-backend-ops.cpp
# tests/test-opt.cpp
# tools/llama-bench/README.md
# tools/llama-bench/llama-bench.cpp
# tools/mtmd/CMakeLists.txt
# tools/mtmd/README.md
# tools/mtmd/clip.cpp
# tools/rpc/rpc-server.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
2025-05-13 00:28:35 +08:00
Johannes Gäßler
10d2af0eaa
llama/ggml: add LLM training support (#10544)
...
* llama/ggml: add LLM training support
more compact progress bar
llama_save_model_to_file
llama_opt_param_filter
ggml_graph_dup force_grads
refactor ggml_opt, fix test-opt
* remove logits_all
* refactor CUDA implementation for ACC
* reset graph at beginning of opt period
2025-05-12 14:44:49 +02:00
David Huang
7f323a589f
Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B (#13386)
2025-05-11 14:18:39 +02:00
Concedo
ffe23f0e93
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-sycl/ggml-sycl.cpp
# pyproject.toml
2025-05-06 23:39:45 +08:00
Johannes Gäßler
9070365020
CUDA: fix logic for clearing padding with -ngl 0 (#13320)
2025-05-05 22:32:13 +02:00
Concedo
77debb1b1b
gemma3 vision works, but is using more tokens than expected - may need resizing
2025-03-13 00:31:16 +08:00
Concedo
ec43d2b147
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# common/common.cpp
# examples/embedding/embedding.cpp
# examples/json_schema_to_grammar.py
# examples/llama.android/llama/src/main/cpp/llama-android.cpp
# examples/llama.swiftui/README.md
# examples/llama.swiftui/llama.swiftui.xcodeproj/project.pbxproj
# examples/lookahead/lookahead.cpp
# examples/parallel/parallel.cpp
# examples/passkey/passkey.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# requirements.txt
# requirements/requirements-all.txt
# scripts/fetch_server_test_models.py
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
2025-03-06 18:54:58 +08:00
mgroeber9110
5bbe6a9fe9
ggml : portability fixes for VS 2017 (#12150)
...
* Add include files for std::min/max and std::toupper/tolower
* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined
* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode
* win32: only use __restrict in MSVC if C11/C17 support is not enabled
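The pattern these bullets describe is a compiler-dispatching macro. A simplified sketch with a hypothetical EXAMPLE_RESTRICT name (ggml's own macro is GGML_RESTRICT, whose exact definition, including the C11/C17 handling, lives in the source): MSVC spells the qualifier __restrict, while GCC and Clang accept __restrict__.

```cpp
// Simplified sketch of the portability pattern (hypothetical macro name,
// not ggml's actual GGML_RESTRICT definition).
#include <cstddef>
#include <cstdio>

#if defined(_MSC_VER)
#    define EXAMPLE_RESTRICT __restrict
#else
#    define EXAMPLE_RESTRICT __restrict__
#endif

// the qualifier promises dst and src never alias, enabling vectorization
static void scale(float * EXAMPLE_RESTRICT dst,
                  const float * EXAMPLE_RESTRICT src,
                  size_t n, float k) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * k;
    }
}

int main() {
    float in[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float out[4];
    scale(out, in, 4, 2.0f);
    std::printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```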
---------
Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com>
2025-03-04 18:53:26 +02:00
Concedo
6b7d2349a7
Rewrite history to fix bad vulkan shader commits without increasing repo size
...
added dpe colab (+8 squashed commits)
Squashed commit:
[b8362da4] updated lite
[ed6c037d] move nsigma into the regular sampler stack
[ac5f61c6] relative filepath fixed
[05fe96ab] export template
[ed0a5a3e] nix_example.md: refactor (#1401)
* nix_example.md: add override example
* nix_example.md: drop graphics example, already basic nixos knowledge
* nix_example.md: format
* nix_example.md: Vulkan is disabled on macOS
Disabled in: 1ccd253acc
* nix_examples.md: nixpkgs.config.cuda{Arches -> Capabilities}
Fixes: https://github.com/LostRuins/koboldcpp/issues/1367
[675c62f7] AutoGuess: Phi 4 (mini) (#1402)
[4bf56982] phrasing
[b8c0df04] Add Rep Pen to Top N Sigma sampler chain (#1397)
- place after nsigma and before xtc (+3 squashed commits)
Squashed commit:
[87c52b97] disable VMM from HIP
[ee8906f3] edit description
[e85c0e69] Remove Unnecessary Rep Counting (#1394)
* stop counting reps
* fix range-based initializer
* strike that - reverse it
2025-03-05 00:02:20 +08:00
William Tambellini
70680c48e5
ggml : upgrade init_tensor API to return a ggml_status (#11854)
...
* Upgrade init_tensor API to return a ggml_status
To prepare for an 'abort-free' ggml
(ggml not to abort on OOMs but return an OOM status),
as agreed with Diego in the ggml repo,
upgrade the init_tensor() and view_init() APIs
to return a ggml_status.
* misc fixes
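A minimal sketch of the "abort-free" pattern this commit moves toward, with toy stand-in names for ggml_status and init_tensor: the function reports allocation failure through a status code and lets the caller decide, instead of aborting inside the library.

```cpp
// Toy sketch of the abort-free pattern (stand-in names, not ggml's types).
#include <cstdio>
#include <cstdlib>

enum example_status { // hypothetical stand-in for ggml_status
    EXAMPLE_STATUS_SUCCESS = 0,
    EXAMPLE_STATUS_ALLOC_FAILED,
};

struct tensor {
    void * data   = nullptr;
    size_t nbytes = 0;
};

static example_status init_tensor(tensor & t, size_t nbytes) {
    t.data = std::malloc(nbytes);
    if (t.data == nullptr) {
        return EXAMPLE_STATUS_ALLOC_FAILED; // report OOM instead of abort()
    }
    t.nbytes = nbytes;
    return EXAMPLE_STATUS_SUCCESS;
}

int main() {
    tensor t;
    if (init_tensor(t, 1u << 20) != EXAMPLE_STATUS_SUCCESS) {
        std::fprintf(stderr, "allocation failed, recovering gracefully\n");
        return 1;
    }
    std::printf("allocated %zu bytes\n", t.nbytes);
    std::free(t.data);
    return 0;
}
```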
---------
Co-authored-by: slaren <slarengh@gmail.com>
2025-02-28 14:41:47 +01:00
Concedo
dcfa1eca4e
Merge commit '017cc5f446' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/019-bug-misc.yml
# CODEOWNERS
# examples/batched-bench/batched-bench.cpp
# examples/batched/batched.cpp
# examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp
# examples/gritlm/gritlm.cpp
# examples/llama-bench/llama-bench.cpp
# examples/passkey/passkey.cpp
# examples/quantize-stats/quantize-stats.cpp
# examples/run/run.cpp
# examples/simple-chat/simple-chat.cpp
# examples/simple/simple.cpp
# examples/tokenize/tokenize.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-metal/CMakeLists.txt
# ggml/src/ggml-vulkan/CMakeLists.txt
# scripts/sync-ggml.last
# src/llama.cpp
# tests/test-autorelease.cpp
# tests/test-model-load-cancel.cpp
# tests/test-tokenizer-0.cpp
# tests/test-tokenizer-1-bpe.cpp
# tests/test-tokenizer-1-spm.cpp
2025-01-08 23:15:21 +08:00
Diego Devesa
017cc5f446
ggml-backend : only offload from host buffers (fix) (#11124)
2025-01-07 16:11:57 +01:00
Diego Devesa
a3d50bc022
ggml-backend : only offload from host buffers (#11120)
2025-01-07 12:38:05 +01:00