Concedo
03cec02a3d
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/release.yml
# .github/workflows/winget.yml
# CODEOWNERS
# README.md
# ci/run.sh
# docs/build.md
# docs/ops.md
# docs/ops/Vulkan.csv
# ggml/CMakeLists.txt
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# scripts/sync_vendor.py
# src/CMakeLists.txt
# tests/test-json-schema-to-grammar.cpp
# tests/test-quantize-stats.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
2025-12-03 18:56:31 +08:00
Concedo
83269df91b
Merge commit '649495c9d9' into concedo_experimental
...
# Conflicts:
# CONTRIBUTING.md
# SECURITY.md
# docs/backend/SYCL.md
# examples/sycl/run-llama2.sh
# examples/sycl/run-llama3.sh
# examples/sycl/win-run-llama2.bat
# examples/sycl/win-run-llama3.bat
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/cpy.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-backend-ops.cpp
# tests/test-json-schema-to-grammar.cpp
# tools/server/CMakeLists.txt
2025-12-03 18:43:46 +08:00
Daniel Bevenius
7f3a72a8ed
ggml : remove redundant n_copies check when setting input/output (#17612)
...
This commit removes a redundant check for sched->n_copies > 1 when
setting input and output flags on tensor copies in
ggml_backend_sched_split_graph.
The motivation for this change is to clarify the code, as the outer if
statement already performs this check.
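A schematic before/after sketch of this cleanup, using toy stand-in types rather than the actual ggml source:

```cpp
// Schematic sketch of the cleanup (toy types, not the actual ggml source):
// the inner guard repeated the outer one, so it was removed without
// changing behavior.
#include <cstdio>

struct sched_state {
    int n_copies; // number of tensor copies kept for pipeline parallelism
};

static void set_copy_flags(const sched_state & sched) {
    if (sched.n_copies > 1) {
        // before the cleanup, a second `if (sched.n_copies > 1)` wrapped
        // this block, which the outer condition already guarantees
        std::printf("marking %d copies as input/output\n", sched.n_copies);
    }
}

int main() {
    set_copy_flags({4});
    return 0;
}
```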
2025-12-02 12:52:45 +01:00
Georgi Gerganov
90c72a614a
ggml : extend the GGML_SCHED_NO_REALLOC debug logic of the scheduler (#17617)
2025-12-01 12:49:33 +02:00
Concedo
bf5efcf86d
Merge commit 'd82b7a7c1d' into concedo_experimental
...
# Conflicts:
# ci/run.sh
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cuda/common.cuh
# tests/CMakeLists.txt
2025-11-30 15:43:11 +08:00
Diego Devesa
e072b2052e
ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched (#17276)
...
* ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched
Enabled in ggml-ci for testing.
* llama : update worst-case graph for unified cache
* ci : disable op offload in some tests
* fix spelling
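The bullets above describe the option at a high level; the general pattern is a debug switch that turns a silent buffer reallocation into a hard failure, so CI catches graphs whose worst-case reservation was too small. A minimal generic sketch, assuming a hypothetical EXAMPLE_SCHED_NO_REALLOC environment variable (not ggml's actual mechanism):

```cpp
// Generic sketch of the technique (hypothetical names, not ggml's code):
// a debug option that turns buffer growth into a hard failure.
#include <cstddef>
#include <cstdio>
#include <cstdlib>

struct buffer_pool {
    size_t reserved; // bytes reserved up front for the worst-case graph
};

static bool no_realloc_enabled() {
    const char * v = std::getenv("EXAMPLE_SCHED_NO_REALLOC"); // hypothetical
    return v != nullptr && *v != '0';
}

static void ensure_capacity(buffer_pool & pool, size_t needed) {
    if (needed <= pool.reserved) {
        return; // the up-front reservation was sufficient
    }
    if (no_realloc_enabled()) {
        std::fprintf(stderr, "fatal: need %zu bytes, reserved %zu\n", needed, pool.reserved);
        std::abort(); // fail loudly so CI flags the undersized reservation
    }
    pool.reserved = needed; // normal mode: grow silently
}

int main() {
    buffer_pool pool{1024};
    ensure_capacity(pool, 512);  // fits the reservation
    ensure_capacity(pool, 4096); // grows, or aborts if the option is set
    std::printf("reserved: %zu bytes\n", pool.reserved);
    return 0;
}
```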
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-28 17:33:23 +02:00
LostRuins Concedo
3fe0e39b62
Merge commit '4dca015b7e' into concedo_experimental
...
# Conflicts:
# .github/copilot-instructions.md
# README.md
# docs/ops.md
# docs/ops/CPU.csv
# docs/ops/CUDA.csv
# docs/ops/Vulkan.csv
# ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp
# src/CMakeLists.txt
# tests/test-backend-ops.cpp
2025-11-16 18:33:58 +08:00
Diego Devesa
dd091e52f8
sched : fix reserve ignoring user tensor assignments (#17232)
2025-11-13 13:14:02 +01:00
Concedo
9720aa6224
change an assert to optional testing https://github.com/LostRuins/koboldcpp/issues/1821
2025-11-02 10:30:04 +08:00
Concedo
b120e107f9
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .clang-tidy
# .devops/musa.Dockerfile
# .github/workflows/build-linux-cross.yml
# .github/workflows/build.yml
# .github/workflows/docker.yml
# .gitignore
# CODEOWNERS
# CONTRIBUTING.md
# README.md
# build-xcframework.sh
# ci/README-MUSA.md
# ci/run.sh
# common/CMakeLists.txt
# docs/docker.md
# examples/CMakeLists.txt
# examples/eval-callback/CMakeLists.txt
# examples/model-conversion/Makefile
# examples/model-conversion/README.md
# examples/model-conversion/logits.cpp
# examples/model-conversion/scripts/causal/compare-logits.py
# examples/model-conversion/scripts/causal/run-org-model.py
# examples/model-conversion/scripts/embedding/compare-embeddings-logits.sh
# examples/model-conversion/scripts/embedding/run-converted-model.sh
# examples/model-conversion/scripts/embedding/run-original-model.py
# examples/model-conversion/scripts/utils/check-nmse.py
# examples/model-conversion/scripts/utils/inspect-org-model.py
# examples/model-conversion/scripts/utils/semantic_check.py
# ggml/CMakeLists.txt
# ggml/include/ggml-zdnn.h
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/set_rows.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/set_rows.cpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-zdnn/ggml-zdnn.cpp
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-quantize-perf.cpp
# tests/test-tokenizers-repo.sh
# tools/perplexity/perplexity.cpp
# tools/server/tests/README.md
2025-09-27 17:09:14 +08:00
Johannes Gäßler
e789095502
llama: print memory breakdown on exit (#15860)
...
* llama: print memory breakdown on exit
2025-09-24 16:53:48 +02:00
Concedo
c7a1eec4e4
try to solve ttscpp oom regression
2025-09-24 17:45:28 +08:00
Concedo
0dc6b9f418
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/amx/amx.cpp
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.tmpl.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/set_rows.wgsl
# ggml/src/ggml-zdnn/ggml-zdnn.cpp
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
# tools/llama-bench/README.md
# tools/llama-bench/llama-bench.cpp
2025-09-21 11:38:47 +08:00
Jeff Bolz
c0b45097c3
rename optimize_graph to graph_optimize (#16082)
2025-09-18 13:46:17 -05:00
Concedo
6463f5c26b
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/release.yml
# CONTRIBUTING.md
# docs/backend/CANN.md
# examples/eval-callback/eval-callback.cpp
# examples/model-conversion/requirements.txt
# examples/model-conversion/scripts/causal/run-org-model.py
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-zdnn/ggml-zdnn.cpp
# models/templates/README.md
# requirements/requirements-convert_hf_to_gguf.txt
# requirements/requirements-convert_legacy_llama.txt
# requirements/requirements-tool_bench.txt
# tests/.gitignore
# tests/test-backend-ops.cpp
# tests/test-chat-parser.cpp
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-tokenizer-random.py
2025-09-11 22:34:45 +08:00
Jeff Bolz
e68aa10d8f
vulkan: sort graph to allow more parallel execution (#15850)
...
* vulkan: sort graph to allow more parallel execution
Add a backend proc to allow the backend to modify the graph. The
Vulkan implementation looks at which nodes depend on each other
and greedily reorders them to group together nodes that don't
depend on each other. It only reorders the nodes; it doesn't change
the contents of any of them (a toy sketch of this grouping follows below).
With #15489, this reduces the number of synchronizations needed.
* call optimize_graph per-split
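A toy sketch of the greedy grouping described above, with hypothetical types (not the ggml-vulkan code): nodes are partitioned into "waves" whose members do not depend on one another, so each wave can run concurrently with one synchronization between waves.

```cpp
// Toy sketch: greedily partition a dependency graph into "waves" of
// mutually independent nodes (hypothetical types, not ggml-vulkan code).
#include <cstdio>
#include <utility>
#include <vector>

struct Node {
    int id;
    std::vector<int> deps; // indices of nodes whose outputs this node reads
};

// Each wave contains only nodes whose dependencies were satisfied by
// earlier waves, so one barrier per wave suffices.
static std::vector<std::vector<int>> reorder_into_waves(const std::vector<Node> & nodes) {
    std::vector<std::vector<int>> waves;
    std::vector<bool> done(nodes.size(), false);
    size_t remaining = nodes.size();
    while (remaining > 0) {
        std::vector<int> wave;
        for (size_t i = 0; i < nodes.size(); ++i) {
            if (done[i]) continue;
            bool ready = true;
            for (int d : nodes[i].deps) {
                if (!done[d]) { ready = false; break; }
            }
            if (ready) wave.push_back((int) i);
        }
        if (wave.empty()) break;           // cyclic input; a valid graph is a DAG
        for (int i : wave) done[i] = true; // commit the wave after the scan
        remaining -= wave.size();
        waves.push_back(std::move(wave));
    }
    return waves;
}

int main() {
    // nodes 0 and 1 are independent; 2 reads 0; 3 reads 1 -> two waves
    std::vector<Node> graph = { {0, {}}, {1, {}}, {2, {0}}, {3, {1}} };
    const auto waves = reorder_into_waves(graph);
    std::printf("%zu barriers instead of %zu\n", waves.size(), graph.size());
    return 0;
}
```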
2025-09-09 02:10:07 +08:00
Concedo
2562129271
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# ci/run.sh
# docs/backend/CANN.md
# examples/speculative/speculative.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/flash_attn_f16.cl
# ggml/src/ggml-opencl/kernels/flash_attn_f32.cl
# ggml/src/ggml-opencl/kernels/flash_attn_f32_f16.cl
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/gguf.cpp
# src/llama-context.cpp
# tests/test-sampling.cpp
# tools/server/README.md
2025-09-03 17:16:42 +08:00
Johannes Gäßler
5d804a4938
ggml-backend: raise GGML_MAX_SPLIT_INPUTS (#15722)
2025-09-01 16:14:55 -07:00
Concedo
7e35954695
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# docs/build.md
# docs/function-calling.md
# examples/eval-callback/eval-callback.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/kleidiai/kernels.cpp
# ggml/src/ggml-cpu/kleidiai/kernels.h
# ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
# scripts/compare-llama-bench.py
# scripts/server-bench.py
# scripts/tool_bench.py
# tests/test-chat.cpp
# tools/batched-bench/batched-bench.cpp
# tools/llama-bench/llama-bench.cpp
# tools/server/README.md
2025-08-31 23:33:36 +08:00
Diego Devesa
9777032dcc
llama : separate compute buffer reserve from fattn check (#15696)
...
Exposes ggml_backend_sched_split_graph() to allow splitting the graph without allocating compute buffers and uses it to split the graph for the automatic Flash Attention check.
2025-08-31 15:49:03 +02:00
Johannes Gäßler
e81b8e4b7f
llama: use FA + max. GPU layers by default (#15434)
...
* llama: use max. GPU layers by default, auto -fa
* ggml-backend: abort instead of segfault
2025-08-30 16:32:10 +02:00
Concedo
8b8396c30c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# docs/build-s390x.md
# examples/llama.vim
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/common.h
# scripts/compare-llama-bench.py
# src/CMakeLists.txt
# tests/test-backend-ops.cpp
# tools/llama-bench/README.md
# tools/llama-bench/llama-bench.cpp
# tools/server/README.md
2025-08-23 11:35:28 +08:00
Concedo
257992d6b8
possibly unstable, needs testing for fa
2025-08-22 17:35:32 +08:00
Diego Devesa
54a241f505
sched : fix possible use of wrong ids tensor when offloading moe prompt processing (#15488)
2025-08-21 23:09:32 +02:00
Diego Devesa
5682a3745f
sched : copy only the used experts when offloading prompt processing (#15346)
2025-08-21 01:35:28 +02:00
Concedo
8a71eb03c0
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# ggml/cmake/ggml-config.cmake.in
# ggml/src/ggml-cann/CMakeLists.txt
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cuda/fattn.cu
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# requirements/requirements-convert_hf_to_gguf.txt
# scripts/compare-llama-bench.py
# tests/test-chat-template.cpp
# tests/test-chat.cpp
# tools/llama-bench/llama-bench.cpp
2025-08-07 21:23:09 +08:00
Diego Devesa
0d8831543c
ggml : fix fallback to CPU for unsupported ops (#15118)
2025-08-06 14:37:35 +02:00
Concedo
0fcfbdb93c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/musa.Dockerfile
# .github/workflows/build.yml
# .github/workflows/close-issue.yml
# ci/README.md
# docs/build.md
# docs/docker.md
# ggml/CMakeLists.txt
# ggml/cmake/ggml-config.cmake.in
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cuda/fattn-wmma-f16.cu
# ggml/src/ggml-musa/CMakeLists.txt
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tools/imatrix/README.md
# tools/imatrix/imatrix.cpp
2025-07-25 19:53:13 +08:00
Diego Devesa
c12bbde372
sched : fix multiple evaluations of the same graph with pipeline parallelism (#14855)
...
ggml-ci
2025-07-25 11:07:26 +03:00
Concedo
30675b0798
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CODEOWNERS
# docs/build.md
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tools/imatrix/README.md
# tools/imatrix/imatrix.cpp
2025-07-20 22:47:31 +08:00
Georgi Gerganov
bf9087f59a
metal : fuse add, mul + add tests (#14596)
...
ggml-ci
2025-07-18 20:37:26 +03:00
Concedo
cdda9d16e0
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/tools.sh
# build-xcframework.sh
# ci/run.sh
# examples/Miku.sh
# examples/chat-13B.sh
# examples/chat-persistent.sh
# examples/chat-vicuna.sh
# examples/chat.sh
# examples/jeopardy/jeopardy.sh
# examples/reason-act.sh
# examples/server-llama2-13B.sh
# examples/sycl/build.sh
# examples/sycl/run-llama2.sh
# examples/sycl/run-llama3.sh
# examples/ts-type-to-grammar.sh
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/element_wise.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# scripts/apple/validate-apps.sh
# scripts/apple/validate-ios.sh
# scripts/apple/validate-macos.sh
# scripts/apple/validate-tvos.sh
# scripts/apple/validate-visionos.sh
# scripts/check-requirements.sh
# scripts/ci-run.sh
# scripts/compare-commits.sh
# scripts/debug-test.sh
# scripts/gen-authors.sh
# scripts/get-hellaswag.sh
# scripts/get-pg.sh
# scripts/get-wikitext-103.sh
# scripts/get-wikitext-2.sh
# scripts/get-winogrande.sh
# scripts/hf.sh
# scripts/qnt-all.sh
# scripts/run-all-perf.sh
# scripts/run-all-ppl.sh
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.sh
# scripts/tool_bench.sh
# tests/test-backend-ops.cpp
# tests/test-lora-conversion-inference.sh
# tests/test-tokenizer-0.sh
# tools/server/README.md
2025-06-30 20:38:44 +08:00
Jeff Bolz
bd9c981d72
vulkan: Add fusion support for RMS_NORM+MUL (#14366)
...
* vulkan: Add fusion support for RMS_NORM+MUL
- Add a use_count to ggml_tensor, so we can detect if an output is used more than once.
- Change the ggml-vulkan rms_norm shader to optionally multiply by another tensor.
- Add detection logic and basic fusion logic in ggml-vulkan.
- Add some testing support for fusion. Rather than computing one node at a time, allow
for computing the whole graph and just testing one node's results. Add rms_norm_mul tests
and enable a llama test.
* extract some common fusion logic
* fix -Winconsistent-missing-override
* move ggml_can_fuse to a common function
* build fix
* C and C++ versions of can_fuse
* move use count to the graph to avoid data races and double increments when used in multiple threads
* use hash table lookup to find node index
* change use_counts to be indexed by hash table slot
* minimize hash lookups
style fixes
* last node doesn't need single use.
fix type.
handle mul operands being swapped.
* remove redundant parameter
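A toy sketch of the fusion precondition described in the bullets above (hypothetical types, not the ggml-vulkan implementation): a NORM node can be fused with the following MUL only when the MUL is the sole consumer of the NORM output, which is what the use_count tracks.

```cpp
// Illustrative sketch (toy types, not the ggml-vulkan implementation):
// detect a NORM node whose only consumer is the immediately following MUL,
// the condition under which the two can be fused into one kernel.
#include <cstdio>
#include <vector>

enum class Op { NORM, MUL, ADD };

struct TNode {
    Op  op;
    int src0      = -1; // index of first input node, -1 if a leaf
    int use_count = 0;  // how many later nodes read this node's output
};

static bool can_fuse_norm_mul(const std::vector<TNode> & nodes, size_t i) {
    if (i + 1 >= nodes.size()) return false;
    const TNode & a = nodes[i];
    const TNode & b = nodes[i + 1];
    return a.op == Op::NORM &&
           b.op == Op::MUL  &&
           b.src0 == (int) i && // the MUL consumes the NORM output...
           a.use_count == 1;    // ...and nothing else does
}

int main() {
    std::vector<TNode> g(2);
    g[0] = {Op::NORM, -1, 1};
    g[1] = {Op::MUL,   0, 1};
    std::printf("fusible: %s\n", can_fuse_norm_mul(g, 0) ? "yes" : "no");
    return 0;
}
```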
---------
Co-authored-by: slaren <slarengh@gmail.com>
2025-06-29 09:43:36 +02:00
Concedo
b08dca65ed
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# common/CMakeLists.txt
# common/arg.cpp
# common/chat.cpp
# examples/parallel/README.md
# examples/parallel/parallel.cpp
# ggml/cmake/common.cmake
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/rope.cpp
# models/ggml-vocab-bert-bge.gguf.inp
# models/ggml-vocab-bert-bge.gguf.out
# models/ggml-vocab-command-r.gguf.inp
# models/ggml-vocab-command-r.gguf.out
# models/ggml-vocab-deepseek-coder.gguf.inp
# models/ggml-vocab-deepseek-coder.gguf.out
# models/ggml-vocab-deepseek-llm.gguf.inp
# models/ggml-vocab-deepseek-llm.gguf.out
# models/ggml-vocab-falcon.gguf.inp
# models/ggml-vocab-falcon.gguf.out
# models/ggml-vocab-gpt-2.gguf.inp
# models/ggml-vocab-gpt-2.gguf.out
# models/ggml-vocab-llama-bpe.gguf.inp
# models/ggml-vocab-llama-bpe.gguf.out
# models/ggml-vocab-llama-spm.gguf.inp
# models/ggml-vocab-llama-spm.gguf.out
# models/ggml-vocab-mpt.gguf.inp
# models/ggml-vocab-mpt.gguf.out
# models/ggml-vocab-phi-3.gguf.inp
# models/ggml-vocab-phi-3.gguf.out
# models/ggml-vocab-qwen2.gguf.inp
# models/ggml-vocab-qwen2.gguf.out
# models/ggml-vocab-refact.gguf.inp
# models/ggml-vocab-refact.gguf.out
# models/ggml-vocab-starcoder.gguf.inp
# models/ggml-vocab-starcoder.gguf.out
# requirements/requirements-gguf_editor_gui.txt
# tests/CMakeLists.txt
# tests/test-chat.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
# tools/mtmd/CMakeLists.txt
# tools/run/run.cpp
# tools/server/CMakeLists.txt
2025-05-31 13:04:21 +08:00
Diego Devesa
b47ab7b8e9
sched : avoid changing cur_copy when a graph is already allocated (#13922)
2025-05-30 18:56:19 +02:00
Concedo
8c701d7ded
Merge commit '72b090da2c' into concedo_experimental
...
# Conflicts:
# docs/backend/CANN.md
# docs/function-calling.md
# examples/embedding/embedding.cpp
# examples/retrieval/retrieval.cpp
# ggml/src/ggml-cann/CMakeLists.txt
# ggml/src/ggml-cann/Doxyfile
# ggml/src/ggml-cann/acl_tensor.cpp
# ggml/src/ggml-cann/acl_tensor.h
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-sycl/binbcast.cpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/concat.cpp
# ggml/src/ggml-sycl/conv.cpp
# ggml/src/ggml-sycl/cpy.cpp
# ggml/src/ggml-sycl/dmmv.cpp
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/getrows.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/gla.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/norm.cpp
# ggml/src/ggml-sycl/outprod.cpp
# ggml/src/ggml-sycl/rope.cpp
# ggml/src/ggml-sycl/softmax.cpp
# ggml/src/ggml-sycl/tsembd.cpp
# ggml/src/ggml-sycl/wkv.cpp
# scripts/compare-commits.sh
# tests/test-chat.cpp
# tests/test-sampling.cpp
2025-05-28 00:28:41 +08:00
Diego Devesa
952f3953c1
ggml : allow CUDA graphs when using pipeline parallelism (#13814)
2025-05-27 13:05:18 +02:00
Concedo
21e31e255b
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# README.md
# build-xcframework.sh
# common/CMakeLists.txt
# examples/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-metal/ggml-metal.m
# ggml/src/ggml-metal/ggml-metal.metal
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/backend.hpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# scripts/compare-llama-bench.py
# src/CMakeLists.txt
# src/llama-model.cpp
# src/llama.cpp
# tests/test-backend-ops.cpp
# tests/test-opt.cpp
# tools/llama-bench/README.md
# tools/llama-bench/llama-bench.cpp
# tools/mtmd/CMakeLists.txt
# tools/mtmd/README.md
# tools/mtmd/clip.cpp
# tools/rpc/rpc-server.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
2025-05-13 00:28:35 +08:00
Johannes Gäßler
10d2af0eaa
llama/ggml: add LLM training support (#10544)
...
* llama/ggml: add LLM training support
more compact progress bar
llama_save_model_to_file
llama_opt_param_filter
ggml_graph_dup force_grads
refactor ggml_opt, fix test-opt
* remove logits_all
* refactor CUDA implementation for ACC
* reset graph at beginning of opt period
2025-05-12 14:44:49 +02:00
David Huang
7f323a589f
Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B (#13386)
2025-05-11 14:18:39 +02:00
Concedo
ffe23f0e93
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-sycl/ggml-sycl.cpp
# pyproject.toml
2025-05-06 23:39:45 +08:00
Johannes Gäßler
9070365020
CUDA: fix logic for clearing padding with -ngl 0 (#13320)
2025-05-05 22:32:13 +02:00
Concedo
77debb1b1b
gemma3 vision works, but is using more tokens than expected - may need resizing
2025-03-13 00:31:16 +08:00
Concedo
ec43d2b147
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# common/common.cpp
# examples/embedding/embedding.cpp
# examples/json_schema_to_grammar.py
# examples/llama.android/llama/src/main/cpp/llama-android.cpp
# examples/llama.swiftui/README.md
# examples/llama.swiftui/llama.swiftui.xcodeproj/project.pbxproj
# examples/lookahead/lookahead.cpp
# examples/parallel/parallel.cpp
# examples/passkey/passkey.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# requirements.txt
# requirements/requirements-all.txt
# scripts/fetch_server_test_models.py
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
2025-03-06 18:54:58 +08:00
mgroeber9110
5bbe6a9fe9
ggml : portability fixes for VS 2017 (#12150)
...
* Add include files for std::min/max and std::toupper/tolower
* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined
* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode
* win32: only use __restrict in MSVC if C11/C17 support is not enabled
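The pattern these bullets describe is a compiler-dispatching macro. A simplified sketch with a hypothetical EXAMPLE_RESTRICT name (ggml's own macro is GGML_RESTRICT, whose exact definition, including the C11/C17 handling, lives in the source): MSVC spells the qualifier __restrict, while GCC and Clang accept __restrict__.

```cpp
// Simplified sketch of the portability pattern (hypothetical macro name,
// not ggml's actual GGML_RESTRICT definition).
#include <cstddef>
#include <cstdio>

#if defined(_MSC_VER)
#    define EXAMPLE_RESTRICT __restrict
#else
#    define EXAMPLE_RESTRICT __restrict__
#endif

// the qualifier promises dst and src never alias, enabling vectorization
static void scale(float * EXAMPLE_RESTRICT dst,
                  const float * EXAMPLE_RESTRICT src,
                  size_t n, float k) {
    for (size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * k;
    }
}

int main() {
    float in[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float out[4];
    scale(out, in, 4, 2.0f);
    std::printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```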
---------
Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com>
2025-03-04 18:53:26 +02:00
Concedo
6b7d2349a7
Rewrite history to fix bad vulkan shader commits without increasing repo size
...
added dpe colab (+8 squashed commits)
Squashed commit:
[b8362da4] updated lite
[ed6c037d] move nsigma into the regular sampler stack
[ac5f61c6] relative filepath fixed
[05fe96ab] export template
[ed0a5a3e] nix_example.md: refactor (#1401)
* nix_example.md: add override example
* nix_example.md: drop graphics example, already basic nixos knowledge
* nix_example.md: format
* nix_example.md: Vulkan is disabled on macOS
Disabled in: 1ccd253acc
* nix_examples.md: nixpkgs.config.cuda{Arches -> Capabilities}
Fixes: https://github.com/LostRuins/koboldcpp/issues/1367
[675c62f7] AutoGuess: Phi 4 (mini) (#1402)
[4bf56982] phrasing
[b8c0df04] Add Rep Pen to Top N Sigma sampler chain (#1397)
- place after nsigma and before xtc (+3 squashed commits)
Squashed commit:
[87c52b97] disable VMM from HIP
[ee8906f3] edit description
[e85c0e69] Remove Unnecessary Rep Counting (#1394)
* stop counting reps
* fix range-based initializer
* strike that - reverse it
2025-03-05 00:02:20 +08:00
William Tambellini
70680c48e5
ggml : upgrade init_tensor API to return a ggml_status (#11854)
...
* Upgrade init_tensor API to return a ggml_status
To prepare for an 'abort-free' ggml
(ggml not to abort on OOMs but return an OOM status),
as agreed with Diego in the ggml repo,
upgrade the init_tensor() and view_init() APIs
to return a ggml_status.
* misc fixes
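A minimal sketch of the "abort-free" pattern this commit moves toward, with toy stand-in names for ggml_status and init_tensor: the function reports allocation failure through a status code and lets the caller decide, instead of aborting inside the library.

```cpp
// Toy sketch of the abort-free pattern (stand-in names, not ggml's types).
#include <cstdio>
#include <cstdlib>

enum example_status { // hypothetical stand-in for ggml_status
    EXAMPLE_STATUS_SUCCESS = 0,
    EXAMPLE_STATUS_ALLOC_FAILED,
};

struct tensor {
    void * data   = nullptr;
    size_t nbytes = 0;
};

static example_status init_tensor(tensor & t, size_t nbytes) {
    t.data = std::malloc(nbytes);
    if (t.data == nullptr) {
        return EXAMPLE_STATUS_ALLOC_FAILED; // report OOM instead of abort()
    }
    t.nbytes = nbytes;
    return EXAMPLE_STATUS_SUCCESS;
}

int main() {
    tensor t;
    if (init_tensor(t, 1u << 20) != EXAMPLE_STATUS_SUCCESS) {
        std::fprintf(stderr, "allocation failed, recovering gracefully\n");
        return 1;
    }
    std::printf("allocated %zu bytes\n", t.nbytes);
    std::free(t.data);
    return 0;
}
```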
---------
Co-authored-by: slaren <slarengh@gmail.com>
2025-02-28 14:41:47 +01:00
Concedo
dcfa1eca4e
Merge commit '017cc5f446' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/019-bug-misc.yml
# CODEOWNERS
# examples/batched-bench/batched-bench.cpp
# examples/batched/batched.cpp
# examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp
# examples/gritlm/gritlm.cpp
# examples/llama-bench/llama-bench.cpp
# examples/passkey/passkey.cpp
# examples/quantize-stats/quantize-stats.cpp
# examples/run/run.cpp
# examples/simple-chat/simple-chat.cpp
# examples/simple/simple.cpp
# examples/tokenize/tokenize.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-metal/CMakeLists.txt
# ggml/src/ggml-vulkan/CMakeLists.txt
# scripts/sync-ggml.last
# src/llama.cpp
# tests/test-autorelease.cpp
# tests/test-model-load-cancel.cpp
# tests/test-tokenizer-0.cpp
# tests/test-tokenizer-1-bpe.cpp
# tests/test-tokenizer-1-spm.cpp
2025-01-08 23:15:21 +08:00
Diego Devesa
017cc5f446
ggml-backend : only offload from host buffers (fix) (#11124)
2025-01-07 16:11:57 +01:00
Diego Devesa
a3d50bc022
ggml-backend : only offload from host buffers (#11120)
2025-01-07 12:38:05 +01:00