koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-10 17:14:36 +00:00

Jeff Bolz 611f419cff vulkan: optimize rms_norm, and allow the work to spread across multiple SMs (#15281)	2025-08-23 13:16:17 -05:00

* vulkan: optimize rms_norm, and allow the work to spread across multiple SMs

  There are really two parts to this change:
  (1) Some optimizations similar to what we have in soft_max, to unroll with different numbers of iterations.
  (2) A fusion optimization where we detect add followed by rms_norm, and make the add shader atomically accumulate the values^2 into memory. Then the rms_norm shader can just load that sum. This allows the rms_norm to be parallelized across multiple workgroups, it just becomes a simple per-element multiply.

  The fusion optimization is currently only applied when the rms_norm is on a single vector. This previously always ran on a single SM. It could apply more broadly, but when there are other dimensions the work can already spread across SMs, and there would be some complexity to tracking multiple atomic sums.

* Change add+rms_norm optimization to write out an array of partial sums rather than using atomic add, to make it deterministic. The rms_norm shader fetches a subgroup's worth in parallel and uses subgroupAdd to add them up.
* complete rebase against fused adds - multi_add shader can also compute partial sums
* fix validation errors
* disable add_rms_fusion for Intel due to possible driver bug
* resolve against #15489, sync after clearing partial sums
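The add + rms_norm fusion described in the commit message above can be illustrated with a small CPU-side sketch. This is not the Vulkan shader code from the commit: the function names `add_with_partial_sums` and `rms_norm_from_partials`, the `group_size` parameter (standing in for a workgroup), and the `eps` handling are all assumptions made for illustration. The sketch follows the second, deterministic variant: the add pass writes one partial sum of squares per group, and the norm pass reduces those partials in a fixed order before doing a per-element multiply.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pass 1 (hypothetical stand-in for the add shader): computes a + b and,
// for each group of `group_size` elements, stores the sum of squares of the
// results in its own slot of `partial_sums`. No atomics are involved, so the
// result does not depend on execution order.
std::vector<float> add_with_partial_sums(const std::vector<float>& a,
                                         const std::vector<float>& b,
                                         std::size_t group_size,
                                         std::vector<float>& partial_sums) {
    assert(a.size() == b.size() && group_size > 0);
    const std::size_t n = a.size();
    const std::size_t n_groups = (n + group_size - 1) / group_size;
    std::vector<float> sum(n);
    partial_sums.assign(n_groups, 0.0f);
    for (std::size_t g = 0; g < n_groups; ++g) {
        const std::size_t begin = g * group_size;
        const std::size_t end = std::min(n, begin + group_size);
        float acc = 0.0f;
        for (std::size_t i = begin; i < end; ++i) {
            sum[i] = a[i] + b[i];
            acc += sum[i] * sum[i];
        }
        partial_sums[g] = acc;
    }
    return sum;
}

// Pass 2 (hypothetical stand-in for the rms_norm shader): adds up the
// partial sums in a fixed order, then scales every element independently.
// Once the total is known, each element is a single multiply, which is what
// lets the real work spread across many workgroups.
std::vector<float> rms_norm_from_partials(const std::vector<float>& x,
                                          const std::vector<float>& partial_sums,
                                          float eps) {
    assert(!x.empty());
    float total = 0.0f;
    for (float p : partial_sums) {
        total += p;
    }
    const float mean_sq = total / static_cast<float>(x.size());
    const float scale = 1.0f / std::sqrt(mean_sq + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale;
    }
    return y;
}
```

In this sketch the serial loop over `partial_sums` plays the role that the commit message assigns to fetching a subgroup's worth of partials and combining them with subgroupAdd; the trade-off relative to a single atomic accumulator is a small extra buffer in exchange for reproducible results.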
.gitignore	tests : gitignore ggml-common.h	2024-03-09 14:17:11 +02:00
CMakeLists.txt	finetune: SGD optimizer, more CLI args (#13873)	2025-08-14 12:03:57 +02:00
get-model.cpp	ci : add model tests + script wrapper (#4586)	2024-01-26 14:18:00 +02:00
get-model.h	ci : add model tests + script wrapper (#4586)	2024-01-26 14:18:00 +02:00
run-json-schema-to-grammar.mjs	llama : move end-user examples to tools directory (#13249)	2025-05-02 20:27:13 +02:00
test-arg-parser.cpp	tests : avoid github urls due to throttling (#13654)	2025-05-20 12:03:17 +02:00
test-autorelease.cpp	llama : add `llama_vocab`, functions -> methods, naming (#11110)	2025-01-12 11:32:42 +02:00
test-backend-ops.cpp	vulkan: optimize rms_norm, and allow the work to spread across multiple SMs (#15281)	2025-08-23 13:16:17 -05:00
test-barrier.cpp	ggml : move CPU backend to a separate file (#10144)	2024-11-03 19:34:08 +01:00
test-c.c	ggml : remove kompute backend (#14501)	2025-07-03 07:48:32 +03:00
test-chat-parser.cpp	tests : remove json.hpp from a test (#13880)	2025-05-29 12:17:16 +03:00
test-chat-template.cpp	model : add support for Seed-OSS (#15490)	2025-08-23 15:21:52 +02:00
test-chat.cpp	chat : clarify the meaning of reasoning_format (#15408)	2025-08-19 10:29:36 +02:00
test-double-float.cpp	ggml : minor naming changes (#8433)	2024-07-12 10:46:02 +03:00
test-gbnf-validator.cpp	cmake : do not include ./src as public for libllama (#13062)	2025-04-24 16:00:10 +03:00
test-gguf.cpp	gguf: fix failure on version == 0 (#13956)	2025-06-01 18:08:05 +02:00
test-grammar-integration.cpp	sync : vendor (#13901)	2025-05-30 16:25:45 +03:00
test-grammar-llguidance.cpp	cmake : do not include ./src as public for libllama (#13062)	2025-04-24 16:00:10 +03:00
test-grammar-parser.cpp	cmake : do not include ./src as public for libllama (#13062)	2025-04-24 16:00:10 +03:00
test-json-partial.cpp	`server`: streaming of tool calls and thoughts when `--jinja` is on (#12379)	2025-05-25 01:48:08 +01:00
test-json-schema-to-grammar.cpp	sync : vendor (#13901)	2025-05-30 16:25:45 +03:00
test-llama-grammar.cpp	cmake : do not include ./src as public for libllama (#13062)	2025-04-24 16:00:10 +03:00
test-log.cpp	common : use common_ prefix for common library functions (#9805)	2024-10-10 22:57:42 +02:00
test-lora-conversion-inference.sh	scripts : make the shell scripts cross-platform (#14341)	2025-06-30 10:17:18 +02:00
test-model-load-cancel.cpp	llama : update llama_model API names (#11063)	2025-01-06 10:55:18 +02:00
test-mtmd-c-api.c	mtmd : add C public API (#13184)	2025-05-04 23:43:42 +02:00
test-opt.cpp	test-opt: allow slight inprecision (#15503)	2025-08-22 23:47:01 +02:00
test-quantize-fns.cpp	tests : fix test-quantize-fns to init the CPU backend (#12306)	2025-03-10 14:07:15 +02:00
test-quantize-perf.cpp	ggml : inttypes.h -> cinttypes (#0)	2024-11-17 08:30:29 +02:00
test-quantize-stats.cpp	docker : do not build tests (#13204)	2025-04-30 10:44:07 +02:00
test-regex-partial.cpp	`common`: add partial regex support (#12808)	2025-05-14 19:50:57 +01:00
test-rope.cpp	llama : add Qwen2VL support + multimodal RoPE (#10361)	2024-12-14 14:43:46 +02:00
test-sampling.cpp	sampling : make sure samplers return at least 1 token (#13822)	2025-05-27 12:07:52 +03:00
test-thread-safety.cpp	tests : update for LLAMA_SET_ROWS=1 (#14961)	2025-07-30 15:12:02 +03:00
test-tokenizer-0.cpp	llama : add `llama_vocab`, functions -> methods, naming (#11110)	2025-01-12 11:32:42 +02:00
test-tokenizer-0.py	py : logging and flake8 suppression refactoring (#7081)	2024-05-05 08:07:48 +03:00
test-tokenizer-0.sh	scripts : make the shell scripts cross-platform (#14341)	2025-06-30 10:17:18 +02:00
test-tokenizer-1-bpe.cpp	cmake : do not include ./src as public for libllama (#13062)	2025-04-24 16:00:10 +03:00
test-tokenizer-1-spm.cpp	cmake : do not include ./src as public for libllama (#13062)	2025-04-24 16:00:10 +03:00
test-tokenizer-random.py	llama : add `llama_vocab`, functions -> methods, naming (#11110)	2025-01-12 11:32:42 +02:00
test-tokenizers-repo.sh	scripts : make the shell scripts cross-platform (#14341)	2025-06-30 10:17:18 +02:00