koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-22 19:47:49 +00:00

Georgi Gerganov a1a69f777a metal : optimize concat kernel and fix set kernel threads (#23411)	2026-05-21 13:34:08 +03:00

* metal : fix GGML_OP_SET kernel threads

* tests : extend test_cpy to support different src/dst shapes

  Extend test_cpy to support different source and destination tensor shapes for CPY operations (reshaping), where the total number of elements must match.

  - Renamed ne -> ne_src, added ne_dst parameter (default: use src shape)
  - Added 50 new reshaping test cases covering 1D<->2D<->3D<->4D conversions
  - Tests exercise 1024 boundary, small shapes, and large dimensionality changes
  - Fixed dangling reference bug (storing & to temporary std::array)
  - Updated all existing test calls with permute/transpose args for compatibility

  Assisted-by: llama.cpp:local pi

* metal : optimize concat kernel with row batching for small widths

  When ne0 < 256, batch multiple rows into a single threadgroup to improve occupancy. This avoids underutilizing the GPU when processing narrow tensors.

  - Dispatch nth = min(256, ne0) threads per group
  - Calculate nrptg (rows per threadgroup) to fill up to 256 threads
  - Update kernel index calculation to handle the row batching
  - Add boundary check for i1 >= ne1

  Assisted-by: llama.cpp:local pi

* tests : clean-up

* tests : refactor CPY shape tests to use dimension permutations

  Replace 75 hardcoded test cases with a loop over permutations of {3, 5, 7, 32} (total elements: 3360). Each src permutation is tested against canonical sorted and reverse dst, skipping identical shapes. Covers F32, F16, and Q4_0 (when both src and dst ne0 == 32).

  Assisted-by: llama.cpp:local pi
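The row-batching arithmetic described in the concat commit can be sketched on the host side as follows. This is only an illustration under stated assumptions, not the actual ggml-metal code: `concat_launch`, `plan_concat_launch`, and `count_rows_processed` are hypothetical names, the rows-per-threadgroup value is assumed to be the integer quotient 256 / nth, and only dimension 1 is modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical description of a launch configuration (not a ggml type).
struct concat_launch {
    int     nth;      // threads assigned to one row: min(256, ne0)
    int     nrptg;    // rows per threadgroup, chosen to fill up to 256 threads
    int64_t n_groups; // threadgroups needed to cover all ne1 rows
};

// Assumption: nrptg is the integer quotient 256 / nth, so nth * nrptg <= 256.
static concat_launch plan_concat_launch(int64_t ne0, int64_t ne1) {
    assert(ne0 > 0 && ne1 > 0);
    const int     max_threads = 256;
    const int     nth         = (int) std::min<int64_t>(max_threads, ne0);
    const int     nrptg       = max_threads / nth;           // >= 1 because nth <= 256
    const int64_t n_groups    = (ne1 + nrptg - 1) / nrptg;   // round up
    return { nth, nrptg, n_groups };
}

// Emulates the per-row index calculation: each threadgroup g handles rows
// g*nrptg .. g*nrptg + nrptg - 1 and skips rows past the end (i1 >= ne1).
static int64_t count_rows_processed(const concat_launch & p, int64_t ne1) {
    int64_t n = 0;
    for (int64_t g = 0; g < p.n_groups; ++g) {
        for (int r = 0; r < p.nrptg; ++r) {
            const int64_t i1 = g * p.nrptg + r;
            if (i1 >= ne1) {
                continue; // boundary check for the last, partially filled group
            }
            ++n;
        }
    }
    return n;
}
```

Under these assumptions, a tensor with ne0 = 32 and ne1 = 10 would use 32 threads per row and 8 rows per threadgroup, so 2 threadgroups cover the 10 rows and the boundary check discards the 6 unused row slots in the second group.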
peg-parser	common/gemma4 : handle parsing edge cases (#21760)	2026-04-13 18:18:18 -05:00
snapshots	tests : add unit test coverage for llama_tensor_get_type (#20112)	2026-04-02 22:53:58 +02:00
.gitignore	tests : add unit test coverage for llama_tensor_get_type (#20112)	2026-04-02 22:53:58 +02:00
CMakeLists.txt	llama + spec: MTP Support (#22673)	2026-05-16 20:06:23 +08:00
export-graph-ops.cpp	tests: allow exporting graph ops from HF file without downloading weights (#21182)	2026-04-02 18:19:20 +02:00
get-model.cpp
get-model.h
gguf-model-data.cpp	tests : add unit test coverage for llama_tensor_get_type (#20112)	2026-04-02 22:53:58 +02:00
gguf-model-data.h	tests : add unit test coverage for llama_tensor_get_type (#20112)	2026-04-02 22:53:58 +02:00
test-alloc.cpp	chore : correct typos [no ci] (#20041)	2026-03-05 08:50:21 +01:00
test-arg-parser.cpp	spec : refactor params (#22397)	2026-04-28 09:07:33 +03:00
test-autorelease.cpp	docs : Minor cleanups (#19252)	2026-02-02 08:38:55 +02:00
test-backend-ops.cpp	metal : optimize concat kernel and fix set kernel threads (#23411)	2026-05-21 13:34:08 +03:00
test-backend-sampler.cpp	tests: enable kv_unified to prevent cuda oom error on rtx 2060 (#20645)	2026-03-18 17:40:22 +08:00
test-barrier.cpp	Fix race conditions in threadpool when dealing with dynamic/frequent n_threads changes (#17748)	2025-12-10 12:32:23 -08:00
test-c.c	ggml : remove kompute backend (#14501)	2025-07-03 07:48:32 +03:00
test-chat-auto-parser.cpp	chat: fix handling of space in reasoning markers (#22353)	2026-04-25 21:24:13 +02:00
test-chat-peg-parser.cpp	common/parser: add proper reasoning tag prefill reading (#20424)	2026-03-19 16:58:21 +01:00
test-chat-template.cpp	chat : add Granite 4.0 chat template with correct tool_call role mapping (#20804)	2026-04-02 11:28:56 +02:00
test-chat.cpp	common : delegate assistant continuation to underlying template handlers (#23089)	2026-05-17 13:36:05 +02:00
test-double-float.cpp
test-gbnf-validator.cpp	cmake : do not include ./src as public for libllama (#13062)	2025-04-24 16:00:10 +03:00
test-gguf-model-data.cpp	tests : add unit test coverage for llama_tensor_get_type (#20112)	2026-04-02 22:53:58 +02:00
test-gguf.cpp	llama: fix llama-model-saver (#20503)	2026-03-25 12:53:16 +02:00
test-grammar-integration.cpp	common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604)	2026-03-21 18:43:35 +01:00
test-grammar-llguidance.cpp	tool/ex/tests: consistently free ctx, then model (#18168)	2025-12-22 11:00:37 +01:00
test-grammar-parser.cpp	common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604)	2026-03-21 18:43:35 +01:00
test-jinja.cpp	jinja : support ensure_ascii=true, string repetition and int/float self-filtering (#21623)	2026-04-09 11:28:33 +02:00
test-json-partial.cpp	common : handle unicode during partial json parsing (#16526)	2025-10-12 16:18:47 +03:00
test-json-schema-to-grammar.cpp	tests : remove obsolete .mjs script (#21615)	2026-04-08 13:20:46 +03:00
test-llama-archs.cpp	mtmd: add Gemma 4 audio conformer encoder support (#21421)	2026-04-12 14:15:26 +02:00
test-llama-grammar.cpp	common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604)	2026-03-21 18:43:35 +01:00
test-log.cpp	common: Intentionally leak logger instance to fix hanging on Windows (#22273)	2026-04-29 10:58:43 +03:00
test-lora-conversion-inference.sh	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
test-model-load-cancel.cpp	llama : update llama_model API names (#11063)	2025-01-06 10:55:18 +02:00
test-mtmd-c-api.c	mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos (breaking change) (#22082)	2026-04-19 11:57:21 +02:00
test-opt.cpp	tests : fix test-opt with GGML_BACKEND_DL (#15599)	2025-08-26 22:14:38 +02:00
test-peg-parser.cpp	Autoparser - complete refactoring of parser architecture (#18675)	2026-03-06 21:01:00 +01:00
test-quant-type-selection.cpp	tests : add unit test coverage for llama_tensor_get_type (#20112)	2026-04-02 22:53:58 +02:00
test-quantize-fns.cpp	ggml: add Q1_0 1-bit quantization support (CPU) (#21273)	2026-04-06 20:55:21 +02:00
test-quantize-perf.cpp	ci: run the x64 and arm ci on the github machines instead (#16183)	2025-09-25 08:06:06 +03:00
test-quantize-stats.cpp	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
test-reasoning-budget.cpp	reasoning-budget: clone should do a deep-copy (#23095)	2026-05-15 11:59:07 +02:00
test-recurrent-state-rollback.cpp	llama + spec: MTP Support (#22673)	2026-05-16 20:06:23 +08:00
test-regex-partial.cpp	common/grammar : replace problematic backtracking regex `[\s\S]*` (#18342)	2026-01-03 16:02:43 -06:00
test-rope.cpp	ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (#16805)	2025-11-11 13:33:24 +02:00
test-sampling.cpp	sampling : optimize samplers by reusing bucket sort (#15665)	2025-08-31 20:41:02 +03:00
test-state-restore-fragmented.cpp	common : only load backends when required (#22290)	2026-05-05 09:23:50 +02:00
test-thread-safety.cpp	common : move up common_init() and fix Windows UTF-8 logs (#21176)	2026-03-31 12:53:41 +02:00
test-tokenizer-0.cpp	tool/ex/tests: consistently free ctx, then model (#18168)	2025-12-22 11:00:37 +01:00
test-tokenizer-0.py	requirements : update transformers to 5.5.1 (#21617)	2026-04-09 12:36:29 +02:00
test-tokenizer-0.sh	model : add Jina Embeddings v5 Nano (partial EuroBERT) support (#19826)	2026-02-26 12:14:09 +01:00
test-tokenizer-1-bpe.cpp	tool/ex/tests: consistently free ctx, then model (#18168)	2025-12-22 11:00:37 +01:00
test-tokenizer-1-spm.cpp	tool/ex/tests: consistently free ctx, then model (#18168)	2025-12-22 11:00:37 +01:00
test-tokenizer-random.py	requirements : update transformers to 5.5.1 (#21617)	2026-04-09 12:36:29 +02:00
test-tokenizers-repo.sh	devops: add s390x & ppc64le CI (#15925)	2025-09-27 02:03:33 +08:00
testing.h	common : implement new jinja template engine (#18462)	2026-01-16 11:22:06 +01:00