Commit graph

7753 commits

Author SHA1 Message Date
Concedo
4d8a7a6594 fix occasional clip segfault, fix glm4 (+1 squashed commits)
Squashed commits:

[bd71cd688] GLM4 fix wip
2025-04-29 01:42:50 +08:00
Concedo
e659cadf48 more sanitization for user inputs 2025-04-28 15:01:50 +08:00
Concedo
94c2572cb5 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/llama-bench/llama-bench.cpp
2025-04-28 14:42:57 +08:00
Concedo
a9bc1a2ee2 do not use shell true instead 2025-04-28 14:26:55 +08:00
4onen
c0a97b762e llama-bench : Add --override-tensors arg (#12922)
* Add --override-tensors option to llama-bench

* Correct llama-bench --override-tensors to --override-tensor

* llama-bench: Update --override-tensors parsing to match --tensor-split and appear in the test matrix.

* Make new llama-bench util functions static to fix Ubuntu CI

* llama-bench: Correct -ot corner cases (No -ot calls, leading and trailing empty -ot spans, etc.)
2025-04-27 23:48:26 +02:00
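The `--override-tensor` bullets above mention matching `--tensor-split`-style parsing and tolerating empty spans (no `-ot` calls, leading/trailing separators). A minimal sketch of that kind of parsing — the separator and `PATTERN=BUFFER` pair syntax are assumptions for illustration, not llama.cpp's actual parser:

```python
import re
from typing import List, Pattern, Tuple

def parse_override_tensor(spec: str) -> List[Tuple[Pattern, str]]:
    """Parse comma-separated PATTERN=BUFFER pairs into (regex, buffer) tuples,
    skipping empty spans so leading, trailing, or doubled separators are
    tolerated -- the corner case the log entry describes."""
    overrides = []
    for span in spec.split(","):
        span = span.strip()
        if not span:  # ignore empty spans: ",pat=CPU," still parses cleanly
            continue
        pattern, _, buft = span.partition("=")
        overrides.append((re.compile(pattern), buft))
    return overrides
```

For example, `parse_override_tensor(",blk\\.0=CPU,")` yields a single entry despite the stray separators, and an empty spec yields an empty list.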
matteo
ced44be342 llama-chat : fix wrong template in GLM4-0414 (#13140)
* fix wrong template in GLM4-0414

* fix spaces

* no bos token since it is already in the template

* moved the chatglm4 check to higher priority

* restored template for old GLM models

* moved the GLM4 template check to the correct place with the correct check
2025-04-27 21:57:32 +02:00
Concedo
ca281bd5ba fix sanity check 2025-04-28 00:00:07 +08:00
Concedo
87cd8e6a00 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/llava/clip.cpp
2025-04-27 23:51:19 +08:00
Concedo
5fa9e02bc3 add debugging info to zenity check 2025-04-27 23:48:23 +08:00
R0CKSTAR
e291450b76 musa: fix build warning (#13129)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-27 13:22:49 +02:00
LostRuins Concedo
59e991c23c Fixes Qwen2.5VL segfault during inference with https://github.com/ggml-org/llama.cpp/pull/12402, as the has_qwen2vl_merger migration was incomplete (#13133) 2025-04-27 12:43:37 +02:00
Concedo
f77574765e termux script fix 2025-04-27 17:56:00 +08:00
Concedo
37060f54da backwards compat: handle older HimariO quants 2025-04-27 17:38:22 +08:00
Concedo
f8b7ddeac0 emergency fix for q25vl 2025-04-27 16:46:33 +08:00
HimariO
ca2bb89eac clip : Add Qwen2.5VL support (#12402)
* implement vision model architecture, gguf converter

* handle window attention inputs

* add debug utils

* fix a few incorrect tensor memory layouts

* move position id remap out of ggml to avoid int32 cuda operations

* cleaning up

* ignore transformers Qwen2_5_xxx type check

* remove rarely used `qwen2vl-cli` debug functions

* remove commented-out code blocks

* fix attn weight scaling after rebase

* add `PROJECTOR_TYPE_QWEN2_5_VL`

* remove `KEY_USE_GLU_MLP`, `KEY_USE_RMS_NORM`

* replace `KEY_FULLATTN_BLK_IDX` with `KEY_WIN_ATTN_PATTERN`

* remove `attn_window_size` from gguf

* fix model conversion

* clean up

* fix merging problem

* add test

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-27 10:10:34 +02:00
Concedo
1b0481f4b1 wip qwen25vl merge 2025-04-27 13:07:07 +08:00
Concedo
36c8db1248 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/llava/clip-impl.h
#	examples/llava/clip.cpp
#	tests/test-arg-parser.cpp
#	tests/test-json-schema-to-grammar.cpp
2025-04-27 12:51:02 +08:00
Xuan Son Nguyen
53a15d014f add test 2025-04-26 23:00:41 +02:00
Xuan-Son Nguyen
2d451c8059 common : add common_remote_get_content (#13123)
* common : add common_remote_get_content

* support max size and timeout

* add tests
2025-04-26 22:58:12 +02:00
Xuan Son Nguyen
89be919988 fix merging problem 2025-04-26 22:54:41 +02:00
Xuan Son Nguyen
82f8e72ecd Merge branch 'master' into qwen25-vl 2025-04-26 22:45:06 +02:00
Xuan-Son Nguyen
4753791e70 clip : improve projector naming (#13118)
* clip : improve projector naming

* no more kv has_llava_projector

* rm unused kv

* rm more unused
2025-04-26 22:39:47 +02:00
Xuan Son Nguyen
0c74ea54f5 clean up 2025-04-26 22:37:05 +02:00
Xuan Son Nguyen
5085dbb293 Merge branch 'master' into qwen25-vl 2025-04-26 22:24:04 +02:00
Xuan Son Nguyen
516735ad21 fix model conversion 2025-04-26 22:23:48 +02:00
Concedo
378c3dd40c updated lite 2025-04-27 01:46:52 +08:00
Concedo
77b9a83956 tryout termux autoinstaller (+7 squashed commits)
Squashed commits:

[9aeb5e902] tryout termux autoinstaller
[0e33b5934] tryout termux autoinstaller
[70232ea70] tryout termux autoinstaller
[050770315] tryout termux autoinstaller
[27bfc75a2] tryout termux autoinstaller
[6a32c1f93] tryout termux autoinstaller
[1e53b9d48] tryout termux autoinstaller
2025-04-27 01:27:23 +08:00
SXX
77d5e9a76a ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (#13107)
* ggml: dynamic x86_64 feature detection for FP32 <-> FP16/BF16 conversion

* move fp converter to ggml-cpu

* Switch ggml_compute_forward_get_rows_f16/bf16 to new ggml_cpu_fp16/bf16_to_fp32
2025-04-26 16:05:31 +02:00
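The conversion commit above concerns FP32 <-> FP16/BF16 conversion in the CPU backend. As a standalone illustration of the standard bfloat16 scheme (keep the top 16 bits of the float32 pattern, with round-to-nearest-even on the dropped bits) — not the ggml code itself:

```python
import struct

def fp32_to_bf16_bits(x: float) -> int:
    """Convert a float32 to the bfloat16 bit pattern: bf16 is the high
    16 bits of the IEEE-754 float32 encoding, rounded to nearest-even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return ((bits + 0x7FFF + ((bits >> 16) & 1)) >> 16) & 0xFFFF

def bf16_bits_to_fp32(b: int) -> float:
    """Widen a bfloat16 bit pattern back to float32 (always exact)."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]
```

Widening is exact, which is why the bf16-to-fp32 direction is a simple shift and a natural target for the bulk `get_rows` paths the commit mentions.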
HimariO
7e1bb0437a remove attn_window_size from gguf 2025-04-26 20:19:51 +08:00
frob
d5fe4e81bd grammar : handle maxItems == 0 in JSON schema (#13117)
Co-authored-by: Richard Lyons <frob@cloudstaff.com>
2025-04-26 10:10:20 +02:00
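The grammar fix above handles the `maxItems == 0` corner case in JSON-schema-to-grammar conversion. A toy sketch of the idea — a schema with `maxItems: 0` admits only the empty array, so the emitted rule must short-circuit instead of falling through to the item-repetition rule (rule syntax here is illustrative, not llama.cpp's actual GBNF output):

```python
from typing import Optional

def json_array_rule(item_rule: str, min_items: int = 0,
                    max_items: Optional[int] = None) -> str:
    """Emit a toy grammar rule for a JSON array schema."""
    if max_items == 0:
        # Corner case: only the empty array is valid.
        return '"[]"'
    rule = f'"[" {item_rule} ("," {item_rule})* "]"'
    if min_items == 0:
        rule = f'"[]" | {rule}'  # empty array also allowed
    return rule
```

Without the explicit `max_items == 0` branch, the function would emit a rule that still matches one-or-more items, silently violating the schema.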
Concedo
4dcd215b27 handle explicit null 2025-04-26 13:06:38 +08:00
Concedo
cb1c182673 add more warmup (+1 squashed commits)
Squashed commits:

[9578d5352] updated lite
2025-04-26 10:22:09 +08:00
Concedo
4decd6bea1 GLM4 batch clamp 2025-04-26 09:42:17 +08:00
Concedo
3f545eadbe Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	ggml/src/ggml-rpc/ggml-rpc.cpp
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	tests/test-backend-ops.cpp
2025-04-26 09:12:40 +08:00
kallewoof
7cb815b727 AutoGuess: GLM-4 (#1502)
* AutoGuess: GLM-4

* add 'chat_start' field to adapters

* GLM-4 fix
2025-04-26 08:47:42 +08:00
Concedo
35dc8387e9 fixed rwkv7 handling 2025-04-26 02:13:06 +08:00
Concedo
5e87c04056 improved memory estimation (+2 squashed commit)
Squashed commit:

[3319540f9] mem estimation

[43bad21db] mem estimation
2025-04-26 02:03:09 +08:00
Diego Devesa
295354ea68 llama : fix K-shift with quantized K and BLAS backend (#13113) 2025-04-25 19:40:11 +02:00
HimariO
77b144a8e7 replace KEY_FULLATTN_BLK_IDX with KEY_WIN_ATTN_PATTERN 2025-04-26 01:00:00 +08:00
HimariO
f69e9fa04d remove KEY_USE_GLU_MLP, KEY_USE_RMS_NORM 2025-04-26 00:16:27 +08:00
HimariO
caa7e57ec5 add PROJECTOR_TYPE_QWEN2_5_VL 2025-04-26 00:03:02 +08:00
HimariO
a3cd0e52f2 fix attn weight scaling after rebase 2025-04-25 22:12:55 +08:00
HimariO
7f530ac040 remove commented-out code blocks 2025-04-25 22:12:55 +08:00
HimariO
2de5dc3a14 remove rarely used qwen2vl-cli debug functions 2025-04-25 22:12:55 +08:00
HimariO
91fbdd781d ignore transformers Qwen2_5_xxx type check 2025-04-25 22:12:26 +08:00
HimariO
d1af45988a cleaning up 2025-04-25 22:12:26 +08:00
HimariO
2eb32933ea move position id remap out of ggml to avoid int32 cuda operations 2025-04-25 22:12:26 +08:00
HimariO
444e47c088 fix a few incorrect tensor memory layouts 2025-04-25 22:11:48 +08:00
HimariO
69b39addd2 add debug utils 2025-04-25 22:11:48 +08:00
HimariO
3d5198ee05 handle window attention inputs 2025-04-25 22:11:13 +08:00