koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-22 03:10:03 +00:00

Author	SHA1	Message	Date
Concedo	f8b7ddeac0	emergency fix for q25vl	2025-04-27 16:46:33 +08:00
Concedo	1b0481f4b1	wip qwen25vl merge	2025-04-27 13:07:07 +08:00
Concedo	36c8db1248	Merge branch 'upstream' into concedo_experimental # Conflicts: # examples/llava/clip-impl.h # examples/llava/clip.cpp # tests/test-arg-parser.cpp # tests/test-json-schema-to-grammar.cpp	2025-04-27 12:51:02 +08:00
Xuan Son Nguyen	53a15d014f	add test	2025-04-26 23:00:41 +02:00
Xuan-Son Nguyen	2d451c8059	common : add common_remote_get_content (#13123) * common : add common_remote_get_content * support max size and timeout * add tests	2025-04-26 22:58:12 +02:00
Xuan Son Nguyen	89be919988	fix merging problem	2025-04-26 22:54:41 +02:00
Xuan Son Nguyen	82f8e72ecd	Merge branch 'master' into qwen25-vl	2025-04-26 22:45:06 +02:00
Xuan-Son Nguyen	4753791e70	clip : improve projector naming (#13118) * clip : improve projector naming * no more kv has_llava_projector * rm unused kv * rm more unused	2025-04-26 22:39:47 +02:00
Xuan Son Nguyen	0c74ea54f5	clean up	2025-04-26 22:37:05 +02:00
Xuan Son Nguyen	5085dbb293	Merge branch 'master' into qwen25-vl	2025-04-26 22:24:04 +02:00
Xuan Son Nguyen	516735ad21	fix model conversion	2025-04-26 22:23:48 +02:00
Concedo	378c3dd40c	updated lite	2025-04-27 01:46:52 +08:00
Concedo	77b9a83956	tryout termux autoinstaller (+1 squashed commits) Squashed commits: [9aeb5e902] tryout termux autoinstaller (+1 squashed commits) Squashed commits: [0e33b5934] tryout termux autoinstaller (+1 squashed commits) Squashed commits: [70232ea70] tryout termux autoinstaller (+1 squashed commits) Squashed commits: [050770315] tryout termux autoinstaller (+1 squashed commits) Squashed commits: [27bfc75a2] tryout termux autoinstaller (+1 squashed commits) Squashed commits: [6a32c1f93] tryout termux autoinstaller (+1 squashed commits) Squashed commits: [1e53b9d48] tryout termux autoinstaller	2025-04-27 01:27:23 +08:00
SXX	77d5e9a76a	ggml: move fp16/bf16 conversion optimizations to CPU backend + export conversion APIs (#13107) * ggml: dynamic x86_64 feature detection for FP32 <-> FP16/BF16 conversion * move fp converter to ggml-cpu * Switch ggml_compute_forward_get_rows_f16/bf16 to new ggml_cpu_fp16/bf16_to_fp32	2025-04-26 16:05:31 +02:00
HimariO	7e1bb0437a	remove `attn_window_size` from gguf	2025-04-26 20:19:51 +08:00
frob	d5fe4e81bd	grammar : handle maxItems == 0 in JSON schema (#13117) Co-authored-by: Richard Lyons <frob@cloudstaff.com>	2025-04-26 10:10:20 +02:00
Concedo	4dcd215b27	handle explicit null	2025-04-26 13:06:38 +08:00
Concedo	cb1c182673	add more warmup (+1 squashed commits) Squashed commits: [9578d5352] updated lite	2025-04-26 10:22:09 +08:00
Concedo	4decd6bea1	GLM4 batch clamp	2025-04-26 09:42:17 +08:00
Concedo	3f545eadbe	Merge branch 'upstream' into concedo_experimental # Conflicts: # ggml/src/ggml-rpc/ggml-rpc.cpp # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # tests/test-backend-ops.cpp	2025-04-26 09:12:40 +08:00
kallewoof	7cb815b727	AutoGuess: GLM-4 (#1502) * AutoGuess: GLM-4 * add 'chat_start' field to adapters * GLM-4 fix	2025-04-26 08:47:42 +08:00
Concedo	35dc8387e9	fixed rwkv7 handling	2025-04-26 02:13:06 +08:00
Concedo	5e87c04056	improved memory estimation (+2 squashed commit) Squashed commit: [3319540f9] mem estimation [43bad21db] mem estimation	2025-04-26 02:03:09 +08:00
Diego Devesa	295354ea68	llama : fix K-shift with quantized K and BLAS backend (#13113)	2025-04-25 19:40:11 +02:00
HimariO	77b144a8e7	replace `KEY_FULLATTN_BLK_IDX` with `KEY_WIN_ATTN_PATTERN`	2025-04-26 01:00:00 +08:00
HimariO	f69e9fa04d	remove `KEY_USE_GLU_MLP`, `KEY_USE_RMS_NORM`	2025-04-26 00:16:27 +08:00
HimariO	caa7e57ec5	add `PROJECTOR_TYPE_QWEN2_5_VL`	2025-04-26 00:03:02 +08:00
HimariO	a3cd0e52f2	fix attn weight scaling after rebase	2025-04-25 22:12:55 +08:00
HimariO	7f530ac040	remove commented-out code blocks	2025-04-25 22:12:55 +08:00
HimariO	2de5dc3a14	remove not so often use `qwen2vl-cli` debug functions	2025-04-25 22:12:55 +08:00
HimariO	91fbdd781d	ignore transformers Qwen2_5_xxx type check	2025-04-25 22:12:26 +08:00
HimariO	d1af45988a	cleaning up	2025-04-25 22:12:26 +08:00
HimariO	2eb32933ea	move position id remap out of ggml to avoid int32 cuda operations	2025-04-25 22:12:26 +08:00
HimariO	444e47c088	fix few incorrect tensor memory layout	2025-04-25 22:11:48 +08:00
HimariO	69b39addd2	add debug utils	2025-04-25 22:11:48 +08:00
HimariO	3d5198ee05	handle window attention inputs	2025-04-25 22:11:13 +08:00
HimariO	d9f2d71bc2	implment vision model architecture, gguf convertor	2025-04-25 22:11:13 +08:00
City	558a764713	Force FP32 compute in GLM4 FFN Down (#13101) * Force FP32 compute in cuBLAS GEMM * Revert "Force FP32 compute in cuBLAS GEMM" This reverts commit 6efd872732159ab88ee7b3c1d77ba5ebc83079bd. * Force F32 compute in GLM4 ffn down * Edit comment to clarify issue Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-04-25 14:38:34 +02:00
Xuan-Son Nguyen	edb18b6e8f	clip : fix pixtral on some GPU backends (#13097) * clip : fix pixtral on some GPU backends * refactor inp_raw set * rm outdated comment * fix dynamic size * add TODO	2025-04-25 14:31:42 +02:00
Neo Zhang Jianyu	514c45608f	change the reorder tensor from init to execute OP (#13003)	2025-04-25 17:37:51 +08:00
Concedo	6b6597ebf1	allow for single token prompt processing (actual batch size 1)	2025-04-25 16:54:46 +08:00
Radoslav Gerganov	553a5c3a9f	rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (#12943) RPC_CMD_SET_TENSOR always returns an empty response and we send this 4 times per token. We can improve TG speed if we don't wait for this empty response. The performance impact of this change depends on the network latency.	2025-04-25 10:08:08 +03:00
Xuan-Son Nguyen	13be08daf9	clip : remove boi/eoi embeddings for GLM-edge model (#13081)	2025-04-24 22:17:04 +02:00
Georgi Gerganov	226251ed56	embeddings : fix batch sizes (#13076) ggml-ci	2025-04-24 22:29:22 +03:00
Concedo	d32d0b382a	glm4 template	2025-04-25 00:41:15 +08:00
Georgi Gerganov	87616f0680	ggml : fix trailing whitespaces (#0)	2025-04-24 17:32:47 +03:00
Georgi Gerganov	63b4911494	sync : ggml ggml-ci	2025-04-24 17:32:47 +03:00
Acly	c6e8cc28c1	ggml : Depthwise 2D convolution (ggml/1152) * ggml-cpu : kernels for faster depthwise 2D convolution * fix compile: remove static after moving to ops.cpp * add dilation for depthwise_conv_2d * review: rename to ggml_conv_2d_dw_direct, remove redundant struct keywords, pass by ref, whitespace * review: rename depthwise_conv_2d -> conv_2d_dw everywhere	2025-04-24 17:32:47 +03:00
Johannes Gäßler	b10d8bfdb1	CUDA: use switch statements in constexpr functions (#13095)	2025-04-24 15:57:10 +02:00
Georgi Gerganov	13b4548877	cmake : do not include ./src as public for libllama (#13062) * cmake : do not include ./src as public for libllama ggml-ci * cmake : rework tests ggml-ci * llguidance : remove unicode include ggml-ci * cmake : make c++17 private ggml-ci	2025-04-24 16:00:10 +03:00


7739 commits