koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-20 09:25:53 +00:00

Author	SHA1	Message	Date
Xuan Son Nguyen	0c74ea54f5	clean up	2025-04-26 22:37:05 +02:00
HimariO	7e1bb0437a	remove `attn_window_size` from gguf	2025-04-26 20:19:51 +08:00
HimariO	77b144a8e7	replace `KEY_FULLATTN_BLK_IDX` with `KEY_WIN_ATTN_PATTERN`	2025-04-26 01:00:00 +08:00
HimariO	f69e9fa04d	remove `KEY_USE_GLU_MLP`, `KEY_USE_RMS_NORM`	2025-04-26 00:16:27 +08:00
HimariO	caa7e57ec5	add `PROJECTOR_TYPE_QWEN2_5_VL`	2025-04-26 00:03:02 +08:00
HimariO	a3cd0e52f2	fix attn weight scaling after rebase	2025-04-25 22:12:55 +08:00
HimariO	7f530ac040	remove commented-out code blocks	2025-04-25 22:12:55 +08:00
HimariO	2de5dc3a14	remove not so often use `qwen2vl-cli` debug functions	2025-04-25 22:12:55 +08:00
HimariO	91fbdd781d	ignore transformers Qwen2_5_xxx type check	2025-04-25 22:12:26 +08:00
HimariO	d1af45988a	cleaning up	2025-04-25 22:12:26 +08:00
HimariO	2eb32933ea	move position id remap out of ggml to avoid int32 cuda operations	2025-04-25 22:12:26 +08:00
HimariO	444e47c088	fix few incorrect tensor memory layout	2025-04-25 22:11:48 +08:00
HimariO	69b39addd2	add debug utils	2025-04-25 22:11:48 +08:00
HimariO	3d5198ee05	handle window attention inputs	2025-04-25 22:11:13 +08:00
HimariO	d9f2d71bc2	implement vision model architecture, gguf convertor	2025-04-25 22:11:13 +08:00
Xuan-Son Nguyen	edb18b6e8f	clip : fix pixtral on some GPU backends (#13097 ) * clip : fix pixtral on some GPU backends * refactor inp_raw set * rm outdated comment * fix dynamic size * add TODO	2025-04-25 14:31:42 +02:00
Xuan-Son Nguyen	13be08daf9	clip : remove boi/eoi embeddings for GLM-edge model (#13081 )	2025-04-24 22:17:04 +02:00
Xuan-Son Nguyen	7c727fbe39	arg : add --no-mmproj-offload (#13093 ) * arg : add --no-mmproj-offload * Update common/arg.cpp	2025-04-24 14:04:14 +02:00
Xuan-Son Nguyen	80982e815e	arg : clean up handling --mmproj with -hf (#13082 ) * arg : clean up handling --mmproj with -hf * rm change about no_mmproj * Revert "rm change about no_mmproj" This reverts commit 2cac8e0efb629d66c612f137e75d562f94bb9e6c. * handle no_mmproj explicitly * skip download mmproj on examples not using it	2025-04-24 12:14:13 +02:00
pl752	5630406959	llama-mtmd-cli: Sigint rework in mtmd vision example (#13080 ) * Sigint rework in mtmd vision example * Applied suggestions on mtmd-cli PR * Forgot to invert one of the conditions * Update examples/llava/mtmd-cli.cpp * Removed redundant exit check --------- Co-authored-by: pl752 <maximpl752@gmail.com> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-04-23 23:32:35 +02:00
Xuan-Son Nguyen	ecda2ec4b3	mtmd : Support Pixtral 12B (#13065 ) * add pixtral text model (vision is wip) * cgraph ok, just missing 2D RoPE * fix bad rebase * first working version * fix problem with img_break token * support dynamic image size * update docs * update test script	2025-04-23 20:21:59 +02:00
Xuan-Son Nguyen	dc39a5e7a8	mtmd : support SmolVLM (version 1 and 2) (#13050 ) * mtmd : support SmolVLM (version 1 and 2) * correct chat template * fix n_patches * scale_factor is an int * add more models to test	2025-04-22 16:24:54 +02:00
Xuan-Son Nguyen	243453533e	llava : update documentations (#13055 ) * llava : update documentations * fix typo	2025-04-22 10:37:00 +02:00
Xuan-Son Nguyen	84a9bf2fc2	mtmd : merge llava, gemma3 and minicpmv CLI into single `llama-mtmd-cli` (#13012 ) * mtmd : merge `llava-cli` and `gemma3-cli` into single `mtmd-cli` * support for minicpmv * remove cpp files of llava and minicpmv * update hot topics * mtmd : add not supported msg for qwen2vl * Update examples/llava/mtmd.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-04-21 15:32:58 +02:00
Xuan-Son Nguyen	2016f07bd1	convert : experimental support for `--mmproj` flag (#13023 ) * convert : experimental support for `--mmproj` flag * fix bad ctrl+f replace * fix style * split into subclasses TextModel and VisionModel * rename Mode --> ModelBase * small fix * correct CLIP_VISION arch name (because existing GGUF already use it) * Apply suggestions from code review Co-authored-by: compilade <git@compilade.net> * fix Mistral3Model * fix typo Co-authored-by: compilade <git@compilade.net> --------- Co-authored-by: compilade <git@compilade.net>	2025-04-20 23:29:36 +02:00
Jeffrey Morgan	6602304814	llava: fix errors in clip.h on certain compilers (#13030 )	2025-04-20 12:15:41 +02:00
Xuan-Son Nguyen	37b9f0d29d	clip : refactor, add `image_manipulation` and `llava_uhd` classes (#13011 ) * clip : refactor, add `image_manipulation` and `llava_uhd` * refactor llava-1.6 preprocessing * simplify logic for llava-1.5 * missing include	2025-04-19 09:15:45 +02:00
Xuan-Son Nguyen	b9154ecff9	mtmd : add methods to access `mtmd_image_tokens` (#12906 ) * mtmd : add more api around mtmd_image_tokens * mtmd : ability to calc image hash * shared_ptr for mtmd_image_tokens * move hash to user-define ID (fixed) * fix prompt_modified * rm redundant data member	2025-04-18 10:04:51 +02:00
Russyyds	d6d2c2ab8c	Add performance print for gemma3 in example (#12929 )	2025-04-14 19:18:20 +02:00
Matt Clayton	e59ea539b8	llava: Fix cpu-only clip image encoding segfault (#12907) * llava: Fix cpu-only clip image encoding * clip : no smart ptr for ggml_backend_t * Fix for backend_ptr push_back --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-04-12 07:29:03 +02:00
Xuan-Son Nguyen	0c50923944	clip : use smart pointer (⚠️ breaking change) (#12869) * clip : use smart pointers * fix warmup * add forward declaration * missing include * fix include (2) * composite * simplify batch ptr * fix conflict	2025-04-11 12:09:39 +02:00
Xuan-Son Nguyen	8b9cc7cdd8	llava : introduce libmtmd (#12849 ) * wip llava2 * migrated gemma3 to llava2 * add timings * correct pre/postfix * fix missing include * fix compilation unused var warn * update llava2_tokenize * change name llava2 --> mtmd * improve api * refine helpers * Update examples/llava/mtmd.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-04-10 22:57:16 +02:00
Xuan-Son Nguyen	65a69e6e1b	clip : do not print ftype (#12832 )	2025-04-09 10:09:53 +02:00
Matt Clayton	b32efad2bc	llava: improve clip_ctx destructor to not memleak load_image_size (#12834 )	2025-04-08 22:01:58 +02:00
dm4	2dabf759e7	llava: add more helper functions to check projector types in clip context (#12824 ) Signed-off-by: dm4 <sunrisedm4@gmail.com>	2025-04-08 15:49:13 +02:00
Sergey Fedorov	f1e3eb4249	common : fix includes in arg.cpp and gemma3-cli.cpp (#12766 ) * arg.cpp: add a missing include * gemma3-cli.cpp: fix cinttypes include	2025-04-05 17:46:00 +02:00
Xuan-Son Nguyen	0364178ca2	clip : refactor clip_init, add tests (#12757 ) * refactor clip_init * fix loading file * fix style * test ok * better test with report * add missing headers * clarify * add KEY_MM_PATCH_MERGE_TYPE * remove bool has_* pattern * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/llava/clip.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * use ggml_soft_max_ext * refactor logging system * add minicpm-v-o 2.6 for testing * use nullptr everywhere * fix Yi-VL model --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-04-05 17:17:40 +02:00
Xuan-Son Nguyen	267c1399f1	common : refactor downloading system, handle mmproj with -hf option (#12694 ) * (wip) refactor downloading system [no ci] * fix all examples * fix mmproj with -hf * gemma3: update readme * only handle mmproj in llava example * fix multi-shard download * windows: fix problem with std::min and std::max * fix 2	2025-04-01 23:44:05 +02:00
Sigbjørn Skjæret	1a85949067	llava : proper description fix (#12668 )	2025-03-31 11:28:30 +02:00
Sigbjørn Skjæret	f52d59d771	llava : fix clip loading GGUFs with missing description (#12660 )	2025-03-31 11:07:07 +02:00
Ivy233	02082f1519	clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566 ) * [Fix] Compiling clip-quantize-cli and running it in a CUDA environment will cause ggml_fp16_to_fp32 to report an error when trying to access video memory. You need to switch to the CPU backend to run quantize. After the fix, it will automatically run in the CPU backend and will no longer be bound to CUDA. * [Fix]Roll back the signature and implementation of clip_model_load, and change the call in clip_model_quantize to clip_init.	2025-03-26 15:06:04 +01:00
Georgi Gerganov	e0dbec0bc6	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 ) * llama : refactor llama_context, llama_kv_cache, llm_build_context ggml-ci * graph : don't mutate the KV cache during defrag ggml-ci * context : reduce virtuals + remove test function ggml-ci * context : move interface implementation to source file + factory ggml-ci * graph : move KV cache build functions to llama_context impl ggml-ci * graph : remove model reference from build_pooling ggml-ci * graph : remove llama_model reference ggml-ci * kv_cache : provide rope factors ggml-ci * graph : rework inputs to use only unique_ptr, remove attn input abstraction ggml-ci * context : remove llama_context_i abstraction ggml-ci * context : clean-up ggml-ci * graph : clean-up ggml-ci * llama : remove redundant keywords (struct, enum) ggml-ci * model : adapt gemma3 ggml-ci * graph : restore same attention ops as on master ggml-ci * llama : remove TODO + fix indent ggml-ci	2025-03-13 12:35:44 +02:00
Xuan-Son Nguyen	7841fc723e	llama : Add Gemma 3 support (+ experimental vision capability) (#12343 ) * llama : Add Gemma 3 text-only support * fix python coding style * fix compile on ubuntu * python: fix style * fix ubuntu compile * fix build on ubuntu (again) * fix ubuntu build, finally * clip : Experimental support for Gemma 3 vision (#12344) * clip : Experimental support for Gemma 3 vision * fix build * PRId64	2025-03-12 09:30:24 +01:00
Xuan-Son Nguyen	96e1280839	clip : bring back GPU support (#12322 ) * clip : bring back GPU support * use n_gpu_layers param * fix double free * ggml_backend_init_by_type * clean up	2025-03-11 09:20:16 +01:00
tc-mb	8352cdc87b	llava : fix bug in minicpm-v code (#11513 ) * fix bug in minicpm-v code * update readme of minicpm-v	2025-03-10 10:33:24 +02:00
Aaron Teo	e9b2f84f14	llava: add big-endian conversion for image encoder (#12218 ) Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-03-06 09:33:21 +01:00
Alex Brooks	84d5f4bc19	Update granite vision docs for 3.2 model (#12105 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-02-28 11:31:47 +00:00
Ting Lou	a800ae46da	llava : add struct for FFI bindgen (#12079 ) * add struct for FFI bindgen * Apply suggestions from code review --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-02-26 15:26:52 +01:00
Alex Brooks	4d1051a40f	Add Doc for Converting Granite Vision -> GGUF (#12006 ) * Add example docs for granite vision Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-02-25 10:46:05 +01:00
Alex Brooks	7a2c913e66	llava : Add Granite Vision Support (#11794 ) * Add super wip scripts for multimodal granite gguf Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Add example for converting mmgranite to gguf Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * remove hardcoded path Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Add vision feature layer to gguf params Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Clean up llava surgery and remove name substitution hacks Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Add transformers llava next tensor name mapping Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Make siglip / openclip mutuall exclusive Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix projector linear substitution Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix linear 2 substitution index Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Increase max flattened gridpoints to 64 Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix hardcoded concat for multiple feature layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Pull vision feature layers out of gguf keys Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * fix num gridpoints and use all layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Avoid dropping last image encoder layer in llava models Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use 10 for max number of patches Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Standardize vision feature layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Cleanup logs Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Update comment for vision feature layer init Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Update notes for alternative to legacy llm conversion script Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix notes rendering Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Add v prefix to vision feature layer log Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use current 
defaults for feature layer Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use constant for max gridpoints / feat layers, style fixes Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * clarify non-negative feature layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Remove CLIP_API from func signature Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * USE MAX_IMAGE_FEATURE_LAYERS const in layer calc Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Clarify feature layers are non negative ints and not uint Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix condition for reading feature layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * pop last llava layer when feature layers are unset Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix unset vision layer 0 Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Update examples/llava/clip.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Reenable assertion for out of bounds get_rows Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use std vector for gridpoints and feature layers Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Caculate max feature layer at load time Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Include base patch for granite vision allocation Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Fix trailing whitespace Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Add max num patches = 10 back for minicpmv Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use unordered set to store feature layers Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Use max feature layer for postnorm Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> * Apply suggestions from code review --------- Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-02-24 17:09:51 +01:00


183 commits