Concedo
fe12b1cbd4
fixed lora, now works quanted too
2025-04-14 23:44:42 +08:00
Concedo
ad2522b319
str splitter
2025-04-14 23:05:36 +08:00
Akarshan Biswas
75afa0ae31
SYCL: Fix im2col ( #12910 )
...
* SYCL: Fix im2col
* restore local workgroup size adjustments for large inputs
* restore format
2025-04-14 14:23:53 +02:00
Radoslav Gerganov
c772d54926
rpc : use ggml_context_ptr ( #12938 )
2025-04-14 13:59:34 +03:00
Neo Zhang Jianyu
81c7e64fc2
disable curl lib check, this action was missed by commit bd3f59f812 ( #12761 ) ( #12937 )
2025-04-14 18:19:07 +08:00
Concedo
6bc2ca4803
added more sanity checks on zenity
2025-04-14 15:06:08 +08:00
Concedo
ffa0bc21e6
workaround for rwkv
2025-04-14 14:46:08 +08:00
Georgi Gerganov
526739b879
sync : ggml
...
ggml-ci
2025-04-14 09:26:15 +03:00
cmdr2
a25355e264
cpu: fix cpu backend's supports-op for GET_ROWS_BACK. fixes a fatal error when running test-backend-ops with only the CPU backend (ggml/1190)
2025-04-14 09:26:15 +03:00
Concedo
3d31d75c8f
clamp and display detected GPU memory
2025-04-14 14:19:23 +08:00
SXX
e959d32b1c
ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into the result register ( #12773 )
...
* ggml: use _mm[512/256]_dpbusd[_avx]_epi32 to directly accumulate into the result register
* simplifies the codebase by removing redundant functions
2025-04-14 08:47:55 +03:00
Concedo
e1ee857b1e
allow vulkan to be packaged without coopmat for noavx2
2025-04-14 12:40:00 +08:00
Alan Gray
307bfa253d
ggml: disable CUDA graphs for unsupported DUP and CONT node types ( #12891 )
...
Fixes #12798
2025-04-13 23:12:21 +02:00
Ed Addario
71e90e8813
quantize: Handle user-defined quantization levels for additional tensors ( #12511 )
...
* Add llama_model_quantize_params parameters
* Add new quantize parameters parsing and validation
* Update usage
* Add new parameters defaults
* Add new quantization parameters logic
* Implement general --tensor-type instead of tensor-specific command option
* Fix implied type bug
* Restore missing #includes
* Add regex capability for tensor selection
* Refactor function name and update ALLOWED_TENSOR_TYPE
* Add missing #include
* Handle edge case when tensor name is cls.output
* Minor logging improvement
2025-04-13 21:29:28 +03:00
Concedo
e0aa7aa4d9
updated sdui
2025-04-13 22:37:30 +08:00
Concedo
2d0b7e37f9
fix build
2025-04-13 22:01:48 +08:00
Concedo
895d008c5f
the bloke has retired for a year, it's time to let go
2025-04-13 17:00:00 +08:00
Prajwal B Mehendarkar
bc091a4dc5
common : Define cache directory on AIX ( #12915 )
2025-04-12 17:33:39 +02:00
Concedo
a6149ad0fc
fixed g3 adapter back
2025-04-12 23:17:54 +08:00
Concedo
9f94f62768
fixed segfault
2025-04-12 19:08:27 +08:00
Concedo
7b4254bef9
not working on cpu
2025-04-12 18:55:29 +08:00
Concedo
c94aec1930
update workflows, update gemma default adapter sysprompt
2025-04-12 18:38:23 +08:00
Concedo
956ed89595
fixed build
2025-04-12 17:06:55 +08:00
Jeff Bolz
a4837577aa
vulkan: use aligned loads for flash attention mask ( #12853 )
...
Rewrite the stride logic for the mask tensor in the FA shader to force the
stride to be aligned, to allow using more efficient loads.
2025-04-12 10:44:48 +02:00
Concedo
6302709fbb
discourage but dont prevent vulkan FA (it's occasionally still useful)
2025-04-12 16:23:52 +08:00
Concedo
b42fa821d8
try allow build from commit hash
2025-04-12 13:37:10 +08:00
Matt Clayton
e59ea539b8
llava: Fix cpu-only clip image encoding segfault ( #12907 )
...
* llava: Fix cpu-only clip image encoding
* clip : no smart ptr for ggml_backend_t
* Fix for backend_ptr push_back
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-04-12 07:29:03 +02:00
Concedo
5908f2ca19
based on occam and henky advice, disabled flash attention entirely on vulkan.
2025-04-12 12:30:48 +08:00
Concedo
7a7bdeab6d
json to gbnf endpoint added
2025-04-12 11:41:11 +08:00
Concedo
7e1289ade8
fixes for sdcpp
2025-04-12 10:08:23 +08:00
Concedo
a0ae187563
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/docker.yml
# README.md
# build-xcframework.sh
# examples/llava/CMakeLists.txt
# examples/llava/clip.cpp
# examples/rpc/rpc-server.cpp
# examples/run/run.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
2025-04-12 10:06:47 +08:00
Concedo
efef14bb82
added llama4 tags
2025-04-12 08:58:04 +08:00
Concedo
ea9bd61e47
Merge commit '64eda5deb9' into concedo_experimental
...
# Conflicts:
# .devops/cuda.Dockerfile
# .devops/intel.Dockerfile
# .devops/llama-cli-cann.Dockerfile
# .devops/musa.Dockerfile
# .devops/rocm.Dockerfile
# .devops/vulkan.Dockerfile
# .github/workflows/build.yml
# .github/workflows/docker.yml
# README.md
# docs/backend/SYCL.md
# examples/llava/clip.cpp
# examples/server_embd.py
# ggml/src/ggml-cann/acl_tensor.cpp
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/ggml-cann.cpp
# src/CMakeLists.txt
# tests/test-chat-template.cpp
2025-04-12 08:31:22 +08:00
Georgi Gerganov
c94085df28
server : add VSCode's GitHub Copilot Chat support ( #12896 )
...
* server : add VSCode's GitHub Copilot Chat support
* cont : update handler name
2025-04-11 23:37:41 +03:00
yuri@FreeBSD
e8a62631b3
rpc : Set cache directory in rpc-server.cpp on FreeBSD ( #12903 )
2025-04-11 22:04:14 +02:00
Olivier Chafik
b6930ebc42
tool-call : fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 templates ( #12900 )
...
* `tool-call`: don't call common_chat_params_init_hermes_2_pro when there aren't tools (or when there's a schema)
* test all chat formats w/o tools
2025-04-11 21:47:52 +02:00
yuri@FreeBSD
68b08f36d0
common : Define cache directory on FreeBSD ( #12892 )
2025-04-11 21:45:44 +02:00
Concedo
a56cc72bd0
added handling for remembering file paths, added gui option to disable zenity in GUI
2025-04-12 00:42:26 +08:00
henk717
f6b7fea979
zentk - folder select workaround ( #1478 )
...
* zentk - folder select workaround
* kcppt extension fix
2025-04-11 22:37:07 +08:00
Ewan Crawford
578754b315
sycl: Support sycl_ext_oneapi_limited_graph ( #12873 )
...
The current usage of the SYCL-Graph extension checks for
the `sycl_ext_oneapi_graph` device aspect. However, it is also
possible to support `sycl_ext_oneapi_limited_graph` devices that
don't support update
2025-04-11 15:32:14 +02:00
tastelikefeet
b2034c2b55
contrib: support modelscope community ( #12664 )
...
* support download from modelscope
* support login
* remove comments
* add arguments
* fix code
* fix win32
* test passed
* fix readme
* revert readme
* change to MODEL_ENDPOINT
* revert tail line
* fix readme
* refactor model endpoint
* remove blank line
* fix header
* fix as comments
* update comment
* update readme
---------
Co-authored-by: tastelikefeet <yuze.zyz@alibaba-inc.com>
2025-04-11 14:01:56 +02:00
henk717
8fd70f37bd
Zentk integration (Zenity/yad support) ( #1475 )
...
* Zentk integration (Zenity/yad support)
* Escape incompatible dependencies in zentk
* Properly clean env
2025-04-11 18:23:23 +08:00
Yuxuan Zhang
06bb53ad9b
llama-model : add Glm4Model implementation for GLM-4-0414 ( #12867 )
...
* GLM-4-0414
* use original one
* Using with tensor map
* fix bug
* change order
* change order
* format with flake8
2025-04-11 12:10:10 +02:00
Xuan-Son Nguyen
0c50923944
clip : use smart pointer ( ⚠️ breaking change) ( #12869 )
...
* clip : use smart pointers
* fix warmup
* add forward declaration
* missing include
* fix include (2)
* composite
* simplify batch ptr
* fix conflict
2025-04-11 12:09:39 +02:00
Akarshan Biswas
fccf9cae83
SYCL: Add fp16 type support to unary op kernels ( #12788 )
...
* SYCL: Add fp16 support to some elementwise OP kernels
* remove comment
ggml-ci
* Use static_cast directly
* remove not needed cast from tanh
* Use static cast and remove unneeded castings
* Adjust device_support_op for unary OPs
* Use cast_data and typed_data struct to deduplicate casting code
2025-04-11 16:03:50 +08:00
Daniel Han
ec6c09d0fa
convert : Llama4 RoPE fix ( #12889 )
2025-04-11 09:49:09 +02:00
R0CKSTAR
8ac9f5d765
ci : Replace freediskspace with free_disk_space in docker.yml ( #12861 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-11 09:26:17 +02:00
Daniel Bevenius
12e9158f25
xcf : add check for visionos build version ( #12854 )
...
This commit adds a check for the visionos build version used with vtool
in build-xcframework.sh. The script now checks the Xcode version and
determines whether to use "xros" or "visionos" for the build version.
This commit also uses xcrun for the vtool so that the version of vtool
in xcode command line tools is used instead of the one in the system
path.
Refs: https://github.com/ggml-org/whisper.cpp/pull/2994#issuecomment-2773292223
2025-04-11 09:24:34 +02:00
Xuan-Son Nguyen
5b1f13cb64
convert : proper tensor name mapping for llama4 ( #12870 )
...
* Llama-4 mapping
* remove hacky renaming
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2025-04-11 09:23:37 +02:00
Xuan-Son Nguyen
8b91d5355a
llama : correct rms norm for llama 4 ( #12882 )
2025-04-11 08:49:50 +02:00