koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-15 01:15:34 +00:00

Author	SHA1	Message	Date
Concedo	b5ba6c9ece	test to see if Ofast for ggml library plus batching adjustments fixes speed regression for ggmlv1 models	2024-02-25 21:14:53 +08:00
Concedo	f3a0e05d91	added noavx2 vulkan	2024-02-22 16:56:25 +08:00
henk717	01b7daf6b7	CUDA 12 support for the Conda Runtime (#680 ) * Remove libculibos dependency This dependency is something that is used to build libcudart which we are also already targeting. The individual file is no longer being distributed with the CUDA12 conda devkit, so we can no longer target it directly. But because all its functionality is inside libcudart we also don't need it. This commit removes the inclusion so that Koboldcpp can be compiled with CUDA12 as distributed by conda. I have tested this on the 1.57 release on CUDA11.5 and CUDA12.1. * Cleanup version definitions The package versions are already controlled by the label, we don't need to define it multiple times for it to work correctly. Removing the separate definitions allows people to easily change which version of CUDA they wish for their system.	2024-02-16 21:32:30 +08:00
Concedo	f81404e33c	updated class py, added imatrix	2024-01-28 22:37:11 +08:00
Concedo	2a4a7241e6	Merge branch 'vulkan_test' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # llama.cpp	2024-01-25 23:01:44 +08:00
Concedo	72f99f0545	changes required to get vulkan working on windows	2024-01-25 18:29:45 +08:00
Concedo	7b3866f211	vulkan implementation from occam (early access, squashed)	2024-01-25 18:13:19 +08:00
Concedo	e2e8da0d1d	remove mcpu native (+1 squashed commits) Squashed commits: [0617bd8f] disable fp16 VA (+1 squashed commits) Squashed commits: [4213851a] disable FP16 VA	2024-01-24 23:18:15 +08:00
Concedo	0f6fa6be93	try adding other fallback backends for linux	2024-01-23 23:37:56 +08:00
Concedo	14de08586e	added more compile flags to set apart the conda paths, and also for colab. updated readme for multitool	2024-01-21 17:38:33 +08:00
Concedo	71e9a64171	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/nix-ci.yml # CMakeLists.txt # Makefile # ggml-cuda.cu # ggml-opencl.cpp # llama.cpp	2024-01-20 23:27:42 +08:00
Concedo	425849f387	try generating fat binaries for cuda	2024-01-20 22:05:08 +08:00
DaniAndTheWeb	9ab904562e	Debian Unstable compatibility for HIP (#620 ) * Support rocm on Debian unstable * Update Makefile	2024-01-20 11:11:04 +08:00
Concedo	db14de5c32	fossilize ggml library ver 3, to support ggjtv3	2024-01-20 10:49:25 +08:00
Georgi Gerganov	c918fe8dca	metal : create autorelease pool during library build (#4970 ) * metal : create autorelease pool during library build ggml-ci * test : simplify ggml-ci	2024-01-17 18:38:39 +02:00
Georgi Gerganov	4be5ef556d	metal : remove old API (#4919 ) ggml-ci	2024-01-13 20:45:45 +02:00
Kawrakow	326b418b59	Importance Matrix calculation (#4861 ) * imatrix: 1st version * imatrix: WIP * Cleanup * Update examples/imatrix/imatrix.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-12 06:59:57 +01:00
Georgi Gerganov	b0034d93ce	examples : add passkey test (#3856 ) * examples : add passkey test * passkey : better prints * passkey : select pass key pos from CLI * passkey : simplify n_past logic * make : add passkey target * passkey : add "self-extend"-like context extension (#4810) * llama : "self-extend"-like context extension * passkey : add comment * passkey : add readme	2024-01-08 11:14:04 +02:00
henk717	abecbfc6b5	Add Conda Runtime support for -lcuda (#593 ) * Use Conda Runtime Libraries * Also look at stubs	2024-01-01 12:15:10 +08:00
Concedo	76362ae3c1	fix makefile for linux cuda	2023-12-31 11:45:36 +08:00
Concedo	cead207888	add missing dependency for linux cuda	2023-12-31 11:10:40 +08:00
Concedo	293395e0f5	Merge commit '`708e179e85`' into concedo_experimental # Conflicts: # .github/workflows/docker.yml	2023-12-25 16:48:15 +08:00
slaren	5bf3953d7e	cuda : improve cuda pool efficiency using virtual memory (#4606 ) * cuda : improve cuda pool efficiency using virtual memory * fix mixtral * fix cmake build * check for vmm support, disable for hip ggml-ci * fix hip build * clarify granularity * move all caps to g_device_caps * refactor error checking * add cuda_pool_alloc, refactor most pool allocations ggml-ci * fix hip build * CUBLAS_TF32_TENSOR_OP_MATH is not a macro * more hip crap * llama : fix msvc warnings * ggml : fix msvc warnings * minor * minor * cuda : fallback to CPU on host buffer alloc fail * Update ggml-cuda.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Update ggml-cuda.cu Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * ensure allocations are always aligned * act_size -> actual_size --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2023-12-24 14:34:22 +01:00
LeonEricsson	7082d24cec	lookup : add prompt lookup decoding example (#4484 ) * initial commit, going through initializations * main loop finished, starting to debug * BUG: generates gibberish/repeating tokens after a while * kv_cache management * Added colors to distinguish drafted tokens (--color). Updated README * lookup : fix token positions in the draft batch * lookup : use n_draft from CLI params * lookup : final touches --------- Co-authored-by: Leon Ericsson <leon.ericsson@icloud.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-12-22 18:05:56 +02:00
FantasyGmm	a55876955b	cuda : fix jetson compile error (#4560 ) * fix old jetson compile error * Update Makefile * update jetson detect and cuda version detect * update cuda marco define * update makefile and cuda,fix some issue * Update README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update Makefile * Update README.md --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-12-22 17:11:12 +02:00
Michael Kesper	28cb35a0ec	make : add LLAMA_HIP_UMA option (#4587 ) NB: LLAMA_HIP_UMA=1 (or any value) adds MK_CPPFLAG -DGGML_HIP_UMA	2023-12-22 10:03:25 +02:00
Georgi Gerganov	32259b2dad	gguf : simplify example dependencies	2023-12-21 23:08:14 +02:00
slaren	d232aca5a7	llama : initial ggml-backend integration (#4520 ) * llama : initial ggml-backend integration * add ggml-metal * cuda backend can be used though ggml-backend with LLAMA_GGML_BACKEND_CUDA_TEST access all tensor data with ggml_backend_tensor_get/set * add ggml_backend_buffer_clear zero-init KV cache buffer * add ggml_backend_buffer_is_hos, used to avoid copies if possible when accesing tensor data * disable gpu backends with ngl 0 * more accurate mlock * unmap offloaded part of the model * use posix_fadvise64(.., POSIX_FADV_SEQUENTIAL) to improve performance with mmap * update quantize and lora * update session copy/set to use ggml-backend ggml-ci * use posix_fadvise instead of posix_fadvise64 * ggml_backend_alloc_ctx_tensors_from_buft : remove old print * llama_mmap::align_offset : use pointers instead of references for out parameters * restore progress_callback behavior * move final progress_callback call to load_all_data * cuda : fix fprintf format string (minor) * do not offload scales * llama_mmap : avoid unmapping the same fragments again in the destructor * remove unnecessary unmap * metal : add default log function that prints to stderr, cleanup code ggml-ci --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-12-21 21:07:46 +01:00
Matheus Gabriel Alves Silva	919c40660f	build : Check the ROCm installation location (#4485 ) * build : Check the ROCm installation location * more generic approach * fixup! It was returning the path instead of the command output * fixup! Trailing whitespace	2023-12-17 17:23:33 +02:00
Jared Van Bortel	70f806b821	build : detect host compiler and cuda compiler separately (#4414 )	2023-12-13 12:10:10 -05:00
slaren	799a1cb13b	llama : add Mixtral support (#4406 ) * convert : support Mixtral as LLAMA arch * convert : fix n_ff typo * llama : model loading * ggml : sync latest ggml_mul_mat_id * llama : update graph to support MoE * llama : fix cur -> cur_expert * llama : first working version * llama : fix expert weighting in the FFN * ggml : ggml_get_rows support 2D indexing [n_tokens, n_experts] (cpu only) * ggml : add n_as argument to ggml_mul_mat_id * ggml : fix ggml_get_rows to take into account ne02 / ne11 * metal : add more general support for ggml_get_rows + tests * llama : add basic support for offloading moe with CUDA * metal : add/mul/div use general kernel when src1 not cont * metal : reduce the kernel launches for ggml_mul_mat_id * ggml : get_rows : support non-contiguos tensors with gaps, generalize up to 3D * ggml : update get_rows f16 and q * cuda : support non-contiguous src1 in get_rows * llama : offload missing ffn_moe_silu * metal : fix ggml_get_rows to work with non-cont src1 * metal : add indirect mat-vec kernels for all quantization types * llama : do not quantize expert gating tensors * llama : add n_expert and n_expert_used to hparams + change quants * test-backend-ops : add moe test * cuda : fix get_rows when ncols is odd * convert : determine n_ctx correctly * metal : fix ggml_mul_mat_id for F32 * test-backend-ops : make experts more evenly probable (test_moe) * test-backend-ops : cleanup, add moe test for batches * test-backend-ops : add cpy from f32 -> all types test * test-backend-ops : fix dequantize block offset * llama : fix hard-coded number of experts * test-backend-ops : simplify and disable slow tests to avoid CI timeout * test-backend-ops : disable MOE test with thread sanitizer * cuda : fix mul_mat_id with multi gpu * convert : use 1e6 rope_freq_base for mixtral * convert : fix style * convert : support safetensors format * gguf-py : bump version * metal : add cpy f16 -> f32 kernel * metal : fix binary ops for ne10 % 4 != 0 * test-backend-ops : add one more sum_rows test * ggml : do not use BLAS with ggml_mul_mat_id * convert-hf : support for mixtral-instruct (#4428) * convert : typo fix, add additional hyperparameters, use LLaMA arch for Mixtral-instruct * convert : use sentencepiece tokenizer for Mixtral-instruct * convert : make flake8 happy * metal : fix soft_max kernels ref: https://github.com/ggerganov/ggml/pull/621/commits/1914017863d2f9ab8ecc0281cc2a56d683668b92 * metal : limit kernels to not use more than the allowed threads --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Radek Pilar <github@mrkva.eu>	2023-12-13 14:04:25 +02:00
Jared Van Bortel	6138963fb2	build : target Windows 8 for standard mingw-w64 (#4405 ) * build : target Windows 8 for standard mingw-w64 * make : fix missing console.o deps This was causing a link error with `make all` on Windows.	2023-12-12 11:27:26 +02:00
Concedo	fce971d541	do not build the clblast noavx2 binary if not on windows	2023-12-11 16:17:10 +08:00
Georgi Gerganov	fe680e3d10	sync : ggml (new ops, tests, backend, etc.) (#4359 ) * sync : ggml (part 1) * sync : ggml (part 2, CUDA) * sync : ggml (part 3, Metal) * ggml : build fixes ggml-ci * cuda : restore lost changes * cuda : restore lost changes (StableLM rope) * cmake : enable separable compilation for CUDA ggml-ci * ggml-cuda : remove device side dequantize * Revert "cmake : enable separable compilation for CUDA" This reverts commit 09e35d04b1c4ca67f9685690160b35bc885a89ac. * cuda : remove assert for rope * tests : add test-backend-ops * ggml : fix bug in ggml_concat * ggml : restore `ggml_get_n_tasks()` logic in `ggml_graph_plan()` * ci : try to fix macOS * ggml-backend : remove backend self-registration * ci : disable Metal for macOS cmake build ggml-ci * metal : fix "supports family" call * metal : fix assert * metal : print resource path ggml-ci --------- Co-authored-by: slaren <slarengh@gmail.com>	2023-12-07 22:26:54 +02:00
Jared Van Bortel	511f52c334	build : enable libstdc++ assertions for debug builds (#4275 )	2023-12-01 20:18:35 +02:00
WillCorticesAI	d2809a3ba2	make : fix Apple clang determination bug (#4272 ) Co-authored-by: Will Findley <findley@gmail.com>	2023-12-01 00:23:44 +02:00
Jared Van Bortel	15f5d96037	build : fix build info generation and cleanup Makefile (#3920 ) * cmake : fix joining of REAL_GIT_DIR * fix includes with help from include-what-you-use * make : remove unneeded deps and add test-rope target * fix C includes in C++ source files * Revert "fix includes with help from include-what-you-use" This reverts commit 635e9fadfd516d4604a0fecf4a854bfb25ad17ae.	2023-12-01 00:23:08 +02:00
Georgi Gerganov	922754a8d6	lookahead : add example for lookahead decoding (#4207 ) * lookahead : init * lookahead : generate and store n-grams * lookahead : use loop instead recursion to generate n-grams * lookahead : initial working implementation * lookahead : filter repeating n-grams * lookahead : use deterministic init * lookahead : add to Makefile * lookahead : fix a bug in the seq_id of the lookahead tokens * lookahead : add comments --------- Co-authored-by: slaren <slarengh@gmail.com>	2023-11-26 20:33:07 +02:00
Kerfuffle	28a2e6e7d4	tokenize example: Respect normal add BOS token behavior (#4126 ) Allow building with Makefile	2023-11-18 14:48:17 -07:00
Roger Meier	8e9361089d	build : support ppc64le build for make and CMake (#3963 ) * build: support ppc64le build for make and CMake * build: keep __POWER9_VECTOR__ ifdef and extend with __powerpc64__ Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-17 18:11:23 +02:00
Michael Potter	6bb4908a17	Fix MacOS Sonoma model quantization (#4052 ) Co-authored-by: Jared Van Bortel <jared@nomic.ai> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-11-14 12:34:41 -05:00
Georgi Gerganov	413503d4b9	make : do not add linker flags when compiling static llava lib (#3977 )	2023-11-07 20:25:32 +03:00
Damian Stewart	381efbf480	llava : expose as a shared library for downstream projects (#3613 ) * wip llava python bindings compatibility * add external llava API * add base64 in-prompt image support * wip refactor image loading * refactor image load out of llava init * cleanup * further cleanup; move llava-cli into its own file and rename * move base64.hpp into common/ * collapse clip and llava libraries * move llava into its own subdir * wip * fix bug where base64 string was not removed from the prompt * get libllava to output in the right place * expose llava methods in libllama.dylib * cleanup memory usage around clip_image_* * cleanup and refactor again * update headerdoc * build with cmake, not tested (WIP) * Editorconfig * Editorconfig * Build with make * Build with make * Fix cyclical depts on Windows * attempt to fix build on Windows * attempt to fix build on Windows * Upd TODOs * attempt to fix build on Windows+CUDA * Revert changes in cmake * Fix according to review comments * Support building as a shared library * address review comments --------- Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com> Co-authored-by: Jared Van Bortel <jared@nomic.ai>	2023-11-07 00:36:23 +03:00
Concedo	2102942121	testing LLAMA_PORTABLE flag for building	2023-11-06 20:15:15 +08:00
Concedo	2f16eccb89	special colab build	2023-11-06 01:46:58 +08:00
Concedo	2b32b170a1	clang 15 check for macOS	2023-11-05 22:57:05 +08:00
YellowRoseCx	e2e5fe56a8	KCPP Fetches AMD ROCm Memory without a stick, CC_TURING Gets the Boot, koboldcpp_hipblas.dll Talks To The Hand, and hipBLAS Compiler Finds Its Independence! (#517 ) * AMD ROCm memory fetching and max mem setting * Update .gitignore with koboldcpp_hipblas.dll * Update CMakeLists.txt remove CC_TURING for AMD * separate hipBLAS compiler, update MMV_Y, move CXX/CC print separate hipBLAS compiler, update MMV_Y value, move the section that prints CXX and CC compiler name	2023-11-05 22:23:18 +08:00
Concedo	fca7a4c054	added noavx2 model for clblast (+1 squashed commits) Squashed commits: [291ecae6] added noavx2 mode for clblast (+1 squashed commits) Squashed commits: [562bc872] wip adding noavx2 cl	2023-11-02 15:22:34 +08:00
cebtenzzre	b12fa0d1c1	build : link against build info instead of compiling against it (#3879 ) * cmake : fix build when .git does not exist * cmake : simplify BUILD_INFO target * cmake : add missing dependencies on BUILD_INFO * build : link against build info instead of compiling against it * zig : make build info a .cpp source instead of a header Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com> * cmake : revert change to CMP0115 --------- Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>	2023-11-02 08:50:16 +02:00
Concedo	df7e757d40	windows: added simpleclinfo, which helps determine clblast platform and device on windows	2023-11-01 18:10:35 +08:00

349 commits