koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-08 09:59:50 +00:00

Author	SHA1	Message	Date
Concedo	b3de1598e7	Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS tts is functional (+6 squashed commit) Squashed commit: [22396311] wip tts [3a883027] tts not yet working [0dcfab0e] fix silly bug [a378d9ef] some long overdue cleanup [fc5a6fb5] Wip tts [39f50497] wip TTS integration	2025-01-13 14:23:25 +08:00
Concedo	bd38665e1f	some cleanup before starting on TTS	2025-01-10 22:13:44 +08:00
Concedo	e788b8289a	You'll never take us alive We swore that death will do us part They'll call our crimes a work of art	2025-01-09 11:27:06 +08:00
Concedo	bb2e739627	fixed simplercflags	2025-01-07 21:34:38 +08:00
Concedo	58791612d2	sse3 mode for noavx2 clblast, fixed metadata, added version command	2025-01-06 21:59:05 +08:00
Concedo	b4dc29f425	kobo cheats death again (+1 squashed commits) Squashed commits: [708e2429] kobo cheats death again	2025-01-04 01:06:41 +08:00
Concedo	22fd7a0439	fix make tools for linux	2025-01-03 11:39:23 +08:00
Concedo	2a890ec25a	Breaking change: unify the windows and linux build flags. To do a full build on windows you now need LLAMA_PORTABLE=1 LLAMA_VULKAN=1 LLAMA_CLBLAST=1	2024-12-23 22:35:54 +08:00
Concedo	1e07043a6e	clean and rename old clblast files in preparation for merge	2024-12-15 15:29:02 +08:00
HimariO	ba1cb19cdd	llama : add Qwen2VL support + multimodal RoPE (#10361 ) * Barebone Qwen2VL LLM convertor * Add Qwen2VL cli entrypoint * [WIP] add qwen2vl arch * Verify m-rope output * Add vl-rope/2d-rope support for qwen2vl ViT * update qwen2vl cli tool * update 5D tensor op workaround * [WIP] qwen2vl vision model * make batch and clip utils compatible with qwen2vl * [WIP] create inference workflow, gguf convert script but fix * correcting vision-rope behavior, add the missing last layer back to ViT * add arg parser to qwen2vl_surgery * replace variable size array with vector * cuda-gdb cmake preset * add fp32 mrope, vision rope kernel * add fp16 support for qwen2vl and m-rope * add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION` * fix rope op mode switching, out dated func args * update `llama_hparams` * update to keep up stream changes * resolve linter, test errors * add makefile entry, update speical image padding token * add mrope unit test, fix few compiler warnings * rename `mrope` related function, params * minor updates on debug util, bug fixs * add `m-rope` testcase to `test-backend-ops` * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix traililng whitespce * store `llama_hparams.rope_sections` with fixed size array * update position id tensor size check in GGML_OP_ROPE * minor updates * update `ggml_backend__supports_op` of unsupported backends remote old `rope_section` compare operator --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-14 14:43:46 +02:00
Concedo	a63c2c914d	made shaders gen deterministic, update to c++17 (+4 squashed commit) Squashed commit: [7bb2441b] made shaders gen deterministic [906e02af] Update c++ from 11 to 17 (#1263) * Update c/c++ from 11 to 17 * Update CMakeLists.txt only bump c++ [7ca430ed] C++17 ver [b7dfb55d] give up and switch to c++17 (+1 squashed commits) Squashed commits: [96cfbc48] give up and switch to c++17 (+5 squashed commit) Squashed commit: [19ac7c26] Revert "fixed incorrect number of params" This reverts commit 51388729bc4ffe51ab07ae02ce386219fb5e2876. [45f730da] Revert "fix for c++17" This reverts commit 050ba5f72b3358f958722addb9aaa77ff2e428ee. [51388729] fixed incorrect number of params [8f1ee54e] build latest vk shaders [050ba5f7] fix for c++17	2024-12-13 23:07:10 +08:00
Concedo	7e1abf3aaf	sync - fix cmake failing to build with c++11, updated glslc.exe to handle coopmat, sync sdtype count, aarch repack flags	2024-12-13 17:08:10 +08:00
Concedo	de64b9198c	merge checkpoint 2 - functional merge without q4_0_4_4 (need regen shaders)	2024-12-13 17:04:19 +08:00
Concedo	4548d893ee	better way to handle termux compatibility (+2 squashed commit) Squashed commit: [301986f11] better way to handle termux compatibility [16b03b225] updated lite	2024-12-11 15:05:01 +08:00
Concedo	a11bba5893	cleanup, fix native build for arm (+28 squashed commit) Squashed commit: [d1f6a4154] bundle library [947ab84b7] undo [0f9aba8d8] test [e9ac93873] test [920438202] test [1c6d98804] Revert "quick test" This reverts commit acf8ec8940. [acf8ec894] quick test [6a9937233] undo [5a263a5bd] test [ddfd82bca] test [0b30e45da] test [c3bfece55] messed up [2a4b37fe0] Revert "test" This reverts commit 80a1fcaeaf. [80a1fcaea] test [e2aa7d944] test [264d80200] test [f5b123173] undo [1ffacc484] test [63c0be926] undo [510e0377e] ofast try fix [4ac199b20] try fix sigill [1bc987ba2] try fix illegal instruction [7697252b1] edit [f87087b28] check gcc ver [e9dfe2cef] try using qemu to do the pyinstaller [b411192db] revert [25b5301e5] try using qemu to do the pyinstaller [58038cddc] try using qemu to do the pyinstaller	2024-12-10 19:42:23 +08:00
Djip007	19d8762ab6	ggml : refactor online repacking (#10446 ) * rename ggml-cpu-aarch64.c to .cpp * reformat extra cpu backend. - clean Q4_0_N_M and IQ4_0_N_M - remove from "file" tensor type - allow only with dynamic repack - extract cpu extra bufts and convert to C++ - hbm - "aarch64" - more generic use of extra buffer - generalise extra_supports_op - new API for "cpu-accel": - amx - aarch64 * clang-format * Clean Q4_0_N_M ref Enable restrict on C++ * add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack * added/corrected control on tensor size for Q4 repacking. * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add debug logs on repacks. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-07 14:37:50 +02:00
Concedo	ece96e19bf	clean up makefile	2024-12-05 23:58:23 +08:00
Xuan Son Nguyen	91c36c269b	server : (web ui) Various improvements, now use vite as bundler (#10599) * hide buttons in dropdown menu * use npm as deps manager and vite as bundler * fix build * fix build (2) * fix responsive on mobile * fix more problems on mobile * sync build * (test) add CI step for verifying build * fix ci * force rebuild .hpp files * cmake: clean up generated files pre build	2024-12-03 19:38:44 +01:00
Georgi Gerganov	8648c52101	make : deprecate (#10514) * make : deprecate ggml-ci * ci : disable Makefile builds ggml-ci * docs : remove make references [no ci] * ci : disable swift build ggml-ci * docs : remove obsolete make references, scripts, examples ggml-ci * basic fix for compare-commits.sh * update build.md * more build.md updates * more build.md updates * more build.md updates * Update Makefile Co-authored-by: Diego Devesa <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-12-02 21:22:53 +02:00
Wang Qin	43957ef203	build: update Makefile comments for C++ version change (#10598)	2024-12-01 04:19:44 +01:00
Diego Devesa	7cc2d2c889	ggml : move AMX to the CPU backend (#10570) * ggml : move AMX to the CPU backend --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-29 21:54:58 +01:00
Tristan Druyen	be0e350c8b	Fix HIP flag inconsistency & build docs (#10524) * Fix inconsistency of HIP flags in cmake & make * Fix docs regarding GGML_HIP	2024-11-26 19:27:28 +01:00
R0CKSTAR	249cd93da3	mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (#10516) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2024-11-26 17:00:41 +01:00
Concedo	b9e99c69e8	fixed build	2024-11-26 22:06:55 +08:00
Eric Curtin	0cc63754b8	Introduce llama-run (#10291) It's like simple-chat but it uses smart pointers to avoid manual memory cleanups. Less memory leaks in the code now. Avoid printing multiple dots. Split code into smaller functions. Uses no exception handling. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2024-11-25 22:56:24 +01:00
Diego Devesa	5931c1f233	ggml : add support for dynamic loading of backends (#10469) * ggml : add support for dynamic loading of backends --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-25 15:13:39 +01:00
Georgi Gerganov	d9d54e498d	speculative : refactor and add a simpler example (#10362 ) * speculative : refactor and add a simpler example ggml-ci * speculative : clean-up and add comments and TODOs [no ci] * speculative : manage context in common_speculative ggml-ci * speculative : simplify ggml-ci * speculative : simplify (cont) ggml-ci * speculative : add --draft-min CLI arg * speculative : minor fixup * make : build fixes * speculative : do not redraft previous drafts ggml-ci * speculative : fix the draft sampling ggml-ci * speculative : fix compile warning * common : refactor args ggml-ci * common : change defaults [no ci] * common : final touches ggml-ci	2024-11-25 09:58:41 +02:00
Concedo	dbbdb2eedc	try fix macos build again (+3 squashed commit) Squashed commit: [7d2a67132] fix ci builds [f0a5f0a97] fixed a typo [8736d9034] try fix ci builds (+1 squashed commits) Squashed commits: [c2ae5a542] Revert "updated ci" This reverts commit d8ebdde6ee.	2024-11-21 23:15:51 +08:00
Anthony Van de Gejuchte	3952a221af	Fix missing file renames in Makefile due to changes in commit ae8de6d50a (#10413)	2024-11-19 23:18:17 +01:00
Concedo	ee586b9a9d	fixed vulkan	2024-11-19 01:26:31 +08:00
Georgi Gerganov	cf32a9b93a	metal : refactor kernel args into structs (#10238 ) * metal : add kernel arg structs (wip) * metal : fattn args ggml-ci * metal : cont + avoid potential int overflow [no ci] * metal : mul mat struct (wip) * cont : mul mat vec * cont : pass by reference * cont : args is first argument * cont : use char ptr * cont : shmem style * cont : thread counters style * cont : mul mm id ggml-ci * cont : int safety + register optimizations ggml-ci * metal : GGML_OP_CONCAT ggml-ci * metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV * metal : GGML_OP_REPEAT * metal : GGML_OP_CPY * metal : GGML_OP_RMS_NORM * metal : GGML_OP_NORM * metal : add TODOs for rest of ops * ggml : add ggml-metal-impl.h ggml-ci	2024-11-17 11:23:01 +02:00
Johannes Gäßler	c3ea58aca4	CUDA: remove DMMV, consolidate F16 mult mat vec (#10318)	2024-11-17 09:09:55 +01:00
Georgi Gerganov	a4200cafad	make : add ggml-opt (#0) ggml-ci	2024-11-17 08:30:29 +02:00
Georgi Gerganov	84274a10c3	tests : remove test-grad0	2024-11-17 08:30:29 +02:00
Concedo	d6932bbff8	test fix linux build	2024-11-17 02:43:42 +08:00
Concedo	e1f0b0bedd	try fix macos build (+1 squashed commits) Squashed commits: [ae66dddfd] try fix macos build	2024-11-17 02:37:08 +08:00
Georgi Gerganov	8ee0d09ae6	make : auto-determine dependencies (#0)	2024-11-16 20:36:26 +02:00
Concedo	70aee82552	attempts a backflip, but does he stick the landing?	2024-11-16 17:05:45 +08:00
slaren	883d206fbd	ggml : fix some build issues	2024-11-15 21:45:32 +02:00
Charles Xu	1607a5e5b0	backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921) * backend-cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2024-11-15 01:28:50 +01:00
Diego Devesa	ae8de6d50a	ggml : build backends as libraries (#10256) * ggml : build backends as libraries --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>	2024-11-14 18:04:35 +01:00
Georgi Gerganov	ec450d3bbf	metal : opt-in compile flag for BF16 (#10218) * metal : opt-in compile flag for BF16 ggml-ci * ci : use BF16 ggml-ci * swift : switch back to v12 * metal : has_float -> use_float ggml-ci * metal : fix BF16 check in MSL ggml-ci	2024-11-08 21:59:46 +02:00
Xuan Son Nguyen	a71d81cf8c	server : revamp chat UI with vuejs and daisyui (#10175 ) * server : simple chat UI with vuejs and daisyui * move old files to legacy folder * embed deps into binary * basic markdown support * add conversation history, save to localStorage * fix bg-base classes * save theme preferences * fix tests * regenerate, edit, copy buttons * small fixes * docs: how to use legacy ui * better error handling * make CORS preflight more explicit * add GET method for CORS * fix tests * clean up a bit * better auto scroll * small fixes * use collapse-arrow * fix closeAndSaveConfigDialog * small fix * remove console.log * fix style for <pre> element * lighter bubble color (less distract when reading)	2024-11-07 17:31:10 -04:00
Concedo	847689e74c	fixed incorrect makefile flags	2024-11-04 20:39:10 +08:00
Concedo	bb13925f39	Merge branch 'upstream' into concedo_experimental # Conflicts: # CMakePresets.json # Makefile # Package.swift # ci/run.sh # common/CMakeLists.txt # examples/CMakeLists.txt # flake.lock # ggml/src/CMakeLists.txt # ggml/src/ggml-backend.cpp # ggml/src/ggml.c # pocs/vdot/q8dot.cpp # pocs/vdot/vdot.cpp # tests/test-backend-ops.cpp # tests/test-grad0.cpp # tests/test-quantize-fns.cpp # tests/test-quantize-perf.cpp # tests/test-rope.cpp	2024-11-04 16:54:53 +08:00
Diego Devesa	9f40989351	ggml : move CPU backend to a separate file (#10144)	2024-11-03 19:34:08 +01:00
Diego Devesa	a6744e43e8	llama : add simple-chat example (#10124) * llama : add simple-chat example --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>	2024-11-01 23:50:59 +01:00
Ma Mingfei	60ce97c9d8	add amx kernel for gemm (#8998 ) add intel amx isa detection add vnni kernel for gemv cases add vnni and amx kernel support for block_q8_0 code cleanup fix packing B issue enable openmp fine tune amx kernel switch to aten parallel pattern add error message for nested parallelism code cleanup add f16 support in ggml-amx add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS update CMakeList update README fix some compilation warning fix compiler warning when amx is not enabled minor change ggml-ci move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp ggml-ci update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16 ggml-ci add amx as an ggml-backend update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h minor change update CMakeLists.txt minor change apply weight prepacking in set_tensor method in ggml-backend fix compile error ggml-ci minor change ggml-ci update CMakeLists.txt ggml-ci add march dependency minor change ggml-ci change ggml_backend_buffer_is_host to return false for amx backend ggml-ci fix supports_op use device reg for AMX backend ggml-ci minor change ggml-ci minor change fix rebase set .buffer_from_host_ptr to be false for AMX backend	2024-10-18 13:34:36 +08:00
Concedo	24a58a8b16	try fix hipblass build	2024-10-06 17:11:08 +08:00
Concedo	3e1cbedbae	Merge commit 'c83ad6d01e' into concedo_experimental # Conflicts: # .github/workflows/bench.yml.disabled # Makefile # Package.swift # README.md # docs/backend/SYCL.md # examples/CMakeLists.txt # examples/benchmark/benchmark-matmult.cpp # ggml/src/CMakeLists.txt # scripts/sync-ggml-am.sh # scripts/sync-ggml.sh # src/llama.cpp # tests/test-backend-ops.cpp	2024-10-05 22:17:33 +08:00


565 commits