koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-31 21:39:42 +00:00

Author	SHA1	Message	Date
Concedo	00a686fc72	fixed fast forwarding context corruption after abort during prompt processing	2024-12-10 22:37:40 +08:00
Concedo	a11bba5893	cleanup, fix native build for arm (+28 squashed commit) Squashed commit: [d1f6a4154] bundle library [947ab84b7] undo [0f9aba8d8] test [e9ac93873] test [920438202] test [1c6d98804] Revert "quick test" This reverts commit acf8ec8940. [acf8ec894] quick test [6a9937233] undo [5a263a5bd] test [ddfd82bca] test [0b30e45da] test [c3bfece55] messed up [2a4b37fe0] Revert "test" This reverts commit 80a1fcaeaf. [80a1fcaea] test [e2aa7d944] test [264d80200] test [f5b123173] undo [1ffacc484] test [63c0be926] undo [510e0377e] ofast try fix [4ac199b20] try fix sigill [1bc987ba2] try fix illegal instruction [7697252b1] edit [f87087b28] check gcc ver [e9dfe2cef] try using qemu to do the pyinstaller [b411192db] revert [25b5301e5] try using qemu to do the pyinstaller [58038cddc] try using qemu to do the pyinstaller	2024-12-10 19:42:23 +08:00
Concedo	e9d2332dd8	improved tool calls and whisper	2024-12-06 14:34:31 +08:00
Concedo	836c06d91a	minor edit	2024-12-06 00:37:38 +08:00
Concedo	746cb01843	remove test since it wont work on x64	2024-12-06 00:26:58 +08:00
Concedo	65a11451e3	fix missing bundled files	2024-12-06 00:21:08 +08:00
Concedo	fe72c8db9f	CI for ARM should appear as ARM	2024-12-06 00:12:30 +08:00
Concedo	5cddd0a878	Merge branch 'concedo' into concedo_experimental	2024-12-05 23:58:31 +08:00
Concedo	ece96e19bf	clean up makefile	2024-12-05 23:58:23 +08:00
Concedo	8d5bb06aeb	test aarch64 ci workflow	2024-12-05 23:57:25 +08:00
Concedo	d0d1d922de	handle and fix temp paths to chat completions adapter	2024-12-05 17:22:35 +08:00
Concedo	5106816eac	drafted tokens debug prints	2024-12-05 17:05:20 +08:00
Concedo	2787fca6b4	refactored library selection, fixed ollama params	2024-12-05 16:47:52 +08:00
Concedo	52cc908f7f	default trim_stop to true, which trims any tokens after a stop sequence and the stop sequence itself. This is potentially a breaking change.	2024-12-03 22:44:10 +08:00
Concedo	7d11d2946c	only show warning if more than 1 moved tensor	2024-12-03 22:09:26 +08:00
Ikko Eltociear Ashimine	ed9e229372	docs: update README.md (#1244) recomended -> recommended	2024-12-02 17:20:20 +08:00
Concedo	2ba5949054	updated sdcpp, also set euler as default sampler	2024-12-01 17:00:20 +08:00
Concedo	e93c2427b4	allow incompatible vocab in debugmode	2024-12-01 14:11:03 +08:00
Concedo	42228b9746	warning when selecting non gguf models	2024-12-01 13:35:51 +08:00
Concedo	d5e732f3ab	updated lite	2024-12-01 01:49:09 +08:00
Concedo	b7cd210cd2	more linting with Ruff (+1 squashed commits) Squashed commits: [43802cfe2] Applied default Ruff linting	2024-12-01 01:23:13 +08:00
Concedo	409e393d10	fixed critical bug in image model loader	2024-11-30 23:28:24 +08:00
Concedo	153da19274	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md	2024-11-30 16:59:25 +08:00
Concedo	0028e71993	special handling to resolve incomplete utf8 token sequences in qwen	2024-11-30 16:54:01 +08:00
Concedo	32ac3153e4	default speculative set to 8. added more adapter fields	2024-11-30 16:18:27 +08:00
Georgi Gerganov	3e0ba0e604	readme : remove old badge	2024-11-30 10:09:21 +02:00
Georgi Gerganov	abadba05be	readme : refresh (#10587) * readme : refresh * readme : move section [no ci] * readme : clarify [no ci] * readme : fixes [no ci] * readme : more fixes [no ci] * readme : simplify [no ci] * readme : clarify GGUF	2024-11-30 09:47:07 +02:00
Eve	0533e7fb38	vulkan: Dynamic subgroup size support for Q6_K mat_vec (#10536) * subgroup 64 version with subgroup add. 15% faster scalable version tested for subgroup sizes 16-128 * check for subgroup multiple of 16 and greater than 16 * subgroup sizes are always a power of 2 (https://github.com/KhronosGroup/GLSL/issues/45) * force 16 sequential threads per block * make 16 subgroup size a constant	2024-11-30 08:00:02 +01:00
Concedo	5353bfa983	updated lite	2024-11-30 12:26:20 +08:00
Concedo	557bcaf86e	Merge branch 'upstream' into concedo_experimental # Conflicts: # .clang-tidy # .github/workflows/build.yml # Makefile # Package.swift # common/CMakeLists.txt # examples/batched-bench/CMakeLists.txt # examples/batched/CMakeLists.txt # examples/convert-llama2c-to-ggml/CMakeLists.txt # examples/cvector-generator/CMakeLists.txt # examples/embedding/CMakeLists.txt # examples/eval-callback/CMakeLists.txt # examples/export-lora/CMakeLists.txt # examples/gbnf-validator/CMakeLists.txt # examples/gguf-split/CMakeLists.txt # examples/gguf/CMakeLists.txt # examples/gritlm/CMakeLists.txt # examples/imatrix/CMakeLists.txt # examples/infill/CMakeLists.txt # examples/llama-bench/CMakeLists.txt # examples/llava/CMakeLists.txt # examples/lookahead/CMakeLists.txt # examples/lookup/CMakeLists.txt # examples/main-cmake-pkg/CMakeLists.txt # examples/main/CMakeLists.txt # examples/parallel/CMakeLists.txt # examples/passkey/CMakeLists.txt # examples/perplexity/CMakeLists.txt # examples/quantize-stats/CMakeLists.txt # examples/quantize/CMakeLists.txt # examples/retrieval/CMakeLists.txt # examples/run/CMakeLists.txt # examples/save-load-state/CMakeLists.txt # examples/server/CMakeLists.txt # examples/simple-chat/CMakeLists.txt # examples/simple/CMakeLists.txt # examples/speculative-simple/CMakeLists.txt # examples/speculative/CMakeLists.txt # examples/tokenize/CMakeLists.txt # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-backend.cpp # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt # pocs/vdot/CMakeLists.txt # src/CMakeLists.txt # src/unicode.cpp # tests/test-sampling.cpp	2024-11-30 12:24:51 +08:00
Concedo	697ca70115	temp checkpoint	2024-11-30 12:13:20 +08:00
Concedo	ec95241e38	temp checkpoint	2024-11-30 11:59:27 +08:00
Concedo	0c8939be19	temp checkpoint	2024-11-30 11:57:28 +08:00
Concedo	e0c59486ee	default to 12 tokens drafted	2024-11-30 11:52:07 +08:00
Concedo	b21d0fe3ac	customizable speculative size	2024-11-30 11:28:19 +08:00
Concedo	f75bbb945f	speculative decoding initial impl completed (+6 squashed commit) Squashed commit: [0a6306ca0] draft wip dont use (will be squashed) [a758a1c9c] wip dont use (will be squashed) [e1994d3ce] wip dont use [f59690d68] wip [77228147d] wip on spec decoding. dont use yet [2445bca54] wip adding speculative decoding (+1 squashed commits) Squashed commits: [50e341bb7] wip adding speculative decoding	2024-11-30 10:41:10 +08:00
Diego Devesa	7cc2d2c889	ggml : move AMX to the CPU backend (#10570) * ggml : move AMX to the CPU backend --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-29 21:54:58 +01:00
Xuan Son Nguyen	b782e5c7d4	server : add more test cases (#10569) * server : add split model test * add test speculative * add invalid cases	2024-11-29 21:48:56 +01:00
Robert Collins	3a8e9af402	imatrix : support combine-only (#10492) * imatrix-combine-only idea * ensured that behavior consistent with log	2024-11-29 19:21:37 +02:00
Diego Devesa	a3a3048e7a	cleanup UI link list (#10577) * cleanup UI link list * sort list alphabetically * add missing licenses	2024-11-29 17:45:08 +01:00
Georgi Gerganov	f0678c5ff4	ggml : fix I8MM Q4_1 scaling factor conversion (#10562) ggml-ci	2024-11-29 16:25:39 +02:00
Shupei Fan	4b3242bbea	ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (#10580)	2024-11-29 14:49:02 +01:00
Alberto Cabrera Pérez	0f77aae560	sycl : offload of get_rows set to 0 (#10432)	2024-11-29 20:38:45 +08:00
Alberto Cabrera Pérez	266b8519ee	sycl : Reroute permuted mul_mats through oneMKL (#10408) This PR fixes the failing MUL_MAT tests for the sycl backend.	2024-11-29 09:49:43 +00:00
Chenguang Li	938f608742	CANN: RoPE operator optimization (#10563) * [cann] RoPE operator optimization * [CANN]Code Formatting --------- Co-authored-by: noemotiovon <noemotiovon@gmail.com>	2024-11-29 14:46:55 +08:00
Jeff Bolz	f095a649ec	vulkan: get the first command buffer submitted sooner (#10499) This is an incremental improvement over #9118 to get work to the GPU a bit sooner. The first part is to start with a smaller number of nodes before the first submit, and ramp it up to the current 100 nodes/submit. The second part is to reduce the dryrun overhead for all the nodes that just need to request descriptor space. With these changes I get around 1-2% speedup on RTX 4070 combined with my old Haswell-era CPU.	2024-11-29 07:18:02 +01:00
Ting Lou	678d7994f4	llava: return false instead of exit (#10546)	2024-11-29 01:09:46 +01:00
Georgi Gerganov	dc22344088	ggml : remove redundant copyright notice + update authors	2024-11-28 20:46:40 +02:00
Georgi Gerganov	4c0a95b107	llama : add missing model types	2024-11-28 20:45:07 +02:00
Xuan Son Nguyen	6c59567689	server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (#10568) * server : (tests) don't use thread for capturing stdout/stderr * test: bump openai to 1.55.2 * bump openai to 1.55.3	2024-11-28 19:17:49 +01:00

6333 commits