koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-08 01:41:37 +00:00

Author	SHA1	Message	Date
Concedo	b99ee451f8	Merge commit '`4ccea213bc`' into concedo_experimental # Conflicts: # .devops/cpu.Dockerfile # .devops/cuda.Dockerfile # .devops/intel.Dockerfile # .devops/musa.Dockerfile # .devops/rocm.Dockerfile # .github/workflows/bench.yml.disabled # .github/workflows/build.yml # .github/workflows/server.yml # CMakeLists.txt # build-xcframework.sh # ci/run.sh # common/CMakeLists.txt # examples/llama.android/llama/build.gradle.kts # examples/perplexity/perplexity.cpp # examples/run/CMakeLists.txt # examples/server/tests/README.md # examples/sycl/win-build-sycl.bat # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/aclnn_ops.h # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cpu/ggml-cpu.c # licenses/LICENSE-linenoise # scripts/sync-ggml.last # tests/CMakeLists.txt	2025-04-08 21:26:23 +08:00
Concedo	6e42e673c6	attempt to fall back to system glslc	2025-04-07 00:33:52 +08:00
Concedo	59b7796b96	binops does not need clblast anymore	2025-04-03 23:06:19 +08:00
Concedo	8c74520586	added NO_VULKAN_EXTENSIONS flag to disable dp4a and coopmat if needed	2025-04-03 20:51:17 +08:00
Concedo	e1d3c19673	clblast not working correctly	2025-03-30 21:02:30 +08:00
Concedo	61a73347c6	fixed mrope for multiple images in qwen2vl (+1 squashed commits) Squashed commits: [63e4d91c] fixed mrope for multiple images in qwen2vl (+1 squashed commits) Squashed commits: [bb78db1e] wip fixing mrope	2025-03-30 17:23:58 +08:00
Concedo	3992fb79cc	wip adding embeddings support	2025-03-24 18:01:23 +08:00
Concedo	9910f3abe0	remove precompiled vulkan shaders from repo. They will now have to be recreated in vulkan-shaders-gen from scratch at runtime, which is auto handled by the makefile for windows and linux.	2025-03-19 21:51:16 +08:00
Concedo	0cfd8d23cb	handle symlinks (+1 squashed commits) Squashed commits: [fb8477b9] fixed makefile (+4 squashed commit) Squashed commit: [4a245bba] fixed a makefile issue [d68eba69] alias usehipblas to usecublas [a9ab0a7c] dynamic rocwmma selection [fefe17c7] revert rocwmma	2025-03-17 21:03:30 +08:00
Concedo	98eade358a	more rocm include dir	2025-03-15 23:29:00 +08:00
Concedo	2c9ade61fe	test automatic vk shader rebuilding	2025-03-13 19:34:15 +08:00
Concedo	77debb1b1b	gemma3 vision works, but is using more tokens than expected - may need resizing	2025-03-13 00:31:16 +08:00
Concedo	e500968f92	fixed ggml common path in metal build	2025-03-12 10:58:57 +08:00
R0CKSTAR	251364549f	musa: support new arch mp_31 and update doc (#12296 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-03-10 18:18:25 +01:00
Concedo	7eadd0a1d3	add GGML_HIP_ROCWMMA_FATTN	2025-03-08 17:15:41 +08:00
Concedo	6b7d2349a7	Rewrite history to fix bad vulkan shader commits without increasing repo size added dpe colab (+8 squashed commit) Squashed commit: [b8362da4] updated lite [ed6c037d] move nsigma into the regular sampler stack [ac5f61c6] relative filepath fixed [05fe96ab] export template [ed0a5a3e] nix_example.md: refactor (#1401) * nix_example.md: add override example * nix_example.md: drop graphics example, already basic nixos knowledge * nix_example.md: format * nix_example.md: Vulkan is disabled on macOS Disabled in: `1ccd253acc` * nix_examples.md: nixpkgs.config.cuda{Arches -> Capabilities} Fixes: https://github.com/LostRuins/koboldcpp/issues/1367 [675c62f7] AutoGuess: Phi 4 (mini) (#1402) [`4bf56982`] phrasing [`b8c0df04`] Add Rep Pen to Top N Sigma sampler chain (#1397) - place after nsigma and before xtc (+3 squashed commit) Squashed commit: [`87c52b97`] disable VMM from HIP [`ee8906f3`] edit description [`e85c0e69`] Remove Unnecessary Rep Counting (#1394) * stop counting reps * fix range-based initializer * strike that - reverse it	2025-03-05 00:02:20 +08:00
Johannes Gäßler	a28e0d5eb1	CUDA: app option to compile without FlashAttention (#12025 )	2025-02-22 20:44:34 +01:00
Bodhi	0b3863ff95	MUSA: support ARM64 and enable dp4a .etc (#11843 ) * MUSA: support ARM64 and enable __dp4a .etc * fix cross entropy loss op for musa * update * add cc info log for musa * add comment for the MUSA .cc calculation block --------- Co-authored-by: Bodhi Hu <huaishun.hu@mthreads.com>	2025-02-21 09:46:23 +02:00
Olivier Chafik	63e489c025	tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900 ) * tool-call refactoring: moved common_chat_* to chat.h, common_chat_templates_init return a unique_ptr to opaque type * addressed clang-tidy lints in [test-]chat.* * rm minja deps from util & common & move it to common/minja/ * add name & tool_call_id to common_chat_msg * add common_chat_tool * added json <-> tools, msgs conversions to chat.h * fix double bos/eos jinja avoidance hack (was preventing inner bos/eos tokens) * fix deepseek r1 slow test (no longer <think> opening w/ new template) * allow empty tools w/ auto + grammar * fix & test server grammar & json_schema params w/ & w/o --jinja	2025-02-18 18:03:23 +00:00
Georgi Gerganov	68ff663a04	repo : update links to new url (#11886 ) * repo : update links to new url ggml-ci * cont : more urls ggml-ci	2025-02-15 16:40:57 +02:00
Concedo	816d9b7989	edit makefile flags	2025-02-08 22:36:26 +08:00
Concedo	ff9b4041da	fix builds	2025-02-07 11:46:08 +08:00
Johannes Gäßler	864a0b67a6	CUDA: use mma PTX instructions for FlashAttention (#11583 ) * CUDA: use mma PTX instructions for FlashAttention * __shfl_sync workaround for movmatrix * add __shfl_sync to HIP Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-02-02 19:31:09 +01:00
Concedo	f13498df13	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/tools.sh # .devops/vulkan.Dockerfile # .github/workflows/build.yml # .github/workflows/docker.yml # .github/workflows/server.yml # Makefile # README.md # cmake/llama-config.cmake.in # common/CMakeLists.txt # examples/gbnf-validator/gbnf-validator.cpp # examples/run/run.cpp # examples/server/README.md # examples/server/tests/README.md # ggml/src/CMakeLists.txt # ggml/src/ggml-hip/CMakeLists.txt # scripts/sync-ggml.last # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-chat-template.cpp # tests/test-grammar-integration.cpp	2025-02-01 17:14:59 +08:00
Olivier Chafik	8b576b6c55	Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639 ) --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-01-30 19:13:58 +00:00
Concedo	7a5499e77b	added one more backend for clblast noavx2 and clblast failsafe	2025-01-30 22:47:22 +08:00
Concedo	898856e183	cleaned up unused flags from makefile, updated lite	2025-01-30 19:34:55 +08:00
Concedo	2f69432774	makefile indentation fix (+1 squashed commits) Squashed commits: [f640eb59] makefile indentation fix	2025-01-29 22:18:54 +08:00
Olivier Chafik	6171c9d258	Add Jinja template support (#11016 ) * Copy minja from `58f0ca6dd7` * Add --jinja and --chat-template-file flags * Add missing <optional> include * Avoid print in get_hf_chat_template.py * No designated initializers yet * Try and work around msvc++ non-macro max resolution quirk * Update test_chat_completion.py * Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template * Refactor test-chat-template * Test templates w/ minja * Fix deprecation * Add --jinja to llama-run * Update common_chat_format_example to use minja template wrapper * Test chat_template in e2e test * Update utils.py * Update test_chat_completion.py * Update run.cpp * Update arg.cpp * Refactor common_chat_* functions to accept minja template + use_jinja option * Attempt to fix linkage of LLAMA_CHATML_TEMPLATE * Revert LLAMA_CHATML_TEMPLATE refactor * Normalize newlines in test-chat-templates for windows tests * Forward decl minja::chat_template to avoid eager json dep * Flush stdout in chat template before potential crash * Fix copy elision warning * Rm unused optional include * Add missing optional include to server.cpp * Disable jinja test that has a cryptic windows failure * minja: fix vigogne (https://github.com/google/minja/pull/22) * Apply suggestions from code review Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Finish suggested renamings * Move chat_templates inside server_context + remove mutex * Update --chat-template-file w/ recent change to --chat-template * Refactor chat template validation * Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr) * Warn against missing eos / bos tokens when jinja template references them * rename: common_chat_template[s] * reinstate assert on chat_templates.template_default * Update minja to `b8437df626` * Update minja to https://github.com/google/minja/pull/25 * Update minja from https://github.com/google/minja/pull/27 * rm unused optional header --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-01-21 13:18:51 +00:00
Concedo	b3de1598e7	Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS tts is functional (+6 squashed commit) Squashed commit: [22396311] wip tts [3a883027] tts not yet working [0dcfab0e] fix silly bug [a378d9ef] some long overdue cleanup [fc5a6fb5] Wip tts [39f50497] wip TTS integration	2025-01-13 14:23:25 +08:00
Concedo	bd38665e1f	some cleanup before starting on TTS	2025-01-10 22:13:44 +08:00
Concedo	e788b8289a	You'll never take us alive We swore that death will do us part They'll call our crimes a work of art	2025-01-09 11:27:06 +08:00
Concedo	bb2e739627	fixed simplercflags	2025-01-07 21:34:38 +08:00
Concedo	58791612d2	sse3 mode for noavx2 clblast, fixed metadata, added version command	2025-01-06 21:59:05 +08:00
Concedo	b4dc29f425	kobo cheats death again (+1 squashed commits) Squashed commits: [708e2429] kobo cheats death again	2025-01-04 01:06:41 +08:00
Concedo	22fd7a0439	fix make tools for linux	2025-01-03 11:39:23 +08:00
Concedo	2a890ec25a	Breaking change: unify the windows and linux build flags. To do a full build on windows you now need LLAMA_PORTABLE=1 LLAMA_VULKAN=1 LLAMA_CLBLAST=1	2024-12-23 22:35:54 +08:00
Concedo	1e07043a6e	clean and rename old clblast files in preparation for merge	2024-12-15 15:29:02 +08:00
HimariO	ba1cb19cdd	llama : add Qwen2VL support + multimodal RoPE (#10361 ) * Barebone Qwen2VL LLM convertor * Add Qwen2VL cli entrypoint * [WIP] add qwen2vl arch * Verify m-rope output * Add vl-rope/2d-rope support for qwen2vl ViT * update qwen2vl cli tool * update 5D tensor op workaround * [WIP] qwen2vl vision model * make batch and clip utils compatible with qwen2vl * [WIP] create inference workflow, gguf convert script but fix * correcting vision-rope behavior, add the missing last layer back to ViT * add arg parser to qwen2vl_surgery * replace variable size array with vector * cuda-gdb cmake preset * add fp32 mrope, vision rope kernel * add fp16 support for qwen2vl and m-rope * add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION` * fix rope op mode switching, out dated func args * update `llama_hparams` * update to keep up stream changes * resolve linter, test errors * add makefile entry, update speical image padding token * add mrope unit test, fix few compiler warnings * rename `mrope` related function, params * minor updates on debug util, bug fixs * add `m-rope` testcase to `test-backend-ops` * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix traililng whitespce * store `llama_hparams.rope_sections` with fixed size array * update position id tensor size check in GGML_OP_ROPE * minor updates * update `ggml_backend__supports_op` of unsupported backends remote old `rope_section` compare operator --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-14 14:43:46 +02:00
Concedo	a63c2c914d	made shaders gen deterministic, update to c++17 (+4 squashed commit) Squashed commit: [7bb2441b] made shaders gen deterministic [906e02af] Update c++ from 11 to 17 (#1263) * Update c/c++ from 11 to 17 * Update CMakeLists.txt only bump c++ [7ca430ed] C++17 ver [`b7dfb55d`] give up and switch to c++17 (+1 squashed commits) Squashed commits: [96cfbc48] give up and switch to c++17 (+5 squashed commit) Squashed commit: [19ac7c26] Revert "fixed incorrect number of params" This reverts commit 51388729bc4ffe51ab07ae02ce386219fb5e2876. [45f730da] Revert "fix for c++17" This reverts commit 050ba5f72b3358f958722addb9aaa77ff2e428ee. [51388729] fixed incorrect number of params [8f1ee54e] build latest vk shaders [050ba5f7] fix for c++17	2024-12-13 23:07:10 +08:00
Concedo	7e1abf3aaf	sync - fix cmake failing to build with c++11, updated glslc.exe to handle coopmat, sync sdtype count, aarch repack flags	2024-12-13 17:08:10 +08:00
Concedo	de64b9198c	merge checkpoint 2 - functional merge without q4_0_4_4 (need regen shaders)	2024-12-13 17:04:19 +08:00
Concedo	4548d893ee	better way to handle termux compatibility (+2 squashed commit) Squashed commit: [301986f11] better way to handle termux compatibility [16b03b225] updated lite	2024-12-11 15:05:01 +08:00
Concedo	a11bba5893	cleanup, fix native build for arm (+28 squashed commit) Squashed commit: [d1f6a4154] bundle library [947ab84b7] undo [0f9aba8d8] test [e9ac93873] test [920438202] test [`1c6d98804`] Revert "quick test" This reverts commit `acf8ec8940`. [`acf8ec894`] quick test [`6a9937233`] undo [`5a263a5bd`] test [`ddfd82bca`] test [`0b30e45da`] test [`c3bfece55`] messed up [`2a4b37fe0`] Revert "test" This reverts commit `80a1fcaeaf`. [`80a1fcaea`] test [`e2aa7d944`] test [`264d80200`] test [`f5b123173`] undo [`1ffacc484`] test [`63c0be926`] undo [`510e0377e`] ofast try fix [`4ac199b20`] try fix sigill [`1bc987ba2`] try fix illegal instruction [`7697252b1`] edit [`f87087b28`] check gcc ver [`e9dfe2cef`] try using qemu to do the pyinstaller [`b411192db`] revert [`25b5301e5`] try using qemu to do the pyinstaller [`58038cddc`] try using qemu to do the pyinstaller	2024-12-10 19:42:23 +08:00
Djip007	19d8762ab6	ggml : refactor online repacking (#10446 ) * rename ggml-cpu-aarch64.c to .cpp * reformat extra cpu backend. - clean Q4_0_N_M and IQ4_0_N_M - remove from "file" tensor type - allow only with dynamic repack - extract cpu extra bufts and convert to C++ - hbm - "aarch64" - more generic use of extra buffer - generalise extra_supports_op - new API for "cpu-accel": - amx - aarch64 * clang-format * Clean Q4_0_N_M ref Enable restrict on C++ * add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack * added/corrected control on tensor size for Q4 repacking. * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add debug logs on repacks. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-07 14:37:50 +02:00
Concedo	ece96e19bf	clean up makefile	2024-12-05 23:58:23 +08:00
Xuan Son Nguyen	91c36c269b	server : (web ui) Various improvements, now use vite as bundler (#10599 ) * hide buttons in dropdown menu * use npm as deps manager and vite as bundler * fix build * fix build (2) * fix responsive on mobile * fix more problems on mobile * sync build * (test) add CI step for verifying build * fix ci * force rebuild .hpp files * cmake: clean up generated files pre build	2024-12-03 19:38:44 +01:00
Georgi Gerganov	8648c52101	make : deprecate (#10514 ) * make : deprecate ggml-ci * ci : disable Makefile builds ggml-ci * docs : remove make references [no ci] * ci : disable swift build ggml-ci * docs : remove obsolete make references, scripts, examples ggml-ci * basic fix for compare-commits.sh * update build.md * more build.md updates * more build.md updates * more build.md updates * Update Makefile Co-authored-by: Diego Devesa <slarengh@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-12-02 21:22:53 +02:00
Wang Qin	43957ef203	build: update Makefile comments for C++ version change (#10598 )	2024-12-01 04:19:44 +01:00
Diego Devesa	7cc2d2c889	ggml : move AMX to the CPU backend (#10570 ) * ggml : move AMX to the CPU backend --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-29 21:54:58 +01:00

594 commits