koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-11 04:51:25 +00:00

Author	SHA1	Message	Date
Concedo	26b71e33b1	Revert "vulkan: matmul dequantization improvements (#12015)" This reverts commit `fbeda9002d`.	2025-03-05 00:02:47 +08:00
Concedo	6b7d2349a7	Rewrite history to fix bad vulkan shader commits without increasing repo size added dpe colab (+8 squashed commit) Squashed commit: [b8362da4] updated lite [ed6c037d] move nsigma into the regular sampler stack [ac5f61c6] relative filepath fixed [05fe96ab] export template [ed0a5a3e] nix_example.md: refactor (#1401) * nix_example.md: add override example * nix_example.md: drop graphics example, already basic nixos knowledge * nix_example.md: format * nix_example.md: Vulkan is disabled on macOS Disabled in: `1ccd253acc` * nix_examples.md: nixpkgs.config.cuda{Arches -> Capabilities} Fixes: https://github.com/LostRuins/koboldcpp/issues/1367 [675c62f7] AutoGuess: Phi 4 (mini) (#1402) [`4bf56982`] phrasing [`b8c0df04`] Add Rep Pen to Top N Sigma sampler chain (#1397) - place after nsigma and before xtc (+3 squashed commit) Squashed commit: [`87c52b97`] disable VMM from HIP [`ee8906f3`] edit description [`e85c0e69`] Remove Unnecessary Rep Counting (#1394) * stop counting reps * fix range-based initializer * strike that - reverse it	2025-03-05 00:02:20 +08:00
Concedo	50eae1ffeb	added trycatch for ipv4	2025-02-26 00:45:06 +08:00
Reithan	62cd9bb0b2	use range neq zero instead of lt (#1388)	2025-02-24 18:47:19 +08:00
Concedo	12c501f723	fixed wrong file open mode	2025-02-24 15:14:02 +08:00
Concedo	159c47f0e6	Merge commit '`335eb04a91`' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CONTRIBUTING.md # Makefile # docs/build.md # examples/llama.swiftui/llama.swiftui/UI/ContentView.swift # examples/run/run.cpp # ggml/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cuda/CMakeLists.txt # ggml/src/ggml-musa/CMakeLists.txt	2025-02-24 11:55:14 +08:00
Concedo	ccd2dbe020	added support for server side save slots	2025-02-24 00:20:16 +08:00
Concedo	5ee7cbe08c	add cydonia to colab	2025-02-22 23:02:44 +08:00
Rohanjames1997	335eb04a91	ci : Build on Github-hosted arm64 runners (#12009)	2025-02-22 11:48:57 +01:00
Georgi Gerganov	cf756d6e0a	server : disable Nagle's algorithm (#12020)	2025-02-22 11:46:31 +01:00
Gian-Carlo Pascutto	d70908421f	cuda: Add Q5_1, Q5_0, Q4_1 and Q4_0 to F32 conversion support. (#12000)	2025-02-22 09:43:24 +01:00
Daniel Bevenius	de8b5a3624	llama.swiftui : add "Done" dismiss button to help view (#11998) The commit updates the help view in the llama.swiftui example to use a NavigationView and a Done button to dismiss the help view. The motivation for this is that without this change there is no way to dismiss the help view.	2025-02-22 06:33:29 +01:00
Georgi Gerganov	51f311e057	llama : skip loading unused tensors (#12004) * llama : assign unknown/unused tensors to host buffer type ggml-ci * llama : skip unused tensors ggml-ci	2025-02-21 18:33:18 +02:00
Concedo	34a0fab87c	updated to latest clinfo from https://github.com/Oblomov/clinfo direct link: https://ci.appveyor.com/api/projects/oblomov/clinfo/artifacts/clinfo.exe?job=platform%3a+x64	2025-02-21 19:51:27 +08:00
Johannes Gäßler	586d5fe6eb	doc: update contributing guidelines [no ci] (#11969)	2025-02-21 12:51:25 +01:00
PureJourney	ecc8e3aeff	CUDA: correct the lowest Maxwell supported by CUDA 12 (#11984) * CUDA: correct the lowest Maxwell supported by CUDA 12 --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-02-21 12:21:05 +01:00
Bodhi	0b3863ff95	MUSA: support ARM64 and enable dp4a, etc. (#11843) * MUSA: support ARM64 and enable __dp4a, etc. * fix cross entropy loss op for musa * update * add cc info log for musa * add comment for the MUSA .cc calculation block --------- Co-authored-by: Bodhi Hu <huaishun.hu@mthreads.com>	2025-02-21 09:46:23 +02:00
Concedo	f2ac10c014	added nsigma to lite	2025-02-21 15:11:24 +08:00
EquinoxPsychosis	2740af3660	add top n sigma sampler from llama.cpp (#1384) * Add N Sigma Sampler * update nsigma sampler chain * xtc position fix * remove stray newline --------- Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>	2025-02-21 14:31:42 +08:00
Alex Brooks	ee02ad02c5	clip : fix visual encoders with no CLS (#11982) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2025-02-21 08:11:03 +02:00
Concedo	5f74ee3c3b	merge sd fix	2025-02-21 11:16:26 +08:00
momonga	c392e5094d	server (webui): Fix Premature Submission During IME Conversion (#11971) * fix skip ime composing * fix npm rebuild * fix warn --------- Co-authored-by: momonga <115213907+mmnga@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-02-20 19:43:22 +01:00
Concedo	6d7ef10671	Merge branch 'upstream' into concedo_experimental Re-enable qwen2vl GPU for vulkan https://github.com/ggml-org/llama.cpp/pull/11902 # Conflicts: # .github/workflows/build.yml # .github/workflows/docker.yml # .gitignore # CONTRIBUTING.md # Makefile # common/CMakeLists.txt # common/arg.cpp # common/common.cpp # examples/main/main.cpp # examples/run/run.cpp # examples/server/tests/README.md # ggml/src/ggml-cuda/mma.cuh # scripts/get_chat_template.py # tests/test-backend-ops.cpp # tests/test-chat-template.cpp # tests/test-chat.cpp	2025-02-20 23:17:20 +08:00
Concedo	41350df81f	updated lite, added ability to export kcpps via CLI	2025-02-20 22:58:12 +08:00
Charles Xu	c5d91a7400	ggml-cpu: Add CPU backend support for KleidiAI library (#11390) * ggml-cpu: Add CPU backend support for KleidiAI library * Add environmental variable GGML_KLEIDIAI_SME * Add support for multithread LHS conversion * Switch kernel selection order to dotprod and i8mm * updates for review comments * More updates for review comments * Reorganize and rename KleidiAI files * Move ggml-cpu-traits.h to source file * Update cmake for SME build and add alignment for SME * Remove append GGML_USE_CPU_KLEIDIAI to the GGML_CDEF_PUBLIC list	2025-02-20 15:06:51 +02:00
Prashant Vithule	4806498bf1	ggml: aarch64: implement SVE kernels for q3_K_q8_K vector dot (#11917) * Added SVE Implementation for Q3_K Kernel in ggml-cpu-quants.c file * Improved Formatting of code in ggml-cpu-quants.c file * style : minor fixes * style : less whitespaces * style : ptr spacing --------- Co-authored-by: vithulep <p.m.vithule1517@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-20 12:08:32 +02:00
Michael Engel	0d559580a0	run : add --chat-template-file (#11961) Relates to: https://github.com/ggml-org/llama.cpp/issues/11178 Added --chat-template-file CLI option to llama-run. If specified, the file will be read and the content passed for overwriting the chat template of the model to common_chat_templates_from_model. Signed-off-by: Michael Engel <mengel@redhat.com>	2025-02-20 10:35:11 +02:00
Johannes Gäßler	d04e7163c8	doc: add links to ggml examples [no ci] (#11958)	2025-02-19 20:45:17 +01:00
Daniel Bevenius	d07c621393	common : add llama.vim preset for Qwen2.5 Coder (#11945) This commit adds a preset for llama.vim to use the default Qwen 2.5 Coder models. The motivation for this change is to make it easier to start a server suitable to be used with the llama.vim plugin. For example, the server can be started with a command like the following: ```console $ llama.vim --fim-qwen-1.5b-default ``` Refs: https://github.com/ggml-org/llama.cpp/issues/10932	2025-02-19 12:29:52 +01:00
Georgi Gerganov	abd4d0bc4f	speculative : update default params (#11954) * speculative : update default params * speculative : do not discard the last drafted token	2025-02-19 13:29:42 +02:00
Daniel Bevenius	9626d9351a	llama : fix indentation in llama-grammar [no ci] (#11943) This commit adjusts the indentation for the functions `parse_sequence` and `parse_rule` in src/llama-grammar.cpp. The motivation is to improve consistency and readability.	2025-02-19 06:16:23 +01:00
igardev	b58934c183	server : (webui) Enable communication with parent html (if webui is in iframe) (#11940) * Webui: Enable communication with parent html (if webui is in iframe): - Listens for "setText" command from parent with "text" and "context" fields. "text" is set in inputMsg, "context" is used as hidden context on the following requests to the llama.cpp server - On pressing the Escape button, sends command "escapePressed" to the parent Example handling from the parent html side: - Send command "setText" from parent html to webui in iframe: const iframe = document.getElementById('askAiIframe'); if (iframe) { iframe.contentWindow.postMessage({ command: 'setText', text: text, context: context }, '*'); } - Listen for Escape key from webui on parent html: // Listen for escape key event in the iframe window.addEventListener('keydown', (event) => { if (event.key === 'Escape') { // Process case when Escape is pressed inside webui } }); Move the extraContext from storage to app.context. * Fix formatting. * add Message.extra * format + build * MessageExtraContext * build * fix display * rm console.log --------- Co-authored-by: igardev <ivailo.gardev@akros.ch> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-02-18 23:01:44 +01:00
Olivier Chafik	63e489c025	tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900) * tool-call refactoring: moved common_chat_* to chat.h, common_chat_templates_init return a unique_ptr to opaque type * addressed clang-tidy lints in [test-]chat.* * rm minja deps from util & common & move it to common/minja/ * add name & tool_call_id to common_chat_msg * add common_chat_tool * added json <-> tools, msgs conversions to chat.h * fix double bos/eos jinja avoidance hack (was preventing inner bos/eos tokens) * fix deepseek r1 slow test (no longer <think> opening w/ new template) * allow empty tools w/ auto + grammar * fix & test server grammar & json_schema params w/ & w/o --jinja	2025-02-18 18:03:23 +00:00
Xuan-Son Nguyen	63ac128563	server : add TEI API format for /rerank endpoint (#11942) * server : add TEI API format for /rerank endpoint * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix * also gitignore examples/server/*.gz.hpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-02-18 14:21:41 +01:00
MoonRide303	5137da7b8c	scripts: corrected encoding when getting chat template (#11866) (#11907) Signed-off-by: MoonRide303 <moonride303@gmail.com>	2025-02-18 10:30:16 +01:00
xiaobing318	09aaf4f1f5	docs : Fix duplicated file extension in test command (#11935) This commit fixes an issue in the llama.cpp project where the command for testing the llama-server object contained a duplicated file extension. The original command was: ./tests.sh unit/test_chat_completion.py.py -v -x It has been corrected to: ./tests.sh unit/test_chat_completion.py -v -x This change ensures that the test script correctly locates and executes the intended test file, preventing test failures due to an incorrect file name.	2025-02-18 10:12:49 +01:00
Johannes Gäßler	73e2ed3ce3	CUDA: use async data loading for FlashAttention (#11894) * CUDA: use async data loading for FlashAttention --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-02-17 14:03:24 +01:00
Eve	f7b1116af1	update release requirements (#11897)	2025-02-17 12:20:23 +01:00
Antoine Viallon	c4d29baf32	server : fix divide-by-zero in metrics reporting (#11915)	2025-02-17 11:25:12 +01:00
Rémy O	2eea03d86a	vulkan: implement several ops relevant for ggml_opt (#11769) * vulkan: support memset_tensor * vulkan: support GGML_OP_SUM * vulkan: implement GGML_OP_ARGMAX * vulkan: implement GGML_OP_SUB * vulkan: implement GGML_OP_COUNT_EQUAL * vulkan: implement GGML_OP_OPT_STEP_ADAMW * vulkan: fix check_results RWKV_WKV6 crash and memory leaks * vulkan: implement GGML_OP_REPEAT_BACK * tests: remove invalid test-backend-ops REPEAT_BACK tests * vulkan: fix COUNT_EQUAL memset using a fillBuffer command	2025-02-17 07:55:57 +01:00
Concedo	a67044270a	Merge remote-tracking branch 'jg/cuda-fa-mma-17' into debug4	2025-02-17 09:50:11 +08:00
Xuan-Son Nguyen	0f2bbe6564	server : bump httplib to 0.19.0 (#11908)	2025-02-16 17:11:22 +00:00
Concedo	6fa50f78bf	allow kcppt for config switching	2025-02-17 00:48:34 +08:00
Concedo	15ae98c9cd	better error handling for downloads	2025-02-16 23:13:09 +08:00
Concedo	58380153b2	safer autoguess fix verbose outputs (+3 squashed commit) Squashed commit: [7bbbfc10] fixed a retry history bug [824b9bf7] another autoguess fix	2025-02-16 21:13:45 +08:00
standby24x7	fe163d5bf3	common : Fix a typo in help (#11899) This patch fixes a typo in command help. prefx -> prefix Signed-off-by: Masanari Iida <standby24x7@gmail.com>	2025-02-16 10:51:13 +01:00
Xuan-Son Nguyen	818a340ea8	ci : fix (again) arm64 build fails (#11895) * docker : attempt fixing arm64 build on ci * qemu v7.0.0-28	2025-02-16 10:36:39 +01:00
Jeff Bolz	bf42a23d0a	vulkan: support multi/vision rope, and noncontiguous rope (#11902)	2025-02-16 08:52:23 +01:00
Hale Chan	c2ea16f260	metal : fix the crash caused by the lack of residency set support on Intel Macs. (#11904)	2025-02-16 08:50:26 +02:00
Concedo	e0bdb2f622	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # examples/imatrix/README.md # scripts/compare-llama-bench.py	2025-02-16 12:48:54 +08:00


7060 commits