koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-04-28 03:30:20 +00:00

Author	SHA1	Message	Date
Concedo	60268de62c	update targets for rocm	2025-05-25 18:41:15 +08:00
Concedo	499283c63a	rename define to match upstream	2025-05-23 17:10:12 +08:00
Concedo	dec3cd92b0	fix cuda compile	2025-05-13 02:15:33 +08:00
Concedo	40eb3a54c4	rename some toolip texts	2025-05-11 22:50:40 +08:00
Concedo	5cf5f35540	added vulkan build target for main.exe	2025-05-11 21:53:08 +08:00
Concedo	b951310ca5	tryout smaller binaries	2025-05-07 14:56:34 +08:00
Concedo	0fa435b2a6	Merge commit '`9b61acf060`' into concedo_experimental # Conflicts: # Makefile # docs/multimodal/MobileVLM.md # docs/multimodal/glmedge.md # docs/multimodal/llava.md # docs/multimodal/minicpmo2.6.md # docs/multimodal/minicpmv2.5.md # docs/multimodal/minicpmv2.6.md # requirements/requirements-all.txt # tools/mtmd/CMakeLists.txt # tools/mtmd/README.md # tools/mtmd/android/adb_run.sh # tools/mtmd/android/build_64.sh # tools/mtmd/clip-quantize-cli.cpp	2025-05-06 23:34:21 +08:00
Xuan-Son Nguyen	9b61acf060	mtmd : rename llava directory to mtmd (#13311) * mv llava to mtmd * change ref everywhere	2025-05-05 16:02:55 +02:00
Concedo	5a2808ffaf	Merge branch 'upstream' into concedo_experimental # Conflicts: # .flake8 # .github/labeler.yml # .github/workflows/bench.yml.disabled # .github/workflows/build-linux-cross.yml # .github/workflows/build.yml # .github/workflows/server.yml # .gitignore # CMakeLists.txt # CODEOWNERS # Makefile # README.md # SECURITY.md # build-xcframework.sh # ci/run.sh # docs/development/HOWTO-add-model.md # docs/multimodal/MobileVLM.md # docs/multimodal/glmedge.md # docs/multimodal/llava.md # docs/multimodal/minicpmo2.6.md # docs/multimodal/minicpmv2.5.md # docs/multimodal/minicpmv2.6.md # examples/CMakeLists.txt # examples/pydantic_models_to_grammar_examples.py # grammars/README.md # pyrightconfig.json # requirements/requirements-all.txt # scripts/fetch_server_test_models.py # scripts/tool_bench.py # scripts/xxd.cmake # tests/CMakeLists.txt # tests/run-json-schema-to-grammar.mjs # tools/batched-bench/CMakeLists.txt # tools/batched-bench/README.md # tools/batched-bench/batched-bench.cpp # tools/cvector-generator/CMakeLists.txt # tools/cvector-generator/README.md # tools/cvector-generator/completions.txt # tools/cvector-generator/cvector-generator.cpp # tools/cvector-generator/mean.hpp # tools/cvector-generator/negative.txt # tools/cvector-generator/pca.hpp # tools/cvector-generator/positive.txt # tools/export-lora/CMakeLists.txt # tools/export-lora/README.md # tools/export-lora/export-lora.cpp # tools/gguf-split/CMakeLists.txt # tools/gguf-split/README.md # tools/imatrix/CMakeLists.txt # tools/imatrix/README.md # tools/imatrix/imatrix.cpp # tools/llama-bench/CMakeLists.txt # tools/llama-bench/README.md # tools/llama-bench/llama-bench.cpp # tools/llava/CMakeLists.txt # tools/llava/README.md # tools/llava/android/adb_run.sh # tools/llava/android/build_64.sh # tools/llava/clip-quantize-cli.cpp # tools/main/CMakeLists.txt # tools/main/README.md # tools/perplexity/CMakeLists.txt # tools/perplexity/README.md # tools/perplexity/perplexity.cpp # 
tools/quantize/CMakeLists.txt # tools/rpc/CMakeLists.txt # tools/rpc/README.md # tools/rpc/rpc-server.cpp # tools/run/CMakeLists.txt # tools/run/README.md # tools/run/linenoise.cpp/linenoise.cpp # tools/run/linenoise.cpp/linenoise.h # tools/run/run.cpp # tools/server/CMakeLists.txt # tools/server/README.md # tools/server/bench/README.md # tools/server/public_simplechat/readme.md # tools/server/tests/README.md # tools/server/themes/README.md # tools/server/themes/buttons-top/README.md # tools/server/themes/wild/README.md # tools/tokenize/CMakeLists.txt # tools/tokenize/tokenize.cpp	2025-05-03 12:15:36 +08:00
Diego Devesa	1d36b3670b	llama : move end-user examples to tools directory (#13249) * llama : move end-user examples to tools directory --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-05-02 20:27:13 +02:00
Concedo	f1eb6c4e36	mtmd for debug	2025-04-24 16:27:24 +08:00
Concedo	6dbee2f2f8	more robust glslc checks, increase default denoise str	2025-04-22 15:19:47 +08:00
David Huang	84778e9770	CUDA/HIP: Share the same unified memory allocation logic. (#12934) Replace compile-time `GGML_HIP_UMA` with environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY`. This unifies the usage on NVIDIA and AMD GPUs, and allows a single binary to be shared between integrated and dedicated GPUs.	2025-04-15 11:20:38 +02:00
Concedo	e1ee857b1e	allow vulkan to be packaged without coopmat for noavx2	2025-04-14 12:40:00 +08:00
Concedo	2d0b7e37f9	fix build	2025-04-13 22:01:48 +08:00
Concedo	956ed89595	fixed build	2025-04-12 17:06:55 +08:00
Concedo	b99ee451f8	Merge commit '`4ccea213bc`' into concedo_experimental # Conflicts: # .devops/cpu.Dockerfile # .devops/cuda.Dockerfile # .devops/intel.Dockerfile # .devops/musa.Dockerfile # .devops/rocm.Dockerfile # .github/workflows/bench.yml.disabled # .github/workflows/build.yml # .github/workflows/server.yml # CMakeLists.txt # build-xcframework.sh # ci/run.sh # common/CMakeLists.txt # examples/llama.android/llama/build.gradle.kts # examples/perplexity/perplexity.cpp # examples/run/CMakeLists.txt # examples/server/tests/README.md # examples/sycl/win-build-sycl.bat # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/aclnn_ops.h # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-cpu/ggml-cpu.c # licenses/LICENSE-linenoise # scripts/sync-ggml.last # tests/CMakeLists.txt	2025-04-08 21:26:23 +08:00
Concedo	6e42e673c6	attempt to fall back to system glslc	2025-04-07 00:33:52 +08:00
Concedo	59b7796b96	binops does not need clblast anymore	2025-04-03 23:06:19 +08:00
Concedo	8c74520586	added NO_VULKAN_EXTENSIONS flag to disable dp4a and coopmat if needed	2025-04-03 20:51:17 +08:00
Concedo	e1d3c19673	clblast not working correctly	2025-03-30 21:02:30 +08:00
Concedo	61a73347c6	fixed mrope for multiple images in qwen2vl (+1 squashed commits) Squashed commits: [63e4d91c] fixed mrope for multiple images in qwen2vl (+1 squashed commits) Squashed commits: [bb78db1e] wip fixing mrope	2025-03-30 17:23:58 +08:00
Concedo	3992fb79cc	wip adding embeddings support	2025-03-24 18:01:23 +08:00
Concedo	9910f3abe0	remove precompiled vulkan shaders from repo. They will now have to be recreated in vulkan-shaders-gen from scratch at runtime, which is auto handled by the makefile for windows and linux.	2025-03-19 21:51:16 +08:00
Concedo	0cfd8d23cb	handle symlinks (+1 squashed commits) Squashed commits: [fb8477b9] fixed makefile (+4 squashed commit) Squashed commit: [4a245bba] fixed a makefile issue [d68eba69] alias usehipblas to usecublas [a9ab0a7c] dynamic rocwmma selection [fefe17c7] revert rocwmma	2025-03-17 21:03:30 +08:00
Concedo	98eade358a	more rocm include dir	2025-03-15 23:29:00 +08:00
Concedo	2c9ade61fe	test automatic vk shader rebuilding	2025-03-13 19:34:15 +08:00
Concedo	77debb1b1b	gemma3 vision works, but is using more tokens than expected - may need resizing	2025-03-13 00:31:16 +08:00
Concedo	e500968f92	fixed ggml common path in metal build	2025-03-12 10:58:57 +08:00
R0CKSTAR	251364549f	musa: support new arch mp_31 and update doc (#12296) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-03-10 18:18:25 +01:00
Concedo	7eadd0a1d3	add GGML_HIP_ROCWMMA_FATTN	2025-03-08 17:15:41 +08:00
Concedo	6b7d2349a7	Rewrite history to fix bad vulkan shader commits without increasing repo size added dpe colab (+8 squashed commit) Squashed commit: [b8362da4] updated lite [ed6c037d] move nsigma into the regular sampler stack [ac5f61c6] relative filepath fixed [05fe96ab] export template [ed0a5a3e] nix_example.md: refactor (#1401) * nix_example.md: add override example * nix_example.md: drop graphics example, already basic nixos knowledge * nix_example.md: format * nix_example.md: Vulkan is disabled on macOS Disabled in: `1ccd253acc` * nix_examples.md: nixpkgs.config.cuda{Arches -> Capabilities} Fixes: https://github.com/LostRuins/koboldcpp/issues/1367 [675c62f7] AutoGuess: Phi 4 (mini) (#1402) [`4bf56982`] phrasing [`b8c0df04`] Add Rep Pen to Top N Sigma sampler chain (#1397) - place after nsigma and before xtc (+3 squashed commit) Squashed commit: [`87c52b97`] disable VMM from HIP [`ee8906f3`] edit description [`e85c0e69`] Remove Unnecessary Rep Counting (#1394) * stop counting reps * fix range-based initializer * strike that - reverse it	2025-03-05 00:02:20 +08:00
Johannes Gäßler	a28e0d5eb1	CUDA: app option to compile without FlashAttention (#12025)	2025-02-22 20:44:34 +01:00
Bodhi	0b3863ff95	MUSA: support ARM64 and enable dp4a .etc (#11843) * MUSA: support ARM64 and enable __dp4a .etc * fix cross entropy loss op for musa * update * add cc info log for musa * add comment for the MUSA .cc calculation block --------- Co-authored-by: Bodhi Hu <huaishun.hu@mthreads.com>	2025-02-21 09:46:23 +02:00
Olivier Chafik	63e489c025	tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900 ) * tool-call refactoring: moved common_chat_* to chat.h, common_chat_templates_init return a unique_ptr to opaque type * addressed clang-tidy lints in [test-]chat.* * rm minja deps from util & common & move it to common/minja/ * add name & tool_call_id to common_chat_msg * add common_chat_tool * added json <-> tools, msgs conversions to chat.h * fix double bos/eos jinja avoidance hack (was preventing inner bos/eos tokens) * fix deepseek r1 slow test (no longer <think> opening w/ new template) * allow empty tools w/ auto + grammar * fix & test server grammar & json_schema params w/ & w/o --jinja	2025-02-18 18:03:23 +00:00
Georgi Gerganov	68ff663a04	repo : update links to new url (#11886) * repo : update links to new url ggml-ci * cont : more urls ggml-ci	2025-02-15 16:40:57 +02:00
Concedo	816d9b7989	edit makefile flags	2025-02-08 22:36:26 +08:00
Concedo	ff9b4041da	fix builds	2025-02-07 11:46:08 +08:00
Johannes Gäßler	864a0b67a6	CUDA: use mma PTX instructions for FlashAttention (#11583) * CUDA: use mma PTX instructions for FlashAttention * __shfl_sync workaround for movmatrix * add __shfl_sync to HIP Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-02-02 19:31:09 +01:00
Concedo	f13498df13	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/tools.sh # .devops/vulkan.Dockerfile # .github/workflows/build.yml # .github/workflows/docker.yml # .github/workflows/server.yml # Makefile # README.md # cmake/llama-config.cmake.in # common/CMakeLists.txt # examples/gbnf-validator/gbnf-validator.cpp # examples/run/run.cpp # examples/server/README.md # examples/server/tests/README.md # ggml/src/CMakeLists.txt # ggml/src/ggml-hip/CMakeLists.txt # scripts/sync-ggml.last # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-chat-template.cpp # tests/test-grammar-integration.cpp	2025-02-01 17:14:59 +08:00
Olivier Chafik	8b576b6c55	Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639) --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-01-30 19:13:58 +00:00
Concedo	7a5499e77b	added one more backend for clblast noavx2 and clblast failsafe	2025-01-30 22:47:22 +08:00
Concedo	898856e183	cleaned up unused flags from makefile, updated lite	2025-01-30 19:34:55 +08:00
Concedo	2f69432774	makefile indentation fix (+1 squashed commits) Squashed commits: [f640eb59] makefile indentation fix	2025-01-29 22:18:54 +08:00
Olivier Chafik	6171c9d258	Add Jinja template support (#11016 ) * Copy minja from `58f0ca6dd7` * Add --jinja and --chat-template-file flags * Add missing <optional> include * Avoid print in get_hf_chat_template.py * No designated initializers yet * Try and work around msvc++ non-macro max resolution quirk * Update test_chat_completion.py * Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template * Refactor test-chat-template * Test templates w/ minja * Fix deprecation * Add --jinja to llama-run * Update common_chat_format_example to use minja template wrapper * Test chat_template in e2e test * Update utils.py * Update test_chat_completion.py * Update run.cpp * Update arg.cpp * Refactor common_chat_* functions to accept minja template + use_jinja option * Attempt to fix linkage of LLAMA_CHATML_TEMPLATE * Revert LLAMA_CHATML_TEMPLATE refactor * Normalize newlines in test-chat-templates for windows tests * Forward decl minja::chat_template to avoid eager json dep * Flush stdout in chat template before potential crash * Fix copy elision warning * Rm unused optional include * Add missing optional include to server.cpp * Disable jinja test that has a cryptic windows failure * minja: fix vigogne (https://github.com/google/minja/pull/22) * Apply suggestions from code review Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Finish suggested renamings * Move chat_templates inside server_context + remove mutex * Update --chat-template-file w/ recent change to --chat-template * Refactor chat template validation * Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr) * Warn against missing eos / bos tokens when jinja template references them * rename: common_chat_template[s] * reinstate assert on chat_templates.template_default * Update minja to `b8437df626` * Update minja to https://github.com/google/minja/pull/25 * Update minja from 
https://github.com/google/minja/pull/27 * rm unused optional header --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-01-21 13:18:51 +00:00
Concedo	b3de1598e7	Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS tts is functional (+6 squashed commit) Squashed commit: [22396311] wip tts [3a883027] tts not yet working [0dcfab0e] fix silly bug [a378d9ef] some long overdue cleanup [fc5a6fb5] Wip tts [39f50497] wip TTS integration	2025-01-13 14:23:25 +08:00
Concedo	bd38665e1f	some cleanup before starting on TTS	2025-01-10 22:13:44 +08:00
Concedo	e788b8289a	You'll never take us alive We swore that death will do us part They'll call our crimes a work of art	2025-01-09 11:27:06 +08:00
Concedo	bb2e739627	fixed simplercflags	2025-01-07 21:34:38 +08:00
Concedo	58791612d2	sse3 mode for noavx2 clblast, fixed metadata, added version command	2025-01-06 21:59:05 +08:00

610 commits