Commit graph

594 commits

Author SHA1 Message Date
Concedo
b99ee451f8 Merge commit '4ccea213bc' into concedo_experimental
# Conflicts:
#	.devops/cpu.Dockerfile
#	.devops/cuda.Dockerfile
#	.devops/intel.Dockerfile
#	.devops/musa.Dockerfile
#	.devops/rocm.Dockerfile
#	.github/workflows/bench.yml.disabled
#	.github/workflows/build.yml
#	.github/workflows/server.yml
#	CMakeLists.txt
#	build-xcframework.sh
#	ci/run.sh
#	common/CMakeLists.txt
#	examples/llama.android/llama/build.gradle.kts
#	examples/perplexity/perplexity.cpp
#	examples/run/CMakeLists.txt
#	examples/server/tests/README.md
#	examples/sycl/win-build-sycl.bat
#	ggml/src/ggml-cann/aclnn_ops.cpp
#	ggml/src/ggml-cann/aclnn_ops.h
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-cpu/ggml-cpu.c
#	licenses/LICENSE-linenoise
#	scripts/sync-ggml.last
#	tests/CMakeLists.txt
2025-04-08 21:26:23 +08:00
Concedo
6e42e673c6 attempt to fall back to system glslc 2025-04-07 00:33:52 +08:00
Concedo
59b7796b96 binops does not need clblast anymore 2025-04-03 23:06:19 +08:00
Concedo
8c74520586 added NO_VULKAN_EXTENSIONS flag to disable dp4a and coopmat if needed 2025-04-03 20:51:17 +08:00
Concedo
e1d3c19673 clblast not working correctly 2025-03-30 21:02:30 +08:00
Concedo
61a73347c6 fixed mrope for multiple images in qwen2vl (+1 squashed commits)
Squashed commits:

[63e4d91c] fixed mrope for multiple images in qwen2vl (+1 squashed commits)

Squashed commits:

[bb78db1e] wip fixing mrope
2025-03-30 17:23:58 +08:00
Concedo
3992fb79cc wip adding embeddings support 2025-03-24 18:01:23 +08:00
Concedo
9910f3abe0 remove precompiled vulkan shaders from repo. They will now have to be recreated in vulkan-shaders-gen from scratch at runtime, which is auto handled by the makefile for windows and linux. 2025-03-19 21:51:16 +08:00
Concedo
0cfd8d23cb handle symlinks (+1 squashed commits)
Squashed commits:

[fb8477b9] fixed makefile (+4 squashed commit)

Squashed commit:

[4a245bba] fixed a makefile issue

[d68eba69] alias usehipblas to usecublas

[a9ab0a7c] dynamic rocwmma selection

[fefe17c7] revert rocwmma
2025-03-17 21:03:30 +08:00
Concedo
98eade358a more rocm include dir 2025-03-15 23:29:00 +08:00
Concedo
2c9ade61fe test automatic vk shader rebuilding 2025-03-13 19:34:15 +08:00
Concedo
77debb1b1b gemma3 vision works, but is using more tokens than expected - may need resizing 2025-03-13 00:31:16 +08:00
Concedo
e500968f92 fixed ggml common path in metal build 2025-03-12 10:58:57 +08:00
R0CKSTAR
251364549f
musa: support new arch mp_31 and update doc (#12296)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-03-10 18:18:25 +01:00
Concedo
7eadd0a1d3 add GGML_HIP_ROCWMMA_FATTN 2025-03-08 17:15:41 +08:00
Concedo
6b7d2349a7 Rewrite history to fix bad vulkan shader commits without increasing repo size
added dpe colab (+8 squashed commit)

Squashed commit:

[b8362da4] updated lite

[ed6c037d] move nsigma into the regular sampler stack

[ac5f61c6] relative filepath fixed

[05fe96ab] export template

[ed0a5a3e] nix_example.md: refactor (#1401)

* nix_example.md: add override example

* nix_example.md: drop graphics example, already basic nixos knowledge

* nix_example.md: format

* nix_example.md: Vulkan is disabled on macOS

Disabled in: 1ccd253acc

* nix_examples.md: nixpkgs.config.cuda{Arches -> Capabilities}

Fixes: https://github.com/LostRuins/koboldcpp/issues/1367

[675c62f7] AutoGuess: Phi 4 (mini) (#1402)

[4bf56982] phrasing

[b8c0df04] Add Rep Pen to Top N Sigma sampler chain (#1397)

- place after nsigma and before xtc (+3 squashed commit)

Squashed commit:

[87c52b97] disable VMM from HIP

[ee8906f3] edit description

[e85c0e69] Remove Unnecessary Rep Counting (#1394)

* stop counting reps

* fix range-based initializer

* strike that - reverse it
2025-03-05 00:02:20 +08:00
Johannes Gäßler
a28e0d5eb1
CUDA: app option to compile without FlashAttention (#12025) 2025-02-22 20:44:34 +01:00
Bodhi
0b3863ff95
MUSA: support ARM64 and enable dp4a .etc (#11843)
* MUSA:  support ARM64 and enable __dp4a .etc

* fix cross entropy loss op for musa

* update

* add cc info log for musa

* add comment for the MUSA .cc calculation block

---------

Co-authored-by: Bodhi Hu <huaishun.hu@mthreads.com>
2025-02-21 09:46:23 +02:00
Olivier Chafik
63e489c025
tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900)
* tool-call refactoring: moved common_chat_* to chat.h, common_chat_templates_init return a unique_ptr to opaque type

* addressed clang-tidy lints in [test-]chat.*

* rm minja deps from util & common & move it to common/minja/

* add name & tool_call_id to common_chat_msg

* add common_chat_tool

* added json <-> tools, msgs conversions to chat.h

* fix double bos/eos jinja avoidance hack (was preventing inner bos/eos tokens)

* fix deepseek r1 slow test (no longer <think> opening w/ new template)

* allow empty tools w/ auto + grammar

* fix & test server grammar & json_schema params w/ & w/o --jinja
2025-02-18 18:03:23 +00:00
Georgi Gerganov
68ff663a04
repo : update links to new url (#11886)
* repo : update links to new url

ggml-ci

* cont : more urls

ggml-ci
2025-02-15 16:40:57 +02:00
Concedo
816d9b7989 edit makefile flags 2025-02-08 22:36:26 +08:00
Concedo
ff9b4041da fix builds 2025-02-07 11:46:08 +08:00
Johannes Gäßler
864a0b67a6
CUDA: use mma PTX instructions for FlashAttention (#11583)
* CUDA: use mma PTX instructions for FlashAttention

* __shfl_sync workaround for movmatrix

* add __shfl_sync to HIP

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-02 19:31:09 +01:00
Concedo
f13498df13 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/tools.sh
#	.devops/vulkan.Dockerfile
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	.github/workflows/server.yml
#	Makefile
#	README.md
#	cmake/llama-config.cmake.in
#	common/CMakeLists.txt
#	examples/gbnf-validator/gbnf-validator.cpp
#	examples/run/run.cpp
#	examples/server/README.md
#	examples/server/tests/README.md
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-hip/CMakeLists.txt
#	scripts/sync-ggml.last
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
#	tests/test-grammar-integration.cpp
2025-02-01 17:14:59 +08:00
Olivier Chafik
8b576b6c55
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-30 19:13:58 +00:00
Concedo
7a5499e77b added one more backend for clblast noavx2 and clblast failsafe 2025-01-30 22:47:22 +08:00
Concedo
898856e183 cleaned up unused flags from makefile, updated lite 2025-01-30 19:34:55 +08:00
Concedo
2f69432774 makefile indentation fix (+1 squashed commits)
Squashed commits:

[f640eb59] makefile indentation fix
2025-01-29 22:18:54 +08:00
Olivier Chafik
6171c9d258
Add Jinja template support (#11016)
* Copy minja from 58f0ca6dd7

* Add --jinja and --chat-template-file flags

* Add missing <optional> include

* Avoid print in get_hf_chat_template.py

* No designated initializers yet

* Try and work around msvc++ non-macro max resolution quirk

* Update test_chat_completion.py

* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template

* Refactor test-chat-template

* Test templates w/ minja

* Fix deprecation

* Add --jinja to llama-run

* Update common_chat_format_example to use minja template wrapper

* Test chat_template in e2e test

* Update utils.py

* Update test_chat_completion.py

* Update run.cpp

* Update arg.cpp

* Refactor common_chat_* functions to accept minja template + use_jinja option

* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE

* Revert LLAMA_CHATML_TEMPLATE refactor

* Normalize newlines in test-chat-templates for windows tests

* Forward decl minja::chat_template to avoid eager json dep

* Flush stdout in chat template before potential crash

* Fix copy elision warning

* Rm unused optional include

* Add missing optional include to server.cpp

* Disable jinja test that has a cryptic windows failure

* minja: fix vigogne (https://github.com/google/minja/pull/22)

* Apply suggestions from code review

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Finish suggested renamings

* Move chat_templates inside server_context + remove mutex

* Update --chat-template-file w/ recent change to --chat-template

* Refactor chat template validation

* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)

* Warn against missing eos / bos tokens when jinja template references them

* rename: common_chat_template[s]

* reinstate assert on chat_templates.template_default

* Update minja to b8437df626

* Update minja to https://github.com/google/minja/pull/25

* Update minja from https://github.com/google/minja/pull/27

* rm unused optional header

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-21 13:18:51 +00:00
Concedo
b3de1598e7 Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS
tts is functional (+6 squashed commit)

Squashed commit:

[22396311] wip tts

[3a883027] tts not yet working

[0dcfab0e] fix silly bug

[a378d9ef] some long overdue cleanup

[fc5a6fb5] Wip tts

[39f50497] wip TTS integration
2025-01-13 14:23:25 +08:00
Concedo
bd38665e1f some cleanup before starting on TTS 2025-01-10 22:13:44 +08:00
Concedo
e788b8289a You'll never take us alive
We swore that death will do us part
They'll call our crimes a work of art
2025-01-09 11:27:06 +08:00
Concedo
bb2e739627 fixed simplercflags 2025-01-07 21:34:38 +08:00
Concedo
58791612d2 sse3 mode for noavx2 clblast, fixed metadata, added version command 2025-01-06 21:59:05 +08:00
Concedo
b4dc29f425 kobo cheats death again (+1 squashed commits)
Squashed commits:

[708e2429] kobo cheats death again
2025-01-04 01:06:41 +08:00
Concedo
22fd7a0439 fix make tools for linux 2025-01-03 11:39:23 +08:00
Concedo
2a890ec25a Breaking change: unify the windows and linux build flags.
To do a full build on windows you now need LLAMA_PORTABLE=1 LLAMA_VULKAN=1 LLAMA_CLBLAST=1
2024-12-23 22:35:54 +08:00
Concedo
1e07043a6e clean and rename old clblast files in preparation for merge 2024-12-15 15:29:02 +08:00
HimariO
ba1cb19cdd
llama : add Qwen2VL support + multimodal RoPE (#10361)
* Barebone Qwen2VL LLM convertor

* Add Qwen2VL cli entrypoint

* [WIP] add qwen2vl arch

* Verify m-rope output

* Add vl-rope/2d-rope support for qwen2vl ViT

* update qwen2vl cli tool

* update 5D tensor op workaround

* [WIP] qwen2vl vision model

* make batch and clip utils compatible with qwen2vl

* [WIP] create inference workflow, gguf convert script but fix

* correcting vision-rope behavior, add the missing last layer back to ViT

* add arg parser to qwen2vl_surgery

* replace variable size array with vector

* cuda-gdb cmake preset

* add fp32 mrope, vision rope kernel

* add fp16 support for qwen2vl and m-rope

* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`

* fix rope op mode switching, out dated func args

* update `llama_hparams`

* update to keep up stream changes

* resolve linter, test errors

* add makefile entry, update speical image padding token

* add mrope unit test, fix few compiler warnings

* rename `mrope` related function, params

* minor updates on debug util, bug fixs

* add `m-rope` testcase to `test-backend-ops`

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* fix traililng whitespce

* store `llama_hparams.rope_sections` with fixed size array

* update position id tensor size check in GGML_OP_ROPE

* minor updates

* update `ggml_backend_*_supports_op` of unsupported backends

* remote old `rope_section` compare operator

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-14 14:43:46 +02:00
Concedo
a63c2c914d made shaders gen deterministic, update to c++17 (+4 squashed commit)
Squashed commit:

[7bb2441b] made shaders gen deterministic

[906e02af] Update c++ from 11 to 17 (#1263)

* Update c/c++ from 11 to 17

* Update CMakeLists.txt

only bump c++

[7ca430ed] C++17 ver

[b7dfb55d] give up and switch to c++17 (+1 squashed commits)

Squashed commits:

[96cfbc48] give up and switch to c++17 (+5 squashed commit)

Squashed commit:

[19ac7c26] Revert "fixed incorrect number of params"

This reverts commit 51388729bc4ffe51ab07ae02ce386219fb5e2876.

[45f730da] Revert "fix for c++17"

This reverts commit 050ba5f72b3358f958722addb9aaa77ff2e428ee.

[51388729] fixed incorrect number of params

[8f1ee54e] build latest vk shaders

[050ba5f7] fix for c++17
2024-12-13 23:07:10 +08:00
Concedo
7e1abf3aaf sync - fix cmake failing to build with c++11, updated glslc.exe to handle coopmat, sync sdtype count, aarch repack flags 2024-12-13 17:08:10 +08:00
Concedo
de64b9198c merge checkpoint 2 - functional merge without q4_0_4_4 (need regen shaders) 2024-12-13 17:04:19 +08:00
Concedo
4548d893ee better way to handle termux compatibility (+2 squashed commit)
Squashed commit:

[301986f11] better way to handle termux compatibility

[16b03b225] updated lite
2024-12-11 15:05:01 +08:00
Concedo
a11bba5893 cleanup, fix native build for arm (+28 squashed commit)
Squashed commit:

[d1f6a4154] bundle library

[947ab84b7] undo

[0f9aba8d8] test

[e9ac93873] test

[920438202] test

[1c6d98804] Revert "quick test"

This reverts commit acf8ec8940.

[acf8ec894] quick test

[6a9937233] undo

[5a263a5bd] test

[ddfd82bca] test

[0b30e45da] test

[c3bfece55] messed up

[2a4b37fe0] Revert "test"

This reverts commit 80a1fcaeaf.

[80a1fcaea] test

[e2aa7d944] test

[264d80200] test

[f5b123173] undo

[1ffacc484] test

[63c0be926] undo

[510e0377e] ofast try fix

[4ac199b20] try fix sigill

[1bc987ba2] try fix illegal instruction

[7697252b1] edit

[f87087b28] check gcc ver

[e9dfe2cef] try using qemu to do the pyinstaller

[b411192db] revert

[25b5301e5] try using qemu to do the pyinstaller

[58038cddc] try using qemu to do the pyinstaller
2024-12-10 19:42:23 +08:00
Djip007
19d8762ab6
ggml : refactor online repacking (#10446)
* rename ggml-cpu-aarch64.c to .cpp

* reformat extra cpu backend.

- clean Q4_0_N_M and IQ4_0_N_M
  - remove from "file" tensor type
  - allow only with dynamic repack

- extract cpu extra bufts and convert to C++
  - hbm
  - "aarch64"

- more generic use of extra buffer
  - generalise extra_supports_op
  - new API for "cpu-accel":
     - amx
     - aarch64

* clang-format

* Clean Q4_0_N_M ref

Enable restrict on C++

* add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack

* added/corrected control on tensor size for Q4 repacking.

* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* add debug logs on repacks.

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-07 14:37:50 +02:00
Concedo
ece96e19bf clean up makefile 2024-12-05 23:58:23 +08:00
Xuan Son Nguyen
91c36c269b
server : (web ui) Various improvements, now use vite as bundler (#10599)
* hide buttons in dropdown menu

* use npm as deps manager and vite as bundler

* fix build

* fix build (2)

* fix responsive on mobile

* fix more problems on mobile

* sync build

* (test) add CI step for verifying build

* fix ci

* force rebuild .hpp files

* cmake: clean up generated files pre build
2024-12-03 19:38:44 +01:00
Georgi Gerganov
8648c52101
make : deprecate (#10514)
* make : deprecate

ggml-ci

* ci : disable Makefile builds

ggml-ci

* docs : remove make references [no ci]

* ci : disable swift build

ggml-ci

* docs : remove obsolete make references, scripts, examples

ggml-ci

* basic fix for compare-commits.sh

* update build.md

* more build.md updates

* more build.md updates

* more build.md updates

* Update Makefile

Co-authored-by: Diego Devesa <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-12-02 21:22:53 +02:00
Wang Qin
43957ef203
build: update Makefile comments for C++ version change (#10598) 2024-12-01 04:19:44 +01:00
Diego Devesa
7cc2d2c889
ggml : move AMX to the CPU backend (#10570)
* ggml : move AMX to the CPU backend

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-29 21:54:58 +01:00