Concedo
b99ee451f8
Merge commit '4ccea213bc' into concedo_experimental
...
# Conflicts:
# .devops/cpu.Dockerfile
# .devops/cuda.Dockerfile
# .devops/intel.Dockerfile
# .devops/musa.Dockerfile
# .devops/rocm.Dockerfile
# .github/workflows/bench.yml.disabled
# .github/workflows/build.yml
# .github/workflows/server.yml
# CMakeLists.txt
# build-xcframework.sh
# ci/run.sh
# common/CMakeLists.txt
# examples/llama.android/llama/build.gradle.kts
# examples/perplexity/perplexity.cpp
# examples/run/CMakeLists.txt
# examples/server/tests/README.md
# examples/sycl/win-build-sycl.bat
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/ggml-cpu.c
# licenses/LICENSE-linenoise
# scripts/sync-ggml.last
# tests/CMakeLists.txt
2025-04-08 21:26:23 +08:00
Concedo
6e42e673c6
attempt to fall back to system glslc
2025-04-07 00:33:52 +08:00
Concedo
59b7796b96
binops does not need clblast anymore
2025-04-03 23:06:19 +08:00
Concedo
8c74520586
added NO_VULKAN_EXTENSIONS flag to disable dp4a and coopmat if needed
2025-04-03 20:51:17 +08:00
Concedo
e1d3c19673
clblast not working correctly
2025-03-30 21:02:30 +08:00
Concedo
61a73347c6
fixed mrope for multiple images in qwen2vl (+1 squashed commit)
...
Squashed commits:
[63e4d91c] fixed mrope for multiple images in qwen2vl (+1 squashed commit)
Squashed commits:
[bb78db1e] wip fixing mrope
2025-03-30 17:23:58 +08:00
Concedo
3992fb79cc
wip adding embeddings support
2025-03-24 18:01:23 +08:00
Concedo
9910f3abe0
remove precompiled vulkan shaders from repo. They are now regenerated from scratch via vulkan-shaders-gen, which the makefile handles automatically for windows and linux.
2025-03-19 21:51:16 +08:00
Concedo
0cfd8d23cb
handle symlinks (+1 squashed commit)
...
Squashed commits:
[fb8477b9] fixed makefile (+4 squashed commits)
Squashed commit:
[4a245bba] fixed a makefile issue
[d68eba69] alias usehipblas to usecublas
[a9ab0a7c] dynamic rocwmma selection
[fefe17c7] revert rocwmma
2025-03-17 21:03:30 +08:00
Concedo
98eade358a
more rocm include dir
2025-03-15 23:29:00 +08:00
Concedo
2c9ade61fe
test automatic vk shader rebuilding
2025-03-13 19:34:15 +08:00
Concedo
77debb1b1b
gemma3 vision works, but is using more tokens than expected - may need resizing
2025-03-13 00:31:16 +08:00
Concedo
e500968f92
fixed ggml common path in metal build
2025-03-12 10:58:57 +08:00
R0CKSTAR
251364549f
musa: support new arch mp_31 and update doc (#12296)
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-03-10 18:18:25 +01:00
Concedo
7eadd0a1d3
add GGML_HIP_ROCWMMA_FATTN
2025-03-08 17:15:41 +08:00
Concedo
6b7d2349a7
Rewrite history to fix bad vulkan shader commits without increasing repo size
...
added dpe colab (+8 squashed commits)
Squashed commit:
[b8362da4] updated lite
[ed6c037d] move nsigma into the regular sampler stack
[ac5f61c6] relative filepath fixed
[05fe96ab] export template
[ed0a5a3e] nix_example.md: refactor (#1401 )
* nix_example.md: add override example
* nix_example.md: drop graphics example, already basic nixos knowledge
* nix_example.md: format
* nix_example.md: Vulkan is disabled on macOS
Disabled in: 1ccd253acc
* nix_examples.md: nixpkgs.config.cuda{Arches -> Capabilities}
Fixes: https://github.com/LostRuins/koboldcpp/issues/1367
[675c62f7] AutoGuess: Phi 4 (mini) (#1402)
[4bf56982] phrasing
[b8c0df04] Add Rep Pen to Top N Sigma sampler chain (#1397)
- place after nsigma and before xtc (+3 squashed commits)
Squashed commit:
[87c52b97] disable VMM from HIP
[ee8906f3] edit description
[e85c0e69] Remove Unnecessary Rep Counting (#1394)
* stop counting reps
* fix range-based initializer
* strike that - reverse it
2025-03-05 00:02:20 +08:00
Johannes Gäßler
a28e0d5eb1
CUDA: add option to compile without FlashAttention (#12025)
2025-02-22 20:44:34 +01:00
Bodhi
0b3863ff95
MUSA: support ARM64 and enable dp4a etc. (#11843)
...
* MUSA: support ARM64 and enable __dp4a etc.
* fix cross entropy loss op for musa
* update
* add cc info log for musa
* add comment for the MUSA .cc calculation block
---------
Co-authored-by: Bodhi Hu <huaishun.hu@mthreads.com>
2025-02-21 09:46:23 +02:00
Olivier Chafik
63e489c025
tool-call: refactor common chat / tool-call api (+ tests / fixes) (#11900)
...
* tool-call refactoring: moved common_chat_* to chat.h, common_chat_templates_init return a unique_ptr to opaque type
* addressed clang-tidy lints in [test-]chat.*
* rm minja deps from util & common & move it to common/minja/
* add name & tool_call_id to common_chat_msg
* add common_chat_tool
* added json <-> tools, msgs conversions to chat.h
* fix double bos/eos jinja avoidance hack (was preventing inner bos/eos tokens)
* fix deepseek r1 slow test (no longer <think> opening w/ new template)
* allow empty tools w/ auto + grammar
* fix & test server grammar & json_schema params w/ & w/o --jinja
2025-02-18 18:03:23 +00:00
Georgi Gerganov
68ff663a04
repo : update links to new url (#11886)
...
* repo : update links to new url
ggml-ci
* cont : more urls
ggml-ci
2025-02-15 16:40:57 +02:00
Concedo
816d9b7989
edit makefile flags
2025-02-08 22:36:26 +08:00
Concedo
ff9b4041da
fix builds
2025-02-07 11:46:08 +08:00
Johannes Gäßler
864a0b67a6
CUDA: use mma PTX instructions for FlashAttention (#11583)
...
* CUDA: use mma PTX instructions for FlashAttention
* __shfl_sync workaround for movmatrix
* add __shfl_sync to HIP
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-02 19:31:09 +01:00
Concedo
f13498df13
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/tools.sh
# .devops/vulkan.Dockerfile
# .github/workflows/build.yml
# .github/workflows/docker.yml
# .github/workflows/server.yml
# Makefile
# README.md
# cmake/llama-config.cmake.in
# common/CMakeLists.txt
# examples/gbnf-validator/gbnf-validator.cpp
# examples/run/run.cpp
# examples/server/README.md
# examples/server/tests/README.md
# ggml/src/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# scripts/sync-ggml.last
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-grammar-integration.cpp
2025-02-01 17:14:59 +08:00
Olivier Chafik
8b576b6c55
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
...
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-30 19:13:58 +00:00
Concedo
7a5499e77b
added one more backend for clblast noavx2 and clblast failsafe
2025-01-30 22:47:22 +08:00
Concedo
898856e183
cleaned up unused flags from makefile, updated lite
2025-01-30 19:34:55 +08:00
Concedo
2f69432774
makefile indentation fix (+1 squashed commit)
...
Squashed commits:
[f640eb59] makefile indentation fix
2025-01-29 22:18:54 +08:00
Olivier Chafik
6171c9d258
Add Jinja template support (#11016)
...
* Copy minja from 58f0ca6dd7
* Add --jinja and --chat-template-file flags
* Add missing <optional> include
* Avoid print in get_hf_chat_template.py
* No designated initializers yet
* Try and work around msvc++ non-macro max resolution quirk
* Update test_chat_completion.py
* Wire LLM_KV_TOKENIZER_CHAT_TEMPLATE_N in llama_model_chat_template
* Refactor test-chat-template
* Test templates w/ minja
* Fix deprecation
* Add --jinja to llama-run
* Update common_chat_format_example to use minja template wrapper
* Test chat_template in e2e test
* Update utils.py
* Update test_chat_completion.py
* Update run.cpp
* Update arg.cpp
* Refactor common_chat_* functions to accept minja template + use_jinja option
* Attempt to fix linkage of LLAMA_CHATML_TEMPLATE
* Revert LLAMA_CHATML_TEMPLATE refactor
* Normalize newlines in test-chat-templates for windows tests
* Forward decl minja::chat_template to avoid eager json dep
* Flush stdout in chat template before potential crash
* Fix copy elision warning
* Rm unused optional include
* Add missing optional include to server.cpp
* Disable jinja test that has a cryptic windows failure
* minja: fix vigogne (https://github.com/google/minja/pull/22)
* Apply suggestions from code review
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Finish suggested renamings
* Move chat_templates inside server_context + remove mutex
* Update --chat-template-file w/ recent change to --chat-template
* Refactor chat template validation
* Guard against missing eos/bos tokens (null token otherwise throws in llama_vocab::impl::token_get_attr)
* Warn against missing eos / bos tokens when jinja template references them
* rename: common_chat_template[s]
* reinstate assert on chat_templates.template_default
* Update minja to b8437df626
* Update minja to https://github.com/google/minja/pull/25
* Update minja from https://github.com/google/minja/pull/27
* rm unused optional header
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-21 13:18:51 +00:00
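For reference, this is roughly what the new --jinja templating does (a hedged, hand-rolled sketch, not llama.cpp/minja code): a chat template turns a list of role/content messages into the model's prompt string. The ChatML-style formatter below is a stand-in for what a typical Jinja chat template renders; real templates ship as Jinja source in the model's GGUF metadata and are executed by the bundled minja engine.

```python
def apply_chatml(messages, add_generation_prompt=True):
    """Hand-rolled stand-in for a ChatML-style Jinja chat template.

    Real templates are Jinja source stored in the model metadata
    (tokenizer.chat_template) and rendered by minja at runtime.
    """
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"  # cue the model to reply
    return out

prompt = apply_chatml([{"role": "user", "content": "Hello"}])
```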
Concedo
b3de1598e7
Fixed some GGUFv1 loading bugs, long overdue cleanup for compiling, integrated TTS
...
tts is functional (+6 squashed commits)
Squashed commit:
[22396311] wip tts
[3a883027] tts not yet working
[0dcfab0e] fix silly bug
[a378d9ef] some long overdue cleanup
[fc5a6fb5] Wip tts
[39f50497] wip TTS integration
2025-01-13 14:23:25 +08:00
Concedo
bd38665e1f
some cleanup before starting on TTS
2025-01-10 22:13:44 +08:00
Concedo
e788b8289a
You'll never take us alive
...
We swore that death will do us part
They'll call our crimes a work of art
2025-01-09 11:27:06 +08:00
Concedo
bb2e739627
fixed simplercflags
2025-01-07 21:34:38 +08:00
Concedo
58791612d2
sse3 mode for noavx2 clblast, fixed metadata, added version command
2025-01-06 21:59:05 +08:00
Concedo
b4dc29f425
kobo cheats death again (+1 squashed commit)
...
Squashed commits:
[708e2429] kobo cheats death again
2025-01-04 01:06:41 +08:00
Concedo
22fd7a0439
fix make tools for linux
2025-01-03 11:39:23 +08:00
Concedo
2a890ec25a
Breaking change: unify the windows and linux build flags.
...
To do a full build on windows you now need LLAMA_PORTABLE=1 LLAMA_VULKAN=1 LLAMA_CLBLAST=1
2024-12-23 22:35:54 +08:00
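The unified flags from the commit above would be passed as make variables on the command line; a sketch of the invocation the commit message describes (flag names taken from the message, explanations assumed):

```
# Full Windows build after the flag unification (same flags now apply on Linux).
# LLAMA_PORTABLE=1  -> portable, arch-generic CPU build (assumed meaning)
# LLAMA_VULKAN=1    -> enable the Vulkan backend
# LLAMA_CLBLAST=1   -> enable the CLBlast backend
make LLAMA_PORTABLE=1 LLAMA_VULKAN=1 LLAMA_CLBLAST=1
```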
Concedo
1e07043a6e
clean and rename old clblast files in preparation for merge
2024-12-15 15:29:02 +08:00
HimariO
ba1cb19cdd
llama : add Qwen2VL support + multimodal RoPE (#10361)
...
* Barebones Qwen2VL LLM converter
* Add Qwen2VL cli entrypoint
* [WIP] add qwen2vl arch
* Verify m-rope output
* Add vl-rope/2d-rope support for qwen2vl ViT
* update qwen2vl cli tool
* update 5D tensor op workaround
* [WIP] qwen2vl vision model
* make batch and clip utils compatible with qwen2vl
* [WIP] create inference workflow, gguf convert script but fix
* correcting vision-rope behavior, add the missing last layer back to ViT
* add arg parser to qwen2vl_surgery
* replace variable size array with vector
* cuda-gdb cmake preset
* add fp32 mrope, vision rope kernel
* add fp16 support for qwen2vl and m-rope
* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`
* fix rope op mode switching, outdated func args
* update `llama_hparams`
* update to keep up with upstream changes
* resolve linter, test errors
* add makefile entry, update special image padding token
* add mrope unit test, fix few compiler warnings
* rename `mrope` related function, params
* minor updates on debug util, bug fixes
* add `m-rope` testcase to `test-backend-ops`
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix trailing whitespace
* store `llama_hparams.rope_sections` with fixed size array
* update position id tensor size check in GGML_OP_ROPE
* minor updates
* update `ggml_backend_*_supports_op` of unsupported backends
* remove old `rope_section` compare operator
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-14 14:43:46 +02:00
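A rough sketch of the m-rope idea from this change (toy code, not the actual ggml kernel; section sizes and base below are made-up numbers): the rotated dimensions are split into sections, and each section's angles are driven by a different position channel (e.g. temporal / height / width) instead of one scalar token index. Standard RoPE is the special case of a single section.

```python
def mrope_angles(pos_thw, sections, base=10000.0):
    """Toy multimodal-RoPE angle computation.

    pos_thw:  one position value per channel, e.g. (t, h, w)
    sections: how many rotated pairs each channel drives
    Returns one angle per rotated pair, concatenated by section.
    """
    dims = 2 * sum(sections)           # total rotated dimensions
    angles, offset = [], 0
    for channel_pos, sec in zip(pos_thw, sections):
        for i in range(sec):
            # same frequency schedule as standard RoPE, but the
            # position fed in depends on which section we are in
            angles.append(channel_pos / base ** (2 * (offset + i) / dims))
        offset += sec
    return angles
```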
Concedo
a63c2c914d
made shaders gen deterministic, update to c++17 (+4 squashed commit)
...
Squashed commit:
[7bb2441b] made shaders gen deterministic
[906e02af] Update c++ from 11 to 17 (#1263)
* Update c/c++ from 11 to 17
* Update CMakeLists.txt
only bump c++
[7ca430ed] C++17 ver
[b7dfb55d] give up and switch to c++17 (+1 squashed commit)
Squashed commits:
[96cfbc48] give up and switch to c++17 (+5 squashed commits)
Squashed commit:
[19ac7c26] Revert "fixed incorrect number of params"
This reverts commit 51388729bc4ffe51ab07ae02ce386219fb5e2876.
[45f730da] Revert "fix for c++17"
This reverts commit 050ba5f72b3358f958722addb9aaa77ff2e428ee.
[51388729] fixed incorrect number of params
[8f1ee54e] build latest vk shaders
[050ba5f7] fix for c++17
2024-12-13 23:07:10 +08:00
Concedo
7e1abf3aaf
sync - fix cmake failing to build with c++11, updated glslc.exe to handle coopmat, sync sdtype count, aarch repack flags
2024-12-13 17:08:10 +08:00
Concedo
de64b9198c
merge checkpoint 2 - functional merge without q4_0_4_4 (need regen shaders)
2024-12-13 17:04:19 +08:00
Concedo
4548d893ee
better way to handle termux compatibility (+2 squashed commits)
...
Squashed commit:
[301986f11] better way to handle termux compatibility
[16b03b225] updated lite
2024-12-11 15:05:01 +08:00
Concedo
a11bba5893
cleanup, fix native build for arm (+28 squashed commits)
...
Squashed commit:
[d1f6a4154] bundle library
[947ab84b7] undo
[0f9aba8d8] test
[e9ac93873] test
[920438202] test
[1c6d98804] Revert "quick test"
This reverts commit acf8ec8940.
[acf8ec894] quick test
[6a9937233] undo
[5a263a5bd] test
[ddfd82bca] test
[0b30e45da] test
[c3bfece55] messed up
[2a4b37fe0] Revert "test"
This reverts commit 80a1fcaeaf.
[80a1fcaea] test
[e2aa7d944] test
[264d80200] test
[f5b123173] undo
[1ffacc484] test
[63c0be926] undo
[510e0377e] ofast try fix
[4ac199b20] try fix sigill
[1bc987ba2] try fix illegal instruction
[7697252b1] edit
[f87087b28] check gcc ver
[e9dfe2cef] try using qemu to do the pyinstaller
[b411192db] revert
[25b5301e5] try using qemu to do the pyinstaller
[58038cddc] try using qemu to do the pyinstaller
2024-12-10 19:42:23 +08:00
Djip007
19d8762ab6
ggml : refactor online repacking (#10446)
...
* rename ggml-cpu-aarch64.c to .cpp
* reformat extra cpu backend.
- clean Q4_0_N_M and IQ4_0_N_M
- remove from "file" tensor type
- allow only with dynamic repack
- extract cpu extra bufts and convert to C++
- hbm
- "aarch64"
- more generic use of extra buffer
- generalise extra_supports_op
- new API for "cpu-accel":
- amx
- aarch64
* clang-format
* Clean Q4_0_N_M ref
Enable restrict on C++
* add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack
* added/corrected control on tensor size for Q4 repacking.
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add debug logs on repacks.
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-07 14:37:50 +02:00
Concedo
ece96e19bf
clean up makefile
2024-12-05 23:58:23 +08:00
Xuan Son Nguyen
91c36c269b
server : (web ui) Various improvements, now use vite as bundler (#10599)
...
* hide buttons in dropdown menu
* use npm as deps manager and vite as bundler
* fix build
* fix build (2)
* fix responsive on mobile
* fix more problems on mobile
* sync build
* (test) add CI step for verifying build
* fix ci
* force rebuild .hpp files
* cmake: clean up generated files pre build
2024-12-03 19:38:44 +01:00
Georgi Gerganov
8648c52101
make : deprecate (#10514)
...
* make : deprecate
ggml-ci
* ci : disable Makefile builds
ggml-ci
* docs : remove make references [no ci]
* ci : disable swift build
ggml-ci
* docs : remove obsolete make references, scripts, examples
ggml-ci
* basic fix for compare-commits.sh
* update build.md
* more build.md updates
* more build.md updates
* more build.md updates
* Update Makefile
Co-authored-by: Diego Devesa <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-12-02 21:22:53 +02:00
Wang Qin
43957ef203
build: update Makefile comments for C++ version change (#10598)
2024-12-01 04:19:44 +01:00
Diego Devesa
7cc2d2c889
ggml : move AMX to the CPU backend (#10570)
...
* ggml : move AMX to the CPU backend
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-29 21:54:58 +01:00