Concedo
0ac20e30b5
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# docs/backend/SYCL.md
# docs/build.md
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/amx/mmq.cpp
# ggml/src/ggml-cpu/ggml-cpu.c
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/sycl_hw.cpp
# ggml/src/ggml-sycl/sycl_hw.hpp
# ggml/src/ggml-vulkan/CMakeLists.txt
# tests/test-backend-ops.cpp
2025-06-28 08:56:29 +08:00
Aaron Teo
bf5bcd0b85
docs: update s390x documentation + add faq ( #14389 )
...
* docs: update s390x documentation + add faq
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: add s390x z17 build q&a
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-26 12:41:41 +02:00
Aaron Teo
60ef23d6c1
ggml-cpu: enable IBM NNPA Vector Intrinsics ( #14317 )
...
* ggml-cpu: add nnpa compile flag
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 4a9f60c201573128f73a65999b3e5cc497fae5c1)
* ggml-cpu: add fp16->fp32 nnpa first
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 8d4a7987f9c1887f716be96250f2caeee0253929)
* ggml-cpu: add fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 0ff0d6516247a41d2ade42b42cf0d676a4dd1627)
* ggml-cpu: better variable names
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 2f58bbcbb89c183340e252362b2a40651f573f1f)
* docs: update s390x docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 01b929491b50071a5d0572235dcf5a449da70aa7)
* ggml-cpu: add debugging prints to see if dlf16 is correct
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix print vs printf
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix float placeholder
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: ensure fp16 and fp32 load and stores are called
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fp16 load ensured to hit
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove sigint from fp16 store
for some reason, the function is not getting a hit when debugged with
gdb. we will need to investigate further
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: nnpa switch to vec_xst test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to vec_xst for 4 element loops also
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: rework noop
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove noop, general code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: clarify variable naming
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add breakpoint for debugging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: test fix for conversion failure
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: disable fp32->fp16 nnpa conversions for now
there are some conversion failures in nnpa that requires the eyes of an
ibm stsm. will create a separate pr to introduce the fp32->fp16 change.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to elif macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: reattempt fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix typo
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: reattempt fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix compiler types
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: change to typedef vector types
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add 4 element loops for fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: clarified vector naming
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back fp32->fp16 store nnpa
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: activate nnpa fp32->fp16 or fp16->fp32 compute
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add nnpa macro check in ggml-impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add missing __func__
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: diagnose why __NNPA__ macro is not being defined
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: import vecintrin.h to fix compiler errors
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: update macro tests
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move s390x typedef to own header file
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: move s390x typedef to own header file"
This reverts commit 157f856c34589566151630e294563a420702db39.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to importing ggml-cpu-impl instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix macro declaration
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: test more macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add debug prints
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bruteforce macro definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move macro definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add ggml-impl.h to cmakelists
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to private macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move s390x typedef to own header file
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 157f856c34589566151630e294563a420702db39)
* ggml-cpu: move things around
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back compile macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to quotes for import
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add compiler error macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add s390x detection in ggml-src
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back compile definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: undo cmakelists work
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: move s390x typedef to own header file"
This reverts commit 18d79e1a30b39d9aaa0bd58400c5cf2c32135c9a.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove typedefs.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove typedef from cmakelists
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add ggml-impl.h future notes
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add todo comment for future reference
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: clarify naming of dlf16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove unnecessary target compile definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move nnpa fp16->fp32 and fp32->fp16 to simd-mappings
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: update broken huggingface link for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix duplicate func names during compile
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: fix duplicate func names during compile"
This reverts commit fbb733451f27677063b914d4f6c9a9841d45b38d.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu"
This reverts commit bd288e8fa52b5244f65cee21cb61062f1a9e0ca5.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: refactor fp16<->fp32 simd to ggml-cpu
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix missing simd-mappings.h import in quants.c
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix missing simd-mappings.h within repack
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix amx mmq missing simd-mappings.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: attempt at fixing loongarch failing build
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move nnpa together with other fp16<->fp32 simd
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix wrong refactor of ggml-base
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164176555
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: remove dependency on ggml-cpu from ggml-base
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: rename all fp16<->fp32 macros to prefix with ggml_cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164449406
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove mistaken fallback macro
fallback logic was already implemented but i was too sleepy to realise
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: move ggml_table_f32_f16 to ggml-cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures"
This reverts commit 32a3533564bdb7902cefb9c89b1c9e956a81ce29.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml: move ggml_table_f32_f16 to ggml-cpu"
This reverts commit 9e40d984ad27d7b60392fb2b7548885201864fe4.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: move ggml_table_f32_f16 to ggml-cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 9e40d984ad27d7b60392fb2b7548885201864fe4)
* ggml: move ggml_table_f32_f16 to ggml-cpu.c
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: extern c ggml_table_f32_f16 + chore docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h
we rely on the variable declaration in ggml-cpu.c instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h"
This reverts commit f71b21d2f74f5e03ec0c2b4fefd3cbf395aecf16.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back ggml_table_f32_f16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: bring back ggml_table_f32_f16"
This reverts commit 2dce119178bed5ef5c8398c4230ddd14fef80e49.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* fix ggml time initialization
* fix f32_f16 table init
* remove extra line
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: slaren <slarengh@gmail.com>
2025-06-25 23:49:04 +02:00
Anton Mitkov
2bf9d539dd
sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices ( #13973 )
2025-06-25 18:09:55 +02:00
David Chiu
d860dd99a4
docs : fix the link to llama.h ( #14293 )
2025-06-20 19:43:35 +02:00
Concedo
b59b5dbbd1
Merge commit ' 456af35eb7
' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-sycl/getrows.cpp
# src/CMakeLists.txt
# tools/llama-bench/llama-bench.cpp
2025-06-20 23:41:27 +08:00
Aaron Teo
8d94713654
docs: add s390x build documentation ( #14264 )
...
* docs: add s390x-specific build docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: add s390x model conversion steps
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: s390x build indent
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: update hyperlinks for s390x docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: update llama.h docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: s390x add accelerator and perf optimizations
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: s390x indent blocks
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: revert block indentation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: add support information for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: s390x reword
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: remove indentation for accelerator section s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: remove redundant words s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: reword for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: s390x reword simd
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: fix trailing whitespace for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-18 18:10:26 +01:00
Pepijn de Vos
00ba772610
docs : remove WIP since PR has been merged ( #13912 )
2025-06-15 08:06:37 +02:00
ddpasa
26ff3685bf
docs : Update multimodal.md ( #14122 )
...
* Update multimodal.md
* Update multimodal.md
2025-06-13 15:17:53 +02:00
Xinpeng Dou
e21d2d4ae2
CANN: Simplify the environment variable setting( #13104 )
...
* Simplify the environment variable setting to specify the memory pool type.
* Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options.
* update
* fix CI
* update
* delete whitespace
* fix according to review
* update CANN.md
* update CANN.md
2025-06-09 19:47:39 +08:00
Xuan-Son Nguyen
ea1431b0fa
docs : add "Quick start" section for new users ( #13862 )
...
* docs : add "Quick start" section for non-technical users
* rm flox
* Update README.md
2025-06-03 13:09:36 +02:00
Jiří Podivín
b3a89c3d9e
docs : Note about necessity of having libcurl installed for standard build. ( #13945 )
...
Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
2025-05-31 18:58:35 +02:00
Concedo
b08dca65ed
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# common/CMakeLists.txt
# common/arg.cpp
# common/chat.cpp
# examples/parallel/README.md
# examples/parallel/parallel.cpp
# ggml/cmake/common.cmake
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/rope.cpp
# models/ggml-vocab-bert-bge.gguf.inp
# models/ggml-vocab-bert-bge.gguf.out
# models/ggml-vocab-command-r.gguf.inp
# models/ggml-vocab-command-r.gguf.out
# models/ggml-vocab-deepseek-coder.gguf.inp
# models/ggml-vocab-deepseek-coder.gguf.out
# models/ggml-vocab-deepseek-llm.gguf.inp
# models/ggml-vocab-deepseek-llm.gguf.out
# models/ggml-vocab-falcon.gguf.inp
# models/ggml-vocab-falcon.gguf.out
# models/ggml-vocab-gpt-2.gguf.inp
# models/ggml-vocab-gpt-2.gguf.out
# models/ggml-vocab-llama-bpe.gguf.inp
# models/ggml-vocab-llama-bpe.gguf.out
# models/ggml-vocab-llama-spm.gguf.inp
# models/ggml-vocab-llama-spm.gguf.out
# models/ggml-vocab-mpt.gguf.inp
# models/ggml-vocab-mpt.gguf.out
# models/ggml-vocab-phi-3.gguf.inp
# models/ggml-vocab-phi-3.gguf.out
# models/ggml-vocab-qwen2.gguf.inp
# models/ggml-vocab-qwen2.gguf.out
# models/ggml-vocab-refact.gguf.inp
# models/ggml-vocab-refact.gguf.out
# models/ggml-vocab-starcoder.gguf.inp
# models/ggml-vocab-starcoder.gguf.out
# requirements/requirements-gguf_editor_gui.txt
# tests/CMakeLists.txt
# tests/test-chat.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
# tools/mtmd/CMakeLists.txt
# tools/run/run.cpp
# tools/server/CMakeLists.txt
2025-05-31 13:04:21 +08:00
Concedo
0c108f6054
Merge commit ' 34b7c0439e
' into concedo_experimental
...
# Conflicts:
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/element_wise.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# scripts/sync-ggml.last
# src/CMakeLists.txt
# tools/mtmd/clip.cpp
2025-05-31 12:27:45 +08:00
Concedo
868cb6aff7
Merge commit ' e121edc432
' into concedo_experimental
...
# Conflicts:
# .github/workflows/release.yml
# common/CMakeLists.txt
# docs/function-calling.md
# ggml/src/ggml-sycl/binbcast.cpp
# models/templates/README.md
# scripts/tool_bench.py
# src/llama-kv-cache.cpp
# tests/CMakeLists.txt
# tests/test-chat.cpp
# tools/mtmd/clip.h
# tools/rpc/rpc-server.cpp
# tools/server/README.md
2025-05-28 00:20:45 +08:00
Xuan-Son Nguyen
bc583e3c63
mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) ( #13784 )
...
* mtmd : allow multiple modalities at the same time
* refactor mtmd tokenizer
* fix compile
* ok, missing SinusoidsPositionEmbedding
* first working version
* fix style
* more strict validate of n_embd
* refactor if..else to switch
* fix regression
* add test for 3B
* update docs
* fix tokenizing with add_special
* add more tests
* fix test case "huge"
* rm redundant code
* set_position_mrope_1d rm n_tokens
2025-05-27 14:06:10 +02:00
bandoti
72b090da2c
docs: remove link for llama-cli function calling ( #13810 )
2025-05-27 08:52:40 -03:00
Bizhao Shi
2d38b6e400
CANN: Add the basic supports of Flash Attention kernel ( #13627 )
...
* cann: add the basic FA support
* cann: update the readme
* cann: update the FlashAttention with PSEShift
* cann: update the input parameters in FA
* cann: update the alibi with max_bias
* cann: add the constrints of softcap
* cann: update the docs CANN.md
* cann: update the docs CANN.md
* cann: fix typo of CANN.md
* cann: add some comments and update the CANN.md
* cann: update the CANN.md
* cann: update the inner precise for fusedInferAttention
* cann: update the constraints of flash_attn_ext on ggml-cann.cpp
* cann: clean the whitespace
* cann: clean the whitespace
* cann: add a new endline
2025-05-26 10:20:18 +08:00
Xuan-Son Nguyen
40aaa8a403
mtmd : add support for Qwen2-Audio and SeaLLM-Audio ( #13760 )
...
* mtmd : add Qwen2-Audio support
* small clean up
* update discussion link
* clarify mtmd_get_output_embd
* clarification in multimodal.md
* fix ultravox bug
* ggml_cont
2025-05-25 14:06:32 +02:00
ddpasa
a08c1d2845
docs : add Moondream2 pre-quantized link ( #13745 )
...
* Multimodal: Added Moondream2 model and fixed ggml.org link
* Apply suggestions from code review
---------
Co-authored-by: name <none@none.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-05-25 14:04:49 +02:00
Olivier Chafik
f5cd27b71d
server
: streaming of tool calls and thoughts when --jinja
is on (#12379 )
...
* add common_json w/ support for truncated json healing
* add common_chat_msg_diff
* partial common_chat_parse
* refactor parser w/ optionals
* server: wire chat diffs in stream mode
* fix trigger of thinking models (must happen after thoughts are closed)
* fix functionary v3.2 raw python!
* rename: common_chat_syntax (now contains format)
* rm common_regex.at_start
* don't return empty <think></think>
* accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)
* fix QwQ 32B tool call parsing after thoughts (hermes2)
* better logs for grammar triggers
* consume spaces after parse_json_tool_calls
* fix required tool calls w/ thinking models that have pre-opened thinking tags
* fix thinking model's initial trigger + test qwq's template
* run most test_tool_call tests in stream + non-stream modes
* make functionary v3.2 parsing more strict (differentiate first match from others)
* send final diff from server, to close off raw python arguments
* support partial content streaming in Generic mode
* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)
* Update function-calling.md
* Update tool_bench.py
* chat-parser: remove input from exception (llm output may contain PII)
---------
Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>
2025-05-25 01:48:08 +01:00
Concedo
55cc9acec5
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/release.yml
# README.md
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# tools/mtmd/CMakeLists.txt
# tools/mtmd/clip.cpp
# tools/mtmd/clip.h
2025-05-24 12:10:36 +08:00
Xuan-Son Nguyen
797990c4bc
mtmd : add ultravox audio input ( #13623 )
...
* convert ok, load ok
* warmup ok
* test
* still does not work?
* fix padding
* temporary give up
* fix merge conflict
* build_ultravox()
* rm test
* fix merge conflict
* add necessary mtmd APIs
* first working version (only 4s of audio)
* will this monster compile?
* fix compile
* please compile
* fPIC
* fix windows
* various fixes
* clean up audio_helpers
* fix conversion
* add some debug stuff
* long audio input ok
* adapt the api
* add --audio arg
* final touch UX
* add miniaudio to readme
* fix typo
* refactor kv metadata
* mtmd_default_marker()
2025-05-22 20:42:48 +02:00
Concedo
d04b4eeb04
merge not working
2025-05-21 18:06:41 +08:00
R0CKSTAR
33983057d0
musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy ( #13647 )
...
* musa: fix build warning (unused parameter)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: upgrade MUSA SDK version to rc4.0.1
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Update ggml/src/ggml-cuda/cpy.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-05-21 09:58:49 +08:00
Xinpeng Dou
f0adb80bf7
CANN: Update CANN model support ( #13162 )
...
* Update CANN model support status
* Update of model support
* update
* update
* update
* fix format of CANN.md
* fix format of CANN.md
* fix format of CANN.md
2025-05-20 11:43:43 +08:00
Alberto Cabrera Pérez
725f23f1f3
sycl : backend documentation review ( #13544 )
...
* sycl: reviewing and updating docs
* Updates Runtime error codes
* Improves OOM troubleshooting entry
* Added a llama 3 sample
* Updated supported models
* Updated releases table
2025-05-19 14:38:20 +01:00
Xuan-Son Nguyen
92ecdcc06a
mtmd : add vision support for llama 4 ( #13282 )
...
* wip llama 4 conversion
* rm redundant __init__
* fix conversion
* fix conversion
* test impl
* try this
* reshape patch_embeddings_0
* fix view
* rm ffn_post_norm
* cgraph ok
* f32 for pos embd
* add image marker tokens
* Llama4UnfoldConvolution
* correct pixel shuffle
* fix merge conflicts
* correct
* add debug_graph
* logits matched, but it still preceives the image incorrectly
* fix style
* add image_grid_pinpoints
* handle llama 4 preprocessing
* rm load_image_size
* rm unused line
* fix
* small fix 2
* add test & docs
* fix llava-1.6 test
* test: add notion of huge models
* add comment
* add warn about degraded quality
2025-05-19 13:04:14 +02:00
Concedo
e5d26a2356
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# common/CMakeLists.txt
# docs/backend/SYCL.md
# ggml/CMakeLists.txt
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/binbcast.cpp
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/dequantize.hpp
# ggml/src/ggml-sycl/dmmv.cpp
# ggml/src/ggml-sycl/gemm.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# ggml/src/ggml-vulkan/CMakeLists.txt
# ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt
# ggml/src/gguf.cpp
# scripts/compare-llama-bench.py
# tests/CMakeLists.txt
# tests/test-chat.cpp
# tools/llama-bench/llama-bench.cpp
# tools/server/README.md
2025-05-16 15:30:31 +08:00
Łukasz Ślusarczyk
9c404ed54c
sycl: use oneDNN for matrices multiplication ( #12972 )
2025-05-15 16:53:41 +02:00
ddpasa
21ca987fba
docs: Update link to ggml-org in multimodal.md ( #13513 )
...
* Update multimodal.md
Minor change to include the huggingface link
* Update docs/multimodal.md
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-05-14 09:59:12 +02:00
Concedo
21e31e255b
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# README.md
# build-xcframework.sh
# common/CMakeLists.txt
# examples/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-metal/ggml-metal.m
# ggml/src/ggml-metal/ggml-metal.metal
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/backend.hpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# scripts/compare-llama-bench.py
# src/CMakeLists.txt
# src/llama-model.cpp
# src/llama.cpp
# tests/test-backend-ops.cpp
# tests/test-opt.cpp
# tools/llama-bench/README.md
# tools/llama-bench/llama-bench.cpp
# tools/mtmd/CMakeLists.txt
# tools/mtmd/README.md
# tools/mtmd/clip.cpp
# tools/rpc/rpc-server.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
2025-05-13 00:28:35 +08:00
Thomas Germer
62d4250e52
docs : Fix typo in InternVL3 model name ( #13440 )
2025-05-10 22:26:46 +02:00
Xuan-Son Nguyen
3b24d26c22
server : update docs ( #13432 )
2025-05-10 18:44:49 +02:00
Xuan-Son Nguyen
053367d149
mtmd : support InternVL 2.5 and 3 ( #13422 )
...
* convert : internvl support
* InternVL3-1B working
* fix regression
* rm mobilevlm from test
* fix conversion
* add test for internvl
* add to list of pre-quant
* restore boi/eoi check
* add clarify comment for norm eps
2025-05-10 16:26:42 +02:00
Xuan-Son Nguyen
33eff40240
server : vision support via libmtmd ( #12898 )
...
* server : (experimental) vision support via libmtmd
* mtmd : add more api around mtmd_image_tokens
* mtmd : add more api around mtmd_image_tokens
* mtmd : ability to calc image hash
* shared_ptr for mtmd_image_tokens
* move hash to user-define ID (fixed)
* abstract out the batch management
* small fix
* refactor logic adding tokens to batch
* implement hashing image
* use FNV hash, now hash bitmap instead of file data
* allow decoding image embedding to be split into batches
* rm whitespace
* disable some features when mtmd is on
* fix --no-mmproj-offload
* mtmd_context_params no timings
* refactor server_inp to server_tokens
* fix the failing test case
* init
* wip
* working version
* add mtmd::bitmaps
* add test target
* rm redundant define
* test: mtmd_input_chunks_free
* rm outdated comment
* fix merging issue
* explicitly create mtmd::input_chunks
* mtmd_input_chunk_copy
* add clone()
* improve server_input struct
* clip : fix confused naming ffn_up and ffn_down
* rm ffn_i/o/g naming
* rename n_embd, n_ff
* small fix
* no check n_ff
* fix detokenize
* add const to various places
* add warning about breaking changes
* add c api
* helper: use mtmd_image_tokens_get_n_pos
* fix ctx_shift
* fix name shadowing
* more strict condition
* support remote image_url
* remote image_url log
* add CI test
* do not log base64
* add "has_multimodal" to /props
* remove dangling image
* speculative: use slot.cache_tokens.insert
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* rm can_be_detokenized
* on prmpt processing done, assert cache_tokens.size
* handle_completions_impl returns void
* adapt the new web ui
* update docs and hot topics
* rm assert
* small fix (2)
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-05-09 19:29:37 +02:00
Xuan-Son Nguyen
9b61acf060
mtmd : rename llava directory to mtmd ( #13311 )
...
* mv llava to mtmd
* change ref everywhere
2025-05-05 16:02:55 +02:00
Diego Devesa
1d36b3670b
llama : move end-user examples to tools directory ( #13249 )
...
* llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00
Concedo
6b6597ebf1
allow for single token prompt processing (actual batch size 1)
2025-04-25 16:54:46 +08:00
Concedo
28a2723100
merged pixtral support, not fully working
2025-04-24 15:27:02 +08:00
Concedo
8f1edcbdac
Merge commit ' dc39a5e7a8
' into concedo_experimental
...
# Conflicts:
# README.md
# SECURITY.md
# docs/multimodal/MobileVLM.md
# examples/llava/CMakeLists.txt
# examples/llava/README.md
# examples/llava/android/adb_run.sh
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/rope.cpp
# ggml/src/ggml-sycl/rope.hpp
2025-04-24 11:49:08 +08:00
Xuan-Son Nguyen
ecda2ec4b3
mtmd : Support Pixtral 12B ( #13065 )
...
* add pixtral text model (vision is wip)
* cgraph ok, just missing 2D RoPE
* fix bad rebase
* first working version
* fix problem with img_break token
* support dynamic image size
* update docs
* update test script
2025-04-23 20:21:59 +02:00
Xuan-Son Nguyen
243453533e
llava : update documentations ( #13055 )
...
* llava : update documentations
* fix typo
2025-04-22 10:37:00 +02:00
Concedo
4b0f63ed62
cleanup
2025-04-18 22:57:10 +08:00
David Huang
84778e9770
CUDA/HIP: Share the same unified memory allocation logic. ( #12934 )
...
Replace compile-time `GGML_HIP_UMA` with environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY`. This unifies the usage on NVIDIA and AMD GPUs, and allows a single binary to be shared between integrated and dedicated GPUs.
2025-04-15 11:20:38 +02:00
Romain Biessy
8ed71242f4
sycl: update documentation to use -no-cnv ( #12845 )
2025-04-09 11:22:04 +02:00
Nicolò Scipione
94148ba330
sycl: allow ggml-sycl configuration and compilation using Visual Studio project/solution ( #12625 )
2025-04-04 16:00:46 +02:00
Concedo
4e740311fe
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ci/run.sh
# docs/backend/SYCL.md
# docs/build.md
# ggml/src/ggml-vulkan/CMakeLists.txt
# ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt
# tests/test-chat-template.cpp
2025-04-04 15:07:47 +08:00
lhez
7d7b1bafa7
opencl: update doc for OpenCL ( #12702 )
...
* opencl: add OpenCL to build.md
* opencl: remove fixed issue/TODO
* opencl: add link to OPENCL.md
* opencl: update doc - refine tools requirement for Windows 11 arm64
2025-04-03 22:18:17 -07:00
Atharva Dubey
2004644b7a
ci : add env variable in ggml-ci and document the same in SYCL.md ( #12736 )
2025-04-03 15:12:39 +03:00