Concedo
22ef97d7d3
Merge commit 'ab86335760' into concedo_experimental
...
# Conflicts:
# .github/workflows/release.yml
# examples/retrieval/retrieval.cpp
# examples/simple-chat/simple-chat.cpp
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# requirements/requirements-convert_hf_to_gguf.txt
# requirements/requirements-convert_hf_to_gguf_update.txt
# requirements/requirements-convert_lora_to_gguf.txt
# tools/run/run.cpp
2025-05-23 11:41:36 +08:00
Xuan-Son Nguyen
cf4cb59e64
ggml : add ggml_gelu_erf() (#13667)
...
* ggml : add ggml_gelu_na (not approximated)
* fix naming order
* rename na --> erf
* apply review suggestions
* revert naming order
2025-05-21 16:26:33 +02:00
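For context, the erf-based GELU is the exact (non-approximated) form, GELU(x) = 0.5·x·(1 + erf(x/√2)). A minimal C sketch of the scalar math behind the new op (illustrative only; the actual ggml kernel is a separate, vectorized implementation):

```c
#include <math.h>

// Exact (erf-based) GELU, the form the new op is named after:
// gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
static float gelu_erf_ref(float x) {
    return 0.5f * x * (1.0f + erff(x * 0.70710678f)); // 0.70710678f ~= 1/sqrt(2)
}
```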
Concedo
21e31e255b
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# README.md
# build-xcframework.sh
# common/CMakeLists.txt
# examples/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-metal/ggml-metal.m
# ggml/src/ggml-metal/ggml-metal.metal
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/backend.hpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/mmvq.cpp
# ggml/src/ggml-sycl/vecdotq.hpp
# scripts/compare-llama-bench.py
# src/CMakeLists.txt
# src/llama-model.cpp
# src/llama.cpp
# tests/test-backend-ops.cpp
# tests/test-opt.cpp
# tools/llama-bench/README.md
# tools/llama-bench/llama-bench.cpp
# tools/mtmd/CMakeLists.txt
# tools/mtmd/README.md
# tools/mtmd/clip.cpp
# tools/rpc/rpc-server.cpp
# tools/server/CMakeLists.txt
# tools/server/README.md
2025-05-13 00:28:35 +08:00
Johannes Gäßler
10d2af0eaa
llama/ggml: add LLM training support (#10544)
...
* llama/ggml: add LLM training support
more compact progress bar
llama_save_model_to_file
llama_opt_param_filter
ggml_graph_dup force_grads
refactor ggml_opt, fix test-opt
* remove logits_all
* refactor CUDA implementation for ACC
* reset graph at beginning of opt period
2025-05-12 14:44:49 +02:00
Concedo
ffe23f0e93
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-sycl/ggml-sycl.cpp
# pyproject.toml
2025-05-06 23:39:45 +08:00
Johannes Gäßler
2356fb1d53
CUDA: fix bad asserts for partial offload (#13337)
2025-05-06 13:58:51 +02:00
Concedo
b2ecfa0f55
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# examples/llama-bench/README.md
# examples/llama-bench/llama-bench.cpp
# examples/llava/CMakeLists.txt
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/element_wise.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# tests/test-chat-template.cpp
2025-04-29 21:05:16 +08:00
Johannes Gäßler
69699be48a
CUDA: fix q_nope_absorbed prec for DS 2 Lite f16 (#13137)
2025-04-28 09:29:26 +02:00
Concedo
6b6597ebf1
allow for single token prompt processing (actual batch size 1)
2025-04-25 16:54:46 +08:00
Acly
c6e8cc28c1
ggml : Depthwise 2D convolution (ggml/1152)
...
* ggml-cpu : kernels for faster depthwise 2D convolution
* fix compile: remove static after moving to ops.cpp
* add dilation for depthwise_conv_2d
* review: rename to ggml_conv_2d_dw_direct, remove redundant struct keywords, pass by ref, whitespace
* review: rename depthwise_conv_2d -> conv_2d_dw everywhere
2025-04-24 17:32:47 +03:00
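In a depthwise 2D convolution each input channel is convolved with its own filter and channels are never summed together, which is what lets a direct kernel beat the generic im2col path. A naive C reference of that definition (illustrative only, not the ggml kernel; the flat C×H×W layout, stride 1 and zero padding are simplifying assumptions):

```c
// Naive depthwise 2D convolution: channel c of the input only ever meets
// kernel c, so there is no accumulation across channels.
// Assumed layout: src is C x H x W (flattened), ker is C x KH x KW, stride 1, no padding.
void conv_2d_dw_ref(const float *src, const float *ker, float *dst,
                    int C, int H, int W, int KH, int KW) {
    const int OH = H - KH + 1, OW = W - KW + 1;
    for (int c = 0; c < C; ++c)
        for (int oy = 0; oy < OH; ++oy)
            for (int ox = 0; ox < OW; ++ox) {
                float acc = 0.0f;
                for (int ky = 0; ky < KH; ++ky)
                    for (int kx = 0; kx < KW; ++kx)
                        acc += src[(c*H + oy + ky)*W + (ox + kx)] *
                               ker[(c*KH + ky)*KW + kx];
                dst[(c*OH + oy)*OW + ox] = acc;
            }
}
```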
Concedo
a0ae187563
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/docker.yml
# README.md
# build-xcframework.sh
# examples/llava/CMakeLists.txt
# examples/llava/clip.cpp
# examples/rpc/rpc-server.cpp
# examples/run/run.cpp
# ggml/src/ggml-cann/ggml-cann.cpp
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tests/test-chat.cpp
2025-04-12 10:06:47 +08:00
Diego Devesa
fe92821ea9
ggml : add bilinear upscale support (ggml/1185)
2025-04-11 00:17:47 +03:00
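Bilinear upscaling computes each output pixel as a weighted average of the four nearest source pixels. A small C sketch of the per-pixel interpolation (the coordinate convention and edge handling of the actual ggml op are not specified here, so this clamp-to-edge variant is only illustrative):

```c
// Sample a W x H single-channel image at fractional coordinates (x, y)
// using bilinear interpolation with clamp-to-edge on the borders.
static float bilinear_sample(const float *img, int W, int H, float x, float y) {
    const int x0 = (int) x, y0 = (int) y;
    const int x1 = x0 + 1 < W ? x0 + 1 : W - 1;
    const int y1 = y0 + 1 < H ? y0 + 1 : H - 1;
    const float fx = x - (float) x0, fy = y - (float) y0;
    const float top = img[y0*W + x0]*(1.0f - fx) + img[y0*W + x1]*fx;
    const float bot = img[y1*W + x0]*(1.0f - fx) + img[y1*W + x1]*fx;
    return top*(1.0f - fy) + bot*fy;
}
```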
Diego Devesa
459895c326
ggml : add more generic custom op, remove deprecated custom ops (ggml/1183)
...
* ggml : add more generic ggml_custom op
* ggml : remove deprecated custom ops
2025-04-11 00:17:47 +03:00
Concedo
396875e1c4
update api docs and lite
2025-03-29 15:39:25 +08:00
Georgi Gerganov
b4ae50810e
metal : improve FA + improve MoE (#12612)
...
* ggml : FA with different K, V head sizes (CPU)
ggml-ci
* metal : add FA with HS=192
* metal : extend FA to support different K and V head sizes
ggml-ci
* metal : add FA vector kernels for heads K 192 and V 128
ggml-ci
* ggml : restrict op on other backends to equal head sizes
ggml-ci
* metal : optimize FA-vec kernel
ggml-ci
* metal : FA remove mq registers
* metal : improve MoE mul_mat_id condition
ggml-ci
* metal : fix comments + remove unnecessary addition
ggml-ci
* metal : avoid too much shared memory usage with mul_mat_id
ggml-ci
2025-03-28 20:21:59 +02:00
Concedo
0c90d2ebcf
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# cmake/common.cmake
# docs/backend/SYCL.md
# examples/main/README.md
# examples/speculative/speculative.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt
# tests/test-backend-ops.cpp
2025-03-19 19:27:11 +08:00
Molly Sophia
7dfad387e3
llama: Add support for RWKV v7 architecture (#12412)
...
* ggml: Add op l2_norm
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* ggml: Add op rwkv_wkv7
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* llama: Add support for RWKV7 and ARWKV7 models
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* llama: fix inference with RWKV6Qwen2
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* llama: add more (a)rwkv7 variants in size
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Apply code-format changes
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* fix MUSA build
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* llama: fix shape error with rwkv using llama-parallel
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-03-18 07:27:50 +08:00
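The first bullet above adds an l2_norm op; in the usual definition each row is scaled by the inverse of its Euclidean norm, with a small epsilon guarding against division by zero. A C sketch of that row-wise math (how the real op applies eps is an assumption here):

```c
#include <math.h>

// Row-wise L2 normalization: y_i = x_i / max(||x||_2, eps).
// Illustrative reference only; the actual op's eps handling may differ.
void l2_norm_row(const float *x, float *y, int n, float eps) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) sum += x[i]*x[i];
    const float norm  = sqrtf(sum);
    const float scale = 1.0f / (norm > eps ? norm : eps);
    for (int i = 0; i < n; ++i) y[i] = x[i]*scale;
}
```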
Concedo
ec43d2b147
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# common/common.cpp
# examples/embedding/embedding.cpp
# examples/json_schema_to_grammar.py
# examples/llama.android/llama/src/main/cpp/llama-android.cpp
# examples/llama.swiftui/README.md
# examples/llama.swiftui/llama.swiftui.xcodeproj/project.pbxproj
# examples/lookahead/lookahead.cpp
# examples/parallel/parallel.cpp
# examples/passkey/passkey.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# requirements.txt
# requirements/requirements-all.txt
# scripts/fetch_server_test_models.py
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
2025-03-06 18:54:58 +08:00
mgroeber9110
5bbe6a9fe9
ggml : portability fixes for VS 2017 (#12150)
...
* Add include files for std::min/max and std::toupper/tolower
* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined
* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode
* win32: only use __restrict in MSVC if C11/C17 support is not enabled
---------
Co-authored-by: Marcus Groeber <Marcus.Groeber@cerence.com>
2025-03-04 18:53:26 +02:00
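The restrict-related bullets describe replacing the bare C99 `restrict` keyword with a `GGML_RESTRICT` macro, because MSVC's plain C mode only understands `__restrict`. A sketch of how such a macro is typically conditioned (the exact ggml definition may differ):

```c
// Portability macro in the spirit of the commit above: fall back to MSVC's
// __restrict when the compiler does not advertise C11/C17 support.
#if defined(_MSC_VER) && !defined(__STDC_VERSION__)
#    define GGML_RESTRICT __restrict
#else
#    define GGML_RESTRICT restrict
#endif

void vec_scale(float * GGML_RESTRICT dst, const float * GGML_RESTRICT src, int n, float s);
```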
Concedo
754fef5204
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/cuda.Dockerfile
# .devops/musa.Dockerfile
# .github/workflows/build.yml
# README.md
# docs/docker.md
# examples/imatrix/imatrix.cpp
# examples/llama-bench/llama-bench.cpp
# examples/main/README.md
# examples/perplexity/perplexity.cpp
# examples/server/README.md
# ggml/src/ggml-cpu/ggml-cpu.c
# ggml/src/ggml-cuda/CMakeLists.txt
# models/templates/deepseek-ai-DeepSeek-R1-Distill-Llama-8B.jinja
# models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja
# scripts/get_chat_template.py
# scripts/sync-ggml.last
# tests/test-chat.cpp
# tests/test-gguf.cpp
# tests/test-sampling.cpp
2025-02-15 00:49:46 +08:00
bandoti
fef0cbeadf
cleanup: fix compile warnings associated with gnu_printf (#11811)
2025-02-12 10:06:53 -04:00
Concedo
db6db9dff9
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/close-issue.yml
# .github/workflows/server.yml
# AUTHORS
# CMakeLists.txt
# Makefile
# README.md
# cmake/llama.pc.in
# common/CMakeLists.txt
# docs/build.md
# examples/batched.swift/Sources/main.swift
# examples/llama.swiftui/llama.cpp.swift/LibLlama.swift
# examples/llava/CMakeLists.txt
# examples/llava/clip.h
# examples/run/run.cpp
# examples/server/README.md
# ggml/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
# scripts/sync-ggml.last
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
2025-02-07 00:52:31 +08:00
Johannes Gäßler
864a0b67a6
CUDA: use mma PTX instructions for FlashAttention (#11583)
...
* CUDA: use mma PTX instructions for FlashAttention
* __shfl_sync workaround for movmatrix
* add __shfl_sync to HIP
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-02-02 19:31:09 +01:00
Concedo
96407502cd
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# examples/llama-bench/llama-bench.cpp
# examples/llama.android/llama/src/main/cpp/llama-android.cpp
# examples/llama.android/llama/src/main/java/android/llama/cpp/LLamaAndroid.kt
# src/llama-vocab.cpp
# tests/test-backend-ops.cpp
2025-01-17 23:13:50 +08:00
Johannes Gäßler
9c8dcefe17
CUDA: backwards pass for misc. ops, add tests (#11257)
...
* CUDA: backwards pass for misc. ops, add tests
* remove restrict from pointers
2025-01-16 16:43:38 +01:00
Concedo
11cd7c7bb0
survived the storm, again
2025-01-16 22:25:18 +08:00
Johannes Gäßler
432df2d5f9
RoPE: fix back, CUDA support for back + noncont. (#11240)
...
* RoPE: fix back, CUDA support for back + noncont.
* fix comments reg. non-cont. RoPE support [no-ci]
2025-01-15 12:51:37 +01:00
Concedo
b154bd3671
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# docs/build.md
# docs/development/HOWTO-add-model.md
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
2025-01-10 17:57:38 +08:00
Molly Sophia
ee7136c6d1
llama: add support for QRWKV6 model architecture (#11001)
...
* WIP: Add support for RWKV6Qwen2
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV: Some graph simplification
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Add support for RWKV6Qwen2 with cpu and cuda GLA
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix some typos
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* code format changes
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix wkv test & add gla test
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix cuda warning
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update README.md
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update ggml/src/ggml-cuda/gla.cu
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Fix fused lerp weights loading with RWKV6
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* better sanity check skipping for QRWKV6 in llama-quant
thanks @compilade
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: compilade <git@compilade.net>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2025-01-10 09:58:08 +08:00
Concedo
e788b8289a
You'll never take us alive
...
We swore that death will do us part
They'll call our crimes a work of art
2025-01-09 11:27:06 +08:00
Johannes Gäßler
53ff6b9b9f
GGUF: C++ refactor, backend support, misc fixes (#11030)
...
* GGUF: C++ refactor, backend support, misc fixes
remove ggml_tensor.backend
update CODEOWNERS [no ci]
remove gguf_get_data from API
revise GGUF API data types
2025-01-07 18:01:58 +01:00
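The refactor keeps GGUF readable through the C API while moving the implementation to C++ and dropping gguf_get_data. A hedged sketch of reading a file's key-value metadata; the header name and the exact fields of gguf_init_params are assumptions and may not match the refactored API precisely:

```c
#include "gguf.h"    // assumed header location after the refactor
#include <stdint.h>
#include <stdio.h>

int main(void) {
    // no_alloc = true: read metadata only, do not allocate tensor data
    struct gguf_init_params params = { /* no_alloc = */ true, /* ctx = */ NULL };
    struct gguf_context * gctx = gguf_init_from_file("model.gguf", params);
    if (gctx == NULL) return 1;
    for (int64_t i = 0; i < gguf_get_n_kv(gctx); ++i) {
        printf("kv[%lld]: %s\n", (long long) i, gguf_get_key(gctx, i));
    }
    gguf_free(gctx);
    return 0;
}
```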
Concedo
ee486bad3e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# examples/CMakeLists.txt
# examples/batched/batched.cpp
# examples/gritlm/gritlm.cpp
# examples/llama.android/llama/build.gradle.kts
# examples/main/README.md
# examples/retrieval/retrieval.cpp
# examples/server/CMakeLists.txt
# examples/server/README.md
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml.c
# scripts/compare-commits.sh
# scripts/sync-ggml.last
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-sampling.cpp
2024-12-19 11:57:43 +08:00
Georgi Gerganov
0bf2d10c55
tts : add OuteTTS support (#10784)
...
* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : be explicit about the pooling type in the tests
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* llama : add OuteTTS support (wip)
* wip
* extract features
* first conv
* group norm
* resnet conv
* resnet
* attn
* pos net
* layer norm
* convnext
* head
* hann window
* fix n_embd + remove llama.cpp hacks
* compute hann window
* fft
* spectrum processing
* clean-up
* tts : receive input text and generate codes
* clip : fix new conv name
* tts : minor fix
* tts : add header + minor fixes
ggml-ci
* tts : add mathematical constant
ggml-ci
* tts : fix sampling + cut initial noise
* tts : fixes
* tts : update default samplers
ggml-ci
* tts : text pre-processing
* tts : outetts-voc -> wavtokenizer-dec
* tts : remove hardcoded constants
ggml-ci
* tts : fix tensor shapes
* llama : refactor wavtokenizer tensors
ggml-ci
* cont
ggml-ci
* cont [no ci]
* llama : update WavTokenizer to non-causal attn
* llama : handle no-vocab detokenization
* tts : add Python example for OuteTTS (wip)
* tts : extend python example to generate spectrogram
ggml-ci
* server : fix rebase artifacts
* tts : enable "return_tokens" in Python example
ggml-ci
* tts : minor fixes
* common : support HF download for vocoder
2024-12-18 19:27:21 +02:00
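Several of the vocoder bullets above (hann window, fft, spectrum processing) are standard DSP steps; for reference, the Hann window is w[n] = 0.5·(1 − cos(2πn/N)). A tiny C sketch (whether the code uses this periodic variant or the symmetric N−1 variant is not stated here):

```c
#include <math.h>

// Periodic Hann window of length N: w[n] = 0.5 * (1 - cos(2*pi*n / N)).
void hann_window(float *w, int N) {
    const float two_pi = 6.2831853f;
    for (int n = 0; n < N; ++n) {
        w[n] = 0.5f * (1.0f - cosf(two_pi * (float) n / (float) N));
    }
}
```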
Concedo
f456ed7237
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# .devops/tools.sh
# .github/workflows/build.yml
# Makefile
# README.md
# common/CMakeLists.txt
# common/common.h
# examples/llava/CMakeLists.txt
# examples/run/CMakeLists.txt
# examples/run/README.md
# examples/run/run.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-kompute/ggml-kompute.cpp
# tests/test-backend-ops.cpp
# tests/test-rope.cpp
2024-12-15 15:30:10 +08:00
HimariO
ba1cb19cdd
llama : add Qwen2VL support + multimodal RoPE (#10361)
...
* Barebone Qwen2VL LLM convertor
* Add Qwen2VL cli entrypoint
* [WIP] add qwen2vl arch
* Verify m-rope output
* Add vl-rope/2d-rope support for qwen2vl ViT
* update qwen2vl cli tool
* update 5D tensor op workaround
* [WIP] qwen2vl vision model
* make batch and clip utils compatible with qwen2vl
* [WIP] create inference workflow, gguf convert script but fix
* correcting vision-rope behavior, add the missing last layer back to ViT
* add arg parser to qwen2vl_surgery
* replace variable size array with vector
* cuda-gdb cmake preset
* add fp32 mrope, vision rope kernel
* add fp16 support for qwen2vl and m-rope
* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`
* fix rope op mode switching, out dated func args
* update `llama_hparams`
* update to keep up stream changes
* resolve linter, test errors
* add makefile entry, update special image padding token
* add mrope unit test, fix few compiler warnings
* rename `mrope` related function, params
* minor updates on debug util, bug fixes
* add `m-rope` testcase to `test-backend-ops`
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix trailing whitespace
* store `llama_hparams.rope_sections` with fixed size array
* update position id tensor size check in GGML_OP_ROPE
* minor updates
* update `ggml_backend_*_supports_op` of unsupported backends
* remove old `rope_section` compare operator
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-14 14:43:46 +02:00
Concedo
bcec998979
backported support for old interleaved aarch64 quants e.g. q4_0_4_4
2024-12-13 17:10:20 +08:00
Concedo
de64b9198c
merge checkpoint 2 - functional merge without q4_0_4_4 (need regen shaders)
2024-12-13 17:04:19 +08:00
Concedo
4c4ce5e808
rewritten checkpoint 1 - before coopmat
2024-12-13 16:55:23 +08:00
Djip007
19d8762ab6
ggml : refactor online repacking (#10446)
...
* rename ggml-cpu-aarch64.c to .cpp
* reformat extra cpu backend.
- clean Q4_0_N_M and IQ4_0_N_M
- remove from "file" tensor type
- allow only with dynamic repack
- extract cpu extra bufts and convert to C++
- hbm
- "aarch64"
- more generic use of extra buffer
- generalise extra_supports_op
- new API for "cpu-accel":
- amx
- aarch64
* clang-format
* Clean Q4_0_N_M ref
Enable restrict on C++
* add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack
* added/corrected control on tensor size for Q4 repacking.
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add debug logs on repacks.
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-07 14:37:50 +02:00
PAB
c2082d93a8
ggml : add GGML_PAD_REFLECT_1D operation (ggml/1034)
...
* ggml_pad_reflect_1d defined in header
* implemented on CPU
* called the forward pass
* impl Metal kernel
* added Metal kernel
* added OP_PAD_REFLECT_1D in test-backend-ops.cpp
* add test-pad-reflect-1d test case
* test case support multiple backend
2024-12-05 13:27:31 +02:00
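Reflective 1D padding mirrors samples around each boundary without repeating the edge sample itself, e.g. [a b c d] padded by 2 on both sides becomes [c b a b c d c b]. A small C reference of that behavior (illustrative only, not the actual CPU/Metal kernels; padding is assumed smaller than the row length):

```c
// Reflect-pad a row of n samples by p0 on the left and p1 on the right.
// The edge sample acts as the mirror axis and is not duplicated.
void pad_reflect_1d_ref(const float *src, float *dst, int n, int p0, int p1) {
    for (int i = 0; i < n + p0 + p1; ++i) {
        int j = i - p0;
        if (j < 0)  j = -j;              // mirror around index 0
        if (j >= n) j = 2*(n - 1) - j;   // mirror around index n-1
        dst[i] = src[j];
    }
}
```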
Concedo
ec95241e38
temp checkpoint
2024-11-30 11:59:27 +08:00
Shupei Fan
c202cef168
ggml-cpu: support IQ4_NL_4_4 by runtime repack (#10541)
...
* ggml-cpu: support IQ4_NL_4_4 by runtime repack
* ggml-cpu: add __ARM_FEATURE_DOTPROD guard
2024-11-28 13:52:03 +01:00
Concedo
ec581b19d8
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/011-bug-results.yml
# .github/ISSUE_TEMPLATE/019-bug-misc.yml
# .github/workflows/build.yml
# .github/workflows/docker.yml
# CMakeLists.txt
# Makefile
# Package.swift
# examples/CMakeLists.txt
# examples/eval-callback/CMakeLists.txt
# examples/llama-bench/llama-bench.cpp
# examples/server/README.md
# examples/server/server.cpp
# examples/simple-chat/simple-chat.cpp
# examples/simple/simple.cpp
# examples/speculative-simple/speculative-simple.cpp
# examples/speculative/speculative.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-amx/CMakeLists.txt
# ggml/src/ggml-blas/CMakeLists.txt
# ggml/src/ggml-cann/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-kompute/CMakeLists.txt
# ggml/src/ggml-kompute/ggml-kompute.cpp
# ggml/src/ggml-metal/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
# ggml/src/ggml-rpc/CMakeLists.txt
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-vulkan/CMakeLists.txt
# pocs/CMakeLists.txt
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-quantize-fns.cpp
2024-11-26 17:01:20 +08:00
Diego Devesa
5931c1f233
ggml : add support for dynamic loading of backends (#10469)
...
* ggml : add support for dynamic loading of backends
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-25 15:13:39 +01:00
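With this change the backends can be built as separate shared libraries and discovered at run time instead of being linked in. A hedged sketch of enumerating devices after loading; the entry-point names are assumptions drawn from the ggml backend-registry API and may differ from what this PR finally exposes:

```c
#include "ggml-backend.h"
#include <stdio.h>

int main(void) {
    // Scan the usual locations for ggml backend shared libraries and register them.
    ggml_backend_load_all();

    // List whatever devices the dynamically loaded backends expose.
    for (size_t i = 0; i < ggml_backend_dev_count(); ++i) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        printf("device %zu: %s (%s)\n", i,
               ggml_backend_dev_name(dev), ggml_backend_dev_description(dev));
    }
    return 0;
}
```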
Concedo
282a647689
Merge commit '467576b6cc' into concedo_experimental
...
# Conflicts:
# .gitignore
# Makefile
# README.md
# common/common.h
# docs/build.md
# examples/infill/infill.cpp
# examples/perplexity/perplexity.cpp
# examples/server/README.md
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.sh
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-opt.cpp
# tests/test-quantize-perf.cpp
2024-11-21 16:05:21 +08:00
Johannes Gäßler
8a43e940ab
ggml: new optimization interface (ggml/988)
2024-11-17 08:30:29 +02:00
Concedo
70aee82552
attempts a backflip, but does he stick the landing?
2024-11-16 17:05:45 +08:00
Diego Devesa
ae8de6d50a
ggml : build backends as libraries (#10256)
...
* ggml : build backends as libraries
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2024-11-14 18:04:35 +01:00
Concedo
a244b1ffd2
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# Makefile
# Package.swift
# ci/run.sh
# docs/backend/SYCL.md
# examples/llama-bench/llama-bench.cpp
# examples/server/CMakeLists.txt
# examples/server/README.md
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# grammars/README.md
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/run-json-schema-to-grammar.mjs
# tests/test-backend-ops.cpp
2024-11-09 13:36:47 +08:00
Georgi Gerganov
841f27abdb
metal : optimize FA kernels (#10171)
...
* ggml : add ggml_flash_attn_ext_get_prec
* metal : use F16 precision in FA kernels
ggml-ci
* metal : minor clean-up
* metal : compile-guard bf16 FA kernels
ggml-ci
* build : remove obsolete compile flag [no ci]
* metal : prevent int overflows [no ci]
* cuda : disable BF16 FA
ggml-ci
* metal : fix BF16 requirement for FA kernels
ggml-ci
* make : clean-up [no ci]
2024-11-08 13:47:22 +02:00