Johannes Gäßler
9c8dcefe17
CUDA: backwards pass for misc. ops, add tests ( #11257 )
...
* CUDA: backwards pass for misc. ops, add tests
* remove restrict from pointers
2025-01-16 16:43:38 +01:00
Concedo
11cd7c7bb0
survived the storm, again
2025-01-16 22:25:18 +08:00
Johannes Gäßler
432df2d5f9
RoPE: fix back, CUDA support for back + noncont. ( #11240 )
...
* RoPE: fix back, CUDA support for back + noncont.
* fix comments reg. non-cont. RoPE support [no-ci]
2025-01-15 12:51:37 +01:00
Concedo
b154bd3671
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# docs/build.md
# docs/development/HOWTO-add-model.md
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
2025-01-10 17:57:38 +08:00
Molly Sophia
ee7136c6d1
llama: add support for QRWKV6 model architecture ( #11001 )
...
llama: add support for QRWKV6 model architecture (#11001 )
* WIP: Add support for RWKV6Qwen2
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV: Some graph simplification
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Add support for RWKV6Qwen2 with cpu and cuda GLA
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix some typos
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* code format changes
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix wkv test & add gla test
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix cuda warning
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update README.md
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update ggml/src/ggml-cuda/gla.cu
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Fix fused lerp weights loading with RWKV6
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* better sanity check skipping for QRWKV6 in llama-quant
thanks @compilade
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: compilade <git@compilade.net>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2025-01-10 09:58:08 +08:00
Concedo
e788b8289a
You'll never take us alive
...
We swore that death will do us part
They'll call our crimes a work of art
2025-01-09 11:27:06 +08:00
Concedo
7c671f289e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/docker.yml
# examples/cvector-generator/mean.hpp
# examples/cvector-generator/pca.hpp
# examples/export-lora/export-lora.cpp
# examples/rpc/rpc-server.cpp
# examples/run/README.md
# examples/run/run.cpp
# examples/server/CMakeLists.txt
# examples/server/README.md
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-vulkan/ggml-vulkan.cpp
# scripts/compare-llama-bench.py
# scripts/hf.sh
# tests/test-chat-template.cpp
2024-12-28 12:48:34 +08:00
Djip007
2cd43f4900
ggml : more perfo with llamafile tinyblas on x86_64 ( #10714 )
...
* more perfo with llamafile tinyblas on x86_64.
- add bf16 suport
- change dispache strategie (thanks:
https://github.com/ikawrakow/ik_llama.cpp/pull/71 )
- reduce memory bandwidth
simple tinyblas dispache and more cache freindly
* tinyblas dynamic dispaching
* sgemm: add M blocs.
* - git 2.47 use short id of len 9.
- show-progress is not part of GNU Wget2
* remove not stable test
2024-12-24 18:54:49 +01:00
Diego Devesa
32d6ee6385
ggml : fix const usage in SSE path ( #10962 )
2024-12-23 20:25:52 +01:00
Concedo
50648de0af
rephrase tensor moved warning, cleanup and prepare for ci
2024-12-19 22:57:43 +08:00
Concedo
f456ed7237
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# .devops/tools.sh
# .github/workflows/build.yml
# Makefile
# README.md
# common/CMakeLists.txt
# common/common.h
# examples/llava/CMakeLists.txt
# examples/run/CMakeLists.txt
# examples/run/README.md
# examples/run/run.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-kompute/ggml-kompute.cpp
# tests/test-backend-ops.cpp
# tests/test-rope.cpp
2024-12-15 15:30:10 +08:00
Concedo
1e07043a6e
clean and rename old clblast files in preparation for merge
2024-12-15 15:29:02 +08:00
HimariO
ba1cb19cdd
llama : add Qwen2VL support + multimodal RoPE ( #10361 )
...
* Barebone Qwen2VL LLM convertor
* Add Qwen2VL cli entrypoint
* [WIP] add qwen2vl arch
* Verify m-rope output
* Add vl-rope/2d-rope support for qwen2vl ViT
* update qwen2vl cli tool
* update 5D tensor op workaround
* [WIP] qwen2vl vision model
* make batch and clip utils compatible with qwen2vl
* [WIP] create inference workflow, gguf convert script but fix
* correcting vision-rope behavior, add the missing last layer back to ViT
* add arg parser to qwen2vl_surgery
* replace variable size array with vector
* cuda-gdb cmake preset
* add fp32 mrope, vision rope kernel
* add fp16 support for qwen2vl and m-rope
* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`
* fix rope op mode switching, out dated func args
* update `llama_hparams`
* update to keep up stream changes
* resolve linter, test errors
* add makefile entry, update speical image padding token
* add mrope unit test, fix few compiler warnings
* rename `mrope` related function, params
* minor updates on debug util, bug fixs
* add `m-rope` testcase to `test-backend-ops`
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix traililng whitespce
* store `llama_hparams.rope_sections` with fixed size array
* update position id tensor size check in GGML_OP_ROPE
* minor updates
* update `ggml_backend_*_supports_op` of unsupported backends
* remote old `rope_section` compare operator
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-14 14:43:46 +02:00
Concedo
ed75f8a741
up to date merge, without vulkan-gen-shaders. They will be built before each release from now on, as they are very large
2024-12-13 17:18:01 +08:00
Concedo
de64b9198c
merge checkpoint 2 - functional merge without q4_0_4_4 (need regen shaders)
2024-12-13 17:04:19 +08:00
Concedo
4c4ce5e808
rewritten checkpoint 1 - before coopmat
2024-12-13 16:55:23 +08:00
Karol Kontny
d583cd03f6
ggml : Fix compilation issues on ARM platform when building without fp16 ( #10811 )
2024-12-13 01:04:19 +01:00
Diego Devesa
cb13ef85a4
remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ( #10797 )
...
other windows build fixes
2024-12-12 19:02:49 +01:00
Concedo
4548d893ee
better way to handle termux compatibility (+2 squashed commit)
...
Squashed commit:
[301986f11] better way to handle termux compatibility
[16b03b225] updated lite
2024-12-11 15:05:01 +08:00
Concedo
a11bba5893
cleanup, fix native build for arm (+28 squashed commit)
...
Squashed commit:
[d1f6a4154] bundle library
[947ab84b7] undo
[0f9aba8d8] test
[e9ac93873] test
[920438202] test
[1c6d98804
] Revert "quick test"
This reverts commit acf8ec8940
.
[acf8ec894
] quick test
[6a9937233
] undo
[5a263a5bd
] test
[ddfd82bca
] test
[0b30e45da
] test
[c3bfece55
] messed up
[2a4b37fe0
] Revert "test"
This reverts commit 80a1fcaeaf
.
[80a1fcaea
] test
[e2aa7d944
] test
[264d80200
] test
[f5b123173
] undo
[1ffacc484
] test
[63c0be926
] undo
[510e0377e
] ofast try fix
[4ac199b20
] try fix sigill
[1bc987ba2
] try fix illegal instruction
[7697252b1
] edit
[f87087b28
] check gcc ver
[e9dfe2cef
] try using qemu to do the pyinstaller
[b411192db
] revert
[25b5301e5
] try using qemu to do the pyinstaller
[58038cddc
] try using qemu to do the pyinstaller
2024-12-10 19:42:23 +08:00
Djip007
19d8762ab6
ggml : refactor online repacking ( #10446 )
...
* rename ggml-cpu-aarch64.c to .cpp
* reformat extra cpu backend.
- clean Q4_0_N_M and IQ4_0_N_M
- remove from "file" tensor type
- allow only with dynamic repack
- extract cpu extra bufts and convert to C++
- hbm
- "aarch64"
- more generic use of extra buffer
- generalise extra_supports_op
- new API for "cpu-accel":
- amx
- aarch64
* clang-format
* Clean Q4_0_N_M ref
Enable restrict on C++
* add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack
* added/corrected control on tensor size for Q4 repacking.
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add debug logs on repacks.
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-07 14:37:50 +02:00
PAB
a8cbab201d
ggml: add GGML_SET
Metal kernel + i32 CPU kernel (ggml/1037)
...
* implemented cpu kernel
* add i32 test cases in test-backend-ops
* typedef `ggml_metal_kargs_set`
* implemented `kernel_set`
* memcpy
2024-12-05 13:27:33 +02:00
PAB
c2082d93a8
ggml : add GGML_PAD_REFLECT_1D
operation (ggml/1034)
...
* ggml_pad_reflect_1d defined in header
* implemented on CPU
* called the forward pass
* impl Metal kernel
* added Metal kernel
* added OP_PAD_REFLECT_1D in test-backend-ops.cpp
* add test-pad-reflect-1d test case
* test case support multiple backend
2024-12-05 13:27:31 +02:00
Diego Devesa
59f4db1088
ggml : add predefined list of CPU backend variants to build ( #10626 )
...
* ggml : add predefined list of CPU backend variants to build
* update CPU dockerfiles
2024-12-04 14:45:40 +01:00
Diego Devesa
2803540814
ggml-cpu : fix HWCAP2_I8MM value ( #10646 )
2024-12-04 14:40:44 +01:00
Concedo
557bcaf86e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .clang-tidy
# .github/workflows/build.yml
# Makefile
# Package.swift
# common/CMakeLists.txt
# examples/batched-bench/CMakeLists.txt
# examples/batched/CMakeLists.txt
# examples/convert-llama2c-to-ggml/CMakeLists.txt
# examples/cvector-generator/CMakeLists.txt
# examples/embedding/CMakeLists.txt
# examples/eval-callback/CMakeLists.txt
# examples/export-lora/CMakeLists.txt
# examples/gbnf-validator/CMakeLists.txt
# examples/gguf-split/CMakeLists.txt
# examples/gguf/CMakeLists.txt
# examples/gritlm/CMakeLists.txt
# examples/imatrix/CMakeLists.txt
# examples/infill/CMakeLists.txt
# examples/llama-bench/CMakeLists.txt
# examples/llava/CMakeLists.txt
# examples/lookahead/CMakeLists.txt
# examples/lookup/CMakeLists.txt
# examples/main-cmake-pkg/CMakeLists.txt
# examples/main/CMakeLists.txt
# examples/parallel/CMakeLists.txt
# examples/passkey/CMakeLists.txt
# examples/perplexity/CMakeLists.txt
# examples/quantize-stats/CMakeLists.txt
# examples/quantize/CMakeLists.txt
# examples/retrieval/CMakeLists.txt
# examples/run/CMakeLists.txt
# examples/save-load-state/CMakeLists.txt
# examples/server/CMakeLists.txt
# examples/simple-chat/CMakeLists.txt
# examples/simple/CMakeLists.txt
# examples/speculative-simple/CMakeLists.txt
# examples/speculative/CMakeLists.txt
# examples/tokenize/CMakeLists.txt
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-backend.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt
# pocs/vdot/CMakeLists.txt
# src/CMakeLists.txt
# src/unicode.cpp
# tests/test-sampling.cpp
2024-11-30 12:24:51 +08:00
Concedo
697ca70115
temp checkpoint
2024-11-30 12:13:20 +08:00
Concedo
ec95241e38
temp checkpoint
2024-11-30 11:59:27 +08:00
Diego Devesa
7cc2d2c889
ggml : move AMX to the CPU backend ( #10570 )
...
* ggml : move AMX to the CPU backend
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-29 21:54:58 +01:00
Georgi Gerganov
f0678c5ff4
ggml : fix I8MM Q4_1 scaling factor conversion ( #10562 )
...
ggml-ci
2024-11-29 16:25:39 +02:00
Georgi Gerganov
76b27d29c2
ggml : fix row condition for i8mm kernels ( #10561 )
...
ggml-ci
2024-11-28 14:56:37 +02:00
Shupei Fan
c202cef168
ggml-cpu: support IQ4_NL_4_4 by runtime repack ( #10541 )
...
* ggml-cpu: support IQ4_NL_4_4 by runtime repack
* ggml-cpu: add __ARM_FEATURE_DOTPROD guard
2024-11-28 13:52:03 +01:00
Concedo
ec581b19d8
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/011-bug-results.yml
# .github/ISSUE_TEMPLATE/019-bug-misc.yml
# .github/workflows/build.yml
# .github/workflows/docker.yml
# CMakeLists.txt
# Makefile
# Package.swift
# examples/CMakeLists.txt
# examples/eval-callback/CMakeLists.txt
# examples/llama-bench/llama-bench.cpp
# examples/server/README.md
# examples/server/server.cpp
# examples/simple-chat/simple-chat.cpp
# examples/simple/simple.cpp
# examples/speculative-simple/speculative-simple.cpp
# examples/speculative/speculative.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-amx/CMakeLists.txt
# ggml/src/ggml-blas/CMakeLists.txt
# ggml/src/ggml-cann/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-kompute/CMakeLists.txt
# ggml/src/ggml-kompute/ggml-kompute.cpp
# ggml/src/ggml-metal/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
# ggml/src/ggml-rpc/CMakeLists.txt
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-vulkan/CMakeLists.txt
# pocs/CMakeLists.txt
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-quantize-fns.cpp
2024-11-26 17:01:20 +08:00
Diego Devesa
5931c1f233
ggml : add support for dynamic loading of backends ( #10469 )
...
* ggml : add support for dynamic loading of backends
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-25 15:13:39 +01:00
Concedo
83350ec314
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/020-enhancement.yml
# .github/ISSUE_TEMPLATE/030-research.yml
# .github/ISSUE_TEMPLATE/040-refactor.yml
# .github/workflows/build.yml
# Makefile
# common/CMakeLists.txt
# examples/CMakeLists.txt
# examples/infill/infill.cpp
# examples/lookahead/lookahead.cpp
# examples/lookup/lookup-stats.cpp
# examples/lookup/lookup.cpp
# examples/parallel/parallel.cpp
# examples/retrieval/retrieval.cpp
# examples/save-load-state/save-load-state.cpp
# examples/speculative/speculative.cpp
# flake.lock
# ggml/src/ggml-cann/CMakeLists.txt
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/kernels/CMakeLists.txt
# ggml/src/ggml-cann/kernels/dup.cpp
# ggml/src/ggml-cann/kernels/get_row_f16.cpp
# ggml/src/ggml-cann/kernels/get_row_f32.cpp
# ggml/src/ggml-cann/kernels/get_row_q4_0.cpp
# tests/test-arg-parser.cpp
# tests/test-backend-ops.cpp
2024-11-25 16:26:08 +08:00
Diego Devesa
55ed008b2d
ggml : do not use ARM features not included in the build ( #10457 )
2024-11-23 14:41:12 +01:00
Concedo
091a432cf6
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/llama-cli-cann.Dockerfile
# .devops/llama-cli-cuda.Dockerfile
# .devops/llama-cli-intel.Dockerfile
# .devops/llama-cli-musa.Dockerfile
# .devops/llama-cli-vulkan.Dockerfile
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-intel.Dockerfile
# .devops/llama-server-musa.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .gitignore
# CMakeLists.txt
# Makefile
# cmake/llama-config.cmake.in
# docs/backend/SYCL.md
# docs/build.md
# examples/llama-bench/llama-bench.cpp
# flake.lock
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-backend.cpp
# ggml/src/ggml-blas/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/ggml-cpu.c
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-metal/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
# ggml/src/ggml-sycl/CMakeLists.txt
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
2024-11-21 16:26:24 +08:00
Concedo
282a647689
Merge commit ' 467576b6cc
' into concedo_experimental
...
# Conflicts:
# .gitignore
# Makefile
# README.md
# common/common.h
# docs/build.md
# examples/infill/infill.cpp
# examples/perplexity/perplexity.cpp
# examples/server/README.md
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.sh
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-opt.cpp
# tests/test-quantize-perf.cpp
2024-11-21 16:05:21 +08:00
FirstTimeEZ
a43178299c
ggml : fix undefined reference to 'getcpu' ( #10354 )
...
https://github.com/ggerganov/llama.cpp/issues/10352
2024-11-17 10:39:22 +02:00
Johannes Gäßler
8a43e940ab
ggml: new optimization interface (ggml/988)
2024-11-17 08:30:29 +02:00
Concedo
590553ef07
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/llama-cli-intel.Dockerfile
# .devops/llama-server-intel.Dockerfile
# .github/workflows/build.yml
# CMakePresets.json
# Makefile
# docs/backend/SYCL.md
# docs/build.md
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# scripts/compare-llama-bench.py
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
2024-11-16 17:20:14 +08:00
Concedo
70aee82552
attempts a backflip, but does he stick the landing?
2024-11-16 17:05:45 +08:00
Eve
18429220bd
AVX BF16 and single scale quant optimizations ( #10212 )
...
* use 128 bit loads (i've tried 256->128 to death and its slower)
* double accumulator
* avx bf16 vec dot
* +3% q4_0 inference
* +7% tg +5% pp compared to master
* slower f16c version, kep for reference
* 256b version, also slow. i tried :)
* revert f16
* faster with madd
* split to functions
* Q8_0 and IQ4_NL, 5-7% faster
* fix potential overflow (performance reduced)
* 16 bit add for q4_0 only
* merge
2024-11-15 12:47:58 +01:00
Charles Xu
1607a5e5b0
backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels ( #9921 )
...
* backend-cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2024-11-15 01:28:50 +01:00
Diego Devesa
ae8de6d50a
ggml : build backends as libraries ( #10256 )
...
* ggml : build backends as libraries
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2024-11-14 18:04:35 +01:00