Concedo
db6db9dff9
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/close-issue.yml
# .github/workflows/server.yml
# AUTHORS
# CMakeLists.txt
# Makefile
# README.md
# cmake/llama.pc.in
# common/CMakeLists.txt
# docs/build.md
# examples/batched.swift/Sources/main.swift
# examples/llama.swiftui/llama.cpp.swift/LibLlama.swift
# examples/llava/CMakeLists.txt
# examples/llava/clip.h
# examples/run/run.cpp
# examples/server/README.md
# ggml/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
# scripts/sync-ggml.last
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
2025-02-07 00:52:31 +08:00
junchao-zhao
8d4d2be143
ggml : fix LoongArch compile error with 128-bit SIMD (#11701)
2025-02-06 11:20:00 +02:00
Concedo
96407502cd
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# examples/llama-bench/llama-bench.cpp
# examples/llama.android/llama/src/main/cpp/llama-android.cpp
# examples/llama.android/llama/src/main/java/android/llama/cpp/LLamaAndroid.kt
# src/llama-vocab.cpp
# tests/test-backend-ops.cpp
2025-01-17 23:13:50 +08:00
fj-y-saito
c67cc9837d
ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (#11227)
...
* Add SVE support for q4_K_q8_K
* Update ggml/src/ggml-cpu/ggml-cpu-quants.c
change to use K_SCALE_SIZE
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-16 11:11:49 +02:00
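A minimal sketch of the primitive behind this kernel, reconstructed from the commit title rather than the committed code: once the 4-bit q4_K weights are unpacked to int8 and the per-block scales (K_SCALE_SIZE above) are applied, the hot loop reduces to an int8 dot product, which SVE expresses with predicated loads and the widening svdot accumulator.

```c
#include <arm_sve.h>
#include <stdint.h>

// Illustrative only, not the actual ggml kernel: a plain int8 dot product
// with SVE. The real q4_K_q8_K kernel additionally unpacks 4-bit nibbles
// and folds in per-block scales.
int32_t dot_i8_sve(const int8_t *a, const int8_t *b, int32_t n) {
    svint32_t acc = svdup_n_s32(0);
    for (int32_t i = 0; i < n; i += (int32_t) svcntb()) {
        svbool_t pg = svwhilelt_b8_s32(i, n);   // predicate covers the tail
        svint8_t va = svld1_s8(pg, a + i);      // inactive lanes load as zero
        svint8_t vb = svld1_s8(pg, b + i);
        acc = svdot_s32(acc, va, vb);           // i8 x i8 -> i32, groups of 4
    }
    return (int32_t) svaddv_s32(svptrue_b32(), acc);  // horizontal reduce
}
```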
Concedo
911da8765f
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# examples/llama.android/llama/src/main/cpp/llama-android.cpp
# examples/run/run.cpp
# examples/server/README.md
# examples/server/bench/README.md
# examples/server/tests/README.md
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# tests/test-backend-ops.cpp
2025-01-03 11:56:20 +08:00
Srihari-mcw
0827b2c1da
ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027)
...
* Fixes for clang AVX VNNI
* enable AVX VNNI and Alder Lake build for MSVC
* Apply suggestions from code review
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-12-31 15:23:33 +01:00
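For context on the compiler mismatch this kind of fix addresses, a hedged sketch (the guard macros are my assumptions, not ggml's exact ones): GCC and Clang expose the VEX-encoded AVX-VNNI instruction as _mm256_dpbusd_avx_epi32 behind -mavxvnni, while MSVC uses the _mm256_dpbusd_epi32 spelling, so portable code dispatches on the compiler.

```c
#include <immintrin.h>

// One VNNI step of a u8 x s8 dot product:
// acc.dword[k] += sum of four adjacent u8*s8 byte products.
static inline __m256i dpbusd(__m256i acc, __m256i u, __m256i s) {
#if defined(__AVXVNNI__)       // GCC/Clang: VEX form carries the _avx_ suffix
    return _mm256_dpbusd_avx_epi32(acc, u, s);
#elif defined(_MSC_VER)        // MSVC: one spelling for both encodings
    return _mm256_dpbusd_epi32(acc, u, s);
#else                          // fallback: widen to i16 pairs, then madd to i32
    __m256i p16 = _mm256_maddubs_epi16(u, s);  // saturating u8*s8 pair sums
    return _mm256_add_epi32(acc, _mm256_madd_epi16(p16, _mm256_set1_epi16(1)));
#endif
}
```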
Concedo
4548d893ee
better way to handle termux compatibility (+2 squashed commit)
...
Squashed commit:
[301986f11] better way to handle termux compatibility
[16b03b225] updated lite
2024-12-11 15:05:01 +08:00
Concedo
a11bba5893
cleanup, fix native build for arm (+28 squashed commit)
...
Squashed commit:
[d1f6a4154] bundle library
[947ab84b7] undo
[0f9aba8d8] test
[e9ac93873] test
[920438202] test
[1c6d98804] Revert "quick test"
This reverts commit acf8ec8940.
[acf8ec894] quick test
[6a9937233] undo
[5a263a5bd] test
[ddfd82bca] test
[0b30e45da] test
[c3bfece55] messed up
[2a4b37fe0] Revert "test"
This reverts commit 80a1fcaeaf.
[80a1fcaea] test
[e2aa7d944] test
[264d80200] test
[f5b123173] undo
[1ffacc484] test
[63c0be926] undo
[510e0377e] ofast try fix
[4ac199b20] try fix sigill
[1bc987ba2] try fix illegal instruction
[7697252b1] edit
[f87087b28] check gcc ver
[e9dfe2cef] try using qemu to do the pyinstaller
[b411192db] revert
[25b5301e5] try using qemu to do the pyinstaller
[58038cddc] try using qemu to do the pyinstaller
2024-12-10 19:42:23 +08:00
Concedo
697ca70115
temp checkpoint
2024-11-30 12:13:20 +08:00
Concedo
ec95241e38
temp checkpoint
2024-11-30 11:59:27 +08:00
Georgi Gerganov
f0678c5ff4
ggml : fix I8MM Q4_1 scaling factor conversion (#10562)
...
ggml-ci
2024-11-29 16:25:39 +02:00
Georgi Gerganov
76b27d29c2
ggml : fix row condition for i8mm kernels (#10561)
...
ggml-ci
2024-11-28 14:56:37 +02:00
Concedo
590553ef07
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/llama-cli-intel.Dockerfile
# .devops/llama-server-intel.Dockerfile
# .github/workflows/build.yml
# CMakePresets.json
# Makefile
# docs/backend/SYCL.md
# docs/build.md
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# scripts/compare-llama-bench.py
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
2024-11-16 17:20:14 +08:00
Concedo
70aee82552
attempts a backflip, but does he stick the landing?
2024-11-16 17:05:45 +08:00
Eve
18429220bd
AVX BF16 and single scale quant optimizations (#10212)
...
* use 128-bit loads (I've tried 256->128 to death and it's slower)
* double accumulator
* avx bf16 vec dot
* +3% q4_0 inference
* +7% tg +5% pp compared to master
* slower F16C version, kept for reference
* 256-bit version, also slow. I tried :)
* revert f16
* faster with madd
* split to functions
* Q8_0 and IQ4_NL, 5-7% faster
* fix potential overflow (performance reduced)
* 16 bit add for q4_0 only
* merge
2024-11-15 12:47:58 +01:00
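A rough sketch of the bf16 trick this commit exploits (my reconstruction from the bullet list above, not the committed kernel): bf16 is the top half of an f32, so widening is a zero-extend plus a 16-bit left shift, and the dot product accumulates with FMA after 128-bit loads.

```c
#include <immintrin.h>
#include <stdint.h>

// Widen 8 bf16 values to f32: zero-extend u16 -> u32, then shift the
// payload into the high half of each 32-bit lane.
static inline __m256 bf16_to_f32(__m128i v) {
    return _mm256_castsi256_ps(_mm256_slli_epi32(_mm256_cvtepu16_epi32(v), 16));
}

// Illustrative bf16 dot product with 128-bit loads and FMA accumulation.
// Assumes n is a multiple of 8; a real kernel handles the tail separately.
float dot_bf16(const uint16_t *a, const uint16_t *b, int n) {
    __m256 acc = _mm256_setzero_ps();
    for (int i = 0; i < n; i += 8) {
        __m256 va = bf16_to_f32(_mm_loadu_si128((const __m128i *)(a + i)));
        __m256 vb = bf16_to_f32(_mm_loadu_si128((const __m128i *)(b + i)));
        acc = _mm256_fmadd_ps(va, vb, acc);
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3]
         + lanes[4] + lanes[5] + lanes[6] + lanes[7];
}
```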
Diego Devesa
ae8de6d50a
ggml : build backends as libraries (#10256)
...
* ggml : build backends as libraries
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2024-11-14 18:04:35 +01:00
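To make the refactor concrete, a hedged sketch of what "backends as libraries" means for a consumer (header names and call sites are my assumptions about the post-refactor layout, not taken from the commit): each backend ships as its own library, and an application links only the ones it needs, driving them through the common ggml-backend interface.

```c
#include "ggml-backend.h"   // common backend interface
#include "ggml-cpu.h"       // CPU backend, now its own library (assumed header)

int main(void) {
    // initialize the CPU backend through the shared interface
    ggml_backend_t backend = ggml_backend_cpu_init();
    // ... build a ggml graph, then run it with
    // ggml_backend_graph_compute(backend, graph); ...
    ggml_backend_free(backend);
    return 0;
}
```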