mirror of
https://github.com/LostRuins/koboldcpp.git
synced 2026-05-02 21:50:31 +00:00
8 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
40be51152d
|
ggml-zdnn: fix #15414, activate FP16 and BF16 acceleration and incorrect zTensor free (#15839)
Some checks failed
Update Operations Documentation / update-ops-docs (push) Has been cancelled
|
||
|
|
186415d595
|
ggml-cpu: drop support for nnpa intrinsics (#15821) | ||
|
|
ad5c975c2d
|
ggml-cpu: Support Q5_0 and Q5_1 on s390x (#15486)
* ggml-cpu: initial q5_0 impl for s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: updated q5_0 code for better performance Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: use optimised hsum for better performance Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: introduce q5_1 simd + refactor q5_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix incorrect return type vec_hsum Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: q5_0 incomplete refactor + table_b2b_0 activation Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: refactor q5_1 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: q5_1 update loop unroll to 4 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: update q5_0 unroll to 4 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: update build-s390x docs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: update unused variables q5_0 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: update the last update date Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> |
||
|
|
ff27f80a74
|
ggml: initial IBM zDNN backend (#14975)
* ggml-zdnn: inital backend impl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: temp change z17 to arch15 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: fix build bugs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: tensor->extra logging check Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: add layout name mapping, ztensor information Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: separate logging into its own line Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: add shape comparison Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: add ggml_tensor shape log Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: fix incorrect shape logging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add output buffer check Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: run compute and store into tensor->extra Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add set_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add more loggers Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: update set_tensor logging to check only for matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: last working matmul version Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add comments to prevent accidentally deleting lines Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: support op out_prod Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: update op out_prod to use tensor->extra Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: rewrite the backend implementation Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: bugfix new impl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix compiler warnings and bugfixes Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: test ztensor finding in init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: implement at least 1 op to test Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: assign tensor->extra to buffer Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add check for view tensors to prevent init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: rework init_tensor to create new buffers Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: switch to std vector instead of array Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: switch buffers back and set to arbitrary number Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: impl init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: update supports_op matmul matrix Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix incorrect ztensor shape, reduce memory padding Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: code clean up Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: impl matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix compiler error missing type Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix missing data transform call Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add bias init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: tighten memory usage, change string allocation Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add bias ztensor and data free Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add bias data transform Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add more debug info for extra buffer transform Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add logger to check if mat mul ops go through set_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: activate bias transform in matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: move weights transform into mulmat Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add more safeguards in matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix sequencing of transforms Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: bugfix transform ztensor vs origtensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: figure out why sigtrap is happening Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix sigsegv Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: move everything back to local declaration Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: move bias data to local also Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: bring back working matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: rewrite into mre Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix missing vector import Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix missing vector import in header Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt to fix sigsegv Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix missing load tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix invalid ztensor buffer release Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add logging to debug free buffer Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: remove free_buffer debug info Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add parmblkformat detections Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add nnpa installed detection Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add zdnn_init call for static libs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt at fixing invalid buffer Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: switch to using deque to fix pointer deref problem Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add weights logging to check Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt to use unique ptr Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add tensor to pre_tfm_desc logging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add inputs logging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: disable op_none initialisation for testing Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix missing return from init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: load ztensors in cgraph exec Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: work on moving output ztensor as well Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: disable logging and breakpoints for full test Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt at manually changing the layout Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt at using default nwhc format instead Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: disable global load ztensor for now Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix errorenous output load tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add guards to prevent loading ztensor if transformed Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: code cleanup Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: bring load ztensor back to init routine Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: code clean up Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix ztensor deallocation abort stabilise ggml <-> zdnn api Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: clean up matmul selection Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: clean up project structure Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: update documentation, prepare for upstream Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * chore: add codeowners Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: disable batched matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt at fixing tensor views during matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: deny all view tensors directly Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix pr comments Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: update ops docs for zdnn Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: redo test-backend-ops for ops.md Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix typo in build-s390x.md Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * codeowners: remove taronaeo for now Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "codeowners: remove taronaeo for now" This reverts commit 411ea4ed78d08778967bd0bd33a6538cfcbe082f. * ggml-zdnn: remove unused ggml_zdnn macro Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> |
||
|
|
c7f3169cd5
|
ggml-cpu : disable GGML_NNPA by default due to instability (#14880)
* docs: update s390x document for sentencepiece Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit e086c5e3a7ab3463d8e0906efcfa39352db0a48d) * docs: update huggingface links + reword Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit 8410b085ea8c46e22be38266147a1e94757ef108) * ggml-cpu: disable ggml-nnpa compile flag by default fixes #14877 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit 412f4c7c88894b8f55846b4719c76892a23cfe09) * docs: update s390x build docs to reflect nnpa disable Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit c1eeae1d0c2edc74ab9fbeff2707b0d357cf0b4d) --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> |
||
|
|
bf5bcd0b85
|
docs: update s390x documentation + add faq (#14389)
* docs: update s390x documentation + add faq Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: add s390x z17 build q&a Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> |
||
|
|
60ef23d6c1
|
ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317)
* ggml-cpu: add nnpa compile flag
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 4a9f60c201573128f73a65999b3e5cc497fae5c1)
* ggml-cpu: add fp16->fp32 nnpa first
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 8d4a7987f9c1887f716be96250f2caeee0253929)
* ggml-cpu: add fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 0ff0d6516247a41d2ade42b42cf0d676a4dd1627)
* ggml-cpu: better variable names
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 2f58bbcbb89c183340e252362b2a40651f573f1f)
* docs: update s390x docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 01b929491b50071a5d0572235dcf5a449da70aa7)
* ggml-cpu: add debugging prints to see if dlf16 is correct
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix print vs printf
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix float placeholder
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: ensure fp16 and fp32 load and stores are called
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fp16 load ensured to hit
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove sigint from fp16 store
for some reason, the function is not getting a hit when debugged with
gdb. we will need to investigate further
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: nnpa switch to vec_xst test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to vec_xst for 4 element loops also
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: rework noop
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove noop, general code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: clarify variable naming
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add breakpoint for debugging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: test fix for conversion failure
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: disable fp32->fp16 nnpa conversions for now
there are some conversion failures in nnpa that requires the eyes of an
ibm stsm. will create a separate pr to introduce the fp32->fp16 change.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to elif macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: reattempt fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix typo
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: reattempt fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix compiler types
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: change to typedef vector types
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add 4 element loops for fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: clarified vector naming
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back fp32->fp16 store nnpa
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: activate nnpa fp32->fp16 or fp16->fp32 compute
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add nnpa macro check in ggml-impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add missing __func__
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: diagnose why __NNPA__ macro is not being defined
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: import vecintrin.h to fix compiler errors
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: update macro tests
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move s390x typedef to own header file
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: move s390x typedef to own header file"
This reverts commit 157f856c34589566151630e294563a420702db39.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to importing ggml-cpu-impl instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix macro declaration
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: test more macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add debug prints
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bruteforce macro definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move macro definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add ggml-impl.h to cmakelists
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to private macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move s390x typedef to own header file
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 157f856c34589566151630e294563a420702db39)
* ggml-cpu: move things around
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back compile macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to quotes for import
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add compiler error macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add s390x detection in ggml-src
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back compile definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: undo cmakelists work
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: move s390x typedef to own header file"
This reverts commit 18d79e1a30b39d9aaa0bd58400c5cf2c32135c9a.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove typedefs.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove typedef from cmakelists
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add ggml-impl.h future notes
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add todo comment for future reference
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: clarify naming of dlf16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove unnecessary target compile definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move nnpa fp16->fp32 and fp32->fp16 to simd-mappings
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: update broken huggingface link for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix duplicate func names during compile
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: fix duplicate func names during compile"
This reverts commit fbb733451f27677063b914d4f6c9a9841d45b38d.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu"
This reverts commit bd288e8fa52b5244f65cee21cb61062f1a9e0ca5.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: refactor fp16<->fp32 simd to ggml-cpu
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix missing simd-mappings.h import in quants.c
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix missing simd-mappings.h within repack
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix amx mmq missing simd-mappings.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: attempt at fixing loongarch failing build
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move nnpa together with other fp16<->fp32 simd
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix wrong refactor of ggml-base
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164176555
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: remove dependency on ggml-cpu from ggml-base
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: rename all fp16<->fp32 macros to prefix with ggml_cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164449406
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove mistaken fallback macro
fallback logic was already implemented but i was too sleepy to realise
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: move ggml_table_f32_f16 to ggml-cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures"
This reverts commit 32a3533564bdb7902cefb9c89b1c9e956a81ce29.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml: move ggml_table_f32_f16 to ggml-cpu"
This reverts commit 9e40d984ad27d7b60392fb2b7548885201864fe4.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: move ggml_table_f32_f16 to ggml-cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 9e40d984ad27d7b60392fb2b7548885201864fe4)
* ggml: move ggml_table_f32_f16 to ggml-cpu.c
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: extern c ggml_table_f32_f16 + chore docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h
we rely on the variable declaration in ggml-cpu.c instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h"
This reverts commit f71b21d2f74f5e03ec0c2b4fefd3cbf395aecf16.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back ggml_table_f32_f16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: bring back ggml_table_f32_f16"
This reverts commit 2dce119178bed5ef5c8398c4230ddd14fef80e49.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* fix ggml time initialization
* fix f32_f16 table init
* remove extra line
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: slaren <slarengh@gmail.com>
|
||
|
|
8d94713654
|
docs: add s390x build documentation (#14264)
* docs: add s390x-specific build docs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: add s390x model conversion steps Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: s390x build indent Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: update hyperlinks for s390x docs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: update llama.h docs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: s390x add accelerator and perf optimizations Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: s390x indent blocks Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: revert block indentation Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: add support information for s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: s390x reword Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: remove indentation for accelerator section s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: remove redundant words s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: reword for s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: s390x reword simd Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: fix trailing whitespace for s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> |