hexagon: eliminate scalar VTCM loads via HVX splat helpers (#22993)

* hexagon: add hvx_vec_repl helpers and use them for the splat-from-vtcm use case

* hmx-mm: optimize per-group scale handling

* hmx-fa: optimize slope load from vtcm

* hmx-fa: use aligned access where possible in hmx-utils

* hexagon: add hvx_vec_repl_2x_f16 helper and consolidate repl helpers

---------

Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
Trivikram Reddy 2026-05-12 19:28:02 -05:00 committed by GitHub
parent a9883db8ee
commit 856c3adac1
GPG key ID: B5690EEEBB952194
6 changed files with 107 additions and 38 deletions

@@ -70,5 +70,5 @@ adb $adbserial $adbhost shell " \
 ./$branch/bin/llama-completion --no-mmap -m $basedir/../gguf/$model \
 --poll 1000 -t 6 --cpu-mask 0xfc --cpu-strict 1 \
 --ctx-size 8192 --ubatch-size 256 -fa on \
--ngl 99 -no-cnv --device $device $cli_opts $@ \
+-ngl 99 --device $device $cli_opts $@ \
 "