Wagner Bruna
86a094c559
fix autofit_tax_mb type error (#1897)
2025-12-23 11:31:09 +08:00
Concedo
62e6956def
wider launch button
2025-12-22 22:34:54 +08:00
Concedo
8b184dd638
corrupt scaler fix test
2025-12-22 22:24:10 +08:00
Concedo
a14fb971b9
template saving fix
2025-12-22 22:13:58 +08:00
Concedo
7fad4dc0ad
fixed ordering of gpu overhead detection
2025-12-22 17:39:05 +08:00
Concedo
7c82cad72c
support ovis, added taehv wan embed, fixed compile error (+1 squashed commits)
...
Squashed commits:
[ab71f6d33] support ovis, added taehv wan embed
2025-12-22 17:08:09 +08:00
Wagner Bruna
44ce1a80b3
sd: sync to master-431-23fce0b (#1893)
...
* sd: sync to master-427-78e15bd
* add kl_optimal to the available schedulers list
* more robust workaround to avoid stb linkage issues
* sd: sync to master-431-23fce0b
* add TAEHV support and disable TAE if the model isn't found
2025-12-22 15:07:09 +08:00
Concedo
27c53099f4
adjust scaler checks
2025-12-22 11:50:15 +08:00
Concedo
a0e4b8c18a
text for maingpu
2025-12-22 11:07:18 +08:00
Concedo
db4634b9a4
testing new workaround for corrupt scaling
2025-12-21 22:54:40 +08:00
Concedo
4b899b19dc
fixed save state a bit better
2025-12-21 22:24:13 +08:00
Concedo
b51e3592ba
revert all tk experiments
2025-12-21 21:10:36 +08:00
Concedo
80a0269dbe
improve snapshotting for rnn
2025-12-21 21:07:31 +08:00
Concedo
d577187875
update sdui
2025-12-21 20:35:19 +08:00
Concedo
b4bdc26d64
try tk 8.6.13 instead
2025-12-21 16:17:03 +08:00
Concedo
6128a91d5a
trying something else (+1 squashed commits)
...
Squashed commits:
[bf497e5cf] trying something else
2025-12-21 15:38:07 +08:00
Concedo
fedd529fdc
autofit counts overheads
2025-12-21 14:31:08 +08:00
Concedo
edfc961ff8
transplanted tk
2025-12-21 13:33:45 +08:00
Concedo
8b066d9765
don't crash workgroup size
2025-12-21 13:22:34 +08:00
Concedo
0c7e1d91ea
try a transplanted tk (+1 squashed commits)
...
Squashed commits:
[1eb87e4d1] try a transplanted tk (+1 squashed commits)
Squashed commits:
[094d1566a] try a transplanted tk
2025-12-21 11:31:32 +08:00
Concedo
d69db26b44
fix stb multiple impl
2025-12-20 12:05:50 +08:00
Concedo
17b4b888d0
revert changes for now, we'll do it again next time
2025-12-20 11:02:34 +08:00
Concedo
c406b9f33e
another font check (+1 squashed commits)
...
Squashed commits:
[6da9493ec] another font check
2025-12-20 09:49:29 +08:00
Concedo
7304640f72
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/release.yml
# docs/android.md
# docs/backend/hexagon/CMakeUserPresets.json
# examples/llama.android/app/src/main/res/layout/activity_main.xml
# examples/llama.android/app/src/main/res/layout/item_message_assistant.xml
# examples/llama.android/app/src/main/res/layout/item_message_user.xml
# examples/model-conversion/scripts/causal/run-org-model.py
# examples/model-conversion/scripts/utils/common.py
# ggml/CMakeLists.txt
# ggml/src/ggml-hexagon/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/matmul-ops.c
# tests/test-arg-parser.cpp
# tools/server/README.md
2025-12-20 09:32:06 +08:00
Concedo
714ab0682e
Revert "Revert "llama : Async DirectIO model loading on Linux (#18012)""
...
This reverts commit a45fc5ee88.
2025-12-20 09:25:10 +08:00
Concedo
710d88687b
try a more modern way of fixing font since xft is dead
2025-12-20 09:24:30 +08:00
Sigbjørn Skjæret
74e05131e9
ci : remove non-windows zip artifacts (#18201)
...
* remove non-windows zip artifacts
* add cuda dll links
2025-12-19 22:29:46 +01:00
Sigbjørn Skjæret
f74747d886
ci : only save ccache on master (#18207)
2025-12-19 22:29:37 +01:00
Alfred
ce734a8a2f
ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations (#17977)
...
* feat: implement real Q8_0
* feat: adding cmake option for configuring FP32 quantize group size
* typo: set() shall be used
---------
Co-authored-by: ngdxzy <zhenyu_xu@uri.edu>
2025-12-19 09:42:28 -08:00
Pascal
14931a826e
arg: fix order to use short form before long form (#18196)
...
* arg: fix order to use short form before long form
* arg: update doc
* arg: update test-arg-parser
* arg: address review feedback from ngxson
simplified to check first.length() <= last.length() only
fixed: --sampler-seq, --rerank, --draft ordering
note: middle positions in 3+ arg sets are not verified
* arg: update doc
2025-12-19 18:01:56 +01:00
Concedo
9458e08346
fixed https://github.com/LostRuins/koboldcpp/issues/1892
2025-12-19 22:52:39 +08:00
Julius Tischbein
f99ef53d2a
llama : Changing off_t to size_t for Windows (#18204)
2025-12-19 16:42:46 +02:00
Concedo
9ea153c14c
try a more modern way of fixing font since xft is dead
2025-12-19 21:50:17 +08:00
Concedo
9ea6a3fa62
add download page
2025-12-19 19:59:12 +08:00
Aman Gupta
cc0a04343e
server: friendlier error msg when ctx < input (#18174)
...
* llama-server: friendlier error msg when ctx < input
This PR adds formatted strings to the server's send_error function
* llama-server: use string_format inline
* fix test
2025-12-19 12:10:00 +01:00
Xuan-Son Nguyen
98c1c7a7bf
presets: refactor, allow cascade presets from different sources, add global section (#18169)
...
* presets: refactor, allow cascade presets from different sources
* update docs
* fix neg arg handling
* fix empty mmproj
* also filter out server-controlled args before to_ini()
* skip loading custom_models if not specified
* fix unset_reserved_args
* fix crash on windows
2025-12-19 12:08:20 +01:00
Concedo
a45fc5ee88
Revert "llama : Async DirectIO model loading on Linux (#18012)"
...
This reverts commit 4d4f4cacd1.
2025-12-19 19:06:30 +08:00
Aleksander Grygier
acb73d8340
webui: Add editing attachments in user messages (#18147)
...
* feat: Enable editing attachments in user messages
* feat: Improvements for data handling & UI
* docs: Update Architecture diagrams
* chore: update webui build output
* refactor: Exports
* chore: update webui build output
* feat: Add handling paste for Chat Message Edit Form
* chore: update webui build output
* refactor: Cleanup
* chore: update webui build output
2025-12-19 11:14:07 +01:00
Concedo
2e57e5ead4
rename eval function
2025-12-19 17:54:23 +08:00
Concedo
e9ae0cb2dd
added support for RNN models in smartcache
2025-12-19 16:36:25 +08:00
Daniel Bevenius
0a271d82b4
model-conversion : add verbose flag in run-org-model.py (#18194)
...
This commit adds a --verbose flag to the run-org-model.py script to
enable or disable detailed debug output, such as input and output
tensors for each layer. Debug utilities (summarize, debug_hook,
setup_rope_debug) have been moved to utils/common.py.
The motivation for this is that the detailed debug output can be useful
for diagnosing issues with model conversion or execution, but it can
also produce a large amount of output that may not always be needed.
The script will also be further cleaned/refactored in follow-up commits.
2025-12-19 08:43:16 +01:00
Naco Siren
52fc7fee8a
android: fix missing screenshots for Android.md (#18156)
...
* Android basic sample app layout polish
* Add missing screenshots and polish android README doc
* Replace file blobs with URLs served by GitHub pages service.
2025-12-19 09:32:04 +02:00
Jeff Bolz
cdbada8d10
vulkan: Add perf logger mode with concurrency (#17944)
...
This implements a variation of the perf logger where rather than timing each
operation individually with effectively a barrier in between, we put the
timing boundaries where we already synchronize and time the groups of work
that normally overlap. This can be useful to help understand whether
individual operations need to be optimized, or if the group is already running
efficiently.
GGML_VK_PERF_LOGGER_CONCURRENT=1 enables the new mode (when
GGML_VK_PERF_LOGGER is also set).
GGML_VK_SYNC_LOGGER=1 replaces the ENABLE_SYNC_LOGGING compile time switch.
2025-12-19 06:36:46 +01:00
Concedo
cde4791e36
fix tools building
2025-12-19 12:08:29 +08:00
Concedo
51b1d12914
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# tests/test-backend-ops.cpp
# tools/mtmd/CMakeLists.txt
2025-12-19 11:11:19 +08:00
Concedo
fef2ea46fd
Merge remote-tracking branch 'jeff/im2col_wglimit' into concedo_experimental
...
# Conflicts:
# tests/test-backend-ops.cpp
2025-12-19 11:01:47 +08:00
Concedo
58eb5573de
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/act-ops.c
# ggml/src/ggml-hexagon/htp/hvx-utils.c
# ggml/src/ggml-hexagon/htp/main.c
# src/llama-model.cpp
# tools/server/README.md
2025-12-19 11:00:43 +08:00
Xuan-Son Nguyen
8ea958d4d9
model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106)
...
* ASR with LFM2-Audio-1.5B
* Set rope_theta
* Fix comment
* Remove rope_theta setting
* Address PR feedback
* rename functions to conformer
* remove some redundant ggml_cont
* fix missing tensor
* add prefix "a." for conv tensors
* remove redundant reshape
* clean up
* add test model
---------
Co-authored-by: Tarek Dakhran <tarek@liquid.ai>
2025-12-19 00:18:01 +01:00
Jeff Bolz
442723c946
vulkan: fix im2col overflowing maxworkgroupcount
2025-12-18 12:16:41 -06:00
Concedo
e005fc2587
Merge commit '8dcc3662a2' into concedo_experimental
...
Keep changes from https://github.com/ggml-org/llama.cpp/pull/18096 without https://github.com/ggml-org/llama.cpp/pull/14904
Reason is to maintain compatibility with 2023 w64devkit
# Conflicts:
# .github/ISSUE_TEMPLATE/019-bug-misc.yml
# examples/model-conversion/scripts/causal/run-org-model.py
# examples/speculative/speculative.cpp
# ggml/src/ggml-cpu/arch-fallback.h
# ggml/src/ggml-cpu/repack.cpp
# ggml/src/ggml-cpu/repack.h
# ggml/src/ggml-hexagon/ggml-hexagon.cpp
# ggml/src/ggml-hexagon/htp/act-ops.c
# ggml/src/ggml-hexagon/htp/htp-msg.h
# ggml/src/ggml-hexagon/htp/hvx-utils.c
# ggml/src/ggml-hexagon/htp/hvx-utils.h
# ggml/src/ggml-hexagon/htp/main.c
2025-12-19 02:11:55 +08:00