Concedo
c3c42f6e7f
updated lite
2026-04-19 23:40:29 +08:00
Concedo
a8290a072f
more robust json field handling
2026-04-19 23:27:19 +08:00
Concedo
271c4c332c
hack to allow kokoro to remain functional even with much higher GGML_SCHED_MAX_SPLIT_INPUTS
2026-04-19 20:40:07 +08:00
Concedo
707bb67b30
minimal uses 10% of budget
2026-04-19 20:19:45 +08:00
Concedo
afaf3b960e
try to make kokoro take less graph size
2026-04-19 19:00:35 +08:00
Concedo
2336c3e549
updated lite
2026-04-19 14:15:10 +08:00
Concedo
8f4eaedfd8
updated sdui
2026-04-19 13:24:41 +08:00
Concedo
71b4107bb6
fixed terminal logs
2026-04-19 11:31:12 +08:00
Concedo
8886e48a4a
cache sd info
2026-04-19 02:19:11 +08:00
Wagner Bruna
1be08b9d15
sd: report all sampler aliases and centralize name mapping (#2149)
...
* debug: allow loading backend libraries without normal arg parsing
This is just to be able to test backend functions directly, with e.g.:
>>> import koboldcpp
>>> koboldcpp.init_libraries()
>>> koboldcpp.sd_get_info()
* sd: report all sampler aliases and centralize name mapping
2026-04-19 01:51:42 +08:00
Concedo
e5eab545f3
handle override jinja template
2026-04-19 00:30:28 +08:00
Concedo
ff37b336a7
updated lite
2026-04-18 18:38:32 +08:00
Concedo
2962e5bac4
updated colab image models
2026-04-18 18:02:17 +08:00
Concedo
40827ab5b5
updated lite, improved reasoning budget
2026-04-18 17:37:47 +08:00
Concedo
17c754a5fc
improved reasoning budget
2026-04-18 17:19:09 +08:00
Concedo
78589974de
updated colab
2026-04-18 16:41:27 +08:00
Concedo
0b37cb9a57
added preliminary support for reasoning budget
2026-04-18 11:56:33 +08:00
Concedo
79882d669a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build-android.yml
# .github/workflows/build.yml
# .github/workflows/release.yml
# CMakeLists.txt
# CODEOWNERS
# common/CMakeLists.txt
# common/common.h
# docs/ops.md
# docs/ops/Metal.csv
# examples/batched/CMakeLists.txt
# examples/convert-llama2c-to-ggml/CMakeLists.txt
# examples/debug/CMakeLists.txt
# examples/diffusion/CMakeLists.txt
# examples/embedding/CMakeLists.txt
# examples/eval-callback/CMakeLists.txt
# examples/gen-docs/CMakeLists.txt
# examples/idle/CMakeLists.txt
# examples/lookahead/CMakeLists.txt
# examples/lookup/CMakeLists.txt
# examples/parallel/CMakeLists.txt
# examples/passkey/CMakeLists.txt
# examples/retrieval/CMakeLists.txt
# examples/save-load-state/CMakeLists.txt
# examples/speculative-simple/CMakeLists.txt
# examples/speculative/CMakeLists.txt
# examples/sycl/CMakeLists.txt
# examples/training/CMakeLists.txt
# ggml/src/ggml-hexagon/htp/hmx-matmul-ops.c
# ggml/src/ggml-hexagon/htp/htp-ops.h
# ggml/src/ggml-hexagon/htp/main.c
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-opencl/kernels/cvt.cl
# pocs/vdot/CMakeLists.txt
# src/CMakeLists.txt
# tests/CMakeLists.txt
# tests/test-quantize-stats.cpp
# tools/batched-bench/CMakeLists.txt
# tools/cli/CMakeLists.txt
# tools/cli/cli.cpp
# tools/completion/CMakeLists.txt
# tools/cvector-generator/CMakeLists.txt
# tools/cvector-generator/cvector-generator.cpp
# tools/export-lora/CMakeLists.txt
# tools/gguf-split/CMakeLists.txt
# tools/gguf-split/gguf-split.cpp
# tools/imatrix/CMakeLists.txt
# tools/llama-bench/CMakeLists.txt
# tools/llama-bench/llama-bench.cpp
# tools/mtmd/CMakeLists.txt
# tools/perplexity/CMakeLists.txt
# tools/quantize/CMakeLists.txt
# tools/quantize/quantize.cpp
# tools/results/CMakeLists.txt
# tools/server/CMakeLists.txt
# tools/tokenize/CMakeLists.txt
# tools/tts/CMakeLists.txt
2026-04-17 22:37:37 +08:00
Concedo
768527b031
Merge commit '1e796eb41f' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# .github/workflows/build-riscv.yml
# .github/workflows/build-vulkan.yml
# .github/workflows/build.yml
# docs/backend/SYCL.md
# docs/build.md
# docs/development/HOWTO-add-model.md
# embd_res/templates/Reka-Edge.jinja
# ggml/CMakeLists.txt
# ggml/src/ggml-rpc/CMakeLists.txt
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-sycl/convert.cpp
# ggml/src/ggml-sycl/dequantize.hpp
# ggml/src/ggml-sycl/dmmv.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# ggml/src/ggml-webgpu/wgsl-shaders/common_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/get_rows.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_decls.tmpl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_id.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_reg_tile.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_subgroup_matrix.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/mul_mat_vec.wgsl
# ggml/src/ggml-webgpu/wgsl-shaders/unary.wgsl
# tests/test-chat.cpp
# tools/rpc/README.md
2026-04-17 21:47:29 +08:00
Concedo
a089d6c59b
updated lite
2026-04-17 21:12:25 +08:00
Yuri Khrustalev
a279d0f0f4
ci : add android arm64 build and release (#21647)
...
* server: respect the ignore eos flag
* ci: add android arm64 build and release
* patch
* pin android-setup actions to v4
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* lf in the suggestion
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-04-17 11:32:24 +02:00
Concedo
9a38091207
support q5_1 kv
2026-04-17 17:06:15 +08:00
65a
268d61e178
mtmd: add missing struct tag (#22023)
2026-04-17 10:48:33 +02:00
Georgi Gerganov
6990e2f1f7
libs : rename libcommon -> libllama-common (#21936)
...
* cmake : allow libcommon to be shared
* cmake : rename libcommon to libllama-common
* cont : set -fPIC for httplib
* cont : export all symbols
* cont : fix build_info exports
* libs : add libllama-common-base
* log : add common_log_get_verbosity_thold()
2026-04-17 11:11:46 +03:00
Eric Zhang
fcc7508759
model : Gemma4 model type detection (#22027)
...
* model : Gemma4 model type detection
* model : Gemma4 model type detection
2026-04-17 10:07:11 +02:00
Concedo
e074939c17
compact context GUI page (+1 squashed commits)
...
Squashed commits:
[136f073ce] compact context GUI page
2026-04-17 14:40:53 +08:00
Concedo
cccb45a00a
summary outputs include processed amt
2026-04-17 14:22:51 +08:00
lhez
5e6c0e18b6
opencl: refactor q8_0 set_tensor and mul_mat host side dispatch for Adreno (#21938)
...
* opencl: refactor q8_0 gemm/gemv Adreno dispatch
* opencl: refactor q8_0 set_tensor
* opencl: fix whitespace
2026-04-16 22:28:33 -07:00
Concedo
64ce5fca15
better approach when SWA window exceeded, simply refill the window. this is not 100% correct but good enough for fastforward users. Disable FF or increase window if not good enough
2026-04-17 11:44:13 +08:00
Concedo
fa3f86ee70
added simplepod cloud template to readme
2026-04-17 10:58:44 +08:00
Concedo
aed18cc901
swa padding default to 0
2026-04-17 10:54:14 +08:00
Sigbjørn Skjæret
30dce2cf29
cli : use get_media_marker (#22017)
2026-04-17 00:12:31 +02:00
Xuan-Son Nguyen
089dd41fe3
cmake: use glob to collect src/models sources (#22005)
2026-04-16 23:25:16 +02:00
nullname
85dde8dc4a
hexagon: optimize HMX matmul operations (#21071)
...
* optimize hmx_mat_mul functions by calculating row and column tiles upfront
* refactor core_dot_chunk_fp16 to use size_t for tile counts and improve readability
* wip
* set scale outside of loop
* wip
* refactor core_mma_chunk_fp16 and mat_mul_qk_0_d16a32 to use size_t for tile counts
* wip
* wip
* refactor transfer_output_chunk_fp16_to_fp32 to use size_t for dimensions
* refactor core_dot_chunk_fp16 to use size_t for tile row stride calculation
* wip
* refactor hmx_mat_mul functions to use hvx_vec_splat_f16 for column scales initialization
* refactor hmx_mat_mul_permuted_w16a32_batched to streamline scale setting and locking
* refactor core_dot_chunk_fp16 to improve tile stride calculations for output
* refactor hmx_mat_mul functions to use Q6_V_vsplat_R for column scales initialization
* fix compiling error
* wip
* optimize row and column tile indexing in core_mma_chunk_fp16 function
* wip
* Revert "wip"
This reverts commit cde679eff79c4a28dd2d89d32f710015e09592b6.
* Add size limit check for HAP_mmap in htp_iface_mmap and drop_mmap functions
* wip
2026-04-16 13:48:34 -07:00
Xuan-Son Nguyen
4fbdabdc61
model: using single llm_build per arch (#21970)
...
* model: using single llm_build per arch
* fix merge
* nits
2026-04-16 21:10:22 +02:00
shaofeiqi
e45dbdece8
opencl: add q5_K gemm and gemv kernels for Adreno (#21595)
2026-04-16 12:08:33 -07:00
Pascal
4adac43f6f
server: tests: fetch random media marker via /apply-template (#21962) (#21980)
...
* server: tests: fetch random media marker via /apply-template (#21962 fix)
* server: allow pinning media marker via LLAMA_MEDIA_MARKER env var
get_media_marker() checks LLAMA_MEDIA_MARKER at first call and uses it
as-is if set, falling back to the random marker otherwise.
Tests no longer need to fetch the marker dynamically via /apply-template:
the fixture sets LLAMA_MEDIA_MARKER=<__media__> so the hardcoded prompts
work as before.
Address review feedback from ngxson
* server: make get_media_marker() thread-safe via magic statics
Use a C++11 static local with a lambda initializer instead of a global
static with an empty-check. The runtime guarantees initialization exactly
once without explicit locking.
Address review feedback from ggerganov
* nits
* nits
2026-04-16 20:46:21 +03:00
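The "magic statics" pattern described in the body of commit 4adac43f6f can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: only the names get_media_marker() and LLAMA_MEDIA_MARKER come from the commit message, while the return type, the fallback marker format, and the use of std::random_device are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdlib>
#include <random>
#include <string>

// Hypothetical stand-in: the name matches the commit message, but the body
// is an illustrative assumption, not the upstream implementation.
const std::string & get_media_marker() {
    // Since C++11, a function-local static is initialized exactly once,
    // even if several threads reach this line concurrently, so no explicit
    // lock or "is it empty yet?" check is needed.
    static const std::string marker = []() -> std::string {
        // Use the pinned value as-is when LLAMA_MEDIA_MARKER is set.
        const char * env = std::getenv("LLAMA_MEDIA_MARKER");
        if (env != nullptr) {
            return std::string(env);
        }
        // Otherwise fall back to a random marker (this format is made up).
        std::random_device rd;
        return "<__media_" + std::to_string(rd()) + "__>";
    }();
    return marker;
}
```

Because the lambda runs only during the first call, every later call returns the same string object, which is what lets a test fixture pin the marker through the environment before the server first asks for it.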
Concedo
b5e317e015
SWA fix attempt 2
2026-04-17 00:33:45 +08:00
PikaPikachu
9db77a020c
model : refactor QKV into common build_qkv and create_tensor_qkv helpers (#21245)
...
* model : refactor QKV into common build_qkv and create_tensor_qkv helpers
* model : extend build_qkv to bert/mpt/dbrx/olmo/lfm2/nemotron-h/granite-hybrid/gemma3n-iswa/t5-dec and fix wqkv_s
2026-04-16 17:41:34 +02:00
Concedo
ab2c596718
updated lite
2026-04-16 23:21:57 +08:00
Sigbjørn Skjæret
f772f6e434
model : support NVFP4 tensors for Gemma4 (#21971)
...
* support nvfp4 tensors for Gemma4
* add wo_s to build_attn
* add wo_s to build_attn
* fix glm4
2026-04-16 16:51:47 +02:00
Ruben Ortlam
b572d1ecd6
codeowners: add team member comments (#21714)
2026-04-16 13:13:11 +03:00
Anav Prasad
03b3d07798
Convert: Fix NemotronH Config Parsing (#21664)
...
* fix NemotronH vocab loading by using trust_remote_code for unsupported config patterns
* fix NemotronH tokenizer loading by overriding set_vocab with trust_remote_code
2026-04-16 13:11:45 +03:00
Aman Gupta
3f7c29d318
ggml: add graph_reused (#21764)
...
* ggml: add graph_reused
* use versioning instead of reuse flag
* increment version with atomic
* use top bits for split numbering
* add assert
* move counter to ggml.c
* set uid in split_graph only
* fix windows
* address further review comments
* get next_uid rather than doing bit manipulation
* rename + add comment about uid
2026-04-16 17:21:28 +08:00
Concedo
ae292c496e
handle SWA conflicting with rewind, increased default SWA padding.
2026-04-16 17:00:26 +08:00
Kusha Gharahi
ae2d34899e
metal: Implement ROLL op (#21946)
...
* nix: support unified apple-sdk
* Impl roll op for Metal
* Revert "nix: support unified apple-sdk"
This reverts commit abfa473360471532c547de8b202c780507924d4b.
* update ops.md
* update op docs
2026-04-16 11:54:37 +03:00
Concedo
0251c6dbde
added swa padding controls
2026-04-16 16:21:48 +08:00
rehan-10xengineer
1e796eb41f
ggml-cpu: add 128-bit RVV implementation for Quantization Vector Dot (#20633)
...
* ggml-cpu: add 128-bit impls for i-quants, ternary quants
* ggml-cpu: add 128-bit impls for iq2_xs, iq3_s, iq3_xxs, tq2_0
Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>
* ggml-cpu: refactor; add rvv checks
---------
Co-authored-by: taimur-10x <taimur.ahmad@10xengineers.ai>
Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>
2026-04-16 11:15:15 +03:00
rehan-10xengineer
5637536517
ggml : implemented simd_gemm kernel for riscv vector extension (#20627)
...
Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>
2026-04-16 11:14:26 +03:00
Yuannan
90fb96a7b3
devops : added spirv-headers to nix (#21965)
2026-04-16 11:12:52 +03:00