henk717
4e30294cb1
Henk's Gemma4 31B Magic (#2096)
2026-04-06 18:49:19 +08:00
Concedo
6c937c05d9
improve ncmoe / moecpu regex
2026-04-04 23:53:13 +08:00
Concedo
db8bc40731
add some warnings if shifting fails
2026-04-04 23:16:26 +08:00
Concedo
eb3422996a
BOS fix for gemma4
2026-04-04 22:15:01 +08:00
Concedo
97f785efce
ensure BOS on vision prefix
2026-04-03 16:20:36 +08:00
Concedo
e8cffa37c8
fixed gemma4v image crashing on encode, however images are not yet working correctly
2026-04-03 15:56:35 +08:00
Concedo
0c2b679ea3
support bf16 quantkv cache type
2026-03-28 00:01:17 +08:00
Concedo
c91f350ed5
increase max images, take images from the end instead of beginning if too many images
2026-03-26 23:03:52 +08:00
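The selection rule in this commit keeps the most recent images when a request carries too many. A minimal sketch follows. It is hypothetical and not koboldcpp's code: `select_images` and `max_images` are illustrative names, and the actual image limit is not stated in this log.

```python
def select_images(images: list, max_images: int) -> list:
    """Hypothetical sketch: keep at most max_images, dropping the oldest first."""
    if max_images <= 0:
        # Guard explicitly: images[-0:] would return the whole list, not an empty one.
        return []
    # Slicing from the end keeps the last (most recent) images;
    # if there are fewer than max_images, the whole list is returned.
    return list(images[-max_images:])
```

For example, `select_images(["img1", "img2", "img3", "img4"], 2)` returns `["img3", "img4"]`. Taking from the end favours the images closest to the latest turn of the conversation.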
Concedo
993925ba96
gracefully handle bad grammar instead of crashing
2026-03-23 17:00:53 +08:00
Concedo
07327b6c10
double n_batch size when pipeline parallel is enabled, keep u_batch the same
2026-03-21 11:22:10 +08:00
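In llama.cpp-style runtimes, `n_batch` is the logical batch size and `u_batch` (`n_ubatch`) is the physical micro-batch submitted per compute step. Pipeline parallelism overlaps micro-batches across devices, so it only helps when one logical batch spans several of them.

A minimal sketch of the rule in this commit follows. It assumes a plain doubling, and the helper name and return shape are illustrative only.

```python
import math


def plan_batches(n_batch: int, u_batch: int, pipeline_parallel: bool) -> tuple[int, int, int]:
    """Hypothetical sketch: return (n_batch, u_batch, micro-batches per full batch)."""
    if pipeline_parallel:
        # Double only the logical batch; the physical micro-batch stays the same,
        # so each full batch yields more micro-batches to overlap across devices.
        n_batch *= 2
    micro_batches = math.ceil(n_batch / u_batch)
    return n_batch, u_batch, micro_batches
```

With `n_batch=512` and `u_batch=512`, enabling pipeline parallelism gives `(1024, 512, 2)` instead of `(512, 512, 1)`. Keeping `u_batch` fixed leaves per-step memory use unchanged while giving the pipeline two micro-batches to overlap.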
Concedo
3113e3a643
move main device print
2026-03-21 10:47:21 +08:00
Concedo
f579939057
updated lite, change smartcache snapshot behavior to conserve slots
2026-03-15 15:15:39 +08:00
Concedo
fcdf2f40d5
no need to snapshot after gen is complete.
2026-03-15 12:34:48 +08:00
Concedo
33211e6edf
timing measure fixes
2026-03-15 12:23:14 +08:00
Concedo
500a1ab466
disable smartcache if slots is zero
2026-03-10 08:57:31 +08:00
Concedo
0df18d2ae2
fixed single token bans
2026-03-07 22:50:53 +08:00
JustCommitRandomness
2fbc3b2ae5
Adjust int types in format strings (#2009)
* tweak format string types
This may not be all of them, but these are the ones that warn on OpenBSD
* complete the changes needed to fix the format string specifiers
* avoid using inttypes, directly cast to size_t (u64 usually) instead
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2026-03-06 19:06:18 +08:00
Concedo
e36d7b6464
warn about RNN models not supporting antislop
2026-03-06 14:02:51 +08:00
Concedo
4f1b22c415
kv snapshots save and load last logits for correctness. added some text for musicui, updated docs
2026-03-04 21:57:28 +08:00
Concedo
54cf43ae64
rnn fix adjust
2026-03-04 10:59:51 +08:00
Concedo
707f7b37bf
optimize pp
2026-03-03 21:02:51 +08:00
Concedo
ae67caa2f7
ace qwen rep pen for codes
2026-03-02 21:18:06 +08:00
Concedo
d904b51b0f
adjust slot counts
2026-03-02 15:56:15 +08:00
Concedo
42134db6b4
finally fixed smartcache for qwen
2026-03-02 00:47:38 +08:00
Concedo
0b76f73fc2
smartcache bug seems to be fixed
2026-02-28 18:08:54 +08:00
Concedo
dd08d675f2
incomplete fix for rnn models, load state works but logits slightly different
2026-02-28 11:52:24 +08:00
Concedo
72f7e01b27
Merge commit '01d8eaa28d' into concedo_experimental
# Conflicts:
# build-xcframework.sh
# scripts/sync_vendor.py
# tests/test-backend-ops.cpp
# tools/mtmd/CMakeLists.txt
# tools/rpc/rpc-server.cpp
2026-02-16 15:36:59 +08:00
Concedo
9258d91b70
try to init rocblas before autofit
2026-02-11 22:29:47 +08:00
Concedo
e6d271db05
fixed typo
2026-02-09 17:12:03 +08:00
Concedo
3d3f02ef4a
revert layers if fail
2026-02-08 12:58:05 +08:00
Concedo
6bfbb5b283
allow autofit logspam in debugmode
2026-02-08 01:00:41 +08:00
Concedo
5cf21443bc
added autofit padding. autofit is now in the quick menu
2026-02-07 18:29:30 +08:00
Concedo
812da8b75d
fix autofit spamming
2026-02-07 18:01:01 +08:00
Concedo
349c461453
add stop reason for error
2026-02-04 20:23:18 +08:00
Concedo
226c79338f
handle glm4.7 flash template
2026-01-28 23:29:08 +08:00
Concedo
5c6cc02985
remove clblast, part 2
2026-01-23 14:09:46 +08:00
Concedo
3816391a74
increase logprobs returned to 10
2026-01-18 11:13:42 +08:00
Concedo
62bea5ef4f
allow overriding the devices directly
2026-01-17 19:08:06 +08:00
Concedo
983baac46b
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .devops/vulkan.Dockerfile
# .github/workflows/build.yml
# ci/run.sh
# examples/model-conversion/Makefile
# examples/model-conversion/README.md
# examples/model-conversion/scripts/causal/compare-logits.py
# examples/model-conversion/scripts/embedding/run-converted-model.sh
# examples/model-conversion/scripts/utils/common.py
# examples/model-conversion/scripts/utils/semantic_check.py
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-webgpu/ggml-webgpu.cpp
# scripts/pr2wt.sh
# scripts/sync_vendor.py
# tests/test-arg-parser.cpp
2026-01-09 01:23:10 +08:00
Concedo
d8942cde14
smartcache allow custom number of slots
2026-01-02 17:19:40 +08:00
Concedo
bfa2ae7744
fixed smartcache bug when used with images
2026-01-02 00:35:05 +08:00
Concedo
51edb6ae61
allow clip fa for anything besides cuda on gpu
2026-01-01 21:09:51 +08:00
Concedo
54e419f587
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/docker.yml
# docs/ops.md
# docs/ops/Metal.csv
# ggml/CMakeLists.txt
# ggml/src/ggml-sycl/CMakeLists.txt
# grammars/README.md
# models/templates/llama-cpp-deepseek-r1.jinja
# scripts/sync-ggml.last
# tests/test-chat.cpp
2026-01-01 15:34:10 +08:00
Concedo
76ef726ec8
adaptive p sharpness to 10.0f
2025-12-31 17:28:30 +08:00
Concedo
0e26e4d354
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/011-bug-results.yml
# .github/ISSUE_TEMPLATE/019-bug-misc.yml
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# ggml/src/ggml-rpc/ggml-rpc.cpp
2025-12-28 23:47:55 +08:00
Concedo
21d801f6d5
init total weight for adaptive p
2025-12-28 15:33:06 +08:00
Concedo
27261bfc26
adaptive decay as an overridable param (+1 squashed commits)
Squashed commits:
[d94df7843] adaptive decay as an overridable param
2025-12-28 13:34:20 +08:00
Concedo
6548645aaa
rename power law sampler to adaptive p
2025-12-27 17:50:58 +08:00
Concedo
9bb362cce9
revised power law sampling
2025-12-27 10:59:46 +08:00
Concedo
91d8863f18
power law sampler added
2025-12-27 09:46:06 +08:00