Concedo
4e40f2aaf4
added photomaker face cloning
2025-06-20 21:33:36 +08:00
Concedo
21881a861d
rename restrict square to sdclampedsoft
2025-06-20 15:39:55 +08:00
Concedo
175c99081e
merged https://github.com/leejet/stable-diffusion.cpp/issues/588 to fix vae tiling, ref https://github.com/LostRuins/koboldcpp/issues/1603
2025-06-20 11:13:04 +08:00
Concedo
b925bbfc6d
add simple api example
2025-06-19 23:05:28 +08:00
Concedo
771261f5be
updated sdui
2025-06-19 22:16:23 +08:00
Concedo
924dfa7cd3
bump version
2025-06-18 21:37:24 +08:00
Concedo
9e49350507
merge occam's https://github.com/ggml-org/llama.cpp/pull/14249
2025-06-18 21:23:23 +08:00
Concedo
5f0a7a84ae
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-vulkan/CMakeLists.txt
# scripts/sync-ggml.last
2025-06-18 21:22:51 +08:00
Concedo
268b6f76df
updated lite
2025-06-18 21:06:24 +08:00
Concedo
e35c6b8f9b
remove t5 masking sdcpp
2025-06-18 21:05:03 +08:00
Concedo
e0a7694328
try to set cuda pcie order first thing
2025-06-18 20:25:38 +08:00
Charles Xu
ef035803eb
ggml: Add Apple support for GGML_CPU_ALL_VARIANTS ( #14258 )
2025-06-18 12:40:07 +01:00
Concedo
a8d33ebb0d
increase genamt hardlimit from 0.1 to 0.2 ratio
2025-06-18 19:34:12 +08:00
Concedo
40443a98f5
show available RAM, fixed SD vae tiling noise
2025-06-18 18:44:50 +08:00
Xuan-Son Nguyen
413977de32
mtmd : refactor llava-uhd preprocessing logic ( #14247 )
...
* mtmd : refactor llava-uhd preprocessing logic
* fix editorconfig
2025-06-18 10:43:57 +02:00
Xuan-Son Nguyen
95402553a5
llama-chat : fix multiple system message for gemma, orion ( #14246 )
2025-06-18 09:58:43 +02:00
Sigbjørn Skjæret
3865cff4f5
convert : fix null head_dim AutoConfig regression ( #14248 )
2025-06-18 09:52:07 +02:00
Georgi Gerganov
d03172cc79
sync : ggml
...
ggml-ci
2025-06-18 09:59:21 +03:00
Daniel Bevenius
dd8e59f443
ggml : disable warnings for tests when using MSVC (ggml/1273)
...
* ggml : disable warnings for tests when using MSVC
This commit disables warnings for tests on Windows when using MSVC.
The motivation for this is that it brings the build output more
in line with what Linux/macOS systems produce.
One warning is still generated for the tests:
```console
Building Custom Rule C:/ggml/tests/CMakeLists.txt
cl : command line warning D9025: overriding '/DNDEBUG' with '/UNDEBUG'
[C:\ggml\build\tests\test-arange.vcxproj]
test-arange.cpp
test-arange.vcxproj -> C:\ggml\build\bin\Release\test-arange.exe
```
* ggml : fix typo in tests disable list
2025-06-18 09:59:21 +03:00
Daniel Bevenius
bbe98d2784
ggml : remove unused ggml_context_container (ggml/1272)
...
This commit removes the unused `ggml_context_container` structure from
the ggml library. It looks like the usage of this struct was removed in
commit 4757fe18d56ec11bf9c07feaca6e9d5b5357e7f4 ("ggml : alloc
ggml_contexts on the heap (whisper/2525)").
The motivation for this change is to improve code clarity/readability.
2025-06-18 09:59:21 +03:00
Daniel Bevenius
c2056ed6d4
examples : include examples in msvc disable warn (ggml/1270)
...
This commit adds the examples to the list of targets for which MSVC
warnings are ignored.
The motivation for this is that the examples currently generate a number
of warnings that are already ignored/disabled for the core ggml project.
This makes for cleaner output when building.
2025-06-18 09:59:21 +03:00
bandoti
c46503014d
cmake: remove shader-gen step-targets from ggml-vulkan ( #14226 )
...
* Remove step-targets from vulkan-shaders-gen
* Unset DESTDIR when building vulkan-shaders-gen
2025-06-17 22:33:25 +02:00
Concedo
7966bdd1ad
allow embeddings model to use gpu
2025-06-18 00:46:30 +08:00
Concedo
4356a00f4a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# ci/run.sh
# docs/function-calling.md
# examples/gritlm/gritlm.cpp
# ggml/CMakeLists.txt
# ggml/cmake/common.cmake
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/ggml-cpu.c
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-vulkan/CMakeLists.txt
# ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt
# requirements/requirements-compare-llama-bench.txt
# scripts/compare-llama-bench.py
# tests/CMakeLists.txt
2025-06-18 00:16:54 +08:00
Reithan
f07434f4c1
streamline grammar sampler to speed up generation while using heavy grammar ( #1606 )
2025-06-17 23:04:59 +08:00
xctan
860a9e4eef
ggml-cpu : remove the weak alias trick ( #14221 )
2025-06-17 12:58:32 +03:00
R0CKSTAR
fe9d60e74a
musa: fix build warning (unused variable) ( #14231 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-06-17 17:48:08 +08:00
Sigbjørn Skjæret
e434e69183
common : suggest --jinja when autodetection fails ( #14222 )
2025-06-16 21:58:42 +02:00
Georgi Gerganov
89fea80d29
server : fix incorrect usage of llama_get_embeddings() ( #14225 )
...
* server : fix incorrect usage of llama_get_embeddings()
ggml-ci
* cont : fix the fix
ggml-ci
2025-06-16 22:33:27 +03:00
Concedo
ab29be54c4
comfyui compat - serve temporary upload endpoint for img2img
2025-06-16 23:18:47 +08:00
Diego Devesa
6adc3c3ebc
llama : add thread safety test ( #14035 )
...
* llama : add thread safety test
* llamafile : remove global state
* llama : better LLAMA_SPLIT_MODE_NONE logic
when main_gpu < 0, GPU devices are not used
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-16 08:11:43 -07:00
bandoti
0dbcabde8c
cmake: clean up external project logic for vulkan-shaders-gen ( #14179 )
...
* Remove install step for vulkan-shaders-gen
* Add install step to normalize msvc with make
* Regenerate modified shaders at build-time
2025-06-16 10:32:13 -03:00
Đinh Trọng Huy
ad590be98c
model : add NeoBERT ( #14164 )
...
* convert neobert model to gguf
* add inference graph
* fix flake8 lint
* followed reviewer suggestions
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* follow reviewers suggestions
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* override NeoBERT feed-forward length
---------
Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-16 14:53:41 +02:00
uvos
7d6d91babf
HIP: disable rocwmma on gfx12 by default until rocm 7.0 ( #14202 )
2025-06-16 13:47:38 +02:00
Georgi Gerganov
d3e64b9f49
llama : rework embeddings logic ( #14208 )
...
* llama : rework embeddings logic
ggml-ci
* cont : fix rerank
ggml-ci
* cont : engrish [no ci]
* cont : fix rerank
ggml-ci
* server : support both embeddings and completions with single model
ggml-ci
* cont : avoid embeddings_org
ggml-ci
2025-06-16 14:14:00 +03:00
Charles Xu
3ba0d843c6
ggml: Add Android support for GGML_CPU_ALL_VARIANTS ( #14206 )
2025-06-16 11:47:57 +02:00
Bartowski
0bf49eb668
convert : remove arcee change in convert_hf_to_gguf_update.py ( #14207 )
2025-06-16 10:16:06 +02:00
Đinh Trọng Huy
4ad243677b
gguf-py : allow key override when adding value to GGUFWriter ( #14194 )
...
Co-authored-by: dinhhuy <huy.dinh@brains-tech.co.jp>
2025-06-16 09:20:59 +02:00
Jeff Bolz
c89c2d1ab9
vulkan: mutex around vkQueueSubmit ( #14127 )
...
This fixes the remaining crash in test-thread-safety on my system.
2025-06-16 08:21:08 +02:00
xctan
3555b3004b
ggml-cpu : rework weak alias on apple targets ( #14146 )
...
* ggml-cpu : rework weak alias on apple targets
* fix powerpc detection
* fix ppc detection
* fix powerpc detection on darwin
2025-06-16 13:54:15 +08:00
Bartowski
d7da8dc83a
model : Add support for Arcee AI's upcoming AFM model ( #14185 )
...
* Add Arcee AFM support
* Add draft update code
* Fix linter and update URL, may still not be final
* Update src/llama-model.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Remove accidental blank line
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-06-16 01:04:06 +02:00
Eric Curtin
cd355eda7d
server : When listening on a unix domain socket don't print http:// and port ( #14180 )
...
Instead show something like this:
main: server is listening on file.sock - starting the main loop
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-06-15 23:36:22 +02:00
Ed Addario
30e5b01de2
quantize : change int to unsigned int for KV overrides ( #14197 )
2025-06-15 18:53:45 +02:00
Concedo
6c9654f744
updated lite and docs
2025-06-15 23:44:51 +08:00
uvos
e54b394082
CUDA/HIP: fix ssm_scan on devices where warp size is not 32 ( #14196 )
2025-06-15 17:30:13 +02:00
Concedo
861a2f5275
terminal title
2025-06-15 21:51:44 +08:00
uvos
2c2caa4443
HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ ( #14183 )
2025-06-15 15:45:27 +02:00
Georgi Gerganov
5fce5f948d
kv-cache : fix use-after-move of defrag info ( #14189 )
...
ggml-ci
2025-06-15 10:52:11 +03:00
Mikko Juola
9ae4143bc6
model : add dots.llm1 architecture support ( #14044 ) ( #14118 )
...
Adds:
* Dots1Model to convert_hf_to_gguf.py
* Computation graph code to llama-model.cpp
* Chat template to llama-chat.cpp to detect this model's template.
---
The model architecture is called "dots.llm1" (generally shortened to
dots1 or DOTS1 in the code).
The only models that exist as of writing of this commit that follow this
architecture are "dots.llm1.inst" and "dots.llm1.base" from here:
* https://huggingface.co/rednote-hilab/dots.llm1.inst
* https://huggingface.co/rednote-hilab/dots.llm1.base
The model architecture is a combination of Qwen and Deepseek parts, as
seen here:
ffe12627b4/src/transformers/models/dots1/modular_dots1.py
2025-06-15 09:52:06 +02:00
Georgi Gerganov
c311ac664d
cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ ( #14188 )
...
ggml-ci
2025-06-15 10:08:58 +03:00