pl752
5630406959
llama-mtmd-cli: Sigint rework in mtmd vision example ( #13080 )
...
* Sigint rework in mtmd vision example
* Applied suggestions on mtmd-cli PR
* Forgot to invert one of the conditions
* Update examples/llava/mtmd-cli.cpp
* Removed redundant exit check
---------
Co-authored-by: pl752 <maximpl752@gmail.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-04-23 23:32:35 +02:00
Xuan-Son Nguyen
ecda2ec4b3
mtmd : Support Pixtral 12B ( #13065 )
...
* add pixtral text model (vision is wip)
* cgraph ok, just missing 2D RoPE
* fix bad rebase
* first working version
* fix problem with img_break token
* support dynamic image size
* update docs
* update test script
2025-04-23 20:21:59 +02:00
piDack
eb1776b15a
convert : Append mult-eos,half-rope,bos to GLM4-0414 and Z ( #13021 )
...
* append mult-eos,half-rope,bos to GLM4-0414
* remove unset var
2025-04-23 16:59:14 +02:00
Radoslav Gerganov
2cca6c01e4
rpc : add command line option for number of threads for the CPU backend ( #13060 )
...
closes #13051
2025-04-23 10:32:49 +03:00
Johannes Gäßler
658987cfc9
CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID ( #13014 )
...
* CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID
* fix logic for RoPE support, CUDA graphs
2025-04-22 21:27:40 +02:00
Xuan-Son Nguyen
dc39a5e7a8
mtmd : support SmolVLM (version 1 and 2) ( #13050 )
...
* mtmd : support SmolVLM (version 1 and 2)
* correct chat template
* fix n_patches
* scale_factor is an int
* add more models to test
2025-04-22 16:24:54 +02:00
Concedo
3e8b84b8e5
added support for structured output in chat completions
2025-04-22 22:23:36 +08:00
Georgi Gerganov
ab47dec3d3
security : add note about RPC and server functionality ( #13061 )
...
* security : add note about RPC functionality
* security : add note about llama-server
2025-04-22 16:16:10 +03:00
Georgi Gerganov
7b53389c24
metal : add memory pool for temp allocs ( #12850 )
...
* metal : add memory pool for temp allocs (wip) [no ci]
* cont : free buffers from the heap
* cont : resize heap [no ci]
* cont : refactor heap [no ci]
* cont : heap for each cmd buffer [no ci]
* cont : fix free
* wip
* cont : fix alignment [no ci]
* cont : not working .. [no ci]
* cont : heap allocation now works [no ci]
* cont : use MTLHeapTypePlacement
ggml-ci
* metal : use dynamic MTLHeap allocations
ggml-ci
* metal : add comments
* metal : disable softmax use of mem_pool
ggml-ci
* metal : final touches
2025-04-22 16:15:51 +03:00
Xuan-Son Nguyen
243453533e
llava : update documentations ( #13055 )
...
* llava : update documentations
* fix typo
2025-04-22 10:37:00 +02:00
Concedo
e8b3aeaa28
update some defaults for max length and max ctx
2025-04-22 15:47:01 +08:00
Concedo
6dbee2f2f8
more robust glslc checks, increase default denoise str
2025-04-22 15:19:47 +08:00
Diego Devesa
1d735c0b4f
ggml : add SSE 4.2 and x64 base variant for CPUs without AVX ( #12871 )
...
* ggml : add SSE 4.2 variant for CPUs without AVX
* ggml : add x64 base ABI variant
2025-04-21 18:13:51 +02:00
Concedo
16156f0d86
updated lite
2025-04-21 22:24:33 +08:00
Concedo
6494dce405
handle estimation for multipart gguf (+1 squashed commits)
...
Squashed commits:
[c7b4af92] handle estimation for multipart gguf
2025-04-21 22:07:22 +08:00
Akarshan Biswas
5368ddda7a
SYCL: Add non-contiguous support in ROPE ( #12993 )
...
ggml-ci
2025-04-21 19:13:30 +05:30
Xuan-Son Nguyen
84a9bf2fc2
mtmd : merge llava, gemma3 and minicpmv CLI into single llama-mtmd-cli ( #13012 )
...
* mtmd : merge `llava-cli` and `gemma3-cli` into single `mtmd-cli`
* support for minicpmv
* remove cpp files of llava and minicpmv
* update hot topics
* mtmd : add not supported msg for qwen2vl
* Update examples/llava/mtmd.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-04-21 15:32:58 +02:00
Concedo
9cd6a1add2
allow mmproj to be run on cpu
2025-04-21 21:03:10 +08:00
Concedo
f968079290
randomize image names to prevent caching in noscript
2025-04-21 13:24:40 +08:00
Xuan-Son Nguyen
2016f07bd1
convert : experimental support for --mmproj flag ( #13023 )
...
* convert : experimental support for `--mmproj` flag
* fix bad ctrl+f replace
* fix style
* split into subclasses TextModel and VisionModel
* rename Mode --> ModelBase
* small fix
* correct CLIP_VISION arch name (because existing GGUF already use it)
* Apply suggestions from code review
Co-authored-by: compilade <git@compilade.net>
* fix Mistral3Model
* fix typo
Co-authored-by: compilade <git@compilade.net>
---------
Co-authored-by: compilade <git@compilade.net>
2025-04-20 23:29:36 +02:00
Concedo
687bb5375c
Merge branch 'upstream' into concedo_experimental
2025-04-20 20:57:11 +08:00
Concedo
2ed6850c0b
added override tensor
2025-04-20 20:56:17 +08:00
Jeffrey Morgan
6602304814
llava: fix errors in clip.h on certain compilers ( #13030 )
2025-04-20 12:15:41 +02:00
Concedo
17360a3b32
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# examples/llava/clip.cpp
2025-04-20 17:59:58 +08:00
Concedo
636b92ec1d
updated lite
2025-04-20 17:52:55 +08:00
Jeff Bolz
66168204be
vulkan: support noncontiguous rms_norm ( #13031 )
2025-04-20 10:50:02 +02:00
Jeffrey Morgan
4ba9d711ba
metal: add neg operator ( #13029 )
2025-04-20 08:28:40 +03:00
bandoti
00137157fc
Disable CI cross-compile builds ( #13022 )
2025-04-19 18:05:03 +02:00
Concedo
75dfad2bb0
fixed noscript (+1 squashed commits)
...
Squashed commits:
[dba28399] fixed noscript
2025-04-19 23:16:08 +08:00
Sigbjørn Skjæret
fb28f4f80e
gguf-py : fix upload python package workflow ( #13020 )
2025-04-19 16:26:38 +02:00
Concedo
12c2efdadd
noscript image gen
2025-04-19 18:56:52 +08:00
Xuan-Son Nguyen
37b9f0d29d
clip : refactor, add image_manipulation and llava_uhd classes ( #13011 )
...
* clip : refactor, add `image_manipulation` and `llava_uhd`
* refactor llava-1.6 preprocessing
* simplify logic for llava-1.5
* missing include
2025-04-19 09:15:45 +02:00
Concedo
95d1aaf4d4
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# examples/rpc/rpc-server.cpp
# ggml/src/ggml-rpc/ggml-rpc.cpp
# ggml/src/ggml-sycl/backend.hpp
# ggml/src/ggml-sycl/common.hpp
# ggml/src/ggml-sycl/element_wise.cpp
# ggml/src/ggml-sycl/element_wise.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# requirements/requirements-all.txt
2025-04-19 13:17:13 +08:00
Concedo
305e533dc6
i already knew zenity would cause issues
2025-04-19 13:04:41 +08:00
Concedo
78a910be26
noscript chat mode tweaks
2025-04-19 12:40:13 +08:00
Daniel Tang
6408210082
main : Fix Ctrl+D/newline handling ( #12951 )
...
This restores the behavior from #491. This does not affect Ctrl+D's ability to
terminate --multiline-input lines (#1040).
This also actually implements #587: "If the user wants the text to end in a
newline, this should be accomplished by explicitly adding a newline by using
\ followed by return, then returning control by pressing return again."
Fixes #12949
2025-04-18 22:02:55 +02:00
Chris Thompson
aff9d107b0
gguf-py : GGUF Editor GUI - Python + Qt6 ( #12930 )
2025-04-18 20:30:41 +02:00
Xuan-Son Nguyen
35370ba945
server : use std::move whenever possible ( #12936 )
...
* server : use std::move whenever possible
* use r-value ref
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* make task creation scoped
* restore std::move
* fix task_id not set correctly
* apply changes from suggestion
Co-authored-by: ggerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-04-18 19:58:12 +02:00
Concedo
a5b5d21cca
added chat mode to noscript
2025-04-19 00:59:00 +08:00
Concedo
4b0f63ed62
cleanup
2025-04-18 22:57:10 +08:00
Concedo
29b57d2175
updated vulkan to make use of cm2
2025-04-18 22:10:57 +08:00
Akarshan Biswas
8d66005763
SYCL: Refactor and enable FP16 in binary broadcast OPs ( #12975 )
...
* SYCL: refactor move to a separate file
* Fix binbcast
* Remove duplicates
* fix include formatting
* fix typo
2025-04-18 15:57:56 +02:00
Xuan-Son Nguyen
b9154ecff9
mtmd : add methods to access mtmd_image_tokens ( #12906 )
...
* mtmd : add more api around mtmd_image_tokens
* mtmd : ability to calc image hash
* shared_ptr for mtmd_image_tokens
* move hash to user-define ID (fixed)
* fix prompt_modified
* rm redundant data member
2025-04-18 10:04:51 +02:00
Radoslav Gerganov
2db9ba1464
rpc : add RPC_CMD_HELLO ( #12955 )
...
Add RPC_CMD_HELLO for getting the version of the protocol implemented by
the server. Follow the semantic versioning rules at https://semver.org
Hopefully this brings a better user experience when we make breaking
changes at the protocol level and avoids issues like #12465
2025-04-18 10:13:42 +03:00
Concedo
40adb8af35
update lite, remove aetherroom dead site
2025-04-18 13:23:14 +08:00
Concedo
5d57d62665
add a timeout for zenity check
2025-04-18 13:07:26 +08:00
Concedo
bce519cee7
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/aclnn_ops.h
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# tests/test-backend-ops.cpp
2025-04-18 12:44:20 +08:00
Concedo
1a09d9cf0e
increase to 10 save slots
2025-04-18 11:30:32 +08:00
Georgi Gerganov
2f74c354c0
graph : make FA compatible with MLA + add initial Metal kernels ( #12953 )
...
* graph : make mla compatible with FA
* metal : add exp FA kernels for DeepSeek models
ggml-ci
* llama : minor naming updates
ggml-ci
* ggml : disable FA for DS head sizes
* tests : add FA tests for MLA shapes
ggml-ci
2025-04-17 18:16:36 +03:00
Concedo
64dd4f932f
update readme
2025-04-17 23:06:07 +08:00