Concedo
a22a666f3b
allow online apis when in nomodel mode
2024-08-27 15:30:54 +08:00
Concedo
b78a637da5
try to optimize context shifting
2024-08-26 23:07:31 +08:00
Concedo
c61fa9155d
handle oversized images by downscaling
2024-08-26 13:58:18 +08:00
Concedo
6acbf1d7f4
macos default to full offload when using gpulayers auto (-1)
2024-08-26 12:12:51 +08:00
Concedo
97aa8648ed
allow launching with no models loaded
2024-08-25 23:57:32 +08:00
Concedo
efb8be013e
fixed swagger
2024-08-25 23:29:55 +08:00
Concedo
7bc87e1f0f
added llava letterboxing feature
2024-08-25 23:15:38 +08:00
Concedo
cca3c4c78b
xtc fixes
2024-08-22 23:18:46 +08:00
Concedo
0b96097439
add version number into help page
2024-08-22 00:52:30 +08:00
Concedo
fc2545dc83
fixed a typo
2024-08-22 00:25:56 +08:00
Concedo
5bf527a6ae
added xtc sampler
2024-08-21 23:57:15 +08:00
Concedo
1a7ecd55e6
timing for init step, clip for vulkan
2024-08-21 18:14:53 +08:00
Concedo
6200b6d64e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .gitignore
# README.md
# docs/build.md
# flake.lock
# tests/test-backend-ops.cpp
# tests/test-grammar-integration.cpp
2024-08-21 17:17:36 +08:00
Concedo
cd69ab218e
fixed DRY
2024-08-21 17:01:28 +08:00
Younes Belkada
b40eb84895
llama : support for falcon-mamba architecture ( #9074 )
...
* feat: initial support for llama.cpp
* fix: lint
* refactor: better refactor
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* fix: address comments
* Update convert_hf_to_gguf.py
Co-authored-by: compilade <git@compilade.net>
* fix: add more cleanup and harmonization
* fix: lint
* Update gguf-py/gguf/gguf_writer.py
Co-authored-by: compilade <git@compilade.net>
* fix: change name
* Apply suggestions from code review
Co-authored-by: compilade <git@compilade.net>
* add in operator
* fix: add `dt_b_c_rms` in `llm_load_print_meta`
* fix: correct printf format for bool
* fix: correct print format
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* llama : quantize more Mamba tensors
* llama : use f16 as the fallback of fallback quant types
---------
Co-authored-by: compilade <git@compilade.net>
2024-08-21 11:06:36 +03:00
fairydreaming
f63f603c87
llava : zero-initialize clip_ctx structure fields with aggregate initialization ( #8908 )
...
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-08-21 09:45:49 +02:00
Daniel Bevenius
8455340b87
llama : std::move llm_bigram_bpe from work_queue ( #9062 )
...
* llama : std::move llm_bigram_bpe from work_queue
This commit updates the retrieval of llm_bigram_bpe objects from
work_queue.top() by using std::move.
The motivation for this is to avoid the copying of the std::string
`text` member of the llm_bigram_bpe struct.
* squash! llama : std::move llm_bigram_bpe from work_queue
Introduced a MovablePriorityQueue class to allow moving elements
out of the priority queue for llm_bigram_bpe.
* squash! llama : std::move llm_bigram_bpe from work_queue
Rename MovablePriorityQueue to lama_priority_queue.
* squash! llama : std::move llm_bigram_bpe from work_queue
Rename lama_priority_queue -> llama_priority_queue.
2024-08-21 10:32:58 +03:00
Changyeon Kim
2f3c1466ff
llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model. ( #8984 )
...
* llava: Add ACC OP for GPU acceleration to the Vulkan backend in the LLAVA CLIP model.
- The CLIP model now prioritizes the Vulkan backend over the CPU when Vulkan is available.
- A GGML_OP_ACC shader has been added.
- The encoding performance of the CLIP model improved from 4.2s on the CPU to 0.9s on the GPU.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
* fix-up coding style.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
* Fix-up the missing initial parameter to resolve the compilation warning.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
* [fix] Add missing parameters.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
* [fix] Use nb1 and nb2 for dst.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
* Fix check results ggml_acc call
---------
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
Co-authored-by: 0cc4m <picard12@live.de>
2024-08-20 21:00:00 +02:00
Concedo
2cf6d16c40
adjust sleep time
2024-08-21 01:06:41 +08:00
Concedo
6a4becb731
dry is still buggy because token indexes are wrong
2024-08-21 00:59:26 +08:00
Meng, Hengyu
50addec9a5
[SYCL] fallback mmvq ( #9088 )
...
* fallback mmvq to mul_mat
* mmvq in cuda path
* Update ggml/src/ggml-sycl.cpp
Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@codeplay.com>
---------
Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@codeplay.com>
2024-08-20 23:50:17 +08:00
zhentaoyu
4f8d19ff17
[SYCL] Fix SYCL im2col and convert Overflow with Large Dims ( #9052 )
...
* sycl: fix im2col overflow and sync with cuda
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* sycl: fix convert overflow
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* sycl: fix convert and dequantize
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* sycl: fix ib in dmmv
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* sycl: refine convert
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* sycl: move downsample global_range into common
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* test: add im2col and convert test cases
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* test: make new cases only in sycl
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
* test: comment new test_cases for only local testing
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
---------
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
2024-08-20 23:06:51 +08:00
Concedo
db6ef8d1e1
revert dry state reset
2024-08-20 22:22:21 +08:00
Concedo
c1ae350e5b
fixed race condition when generating
2024-08-20 20:17:55 +08:00
Concedo
7ee359a59b
on multigpu setups, pick lowest free mem instead of highest for auto layers
2024-08-20 19:02:16 +08:00
fairydreaming
90db8146d5
tests : add missing comma in grammar integration tests ( #9099 )
...
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2024-08-20 12:09:55 +03:00
wangshuai09
cfac111e2b
cann: add doc for cann backend ( #8867 )
...
Co-authored-by: xuedinge233 <damow890@gmail.com>
Co-authored-by: hipudding <huafengchun@gmail.com>
2024-08-19 16:46:38 +08:00
Radoslav Gerganov
1b6ff90ff8
rpc : print error message when failed to connect endpoint ( #9042 )
2024-08-19 10:11:45 +03:00
Radoslav Gerganov
18eaf29f4c
rpc : prevent crashes on invalid input ( #9040 )
...
Add more checks to prevent the RPC server from crashing when invalid
input is received from a client
2024-08-19 10:10:21 +03:00
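Hardening of this kind usually means validating lengths and bounds before trusting anything from the wire. A minimal sketch of the pattern (a hypothetical length-prefixed message, not the actual ggml RPC wire format):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Reject malformed input instead of crashing: check the header is
// present, the declared length fits in the payload, and the size is
// within a sane cap before allocating or copying.
bool read_tensor_data(const std::vector<uint8_t> & msg, std::vector<uint8_t> & out) {
    if (msg.size() < sizeof(uint64_t)) {
        return false; // truncated header
    }
    uint64_t len = 0;
    std::memcpy(&len, msg.data(), sizeof(len));
    if (len > msg.size() - sizeof(len)) {
        return false; // declared length exceeds the actual payload
    }
    if (len > (64u << 20)) {
        return false; // cap: refuse absurd allocation requests
    }
    out.assign(msg.begin() + sizeof(len), msg.begin() + sizeof(len) + len);
    return true;
}
```

A client that sends a huge length with a tiny payload, or a message shorter than the header, gets a clean error instead of an out-of-bounds read.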
Concedo
3bd70d75ea
fix segfault, kcpp is now debuggable
2024-08-19 13:50:49 +08:00
Concedo
1fbf21eec4
Revert "fix out of bounds access"
...
This reverts commit 3ac183633a.
2024-08-19 13:06:39 +08:00
Concedo
3ac183633a
fix out of bounds access
2024-08-19 00:30:54 +08:00
Georgi Gerganov
554b049068
flake.lock: Update ( #9068 )
2024-08-18 07:43:32 -07:00
Concedo
04166d20a4
better quant clip
2024-08-18 22:15:59 +08:00
Concedo
b3b00750b7
update lite
2024-08-18 18:23:21 +08:00
ltoniazzi
2339a0be1c
tests : add integration test for lora adapters ( #8957 )
...
* Add printing to check weights match torch version
* minor code style changes
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-08-18 11:58:04 +02:00
Concedo
e9eb6fe51a
move chat compl to models tab
2024-08-18 14:56:10 +08:00
Concedo
314a620e96
added readme for macos
2024-08-18 13:11:49 +08:00
Concedo
06476b8247
Merge branch 'upstream' into concedo_experimental
2024-08-18 12:11:14 +08:00
Concedo
98dff80b9c
update lite
2024-08-18 12:00:06 +08:00
Concedo
e2e6d892b4
fix declaration order
2024-08-18 02:15:34 +08:00
Concedo
d71b5477c5
update lite, cleanup, fix interrogate format
2024-08-18 00:48:53 +08:00
Yoshi Suhara
2fb9267887
Fix incorrect use of ctx_split for bias tensors ( #9063 )
2024-08-17 15:34:21 +02:00
Concedo
1edf83761a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/bench.yml.disabled
# Makefile
# README.md
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-vulkan.cpp
2024-08-17 16:21:14 +08:00
Xuan Son Nguyen
8b3befc0e2
server : refactor middleware and /health endpoint ( #9056 )
...
* server : refactor middleware and /health endpoint
* move "fail_on_no_slot" to /slots
* Update examples/server/server.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix server tests
* fix CI
* update server docs
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-16 17:19:05 +02:00
tc-mb
d565bb2fd5
llava : support MiniCPM-V-2.6 ( #8967 )
...
* init
* rename
* add run android for termux in readme
* add android readme
* add instructions in readme
* change name in readme
* Update README.md
* fixed line
* add result in readme
* random pos_embed
* add positions index
* change for ollama
* change for ollama
* better pos_embed in clip
* support ollama
* update cmakelist
* update cmakelist
* rename wrapper
* clear code
* replace and organize code
* add link
* sync master
* fix warnings
* fix warnings
* fix bug in bicubic resize when needing to resize image smaller
* receive review comments and modify
* receive review comments and modify
* put all code into llava dir
* fix quality problem in pr code
* change n_layer
* add space in "-1"
* imitate reshape bug of python code
* fix bug in clip
* fix issues for merging
* fix llama-minicpmv-cli in cmake file
* change pr readme
* fix code review
* remove in line 33 directory in the /cmakelists.txt (not in example, in the main dir)
* fix cmakefile
* add warn
* fix KEY_HAS_MINICPMV_PROJ
* remove load_image_size into clip_ctx
* remove the extern "C", MINICPMV_API
* fix uhd code for review comment
* delete minicpmv-wrapper in pr
* remove uhd_image_embed
* Modify 2 notes
* support minicpmv2.6
* modify convert script of minicpmv
* modify convert
* modify convert
* add readme
* add resampler of v2.6
* modify clip
* modify readme
* fix type-check
* fix type-check
* fix type-check
* fix type-check
* modify convert script and readme
* fix convert script and readme
* fix convert
* fix num in convert
* fix type-check
---------
Co-authored-by: Hongji Zhu <fireyoucan@gmail.com>
Co-authored-by: harvestingmoon <leewenyeong@gmail.com>
2024-08-16 16:34:41 +03:00
Farbod Bijary
ee2984bdaf
py : fix wrong input type for raw_dtype in ggml to gguf scripts ( #8928 )
...
Co-authored-by: farbod <farbod.bjary82@gmail.com>
2024-08-16 13:36:30 +03:00
Aisuko
c8ddce8560
Fix inference example lacks required parameters ( #9035 )
...
Signed-off-by: Aisuko <urakiny@gmail.com>
2024-08-16 11:08:59 +02:00
compilade
23fd453544
gguf-py : bump version from 0.9.1 to 0.10.0 ( #9051 )
2024-08-16 09:36:11 +03:00
Minsoo Cheong
c679e0cb5c
llama : add EXAONE model support ( #9025 )
...
* add exaone model support
* add chat template
* fix whitespace
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add ftype
* add exaone pre-tokenizer in `llama-vocab.cpp`
Co-Authored-By: compilade <113953597+compilade@users.noreply.github.com>
* fix lint
Co-Authored-By: compilade <113953597+compilade@users.noreply.github.com>
* add `EXAONE` to supported models in `README.md`
* fix space
Co-authored-by: compilade <git@compilade.net>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <113953597+compilade@users.noreply.github.com>
Co-authored-by: compilade <git@compilade.net>
2024-08-16 09:35:18 +03:00