Commit graph

4371 commits

DAN™
d8b009a945
Remove unneeded header file. (#6158) 2024-03-19 17:16:09 +01:00
Concedo
8131616454 updated lite 2024-03-20 00:13:44 +08:00
Concedo
942fb4b413 fixed removed ref (+1 squashed commits)
Squashed commits:

[93f3c270] fixed removed ref (+1 squashed commits)

Squashed commits:

[df361250] remove some files
2024-03-19 19:33:56 +08:00
Pierrick Hymbert
d0d5de42e5
gguf-split: split and merge gguf per batch of tensors (#6135)
* gguf-split: split and merge gguf files per tensor

* gguf-split: build with make toolchain

* gguf-split: rename `--split-tensors-size` to `--split-max-tensors`. Set the general.split_count KV in all splits

* split : minor style + fix compile warnings

* gguf-split: remove --upload not implemented

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-19 12:05:44 +01:00
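
For illustration only (not code from the PR): a minimal C++ sketch of the split-count arithmetic behind `--split-max-tensors`, using the gguf C API from ggml.h. The file name and the 256-tensor limit are assumptions.

    #include "ggml.h"
    #include <cstdio>

    // Count how many split files a model needs for a given
    // --split-max-tensors value (sketch, not the gguf-split source).
    int main() {
        struct gguf_init_params params = { /*.no_alloc =*/ true, /*.ctx =*/ NULL };
        struct gguf_context * ctx = gguf_init_from_file("model.gguf", params);
        if (ctx == NULL) {
            std::fprintf(stderr, "failed to open model.gguf\n");
            return 1;
        }

        const int n_tensors   = (int) gguf_get_n_tensors(ctx);
        const int max_tensors = 256; // hypothetical --split-max-tensors value

        // each split holds at most max_tensors tensors
        const int n_split = (n_tensors + max_tensors - 1) / max_tensors;
        std::printf("%d tensors -> %d splits\n", n_tensors, n_split);

        gguf_free(ctx);
        return 0;
    }
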
Concedo
a3fa919c67 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	Makefile
#	flake.lock
#	ggml-cuda.cu
#	ggml-cuda.h
2024-03-19 18:57:22 +08:00
Georgi Gerganov
b80cf3b2d1
common : disable repeat penalties by default (#6127) 2024-03-19 10:21:54 +02:00
slaren
970a48060a
ci : exempt some labels from being tagged as stale (#6140) 2024-03-19 10:06:54 +02:00
DAN™
4c28b82529
common : print usage on '-h' and '--help' (#6145) 2024-03-19 07:59:36 +02:00
github-actions[bot]
2d15886bb0 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/9df3e30ce24fd28c7b3e2de0d986769db5d6225d' (2024-03-06)
  → 'github:NixOS/nixpkgs/d691274a972b3165335d261cc4671335f5c67de9' (2024-03-14)
2024-03-18 18:51:30 +00:00
Jared Van Bortel
d199ca79f2
mpt : implement backwards compatibility with duped output tensor (#6139) 2024-03-18 12:49:02 -04:00
Felix
104f5e0fc1
clip : fix memory leak (#6138) 2024-03-18 17:40:22 +02:00
slaren
5e1b7f94a0
backend : set max split inputs to GGML_MAX_SRC (#6137) 2024-03-18 16:33:44 +01:00
Concedo
073a279e70 change reference from kobold horde to ai horde 2024-03-18 22:35:49 +08:00
Georgi Gerganov
ac9ee6a4ad
ci : disable stale issue messages (#6126) 2024-03-18 13:45:38 +02:00
Georgi Gerganov
4f6d1337ca
ci : temporary disable sanitizer builds (#6128) 2024-03-18 13:45:27 +02:00
slaren
2bf8d0f7c4
backend : offload large batches to GPU (#6083)
* backend : offload large batches to GPU

* fix hip

* code cleanup

* fix CUDA split buffers

* Update ggml-backend-impl.h

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* cuda : fix memset without set_device

* imatrix : remove sched affix from weight names

* sched : add a new split if the current one has too many inputs
reduce max inputs per split
more cleanup

* update backends

ggml-ci

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-03-18 11:03:04 +01:00
DAN™
496bc79bc2
common : tidy-up argument parsing (#6105)
* Tidy-up argument parsing.

* Missing ref.

* common : minor

* common : add static classifier

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-18 10:27:44 +02:00
Thérence
9b03719ad7
convert : add support for CamembertModel architecture (#6119)
Add support for the CamembertModel architecture used by:
https://huggingface.co/dangvantuan/sentence-camembert-large
2024-03-18 10:17:00 +02:00
Romain D
3a6efdd03c
convert : use f32 outtype for bf16 tensors (#6106)
The old behaviour was to use f16, but converting bf16 to f16 is not lossless.
Change the outtype to f32 so the conversion is lossless by default.
2024-03-18 10:04:41 +02:00
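
To see why the old default was lossy, a standalone sketch (plain C++, no ggml types, values chosen for illustration): bf16 is just the top 16 bits of an IEEE-754 f32, so widening it back to f32 is exact, whereas f16 tops out at 65504 and cannot hold large bf16 magnitudes.

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Truncate an f32 to bf16 by keeping the top 16 bits.
    static uint16_t f32_to_bf16(float f) {
        uint32_t bits;
        std::memcpy(&bits, &f, sizeof(bits));
        return (uint16_t) (bits >> 16); // drop the low mantissa bits
    }

    // Widen bf16 back to f32; exact, since both share an 8-bit exponent.
    static float bf16_to_f32(uint16_t h) {
        uint32_t bits = (uint32_t) h << 16;
        float f;
        std::memcpy(&f, &bits, sizeof(f));
        return f;
    }

    int main() {
        const float F16_MAX = 65504.0f; // largest finite IEEE-754 half value
        float x = bf16_to_f32(f32_to_bf16(1.0e6f)); // an ordinary bf16 value
        std::printf("bf16 value widened to f32: %g (exact)\n", x);
        std::printf("fits in f16? %s\n", x <= F16_MAX ? "yes" : "no -> overflow");
        return 0;
    }
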
Concedo
ffad5be712 updated docker link 2024-03-18 10:30:23 +08:00
Pierrick Hymbert
d01b3c4c32
common: llama_load_model_from_url using --model-url (#6098)
* common: llama_load_model_from_url with libcurl dependency

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-17 19:12:37 +01:00
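
A hedged usage sketch of the new helper. The (url, local path, params) parameter order of llama_load_model_from_url below is an assumption from the commit title, not a verified signature, and the URL is a placeholder.

    #include "common.h"
    #include "llama.h"

    int main() {
        llama_model_params mparams = llama_model_default_params();
        // downloads via libcurl, then loads the cached file (sketch)
        llama_model * model = llama_load_model_from_url(
            "https://example.com/models/model.gguf", // value of --model-url
            "model.gguf",                            // local path to cache it
            mparams);
        if (model == nullptr) {
            return 1;
        }
        llama_free_model(model);
        return 0;
    }
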
Georgi Gerganov
cd776c37c9
ci : close all stale issues at once (#6115) 2024-03-17 18:51:57 +01:00
GainLee
dc0f612548
ggml : fix error when finding the transfer queue family index (#6094)
Co-authored-by: GainLee <ligen@meizu.com>
2024-03-17 18:12:22 +01:00
Concedo
8b360b661c Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	Makefile
#	README.md
#	common/common.h
2024-03-17 23:03:12 +08:00
Concedo
5410e4644a symlink docs 2024-03-17 22:27:26 +08:00
AmirAli Mirian
c47cf414ef
ggml : add AVX512F SIMD (#6088) 2024-03-16 17:52:02 +02:00
Daniel Bevenius
b5f4ae09c3
gritlm : add initial README.md (#6086)
* gritlm: add initial README.md to examples/gritlm

This commit adds a suggestion for an initial README.md for the gritlm
example.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! gritlm: add initial README.md to examples/gritlm

Use the `scripts/hf.sh` script to download the model file.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! gritlm: add initial README.md to examples/gritlm

Fix editorconfig-checker error in examples/gritlm/README.md.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-03-16 17:46:29 +02:00
Xuan Son Nguyen
dfbfdd60f9
readme : add wllama as a wasm binding (#6100) 2024-03-16 17:42:08 +02:00
DAN™
15961ec04d
common : refactor nested if causing error C1061 on MSVC (#6101)
* Refactor nested if causing error C1061 on MSVC.

* Revert back and remove the elses.

* Add flag to track found arguments.
2024-03-16 17:39:15 +02:00
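
A generic sketch of the refactor the commit describes (argument names hypothetical): MSVC's C1061 limits block-nesting depth, so a deep if/else chain becomes a flat sequence of ifs plus a flag recording whether any option matched.

    #include <string>

    // Flat checks instead of a nested if/else chain; `found` tracks
    // whether the argument matched any known option.
    static bool find_arg(const std::string & arg) {
        bool found = false;
        if (arg == "-h" || arg == "--help") { found = true; /* print usage */ }
        if (arg == "-s" || arg == "--seed") { found = true; /* parse seed  */ }
        // ... many more flat checks, none nested in an else of another ...
        return found; // caller reports an invalid argument when this is false
    }

    int main() {
        return find_arg("--help") ? 0 : 1;
    }
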
Concedo
9342071f9c don't print url for localhost if remote tunnel 2024-03-16 22:19:04 +08:00
Pierrick Hymbert
a56d09a440
ci : close inactive issue with workflow (#6053)
* issues: ci - close inactive issue with workflow

* ci: close issue, change workflow schedule time
2024-03-16 14:20:53 +02:00
Concedo
7968bdebbb added more stats in perf 2024-03-16 16:53:48 +08:00
slaren
d84c48505f
llama : fix Baichuan2 13B (#6092) 2024-03-15 23:14:16 +02:00
Theia Vogel
877b4d0c62
llama : add support for control vectors (#5970)
* control vector api and implementation

* control-vectors : minor code style updates

* disable control vector when data == nullptr

use -1 for disabled range (also on init) in case we ever support controlling layer 0 (embeddings)

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-15 22:43:02 +02:00
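
A hedged sketch of the new API: it assumes this PR adds llama_control_vector_apply() to llama.h with (ctx, data, len, n_embd, il_start, il_end) parameters, which is inferred from the commit message rather than checked against the header.

    #include "llama.h"
    #include <vector>

    void steer(llama_context * ctx, int n_embd, int n_layer) {
        // one steering direction per layer, flattened to n_layer * n_embd floats
        std::vector<float> cvec((size_t) n_layer * n_embd, 0.0f);
        // ... fill cvec with a trained control direction ...

        // apply to layers 1..n_layer; layer 0 (embeddings) is kept free in
        // case it is ever supported, per the commit message
        llama_control_vector_apply(ctx, cvec.data(), cvec.size(), n_embd, 1, n_layer);

        // data == nullptr disables the control vector again; the commit also
        // uses -1 for the disabled layer range
        llama_control_vector_apply(ctx, nullptr, 0, n_embd, -1, -1);
    }
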
Andrew Canis
12247f4c69
llama : add Command-R support (#6033)
Information about the Command-R 35B model (128k context) can be found at:
	https://huggingface.co/CohereForAI/c4ai-command-r-v01

Based on the llama2 model with a few changes:

1) New hyperparameter to scale output logits (logit_scale)
2) Uses LayerNorm instead of RMSNorm
3) Transformer layers have a single shared LayerNorm that feeds into both the
   self-attention and FFN layers in parallel (see the sketch after this list).
   There is no post-attention LayerNorm.
4) No support for Rotary Position Embeddings (RoPE) scaling
5) No biases used
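
A toy sketch of points 1-3, using stand-in stubs rather than llama.cpp graph code; the logit_scale value is made up.

    #include <cstddef>
    #include <vector>

    using Tensor = std::vector<float>;

    // Identity/zero stubs so the block wiring below is executable; the
    // real kernels live in llama.cpp's compute graph.
    static Tensor layer_norm(const Tensor & x) { return x; }
    static Tensor self_attn (const Tensor & x) { return Tensor(x.size(), 0.0f); }
    static Tensor ffn       (const Tensor & x) { return Tensor(x.size(), 0.0f); }

    static Tensor add(const Tensor & a, const Tensor & b) {
        Tensor r(a.size());
        for (size_t i = 0; i < a.size(); ++i) r[i] = a[i] + b[i];
        return r;
    }

    // One Command-R layer: a single shared LayerNorm feeds self-attention
    // and the FFN in parallel; there is no post-attention LayerNorm.
    static Tensor command_r_block(const Tensor & x) {
        Tensor h = layer_norm(x);
        return add(x, add(self_attn(h), ffn(h)));
    }

    int main() {
        Tensor x(8, 1.0f);
        Tensor y = command_r_block(x);
        const float logit_scale = 0.0625f;    // hypothetical hparam value
        for (auto & v : y) v *= logit_scale;  // output logits are scaled
        return 0;
    }
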

Find GGUF files here:
	https://huggingface.co/andrewcanis/c4ai-command-r-v01-GGUF

To convert model to GGUF format yourself:

1) Download Command-R Hugging Face safetensors:
	git lfs install
	git clone https://huggingface.co/CohereForAI/c4ai-command-r-v01

2) Run:
	python3 convert-hf-to-gguf.py --outtype f16 ./c4ai-command-r-v01
2024-03-15 22:41:22 +02:00
Ting Lou
4e9a7f7f7f
llava : change API to pure C style for Rust FFI bindgen (#6079)
Co-authored-by: Lou Ting <louting.t@alibaba-inc.com>
2024-03-15 16:31:05 +02:00
slaren
3020327f6c
cuda : disable unused cudaLaunchHostFunc code (#6078) 2024-03-15 14:24:03 +02:00
Neo Zhang Jianyu
46acb36767
fix error when setting the main GPU (#6073) 2024-03-15 18:53:53 +08:00
Georgi Gerganov
131b058409
make : ggml-metal.o depends on ggml.h 2024-03-15 11:38:40 +02:00
AidanBeltonS
753e36f650
[SYCL] Fix non-Intel device selection (#6042)
* Fix non-Intel device selection

* Update ggml-sycl.cpp

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>

* Update ggml-sycl.cpp

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2024-03-15 14:56:20 +05:30
Ondřej Čertík
7ce2c77f88
gguf : add support for I64 and F64 arrays (#6062)
* gguf : add support for I64 and F64 arrays

GGML currently does not support I64 or F64 arrays, and they are not often
used in machine learning. However, if the need arises in the future, it
would be nice to add them now, so that the types sit next to the other
types I8, I16, I32 in the enums, and their type numbers are reserved.

Furthermore, with this addition the GGUF format becomes very usable for
most computational applications of NumPy (being compatible with the most
common NumPy dtypes: i8, i16, i32, i64, f32, f64), providing a faster
and more versatile alternative to the `npz` format, and a simpler
alternative to the `hdf5` format.

The change in this PR seems small and does not significantly increase the
maintenance burden. I tested this from Python using GGUFWriter/Reader
and `gguf-dump`, as well as from C; everything seems to work.

* Fix compiler warnings
2024-03-15 10:46:51 +02:00
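
A minimal sketch allocating arrays in the two new types; it assumes only that this commit adds GGML_TYPE_I64 and GGML_TYPE_F64 to the existing ggml_type enum.

    #include "ggml.h"

    int main() {
        struct ggml_init_params params = {
            /*.mem_size   =*/ 64u * 1024 * 1024,
            /*.mem_buffer =*/ NULL,
            /*.no_alloc   =*/ false,
        };
        struct ggml_context * ctx = ggml_init(params);

        // 1-D arrays in the new types, matching NumPy's default
        // integer (i64) and float (f64) dtypes
        struct ggml_tensor * ivals = ggml_new_tensor_1d(ctx, GGML_TYPE_I64, 1024);
        struct ggml_tensor * fvals = ggml_new_tensor_1d(ctx, GGML_TYPE_F64, 1024);

        (void) ivals; (void) fvals;
        ggml_free(ctx);
        return 0;
    }
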
Concedo
2ef03c9de6 fix for physical batch size 2024-03-15 16:45:20 +08:00
Xuan Son Nguyen
aab606a11f
llama : add Orion chat template (#6066) 2024-03-15 10:44:57 +02:00
slaren
b0bc9f4a9d
llama-bench : use random tokens to improve accuracy with mixtral (#6069) 2024-03-15 10:22:24 +02:00
Concedo
93d3871056 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
#	ggml-metal.m
2024-03-15 10:37:48 +08:00
Georgi Gerganov
4755afd1cb
llama : fix integer overflow during quantization (#6063) 2024-03-14 22:58:41 +02:00
Steve Grubb
6e0438da3c
gguf : fix resource leaks (#6061)
There are several places where a gguf context is allocated. A call to gguf_free
is missing in some error paths. Also, on Linux, llama-bench was missing an
fclose.
2024-03-14 20:29:32 +02:00
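
The leak pattern being fixed, as a hedged sketch: the error-path check itself is hypothetical, while gguf_init_from_file and gguf_free are the real API.

    #include "ggml.h"

    static bool load_metadata(const char * fname) {
        struct gguf_init_params params = { /*.no_alloc =*/ true, /*.ctx =*/ NULL };
        struct gguf_context * ctx = gguf_init_from_file(fname, params);
        if (ctx == NULL) {
            return false;
        }
        if (gguf_get_n_tensors(ctx) == 0) {
            gguf_free(ctx); // previously missing on error paths like this one
            return false;
        }
        // ... use the metadata ...
        gguf_free(ctx);
        return true;
    }

    int main() {
        return load_metadata("model.gguf") ? 0 : 1;
    }
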
Ondřej Čertík
727107707a
gguf-py : bump version to 0.8.0 (#6060) 2024-03-14 19:57:31 +02:00
Michael Podvitskiy
69ff61397d
llama : support models without vocabulary (#5798)
* additional methods to read model and ctx parameters

* vocab size as a part of a model metadata

* models without vocabulary, convert.py part

* models without vocabulary, llama.cpp part

* PR clean up

* converter script fixes

* llama_vocab_type update (renamed the new key)

* pr review fixes

* revert function renaming

* one more NoVocab assert
2024-03-14 18:21:56 +02:00
Concedo
f20fb7d778 mmq defaults to disabled only if full offload is possible 2024-03-14 23:34:45 +08:00