Commit graph

4371 commits

DAN™
d8b009a945
Remove unneeded header file. (#6158) 2024-03-19 17:16:09 +01:00
Concedo
8131616454 updated lite 2024-03-20 00:13:44 +08:00
Concedo
942fb4b413 fixed removed ref (+1 squashed commits)
Squashed commits:

[93f3c270] fixed removed ref (+1 squashed commits)

Squashed commits:

[df361250] remove some files
2024-03-19 19:33:56 +08:00
Pierrick Hymbert
d0d5de42e5
gguf-split: split and merge gguf per batch of tensors (#6135)
* gguf-split: split and merge gguf files per tensor

* gguf-split: build with make toolchain

* gguf-split: rename `--split-tensors-size` to `--split-max-tensors`. Set the general.split_count KV in all splits

* split : minor style + fix compile warnings

* gguf-split: remove --upload not implemented

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-19 12:05:44 +01:00
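
For illustration only (not code from the PR): a minimal C++ sketch of the split-count arithmetic behind `--split-max-tensors`, using the gguf C API from ggml.h. The file name and the 256-tensor limit are assumptions.

    #include "ggml.h"
    #include <cstdio>

    // Count how many split files a model needs for a given
    // --split-max-tensors value (sketch, not the gguf-split source).
    int main() {
        struct gguf_init_params params = { /*.no_alloc =*/ true, /*.ctx =*/ NULL };
        struct gguf_context * ctx = gguf_init_from_file("model.gguf", params);
        if (ctx == NULL) {
            std::fprintf(stderr, "failed to open model.gguf\n");
            return 1;
        }

        const int n_tensors   = (int) gguf_get_n_tensors(ctx);
        const int max_tensors = 256; // hypothetical --split-max-tensors value

        // each split holds at most max_tensors tensors
        const int n_split = (n_tensors + max_tensors - 1) / max_tensors;
        std::printf("%d tensors -> %d splits\n", n_tensors, n_split);

        gguf_free(ctx);
        return 0;
    }
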
Concedo
a3fa919c67 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	Makefile
#	flake.lock
#	ggml-cuda.cu
#	ggml-cuda.h
2024-03-19 18:57:22 +08:00
Georgi Gerganov
b80cf3b2d1
common : disable repeat penalties by default (#6127) 2024-03-19 10:21:54 +02:00
slaren
970a48060a
ci : exempt some labels from being tagged as stale (#6140) 2024-03-19 10:06:54 +02:00
DAN™
4c28b82529
common : print usage on '-h' and '--help' (#6145) 2024-03-19 07:59:36 +02:00
github-actions[bot]
2d15886bb0 flake.lock: Update
Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/9df3e30ce24fd28c7b3e2de0d986769db5d6225d' (2024-03-06)
  → 'github:NixOS/nixpkgs/d691274a972b3165335d261cc4671335f5c67de9' (2024-03-14)
2024-03-18 18:51:30 +00:00
Jared Van Bortel
d199ca79f2
mpt : implement backwards compatibility with duped output tensor (#6139) 2024-03-18 12:49:02 -04:00
Felix
104f5e0fc1
clip : fix memory leak (#6138) 2024-03-18 17:40:22 +02:00
slaren
5e1b7f94a0
backend : set max split inputs to GGML_MAX_SRC (#6137) 2024-03-18 16:33:44 +01:00
Concedo
073a279e70 change reference from kobold horde to ai horde 2024-03-18 22:35:49 +08:00
Georgi Gerganov
ac9ee6a4ad
ci : disable stale issue messages (#6126) 2024-03-18 13:45:38 +02:00
Georgi Gerganov
4f6d1337ca
ci : temporary disable sanitizer builds (#6128) 2024-03-18 13:45:27 +02:00
slaren
2bf8d0f7c4
backend : offload large batches to GPU (#6083)
* backend : offload large batches to GPU

* fix hip

* code cleanup

* fix CUDA split buffers

* Update ggml-backend-impl.h

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* cuda : fix memset without set_device

* imatrix : remove sched affix from weight names

* sched : add a new split if the current one has too many inputs
reduce max inputs per split
more cleanup

* update backends

ggml-ci

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-03-18 11:03:04 +01:00
DAN™
496bc79bc2
common : tidy-up argument parsing (#6105)
* Tidy-up argument parsing.

* Missing ref.

* common : minor

* common : add static classifier

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-18 10:27:44 +02:00
Thérence
9b03719ad7
convert : add support for CamembertModel architecture (#6119)
Add support for the CamembertModel architecture used by:
https://huggingface.co/dangvantuan/sentence-camembert-large
2024-03-18 10:17:00 +02:00
Romain D
3a6efdd03c
convert : use f32 outtype for bf16 tensors (#6106)
The old behaviour was to use f16, but converting bf16 to f16 is not lossless.
Change the outtype to f32 so the conversion is lossless by default.
2024-03-18 10:04:41 +02:00
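
To see why the old default was lossy, a standalone sketch (plain C++, no ggml types, values chosen for illustration): bf16 is just the top 16 bits of an IEEE-754 f32, so widening it back to f32 is exact, whereas f16 tops out at 65504 and cannot hold large bf16 magnitudes.

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Truncate an f32 to bf16 by keeping the top 16 bits.
    static uint16_t f32_to_bf16(float f) {
        uint32_t bits;
        std::memcpy(&bits, &f, sizeof(bits));
        return (uint16_t) (bits >> 16); // drop the low mantissa bits
    }

    // Widen bf16 back to f32; exact, since both share an 8-bit exponent.
    static float bf16_to_f32(uint16_t h) {
        uint32_t bits = (uint32_t) h << 16;
        float f;
        std::memcpy(&f, &bits, sizeof(f));
        return f;
    }

    int main() {
        const float F16_MAX = 65504.0f; // largest finite IEEE-754 half value
        float x = bf16_to_f32(f32_to_bf16(1.0e6f)); // an ordinary bf16 value
        std::printf("bf16 value widened to f32: %g (exact)\n", x);
        std::printf("fits in f16? %s\n", x <= F16_MAX ? "yes" : "no -> overflow");
        return 0;
    }
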
Concedo
ffad5be712 updated docker link 2024-03-18 10:30:23 +08:00
Pierrick Hymbert
d01b3c4c32
common: llama_load_model_from_url using --model-url (#6098)
* common: llama_load_model_from_url with libcurl dependency

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-17 19:12:37 +01:00
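
A hedged usage sketch of the new helper. The (url, local path, params) parameter order of llama_load_model_from_url below is an assumption from the commit title, not a verified signature, and the URL is a placeholder.

    #include "common.h"
    #include "llama.h"

    int main() {
        llama_model_params mparams = llama_model_default_params();
        // downloads via libcurl, then loads the cached file (sketch)
        llama_model * model = llama_load_model_from_url(
            "https://example.com/models/model.gguf", // value of --model-url
            "model.gguf",                            // local path to cache it
            mparams);
        if (model == nullptr) {
            return 1;
        }
        llama_free_model(model);
        return 0;
    }
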
Georgi Gerganov
cd776c37c9
ci : close all stale issues at once (#6115) 2024-03-17 18:51:57 +01:00
GainLee
dc0f612548
ggml : fix error when finding the transfer queue family index (#6094)
Co-authored-by: GainLee <ligen@meizu.com>
2024-03-17 18:12:22 +01:00
Concedo
8b360b661c Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	Makefile
#	README.md
#	common/common.h
2024-03-17 23:03:12 +08:00
Concedo
5410e4644a symlink docs 2024-03-17 22:27:26 +08:00
AmirAli Mirian
c47cf414ef
ggml : add AVX512F SIMD (#6088) 2024-03-16 17:52:02 +02:00
Daniel Bevenius
b5f4ae09c3
gritlm : add initial README.md (#6086)
* gritlm: add initial README.md to examples/gritlm

This commit adds a suggestion for an initial README.md for the gritlm
example.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! gritlm: add initial README.md to examples/gritlm

Use the `scripts/hf.sh` script to download the model file.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

* squash! gritlm: add initial README.md to examples/gritlm

Fix editorconfig-checker error in examples/gritlm/README.md.

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-03-16 17:46:29 +02:00
Xuan Son Nguyen
dfbfdd60f9
readme : add wllama as a wasm binding (#6100) 2024-03-16 17:42:08 +02:00
DAN™
15961ec04d
common : refactor nested if causing error C1061 on MSVC (#6101)
* Refactor nested if causing error C1061 on MSVC.

* Revert back and remove the elses.

* Add flag to track found arguments.
2024-03-16 17:39:15 +02:00
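
A generic sketch of the refactor the commit describes (argument names hypothetical): MSVC's C1061 limits block-nesting depth, so a deep if/else chain becomes a flat sequence of ifs plus a flag recording whether any option matched.

    #include <string>

    // Flat checks instead of a nested if/else chain; `found` tracks
    // whether the argument matched any known option.
    static bool find_arg(const std::string & arg) {
        bool found = false;
        if (arg == "-h" || arg == "--help") { found = true; /* print usage */ }
        if (arg == "-s" || arg == "--seed") { found = true; /* parse seed  */ }
        // ... many more flat checks, none nested in an else of another ...
        return found; // caller reports an invalid argument when this is false
    }

    int main() {
        return find_arg("--help") ? 0 : 1;
    }
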
Concedo
9342071f9c don't print url for localhost if remote tunnel 2024-03-16 22:19:04 +08:00
Pierrick Hymbert
a56d09a440
ci : close inactive issue with workflow (#6053)
* issues: ci - close inactive issue with workflow

* ci: close issue, change workflow schedule time
2024-03-16 14:20:53 +02:00
Concedo
7968bdebbb added more stats in perf 2024-03-16 16:53:48 +08:00
slaren
d84c48505f
llama : fix Baichuan2 13B (#6092) 2024-03-15 23:14:16 +02:00
Theia Vogel
877b4d0c62
llama : add support for control vectors (#5970)
* control vector api and implementation

* control-vectors : minor code style updates

* disable control vector when data == nullptr

use -1 for disabled range (also on init) in case we ever support controlling layer 0 (embeddings)

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-15 22:43:02 +02:00
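
A hedged sketch of the new API: it assumes this PR adds llama_control_vector_apply() to llama.h with (ctx, data, len, n_embd, il_start, il_end) parameters, which is inferred from the commit message rather than checked against the header.

    #include "llama.h"
    #include <vector>

    void steer(llama_context * ctx, int n_embd, int n_layer) {
        // one steering direction per layer, flattened to n_layer * n_embd floats
        std::vector<float> cvec((size_t) n_layer * n_embd, 0.0f);
        // ... fill cvec with a trained control direction ...

        // apply to layers 1..n_layer; layer 0 (embeddings) is kept free in
        // case it is ever supported, per the commit message
        llama_control_vector_apply(ctx, cvec.data(), cvec.size(), n_embd, 1, n_layer);

        // data == nullptr disables the control vector again; the commit also
        // uses -1 for the disabled layer range
        llama_control_vector_apply(ctx, nullptr, 0, n_embd, -1, -1);
    }
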
Andrew Canis
12247f4c69
llama : add Command-R support (#6033)
Information about the Command-R 35B model (128k context) can be found at:
	https://huggingface.co/CohereForAI/c4ai-command-r-v01

Based on the llama2 model with a few changes:

1) New hyperparameter to scale output logits (logit_scale)
2) Uses LayerNorm instead of RMSNorm
3) Transformer layers have a single shared LayerNorm that feeds into both the
   self-attention and FFN layers in parallel (see the sketch after this list).
   There is no post-attention LayerNorm.
4) No support for Rotary Position Embeddings (RoPE) scaling
5) No biases used
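
A toy sketch of points 1-3, using stand-in stubs rather than llama.cpp graph code; the logit_scale value is made up.

    #include <cstddef>
    #include <vector>

    using Tensor = std::vector<float>;

    // Identity/zero stubs so the block wiring below is executable; the
    // real kernels live in llama.cpp's compute graph.
    static Tensor layer_norm(const Tensor & x) { return x; }
    static Tensor self_attn (const Tensor & x) { return Tensor(x.size(), 0.0f); }
    static Tensor ffn       (const Tensor & x) { return Tensor(x.size(), 0.0f); }

    static Tensor add(const Tensor & a, const Tensor & b) {
        Tensor r(a.size());
        for (size_t i = 0; i < a.size(); ++i) r[i] = a[i] + b[i];
        return r;
    }

    // One Command-R layer: a single shared LayerNorm feeds self-attention
    // and the FFN in parallel; there is no post-attention LayerNorm.
    static Tensor command_r_block(const Tensor & x) {
        Tensor h = layer_norm(x);
        return add(x, add(self_attn(h), ffn(h)));
    }

    int main() {
        Tensor x(8, 1.0f);
        Tensor y = command_r_block(x);
        const float logit_scale = 0.0625f;    // hypothetical hparam value
        for (auto & v : y) v *= logit_scale;  // output logits are scaled
        return 0;
    }
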

Find GGUF files here:
	https://huggingface.co/andrewcanis/c4ai-command-r-v01-GGUF

To convert model to GGUF format yourself:

1) Download Command-R Hugging Face safetensors:
	git lfs install
	git clone https://huggingface.co/CohereForAI/c4ai-command-r-v01

2) Run:
	python3 convert-hf-to-gguf.py --outtype f16 ./c4ai-command-r-v01
2024-03-15 22:41:22 +02:00
Ting Lou
4e9a7f7f7f
llava : change API to pure C style for Rust FFI bindgen (#6079)
Co-authored-by: Lou Ting <louting.t@alibaba-inc.com>
2024-03-15 16:31:05 +02:00
slaren
3020327f6c
cuda : disable unused cudaLaunchHostFunc code (#6078) 2024-03-15 14:24:03 +02:00
Neo Zhang Jianyu
46acb36767
fix error when setting the main GPU (#6073) 2024-03-15 18:53:53 +08:00
Georgi Gerganov
131b058409
make : ggml-metal.o depends on ggml.h 2024-03-15 11:38:40 +02:00
AidanBeltonS
753e36f650
[SYCL] Fix non-Intel device selection (#6042)
* Fix non-Intel device selection

* Update ggml-sycl.cpp

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>

* Update ggml-sycl.cpp

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>

---------

Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2024-03-15 14:56:20 +05:30
Ondřej Čertík
7ce2c77f88
gguf : add support for I64 and F64 arrays (#6062)
* gguf : add support for I64 and F64 arrays

GGML currently does not support I64 or F64 arrays, and they are not often
used in machine learning. However, if the need arises in the future, it
would be nice to add them now, so that the types sit next to the other
types I8, I16, I32 in the enums, and their type numbers are reserved.

Furthermore, with this addition the GGUF format becomes very usable for
most computational applications of NumPy (being compatible with the most
common NumPy dtypes: i8, i16, i32, i64, f32, f64), providing a faster
and more versatile alternative to the `npz` format, and a simpler
alternative to the `hdf5` format.

The change in this PR seems small and does not significantly increase the
maintenance burden. I tested this from Python using GGUFWriter/Reader
and `gguf-dump`, as well as from C; everything seems to work.

* Fix compiler warnings
2024-03-15 10:46:51 +02:00
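
A minimal sketch allocating arrays in the two new types; it assumes only that this commit adds GGML_TYPE_I64 and GGML_TYPE_F64 to the existing ggml_type enum.

    #include "ggml.h"

    int main() {
        struct ggml_init_params params = {
            /*.mem_size   =*/ 64u * 1024 * 1024,
            /*.mem_buffer =*/ NULL,
            /*.no_alloc   =*/ false,
        };
        struct ggml_context * ctx = ggml_init(params);

        // 1-D arrays in the new types, matching NumPy's default
        // integer (i64) and float (f64) dtypes
        struct ggml_tensor * ivals = ggml_new_tensor_1d(ctx, GGML_TYPE_I64, 1024);
        struct ggml_tensor * fvals = ggml_new_tensor_1d(ctx, GGML_TYPE_F64, 1024);

        (void) ivals; (void) fvals;
        ggml_free(ctx);
        return 0;
    }
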
Concedo
2ef03c9de6 fix for physical batch size 2024-03-15 16:45:20 +08:00
Xuan Son Nguyen
aab606a11f
llama : add Orion chat template (#6066) 2024-03-15 10:44:57 +02:00
slaren
b0bc9f4a9d
llama-bench : use random tokens to improve accuracy with mixtral (#6069) 2024-03-15 10:22:24 +02:00
Concedo
93d3871056 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
#	ggml-metal.m
2024-03-15 10:37:48 +08:00
Georgi Gerganov
4755afd1cb
llama : fix integer overflow during quantization (#6063) 2024-03-14 22:58:41 +02:00
Steve Grubb
6e0438da3c
gguf : fix resource leaks (#6061)
There are several places where a gguf context is allocated. A call to gguf_free
is missing in some error paths. Also, on Linux, llama-bench was missing an
fclose.
2024-03-14 20:29:32 +02:00
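
The leak pattern being fixed, as a hedged sketch: the error-path check itself is hypothetical, while gguf_init_from_file and gguf_free are the real API.

    #include "ggml.h"

    static bool load_metadata(const char * fname) {
        struct gguf_init_params params = { /*.no_alloc =*/ true, /*.ctx =*/ NULL };
        struct gguf_context * ctx = gguf_init_from_file(fname, params);
        if (ctx == NULL) {
            return false;
        }
        if (gguf_get_n_tensors(ctx) == 0) {
            gguf_free(ctx); // previously missing on error paths like this one
            return false;
        }
        // ... use the metadata ...
        gguf_free(ctx);
        return true;
    }

    int main() {
        return load_metadata("model.gguf") ? 0 : 1;
    }
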
Ondřej Čertík
727107707a
gguf-py : bump version to 0.8.0 (#6060) 2024-03-14 19:57:31 +02:00
Michael Podvitskiy
69ff61397d
llama : support models without vocabulary (#5798)
* additional methods to read model and ctx parameters

* vocab size as a part of a model metadata

* models without vocabulary, convert.py part

* models without vocabulary, llama.cpp part

* PR clean up

* converter script fixes

* llama_vocab_type update (renamed the new key)

* pr review fixes

* revert function renaming

* one more NoVocab assert
2024-03-14 18:21:56 +02:00
Concedo
f20fb7d778 mmq defaults to disabled only if full offload is possible 2024-03-14 23:34:45 +08:00