koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-09 19:46:11 +00:00

Author	SHA1	Message	Date
Concedo	47cbfd6150	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # README.md # llama.cpp # scripts/sync-ggml-am.sh # scripts/sync-ggml.last # scripts/sync-ggml.sh # tests/test-backend-ops.cpp	2024-05-17 22:30:41 +08:00
slaren	344f9126cc	ggml : tag ggml_tensor::backend as deprecated (#7290 )	2024-05-15 15:08:48 +02:00
Concedo	2ee808a747	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # README.md # ci/run.sh # llama.cpp # models/ggml-vocab-llama-bpe.gguf.inp # models/ggml-vocab-llama-bpe.gguf.out # requirements.txt # scripts/compare-llama-bench.py # scripts/sync-ggml.last # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-grammar-integration.cpp # tests/test-tokenizer-1-bpe.cpp	2024-05-14 19:28:47 +08:00
k.h.lai	30e70334f7	llava-cli: fix base64 prompt (#7248 )	2024-05-14 00:02:36 +10:00
Justine Tunney	4e3880978f	Fix memory bug in grammar parser (#7194 ) The llama.cpp grammar parser had a bug where forgetting to add a closing quotation mark to strings would cause parsing to crash. Anyone running a server on a public endpoint is advised to upgrade. To reproduce this bug ./llamafile -m foo.gguf -p bar --grammar 'root::="' Credit for discovering and reporting this issue goes to Eclypsium Security Researcher Richard Johnson <Richard.johnson@eclypsium.com>.	2024-05-10 21:01:08 +10:00
Andrei	d11afd6652	llava : fix moondream support (#7163 ) * Revert "Revert "llava : add support for moondream vision language model (#6899)"" This reverts commit `9da243b36a`. * Fix num_positions and embeddings initialization	2024-05-10 09:41:10 +03:00
Georgi Gerganov	9da243b36a	Revert "llava : add support for moondream vision language model (#6899 )" This reverts commit `46e12c4692`.	2024-05-08 22:14:39 +03:00
Concedo	bc39b4d98a	Merge branch 'upstream' into concedo_experimental # Conflicts: # CMakeLists.txt # README.md # ci/run.sh # docs/BLIS.md # flake.lock # grammars/README.md	2024-05-08 09:58:23 +08:00
omahs	04976db7a8	docs: fix typos (#7124 ) * fix typo * fix typos * fix typo * fix typos * fix typo * fix typos	2024-05-07 18:20:33 +03:00
Concedo	89db8afded	revert moondream to try and fix llava	2024-05-04 10:07:54 +08:00
Concedo	17a24d753c	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/main-intel.Dockerfile # .devops/main-vulkan.Dockerfile # .devops/server-intel.Dockerfile # .devops/server-vulkan.Dockerfile # .github/workflows/bench.yml # .github/workflows/build.yml # .github/workflows/python-lint.yml # .github/workflows/server.yml # .gitignore # Makefile # README-sycl.md # README.md # ci/run.sh # flake.lock # llama.cpp # models/ggml-vocab-falcon.gguf # models/ggml-vocab-llama-spm.gguf # models/ggml-vocab-mpt.gguf # models/ggml-vocab-stablelm.gguf # models/ggml-vocab-starcoder.gguf # requirements.txt # scripts/check-requirements.sh # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-grammar-integration.cpp # tests/test-tokenizer-0-bpe.py # tests/test-tokenizer-0-spm.py # tests/test-tokenizer-1-spm.cpp	2024-04-30 21:04:17 +08:00
cpumaxx	ffe666572f	llava-cli : multiple images (#6969 ) Co-authored-by: root <root@nenya.lothlorien.ca>	2024-04-29 17:34:24 +03:00
Concedo	544c36f751	merge, deprecate openblas	2024-04-26 19:24:59 +08:00
vik	46e12c4692	llava : add support for moondream vision language model (#6899 ) * add support for moondream vision language model This required making the following changes to the CLIP model: 1. Support for patch embedding bias. 2. Make class embedding and pre-layernorm optional. 3. Add support for post-layernorm. * Update examples/llava/clip.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-25 22:38:31 +03:00
Daniel Bevenius	4ab99d8d47	clip : rename lerp function to avoid conflict (#6894 ) This commit renamesthe lerp (linear interpolation) function in clip.cpp to avoid a conflict with the lerp function in the <cmath> standard C++ library when using c++20. The motivation for this change is to enable projects that use c++20 to be able to compile clip.cpp without having to resort to patching it. The lerp function was added to cmath in version C++20 (202002L) and is why this is not causing any issue at the moment as C++11/C++17 is currently used by llama.cpp. I realize that llama.cpp uses either C++11 (or C++17 in the case for SYCL) but wanted to ask if this would be an acceptable change just the same. Refs: https://en.cppreference.com/w/cpp/numeric/lerp Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-04-25 15:38:14 +03:00
Concedo	b4d2031215	merged, added ability to render special tokens	2024-04-22 18:19:58 +08:00
Justine Tunney	89b0bf0d5d	llava : use logger in llava-cli (#6797 ) This change removes printf() logging so llava-cli is shell scriptable.	2024-04-21 15:19:04 +03:00
Pedro Cuenca	b97bc3966e	llama : support Llama 3 HF conversion (#6745 ) * Support Llama 3 conversion The tokenizer is BPE. * style * Accept suggestion Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com> * llama : add llama_token_is_eog() ggml-ci * llama : auto-detect more EOT tokens when missing in KV data * convert : replacing EOS token is a hack * llama : fix codegemma EOT token + add TODOs * llama : fix model type string for 8B model --------- Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-21 14:50:41 +03:00
Concedo	9a25d77cc1	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/docker.yml # Makefile # README-sycl.md # README.md # ci/run.sh # ggml-cuda.cu # ggml.c # grammars/README.md # scripts/get-wikitext-2.sh # scripts/hf.sh # scripts/sync-ggml.last # tests/test-backend-ops.cpp # tests/test-grammar-integration.cpp # tests/test-json-schema-to-grammar.cpp	2024-04-14 21:18:39 +08:00
Rene Leonhardt	5c4d767ac0	chore: Fix markdown warnings (#6625 )	2024-04-12 10:52:36 +02:00
Jared Van Bortel	1b67731e18	BERT tokenizer fixes (#6498 ) Key changes: * BERT conversion: fix abuse of LlamaHfVocab, do not set BOS or EOS * Nomic Embed conversion: pad vocab instead of slicing embedding tensor * llama_tokenize: handle added special tokens like HF does	2024-04-09 13:44:08 -04:00
Concedo	81ac0e5656	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/full-cuda.Dockerfile # .devops/full-rocm.Dockerfile # .devops/full.Dockerfile # .devops/llama-cpp-clblast.srpm.spec # .devops/llama-cpp-cuda.srpm.spec # .devops/llama-cpp.srpm.spec # .devops/nix/package.nix # .devops/server-cuda.Dockerfile # .devops/server-intel.Dockerfile # .devops/server-rocm.Dockerfile # .devops/server-vulkan.Dockerfile # .devops/server.Dockerfile # .github/workflows/build.yml # .github/workflows/code-coverage.yml # .github/workflows/docker.yml # .github/workflows/editorconfig.yml # .github/workflows/gguf-publish.yml # .github/workflows/nix-ci-aarch64.yml # .github/workflows/nix-ci.yml # .github/workflows/python-check-requirements.yml # .github/workflows/python-lint.yml # .github/workflows/server.yml # .github/workflows/zig-build.yml # CMakeLists.txt # Makefile # README-sycl.md # README.md # ci/run.sh # examples/gguf-split/gguf-split.cpp # flake.lock # flake.nix # llama.cpp # scripts/compare-llama-bench.py # scripts/sync-ggml-am.sh # scripts/sync-ggml.last # scripts/sync-ggml.sh # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-chat-template.cpp	2024-04-07 22:07:27 +08:00
Concedo	a530afa1e4	Merge commit '`280345968d`' into concedo_experimental # Conflicts: # .devops/full-cuda.Dockerfile # .devops/llama-cpp-cuda.srpm.spec # .devops/main-cuda.Dockerfile # .devops/nix/package.nix # .devops/server-cuda.Dockerfile # .github/workflows/build.yml # CMakeLists.txt # Makefile # README.md # ci/run.sh # docs/token_generation_performance_tips.md # flake.lock # llama.cpp # scripts/LlamaConfig.cmake.in # scripts/compare-commits.sh # scripts/server-llm.sh # tests/test-quantize-fns.cpp	2024-04-07 20:27:17 +08:00
Concedo	9c0fbf9f73	Merge commit '`ad3a0505e3`' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/close-issue.yml # .github/workflows/code-coverage.yml # .github/workflows/docker.yml # .github/workflows/editorconfig.yml # .github/workflows/nix-ci-aarch64.yml # .github/workflows/nix-ci.yml # .github/workflows/python-check-requirements.yml # .github/workflows/python-lint.yml # .github/workflows/server.yml # .github/workflows/zig-build.yml # .gitignore # CMakeLists.txt # Makefile # README-sycl.md # README.md # build.zig # common/CMakeLists.txt # llama.cpp # tests/CMakeLists.txt # tests/test-backend-ops.cpp	2024-04-06 18:32:57 +08:00
Ziang Wu	66ba560256	llava : fix MobileVLM (#6364 ) * fix empty bug * Update MobileVLM-README.md added more results on devices * Update MobileVLM-README.md * Update MobileVLM-README.md * Update MobileVLM-README.md * Update MobileVLM-README.md * Update MobileVLM-README.md * Update MobileVLM-README.md * Update examples/llava/MobileVLM-README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update MobileVLM-README.md remove gguf links --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-28 16:33:10 +02:00
Ziang Wu	d0e2f6416b	doc: fix typo in MobileVLM-README.md (#6181 )	2024-03-28 13:03:30 +09:00
slaren	280345968d	cuda : rename build flag to LLAMA_CUDA (#6299 )	2024-03-26 01:16:01 +01:00
Ziang Wu	f9c7ba3447	llava : update MobileVLM-README.md (#6180 )	2024-03-20 17:29:51 +02:00
Ziang Wu	272935b281	llava : add MobileVLM_V2 backup (#6175 ) * Add MobileVLM_V2 backup * Update MobileVLM-README.md * Update examples/llava/MobileVLM-README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/llava/convert-image-encoder-to-gguf.py Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * clip : fix whitespace * fix deifinition mistake in clip.cpp --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-20 17:02:32 +02:00
Georgi Gerganov	d795988d9e	Revert "llava : add a MobileVLM_V2-1.7B backup (#6152 )" This reverts commit `f8c4e745e1`.	2024-03-20 13:29:49 +02:00
Ziang Wu	f8c4e745e1	llava : add a MobileVLM_V2-1.7B backup (#6152 ) * Add MobileVLM_V2 backup * Update MobileVLM-README.md * Update examples/llava/MobileVLM-README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/llava/convert-image-encoder-to-gguf.py Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * clip : fix whitespace --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-20 13:20:37 +02:00
Concedo	a3fa919c67	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # Makefile # flake.lock # ggml-cuda.cu # ggml-cuda.h	2024-03-19 18:57:22 +08:00
Felix	104f5e0fc1	clip : fix memory leak (#6138 )	2024-03-18 17:40:22 +02:00
Concedo	8b360b661c	Merge branch 'upstream' into concedo_experimental # Conflicts: # Makefile # README.md # common/common.h	2024-03-17 23:03:12 +08:00
Ting Lou	4e9a7f7f7f	llava : change API to pure C style for Rust FFI bindgen (#6079 ) Co-authored-by: Lou Ting <louting.t@alibaba-inc.com>	2024-03-15 16:31:05 +02:00
Concedo	93d3871056	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # Makefile # ggml-metal.m	2024-03-15 10:37:48 +08:00
Steve Grubb	6e0438da3c	gguf : fix resource leaks (#6061 ) There several places where a gguf context is allocated. A call to gguf_free is missing in some error paths. Also on linux, llama-bench was missing a fclose.	2024-03-14 20:29:32 +02:00
Jian Liao	15a333260a	readme : improve readme for Llava-1.6 example (#6044 ) Co-authored-by: Jian Liao <jianliao@adobe.com>	2024-03-14 13:18:23 +02:00
Concedo	6a32c14e86	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # Makefile # README-sycl.md # README.md # flake.lock # scripts/sync-ggml-am.sh # scripts/sync-ggml.last # scripts/sync-ggml.sh # tests/.gitignore # tests/test-backend-ops.cpp	2024-03-11 23:00:47 +08:00
Concedo	227f59dab6	added a simple program to do quantization for clip models	2024-03-11 21:50:30 +08:00
Georgi Gerganov	5b09797321	ggml : remove old quantization functions (#5942 ) * ggml : remove old quantization functions ggml-ci * ggml : simplify ggml_quantize_chunk ggml-ci * ggml : restrict correctness ggml-ci * ggml : remove hist data from the quantization API ggml-ci * tests : remove hist usage in test-backend-ops ggml-ci * vulkan : remove hist and fix typo	2024-03-09 15:53:59 +02:00
Georgi Gerganov	ab336a9d5e	code : normalize enum names (#5697 ) * coda : normalize enum names ggml-ci * code : cont * code : cont	2024-02-25 12:09:09 +02:00
Daniel Bevenius	cc6cac08e3	llava : add --skip-unknown to 1.6 convert.py (#5632 ) This commit adds the `--skip-unknown` option to the convert.py script and removes the saving of the updated checkpoints to avoid updating possibly checked out files. The motivation for this change is that this was done for 1.5 in Commit `fc0c8d286a` ("llava : update surgery script to not remove tensors") and makes the examples more consistent. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-02-21 15:36:57 +02:00
CJ Pais	6560bed3f0	server : support llava 1.6 (#5553 ) * server: init working 1.6 * move clip_image to header * remove commented code * remove c++ style from header * remove todo * expose llava_image_embed_make_with_clip_img * fix zig build	2024-02-20 21:07:22 +02:00
Daniel Bevenius	4ed8e4fbef	llava : add explicit instructions for llava-1.6 (#5611 ) This commit contains a suggestion for the README.md in the llava example. The suggestion adds explicit instructions for how to convert a llava-1.6 model and run it using llava-cli. The motivation for this is that having explicit instructions similar to the 1.5 instructions will make it easier for users to try this out. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-02-20 19:30:27 +02:00
Georgi Gerganov	1387cf60f7	llava : remove extra cont (#5587 )	2024-02-19 15:23:17 +02:00
slaren	6fd413791a	llava : replace ggml_cpy with ggml_cont	2024-02-19 15:09:43 +02:00
Daniel Bevenius	7084755396	llava : avoid changing the original BakLLaVA model (#5577 ) This is a follup of Commit `fc0c8d286a` ("llava : update surgery script to not remove tensors") but this time the change is to the BakLLaVA specific part of the surgery script. I've been able to test this using SkunkworksAI/BakLLaVA-1 and it works as expected using the instructions in README.md. Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>	2024-02-19 10:31:59 +02:00
Daniel Bevenius	fc0c8d286a	llava : update surgery script to not remove tensors (#5536 ) This commit updates the surgery script to not remove the tensors from the model file. For this to work the `--skip-unknown` flag is added as an argument to the convert.py script in README.md. The motivation for this change is that the surgery script currently removes the projector tensors from the model file. If the model was checked out from a repository, the model file will have been updated and have to be checked out again to reset this effect. If this can be avoided I think it would be preferable. I did not perform this change for BakLLaVA models as I am not sure how that part works.	2024-02-18 18:19:23 +02:00
Herman Semenov	4cb0727698	llava : removed excess free(NULL) operation (#5531 )	2024-02-16 14:43:23 +02:00

1 2

95 commits