Commit graph

54 commits

Author SHA1 Message Date
0cc4m
c3db99ea32 Allow use of OpenCL GPU-based BLAS using CLBlast instead of OpenBLAS for context processing 2023-04-10 18:20:40 +02:00
Concedo
f53238f570 Merged the upstream updates for model loading code, and ditched the legacy llama loaders since they were no longer needed. 2023-04-10 12:00:34 +08:00
comex
f963b63afa Rewrite loading code to try to satisfy everyone:
- Support all three formats (ggml, ggmf, ggjt).  (However, I didn't
  include the hack needed to support GPT4All files without conversion.
  Those can still be used after converting them with convert.py from my
  other PR.)

- Support both mmap and read (mmap is used by default, but can be
  disabled with `--no-mmap`, and is automatically disabled for pre-ggjt
  files or on platforms where mmap is not supported).

- Support multi-file models like before, but automatically determine the
  number of parts rather than requiring `--n_parts`.

- Improve validation and error checking.

- Stop using the per-file type field (f16) entirely in favor of just
  relying on the per-tensor type/size fields.  This has no immediate
  benefit, but makes it easier to experiment with different formats, and
  should make it easier to support the new GPTQ-for-LLaMa models in the
  future (I have some work in progress on that front).

- Support VirtualLock on Windows (using the same `--mlock` option as on
  Unix).

- Indicate loading progress when using mmap + mlock.  (Which led me
  to the interesting observation that on my Linux machine, with a
  warm file cache, mlock actually takes some time, whereas mmap
  without mlock starts almost instantly...)

  - To help implement this, move mlock support from ggml to the
    loading code.

- madvise/PrefetchVirtualMemory support (based on #740); see the sketch
  after this entry.

- Switch from ifstream to the `fopen` family of functions to avoid
  unnecessary copying and, when mmap is enabled, allow reusing the same
  file descriptor for both metadata reads and mmap (whereas the existing
  implementation opens the file a second time to mmap).

- Quantization now produces a single-file output even with multi-file
  inputs (not really a feature as much as 'it was easier this way').

Implementation notes:

I tried to factor the code into more discrete pieces than before.

Regarding code style: I tried to follow the code style, but I'm naughty
and used a few advanced C++ features repeatedly:

- Destructors to make it easier to ensure everything gets cleaned up.

- Exceptions.  I don't even usually use exceptions when writing C++, and
  I can remove them if desired... but here they make the loading code
  much more succinct while still properly handling a variety of errors,
  ranging from API calls failing to integer overflow and allocation
  failure.  The exceptions are converted to error codes at the
  API boundary.

Co-authored-by: Pavol Rusnak <pavol@rusnak.io> (for the bit I copied from #740)
2023-04-10 01:10:46 +02:00
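
The mmap, mlock/VirtualLock, and madvise/PrefetchVirtualMemory points above all meet in one place in a loader, which also reuses the already-open `FILE *` for the mapping as the commit describes. A minimal sketch of that cross-platform pattern, assuming a hypothetical helper named `map_model_file` (not the actual loader API):

```cpp
// Sketch only: mmap + madvise on Unix, MapViewOfFile +
// PrefetchVirtualMemory + VirtualLock on Windows. map_model_file()
// is a hypothetical helper, not the real loader entry point.
#include <cstddef>
#include <cstdio>
#include <stdexcept>

#ifdef _WIN32
#include <windows.h>
#include <io.h>      // _get_osfhandle
#else
#include <sys/mman.h>
#include <unistd.h>
#endif

void * map_model_file(FILE * fp, size_t size, bool use_mlock) {
#ifdef _WIN32
    // Reuse the handle behind the already-open FILE * for the mapping.
    HANDLE hfile = (HANDLE) _get_osfhandle(_fileno(fp));
    HANDLE hmap  = CreateFileMappingA(hfile, NULL, PAGE_READONLY, 0, 0, NULL);
    if (!hmap) throw std::runtime_error("CreateFileMapping failed");
    void * addr = MapViewOfFile(hmap, FILE_MAP_READ, 0, 0, 0);
    CloseHandle(hmap); // the view keeps the mapping alive
    if (!addr) throw std::runtime_error("MapViewOfFile failed");
    // Hint the OS to page the file in ahead of use (Windows 8+).
    WIN32_MEMORY_RANGE_ENTRY range = { addr, size };
    PrefetchVirtualMemory(GetCurrentProcess(), 1, &range, 0);
    if (use_mlock) VirtualLock(addr, size); // same role as mlock on Unix
    return addr;
#else
    // Reuse the descriptor behind the already-open FILE * for the mapping.
    void * addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fileno(fp), 0);
    if (addr == MAP_FAILED) throw std::runtime_error("mmap failed");
    madvise(addr, size, MADV_WILLNEED); // prefetch hint
    if (use_mlock) mlock(addr, size);   // pin pages in RAM
    return addr;
#endif
}
```

On pre-ggjt files or platforms without mmap support, the loader instead falls back to plain reads, which is also what `--no-mmap` forces.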
Concedo
0b904e12db Merge branch 'master' into concedo
# Conflicts:
#	Makefile
2023-04-08 17:42:09 +08:00
Concedo
d8e37bfe75 new gpt2 format supported 2023-04-08 17:35:36 +08:00
unbounded
62cfc54f77
Add quantize-stats command for testing quantization (#728)
Command that calculates some statistics over the errors introduced by
quantization, like mean square error, max error and some percentile errors for layer
weights. Should be useful for testing quantization improvements.

Exposes some internal state from ggml and llama for testing
2023-04-08 00:09:18 +02:00
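
The statistics the quantize-stats entry above names (mean square error, max error, percentile errors) reduce to a single pass plus a sort once the original and dequantized weights sit side by side. A self-contained sketch; the function name and inputs are illustrative, not the tool's actual interface:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Sketch: error statistics between original and dequantized weights,
// in the spirit of the quantize-stats tool. Assumes non-empty inputs
// of equal length.
void print_error_stats(const std::vector<float> & orig,
                       const std::vector<float> & dequant) {
    double total_sq = 0.0;
    double max_err  = 0.0;
    std::vector<float> errs(orig.size());
    for (size_t i = 0; i < orig.size(); i++) {
        double e = std::fabs(orig[i] - dequant[i]);
        errs[i] = (float) e;
        total_sq += e * e;
        max_err = std::max(max_err, e);
    }
    std::sort(errs.begin(), errs.end());
    // p-th percentile of absolute error, by rank in the sorted list.
    auto pct = [&](double p) {
        return errs[(size_t)(p / 100.0 * (errs.size() - 1))];
    };
    printf("mse = %g\n", total_sq / errs.size());
    printf("max = %g\n", max_err);
    printf("p95 = %g, p99 = %g\n", pct(95.0), pct(99.0));
}
```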
Concedo
d1c957ee64 strip symbols 2023-04-08 00:59:34 +08:00
bhubbb
698f7b5d63
make : add libllama.so target for llama-cpp-python (#797)
I was able to get llama-cpp-python working, but only when I built libllama.so with make.
2023-04-07 19:11:58 +03:00
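
A shared-library target matters here because llama-cpp-python loads libllama.so at runtime (via ctypes) rather than linking it at build time. A minimal C++ sketch of the same runtime-loading pattern; using `llama_print_system_info` as the looked-up symbol is an assumption for illustration:

```cpp
#include <cstdio>
#include <dlfcn.h>

// Sketch: load libllama.so at runtime and resolve one exported C
// symbol by name, which is what a binding does under the hood.
int main() {
    void * lib = dlopen("./libllama.so", RTLD_NOW);
    if (!lib) { fprintf(stderr, "%s\n", dlerror()); return 1; }
    // llama_print_system_info is used here purely as an example of an
    // exported extern "C" function.
    typedef const char * (*sysinfo_fn)(void);
    sysinfo_fn fn = (sysinfo_fn) dlsym(lib, "llama_print_system_info");
    if (fn) printf("%s\n", fn());
    dlclose(lib);
    return 0;
}
```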
Concedo
1d48db4f63 don't build quantize 2023-04-07 17:11:26 +08:00
Ivan Stepanov
0c44427df1
make : missing host optimizations in CXXFLAGS (#763) 2023-04-05 17:38:37 +03:00
Concedo
5c1920df43 why nobody ever told me the makefile doesn't work outside x86 xD 2023-04-05 17:15:42 +08:00
Concedo
57e9f929ee renamed misnamed ACCELERATE define, and removed all -march=native and -mtune=native flags 2023-04-05 15:22:13 +08:00
Concedo
14273fea7a integrated gpt2 support 2023-04-04 23:15:47 +08:00
Concedo
52de932842 removed main.exe to reduce clutter, added support for rep pen in gptj 2023-04-04 20:43:13 +08:00
Concedo
eb5b22dda2 rebrand to koboldcpp 2023-04-03 10:35:18 +08:00
Concedo
8dd8ab1659 Various enhancements and pygmalion.cpp integration 2023-04-03 00:04:43 +08:00
Concedo
bb965cc120 Merge branch 'master' into concedo
# Conflicts:
#	README.md
2023-04-02 17:13:28 +08:00
Concedo
9aabb0d9db massive refactor completed, GPT-J integrated 2023-04-02 17:03:30 +08:00
Fabian
c4f89d8d73
make : use -march=native -mtune=native on x86 (#609) 2023-04-02 10:17:05 +03:00
Concedo
b1f08813e3 added support for gpt4all original format 2023-04-02 00:53:46 +08:00
Concedo
085a9f90a7 still refactoring 2023-04-01 11:56:34 +08:00
Concedo
6e6125ebdb updated pyinstaller to clean temp dir, removed warning flags from makefile because they are just clutter. 2023-04-01 09:25:41 +08:00
Concedo
801b178f2a still refactoring, but need a checkpoint to prepare build for 1.0.7 2023-04-01 08:55:14 +08:00
Concedo
6b86f5ea22 halfway refactoring, wip adding other model types 2023-04-01 01:13:05 +08:00
Concedo
559a1967f7 Backwards compatibility formats all done
Merge branch 'master' into concedo

# Conflicts:
#	CMakeLists.txt
#	README.md
#	llama.cpp
2023-03-31 19:01:33 +08:00
david raistrick
1f0414feec
make : fix darwin f16c flags check (#615)
...there was no check. Ported upstream from https://github.com/zanussbaum/gpt4all.cpp/pull/2 (I don't see any clean path for upstream patches)
2023-03-30 20:34:45 +03:00
Concedo
354d4f232f fixed linux openblas build errors 2023-03-30 11:55:35 +08:00
Concedo
664b277c27 integrated libopenblas for greatly accelerated prompt processing. Windows binaries are included - feel free to build your own or to build for other platforms, but that is beyond the scope of this repo. Will fall back to non-BLAS if libopenblas is removed. 2023-03-30 00:43:52 +08:00
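
The fallback behaviour described above is naturally a compile-time switch: when the BLAS path is built in, large matrix products go through `cblas_sgemm`; otherwise a plain loop runs. A hedged sketch - the `GGML_USE_OPENBLAS` guard mirrors ggml's convention, but the surrounding function is illustrative:

```cpp
#ifdef GGML_USE_OPENBLAS
#include <cblas.h>
#endif

// Sketch: C = A * B^T for row-major float matrices (A is m x k,
// B is n x k, C is m x n), preferring BLAS when it was compiled in.
void mat_mul(const float * A, const float * B, float * C,
             int m, int n, int k) {
#ifdef GGML_USE_OPENBLAS
    // Offload to OpenBLAS: one sgemm call replaces the inner loops.
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasTrans,
                m, n, k, 1.0f, A, k, B, k, 0.0f, C, n);
#else
    // Portable fallback used when libopenblas is absent.
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int l = 0; l < k; l++) sum += A[i*k + l] * B[j*k + l];
            C[i*n + j] = sum;
        }
    }
#endif
}
```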
Concedo
49c4c225b5 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	.gitignore
#	CMakeLists.txt
#	Makefile
2023-03-29 21:08:03 +08:00
Stephan Walter
436e561931
all : be more strict about converting float to double (#458)
* Be more strict about converting float to double

* Test equivalence of round, SILU implementations

Test module is commented out in CMakeLists.txt because the tests may
take a long time, depending on how much the compiler optimizes.

* Fix softmax in perplexity.cpp

* all : prefer float over double where appropriate

* perplexity : add <cmath>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-28 19:48:20 +03:00
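
The promotion being policed in the entry above is easy to trigger by accident: a bare double literal silently lifts a whole float expression to double. A tiny illustration using the SILU shape the commit mentions (illustrative, not the repository's actual code):

```cpp
#include <cmath>

// 1.0 is a double literal: the addition and the division are carried
// out in double, then the result is truncated back to float on return.
float silu_promoting(float x) {
    return x / (1.0 + std::exp(-x));
}

// The f-suffixed literal and the float overload of std::exp keep the
// whole expression in single precision.
float silu_strict(float x) {
    return x / (1.0f + std::exp(-x));
}
```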
Concedo
bf30406f50 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	Makefile
#	README.md
2023-03-28 17:13:38 +08:00
RJ Adriaansen
4b8efff0e3
Add embedding example to Makefile (#540) 2023-03-28 09:11:09 +03:00
Concedo
57474944d6 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
#	README.md
2023-03-26 14:52:08 +08:00
Georgi Gerganov
a316a425d0
Overhaul the examples structure
- main -> examples
- utils -> examples (renamed to "common")
- quantize -> examples
- separate tools for "perplexity" and "embedding"

Hope I didn't break something!
2023-03-25 20:26:40 +02:00
Concedo
3c78124aac Merge branch 'master' into concedo
# Conflicts:
#	README.md
2023-03-25 11:20:04 +08:00
Concedo
506cd62638 changed some defaults to hopefully increase compatibility 2023-03-25 10:40:11 +08:00
Cameron Kaiser
481044d50c
additional optimizations for POWER9 (#454) 2023-03-24 17:19:26 +02:00
Concedo
1166fda943 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
#	README.md
2023-03-23 23:51:07 +08:00
Kerfuffle
a140219e81
Fix Makefile echo escape codes (by removing them). (#418) 2023-03-23 12:41:32 +01:00
Concedo
86c7457e24 Merge branch 'master' into concedo
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
#	README.md
#	main.cpp
2023-03-22 22:31:45 +08:00
Georgi Gerganov
f5a77a629b
Introduce C-style API (#370)
* Major refactoring - introduce C-style API

* Clean up

* Add <cassert>

* Add <iterator>

* Add <algorithm> ....

* Fix timing reporting and accumulation

* Measure eval time only for single-token calls

* Change llama_tokenize return meaning
2023-03-22 07:32:36 +02:00
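
A C-style API of this kind pairs naturally with the "exceptions are converted to error codes at the API boundary" note from the loading-code rewrite earlier in this log. A sketch of that boundary pattern - opaque handle, extern "C" entry points, exceptions stopped before they reach C callers; all names are illustrative, not the actual llama.h declarations:

```cpp
#include <stdexcept>

// Sketch of a C-style boundary. All names are illustrative.
struct model_ctx;                       // opaque to C callers

extern "C" {
enum api_status { API_OK = 0, API_ERROR = 1 };

model_ctx * model_load(const char * path);  // nullptr on failure
int         model_eval(model_ctx * ctx);    // returns api_status
void        model_free(model_ctx * ctx);
} // extern "C"

// --- C++ implementation side ---
struct model_ctx {
    // tensors, vocab, buffers ...
};

model_ctx * model_load(const char * path) {
    try {
        (void) path;                    // a real loader parses the file here
        return new model_ctx();         // may throw std::bad_alloc etc.
    } catch (const std::exception &) {
        return nullptr;                 // exception stops at the boundary
    }
}

int model_eval(model_ctx * ctx) {
    try {
        (void) ctx;                     // run the computation ...
        return API_OK;
    } catch (const std::exception &) {
        return API_ERROR;
    }
}

void model_free(model_ctx * ctx) {
    delete ctx;
}
```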
Alex von Gluck IV
f157088cb7
makefile: Fix CPU feature detection on Haiku (#218) 2023-03-21 18:21:06 +02:00
Kevin Lo
715d292ee0
Add OpenBSD support (#314) 2023-03-21 17:50:09 +02:00
Qingyou Meng
c3b2306b18
Makefile: slightly cleanup for Mac Intel; echo instead of run ./main -h (#335) 2023-03-21 17:44:11 +02:00
Georgi Gerganov
eb34620aec
Add tokenizer test + revert to C++11 (#355)
* Add test-tokenizer-0 to do a few tokenizations - feel free to expand
* Added option to convert-pth-to-ggml.py script to dump just the vocabulary
* Added ./models/ggml-vocab.bin containing just LLaMA vocab data (used for tests)
* Added utility to load vocabulary file from previous point (temporary implementation)
* Avoid using std::string_view and drop back to C++11 (hope I didn't break something)
* Rename gpt_vocab -> llama_vocab
* All CMake binaries go into ./bin/ now
2023-03-21 17:29:41 +02:00
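
A tokenizer test of the shape described above reduces to tokenizing fixed strings and comparing against known-good ids. A self-contained sketch of that harness; the byte-level stand-in tokenizer and the expected ids are toys for illustration, not the LLaMA tokenizer:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Toy byte-level tokenizer so the sketch is self-contained; the real
// test calls the library's tokenizer and compares against ids dumped
// from the vocabulary file.
static std::vector<int> tokenize(const std::string & text) {
    std::vector<int> ids;
    for (unsigned char c : text) ids.push_back(c);
    return ids;
}

// Compare one tokenization against a known-good reference.
static bool check(const std::string & text, const std::vector<int> & expected) {
    if (tokenize(text) != expected) {
        fprintf(stderr, "tokenization of \"%s\" differs from the reference\n",
                text.c_str());
        return false;
    }
    return true;
}

int main() {
    bool ok = true;
    ok &= check("hi",  {104, 105});      // 'h' = 104, 'i' = 105
    ok &= check("hi!", {104, 105, 33});  // '!' = 33
    return ok ? 0 : 1;                   // non-zero exit fails the test
}
```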
Casey Primozic
2e664f1ff4
Add initial AVX512 support for dot product on Linux (#320)
* Update Makefile to detect AVX512 support and add compiler flags if it's available
* Based on existing AVX2 implementation, dot product on one 32-value block of 4-bit quantized ints at a time
* Perform 8 bit -> 16 bit sign extension and multiply+add on 32 values at a time instead of 16
* Use built-in AVX512 horizontal reduce add to get sum at the end
* Manual unrolling on inner dot product loop to reduce loop counter overhead
2023-03-21 15:35:42 +01:00
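
The kernel described above follows a widen-multiply-reduce pattern. A simplified sketch of just that inner pattern, on already-unpacked signed 8-bit values; the real kernel additionally unpacks the 4-bit quantized blocks and applies per-block scale factors, which this omits:

```cpp
#include <immintrin.h>

// Simplified sketch of the AVX512 inner pattern: sign-extend
// 8-bit -> 16-bit, multiply-add adjacent pairs into 32-bit lanes,
// then horizontally reduce. Requires AVX512F + AVX512BW; assumes
// n is a multiple of 32.
int dot_i8_avx512(const signed char * a, const signed char * b, int n) {
    __m512i acc = _mm512_setzero_si512();
    for (int i = 0; i < n; i += 32) {   // 32 int8 values per iteration
        __m512i va = _mm512_cvtepi8_epi16(
            _mm256_loadu_si256((const __m256i *)(a + i)));
        __m512i vb = _mm512_cvtepi8_epi16(
            _mm256_loadu_si256((const __m256i *)(b + i)));
        // madd: multiply 16-bit lanes, add adjacent pairs into 32-bit lanes
        acc = _mm512_add_epi32(acc, _mm512_madd_epi16(va, vb));
    }
    return _mm512_reduce_add_epi32(acc); // built-in horizontal reduce
}
```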
Concedo
8d39365af6 update license, added backwards compatibility with both ggml model formats, fixed context length issues. 2023-03-20 23:43:35 +08:00
Concedo
a2c10e0d2f Merge branch 'master' into concedo
# Conflicts:
#	.devops/full.Dockerfile
#	README.md
#	main.cpp
2023-03-20 20:58:27 +08:00
Mack Straight
074bea2eb1
sentencepiece bpe compatible tokenizer (#252)
* potential out of bounds read

* fix quantize

* style

* Update convert-pth-to-ggml.py

* mild cleanup

* don't need the space-prefixing here rn since main.cpp already does it

* new file magic + version header field

* readme notice

* missing newlines

Co-authored-by: slaren <2141330+slaren@users.noreply.github.com>
2023-03-20 03:17:23 -07:00
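
The "new file magic + version header field" above is the usual guard against silently mis-reading an old or foreign file. A sketch of the read side; the constants are placeholders, not the actual ggml magic values:

```cpp
#include <cstdint>
#include <cstdio>
#include <stdexcept>

// Sketch: validate a magic number and version at the front of a model
// file before reading anything else. Constants are placeholders.
static const uint32_t FILE_MAGIC   = 0x46554C4C; // placeholder
static const uint32_t FILE_VERSION = 1;          // placeholder

void check_header(FILE * fp) {
    uint32_t magic = 0, version = 0;
    if (fread(&magic,   sizeof(magic),   1, fp) != 1 ||
        fread(&version, sizeof(version), 1, fp) != 1) {
        throw std::runtime_error("file too short to contain a header");
    }
    if (magic != FILE_MAGIC) {
        throw std::runtime_error("bad magic: not a model file");
    }
    if (version != FILE_VERSION) {
        // Distinguishing "wrong format" from "old format" lets the
        // loader print a useful "please re-convert" message.
        throw std::runtime_error("unsupported file version");
    }
}
```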
Concedo
f952b7c613 Removed junk, fixed some bugs and added support for a dynamic number of sharded files
Merge remote-tracking branch 'origin/master' into concedo

# Conflicts:
#	README.md
2023-03-19 11:13:00 +08:00