koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-13 15:39:11 +00:00

Author	SHA1	Message	Date
drbh	ee77efea2a	test : add simple grammar parsing tests (#2594) * adds simple grammar parsing tests * adds cassert header	2023-08-13 17:00:48 +03:00
Concedo	9483288e03	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile	2023-08-12 16:04:11 +08:00
byte-6174	b19edd54d5	Adding support for llama2.c models (#2559)	2023-08-12 01:17:25 +02:00
Concedo	dae9dffa6a	rename koboldcpp.dll to koboldcpp_default.dll	2023-08-11 14:54:27 +08:00
Concedo	45456fa6ca	switch noavx2 to not use openblas, as it has incompatible instructions	2023-07-30 16:47:33 +08:00
Concedo	343ae756fa	Merge branch 'master' into concedo_experimental # Conflicts: # .gitignore # CMakeLists.txt # Makefile # README.md # flake.nix # ggml-cuda.cu	2023-07-22 11:51:30 +08:00
Georgi Gerganov	3973b25a64	gitignore : fix final newline	2023-07-21 14:42:41 +03:00
Jose Maldonado	73643f5fb1	gitignore : changes for Poetry users + chat examples (#2284) A fix in Makefile for FreeBSD users. In the platfrom x86_64 is amd64. This fix resolve compilation using CFLAGS and CXXFLAGS with -march=native and -mtune=native Add two examples for interactive mode using Llama2 models (thx TheBloke for models) Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-07-21 13:53:27 +03:00
Jiří Podivín	54e3bc76fe	make : add new target for test binaries (#2244) Programs in the tests directory are now build with target tests and placed in the same location. * clean target was expanded to remove new binaries * test target binaries are listed in a variable * Locations of binaries were added to the .gitignore Signed-off-by: Jiri Podivin <jpodivin@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-07-21 13:09:16 +03:00
Concedo	374fffb9c6	Reworking rope WIP	2023-07-19 00:54:41 +08:00
Georgi Gerganov	d01bccde9f	ci : integrate with ggml-org/ci (#2250) * ci : run ctest ggml-ci * ci : add open llama 3B-v2 tests ggml-ci * ci : disable wget progress output ggml-ci * ci : add open llama 3B-v2 tg tests for q4 and q5 quantizations ggml-ci * tests : try to fix tail free sampling test ggml-ci * ci : add K-quants ggml-ci * ci : add short perplexity tests ggml-ci * ci : add README.md * ppl : add --chunks argument to limit max number of chunks ggml-ci * ci : update README	2023-07-18 14:24:43 +03:00
Concedo	b0b131499f	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # Makefile # README.md # tests/test-tokenizer-0.cpp	2023-07-11 16:12:15 +08:00
Evan Miller	5656d10599	mpi : add support for distributed inference via MPI (#2099) * MPI support, first cut * fix warnings, update README * fixes * wrap includes * PR comments * Update CMakeLists.txt * Add GH workflow, fix test * Add info to README * mpi : trying to move more MPI stuff into ggml-mpi (WIP) (#2099) * mpi : add names for layer inputs + prep ggml_mpi_graph_compute() * mpi : move all MPI logic into ggml-mpi Not tested yet * mpi : various fixes - communication now works but results are wrong * mpi : fix output tensor after MPI compute (still not working) * mpi : fix inference * mpi : minor * Add OpenMPI to GH action * [mpi] continue-on-error: true * mpi : fix after master merge * [mpi] Link MPI C++ libraries to fix OpenMPI * tests : fix new llama_backend API * [mpi] use MPI_INT32_T * mpi : factor out recv / send in functions and reuse * mpi : extend API to allow usage with outer backends (e.g. Metal) --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-07-10 18:49:56 +03:00
Concedo	523fc3be52	fixed rwkv, standardized new ctx usage	2023-07-10 20:05:53 +08:00
Concedo	dff5575647	Merge branch 'master' into concedo_experimental # Conflicts: # .gitignore # Makefile # ggml-opencl.cpp # llama.cpp	2023-06-29 17:35:28 +08:00
ningshanwutuobang	cfa0750bc9	llama : support input embeddings directly (#1910) * add interface for float input * fixed inpL shape and type * add examples of input floats * add test example for embd input * fixed sampling * add free for context * fixed add end condition for generating * add examples for llava.py * add READMD for llava.py * add READMD for llava.py * add example of PandaGPT * refactor the interface and fixed the styles * add cmake build for embd-input * add cmake build for embd-input * Add MiniGPT-4 example * change the order of the args of llama_eval_internal * fix ci error	2023-06-28 18:53:37 +03:00
Concedo	278427d9a4	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # README.md	2023-06-18 15:29:44 +08:00
Georgi Gerganov	051e1b0e6a	llama : fix kv_cache `n` init (close #1903)	2023-06-17 19:31:20 +03:00
Concedo	9f8e2f8a18	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # README.md # pocs/vdot/vdot.cpp # scripts/verify-checksum-models.py # tests/test-quantize-fns.cpp # tests/test-quantize-perf.cpp # tests/test-sampling.cpp # tests/test-tokenizer-0.cpp	2023-06-17 20:02:32 +08:00
Randall Fitzgerald	794db3e7b9	Server Example Refactor and Improvements (#1570) A major rewrite for the server example. Note that if you have built something on the previous server API, it will probably be incompatible. Check out the examples for how a typical chat app could work. This took a lot of effort, there are 24 PR's closed in the submitter's repo alone, over 160 commits and a lot of comments and testing. Summary of the changes: - adds missing generation parameters: tfs_z, typical_p, repeat_last_n, repeat_penalty, presence_penalty, frequency_penalty, mirostat, penalize_nl, seed, ignore_eos - applies missing top k sampler - removes interactive mode/terminal-like behavior, removes exclude parameter - moves threads and batch size to server command-line parameters - adds LoRA loading and matches command line parameters with main example - fixes stopping on EOS token and with the specified token amount with n_predict - adds server timeouts, host, and port settings - adds expanded generation complete response; adds generation settings, stop reason, prompt truncated, model used, and final text - sets defaults for unspecified parameters between requests - removes /next-token endpoint and as_loop parameter, adds stream parameter and server-sent events for streaming - adds CORS headers to responses - adds request logging, exception printing and optional verbose logging - adds better stopping words handling when matching multiple tokens and while streaming, or when it finishes on a partial stop string - adds printing an error when it can't bind to the host/port specified - fixes multi-byte character handling and replaces invalid UTF-8 characters on responses - prints timing and build info on startup - adds logit bias to request parameters - removes embedding mode - updates documentation; adds streaming Node.js and Bash examples - fixes code formatting - sets server threads to 1 since the current global state doesn't work well with simultaneous requests - adds truncation of the input prompt and better context reset - removes token limit from the input prompt - significantly simplified the logic and removed a lot of variables --------- Co-authored-by: anon998 <131767832+anon998@users.noreply.github.com> Co-authored-by: Henri Vasserman <henv@hot.ee> Co-authored-by: Felix Hellmann <privat@cirk2.de> Co-authored-by: Johannes Gäßler <johannesg@5d6.de> Co-authored-by: Lesaun Harvey <Lesaun@gmail.com>	2023-06-17 14:53:04 +03:00
Concedo	7ef8d740b9	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile	2023-06-16 16:37:14 +08:00
Borislav Stanimirov	602c748863	gitignore : add several entries specific to Visual Studio (#1888)	2023-06-16 09:58:11 +03:00
daboe01	cf267d1c71	make : add train-text-from-scratch (#1850) * make finetuning example accessible * fixed: targed was in wrong line * fixed: name of executable was wrong * fixed: naming of binary * fixed: model path was wrong * fixed clean target * Update examples/train-text-from-scratch/README.md --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-06-15 20:42:48 +03:00
Concedo	0833845268	merged metal patch directly into the file	2023-06-09 14:38:31 +08:00
Hyun-joo KIM	6fa1613f15	Metal inference enhancement - put hard-wired relative path of ggml-model.model file using a patch file due to lack of NSBundle environment	2023-06-09 01:47:36 +09:00
Concedo	ed603dcafc	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # README.md # docs/BLIS.md # llama.cpp # tests/test-quantize-fns.cpp	2023-06-06 23:12:01 +08:00
Georgi Gerganov	2d43387daf	ggml : fix builds, add ggml-quants-k.o (close #1712, close #1710)	2023-06-06 10:18:03 +03:00
Georgi Gerganov	7ad7750c5c	gitignore : add .clang-tidy	2023-06-06 09:55:25 +03:00
Georgi Gerganov	ecb217db4f	llama : Metal inference (#1642) * mtl : export the LLaMA computation graph * ci : disable temporary * mtl : adapt the MNIST example as starter * mtl : no need for mtl-export tool, add cli arg for main instead * mtl : export just a small part of the graph for now to make it easier * mtl : move MSL code into separate file for easy editing * mtl : initial get_rows_q4_0 kernel * mtl : confirmed get_rows_q4_0 is working correctly * mtl : add rms_norm kernel + confirm working * mtl : add mul kernel + confirm working * mtl : initial mul_mat Q4 kernel (wrong results) * mtl : mul_mat fixes (still wrong) * mtl : another mul_mat Q4 (still does not work) * mtl : working mul_mat q4 * ggml : fix handling of "view" ops in ggml_graph_import() * mtl : add rope kernel * mtl : add reshape and transpose handling * ggml : store offset as opt arg for ggml_view_xd() operators * mtl : add cpy kernel + handle view ops * mtl : confirm f16 x f32 attention mul mat * mtl : add scale kernel * mtl : add diag_mask_inf kernel * mtl : fix soft_max kernel * ggml : update ggml_nbytes() to handle non-contiguous tensors * mtl : verify V tensor contents * mtl : add f32 -> f32 cpy kernel * mtl : add silu kernel * mtl : add non-broadcast mul kernel * mtl : full GPU inference of the computation graph * mtl : optimize rms_norm and soft_max kernels * mtl : add f16 mat x f32 vec multiplication kernel * mtl : fix bug in f16 x f32 mul mat + speed-up computation * mtl : faster mul_mat_q4_0_f32 kernel * mtl : fix kernel signature + roll inner loop * mtl : more threads for rms_norm + better timing * mtl : remove printfs from inner loop * mtl : simplify implementation * mtl : add save/load vocab to ggml file * mtl : plug Metal inference into llama.cpp (very quick-n-dirty) * mtl : make it work with main example Lots of hacks but at least now it generates text * mtl : preparing for merge * mtl : clean-up ggml mtl interface + suport scratch / inplace * mtl : remove temp / debug code * metal : final refactoring and simplification * Revert "ci : disable temporary" This reverts commit 98c267fc77fe811082f672538fc91bcfc9072d63. * metal : add comments * metal : clean-up stuff, fix typos * readme : add Metal instructions * readme : add example for main	2023-06-04 23:34:30 +03:00
Concedo	20803c221e	cleaning up some old junk	2023-06-04 11:05:46 +08:00
Concedo	cd4012c3ed	minor fixes to debug logging, fixed a typo, added a new failsafe mode	2023-05-23 21:31:42 +08:00
Concedo	587308a202	fixed some build errors on linux, changed icon resolution, added more error printing	2023-05-22 12:18:42 +08:00
Concedo	e01e373e63	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile # ggml.c # llama.cpp	2023-05-14 11:34:41 +08:00
Georgi Gerganov	0cd22e190a	llama : fix various warnings	2023-05-13 11:23:15 +03:00
Concedo	e9caff1cda	Interim merge. Do not use. Merge branch 'master' into concedo_experimental # Conflicts: # README.md # SHA256SUMS # examples/quantize/quantize.cpp # ggml-opencl.c # ggml.c # ggml.h # llama.cpp # llama.h	2023-05-12 23:20:27 +08:00
Georgi Gerganov	b9fd7eee57	ggml : remove bit shuffling (#1405) * ggml : remove Q4_0 bit shufling (ARM NEON) * ggml : remove Q4_1 bit shuffling (ARM NEON + reference) * ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON) * ggml : remove Q4_2 bit shuffling (WIP, BROKEN) * ggml : remove Q5_0 bit shuffling (ARM NEON) * ggml : 2x faster scalar implementations * ggml : remove Q5_1 bit shuffling (ARM NEON + scalar) * ggml : simplify scalar dot * ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit * ggml : fix Q4_1 quantization * ggml : update cuBLAS + normalize variable names * ggml : remove Q4_2 mode * ggml : minor formatting * ggml : fix Q5_0 quantization * scripts : add script for measuring the time per token * AVX implementations (#1370) * ggml : uniform 5th bit extraction * llama : produce error upon loading old model files * llama : fix model magic/version write * ggml : speed-up Q5_0 + Q5_1 at 4 threads * ggml : preserve old Q4 and Q5 formats * ggml : simplify Q8_1 - no need for low / high sums anymore * ggml : fix Q8_0 and Q8_1 rounding * Revert "AVX implementations (#1370)" This reverts commit 948d124837f9d287d8490f41338e0e4cceb0814f. * ggml : fix AVX2 implementation * sha : update hashes for 7B and 13B * readme : update timings + remove warning banner * llama : update v2 PR number to 1405 * ggml : fix WASM comments * ggml : back to original bit order * readme : add note that Q4 and Q5 have been changed * llama : fix return for unknown version --------- Co-authored-by: Stephan Walter <stephan@walter.name>	2023-05-12 00:23:08 +03:00
Concedo	54194911ac	Merge branch 'master' into concedo_experimental # Conflicts: # README.md	2023-05-09 16:50:43 +08:00
Georgi Gerganov	f9a6364912	llama : require first token to be BOS (#1303) * llama : require first token to be BOS * scripts : add ppl-run-all.sh * perplexity : add BOS for each chunk * readme : update perplexity values after BOS fix * perplexity : add clarifying comments	2023-05-08 17:41:54 +03:00
Concedo	62beded0e7	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # Makefile # README.md	2023-05-07 19:10:01 +08:00
Jed Fox	3924088512	Remove default arguments from sampling functions (#1343)	2023-05-06 17:01:47 -04:00
Concedo	966cd2ce91	Merge remote-tracking branch 'temp/concedo' into concedo_experimental # Conflicts: # koboldcpp.py	2023-05-02 22:43:34 +08:00
Sergey Kucher	069b3d4c37	Adds --mlock argument	2023-05-02 16:19:37 +03:00
Concedo	94827172e0	Merge branch 'master' into concedo # Conflicts: # CMakeLists.txt # Makefile # ggml-cuda.cu # ggml-cuda.h	2023-05-02 14:38:31 +08:00
DannyDaemonic	f4cef87edf	Add git-based build information for better issue tracking (#1232) * Add git-based build information for better issue tracking * macOS fix * "build (hash)" and "CMAKE_SOURCE_DIR" changes * Redo "CMAKE_CURRENT_SOURCE_DIR" and clearer build messages * Fix conditional dependency on missing target * Broke out build-info.cmake, added find_package fallback, and added build into to all examples, added dependencies to Makefile * 4 space indenting for cmake, attempt to clean up my mess in Makefile * Short hash, less fancy Makefile, and don't modify build-info.h if it wouldn't change it	2023-05-01 18:23:47 +02:00
Concedo	3de34ee492	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # ggml-opencl.c	2023-05-01 12:03:46 +08:00
Stephan Walter	f0d70f147d	Various fixes to mat_mul benchmark (#1253)	2023-04-30 12:32:37 +00:00
Concedo	0fc1772a8f	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # README.md # ggml.c	2023-04-29 11:14:05 +08:00
CRD716	5fba3c016b	examples : add Jeopardy example (#1168) * Basic Setup * Prevent Results.txt from coming up * Prefixes, Line separators, etc * editorcheck * introduction to give more consistent results * Basic graph thing * Grading, ready for testing! * Y'all ready to get funky? * fix column removal stuff * missed a few	2023-04-28 19:13:33 +03:00
Concedo	95bbd46019	Merge branch 'master' into concedo_experimental # Conflicts: # .devops/tools.sh # README.md	2023-04-27 16:12:00 +08:00
Georgi Gerganov	574406dc7e	ggml : add Q5_0 and Q5_1 quantization (#1187) * ggml : add Q5_0 quantization (cuBLAS only) * ggml : fix Q5_0 qh -> uint32_t * ggml : fix q5_0 histogram stats * ggml : q5_0 scalar dot product * ggml : q5_0 ARM NEON dot * ggml : q5_0 more efficient ARM NEON using uint64_t masks * ggml : rename Q5_0 -> Q5_1 * ggml : adding Q5_0 mode * quantize : add Q5_0 and Q5_1 to map * ggml : AVX2 optimizations for Q5_0, Q5_1 (#1195) --------- Co-authored-by: Stephan Walter <stephan@walter.name>	2023-04-26 23:14:13 +03:00
123 commits