koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-11 04:51:25 +00:00

Author	SHA1	Message	Date
Concedo	909e4334f9	fixed indentation in makefile	2024-04-08 16:27:13 +08:00
Concedo	87175db07d	try to fix cuda build makefile	2024-04-08 16:18:13 +08:00
Concedo	22f543d09b	Merge commit '32c8486e1f' into concedo_experimental # Conflicts: # .devops/nix/package.nix # CMakeLists.txt # Makefile # Package.swift # README.md # build.zig # llama.cpp # tests/test-backend-ops.cpp	2024-04-07 20:39:17 +08:00
Concedo	a530afa1e4	Merge commit '280345968d' into concedo_experimental # Conflicts: # .devops/full-cuda.Dockerfile # .devops/llama-cpp-cuda.srpm.spec # .devops/main-cuda.Dockerfile # .devops/nix/package.nix # .devops/server-cuda.Dockerfile # .github/workflows/build.yml # CMakeLists.txt # Makefile # README.md # ci/run.sh # docs/token_generation_performance_tips.md # flake.lock # llama.cpp # scripts/LlamaConfig.cmake.in # scripts/compare-commits.sh # scripts/server-llm.sh # tests/test-quantize-fns.cpp	2024-04-07 20:27:17 +08:00
Concedo	bec16d182b	Merge commit '2f34b865b6' into concedo_experimental # Conflicts: # .clang-tidy # CMakeLists.txt # Makefile # ggml-cuda.cu	2024-04-07 18:30:35 +08:00
Concedo	0061299cce	fixed quant tools not compiling, updated docs	2024-04-06 23:11:05 +08:00
Jared Van Bortel	32c8486e1f	wpm : portable unicode tolower (#6305) Also use C locale for ispunct/isspace, and split unicode-data.cpp from unicode.cpp.	2024-03-26 17:46:21 -04:00
slaren	280345968d	cuda : rename build flag to LLAMA_CUDA (#6299)	2024-03-26 01:16:01 +01:00
slaren	ae1f211ce2	cuda : refactor into multiple files (#6269)	2024-03-25 13:50:23 +01:00
Minsoo Cheong	64e7b47c69	examples : add "retrieval" (#6193 ) * add `retrieval` example * add README * minor fixes * cast filepos on print * remove use of variable sized array * store similarities in separate vector * print error on insufficient batch size * fix error message printing * assign n_batch value to n_ubatch * fix param definitions * define retrieval-only parameters in retrieval.cpp * fix `--context-file` option to be provided multiple times for multiple files * use vector for `query_emb` * add usage description in README * fix merge conflict * fix usage printing * remove seed setting * fix lint * increase file read buffer size * retrieval : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-25 09:38:22 +02:00
Pierrick Hymbert	21cad01b6e	split: add gguf-split in the make build target (#6262)	2024-03-23 17:18:13 +01:00
Johannes Gäßler	50ccaf5eac	lookup: complement data from context with general text statistics (#5479 ) * lookup: evaluation tools, use corpus/previous gens * fixup! lookup: evaluation tools, use corpus/previous gens * fixup! lookup: evaluation tools, use corpus/previous gens * fixup! lookup: evaluation tools, use corpus/previous gens * fixup! lookup: evaluation tools, use corpus/previous gens	2024-03-23 01:24:36 +01:00
slaren	2f0e81e053	cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy (#6208 ) * cuda : add LLAMA_CUDA_NO_PEER_COPY to workaround broken ROCm p2p copy * add LLAMA_CUDA_NO_PEER_COPY to HIP build	2024-03-22 14:05:31 +01:00
Olivier Chafik	5b7b0ac8df	json-schema-to-grammar improvements (+ added to server) (#5978 ) * json: fix arrays (disallow `[,1]`) * json: support tuple types (`[number, string]`) * json: support additionalProperties (`{[k: string]: [string,number][]}`) * json: support required / optional properties * json: add support for pattern * json: resolve $ref (and support https schema urls) * json: fix $ref resolution * join: support union types (mostly for nullable types I think) * json: support allOf + nested anyOf * json: support any (`{}` or `{type: object}`) * json: fix merge * json: temp fix for escapes * json: spaces in output and unrestricted output spaces * json: add typings * json:fix typo * Create ts-type-to-grammar.sh * json: fix _format_literal (json.dumps already escapes quotes) * json: merge lit sequences and handle negatives {"type": "string", "pattern": "^({\"question\": \"[^\"]+\", \"response\": \"[^\"]+\"}\\n)+$"} * json: handle pattern repetitions * Update json-schema-to-grammar.mjs * Create regex-to-grammar.py * json: extract repeated regexp patterns to subrule * Update json-schema-to-grammar.py * Update json-schema-to-grammar.py * Update json-schema-to-grammar.py * json: handle schema from pydantic Optional fields * Update json-schema-to-grammar.py * Update json-schema-to-grammar.py * Update ts-type-to-grammar.sh * Update ts-type-to-grammar.sh * json: simplify nullable fields handling * json: accept duplicate identical rules * json: revert space to 1 at most * json: reuse regexp pattern subrules * json: handle uuid string format * json: fix literal escapes * json: add --allow-fetch * json: simplify range escapes * json: support negative ranges in patterns * Delete commit.txt * json: custom regex parser, adds dot support & JS-portable * json: rm trailing spaces * Update json-schema-to-grammar.mjs * json: updated server & chat `( cd examples/server && ./deps.sh )` * json: port fixes from mjs to python * Update ts-type-to-grammar.sh * json: support 
prefixItems alongside array items * json: add date format + fix uuid * json: add date, time, date-time formats * json: preserve order of props from TS defs * json: port schema converter to C++, wire in ./server * json: nits * Update json-schema-to-grammar.cpp * Update json-schema-to-grammar.cpp * Update json-schema-to-grammar.cpp * json: fix mjs implementation + align outputs * Update json-schema-to-grammar.mjs.hpp * json: test C++, JS & Python versions * json: nits + regen deps * json: cleanup test * json: revert from c++17 to 11 * json: nit fixes * json: dirty include for test * json: fix zig build * json: pass static command to std::system in tests (fixed temp files) * json: fix top-level $refs * json: don't use c++20 designated initializers * nit * json: basic support for reserved names `{number:{number:{root:number}}}` * Revamp test cmake to allow args (WORKING_DIRECTORY needed for JSON test) * json: re-ran server deps.sh * json: simplify test * json: support mix of additional props & required/optional * json: add tests for some expected failures * json: fix type=const in c++, add failure expectations for non-str const&enum * json: test (& simplify output of) empty schema * json: check parsing in test + fix value & string refs * json: add server tests for OAI JSON response_format * json: test/fix top-level anyOf * json: improve grammar parsing failures * json: test/fix additional props corner cases * json: fix string patterns (was missing quotes) * json: ws nit * json: fix json handling in server when there's no response_format * json: catch schema conversion errors in server * json: don't complain about unknown format type in server if unset * json: cleaner build of test * json: create examples/json-schema-pydantic-example.py * json: fix date pattern * json: move json.hpp & json-schema-to-grammar.{cpp,h} to common * json: indent 4 spaces * json: fix naming of top-level c++ function (+ drop unused one) * json: avoid using namespace std * json: fix zig build * 
Update server.feature * json: iostream -> fprintf * json: space before & refs for consistency * json: nits	2024-03-21 11:50:43 +00:00
Pierrick Hymbert	d0d5de42e5	gguf-split: split and merge gguf per batch of tensors (#6135 ) * gguf-split: split and merge gguf files per tensor * gguf-split: build with make toolchain * gguf-split: rename `--split-tensors-size` to `--split-max-tensors`. Set general.split_count KV to all split * split : minor style + fix compile warnings * gguf-split: remove --upload not implemented --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-19 12:05:44 +01:00
Pierrick Hymbert	d01b3c4c32	common: llama_load_model_from_url using --model-url (#6098 ) * common: llama_load_model_from_url with libcurl dependency Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-17 19:12:37 +01:00
Georgi Gerganov	131b058409	make : ggml-metal.o depends on ggml.h	2024-03-15 11:38:40 +02:00
Georgi Gerganov	381da2d9f0	metal : build metallib + fix embed path (#6015 ) * metal : build metallib + fix embed path ggml-ci * metal : fix embed build + update library load logic ggml-ci * metal : fix embeded library build ggml-ci * ci : fix iOS builds to use embedded library	2024-03-14 11:55:23 +02:00
Concedo	ec5dea14d7	merged, try to fix metal build	2024-03-14 11:15:50 +08:00
slaren	f30ea47a87	llama : add pipeline parallelism support (#6017 ) * llama : add pipeline parallelism support for batch processing with multiple CUDA GPUs ggml-ci * server : add -ub, --ubatch-size parameter * fix server embedding test * llama : fix Mamba inference for pipeline parallelism Tested to work correctly with both `main` and `parallel` examples. * llama : limit max batch size to n_batch * add LLAMA_SCHED_MAX_COPIES to configure the number of input copies for pipeline parallelism default increase to 4 (from 2) changing this value may improve performance for some systems, but increases memory usage * fix hip build * fix sycl build (disable cpy_tensor_async) * fix hip build * llama : limit n_batch and n_ubatch to n_ctx during context creation * llama : fix norm backend * batched-bench : sync after decode * swiftui : sync after decode * ggml : allow ggml_get_rows to use multiple threads if they are available * check n_ubatch >= n_tokens with non-casual attention * llama : do not limit n_batch to n_ctx with non-casual attn * server : construct batch with size of llama_n_batch * ggml_backend_cpu_graph_compute : fix return value when alloc fails * llama : better n_batch and n_ubatch comment * fix merge * small fix * reduce default n_batch to 2048 --------- Co-authored-by: Francis Couture-Harpin <git@compilade.net> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-13 18:54:21 +01:00
Concedo	9f102b9db6	update makefile	2024-03-13 21:53:52 +08:00
Concedo	ba950716a9	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # Package.swift # README.md # build.zig # llama.cpp # tests/test-tokenizer-1-bpe.cpp # tests/test-tokenizer-1-llama.cpp	2024-03-13 11:21:58 +08:00
Georgi Gerganov	83796e62bc	llama : refactor unicode stuff (#5992 ) * llama : refactor unicode stuff ggml-ci * unicode : names * make : fix c++ compiler * unicode : names * unicode : straighten tables * zig : fix build * unicode : put nfd normalization behind API ggml-ci * swift : fix build * unicode : add BOM * unicode : add <cstdint> ggml-ci * unicode : pass as cpts as const ref	2024-03-11 17:47:47 +02:00
Concedo	227f59dab6	added a simple program to do quantization for clip models	2024-03-11 21:50:30 +08:00
DAN™	bcebd7dbf6	llama : add support for GritLM (#5959 ) * add gritlm example * gritlm results match * tabs to spaces * comment out debug printing * rebase to new embed * gritlm embeddings are back babeee * add to gitignore * allow to toggle embedding mode * Clean-up GritLM sample code. * Fix types. * Flush stdout and output ending newline if streaming. * mostly style fixes; correct KQ_mask comment * add causal_attn flag to llama_cparams * gritml : minor * llama : minor --------- Co-authored-by: Douglas Hanley <thesecretaryofwar@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-10 17:56:30 +02:00
Concedo	c08d7e5042	wip integration of llava	2024-03-10 11:18:47 +08:00
Georgi Gerganov	8a3012a4ad	ggml : add ggml-common.h to deduplicate shared code (#5940 ) * ggml : add ggml-common.h to shared code ggml-ci * scripts : update sync scripts * sycl : reuse quantum tables ggml-ci * ggml : minor * ggml : minor * sycl : try to fix build	2024-03-09 12:47:57 +02:00
Gabe Goodhart	e1fa9569ba	server : add SSL support (#5926 ) * add cmake build toggle to enable ssl support in server Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * add flags for ssl key/cert files and use SSLServer if set All SSL setup is hidden behind CPPHTTPLIB_OPENSSL_SUPPORT in the same way that the base httlib hides the SSL support Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * Update readme for SSL support in server Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * Add LLAMA_SERVER_SSL variable setup to top-level Makefile Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>	2024-03-09 11:57:09 +02:00
Georgi Gerganov	2002bc96bf	server : refactor (#5882 ) * server : refactoring (wip) * server : remove llava/clip objects from build * server : fix empty prompt handling + all slots idle logic * server : normalize id vars * server : code style * server : simplify model chat template validation * server : code style * server : minor * llama : llama_chat_apply_template support null buf * server : do not process embedding requests when disabled * server : reorganize structs and enums + naming fixes * server : merge oai.hpp in utils.hpp * server : refactor system prompt update at start * server : disable cached prompts with self-extend * server : do not process more than n_batch tokens per iter * server: tests: embeddings use a real embeddings model (#5908) * server, tests : bump batch to fit 1 embedding prompt * server: tests: embeddings fix build type Debug is randomly failing (#5911) * server: tests: embeddings, use different KV Cache size * server: tests: embeddings, fixed prompt do not exceed n_batch, increase embedding timeout, reduce number of concurrent embeddings * server: tests: embeddings, no need to wait for server idle as it can timout * server: refactor: clean up http code (#5912) * server : avoid n_available var ggml-ci * server: refactor: better http codes * server : simplify json parsing + add comment about t_last * server : rename server structs * server : allow to override FQDN in tests ggml-ci * server : add comments --------- Co-authored-by: Pierrick Hymbert <pierrick.hymbert@gmail.com>	2024-03-07 11:41:53 +02:00
Concedo	4eb3a95cbb	reenable LCM sampler	2024-03-05 17:39:21 +08:00
Concedo	3463688a0e	image generation is fully working over api (+1 squashed commits) Squashed commits: [c98ab0b4] single image generation is working now	2024-03-01 14:43:44 +08:00
Concedo	5a44d4de2b	refactor and clean identifiers for sd, fix cmake	2024-02-29 18:28:45 +08:00
Concedo	f75e479db0	WIP on sdcpp integration	2024-02-29 00:40:07 +08:00
Concedo	8a919daafb	add as library to makefile	2024-02-28 17:45:00 +08:00
Concedo	17355faf6e	sdcpp is working!	2024-02-28 17:00:18 +08:00
Concedo	26696970ce	initial files from sdcpp (not working)	2024-02-28 15:45:13 +08:00
le.chang	cbbd1efa06	Makefile: use variables for cublas (#5689 ) * make: use arch variable for cublas * fix UNAME_M * check opt first --------- Co-authored-by: lindeer <le.chang118@gmail.com>	2024-02-27 03:03:06 +01:00
kwin1412	f1a98c5254	make : fix nvcc version is empty (#5713) fix nvcc version is empty	2024-02-25 18:46:49 +02:00
Concedo	b5ba6c9ece	test to see if Ofast for ggml library plus batching adjustments fixes speed regression for ggmlv1 models	2024-02-25 21:14:53 +08:00
Concedo	f3a0e05d91	added noavx2 vulkan	2024-02-22 16:56:25 +08:00
CJ Pais	6560bed3f0	server : support llava 1.6 (#5553 ) * server: init working 1.6 * move clip_image to header * remove commented code * remove c++ style from header * remove todo * expose llava_image_embed_make_with_clip_img * fix zig build	2024-02-20 21:07:22 +02:00
slaren	06bf2cf8c4	make : fix debug build with CUDA (#5616)	2024-02-20 20:06:17 +01:00
Haoxiang Fei	8dbbd75754	metal : add build system support for embedded metal library (#5604 ) * add build support for embedded metal library * Update Makefile --------- Co-authored-by: Haoxiang Fei <feihaoxiang@idea.edu.cn> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-20 11:58:36 +02:00
Jared Van Bortel	f24ed14ee0	make : pass CPPFLAGS directly to nvcc, not via -Xcompiler (#5598)	2024-02-19 15:54:12 -05:00
Georgi Gerganov	d0e3ce51f4	ci : enable -Werror for CUDA builds (#5579 ) * cmake : pass -Werror through -Xcompiler ggml-ci * make, cmake : enable CUDA errors on warnings ggml-ci	2024-02-19 14:45:41 +02:00
Georgi Gerganov	68a6b98b3c	make : fix CUDA build (#5580)	2024-02-19 13:41:51 +02:00
Xuan Son Nguyen	11b12de39b	llama : add llama_chat_apply_template() (#5538 ) * llama: add llama_chat_apply_template * test-chat-template: remove dedundant vector * chat_template: do not use std::string for buffer * add clarification for llama_chat_apply_template * llama_chat_apply_template: add zephyr template * llama_chat_apply_template: correct docs * llama_chat_apply_template: use term "chat" everywhere * llama_chat_apply_template: change variable name to "tmpl"	2024-02-19 10:23:37 +02:00
Jared Van Bortel	a0c2dad9d4	build : pass all warning flags to nvcc via -Xcompiler (#5570 ) * build : pass all warning flags to nvcc via -Xcompiler * make : fix apparent mis-merge from #3952 * make : fix incorrect GF_CC_VER for CUDA host compiler	2024-02-18 16:21:52 -05:00
Ananta Bastola	6e4e973b26	ci : add an option to fail on compile warning (#3952 ) * feat(ci): add an option to fail on compile warning * Update CMakeLists.txt * minor : fix compile warnings ggml-ci * ggml : fix unreachable code warnings ggml-ci * ci : disable fatal warnings for windows, ios and tvos * ggml : fix strncpy warning * ci : disable fatal warnings for MPI build * ci : add fatal warnings to ggml-ci ggml-ci --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-02-17 23:03:14 +02:00
henk717	01b7daf6b7	CUDA 12 support for the Conda Runtime (#680 ) * Remove libculibos dependency This dependency is something that is used to build libcudart which we are also already targeting. The individual file is no longer being distributed with the CUDA12 conda devkit, so we can no longer target it directly. But because all its functionality is inside libcudart we also don't need it. This commit removes the inclusion so that Koboldcpp can be compiled with CUDA12 as distributed by conda. I have tested this on the 1.57 release on CUDA11.5 and CUDA12.1. * Cleanup version definitions The package versions are already controlled by the label, we don't need to define it multiple times for it to work correctly. Removing the separate definitions allows people to easily change which version of CUDA they wish for their system.	2024-02-16 21:32:30 +08:00

405 commits