William Tambellini
858f6b73f6
Add an option to build without CUDA VMM ( #7067 )
...
Add an option to build ggml cuda without CUDA VMM (a sketch of what this toggles follows this entry).
Resolves https://github.com/ggerganov/llama.cpp/issues/6889
See also: https://forums.developer.nvidia.com/t/potential-nvshmem-allocated-memory-performance-issue/275416/4
2024-05-06 20:12:14 +02:00
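The option above decides at build time whether the CUDA backend's memory pool uses the CUDA virtual memory management (VMM) driver API. Below is a minimal sketch of the two paths, assuming a compile-time define named GGML_CUDA_NO_VMM (the define name and the pool logic are illustrative, not the PR's actual code; error checking and granularity rounding are omitted):

#include <cuda.h>
#include <cuda_runtime.h>
#include <cstddef>

// Sketch only: reserve a device buffer either through the VMM driver API or, when the
// assumed GGML_CUDA_NO_VMM define is set, through a plain cudaMalloc fallback.
// Assumes the driver is initialized and `bytes` is a multiple of the allocation
// granularity reported by cuMemGetAllocationGranularity.
static void * reserve_device_buffer(int device, size_t bytes) {
#if defined(GGML_CUDA_NO_VMM)
    void * ptr = nullptr;
    cudaMalloc(&ptr, bytes);                        // non-VMM fallback
    return ptr;
#else
    CUdeviceptr base = 0;
    cuMemAddressReserve(&base, bytes, 0, 0, 0);     // reserve a virtual address range

    CUmemAllocationProp prop = {};
    prop.type          = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id   = device;

    CUmemGenericAllocationHandle handle;
    cuMemCreate(&handle, bytes, &prop, 0);          // create the physical backing
    cuMemMap(base, bytes, 0, handle, 0);            // map it into the reserved range

    CUmemAccessDesc access = {};
    access.location = prop.location;
    access.flags    = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(base, bytes, &access, 1);        // make it accessible from the device
    return reinterpret_cast<void *>(base);
#endif
}

Keeping the choice at compile time is what makes a build-system opt-out like this one possible, with no extra runtime branching in the pool's hot path.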
Georgi Gerganov
b3a995b416
flake.lock: Update ( #7079 )
...
Flake lock file updates:
• Updated input 'flake-parts':
'github:hercules-ci/flake-parts/9126214d0a59633752a136528f5f3b9aa8565b7d?narHash=sha256-sB4SWl2lX95bExY2gMFG5HIzvva5AVMJd4Igm%2BGpZNw%3D' (2024-04-01)
→ 'github:hercules-ci/flake-parts/e5d10a24b66c3ea8f150e47dfdb0416ab7c3390e?narHash=sha256-yzcRNDoyVP7%2BSCNX0wmuDju1NUCt8Dz9%2BlyUXEI0dbI%3D' (2024-05-02)
• Updated input 'flake-parts/nixpkgs-lib':
'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089?dir=lib&narHash=sha256-iMUFArF0WCatKK6RzfUJknjem0H9m4KgorO/p3Dopkk%3D' (2024-03-29)
→ 'https://github.com/NixOS/nixpkgs/archive/50eb7ecf4cd0a5756d7275c8ba36790e5bd53e33.tar.gz?narHash=sha256-QBx10%2Bk6JWz6u7VsohfSw8g8hjdBZEf8CFzXH1/1Z94%3D' (2024-05-02)
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/7bb2ccd8cdc44c91edba16c48d2c8f331fb3d856?narHash=sha256-Drmja/f5MRHZCskS6mvzFqxEaZMeciScCTFxWVLqWEY%3D' (2024-04-25)
→ 'github:NixOS/nixpkgs/63c3a29ca82437c87573e4c6919b09a24ea61b0f?narHash=sha256-4cPymbty65RvF1DWQfc%2BBc8B233A1BWxJnNULJKQ1EY%3D' (2024-05-02)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-05-06 08:36:06 -07:00
Concedo
0d1cd0171a
update docs
2024-05-06 21:17:11 +08:00
Concedo
62ea3eee4a
announce sdui url
2024-05-06 18:15:34 +08:00
Concedo
6c000cbe7a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .flake8
# .github/workflows/bench.yml
# .github/workflows/python-lint.yml
# .pre-commit-config.yaml
# Makefile
# README.md
# models/ggml-vocab-bert-bge.gguf.inp
# models/ggml-vocab-bert-bge.gguf.out
# models/ggml-vocab-deepseek-coder.gguf.inp
# models/ggml-vocab-deepseek-coder.gguf.out
# models/ggml-vocab-deepseek-llm.gguf.inp
# models/ggml-vocab-deepseek-llm.gguf.out
# models/ggml-vocab-falcon.gguf.inp
# models/ggml-vocab-falcon.gguf.out
# models/ggml-vocab-gpt-2.gguf.inp
# models/ggml-vocab-gpt-2.gguf.out
# models/ggml-vocab-llama-bpe.gguf.inp
# models/ggml-vocab-llama-bpe.gguf.out
# models/ggml-vocab-llama-spm.gguf.inp
# models/ggml-vocab-llama-spm.gguf.out
# models/ggml-vocab-mpt.gguf.inp
# models/ggml-vocab-mpt.gguf.out
# models/ggml-vocab-phi-3.gguf
# models/ggml-vocab-phi-3.gguf.inp
# models/ggml-vocab-phi-3.gguf.out
# models/ggml-vocab-refact.gguf
# models/ggml-vocab-starcoder.gguf.inp
# models/ggml-vocab-starcoder.gguf.out
# requirements/requirements-convert.txt
# scripts/compare-llama-bench.py
# scripts/run-with-preset.py
# scripts/verify-checksum-models.py
# tests/CMakeLists.txt
# tests/test-tokenizer-0.cpp
2024-05-06 18:09:45 +08:00
Concedo
173c7272d5
EOS bypass mode added
2024-05-06 18:01:49 +08:00
Georgi Gerganov
bcdee0daa7
minor : fix trailing whitespace
2024-05-06 09:31:30 +03:00
Concedo
3667cc0113
fixed stableui btn (+4 squashed commits)
...
Squashed commit:
[1d4714f1] update default amount to gen
[6eacba33] updated lite
[033589af] added first ver sdui
[16f66d57] updated lite
2024-05-06 00:55:16 +08:00
kunnis
628b299106
Adding support for the --numa argument for llama-bench. ( #7080 )
2024-05-05 14:17:47 +02:00
Sigbjørn Skjæret
8f8acc8683
Disable benchmark on forked repo ( #7034 )
...
* Disable benchmark on forked repo
* only check owner on schedule event
* check owner on push also
* more readable as multi-line
* ternary won't work
* style++
* test++
* enable actions debug
* test--
* remove debug
* test++
* do debug where we can get logs
* test--
* this is driving me crazy
* correct github.event usage
* remove test condition
* correct github.event usage
* test++
* test--
* event_name is pull_request_target
* test++
* test--
* update ref checks
2024-05-05 13:38:55 +02:00
Lyle Dean
ca36326020
readme : add note that LLaMA 3 is not supported with convert.py ( #7065 )
2024-05-05 08:21:46 +03:00
DAN™
889bdd7686
command-r : add BPE pre-tokenization ( #7063 )
...
* Add BPE pre-tokenization for Command-R/R+.
* Bump transformers convert requirement.
* command-r : add individual digits regex
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-05 08:19:30 +03:00
Brian
6fbd432211
py : logging and flake8 suppression refactoring ( #7081 )
...
Set one script as executable and add basicConfig() to another. Also added a noqa tag to the test scripts.
2024-05-05 08:07:48 +03:00
Xuan Son Nguyen
842500144e
gguf-split: add --no-tensor-first-split ( #7072 )
2024-05-04 18:56:22 +02:00
Concedo
0c381f9ded
increase interrogate length
2024-05-05 00:40:49 +08:00
Jeximo
cf768b7e71
Tidy Android Instructions README.md ( #7016 )
...
* Tidy Android Instructions README.md
Remove CLBlast instructions (outdated), added OpenBLAS.
* don't assume git is installed
Added apt install git, so that git clone works
* removed OpenBlas
Linked to Linux build instructions
* fix typo
Remove word "run"
* correct style
Co-authored-by: slaren <slarengh@gmail.com>
* correct grammar
Co-authored-by: slaren <slarengh@gmail.com>
* delete reference to Android API
* remove Fdroid reference, link directly to Termux
Fdroid is not required
Co-authored-by: slaren <slarengh@gmail.com>
* Update README.md
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-05-04 18:10:15 +02:00
Concedo
5ca267dc9c
remove unnecessary prints
2024-05-04 23:28:21 +08:00
viric
fcd84a0f5a
Fix Linux /sys cpu path to guess number of cores ( #7064 )
2024-05-04 15:26:53 +02:00
maor-ps
03fb8a002d
If the first token generated from the server is the stop word, the server will crash ( #7038 )
...
This will reproduce the issue in llama13b
{
    'prompt': 'Q: hello world \nA: ',
    'stop': ['\n'],
    'temperature': 0.0,
    'n_predict': 10,
    'cache_prompt': True,
    'n_probs': 10
}
2024-05-04 11:06:40 +02:00
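A hedged illustration of the failure mode described above (not the server's actual code): truncating the generated text at a matched stop string has to remain valid when the match lands at position 0, i.e. when the very first generated token is the stop word.

#include <iostream>
#include <string>

// Sketch only: cut the generated text at the first occurrence of a stop string.
// substr(0, pos) is well defined even when pos == 0 (the result is the empty string).
static std::string truncate_at_stop(const std::string & generated, const std::string & stop) {
    const size_t pos = generated.find(stop);
    if (pos == std::string::npos) {
        return generated;                // no stop string seen yet
    }
    return generated.substr(0, pos);     // safe even when pos == 0
}

int main() {
    // With the payload above, the model's first token can be "\n", i.e. the stop word itself.
    std::cout << "[" << truncate_at_stop("\nA: ...", "\n") << "]\n";   // prints []
    std::cout << "[" << truncate_at_stop("42\nQ:", "\n")   << "]\n";   // prints [42]
}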
Georgi Gerganov
92139b90af
tests : add test-tokenizer-0.sh + fix some tokenizers ( #7036 )
...
* tests : add test-tokenizer-0.sh
* unicode : add all unicode number ranges
* starcoder : fix pre-tokenizer
* tests : add test that fails with DeepSeek tokenizers
* falcon : fix regex
* unicode : regenerate unicode tables
* refact : add tokenizer model
* lint : fix
* tests : disable failing tests
ggml-ci
* refact : add tests files
ggml-ci
* convert : print -> logging
ggml-ci
* lint : fix
* unicode : digit -> number
* phi-3 : update
2024-05-04 08:32:32 +03:00
Concedo
a3718c6354
1.64.1 to fix llava issues
2024-05-04 10:38:20 +08:00
Concedo
89db8afded
revert moondream to try and fix llava
2024-05-04 10:07:54 +08:00
Brian
a2ac89d6ef
convert.py : add python logging instead of print() ( #6511 )
...
* convert.py: add python logging instead of print()
* convert.py: verbose flag takes priority over dump flag log suppression
* convert.py: named instance logging
* convert.py: use explicit logger id string
* convert.py: convert extra print() to named logger
* convert.py: sys.stderr.write --> logger.error
* *.py: Convert all python scripts to use logging module
* requirements.txt: remove extra line
* flake8: update flake8 ignore and exclude to match ci settings
* gh-actions: add flake8-no-print to flake8 lint step
* pre-commit: add flake8-no-print to flake8 and also update pre-commit version
* convert-hf-to-gguf.py: print() to logger conversion
* *.py: logging basicConfig refactor to use conditional expression
* *.py: removed commented out logging
* fixup! *.py: logging basicConfig refactor to use conditional expression
* constant.py: logger.error then exit should be a raise exception instead
* *.py: Convert logger error and sys.exit() into a raise exception (for atypical error)
* gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar
* verify-checksum-model.py: This is the result of the program; it should be printed to stdout.
* compare-llama-bench.py: add blank line for readability during missing repo response
* reader.py: read_gguf_file() use print() over logging
* convert.py: warning goes to stderr and won't hurt the dump output
* gguf-dump.py: dump_metadata() should print to stdout
* convert-hf-to-gguf.py: print --> logger.debug or ValueError()
* verify-checksum-models.py: use print() for printing table
* *.py: refactor logging.basicConfig()
* gguf-py/gguf/*.py: use __name__ as logger name
Since they will be imported and not run directly.
* python-lint.yml: use .flake8 file instead
* constants.py: logger no longer required
* convert-hf-to-gguf.py: add additional logging
* convert-hf-to-gguf.py: print() --> logger
* *.py: fix flake8 warnings
* revert changes to convert-hf-to-gguf.py for get_name()
* convert-hf-to-gguf-update.py: use triple quoted f-string instead
* *.py: accidentally corrected the wrong line
* *.py: add compilade warning suggestions and style fixes
2024-05-03 22:36:41 +03:00
Daniel Bevenius
433def286e
llama : rename ctx to user_data in progress_callback ( #7045 )
...
* llama : rename ctx to user_data in progress_callback
This commit renames the `ctx` parameter to `user_data` in the
`llama_progress_callback` typedef.
The motivation is that other callbacks use `user_data` or `data`, and `ctx` here could easily be mistaken for a `llama_context` (a sketch of the renamed typedef follows this entry).
---------
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-05-03 15:24:30 +02:00
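A minimal sketch of the renamed typedef and of a callback using it (the signature follows the commit description; the percentage bookkeeping is illustrative, and llama.h remains the authoritative reference for the callback's contract):

#include <cstdio>

// As described above: the opaque pointer is now named user_data rather than ctx.
typedef bool (*llama_progress_callback)(float progress, void * user_data);

// Illustrative callback: user_data is caller-owned state, not a llama_context.
static bool print_progress(float progress, void * user_data) {
    int * last_pct = static_cast<int *>(user_data);
    const int pct  = static_cast<int>(progress * 100.0f);
    if (pct != *last_pct) {
        *last_pct = pct;
        std::fprintf(stderr, "load: %3d%%\n", pct);
    }
    return true; // returning false conventionally asks the loader to abort
}

int main() {
    int last_pct = -1;
    llama_progress_callback cb = print_progress;
    cb(0.5f, &last_pct);   // in practice the loader invokes this, not user code
}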
Concedo
640f195140
add kobble tiny to readme
2024-05-03 18:13:39 +08:00
henk717
b6bfab128f
CUDA 12 CI ( #815 )
...
* Allow KCPP_CUDA to specify CUDA version
* CUDA 12 CI Linux
* CUDA 12 CI
* Fix KCPP_CUDA indent
* KCPP_CUDA ENV Fix
StackOverflow is bad for advice sometimes....
* Lowercase cuda on output filename
* Strip . from filename output
2024-05-03 17:12:57 +08:00
Concedo
a34a09d196
replace destroy with quit for tk
2024-05-03 15:57:13 +08:00
Bartowski
60325fa56f
Remove .attention from skipped tensors to match more accurately ( #7051 )
2024-05-03 01:49:09 +02:00
alwqx
6ecf3189e0
chore: fix typo in llama.cpp ( #7032 )
...
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2024-05-02 11:56:41 -04:00
Concedo
4c5d307f59
fixed benchmark interrupt (+2 squashed commits)
...
Squashed commit:
[6e334c8b] require enter key to be pressed
[d50d49b6] fixed bench script
2024-05-02 23:22:47 +08:00
Concedo
0d8c4a9b73
remove quick lowvram option
2024-05-02 14:21:44 +08:00
Concedo
fb7e72352e
benchmark includes ver
2024-05-02 14:17:48 +08:00
Concedo
e7a962c70a
update readme
2024-05-02 10:57:54 +08:00
Andrew Downing
b0d943de17
Update LOG_IMPL and LOG_TEE_IMPL ( #7029 )
...
ROCm clang defines _MSC_VER, which causes the wrong implementation of LOG_IMPL and LOG_TEE_IMPL to be compiled (an illustrative sketch follows this entry).
This fixes https://github.com/ggerganov/llama.cpp/issues/6972
2024-05-01 23:31:30 +02:00
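An illustrative reduction of the problem (not the actual macros in common/log.h): ROCm's clang defines _MSC_VER for MSVC compatibility, so a guard on _MSC_VER alone sends ROCm builds down the MSVC-only branch; also checking for clang keeps them on the intended one.

#include <cstdio>

// Sketch only. Genuine MSVC's traditional preprocessor drops the trailing comma when
// __VA_ARGS__ is empty; clang and GCC need the ## extension to get the same effect.
#if defined(_MSC_VER) && !defined(__clang__)
    #define LOG_SKETCH(fmt, ...) std::fprintf(stderr, fmt, __VA_ARGS__)
#else
    #define LOG_SKETCH(fmt, ...) std::fprintf(stderr, fmt, ##__VA_ARGS__)
#endif

int main() {
    LOG_SKETCH("hello %s\n", "world");
    LOG_SKETCH("no format arguments here\n");
}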
l3utterfly
8d608a81b7
main : fix off by one error for context shift ( #6921 )
2024-05-01 22:27:41 +03:00
Johannes Gäßler
3ea0d36000
Server: add tests for batch size, different seeds ( #6950 )
2024-05-01 17:52:55 +02:00
Concedo
3c2bd8aad3
add cu12 ci for windows
2024-05-01 22:46:02 +08:00
Concedo
81619f3611
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/close-issue.yml
# ggml-cuda/common.cuh
# ggml-cuda/fattn.cu
2024-05-01 21:14:34 +08:00
Johannes Gäßler
1613ef8d8e
CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 ( #7019 )
2024-05-01 14:46:37 +02:00
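A minimal sketch of the kind of polyfill such a workaround needs, assuming the intrinsics are simply unavailable on older toolkits (the ggml_hmax/ggml_hmax2 names and the float round-trip are illustrative; the upstream helpers may be written differently):

#include <cuda_fp16.h>
#include <cuda_runtime.h>

#if defined(CUDART_VERSION) && CUDART_VERSION < 11070
// Older toolkits: emulate the half-precision max through float math, which works on
// every architecture at the cost of a conversion round-trip.
static __device__ __forceinline__ half ggml_hmax(const half a, const half b) {
    return __float2half(fmaxf(__half2float(a), __half2float(b)));
}
static __device__ __forceinline__ half2 ggml_hmax2(const half2 a, const half2 b) {
    return __halves2half2(ggml_hmax(__low2half(a),  __low2half(b)),
                          ggml_hmax(__high2half(a), __high2half(b)));
}
#else
// CUDART >= 11.7 ships the intrinsics, so the helpers simply forward to them.
static __device__ __forceinline__ half  ggml_hmax (const half a,  const half b)  { return __hmax(a, b);  }
static __device__ __forceinline__ half2 ggml_hmax2(const half2 a, const half2 b) { return __hmax2(a, b); }
#endif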
Concedo
b641d986f7
use Johannes' implementation instead (+1 squashed commit)
...
Squashed commits:
[f5e6709d] use johannes implementation instead
2024-05-01 18:47:24 +08:00
Concedo
e9978bfac0
resize window dimensions
2024-05-01 17:38:49 +08:00
Concedo
cea46750b0
try hack in missing hmax2 functions (+1 squashed commit)
...
Squashed commits:
[c98d0ab6] try hack in missing hmax2 functions (+1 squashed commit)
Squashed commits:
[9ba8599f] try hack in missing hmax2 functions (+2 squashed commits)
Squashed commits:
[be497493] try hack in missing hmax2 functions
[159ee4c3] bypass missing hmax functions on old cuda
2024-05-01 15:36:16 +08:00
slaren
c4ec9c0d3d
ci : exempt confirmed bugs from being tagged as stale ( #7014 )
2024-05-01 08:13:59 +03:00
Concedo
b48ea96ead
removed unwanted debugs
2024-05-01 11:35:07 +08:00
Concedo
63f8f55c4e
Merge branch 'upstream' into concedo_experimental
2024-05-01 11:04:18 +08:00
Johannes Gäßler
a8f9b07631
perplexity: more statistics, added documentation ( #6936 )
...
* perplexity: more statistics, added documentation
* add LLaMA 3 8b scoreboard
2024-04-30 23:36:27 +02:00
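For context on what the tool reports: the headline perplexity is the exponential of the mean negative log-likelihood over the evaluated tokens, and the extra statistics in this PR are layered on top of that quantity. A minimal host-side sketch (illustrative only):

#include <cmath>
#include <cstdio>
#include <vector>

// Perplexity = exp( -(1/N) * sum_i log p(token_i | context_i) ).
static double perplexity(const std::vector<double> & token_logprobs) {
    double nll = 0.0;
    for (const double lp : token_logprobs) {
        nll -= lp;                                   // accumulate negative log-likelihood
    }
    return std::exp(nll / (double) token_logprobs.size());
}

int main() {
    const std::vector<double> logprobs = { std::log(0.5), std::log(0.25), std::log(0.125) };
    std::printf("ppl = %.3f\n", perplexity(logprobs)); // exp(mean NLL) = 4.000 here
}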
Kevin Gibbons
f364eb6fb5
switch to using localizedDescription ( #7010 )
2024-04-30 17:14:02 +02:00
Concedo
c65448d17a
add flash attention toggle
2024-04-30 21:29:11 +08:00
Concedo
17a24d753c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/main-intel.Dockerfile
# .devops/main-vulkan.Dockerfile
# .devops/server-intel.Dockerfile
# .devops/server-vulkan.Dockerfile
# .github/workflows/bench.yml
# .github/workflows/build.yml
# .github/workflows/python-lint.yml
# .github/workflows/server.yml
# .gitignore
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# flake.lock
# llama.cpp
# models/ggml-vocab-falcon.gguf
# models/ggml-vocab-llama-spm.gguf
# models/ggml-vocab-mpt.gguf
# models/ggml-vocab-stablelm.gguf
# models/ggml-vocab-starcoder.gguf
# requirements.txt
# scripts/check-requirements.sh
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-grammar-integration.cpp
# tests/test-tokenizer-0-bpe.py
# tests/test-tokenizer-0-spm.py
# tests/test-tokenizer-1-spm.cpp
2024-04-30 21:04:17 +08:00
Georgi Gerganov
77e15bec62
metal : remove deprecated error code ( #7008 )
2024-04-30 15:52:21 +03:00