koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-22 03:10:03 +00:00

Author	SHA1	Message	Date
Olivier Chafik	b8a7a5a90f	build(cmake): simplify instructions (`cmake -B build && cmake --build build ...`) (#6964 ) * readme: cmake . -B build && cmake --build build * build: fix typo Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * build: drop implicit . from cmake config command * build: remove another superfluous . * build: update MinGW cmake commands * Update README-sycl.md Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com> * build: reinstate --config Release as not the default w/ some generators + document how to build Debug * build: revert more --config Release * build: nit / remove -H from cmake example * build: reword debug instructions around single/multi config split --------- Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>	2024-04-29 17:02:45 +01:00
Georgi Gerganov	f4ab2a4147	llama : fix BPE pre-tokenization (#6920 ) * merged the changes from deepseeker models to main branch * Moved regex patterns to unicode.cpp and updated unicode.h * Moved header files * Resolved issues * added and refactored unicode_regex_split and related functions * Updated/merged the deepseek coder pr * Refactored code * Adding unicode regex mappings * Adding unicode regex function * Added needed functionality, testing remains * Fixed issues * Fixed issue with gpt2 regex custom preprocessor * unicode : fix? unicode_wstring_to_utf8 * lint : fix whitespaces * tests : add tokenizer tests for numbers * unicode : remove redundant headers * tests : remove and rename tokenizer test scripts * tests : add sample usage * gguf-py : reader prints warnings on duplicate keys * llama : towards llama3 tokenization support (wip) * unicode : shot in the dark to fix tests on Windows * unicode : first try custom implementations * convert : add "tokenizer.ggml.pre" GGUF KV (wip) * llama : use new pre-tokenizer type * convert : fix pre-tokenizer type writing * lint : fix * make : add test-tokenizer-0-llama-v3 * wip * models : add llama v3 vocab file * llama : adapt punctuation regex + add llama 3 regex * minor * unicode : set bomb * unicode : set bomb * unicode : always use std::wregex * unicode : support \p{N}, \p{L} and \p{P} natively * unicode : try fix windows * unicode : category support via std::regex * unicode : clean-up * unicode : simplify * convert : add convert-hf-to-gguf-update.py ggml-ci * lint : update * convert : add falcon ggml-ci * unicode : normalize signatures * lint : fix * lint : fix * convert : remove unused functions * convert : add comments * convert : exercise contractions ggml-ci * lint : fix * cmake : refactor test targets * tests : refactor vocab tests ggml-ci * tests : add more vocabs and tests ggml-ci * unicode : cleanup * scripts : ignore new update script in check-requirements.sh * models : add phi-3, mpt, gpt-2, starcoder * tests : disable obsolete ggml-ci * tests : use faster bpe test ggml-ci * llama : more prominent warning for old BPE models * tests : disable test-tokenizer-1-bpe due to slowness ggml-ci --------- Co-authored-by: Jaggzh <jaggz.h@gmail.com> Co-authored-by: Kazim Abrar Mahi <kazimabrarmahi135@gmail.com>	2024-04-29 16:58:41 +03:00
Przemysław Pawełczyk	ca7f29f568	ci : add building in MSYS2 environments (Windows) (#6967 )	2024-04-29 15:59:47 +03:00
Pierrick Hymbert	b7368332e2	ci: server: tests python env on github container ubuntu latest / fix n_predict (#6935 ) * ci: server: fix python env * ci: server: fix server tests after #6638 * ci: server: fix windows is not building PR branch	2024-04-27 17:50:48 +02:00
Pierrick Hymbert	bbe3c6e761	ci: server: fix python installation (#6925 )	2024-04-26 12:27:25 +02:00
Pierrick Hymbert	9e4e077ec5	ci: server: fix python installation (#6922 )	2024-04-26 11:11:51 +02:00
Pierrick Hymbert	d4a9afc100	ci: server: fix python installation (#6918 )	2024-04-26 09:27:49 +02:00
Pierrick Hymbert	7d641c26ac	ci: fix concurrency for pull_request_target (#6917 )	2024-04-26 09:26:59 +02:00
Pierrick Hymbert	c0956b09ba	ci: fix job are cancelling each other (#6781 )	2024-04-22 13:22:54 +02:00
loonerin	0e4802b2ec	ci: add ubuntu latest release and fix missing build number (mac & ubuntu) (#6748 )	2024-04-19 19:03:35 +02:00
Concedo	b0d796fb49	use different cublas binaries	2024-04-17 17:14:22 +08:00
Concedo	790b58fbf6	updated workflow for windows build (+1 squashed commits) Squashed commits: [b7e59661] test workflow	2024-04-16 17:20:46 +08:00
Concedo	bb7eb36134	test copying from install	2024-04-16 16:49:38 +08:00
Jaemin Son	e689fc4e91	[bug fix] convert github repository_owner to lowercase (#6673 )	2024-04-14 13:12:36 +02:00
Georgi Gerganov	9ed2737acc	ci : disable Metal for macOS-latest-cmake-x64 (#6628 )	2024-04-12 11:15:05 +03:00
Hugo Roussel	1bbdaf6ecd	ci: download artifacts to release directory (#6612 ) When action download-artifact was updated to v4, the default download path changed. This fix binaries not being uploaded to releases.	2024-04-11 19:52:21 +02:00
Concedo	2f3597c29a	typo for build dir	2024-04-12 00:10:28 +08:00
Concedo	a5fbf49a97	added cuda kcpp build steps	2024-04-11 23:45:32 +08:00
Concedo	06e3a6f36e	test workflow (+9 squashed commit) Squashed commit: [3d1fedab] test workflow [c26d3a50] test workflow [70e84f54] test workflow [3383d040] workflow test [2262b3c6] workflow test [cd335d5a] workflow test [bdbbfaeb] workflow test [8e9fed4c] testing workflow [e5b90d66] workflow test	2024-04-11 23:20:08 +08:00
Concedo	41fa4310b9	workflow test	2024-04-11 21:35:12 +08:00
Concedo	d0e40f9233	fix indentation (+1 squashed commits) Squashed commits: [4d0fc028] testing a simple workflow for windows full build	2024-04-11 21:33:33 +08:00
Pierrick Hymbert	b804b1ef77	eval-callback: Example how to use eval callback for debugging (#6576 ) * gguf-debug: Example how to use ggml callback for debugging * gguf-debug: no mutex, verify type, fix stride. * llama: cv eval: move cb eval field in common gpt_params * ggml_debug: use common gpt_params to pass cb eval. Fix get tensor SIGV random. * ggml_debug: ci: add tests * ggml_debug: EOL in CMakeLists.txt * ggml_debug: Remove unused param n_batch, no batching here * ggml_debug: fix trailing spaces * ggml_debug: fix trailing spaces * common: fix cb_eval and user data not initialized * ci: build revert label * ggml_debug: add main test label * doc: add a model: add a link to ggml-debug * ggml-debug: add to make toolchain * ggml-debug: tests add the main label * ggml-debug: ci add test curl label * common: allow the warmup to be disabled in llama_init_from_gpt_params * ci: add curl test * ggml-debug: better tensor type support * gitignore : ggml-debug * ggml-debug: printing also the sum of each tensor * ggml-debug: remove block size * eval-callback: renamed from ggml-debug * eval-callback: fix make toolchain --------- Co-authored-by: slaren <slarengh@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-11 14:51:07 +02:00
Concedo	3fd40ae7f7	removed a workflow	2024-04-10 19:28:10 +08:00
Concedo	81ac0e5656	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/full-cuda.Dockerfile # .devops/full-rocm.Dockerfile # .devops/full.Dockerfile # .devops/llama-cpp-clblast.srpm.spec # .devops/llama-cpp-cuda.srpm.spec # .devops/llama-cpp.srpm.spec # .devops/nix/package.nix # .devops/server-cuda.Dockerfile # .devops/server-intel.Dockerfile # .devops/server-rocm.Dockerfile # .devops/server-vulkan.Dockerfile # .devops/server.Dockerfile # .github/workflows/build.yml # .github/workflows/code-coverage.yml # .github/workflows/docker.yml # .github/workflows/editorconfig.yml # .github/workflows/gguf-publish.yml # .github/workflows/nix-ci-aarch64.yml # .github/workflows/nix-ci.yml # .github/workflows/python-check-requirements.yml # .github/workflows/python-lint.yml # .github/workflows/server.yml # .github/workflows/zig-build.yml # CMakeLists.txt # Makefile # README-sycl.md # README.md # ci/run.sh # examples/gguf-split/gguf-split.cpp # flake.lock # flake.nix # llama.cpp # scripts/compare-llama-bench.py # scripts/sync-ggml-am.sh # scripts/sync-ggml.last # scripts/sync-ggml.sh # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-chat-template.cpp	2024-04-07 22:07:27 +08:00
Pierrick Hymbert	75cd4c7729	ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response (#6495 ) * ci: bench: support sse and fix prompt processing time server: add tokens usage in stream mode * ci: bench: README.md EOL * ci: bench: remove total pp and tg as it is not accurate * ci: bench: fix case when there is no token generated * ci: bench: change to the 95 percentile for pp and tg as it is closer to what the server exports in metrics * ci: bench: fix finish reason rate	2024-04-06 05:40:47 +02:00
Minsoo Cheong	7dda1b727e	ci: exempt master branch workflows from getting cancelled (#6486 ) * ci: exempt master branch workflows from getting cancelled * apply to bench.yml	2024-04-04 18:30:53 +02:00
Ewout ter Hoeven	c666ba26c3	build CI: Name artifacts (#6482 ) Name the artifacts in the build CI, so that they get uploaded with separate names, instead of all put into the same `artifact` ZIP. It might be possible to further simplify the packing step (in future PRs).	2024-04-04 17:08:55 +02:00
Pierrick Hymbert	8120efee1d	ci: bench fix concurrency for workflow trigger dispatch with sha1 (#6478 )	2024-04-04 16:59:04 +02:00
Pierrick Hymbert	7a2c92637a	ci: bench: add more ftype, fix triggers and bot comment (#6466 ) * ci: bench: change trigger path to not spawn on each PR * ci: bench: add more file type for phi-2: q8_0 and f16. - do not show the comment by default * ci: bench: add seed parameter in k6 script * ci: bench: artefact name perf job * Add iteration in the commit status, reduce again the autocomment * ci: bench: add per slot metric in the commit status * Fix trailing spaces	2024-04-04 12:57:58 +03:00
Ewout ter Hoeven	9f62c0173d	ci : update checkout, setup-python and upload-artifact to latest (#6456 ) * CI: Update actions/checkout to v4 * CI: Update actions/setup-python to v5 * CI: Update actions/upload-artifact to v4	2024-04-03 21:01:13 +03:00
Pierrick Hymbert	226e819371	ci: server: verify deps are coherent with the commit (#6409 ) * ci: server: verify deps are coherent with the commit * ci: server: change the ref to build as now it's a pull event target	2024-04-01 12:36:40 +02:00
Pierrick Hymbert	37e7854c10	ci: bench: fix Resource not accessible by integration on PR event (#6393 )	2024-03-30 12:36:07 +02:00
Pierrick Hymbert	28cb9a09c4	ci: bench: fix master not schedule, fix commit status failed on external repo (#6365 )	2024-03-28 11:27:56 +01:00
Pierrick Hymbert	a016026a3a	server: continuous performance monitoring and PR comment (#6283 ) * server: bench: init * server: bench: reduce list of GPU nodes * server: bench: fix graph, fix output artifact * ci: bench: add mermaid in case of image cannot be uploaded * ci: bench: more resilient, more metrics * ci: bench: trigger build * ci: bench: fix duration * ci: bench: fix typo * ci: bench: fix mermaid values, markdown generated * typo on the step name Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com> * ci: bench: trailing spaces * ci: bench: move images in a details section * ci: bench: reduce bullet point size --------- Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>	2024-03-27 20:26:49 +01:00
Neo Zhang Jianyu	a4f569e8a3	[SYCL] fix no file in win rel (#6314 )	2024-03-27 09:47:06 +08:00
slaren	280345968d	cuda : rename build flag to LLAMA_CUDA (#6299 )	2024-03-26 01:16:01 +01:00
Pierrick Hymbert	ea279d5609	ci : close inactive issue, increase operations per run (#6270 )	2024-03-24 10:57:06 +02:00
Neo Zhang Jianyu	d03224ac98	Support build win release for SYCL (#6241 ) * support release win * fix value * fix value * fix value * fix error * fix error * fix format	2024-03-24 09:44:01 +08:00
Pierrick Hymbert	f482bb2e49	common: llama_load_model_from_url split support (#6192 ) * llama: llama_split_prefix fix strncpy does not include string termination common: llama_load_model_from_url: - fix header name case sensitive - support downloading additional split in parallel - hide password in url * common: EOL EOF * common: remove redundant LLAMA_CURL_MAX_PATH_LENGTH definition * common: change max url max length * common: minor comment * server: support HF URL options * llama: llama_model_loader fix log * common: use a constant for max url length * common: clean up curl if file cannot be loaded in gguf * server: tests: add split tests, and HF options params * common: move llama_download_hide_password_in_url inside llama_download_file as a lambda * server: tests: enable back Release test on PR * spacing Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * spacing Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * spacing Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-23 18:07:00 +01:00
fraxy-v	92397d87a4	convert-llama2c-to-ggml : enable conversion of GQA models (#6237 ) * convert-llama2c-to-ggml: enable conversion of multiqueries, #5608 * add test in build action * Update build.yml * Update build.yml * Update build.yml * gg patch	2024-03-22 20:49:06 +02:00
Minsoo Cheong	ee804f6223	ci: apply concurrency limit for github workflows (#6243 )	2024-03-22 19:15:06 +02:00
Olivier Chafik	f77a8ffd3b	tests : conditional python & node json schema tests (#6207 ) * json: only attempt python & node schema conversion tests if their bins are present Tests introduced in https://github.com/ggerganov/llama.cpp/pull/5978 disabled in https://github.com/ggerganov/llama.cpp/pull/6198 * json: orange warnings when tests skipped * json: ensure py/js schema conv tested on ubuntu-focal-make * json: print env vars in test	2024-03-22 15:09:07 +02:00
Vaibhav Srivastav	b2075fd6a5	ci : add CURL flag for the mac builds (#6214 )	2024-03-22 09:53:43 +02:00
Vaibhav Srivastav	1943c01981	ci : fix indentation error (#6195 )	2024-03-21 11:30:40 +02:00
Vaibhav Srivastav	5e43ba8742	build : add mac pre-build binaries (#6182 ) * Initial commit - add mac prebuilds. * forward contribution credits for building the workflow. * minor : remove trailing whitespaces --------- Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-21 11:13:12 +02:00
Concedo	942fb4b413	fixed removed ref (+1 squashed commits) Squashed commits: [93f3c270] fixed removed ref (+1 squashed commits) Squashed commits: [df361250] remove some files	2024-03-19 19:33:56 +08:00
Concedo	a3fa919c67	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # Makefile # flake.lock # ggml-cuda.cu # ggml-cuda.h	2024-03-19 18:57:22 +08:00
slaren	970a48060a	ci : exempt some labels from being tagged as stale (#6140 )	2024-03-19 10:06:54 +02:00
Georgi Gerganov	ac9ee6a4ad	ci : disable stale issue messages (#6126 )	2024-03-18 13:45:38 +02:00
Georgi Gerganov	4f6d1337ca	ci : temporary disable sanitizer builds (#6128 )	2024-03-18 13:45:27 +02:00

238 commits