Olivier Chafik
b8a7a5a90f
build(cmake): simplify instructions (cmake -B build && cmake --build build ...) ( #6964 )
* readme: cmake . -B build && cmake --build build
* build: fix typo
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
* build: drop implicit . from cmake config command
* build: remove another superfluous .
* build: update MinGW cmake commands
* Update README-sycl.md
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
* build: reinstate --config Release as not the default w/ some generators + document how to build Debug
* build: revert more --config Release
* build: nit / remove -H from cmake example
* build: reword debug instructions around single/multi config split
---------
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2024-04-29 17:02:45 +01:00
Georgi Gerganov
f4ab2a4147
llama : fix BPE pre-tokenization ( #6920 )
* merged the changes from deepseeker models to main branch
* Moved regex patterns to unicode.cpp and updated unicode.h
* Moved header files
* Resolved issues
* added and refactored unicode_regex_split and related functions
* Updated/merged the deepseek coder pr
* Refactored code
* Adding unicode regex mappings
* Adding unicode regex function
* Added needed functionality, testing remains
* Fixed issues
* Fixed issue with gpt2 regex custom preprocessor
* unicode : fix? unicode_wstring_to_utf8
* lint : fix whitespaces
* tests : add tokenizer tests for numbers
* unicode : remove redundant headers
* tests : remove and rename tokenizer test scripts
* tests : add sample usage
* gguf-py : reader prints warnings on duplicate keys
* llama : towards llama3 tokenization support (wip)
* unicode : shot in the dark to fix tests on Windows
* unicode : first try custom implementations
* convert : add "tokenizer.ggml.pre" GGUF KV (wip)
* llama : use new pre-tokenizer type
* convert : fix pre-tokenizer type writing
* lint : fix
* make : add test-tokenizer-0-llama-v3
* wip
* models : add llama v3 vocab file
* llama : adapt punctuation regex + add llama 3 regex
* minor
* unicode : set bomb
* unicode : set bomb
* unicode : always use std::wregex
* unicode : support \p{N}, \p{L} and \p{P} natively
* unicode : try fix windows
* unicode : category support via std::regex
* unicode : clean-up
* unicode : simplify
* convert : add convert-hf-to-gguf-update.py
ggml-ci
* lint : update
* convert : add falcon
ggml-ci
* unicode : normalize signatures
* lint : fix
* lint : fix
* convert : remove unused functions
* convert : add comments
* convert : exercise contractions
ggml-ci
* lint : fix
* cmake : refactor test targets
* tests : refactor vocab tests
ggml-ci
* tests : add more vocabs and tests
ggml-ci
* unicode : cleanup
* scripts : ignore new update script in check-requirements.sh
* models : add phi-3, mpt, gpt-2, starcoder
* tests : disable obsolete
ggml-ci
* tests : use faster bpe test
ggml-ci
* llama : more prominent warning for old BPE models
* tests : disable test-tokenizer-1-bpe due to slowness
ggml-ci
---------
Co-authored-by: Jaggzh <jaggz.h@gmail.com>
Co-authored-by: Kazim Abrar Mahi <kazimabrarmahi135@gmail.com>
2024-04-29 16:58:41 +03:00
Przemysław Pawełczyk
ca7f29f568
ci : add building in MSYS2 environments (Windows) ( #6967 )
2024-04-29 15:59:47 +03:00
Pierrick Hymbert
b7368332e2
ci: server: tests python env on github container ubuntu latest / fix n_predict ( #6935 )
* ci: server: fix python env
* ci: server: fix server tests after #6638
* ci: server: fix windows not building PR branch
2024-04-27 17:50:48 +02:00
Pierrick Hymbert
bbe3c6e761
ci: server: fix python installation ( #6925 )
2024-04-26 12:27:25 +02:00
Pierrick Hymbert
9e4e077ec5
ci: server: fix python installation ( #6922 )
2024-04-26 11:11:51 +02:00
Pierrick Hymbert
d4a9afc100
ci: server: fix python installation ( #6918 )
2024-04-26 09:27:49 +02:00
Pierrick Hymbert
7d641c26ac
ci: fix concurrency for pull_request_target ( #6917 )
2024-04-26 09:26:59 +02:00
Pierrick Hymbert
c0956b09ba
ci: fix jobs cancelling each other ( #6781 )
2024-04-22 13:22:54 +02:00
loonerin
0e4802b2ec
ci: add ubuntu latest release and fix missing build number (mac & ubuntu) ( #6748 )
2024-04-19 19:03:35 +02:00
Concedo
b0d796fb49
use different cublas binaries
2024-04-17 17:14:22 +08:00
Concedo
790b58fbf6
updated workflow for windows build (+1 squashed commits)
Squashed commits:
[b7e59661] test workflow
2024-04-16 17:20:46 +08:00
Concedo
bb7eb36134
test copying from install
2024-04-16 16:49:38 +08:00
Jaemin Son
e689fc4e91
[bug fix] convert github repository_owner to lowercase ( #6673 )
2024-04-14 13:12:36 +02:00
Georgi Gerganov
9ed2737acc
ci : disable Metal for macOS-latest-cmake-x64 ( #6628 )
2024-04-12 11:15:05 +03:00
Hugo Roussel
1bbdaf6ecd
ci: download artifacts to release directory ( #6612 )
When action download-artifact was updated to v4, the default download path changed.
This fixes binaries not being uploaded to releases.
2024-04-11 19:52:21 +02:00
Concedo
2f3597c29a
typo for build dir
2024-04-12 00:10:28 +08:00
Concedo
a5fbf49a97
added cuda kcpp build steps
2024-04-11 23:45:32 +08:00
Concedo
06e3a6f36e
test workflow (+9 squashed commit)
Squashed commit:
[3d1fedab] test workflow
[c26d3a50] test workflow
[70e84f54] test workflow
[3383d040] workflow test
[2262b3c6] workflow test
[cd335d5a] workflow test
[bdbbfaeb] workflow test
[8e9fed4c] testing workflow
[e5b90d66] workflow test
2024-04-11 23:20:08 +08:00
Concedo
41fa4310b9
workflow test
2024-04-11 21:35:12 +08:00
Concedo
d0e40f9233
fix indentation (+1 squashed commits)
Squashed commits:
[4d0fc028] testing a simple workflow for windows full build
2024-04-11 21:33:33 +08:00
Pierrick Hymbert
b804b1ef77
eval-callback: Example how to use eval callback for debugging ( #6576 )
* gguf-debug: Example how to use ggml callback for debugging
* gguf-debug: no mutex, verify type, fix stride.
* llama: cv eval: move cb eval field in common gpt_params
* ggml_debug: use common gpt_params to pass cb eval.
Fix random SIGSEGV in get tensor.
* ggml_debug: ci: add tests
* ggml_debug: EOL in CMakeLists.txt
* ggml_debug: Remove unused param n_batch, no batching here
* ggml_debug: fix trailing spaces
* ggml_debug: fix trailing spaces
* common: fix cb_eval and user data not initialized
* ci: build revert label
* ggml_debug: add main test label
* doc: add a model: add a link to ggml-debug
* ggml-debug: add to make toolchain
* ggml-debug: tests add the main label
* ggml-debug: ci add test curl label
* common: allow the warmup to be disabled in llama_init_from_gpt_params
* ci: add curl test
* ggml-debug: better tensor type support
* gitignore : ggml-debug
* ggml-debug: printing also the sum of each tensor
* ggml-debug: remove block size
* eval-callback: renamed from ggml-debug
* eval-callback: fix make toolchain
---------
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-11 14:51:07 +02:00
Concedo
3fd40ae7f7
removed a workflow
2024-04-10 19:28:10 +08:00
Concedo
81ac0e5656
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/full-rocm.Dockerfile
# .devops/full.Dockerfile
# .devops/llama-cpp-clblast.srpm.spec
# .devops/llama-cpp-cuda.srpm.spec
# .devops/llama-cpp.srpm.spec
# .devops/nix/package.nix
# .devops/server-cuda.Dockerfile
# .devops/server-intel.Dockerfile
# .devops/server-rocm.Dockerfile
# .devops/server-vulkan.Dockerfile
# .devops/server.Dockerfile
# .github/workflows/build.yml
# .github/workflows/code-coverage.yml
# .github/workflows/docker.yml
# .github/workflows/editorconfig.yml
# .github/workflows/gguf-publish.yml
# .github/workflows/nix-ci-aarch64.yml
# .github/workflows/nix-ci.yml
# .github/workflows/python-check-requirements.yml
# .github/workflows/python-lint.yml
# .github/workflows/server.yml
# .github/workflows/zig-build.yml
# CMakeLists.txt
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# examples/gguf-split/gguf-split.cpp
# flake.lock
# flake.nix
# llama.cpp
# scripts/compare-llama-bench.py
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
2024-04-07 22:07:27 +08:00
Pierrick Hymbert
75cd4c7729
ci: bench: support sse and fix prompt processing time / server: add tokens usage in stream OAI response ( #6495 )
* ci: bench: support sse and fix prompt processing time
server: add tokens usage in stream mode
* ci: bench: README.md EOL
* ci: bench: remove total pp and tg as it is not accurate
* ci: bench: fix case when there is no token generated
* ci: bench: change to the 95 percentile for pp and tg as it is closer to what the server exports in metrics
* ci: bench: fix finish reason rate
2024-04-06 05:40:47 +02:00
Minsoo Cheong
7dda1b727e
ci: exempt master branch workflows from getting cancelled ( #6486 )
* ci: exempt master branch workflows from getting cancelled
* apply to bench.yml
2024-04-04 18:30:53 +02:00
Ewout ter Hoeven
c666ba26c3
build CI: Name artifacts ( #6482 )
Name the artifacts in the build CI, so that they get uploaded with separate names, instead of all put into the same `artifact` ZIP.
It might be possible to further simplify the packing step (in future PRs).
2024-04-04 17:08:55 +02:00
Pierrick Hymbert
8120efee1d
ci: bench fix concurrency for workflow trigger dispatch with sha1 ( #6478 )
2024-04-04 16:59:04 +02:00
Pierrick Hymbert
7a2c92637a
ci: bench: add more ftype, fix triggers and bot comment ( #6466 )
* ci: bench: change trigger path to not spawn on each PR
* ci: bench: add more file types for phi-2: q8_0 and f16.
- do not show the comment by default
* ci: bench: add seed parameter in k6 script
* ci: bench: artefact name perf job
* Add iteration in the commit status, reduce the autocomment again
* ci: bench: add per slot metric in the commit status
* Fix trailing spaces
2024-04-04 12:57:58 +03:00
Ewout ter Hoeven
9f62c0173d
ci : update checkout, setup-python and upload-artifact to latest ( #6456 )
* CI: Update actions/checkout to v4
* CI: Update actions/setup-python to v5
* CI: Update actions/upload-artifact to v4
2024-04-03 21:01:13 +03:00
Pierrick Hymbert
226e819371
ci: server: verify deps are coherent with the commit ( #6409 )
* ci: server: verify deps are coherent with the commit
* ci: server: change the ref to build as now it's a pull event target
2024-04-01 12:36:40 +02:00
Pierrick Hymbert
37e7854c10
ci: bench: fix Resource not accessible by integration on PR event ( #6393 )
2024-03-30 12:36:07 +02:00
Pierrick Hymbert
28cb9a09c4
ci: bench: fix master not being scheduled, fix commit status failing on external repo ( #6365 )
2024-03-28 11:27:56 +01:00
Pierrick Hymbert
a016026a3a
server: continuous performance monitoring and PR comment ( #6283 )
* server: bench: init
* server: bench: reduce list of GPU nodes
* server: bench: fix graph, fix output artifact
* ci: bench: add mermaid in case the image cannot be uploaded
* ci: bench: more resilient, more metrics
* ci: bench: trigger build
* ci: bench: fix duration
* ci: bench: fix typo
* ci: bench: fix mermaid values, markdown generated
* typo in the step name
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* ci: bench: trailing spaces
* ci: bench: move images in a details section
* ci: bench: reduce bullet point size
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-03-27 20:26:49 +01:00
Neo Zhang Jianyu
a4f569e8a3
[SYCL] fix no file in win rel ( #6314 )
2024-03-27 09:47:06 +08:00
slaren
280345968d
cuda : rename build flag to LLAMA_CUDA ( #6299 )
2024-03-26 01:16:01 +01:00
Pierrick Hymbert
ea279d5609
ci : close inactive issue, increase operations per run ( #6270 )
2024-03-24 10:57:06 +02:00
Neo Zhang Jianyu
d03224ac98
Support build win release for SYCL ( #6241 )
* support release win
* fix value
* fix value
* fix value
* fix error
* fix error
* fix format
2024-03-24 09:44:01 +08:00
Pierrick Hymbert
f482bb2e49
common: llama_load_model_from_url split support ( #6192 )
* llama: llama_split_prefix: fix strncpy not including the string terminator
common: llama_load_model_from_url:
- fix case sensitivity of header names
- support downloading additional split in parallel
- hide password in url
* common: EOL EOF
* common: remove redundant LLAMA_CURL_MAX_PATH_LENGTH definition
* common: change max url length
* common: minor comment
* server: support HF URL options
* llama: llama_model_loader fix log
* common: use a constant for max url length
* common: clean up curl if file cannot be loaded in gguf
* server: tests: add split tests, and HF options params
* common: move llama_download_hide_password_in_url inside llama_download_file as a lambda
* server: tests: enable back Release test on PR
* spacing
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* spacing
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* spacing
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-23 18:07:00 +01:00
fraxy-v
92397d87a4
convert-llama2c-to-ggml : enable conversion of GQA models ( #6237 )
* convert-llama2c-to-ggml: enable conversion of multiqueries, #5608
* add test in build action
* Update build.yml
* Update build.yml
* Update build.yml
* gg patch
2024-03-22 20:49:06 +02:00
Minsoo Cheong
ee804f6223
ci: apply concurrency limit for github workflows ( #6243 )
2024-03-22 19:15:06 +02:00
Olivier Chafik
f77a8ffd3b
tests : conditional python & node json schema tests ( #6207 )
* json: only attempt python & node schema conversion tests if their bins are present
Tests introduced in https://github.com/ggerganov/llama.cpp/pull/5978
disabled in https://github.com/ggerganov/llama.cpp/pull/6198
* json: orange warnings when tests skipped
* json: ensure py/js schema conv tested on ubuntu-focal-make
* json: print env vars in test
2024-03-22 15:09:07 +02:00
Vaibhav Srivastav
b2075fd6a5
ci : add CURL flag for the mac builds ( #6214 )
2024-03-22 09:53:43 +02:00
Vaibhav Srivastav
1943c01981
ci : fix indentation error ( #6195 )
2024-03-21 11:30:40 +02:00
Vaibhav Srivastav
5e43ba8742
build : add mac pre-build binaries ( #6182 )
* Initial commit - add mac prebuilds.
* forward contribution credits for building the workflow.
* minor : remove trailing whitespaces
---------
Co-authored-by: Nicolas Patry <Narsil@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-21 11:13:12 +02:00
Concedo
942fb4b413
fixed removed ref (+1 squashed commits)
Squashed commits:
[93f3c270] fixed removed ref (+1 squashed commits)
Squashed commits:
[df361250] remove some files
2024-03-19 19:33:56 +08:00
Concedo
a3fa919c67
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# Makefile
# flake.lock
# ggml-cuda.cu
# ggml-cuda.h
2024-03-19 18:57:22 +08:00
slaren
970a48060a
ci : exempt some labels from being tagged as stale ( #6140 )
2024-03-19 10:06:54 +02:00
Georgi Gerganov
ac9ee6a4ad
ci : disable stale issue messages ( #6126 )
2024-03-18 13:45:38 +02:00
Georgi Gerganov
4f6d1337ca
ci : temporary disable sanitizer builds ( #6128 )
2024-03-18 13:45:27 +02:00