William Tambellini
858f6b73f6
Add an option to build without CUDA VMM ( #7067 )
...
Add an option to build ggml cuda without CUDA VMM (a sketch of what this toggles follows this entry).
Resolves https://github.com/ggerganov/llama.cpp/issues/6889
See also: https://forums.developer.nvidia.com/t/potential-nvshmem-allocated-memory-performance-issue/275416/4
2024-05-06 20:12:14 +02:00
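The option above decides at build time whether the CUDA backend's memory pool uses the CUDA virtual memory management (VMM) driver API. Below is a minimal sketch of the two paths, assuming a compile-time define named GGML_CUDA_NO_VMM (the define name and the pool logic are illustrative, not the PR's actual code; error checking and granularity rounding are omitted):

#include <cuda.h>
#include <cuda_runtime.h>
#include <cstddef>

// Sketch only: reserve a device buffer either through the VMM driver API or, when the
// assumed GGML_CUDA_NO_VMM define is set, through a plain cudaMalloc fallback.
// Assumes the driver is initialized and `bytes` is a multiple of the allocation
// granularity reported by cuMemGetAllocationGranularity.
static void * reserve_device_buffer(int device, size_t bytes) {
#if defined(GGML_CUDA_NO_VMM)
    void * ptr = nullptr;
    cudaMalloc(&ptr, bytes);                        // non-VMM fallback
    return ptr;
#else
    CUdeviceptr base = 0;
    cuMemAddressReserve(&base, bytes, 0, 0, 0);     // reserve a virtual address range

    CUmemAllocationProp prop = {};
    prop.type          = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id   = device;

    CUmemGenericAllocationHandle handle;
    cuMemCreate(&handle, bytes, &prop, 0);          // create the physical backing
    cuMemMap(base, bytes, 0, handle, 0);            // map it into the reserved range

    CUmemAccessDesc access = {};
    access.location = prop.location;
    access.flags    = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(base, bytes, &access, 1);        // make it accessible from the device
    return reinterpret_cast<void *>(base);
#endif
}

Keeping the choice at compile time is what makes a build-system opt-out like this one possible, with no extra runtime branching in the pool's hot path.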
Georgi Gerganov
b3a995b416
flake.lock: Update ( #7079 )
...
Flake lock file updates:
• Updated input 'flake-parts':
'github:hercules-ci/flake-parts/9126214d0a59633752a136528f5f3b9aa8565b7d?narHash=sha256-sB4SWl2lX95bExY2gMFG5HIzvva5AVMJd4Igm%2BGpZNw%3D' (2024-04-01)
→ 'github:hercules-ci/flake-parts/e5d10a24b66c3ea8f150e47dfdb0416ab7c3390e?narHash=sha256-yzcRNDoyVP7%2BSCNX0wmuDju1NUCt8Dz9%2BlyUXEI0dbI%3D' (2024-05-02)
• Updated input 'flake-parts/nixpkgs-lib':
'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089?dir=lib&narHash=sha256-iMUFArF0WCatKK6RzfUJknjem0H9m4KgorO/p3Dopkk%3D' (2024-03-29)
→ 'https://github.com/NixOS/nixpkgs/archive/50eb7ecf4cd0a5756d7275c8ba36790e5bd53e33.tar.gz?narHash=sha256-QBx10%2Bk6JWz6u7VsohfSw8g8hjdBZEf8CFzXH1/1Z94%3D' (2024-05-02)
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/7bb2ccd8cdc44c91edba16c48d2c8f331fb3d856?narHash=sha256-Drmja/f5MRHZCskS6mvzFqxEaZMeciScCTFxWVLqWEY%3D' (2024-04-25)
→ 'github:NixOS/nixpkgs/63c3a29ca82437c87573e4c6919b09a24ea61b0f?narHash=sha256-4cPymbty65RvF1DWQfc%2BBc8B233A1BWxJnNULJKQ1EY%3D' (2024-05-02)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-05-06 08:36:06 -07:00
Concedo
0d1cd0171a
update docs
2024-05-06 21:17:11 +08:00
Concedo
62ea3eee4a
announce sdui url
2024-05-06 18:15:34 +08:00
Concedo
6c000cbe7a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .flake8
# .github/workflows/bench.yml
# .github/workflows/python-lint.yml
# .pre-commit-config.yaml
# Makefile
# README.md
# models/ggml-vocab-bert-bge.gguf.inp
# models/ggml-vocab-bert-bge.gguf.out
# models/ggml-vocab-deepseek-coder.gguf.inp
# models/ggml-vocab-deepseek-coder.gguf.out
# models/ggml-vocab-deepseek-llm.gguf.inp
# models/ggml-vocab-deepseek-llm.gguf.out
# models/ggml-vocab-falcon.gguf.inp
# models/ggml-vocab-falcon.gguf.out
# models/ggml-vocab-gpt-2.gguf.inp
# models/ggml-vocab-gpt-2.gguf.out
# models/ggml-vocab-llama-bpe.gguf.inp
# models/ggml-vocab-llama-bpe.gguf.out
# models/ggml-vocab-llama-spm.gguf.inp
# models/ggml-vocab-llama-spm.gguf.out
# models/ggml-vocab-mpt.gguf.inp
# models/ggml-vocab-mpt.gguf.out
# models/ggml-vocab-phi-3.gguf
# models/ggml-vocab-phi-3.gguf.inp
# models/ggml-vocab-phi-3.gguf.out
# models/ggml-vocab-refact.gguf
# models/ggml-vocab-starcoder.gguf.inp
# models/ggml-vocab-starcoder.gguf.out
# requirements/requirements-convert.txt
# scripts/compare-llama-bench.py
# scripts/run-with-preset.py
# scripts/verify-checksum-models.py
# tests/CMakeLists.txt
# tests/test-tokenizer-0.cpp
2024-05-06 18:09:45 +08:00
Concedo
173c7272d5
EOS bypass mode added
2024-05-06 18:01:49 +08:00
Georgi Gerganov
bcdee0daa7
minor : fix trailing whitespace
2024-05-06 09:31:30 +03:00
Concedo
3667cc0113
fixed stableui btn (+4 squashed commits)
...
Squashed commit:
[1d4714f1] update default amount to gen
[6eacba33] updated lite
[033589af] added first ver sdui
[16f66d57] updated lite
2024-05-06 00:55:16 +08:00
kunnis
628b299106
Adding support for the --numa argument for llama-bench. ( #7080 )
2024-05-05 14:17:47 +02:00
Sigbjørn Skjæret
8f8acc8683
Disable benchmark on forked repo ( #7034 )
...
* Disable benchmark on forked repo
* only check owner on schedule event
* check owner on push also
* more readable as multi-line
* ternary won't work
* style++
* test++
* enable actions debug
* test--
* remove debug
* test++
* do debug where we can get logs
* test--
* this is driving me crazy
* correct github.event usage
* remove test condition
* correct github.event usage
* test++
* test--
* event_name is pull_request_target
* test++
* test--
* update ref checks
2024-05-05 13:38:55 +02:00
Lyle Dean
ca36326020
readme : add note that LLaMA 3 is not supported with convert.py ( #7065 )
2024-05-05 08:21:46 +03:00
DAN™
889bdd7686
command-r : add BPE pre-tokenization ( #7063 )
...
* Add BPE pre-tokenization for Command-R/R+.
* Bump transformers convert requirement.
* command-r : add individual digits regex
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-05 08:19:30 +03:00
Brian
6fbd432211
py : logging and flake8 suppression refactoring ( #7081 )
...
Set one script as executable and add basicConfig() to another. Also added a noqa tag to the test scripts.
2024-05-05 08:07:48 +03:00
Xuan Son Nguyen
842500144e
gguf-split: add --no-tensor-first-split ( #7072 )
2024-05-04 18:56:22 +02:00
Concedo
0c381f9ded
increase interrogate length
2024-05-05 00:40:49 +08:00
Jeximo
cf768b7e71
Tidy Android Instructions README.md ( #7016 )
...
* Tidy Android Instructions README.md
Remove CLBlast instructions (outdated), added OpenBLAS.
* don't assume git is installed
Added apt install git, so that git clone works
* removed OpenBlas
Linked to Linux build instructions
* fix typo
Remove word "run"
* correct style
Co-authored-by: slaren <slarengh@gmail.com>
* correct grammar
Co-authored-by: slaren <slarengh@gmail.com>
* delete reference to Android API
* remove Fdroid reference, link directly to Termux
Fdroid is not required
Co-authored-by: slaren <slarengh@gmail.com>
* Update README.md
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-05-04 18:10:15 +02:00
Concedo
5ca267dc9c
remove unnecessary prints
2024-05-04 23:28:21 +08:00
viric
fcd84a0f5a
Fix Linux /sys cpu path to guess number of cores ( #7064 )
2024-05-04 15:26:53 +02:00
maor-ps
03fb8a002d
If the first token generated from the server is the stop word, the server will crash ( #7038 )
...
This will reproduce the issue in llama13b
{
    'prompt': 'Q: hello world \nA: ',
    'stop': ['\n'],
    'temperature': 0.0,
    'n_predict': 10,
    'cache_prompt': True,
    'n_probs': 10
}
2024-05-04 11:06:40 +02:00
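A hedged illustration of the failure mode described above (not the server's actual code): truncating the generated text at a matched stop string has to remain valid when the match lands at position 0, i.e. when the very first generated token is the stop word.

#include <iostream>
#include <string>

// Sketch only: cut the generated text at the first occurrence of a stop string.
// substr(0, pos) is well defined even when pos == 0 (the result is the empty string).
static std::string truncate_at_stop(const std::string & generated, const std::string & stop) {
    const size_t pos = generated.find(stop);
    if (pos == std::string::npos) {
        return generated;                // no stop string seen yet
    }
    return generated.substr(0, pos);     // safe even when pos == 0
}

int main() {
    // With the payload above, the model's first token can be "\n", i.e. the stop word itself.
    std::cout << "[" << truncate_at_stop("\nA: ...", "\n") << "]\n";   // prints []
    std::cout << "[" << truncate_at_stop("42\nQ:", "\n")   << "]\n";   // prints [42]
}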
Georgi Gerganov
92139b90af
tests : add test-tokenizer-0.sh + fix some tokenizers ( #7036 )
...
* tests : add test-tokenizer-0.sh
* unicode : add all unicode number ranges
* starcoder : fix pre-tokenizer
* tests : add test that fails with DeepSeek tokenizers
* falcon : fix regex
* unicode : regenerate unicode tables
* refact : add tokenizer model
* lint : fix
* tests : disable failing tests
ggml-ci
* refact : add tests files
ggml-ci
* convert : print -> logging
ggml-ci
* lint : fix
* unicode : digit -> number
* phi-3 : update
2024-05-04 08:32:32 +03:00
Concedo
a3718c6354
1.64.1 to fix llava issues
2024-05-04 10:38:20 +08:00
Concedo
89db8afded
revert moondream to try and fix llava
2024-05-04 10:07:54 +08:00
Brian
a2ac89d6ef
convert.py : add python logging instead of print() ( #6511 )
...
* convert.py: add python logging instead of print()
* convert.py: verbose flag takes priority over dump flag log suppression
* convert.py: named instance logging
* convert.py: use explicit logger id string
* convert.py: convert extra print() to named logger
* convert.py: sys.stderr.write --> logger.error
* *.py: Convert all python scripts to use logging module
* requirements.txt: remove extra line
* flake8: update flake8 ignore and exclude to match ci settings
* gh-actions: add flake8-no-print to flake8 lint step
* pre-commit: add flake8-no-print to flake8 and also update pre-commit version
* convert-hf-to-gguf.py: print() to logger conversion
* *.py: logging basicConfig refactor to use conditional expression
* *.py: removed commented out logging
* fixup! *.py: logging basicConfig refactor to use conditional expression
* constant.py: logger.error then exit should be a raise exception instead
* *.py: Convert logger error and sys.exit() into a raise exception (for atypical error)
* gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar
* verify-checksum-model.py: This is the result of the program; it should be printed to stdout.
* compare-llama-bench.py: add blank line for readability during missing repo response
* reader.py: read_gguf_file() use print() over logging
* convert.py: warning goes to stderr and won't hurt the dump output
* gguf-dump.py: dump_metadata() should print to stdout
* convert-hf-to-gguf.py: print --> logger.debug or ValueError()
* verify-checksum-models.py: use print() for printing table
* *.py: refactor logging.basicConfig()
* gguf-py/gguf/*.py: use __name__ as logger name
Since they will be imported and not run directly.
* python-lint.yml: use .flake8 file instead
* constants.py: logger no longer required
* convert-hf-to-gguf.py: add additional logging
* convert-hf-to-gguf.py: print() --> logger
* *.py: fix flake8 warnings
* revert changes to convert-hf-to-gguf.py for get_name()
* convert-hf-to-gguf-update.py: use triple quoted f-string instead
* *.py: accidentally corrected the wrong line
* *.py: add compilade warning suggestions and style fixes
2024-05-03 22:36:41 +03:00
Daniel Bevenius
433def286e
llama : rename ctx to user_data in progress_callback ( #7045 )
...
* llama : rename ctx to user_data in progress_callback
This commit renames the `ctx` parameter to `user_data` in the
`llama_progress_callback` typedef.
The motivation is that other callbacks use `user_data` or `data`, and `ctx` here could easily be mistaken for a `llama_context` (a sketch of the renamed typedef follows this entry).
---------
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-05-03 15:24:30 +02:00
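A minimal sketch of the renamed typedef and of a callback using it (the signature follows the commit description; the percentage bookkeeping is illustrative, and llama.h remains the authoritative reference for the callback's contract):

#include <cstdio>

// As described above: the opaque pointer is now named user_data rather than ctx.
typedef bool (*llama_progress_callback)(float progress, void * user_data);

// Illustrative callback: user_data is caller-owned state, not a llama_context.
static bool print_progress(float progress, void * user_data) {
    int * last_pct = static_cast<int *>(user_data);
    const int pct  = static_cast<int>(progress * 100.0f);
    if (pct != *last_pct) {
        *last_pct = pct;
        std::fprintf(stderr, "load: %3d%%\n", pct);
    }
    return true; // returning false conventionally asks the loader to abort
}

int main() {
    int last_pct = -1;
    llama_progress_callback cb = print_progress;
    cb(0.5f, &last_pct);   // in practice the loader invokes this, not user code
}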
Concedo
640f195140
add kobble tiny to readme
2024-05-03 18:13:39 +08:00
henk717
b6bfab128f
CUDA 12 CI ( #815 )
...
* Allow KCPP_CUDA to specify CUDA version
* CUDA 12 CI Linux
* CUDA 12 CI
* Fix KCPP_CUDA indent
* KCPP_CUDA ENV Fix
StackOverflow is bad for advice sometimes....
* Lowercase cuda on output filename
* Strip . from filename output
2024-05-03 17:12:57 +08:00
Concedo
a34a09d196
replace destroy with quit for tk
2024-05-03 15:57:13 +08:00
Bartowski
60325fa56f
Remove .attention from skipped tensors to match more accurately ( #7051 )
2024-05-03 01:49:09 +02:00
alwqx
6ecf3189e0
chore: fix typo in llama.cpp ( #7032 )
...
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2024-05-02 11:56:41 -04:00
Concedo
4c5d307f59
fixed benchmark interrupt (+2 squashed commits)
...
Squashed commit:
[6e334c8b] require enter key to be pressed
[d50d49b6] fixed bench script
2024-05-02 23:22:47 +08:00
Concedo
0d8c4a9b73
remove quick lowvram option
2024-05-02 14:21:44 +08:00
Concedo
fb7e72352e
benchmark includes ver
2024-05-02 14:17:48 +08:00
Concedo
e7a962c70a
update readme
2024-05-02 10:57:54 +08:00
Andrew Downing
b0d943de17
Update LOG_IMPL and LOG_TEE_IMPL ( #7029 )
...
ROCm clang defines _MSC_VER, which causes the wrong implementation of LOG_IMPL and LOG_TEE_IMPL to be compiled (an illustrative sketch follows this entry).
This fixes https://github.com/ggerganov/llama.cpp/issues/6972
2024-05-01 23:31:30 +02:00
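An illustrative reduction of the problem (not the actual macros in common/log.h): ROCm's clang defines _MSC_VER for MSVC compatibility, so a guard on _MSC_VER alone sends ROCm builds down the MSVC-only branch; also checking for clang keeps them on the intended one.

#include <cstdio>

// Sketch only. Genuine MSVC's traditional preprocessor drops the trailing comma when
// __VA_ARGS__ is empty; clang and GCC need the ## extension to get the same effect.
#if defined(_MSC_VER) && !defined(__clang__)
    #define LOG_SKETCH(fmt, ...) std::fprintf(stderr, fmt, __VA_ARGS__)
#else
    #define LOG_SKETCH(fmt, ...) std::fprintf(stderr, fmt, ##__VA_ARGS__)
#endif

int main() {
    LOG_SKETCH("hello %s\n", "world");
    LOG_SKETCH("no format arguments here\n");
}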
l3utterfly
8d608a81b7
main : fix off by one error for context shift ( #6921 )
2024-05-01 22:27:41 +03:00
Johannes Gäßler
3ea0d36000
Server: add tests for batch size, different seeds ( #6950 )
2024-05-01 17:52:55 +02:00
Concedo
3c2bd8aad3
add cu12 ci for windows
2024-05-01 22:46:02 +08:00
Concedo
81619f3611
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/close-issue.yml
# ggml-cuda/common.cuh
# ggml-cuda/fattn.cu
2024-05-01 21:14:34 +08:00
Johannes Gäßler
1613ef8d8e
CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 ( #7019 )
2024-05-01 14:46:37 +02:00
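A minimal sketch of the kind of polyfill such a workaround needs, assuming the intrinsics are simply unavailable on older toolkits (the ggml_hmax/ggml_hmax2 names and the float round-trip are illustrative; the upstream helpers may be written differently):

#include <cuda_fp16.h>
#include <cuda_runtime.h>

#if defined(CUDART_VERSION) && CUDART_VERSION < 11070
// Older toolkits: emulate the half-precision max through float math, which works on
// every architecture at the cost of a conversion round-trip.
static __device__ __forceinline__ half ggml_hmax(const half a, const half b) {
    return __float2half(fmaxf(__half2float(a), __half2float(b)));
}
static __device__ __forceinline__ half2 ggml_hmax2(const half2 a, const half2 b) {
    return __halves2half2(ggml_hmax(__low2half(a),  __low2half(b)),
                          ggml_hmax(__high2half(a), __high2half(b)));
}
#else
// CUDART >= 11.7 ships the intrinsics, so the helpers simply forward to them.
static __device__ __forceinline__ half  ggml_hmax (const half a,  const half b)  { return __hmax(a, b);  }
static __device__ __forceinline__ half2 ggml_hmax2(const half2 a, const half2 b) { return __hmax2(a, b); }
#endif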
Concedo
b641d986f7
use Johannes' implementation instead (+1 squashed commit)
...
Squashed commits:
[f5e6709d] use johannes implementation instead
2024-05-01 18:47:24 +08:00
Concedo
e9978bfac0
resize window dimensions
2024-05-01 17:38:49 +08:00
Concedo
cea46750b0
try hack in missing hmax2 functions (+1 squashed commit)
...
Squashed commits:
[c98d0ab6] try hack in missing hmax2 functions (+1 squashed commit)
Squashed commits:
[9ba8599f] try hack in missing hmax2 functions (+2 squashed commits)
Squashed commits:
[be497493] try hack in missing hmax2 functions
[159ee4c3] bypass missing hmax functions on old cuda
2024-05-01 15:36:16 +08:00
slaren
c4ec9c0d3d
ci : exempt confirmed bugs from being tagged as stale ( #7014 )
2024-05-01 08:13:59 +03:00
Concedo
b48ea96ead
removed unwanted debugs
2024-05-01 11:35:07 +08:00
Concedo
63f8f55c4e
Merge branch 'upstream' into concedo_experimental
2024-05-01 11:04:18 +08:00
Johannes Gäßler
a8f9b07631
perplexity: more statistics, added documentation ( #6936 )
...
* perplexity: more statistics, added documentation
* add LLaMA 3 8b scoreboard
2024-04-30 23:36:27 +02:00
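For context on what the tool reports: the headline perplexity is the exponential of the mean negative log-likelihood over the evaluated tokens, and the extra statistics in this PR are layered on top of that quantity. A minimal host-side sketch (illustrative only):

#include <cmath>
#include <cstdio>
#include <vector>

// Perplexity = exp( -(1/N) * sum_i log p(token_i | context_i) ).
static double perplexity(const std::vector<double> & token_logprobs) {
    double nll = 0.0;
    for (const double lp : token_logprobs) {
        nll -= lp;                                   // accumulate negative log-likelihood
    }
    return std::exp(nll / (double) token_logprobs.size());
}

int main() {
    const std::vector<double> logprobs = { std::log(0.5), std::log(0.25), std::log(0.125) };
    std::printf("ppl = %.3f\n", perplexity(logprobs)); // exp(mean NLL) = 4.000 here
}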
Kevin Gibbons
f364eb6fb5
switch to using localizedDescription ( #7010 )
2024-04-30 17:14:02 +02:00
Concedo
c65448d17a
add flash attention toggle
2024-04-30 21:29:11 +08:00
Concedo
17a24d753c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/main-intel.Dockerfile
# .devops/main-vulkan.Dockerfile
# .devops/server-intel.Dockerfile
# .devops/server-vulkan.Dockerfile
# .github/workflows/bench.yml
# .github/workflows/build.yml
# .github/workflows/python-lint.yml
# .github/workflows/server.yml
# .gitignore
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# flake.lock
# llama.cpp
# models/ggml-vocab-falcon.gguf
# models/ggml-vocab-llama-spm.gguf
# models/ggml-vocab-mpt.gguf
# models/ggml-vocab-stablelm.gguf
# models/ggml-vocab-starcoder.gguf
# requirements.txt
# scripts/check-requirements.sh
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-grammar-integration.cpp
# tests/test-tokenizer-0-bpe.py
# tests/test-tokenizer-0-spm.py
# tests/test-tokenizer-1-spm.cpp
2024-04-30 21:04:17 +08:00
Georgi Gerganov
77e15bec62
metal : remove deprecated error code ( #7008 )
2024-04-30 15:52:21 +03:00