Commit graph

4365 commits

Author SHA1 Message Date
Concedo
bc39b4d98a Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	README.md
#	ci/run.sh
#	docs/BLIS.md
#	flake.lock
#	grammars/README.md
2024-05-08 09:58:23 +08:00
henk717
3eeb4bc127
Stop forcing concedo_experimental in CI (#833)
* Use correct branch name

* CI now uses selected branch
2024-05-08 09:54:56 +08:00
Jeximo
c780e75305
Further tidy on Android instructions README.md (#7077)
* Further tidy on Android instructions README.md

Fixed some logic when following readme direction

* Clean up redundent information

A new user arriving will see simple directions on llama.cpp homepage

* corrected puncuation

Period after cmake, colon after termux

* re-word for clarity

method seems to be more correct, instead of alternative in this context

* Organized required packages per build type

building llama.cpp with NDK on a pc doesn't require installing clang, cmake, git, or wget in termux.

* README.md

corrected title

* fix trailing whitespace
2024-05-08 02:26:43 +02:00
jukofyork
48b2f9c1fc
Fixed save_imatrix to match old behaviour for MoE (#7099)
* Fixed save_imatrix to match old behaviour for MoE

This fix is simple and clear, but unnecessarily doubles the memory overhead..

* Fixed missing idx variable

* Unconditionally increment ncall

Co-authored-by: slaren <slarengh@gmail.com>

* Fixed 2 bugs in save_imatrix()

- Fixed segfault bug because the counts vector needed to be created.
- Fixed pre-existing bug didn't actually add to the counts for "--combine" option.

* ncall needs summing too

* Trailing whitespace

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-05-08 02:24:16 +02:00
Johannes Gäßler
af0a5b6163
server: fix incorrectly reported token probabilities (#7125)
* server: normalize token probabilities

* fix temperature == 0.0f
2024-05-07 23:07:58 +02:00
nopperl
b6aa670203
Fix OLMo HF to GGUF conversion (#6910) 2024-05-07 21:39:43 +02:00
Kyle Mistele
260b7c6529
server : update readme with undocumented options (#7013) 2024-05-07 21:44:29 +03:00
Georgi Gerganov
53d6c52e22
readme : update hot topics 2024-05-07 21:43:13 +03:00
RhinoDevel
3af34c1d1b
main : update log text (EOS to EOG) (#7104)
* Update log text (EOS to EOG)

The log text "found EOS" is no longer always correct, here, because there is now an is-EOG check that also returns true for EOT.

* Improve log msg. further by using "an" instead of "some".

As suggested, to avoid misunderstanding (no multiple EOG tokens found, just one).
2024-05-07 20:51:31 +03:00
omahs
04976db7a8
docs: fix typos (#7124)
* fix typo

* fix typos

* fix typo

* fix typos

* fix typo

* fix typos
2024-05-07 18:20:33 +03:00
Georgi Gerganov
947d3ad27d
ci : add GG_BUILD_EXTRA_TESTS_0 env (#7098)
* ci : add GG_BUILD_EXTRA_TESTS_0 env

ggml-ci

* Update run.sh

ggml-ci
2024-05-07 11:08:49 +03:00
William Tambellini
858f6b73f6
Add an option to build without CUDA VMM (#7067)
Add an option to build ggml cuda without CUDA VMM
resolves
https://github.com/ggerganov/llama.cpp/issues/6889
https://forums.developer.nvidia.com/t/potential-nvshmem-allocated-memory-performance-issue/275416/4
2024-05-06 20:12:14 +02:00
Georgi Gerganov
b3a995b416
flake.lock: Update (#7079)
Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/9126214d0a59633752a136528f5f3b9aa8565b7d?narHash=sha256-sB4SWl2lX95bExY2gMFG5HIzvva5AVMJd4Igm%2BGpZNw%3D' (2024-04-01)
  → 'github:hercules-ci/flake-parts/e5d10a24b66c3ea8f150e47dfdb0416ab7c3390e?narHash=sha256-yzcRNDoyVP7%2BSCNX0wmuDju1NUCt8Dz9%2BlyUXEI0dbI%3D' (2024-05-02)
• Updated input 'flake-parts/nixpkgs-lib':
    'github:NixOS/nixpkgs/d8fe5e6c92d0d190646fb9f1056741a229980089?dir=lib&narHash=sha256-iMUFArF0WCatKK6RzfUJknjem0H9m4KgorO/p3Dopkk%3D' (2024-03-29)
  → 'https://github.com/NixOS/nixpkgs/archive/50eb7ecf4cd0a5756d7275c8ba36790e5bd53e33.tar.gz?narHash=sha256-QBx10%2Bk6JWz6u7VsohfSw8g8hjdBZEf8CFzXH1/1Z94%3D' (2024-05-02)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/7bb2ccd8cdc44c91edba16c48d2c8f331fb3d856?narHash=sha256-Drmja/f5MRHZCskS6mvzFqxEaZMeciScCTFxWVLqWEY%3D' (2024-04-25)
  → 'github:NixOS/nixpkgs/63c3a29ca82437c87573e4c6919b09a24ea61b0f?narHash=sha256-4cPymbty65RvF1DWQfc%2BBc8B233A1BWxJnNULJKQ1EY%3D' (2024-05-02)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-05-06 08:36:06 -07:00
Concedo
0d1cd0171a update docs 2024-05-06 21:17:11 +08:00
Concedo
62ea3eee4a announce sdui url 2024-05-06 18:15:34 +08:00
Concedo
6c000cbe7a Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.flake8
#	.github/workflows/bench.yml
#	.github/workflows/python-lint.yml
#	.pre-commit-config.yaml
#	Makefile
#	README.md
#	models/ggml-vocab-bert-bge.gguf.inp
#	models/ggml-vocab-bert-bge.gguf.out
#	models/ggml-vocab-deepseek-coder.gguf.inp
#	models/ggml-vocab-deepseek-coder.gguf.out
#	models/ggml-vocab-deepseek-llm.gguf.inp
#	models/ggml-vocab-deepseek-llm.gguf.out
#	models/ggml-vocab-falcon.gguf.inp
#	models/ggml-vocab-falcon.gguf.out
#	models/ggml-vocab-gpt-2.gguf.inp
#	models/ggml-vocab-gpt-2.gguf.out
#	models/ggml-vocab-llama-bpe.gguf.inp
#	models/ggml-vocab-llama-bpe.gguf.out
#	models/ggml-vocab-llama-spm.gguf.inp
#	models/ggml-vocab-llama-spm.gguf.out
#	models/ggml-vocab-mpt.gguf.inp
#	models/ggml-vocab-mpt.gguf.out
#	models/ggml-vocab-phi-3.gguf
#	models/ggml-vocab-phi-3.gguf.inp
#	models/ggml-vocab-phi-3.gguf.out
#	models/ggml-vocab-refact.gguf
#	models/ggml-vocab-starcoder.gguf.inp
#	models/ggml-vocab-starcoder.gguf.out
#	requirements/requirements-convert.txt
#	scripts/compare-llama-bench.py
#	scripts/run-with-preset.py
#	scripts/verify-checksum-models.py
#	tests/CMakeLists.txt
#	tests/test-tokenizer-0.cpp
2024-05-06 18:09:45 +08:00
Concedo
173c7272d5 EOS bypass mode added 2024-05-06 18:01:49 +08:00
Georgi Gerganov
bcdee0daa7
minor : fix trailing whitespace 2024-05-06 09:31:30 +03:00
Concedo
3667cc0113 fixed stableui btn (+4 squashed commit)
Squashed commit:

[1d4714f1] update default amount to gen

[6eacba33] updated lite

[033589af] added first ver sdui

[16f66d57] updated lite
2024-05-06 00:55:16 +08:00
kunnis
628b299106
Adding support for the --numa argument for llama-bench. (#7080) 2024-05-05 14:17:47 +02:00
Sigbjørn Skjæret
8f8acc8683
Disable benchmark on forked repo (#7034)
* Disable benchmark on forked repo

* only check owner on schedule event

* check owner on push also

* more readable as multi-line

* ternary won't work

* style++

* test++

* enable actions debug

* test--

* remove debug

* test++

* do debug where we can get logs

* test--

* this is driving me crazy

* correct github.event usage

* remove test condition

* correct github.event usage

* test++

* test--

* event_name is pull_request_target

* test++

* test--

* update ref checks
2024-05-05 13:38:55 +02:00
Lyle Dean
ca36326020
readme : add note that LLaMA 3 is not supported with convert.py (#7065) 2024-05-05 08:21:46 +03:00
DAN™
889bdd7686
command-r : add BPE pre-tokenization (#7063)
* Add BPE pre-tokenization for Command-R/R+.

* Bump transformers convert requirement.

* command-r : add individual digits regex

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-05 08:19:30 +03:00
Brian
6fbd432211
py : logging and flake8 suppression refactoring (#7081)
Set one as executable and add basicConfig()
to another. Also added noqa tag to test scripts.
2024-05-05 08:07:48 +03:00
Xuan Son Nguyen
842500144e
gguf-split: add --no-tensor-first-split (#7072) 2024-05-04 18:56:22 +02:00
Concedo
0c381f9ded increase interrogate length 2024-05-05 00:40:49 +08:00
Jeximo
cf768b7e71
Tidy Android Instructions README.md (#7016)
* Tidy Android Instructions README.md

Remove CLBlast instructions(outdated), added OpenBlas.

* don't assume git is installed

Added apt install git, so that git clone works

* removed OpenBlas

Linked to Linux build instructions

* fix typo

Remove word "run"

* correct style

Co-authored-by: slaren <slarengh@gmail.com>

* correct grammar

Co-authored-by: slaren <slarengh@gmail.com>

* delete reference to Android API

* remove Fdroid reference, link directly to Termux

Fdroid is not required

Co-authored-by: slaren <slarengh@gmail.com>

* Update README.md

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-05-04 18:10:15 +02:00
Concedo
5ca267dc9c remove unnecessary prints 2024-05-04 23:28:21 +08:00
viric
fcd84a0f5a
Fix Linux /sys cpu path to guess number of cores (#7064) 2024-05-04 15:26:53 +02:00
maor-ps
03fb8a002d
If first token generated from the server is the stop word the server will crash (#7038)
This will reproduce the issue in llama13b
{
'prompt': 'Q: hello world \nA: ',
 'stop': ['\n'],
 'temperature': 0.0,
 'n_predict': 10,
 'cache_prompt': True,
 'n_probs': 10
}
2024-05-04 11:06:40 +02:00
Georgi Gerganov
92139b90af
tests : add test-tokenizer-0.sh + fix some tokenizers (#7036)
* tests : add test-tokenizer-0.sh

* unicode : add all unicode number ranges

* starcoder : fix pre-tokenizer

* tests : add test that fails with DeepSeek tokenizers

* falcon : fix regex

* unicode : regenerate unicode tables

* refact : add tokenizer model

* lint : fix

* tests : disable failing tests

ggml-ci

* refact : add tests files

ggml-ci

* convert : print -> logging

ggml-ci

* lint : fix

* unicode : digit -> number

* phi-3 : update
2024-05-04 08:32:32 +03:00
Concedo
a3718c6354 1.64.1 to fix llava issues 2024-05-04 10:38:20 +08:00
Concedo
89db8afded revert moondream to try and fix llava 2024-05-04 10:07:54 +08:00
Brian
a2ac89d6ef
convert.py : add python logging instead of print() (#6511)
* convert.py: add python logging instead of print()

* convert.py: verbose flag takes priority over dump flag log suppression

* convert.py: named instance logging

* convert.py: use explicit logger id string

* convert.py: convert extra print() to named logger

* convert.py: sys.stderr.write --> logger.error

* *.py: Convert all python scripts to use logging module

* requirements.txt: remove extra line

* flake8: update flake8 ignore and exclude to match ci settings

* gh-actions: add flake8-no-print to flake8 lint step

* pre-commit: add flake8-no-print to flake8 and also update pre-commit version

* convert-hf-to-gguf.py: print() to logger conversion

* *.py: logging basiconfig refactor to use conditional expression

* *.py: removed commented out logging

* fixup! *.py: logging basiconfig refactor to use conditional expression

* constant.py: logger.error then exit should be a raise exception instead

* *.py: Convert logger error and sys.exit() into a raise exception (for atypical error)

* gguf-convert-endian.py: refactor convert_byteorder() to use tqdm progressbar

* verify-checksum-model.py: This is the result of the program, it should be printed to stdout.

* compare-llama-bench.py: add blank line for readability during missing repo response

* reader.py: read_gguf_file() use print() over logging

* convert.py: warning goes to stderr and won't hurt the dump output

* gguf-dump.py: dump_metadata() should print to stdout

* convert-hf-to-gguf.py: print --> logger.debug or ValueError()

* verify-checksum-models.py: use print() for printing table

* *.py: refactor logging.basicConfig()

* gguf-py/gguf/*.py: use __name__ as logger name

Since they will be imported and not run directly.

* python-lint.yml: use .flake8 file instead

* constants.py: logger no longer required

* convert-hf-to-gguf.py: add additional logging

* convert-hf-to-gguf.py: print() --> logger

* *.py: fix flake8 warnings

* revert changes to convert-hf-to-gguf.py for get_name()

* convert-hf-to-gguf-update.py: use triple quoted f-string instead

* *.py: accidentally corrected the wrong line

* *.py: add compilade warning suggestions and style fixes
2024-05-03 22:36:41 +03:00
Daniel Bevenius
433def286e
llama : rename ctx to user_data in progress_callback (#7045)
* llama : rename ctx to user_data in progress_callback

This commit renames the `ctx` parameter to `user_data` in the
`llama_progress_callback` typedef.

The motivation for this is that other callbacks use `user_data` or
`data`, and using `ctx` in this case might be confusing as it could be
confused with `llama_context`.

---------

Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-05-03 15:24:30 +02:00
Concedo
640f195140 add kobble tiny to readme 2024-05-03 18:13:39 +08:00
henk717
b6bfab128f
CUDA 12 CI (#815)
* Allow KCPP_CUDA to specify CUDA version

* CUDA 12 CI Linux

* CUDA 12 CI

* Fix KCPP_CUDA indent

* KCPP_CUDA ENV Fix

StackOverflow is bad for advice sometimes....

* Lowcase cuda on output filename

* Strip . from filename output
2024-05-03 17:12:57 +08:00
Concedo
a34a09d196 replace destroy with quit for tk 2024-05-03 15:57:13 +08:00
Bartowski
60325fa56f
Remove .attention from skipped tensors to match more accurately (#7051) 2024-05-03 01:49:09 +02:00
alwqx
6ecf3189e0
chore: fix typo in llama.cpp (#7032)
Co-authored-by: Jared Van Bortel <jared@nomic.ai>
2024-05-02 11:56:41 -04:00
Concedo
4c5d307f59 fixed benchmark interrupt (+2 squashed commit)
Squashed commit:

[6e334c8b] require enter key to be pressed

[d50d49b6] fixed bench script
2024-05-02 23:22:47 +08:00
Concedo
0d8c4a9b73 remove quick lowvram option 2024-05-02 14:21:44 +08:00
Concedo
fb7e72352e benchmark includes ver 2024-05-02 14:17:48 +08:00
Concedo
e7a962c70a update readme 2024-05-02 10:57:54 +08:00
Andrew Downing
b0d943de17
Update LOG_IMPL and LOG_TEE_IMPL (#7029)
ROCm clang defines _MSC_VER which results in the wrong implementation of LOG_IMPL and LOG_TEE_IMPL being compiled.

This fixes https://github.com/ggerganov/llama.cpp/issues/6972
2024-05-01 23:31:30 +02:00
l3utterfly
8d608a81b7
main : fix off by one error for context shift (#6921) 2024-05-01 22:27:41 +03:00
Johannes Gäßler
3ea0d36000
Server: add tests for batch size, different seeds (#6950) 2024-05-01 17:52:55 +02:00
Concedo
3c2bd8aad3 add cu12 ci for windows 2024-05-01 22:46:02 +08:00
Concedo
81619f3611 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/close-issue.yml
#	ggml-cuda/common.cuh
#	ggml-cuda/fattn.cu
2024-05-01 21:14:34 +08:00
Johannes Gäßler
1613ef8d8e
CUDA: CUDART < 11.7 workaround for __hmax, __hmax2 (#7019) 2024-05-01 14:46:37 +02:00