Concedo
b4d2031215
merged, added ability to render special tokens
2024-04-22 18:19:58 +08:00
Olivier Chafik
5cf5e7d490
build: generate hex dump of server assets during build (#6661)
...
* `build`: generate hex dumps of server assets on the fly
* build: workaround lack of -n on gnu xxd
* build: don't use xxd in cmake
* build: don't call xxd from build.zig
* build: more idiomatic hexing
* build: don't use xxd in Makefile (od hackery instead)
* build: avoid exceeding max cmd line limit in makefile hex dump
* build: hex dump assets at cmake build time (not config time)
2024-04-21 18:48:53 +01:00
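The od-based trick from the bullets above (standing in for xxd, which is not uniformly available) can be sketched roughly as follows. This is an illustration of the technique, not the project's actual Makefile rule; the file and symbol names (`asset.bin`, `asset_bin`) are made up for the example.

```shell
# Sketch: emit a C byte-array header from an asset file using od + sed
# instead of xxd. Hypothetical file/symbol names, not the real build rule.
printf 'hello' > asset.bin    # stand-in for a real server asset

{
  echo 'unsigned char asset_bin[] = {'
  # od prints raw hex bytes; sed rewrites each pair as a C literal "0xNN,"
  od -v -An -t x1 asset.bin | sed 's/\([0-9a-f][0-9a-f]\)/0x\1,/g'
  echo '};'
  echo "unsigned int asset_bin_len = $(wc -c < asset.bin);"
} > asset.h

cat asset.h
```

For the 5-byte input above this produces a compilable header declaring `asset_bin` and `asset_bin_len`, which is the same shape of output xxd's `-i` mode would give.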
slaren
0d56246f4b
ggml : group all experts in a single ggml_mul_mat_id (#6505)
...
* ggml : group all experts in a single ggml_mul_mat_id
cuda : improve mmid row copy
* cuda : fix bin bcast with non-cont src0
* test-backend-ops : only run all mul mat tests for base types
* llama : disable moe offloading with SYCL
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-18 15:18:48 +02:00
Pierrick Hymbert
4bd0f93e4a
model: support arch DbrxForCausalLM (#6515)
...
* model: dbrx convert to gguf
#6344
* llama: support dbrx
#6344
* doc: dbrx: add the model as supported
* scripts: get-wikitext-2 add unzip
* llama: increase maximum experts allowed
* llama: factorize moe graph implementation between grok, mixtral and dbrx
---------
Co-authored-by: Megha Agarwal <16129366+megha95@users.noreply.github.com>
2024-04-13 11:33:52 +02:00
Concedo
d0e40f9233
fix indentation (+1 squashed commit)
...
Squashed commits:
[4d0fc028] testing a simple workflow for windows full build
2024-04-11 21:33:33 +08:00
Daniel Bevenius
f4183afe6a
scripts : add --outdir option to hf.sh (#6600)
...
* scripts : add --outdir option to hf.sh
This commit adds an option to the hf.sh script that allows the user to
specify an output directory for the downloaded file.
The motivation for this change is that examples that use the hf.sh
script to download models from huggingface can now specify the output
directory, perhaps to the `models` directory to keep them in one place
and not clutter the root directory.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
* squash! scripts : add --outdir option to hf.sh
Fix format of the --outdir option in the usage message.
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
---------
Signed-off-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2024-04-11 16:22:47 +03:00
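What an `--outdir` option like the one described above amounts to can be sketched in a few lines of POSIX shell. This is a hypothetical reimplementation of the parsing idea, not the real hf.sh source:

```shell
# Hypothetical sketch of --outdir handling in an hf.sh-style downloader;
# option names mirror the commit message, the rest is illustrative.
set -- --file model.gguf --outdir models   # example invocation

OUTDIR="."
FILE=""
while [ $# -gt 0 ]; do
  case "$1" in
    --outdir) OUTDIR="$2"; shift 2 ;;
    --file)   FILE="$2";   shift 2 ;;
    *)        shift ;;
  esac
done

mkdir -p "$OUTDIR"
# A real script would download the file here; we only show the target path.
echo "would save to: $OUTDIR/$FILE"
```

Run with `sh`, this prints `would save to: models/model.gguf` and creates the `models/` directory, which is exactly the "keep models in one place" behaviour the commit motivates.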
Georgi Gerganov
c4a3a4ff47
sync : ggml
2024-04-09 20:29:06 +03:00
Concedo
d1bb126605
Merge branch 'upstream' into concedo
...
# Conflicts:
# README.md
# llama.cpp
# otherarch/sdcpp/SDCPP_LICENSE
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.sh
2024-04-09 17:18:35 +08:00
Georgi Gerganov
e11a8999b5
license : update copyright notice + add AUTHORS (#6405)
...
* license : add AUTHORS
* authors : update
* scripts : add LICENSE and gen-authors.sh to sync
2024-04-09 09:23:19 +03:00
Georgi Gerganov
c37247796b
sync : ggml
2024-04-07 17:05:51 +03:00
Georgi Gerganov
43e8995e75
scripts : sync ggml-cuda folder
2024-04-07 16:08:12 +03:00
Concedo
a530afa1e4
Merge commit '280345968d' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/llama-cpp-cuda.srpm.spec
# .devops/main-cuda.Dockerfile
# .devops/nix/package.nix
# .devops/server-cuda.Dockerfile
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# README.md
# ci/run.sh
# docs/token_generation_performance_tips.md
# flake.lock
# llama.cpp
# scripts/LlamaConfig.cmake.in
# scripts/compare-commits.sh
# scripts/server-llm.sh
# tests/test-quantize-fns.cpp
2024-04-07 20:27:17 +08:00
Georgi Gerganov
54ea0698fb
sync : ggml
2024-04-06 18:27:46 +03:00
Concedo
9c0fbf9f73
Merge commit 'ad3a0505e3' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/close-issue.yml
# .github/workflows/code-coverage.yml
# .github/workflows/docker.yml
# .github/workflows/editorconfig.yml
# .github/workflows/nix-ci-aarch64.yml
# .github/workflows/nix-ci.yml
# .github/workflows/python-check-requirements.yml
# .github/workflows/python-lint.yml
# .github/workflows/server.yml
# .github/workflows/zig-build.yml
# .gitignore
# CMakeLists.txt
# Makefile
# README-sycl.md
# README.md
# build.zig
# common/CMakeLists.txt
# llama.cpp
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
2024-04-06 18:32:57 +08:00
Johannes Gäßler
33a5244806
compare-llama-bench.py: fix long hexsha args (#6424)
2024-04-01 13:30:43 +02:00
Georgi Gerganov
d48ccf3ad4
sync : ggml (#6351)
...
* sync : ggml
ggml-ci
* cuda : move GGML_CUDA_DMMV constants to dmmv.cuh
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-03-29 17:45:46 +02:00
slaren
280345968d
cuda : rename build flag to LLAMA_CUDA (#6299)
2024-03-26 01:16:01 +01:00
Johannes Gäßler
50ccaf5eac
lookup: complement data from context with general text statistics (#5479)
...
* lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
2024-03-23 01:24:36 +01:00
Georgi Gerganov
b838b53ad6
sync : ggml
2024-03-10 20:10:46 +02:00
Georgi Gerganov
8a3012a4ad
ggml : add ggml-common.h to deduplicate shared code (#5940)
...
* ggml : add ggml-common.h to shared code
ggml-ci
* scripts : update sync scripts
* sycl : reuse quantum tables
ggml-ci
* ggml : minor
* ggml : minor
* sycl : try to fix build
2024-03-09 12:47:57 +02:00
slaren
652ca2bded
compare-llama-bench.py : remove mul_mat_q (#5892)
2024-03-05 22:27:29 +01:00
Georgi Gerganov
efd8533ef8
sync : ggml
...
ggml-ci
2024-03-04 20:54:23 +02:00
Georgi Gerganov
a0fc62661f
sync : ggml
2024-03-04 10:40:04 +02:00
Concedo
7c64845dea
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .devops/nix/sif.nix
# .github/workflows/build.yml
# .github/workflows/python-check-requirements.yml
# README-sycl.md
# README.md
# flake.lock
# flake.nix
# requirements/requirements-convert-hf-to-gguf.txt
# scripts/compare-llama-bench.py
2024-03-04 15:33:33 +08:00
Georgi Gerganov
ef2cd694c4
scripts : add pod-llama.sh
2024-03-02 16:54:20 +02:00
Pierrick Hymbert
3ab8b3a92e
llama : cleanup unused mmq flags (#5772)
...
* cleanup unused --no-mul-mat-q, -nommq, -mmq, --mul-mat-q, mul_mat_q
* remove: mul_mat_q in compare llama bench and usage
* update llama-bench
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-03-01 13:39:06 +02:00
Georgi Gerganov
8c0e8f4e73
sync : ggml
2024-02-28 11:17:32 +02:00
Georgi Gerganov
334f76fa38
sync : ggml
2024-02-22 23:21:05 +02:00
Georgi Gerganov
5022cf242d
sync : ggml
2024-02-21 16:52:52 +02:00
Georgi Gerganov
eccd7a26dd
sync : ggml (#5633)
...
* ggml : fix conv_2d batch mode (ggml/737)
Co-authored-by: bssrdf <bssrdf@gmail.com>
* ggml : compute forward no longer pass src tensors (ggml/729)
* sync : ggml
ggml-ci
---------
Co-authored-by: bssrdf <merlintiger@hotmail.com>
Co-authored-by: bssrdf <bssrdf@gmail.com>
2024-02-21 16:17:10 +02:00
Georgi Gerganov
337c9cbd52
sync : ggml
...
ggml-ci
2024-02-19 15:09:43 +02:00
Jared Van Bortel
a0c2dad9d4
build : pass all warning flags to nvcc via -Xcompiler (#5570)
...
* build : pass all warning flags to nvcc via -Xcompiler
* make : fix apparent mis-merge from #3952
* make : fix incorrect GF_CC_VER for CUDA host compiler
2024-02-18 16:21:52 -05:00
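The -Xcompiler mechanism named in that commit works by prefixing each host-compiler flag so nvcc forwards it to the host compiler instead of rejecting it. A minimal sketch, with an illustrative flag list rather than the project's actual warning set:

```shell
# Sketch of the -Xcompiler idea: nvcc does not accept host-compiler
# warning flags directly, so each one is forwarded with -Xcompiler.
# Flag list and file names here are illustrative, not the real Makefile.
HOST_WARN="-Wall -Wextra -Wpedantic"
NVCC_WARN=""
for w in $HOST_WARN; do
  NVCC_WARN="$NVCC_WARN -Xcompiler $w"
done
# A real build would now invoke something like:
echo "nvcc$NVCC_WARN -c kernel.cu -o kernel.o"
```

Passing `-Wextra` to nvcc bare would be an error; `-Xcompiler -Wextra` hands it through to gcc/clang untouched.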
Georgi Gerganov
b1de96824b
ci : fix wikitext url + compile warnings (#5569)
...
ggml-ci
2024-02-18 22:39:30 +02:00
Concedo
1e460bb936
remove junk
2024-02-17 17:12:59 +08:00
Concedo
8d5e25008f
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
# README.md
# ci/run.sh
# tests/test-tokenizer-0-falcon.cpp
# tests/test-tokenizer-0-llama.cpp
# tests/test-tokenizer-1-bpe.cpp
# tests/test-tokenizer-1-llama.cpp
2024-02-17 15:22:05 +08:00
Georgi Gerganov
d2819d5577
scripts : add helper script for bench comparing commits (#5521)
...
* scripts : add helper script for bench comparing commits
* scripts : detect CUDA
* set flags after checking the command line
* fix make flags
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-02-16 15:14:40 +02:00
Georgi Gerganov
9350a1cf21
scripts : add hf.sh helper script (#5501)
...
* scripts : add hf.sh helper script
* hf : add error logs
* hf : add support for --repo and --file
2024-02-15 15:41:15 +02:00
Concedo
3cec37c2e0
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .flake8
# .github/workflows/python-lint.yml
# flake.lock
# ggml-cuda.cu
# ggml-quants.c
# llama.cpp
# pocs/vdot/q8dot.cpp
# pocs/vdot/vdot.cpp
# tests/test-quantize-fns.cpp
# tests/test-quantize-perf.cpp
2024-02-13 00:14:22 +08:00
Georgi Gerganov
3b169441df
sync : ggml (#5452)
...
* ggml-alloc : v3 (ggml/727)
* ggml-alloc v3
ggml-ci
* fix ci
ggml-ci
* whisper : check for backend buffer allocation failures
* whisper : avoid leaks when initialization fails
* cleanup
ggml-ci
* style fixes
ggml-ci
* sync : ggml
* update llama.cpp, clip.cpp, export-lora.cpp
* update finetune.cpp, train-text-from-scratch.cpp
ggml-ci
* ggml-backend : reduce alignment to 32 to match gguf and fix mmap
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-02-12 09:16:06 +02:00
Concedo
ea3fd87f68
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# README.md
# scripts/sync-ggml.sh
2024-02-11 15:18:46 +08:00
Georgi Gerganov
cd9aea63b5
scripts : update sync scripts with new backends
2024-02-10 09:53:05 +02:00
Georgi Gerganov
43b65f5eb8
sync : ggml
2024-02-10 09:30:36 +02:00
Concedo
ec2dbd99a3
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
# README.md
# flake.lock
# llama.cpp
2024-02-07 22:21:32 +08:00
Georgi Gerganov
30679d438d
scripts : fix typos, cleanup (#5303)
2024-02-05 09:48:03 +02:00
Нияз Гарифзянов
4be04c8965
scripts : add non-interactive server-llm.sh (#5303)
...
* Update server-llm.sh
Add flag --non-interactive that allows running the script without asking for permission
* Update scripts/server-llm.sh
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-02-05 09:43:57 +02:00
Concedo
6dc01297f8
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# README.md
# flake.nix
# llama.cpp
# llama.h
# tests/test-llama-grammar.cpp
2024-02-04 19:42:57 +08:00
Georgi Gerganov
e437b37fd0
scripts : parse wtype in server-llm.sh (#5167)
...
* scripts : parse wtype in server-llm.sh
* scripts : fix check for wfile
2024-02-02 14:23:40 +02:00
Concedo
15deabd200
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/editorconfig.yml
# .gitignore
# CMakeLists.txt
# README.md
2024-01-31 18:53:38 +08:00
Neo Zhang Jianyu
01684139c3
support SYCL backend windows build (#5208)
...
* support SYCL backend windows build
* add windows build in CI
* add for win build CI
* correct install oneMKL
* fix install issue
* fix ci
* fix install cmd
* fix install cmd
* fix install cmd
* fix install cmd
* fix install cmd
* fix win build
* fix win build
* fix win build
* restore other CI part
* restore as base
* rm no new line
* fix no new line issue, add -j
* fix grammar issue
* allow to trigger manually, fix format issue
* fix format
* add newline
* fix format
* fix format
* fix format issue
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-01-31 08:08:07 +05:30
Concedo
8c22f109fa
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# ggml.c
# tests/test-quantize-fns.cpp
# tests/test-quantize-perf.cpp
2024-01-30 23:57:06 +08:00