521 commits

Author SHA1 Message Date
R0CKSTAR
f0204a0ec7
ci: build test musa with cmake (#10298)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-11-15 12:47:25 +01:00
Concedo
fedc3874bd try fix build inconsistency 2024-11-15 14:12:53 +08:00
Concedo
d595a80abc update prints 2024-11-15 14:10:02 +08:00
Romain Biessy
5a54af4d4f
sycl: Use syclcompat::dp4a (#10267)
* sycl: Use syclcompat::dp4a

* Using the syclcompat version allow the compiler to optimize the
  operation with native function

* Update news section

* Update CI Windows oneAPI version to 2025.0

* Reword doc

* Call syclcompat::dp4a inside dpct::dp4a

This reverts commit 90cb61d692d61360b46954a1c7f780bd2e569b73.
2024-11-15 11:09:12 +08:00
Diego Devesa
ae8de6d50a
ggml : build backends as libraries (#10256)
* ggml : build backends as libraries

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2024-11-14 18:04:35 +01:00
Georgi Gerganov
ec450d3bbf
metal : opt-in compile flag for BF16 (#10218)
* metal : opt-in compile flag for BF16

ggml-ci

* ci : use BF16

ggml-ci

* swift : switch back to v12

* metal : has_float -> use_float

ggml-ci

* metal : fix BF16 check in MSL

ggml-ci
2024-11-08 21:59:46 +02:00
Eve
3407364776
Q6_K AVX improvements (#10118)
* q6_k instruction reordering attempt

* better subtract method

* should be theoretically faster

small improvement with shuffle lut, likely because all loads are already done at that stage

* optimize bit fiddling

* handle -32 offset separately. bsums exists for a reason!

* use shift

* Update ggml-quants.c

* have to update ci macos version to 13 as 12 doesnt work now. 13 is still x86
2024-11-04 23:06:31 +01:00
Concedo
4ae06b4a64 print some env vars for win ci 2024-11-01 23:58:41 +08:00
R0CKSTAR
cf8e0a3bb9
musa: add docker image support (#9685)
* mtgpu: add docker image support

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* mtgpu: enable docker workflow

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-10-10 20:10:37 +02:00
Xuan Son Nguyen
f3fdcfaa79
ci : fine-grant permission (#9710) 2024-10-04 11:47:19 +02:00
Diego Devesa
c83ad6d01e
ggml-backend : add device and backend reg interfaces (#9707)
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-10-03 01:49:47 +02:00
serhii-nakon
6f1d9d71f4
Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS (#9641)
* Fix Docker ROCM builds, use AMDGPU_TARGETS instead of GPU_TARGETS

* Set ROCM_DOCKER_ARCH as string due it incorrectly build and cause OOM exit code
2024-09-30 20:57:12 +02:00
compilade
511636df0c
ci : reduce severity of unused Pyright ignore comments (#9697) 2024-09-30 14:13:16 -04:00
Neo Zhang Jianyu
95bc82fbc0
[SYCL] add missed dll file in package (#9577)
* update oneapi to 2024.2

* use 2024.1

---------

Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-09-26 17:38:31 +08:00
Xuan Son Nguyen
ea9c32be71
ci : fix docker build number and tag name (#9638)
* ci : fix docker build number and tag name

* fine-grant permissions
2024-09-25 17:26:01 +02:00
Huang Qi
e948a7da7a
CI: Provide prebuilt windows binary for hip (#9467) 2024-09-21 02:39:41 +02:00
Georgi Gerganov
6262d13e0b
common : reimplement logging (#9418)
https://github.com/ggerganov/llama.cpp/pull/9418
2024-09-15 20:46:12 +03:00
Mathijs Henquet
78203641fe
server : Add option to return token pieces in /tokenize endpoint (#9108)
* server : added with_pieces functionality to /tokenize endpoint

* server : Add tokenize with pieces tests to server.feature

* Handle case if tokenizer splits along utf8 continuation bytes

* Add example of token splitting

* Remove trailing ws

* Fix trailing ws

* Maybe fix ci

* maybe this fix windows ci?

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-09-12 22:30:11 +02:00
Huang Qi
4dc4f5f14a
ci : update HIP SDK to 24.Q3 (ROCm 6.1) (#9329) 2024-09-12 14:28:43 +03:00
Trivikram Kamat
3c26a1644d
ci : bump actions/checkout to v4 (#9377) 2024-09-12 14:27:45 +03:00
slaren
6c89eb0b47
ci : disable rocm image creation (#9340) 2024-09-07 10:48:54 +03:00
awatuna
32b2ec88bc
Update build.yml (#9184)
build rpc-server for windows cuda
2024-09-06 00:34:36 +02:00
slaren
9fe94ccac9
docker : build images only once (#9225) 2024-08-28 17:28:00 +02:00
Georgi Gerganov
d5492f0525
ci : disable bench workflow (#9010) 2024-08-15 10:11:11 +03:00
Diogo Teles Sant'Anna
fc4ca27b25
ci : fix github workflow vulnerable to script injection (#9008)
Signed-off-by: Diogo Teles Sant'Anna <diogoteles@google.com>
2024-08-12 19:28:23 +03:00
Radoslav Gerganov
1f67436c5e
ci : enable RPC in all of the released builds (#9006)
ref: #8912
2024-08-12 19:17:03 +03:00
Georgi Gerganov
d3ae0ee8d7
py : fix requirements check '==' -> '~=' (#8982)
* py : fix requirements check '==' -> '~='

* cont : fix the fix

* ci : run on all requirements.txt
2024-08-12 11:02:01 +03:00
Concedo
03adb90dc6 prompt command done 2024-08-07 20:52:28 +08:00
Concedo
c7108742f4 fix typo 2024-08-06 17:24:58 +08:00
henk717
0d534d810f Mac builds (#1037)
* OSX attempt 1

* OSX Pyinstaller

* Update kcpp-build-release-osx.yaml

* Update kcpp-build-release-osx.yaml

* Update kcpp-build-release-osx.yaml

* Add .metal file

* Update kcpp-build-release-osx.yaml

* Polish Mac

(cherry picked from commit 52cc0daa1b)
2024-08-06 17:11:19 +08:00
Johannes Gäßler
6eeaeba126
cmake: use 1 more thread for non-ggml in CI (#8740) 2024-07-28 22:32:44 +02:00
Concedo
a84f7c5d81 revert num old cpu for ci 2024-07-25 13:24:34 +08:00
Concedo
e28c42d7f7 adjusted layer estimation 2024-07-24 21:54:49 +08:00
Concedo
44ef87f14c update lite, try fix ci 2024-07-24 16:31:34 +08:00
Johannes Gäßler
69c487f4ed
CUDA: MMQ code deduplication + iquant support (#8495)
* CUDA: MMQ code deduplication + iquant support

* 1 less parallel job for CI build
2024-07-20 22:25:26 +02:00
Concedo
8412946b9f fix oldcpu build avx1 2024-07-15 23:42:22 +08:00
Concedo
21179d675b try ci for avx1, up ver (+2 squashed commit)
Squashed commit:

[74150175] up version

[97b6163c] try ci for avx1 linux
2024-07-15 23:07:07 +08:00
bandoti
17eb6aa8a9
vulkan : cmake integration (#8119)
* Add Vulkan to CMake pkg

* Add Sycl to CMake pkg

* Add OpenMP to CMake pkg

* Split generated shader file into separate translation unit

* Add CMake target for Vulkan shaders

* Update README.md

* Add make target for Vulkan shaders

* Use pkg-config to locate vulkan library

* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow

* Clean up tabs

* Move sudo to apt-key invocation

* Forward GGML_EXTRA_LIBS to CMake config pkg

* Update vulkan obj file paths

* Add shaderc to nix pkg

* Add python3 to Vulkan nix build

* Link against ggml in cmake pkg

* Remove Python dependency from Vulkan build

* code review changes

* Remove trailing newline

* Add cflags from pkg-config to fix w64devkit build

* Update README.md

* Remove trailing whitespace

* Update README.md

* Remove trailing whitespace

* Fix doc heading

* Make glslc required Vulkan component

* remove clblast from nix pkg
2024-07-13 18:12:39 +02:00
Concedo
2cad736260 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/nix/package.nix
#	.github/labeler.yml
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	Package.swift
#	README.md
#	ci/run.sh
#	docs/build.md
#	examples/CMakeLists.txt
#	flake.lock
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	grammars/README.md
#	requirements/requirements-convert_hf_to_gguf.txt
#	requirements/requirements-convert_hf_to_gguf_update.txt
#	scripts/check-requirements.sh
#	scripts/compare-llama-bench.py
#	scripts/gen-unicode-data.py
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	scripts/sync-ggml.sh
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
#	tests/test-tokenizer-random.py
2024-07-11 16:36:16 +08:00
compilade
3fd62a6b1c
py : type-check all Python scripts with Pyright (#8341)
* py : type-check all Python scripts with Pyright

* server-tests : use trailing slash in openai base_url

* server-tests : add more type annotations

* server-tests : strip "chat" from base_url in oai_chat_completions

* server-tests : model metadata is a dict

* ci : disable pip cache in type-check workflow

The cache is not shared between branches, and it's 250MB in size,
so it would become quite a big part of the 10GB cache limit of the repo.

* py : fix new type errors from master branch

* tests : fix test-tokenizer-random.py

Apparently, gcc applies optimisations even when pre-processing,
which confuses pycparser.

* ci : only show warnings and errors in python type-check

The "information" level otherwise has entries
from 'examples/pydantic_models_to_grammar.py',
which could be confusing for someone trying to figure out what failed,
considering that these messages can safely be ignored
even though they look like errors.
2024-07-07 15:04:39 -04:00
Concedo
572aba8e9c add target for oldcpu cuda 2024-07-06 00:37:01 +08:00
Olivier Chafik
8748d8ac6f
json: attempt to skip slow tests when running under emulator (#8189) 2024-06-28 18:02:05 +01:00
loonerin
558f44bf83
CI: fix release build (Ubuntu+Mac) (#8170)
* CI: fix release build (Ubuntu)

PR #8006 changes defaults to build shared libs. However, CI for releases
expects static builds.

* CI: fix release build (Mac)

---------

Co-authored-by: loonerin <loonerin@users.noreply.github.com>
2024-06-27 21:01:23 +02:00
slaren
ae5d0f4b89
ci : publish new docker images only when the files change (#8142) 2024-06-26 21:59:28 +02:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake (#8006)
* scripts : update sync [no ci]

* files : relocate [no ci]

* ci : disable kompute build [no ci]

* cmake : fixes [no ci]

* server : fix mingw build

ggml-ci

* cmake : minor [no ci]

* cmake : link math library [no ci]

* cmake : build normal ggml library (not object library) [no ci]

* cmake : fix kompute build

ggml-ci

* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE

ggml-ci

* move public backend headers to the public include directory (#8122)

* move public backend headers to the public include directory

* nix test

* spm : fix metal header

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* scripts : fix sync paths [no ci]

* scripts : sync ggml-blas.h [no ci]

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00
Concedo
c66371fbb0 cu toolkit ver 2024-06-26 12:41:05 +08:00
slaren
dd047b476c
disable docker CI on pull requests (#8110) 2024-06-25 19:20:06 +02:00
henk717
fdca385cd9
Give the CI builds a recognizable AVX1 name (#937) 2024-06-25 19:25:50 +08:00
slaren
8cb508d0d5
disable publishing the full-rocm docker image (#8083) 2024-06-24 08:36:11 +03:00
slaren
b6b9a8e606
fix CI failures (#8066)
* test-backend-ops : increase cpy max nmse

* server ci : disable thread sanitizer
2024-06-23 13:14:45 +02:00