Commit graph

315 commits

Author SHA1 Message Date
slaren
41c674161f
make rms_norm_eps a parameter (#2374)
* make rms_norm_eps a parameter

* add rms_norm_eps to command line

* fix baby llama, test-grad0

* use scientific notation for eps param in the help

ggml-ci
2023-07-24 17:57:12 +02:00
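For reference, RMS normalization divides activations by their root mean square, with eps keeping the denominator away from zero; a minimal standalone sketch (not the actual ggml kernel) of the computation this commit parameterizes:

```c
#include <math.h>
#include <stddef.h>

/* RMS-normalize x in place: x[i] /= sqrt(mean(x^2) + eps).
 * The commit above turns the previously hard-coded eps into a
 * parameter, exposed on the command line and printed in
 * scientific notation in the help text. */
static void rms_norm(float * x, size_t n, float eps) {
    float sumsq = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sumsq += x[i] * x[i];
    }
    const float inv_rms = 1.0f / sqrtf(sumsq / n + eps);
    for (size_t i = 0; i < n; i++) {
        x[i] *= inv_rms;
    }
}
```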
Concedo
8a9b40840b Merge branch 'master' into concedo_experimental
# Conflicts:
#	tests/test-grad0.c
#	tests/test-opt.c
2023-07-24 20:51:28 +08:00
Georgi Gerganov
5b2b2dc6ae
ggml : sync (unary ops refactor, static-correctness) (#2370)
* ggml : sync (unary ops, tests)

ggml-ci

* tests : remove unnecessary funcs
2023-07-24 14:46:21 +03:00
Concedo
910744e2c0 Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
#	README.md
#	flake.nix
#	llama.cpp
2023-07-23 22:37:38 +08:00
slaren
3602ac4255
fix n_tasks (#2342)
ggml-ci
2023-07-23 15:19:39 +02:00
slaren
95a6c595e7
ggml: move op parameters from tensors to ggml_tensor::op_params (#2333)
* ggml: move op parameters from tensors to ggml_tensor::op_params

* alibi: use memcpy for float params

* remove `src[1] = NULL` in ops
2023-07-23 14:36:02 +02:00
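The memcpy mentioned in the second bullet is the standard way to store a float inside an integer parameter array without type-punning through pointer casts; a minimal sketch (hypothetical helper names, not the actual ggml functions):

```c
#include <stdint.h>
#include <string.h>

/* Pack a float into an int32 parameter slot and read it back.
 * memcpy avoids the undefined behaviour of casting a pointer
 * between float and int32_t. */
static void pack_float_param(int32_t * params, int idx, float v) {
    memcpy(&params[idx], &v, sizeof(float));
}

static float unpack_float_param(const int32_t * params, int idx) {
    float v;
    memcpy(&v, &params[idx], sizeof(float));
    return v;
}
```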
Concedo
343ae756fa Merge branch 'master' into concedo_experimental
# Conflicts:
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	README.md
#	flake.nix
#	ggml-cuda.cu
2023-07-22 11:51:30 +08:00
Georgi Gerganov
0db14fef06
ggml : fix the rope fix (513f861953) 2023-07-21 15:16:55 +03:00
Georgi Gerganov
513f861953
ggml : fix rope args order + assert (#2054) 2023-07-21 14:51:34 +03:00
Concedo
e9467f5a44 auto rope scale adjustments, added sched yield fix for apple, adjust warning for mirostat 2023-07-19 16:44:44 +08:00
Concedo
374fffb9c6 Reworking rope WIP 2023-07-19 00:54:41 +08:00
Concedo
0a11f50da8 reenabled sched_yield, reduced sampler warning msg to once per session 2023-07-18 20:26:18 +08:00
Concedo
6d32e7fc8b Merge commit 'a6803cab94' into concedo_experimental
# Conflicts:
#	.devops/tools.sh
#	Makefile
#	build.zig
#	flake.nix
#	ggml-cuda.cu
#	ggml.h
#	tests/test-grad0.c
#	tests/test-opt.c
2023-07-18 19:12:06 +08:00
Qingyou Meng
672dda10e4
ggml : fixed runtime bugs and compile errors related to GGML_PERF and GGML_DEBUG (#2219)
* fixed runtime bugs and compile errors related to GGML_PERF and GGML_DEBUG

* remove ifdef GGML_PERF; update fmt
2023-07-16 22:57:28 +03:00
Xiao-Yong Jin
6e7cca4047
llama : add custom RoPE (#2054)
* Implement customizable RoPE

The original RoPE has pre-defined parameters

theta_i = 10000^(−2(i−1)/d), for i in [1, 2, ..., d/2]

Our customizable RoPE, ggml_rope_custom_inplace, uses

theta_i = scale * base^(−2(i−1)/d), for i in [1, 2, ..., d/2]

with defaults that match the original:

scale = 1.0
base = 10000

The new command line arguments
--rope-freq-base
--rope-freq-scale
set the two new RoPE parameters.

Recent research shows that changing these two parameters can extend the context limit with minimal loss.

1. Extending Context to 8K
   kaiokendev
   https://kaiokendev.github.io/til#extending-context-to-8k

2. Extending Context Window of Large Language Models via Positional Interpolation
   Shouyuan Chen, Sherman Wong, Liangjian Chen, Yuandong Tian
   https://arxiv.org/abs/2306.15595

3. NTK-Aware Scaled RoPE allows LLaMA models to have extended (8k+) context size without any fine-tuning and minimal perplexity degradation.
   https://www.reddit.com/user/bloc97
   https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/

For the bold, try adding the following command line parameters to your favorite model:
-c 16384 --rope-freq-base 80000 --rope-freq-scale 0.5

* ggml-metal: fix custom rope

* common: fix argument names in help

* llama: increase MEM_REQ_EVAL for MODEL_3B

This avoids crashing for quantized weights on CPU.
A better way to calculate the required buffer size would be preferable.

* llama: make MEM_REQ_EVAL depend on n_ctx

* server: use proper Content-Type in curl examples

Without the header Content-Type: application/json, curl will POST with
Content-Type: application/x-www-form-urlencoded

Though our simple server doesn't care, the bundled httplib.h caps
such payloads at CPPHTTPLIB_FORM_URL_ENCODED_PAYLOAD_MAX_LENGTH (8192).

With Content-Type: application/json, we can send large json data.

* style : minor fixes, mostly indentations

* ggml : fix asserts

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-15 13:34:16 +03:00
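Taking the commit message's formula at face value, the per-dimension angles can be sketched as follows (illustrative only; the real implementation lives in the ggml rope kernels):

```c
#include <math.h>

/* theta_i = scale * base^(-2(i-1)/d), for i in [1, ..., d/2].
 * base = 10000, scale = 1.0 reproduce the original RoPE;
 * --rope-freq-base / --rope-freq-scale override them. */
static void rope_thetas(float * theta, int d, float base, float scale) {
    for (int i = 1; i <= d/2; i++) {
        theta[i - 1] = scale * powf(base, -2.0f * (i - 1) / d);
    }
}
```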
Evan Miller
e8035f141e
ggml : fix static_assert with older compilers #2024 (#2218) 2023-07-14 21:55:56 +03:00
Georgi Gerganov
697966680b
ggml : sync (ggml_conv_2d, fix mul_mat bug, CUDA GLM rope) 2023-07-14 16:36:41 +03:00
Georgi Gerganov
975221e954
ggml : broadcast mul_mat + conv batch support (#2199)
* ggml : broadcast mul_mat + conv batch support

* ggml : apply mul_mat broadcast fix by @jploski
2023-07-12 20:51:29 +03:00
Georgi Gerganov
4523d10d0c ggml : add ggml_pool_1d and ggml_pool_2d 2023-07-12 20:32:15 +03:00
Concedo
5941514e95 Merge commit '5bf2a27718' into concedo_experimental
# Conflicts:
#	.devops/tools.sh
#	README.md
2023-07-12 13:05:16 +08:00
Georgi Gerganov
20d7740a9b
ggml : sync (abort callback, mul / add broadcast, fix alibi) (#2183) 2023-07-11 22:53:34 +03:00
Spencer Sutton
5bf2a27718
ggml : remove src0 and src1 from ggml_tensor and rename opt to src (#2178)
* Add ggml changes

* Update train-text-from-scratch for change

* mpi : adapt to new ggml_tensor->src

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-11 19:31:10 +03:00
Concedo
4be167915a added linear rope option, added warning for bad samplers 2023-07-11 18:08:19 +08:00
Concedo
50097e6c7f Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	README.md
#	llama.cpp
2023-07-10 20:08:27 +08:00
clyang
3bbc1a11f0
ggml : fix building with Intel MKL asking for "cblas.h" (#2104) (#2115)
* Fix building with Intel MKL asking for "cblas.h"

* Use angle brackets to indicate the system library
2023-07-09 11:12:20 +03:00
Concedo
15576bc865 Merge branch 'kquant_vocab_fix' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	Makefile
#	README.md
#	llama.cpp
#	tests/CMakeLists.txt
#	tests/test-grad0.c
#	tests/test-opt.c
2023-07-08 20:43:20 +08:00
Qingyou Meng
1d656d6360
ggml : change ggml_graph_compute() API to not require context (#1999)
* ggml_graph_compute: deprecate using ggml_context, try resolve issue #287

* rewrite: no longer consider backward compatibility; plan and make_plan

* minor: rename ctx as plan; const

* remove ggml_graph_compute from tests/test-grad0.c, but current change breaks backward

* add static ggml_graph_compute_sugar()

* minor: update comments

* reusable buffers

* ggml : more consistent naming + metal fixes

* ggml : fix docs

* tests : disable grad / opt + minor naming changes

* ggml : add ggml_graph_compute_with_ctx()

- backwards compatible API
- deduplicates a lot of copy-paste

* ci : enable test-grad0

* examples : factor out plan allocation into a helper function

* llama : factor out plan stuff into a helper function

* ci : fix env

* llama : fix duplicate symbols + refactor example benchmark

* ggml : remove obsolete assert + refactor n_tasks section

* ggml : fix indentation in switch

* llama : avoid unnecessary bool

* ggml : remove comments from source file and match order in header

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-07 19:24:01 +03:00
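A hedged sketch of the plan-based calling convention this PR introduces (field and function names as I understand the ggml API of this period; check ggml.h for the authoritative signatures):

```c
#include <stdlib.h>
#include "ggml.h"

/* Plan first, then compute: the caller now owns the work buffer
 * instead of it being carved out of a ggml_context. This is roughly
 * what ggml_graph_compute_with_ctx() wraps for backwards compatibility. */
static void compute_graph(struct ggml_cgraph * gf, int n_threads) {
    struct ggml_cplan plan = ggml_graph_plan(gf, n_threads);
    if (plan.work_size > 0) {
        plan.work_data = malloc(plan.work_size);
    }
    ggml_graph_compute(gf, &plan);
    if (plan.work_size > 0) {
        free(plan.work_data);
    }
}
```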
Georgi Gerganov
7242140283 ggml : remove sched_yield() call in ggml_graph_compute_thread() (#2134) 2023-07-07 18:37:10 +03:00
Concedo
27a0907cfa backport MM256_SET_M128I to ggml_v2, updated lite, added support for selecting the GPU for cublas 2023-07-06 22:33:46 +08:00
Concedo
220aa707e6 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
#	README.md
#	pocs/vdot/q8dot.cpp
#	pocs/vdot/vdot.cpp
#	scripts/sync-ggml.sh
#	tests/test-grad0.c
#	tests/test-quantize-fns.cpp
#	tests/test-quantize-perf.cpp
2023-07-06 15:40:40 +08:00
Georgi Gerganov
ec326d350c
ggml : fix bug introduced in #1237 2023-07-05 20:44:11 +03:00
Stephan Walter
1b107b8550
ggml : generalize quantize_fns for simpler FP16 handling (#1237)
* Generalize quantize_fns for simpler FP16 handling

* Remove call to ggml_cuda_mul_mat_get_wsize

* ci : disable FMA for mac os actions

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-07-05 19:13:06 +03:00
Georgi Gerganov
ed9a54e512
ggml : sync latest (new ops, macros, refactoring) (#2106)
- add ggml_argmax()
- add ggml_tanh()
- add ggml_elu()
- refactor ggml_conv_1d() and variants
- refactor ggml_conv_2d() and variants
- add helper macros to reduce code duplication in ggml.c
2023-07-04 21:54:11 +03:00
Concedo
d6b47e6a5b Merge branch 'master' into concedo_experimental 2023-07-02 17:26:39 +08:00
Concedo
e17c8497cf switched to NTK aware scaling 2023-07-02 17:25:08 +08:00
Georgi Gerganov
46088f7231 ggml : fix build with OpenBLAS (close #2066) 2023-07-02 09:46:46 +03:00
Concedo
b85ea580d3 Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
2023-07-02 14:45:25 +08:00
Qingyou Meng
b1ca8f36a9
ggml : disable GGML_TASK_INIT and GGML_TASK_FINALIZE by default (#1995)
Will not be scheduled unless explicitly enabled.
2023-07-01 18:42:43 +03:00
Concedo
10a2bdfaf1 Merge remote-tracking branch 'upstream/ik/context_extend' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
2023-06-29 20:35:17 +08:00
Concedo
dff5575647 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.gitignore
#	Makefile
#	ggml-opencl.cpp
#	llama.cpp
2023-06-29 17:35:28 +08:00
Erik Scholz
9d23589d63
fix pthreads setaffinity usage on android (#2020) 2023-06-27 19:06:33 +02:00
Iwan Kawrakow
333c40b94c Fixed typo 2023-06-27 19:04:00 +03:00
Iwan Kawrakow
cda30038e4 Modified RoPE with linear scaling
When the context size is greater than the maximum context size
used during training, scale the position given to RoPE by
training context / n_ctx.
2023-06-27 15:00:22 +03:00
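The described scaling is a one-liner; a sketch (hypothetical helper, not the literal kernel code):

```c
/* Compress positions when the runtime context n_ctx exceeds the
 * training context n_train, so RoPE only sees positions in the
 * range the model was trained on. */
static float rope_scaled_pos(int pos, int n_train, int n_ctx) {
    if (n_ctx > n_train) {
        return (float) pos * n_train / n_ctx;
    }
    return (float) pos;
}
```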
Concedo
282376c85a Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	README.md
#	tests/test-quantize-perf.cpp
2023-06-27 19:15:27 +08:00
Georgi Gerganov
d9779021bd
ggml : add support for ChatGLM RoPE 2023-06-27 00:06:51 +03:00
Georgi Gerganov
c824d2e368
ggml : avoid conv 2d kernel round up 2023-06-26 21:03:59 +03:00
zrm
b853d45601
ggml : add NUMA support (#1556)
* detect NUMA systems and pin work threads to nodes (linux)

* disable mmap prefetch/readahead for NUMA systems

* avoid sending finalize op to thread pool if it does nothing

* silence robot

* fix args

* make --numa a param

* recommendation that n_nodes evenly divide n_threads did not warrant such aggressive enforcement

* lower synchronization overhead

* statically allocate

* move numa state to g_state

* add description for --numa

* ggml : minor style changes

* ggml : minor style + try fix sanitizer build

* llama : allow to initialize backend with NUMA support

* llama : avoid ggml include in llama-util.h

* ggml : style / formatting

* ggml : fix handling of ops with n_threads > n_tasks > 1

* server : utilize numa parameter

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-06-26 20:57:59 +03:00
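The thread-pinning half of this work boils down to setting a CPU affinity mask per worker; a simplified Linux-only sketch (not the actual ggml code, which also detects NUMA nodes and distributes threads across them):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to a single logical CPU.
 * Returns 0 on success, an errno value otherwise. */
static int pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```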
Concedo
e4c9aea840 Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
2023-06-26 10:35:47 +08:00
Georgi Gerganov
bd34cdde38
ggml : sync latest ggml (custom operators) 2023-06-25 14:25:08 +03:00
Concedo
d2034ced7b Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
#	build.zig
#	flake.nix
#	tests/test-grad0.c
#	tests/test-sampling.cpp
#	tests/test-tokenizer-0.cpp
2023-06-25 17:01:15 +08:00