Commit graph

6004 commits

Author SHA1 Message Date
Concedo
1ad56e9b6b if quiet mode just show transcription event without text 2024-06-06 20:26:47 +08:00
Mattheus Chediak
a143c04375
README minor fixes (#7798) [no ci]
derievatives --> derivatives
2024-06-06 22:17:54 +10:00
Lexi
1c5e05e477
whisper: fix printf format string (#894)
This format string uses %d to print uint32_t and size_t{ype,}, which is
not guaranteed to work.  Instead, use PRIu32 for uint32_t, and %zu for
size_t.
2024-06-06 19:50:59 +08:00
Concedo
813cf829b5 allow selecting multigpu on vulkan 2024-06-06 18:36:56 +08:00
Olivier Chafik
55b2d0849d
grammars: x{min,max} repetition operator (#6640)
* grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates

* grammars: handle `x{n}` and fix `x{n,n}`

* grammars: document new repetition operators

* grammars: uniform use of int for min & max

* grammars: refactor parser test

* grammar: parsing tests w/ natural pretty print of updated expectations

* grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all)

* grammars: improve test pretty print again

* grammars: pretty print rules and chars

* grammars: fix copy rule skipping

* grammars: disallow `a{,}` (not allowed in regexps)

* Update common/grammar-parser.cpp

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* grammars: fix copy rule skipping (again) & display of expectations

* grammars: more test cases

* grammars: update reps parsing to bring ? / * / + closer to before

* json: use new GBNF repetitions{m,n} syntax

* grammars: update performance gotchas w/ repetition advice

* Update examples/json_schema_to_grammar.py

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* Update examples/server/public/json-schema-to-grammar.mjs

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* grammars: comment on rule repetitions

* grammars: ensure unambiguous number alternatives

* grammar: nit typo switched error msgs

* grammar: nit numbering in comment

* json: update numeric rule to be unambiguous

* Apply suggestions from code review

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* Update examples/server/public/json-schema-to-grammar.mjs

Co-authored-by: Clint Herron <hanclinto@gmail.com>

* json: fix integral-part

* grammar: add repetition tests

---------

Co-authored-by: Clint Herron <hanclinto@gmail.com>
2024-06-06 10:07:06 +01:00
Joan Fontanals
f5d7b268ec
llama : add jina v2 base code (#7596)
* feat: add changes to handle jina v2 base code

* fix: do not complicate things

* fix: fix the usage of the code model

* fix: fix comments

* fix: fix linting issues

* fix: remove ollama patches

* style : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-06 10:22:41 +03:00
slaren
2d08b7fbb4
docker : build only main and server in their images (#7782)
* add openmp lib to dockerfiles

* build only main and server in their docker images
2024-06-06 08:19:49 +03:00
slaren
d67caea0d6
docker : add openmp lib (#7780) 2024-06-06 08:17:21 +03:00
Galunid
7672adeec7
Fix encoding in python scripts (#7733) 2024-06-06 03:07:24 +10:00
Johannes Gäßler
7d1a378b8f
CUDA: refactor mmq, dmmv, mmvq (#7716)
* CUDA: refactor mmq, dmmv, mmvq

* fix out-of-bounds write

* struct for qk, qr, qi

* fix cmake build

* mmq_type_traits
2024-06-05 16:53:00 +02:00
Georgi Gerganov
2b3389677a
ggml : refactor rope norm/neox (#7634)
* ggml : unify rope norm/neox (CPU)

* ggml : fix compile warning

* ggml : remove GLM rope mode

ggml-ci

* metal : better rope implementation

ggml-ci

* cuda : better rope implementation

ggml-ci

* naming : n_orig_ctx -> n_ctx_orig

ggml-ci

* dev : add reminders to update backends

ggml-ci

* vulkan : fix ggml_rope_ext() usage

* cuda : fix array size + indents

ggml-ci
2024-06-05 11:29:20 +03:00
arch-btw
9973e81c5c
readme : remove -ins (#7759)
-ins and --instruct were moved in https://github.com/ggerganov/llama.cpp/pull/7675

I have adjusted the README accordingly.
There was no trace of --chatml in the README.
2024-06-05 09:40:49 +03:00
Concedo
3e4c44bace update lite 2024-06-05 11:41:56 +08:00
Concedo
6659742a2d do not merge the removal of opencl 2024-06-05 10:57:52 +08:00
Concedo
ba9135b356 Merge commit '554c247caf' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
#	README-sycl.md
#	README.md
#	flake.nix
#	ggml-opencl.cpp
#	ggml-opencl.h
#	scripts/LlamaConfig.cmake.in
#	scripts/compare-llama-bench.py
#	scripts/server-llm.sh
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.sh
2024-06-05 10:55:55 +08:00
Concedo
e3e21cc44d Merge commit '0cd6bd3483' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	README.md
#	models/ggml-vocab-phi-3.gguf
#	scripts/compare-commits.sh
#	tests/test-tokenizer-random.py
2024-06-05 10:53:52 +08:00
Concedo
10b148f4c2 added skip bos for tokenize endpoint 2024-06-05 10:49:11 +08:00
jaime-m-p
c90dbe026b
Fix per token atrributes bits (#7749) 2024-06-05 01:26:14 +02:00
agray3
b90dc566c1
Allow number of nodes in CUDA graph to change (#7738)
Previously the code would have failed to cope in the case that the
number of nodes changes in an existing CUDA graph. This fixes the
issue by removing an unnecessary conditional.
2024-06-04 22:06:49 +02:00
Georgi Gerganov
1442677f92
common : refactor cli arg parsing (#7675)
* common : gpt_params_parse do not print usage

* common : rework usage print (wip)

* common : valign

* common : rework print_usage

* infill : remove cfg support

* common : reorder args

* server : deduplicate parameters

ggml-ci

* common : add missing header

ggml-ci

* common : remote --random-prompt usages

ggml-ci

* examples : migrate to gpt_params

ggml-ci

* batched-bench : migrate to gpt_params

* retrieval : migrate to gpt_params

* common : change defaults for escape and n_ctx

* common : remove chatml and instruct params

ggml-ci

* common : passkey use gpt_params
2024-06-04 21:23:39 +03:00
Georgi Gerganov
554c247caf
ggml : remove OpenCL (#7735)
ggml-ci
2024-06-04 21:23:20 +03:00
Georgi Gerganov
0cd6bd3483
llama : remove beam search (#7736) 2024-06-04 21:23:05 +03:00
Georgi Gerganov
5ca0944a15
readme : remove obsolete Zig instructions (#7471) 2024-06-04 19:43:01 +03:00
slaren
adc9ff3841
llama-bench : allow using a different printer for stderr with -oe (#7722)
compare-commits.sh : hide stdout, use -oe to print markdown
2024-06-04 14:32:42 +02:00
Daniele
987d743d6b
Improve hipBLAS support in CMake (#7696)
* Improve hipBLAS support in CMake

This improves the detection of the correct CMAKE_PREFIX_PATH when using different distributions or a self-built ROCm SDK.

* Set ROCM_PATH correctly
2024-06-04 14:09:15 +02:00
zhouwg
b226c1227b
refine .gitignore (#7688)
This adds tags and android ndk into the git ignore list
2024-06-04 21:21:26 +10:00
Concedo
5789417802 logit bias tokenize feature 2024-06-04 15:47:34 +08:00
jaime-m-p
3b38d48609
Per token attributes (#7685)
* Add per token attributes enum
* Using phi-3 for testing 'rstrip'
* Using jina-v2 for testing 'lstrip'
* Brute force test for 'lstrip' and 'rstrip'
* Implement 'rstrip' and 'lstrip'
* Update phi-3 GGUF file (obsolete since 917dc8c)
* Replace llama_token_type with llama_token_attribs
2024-06-04 09:17:17 +02:00
Georgi Gerganov
6d1616944d
ggml : prevent builds with -ffinite-math-only (#7726)
This enforces a check that -fno-finite-math-only was set and that the operating
compiling mode is not in finite maths mode. This is because during rewriting of
silu and softmax for cpu #7154 there emerged an issue where the result that was
observed when >1 slot was nondeterministic as found by @JohannesGaessler.

@LostRuins narrowed the problem down to -ffinite-math-only which was theorised
to be due to SiLU, instead of flushing small values to 0, returns NaN or some 
other garbage. @jart proposed a fix that @ggerganov then implemented in this fix

ref https://github.com/ggerganov/llama.cpp/pull/7154#issuecomment-2145661825
2024-06-04 17:01:09 +10:00
Radoslav Gerganov
bde7cd3cd9
llama : offload to RPC in addition to other backends (#7640)
* llama : offload to RPC in addition to other backends

* - fix copy_tensor being called on the src buffer instead of the dst buffer

- always initialize views in the view_src buffer

- add RPC backend to Makefile build

- add endpoint to all RPC object names

* add rpc-server to Makefile

* Update llama.cpp

Co-authored-by: slaren <slarengh@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
2024-06-03 20:03:26 +03:00
Masaya, Kato
a5735e4426
ggml : use OpenMP as a thread pool (#7606)
* ggml: Added OpenMP for multi-threads processing

* ggml : Limit the number of threads used to avoid deadlock

* update shared state n_threads in parallel region

* clear numa affinity for main thread even with openmp

* enable openmp by default

* fix msvc build

* disable openmp on macos

* ci : disable openmp with thread sanitizer

* Update ggml.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-06-03 17:14:15 +02:00
Concedo
967b6572a2 try to use GPU for whisper 2024-06-03 23:07:26 +08:00
Concedo
a2e304ed4d remove issue templates 2024-06-03 22:52:09 +08:00
Johannes Gäßler
0b832d53ba
make: fix debug options not being applied to NVCC (#7714) 2024-06-03 16:28:58 +02:00
Concedo
cb62ab237b updated settings menu 2024-06-03 22:06:38 +08:00
Concedo
a541a3d509 quantkv will not trigger if fa is off or ctx shift is on 2024-06-03 19:14:22 +08:00
Concedo
94753ad103 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
2024-06-03 18:33:23 +08:00
Concedo
efee37a708 gui for quantkv 2024-06-03 18:25:57 +08:00
0cc4m
3d7ebf6312
Vulkan Mixture of Experts (MoE) support (#7628)
* Finish Vulkan mul_mat_id implementation

* Add Vulkan sum_rows and div ops

* Fix MUL_MAT_ID matrix matrix shader

* Fix MUL_MAT_ID matrix vector shader dispatch size

* Fix MUL_MAT_ID matrix vector shader and dispatch code

* Update Vulkan CPU offload for MUL_MAT_ID

* Fix crash when using split mode none and setting a main GPU
2024-06-03 10:59:14 +02:00
Andy Tai
a10cda58d3
cmake : add pkg-config spec file for llama.cpp (#7702) 2024-06-03 11:06:24 +03:00
Concedo
0978806a3c fix inverted probability 2024-06-03 16:02:35 +08:00
zhangkaihuo
6f28a333c1
llama : MiniCPM support tied embeddings (#7664)
* support lm_head

* remove the code block

---------

Co-authored-by: zhangkaihuo <zhangkaihuo@modelbest.cn>
2024-06-03 10:49:30 +03:00
Concedo
5ebc532ca9 update colab 2024-06-03 14:55:12 +08:00
Concedo
8b29d5f848 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.gitignore
#	CMakeLists.txt
#	flake.lock
#	llama.cpp
2024-06-03 14:46:12 +08:00
Concedo
10a1d628ad added new binding fields for quant k and quant v 2024-06-03 14:35:59 +08:00
Georgi Gerganov
549279d804
llama : avoid double token-to-piece cache (#7654)
ggml-ci
2024-06-03 08:34:43 +03:00
woachk
9e405b6e2e
kompute : implement op_getrows_f32 (#6403)
op_getrows_f32 is required since https://github.com/ggerganov/llama.cpp/pull/6122
for the Vulkan w/ Kompute backend to be functional.

As such, implement this op to make this backend functional again.
2024-06-03 08:32:16 +03:00
Dave Airlie
3413ae2193
fix bug introduced in using calloc (#7701)
compilade pointed this out on the previous MR
2024-06-02 17:59:54 -04:00
Georgi Gerganov
1669810d7c
flake.lock: Update (#7686)
Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/8dc45382d5206bd292f9c2768b8058a8fd8311d9?narHash=sha256-/GJvTdTpuDjNn84j82cU6bXztE0MSkdnTWClUCRub78%3D' (2024-05-16)
  → 'github:hercules-ci/flake-parts/2a55567fcf15b1b1c7ed712a2c6fadaec7412ea8?narHash=sha256-iKzJcpdXih14qYVcZ9QC9XuZYnPc6T8YImb6dX166kw%3D' (2024-06-01)
• Updated input 'flake-parts/nixpkgs-lib':
    'https://github.com/NixOS/nixpkgs/archive/50eb7ecf4cd0a5756d7275c8ba36790e5bd53e33.tar.gz?narHash=sha256-QBx10%2Bk6JWz6u7VsohfSw8g8hjdBZEf8CFzXH1/1Z94%3D' (2024-05-02)
  → 'https://github.com/NixOS/nixpkgs/archive/eb9ceca17df2ea50a250b6b27f7bf6ab0186f198.tar.gz?narHash=sha256-lIbdfCsf8LMFloheeE6N31%2BBMIeixqyQWbSr2vk79EQ%3D' (2024-06-01)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/bfb7a882678e518398ce9a31a881538679f6f092?narHash=sha256-4zSIhSRRIoEBwjbPm3YiGtbd8HDWzFxJjw5DYSDy1n8%3D' (2024-05-24)
  → 'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-06-02 14:13:12 -07:00
Austin
7c4e5b7eae
chore : add ignore rule for generated server themes (#7689) 2024-06-02 20:39:08 +03:00