PAB
efb6ae9630
feat: add GGML_UNARY_OP_ARGMAX
Metal kernel (ggml/1019)
...
* implemented argmax kernel
* tpig -> tgpig
* change to strides
* contiguous assertions
* kernel working and tested
* argmax simd parallel implementation
* added 2 new tests for argmax in test-backend-ops
* cosmit
* added 3 tests cases for perf eval
* add test_argmax in make_test_cases_perf
* Update test-backend-ops.cpp
Co-authored-by: Diego Devesa <slarengh@gmail.com>
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2024-12-03 20:04:49 +02:00
PAB
667d70d170
metal : add GGML_OP_CONV_TRANSPOSE_1D
kernels (ggml/1026)
...
* wip
* wip implementation f32
* kernel conv transpose 1d f32 working
* initial commit
2024-12-03 20:04:49 +02:00
Concedo
52cc908f7f
default trim_stop to true, which trims any tokens after a stop sequence and the stop sequence itself. This is potentially a breaking change.
2024-12-03 22:44:10 +08:00
Concedo
7d11d2946c
only show warning if more than 1 moved tensor
2024-12-03 22:09:26 +08:00
Xuan Son Nguyen
3b4f2e33e2
llama : add missing LLAMA_API for llama_chat_builtin_templates ( #10636 )
2024-12-03 12:54:30 +01:00
Nikolaos Pothitos
82bca2257b
readme : add option, update default value, fix formatting ( #10271 )
...
* readme : document --no-display-prompt
* readme : update default prompt context size
* readme : remove unnecessary indentation
Indenting a line with four spaces makes Markdown treat that section as
plain text.
* readme : indent commands under bullets
* readme : indent commands in lettered list
2024-12-03 12:50:08 +02:00
Georgi Gerganov
0115df2f65
metal : small-batch mat-mul kernels ( #10581 )
...
* metal : small-batch mat-mul kernels
ggml-ci
* metal : add rest of types
ggml-ci
* metal : final adjustments
ggml-ci
* metal : add comments
ggml-ci
2024-12-03 11:52:33 +02:00
Georgi Gerganov
515d4e5372
github : minify link [no ci] (revert)
...
this doesn't work as expected
2024-12-03 11:21:43 +02:00
Georgi Gerganov
844e2e1fee
github : minify link [no ci]
2024-12-03 11:20:35 +02:00
Georgi Gerganov
70b98fadbc
server : fix default draft model parameters ( #10586 )
...
* server : force F16 KV cache for the draft model
ggml-ci
* server : fix draft params
ggml-ci
* server : various params fixes
ggml-ci
2024-12-03 11:20:00 +02:00
Xuan Son Nguyen
642330ac7c
llama : add enum for built-in chat templates ( #10623 )
...
* llama : add enum for supported chat templates
* use "built-in" instead of "supported"
* arg: print list of built-in templates
* fix test
* update server README
2024-12-02 22:10:19 +01:00
Georgi Gerganov
8648c52101
make : deprecate ( #10514 )
...
* make : deprecate
ggml-ci
* ci : disable Makefile builds
ggml-ci
* docs : remove make references [no ci]
* ci : disable swift build
ggml-ci
* docs : remove obsolete make references, scripts, examples
ggml-ci
* basic fix for compare-commits.sh
* update build.md
* more build.md updates
* more build.md updates
* more build.md updates
* Update Makefile
Co-authored-by: Diego Devesa <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-12-02 21:22:53 +02:00
haopeng
64ed2091b2
server: Add "tokens per second" information in the backend ( #10548 )
...
* add cmake rvv support
* add timings
* remove space
* update readme
* fix
* fix code
* remove empty line
* add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-12-02 14:45:54 +01:00
Ikko Eltociear Ashimine
ed9e229372
docs: update README.md ( #1244 )
...
recomended -> recommended
2024-12-02 17:20:20 +08:00
Akarshan Biswas
991f8aabee
SYCL: Fix and switch to GGML_LOG system instead of fprintf ( #10579 )
...
* Switched to GGML_LOG
* Fix missing semicolon
2024-12-02 15:04:11 +08:00
Georgi Gerganov
4cb003dd8d
contrib : refresh ( #10593 )
...
* contrib : refresh
* contrib : expand [no ci]
* contrib : expand test-backend-ops instructions
* contrib : add CODEOWNERS
* prs : update template to not have checkbox [no ci]
2024-12-02 08:53:27 +02:00
Juk Armstrong
917786f43d
Add mistral-v1
, mistral-v3
, mistral-v3-tekken
and mistral-v7
chat template types ( #10572 )
...
* Templates: `mistral-v1`, `mistral-v2`, `mistral-v3`, `mistral-v3-tekken`
* Changed system message logic and added tests for all 4
* Invalid `system_message` instead of `content` fixed
* Removed tab-indented lines
* Added template code and test for `mistral-v7`
* Added all tests. Fixed bug with `tmpl == "llama2"` test.
* Replaced tabs with spaces.
* Removed `'mistral-v2'` option as no (open) models ever used it
* Removed all references to 'v2' template from comments
* Update llama.cpp
Fixed `trim_assistant_message` bug
2024-12-01 23:09:49 +01:00
Georgi Gerganov
5e1ed95583
grammars : add English-only grammar ( #10612 )
2024-12-01 21:37:54 +02:00
Wang Qin
5c7a5aa0c3
ci: add error handling for Python venv creation in run.sh ( #10608 )
2024-12-01 20:11:42 +02:00
Diego Devesa
3420909dff
ggml : automatic selection of best CPU backend ( #10606 )
...
* ggml : automatic selection of best CPU backend
* amx : minor opt
* add GGML_AVX_VNNI to enable avx-vnni, fix checks
2024-12-01 16:12:41 +01:00
alek3y
86dc11c5bc
server : bind to any port when specified ( #10590 )
2024-12-01 13:33:12 +02:00
Georgi Gerganov
6acce39710
readme : update the usage section with examples ( #10596 )
...
* readme : update the usage section with examples
* readme : more examples
2024-12-01 11:25:17 +02:00
Concedo
2ba5949054
updated sdcpp, also set euler as default sampler
2024-12-01 17:00:20 +08:00
Concedo
e93c2427b4
allow incompatible vocab in debugmode
2024-12-01 14:11:03 +08:00
Concedo
42228b9746
warning when selecting non gguf models
2024-12-01 13:35:51 +08:00
Wang Qin
43957ef203
build: update Makefile comments for C++ version change ( #10598 )
2024-12-01 04:19:44 +01:00
Concedo
d5e732f3ab
updated lite
2024-12-01 01:49:09 +08:00
Concedo
b7cd210cd2
more linting with Ruff (+1 squashed commits)
...
Squashed commits:
[43802cfe2] Applied default Ruff linting
2024-12-01 01:23:13 +08:00
Adrien Gallouët
0c39f44d70
ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_q4_0_4x4_q8_0() ( #10567 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2024-11-30 09:13:18 -08:00
Concedo
409e393d10
fixed critical bug in image model loader
2024-11-30 23:28:24 +08:00
Concedo
153da19274
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
2024-11-30 16:59:25 +08:00
Concedo
0028e71993
special handling to resolve incomplete utf8 token sequences in qwen
2024-11-30 16:54:01 +08:00
Concedo
32ac3153e4
default speculative set to 8. added more adapter fields
2024-11-30 16:18:27 +08:00
Georgi Gerganov
3e0ba0e604
readme : remove old badge
2024-11-30 10:09:21 +02:00
Georgi Gerganov
abadba05be
readme : refresh ( #10587 )
...
* readme : refresh
* readme : move section [no ci]
* readme : clarify [no ci]
* readme : fixes [no ci]
* readme : more fixes [no ci]
* readme : simplify [no ci]
* readme : clarify GGUF
2024-11-30 09:47:07 +02:00
Eve
0533e7fb38
vulkan: Dynamic subgroup size support for Q6_K mat_vec ( #10536 )
...
* subgroup 64 version with subgroup add. 15% faster
scalable version
tested for subgroup sizes 16-128
* check for subgroup multiple of 16 and greater than 16
* subgroup sizes are always a power of 2 (https://github.com/KhronosGroup/GLSL/issues/45 )
* force 16 sequential threads per block
* make 16 subgroup size a constant
2024-11-30 08:00:02 +01:00
Concedo
5353bfa983
updated lite
2024-11-30 12:26:20 +08:00
Concedo
557bcaf86e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .clang-tidy
# .github/workflows/build.yml
# Makefile
# Package.swift
# common/CMakeLists.txt
# examples/batched-bench/CMakeLists.txt
# examples/batched/CMakeLists.txt
# examples/convert-llama2c-to-ggml/CMakeLists.txt
# examples/cvector-generator/CMakeLists.txt
# examples/embedding/CMakeLists.txt
# examples/eval-callback/CMakeLists.txt
# examples/export-lora/CMakeLists.txt
# examples/gbnf-validator/CMakeLists.txt
# examples/gguf-split/CMakeLists.txt
# examples/gguf/CMakeLists.txt
# examples/gritlm/CMakeLists.txt
# examples/imatrix/CMakeLists.txt
# examples/infill/CMakeLists.txt
# examples/llama-bench/CMakeLists.txt
# examples/llava/CMakeLists.txt
# examples/lookahead/CMakeLists.txt
# examples/lookup/CMakeLists.txt
# examples/main-cmake-pkg/CMakeLists.txt
# examples/main/CMakeLists.txt
# examples/parallel/CMakeLists.txt
# examples/passkey/CMakeLists.txt
# examples/perplexity/CMakeLists.txt
# examples/quantize-stats/CMakeLists.txt
# examples/quantize/CMakeLists.txt
# examples/retrieval/CMakeLists.txt
# examples/run/CMakeLists.txt
# examples/save-load-state/CMakeLists.txt
# examples/server/CMakeLists.txt
# examples/simple-chat/CMakeLists.txt
# examples/simple/CMakeLists.txt
# examples/speculative-simple/CMakeLists.txt
# examples/speculative/CMakeLists.txt
# examples/tokenize/CMakeLists.txt
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-backend.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt
# pocs/vdot/CMakeLists.txt
# src/CMakeLists.txt
# src/unicode.cpp
# tests/test-sampling.cpp
2024-11-30 12:24:51 +08:00
Concedo
697ca70115
temp checkpoint
2024-11-30 12:13:20 +08:00
Concedo
ec95241e38
temp checkpoint
2024-11-30 11:59:27 +08:00
Concedo
0c8939be19
temp checkpoint
2024-11-30 11:57:28 +08:00
Concedo
e0c59486ee
default to 12 tokens drafted
2024-11-30 11:52:07 +08:00
Concedo
b21d0fe3ac
customizable speculative size
2024-11-30 11:28:19 +08:00
Concedo
f75bbb945f
speculative decoding initial impl completed (+6 squashed commit)
...
Squashed commit:
[0a6306ca0] draft wip dont use (will be squashed)
[a758a1c9c] wip dont use (will be squashed)
[e1994d3ce] wip dont use
[f59690d68] wip
[77228147d] wip on spec decoding. dont use yet
[2445bca54] wip adding speculative decoding (+1 squashed commits)
Squashed commits:
[50e341bb7] wip adding speculative decoding
2024-11-30 10:41:10 +08:00
Diego Devesa
7cc2d2c889
ggml : move AMX to the CPU backend ( #10570 )
...
* ggml : move AMX to the CPU backend
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-29 21:54:58 +01:00
Xuan Son Nguyen
b782e5c7d4
server : add more test cases ( #10569 )
...
* server : add split model test
* add test speculative
* add invalid cases
2024-11-29 21:48:56 +01:00
Robert Collins
3a8e9af402
imatrix : support combine-only ( #10492 )
...
* imatrix-combine-only idea
* ensured that behavior consistent with log
2024-11-29 19:21:37 +02:00
Diego Devesa
a3a3048e7a
cleanup UI link list ( #10577 )
...
* cleanup UI link list
* sort list alphabetically
* add missing licenses
2024-11-29 17:45:08 +01:00
Georgi Gerganov
f0678c5ff4
ggml : fix I8MM Q4_1 scaling factor conversion ( #10562 )
...
ggml-ci
2024-11-29 16:25:39 +02:00
Shupei Fan
4b3242bbea
ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 ( #10580 )
2024-11-29 14:49:02 +01:00