519 commits

Author SHA1 Message Date
Concedo
895d008c5f the bloke has retired for a year, it's time to let go 2025-04-13 17:00:00 +08:00
Concedo
8e23a087e7 updated readme, memory detection prints 2025-04-08 20:23:52 +08:00
Concedo
4a29e216e7 edit readme 2025-03-14 21:06:55 +08:00
Concedo
4b63ee5096 updated readme 2025-02-01 17:41:50 +08:00
Concedo
96407502cd Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	examples/llama-bench/llama-bench.cpp
#	examples/llama.android/llama/src/main/cpp/llama-android.cpp
#	examples/llama.android/llama/src/main/java/android/llama/cpp/LLamaAndroid.kt
#	src/llama-vocab.cpp
#	tests/test-backend-ops.cpp
2025-01-17 23:13:50 +08:00
musoles
7a689c415e
README : added kalavai to infrastructure list (#11216) 2025-01-17 01:10:49 +01:00
Xuan Son Nguyen
84a44815f7
cli : auto activate conversation mode if chat template is available (#11214)
* cli : auto activate conversation mode if chat template is detected

* add warn on bad template

* update readme (writing with the help of chatgpt)

* update readme (2)

* do not activate -cnv for non-instruct models
2025-01-13 20:18:12 +01:00
Concedo
4d92b4e98e updated readme and colab 2025-01-14 00:31:52 +08:00
Concedo
bd38665e1f some cleanup before starting on TTS 2025-01-10 22:13:44 +08:00
Molly Sophia
ee7136c6d1
llama: add support for QRWKV6 model architecture (#11001)

* WIP: Add support for RWKV6Qwen2

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV: Some graph simplification

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Add support for RWKV6Qwen2 with cpu and cuda GLA

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix some typos

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* code format changes

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix wkv test & add gla test

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Fix cuda warning

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update README.md

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Update ggml/src/ggml-cuda/gla.cu

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Fix fused lerp weights loading with RWKV6

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* better sanity check skipping for QRWKV6 in llama-quant

thanks @compilade

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: compilade <git@compilade.net>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2025-01-10 09:58:08 +08:00
Pierrick Hymbert
f8feb4b01a
model: Add support for PhiMoE arch (#11003)
* model: support phimoe

* python linter

* doc: minor

Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>

* doc: minor

Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>

* doc: add phimoe as supported model

ggml-ci

---------

Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
2025-01-09 11:21:41 +01:00
Concedo
dcfa1eca4e Merge commit '017cc5f446' into concedo_experimental
# Conflicts:
#	.github/ISSUE_TEMPLATE/010-bug-compilation.yml
#	.github/ISSUE_TEMPLATE/019-bug-misc.yml
#	CODEOWNERS
#	examples/batched-bench/batched-bench.cpp
#	examples/batched/batched.cpp
#	examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp
#	examples/gritlm/gritlm.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/passkey/passkey.cpp
#	examples/quantize-stats/quantize-stats.cpp
#	examples/run/run.cpp
#	examples/simple-chat/simple-chat.cpp
#	examples/simple/simple.cpp
#	examples/tokenize/tokenize.cpp
#	ggml/CMakeLists.txt
#	ggml/src/ggml-metal/CMakeLists.txt
#	ggml/src/ggml-vulkan/CMakeLists.txt
#	scripts/sync-ggml.last
#	src/llama.cpp
#	tests/test-autorelease.cpp
#	tests/test-model-load-cancel.cpp
#	tests/test-tokenizer-0.cpp
#	tests/test-tokenizer-1-bpe.cpp
#	tests/test-tokenizer-1-spm.cpp
2025-01-08 23:15:21 +08:00
Benson Wong
a45433ba20
readme : add llama-swap to infrastructure section (#11032)
* list llama-swap under tools in README

* readme: add llama-swap to Infrastructure
2025-01-02 09:14:54 +02:00
Concedo
2a890ec25a Breaking change: unify the windows and linux build flags.
To do a full build on windows you now need LLAMA_PORTABLE=1 LLAMA_VULKAN=1 LLAMA_CLBLAST=1
2024-12-23 22:35:54 +08:00
Eric Curtin
7909e8588d
llama-run : improve progress bar (#10821)
Set default width to whatever the terminal is. Also fixed a small bug around
default n_gpu_layers value.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2024-12-19 03:58:00 +01:00
redbeard
6b064c92b4
docs: Fix HIP (née hipBLAS) in README (#10880)
Related to #10524 / be0e350c references to hipBLAS have been removed
across the repository.  This fixes the link from the repository's
`README.md`.

Signed-off-by: Brian 'redbeard' Harrington <redbeard@dead-city.org>
2024-12-18 10:35:00 +02:00
Ruan
4f51968aca
readme : update typos (#10863) 2024-12-17 11:47:20 +02:00
Valentin Mamedov
a0974156f3
llama : add Deepseek MoE v1 & GigaChat models (#10827)
* Add deepseek v1 arch & gigachat template

* improve template code

* add readme

* delete comments

* remove comment

* fix format

* lint llama.cpp

* fix order of deepseek and deepseek2, move gigachat template to the end of func

* fix order of deepseek and deepseek2 in constants; mark shared exp as deepseek arch need

* remove comments

* move deepseek above deepseek2

* change placement of gigachat chat template
2024-12-15 19:02:46 +02:00
HimariO
ba1cb19cdd
llama : add Qwen2VL support + multimodal RoPE (#10361)
* Barebone Qwen2VL LLM convertor

* Add Qwen2VL cli entrypoint

* [WIP] add qwen2vl arch

* Verify m-rope output

* Add vl-rope/2d-rope support for qwen2vl ViT

* update qwen2vl cli tool

* update 5D tensor op workaround

* [WIP] qwen2vl vision model

* make batch and clip utils compatible with qwen2vl

* [WIP] create inference workflow, gguf convert script but fix

* correcting vision-rope behavior, add the missing last layer back to ViT

* add arg parser to qwen2vl_surgery

* replace variable size array with vector

* cuda-gdb cmake preset

* add fp32 mrope, vision rope kernel

* add fp16 support for qwen2vl and m-rope

* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`

* fix rope op mode switching, outdated func args

* update `llama_hparams`

* update to keep up stream changes

* resolve linter, test errors

* add makefile entry, update special image padding token

* add mrope unit test, fix few compiler warnings

* rename `mrope` related function, params

* minor updates on debug util, bug fixes

* add `m-rope` testcase to `test-backend-ops`

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* fix trailing whitespace

* store `llama_hparams.rope_sections` with fixed size array

* update position id tensor size check in GGML_OP_ROPE

* minor updates

* update `ggml_backend_*_supports_op` of unsupported backends

* remove old `rope_section` compare operator

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-14 14:43:46 +02:00
Eric Curtin
c27ac678dd
Opt class for positional argument handling (#10508)
Added support for positional arguments `model` and `prompt`. Added
functionality to download via strings like:

  llama-run llama3
  llama-run ollama://granite-code
  llama-run ollama://granite-code:8b
  llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
  llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
  llama-run https://example.com/some-file1.gguf
  llama-run some-file2.gguf
  llama-run file://some-file3.gguf

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2024-12-13 19:34:25 +01:00
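The model strings listed in c27ac678dd above differ mainly in their `scheme://` prefix. As a rough illustration only (this is not llama-run's actual code, which is C++ and is not shown in this log), the sketch below splits such a string into a prefix and a target. The names `KNOWN_SCHEMES`, `ModelRef`, and `split_model_ref` are hypothetical, and the sketch makes no claim about how llama-run resolves a string with no prefix, such as `llama3` or `some-file2.gguf`.

```python
from typing import NamedTuple

# Prefixes copied from the examples in the commit message.
# Everything else in this sketch is hypothetical.
KNOWN_SCHEMES = ("ollama", "hf", "huggingface", "https", "file")


class ModelRef(NamedTuple):
    scheme: str  # "" when the string carries no recognised "scheme://" prefix
    target: str  # text after the prefix, or the whole string when there is none


def split_model_ref(text: str) -> ModelRef:
    """Split a model string into (scheme, target) without resolving it."""
    scheme, sep, rest = text.partition("://")
    if sep and scheme in KNOWN_SCHEMES:
        return ModelRef(scheme, rest)
    # No recognised prefix: return the string unchanged and leave
    # any further interpretation to the caller.
    return ModelRef("", text)


if __name__ == "__main__":
    for example in (
        "llama3",
        "ollama://granite-code:8b",
        "hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf",
        "https://example.com/some-file1.gguf",
        "file://some-file3.gguf",
    ):
        print(split_model_ref(example))
```

Keeping the split separate from any download or file lookup means each prefix can be handled by its own routine; whether llama-run is organised this way is not stated in the commit message.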
Ikko Eltociear Ashimine
ed9e229372
docs: update README.md (#1244)
recomended -> recommended
2024-12-02 17:20:20 +08:00
Georgi Gerganov
6acce39710
readme : update the usage section with examples (#10596)
* readme : update the usage section with examples

* readme : more examples
2024-12-01 11:25:17 +02:00
Georgi Gerganov
3e0ba0e604
readme : remove old badge 2024-11-30 10:09:21 +02:00
Georgi Gerganov
abadba05be
readme : refresh (#10587)
* readme : refresh

* readme : move section [no ci]

* readme : clarify [no ci]

* readme : fixes [no ci]

* readme : more fixes [no ci]

* readme : simplify [no ci]

* readme : clarify GGUF
2024-11-30 09:47:07 +02:00
Diego Devesa
a3a3048e7a
cleanup UI link list (#10577)
* cleanup UI link list

* sort list alphabetically

* add missing licenses
2024-11-29 17:45:08 +01:00
Shane A
de5097351c
Add OLMo 2 model in docs (#10530)
* Add link to OLMo 2 model in docs

* Change link to landing page
2024-11-26 21:55:29 +01:00
Concedo
5b35790343 move nix example to a standalone file (+1 squashed commits)
Squashed commits:

[fef38bab5] move nix example to a standalone file
2024-11-26 16:32:26 +08:00
DontEatOreo
d82d7b60e6
Refactor Nix section (#1235)
* README.md: remove `nix{3-run,shell}` example

* README.md: better re-word NVIDIA CUDA section for nix

* README.md: remove unneeded section in nix

* README.md: add section to open issue regarding nix version on nixpkgs

* README.md: add clarification on how to add KoboldCpp to `home-manager`

* README.md: add example in nix section
2024-11-26 16:18:45 +08:00
GPTLocalhost (Word Add-in)
aacb6c3a70
Add GPTLocalhost as third-party resource (#1221) 2024-11-18 10:17:06 +08:00
Johannes Gäßler
467576b6cc
CMake: default to -arch=native for CUDA build (#10320) 2024-11-17 09:06:34 +01:00
Small Grass Forest
1ee9eea094
docs : update bindings list (#10261)
Signed-off-by: tianzixuan <tianzixuan335@hellobike.com>
2024-11-13 13:17:10 +02:00
Concedo
3cfc4dc581 avoid euler a for flux (+4 squashed commit)
Squashed commit:

[5a4b72385] fix cuda build

[5f969a645] add vulkan information

[6849e7398] fixed flux

[740e80419] update readme
2024-11-05 22:50:14 +08:00
Georgi Gerganov
ba6f62eb79
readme : update hot topics 2024-11-01 17:31:51 +02:00
Molly Sophia
4ff7fe1fb3
llama : add chat template for RWKV-World + fix EOT (#9968)
* Add chat template for RWKV-World

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV: Fix the chat template not being used

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* RWKV v6: Set EOT token to ``\n\n``

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* readme: add rwkv into supported model list

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-10-22 13:33:37 +03:00
Asghar Ghorbani
994cfb1acb
readme : update UI list (#9972)
add PocketPal AI app
2024-10-21 21:20:59 +03:00
Loïc Carrère
45f097645e
readme : update bindings list (#9951)
Update the binding list by adding LM-Kit.NET (C# & VB.NET)
2024-10-20 19:25:41 +03:00
icppWorld
7cab2083c7
readme : update infra list (#9942)
llama_cpp_canister allows you to run llama.cpp as a Smart Contract on the Internet Computer. The smart contract runs as WebAssembly in a so-called 'canister'.
2024-10-20 19:01:34 +03:00
Ma Mingfei
60ce97c9d8
add amx kernel for gemm (#8998)
add intel amx isa detection

add vnni kernel for gemv cases

add vnni and amx kernel support for block_q8_0

code cleanup

fix packing B issue

enable openmp

fine tune amx kernel

switch to aten parallel pattern

add error message for nested parallelism

code cleanup

add f16 support in ggml-amx

add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS

update CMakeList

update README

fix some compilation warning

fix compiler warning when amx is not enabled

minor change

ggml-ci

move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp

ggml-ci

update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16

ggml-ci

add amx as a ggml-backend

update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h

minor change

update CMakeLists.txt

minor change

apply weight prepacking in set_tensor method in ggml-backend

fix compile error

ggml-ci

minor change

ggml-ci

update CMakeLists.txt

ggml-ci

add march dependency

minor change

ggml-ci

change ggml_backend_buffer_is_host to return false for amx backend

ggml-ci

fix supports_op

use device reg for AMX backend

ggml-ci

minor change

ggml-ci

minor change

fix rebase

set .buffer_from_host_ptr to be false for AMX backend
2024-10-18 13:34:36 +08:00
Tim Wang
3752217ed5
readme : update bindings list (#9918)
Co-authored-by: Tim Wang <tim.wang@ing.com>
2024-10-17 09:57:14 +03:00
Michał Tuszyński
4c42f93b22
readme : update bindings list (#9889) 2024-10-15 11:20:34 +03:00
R0CKSTAR
943d20b411
musa : update doc (#9856)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-10-12 08:09:53 +03:00
Viet-Anh NGUYEN (Andrew)
71967c2a6d
Add Llama Assistant (#9744) 2024-10-04 20:29:35 +02:00
Paweł Wodnicki
3f1ae2e32c
Update README.md (#9591)
Add Bielik model.
2024-10-01 19:18:46 +02:00
Georgi Gerganov
589b48d41e
contrib : add Resources section (#9675) 2024-09-29 14:38:18 +03:00
Aarni Koskela
43bcdd9703
readme : add tool (#9655) 2024-09-28 15:07:14 +03:00
Georgi Gerganov
b5de3b74a5
readme : update hot topics 2024-09-27 20:57:51 +03:00
Concedo
6342b414ea update readme 2024-09-24 23:04:23 +08:00
Riceball LEE
1d48e98e4f
readme : add programmable prompt engine language CLI (#9599) 2024-09-23 18:58:17 +03:00
Shane A
0aadac10c7
llama : support OLMoE (#9462) 2024-09-16 09:47:37 +03:00
Concedo
de0c96818e update readme 2024-09-15 21:36:20 +08:00