Concedo
895d008c5f
the bloke has retired for a year, it's time to let go
2025-04-13 17:00:00 +08:00
Concedo
8e23a087e7
updated readme, memory detection prints
2025-04-08 20:23:52 +08:00
Concedo
4a29e216e7
edit readme
2025-03-14 21:06:55 +08:00
Concedo
4b63ee5096
updated readme
2025-02-01 17:41:50 +08:00
Concedo
96407502cd
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# examples/llama-bench/llama-bench.cpp
# examples/llama.android/llama/src/main/cpp/llama-android.cpp
# examples/llama.android/llama/src/main/java/android/llama/cpp/LLamaAndroid.kt
# src/llama-vocab.cpp
# tests/test-backend-ops.cpp
2025-01-17 23:13:50 +08:00
musoles
7a689c415e
README : added kalavai to infrastructure list ( #11216 )
2025-01-17 01:10:49 +01:00
Xuan Son Nguyen
84a44815f7
cli : auto activate conversation mode if chat template is available ( #11214 )
...
* cli : auto activate conversation mode if chat template is detected
* add warn on bad template
* update readme (writing with the help of chatgpt)
* update readme (2)
* do not activate -cnv for non-instruct models
2025-01-13 20:18:12 +01:00
Concedo
4d92b4e98e
updated readme and colab
2025-01-14 00:31:52 +08:00
Concedo
bd38665e1f
some cleanup before starting on TTS
2025-01-10 22:13:44 +08:00
Molly Sophia
ee7136c6d1
llama: add support for QRWKV6 model architecture ( #11001 )
...
* WIP: Add support for RWKV6Qwen2
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV: Some graph simplification
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Add support for RWKV6Qwen2 with cpu and cuda GLA
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix some typos
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* code format changes
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix wkv test & add gla test
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Fix cuda warning
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update README.md
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update ggml/src/ggml-cuda/gla.cu
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Fix fused lerp weights loading with RWKV6
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* better sanity check skipping for QRWKV6 in llama-quant
thanks @compilade
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: compilade <git@compilade.net>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <git@compilade.net>
2025-01-10 09:58:08 +08:00
Pierrick Hymbert
f8feb4b01a
model: Add support for PhiMoE arch ( #11003 )
...
* model: support phimoe
* python linter
* doc: minor
Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
* doc: minor
Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
* doc: add phimoe as supported model
ggml-ci
---------
Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
2025-01-09 11:21:41 +01:00
Concedo
dcfa1eca4e
Merge commit '017cc5f446' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/019-bug-misc.yml
# CODEOWNERS
# examples/batched-bench/batched-bench.cpp
# examples/batched/batched.cpp
# examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp
# examples/gritlm/gritlm.cpp
# examples/llama-bench/llama-bench.cpp
# examples/passkey/passkey.cpp
# examples/quantize-stats/quantize-stats.cpp
# examples/run/run.cpp
# examples/simple-chat/simple-chat.cpp
# examples/simple/simple.cpp
# examples/tokenize/tokenize.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-metal/CMakeLists.txt
# ggml/src/ggml-vulkan/CMakeLists.txt
# scripts/sync-ggml.last
# src/llama.cpp
# tests/test-autorelease.cpp
# tests/test-model-load-cancel.cpp
# tests/test-tokenizer-0.cpp
# tests/test-tokenizer-1-bpe.cpp
# tests/test-tokenizer-1-spm.cpp
2025-01-08 23:15:21 +08:00
Benson Wong
a45433ba20
readme : add llama-swap to infrastructure section ( #11032 )
...
* list llama-swap under tools in README
* readme: add llama-swap to Infrastructure
2025-01-02 09:14:54 +02:00
Concedo
2a890ec25a
Breaking change: unify the windows and linux build flags.
...
To do a full build on windows you now need LLAMA_PORTABLE=1 LLAMA_VULKAN=1 LLAMA_CLBLAST=1
2024-12-23 22:35:54 +08:00
Eric Curtin
7909e8588d
llama-run : improve progress bar ( #10821 )
...
Set default width to whatever the terminal is. Also fixed a small bug around
default n_gpu_layers value.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2024-12-19 03:58:00 +01:00
redbeard
6b064c92b4
docs: Fix HIP (née hipBLAS) in README ( #10880 )
...
Related to #10524 / be0e350c
references to hipBLAS have been removed
across the repository. This fixes the link from the repository's
`README.md`.
Signed-off-by: Brian 'redbeard' Harrington <redbeard@dead-city.org>
2024-12-18 10:35:00 +02:00
Ruan
4f51968aca
readme : update typos ( #10863 )
2024-12-17 11:47:20 +02:00
Valentin Mamedov
a0974156f3
llama : add Deepseek MoE v1 & GigaChat models ( #10827 )
...
* Add deepseek v1 arch & gigachat template
* improve template code
* add readme
* delete comments
* remove comment
* fix format
* lint llama.cpp
* fix order of deepseek and deepseek2, move gigachat template to the end of func
* fix order of deepseek and deepseek2 in constants; mark shared exp as deepseek arch need
* remove comments
* move deepseek above deepseek2
* change placement of gigachat chat template
2024-12-15 19:02:46 +02:00
HimariO
ba1cb19cdd
llama : add Qwen2VL support + multimodal RoPE ( #10361 )
...
* Barebone Qwen2VL LLM convertor
* Add Qwen2VL cli entrypoint
* [WIP] add qwen2vl arch
* Verify m-rope output
* Add vl-rope/2d-rope support for qwen2vl ViT
* update qwen2vl cli tool
* update 5D tensor op workaround
* [WIP] qwen2vl vision model
* make batch and clip utils compatible with qwen2vl
* [WIP] create inference workflow, gguf convert script but fix
* correcting vision-rope behavior, add the missing last layer back to ViT
* add arg parser to qwen2vl_surgery
* replace variable size array with vector
* cuda-gdb cmake preset
* add fp32 mrope, vision rope kernel
* add fp16 support for qwen2vl and m-rope
* add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION`
* fix rope op mode switching, outdated func args
* update `llama_hparams`
* update to keep up with upstream changes
* resolve linter, test errors
* add makefile entry, update special image padding token
* add mrope unit test, fix few compiler warnings
* rename `mrope` related function, params
* minor updates on debug util, bug fixes
* add `m-rope` testcase to `test-backend-ops`
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix trailing whitespace
* store `llama_hparams.rope_sections` with fixed size array
* update position id tensor size check in GGML_OP_ROPE
* minor updates
* update `ggml_backend_*_supports_op` of unsupported backends
* remove old `rope_section` compare operator
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-14 14:43:46 +02:00
Eric Curtin
c27ac678dd
Opt class for positional argument handling ( #10508 )
...
Added support for positional arguments `model` and `prompt`. Added
functionality to download via strings like:
llama-run llama3
llama-run ollama://granite-code
llama-run ollama://granite-code:8b
llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
llama-run https://example.com/some-file1.gguf
llama-run some-file2.gguf
llama-run file://some-file3.gguf
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2024-12-13 19:34:25 +01:00
Ikko Eltociear Ashimine
ed9e229372
docs: update README.md ( #1244 )
...
recomended -> recommended
2024-12-02 17:20:20 +08:00
Georgi Gerganov
6acce39710
readme : update the usage section with examples ( #10596 )
...
* readme : update the usage section with examples
* readme : more examples
2024-12-01 11:25:17 +02:00
Georgi Gerganov
3e0ba0e604
readme : remove old badge
2024-11-30 10:09:21 +02:00
Georgi Gerganov
abadba05be
readme : refresh ( #10587 )
...
* readme : refresh
* readme : move section [no ci]
* readme : clarify [no ci]
* readme : fixes [no ci]
* readme : more fixes [no ci]
* readme : simplify [no ci]
* readme : clarify GGUF
2024-11-30 09:47:07 +02:00
Diego Devesa
a3a3048e7a
cleanup UI link list ( #10577 )
...
* cleanup UI link list
* sort list alphabetically
* add missing licenses
2024-11-29 17:45:08 +01:00
Shane A
de5097351c
Add OLMo 2 model in docs ( #10530 )
...
* Add link to OLMo 2 model in docs
* Change link to landing page
2024-11-26 21:55:29 +01:00
Concedo
5b35790343
move nix example to a standalone file (+1 squashed commits)
...
Squashed commits:
[fef38bab5] move nix example to a standalone file
2024-11-26 16:32:26 +08:00
DontEatOreo
d82d7b60e6
Refactor Nix section ( #1235 )
...
* README.md: remove `nix{3-run,shell}` example
* README.md: better re-word NVIDIA CUDA section for nix
* README.md: remove unneeded section in nix
* README.md: add section to open issue regarding nix version on nixpkgs
* README.md: add clarification on how to add KoboldCpp to `home-manager`
* README.md: add example in nix section
2024-11-26 16:18:45 +08:00
GPTLocalhost (Word Add-in)
aacb6c3a70
Add GPTLocalhost as third-party resource ( #1221 )
2024-11-18 10:17:06 +08:00
Johannes Gäßler
467576b6cc
CMake: default to -arch=native for CUDA build ( #10320 )
2024-11-17 09:06:34 +01:00
Small Grass Forest
1ee9eea094
docs : update bindings list ( #10261 )
...
Signed-off-by: tianzixuan <tianzixuan335@hellobike.com>
2024-11-13 13:17:10 +02:00
Concedo
3cfc4dc581
avoid euler a for flux (+4 squashed commit)
...
Squashed commit:
[5a4b72385] fix cuda build
[5f969a645] add vulkan information
[6849e7398] fixed flux
[740e80419] update readme
2024-11-05 22:50:14 +08:00
Georgi Gerganov
ba6f62eb79
readme : update hot topics
2024-11-01 17:31:51 +02:00
Molly Sophia
4ff7fe1fb3
llama : add chat template for RWKV-World + fix EOT ( #9968 )
...
* Add chat template for RWKV-World
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV: Fix the chat template not being used
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* RWKV v6: Set EOT token to ``\n\n``
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* readme: add rwkv into supported model list
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-10-22 13:33:37 +03:00
Asghar Ghorbani
994cfb1acb
readme : update UI list ( #9972 )
...
add PocketPal AI app
2024-10-21 21:20:59 +03:00
Loïc Carrère
45f097645e
readme : update bindings list ( #9951 )
...
Update the binding list by adding LM-Kit.NET (C# & VB.NET)
2024-10-20 19:25:41 +03:00
icppWorld
7cab2083c7
readme : update infra list ( #9942 )
...
llama_cpp_canister allows you to run llama.cpp as a Smart Contract on the Internet Computer. The smart contract runs as WebAssembly in a so-called 'canister'.
2024-10-20 19:01:34 +03:00
Ma Mingfei
60ce97c9d8
add amx kernel for gemm ( #8998 )
...
add intel amx isa detection
add vnni kernel for gemv cases
add vnni and amx kernel support for block_q8_0
code cleanup
fix packing B issue
enable openmp
fine tune amx kernel
switch to aten parallel pattern
add error message for nested parallelism
code cleanup
add f16 support in ggml-amx
add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS
update CMakeList
update README
fix some compilation warnings
fix compiler warning when amx is not enabled
minor change
ggml-ci
move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp
ggml-ci
update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16
ggml-ci
add amx as a ggml-backend
update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h
minor change
update CMakeLists.txt
minor change
apply weight prepacking in set_tensor method in ggml-backend
fix compile error
ggml-ci
minor change
ggml-ci
update CMakeLists.txt
ggml-ci
add march dependency
minor change
ggml-ci
change ggml_backend_buffer_is_host to return false for amx backend
ggml-ci
fix supports_op
use device reg for AMX backend
ggml-ci
minor change
ggml-ci
minor change
fix rebase
set .buffer_from_host_ptr to be false for AMX backend
2024-10-18 13:34:36 +08:00
Tim Wang
3752217ed5
readme : update bindings list ( #9918 )
...
Co-authored-by: Tim Wang <tim.wang@ing.com>
2024-10-17 09:57:14 +03:00
Michał Tuszyński
4c42f93b22
readme : update bindings list ( #9889 )
2024-10-15 11:20:34 +03:00
R0CKSTAR
943d20b411
musa : update doc ( #9856 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-10-12 08:09:53 +03:00
Viet-Anh NGUYEN (Andrew)
71967c2a6d
Add Llama Assistant ( #9744 )
2024-10-04 20:29:35 +02:00
Paweł Wodnicki
3f1ae2e32c
Update README.md ( #9591 )
...
Add Bielik model.
2024-10-01 19:18:46 +02:00
Georgi Gerganov
589b48d41e
contrib : add Resources section ( #9675 )
2024-09-29 14:38:18 +03:00
Aarni Koskela
43bcdd9703
readme : add tool ( #9655 )
2024-09-28 15:07:14 +03:00
Georgi Gerganov
b5de3b74a5
readme : update hot topics
2024-09-27 20:57:51 +03:00
Concedo
6342b414ea
update readme
2024-09-24 23:04:23 +08:00
Riceball LEE
1d48e98e4f
readme : add programmable prompt engine language CLI ( #9599 )
2024-09-23 18:58:17 +03:00
Shane A
0aadac10c7
llama : support OLMoE ( #9462 )
2024-09-16 09:47:37 +03:00
Concedo
de0c96818e
update readme
2024-09-15 21:36:20 +08:00