417 commits

Author SHA1 Message Date
Li, Zonghang
dfb1feb54e update README 2025-06-16 12:09:07 +04:00
Li, Zonghang
fbf853341b add endpoint /v1/cancel 2025-06-07 11:34:38 +04:00
Lizonghang
ef1e10101e add test for IQ1 and doc for device selection 2025-06-04 15:12:00 +04:00
Lizonghang
c54a6a0132 fix context shifting 2025-05-19 16:58:35 +04:00
Lizonghang
2cc01483fd support server mode 2025-05-14 18:28:46 +04:00
Lizonghang
ebd09fc83c Merge branch 'dev' 2025-05-14 14:19:53 +04:00
Lizonghang
258fb2d06b add QA: How to manually profile a device 2025-05-14 14:19:20 +04:00
Li, Zonghang
e2de4511c5 Update README.md 2025-05-11 18:15:39 +08:00
leeetao
b212d74dc3 update Readme.md 2025-04-17 09:17:11 +00:00
Zonghang Li
f9702ec4c0 update README.md 2025-04-16 15:55:43 +04:00
Li, Zonghang
b59d6d9cdf Update README.md 2025-04-15 09:59:08 +08:00
Li, Zonghang
6d13836c44 Update README.md 2025-04-11 01:41:43 +08:00
Li, Zonghang
4845abf25e Update README.md 2025-04-11 01:20:36 +08:00
Lizonghang
e48b804730 update README.md 2025-04-09 13:55:30 +04:00
Li, Zonghang
55f8dc588f Update README.md 2025-04-09 10:56:25 +08:00
Lizonghang
e421d788d3 update README 2025-04-08 23:15:43 +04:00
Lizonghang
03ff9a7654 update README 2025-04-07 23:28:01 +04:00
Li, Zonghang
a3a1f4499b Update README.md 2025-04-07 22:14:44 +08:00
Li, Zonghang
98d73778a6 Update README.md 2025-04-07 22:13:31 +08:00
Li, Zonghang
5984b1b75f Update README.md 2025-04-07 22:13:12 +08:00
Li, Zonghang
ebd15b4112 Update README.md 2025-04-07 22:12:11 +08:00
Li, Zonghang
35adc76337 Update README.md 2025-04-07 22:08:14 +08:00
Lizonghang
87eb1aa7ec update README 2025-04-07 18:06:57 +04:00
Lizonghang
fffefb9259 update README 2025-04-07 17:57:57 +04:00
Lizonghang
3b264352e7 update README 2025-03-30 23:39:36 +04:00
Lizonghang
2a01ff5fb1 init 2024-10-23 09:42:32 +04:00
Viet-Anh NGUYEN (Andrew)
71967c2a6d Add Llama Assistant (#9744) 2024-10-04 20:29:35 +02:00
Paweł Wodnicki
3f1ae2e32c Update README.md (#9591)
Add Bielik model.
2024-10-01 19:18:46 +02:00
Georgi Gerganov
589b48d41e contrib : add Resources section (#9675) 2024-09-29 14:38:18 +03:00
Aarni Koskela
43bcdd9703 readme : add tool (#9655) 2024-09-28 15:07:14 +03:00
Georgi Gerganov
b5de3b74a5 readme : update hot topics 2024-09-27 20:57:51 +03:00
Riceball LEE
1d48e98e4f readme : add programmable prompt engine language CLI (#9599) 2024-09-23 18:58:17 +03:00
Shane A
0aadac10c7 llama : support OLMoE (#9462) 2024-09-16 09:47:37 +03:00
OSecret
d6b37c881f readme : update tools list (#9475)
* Added link to proprietary wrapper for Unity3d into README.md

Wrapper has prebuild library and was tested on iOS, Android, WebGL, PC, Mac platforms, has online demos like [this](https://d23myu0xfn2ttc.cloudfront.net/rich/index.html) and [that](https://d23myu0xfn2ttc.cloudfront.net/).

* Update README.md

Fixes upon review
2024-09-15 10:36:53 +03:00
Faisal Zaghloul
449ccfb6f5 Add Jais to list of supported models (#9439)
Co-authored-by: fmz <quic_fzaghlou@quic.com>
2024-09-12 02:29:53 +02:00
Georgi Gerganov
38ca6f644b readme : update hot topics 2024-09-09 15:51:37 +03:00
Antonis Makropoulos
5ed087573e readme : add LLMUnity to UI projects (#9381)
* add LLMUnity to UI projects

* add newline to examples/rpc/README.md to fix editorconfig-checker unit test
2024-09-09 14:21:38 +03:00
Georgi Gerganov
b69a480af4 readme : refactor API section + remove old hot topics 2024-09-03 10:00:36 +03:00
Younes Belkada
b40eb84895 llama : support for falcon-mamba architecture (#9074)
* feat: initial support for llama.cpp

* fix: lint

* refactor: better refactor

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* fix: address comments

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <git@compilade.net>

* fix: add more cleanup and harmonization

* fix: lint

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <git@compilade.net>

* fix: change name

* Apply suggestions from code review

Co-authored-by: compilade <git@compilade.net>

* add in operator

* fix: add `dt_b_c_rms` in `llm_load_print_meta`

* fix: correct printf format for bool

* fix: correct print format

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* llama : quantize more Mamba tensors

* llama : use f16 as the fallback of fallback quant types

---------

Co-authored-by: compilade <git@compilade.net>
2024-08-21 11:06:36 +03:00
wangshuai09
cfac111e2b cann: add doc for cann backend (#8867)
Co-authored-by: xuedinge233 <damow890@gmail.com>
Co-authored-by: hipudding <huafengchun@gmail.com>
2024-08-19 16:46:38 +08:00
Minsoo Cheong
c679e0cb5c llama : add EXAONE model support (#9025)
* add exaone model support

* add chat template

* fix whitespace

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* add ftype

* add exaone pre-tokenizer in `llama-vocab.cpp`

Co-Authored-By: compilade <113953597+compilade@users.noreply.github.com>

* fix lint

Co-Authored-By: compilade <113953597+compilade@users.noreply.github.com>

* add `EXAONE` to supported models in `README.md`

* fix space

Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: compilade <113953597+compilade@users.noreply.github.com>
Co-authored-by: compilade <git@compilade.net>
2024-08-16 09:35:18 +03:00
Frank Mai
84eb2f4fad docs: introduce gpustack and gguf-parser (#8873)
* readme: introduce gpustack

GPUStack is an open-source GPU cluster manager for running large
language models, which uses llama.cpp as the backend.

Signed-off-by: thxCode <thxcode0824@gmail.com>

* readme: introduce gguf-parser

GGUF Parser is a tool to review/check the GGUF file and estimate the
memory usage without downloading the whole model.

Signed-off-by: thxCode <thxcode0824@gmail.com>

---------

Signed-off-by: thxCode <thxcode0824@gmail.com>
2024-08-12 14:45:50 +02:00
Eric Curtin
b42978e7e4 readme : add ramalama to the availables UI (#8811)
ramalama is a repo agnostic boring CLI tool that supports pulling from
ollama, huggingface and oci registries.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2024-08-05 15:45:01 +03:00
BarfingLemurs
400ae6f65f readme : update model list (#8851) 2024-08-05 08:54:10 +03:00
R0CKSTAR
e54c35e4fb feat: Support Moore Threads GPU (#8383)
* Update doc for MUSA

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Add GGML_MUSA in Makefile

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Add GGML_MUSA in CMake

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* CUDA => MUSA

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* MUSA adds support for __vsubss4

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Fix CI build failure

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-07-28 01:41:25 +02:00
MorganRO8
68504f0970 readme : update games list (#8673)
Added link to game I made that depends on llama
2024-07-24 19:48:00 +03:00
Thorsten Sommer
3a7ac5300a readme : update UI list [no ci] (#8505) 2024-07-24 15:52:30 +03:00
Georgi Gerganov
be0cfb4175 readme : fix server badge 2024-07-19 14:34:55 +03:00
Andy Salerno
fd560fe680 Update README.md to fix broken link to docs (#8399)
Update the "Performance troubleshooting" doc link to be correct - the file was moved into a dir called 'development'
2024-07-09 14:58:44 -04:00
b4b4o
c4dd11d1d3 readme : fix web link error [no ci] (#8347) 2024-07-08 17:19:24 +03:00