slaren
2b1f616b20
ggml : reduce hash table reset cost ( #8698 )
...
* ggml : reduce hash table reset cost
* fix unreachable code warnings after GGML_ASSERT(false)
* GGML_ASSERT(false) -> GGML_ABORT("fatal error")
* GGML_ABORT use format string
2024-07-27 04:41:55 +02:00
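A minimal sketch of the assert-to-abort replacement the bullets describe; the function and op values are illustrative, not the commit's actual code:

```c
#include "ggml.h"

// Before: GGML_ASSERT(false) left compilers warning about unreachable code.
// After: GGML_ABORT takes a printf-style format string and never returns.
static int op_result_size(int op) {
    switch (op) {
        case 0: return 4;
        default:
            GGML_ABORT("unsupported op: %d", op); // was: GGML_ASSERT(false)
    }
}
```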
Concedo
4531ab5465
refactor some fields
2024-07-27 00:04:29 +08:00
Judd
01245f5b16
llama : fix order of parameters ( #8706 )
...
The corrected usage of `aclrtGetMemInfo` matches the documentation:
https://www.hiascend.com/doc_center/source/zh/canncommercial/63RC2/inferapplicationdev/aclcppdevg/aclcppdevg_03_0103.html
Co-authored-by: Judd <foldl@boxvest.com>
2024-07-26 11:38:12 +03:00
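A hedged sketch of the corrected call order, assuming the CANN signature `aclrtGetMemInfo(attr, free, total)` from the linked documentation:

```c
#include <acl/acl.h>

// free comes before total; the bug was passing the two pointers in the
// opposite order. ACL_HBM_MEM selects device HBM per the CANN docs.
static size_t device_free_memory(void) {
    size_t free_mem  = 0;
    size_t total_mem = 0;
    if (aclrtGetMemInfo(ACL_HBM_MEM, &free_mem, &total_mem) != ACL_SUCCESS) {
        return 0;
    }
    return free_mem;
}
```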
Yaiko
01aec4a631
server : add Speech Recognition & Synthesis to UI ( #8679 )
...
* server : add Speech Recognition & Synthesis to UI
* server : add Speech Recognition & Synthesis to UI (fixes)
2024-07-26 00:10:16 +02:00
Xuan Son Nguyen
41cd47caab
examples : export-lora : fix issue with quantized base models ( #8687 )
2024-07-25 23:49:39 +02:00
DavidKorczynski
49ce0ab6d4
ggml: handle ggml_init failure to fix NULL pointer deref ( #8692 )
...
`ggml_init` can fail if no unused context is found. In that case, a NULL-pointer deref will happen later in the code during a call to `ggml_set_no_alloc`.
This fixes it by bailing out if no context is found.
2024-07-25 23:23:05 +02:00
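A minimal sketch of the defensive pattern this fix introduces, using the standard ggml context API (sizes illustrative):

```c
#include "ggml.h"

static struct ggml_context * make_ctx(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);
    if (ctx == NULL) {
        return NULL; // bail out instead of passing a NULL context along
    }
    ggml_set_no_alloc(ctx, true);
    return ctx;
}
```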
Georgi Gerganov
4226a8d10e
llama : fix build + fix fabs compile warnings ( #8683 )
...
ggml-ci
2024-07-25 19:57:31 +03:00
Andreas (Andi) Kunar
bf5a81df37
ggml : fix build on Windows with Snapdragon X ( #8531 )
...
* Improvements for Windows with Snapdragon X
* Revert "Improvements for Windows with Snapdragon X"
This reverts commit bf21397ae5ea7c73d3494db3b91505599909227d.
* Improvements for Windows with Snapdragon X
* WOA build clarifications
* Windows on ARM build clarifications
* cmake build for Windows clarifications
* Update docs/build.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: AndreasKunar <andreaskmsn.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-25 19:01:00 +03:00
Georgi Gerganov
88954f7fbd
tests : fix printfs ( #8068 )
2024-07-25 18:58:04 +03:00
Concedo
9f2076b4b3
fix rocminfo error
2024-07-25 22:23:36 +08:00
Chen Xi
ed67bcb24f
[SYCL] fix multi-gpu issue on sycl ( #8554 )
...
---------
Signed-off-by: Chen Xi <xi2chen@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
2024-07-25 19:45:18 +08:00
Georgi Gerganov
eddcb5238b
ggml : add and use ggml_cpu_has_llamafile() ( #8664 )
2024-07-25 12:37:42 +03:00
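A plausible shape for the new helper, following the existing `ggml_cpu_has_*` convention; the `GGML_USE_LLAMAFILE` guard is an assumption, not the verified implementation:

```c
int ggml_cpu_has_llamafile(void) {
#if defined(GGML_USE_LLAMAFILE)
    return 1; // llamafile SGEMM kernels are compiled in
#else
    return 0;
#endif
}
```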
Xuan Son Nguyen
be6d7c0791
examples : remove finetune and train-text-from-scratch ( #8669 )
...
* examples : remove finetune and train-text-from-scratch
* fix build
* update help message
* fix small typo for export-lora
2024-07-25 10:39:04 +02:00
Ujjawal Panchal
4b0eff3df5
docs : Quantum -> Quantized ( #8666 )
...
* docfix: imatrix readme, quantum models -> quantized models.
* docfix: server readme: quantum models -> quantized models.
2024-07-25 11:13:27 +03:00
Fan Shupei
8a4bad50a8
llama: use sliding window for phi3 ( #8627 )
...
* use sliding window for phi3
* fix typo, "data_swa" -> "data"
* [convert_hf_to_gguf.py] add phi3 sliding window
2024-07-25 10:21:09 +03:00
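For context, a hedged sketch of what a causal sliding-window mask computes; names are illustrative and this is not the commit's actual code:

```c
#include <stdbool.h>

// true if key position j is visible to query position i under a causal mask
// with sliding window n_swa (phi3's window comes from the model hparams)
static bool swa_visible(int i, int j, int n_swa) {
    return j <= i && i - j < n_swa;
}
```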
Concedo
a84f7c5d81
revert num old cpu for ci
2024-07-25 13:24:34 +08:00
Concedo
57a98ba308
fixed dict loading
2024-07-25 11:41:05 +08:00
Concedo
0024d9d682
fixed order of selection
2024-07-25 11:15:30 +08:00
MorganRO8
68504f0970
readme : update games list ( #8673 )
...
Added link to game I made that depends on llama
2024-07-24 19:48:00 +03:00
Concedo
d1f7832d21
adjusted layer estimation
2024-07-24 22:51:02 +08:00
Concedo
cca2fa9a6c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/llama-cli-intel.Dockerfile
# .devops/llama-server-intel.Dockerfile
# README.md
# ggml/src/CMakeLists.txt
# tests/test-chat-template.cpp
2024-07-24 21:57:50 +08:00
Concedo
e28c42d7f7
adjusted layer estimation
2024-07-24 21:54:49 +08:00
Joe Todd
f19bf99c01
Build Llama SYCL Intel with static libs ( #8668 )
...
Ensure SYCL CI builds both static & dynamic libs for testing purposes
Signed-off-by: Joe Todd <joe.todd@codeplay.com>
2024-07-24 14:36:00 +01:00
Thorsten Sommer
3a7ac5300a
readme : update UI list [no ci] ( #8505 )
2024-07-24 15:52:30 +03:00
Concedo
b7fc8e644a
fix broken template, updated lite
2024-07-24 20:47:05 +08:00
Xuan Son Nguyen
96952e7181
llama : fix llama_chat_format_single for mistral ( #8657 )
...
* fix `llama_chat_format_single` for mistral
* fix typo
* use printf
2024-07-24 13:48:46 +02:00
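For context, a minimal usage sketch of the public chat-template API from `llama.h` that this internal helper builds on; the wrapper function and buffer handling are illustrative:

```c
#include "llama.h"

// assumes `model` was loaded elsewhere, e.g. via llama_load_model_from_file
static int32_t format_chat(const struct llama_model * model, char * buf, size_t len) {
    const struct llama_chat_message msgs[] = {
        { "user",      "Hello!"              },
        { "assistant", "Hi, how can I help?" },
        { "user",      "Tell me a joke."     },
    };
    // tmpl == NULL selects the chat template stored in the model's metadata;
    // add_ass == true appends the assistant prefix so generation can start
    return llama_chat_apply_template(model, NULL, msgs, 3, true, buf, (int32_t) len);
}
```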
Joe Todd
79167d9e49
Re-add erroneously removed -fsycl from GGML_EXTRA_LIBS ( #8667 )
2024-07-24 11:55:26 +01:00
Xuan Son Nguyen
b115105f05
add llama_lora_adapter_clear ( #8653 )
2024-07-24 11:25:19 +02:00
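A hedged usage sketch of the new call next to the LoRA adapter API of this period; signatures assumed from `llama.h`, and the adapter path is a placeholder:

```c
#include "llama.h"

// assumes `model` and `ctx` were created elsewhere
static void demo_lora(struct llama_model * model, struct llama_context * ctx) {
    struct llama_lora_adapter * adapter = llama_lora_adapter_init(model, "adapter.gguf");
    llama_lora_adapter_set(ctx, adapter, 1.0f); // attach with scale 1.0
    // ... decode with the adapter active ...
    llama_lora_adapter_clear(ctx); // the new call: detach every adapter from ctx
}
```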
Concedo
01d5175654
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# Makefile
# ggml/src/CMakeLists.txt
2024-07-24 16:41:33 +08:00
Concedo
c76f3401e3
remove extra padding for layer guessing
2024-07-24 16:36:34 +08:00
Concedo
44ef87f14c
update lite, try fix ci
2024-07-24 16:31:34 +08:00
Xuan Son Nguyen
de280085e7
examples : Fix llama-export-lora example ( #8607 )
...
* fix export-lora example
* add more logging
* reject merging subset
* better check
* typo
2024-07-23 23:48:37 +02:00
Concedo
eb5b4d0186
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# Makefile
# Package.swift
# src/CMakeLists.txt
# src/llama.cpp
# tests/test-grammar-integration.cpp
# tests/test-llama-grammar.cpp
2024-07-23 23:20:32 +08:00
Vali Malinoiu
b841d07408
server : fix URL.parse in the UI ( #8646 )
2024-07-23 17:37:42 +03:00
Joe Todd
64cf50a0ed
sycl : Add support for non-release DPC++ & oneMKL ( #8644 )
...
* Update cmake to support nvidia hardware & open-source compiler
---------
Signed-off-by: Joe Todd <joe.todd@codeplay.com>
2024-07-23 14:58:37 +01:00
Concedo
c81d1623b4
Merge commit '751fcfc6c3' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CONTRIBUTING.md
# README.md
# flake.lock
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
2024-07-23 19:18:05 +08:00
Concedo
c80d5af014
add a tiny amount of padding
2024-07-23 18:58:26 +08:00
Georgi Gerganov
938943cdbf
llama : move vocab, grammar and sampling into separate files ( #8508 )
...
* llama : move sampling code into llama-sampling
ggml-ci
* llama : move grammar code into llama-grammar
ggml-ci
* cont
ggml-ci
* cont : pre-fetch rules
* cont
ggml-ci
* llama : deprecate llama_sample_grammar
* llama : move tokenizers into llama-vocab
ggml-ci
* make : update llama.cpp deps [no ci]
* llama : redirect external API to internal APIs
ggml-ci
* llama : suffix the internal APIs with "_impl"
ggml-ci
* llama : clean-up
2024-07-23 13:10:17 +03:00
0cc4m
751fcfc6c3
Vulkan IQ4_NL Support ( #8613 )
...
* Fix Vulkan matmul tests compile errors
* Add Vulkan IQ4_NL support
* Fix Vulkan DeepSeek-Coder-V2-Lite MoE support
2024-07-23 10:56:49 +02:00
Jeroen Mostert
46e47417aa
Allow all RDNA2 archs to use sdot4 intrinsic ( #8629 )
...
The check gating the use of `__builtin_amdgcn_sdot4` specifically checks for gfx1030. This causes a severe perf regression for anything gfx103? that's not gfx1030 and not using `HSA_OVERRIDE_GFX_VERSION` (if you've built ROCm to support it). We already have a generic RDNA2 define, let's use it.
2024-07-23 10:50:40 +02:00
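A hedged sketch of the gating change; the `RDNA2` macro name is assumed to match ggml's existing arch defines, and the fallback is illustrative:

```c
#include <hip/hip_runtime.h>
#include <stdint.h>

static __device__ int dot4_i8(int a, int b, int sum) {
#if defined(RDNA2) // previously gated on the single gfx1030 target
    return __builtin_amdgcn_sdot4(a, b, sum, false); // packed 4x int8 dot
#else
    // generic fallback: unpack the four int8 lanes and accumulate manually
    const int8_t * va = (const int8_t *) &a;
    const int8_t * vb = (const int8_t *) &b;
    for (int k = 0; k < 4; ++k) {
        sum += va[k] * vb[k];
    }
    return sum;
#endif
}
```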
Georgi Gerganov
e7e6487ba0
contrib : clarify PR squashing + module names ( #8630 )
...
* contrib : clarify PR squashing
* contrib : fix typo + add list of modules
2024-07-23 11:28:38 +03:00
luoyu-intel
063d99ad11
[SYCL] fix scratch size of softmax ( #8642 )
2024-07-23 15:43:28 +08:00
Keke Han
081fe431aa
llama : fix codeshell support ( #8599 )
...
* llama : fix codeshell support
* llama : move codeshell below smollm to respect the enum order
2024-07-22 19:43:43 +03:00
Jason Stillerman
d94c6e0ccb
llama : add support for SmolLm pre-tokenizer ( #8609 )
...
* Adding SmolLM Pre Tokenizer
* Update convert_hf_to_gguf_update.py
Co-authored-by: compilade <git@compilade.net>
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* handle regex
* removed the .inp and .out ggufs
---------
Co-authored-by: compilade <git@compilade.net>
2024-07-22 17:43:01 +03:00
Jiří Podivín
566daa5a5b
*.py: Stylistic adjustments for python ( #8233 )
...
* Superfluous parens in conditionals were removed.
* Unused args in functions were removed.
* Replaced unused `idx` var with `_`
* Initializing file_format and format_version attributes
* Renaming constant to capitals
* Preventing redefinition of the `f` var
Signed-off-by: Jiri Podivin <jpodivin@redhat.com>
2024-07-22 23:44:53 +10:00
Georgi Gerganov
6f11a83e4e
llama : allow overrides for tokenizer flags ( #8614 )
...
ggml-ci
2024-07-22 13:33:22 +03:00
Georgi Gerganov
e093dd2382
tests : re-enable tokenizer tests ( #8611 )
...
* models : remove duplicated gpt-2 vocab
* models : remove old stablelm vocab
* tests : re-enable MPT tokenizer tests
* tests : re-enable DeepSeek tokenizer tests
* cmake : sort
ggml-ci
2024-07-22 13:32:49 +03:00
Douglas Hanley
50e05353e8
llama : add Mistral Nemo inference support ( #8604 )
2024-07-22 11:06:17 +03:00
Jan Boon
628154492a
server : update doc to clarify n_keep when there is bos token ( #8619 )
2024-07-22 11:02:09 +03:00
Mark Zhuang
04bab6b7da
ggml: fix compile error for RISC-V ( #8623 )
2024-07-22 10:56:45 +03:00