Concedo
dcfa1eca4e
Merge commit '017cc5f446' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/019-bug-misc.yml
# CODEOWNERS
# examples/batched-bench/batched-bench.cpp
# examples/batched/batched.cpp
# examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp
# examples/gritlm/gritlm.cpp
# examples/llama-bench/llama-bench.cpp
# examples/passkey/passkey.cpp
# examples/quantize-stats/quantize-stats.cpp
# examples/run/run.cpp
# examples/simple-chat/simple-chat.cpp
# examples/simple/simple.cpp
# examples/tokenize/tokenize.cpp
# ggml/CMakeLists.txt
# ggml/src/ggml-metal/CMakeLists.txt
# ggml/src/ggml-vulkan/CMakeLists.txt
# scripts/sync-ggml.last
# src/llama.cpp
# tests/test-autorelease.cpp
# tests/test-model-load-cancel.cpp
# tests/test-tokenizer-0.cpp
# tests/test-tokenizer-1-bpe.cpp
# tests/test-tokenizer-1-spm.cpp
2025-01-08 23:15:21 +08:00
Concedo
3732bb2686
taesd now supports flux and sd3
2025-01-08 22:35:50 +08:00
Concedo
c73d99ccac
updated lite
2025-01-08 13:35:59 +08:00
Concedo
568e476997
added toggle for vae tiling, use custom memory buffer
2025-01-08 13:12:03 +08:00
Diego Devesa
017cc5f446
ggml-backend : only offload from host buffers (fix) (#11124)
2025-01-07 16:11:57 +01:00
Concedo
d752846116
fixed ask save file
2025-01-07 22:11:15 +08:00
Concedo
bb2e739627
fixed simplercflags
2025-01-07 21:34:38 +08:00
Diego Devesa
a3d50bc022
ggml-backend : only offload from host buffers (#11120)
2025-01-07 12:38:05 +01:00
Radoslav Gerganov
a4dd490069
rpc : code cleanup (#11107)
...
Remove duplicated macros, use GGML_LOG_ERROR for errors
2025-01-07 08:37:02 +02:00
Akarshan Biswas
c0d6f790d0
SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#11087)
...
* SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6
* Revert "SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6"
This reverts commit f62dc45f318e48d375e7734b34cbddee81deed52.
* Reland: Use get_multi_ptr instead of deprecated get_pointer in wkv6
2025-01-07 14:26:07 +08:00
Eric Curtin
dc7cef9f37
llama-run : fix context size (#11094)
...
Set `n_ctx` equal to `n_batch` in `Opt` class. Now context size is
a more reasonable 2048.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-01-06 23:45:28 +01:00
Georgi Gerganov
ecebbd292d
llama : remove unused headers (#11109)
...
ggml-ci
2025-01-06 17:52:35 +02:00
Xuan Son Nguyen
96be8c3264
github : add cmd line field to bug report (#11090)
...
* github : cmd line to bug report
* codeowners : (@ngxson) only watch dockerfile
* Apply suggestions from code review [no ci]
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* rm cmd in log output [no ci]
* rm 2 [no ci]
* no need backticks [no ci]
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-01-06 16:34:49 +01:00
Concedo
58791612d2
sse3 mode for noavx2 clblast, fixed metadata, added version command
2025-01-06 21:59:05 +08:00
Georgi Gerganov
e6e7c75d94
server : fix extra BOS in infill endpoint (#11106)
...
* server : fix extra BOS in infill endpoint
ggml-ci
* server : update infill tests
2025-01-06 15:36:08 +02:00
Xuan Son Nguyen
09186fabbe
llama : remove check flash_attn with lora (#11104)
2025-01-06 13:41:12 +01:00
Asghar Ghorbani
96a1dc27c3
llama : prevent system info string accumulation across calls (#11101)
2025-01-06 13:21:46 +02:00
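Commit #11101 above names a common bug pattern. A function that builds a report in a `static std::string` and returns `c_str()` grows the string on every call unless the buffer is reset. The sketch below reproduces that pattern in C++. The functions `system_info_buggy` and `system_info_fixed` are hypothetical, and so is the feature string. It assumes the fix clears the buffer before rebuilding it, and it is not the code from the commit.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in: the real function queries backend/CPU features.
static const char * system_info_buggy() {
    static std::string s;      // persists across calls
    s += "AVX = 1 | ";         // appended again on every call
    return s.c_str();
}

static const char * system_info_fixed() {
    static std::string s;      // still static so the returned pointer stays valid
    s.clear();                 // reset before rebuilding the report
    s += "AVX = 1 | ";
    return s.c_str();
}
```

Keeping the string `static` while clearing it preserves the lifetime guarantee of the returned pointer, which is why clearing is preferable to making the string local.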
Daniel Bevenius
6369f867a4
llama : rename missed batch params/vars to ubatch (#10059)
...
This commit renames the `batch` parameter to `ubatch` in the
`llama_kv_cache_find_slot`, `llm_build_inp_embd`, and
`llm_build_mamba` functions.
The motivation for this is that this should have been done as part of
commit 19d900a756 ("llama : rename batch
to ubatch (#9950)") but for some reason I missed these functions in
that commit and only noticed them now (sorry).
2025-01-06 11:28:17 +02:00
Georgi Gerganov
47182dd03f
llama : update llama_model API names (#11063)
...
* llama : deprecate llama_free_model, add llama_model_free
ggml-ci
* llama : change `llama_load_model_from_file` -> `llama_model_load_from_file`
ggml-ci
2025-01-06 10:55:18 +02:00
Georgi Gerganov
3e6e7a6bc2
tokenize : escape the prompt (#11058)
...
* tokenize : escape the prompt
* tokenize : update help
2025-01-06 10:54:25 +02:00
Georgi Gerganov
ae2f606bb5
mmap : fix fileno macro clash (#11076)
...
* mmap : fix fileno macro clash
ggml-ci
* cont
ggml-ci
2025-01-06 10:52:38 +02:00
Georgi Gerganov
727368c60f
llama : use LLAMA_TOKEN_NULL (#11062)
...
ggml-ci
2025-01-06 10:52:15 +02:00
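The idea behind #11062 is to compare against a named sentinel instead of a bare `-1`. In the sketch below, `llama_token` and `LLAMA_TOKEN_NULL` are redefined locally. This assumes they mirror `llama.h`, where the token id is an `int32_t` and the sentinel is `-1`. `find_token` and the toy vocabulary are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Assumed to mirror llama.h; redefined here so the sketch is self-contained.
typedef int32_t llama_token;
#define LLAMA_TOKEN_NULL -1

// Hypothetical helper: look up a piece in a toy vocabulary.
static llama_token find_token(const std::vector<std::string> & vocab, const std::string & piece) {
    for (size_t i = 0; i < vocab.size(); ++i) {
        if (vocab[i] == piece) {
            return static_cast<llama_token>(i);
        }
    }
    return LLAMA_TOKEN_NULL;  // named sentinel instead of a bare -1
}
```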
Georgi Gerganov
5047dd3546
llama : use _impl suffix instead of _internal (#11060)
...
ggml-ci
2025-01-06 10:52:01 +02:00
Johannes Gäßler
46e3556e01
CUDA: add BF16 support (#11093)
...
* CUDA: add BF16 support
2025-01-06 02:33:52 +01:00
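Some background for #11093: BF16 keeps the sign, the 8-bit exponent and the top 7 mantissa bits of a float32. Conversion is therefore a 16-bit shift plus rounding. The sketch below shows a common round-to-nearest-even conversion on the host. It is not the CUDA code from the commit, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// BF16 = upper 16 bits of an IEEE-754 float32 (sign, 8-bit exponent, 7-bit mantissa).
static uint16_t fp32_to_bf16(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));          // well-defined type punning
    if ((u & 0x7fffffffu) > 0x7f800000u) {   // NaN: keep sign, force a quiet NaN
        return static_cast<uint16_t>((u >> 16) | 0x0040u);
    }
    // Round to nearest, ties to even, based on the 16 bits being dropped.
    u += 0x7fffu + ((u >> 16) & 1u);
    return static_cast<uint16_t>(u >> 16);
}

static float bf16_to_fp32(uint16_t h) {
    uint32_t u = static_cast<uint32_t>(h) << 16;  // dropped mantissa bits become zero
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}
```

For example, `1.0f` (bits `0x3F800000`) converts to `0x3F80`. The value exactly halfway between two BF16 neighbours rounds toward the one with an even last bit.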
Concedo
7b25b6171c
updated lite
2025-01-05 23:10:31 +08:00
Concedo
9b32482089
fixed bug in aesthetic ui
2025-01-05 18:04:02 +08:00
0cc4m
b56f079e28
Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver (#11074)
...
* Vulkan: Add device-specific blacklist for coopmat for the AMD proprietary driver
* Add (TM) to AMD name check
2025-01-04 21:09:59 +01:00
fairydreaming
9394bbd484
llama : Add support for DeepSeek V3 (#11049)
...
* convert : extend DEEPSEEK2 model architecture to support DeepseekV3ForCausalLM by adding EXPERT_WEIGHTS_NORM and EXPERT_GATING_FUNC model parameters and FFN_EXP_PROBS_B tensor type
* vocab : add DeepSeek V3 pre-tokenizer regexes
* unicode : handle ACCENT_MARK and SYMBOL categories in regex
* llama : add DeepSeek V3 chat template, handle new model parameters and tensor types
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-01-04 21:06:11 +01:00
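The DeepSeek V3 commit adds an expert gating function, an expert-weight normalization flag and a per-expert bias tensor (`FFN_EXP_PROBS_B`). Below is a minimal sketch of the routing those parameters suggest. It assumes three things:

* gating uses a sigmoid;
* the bias influences which experts are picked but not their weights;
* the selected weights are renormalized to sum to 1.

Group-limited routing is omitted, and `route_experts` is a hypothetical name. llama.cpp itself expresses this as ggml graph operations, not as a scalar loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <utility>
#include <vector>

// Returns (expert index, weight) pairs for the k selected experts, best first.
static std::vector<std::pair<int, float>> route_experts(
        const std::vector<float> & logits,  // router output, one value per expert
        const std::vector<float> & bias,    // per-expert selection bias (assumed role of exp_probs_b)
        int k, bool norm_weights) {
    const int n = static_cast<int>(logits.size());
    std::vector<float> probs(n), scores(n);
    for (int i = 0; i < n; ++i) {
        probs[i]  = 1.0f / (1.0f + std::exp(-logits[i]));  // sigmoid gating
        scores[i] = probs[i] + bias[i];                    // bias only affects selection
    }
    std::vector<int> idx(n);
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](int a, int b) { return scores[a] > scores[b]; });

    std::vector<std::pair<int, float>> out;
    float sum = 0.0f;
    for (int j = 0; j < k; ++j) {
        out.emplace_back(idx[j], probs[idx[j]]);  // weight uses the unbiased probability
        sum += probs[idx[j]];
    }
    if (norm_weights) {
        for (auto & p : out) p.second /= sum;     // selected weights now sum to 1
    }
    return out;
}
```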
matt23654
f922a9c542
[GGML][RPC] Support for models with non-512-aligned tensors over RPC. (#11047)
...
* Added init tensor calling code
* Added get_alloc_size forwarding
* Cleaned up and improved type/error handling.
* fix: remove trailing whitespaces.
* Cleanup and use GGML error logging functions.
* Handle potentially dangerous edge cases.
* Apply suggestions from code review
Co-authored-by: Diego Devesa <slarengh@gmail.com>
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-04 17:10:30 +01:00
DAN™
46be942214
llama : add support for the cohere2 model architecture (#10900)
2025-01-04 16:33:31 +02:00
Georgi Gerganov
78c6785175
sync : ggml
2025-01-04 16:09:53 +02:00
Georgi Gerganov
5e3b08d606
ggml : do not install metal source when embed library (ggml/1054)
2025-01-04 16:09:53 +02:00
Daniel Bevenius
db68c93b57
ggml : improve inputs log sched_print_assignments (ggml/1053)
...
This commit attempts to improve the log message for the inputs of the
splits in the sched_print_assignments function.
The motivation for this change is that currently even if there are no
inputs a colon is displayed at the end of the line, which can make it a
little confusing when reading the output as it could be interpreted as
the lines below are inputs when they are in fact nodes. With this change
the colon will only be printed if there actually are inputs.
2025-01-04 16:09:53 +02:00
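The rule described in ggml/1053 prints the trailing colon only when inputs follow it. The sketch below illustrates that rule. The line format and `format_split` are assumptions made for the example and do not reproduce the exact `sched_print_assignments` output.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical formatter for one split header in a scheduler log.
static std::string format_split(int split_id, const std::string & backend,
                                const std::vector<std::string> & inputs) {
    char head[64];
    std::snprintf(head, sizeof(head), "## SPLIT #%d %s # %zu inputs",
                  split_id, backend.c_str(), inputs.size());
    std::string line = head;
    if (!inputs.empty()) {
        line += ": ";  // colon only when something actually follows it
        for (size_t i = 0; i < inputs.size(); ++i) {
            if (i > 0) line += " ";
            line += "[" + inputs[i] + "]";
        }
    }
    return line;
}
```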
Concedo
dca7ab5d9e
fixed tools compile error
2025-01-04 18:45:21 +08:00
Concedo
1559d4d2fb
fixed defective websearch
2025-01-04 16:47:38 +08:00
Gilad S.
c31fc8b966
fix: Vulkan shader gen binary path (#11037)
2025-01-04 09:17:31 +01:00
Concedo
b37354bf73
upgrade to upload-artifact v4
2025-01-04 13:32:49 +08:00
Concedo
e07e73aeb4
updated lite
2025-01-04 10:47:48 +08:00
Concedo
b4dc29f425
kobo cheats death again (+1 squashed commits)
...
Squashed commits:
[708e2429] kobo cheats death again
2025-01-04 01:06:41 +08:00
Concedo
f9f1585a7f
broken merge - kcpp changes will be applied above this commit for better tracking.
2025-01-03 23:49:17 +08:00
Molly Sophia
4b0c638b9a
common : disable KV cache shifting automatically for unsupported models (#11053)
...
* Disable KV cache shifting automatically for unsupported models
instead of exiting directly
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-03 14:13:18 +02:00
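Commit #11053 replaces a hard exit with a fallback. Below is a sketch of that control flow. `context_params_sketch` is a hypothetical parameter struct. The `model_can_shift` flag stands in for whatever capability check the real code performs.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical parameter struct; only the relevant flag is modelled.
struct context_params_sketch {
    bool ctx_shift = true;  // user requested context shifting
};

// Previously the program exited here; the sketched behaviour warns and disables the feature.
static void apply_kv_shift_support(context_params_sketch & params, bool model_can_shift) {
    if (params.ctx_shift && !model_can_shift) {
        std::fprintf(stderr, "warning: KV cache shifting is not supported for this model, disabling\n");
        params.ctx_shift = false;
    }
}
```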
Concedo
1012281320
updated colab
2025-01-03 18:02:02 +08:00
Georgi Gerganov
e7da954ecc
metal : avoid uint (#11019)
2025-01-03 11:26:14 +02:00
Georgi Gerganov
f66f582927
llama : refactor src/llama.cpp (#10902)
...
* llama : scatter llama.cpp into multiple modules (wip)
* llama : control-vector -> adapter
* llama : arch
* llama : mmap
ggml-ci
* ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
* llama : arch (cont)
ggml-ci
* llama : chat
ggml-ci
* llama : model
ggml-ci
* llama : hparams
ggml-ci
* llama : adapter
ggml-ci
* examples : fix
ggml-ci
* rebase
ggml-ci
* minor
* llama : kv cache
ggml-ci
* llama : impl
ggml-ci
* llama : batch
ggml-ci
* cont
ggml-ci
* llama : context
ggml-ci
* minor
* llama : context (cont)
ggml-ci
* llama : model loader
ggml-ci
* common : update lora
ggml-ci
* llama : quant
ggml-ci
* llama : quant (cont)
ggml-ci
* minor [no ci]
2025-01-03 10:18:53 +02:00
Concedo
911da8765f
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# examples/llama.android/llama/src/main/cpp/llama-android.cpp
# examples/run/run.cpp
# examples/server/README.md
# examples/server/bench/README.md
# examples/server/tests/README.md
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# tests/test-backend-ops.cpp
2025-01-03 11:56:20 +08:00
Concedo
22fd7a0439
fix make tools for linux
2025-01-03 11:39:23 +08:00
Pierrick Hymbert
2f0ee84b9b
server: bench: minor fixes (#10765)
...
* server/bench:
- support OpenAI streaming standard output with [DONE]\n\n
- export k6 raw results in csv
- fix too many tcp idle connections in tcp_wait
- add metric time to emit first token
* server/bench:
- fix when prometheus not started
- wait for server to be ready before starting bench
2025-01-02 18:06:12 +01:00
Xuan Son Nguyen
0da5d86026
server : allow using LoRA adapters per-request (#10994)
...
* slot.can_batch_with
* lora per request
* test: force disable cache prompt
* move can_batch_with check
* fix condition
* add slow test with llama 8b
* update docs
* move lora change task to queue
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* lora_base
* remove redundant check
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-01-02 15:05:18 +01:00
Benson Wong
a45433ba20
readme : add llama-swap to infrastructure section (#11032)
...
* list llama-swap under tools in README
* readme: add llama-swap to Infrastructure
2025-01-02 09:14:54 +02:00
Srihari-mcw
0827b2c1da
ggml : fixes for AVXVNNI instruction set with MSVC and Clang (#11027)
...
* Fixes for clang AVX VNNI
* enable AVX VNNI and alder lake build for MSVC
* Apply suggestions from code review
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-12-31 15:23:33 +01:00