Concedo
1b663e10c8
first functional multiplayer
2024-11-19 22:49:28 +08:00
Diego Devesa
3ee6382d48
cuda : fix CUDA_FLAGS not being applied ( #10403 )
2024-11-19 14:29:38 +01:00
Georgi Gerganov
8e752a777b
llama : add check for KV cache shifts ( #10401 )
...
ggml-ci
2024-11-19 13:29:26 +02:00
Concedo
8db8154a25
Merge branch 'concedo_experimental' of https://github.com/LostRuins/koboldcpp into concedo_experimental
2024-11-19 18:09:29 +08:00
Concedo
14cbd07eaa
more wip multiplayer
2024-11-19 18:09:26 +08:00
Shane A
a88ad007de
llama : add OLMo November 2024 support ( #10394 )
...
* Add OLMo November 2024 constants
* Add OLMo November 2024 converter
* Add loading of OLMo November 2024 tensors and hyperparameters
* Add building of OLMo November 2024 model
2024-11-19 11:04:08 +02:00
Romain Biessy
2a1507c162
sycl : Add option to set the SYCL architecture for all targets ( #10266 )
...
* Add option to set the SYCL architecture for all targets
* Convert GGML_SYCL_HIP_TARGET to the more generic GGML_SYCL_ARCH option
* Document that setting GGML_SYCL_ARCH can improve the performance
2024-11-19 08:02:23 +00:00
Jeff Bolz
b3e585988f
vulkan: Optimize soft_max ( #10301 )
...
* vulkan: Optimize soft_max
Large soft_max could already saturate memory, but small/medium sizes were
pretty slow. The bulk of the gains for them comes from using a smaller
workgroup size, and making the workgroup size match the subgroup size also
makes the barriers much cheaper.
Cache some values in locals to avoid refetching/recomputing. And stamp
out a few "template instantiations" so smaller cases will fully unroll.
Add a missing early return for OOB rows. This happens when there are more
than 512 rows and the dispatch is 512 x H.
* vulkan: Further soft_max optimizations
Restore the workgroup size of 512 case, and use it for sizes >1024.
Use unrollable loops for more iteration counts.
2024-11-19 08:25:17 +01:00
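The soft_max commit above bundles several ideas: a small workgroup matched to the subgroup size so barriers stay cheap, values cached in registers, fixed-size "template instantiations" so the inner loops fully unroll, and an early return for out-of-bounds rows when the dispatch is 512 x H. Below is a minimal CUDA analogue of that structure, NOT the actual Vulkan GLSL shader; softmax_row and WG_SIZE are hypothetical names, and it assumes one workgroup of at most 32 threads per row.

```cpp
// Minimal CUDA sketch of the shader structure described in the commit above;
// not the real Vulkan GLSL. softmax_row and WG_SIZE are hypothetical names.
#include <cuda_runtime.h>
#include <math_constants.h>

template <int WG_SIZE>  // fixed size: the reduction loops fully unroll
__global__ void softmax_row(const float * x, float * dst, int ncols, int nrows) {
    static_assert(WG_SIZE > 0 && WG_SIZE <= 32 && (WG_SIZE & (WG_SIZE - 1)) == 0,
                  "one subgroup (warp) per row");

    // A W x H dispatch can launch more workgroups than there are rows;
    // this is the out-of-bounds early return the commit mentions.
    const int row = blockIdx.x + blockIdx.y * gridDim.x;
    if (row >= nrows) {
        return;
    }

    const unsigned mask = 0xffffffffu >> (32 - WG_SIZE);
    const float * src = x   + (size_t) row * ncols;
    float       * out = dst + (size_t) row * ncols;

    // cache running values in registers instead of refetching/recomputing
    float maxval = -CUDART_INF_F;
    for (int c = threadIdx.x; c < ncols; c += WG_SIZE) {
        maxval = fmaxf(maxval, src[c]);
    }
    // workgroup size == subgroup size: reductions are shuffles, not barriers
    for (int off = WG_SIZE / 2; off > 0; off >>= 1) {
        maxval = fmaxf(maxval, __shfl_xor_sync(mask, maxval, off));
    }

    float sum = 0.0f;
    for (int c = threadIdx.x; c < ncols; c += WG_SIZE) {
        const float e = expf(src[c] - maxval);
        out[c] = e;
        sum += e;
    }
    for (int off = WG_SIZE / 2; off > 0; off >>= 1) {
        sum += __shfl_xor_sync(mask, sum, off);
    }

    const float inv = 1.0f / sum;
    for (int c = threadIdx.x; c < ncols; c += WG_SIZE) {
        out[c] *= inv;
    }
}
```

Matching WG_SIZE to the subgroup (warp) size is what makes the shuffle-based reductions cheap; a larger workgroup would need shared memory and explicit barriers instead.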
pandora
a548108dd2
Create Mistral-V7.json ( #1224 )
2024-11-19 10:45:50 +08:00
Alberto Cabrera Pérez
557924f222
sycl: Revert MUL_MAT_OP support changes ( #10385 )
2024-11-19 08:50:04 +08:00
Diego Devesa
d3481e6316
cuda : only use native when supported by cmake ( #10389 )
2024-11-18 18:43:40 +01:00
Concedo
ee586b9a9d
fixed vulkan
2024-11-19 01:26:31 +08:00
Concedo
d5feaa8a3d
fixed old mixtral models, but at what cost? was it worth it?
2024-11-19 01:01:25 +08:00
bandoti
531cb1c233
Skip searching root path for cross-compile builds ( #10383 )
2024-11-18 16:23:58 +01:00
Jeff Bolz
f139d2ea61
vulkan: remove use of null initializer ( #10372 )
...
Seems like this isn't working for vulkan-over-metal when the array is sized
by a spec constant. Maybe a spirv-cross limitation?
2024-11-18 08:28:42 -06:00
Georgi Gerganov
2eb76b2a5e
flake.lock: Update ( #10346 )
...
Flake lock file updates:
• Updated input 'nixpkgs':
'github:NixOS/nixpkgs/4aa36568d413aca0ea84a1684d2d46f55dbabad7?narHash=sha256-Zwl8YgTVJTEum+L+0zVAWvXAGbWAuXHax3KzuejaDyo=' (2024-11-05)
→ 'github:NixOS/nixpkgs/5e4fbfb6b3de1aa2872b76d49fafc942626e2add?narHash=sha256-OZiZ3m8SCMfh3B6bfGC/Bm4x3qc1m2SVEAlkV6iY7Yg=' (2024-11-15)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-11-18 06:08:20 -08:00
0cc4m
9b75f03cd2
Vulkan: Fix device info output format specifiers ( #10366 )
...
* Vulkan: Fix device info output format specifiers
* Vulkan: Use the %zu printf specifier for size_t instead of %ld
2024-11-18 11:02:43 +01:00
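For context on the fix above: printing a size_t with %ld only happens to work where long and size_t have the same width, and breaks on LLP64 platforms such as 64-bit Windows, where long is 32-bit but size_t is 64-bit. %zu (C99/C++11) is the dedicated size_t specifier. A minimal illustration:

```cpp
#include <cstddef>
#include <cstdio>

int main() {
    size_t n = sizeof(long double);
    // printf("n = %ld\n", n);  // fragile: only correct where long happens to
    //                          // match size_t (it does not on 64-bit Windows)
    printf("n = %zu\n", n);     // portable: %zu is the size_t specifier
    return 0;
}
```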
GPTLocalhost (Word Add-in)
aacb6c3a70
Add GPTLocalhost as third-party resource ( #1221 )
2024-11-18 10:17:06 +08:00
Johannes Gäßler
75207b3a88
docker: use GGML_NATIVE=OFF ( #10368 )
2024-11-18 00:21:53 +01:00
Johannes Gäßler
76e9e58b78
CUDA: fix MMV kernel being used for FP16 src1 ( #10357 )
2024-11-17 23:20:42 +01:00
Concedo
39124828ab
wip multiplayer
2024-11-17 23:29:25 +08:00
Johannes Gäßler
ce2e59ba10
CMake: fix typo in comment [no ci] ( #10360 )
2024-11-17 12:59:38 +01:00
Diego Devesa
be5caccef9
llama : only use default buffer types for the KV cache ( #10358 )
2024-11-17 12:25:45 +01:00
Georgi Gerganov
20a780c7b6
gitignore : ignore local run scripts [no ci]
2024-11-17 13:12:22 +02:00
Georgi Gerganov
cf32a9b93a
metal : refactor kernel args into structs ( #10238 )
...
* metal : add kernel arg structs (wip)
* metal : fattn args
ggml-ci
* metal : cont + avoid potential int overflow [no ci]
* metal : mul mat struct (wip)
* cont : mul mat vec
* cont : pass by reference
* cont : args is first argument
* cont : use char ptr
* cont : shmem style
* cont : thread counters style
* cont : mul mm id
ggml-ci
* cont : int safety + register optimizations
ggml-ci
* metal : GGML_OP_CONCAT
ggml-ci
* metal : GGML_OP_ADD, GGML_OP_SUB, GGML_OP_MUL, GGML_OP_DIV
* metal : GGML_OP_REPEAT
* metal : GGML_OP_CPY
* metal : GGML_OP_RMS_NORM
* metal : GGML_OP_NORM
* metal : add TODOs for rest of ops
* ggml : add ggml-metal-impl.h
ggml-ci
2024-11-17 11:23:01 +02:00
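The refactor above replaces long positional argument lists in the Metal kernels with argument structs. A hypothetical sketch of the same pattern in CUDA-style C++ (mul_mat_args and kernel_mul_mat_vec are illustrative names, not ggml's actual identifiers):

```cpp
#include <cstdint>

// All sizes and byte strides for the op travel in one POD struct, so the
// host-side setup and the kernel signature cannot silently drift apart, and
// adding a field does not reshuffle a dozen positional arguments.
struct mul_mat_args {
    int64_t  ne00, ne01;  // src0 dimensions
    int64_t  ne10, ne11;  // src1 dimensions
    uint64_t nb01, nb11;  // row strides in bytes
};

// before: kernel_mul_mat_vec(a, b, d, ne00, ne01, ne10, ne11, nb01, nb11, ...)
__global__ void kernel_mul_mat_vec(const mul_mat_args args,
                                   const float * a, const float * b, float * d) {
    // naive mat-vec: one thread per output row
    const int64_t row = blockIdx.x * (int64_t) blockDim.x + threadIdx.x;
    if (row >= args.ne01) {
        return;
    }
    // byte strides come straight from the struct, as in ggml's tensor layout
    const float * arow = (const float *) ((const char *) a + row * args.nb01);
    float sum = 0.0f;
    for (int64_t c = 0; c < args.ne00; ++c) {
        sum += arow[c] * b[c];
    }
    d[row] = sum;
}
```

Keeping the fields 64-bit also speaks to the "int safety" item in the commit: indices derive from int64_t values rather than 32-bit locals that could overflow on large tensors.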
FirstTimeEZ
a43178299c
ggml : fix undefined reference to 'getcpu' ( #10354 )
...
https://github.com/ggerganov/llama.cpp/issues/10352
2024-11-17 10:39:22 +02:00
Johannes Gäßler
c3ea58aca4
CUDA: remove DMMV, consolidate F16 mult mat vec ( #10318 )
2024-11-17 09:09:55 +01:00
Johannes Gäßler
467576b6cc
CMake: default to -arch=native for CUDA build ( #10320 )
2024-11-17 09:06:34 +01:00
Diego Devesa
eda7e1d4f5
ggml : fix possible buffer use after free in sched reserve ( #9930 )
2024-11-17 08:31:17 +02:00
Georgi Gerganov
24203e9dd7
ggml : inttypes.h -> cinttypes ( #0 )
...
ggml-ci
2024-11-17 08:30:29 +02:00
Georgi Gerganov
5d9e59979c
ggml : adapt AMX to tensor->grad removal ( #0 )
...
ggml-ci
2024-11-17 08:30:29 +02:00
Georgi Gerganov
a4200cafad
make : add ggml-opt ( #0 )
...
ggml-ci
2024-11-17 08:30:29 +02:00
Georgi Gerganov
84274a10c3
tests : remove test-grad0
2024-11-17 08:30:29 +02:00
Georgi Gerganov
68fcb4759c
ggml : fix compile warnings ( #0 )
...
ggml-ci
2024-11-17 08:30:29 +02:00
Johannes Gäßler
8a43e940ab
ggml: new optimization interface (ggml/988)
2024-11-17 08:30:29 +02:00
Georgi Gerganov
5c9a8b22b1
scripts : update sync
2024-11-17 08:30:29 +02:00
Concedo
e7897f3257
update docs
2024-11-17 11:43:49 +08:00
FirstTimeEZ
0fff7fd798
docs : vulkan build instructions to use git bash mingw64 ( #10303 )
2024-11-17 00:29:18 +01:00
Johannes Gäßler
4e54be0ec6
llama/ex: remove --logdir argument ( #10339 )
2024-11-16 23:00:41 +01:00
Concedo
d6932bbff8
test fix linux build
2024-11-17 02:43:42 +08:00
Concedo
e1f0b0bedd
try fix macos build (+1 squashed commit)
...
Squashed commits:
[ae66dddfd] try fix macos build
2024-11-17 02:37:08 +08:00
Georgi Gerganov
db4cfd5dbc
llamafile : fix include path ( #0 )
...
ggml-ci
2024-11-16 20:36:26 +02:00
Georgi Gerganov
8ee0d09ae6
make : auto-determine dependencies ( #0 )
2024-11-16 20:36:26 +02:00
Concedo
f6e9d11636
try with 2 parallel jobs
2024-11-17 01:46:41 +08:00
Concedo
952328fdc8
try fix cuda build
2024-11-17 01:41:52 +08:00
Concedo
9acfe96c77
fix cuda build
2024-11-16 21:58:22 +08:00
MaggotHATE
bcdb7a2386
server: (web UI) Add samplers sequence customization ( #10255 )
...
* Samplers sequence: simplified, and added an input field.
* Removed unused function
* Modify and use `settings-modal-short-input`
* rename "name" --> "label"
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-11-16 14:26:54 +01:00
Concedo
a8694698fd
accept gguf text encoders for sd
2024-11-16 17:23:02 +08:00
Concedo
590553ef07
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/llama-cli-intel.Dockerfile
# .devops/llama-server-intel.Dockerfile
# .github/workflows/build.yml
# CMakePresets.json
# Makefile
# docs/backend/SYCL.md
# docs/build.md
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# scripts/compare-llama-bench.py
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
2024-11-16 17:20:14 +08:00
Concedo
70aee82552
attempts a backflip, but does he stick the landing?
2024-11-16 17:05:45 +08:00