wwoodsTM
5107e8cea3
DRY: Fixes clone functionality ( #10192 )
2024-11-07 16:20:25 +01:00
snadampal
2319126a70
fix q4_0_8_8 format for corrupted tokens issue ( #10198 )
...
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-62-167.us-west-2.compute.internal>
2024-11-07 09:02:08 +01:00
Concedo
262437f393
fallback flux loader
2024-11-07 15:55:43 +08:00
Zhiyuan Li
3bcd40b3c5
Optimize RWKV6 Operator Naming and Implement Multi-core CPU/SYCL Acceleration ( #10133 )
...
* rwkv6: rename to wkv6
* rwkv6: support avx2 avx512 armv8 armv9
* rwkv6: update cuda file name
* rwkv6: rename params
* wkv on sycl
* sycl: add some ops
* sycl: Enhance OP support judgment
* wkv6: drop armv9 and transfer to GGML style
ggml-ci
* sync : ggml
* update the function to use appropriate types
* fix define error
* Update ggml/src/ggml-cpu.c
* add appropriate asserts
* move element-wise functions outside
* put the declaration outside the loop
* rewrite to be more in line with the common pattern for distributing threads
* use recommended way GGML_TENSOR_LOCALS
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
Co-authored-by: Plamen Minev <pacominev@gmail.com>
Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com>
Co-authored-by: Meng, Hengyu <airdldl@163.com>
2024-11-07 15:19:10 +08:00
Concedo
c9977a5cb5
model downloading for new params
2024-11-07 14:41:25 +08:00
Georgi Gerganov
5c333e0140
metal : add BF16 support ( #8439 )
...
* ggml : add initial BF16 support
ggml-ci
* metal : add mul_mat_id BF16 support
ggml-ci
* metal : check for bfloat support on the Metal device
ggml-ci
* metal : better var names [no ci]
* metal : do not build bfloat kernels when not supported
ggml-ci
* metal : try to fix BF16 support check
ggml-ci
* metal : this should correctly check bfloat support
2024-11-06 19:53:51 +02:00
Concedo
628dcd640e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# examples/server/README.md
2024-11-06 23:13:00 +08:00
kallewoof
3c36bbdcd7
debug: display tokens that were dropped by XTC sampler when debugmode is enabled ( #1201 )
2024-11-06 23:09:28 +08:00
Concedo
859ec03cd0
updated lite
2024-11-06 22:37:35 +08:00
Georgi Gerganov
b11f9ba9b8
server : remove hack for extra parallel slot ( #10187 )
...
ggml-ci
2024-11-06 13:29:01 +02:00
Diego Devesa
94d8cb8be1
metal : fix from ptr buffer name ( #10189 )
2024-11-06 12:10:07 +01:00
Concedo
ccbd630a42
allow custom t5, clipl and clipg
2024-11-06 19:05:48 +08:00
Georgi Gerganov
1dc04b2dee
ggml : adjust is_first_call init value ( #10193 )
...
ggml-ci
2024-11-06 11:20:10 +02:00
Georgi Gerganov
a1eaf6a960
metal : add quantized FA support ( #10149 )
...
* metal : add quantized FA (vec) support
ggml-ci
* metal : add quantized FA (non-vec) support
* metal : fix support check
ggml-ci
* metal : clean-up
* metal : clean-up (cont)
* metal : fix shared memory calc + reduce smem + comments
* metal : float-correctness
* metal : minor [no ci]
2024-11-06 10:24:23 +02:00
Concedo
3cfc4dc581
avoid euler a for flux (+4 squashed commit)
...
Squashed commit:
[5a4b72385] fix cuda build
[5f969a645] add vulkan information
[6849e7398] fixed flux
[740e80419] update readme
2024-11-05 22:50:14 +08:00
Gabe Goodhart
b8deef0ec0
llama : add <|tool_call|> formatting to Granite template ( #10177 )
...
Branch: GraniteToolCallTemplate
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2024-11-05 14:23:04 +02:00
Diego Devesa
a9e8a9a030
ggml : fix arch check in bf16_to_fp32 ( #10164 )
2024-11-04 23:17:01 +01:00
Eve
3407364776
Q6_K AVX improvements ( #10118 )
...
* q6_k instruction reordering attempt
* better subtract method
* should be theoretically faster
small improvement with shuffle lut, likely because all loads are already done at that stage
* optimize bit fiddling
* handle -32 offset separately. bsums exists for a reason!
* use shift
* Update ggml-quants.c
* have to update ci macos version to 13 as 12 doesn't work now. 13 is still x86
2024-11-04 23:06:31 +01:00
Diego Devesa
d5a409e57f
ggml : fix gelu tables initialization ( #10172 )
2024-11-04 20:06:58 +01:00
Diego Devesa
401558b7ba
ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment ( #10167 )
2024-11-04 17:34:08 +01:00
Xuan Son Nguyen
9e0ecfb697
server : clarify /slots endpoint, add is_processing ( #10162 )
...
* server : clarify /slots endpoint, add is_processing
* fix tests
2024-11-04 16:33:29 +01:00
Concedo
5b90eeaf17
fixed sd to work on larger images by adding tiling, also limit res for sd1.5
2024-11-04 23:26:15 +08:00
snadampal
6a066b9978
fix build break on arm64 linux ( #10166 )
...
This fixes the build break from the recent changes
to move the CPU backend to separate files
https://github.com/ggerganov/llama.cpp/pull/10144
2024-11-04 16:08:33 +01:00
Concedo
f153a14daf
add common identity provider /.well-known/serviceinfo, updated docs
2024-11-04 21:29:26 +08:00
Concedo
847689e74c
fixed incorrect makefile flags
2024-11-04 20:39:10 +08:00
Diego Devesa
ea02c753eb
cuda : clear error after changing peer access ( #10153 )
2024-11-04 13:10:23 +01:00
Georgi Gerganov
05697f670b
metal : simplify f16 and f32 dequant kernels ( #0 )
2024-11-04 13:49:34 +02:00
Georgi Gerganov
f8e58135cf
metal : move dequantize templates to beginning of MSL source ( #0 )
2024-11-04 13:44:06 +02:00
leo-pony
329ed914c9
CANN: adjust backend registry refactor. ( #10158 )
...
remove buffer->iface.get_name that was used in CANN, as it was removed in the backend registry refactor PR.
2024-11-04 19:08:22 +08:00
Concedo
75d2f90148
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/CMakeLists.txt
# scripts/sync-ggml.last
2024-11-04 16:58:09 +08:00
Concedo
bb13925f39
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CMakePresets.json
# Makefile
# Package.swift
# ci/run.sh
# common/CMakeLists.txt
# examples/CMakeLists.txt
# flake.lock
# ggml/src/CMakeLists.txt
# ggml/src/ggml-backend.cpp
# ggml/src/ggml.c
# pocs/vdot/q8dot.cpp
# pocs/vdot/vdot.cpp
# tests/test-backend-ops.cpp
# tests/test-grad0.cpp
# tests/test-quantize-fns.cpp
# tests/test-quantize-perf.cpp
# tests/test-rope.cpp
2024-11-04 16:54:53 +08:00
Georgi Gerganov
ce027adfb3
sync : ggml
2024-11-04 10:33:37 +02:00
Yuri Khrustalev
284e5b0275
cmake : make it possible to link ggml as an external lib (ggml/1003)
2024-11-04 10:33:11 +02:00
Plamen Minev
e2292aaa17
metal : fix minor string leaks (ggml/1004)
2024-11-04 10:33:10 +02:00
Concedo
c7e351bf41
add exception for ibm granite, then keep using f16 kq mul for HIPBLAS only for now pending ROCM investigation re https://github.com/ggerganov/llama.cpp/pull/10015
2024-11-04 15:47:13 +08:00
Diego Devesa
9f40989351
ggml : move CPU backend to a separate file ( #10144 )
2024-11-03 19:34:08 +01:00
Concedo
5233e8ed1d
sd 3.5 medium
2024-11-03 23:27:06 +08:00
Concedo
f32a874966
resync and updated sdcpp for flux and sd3 support
2024-11-03 22:03:16 +08:00
Georgi Gerganov
08828a6d7d
metal : minor fixup in FA kernel ( #10143 )
...
* metal : minor fixup in FA kernel
ggml-ci
* metal : use the unrolled loop variable
* metal : remove unused var
2024-11-03 15:18:40 +02:00
Georgi Gerganov
1839f69130
flake.lock: Update ( #10146 )
2024-11-03 05:14:15 -08:00
Concedo
33721615b5
fixed build issues
2024-11-03 11:01:51 +08:00
Christian Köhnenkamp
9830b6923b
Add apple arm to presets ( #10134 )
...
* Add apple arm to presets
* Add final new line
2024-11-02 15:35:31 -07:00
sasha0552
42cadc74bd
server : fix slot selection by lru ( #10126 )
...
* server : fix slot selection by lru, migrate lcs to `size_t`
* minor debug log fix
2024-11-02 18:34:56 +02:00
Georgi Gerganov
45950415ed
server : fix endpoint checks ( #10135 )
...
ggml-ci
2024-11-02 18:34:00 +02:00
Concedo
bc30ebd044
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# Makefile
# README.md
# examples/CMakeLists.txt
# examples/main/README.md
# ggml/src/CMakeLists.txt
# ggml/src/kompute-shaders/common.comp
# scripts/sync-ggml.last
# src/llama.cpp
2024-11-02 21:57:29 +08:00
Concedo
223c5f0844
clblast survived
2024-11-02 21:51:38 +08:00
Georgi Gerganov
1926d6e39d
llama : adjust default context size + print warnings ( #10136 )
...
* llama : adjust default context size + print warnings
ggml-ci
* ggml-ci : add missing gpu-layers + adjust context sizes
2024-11-02 15:18:56 +02:00
Diego Devesa
b634f8a26f
simple-chat : only add bos on first prompt ( #10129 )
2024-11-02 13:08:53 +01:00
Xuan Son Nguyen
7554aa4655
convert-lora : make --base optional ( #10110 )
...
* convert-lora : make `--base` optional
* lint
* handle case where base_model_name_or_path is invalid
* do not include metadata from base model
* clarify unspecified --base
* add small comment [no ci]
* trigger ci
2024-11-02 12:53:17 +01:00
Concedo
3072db6895
remove annoying eog prints
2024-11-02 12:44:33 +08:00