Concedo
ccbd630a42
allow custom t5, clipl and clipg
2024-11-06 19:05:48 +08:00
Concedo
3cfc4dc581
avoid euler a for flux (+4 squashed commit)
...
Squashed commit:
[5a4b72385] fix cuda build
[5f969a645] add vulkan information
[6849e7398] fixed flux
[740e80419] update readme
2024-11-05 22:50:14 +08:00
Concedo
5b90eeaf17
fixed sd to work on larger images by adding tiling, also limit res for sd1.5
2024-11-04 23:26:15 +08:00
Concedo
f153a14daf
add common identity provider /.well-known/serviceinfo, updated docs
2024-11-04 21:29:26 +08:00
Concedo
847689e74c
fixed incorrect makefile flags
2024-11-04 20:39:10 +08:00
Concedo
75d2f90148
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/CMakeLists.txt
# scripts/sync-ggml.last
2024-11-04 16:58:09 +08:00
Concedo
bb13925f39
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CMakePresets.json
# Makefile
# Package.swift
# ci/run.sh
# common/CMakeLists.txt
# examples/CMakeLists.txt
# flake.lock
# ggml/src/CMakeLists.txt
# ggml/src/ggml-backend.cpp
# ggml/src/ggml.c
# pocs/vdot/q8dot.cpp
# pocs/vdot/vdot.cpp
# tests/test-backend-ops.cpp
# tests/test-grad0.cpp
# tests/test-quantize-fns.cpp
# tests/test-quantize-perf.cpp
# tests/test-rope.cpp
2024-11-04 16:54:53 +08:00
Georgi Gerganov
ce027adfb3
sync : ggml
2024-11-04 10:33:37 +02:00
Yuri Khrustalev
284e5b0275
cmake : make it possible linking ggml as external lib (ggml/1003)
2024-11-04 10:33:11 +02:00
Plamen Minev
e2292aaa17
metal : fix minor string leaks (ggml/1004)
2024-11-04 10:33:10 +02:00
Concedo
c7e351bf41
add exception for ibm granite, then keep using f16 kq mul for HIPBLAS only for now pending ROCM investigation re https://github.com/ggerganov/llama.cpp/pull/10015
2024-11-04 15:47:13 +08:00
Diego Devesa
9f40989351
ggml : move CPU backend to a separate file (#10144)
2024-11-03 19:34:08 +01:00
Concedo
5233e8ed1d
sd 3.5 medium
2024-11-03 23:27:06 +08:00
Concedo
f32a874966
resync and updated sdcpp for flux and sd3 support
2024-11-03 22:03:16 +08:00
Georgi Gerganov
08828a6d7d
metal : minor fixup in FA kernel (#10143)
...
* metal : minor fixup in FA kernel
ggml-ci
* metal : use the unrolled loop variable
* metal : remove unused var
2024-11-03 15:18:40 +02:00
Georgi Gerganov
1839f69130
flake.lock: Update (#10146)
2024-11-03 05:14:15 -08:00
Concedo
33721615b5
fixed build issues
2024-11-03 11:01:51 +08:00
Christian Köhnenkamp
9830b6923b
Add apple arm to presets (#10134)
...
* Add apple arm to presets
* Add final new line
2024-11-02 15:35:31 -07:00
sasha0552
42cadc74bd
server : fix slot selection by lru (#10126)
...
* server : fix slot selection by lru, migrate lcs to `size_t`
* minor debug log fix
2024-11-02 18:34:56 +02:00
Georgi Gerganov
45950415ed
server : fix endpoint checks (#10135)
...
ggml-ci
2024-11-02 18:34:00 +02:00
Concedo
bc30ebd044
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# Makefile
# README.md
# examples/CMakeLists.txt
# examples/main/README.md
# ggml/src/CMakeLists.txt
# ggml/src/kompute-shaders/common.comp
# scripts/sync-ggml.last
# src/llama.cpp
2024-11-02 21:57:29 +08:00
Concedo
223c5f0844
clblast survived
2024-11-02 21:51:38 +08:00
Georgi Gerganov
1926d6e39d
llama : adjust default context size + print warnings (#10136)
...
* llama : adjust default context size + print warnings
ggml-ci
* ggml-ci : add missing gpu-layers + adjust context sizes
2024-11-02 15:18:56 +02:00
Diego Devesa
b634f8a26f
simple-chat : only add bos on first prompt (#10129)
2024-11-02 13:08:53 +01:00
Xuan Son Nguyen
7554aa4655
convert-lora : make --base optional (#10110)
...
* convert-lora : make `--base` optional
* lint
* handle case where base_model_name_or_path is invalid
* do not include metadata from base model
* clarify unspecified --base
* add small comment [no ci]
* trigger ci
2024-11-02 12:53:17 +01:00
Concedo
3072db6895
remove annoying eog prints
2024-11-02 12:44:33 +08:00
Concedo
6ac8b2bdb3
tweak ratios
2024-11-02 12:35:04 +08:00
Diego Devesa
a6744e43e8
llama : add simple-chat example (#10124)
...
* llama : add simple-chat example
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-11-01 23:50:59 +01:00
Diego Devesa
e991e3127f
llama : use smart pointers for ggml resources (#10117)
2024-11-01 23:48:26 +01:00
Shupei Fan
418f5eef26
vulkan : improve ggml_vk_create_buffer error handling (#9898)
2024-11-01 19:33:14 +01:00
Concedo
4ae06b4a64
print some env vars for win ci
2024-11-01 23:58:41 +08:00
Georgi Gerganov
ba6f62eb79
readme : update hot topics
2024-11-01 17:31:51 +02:00
Concedo
2a07f2dc2c
minor fix
2024-11-01 22:42:57 +08:00
sasha0552
d865d1478c
server : fix smart selection of available slot (#10120)
...
* Fix smart selection of available slot
* minor fix
* replace vectors of tokens with shorthands
2024-11-01 14:33:14 +01:00
Georgi Gerganov
1804adb0cf
ggml : remove ggml_scratch (#10121)
...
ggml-ci
2024-11-01 12:58:45 +02:00
Concedo
bbebc76817
fix top picks bug, lower input anti abuse thresholds (+1 squashed commits)
...
Squashed commits:
[a81d9b21] fix top picks bug, lower input anti abuse thresholds
2024-11-01 16:42:13 +08:00
Georgi Gerganov
815fe72adc
sync : ggml
2024-11-01 10:28:24 +02:00
Georgi Gerganov
f221d56220
ggml : alloc ggml_contexts on the heap (whisper/2525)
2024-11-01 10:24:50 +02:00
Concedo
6a27003a06
logprobs feature completed
2024-11-01 15:24:07 +08:00
Zhenwei Jin
e597e50794
build: fix build error in Windows env with OneAPI setup (#10107)
2024-11-01 11:09:59 +08:00
Diego Devesa
85679d37f3
llama : improve output buffer type selection (#10098)
2024-11-01 00:49:53 +01:00
Diego Devesa
1e9f94994e
quantize : fix --keep-split (#10114)
2024-11-01 00:45:34 +01:00
Diego Devesa
c02e5ab2a6
llama : fix buffer checks for mamba and rwk (#10111)
...
* llama : fix buffer checks for mamba and rwk
* llama : fix missing worst case flag during reserve
* cuda : fix supports_op for norm
* disable sched SET_CAUSE
2024-10-31 22:54:23 +01:00
Zhenwei Jin
ab3d71f97f
loader: refactor tensor weights storage (#9935)
...
* loader: refactor tensor weights storage
* use sorted map, sort weights by layer
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-10-31 19:50:39 +01:00
Concedo
f7406dfdb1
updated lite
2024-11-01 01:13:15 +08:00
Concedo
a46f8acd03
note: also has support for completion tokens count
2024-11-01 00:44:14 +08:00
Concedo
aa26a58085
added logprobs api and logprobs viewer
2024-11-01 00:22:15 +08:00
Kevin Gibbons
0a683e8088
server : include scheme when printing URL (#10106)
2024-10-31 14:02:35 +01:00
Diego Devesa
dea5e86051
ggml : check tensor name lengths in gguf files (#10100)
2024-10-31 11:40:59 +01:00
Sergio López
1329c0a75e
kompute: add mul_mat_q4_k shader (#10097)
...
This is a more or less direct translation from the Metal implementation to GLSL.
Signed-off-by: Sergio Lopez <slp@redhat.com>
2024-10-31 11:09:52 +02:00