Concedo
00a686fc72
fixed fast forwarding context corruption after abort during prompt processing
2024-12-10 22:37:40 +08:00
Concedo
a11bba5893
cleanup, fix native build for arm (+28 squashed commits)
Squashed commit:
[d1f6a4154] bundle library
[947ab84b7] undo
[0f9aba8d8] test
[e9ac93873] test
[920438202] test
[1c6d98804] Revert "quick test"
This reverts commit acf8ec8940.
[acf8ec894] quick test
[6a9937233] undo
[5a263a5bd] test
[ddfd82bca] test
[0b30e45da] test
[c3bfece55] messed up
[2a4b37fe0] Revert "test"
This reverts commit 80a1fcaeaf.
[80a1fcaea] test
[e2aa7d944] test
[264d80200] test
[f5b123173] undo
[1ffacc484] test
[63c0be926] undo
[510e0377e] ofast try fix
[4ac199b20] try fix sigill
[1bc987ba2] try fix illegal instruction
[7697252b1] edit
[f87087b28] check gcc ver
[e9dfe2cef] try using qemu to do the pyinstaller
[b411192db] revert
[25b5301e5] try using qemu to do the pyinstaller
[58038cddc] try using qemu to do the pyinstaller
2024-12-10 19:42:23 +08:00
Concedo
e9d2332dd8
improved tool calls and whisper
2024-12-06 14:34:31 +08:00
Concedo
836c06d91a
minor edit
2024-12-06 00:37:38 +08:00
Concedo
746cb01843
remove test since it wont work on x64
2024-12-06 00:26:58 +08:00
Concedo
65a11451e3
fix missing bundled files
2024-12-06 00:21:08 +08:00
Concedo
fe72c8db9f
CI for ARM should appear as ARM
2024-12-06 00:12:30 +08:00
Concedo
5cddd0a878
Merge branch 'concedo' into concedo_experimental
2024-12-05 23:58:31 +08:00
Concedo
ece96e19bf
clean up makefile
2024-12-05 23:58:23 +08:00
Concedo
8d5bb06aeb
test aarch64 ci workflow
2024-12-05 23:57:25 +08:00
Concedo
d0d1d922de
handle and fix temp paths to chat completions adapter
2024-12-05 17:22:35 +08:00
Concedo
5106816eac
drafted tokens debug prints
2024-12-05 17:05:20 +08:00
Concedo
2787fca6b4
refactored library selection, fixed ollama params
2024-12-05 16:47:52 +08:00
Concedo
52cc908f7f
default trim_stop to true, which trims any tokens after a stop sequence and the stop sequence itself. This is potentially a breaking change.
2024-12-03 22:44:10 +08:00
Concedo
7d11d2946c
only show warning if more than 1 moved tensor
2024-12-03 22:09:26 +08:00
Ikko Eltociear Ashimine
ed9e229372
docs: update README.md (#1244)
recomended -> recommended
2024-12-02 17:20:20 +08:00
Concedo
2ba5949054
updated sdcpp, also set euler as default sampler
2024-12-01 17:00:20 +08:00
Concedo
e93c2427b4
allow incompatible vocab in debugmode
2024-12-01 14:11:03 +08:00
Concedo
42228b9746
warning when selecting non gguf models
2024-12-01 13:35:51 +08:00
Concedo
d5e732f3ab
updated lite
2024-12-01 01:49:09 +08:00
Concedo
b7cd210cd2
more linting with Ruff (+1 squashed commit)
Squashed commits:
[43802cfe2] Applied default Ruff linting
2024-12-01 01:23:13 +08:00
Concedo
409e393d10
fixed critical bug in image model loader
2024-11-30 23:28:24 +08:00
Concedo
153da19274
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# README.md
2024-11-30 16:59:25 +08:00
Concedo
0028e71993
special handling to resolve incomplete utf8 token sequences in qwen
2024-11-30 16:54:01 +08:00
Concedo
32ac3153e4
default speculative set to 8. added more adapter fields
2024-11-30 16:18:27 +08:00
Georgi Gerganov
3e0ba0e604
readme : remove old badge
2024-11-30 10:09:21 +02:00
Georgi Gerganov
abadba05be
readme : refresh (#10587)
* readme : refresh
* readme : move section [no ci]
* readme : clarify [no ci]
* readme : fixes [no ci]
* readme : more fixes [no ci]
* readme : simplify [no ci]
* readme : clarify GGUF
2024-11-30 09:47:07 +02:00
Eve
0533e7fb38
vulkan: Dynamic subgroup size support for Q6_K mat_vec (#10536)
* subgroup 64 version with subgroup add. 15% faster
scalable version
tested for subgroup sizes 16-128
* check for subgroup multiple of 16 and greater than 16
* subgroup sizes are always a power of 2 (https://github.com/KhronosGroup/GLSL/issues/45)
* force 16 sequential threads per block
* make 16 subgroup size a constant
2024-11-30 08:00:02 +01:00
Concedo
5353bfa983
updated lite
2024-11-30 12:26:20 +08:00
Concedo
557bcaf86e
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .clang-tidy
# .github/workflows/build.yml
# Makefile
# Package.swift
# common/CMakeLists.txt
# examples/batched-bench/CMakeLists.txt
# examples/batched/CMakeLists.txt
# examples/convert-llama2c-to-ggml/CMakeLists.txt
# examples/cvector-generator/CMakeLists.txt
# examples/embedding/CMakeLists.txt
# examples/eval-callback/CMakeLists.txt
# examples/export-lora/CMakeLists.txt
# examples/gbnf-validator/CMakeLists.txt
# examples/gguf-split/CMakeLists.txt
# examples/gguf/CMakeLists.txt
# examples/gritlm/CMakeLists.txt
# examples/imatrix/CMakeLists.txt
# examples/infill/CMakeLists.txt
# examples/llama-bench/CMakeLists.txt
# examples/llava/CMakeLists.txt
# examples/lookahead/CMakeLists.txt
# examples/lookup/CMakeLists.txt
# examples/main-cmake-pkg/CMakeLists.txt
# examples/main/CMakeLists.txt
# examples/parallel/CMakeLists.txt
# examples/passkey/CMakeLists.txt
# examples/perplexity/CMakeLists.txt
# examples/quantize-stats/CMakeLists.txt
# examples/quantize/CMakeLists.txt
# examples/retrieval/CMakeLists.txt
# examples/run/CMakeLists.txt
# examples/save-load-state/CMakeLists.txt
# examples/server/CMakeLists.txt
# examples/simple-chat/CMakeLists.txt
# examples/simple/CMakeLists.txt
# examples/speculative-simple/CMakeLists.txt
# examples/speculative/CMakeLists.txt
# examples/tokenize/CMakeLists.txt
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-backend.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt
# pocs/vdot/CMakeLists.txt
# src/CMakeLists.txt
# src/unicode.cpp
# tests/test-sampling.cpp
2024-11-30 12:24:51 +08:00
Concedo
697ca70115
temp checkpoint
2024-11-30 12:13:20 +08:00
Concedo
ec95241e38
temp checkpoint
2024-11-30 11:59:27 +08:00
Concedo
0c8939be19
temp checkpoint
2024-11-30 11:57:28 +08:00
Concedo
e0c59486ee
default to 12 tokens drafted
2024-11-30 11:52:07 +08:00
Concedo
b21d0fe3ac
customizable speculative size
2024-11-30 11:28:19 +08:00
Concedo
f75bbb945f
speculative decoding initial impl completed (+6 squashed commits)
Squashed commit:
[0a6306ca0] draft wip dont use (will be squashed)
[a758a1c9c] wip dont use (will be squashed)
[e1994d3ce] wip dont use
[f59690d68] wip
[77228147d] wip on spec decoding. dont use yet
[2445bca54] wip adding speculative decoding (+1 squashed commit)
Squashed commits:
[50e341bb7] wip adding speculative decoding
2024-11-30 10:41:10 +08:00
Diego Devesa
7cc2d2c889
ggml : move AMX to the CPU backend (#10570)
* ggml : move AMX to the CPU backend
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-29 21:54:58 +01:00
Xuan Son Nguyen
b782e5c7d4
server : add more test cases (#10569)
* server : add split model test
* add test speculative
* add invalid cases
2024-11-29 21:48:56 +01:00
Robert Collins
3a8e9af402
imatrix : support combine-only (#10492)
* imatrix-combine-only idea
* ensured that behavior consistent with log
2024-11-29 19:21:37 +02:00
Diego Devesa
a3a3048e7a
cleanup UI link list (#10577)
* cleanup UI link list
* sort list alphabetically
* add missing licenses
2024-11-29 17:45:08 +01:00
Georgi Gerganov
f0678c5ff4
ggml : fix I8MM Q4_1 scaling factor conversion (#10562)
ggml-ci
2024-11-29 16:25:39 +02:00
Shupei Fan
4b3242bbea
ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (#10580)
2024-11-29 14:49:02 +01:00
Alberto Cabrera Pérez
0f77aae560
sycl : offload of get_rows set to 0 (#10432)
2024-11-29 20:38:45 +08:00
Alberto Cabrera Pérez
266b8519ee
sycl : Reroute permuted mul_mats through oneMKL (#10408)
This PR fixes the failing MUL_MAT tests for the sycl backend.
2024-11-29 09:49:43 +00:00
Chenguang Li
938f608742
CANN: RoPE operator optimization (#10563)
* [cann] RoPE operator optimization
* [CANN]Code Formatting
---------
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-11-29 14:46:55 +08:00
Jeff Bolz
f095a649ec
vulkan: get the first command buffer submitted sooner (#10499)
This is an incremental improvement over #9118 to get work to the GPU a bit
sooner. The first part is to start with a smaller number of nodes before
the first submit, and ramp it up to the current 100 nodes/submit. The
second part is to reduce the dryrun overhead for all the nodes that just
need to request descriptor space.
With these changes I get around 1-2% speedup on RTX 4070 combined with my
old Haswell-era CPU.
2024-11-29 07:18:02 +01:00
Ting Lou
678d7994f4
llava: return false instead of exit (#10546)
2024-11-29 01:09:46 +01:00
Georgi Gerganov
dc22344088
ggml : remove redundant copyright notice + update authors
2024-11-28 20:46:40 +02:00
Georgi Gerganov
4c0a95b107
llama : add missing model types
2024-11-28 20:45:07 +02:00
Xuan Son Nguyen
6c59567689
server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (#10568)
* server : (tests) don't use thread for capturing stdout/stderr
* test: bump openai to 1.55.2
* bump openai to 1.55.3
2024-11-28 19:17:49 +01:00