Commit graph

1085 commits

Author SHA1 Message Date
Concedo
cfcdfd69bd allow embeddings models to use mmap 2025-06-07 10:14:00 +08:00
Concedo
6effb65cfe change singleinstance order 2025-06-06 21:20:30 +08:00
Concedo
740f91e3fd lower aria interval 2025-06-06 17:43:38 +08:00
Concedo
9cf32e5fee step limits over adapter for sd 2025-06-06 14:12:43 +08:00
Concedo
f6bbc350f2 various qol fixes 2025-06-05 10:26:02 +08:00
Concedo
736030bb9f save and load state upgraded to 3 available states 2025-06-04 22:09:40 +08:00
Concedo
06d2bc3404 ollama compat fixes 2025-06-04 19:22:29 +08:00
Concedo
53f1511396 use a static buffer for kv reloads instead. also, added into lite ui 2025-06-03 22:32:46 +08:00
Concedo
4b57108508 Save KV State and Load KV State to memory added. GUI not yet updated 2025-06-03 17:46:29 +08:00
Concedo
6ce85c54d6 not working correctly 2025-06-02 22:12:10 +08:00
Concedo
8e1ebc55b5 dropped support for lora base as upstream no longer uses it. If provided it will be silently ignored 2025-06-02 12:49:53 +08:00
Concedo
51dc1cf920 added scale for text lora 2025-06-02 00:13:42 +08:00
Concedo
74ef097c4a added ability to set koboldcpp as default handler for gguf and kcpps 2025-06-01 22:36:41 +08:00
Concedo
f3bb947a13 cuda use wmma flash attention for turing (+1 squashed commits)
Squashed commits:

[3c5112398] 117 (+10 squashed commit)

Squashed commit:

[4f01bb2d4] 117 graphs 80v

[7549034ea] 117 graphs

[dabf9cb99] checking if cuda 11.5.2 works

[ba7ccdb7a] another try cu11.7 only

[752cf2ae5] increase aria2c download log rate

[dc4f198fd] test send turing to wmma flash attention

[496a22e83] temp build test cu11.7.0

[ca759c424] temp build test cu11.7

[c46ada17c] test build: enable virtual80 for oldcpu

[3ccfd939a] test build: with cuda graphs for all
2025-06-01 11:41:45 +08:00
Concedo
b08dca65ed Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	common/CMakeLists.txt
#	common/arg.cpp
#	common/chat.cpp
#	examples/parallel/README.md
#	examples/parallel/parallel.cpp
#	ggml/cmake/common.cmake
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/rope.cpp
#	models/ggml-vocab-bert-bge.gguf.inp
#	models/ggml-vocab-bert-bge.gguf.out
#	models/ggml-vocab-command-r.gguf.inp
#	models/ggml-vocab-command-r.gguf.out
#	models/ggml-vocab-deepseek-coder.gguf.inp
#	models/ggml-vocab-deepseek-coder.gguf.out
#	models/ggml-vocab-deepseek-llm.gguf.inp
#	models/ggml-vocab-deepseek-llm.gguf.out
#	models/ggml-vocab-falcon.gguf.inp
#	models/ggml-vocab-falcon.gguf.out
#	models/ggml-vocab-gpt-2.gguf.inp
#	models/ggml-vocab-gpt-2.gguf.out
#	models/ggml-vocab-llama-bpe.gguf.inp
#	models/ggml-vocab-llama-bpe.gguf.out
#	models/ggml-vocab-llama-spm.gguf.inp
#	models/ggml-vocab-llama-spm.gguf.out
#	models/ggml-vocab-mpt.gguf.inp
#	models/ggml-vocab-mpt.gguf.out
#	models/ggml-vocab-phi-3.gguf.inp
#	models/ggml-vocab-phi-3.gguf.out
#	models/ggml-vocab-qwen2.gguf.inp
#	models/ggml-vocab-qwen2.gguf.out
#	models/ggml-vocab-refact.gguf.inp
#	models/ggml-vocab-refact.gguf.out
#	models/ggml-vocab-starcoder.gguf.inp
#	models/ggml-vocab-starcoder.gguf.out
#	requirements/requirements-gguf_editor_gui.txt
#	tests/CMakeLists.txt
#	tests/test-chat.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tools/mtmd/CMakeLists.txt
#	tools/run/run.cpp
#	tools/server/CMakeLists.txt
2025-05-31 13:04:21 +08:00
Concedo
c923e9fe46 added option to unload model from admin control 2025-05-31 11:51:09 +08:00
Concedo
08e0745e7e added singleinstance flag and local shutdown api 2025-05-31 11:37:32 +08:00
Concedo
6529326c59 allow temperatures up to 1.0 when function calling 2025-05-30 15:59:18 +08:00
Concedo
c881bb7348 match a few common oai voices 2025-05-29 23:29:17 +08:00
Concedo
26bf5b446d fixed thread count <=0 , fixed clip skip <= 0 2025-05-28 00:38:15 +08:00
Concedo
f97bbdde00 fix to allow all EOGs to trigger a stop, occam's glm4 fix, 2025-05-24 22:55:11 +08:00
Concedo
ec04115ae9 swa options now available 2025-05-24 11:50:37 +08:00
Concedo
748dfcc2e4 massively improved tool calling 2025-05-24 02:26:11 +08:00
Concedo
c4df151298 experimental swa flag 2025-05-23 21:33:26 +08:00
Concedo
499283c63a rename define to match upstream 2025-05-23 17:10:12 +08:00
Concedo
e68a5f448c add ddim sampler 2025-05-22 21:28:01 +08:00
Concedo
f125e724eb fix off-by-one npast during some instances of fast forwarding 2025-05-22 19:51:21 +08:00
Concedo
440350327c set random range for seed 2025-05-21 23:47:18 +08:00
Wagner Bruna
5d0cfc9db3 store on the image the actual random seed, for reproducibility (#1549) 2025-05-21 23:40:47 +08:00
Concedo
8b6dfbd1be disabling the gMask prefix for glm-4 completions 2025-05-21 17:29:24 +08:00
Concedo
49305942ab try disabling the gMask prefix for glm-4 completions 2025-05-21 16:47:08 +08:00
Concedo
5f4923bf24 backend tag replacement for endtags. view results with debug mode. 2025-05-19 23:14:43 +08:00
Concedo
710c747b60 minor noscript edit 2025-05-19 17:51:44 +08:00
Concedo
c546cb638e disable showgui if skiplauncher is used 2025-05-18 01:42:14 +08:00
Concedo
ca4274e384 added size info into HF searcher 2025-05-17 00:31:54 +08:00
Concedo
5ccd4b2bf5 horde default max ctx matches main ctx 2025-05-15 10:26:20 +08:00
Concedo
c5ea7fad93 updated lite, only show processed input in debugmode 2025-05-14 17:46:54 +08:00
Concedo
21e31e255b Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	README.md
#	build-xcframework.sh
#	common/CMakeLists.txt
#	examples/CMakeLists.txt
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml-cuda/CMakeLists.txt
#	ggml/src/ggml-metal/ggml-metal.m
#	ggml/src/ggml-metal/ggml-metal.metal
#	ggml/src/ggml-sycl/CMakeLists.txt
#	ggml/src/ggml-sycl/backend.hpp
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	ggml/src/ggml-sycl/mmvq.cpp
#	ggml/src/ggml-sycl/vecdotq.hpp
#	scripts/compare-llama-bench.py
#	src/CMakeLists.txt
#	src/llama-model.cpp
#	src/llama.cpp
#	tests/test-backend-ops.cpp
#	tests/test-opt.cpp
#	tools/llama-bench/README.md
#	tools/llama-bench/llama-bench.cpp
#	tools/mtmd/CMakeLists.txt
#	tools/mtmd/README.md
#	tools/mtmd/clip.cpp
#	tools/rpc/rpc-server.cpp
#	tools/server/CMakeLists.txt
#	tools/server/README.md
2025-05-13 00:28:35 +08:00
Concedo
40eb3a54c4 rename some toolip texts 2025-05-11 22:50:40 +08:00
Concedo
1eb6d25010 truncate middle instead of end for long strings 2025-05-11 20:26:17 +08:00
Concedo
48c3682c2c improve search 2025-05-10 19:25:26 +08:00
Concedo
50e1064ffe better passthrough handling 2025-05-10 19:11:09 +08:00
Concedo
c4a0b323f0 remove fa restrictions for vulkan 2025-05-09 17:34:14 +08:00
Concedo
b6220669f4 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/docker.yml
#	Makefile
#	examples/CMakeLists.txt
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-sycl/common.hpp
#	ggml/src/ggml-sycl/convert.cpp
#	ggml/src/ggml-sycl/convert.hpp
#	ggml/src/ggml-sycl/ggml-sycl.cpp
#	scripts/sync-ggml.last
2025-05-08 23:07:33 +08:00
Concedo
7c5d47f688 multigpu warning only once 2025-05-08 00:55:09 +08:00
Concedo
fa22c1a5a4 fixed cfg scale, but turns out it sucks. embedded aria2c into pyinstaller 2025-05-07 18:30:36 +08:00
Concedo
a5b6f372a3 cfg scale wip 2025-05-07 00:36:00 +08:00
Concedo
0fa435b2a6 Merge commit '9b61acf060' into concedo_experimental
# Conflicts:
#	Makefile
#	docs/multimodal/MobileVLM.md
#	docs/multimodal/glmedge.md
#	docs/multimodal/llava.md
#	docs/multimodal/minicpmo2.6.md
#	docs/multimodal/minicpmv2.5.md
#	docs/multimodal/minicpmv2.6.md
#	requirements/requirements-all.txt
#	tools/mtmd/CMakeLists.txt
#	tools/mtmd/README.md
#	tools/mtmd/android/adb_run.sh
#	tools/mtmd/android/build_64.sh
#	tools/mtmd/clip-quantize-cli.cpp
2025-05-06 23:34:21 +08:00
Concedo
38a8778f24 wip cfg scale 2025-05-06 23:06:25 +08:00
Concedo
13cee48740 embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits)
Squashed commits:

[b9b695217] embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits)

Squashed commits:

[90b5d389d] embed aria2c for windows, add slowness check with highpriority recommendation (+1 squashed commits)

Squashed commits:

[fbbaa989f] embed aria2c for windows
2025-05-06 18:56:02 +08:00