koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-31 05:03:44 +00:00

Author	SHA1	Message	Date
Concedo	6128a91d5a	trying somethning else (+1 squashed commits) Squashed commits: [bf497e5cf] trying somethning else	2025-12-21 15:38:07 +08:00
Concedo	fedd529fdc	autofit counts overheads	2025-12-21 14:31:08 +08:00
Concedo	edfc961ff8	transplanted tk	2025-12-21 13:33:45 +08:00
Concedo	8b066d9765	don't crash workgroup size	2025-12-21 13:22:34 +08:00
Concedo	0c7e1d91ea	try a transplanted tk (+1 squashed commits) Squashed commits: [1eb87e4d1] try a transplanted tk (+1 squashed commits) Squashed commits: [094d1566a] try a transplanted tk	2025-12-21 11:31:32 +08:00
Concedo	d69db26b44	fix stb multiple impl	2025-12-20 12:05:50 +08:00
Concedo	17b4b888d0	revert changes for now, we'll do it again next time	2025-12-20 11:02:34 +08:00
Concedo	c406b9f33e	another font check (+1 squashed commits) Squashed commits: [6da9493ec] another font check	2025-12-20 09:49:29 +08:00
Concedo	7304640f72	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/release.yml # docs/android.md # docs/backend/hexagon/CMakeUserPresets.json # examples/llama.android/app/src/main/res/layout/activity_main.xml # examples/llama.android/app/src/main/res/layout/item_message_assistant.xml # examples/llama.android/app/src/main/res/layout/item_message_user.xml # examples/model-conversion/scripts/causal/run-org-model.py # examples/model-conversion/scripts/utils/common.py # ggml/CMakeLists.txt # ggml/src/ggml-hexagon/CMakeLists.txt # ggml/src/ggml-hexagon/htp/CMakeLists.txt # ggml/src/ggml-hexagon/htp/matmul-ops.c # tests/test-arg-parser.cpp # tools/server/README.md	2025-12-20 09:32:06 +08:00
Concedo	714ab0682e	Revert "Revert "llama : Async DirectIO model loading on Linux (#18012 )"" This reverts commit `a45fc5ee88`.	2025-12-20 09:25:10 +08:00
Concedo	710d88687b	try a more modern way of fixing font since xft is dead	2025-12-20 09:24:30 +08:00
Sigbjørn Skjæret	74e05131e9	ci : remove non-windows zip artifacts (#18201 ) * remove non-windows zip artifacts * add cuda dll links	2025-12-19 22:29:46 +01:00
Sigbjørn Skjæret	f74747d886	ci : only save ccache on master (#18207 )	2025-12-19 22:29:37 +01:00
Alfred	ce734a8a2f	ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations (#17977 ) * feat: implement real Q8_0 * feat: adding cmake option for configuring FP32 quantize group size * typo: set() shall be used --------- Co-authored-by: ngdxzy <zhenyu_xu@uri.edu>	2025-12-19 09:42:28 -08:00
Pascal	14931a826e	arg: fix order to use short form before long form (#18196 ) * arg: fix order to use short form before long form * arg: update doc * arg: update test-arg-parser * arg: address review feedback from ngxson simplified to check first.length() <= last.length() only fixed: --sampler-seq, --rerank, --draft ordering note: middle positions in 3+ arg sets are not verified * arg: update doc	2025-12-19 18:01:56 +01:00
Concedo	9458e08346	fixed https://github.com/LostRuins/koboldcpp/issues/1892	2025-12-19 22:52:39 +08:00
Julius Tischbein	f99ef53d2a	llama : Changing off_t to size_t for Windows (#18204 )	2025-12-19 16:42:46 +02:00
Concedo	9ea153c14c	try a more modern way of fixing font since xft is dead	2025-12-19 21:50:17 +08:00
Concedo	9ea6a3fa62	add download page	2025-12-19 19:59:12 +08:00
Aman Gupta	cc0a04343e	server: friendlier error msg when ctx < input (#18174 ) * llama-server: friendlier error msg when ctx < input This PR adds formatted strings to the server's send_error function * llama-server: use string_format inline * fix test	2025-12-19 12:10:00 +01:00
Xuan-Son Nguyen	98c1c7a7bf	presets: refactor, allow cascade presets from different sources, add global section (#18169 ) * presets: refactor, allow cascade presets from different sources * update docs * fix neg arg handling * fix empty mmproj * also filter out server-controlled args before to_ini() * skip loading custom_models if not specified * fix unset_reserved_args * fix crash on windows	2025-12-19 12:08:20 +01:00
Concedo	a45fc5ee88	Revert "llama : Async DirectIO model loading on Linux (#18012 )" This reverts commit `4d4f4cacd1`.	2025-12-19 19:06:30 +08:00
Aleksander Grygier	acb73d8340	webui: Add editing attachments in user messages (#18147 ) * feat: Enable editing attachments in user messages * feat: Improvements for data handling & UI * docs: Update Architecture diagrams * chore: update webui build output * refactor: Exports * chore: update webui build output * feat: Add handling paste for Chat Message Edit Form * chore: update webui build output * refactor: Cleanup * chore: update webui build output	2025-12-19 11:14:07 +01:00
Concedo	2e57e5ead4	rename eval function	2025-12-19 17:54:23 +08:00
Concedo	e9ae0cb2dd	added support for RNN models in smartcache	2025-12-19 16:36:25 +08:00
Daniel Bevenius	0a271d82b4	model-conversion : add verbose flag in run-org-model.py (#18194 ) This commit adds a --verbose flag to the run-org-model.py script to enable or disable detailed debug output, such as input and output tensors for each layer. Debug utilities (summarize, debug_hook, setup_rope_debug) have been moved to utils/common.py. The motivation for this is that the detailed debug output can be useful for diagnosing issues with model conversion or execution, but it can also produce a large amount of output that may not always be needed. The script will also be further cleaned/refactored in follow-up commits.	2025-12-19 08:43:16 +01:00
Naco Siren	52fc7fee8a	android: fix missing screenshots for Android.md (#18156 ) * Android basic sample app layout polish * Add missing screenshots and polish android README doc * Replace file blobs with URLs served by GitHub pages service.	2025-12-19 09:32:04 +02:00
Jeff Bolz	cdbada8d10	vulkan: Add perf logger mode with concurrency (#17944 ) This implements a variation of the perf logger where rather than timing each operation individually with effectively a barrier in between, we put the timing boundaries where we already synchronize and time the groups of work that normally overlap. This can be useful to help understand whether individual operations need to be optimized, or if the group is already running efficiently. GGML_VK_PERF_LOGGER_CONCURRENT=1 enables the new mode (when GGML_VK_PERF_LOGGER is also set). GGML_VK_SYNC_LOGGER=1 replaces the ENABLE_SYNC_LOGGING compile time switch.	2025-12-19 06:36:46 +01:00
Concedo	cde4791e36	fix tools building	2025-12-19 12:08:29 +08:00
Concedo	51b1d12914	Merge branch 'upstream' into concedo_experimental # Conflicts: # tests/test-backend-ops.cpp # tools/mtmd/CMakeLists.txt	2025-12-19 11:11:19 +08:00
Concedo	fef2ea46fd	Merge remote-tracking branch 'jeff/im2col_wglimit' into concedo_experimental # Conflicts: # tests/test-backend-ops.cpp	2025-12-19 11:01:47 +08:00
Concedo	58eb5573de	Merge branch 'upstream' into concedo_experimental # Conflicts: # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-hexagon/htp/act-ops.c # ggml/src/ggml-hexagon/htp/hvx-utils.c # ggml/src/ggml-hexagon/htp/main.c # src/llama-model.cpp # tools/server/README.md	2025-12-19 11:00:43 +08:00
Xuan-Son Nguyen	8ea958d4d9	model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106 ) * ASR with LFM2-Audio-1.5B * Set rope_theta * Fix comment * Remove rope_theta setting * Address PR feedback * rename functions to conformer * remove some redundant ggml_cont * fix missing tensor * add prefix "a." for conv tensors * remove redundant reshape * clean up * add test model --------- Co-authored-by: Tarek Dakhran <tarek@liquid.ai>	2025-12-19 00:18:01 +01:00
Jeff Bolz	442723c946	vulkan: fix im2col overflowing maxworkgroupcount	2025-12-18 12:16:41 -06:00
Concedo	e005fc2587	Merge commit '`8dcc3662a2`' into concedo_experimental Keep changes from https://github.com/ggml-org/llama.cpp/pull/18096 without https://github.com/ggml-org/llama.cpp/pull/14904 Reason is to maintain compatibility with 2023 w64devkit # Conflicts: # .github/ISSUE_TEMPLATE/019-bug-misc.yml # examples/model-conversion/scripts/causal/run-org-model.py # examples/speculative/speculative.cpp # ggml/src/ggml-cpu/arch-fallback.h # ggml/src/ggml-cpu/repack.cpp # ggml/src/ggml-cpu/repack.h # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-hexagon/htp/act-ops.c # ggml/src/ggml-hexagon/htp/htp-msg.h # ggml/src/ggml-hexagon/htp/hvx-utils.c # ggml/src/ggml-hexagon/htp/hvx-utils.h # ggml/src/ggml-hexagon/htp/main.c	2025-12-19 02:11:55 +08:00
Concedo	fb31059f9c	fixed a bug in vision with mrope, mrope is refactored to match upstream, should be more accurate now	2025-12-19 01:23:52 +08:00
Pascal	f9ec8858ed	webui: display prompt processing stats (#18146 ) * webui: display prompt processing stats * feat: Improve UI of Chat Message Statistics * chore: update webui build output * refactor: Post-review improvements * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2025-12-18 17:55:03 +01:00
Concedo	a01b49098c	fix tool builds	2025-12-18 23:26:31 +08:00
Concedo	cefb32df19	track clip img patch nx and ny	2025-12-18 22:58:10 +08:00
Taimur Ahmad	f716588e63	ggml-cpu: extend support for RVV floating-point kernels (#17318 ) * cmake: add BF16 RVV flag for ggml-cpu * ggml-cpu: add floating-point conversion kernels * ggml: add floating-point kernels Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai> * ggml-cpu: fix lmul in vec_dot_bf16 * ggml-cpu: change redsum to lmul 4, fix leftover --------- Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>	2025-12-18 16:02:09 +02:00
Xuan-Son Nguyen	4d1316c440	arg: fix ASAN error on sampler_type_names empty (#18167 )	2025-12-18 14:30:32 +01:00
Concedo	fae2ff6d2d	fix override tensors string matching issue (+2 squashed commit) Squashed commit: [850340501] will be deleted later, quick test [eb4f569a7] debug buffer types	2025-12-18 21:22:49 +08:00
Sigbjørn Skjæret	ec7b9329ae	gguf-py : use copy-on-write mode for localtensor (#18162 )	2025-12-18 13:45:38 +01:00
yulo	54189c0d39	remove i_major_dual (#18157 ) Co-authored-by: zhang hui <you@example.com>	2025-12-18 12:50:56 +01:00
Aleksander Grygier	9ce64aed7d	webui: Fix selecting generated output issues during active streaming (#18091 ) * draft: incremental markdown rendering with stable blocks * refactor: Logic improvements * refactor: DRY Markdown post-processing logic * refactor: ID generation improvements * fix: Remove runes * refactor: Clean up & add JSDocs * chore: update webui static output * fix: Add tick to prevent race conditions for rendering Markdown blocks Suggestion from @ServeurpersoCom Co-authored-by: Pascal <admin@serveurperso.com> * chore: Run `npm audit fix` * chore: update webui static output * feat: Improve performance using global counter & id instead of UUID * refactor: Enhance Markdown rendering with link and code features * chore: update webui static output * fix: Code block content extraction * chore: update webui static output * chore: update webui static output --------- Co-authored-by: Pascal <admin@serveurperso.com>	2025-12-18 11:13:52 +01:00
Kim S.	900316da4e	webui: fix chat screen shadow width (#18010 ) * webui: fix chat screen shadow width * chore: add index.html.gz	2025-12-18 11:08:42 +01:00
Concedo	30fecac3a3	small tweak	2025-12-18 15:41:22 +08:00
Johannes Gäßler	57c1e05643	llama: offload output layer to GPU first (#18148 )	2025-12-18 08:12:18 +01:00
Sigbjørn Skjæret	9cff4cc554	convert : sort and use file parts from model index if present (#18043 ) * keep file part order from model index * treat index as authoritative * sort index parts	2025-12-18 07:54:54 +01:00
Julius Tischbein	4d4f4cacd1	llama : Async DirectIO model loading on Linux (#18012 ) * Uncached model read * Removing additional --mmap arg * Removing trailing whitespaces * Adding fallback when O_DIRECT is not supported * Remove branching in llama-model-loader.cpp and reduce code duplications in llama-mmap.cpp * Adding maybe unused keyword for Mac and Windows. * File seek aligned * Removing all branches for direct_io in llama-model-loader.cpp * Always use alignment from llama_file * use_mmap=true	2025-12-18 08:27:19 +02:00

10914 commits