Commit graph

7391 commits

Author SHA1 Message Date
Concedo
5d7c5e9e33 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/tts/tts.cpp
2025-03-16 15:42:39 +08:00
Concedo
2401502cbd improvement to tool calling, allowing specific tools to be used 2025-03-16 15:20:08 +08:00
Concedo
9f7fd63160 revert unwanted change to tool calling 2025-03-16 01:35:48 +08:00
marcoStocchi
f4c3dd5daa
llama-tts : add '-o' option (#12398)
* added -o option to specify an output file name

* llama-tts returns ENOENT in case of file write error

note : PR #12042 is closed as superseded with this one.
2025-03-15 17:23:11 +01:00
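For reference, the '-o' behavior described above comes down to: write the synthesized audio to the named file and report ENOENT when the write fails. A minimal illustrative sketch (the save_audio helper is hypothetical; the actual tts.cpp code differs):

```cpp
// Hypothetical sketch, not the actual tts.cpp code: propagate ENOENT to the
// caller when the requested output file cannot be created or fully written.
#include <cerrno>
#include <cstdio>

static int save_audio(const char * path, const float * data, size_t n) {
    FILE * f = std::fopen(path, "wb");
    if (f == nullptr) {
        return ENOENT;  // file could not be created
    }
    const size_t written = std::fwrite(data, sizeof(float), n, f);
    std::fclose(f);
    return written == n ? 0 : ENOENT;  // a short write is also reported as ENOENT
}

int main() {
    const float samples[4] = {0.0f, 0.1f, 0.2f, 0.3f};
    return save_audio("output.wav", samples, 4);  // the file named via '-o'
}
```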
Concedo
98eade358a more rocm include dir 2025-03-15 23:29:00 +08:00
aubreyli
3d35d87b41
SYCL: Delete redundant plus sign and space (#12391) 2025-03-15 15:49:03 +01:00
fairydreaming
b19bd064c0
SYCL : support non-contiguous tensors in binary ops (add, sub, etc.) (#12399)
* sycl : support non-contiguous tensors in binary ops

* sycl : silence unused variable warning

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-03-15 22:19:30 +08:00
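As context for this change: a binary op receives a non-contiguous operand whenever one input is a strided view, such as a transpose. The sketch below is illustrative graph building only (no backend or compute setup) and is not code from the PR; before this change, the SYCL backend would have needed a contiguous copy of the view first.

```cpp
// Illustrative only: build an add whose second operand is a non-contiguous
// view (a transpose). With this PR the SYCL backend can run such binary ops
// directly instead of requiring a contiguous copy of the operand first.
#include "ggml.h"

int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ nullptr,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a  = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8, 4);
    struct ggml_tensor * b  = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 8);
    struct ggml_tensor * bt = ggml_transpose(ctx, b);  // non-contiguous view of b
    struct ggml_tensor * c  = ggml_add(ctx, a, bt);    // binary op on a strided operand
    (void) c;

    ggml_free(ctx);
    return 0;
}
```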
Concedo
67851e5415 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	examples/run/run.cpp
#	ggml/src/ggml-cann/aclnn_ops.cpp
2025-03-15 19:54:19 +08:00
Concedo
e84596ec1a add config for default gen tokens and bos toggle 2025-03-15 19:53:06 +08:00
Concedo
bfc30066c9 fixed a clip processing bug 2025-03-15 17:49:49 +08:00
Concedo
7272165e0e verbosity 2025-03-15 12:13:04 +08:00
Concedo
4212f0b8e8 wip on multiple fixes 2025-03-15 10:50:36 +08:00
Chenguang Li
92a391327e
[CANN]MUL_MAT optimization (#12382) 2025-03-15 09:31:08 +08:00
Eric Curtin
9f2250ba72
Add CLI arg to llama-run to adjust the number of threads used (#12370)
We default to 4; sometimes we want to adjust this manually.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-03-14 16:41:20 +00:00
Sigbjørn Skjæret
774973b8f3
main : add -sysf / --system-prompt-file (#12249) (#12250)
* add system_prompt_file

* add -sysf / --system-prompt-file

* remove system_prompt_file
2025-03-14 16:57:05 +01:00
Concedo
4a29e216e7 edit readme 2025-03-14 21:06:55 +08:00
fairydreaming
8fcb563613
Load all MoE experts during warmup (#11571)
* llama : introduce llama_set_warmup() API call that controls warmup mode; use all MoE experts during warmup

* common : use new API to enable warmup mode during model warmup

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
2025-03-14 13:47:05 +01:00
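The llama_set_warmup() call introduced here is a per-context toggle. A rough sketch of how a caller might drive it, assuming the llama.h helpers of this period (simplified; the real warmup path in common.cpp also clears the KV cache and synchronizes afterwards):

```cpp
// Rough sketch, not the actual common.cpp warmup code: enable warmup mode
// around a dummy decode so every MoE expert's weights get loaded/touched.
#include "llama.h"

static void warmup(llama_context * ctx, llama_token bos) {
    llama_set_warmup(ctx, true);                       // route through all experts

    llama_batch batch = llama_batch_get_one(&bos, 1);  // single dummy token
    llama_decode(ctx, batch);                          // triggers the expert loads

    llama_set_warmup(ctx, false);                      // restore normal top-k routing
}
```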
Concedo
d7498e7e8a added model switching to gguf in admin mode (auto guess layers) 2025-03-14 19:45:55 +08:00
Concedo
30cb77a900 rename replace_instruct_placeholders field 2025-03-14 18:37:12 +08:00
Concedo
be3bba67ff Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	src/llama-model.cpp
2025-03-14 18:25:21 +08:00
Victor
add2a3aa5a
server: fix "--grammar-file" parameter (#12285) 2025-03-14 11:21:17 +01:00
Concedo
782e1e193a replaced winclinfo.exe with a simplified simpleclinfo.exe that only provides device names and nothing else (+1 squashed commit)
Squashed commits:

[4a73c8d3] replaced winclinfo.exe with a simplified simpleclinfo.exe that only provides device names and nothing else
2025-03-14 18:18:32 +08:00
Concedo
6a1dd57435 gemma3 template, updated lite, fixed tool calling, reenable ctx shift for gemma3 2025-03-14 17:47:01 +08:00
Georgi Gerganov
c522ce4143
graph : simplify attn input build for unified KV cache (#12381)
ggml-ci
2025-03-14 10:47:44 +02:00
Georgi Gerganov
081bee8c64
hparams : add SWA rope parameters (#12374)
ggml-ci
2025-03-14 09:03:24 +02:00
Concedo
7dc72db9de Merge branch 'upstream' into concedo_experimental 2025-03-14 11:58:53 +08:00
Concedo
0db4ae6237 traded my ink for a pen 2025-03-14 11:58:15 +08:00
Georgi Gerganov
84d5475541
llama : fix Gemma3 SWA KV cache shift (#12373)
* llama : fix Gemma3 SWA KV cache shift

ggml-ci

* hparams : add comment [no ci]
2025-03-13 19:08:07 +02:00
Concedo
52cf1ded0c remove unwanted print 2025-03-14 00:24:28 +08:00
Concedo
bdf2977372 fixed windows ci 2025-03-13 20:45:16 +08:00
Concedo
0460d92cc3 disable context shifting for gemma3 2025-03-13 20:28:26 +08:00
Concedo
ca698f0cbe tweaked sd img metadata 2025-03-13 20:04:29 +08:00
Wagner Bruna
5413be2c1b
sd: add generation parameters to image metadata (#1416)
Straight adaptation from stable-diffusion.cpp main.cpp.
2025-03-13 19:35:06 +08:00
Xuan-Son Nguyen
be7c303410
arg : no n_predict = -2 for examples except for main and infill (#12364) 2025-03-13 12:34:54 +01:00
Concedo
2c9ade61fe test automatic vk shader rebuilding 2025-03-13 19:34:15 +08:00
Georgi Gerganov
e0dbec0bc6
llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)
* llama : refactor llama_context, llama_kv_cache, llm_build_context

ggml-ci

* graph : don't mutate the KV cache during defrag

ggml-ci

* context : reduce virtuals + remove test function

ggml-ci

* context : move interface implementation to source file + factory

ggml-ci

* graph : move KV cache build functions to llama_context impl

ggml-ci

* graph : remove model reference from build_pooling

ggml-ci

* graph : remove llama_model reference

ggml-ci

* kv_cache : provide rope factors

ggml-ci

* graph : rework inputs to use only unique_ptr, remove attn input abstraction

ggml-ci

* context : remove llama_context_i abstraction

ggml-ci

* context : clean-up

ggml-ci

* graph : clean-up

ggml-ci

* llama : remove redundant keywords (struct, enum)

ggml-ci

* model : adapt gemma3

ggml-ci

* graph : restore same attention ops as on master

ggml-ci

* llama : remove TODO + fix indent

ggml-ci
2025-03-13 12:35:44 +02:00
Ishaan Gandhi
2048b5913d
server : fix crash when using verbose output with input tokens that are not in printable range (#12178) (#12338)
* Fix DOS index bug

* Remove new APIs

* remove extra line

* Remove from API

* Add extra newline

* Update examples/server/server.cpp

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-03-13 11:10:05 +01:00
Concedo
e75539e8cb too many issues without BOS (+1 squashed commit)
Squashed commits:

[7138d941] only print bos alert in debug
2025-03-13 16:48:29 +08:00
Concedo
1ef41c2124 streamline output console log (+1 squashed commit)
Squashed commits:

[ca474bdd] streamline output console log
2025-03-13 15:33:49 +08:00
Concedo
16137f4281 gemma3 now works correctly 2025-03-13 14:34:18 +08:00
Concedo
57c9523405 sd lora from url 2025-03-13 10:55:01 +08:00
Oscar Barenys
f08f4b3187
Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK for VK_NV_cooperative_matrix2 support (#12301) 2025-03-12 20:06:58 +01:00
Concedo
77debb1b1b gemma3 vision works, but is using more tokens than expected - may need resizing 2025-03-13 00:31:16 +08:00
Daniel Bevenius
80a02aa858
llama.swiftui : fix xcframework dir in README [no ci] (#12353)
This commit fixes the path to the xcframework in the README file, which I
had forgotten to update after renaming the build directory.
2025-03-12 13:45:32 +01:00
Concedo
eb1809c105 add more perf stats 2025-03-12 18:58:27 +08:00
Alberto Cabrera Pérez
363f8c5d67
sycl : variable sg_size support for mmvq kernels (#12336) 2025-03-12 09:57:32 +00:00
uvos
34c961b181
CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (#12315)
When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were
converted to a selectable warp size. However, the fattn-vec kernels don't work with
64-wide warps for now, so we need to avoid launching them with parameters for warp64.
2025-03-12 10:14:11 +01:00
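The fix boils down to a device-capability guard. An illustrative host-side check (HIP shown; the actual ggml CUDA/HIP dispatch logic is more involved):

```cpp
// Illustrative guard, not the actual ggml dispatch code: only allow the
// fattn-vec kernels when the device warp size is 32, since they are not
// yet warp64-safe; callers fall back to another attention path otherwise.
#include <hip/hip_runtime.h>

bool device_supports_fattn_vec(int device) {
    hipDeviceProp_t prop;
    if (hipGetDeviceProperties(&prop, device) != hipSuccess) {
        return false;
    }
    return prop.warpSize == 32;  // CDNA/GCN devices report 64 and are excluded
}
```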
Xuan-Son Nguyen
7841fc723e
llama : Add Gemma 3 support (+ experimental vision capability) (#12343)
* llama : Add Gemma 3 text-only support

* fix python coding style

* fix compile on ubuntu

* python: fix style

* fix ubuntu compile

* fix build on ubuntu (again)

* fix ubuntu build, finally

* clip : Experimental support for Gemma 3 vision (#12344)

* clip : Experimental support for Gemma 3 vision

* fix build

* PRId64
2025-03-12 09:30:24 +01:00
Jeff Bolz
bf69cfe62f
vulkan: fix bug in coopmat1 mul_mat_id (#12316)
* tests: run mul_mat_id with a larger N

* vulkan: fix bug in coopmat1 mul_mat_id
2025-03-12 06:59:19 +01:00
Concedo
e500968f92 fixed ggml common path in metal build 2025-03-12 10:58:57 +08:00