koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-23 21:13:36 +00:00

Author	SHA1	Message	Date
Concedo	d577187875	update sdui	2025-12-21 20:35:19 +08:00
Concedo	7304640f72	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/release.yml # docs/android.md # docs/backend/hexagon/CMakeUserPresets.json # examples/llama.android/app/src/main/res/layout/activity_main.xml # examples/llama.android/app/src/main/res/layout/item_message_assistant.xml # examples/llama.android/app/src/main/res/layout/item_message_user.xml # examples/model-conversion/scripts/causal/run-org-model.py # examples/model-conversion/scripts/utils/common.py # ggml/CMakeLists.txt # ggml/src/ggml-hexagon/CMakeLists.txt # ggml/src/ggml-hexagon/htp/CMakeLists.txt # ggml/src/ggml-hexagon/htp/matmul-ops.c # tests/test-arg-parser.cpp # tools/server/README.md	2025-12-20 09:32:06 +08:00
Concedo	714ab0682e	Revert "Revert "llama : Async DirectIO model loading on Linux (#18012 )"" This reverts commit `a45fc5ee88`.	2025-12-20 09:25:10 +08:00
Julius Tischbein	f99ef53d2a	llama : Changing off_t to size_t for Windows (#18204 )	2025-12-19 16:42:46 +02:00
Concedo	a45fc5ee88	Revert "llama : Async DirectIO model loading on Linux (#18012 )" This reverts commit `4d4f4cacd1`.	2025-12-19 19:06:30 +08:00
Concedo	58eb5573de	Merge branch 'upstream' into concedo_experimental # Conflicts: # ggml/src/ggml-cpu/CMakeLists.txt # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-hexagon/htp/act-ops.c # ggml/src/ggml-hexagon/htp/hvx-utils.c # ggml/src/ggml-hexagon/htp/main.c # src/llama-model.cpp # tools/server/README.md	2025-12-19 11:00:43 +08:00
Concedo	e005fc2587	Merge commit '`8dcc3662a2`' into concedo_experimental Keep changes from https://github.com/ggml-org/llama.cpp/pull/18096 without https://github.com/ggml-org/llama.cpp/pull/14904 Reason is to maintain compatibility with 2023 w64devkit # Conflicts: # .github/ISSUE_TEMPLATE/019-bug-misc.yml # examples/model-conversion/scripts/causal/run-org-model.py # examples/speculative/speculative.cpp # ggml/src/ggml-cpu/arch-fallback.h # ggml/src/ggml-cpu/repack.cpp # ggml/src/ggml-cpu/repack.h # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-hexagon/htp/act-ops.c # ggml/src/ggml-hexagon/htp/htp-msg.h # ggml/src/ggml-hexagon/htp/hvx-utils.c # ggml/src/ggml-hexagon/htp/hvx-utils.h # ggml/src/ggml-hexagon/htp/main.c	2025-12-19 02:11:55 +08:00
Johannes Gäßler	57c1e05643	llama: offload output layer to GPU first (#18148 )	2025-12-18 08:12:18 +01:00
Julius Tischbein	4d4f4cacd1	llama : Async DirectIO model loading on Linux (#18012 ) * Uncached model read * Removing additional --mmap arg * Removing trailing whitespaces * Adding fallback when O_DIRECT is not supported * Remove branching in llama-model-loader.cpp and reduce code duplications in llama-mmap.cpp * Adding maybe unused keyword for Mac and Windows. * File seek aligned * Removing all branches for direct_io in llama-model-loader.cpp * Always use alignment from llama_file * use_mmap=true	2025-12-18 08:27:19 +02:00
Johannes Gäßler	8dcc3662a2	llama-fit-params: fix memory print (#18136 )	2025-12-17 21:10:03 +01:00
Georgi Gerganov	4301e27319	common : restore grammar-based rejection sampling (#18137 ) * common : restart grammar-based rejection sampling * sampling : allow null samplers	2025-12-17 19:46:00 +02:00
Concedo	1f2c9f6b62	gpt4v not working correctly	2025-12-17 21:02:16 +08:00
Concedo	1daeed5d4d	Merge commit '`9963b81f63`' into concedo_experimental # Conflicts: # .github/workflows/server.yml # SECURITY.md # docs/backend/SYCL.md # examples/model-conversion/README.md # examples/model-conversion/scripts/embedding/compare-embeddings-logits.sh # ggml/src/ggml-hexagon/ggml-hexagon.cpp # ggml/src/ggml-hexagon/htp/matmul-ops.c # tests/CMakeLists.txt # tests/test-chat.cpp # tests/test-json-schema-to-grammar.cpp	2025-12-17 20:30:34 +08:00
Tarek Dakhran	982060fadc	model: fix LFM2_MOE missing tensors (#18132 )	2025-12-17 12:17:11 +01:00
Concedo	c93c4c5505	Merge commit '`4a4f7e6550`' into concedo_experimental # Conflicts: # .github/ISSUE_TEMPLATE/011-bug-results.yml # CODEOWNERS # README.md # ci/run.sh # docs/development/HOWTO-add-model.md # grammars/README.md # src/llama-context.cpp # src/llama.cpp # tools/CMakeLists.txt # tools/completion/README.md # tools/llama-bench/README.md	2025-12-17 14:30:39 +08:00
Johannes Gäßler	d0794e89d9	llama-fit-params: force disable mlock (#18103 )	2025-12-17 00:50:12 +01:00
Johannes Gäßler	9dcac6cf9f	llama-fit-params: lower ctx size for multi GPU (#18101 )	2025-12-17 00:49:34 +01:00
Johannes Gäßler	0e49a7b8b4	llama-fit-params: fix underflow for dense models (#18095 )	2025-12-17 00:47:37 +01:00
Xuan-Son Nguyen	ef83fb8601	model: fix LFM2 missing tensors (#18105 )	2025-12-16 19:07:43 +01:00
Concedo	050a5b1f52	Merge commit '`4aced7a631`' into concedo_experimental # Conflicts: # .devops/cann.Dockerfile # .devops/cpu.Dockerfile # .devops/cuda.Dockerfile # .devops/intel.Dockerfile # .devops/musa.Dockerfile # .devops/rocm.Dockerfile # .devops/tools.sh # .devops/vulkan.Dockerfile # .github/workflows/build.yml # .github/workflows/release.yml # .gitignore # docs/ops.md # docs/ops/SYCL.csv # examples/batched/batched.cpp # examples/eval-callback/eval-callback.cpp # examples/gen-docs/gen-docs.cpp # examples/lookahead/lookahead.cpp # examples/lookup/lookup-create.cpp # examples/lookup/lookup-stats.cpp # examples/lookup/lookup.cpp # examples/model-conversion/scripts/causal/compare-logits.py # examples/model-conversion/scripts/causal/run-org-model.py # examples/model-conversion/scripts/utils/check-nmse.py # examples/parallel/parallel.cpp # examples/retrieval/retrieval.cpp # examples/save-load-state/save-load-state.cpp # examples/speculative-simple/speculative-simple.cpp # examples/speculative/speculative.cpp # examples/training/finetune.cpp # ggml/CMakeLists.txt # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-cpu/repack.cpp # ggml/src/ggml-sycl/common.hpp # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/dequantize.hpp # ggml/src/ggml-sycl/dpct/helper.hpp # ggml/src/ggml-sycl/element_wise.cpp # ggml/src/ggml-sycl/element_wise.hpp # ggml/src/ggml-sycl/ggml-sycl.cpp # ggml/src/ggml-sycl/mmvq.cpp # ggml/src/ggml-sycl/pad.cpp # ggml/src/ggml-sycl/ssm_conv.cpp # ggml/src/ggml-sycl/vecdotq.hpp # pyrightconfig.json # scripts/sync-ggml.last # tests/test-arg-parser.cpp # tests/test-backend-ops.cpp # tools/cvector-generator/cvector-generator.cpp # tools/imatrix/imatrix.cpp # tools/mtmd/CMakeLists.txt # tools/mtmd/clip.cpp # tools/perplexity/perplexity.cpp # tools/server/README.md	2025-12-16 23:14:12 +08:00
Johannes Gäßler	ec98e20021	llama: fix early stop in params_fit if ctx is set (#18070 )	2025-12-16 14:24:00 +01:00
Xuan-Son Nguyen	7f2b2f3c77	arch: refactor LLM_TENSOR_NAMES (#18051 ) * arch: refactor LLM_TENSOR_NAMES * update docs * typo * fix LLM_ARCH_NEMOTRON_H_MOE * show more meaningful error message on missing tensor * fix and tested LLM_ARCH_NEMOTRON_H_MOE	2025-12-16 13:22:30 +01:00
Piotr Wilkin (ilintar)	a5251ca11d	Optimization: Qwen3 next autoregressive pass (#17996 ) * It's Qwen3 Next, the lean mean token generation machine! * Apply patches from thread * Remove recurrent version, only keep chunked and autoregressive * Remove unnecessary conts and asserts * Remove more extra conts and asserts * Cleanup masking	2025-12-16 11:59:53 +01:00
Xuan-Son Nguyen	3d86c6c2b5	model: support GLM4V vision encoder (#18042 ) * convert ok * no deepstack * less new tensors * cgraph ok * add mrope for text model * faster patch merger * add GGML_ROPE_TYPE_MRNORM * add support for metal * move glm4v do dedicated graph * convert: add norm_embd * clip: add debugging fn * working correctly * fix style * use bicubic * fix mrope metal * improve cpu * convert to neox ordering on conversion * revert backend changes * force stop if using old weight * support moe variant * fix conversion * fix convert (2) * Update tools/mtmd/clip-graph.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * process mrope_section on TextModel base class * resolve conflict merge --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-16 11:25:26 +01:00
Concedo	e88bf41fdc	Merge commit '`12280ae905`' into concedo_experimental # Conflicts: # .github/workflows/build.yml # common/CMakeLists.txt # docs/docker.md # examples/model-conversion/scripts/causal/compare-logits.py # ggml/src/ggml-hexagon/htp/rope-ops.c # tests/test-backend-ops.cpp # tests/test-barrier.cpp # tools/server/CMakeLists.txt # tools/server/README.md	2025-12-16 16:29:01 +08:00
Chris Peterson	2aa45ef9e3	llama: Include algorithm header needed for C++23 (#18078 )	2025-12-16 09:37:55 +02:00
Georgi Gerganov	c560316440	graph : reuse SSM graphs (#16490 ) * graph : reuse hybrid graphs * graph : reuse recurrent graphs * graph : fix reuse check for recurrent inputs * memory : move the recurrent state into the memory context * Revert "memory : move the recurrent state into the memory context" This reverts commit 00f115fe810815d4a22a6dee0acc346131e970e1. * cont : fix build	2025-12-16 09:36:21 +02:00
Daniel Bevenius	2995341730	llama : add support for NVIDIA Nemotron 3 Nano (#18058 ) * llama : add support for NVIDIA Nemotron Nano 3 This commit adds support for the NVIDIA Nemotron Nano 3 model, enabling the conversion and running of this model. Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-16 07:19:26 +01:00
HelloKS	9d52f17ae3	model : add KORMo model (#18032 ) * vocab: add KORMo Tokenizer * model: add KORMoForCausalLM * vocab: change pretokenizer to qwen2 * lint: fix unintended line removal * model: make qwen2 bias tensor optional * model: use qwen2 architecture for KORMo	2025-12-15 18:51:43 +01:00
ssweens	4529c660c8	kv-cache: Fix state restore fragmented cache (#17982 ) * kv-cache : fix state restore with fragmented cache (#17527) Change find_slot to allow non-contiguous allocation during state restore. Fixes 'failed to find available cells in kv cache' error when restoring state to fragmented cache. * tests : update logic * cleanup: tightened state_read_meta sig, added is_contiguous case * fix: state_read_meta arg reorder loose ends --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-15 19:28:35 +02:00
Johannes Gäßler	b1f3a6e5db	llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653 ) * llama: automatically fit args to free memory llama-fit-params tool * fix CI * hints for bug reports, ensure no reallocation * fix segfault with Vulkan * add llama-fit-params to CI * fix CI * fix CI * fix CI * minor adjustments * fix assignment of 1 dense layer * fix logger not being reset on model load failure * remove --n-gpu-layer hint on model load failure * fix llama-fit-params verbosity * fix edge case * fix typo [no ci]	2025-12-15 09:24:59 +01:00
Xuan-Son Nguyen	0759b09c90	graph: add f_attn_temp_offset (#18025 )	2025-12-14 13:05:59 +01:00
Georgi Gerganov	609a2d0268	models : fix YaRN regression + consolidate logic (#18006 ) * models : fix YaRN regression + consolidate logic * cont : fix the fix * cont : remove header * cont : add header	2025-12-14 08:34:56 +02:00
Jeff Bolz	5266379bca	llama_context: synchronize before reallocating output buffer (#17974 )	2025-12-13 09:19:51 -06:00
Georgi Gerganov	7bed317f53	models : fix the attn_factor for mistral3 graphs + improve consistency (#17945 ) * models : fix the attn_factor for mistral3 graphs * cont : rework attn_factor correction logic * cont : make deepseek2 consistent * cont : add TODO * cont : special-case DSv2 * cont : revert Mistral 3 Large changes * cont : fix DS2 to use the original attn_factor * cont : minor comments	2025-12-12 17:12:40 +02:00
Concedo	34d243bf3c	Merge commit '`b677721819`' into concedo_experimental # Conflicts: # CONTRIBUTING.md # common/chat.cpp # docs/ops.md # docs/ops/CPU.csv # docs/ops/CUDA.csv # docs/ops/OpenCL.csv # ggml/src/ggml-cann/aclnn_ops.cpp # ggml/src/ggml-cann/common.h # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-sycl/softmax.cpp # grammars/README.md # src/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-chat.cpp # tests/test-grammar-integration.cpp # tests/test-grammar-parser.cpp # tests/test-llama-grammar.cpp # tools/mtmd/CMakeLists.txt	2025-12-11 23:33:19 +08:00
Concedo	278e45becf	Merge commit '`2fa51c19b0`' into concedo_experimental # Conflicts: # .github/actions/windows-setup-cuda/action.yml # .github/workflows/build-linux-cross.yml # .github/workflows/release.yml # README.md # docs/build-riscv64-spacemit.md # examples/model-conversion/logits.cpp # ggml/CMakeLists.txt # ggml/src/ggml-cpu/CMakeLists.txt # models/templates/Kimi-K2-Instruct.jinja # models/templates/Kimi-K2-Thinking.jinja # tests/test-chat.cpp # tools/server/README.md	2025-12-11 23:04:48 +08:00
Concedo	fd0d0cab03	move pipeline parallelism to a --pipelineparallel launch flag	2025-12-11 21:03:41 +08:00
Georgi Gerganov	d9f8f60618	batch : fix sequence id ownership (#17915 ) * batch : fix sequence id ownage * cont : reduce allocations	2025-12-11 14:29:47 +02:00
Georgi Gerganov	4dff236a52	ggml : remove GGML_KQ_MASK_PAD constant (#17910 ) * ggml : remove GGML_KQ_MASK_PAD constant * cont : remove comment	2025-12-10 20:53:16 +02:00
Eric Zhang	b677721819	model : Qwen3-Next-80B-A3B has 48 layers (#17898 ) * model : Qwen3-Next-80B-A3B has 48 layers * model : Add 80B-A3B type name	2025-12-10 15:22:40 +01:00
Rhys-T	63908b631a	cmake: fix Mach-O current version number (#17877 ) PR #17091 set the VERSION of various libraries to 0.0.abcd, where abcd is the LLAMA_BUILD_NUMBER. That build number is too large to fit in the Mach-O 'current version' field's 'micro' part, which only goes up to 255. This just sets the Mach-O current version to 0 to get it building properly again. Fixes #17258.	2025-12-09 13:17:41 +02:00
Sigbjørn Skjæret	42b12b5608	model : nit, DeepSeek V1 MoE is 16B and GigaChat is 20B (#12652 ) * nit, DeepSeek V1 MoE is 16B * base type on n_ff_exp instead	2025-12-09 12:15:06 +01:00
Aldehir Rojas	e39502e74b	llama : add token matching support to llama-grammar (#17816 ) * llama : add token support to llama-grammar * fix inverse token comment * refactor trigger_patterns to replay tokens instead of the entire string * add token documentation * fix test-llama-grammar * improve test cases for tokens	2025-12-09 00:32:57 -06:00
philip-essential	1d2a1ab73d	model : support Rnj-1 (#17811 ) * add support for rnj1 * refactor gemma3 to support rnj-1 * address review comments	2025-12-09 04:49:03 +01:00
Sigbjørn Skjæret	c8554b66e0	graph : use fill instead of scale_bias in grouped expert selection (#17867 ) * use fill instead of scale_bias in grouped expert selection * do not explicitly use _inplace	2025-12-08 21:29:59 +01:00
Piotr Wilkin (ilintar)	e4e9c4329c	Make graph_max_nodes vary by ubatch size (#17794 ) * Make graph_max_nodes vary by ubatch size for models where chunking might explode the graph * Update src/llama-context.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Add missing const --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-08 14:32:41 +01:00
Xuan-Son Nguyen	4d3726278b	model: add llama 4 scaling for mistral-large (deepseek arch) (#17744 )	2025-12-07 22:29:54 +01:00
Concedo	17c0c8d55d	Merge branch 'upstream' into concedo_experimental # Conflicts: # README.md # docs/backend/zDNN.md # docs/build.md # docs/ops.md # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-cann/ggml-cann.cpp # ggml/src/ggml-opencl/ggml-opencl.cpp # ggml/src/ggml-rpc/ggml-rpc.cpp # ggml/src/ggml-sycl/convert.cpp # ggml/src/ggml-sycl/ggml-sycl.cpp # src/llama-quant.cpp # tests/test-backend-ops.cpp # tools/llama-bench/llama-bench.cpp # tools/server/README.md	2025-12-07 16:48:38 +08:00
Concedo	7c5d271d6c	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/release.yml # .github/workflows/winget.yml # CMakeLists.txt # CODEOWNERS # CONTRIBUTING.md # cmake/build-info.cmake # docs/ops.md # docs/ops/BLAS.csv # docs/ops/Metal.csv # examples/CMakeLists.txt # examples/save-load-state/save-load-state.cpp # examples/simple-cmake-pkg/README.md # ggml/CMakeLists.txt # ggml/src/CMakeLists.txt # ggml/src/ggml-rpc/ggml-rpc.cpp # ggml/src/ggml-webgpu/ggml-webgpu.cpp # ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py # src/llama-quant.cpp # tests/test-backend-ops.cpp # tools/server/CMakeLists.txt	2025-12-07 16:37:32 +08:00


1012 commits