Mirror of https://github.com/LostRuins/koboldcpp.git, synced 2026-05-17 12:39:09 +00:00
* spec : refactor
* spec : drop support for incompatible vocabs
* spec : update common_speculative_init()
* cont : pass seq_id
* cont : dedup ctx_seq_rm_type
* server : sketch the ctx_dft decode loop
* server : draft prompt cache and checkpoints
* server : improve ctx names
* server, spec : transition to unified spec context
* cont : sync main and drft contexts
* cont : async drft eval when possible
* cont : handle non-ckpt models
* cont : pass correct n_past for drafting
* cont : process images through the draft context
* spec : handle draft running out of context
* server : fix mtmd draft processing
* server : fix URL for draft model
* server : add comment
* server : clean-up + dry
* speculative-simple : update
* spec : fix n_past type
* server : fix slot ctx_drft ptr
* tools : update readme
* naming : improve consistency
* spec : refactor for multi-sequence speculative context
* cont : prepare params
* cont : prepare params
* spec : support parallel drafts
* server : support parallel drafting
* llama : reuse device buffers when possible
* server, spec : clean-up
* cont : clean-up
* cont : minor
* spec : reset `drafting` flag at the end
* spec : introduce `common_speculative_process()`
* spec : allow for multiple spec types (chain of speculators)
* replace the old `type` field of type `common_speculative_type` in the `common_params_speculative` struct with a vector, so that multiple types can be specified
* introduce `common_get_enabled_speculative_impls(const std::vector<enum common_speculative_type>)` to determine which implementations the user has enabled
* introduce `common_speculative_type_from_names(const std::vector<std::string> & names)` to parse the user-provided spec types
* all speculators run sequentially; the best one wins (we verify its drafted tokens)
* maximize the expected number of accepted tokens for the current round by computing the product of the probability of accepting the current token (n_acc_tokens / n_gen_drafts) and the draft's length

---------

Co-authored-by: Petros Sideris <petros.sideris@nokia.com>
| Name |
|---|
| batched |
| batched.swift |
| convert-llama2c-to-ggml |
| debug |
| deprecation-warning |
| diffusion |
| embedding |
| eval-callback |
| gen-docs |
| gguf |
| gguf-hash |
| idle |
| llama.android |
| llama.swiftui |
| lookahead |
| lookup |
| model-conversion |
| parallel |
| passkey |
| retrieval |
| save-load-state |
| simple |
| simple-chat |
| simple-cmake-pkg |
| speculative |
| speculative-simple |
| sycl |
| training |
| CMakeLists.txt |
| convert_legacy_llama.py |
| json_schema_pydantic_example.py |
| json_schema_to_grammar.py |
| llama.vim |
| pydantic_models_to_grammar.py |
| pydantic_models_to_grammar_examples.py |
| reason-act.sh |
| regex_to_grammar.py |
| server-llama2-13B.sh |
| server_embd.py |
| ts-type-to-grammar.sh |