koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2025-09-11 17:44:38 +00:00

Author	SHA1	Message	Date
Concedo	a681cdd9ef	Merge branch 'upstream' into concedo_experimental # Conflicts: # common/sampling.h # llama.h # tests/test-chat-template.cpp	2024-04-24 21:29:07 +08:00
Johannes Gäßler	28103f4832	Server: fix seed for multiple slots (#6835) * Server: add tests for consistent results * sampling: separate rng per sampling context	2024-04-24 11:08:36 +02:00
Concedo	b4d2031215	merged, added ability to render special tokens	2024-04-22 18:19:58 +08:00
Pedro Cuenca	b97bc3966e	llama : support Llama 3 HF conversion (#6745) * Support Llama 3 conversion The tokenizer is BPE. * style * Accept suggestion Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com> * llama : add llama_token_is_eog() ggml-ci * llama : auto-detect more EOT tokens when missing in KV data * convert : replacing EOS token is a hack * llama : fix codegemma EOT token + add TODOs * llama : fix model type string for 8B model --------- Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-21 14:50:41 +03:00
Concedo	9a25d77cc1	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .github/workflows/docker.yml # Makefile # README-sycl.md # README.md # ci/run.sh # ggml-cuda.cu # ggml.c # grammars/README.md # scripts/get-wikitext-2.sh # scripts/hf.sh # scripts/sync-ggml.last # tests/test-backend-ops.cpp # tests/test-grammar-integration.cpp # tests/test-json-schema-to-grammar.cpp	2024-04-14 21:18:39 +08:00
Jared Van Bortel	1b67731e18	BERT tokenizer fixes (#6498) Key changes: * BERT conversion: fix abuse of LlamaHfVocab, do not set BOS or EOS * Nomic Embed conversion: pad vocab instead of slicing embedding tensor * llama_tokenize: handle added special tokens like HF does	2024-04-09 13:44:08 -04:00
Concedo	d1bb126605	Merge branch 'upstream' into concedo # Conflicts: # README.md # llama.cpp # otherarch/sdcpp/SDCPP_LICENSE # scripts/sync-ggml-am.sh # scripts/sync-ggml.sh	2024-04-09 17:18:35 +08:00
Jan Boon	beea6e1b16	llama : save and restore kv cache for single seq id (#6341) * llama : save and restore kv cache for single seq id * remove trailing whitespace * respond error in case there's no space in the kv cache * add kv seq save restore to test case * add --slot-save-path arg to enable save restore and restrict save location * Returning 0 for some cases, instead of asserting. * cleanup error cases * rename sequence state functions * rename state get set functions * add previous function names back in with DEPRECATED notice * update doc * adjust endpoints to preferred style * fix restoring zero cell count * handle seq rm return value * unused param * keep in the size check * fix return types * add server test case for slot save restore * cleanup * add cake * cleanup style * add special * removing a whole sequence never fails * move sequence state file functionality from server to llama to match session api and add version tags * catch exceptions on save as well * error log messages * check types for stricter restore * update server doc * readme : update API changes date * strict filename validation * move include, reject bom as well * also reject empty filename * reject whitespace and trailing dot --------- Co-authored-by: Martin Evans <martindevans@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-08 15:43:30 +03:00
Concedo	ba950716a9	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # Package.swift # README.md # build.zig # llama.cpp # tests/test-tokenizer-1-bpe.cpp # tests/test-tokenizer-1-llama.cpp	2024-03-13 11:21:58 +08:00
Georgi Gerganov	05b06210c9	llama : more consistent names of count variables (#5994) * llama : more consistent names of count variables ggml-ci * llama : n_parallel -> n_seq_max * common : fix param name * examples : fix param name	2024-03-11 17:49:47 +02:00
Concedo	ac43e0115c	Merge branch 'master' into concedo_experimental # Conflicts: # .devops/nix/package.nix # README.md # ggml-metal.m # llama.cpp # scripts/sync-ggml.last # tests/test-backend-ops.cpp	2024-03-05 15:54:05 +08:00
DAN™	5a51cc1bb4	main : support special tokens as reverse/anti prompt (#5847) * Support special tokens as reverse/anti prompt. * Tokenize antiprompts only once. * main : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-04 09:57:20 +02:00
Concedo	ad638285de	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile # README.md # flake.lock # ggml-cuda.cu # llama.cpp # tests/test-backend-ops.cpp # tests/test-quantize-fns.cpp	2024-02-28 13:41:35 +08:00
Georgi Gerganov	bf08e00643	llama : refactor k-shift implementation + KV defragmentation (#5691) * llama : refactor k-shift implementation ggml-ci * llama : rename llama_kv_cache_seq_shift to llama_kv_cache_seq_add * llama : cont k-shift refactoring + normalize type names ggml-ci * minor : fix MPI builds * llama : reuse n_rot from the build context ggml-ci * llama : revert enum name changes from this PR ggml-ci * llama : update llama_rope_type * llama : add comment about rope values * llama : fix build * passkey : apply kv cache updates explicitly ggml-ci * llama : change name to llama_kv_cache_update() * llama : add llama_kv_cache_seq_pos_max() * passkey : fix llama_kv_cache_seq_pos_max() usage * llama : some llama_kv_cell simplifications * llama : add llama_kv_cache_compress (EXPERIMENTAL) * llama : add alternative KV cache merging (EXPERIMENTAL) * llama : add llama_kv_cache_defrag * llama : comments * llama : remove llama_kv_cache_compress will add in a separate PR ggml-ci * llama : defragment via non-overlapping moves * llama : ggml_graph based defrag implementation ggml-ci * llama : switch the loop order in build_defrag * llama : add comments	2024-02-25 22:12:24 +02:00
Concedo	be696e0da9	Merge branch 'master' into concedo_experimental # Conflicts: # README.md # scripts/sync-ggml.last	2024-02-22 15:49:19 +08:00
Jared Van Bortel	89febfed93	examples : do not assume BOS when shifting context (#5622)	2024-02-21 10:33:54 -05:00
Concedo	8d5e25008f	Merge branch 'master' into concedo_experimental # Conflicts: # CMakeLists.txt # Makefile # README.md # ci/run.sh # tests/test-tokenizer-0-falcon.cpp # tests/test-tokenizer-0-llama.cpp # tests/test-tokenizer-1-bpe.cpp # tests/test-tokenizer-1-llama.cpp	2024-02-17 15:22:05 +08:00
bmwl	f486f6e1e5	ggml : add numa options (#5377) * Added numa options to allow finer grained control as well as plumbing for a new mirror mode that will require numa.h * Reverted Makefile * Fixed include * Removed sched.h from ggml.h, moved ggml_get_numa_affinity into ggml.c, removed trailing whitespace and fixed up a few inconsistent variables * removed trailing whitespace * Added numa options to allow finer grained control as well as plumbing for a new mirror mode that will require numa.h * Reverting Makefile * Fixed a number of issues with the move from BOOL to ggml_numa_strategies. Added a note about mirror mode note being implemented yet * Removing MIRROR_MODE code for this PR * Removing last bit of MIRROR_MODE code for this PR * Removing unneeded branch in server.cpp example and moving get_numa_affinity and making it static * Fixed lingering init_llama_backend() bool calls in tests and examples * Remote enum llama_numa_strategies * Revert bad merge with dynatemp flags * add missing enum ggml_numa_strategies declaration and revert sync problem with master * add missing enum ggml_numa_strategies declaration * fixed ggml_init_numa variable * Update ggml.h Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update READMEs with info about numa flags, change INTERLEAVE strategy name to DISTRIBUTE everywhere, implement the improved distribution strategy from @rankaiyx, fix a spelling mistake and un-merge some bad merges * split numa init out from llama_backend_init and created llama_numa_init. Updated all code paths and samples * Fix up some boolean vs enum comparisons * Added #ifdefs for non-Linux OS that don't have cpu_set_t datatype * Update ggml.h Align enum values Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml.c Remove whitespace Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml.c align paremeters Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/server/server.cpp remove whitespace and align brace Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update common/common.cpp Remove whitespace and align brace Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * unified ggml_numa_strategy enum and fixed text alignment in server.cpp example * Update ggml.c simplified return for platforms without NUMA support Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * removed redundant else from cli argument processing of --numa * whitespace --------- Co-authored-by: root <root@nenya.lothlorien.ca> Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Jared Van Bortel <jared@nomic.ai>	2024-02-16 11:31:07 +02:00
Concedo	3cec37c2e0	Merge branch 'master' into concedo_experimental # Conflicts: # .flake8 # .github/workflows/python-lint.yml # flake.lock # ggml-cuda.cu # ggml-quants.c # llama.cpp # pocs/vdot/q8dot.cpp # pocs/vdot/vdot.cpp # tests/test-quantize-fns.cpp # tests/test-quantize-perf.cpp	2024-02-13 00:14:22 +08:00
Georgi Gerganov	85910c5b30	main : ctrl+C print timing in non-interactive mode (#3873)	2024-02-11 15:35:50 +02:00
Concedo	6dc01297f8	Merge branch 'master' into concedo_experimental # Conflicts: # .devops/nix/package.nix # .github/workflows/build.yml # CMakeLists.txt # Makefile # README.md # flake.nix # llama.cpp # llama.h # tests/test-llama-grammar.cpp	2024-02-04 19:42:57 +08:00
Michael Klimenko	52bb63c708	refactor : switch to emplace_back to avoid extra object (#5291)	2024-02-03 13:23:37 +02:00
Concedo	54cc31f9dc	Merge branch 'master' into concedo_experimental # Conflicts: # .ecrc # .github/workflows/build.yml # CMakeLists.txt # README.md # llama.cpp # tests/test-c.c	2024-01-30 19:18:10 +08:00
divinity76	813416991a	main : allow empty --prompt-cache file (#5176) * allow empty --prompt-cache file This allows the use of std::tmpnam(), std::tmpfile(), Python's tempfile.NamedTemporaryFile(), and similar create-empty-file API's for the user. I switched from the C fopen API to the C++ filesystem api to get around the fact that, to the best of my knowledge, C has no portable way to get the file size above LONG_MAX, with std::ftell() returning long? fallback to std::ifstream for c++ < 17 (the project is currently targeting C++11 it seems - file_exists() and file_size() can be removed when we upgrade to c++17) * formatting (requested in codereview) * remove c++17, file_is_empty	2024-01-30 11:18:02 +02:00
Concedo	71e9a64171	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/nix-ci.yml # CMakeLists.txt # Makefile # ggml-cuda.cu # ggml-opencl.cpp # llama.cpp	2024-01-20 23:27:42 +08:00
Concedo	dc7bc0cb50	Merge commit '584d674be6' into concedo_experimental # Conflicts: # .github/workflows/nix-flake-update.yml # Makefile # Package.swift # ggml-cuda.cu # tests/test-quantize-fns.cpp	2024-01-14 16:29:44 +08:00
Yann Follet	722d33f34e	main : add parameter --no-display-prompt (#4541) * add the parameter : --no-display-prompt , combine with --log-disable it will display only the generated tokens * remove empty line --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-01-13 18:09:08 +02:00
Georgi Gerganov	7edefbd79c	main : better name for variable n_print (#4874)	2024-01-11 22:46:26 +02:00
Georgi Gerganov	3ca63b4538	main : disable token count by default (#4874)	2024-01-11 22:43:05 +02:00
pudepiedj	43f76bf1c3	main : print total token count and tokens consumed so far (#4874) * Token count changes * Add show token count * Updating before PR * Two requested changes * Move param def posn	2024-01-11 18:14:52 +02:00
Concedo	66533c8424	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile # Package.swift # README.md # tests/test-quantize-fns.cpp	2024-01-09 17:48:18 +08:00
Georgi Gerganov	52531fdff8	main : add self-extend support (#4815) * examples : add passkey test * passkey : better prints * passkey : select pass key pos from CLI * passkey : simplify n_past logic * llama : "self-extend"-like context extension * passkey : add comment * main : add Self-Extend support * llama : add comment about llama_kv_cache_seq_div	2024-01-08 11:18:32 +02:00
Concedo	ec21fa7712	Merge branch 'master' into concedo_experimental # Conflicts: # .github/workflows/build.yml # .gitignore # CMakeLists.txt # Makefile # Package.swift # README.md # ggml-cuda.cu # llama.cpp # llama.h # scripts/sync-ggml.sh # tests/CMakeLists.txt	2023-12-08 17:42:26 +08:00
MaggotHATE	52c8bc3cf3	sampling : custom samplers order (#4285) * Samplers sequence order w parameter * Cleaned commented code * Fixed formatting * Rewrote with unordered_map * Revert and rewrite, too many problems and safeguards would be needed * Fixed code style * Code style fixes according to review * More readable samplers input string, fixed help * Style fix in sampler_queue * Formatting fixes * Fixing whitespaces	2023-12-05 12:05:51 +02:00
Concedo	4f40c226a0	Merge branch 'master' into concedo_experimental # Conflicts: # .devops/tools.sh # .gitignore # CMakeLists.txt # Makefile # README.md	2023-12-01 23:46:59 +08:00
Andrew Godfrey	8efa0f6ebe	main : pass LOG_TEE callback to llama.cpp log (#4033) * main : Call llama_log_set to use LOG_TEE * tabs to spaces	2023-11-30 23:56:19 +02:00
Concedo	56a5fa7a60	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile # tests/test-tokenizer-0-falcon.py # tests/test-tokenizer-0-llama.py	2023-11-20 22:37:06 +08:00
Seb C	881800d1f0	main : Add ChatML functionality to main example (#4046) Co-authored-by: Sebastian Cramond <sebby37@users.noreply.github.com>	2023-11-20 14:56:59 +01:00
Concedo	6bf8ee4aea	Merge branch 'master' into concedo_experimental # Conflicts: # Makefile # ggml-cuda.cu # tests/test-tokenizer-0-falcon.py # tests/test-tokenizer-0-llama.py	2023-11-18 11:10:45 +08:00
Kerfuffle	91f6499393	Respect tokenizer.ggml.add_bos_token value when tokenizing (#4040) * gguf-py: gguf-dump: Respect --no-tensor flag in JSON mode. * Respect add_bos_token GGUF metadata value * gguf-py: Try to fix SpecialVocab giving up too easily for the Nth time	2023-11-16 19:14:37 -07:00
Concedo	bc4ff72317	not working merge	2023-11-02 17:52:40 +08:00
cebtenzzre	b12fa0d1c1	build : link against build info instead of compiling against it (#3879) * cmake : fix build when .git does not exist * cmake : simplify BUILD_INFO target * cmake : add missing dependencies on BUILD_INFO * build : link against build info instead of compiling against it * zig : make build info a .cpp source instead of a header Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com> * cmake : revert change to CMP0115 --------- Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>	2023-11-02 08:50:16 +02:00
Kerfuffle	6e08281e58	Extend llama_kv_cache_seq_rm to allow matching any sequence (#3843) * Extend llama_kv_cache_seq_rm to allow matichng any sequence * Replace llama_kv_cache_tokens_rm with llama_kv_cache_clear Use llama_kv_cache_clear for cache clearing Change calls to llama_kv_cache_tokens_rm that want to delete by position to use llama_kv_cache_seq_rm functionality	2023-10-29 11:31:40 -06:00
Marcus Dunn	5be6c803fa	llama : remove token functions with `context` args in favor of `model` (#3720) * added `llama_model_token_` variants to all the `llama_token_` functions. * added `LLAMA_API` * formatting Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * removed old `llama_token` functions * changed 3 more functions to take in model - `llama_token_get_text` - `llama_token_get_score` - `llama_token_get_type` * added back docs * fixed main.cpp * changed token functions to use new model variants * changed token functions to use new model variants --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-23 22:40:03 +03:00
vvhg1	d3956aea53	main : escape prompt for cfg_negative_prompt and consecutive inputs in main with interactive (#3623) * infill tokens correction * serverinfill tokens correction * removing any leading whitespace from infill suffix and removing leeading space token from suffix when params.escape * removing any leading whitespace from infill suffix and removing leeading space token from suffix when params.escape * only rm when params.escape, rm space if possible which is added back or rm added space token * only rm when params.escape, rm space if possible which is added back or rm added space token * Revert "only rm when params.escape, rm space if possible which is added back or rm added space token" This reverts commit 63ba0b621f21077c0e3bc6ba6a327534123cb738. * fix interactive prompt escaping and fix server infill leading space handling * rm unnecessary bool check * process escapes for neg prompt and interactive consec prompts * removed unneccessary static string escape	2023-10-22 21:09:51 +03:00
Georgi Gerganov	d1031cf49c	sampling : refactor init to use llama_sampling_params (#3696) * sampling : refactor init to use llama_sampling_params * llama : combine repetition, frequency and presence penalties in 1 call * examples : remove embd-input and gptneox-wip * sampling : rename penalty params + reduce size of "prev" vector * sampling : add llama_sampling_print helper * sampling : hide prev behind API and apply #3661 ggml-ci	2023-10-20 21:07:23 +03:00
Georgi Gerganov	0e89203b51	speculative : add tree-based sampling example (#3624) * sampling : one sequence per sampling context ggml-ci * speculative : add tree-based sampling support ggml-ci * speculative : reuse the n_parallel CLI param * speculative : refactor sampling * examples : fix build after sampling refactoring ggml-ci * batched : fix n_seq_id * sampling : fix malloc ggml-ci * swift : fix build ggml-ci * swift : try to fix build ggml-ci * prompts : add assistant.txt * common : add llama_batch_add() and llama_batch_clear() helpers * speculative : minor refactor ggml-ci * minor : comments + rename ggml-ci * speculative : fix off-by-one for n_drafted * speculative : fix the n_drafted fix + p constants	2023-10-18 16:21:57 +03:00
Georgi Gerganov	e1675d133c	llama : avoid fprintf in favor of LLAMA_LOG (#3538)	2023-10-17 22:34:26 +03:00
staviq	1a159553f9	tokenizer : special token handling (#3538) * Rewrite special token handling from #1931 * shorten param name, add st verification by type * use offsets instead of copy by substr * formatting, remove copying iterator on delete * llama : normalize code-style * swift fix * print pfx/sfx if verb, main: split pfx input sfx * dont add space when using special tokens * minor : comment + spacing --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2023-10-17 18:11:01 +03:00
Georgi Gerganov	b8fe4b5cc9	main : fix session loading bug (#3400)	2023-10-11 23:55:41 +03:00

142 commits
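The server fix for multiple slots (#6835) separates the random number generator per sampling context. As a result, a fixed seed reproduces the same tokens no matter how many other slots are sampling at the same time. With one shared generator, a slot's draws would shift whenever another slot consumed random numbers in between. The Python sketch below illustrates only that idea, under stated assumptions: `SlotSampler`, `generate`, and `PROBS` are hypothetical names, `random.Random` stands in for whatever generator the C++ code actually uses, and token sampling is reduced to a single weighted draw.

```python
import random


class SlotSampler:
    """Hypothetical per-slot sampler that owns a private RNG.

    Illustrative sketch only; not llama.cpp's or koboldcpp's API.
    """

    def __init__(self, seed: int) -> None:
        # Private generator: this slot's draws depend only on its own seed
        # and on how many tokens this slot has sampled so far.
        self.rng = random.Random(seed)

    def sample(self, probs: list[float]) -> int:
        # Pick one token id in [0, len(probs)) with the given weights.
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate(sampler: SlotSampler, probs: list[float], n: int) -> list[int]:
    """Sample n token ids in a row from a single slot."""
    return [sampler.sample(probs) for _ in range(n)]


PROBS = [0.1, 0.2, 0.3, 0.4]

# Slot A running on its own.
alone = generate(SlotSampler(seed=42), PROBS, 8)

# Slot A interleaved token by token with slot B, which uses a different seed.
a, b = SlotSampler(seed=42), SlotSampler(seed=7)
interleaved = []
for _ in range(8):
    interleaved.append(a.sample(PROBS))
    b.sample(PROBS)  # advances only slot B's generator

# Same seed gives the same tokens, regardless of what other slots do.
assert alone == interleaved
```

The design point is ownership. Each `SlotSampler` seeds and holds its own generator, so slot B's calls never advance slot A's state. That isolation is what lets a per-request seed reproduce its output.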