Commit graph

93 commits

Concedo
4c56b7cada Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	examples/gbnf-validator/gbnf-validator.cpp
#	examples/llava/clip.cpp
#	examples/run/README.md
#	examples/run/run.cpp
#	examples/server/README.md
#	ggml/src/ggml-cpu/CMakeLists.txt
#	src/llama.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-llama-grammar.cpp
2024-12-21 09:41:49 +08:00
Xuan Son Nguyen
57bb2c40cd
server : fix logprobs, make it OAI-compatible (#10783)
* server : fix logprobs, make it openai-compatible

* update docs

* add std::log

* return pre-sampling p

* sort before apply softmax

* add comment

* fix test

* set p for sampled token

* update docs

* add --multi-token-probs

* update docs

* add `post_sampling_probs` option

* update docs [no ci]

* remove --multi-token-probs

* "top_probs" with "post_sampling_probs"

* resolve review comments

* rename struct token_prob to prob_info

* correct comment placement

* fix setting prob for sampled token
2024-12-19 15:40:08 +01:00
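
For context, a minimal sketch of exercising the OpenAI-compatible logprobs output this PR fixes (assumptions: a llama-server instance on http://localhost:8080; field names follow the OpenAI chat-completions schema the PR targets):

```python
# Request per-token log probabilities via the OAI-compatible endpoint.
# Host and port are placeholders for a locally running llama-server.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 8,
        "logprobs": True,    # OpenAI-style switch this PR makes compatible
        "top_logprobs": 5,   # number of alternative tokens reported per step
    },
)
for item in resp.json()["choices"][0]["logprobs"]["content"]:
    print(item["token"], item["logprob"])
```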
Concedo
ee486bad3e Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	README.md
#	examples/CMakeLists.txt
#	examples/batched/batched.cpp
#	examples/gritlm/gritlm.cpp
#	examples/llama.android/llama/build.gradle.kts
#	examples/main/README.md
#	examples/retrieval/retrieval.cpp
#	examples/server/CMakeLists.txt
#	examples/server/README.md
#	ggml/CMakeLists.txt
#	ggml/src/ggml-cpu/CMakeLists.txt
#	ggml/src/ggml.c
#	scripts/compare-commits.sh
#	scripts/sync-ggml.last
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
#	tests/test-sampling.cpp
2024-12-19 11:57:43 +08:00
Georgi Gerganov
152610eda9
server : output embeddings for all tokens when pooling = none (#10861)
* server : add "tokens" output

ggml-ci

* server : output embeddings for all tokens when pooling = none

ggml-ci

* server : update readme [no ci]

* server : fix spacing [no ci]

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* server : be explicit about the pooling type in the tests

ggml-ci

* server : update /embeddings and /v1/embeddings endpoints

ggml-ci

* server : do not normalize embeddings when there is no pooling

ggml-ci

* server : update readme

ggml-ci

* server : fixes

* tests : update server tests

ggml-ci

* server : update readme [no ci]

* server : remove rebase artifact

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-12-18 13:01:41 +02:00
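
As a rough illustration of the new behavior (assumptions: llama-server started with an embedding model and `--pooling none` on http://localhost:8080; the exact response shape varies across server versions):

```python
# With pooling disabled, /embeddings returns one vector per input token,
# and the vectors are left unnormalized (see the PR's "do not normalize" fix).
import requests

resp = requests.post(
    "http://localhost:8080/embeddings",
    json={"content": "Hello world"},
)
result = resp.json()[0]  # first input; assumes a list-of-results shape
print(len(result["embedding"]), "per-token vectors returned")
```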
Georgi Gerganov
0e70ba686e
server : add "tokens" output (#10853)
* server : add "tokens" output

ggml-ci

* server : update readme

ggml-ci

* server : return tokens ids only if requested

ggml-ci

* tests : improve "tokens" type check

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* server : remove "tokens" from the OAI endpoint

ggml-ci

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-12-18 11:05:29 +02:00
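
A short sketch of the opt-in "tokens" output (assumptions: a local llama-server on http://localhost:8080 and the `return_tokens` request flag introduced here; per the PR, the field is returned only on request and was removed from the OAI endpoint):

```python
# Ask /completion to echo back the ids of the generated tokens.
import requests

resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "Once upon a time",
        "n_predict": 8,
        "return_tokens": True,  # opt in; omit to keep the response small
    },
)
body = resp.json()
print(body["content"])
print(body.get("tokens"))  # ids of the generated tokens
```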
Xuan Son Nguyen
46828872c3
server : (embeddings) using same format for "input" and "content" (#10872)
* server : (embeddings) using same format for "input" and "content"

* fix test case

* handle empty input case

* fix test
2024-12-18 10:55:09 +02:00
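
To illustrate the unified format (assumption: a local llama-server on http://localhost:8080), after this change both the native "content" field and the OAI-style "input" field accept the same shapes, a single string or a list of strings:

```python
# The same payload shape works for both endpoints after this PR.
import requests

for url, field in [
    ("http://localhost:8080/embeddings", "content"),
    ("http://localhost:8080/v1/embeddings", "input"),
]:
    r = requests.post(url, json={field: ["first text", "second text"]})
    print(url, r.status_code)
```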
krystiancha
05c3a444b8
server : fill usage info in embeddings and rerank responses (#10852)
* server : fill usage info in embeddings response

* server : fill usage info in reranking response
2024-12-17 18:00:24 +02:00
Concedo
f456ed7237 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/nix/package.nix
#	.devops/tools.sh
#	.github/workflows/build.yml
#	Makefile
#	README.md
#	common/CMakeLists.txt
#	common/common.h
#	examples/llava/CMakeLists.txt
#	examples/run/CMakeLists.txt
#	examples/run/README.md
#	examples/run/run.cpp
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-kompute/ggml-kompute.cpp
#	tests/test-backend-ops.cpp
#	tests/test-rope.cpp
2024-12-15 15:30:10 +08:00
Michelle Tan
89d604f2c8
server: Fix has_new_line in JSON response (#10818)
* Update server JSON response.

* Add unit test to check `has_new_line` JSON response

* Remove `has_new_line` unit test changes.

* Address code review comment: type check for `has_new_line` in unit test
2024-12-14 23:29:45 +01:00
Concedo
de64b9198c merge checkpoint 2 - functional merge without q4_0_4_4 (need regen shaders) 2024-12-13 17:04:19 +08:00
Concedo
4c4ce5e808 rewritten checkpoint 1 - before coopmat 2024-12-13 16:55:23 +08:00
Yüg
a86ad841f1
server : add flag to disable the web-ui (#10762) (#10751)
Co-authored-by: eugenio.segala <esegala@deloitte.co.uk>
2024-12-10 18:22:34 +01:00
Xuan Son Nguyen
ce8784bdb1
server : fix format_infill (#10724)
* server : fix format_infill

* fix

* rename

* update test

* use another model

* update test

* update test

* test_invalid_input_extra_req
2024-12-08 23:04:29 +01:00
Xuan Son Nguyen
e52522b869
server : bring back info of final chunk in stream mode (#10722)
* server : bring back info to final chunk in stream mode

* clarify a bit

* trailing space
2024-12-08 20:38:51 +01:00
Xuan Son Nguyen
3573fa8e7b
server : (refactor) no more json in server_task input (#10691)
* server : (refactor) no more json in server_task input

* add test for slots endpoint

* add tests for /props and /slots

* remove task inf_type

* fix CI by adding safe_json_to_str

* add "model_path" to /props

* update readme
2024-12-07 20:21:09 +01:00
Xuan Son Nguyen
6c5bc0625f
server : (refactoring) do not rely on JSON internally (#10643)
* server : (refactoring) reduce usage of json internally

* move all response types to struct

* wip [no ci]

* many fixes

* add virtual function

* fix index

* minor style fix

* add std::move

* refactor handle_completions_generic

* add virtual functions

* remove server.hpp

* clarify server_sent_event RFC specs

* apply review comments

* fix model_alias and completion_probabilities

* small clean up

* remove virtual for to_json_oai_compat()

* naming oai_compat --> oaicompat

* fix unwanted recursive call

* update docs
2024-12-06 11:14:32 +01:00
Georgi Gerganov
1da7b76569
server : fix speculative decoding with context shift (#10641)
* server : fix speculative decoding with context shift

ggml-ci

* server : take into account speculative limits

ggml-ci

* server : add tests
2024-12-04 22:38:20 +02:00
haopeng
64ed2091b2
server: Add "tokens per second" information in the backend (#10548)
* add cmake rvv support

* add timings

* remove space

* update readme

* fix

* fix code

* remove empty line

* add test

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-12-02 14:45:54 +01:00
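
A minimal sketch of reading the new speed information (assumptions: a local llama-server on http://localhost:8080; the "timings" field names follow the server's usual response format):

```python
# The final /completion response carries a "timings" object with
# prompt-processing and generation rates in tokens per second.
import requests

resp = requests.post(
    "http://localhost:8080/completion",
    json={"prompt": "Hello", "n_predict": 16},
)
t = resp.json().get("timings", {})
print(t.get("prompt_per_second"), "prompt tokens/s")
print(t.get("predicted_per_second"), "generated tokens/s")
```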
Concedo
697ca70115 temp checkpoint 2024-11-30 12:13:20 +08:00
Concedo
ec95241e38 temp checkpoint 2024-11-30 11:59:27 +08:00
Concedo
0c8939be19 temp checkpoint 2024-11-30 11:57:28 +08:00
Xuan Son Nguyen
b782e5c7d4
server : add more test cases (#10569)
* server : add split model test

* add test speculative

* add invalid cases
2024-11-29 21:48:56 +01:00
Xuan Son Nguyen
6c59567689
server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (#10568)
* server : (tests) don't use thread for capturing stdout/stderr

* test: bump openai to 1.55.2

* bump openai to 1.55.3
2024-11-28 19:17:49 +01:00
Xuan Son Nguyen
9f912511bc
common : fix duplicated file name with hf_repo and hf_file (#10550) 2024-11-27 22:30:52 +01:00
Xuan Son Nguyen
45abe0f74e
server : replace behave with pytest (#10416)
* server : replace behave with pytest

* fix test on windows

* misc

* add more tests

* more tests

* styling

* log less, fix embd test

* added all sequential tests

* fix coding style

* fix save slot test

* add parallel completion test

* fix parallel test

* remove feature files

* update test docs

* no cache_prompt for some tests

* add test_cache_vs_nocache_prompt
2024-11-26 16:20:18 +01:00
Concedo
a244b1ffd2 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	Makefile
#	Package.swift
#	ci/run.sh
#	docs/backend/SYCL.md
#	examples/llama-bench/llama-bench.cpp
#	examples/server/CMakeLists.txt
#	examples/server/README.md
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	grammars/README.md
#	scripts/sync-ggml-am.sh
#	scripts/sync-ggml.last
#	scripts/sync-ggml.sh
#	tests/run-json-schema-to-grammar.mjs
#	tests/test-backend-ops.cpp
2024-11-09 13:36:47 +08:00
Xuan Son Nguyen
a71d81cf8c
server : revamp chat UI with vuejs and daisyui (#10175)
* server : simple chat UI with vuejs and daisyui

* move old files to legacy folder

* embed deps into binary

* basic markdown support

* add conversation history, save to localStorage

* fix bg-base classes

* save theme preferences

* fix tests

* regenerate, edit, copy buttons

* small fixes

* docs: how to use legacy ui

* better error handling

* make CORS preflight more explicit

* add GET method for CORS

* fix tests

* clean up a bit

* better auto scroll

* small fixes

* use collapse-arrow

* fix closeAndSaveConfigDialog

* small fix

* remove console.log

* fix style for <pre> element

* lighter bubble color (less distracting when reading)
2024-11-07 17:31:10 -04:00
Concedo
628dcd640e Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	examples/server/README.md
2024-11-06 23:13:00 +08:00
Xuan Son Nguyen
9e0ecfb697
server : clarify /slots endpoint, add is_processing (#10162)
* server : clarify /slots endpoint, add is_processing

* fix tests
2024-11-04 16:33:29 +01:00
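
For context, a sketch of polling the clarified endpoint (assumptions: a local llama-server on http://localhost:8080 with the slots endpoint enabled):

```python
# Each slot now reports an explicit is_processing boolean instead of a
# raw internal state value.
import requests

for slot in requests.get("http://localhost:8080/slots").json():
    print(f"slot {slot['id']}: is_processing={slot['is_processing']}")
```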
Concedo
a46f8acd03 note: also has support for completion tokens count 2024-11-01 00:44:14 +08:00
Xuan Son Nguyen
958367bf53
server : refactor slot input data, move tokenizer to HTTP thread (#10023)
* server : refactor slot input data, move tokenizer to HTTP thread

* move prompt_tokens.empty() check

* fix incorrect if branch

* fix infinite generation loop

* bring back infill validation

* add infill test

* try fixing format_infill

* fix test

* remove redundant code

* rename completion to inference

* update docs

* use llama_tokens everywhere
2024-10-24 21:51:22 +02:00
Concedo
a9dbcdd3ec Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	README.md
#	docs/build.md
#	examples/infill/infill.cpp
#	examples/main/README.md
#	examples/server/README.md
#	flake.lock
#	scripts/sync-ggml.last
#	src/llama.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tests/test-sampling.cpp
2024-10-17 16:36:02 +08:00
Georgi Gerganov
1bde94dd02
server : remove self-extend features (#9860)
* server : remove self-extend

ggml-ci

* server : fix context limit check to use slot.n_past

ggml-ci
2024-10-12 16:06:31 +03:00
Concedo
e692a79aab Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/docker.yml
#	CMakeLists.txt
#	CONTRIBUTING.md
#	docs/android.md
#	docs/docker.md
#	examples/embedding/embedding.cpp
#	examples/imatrix/imatrix.cpp
#	examples/infill/infill.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/main/README.md
#	examples/parallel/parallel.cpp
#	examples/perplexity/perplexity.cpp
#	examples/quantize-stats/quantize-stats.cpp
#	examples/save-load-state/save-load-state.cpp
#	examples/server/README.md
#	examples/simple/CMakeLists.txt
#	examples/speculative/speculative.cpp
#	flake.lock
#	ggml/src/CMakeLists.txt
#	ggml/src/ggml-blas.cpp
#	pocs/vdot/q8dot.cpp
#	pocs/vdot/vdot.cpp
#	scripts/debug-test.sh
#	scripts/sync-ggml.last
#	src/llama.cpp
#	tests/test-backend-ops.cpp
#	tests/test-chat-template.cpp
#	tests/test-quantize-fns.cpp
#	tests/test-quantize-perf.cpp
#	tests/test-tokenizer-0.cpp
#	tests/test-tokenizer-1-bpe.cpp
#	tests/test-tokenizer-1-spm.cpp
2024-10-11 11:59:59 +08:00
Xuan Son Nguyen
458367a906
server : better security control for public deployments (#9776)
* server : more explicit endpoint access settings

* protect /props endpoint

* fix tests

* update server docs

* fix typo

* fix tests
2024-10-08 13:27:04 +02:00
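
A hedged sketch of the access-control behavior (assumptions: llama-server started with `--api-key SECRET` on http://localhost:8080; exact status codes may vary by version):

```python
# With an API key configured, protected endpoints such as /props reject
# unauthenticated requests and accept a standard Bearer token.
import requests

base = "http://localhost:8080"
print(requests.get(f"{base}/props").status_code)  # expected: 401
print(requests.get(
    f"{base}/props",
    headers={"Authorization": "Bearer SECRET"},
).status_code)  # expected: 200
```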
Concedo
ce7f9c9a2c Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/full-rocm.Dockerfile
#	.devops/llama-cli-rocm.Dockerfile
#	.devops/llama-server-rocm.Dockerfile
#	.github/workflows/build.yml
#	.github/workflows/python-type-check.yml
#	CMakeLists.txt
#	CONTRIBUTING.md
#	README.md
#	ci/run.sh
#	examples/embedding/embedding.cpp
#	examples/server/README.md
#	flake.lock
#	ggml/include/ggml.h
#	ggml/src/ggml.c
#	requirements/requirements-convert_legacy_llama.txt
#	scripts/sync-ggml.last
#	src/llama-vocab.cpp
#	src/llama.cpp
#	tests/test-backend-ops.cpp
#	tests/test-grad0.cpp
#	tests/test-tokenizer-0.cpp
2024-10-02 01:00:57 +08:00
vb
08a43d05b6
py : update transformers version (#9694)
* update transformers version.

* update hfh version.
2024-09-30 18:03:47 +03:00
Georgi Gerganov
f4d2b8846a
llama : add reranking support (#9510)
* py : add XLMRobertaForSequenceClassification [no ci]

* py : fix scalar-tensor conversion [no ci]

* py : fix position embeddings chop [no ci]

* llama : read new cls tensors [no ci]

* llama : add classification head (wip) [no ci]

* llama : add "rank" pooling type

ggml-ci

* server : add rerank endpoint

ggml-ci

* llama : avoid ggml_repeat during classification

* rerank : cleanup + comments

* server : accept /rerank endpoint in addition to /v1/rerank [no ci]

* embedding : parse special tokens

* jina : support v1 reranker

* vocab : minor style

ggml-ci

* server : initiate tests for later

ggml-ci

* server : add docs

* llama : add comment [no ci]

* llama : fix uninitialized tensors

* ci : add rerank tests

ggml-ci

* add reranking test

* change test data

* Update examples/server/server.cpp

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* add `--reranking` argument

* update server docs

* llama : fix comment [no ci]

ggml-ci

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-09-28 17:42:03 +03:00
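
A minimal sketch of the new rerank endpoint (assumptions: llama-server started with `--reranking` and a reranker model on http://localhost:8080; the response shape follows the common rerank-API convention of an index plus a relevance score):

```python
# Score candidate documents against a query; both /rerank and /v1/rerank
# are accepted per this PR.
import requests

resp = requests.post(
    "http://localhost:8080/v1/rerank",
    json={
        "query": "What is a panda?",
        "documents": [
            "The giant panda is a bear species endemic to China.",
            "Paris is the capital of France.",
        ],
    },
)
for r in resp.json()["results"]:
    print(r["index"], r["relevance_score"])
```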
Concedo
ea55f69dc1 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.dockerignore
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	Makefile
#	README.md
#	examples/infill/infill.cpp
#	examples/perplexity/perplexity.cpp
#	examples/server/README.md
#	examples/speculative/speculative.cpp
#	flake.lock
#	ggml/src/CMakeLists.txt
#	scripts/sync-ggml.last
#	tests/test-backend-ops.cpp
#	tests/test-sampling.cpp
2024-09-27 11:21:28 +08:00
Xuan Son Nguyen
0b3bf966f4
server : add --no-context-shift option (#9607)
* server : add --no-context-shift option

* small fix

* Update examples/server/tests/features/embeddings.feature

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* tests : minor fix

* revert usage of GGML_ASSERT

* update server documentation

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-23 22:23:54 +02:00
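
As a rough usage sketch (assumptions: a llama-server binary on PATH and a placeholder model path):

```python
# With --no-context-shift, the server stops generation when the context
# window fills instead of sliding the KV cache to make room.
import subprocess

subprocess.run([
    "llama-server",
    "-m", "model.gguf",      # placeholder model path
    "--ctx-size", "4096",
    "--no-context-shift",    # flag added by this PR
])
```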
Concedo
29625c3d2e Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/server.yml
#	CMakeLists.txt
#	Makefile
#	README.md
#	ci/run.sh
#	common/CMakeLists.txt
#	common/common.cpp
#	docs/backend/SYCL.md
#	examples/embedding/embedding.cpp
#	examples/imatrix/imatrix.cpp
#	examples/infill/infill.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/main/README.md
#	examples/parallel/parallel.cpp
#	examples/perplexity/perplexity.cpp
#	examples/server/CMakeLists.txt
#	examples/server/README.md
#	examples/server/bench/README.md
#	examples/server/tests/README.md
#	examples/speculative/speculative.cpp
#	flake.lock
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	grammars/README.md
#	scripts/compare-commits.sh
#	scripts/compare-llama-bench.py
#	tests/CMakeLists.txt
2024-09-19 14:53:57 +08:00
Georgi Gerganov
6262d13e0b
common : reimplement logging (#9418)
https://github.com/ggerganov/llama.cpp/pull/9418
2024-09-15 20:46:12 +03:00
Concedo
ab41e324d6 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	Makefile
#	README.md
#	examples/server/CMakeLists.txt
#	ggml/src/CMakeLists.txt
2024-09-15 19:28:05 +08:00
VoidIsVoid
dcdcee3a74
server: add data: [DONE] to /chat/completions stream response (#9459) 2024-09-14 11:36:44 +02:00
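
For context, a minimal streaming client that relies on the new sentinel (assumption: a local llama-server on http://localhost:8080):

```python
# Read the SSE stream and stop at the OpenAI-style "data: [DONE]" line
# that this change appends to /chat/completions responses.
import json
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={"messages": [{"role": "user", "content": "Hi"}], "stream": True},
    stream=True,
)
for line in resp.iter_lines():
    if not line:
        continue
    payload = line.decode().removeprefix("data: ")
    if payload == "[DONE]":  # end-of-stream sentinel
        break
    chunk = json.loads(payload)
    print(chunk["choices"][0]["delta"].get("content", ""), end="")
```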
Concedo
e44ddf26ef Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/server.yml
#	CMakeLists.txt
#	Makefile
#	examples/embedding/embedding.cpp
#	examples/imatrix/imatrix.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/llava/MobileVLM-README.md
#	examples/parallel/parallel.cpp
#	examples/perplexity/perplexity.cpp
#	examples/quantize/CMakeLists.txt
#	examples/server/README.md
#	examples/speculative/speculative.cpp
#	tests/test-backend-ops.cpp
2024-09-13 16:17:24 +08:00
Mathijs Henquet
78203641fe
server : Add option to return token pieces in /tokenize endpoint (#9108)
* server : added with_pieces functionality to /tokenize endpoint

* server : Add tokenize with pieces tests to server.feature

* Handle case if tokenizer splits along utf8 continuation bytes

* Add example of token splitting

* Remove trailing ws

* Fix trailing ws

* Maybe fix ci

* maybe this fixes windows ci?

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-09-12 22:30:11 +02:00
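
A short sketch of the with_pieces option this PR adds (assumptions: a local llama-server on http://localhost:8080; per the PR, a piece split across UTF-8 continuation bytes may come back as raw bytes rather than a string):

```python
# Ask /tokenize for token ids together with their text pieces.
import requests

resp = requests.post(
    "http://localhost:8080/tokenize",
    json={"content": "Hello world", "with_pieces": True},
)
for tok in resp.json()["tokens"]:
    print(tok["id"], repr(tok["piece"]))
```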
Concedo
b63158005f All samplers moved to kcpp side 2024-09-09 18:14:11 +08:00
Concedo
12fd16bfd4 Merge commit 'df270ef745' into concedo_experimental
# Conflicts:
#	Makefile
#	common/CMakeLists.txt
#	common/common.h
#	common/sampling.cpp
#	common/sampling.h
#	examples/infill/infill.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/quantize-stats/quantize-stats.cpp
#	examples/server/server.cpp
#	include/llama.h
#	src/llama-sampling.cpp
#	src/llama-sampling.h
#	src/llama.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-grammar-parser.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tests/test-llama-grammar.cpp
#	tests/test-sampling.cpp
2024-09-09 17:10:08 +08:00
Georgi Gerganov
faf69d4237
llama : sanitize invalid tokens (#9357)
* common : do not add null tokens during warmup

ggml-ci

* llama : check that the input tokens are valid

ggml-ci

* tests : fix batch size of bert model

ggml-ci
2024-09-08 00:33:13 +03:00
Xuan Son Nguyen
9b2c24c099
server : simplify state machine for slot (#9283)
* server : simplify state machine for slot

* add SLOT_STATE_DONE_PROMPT

* pop_deferred_task

* add missing notify_one

* fix passkey test

* metrics : add n_busy_slots_per_decode

* fix test step

* add test

* maybe fix AddressSanitizer?

* fix deque ?

* missing lock

* pop_deferred_task: also notify

* Update examples/server/server.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-09-06 23:21:29 +02:00