koboldcpp

mirror of https://github.com/LostRuins/koboldcpp.git synced 2026-05-12 22:32:18 +00:00

Author	SHA1	Message	Date
Concedo	4a3c8c190b	Merge branch 'upstream' into concedo_experimental # Conflicts: # tests/test-backend-ops.cpp	2024-05-22 15:04:31 +08:00
liuwei-git	201cc11afa	llama : add phi3 128K model support (#7225) * add phi3 128k support in convert-hf-to-gguf * add phi3 128k support in cuda * address build warnings on llama.cpp * adjust index value in cuda long rope freq factors * add long rope support in ggml cpu backend * make freq factors only depend on ctx size * remove unused rope scaling type 'su' frin gguf converter * fix flint warnings on convert-hf-to-gguf.py * set to the short freq factor when context size is small than trained context size * add one line of comments * metal : support rope freq_factors * ggml : update ggml_rope_ext API to support freq. factors * backends : add dev messages to support rope freq. factors * minor : style * tests : update to use new rope API * backends : fix pragma semicolons * minor : cleanup * llama : move rope factors from KV header to tensors * llama : remove tmp assert * cuda : fix compile warning * convert : read/write n_head_kv * llama : fix uninitialized tensors --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-21 23:28:32 +03:00
Olivier Chafik	e402de364b	`grammars`: fix resampling logic regression (#7424)	2024-05-21 20:40:00 +01:00
Amir	11474e756d	examples: cache hf model when --model not provided (#7353) * examples: cache hf model when --model not provided * examples: cache hf model when --model not provided * examples: cache hf model when --model not provided * examples: cache hf model when --model not provided * examples: cache hf model when --model not provided	2024-05-21 17:13:12 +03:00
jaime-m-p	d7e852c1bc	Tokenizer SPM fixes for phi-3 and llama-spm (bugfix) (#7425) * Update brute force test: add_special * Update brute force test: default values for add_bos_token and add_eos_token * Enable rtrim when pre-inserting BOS Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Revert "server : fix test regexes"	2024-05-21 14:39:48 +02:00
Concedo	52f9911240	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/nix/package.nix # .github/workflows/build.yml # .github/workflows/server.yml # CMakeLists.txt # Makefile # README.md # requirements.txt # scripts/LlamaConfig.cmake.in	2024-05-21 19:05:52 +08:00
jaime-m-p	917dc8cfa6	Tokenizer SPM fixes for phi-3 and llama-spm (#7375) * Update brute force test: special tokens * Fix added tokens - Try to read 'added_tokens.json'. - Try to read 'tokenizer_config.json'. - Try to read 'tokenizer.json'. * Fix special tokens rtrim Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server : fix test regexes	2024-05-20 20:15:57 +02:00
Johannes Gäßler	20385cebcc	perplexity: update README FP16 results [no ci] (#7413)	2024-05-20 18:15:38 +02:00
Georgi Gerganov	3bc10cb485	server : fix temperature + disable some tests (#7409) * server : fix temperature * server : disable tests relying on parallel determinism * ci : change server Debug -> RelWithDebInfo	2024-05-20 22:10:03 +10:00
Georgi Gerganov	1cc0155d04	server : tuning tests (#7388) * server : don't pass temperature as string * server : increase timeout * tests : fix the fix 0.8f -> 0.8 ggml-ci * tests : set explicit temperature	2024-05-20 10:16:41 +03:00
Georgi Gerganov	e932094d58	server : return error on too large embedding input (#7389)	2024-05-20 08:56:05 +03:00
Georgi Gerganov	2789baf480	tests : fix --keep_split -> --keep-split (#7374)	2024-05-20 08:55:09 +03:00
Fred Douglas	1ea2a0036e	quantize : fix --keep-split check (#7374)	2024-05-19 19:37:04 +03:00
Johannes Gäßler	1b01f06db0	server: add test for token probs (#7347)	2024-05-19 16:26:02 +02:00
Johannes Gäßler	41858392e1	server: fix seed being reported back (#7382)	2024-05-19 17:06:33 +03:00
Concedo	d5d5dda02b	Merge branch 'upstream' into concedo_experimental # Conflicts: # .devops/nix/package.nix # .github/workflows/build.yml # .github/workflows/server.yml # CMakeLists.txt # Makefile # README.md # ggml-cuda.cu # tests/test-backend-ops.cpp	2024-05-19 17:55:20 +08:00
Georgi Gerganov	854d365aba	cmake : update android comments (#7341)	2024-05-19 11:01:01 +03:00
Georgi Gerganov	511182eabb	android : use "ci-android" branch for CI (#7341) * android : use "ci-android" branch for CI * ggml : disable SIMD exp and silu for 32-bit ARM ggml-ci * android : do not fetch, use add_subdirectory instead * cmake : provide binary dir	2024-05-18 20:40:39 +10:00
Johannes Gäßler	cb42c29427	server: correct --threads documentation [no ci] (#7362)	2024-05-18 11:10:47 +02:00
strawberrymelonpanda	ca57e0f35e	perplexity : ndot progress and show stats with < 100 tasks (#7348) Fix floating point error with ndot printing, allow end stats on lower task numbers if multiple-choice tasks.	2024-05-18 10:57:08 +03:00
Concedo	47cbfd6150	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # README.md # llama.cpp # scripts/sync-ggml-am.sh # scripts/sync-ggml.last # scripts/sync-ggml.sh # tests/test-backend-ops.cpp	2024-05-17 22:30:41 +08:00
Radoslav Gerganov	f4bd8b3d26	rpc : set SO_REUSEADDR for the server socket (#7320) ref: #7293	2024-05-17 17:25:44 +03:00
Radoslav Gerganov	ee94172d33	server : add support for the RPC backend (#7305) ref: #7292	2024-05-17 10:00:17 +03:00
Leon Knauer	9c4fdcbec8	[Server] Added --verbose option to README [no ci] (#7335)	2024-05-17 10:11:03 +10:00
Pierrick Hymbert	24ecb58168	Revert "server bench: fix bench not waiting for model load (#7284)" (#7334) This reverts commit `583fd6b000`.	2024-05-16 20:43:45 +02:00
Radoslav Gerganov	9afdffe70e	rpc : get available mem for the CPU backend This can be overridden with the -m command line option ref: #7293	2024-05-16 12:04:08 +03:00
Radoslav Gerganov	3b3963c55c	rpc : add command line arg for specifying backend memory ref: #7293	2024-05-16 09:58:29 +03:00
Vaibhav Srivastav	ad52d5c259	doc: add references to hugging face GGUF-my-repo quantisation web tool. (#7288) * chore: add references to the quantisation space. * fix grammer lol. * Update README.md Co-authored-by: Julien Chaumond <julien@huggingface.co> * Update README.md Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Julien Chaumond <julien@huggingface.co> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-16 15:38:43 +10:00
slaren	344f9126cc	ggml : tag ggml_tensor::backend as deprecated (#7290)	2024-05-15 15:08:48 +02:00
dm4	ea3b0590ee	embedding : free the batch after execution (#7297)	2024-05-15 15:01:12 +03:00
Johannes Gäßler	583fd6b000	server bench: fix bench not waiting for model load (#7284)	2024-05-15 08:44:16 +02:00
Steve Grubb	4f0263633b	server: free sampling contexts on exit (#7264) * server: free sampling contexts on exit This cleans up last leak found by the address sanitizer. * fix whitespace * fix whitespace	2024-05-14 16:11:24 +02:00
Brian	1265c670fd	Revert "move ndk code to a new library (#6951)" (#7282) This reverts commit `efc8f767c8`.	2024-05-14 16:10:39 +03:00
Concedo	2ee808a747	Merge branch 'upstream' into concedo_experimental # Conflicts: # .github/workflows/build.yml # CMakeLists.txt # README.md # ci/run.sh # llama.cpp # models/ggml-vocab-llama-bpe.gguf.inp # models/ggml-vocab-llama-bpe.gguf.out # requirements.txt # scripts/compare-llama-bench.py # scripts/sync-ggml.last # tests/CMakeLists.txt # tests/test-backend-ops.cpp # tests/test-grammar-integration.cpp # tests/test-tokenizer-1-bpe.cpp	2024-05-14 19:28:47 +08:00
Radoslav Gerganov	5e31828d3e	ggml : add RPC backend (#6829) * ggml : add RPC backend The RPC backend proxies all operations to a remote server which runs a regular backend (CPU, CUDA, Metal, etc). * set TCP_NODELAY * add CI workflows * Address review comments * fix warning * implement llama_max_devices() for RPC * Address review comments * Address review comments * wrap sockfd into a struct * implement get_alignment and get_max_size * add get_device_memory * fix warning * win32 support * add README * readme : trim trailing whitespace * Address review comments * win32 fix * Address review comments * fix compile warnings on macos	2024-05-14 14:27:19 +03:00
Elton Kola	efc8f767c8	move ndk code to a new library (#6951)	2024-05-14 17:30:30 +10:00
Ryuei	27f65d6267	docs: Fix typo and update description for --embeddings flag (#7026) - Change '--embedding' to '--embeddings' in the README - Update the description to match the latest --help output - Added a caution about defining physical batch size	2024-05-14 15:20:47 +10:00
k.h.lai	30e70334f7	llava-cli: fix base64 prompt (#7248)	2024-05-14 00:02:36 +10:00
Johannes Gäßler	1c570d8bee	perplexity: add BF16 vs. FP16 results (#7150)	2024-05-13 13:03:27 +02:00
Benjamin Findley	e586ee4259	change default temperature of OAI compat API from 0 to 1 (#7226) * change default temperature of OAI compat API from 0 to 1 * make tests explicitly send temperature to OAI API	2024-05-13 12:40:08 +10:00
Xuan Son Nguyen	72c177c1f6	fix system prompt handling (#7153)	2024-05-11 17:28:10 +02:00
Steve Grubb	988631335a	server : free llama_batch on exit (#7212) * [server] Cleanup a memory leak on exit There are a couple memory leaks on exit of the server. This hides others. After cleaning this up, you can see leaks on slots. But that is another patch to be sent after this. * make tab into spaces	2024-05-11 11:13:02 +03:00
Johannes Gäßler	5ae3426b0b	server: fix reported top tokens for temperature 0 (#7203)	2024-05-11 10:11:28 +02:00
Joan Fontanals	b83cc3f5b3	llama : add Jina Embeddings architecture (#6826) * feat: first things to do * feat: create tensors for Jina architecture * fix: use other tensors * feat: embedding gets results * fix: fix usage of ALIBI * fix: clean prints * fix: do some cleanup unused vars * fix: revert changes to Makefile and CMakeLists * fix: revert some changes * fix: fix small detail * fix: fix convert formatting * fix: fix linting and editor * feat: set proper vocab settings * fix: JinaBertForMaskedLM registration * feat: support q_normalization and k_normalization in Jina arch * feat: handle gpt2 tokenizer with Jina architecture * feat: example comments in embedding * feat: rename Jina Bert to Jina Bert V2 * fix: add some changes as per review * feat: proper KQ_pos for Jina embeddings * feat: add capacity to load models ES and DE for Spanish * llama : fix pre-tokenizers * ggml : full ALiBi support * ggml : update ggml_soft_max_ext() CUDA, SYCL * ggml : ggml_flash_attn_ext() support ALiBi (CPU) * ggml : ggml_flash_attn_ext() support ALiBi (Metal) * ggml : fix warning * ggml : ggml_flash_attn_ext() support ALiBi (CUDA) ggml-ci * minor : clean-up * embedding : add warning about missing SEP --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-05-11 10:46:09 +03:00
slaren	e849648888	llama-bench : add pp+tg test type (#7199)	2024-05-10 18:03:54 +02:00
Justine Tunney	4e3880978f	Fix memory bug in grammar parser (#7194) The llama.cpp grammar parser had a bug where forgetting to add a closing quotation mark to strings would cause parsing to crash. Anyone running a server on a public endpoint is advised to upgrade. To reproduce this bug ./llamafile -m foo.gguf -p bar --grammar 'root::="' Credit for discovering and reporting this issue goes to Eclypsium Security Researcher Richard Johnson <Richard.johnson@eclypsium.com>.	2024-05-10 21:01:08 +10:00
HanishKVC	f89fe2732c	Main+: optionally allow special tokens from user in interactive mode (#7097) @hanishkvc added a new `--interactive-specials` flag which would allow for inserting special tokens from user side into the embedding stream.	2024-05-10 20:21:58 +10:00
Concedo	db82bad6f2	Merge commit '`8c570c9496`' into concedo_experimental # Conflicts: # README.md # tests/test-backend-ops.cpp	2024-05-10 16:55:26 +08:00
Andrei	d11afd6652	llava : fix moondream support (#7163) * Revert "Revert "llava : add support for moondream vision language model (#6899)"" This reverts commit `9da243b36a`. * Fix num_positions and embeddings initialization	2024-05-10 09:41:10 +03:00
slaren	eaf4bd8b39	eval-callback : fix conversion to float (#7184)	2024-05-10 01:04:12 +02:00

969 commits