Commit graph

289 commits

Author SHA1 Message Date
Concedo
fe5479f286 unify antislop and token bans 2024-10-10 18:21:07 +08:00
Concedo
9b614d46bd antislop sampler working 2024-10-09 16:33:04 +08:00
Concedo
36e9bac98f wip anti slop sampler 2024-10-09 13:34:47 +08:00
Concedo
f78f8d3d45 wip anti slop 2024-10-07 23:18:13 +08:00
Concedo
65f3c68399 wip antislop 2024-10-07 20:19:22 +08:00
Concedo
740c5e01cb added token delay feature 2024-10-07 19:45:51 +08:00
Concedo
3e8bb10e2d wip on rewind function 2024-10-06 16:21:03 +08:00
Concedo
c38d1ecc8d update templates, fix rwkv 2024-09-22 01:32:12 +08:00
Concedo
53bf0fb32d removed openblas backend, merged into CPU (with llamafile for BLAS). GPU backend is now automatically selected when running from CLI unless noblas is specified. 2024-09-15 19:21:52 +08:00
Concedo
e44ddf26ef Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/server.yml
#	CMakeLists.txt
#	Makefile
#	examples/embedding/embedding.cpp
#	examples/imatrix/imatrix.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/llava/MobileVLM-README.md
#	examples/parallel/parallel.cpp
#	examples/perplexity/perplexity.cpp
#	examples/quantize/CMakeLists.txt
#	examples/server/README.md
#	examples/speculative/speculative.cpp
#	tests/test-backend-ops.cpp
2024-09-13 16:17:24 +08:00
Concedo
7bdac9bc44 prevent shifting on rwkv 2024-09-11 20:22:45 +08:00
Concedo
eee67281be move kcpp params out 2024-09-10 16:30:12 +08:00
Concedo
fc7fe2e7a0 allow rwkv6 to run although it's broken 2024-09-09 20:50:58 +08:00
Concedo
b63158005f All samplers moved to kcpp side 2024-09-09 18:14:11 +08:00
Concedo
12fd16bfd4 Merge commit 'df270ef745' into concedo_experimental
# Conflicts:
#	Makefile
#	common/CMakeLists.txt
#	common/common.h
#	common/sampling.cpp
#	common/sampling.h
#	examples/infill/infill.cpp
#	examples/llama-bench/llama-bench.cpp
#	examples/quantize-stats/quantize-stats.cpp
#	examples/server/server.cpp
#	include/llama.h
#	src/llama-sampling.cpp
#	src/llama-sampling.h
#	src/llama.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-grammar-parser.cpp
#	tests/test-json-schema-to-grammar.cpp
#	tests/test-llama-grammar.cpp
#	tests/test-sampling.cpp
2024-09-09 17:10:08 +08:00
Concedo
c78690737c fix for DRY segfault on unicode character substring tokenization 2024-09-08 18:25:00 +08:00
Concedo
d220495dd4 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/full-cuda.Dockerfile
#	.devops/llama-cli-cuda.Dockerfile
#	.devops/llama-server-cuda.Dockerfile
#	.devops/llama-server-intel.Dockerfile
#	.devops/llama-server-rocm.Dockerfile
#	.devops/llama-server-vulkan.Dockerfile
#	.devops/llama-server.Dockerfile
#	.github/workflows/docker.yml
#	docs/docker.md
#	examples/llama-bench/llama-bench.cpp
#	flake.lock
#	ggml/include/ggml.h
#	ggml/src/CMakeLists.txt
#	scripts/sync-ggml.last
#	src/llama.cpp
#	tests/test-backend-ops.cpp
#	tests/test-grad0.cpp
#	tests/test-rope.cpp
2024-08-30 10:37:39 +08:00
Concedo
b78a637da5 try to optimize context shifting 2024-08-26 23:07:31 +08:00
Concedo
cca3c4c78b xtc fixes 2024-08-22 23:18:46 +08:00
Concedo
fc2545dc83 fixed a typo 2024-08-22 00:25:56 +08:00
Concedo
5bf527a6ae added xtc sampler 2024-08-21 23:57:15 +08:00
Concedo
1a7ecd55e6 timing for init step, clip for vulkan 2024-08-21 18:14:53 +08:00
Concedo
cd69ab218e fixed DRY 2024-08-21 17:01:28 +08:00
Concedo
6a4becb731 dry is still buggy because token indexes are wrong 2024-08-21 00:59:26 +08:00
Concedo
db6ef8d1e1 revert dry state reset 2024-08-20 22:22:21 +08:00
Concedo
c1ae350e5b fixed race condition when generating 2024-08-20 20:17:55 +08:00
Concedo
e12ab53488 force clear some DRY state vars on new generation - not sure if this helps 2024-08-14 21:35:39 +08:00
Concedo
689a17d756 always prefilter to 5k logits 2024-08-12 22:27:06 +08:00
Concedo
729eb1e552 no fast forward for empty prompt 2024-07-27 16:29:35 +08:00
Concedo
eb5b4d0186 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	Makefile
#	Package.swift
#	src/CMakeLists.txt
#	src/llama.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-llama-grammar.cpp
2024-07-23 23:20:32 +08:00
Concedo
e2b36aa6cf fixed dry loading seq when not in use, set kcppt to -1 layers by default 2024-07-22 15:44:34 +08:00
Concedo
0ecf13fc13 updated lite, extra error logging 2024-07-21 17:55:47 +08:00
Concedo
24b9616344 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/full-cuda.Dockerfile
#	.devops/full-rocm.Dockerfile
#	.devops/full.Dockerfile
#	.devops/llama-cli-cuda.Dockerfile
#	.devops/llama-cli-intel.Dockerfile
#	.devops/llama-cli-rocm.Dockerfile
#	.devops/llama-cli-vulkan.Dockerfile
#	.devops/llama-cli.Dockerfile
#	.devops/llama-server-cuda.Dockerfile
#	.devops/llama-server-intel.Dockerfile
#	.devops/llama-server-rocm.Dockerfile
#	.devops/llama-server-vulkan.Dockerfile
#	.devops/llama-server.Dockerfile
#	CMakeLists.txt
#	CONTRIBUTING.md
#	Makefile
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	requirements.txt
#	src/llama.cpp
#	tests/test-backend-ops.cpp
2024-07-19 14:23:33 +08:00
Concedo
5988243aee fix wrong order, fix llava debug mode failure 2024-07-17 15:30:19 +08:00
Concedo
d775a419b2 updated lite with chat inject, added layer detect, added more console logging 2024-07-16 23:10:15 +08:00
Llama
264575426e
Add the DRY dynamic N-gram anti-repetition sampler (#982)
* Add the DRY dynamic N-gram anti-repetition sampler

The DRY (Do not Repeat Yourself) sampler is a dynamic N-gram
repetition penalty that negatively scores tokens that would extend
sequences that already appear in the context.

See this discussion for a motivation and explanation of the sampler:
https://github.com/oobabooga/text-generation-webui/pull/5677

This implementation of DRY mostly aligns with the oobabooga version
with a few modifications. It uses a more efficient linear scanning
algorithm to identify repetitions. It also supports multi-token
sequence breakers. As a limitation, this implementation reuses
the rep pen range parameter, rather than introducing a new range
just for the DRY sampler.
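
As a rough illustration of the scoring rule described above, here is a minimal, naive O(n^2) sketch of the core penalty. The real implementation uses a linear-time scan and also handles sequence breakers; all names and signatures below are illustrative, not the actual koboldcpp API.

```cpp
#include <cstdint>
#include <cmath>
#include <unordered_map>
#include <vector>

// Naive DRY-style penalty sketch: tokens that would extend a sequence already
// seen in the context are penalized exponentially in the match length.
void dry_penalize(const std::vector<int32_t> & ctx,    // context tokens so far
                  std::vector<float>          & logits, // one logit per token id
                  float dry_multiplier, float dry_base, int dry_allowed_length) {
    const int n = (int) ctx.size();
    if (n < 2) return;
    // For every earlier position, measure how long a suffix of the context is
    // repeated there, and remember the token that followed that repetition.
    std::unordered_map<int32_t, int> best; // token id -> longest match length
    for (int i = 0; i < n - 1; ++i) {
        int len = 0;
        while (len <= i && ctx[i - len] == ctx[n - 1 - len]) {
            ++len;
        }
        if (len > 0) {
            const int32_t next = ctx[i + 1];
            auto it = best.find(next);
            if (it == best.end() || it->second < len) {
                best[next] = len;
            }
        }
    }
    // Tokens that would continue a repetition of at least the allowed length
    // are scored down: penalty = multiplier * base^(len - allowed_length).
    for (const auto & kv : best) {
        if (kv.second >= dry_allowed_length) {
            logits[kv.first] -= dry_multiplier *
                std::pow(dry_base, (float)(kv.second - dry_allowed_length));
        }
    }
}
```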

There is a separate change to lite.koboldai.net that exposes the DRY
sampler parameters to KoboldAI Lite, so none of the embed files have
been changed as part of this commit.

* Update default DRY parameters to match lite

* Improve DRY token debug logging

* Replace `and` with `&&` to fix MSVC compile error

Little-known fact: the C++98 standard defines `and` as an
alternative token for the `&&` operator (along with a bunch
of other digraphs). MSVC does not allow these without using
the /Za option or including the <iso646.h> header. Change to
the more standard operator to make this code more portable.
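
For context, a tiny hedged example of the portability issue (function names are made up):

```cpp
#include <iso646.h> // MSVC needs this header (or /Za) for alternative tokens

// Compiles under MSVC only because of the include above:
bool both_alt(bool a, bool b) { return a and b; }

// Portable everywhere without extra headers or compiler flags:
bool both_std(bool a, bool b) { return a && b; }
```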

* Fix MSVC compile error because log is not constexpr

Replace the compile-time computation with a floating-point
approximation of log(std::numeric_limits<float>::max()).
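
A hedged sketch of the change; the constant name is made up, and the literal is an approximation of ln(FLT_MAX):

```cpp
#include <cmath>
#include <limits>

// Rejected by MSVC: std::log is not constexpr in standard C++ (until C++26),
// even though some compilers accept this as an extension:
// constexpr float MAX_LOG = std::log(std::numeric_limits<float>::max());

// Portable replacement: a precomputed approximation of log(FLT_MAX).
static const float MAX_LOG = 88.7228f;
```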

* Remove unused llama sampler variables and clean up sequence breakers.

* Remove KCPP_SAMPLER_DRY as a separate enum entry

The DRY sampler is effectively a repetition penalty and there
are very few reasons to apply it at a different place in sampler
order than the standard single-token penalty. There are also
multiple projects that depend on the existing sampler IDs,
including KoboldAI, KoboldAI Lite, and SillyTavern. To minimize
the impact on those dependents when adding the DRY sampler to
koboldcpp, it makes the most sense not to add a new ID for now,
and instead to piggyback on KCPP_SAMPLER_REP_PEN. In the future,
if we find a use case for splitting the application of rep pen and DRY,
we can introduce a new enum entry then.

* Add the dry_penalty_last_n to independently control DRY penalty range

This parameter follows the oobabooga semantics: it's optional, with a
default value of zero. Zero means that DRY should sample the entire
context. Otherwise, it's the number of tokens from the end of the
context that are scanned for repetitions.
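
A hedged sketch of that range rule (names are illustrative):

```cpp
#include <algorithm>

// 0 (the default) means scan the whole context; otherwise scan only the
// last dry_penalty_last_n tokens.
int dry_effective_range(int dry_penalty_last_n, int n_ctx_tokens) {
    if (dry_penalty_last_n <= 0) {
        return n_ctx_tokens;
    }
    return std::min(dry_penalty_last_n, n_ctx_tokens);
}
```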

* Limit sequence breaker lengths in tokens and characters

The core DRY sampler algorithm is linear in the context length, but
there are several parts of the sampler related to multi-token
sequence breakers that are potentially quadratic. Without any
restrictions, a suitably crafted context and sequence breaker could
result in a denial-of-service attack on a server running koboldcpp.
This change limits the maximum number of characters and the maximum
token length of a sequence breaker in order to limit the maximum
overhead associated with the sampler.

This change also improves some comments, adding more detail and
changing the wording to increase clarity.
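
A hedged sketch of the clamping idea; the concrete limits and names below are invented for illustration and are not the values used in the commit:

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

constexpr size_t MAX_BREAKER_CHARS  = 40; // hypothetical character cap
constexpr size_t MAX_BREAKER_TOKENS = 10; // hypothetical token cap

// Bound the cost of a sequence breaker regardless of what the client sends.
std::vector<int32_t> clamp_breaker(
        std::string breaker,
        const std::function<std::vector<int32_t>(const std::string &)> & tokenize) {
    if (breaker.size() > MAX_BREAKER_CHARS) {
        breaker.resize(MAX_BREAKER_CHARS);    // cap character length first
    }
    std::vector<int32_t> toks = tokenize(breaker);
    if (toks.size() > MAX_BREAKER_TOKENS) {
        toks.resize(MAX_BREAKER_TOKENS);      // then cap token count
    }
    return toks;
}
```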
2024-07-13 19:08:23 +08:00
Concedo
0dd3907940 qwen2 warning FA 2024-07-09 20:53:25 +08:00
Concedo
d120c55e12 try to fix build errors (+1 squashed commits)
Squashed commits:

[27c28292] try fix build errors
2024-06-29 23:11:00 +08:00
Nexesenex
cb2336f5d9
Gradient rope formula with offsets (#938)
* Gradient rope formula with offsets

Positive for Solar models
Negative for Llama 1 and 2 models

* Update gpttype_adapter.cpp

Remove L1/L2

* cleanup PR, skip llama models, keep prints behind debug mode

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-06-25 20:46:34 +08:00
Concedo
12abc41bb4 add llava separator 2024-06-22 21:55:13 +08:00
Concedo
13398477a1 fix ubatch, autoselect vulkan dgpu if possible 2024-06-22 00:23:46 +08:00
askmyteapot
1e72b65c38
GradientAI Auto ROPE Base calculation (#910)
* GradientAI Auto ROPE Base calculation

https://gradient.ai/blog/scaling-rotational-embeddings-for-long-context-language-models
describes a formula that better fits the ideal RoPE scaling (see the sketch at the end of this message).

Tested with Llama 3; checked that the calculation is correct for Llama 2. Retains the logic for not scaling RoPE if below the trained context length.

* add in solar scaling logic

Solar-based models require the context values to be multiplied by 8. This is (I'm guessing) because the positions are based on a 32k context, but with a sliding window of 4k.

* Update model_adapter.h

Add a tensor count check to identify Solar models, which have 435 tensors.

* Update model_adapter.cpp

add in n_tensor count for solar identification

* refactor and cleanup GradientAI rope scaling

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
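
The sketch referenced above: a hedged reconstruction of an NTK-aware, gradient-style auto RoPE base calculation. The exact formula, constants, and names used in koboldcpp may differ; this only illustrates the shape of the logic (scale-dependent base growth, the Solar ×8 context adjustment, and no scaling below the trained context):

```cpp
#include <cmath>

// Hedged sketch, not the actual koboldcpp implementation.
float auto_rope_base(float base,        // model's original rope base (theta)
                     float target_ctx,  // requested context length
                     float train_ctx,   // context length the model was trained at
                     float head_dim,    // rotary embedding dimension
                     bool  is_solar) {
    if (is_solar) {
        // Solar positions assume a 32k context with a 4k sliding window.
        target_ctx *= 8.0f;
    }
    if (target_ctx <= train_ctx) {
        return base; // no scaling below the trained context length
    }
    const float scale = target_ctx / train_ctx;
    // NTK-aware form: base' = base * scale^(d / (d - 2))
    return base * std::pow(scale, head_dim / (head_dim - 2.0f));
}
```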
2024-06-13 18:12:00 +08:00
Concedo
10b148f4c2 added skip bos for tokenize endpoint 2024-06-05 10:49:11 +08:00
Concedo
10a1d628ad added new binding fields for quant k and quant v 2024-06-03 14:35:59 +08:00
Concedo
4b664b3409 improved EOT handling 2024-05-19 22:04:51 +08:00
Concedo
1db3421c52 multiple minor fixes 2024-05-17 15:47:53 +08:00
Concedo
44443edfda rep pen slope works (+1 squashed commits)
Squashed commits:

[535ad566] experiment with rep pen range
2024-05-15 17:20:57 +08:00
Concedo
eff01660e4 re-added smart context due to people complaining 2024-05-11 17:25:03 +08:00
Concedo
dbe72b959e tidy up and refactor code to support old flags 2024-05-10 16:50:53 +08:00
Concedo
173c7272d5 EOS bypass mode added 2024-05-06 18:01:49 +08:00