Commit graph

248 commits

Author SHA1 Message Date
askmyteapot
1e72b65c38
GradientAI Auto ROPE Base calculation (#910)
* GradientAI Auto ROPE Base calculation

https://gradient.ai/blog/scaling-rotational-embeddings-for-long-context-language-models
describes a formula that better fits the ideal RoPE scaling.

Tested with Llama3; checked that the calculation is correct for Llama2. Retains the logic for not scaling rope when under the trained context length.

* add in solar scaling logic

Solar-based models require the context values to be multiplied by 8. This is (I'm guessing) because the positions are based on a 32k context, but with a sliding window of 4k.

* Update model_adapter.h

Adding a tensor count check to identify Solar models based on their tensor count of 435.

* Update model_adapter.cpp

add in n_tensor count for solar identification

* refactor and cleanup GradientAI rope scaling

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-06-13 18:12:00 +08:00
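The auto RoPE base calculation in the commit above can be sketched as follows. This is a minimal illustration of the base-scaling formula from the linked GradientAI post (new_base = base · scale^(dim/(dim−2))) plus the Solar ×8 context multiplier described in the commit notes; the function name and exact structure are assumptions, not the actual KoboldCpp implementation.

```python
def gradient_rope_freq_base(base: float, head_dim: int,
                            train_ctx: int, target_ctx: int,
                            is_solar: bool = False) -> float:
    """Sketch of the GradientAI auto RoPE base formula (an assumption,
    not KoboldCpp's exact code): base' = base * scale**(d / (d - 2))."""
    if is_solar:
        # Solar models: positions assume a 32k context with a 4k sliding window,
        # so the commit multiplies the context value by 8 before scaling.
        target_ctx *= 8
    if target_ctx <= train_ctx:
        # Retain the original base when under the trained context length.
        return base
    scale = target_ctx / train_ctx
    return base * scale ** (head_dim / (head_dim - 2))
```

For example, doubling the context of a head-dim-128 model trained at 4096 raises the base from 10000 to roughly 20200.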
Concedo
10b148f4c2 added skip bos for tokenize endpoint 2024-06-05 10:49:11 +08:00
Concedo
10a1d628ad added new binding fields for quant k and quant v 2024-06-03 14:35:59 +08:00
Concedo
4b664b3409 improved EOT handling 2024-05-19 22:04:51 +08:00
Concedo
1db3421c52 multiple minor fixes 2024-05-17 15:47:53 +08:00
Concedo
44443edfda rep pen slope works (+1 squashed commits)
Squashed commits:

[535ad566] experiment with rep pen range
2024-05-15 17:20:57 +08:00
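The "rep pen slope" and "rep pen range" experiments above control how strongly the repetition penalty applies across the recent-token window. The exact curve is not given in the commit, so this is a hypothetical sketch with an assumed power-shaped ramp (function name, ramp shape, and signature are all assumptions):

```python
def apply_rep_pen(logits, recent_tokens, rep_pen, rep_pen_range, slope):
    """Hypothetical sketch: penalize recently seen tokens, ramping the
    penalty from weak (oldest token in range) to full strength (most
    recent). slope=1.0 gives a linear ramp; other values bend the curve."""
    window = recent_tokens[-rep_pen_range:]
    n = len(window)
    for i, tok in enumerate(window):
        # i == 0 is the oldest token in range, i == n - 1 the most recent
        t = ((i + 1) / n) ** slope            # ramp in (0, 1]
        pen = 1.0 + (rep_pen - 1.0) * t       # interpolate 1.0 .. rep_pen
        if logits[tok] > 0:
            logits[tok] /= pen                # shrink positive logits
        else:
            logits[tok] *= pen                # push negative logits lower
    return logits
```

With slope 1.0 and range 64, a token at the very end of the window receives the full penalty while one 32 tokens back receives roughly half of it.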
Concedo
eff01660e4 re-added smart context due to people complaining 2024-05-11 17:25:03 +08:00
Concedo
dbe72b959e tidy up and refactor code to support old flags 2024-05-10 16:50:53 +08:00
Concedo
173c7272d5 EOS bypass mode added 2024-05-06 18:01:49 +08:00
Concedo
b48ea96ead removed unwanted debugs 2024-05-01 11:35:07 +08:00
Concedo
c65448d17a add flash attention toggle 2024-04-30 21:29:11 +08:00
Concedo
17a24d753c Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/main-intel.Dockerfile
#	.devops/main-vulkan.Dockerfile
#	.devops/server-intel.Dockerfile
#	.devops/server-vulkan.Dockerfile
#	.github/workflows/bench.yml
#	.github/workflows/build.yml
#	.github/workflows/python-lint.yml
#	.github/workflows/server.yml
#	.gitignore
#	Makefile
#	README-sycl.md
#	README.md
#	ci/run.sh
#	flake.lock
#	llama.cpp
#	models/ggml-vocab-falcon.gguf
#	models/ggml-vocab-llama-spm.gguf
#	models/ggml-vocab-mpt.gguf
#	models/ggml-vocab-stablelm.gguf
#	models/ggml-vocab-starcoder.gguf
#	requirements.txt
#	scripts/check-requirements.sh
#	tests/CMakeLists.txt
#	tests/test-backend-ops.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-tokenizer-0-bpe.py
#	tests/test-tokenizer-0-spm.py
#	tests/test-tokenizer-1-spm.cpp
2024-04-30 21:04:17 +08:00
Concedo
c230b78906 refactored a lot of code, remove bantokens, move it to api 2024-04-27 17:57:13 +08:00
Concedo
4ec8a9c57b expose stop reason in generation 2024-04-27 01:12:12 +08:00
Concedo
0871c7cbd1 Add additional debug info and increased ctx sizes, fixed a bug loading vulkan config 2024-04-25 23:07:37 +08:00
Concedo
cb2dbe9e9a improved rep pen speed 2024-04-24 21:29:21 +08:00
Concedo
b4d2031215 merged, added ability to render special tokens 2024-04-22 18:19:58 +08:00
Concedo
3170284fc3 added support for special tokens as stop sequences 2024-04-20 09:48:32 +08:00
Concedo
b01820dec7 auto rope scaling changes 2024-04-19 23:08:55 +08:00
Concedo
9a25d77cc1 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.github/workflows/docker.yml
#	Makefile
#	README-sycl.md
#	README.md
#	ci/run.sh
#	ggml-cuda.cu
#	ggml.c
#	grammars/README.md
#	scripts/get-wikitext-2.sh
#	scripts/hf.sh
#	scripts/sync-ggml.last
#	tests/test-backend-ops.cpp
#	tests/test-grammar-integration.cpp
#	tests/test-json-schema-to-grammar.cpp
2024-04-14 21:18:39 +08:00
Concedo
125f84aa02 fixed compiler warnings 2024-04-08 16:40:55 +08:00
Concedo
a530afa1e4 Merge commit '280345968d' into concedo_experimental
# Conflicts:
#	.devops/full-cuda.Dockerfile
#	.devops/llama-cpp-cuda.srpm.spec
#	.devops/main-cuda.Dockerfile
#	.devops/nix/package.nix
#	.devops/server-cuda.Dockerfile
#	.github/workflows/build.yml
#	CMakeLists.txt
#	Makefile
#	README.md
#	ci/run.sh
#	docs/token_generation_performance_tips.md
#	flake.lock
#	llama.cpp
#	scripts/LlamaConfig.cmake.in
#	scripts/compare-commits.sh
#	scripts/server-llm.sh
#	tests/test-quantize-fns.cpp
2024-04-07 20:27:17 +08:00
Concedo
2ef03c9de6 fix for physical batch size 2024-03-15 16:45:20 +08:00
Concedo
47c42fd45c fix for mamba processing 2024-03-13 13:27:46 +08:00
Concedo
484d90c330 llava support is now fully functioning 2024-03-11 15:55:32 +08:00
Concedo
d943c739a8 wip submitting of llava image to backend 2024-03-10 17:14:27 +08:00
Concedo
c08d7e5042 wip integration of llava 2024-03-10 11:18:47 +08:00
Concedo
7c64845dea Merge branch 'master' into concedo_experimental
# Conflicts:
#	.devops/nix/sif.nix
#	.github/workflows/build.yml
#	.github/workflows/python-check-requirements.yml
#	README-sycl.md
#	README.md
#	flake.lock
#	flake.nix
#	requirements/requirements-convert-hf-to-gguf.txt
#	scripts/compare-llama-bench.py
2024-03-04 15:33:33 +08:00
Concedo
2d9a90b652 try to fix ci compile errors (+1 squashed commits)
Squashed commits:

[d0d49663] fixed log multiline (+1 squashed commits)

Squashed commits:

[81a8befe] try to fix linux build error (+1 squashed commits)

Squashed commits:

[22850dda] try to fix build (+1 squashed commits)

Squashed commits:

[b8294611] missing type
2024-03-01 23:38:15 +08:00
Concedo
55af5446ad Merge branch 'master' into concedo_experimental
# Conflicts:
#	README.md
#	ci/run.sh
#	llama.cpp
#	scripts/sync-ggml.last
2024-03-01 17:41:37 +08:00
Concedo
524ba12abd refactor - do not use a copy buffer to store generation outputs, instead return a cpp allocated ptr 2024-02-29 14:02:20 +08:00
Concedo
f75e479db0 WIP on sdcpp integration 2024-02-29 00:40:07 +08:00
Concedo
ad638285de Merge branch 'master' into concedo_experimental
# Conflicts:
#	Makefile
#	README.md
#	flake.lock
#	ggml-cuda.cu
#	llama.cpp
#	tests/test-backend-ops.cpp
#	tests/test-quantize-fns.cpp
2024-02-28 13:41:35 +08:00
Concedo
d47e13c892 fixed compile error: GGML_BACKEND_TYPE_GPU (+1 squashed commits)
Squashed commits:

[00ca282a] fixed compile error: LLAMA_SPLIT_MODE_ROW
2024-02-26 10:55:35 +08:00
Concedo
b5ba6c9ece test to see if -Ofast for the ggml library plus batching adjustments fixes the speed regression for ggmlv1 models 2024-02-25 21:14:53 +08:00
Concedo
6d6d79f359 fixed a horrible bug in thread counts 2024-02-22 23:57:40 +08:00
Concedo
8d5e25008f Merge branch 'master' into concedo_experimental
# Conflicts:
#	CMakeLists.txt
#	Makefile
#	README.md
#	ci/run.sh
#	tests/test-tokenizer-0-falcon.cpp
#	tests/test-tokenizer-0-llama.cpp
#	tests/test-tokenizer-1-bpe.cpp
#	tests/test-tokenizer-1-llama.cpp
2024-02-17 15:22:05 +08:00
Concedo
066e73d769 context shift even more lenient 2024-02-11 18:30:38 +08:00
Concedo
590af480ab contextshift more forgiving 2024-02-10 20:49:21 +08:00
Concedo
35111ce01a row split mode is now a toggle 2024-02-09 18:35:58 +08:00
Concedo
992eea71d7 fixes for vulkan multigpu 2024-02-09 14:42:27 +08:00
Concedo
fe424a5466 tensor split active text 2024-02-09 12:02:23 +08:00
Concedo
4cd571db89 vulkan multigpu, show uptime 2024-02-08 16:54:38 +08:00
Concedo
35c32fd0f2 refactor some old code with batching 2024-02-05 15:54:45 +08:00
Alexander Abushady
4cb956c7db
Quadratic Sampling UI (#652)
* Quadratic Sampling UI

Kalomaze's Quadratic Sampling now has a UI within KCPP.

* remove debug prints

* cleanup, add smooth sampler to dynatemp

---------

Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-02-04 16:26:27 +08:00
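Quadratic ("smooth") sampling, which this PR exposes in the UI, reshapes logits with a quadratic penalty centered on the top logit. A minimal sketch, assuming the commonly published transform logit' = h − factor·(logit − h)² where h is the max logit (not necessarily KCPP's exact code):

```python
def quadratic_sample(logits, smoothing_factor):
    """Sketch of quadratic/smooth sampling: pull each logit down by a
    quadratic penalty on its distance from the max logit. The max logit
    is unchanged; larger factors sharpen the distribution. A factor of
    zero or less disables the transform."""
    if smoothing_factor <= 0:
        return list(logits)
    h = max(logits)
    return [h - smoothing_factor * (x - h) ** 2 for x in logits]
```

For example, with logits [4, 2, 0] and factor 0.25, the top logit stays at 4 while the others drop to 3 and 0, concentrating probability on the leading token.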
Concedo
2b02cd75c7 reformat debug logging 2024-02-01 23:20:51 +08:00
Concedo
340fbbbb04 show warning if genamt >= ctxsize, show t/s values 2024-01-31 18:51:42 +08:00
Concedo
13dcf4b556 print seed 2024-01-31 14:42:47 +08:00
Concedo
21ab727e83 change split mode to rows 2024-01-30 22:30:08 +08:00
Concedo
ed09a854f0 Merge branch 'master' into concedo_experimental
# Conflicts:
#	.github/workflows/build.yml
#	.gitignore
#	CMakeLists.txt
#	Makefile
#	README.md
#	ci/run.sh
#	ggml-opencl.cpp
#	tests/CMakeLists.txt
2024-01-27 11:45:07 +08:00