Concedo
2c1a06a07d
wip ollama emulation, added detokenize endpoint
2024-11-23 22:48:03 +08:00
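The detokenize endpoint mentioned above can be sketched as a thin handler that maps token ids back to text. The payload shape and helper names below are assumptions for illustration, not the repo's actual API.

```python
# Hedged sketch of a detokenize endpoint handler: map a list of
# token ids back into text. 'tokens' as the payload key and the
# injected detokenize_fn are illustrative assumptions.
def handle_detokenize(payload, detokenize_fn):
    tokens = payload.get("tokens", [])
    return {"text": detokenize_fn(tokens)}
```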
Concedo
1dd37933e3
fixed grammar not resetting correctly
2024-11-23 09:55:12 +08:00
kallewoof
547ab2aebb
API: add /props route ( #1222 )
* API: add an /extra/chat_template route
A lot of manual tweaking is done when swapping between models. We can automate or make better assumptions about some of them by having more information, such as chat template. This PR adds an endpoint /extra/chat_template which returns the model chat template string as is in a 'chat_template' key. The front end can then use this to derive the proper templates or use it as is, or at least warn the user when they are trying to use e.g. a Mistral preset with a Llama 3.1 model.
* switch to pre-established /props endpoint for chat template
* bug-fix (upstream): off-by-one in string juggling
2024-11-21 10:58:32 +08:00
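The /props flow described in the PR above can be sketched as follows. The base URL/port and both helpers are assumptions for illustration; only the 'chat_template' key comes from the commit message.

```python
import json
from urllib.request import urlopen

# Hedged sketch: fetch the model's chat template from a running
# server's /props route (URL and port are assumptions).
def fetch_chat_template(base_url="http://localhost:5001"):
    with urlopen(f"{base_url}/props") as resp:
        props = json.load(resp)
    # the template string is returned as-is under 'chat_template'
    return props.get("chat_template", "")

# A front end could apply simple heuristics to the template text to
# warn about preset mismatches, e.g. a Mistral preset on a Llama 3.1
# model (the marker below is a Llama 3 chat-template token).
def looks_like_llama3(template: str) -> bool:
    return "<|start_header_id|>" in template
```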
Concedo
70aee82552
attempts a backflip, but does he stick the landing?
2024-11-16 17:05:45 +08:00
Concedo
bfa118ee45
fix llava segfault
2024-11-14 14:16:39 +08:00
Concedo
4b96c3bba8
try new batch api (not actually batching)
2024-11-14 13:47:26 +08:00
Concedo
3813f6c517
added new flag nofastforward allowing users to disable fast forwarding
2024-11-13 10:59:01 +08:00
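The fast forwarding (context reuse) that the new nofastforward flag disables can be sketched as reusing the longest common token prefix between the previous and new prompt, so only the changed tail is re-evaluated. Function names here are illustrative, not the repo's code.

```python
# Length of the shared token prefix between two prompts.
def common_prefix_len(prev_tokens, new_tokens):
    n = 0
    for a, b in zip(prev_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

# With fast forwarding on, only the tokens after the shared prefix
# need evaluation; disabling it re-evaluates everything.
def tokens_to_evaluate(prev_tokens, new_tokens, fastforward=True):
    if not fastforward:
        return new_tokens
    skip = common_prefix_len(prev_tokens, new_tokens)
    return new_tokens[skip:]
```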
Concedo
48e9372337
prevent outputting infinity to logprobs (+1 squashed commits)
Squashed commits:
[bcc5f8b92] prevent outputting infinity to logprobs
2024-11-13 00:09:53 +08:00
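A minimal sketch of the fix class above, assuming logprob values may be -inf for masked tokens: JSON cannot encode Infinity or NaN, so such values are clamped to a finite floor before the logprobs payload is serialized. The floor value is a hypothetical sentinel.

```python
import math

LOGPROB_FLOOR = -100000.0  # hypothetical finite sentinel

# Replace non-finite logprobs with the floor so the JSON payload
# stays valid; finite values pass through unchanged.
def sanitize_logprob(lp: float) -> float:
    if math.isinf(lp) or math.isnan(lp):
        return LOGPROB_FLOOR
    return lp
```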
kallewoof
3c36bbdcd7
debug: display tokens that were dropped by XTC sampler when debugmode is enabled ( #1201 )
2024-11-06 23:09:28 +08:00
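A hedged sketch of XTC ("exclude top choices") sampling and the debug information the commit above surfaces: with probability xtc_probability, drop every candidate whose probability meets xtc_threshold except the least likely of them, and return the dropped tokens so they can be displayed in debug mode. Names and exact semantics here are assumptions, not the repo's implementation.

```python
import random

def xtc_filter(candidates, xtc_threshold=0.1, xtc_probability=1.0, rng=random):
    # candidates: list of (token, prob) sorted descending by prob
    if rng.random() >= xtc_probability:
        return candidates, []          # sampler not triggered this step
    above = [c for c in candidates if c[1] >= xtc_threshold]
    if len(above) < 2:
        return candidates, []          # nothing to exclude
    dropped = above[:-1]               # keep the last token above threshold
    kept = candidates[len(dropped):]
    return kept, dropped
```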
Concedo
223c5f0844
clblast survived
2024-11-02 21:51:38 +08:00
Concedo
bbebc76817
fix top picks bug, lower input anti-abuse thresholds (+1 squashed commits)
Squashed commits:
[a81d9b21] fix top picks bug, lower input anti-abuse thresholds
2024-11-01 16:42:13 +08:00
Concedo
aa26a58085
added logprobs api and logprobs viewer
2024-11-01 00:22:15 +08:00
Concedo
90f5cd0f67
wip logprobs data
2024-10-30 00:59:34 +08:00
Concedo
94a5a27b85
Alone in the darkness
They're coming for you
I know they will try to catch me too
Alone in the darkness
They're calling for you
There's nowhere to run for cover
2024-10-24 22:29:20 +08:00
Concedo
becd737e0f
slightly increase padding to handle longer gen amts
2024-10-23 22:58:41 +08:00
Maya
8bb220329c
Dynamic sizes for sequences ( #1157 )
* Dynamic sizes for sequences
* cleanup PR: move all dynamic fields to the end of the payload, ensure correct null handling to match existing behavior, add an anti-abuse limit of max 512 for dynamic fields
* adjust anti-abuse limits
---------
Co-authored-by: Concedo <39025047+LostRuins@users.noreply.github.com>
2024-10-16 23:55:11 +08:00
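The anti-abuse clamp described in the PR above can be sketched as: dynamic payload fields are truncated to at most 512 entries, and missing (null) fields fall back to the previous behavior. Names are hypothetical.

```python
MAX_DYNAMIC_LEN = 512  # cap from the PR description

# Clamp a dynamic request field; None preserves the pre-existing
# default behavior, anything longer than the cap is truncated.
def clamp_dynamic(field, default):
    if field is None:
        return default
    return field[:MAX_DYNAMIC_LEN]
```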
Concedo
7f76425450
lower topk prefilter token amount to 3k
2024-10-16 20:39:41 +08:00
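The top-k prefilter the commit above tunes can be sketched as keeping only the indices of the 3000 highest logits before sampling. The 3k value comes from the commit message; the helper itself is illustrative.

```python
import heapq

PREFILTER_K = 3000  # value from the commit message

# Return the indices of the k largest logits; if the vocabulary is
# already smaller than k, keep everything.
def prefilter_topk(logits, k=PREFILTER_K):
    if len(logits) <= k:
        return list(range(len(logits)))
    return heapq.nlargest(k, range(len(logits)), key=logits.__getitem__)
```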
Concedo
cff72c5d26
remove unwanted print
2024-10-11 18:56:32 +08:00
Concedo
e692a79aab
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/docker.yml
# CMakeLists.txt
# CONTRIBUTING.md
# docs/android.md
# docs/docker.md
# examples/embedding/embedding.cpp
# examples/imatrix/imatrix.cpp
# examples/infill/infill.cpp
# examples/llama-bench/llama-bench.cpp
# examples/main/README.md
# examples/parallel/parallel.cpp
# examples/perplexity/perplexity.cpp
# examples/quantize-stats/quantize-stats.cpp
# examples/save-load-state/save-load-state.cpp
# examples/server/README.md
# examples/simple/CMakeLists.txt
# examples/speculative/speculative.cpp
# flake.lock
# ggml/src/CMakeLists.txt
# ggml/src/ggml-blas.cpp
# pocs/vdot/q8dot.cpp
# pocs/vdot/vdot.cpp
# scripts/debug-test.sh
# scripts/sync-ggml.last
# src/llama.cpp
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-quantize-fns.cpp
# tests/test-quantize-perf.cpp
# tests/test-tokenizer-0.cpp
# tests/test-tokenizer-1-bpe.cpp
# tests/test-tokenizer-1-spm.cpp
2024-10-11 11:59:59 +08:00
Maya
5c9650d68e
Fix access violation when using banned_phrases ( #1154 )
2024-10-10 21:46:39 +08:00
Concedo
fe5479f286
unify antislop and token bans
2024-10-10 18:21:07 +08:00
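A hedged sketch of the anti-slop ("phrase ban") idea in the commits above: if the running output ends with a banned phrase, rewind past it so the caller can resample with that continuation banned. All names are illustrative, and the real sampler works on tokens rather than raw text.

```python
# Check the running output against banned phrases; on a hit, return
# the rewound text plus the phrase that triggered the ban, otherwise
# return the output unchanged with None.
def check_antislop(output: str, banned_phrases):
    for phrase in banned_phrases:
        if phrase and output.endswith(phrase):
            return output[: len(output) - len(phrase)], phrase
    return output, None
```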
Concedo
9b614d46bd
antislop sampler working
2024-10-09 16:33:04 +08:00
Concedo
36e9bac98f
wip anti slop sampler
2024-10-09 13:34:47 +08:00
Concedo
f78f8d3d45
wip anti slop
2024-10-07 23:18:13 +08:00
Concedo
65f3c68399
wip antislop
2024-10-07 20:19:22 +08:00
Concedo
740c5e01cb
added token delay feature
2024-10-07 19:45:51 +08:00
Concedo
3e8bb10e2d
wip on rewind function
2024-10-06 16:21:03 +08:00
Concedo
c38d1ecc8d
update templates, fix rwkv
2024-09-22 01:32:12 +08:00
Concedo
53bf0fb32d
removed OpenBLAS backend, merged into the CPU backend (with llamafile for BLAS). The GPU backend is now automatically selected when running from the CLI unless noblas is specified.
2024-09-15 19:21:52 +08:00
Concedo
e44ddf26ef
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/server.yml
# CMakeLists.txt
# Makefile
# examples/embedding/embedding.cpp
# examples/imatrix/imatrix.cpp
# examples/llama-bench/llama-bench.cpp
# examples/llava/MobileVLM-README.md
# examples/parallel/parallel.cpp
# examples/perplexity/perplexity.cpp
# examples/quantize/CMakeLists.txt
# examples/server/README.md
# examples/speculative/speculative.cpp
# tests/test-backend-ops.cpp
2024-09-13 16:17:24 +08:00
Concedo
7bdac9bc44
prevent shifting on rwkv
2024-09-11 20:22:45 +08:00
Concedo
eee67281be
move kcpp params out
2024-09-10 16:30:12 +08:00
Concedo
fc7fe2e7a0
allow rwkv6 to run although it's broken
2024-09-09 20:50:58 +08:00
Concedo
b63158005f
All samplers moved to kcpp side
2024-09-09 18:14:11 +08:00
Concedo
12fd16bfd4
Merge commit 'df270ef745' into concedo_experimental
# Conflicts:
# Makefile
# common/CMakeLists.txt
# common/common.h
# common/sampling.cpp
# common/sampling.h
# examples/infill/infill.cpp
# examples/llama-bench/llama-bench.cpp
# examples/quantize-stats/quantize-stats.cpp
# examples/server/server.cpp
# include/llama.h
# src/llama-sampling.cpp
# src/llama-sampling.h
# src/llama.cpp
# tests/test-grammar-integration.cpp
# tests/test-grammar-parser.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-llama-grammar.cpp
# tests/test-sampling.cpp
2024-09-09 17:10:08 +08:00
Concedo
c78690737c
fix for DRY segfault on unicode character substring tokenization
2024-09-08 18:25:00 +08:00
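The failure class behind the DRY fix above can be illustrated in a few lines: taking byte-level substrings of UTF-8 text can split a multi-byte character, which then no longer decodes (or re-tokenizes) cleanly.

```python
# 'é' encodes to two bytes in UTF-8; slicing mid-codepoint leaves
# an invalid byte sequence behind.
s = "héllo"
raw = s.encode("utf-8")   # b'h\xc3\xa9llo'
bad = raw[:2]             # cuts the 2-byte 'é' in half
# decoding the truncated bytes yields a replacement character
assert bad.decode("utf-8", errors="replace") == "h\ufffd"
```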
Concedo
d220495dd4
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/llama-cli-cuda.Dockerfile
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-intel.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .devops/llama-server.Dockerfile
# .github/workflows/docker.yml
# docs/docker.md
# examples/llama-bench/llama-bench.cpp
# flake.lock
# ggml/include/ggml.h
# ggml/src/CMakeLists.txt
# scripts/sync-ggml.last
# src/llama.cpp
# tests/test-backend-ops.cpp
# tests/test-grad0.cpp
# tests/test-rope.cpp
2024-08-30 10:37:39 +08:00
Concedo
b78a637da5
try to optimize context shifting
2024-08-26 23:07:31 +08:00
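Context shifting, which the commit above optimizes, can be sketched as: when the context window is full, drop a chunk of the oldest non-protected tokens and continue generating instead of re-evaluating the whole prompt. Parameter names are assumptions.

```python
# Preserve a protected prefix (e.g. memory/system prompt) and discard
# shift_amount tokens immediately after it once the window fills.
def shift_context(tokens, max_ctx, keep_prefix, shift_amount):
    if len(tokens) < max_ctx:
        return tokens  # still room, nothing to do
    return tokens[:keep_prefix] + tokens[keep_prefix + shift_amount:]
```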
Concedo
cca3c4c78b
xtc fixes
2024-08-22 23:18:46 +08:00
Concedo
fc2545dc83
fixed a typo
2024-08-22 00:25:56 +08:00
Concedo
5bf527a6ae
added xtc sampler
2024-08-21 23:57:15 +08:00
Concedo
1a7ecd55e6
timing for init step, clip for vulkan
2024-08-21 18:14:53 +08:00
Concedo
cd69ab218e
fixed DRY
2024-08-21 17:01:28 +08:00
Concedo
6a4becb731
dry is still buggy because token indexes are wrong
2024-08-21 00:59:26 +08:00
Concedo
db6ef8d1e1
revert dry state reset
2024-08-20 22:22:21 +08:00
Concedo
c1ae350e5b
fixed race condition when generating
2024-08-20 20:17:55 +08:00
Concedo
e12ab53488
force clear some DRY state vars on new generation - not sure if this helps
2024-08-14 21:35:39 +08:00
Concedo
689a17d756
always prefilter to 5k logits
2024-08-12 22:27:06 +08:00
Concedo
729eb1e552
no fast forward for empty prompt
2024-07-27 16:29:35 +08:00
Concedo
eb5b4d0186
Merge branch 'upstream' into concedo_experimental
# Conflicts:
# Makefile
# Package.swift
# src/CMakeLists.txt
# src/llama.cpp
# tests/test-grammar-integration.cpp
# tests/test-llama-grammar.cpp
2024-07-23 23:20:32 +08:00