Commit graph

6004 commits

Author SHA1 Message Date
Georgi Gerganov
d197545530
llama : bump max layers from 256 to 512 (#8530)
* llama : bump max layers from 256 to 512

* llama : replace asserts with exceptions
2024-07-19 16:50:47 +03:00
Georgi Gerganov
be0cfb4175
readme : fix server badge 2024-07-19 14:34:55 +03:00
Clint Herron
b57eb9ca4f
ggml : add friendlier error message to fopen errors (#8575)
* Add additional error information when model files fail to load.

* Adding additional error information to most instances of fopen.
2024-07-19 14:05:45 +03:00
Frank Mai
f299aa98ec
fix: typo of chatglm4 chat tmpl (#8586)
Signed-off-by: thxCode <thxcode0824@gmail.com>
2024-07-19 11:44:41 +02:00
Concedo
1a23d49c32 serve tags endpoint 2024-07-19 16:08:54 +08:00
Brian
3d0e4367d9
convert-*.py: add general.name kv override (#8571) 2024-07-19 17:51:51 +10:00
Concedo
24b9616344 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	.devops/full-cuda.Dockerfile
#	.devops/full-rocm.Dockerfile
#	.devops/full.Dockerfile
#	.devops/llama-cli-cuda.Dockerfile
#	.devops/llama-cli-intel.Dockerfile
#	.devops/llama-cli-rocm.Dockerfile
#	.devops/llama-cli-vulkan.Dockerfile
#	.devops/llama-cli.Dockerfile
#	.devops/llama-server-cuda.Dockerfile
#	.devops/llama-server-intel.Dockerfile
#	.devops/llama-server-rocm.Dockerfile
#	.devops/llama-server-vulkan.Dockerfile
#	.devops/llama-server.Dockerfile
#	CMakeLists.txt
#	CONTRIBUTING.md
#	Makefile
#	ggml/CMakeLists.txt
#	ggml/src/CMakeLists.txt
#	requirements.txt
#	src/llama.cpp
#	tests/test-backend-ops.cpp
2024-07-19 14:23:33 +08:00
Johannes Gäßler
a15ef8f8a0
CUDA: fix partial offloading for ne0 % 256 != 0 (#8572) 2024-07-18 23:48:47 +02:00
Concedo
a998588f3a improved estimation 2024-07-19 00:20:11 +08:00
65a
705b7ecf60
cmake : install all ggml public headers (#8480)
Co-authored-by: 65a <65a@65a.invalid>
2024-07-18 17:47:12 +03:00
Concedo
caab9cb8ae fixed unwanted removal 2024-07-18 22:27:22 +08:00
BBC-Esq
621801da0e
Streamline misc (#1007)
* fix typo and streamline a little

* streamline togglehorde

* oops
2024-07-18 22:25:38 +08:00
Concedo
8b0a9f7e56 remove keys, use tuple 2024-07-18 22:11:13 +08:00
BBC-Esq
7de1ebf897
Streamline with dictionaries (#1005)
* dictionary #1

* dictionary #2
2024-07-18 22:05:30 +08:00
BBC-Esq
ce971a0f3d
Streamline with fstrings (#1006)
* fstring #1

* fstring #2
2024-07-18 21:48:46 +08:00
Eric Zhang
0d2c7321e9
server: use relative routes for static files in new UI (#8552)
* server: public: fix api_url on non-index pages

* server: public: use relative routes for static files in new UI
2024-07-18 12:43:49 +02:00
Brian
672a6f1018
convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor (#7499)
The main change is that the default output filename will take this form

{name}{parameters}{finetune}{version}{encoding}{kind}

In addition, this adds and removes some entries in the KV store and introduces a metadata class with automatic heuristics to derive some values from the model card content

* No Change:
  - Internal GGUF Spec
    - `general.architecture`
    - `general.quantization_version`
    - `general.alignment`
    - `general.file_type`
  - General Model Details
    - `general.name`
    - `general.author`
    - `general.version`
    - `general.description`
  - Licensing details
    - `general.license`
  - Typically represents the converted GGUF repo (Unless made from scratch)
    - `general.url`
  - Model Source during conversion
    - `general.source.url`

* Removed:
  - Model Source during conversion
    - `general.source.huggingface.repository`

* Added:
  - General Model Details
    - `general.organization`
    - `general.finetune`
    - `general.basename`
    - `general.quantized_by`
    - `general.size_label`
  - Licensing details
    - `general.license.name`
    - `general.license.link`
  - Typically represents the converted GGUF repo (Unless made from scratch)
    - `general.doi`
    - `general.uuid`
    - `general.repo_url`
  - Model Source during conversion
    - `general.source.doi`
    - `general.source.uuid`
    - `general.source.repo_url`
  - Base Model Source
    - `general.base_model.count`
    - `general.base_model.{id}.name`
    - `general.base_model.{id}.author`
    - `general.base_model.{id}.version`
    - `general.base_model.{id}.organization`
    - `general.base_model.{id}.url` (Model Website/Paper)
    - `general.base_model.{id}.doi`
    - `general.base_model.{id}.uuid`
    - `general.base_model.{id}.repo_url` (Model Source Repository (git/svn/etc...))
  - Array based KV stores
    - `general.tags`
    - `general.languages`
    - `general.datasets`

---------

Co-authored-by: compilade <git@compilade.net>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-07-18 20:40:15 +10:00
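For illustration, here is a minimal Python sketch of how a filename following the convention above could be assembled. The helper name, the dash separator, and the field handling are assumptions made for this example, not the actual convert-*.py implementation:

```python
# Illustrative sketch only: builds a filename in the spirit of
# {name}{parameters}{finetune}{version}{encoding}{kind}.
# The separator and field handling are assumptions, not the behaviour
# of the real convert-*.py scripts.
def make_output_filename(name, size_label=None, finetune=None,
                         version=None, encoding=None, kind="gguf"):
    parts = [name.strip().replace(" ", "-")]
    for field in (size_label, finetune, version, encoding):
        if field:
            parts.append(str(field).strip().replace(" ", "-"))
    return "-".join(parts) + "." + kind

# make_output_filename("Mixtral", "8x7B", "Instruct", "v0.1", "Q4_K_M")
# -> "Mixtral-8x7B-Instruct-v0.1-Q4_K_M.gguf"
```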
RunningLeon
3807c3de04
server : respect --special cli arg (#8553) 2024-07-18 11:06:22 +03:00
Concedo
6080fa38ce updated lite 2024-07-18 15:55:45 +08:00
Concedo
90c1bbbcb9 more url download support 2024-07-18 11:56:05 +08:00
Johannes Gäßler
e02b597be3
lookup: fibonacci hashing, fix crashes (#8548) 2024-07-17 23:35:44 +02:00
Al Mochkin
b3283448ce
build : Fix docker build warnings (#8535) (#8537) 2024-07-17 20:21:55 +02:00
Concedo
ad86b1aeb8 Implemented Kcpp Launch Templates (+1 squashed commit)
Squashed commits:

[5ea4c1de] wip integrating skcpps templates (+1 squashed commits)

Squashed commits:

[737daa7f] skcpps wip
2024-07-18 00:22:59 +08:00
Brian
30f80ca0bc
CONTRIBUTING.md : remove mention of noci (#8541) 2024-07-17 17:57:06 +03:00
Concedo
8ccc0144d2 ability to set -1 as gpulayers and determine at runtime (+1 squashed commit)
Squashed commits:

[594263c3] ability to set -1 as gpulayers and determine at runtime
2024-07-17 20:31:19 +08:00
hipudding
1bdd8ae19f
[CANN] Add Ascend NPU backend (#6035)
* [CANN] Add Ascend NPU backend

Ascend is a full-stack AI computing infrastructure for industry
applications and services based on Huawei Ascend processors and
software.

CANN (Compute Architecture of Neural Networks), developed by
Huawei, is a heterogeneous computing architecture for AI.

Co-authored-by: wangshuai09 <391746016@qq.com>

* delete trailing whitespaces

* Modify the code based on review comment

* Rename LLAMA_CANN to GGML_CANN

* Make ggml-common.h private

* add ggml_cann prefix for acl funcs

* Add logging for CANN backend

* Delete Trailing whitespace

---------

Co-authored-by: wangshuai09 <391746016@qq.com>
2024-07-17 14:23:50 +03:00
Concedo
869e30a6a0 Updated CLInfo from https://github.com/Oblomov/clinfo
https://ci.appveyor.com/api/projects/oblomov/clinfo/artifacts/clinfo.exe?job=platform%3a+x64
2024-07-17 19:20:17 +08:00
Concedo
6c883a4803 dummy skcpps format 2024-07-17 18:35:27 +08:00
Concedo
eca7521c13 allowed embedded chat adapters 2024-07-17 18:08:43 +08:00
Masaya, Kato
da3913d8f9
batched: fix n_predict parameter (#8527) 2024-07-17 10:34:28 +03:00
Georgi Gerganov
d65a8361fe
llama : disable context-shift for DeepSeek v2 (#8501) 2024-07-17 10:32:59 +03:00
Concedo
5988243aee fix wrong order, fix llava debug mode failure 2024-07-17 15:30:19 +08:00
Johannes Gäßler
5e116e8dd5
make/cmake: add missing force MMQ/cuBLAS for HIP (#8515) 2024-07-16 21:20:59 +02:00
Concedo
e99fa531a2 reorder items 2024-07-17 00:28:48 +08:00
Concedo
d775a419b2 updated lite with chat inject, added layer detect, added more console logging 2024-07-16 23:10:15 +08:00
Brian
1666f92dcd
gguf-hash : update clib.json to point to original xxhash repo (#8491)
* Update clib.json to point to Cyan4973 original xxhash

Convinced Cyan4973 to add clib.json directly to his repo, so the clib package can now point directly to it. Previously it pointed to my fork, which carried the clib.json package metadata

https://github.com/Cyan4973/xxHash/pull/954

* gguf-hash: readme update to point to Cyan4973 xxHash repo [no ci]
2024-07-16 10:14:16 +03:00
Steve Bonds
37b12f92ab
export-lora : handle help argument (#8497)
The --help option on export-lora isn't accepted as valid. The help still gets displayed by default, but the script exits with an error message and nonzero status.
2024-07-16 10:04:45 +03:00
Georgi Gerganov
0efec57787
llama : valign + remove unused ftype (#8502) 2024-07-16 10:00:30 +03:00
compilade
7acfd4e8d5
convert_hf : faster lazy safetensors (#8482)
* convert_hf : faster lazy safetensors

This makes '--dry-run' much, much faster.

* convert_hf : fix memory leak in lazy MoE conversion

The '_lazy' queue was sometimes self-referential,
which caused reference cycles of objects old enough
to avoid garbage collection until potential memory exhaustion.
2024-07-15 23:13:10 -04:00
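To make the failure mode above concrete, here is a generic Python sketch (not the actual convert_hf code; the class names are invented for this example). A queue that is strongly referenced by the objects it holds forms a reference cycle, so those objects are never freed by reference counting alone and survive until the cycle collector gets to them; a weak back-reference breaks the cycle:

```python
# Generic illustration of a self-referential queue, not the convert_hf code.
import weakref

class LazyTensor:
    def __init__(self, name, queue):
        self.name = name
        # A strong back-reference (self.queue = queue) would create the cycle
        # queue -> tensor -> queue; a weak reference avoids it.
        self.queue = weakref.ref(queue)

class LazyQueue:
    def __init__(self):
        self.pending = []

    def push(self, name):
        self.pending.append(LazyTensor(name, self))

q = LazyQueue()
q.push("blk.0.attn_q.weight")
print(q.pending[0].queue() is q)  # True: the queue is reachable, but only weakly
```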
Xuan Son Nguyen
97bdd26eee
Refactor lora adapter support (#8332)
* lora: load to device buft

* add patch tensor function

* correct tensor patch

* llama_lora_adapter_apply

* correct ggml_backend_tensor_copy

* add llm_build_mm

* fix auto merge

* update based on review comments

* add convert script

* no more transpose A

* add f16 convert

* add metadata check

* add sanity check

* fix ftype

* add requirements

* fix requirements

* fix outfile

* conversion: only allow selected models

* fix types

* cuda : do not use dmmv if the tensor does not have enough cols

* llama : lora fixes

* do not disable mmap with lora

Co-authored-by: slaren <slarengh@gmail.com>

* llm_build_lora_mm_id

* convert_lora : MoE LoRA conversion support

* convert_lora : prefer safetensors, similarly to convert_hf

* convert_hf : simplify modify_tensors for InternLM2

* convert_lora : lazy conversion

* llama : load and use alpha from LoRA adapters

* llama : use llm_build_lora_mm in most model graphs

* auto scale

* Revert "auto scale"

This reverts commit 42415a4874e0f963e4aca6796ea5dfb97cd17464.

* remove redundant params

* Apply suggestions from code review

Co-authored-by: slaren <slarengh@gmail.com>

* change kv metadata

* move add_type to __init__

* convert_hf : move add_type to main()

* convert_lora : use the GGUFWriter from Model instead of overwriting it

---------

Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-07-15 20:50:47 +02:00
Xuan Son Nguyen
4db8f60fe7
fix ci (#8494) 2024-07-15 19:23:10 +02:00
Concedo
a441c27cb5 fixed broken link 2024-07-16 01:00:16 +08:00
Concedo
e707ab9025 Merge branch 'upstream' into concedo_experimental
# Conflicts:
#	docs/development/HOWTO-add-model.md
#	docs/development/token_generation_performance_tips.md
#	flake.lock
2024-07-16 00:49:34 +08:00
Concedo
516fd35e93 error popups on python exits 2024-07-16 00:46:32 +08:00
Concedo
8412946b9f fix oldcpu build avx1 2024-07-15 23:42:22 +08:00
Concedo
21179d675b try ci for avx1, up ver (+2 squashed commits)
Squashed commit:

[74150175] up version

[97b6163c] try ci for avx1 linux
2024-07-15 23:07:07 +08:00
Daniel Bevenius
8fac431b06
ggml : suppress unknown pragma 'GCC' on windows (#8460)
This commit adds a macro guard around the GCC pragma to avoid the following
warning on Windows:

```console
C:\llama.cpp\ggml\src\ggml-aarch64.c(17,9): warning C4068:
unknown pragma 'GCC' [C:\lama.cpp\build\ggml\src\ggml.vcxproj]
```
2024-07-15 15:48:17 +03:00
M-A
f17f39ff9c
server: update README.md with llama-server --help output [no ci] (#8472)
The README.md had stale information. In particular, the --ctx-size
"defaults to 512" confused me and I had to check the code to confirm
this was false. Since the server is evolving rapidly, it's probably
better to keep the source of truth in a single place (the source) and
generate the README.md based on that.

Did:

    make llama-server
    ./llama-server --help > t.txt
    vimdiff t.txt examples/server/README.md

I copied the content inside a backquote block. I would have preferred
proper text, but it would require a fair amount of surgery to make the
current output compatible with markdown. A follow-up could be to
automate this process with a script.

No functional change.
2024-07-15 15:04:56 +03:00
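A minimal Python sketch of the suggested follow-up; the HELP BEGIN/END markers (and the assumption that the README carries them) are invented for this example and are not part of the repository:

```python
# Hypothetical follow-up: refresh the help section of
# examples/server/README.md from `./llama-server --help`.
# The HELP BEGIN/END markers are an assumption made for this sketch.
import re
import subprocess
from pathlib import Path

readme = Path("examples/server/README.md")
help_text = subprocess.run(
    ["./llama-server", "--help"], capture_output=True, text=True, check=True
).stdout

fence = "`" * 3  # built dynamically to avoid nesting a literal fence here
block = f"<!-- HELP BEGIN -->\n{fence}\n{help_text.rstrip()}\n{fence}\n<!-- HELP END -->"

updated = re.sub(
    r"<!-- HELP BEGIN -->.*?<!-- HELP END -->",
    lambda _: block,  # callable replacement keeps backslashes in the help text literal
    readme.read_text(),
    flags=re.DOTALL,
)
readme.write_text(updated)
```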
Georgi Gerganov
9104bc20ed
common : add --no-cont-batching arg (#6358) 2024-07-15 14:54:58 +03:00
NikolaiLyssogor
fc690b018e
docs: fix links in development docs [no ci] (#8481)
Fixes a few links within the repo that were broken in the reorganization of the documentation in #8325.
2024-07-15 14:46:39 +03:00