Mirror of https://github.com/LostRuins/koboldcpp.git, synced 2025-09-08 16:19:05 +00:00
* ggml : update mul_mat_id to use the same tensor for all the experts
* update cuda
* minor
* update metal
* update test-backend-ops
* fix cuda
* Update ggml-metal.m
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* update convert.py
* update convert-hf-to-gguf.py
* update convert.py for mixtral hf models
* Update convert-hf-to-gguf.py
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* cuda : support non-pow-2 number of experts
* allow quantize to work for split and merged experts models in the same way
* cleanup + disable mmap automatically with split tensors models
* update imatrix
* test-backend-ops : test qwen argsort
* update grok model loading
* llama : add merged experts tensors to the grok tensor map
* minor
* gguf : bump version
* fix quantizing of merged experts
* convert-hf-to-gguf.py : update grok (untested)
* make linter happy
* cuda/argsort : use shared memory instead of pool memory
* convert : fix grok tensor names
* metal : add support for non-pow-2 argsort
* llama : more loader cleanup, better error checking
* cuda : fix warning
* llama : still use mmap for loading old models, but copy the data to a host buffer
* add review note
* llama : remove ffn tensor counting + add sanity check
  ggml-ci
* convert : fix handling of n_experts == None
  ggml-ci
* imatrix : fix ncall counters
* llama : produce error if imatrix size does not match
* quantize : terminate on errors + trace logs
  ggml-ci
* metal : pad shared memory to 16 bytes

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

| ||
---|---|---|
baby-llama | ||
batched | ||
batched-bench | ||
batched.swift | ||
beam-search | ||
benchmark | ||
convert-llama2c-to-ggml | ||
embedding | ||
export-lora | ||
finetune | ||
gguf | ||
gguf-split | ||
gritlm | ||
imatrix | ||
infill | ||
jeopardy | ||
llama-bench | ||
llama.android | ||
llama.swiftui | ||
llava | ||
lookahead | ||
lookup | ||
main | ||
main-cmake-pkg | ||
parallel | ||
passkey | ||
perplexity | ||
quantize | ||
quantize-stats | ||
retrieval | ||
save-load-state | ||
server | ||
simple | ||
speculative | ||
sycl | ||
tokenize | ||
train-text-from-scratch | ||
alpaca.sh | ||
base-translate.sh | ||
chat-13B.bat | ||
chat-13B.sh | ||
chat-persistent.sh | ||
chat-vicuna.sh | ||
chat.sh | ||
CMakeLists.txt | ||
gpt4all.sh | ||
json-schema-pydantic-example.py | ||
json-schema-to-grammar.py | ||
llama.vim | ||
llama2-13b.sh | ||
llama2.sh | ||
llm.vim | ||
make-ggml.py | ||
Miku.sh | ||
pydantic-models-to-grammar-examples.py | ||
pydantic_models_to_grammar.py | ||
reason-act.sh | ||
regex-to-grammar.py | ||
server-embd.py | ||
server-llama2-13B.sh | ||
ts-type-to-grammar.sh |