Concedo
53bf0fb32d
removed openblas backend, merged into CPU (with llamafile for BLAS). GPU backend is now automatically selected when running from CLI unless noblas is specified.
2024-09-15 19:21:52 +08:00
Concedo
b63158005f
All samplers moved to kcpp side
2024-09-09 18:14:11 +08:00
Concedo
12fd16bfd4
Merge commit ' df270ef745' into concedo_experimental
...
# Conflicts:
# Makefile
# common/CMakeLists.txt
# common/common.h
# common/sampling.cpp
# common/sampling.h
# examples/infill/infill.cpp
# examples/llama-bench/llama-bench.cpp
# examples/quantize-stats/quantize-stats.cpp
# examples/server/server.cpp
# include/llama.h
# src/llama-sampling.cpp
# src/llama-sampling.h
# src/llama.cpp
# tests/test-grammar-integration.cpp
# tests/test-grammar-parser.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-llama-grammar.cpp
# tests/test-sampling.cpp
2024-09-09 17:10:08 +08:00
Xuan Son Nguyen
1b9ae5189c
common : refactor arg parser ( #9308 )
...
* (wip) argparser v3
* migrated
* add test
* handle env
* fix linux build
* add export-docs example
* fix build (2)
* skip build test-arg-parser on windows
* update server docs
* bring back missing --alias
* bring back --n-predict
* clarify test-arg-parser
* small correction
* add comments
* fix args with 2 values
* refine example-specific args
* no more lamba capture
Co-authored-by: slaren@users.noreply.github.com
* params.sparams
* optimize more
* export-docs --> gen-docs
2024-09-07 20:43:51 +02:00
ltoniazzi
2339a0be1c
tests : add integration test for lora adapters ( #8957 )
...
* Add printing to check weights match torch version
* minor code style changes
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2024-08-18 11:58:04 +02:00
tc-mb
3071c0a5f2
llava : support MiniCPM-V-2.5 ( #7599 )
...
* init
* rename
* add run android for termux in readme
* add android readme
* add instructions in readme
* change name in readme
* Update README.md
* fixed line
* add result in readme
* random pos_embed
* add positions index
* change for ollama
* change for ollama
* better pos_embed in clip
* support ollama
* updata cmakelist
* updata cmakelist
* rename wrapper
* clear code
* replace and organize code
* add link
* sync master
* fix warnings
* fix warnings
* fix bug in bicubic resize when need resize iamge smaller
* receive review comments and modify
* receive review comments and modify
* put all code into llava dir
* fix quality problem in pr code
* change n_layer
* add space in "-1"
* imitate reshape bug of python code
* fix bug in clip
* fix issues for merging
* fix llama-minicpmv-cli in cmake file
* change pr readme
* fix code review
* remove in line 33 directory in the /cmakelists.txt (not in example, in the main dir
* fix cmakefile
* add warn
* fix KEY_HAS_MINICPMV_PROJ
* remove load_image_size into clip_ctx
* remove the extern "C", MINICPMV_API
* fix uhd code for review comment
* delete minicpmv-wrapper in pr
* remove uhd_image_embed
* Modify 2 notes
* clip : style changes
* del common.h in clip
* fix Type-Check error
* fix Type-Check error
* fix Type-Check error
* fix Type-Check error
* fix makefile error
* fix ubuntu-make error
* try fix clip
* try fix 1
---------
Co-authored-by: Hongji Zhu <fireyoucan@gmail.com>
Co-authored-by: harvestingmoon <leewenyeong@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-09 13:33:53 +03:00
Austin
4730faca61
chore : Fix vulkan related compiler warnings, add help text, improve CLI options ( #8477 )
...
* chore: Fix compiler warnings, add help text, improve CLI options
* Add prototypes for function definitions
* Invert logic of --no-clean option to be more intuitive
* Provide a new help prompt with clear instructions
* chore : Add ignore rule for vulkan shader generator
Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
* Update ggml/src/vulkan-shaders/vulkan-shaders-gen.cpp
Co-authored-by: 0cc4m <picard12@live.de>
* chore : Remove void and apply C++ style empty parameters
* chore : Remove void and apply C++ style empty parameters
---------
Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
Co-authored-by: 0cc4m <picard12@live.de>
2024-07-28 09:52:42 +02:00
Georgi Gerganov
a977c11544
gitignore : deprecated binaries
2024-07-11 11:20:40 +03:00
Xuan Son Nguyen
be20e7f49d
Reorganize documentation pages ( #8325 )
...
* re-organize docs
* add link among docs
* add link to build docs
* fix style
* de-duplicate sections
2024-07-05 18:08:32 +02:00
ditsuke
de14e2ea2b
chore: ignore all __pychache__
2024-07-04 15:39:13 +00:00
ditsuke
b0a46993df
build(python): Package scripts with pip-0517 compliance
2024-07-04 15:39:13 +00:00
Nexesenex
1203a56f6b
Add Cublas12 dlls to .gitignore ( #946 )
2024-06-28 23:41:44 +08:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake ( #8006 )
...
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122 )
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00
Michael de Gans
a7854743c5
un-ignore build-info.cmake and build-info.sh ( #7996 )
...
* un-ignore `build-info.cmake` and `build-info.sh`
I am assuming that ignoring them was unintentional. If they are ignored, some tools, like cargo, will consider the files inexistent, even if they're comitted, for the purpose of publishing. This leads to the build failing in such cases.
* un-ignore `build-info.cpp.in`
For the same reason as the previous two files.
* Reorganize `.gitignore`
* Add exceptions for files mentioned by @slaren
I did leave .clang-tidy since it was explicitly ignored before.
* Add comments for organization
* Sort some lines for pretty
* Test with `make` and `cmake` builds to ensure no build artifacts might be comitted
* Remove `.clang-tidy` from `.gitignore`
Per comment by @ggerganov
* Remove `IDEWorkspaceChecks.plist` from root-level `.gitignore`
2024-06-19 22:10:42 +02:00
Olivier Chafik
1c641e6aac
build: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... (#7809 )
...
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew
* server: update refs -> llama-server
gitignore llama-server
* server: simplify nix package
* main: update refs -> llama
fix examples/main ref
* main/server: fix targets
* update more names
* Update build.yml
* rm accidentally checked in bins
* update straggling refs
* Update .gitignore
* Update server-llm.sh
* main: target name -> llama-cli
* Prefix all example bins w/ llama-
* fix main refs
* rename {main->llama}-cmake-pkg binary
* prefix more cmake targets w/ llama-
* add/fix gbnf-validator subfolder to cmake
* sort cmake example subdirs
* rm bin files
* fix llama-lookup-* Makefile rules
* gitignore /llama-*
* rename Dockerfiles
* rename llama|main -> llama-cli; consistent RPM bin prefixes
* fix some missing -cli suffixes
* rename dockerfile w/ llama-cli
* rename(make): llama-baby-llama
* update dockerfile refs
* more llama-cli(.exe)
* fix test-eval-callback
* rename: llama-cli-cmake-pkg(.exe)
* address gbnf-validator unused fread warning (switched to C++ / ifstream)
* add two missing llama- prefixes
* Updating docs for eval-callback binary to use new `llama-` prefix.
* Updating a few lingering doc references for rename of main to llama-cli
* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.
* Updating documentation references for lookup-merge and export-lora
* Updating two small `main` references missed earlier in the finetune docs.
* Update apps.nix
* update grammar/README.md w/ new llama-* names
* update llama-rpc-server bin name + doc
* Revert "update llama-rpc-server bin name + doc"
This reverts commit e474ef1df481fd8936cd7d098e3065d7de378930.
* add hot topic notice to README.md
* Update README.md
* Update README.md
* rename gguf-split & quantize bins refs in **/tests.sh
---------
Co-authored-by: HanClinto <hanclinto@gmail.com>
2024-06-13 00:41:52 +01:00
zhouwg
b226c1227b
refine .gitignore ( #7688 )
...
This adds tags and android ndk into the git ignore list
2024-06-04 21:21:26 +10:00
Austin
7c4e5b7eae
chore : add ignore rule for generated server themes ( #7689 )
2024-06-02 20:39:08 +03:00
Olivier Chafik
8843a98c2b
Improve usability of --model-url & related flags ( #6930 )
...
* args: default --model to models/ + filename from --model-url or --hf-file (or else legacy models/7B/ggml-model-f16.gguf)
* args: main & server now call gpt_params_handle_model_default
* args: define DEFAULT_MODEL_PATH + update cli docs
* curl: check url of previous download (.json metadata w/ url, etag & lastModified)
* args: fix update to quantize-stats.cpp
* curl: support legacy .etag / .lastModified companion files
* curl: rm legacy .etag file support
* curl: reuse regex across headers callback calls
* curl: unique_ptr to manage lifecycle of curl & outfile
* curl: nit: no need for multiline regex flag
* curl: update failed test (model file collision) + gitignore *.gguf.json
2024-04-30 00:52:50 +01:00
Georgi Gerganov
f4ab2a4147
llama : fix BPE pre-tokenization ( #6920 )
...
* merged the changes from deepseeker models to main branch
* Moved regex patterns to unicode.cpp and updated unicode.h
* Moved header files
* Resolved issues
* added and refactored unicode_regex_split and related functions
* Updated/merged the deepseek coder pr
* Refactored code
* Adding unicode regex mappings
* Adding unicode regex function
* Added needed functionality, testing remains
* Fixed issues
* Fixed issue with gpt2 regex custom preprocessor
* unicode : fix? unicode_wstring_to_utf8
* lint : fix whitespaces
* tests : add tokenizer tests for numbers
* unicode : remove redundant headers
* tests : remove and rename tokenizer test scripts
* tests : add sample usage
* gguf-py : reader prints warnings on duplicate keys
* llama : towards llama3 tokenization support (wip)
* unicode : shot in the dark to fix tests on Windows
* unicode : first try custom implementations
* convert : add "tokenizer.ggml.pre" GGUF KV (wip)
* llama : use new pre-tokenizer type
* convert : fix pre-tokenizer type writing
* lint : fix
* make : add test-tokenizer-0-llama-v3
* wip
* models : add llama v3 vocab file
* llama : adapt punctuation regex + add llama 3 regex
* minor
* unicode : set bomb
* unicode : set bomb
* unicode : always use std::wregex
* unicode : support \p{N}, \p{L} and \p{P} natively
* unicode : try fix windows
* unicode : category support via std::regex
* unicode : clean-up
* unicode : simplify
* convert : add convert-hf-to-gguf-update.py
ggml-ci
* lint : update
* convert : add falcon
ggml-ci
* unicode : normalize signatures
* lint : fix
* lint : fix
* convert : remove unused functions
* convert : add comments
* convert : exercise contractions
ggml-ci
* lint : fix
* cmake : refactor test targets
* tests : refactor vocab tests
ggml-ci
* tests : add more vocabs and tests
ggml-ci
* unicode : cleanup
* scripts : ignore new update script in check-requirements.sh
* models : add phi-3, mpt, gpt-2, starcoder
* tests : disable obsolete
ggml-ci
* tests : use faster bpe test
ggml-ci
* llama : more prominent warning for old BPE models
* tests : disable test-tokenizer-1-bpe due to slowness
ggml-ci
---------
Co-authored-by: Jaggzh <jaggz.h@gmail.com>
Co-authored-by: Kazim Abrar Mahi <kazimabrarmahi135@gmail.com>
2024-04-29 16:58:41 +03:00
Olivier Chafik
5cf5e7d490
build: generate hex dump of server assets during build (#6661 )
...
* `build`: generate hex dumps of server assets on the fly
* build: workaround lack of -n on gnu xxd
* build: don't use xxd in cmake
* build: don't call xxd from build.zig
* build: more idiomatic hexing
* build: don't use xxd in Makefile (od hackery instead)
* build: avoid exceeding max cmd line limit in makefile hex dump
* build: hex dump assets at cmake build time (not config time)
2024-04-21 18:48:53 +01:00
Concedo
9a25d77cc1
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/docker.yml
# Makefile
# README-sycl.md
# README.md
# ci/run.sh
# ggml-cuda.cu
# ggml.c
# grammars/README.md
# scripts/get-wikitext-2.sh
# scripts/hf.sh
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
2024-04-14 21:18:39 +08:00
Pierrick Hymbert
b804b1ef77
eval-callback: Example how to use eval callback for debugging ( #6576 )
...
* gguf-debug: Example how to use ggml callback for debugging
* gguf-debug: no mutex, verify type, fix stride.
* llama: cv eval: move cb eval field in common gpt_params
* ggml_debug: use common gpt_params to pass cb eval.
Fix get tensor SIGV random.
* ggml_debug: ci: add tests
* ggml_debug: EOL in CMakeLists.txt
* ggml_debug: Remove unused param n_batch, no batching here
* ggml_debug: fix trailing spaces
* ggml_debug: fix trailing spaces
* common: fix cb_eval and user data not initialized
* ci: build revert label
* ggml_debug: add main test label
* doc: add a model: add a link to ggml-debug
* ggml-debug: add to make toolchain
* ggml-debug: tests add the main label
* ggml-debug: ci add test curl label
* common: allow the warmup to be disabled in llama_init_from_gpt_params
* ci: add curl test
* ggml-debug: better tensor type support
* gitignore : ggml-debug
* ggml-debug: printing also the sum of each tensor
* ggml-debug: remove block size
* eval-callback: renamed from ggml-debug
* eval-callback: fix make toolchain
---------
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-11 14:51:07 +02:00
Minsoo Cheong
64e7b47c69
examples : add "retrieval" ( #6193 )
...
* add `retrieval` example
* add README
* minor fixes
* cast filepos on print
* remove use of variable sized array
* store similarities in separate vector
* print error on insufficient batch size
* fix error message printing
* assign n_batch value to n_ubatch
* fix param definitions
* define retrieval-only parameters in retrieval.cpp
* fix `--context-file` option to be provided multiple times for multiple files
* use vector for `query_emb`
* add usage description in README
* fix merge conflict
* fix usage printing
* remove seed setting
* fix lint
* increase file read buffer size
* retrieval : minor
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-25 09:38:22 +02:00
Georgi Gerganov
95562175f8
gitignore : gguf-split
2024-03-23 21:35:23 +02:00
Johannes Gäßler
50ccaf5eac
lookup: complement data from context with general text statistics ( #5479 )
...
* lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
2024-03-23 01:24:36 +01:00
Olivier Chafik
5b7b0ac8df
json-schema-to-grammar improvements (+ added to server) ( #5978 )
...
* json: fix arrays (disallow `[,1]`)
* json: support tuple types (`[number, string]`)
* json: support additionalProperties (`{[k: string]: [string,number][]}`)
* json: support required / optional properties
* json: add support for pattern
* json: resolve $ref (and support https schema urls)
* json: fix $ref resolution
* join: support union types (mostly for nullable types I think)
* json: support allOf + nested anyOf
* json: support any (`{}` or `{type: object}`)
* json: fix merge
* json: temp fix for escapes
* json: spaces in output and unrestricted output spaces
* json: add typings
* json:fix typo
* Create ts-type-to-grammar.sh
* json: fix _format_literal (json.dumps already escapes quotes)
* json: merge lit sequences and handle negatives
{"type": "string", "pattern": "^({\"question\": \"[^\"]+\", \"response\": \"[^\"]+\"}\\n)+$"}
* json: handle pattern repetitions
* Update json-schema-to-grammar.mjs
* Create regex-to-grammar.py
* json: extract repeated regexp patterns to subrule
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* json: handle schema from pydantic Optional fields
* Update json-schema-to-grammar.py
* Update json-schema-to-grammar.py
* Update ts-type-to-grammar.sh
* Update ts-type-to-grammar.sh
* json: simplify nullable fields handling
* json: accept duplicate identical rules
* json: revert space to 1 at most
* json: reuse regexp pattern subrules
* json: handle uuid string format
* json: fix literal escapes
* json: add --allow-fetch
* json: simplify range escapes
* json: support negative ranges in patterns
* Delete commit.txt
* json: custom regex parser, adds dot support & JS-portable
* json: rm trailing spaces
* Update json-schema-to-grammar.mjs
* json: updated server & chat `( cd examples/server && ./deps.sh )`
* json: port fixes from mjs to python
* Update ts-type-to-grammar.sh
* json: support prefixItems alongside array items
* json: add date format + fix uuid
* json: add date, time, date-time formats
* json: preserve order of props from TS defs
* json: port schema converter to C++, wire in ./server
* json: nits
* Update json-schema-to-grammar.cpp
* Update json-schema-to-grammar.cpp
* Update json-schema-to-grammar.cpp
* json: fix mjs implementation + align outputs
* Update json-schema-to-grammar.mjs.hpp
* json: test C++, JS & Python versions
* json: nits + regen deps
* json: cleanup test
* json: revert from c++17 to 11
* json: nit fixes
* json: dirty include for test
* json: fix zig build
* json: pass static command to std::system in tests (fixed temp files)
* json: fix top-level $refs
* json: don't use c++20 designated initializers
* nit
* json: basic support for reserved names `{number:{number:{root:number}}}`
* Revamp test cmake to allow args (WORKING_DIRECTORY needed for JSON test)
* json: re-ran server deps.sh
* json: simplify test
* json: support mix of additional props & required/optional
* json: add tests for some expected failures
* json: fix type=const in c++, add failure expectations for non-str const&enum
* json: test (& simplify output of) empty schema
* json: check parsing in test + fix value & string refs
* json: add server tests for OAI JSON response_format
* json: test/fix top-level anyOf
* json: improve grammar parsing failures
* json: test/fix additional props corner cases
* json: fix string patterns (was missing quotes)
* json: ws nit
* json: fix json handling in server when there's no response_format
* json: catch schema conversion errors in server
* json: don't complain about unknown format type in server if unset
* json: cleaner build of test
* json: create examples/json-schema-pydantic-example.py
* json: fix date pattern
* json: move json.hpp & json-schema-to-grammar.{cpp,h} to common
* json: indent 4 spaces
* json: fix naming of top-level c++ function (+ drop unused one)
* json: avoid using namespace std
* json: fix zig build
* Update server.feature
* json: iostream -> fprintf
* json: space before & refs for consistency
* json: nits
2024-03-21 11:50:43 +00:00
Georgi Gerganov
6b7e76d28c
gitignore : ignore curl-related files
2024-03-20 14:17:34 +02:00
Concedo
93d3871056
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# ggml-metal.m
2024-03-15 10:37:48 +08:00
Georgi Gerganov
381da2d9f0
metal : build metallib + fix embed path ( #6015 )
...
* metal : build metallib + fix embed path
ggml-ci
* metal : fix embed build + update library load logic
ggml-ci
* metal : fix embeded library build
ggml-ci
* ci : fix iOS builds to use embedded library
2024-03-14 11:55:23 +02:00
Concedo
ec5dea14d7
merged, try to fix metal build
2024-03-14 11:15:50 +08:00
Concedo
6a32c14e86
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# CMakeLists.txt
# Makefile
# README-sycl.md
# README.md
# flake.lock
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/.gitignore
# tests/test-backend-ops.cpp
2024-03-11 23:00:47 +08:00
DAN™
bcebd7dbf6
llama : add support for GritLM ( #5959 )
...
* add gritlm example
* gritlm results match
* tabs to spaces
* comment out debug printing
* rebase to new embed
* gritlm embeddings are back babeee
* add to gitignore
* allow to toggle embedding mode
* Clean-up GritLM sample code.
* Fix types.
* Flush stdout and output ending newline if streaming.
* mostly style fixes; correct KQ_mask comment
* add causal_attn flag to llama_cparams
* gritml : minor
* llama : minor
---------
Co-authored-by: Douglas Hanley <thesecretaryofwar@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-03-10 17:56:30 +02:00
Concedo
f3a0e05d91
added noavx2 vulkan
2024-02-22 16:56:25 +08:00
clibdev
d250c9d61d
gitignore : update for CLion IDE ( #5544 )
2024-02-17 18:28:37 +02:00
Neo Zhang Jianyu
01684139c3
support SYCL backend windows build ( #5208 )
...
* support SYCL backend windows build
* add windows build in CI
* add for win build CI
* correct install oneMKL
* fix install issue
* fix ci
* fix install cmd
* fix install cmd
* fix install cmd
* fix install cmd
* fix install cmd
* fix win build
* fix win build
* fix win build
* restore other CI part
* restore as base
* rm no new line
* fix no new line issue, add -j
* fix grammer issue
* allow to trigger manually, fix format issue
* fix format
* add newline
* fix format
* fix format
* fix format issuse
---------
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-01-31 08:08:07 +05:30
crasm
413e7b0559
ci : add model tests + script wrapper ( #4586 )
...
* scripts : add lib.sh and lib_test.sh
* scripts : stub out new ci-run.sh script
* scripts : switch to PascalCase for functions
This looks a little odd at first, but I find it very useful as a
convention to know if a command is part of our code vs a builtin.
* scripts : add some fancy conversion from snake_case to PascalCase
* Add venv to ci/run.sh
* Revert scripts work
* scripts : add wrapper script for local use of ci/run.sh
* Simplify .gitignore for tests, clang-tidy fixes
* Label all ctest tests
* ci : ctest uses -L main
* Attempt at writing ctest_with_model
* Update test-model-load-cancel
* ci : add ctest_with_model for debug and release
ggml-ci
* Fix gg_get_model function
ggml-ci
* got stuck on CMake
* Add get_model.cpp to tests/CMakeLists.txt
ggml-ci
* Fix README.md output for ctest_with_model
ggml-ci
* workflows : use `-L main` for all ctest
ggml-ci
* Fixes
* GG_RUN_CTEST_MODELFILE => LLAMACPP_TESTMODELFILE
* Always show warning rather than failing if model file variable is not
set
* scripts : update usage text for ci-run.sh
2024-01-26 14:18:00 +02:00
Concedo
2a4a7241e6
Merge branch 'vulkan_test' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
# llama.cpp
2024-01-25 23:01:44 +08:00
Georgi Gerganov
c918fe8dca
metal : create autorelease pool during library build ( #4970 )
...
* metal : create autorelease pool during library build
ggml-ci
* test : simplify
ggml-ci
2024-01-17 18:38:39 +02:00
Concedo
dc7bc0cb50
Merge commit ' 584d674be6' into concedo_experimental
...
# Conflicts:
# .github/workflows/nix-flake-update.yml
# Makefile
# Package.swift
# ggml-cuda.cu
# tests/test-quantize-fns.cpp
2024-01-14 16:29:44 +08:00
Georgi Gerganov
5537d9d36b
gitignore : imatrix
2024-01-12 14:33:21 +02:00
Concedo
66533c8424
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# Makefile
# Package.swift
# README.md
# tests/test-quantize-fns.cpp
2024-01-09 17:48:18 +08:00
Georgi Gerganov
b0034d93ce
examples : add passkey test ( #3856 )
...
* examples : add passkey test
* passkey : better prints
* passkey : select pass key pos from CLI
* passkey : simplify n_past logic
* make : add passkey target
* passkey : add "self-extend"-like context extension (#4810 )
* llama : "self-extend"-like context extension
* passkey : add comment
* passkey : add readme
2024-01-08 11:14:04 +02:00
Concedo
4a8308b1c8
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# Makefile
2023-12-23 10:40:29 +08:00
LeonEricsson
7082d24cec
lookup : add prompt lookup decoding example ( #4484 )
...
* initial commit, going through initializations
* main loop finished, starting to debug
* BUG: generates gibberish/repeating tokens after a while
* kv_cache management
* Added colors to distinguish drafted tokens (--color). Updated README
* lookup : fix token positions in the draft batch
* lookup : use n_draft from CLI params
* lookup : final touches
---------
Co-authored-by: Leon Ericsson <leon.ericsson@icloud.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-12-22 18:05:56 +02:00
henk717
e2cf3b7aca
koboldcpp.sh - The Mamba Multitool ( #554 )
...
* .sh script V1
* koboldcpp.sh polish
* koboldcpp.sh dist generator
* Include html's in dist
* RWKV in Linux Dist
* Lower dependency requirements
* Eliminate wget dependency
* More distinct binary name
I know its technically amd64, but I don't want to cause confusion among nvidia users.
* Use System OpenCL
Unsure how this will behave in the pyinstaller build, but pocl ended up CPU only. With a bit of luck the pyinstaller uses the one from the actual system if compiled in a system without opencl, while conda now includes it for that specific system.
* Add cblas dependency
Missing this causes compile failures on some system's
* ICD workaround
Ideally we find a better solution, but conda forces ICD and needs this for the successful compile. However, pyinstaller then embeds the ICD causing it to be limited to the system it was compiled for. By temporarily removing the ICD pyinstaller can't find it and everything remains functional. Ideally we do this on a pyinstaller level, but I could not find any good options to do so yet.
---------
Co-authored-by: root <root@DESKTOP-DQ1QRAG>
2023-12-10 21:30:17 +08:00
Concedo
ec21fa7712
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .gitignore
# CMakeLists.txt
# Makefile
# Package.swift
# README.md
# ggml-cuda.cu
# llama.cpp
# llama.h
# scripts/sync-ggml.sh
# tests/CMakeLists.txt
2023-12-08 17:42:26 +08:00
Georgi Gerganov
fe680e3d10
sync : ggml (new ops, tests, backend, etc.) ( #4359 )
...
* sync : ggml (part 1)
* sync : ggml (part 2, CUDA)
* sync : ggml (part 3, Metal)
* ggml : build fixes
ggml-ci
* cuda : restore lost changes
* cuda : restore lost changes (StableLM rope)
* cmake : enable separable compilation for CUDA
ggml-ci
* ggml-cuda : remove device side dequantize
* Revert "cmake : enable separable compilation for CUDA"
This reverts commit 09e35d04b1c4ca67f9685690160b35bc885a89ac.
* cuda : remove assert for rope
* tests : add test-backend-ops
* ggml : fix bug in ggml_concat
* ggml : restore `ggml_get_n_tasks()` logic in `ggml_graph_plan()`
* ci : try to fix macOS
* ggml-backend : remove backend self-registration
* ci : disable Metal for macOS cmake build
ggml-ci
* metal : fix "supports family" call
* metal : fix assert
* metal : print resource path
ggml-ci
---------
Co-authored-by: slaren <slarengh@gmail.com>
2023-12-07 22:26:54 +02:00
Jared Van Bortel
15f5d96037
build : fix build info generation and cleanup Makefile ( #3920 )
...
* cmake : fix joining of REAL_GIT_DIR
* fix includes with help from include-what-you-use
* make : remove unneeded deps and add test-rope target
* fix C includes in C++ source files
* Revert "fix includes with help from include-what-you-use"
This reverts commit 635e9fadfd516d4604a0fecf4a854bfb25ad17ae.
2023-12-01 00:23:08 +02:00
Concedo
8acd7be734
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# Makefile
# README.md
2023-11-27 14:06:14 +08:00
Georgi Gerganov
922754a8d6
lookahead : add example for lookahead decoding ( #4207 )
...
* lookahead : init
* lookahead : generate and store n-grams
* lookahead : use loop instead recursion to generate n-grams
* lookahead : initial working implementation
* lookahead : filter repeating n-grams
* lookahead : use deterministic init
* lookahead : add to Makefile
* lookahead : fix a bug in the seq_id of the lookahead tokens
* lookahead : add comments
---------
Co-authored-by: slaren <slarengh@gmail.com>
2023-11-26 20:33:07 +02:00