Concedo
f9f1585a7f
broken merge - kcpp changes will be applied above this commit for better tracking.
2025-01-03 23:49:17 +08:00
Molly Sophia
4b0c638b9a
common : disable KV cache shifting automatically for unsupported models ( #11053 )
...
* Disable KV cache shifting automatically for unsupported models
instead of exiting directly
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update common/common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
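A minimal sketch of the resulting behaviour, assuming it mirrors the change in common/common.cpp and builds on the llama_kv_cache_can_shift check added earlier ( #10401 ); the wrapper name is hypothetical:
```cpp
#include "common.h"
#include "llama.h"

// Sketch: if the loaded model cannot shift its KV cache, turn context
// shifting off instead of exiting (assumed to mirror common/common.cpp).
static void check_ctx_shift(common_params & params, llama_context * lctx) {
    if (params.ctx_shift && !llama_kv_cache_can_shift(lctx)) {
        LOG_WRN("%s: KV cache shifting is not supported for this model, disabling KV cache shifting\n", __func__);
        params.ctx_shift = false;
    }
}
```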
2025-01-03 14:13:18 +02:00
Georgi Gerganov
f66f582927
llama : refactor src/llama.cpp
( #10902 )
...
* llama : scatter llama.cpp into multiple modules (wip)
* llama : control-vector -> adapter
* llama : arch
* llama : mmap
ggml-ci
* ci : remove BUILD_SHARED_LIBS=OFF
ggml-ci
* llama : arch (cont)
ggml-ci
* llama : chat
ggml-ci
* llama : model
ggml-ci
* llama : hparams
ggml-ci
* llama : adapter
ggml-ci
* examples : fix
ggml-ci
* rebase
ggml-ci
* minor
* llama : kv cache
ggml-ci
* llama : impl
ggml-ci
* llama : batch
ggml-ci
* cont
ggml-ci
* llama : context
ggml-ci
* minor
* llama : context (cont)
ggml-ci
* llama : model loader
ggml-ci
* common : update lora
ggml-ci
* llama : quant
ggml-ci
* llama : quant (cont)
ggml-ci
* minor [no ci]
2025-01-03 10:18:53 +02:00
Concedo
911da8765f
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# examples/llama.android/llama/src/main/cpp/llama-android.cpp
# examples/run/run.cpp
# examples/server/README.md
# examples/server/bench/README.md
# examples/server/tests/README.md
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# tests/test-backend-ops.cpp
2025-01-03 11:56:20 +08:00
Xuan Son Nguyen
45095a61bf
server : clean up built-in template detection ( #11026 )
...
* server : clean up built-in template detection
* fix compilation
* add chat template test
* fix condition
2024-12-31 15:22:01 +01:00
Peter
6e1531aca5
common, examples, ggml : fix MSYS2 GCC compiler errors and warnings when building with LLAMA_CURL=ON and GGML_OPENCL=ON ( #11013 )
...
In common/common.cpp:
* Convert the stat() call used to check whether a file exists to the standard library function std::filesystem::exists (error: unable to match the correct function signature)
* Add conditions to check whether PATH_MAX is already defined in the WIN32 environment (warning: it is already defined in MSYS2)
In examples/run/run.cpp:
* Include the io.h header (error: cannot find function _get_osfhandle)
* Change the OVERLAPPED initialisers to an empty struct (warning about uninitialised members)
* Add an initialiser for hFile (warning: it may be uninitialised)
* Cast the curl_off_t percentage value to long int in the generate_progress_prefix function (warning: curl_off_t is long long int)
In ggml/src/ggml-opencl/ggml-opencl.cpp:
* Initialise certain declared cl_mem variables to nullptr for greater safety (warning: the B_d variable may be used unassigned)
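Two of these fixes, sketched in isolation (assumed context; the real diffs are in common/common.cpp, which includes windows.h on WIN32 for MAX_PATH):
```cpp
#include <filesystem>
#include <string>

// before: stat() with a struct stat out-parameter; after: the standard library
static bool file_exists(const std::string & path) {
    return std::filesystem::exists(path);
}

// MSYS2 already defines PATH_MAX, so only define it when it is missing:
#if defined(_WIN32) && !defined(PATH_MAX)
#   define PATH_MAX MAX_PATH
#endif
```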
2024-12-31 01:46:06 +01:00
Concedo
ee486bad3e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# examples/CMakeLists.txt
# examples/batched/batched.cpp
# examples/gritlm/gritlm.cpp
# examples/llama.android/llama/build.gradle.kts
# examples/main/README.md
# examples/retrieval/retrieval.cpp
# examples/server/CMakeLists.txt
# examples/server/README.md
# ggml/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml.c
# scripts/compare-commits.sh
# scripts/sync-ggml.last
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-sampling.cpp
2024-12-19 11:57:43 +08:00
Georgi Gerganov
0bf2d10c55
tts : add OuteTTS support ( #10784 )
...
* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : be explicit about the pooling type in the tests
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* llama : add OuteTTS support (wip)
* wip
* extract features
* first conv
* group norm
* resnet conv
* resnet
* attn
* pos net
* layer norm
* convnext
* head
* hann window
* fix n_embd + remove llama.cpp hacks
* compute hann window
* fft
* spectrum processing
* clean-up
* tts : receive input text and generate codes
* clip : fix new conv name
* tts : minor fix
* tts : add header + minor fixes
ggml-ci
* tts : add mathematical constant
ggml-ci
* tts : fix sampling + cut initial noise
* tts : fixes
* tts : update default samplers
ggml-ci
* tts : text pre-processing
* tts : outetts-voc -> wavtokenizer-dec
* tts : remove hardcoded constants
ggml-ci
* tts : fix tensor shapes
* llama : refactor wavtokenizer tensors
ggml-ci
* cont
ggml-ci
* cont [no ci]
* llama : update WavTokenizer to non-causal attn
* llama : handle no-vocab detokenization
* tts : add Python example for OuteTTS (wip)
* tts : extend python example to generate spectrogram
ggml-ci
* server : fix rebase artifacts
* tts : enable "return_tokens" in Python example
ggml-ci
* tts : minor fixes
* common : support HF download for vocoder
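One concrete piece of the pipeline above: the Hann window computed before the FFT. A self-contained sketch of the standard periodic form (the PR's exact normalization may differ):
```cpp
#include <cmath>
#include <vector>

// w[i] = 0.5 * (1 - cos(2*pi*i / n)), the periodic Hann window
static std::vector<float> hann_window(int n) {
    std::vector<float> w(n);
    for (int i = 0; i < n; ++i) {
        w[i] = 0.5f * (1.0f - cosf(2.0f * 3.14159265358979f * i / n));
    }
    return w;
}
```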
2024-12-18 19:27:21 +02:00
Georgi Gerganov
152610eda9
server : output embeddings for all tokens when pooling = none ( #10861 )
...
* server : add "tokens" output
ggml-ci
* server : output embeddings for all tokens when pooling = none
ggml-ci
* server : update readme [no ci]
* server : fix spacing [no ci]
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* server : be explicit about the pooling type in the tests
ggml-ci
* server : update /embeddings and /v1/embeddings endpoints
ggml-ci
* server : do not normalize embeddings when there is no pooling
ggml-ci
* server : update readme
ggml-ci
* server : fixes
* tests : update server tests
ggml-ci
* server : update readme [no ci]
* server : remove rebase artifact
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
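The key behavioural point, reduced to a self-contained helper (hypothetical name; the server's real code paths differ): embeddings are L2-normalized only when a pooled vector exists, while pooling = none returns raw per-token embeddings.
```cpp
#include <cmath>
#include <vector>

#include "llama.h"

static void maybe_normalize(std::vector<float> & emb, enum llama_pooling_type pooling) {
    if (pooling == LLAMA_POOLING_TYPE_NONE) {
        return; // one embedding per token, returned unnormalized
    }
    float sum = 0.0f;
    for (float v : emb) {
        sum += v * v;
    }
    const float inv = sum > 0.0f ? 1.0f / sqrtf(sum) : 0.0f;
    for (float & v : emb) {
        v *= inv;
    }
}
```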
2024-12-18 13:01:41 +02:00
Georgi Gerganov
644fd71b44
sampling : refactor + optimize penalties sampler ( #10803 )
...
* sampling : refactor + optimize penalties sampler
ggml-ci
* common : apply ignore_eos as logit bias
ggml-ci
* batched : remove penalties sampler
* params : allow penalty_last_n == -1 to be equal to context size
ggml-ci
* common : by default, move the penalties at the end of the sampling chain
ggml-ci
* common : ignore all EOG tokens
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* common : move back the penalties at the front of the sampling chain
ggml-ci
* readme : restore hint about --ignore-eos flag [no ci]
* llama : minor
ggml-ci
* webui : update
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
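A sketch of the penalty_last_n == -1 convention (field names assumed; the real wiring is in common/common.cpp):
```cpp
#include "common.h"
#include "llama.h"

// Sketch (assumed field names): -1 selects the whole context as the
// repeat-penalty window.
static void resolve_penalty_window(common_params & params, llama_context * lctx) {
    if (params.sampling.penalty_last_n == -1) {
        params.sampling.penalty_last_n = (int32_t) llama_n_ctx(lctx);
    }
}
```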
2024-12-16 12:31:14 +02:00
Concedo
f456ed7237
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/nix/package.nix
# .devops/tools.sh
# .github/workflows/build.yml
# Makefile
# README.md
# common/CMakeLists.txt
# common/common.h
# examples/llava/CMakeLists.txt
# examples/run/CMakeLists.txt
# examples/run/README.md
# examples/run/run.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-kompute/ggml-kompute.cpp
# tests/test-backend-ops.cpp
# tests/test-rope.cpp
2024-12-15 15:30:10 +08:00
Eric Curtin
c27ac678dd
Opt class for positional argument handling ( #10508 )
...
Added support for positional arguments `model` and `prompt`. Added
functionality to download via strings like:
llama-run llama3
llama-run ollama://granite-code
llama-run ollama://granite-code:8b
llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
llama-run https://example.com/some-file1.gguf
llama-run some-file2.gguf
llama-run file://some-file3.gguf
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
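A hedged sketch of the dispatch the examples imply (helper names and return codes hypothetical; the real logic lives in examples/run/run.cpp):
```cpp
#include <string>

static bool starts_with(const std::string & s, const std::string & prefix) {
    return s.rfind(prefix, 0) == 0;
}

// Decide how to fetch the model from the positional argument.
static int resolve_model(const std::string & arg) {
    if (starts_with(arg, "hf://") || starts_with(arg, "huggingface://")) {
        return 0; // download from Hugging Face (hypothetical path)
    }
    if (starts_with(arg, "ollama://")) {
        return 1; // download from an Ollama registry (hypothetical path)
    }
    if (starts_with(arg, "https://")) {
        return 2; // plain HTTP download (hypothetical path)
    }
    return 3;     // local file, with or without a file:// prefix
}
```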
2024-12-13 19:34:25 +01:00
Concedo
ed75f8a741
up-to-date merge, without vulkan-gen-shaders. They will be built before each release from now on, as they are very large
2024-12-13 17:18:01 +08:00
Xuan Son Nguyen
adffa6ffd5
common : improve -ctv -ctk CLI arguments ( #10806 )
...
* common : improve ctv ctk cli argument
* regenerate docs
* even better approach
* use std::vector
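The "use std::vector" step presumably amounts to validating the -ctk/-ctv strings against a fixed table of cache types; a sketch with an assumed, partial table:
```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

#include "ggml.h"

// Partial, assumed mapping of cache-type names to ggml types; the full
// list lives in the argument parser.
static const std::vector<std::pair<std::string, ggml_type>> kv_cache_types = {
    { "f32",  GGML_TYPE_F32  },
    { "f16",  GGML_TYPE_F16  },
    { "q8_0", GGML_TYPE_Q8_0 },
    { "q4_0", GGML_TYPE_Q4_0 },
};

static ggml_type kv_cache_type_from_str(const std::string & s) {
    for (const auto & kv : kv_cache_types) {
        if (kv.first == s) {
            return kv.second;
        }
    }
    throw std::runtime_error("unsupported cache type: " + s);
}
```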
2024-12-12 22:53:05 +01:00
Concedo
557bcaf86e
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .clang-tidy
# .github/workflows/build.yml
# Makefile
# Package.swift
# common/CMakeLists.txt
# examples/batched-bench/CMakeLists.txt
# examples/batched/CMakeLists.txt
# examples/convert-llama2c-to-ggml/CMakeLists.txt
# examples/cvector-generator/CMakeLists.txt
# examples/embedding/CMakeLists.txt
# examples/eval-callback/CMakeLists.txt
# examples/export-lora/CMakeLists.txt
# examples/gbnf-validator/CMakeLists.txt
# examples/gguf-split/CMakeLists.txt
# examples/gguf/CMakeLists.txt
# examples/gritlm/CMakeLists.txt
# examples/imatrix/CMakeLists.txt
# examples/infill/CMakeLists.txt
# examples/llama-bench/CMakeLists.txt
# examples/llava/CMakeLists.txt
# examples/lookahead/CMakeLists.txt
# examples/lookup/CMakeLists.txt
# examples/main-cmake-pkg/CMakeLists.txt
# examples/main/CMakeLists.txt
# examples/parallel/CMakeLists.txt
# examples/passkey/CMakeLists.txt
# examples/perplexity/CMakeLists.txt
# examples/quantize-stats/CMakeLists.txt
# examples/quantize/CMakeLists.txt
# examples/retrieval/CMakeLists.txt
# examples/run/CMakeLists.txt
# examples/save-load-state/CMakeLists.txt
# examples/server/CMakeLists.txt
# examples/simple-chat/CMakeLists.txt
# examples/simple/CMakeLists.txt
# examples/speculative-simple/CMakeLists.txt
# examples/speculative/CMakeLists.txt
# examples/tokenize/CMakeLists.txt
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-backend.cpp
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-vulkan/vulkan-shaders/CMakeLists.txt
# pocs/vdot/CMakeLists.txt
# src/CMakeLists.txt
# src/unicode.cpp
# tests/test-sampling.cpp
2024-11-30 12:24:51 +08:00
Concedo
ec95241e38
temp checkpoint
2024-11-30 11:59:27 +08:00
Diego Devesa
7cc2d2c889
ggml : move AMX to the CPU backend ( #10570 )
...
* ggml : move AMX to the CPU backend
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-29 21:54:58 +01:00
Xuan Son Nguyen
9f912511bc
common : fix duplicated file name with hf_repo and hf_file ( #10550 )
2024-11-27 22:30:52 +01:00
Concedo
ec581b19d8
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/010-bug-compilation.yml
# .github/ISSUE_TEMPLATE/011-bug-results.yml
# .github/ISSUE_TEMPLATE/019-bug-misc.yml
# .github/workflows/build.yml
# .github/workflows/docker.yml
# CMakeLists.txt
# Makefile
# Package.swift
# examples/CMakeLists.txt
# examples/eval-callback/CMakeLists.txt
# examples/llama-bench/llama-bench.cpp
# examples/server/README.md
# examples/server/server.cpp
# examples/simple-chat/simple-chat.cpp
# examples/simple/simple.cpp
# examples/speculative-simple/speculative-simple.cpp
# examples/speculative/speculative.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-amx/CMakeLists.txt
# ggml/src/ggml-blas/CMakeLists.txt
# ggml/src/ggml-cann/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-kompute/CMakeLists.txt
# ggml/src/ggml-kompute/ggml-kompute.cpp
# ggml/src/ggml-metal/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
# ggml/src/ggml-rpc/CMakeLists.txt
# ggml/src/ggml-sycl/CMakeLists.txt
# ggml/src/ggml-vulkan/CMakeLists.txt
# pocs/CMakeLists.txt
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-quantize-fns.cpp
2024-11-26 17:01:20 +08:00
Diego Devesa
10bce0450f
llama : accept a list of devices to use to offload a model ( #10497 )
...
* llama : accept a list of devices to use to offload a model
* accept `--dev none` to completely disable offloading
* fix dev list with dl backends
* rename env parameter to LLAMA_ARG_DEVICE for consistency
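A sketch of parsing a comma-separated --dev value (hypothetical helper; the real parsing is in common/arg.cpp). "none" yields an empty list, which disables offloading entirely:
```cpp
#include <sstream>
#include <string>
#include <vector>

static std::vector<std::string> parse_device_list(const std::string & value) {
    if (value == "none") {
        return {}; // no devices: keep the whole model on the CPU
    }
    std::vector<std::string> devices;
    std::stringstream ss(value);
    for (std::string dev; std::getline(ss, dev, ',');) {
        devices.push_back(dev);
    }
    return devices;
}
```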
2024-11-25 19:30:06 +01:00
Diego Devesa
5931c1f233
ggml : add support for dynamic loading of backends ( #10469 )
...
* ggml : add support for dynamic loading of backends
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
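A minimal sketch, assuming the loader entry point this PR introduces (ggml_backend_load registers a backend shared library at runtime; the path below is illustrative):
```cpp
#include <cstdio>

#include "ggml-backend.h"

int main() {
    // Backends are built as separate shared objects and loaded on demand.
    ggml_backend_reg_t reg = ggml_backend_load("./libggml-cuda.so");
    if (reg == nullptr) {
        fprintf(stderr, "failed to load backend, falling back to CPU\n");
    }
    return 0;
}
```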
2024-11-25 15:13:39 +01:00
Concedo
83350ec314
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/020-enhancement.yml
# .github/ISSUE_TEMPLATE/030-research.yml
# .github/ISSUE_TEMPLATE/040-refactor.yml
# .github/workflows/build.yml
# Makefile
# common/CMakeLists.txt
# examples/CMakeLists.txt
# examples/infill/infill.cpp
# examples/lookahead/lookahead.cpp
# examples/lookup/lookup-stats.cpp
# examples/lookup/lookup.cpp
# examples/parallel/parallel.cpp
# examples/retrieval/retrieval.cpp
# examples/save-load-state/save-load-state.cpp
# examples/speculative/speculative.cpp
# flake.lock
# ggml/src/ggml-cann/CMakeLists.txt
# ggml/src/ggml-cann/aclnn_ops.cpp
# ggml/src/ggml-cann/kernels/CMakeLists.txt
# ggml/src/ggml-cann/kernels/dup.cpp
# ggml/src/ggml-cann/kernels/get_row_f16.cpp
# ggml/src/ggml-cann/kernels/get_row_f32.cpp
# ggml/src/ggml-cann/kernels/get_row_q4_0.cpp
# tests/test-arg-parser.cpp
# tests/test-backend-ops.cpp
2024-11-25 16:26:08 +08:00
Georgi Gerganov
d9d54e498d
speculative : refactor and add a simpler example ( #10362 )
...
* speculative : refactor and add a simpler example
ggml-ci
* speculative : clean-up and add comments and TODOs [no ci]
* speculative : manage context in common_speculative
ggml-ci
* speculative : simplify
ggml-ci
* speculative : simplify (cont)
ggml-ci
* speculative : add --draft-min CLI arg
* speculative : minor fixup
* make : build fixes
* speculative : do not redraft previous drafts
ggml-ci
* speculative : fix the draft sampling
ggml-ci
* speculative : fix compile warning
* common : refactor args
ggml-ci
* common : change defaults [no ci]
* common : final touches
ggml-ci
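The accept/reject core of speculative decoding, reduced to a self-contained toy (the real implementation in common/speculative.* also manages the draft context, batching, and sampling):
```cpp
#include <vector>

// Accept draft tokens only while they agree with what the target model
// would have produced; the first mismatch invalidates the rest.
static int count_accepted(const std::vector<int> & draft, const std::vector<int> & target) {
    int n = 0;
    while (n < (int) draft.size() && n < (int) target.size() && draft[n] == target[n]) {
        n++;
    }
    return n;
}
```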
2024-11-25 09:58:41 +02:00
Concedo
091a432cf6
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/llama-cli-cann.Dockerfile
# .devops/llama-cli-cuda.Dockerfile
# .devops/llama-cli-intel.Dockerfile
# .devops/llama-cli-musa.Dockerfile
# .devops/llama-cli-vulkan.Dockerfile
# .devops/llama-server-cuda.Dockerfile
# .devops/llama-server-intel.Dockerfile
# .devops/llama-server-musa.Dockerfile
# .devops/llama-server-vulkan.Dockerfile
# .gitignore
# CMakeLists.txt
# Makefile
# cmake/llama-config.cmake.in
# docs/backend/SYCL.md
# docs/build.md
# examples/llama-bench/llama-bench.cpp
# flake.lock
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-backend.cpp
# ggml/src/ggml-blas/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-cpu/ggml-cpu.c
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-metal/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
# ggml/src/ggml-sycl/CMakeLists.txt
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
2024-11-21 16:26:24 +08:00
Concedo
282a647689
Merge commit '467576b6cc' into concedo_experimental
...
# Conflicts:
# .gitignore
# Makefile
# README.md
# common/common.h
# docs/build.md
# examples/infill/infill.cpp
# examples/perplexity/perplexity.cpp
# examples/server/README.md
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.sh
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-opt.cpp
# tests/test-quantize-perf.cpp
2024-11-21 16:05:21 +08:00
Georgi Gerganov
8e752a777b
llama : add check for KV cache shifts ( #10401 )
...
ggml-ci
2024-11-19 13:29:26 +02:00
Johannes Gäßler
4e54be0ec6
llama/ex: remove --logdir argument ( #10339 )
2024-11-16 23:00:41 +01:00
Concedo
70aee82552
attempts a backflip, but does he stick the landing?
2024-11-16 17:05:45 +08:00
Diego Devesa
ae8de6d50a
ggml : build backends as libraries ( #10256 )
...
* ggml : build backends as libraries
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2024-11-14 18:04:35 +01:00
Concedo
a244b1ffd2
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# Makefile
# Package.swift
# ci/run.sh
# docs/backend/SYCL.md
# examples/llama-bench/llama-bench.cpp
# examples/server/CMakeLists.txt
# examples/server/README.md
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# grammars/README.md
# scripts/sync-ggml-am.sh
# scripts/sync-ggml.last
# scripts/sync-ggml.sh
# tests/run-json-schema-to-grammar.mjs
# tests/test-backend-ops.cpp
2024-11-09 13:36:47 +08:00
Georgi Gerganov
5c333e0140
metal : add BF16 support ( #8439 )
...
* ggml : add initial BF16 support
ggml-ci
* metal : add mul_mat_id BF16 support
ggml-ci
* metal : check for bfloat support on the Metal device
ggml-ci
* metal : better var names [no ci]
* metal : do not build bfloat kernels when not supported
ggml-ci
* metal : try to fix BF16 support check
ggml-ci
* metal : this should correctly check bfloat support
2024-11-06 19:53:51 +02:00
Concedo
bb13925f39
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CMakePresets.json
# Makefile
# Package.swift
# ci/run.sh
# common/CMakeLists.txt
# examples/CMakeLists.txt
# flake.lock
# ggml/src/CMakeLists.txt
# ggml/src/ggml-backend.cpp
# ggml/src/ggml.c
# pocs/vdot/q8dot.cpp
# pocs/vdot/vdot.cpp
# tests/test-backend-ops.cpp
# tests/test-grad0.cpp
# tests/test-quantize-fns.cpp
# tests/test-quantize-perf.cpp
# tests/test-rope.cpp
2024-11-04 16:54:53 +08:00
Diego Devesa
9f40989351
ggml : move CPU backend to a separate file ( #10144 )
2024-11-03 19:34:08 +01:00
Concedo
a46f8acd03
note: also has support for completion token counts
2024-11-01 00:44:14 +08:00
Georgi Gerganov
8d8ff71536
llama : remove Tail-Free sampling ( #10071 )
...
ggml-ci
2024-10-29 10:42:05 +02:00
wwoodsTM
ff252ea48e
llama : add DRY sampler ( #9702 )
...
* sampling : add DRY sampler (post-refactor)
* DRY: Trying to fix coauthors, removed unneeded line
* DRY: Fixed redundant code
* DRY: Fixed crash issue due to DRY being in chain but uninitialized
---------
Co-authored-by: l3utterfly <gc.pthzfoldr@gmail.com>
Co-authored-by: pi6am <34464159+pi6am@users.noreply.github.com>
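The heart of DRY is an exponential penalty on tokens that would extend a repeated sequence; a sketch of the published formula (parameter names follow the sampler's dry_multiplier / dry_base / dry_allowed_length options):
```cpp
#include <cmath>

// Penalty for a token that would extend a match of length n against
// earlier context; repeats shorter than allowed_length go unpenalized.
static float dry_penalty(int n, float multiplier, float base, int allowed_length) {
    if (n < allowed_length) {
        return 0.0f;
    }
    return multiplier * powf(base, (float) (n - allowed_length));
}
```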
2024-10-25 19:07:34 +03:00
Michael Podvitskiy
d80fb71f8b
llama: string_split fix ( #10022 )
...
* llama: Refactor string_split to use template specialization, fixes parsing strings with spaces
* llama: Add static_assert in the string_split template to ensure the correct template specialization is used for std::string
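A sketch of the described approach: the generic template streams into T and statically rejects std::string, whose dedicated specialization preserves spaces:
```cpp
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

template <typename T>
static std::vector<T> string_split(const std::string & str, char delim) {
    static_assert(!std::is_same<T, std::string>::value,
                  "use the std::string specialization below");
    std::vector<T> values;
    std::istringstream stream(str);
    for (std::string token; std::getline(stream, token, delim);) {
        std::istringstream ts(token);
        T value;
        ts >> value;
        values.push_back(value);
    }
    return values;
}

template <>
std::vector<std::string> string_split<std::string>(const std::string & str, char delim) {
    std::vector<std::string> parts;
    size_t begin = 0;
    for (size_t end; (end = str.find(delim, begin)) != std::string::npos; begin = end + 1) {
        parts.push_back(str.substr(begin, end - begin));
    }
    parts.push_back(str.substr(begin));
    return parts;
}
```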
2024-10-25 17:57:54 +02:00
Concedo
94a5a27b85
Alone in the darkness
...
They're coming for you
I know they will try to catch me too
Alone in the darkness
They're calling for you
There's nowhere to run for cover
2024-10-24 22:29:20 +08:00
Georgi Gerganov
f594bc80ba
ggml : add asserts for type conversion in fattn kernels ( #9971 )
...
ggml-ci
2024-10-21 16:20:46 +03:00
Xuan Son Nguyen
cda0e4b648
llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch ( #9745 )
...
* refactor llama_batch_get_one
* adapt all examples
* fix simple.cpp
* fix llama_bench
* fix
* fix context shifting
* free batch before return
* use common_batch_add, reuse llama_batch in loop
* null terminated seq_id list
* fix save-load-state example
* fix perplexity
* correct token pos in llama_batch_allocr
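The pattern the refactor moves callers to, sketched (common_batch_clear/common_batch_add are the helpers from common.h; the wrapper name is hypothetical):
```cpp
#include <vector>

#include "common.h"

// Feed a token sequence with explicit positions and sequence ids,
// requesting logits only for the last token.
static void fill_batch(llama_batch & batch, const std::vector<llama_token> & tokens) {
    common_batch_clear(batch);
    for (size_t i = 0; i < tokens.size(); ++i) {
        common_batch_add(batch, tokens[i], (llama_pos) i, { 0 }, i == tokens.size() - 1);
    }
}
```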
2024-10-18 23:18:01 +02:00
Concedo
a9dbcdd3ec
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# README.md
# docs/build.md
# examples/infill/infill.cpp
# examples/main/README.md
# examples/server/README.md
# flake.lock
# scripts/sync-ggml.last
# src/llama.cpp
# tests/test-json-schema-to-grammar.cpp
# tests/test-sampling.cpp
2024-10-17 16:36:02 +08:00
MaggotHATE
fbc98b748e
sampling : add XTC sampler ( #9742 )
...
* Initial XTC commit
Adds the XTC sampler. It is not activated by default, but its default settings are the recommended ones.
* Cleanup
* Simplified chances calculation
To be more in line with the original implementation, the chance is calculated once at the beginning.
* First fixes by comments
Still need to look into sorting
* Fixed trailing backspaces
* Fixed RNG to be reproducible
Thanks to @slaren for directions
* Fixed forgotten header
* Moved `min_keep`
Moved from conditions to a simple check at the end.
* Fixed broken randomization
Thanks to @slaren for explanation
* Swapped sorting for a custom algorithm
Shifts tokens to remove the penalized ones, then puts the penalized at the back. Should make `min_keep` still viable.
* Algorithm rework
1. Scan token from top till the first non-penalizable
2. Remove the last captured token (the least probable above threshold)
3. Shift all tokens to override the remaining penalizable
4. Penalize them and put them at the bottom.
* Added XTC to `test-sampling`
* Simplified algorithm and more tests
* Updated info in common and args
* Merged back lost commits in common and arg
* Update dump info in common
* Fixed incorrect min_keep check
* Added XTC to README
* Renamed parameters, fixed info and defaults
* probability is 0 by default, but XTC is included in the sampling queue
* threshold higher than 0.5 switches XTC off
* Initial server support
* Added XTC to server UIs
* Fixed labels in old server UI
* Made algorithm safer and more readable
* Removed xtc_threshold_max
* Fixed arg after update
* Quick fixes by comments
* Simplified algorithm since threshold_max is removed
* Renamed random distribution
* Fixed tests and outdated README
* Small fixes
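The final algorithm, reduced to a self-contained toy over a probability list sorted in descending order (the real sampler works on llama_token_data_array and rolls xtc_probability first):
```cpp
#include <vector>

// Remove every token above the threshold except the least probable one.
// `fire` is the result of the xtc_probability roll; a threshold above 0.5
// can never capture two tokens at once, which effectively disables XTC.
static std::vector<float> xtc_cut(const std::vector<float> & probs, float threshold, bool fire) {
    if (!fire || threshold > 0.5f) {
        return probs;
    }
    int last = -1; // index of the last (least probable) token above the threshold
    for (int i = 0; i < (int) probs.size(); ++i) {
        if (probs[i] >= threshold) {
            last = i;
        }
    }
    if (last < 1) {
        return probs; // fewer than two tokens above the threshold: nothing to cut
    }
    return std::vector<float>(probs.begin() + last, probs.end());
}
```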
2024-10-15 12:54:55 +02:00
Georgi Gerganov
11ac9800af
llama : improve infill support and special token detection ( #9798 )
...
* llama : improve infill support
ggml-ci
* llama : add more FIM token strings
ggml-ci
* server : update prompt on slot restore ( #9800 )
* gguf : deprecate old FIM token KVs
2024-10-12 08:21:51 +03:00
Concedo
e692a79aab
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/docker.yml
# CMakeLists.txt
# CONTRIBUTING.md
# docs/android.md
# docs/docker.md
# examples/embedding/embedding.cpp
# examples/imatrix/imatrix.cpp
# examples/infill/infill.cpp
# examples/llama-bench/llama-bench.cpp
# examples/main/README.md
# examples/parallel/parallel.cpp
# examples/perplexity/perplexity.cpp
# examples/quantize-stats/quantize-stats.cpp
# examples/save-load-state/save-load-state.cpp
# examples/server/README.md
# examples/simple/CMakeLists.txt
# examples/speculative/speculative.cpp
# flake.lock
# ggml/src/CMakeLists.txt
# ggml/src/ggml-blas.cpp
# pocs/vdot/q8dot.cpp
# pocs/vdot/vdot.cpp
# scripts/debug-test.sh
# scripts/sync-ggml.last
# src/llama.cpp
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-quantize-fns.cpp
# tests/test-quantize-perf.cpp
# tests/test-tokenizer-0.cpp
# tests/test-tokenizer-1-bpe.cpp
# tests/test-tokenizer-1-spm.cpp
2024-10-11 11:59:59 +08:00
Diego Devesa
7eee341bee
common : use common_ prefix for common library functions ( #9805 )
...
* common : use common_ prefix for common library functions
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-10-10 22:57:42 +02:00
Concedo
da6cf261a8
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/close-issue.yml
# .github/workflows/nix-ci-aarch64.yml
# .github/workflows/nix-ci.yml
# README.md
# ci/run.sh
# examples/server/README.md
# ggml/src/ggml-cuda.cu
# ggml/src/ggml-metal.m
# scripts/sync-ggml.last
# tests/test-backend-ops.cpp
2024-10-05 22:24:08 +08:00
Georgi Gerganov
8c475b97b8
rerank : use [SEP] token instead of [BOS] ( #9737 )
...
* rerank : use [SEP] token instead of [BOS]
ggml-ci
* common : sanity check for non-NULL tokens
ggml-ci
* ci : adjust rank score interval
ggml-ci
* ci : add shebang to run.sh
ggml-ci
2024-10-05 15:55:04 +03:00
Concedo
ce7f9c9a2c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-rocm.Dockerfile
# .devops/llama-cli-rocm.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .github/workflows/build.yml
# .github/workflows/python-type-check.yml
# CMakeLists.txt
# CONTRIBUTING.md
# README.md
# ci/run.sh
# examples/embedding/embedding.cpp
# examples/server/README.md
# flake.lock
# ggml/include/ggml.h
# ggml/src/ggml.c
# requirements/requirements-convert_legacy_llama.txt
# scripts/sync-ggml.last
# src/llama-vocab.cpp
# src/llama.cpp
# tests/test-backend-ops.cpp
# tests/test-grad0.cpp
# tests/test-tokenizer-0.cpp
2024-10-02 01:00:57 +08:00
matiaslin
faac0bae26
common : ensure llama_batch size does not exceed max size ( #9668 )
...
A crash was observed when the number of tokens added to a batch exceeded
the llama_batch size. An assertion was added in llama_batch_add to protect
against llama_batch size overflow.
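A sketch of the guard (capacity bookkeeping assumed; the actual assertion sits inside llama_batch_add in common/common.cpp, and the wrapper name here is hypothetical):
```cpp
#include <cassert>

#include "llama.h"

// Refuse to write past the capacity the batch was allocated with.
static void batch_add_checked(llama_batch & batch, int32_t capacity, llama_token id) {
    assert(batch.n_tokens < capacity && "llama_batch size exceeded");
    batch.token[batch.n_tokens] = id;
    batch.n_tokens++;
}
```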
2024-09-29 15:25:00 +03:00
Georgi Gerganov
f4d2b8846a
llama : add reranking support ( #9510 )
...
* py : add XLMRobertaForSequenceClassification [no ci]
* py : fix scalar-tensor conversion [no ci]
* py : fix position embeddings chop [no ci]
* llama : read new cls tensors [no ci]
* llama : add classification head (wip) [no ci]
* llama : add "rank" pooling type
ggml-ci
* server : add rerank endpoint
ggml-ci
* llama : avoid ggml_repeat during classification
* rerank : cleanup + comments
* server : accept /rerank endpoint in addition to /v1/rerank [no ci]
* embedding : parse special tokens
* jina : support v1 reranker
* vocab : minor style
ggml-ci
* server : initiate tests for later
ggml-ci
* server : add docs
* llama : add comment [no ci]
* llama : fix uninitialized tensors
* ci : add rerank tests
ggml-ci
* add reranking test
* change test data
* Update examples/server/server.cpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* add `--reranking` argument
* update server docs
* llama : fix comment [no ci]
ggml-ci
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
2024-09-28 17:42:03 +03:00