Concedo
8a71eb03c0
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# ggml/cmake/ggml-config.cmake.in
# ggml/src/ggml-cann/CMakeLists.txt
# ggml/src/ggml-cann/common.h
# ggml/src/ggml-cann/ggml-cann.cpp
# ggml/src/ggml-cuda/fattn.cu
# ggml/src/ggml-opencl/CMakeLists.txt
# ggml/src/ggml-opencl/ggml-opencl.cpp
# requirements/requirements-convert_hf_to_gguf.txt
# scripts/compare-llama-bench.py
# tests/test-chat-template.cpp
# tests/test-chat.cpp
# tools/llama-bench/llama-bench.cpp
2025-08-07 21:23:09 +08:00
Sachin Desai
3db4da56a5
chat : support Granite model reasoning and tool call (#14864)
2025-08-06 20:27:30 +02:00
Sam
ef0144c087
model: support GLM 4.5 family of models (#14939)
...
* model: Add GLM 4.5 (#14921)
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Merge in PR suggestions
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* model: Add GLM 4.5 family of models (#14921)
1. Updated tensor_mapping.py with NextN tensor mappings
- Added proper tensor mappings for all NextN/MTP tensors in /Users/samm/git/llama.cpp/gguf-py/gguf/tensor_mapping.py
- Added mappings for: eh_proj, embed_tokens, enorm, hnorm, shared_head.head, shared_head.norm
2. Added num_nextn_predict_layers configuration
- Added LLM_KV_NUM_NEXTN_PREDICT_LAYERS constant to llama-arch.h and llama-arch.cpp
- Added num_nextn_predict_layers field to llama_hparams struct
- Updated GLM4_MOE parameter loading in llama-model.cpp to read this parameter
- Modified tensor loading logic to conditionally load NextN tensors based on num_nextn_predict_layers
- Added GGUF writer support in gguf_writer.py with add_num_nextn_predict_layers() method
- Updated conversion script to extract and write this parameter from HuggingFace config
3. Added FIM tokens for GLM4_MOE
- Added GLM-4.5's FIM tokens to llama-vocab.cpp:
- <|code_prefix|> for FIM_PRE
- <|code_suffix|> for FIM_SUF
- <|code_middle|> for FIM_MID
4. Removed manual NextN tensor handling
- Removed the special-case handling in convert_hf_to_gguf.py that manually mapped NextN tensors
- NextN tensors are now handled automatically through the proper tensor mapping system
* glm 4.5 update tensors names
* model: glm 4.5 apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* model: glm 4.5 apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* model: glm 4.5 apply suggestions from code review
* Apply suggestions from code review
* patch broken chat template
* typings fix
* add TENSOR_SKIP flag
Co-authored-by: Diego Devesa <slarengh@gmail.com>
* Update src/llama-model-loader.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-08-04 20:29:25 +02:00
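The FIM token mapping listed in the GLM 4.5 commit above (`<|code_prefix|>` for FIM_PRE, `<|code_suffix|>` for FIM_SUF, `<|code_middle|>` for FIM_MID) can be illustrated with a minimal sketch. The prefix-suffix-middle ordering and the helper name `build_fim_prompt` are assumptions made for illustration; the commit message only states which token fills which role, not how a prompt is assembled.

```python
# Token strings taken from the commit message above.
FIM_PRE = "<|code_prefix|>"
FIM_SUF = "<|code_suffix|>"
FIM_MID = "<|code_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Hypothetical helper: assemble a fill-in-the-middle prompt.

    Assumes the common prefix-suffix-middle layout, where the model is
    expected to generate the missing middle after the FIM_MID token.
    The actual layout used for GLM 4.5 may differ.
    """
    return f"{FIM_PRE}{prefix}{FIM_SUF}{suffix}{FIM_MID}"


if __name__ == "__main__":
    print(build_fim_prompt("def add(a, b):\n    ", "\n\nprint(add(1, 2))"))
```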
Xuan-Son Nguyen
00fa15fedc
mtmd : add support for Voxtral (#14862)
...
* mtmd : add support for Voxtral
* clean up
* fix python requirements
* add [BEGIN_AUDIO] token
* also support Devstral conversion
* add docs and tests
* fix regression for ultravox
* minor coding style improvement
* correct project activation fn
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-07-28 15:01:48 +02:00
Concedo
cbe9fc87c5
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# src/llama-vocab.cpp
2025-07-16 12:03:54 +08:00
Gabriel Larson
4a4f426944
model : add Kimi-K2 support (#14654)
...
* Kimi-K2 conversion
* add Kimi_K2 pre type
* Kimi-K2
* Kimi-K2 unicode
* Kimi-K2
* LLAMA_MAX_EXPERTS 384
* fix vocab iteration
* regex space fix
* add kimi-k2 to pre_computed_hashes
* Updated with kimi-k2 get_vocab_base_pre hash
* fix whitespaces
* fix flake errors
* remove more unicode.cpp whitespaces
* change set_vocab() flow
* add moonshotai-Kimi-K2.jinja to /models/templates/
* update moonshotai-Kimi-K2.jinja
* add kimi-k2 chat template
* add kimi-k2
* update NotImplementedError
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* except Exception
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* LLM_CHAT_TEMPLATE_KIMI_K2 if(add_ass){}
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-07-15 21:54:22 +02:00
Concedo
4db8ba6228
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# ggml/src/ggml-sycl/gemm.hpp
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/set_rows.cpp
2025-07-14 23:16:44 +08:00
Molly Sophia
0d9226763c
llama : add jinja template for rwkv-world (#14665)
...
* llama : add jinja template for rwkv-world
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-07-14 07:43:43 +08:00
Bartowski
901e20bbe5
jinja : Add Mistral-Small-3.2-24B-Instruct-2506.jinja (#14349)
...
This will allow the use of tools on the llama-server
2025-06-24 09:17:58 +03:00
Concedo
b08dca65ed
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# common/CMakeLists.txt
# common/arg.cpp
# common/chat.cpp
# examples/parallel/README.md
# examples/parallel/parallel.cpp
# ggml/cmake/common.cmake
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# ggml/src/ggml-sycl/ggml-sycl.cpp
# ggml/src/ggml-sycl/rope.cpp
# models/ggml-vocab-bert-bge.gguf.inp
# models/ggml-vocab-bert-bge.gguf.out
# models/ggml-vocab-command-r.gguf.inp
# models/ggml-vocab-command-r.gguf.out
# models/ggml-vocab-deepseek-coder.gguf.inp
# models/ggml-vocab-deepseek-coder.gguf.out
# models/ggml-vocab-deepseek-llm.gguf.inp
# models/ggml-vocab-deepseek-llm.gguf.out
# models/ggml-vocab-falcon.gguf.inp
# models/ggml-vocab-falcon.gguf.out
# models/ggml-vocab-gpt-2.gguf.inp
# models/ggml-vocab-gpt-2.gguf.out
# models/ggml-vocab-llama-bpe.gguf.inp
# models/ggml-vocab-llama-bpe.gguf.out
# models/ggml-vocab-llama-spm.gguf.inp
# models/ggml-vocab-llama-spm.gguf.out
# models/ggml-vocab-mpt.gguf.inp
# models/ggml-vocab-mpt.gguf.out
# models/ggml-vocab-phi-3.gguf.inp
# models/ggml-vocab-phi-3.gguf.out
# models/ggml-vocab-qwen2.gguf.inp
# models/ggml-vocab-qwen2.gguf.out
# models/ggml-vocab-refact.gguf.inp
# models/ggml-vocab-refact.gguf.out
# models/ggml-vocab-starcoder.gguf.inp
# models/ggml-vocab-starcoder.gguf.out
# requirements/requirements-gguf_editor_gui.txt
# tests/CMakeLists.txt
# tests/test-chat.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
# tools/mtmd/CMakeLists.txt
# tools/run/run.cpp
# tools/server/CMakeLists.txt
2025-05-31 13:04:21 +08:00
Xuan-Son Nguyen
07e4351ce6
convert : allow partial update to the chkhsh pre-tokenizer list (#13847)
...
* convert : allow partial update to the chkhsh pre-tokenizer list
* code style
* update tokenizer out
* rm inp/out files for models not having gguf
* fixed hash for glm
* skip nomic-bert-moe test
* Update convert_hf_to_gguf_update.py
* fix minerva-7b hash
* rm redundant import
2025-05-30 12:24:37 +02:00
Concedo
868cb6aff7
Merge commit 'e121edc432' into concedo_experimental
...
# Conflicts:
# .github/workflows/release.yml
# common/CMakeLists.txt
# docs/function-calling.md
# ggml/src/ggml-sycl/binbcast.cpp
# models/templates/README.md
# scripts/tool_bench.py
# src/llama-kv-cache.cpp
# tests/CMakeLists.txt
# tests/test-chat.cpp
# tools/mtmd/clip.h
# tools/rpc/rpc-server.cpp
# tools/server/README.md
2025-05-28 00:20:45 +08:00
Olivier Chafik
e121edc432
`server`: add `--reasoning-budget 0` to disable thinking (incl. qwen3 w/ enable_thinking:false) (#13771)
...
---------
Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-05-26 00:30:51 +01:00
Sigbjørn Skjæret
aa50ba462f
tests : improve UGM tokenizer test coverage (#13773)
2025-05-25 16:22:29 +02:00
Olivier Chafik
f5cd27b71d
`server`: streaming of tool calls and thoughts when `--jinja` is on (#12379)
...
* add common_json w/ support for truncated json healing
* add common_chat_msg_diff
* partial common_chat_parse
* refactor parser w/ optionals
* server: wire chat diffs in stream mode
* fix trigger of thinking models (must happen after thoughts are closed)
* fix functionary v3.2 raw python!
* rename: common_chat_syntax (now contains format)
* rm common_regex.at_start
* don't return empty <think></think>
* accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)
* fix QwQ 32B tool call parsing after thoughts (hermes2)
* better logs for grammar triggers
* consume spaces after parse_json_tool_calls
* fix required tool calls w/ thinking models that have pre-opened thinking tags
* fix thinking model's initial trigger + test qwq's template
* run most test_tool_call tests in stream + non-stream modes
* make functionary v3.2 parsing more strict (differentiate first match from others)
* send final diff from server, to close off raw python arguments
* support partial content streaming in Generic mode
* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)
* Update function-calling.md
* Update tool_bench.py
* chat-parser: remove input from exception (llm output may contain PII)
---------
Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>
2025-05-25 01:48:08 +01:00
Concedo
6b6597ebf1
allow for single token prompt processing (actual batch size 1)
2025-04-25 16:54:46 +08:00
Concedo
28a2723100
merged pixtral support, not fully working
2025-04-24 15:27:02 +08:00
Xuan-Son Nguyen
ecda2ec4b3
mtmd : Support Pixtral 12B (#13065)
...
* add pixtral text model (vision is wip)
* cgraph ok, just missing 2D RoPE
* fix bad rebase
* first working version
* fix problem with img_break token
* support dynamic image size
* update docs
* update test script
2025-04-23 20:21:59 +02:00
Concedo
4b0f63ed62
cleanup
2025-04-18 22:57:10 +08:00
Concedo
ebf924c5d1
Merge branch 'upstream' into concedo_experimental
2025-04-08 21:46:30 +08:00
Xuan-Son Nguyen
1466621e73
llama : Support llama 4 text-only (#12791)
...
* llama4 conversion
* initial support, no chat template
* clean up a bit
* fix tokenizer conversion
* correct hparams
* try this
* fix shexp
* ffn_inp_normed
* chat template
* clean up model conversion
* add_bos
* add scale_before_ffn
* fix order
* weight_before_ffn
* llm_graph_input_attn_temp
* add chunk attn mask
* build_inp_attn_scale()
* add comment about ggml_repeat
* clarify comments
* fix build
2025-04-07 23:06:44 +02:00
Concedo
ec43d2b147
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# README.md
# common/common.cpp
# examples/embedding/embedding.cpp
# examples/json_schema_to_grammar.py
# examples/llama.android/llama/src/main/cpp/llama-android.cpp
# examples/llama.swiftui/README.md
# examples/llama.swiftui/llama.swiftui.xcodeproj/project.pbxproj
# examples/lookahead/lookahead.cpp
# examples/parallel/parallel.cpp
# examples/passkey/passkey.cpp
# ggml/CMakeLists.txt
# ggml/src/CMakeLists.txt
# ggml/src/ggml-cpu/CMakeLists.txt
# requirements.txt
# requirements/requirements-all.txt
# scripts/fetch_server_test_models.py
# tests/test-chat.cpp
# tests/test-json-schema-to-grammar.cpp
2025-03-06 18:54:58 +08:00
Olivier Chafik
669912d9a5
`tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
...
* sampler: turn lazy grammar trigger words to regexes
* add scripts/tool_bench.sh & .py
* constrain llama json output regardless of function name if matches at beginning
* update relaxed newline space rule in grammar tests
* support add_generation_prompt query parameter (useful for /apply_template)
* Update src/llama-grammar.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-05 13:05:13 +00:00
Xuan-Son Nguyen
c43a3e7996
llama : add Phi-4-mini support (supersede #12099) (#12108)
...
* Added Phi-4-mini-instruct support
* Update regex per ngxson
* Change the vocab base to Xenova/gpt-4o
* fix conversion update script
* no need to check longrope
* minor style fix
* fix python style
---------
Co-authored-by: Nicholas Sparks <nisparks@microsoft.com>
2025-02-28 12:44:11 +01:00
Concedo
754fef5204
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/cuda.Dockerfile
# .devops/musa.Dockerfile
# .github/workflows/build.yml
# README.md
# docs/docker.md
# examples/imatrix/imatrix.cpp
# examples/llama-bench/llama-bench.cpp
# examples/main/README.md
# examples/perplexity/perplexity.cpp
# examples/server/README.md
# ggml/src/ggml-cpu/ggml-cpu.c
# ggml/src/ggml-cuda/CMakeLists.txt
# models/templates/deepseek-ai-DeepSeek-R1-Distill-Llama-8B.jinja
# models/templates/deepseek-ai-DeepSeek-R1-Distill-Qwen-32B.jinja
# scripts/get_chat_template.py
# scripts/sync-ggml.last
# tests/test-chat.cpp
# tests/test-gguf.cpp
# tests/test-sampling.cpp
2025-02-15 00:49:46 +08:00
Olivier Chafik
c7f460ab88
`server`: fix tool-call of DeepSeek R1 Qwen, return reasoning_content (Command R7B & DeepSeek R1) unless `--reasoning-format none` (#11607)
...
* extract & return thoughts in reasoning_content field (unless --reasoning-format) for DeepSeek R1 & Command R7B
* tool-calls: add deepseek r1 template (models/templates/llama-cpp-deepseek-r1.jinja) + hackommodate broken official template
* tool-calls: accommodate variety of wrong tool call opening tags both R1 Qwen 32B and 7B distills like to spit out
* server/oai: ensure content is null when there are tool calls, and reasoning_content appears before content for readability
* tool-calls: add DeepSeek R1 Qwen distills to server/README.md & server tests
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-13 10:05:16 +00:00
Concedo
db6db9dff9
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/workflows/build.yml
# .github/workflows/close-issue.yml
# .github/workflows/server.yml
# AUTHORS
# CMakeLists.txt
# Makefile
# README.md
# cmake/llama.pc.in
# common/CMakeLists.txt
# docs/build.md
# examples/batched.swift/Sources/main.swift
# examples/llama.swiftui/llama.cpp.swift/LibLlama.swift
# examples/llava/CMakeLists.txt
# examples/llava/clip.h
# examples/run/run.cpp
# examples/server/README.md
# ggml/CMakeLists.txt
# ggml/src/ggml-cuda/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# ggml/src/ggml-musa/CMakeLists.txt
# scripts/sync-ggml.last
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-grammar-integration.cpp
# tests/test-json-schema-to-grammar.cpp
2025-02-07 00:52:31 +08:00
Olivier Chafik
bfcce4d693
`tool-call`: support Command R7B (+ return tool_plan "thoughts" in API) (#11585)
...
* `tool-call`: support Command R7B (w/ tool_plan return)
* `tool-call`: cleaner preservation of tokens + warn when likely bad chat template override
* `tool-call`: test cleanup / handle lazy grammar triggers
2025-02-02 09:25:38 +00:00
Concedo
f13498df13
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/tools.sh
# .devops/vulkan.Dockerfile
# .github/workflows/build.yml
# .github/workflows/docker.yml
# .github/workflows/server.yml
# Makefile
# README.md
# cmake/llama-config.cmake.in
# common/CMakeLists.txt
# examples/gbnf-validator/gbnf-validator.cpp
# examples/run/run.cpp
# examples/server/README.md
# examples/server/tests/README.md
# ggml/src/CMakeLists.txt
# ggml/src/ggml-hip/CMakeLists.txt
# scripts/sync-ggml.last
# tests/CMakeLists.txt
# tests/test-backend-ops.cpp
# tests/test-chat-template.cpp
# tests/test-grammar-integration.cpp
2025-02-01 17:14:59 +08:00
Olivier Chafik
8b576b6c55
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
...
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-30 19:13:58 +00:00
Xuan Son Nguyen
ec7f3ac9ab
llama : add support for Deepseek-R1-Qwen distill model (#11310)
...
* llama : add support for Deepseek-R1-Qwen distill model
* coding style
2025-01-20 14:35:07 +01:00
Sukriti Sharma
784a14aa49
convert : add support for Roberta embeddings (#10695)
2024-12-07 09:02:14 +02:00
Concedo
afc575fbd8
cleanup, try to add version tagging
2024-11-23 12:59:06 +08:00
Concedo
ce7f9c9a2c
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .devops/full-rocm.Dockerfile
# .devops/llama-cli-rocm.Dockerfile
# .devops/llama-server-rocm.Dockerfile
# .github/workflows/build.yml
# .github/workflows/python-type-check.yml
# CMakeLists.txt
# CONTRIBUTING.md
# README.md
# ci/run.sh
# examples/embedding/embedding.cpp
# examples/server/README.md
# flake.lock
# ggml/include/ggml.h
# ggml/src/ggml.c
# requirements/requirements-convert_legacy_llama.txt
# scripts/sync-ggml.last
# src/llama-vocab.cpp
# src/llama.cpp
# tests/test-backend-ops.cpp
# tests/test-grad0.cpp
# tests/test-tokenizer-0.cpp
2024-10-02 01:00:57 +08:00
nopperl
9a913110cf
llama : add support for Chameleon (#8543)
...
* convert chameleon hf to gguf
* add chameleon tokenizer tests
* fix lint
* implement chameleon graph
* add swin norm param
* return qk norm weights and biases to original format
* implement swin norm
* suppress image token output
* rem tabs
* add comment to conversion
* fix ci
* check for k norm separately
* adapt to new lora implementation
* fix layer input for swin norm
* move swin_norm in gguf writer
* add comment regarding special token regex in chameleon pre-tokenizer
* Update src/llama.cpp
Co-authored-by: compilade <git@compilade.net>
* fix punctuation regex in chameleon pre-tokenizer (@compilade)
Co-authored-by: compilade <git@compilade.net>
* fix lint
* trigger ci
---------
Co-authored-by: compilade <git@compilade.net>
2024-09-28 15:08:43 +03:00
Georgi Gerganov
e093dd2382
tests : re-enable tokenizer tests (#8611)
...
* models : remove duplicated gpt-2 vocab
* models : remove old stablelm vocab
* tests : re-enable MPT tokenizer tests
* tests : re-enable DeepSeek tokenizer tests
* cmake : sort
ggml-ci
2024-07-22 13:32:49 +03:00
Concedo
8e5fd6f509
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .gitignore
# README.md
# docs/backend/BLIS.md
# docs/backend/SYCL.md
# docs/development/llama-star/idea-arch.key
# docs/development/llama-star/idea-arch.pdf
# docs/development/token_generation_performance_tips.md
# src/llama.cpp
# tests/test-tokenizer-0.cpp
# tests/test-tokenizer-1-bpe.cpp
# tests/test-tokenizer-1-spm.cpp
# tests/test-tokenizer-random.py
2024-07-06 19:39:24 +08:00
Concedo
5b605d03ea
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .github/ISSUE_TEMPLATE/config.yml
# .gitignore
# CMakeLists.txt
# CONTRIBUTING.md
# Makefile
# README.md
# ci/run.sh
# common/common.h
# examples/main-cmake-pkg/CMakeLists.txt
# ggml/src/CMakeLists.txt
# models/ggml-vocab-bert-bge.gguf.inp
# models/ggml-vocab-bert-bge.gguf.out
# models/ggml-vocab-deepseek-coder.gguf.inp
# models/ggml-vocab-deepseek-coder.gguf.out
# models/ggml-vocab-deepseek-llm.gguf.inp
# models/ggml-vocab-deepseek-llm.gguf.out
# models/ggml-vocab-falcon.gguf.inp
# models/ggml-vocab-falcon.gguf.out
# models/ggml-vocab-gpt-2.gguf.inp
# models/ggml-vocab-gpt-2.gguf.out
# models/ggml-vocab-llama-bpe.gguf.inp
# models/ggml-vocab-llama-bpe.gguf.out
# models/ggml-vocab-llama-spm.gguf.inp
# models/ggml-vocab-llama-spm.gguf.out
# models/ggml-vocab-mpt.gguf.inp
# models/ggml-vocab-mpt.gguf.out
# models/ggml-vocab-phi-3.gguf.inp
# models/ggml-vocab-phi-3.gguf.out
# models/ggml-vocab-starcoder.gguf.inp
# models/ggml-vocab-starcoder.gguf.out
# requirements.txt
# requirements/requirements-convert_legacy_llama.txt
# scripts/check-requirements.sh
# scripts/pod-llama.sh
# src/CMakeLists.txt
# src/llama.cpp
# tests/test-rope.cpp
2024-07-06 00:25:10 +08:00
fairydreaming
807b0c49ff
Inference support for T5 and FLAN-T5 model families (#5763)
...
* llama : add inference support and model types for T5 and FLAN-T5 model families
* llama : add new API functions to support encoder-decoder models: llama_encode(), llama_model_has_encoder(), llama_model_decoder_start_token()
* common, llama-cli, llama-batched : add support for encoder-decoder models
* convert-hf : handle shared token embeddings tensors in T5Model
* convert-hf : add support for SentencePiece BPE tokenizer in T5Model (for Pile-T5 models)
* convert-hf : add MT5ForConditionalGeneration and UMT5ForConditionalGeneration to architectures supported by T5Model
* convert : add t5 tokenizer tests, use "slow" HF tokenizer for t5
---------
Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-04 15:46:11 +02:00
Georgi Gerganov
20fc3804bf
convert : fix gemma v1 tokenizer convert (#8248)
...
ggml-ci
2024-07-04 10:41:03 +03:00
jaime-m-p
3b38d48609
Per token attributes (#7685)
...
* Add per token attributes enum
* Using phi-3 for testing 'rstrip'
* Using jina-v2 for testing 'lstrip'
* Brute force test for 'lstrip' and 'rstrip'
* Implement 'rstrip' and 'lstrip'
* Update phi-3 GGUF file (obsolete since 917dc8c)
* Replace llama_token_type with llama_token_attribs
2024-06-04 09:17:17 +02:00
Haoxiang Fei
f99e1e456e
llama : lookup word in vocab before doing BPE merges (#7193)
...
* fix: llama-3 ignore_merges
* test: add test for llama-3 bpe ignore_merges
* fix: set ignore_merges only for llama-3
* fix: test-tokenizer-1-bpe --ignore-merges detection
* fix: copy to fix fallthrough
* fix: change ignore_merges to bool
* fix: add ignore merges tests to cmake
* llama : alternative merge ignore logic
---------
Co-authored-by: Haoxiang Fei <feihaoxiang@idea.edu.cn>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-11 11:12:06 +03:00
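The idea named in the commit above, looking a word up in the vocabulary before applying BPE merges, can be sketched with a toy tokenizer. This is an illustrative sketch only, not the llama.cpp implementation: the function `bpe_tokenize`, its parameters, and the tiny vocabulary are all hypothetical.

```python
def bpe_tokenize(word, vocab, merges, ignore_merges=True):
    """Toy BPE tokenizer (hypothetical, for illustration).

    vocab:  dict mapping token string -> token id
    merges: list of (left, right) pairs, highest priority first
    When ignore_merges is set and the whole word is already a vocab
    entry, return that single token without running any merges.
    """
    if ignore_merges and word in vocab:
        return [vocab[word]]

    rank = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        # Find the adjacent pair with the best (lowest) merge rank.
        candidates = [
            (rank.get((a, b), float("inf")), i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
        ]
        best_rank, i = min(candidates)
        if best_rank == float("inf"):
            break  # no applicable merge is left
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return [vocab[s] for s in symbols]


if __name__ == "__main__":
    vocab = {"a": 0, "b": 1, "c": 2, "bc": 5, "abc": 4}
    merges = [("b", "c")]
    print(bpe_tokenize("abc", vocab, merges, ignore_merges=True))
    print(bpe_tokenize("abc", vocab, merges, ignore_merges=False))
```

In this toy setup the merge table alone cannot reach the token "abc", so the direct lookup changes the result; that kind of mismatch is what a vocab-first lookup avoids.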
Concedo
d084f78faa
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# CMakeLists.txt
# Makefile
# README.md
# common/common.cpp
# requirements/requirements-convert-hf-to-gguf-update.txt
# requirements/requirements-convert-hf-to-gguf.txt
# requirements/requirements-convert.txt
# tests/CMakeLists.txt
# tests/test-json-schema-to-grammar.cpp
2024-05-09 15:13:34 +08:00
Ren Xuancheng
229ffff872
llama : add BPE pre-tokenization for Qwen2 (#7114)
...
* Add BPE pre-tokenization for Qwen2.
* minor : fixes
---------
Co-authored-by: Ren Xuancheng <17811943+jklj077@users.noreply.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-08 15:06:43 +03:00
Concedo
6c000cbe7a
Merge branch 'upstream' into concedo_experimental
...
# Conflicts:
# .flake8
# .github/workflows/bench.yml
# .github/workflows/python-lint.yml
# .pre-commit-config.yaml
# Makefile
# README.md
# models/ggml-vocab-bert-bge.gguf.inp
# models/ggml-vocab-bert-bge.gguf.out
# models/ggml-vocab-deepseek-coder.gguf.inp
# models/ggml-vocab-deepseek-coder.gguf.out
# models/ggml-vocab-deepseek-llm.gguf.inp
# models/ggml-vocab-deepseek-llm.gguf.out
# models/ggml-vocab-falcon.gguf.inp
# models/ggml-vocab-falcon.gguf.out
# models/ggml-vocab-gpt-2.gguf.inp
# models/ggml-vocab-gpt-2.gguf.out
# models/ggml-vocab-llama-bpe.gguf.inp
# models/ggml-vocab-llama-bpe.gguf.out
# models/ggml-vocab-llama-spm.gguf.inp
# models/ggml-vocab-llama-spm.gguf.out
# models/ggml-vocab-mpt.gguf.inp
# models/ggml-vocab-mpt.gguf.out
# models/ggml-vocab-phi-3.gguf
# models/ggml-vocab-phi-3.gguf.inp
# models/ggml-vocab-phi-3.gguf.out
# models/ggml-vocab-refact.gguf
# models/ggml-vocab-starcoder.gguf.inp
# models/ggml-vocab-starcoder.gguf.out
# requirements/requirements-convert.txt
# scripts/compare-llama-bench.py
# scripts/run-with-preset.py
# scripts/verify-checksum-models.py
# tests/CMakeLists.txt
# tests/test-tokenizer-0.cpp
2024-05-06 18:09:45 +08:00
DAN™
889bdd7686
command-r : add BPE pre-tokenization (#7063)
...
* Add BPE pre-tokenization for Command-R/R+.
* Bump transformers convert requirement.
* command-r : add individual digits regex
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-05-05 08:19:30 +03:00
Georgi Gerganov
92139b90af
tests : add test-tokenizer-0.sh + fix some tokenizers (#7036)
...
* tests : add test-tokenizer-0.sh
* unicode : add all unicode number ranges
* starcoder : fix pre-tokenizer
* tests : add test that fails with DeepSeek tokenizers
* falcon : fix regex
* unicode : regenerate unicode tables
* refact : add tokenizer model
* lint : fix
* tests : disable failing tests
ggml-ci
* refact : add tests files
ggml-ci
* convert : print -> logging
ggml-ci
* lint : fix
* unicode : digit -> number
* phi-3 : update
2024-05-04 08:32:32 +03:00
Georgi Gerganov
f4ab2a4147
llama : fix BPE pre-tokenization (#6920)
...
* merged the changes from deepseeker models to main branch
* Moved regex patterns to unicode.cpp and updated unicode.h
* Moved header files
* Resolved issues
* added and refactored unicode_regex_split and related functions
* Updated/merged the deepseek coder pr
* Refactored code
* Adding unicode regex mappings
* Adding unicode regex function
* Added needed functionality, testing remains
* Fixed issues
* Fixed issue with gpt2 regex custom preprocessor
* unicode : fix? unicode_wstring_to_utf8
* lint : fix whitespaces
* tests : add tokenizer tests for numbers
* unicode : remove redundant headers
* tests : remove and rename tokenizer test scripts
* tests : add sample usage
* gguf-py : reader prints warnings on duplicate keys
* llama : towards llama3 tokenization support (wip)
* unicode : shot in the dark to fix tests on Windows
* unicode : first try custom implementations
* convert : add "tokenizer.ggml.pre" GGUF KV (wip)
* llama : use new pre-tokenizer type
* convert : fix pre-tokenizer type writing
* lint : fix
* make : add test-tokenizer-0-llama-v3
* wip
* models : add llama v3 vocab file
* llama : adapt punctuation regex + add llama 3 regex
* minor
* unicode : set bomb
* unicode : set bomb
* unicode : always use std::wregex
* unicode : support \p{N}, \p{L} and \p{P} natively
* unicode : try fix windows
* unicode : category support via std::regex
* unicode : clean-up
* unicode : simplify
* convert : add convert-hf-to-gguf-update.py
ggml-ci
* lint : update
* convert : add falcon
ggml-ci
* unicode : normalize signatures
* lint : fix
* lint : fix
* convert : remove unused functions
* convert : add comments
* convert : exercise contractions
ggml-ci
* lint : fix
* cmake : refactor test targets
* tests : refactor vocab tests
ggml-ci
* tests : add more vocabs and tests
ggml-ci
* unicode : cleanup
* scripts : ignore new update script in check-requirements.sh
* models : add phi-3, mpt, gpt-2, starcoder
* tests : disable obsolete
ggml-ci
* tests : use faster bpe test
ggml-ci
* llama : more prominent warning for old BPE models
* tests : disable test-tokenizer-1-bpe due to slowness
ggml-ci
---------
Co-authored-by: Jaggzh <jaggz.h@gmail.com>
Co-authored-by: Kazim Abrar Mahi <kazimabrarmahi135@gmail.com>
2024-04-29 16:58:41 +03:00
Concedo
1e460bb936
remove junk
2024-02-17 17:12:59 +08:00
Concedo
fe7c200610
Merge branch 'master' into concedo_experimental
...
# Conflicts:
# .devops/full-cuda.Dockerfile
# .devops/full-rocm.Dockerfile
# .devops/full.Dockerfile
# .devops/main-rocm.Dockerfile
# README.md
# flake.lock
# flake.nix
# ggml-cuda.cu
# requirements.txt
# tests/CMakeLists.txt
2023-12-31 00:42:59 +08:00